Identification of Potential Transcriptional Biomarkers Differently Expressed in Both S. aureus- and E. coli-Induced Sepsis via Integrated Analysis

Sepsis is a critical, complex medical condition, and Staphylococcus aureus (S. aureus) and Escherichia coli (E. coli) are two of its major causative pathogens. Genome-wide studies have identified differentially expressed genes (DEGs) for sepsis. However, the DEGs reported by different studies are inconsistent because of heterogeneous specimen sources, varied data processing methods, or different sample backgrounds. To identify potential transcriptional biomarkers that are differentially expressed in S. aureus- and E. coli-induced sepsis, we analyzed four microarray datasets from the GEO database and integrated the results with bioinformatics tools. In total, 42 DEGs in S. aureus samples and 54 DEGs in E. coli samples were identified in at least three of the four arrays. Hierarchical clustering revealed dramatic differences between control and sepsis samples. GO functional annotation suggested that DEGs in the S. aureus group were mainly involved in defense response and immune regulation, whereas DEGs in the E. coli group were mainly related to the regulation of endopeptidase activity involved in the apoptotic signaling pathway. KEGG pathway analysis showed that these DEGs were mainly involved in the tumor necrosis factor signaling pathway and in fructose and mannose metabolism in both S. aureus- and E. coli-induced sepsis; inflammatory bowel disease was additionally enriched in the E. coli group. Eight genes were differentially expressed in all four datasets between controls and sepsis patients with either S. aureus or E. coli infection. These candidate genes were further examined in an ex-vivo human blood model, and their relative expression was measured by qPCR. The qPCR results suggest that GK and PFKFB3 might contribute to the progression of S. aureus-induced sepsis, and CEACAM1, TNFAIP6, PSTPIP2, SOCS3, and IL18RAP might be closely linked with E. coli-induced sepsis. These results provide new perspectives on the pathogenesis of sepsis and on pathogen identification.


Introduction
Sepsis is the leading cause of death in noncoronary intensive care units, and its incidence has been increasing worldwide annually [1,2]. Sepsis is a critical, complex medical condition characterized as "a life-threatening organ dysfunction caused by a dysregulated host response to infection" [3]. The main causative pathogens of sepsis are bacteria, viruses, and fungi, and Staphylococcus aureus (S. aureus) and Escherichia coli (E. coli) are common microorganisms detected in sepsis [4,5]. Because every hour of delay after the first 6 hours increases mortality by 8% [6], prompt diagnosis and prompt treatment both improve survival in sepsis.
In the last few decades, genome-wide studies have identified candidate host genes for sepsis development, but only some of them distinguished the pathophysiological mechanisms of sepsis caused by Gram-positive bacteria (or S. aureus) from those caused by Gram-negative bacteria (or E. coli) [7][8][9][10][11][12][13]. Tang et al. revealed 94 genes differentially expressed between intensive care unit patients with and without sepsis, and the Gram-positive, Gram-negative, and mixed infection subgroups had similar transcriptional profiles [14]. Ahn et al. identified classifier sets (human: two-factor; murine: four-factor) to distinguish S. aureus infection from healthy controls or E. coli bacteremia [12]. Limitations thus remain in any single study, and it is unclear whether the same genes are differentially regulated across different types of microarrays. With an unbiased bioinformatics approach, we integrated the previous results to discover effective and reliable biomarkers.
Therefore, the present study identifies significant host DEGs that are commonly regulated in S. aureus- and E. coli-induced sepsis by analyzing four microarray datasets from the Gene Expression Omnibus (GEO) database. DEGs in S. aureus-induced sepsis vs. healthy controls and in E. coli-induced sepsis vs. healthy controls were obtained with R software (v3.3.2). For the DEGs shared by at least three datasets, Gene Ontology (GO) biological process enrichment [15] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment [16] were performed with the STRING database [17] and the DAVID online tools [18,19], respectively. The DEGs found in all four datasets were then validated in an ex-vivo model by quantitative real-time polymerase chain reaction (qPCR). Our study provides potential transcriptional biomarkers for sepsis diagnosis as well as pathogen identification.

Methods
Identification of Eligible Microarray Datasets. The GEO database is a public repository of high-throughput gene expression data (www.ncbi.nlm.nih.gov/geo/). We searched GEO for relevant studies with the key words "sepsis", "homo sapiens", "expression profiling by array", "Staphylococcus aureus", and "Escherichia coli". A study was included in our analysis if it fulfilled the following selection criteria: (1) the study included sepsis patients with positive culture results for S. aureus and E. coli as well as normal controls; (2) the sample was whole blood, which is easy to obtain and widely used; (3) the study had almost whole-genome coverage (more than 10,000 genes). As of January 1, 2017, four datasets (GEO accession numbers: GSE4607 [7][8][9], GSE25504 [10,11], GSE33341 [12], and GSE65088 [13]) met these inclusion criteria and were retained for subsequent analysis. All samples in these four datasets were from human blood. Across the four datasets, 9289 genes were extracted for subsequent analyses. Basic information on these datasets, such as published articles, patient characteristics, source of infection, and sampling time, is shown in Table 1.
Data Preprocessing and Screening for DEGs. All primary data of the four studies (GSE4607 [7][8][9], GSE25504 [10,11], GSE33341 [12], and GSE65088 [13]) were downloaded from the GEO database and analyzed separately with R software and Bioconductor packages [20]. First, Affymetrix arrays were normalized with the "MAS5.0" method [21], and Illumina arrays were normalized with GenomeStudio software (v2011.1, Illumina Inc.). Second, each probe ID was converted into a unique official gene symbol according to the probe annotation information. Third, to identify the DEGs within each dataset, we used the "limma" package [22] in R/Bioconductor to compare gene expression between S. aureus or E. coli samples and control samples. Fourth, |log2 fold change| ≥ 1.5 and false discovery rate (FDR, Benjamini-Hochberg method) < 0.05 were used as the cut-off values for screening DEGs. Fifth, the DEG lists were compared with the VennDiagram function [23] in R to identify the common genes in S. aureus- and E. coli-induced sepsis. Finally, hierarchical cluster analysis of the candidate DEGs in S. aureus or E. coli samples vs. healthy controls was performed with Cluster 3.0 software (Stanford University), and the results were visualized with the TreeView tool [24].
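The screening step described above (Benjamini-Hochberg adjustment followed by the fold-change and FDR cut-offs) can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: it assumes that a log2 fold change and a raw p-value have already been computed for each gene, and the function names and toy gene labels are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_degs(
    stats: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 1.5,   # |log2 fold change| threshold from the text
    fdr_cutoff: float = 0.05,  # FDR threshold from the text
) -> Dict[str, str]:
    """stats maps gene -> (log2 fold change, raw p-value).

    Returns the genes passing both cut-offs, labelled "up" or "down".
    """
    genes = list(stats)
    fdr = benjamini_hochberg([stats[g][1] for g in genes])
    degs = {}
    for gene, q in zip(genes, fdr):
        lfc = stats[gene][0]
        if abs(lfc) >= lfc_cutoff and q < fdr_cutoff:
            degs[gene] = "up" if lfc > 0 else "down"
    return degs


# Toy input with hypothetical gene labels (not data from the study).
toy_stats = {
    "GENE_A": (2.1, 0.0001),
    "GENE_B": (-1.8, 0.0004),
    "GENE_C": (0.4, 0.0002),   # significant but below the fold-change cut-off
    "GENE_D": (1.9, 0.6),      # large fold change but not significant
}
print(screen_degs(toy_stats))
```

Both conditions must hold at once, so a gene with a large fold change but a high adjusted p-value (or the reverse) is excluded.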
Functional Annotation Analyses. The genes common to any three datasets were submitted to GO enrichment analysis in the STRING database (https://string-db.org/cgi/input.pl) to gain insight into their biological functions. GO biological process terms with a p-value below 0.05 were considered significantly enriched [15]. The KEGG database is a resource for pathway mapping that integrates genomic, chemical, and systemic functional information [16]. DAVID is a set of online tools that provide functional annotation for understanding the biological meaning behind large lists of genes (https://david.ncifcrf.gov/) [18,19]. With DAVID, KEGG pathway enrichment analysis was conducted for the common genes, with p < 0.05 considered statistically significant.
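The idea behind term or pathway enrichment can be illustrated with a generic one-sided hypergeometric (over-representation) test. This is only a sketch of the principle: STRING and DAVID apply their own statistics and corrections, which are not reproduced here, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background: int, n_term: int,
                       n_list: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background set
    n_term:       background genes annotated to the term or pathway
    n_list:       genes in the submitted list (e.g., the common DEGs)
    n_overlap:    submitted genes annotated to the term or pathway
    """
    upper = min(n_term, n_list)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 4 annotated to a term,
# 3 genes submitted, 2 of which carry the annotation.
p = enrichment_p_value(n_background=10, n_term=4, n_list=3, n_overlap=2)
print(f"p = {p:.4f}")
```

A term is called enriched when this tail probability falls below the chosen threshold (p < 0.05 in the text), meaning the overlap is larger than expected by chance for a list of that size.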
Quantitative Real-Time PCR. A human whole-blood model was used to validate the candidate genes identified in all four studies, and the expression of the candidate genes was measured by qPCR. Anticoagulated blood samples from healthy human donors (n = 6, male, over the age of 18) were treated with S. aureus ATCC 25923 (1 × 10^7/mL), with the same dose of E. coli ATCC 25922, or with stroke-physiological saline solution as mock infection, and were incubated at 37 °C with gentle rotation for 4 hours. Both S. aureus and E. coli were inoculated into blood from six different donors. Total RNA was extracted from the incubated blood with the Blood Total RNA Rapid Extraction Kit (BioTeke, Beijing, China) and reverse-transcribed with the PrimeScript RT reagent Kit. All qPCR reactions were performed with the SYBR Premix Ex Taq kit (TaKaRa, Dalian, China), and the primers used are listed in Table S1. All kits were used according to the manufacturers' instructions. Messenger RNA (mRNA) expression of all candidate genes was normalized to the expression of 18S rRNA, and relative transcript expression was calculated using the 2^(−ΔΔCt) method [25]. Human peripheral blood was collected from healthy volunteers. The study was approved by the Ethics Committee of the Third Xiangya Hospital of Central South University, and written informed consent was obtained from all blood donors in accordance with the Declaration of Helsinki.
Statistical Analyses. All statistical analyses and graphs were produced with GraphPad Prism 7.00 software (GraphPad Software Inc.). Differences among the three groups were assessed with one-way ANOVA followed by the Tukey test, and data are expressed as mean ± standard error of the mean. All p-values are two-sided, and p < 0.05 was considered statistically significant.
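The 2^(−ΔΔCt) calculation used for the qPCR data can be sketched as follows. The sketch assumes one Ct value per gene and condition, with 18S rRNA as the reference gene and the mock infection as the calibrator, as stated in the text; the function name and the Ct values in the example are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per condition
    ddCt = dCt(treated) - dCt(control)
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target gene crosses threshold two cycles
# earlier in infected blood, while the reference gene is unchanged.
fold = relative_expression(
    ct_target_treated=24.0, ct_ref_treated=10.0,
    ct_target_control=26.0, ct_ref_control=10.0,
)
print(fold)
```

Because each cycle corresponds to a doubling of product under ideal amplification efficiency, a ΔΔCt of −2 corresponds to a fourfold increase relative to the mock infection.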

Results
Identification of DEGs and Common Genes across Four Datasets.
With both |log2 fold change| ≥ 1.5 and FDR < 0.05 as cut-off criteria, we selected hundreds of significantly upregulated or downregulated genes in each dataset; these are summarized in Tables S2 and S3. Among the candidates, 42 genes were differentially expressed between S. aureus-induced sepsis patients and healthy controls in at least three of the four datasets: 31 upregulated and 11 downregulated (Figure 1(a) and Table S3). Likewise, 54 genes were common between E. coli-induced sepsis patients and normal controls: 41 upregulated and 13 downregulated (Figure 1(b) and Table S3). Hierarchical cluster analysis of these 96 selected genes revealed remarkable differences between the control and sepsis samples but strong similarity between the S. aureus and E. coli groups (Figure 2).
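Selecting the genes that recur in at least three of the four datasets amounts to counting, for each gene, the number of per-dataset DEG lists that contain it. The sketch below shows this with a counter; it is an illustration rather than the VennDiagram-based procedure used in the study, and the function name and gene labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def genes_in_at_least(deg_sets: Iterable[Set[str]], min_datasets: int) -> Set[str]:
    """Return the genes present in at least min_datasets of the DEG sets."""
    counts = Counter(gene for degs in deg_sets for gene in set(degs))
    return {gene for gene, n in counts.items() if n >= min_datasets}


# Four hypothetical per-dataset DEG lists (not data from the study).
toy_deg_sets = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_C"},
    {"GENE_A", "GENE_B", "GENE_D"},
]
print(sorted(genes_in_at_least(toy_deg_sets, min_datasets=3)))
print(sorted(genes_in_at_least(toy_deg_sets, min_datasets=4)))
```

Setting min_datasets to the number of datasets gives the strict intersection, which corresponds to the genes found in all four datasets.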
Functional Annotation Analysis. Functional enrichment analysis of the 42 and 54 common DEGs identified in any three datasets was performed separately for S. aureus-induced sepsis and E. coli-induced sepsis, and the top 10 significantly enriched biological processes are listed in Table 2. In the GO analysis, "defense response" (GO:0006952, p = 2.37E-04) was the most significantly enriched function in sepsis caused by S. aureus (Table 2(a)), and "regulation of cysteine-type endopeptidase activity involved in apoptotic signaling pathway" (GO:2001267, p = 2.61E-04) was the most highly enriched function in sepsis with E. coli infection (Table 2(b)). We used DAVID to perform KEGG pathway enrichment analysis of all DEGs identified in any three studies; the results are shown in Table 3. The "tumor necrosis factor (TNF) signaling pathway" and "fructose and mannose metabolism" were the main enriched pathways among the upregulated genes in both S. aureus- and E. coli-induced sepsis, while "inflammatory bowel disease (IBD)" was an additional enriched pathway among the upregulated genes in the E. coli infection group. However, no significantly enriched pathway was identified among the downregulated genes in either group.

Validation of the Most Common Genes across All Four Datasets by Ex-Vivo Experiments.
To further investigate the common DEGs in S. aureus- and E. coli-induced sepsis, we screened for genes that emerged in all four datasets and found eight key genes. Surprisingly, none of these common genes was downregulated in either infection. Most notably, CEACAM1, GK, PFKFB3, and TNFAIP6 were increased in the S. aureus group in all four datasets, while CEACAM1, IL18RAP, LILRA5, PFKFB3, PSTPIP2, and SOCS3 were raised in the E. coli group. The fold change, t-test, and FDR-adjusted p-values of these eight key genes in the four original datasets are presented in Table 4.
To validate the eight candidate genes found in all datasets, we used a human whole-blood ex-vivo model and measured the mRNA expression of these genes by real-time qPCR. As shown in Figure 3, the expression levels of GK and PFKFB3 in the S. aureus group were higher than those in the mock infection group (p < 0.001; Figures 3(b), 3(c), and 3(d)). GK, CEACAM1, TNFAIP6, PSTPIP2, SOCS3, and IL18RAP were also markedly higher in the E. coli-treated group than in the mock infection group (p < 0.001 and p < 0.01, respectively, Figures 3(a), 3(e), 3(f), and 3(g)). However, no significant difference in the LILRA5 level existed between the control and the E. coli group.

Discussion
Sepsis is one of the common causes of death in intensive care units [26][27][28]. The pathogenesis of sepsis involves invading pathogens, host immune responses, and multiple tissue damage caused by their complex interactions. Despite great progress in understanding the pathophysiology of sepsis, we still lack indicators for early diagnosis. Therefore, it is important to study the interaction between microorganisms and host and to understand the molecular mechanisms of sepsis development.
Microarray studies, which measure the mRNA levels of thousands of human genes, provide an opportunity for early diagnosis of sepsis [29]. Because only a few studies have identified changes in host expression levels across different pathogen infections in sepsis, clear and effective diagnostic biomarkers remain unknown. Most studies came from either a single cohort or a mixed-pathogen background. In addition, the DEGs identified are inconsistent among studies because of heterogeneity of specimen sources (e.g., blood, peripheral blood mononuclear cells, and neutrophils), diverse types of pathogens, various data processing methods, or different backgrounds of the samples. Therefore, confounding effects cannot be eliminated in these studies.
In this study, we used R software to identify DEGs in S. aureus- and E. coli-induced sepsis in four different gene expression profiling datasets and integrated the common DEGs for in-depth analysis with bioinformatics tools. Based on four public GEO datasets with a case-control study design, we identified 42 notable genes in S. aureus samples (31 upregulated and 11 downregulated) and 54 significantly changed genes in E. coli samples (41 upregulated and 13 downregulated). Both gene sets were commonly regulated in at least three different arrays.
The microarray results are consistent with the pathophysiology of sepsis. Hierarchical clustering analysis showed remarkable differences between control and sepsis samples but many similarities between S. aureus- and E. coli-induced sepsis. These results are essentially in agreement with previous studies. For instance, Tang et al. reported that sepsis patients with Gram-positive and Gram-negative infection had a homogeneous host response at the transcriptional level [14]. In fact, the clinical features of Gram-positive and Gram-negative sepsis are not easily distinguishable [30]. It is usually thought that this conserved program of gene expression might be part of the host's general "alarm signal" to maximize the detection of invasive pathogens. Nevertheless, the pathogenic mechanisms of the two bacterial infections remained heterogeneous, as seen in both the GO analyses and the KEGG pathway enrichments of the DEGs. GO functional annotation showed that DEGs in the S. aureus group were mainly involved in defense response and immune regulation, whereas common genes of the E. coli group were mainly related to the regulation of endopeptidase activity involved in the apoptotic signaling pathway. Furthermore, the enriched KEGG pathways of common genes in S. aureus-induced sepsis included the TNF signaling pathway and the fructose and mannose metabolic pathway, while the KEGG pathway enrichments in sepsis with E. coli infection consisted of the TNF signaling pathway, IBD, and fructose and mannose metabolism. The TNF signaling pathway is intimately implicated in the innate immune response in the development of sepsis [31]. As one of the most important proinflammatory cytokines, TNF-α can mediate a wide range of pathways such as apoptosis and inflammation [32] and has been defined as a major component in the pathogenesis of sepsis [33]. One experimental mouse model suggested that deficiency of TNF receptor I could protect mice from both lipopolysaccharide (LPS)-induced and S. aureus enterotoxin B-induced septic shock [34]. Fructose and mannose metabolism leads to enhanced glycolysis and N-glycan biosynthesis [35,36]; anaerobic glycolysis may be a novel therapeutic target for sepsis-related acute lung injury [35], and its product lactate is closely associated with septic shock [3]. Therefore, these two pathways may play important roles in the development of sepsis induced by S. aureus and E. coli and may provide insights into therapeutic strategies for sepsis.
Note. FC: fold-change; FDR: false discovery rate; CEACAM1: carcinoembryonic antigen related cell adhesion molecule 1; GK: glycerol kinase; PFKFB3: 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3; TNFAIP6: TNF alpha induced protein 6; IL18RAP: interleukin 18 receptor accessory protein; LILRA5: leukocyte immunoglobulin like receptor A5; PSTPIP2: proline-serine-threonine phosphatase interacting protein 2; SOCS3: suppressor of cytokine signaling 3.
Of the eight common genes screened out in all four datasets, CEACAM1, GK, PFKFB3, and TNFAIP6 emerged repeatedly in the S. aureus group, whereas CEACAM1, IL18RAP, LILRA5, PFKFB3, PSTPIP2, and SOCS3 emerged in the E. coli group. CEACAM1 and PFKFB3 appeared in both groups. We then validated these eight key genes by measuring their mRNA expression in an ex-vivo experiment with S. aureus- and E. coli-treated human whole-blood samples.
Importantly, we found that GK and PFKFB3 were upregulated in the S. aureus group, whereas GK, CEACAM1, TNFAIP6, PSTPIP2, SOCS3, and IL18RAP were increased in the E. coli group. The protein encoded by PFKFB3 is an important enzyme in glycolysis and contributes to cell apoptosis, enhancement of ROS, and the development of sepsis [35][36][37]. GK is a key enzyme in the regulation of glycerol uptake and metabolism, and one study found that GK was increased in septic rat models [38]. TNFAIP6 is upregulated in response to many proinflammatory cytokines such as TNF-α and interleukin-1, and elevated levels of TNFAIP6 have been reported in plasma after LPS stimulation [39] and in S. aureus-induced mastitis [40]. CEACAM1 is a receptor on neutrophils that negatively regulates both NLRP3 inflammasome activation and the immune response [41][42][43] and has been found to increase susceptibility to bacterial infection [44]. PSTPIP2 is an actin-associated protein expressed in macrophages that regulates both filopodia formation and directional motility of the macrophage [45]. SOCS3 plays an important role in the course of sepsis and is reportedly involved in the proinflammatory phenotype polarization of the M1 macrophage [46,47]. IL18RAP is a subunit of the heterodimeric receptor for interleukin 18 and is reported to be elevated in E. coli-caused bacteremia [48]. A mutation of IL18RAP is closely related to both Crohn's disease and IBD [49][50][51]. LILRA5 is involved in both macrophage activation and secretion of several proinflammatory cytokines and has a potential impact on the pathogenesis of rheumatoid arthritis [52]. However, LILRA5 mRNA expression did not differ between the E. coli infection and control groups by qPCR in the ex-vivo model. In this study, qPCR results further indicated that almost all of the eight candidate genes were expressed differently in the different bacterial infections, and qPCR of these genes has the potential to distinguish S. aureus and E. coli infections. Our qPCR findings roughly agree with previously reported studies. Although the exact contributions of these genes to the identification of S. aureus- and E. coli-induced sepsis are not yet clear, further research should investigate these eight genes as potential transcriptional biomarkers for pathogen identification in sepsis. Further validation using patient samples is also required to reach a more convincing conclusion.
In conclusion, we identified 42 and 54 DEGs that were differentially expressed between healthy controls and sepsis patients with S. aureus or E. coli infection, respectively. GO and pathway enrichment analysis revealed that these common markers were strongly associated with immune response or regulation of endopeptidase activity. The qPCR results suggested that GK and PFKFB3 might contribute to the progression of S. aureus-induced sepsis, and GK, CEACAM1, TNFAIP6, PSTPIP2, SOCS3, and IL18RAP might be closely linked with E. coli-induced sepsis. Our study provides novel insight into sepsis pathogenesis and shows systematic differences in gene expression patterns between S. aureus- and E. coli-induced sepsis. Such insights may ultimately lead to early pathogen identification in sepsis.

Data Availability
The datasets supporting the conclusions of this article are within the article and its additional files.

Ethical Approval
The study was approved by the Ethics Committee of the Third Xiangya Hospital of Central South University.

Consent
Written informed consent was obtained from all blood donors.

Conflicts of Interest
The authors declare that they have no competing interests.