Diagnostic biomarkers for invasive aspergillosis utilizing weighted gene co-expression network analysis

Background: Invasive aspergillosis (IA) causes significant mortality in immunocompromised patients. In recent years, with more aggressive immunosuppressive therapies, the incidence of IA has been increasing. However, diagnostic biomarkers with high sensitivity and specificity remain rare. To identify new diagnostic biomarkers, the microarray dataset GSE78000 was analyzed. Methods: Weighted gene co-expression network analysis (WGCNA) was used to identify hub genes. Receiver operating characteristic (ROC) curves were used to evaluate candidate diagnostic biomarkers for IA. Results: The hub genes were TLR4, TP53I3/PIG3, TMTC1, ITGAM, CYSTM1, FAR1, GAS7 and MKNK1. However, when gene expression in hematological patients with IA was compared with that in non-IA patients, only TLR4, TP53I3/PIG3 and TMTC1 were significantly more highly expressed in IA patients. At the optimal cut-off value, TLR4 diagnosed IA with 78.3% sensitivity and 72.7% specificity, TP53I3/PIG3 with 91.3% sensitivity and 54.5% specificity, and TMTC1 with 78.3% sensitivity and 81.8% specificity. In addition, data from hematological patients with Staphylococcus aureus (S. aureus) and Escherichia coli (E. coli) infections were analyzed. TLR4 and TP53I3/PIG3 were also significantly more highly expressed in S. aureus and E. coli infections, while only TP53I3/PIG3 was markedly higher in patients with bacterial infections than in patients with IA. TMTC1 could not be annotated in the bacterial-infection microarray data. Conclusions: Our results suggest that TLR4, TP53I3/PIG3 and TMTC1 might be used for the diagnosis of IA, and that TP53I3/PIG3 could also help discriminate aspergillosis from bacterial infections in hematological patients.


Background
Immunocompromised patients with hematologic malignancies (HM), hematopoietic stem cell transplantation (HSCT), solid organ transplantation or other immunosuppressive therapies are susceptible to IA [1,2]. Worldwide, IA is mainly caused by Aspergillus fumigatus (A. fumigatus) [3] and is associated with high mortality and morbidity, especially in patients with HM and HSCT [4][5][6]. Although healthy people inhale Aspergillus conidia daily, they do not become ill, because their immune system (primarily alveolar macrophages, neutrophils, monocytes and natural killer cells) clears the fungus [7][8][9]. In contrast, immunocompromised patients cannot clear Aspergillus spores promptly because of deficient local and systemic immune defences. As a result, Aspergillus can invade the lung and disseminate hematogenously to other organs and tissues, for example the brain [10,11].
Given the serious harm caused by IA, early diagnosis and therapy are crucial for survival. However, the early symptoms and signs of IA lack specificity [12]. In addition, because conventional diagnostic methods for IA have low sensitivity and specificity, early diagnosis is extremely difficult. Therefore, new diagnostic biomarkers for IA are urgently needed.
To address this challenge, we used bioinformatics to analyze the expression profiles of 23 samples from hematological patients with IA, 11 samples from hematological patients without IA (non-IA) and 9 normal samples. The gene expression data were deposited by Andreas Dix et al. [13], downloaded from the GEO database and reanalyzed. We identified a total of 2303 differentially expressed genes (DEGs) between hematological patients with IA and normal samples. These DEGs were analyzed by gene ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, gene set enrichment analysis (GSEA) and WGCNA. WGCNA yielded 8 co-expression modules, and the black module was found to be significantly correlated with IA. The hub genes in the black module were TLR4, TP53I3/PIG3, TMTC1, ITGAM, CYSTM1, FAR1, GAS7 and MKNK1. Diagnostic biomarkers were then identified by comparing gene expression between hematological patients with IA and non-IA patients. Next, data from hematological patients with bacterial infections (deposited by SH Ahn et al. [14]) were analyzed to validate the expression of the candidate biomarkers. Finally, we identified TLR4, TP53I3/PIG3 and TMTC1, which showed high sensitivity and specificity, as potential new diagnostic biomarkers for IA; TP53I3/PIG3 may also be used to discriminate aspergillosis from bacterial infections in hematological patients.

GEO dataset
Gene expression data of GSE78000 [13] and GSE33341 [14] were downloaded from the National Center for Biotechnology Information (NCBI) GEO database. GSE78000 [13] contains 43 whole-blood expression profiles: 23 from hematological patients with IA, 11 from hematological patients without IA and 9 from normal samples. GSE33341 [14] contains 94 samples: 43 from hematological patients, 32 from hematological patients with S. aureus infections and 19 from hematological patients with E. coli infections.

Identification Of DEGs
The data were downloaded using the R package GEOquery [15]. The data were normalized using the limma package, and DEGs were identified with limma's empirical Bayes (eBayes) moderated statistics [16]. Genes with |log2 fold change| ≥ 0.5 and p ≤ 0.05 were considered DEGs. The volcano plot was generated using the ggplot2 package [17], and the heatmap was produced using the pheatmap package [18].
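As a minimal sketch of the screening rule (not the authors' code, which used limma in R), the Python below applies the stated cut-offs to per-gene results, assuming each gene already has a log2 fold change and a p-value of the kind limma's topTable reports. The gene names and values are invented placeholders. The volcano-plot coordinates and the rule of labeling DEGs with a fold change above 8 (Fig. 1B) are included for completeness; a log2 cut-off of 0.5 corresponds to a fold change of about 1.41.

```python
import math
from dataclasses import dataclass

LOG2FC_CUTOFF = 0.5   # |log2 FC| >= 0.5, i.e. at least ~1.41-fold (2 ** 0.5)
P_CUTOFF = 0.05       # p <= 0.05
LABEL_LOG2FC = 3.0    # FC > 8 on the volcano plot means |log2 FC| > 3


@dataclass
class GeneResult:
    symbol: str
    log2_fc: float  # log2 fold change, IA vs normal
    p_value: float  # p-value from the moderated t-test


def classify(g: GeneResult) -> str:
    """Return 'up', 'down' or 'ns' (not significant) using the stated cut-offs."""
    if g.p_value <= P_CUTOFF and abs(g.log2_fc) >= LOG2FC_CUTOFF:
        return "up" if g.log2_fc > 0 else "down"
    return "ns"


def volcano_xy(g: GeneResult) -> tuple[float, float]:
    """Volcano-plot coordinates: x = log2 FC, y = -log10(p)."""
    return g.log2_fc, -math.log10(g.p_value)


def should_label(g: GeneResult) -> bool:
    """Label a DEG on the volcano plot when its fold change exceeds 8."""
    return classify(g) != "ns" and abs(g.log2_fc) > LABEL_LOG2FC


# Hypothetical genes and values, for illustration only
genes = [
    GeneResult("GENE_A", 1.2, 0.001),
    GeneResult("GENE_B", -0.7, 0.02),
    GeneResult("GENE_C", 0.3, 0.001),
    GeneResult("GENE_D", 3.5, 1e-6),
]
for g in genes:
    print(g.symbol, classify(g), should_label(g))
```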

GO, KEGG, GSEA And Disease Ontology Semantic Enrichment Analysis
DEGs and module genes were analyzed with the clusterProfiler package [19], which was used for the GO, KEGG and GSEA analyses. Disease Ontology Semantic enrichment (DOSE) analysis was performed using the DOSE package [20]. All reported results were significant at p < 0.05.
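Over-representation tests of the kind clusterProfiler runs for GO and KEGG terms are based on the hypergeometric distribution. The sketch below computes that one-sided p-value from four counts using only the Python standard library. The counts in the example are hypothetical, and the multiple-testing adjustment that clusterProfiler also applies is omitted; GSEA uses a different, rank-based statistic and is not covered here.

```python
from math import comb


def ora_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    N: background genes, K: background genes annotated to the term,
    n: genes in the input list (e.g. DEGs), k: input genes annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this division


# Hypothetical counts: 40 of 2303 DEGs fall in a term holding 200 of 20000 background genes
p = ora_p_value(k=40, n=2303, K=200, N=20000)
print(f"{p:.3g}")
```

About 23 term genes would be expected by chance here (2303 × 200 / 20000), so observing 40 yields a small p-value.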
WGCNA Analysis
The WGCNA package [21] was used to construct the co-expression network. First, the soft-thresholding power β was selected using the functions of the WGCNA package [21]. Second, co-expression modules were constructed and detected with the blockwiseModules function of WGCNA. Then, correlations between modules and clinical traits were assessed. The module significantly correlated with clinical traits (the black module) was analyzed by GO, KEGG and DOSE analyses. The STRING database [22] was further used to identify genes that may play an important role in the black module. Finally, hub genes were identified using previously reported methods [23]. Cytoscape [24] was used to visualize module genes and hub genes. Statistical analysis of hub genes was conducted with the ggstatsplot package in R. In addition, receiver operating characteristic (ROC) curves of candidate biomarkers were drawn using the pROC package [25].
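The core of the network construction can be sketched as follows, assuming an unsigned network in which the adjacency between genes i and j is |cor(x_i, x_j)| raised to the soft-thresholding power β. Hub genes are ranked here by connectivity (the sum of a gene's adjacencies, applied within a module), which is one common reading of "degree"; the method of [23] may differ. The gene names and toy matrix are hypothetical. With β = 18, the weak correlation between G1 and G3 (1/√10, about 0.32) shrinks to about 10⁻⁹, which shows how soft thresholding suppresses weak links while keeping strong ones.

```python
import numpy as np


def soft_threshold_adjacency(expr: np.ndarray, beta: float) -> np.ndarray:
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr is a samples x genes matrix of normalized expression values.
    """
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    return adj


def connectivity(adj: np.ndarray) -> np.ndarray:
    """Weighted degree k_i = sum_j a_ij of each gene."""
    return adj.sum(axis=1)


def top_hubs(adj: np.ndarray, names: list[str], n_hubs: int) -> list[str]:
    """Names of the n_hubs genes with the highest connectivity."""
    order = np.argsort(connectivity(adj))[::-1]
    return [names[i] for i in order[:n_hubs]]


# Toy data: 4 samples x 3 hypothetical genes; G1 and G2 are perfectly correlated
expr = np.array([[1.0, 2.0, 1.0],
                 [2.0, 4.0, 0.0],
                 [3.0, 6.0, 2.0],
                 [4.0, 8.0, 1.0]])
adj = soft_threshold_adjacency(expr, beta=18)
print(top_hubs(adj, ["G1", "G2", "G3"], n_hubs=2))
```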

Results
The expression profile analysis of microarray data
We identified 2303 DEGs, including 1076 up-regulated and 1227 down-regulated genes. The heatmap of DEGs is shown in Fig. 1A. The volcano plot (Fig. 1B) shows the distribution of all genes: red points represent up-regulated genes, green points represent down-regulated genes, and the symbols of DEGs with a fold change (FC) greater than 8 are labeled. GO analysis was performed to identify the functions of the DEGs. As shown in Fig. 1C, several terms were enriched; immune-related functions such as neutrophil activation, neutrophil activation involved in immune response, neutrophil mediated immunity, neutrophil degranulation, regulation of leukocyte activation, T cell activation, positive regulation of cytokine production, lymphocyte differentiation and T cell differentiation were primarily enriched. KEGG analysis was conducted to identify the key pathways of

Construction Of Co-expression Network
WGCNA was performed on the 2303 DEGs. A soft-thresholding power of 18 was chosen to construct the co-expression network. Fig. 2A shows all modules and the merged modules, in which similar modules were combined. After merging, a total of 8 modules (green, midnightblue, cyan, lightcyan, black, grey60, salmon and grey) were obtained. The grey module was excluded from the analysis because it contains genes that do not belong to any module.
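The paper does not state how β = 18 was chosen. A common approach, sketched here as an assumption, is the scale-free topology criterion used by WGCNA's pickSoftThreshold: for each candidate β, regress log10 p(k) on log10 k, where k is connectivity, and take the smallest β whose signed R² reaches a target (0.85 below, a conventional but assumed value). The binning is simplified relative to WGCNA, and the random matrix only exercises the code; it is not expected to be scale-free.

```python
import numpy as np


def adjacency(expr: np.ndarray, beta: float) -> np.ndarray:
    """Unsigned soft-threshold adjacency |cor| ** beta with a zero diagonal."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def scale_free_fit(k: np.ndarray, n_bins: int = 10) -> float:
    """Signed R^2 of log10 p(k) regressed on log10 k (simplified binning)."""
    counts, edges = np.histogram(k, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    keep = counts > 0                          # skip empty bins before taking logs
    x = np.log10(centers[keep])
    y = np.log10(counts[keep] / counts.sum())
    slope = np.polyfit(x, y, 1)[0]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(-np.sign(slope) * r2)         # positive when p(k) decreases with k


def pick_beta(expr: np.ndarray, candidates: list[int], target: float = 0.85):
    """Smallest beta whose fit reaches `target`, else the best-fitting beta."""
    fits = {b: scale_free_fit(adjacency(expr, b).sum(axis=1)) for b in candidates}
    good = [b for b in candidates if fits[b] >= target]
    best = min(good) if good else max(fits, key=fits.get)
    return best, fits


rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 60))  # hypothetical: 20 samples x 60 genes
candidates = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
beta, fits = pick_beta(expr, candidates)
print(beta, {b: round(f, 2) for b, f in fits.items()})
```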
Correlations between modules and IA are shown in Fig. 2B. The black module showed the most significant correlation with IA (r = 0.62). As shown in Fig. 2C, clustering also indicated that the black module was most closely related to IA, followed by the salmon (r = 0.53) and grey60 (r = 0.52) modules. The expression patterns of the black, salmon and grey60 modules are shown in Fig. 2D, and Fig. 2E shows the relationships among module eigengenes.
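Module-trait correlations such as r = 0.62 for the black module are computed between a module eigengene and a trait. A minimal sketch, assuming the WGCNA convention that the eigengene is the first principal component of the standardized module genes (sign-aligned with average expression) and that IA status is coded as 1 (IA) and 0 (normal). The toy values are hypothetical.

```python
import numpy as np


def module_eigengene(module_expr: np.ndarray) -> np.ndarray:
    """First principal component (one value per sample) of a samples x genes module.

    Genes are standardized first; the sign is flipped, if needed, so the
    eigengene correlates positively with the module's average expression.
    """
    z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0]
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me


def module_trait_r(me: np.ndarray, trait: np.ndarray) -> float:
    """Pearson correlation between an eigengene and a 0/1 trait (IA = 1)."""
    return float(np.corrcoef(me, trait)[0, 1])


# Hypothetical module of 3 genes measured in 3 normal and 3 IA samples
module_expr = np.array([[1.0, 2.0, 0.5],
                        [1.1, 2.2, 0.4],
                        [0.9, 1.8, 0.6],
                        [3.0, 5.0, 2.5],
                        [3.2, 5.1, 2.4],
                        [2.9, 4.9, 2.6]])
ia = np.array([0, 0, 0, 1, 1, 1])
me = module_eigengene(module_expr)
print(round(module_trait_r(me, ia), 3))
```

With a 0/1 trait, this Pearson correlation is the point-biserial correlation, so a positive r means the module's genes tend to be higher in IA samples.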

GO Analysis Was Applied To Analyze Black Module
We analyzed the black module, which had the strongest relationship with IA, using GO analysis. Several terms were the same as those in the GO results for all DEGs (Fig. 3A). However, some immune-related terms were uniquely enriched, such as regulation of inflammatory response, myeloid cell differentiation, negative regulation of cytokine production, myeloid leukocyte differentiation and macrophage activation. Figure 3B shows the expression of genes in immune-related functions. Almost all genes in immune-related functions were up-regulated, and down-regulated genes were in the minority (Fig. 3C). As shown in Fig. 3D, GO-GSEA analysis was further performed; the black module genes were mainly enriched in secretory granule, secretory vesicle, cytoplasmic vesicle part, intracellular vesicle and cytoplasmic vesicle. Finally, the clustering results are shown in Fig. 3E.

KEGG Analysis Was Applied To Analyze Black Module
We also performed KEGG analysis for the black module. Histogram and bubble charts of pathway enrichment are shown in Fig. 4A. Immune-related pathways such as Human T-cell leukemia virus 1 infection, NOD-like receptor signaling pathway, TNF signaling pathway, Acute myeloid leukemia and Fc epsilon RI signaling pathway were significantly enriched. The overlapping top ten pathways are shown in Fig. 4B. The top ten pathways are also shown in a circle plot (Fig. 4C), together with the expression of genes in these pathways. The heatmap of enriched pathways is shown in Fig. 4D. Almost all genes were up-regulated, whereas FCER1A and HLA-DQA1 were significantly down-regulated. To identify diseases related to the black module, DOSE analysis was conducted. As shown in Fig. 4E, Pneumonia, Lung diseases, Carotid Atherosclerosis, Periodontitis and Juvenile arthritis were significantly enriched. Notably, Pneumonia and Lung diseases were the top two enriched diseases, which may be correlated with IA. In addition, the STRING database [22] was used to identify key genes that may play an important role in IA. Eight genes were screened as key genes, including MMP9, TLR8, TLR2, CYBB, ITGAX, ITGAM and MPO (Fig. 4F).

Hub genes in black module and ROC curves of biomarkers
We identified a total of 621 genes in the black module using WGCNA in R, of which 87 genes (r > 0.6) were significantly related to IA. Hub genes were then identified by degree values (Fig. 5A): TLR4, TP53I3/PIG3, TMTC1, CYSTM1, FAR1, MKNK1, GAS7 and ITGAM. Hub gene expression was compared statistically between hematological patients with IA and normal samples (Fig. 5B), and between hematological patients with IA and non-IA patients (Fig. 5C). TLR4, TP53I3/PIG3 and TMTC1 were identified as potential biomarkers for IA because they were significantly differentially expressed between hematological patients with IA and non-IA patients. To explore whether TLR4, TP53I3/PIG3 and TMTC1 could serve as good biomarkers, ROC curve analysis was performed (Fig. 5D). The area under the curve (AUC) exceeded 0.7 for all three genes. At the optimal cut-off value of TLR4 (8.109), sensitivity was 78.3% and specificity was 72.7%. For TP53I3/PIG3 (cut-off value = 4.470), sensitivity was 91.3% and specificity was 54.5%. For TMTC1 (cut-off value = 2.898), sensitivity and specificity were 78.3% and 81.8%, respectively. We then checked whether the three genes were also highly expressed in hematological patients with bacterial infections. Surprisingly, TLR4 and TP53I3/PIG3 were highly expressed in hematological patients with bacterial infections (Fig. 5E). However, TP53I3/PIG3 expression was markedly higher in hematological patients with bacterial infections than in hematological patients with IA.
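With 23 IA and 11 non-IA samples, the reported percentages correspond to whole counts: 78.3% = 18/23, 91.3% = 21/23, 72.7% = 8/11, 54.5% = 6/11 and 81.8% = 9/11. The sketch below shows how an AUC and an "optimal" cut-off can be obtained, assuming the optimum is the threshold maximizing Youden's J (sensitivity + specificity - 1), which is pROC's default for coords(..., "best"). Unlike pROC, which places candidate thresholds between observed values, this sketch tries the observed values themselves, and the scores and labels are hypothetical.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called IA (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the Mann-Whitney probability that an IA sample outscores a non-IA one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Observed value maximizing Youden's J = sensitivity + specificity - 1."""
    best = None
    for c in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, c)
        if best is None or se + sp - 1 > best[0]:
            best = (se + sp - 1, c, se, sp)
    return best[1], best[2], best[3]  # (cut-off, sensitivity, specificity)


# Hypothetical log2 expression values and labels (1 = IA, 0 = non-IA)
scores = [7.2, 7.9, 8.0, 8.3, 8.6, 9.1, 7.5, 8.4]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
print(auc(scores, labels), youden_cutoff(scores, labels))
```

When several thresholds tie on J, this sketch keeps the lowest one, which favors sensitivity; pROC offers its own tie-handling options.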

Discussion
Aspergillus species are widespread in the environment, most ubiquitously in soil, water and decaying plant matter [26]. More than 250 Aspergillus species have been identified, and infections caused by Aspergillus species are the primary cause of death in patients with chronic granulomatous disease [27,28]. Moreover, mortality is extremely high (about 85%-100%) in immunocompromised patients with central nervous system aspergillosis [29]. Because IA progresses rapidly [30], diagnosis and treatment should be performed as quickly as possible. Although galactomannan (GM), beta-D-glucan (BDG) and polymerase chain reaction (PCR) assays have been widely used to diagnose IA [31,32], clinical diagnosis remains notoriously challenging because of the low sensitivity and specificity of conventional methods and the nonspecific clinical presentation [33].
Aspergillus GM is the most frequently researched and used biomarker for the diagnosis of IA. Previous studies reported that serum GM had a sensitivity of 68-74%, while bronchoalveolar lavage (BAL) GM had a sensitivity of 55-87% [34,35]. However, false positives can be caused by food, medications and other fungi. Moreover, since posaconazole prophylaxis became the standard of care, especially for patients with acute myeloid leukemia undergoing chemotherapy, the sensitivity of GM has been only about 23.07-35.7% [36,37]. Previous studies also reported that serum BDG and BAL BDG had sensitivities of 67-84% and 71%, respectively [38,39]. However, the BDG test is not an efficient tool for the diagnosis of IA [40], and it cannot distinguish Aspergillus from other fungi [41]. As for PCR, disputes remain among experts; indeed, PCR lacks standardization across different tissues, and reliable biomarkers remain very rare.
In this study, we screened for new biomarkers for the diagnosis of IA. First, we compared the gene expression profiles of hematological patients with IA and normal samples. GO and KEGG analyses indicated that several immune-related functions or pathways, such as neutrophil activation, neutrophil activation involved in immune response, neutrophil mediated immunity, Th17 cell differentiation, and Th1 and Th2 cell differentiation, were significantly enriched. GSEA also indicated that the MAPK signaling pathway and NOD-like receptor signaling pathway were up-regulated. WGCNA was then conducted to identify modules related to IA. The black module was identified as the key module, and GO, KEGG and DOSE analyses were performed to explore its gene functions. The GO and KEGG results for the black module were similar to those for the DEGs, suggesting that these immune-related functions or pathways may play a crucial role in the response to Aspergillus infection. In the DOSE analysis, Pneumonia and Lung diseases were significantly enriched, which may be related to Aspergillus infection, because inhalation of Aspergillus spores into the lung is the precondition for infection. We then identified eight hub genes in the black module, validated them by comparing their expression and analyzed them using ROC curves. Finally, TLR4, TP53I3/PIG3 and TMTC1 were identified as new biomarkers for the diagnosis of IA.
TLR4 is generally considered to play an important role in the recognition of bacterial lipopolysaccharides. However, TLR4 has also been reported to recognize pathogenic fungi and activate immune responses [42,43], and a review noted that TLR4 can recognize Aspergillus [44]. In our analysis, TLR4 was significantly overexpressed in hematological patients with IA compared with both non-IA hematological patients and normal samples.
The overexpression of TLR4 may indicate that the immune systems of hematological patients can recognize Aspergillus, even though they cannot clear the infection because of immunodeficiency. Overexpression of TLR4 may also provide a new diagnostic biomarker: at the optimal cut-off value of TLR4, the sensitivity and specificity for distinguishing hematological patients with IA from non-IA patients were 78.3% and 72.7%, respectively. TP53I3/PIG3, an inducer of reactive oxygen species (ROS), can be activated by TP53 and may also participate in apoptosis [45]. TP53I3/PIG3 was also significantly overexpressed in hematological patients with IA. This may provide an indirect clue that TP53I3/PIG3 overexpression induces ROS production against fungal infection. In this study, TP53I3/PIG3 may be a potential diagnostic biomarker, because it diagnosed IA with 91.3% sensitivity and 54.5% specificity, a sensitivity high enough to be useful for diagnosing IA.
As for TMTC1, few studies have been reported and its functions remain obscure. TMTC1 was up-regulated about 2.6-fold in IA patients compared with non-IA hematological patients or normal samples, suggesting that it may play an important role in the response to Aspergillus infection and could be used as a new diagnostic biomarker. In this study, TMTC1 diagnosed IA with 78.3% sensitivity and 81.8% specificity. To determine whether the three biomarkers were highly expressed only in IA, they were examined in hematological patients with bacterial infections. TLR4 and TP53I3/PIG3 were also highly expressed in hematological patients with bacterial infections, while TMTC1 could not be annotated in that microarray dataset. Although TP53I3/PIG3 was highly expressed in bacterial infections, it was up-regulated nearly 4-fold in these patients but only about 2-fold in IA patients. Therefore, patients may be considered potential IA cases when TP53I3/PIG3 is up-regulated no more than about 2-fold.
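The fold changes quoted here relate to differences on the log2 expression scale: fold change = 2 raised to the difference between the case and control log2 means. The sketch below shows that arithmetic and encodes the TP53I3/PIG3 heuristic as read from the text. Both helpers are hypothetical, not a validated rule, and they assume that "up-regulated no more than 2 times" means a fold change above 1 and at most 2 relative to a control group measured on the same platform.

```python
import math


def fold_change(log2_case_mean: float, log2_control_mean: float) -> float:
    """Linear fold change from two group means on the log2 expression scale."""
    return 2.0 ** (log2_case_mean - log2_control_mean)


def pig3_pattern_suggests_ia(fc: float, max_fold: float = 2.0) -> bool:
    """Hypothetical encoding of the heuristic: TP53I3/PIG3 raised, but by no
    more than ~2-fold (bacterial infections showed ~4-fold)."""
    return 1.0 < fc <= max_fold


# A ~2.6-fold change (as reported for TMTC1) is a log2 difference of about 1.38
print(round(math.log2(2.6), 2))
print(pig3_pattern_suggests_ia(fold_change(9.0, 8.0)))  # 2-fold
print(pig3_pattern_suggests_ia(fold_change(10.0, 8.0)))  # 4-fold
```

Because the IA and bacterial-infection data come from different studies, any such threshold would need validation on samples measured together.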

Conclusions
Our study identified three new diagnostic biomarkers: TLR4, TP53I3/PIG3 and TMTC1. More importantly, they may be combined with other biomarkers to increase sensitivity and specificity. In addition, TP53I3/PIG3 may also be used to distinguish bacterial from Aspergillus infections. However, many questions still need to be addressed to detect IA more accurately, and further research on new diagnostic biomarkers is needed.

Availability of data and materials
The data that support the findings of this study are openly available in GEO at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE78000.
Ethics approval and consent to participate
Not applicable