Meta-analysis of transcriptome datasets: An alternative method to study IL-6 regulation in coronavirus disease 2019

Graphical abstract


Introduction
Coronavirus disease 2019 (COVID- 19) is wreaking havoc in healthcare and economic systems worldwide [1]. Analyses of the mode of action of the pathogen that causes COVID-19, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), are crucial to develop therapeutics.
Studies have shown that high ratio of COVID-19 patients develop severe disease and a proportion of such patients die [1,2]. Cytokine release syndrome (CRS; also known as ''cytokine storm") can be triggered by various factors, and occurs when large numbers of white blood cells are activated and release inflammatory cytokines which, in turn, activate yet more white blood cells [3]. CRS has been suggested to be one of the principal causes for severe COVID-19, which leads to increased risks in patients with cardiovascular disease and diabetes mellitus.
Clinical evidence has revealed interleukin (IL)-6, a major chemokine in CRS, to be a critical biomarker and predictor for severe COVID-19 [4][5][6][7]. IL-6 is pleiotropic, and a vital pro-inflammatory factor that regulates hematopoiesis, respiration, as well as the response to infection and cancer [8][9][10][11]. Hence, studying the key regulatory genes downstream of IL-6 is important for analyses of its mechanism of action and the design of drug targets against SARS-CoV-2.
Analysis of the transcriptome is a high-throughput, molecularbiological approach to deconstruct the intricate gene-regulation network. So far, there are studies applied this method to resolve this problem and identified several differential expressed genes in COIVID-19 patients [12][13][14][15][16]. However, to study severe COVID-19 in this way, a series of complex problems must be taken into consideration to save time and ensure accuracy, such as the heterogeneity of clinical samples and the difficulty of obtaining appropriate samples. IL-6 has been shown to be a key biomarker for severe COVID-19, so there is an alternative method to integrate the common factors regulated by IL-6 from other diseases or treatment studies. Table 1 Summary of transcriptome datasets with IL-6 treatment used in this study. The numbers of experiment and control indicate the replicates in each dataset. All treatment used exogenous IL-6 except the * and # marked one. *, IL-6 receptor antibody; #, IL-6 level determined in patients. (-1/2/3 in the same Datasets were different treatment experiments).

Datasets
Experiment Control Sample Treatment Several transcriptome datasets related to IL-6 regulation are available. Study used Illumina HT12.0 microarrays to ascertain the differentially expressed genes (DEGs) in dendritic cells upon treatment with tumor necrosis factor (TNF)-a, IL-1, IL-6, prostaglandin E2 or interferon (IFN)-c [17]. Researcher investigated the transcriptional changes of renal tubular epithelial cells stimulated by the proinflammatory cytokines IL-6, IFN-c or TNF-a. They showed that nuclear factor-kappa B (NF-jB), signal transducer and activator of transcription (STAT)1 and STAT3 were the key genes influencing gene expression during renal aging [18]. A similar study on dendritic cells used transcriptome analyses to validate regulation of STAT3 expression by IL-6 [19]. Study used an antibody against the IL-6 receptor to block the IL-6 pathway and investigated the transcriptome in different tumor cells. They found that the STAT3 signaling pathway was highly active in most cell lines, suggesting that IL-6 regulated signaling [20]. Researchers investigated IL-6 regulation by transcriptome analyses on apoptosis of INA6 cells through the p53/STAT3 pathway [21]. Human monocyte-derived macrophages have also been studied under IL-6 stimulation by genome-wide transcriptome analyses [22]. The transcriptome research mentioned above suggests that IL-6 regulation varies under different conditions. So far, scholars have not integrated those results to find the common genes involved in IL-6 regulation to support relevant application in COVID-19.
Meta-analyses are popular methods to statistically integrate results considering the sample size in each experiment. It has enabled considerable progress to be made in identifying and replicating common genetic variants associated with susceptibility to certain diseases. Researchers undertook a meta-analysis to study infection by the West Nile virus by comparing new listed genes between samples from infected patients and those from healthy controls [23]. A meta-analysis can also be very helpful for studying functional annotation among multi-gene expression. Researchers employed a meta-analysis to study liver and heart tissues by collecting transcriptome data [24]. Researchers utilized an integrative meta-analysis of expression data to identify the various genes expressed [25]. However, they just average the fold change or statistical possibility from multiple experiments without standard processes of meta-analysis.
A standard meta-analysis is a statistical procedure for combining datasets from multiple studies considering the sample size. If the treatment effect (or effect size) is consistent from one study to another study, a meta-analysis can be used to identify this common effect. Due to the differences in the genetic background of human samples, a meta-analysis is required.
We undertook a standard meta-analysis by collecting transcriptome datasets from IL to 6-treatment experiments. We discovered that this method can help to identify IL-6-regulated genes in COVID-19.

Sample collection and ethical approval
Blood samples from patients with severe COVID-19 admitted to Ganzhou Fifth People's Hospital (Chizhu, China) from January 2020 were used for analyses. The diagnostic criteria for the selected cases were based on the COVID-19 Treatment Guide (4th Edition, 2020). Ethical approval of this study was obtained from the Ganzhou Fifth People's Hospital Ethics Committee.

Collection of microarray data
The keyword we used for searching was ''IL-6", and we limited the research type to ''expression profiling by array", in the Gene Expression Omnibus (GEO) database from the National Center for Biotechnology Information (www.ncbi.nlm.gov/geo/). the results provide associated with IL-6 treatment genome-wide expression data are 13 sets of chips.
The inclusion criteria for the dataset were: (i) the dataset must be genome-wide mRNA-expression chip data supported by the literature; (ii) the original or standardized dataset must be considered; (iii) each dataset must include > 3 samples.
After these searches, we obtained the gene-expression microarray data from the GEO database. Simultaneously, we established a corresponding control group, from which four independent microarray datasets with raw data were selected (Table 1).
2.3. Meta-analysis of differential gene expression R language (R Center for Statistical Computing, Vienna, Austria) was used to process the data, conduct statistical analyses, and obtain the path through which the data changed together. A meta-analysis was conducted on the results with the randomeffects model to obtain the combined differential expression of genes, statistical tests, and to input the genes into the Database for Annotation, Visualization and Integrated Discovery (DAVID) website to obtain the possible enriched pathway.

Gene enrichment using the gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) databases and disease analysis
In the present study, DEGs were defined through functional interpretation using DAVID (http://david.abcc.ncifcrf.gov/) [26]. We carried out statistical analyses, with p < 0.05 denoting significance. We obtained a gene-symbol list that was uploaded into DAVID. We also used the GO and KEGG databases. To unify their format, a functional annotation chart was used for uploading and analyses. Interactions between disease and chemicals were analyzed using the Comparative Toxicogenomics Database (CTD) (ctd.mdibl.org/). The co-expression network were constructed by STRING database (https://string-db.org/cgi/) 2.5. RNA extraction and quantitative reverse transcription-polymerase chain reaction (qRT-PCR) An RNA Isolation kit was applied to extract total RNA according to manufacturer instructions. A NanoDrop TM ND-1000 spectrophotometer (Thermo Fisher Scientific, Wilmington, MA, USA) was used to determine the quantity and purity of RNA. Expression of complimentary DNA was measured using a Prime Script TM RT kit (TaKaRa Biotechnology, Otsu, Japan). qPCR was undertaken on a 7900HT fast RT qPCR instrument (Applied Biosystems, Foster City, CA, USA) according to manufacturer instructions. The RNA of actin was used as an internal loading control.

Less overlapping in different COVID-19 transcriptome studies
We collected the transcriptional profiles from five COVID-19 clinical studies which used samples including peripheral blood, PBMC and ronchoalveolar lavage fluid. In totally, <20 COVID-19 patients were enrolled to be studied. Fig. 1A showed that only one gene TNFSF10 is differential expressed genes (DEG) in all five studies. Ten DEG were identified in four studies, the most DEGs are only identified in just one study.

Summary of IL-6 related datasets
After screening the GEO database, 13 transcriptome datasets related to IL-6 treatment in Homo sapiens were collected for further analyses (Table 1). In these datasets: eight involved direct exogenous treatment with IL-6; three involved treatment using monoclonal antibody inhibitors of IL-6 receptors: one was divided into high and low expression of IL-6 according to measurement in peripheral blood; one involved withdrawal of IL-6 stimulation after treatment for a period of time. The cells used in the experiments included HK-2 cells and dendritic cells. The datasets with the largest number of samples had 18 experimental-group samples and 18 control-group samples. The 13 datasets contained 61 experimental samples and 61 control groups. Because of differences in the genes in each dataset, finally we integrated the 9121 genes detected in all 13 datasets for further analyses.
Using the traditional method to analyze transcriptome datasets, we compared the genes with twofold significant difference. We found that only two genes, EGR2 and LYPD1, had a twofold significant difference in at least two datasets ( Table 2). The trend in upregulation and downregulation of LYPD1 expression in the two datasets was contrary. Only EGR2 had the same trend in upregulation and downregulation of expression in the two datasets, and had statistical significance. There were >9000 genes, most of which had significant twofold differences in only one dataset (usually GSE12385).

IL6 regulated genes identified by meta-analysis
In general, 352 genes were identified using the random-effect models with p < 0.05 to denote significance. Among these 352 genes, expression of 237 genes was downregulated, whereas that of 115 genes was upregulated, after IL6 treatment. Table 3 lists the top-10 genes with the most significant upregulation, among which ICAM1, LDLR and SERPINA1 had the most significant upregulation, with a p-value of 3.88 Â 10 -4 , 5.65 Â 10 -4 and 8.02 Â 10 -4 , respectively. Table 3 lists the top-10 genes with the most significant downregulated expression, among which DCHS1 had the greatest (1.08 Â 10 -3 ). Among these genes, the adjusted statistical mean difference was largest for SOCS3 (1.32) ( Table 3).
We also checked the overlapping between our results and COVID-19 clinical studies. As shown in Fig. 1B, 120 DEGs were identified by both our meta-analysis and other COVID-19 clinical studies, and 232 DEGs were only found by our method. A coexpression network was built to show interaction in 352 genes identified from our studies. Fig. 1C displayed that several coexpressions are exist among these genes.

Enrichment analyses using the GO database revealed the biological function of IL-6-regulated genes
To understand the biological functions of DEGs, we utilized enrichment analyses using the GO database to classify genes into three groups: biological process, cell compartment and molecular function.

DEGs with fold-change
According to the results of the meta-analysis of ICAM1 and LDLR (the two genes with the most significant changes in expression), using the random-effects model, the combined average difference (95% confidence interval (CI) was 1.02 (0.43-2.47) and 0.18 (0.02-0.38), respectively. However, in E-GEOD-68941 alone, the mean fold-change of ICAM1 in the experimental group was more than twofold compared with that in the control group, but LDLR did not have a greater-than-twofold change in any dataset (Fig. 5A, B). In total, 315 genes had a greater-than-twofold change in at least one dataset (Fig. 4C). Only 17 genes overlapped with the DEGs in the meta-analysis, including ICAM1, PFKFB3, NAMPT and LIMA1 (Fig. 5C).

ICAM1 and PFKFB3 expression was induced in patients with severe COVID-19
RNA expression of ICAM1 and PFKFB3 in the blood plasma of patients with severe COVID-19 and healthy controls was measured by qRT-PCR. Expression of ICAM1 and PFKFB3 was significantly induced in severe-COVID-19 patients compared with that in healthy controls (Fig. 6).

Chemical-gene interactions interfere with IL-6 regulation
CTD Database could predict the interaction between gene and chemicals which are potential drug for disease. The evidence of interaction between chemicals and genes, such as direct protein binding or inhibition of protein activity, was collected from the CTD. Fourteen genes and 111 chemicals had an interaction. Several chemicals could interact with multiple genes. Copper interacted with five genes: ICAM1, LIMA1, CFB, SOX9 and ZHX3 (Table 4). Resveratrol interacted with NAMPT, SPHK1, PML and SOX9 (Table 4).

Discussion
Aim of this study is to understand the molecular and cellular processes involved in severe COVID-19. Considering the importance of IL-6 in COVID-19, it is rational to identify the genes and pathways regulated by IL-6. Multiple publicly available transcriptome datasets underwent meta-analysis for the transcriptome characteristics of IL-6 treatment. Using this strategy, we identified the DEGs in different types of samples, and described their biological functions, pathways (using GO and KEGG databases) and the diseases they were associated with.
Collection of transcriptome information under IL-6 treatment revealed that virtually no gene had repeatable twofold changes in expression. This finding may have been because different datasets had different treatment methods for IL-6, or because use of different cell types/tissues led to different gene responses. Nevertheless, DEGs with a significant difference in expression were identified by our meta-analysis. The same response was found for genes from different cell types and under different treatment con-  ditions, suggesting that these genes were likely to be key genes for IL-6 regulation in various conditions. Meta-analysis re-evaluated the statistics by considering the sample size of each experiment. Based on this, we found that DEG has more statistical repeatability in different conditions and a large number of cases. Compared with the published clinical transcriptional profiling studies of COVID-19, these studies used only a few cases, and the samples were different tissues. In the results of these studies, the number of repeated DEG was relatively small (Fig. 1). This suggests that our method has potential advantage than clinical research to save time and increase the accuracy.

ICAM1 PFKFB3 Relative Expression Level
Our meta-analysis disclosed some significant differences in gene expression after IL-6 treatment ( Table 3). Many of the genes we found to be upregulated by IL-6 are supported by experimental evidence. Among them, ICAM1 (a well-known gene involved in viral infection and cardiovascular disease) expression has been demonstrated to be consistent with IL-6 expression in mouse macrophages [28,29]. Some treatments, such as Phoenix 20 and soluble matrine 2, can induce the co-expression of ICAM1 and IL-6 [30,31]. In human astrocytes, IL-6 can upregulate SBNO2 expression [32]. IL-6 can induce PFKFB3 expression [33] through the STAT3 signaling pathway. TGFA expression has a positive correlation with IL-6 expression [32]. However, expression of some genes showed the opposite trend in other experiments. In some studies, IL-6 expression was found to be negatively correlated with SER-PINA1 expression [34]. In other studies, expression of IL-6 and SER-PINA1 was increased with disease progression [35,36]. SOCS3 expression has been shown to be negatively correlated with IL-6 expression [37][38][39][40]. A negative correlation between expression of TPCN1 and TOMM20 and IL-6 expression has been documented [40,41]. However, contrary to our findings, studies showed that expression of IL-6 and LIPA decreased in the treatment of some diseases [42]. High expression of IL-6 has been noted in critically ill patients with COVID-19, and IL-6 expression could be used to assess risk. Our meta-analysis showed that expression of ICAM1 and PFKFB3 was increased in COVID-19 patients. These genes could be used as biomarkers in COVID-19.
We found expression of ICAM1 and PFKFB3 to be enriched in virus-related regulatory pathways, the NF-jB signaling pathway, and apoptosis-related pathways (Figs. 2, 3). As an important cytokine in the immune system, IL-6 has been shown to be closely related to the pathways mentioned above in several studies. Through analyses of the KEGG database, we found expression of PRKCA, SCN1B, PLCB4, CAMK2G, CREB3L1 and PPP1CC (which are negatively regulated by IL-6) to be enriched in adrenergic signaling in cardiomyocytes. Only a few studies have found that IL-6 regulates the response to adrenaline stimulation through the mitogen-activated protein kinase (MAPK) pathway in mouse cardiomyocytes [43]. ICAM1 is a critical player in heart diseases. Evidence suggests that inhibition of ICAM1 expression could be novel treatment for hypertrophic heart diseases [44]. In the current outbreak of COVID-19: (i) the CRS caused by IL-6 has been an important cause of death; (ii) patients with heart disease or diabetes mellitus have a high prevalence of death. PFKFB3 has been suggested to have a critical role in diabetes mellitus [45]. We identified PFKFB3 to be enriched in the present study.
We conducted a more extensive study (through the CTD) on the relationship between disease and the genes found to be regulated by IL-6 in our meta-analysis. The IL-6 regulatory genes were closely associated with vascular diseases, cardiovascular diseases, respiratory diseases, and diabetes mellitus. Digestive-system releases, hypersensitivity, male urogenital releases, and nervous-system diseases were also enriched by genes regulated by IL-6, so we could consider a combination of these diseases in the CRS elicited by IL-6. Finally, we collected drug information related to IL-6 regulatory genes from the CTD database and found that resveratrol interacted with four key genes. This finding could aid development of treatment to counteract CRS during COVID-19.