Identifying hub genes associated with clinical characteristics in IgA nephropathy by WGCNA

Background Clinically, IgA nephropathy (IgAN) has a variety of symptoms including paroxysmal gross hematuria, nephritic syndrome and nephrotic syndrome. However, the currently research of IgAN research is not thoroughly enough. This study was aimed at investigating hub genes and genes modulars related to IgAN clinical characteristics. Results We collected microarray data from 66 human samples to construct the gene co-expression network by weighted gene co-expression network analysis and identify the hub genes associated with clinical characteristics. The GO-enriched analysis, pathway analysis and protein-protein interaction network were used to study the specic clinically relevant hub genes. The results showed there were 1470 differentially expressed genes (DEGs) in IgAN glomeruli, of which 48 hub genes associated with blood pressure (Bp) and enriched in ERK1 and ERK2 cascade and Rap1 signaling pathway, 223 hub genes associated with body Mass Index (BMI) and related to organic acid catabolic process and fatty acid degradation pathway, 136 hub genes associated glomerular ltration rate (GFR) enriched in immune response and PI3K-Akt signaling pathway, 82 hub genes associated with proteinuria enriched in extracellular matrix organization and PI3K-Akt signaling pathway. Moreover, there were 480 DEGs IgAN in tubulointerstitium. Among 480 DEGs, 35 hub genes associated with Bp which enriched in positive regulation of apoptotic process, 87 hub genes associated with GFR which related in negative regulation of macromolecule metabolic process and RNA transport, 33 hub genes associated with proteinuria and enriched in regulation of apoptotic process and FoxO signaling pathway. Conclusions We made a preliminary investigation of the transcriptome of IgAN and identied hub genes and pathways closely related to BMI, GFR and proteinuria in IgAN.


Introduction
Immunoglobulin A nephropathy (IgAN), also known as Berger's disease, initially described in 1968 by Berger and now was considered as the most common chronic glomerular disease in the world1. This condition is de ned by the predominant deposition of IgA in glomerular mesangium by immuno uorescence. Main clinical symptoms of IgAN include slight histological changes with glomerular mesangial proliferation, microscopic hematuria, gross hematuria and latent onset. About 30% of patients with IgAN reached end-stage renal disease (ESRD) after 10 ~ 20 years of initial diagnosis2.
Epidemiologically, IgAN may occurs at all age groups, with the peak incidence at 20 ~ 30 years old. Males are more susceptible to IgAN than females with a males to females ratio of 2 ~ 5: 1, however, the effect of age or sex for prognosis of IgAN is still uncertain. Therefore, it is of great signi cance to investigate the molecular mechanisms of IgAN and its relationship with the clinical characteristics, in order to understand the pathogenesis of IgAN. This is crucial for the evaluation of IgAN prognosis and guiding diagnosis and treatment of IgAN.
Weighted Gene Co-expression Network Analysis (WGCNA) 3 is an advanced and comprehensive algorithm for co-expression analysis based on R programming language. Currently, gene co-expression networks are increasingly being used to look for functionally similar genes, using a very straightforward concept: the nodes in the network represent the genes, and the signi cantly co-expressed genes will be clustered together by selecting the appropriate tissue samples. In fact, it is di cult to de ne whether the two nodes can be connected in a network. Traditionally, the binary method was used to construct genes coexpression network by de ning 1 as connected and 0 as unconnected, but it cannot interpret biological signi cance between the "hard" threshold 1 or 0. To solve this problem, WGCNA uses a soft threshold to de ne a weight value to determine the probability of interaction among a set of genes. Using this concept, a weighted co-expression network is formed. In order to use soft threshold, the co-expression network is transformed into a weight connection network, and parameters of soft threshold were set by scale-free topology criterion based on biological and statistical signi cance 3 .
In our study, we used 66 human samples from the European Renal cDNA Bank to construct the gene coexpression of differential expression genes by WGCNA, and identi ed the hub genes associated with clinical characteristics in renal interstitium and glomeruli respectively.

Results
Differentially expressed genes between IgAN and healthy control Series GSE37463, including 25 samples of tubulointerstitium and 27 samples of glomeruli from IgAN patients, 6 samples of tubulointerstitium and 5 samples of glomeruli from Healthy Living Donors (LD), were downloaded from GEO with clinical parameters including age, sex, blood pressure (BP), Body Mass Index (BMI), glomerular ltration rate (GFR), serum creatinine (Scr), and proteinuria (Fig. 1ab) Weighted Gene Co-expression Network Analysis WGCNA 3 is a systematic and robust gene co-expression network algorithm based on R programming language to describe the correlation of gene expression matrix, detecting highly correlated gene modules as well as evaluating the correlation between gene modules and clinical traits. According to the o cial protocol, WGCNA can be brie y divided into the following steps: 1. Construct co-expression network speci ed by its adjacenct matrix which is calculated by soft threshold power; 2. Transform adjacency into topological overlap matrix, using hierarchical clustering and dynamic tree-cutting method to screen module; 3. Select module and gene related to external information. The signi cance of a gene module was measured by combining correlation coe cient and Gene Signi cance (GS) 3 across a module. GS was provided by WGCNA, which was used to incorporate clinical characteristics into the gene coexpression network. The higher the absolute GS value, the more biologically signi cant is the gene. On Page 4/21 this basis, q.Weight < 0.01 (local FDR) 18 , a corrected weighted p-value of the association with a clinical characteristic was considered as the threshold to select hub genes.

Gene Ontology And Kegg Pathways Analysis Of Hub Genes
Firstly, we performed GO enrichment analysis. For glomeruli, hub genes associated with Bp were enriched in ERK1 and ERK2 cascade and cellular response to organic substance; hub genes associated with BMI were enriched in small molecule catabolic process and organic acid catabolic process; hub genes associated with GFR were enriched in immune response and regulation of immune response; and hub genes associated with proteinuria were enriched in extracellular matrix and structure organization (Table 1). For tubulointerstitium, hub genes associated with Bp were enriched in positive regulation of apoptotic process and positive regulation of programmed cell death; hub genes associated with GFR were enriched in negative regulation of macromolecule metabolic process and negative regulation of metabolic process; genes signi cantly associated with proteinuria were enriched in regulation of apoptotic process and regulation of programmed cell death ( Table 2).  Secondly, KEGG pathways were analyzed on the same hub gene. For glomeruli, hub genes associated with Bp were enriched in central carbon metabolism in cancer and Rap1 signaling pathway; hub genes associated with BMI were enriched in type I diabetes mellitus and fatty acid degradation; hub genes associated with GFR were enriched in PI3K-Akt signaling pathway and endocytosis; genes signi cantly associated with proteinuria were enriched in PI3K-Akt signaling pathway and ECM-receptor interaction (Table 1). For tubulointerstitium, hub genes associated with Bp were enriched in PI3K-Akt signaling pathway metabolic pathways (P > 0.05); hub genes associated with GFR were enriched in RNA transport and protein processing in endoplasmic reticulum; hub genes associated with proteinuria were enriched in FoxO signaling pathway and hepatitis B (Table 2).

Protein-protein Interaction Network Analysis Of Hub Genes
In IgAN glomeruli, the PPI enrichment P-value of genes set associated with BMI was 1.78e − 08 , which those genes are at least partially biologically connected (Fig. 5). Moreover, the network PPI enrichment Pvalue of genes set associated with GFR and proteinuria is 0 (Fig. 6a). Furthermore, the P-value of genes set associated with BMI, GFR and proteinuria in tubulointerstitium was 7.33e − 5 (Fig. 6b). In addition, in the PPI network, genes in one module or related to same clinical parameter are more likely to have interaction or be in a cluster, indicating that the WGCNA was successfully performed.

Discussion
A large number of studies have con rmed that BMI, hypertension, proteinuria and renal function contribute to the progression of IgAN, however, the genetic mechanism of these clinical characteristics leading to IgAN progression is not clear. The relationship and association between factors and IgAN also require more research and evidence of biological processes at the molecular level. Diseases involve thousands of gene expression changes with a huge and complex gene regulated network, which means research for a single gene is super cial and insu cient, it is hard to explain the mechanism of a disease. WGCNA is an advanced, successful and comprehensive algorithm for co-expression analysis, and it was not only used to construct gene co-expression network and screen gene modules. It served as a powerful tool to identify hub genes associated external information, clinical characteristics especially, and help researchers to understand the mechanism of the disease, providing a theoretical basis for the diagnosis and treatment of the disease.
Genes sets, which associated with GFR in glomeruli, enriched in immune response, cellular response to chemical stimulus and PI3K-Akt signaling pathway, indicated that immune response and IgA stimulus were dominant in impaction of glomerular ltration. It is known that glycosylated IgA has a transferrin receptor on the surface of mesangial cells 4 and abnormal glycosylated IgA immune complexes are speci cally recognized and deposited in the mesangium, causing proin ammatory cytokines and angiotensin to be released 5 . Tumor necrosis factor alpha, derived from IgAN patients podocytes cells autocrine synthesis, caused TNF receptor 1, TNF receptor 2 and IL-6 to be up-regulated. Elevated expression of TNF receptor 1 leads to podocyte apoptosis and up-regulation of TNF receptor 2 expression, leading to chronic in ammation 6 ; PI3K-Akt is an important intracellular pathway involved in cell metabolism, apoptosis, proliferation and differentiation 7,8 . A study by Cox et al. reported that PI3K-Akt signaling pathway was hyperactive in IgAN patients and played an important role in IgAN 9 . Therefore, based on the results of WGCNA, we believed that PI3K-Akt signaling pathway speci city impacts the renal function in IgAN. For proteinuria, hub genes in glomeruli were enriched in the extracellular matrix organization, extracellular structure organization and PI3K-Akt signaling pathway, this means that these genes may play an important role in changing the extracellular matrix and potentially leading to proteinuria.
In tubulointerstitium group, there were 480 DEGs between IgAN and health control group. Among 480 DEGs, 6 hub genes associated with age, 15 hub genesassociated with sex, 35 hub genes associated with Bp enriched in positive regulation of apoptotic process, cellular response to nutrient levels and regeneration. Moreover, 87 hub genes associated with GFR in tubulointerstitium enriched in negative regulation of macromolecule metabolic process, negative regulation of metabolic process and RNA transport, and 33 hub genes associated with proteinuria enriched in the regulation of apoptotic process, regulation of programmed cell death and FoxO signaling pathway. Proteinuria is closely associated with poor cardiovascular outcomes and progression of ESRD in patients with CKD 10,11 . It is worth noting that since the expression pattern of genes is very similar, WGCNA only screens one turquoise module in the IgAN tubulointerstitium group and genes that cannot be clustered into one of the modules are assigned to the grey module which represents background genes outside of the modules. Thus, genes in the grey module could possibly be related to GFR or Scr but not belong to WGCNA modules.

Conclusions
In conclusion, we made a preliminary investigation of the molecular mechanisms of the relationship between IgAN and clinical characteristics. We identi ed hub genes and pathways closely related to BMI, GFR and proteinuria in IgAN through a series of bioinformatics analysis, but it still needs to further explored and validated. Identi cation Of Differentially Expressed Genes R (Version 3.3.4) programming language was applied to data quality assessment, normalization, and detection of DEGs. Based on "affyPLM" 15 package, we conducted the quality assessment on microarray data using RNA degradation curve and Normalized Unscaled Standard Errors 16 . Then, we used RMA (Robust Multi-array Averaging) 17 to preprocess the raw data and utilized the limma (Linear Models for Microarray Analysis) package to obtain genes differential expression. The false FDR for fold change between IgAN and healthy control was calculated by moderated t-test, and log2 fold change ≥ 0.6 or < -0.6 with FDR < 0.05 were considered as the threshold for select DEGs selection. Finally, probes were annotated by o cial annotations le (Affymetrix HG-U133A Annotations, Release 35).

Gene Enrichment Analysis And Pathway Analysis
Biological signi cance of hub genes was explored by Gene ontology (GO) 19