Coexpression network analysis identified lncRNAs-mRNAs with potential relevance in African ancestry prostate cancer

Aim: This study aims to investigate similarities and differences using lncRNA and mRNA coexpression network analysis in African ancestry (AA) and European ancestry (EA) among prostate cancer (PCa) patients. Methods: We performed weighted gene coexpression network analysis of the expression from 49 of AA and 49 of EA to identify lncRNAs-mRNAs. Results: 27 lncRNAs and 36 mRNAs were highly expressed in patients of AA. Two mRNAs and their antisense lncRNAs were expressed. Additionally, seven mRNAs were DE or coexpressed and had an impact on survival. Conclusion: We present a list of lncRNAs and mRNAs that were DE and coexpressed when comparing patients of AA and EA, and these data are a resource for future studies to understand the role of lncRNAs.

The products of the human transcriptome are coding RNAs or noncoding RNAs (ncRNAs). Coding RNAs are templates used to organize the amino acids that constitute proteins during the translation process, and their structure and function are well known. Furthermore, based on their sizes, ncRNAs are classified as small ncRNAs (ncRNAs; <200 bp) and long ncRNAs (lncRNAs; >200 bp). Small ncRNAs are 20 to 30 bp in length, and based on structure/functionality, they are subdivided into cis regulatory (internal ribosome entry site [IRES], Leader, Riboswitch), gene (tRNA, rRNA, snRNA/snoRNA, miRNAs, among others) and intron sequences (Groups I and II) [9,10]. In medicine, miRNAs are of great interest because their expression constitutes a molecular fingerprint of different types of tumor cells, and they are possible candidates for treatment targets. miRNAs also play roles in gene expression by activating transcription [11] or inhibiting translation [12].
On the other hand, lncRNAs are classified as circular (circARN) or linear according to their structure. circRNAs are a type of single-stranded RNA that are generated by the formation of a loop in the primary pre-mRNA transcript due to back-splicing of an exon and/or intron, and the circle is closed by a covalent bond that makes it less susceptible to exonucleases. circRNAs regulate the expression of target genes during transcription and splicing as inhibitors of miRNAs, and circRNAs modulate the function of proteins by acting as decoys and encoding functional peptides [13]. In addition, linear lncRNAs are classified based on genomic localization relative to protein-encoding genes. These groups are sense, antisense, bidirectional, intronic, intergenic 3 overlapping, macro ncRNA, process transcripts and enhancer lncRNAs [9,10]. LncRNAs are involved in different cellular processes, such as maintaining nuclear structure integrity, positively or negatively regulating gene expression by recruiting transcription factors and/or chromatin remodelling complexes to DNA targets, regulating RNA splicing, acting as decoys to sequester RNA-binding proteins (RBPs) and sequestrating miRNAs (miRNA sponges) [14,15].
Dysregulation of miRNAs and lncRNAs is widely observed in the pathogenesis of different cancer types, and these RNAs are mainly involved in tumor growth, invasion and metastasis [16][17][18]. LncRNAs are easily obtained for study from tissues and body fluids due to their resistance to degradation. These two types of ncRNAs are good candidate biomarkers with effects on the clinical diagnosis, prognosis and treatment of cancer. In recent years, numerous preclinical and early clinical studies have explored lncRNAs as new therapeutics [19].
In PCa, numerous miRNAs are involved in different processes that promote prostate oncogenesis, such as the cell cycle, apoptosis, epithelial-mesenchymal transition (EMT), DNA replication/repair, migration, androgen receptor suppression, metastasis and treatment resistance [20][21][22]. lncRNAs have also been shown to be dysregulated and associated with the pathogenesis and progression of PCa [14,18,23]. In a recent review published on the past 30 years, Ramnarine et al. [23] found 109 lncRNAs associated with PCa. These lncRNAs are involved in the migration and proliferation of tumoral prostate cells in different stages of disease, such as localized disease, metastatic disease or castration-resistant PCa. The molecular mechanisms targeted by these lncRNAs in PCa pathogenesis were grouped into transcriptional regulation (epigenetic modification, transcriptional activation, transcriptional inhibition, RNA decoy), post-transcriptional regulation (regulation of antisense RNA, translational regulation, mRNA stabilization, miRNA host, miRNA sponge) and posttranslational regulation (subcellular structure, protein transport, protein complex mediation). In PCa, several lncRNAs have been proposed as biomarkers because they favor both carcinogenesis and tumoral progression [24]. In the present study, a systems biology analysis was conducted to identify similarities and differences using lncRNA and mRNA coexpression network analysis.

Data collection
The data provided by TCGA and processed for reuse by recount2 [25] were used as a primary source for data analysis. Gene counts for all prostate cancer and normal prostate tissues were downloaded. Samples annotated as having African and European ancestry by Yuan et al. [8] were selected. Finally, we kept all tumor samples from patients of AA, and we selected the same number of samples from patients of EA. These samples were randomly matched 1:1 (AA vs non-AA), taking into account the Gleason score obtained from the clinical information registered in The Cancer Genome Atlas Program (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga) as a parameter.

Data preprocessing
The transcriptomic data (mRNA and lncRNA) available in recount2 [25] were already processed with Rail-RNA as described in recount2 [25]. We performed quality control of all datasets to identify that the samples we selected had a similar distribution after and before normalization by the use of density plots and box plots. Additionally, we plotted a multidimensional plot, and we performed a visual evaluation of the localization of the samples in the first four dimensions.

Identification of differentially expressed RNAs
The gene count data of each kind of RNA in patients of AA and EA were analyzed independently on the Galaxy web platform [26]. Data were normalized by library sizes using the trimmed mean of M-values (TMM) method. Genes without more than 1 count per million mapped reads (CPM) in at least 50% of the samples were considered not expressed and were filtered out. The limma-voom method was used to identify the differentially expressed (DE) genes. The primary factor was normal versus tumor, and the second factor was the Gleason score of the samples. An mRNA or lncRNA was selected as DE if the linear fold change was higher than the absolute value of 2 and the false discovery rate (FDR) was lower than 0.01.
Functional enrichment analysis of mRNAs with dysregulated expression DAVID (https://david.ncif crf.gov/) was used to identify the functions of the selected differentially expressed genes (DEGs). The upregulated and downregulated genes in each list of DEGs were analyzed separately. The Kyoto Encyclopedia of Genes and Genomes (KEGG) was chosen for overrepresentation analysis. Pathways with p-values lower than 0.05 were selected as enriched.

Weighted gene coexpression networks analysis
The total levels of differentially expressed mRNAs and lncRNAs in tumor samples from patients of AA compared with normal samples were added to one list. Two coexpression networks were developed using this gene list. The first list was generated for patients of AA, and the second list was generated for patients of EA.
To build the coexpression networks, we followed a typical methodology. First, the similarity matrix was calculated by identifying the Pearson correlation coefficients of the expression levels for the samples based on all possible mRNA-mRNA, lncRNA-lncRNA and lncRNA-mRNA combinations. Then, the similarity threshold was calculated with the adjacency function, which was established according to the unique characteristics of each similarity matrix [27]. The method developed by Elo was used to select the threshold [28]. This method compared the tau values for the network grouping coefficient (Co) with those expected for a random network (Cr). It uses the clustering coefficient of the real graph in comparison to a rando graph. The threshold for significant similarities is chosen so that the obtained real graph is scale-free. Finally, the adjacency matrix (2 × 2) of the network was established and allowed the representation of binary relationships. In this case, a pair of genes that exhibit coordinated gene-expression activity (coexpression) is indicated by (1); otherwise, a (0) is reported. All the weighted gene coexpression networks analysis (WGCNA) analyses were performed in an R unique environment using statistical functions (https://www.r-project.org/).

Analysis of WGCNA
The WGCNA was plotted through network analysis using Cytoscape [29]. The identification of the differences between the AA and EA networks was calculated using the advanced network merge interface function implemented in Cytoscape.

Survival analyses of differentially expressed genes among AA & EA & genes coexpressed with lncRNAs
Gene Expression Profiling Interactive Analysis (GEPIA;) [30] was used to calculate disease-free survival and overall survival between the DEGs coexpressed with lncRNAs. The lower and upper 50% of gene-expression levels were set as the standard for analysis. The confidence interval was 95%. High-and low-expression genes are represented in red and blue, respectively. Log-rank test results with p < 0.05 were regarded as statistically significant.

Patient characteristics
In total, 98 PCa tissues, 49 in each group (AA and EA), and 44 normal prostate tissues (7 AA patient tissues and 37 EA patient tissues) were included. The mean age of the AA patients was 56.9 years, the mean age of the EA patients 58.8 years and the mean age of the patients from whom normal prostate tissue collected was 59.3 years. Each group included the same number of patients according to the Gleason score (GS). Nine patients had GS  3 + 3, 22 had GS 3 + 4, 11 had GS 4 + 3, 2 had GS 4 + 4, 2 had GS 4 + 5, 2 had GS 5 + 4, and 1 had GS 5 + 5. In the AA group, two patients were t2a, two were t2b, 22 were t2c, 14 were t3a, and 9 were t3b, while in the EA group, two patients were t2a, 29 were t2c, 11 were t3a, 5 were t3b, and two had no information. In the AA group, 7 were n1, 36 were n0, and 6 had no information; in the EA group, 5 were n1, 29 were n0, and 15 had no information. Only two patients died in each group. Supplementary   Most DE mRNAs and lncRNAs were shared between prostate tumors from patients of AA and EA. For example, of 702 upregulated mRNAs in AA patients, 489 (69.7%) were shared with the upregulated mRNAs in EA patients. A similar result was obtained when the lists of lncRNAs were compared, following the same example: of the 708 DE lncRNAs in the samples from patients of AA, 472 (67%) of the genes were DE in common with the samples from patients of EA. Interestingly, the overlap between the lists of downregulated mRNAs and lncRNAs was even higher than that of upregulated genes. A total of 87.6% of the mRNAs and 74.2% of the lncRNAs upregulated in patients of AA were simultaneously upregulated in patients of EA.
To clarify the potential transcriptomic differences between prostate cancer samples from patients of AA and EA, we performed a direct comparison of those tumor samples. Table 2     Weight coexpression network analysis of DE mRNAs & lncRNAs in AA prostate cancer patient samples Taking all of the previous results together, it was evident that there were few important differences in the transcriptomic deregulation of AA and EA. These differences are not sufficient to identify key molecular pathways related to the special clinical characteristics of AA prostate cancer patients.
We hypothesize that the combination of the lncRNA and mRNA expression profiles could be specific for each clinical group and that the differences could help to explain part of the differential clinical behaviour in prostate cancer patients of AA. Then, we performed a coexpression network analysis to identify critical networks of coexpressed lncRNAs-mRNAs in tumor samples from patients of AA.
First, we selected the DE mRNAs and lncRNAs in the comparison of tumor samples from patients of AA versus normal prostate samples. Second, using only the tumoral expression profiles, we calculated two coexpression networks, one for the 49 tumor samples from patients of AA and the other for the 49 tumor samples from patients of EA.
The AA network contained 224 nodes, 65 of them were lncRNA and 159 were mRNA. There were 68 pairs, six trios, two quartets, one quintet and four networks of more than eight nodes of co-expressed genes. Figure 3 shows the coexpression networks with more than five nodes as calculated with AA samples. While the EA network contained 203 nodes, 54 of them were lncRNA and 149 were mRNA. There were 52 pairs, six trios, two quartets, one quintet and three networks of more than eight nodes of co-expressed genes (Supplementary Figure 1).
We calculated the differences in the former network of coexpression of patients of AA with respect to the network of coexpression of patients with EA with the same mRNAs and lncRNAs but with the tumor samples from patients of EA. The results of the network difference analysis indicated the specific relations of PCa patients of AA. The ). In the quintet network one lncRNA was co-expressed. One quartets network was with only lncRNA co-expressed. Among the five trios networks, two had two lncRNA co-expressed and the other only one. Small networks with fewer than two nodes are not shown. instances in which patients of AA showed reduced expression compared with patients of EA included 63 nodes, 27 of which were lncRNAs and 36 of which were mRNAs. There were 29 pairs, one trio and four subnetworks with at least six nodes (Figure 4). 24 mRNAs were coexpressed with at least one lncRNAs. Seven mRNAs (CENPN, DNAAF3, DNASE1L2, PCDHB2, PCSK6, RTN4RL2, THBS4) were upregulated. THBSA4 and PCSK6 were coexpressed with their antisense lncRNAs, ENSG00000249825 and ENSG00000259172, respectively ( Table 3).
Assessment of the prognostic significance of differentially expressed genes among patients of AA & EA & genes coexpressed with lncRNAs Using GEPIA [30], which employed the PRAD dataset, the DE genes between tumors from patients of AA versus tumors from patients of EA (Table 2)

Discussion
In the present study, we observed subtle differences among PCa patients of AA and EA when we compared transcriptomic dysregulation focused on individual genes. Few DE mRNAs and lncRNAs were obtained in the comparison between the two populations. In the pathway enrichment analysis, some differences between the groups were observed. Two pathways exclusive to patients of AA were the Wnt and MAPK signaling pathways. Both pathways have been associated with PCa oncogenesis due to cellular activities, including cell proliferation, differentiation, survival, death and transformation [31,32]. Interestingly, these two pathways are involved in the immune-related pathway [31,32] and regulation of the AR (androgen receptor) pathway [33,34]. Yuan et al. [8] observed high expression of DE mRNA-associated immune-related pathways in patients of AA and found a strong regulatory role for the AR signaling pathway in DE-lncRNAs. Variants of genes of the enzymes involved in both biosynthesis and metabolism of androgen have been observed in patients of AA [35]. Among the DE genes obtained by comparing patients of AA and EA, two upregulated genes (FOXH1 and NPIPB15) in patients of AA associated with poor disease-free survival were associated with the AR signaling pathway. FOXH1 is a member of the forkhead-box (FOX) gene family of transcription factors and participates in mediating transforming growth factor-β/activin signaling through its interaction with the Smad2-Smad4 complex [36]. FOXH1 may act as a corepressor of AR [36]. On the other hand, NPIPB15 is a member of the nuclear pore complex-interacting protein family; however, its role in PCa is unknown. The nuclear pore complex promotes the progression of PCA by increasing POM121-driven E2F1, MYC and AR nuclear import [37].
We found 1563 DE lncRNAs expressed between tumoral tissue versus normal tissue in patients of AA. In the literature, the majority of DE lncRNAs are unstudied. Yuan et al. [8] also observed 1868 (11.7%) DE lncRNAs, and the majority were not studied. A total of 45.9% were classified as intergenic lncRNAs, which suggests that they are transcribed from protein-coding independent transcriptional units. Additionally, a high proportion of DE lncRNAs were observed in the nucleus, which indicates that they are probably involved in gene regulation at the transcriptional level.
Due to a few differences between patients of AA and EA, we constructed two coexpression networks oriented to identify critical networks of coexpressed lncRNAs-mRNAs in tumor samples from patients of AA. The results in the first network of DE RNAs in patients of AA showed two tightly connected networks of the same family of genes, TBC1 domain family member and 3 (TBC1D3) and protocadherin alpha (PCDHA). Members of the TBC1D3 family, such as TBC1D3A or PRC7 (prostate cancer gene 17 protein), have been associated with PCa [38]. PCDHA belongs to a subset of a group of cell adhesion molecules, a cadherin superfamily. Yang et al. [39] observed that PCDHA is involved in the EMT axis associated with miR-193a-5p and with p53/RBM25-circAMOTL1L. These interesting networks are also present in EA networks (Supplementary Figure 1).
In the second network, where only the coexpression of lncRNAs-mRNAs in patients of AA to a lesser extent than in patients of EA was analyzed, it was observed that mRNAs and lncRNAs were still preserved. Interestingly, two of the upregulated genes (THBSA4 and PCSK6) were coexpressed with their antisense lncRNAs. THBSA4 (thrombospondin 4) was coexpressed with its antisense lncRNA (THBS4-AS1). Liu et al. observed high levels of THBSA4 in PCA and mainly in patients with GS >7. Cell lines (PC-3 and DU145) were observed to demonstrate reciprocal regulation of the long noncoding RNAs THBS4-003 and THBS4 to control both migration and invasion processes [40]. A recent study demonstrated that the overexpression of THBSA4 promoted self-renewal and proliferation, inhibited the apoptosis of PCa stem cells and enhanced in vivo tumorigenicity, which was achieved by activating the PI3K/Akt pathway [41]. THBSA4 also plays a role in the carcinogenesis of gastric cancer [42] and hepatocellular carcinoma [43]. PCSK6 is another overexpressed gene and is coexpressed with its antisense lncRNA (AC023024.1). PCSK6 is a member of the proprotein convertase family that is involved in PCa carcinogenesis [44] and was recently associated with the proliferation, invasion and migration of breast cancer cells by disturbing cell cycle arrest via the mitogen-activated protein kinase pathway [45]. A coexpression network analysis described the TGFB1I1 mRNA which is a coactivator of the AR that is associated with PCa cell differentiation, and decreased gene expression was observed to be associated with tumor progression [46].
Interestingly, given its high expression in AA and clinical impact, the DNASEL1 gene was noted. This is a member of the DNase 1-like endonuclease family and possesses keratin cell-specific endonuclease activity with an important role in DNA degradation [47]. In a recent study, it was observed that DNASEL1 is upregulated in breast carcinoma, which impacts the EMT process and has relevance in poor overall survival [48].

Conclusion
In the present study, we observed that populations of AA and EA have many similarities in terms of DE lncRNAs and mRNAs and pathway enrichment. We provide a list of DE and coexpressed lncRNAs and mRNAs in both the AA and EA groups. The pathway enrichment analysis revealed two interesting signaling pathways (MAPK and Wnt) that are involved in oncogenesis and are exclusively involved in patients of AA. In the network analysis, lncRNAs and mRNAs were highly coexpressed in AA, and these RNAs have relevance in different pathways related to PCa oncogenesis and have been associated with survival. In summary, this study provides a resource for future investigation of the role of lncRNAs and coexpressed genes in patients of AA and EA.

Summary points
• We observed subtle differences among prostate cancer (PCa) patients of African ancestry (AA) and European ancestry (EA) when we compared transcriptomic dysregulation. • The pathway enrichment analysis revealed that MAPK and Wnt are involved in oncogenesis and are exclusively involved in patients of AA. • Two tightly connected networks of the same family of genes (TBC1 domain family member and 3 (TBC1D3) and protocadherin alpha) were expressed in AA. Members of these families have been associated with PCa. • 24 mRNAs were coexpressed with at least one lncRNAs. Seven mRNAs (CENPN, DNAAF3, DNASE1L2, PCDHB2, PCSK6, RTN4RL2, THBS4) were upregulated. THBSA4 and PCSK6 were coexpressed with their antisense lncRNAs, ENSG00000249825 and ENSG00000259172, respectively. • Genes differentially expressed or coexpressed as FOXH1, NTM, NPIPB15, DNASE1L2 were associated with poor disease-free survival. And low expression of RNASE4, NUDT7 and SORBS1 served as protective factors for disease-free survival. • Further studies are required to evaluate a possible role of these mRNA-lncRNA in PCa.

Supplementary data
To view the supplementary data that accompany this paper please visit the journal website at: