Genetic networks in Parkinson’s and Alzheimer’s disease

Parkinson’s disease (PD) and Alzheimer’s disease (AD) are the most common neurodegenerative diseases and there is increasing evidence that they share common physiological and pathological links. Here we have conducted the largest network analysis of PD and AD based on their gene expressions in blood to date. We identified modules that were not preserved between disease and healthy control (HC) networks, and important hub genes and transcription factors (TFs) in these modules. We highlighted that the PD module not preserved in HCs was associated with insulin resistance, and HDAC6 was identified as a hub gene in this module which may have the role of influencing tau phosphorylation and autophagic flux in neurodegenerative disease. The AD module associated with regulation of lipolysis in adipocytes and neuroactive ligand-receptor interaction was not preserved in healthy and mild cognitive impairment networks and the key hubs TRPC5 and BRAP identified as potential targets for therapeutic treatments of AD. Our study demonstrated that PD and AD share common disrupted genetics and identified novel pathways, hub genes and TFs that may be new areas for mechanistic study and important targets in both diseases.

genetic variants that increase the risk of both AD and PD, for example the strong risk factor for AD, APOE4, has been shown to be related to cognitive decline in PD [13].
Gene co-expression relationships contain a wealth of information that univariate methods like differential expression analysis cannot detect [14]. Weighted gene co-expression network analysis (WGCNA) is a popular tool used in systems biology to construct coexpression gene networks which can detect gene modules as well as identify key genes and hubs within these modules [15]. WGCNA has been used to find strong evidence for mitochondrial dysfunction and chronic low grade innate immune response in AD [16]. In addition, Chatterjee et al. [17] identified 11 hub genes by using WGCNA in frontal cortex and SN brain samples of PD patients.
To date there have been no studies investigating PD and AD using gene expression network simultaneously to reveal potential shared biological process and pathology. In this study we analyzed gene co-expression networks based on PD and AD blood microarray data and identified common genetic networks between both diseases. See our analysis workflow illustrated in Figure 1. Compared to brain tissues, blood tissue is easier to access from patients with ND, and publicly available AD and PD blood datasets have a large enough sample size to construct reliable and robust networks. Our network analysis expands on standard WGCNA and hub detection approach which can robustly find key processes and genes that are associated with both PD and AD.

Gene co-expression network construction
After quality control, we obtained 19176 genes in the PD dataset which included 204 PD and 230 healthy control (HC) samples, meanwhile 13661 genes were Figure 1. Workflow of our analysis. Filtered and normalized microarray data were separated into five datasets: AD disease (ADAD), healthy control (ADHC) and MCI (ADMCI) data from the AD dataset, and the PD disease (PDPD) and healthy control (PDHC) data from the PD dataset. On each dataset gene co-expression networks analysis was performed using the WGCNA R package [15]. An additional k-means correction step to reduce number of misplaced genes [70] was then performed and module preservation between cohorts within AD and PD was found using NetRep (v.1.2.1) [18]. The pathways associated with non-preserved modules were then found using the Enrichr web tool [19,20] and hub genes and transcription factors in these non-preserved modules identified. The SCAN (single nucleotide polymorphism (SNP) and Copy number ANnotation) database) database [25] was used to find SNPs associated with the genes in each non-preserved module and these SNPs used to search the MiRSNP database to find the SNPs at 3' UTR of disease associated miRNAs. AGING obtained in the AD dataset which included 245 AD, 142 mild cognitive impairment (MCI) and 182 HC samples. We applied WGCNA [15] to build our networks and selected the soft threshold power to define the adjacency matrix of each dataset based on approximate scale-free topology R 2 of 0.85 ( Figure 2). In this method, highly correlated nodes are placed into a single module or cluster which are thought to be regulated by similar transcription factors (TFs) and represent certain biological processes. These networks were constructed for the AD disease (ADAD), healthy control (ADHC) and MCI (ADMCI) data from the AD dataset, and the PD disease (PDPD) and healthy control (PDHC) data from the PD dataset separately. We discovered 27, 54, 29, 32 and 58 modules in PDPD, PDHC, ADAD, ADMCI, ADHC networks respectively.

PD blood and brain DEG overlap
We identified 360 DEGs in the PD blood dataset (nominal Pvalue < 0.01, see Supplementary Table 2) and compared these DEGs to the DEGs identified in our recent meta-analysis study about PD in substantia nigra region [11]. An overlap of 21 genes were found including LRRN3, BASP1 and TPM3. However, a Fisher Exact test was not significant for the overlap showing that this was likely by chance (OR = 1.08, 95% CI 0.65~1.72, Pvalue = 0.72, Fisher Exact test).

Identification of non-preserved modules
In our network analysis, if the relationships and correlation structure between nodes composing each module were not replicated, then they were considered non-preserved. In the case of healthy and disease networks, non-preserved modules suggested the expression pattern and regulation of the genes in these modules vary between disease and healthy conditions. On the other hand, modules preserved between disease and healthy networks represented processes that are not affected by disease status. Here we focused on nonpreserved modules which may help to reveal the disease mechanism. The R package NetRep (v1.2.1) was used to identify these non-preserved modules [18]. Table 1 shows the non-preserved modules between PDHC and PDPD networks and the biological processes associated with these modules. Three of the 54 modules in the PDPD network were not preserved in PDHC network, and one of those 27 PDHC modules was not preserved in the PDPD network. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms that were significantly enriched within non-preserved modules (Pvalue < 0.01) were found using the Enrichr web tool [19,20]. The PDPD salmon module was found to be associated with insulin signaling (KEGG pathway, Pvalue = 0.0030, 7/108 overlap). The PDPD darkseagreen4 module was found to be associated with antigen processing and presentation (KEGG pathway, Pvalue = 5.38E-16, overlap = 14/77) and natural killer cell mediated cytotoxicity (KEGG pathway, Pvalue = 2.94E-15, overlap = 10/41). Table 2 shows the non-preserved modules between the ADHC, ADMCI and ADAD networks. Of the 29 ADAD modules, one was not preserved in both ADHC and ADMCI networks. In addition, one of the 32 ADMCI modules was not preserved in ADAD and ADHC networks. Moreover, three of the 58 ADHC modules were not preserved in both ADAD and ADMCI networks and one non-preserved in ADMCI networks. The ADAD blue module was not preserved in ADHC and ADMCI networks and was associated with regulation of lipolysis in adipocytes (KEGG pathway, Pvalue = 6.24E-4, overlap = 10/55) and neuroactive ligand-receptor interaction (KEGG pathway, Pvalue = 0.005070, overlap = 30/338). The ADHC darkolivegreen module was associated with sensory perception (GO biological process, Pvalue = 1.83E-4, overlap = 8/55).

Identifying hub genes
Hubs are genes that are highly interconnected or important within a module and likely have functional significance [21]. Hubs have a role in maintaining the structure of the gene network of the module and the biological processes associated with the module. In our study, hub genes were identified using five approaches: Betweenness centrality (BC), PageRank, module membership (MM), closeness centrality and Kleinberg's centrality. Any gene with a Pvalue < 0.01 in any hub detection method was considered as a significant hub gene. Using multiple methods for identifying hubs allowed for hub identification that may otherwise have been missed by use of just one method. To demonstrate hub score distribution, Supplementary Figure 2A shows an example of betweenness hub score distribution across all genes in the PDPD darkseagreen4 module which was non-preserved in PDHC network and the (Supplementary Figure 2B) distribution of the significant GINS2 (Pvalue = 0.005) BC scores across the 1000 iterations of the hub permutation test.
We identified 34 hubs in modules not preserved between the PDPD and PDHC networks (Supplementary Table 3) and 92 hubs in the nonpreserved modules between ADAD, ADMCI and ADHC networks (Supplementary Table 4). It was expected that larger modules may have more hubs than smaller ones, for example the PDHC purple module contained 606 genes, of which 17 were found to be hubs (e.g. FAM110C, PAK4, NEB), and the smaller salmon PDPD module contained 351 genes, of which only 10 were hubs (e.g. HDAC6, TYSND1). The PD salmon module was associated with insulin resistance and was not preserved in PDHC network shown in Figure 3A, where hub genes are highlighted. Interestingly, it includes HDAC6 which has been shown to influence tau phosphorylation and autophagic flux in AD [22]. The blue AD module which was associated with regulation of lipolysis in adipocytes and neuroactive ligandreceptor interaction and was not preserved in ADMCI and ADHC networks ( Figure 3B) which included TRPC5 and BRAP as hub genes. Networks were visualized in Gephi [23].

Identifying transcription factors (TFs)
Genes that are clustered together by WGNCA likely are regulated in a similar way, thus we intended to identify which TFs potentially regulate the gene expression of each module. The TFs that potentially regulate each non-preserved module (Pvalue <0.01) were identified by using Encyclopedia of DNA Elements (ENCODE) and chromatin immunoprecipitation (ChIP) Enrichment Analysis (ChEA) consensus TFs from ChIP-X by using the Enrichr web tool [19,20]. We found a total of four TFs that regulated at least one of the three PDPD modules, including FOXM1 which regulated 6 genes in the salmon modules (Pvalue = 0.0066) and 9 in the AGING  Table 5 shows the significant TFs found in modules that were not preserved between PD and HC networks.

Single nucleotide polymorphism (SNP) analysis of significant WGCNA modules
As non-preserved modules contain genes which play a role in processes that were associated with AD or PD, they may have been more likely to contain disease associated variants than preserved modules. We searched each non-preserved PD module for known Genome Wide Association Studies (GWAS) genes associated with PD [24]. There are 69 known GWAS genes, of which four (TMEM163, TLR9, ITIH4, TUBG2) were in the salmon module and two (TMEM175, STAB1) were in the navajowhite2 module.
We observed a significant enrichment of GWAS genes within modules that were not preserved compared to preserved networks (OR = 2.96, 95% CI 1.04~6.88, Pvalue = 0.02, Fisher Exact test). Furthermore, the nonpreserved PDHC purple network contained five GWAS gene (KAT8, BIN3, TLR9, ITIH4, TUBG2), however the non-preserved HC modules were not more likely to contain GWAS genes (OR = 2.61, 95% CI 0.08 ~ 6.47, Pvalue = 0.052, Fisher Exact test). We did the same analysis for the non-preserved AD modules, however, no AD associated GWAS genes were found within any non-preserved modules.
In addition to searching for known GWAS genes in non-preserved modules, we used the SCAN (SNP and Copy number ANnotation) database (http://www. scandb.org/) [25] to identify SNPs corresponding to the genes in each non-preserved module. These SNPs were used to search the MirSNP [26] database to identify SNPs associated with known PD or AD microRNAs (miRNAs) dependent on the dataset of the module. We identified 29 SNPs associated with 9 PD related miRNAs across all non-preserved modules in the PD dataset (Supplementary Table 7), and 27 SNPs associated with 8 AD related miRNAs across all nonpreserved modules in the AD dataset (Supplementary Table 8).

Comparison of AD and PD results
There is increasing evidence that PD and AD share several common characteristics [9], thus we investigated the shared processes associated with nonpreserved modules in both the AD and PD dataset to see which were important in both diseases. The biological processes found to be associated with significant modules in AD and PD were compared to see which were important in both diseases. Unfortunately, we did AGING not find any significant modules that were common between these two. However, we identified some similarities between AD and PD.

DISCUSSION
In this study, by using gene co-expression network analysis we identified many important biological processes and key genes in PD and AD blood samples, and the common results between them. To our knowledge this is the largest network analysis of AD and PD blood to date. We found insulin resistance to be associated with PD and HDAC6 may play an important role in this process. We highlight the overlap in disease miRNA associated SNPs that are shared between PD and AD, suggesting similarities in genetic risk factors between the diseases. Our approach used blood data as the available blood datasets have a large enough sample size to construct robust and reliable networks and blood samples are easily accessible in neurodegenerative disease patients. We previously found that DEGs in AD blood were more likely to be DEGs in AD brain tissue [27]. However, in this study, we found that DEGs in blood were not more likely to be DEGs in brain tissue for PD, nevertheless it has been shown that changes in blood gene expression did reflect changes in PD [28].
The PD network module associated with insulin resistance is not preserved in HCs. Insulin resistance is increasingly being shown to be important in PD as a potential therapeutic target [29] and has a high prevalence in non-diabetic PD patients [30], additionally insulin receptor signaling pathways are disturbed in PD [11]. Within this module we identified HDAC6 as a hub gene which promotes the formation of inclusions from α-synuclein toxic oligomers [31]. HDAC6 can promote insulin resistance by deacetylating phosphatase and tensin homolog (PTEN) in ovarian OVCAR-3 cells [32], and PTEN has in turn been shown to be involved in the pathophysiology of PD [33].
HDAC6 has a role in influencing tau phosphorylation and autophagic flux in neurodegenerative disease [22]. In addition, insulin signaling promotes the DNAbinding activity of FOXM1, identified as a significant TF in the insulin resistance module, which regulates pathways to promote adaptive pancreatic β cell proliferation [34], but its role in ND is not clear.
The PD module associated with cellular response to misfolded proteins was also not preserved in HC networks. PD is characterized by accumulation of misfolded α-synuclein and a failure of the proteasome to degrade these and other large protein aggregates [35]. The hub gene SNRNP70 has been shown to be differentially expressed in PD blood previously [36]. Additionally, SNRNP70 encodes the small nuclear ribonucleoprotein snRNP70 which co-localizes with tau in AD [37], and as tau aggregation is shown in ~50% of PD cases snRNP70 may colocalize in PD cases [38]. We also identified MIR142, which encodes miRNA-142, as a hub. miRNA-142 has been identified as an important miRNA in PD, regulating GNAQ, TMTC2, BEND2, and KYNU [39].
The AD module associated with regulation of lipolysis in adipocytes and neuroactive ligand-receptor interaction was not preserved in both MCI and HC networks. Aβ, a key molecule in AD brain pathology, can induce lipolysis within human adipose tissue [40]. In addition, lipolysis is promoted by insulin resistance and in turn lipolysis generates ceramides further impairing insulin signaling, which is becoming increasingly more important in AD [41]. We identified TRPC5 as a hub in this module, which along with other transient receptor potential canonical (TRPC) proteins assembles to form non-selective Ca2+-permeable channels. Another hub, BRAP, has a polymorphism associated with obesity and other metabolic traits, which can play a role in effecting insulin signaling and aging [42]. Interestingly, a module in the HC network that was not preserved in AD and MCI networks was also associated with regulation of lipolysis in adipocytes. This suggests that these processes are occurring in both healthy and AD conditions, however the enrichment pathways are different between the two. As no hubs are shared between the regulation of lipolysis in adipocytes AGING modules in healthy and AD networks they are likely regulated differently.
The module associated with sensory perception in the HC network was not preserved in AD and MCI networks. Sensory dysfunction may precede the cognitive symptoms of AD [43], particularly olfactory impairment [44]. OR5AS1 was identified as a hub gene within the module which encodes a member of the olfactory receptor family and plays a role in triggering response to smells [45]. The TF REST was identified as a regulator of the module and has been shown to regulate olfactory systems [46]. We have identified REST to be an important upstream TF for DEGs identified in both AD and PD previously, and as an important potential therapeutic target [11]. Future work to validate our identified hubs and TFs in both AD and PD disease models would further elucidate their potential as targets for disease treatment.
Although we did not identify any common nonpreserved modules in the AD and PD cohorts, there were other similarities shared in the results. Four TFs were shared between the PDHC purple and the ADHC darkorange2 module (CREB1, NFYB, PBX3, SIX5). These two modules were associated with different transport pathways in HCs which were not preserved in the disease networks, suggesting that the roles of these TFs are dysregulated in both AD and PD. In addition to this, we identified 12 SNPs that were shared between the 29 PD miRNAs associated SNPs and 27 AD miRNAs associated SNPs. This number of shared SNPs is highly significant, which suggests that there are potential risk factors that underlie both diseases.
Several studies have applied WGCNA in ND studies for gene expression and proteomics analysis. For example, Seyfried and colleagues studied proteomic data of cortical tissue of asymptomatic and symptomatic AD [47]. They found that there was a modest overlap between networks at RNA and protein level. If a larger dataset becomes available, expanding our methods to proteomic data could give further understanding into the mechanisms of AD and PD and enable the investigation into the link between genomics and proteomics. Chatterjee et al. [17] have performed network analysis of PD brain tissue, however they only performed WGCNA on DEGs found in the data, which built very limited networks that removed potentially important gene interactions and disease regulators and introduced a bias of modules and hubs towards these DEGs. In addition, they used tissue from multiple brain regions which would all be affected differently by the disease [48].
A limitation of this study is that, although it has been shown that AD blood DEGs are more likely to be DEGs in the brain [27], our results suggest this is not the case for PD. Because of this, our results may not reflect major changes that take place in the brain. However, our network analysis approach emphasizes the interactions of genes which univariate methods like differential expression does not. Similarly to AD, there is disruption that happens in the blood brain barrier (BBB) of PD patients [49]. Hence, it is likely that changes that take place in the brain could be reflected in the blood and vice versa. Additionally, a lot of the biological processes and genes we found in our PD network has been implicated in the PD brain previously [11]. Tau and Aβ are hallmarks of both AD and PD in the brain and have potential as blood biomarkers in both diseases [50,51], suggesting that changes in the brain are reflected in blood.
Leukocytes have been shown to impact progression of neurodegenerative diseases. An interaction between brain and systemic inflammation has been implicated in PD progression by an association between leukocyte apoptosis and central dopamine neuron loss [55]. Increased mitochondrial respiratory activity in leukocytes has been shown in PD patients, potentially impacting progression of neurodegeneration [56] and elevated leukocytes in cerebrospinal fluid are significantly associated with shorter survival of patients [57]. Peripheral leukocytes have been discussed as potential biomarkers for AD previously [52], and gene expression changes in leukocytes have been shown to be closely associated with AD progression [53]. In AD animal models circulating leukocytes have been shown to cross a dysfunctional blood brain barrier and impact brain integrity [54].
Recently limbic-predominant age-related TDP-43 encephalopathy (LATE) has been reported to be underrecognized and often misdiagnosed as AD as they share common pathogenetic mechanisms and present similarly in patients [58]. There is the potential that patients in our AD cohort may have been misdiagnosed and actually have LATE, however as LATE is seen with increasing frequency over the age of 85, and less than 6% of our AD samples were over the age of 85 this likely had little effect on our results.
The greatest risk factor for both AD and PD is age. Adjusting AD data by age before WGCNA ensured any changes we found were reflective of disease state. The PD data, however, did not include samples' age information when we downloaded, thus the effect of age could not be removed technically. As a result of this, the PD results may have been biased towards changes as a result of aging if there was a significant difference in age between PD and HC cohorts. However, the samples were age matched in the original design which should reduce such biases [59].

AGING
From the PD dataset we removed patient samples with known PD mutations. Although the biological pathways underlying familial and sporadic forms of PD are likely to be shared, known PD mutations may impact pathways to disease or regulators of disease [60]. Removal of samples with known PD mutations prevented these mutations from having an impact on results, however had little impact on sample size due to the low number of samples with mutations. AD samples were not screened for known mutations, which could have had an impact on our results. For example, nearly 19% of the familial late onset AD population carry 2 APOE ε4 alleles which only occurs in about 1% of normal Caucasian controls [61]. This and other known mutations may impact the progression and regulators of AD, and knowing which samples had these mutations could have improved our findings.
In conclusion, our network analysis is the largest study using AD and PD blood data to date. We highlight the non-preserved module in PD associated with insulin resistance, and the hub HDAC6 identified in this module. Our results reveal that a large proportion of disease miRNA associated SNPs are shared between PD and AD, suggesting similarities in genetic risk factors between the diseases. The hub genes that we have identified have the possibility to be further investigated as potential biomarkers for disease. These insights suggest several new areas for mechanistic studies in PD and AD research fields.

Data preparation for PD and AD blood datasets
The publicly available peripheral venous whole blood dataset comprising 205 PD and 233 control samples was downloaded from the GEO (Gene Expression Omnibus) database (http://www.ncbi.nlm.nih.gov/geo/) with accession identifier GSE99039. This dataset is the largest of its type and has a sample size enough to run WGCNA and reliably find hub genes [62]. Samples with known PD mutation genes (Parkin, DJ-1 and PINK1, ATP13A2, LRRK2, SNCA) were removed to reduce biases introduced by these genes (see discussion), and outlier samples were detected and removed based on box and density plots of probe intensities. This removed a total of one PD and three HC samples, leaving 204 PD and 230 HC samples. Data was then Robust Multiarray Average (RMA) normalized using the affy R package [63]. Samples missing gender information (35 samples) were assigned sex by using the massiR R package [64] which uses the information from microarray probes that represent genes in Y chromosome to perform k-medoids clustering to classify the samples into male and female groups. We selected a probe-variation threshold of 4 by inspecting a probe-variation plot (Supplementary Figure  1) to select the Y chromosome probes to be used in the sex classification process.
The ComBat function in the sva R package [65] was used to control the effect of gender and running batch of the samples. After this, control probes and those without Entrez gene annotation were removed. For any genes that mapped to multiple probes, the probe with the highest median absolute deviation (MAD) was kept. MAD was used as, similarly to inter-quartile range, the probe with the highest MAD has the greatest variability and so likely has more information [66]. Finally, the bottom 5% probes by average expression values across all samples were removed.
For AD, the two independent peripheral venous whole blood datasets GSE63060 and GSE63061, from the AddNeuroMed Cohort [67], were used to construct the blood gene expression networks. As these two datasets were from the same cohort study and sample collection and analysis was carried out using the same methodologies, except using different biological samples and microarray platforms, they can be merged to produce a larger dataset that can improve the power of our study. The two normalized datasets (generated by different Illumina platforms) were merged using the inSilicoMerging R package [68], which removes the batch effects between these two, as we have done previously [27].
Patients of Western European and Caucasian ethnicity were extracted from the merged dataset leaving a total of 245 AD, 142 MCI and 182 HC to reduce any potential genetic impact that ethnicity may have on AD. The effect of the age and gender were controlled for using the ComBat function in the sva R package [65]. As with the PD data, control probes and those without Entrez gene annotation were removed and for any genes that mapped to multiple probes, the probe with the highest MAD was kept. Finally, the bottom 5% probes by average expression values across samples were removed. Information on number of samples, gender and age of samples is shown in Supplementary Table 1.

PD blood and brain DEG overlap
To see if there was a significant overlap between PD gene expression in blood and brain as has been shown previously in AD [27], our data was compared to DEGs previously identified in PD substantia nigra [11]. Using the normalized and filtered PD data, DEGs were identified by applying limma with gender and running batch adjusted. Slightly stringent nominal Pvalue <0.01 AGING was used for significance as only one DEG could pass multiple testing (FDR corrected Pvalue <0.05).

Gene co-expression network construction
The R package WGCNA [15] was applied to perform gene co-expression network analysis as follows: A matrix of pairwise correlations between all pairs of genes across each sample group (e.g. case and control groups separately), was created and each raised to a soft-thresholding power to achieve a scale-free topology R 2 of 0.85. From this, a topological overlap matrix (TOM) was calculated, which takes correlation between genes expression as well as connections the genes share into consideration. This TOM was then converted to topological overlap dissimilarities to be used with hierarchical clustering. Then, a dynamic tree-cutting algorithm was used to determine initial module assignments of genes (cutreeHybrid, using default parameters except deepSplit of 3, minModuleSize of 10 and mergeCutHeight of 0.05) [69]. An additional kmeans clustering step was applied to improve the results of the hierarchical clustering in WGCNA as proposed by Botía et al [70] which has been reported to be able to reduce the number of misplaced genes and improve the enrichment of GO pathway terms. All analysis was conducted in R3.5.2 [71].

Calculation of module preservation
In order to identify modules that are not preserved between conditions within datasets, we applied NetRep (v1.2.1) [18] which uses a permutation test procedure on seven module preservation statistics. We permuted 10,000 times. The "alternative" parameter is set to "less" to test whether each module preservation statistic is smaller than expected by chance in order to identify these nonpreserved modules which are extremely different in the two networks. If all seven module preservation statistics had a Pvalue < 0.05 then that module was significantly non-preserved between conditions.

Pathway enrichment analysis
To identify the biological pathways that the modules represent we performed GO and KEGG pathway enrichment analysis (KEGG 2019) using the Enrichr web tool [19,20]. Pathways and GO terms with a Pvalue < 0.05 were considered significant.

Hub gene identification
Generally, detecting hub genes in co-expression networks has been done using MM, which is the correlation of a gene to its eigengene (the first principle component calculated using the expression data of genes in each module) [72]. BC of a gene is the number of shortest paths connecting all gene pairs that pass through that gene [73], and genes with high BC were considered as "high traffic".
Here we have expanded hub detection to include multiple other hub detection methods frequently used in network analysis. In addition to MM and BC, we used closeness centrality [74], Kleinberg's hub centrality score [75] and the PageRank algorithm [76] which would reduce the chance of missing any important hub genes that regulate the network that may be missed by applying individual methods. Genes with high closeness centrality scores have the shortest path to all other genes in the module and are placed to influence the entire network quickly [74]. PageRank emphasizes nodes that are connected to other nodes with high Pagerank scores [76]. Kleinberg's hub centrality score [75] is similar to the PageRank algorithm, however, the small differences between the two widens the net for identifying important hubs.
A novel hub detection permutation test was developed to obtain Pvalues for each hub detection store and determine if they are statistically significant. Briefly, the gene ID labels on the adjacency matrix were randomly re-labelled and hub score recalculated 1000 times to obtain a statistical distribution. The Pvalue was calculated by dividing the number of recalculated permutation hub scores that are higher than the observed hub score in the original network by the number of permutations. Genes were considered significant hubs if any hub scores had a Pvalue < 0.01. This was performed for all modules not preserved between PD and HCs in the PD dataset, and the modules not preserved between any of the AD, MCI and HCs networks in the AD dataset. BC, closeness centrality, PageRank and Kleinberg's hub centrality scores were calculated using the igraph R package with default settings without normalization [77]. The R code used for the novel hub detection test is available at http://dx.doi.org/10.5281/zenodo.3686007.

Identifying transcription factors
To identify TFs that potentially regulate each module, we used ENCODE and ChEA Consensus TFs from ChIP-X found using the Enrichr web tool [19,20]. TFs with a Pvalue < 0.01 were considered significant. If a TF was found significant in both ENCODE and ChEA then the lower Pvalue was assigned to the TF.

SNP and microRNA analysis of significant WGCNA modules
A two-tailed Fisher's exact test was used to test our hypothesis that non-preserved modules were more AGING likely to contain GWAS identified genes than preserved modules. The risk loci for PD and AD were from recent GWAS, between which only one GWAS gene was shared (KAT8) [24,78].
We gained further insight into SNPs associated with non-preserved modules, using a similar methodology to Chatterjee et al. [17]. The SCAN database [25] was used to find all SNPs that have been shown to predict the expression of each gene within non-preserved modules. For each non-preserved module, only SNPs that predicted gene expression with Pvalues < 1.0e-4 and frequency > 0.10 within the CEU human samples of European descent were selected.
Previous studies have revealed that differential expression of miRNAs were associated with PD [79] and AD [80]. In addition, SNPs have been identified as disease prognostic markers by association to miRNAs [81]. SNPs we found to be associated with genes from the PD related modules were used to search the MirSNP [26] database in order to find which SNPs were associated with the 83 experimentally confirmed PD related miRNAs in the HMDD v3.0 database [82]. The same process was done for genes within the AD related modules and the 57 experimentally confirmed AD related miRNAs in the HMDD v3.0 database. The MirSNP database identified the SNPs that are present at the 3' untranslated region of miRNA target sites, and so narrowed down the selection of SNPs to those that likely effect known miRNAs associated with the disease.

Comparison of PD and AD results
The processes associated with non-preserved modules in AD and PD were compared to see if any processes were similar between diseases. Hub genes and TFs identified in non-preserved modules were also compared between AD and PD to see if any were shared. In addition, we test our hypothesis that AD and PD share SNPs we identified in non-preserved modules associated with disease related miRNAs in AD and PD respectively.

AUTHOR CONTRIBUTIONS
XL conceived of the presented idea. JK performed the experiments and data analysis. X.L., JK, CC, RM and SL. analyzed the data and interpreted results. All authors reviewed the manuscript, and all authors read and approved the final manuscript.

Supplementary Figures
Supplementary Figure 1. The probe variation plot used to determine which genes to use in massiR R package [53]. A threshold of 4 was selected as it encompassed the genes with the highest variation and ignores genes with low variation that may be useful in classifying samples.

Supplementary Figure 2. (A)
The distribution of betweenness scores for each gene in the darkseagreen4 module. Many genes have a betweenness score of 0 indicating they do not act as hubs in regard to betweenness in this module. After the hub permutation test, one gene was found to be significant (GINS2, Pvalue = 0.005). (B) The distribution of betweenness scores for GINS2 over the 1000 iterations of the hub permutation test. The betweenness score of GINS2 in the original darkseagreen4 module network is highlighted.

Supplementary Tables
Supplementary Table 1. Information on number of samples, sex and age of samples in datasets.