Identification of Key Genes and Regulatory Pathways in Multiple Sclerosis Brain Samples: A Meta-Analysis of Micro-Array Datasets

Multiple sclerosis (MS) is an autoimmune disorder of the central nervous system (CNS) whose aetiology is only partly understood. Investigating the intricate transcriptional changes occurring in MS brains is critical to unravel novel pathogenic mechanisms and therapeutic targets. Unfortunately, this process is often hindered by the difficulty in retrieving an adequate number of samples. However, by merging data from publicly available datasets, it is possible to identify alterations in gene expression profiles and regulatory pathways that were previously overlooked. Here, we merged microarray gene expression profiles obtained from CNS white matter samples taken from MS donors to identify novel differentially expressed genes (DEGs) linked with MS. Data from three independent datasets (GSE38010, GSE32915, and GSE108000) were combined and used to detect novel DEGs using the Stouffer’s Z-score method. Corresponding regulatory pathways were analysed using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases. Finally, top up- and down-regulated transcripts were validated by real-time quantitative PCR (qPCR) using an independent set of white matter tissue samples obtained from MS donors with different disease subtypes. There were a total of 1446 DEGs, of which 742 were up-regulated and 704 genes were down-regulated. DEGs were associated with several myelin-related pathways and protein metabolism pathways. Validation studies of selected top up- or down-regulated genes highlighted MS subtype-specific differences in the expression of some of the identified genes, underlining a more complex scenario of white matter pathology amongst people afflicted by this devastating disease.


Introduction
Multiple sclerosis (MS) is a demyelinating disease of the central nervous system (CNS), characterised by chronic inflammation and neurodegeneration [1,2]. In MS, oligodendrocytes, the myelin-producing cells of the CNS, are damaged, resulting in the progressive loss of myelin and consequent formation of multi-focal lesions within the white matter (WM). The exact pathogenesis of MS is unknown; however, it is believed to be the result of a complex interaction between environmental, genetic, and lifestyle factors [3].
The clinical course of MS is categorised into three main subtypes: relapse-remitting MS (RRMS), secondary-progressive MS (SPMS), and primary-progressive MS (PPMS). Each subtype follows its own clinical course, with RRMS usually presenting with recurrent relapses of symptoms followed by periods of relative stability and recovery [4]. It is common for patients diagnosed with RRMS to progress into SPMS (~80% of cases), where the episodes of remission no longer occur, and progressive worsening of clinical symptoms ensue [5]. The third main subtype of MS is PPMS, a rare form of MS that only represents around 10% of diagnosed patients [6]. This form of MS is characterised by the progressive worsening of symptoms from the onset of the disease.
Lesions in the WM of MS patients can either be defined as active, inactive, chronic active or inactive, and regenerating based on their histopathological characteristics [7,8]. The formation of these pathological demyelinating lesions of the WM is accompanied-and often exacerbated-by co-existing factors such as inflammation, neurodegeneration, and gliosis [7][8][9][10][11]. However, it is not uncommon to observe signs of remyelination in some lesion types [12]. Despite current efforts, the exact aetiology of the disease remains unknown, and many questions on the molecular mechanisms behind lesion development remain obscure. Interestingly, the WM surrounding lesions is often devoid of any obvious signs of injury and is nearly indistinguishable from the WM tissue of healthy subjects. However, molecular studies suggest that certain abnormalities, such as altered tight junctions and early signs of inflammation, can still be detected along the rims of lesions and in the peri-lesional WM [13][14][15]. Thus, investigating the transcriptional profile of WM samples gathered from different regions and with different pathophysiological characteristics could be useful for the identification of transcriptomic signatures of disease in spite of the heterogeneity of tissue and clinical presentation.
Researchers have used several methods to decipher pathological changes in the WM of people afflicted with MS. With the advancement of high-throughput techniques such as microarray and RNA-sequencing, scientists have attempted to explore the transcriptomic profile of the WM of MS patients [16][17][18]. Whilst these approaches have helped to gain some insights into gene expression changes across different lesion types, key factors such as small sample size, variations in storage conditions, and quality of samples RNA (among others) have hindered the depth of these analyses. Hence, in this study, we merged multiple publicly available datasets with the idea of overcoming some of these issues whilst also subserving the need to provide a more in-depth evaluation of available transcriptomic data [19][20][21].
To achieve this goal, we performed a meta-analysis using transcriptomic data obtained from three publicly available microarray datasets obtained from WM samples collected from MS donors and aged-matched non-MS controls. In addition, to validate our findings and assess whether gene expression changes were consistent across the different MS subtypes, we performed reverse transcription quantitative real-time PCR (RT-qPCR) of selected top novel up-and down-regulated transcripts using WM samples from aged-matched non-MS and MS donors (RRMS, SPMS, and PPMS) readily available in our laboratory.

Data Collection and Meta-Analysis
After searching the GEO database for studies that met our inclusion-exclusion criteria (see Section 4.1), three datasets were selected for the meta-analysis study (please refer to the experimental workflow shown in Figure 1).
For each dataset, we grouped sample data into non-MS and MS WM tissues to allow comparisons between non-pathological and pathological tissue, irrespective of lesion type or WM appearance. This approach allowed us to maximise the number of samples included in the meta-analysis, hence providing a larger cohort of samples from which we could extrapolate more in-depth transcriptional changes. The GSE38010 dataset contained 5 MS samples and 2 control samples. GSE32915 contained 12 MS samples and 4 control samples, while GSE108000 contributed 30 MS samples and 10 control samples. Pre-processing of the three datasets resulted in a comparable range of expression levels ( Figure S1).
Following pre-processing of datasets, we utilised Stouffer's Z-score method to determine the differentially expressed genes (DEGs) in our meta-analysis. Evidence has shown that this method outperforms Fisher's combined probability test, which has been a popular method for microarray meta-analyses [21,22]. Using the former probability test, we found that 1446 genes were differentially expressed when comparing control versus MS WM samples (please see Table S1). A total of 742 genes were significantly up-regulated (log2 Fold Change (FC) ≥ 0.5 in the combined dataset, and 704 genes were significantly For each dataset, we grouped sample data into non-MS and MS WM tissues to allow comparisons between non-pathological and pathological tissue, irrespective of lesion type or WM appearance. This approach allowed us to maximise the number of samples included in the meta-analysis, hence providing a larger cohort of samples from which we could extrapolate more in-depth transcriptional changes. The GSE38010 dataset contained 5 MS samples and 2 control samples. GSE32915 contained 12 MS samples and 4 control samples, while GSE108000 contributed 30 MS samples and 10 control samples. Pre-processing of the three datasets resulted in a comparable range of expression levels ( Figure  S1).
Following pre-processing of datasets, we utilised Stouffer's Z-score method to determine the differentially expressed genes (DEGs) in our meta-analysis. Evidence has shown that this method outperforms Fisher's combined probability test, which has been a popular method for microarray meta-analyses [21,22]. Using the former probability test, we found that 1446 genes were differentially expressed when comparing control versus MS WM samples (please see Table S1). A total of 742 genes were significantly up-regulated (log2 Fold Change (FC) ≥ 0.5 in the combined dataset, and 704 genes were significantly down-regulated (log2 (FC) ≤ −0.5. An overview of the top 15 up-and down-regulated DEGs are shown in Table 1 below.
Using the gene set variation analysis [23], we also aimed to replicate our findings by comparing DEGs from our merged microarray dataset with DEGs obtained from a publicly available RNA-sequencing dataset. Following the comparison with GSE138614, we confirmed that the general transcriptomic alterations observed in the merged dataset were consistent with those in the RNA-sequencing dataset ( Figure S2A,B).  Using the gene set variation analysis [23], we also aimed to replicate our findings by comparing DEGs from our merged microarray dataset with DEGs obtained from a publicly available RNA-sequencing dataset. Following the comparison with GSE138614, we confirmed that the general transcriptomic alterations observed in the merged dataset were consistent with those in the RNA-sequencing dataset ( Figure S2A,B).

Functional Classification and Pathway Analyses of DEGs
To gather functional insights into the biological and intracellular processes associated with the DEGs obtained from our meta-analysis, we utilised Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. The top 15 annotated GO terms and KEGG pathways are depicted in Figure 2A On the x-axis, the minimum −Log10 of the calculated adjusted p-value is given (qscore). On the y-axis, the associated GO (A) or KEGG pathway (B) terms are listed.

Unique DEGs Detected from the Meta-Analysis
Following our analyses of merged datasets, we sought to determine if the unified dataset improved the depth of the analysis, enabling us to identify unique DEGs that could not be identified in the three datasets when analysed individually. Using the limma package, we performed DEG analyses comparing the Control versus MS tissue of GSE32915, GSE38010, and GSE108000 individually (Tables S2-S4). The overlap between DEGs found in each analysis and our meta-analysis is illustrated in Figure 3. Both analyses allow the extrapolation of the biological pathways and intracellular mechanisms that are altered based on the number and statistical power of the annotated DEGs from our combined analysis. The most significant GO term was central nervous system myelination (FDR = 9.87 × 10 −5 ). As expected from a disease affecting the WM, most of the top GO terms were associated with oligodendrocytes-or myelin-related alterations. Interestingly, we also identified DEGs that were linked to protein metabolism/turnover (proteasome-mediated ubiquitin-dependent protein catabolic process, FDR = 2.98 × 10 −3 ; negative regulation of protein phosphorylation, FDR = 3.078 × 10 −3 ; and regulation of proteasomal ubiquitin-dependent protein catabolic process, FDR = 6.870 × 10 −3 ).
In contrast, KEGG analysis of the DEGs from the merged dataset revealed no statistically significant associations with any specific intracellular pathway ( Figure 2B). However, some relevant pathways emerged as being affected by the disease, such as those pertaining to insulin resistance (FDR = 0.0977), axon guidance (FDR = 0.0977), and cell adhesion molecules (FDR = 0.162).

Unique DEGs Detected from the Meta-Analysis
Following our analyses of merged datasets, we sought to determine if the unified dataset improved the depth of the analysis, enabling us to identify unique DEGs that could not be identified in the three datasets when analysed individually. Using the limma package, we performed DEG analyses comparing the Control versus MS tissue of GSE32915, GSE38010, and GSE108000 individually (Tables S2-S4). The overlap between DEGs found in each analysis and our meta-analysis is illustrated in Figure 3. On the x-axis, the minimum −Log10 of the calculated adjusted p-value is given (qscore). On the y-axis, the associated GO (A) or KEGG pathway (B) terms are listed.

Unique DEGs Detected from the Meta-Analysis
Following our analyses of merged datasets, we sought to determine if the unified dataset improved the depth of the analysis, enabling us to identify unique DEGs that could not be identified in the three datasets when analysed individually. Using the limma package, we performed DEG analyses comparing the Control versus MS tissue of GSE32915, GSE38010, and GSE108000 individually (Tables S2-S4). The overlap between DEGs found in each analysis and our meta-analysis is illustrated in Figure 3.  As it can be appreciated in Figure 3, whereas each dataset produced several DEGs (GSE380010 = 663, GSE32915 = 405, and GSE108000 = 3830, respectively), only four DEGs were in common among the three individual datasets, and the number increased to nine DEGs when the combined dataset was also included. Interestingly, the combined meta-data identified 175 new DEGs (Table S5). Table 2 shows the top 15 up-and down-regulated genes extrapolated from the combined dataset.

Validation of Selected DEGs Using Real-Time Quantitative PCR
Since meta-analyses are statistical tools that rely on p-value combination methods, we thought it was of interest to determine whether selected DEGs that were uniquely affected in the unified dataset could be validated by real-time quantitative PCR (RT-qPCR). For this purpose, we used an independent set of WM tissues obtained from MS donors that had been previously diagnosed with different clinical MS subtypes (i.e., relapse-remitting [RRMS], secondary-progressive [SPMS], and primary-progressive MS [PPMS]) and non-MS age-matched controls.

Validation of Selected DEGs Using Real-Time Quantitative PCR
Since meta-analyses are statistical tools that rely on p-value combination methods, we thought it was of interest to determine whether selected DEGs that were uniquely affected in the unified dataset could be validated by real-time quantitative PCR (RT-qPCR).  Surprisingly, BBS7 gene expression was not different in the WM of MS versus non-MS samples ( Figure 5A), although when results were clustered based on MS subtype, data showed a bimodal distribution of data for RRMS cases and a trend towards an increase in expression for SPMS cases ( Figure 5A ). Analyses of EPHX4 transcripts demonstrated increased gene expression in the MS group, although these were not statistically significant (p = 0.089 vs. control; Figure 5B). However, upon clustering data by clinical subtypes, SPMS-specific EPXH4 gene up-regulation was found to contribute remarkably to the overall up-regulation seen in MS samples (** p < 0.01 vs. control; Figure 5B ), with a minor increase in gene expression also identified in RRMS cases (p > 0.05 vs. control). Similarly, L1CAM expression was increased in MS samples versus non-MS controls, although not achieving statistical significance (p = 0.0771; Figure 5C). At a subtype-specific level, this trend towards an increase was driven solely by a robust up-regulation in SPMS cases (*** p < 0.001; Figure 5C ) and marginally contributed by PPMS, but not RRMS cases. Finally, an overall and significant increase in NFAT5 gene expression levels was seen in all MS cases (* p < 0.05; Figure 5D), which was contributed by RRMS (p = 0.051; Figure 5D ) and SPMS (** p < 0.01; Figure 5D ), but not PPMS cases (p > 0.05; Figure 5D ).
Upon examining the expression of genes predicted to be down-regulated, we observed more robust effects ( Figure 5). RT-qPCR validation of CYB5R2, a gene encoding for a protein involved in cholesterol biosynthesis, showed a strong reduction in expression levels among MS samples when compared with non-MS controls (*** p < 0.001; Figure 5E), an effect that was visible across all the tested MS subtypes (* p < 0.05 for RRMS, ** p < 0.01 for SPMS and PPMS, respectively; Figure 5E′). PTP4A1 mRNAs also confirmed the downregulation seen in our bioinformatics studies (* p < 0.05 vs. controls; Figure 5F). However, the subtype-specific down-regulation was statistically significant in RRMS cases (** p < 0.01; Figure 5F′) and only marginally reduced in progressive MS subtypes (p > 0.05). A strong and global GPR37 down-regulation was observed in the MS WM (**** p < 0.0001; Figure 5G), which was consistently seen across all MS subtypes (**** p < 0.0001 for RRMS, *** p < 0.001 for SPMS and PPMS, respectively; Figure 5G′). Similarly, VLDLR gene expression was also remarkably reduced in the MS WM (*** p < 0.0001; Figure 5H), and the reduction was equally contributed when examined at a subtype-specific level (** p < 0.01 for both RRMS, SPMS, and PPMS; Figure 5H′).  . Fold changes were calculated with the ΔΔCt method using the expression of ribosomal protein S18 as a reference gene. N = 6-10 per group, with MS containing the combined results of each subtype. Mean ± SEM is plotted. * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001, as determined by unpaired t-test (control vs. MS) or one-way ANOVA followed by Dunnett's post-hoc test (control vs. MS subtypes).

Discussion
In the present study, we conducted a meta-analysis by merging three publicly available microarray datasets; the final goal was to exploit this combinatorial approach to unveil additional transcriptomic changes in the MS WM that may have been overlooked when performing individual analyses using single datasets. Indeed, whilst the meta-analysis identified a number of genes that overlapped with those from the individual datasets, it revealed several new DEGs that were not previously reported. Moreover, our validation studies on a panel of selected genes whose expression levels were uniquely altered in our combined dataset confirmed the predicted gene expression changes. For this purpose, we utilised an independent set of WM samples from donors with different clinical forms of MS and age-matched controls, thus enabling us to unveil additional transcriptional information pertaining to MS subtype-specific alterations amongst the newly identified DEGs ( Figure 5).
GO and KEGG analyses of combined metadata demonstrated that, in addition to the expected annotation of terms linked to oligodendrocyte differentiation, myelination, and oligodendrocyte development, perturbations in biological pathways associated with lipid and protein metabolism were also detected. Impaired lipid metabolism within the WM . Fold changes were calculated with the ∆∆Ct method using the expression of ribosomal protein S18 as a reference gene. N = 6-10 per group, with MS containing the combined results of each subtype. Mean ± SEM is plotted. * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001, as determined by unpaired t-test (control vs. MS) or one-way ANOVA followed by Dunnett's post-hoc test (control vs. MS subtypes).
Upon examining the expression of genes predicted to be down-regulated, we observed more robust effects ( Figure 5). RT-qPCR validation of CYB5R2, a gene encoding for a protein involved in cholesterol biosynthesis, showed a strong reduction in expression levels among MS samples when compared with non-MS controls (*** p < 0.001; Figure 5E), an effect that was visible across all the tested MS subtypes (* p < 0.05 for RRMS, ** p < 0.01 for SPMS and PPMS, respectively; Figure 5E ). PTP4A1 mRNAs also confirmed the down-regulation seen in our bioinformatics studies (* p < 0.05 vs. controls; Figure 5F). However, the subtypespecific down-regulation was statistically significant in RRMS cases (** p < 0.01; Figure 5F ) and only marginally reduced in progressive MS subtypes (p > 0.05). A strong and global GPR37 down-regulation was observed in the MS WM (**** p < 0.0001; Figure 5G), which was consistently seen across all MS subtypes (**** p < 0.0001 for RRMS, *** p < 0.001 for SPMS and PPMS, respectively; Figure 5G ). Similarly, VLDLR gene expression was also remarkably reduced in the MS WM (*** p < 0.0001; Figure 5H), and the reduction was equally contributed when examined at a subtype-specific level (** p < 0.01 for both RRMS, SPMS, and PPMS; Figure 5H ).

Discussion
In the present study, we conducted a meta-analysis by merging three publicly available microarray datasets; the final goal was to exploit this combinatorial approach to unveil additional transcriptomic changes in the MS WM that may have been overlooked when performing individual analyses using single datasets. Indeed, whilst the meta-analysis identified a number of genes that overlapped with those from the individual datasets, it revealed several new DEGs that were not previously reported. Moreover, our validation studies on a panel of selected genes whose expression levels were uniquely altered in our combined dataset confirmed the predicted gene expression changes. For this purpose, we utilised an independent set of WM samples from donors with different clinical forms of MS and age-matched controls, thus enabling us to unveil additional transcriptional information pertaining to MS subtype-specific alterations amongst the newly identified DEGs ( Figure 5).
GO and KEGG analyses of combined metadata demonstrated that, in addition to the expected annotation of terms linked to oligodendrocyte differentiation, myelination, and oligodendrocyte development, perturbations in biological pathways associated with lipid and protein metabolism were also detected. Impaired lipid metabolism within the WM has been associated with microglia senescence [35], hindered myelin maintenance [36], and reduced regeneration capacity [37] in both rodent and human studies. Therefore, results from our bioinformatics approach further confirmed such alterations of these biological processes in human ex vivo MS WM tissues. Similarly, a systematic review of proteomic studies reported several imbalances of WM proteostasis in MS brains [38], as seen in our study, although these reports showed some inconsistencies that were most likely due to the clinical heterogeneity of the disease. KEGG analyses also revealed alterations in pathways associated with insulin resistance. While the strength of this association was not statistically significant, it raises important questions on the possible relationship between systemic glucose sensitivity and MS. In this regard, studies have shown that insulin resistance has a high predictive value for disease severity in people with MS, especially among those with a diagnosis of SPMS [39,40]. Additionally, insulin resistance was found to exert detrimental effects on oligodendrocyte and myelin health in other neurodegenerative conditions such as Alzheimer's disease [41], multiple system atrophy [42], and in developmental pathologies caused by ethanol consumption, such as foetal alcohol spectrum disorder [43]. These reports provide more than a simple anecdotal evidence of the link existing between glucose metabolism and WM health and warrant further investigations to help better understand the crosstalk between insulin signalling and the functionality of myelin-producing cells.
Through comparative analyses of DEGs from the merged dataset and the individual source datasets, we were able to define a subset of novel DEGs. Enrichment (GO) pathway analyses of this subset of altered genes revealed robust and significant links with G-proteincoupled receptor (GPCR) signalling pathways. GPCR signalling plays a critical role in regulating both the maturation of myelin cell and immune cell activities. For instance, intrinsic signalling of the G-protein-coupled receptor 17 (GPR17) in oligodendrocyte progenitors was found to inhibit different stages of cell maturation [44]. Other well-established and relevant GPCRs linked to autoimmunity and MS are the sphingosine-1-phosphate receptors [45,46], which are specifically targeted by the best available drugs approved for the treatment of relapsing and secondary progressive MS forms, such as fingolimod [47] and siponimod [48].
From KEGG analyses of novel DEGs uniquely found in the merged dataset, we also identified robust alterations of pathways associated with fatty acid degradation. The disruption of lipid metabolism due to impaired degradation of fatty acids is detrimental to myelin formation and stability. Myelin is a modified cell membrane that forms a multilayer sheath around the axon. Whilst myelinated sheaths retain several features of conventional cell membranes, myelin constituent lipid components differ, and so do fatty acids [49]. Indeed, there is evidence from mutations associated with demyelination to suggest that myelin health requires intact lipid pathways with a steady balance between lipid synthesis and degradation [50,51]. As such, data from our bioinformatics analyses brings under the spotlight the critical role of lipid pathways in the regulation of myelin synthesis and repair, pinpointing how these underlying WM defects play a key role in the onset and progression of MS pathology.
To strengthen the bioinformatics analyses, selected DEGs that appeared exclusively in our merged dataset were validated by RT-qPCR using WM tissue samples obtained from MS donors with different disease subtypes (RRMS, SPMS, or PPMS).
Two genes associated with cholesterol/lipid metabolism appeared amongst the top 15 of the 175 newly detected DEGs, namely EPHX4 and VLDLR, with EPHX4 being upregulated in SPMS cases and VLDLR being down-regulated in all subtypes of MS [26,34]. Impaired lipid metabolism has been linked to disease progression in MS and was found to have strong predictive value in determining the degree of neurodegeneration and lesions' formation [52][53][54][55][56]. Cholesterol is an essential lipid for the synthesis of myelin, and aberrant cholesterol metabolism is linked to both impaired remyelination and myelindebris phagocytosis [37,[57][58][59][60]. Interestingly, another of our selected genes for validation that was down-regulated (GPR37) is linked to oligodendrocyte functioning. GPR37 is a GPCR whose expression is normally increased in pre-myelinating and myelinating oligodendrocytes and acts as a negative regulator of oligodendrocyte differentiation [29]. In fact, GPR37-null mice exhibit signs of hypermyelination. The strongly reduced expression of this gene we observed in our meta-analysis and validation experiments may suggest the inherent need to increase the maturation of surviving oligodendrocytes in the WM distal to the lesion site of MS patients or alternatively, promote the phenotypic transition from oligodendrocyte precursors to mature oligodendrocytes cells, perhaps in the attempt to replenish the depleting pool of myelinating cells.
However, the pathological processes that underpin MS disease are not limited to demyelination and/or impaired remyelination. Altered expression of other molecular targets, such as cell-adhesion molecules, has also been implicated in disease pathophysiology, as these have a strong impact on immune-cell functioning, maintenance of blood-brain barrier integrity, and myelin density [13,14,[61][62][63][64][65]. L1CAM is a gene that encodes for a cell-adhesion molecule that we found to be significantly up-regulated in the WM of SPMS patients only. Although it is not essential for ongoing myelination, L1CAM plays an important role in the initiation of myelination and supports the interactions between axons and oligodendrocytes, a process that appears to be activated upon the proteolysis of the L1CAM protein [31,66,67]. Since L1CAM proteolytic products (L1 molecules) have been detected in brain lysates, the increased expression of L1CAM in SPMS cases suggests the increased need for cleaved L1CAM proteins to restore homeostasis in patients with this progressive form of MS [68].
Altogether, the results reported here highlight a complex scenario where different pathological processes and intracellular pathways coexist and prevail over homeostatic mechanisms, culminating in a reduced WM regeneration potential, especially in SPMS.
Notwithstanding that this study provided novel transcriptomic data, it is noteworthy mentioning that the meta-analysis included only three publicly available microarray datasets. This limited our ability to balance the number of controls versus MS samples. Although, in principle, this could introduce a small risk of sampling bias, our validation experiments confirmed that the newly detected genes from our meta-analysis were indeed significantly deregulated in the MS WM, thus providing new molecular targets to investigate in the context of MS.
Taken together, this meta-analysis revealed additional information about gene expression changes in the WM of MS donors, irrespective of lesion type or tissue appearance. After merging the datasets, we were able to provide additional DEGs that were not detected previously in the individual datasets. Additionally, validation studies proved that the expression of selected genes matched that predicted in our meta-analysis, further highlighting the strength of bioinformatics tools for the identification of novel genes and regulatory pathways to gain a better understanding of MS and identify new molecular targets for therapeutic development.

Datasets Selection and Pre-Processing of Data
The public database Gene Expression Omnibus (GEO) was used to search for available micro-array datasets obtained from MS WM (and non-MS controls). All the available microarray studies investigating human MS brain WM tissue were considered. Each dataset had to contain control brain samples from age-matched individuals to be included in the meta-analysis. No distinctions were made between MS subtypes or the typology of the lesion in this meta-analysis. Moreover, no distinction between microarray platforms was made considering the limited number of studies available. Based on these criteria, three datasets from GEO were included in this meta-analysis, namely [GEO:GSE108000], [GEO:GSE32915], and [GEO:GSE38010].
The following steps were all performed in RStudio (Version 1.4.1717; R version: 4.1.1). Using the R package GEOquery, raw data and sample metadata from the aforementioned datasets were obtained [69]. The Affymetrix dataset, GSE38010, was normalised using "Robust-Multi array Average" normalisation with the use of the affy package [70]. Raw data for the remaining two datasets, GSE32915 and GSE108000, both performed on the Agilent platform, were read and normalised using the limma package [71]. In the case of GEO:GSE108000, only the channel containing experimental data was included to keep normalisation steps similar among datasets. Both background corrections and quantile between-array normalisations were performed on these two datasets. The datasets were then annotated using the appropriate annotation packages obtained from Bioconductor.

Meta-and Differential Gene Expression Analysis
Meta-analysis and subsequent analyses of DEGs were performed using the recently developed DExMA package [72]. Briefly, the meta-analysis was performed on the normalised datasets using Stouffer's Z-score method, comparing MS WM tissue with non-MS control samples [21]. Genes had to be present in all three datasets for the gene to be included in the meta-analysis. The identified (DEGs) were considered statistically significant if the log2 fold change (FC) was > 0.5 or < −0.5 and if the false discovery rate was < 0.05. Individual DEG analyses of each dataset were performed using the limma package according to the manual's instructions.

Gene Ontology and Pathway Analysis
Gene Ontology (GO) classification and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed using the DEGs obtained from comparing non-MS brain WM tissue with MS WM tissues. To achieve these analyses, both the GO and KEGG analyses were implemented in RStudio using the "ClusterProfiler" package [73].

Gene set Variation Analysis
In RStudio (Version 1.4.1717; R version: 4.1.1), a gene set variation analysis (GSVA) was performed using the package 'GSVA' (1.42.0, Bioconductor) using the publicly available RNA-sequencing dataset GSE138614 [16,23]. Briefly, the raw count matrix was obtained from the NCBI GEO database and transformed using the variance stabilizing transformation (VST) algorithm [74]. Either up-or down-regulated DEGs detected in our meta-analysis were used for the analysis. Results were plotted using GraphPad Prism (version 9.3.1 for Windows, GraphPad Software, San Diego, CA, USA, www.graphpad.com; last accessed on 31 March 2023).

Human Post-Mortem Brain Tissue
CNS WM samples from MS and non-MS donor tissues were obtained from the MS Research Australia Brain Bank (Tissue Transfer Deed-CT31920, approved on 21 June 2021) and the Victorian Brain Bank (Material Transfer Agreement-VBB. 19.07, approved on 16 January 2020). In total, 16 samples from RRMS, SPMS, and PPMS donors were used for validation studies. As a control, we used additional tissue shavings obtained from non-MS  Table 3. Micro-dissections of snap-frozen WM shavings from both MS and non-MS donors were performed using a stereoscopic microscope (10× magnification) under RNase-free conditions. Briefly, dissected tissue shavings (~30-40 mg wet-weight) were immediately submerged in RNAlater™ Stabilization Solution (Thermo Fisher Scientific, Scoresby, VIC, Australia) and then snap-frozen in liquid nitrogen until further processing. Total RNA was extracted using TRIreagent (Sigma-Aldrich, Castle Hill, NSW, Australia). Briefly, samples were homogenised using γ-irradiated autoclavable pestles (Sigma-Aldrich, Castle Hill, NSW, Australia) and then centrifuged at 12,000 × g at 4 • C, 15 min in the presence of 200 µL chloroform (Sigma-Aldrich, Castle Hill, NSW, Australia). The RNA fraction was collected and spun down with 500 µL ice-cold 2-propanol (Sigma-Aldrich, Castle Hill, NSW, Australia) as above. Pellets containing RNA were then washed twice with 75% ethanol and air-dried. Thereafter, RNA was subjected to DNAse I treatment (Thermo Fisher Scientific, Scoresby, VIC, Australia), followed by a clean-up step using the RNeasy Micro Kit (Qiagen, Clayton, VIC, Australia). RNA concentrations were determined using a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific, Scoresby, VIC, Australia).
cDNA was generated using the Tetro cDNA synthesis kit (Bioline, Sydney, NSW, Australia) according to the manufacturer's instructions. To analyse mRNA expression, we performed real-time quantitative PCRs for a selection of eight genes (the primers list is shown in Table 4). Each reaction contained 5 µL iTaq Universal SYBR Green Supermix (Bio-Rad, South Granville, NSW, Australia), 3µL of cDNA (100 ng), and 0.8 µL forward and reverse primers (final concentration = 500 nM). Ribosomal protein S18 was used as a reference gene. For the analysis, mean fold changes were calculated using the ∆∆Ct method, as described previously [75].  Informed Consent Statement: Not applicable.

Data Availability Statement:
The micro-array datasets utilised in this study are publicly available and can be found here: GSE108000, GSE32915, and GSE38010. The RNA-sequencing dataset is available here: GSE138614. Any other data presented in this study can be made available upon reasonable request to authors.