Drug Repurposing for Amyotrophic Lateral Sclerosis Based on Gene Expression Similarity and Structural Similarity: A Cheminformatics, Genomic and Network-Based Analysis

Background: Amyotrophic Lateral Sclerosis (ALS) is a devastating neurological disorder with increasing prevalence. Currently, only 8 FDA-approved drugs and 44 clinical trials exist for ALS treatment, highlighting the gap in disease-specific treatment. Drug repurposing, an alternative approach, is gaining importance. This study aims to identify potential repurposable compounds using gene expression analysis and structural similarity approaches. Methods: GSE833 and GSE3307 were analysed to retrieve Differentially Expressed Genes (DEGs), which were used to identify compounds reversing the gene signatures from LINCS. SMILES of ALS-specific FDA-approved and clinical trial compounds were used to retrieve structurally similar drugs from DrugBank. A Drug-Target-Network (DTN) was constructed for the identified compounds to retrieve drug targets, which were further subjected to functional enrichment analysis. Results: GSE833 yielded 13 upregulated and 5 downregulated significant DEGs, whereas GSE3307 yielded 280 upregulated and 430 downregulated significant DEGs. Gene expression similarity identified 213 approved drugs. Structural similarity analysis of 44 compounds resulted in 411 approved and investigational compounds. The DTN was constructed for 266 compounds to identify drug targets. Functional enrichment analysis highlighted neuroinflammatory response, cAMP signaling, PI3K-AKT signaling, and oxidative stress pathways. A preliminary relevancy check identified a previous association of 105 compounds with ALS research, validating the approach, and left 172 potential repurposable compounds.


Introduction
Amyotrophic Lateral Sclerosis (ALS), or Lou Gehrig's disease, is a devastating neurological disorder characterized by loss of motor neurons and paralysis of the skeletal musculature, mostly affecting middle-aged people [1,2]. Symptoms associated with ALS include cognitive and behavioral changes as well as dysarthria, dysphagia, slurred speech, tripping, falling, trouble swallowing, and untimely crying, laughing or yawning [3][4][5]. According to a 2019 report, the incidence of ALS was 0.6-3.8 per 100,000 persons per year; in Europe, the incidence was 2.1-3.8 per 100,000 persons per year and is increasing. Similarly, the prevalence was reported to be 4.1-8.4 per 100,000 persons globally. The age of onset was found to be between 51 and 66 years [6]. Risk factors include family history, genetic factors (mutations in SOD1, TBK1, PFN1, TUBA4A, C9orf72, etc.), environmental factors (smoking, exposure to toxins, radiation) and viral infections (HIV and polio) [3,4].
ALS is broadly of two types based on its genetic background: familial and sporadic. Familial ALS accounts for a small proportion of patients (~5-10%) and is attributed to family history and inheritance, whereas the majority of patients fall under the sporadic category, which lacks a family history and in which the disease arises from genetic alterations or other risk factors [2,4]. ALS has four main clinical presentations based on symptoms: (1) primary lateral sclerosis, (2) limb-onset ALS, (3) progressive muscular atrophy and (4) bulbar-onset ALS [3,4]. ALS pathophysiology has been reported to involve loss-of-function or gain-of-function of genes and to be associated with mitochondrial dysfunction, impaired RNA metabolism, cytoskeletal trafficking defects and altered proteostasis. So far, around 40 genes have been found to be associated with ALS progression [2,3,7]; however, disease-specific genes remain poorly characterized, as few genes have been reported to date. Only eight drugs, namely Riluzole, Edaravone, Sodium phenylbutyrate, Taurursodiol, Tofersen, Dextromethorphan hydrobromide, Quinidine sulfate and RimabotulinumtoxinB, have been approved by the FDA for ALS treatment [8]. However, all these compounds are symptomatic rather than disease-specific, which opens an opportunity for drug discovery or drug repurposing. As the drug discovery process takes around 15-20 years, demand for drug repurposing is increasing.
In this study, we designed a drug repurposing approach that retrieves compounds based on gene expression similarity and structural similarity. First, GEO datasets corresponding to ALS were identified, and Differentially Expressed Genes (DEGs) were retrieved and used as input to retrieve compounds that can reverse the gene signatures. In parallel, the FDA-approved small molecules and the compounds in clinical trials for ALS were retrieved, and a structural similarity analysis was performed to retrieve similar drugs. The compounds obtained from both approaches were collated, and a drug-target-network was constructed to identify the targets with which the selected drugs interact. Finally, functional enrichment analysis was performed for the identified targets to retrieve KEGG pathways and GO terms. Additionally, a preliminary search was performed to identify the compounds that had already been explored in ALS research.

Part-1: Selection of Microarray Datasets and Gene Expression Analysis
Microarray datasets pertaining to Amyotrophic Lateral Sclerosis (ALS) were retrieved from the Gene Expression Omnibus (GEO) database [9] and were screened through a set of inclusion and exclusion criteria.

Datasets satisfying all the inclusion criteria were selected. The selected datasets were preprocessed, curated and analyzed individually to retrieve Differentially Expressed Genes (DEGs), both upregulated and downregulated, using the Limma package [10] from Bioconductor in R. The datasets that yielded DEGs with a False Discovery Rate (FDR) p-value (p-value adjusted according to the Benjamini-Hochberg method) <0.05 were selected. Their DEGs were segregated into upregulated (logFC >1) and downregulated (logFC <-1) genes and used to retrieve compounds that can reverse the gene signatures. The volcano plots were generated using the EnhancedVolcano package [11] in R.
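The DEG selection rule described above can be sketched in a few lines. The study itself used Limma in R; the Python below is only an illustrative stand-in that applies a Benjamini-Hochberg adjustment and then splits genes by logFC. The function names, the gene labels and the toy statistics are hypothetical and are not taken from GSE833 or GSE3307.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def split_degs(
    stats: Dict[str, Tuple[float, float]],
    fdr_cutoff: float = 0.05,
    lfc_cutoff: float = 1.0,
) -> Tuple[List[str], List[str]]:
    """stats maps gene -> (logFC, raw p-value); returns (up, down) gene lists."""
    genes = list(stats)
    adj = benjamini_hochberg([stats[g][1] for g in genes])
    up, down = [], []
    for gene, q in zip(genes, adj):
        lfc = stats[gene][0]
        if q < fdr_cutoff and lfc > lfc_cutoff:
            up.append(gene)
        elif q < fdr_cutoff and lfc < -lfc_cutoff:
            down.append(gene)
    return up, down


# Hypothetical toy statistics, for illustration only.
toy_stats = {
    "GENE_A": (2.0, 0.001),   # significant, upregulated
    "GENE_B": (-1.5, 0.002),  # significant, downregulated
    "GENE_C": (0.3, 0.003),   # significant but small effect
    "GENE_D": (3.0, 0.8),     # large effect but not significant
}
up_genes, down_genes = split_degs(toy_stats)
print(up_genes, down_genes)
```

Genes must pass both the FDR and the logFC threshold, so a large fold change alone (GENE_D) or a small but significant change (GENE_C) is not enough to enter either list.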

Part-2: Creation of Drug Library Based on Structural and Gene Expression Similarity
SMILES of both FDA-approved ALS drugs and clinical trial compounds (Phase 2 and above only) were retrieved from DrugBank [12]. These SMILES were used to retrieve structurally similar compounds from DrugBank data, with a Tanimoto similarity cut-off of 0.7 and above. RDKit [13] was used to convert the SMILES into Morgan fingerprints and calculate the structural similarity. Similarly, the obtained DEGs were used to retrieve compounds that can reverse the signatures from the L1000CDS2, SIGCOM and L1000_FWD tools of LINCS [14]. The compounds retrieved through the structural similarity and gene expression similarity approaches were then mapped to their developmental stage, and only compounds in the "approved" and "investigational" categories were selected for further analysis.
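The Tanimoto cut-off used for the structural similarity search can be illustrated with a dependency-free sketch. In the study the fingerprints were Morgan fingerprints computed by RDKit; here each fingerprint is assumed to be already available as a set of on-bit indices, and the compound names and bit sets are hypothetical toy values.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_compounds(
    query: Set[int],
    library: Dict[str, Set[int]],
    cutoff: float = 0.7,
) -> List[Tuple[str, float]]:
    """Return (name, similarity) for library entries at or above the cut-off."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in hits if score >= cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints, for illustration only.
query_fp = {1, 2, 3, 4, 5, 6, 7}
toy_library = {
    "cand_a": {1, 2, 3, 4, 5, 6, 8},  # 6 shared bits out of 8 in the union
    "cand_b": {1, 2, 9, 10},          # 2 shared bits out of 9 in the union
    "cand_c": {1, 2, 3, 4, 5, 6, 7},  # identical fingerprint
}
print(similar_compounds(query_fp, toy_library))
```

With the 0.7 threshold, cand_a (0.75) and cand_c (1.0) are retained, while cand_b falls well below the cut-off.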

Part-3: Construction of Drug-Target-Network (DTN) and Functional Enrichment Analysis
The drugs obtained from structural similarity and gene expression similarity in the above step were used to construct the DTN through the STITCH database [15], with a confidence score of 0.7 and above, to retrieve the targets associated with the identified compounds. The interacting proteins were subjected to functional enrichment analysis with the ClueGO app [16] in Cytoscape to retrieve GO terms and KEGG pathways [17]. The compounds that interacted with proteins/drug targets were then subjected to a preliminary search for their prior exploration in ALS research.
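The confidence filtering behind the DTN can be sketched as follows. This is not the STITCH interface; it assumes the drug-protein interactions have already been exported as (drug, protein, confidence) tuples with confidence on a 0-1 scale, and the drug labels and scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_dtn(
    interactions: Iterable[Tuple[str, str, float]],
    min_score: float = 0.7,
) -> Dict[str, Set[str]]:
    """Keep drug-protein edges whose confidence is at least min_score."""
    network = defaultdict(set)
    for drug, protein, score in interactions:
        if score >= min_score:
            network[drug].add(protein)
    return dict(network)


def targets_of(network: Dict[str, Set[str]]) -> Set[str]:
    """Collect the unique proteins that remain connected to any drug."""
    return set().union(*network.values())


# Hypothetical toy interactions, for illustration only.
toy_edges = [
    ("drug_x", "SOD1", 0.9),
    ("drug_x", "TBK1", 0.5),  # below the cut-off, dropped
    ("drug_y", "SOD1", 0.8),
    ("drug_z", "PFN1", 0.4),  # drug_z loses its only edge
]
dtn = build_dtn(toy_edges)
print(sorted(dtn), sorted(targets_of(dtn)))
```

Compounds whose edges all fall below the cut-off (drug_z here) disappear from the network, which is why the number of compounds in the DTN can be smaller than the number submitted to it.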

Part 1: Gene Expression Analysis
Around five GEO datasets corresponding to ALS in "Homo sapiens" were obtained, of which only two (GSE833 [18] and GSE3307 [19,20]) (Figure 1) met the inclusion and exclusion criteria (Table 1). The datasets were analyzed with the Limma package [10] from Bioconductor in R. A total of 13 upregulated and 5 downregulated DEGs were retrieved from GSE833 (Figure 2 and Supplementary Materials Table S1), and 280 upregulated and 430 downregulated DEGs were obtained from GSE3307 (Figure 3 and Supplementary Materials Table S2).

Part 2: Creation of Drug Library Based on Structural and Gene Expression Similarity
Two FDA-approved small-molecule drugs, namely Riluzole and Edaravone [8,12], were found for ALS. In addition, 44 clinical trials (Phase 2 and above) were found for ALS in clinicaltrials.gov [21], which revealed around 20 molecules (Table 2). Structural similarity analysis with RDKit resulted in a total of 569 compounds, of which only 397 belonged to the approved and investigational categories (Supplementary Materials Table S3). Similarly, the DEGs retrieved from the GEO datasets were used as input to retrieve compounds that can reverse the signatures in the LINCS database [14]. This revealed 213 compounds belonging to the approved category (Supplementary Materials Table S4).
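What it means for a compound to "reverse" a disease signature can be illustrated with a simple overlap score. This is a minimal sketch under stated assumptions and is not the scoring used by the LINCS tools: it counts disease DEGs that the compound moves in the opposite direction, subtracts those moved in the same direction, and normalizes by the number of disease DEGs. The gene labels and drug signatures are hypothetical.

```python
from typing import Iterable


def reversal_score(
    disease_up: Iterable[str],
    disease_down: Iterable[str],
    drug_up: Iterable[str],
    drug_down: Iterable[str],
) -> float:
    """Score in [-1, 1]; positive values mean the drug opposes the disease signature."""
    disease_up, disease_down = set(disease_up), set(disease_down)
    drug_up, drug_down = set(drug_up), set(drug_down)
    total = len(disease_up) + len(disease_down)
    if total == 0:
        return 0.0
    # Genes raised in disease but lowered by the drug, and vice versa.
    reversed_count = len(disease_up & drug_down) + len(disease_down & drug_up)
    # Genes the drug pushes in the same direction as the disease.
    mimicked_count = len(disease_up & drug_up) + len(disease_down & drug_down)
    return (reversed_count - mimicked_count) / total


# Hypothetical toy signatures, for illustration only.
als_up, als_down = {"G1", "G2", "G3"}, {"G4", "G5"}
score_a = reversal_score(als_up, als_down, drug_up={"G4", "G5"}, drug_down={"G1", "G2"})
score_b = reversal_score(als_up, als_down, drug_up={"G1"}, drug_down={"G4"})
print(score_a, score_b)
```

Under this toy score the first drug would rank as a reversal candidate and the second as a compound that mimics the disease signature.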

Part-3: Construction of DTN and Functional Enrichment Analysis
The unique compounds obtained by collating the results of the structural and gene expression similarity approaches were used to construct a drug-target-network with the STITCH database [15]. The DTN comprised 915 nodes and 7200 edges (Figure 4 and Supplementary Materials Table S5). Among the 915 nodes, 504 were proteins and were subjected to functional enrichment analysis using the ClueGO [16] app in Cytoscape. Functional enrichment analysis revealed 2339 significant GO biological process terms, of which 17 were involved in inflammatory response and 162 in signalling pathways (Supplementary Materials Table S6, Supplementary Materials Figure S1 and Figure 5). Around 111 terms were significantly associated with cellular response (Supplementary Materials Table S7, Supplementary Materials Figure S2 and Figure 6). In addition, 344 terms related to molecular function (Supplementary Materials Table S8, Supplementary Materials Figure S3 and Figure 7) and 10 terms related to immune response (Supplementary Materials Table S9, Supplementary Materials Figure S4 and Figure 8) were significantly associated with the proteins. KEGG analysis revealed around 142 significant pathways (Supplementary Materials Table S10, Supplementary Materials Figure S5 and Figure 9). Around 411 compounds were found to be significantly involved in the DTN, of which 134 were anti-cancer drugs and antibiotics and were excluded from the study. The remaining 266 compounds were checked for prior association with ALS studies, which revealed that around 172 compounds (Table 3) were unexplored in ALS research, thus opening a gateway to explore these drugs for their potential in ALS treatment.
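Term significance in an enrichment analysis of this kind typically rests on an over-representation test. The sketch below shows a one-sided hypergeometric test of the sort commonly used for GO and KEGG enrichment; whether it matches the exact ClueGO settings used in this study is an assumption, and the function name and toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a target list of n genes drawn from a background of N genes,
    of which K are annotated to the term and k of those appear in the list."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical toy counts: background of 10 genes, 4 annotated to a term,
# a target list of 3 genes, 2 of which carry the annotation.
p_value = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(p_value)
```

In a real analysis the resulting p-values would still need multiple-testing correction across all tested terms before being called significant.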

Discussion
ALS is a progressive neurological disorder that affects the motor nerves and induces cognitive and behavioural deficits, mainly in middle-aged people. Its incidence and prevalence are reported to be increasing year by year. Mutations in genes such as SOD1, TBK1, PFN1, TUBA4A and C9orf72 are known to be major risk factors for ALS [2,3]. ALS pathogenesis is associated with mitochondrial dysfunction, impaired RNA metabolism, cytoskeletal trafficking defects and altered proteostasis. Currently, there are only eight FDA-approved drugs and 20 compounds in clinical trials for ALS [8,12]. SOD1, a characteristic gene for ALS, is known to be involved in mitochondrial dysfunction and increased oxidative stress, whereas TUBA4A has been reported to induce cytoskeletal and axonal trafficking defects. TDP-43 is known to be involved in autophagy and dysregulated proteostasis. Several proteins, such as SOD1 and TUBA4A, are known to form prion-like aggregates that initiate inflammation. As a result, microglial activation increases in neuromuscular junctions before disease onset, leading to axonal death [1,7]. Although many reports have explored the role of inflammation in disease progression, none has reported compounds that can target inflammatory pathways.
Our study identified potential repurposable compounds through two different approaches: (1) gene expression similarity, in which compounds that can reverse the gene signature obtained by analysing gene expression data are retrieved, and (2) structural similarity, in which compounds structurally similar to the FDA-approved and clinical trial compounds are retrieved. The compounds thus obtained were used for drug-target-network construction. Functional enrichment analysis was performed for the proteins interacting with the selected compounds. These proteins were associated with inflammation-related GO terms such as "chronic inflammatory response", "neuroinflammatory response" and "acute inflammatory response". KEGG analysis of the proteins revealed their association with "calcium signalling", "cAMP signalling pathway", "AMPK signalling", "PI3K-AKT signalling", "Rap1 signalling pathway", etc., which are implicated in ALS progression. Additionally, 105 compounds from our analysis had previously been explored for their potential in ALS research, which validates our approach. The remaining 172 compounds were unexplored in ALS research and have potential for repurposing in ALS.
Thomas et al. 20 identified potential compounds that can reverse the gene expression by using differentially expressed genes with the LINCS databases. A study by Giulia et al. 21 integrated the human interactome network and ALS-associated genes to retrieve similar diseases with the Random Walk with Restart algorithm, and then retrieved drugs based on the diseases identified. Furthermore, a study by Jing-Jing Zhang et al. [22] reported the repurposing potential of Carbamazepine (an anticonvulsant) for ALS, which reduced motor neuron loss in the SOD1-G93A ALS mouse model. Recently, Helena Chaytow et al. [23] showed that the repurposable drug Terazosin (an anti-hypertensive agent) increased PGK1 activity, which resulted in extended survival and increased motor neuron number in the Thy1-hTDP-43 ALS mouse model. Preclinical and clinical studies on Primidone (a RIPK1 inhibitor) exhibited a positive correlation with severe bulbar symptoms [24].

Conclusions
Inflammatory pathways play a major role in ALS disease progression. Retrieval of compounds by gene expression similarity and structural similarity identified around 105 compounds that had previously been explored in ALS research and 172 compounds that are unexplored so far. This study opens a new research avenue in which these compounds can be further explored for their repurposing potential in ALS through virtual screening studies and experimental validation, which will be the focus of our future studies.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biomedinformatics4030093/s1, Figure S1: Number of targets from DTN associated with individual GO-biological process terms; Figure S2: Number of targets from DTN associated with individual GO-cellular response terms; Figure S3: Number of targets from DTN associated with individual GO-molecular function terms; Figure S4: Number of targets from DTN associated with individual GO-immune response terms; Figure S5: Number of targets from DTN associated with individual KEGG pathways; Table S1: List of DEGs obtained from GSE833 dataset; Table S2: List of DEGs obtained from GSE3307 dataset; Table S3: List of compounds obtained through structural similarity approach; Table S4: List of compounds selected for DTN construction; Table S5: List of significant GO-biological process terms associated with targets; Table S6: List of significant GO-cellular response terms associated with targets; Table S7: List of significant GO-molecular function terms associated with targets; Table S8: List of significant GO-immune response terms associated with targets; Table S9: List of significant KEGG pathway terms associated with targets.

Figure 1. PRISMA flowchart representing the selection of GEO datasets for ALS.

Figure 5. Gene Ontology-Biological processes associated with targets retrieved from DTN.

Figure 6. Gene Ontology-Cellular response associated with targets retrieved from DTN.

Figure 7. Gene Ontology-Molecular functions associated with targets retrieved from DTN.

Figure 8. Gene Ontology-Immune responses associated with targets retrieved from DTN.

Figure 9. KEGG pathways associated with targets retrieved from DTN.

Table 1. List of GEO datasets selected for the study.

Table 2. List of compounds selected for structural similarity.