Genetic Insights from a Molecular Pathway Analysis on Two Independent Samples of Autistic Patients

Background: Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by limited interest in and impaired ability for social interaction, repetitive behavior and deficits in social communication. ASD runs in families, and twin studies suggest a strong genetic basis. Nevertheless, a complete definition of the genetic profile that confers risk for ASD is still lacking. Methods: NIMH Autism Datasets 3 and 4 (n=1233 and n=2890, respectively) were analyzed with a molecular pathway analysis. Quality control was performed as usual, including genomic inflation (λ) estimates. PLINK and R (Bioconductor and the ReactomePA package) were used for the transmission disequilibrium test (TDT), the association tests and the molecular pathway analysis. Results and Discussion: The "Adherens junctions interactions pathway" and the "Axon guidance pathway" were enriched in the first sample, while the "Extracellular matrix organization pathway" was enriched in the second sample. The "Axon guidance pathway" showed a trend toward enrichment in the second sample. A trend toward significant enrichment was observed for the "NCAM1 molecular pathway" when the severity of autistic symptoms was investigated. Conclusion: Cell-to-cell and cell-matrix interactions may carry part of the genetic risk for ASD. Both neurodevelopment and the immune (T-cell) response rely on these processes, which may therefore be involved in the pathophysiology of ASD.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder that appears early in life and is characterized by limited interest in and impaired ability for social interaction, repetitive behavior and deficits in social communication. Motor and intellectual deficits, mood and sleep disorders, and sensory and gastrointestinal abnormalities are common [1]. ASD affects up to 1% of the general population [2] and has a genetic basis [3,4]. A number of genetic variations have been associated with ASD (https://gene.sfari.org/autdb/GS_Home.do), but results are not definitive [5][6][7][8][9][10][11]. A notable exception was recently published [12], reporting positive association findings for two genes previously associated with autism, SHANK3 and WBSCR17. Apart from this result, the overall low replication rate of genetic association findings conflicts with evidence that the heritability of ASD attributable to common variants is as high as 60% [13]. Limited statistical power, small sample sizes and difficulties in phenotype definition may have limited the studies conducted so far [14][15][16]. A possible solution to this sparse genetic evidence is the "many genes, common pathways" hypothesis, which suggests that the effects of different genes may converge on common pathways whose impaired function results in the core symptoms of a disease [17]. Molecular pathway analysis is consistent with this hypothesis and may help in understanding complex genetic diseases [18].
Previous studies of the genetic networks whose disruption leads to ASD suggested that the following molecular pathways might be involved: protein synthesis and metabolism, transcriptional regulation, chromatin remodeling, calcium signaling and the oxytocin pathway [19][20][21]. However, those findings were generated by systematic reviews, and to the best of our knowledge a molecular pathway analysis of GWAS data derived from ASD trios has yet to be conducted.
In the present contribution, two independent samples of ASD trios are retrieved from the NIMH database (https://www.nimhgenetics.org/). A molecular pathway analysis is conducted on the transmission disequilibrium test (TDT) results to identify molecular pathways enriched in variations associated with autism in both samples. Moreover, the same approach is used in one of the samples to identify pathways enriched in variations associated with the severity of autistic symptoms, ranked from 1 to 4 (from broad spectrum to strict autism). The present contribution originates from an abstract previously published at the 29th ECNP Conference [22].

Materials and Methods

Dataset
Genetic data were obtained from NIMH Genetics (https://www.nimhgenetics.org/). Autism Dataset 4 (Study 65/TASC GWAS Data) was chosen as the investigation sample, and Autism Dataset 3 (GWAS Data on 1,264 Non-AGRE Samples) served as the replication sample. Genotyping was performed with the Affymetrix 5.0 array in both samples.
Quality control excluded SNPs with minor allele frequency <0.01 and a low genotype call rate. The Mendelian error thresholds were set at 5% for families and 10% for single SNPs, and the missing data rate threshold was set at 5%. Sample exclusion criteria were duplicate samples, excessive Mendelian errors and monozygotic twinship (one twin of each pair was removed). The P-value threshold for deviations from Hardy-Weinberg equilibrium was set at 0.0001. Genomic inflation factors (λ) were calculated to rule out inflation of the test statistics.
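The marker-level filters and the genomic inflation check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline actually run (the analysis used PLINK): the `SnpSummary` record and its field names are hypothetical, the 5% missingness threshold is assumed to apply per SNP, and λ is computed in the usual genomic-control way, as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.4549).

```python
import statistics
from dataclasses import dataclass

# Filter thresholds as stated in the text (missingness ASSUMED to be per SNP).
MAF_MIN = 0.01          # SNPs with minor allele frequency below this are excluded
MISSING_MAX = 0.05      # maximum genotype missing rate per SNP
MENDEL_SNP_MAX = 0.10   # maximum Mendelian error rate per SNP
HWE_P_MIN = 1e-4        # SNPs with a Hardy-Weinberg P below this are excluded

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary, e.g. collected from PLINK QC reports."""
    snp_id: str
    maf: float
    missing_rate: float
    hwe_p: float
    mendel_error_rate: float


def passes_qc(snp: SnpSummary) -> bool:
    """True if the SNP survives every marker-level filter."""
    return (
        snp.maf >= MAF_MIN
        and snp.missing_rate <= MISSING_MAX
        and snp.mendel_error_rate <= MENDEL_SNP_MAX
        and snp.hwe_p >= HWE_P_MIN
    )


def genomic_inflation(chi2_stats: list[float]) -> float:
    """Genomic-control lambda: median observed 1-df chi-square / null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

A λ close to 1, as reported in the Results for both datasets, indicates that the test statistics are not systematically inflated by population structure or genotyping artifacts.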

Statistical analysis
In the principal analysis, a transmission disequilibrium test (TDT) [23] was used to test for genetic linkage between each SNP passing quality control and the phenotype under analysis. The samples consist of trios in which the parents are coded "0" (missing phenotype) and the ASD child or children are coded "2" (affected), following PLINK's phenotype coding. For the secondary analysis, an association analysis for quantitative traits (the --assoc command in PLINK) was run for each SNP, using the 1-to-4 phenotype available from the NIMH database, which indicates the severity of autistic symptoms.
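As an illustration of the principal analysis, the TDT statistic for a single biallelic SNP is χ² = (b − c)² / (b + c) with 1 degree of freedom, where b and c count how often heterozygous parents transmit, or do not transmit, the tested allele to an affected child. The Python sketch below is not PLINK's implementation; it assumes genotypes are coded as counts (0, 1, 2) of the tested allele, that every listed child is affected (coded "2"), and it skips Mendelian-inconsistent trios, which quality control should already have removed.

```python
import math
from collections.abc import Iterable


def count_transmissions(trios: Iterable[tuple[int, int, int]]) -> tuple[int, int]:
    """Count transmissions (b) and non-transmissions (c) of the tested allele.

    Each trio is (father, mother, affected_child), genotypes coded as the
    number of copies of the tested allele (0, 1 or 2).
    """
    b = c = 0
    for father, mother, child in trios:
        n_het = (father == 1) + (mother == 1)
        if n_het == 0:
            continue  # homozygous parents carry no transmission information
        if n_het == 2:
            # Every copy of the tested allele in the child came from a het parent.
            b += child
            c += 2 - child
            continue
        hom = mother if father == 1 else father
        from_het = child - hom // 2  # copies contributed by the heterozygous parent
        if from_het not in (0, 1):
            continue  # Mendelian inconsistency
        b += from_het
        c += 1 - from_het
    return b, c


def tdt(b: int, c: int) -> tuple[float, float]:
    """Return the 1-df TDT chi-square statistic and its p-value."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

Only heterozygous parents are informative, which is why the test is robust to population stratification: each affected child is effectively compared with the alleles its own parents did not transmit.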

Enrichment analysis
PLINK [24] was used for the TDT GWAS analysis and for genetic annotation. SNPs associated with the phenotype under analysis at p<0.05 were included in the molecular pathway analysis. The enrichment analysis was conducted in R [25] through Bioconductor [26] and the ReactomePA package [27]. Reactome [28] is a manually curated database of chemical reactions, biological processes and molecular pathways. ReactomePA was developed to test molecular pathway associations with gene lists obtained from high-throughput genomic investigations. Bonferroni-adjusted p-values and false discovery rate (FDR) q-values were used to correct for multiple comparisons.
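The over-representation test behind this kind of analysis can be sketched with a hypergeometric upper-tail probability, the model ReactomePA uses for its enrichment test. In the sketch, the SNP-to-gene annotation (`snp_to_gene`) and the gene universe are hypothetical inputs; the mapping PLINK produced in the study is not reproduced here, and ReactomePA's exact definition of the background may differ. GeneRatio and BgRatio are returned as strings in the k/n and M/N form used in the Tables.

```python
import math


def genes_from_snps(snp_pvalues: dict[str, float],
                    snp_to_gene: dict[str, str],
                    alpha: float = 0.05) -> set[str]:
    """Genes carrying at least one SNP with p < alpha (hypothetical annotation)."""
    return {snp_to_gene[s] for s, p in snp_pvalues.items()
            if p < alpha and s in snp_to_gene}


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def pathway_enrichment(selected, universe, pathway) -> tuple[str, str, float]:
    """Over-representation test of one pathway against a gene universe."""
    universe = set(universe)
    selected = set(selected) & universe
    pathway = set(pathway) & universe
    N, n, M = len(universe), len(selected), len(pathway)
    k = len(selected & pathway)  # selected genes that fall in the pathway
    p = hypergeom_sf(k, N, M, n)
    return f"{k}/{n}", f"{M}/{N}", p
```

In this framing, a pathway is flagged when more of its genes carry nominally associated SNPs than would be expected by drawing the same number of genes at random from the universe.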

Results
Autism Dataset 3 included 1233 individuals (789 males, 444 females), of whom 588 were cases. A total of 333 nuclear families and 3 founder singletons were detected: 579 non-founders had 2 parents in 321 nuclear families, 14 non-founders lacked 2 parents in 9 nuclear families, and 579 affected offspring trios were identified. Overall, 393763 markers were included in the analysis. No genomic inflation was detected (λ=1.03). The "Adherens junctions interactions pathway" was significantly enriched (p=0.000008; adj.p=0.008) (Table 1).
Autism Dataset 4 included 2893 individuals (1824 males, 1069 females), of whom 2890 had a non-missing phenotype (936 cases, 1954 controls) and 3 had a missing phenotype. There were 1930 founders and 963 non-founders, and the total genotyping rate was 0.93. A total of 965 nuclear families and 2 founder singletons were detected, and 963 non-founders with 2 parents in 963 nuclear families were included in the analysis; 934 affected offspring trios were identified. Overall, 1160305 markers were available for the analysis. No genomic inflation was detected (λ=1.002). The "Extracellular matrix organization pathway" was significantly enriched in this sample (p=0.000008; adj.p=0.007), and the "Axon guidance pathway" showed a trend toward enrichment (p=0.00019; adj.p=0.084).
When the severity of autistic symptoms was investigated, a trend toward significant enrichment was observed for the "NCAM1 molecular pathway" (p=0.00009; adj.p=0.097) (Table 2). Finally, when only subjects with autism were selected from the total sample, the "NCAM1 molecular pathway" showed a trend toward association with a more severe presentation of autistic symptoms (p=0.00009; adj.p=0.097).
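The adjusted p-values reported above reflect corrections for the number of pathways tested. The following sketch shows two standard corrections in plain Python: Bonferroni, which the note to the Tables names for p.adjust, and the Benjamini-Hochberg procedure as one common way of controlling the false discovery rate. It is illustrative only; ReactomePA's own q-values may be computed with a different FDR estimator, so these functions are not expected to reproduce the reported values exactly.

```python
def bonferroni(pvalues: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Bonferroni controls the chance of any false positive and is therefore stricter; an FDR procedure tolerates a controlled proportion of false discoveries, which is why a pathway can sit near a significance threshold under one correction and not the other.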

Discussion
ASD is a frequent condition in the general population, characterized by impaired social abilities, restricted interests and repetitive behavior. The disorder has consistently been shown to have a genetic basis, but the number of involved genes could be in the hundreds, which suggests a polygenic nature. To test the convergence of specific molecular pathways on ASD, a molecular pathway analysis was undertaken in two independent samples of autistic trios. Two different molecular pathways were found to be enriched in the two datasets, namely the "Adherens junctions interactions pathway" and the "Extracellular matrix organization pathway", while a third, the "NCAM1 molecular pathway", showed a trend toward significance when the severity of autistic symptoms was taken into consideration. Our findings are consistent with what was anticipated to be one of the most critical molecular pathways in autism [29]. Cell adhesion proteins are well known to influence neuronal function. Rendall and colleagues studied CNTNAP2, a cell adhesion protein whose deletion results in deficient myelin formation and delayed learning, two possible characteristics of ASD [30]. Moreover, cell-adhesion molecules of the immunoglobulin superfamily have a critical role in brain development and in the maintenance of synaptic plasticity, and their defects can severely alter brain function [31][32][33]. One of the best-known and most investigated genes in the "Adherens junctions interactions pathway" is Catenin Beta 1 (CTNNB1).
Its product is one of the proteins that constitute adherens junctions, and it is involved in the Wnt pathway and the Reelin pathway. The Wnt pathway is a complex set of molecular pathways that mainly control 1) gene expression, 2) cell polarity and 3) calcium balance. It is relevant to cell fate, cell proliferation and cell migration. Variations within the pathway lead to cancer, diabetes and at least 100 other diseases (http://www.malacards.org/search/results/CTNNB1), including intellectual disability. Evidence of an involvement of the Wnt pathway in autism is also accumulating [34][35][36]. Interestingly, Wong and colleagues reported that manipulating the extracellular environment with pro-inflammatory mediators may affect both the activation of the Wnt pathway and the activation of the metalloproteases that modify the extracellular matrix and facilitate cell migration [37]. Metalloproteases are among the principal enzymes of the "Extracellular matrix organization pathway". Disturbances in inflammation have been reported as possible causes of autism [38,39], and neurodevelopment depends on the efficiency of the immune system [40][41][42]. It is tempting to postulate that the present contribution detected a signal of genetic susceptibility to inflammatory events during neurodevelopment: a less efficient molecular cascade (the "Adherens junctions interactions pathway" or the "Extracellular matrix organization pathway") would respond less efficiently to inflammatory insults, resulting in a higher risk for ASD. This hypothesis needs to be tested further by independent analyses. Consistent with this, CADM2, one of the genes included in the "Adherens junctions interactions pathway", was previously found to be associated with ASD in a sample of 1402 ASD trios [43].
Another gene in the same molecular pathway, ACTB, was found deleted in a single case report of a child exhibiting autistic-like behavior along with brachycephaly, prominent ears, cryptorchidism, speech delay, poor eye contact and outbursts of aggressive behavior. Finally, NCAM1 is also a cell-adhesion molecule whose activity affects a wide range of events, including cellular adhesion, migration, proliferation, differentiation, survival and synaptic plasticity.

Conclusion and Limitations
Adhesion molecules have been candidates for unraveling the genetics of autism over the last decades, but results have been inconsistent or negative. A possible explanation for these apparently conflicting findings is the low penetrance of single variations for the phenotype under investigation. When tested individually, such SNPs do not carry enough signal to emerge from the noise in the most common association analyses. Molecular pathway analysis addresses this caveat by grouping the weak signals of different SNPs under a common pathway label, which gives them the power to emerge as statistically significant. Molecular pathway analysis may be a promising approach to GWAS data, but it also has limitations. One of the main limitations of genome-wide molecular pathway analysis is that it relies on known molecular pathways and known gene functions. Moreover, the technique is limited by the number of SNPs included in the GWAS, so the risk of false negative findings due to poor coverage of specific genes cannot be ruled out. With these caveats in mind, the results of a molecular pathway analysis can be combined with the currently published evidence on a specific phenotype to help define the genetic makeup that increases the risk for a disease or a group of diseases. Another limitation of the present contribution is that it cannot take de novo variations into account, as they may not be labeled and therefore cannot be grouped into any molecular pathway. De novo variations may explain part of the missing heritability, but it has been estimated that 49% of the genetic architecture of ASD is attributable to common inherited variants and only 3% to de novo and rare variants [44].
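The coverage limitation can be made concrete with a small sketch of position-based SNP-to-gene assignment. The study does not describe its mapping rule beyond PLINK annotation, so the window-based rule, the `window` parameter and the gene coordinates below are hypothetical. The point is that a gene with no genotyped SNP inside its window receives no count and can never contribute to a pathway's enrichment, however relevant it may be.

```python
from bisect import bisect_left, bisect_right


def map_snps_to_genes(snp_positions: list[int],
                      genes: dict[str, tuple[int, int]],
                      window: int = 0) -> dict[str, int]:
    """Count genotyped SNPs falling in [start - window, end + window] of each gene.

    snp_positions: base-pair positions on one chromosome (any order).
    genes: hypothetical gene -> (start, end) coordinates on the same chromosome.
    """
    positions = sorted(snp_positions)
    counts = {}
    for gene, (start, end) in genes.items():
        lo = bisect_left(positions, start - window)
        hi = bisect_right(positions, end + window)
        counts[gene] = hi - lo
    return counts


def uncovered_genes(counts: dict[str, int]) -> list[str]:
    """Genes with no genotyped SNP: invisible to any SNP-based pathway test."""
    return sorted(gene for gene, n in counts.items() if n == 0)
```

Widening the window recovers some uncovered genes but also assigns SNPs to genes they may not tag, so the choice trades false negatives against misattribution.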

Acknowledgements

Datasets
Genetic data were available from the NIMH Center for Collaborative Genetic Studies.
NIMH Study 65, also known as AGP or TASC (PI: Gallagher), deposited genotype data in four sets (cleaned and raw, stage 1 and stage 2). NIMH then combined the cleaned stage 1 and stage 2 data into one dataset named Autism Dataset 4, in the PLINK file format. The n=2893 records in Dataset 4 occur in n=964 families (99% trios), of which n=935 families have one or more probands with a diagnosis ranging from strict to broad autism (see the documentation for definitions). The AGP Simplex Collection (TASC) was funded by an award from Autism Speaks and by NIMH funding support for the repository development. The principal investigator and co-investigators on this study were Louise Gallagher, [...]

Table note: ID=molecular pathway ID; Description=description of the pathway; GeneRatio=number of genes in the pathway in the selected database/number of genes overall in the selected database; BgRatio=number of genes in the pathway in international datasets/number of genes overall in international datasets; p.adjust=p-values after Bonferroni correction; qvalue=p-values after false discovery rate correction for multiple testing.