A Critical Review of the Impact of Candidate Copy Number Variants on Autism Spectrum Disorders

Autism spectrum disorder (ASD) is a heterogeneous neurodevelopmental disorder (NDD) caused by genetic, epigenetic, and environmental factors. Recent advances in genomic analysis have uncovered numerous candidate genes with common and/or rare mutations that increase susceptibility to ASD. In addition, there is increasing evidence that copy number variations (CNVs), single nucleotide polymorphisms (SNPs), and rare de novo variants disrupt neurodevelopmental pathways in various ways. The overall rate of CNVs in patients with autism is 10%-20%, of which 3%-7% can be detected cytogenetically. Although the role of submicroscopic CNVs in ASD has been investigated recently, their association with specific genomic loci and genes has not been systematically studied. In this review, we focus on 47 ASD-associated CNV regions and their related genes. We identify 1,632 protein-coding genes and long non-coding RNAs (lncRNAs) within these regions, 552 of which are significantly expressed in the brain. Using a list of ASD-associated genes from SFARI, we detect 17 regions containing at least one known ASD-associated protein-coding gene. Of the remaining 30 regions, we identify 24 that contain at least one protein-coding gene with brain-enriched expression and a nervous system phenotype in mutant mice, and one lncRNA with both brain-enriched expression and upregulation during iPSC-to-neuron differentiation. Our analyses highlight the diversity of genetic lesions in CNV regions that contribute to ASD and provide new genetic evidence that lncRNA genes may contribute to the etiology of ASD. In addition, the discovered CNVs can serve as a valuable resource for diagnostic facilities, therapeutic strategies, and research on variant prioritization.


Introduction
Autism spectrum disorder (ASD) is a highly heterogeneous and complex neurodevelopmental disorder (NDD) characterized by varying degrees of impaired social interaction and speech and language development, repetitive patterns of activity, and restricted interests [1]. ASD is often accompanied by other psychiatric and neurological conditions, including schizophrenia, sleep disturbance, epilepsy, attention-deficit/hyperactivity disorder (ADHD), anxiety, and intellectual disability (ID) [2,3]. The global prevalence of ASD is estimated at about 1 in 100 children. According to the Centers for Disease Control and Prevention (www.cdc.gov/ncbddd/autism/data.html, retrieved on 2 August 2022), ASD is four to five times more common among males than among females [4][5][6]. ASD development is influenced by a number of genetic and environmental factors. The contribution of genetics to the etiology of autism is undoubtedly significant and involves several factors, including single nucleotide polymorphisms (SNPs) and changes within non-coding and regulatory regions such as long non-coding RNAs

Results and Discussions
Overall, we identified 31 known ASD-associated genes in 17 CNV regions and approximately 300 novel candidate genes across the 47 ASD-associated CNV regions (Table 1 and Figure 1); selected examples are discussed in detail in the following sections.
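The prioritization logic summarized in the abstract can be expressed as a simple filter over the genes that overlap each CNV region: SFARI-listed coding genes are reported as known genes, and the remaining candidates must be brain-enriched and, in addition, have a nervous system phenotype in mutant mice (coding genes) or be upregulated during iPSC-to-neuron differentiation (lncRNAs). The Python sketch below is illustrative only: the `Gene` record, its boolean annotation fields, the 1-based inclusive coordinates, and the toy genes are hypothetical stand-ins, not the actual FANTOM CAT, mouse phenotype, or SFARI data formats used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record; field names are illustrative only."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    is_coding: bool
    brain_enriched: bool = False
    mouse_ns_phenotype: bool = False  # NS phenotype in mutant mice
    ipsc_neuron_up: bool = False      # upregulated in iPSC-to-neuron differentiation
    sfari: bool = False               # listed as ASD-associated in SFARI


def overlaps(gene: Gene, chrom: str, start: int, end: int) -> bool:
    """True if the gene shares at least one base with the region."""
    return gene.chrom == chrom and gene.start <= end and start <= gene.end


def prioritize(genes, chrom, start, end):
    """Return (known SFARI coding genes, novel candidates) for one CNV region."""
    in_region = [g for g in genes if overlaps(g, chrom, start, end)]
    known = [g.name for g in in_region if g.is_coding and g.sfari]
    candidates = [
        g.name
        for g in in_region
        if not g.sfari
        and g.brain_enriched
        # coding genes need a mouse NS phenotype; lncRNAs need iPSC upregulation
        and (g.mouse_ns_phenotype if g.is_coding else g.ipsc_neuron_up)
    ]
    return known, candidates


# Toy, invented records on a made-up coordinate scale.
genes = [
    Gene("GENE_A", "chr2", 100, 500, True, brain_enriched=True, sfari=True),
    Gene("GENE_B", "chr2", 450, 900, True, brain_enriched=True, mouse_ns_phenotype=True),
    Gene("LNC_C", "chr2", 600, 700, False, brain_enriched=True, ipsc_neuron_up=True),
    Gene("GENE_D", "chr2", 2000, 3000, True, brain_enriched=True, mouse_ns_phenotype=True),
]
known, candidates = prioritize(genes, "chr2", 400, 800)
print(known, candidates)  # ['GENE_A'] ['GENE_B', 'LNC_C']
```

Keeping each annotation as a separate flag makes it straightforward to report which criterion a gene satisfied, mirroring how the regional counts in the following sections are broken down into coding genes and lncRNAs.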

1p36.22-p36.33 Deletions
Our analysis identified a significantly deleted region at the 1p36.22-p36.33 locus (chr1:10001-10270615) that spans 10.2 Mb and contains at least 306 genes. A variety of clinical manifestations, including congenital abnormalities, developmental delay (DD), ID, muscular hypotonia (MH), and seizures, have been observed in patients with 1p36 deletions of different sizes, both terminal and interstitial [25][26][27]. Because patients with deletions of different sizes share many of these clinical features, correlating genotype with phenotype at this locus is complicated.
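The region sizes quoted in this section follow directly from the genomic coordinates. A minimal Python sketch, assuming the `chrN:start-end` strings are 1-based and inclusive (the convention is not stated here), so that length = end - start + 1; `parse_region`, `region_length`, and `format_size` are hypothetical helpers:

```python
import re

# Matches strings such as "chr1:10001-10270615" (commas are stripped first).
REGION_RE = re.compile(r"^(chr[0-9XYM]+):(\d+)-(\d+)$")


def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'chrN:start-end' into (chromosome, start, end)."""
    match = REGION_RE.match(region.replace(",", ""))
    if match is None:
        raise ValueError(f"not a region string: {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError(f"end precedes start in {region!r}")
    return chrom, start, end


def region_length(region: str) -> int:
    """Length in base pairs, assuming 1-based inclusive coordinates."""
    _, start, end = parse_region(region)
    return end - start + 1


def format_size(n_bp: int) -> str:
    """Human-readable size in Mb (for >= 1 Mb) or kb."""
    return f"{n_bp / 1e6:.2f} Mb" if n_bp >= 1_000_000 else f"{n_bp / 1e3:.2f} kb"


for r in ("chr1:10001-10270615", "chr2:242930600-243007359", "chr22:43000056-51224208"):
    print(r, format_size(region_length(r)))
# chr1:10001-10270615 10.26 Mb
# chr2:242930600-243007359 76.76 kb
# chr22:43000056-51224208 8.22 Mb
```

These values agree with the 10.2 Mb, 76.76 kb, and 8.2 Mb figures given in the text; under a half-open convention each length would be 1 bp shorter, which does not change the rounded sizes.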
We identified 57 coding genes and 44 lncRNAs within this region with significantly enriched expression in the brain. Focusing on genes with possible roles in neurodevelopment and ASD pathogenesis, we identified 19 coding genes with a nervous system phenotype in mutant mouse models, such as DNAJC11, DVL1, VWA1, GNB1, GABRD, PRKCZ, and RERE, as well as 16 lncRNAs with dynamic expression during induced pluripotent stem cell (iPSC)-to-neuron differentiation, including RP1-140A9, TP73-AS1, CATG00000045106, CATG00000004170, CATG00000004816, and CATG00000004823 [28][29][30].
Genes expressed in brain tissue account for the majority of known genetic contributors to autism [16,31]. Among the brain-enriched genes, DVL1 (dishevelled segment polarity protein 1) has a putative link to ASD and is significantly deleted in autistic individuals. Lijam et al. previously reported impaired social interaction (a characteristic of autism) in a mouse model lacking Dvl1 [32]. According to our analysis using the FANTOMCAT database [28], DVL1 is highly expressed in the nervous system (NS) of the mouse [29,30], suggesting that it may play a role in neurodevelopment and ASD. RERE (arginine-glutamic acid dipeptide repeats) is another brain-enriched gene that is significantly deleted in autistic patients [26]. Studies in zebrafish and mouse models show that mutation and haploinsufficiency of RERE lead to NDDs, ID, DD, and kidney problems [26,29,30,[33][34][35]. Because RERE mutation carriers and autistic individuals share some phenotypes, this gene is a plausible contributor to ASD. We also found a significant deletion of the lncRNA RP11-206L10 at 1p36. This brain-enriched lncRNA has also been reported within ID-associated CNVs [36], implying a possible function in neurodevelopmental regulation.

1q21.1 rearrangements
At this locus, CNVs have been detected in patients with a broad range of developmental defects and neurological disorders, including DD, ID, schizophrenia, and ASD [37]. Our investigation of the 1q21.1 locus revealed two structural abnormalities: a 1.6 Mb region (chr1:146113643-147727385) and a 1.75 Mb region (chr1:146113643-147870095) that are significantly duplicated and deleted, respectively, in autistic individuals.
This locus contains 10 brain-enriched genes (5 coding genes and 5 lncRNA genes), all of which are significantly deleted and/or duplicated in ASD patients.
Figure 1. ASD copy number variations on human chromosomes. Blue: autistic duplications; red: autistic deletions; green: control copy number variations. The black dashed line indicates our threshold for significant regions (P-value better than the 99.9999th percentile of 500,000 permutations). The x-axis indicates genomic loci and the y-axis indicates the log10 P-value. P-values were calculated with a one-tailed Fisher's exact test.
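The caption names two statistical ingredients: a one-tailed Fisher's exact test per locus and a threshold derived from 500,000 permutations. The Python sketch below shows one plausible way to combine them; it is not the study's SNATCNV implementation. Assumptions, none stated in the caption: each locus is summarized as a 2×2 table of ASD versus control samples with and without an overlapping CNV (deletions and duplications would be tested separately), the null distribution is generated by shuffling case/control labels, and each permutation contributes its smallest P-value across loci. The toy data are invented, and far fewer permutations are used than in the figure.

```python
import math
import random


def fisher_one_tailed(a: int, b: int, c: int, d: int) -> float:
    """One-tailed (greater) Fisher's exact test for the table [[a, b], [c, d]].

    a = cases with the CNV, b = cases without it,
    c = controls with the CNV, d = controls without it.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    cases, carriers, n = a + b, a + c, a + b + c + d
    total = math.comb(n, carriers)
    tail = sum(
        math.comb(cases, x) * math.comb(n - cases, carriers - x)
        for x in range(a, min(cases, carriers) + 1)
    )
    return tail / total


def locus_pvalue(carrier_flags, is_case) -> float:
    """Build the 2x2 table for one locus and return its one-tailed P-value."""
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    a = sum(1 for f, case in zip(carrier_flags, is_case) if f and case)
    c = sum(1 for f, case in zip(carrier_flags, is_case) if f and not case)
    return fisher_one_tailed(a, n_cases - a, c, n_controls - c)


def permutation_min_pvalues(carrier_matrix, is_case, n_perm, seed=0):
    """Shuffle case/control labels n_perm times; keep the smallest P-value
    across all loci in each permutation (a genome-wide null distribution)."""
    rng = random.Random(seed)
    labels = list(is_case)
    mins = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        mins.append(min(locus_pvalue(row, labels) for row in carrier_matrix))
    return mins


def permutation_threshold(null_pvalues, percentile=99.9999):
    """Lower (100 - percentile) % quantile of the null P-values; an observed
    P-value below it is better than roughly `percentile` % of them."""
    ordered = sorted(null_pvalues)
    index = int(len(ordered) * (100.0 - percentile) / 100.0)
    return ordered[min(index, len(ordered) - 1)]


# Toy data (invented): 6 cases, 6 controls, 3 loci; 1 = sample carries a CNV.
is_case = [True] * 6 + [False] * 6
carrier_matrix = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # carriers are all cases
    [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0],
]
observed = [locus_pvalue(row, is_case) for row in carrier_matrix]
cutoff = permutation_threshold(
    permutation_min_pvalues(carrier_matrix, is_case, n_perm=1000), percentile=95.0
)
significant = [i for i, p in enumerate(observed) if p < cutoff]
print(observed, cutoff, significant)
```

Taking the minimum P-value per permutation is one common way to control for testing many loci at once; with 500,000 permutations and a 99.9999th percentile, the cutoff effectively falls at the most extreme permutation values.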
The 1q21.1 microdeletion has been associated with phenotypes such as congenital heart disease (CHD) [38], schizophrenia [39], ID, and microcephaly [27], whereas 1q21.1 microduplications have generally been associated with ID, macrocephaly, and ASD [27,37,40]. Our results indicate that the brain-enriched lncRNA RP11-337C18, which is significantly duplicated/deleted and dynamically expressed in an iPSC model of neuron differentiation, has a potential role in neurodevelopment and is likely to be important in ASD. Furthermore, BCL9 (B-cell CLL/lymphoma 9), one of the brain-enriched coding genes at this locus, plays a critical role in canonical Wnt signal transduction by promoting the transcriptional activity of β-catenin [41,42]. Association studies have indicated that Wnt/β-catenin signaling dysfunction in neurons may contribute to ASD phenotypes [42,43]. It has also been demonstrated that BCL9 modulates Wnt signaling through its interaction with ARX (a causative gene for autism) [44]. This evidence is consistent with our results and suggests that BCL9 may be involved in ASD.

2p16.3 Microdeletion
A 253 kb region (chr2:51105853-51359022) at 2p16.3 was significantly deleted in ASD individuals. This region contains one protein-coding gene, NRXN1, and one lncRNA, AC007682.1; both show brain enrichment, and NRXN1 mutations also have physiological and morphological effects in mutant mice [29,30]. NRXN1 (neurexin 1), a member of the neurexin family of presynaptic cell-adhesion proteins that function in synaptic connections, has already been linked to ID and DD. This gene has also been implicated in the pathogenesis of psychiatric disorders such as schizophrenia and autism, with incomplete penetrance and variable expressivity of mutations [45][46][47][48]. AC007682.1, the only NS-enriched lncRNA in this region, may also have a role in autism pathogenesis.

2q22.3-q23.1 Microdeletions
In autistic individuals, we found a significantly deleted 326 kb region at 2q23.1 containing 5 brain-enriched genes (3 coding genes and 2 lncRNA genes). 2q23.1 microdeletions ranging in size from 0.2 Mb to 5.5 Mb cause several phenotypes and disorders, such as dysmorphic features, ID, seizures, behavioral problems, and ASD [49][50][51][52]. MBD5 and ACVR2A are two of the three brain-enriched coding genes within the identified region that show dynamic expression during iPSC-to-neuron differentiation and cause morphological changes in mutant mice [29,30]. MBD5 (methyl-CpG binding domain protein 5) belongs to the methyl-CpG-binding domain protein family, which includes MECP2, a known ASD-associated gene that is mutated in Rett syndrome [51,53,54]. MBD5 has been proposed to be a dosage-sensitive gene essential for normal development, with autism-related mutations [52,[55][56][57].

2q37.3 Microdeletion
The 2q37 deletion has been associated with brachydactyly mental retardation syndrome (BDMR), a condition characterized by ID, DD, behavioral abnormalities, and ASD [58,59]. The 2q37.3 microdeletion has also been linked to autism [60] and miscarriage [61]. These findings point to potentially important regulatory genes in this region with possible roles in the progression of NDDs. Our analysis identified a 76.76 kb region at 2q37.3 (chr2:242930600-243007359) that is significantly deleted in autistic individuals and contains three lncRNA genes (AC131097.3, AC093642.3, and AC093642.4), which merit investigation for possible involvement in ASD.
PAK2 (p21 (RAC1) activated kinase 2) is a serine/threonine kinase that regulates actin cytoskeleton dynamics, with potential signaling roles in cortical neurons [68][69][70]. It is highly expressed in the fetal brain and may contribute to neuronal differentiation [69,70]. PAK2 has also been reported to play a crucial role in brain development and autism progression [64,71].

FBXO45 (F-box protein 45) is a ubiquitin ligase involved in the growth and formation of synapses [72,73] that also contributes to the progression of schizophrenia and autism [74]. In addition, studies in mice suggest roles for Fbxo45 in neuronal migration and development [75]. Overall, these findings and a genotype-phenotype correlation study of the 3q29 deletion indicate that FBXO45 may be a candidate gene involved in the progression of neuropsychiatric disorders [63].
DLG1 (discs large MAGUK scaffold protein 1) is a paralogue of DLG3, a gene whose mutations are involved in the pathogenesis of ID [64]. DLG1 is required for normal development and is a component of a core transsynaptic complex (Neurexin/Neuroligin/DLG/SAPAP/Shank) whose mutations probably increase susceptibility to autism [74,76,77].

4q16.3 Microdeletion
The 4q16.3 deletion has been reported in patients with Wolf-Hirschhorn syndrome (WHS) [78]. We identified 15 genes (coding and non-coding) in a significantly deleted 367 kb region at 4q16.3 (chr4:1705715-2073645), five of which are coding genes highly expressed in the brain. Two of these genes, FGFR3 and NAT8L, appear to be potential candidates for neurodevelopmental involvement, since both are highly expressed in the NS and show phenotypic manifestations in mutant mice [29,30]. Gene ontology analysis indicates that FGFR3 is strongly associated with seizures and congenital abnormalities [79][80][81], which are very common in ASD, as well as with the STAT cascade, which has roles in neural development [82]. This finding suggests a possible role for FGFR3 in autism.

6p25.3 rearrangements
Our scan for ASD-associated CNVs revealed a novel 80.3 kb region (chr6:259528-339802) in the 6p25.3 chromosomal band that is significantly deleted and duplicated. The region contains a single gene, DUSP22 (dual specificity phosphatase 22), which has previously been shown to contribute to autism pathogenesis through the JNK signaling pathway [83][84][85][86]. JNK signaling is believed to play a crucial role in the development of neuropsychiatric conditions such as ID, schizophrenia, and ASD [84]. The same study, in line with other findings [87], also indicated that DUSP22 is an imprinted gene that is maternally expressed in the adult prefrontal cortex [86], and further suggested it as a possible contributor to the pathogenesis of ASD. Heterozygous deletion of DUSP22 has also been reported in ASD [88].

7q11.23 rearrangements
We also identified deletions and duplications of 1.5 to 1.8 Mb at the 7q11.23 locus, which cause Williams-Beuren syndrome (WBS; MIM 194050) and 7q11.23 duplication syndrome (Dup7; MIM 609757), respectively [27,[88][89][90]. WBS is 6-10 times more prevalent in autism than in the general population [88], indicating that this deletion may play a crucial role in autism pathogenesis. In autistic individuals, we found a significantly deleted/duplicated 1.4 Mb region in the 7q11.23 chromosomal band (chr7:72742064-74142064) encompassing 51 genes (26 coding and 25 lncRNA genes). After narrowing the list to the significantly deleted/duplicated brain-enriched genes at this locus (9 coding and 8 lncRNA genes), we identified STX1A, GTF2I, GTF2IRD1, CLIP2, FZD9, LIMK1, and EIF4H, which are associated with an NS phenotype in mutant mice, and CATG00000096069, CATG00000092613, and CATG00000096089, which display dynamic expression in the iPSC model, as genes of possible importance in ASD. Our analysis of STX1A in particular suggests a role for this gene in autism, since it is highly expressed in brain tissue, is significantly deleted and duplicated in ASD, and affects the NS when mutated in mice.
STX1A (syntaxin 1A) modulates serotonin transporter activity; STX1A is overexpressed in high-functioning autism and may be involved in the pathogenesis of ASD [91][92][93]. GTF2I (general transcription factor 2I) and CLIP2 (CAP-Gly domain containing linker protein 2) are other NS-enriched genes that may play a role in the etiology and manifestation of autism [94,95].

8p23.1 rearrangements
We also examined various deletions and duplications at 8p23.1, a locus highly susceptible to genomic rearrangements. Different groups have reported rearrangements ranging from 3.6 Mb to 6.5 Mb [96][97][98][99][100]. Most interstitial deletions of 8p23.1 arise de novo [100,101], whereas terminal deletions are inherited [100]. We narrowed down the CNV regions and uncovered four regions at 8p23.1, ranging in size from tens of kilobases to about 3 Mb, that were significantly deleted (chr8:10657597-10695288, chr8:7268819-7366446) or duplicated (chr8:7721060-7752586, chr8:8650757-11629240) in autistic individuals. Deletions in these regions have been associated with an increased risk of NDDs, psychiatric conditions, heart abnormalities, hyperactivity, and mild to moderate ID [27], whereas duplications have been linked to abnormal facial shape, behavioral/psychiatric abnormalities, learning and speech problems, heart malformations, and ID [27]. In these four regions, we detected only 27 significantly duplicated brain-enriched genes (9 coding and 18 lncRNA genes). Although no ASD-associated gene has been reported in these regions, our study points to a number of candidate genes that are either brain-enriched, such as TNKS, PINX1, MSRA, XKR6, FAM167A, and NEIL2, or associated with NS phenotypes in mutant mice, such as RP1L1 and GATA4 [28][29][30]. In addition, we found that the lncRNA RP11-1E4.1 is significantly expressed in the brain and dynamically expressed during iPSC-to-neuron differentiation, suggesting an effect on neurodevelopment and a possible contribution to ASD phenotypes.
MSRA (methionine sulfoxide reductase A) is located in a significantly duplicated region (chr8:8650757-11629240). It has a regulatory role in human fetal brain development [102] and a protective function against oxidative stress [103], which has been proposed as a causative mechanism in ASD [104]. MSRA is also associated with bipolar disorder [105] and schizophrenia [106]. Together with our results, these findings point to a possible association between MSRA and autism.
PINX1 (PIN2/TRF1-interacting telomerase inhibitor 1) is a regulator of telomere integrity that is required for chromosomal stability [107,108]. Multiple lines of evidence suggest that some genetic variations involved in ASD are linked to telomere shortening [109,110]. PINX1 may therefore be a candidate gene for telomere shortening in ASD patients.
EHMT1 (euchromatic histone lysine methyltransferase 1) has been reported as a genetic cause, through its haploinsufficiency, of the ID syndrome associated with subtelomeric deletion of 9q34.3 [115,116]. This gene has been implicated in autistic-like symptoms in knockout mice [117] and, in the mosaic state, in psychological problems, particularly mood disorders and ASD [118].
CACNA1B (calcium voltage-gated channel subunit alpha1 B) belongs to the calcium channel gene family, whose members modulate neural function and may contribute to ASD [119,120]. According to gene ontology analysis, CACNA1B is associated with neurological speech impairment, one of the common problems in ASD. Together with our results, these findings suggest a possible involvement of this gene in the progression of ASD.
GRID1 (glutamate receptor ionotropic, delta-1) has been suggested to be associated with schizophrenia [123] and contributes to the genetics of autism [124][125][126]. In addition, BMPR1A (bone morphogenetic protein receptor type-1A) is significantly deleted in patients with ASD, and loss-of-function mutations in this gene cause juvenile polyposis syndrome (JPS) [127]. Microdeletions involving BMPR1A are associated with severe polyposis and childhood malignancies as well as some autism-related phenotypes such as cardiac abnormalities, macrocephaly, and DD [128]. Another brain-enriched gene in this region, NRG3 (neuregulin 3), is thought to be a schizophrenia susceptibility gene [129], and 10q22-q23 microdeletions including this gene have been linked to NDDs such as cognitive impairment, DD, and autism [129,130].

11p14.1-p13 deletions
There is evidence that the 11p14.1 microdeletion is associated with obesity, autism, and severe developmental delay, with additional clinical features in individuals carrying larger deletions [131]. Using SNATCNV, we found a significantly deleted 2 Mb region (chr11:29635361-31653568) at 11p14.1-p13 containing 19 genes (8 coding and 11 lncRNA genes), 14 of which were NS-enriched (7 coding and 7 lncRNA genes). Among these genes, we identified KCNA4 (potassium voltage-gated channel subfamily A member 4) as the most significantly deleted gene with an effect on the mouse NS when mutated [29,30]. This gene modulates the duration and shape of the cardiac action potential and heart rate [132] as well as neurotransmitter release [133,134]. Although no study has yet directly linked KCNA4 to autism, dysfunction of neurotransmitter systems and altered heart rate are known to be associated with autism [135][136][137][138], which may suggest a role for KCNA4 in ASD. ELP4 (elongator acetyltransferase complex subunit 4) is another NS-enriched gene residing in the 11p14.1-p13 CNVs. Deletion of ELP4 has been shown to be a likely major contributor to various NDDs, ranging from language impairment to epilepsy and ASD [139].

13q34 Deletion
The 13q34 deletion has been associated with a high risk of NDDs, including epilepsy and mild ID, as well as DD [140][141][142]. Our bioinformatics analysis of ASD-related CNVs using SNATCNV revealed a significantly deleted 3.38 Mb region at 13q34 (chr13:111699708-115085141) in autistic patients, encompassing 42 coding genes and 42 lncRNA genes. Among the significantly deleted genes enriched in the brain and NS, CATG00000016319, LINC00403, CATG00000018119, CATG00000018133, CATG00000016366, and CATG00000018138 displayed dynamic expression during neural differentiation of iPSCs, and ATP11A has been shown to affect the mouse NS when mutated [29,30], possibly reflecting their importance in the progression of NDDs.
A deletion of 13q34 harboring ATP11A, as well as MCF2L, F7, F10, PROZ, PCID2, CUL4A, and LAMP1, all within our significantly deleted region, was detected in an Italian family with bleeding diathesis, multiorgan involvement, and language delay, suggesting that these genes may underlie ASD-related manifestations such as language delay [143]. In addition, UPF3A, another brain-enriched gene that is significantly deleted in ASD, plays a compensatory role in patients with loss of UPF3B function [144]. UPF3B is a paralog of UPF3A with known implications in ID, childhood-onset schizophrenia, ADHD, and autism [145].
UBE3A (ubiquitin protein ligase E3A), one of the genes located at 15q11.2-q13.1, is an imprinted multifunctional gene that acts as both a ubiquitin ligase and a transcriptional co-activator, with regulatory roles at synapses [167][168][169]. Copy number changes in this gene lead to NDDs such as 15q11.2-q13.3 duplication syndrome and ASD [170], consistent with our finding that this gene may be involved in ASD. Linkage disequilibrium at UBE3A has also been reported in families with autism [154]. GABRB3, GABRA5, and GABRG3 encode gamma-aminobutyric acid (GABA) receptor subunits that play crucial roles in neurodevelopment, and deficiencies in their function contribute to autism and autism-related NDDs [171]. CHRNA7 (cholinergic receptor nicotinic alpha 7 subunit), another gene at the 15q13.3 locus, encodes a ligand-gated ion channel that mediates signal transduction at synapses. Dosage sensitivity of this gene appears to be involved in cognitive and behavioral manifestations [172,173]. Furthermore, reduced expression of CHRNA7 has been reported in the frontal cortex of individuals with Rett syndrome and autism [152]. Finally, NIPA1 (NIPA magnesium transporter), which is significantly deleted in autistic individuals and encodes a transporter that mediates Mg2+ uptake, is likely to be implicated in some NDDs, including ASD [174]. Van der Zwaag et al. suggested NIPA1 and CYFIP1 (another gene located in the 15q11.2 CNVs) as autism risk genes with functions in axonogenesis and synaptogenesis [175]. Together with our results, this evidence strongly supports the relevance of these genes to ASD.

16p (16p13.11; 16p12.2; 16p11.2) rearrangements
16p is an autism-susceptibility locus with several duplications and deletions [176][177][178]. The 16p11.2 and 16p13.11 rearrangements have been associated with a wide range of neurological disorders such as seizures/epilepsy, ID, and autism [179][180][181][182][183], while facial abnormalities, ID, macrocephaly, and speech articulation abnormalities have been linked to 16p11.2 and 16p12.2 deletions [27,179]. In addition, several manifestations, including speech articulation abnormalities and microcephaly, have been detected in 16p11.2 duplication carriers [179]. Consistent with these findings, our bioinformatics analysis revealed six significant regions in autistic individuals: two deleted regions at 16p11.2 (chr16:28822499-29052499 and chr16:29517499-30367499) of 0.23 Mb and 0.85 Mb, respectively; a 0.68 Mb duplication at 16p11.2 (chr16:29517499-30202499); a 0.52 Mb deletion at 16p12.2 (chr16:21942499-22462224); a 1 Mb deletion at 16p13.11 (chr16:15248707-16292499); and a 0.8 Mb duplication at 16p13.11 (chr16:15502499-16292499).
In total, we identified 60 genes (47 coding genes and 13 lncRNA genes) highly expressed in the NS, of which 35 are significantly deleted and 25 are significantly duplicated in ASD patients. Our study also shows that CATG00000028919 and CATG00000028920, the only two significantly deleted brain-enriched lncRNA genes at 16p12.2, are dynamically expressed during iPSC-to-neuron differentiation. Notably, KCTD13, PRRT2, SEZ6L2, DOC2A, TBX6, and MAPK3 are NS-expressed genes that affect the neural system of mutant mice [29,30] and presumably play roles in NDDs such as autism. In addition, TAOK2, another gene in the 16p11.2 CNV region, is implicated in NDDs owing to its essential role in dendrite morphogenesis [184,185]. KCTD13 (potassium channel tetramerization domain containing 13), a gene in the 16p11.2 CNVs, has a crucial role in regulating neurodevelopmental processes [186,187]. A study in zebrafish and mouse models indicates that KCTD13 regulates head size, causing microcephaly and macrocephaly in the case of duplication and deletion, respectively [186]. The same study reported that changes in KCTD13 dosage due to deletions and de novo structural alterations are consistently related to autism [186,188], which supports our finding that this gene may contribute to ASD. MAPK3 (mitogen-activated protein kinase 3), another brain-enriched gene, is a member of the MAP kinase family, which regulates brain growth, synaptic target selection, and connectivity [189]. Abnormalities in the MAP kinase pathway have been linked to different developmental disorders, including autism [190,191]. Our investigation also revealed that SH2B1, an NS-enriched gene with roles in leptin and insulin signaling, glucose homeostasis, and body weight regulation, is significantly deleted in autism, suggesting it as a possible candidate gene at the 16p11.2 CNVs for obesity as a feature of ASD [192][193][194].
Furthermore, deletion of the 17p11.2 locus results in Smith-Magenis syndrome (SMS), which features hyperactivity, ID, MH, self-mutilation, short stature, sleep disturbance, and stereotypic behavior [27,201], whereas 17p11.2 duplication results in Potocki-Lupski syndrome (PTLS), with autism, hyperactivity, short attention span, and short stature [27,202]. Our bioinformatics analysis to identify other possible autism-related genes in ASD CNVs revealed five significantly deleted or duplicated regions: two duplications at 17p13.3 of 635 kb (chr17:1038189-1673212) and 1.2 Mb (chr17:2302359-3502250), a 1.7 Mb deletion at 17p13.3 (chr17:911295-2593250), a 1.93 Mb duplication at 17p11.2 (chr17:16446387-18380166), and a 1.5 Mb deletion at 17p11.2 (chr17:16789275-18299275). In total, 259 genes were detected, 92 of which were highly expressed in the brain (59 coding genes and 33 lncRNA genes); of these, 49 were significantly duplicated and the rest significantly deleted. To further narrow the list of possible autism-related genes in these regions, we identified the significantly duplicated lncRNA genes CATG00000030828, CATG00000032809, RP11-74E22.3, and CATG00000030472 and the significantly deleted lncRNAs CATG00000032796 and CATG00000032809 in the 17p13.3 CNV regions, which are expressed during iPSC-to-neuron differentiation, possibly indicating roles in neurodevelopmental processes and ASD. Likewise, we identified 16 coding genes mapped to significantly deleted and duplicated regions that affect the mouse NS when mutated [29,30], such as ASPA, TRPV1, RTN4RL1, YWHAE, RASD1, RAI1, and PAFAH1B1, as candidate genes with probable involvement in ASD. TRPV1 (transient receptor potential cation channel subfamily V member 1) is a brain-enriched gene at 17p13.3 that is significantly duplicated. This gene has been shown to be regulated by SHANK3 [203], one of the best-known autism-implicated genes, which functions in synapse formation, maturation, and maintenance [204][205][206] and is located in the 22q13.3 CNVs. In sensory neurons, TRPV1 activation affects pain perception [207], and abnormal pain sensitivity is commonly observed in ASD [208]. TRPV1 and its adjacent gene, TRPV3, could therefore be implicated in this phenotype in ASD.
RAI1, another ASD-related gene in the 17p11.2 CNV regions, is a dosage-sensitive gene thought to be responsible for most of the phenotypes relevant to SMS and PTLS [209,210]. RAI1 also shows dynamic expression during iPSC-to-neuron differentiation, which, together with our analysis, suggests its possible implication in autism.
Finally, PAFAH1B1 (platelet activating factor acetylhydrolase 1b regulatory subunit 1, also known as lissencephaly gene 1 (LIS-1)), located in the 17p13.3 CNV regions, is another candidate ASD-related gene owing to its dynamic expression during iPSC-to-neuron differentiation and its impact on the neural system of mutant mice [29,30]. Mutation or deletion of PAFAH1B1 causes the lissencephaly seen in Miller-Dieker syndrome [215], while its duplications have been linked to hypotonia and mild developmental delay [214].

17q (17q12; 17q21.31) rearrangements
Speech and motor delay, seizures, vision problems, and behavioral abnormalities are among the features associated with the recurrent 17q12 duplication, which typically shows reduced penetrance and variable expressivity [216]. The 17q12 deletion confers a high risk of schizophrenia and ASD [217] and is also associated with other conditions such as liver cancer, diabetes mellitus, and multiple renal cysts [27,218]. In addition, the 17q21.31 microdeletion has been reported to cause Koolen-de Vries syndrome (KdVS), a disorder with symptoms such as DD/ID, neonatal/childhood hypotonia, dysmorphisms, congenital malformations, and speech and language delay [27,219], whereas the 17q21.31 duplication has been reported in a few patients with frontotemporal lobar degeneration (FTLD) and schizophrenia [220,221].
Our bioinformatics analysis revealed three 17q regions harboring CNVs that are significantly rearranged in ASD: a 40 kb duplication at 17q12 (chr17:34815887-34856055), a 1.4 Mb deletion at 17q12 (chr17:34815887-36225887), and a 0.46 Mb deletion at 17q21.31 (chr17:43704217-44164182). Eight of the 24 coding genes and nine of the 19 lncRNA genes detected in these significantly duplicated/deleted regions are highly expressed in the brain. Several of these genes may affect neurodevelopment and autism, including the lncRNAs RP11-445F12, RP11-697E22, and MAPT-AS1, which show dynamic expression during iPSC-to-neuron differentiation, as well as LHX1, CRHR1, MAPT, and KANSL1, which affect the NS of mutant mice and are significantly deleted in ASD [29,30]. LHX1 (LIM homeobox 1), mapped to the 17q12 locus, is a brain-enriched gene rearranged in various diseases, including autism [222]. A previous study [223] also suggested that LHX1 may be involved in brain development. MAPT (microtubule associated protein tau) and KANSL1 (KAT8 regulatory NSL complex subunit 1) are two genes located in the 17q21.31 CNVs. MAPT encodes the tau protein, which is involved in Alzheimer's disease (AD) and Parkinson's disease [224]. Mutations and duplications of MAPT have been associated with FTLD, whereas its deletions are linked to manifestations such as facial dysmorphism, hypotonia, and ID [225]. Another study [226] found that reducing tau protein prevents autism-like behaviors in mouse models. Haploinsufficiency of KANSL1, a chromatin modifier gene, causes the 17q21.31 microdeletion syndrome, which is characterized by ID, distinctive facial features, and hypotonia [227].
We further found that the significantly deleted genes CRKL, RTN4R, ZDHHC8, SEPT5, MAPK1, GNAZ, and BCR, as well as the significantly duplicated genes RTN4R, ZDHHC8, and SEPT5, have effects on the NS of mutant mice [29,30]. Similarly, the significantly deleted lncRNA genes XXbac-B444P24.14, DGCR5, AC000068.5, CATG00000058233, CATG00000058251, and LL22NC03-86G7, and the significantly duplicated lncRNA genes XXbac-B444P24.14 and AC000068.5, are dynamically expressed in the iPSC model. SEPT5 (septin 5), a brain-enriched gene located within an ASD CNV region, is a GTPase with roles in cytokinesis that regulates neurotransmitter release at synapses. Studies in mouse models have reported a genetic background-dependent relationship between Sept5 deficiency and ASD-related phenotypes [234,235]. Another gene, MAPK1, a member of the MAPK family, plays a role in proliferation, differentiation, apoptosis, development, and synaptic plasticity [236][237][238], and its deficiency results in brain impairment and autism-related traits [239][240][241]. Furthermore, we identified CLTCL1 (clathrin heavy chain-like 1) and GNB1L (G protein subunit beta 1 like), which are significantly deleted or duplicated in autistic individuals, as potential genes involved in ASD [242,243]. Along with these coding genes, DGCR5, a lncRNA gene with putative involvement in schizophrenia [244], has been proposed as a possible candidate autism-related gene.

22q13.2-q13.33 Deletion
The 22q13 deletion syndrome, also known as Phelan-McDermid syndrome (PMS), is caused by deletions at the 22q13.2-q13.33 locus ranging in size from 100 kb to over 9 Mb. While larger deletions have been associated with various manifestations and neurological symptoms, including delayed speech and language development, DD, macrocephaly, hyperactivity, and ID, smaller deletions (median size of 3.39 Mb) rarely cause these phenotypes and have been found in individuals with ASD [27,204]. In this study, we identified an 8.2 Mb region (chr22:43000056-51224208) that is significantly deleted in ASD and encompasses 113 coding genes and 109 lncRNA genes. Among the 48 brain-specific coding genes, SHANK3, MAPK8IP2, TTLL1, MLC1, BRD1, PANX2, MAPK11, and PLXNB2 were found to affect the NS of mutant mice [29,30], which may suggest connections between the function of these genes and neurodevelopmental processes. In addition, of the 26 brain-enriched lncRNA genes, CATG00000057830, RP3-388M5, CTA-217C2, CATG00000057926, RP3-402G11.26, CHKB-AS1, CATG00000058090, and CATG00000058095 showed dynamic expression during iPSC-to-neuron differentiation (using the FANTOMCAT database [28]), which may indicate their importance in the development of neurological conditions, although no study has yet demonstrated such a correlation.
SHANK3, which is deleted in PMS, is the best-known candidate gene for the neurological manifestations of 22q13 deletion syndrome and ASD. It plays a role in the formation, maturation, and maintenance of synapses [204,205] and is also associated with the ASD phenotype [245]. An additional candidate gene for Chr22qter-associated disorders, commonly deleted in PMS and ASD cases [246], is MAPK8IP2 (also known as IB2). It encodes a scaffold protein involved in JNK signaling [247] that is highly enriched in postsynaptic densities. Both MAPK8IP2 and SHANK3 are brain-specific genes that are highly expressed during iPSC-to-neuron differentiation and affect the NS of mutant mice [29,30], implying roles for these genes in the development of NDDs, including autism. Moreover, PANX2 is another significantly deleted NS-enriched gene at the 22q13.33 locus that has been shown to be differentially expressed in autism [248] and has effects on the mouse brain [29,30].

Xq28 Duplication
The Xq28 sub-chromosomal band is a gene-rich region (harboring about 13% of all X-linked genes) that is associated with more than 40 of the roughly 300 X-linked diseases [249]. Rett syndrome (a disorder with autistic features), Xq28 duplication syndrome, and MECP2 duplication syndrome have all been reported to be linked to CNVs at this locus [250,251]. A study using chromosomal microarray analysis revealed Xq28 to be the most common subtelomeric locus with copy number gains among patients with DD, ID, dysmorphic features, multiple congenital anomalies, and seizures [252]. Furthermore, delayed speech and language development, severe ID, MH, recurrent infections, seizures, and spasticity have been documented as manifestations of Xq28 duplication syndrome [27,251].
L1CAM (L1 cell adhesion molecule) and TMLHE (trimethyllysine hydroxylase, epsilon) are two candidate ASD risk genes with roles in NS development [261] and in the transport of fatty acids across the mitochondrial membrane [262], respectively. There is evidence that some autistic patients carry rare deletions in TMLHE [262,263]. Furthermore, SPRY3 (sprouty RTK signaling antagonist 3), another autism susceptibility gene adjacent to TMLHE, showed high expression in central and peripheral NS ganglion cells in both mouse and human [264], consistent with our finding that SPRY3 is a highly significantly duplicated NS-specific gene at Xq28. Another gene that affects cognitive function is GDI1 (GDP dissociation inhibitor 1) [265], which is the most significantly duplicated gene in this ASD region. Considering the link between ID and GDI1 duplication [251], as well as its effect on the NS of mutant mice, this gene may be a strong candidate contributor to ASD and possibly other NDDs. Finally, ATP2B3 (ATPase plasma membrane Ca2+ transporting 3), which functions in Ca2+ extrusion, global and local Ca2+ homeostasis, and intracellular Ca2+ signaling [266], is another brain-enriched gene that is significantly duplicated in autistic individuals. Ca2+ signaling pathways are well known to play a vital role in regulating neuronal development and may be involved in the pathogenesis of ASD [267]. These findings suggest that ATP2B3 might be a strong candidate gene for ASD.

Xp22.33 Duplication
Xp22.33 spans 4.4 Mb on the short arm of chromosome X, and its duplication (Xp22.33-p22.32) has been reported in different clinical conditions, including ID, DD, delayed speech and language development, short stature, and autism [27,268]. In this study, we discovered a 2.08 Mb region at Xp22.33 (chrX:619146-2700156) that is significantly duplicated in autistic individuals and contains 26 genes (15 coding genes and 11 lncRNA genes), 11 of which are brain-enriched. One of these genes, ASMT (acetylserotonin O-methyltransferase), which encodes an enzyme in melatonin synthesis, is located in the pseudoautosomal region (PAR) of the X chromosome. This gene has regulatory effects on sleep, circadian rhythms, and memory, with possible influence on cognitive function [269][270][271]. Moreover, ASMT has been identified as a possible autism susceptibility gene based on rare deleterious mutations and SNP associations among ASD patients [272,273]. In addition, ASMTL (acetylserotonin O-methyltransferase-like) and its divergent lncRNA ASMTL-AS1 (ASMTL antisense RNA 1), both significantly duplicated in ASD, are located at the Xp22.33 locus, implying a possible role for these genes in the progression of neurological disorders and autism. Furthermore, the lncRNA CATG00000112948 is one of the brain-enriched genes in this region that show dynamic expression during iPSC-to-neuron differentiation, perhaps highlighting its relevance to NDDs and ASD.

Xp22.31 Deletion
The Xp22.31 deletion has been associated with X-linked ichthyosis (XLI), a condition characterized by a variety of symptoms, including ocular changes, cryptorchidism, ID, epilepsy, hyperactivity, and autism [274,275]. We detected a 1.4 Mb region (chrX:6490000-7885155) that is significantly deleted in ASD and contains four coding genes and four lncRNA genes. Of these, STS, the causative gene for XLI that has also been implicated in various NDDs, and PNPLA4 were highly expressed in the brain. PNPLA4 (patatin-like phospholipase domain containing 4) encodes an enzyme belonging to the patatin-like family of phospholipases. In addition to having triglyceride lipase and transacylase activities, the encoded enzyme may play a role in adipocyte triglyceride homeostasis and is associated with human obesity. It has recently been suggested that this gene could be implicated in X-linked ID owing to its high expression level in the human brain [276]. This finding supports our result that PNPLA4 might play a role in NDDs and ASD.

Conclusion
Autism is a neurodevelopmental disorder whose causes are both genetic and environmental. Research continues to explore ASD risk factors, and our findings provide a guide for further etiological investigation, especially of its genetic background. Genetic variation has been demonstrated to account for most of the heritability of ASD, and CNVs are responsible for a significant proportion of ASD cases.
In this study, the SNATCNV tool, together with evidence from bioinformatics analysis and a literature search, highlighted a strong CNV contribution to ASD. We also found that both deletion and duplication CNVs can disrupt genes. Autism-associated CNVs ultimately affect neuronal development by impairing gene expression in the brain and have a phenotypic impact on the NS of mutant mice. As a result of prior studies, approximately 300 candidate genes (coding and non-coding) have now been linked to ASD, most of them brain-enriched lncRNA genes. An iPSC model of neuron differentiation showed that approximately 100 lncRNAs were dynamically expressed, indicating that they may play a critical role in neurodevelopmental processes. These findings are in line with previous studies [277][278][279]. Because lncRNAs can affect gene expression acting both in cis and in trans [280,281], lncRNAs may contribute to ASD by regulating the expression of protein-coding genes involved in this disorder. This study also proposes a list of ASD-associated lncRNAs that can be explored further to determine their potential function, for example by analyzing chromatin interaction data with publicly available tools such as MaxHiC [282] and MHiC [283].
In addition, this research adds valuable data to the growing list of CNVs associated with autism by increasing the number of ASD risk genes (coding and non-coding), and, based on a system-wide analysis of the putative role of candidate coding and lncRNA genes in ASD, it can accelerate future genetic studies on ASD, including work on therapeutic pathways and genetic diagnostic tests. Given the complex role of CNVs in ASD etiology, each of the identified CNVs and ASD risk genes may be important for accurately explaining the causative factors in autistic patients and for personalized intervention.
Furthermore, this study shows that SNATCNV is a suitable and time-efficient tool for detecting ASD CNVs, because it can reliably discover CNV regions and rapidly narrow the genomic search space for causative CNVs in ASD.

Sample collection
The SFARI CNV dataset was downloaded from the public SFARI data portal (https://gene.sfari.org/autdb/CNVHome.do). This dataset contains 19,663 autistic samples with 47,189 CNVs and 6,479 healthy samples with 24,888 CNVs. Because CNVs have been reported using different genome builds, such as hg17, hg18, and hg19, we first converted all CNV coordinates to UCSC hg19 using the UCSC Lift Genome Annotations tool [284] and confirmed the locations with the NCBI Remap tool (www.ncbi.nlm.nih.gov/genome/tools/remap). We excluded 4,500 CNVs for which no genomic coordinates were provided. Further details are provided in Supplementary Table S1.

Literature search strategy
Our literature searches focused on English-language papers on humans and mice available in PubMed, Scopus, and Web of Science. We also used data and text mining techniques to extract additional related genes [285,286], and knowledge-based filtering techniques to categorize the texts retrieved by the literature search. The search terms included "Autism", "ASD", "noncoding RNA", "CNV", and "copy number variations".

Identification of significant regions
To identify regions that are recurrently duplicated or deleted in ASD, we used SNATCNV [16] and PeakCNV [23] to mine previously published ASD and control CNV data collected and combined by the Simons Foundation Autism Research Initiative (https://gene.sfari.org/database/cnv). For every position in the genome, SNATCNV counted the overlapping deletion and duplication CNVs from 19,663 ASD patients and 6,479 controls and then calculated a P-value for each position using Fisher's exact test. To identify significant regions, SNATCNV calculated P-values for 500,000 random permutations of the case/control labels to estimate the probability that an association emerges by chance. As a result, we identified 47 CNV regions that are significantly deleted or duplicated in ASD case samples (Supplementary Table S1).
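A simplified version of this per-position test fits in a few lines. The sketch below is not the SNATCNV implementation: it assumes a samples-by-positions boolean matrix marking which samples carry a CNV of one type (deletion or duplication) overlapping each position, computes the one-sided Fisher's exact P-value through the equivalent hypergeometric upper tail (`scipy.stats.hypergeom.sf`), and derives a family-wise threshold from the minimum P-value across positions under shuffled case/control labels (a standard min-P permutation scheme, assumed here). Function names, toy data, and the small permutation count are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def fisher_greater(carriers: np.ndarray, is_case: np.ndarray) -> np.ndarray:
    """One-sided Fisher's exact P-value per position (CNV enriched in cases).

    carriers: boolean matrix (samples x positions), True if the sample has a
              CNV of the chosen type overlapping that position.
    is_case:  boolean vector (samples,), True for ASD cases.
    """
    n_total = carriers.shape[0]
    n_cases = int(is_case.sum())
    n_carriers = carriers.sum(axis=0)              # carriers at each position
    case_carriers = carriers[is_case].sum(axis=0)  # carriers among cases
    # P(X >= case_carriers) with X ~ Hypergeometric(n_total, n_carriers, n_cases),
    # which equals Fisher's exact test with alternative="greater".
    return hypergeom.sf(case_carriers - 1, n_total, n_carriers, n_cases)


def min_p_threshold(carriers, is_case, n_perm=500, alpha=0.05, seed=0):
    """Family-wise P-value threshold: alpha-quantile of min-P under shuffled labels."""
    rng = np.random.default_rng(seed)
    min_p = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(is_case)  # break any case/control association
        min_p[i] = fisher_greater(carriers, shuffled).min()
    return np.quantile(min_p, alpha)


# Toy data (hypothetical): 200 cases, 200 controls, 40 positions with a 2%
# background CNV rate, plus extra case-only CNVs at positions 10-14.
rng = np.random.default_rng(1)
n_cases, n_controls, n_pos = 200, 200, 40
is_case = np.r_[np.ones(n_cases, bool), np.zeros(n_controls, bool)]
carriers = rng.random((n_cases + n_controls, n_pos)) < 0.02
carriers[:n_cases, 10:15] |= rng.random((n_cases, 5)) < 0.30

p = fisher_greater(carriers, is_case)
thr = min_p_threshold(carriers, is_case)
print("significant positions:", np.flatnonzero(p < thr))
```

With real data, the test would be run separately for deletions and duplications and with far more permutations (500,000 in this study); the vectorized hypergeometric tail keeps each permutation cheap compared with calling a 2x2 test once per position.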
The 47 CNV regions contain 856 protein-coding genes and 776 non-coding RNAs. We used normalized tag counts from the FANTOM5 expression atlas [287] to identify nervous system-enriched protein-coding genes and non-coding RNAs within the 47 CNV regions. To determine whether a gene had enriched expression in the 101 nervous system samples profiled by the FANTOM5 consortium (compared with the full set of 1,829 samples profiled by FANTOM5), we first ranked the samples by the gene's expression and selected the top 101 samples. We then used Fisher's exact test to obtain a P-value indicating whether the top 101 samples were more likely to be nervous system samples. To set a P-value threshold, we carried out 50,000 permutations randomizing the sample labels and set the threshold at the 99% confidence level. Any gene with a P-value below this permutation-derived threshold was then classified as a nervous system-enriched gene [7].
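The enrichment test above can be sketched as follows. This minimal Python illustration assumes a genes-by-samples expression matrix and a boolean nervous system (NS) label per sample; k is set to the number of NS samples (101 in FANTOM5), the one-sided Fisher's exact P-value is computed through the equivalent hypergeometric upper tail, and the cutoff is taken as the 1st percentile of P-values pooled across label permutations. That last choice is our reading of "the 99% confidence level", not a documented detail of the original pipeline; function names and toy values are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def topk_enrichment_p(expr: np.ndarray, is_ns: np.ndarray) -> np.ndarray:
    """One-sided Fisher's exact P-value per gene: are its top-k samples NS samples?

    expr:  genes x samples matrix of normalized expression (e.g. tag counts).
    is_ns: boolean vector (samples,), True for nervous system samples.
    k is the number of NS samples (101 of 1,829 in the FANTOM5 setting).
    """
    n_samples = expr.shape[1]
    k = int(is_ns.sum())
    top = np.argsort(-expr, axis=1, kind="stable")[:, :k]  # k highest samples per gene
    ns_in_top = is_ns[top].sum(axis=1)
    # P(X >= ns_in_top), X ~ Hypergeometric(n_samples, k NS samples, k drawn);
    # identical to Fisher's exact test with alternative="greater".
    return hypergeom.sf(ns_in_top - 1, n_samples, k, k)


def permutation_cutoff(expr, is_ns, n_perm=200, level=0.99, seed=0):
    """P-value cutoff: the (1 - level) quantile of P-values under shuffled labels."""
    rng = np.random.default_rng(seed)
    null_p = np.concatenate(
        [topk_enrichment_p(expr, rng.permutation(is_ns)) for _ in range(n_perm)]
    )
    return np.quantile(null_p, 1 - level)


# Toy data (hypothetical, not FANTOM5): 30 genes x 300 samples, 20 NS samples;
# genes 0-2 are made artificially NS-enriched.
rng = np.random.default_rng(2)
n_genes, n_samples, n_ns = 30, 300, 20
is_ns = np.zeros(n_samples, bool)
is_ns[:n_ns] = True
expr = rng.lognormal(size=(n_genes, n_samples))
expr[:3, :n_ns] *= 20

p = topk_enrichment_p(expr, is_ns)
cutoff = permutation_cutoff(expr, is_ns)
print("NS-enriched genes:", np.flatnonzero(p < cutoff))
```

Because shuffling labels makes the number of NS samples in any top-k set hypergeometric regardless of the gene, the pooled null distribution is essentially gene-independent, which is why a single cutoff can be applied to all genes.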
We also annotated protein-coding genes with information from the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org; accessed 12 June 2018) to identify genes that show a nervous system phenotype in mouse mutants (Supplementary Table S1). The FANTOM5 dataset was also used to identify non-coding RNAs upregulated during iPSC-to-neuron differentiation. We then compared all genes in the 47 CNV regions with the 96 known causal genes from MSSNG [288] and SFARI [289]; any region containing one or more of these genes was considered a region with known ASD genes. For the remaining CNV regions, we performed a comprehensive literature search to check whether they contained any genes previously associated with ASD, using the queries "gene name + autism" and "gene name + ASD". DECIPHER [27], SFARI [289], MSSNG [288], and the two largest CNV studies of global developmental delay to date, by Coe et al. [290] and Cooper et al. [291], were used to annotate the CNV regions.
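The region classification step described above reduces to a set intersection between the genes in each CNV region and the list of known causal genes. In this minimal sketch, `split_regions` is a hypothetical helper, the region names and gene symbols are placeholders rather than data from this study, and gene symbols are assumed to be harmonized (same case and nomenclature) before comparison.

```python
from typing import Dict, List, Set, Tuple


def split_regions(
    region_genes: Dict[str, Set[str]], known: Set[str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split CNV regions into those with >= 1 known ASD gene and the rest."""
    with_known: Dict[str, List[str]] = {}
    remaining: List[str] = []
    for region, genes in region_genes.items():
        hits = sorted(genes & known)  # known ASD genes inside this region
        if hits:
            with_known[region] = hits
        else:
            remaining.append(region)  # candidates for the literature search
    return with_known, remaining


# Hypothetical toy input; real inputs would be the genes in each of the
# 47 regions and the 96 known causal genes from MSSNG and SFARI.
region_genes = {
    "region_1": {"GENE_A", "GENE_B", "LNC_1"},
    "region_2": {"GENE_C", "LNC_2"},
    "region_3": {"GENE_D"},
}
known_asd = {"GENE_B", "GENE_D"}
with_known, remaining = split_regions(region_genes, known_asd)
print(with_known)  # {'region_1': ['GENE_B'], 'region_3': ['GENE_D']}
print(remaining)   # ['region_2']
```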

Funding
This work was supported by the UNSW Scientia Program Fellowship and the Australian Research Council Discovery Early Career Researcher Award (DECRA) under grant DE220101210 to HAR.The work was also supported by a grant to JI-TH from the Telethon-Perth Children's Hospital Research Fund.