Integrated whole exome sequencing and functional approach delineate genetic heterogeneity in cerebellar ataxias

Purpose Disease deconvolution in heterogeneous cerebellar ataxias (CAs) needs a focussed approach to overcome the diagnostic challenges. A diverse clinical presentation with over 100 reported genetic loci, together with the challenges of genotype-phenotype correlation, complicates genetic diagnosis in the 40-60% of CA cases that remain uncharacterized. We present here an integrated approach that combines whole exome sequencing with functional validation to delineate the genetic etiology in Indian CA patients. Method A total of 50 familial and sporadic progressive CA families (negative for CNG expansion) including 101 subjects were recruited for this study. Index patients from the 50 families were subjected to singleton whole exome sequencing (S-WES). Family-based WES (F-WES) was carried out for seven S-WES selected families. Protein simulation and docking studies were performed for seven genetic variants identified through WES. A cell line-based model was used to assess disease signatures for variants in KCNC3 and a new candidate gene, SPTB. Results Clinically relevant variants were identified in 70% (35/50) of the selected families. We achieved a 50% (25/50) definitive diagnostic yield and a 14% (7/50) probable diagnostic yield, while 6% (3/50) of the families showed variants of uncertain significance. We prioritized compound heterozygous variants in a candidate gene, SPTB, for cerebellar ataxia with hereditary spherocytosis. A lymphoblastoid cell line derived from a patient with a KCNC3 variant showed altered disease signatures, with induced ROS and elevated unfolded protein response markers at the basal level.


Introduction
Cerebellar ataxias (CAs) are a group of rare monogenic neurodegenerative disorders with heterogeneous clinical presentation and variable age at onset. The debilitating disease progression and the disability caused by CAs place an economic burden on the affected families, along with a disease burden for future generations.
Amongst the clinical manifestations of CAs, incoordination of voluntary motor movements is the cardinal cerebellar symptom, alongside more than 35 non-cerebellar and non-neuronal symptoms 1.
While more than 100 different genetic loci have been implicated in CA subtypes so far, ataxia is also present as a clinical feature in over 500 other neurological disorders with known genetic etiology [2][3][4]. Heterogeneous clinical presentation, a large number of underlying variants, anticipation, pleiotropy, variable expressivity, and a highly diverse population structure are some of the factors that obscure a differential or definite diagnosis for CA subtypes.
For any monogenic disorder, a rapid genetic diagnosis is an invaluable aid for the healthcare system as it enables timely disease prognosis and better patient management. Among the genetically defined CA cases, tandem expansions of nucleotide repeats account for nearly 30-40% of the disease burden, while conventional variants add another half to the causal factor list 5. Whole exome sequencing (WES), a high-throughput tool for detecting all coding variants, has been found to be more efficient than other genetic tools at improving diagnostic yield and enabling novel gene discoveries [6][7][8]. Our previous study in autosomal recessive CA (ARCA) families identified both previously reported and novel variants in typical and atypical ataxia genes, with a much higher diagnostic yield (56%) using WES than with targeted resequencing 9,10. WES could therefore be a more appropriate tool for genetic delineation in otherwise unresolved CA families, specifically the sporadic CA occurrences.
The present study applied a rapid, focussed, and integrated approach involving genetic and functional analysis for the diagnosis of CAs. Use of WES in indexed patients (singleton; S-WES) and family members (family-based design; F-WES) in 50 families enabled us to delineate clinically relevant genetic defects in 70% (35/50) of the selected families, with a definitive genetic diagnosis in 50% (25/50) of the families.
We observed genetic variants in genes for CAs, hereditary spastic paraplegia (HSP), and other neurological disorders. Moreover, we propose a new candidate gene, SPTB, for CAs with hereditary spherocytosis. Through our functional approach of testing the pathogenicity of identified variants with rapidity and sensitivity, we demonstrated the potential of patient-derived lymphoblastoid cell line (LCL)-based assays for elucidating neurodegeneration pathways, using a KCNC3 variant as a model.

Clinical representation of selected cases
Amongst the 54 uncharacterized cases selected from the 50 families, the gender ratio (male (M): female (F)) was 38:16. The mean (SD) age of onset (AO) of the CA subtypes in the families was 35.6 (11.6) years for autosomal dominant cerebellar ataxias (ADCA), 18.2 (9.4) years for sporadic early-onset cerebellar ataxias (SPEOCA), and 51.2 (6.3) years for sporadic late-onset cerebellar ataxias (SPLOCA) (Supplementary Table S1). Among the cerebellar features of CA, gait impairment was the most prevalent manifestation, exhibited by 94% of the cases, followed by dysarthria (85%), intention tremor (70%), and nystagmus (56%). Pyramidal signs affecting reflexes (areflexia, hyperreflexia, and clonus), tone (spasticity), and the plantar response (Babinski sign) were present in 30-40% of the families. Extrapyramidal signs (postural tremor, rigidity) as a subclinical feature were seen in very few of the evaluated patients. Other neurological signs such as peripheral neuropathy, autonomic dysfunction, skeletal abnormality, and psychiatric symptoms were also present in the cases (Supplementary Fig. S1 and Supplementary Table S2).

Diagnostic yield
We used S-WES in 50 familial and sporadic CA families to screen variants in gene panels consisting of ataxia phenotype genes. Further, to enhance the diagnostic yield and enable new gene discoveries, we applied F-WES in seven of the 50 families. Technical validation, segregation analysis (Supplementary Tables S3 and S4), frequency estimation (in 257 ethnically matched controls and in-house data), protein modelling, and genotype-phenotype correlation were performed for the prioritized variants (Figure 1).
We identified and designated clinically relevant variants in 70% of the families (35/50): 62% with S-WES and 8% through F-WES. Among the categorized families, a defined genetic etiology was found in 57%, 86%, and 57% of ADCA, SPEOCA, and SPLOCA families respectively. We delineated a definitive genetic diagnosis in 50% of the families (25/50). Reanalysis using updated annotation pipelines and data resources enabled further identification of probable diagnostic variants in 14% (7/50) of the families (Table 1).
Following the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) guidelines for variant classification [11][12][13] in both the S-WES and the F-WES design, we identified 41 clinically relevant variants in 29 genes, including 12 pathogenic variants, 25 likely pathogenic variants, and 4 variants of uncertain significance (VUS). These 41 variants comprise 10 novel, 12 reported (ClinVar and/or HGMD), and 19 rare variants (Table 2 and Supplementary file 1). Of these, the 37 pathogenic and likely pathogenic variants were seen in 32 families and the 4 VUS in three families (Table 2).
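The yield figures in this section are simple proportions over the 50 families. As a minimal sketch, the following Python snippet recomputes them from the counts reported above; the variable and function names are illustrative assumptions and are not part of the study pipeline.

```python
# Counts taken from the text of this section; all names are illustrative.
TOTAL_FAMILIES = 50

# Number of families in each diagnostic category.
family_counts = {
    "definitive": 25,
    "probable": 7,
    "vus": 3,
}


def percent(count, total=TOTAL_FAMILIES):
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Per-category yield and the overall share of families with a
# clinically relevant variant.
yields = {label: percent(n) for label, n in family_counts.items()}
clinically_relevant = percent(sum(family_counts.values()))

# ACMG-AMP classes of the clinically relevant variants.
variant_classes = {"pathogenic": 12, "likely_pathogenic": 25, "vus": 4}
total_variants = sum(variant_classes.values())

print(yields, clinically_relevant, total_variants)
```

Summing the three family categories reproduces the 35 families with a clinically relevant variant, and summing the three variant classes reproduces the 41 reported variants.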

Spectrum of genetic variants in cerebellar ataxia subtypes
Following WES in singleton and family-based design, we discovered 14 CA subtypes in our cohort, including 11 spinocerebellar ataxia (SCA) and 3 spinocerebellar ataxia autosomal recessive (SCAR) subtypes in 17 families (Table 2 and Supplementary file 1).
Among the SCA subtype variants observed, we found three previously reported pathogenic variants and two likely pathogenic novel variants. In AT1798, we observed a novel heterozygous (Het) variant, p.Glu53Asp, in EEF2 (SCA26, 609306), which lies in proximity to the phosphorylation site p.Thr56. Two phosphorylation sites, p.Thr56 and p.Ser595, regulate the role of eEF2 in the cell cycle and translational machinery. The p.Ser595 site directly regulates p.Thr56 and is required for its efficient phosphorylation. The reported variant p.Pro596His, which lies next to the phosphorylation site p.Ser595, leads to impaired translation and increases the susceptibility to proteostatic disruption. The identified variant p.Glu53Asp lies in a GTP binding domain (P-loop containing nucleoside triphosphate hydrolase) next to the phosphorylation site p.Thr56, and probably exerts its pathogenic effect through mechanisms similar to those of the p.Pro596His variant 14,15.
Two carriers were found to harbor variants in FAT1; AT2176 had two Het variants (p.Thr874Met and p.Phe3590Leu) while AT2029 had one Het variant (p.Asp1554Asn). AT2176 with AO of 24 years presented with gait, intention tremor, vision impairment, dysarthria, head tremor, nystagmus, mild bradykinesia, and psychiatric symptoms and carried a differential clinical diagnosis of Friedreich ataxia (FRDA)-like or a mitochondrial disorder. AT2029 on the other hand, had an AO of 50.5 years and presented with cerebellar signs, gait, intention tremor and dysarthria only. Autosomal dominant (AD) mode of inheritance has been reported for FAT1 in cerebellar ataxias with an AO ranging between 10-70 years and variable clinical presentation 16 , while reported cases of childhood-onset (<10 years) patients with episodic ataxia have p.Asp1930His variant 16 .
Of the two variants identified in AT2176, p.Phe3590Leu was present in the Het state in 44 individuals in the gnomAD database and has been classified as a VUS using the ACMG-AMP criteria. Protein modelling studies of these variants have implicated their role in the alteration of the protein structure (Supplementary Fig. S2 and S3). Because parental samples were unavailable for AT2176, we were unable to test whether the two variants are in cis or in trans (CHet). AT2176 also carried a pathogenic variant, W22R 17, in the Het state in the autosomal recessive (AR) inherited gene NDUFB3 as an incidental finding (Supplementary file 2). We conclusively designated the p.Thr874Met variant as the causative genetic defect in our indexed patient AT2176. Functional analysis of this variant further explained the disease severity and early age of onset in AT2176.
Further, we observed the variant p.Ser399Leu in KCNC3 (SCA13, 605259) and a previously reported variant, p.Gly230Val 18, in ELOVL5 (SCA38, 615957) in sporadic patients AT2216 and AT1889 respectively. An asymptomatic younger family member of AT2216 was also found to harbor the p.Ser399Leu variant. The variant p.Gly230Val in ELOVL5 was also observed in asymptomatic offspring of the indexed patient. These findings highlight the importance of re-evaluating asymptomatic family members, which could help in timely assessment of the disease prognosis.
Amongst all the identified CA subtype variants, those in SETX were the most common. Three SPEOCA families had compound heterozygous (CHet) variants in SETX representing the SCAR1 phenotype. One ADCA family had a Het variant. Of the seven variants observed in SETX, three Het missense variants (p.Tyr324Cys, p.Ser2158Asn, and p.Thr2373Ala) are novel and segregated in CHet fashion.
Four families characterized with three HSP subtypes exhibited an overlapping phenotypic representation with CAs and therefore broadened the findings of these two disorders 19. We adopted a composite approach of including HSP genes in the ataxia gene panel, which resolved these uncharacterized patients with homozygous (Hmz) or CHet variants in SPG7, GBA2, and CAPN1 in autosomal recessive (AR) mode.
In the present study, we further observed variants in 10 genes, which have been reported in other neurological disorders. Besides ataxia, these patients also presented other overlapping clinical features. The aforementioned neurological genes observed in this study have been previously reported in CA cohorts from other populations as well, indicating the pleiotropic effect of these genes, thereby serving as rare loci for ataxia 20 .
We identified three AR genes, CEP290, RELN, and NPC1, which have been assigned as subtypes for autosomal recessive cerebellar ataxias 21 24,25 . The spectrum of PSEN1 variants across exon 7, its pleiotropic nature, and the damaging effect of the p.Asn190Asp variant highlight its potential as a likely pathogenic diagnostic variant.

Family-based design identified a recently reported CA subtype and a new gene
After screening the ataxia gene panel for variants in the index patients of the S-WES design, we followed the F-WES design in the remaining seven uncharacterized families (Supplementary Fig. S4). Following segregation and other pathogenicity assessments, we identified genetic determinants in four of these families.
In one SPEOCA patient, we observed CHet missense variants (p.Ser662Phe and p.Leu1383Met) in SPTB that segregated in the parents, were absent in ethnically matched controls, and were predicted to be protein damaging (Figure 2A-2D). SPTB has been reported in hereditary spherocytosis (HS) 26. A case report by McCann & Jacob (1976) described two different patients with spinal cord disease who presented with loss of balance, spasticity, intention tremor, and unsteady gait, and who also had HS 27. We further tested the HS phenotype in the indexed patient through the osmotic fragility test and observed an altered osmotic fragility level (Figure 2B). This confirmed that the SPTB carrier has a mild HS phenotype in addition to the CA symptoms. Recently, Het variants in SPTB have been reported in autism spectrum disorder 28,29 and CHet variants in amyotrophic lateral sclerosis 30.
Mitochondrial dysfunction and calcium homeostasis could be tested in AT2216 to ascertain the involvement of converging pathways, as reported in other CAs and neurodegenerative disorders [38][39][40].
The unfolded protein response (UPR), assessed using BiP expression, showed slight induction in FRDA-LCLs and marked induction in KCNC3-LCLs. Induced expression of BiP suggests a putative pathological perturbation in the UPR. However, the expression level of CHOP protein was found to be very low in all LCLs (Figure 3C). This could be due to transient expression of CHOP, homeostatic maintenance of the cells, or the involvement of other cellular processes.
Further, to elucidate the functional role of SPTB in a cell line model, we used the SKNSH cell line, as the expression of SPTB RNA and protein is very low in LCLs (Supplementary Fig. S6). We prepared a transient knockdown model using an siRNA approach, with the siRNA designed at the variant location. An average knockdown efficiency of 40-50% was achieved in SPTB siRNA treated cells compared with the scrambled siRNA treated cells (Supplementary Fig. S7). Under the same knockdown condition, cell viability in SPTB siRNA treated cells was reduced to 82% of that in the scrambled siRNA treated control cells (Figure 3D). This suggests that SPTB may have a functional role in human neuronal cells. Further variant-specific functional validation in a disease model is warranted to elaborate its pathophysiology in CAs.
Of the CHet variants p.Ser662Phe and p.Leu1383Met in SPTB, residue 662 lies within the fourth spectrin repeat, while residue 1383 lies within the spacer region between the 10th and 11th spectrin repeats, close to the interacting site of ankyrin R and postsynaptic density (PSD) scaffolding proteins. MD simulation of the CHet variants of SPTB pointed towards opening of the loop in the vicinity of residue 662, while the helix region near residue 1383 changes to a coil conformation. These variants may therefore affect the binding of SPTB to its interactants (Figure 2D and Supplementary Fig. S2 and S3).
A simulation study of two variants in FAT1 (p.Thr874Met and p.Phe3590Leu) observed in AT2176 showed that the structure of variant protein is more compact than the wild type protein. The simulation results further indicated that the coil region localized in residue 3590 which forms a small helix in the wild type transformed to a beta-sheet in the variant ( Supplementary Fig. S3).
Docking of 1,2-ethanediol (EDO) at the site of the MME variant (p.Arg374Lys) revealed an altered orientation of the binding site, thereby affecting the binding of EDO (Supplementary Fig. S5D). MD simulation of FAT2 domains showed that the wild type residue, Val2260, adopted a sheet-coil-sheet conformation, while a more open structure was found with the variant residue, Asp2260; the extended beta-sheets around the variant region remained unchanged. Simulation of the region of CAPN1 containing the p.Gly220Arg variant suggests that the Arg220 residue compacts the overall structure of the protein and possibly affects the binding of the calcium ion at its binding site (Supplementary Fig. S2 and S3).

Discussion
In the present study, we identified a variant of diagnostic utility in 50% (25/50) of the selected families.
The definitive/probable diagnostic yield with a clinically relevant pathogenic/likely pathogenic variant reported by us (64%) is much higher than that of other recent studies that have extensively utilized a genetic and/or functional approach 16. Our experimental design enabled us to unveil many rare CA subtypes and ataxias presenting with other neurological disorders within our cohort. The deviation from the reported phenotype in our selected patients highlighted the pleiotropic effect of genes linked to neurodegeneration (such as GPR88, PSEN1, and the other neurological disorder genes from our study). Known variants observed in reported ataxia genes in our population signify founder events of the past, while the identification of novel variants suggests population-specific disease variants. Variants in SETX are the most common among the characterized families. We obtained the highest diagnostic yield in the SPEOCA group of patients (Table 1).
We propose SPTB as a new candidate gene for CAs since, to the best of our knowledge, it has not so far been reported to be involved in any cerebellar phenotype in any model system. Along with neurological symptoms, we observed a mild HS phenotype in AT1929, suggesting that variants in SPTB could lead to co-occurrence of the HS and CA phenotypes. SPTB is abundantly expressed in the PSDs of Purkinje neurons 41, plays an important role in synaptic plasticity along with other PSD proteins, and is involved in NCAM mediated neurite outgrowth 42. Knockdown of β1 spectrin in mice resulted in an increase in the number of perforated neurons and in AMPA receptor (AMPAR) endocytosis, leading to abnormal synaptic activity 43. Overexpression of the actin-binding domain of SPTB leads to an increased number of dendritic spines, larger PSDs, and fewer functional synapses 44.
In the spectrin protein family, multiple isoforms (two alpha spectrins and five beta spectrins) have been reported with diverse cellular localization, indicating their variable functions. Variants in different spectrins have been implicated in both neurological and erythrocytic disorders 42,[45][46][47]. In the beta spectrin family, SPTBN2 variants have been associated with SCA5 and SCAR14 42,45, while a variant in SPTBN4 has been reported to cause congenital myopathy with neuropathy and deafness 47. The cumulative evidence of the reported neuronal function of SPTB, and of genetic aberrations in SPTB and other spectrins leading to neurological manifestations, underscores the relevance of SPTB as a new candidate gene for CAs.
So far, the genetic diagnosis and mechanistic understanding of heterogeneous neurological disorders have been limited by numerous constraints. Research in CA diagnostics is largely restricted by the unavailability of family members' samples for segregation analysis, or of other affected families with variants in the same gene. The failure to obtain a diagnosis in all cases with WES reflects its limited ability to sequence GC-rich regions and repetitive elements. WES in our study was based on short-read sequencing; a long-read sequencing approach may give better outcomes in future. This also necessitates the application of a better phenomics-based approach using comprehensive variant annotation, and the development of rapid functional validation assays for ascertaining the pathogenicity of novel variants and VUS.

Conclusion
Our study design represents a valuable, focused, and clinically translatable approach to delineate the genetic etiology of CA patients in a genetically and ethnically diverse Indian population. The proposed approach emphasises the need for a population-specific gene-variant panel for rapid and cost-effective genetic diagnosis of CAs. The integrated WES and functional approach is expected to increase the clinical yield of diagnostics over time, allowing for timely and well-informed clinical decisions and patient management.

Study design
Our ongoing genetic investigations of the cerebellar ataxia patients in our cohort (~5000 to date since 1997, inclusive of unpublished observations) screen for the common repeat-associated SCAs (SCA1, SCA2, SCA3, and SCA12) and FRDA, and are able to diagnose 30-40% of cases. To characterize the remaining unsolved cases, we adopted the WES approach. This study was an effort to assess the efficacy of WES for diagnostic yield in our heterogeneous population, and to identify the frequent and rare hereditary ataxia subtypes with their variant spectrum, which would help in strategizing rapid genetic diagnosis of unsolved cases.
Our selected study participants comprised 101 subjects, including probands and other affected, unaffected, and asymptomatic family members from 50 families. We categorized families into two groups based on family history: familial and sporadic. In the familial group, we selected ADCA families. In the sporadic group, we subcategorised the families based on their AO. Families in which the proband's AO was ≤40 years were termed SPEOCA and those with AO >40 years were defined as SPLOCA.  Table S6).
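The grouping rule described above can be written as a short decision function. The sketch below is an illustration only: the function name and its arguments are hypothetical, and it encodes just the two criteria stated in the text (family history, then the proband's AO with a 40-year cut-off).

```python
def classify_family(familial: bool, proband_age_at_onset: float) -> str:
    """Assign a family to a study group.

    familial: True if the family history indicates a familial
        (autosomal dominant) presentation, False if sporadic.
    proband_age_at_onset: age at onset (AO) of the proband, in years.
    """
    if familial:
        # Familial families in this study were ADCA families.
        return "ADCA"
    # Sporadic families are split on the proband's AO.
    if proband_age_at_onset <= 40:
        return "SPEOCA"
    return "SPLOCA"


# Example with made-up inputs (not patient data).
print(classify_family(False, 40))
```

Note that the boundary value of 40 years falls in the early-onset group, as the cut-off in the text is inclusive.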
WES was performed using the TruSeq™ Exome Enrichment and Nextera exome capture protocols (Illumina Inc., San Diego, California), followed by 100 bp paired-end sequencing on a HiSeq 2000 9. On average, a total of 6 GB of raw data was generated for each sample and processed using the method described earlier 9,10.
Briefly, trimming and filtering of low-quality data were performed with Trimmomatic 48.

Variant filtering, validation, and classification
To identify molecular aberrations in a rapid and focused way, we first screened variants in our enlisted ataxia gene panel in the selected S-WES cases. Further, we filtered variants from the S-WES and F-WES data and searched for atypical loci and candidate gene variants. Routinely used filtering steps for rare disorders were followed, wherein protein-disruptive variants (missense, nonsense, splice site, frameshift) that are novel or rare (frequency ≤0.1%) in the population databases 1000GP and gnomAD 37,51 were selected. Variants that were present in 11 WES-sequenced ethnically matched control samples (ADCA: Het and Hmz variants; SPEOCA and SPLOCA: Hmz variants) were filtered out.
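The filtering criteria above can be summarised as a single predicate over annotated variants. The following Python sketch is a hedged illustration, not the pipeline used in the study: the `Variant` record, its field names, and the consequence labels are assumptions made for the example, and `pop_freq` is assumed to hold the highest allele frequency seen across the population databases.

```python
from dataclasses import dataclass
from typing import Optional

# Protein-disruptive consequence classes named in the text
# (the label spellings are assumptions for this sketch).
DISRUPTIVE = {"missense", "nonsense", "splice_site", "frameshift"}
MAX_POP_FREQ = 0.001  # 0.1% frequency threshold


@dataclass
class Variant:
    gene: str
    consequence: str
    # Population allele frequency; None means novel (absent from databases).
    pop_freq: Optional[float]
    # Zygosity seen in the control exomes: "het", "hmz", or None if absent.
    control_zygosity: Optional[str] = None


def passes_filter(v: Variant, family_group: str) -> bool:
    """Return True if the variant survives the rare-disorder filters."""
    if v.consequence not in DISRUPTIVE:
        return False
    if v.pop_freq is not None and v.pop_freq > MAX_POP_FREQ:
        return False
    if v.control_zygosity is None:
        return True
    if family_group == "ADCA":
        # Any occurrence in controls (Het or Hmz) removes the variant.
        return False
    # SPEOCA / SPLOCA: only variants homozygous in controls are removed.
    return v.control_zygosity != "hmz"


# Hypothetical records for illustration only (not study data).
candidates = [
    Variant("GENE_A", "missense", None),
    Variant("GENE_B", "synonymous", 0.0001),
    Variant("GENE_C", "frameshift", 0.01),
    Variant("GENE_D", "nonsense", 0.0005, control_zygosity="het"),
]
kept_adca = [v.gene for v in candidates if passes_filter(v, "ADCA")]
kept_speoca = [v.gene for v in candidates if passes_filter(v, "SPEOCA")]
```

The asymmetric control filter reflects the expected inheritance: a heterozygous variant seen in unaffected controls is implausible as a dominant cause but can still contribute to a recessive genotype.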
We applied an AD or autosomal recessive (AR) disease model based on the inheritance pattern and/or age at onset (Detailed in supplementary material and methods).
Genotype-phenotype correlation was performed using reported and presented clinical features. Technical validation of selected variants was performed through Sanger sequencing using designed primer pairs (Supplementary Table S7), and analytical validation of selected variants was done with Sequenom iPLEX Gold MassARRAY technology 9. Segregation analysis of selected variants was performed in samples from informative family members using Sanger sequencing.
ACMG-AMP guidelines 11 were followed for variant classification, and variants were interpreted using VarSome (automated) 12 and InterVar (user adjusted) 13   and SPTB carrier) and controls were established using an in-house established protocol 55. Basal-level unfolded protein responses (UPR) in LCLs of SCA1, SCA2, FRDA, the KCNC3 carrier, and the SPTB carrier were evaluated using BiP and CHOP after 48 hours of culture, and a positive control was generated by treating control LCLs with 4 μg/ml tunicamycin for 24 hours. Reactive oxygen species (ROS) at basal levels were evaluated in LCLs (KCNC3 carrier, FRDA, and control) after 24 hours of culture using 5 μM CM-H2DCFDA dye and analysed through flow cytometry. Control LCLs treated with tert-butyl hydroperoxide (t-BHP) were used as a positive control (Supplementary methods and materials).

Cell viability assay in SPTB knockdown cell lines
Knockdown of SPTB was performed in the SKNSH cell line using a designed siRNA from Sigma Aldrich (designed in the proximity of the S662F region; sense, 5'-CAUAGUCCAGGGAAGAAUAdTdT-3' and antisense, 5'-UAUUCUUCCCUGGACUAUGdTdT-3'). Cells were transfected with 20 nM SPTB siRNA or scrambled siRNA (Sigma) in a 1:2 ratio with a vehicle, incubated for 24 hours, and tested for knockdown efficiency. Cell viability in SPTB knockdown cells was assessed using MTT assays (Supplementary methods and materials).
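As a quick consistency check on the siRNA duplex listed above, the antisense strand should be the reverse complement of the sense strand once the dTdT overhangs are set aside. The following minimal Python sketch performs that check; the helper names are illustrative and not taken from any library.

```python
# Watson-Crick pairing for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


def strip_overhang(oligo: str, overhang: str = "dTdT") -> str:
    """Remove the 3' deoxythymidine overhang from an siRNA strand."""
    return oligo[: -len(overhang)] if oligo.endswith(overhang) else oligo


# Strands as given in the text, written 5'->3'.
sense = strip_overhang("CAUAGUCCAGGGAAGAAUAdTdT")
antisense = strip_overhang("UAUUCUUCCCUGGACUAUGdTdT")

# True when the two 19-nt cores form a fully complementary duplex.
duplex_ok = reverse_complement_rna(sense) == antisense
print(len(sense), duplex_ok)
```

The same helper could be reused to check any other siRNA pair before ordering, since a single transcription error in either strand makes the comparison fail.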

Declarations
Ethical clearance
Institutional Ethics Committees of the participating institutes have approved this study and all recruited subjects had given their informed consent for participation.

Figure 1
Technical validation, segregation analysis, frequency estimation (in 257 ethnically matched controls and in-house data), protein modelling, and genotype-phenotype correlation were performed for the prioritized variants.

Figure 2
In one SPEOCA patient, we observed CHet missense variants (p.Ser662Phe and p.Leu1383Met) in SPTB that were segregated in parents, absent in ethnically matched control and predicted to be protein damaging.

Figure 3
Basal reactive oxygen species (ROS) levels in KCNC3-LCLs and FRDA-LCLs showed a 2-fold and a 1.34-fold change respectively compared with the negative control LCLs (Figure 3A and 3B). The expression level of CHOP protein was found to be very low in all LCLs (Figure 3C). Under the same knockdown condition, cell viability in SPTB siRNA treated cells was reduced to 82% of that in the scrambled siRNA treated control cells (Figure 3D).
