The genetic spectrum of a cohort of patients clinically diagnosed as Parkinson’s disease in mainland China

So far, over 20 causative genes of monogenic Parkinson’s disease (PD) have been identified. Some causative genes of non-parkinsonian entities may also manifest with parkinsonism mimicking PD. This study aimed to investigate the genetic characteristics of clinically diagnosed PD with early onset age or family history. A total of 832 patients initially diagnosed with PD were enrolled, of which, 636 were classified into the early-onset group and 196 were classified into the familial late-onset group. The genetic testing included the multiplex ligation-dependent probe amplification and next generation sequencing (target sequencing or whole-exome sequencing). The dynamic variants of spinocerebellar ataxia were tested in probands with family history. In the early-onset group, 30.03% of patients (191/636) harbored pathogenic/likely pathogenic (P/LP) variants in known PD-related genes (CHCHD2, DJ-1, GBA (heterozygous), LRRK2, PINK1, PRKN, PLA2G6, SNCA and VPS35). Variants in PRKN were the most prevalent, accounting for 15.72% of the early-onset patients, followed by GBA (10.22%), and PLA2G6 (1.89%). And 2.52% (16/636) had P/LP variants in causative genes of other diseases (ATXN3, ATXN2, GCH1, TH, MAPT, GBA (homozygous)). In the familial late-onset group, 8.67% of patients (17/196) carried P/LP variants in known PD-related genes (GBA (heterozygous), HTRA2, SNCA) and 2.04% (4/196) had P/LP variants in other genes (ATXN2, PSEN1, DCTN1). Heterozygous GBA variants (7.14%) were the most common genetic cause found in familial late-onset patients. Genetic testing is of vital importance in differential diagnosis especially in early-onset and familial PD. Our findings may also provide some clues to the nomenclature of genetic movement disorders.


INTRODUCTION
Parkinson's disease (PD) is a progressive neurodegenerative movement disorder with bradykinesia, rigidity and rest tremor as the cardinal motor symptoms. Genetic research on PD has made significant advances in recent years 1 . So far, pathogenic/likely pathogenic (P/LP) variants in over 20 genes that lead to Mendelian inheritance have been identified in PD with various levels of evidence 2-4 (defined as known PD-related genes in this study), accounting for 3%-5% of all PD patients 1 and 7%-9% of PD patients with early onset age (<50 years old) or family history [5][6][7] .
Based on the phenotypic heterogeneity, patients with P/LP variants in causative genes of other diseases, such as ATXN2 in spinocerebellar ataxia (SCA) 8 , GCH1 in dopamine responsive dystonia (DRD) 9 or MAPT in frontotemporal lobar degeneration (FTLD) 10 , might present with PD-like symptoms in the early stage or even in the late disease course 11 and might be clinically misdiagnosed as PD.
Large-scale genetic studies of PD have been carried out in different ethnic populations in recent years [5][6][7][12][13][14][15][16] . The frequency of P/LP variants ranged from 1.4% to 14% across studies, with a higher rate of 4.1-33.9% in patients with early-onset Parkinson's disease (EOPD) or with family history. Although P/LP variants in PRKN, LRRK2 and GBA were commonly detected, the genetic architecture differs between ethnicities. Two large genetic studies of Chinese PD patients have been reported recently 5,15 . One study included familial PD, EOPD and sporadic PD and focused on 23 known PD-associated genes, while the other included EOPD and focused on 26 PD-related genes and 20 other genes linked to neurodegenerative and lysosomal diseases.
Here we report another large-scale genetic study of Han Chinese PD patients. We investigated 116 disease-causative genes (including 22 known PD-related genes) in a cohort of clinically diagnosed PD patients with early age at onset (AAO) or family history, to characterize the genetic spectrum and to inform both the nomenclature of genetic movement disorders and the genetic screening strategy.

RESULTS
The demographic and the clinical features of the patients initially diagnosed as PD
The median AAO of the early-onset group and the familial late-onset group was 39.00 (12.00) and 59.00 (10.00) years, respectively [median (interquartile range (IQR)), p < 0.0001], and the disease duration was 46.00 (67.00) and 45.00 (57.00) months, respectively (p = 0.1371) (Supplementary Table 1). Patients with a family history accounted for 28.62% (181/636) of the early-onset group. There were significant differences between the two groups in sex (p = 0.0306) and years of education (p = 0.0001). All participants self-reported Han Chinese ethnicity.
Twenty-nine novel P/LP single nucleotide variants and short insertions/deletions were found in the cohort, with 13 Table  2 and Supplementary Fig. 1). In addition, one novel triplication of exon 2-4 of PRKN was also found.
The mutational frequency of all the screened genes and the genotypes were described in Supplementary Materials.
The clinical characteristics related to the known PD-related genes in the cohort
Compared with genetically undefined (GU) patients in the early-onset group, PRKN patients had an earlier AAO, longer disease duration and lower levodopa equivalent daily dosage (LEDD), and tended to have better preserved olfactory function and cognitive function in the domains of executive function, language, and attention. After adjusting for confounders (age, sex, education, disease duration, and LEDD), all these differences remained significant. More females and a shorter disease duration at baseline were found among patients bearing heterozygous GBA variants. More significant olfactory dysfunction was observed in GBA patients, both before and after adjusting for confounders. Patients with P/LP variants in PLA2G6 had an earlier AAO and a younger age at examination. They showed more severe motor dysfunction, with higher Unified Parkinson's Disease Rating Scale (UPDRS) III scores. They were more affected by non-motor symptoms, with higher Non-Motor Symptoms Questionnaire (NMSQ) scores and more depression. They also had a significantly lower quality of life and worse cognitive function in all 5 subdomains than GU patients, both before and after adjustment.

The continuous data are presented as median (interquartile range (IQR)), and the categorical data are presented as n (%). AAO age at onset, EOPD early-onset Parkinson's disease, GU genetically undefined, BDI Beck Depression Inventory, ESS Epworth Sleepiness Score, LEDD levodopa equivalent daily dose, MMSE Mini-Mental State Examination, NMSQ Non-Motor Symptoms Questionnaire, PDQ39 39-item Parkinson's Disease Questionnaire, RBDSQ rapid-eye-movement sleep behavior disorder screening questionnaire, SSST-12 Sniffin' Sticks screening 12 test, UPDRS Unified Parkinson's Disease Rating Scale. a Significant after adjustment for age, gender, education, disease duration, and levodopa dose equivalents. P: Comparison between the early-onset group with variants within a specific gene and the GU-EOPD group. *P < 0.05, **P < 0.001.

(2.00 (4.00)) were observed in SNCA patients (Supplementary Table 4).
The clinical characteristics related to the causative genes of other diseases in the cohort
All the patients carrying P/LP variants in causative genes of other diseases met the clinical diagnostic criteria of PD at the first clinical visit. All the affected family members of the probands with family history were diagnosed with or suspected of having PD. No cerebellar signs were observed in any of the SCA patients at the baseline visit, and only one SCA2 patient showed mild nystagmus at follow-up. Among the noteworthy features, three out of seven SCA2 patients had dystonia: one patient had foot dystonia and blepharospasm in the first year of disease, and two patients developed blepharospasm and spastic torticollis/upper limb dystonia, respectively, during follow-up. One spinocerebellar ataxia type 3 (SCA3) patient had peripheral neuropathy, and one patient with a PSEN1 variant had a medical history of epilepsy and stroke. Among the 12 patients followed up in our center, 10 patients still met the clinical diagnostic criteria of PD with a disease duration of 4 to 15 years, and the other two, both with MAPT variants, developed progressive supranuclear palsy (PSP)-like or frontotemporal dementia-like symptoms in the third and fourth year of disease (Supplementary Table 5).

DISCUSSION
In the current study, the genetic spectrum was investigated in a cohort of clinically diagnosed PD patients from mainland China. We found that 26.92% of patients (224/832) with an early AAO or family history carried P/LP variants in known causative genes. Of note, twenty patients (20/832, 2.40%) carried P/LP variants in causative genes of other diseases, including ATXN2, ATXN3, GCH1, MAPT, PSEN1, TH, GBA (homozygous variants) and DCTN1. The genetic architecture in our cohort and its relationship with clinical diagnosis and AAO are summarized in Fig. 2 17,18 .
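The carrier frequencies quoted here are simple proportions of the cohort or subgroup size. As a minimal sketch (the helper name `carrier_frequency` is ours, not part of the study's analysis code), the figures can be recomputed from the reported counts:

```python
def carrier_frequency(carriers: int, total: int) -> float:
    """Return the percentage of carriers in a group, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= carriers <= total:
        raise ValueError("carriers must lie between 0 and total")
    return round(100 * carriers / total, 2)


if __name__ == "__main__":
    # Counts as reported in the text.
    print(carrier_frequency(224, 832))  # whole cohort, known causative genes
    print(carrier_frequency(191, 636))  # early-onset group, known PD-related genes
    print(carrier_frequency(17, 196))   # familial late-onset group, known PD-related genes
```

Running the same helper over every count in a results table is a cheap way to catch transcription slips between counts and percentages.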
For the known PD-related genes, we found a higher frequency of P/LP variants in early-onset patients (30.03%) than in familial late-onset patients (8.67%). This finding is consistent with previous reports that genetic factors play a major role in disease onset in EOPD 5 . The frequency of P/LP variants was much lower in familial late-onset patients (8.67%) than in all the included familial patients regardless of onset age (27.05%), though all the probands had a family history of parkinsonism. Notably, two thirds of the late-onset patients in the cohort had a family history of late-onset PD. The familial aggregation of late-onset PD might partly be explained by the genetic burden of rare or common variants, similar environmental exposures within a family, and the increasing prevalence with ageing, rather than by a single strong genetic effector 19 .
The frequency of causative genes and the P/LP variant spectrum within a given gene vary among geographical regions and populations 12,20,21 . LRRK2 variants were reported as the leading genetic cause of PD in the European population, with LRRK2 G2019S explaining about 5-6% of familial PD and 1% of sporadic cases 13 . The frequency of LRRK2 G2019S in PD can reach almost 14% in the Ashkenazi Jewish population 5 . However, no LRRK2 G2019S carriers were identified in our cohort, in line with previous reports in Asian populations 22 . In Chinese EOPD and familial PD patients, P/LP variants in PRKN were the most prevalent at 4.3-5.7% in different studies, followed by heterozygous GBA variants (2.1-7.2%) [5][6][7]15 . P/LP variants of PINK1 (0.4-1.2%), PLA2G6 (0.3-0.6%), SNCA (0.2-0.7%), LRRK2 (0.4-2.0%) and VPS35 (0.06-0.3%) were reported at lower frequencies. In this study, however, the proportion of patients carrying PRKN (15.72%), GBA (9.50%) and PLA2G6 (1.89%) variants was higher than previously reported, while the proportion of patients carrying variants in the other PD causative genes was similar. Accordingly, the frequency of known PD-related genes excluding GBA was 19.81% in the early-onset group, higher than in other studies (11.6% 15 , 9.3% 6 , 8.74% 5 , 7.5% 7 ). This could be attributed to the distribution of AAO and to the selection bias of a single tertiary hospital center. The percentage of patients with AAO ≤ 30, 30 < AAO ≤ 40, and 40 < AAO ≤ 50 was 19.6%, 37.2%, 45.1% in our cohort, with many more patients with a younger AAO than in Zhao's study (6.6%, 19.6% and 73.8%) 5 . This may explain the higher molecular diagnosis rate in our study, since a lower AAO is associated with a higher molecular diagnosis rate 5 .
Dynamic variants of ATXN2 and ATXN3 explained 1.4% of familial PD in this study, which indicates the importance of genetic testing for dynamic variants in patients with parkinsonian symptoms. Though autosomal dominant cerebellar ataxia is recognized as the predominant phenotype of SCA2, parkinsonism is also common in SCA2 23 . SCA2 patients predominantly manifesting parkinsonism might be mistaken for PD patients and should be identified by genetic testing 24,25 . Similarly, a parkinsonian phenotype has also been frequently described in SCA3, albeit less prevalent than ataxia 26,27 . Intrafamilial phenotypic heterogeneity was found in reported SCA3 cases with parkinsonism 28 , but the affected family members of the four SCA3 probands in this study all had a PD phenotype, which further complicated reaching an exact diagnosis.
SCA2 or SCA3 patients with parkinsonism as the dominant symptom could have unremarkable findings on brain magnetic resonance imaging (MRI) 25,29 , which indicates that brain MRI may not reliably help to differentiate PD from SCA. One patient with an SCA3 dynamic variant had peripheral neuropathy, which is commonly observed in ataxia-predominant SCA2 and SCA3 30,31 and has also been reported in parkinsonism-dominant SCA2 and SCA3 26,32 . Though the percentage of SCA dynamic variants in patients with parkinsonism plus peripheral neuropathy is unknown, we suggest that, besides family history, peripheral neuropathy can be another important hint towards an SCA diagnosis.
The P/LP variants of GCH1 were initially detected in patients with DRD. Besides the classic phenotype of DRD, the broadened clinical spectrum of GCH1 can include adult-onset parkinsonism, focal dystonia, DRD-simulating cerebral palsy or spastic paraplegia 33 . P/LP variants of GCH1 were later reported to be associated with DRD and PD in different family members of the same pedigree, and even with sporadic PD 9,34,35 , indicating that GCH1-related mechanisms might contribute to the pathogenesis of PD and increase the risk of PD 34 . The patients in our study showed PD-like symptoms. Significant sleep benefit and a good response to levodopa treatment were observed, but no dystonia. They might have PD or DRD, since the two GCH1 variants have been reported in both DRD and PD 2,34,35 . Positron emission computed tomography (PET) for the dopamine transporter could be helpful for the diagnosis but was not performed in these patients.
Biallelic TH and GBA variants were identified as the genetic causes of DRD and Gaucher's disease, respectively. Within the phenotypic spectrum, patients with homozygous/compound heterozygous variants of TH or GBA could also present with PD-like characteristics [36][37][38] . DCTN1 was initially identified as the genetic cause of Perry syndrome, but adult-onset (atypical) parkinsonism was later found to be prominent in patients bearing DCTN1 variants 39 . Patients with MAPT or PSEN1 variants usually present with FTLD and AD, respectively, but can have prominent parkinsonism 10,40 . In the early disease stage, these patients might manifest characteristic symptoms of PD and be misdiagnosed as PD 10,[40][41][42][43] . With disease progression, more atypical symptoms emerge that point to diagnoses other than PD. Parkinsonism is one of the core phenotypes of the frontotemporal dementia and parkinsonism linked to chromosome 17 (FTDP-17) spectrum caused by MAPT variants 10 , which should be differentiated from PD at the outset, especially in cases with an autosomal dominant family history. In addition, the atrophy pattern on brain MRI may give clues to the diagnosis of FTLD or AD, but genetic testing should be implemented early whenever possible.
Our work may have important implications for the application and interpretation of genetic testing in clinically diagnosed PD patients, especially in the Han Chinese population. Genetic testing will improve the diagnostic accuracy of PD in early-onset and familial patients, especially those with onset age below 50. Besides known PD-related genes, causative genes of other diseases should also be given due weight when interpreting the results of genetic testing, as they are also indispensable genetic contributors to the PD-like phenotype.
Our findings would also refine the patient recruitment for clinical trials. The "one-size-fits all" patient recruitment approach contributed a lot to the high rate of failure in previous PD clinical trials 44,45 . The genetic status determined by proper genetic testing can lead to quick access to eligible candidates for new treatment options. Genetically targeted clinical trials and the implementation of personalized medicine may bring new opportunities to PD disease-modifying therapy.
Due to the clinical heterogeneity of neurogenetic diseases, broad and complex phenotypes, including parkinsonian characteristics, can be associated with a specific causative gene, which is sometimes a causative gene of another disease. Meanwhile, P/LP variants in the known PD-related genes might present with phenotypes other than the characteristic PD symptoms. For example, LRRK2 variant carriers can present with a PSP phenotype 46 , PRKN variant carriers with DRD, and ATP13A2/PLA2G6/FBXO7/SYNJ1/VPS13C variant carriers with atypical parkinsonian syndromes 11 . This confusion raises the question of the nomenclature of genetic movement disorders, which has been discussed for a long time [47][48][49] . Thus in 2016, the International Parkinson and Movement Disorder Society task force recommended a 2-axis nomenclature system with the phenotype followed by the gene, and divided the hereditary parkinsonism-related genes into 3 categories based on phenotype besides parkinsonism. The recommendation was updated in 2022 39,47 , and most of the mutated genes found in this study were included under hereditary parkinsonism (DCTN1 in atypical parkinsonism or complex phenotypes; GCH1, PLA2G6, and TH in combined phenotypes; GBA, ATXN2 and MAPT in disorders that usually present with other phenotypes but can have predominant parkinsonism), except for PSEN1 and ATXN3. Our work might improve the understanding of the nomenclature system and offer further clues for refining it.
This study has some limitations. Firstly, we did not apply whole-exome sequencing (WES) and dynamic variant testing of SCA to all the patients, which might bias the frequency of causative genes of other diseases in the cohort. Secondly, PET for the dopamine transporter was not available in some patients carrying variants in causative genes of other diseases. Follow-up of these patients is therefore important. Thirdly, this study was conducted in a single tertiary hospital center, which may inevitably result in bias; multicenter studies are needed in the future to verify our findings.
In conclusion, we investigated the genetic characteristics of a group of clinically diagnosed PD patients. This study demonstrated the importance of genetic testing in the diagnosis in clinically diagnosed PD patients, especially in the early-onset patients or patients with family history. It provided some clues to the nomenclature of genetic movement disorders. This will be crucial in grouping the patients in clinical trials and developing the treatment for the etiology.

METHODS
Participants
Patients meeting the following criteria from February 2014 to December 2020 were investigated retrospectively: (1) a diagnosis of PD at the initial clinical visit; (2) AAO < 50, or AAO ≥ 50 and with family history of PD (defined as having at least one other affected relative in the family); and (3) consent to genetic testing related to PD. The diagnostic criteria used for PD were the United Kingdom PD Society Brain Bank Clinical Diagnostic Criteria 50 (for patients recruited before 2016) or the 2015 Movement Disorder Society Clinical Diagnostic Criteria for PD 51 (for patients recruited from 2016 on) (Fig. 3). The affected relatives were not included in the cohort.
Eight hundred and thirty-two patients were included. They were divided into "early-onset group" (AAO < 50, n = 636) and "familial late-onset group" (AAO ≥ 50 and with family history of PD, n = 196).
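The grouping rule above depends on only two variables, AAO and family history. A minimal sketch of that rule follows; the function name `assign_group` and the use of `None` for patients who meet neither criterion are our own choices for illustration, not taken from the study's code.

```python
from typing import Optional


def assign_group(aao: float, family_history: bool) -> Optional[str]:
    """Classify a patient by age at onset (years) and family history of PD.

    Returns None when the patient meets neither inclusion criterion
    (AAO >= 50 without family history).
    """
    if aao < 0:
        raise ValueError("age at onset must be non-negative")
    if aao < 50:
        # Early-onset group, regardless of family history.
        return "early-onset"
    if family_history:
        # AAO >= 50 with at least one other affected relative.
        return "familial late-onset"
    return None


if __name__ == "__main__":
    print(assign_group(39, False))  # early-onset
    print(assign_group(59, True))   # familial late-onset
    print(assign_group(59, False))  # None (not eligible)
```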
The study was approved by the Institutional Review Board of Huashan Hospital and the China human genetic resources management office. Written informed consent was obtained from all study participants.

Clinical assessments of the patients initially diagnosed with PD
The clinical assessments were performed through a face-to-face interview with all patients. Baseline data were collected including demographic profiles (age, sex, education), disease history, family history, clinical signs, comorbidities, medications, and neurological examination results. An established method was used to calculate the LEDD 52 .
The Hoehn and Yahr (H&Y) scale and UPDRS motor examination (items 18-31) were conducted during the off-medication state, defined as withdrawal of anti-PD medications for at least 12 hours, except in those who could not tolerate it (n = 9). A battery of neuropsychological tests was performed to assess cognitive function and affected subdomains, as indicated in our previous study 53 . The non-motor symptoms were investigated by Epworth sleepiness scale (ESS), rapid eye movement sleep behavior disorder screening questionnaire (RBDSQ), Beck Depression Inventory (BDI), NMSQ, 39 item Parkinson's disease questionnaire (PDQ39), and SSST-12, as indicated in our previous study 53 .
For target sequencing, genomic DNA was fragmented into 150-200 bp lengths by sonication. The DNA fragments were end-repaired and enriched with a panel capturing the coding exons and corresponding flanking regions of 116 genes related to movement disorders (Supplementary Table 6). Paired-end sequencing was performed on the Illumina HiSeq2000 platform to provide a mean read depth of over 100X and a coverage of at least 20X in more than 95% of targeted areas. Raw data were processed by the Illumina pipeline (version 1.3.4) for image analysis, error estimation, base calling and generation of the primary sequence data. Variant calling was performed using the Genome Analysis Toolkit (GATK) "Best Practices" workflow 55,56 . Briefly, after removal of 3′-/5′-adapters and low-quality reads for quality control, the clean reads were aligned to the human reference genome (Genome Reference Consortium Human Build 37 (GRCh37)/human genome, version 19 (hg19)) using the Burrows-Wheeler Aligner (BWA) (version: updated dynamically over time, http://bio-bwa.sourceforge.net) with default parameter settings. Polymerase chain reaction (PCR) duplicates were removed by Picard (version: updated dynamically over time, http://picard.sourceforge.net). We then followed the GATK (version: updated dynamically over time, https://software.broadinstitute.org/gatk/) standard pipelines to call germline variants.
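The alignment, duplicate removal and germline calling steps described above can be sketched as an ordered list of command lines. This is an illustration under stated assumptions, not the pipeline actually run in the study: the file names, the `picard.jar` path, the use of samtools for sorting and indexing, the GATK4-style command line and the choice of `HaplotypeCaller` are all ours, and the exact tool versions and parameters used by the authors are not given in the text. The sketch only builds the commands; it does not execute them, and it presumes the reference has already been indexed for each tool.

```python
from typing import List


def build_commands(sample: str, reference: str, fastq1: str, fastq2: str) -> List[List[str]]:
    """Return per-sample command lines in execution order (nothing is run here)."""
    # Read-group header; bwa expects the literal two-character sequence "\t".
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sam = f"{sample}.sam"                # hypothetical intermediate file names
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # 1. Align clean paired-end reads to the reference with BWA-MEM.
        ["bwa", "mem", "-R", read_group, "-o", sam, reference, fastq1, fastq2],
        # 2. Coordinate-sort the alignments (samtools is an assumption).
        ["samtools", "sort", "-o", sorted_bam, sam],
        # 3. Remove PCR duplicates with Picard.
        ["java", "-jar", "picard.jar", "MarkDuplicates",
         f"I={sorted_bam}", f"O={dedup_bam}", f"M={sample}.dup_metrics.txt",
         "REMOVE_DUPLICATES=true"],
        # 4. Index the deduplicated BAM for random access.
        ["samtools", "index", dedup_bam],
        # 5. Call germline variants with GATK.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", dedup_bam, "-O", vcf],
    ]


if __name__ == "__main__":
    for cmd in build_commands("sample01", "hg19.fa", "sample01_R1.fq.gz", "sample01_R2.fq.gz"):
        print(" ".join(cmd))
```

Keeping the commands as data rather than running them directly makes it easy to log the exact invocation per sample, which matters when tool versions change over the course of a study.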
As for WES, DNA libraries were prepared with KAPA Library Preparation Kit (Kapa Biosystems, KR0453) following the manufacturer's instructions and DNA libraries were sequenced on the Illumina Novaseq platform with 200-bp paired-end mode. Similarly, we used GATK "Best Practices" workflow for germline variant calling, as previously described for target sequencing.
The copy number variants (CNVs) in the 116 genes were analyzed using the bioinformatic tool CNVkit 61 . Suspected CNVs were confirmed by real-time PCR.
The candidate variants with unknown pathogenicity were rated according to the American College of Medical Genetics (ACMG) guidelines 62 .
Where necessary and possible, the candidate variants and any CNVs detected were further tested in the proband's parents and in other affected or unaffected family members for segregation and co-segregation, by Sanger sequencing, real-time PCR or MLPA as appropriate. In this process, the homozygous or compound heterozygous status of variants in genes with autosomal recessive inheritance was confirmed.
The quality of the sequencing
For the panel, the average sequencing depth of the target region was 393.65× and the mean percentage of the target region covered at least 20× was 99.35%. A total of 2153 regions were sequenced; 14 regions (0.65%) had read depths below 20×, mainly in GC-rich areas.
For WES, 251779 regions were sequenced. The average sequencing depth of the target region was 101.3× and the mean percentage of the target regions covered at least 20× was 80.6%. For the 2235 regions in the 115 genes of interest, the average sequencing depth was 108.6× and the mean percentage covered at least 20× was 87.5%. The regions with sequencing depth below 20× in both target sequencing and WES are listed in Supplementary Table 7.
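The two quality metrics reported here, mean depth and the share of targets reaching 20×, can be computed from a list of per-region depths. The sketch below is a simplified region-level illustration; the function name `coverage_summary` and the toy depth values are assumptions, and the study's own figures may have been derived per base and averaged across samples.

```python
from typing import Sequence, Tuple


def coverage_summary(depths: Sequence[float], threshold: float = 20.0) -> Tuple[float, float]:
    """Return (mean depth, percentage of regions with depth >= threshold)."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= threshold)
    return mean_depth, 100 * covered / len(depths)


if __name__ == "__main__":
    # Toy input mirroring the panel counts: 2153 regions, 14 of them below 20x.
    # The individual depth values are made up for illustration.
    depths = [400.0] * 2139 + [10.0] * 14
    mean_depth, pct_covered = coverage_summary(depths)
    print(f"mean depth: {mean_depth:.2f}x, regions >= 20x: {pct_covered:.2f}%")
```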

Statistical analysis
All measurements were taken from distinct samples. The Shapiro-Wilk test was applied to test the normality. While categorical variables were demonstrated as frequencies (%), continuous variables were demonstrated as the median and IQR. The Chi-squared test, or Fisher's exact test was used for comparing the categorical variables, and the Kruskal-Wallis test was used for comparing the continuous variables. Raw scores of neuropsychological tests were transformed into Z-scores as previously described 53 . The mean Z-scores of each domain from individual tests were computed. The generalized linear model (GLM) was used to evaluate the association between motor or non-motor scales and genetic status, which adjusted for age, sex, education, disease duration, and LEDD. Two-tailed P values were presented. Differences were considered statistically significant at P < 0.05. The data analysis was conducted by STATA 17.0 (StataCorp).
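The Z-score step can be illustrated as follows. This is a minimal sketch assuming that each raw test score is standardized against a reference mean and standard deviation and that a domain score is the mean of its tests' Z-scores; the exact normative data and test-to-domain mapping follow the authors' previous study and are not reproduced here. The test names and norm values in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def z_score(raw: float, ref_mean: float, ref_sd: float, higher_is_better: bool = True) -> float:
    """Standardize a raw score; flip the sign for tests where lower raw scores are better."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    z = (raw - ref_mean) / ref_sd
    return z if higher_is_better else -z


def domain_z(raw_scores: Dict[str, float], norms: Dict[str, Tuple[float, float]]) -> float:
    """Mean Z-score across the tests that make up one cognitive domain."""
    return mean([z_score(raw_scores[test], *norms[test]) for test in raw_scores])


if __name__ == "__main__":
    # Hypothetical tests and reference values (mean, SD), for illustration only.
    norms = {"test_a": (20.0, 4.0), "test_b": (10.0, 5.0)}
    patient = {"test_a": 24.0, "test_b": 5.0}
    print(domain_z(patient, norms))
```

Domain Z-scores computed this way can then serve as outcome variables in the adjusted generalized linear model described above.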

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
Cloud sharing of the raw DNA sequencing data was not covered by the informed consent signed by the participants. The sequencing data can be made available once the applicant's institution and research objective have been verified by the corresponding author, Jian Wang. The other data that support the findings of the study are available from the corresponding author, Jian Wang, upon request. All data may be used for non-commercial purposes only, and sharing restrictions may be applied to sensitive data to preserve the participants' privacy.

CODE AVAILABILITY
The bioinformatic analysis of the sequencing data followed a previously reported sequencing analysis pipeline, which is available at https://www.protocols.io/view/adbs-whole-exome-sequencing-wes-analysis-pipeline-8epv56rz6g1b/v1.