Detection of genomic rearrangements from targeted resequencing data in Parkinson's disease patients

ABSTRACT
Background: The analysis of coverage depth in next-generation sequencing data allows the detection of gene dose alterations. We explore the frequency of such structural events in a Spanish cohort of sporadic PD cases.
Methods: Gene dose alterations were detected with the eXome-Hidden Markov Model (XHMM) software from depth of coverage in resequencing data available for 38 Mendelian and other risk PD loci in 394 individuals (249 cases and 145 controls) and subsequently validated by quantitative PCR.
Results: We identified 10 PD patients with exon dosage alterations in PARK2, GBA-GBAP1, and DJ1. Additional functional variants, including 2 novel nonsense mutations (p.Arg1552Ter in LRRK2 and p.Trp90Ter in PINK1), were confirmed by Sanger sequencing. This combined approach disclosed the genetic cause of 12 PD cases.
Conclusions: Gene dose alterations related to PD can be correctly identified from targeted resequencing data. This approach substantially improves the detection rate of cases with causal genetic alterations.
© 2016 The Authors. Movement Disorders published by Wiley Periodicals, Inc. on behalf of International Parkinson and Movement Disorder Society.

Next-generation sequencing (NGS) technologies such as whole-genome, whole-exome, and custom targeted sequencing strategies are increasingly being used to understand the genetic factors underlying both common and rare neurological disorders. 1,2 Notably, the recent advent of NGS technologies has been accompanied by the implementation of new bioinformatic tools to detect copy number variants (CNVs) from the depth of read mapping, even after applying [...]. 3-7 Most of these new tools are based on the assumption that differences in the depth of coverage among specific genomic regions and across multiple samples can be used as an indicator of the relative number of copies, resulting in a semiquantitative estimation of CNVs. However, as far as we know, this approach has never been attempted in PD.
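The read-depth assumption described above can be illustrated with a minimal sketch. It is not the procedure used by any of the cited tools: the function name `estimate_copy_number`, the median-based scaling, and the toy depths are all assumptions made for illustration only.

```python
from statistics import median


def estimate_copy_number(sample_depths, cohort_depths, ploidy=2):
    """Semiquantitative copy number per target for one sample (illustrative only).

    sample_depths: mean read depth per target for the sample of interest.
    cohort_depths: list of per-sample depth lists (same target order), used as
                   the reference panel. Assumes no sample has a median depth of 0.
    """
    def scale(depths):
        # Divide by the sample's own median depth to remove differences in
        # overall sequencing yield between samples.
        m = median(depths)
        return [d / m for d in depths]

    scaled_sample = scale(sample_depths)
    scaled_cohort = [scale(s) for s in cohort_depths]

    estimates = []
    for t, value in enumerate(scaled_sample):
        # Typical scaled depth of this target across the reference panel.
        reference = median(s[t] for s in scaled_cohort)
        # A ratio of 0.5 suggests one copy, 1.0 two copies, 1.5 three copies.
        estimates.append(ploidy * value / reference)
    return estimates


# Hypothetical depths for three targets in three reference samples.
cohort = [[50, 40, 60], [100, 80, 120], [25, 20, 30]]
# The second target has half the expected depth in this hypothetical sample.
print(estimate_copy_number([50, 20, 60], cohort))
```

In this toy example the second target comes out at about one copy, which is the pattern expected for a heterozygous deletion. Real data are far noisier, which is why dedicated tools add normalization and statistical modelling on top of this basic ratio.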
The main aim of this study was to explore whether CNVs help to explain some sporadic PD cases in which no causative point mutations are found. In particular, we used our own resequencing data of 38 PD-associated genes in 249 PD cases and 145 unrelated controls of European ancestry 8 to study the presence of structural variants predicted by the eXome-Hidden Markov Model (XHMM) software (https://atgu.mgh.harvard.edu/xhmm/), which was specifically developed to recover information on copy number variation from normalized read depth data obtained from targeted sequencing experiments. 3 Subsequently, all the predicted CNVs were validated through quantitative polymerase chain reaction (PCR). Moreover, detailed frequencies for all potentially functional exon dose alterations detected here and for previously described pathogenic single nucleotide polymorphisms (SNPs) and indels in the same resequencing dataset are provided to understand their relative relevance in PD.

Coverage and Detection of CNVs
The mean coverage per target and sample was 49.39X, and 91% of the target bases were covered at 15X depth. 8 The detection of CNVs was performed with the XHMM software, which uses principal component analysis normalization and a hidden Markov model to detect and genotype CNVs from normalized read-depth data from targeted sequencing experiments. 3 Phred-scaled quality scores for the CNV events in the inferred intervals ranged from 30 to 99, with a mean of 80 (see Table S1). Gene dose alterations in Mendelian PD genes were subsequently validated by quantitative PCR (see details in Appendix 2 in the Supplementary Data).
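To make the hidden Markov model step more concrete, the following is a simplified sketch of Viterbi decoding over three states (deletion, diploid, duplication) applied to already normalized per-target z-scores for a single sample. It is not the XHMM implementation: the principal component normalization is omitted, and the transition probabilities (`cnv_start`, `cnv_stay`) and emission shift (`shift`) are illustrative assumptions rather than the software's actual parameters, which also take the distance between targets into account.

```python
import math

STATES = ("DEL", "DIP", "DUP")


def gaussian_logpdf(x, mean, sd=1.0):
    """Log density of a normal distribution, used as the emission model."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_cnv(z_scores, cnv_start=1e-3, cnv_stay=0.8, shift=3.0):
    """Most likely DEL/DIP/DUP state path for one sample (assumed parameters)."""
    if not z_scores:
        return []
    # Emission means: deletions shift depth down, duplications shift it up.
    means = {"DEL": -shift, "DIP": 0.0, "DUP": shift}
    # Transition probabilities; every row sums to 1.
    trans = {
        "DIP": {"DEL": cnv_start, "DIP": 1 - 2 * cnv_start, "DUP": cnv_start},
        "DEL": {"DEL": cnv_stay, "DIP": 1 - cnv_stay - cnv_start, "DUP": cnv_start},
        "DUP": {"DEL": cnv_start, "DIP": 1 - cnv_stay - cnv_start, "DUP": cnv_stay},
    }
    # Start from the same distribution as leaving the diploid state.
    score = {s: math.log(trans["DIP"][s]) + gaussian_logpdf(z_scores[0], means[s])
             for s in STATES}
    backpointers = []
    for z in z_scores[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best previous state for arriving in state s at this target.
            prev = max(STATES, key=lambda p: score[p] + math.log(trans[p][s]))
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + gaussian_logpdf(z, means[s]))
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical z-scores for seven consecutive targets in one sample.
print(viterbi_cnv([0.1, -0.2, -3.1, -2.8, -3.3, 0.2, 0.0]))
```

With these assumed settings, a run of three adjacent targets near z = -3 is decoded as a deletion, whereas a single isolated low target is not. This illustrates why a model that considers neighboring targets is more robust than per-target thresholding.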

PCR and Sanger Sequencing Validation Analysis
Frameshift deletions and stop gain mutations within the PARK2, LRRK2 and PINK1 genes plus genomic rearrangements around the GBA-GBAP1 region were confirmed by PCR and Sanger sequencing analyses (see details in Appendix 3 in the Supplementary Data).

Collapsing Tests
The potential enrichment of exon dosage alterations was tested in the whole set of cases and controls using the full list of CNVs reported in Table S1. CNV enrichment was performed with the VariantTools package (http://varianttools.sourceforge.net/), which includes up to 12 different collapsing tests. 9
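The general idea behind a collapsing test can be sketched as follows. This is not the VariantTools implementation, and it corresponds to only one simple flavor of collapsing: each individual's qualifying variants are reduced to a single carrier indicator, and carrier counts in cases and controls are compared with a one-sided Fisher exact test. The function names are hypothetical, and the counts in the usage line (10 of 249 cases, 2 of 145 controls) are the overall CNV carrier counts given in the Results, used here purely as an illustration.

```python
from math import comb


def collapse_carriers(variants_per_individual):
    """Collapse each individual's list of qualifying variants into 0/1."""
    return [1 if variants else 0 for variants in variants_per_individual]


def fisher_exact_greater(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher exact P value for an excess of carriers among cases.

    Sums hypergeometric probabilities for tables with at least as many
    case carriers as observed, keeping all margins fixed.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, carriers)
    numerator = 0
    for k in range(case_carriers, min(carriers, case_total) + 1):
        # math.comb returns 0 when more items are requested than available.
        numerator += comb(case_total, k) * comb(control_total, carriers - k)
    return numerator / denominator


# Collapsing hypothetical per-individual variant lists into carrier status.
print(collapse_carriers([["PARK2 del"], [], ["GBA dup", "DJ1 del"]]))

# Carrier counts taken from the Results, for illustration only.
print(fisher_exact_greater(10, 249, 2, 145))
```

Using exact integer arithmetic from the standard library avoids any dependency on external statistics packages. Dedicated association software typically offers further collapsing variants, such as frequency-weighted or permutation-based tests.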

Results
Our study of structural genetic alterations across the 38 PD-associated genes previously sequenced through NGS 8 disclosed a total of 11 structural variants in the PARK2, GBA, and DJ1 genes, affecting 10 of 249 PD cases. All of these genomic alterations were predicted by the XHMM software 3 (see details in Table S1) and were subsequently validated by quantitative PCR (Figure 1 and Figure S1). CNVs were also detected in 2 of 145 controls in a heterozygous state for the recessive gene PARK2 and the GBA-GBAP1 region. The clinical features of the PD samples in which structural variants were found are provided in Appendix 1 (Supplementary Data). Additional functional sequence alterations detected in the same dataset are listed in Table 1 (see Table S2 for complete genotypes).
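The text here does not state how the quantitative PCR data were converted into exon dosage (the details are in Appendix 2). As a hedged illustration only, the sketch below assumes the widely used comparative Ct approach with roughly 100% amplification efficiency. The function names and the Ct values are hypothetical.

```python
def relative_dose(ct_target_sample, ct_ref_sample,
                  ct_target_calibrator, ct_ref_calibrator):
    """Relative exon dose by the comparative Ct method (assumed, not from the source).

    The target exon is compared with a reference amplicon in the test sample
    and in a calibrator sample presumed to carry two copies.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Each extra cycle needed corresponds to half the starting template.
    return 2 ** -(delta_sample - delta_calibrator)


def estimated_copies(ratio, ploidy=2):
    """Round the relative dose to the nearest whole copy number."""
    return round(ploidy * ratio)


# Hypothetical Ct values: the target amplifies one cycle later than expected.
ratio = relative_dose(26.0, 24.0, 25.0, 24.0)
print(ratio, estimated_copies(ratio))
```

Under these assumptions, a ratio near 0.5 is compatible with a heterozygous deletion, a ratio near 0 with a homozygous deletion, and a ratio near 1.5 with a heterozygous duplication.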

Structural Variants in the PARK2 Gene
We identified 4 different exon deletions and 1 exon duplication in the PARK2 gene, affecting a total of 6 PD cases (Figure 1A). All of them have already been described in PD cases. 10,11 The 2 largest PARK2 deletions spanned from exon 3 to 6 and from exon 2 to 4 and were found in the heterozygous state in Cas241 and Cas20, respectively. These 2 PD cases are examples of compound heterozygotes because both individuals also present different frameshift mutations in exon 2, causing a premature stop codon: Cas241 is heterozygous for a dinucleotide deletion (p.Gln34Argfs), whereas Cas20 is hemizygous for a single-nucleotide deletion (p.Asn52Metfs; see Table 1, Table S2, and Figure S2). The PARK2 deletion comprising exons 3 to 4 was found in 2 different PD cases: Cas57, who is homozygous for the deletion, and Cas246, who is an additional PARK2 compound heterozygote also carrying the p.Asn52Metfs mutation. Finally, Cas232 is homozygous for a PARK2 deletion affecting exon 2, and Cas148 is homozygous for a duplication affecting exon 3. As expected, early age at onset (before 45 years of age) was found in 4 of the 6 patients with structural variants in PARK2.
SPATARO ET AL

Structural Variants in the GBA-GBAP1 Region
The high homology between GBA and its neighboring pseudogene (GBAP1), which share 96% sequence identity, not only explains several gene-pseudogene rearrangements and gene-conversion events 12 but also complicates the analysis of the whole region. 13 Our analysis disclosed 4 individuals presenting CNVs along the GBA-GBAP1 region (see Figure 1B). Cas103, which has already been described in [...] Figure S3 and Table S3). In addition to the GBA-GBAP1 duplication rearrangement, Cas62 also carries, in a heterozygous state, the p.Asp370Ser mutation, which has been described to increase risk for late-onset PD 15 and is known to be the most common causal mutation for Gaucher's disease in Ashkenazi Jews 16 (Table S2).

Structural Variants in the DJ1 Gene
Among all analyzed participants, only 1 PD patient (Cas136) carried a heterozygous deletion comprising the whole exon 4 of the DJ1 gene (Figure 1C). As far as we are aware, this alteration has never been reported before. 10,11 Interestingly, the same individual is also heterozygous for 1 duplication in the GBA-GBAP1 region. Therefore, Cas136 could be a particular case of a double heterozygote for genomic rearrangements occurring in 2 different PD loci.

Other Functional Mutations and Enrichment of Rare Variation
Besides the detected structural variants, other functional alterations (including several known PD Mendelian mutations) had been previously detected in the same dataset. 8 Frequencies and details are presented in Table 1 and Appendix 4 (Supplementary Data). Here, we further validated particular genotypes by Sanger sequencing in some PD cases to confirm genotypes in regions with low coverage depth as well as novel potential causative variants. Notably, we confirmed the identification of 2 stop-gain mutations that, to our knowledge, were previously unreported, having checked the PDmutDB, 10,11 the 1000 Genomes Project, 17 the ExAC database (http://exac.broadinstitute.org/, accessed April 26, 2016), and the dbSNP database 18: p.Arg1552Ter in LRRK2, which was found in 1 heterozygous carrier (Cas55), and p.Trp90Ter in PINK1, which was found in the homozygous state (Cas154) and could also represent a new causal variant for PD (see Figure S2).
In the same dataset, we have previously shown that PD cases displayed significantly higher proportions of rare (minor allele frequency [MAF] <1%) code-altering variants (nonsynonymous SNPs, nonsense mutations, and coding indels) than controls in Mendelian genes. 8 Notably, when performing the same type of collapsing analyses, considering only the exon dosage alterations identified here, PD cases also show significant enrichment of CNVs in PD Mendelian genes when compared with controls (P value <.05 in 10 of 12 tests; Figure S4).

Discussion
Our analysis demonstrates the usefulness of NGS for discovering different types of variants with a potential role in human disease. Among the detected inactivating variants, we not only report point mutations and small indels but also different exon dosage variants already known to be involved in PD aetiology. Notably, all predicted CNVs were subsequently confirmed by quantitative PCR, suggesting that the analysis of coverage from resequencing data across multiple individuals provides valuable information for the identification of exon dosage variants. It should be noted, however, that the sensitivity of the XHMM software could not be evaluated with this research design and that our study was focused on a limited set of candidate genes. Thus, an unknown number of CNVs could remain undiscovered in this set of patients.
Whereas the analysis of functional SNP variation and indels in Mendelian genes related to PD allowed us to identify putative causative variants for 6 PD cases (Cas211, Cas213, Cas226, Cas113, Cas55, and Cas154), the joint analysis of these inactivating variants, together with the exon dose alterations detected here, probably helps to explain the disease phenotype of 6 additional PD cases (Cas246, Cas20, Cas57, Cas232, Cas148, and Cas241) in our Spanish cohort of 249 PD cases (2.4%). Thus, as demonstrated in this dataset, CNVs in the form of exon dose alterations are at least as important as indels and other functional SNP variations in PD Mendelian genes when explaining the phenotype of apparently sporadic PD cases. Given the recognized role of structural variants in several neurodegenerative disorders and because many NGS-based projects with large numbers of individuals are currently underway to study rare and common variant association, efforts should be made to integrate the analysis of potential CNVs in these new datasets.

Supporting Data
Additional Supporting Information may be found in the online version of this article at the publisher's website.