Copy Number Variants Account for a Tiny Fraction of Undiagnosed Myopathic Patients

Next-generation sequencing (NGS) technologies have led to an increase in the diagnosis of heterogeneous genetic conditions. However, over 50% of patients with a genetically inherited disease are still without a diagnosis. In these cases, different hypotheses are usually postulated, including variants in novel genes or elusive mutations. Although the impact of copy number variants (CNVs) in neuromuscular disorders has been largely ignored to date, missed CNVs are predicted to have a major role in disease causation as some very large genes, such as the dystrophin gene, have prone-to-deletion regions. Since muscle tissues express several large disease genes, the presence of elusive CNVs needs to be comprehensively assessed following an accurate and systematic approach. In this multicenter cohort study, we analyzed 234 undiagnosed myopathy patients using a custom array comparative genomic hybridization (CGH) that covers all muscle disease genes at high resolution. Twenty-two patients (9.4%) showed non-polymorphic CNVs. In 12 patients (5.1%), the identified CNVs were considered responsible for the observed phenotype. An additional ten patients (4.3%) presented candidate CNVs not yet proven to be causative. Our study indicates that deletions and duplications may account for 5–9% of genetically unsolved patients. This strongly suggests that other mechanisms of disease are yet to be discovered.


Introduction
Prior to the advent of next-generation sequencing (NGS), extensive phenotypic overlap and the lack of pathognomonic signs in neuromuscular disorders (NMDs) made clinical diagnosis difficult and molecular confirmation extremely complex and challenging [1]. In the NGS era, different genetic approaches have been described in the literature [2,3], with a detection rate of single nucleotide variants or small ins/dels ranging between 40% and 60% [4].
Patients remaining undiagnosed may harbor mutations in unknown genes, multifactorial or polygenic conditions, or elusive variants such as deep intronic mutations, variants in regulatory elements, trinucleotide repeat expansions and copy number variants (CNVs) [5].
CNVs are defined as genomic deletions and duplications of 50 base pairs (bp) or longer [6] that may account for 5-10% of elusive mutations in the human genome [7]. To date, few custom array comparative genomic hybridization (CGH) studies on muscular genes have been described [8,9,10]. Recently, CNVs have been detected from targeted sequencing, whole exome, or genome data by applying different computational algorithms [11,12]. Despite the availability of these bioinformatics tools, array CGH remains the gold standard for the detection of exonic CNVs [13].
Previous reports suggested that the frequency of CNVs in myopathy patients is around 4-10% [9]. However, these studies mainly provided a technical validation of array CGH platforms [8,9], with a limited number of genes [10] and/or patients tested [9]. In this study, we enrolled 234 patients who had remained undiagnosed after an extensive molecular investigation including Sanger sequencing of candidate genes and MotorPlex, a targeted NGS panel designed to detect single-base substitutions or small ins/dels [14].
We analyzed these genetically undiagnosed patients by Motor Chip, a custom array CGH for the detection of deletions and duplications in 425 neuromuscular genes [9], and identified causative and potential disease-causing CNVs in 22 patients.

Patients Recruited
We collected DNA samples from 504 patients presenting with clinical signs of limb-girdle muscular dystrophies, congenital myopathies and other conditions affecting the muscles including isolated hyperCKemia. All samples were analyzed by a targeted NGS tool, named MotorPlex [14,15], to identify single nucleotide changes as well as small ins/dels. Based on DNA availability, 234 out of 286 myopathic patients who had remained undiagnosed after this preliminary NGS study were recruited to look for CNVs. The patients involved in the project GUP11006 had already provided written informed consent at the time of blood collection and the Ethics Committee of the University of Campania, Naples, Italy approved the extension of the project [14]. Genomic DNA from leukocytes of peripheral blood was isolated using standard operating procedures established by the EuroBioBank network. DNA quality and quantity were assessed using spectrophotometric (Nanodrop ND 1000, Thermo Scientific Inc., Rockford, IL, USA) and fluorometry-based (Qubit 2.0 Fluorometer, Life Technologies, Carlsbad, CA, USA) methods.

Array CGH and Pair Analysis
All the undiagnosed patients were enrolled for this study immediately after analysis of NGS results in 2015. At that time, few bioinformatics tools were available for CNV analysis on whole exome data, and none had been optimized for use on data from targeted NGS. We used the most user-friendly tool, SureCall (Agilent Technologies, Santa Clara, CA, USA), and tested its ability to identify CNVs from 50 targeted NGS samples. The SureCall algorithm uses pair analysis (sample vs. reference), comparing the per-base read depth of each target interval between the sample and reference BAM files. Aberration intervals were predicted from the log2 ratio of the depth of coverage of the sample to that of the reference.
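The pair-analysis idea described above can be illustrated with a minimal sketch. This is not SureCall's implementation: the function names, the pseudocount, the median centering, and the gain/loss thresholds are all assumptions chosen for illustration. The only element taken from the text is the log2 ratio of sample depth to reference depth for each target interval.

```python
import math
from statistics import median

def log2_ratios(sample_depth, reference_depth, pseudocount=0.5):
    """Median-centered log2(sample/reference) depth ratio per target interval.

    Both arguments map an interval name to its mean read depth.
    The pseudocount (an assumption) avoids log2(0) for uncovered intervals.
    """
    raw = {
        interval: math.log2(
            (sample_depth[interval] + pseudocount)
            / (reference_depth[interval] + pseudocount)
        )
        for interval in sample_depth
    }
    # Centering on the median removes a global difference in sequencing depth.
    center = median(raw.values())
    return {interval: value - center for interval, value in raw.items()}

def classify(ratio, loss=-0.6, gain=0.4):
    """Label a log2 ratio; the thresholds are illustrative assumptions.

    For a diploid locus, a heterozygous deletion is expected near
    log2(1/2) = -1 and a heterozygous duplication near log2(3/2) = 0.58.
    """
    if ratio <= loss:
        return "loss"
    if ratio >= gain:
        return "gain"
    return "neutral"

# Hypothetical depths for five target intervals (not data from this study).
sample = {"exon1": 100, "exon2": 50, "exon3": 100, "exon4": 100, "exon5": 150}
reference = {"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100, "exon5": 100}
ratios = log2_ratios(sample, reference)
labels = {interval: classify(value) for interval, value in ratios.items()}
```

With these hypothetical depths, exon2 is labeled a loss and exon5 a gain. Real targeted NGS data are much noisier per interval, which is consistent with the high number of calls per sample reported in the Results.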
A custom array CGH named Motor Chip, able to investigate more than 400 genes related to neuromuscular disorders at exon-level resolution, was then used for the detection and characterization of CNVs [9]. Labeling and hybridization were performed using the SureTag labeling kit (Agilent Technologies) according to the manufacturer's specifications. Scanned array images were analyzed by Cytogenomics v4 (Agilent Technologies). After performing a quality control test, duplications and deletions were identified using the Aberration Detection Method 2 (ADM-2) algorithm. At least three target probes with changes in the number of copies were required for a CNV call. Deletions and duplications corresponding to well-known copy number polymorphisms were filtered out [16]. Variants not known to be pathogenic or of doubtful significance were compared with the Database of Genomic Variants [17], DECIPHER (https://decipher.sanger.ac.uk/), and ExAC Browser (http://exac.broadinstitute.org/) to facilitate their interpretation. Data reported here are being submitted to ClinVar [18] and the Leiden Open Variation Database (LOVD) [19].
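The three-probe requirement can be sketched as a simple run-length filter over probe log2 ratios. This is not the ADM-2 algorithm, which uses a statistical score over candidate intervals; the thresholds, the function names, and the assumption that the three probes must be consecutive are ours, for illustration only.

```python
from itertools import groupby

def probe_state(log2_ratio, loss=-0.6, gain=0.4):
    """Assign a copy-number state to one probe (thresholds are assumptions)."""
    if log2_ratio <= loss:
        return "loss"
    if log2_ratio >= gain:
        return "gain"
    return "neutral"

def call_cnvs(probes, min_probes=3):
    """Report runs of consecutive aberrant probes supported by >= min_probes.

    probes: list of (position, log2_ratio) tuples sorted by genomic position.
    Returns a list of (start, end, state, n_probes) tuples.
    """
    calls = []
    for state, group in groupby(probes, key=lambda p: probe_state(p[1])):
        run = list(group)
        if state != "neutral" and len(run) >= min_probes:
            calls.append((run[0][0], run[-1][0], state, len(run)))
    return calls

# Hypothetical probe data (not from this study): a three-probe loss is
# called, whereas a two-probe gain falls below the support threshold.
probes = [
    (100, 0.02), (200, -0.95), (300, -1.05), (400, -0.90),
    (500, 0.00), (600, 0.60), (700, 0.55), (800, 0.01),
]
calls = call_cnvs(probes)
```

Requiring several supporting probes trades sensitivity for very small events against robustness to single-probe noise.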

Real-Time PCR
All identified CNVs were further confirmed by real-time PCR using the Bio-Rad CFX96 system (Bio-Rad, Pleasanton, CA, USA). Specific primers (available from the authors on request) able to investigate the deleted or duplicated regions were designed using the Primer 3 webtool. The 20 µL reaction contained 10 µL 2X FastStart universal SYBR Green Master (Bio-Rad), 10 µM of each primer, and 10 ng of genomic DNA as template. The following thermal conditions were used: 10 min of preheating at

MLPA Analysis for DMD and SPAST Genes
Complete or partial deletions/duplications involving DMD and SPAST genes were confirmed by Multiplex ligation-dependent probe amplification (MLPA) using the SALSA MLPA P034 and P035 DMD kits and P165 SPAST kit (MRC-Holland, Amsterdam, The Netherlands), according to the manufacturer's recommendations. MLPA data analysis was performed with the Coffalyser.net package (MRC-Holland).

cDNA Analysis
To further characterize the duplications, a detailed analysis of the transcript was conducted. RNA was isolated from leukocytes of peripheral blood or from muscular biopsies, according to standard procedures. A specific coding sequence demonstrating the predicted tandem duplications was then amplified and bidirectionally sequenced on an ABI 3130xL automatic DNA sequencer (Applied Biosystems, Foster City, CA, USA).

Histological Studies and Western Blot Analysis
Histological and histochemical examinations in muscle biopsies were carried out following standard procedures [20]. Western blotting of muscle biopsy samples was performed according to standard methods [21].

Results
After NGS analysis, 50 undiagnosed patients were initially tested for CNVs by SureCall pair analysis, which confidently identified on average eight CNV calls per sample. Motor Chip, performed to validate the identified CNVs, showed that SureCall pair analysis achieved a detection rate of about 40-60%. Because of the high number of CNV calls, the low sensitivity, and the inaccurate mapping of CNV breakpoints with SureCall pair analysis, we used Motor Chip to test the remaining undiagnosed patients. Motor Chip analysis identified a causative deletion or duplication in genes responsible for the observed phenotype in 12 out of 234 patients (5.1%) (Figure 1). Molecular findings and clinical data of patients carrying the identified CNVs are summarized in Table 1.

CNVs in Dystrophin Gene
Previously described dystrophin gene (DMD-MIM 300377) deletions [22] were identified in four patients (two men and two women) presenting with proximal weakness and/or isolated serum CK levels 10 times over the maximum normal range (patients I-IV). The deletion of exons 3-7 in patient IV occurred de novo. In patient III, the deletion of exons 45-48 was inherited from her affected father, who had remained undiagnosed due to a very mild phenotype and a very low increase in CK values.

CNVs in Laminin-2 Gene
In four patients, laminin-2 gene (LAMA2-MIM 156224) CNVs were detected in the presence of a heterozygous single nucleotide mutation previously identified by MotorPlex. In particular, patient V carried an unreported nonsense mutation (c.5374G>T, p.Glu1792*) and a 190.6 kb intragenic duplication. Further molecular characterization of the LAMA2 transcript, isolated from leukocytes, showed a tandem, in-frame duplication involving exons 21-55 of the gene (Figure 2a,b). A merosin deficiency was confirmed by Western blotting of skeletal muscle. Patient VI harbored a previously unreported missense variant in exon 47 of the LAMA2 gene (c.6599G>A, p.Arg2200His) and a known heterozygous in-frame deletion of exons 13-37 [9] in the same gene. Array CGH identified a novel heterozygous LAMA2 deletion (exons 13-14) generating a frameshift in the open reading frame in patient VII, affected by a congenital myopathy. In the same patient, MotorPlex had identified a heterozygous mutation affecting the canonical splice site (c.4312-1G>A) of exon 29.
Similarly, patient VIII harbored a previously undescribed intronic variant (c.6429+3A>C) as well as a 75 kb intragenic duplication (exons 4-12). Muscular RNA analysis identified two transcripts (Figure 2c,d): one transcript with skipping of exons 44 and 45 leading to the loss of the reading frame (due to the intronic splicing variant) and a second transcript showing an in-frame duplication of 1386 nucleotides, predicted to produce a protein longer by 462 amino acids. However, we were unable to perform a Western blot analysis as additional tissue was not available.
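The reading-frame arithmetic behind the 1386-nucleotide duplication can be checked with a short sketch. The function name is hypothetical, and the check considers only whether the length of the duplicated or deleted coding sequence is a multiple of three; it ignores premature stop codons and effects on splicing.

```python
def frame_effect(coding_length_nt):
    """Classify a coding-sequence gain or loss by its length in nucleotides.

    Returns ("in-frame", number_of_codons) when the length is a multiple
    of three, otherwise ("frameshift", None).
    """
    codons, remainder = divmod(coding_length_nt, 3)
    if remainder == 0:
        return ("in-frame", codons)
    return ("frameshift", None)

# The 1386-nucleotide duplication reported for patient VIII.
effect = frame_effect(1386)
```

Since 1386 = 3 × 462, the duplication preserves the reading frame and adds 462 codons, in agreement with the predicted protein longer by 462 amino acids.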

CNVs in Sarcoglycan Genes
In patient IX, we found a previously unreported homozygous deletion of the first coding exon of δ-sarcoglycan gene (SGCD-MIM 601411). His unaffected parents were heterozygous for the observed CNV. Similarly, the strong clinical suspicion for patient X and the lack of a causative variant after NGS analysis prompted us to check the coverage for sarcoglycan genes. NGS data coverage showed the absence of reads in the last exon of beta-sarcoglycan gene (SGCB-MIM 600900) suggesting a deletion. Motor Chip analysis defined the deletion at the last 12 codons in exon 6 and the 3' UTR (untranslated region) [23]. We also detected a reported heterozygous deletion spanning exon 7 of gamma-sarcoglycan gene (SGCG-MIM 608896) on the maternal allele of patient XI [24]. In the same patient, NGS analysis had previously identified an in-frame 3 nucleotide deletion (c.124_126del), listed in the LOVD database [19], on the father's allele in exon 2 of SGCG gene.

Deletion in Spastin Gene
We found a reported in-frame heterozygous deletion of exons 10-16 [25] of spastin gene (SPAST-MIM 604277) in patient XII, who presented with fatigue and hyposthenia since adolescence. A large segregation analysis showed the presence of the deletion in her three affected relatives (III-1, III-8, IV-3; Supplementary Figure S1) and in her asymptomatic sister (III-7; Supplementary Figure S1). Three unaffected family members (III-3, III-4, III-6; Supplementary Figure S1) were negative for the observed deletion, while genetic testing was not performed in the clinically affected patients II-3 and IV-1, who refused DNA investigations.
We identified variants of uncertain significance (VUS) in ten patients (4.3%) (Table 2). VUS are genomic variants not previously reported in normal individuals and with insufficient information about their clinical significance [26]. As required by the American College of Medical Genetics and Genomics guidelines, an exhaustive characterization will be needed to clarify their clinical meaning.
Abbreviations: n.a. = not available, CK = creatine kinase, EMG = electromyography, RF = respiratory function, pat = paternal, DG = dystroglycan, N = normal, het = heterozygous; del = deletion; dup = duplication. * list of genes is available upon request.

Role of CNVs in Skeletal Muscle Disorders
Whole exome and targeted sequencing have proven to be robust and cost-efficient diagnostic tools in heterogeneous diseases, increasing the detection rate of disease-causing variants compared to the traditional gene-by-gene approach [28]. However, over 50% of undiagnosed patients affected by nonspecific skeletal muscle disorders do not receive any molecular diagnosis using NGS strategies [14].
Taking advantage of a large cohort of myopathy patients previously screened with a gene panel assay, we designed a study aiming to verify the presence of CNVs in unsolved patients [14]. An extensive screening of 234 myopathy patients by Motor Chip identified 22 CNVs. A causative deletion or duplication was found in 12 out of the 234 patients (5.1%).
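The yields quoted above follow directly from the patient counts. A minimal sketch of the arithmetic is given here; the helper name is hypothetical, and the counts are those reported in the text.

```python
def percent(count, total):
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)

COHORT = 234        # undiagnosed myopathy patients analyzed by Motor Chip
CAUSATIVE = 12      # patients with a causative CNV
UNCERTAIN = 10      # patients with a CNV of uncertain significance

causative_pct = percent(CAUSATIVE, COHORT)
uncertain_pct = percent(UNCERTAIN, COHORT)
any_cnv_pct = percent(CAUSATIVE + UNCERTAIN, COHORT)
```

These values correspond to the 5.1%, 4.3% and 9.4% figures reported for the cohort, and bound the 5-9% range stated in the abstract.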
The identification of deletions or duplications in DMD gene causing Becker muscular dystrophy (BMD) was not unexpected, and only confirmed the clinical overlap between BMD and limb-girdle muscular dystrophy [29]. In total, 2.2% of the original 504 patients recruited to the study were diagnosed with BMD, including those carrying the 7 single nucleotide variants (small indels, nonsense and splice site variants) identified by MotorPlex [14].
The presence of female carriers as isolated cases in the absence of affected males in the family (case IV) is noteworthy. The diagnosis of a Duchenne muscular dystrophy or BMD manifesting carrier should be suspected in a female presenting with limb-girdle weakness, despite a negative family history [30]. As previously reported for X-linked myotubular myopathy [31,32], the extensive use of genome-wide tools in a diagnostic setting will identify an increasing number of BMD manifesting carriers, probably due to skewed X-chromosome inactivation [33]. Their identification will be crucial for appropriate genetic counseling and correct prenatal diagnosis.
Several LAMA2 intragenic deletions and duplications were previously reported [34]. In our cohort, we found a CNV in four out of 15 patients (27%) harboring a rare (minor allele frequency < 1%) single nucleotide variant in LAMA2. Interestingly, LAMA2 CNVs in compound heterozygosity with a protein truncating variant (PTV) were observed in three patients. Therefore, in the presence of a heterozygous PTV, a CNV affecting the second allele should always be suspected.
CNVs in sarcoglycan genes are generally considered a rare event. However, our results, as well as previous data [9], suggest that their incidence has likely been underestimated. As in the case of LAMA2, screening for exonic deletions/duplications in sarcoglycan genes is strongly recommended in patients carrying a single heterozygous mutation, above all if a reduction of sarcoglycan proteins is observed [23].
Deletions in the SPAST gene are common in hereditary spastic paraplegia [35]. However, the high phenotypic variability of the disease makes clinical evaluation and molecular characterization challenging, especially when proximal muscle weakness in the limbs and minimal neurological signs are present [36], as in patient XII. In this case, identification of the SPAST deletion provided the molecular diagnosis in her family, increasing the number of described cases. As previously reported, the presence of an asymptomatic carrier of a spastin deletion may reflect reduced penetrance of the disorder [36].
Retrospectively, we performed a SureCall analysis using 10 out of 12 causative CNVs as positive controls. Although in four samples (II, VI, VIII, IX) we identified the respective deletion or duplication previously found with array CGH, breakpoints were not correctly defined. Unlike MotorPlex, which provides exon level coverage, Motor Chip design comprises probes covering the intronic region of neuromuscular genes, yielding a better definition of CNV breakpoints.

Variants of Uncertain Significance
We identified non-recurrent rare CNVs, which we interpreted as VUS, in ten patients (4.3%) including heterozygous deletions involving genes responsible for autosomal recessive muscle disorders (ETFB-patient XIII; MLYCD-patient XIV). Despite our exhaustive genetic investigation, including an NGS strategy and a CNV assessment, the presence of further elusive variants (such as deep intronic variants or small repeat changes) in these genes cannot be excluded.
Two interesting heterozygous deletions were found in genes (MYPN-patient XV and CSRP3-patient XVI) associated with a dominant cardiomyopathy. Specifically, an MYPN deletion was found in patient XV presenting with hypertrophic cardiomyopathy and centronuclear myopathy. Biallelic mutations of MYPN are associated with a slowly progressive nemaline myopathy [37] and a recessive cap myopathy [38]. However, no other single nucleotide variants in MYPN were identified in this patient. Still more complex is the interpretation of the CSRP3 deletion identified in patient XVI, who presented with dilated cardiomyopathy and muscle weakness. To date, no CSRP3 mutations have been associated with a skeletal muscle phenotype. However, no other variants in myopathy disease genes included in our Motor Chip and MotorPlex panels explain the observed phenotype in these two cases. Although MYPN and CSRP3 may have a role in the observed cardiomyopathies, the primary genetic cause of the skeletal muscle disorder in these patients remains to be identified.
A duplication of the 15q11.2 region was found in patient XVII affected by congenital arthrogryposis. The patient recently obtained a clinical diagnosis of Nail Patella syndrome, confirmed by the identification of a de novo variant in LMX1B gene. However, the role of the identified duplication is still unclear since CNVs in the 15q11.2 region are normally associated with neurodevelopmental disorders [39,40].
Segregation studies, which were possible in only two cases (patients XVIII and XIX), excluded a primary disease role for a heterozygous deletion of the SPG11 gene and of Xp22.13-p22.12, respectively.
In contrast, the G6PC deletion detected in patient XX and the SEPN1 duplication found in patient XXI are not sufficient to explain the observed clinical phenotype in the absence of comprehensive biochemical and molecular evidence. Lastly, a heterozygous deletion of 1q23.3-24.2 was identified in patient XXII presenting with core myopathy. As this rearrangement involves five disease genes not previously associated with core myopathy, its role remains unclear.
In sum, although these VUS may not act as primary disease drivers, we cannot exclude the possibility that some may play a role as modifiers, contributing to the observed phenotype. However, the absence of functional assays and the lack of comparisons with similar cases are a limitation in the clinical interpretation of these rearrangements.

Conclusions
Here, we report the results observed in the largest cohort of patients with a skeletal muscle disorder analyzed to date by array CGH. In line with data reported in the literature, our findings confirm that deletions and duplications are present in 5-9% of patients affected by a skeletal muscle disorder without a molecular diagnosis. However, further extensive studies will be needed to clarify the role of CNVs in about half of these cases. The addition of data (including frequency) on CNVs to the ExAC database may assist in the clinical interpretation of observed rearrangements [41]. However, as evidenced here as well as by others, the integration of NGS results with biochemical, histological and Western blotting findings remains crucial for a correct clinical and diagnostic evaluation [23,42].
Although NGS panels are extensively applied in clinical settings for the detection of single nucleotide variants or small ins/dels, identification of deletions or duplications of whole exons, particularly single-exon CNVs, has proved problematic [43]. Improvements to tools identifying CNVs from NGS data are regularly reported [11,12]. Since several algorithms for detecting CNVs from NGS data [11] are now available, the exclusive adoption of SureCall for this purpose may represent a limitation of this study. We are aware that more advanced algorithms are likely to have higher sensitivity and specificity. However, their use still generates a high number of false positives, and their readouts still require validation by other methods such as array CGH or MLPA for further mapping of breakpoints. Array CGH remains the gold standard method for CNV detection and analysis, especially in diagnostics, as it is a well-established and validated strategy. Notably, our study suggests that Sanger sequencing is still the only reliable method for determining exact breakpoints at base-pair level, especially for duplications.
Since muscle tissues express several very large disease genes, some of which are prone to deletions or duplications, further assessment of possible CNVs is strongly advised in neuromuscular diagnostic settings. Finally, the use of whole genome sequencing or single-molecule long-read sequencing will help extend the search to other elusive variants not detectable by targeted or whole exome sequencing and CNV mapping [44].
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/9/11/524/s1, Figure S1: Pedigree of family with the SPAST deletion. Female III-5 in the pedigree corresponds to patient XII, harboring deletion of exons 10-16 of SPAST gene.
Funding: This study was entirely supported by grants from Telethon, Italy (TGM11Z06 to V.N.) and Telethon-UILDM (Unione Italiana Lotta alla Distrofia Muscolare) (GUP 10006 to G.P.C. and V.N., GUP11006 to V.N.). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.