Molecular and clinical analysis of 27 German patients with Leber congenital amaurosis

Leber congenital amaurosis (LCA) is the earliest and most severe form of all inherited retinal dystrophies (IRD) and the most frequent cause of inherited blindness in children. The phenotypic overlap with other early-onset and severe IRDs as well as difficulties associated with the ophthalmic examination of infants can complicate the clinical diagnosis. To date, 25 genes have been implicated in the pathogenesis of LCA. The disorder is usually inherited in an autosomal recessive fashion, although rare dominant cases have been reported. We report the mutation spectra and frequency of genes in 27 German index patients initially diagnosed with LCA. A total of 108 LCA- and other genes implicated in IRD were analysed using a cost-effective targeted next-generation sequencing procedure based on molecular inversion probes (MIPs). Sequencing and variant filtering led to the identification of putative pathogenic variants in 25 cases, thereby leading to a detection rate of 93%. The mutation spectrum comprises 34 different alleles, 17 of which are novel. In line with previous studies, the genetic results led to a revision of the initial clinical diagnosis in a substantial proportion of cases, demonstrating the importance of genetic testing in IRD. In addition, our detection rate of 93% shows that MIPs are a cost-efficient and sensitive tool for targeted next-generation sequencing in IRD.


Introduction
Leber congenital amaurosis (LCA, MIM #204000) was first described by Theodor Leber in 1869 and refers to a heterogeneous group of severe, mostly recessively inherited, early infantile-onset retinal dystrophies with typically extinguished electroretinograms (ERGs). Later, a separate group of milder disease phenotypes with some preservation of the ERG responses was described, the so-called "early-onset severe retinal dystrophy" (EOSRD) or "severe early childhood onset retinal dystrophy". LCA and EOSRD together are the most severe and earliest forms of all inherited retinal diseases (IRDs). They affect 20% of blind children and account for 5% of all IRDs [1]. In Germany, the estimated number of cases is 2000 (source: Pro Retina Deutschland e. V.). To date, mutations in 25 genes have been associated with LCA (https://sph.uth.edu/retnet/). A substantial proportion of cases (10-20%) remain unsolved despite extensive molecular testing [2][3][4]. This is partly due to technical limitations, as copy number variations often remain undetected in datasets derived from capture panels or whole exome sequencing, and partly due to the focus on coding regions in most diagnostic settings, which misses deep intronic variants acting on splicing and variants in regulatory sequences. There is considerable clinical and genetic overlap between LCA, EOSRD and other types of IRD; therefore, an accurate clinical diagnosis cannot always be made at the first visit of the young patients. Furthermore, the clinical examination of infants is challenging or limited. Hence, the initial clinical diagnosis sometimes has to be revised once genetic results are available.
For a long time, the genetic heterogeneity of LCA (and IRD in general) hampered DNA-based (molecular) diagnoses, since parallel screening of all associated genes requires next-generation sequencing approaches, for which reimbursement to the patient is often not guaranteed. We sought a cost-effective and sensitive approach to obtain a molecular diagnosis for 27 patients who had been diagnosed with LCA at the University Eye Hospital Tuebingen. The present study focuses on these genetically unsolved cases, which were screened for sequence variants in 108 genes associated with non-syndromic IRD by a cost-effective targeted panel-based next-generation sequencing approach.

Subjects and clinical assessment
In this study we included 27 unrelated patients of German origin with a clinical diagnosis of LCA who were not genetically pre-investigated. Their clinical diagnosis was established by standard clinical ophthalmologic examinations including patient history, psychophysical and electrophysiological examinations. Genomic DNA of patients was extracted from peripheral blood using standard protocols. Samples from all patients and family members were collected in accordance with the principles of the Declaration of Helsinki and were obtained with written informed consent accompanying the patients' samples. The study was approved by the institutional review board of the Ethics Committee of the University Hospital of Tuebingen.

Sequencing analysis
Molecular testing was performed by targeted next-generation sequencing at a core facility (Department of Human Genetics, Radboud University Nijmegen Medical Centre). We used molecular inversion probes (MIPs) with 5-bp molecular tags to conduct targeted next-generation sequencing of 108 genes associated with IRD (see S1 Table). The 1,524 coding exons and the 10 bp flanking each exon were targeted with 6,129 probes for an overall target size of 647,574 bp. On average, 4-6 MIPs cover one exon. The panel also includes the frequent LCA-associated pathogenic intronic variant c.2991+1655A>G in CEP290 [5]. Pooled and phosphorylated probes were added to the capture reactions with 100 ng of genomic DNA from each individual to produce a library for each individual. The libraries were amplified with 21 cycles of PCR, during which an 8-bp sample barcode was introduced. The barcoded libraries were then pooled and purified with AMPureXP beads (Beckman-Coulter). Sequencing was performed on an Illumina NextSeq 500 system. Demultiplexed BAM files were aligned to a human reference sequence (UCSC Genome Browser hg19) via the Burrows-Wheeler Aligner (BWA) v.0.6.2 [6]. An in-house automated data analysis pipeline and variant interpretation tools were used for variant calling. Rare and potentially disease-causing variants were confirmed by Sanger sequencing using standard protocols. Sanger sequencing was also used to screen for the recurrent c.2843G>A/p.C948Y variant in the CRB1 gene because it was not covered by the MIP reads.
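The role of the 5-bp molecular tags can be illustrated with a minimal sketch. The in-house pipeline is not described in the text, so the following is only an assumption about how such tags are commonly used: reads that share the same probe and the same tag are treated as PCR duplicates of one captured molecule. The probe identifiers and tag sequences are hypothetical; only the tag length and the probe and exon counts come from the text.

```python
from collections import defaultdict

TAG_LENGTH = 5  # 5-bp molecular tag, as stated in the text


def count_unique_molecules(reads):
    """Count distinct molecular tags per probe.

    `reads` is an iterable of (probe_id, tag) pairs. Reads with the same
    probe and tag are assumed to be PCR duplicates of a single capture event.
    """
    tags_per_probe = defaultdict(set)
    for probe_id, tag in reads:
        if len(tag) != TAG_LENGTH:
            raise ValueError(f"expected a {TAG_LENGTH}-bp tag, got {tag!r}")
        tags_per_probe[probe_id].add(tag)
    return {probe: len(tags) for probe, tags in tags_per_probe.items()}


# Hypothetical reads: two duplicates of one molecule plus two other molecules.
example_reads = [
    ("MIP_0001", "ACGTA"),
    ("MIP_0001", "ACGTA"),
    ("MIP_0001", "TTGCA"),
    ("MIP_0002", "GGATC"),
]
unique_counts = count_unique_molecules(example_reads)

# A 5-bp tag over four bases allows at most 4**5 distinct tags per probe.
max_distinct_tags = 4 ** TAG_LENGTH

# Panel-wide probe density from the figures given in the text.
mean_probes_per_exon = 6129 / 1524

print(unique_counts, max_distinct_tags, round(mean_probes_per_exon, 2))
```

The panel-wide ratio of 6,129 probes to 1,524 exons gives about four probes per exon, at the lower end of the stated range of 4-6 MIPs per exon.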

Variant filtering and classification
Only non-synonymous single nucleotide variants (nsSNVs), nonsense variants, putative splice site (±10 bps) variants, insertions, duplications and deletions represented by more than 20 sequence reads were considered for further analysis. In addition, variants with a minor allele frequency (MAF) >0.5% in the Genome Aggregation Database (gnomAD) Version r2.0.2 [7] were excluded from further investigation. For variant classification we applied the terminology proposed by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology [8].
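The filtering rules above can be written down as a short sketch. This is not the authors' pipeline: the `Variant` record, its field names and the consequence labels are hypothetical, and the sketch assumes that the read threshold applies to every variant type and that a variant absent from gnomAD has no MAF value.

```python
from dataclasses import dataclass
from typing import Optional

# Variant classes retained for analysis (labels are illustrative only).
KEPT_CONSEQUENCES = {
    "nonsynonymous_snv",
    "nonsense",
    "splice_site",  # within +/-10 bp of an exon boundary
    "insertion",
    "duplication",
    "deletion",
}
MIN_READS_EXCLUSIVE = 20  # "more than 20 sequence reads"
MAX_MAF = 0.005           # variants with MAF > 0.5% are excluded


@dataclass
class Variant:
    consequence: str
    supporting_reads: int
    gnomad_maf: Optional[float]  # None if the variant is absent from gnomAD


def passes_filter(variant: Variant) -> bool:
    """Return True if the variant survives the filtering steps in the text."""
    if variant.consequence not in KEPT_CONSEQUENCES:
        return False
    if variant.supporting_reads <= MIN_READS_EXCLUSIVE:
        return False
    if variant.gnomad_maf is not None and variant.gnomad_maf > MAX_MAF:
        return False
    return True


# Hypothetical candidate variants.
candidates = [
    Variant("nonsense", 150, None),             # kept
    Variant("nonsynonymous_snv", 20, 0.0001),   # dropped: not more than 20 reads
    Variant("nonsynonymous_snv", 80, 0.02),     # dropped: MAF above 0.5%
    Variant("synonymous_snv", 300, None),       # dropped: class not considered
    Variant("splice_site", 21, 0.005),          # kept: MAF is not above 0.5%
]
filter_results = [passes_filter(v) for v in candidates]
print(filter_results)
```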

Results
Using our capture panel technology, we obtained an average of 1.2 million on-target reads per sample, with an average coverage of 213 reads per probe. Moreover, an average of 88% of targeted regions had 10x coverage or more, which was sufficient for accurate variant calling. The pipeline initially called an average of 532 single nucleotide variants and 64 insertions/deletions for each sample. Putative pathogenic variants were identified in 25/27 index cases (Table 1), corresponding to a detection rate of 93%. All putative disease-associated variants were validated by conventional Sanger sequencing. Homozygosity was observed for eight patients (30%; 8/27): variants were seen in true homozygous state in four patients and in apparent homozygous state in four patients. Two patients were hemizygous, and compound heterozygosity was observed for four patients based on the analysis of paternal alleles. Trans configuration of variants could not be demonstrated for 11 patients because DNA of family members was not available and the respective variants were located too far apart for allelic cloning. In patient 26, a single heterozygous variant in IMPG2 was observed. In patient 27, no putative disease-causing variants were identified. The mutation spectrum comprises 34 different alleles, 17 of which are novel. All novel variants were deposited in the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/) [13] with accession codes provided in Table 1.
The variants comprise 14 missense variants, eight nonsense variants, seven deletions or duplications leading to a frame-shift, three canonical splice site variants, two non-canonical splice site variants and one in-frame deletion. Pathogenicity was interpreted in accordance with the American College of Medical Genetics guidelines [8]. The respective categories are given in Table 1. Missense variants that have never been reported before were analysed using different in silico prediction algorithms. These scores, together with the MAFs sourced from the gnomAD browser are shown in Table 2.
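As a quick consistency check, the zygosity categories reported above can be summed and compared with the number of solved cases. The counts are taken from the text; the dictionary keys are merely illustrative labels.

```python
# Patient counts per zygosity category, as reported in the Results.
zygosity_counts = {
    "homozygous": 8,               # four true and four apparent homozygotes
    "hemizygous": 2,
    "compound_heterozygous": 4,
    "trans_not_demonstrated": 11,
}
TOTAL_INDEX_PATIENTS = 27

solved_cases = sum(zygosity_counts.values())
unsolved_cases = TOTAL_INDEX_PATIENTS - solved_cases
detection_rate = solved_cases / TOTAL_INDEX_PATIENTS

print(f"{solved_cases}/{TOTAL_INDEX_PATIENTS} solved = {detection_rate:.1%}")
```

The four categories add up to the 25 solved cases, leaving the two unresolved patients (26 and 27), and 25/27 rounds to the reported 93%.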

LCA / EOSRD patients
A summary of the clinical findings for all 27 index patients is shown in Table 3. In 19 of 27 patients, the initial diagnosis of LCA/EOSRD was confirmed by the molecular genetic analysis. In all of these cases, disease onset was at birth or within the first months of life. Patients showed a progressive disease history with severe visual impairment from the beginning. In the following, the LCA-associated genes that were found to be mutated in these patients are listed in detail.
[14][15]. Of note, the recurrent c.2843G>A/p.C948Y variant in CRB1 was not covered by the MIPs in our assay, but because of its known high frequency and relevance (MAF 0.0002027) we screened all patients for it by conventional Sanger sequencing.
RPGRIP1. Of the six potentially disease-causing variants in RPGRIP1 detected in four patients (14.8%; 4/27), all represent likely null alleles and three were novel. Compound heterozygosity of a reported nonsense variant and a novel 68-bp deletion was demonstrated for one patient. One patient harbored a reported canonical splice site variant on one allele and a novel frame-shifting duplication on the other allele. Two patients were homozygous for two different nonsense variants, one of them novel.

RPE65. A total of four novel variants in RPE65 were identified in two patients (7.4%; 2/27), including one nonsense and three missense variants. Biallelism could not be formally proven in either case.
AIPL1. Of the three variants detected in two affected individuals in AIPL1 (7.4%; 2/27), there was one novel missense variant found in homozygous state in one patient. Another patient harbored a nonsense variant and a non-canonical splice site change. Whether the variants are in trans configuration in this patient could not be established.
NMNAT1. NMNAT1 variants were detected in one patient (3.7%; 1/27) who possessed one reported frame-shifting duplication and the known hypomorphic variant c.769G>A/p.E257K [16]. Biallelism could not be confirmed due to lack of additional family DNA samples.
CEP290. One patient was found to be homozygous for the common c.2991+1655A>G/p.C998* allele, which causes insertion of a cryptic exon and subsequent truncation [5,17].

Other patients
In addition to the cases described above, we identified eight patients (30%) who harbored pathogenic variants in genes not typically associated with LCA. Clinical re-evaluation of these cases led to a revision of the initial clinical diagnosis in all of them. Within this group, ABCA4 was the most frequently mutated gene, as biallelic variants were seen in four patients. In these cases, a later onset of disease and a dense pigmentation of the retina were observed (Table 3). After genetic testing and re-evaluation of clinical data, the diagnosis was corrected to cone-rod dystrophy (CRD), demonstrating severe morphological and functional damage in all cases. In addition, we found a male patient to be hemizygous for a pathogenic variant in RP2. He had been initially diagnosed in adult age with severely progressed retinal degeneration. Consequently, his diagnosis was corrected to X-linked retinitis pigmentosa.
Another male patient was shown to be hemizygous for a pathogenic variant in CACNA1F. He had suffered from nystagmus, night blindness, photophobia and very poor vision since birth. His full-field ERGs showed residual photopic and scotopic responses. Morphologically, slight attenuation of the retinal vessels, changes in the macular reflexes and only minimal peripheral pigmentary changes could be observed. In this case, the diagnosis was changed to X-linked congenital stationary night blindness (CSNB).
One patient harbored pathogenic variants in CDHR1. The revised clinical diagnosis in this case was CRD, but interestingly, this female patient also suffered from renal insufficiency, secondary hyperparathyroidism and obesity. Whether these symptoms constitute a distinct disease entity or syndrome remains unexplained. So far, such extra-ocular symptoms have not been described as a feature of CDHR1-related disease. They would be typical features of a ciliopathy, a group of disorders to which CDHR1-associated IRD does not belong.
The last male patient presented in our clinic with a severe retinal degeneration at the age of 50 years and was found to be homozygous for an in-frame insertion/deletion in PROM1. On the basis of patient history, clinical findings and genetic results, the clinical diagnosis was changed to autosomal recessive retinitis pigmentosa.

Discussion
In a cohort of 27 German patients initially diagnosed with LCA, we were able to identify sequence variants likely explaining the disease phenotype in 25 cases (93%) by applying a cost-efficient targeted next-generation sequencing approach designed at the Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands. The MIP panel targets 108 known IRD genes, including the 22 LCA-associated genes that had been reported as of October 2013.
Undoubtedly, those LCA genes with the highest disease-causing variant load have already been discovered. However, the fact that half of the variants (17/34) we identified are novel suggests that the mutation spectrum of LCA and other IRD genes is far from being saturated and confirms the known genetic heterogeneity of IRD in an outbred European population.
The most frequently mutated LCA genes in our cohort were CRB1 (6 cases, 22%) and RPGRIP1 (4 cases, 15%). Among the six patients with CRB1 mutations, five carried the recurrent p.C948Y variant on one allele, which is known to be a founder mutation [18]. We identified only one patient with a CEP290 variant in our cohort, although CEP290 is one of the most frequently mutated LCA genes in different populations [5,19]. This is because most patients in the present study had already been pre-screened for the recurrent pathogenic intronic variant c.2991+1655A>G.
Several criteria were considered to evaluate the potential pathogenicity of variants: (1) variants have previously been reported to be pathogenic; (2) variants are observed only in few heterozygous cases or are absent among 277,264 general population alleles sourced from the gnomAD browser; (3) variants represent likely null alleles (nonsense, canonical splice site and frame-shift variants); and (4) in the case of missense variants, they are predicted to be damaging by in silico prediction algorithms. In addition, all variants were classified according to their pathogenicity based on the American College of Medical Genetics and Genomics (ACMG) guidelines [8]. With nonsense, canonical splice site and frame-shifting variants having a strong weight in the ACMG scoring system, this class of variants is consequently classified either as likely pathogenic or pathogenic, whereas missense variants that lack segregation data and functional analyses to support a damaging effect are always classified as variants of uncertain significance (VUS). To compensate for this simplistic categorization of the ACMG classification system, we provide in silico predictions from four algorithms for all missense variants identified in this study, regardless of whether they have been reported previously, along with phyloP scores, Grantham differences and MAFs sourced from the gnomAD browser (Table 2). The extremely low MAF or even the absence in the gnomAD browser, the evolutionary conservation as well as the type of the respective amino acid substitution are strong indicators that all missense variants we identified and reported are indeed pathogenic. One missense variant that is predicted to be benign by the majority of algorithms is the recurrent c.769G>A/p.E257K variant in NMNAT1, but it has been shown previously that this is a hypomorphic variant and almost always causes LCA in combination with more severe alleles [16].
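The four evaluation criteria listed above can be expressed as a small checklist function. This is a sketch only and does not implement the ACMG classification. The record fields, the cut-off for "few heterozygous cases" (`max_heterozygotes`) and the majority rule for the in silico predictions are assumptions made for illustration; the text does not define them.

```python
from dataclasses import dataclass

# Likely null alleles as named in criterion (3).
NULL_CONSEQUENCES = {"nonsense", "canonical_splice_site", "frameshift"}


@dataclass
class VariantEvidence:
    reported_pathogenic: bool   # criterion (1)
    gnomad_heterozygotes: int   # 0 if absent from gnomAD, criterion (2)
    consequence: str            # criterion (3)
    damaging_predictions: int   # number of algorithms calling it damaging
    total_predictions: int      # number of algorithms run, criterion (4)


def criteria_met(evidence: VariantEvidence, max_heterozygotes: int = 5) -> list:
    """Return the numbers (1-4) of the criteria that the variant satisfies.

    `max_heterozygotes` is a hypothetical cut-off for "few heterozygous cases".
    """
    met = []
    if evidence.reported_pathogenic:
        met.append(1)
    if evidence.gnomad_heterozygotes <= max_heterozygotes:
        met.append(2)
    if evidence.consequence in NULL_CONSEQUENCES:
        met.append(3)
    if (
        evidence.consequence == "missense"
        and evidence.total_predictions > 0
        and 2 * evidence.damaging_predictions > evidence.total_predictions
    ):
        met.append(4)  # assumed rule: a majority of algorithms predict damage
    return met


# Hypothetical examples, not variants from this study.
novel_nonsense = VariantEvidence(False, 0, "nonsense", 0, 0)
novel_missense = VariantEvidence(False, 0, "missense", 4, 4)
known_mild_missense = VariantEvidence(True, 3, "missense", 1, 4)

print(criteria_met(novel_nonsense))
print(criteria_met(novel_missense))
print(criteria_met(known_mild_missense))
```

The third example mirrors the situation described for a known hypomorphic allele: the in silico criterion is not met, yet the variant is still supported by a previous report and by its rarity.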
Apart from the fact that we lack segregation data for several patients, the only case that is left with some level of uncertainty is patient LCA 108, who carries a nonsense variant and a non-canonical splice site variant in AIPL1. The latter is a transition of T to C at position +6 of the splice donor of exon 2. It is absent in the gnomAD browser, but since the +6 position is not invariant, we performed an in silico prediction. The bioinformatic tool Human Splicing Finder [20] predicts that the c.277+6T>C variant breaks the natural splice donor site, since the mutant score is reduced by 41% compared to the wildtype score when using maximum entropy as the algorithm type. However, since AIPL1 is not expressed in accessible tissues like blood or skin fibroblasts, mRNA analyses to confirm the in silico prediction are not feasible. Sanger sequencing of the entire coding region of AIPL1 in this patient revealed no variants other than c.834G>A/p.W278* and c.277+6T>C. While most cases with mutations in AIPL1 are biallelic, certain mutations may result in dominant cone-rod dystrophy or juvenile retinitis pigmentosa [21]. However, this is most probably not the case for loss-of-function alleles like the c.834G>A/p.W278* variant in our patient. Of course, we cannot rule out that the phenotype of our patient might not be related to AIPL1 at all.
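The 41% reduction quoted above is a relative change of the mutant splice site score with respect to the wildtype score. The sketch below shows that calculation. The two scores are hypothetical values chosen to reproduce a 41% reduction, since the actual scores are not given in the text, and the -30% cut-off is an illustrative assumption rather than a documented setting of any tool.

```python
def relative_score_change(wildtype_score: float, mutant_score: float) -> float:
    """Percentage change of the mutant score relative to the wildtype score."""
    if wildtype_score <= 0:
        # A percentage relative to a zero or negative score is not meaningful.
        raise ValueError("wildtype score must be positive")
    return (mutant_score - wildtype_score) / wildtype_score * 100.0


def predicts_broken_site(
    wildtype_score: float,
    mutant_score: float,
    threshold_percent: float = -30.0,  # assumed cut-off, for illustration only
) -> bool:
    """True if the score drops by at least the given percentage."""
    return relative_score_change(wildtype_score, mutant_score) <= threshold_percent


# Hypothetical scores that give a 41% reduction.
wildtype = 10.0
mutant = 5.9
change = relative_score_change(wildtype, mutant)
broken = predicts_broken_site(wildtype, mutant)
print(round(change), broken)
```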
The different forms of IRD may present with considerable clinical overlap [22]. This often precludes a diagnosis on the basis of the disease phenotype alone, no matter how experienced and meticulous the clinician might be. Hence, we were not surprised that eight patients in our cohort (30%) were found to carry pathogenic variants in genes not typically associated with LCA. We reassessed the clinical data of these patients and revised the initial diagnosis in all of them. A recently published study on Brazilian patients with LCA found the same proportion (i.e. 30%) of patients that were solved by identifying variants in non-LCA genes [23]. This demonstrates how a molecular diagnosis can help to refine a clinical diagnosis.
The underlying variants in two patients remained unresolved (7.4%; 2/27). One of these patients was found to be heterozygous for a known missense variant in IMPG2. Biallelic mutations in IMPG2 are a known cause of retinitis pigmentosa (RP) [24]. All exons and adjacent intronic regions of this gene were sufficiently covered, which excludes the existence of a second variant in the coding region. Whether non-coding deep-intronic variants or large deletions in the IMPG2 gene account for the second pathogenic allele in this patient remains unknown.
Supposing that all patients in whom we could not confirm trans configuration of variants are indeed biallelic, our detection rate is 93%. This is in line with recent studies for LCA which achieved 80-90% in panel-based approaches [2][3] and 89% by whole exome/genome sequencing [4]. Analysis of our sequencing data revealed several regions with low or no coverage, as for instance for parts of exon 6 of CRB1. We would have missed several patients carrying the recurrent c.2843G>A/p.C948Y variant in this gene, had we not re-sequenced this exon in all patients with conventional Sanger sequencing.
Several studies have shown that whole exome sequencing (WES) and whole genome sequencing (WGS) can outperform targeted sequencing approaches in terms of variant detection [4][25][26][27]. In fact, NHS England is already planning to commission WGS into routine clinical care pathways [28]. However, targeted sequencing approaches have several benefits, including a higher coverage rate for targeted regions and higher throughput in terms of patient numbers. More importantly, they are associated with considerably lower costs, which is relevant for those patients who cannot expect reimbursement from their health care provider or have no health insurance at all. The MIP technology we used can cost as little as €80 per sample per gene panel, which is 10 to 20 times lower than the price tag for other NGS-based sequencing procedures. With a detection rate of 93%, we could demonstrate that MIPs are a cost-efficient and sensitive tool for targeted next-generation sequencing in IRD.
Supporting information S1