Findings from a Genotyping Study of over 1000 People with Inherited Retinal Disorders in Ireland

The Irish national registry for inherited retinal degenerations (Target 5000) is a clinical and scientific program to identify individuals in Ireland with inherited retinal disorders and to attempt to ascertain the genetic cause underlying the disease pathology. Potential participants first undergo a clinical assessment, which includes clinical history and analysis with multimodal retinal imaging, electrophysiology, and visual field testing. If suitable for recruitment, a sample is taken and used for genetic analysis. Genetic analysis is conducted by use of a retinal gene panel target capture sequencing approach. With over 1000 participants from 710 pedigrees now screened, there is a positive candidate variant detection rate of approximately 70% (495/710). Where an autosomal recessive inheritance pattern is observed, an additional 9% (64/710) of probands have tested positive for a single candidate variant. Many novel variants have also been detected as part of this endeavor. The target capture approach is an economic and effective means of screening patients with inherited retinal disorders. Despite the advances in sequencing technology and the ever-decreasing associated processing costs, target capture remains an attractive option as the data produced is easily processed, analyzed, and stored compared to more comprehensive methods. However, with decreasing costs of whole genome and whole exome sequencing, the focus will likely move towards these methods for more comprehensive data generation.


Introduction
Inherited retinal degenerations (IRDs) are a broad set of clinically and genetically diverse conditions that represent the leading cause of visual dysfunction in those of working age. IRDs are typically caused by improper development or death of photoreceptor cells and have a substantial effect on both the quality of life of those affected and health economics. Inheritance patterns include autosomal recessive, autosomal dominant and X-linked, as well as rarer mitochondrial and digenic forms [1,2].

DNA Acquisition and Next Generation Sequencing
DNA was isolated from either blood (DNA Blood Maxi Kit, Qiagen, Hilden, Germany) or saliva (Oragene-DNA, DNA Genotek, ON, Canada) samples from participants. Sample preparation was carried out using a hybridisation-based target capture sequencing method previously described [15]. The average read coverage achieved was 125× per captured region. All genomic locations refer to the Hg38 reference genome (Homo sapiens GRCh38).

Variant Confirmation Sequencing
To validate variants identified by NGS, relevant genomic loci containing the mutations were amplified by polymerase chain reaction (PCR) and investigated by direct sequencing. Primers were procured from Sigma-Aldrich (Gillingham, England, UK). DNA was routinely amplified using Q5 High-Fidelity 2× Master Mix (New England Biolabs Inc., Ipswich, MA, USA). For segregation analyses, DNA from additional family members was amplified similarly or, alternatively, directly from blood (Phusion Blood Direct PCR Kit, Thermo Scientific, MA, USA) or saliva (Phusion Human Specimen Direct PCR Kit, Thermo Scientific) where applicable. The annealing temperatures for reactions were optimised for each variant; all other details were executed as per the supplier's recommendations. Sanger sequencing was performed by Eurofins Genomics (Ebersberg, Germany).

Sequencing of RPGR ORF15
A previously reported sequencing strategy of RPGR was employed in order to cover the highly repetitive sequences of ORF15, a major cause of X-linked RP [19]. Primers used to probe this region can be found in Figure S2.

Single-Molecule Molecular Inversion Probe (smMIP)-Based Sequencing of ABCA4
smMIP-based whole-gene sequencing of ABCA4 was carried out using a NextSeq 500 as part of a large-scale study by Khan et al. [13]. A total of 3866 smMIPs were designed as described previously [20] and employed to capture 110 nt increments of both the sense and antisense strands of the ABCA4 gene, as well as 40 kb of flanking sequences. Sample processing, sequencing, data analysis, and variant interpretation were carried out as described by Khan et al. [13].

Data Analysis and Variant Interpretation of Target Capture NGS Data
Raw sequencing data were demultiplexed and mapped to the IRD-relevant regions of the human genome (Hg38) as previously described [15]. The American College of Medical Genetics and Genomics (ACMG) criteria for classifying pathogenic variants were utilised to interpret variants [21]. Other proposed modifications to the ACMG guidelines, which further quantitate certain lines of evidence, were also employed during the study. These modifications affect three lines of evidence: the use of segregation data in a quantitative manner (code: PP1) [22], quantitation of evidence that considers the rare incidence of variants (code: PS4), and the value of accurately phenotyping conditions with single-gene aetiologies (code: PP4) [23]. To further implement these recommendations, REVEL and dbscSNV RF scores were added to the existing suite of ensemble predictors to more stringently detect agreement between predictors of pathogenicity [24,25]. Similarly, Manta was added to our structural variant pipeline to assist in the detection of genomic rearrangements [26].
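The idea of requiring agreement between in silico predictors can be illustrated with a minimal sketch. The cut-off values, the score values in the usage note, and the rule that at least two predictors must be available are assumptions made for illustration only; they are not the thresholds or the decision rule used in this study.

```python
# Illustrative cut-offs only (assumptions, not values taken from this study).
CUTOFFS = {
    "REVEL": 0.5,
    "MetaLR": 0.5,
    "M-CAP": 0.025,
    "dbscSNV_RF": 0.6,
}


def damaging_votes(scores: dict, cutoffs: dict = CUTOFFS) -> dict:
    """Map each available predictor to True if its score meets the cut-off."""
    return {
        name: score >= cutoffs[name]
        for name, score in scores.items()
        if name in cutoffs
    }


def predictors_agree(scores: dict, cutoffs: dict = CUTOFFS,
                     min_predictors: int = 2) -> bool:
    """Return True only if enough predictors are present and all call 'damaging'."""
    votes = damaging_votes(scores, cutoffs)
    return len(votes) >= min_predictors and all(votes.values())
```

For example, with hypothetical scores, `predictors_agree({"REVEL": 0.81, "MetaLR": 0.70, "M-CAP": 0.10})` returns `True`, whereas a single discordant predictor, or a variant scored by only one tool, returns `False`.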

Ethical Approval
Ethical approval for this study was awarded by the Research and Medical Ethics committee of the Royal Victoria Eye and Ear Hospital (13-06-2011: HRA-POR201097) and by the Institutional Review Board of the Mater Misericordiae University Hospital and Mater Private Hospital (MMUH IRB 1/378/1358), Dublin, Ireland, prior to commencement. All work was carried out in accordance with the approved guidelines. All patients gave informed consent before recruitment to the study.

Clinical Presentation and Positive Candidate Detection Rates
Thus far, 710 pedigrees have been analysed as part of this study by target capture sequencing. In total, 458 individuals have been sequenced for variants in 210 genes and 689 individuals have been sequenced for variants in 254 genes, and confirmation sequencing is ongoing for other affected family members. A spectrum of IRDs was observed in this study, as can be seen in Figure 1. The positive candidate detection rate has improved from previous reports [15,27], now reaching approximately 70% (495/710). Furthermore, only a single candidate variant has been identified in an additional 9% (64/710) of pedigrees diagnosed with an IRD that typically exhibits a recessive inheritance pattern (Figure 2). In addition, a number of novel missense variants have been detected in the course of this study (Table 2).

Figure 2.
Diagnostic yield rates for 710 Target 5000 pedigrees utilising target capture next-generation sequencing of the exonic regions of over 250 genes and previously identified pathogenic intronic variants (Table S3).
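The detection rates quoted above follow directly from the pedigree counts. The short sketch below recomputes them; the helper name `yield_pct` is hypothetical, and treating the resolved and single-variant groups as non-overlapping is an assumption based on the text describing the latter as "an additional 9%".

```python
def yield_pct(count: int, total: int, ndigits: int = 1) -> float:
    """Return count/total as a percentage rounded to ndigits decimal places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, ndigits)


# Counts quoted in the text for the sequenced pedigrees.
TOTAL_PEDIGREES = 710
RESOLVED = 495        # positive candidate variant(s) detected
SINGLE_VARIANT = 64   # one candidate variant in a typically recessive IRD

# Assumption: the two groups above do not overlap.
UNRESOLVED = TOTAL_PEDIGREES - RESOLVED - SINGLE_VARIANT

resolved_pct = yield_pct(RESOLVED, TOTAL_PEDIGREES)
single_variant_pct = yield_pct(SINGLE_VARIANT, TOTAL_PEDIGREES)
unresolved_pct = yield_pct(UNRESOLVED, TOTAL_PEDIGREES)

print(f"resolved: {resolved_pct}%")
print(f"single candidate variant: {single_variant_pct}%")
print(f"no candidate variant: {unresolved_pct}%")
```

The same helper can be applied to the per-phenotype counts reported in the following sections, for example 131 of 184 familial RP pedigrees.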

Retinitis Pigmentosa
The most prevalent clinical presentation was retinitis pigmentosa (RP; MIM: 268000), accounting for 37.75% (379/1004) of all recruited pedigrees (Figure 1). These figures incorporate various phenotypic presentations, including atypical, inverse, and paravenous RP. The total number of pedigrees analysed and clinically reported as having a family history of RP was 184, of which 131 have been genetically resolved. The total number of pedigrees analysed and clinically reported as having simplex RP was 79, of which 48 have been genetically resolved. This gives genetic diagnosis rates of 71% and 61%, respectively. Of these resolved cases, 81 exhibited an autosomal dominant inheritance pattern, 71 exhibited an autosomal recessive inheritance pattern, and 27 exhibited an X-linked inheritance pattern. Autosomal dominant was the most common inheritance pattern, accounting for over 45% (81/178) of all sequenced RP pedigrees in this cohort (Figure 3). This figure is larger than reported in other studies [28]. Variants in RHO (MIM: 180380) were the most common candidate variants for RP in this cohort, accounting for over 14% (26/178) of all sequenced RP pedigrees, and over 32% (26/81) of dominant pedigrees alone. It is important to note that there is likely an even higher prevalence of RHO-linked RP in the Irish IRD cohort, as many pedigrees involved in single-gene studies were excluded from this study due to the previous discovery of a causative variant [15,27]. Three of these variants, namely c.533A>G (p.Tyr178Cys), c.541G>A (p.Glu181Lys), and c.620T>G (p.Met207Arg), collectively account for candidate RHO mutations across 13 different pedigrees to date.
The most commonly observed candidate gene for autosomal recessive RP was USH2A (MIM: 608400). Two variants in this gene were found in over 21% (15/71) of pedigrees with this phenotype (Figure 3). As variants in USH2A are also the primary cause of Type II Usher Syndrome, it is important to note that participants included in this section presented with recessive RP without syndromic disease manifestation. Non-syndromic USH2A-linked RP has been consistently reported in other IRD cohorts [29]. Interestingly, a proportion of recessive RP pedigrees (>9%; 7/71) carry variants in ABCA4, which is typically associated with Stargardt Disease, a form of macular dystrophy. In the Target 5000 patient cohort, an atypical RP phenotype appears to be caused by variants in ABCA4, as previously described in other cohorts (Figures 4 and 5) [10,30].

Pathogenic variants in RPGR (MIM: 312610) were the most frequent cause of X-linked RP, explaining over 88% (23/26) of sequenced pedigrees (Figure 3). Employment of a bespoke sequencing strategy for ORF15 has enhanced the success of sequencing this region, which typically presents a diagnostic challenge due to low coverage and poor variant detection [19].

Stargardt Disease and Other Macular Dystrophies
Stargardt disease (STGD1) is the second most common phenotypic presentation in this study (Figure 1). As mentioned previously, variants in ABCA4 [31] were the most common cause, accounting for over 97% (109/112) of resolved STGD1 pedigrees in this cohort (Figure 6). Six potentially novel coding variants in ABCA4 (NM_000350.2) were detected as part of this study: c.1865G>A (p.Ser622Asn), c.223T>C (p.Cys75Arg), c.4468T>C (p.Cys1490Arg), c.5329A>G (p.Met1777Val), c.5351T>C (p.Leu1784Pro), and c.6743T>C (p.Phe2248Ser). All of these variants, with the exception of c.5351T>C (p.Leu1784Pro), were observed in individual pedigrees where another known pathogenic ABCA4 variant was also observed. Variant c.5351T>C (p.Leu1784Pro) was observed in three individuals of two not knowingly related families also carrying a known pathogenic variant. The probands of these pedigrees do not appear to share any other rare variants in common in the regions captured.
Genes 2020, 11, 105
It is becoming increasingly clear that intronic variants play a major role in the causation of STGD1. Nine pathogenic intronic variants were included in the capture panel of this study and found in 26 cases in 20 pedigrees with another ABCA4 variant. In recent years, single-molecule molecular inversion probe (smMIP)-based sequencing of the whole ABCA4 gene, including intronic sequences, has dramatically increased positive candidate detection rates for STGD1 [32,33]. Thirty-six unresolved probands from Target 5000 presenting with STGD1 or cone-rod dystrophy were analysed as part of a landmark study on the genetic landscape of ABCA4 in over 1000 probands, with a positive candidate detection rate of 44% (16/36) [13]. This has increased the rate at which ABCA4 is identified as the causal gene of STGD1 in this cohort (Figure 6). Of note, among the Irish participants in that study, 5 individuals were found to carry c.4539+2028C>T (p.[=,Arg1514Leufs*36]) in trans with another pathogenic ABCA4 variant. Upon retrospective analysis, 5 further incidences of this variant were detected in previously unresolved individuals, totalling 10 participants recruited by Target 5000 carrying this variant and another pathogenic ABCA4 variant, suggesting significant enrichment in the Irish STGD1 population.
In agreement with previous reports on Target 5000 [15,27], ABCA4 c.5603A>T (p.Asn1868Ile) is a significant causal variant of a milder form of STGD1 in this IRD cohort. Of 110 STGD1 pedigrees with 2 positive candidate variants identified, this variant was observed 14 times. In addition, 6 STGD1 patients were found to carry this variant homozygously. ELOVL4 and PROM1 were determined to be the cause of STGD1 disease in 2.7% (2/112) of sequenced pedigrees. Other maculopathies that presented prominently include Best disease, accounting for 9.4% (15/159) of macular dystrophies in this study, and cone-rod dystrophy and a general macular dystrophy phenotype, each accounting for almost 7% (11/159) of all maculopathy pedigrees (Figure 6). However, it is clear from Figure 6 that STGD1 is the predominant cause of macular dystrophy in Target 5000 participants.

Usher Syndrome
Usher syndrome is the most common manifestation of syndromic IRD in this cohort, accounting for over 7% (78/1004) of all clinical presentations in Target 5000 (Figure 1). Usher syndrome is a form of ciliopathy characterised by RP and sensorineural hearing loss. It is generally divided into sub-groups based on severity of symptoms, ranging from most severe in Type 1 to least severe in Type 3. Seventy-eight pedigrees were clinically diagnosed with Usher syndrome. Of these, 57 pedigrees were genetically diagnosed, with 2 candidate variants detected, yielding a positive candidate detection rate of 73% (57/78). Usher Type 2 is by far the predominant sub-group in the Target 5000 cohort (41/57) and is most frequently caused by variants in USH2A (28/41) (Figure 7). Variants in MYO7A were the most frequent cause of Usher Type 1 (10/14). Variants in CLRN1 and MTTS2 (1/2 each) contributed equally to cases of Usher Type 3 in this study, with one individual carrying a novel candidate variant in CLRN1 (Table 2).

Other IRDs Encompassed by Target 5000
The second most frequent syndromic IRD clinically diagnosed in this study was Bardet-Biedl syndrome (BBS), typically characterised by polydactyly, intellectual disability, and obesity. Over 2% of pedigrees (21/1004) in the entire cohort presented with BBS at the clinic (Figure 1); however, this is thought to be an under-representation, as some of those diagnosed with simplex RP are later found to have had polydactyly removed as children. These individuals are re-diagnosed with BBS following detection of two candidate variants in a BBS-associated gene and confirmation with Target 5000 affiliated ophthalmologists. In total, 14 pedigrees received a genetic diagnosis of BBS having been initially clinically diagnosed with RP. Variants in BBS1 were the most frequent cause of this syndrome, accounting for over 71% (23/32) of sequenced pedigrees. As reported previously, the most common variant is c.1169T>G (p.Met390Arg) [15]. This variant has been detected 28 times homozygously in 22 different pedigrees in this cohort. Variants in BBS10 were the second most frequent cause of BBS (4/32), one of which is believed to be a novel variant (Table 2). These are followed by variants in SDCCAG8 (n = 2), BBS9, BBS4, and TTC8 (n = 1 each) in the Target 5000 cohort (Figure 8).

Many other rare forms of IRD are also included in the Target 5000 study. Sixteen pedigrees with a diagnosis of Leber congenital amaurosis (LCA) have been genotyped, with a spectrum of genes implicated as positive candidates (Figure 8). Other rare IRDs examined in this study include retinoschisis (n = 10), achromatopsia (n = 5), Stickler syndrome (n = 4), Leber hereditary optic neuropathy (n = 3), and optic atrophy (n = 2), among others. In total, 90 pedigrees with positive candidate variants are encompassed by Figure 8, illustrating the wide spectrum of disease presentation included in Target 5000.

Novel Variants
Here we present 19 novel missense variants that, to the best of our knowledge, have not yet been associated with an IRD (Table 2). In silico tools including MetaLR, M-CAP and REVEL were utilised to predict the pathogenicity of these variants. Segregation analysis was carried out when possible and was used as evidence according to the American College of Medical Genetics and Genomics (ACMG) guidelines where appropriate [21]. Two of the variants listed here, BBS10 c.155G>A (p.Gly52Asp) and PRPH2 c.464C>T (p.Thr155Ile), had been allocated dbSNP IDs; however, this is likely due to their detection in population sequencing studies, such as gnomAD [34], where they have not previously been associated with an IRD. Ten of these variants have been classified as variants of unknown significance (VUS) despite strong in silico predictions of pathogenicity. This highlights the importance of functional analysis in vitro and in vivo in order to confidently call such variants likely pathogenic.
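When tabulating missense variants such as those above, a simple consistency check is that the coding DNA position and the protein residue number refer to the same codon. The sketch below assumes simple single-nucleotide substitutions written in the form used in this paper, with coding positions numbered from the A of the start codon; the function names are hypothetical, and the check covers positions only, not the predicted amino acid change.

```python
import re

# Matches annotations such as "c.155G>A (p.Gly52Asp)".
HGVS_PAIR = re.compile(
    r"c\.(\d+)[ACGT]>[ACGT]\s*\(p\.[A-Za-z]{3}(\d+)[A-Za-z]{3}\)"
)


def codon_number(cdna_pos: int) -> int:
    """Return the 1-based codon containing a 1-based coding DNA position."""
    if cdna_pos < 1:
        raise ValueError("coding positions start at 1")
    return (cdna_pos + 2) // 3


def positions_consistent(annotation: str) -> bool:
    """Check that the c. position falls within the codon named in the p. notation."""
    match = HGVS_PAIR.search(annotation)
    if match is None:
        raise ValueError(f"unrecognised annotation: {annotation!r}")
    cdna_pos, residue = int(match.group(1)), int(match.group(2))
    return codon_number(cdna_pos) == residue


# Variants quoted in the text.
reported = [
    "BBS10 c.155G>A (p.Gly52Asp)",
    "PRPH2 c.464C>T (p.Thr155Ile)",
    "ABCA4 c.5351T>C (p.Leu1784Pro)",
]
all_consistent = all(positions_consistent(v) for v in reported)
```

A mistyped residue number, for instance a hypothetical `c.155G>A (p.Gly53Asp)`, would be flagged because position 155 lies in codon 52.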

RPE65
In Ireland, the prevailing RPE65 phenotype has been that associated with the specific variant (NM_000329.2) c.1430A>G (p.Asp477Gly). The disease phenotype is comparatively much milder than RPE65-LCA and closely resembles the clinical manifestations of choroideremia. This amino acid position has been shown to be highly conserved across multiple species [35]. This variant continues to be detected in additional cases in the ongoing Target 5000 study. Initial reports and phenotype characterisation of this mutation were based on several pedigrees originating from Ireland [35,36]. The dominant c.1430A>G (p.Asp477Gly) variant remains undetected in large RPE65 screening studies (n > 2000) of individuals of non-Irish ethnicity [37]. Currently in the Target 5000 cohort, there are 23 genetically confirmed affected individuals, with many further patients from these pedigrees pending recruitment. These 23 affected patients span 7 pedigrees, although they likely originate from a single source, as observed in previous studies [38]. An example of segregation of the c.1430A>G (p.Asp477Gly) variant in one such pedigree is shown in Figure 9.

FLVCR1
Typically, variants in FLVCR1 have been associated with a neurological syndrome, posterior column ataxia with retinitis pigmentosa (PCARP; MIM: 609033) [39–41], and more recently a specific splice variant (c.1092+5G>A) has been reported multiple times to be associated with non-syndromic RP [42–44]. Through Target 5000, substantial evidence has been obtained that suggests the first incidence of a protein-coding FLVCR1 variant, c.1022A>G (p.Tyr341Cys), implicated in non-syndromic RP [45]. RP is the most common clinical diagnosis for participants in the Target 5000 study, accounting for nearly 40% (379/1004) of total pedigrees enrolled to date (Figure 1). Patients with this variant present with typical RP, without any extraocular features (Figure 10). Since its initial detection, this variant has been observed and deemed to segregate with the condition in three additional not knowingly related families in the Target 5000 cohort. Each of the genotyped affected individuals in these three pedigrees is homozygous for this variant.

Figure 10.
Montage of wide-field fundus images from a retinitis pigmentosa patient with a FLVCR1 genotype. The FLVCR1 genotype results in a phenotype (left-right eye, right-left eye) that appears to bear a resemblance to classical retinitis pigmentosa, with features such as masses of bony spicules in the periphery, attenuated blood vessels, and waxy disc pallor. There is also a relative preservation of the maculae.

Choroideremia
Choroideremia is an X-linked recessive chorioretinal degenerative condition with progressive atrophy of various retinal cell types and the surrounding blood-retinal barrier. Here we describe a novel deletion in the CHM gene found in two Irish pedigrees. This nearly 500 kb deletion represents the largest IRD-associated gene deletion detected in Ireland to date (Figure 11). Two members of a large X-linked retinitis pigmentosa pedigree (Figure 12) clinically presented with choroideremia and tested negative for the segregating RPGR variant found in other affected members of this pedigree. Both males were analysed with target capture sequencing and found to possess large deletions spanning the CHM gene, approximating 500 kb. The observation of two IRDs in this pedigree highlights the significant value of NGS-based diagnostics for IRDs (Figure 12).
The same CHM deletion has also been detected in a second Irish pedigree since its initial discovery. Two additional males and two carrier females from this second pedigree were all found to be affected with progressive choroideremia. In this pedigree, the incidence of choroideremia could be traced back 5 generations, 4 of which have been assessed by the clinical team. A third Irish pedigree with a large CHM deletion was also detected previously as part of Target 5000 [27]. This mutation spanned approximately 6.5 kb and encompassed exons 3 and 4 of CHM (Figure 11). In each instance, target capture NGS detected the presence of the deletion and breakpoints were approximated by tiled PCR analyses. Interestingly, some probing PCRs in the intergenic regions surrounding CHM suggested that genomic DNA was present between deleted regions. Upon analysis, these PCR products were found likely to be the result of large regions of homology that exist between this genomic area and other regions of the genome. Several structural variants in this CHM-proximal intergenic region have also been observed in control samples (Figure 11).
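The logic of approximating breakpoints from tiled PCR results can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the sample is hemizygous for a single contiguous deletion, a failed amplicon overlaps the deletion, the outermost flanking amplicons are genuinely intact, and any amplicon that succeeds between two failed ones is treated as a possible homology-derived product and ignored. The class names, function name, and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    start: int        # genomic start of the tile
    end: int          # genomic end of the tile
    amplified: bool   # True if a PCR product was obtained


@dataclass(frozen=True)
class DeletionBounds:
    left_window: tuple   # interval that must contain the left breakpoint
    right_window: tuple  # interval that must contain the right breakpoint
    min_size: int        # lower bound on deletion length
    max_size: int        # upper bound on deletion length


def bound_deletion(amplicons):
    """Bracket a single deletion using the outermost failed tiles."""
    tiles = sorted(amplicons, key=lambda a: a.start)
    failed = [i for i, a in enumerate(tiles) if not a.amplified]
    if not failed:
        return None  # no evidence of a deletion in the tiled region
    first, last = failed[0], failed[-1]
    if first == 0 or last == len(tiles) - 1:
        raise ValueError("deletion extends beyond the tiled region")
    # Tiles that amplified between first and last are deliberately ignored.
    left = (tiles[first - 1].end, tiles[first].end)
    right = (tiles[last].start, tiles[last + 1].start)
    min_size = max(0, tiles[last].start - tiles[first].end)
    max_size = tiles[last + 1].start - tiles[first - 1].end
    return DeletionBounds(left, right, min_size, max_size)


# Hypothetical tiling results (arbitrary coordinates).
example_tiles = [
    Amplicon(0, 500, True),
    Amplicon(1000, 1500, True),
    Amplicon(2000, 2500, False),
    Amplicon(3000, 3500, False),
    Amplicon(4000, 4500, True),
]
example_bounds = bound_deletion(example_tiles)
```

Adding tiles inside the two windows narrows the breakpoint estimates, which is the purpose of successive rounds of tiled PCR.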

Figure 11.
Schematic representation of the genomic region surrounding the CHM gene. Whole-genome coverage data are from a control population database (https://gnomad.broadinstitute.org/). Structural variants detected in control samples and Target 5000 patients are aligned to this region to illustrate the instability of this genomic region. Blue = duplication, Red = deletion, Pink = template positive results likely due to highly similar sequence elsewhere in the genome.

Discussion
To date, over 1000 individuals have been genotyped as part of Target 5000, accounting for over 20% of the estimated Irish IRD cohort. The results of this study thus far highlight the unique genetic architecture of IRDs in Ireland, cumulatively resulting in over 89 novel variants identified to date [15,27], 19 of which were missense mutations, in addition to 1 novel structural variant identified in the most recent analysis (Table 2, Figure 11). With a positive candidate detection rate of almost 70% (495/710 pedigrees), Target 5000 highlights the value of sequencing the exons of 254 IRD-associated genes as well as some known pathogenic intronic regions. This is consistent with other sequencing studies of various IRD cohorts, which report a diagnostic yield of between 38% and 75% [17,46–49]. This range can be expected due to a number of factors. Firstly, deep phenotyping to accurately categorise IRDs greatly assists with genotype-phenotype assessment. Assessor variability will exist between clinics and studies, contributing greatly to the variability of diagnostic yield. Secondly, there are differences between the numbers of genes and regions sequenced in these cohorts, where smaller gene panels will not capture some of the known genes associated with IRDs [46]. Lastly, the employment of additional analyses such as WES or WGS in other studies has increased the detection of candidate variants in previously unresolved cases [18,49–51]. The candidate variant detection rate of Target 5000 demonstrates that target capture sequencing is a cost- and time-effective first-tier approach for genetic screening of those affected by IRDs in Ireland.
This study also highlights the importance of an accurate genetic diagnosis in addition to a clinical diagnosis. Genetic diagnoses continue to become more relevant as gene-based medicines move towards the forefront of IRD treatment given a first approved gene therapy for an IRD and an array of gene therapies for other forms of IRD in clinical and preclinical development. Furthermore, genetic diagnoses facilitate a better understanding of disease progression and manifestation for both patients and clinicians.
Stargardt disease (STGD1) is an autosomal recessive disorder caused almost exclusively by variants in the ATP-binding cassette subfamily A member 4 (ABCA4) gene (MIM: 601691) [52]. It is the most frequent form of macular dystrophy with an estimated prevalence of approximately 1 in 10,000 individuals [53]. STGD1 is characterised by progressive bilateral central vision loss, colour vision defects, delayed dark adaptation, and flecks in the retinal pigmentary epithelium [54]. A spectrum of disease severity underlies STGD1, with age of onset and progression governed by the specific combination of variants an individual carries, highlighting again the value of a specific genetic diagnosis for disease progression and risk assessment. An extensive list of 5962 likely pathogenic ABCA4 variants in 3928 cases was published in 2017 [55], providing a valuable resource to aid in the interpretation of variants discovered as part of this study. In addition to the coding variants captured, it has become increasingly clear that many of the unresolved one-allele-only cases (Figure 2), where the disease is typically associated with a recessive inheritance pattern, may carry a pathogenic intronic variant. This has been elegantly demonstrated by deep intronic variants in ABCA4 causing STGD1, such as c.4539+2028C>T (p.[=,Arg1514Leufs*36]). This variant has been observed as a candidate in 10 Target 5000 individuals, of which 5 were discovered as part of a landmark study of ABCA4-linked STGD1 employing smMIPs-based whole gene sequencing [13] and 5 were discovered through target capture sequencing runs. This highlights a significant enrichment of this variant in the Irish IRD patient cohort. c.4539+2028C>T (p.[=,Arg1514Leufs*36]) was previously functionally analysed in photoreceptor precursor cells, and was shown to result in a 345 nucleotide pseudoexon inclusion due to strengthening of exonic splice enhancers. 
It is notable that such variants are an attractive target for antisense oligonucleotide-based splice correction therapy [56].
Identified variants proximal to the RHO p.Met207 region appear predominant in our sub-cohort of autosomal dominant RP patients, as reported in Section 3.2. Amino acids in this region of the protein form part of the fifth transmembrane domain of rhodopsin. This domain has been shown to be vital to retinal binding in bovine rhodopsin [57]. It is possible that undesirable changes in the protein folding of this region result in steric hindrance that prevents the apoprotein from successfully binding retinal. This impaired binding capacity may possibly underpin the reason for the prevalence of pathogenic variants found in the coding region of this protein domain.
Feline leukaemia virus subgroup C cellular receptor 1 (FLVCR1; MIM: 609144) is a transmembrane protein involved in erythropoiesis and heme transport [58]. FLVCR1 was first discovered for its role in aplastic anaemia in domestic cats and subsequently erythroblast destruction in vitro [59]. This was also the first study that theorised that FLVCR1 was a receptor for an organic anion and identified its component domains as strikingly similar to other ancient Major Facilitator Superfamily (MFS) members. In a later study it was shown that this receptor protein provided the first description of a mammalian heme transporter [60]. The same group proceeded to investigate its effect in knockout mouse models, which resulted in embryonic lethality [58]. Candidate variant detection in FLVCR1 has significantly increased in recently recruited pedigrees. It is notable that the c.1022A>G (p.Tyr341Cys) variant is only the second disease-associated variant to be found in FLVCR1 associated with RP without posterior column degeneration [43]. Moreover, it is the first protein coding variant found in this gene to be affiliated with non-syndromic RP [45]. There were no extraocular features associated with this variant (Figure 10). It is possible that the distinct families in this cohort with this variant share a common ancestry. However, they do not share any rare variants in the regions captured and originate from distinct geographical locations in Ireland. The age of onset for the symptoms of ataxia in PCARP has been reported as typically in the third decade of life. Of note in the Target 5000 cohort, all patients carrying this variant have reached this age, with the oldest patient currently in their seventh decade of life. 
It is as yet unclear whether the disease pathology associated with this variant will remain completely non-syndromic throughout the entirety of a patient's lifespan or will result in a milder, later onset of additional symptoms compared to other pathogenic variants found in this gene. Fortunately, the small size of this gene (2.6 kb) in principle makes it suitable for inclusion in AAV vectors for use as a gene therapy. Such a therapy may help to alleviate the toxic effects of intracellular free-heme [39]. The serotype of AAV could be chosen based on the presence or absence of systemic phenotypes. For example, if only the retina was to be targeted, AAV2/5 or AAV2/8 might be effective serotypes; on the other hand, if the whole central nervous system was to be targeted, then AAV9 or AAVB1 would be more valuable [61]. Many additional bespoke serotypes are now emerging as a result of site-directed evolution of AAV serotypes [62-65]. Given the recent results here on the role of FLVCR1 in this form of non-syndromic RP, FLVCR1, along with other IRD genes, becomes an increasingly interesting candidate for exploration of AAV-mediated gene therapies.
The RPE65 enzyme is active in the retinoid cycle that recycles retinoids in the RPE, which are utilised by photoreceptor cells. Mutations in the RPE65 gene were initially found to be causative of some cases of autosomal recessive LCA, a severe and clinically distinct IRD [66-68]. This was subsequently expanded to include a milder, dominant form of retinopathy and, more recently, less severe autosomal recessive retinopathies [35,37]. Luxturna, the first ocular gene therapy approved by the FDA, is for biallelic RPE65 retinal disease [16]. RPE65 c.1430A>G (p.Asp477Gly) remains the predominant RPE65 genotype observed in the Irish population. This variant has since been detected in other population studies alongside the clinical presentation of similar phenotypes, largely described as a choroideremia phenocopy [36,38]. However, until recently the exact disease mechanism caused by this variant has evaded researchers. It was initially believed that this variant produced an abnormal protein structure that may affect enzymatic function [35]. More recently, it has been shown in a mouse model that although this variant is not located in a critical functional domain, the variant induces sufficient change to alter the physicochemical properties of the p.D477 loop to produce an aggregation-prone surface. This surface gains the aberrant function of enabling abnormal protein-protein interactions [69]. One hypothesised type of interaction is with ubiquitin ligases, which would likely result in proteasomal degradation as seen in several other RPE65 mutations [70]. This theory may support the variability observed phenotypically, as severity might then become dependent on several other factors such as ubiquitination rates and proteasome response. It is of note that aberrant RNA splicing has been shown to have a major pathogenic effect in a knock-in mouse model with this variant [71]. 
Given the success of the gene therapy treatment of biallelic RPE65 retinopathies with Luxturna [16], an AAV-based therapy, it is interesting to speculate whether the therapy might have any utility for monoallelic dominant RPE65 cases.
The CHM gene hosts the largest collection of pathogenic structural variants detected in any IRD-related gene as part of the Target 5000 study to date. The presence of several structural variants in the general population may also indicate a natural instability in this genomic region. Additionally, female carriers of CHM mutations typically show mild stationary signs with no symptoms, while males are severely affected. In this instance, some females were more severely affected than expected with advanced signs of degeneration and progressive visual decline. One female carrier showed radial pigmentation at the retinal pigment epithelium level and scattered choroidal atrophy, whereas another showed more marked reactive pigmentation and extensive symmetrical atrophy of the choroid and outer retina. These manifestations were apparent both anatomically and functionally. A more severe phenotype was also associated with whole CHM gene deletion in a previous genotype-phenotype choroideremia study [72]. However, in this study, the phenotype was described as choroideremia plus, due to additional syndromic features. These additional symptoms, such as deafness, were attributed to the extent of the deletion encompassing several surrounding genes, which is not the case for the whole gene deletion patients described here.
Going forward with the Target 5000 study, more efforts will be focused on the detection of additional variants in unresolved IRD cases and also on the functional assessment of variants of unknown significance identified thus far. In the age of emerging gene therapies, it has never been more important to stringently identify the pathogenic variants that underpin a condition. It is therefore vital that the variants located outside of exonic regions causing disease pathologies are detected, and that these participants are also considered for treatment, should an applicable gene therapy become available in the future. A whole exome or whole genome approach will be more extensively employed in the future to increase the identification of novel variants. Recent studies have highlighted that molecular diagnostic yield is increased when WGS is employed by comparison with other methods [51]. It is expected that for many of the currently unresolved cases, pathogenic variants may be present in deep-intronic, promoter, or enhancer regions, or in genes that are not yet associated with an IRD [73,74]. Recent studies such as that in STGD1, highlighting the relative prevalence of deep intronic mutations in the ABCA4 gene, attest to the need for more extensive sequence analysis [13]. These regions have been very poorly covered in patients screened by target capture protocols, unless the protocols were specifically designed to cover them. Additionally, WGS data will offer higher resolution in terms of data associated with copy number abnormalities and large structural rearrangements. Over 1300 CNVs associated with IRD genes have been reported in the literature [75]. Exon-based target capture methods have a significantly reduced capacity to detect CNVs and SVs. This is due in part to the range of read-depth at captured regions and the inability to detect breakpoints that occur outside of captured loci. 
Notably, similarly targeted methods, such as non-WES/WGS studies, which have specifically probed intronic and exonic regions of genes, have been shown to successfully detect large structural variants [13]. This illustrates that the enormous data volumes generated by WGS studies may be surplus to requirement if the SVs/CNVs to be searched for are relevant to a small genomic region. Table 2 illustrates that many variants with seemingly strong evidence of pathogenicity may still be classified as VUS. Variants currently of unknown significance that are likely to alter enzyme function, such as those in RPE65 [76], may be amenable to assessment in vitro by employing enzyme function assays. Alternatively, if predicted by in silico interrogation to perturb splicing [77], these variants could be assessed by midigene analysis. Midigene analysis refers to the incorporation of variants of interest into vectors containing several exons of the relevant IRD gene, enabling the in vitro interrogation of the effect(s) of that variant directly at an RNA level. This approach is particularly effective for testing exonic and intronic variants that are likely to disrupt correct splicing, as these variants will have a distinct impact on the RNA product when compared to the wild-type equivalent. This method is also particularly useful for the interpretation of variants that are located within introns, yet outside of canonical splice sites, and additionally to examine rare synonymous exonic variants proximal to exon junctions [32,78].
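The read-depth limitation of exon-based capture discussed earlier in this section can be illustrated with a minimal Python sketch of a depth-ratio comparison. This is not the pipeline used in Target 5000; the target names, depth values, and the -0.6 log2 threshold are all hypothetical and chosen only to show the principle that a copy number loss appears as reduced normalised depth at captured targets, with no information about breakpoints lying outside them.

```python
from math import log2
from statistics import median


def normalise(depths):
    """Express each target's depth as a fraction of the sample's total depth."""
    total = sum(depths.values())
    return {target: depth / total for target, depth in depths.items()}


def log2_ratios(sample, references, pseudocount=1e-6):
    """log2 ratio of sample depth to the median reference depth, per target."""
    sample_frac = normalise(sample)
    ref_fracs = [normalise(ref) for ref in references]
    ratios = {}
    for target, frac in sample_frac.items():
        ref_median = median(ref[target] for ref in ref_fracs)
        ratios[target] = log2((frac + pseudocount) / (ref_median + pseudocount))
    return ratios


def flag_losses(ratios, threshold=-0.6):
    """Return targets whose log2 ratio suggests a possible copy number loss."""
    return [target for target, ratio in ratios.items() if ratio <= threshold]


# Hypothetical mean read depths per captured exon (not real study data).
reference_samples = [
    {"exon1": 100, "exon2": 120, "exon3": 80, "exon4": 100},
    {"exon1": 110, "exon2": 130, "exon3": 90, "exon4": 105},
    {"exon1": 95, "exon2": 115, "exon3": 75, "exon4": 98},
]
test_sample = {"exon1": 100, "exon2": 60, "exon3": 80, "exon4": 100}

ratios = log2_ratios(test_sample, reference_samples)
candidate_losses = flag_losses(ratios)
print(candidate_losses)
```

In this toy example only the target with roughly halved depth is flagged. The resolution is limited to whole captured targets, which is consistent with the point made in the text that breakpoints outside captured loci cannot be located by this kind of analysis and require follow-up such as tiled PCR or broader sequencing.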
A viable in vivo strategy for the assessment of novel missense variants is to introduce these VUS into model organisms with similar or conserved genes. A significant advantage of such model organisms is the ability to rapidly generate models of disease [79]. This is particularly relevant currently, as recent advances in genome editing technologies have made this approach more readily accessible and successful than ever before [80]. Although not all model organisms fully represent the visual system that exists in humans, many organisms can offer useful information ranging from protein mislocalisation [81] to optokinetic response [82]. By introducing a VUS into the appropriate system, additional evidence can be gained to better evaluate the functional impact of the VUS in human ocular pathologies.
Thus far, genetic analysis of IRD patients has helped to resolve ambiguous phenotypes and to identify causative mutations in nearly 70% (495/710) of sequenced pedigrees. The continuous expansion of our cohort has enabled us to better interrogate the sequencing data and interpret the potential pathogenicity of novel variants when detected. In addition to this, the growing body of data from NGS studies of IRDs globally should facilitate better correlations between genotype and phenotype and further refine methods for diagnoses and prognoses. Furthermore, whole gene and WGS analyses are highlighting the significant role of non-coding variants as causative of some IRDs. Target 5000 aims to provide actionable outcomes empowering patients with genetic diagnoses and potentially future access to clinical trials or approved treatments where appropriate. Given the rapid development of the field of ocular therapeutics, it is clear that genetically characterising those affected by an IRD has become a diagnostic imperative of the utmost importance.
Table S3: Tabular summary of the diagnostic yield rates for 710 Target 5000 pedigrees utilising target capture next-generation sequencing of the exonic regions of over 250 genes and previously identified pathogenic intronic variants, Table S4: Tabular summary of the genetic architecture of retinitis pigmentosa (RP) in Target 5000 participants, Table S5: Tabular summary of the genetic architecture of macular dystrophies in Target 5000 participants. CRD: Cone-Rod Dystrophy, FA: Fundus Albipunctatus, Table S6: Tabular summary of the genetic architecture of Usher Syndrome (USH) in Target 5000 participants, Table S7: Tabular summary of the genetic architecture of less common phenotypes encountered in the Target 5000 study. LCA: Leber Congenital Amaurosis, LHON: Leber Hereditary Optic Neuropathy, Figure S1: The available age distribution of recruited Target 5000 participants, Figure S2: Sequencing coverage from three affected X-linked retinitis pigmentosa individuals (H41, P1 and K1) from three distinct pedigrees as seen on IGV software. Aligned to this region are sequencing primers (R4, R7, R8, R9, F10 and R5) and PCR primers (F3, R6 and P9) as shown by forward (blue) and reverse (red) orientations.