Analysis of rod-cone dystrophy genes reveals unique mutational patterns

Background Rod-cone dystrophy (RCD) is the most common inherited retinal disease and is characterised by the progressive degeneration of retinal photoreceptors. The classification of RCD genes is based exclusively on the prevalence of gene mutations and does not consider the implication of the same gene in different phenotypes. Therefore, we first investigated the occurrence of mutations in autosomal recessive RCD (arRCD) and non-arRCD conditions. We then identified arRCD-enriched mutational patterns in specific genes and coding exons.
Methods and results The mutation patterns differed between arRCD and non-arRCD conditions (p=0.001). Specifically, when compared with missense mutations, insertions/deletions (OR=1.2, p=0.007), nonsense (OR=1.2, p=0.014) and splice-site mutations (OR=1.6, p=0.038) increased the OR of arRCD by 20%-60% versus non-arRCD conditions. The gene-based analysis identified that EYS, IMPG2, RP1L1 and USH2A mutations were enriched in arRCD (p<0.05). The exon-based analysis revealed specific mutation patterns in exons of CRB1 and RP1L1 and in exons 12, 60 and 62 of USH2A, which code for the Laminin EGF and FTIII domains.
Conclusion The current analysis showed that many arRCD genes have unique mutational patterns.


INTRODUCTION
Rod-cone dystrophy (RCD), also known as retinitis pigmentosa, is an inherited retinal disease (IRD) characterised by the progressive degeneration of the rod and cone photoreceptors. 1 In most cases, this deterioration results in night blindness followed by progressive centripetal constriction of the visual field. 2 The worldwide prevalence of RCD is around 1:4000 individuals. 2 This condition is transmitted as a Mendelian trait caused by disease-causing mutation(s) in gene(s) associated with the disease phenotype. 2 RCD is exceptionally heterogeneous, 3 with mutations in more than 60 genes being implicated (the list of genes is accessible at https://web.sph.uth.edu/RetNet/sum-dis.htm#Agenes). 3 4 Different mutations in the same gene may cause different retinal phenotypes (such as Usher syndrome and Leber congenital amaurosis), and the same mutation may produce different retinal phenotypes even among siblings. 3
RCD has three modes of inheritance, with autosomal recessive (ar) being the most prevalent (50%-60%), followed by autosomal dominant (ad) (30%-40%) and X-linked patterns (5%-15%). 2 5 Mutations in 23 genes have been related to adRCD, 36 genes to arRCD and 3 genes to X-linked RCD. 3 6 The diagnosis of RCD is usually complex because of this marked heterogeneity. 3 It depends on various investigations, including a comprehensive medical examination (visual function, multimodal retinal imaging, electrophysiology) and molecular genetic testing. 7
Genotype-phenotype correlations in RCD and other rare diseases have largely been based on cosegregation analysis. Furthermore, the classification of RCD genes is based exclusively on the prevalence of gene mutations and does not consider the implication of the same gene in different IRDs. Therefore, we first investigated the occurrence of mutations according to arRCD. Then, we searched for specific mutation types highly enriched in arRCD rather than in non-arRCD conditions, such as Stargardt disease, Usher syndrome, Leber congenital amaurosis and bestrophinopathies. Finally, we identified unique mutational patterns in specific genes and coding exons.

STRENGTHS AND LIMITATIONS OF THIS STUDY
⇒ The current study is the first to investigate all the autosomal recessive rod-cone dystrophy (arRCD) genes to report unique arRCD mutational signatures in exons and genes.
⇒ Our study has several limitations: (1) our analysis relied on the number of reported mutations and not on the patients carrying them; thus, we could not use the allelic frequencies in our analyses; (2) no association with specific clinical ocular phenotypes, such as the visual field, the electroretinogram and the fundus appearance, was performed because this information was not available; (3) we could not stratify these genotype-phenotype correlations according to the geographic location and (4) for the gene-based analysis, one test per gene was performed (63 independent tests in total); thus, a Bonferroni correction might further be used. If applied, USH2A and CRB1 remain highly associated (p<0.001). In the exon-based analysis, one test was performed for all the exons of a gene, thus abolishing the concern of multiple testing.

Open access

Data extraction, inclusion and exclusion criteria
The Retinal Information Network database
The Retinal Information Network (RetNet) is a database that provides tables of genes and loci causing IRDs. 6

LOVD database
Similar to HGMD Pro, genetic variations in arRCD genes were also downloaded from the LOVD database (N=1104, accessed: 20 September 2021). 9 To retrieve all these variations, we searched for every arRCD gene by entering its symbol in the gene search tab (https://grenada.lumc.nl/LSDB_list/lsdbs/).

UniProt and gene databases
All amino acid (a.a) domains were retrieved from the Universal Protein Knowledgebase (UniProt) (https://www.uniprot.org/). 10 In addition, the longest mRNA isoform was selected from the National Center for Biotechnology Information Gene database (https://www.ncbi.nlm.nih.gov/gene). These databases provided a means to annotate the protein domain and transcript location of each genetic mutation extracted from the HGMD.

Mutations stratified according to arRCD
Each individual mutation extracted from the HGMD database was categorised as either an arRCD or a non-arRCD mutation (any other disease, even those not related to the eye, such as diabetes, hearing impairment and many others). In this analysis, individual mutations, but not their frequencies, were used to create an integer count, the mutation occurrence statistic. As such, if a mutation was associated with disease in more than one person in the database, it was still counted only once. However, if a mutation was genetically heterogeneous (that is, it influenced more than one trait), it was counted once for each phenotype studied here. This statistic was defined for three genomic features: (1) 'global' or genome wide, (2) 'genic' or for each gene of interest and (3) 'exonic' or for each exon defined for the longest transcript of each gene of interest.
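The counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the record layout, the placeholder values (`GENE_A`, `var1`, and so on) and the function name `occurrence_counts` are all hypothetical.

```python
from collections import Counter

# Hypothetical records: (gene, exon, variant, phenotype). Placeholder values only.
records = [
    ("GENE_A", 2, "var1", "arRCD"),
    ("GENE_A", 2, "var1", "arRCD"),            # same variant, another patient: ignored
    ("GENE_A", 2, "var1", "Usher syndrome"),   # same variant, other phenotype: counted
    ("GENE_A", 5, "var2", "Stargardt disease"),
    ("GENE_B", 1, "var3", "arRCD"),
]


def occurrence_counts(records):
    """Return global, genic and exonic mutation occurrence counts."""
    # A set keeps one entry per (gene, exon, variant, phenotype) combination,
    # so repeated reports of the same variant in the same phenotype collapse.
    unique = set(records)
    counts = {"global": Counter(), "genic": Counter(), "exonic": Counter()}
    for gene, exon, _variant, phenotype in unique:
        group = "arRCD" if phenotype == "arRCD" else "non-arRCD"
        counts["global"][group] += 1
        counts["genic"][(gene, group)] += 1
        counts["exonic"][(gene, exon, group)] += 1
    return counts


counts = occurrence_counts(records)
print(counts["global"])
print(counts["genic"])
```

Because the key includes the phenotype, a variant reported in several phenotypes contributes one count per phenotype, whereas the number of carriers never enters the statistic.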
As a sensitivity analysis, the mutations identified in the LOVD database were also used to derive this mutation occurrence statistic, but only in the 'global' framework.
To test for differences between the arRCD and non-arRCD mutation occurrence statistics, we performed a χ² test of independence. The null hypothesis was defined as an equal number of variations across all the tested categories.
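As an illustration of this kind of test, the sketch below computes a Pearson χ² statistic for a contingency table of phenotype group by mutation type, using the usual expected counts under independence (row total × column total / grand total). The counts are hypothetical, the function names are ours, and the p value is obtained from a closed-form recurrence for integer degrees of freedom rather than from the statistical package used in the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    # Start from Q(1, y) for even df or Q(1/2, y) for odd df, then apply
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) up to a = df / 2.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_independence(table):
    """Pearson chi-square test of independence on a list-of-rows count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts. Rows: arRCD, non-arRCD.
# Columns: missense, InDel, nonsense, splice site.
table = [[560, 250, 160, 30],
         [610, 230, 150, 10]]
stat, df, p = chi2_independence(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```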

Statistical analyses
The analyses were conducted using SPSS software V.20 (SPSS). All studied variables were expressed as frequencies. The plots were generated using Origin software (OriginPro, V.8, OriginLab Corporation, Northampton, Massachusetts, USA). For the χ² tests and logistic regressions, the null hypothesis of no association was rejected at p<0.05.
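To make the OR reported for each mutation type more concrete, the sketch below computes an unadjusted OR with a Wald 95% CI from a 2×2 table, taking missense mutations as the reference category. For a single categorical predictor, this cross-product ratio equals the exponentiated coefficient of a logistic regression; the models fitted in SPSS may differ (for example, if they were adjusted), so this is an assumption-laden illustration. The counts and the function name `odds_ratio` are hypothetical.

```python
import math


def odds_ratio(type_arrcd, type_other, ref_arrcd, ref_other, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table.

    type_*: counts for the mutation type of interest (arRCD, non-arRCD)
    ref_*:  counts for the reference type, here missense (arRCD, non-arRCD)
    """
    estimate = (type_arrcd * ref_other) / (type_other * ref_arrcd)
    # Standard error of log(OR), Woolf's method
    se_log = math.sqrt(1 / type_arrcd + 1 / type_other + 1 / ref_arrcd + 1 / ref_other)
    low = math.exp(math.log(estimate) - z * se_log)
    high = math.exp(math.log(estimate) + z * se_log)
    return estimate, low, high


# Hypothetical counts: 120 arRCD vs 100 non-arRCD mutations of the type of
# interest, against 500 arRCD vs 500 non-arRCD missense mutations.
estimate, low, high = odds_ratio(120, 100, 500, 500)
print(f"OR = {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

An OR of 1.2 means that the odds of a mutation being arRCD are 20% higher for that mutation type than for missense mutations.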

RESULTS
We first used the RetNet database to identify genes previously known to cause arRCD. Sixty-three genes were found and further investigated; all are listed in table 1. To study the mutation occurrence and patterns within these arRCD genes, we searched the HGMD database, which revealed 5868 genetic variations, of which 2092 (36%) were arRCD. The remaining two-thirds were specific for different IRDs such as Stargardt disease (17%), Usher syndrome (13%), Leber congenital amaurosis (7%), Bardet-Biedl syndrome (3%) and cone-rod dystrophy (3%) (figure 1). Interestingly, some genotypes within the arRCD genes were found in non-IRD conditions such as mucopolysaccharidosis IIIC (1%), hyper-IgD periodic fever syndrome (1%) and diabetes (1%) (figure 1). The occurrence of mutations in arRCD and other diseases is provided in table 1. Only 34% of the total mutations in  , table 1). We tested the possibility that longer genes have more arRCD mutations. Because arRCD is a deleterious trait, one would anticipate that the accumulation of mutations is proportional to the number of (functional) base pairs. However, the correlation analysis between gene size and the pattern of arRCD mutations showed no association (p>0.05).
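The gene-size check can be illustrated as follows. The text does not state which correlation coefficient was used, so this sketch assumes Pearson's r on the number of arRCD mutations per gene and swaps in a permutation test for the p value; the gene sizes, the counts and the function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p value for the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical gene sizes (kb) and arRCD mutation counts for eight genes
gene_size_kb = [12, 45, 150, 800, 2000, 90, 30, 400]
arrcd_mutations = [20, 5, 60, 15, 250, 8, 40, 12]

r = pearson_r(gene_size_kb, arrcd_mutations)
p = permutation_p(gene_size_kb, arrcd_mutations)
print(f"r = {r:.2f}, permutation p = {p:.3f}")
```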
The mutations data were extracted from the HGMD Pro database. Data were presented as numbers (N), percentages (%) and frequencies. The percentages are the proportion of all mutations at each locus that are arRCD mutations. The total exon size was calculated by adding the length of every exon in a gene (LOVD database: https://grenada.lumc.nl/LSDB_list/lsdbs/). Gene sizes were retrieved from GeneCards (www.genecards.org). arRCD, autosomal recessive rod-cone dystrophy; HGMD, Human Gene Mutation Database.

To investigate possible mutational signatures for arRCD, we stratified the different types of mutations according to arRCD. We found that the mutation types varied according to the phenotype (non-arRCD and arRCD) (table 2, p≤0.004). Specifically, we observed a decrease of 5 percentage points in missense mutations (61% in non-arRCD vs 56% in arRCD) and an increase in InDels (23% in non-arRCD vs 25% in arRCD) and nonsense mutations (15% in non-arRCD vs 16% in arRCD). These results were replicated in the LOVD database since the variation types showed a similar trend (table 2). Importantly, when the associations between the mutation types and Usher syndrome were examined, we found that InDels, nonsense and splice-site mutations increased the OR of Usher syndrome at least twofold when compared with missense mutations (2<OR<2.4, p<0.0001, table 2).
We conducted a gene-based analysis and found enrichment for arRCD among eight genes: ABCA4, BBS1, CRB1, CYP4V2, EYS, IMPG2, RP1L1 and USH2A (χ², p<0.05, table 3). Mutations in these genes were distributed differently between arRCD and non-arRCD. In ABCA4, BBS1 and CYP4V2, >92% of missense, InDel and nonsense mutations were enriched in non-arRCD disorders (p≤0.043, table 3). In CRB1, nonsense and InDel mutations were enriched in non-arRCD (p=0.0001, table 3). In EYS, all types of mutations were enriched in arRCD (p=0.002). In IMPG2, all the InDel and splice-site mutations were enriched in arRCD, whereas two-thirds of the missense mutations were non-arRCD (p=0.002, table 3). In RP1L1, the majority of the nonsense mutations were arRCD (p=0.05, table 3). In USH2A, the InDel, nonsense and splice-site mutations belonged mainly to the group 'non-arRCD' (p=0.0001, table 3), while the missense mutations did not show any preference (~50%).
To go further, we searched for specific coding exons harbouring arRCD mutations (table 4). Our analysis revealed that, in ABCA4, exons 20 and 28, coding for the cytoplasmic region between the nucleotide binding domain (NBD) and the transmembrane domain (TMD, 1007-1051 a.a) and for the extracellular domain (ECD, 1411-1452 a.a), belonged to the group 'non-arRCD'. In contrast, all the coding exons in EYS were enriched in arRCD (table 4). Exons 2, 6 and 7 in CRB1, coding for the EGF-like and laminin G-like domains, contain InDel and nonsense mutations that are non-arRCD (p<0.05, table 4). Interestingly, mutations in exon 4 of RP1L1 showed an opposite pattern: the InDels were non-arRCD, whereas the nonsense mutations showed an enrichment in arRCD (p=0.001, table 4). Mutations in exons 12, 60 and 62 of USH2A, coding for the Laminin EGF and FTIII domains, showed a different spectrum: the missense mutations were mainly arRCD, whereas the InDel, nonsense and splice-site mutations were enriched in non-arRCD (p=0.001, table 4).

DISCUSSION
Here, we found that 36% of all the downloaded mutations in arRCD genes were specific for arRCD. The remaining two-thirds were present in non-arRCD phenotypes such as Stargardt disease, Usher syndrome and Leber congenital amaurosis and in non-retinal phenotypes such as hearing impairment and diabetes. We showed that the mutation pattern differed between arRCD and non-arRCD conditions. Compared with missense mutations, InDels, nonsense and splice-site mutations increased the OR of arRCD by 20%-60% versus non-arRCD. Furthermore, we conducted a gene-based analysis and found enrichment for EYS, IMPG2, RP1L1 and USH2A mutations with arRCD.

OR: measure of association between an exposure and an outcome. χ²: measure of association between two categorical variables (related or independent). Regulatory mutations were not shown in the table because of the low sample size.
*P value for the χ² test was used to compare the mutations' occurrence in non-arRCD versus arRCD. †P value for multiple logistic regression analysis of genetic mutations with arRCD (arRCD vs non-arRCD). ‡P value for multiple logistic regression analysis of genetic mutations with arRCD (arRCD vs Usher syndrome). arRCD, autosomal recessive rod-cone dystrophy; HGMD, Human Gene Mutation Database.

The mutations data were extracted from the HGMD Pro database. Data were presented as numbers (N) and percentages (%). P value for the χ² test was used to compare the mutations' occurrence in non-arRCD versus arRCD. χ²: measure of association between two categorical variables (related or independent). arRCD, autosomal recessive rod-cone dystrophy; HGMD, Human Gene Mutation Database; InDel, insertion/deletion.

The exon-based analysis revealed that the vast majority of EYS mutations were enriched for arRCD. In contrast, the mutations in RP1L1 exon 4 and USH2A exons 12, 60 and 62 showed opposite patterns: the missense mutations were mainly arRCD, whereas the InDel, nonsense and splice-site mutations were specific for non-arRCD. The mutational spectrum in arRCD genes differed between arRCD and non-arRCD conditions. Specifically, the prevalence of nonsense, InDel and splice-site mutations increased in arRCD. Furthermore, these types had a higher OR of arRCD (20%-60%). Notably, these results were replicated in the LOVD database since the latter showed the same trend observed in the HGMD database.
Our findings indicate that InDels, nonsense and splice-site mutations increased the OR of Usher syndrome at least twofold compared with missense mutations (table 2). These findings are consistent with two previous reports: (1) a survey of targeted panel sequencing in 525 Japanese RCD patients revealed that truncating variants in USH2A were detected in all syndromic patients, who had more severe phenotypes than non-syndromic ones; 11 (2) in the largest cohort of Chinese patients with USH2A mutations, Zhu et al reported that individuals with truncating mutations experienced an earlier decline in visual function. 12 These observations are reasonable since truncating mutations might largely inactivate the function of the entire protein, thus leading to a more severe phenotype.
The differences in the mutation types between arRCD and non-arRCD phenotypes directed us to a gene-based analysis to identify the genes responsible for these differences. Of note, this analysis revealed that, unlike EYS, CNGB1 and PDE6A, whose functions appear to be uniquely tied to arRCD, genes such as ABCA4 and USH2A seem to play a broader physiological role because mutations in them are commonly associated with conditions other than arRCD. A possible explanation for the arRCD 'specific' genes is their expression site, as they are predominantly expressed in the eye and specifically involved in the biology of rods and cones. RP1 encodes a photoreceptor-specific, microtubule-associated protein that is expressed in rods and cones and is crucial for photoreceptor function. 2 13 EYS is the largest gene expressed in the human eye. It is expressed in the human retina with minor expression in other tissues. 14 Human EYS has been shown to play a role in stabilising ciliary axonemes in rod and cone photoreceptors. 15 In humans, CNGB1 encodes an ion channel needed for phototransduction by regulating the ion flow into the rod photoreceptor in response to light-induced alterations. 16 Mutations in PDE6A lead to excessive accumulation of cGMP and the subsequent death of rod, followed by cone, photoreceptors. 17
USH2A has been shown to harbour mutations causing Usher syndrome type II and non-syndromic arRCD. Mutations in exons 12, 60 and 62 of USH2A, coding for the Laminin EGF and FTIII domains, showed an interesting pattern of implication in arRCD: missense mutations were mainly arRCD specific, whereas InDel, nonsense and splice-site mutations were abundant in non-arRCD. The mechanisms involved in this phenotypic variation are usually presented as assumptions and only exceptionally rely on proven data. 18 The phenotype heterogeneity associated with USH2A mutations underlines the complex relationship between the disease-causing mutations and the retinal phenotype, rendering the Mendelian concept of monogenic diseases not applicable for a growing number of diseases.

The mutations data were extracted from the HGMD Pro database. Data were presented as numbers (N) and percentages (%). P value for the χ² test was used to compare the mutations' occurrence in non-arRCD versus arRCD. χ²: measure of association between two categorical variables (related or independent). arRCD, autosomal recessive rod-cone dystrophy; ECD, extracellular domain; InDel, insertion/deletion; NBD, nucleotide binding domain; TMD, transmembrane domain.

For decades, genotype-phenotype correlations were based on cosegregation analysis within small pedigrees. Our study is the first to use statistical tests to investigate the mutation patterns globally, per gene and per exon according to arRCD. On the other hand, this study has several limitations: (1) our analysis relied on the number of reported mutations and not on the patients carrying them; thus, we could not use the allelic frequencies in our analyses; (2) no association with specific clinical ocular phenotypes, such as the visual field, the electroretinogram and the fundus appearance, was performed; (3) we could not stratify these genotype-phenotype correlations according to the geographical location and (4) for the gene-based analysis, one test per gene was performed (63 independent tests in total); thus, a Bonferroni correction might further be used. If applied, USH2A and CRB1 remain highly associated (p<0.001). In the exon-based analysis, one test was performed for all the exons of a gene, thus abolishing the concern of multiple testing.
In conclusion, the current approach revealed mutational patterns that are specifically enriched in arRCD.
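The effect of a Bonferroni correction on the gene-based results can be checked with simple arithmetic: with 63 tests and a nominal threshold of 0.05, the corrected threshold is 0.05/63 ≈ 0.00079. The sketch below applies it to the gene-level p values quoted in the Results; genes for which only an upper bound was reported (ABCA4, BBS1, CYP4V2) are left out, and the variable names are ours.

```python
ALPHA = 0.05
N_TESTS = 63  # one test per arRCD gene, as stated in the text

# Gene-level p values quoted in the Results section
gene_p = {
    "CRB1": 0.0001,
    "EYS": 0.002,
    "IMPG2": 0.002,
    "RP1L1": 0.05,
    "USH2A": 0.0001,
}

threshold = ALPHA / N_TESTS
survivors = sorted(gene for gene, p in gene_p.items() if p < threshold)

print(f"Bonferroni threshold = {threshold:.5f}")
print("Genes below the threshold:", survivors)
```

With these values, only CRB1 and USH2A fall below the corrected threshold, in line with the statement above.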
Contributors SES conceived and designed the study; LJ and MI downloaded the data; SES and MI analysed the data; LJ and SES wrote the first draft; SES revised the manuscript. SES is the guarantor.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available in a public, open access repository. The raw data of the current study were deposited in the Dryad repository: https://datadryad.org/stash/share/9ntbTjE-nvOl_mB1P-9UamV0k2NcAennp6NvIm5yqDw. No licence is needed.

Open Practices
Open access This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.