Introduction

Keratoconus (KTCN) is a corneal disorder characterized by progressive thinning and a conically shaped protrusion of the cornea that induces optical aberrations, leading to a severe impairment of visual function.1

Among the general population, the estimated frequency of KTCN is one in 2000 individuals, making it one of the most common ectatic disorders of the cornea.1 The vast majority of KTCN cases are non-syndromic. However, the co-occurrence of KTCN with genetic disorders, including Ehlers–Danlos syndrome, mitral valve prolapse, osteogenesis imperfecta, Down syndrome, or Leber congenital amaurosis, has been described.2, 3, 4, 5, 6

Although the most common presentation of KTCN is sporadic, 5.0–27.9% of patients report a positive family history.7 Generally, family pedigrees suggest KTCN is an autosomal dominant trait. However, other patterns of inheritance have also been described.8 Genetic contribution to the pathogenesis of KTCN is supported by studies showing higher concordance of disease in monozygotic than in dizygotic twins.9

Several loci implicated in a familial form of KTCN have been reported and there is now a growing list of candidate KTCN genes. It has been observed that some variants might be more penetrant or have a larger effect because familial aggregation of disease is observed.10, 11 While locus heterogeneity makes it difficult to identify factors unambiguously influencing the KTCN phenotype, the replication for any particular gene is expected to be rare.

One strategy to identify KTCN genes is to assess the segregating variation linked with KTCN in families with more than one affected member.12, 13, 14, 15 Whole-exome sequencing (WES), which allows determination of the rare and unique coding sequence variants in an individual personal genome, can be also utilized in familial studies. However, due to a number of candidate genes generated by linkage analysis or WES, the evaluation of each individual variant is challenging. Thus, a combination of linkage analysis and WES holds promise for causative gene discovery.

Here, we present WES and Sanger sequencing results for a selected Ecuadorian family with KTCN, which previously showed a suggestive linkage at 5q31.1–q35.3 (two-point NPL LOD = 3.3815).16 This strategy allowed for a reduction in the number of candidate genes and prioritized potential relevant variants.

Materials and methods

Subjects

A total of eight available members of Ecuadorian family (KTCN-011) with KTCN were included in the study (Figure 1a). All participants underwent complete ophthalmic evaluation. The pedigrees of Ecuadorian KTCN families have been previously described.14, 16 Briefly, the diagnosis of KTCN was based on visual acuity testing, intraocular pressure assessment, biomicroscopic evaluation, and fundus examination with dilation. In addition, a topographic study (Humphrey Atlas Topograph; Carl Zeiss Meditec, Jena, Germany) with a computer-assisted videokeratoscope was performed in all affected individuals as well as in individuals with a suspected corneal abnormality.

Figure 1
figure 1

(a) Pedigree and the Sanger sequencing results of the family KTCN-011. Black-filled symbols indicate individuals with KTCN, gray-filled symbols indicate individuals without DNA samples available, whereas the open symbols indicate unaffected individuals. The segregating phased variants in RAPGEF6 (c.3379G>A), SKP1 (c.475T>G), PROB1 (c.671G>A), and IL17B (c.527G>A) in the 5q31.1–q35.3 linkage region (framed), and HKDC1 (c.850G>A) in 10q22 are shown. (b) Hierarchical flow diagram of filtering process performed on sequence variants identified by WES in family KTCN-011. Among a total of 910 nonsynonymous variants identified by WES, which passed the standard variant quality control, 173 variants were observed simultaneously in both affected individuals. The filtering based on a minor allele frequency (MAF) <0.001 in the common and our internal exome databases allowed us to narrow the number of interesting variants to 74 SNVs. Among those 74 variants, four nonsynonymous SNVs were coincided with the region of the linkage peak on 5q31.1–q35.3 and were selected for Sanger sequencing in KTCN-011 family. The remaining 70 variants from other chromosomal regions were filtered out based on different criteria, including pathogenicity, conservation scores, and genes function. This strategy enabled us to select 12 additional variants for further evaluation. (c) Localization of the loci linked with KTCN and candidate genes. Schematic representation of (1) chromosome 5, (2) overlapping KTCN loci, the 5q31 and 5q32–q33 previously reported and the 5q31.1–q35.3 locus identified in family KTCN-011, and (3) genes localized in linked region (narrowed to the candidate genes screened in KTCN patients). (i) Genes with the variants identified by WES and verified by Sanger sequencing segregating with KTCN in family KTCN-011 (RAPGEF6, SKP1, PROB1, and IL17B). (ii) Genes screened by Sanger sequencing in our previous study in family KTCN-011 (PITX1, IL9, TGFBI), and in Italian patients with KTCN (SPARC); no variants were identified.

The study protocol was approved by the Institutional Review Boards at Poznan University of Medical Sciences in Poland and the Baylor College of Medicine (Houston, TX, USA). All individuals provided informed consent after the possible consequences of the study were explained, in accordance with the Declaration of Helsinki.

DNA preparation

Genomic DNA samples were previously extracted from blood samples using Gentra Puregene Blood Kit (Qiagen, Hilden, Germany) as described.14 The quality of the DNA samples were ascertained by electrophoresis and determined to be of high quality (size >23 kb) with no visible degradation. The quantity of samples was assessed spectrophotometrically using a NanoDrop instrument (Thermo Scientific, San Jose, CA, USA).

Whole-exome sequencing

We applied WES in two KTCN individuals (KTCN-011-04 and KTCN-011-06) at Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) through the Baylor Hopkins Center for Mendelian Genomics (BHCMG). DNA samples were processed according to protocols previously described.17 Briefly, DNA samples were prepared into Illumina paired-end libraries and underwent whole-exome capture using BCM-HGSC core design (52 Mb; Roche NimbleGen, Inc., Madison, WI, USA), followed by sequencing on the Illumina HiSeq 2000 platform (Illumina, Inc., San Diego, CA, USA) with an 150 × depth of coverage. Data produced were aligned and mapped to the human genome reference sequence (hg19) using the Mercury pipeline. Variants were called using the ATLAS (an integrative variant analysis pipeline optimized for variant discovery) variant calling method and SAMtools (The Sequence Alignment/Map) and annotated using the in-house-developed ‘Cassandra’ annotation pipeline that uses ANNOVAR.18, 19, 20

Prediction of the effect of amino acid substitutions on protein function

The potential effects of the identified nonsynonymous amino acid substitutions on the proteins structure and function were predicted by three algorithms PolyPhen-2, SIFT, and MutationTaster along with a conservation score (phyloP).21, 22, 23, 24

Sanger sequencing validation

To confirm the identified WES variants, genomic regions of interests were verified by Sanger sequencing. We amplified the target exons in two individuals analyzed by WES (KTCN-011-04 and KTCN-011-06) and in all other available family members.

The data were deposited in the ClinVar database http://www.ncbi.nlm.nih.gov/clinvar/ (Organization ID:505572; submission ID:1484065, ClinVarAccession: SCV000267605, SCV000267606, SCV000267607, SCV000267608).

Results

Exome capture revealed a total of 910 nonsynonymous variants in at least one of two analyzed individuals (KTCN-011-04 and KTCN-011-06) that passed the standard variant quality control. A total of 173 variants were observed simultaneously in both affected individuals. Additional filtering based on a minor allele frequency <0.001 in the 1000 Genomes Project (http://www.1000genomes.org), NHLBI Exome Sequencing Project (http://evs.gs.washington.edu/EVS/), and our internal exome database (~5000 exomes) CMG at BCM allowed us to narrow the number of variants potential interest to 74 SNVs. Among these 74 variants, four nonsynonymous, heterozygous SNVs, c.3379G>A in RAPGEF6 (NM_001164386.1), c.475T>G in SKP1 (NM_006930.3), c.671G>A in PROB1 (NM_001161546.1), and c.527G>A in IL17B (NM_014443.2), coincided with the region of the linkage peak at 5q31.1–q35.3 and were selected for segregation analysis in the KTCN-011 family (Figure 1b), regardless of the pathogenicity of these variants. All four variants were not observed in our internal (CMG at BCM) database covering ~5000 individuals of mixed ethnicities (Europeans, Turkish, Asians, Africans, and Hispanics). Also, those variants were not detected in the control databases (1000 Genomes Project, NHLBI Exome Sequencing Project).

The remaining 70 variants from the other chromosomal regions were filtered out based on different criteria including pathogenicity (variant predicted as damaging by one of three prediction tools), and/or conservation scores (positive PhyloP score), and gene function. This strategy led to the selection of 12 additional heterozygous variants for further evaluation, including c.850G>A in HKDC1 (NM_025130.3) (Figure 1b and Table 1).

Table 1 Variants identified in affected individuals in WES, which are selected for segregation analysis in KTCN-011 family

DNA samples derived from two WES-tested individuals (KTCN-011-04 and KTCN-011-06), and all remaining members of KTCN-011 family (KTCN-011-01, KTCN-011-02, KTCN-011-03 KTCN-011-05, KTCN-011-07, and KTCN-011-08) were analyzed by Sanger sequencing. Targeted Sanger sequencing has confirmed the WES findings in KTCN-011-04 and KTCN-011-06 for all four variants from 5q31.1 to q35.3 chromosomal region and 12 selected variants localized in other loci. Among these variants, four substitutions completely segregate with affected phenotype in KTCN-011 family, including SKP1 c.475T>G p.(Cys159Gly), PROB1 c.671G>A p.(Gly224Asp), and IL17B c.527G>A p.(Cys176Tyr) from the 5q31.1–q35.3 linkage region, and HKDC1 c.850G>A p.(Gly284Ser) from 10q22 (Figure 1a).

Discussion

The genetic nature of KTCN is heterogeneous and complex and the major factor contributing to the disease development has not been found. Owing to a large number of candidate genes and variants identified using different molecular techniques, explanation of the genetic bases of KTCN phenotype is challenging. In this study we examined the KTCN-011 Ecuadorian family by WES and Sanger sequencing. The previous linkage analyses performed in this Ecuadorian family uncovered a putative KTCN locus on chromosome 5q31.1–q35.3 (two-point NPL LOD=3.3815).16 The proximal boundary of the proposed disease haplotype was at D5S471, defined by recombination in KTCN-011-07. Lack of a more distal recombination did not allow specification of a distal border with confidence.16 Interestingly, other suggestive loci on 5q have been identified also for KTCN.13, 25, 26, 27 Our 5q31.1–q35.3 linkage region overlapped two of them, 5q31 and 5q32–q33 in the Caucasian/Hispanic and the Southern Italian populations, respectively13, 26 (Figure 1c). These findings further indicate that the 5q31.1–q35.3 locus might be truly linked with KTCN.

Our previous molecular screening of the KTCN candidate genes TGFBI, IL9, and PITX1 in 5q31.1–q35.3 locus has not revealed any variants affecting function of these genes in the KTCN-011 family16 (Figure 1c). Given that the chromosomal interval mapped by linkage analysis was large in size, only few chosen functional candidate genes were analyzed by Sanger sequencing. Thus, other genes should be further evaluated in the 5q31.1–q35.3 locus.

Bisceglia et al.26 proposed SPARC as a candidate gene for KTCN because of its location in the 5q32–q33 linkage region. However, molecular screening of this gene in a large cohort of Italian KTCN patients has revealed that the variants identified in SPARC are rare polymorphisms rather than mutations causative for KTCN.28

In family KTCN-011, we have not identified any variants in SPARC. However, the sequential filtering of KTCN-011-04 and KTCN-011-06 variants revealed in WES has pointed to the RAPGEF6, SKP1, PROB1, and IL17B genes localized in the 5q31.1–q35.3 KTCN linkage region. All of these genes are expressed in human cornea based on literature search and our transcriptome analyses (Kabza et al., 2016, unpublished). All four variants were not observed in both our internal and control databases. Therefore, we assumed that all four variants were not the incidental finding.

Further evaluation by Sanger sequencing performed in all available family members showed complete segregation of three of the four variants, SKP1 c.475T>G, PROB1 c.671G>A, and IL17B c.527G>A, excluding the remaining RAPGEF6 from the putative KTCN genes.

Since we recognized 5q31.1–q35.3 as a locus linked with KTCN in studied family, WES variants localized within that chromosomal region were prioritized for further evaluation. However, in order not to miss any SNVs potentially affecting gene function, the 12 putative variants from the remaining chromosomes were also selected for segregation analysis. Interestingly, Sanger sequencing of these 12 variants has shown that HKDC1 c.850G>A in 10q22 was also present in all family members affected by KTCN and absent in healthy subjects.

HKDC1 encodes hexokinase domain containing 1 protein, which phosphorylates hexoses.29 Owing to the fact that phosphorylation is the first step in glucose metabolism, hexokinases may play an important role in regulating energy metabolism and thus regulating cell growth.30 However, little is known about HKDC1 function and there are no reports linking HKDC1 variant and ocular diseases. Analysis of c.850G>A variant leading to amino acid substitution in HKDC1 with different prediction algorithms shows that this variant may be pathogenic for the protein function and structure. However, a definitive relationship between the identified HKDC1 variant and KTCN phenotype could not be established.

SKP1 encodes S-phase kinase-associated protein 1 containing 163 amino-acids. This protein is an essential component of the SCF (SKP1-CUL1-F-box protein) ubiquitin ligase complex. The SCF complex mediates ubiquitination of specific protein substrates involved in cell cycle progression, signal transduction, and transcription, and targets them for degradation by the proteasome.31 In the SCF complex, SKP1 functions as a bridging protein that interacts with both cullin (CUL1) and F-box proteins, and is essential for the regulation of cellular functions. Rabbit corneal endothelial cells, in response to FGF-2 stimulation, remove cell cycle inhibitor – p27 protein – by ubiquitin ligase complexes. As a consequence, endothelial cells that remain in the G1 phase during lifespan resume their proliferation ability. This alteration might lead to endothelium fibrosis and physically blocking light transmittance in the cornea.32

There are no reports linking SKP1 variants to cases of corneal diseases, and c.475T>G substitution is the first SKP1 variant observed in patients with familial KTCN phenotype. Because this variant was referred as non-pathogenic, based on in silico analyses with different prediction algorithms, the role of identified substitution is challenging to evaluate, and definitive relationship between SKP1 and KTCN phenotype could not be established.

The second gene from 5q31 in which we found the variant segregating with KTCN in the KTCN-011 family is PROB1. This gene encodes proline-rich basic protein 1. There is no information about function of this gene. Definitive relationship between the identified PROB1 variant and KTCN phenotype could not be established, since this variant is predicted as benign by in silico tools. However, identification of c.671G>A substitution leading to p.(Gly224Asp) variant in KTCN patients indicates that this gene might be linked with KTCN.

IL17B encodes interleukin 17B, a member of the interleukin 17 (IL-17) family of cytokines that functions as an important mediator in the initiation or maintenance of the proinflammatory response. Interestingly, an elevated level of IL-17 has been observed in tear fluid of patients with KTCN. Because stimulation of corneal fibroblasts with IL-17 leads to production of metalloproteinases, it was suggested that elevated levels of IL-17 may induce corneal damage in KTCN. In herpetic keratitis, in response to infection of the cornea with herpes simplex virus, IL-17 induces a corneal inflammation by stimulating stromal cells to produce various proinflammatory cytokines including IL-6 and IL-8.33 However, there is no evidence that elevated level of IL-17 in KTCN tear fluid is responsible for corneal inflammation. Still, there are several reports supporting the notion of a role of inflammation in the thinning and weakening of the corneal tissue in KTCN.34, 35

In 2013, we identified several variants in genes encoded inflammatory molecules, IL1A and IL1B, in a KTCN family, but we observed a random distribution of them between KTCN and normal individuals. On the other hand, in all KTCN patients, we found a c.214+242C>T variant in the IL1RN gene, whose protein product modulates the effect of IL-1.15 The c.527G>A in IL17B is the second variant in interleukin genes family segregating with KTCN in the Ecuadorian families and the first with predicted pathogenicity for the protein structure and function. However, it is not clear whether the interleukin alteration contributes to KTCN.

In summary, a combination of methods, previously performed linkage analysis, and currently implemented WES and Sanger sequencing allowed us to significantly narrow the analyzed genomic region and reduce the list of putative exonic variants. A full segregation of the three sequence variants in SKP1, PROB1, and IL17B with the disease phenotype further verifies the previously described linkage between the 5q31.1–q35.3 chromosomal locus and familial KTCN in the Ecuadorian family KTCN-011. Moreover, due to the fact that this locus overlapped two other chromosomal regions recognized in distinct populations, the 5q31.1–q35.3 locus might be truly linked with KTCN. While promoter variants affecting gene expression, intronic variants, and other regulatory elements were not assessed in this study the exact genotype–phenotype correlation still remains unclear and should be further evaluated.