Recurrent Rare Copy Number Variants Increase Risk for Esotropia

Purpose To determine whether rare copy number variants (CNVs) increase risk for comitant esotropia. Methods CNVs were identified in 1614 Caucasian individuals with comitant esotropia and 3922 Caucasian controls from Illumina SNP genotyping using two Hidden Markov model (HMM) algorithms, PennCNV and QuantiSNP, which call CNVs based on logR ratio and B allele frequency. Deletions and duplications greater than 10 kb were included. Common CNVs were excluded. Association testing was performed with 1 million permutations in PLINK. Significant CNVs were confirmed with digital droplet polymerase chain reaction (ddPCR). Whole genome sequencing was performed to determine insertion location and breakpoints. Results Esotropia patients have similar rates and proportions of CNVs compared with controls but greater total length and average size of both deletions and duplications. Three recurrent rare duplications significantly (P = 1 × 10−6) increase the risk of esotropia: chromosome 2p11.2 (hg19, 2:87428677-87965359), spanning one long noncoding RNA (lncRNA) and two microRNAs (OR 14.16; 95% confidence interval [CI] 5.4–38.1); chromosome 4p15.2 (hg19, 4:25554332-25577184), spanning one lncRNA (OR 11.1; 95% CI 4.6–25.2); chromosome 10q11.22 (hg19, 10:47049547-47703870) spanning seven protein-coding genes, one lncRNA, and four pseudogenes (OR 8.96; 95% CI 5.4–14.9). Overall, 114 cases (7%) and only 28 controls (0.7%) had one of the three rare duplications. No case nor control had more than one of these three duplications. Conclusions Rare CNVs are a source of genetic variation that contribute to the genetic risk for comitant esotropia, which is likely polygenic. Future research into the functional consequences of these recurrent duplications may shed light on the pathophysiology of esotropia.

S trabismus affects 2% to 4% of the population and causes amblyopia, loss of binocular vision, and lower quality of life. 1,2 Strabismus runs in families, and population, family, and twin studies support a genetic contribution. 3−5 Twin meta-analysis supports a strong genetic contribution, 3 particularly for esodeviations. 6 The relative risk for first-degree relatives of an affected proband is estimated to be between 3 and 5. 3,5,7−9 The heritability factor remains significant after correction for the known environmental risk factors: 5 low birth weight, prematurity, maternal smoking, and advanced maternal age. [10][11][12][13][14][15][16][17] Causative genes have been identified for paralytic strabismus syndromes, in which patients cannot fully move their eyes. 18 In common forms of strabismus, however, no specific mutations have been reported, despite reported mapping of three Mendelian loci (7p22.1, 4q28.3 and 7q31.2). 7,19,20 We recently completed a genome wide association study (GWAS) of non-accommodative esotropia and identified one risk allele, an intronic single nucleotide polymorphism (SNP) of the WRB gene, which affects expression of WRB and neighboring genes. 21 A second GWAS, using self-reported strabismus in the UK Biobank, identified a locus on chromosome 17q25, which extends across the NPLOC4-TSPAN10-PDE6G gene cluster. 22 This locus has been associated through GWAS with several eye conditions, including macular thickness, 23 astigmatism, 24 retinal microvascular size, 25 and myopia. 26 Genetic variation can result from DNA sequence differences, duplications or deletions of genomic elements (copy number variants [CNVs]), or complex genetic rearrangements. CNVs can alter gene function, gene dosages, regulatory elements, or 3D chromatin structure. 27 CNVs have been implicated in neurodevelopmental disorders with complex inheritance, including autism spectrum disorder, [28][29][30][31][32][33][34][35][36][37] intellectual disabiltity, [38][39][40][41][42] and Tourette syndrome. 43 Strabismus is a neurodevelopmental disorder affecting the neural pathways that control ocular alignment and binocular fusion and is prevalent in patients with other neurodevelopmental disorders. We therefore examined our cohort of individuals with isolated esotropia for rare CNVs. We report here on three rare, recurrent DNA duplications that increase the risk of esotropia.

Cases
The esotropia cohort consists of patients from our previous GWAS, 21 including both accommodative and nonaccommodative cases. Inclusion and exclusion criteria were the same: manifest or intermittent esotropia of any size, a history of strabismus surgery for comitant esotropia, or esophoria >10 prism diopters. Accommodative esotropia was defined as manifest esotropia that reduced with hyperopic correction to <10 prism diopters. Infantile esotropia was defined as esotropia with onset before the age of 12 months. Nonaccommodative esotropia was defined as manifest or intermittent esotropia with onset after age 12 months that did not reduce to <10 prism diopters with hyperopic correction; this includes partially accommodative esotropia. By definition, fully accommodative cases did not have strabismus surgery; any patient who had strabismus surgery was classified as either non-accommodative or infantile, depending on age of onset. Exclusion criteria included structural ocular abnormality causing acquired vision loss; structural brain abnormality on neuroimaging; deprivation amblyopia; molecularly defined genetic syndromes or diagnoses associated with strabismus, such as trisomy 21 or craniosynostosis; or defined nonheritable cause of strabismus. A total of 2030 participants who self-reported as White of European ancestry (and in whom principal component analysis confirmed European ancestry) were enrolled: 1105 from Boston Children's Hospital, 745 from Australia (private ophthalmologists and public hospitals in Victoria, Western Australia, Tasmania, and New South Wales), 111 from Leicester, University Hospitals of Leicester, UK, 52 from Cole Eye Institute (Cleveland Clinic), 5 from Children's Hospital of Philadelphia, and 12 self-referred. After all quality control filters for CNV calling, the total number of participants included was 1614.

Controls
Control subjects of Caucasian ancestry were ascertained in which participants were genotyped on the Illumina Omni platform, and intensity data were available. This included controls from the Genomic Psychiatry Cohort and publicly-available controls from a GWAS of Fuchs' Endothelial Corneal Dystrophy (FECD) (accession number: phs000421.v1.p1), derived from the database of Genotypes and Phenotypes (dbGaP). After all quality control filters, 3922 control participants were included. None of the controls were reported to have strabismus, although strabismus was not specifically excluded from the ascertainment cohorts.

Genotyping
Esotropia patients were genotyped on Illumina Infinium human OmniExpress-24v1-0 array. Control cohorts from FECD were genotyped on Illumina HumanOmni 2.5 Versions 4v_1H array and the Genomic Psychiatry cohort was genotyped on Illumina OmniExpress 12v1.0. 98% of the individual SNPs present on OmniExpress-24v1-0 are present on the other arrays. SNP clustering and genotype calling was performed with GenomeStudio v2.0 (Illumina, San Diego, CA, USA). Samples with a call rate <0.98 or with discordant sex were excluded.

Intensity Sample Quality Control
Intensity-based metrics were used to eliminate samples unsuitable for CNV calling. These included the following: waviness factor (WF)-a measure of the waviness in intensity values, a known artifact caused by improper DNA concentration that can lead to spurious calls; Log-R ratio standard deviation (LRR_SD)-a measure of the overall variance in intensity; B allele frequency drift (BAF_DRIFT)a summary of the deviation of BAF from expected values. Cutoff values for each were determined empirically. Samples included had LRR-SD of <0.3, absolute value of WF <0. 43, and BAF_DRIFT <0.01. We eliminated samples with greater than 50 CNV calls, because those are more likely to be spurious calls. The final samples included 1614 esotropia patients and 3922 controls.

CNV Calling
We used two hidden Markov Model (HMM)-based CNV calling algorithms, PennCNV 44,45 (version 1.0.4) and Quan-tiSNP 46 (version 2). These algorithms detect CNVs based on B allele frequency (BAF) and logR ratio (LRR). We created GC wave-adjusted LRR intensity files for all samples using PennCNV's genomic_wave.pl script. 47 Because HMM algorithms can artificially break up large CNVs, CNV segments were merged using PennCNV's clean_cnv.pl script if they were of the same copy number and the intervening markers were less than 20% of the total of both segments. Calls from the two programs were merged by taking the intersection of overlapping calls of the same copy number. Only CNVs called by both programs, greater than 10kb, and encompassing 10 or more SNPs were included in the final call set. Black circles represent deletions, and gray circles represent duplications. Three duplications, on chromosomes 2, 4, and 10, were significant to P = 1 × 10 −6 . The three circles for chromosome 2 represent the same duplication, which has different breakpoints in different individuals. Some of the individual breakpoints have lower P values.

Call Filtering
CNVs were filtered out if they overlapped (>50%) with regions known to generate artifacts in SNP-based CNV detection: immunoglobulin domain regions, segmental duplications, telomeric ends and centromeric regions. We eliminated deletions with a PennCNV confidence score <25 and duplications with a PennCNV confidence score <10. These cutoffs were determined empirically based on confirmation of CNVs using ddPCR. Several deletions with confidence scores <25 were not confirmed by ddPCR, but duplications with scores above 10 were confirmed. To limit our dataset to rare CNVs, we eliminated CNVs that overlapped (>50%) with common CNVs (any CNV with >10% prevalence in large studies compiled by the DECIPHER database or Database of Genetic Variants (DGV)). We further eliminated any CNVs present in greater than 1% of the controls used in this study.

CNV Annotation
Rare CNVs were annotated for gene content according to RefSeq for the hg19 assembly using PennCNV.

Association Testing
We performed 1 × 10 6 label-swapping permutations in PLINKv1.07 to determine both locus-specific and genomewide P values empirically, using the max(T) method. 48 In the segmental test, case and control frequencies were calculated at each unique CNV breakpoint. For the gene-based test, frequencies were based on the number of genic CNVs at each gene locus. Association testing was conducted separately for deletions and duplications.

Removal of Related Individuals
Because recruitment of the esotropia cohort focused on patients with affected relatives, some of the participants were related. To ensure this was not biasing the results, we repeated association testing and per-gene testing after removing related individuals. From each family, the individual with the highest quality control scores was included, and others were excluded. After removing related individuals, 1379 unrelated esotropic individuals remained.

Determination of Insertion Sites
Whole-genome sequencing (WGS) was performed for three individuals with each of the significant CNVs to confirm the presence of CNVs and determine the insertion sites and breakpoints. WGS was performed at the Broad Institute of MIT and Harvard and called against the hg38 reference genome, which is a single representation of multiple genomes. Results were interpreted by examining read depth and split reads at the identified areas using integrated genome viewer software and compared to other individuals sequenced in the same call set.

Esotropia Cohort
Of the 2030 individuals with esotropia included in the previous GWAS, 21 1614 passed quality control measures for CNV calling. This included 851 females and 763 males; 911 from the US, 84 from the UK, and 620 from Australia. A total of 224 had accommodative esotropia, 317 had infantile esotropia, and 1075 had nonaccommodative esotropia.

Three Rare Recurrent Duplications Confer Risk for Esotropia
To test for enrichment of rare CNVs at individual loci, we conducted a segmental genome-wide association test, treat- ing deletions and duplications separately. We also conducted a complementary gene-based test, conditioned on CNVs affecting exons, to account for potentially non-overlapping CNVs affecting the same gene. In CNV analysis, in contrast to SNP-based GWAS, there is no established P value threshold for genome-wide significance. Therefore we established locus-specific and genome-wide corrected P values empirically through 1,000,000 label-swapping permutations, using the max(T) method, 48 following the methods of Huang et. al. 43 Association testing identified three recurrent rare duplications enriched among esotropia patients that survived genome-wide correction for multiple testing (Fig. 2). No specific deletions reached genome-wide significance.
Each of the significant CNVs was validated in affected cases by ddPCR. All patients with chromosome 2 and 4 duplications were validated, and all patients with the larger chromosome 10 duplication were validated, because the probe location is within the region unique to esotropia patients. Overall, 114 cases (7%) and 28 controls (0.7%) had one of the three duplications. No case nor control had more than one of the duplications.
We repeated association testing after removing related individuals within the esotropia cohort (see methods), leaving 1379 cases. The same three duplications were again significant to P = 1 × 10 −6 , by both the breakpoint test and gene test, indicating that our results were not driven by relatedness of our cases.

Insertion and Breakpoint Analysis
To determine whether the duplications were tandem or interspersed and to identify the breakpoints, three cases with each duplication were chosen for WGS. Sequencing all individuals was not feasible, so we chose several unrelated individuals with each duplication who harbored different predicted breakpoints based on SNP calling. For the chromosome 2 duplication we sequenced one infantile and two nonaccommodative esotropia participants. For the chromosome 4 duplication we sequenced one accommodative, one partially accommodative, and one nonaccommodative esotropia participant. For the chromosome 10 duplication we sequenced three nonaccommodative esotropia participants.
Despite the SNP prediction of different breakpoints, the three individuals with the chromosome 4 duplication all harbored a tandem duplication with breakpoints at chr4:25,554,985 and chr4:25,578,843 (hg38, which correspond to hg19:chr4:25,556,607 and chr4:25,580,465). Splitreads were readily identified that span the breakpoints, and sequencing coverage was higher across the area of duplication. Exon 1 of LOC101929161 is included in the duplication and the breakpoint is just upstream of the exon 2 junction (Fig. 6). The chromosome 2 and chromosome 10 duplications were in areas of the genome with multiple repetitive elements and poor mapping of short sequencing reads. We therefore could not identify definitive breakpoints for these two duplications in these individuals, nor determine whether the duplications were tandem or interspersed.

Esotropia Subtypes
To determine whether these duplications were associated with subtypes of esotropia, we compared the proportion of participants in the cohort with accommodative, infantile, or nonaccommodative esotropia (as defined above) to the proportion with each duplication. In the full cohort, 1075 (66.5%) participants were classified as nonaccommodative, 317 (19.6%) as infantile, and 224 (13.9%) as accommodative. Although the numbers are small, the distribution of subtypes differs significantly between the duplications (chi square 17.74, degrees of freedom 6, P = 0.0069). Accommodative esotropia is underrepresented in patients with the . A smaller, ∼300-kb duplication was present in 28 additional cases and 18 controls (dark blue). A nearby deletion was present in one case (red). RefSeq genes are listed underneath. Protein coding genes are in blue, long noncoding RNA genes are in red, and noncoding RNAs are in orange. H3K27Ac mark indicates several putative regulatory regions fall within the duplication. This area is not well conserved over 100 vertebrates. There is some conservation with monkey, but very little with other animals. The blue vertical line indicates the position of the ddPCR probe used to confirm the duplication. The gap in the annotations indicates an unmappable area of the reference genome, usually because it is highly repetitive or of low complexity. At bottom are indicated repeats in the region identified by RepeatMasker: SINE, short interspersed nuclear elements; LINE, long interspersed nuclear elements; LTR, long terminal repeat elements; DNA, DNA repeat elements; SIMPLE, microsatellites, low complexity repeats, satellite repeats, RNA repeats, and other repeats. chromosome 2 duplication (only 1 [4%] of individuals with the chromosome 2 duplication had accommodative esotropia), and absent from patients with the larger chromosome 10 duplication. By contrast, accommodative esotropia is overrepresented among patients with the chromosome 4 duplication (eight cases [29.6%] and the smaller chromosome 10 duplication (six cases (21.4%) (Fig. 7).

DISCUSSION
We demonstrate a role for CNVs in the risk for esotropia, a disorder with poorly understood pathophysiology. We observe a greater global burden of total rare CNV length, and report three recurrent rare duplications that significantly increase risk.
The chromosome 4 duplication includes exon 1 of LOC101929161, a lncRNA of unknown function encompassing 4 exons. This RNA is exclusive to primates, with no homology in mice. In published RNASeq data, expression is primarily in lung and digestive system. 50,51 Duplicating one exon could alter the conformation of the RNA molecule, affecting its affinity for its binding partners. Alternately, the duplication could change the 3D chromatin structure, affecting the topographically associated domains and thus regulation of nearby genes.
The chromosome 2 duplication encompasses one lncRNA (CYTOR) and two overlapping microRNAs (miR4435-1 and miR4435-2). CYTOR is broadly expressed in fetal and adult tissues, with low levels in adult and fetal brain, 50,52 and is overexpressed in cancer cells. 53 The two microRNAs are single exons and are presumed to regulate translation of other genes. There are no homologous genes or microR-NAs in mouse. The duplicated region contains multiple putative regulatory regions, and duplication of these could alter expression of their target genes or genes near the insertion site.
The chromosome 10 duplication includes 12 genes, of which only a few have known functions. NPY4R and NPY4R2 encode neuropeptide Y receptors; neuropeptide Y is a gut-brain peptide which modulates multiple physiologic processes, including feeding behavior and anxiety. 54 ANXA8 encodes annexin 8, one of a family of Ca ++ effector molecules that regulate EGF receptor localization and activity. 55 ANTXRL, FAM25C, FAM25G, and AGAP9 are protein-coding genes of unknown function. LINC00842 is a lncRNA of unknown function. HNRNPA1P33, FAM35DP, ANTXRLP1, and BMS1P2 are pseudogenes.
None of the genes involved in the duplications suggest an obvious pathologic mechanism for strabismus, but study of their developmental expression patterns and functions may lead to further insights into strabismus. The genetic loci identified as strabismus risk factors through GWAS, WRB 21 and NPLOC4-TSPAN10-PDE6G, 22 similarly do not have obvious roles in strabismus pathology.
WGS in three individuals with the chromosome 4 duplication showed the duplication is tandem and defined the breakpoints. Although these individuals are unrelated, the breakpoints are identical. These particular breakpoints may be a "hotspot" for new duplications, or these individuals may share an ancestral haplotype that includes this duplication and confers risk for esotropia.
The breakpoints could not be definitively identified in the individuals with chromosome 2 and 10 duplications, because the breakpoint regions are in areas of the genome with highly repetitive sequence. Mapping reads and identifying split reads in these areas is difficult using short-read next generation sequencing, because 100 to 150 base pair reads of repetitive sequence map to multiple locations in the genome. This hinders CNV calling by WGS. By contrast, SNP calling uses SNPs present across the region, and these CNVs were validated with ddPCR. Unfortunately, repetitive genomic areas are those most likely for insertion and deletion events to occur. This is a problem throughout the field of genetics, which may be solved in the future by long read sequencing.
A limitation of using publicly available control datasets is that individuals with strabismus, especially a history of treated childhood strabismus, may be included in our control set. This, however, strengthens our findings, because some of the control individuals with these duplications may have strabismus. Similarly, strabismus patients may be included in public databases of "healthy" individuals, making comparisons to public databases of CNVs difficult to interpret. DGV reports structural variation present in healthy individuals, from studies that called CNVs using differing algorithms and genotyping platforms. A similar duplication on chromosome 4 has a frequency of 0.34% in DGV, 56 similar to the 0.2% rate in our controls. On chromosomes 2 and 10, somewhat larger duplications have frequencies of 1.58% and 1.74%, respectively, much higher than in our control population (0.2% and 0.4%). This may reflect differences in CNV calling algorithms and genotyping platforms, quality control measures, or populations included. Using these control frequencies, the chromosome 10 duplication remains significant whereas the chromosome 2 duplication does not. Because our cases and controls were called using identical parameters, they provide a more valid comparison.
Although we have one of the largest and most accurately phenotyped esotropia cohorts, it remains underpowered to detect extremely rare CNVs or those of moderate effect. Although we show an association between these rare duplications and esotropia subtypes, our numbers are too small to draw definitive conclusions. Whether these duplications are more prevalent in exotropia, indicating a genetic predisposition to strabismus generally, rather than esotropia specifically, remains to be explored.
Large CNVs cause many genetic syndromes that include strabismus, including Down syndrome (duplication chromosome 21), [57][58][59] Williams-Beuren syndrome (deletion 7q11.23), 60 and deletion of 10q26. 61 CNVs have also been reported in patients with Duane syndrome, a form of paralytic strabismus. 62 Interestingly, large (>5Mb) duplications of 10q11, which encompass the 654kb region we report, cause 10q duplication syndrome which includes developmental delay, dysmorphic features, and, in 5/8 cases, strabismus. 63 A male with a 4.5Mb duplication of 4p15.2, which encompasses the 23kb duplication we report here, was reported to have developmental delay, congenital heart disease and strabismus. 64 The participants in this study all had nonsyndromic strabismus, and there are no specific syndromes associated with the precise duplications we identified.
We provide here the first evidence that CNVs contribute to genetic risk in nonsyndromic esotropia. Further research into the functional consequences of these duplications will hopefully increase our understanding of the pathophysiology of strabismus.