Investigating the effects of copy number variants on reading and language performance

Reading and language skills have overlapping genetic bases, most of which are still unknown. Part of the missing heritability may be caused by copy number variants (CNVs). In a dataset of children recruited for a history of reading disability (RD, also known as dyslexia) or attention deficit hyperactivity disorder (ADHD) and their siblings, we investigated the effects of CNVs on reading and language performance. First, we called CNVs with PennCNV using signal intensity data from Illumina OmniExpress arrays (~723,000 probes). Then, we computed the correlation between measures of CNV genomic burden and the first principal component (PC) score derived from several continuous reading and language traits, both before and after adjustment for performance IQ. Finally, we screened the genome, probe-by-probe, for association with the PC scores, through two complementary analyses: we tested a binary CNV state assigned for the location of each probe (i.e., CNV+ or CNV−), and we analyzed continuous probe intensity data using FamCNV. No significant correlation was found between measures of CNV burden and PC scores, and no genome-wide significant associations were detected in probe-by-probe screening. Nominally significant associations were detected (p~10−2–10−3) within CNTN4 (contactin 4) and CTNNA3 (catenin alpha 3). These genes encode cell adhesion molecules with a likely role in neuronal development, and they have been previously implicated in autism and other neurodevelopmental disorders. A further, targeted assessment of candidate CNV regions revealed associations with the PC score (p~0.026–0.045) within CHRNA7 (cholinergic nicotinic receptor alpha 7), which encodes a ligand-gated ion channel and has also been implicated in neurodevelopmental conditions and language impairment. FamCNV analysis detected a region of association (p~10−2–10−4) within a frequent deletion ~6 kb downstream of ZNF737 (zinc finger protein 737, uncharacterized protein), which was also observed in the association analysis using CNV calls. These data suggest that CNVs do not underlie a substantial proportion of variance in reading and language skills. Analysis of additional, larger datasets is warranted to further assess the potential effects that we found and to increase the power to detect CNV effects on reading and language.


Background
Reading disability (RD or developmental dyslexia) and specific language impairment (SLI) are two of the most prevalent neurodevelopmental disorders, with a prevalence of ≈5-8 % among school-aged children (as reviewed in [1][2][3]). Both RD and SLI are multifactorial disorders with moderate to high heritabilities and are characterized by high comorbidity, also with other neurodevelopmental disorders such as attention deficit hyperactivity disorder (ADHD) and speech sound disorders (SSD) [2,4,5]. It is likely that RD and SLI, as well as the underlying readingand language-related skills, share some genetic/neurobiological mechanisms [6,7].
In these genes, most of the variants that have been tentatively associated with reading and/or language traits are single-nucleotide polymorphisms (SNPs), although other types of genetic variants have also been implicated. These include balanced translocations disrupting ROBO1 [12] and DYX1C1 [13] in dyslexic cases and translocations and deletions affecting FOXP2 in a severe form of speech and language disorder, involving childhood apraxia of speech (CAS) [14].
The putative genetic associations reported so far can explain only a small proportion of heritable variance in reading and language skills [1,4,10]. Part of the "missing heritability" may be represented by Copy Number Variants, defined as structural variations in the genome that result in regions larger than 1 kb showing a nondiploid copy number. Several copy number variants (CNVs) have been identified in severe neurodevelopmental and neuropsychiatric disorders, including schizophrenia (SCZ), Autism Spectrum Disorders (ASD), Intellectual Disability (ID) and Developmental Delay (DD) [15,16]. However, only a few studies have focused on reading and/ or language impairments, which we review briefly here. In the majority of these studies, a perfect co-segregation between CNVs and poor reading/language performance has seldom been observed.
The largest study to date on CNVs in dyslexia involved 376 RD cases and 337 controls. Candidate susceptibility CNVs were found, overlapping IMMP2L and AUTS2 (7q11.22) [20], a well-known ASD susceptibility locus.
With regard to language impairments, Wisznieski et al. [21] identified a heterozygous deletion disrupting the gene TM4SF20 (2q36), co-segregating with language delay in 15 Southeast Asian families. In a CNV scan of SLI families, a~21-kb exonic microdeletion within ZNF277 (7q31.1, adjacent to the IMMP2L/DOCK4 locus) was found [22]. A genome-wide CNV study comparing 127 independent SLI cases from the same dataset, together with firstdegree relatives and unrelated controls, reported novel candidate de novo CNVs [23], disrupting the genes ACTR2 (2p14), CSNK1A1 (5q33.1), and the regions typically involved in 22q11.2 and 8p23.1 duplication syndromes. A recent CNV screen in a longitudinal cohort of children with language-related difficulties or family risk of dyslexia revealed a de novo deletion in 15q13.1-13.3, observed in a child with persistent language impairment, normal reading skills, and no evidence of sensory or neurological problems [24]. This large heterozygous deletion had been previously reported in cases of broader neurodevelopmental delay [24].
Other CNVs have been associated with poor reading or language performance in the context of other comorbid disorders. A deletion disrupting both DOCK4 and IMMP2L (7q31.1) was found to co-segregate with poor reading performance in a family with two ASD cases, and another DOCK4 exonic deletion co-segregated with RD in a distinct dyslexic family [25]. Canonical 16p11.2 microdeletions-usually implicated in mild cognitive impairment, general developmental delay, speech and language problems, and ASD-have been associated with CAS by independent studies [26,27]. The same microdeletion was hypothesized to act jointly with a 6q22.31 duplication in a subject with CAS and pervasive developmental disorder [28]. Prader-Willi/Angelman patients, presenting deletions/duplications of the 15q11.2 region, have been reported to frequently exhibit speech and language delays [29]. Similarly, subjects with 2p15-p16.1 microdeletion syndrome typically show cognitive, linguistic, and psychiatric disabilities. In this region, a de novo deletion encompassing BCL11A has been implicated in a mild phenotype characterized by apraxia, dysarthria, and expressive language delay [30].
Recently, Stefansson and colleagues [31] investigated the effect of several CNVs previously associated with SCZ or ASD (hereafter called "neuropsychiatric CNVs") on different cognitive traits in a large Icelandic sample (N~102,000). By comparing SCZ patients, neuropsychiatric CNV carriers, other CNV carriers, and general population controls, they found that neuropsychiatric CNV carriers performed at a level between SCZ patients and controls on several psychometric tests, suggesting an effect of these CNVs on general cognition. Some neuropsychiatric CNVs showed association with cognitive abilities: among these, 16p11.2del and 22q11.21dup were associated with category and letter fluency, while 15q11.2del was associated with a history of dyslexia and dyscalculia.
In the present study, we have further investigated the potential influence of CNVs on reading and language performance through a comprehensive set of analyses, including total genome-wide CNV burden testing and two complementary methods to screen the genome for individual CNVs that may affect these traits. We used a dataset that has been previously included in a SNPbased GWAS meta-analysis (GWASMA) of reading and language traits [11], composed of children recruited for school history of RD or ADHD, and their unaffected siblings.
Current CNV research in psychiatric genetics often relies on case/control dichotomous classifications and seldom detects perfect co-segregation between CNVs and disease status. When heritable quantitative traits are available that are strongly correlated with a dichotomous definition of a disorder-as in the case of reading/language traits-analyzing the effect of putative CNVs directly on the quantitative trait provides an effective alternative to the analysis of co-segregation between CNVs and the disorder. The former analysis is aimed at detecting variants with reduced penetrance and variable expressivity on traits of interest, while the latter one is aimed at detecting variants with full penetrance and expressivity. We used both approaches in our study.

Methods
The experimental workflow of the present study, described in this section, is summarized in Fig. 1. For simplicity, genotype and phenotype quality control (QC) are described below in single paragraphs.

Dataset
The dataset analyzed in the present work was collected in the Colorado Learning Disabilities Research Centre (CLDRC) study, an ongoing research project on the etiology of learning disabilities carried out in 27 school districts in Colorado, USA [32]. Briefly, pairs of twins were recruited for a school history of RD or ADHD in at least one of the twins; they were then administered a number of psychometric tests for several learning-related skills, along with their additional co-siblings, and DNA was collected for genetic studies. The Institutional Review Boards of the University of Nebraska Medical Center Fig. 1 Experimental workflow and dataset analyzed in the present study. (Single asterisk) As described in [11]. (Double asterisks) RD cases were defined as samples in the lowest 10 % of IBGdiscr score distribution. (Triple asterisks) Legend of CNV states: "CNV+" corresponds to copyN ≠ 2 (≠1 for X chromosome probes in males); "CNV−" corresponds to copyN = 2 (=1 for X chromosome probes in males). See "GWAS with CNV state" section for further details and of the University of Colorado at Boulder had approved the protocol, and written informed consent of the participants (or their parents) was obtained.
For MZ twin pairs, we selected one child per pair based on the maximum availability of reading-and language-related trait data or otherwise randomly. The sample of twins and siblings available for this study comprised 749 participants in total (mean age 11.7 years, age range [8][9][10][11][12][13][14][15][16][17][18][19], from 343 unrelated twinships/sibships. Of these, 266 of the twinships/sibships (a total of 585 participants) were originally recruited via a proband with a history of RD and 77 of the twinships/sibships (164 participants in total) were originally recruited via a proband with a history of ADHD. The two subsets are indicated hereafter as CLDRC-RD and CLDRC-ADHD.

Reading and language measures
The reading-and language-related traits assessed in CLDRC are reported in Table 1, and the relevant measures are described in detail elsewhere [11]. These traits had been previously age-adjusted according to normative data and further rank-normalized when a measure differed significantly from normality. Phenotypic outliers were removed from the dataset, along with subjects with full-scale IQ <70 (two participants in CLDRC-RD in total). This left 564 subjects in CLDRC-RD and 163 in CLDRC-ADHD. Then, samples underwent separate principal component analyses (PCAs) in CLDRC-RD and in CLDRC-ADHD for the computation of the first principal component scores within each dataset, as briefly described below (further details in [11]).

First principal component score
The first principal component (PC1) from all of the language-and reading-related traits available (Table 1) was derived in each dataset, through the SPSS® 20.0 Factor Analysis. Only linear components with Eigenvalue >1 were extracted, allowing for correlation among the components (oblique rotation) and excluding subjects with any missing measure. PC1 explained 64.5 % of the total variance in CLDRC-RD and 52 % in CLDRC-ADHD, while PC2 explained no more than 13 % of the total variance in both datasets. PC1 scores showed a broad pattern of loadings across the traits in both datasets ( Table 1). To obtain a measure of shared variance in reading and language skills independent of general cognitive abilities, we also regressed PC1 against performance IQ (which had not been included in PC1 computation), again separately within the two datasets, and used the residuals as IQ-adjusted PC1 scores (IQadjPC1).

IBG discriminant score
We also used an additional phenotypic trait, the IBG discriminant score (called IBGdiscr hereafter), a discriminant function empirically developed to diagnose RD in the context of the CLDRC study [33]. This score is a composite measure of word recognition, spelling, and reading comprehension tests (further details available in Additional file 1). For the purpose of the present study, we used IBGdiscr to select all the participants in the first and tenth decile of the score distribution (Additional file 1: Figure S1a), namely all the subjects with a standardized IBGdiscr <−1.4 (N = 67) and >2.2 (N = 69), as representative of poor and good reading performance, respectively. For simplicity, we indicate these subjects as "RD cases" and "controls" in the analyses where a dichotomous casecontrol classification was needed (see below). Pairwise trait correlations of the reading and language composite/component scores analyzed-computed as median Pearson's r coefficients over 100 repeat random samplings of one individual from each unrelated sibship-were high (r~0.83-0.98), both in CLDRC-RD and in CLDRC-ADHD (see Additional file 1: Table S1).

DNA array data: generation and quality control
The two subsets were treated as a single dataset in DNA data generation and QC, as previously described in our GWAS meta-analysis [11]. DNA was extracted from whole blood or buccal swab samples and prepared for genotyping using standard protocols. DNA array data were generated using Illumina® Human OmniExpress array (730 k SNPs), and data were processed using Illumina's GenomeStudio® software, following the manufacturer's guidelines. QC and CNV calling process followed procedures already used in previous CNV studies [23,34,35]. Samples with genotyping success rate <95 % were discarded in GenomeStudio, along with probes mapping as "0" (no position) and "Y" (Y chromosome) and probes with call frequency <95 %. Using functions in the software PLINK v1.07 [36], we filtered out samples which showed inconsistencies in genome-wide identity-by-descent sharing with their siblings and unrelated individuals, or sex mismatches, or call rates <98 %, as well as homozygosity outliers, as described elsewhere [11].
As a further QC step for this study, we ran a PCA on the log R ratio (LRR) intensity signals of the 723,002 probes passing QC, through the pca command (singular value decomposition method) in the pcaMethod R package [37], extracting the first 100 principal components. This allowed us to assess the absence of extreme batch effects among the different plates of the microarray and to detect and remove 14 LRR outliers (Additional file 1: Figure S1d), which left 713 subjects for subsequent analysis.

CNV calls
To detect CNVs, we applied PennCNV (version June 2011) [38] separately for autosomes and the X chromosome (704,855 and 18,147 SNPs, respectively), analyzing the two subsets jointly (N = 713). For this analysis, we built a custom population B allele frequency (PBF) file from our array intensity data through the compile_pfb.pl script in PennCNV, while default HMM parameters and GC model signal adjustment file were used. In order to obtain highly reliable CNV calls, we applied a series of filters to the CNV events initially called through the detect_cnv.pl script: only putative CNVs with a minimum confidence score of 10, covering at least 20 kb and 10 consecutive SNPs and showing limited overlap (<50 %) with Ig regions, pseudo-autosomal regions (PARs), centromeres, or other large genomic gaps were selected. Moreover, to ensure only high quality of samples, we filtered out samples showing an excessive number of CNV calls (>100 autosomal CNVs per sample) and LRR standard deviation >0.35. All the other parameters for sample filtering were set to default. Close CNVs were joined when the gap separating them was ≤20 % of the total length of the region that they covered. CNVs passing QC were finally annotated to RefSeq genes, within 50 kb beyond the 5′-and 3′-untranslated regions (UTRs), to include CNVs overlapping potential regulatory regions. Similarly, we annotated CNVs overlapping exons, and we identified a subset of "rare" CNVs, defined as CNV calls showing overlaps with less than five CNV events reported in the Database of Genomic Variants (DGV, July 2013 release, hg19). At the end of this process, we had 4490 final CNV calls for 702 samples, of which 3344 were annotated to genes, 2542 to exons, and 872 were rare.

Interpretation of CNVs and general statistics
The samples passing PennCNV QC (N = 702) were tested for correlation between their CNV burden-both in terms of total length and of total number of CNV events per sample-and our continuous traits of interest, namely PC1 and IQadjPC1, separately in the two subsets. This analysis was applied to 525 PC1/IQadjPC1 scores available in CLDRC-RD and to 155 scores available in CLDRC-ADHD. We repeated the same analysis on CNVs annotated to genes, on CNVs annotated to exons, and on rare CNVs (defined as above). Similarly, we analyzed correlations by length class, i.e., for short calls (<100 kb), medium calls (≥100 and ≤500 kb), and large calls (>500 kb) separately. To generate correlations unbiased by non-normality of CNV burden measures and by sample relatedness, rho correlation coefficients were calculated as the median rho over 100 repeat random samplings of one individual from each unrelated sibship, in R [39].
For the same classes of CNVs analyzed above (all, annotated and rare CNVs), we carried out a case-control burden analysis on 67 RD cases and 69 controls as defined above, through logistic regression of binary affection status versus CNV burden measures, over repeat random samplings of one individual from each unrelated sibship.
The final annotated CNVs were also assessed individually for co-segregation with the "RD case" status, focusing on large CNVs, on CNVs shared between two or more affected co-siblings, on CNVs affecting genes previously implicated in reading and language traits (see the "Background" section) or overlapping with other neuropsychiatric CNVs (previously tested by Stefansson et al. [31]).
Genome-wide CNV association analyses of continuous reading and language PC traits GWAS with CNV state CNV calls made in PennCNV were also used for a genome-wide association test between CNV state at each probe and PC1/IQadjPC1. The alternative CNV states at each probe were "CNV-negative" (CNV−) when a probe showed a diploid copy number, and "CNV-positive" (CNV+) when it showed an abnormal copy number. In other words, both deletions and duplications at each probe were considered as a single CNV+ state.
In order to have a bi-allelic coding for probes involved in this analysis, which indicated the presence or absence of a non-diploid state, proxy genotypes were created in the .ped input files. These were coded as "11" when the probes were not covered by any CNV (i.e., copy number =2) and as "12" when they fell within CNV calls (i.e., copy number ≠ 2). For chromosome X, CNV states per probe were coded as "11" for probes with copy number =1 and "12" for probes with copy number ≠1 in males, while they were coded following the rules of autosomal CNV state in females. Then, X chromosome probes were tested for association separately within males and females and later meta-analyzed. To correct for non-independence of siblings, permutations were run in QFAM analysis, as previously described in [11]. After QFAM analysis, the results of separate GWAS for CLDRC-RD and CLDRC-ADHD were meta-analyzed using the METAL software package, through the samplesize-based scheme [40]. Results were then interpreted in terms of consecutive probes showing significant associations (i.e., at least two consecutive probes with p < 0.005 at the genome-wide level and contiguous with two or more probes with p < 0.05), representing regions of overlap of two or more CNVs with potential effects on the continuous traits investigated.

GWAS with intensity data
As a complementary analysis, we tested for association of LRR and BAF (beta allele frequency) intensity data from our DNA array with PC1 and IQadjPC1, applying FamCNV 2.0 [41] (beta version available upon request to Dr. Mario Falchi).
In this analysis, we tested for association of 704,855 autosomal probes passing QC in CLDRC-RD (N = 525) and in CLDRC-ADHD (N = 155), using as covariates the first and second principal components computed in the PCA of LRR data (see the "DNA array data: generation and quality control" section). After running separate GWAS in the two subsets, the results were metaanalyzed as above, using rho correlation coefficients between LRR data and PC1/IQadjPC1 as beta values at each probe, indicative of the direction of association. Results were interpreted in terms of contiguous probes showing significant associations (i.e., pairs of consecutive probes with p < 0.001 and contiguous with two or more probes with p < 0.05), which were more likely to represent real CNV effects.

Pathway-based analysis of CNV calls
To test specific molecular networks for enrichment of potentially disrupting CNVs, we ran a pathway-based association analysis in INRICH v1.0, through the TARGET algorithm [42]. This tool tests groups of independent genomic intervals for enrichment of overlaps with predefined gene sets, through a permutation-based approach. We initially tested 306 CNVs called in 67 RD cases, and then we restricted the analysis to 84 rare CNVs in the same subset.
Gene boundaries in the tested gene sets were again defined as extending 50 kb beyond the 5′-and 3′-UTRs, while random genomic intervals simulated in the permutations of the test were extracted from a reduced set of 43,525 SNPs, namely all the probes encompassed by CNV calls. We considered testing CNV calls more suitable than testing associated genomic intervals as produced by GWAS analyses, since such intervals would need to be defined on an LD basis, which is clearly inappropriate for the analysis of CNVs.
Initially, we tested three candidate gene lists for enrichment, based on the gene sets of the Gene Ontology Database (http://www.geneontology.org/). These gene sets represented three distinct neurobiological hypotheses on the etiology of RD (see the "Background" section): axon guidance (including all the GO sets containing the term "axon guidance"), neuronal migration (including all the GO sets containing the term "neuron migration"), and sex hormone biology (including all the GO sets containing the terms "steroid," "androgen," "estrogen," "progesterone," and "testosterone"). Then, we extended the assessment to 1748 GO sets containing at least 10 genes, for exploratory purposes.

CNV calls General CNV burden statistics
After QC, there were 4490 final CNV calls in 702 samples, of which 3344 were annotated to genes within a 50-kb interval from the UTRs, 2542 were annotated within exonic borders, and 872 were rare. Samples passing QC showed a median number of 6 CNVs per sample (4 considering only CNVs annotated to genes) and a median total length of~640 kb covered by CNVs per sample (~479 kb considering only CNVs annotated to genes).
Correlation assessments between CNV burden measures (both CNV number and total length) and our continuous traits of interest-PC1 and IQadjPC1-did not reveal any significant correlation in the two CLDRC subsets, when considering all the CNVs passing QC (most significant correlation rho~−0.097, p = 0.4) or when considering only CNVs annotated to genes or to exons (most significant correlation rho~−0.036, p = 0.51). Similarly, burden analysis by length class (applied to all CNVs passing QC) did not show any significant correlation (most significant correlation rho~−0.17, p = 0.13, detected for short CNVs). We also tested correlation using burden statistics of rare CNVs called in our dataset, but again found no significant correlation with principal component (PC) scores (most significant correlation rho~−0.15, p = 0.2). Finally, case-control burden analysis comparing 67 RD cases and 69 controls did not reveal any significant association with CNV burden statistics (data not shown).

Large CNVs
Large CNVs are more likely to span multiple genes and to have deleterious effects than smaller CNVs. Among CNVs spanning more than 500 kb in RD cases ( Table 2, see Additional file 2: Table S2a for further details), a heterozygous duplication was detected in two affected siblings, but not in their unaffected co-sibling (with IBGdiscr = −0.62, PC1 = −0.47, and IQadjPC1 = 0.42). This large CNV spanned~1.2 Mb in the pericentromeric region 11q11-q12.1, covering several OR genes (encoding olfactory receptors) and TRIM genes (encoding tripartite motif proteins).

CNVs shared between RD cases
Among all the sibships analyzed, ten presented more than one RD case. In these sibships, we assessed annotated CNVs which were detected in two or more affected co-siblings but in no unaffected participant, regardless of their length. We investigated these variants as they were more likely to confer genetic susceptibility to reading impairment, compared to CNVs presented by single cases. A total of three CNV events fell in this category ( Table 3, see Additional file 2: Table S2b for further details), including the large duplication mentioned above and other two CNV events.
In a family presenting two affected siblings but no unaffected co-siblings, we detected two shared CNVs (both heterozygous duplications), which were not detected in any other participant in the study. One of them, spanning 27 kb on 6q24.2, covered the last nine exons (66-74) in the 3′ terminal region of the UTRN (utrophin) gene, including its 3′-UTR. The other one spanned~63 kb and overlapped exons 38-49 within DNAH14 (dynein axonemal heavy chain 14) on 1q42.12.

CNVs in genes previously associated with reading and language traits
We identified three putative CNVs annotated to candidate susceptibility genes that have been implicated in reading and language traits by more than one study (see [8][9][10] for reviews). These CNVs are reported in Additional file 2: Table S2c. Among the candidate genes assessed, DYX1C1 and CNTNAP2 were overlapped by one or more of these CNVs. However, only one of the three participants showing these variants was impaired and none of these CNVs co-segregated with poor reading-language performance (Additional file 2: Table S2c).
Similarly, we detected four CNV calls overlapping genes in which suggestive associations were observed in previous GWAS studies of both reading and language skills (reviewed in [10]). A list of these CNVs is reported in Additional file 2: Table S2d. Again, none of these variants co-segregated with RD status or with poor reading-language performance.

CNVs previously associated with weak reading/language performance and common neuropsychiatric CNVs
We checked our CNV calls for overlaps with genes and regions previously found to be disrupted by CNVs in subjects with weak reading/language performance (see the "Background" section). Additional file 2: Table S2e reports these CNVs, which were detected in NEGR1, IMMP2L, PCDH11X, CNTNAP2, CSNK1A1, MSRA (8p23.1), UBASH3B, CACNA2D1, VWA3B, CXorf22, CHRNA7 (15q13.1), and in several genes in the 22q11.21 region. As before, none of these variants showed cosegregation with RD or poor reading-language performance in the sibships.
Similarly, we assessed overlaps with common neuropsychiatric CNVs recently tested by Stefansson and colleagues [31] for effects on several cognitive traits in a large sample of the Icelandic population. Additional file 2: Table S2f reports a list of canonical CNVs detected in our study (i.e., largely or completely overlapping the abovementioned neuropsychiatric CNVs, reported in Table S1 in [31]). Among these CNV events, a 1.33-Mb heterozygous duplication in 16p13.11 was detected in an affected participant, who had the lowest phenotypic scores in his sibship and exhibited strong score discrepancies with his co-sibling (>3 for IBGdiscr and PC1 and >2.6 for IQadjPC1). However, a similar duplication was present in an unrelated participant showing normal performance, with PC1 and IQadjPC1 scores higher than those of his sibling (data not shown). None of the other carriers of such canonical neuropsychiatric When a CNV is annotated to more than five RefSeq genes, these are reported in a footnote (see below). All the CNVs partially overlapped or encompassed the genes to which they were annotated. All the positions are expressed in hg19 coordinates. The frequency column specifies how each CNV call showed substantial overlaps (≥50 %) with any CNV reported in the DGV database (July 2013, hg19): ≥5 overlaps for common CNVs, <5 overlaps for rare CNVs. An extended format of this table including further details is available in Additional file 2: Table S2a)  CNVs were RD cases, based on IBGdiscr performance (see Additional file 2: Table S2f). When two or more CNV calls were overlapping in these regions, the encompassed probes were assessed in PLINK QFAM analysis of CNV state to detect stretches of consecutive probes associated with PC1 and IQadjPC1 scores.

Family-based GWAS of principal component scores
Association test with CNV state at each probe GWAS meta-analysis testing association between CNV state at each probe and PC1/IQadjPC1 did not report any significant association surviving correction for multiple testing of two traits and 5173 autosomal probes meta-analyzed (α = 4.8 × 10 −6 ), representing all the probes encompassed by at least one putative CNV event in both our subsets. None of the 1900 X chromosome probes lay within CNV events detected in participants of both sexes and in both CLDRC subsets; therefore, none of these probes was meta-analyzed. The results of this analysis on an individual probe basis are reported in Additional file 3: Tables S3a, b. No genome-wide significant association was detected in either of the two subsets analyzed (data not shown).
These results were interpreted in terms of consecutive probes showing significant associations with PC1 and/ or IQadjPC1 (i.e., at least two consecutive probes with p < 0.005 and contiguous with two or more probes with p < 0.05), in regions of overlap of two or more CNVs in our dataset ( Table 4) Table 3 Annotated CNVs shared between two or more affected co-siblings in ten families presenting more than one RD case, which were not detected in any unaffected participant When a CNV is annotated to more than five RefSeq genes, these are reported in a footnote (see below). All the CNVs partially overlapped or encompassed the genes to which they were annotated. All the positions are expressed in hg19 coordinates. The frequency column specifies how each CNV call showed substantial overlaps (≥50 %) with any CNV reported in the DGV database (July 2013, hg19): ≥5 overlaps for common CNVs, <5 overlaps for rare CNVs. An extended format of this table including further details is available in Additional file 2:  All the regions of overlap of two or more CNVs, showing at least two consecutive probes with association p < 0.005 and two or more contiguous probes with association p < 0.05, are reported. The results of this meta-analysis on an individual probe basis are reported in detail in Additional file 3: Table S3a Figure S3c); and chr11:55,241,556-55,362,955 encompassed genes OR4C15 and OR4C16 (olfactory receptors 15 and 16, family 4, subfamily C, 11q11; Additional file 3: Figure S3d). Frequency of CNV+ state in these regions ranged between 0.4 and 3.4 %. We also checked the presence of nominally significant associations (i.e., at least two consecutive probes with p < 0.05) in the regions disrupted by CNVs in cases of reading, language, or more severe neuropsychiatric disorders (see the "Background" section). If CNV events in any of these regions had been called only in one of the subsets and therefore meta-analysis had not been run for the probes encompassed, we assessed directly the GWAS results in the relevant subset. Among the candidate CNVs assessed, ã 134-kb region (chr15:32,380,064-32,514,341) partially overlapping CHRNA7 (cholinergic nicotinic receptor alpha 7, 15q13.3; Fig. 2) showed a series of 25 consecutive probes associated with PC1 (p values~[0.026; 0.045], Additional file 3: Table S3e). These associations were detected in CLDRC-RD as no CNVs were called in the CLDRC-ADHD subset and were not significant in the IQadjPC1 GWAS (p values~[0.053; 0.085]). This region showed a frequency of CNV+ state of~1.8 % (see Additional file 2: Table S2e for relevant CNV calls) and a positive allelic trend between the CNV+ state and PC1/ IQadjPC1 (see Additional file 3: Table S3e).

Association test with probe intensity data
GWAS meta-analysis of PC1/IQadjPC1 scores with intensity data (FamCNV) did not reveal any genome-wide significant association surviving correction for multiple testing of 704,855 autosomal probes and two traits metaanalyzed (α = 3.6 × 10 −8 ). The results of this analysis on an individual probe basis are reported in Additional file 3: Tables S3c, d. No genome-wide significant association was detected in the two subsets analyzed (data not shown).
Also for this analysis, we were interested in detecting two or more consecutive probes showing significant association. For this purpose, we filtered our association results to detect all the pairs of consecutive probes with p < 0.001 and contiguous with two or more probes with p < 0.05. Such criteria were set to reduce the probability to observe spurious associations due to noise intrinsic to raw intensity data. Although we did not find any region meeting these criteria in the results of the meta-analysis, we found such a region in the GWAS in CLDRC-RD. This~58-kb region (chr19:20,657,781-20,715,228) consisted of eight consecutive SNPs on 19p12, associated with both PC1 (top consecutive hits rs2021399 and rs2545918, p = 6 × 10− 4 and 4 × 10 −4 , respectively) and IQadjPC1 (p = 2 × 10 −4 and 9 × 10 −4 ; see Additional file 3: Table S3f ). This region lay within a~80-kb deletion that was frequent in our dataset (called in 11.3 % of CLDRC participants, for a total of 80 CNV calls, reported in Additional file 2: Table S2g) and~6-kb downstream of ZNF737 (zinc finger protein 737, Fig. 3). The same region of overlap also showed nominally significant association (p values~[0.009; 0.022]) in the PLINK QFAM analysis with CNV state, in a wider interval (chr19:20,626,179-20,715,228, see Fig. 3 and Additional file 3: Table S3g). Both in the association test with SNP intensity data and in the association test with CNV state, this deletion showed a positive effect on PC1/IQadjPC1.

Pathway-based analysis of CNV calls
Pathway association analysis of 306 CNV calls presented by 67 RD cases did not reveal any significant enrichment, neither in the analysis of three composite candidate pathways representing neuronal migration, axonal guidance, and steroids-related processes (Additional file 3: Table S3h), nor in an exploratory analysis at the pathway-wide level (data not shown). Similarly, we did not observe any significant enrichment in the analysis of 84 rare CNVs detected in RD cases (see Additional file 3: Table S3i for results on candidate pathways).  Table S2e for details)

Discussion
Our research on potential effects of CNVs on reading and language is novel for two main reasons: First, we investigated the effects of CNVs on a continuous index of reading and language performance, in datasets enriched for the lower tail of the population distribution. Although a similar approach was used in a recent study by Stefansson and colleagues [31], who investigated the effect of candidate neuropsychiatric CNVs on cognitive traits in a large sample of the Icelandic population, their study analyzed a broad spectrum of cognitive abilities and included general population controls. It was not aimed at capturing shared variation derived from a detailed battery of reading and language measures in a selected population, as was our study.
Second, to detect effects of CNVs on continuous reading and language performance, we used two complementary approaches: one aimed at detecting copy numberdependent effects in a "dosage-dependent" additive model and one aimed at detecting associations with a "CNVpositive" state irrespective of the non-diploid copy number. These two analyses were performed in order to identify potential CNVs with reduced penetrance and variable expressivity and were in turn complementary to our analysis of co-segregation between CNVs and RD status, which was aimed at detecting variants with high penetrance and expressivity.
In our dataset of subjects with school histories of RD/ ADHD and their siblings, we did not identify a significant correlation between CNV genomic burden-both in terms of total length and of total number of CNVs per subject-and PC scores representing reading-language performance. Similarly, our case-control burden analysis did not reveal any significant contribution of CNVs to RD. This is in line with a previous study which detected no significant difference in the genomic burden of large rare CNVs between RD cases and controls [20]. However, our result is in partial contrast with a recent study which reported an increased CNV burden in SLI cases compared to controls [23]. On balance, it appears likely that CNVs have a relatively limited role in affecting reading-related performance at the population level, whereas they are known to play a more important role in severe neuropsychiatric disorders such as autism, schizophrenia, and ID [15,16,20] and may also affect severe cases of SLI. Further analyses in independent datasets will be needed to clarify the extent to which CNVs may affect cognitive domains that are shared between reading and language.
In this study, we detected a CNV which co-segregated with the dyslexic status in a family with two RD casesincluding the most severely impaired subject in our dataset-and one unaffected sibling. This large CNV event spanned~1.2 Mb in the pericentromeric region 11q11-q12.1, covering several OR (olfactory receptors) and TRIM (tripartite motif protein) genes. While TRIM proteins are not well characterized, the role of olfactory receptors in triggering odor perception signals in sensory neurons is well known. Interestingly, olfactory bulbs dysgenesis/agenesis has been previously implicated in ASD [43] and reduced volumes have been reported in schizophrenic patients [44]. However, the partial overlap of this CNV with a centromeric region and the relaxed selection at the OR loci [45] suggest caution in the biological interpretation of this variant.
Two other CNVs shared between cases were detected, overlapping potential susceptibility genes. These two heterozygous duplications were observed in a pair of affected siblings but were not detected in any unaffected participant. One of them overlapped 9 exons in the 3′ terminal region of the UTRN gene (utrophin, or dystrophin-related protein 1, 6q24.2) and the other one overlapped 12 exons within DNAH14 (dynein axonemal heavy chain 14, 1q42.12). Utrophin is a large skeletal muscle protein-also expressed in the CNS-contributing to postsynaptic membrane maintenance and to clustering of acetylcholine receptors in the neuromuscular synapses and possibly playing a role in anchoring the cytoskeleton to the plasma membrane. However, as the partial duplication of UTRN overlaps its 3′-UTR region, it is possible that this has no effect on the mRNA produced. This possibility may be addressed through future gene functional analysis. Dyneins are microtubule-associated motor proteins with a key role in cilia-mediated cell motility.  Table S2g) Independent studies have reported evidence of involvement in cilia-related processes for two RD candidate genes, DYX1C1 [46,47] and DCDC2 [48]. This led to the hypothesis that dyslexia may sometimes be a form of ciliopathy [46], involving abnormal neuronal development and migration [48].
Pathway-based enrichment testing of CNV calls detected in RD cases revealed no significant associations for three candidate gene sets representing mainstream hypotheses on the etiology of RD, namely axon guidance, neuron migration, and steroids-related processes. This is in line with the pathway enrichment test based on SNP associations in an earlier GWASMA study that we carried out [11].
The two complementary strategies for genome-wide association testing between CNVs and our principal component reading-language scores revealed partly different but partly consistent results. The first of these analyses-which made use of CNV calls and tested association with CNV state at each probe-was aimed at detecting associations in regions of overlap of CNV calls, irrespective of the abnormal copy number state. The second analysis, in FamCNV, assessed copy number (or allele dosage)-dependent associations between DNA array intensity data and PC1/IQadjPC1. These are practical strategies to detect different kinds of effects of CNVs on continuous traits, both of which have precedence in the literature: in recent studies, copy number-dependent (dosage) effects were reported for continuous traits such as body mass index [49] and structural brain measures [31]; while either deletions or reciprocal duplications of specific regions have been reported to result in similar clinical and phenotypic manifestations, as in the case of ASD, language/developmental delays, and other psychiatric disorders [15,16]. Both analyses were run probeby-probe, but results were then interpreted in terms of consecutive probes showing significant associations, which was appropriate for an analysis of CNVs.
Although no associations reached genome-wide significance in PLINK QFAM meta-analysis, some of the top associated regions involve plausible candidate genes. Ã 11-kb CNV overlap, associated with IQadjPC1, lay in an intronic region within CNTN4 (contactin 4, 3p26.3; Additional file 3: Figure S3a). This overlap was shared by three heterozygous duplications and one heterozygous deletion, which all showed concordant positive effects on PC scores. Contactins are Ig cell adhesion molecules with a fundamental role in neuronal development and plasticity. CNVs and structural rearrangements disrupting CNTN4 have been implicated in severe neurodevelopmental disorders such as ASD [50,51] and DD [52]. Interestingly, the associated region detected in the present study overlaps with CNVs reported in ASD cases in two previous studies [50,51], and contactin 4 is widely expressed in the brain, particularly in the cerebellum, thalamus, amygdala, and cerebral cortex [50]. However, this association was weaker and not significant with PC1.
Another intronic CNV overlap region of~21 kb, associated with both PC1 and IQadjPC1, was found within CTNNA3 (catenin alpha 3, 10q21.3; Additional file 3: Figure S3c). This region resulted from the overlap of three deletions and showed a negative effect on PC scores. α-catenins have a crucial role in cell adhesion, and CTNNA3 has been implicated in ASD etiology through both CNV studies [53,54] and GWAS studies [55,56]. Our associated region partially overlaps an inherited compound heterozygous deletion encompassing exon 11, found in an ASD patient [53]. Expression of CTNNA3 in mouse hippocampus and cortex at postnatal day 0 suggests a specific neuronal role at very early developmental stages [53].
We also assessed CNV overlaps in regions previously reported to be disrupted by CNVs in reading, language, or more severe neuropsychiatric disorders. Among these, a~134-kb region of overlap between nine heterozygous duplications and one heterozygous deletion, encompassing several exons in the 3′ region of CHRNA7 (cholinergic nicotinic receptor alpha 7, 15q13.3; Fig. 2), presented nominally significant association with PC1 in the CLDRC-RD subset (while no CNV calls were detected in CLDRC-ADHD). The association only approached significance after IQ adjustment and the CNV state exerted a positive effect on PC1/IQadjPC1, with both deletion and duplications showing the same direction of effect. Nicotinic cholinergic receptors are ligand-gated ion channels that mediate fast signal transmission at synapses and are ubiquitously expressed in the CNS. Several studies have suggested a possible involvement of CHRNA7 in language skills. A CNV encompassing this gene was suggested to contribute to the disruption of synaptic pathways in a patient with ID and language impairment [57]. A genome-wide CNV screen also reported CHRNA7 among the genes disrupted in a group of unrelated SLI cases, as well as a significant over-representation of the GO category acetylcholine binding in a pathway-based analysis of these CNVs [23]. A recent longitudinal study of children with language difficulties implicated a deletion at 15q13.1-13.3 (BP3-BP5) in the etiology of SLI, and the authors hypothesized a role of CHRNA7 in the phenotypic effects associated to this region [24]. The 15q13.3 region is also a hotspot of neuropsychiatric CNVs, which have been implicated in several disorders including SCZ, ASD, ADHD, and epilepsy [15,16]. CNVs encompassing this gene have also been tested for effects on general cognitive abilities, including school history of mathematical and reading difficulties, but no associations were reported [31].