Rare Copy Number Variations in Adults with Tetralogy of Fallot Implicate Novel Risk Gene Pathways

Structural genetic changes, especially copy number variants (CNVs), represent a major source of genetic variation contributing to human disease. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease, but to date little is known about the role of CNVs in the etiology of TOF. Using high-resolution genome-wide microarrays and stringent calling methods, we investigated rare CNVs in a prospectively recruited cohort of 433 unrelated adults with TOF and/or pulmonary atresia at a single centre. We excluded those with recognized syndromes, including 22q11.2 deletion syndrome. We identified candidate genes for TOF based on converging evidence between rare CNVs that overlapped the same gene in unrelated individuals and from pathway analyses comparing rare CNVs in TOF cases to those in epidemiologic controls. Even after excluding the 53 (10.7%) subjects with 22q11.2 deletions, we found that adults with TOF had a greater burden of large rare genic CNVs compared to controls (8.82% vs. 4.33%, p = 0.0117). Six loci showed evidence for recurrence in TOF or related congenital heart disease, including typical 1q21.1 duplications in four (1.18%) of 340 Caucasian probands. The rare CNVs implicated novel candidate genes of interest for TOF, including PLXNA2, a gene involved in semaphorin signaling. Independent pathway analyses highlighted developmental processes as potential contributors to the pathogenesis of TOF. These results indicate that individually rare CNVs are collectively significant contributors to the genetic burden of TOF. Further, the data provide new evidence for dosage sensitive genes in PLXNA2-semaphorin signaling and related developmental processes in human cardiovascular development, consistent with previous animal models.


Introduction
Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital cardiac disease in humans. With surgical advances and increased longevity, attention has shifted from immediate outcomes to understanding causation. However, for most patients with TOF, the genetic basis for the disease remains unknown. Recently, there has been a focus on unbalanced structural genomic changes, or copy number variants (CNVs), and disease [1]. Copy number variation contributes to the genetic heterogeneity of many complex human diseases [2,3]. Investigation of CNVs that overlap genes has led to the discovery of novel etiologies and disease pathways, especially for developmental disorders [1,4,5,6]. Our current understanding of the role of CNVs in the etiology of TOF, however, is limited. Early reports of CNVs in subjects with various types of congenital cardiac conditions, using low resolution methods, suggested that CNVs may be important [7][8][9][10][11], but there is just one report of genome-wide CNVs in 111 TOF patients using a high resolution microarray [12]. We used a high resolution genome-wide microarray and proven methods to: a) investigate the burden of rare CNVs in TOF compared to controls, b) identify putative candidate genes associated with rare and recurrent CNVs and c) assess, using a pathway analysis, whether the exonic CNVs found in TOF could identify functional gene sets relevant to cardiac development.

Results
Of the 495 unrelated adults with TOF recruited, 53 (10.7%; including 49 of European ancestry) had 22q11.2 deletions associated with 22q11.2 deletion syndrome [61], four had chromosomal anomalies detectable on karyotype (two with XXY, one with XXX and one with a 16 Mb 18q22 deletion) and five had previously diagnosed genetic conditions for which clinical genetic testing is in progress (three with Holt-Oram syndrome, one with CHARGE syndrome and one with VACTERL association). The remaining 433 adults [239 (55.2%) male] formed the CNV discovery sample [mean age 32.56 (SD 12.29) years]; 45 (10.39%) had pulmonary atresia and 57 (13.16%) were in the syndromic group.
Using a strict CNV analysis strategy [5,6,13], we detected 63 CNVs on average per genome in TOF cases with a median size of 18,020 (range 397-5,997,249) bp, similar to results for the controls (Tables 1, 2 and 3 in Supporting Information S1). To minimize false positives, we focused on rare CNVs using a conservative definition (<0.1% in population-based controls), and employed identical methods for both cases and the independent Ontario Population Genomics Platform (OPGP) controls used for case-control analyses (see Methods). The main quantitative analyses involved only those subjects of European ancestry. We compared rare CNVs in the 340 TOF cases and 416 OPGP controls of European ancestry. To assess the experimental reproducibility of rare CNVs after in silico detection, we tested 68 CNVs across different size ranges using quantitative PCR (qPCR). We observed a high true positive validation rate of 65/68 (95.6%), consistent with our previous studies [5,6,13].

Rare CNV burden in TOF
We first compared the CNV burden of large (>500 kb) rare CNVs in the TOF cases and the OPGP controls. Consistent with our hypothesis, a significantly greater proportion of cases harbored large rare CNVs compared to controls (OR 1.89, 95% CI 1.06-3.35, p = 0.0278) (Table 1). This was most notable for differences in large gains that overlapped exons (OR 2.54, 95% CI 1.17-5.50, p = 0.0148). However, if the 49 individuals with TOF and 22q11.2 deletions of European ancestry had been included, the odds ratio for large rare exonic losses would also have been significantly higher compared to controls (OR 10.87, 95% CI 4.80-24.08, p<0.0001). In contrast, the overall quantitative burden of rare CNVs of any size was similar between the TOF group and OPGP controls; most TOF and OPGP control subjects had one or more rare CNVs (Table 1). When CNV burden for individuals was defined as having two or more rare CNVs, there was no significant difference between cases and controls (data not shown).
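The case-control comparison above can be illustrated with a minimal sketch of an odds ratio, a Wald 95% confidence interval and a two-sided Fisher exact test on a 2x2 table. The statistical test actually used is not stated in this section, so both methods are stand-ins. The counts (30 of 340 cases, 18 of 416 controls) are an assumption, back-calculated from the 8.82% and 4.33% reported in the abstract for large rare genic CNVs; they are not the Table 1 counts behind the OR of 1.89.

```python
from math import comb, exp, log, sqrt


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2x2 table.

    a, b: cases with / without the CNV class
    c, d: controls with / without the CNV class
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value: sum of all table probabilities
    (with fixed margins) no larger than that of the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def pmf(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = pmf(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    return sum(
        p for p in (pmf(x) for x in range(x_min, x_max + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )


# Assumed counts (see lead-in): 30/340 cases, 18/416 controls.
cases_with, cases_without = 30, 340 - 30
controls_with, controls_without = 18, 416 - 18
or_, ci_low, ci_high = odds_ratio_wald_ci(
    cases_with, cases_without, controls_with, controls_without
)
p_value = fisher_exact_two_sided(
    cases_with, cases_without, controls_with, controls_without
)
```

A Wald interval is only one of several ways to obtain a confidence interval for an odds ratio; exact or mid-p intervals may give slightly different bounds for small cell counts.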

Rare CNV burden in TOF subgroups
For those subjects with TOF, large rare CNVs were enriched in the syndromic subgroup; however, this difference reached statistical significance only for subjects with large rare exonic losses (OR 9.53, 95% CI 2.89-31.41, p = 0.0004) (Table 1). These results would have been even more significant if individuals with 22q11.2 deletions had been included (data not shown). When individuals with one or more rare exonic loss CNVs of any size were considered, results were still significant but with a smaller OR (OR 2.69, 95% CI 1.35-4.60, p = 0.0013). A further TOF subgroup analysis comparing those with and without pulmonary atresia showed no significant enrichment of individuals with rare exonic loss CNVs in those with pulmonary atresia [48% (16 of 33) vs. 37% (115 of 307), p = 0.2162].

Large rare CNVs in TOF
Table 2 shows the 47 large (>500 kb) rare CNVs found in 43 of the 433 adults with TOF in the discovery sample. Most (39/47) were very rare, i.e., not found in any of 2,773 controls (2,357 population controls or 416 OPGP controls), and all but three overlapped genes. Several of these loci showed evidence for recurrence in TOF. The most compelling were 1q21.1 duplications (Figure 2 in Supporting Information S1) (OMIM #612475) identified in four (1.18%) of 340 subjects of European ancestry. None met our syndromic criteria; however, detailed examination of the phenotype revealed macrocephaly in two and tall stature in another of these subjects.
There were two other subjects in the non-syndromic subgroup with genomic disorders at loci previously associated with congenital cardiac disease: one proband with a previously undetected 22q11.21 duplication (OMIM #608363) [14] and another with a typical 16p11.2 duplication (OMIM #611913) [15].
Amongst other large rare CNVs of note, one proband with syndromic features had a novel tandem duplication-deletion in the 18q22.3-q23 region that was transmitted to her daughter. Both had TOF, learning difficulties, short stature, obesity, and thyroid disease. This complex CNV overlapped the region involved in the 18q22 deletion syndrome, e.g., the 16 Mb deletion of a subject excluded from our TOF cohort ( Figure 3 in Supporting Information S1). There are three candidate genes in the distal end of the 3.5 Mb 18q23 deletion that may have an impact on cardiac development and/or are implicated by a relevant family of genes [16,17] (Table 2, Figure 3 in Supporting Information S1): NFATC1, PARD6G and SALL3 [18]. Another proband with syndromic features had a 1q41 deletion that may overlap the region of a translocation reported in a patient with TOF [19] and possibly the 1q41 deletion region (OMIM #612530) associated with holoprosencephaly 10.

Smaller very rare CNVs identifying genes of interest for TOF
The first section of Table 3 shows the smaller (<500 kb) very rare CNVs in the TOF sample that implicate specific candidate genes of interest at loci associated with TOF, including: GJA5 in the 1q21.1 duplication region (Figure 2 in Supporting Information S1) [20], CDH19 at 18q22.1 (Figure 3 in Supporting Information S1), NBEA at 13q13.3 [21] and ANGPT2 at 8p23.1 [22]. Other candidate genes highlighted through overlap with results from other studies include: CECR5 in the cat eye syndrome region [23], RAF1 involved in Noonan syndrome [12] and PPM1K [12].

Author Summary
Congenital heart disease affects nearly 1% of all live births. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease. This condition is associated with hemizygous deletions of chromosome 22q11.2 and chromosomal trisomies, but little else is known about the genetic heterogeneity of this complex disease. We used high-resolution microarrays and stringent methods to study structural (copy number) variants in a systematically phenotyped cohort of unrelated adults with TOF. We found that individually rare genic copy number variants (CNVs) were collectively significant contributors to the genetic burden in TOF. Among CNVs that implicated candidate genes of interest were loss CNVs overlapping the PLXNA2 gene that codes for plexin A2. This is the first study to show a role for this semaphorin receptor in human congenital heart disease, consistent with a Plxna2 mouse knockout phenotype. Pathway analyses comparing rare exonic loss CNVs in the TOF sample to controls implicated other novel gene sets, suggesting new pathogenetic mechanisms.

Table 3 also shows novel very rare (not found in 2,773 controls) CNVs overlapping genes with evidence for cardiovascular involvement. Two unrelated probands had 1q32.2 loss CNVs overlapping the PLXNA2 gene (Figure 1), which were confirmed by qPCR and sequencing across the junction breakpoints. Plexins play an important role in cardiac development, including cardiac neural crest cell migration and outflow tract morphogenesis [24]. We therefore resequenced PLXNA2 exons and splice sites in a subset (n = 192) of the TOF cases of European ancestry. This yielded nine missense variants but no additional nonsense or frame-shift mutations that would lead to haploinsufficiency of the gene (Table 6 in Supporting Information S1). No point mutations were detected in the two individuals with PLXNA2 deletions, and in silico inspection of the intronic CNV revealed no conclusive evidence of regulatory region disruption.
Two other loss CNVs involved adjacent semaphorin genes at 7q21.11 with previous evidence for structural cardiac phenotypes (Table 3). One overlapped three exons of the SEMA3D gene coding for semaphorin 3D and the other overlapped the first intron of the SEMA3E gene, previously associated with CHARGE syndrome [25].
We also identified a group of four subjects with novel small rare CNVs containing genes associated with ciliary dysmotility: DNAH11 (n = 2), BBS9 (n = 1) and SNX8 (n = 1). Primary ciliary dyskinesia has several genetic causes, including mutations in DNAH11, a gene coding for a dynein heavy chain component of the axoneme, the inner cytoskeletal core of cilia (OMIM #6033). Similarly, BBS9 is one of 14 genes known to be responsible for Bardet-Biedl syndrome, a multisystem disorder [26]. Loss of ciliary function results in a multisystem disease, and loss of function during embryogenesis can lead to congenital cardiac lesions, typically abnormalities of cardiac situs (heterotaxy), and less commonly TOF [27].
FGF10 was another plausible candidate gene for human congenital cardiac disease [28][29][30] implicated by a very rare exonic loss CNV. FGF10 codes for fibroblast growth factor 10, a protein with dosage sensitive expression in several aspects of early murine cardiovascular development [28,29]. Notably, the loss CNV involved the entire gene, thus would encompass an evolutionarily conserved cis-regulatory module in intron 1 of the FGF10 gene recently reported to be functional during human cardiac development [30]. The proband with this CNV did not meet criteria for lacrimoauriculodentodigital (LADD) syndrome (OMIM #149730) or autosomal dominant aplasia of lacrimal and salivary glands (ALSG; OMIM #180920), conditions associated with point mutations in FGF10 coding regions that may have different expression from that of an intronic point mutation [30] or a structural variant alone.

Pathway analysis
In pathway analyses testing of case and control subjects with rare CNVs that overlapped 6 or fewer genes, only exonic losses led to significant results for the gene-set test (permutation FDR ≤ 27.5%, nominal p-value ≤ 0.05) (Table 7 in Supporting Information S1), in line with previous findings for autism [5]. Nineteen gene-sets passed the significance thresholds (Table 8 in Supporting Information S1) and were selected for visualization (Figure 2). The gene-sets identified belonged to five overlapping functional clusters, representing both those expected and more novel (Figure 2): vasculature development (p = 0.0351), chromosome organization (p = 0.0224), cell motility (p = 0.0224), chemotaxis (p = 0.0440) and neuron projection and development (p = 0.0440). We also selected the three top-scoring previously reported TOF disease genes (GATA4, NKX2-5 and TBX5) and identified as potential disease candidates their high-confidence functional neighbors affected by a rare exonic CNV in two or more cases and none in controls (Table 10 and Table 11 in Supporting Information S1). PLXNA2-semaphorins was the only gene-set found exclusively in our systematic CNV review.
Integrating the results of pathway analysis and systematic CNV review, we identified potentially important convergences (Figure 2). GJA5 was found in all disease gene neighborhoods. ANGPT2 and FGF10 were found in vasculature development, cell motility, chemotaxis and in association with at least one of the disease genes. PLXNA2 was found in cell motility, chemotaxis and neuron projection and development. In contrast, HDAC9 was a novel gene identified by gene-set association (vasculature development and other clusters) and disease gene neighbor analysis (NKX2-5), but not in our systematic CNV review. Figure 4 in Supporting Information S1 presents results of a manual review of further lines of evidence to reconstruct putative regulatory relations between the candidate genes in a potential disease pathway.
Table notes: Table 4 in Supporting Information S1. a, Figure 2 in Supporting Information S1. b, Table 3. c, Neighbor of a top disease gene (GATA4, NKX2-5, TBX5), as identified in the pathway analysis (Table 11 in Supporting Information S1).

Discussion
Copy number changes appear to be important genetic variants contributing to the etiology of TOF, with rare exonic losses occurring more frequently in patients with TOF than in controls. Many CNVs associated with TOF appear to disrupt gene pathways that control cell migration and vasculature development, both potentially important in cardiac development [31]. Notably, several plausible candidate genes for TOF were implicated in humans for the first time, including the PLXNA2 gene and related pathways. Our findings suggest that individually rare structural genomic changes are important contributors to the collective genetic burden of TOF.
Based on recommendations from the International Standard Cytogenomic Array (ISCA) Consortium, it is now suggested that chromosomal microarrays be used as the first-tier diagnostic test for patients with multiple congenital anomalies and/or unexplained developmental delay [32]. This recommendation is based on the higher diagnostic yield of genetic testing, specifically as it relates to the high sensitivity of detecting submicroscopic deletions and duplications. Many patients seen in adult congenital cardiac clinics, including those with TOF, will meet these criteria [33]. Notably, our data suggest that clinical screening for syndromic features will likely be insufficient to identify patients with large, pathogenic gains. In contrast, large rare losses may more often be associated with complex phenotypes [1]. In the next decade, many more CNVs associated with congenital heart disease will likely be discovered. The results of the current study will contribute to the strategies used to assess the pathogenicity of a CNV for TOF.
After 22q11.2 deletions, the most common large rare CNV in our TOF cohort was the recurrent 1q21.1 duplication. The 1.18% prevalence of the 1q21.1 duplication is consistent with results using a targeted assay in two previous studies of TOF [12,20] and with early reports of CNVs at the 1q21.1 locus where variable expression included TOF and neuropsychiatric conditions [34]. Congenital cardiac defects that have been reported to be associated with 1q21.1 duplications include TOF [12,20], ventricular septal defect [35], univentricular heart [35] and unspecified complex congenital cardiac disease [36]. Results of the current study, including pathway results and a very rare CNV overlapping GJA5 in this 1q21.1 CNV region (Figure 2 in Supporting Information S1, Table 3), add to previous studies that implicated GJA5 as a promising candidate gene for TOF [12,20]. GJA5 codes for connexin40, a gap junction protein in a protein family known to be important in cardiac development and shown to be associated with TOF in mice [37]. Point mutations in GJA5 have also been reported in patients with arrhythmias [38,39]. This is the first study to report 1q32.2 deletions at the PLXNA2 locus in patients with congenital heart disease. The PLXNA2 gene codes for a transmembrane protein, plexin A2 [40]. Plexin A2 is a receptor for semaphorin 3C, which acts as a guidance molecule and is necessary for neural crest influx and endothelial cell function during outflow tract septation [41][42][43]. In PLXNA2 knockout mice, congenital cardiac defects, including TOF, have been described [43].
Interactions with other plexins and transcription factors that control neural crest cell migration may also be important in the development of congenital cardiac lesions [44][45][46][47].
Table legend (fragment): March 2006); CNV size, in base pairs; CN, type of copy number aberration; Exonic, CNV overlaps exon of candidate gene (N); Confirmed, by qPCR (N) or not done (ND); CV involvement, known cardiovascular system involvement (N; not necessarily in human); Structural CV phenotype, known structural cardiovascular system phenotype associated with mutation (N; not necessarily in human); References derived from systematic searches of human (e.g., Online Mendelian Inheritance in Man; www.omim.org/) and model organism (e.g., Mouse Genome Informatics; http://www.informatics.jax.org/) databases are presented in Table 5 in Supporting Information S1. a, Novel and previously proposed candidate genes for TOF identified because of overlap with two or more CNVs in unrelated subjects (at least one Caucasian) with TOF, where the CNVs were not observed in 2,773 controls (see text). We have also shown selected candidate genes overlapped by very rare singleton CNVs in our cohort, including all those that overlapped rare CNVs reported by Greenway et al. [12]. b, Figure 2 in Supporting Information S1. c, Greenway et al. [12]. d, Figure 3 in Supporting Information S1.
There are multiple semaphorin-plexin pathways. We identified two subjects with loss CNVs involving semaphorin genes: a 7q21.11 deletion overlapping three exons of semaphorin 3D (SEMA3D) and a 7q21.11 deletion intronic to semaphorin 3E (SEMA3E) ( Table 3). Semaphorin 3D has been shown to be expressed in the cardiac cushions of chick heart and ventricular trabeculae [48]. Semaphorin 3E is involved in modulating the NOTCH signaling pathway via a VEGF feedback mechanism [49]. Although most commonly due to mutations involving the chromodomain helicase DNA-binding protein-7, CHARGE syndrome can be caused by mutations in the SEMA3E gene [50].
These CNV-related results direct attention to novel genes potentially involved in cardiac development in humans, and extend data from previous animal and human studies. For example, the migration of cardiac neural crest cells into the outflow tract is a process orchestrated, in part, by PLXNA2 signaling [51]. PLXNA2-semaphorin signaling is also implicated in guidance of both blood vessels and nerves [31]. Placed in this context, gene-set clusters labeled "Neuron projection and development" are compelling candidates for importance in cardiac development. Our findings implicating ciliary genes are also consistent with involvement of processes such as migration of cardiac neural crest into the outflow tract and parallel guidance of blood vessels and nerves in development. A further novel finding, from the pathway analysis, implicated HDAC9, a gene previously linked to muscle development and cardiac hypertrophy in mouse and human [52][53][54].
There is a known association between ciliopathies and congenital heart disease. In the current study, we found four cases with rare CNVs overlapping genes responsible for ciliary motility disorders: primary ciliary dyskinesia, Bardet-Biedl syndrome and 7p22 deletion syndrome. Two subjects had CNVs overlapping the DNAH11 gene, one of the genes responsible for primary ciliary dyskinesia. The cardiac lesion in this condition is believed to be due to an abnormality of nodal ciliary motility during development. In addition to abnormalities of cardiac situs (left and right sided heterotaxy), other congenital lesions including TOF have been reported in humans [55,56] and mice [57]. Another individual with TOF had a very rare loss CNV that overlaps exons 10-21 of BBS9. Although the exact function of the BBS9 gene has not yet been determined, mutations in this gene are known to cause Bardet-Biedl syndrome, another classic ciliopathy that can affect multiple systems and has a highly variable phenotype [26]. Deletion of exons 5-20 in the BBS9 gene was recently reported to have a severe phenotype but this did not include a congenital cardiac defect [58]. The SNX8 gene, coding for sorting nexin 8, lies in the region of overlap of two previously reported 7p22.2 deletions with associated cardiac malformations, including one with TOF [59]. Although the role of SNX8 in development is unknown, sorting nexins have recently been implicated in ciliogenesis [60]. Because of the variable expressivity of the phenotype in ciliopathies, it is possible that patients with congenital cardiac lesions, including TOF, who have these conditions may be undiagnosed [56].
Most of these CNV findings remain in the research realm, with functional studies essential to determine the true role of such variants and candidate genes in cardiac maldevelopment. Even the possible role for 1q21.1 duplications in the genetic burden of TOF, detectable by clinical genome-wide and targeted microarrays, requires more data to delineate the associated breadth and penetrance of cardiac and extracardiac expression [34]. There remains a large cohort of adults with TOF of, as yet, unknown etiology. The candidate genes and pathways identified in our study should help to inform subsequent genetic, including sequencing-based, studies of TOF. Most other variants identified will be rarer and of as yet uncertain clinical significance. Nevertheless, these results contribute to our understanding of pathogenesis in TOF, a crucial step towards future clinical applicability of genetic investigations.

Advantages and limitations
This is the first study of genome-wide CNVs in TOF to use a well-characterized cohort of adult patients, stringent molecular methods, and multiple converging analyses. We have identified novel candidate genes for TOF, in addition to providing replication of previous findings, including some from a smaller genome-wide study of CNVs in TOF [12]. There are, however, limitations to this novel study. The conservative laboratory and CNV analytic methods used, including the restricted focus on rare CNVs at the <0.1% level, may have resulted in missing some rare variants of interest. However, the fact that we used the same approach and adjudication control set to determine rarity meant that our a priori decision to minimize false positives, at the expense of such false negatives, would be expected to affect both cases and controls equally. Although our results overlapped certain previously described CNVs, further replication studies will be important to help define the significance and relative prevalence of the novel rare CNVs identified in this study. Large, multicentre studies may be useful, provided that comparable phenotyping and stringent quality control methods, as highlighted in this study, are maintained [2]. Meta-analyses could clarify if the lack of evidence for two or more rare CNVs per subject, as previously found for 22q11.2 deletions [61], is due to insufficient power. Family studies are also needed to delineate inherited or de novo status and segregation patterns of CNVs. These data will be essential to determine the true penetrance and variable expression of individual CNVs. Examining CNVs in patients with other forms of conotruncal defects or other forms of congenital heart disease may also be informative, and could reveal a genetically-related spectrum of clinically-distinct cardiac maldevelopment as is increasingly appreciated for, e.g., neuropsychiatric disorders [2].
Other study designs, e.g., using whole genome sequencing, will be needed to fully delineate the genetic architecture of TOF, including detection of relevant sequence-based mutations, such as those in non-coding regulatory regions that may be important for cardiac development [30]. Pathway analyses were restricted to subjects with rare CNVs overlapping 6 or fewer genes, and insufficient numbers precluded separate analyses involving only large rare CNVs. However, pathway results were similar when subjects with multigenic CNVs overlapping >6 genes were included (data not shown). Lastly, proving causality of specific genetic variants is beyond the scope of this study and more evidence, including replication of association in independent cohorts, will be needed to corroborate our putative candidate genes for tetralogy of Fallot. Fortunately, the functional significance of several key candidate genes implicated by our CNV results has already been validated in model organisms such as mice and zebrafish.

Conclusions
In addition to well known 22q11.2 deletions, other structural genomic changes appear to be important contributors to the genetic heterogeneity of TOF. In particular, these include 1q21.1 duplications and other rare copy number changes that disrupt genes involved in cell migration and vasculature development pathways including PLXNA2-semaphorin signaling and perhaps ciliary motility. Further studies will help to improve our understanding of the complex etiology and pathogenesis of TOF and of congenital heart disease in general.

Methods

Ethics statement
The study was approved by institutional research ethics boards at the University Health Network and the Centre for Addiction and Mental Health.

TOF sample
We prospectively recruited 495 unrelated adults (≥18 years) with TOF including a subset with TOF-pulmonary atresia or pulmonary atresia and ventricular septal defect (collectively termed "TOF" in this study), without autosomal trisomies, from a single clinic (Toronto Congenital Cardiac Centre for Adults). Patients with pulmonary atresia in the setting of more complex cardiac lesions, such as single ventricle lesions or transposition complexes, were not included. We excluded 62 subjects with documented syndromes, including 53 with 22q11.2 deletion syndrome associated with 1.5 to 3.0 Mb 22q11.2 deletions and genome-wide CNV data reported elsewhere [61]. The remaining 433 subjects formed a CNV discovery sample for this study.
TOF diagnosis was confirmed using echocardiogram and/or cardiac catheterization together with other imaging and surgical data reviewed using lifetime medical records [62]. All subjects underwent direct clinical screening for potential syndromic features [63]; available medical records were also reviewed. Subjects were stratified into syndromic and non-syndromic subgroups using criteria previously validated for identifying 22q11.2 deletion syndrome in adults [63]. Individuals with at least two of three features (history of learning difficulties, global dysmorphic facial features, hypernasal voice) were placed in the syndromic subgroup [63]. All phenotyping was done blind to genotype. Further details regarding cardiac and extracardiac phenotypes are provided elsewhere [33]. We were underpowered to perform subgroup analyses involving individuals with specific congenital cardiac outcomes (e.g., heart failure).

Control sample for formal analyses
To optimize our analyses, we used an independent Canadian control sample from the Ontario Population Genomics Platform (OPGP) genetic epidemiological project that comprised adults of European ancestry [208 (50.0%) male; mean age 44.96 (SD 12.05) years]. To maximize quality control and minimize artefactual/laboratory-related findings [64], all OPGP control samples were handled and experiments performed by the same laboratory using identical array methods and protocols, including CNV analyses and rarity assignation using separate large control cohorts, as for the TOF cases (see below).

Genotyping
High quality genomic DNA was genotyped using the high resolution Affymetrix Genome-Wide Human SNP Array 6.0. CNV analysis and adjudication for all TOF case and OPGP control samples were performed at The Centre for Applied Genomics (Toronto, Canada). Arrays meeting Affymetrix-recommended quality control guidelines of contrast QC >0.4 were used for further analysis as outlined below and in Figure 1 in Supporting Information S1.

Ancestry
To accurately estimate ancestry, in addition to self-reported ethnicities, genotypes of the TOF cases from 1,120 genome-wide unlinked SNPs were clustered by the program STRUCTURE [65] together with those from 270 HapMap samples, which were used as references of known ancestry during clustering. Ancestries were assigned with a threshold of coefficient of ancestry >0.9. Of the 433 TOF subjects, there were 340 of European, 61 of Admixed, 27 of East Asian and 5 of African ancestry.
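The thresholding step can be sketched as follows. This is a minimal illustration, assuming the per-subject ancestry coefficients have already been produced by the clustering step; the function name, the dictionary layout and the rule that any subject whose largest coefficient does not exceed 0.9 is labelled Admixed are assumptions made for the example.

```python
def assign_ancestry(coefficients, threshold=0.9):
    """Assign the population whose ancestry coefficient exceeds the
    threshold; otherwise label the subject as Admixed (assumed rule).

    coefficients: dict mapping population name -> coefficient (0..1)
    """
    population, value = max(coefficients.items(), key=lambda item: item[1])
    return population if value > threshold else "Admixed"


# Hypothetical coefficients for two subjects.
subject_a = {"European": 0.97, "East Asian": 0.02, "African": 0.01}
subject_b = {"European": 0.60, "East Asian": 0.35, "African": 0.05}
labels = [assign_ancestry(subject_a), assign_ancestry(subject_b)]
```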

CNV determination, adjudication, and prioritization
Genome-wide CNVs were determined using a multiplealgorithm approach to maximize sensitivity and specificity of CNV calling, as described previously [13]. Briefly, for each subject we defined ''stringent'' CNV calls as those detected by at least two of three different CNV calling algorithms: Birdsuite [66], iPattern [67] and Affymetrix Genotyping Console, and spanning 10 kb in length and five or more consecutive array probes. In this dataset, the mean number of calls per sample was 51, 50 and 32 for Birdsuite, iPattern and Genotyping Console, respectively. Overlapping calls at the sample level from Birdsuite and iPattern were merged with the outside probe boundaries. Singleton calls from iPattern or Birdsuite were also included in the stringent CNV set if they overlapped with a Genotyping Console call from the same sample. On average, 59% of CNVs in a sample were stringent. All subsequent analyses focused on the stringent CNVs, which in our experience have very high positive validation rates by independent methods such as quantitative PCR [5,6,13]. Merging CNV calls on a sample level across different algorithms has the additional advantage of correcting for the tendency of individual algorithms to segment single CNV events into multiple calls.
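The two-of-three rule for stringent calls can be sketched as below. This is a simplified illustration, not the pipeline described in [13]: it assumes calls from one sample on one chromosome, clusters overlapping calls of the same type, requires support from at least two algorithms, and takes the outer boundaries of the Birdsuite and iPattern calls in each cluster. The `Call` class, the algorithm labels and the use of the largest per-call probe count for the merged region are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    algo: str    # "birdsuite", "ipattern" or "gtc" (Genotyping Console)
    start: int   # bp
    end: int     # bp
    probes: int  # consecutive array probes in the call
    kind: str    # "gain" or "loss"


def _resolve(cluster, kind, min_len, min_probes):
    """Turn one cluster of overlapping calls into a stringent region or None."""
    if len({c.algo for c in cluster}) < 2:
        return None  # supported by a single algorithm only
    # Boundaries come from the Birdsuite/iPattern calls (outer boundaries).
    core = [c for c in cluster if c.algo != "gtc"]
    start, end = min(c.start for c in core), max(c.end for c in core)
    probes = max(c.probes for c in core)  # simplification (assumption)
    if end - start >= min_len and probes >= min_probes:
        return (start, end, kind)
    return None


def stringent_calls(calls, min_len=10_000, min_probes=5):
    """Merged (start, end, kind) regions supported by >= 2 algorithms."""
    regions = []
    for kind in ("gain", "loss"):
        ordered = sorted((c for c in calls if c.kind == kind),
                         key=lambda c: c.start)
        cluster, cluster_end = [], None
        for call in ordered + [None]:  # None flushes the last cluster
            if call is not None and cluster and call.start <= cluster_end:
                cluster.append(call)
                cluster_end = max(cluster_end, call.end)
                continue
            if cluster:
                region = _resolve(cluster, kind, min_len, min_probes)
                if region is not None:
                    regions.append(region)
            if call is not None:
                cluster, cluster_end = [call], call.end
    return regions


# Hypothetical calls for one sample.
example = [
    Call("birdsuite", 100_000, 150_000, 20, "loss"),
    Call("ipattern", 120_000, 160_000, 18, "loss"),  # merged with the above
    Call("birdsuite", 500_000, 520_000, 8, "gain"),  # singleton, dropped
    Call("ipattern", 900_000, 905_000, 6, "loss"),   # too short, dropped
    Call("gtc", 901_000, 904_000, 5, "loss"),
]
result = stringent_calls(example)
```

Clustering before checking support also illustrates the point made in the text about merging: a single event split into several calls by one algorithm ends up in one cluster when another algorithm's call bridges the fragments.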
Each stringently defined CNV identified in the TOF case and OPGP control samples was then adjudicated for rarity by comparison to those CNVs identified in two large population-based control cohorts comprising 2,357 individuals of European ancestry from Ontario and Germany, which had already been assessed using an identical microarray platform and CNV analysis strategy (i.e., as above) [13]. We adopted a conservative definition of rare CNVs, retaining only those CNVs present in <0.1% of these 2,357 population controls. Further details of the comprehensive adjudication methods, including assessment of segmental duplications and Database of Genomic Variants (http://projects.tcag.ca/variation/) CNVs, may be found elsewhere [5,13].
CNVs >6.5 Mb in size, likely to be detectable by karyotype and/or potentially indicating artefactual results, were excluded. To ensure consistency of data, for major analyses we used only autosomal CNVs >10 kb in size in individuals of European ancestry (Figure 1 in Supporting Information S1).
Large CNVs were defined as those >500 kb in size. We prioritized smaller CNVs (<500 kb) meeting the following criteria for more detailed examination: a) very rare (i.e., not present in any control sample using a 50% reciprocal overlap criterion) [13] and b) recurrent in unrelated TOF subjects, including those reported in the literature, and/or c) overlapping 'interesting' gene(s) possibly involved in TOF. When available, immediate relatives were studied using the same methods as for the proband to determine if a CNV was de novo or inherited.
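The 50% reciprocal overlap criterion can be made concrete with a short Python sketch. It assumes half-open intervals on the same chromosome, ignores CNV type for brevity, and treats an overlap of exactly 50% as a match; the function names are hypothetical.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), half-open, same chromosome assumed


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two fractions: shared length / len(a), shared length / len(b)."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def is_very_rare(case_cnv: Interval, control_cnvs: Iterable[Interval],
                 threshold: float = 0.5) -> bool:
    """True if no control CNV reaches the reciprocal overlap threshold."""
    return all(reciprocal_overlap(case_cnv, c) < threshold for c in control_cnvs)
```

Requiring the overlap to be reciprocal prevents a small case CNV from being matched to a much larger control CNV that merely contains it.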

Experimental validation of CNVs
Confirmatory studies of possible TOF-associated CNVs used Stratagene SYBR Green based quantitative-PCR (qPCR). Each qPCR assay was performed in triplicate, for both the target region and for a control region at the FOXP2 locus on chromosome 7. Where available, molecular cytogenetic or microarray results from clinical laboratories also confirmed CNVs.

Sequencing and mutation screening
For candidate gene discovery in TOF, we prioritized a single gene for further characterization by sequencing, selected on the basis of our CNV results and previous animal model studies as the most likely to be involved in cardiac development. We performed mutation screening of the PLXNA2 coding sequence (spanning 5,682 nucleotides) using standard PCR-based Sanger sequencing. The PLXNA2 gene contains 31 coding exons (67 to 1,268 bp) that were fully sequenced with 32 amplicons. The program Primer 3 (http://frodo.wi.mit.edu/primer3/) was used to design primers. The amplified products were sequenced with the Big Dye Terminator kit using the ABI 3730XL capillary sequencer (Applied Biosystems) and analyzed for sequence variants using Sequencher (Gene Codes, Ann Arbor, MI, USA). Putative sequence variants of interest were confirmed by sequencing in the reverse direction. SIFT [68] and POLYPHEN [69] were used for in-silico prediction of the effect of missense variants on protein function.

Statistical methods
Statistical analysis was performed using SAS software (version 9.3, SAS Institute Inc., Cary, NC, USA). The main analyses compared rare CNVs in the 340 TOF cases of European ancestry with those in the 416 OPGP controls, and included within-group comparisons of syndromic versus non-syndromic TOF subjects. Chi-square or Fisher's exact tests were used to compare categorical variables and Student's t tests for continuous variables, as appropriate. All tests were two-sided, with statistical significance defined as p<0.05.

Pathway analyses
For pathway analyses, we first assessed if pre-defined gene-sets (corresponding to biological functions and pathways) displayed a higher rare CNV load in TOF cases than in OPGP controls. Gene-sets were derived from Gene Ontology annotations (downloaded from NCBI in April 2011 and up-propagated according to ontological relations), pathway databases (KEGG, Reactome, BioCarta, NCI; March 2011) and protein domains (PFAM; March 2011). Only gene-sets with a number of member genes between 25 and 750 were tested: 2,456 total, with 1,939 from GO, 414 from pathways and 103 from PFAM domains. Gene-sets with fewer than 25 genes decrease the statistical power of the analysis, whereas those with more than 750 genes tend to have a very broad biological scope (e.g. GO "regulation of biopolymer catabolism") and hinder the visualization of results. Subjects with rare CNVs overlapping more than 6 genes were not considered for the gene-set analysis, as these may have a more promiscuous set of gene functions perturbed by the rare variant. For exonic losses this led to the exclusion of 14 TOF cases and 11 OPGP controls.
For each gene-set, we built a contingency matrix with subjects of European ancestry as sampling units. Subjects were categorized as (a) TOF cases or OPGP controls and (b) having at least one gene-set gene harboring a rare CNV or not. On the basis of this contingency table, a one-tailed Fisher's Exact Test was used to test for a higher prevalence of rare CNVs in TOF probands versus OPGP controls. This test can be regarded as an extension of a single-gene or single-variant association test; however, testing association for groups of genes, unlike single genes or single variants, provides sufficient power to detect significant association even when considering only rare variants [70]. To map CNVs to genes we used a stringent method, and restricted to CNVs overlapping exons. We tested all types of variants as well as losses-only and gains-only; only losses produced significant results (see analysis method below), in line with our previous findings for autism [5].
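For illustration, the one-tailed test on such a 2x2 table can be written directly from the hypergeometric distribution. The following Python sketch uses only the standard library; the function name and argument order are assumptions made for the example, and this is not the software used in the study.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher's exact p-value for the table [[a, b], [c, d]].

    a: cases with a rare CNV in the gene-set      b: cases without
    c: controls with a rare CNV in the gene-set   d: controls without
    Returns P(X >= a) with all table margins held fixed.
    """
    n_cases = a + b
    n_hit = a + c
    total = a + b + c + d
    denominator = comb(total, n_cases)
    upper = min(n_cases, n_hit)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    numerator = sum(comb(n_hit, k) * comb(total - n_hit, n_cases - k)
                    for k in range(a, upper + 1))
    return numerator / denominator
```

A small p-value indicates that carriers of rare CNVs in the gene-set are over-represented among cases relative to controls; the opposite direction is not tested.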
The Fisher's Exact Test nominal p-value was corrected for multiple tests using a case/control class permutation procedure to estimate an empirical false discovery rate. We favored a permutation strategy over classical Benjamini-Hochberg false discovery rate owing to the highly complex dependency structure among gene-sets and overly conservative nature of this test [5]. Case and control labels were permuted 2,000 times, and for each permutation gene-sets were tested following exactly the same procedure. Real nominal p-values were ranked from lowest (most significant) to highest (least significant) and, for each real p-value, the empirical false discovery rate was computed as the average number of gene-sets with equal or smaller p-value over permutations. Therefore, the empirical false discovery rate can be interpreted as an estimate of the fraction of gene-sets that would be significant under the null hypothesis of no association at the chosen nominal significance level. We selected 27.5% as the empirical false discovery rate significance threshold for final results; we additionally required the nominal p-value to be <0.05.
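A permutation-based empirical false discovery rate of this kind can be sketched in Python as below. The sketch takes the p-values from the real labels and from each label permutation as given. Dividing the average null count by the number of real gene-sets at or below the same p-value is an assumption made here to express the result as a fraction; the function name is hypothetical.

```python
from typing import List, Sequence


def empirical_fdr(real_p: Sequence[float],
                  permuted_p: Sequence[Sequence[float]]) -> List[float]:
    """Empirical FDR for each real p-value, in the order given.

    real_p:     nominal p-values, one per gene-set, from the true labels
    permuted_p: one list of gene-set p-values per label permutation
    """
    n_perm = len(permuted_p)
    fdr = []
    for p in real_p:
        # Average number of gene-sets reaching p when labels are permuted.
        expected_null = sum(sum(1 for q in perm if q <= p)
                            for perm in permuted_p) / n_perm
        # Number of gene-sets reaching p with the real labels (at least 1).
        observed = sum(1 for r in real_p if r <= p)
        fdr.append(min(1.0, expected_null / observed))
    return fdr
```

Under the thresholds stated above, a gene-set would then be retained when its empirical false discovery rate is at most 0.275 and its nominal p-value is below 0.05.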
Previously known TOF disease genes (Table 9 in Supporting Information S1) were scored for association following a similar strategy, but using functional neighbors instead of functional gene-sets. For each known disease gene, we scored TOF case and OPGP control subjects of European ancestry. The score was defined as the highest functional weight between (a) the known disease gene being tested and (b) the CNV-harboring genes in the subject being scored. The functional weight was obtained from STRING, a publicly available resource that predicts the probability of two genes participating in the same pathways based on physical interaction, pathway membership, co-expression and PubMed co-citation. For each TOF disease gene, we tested if functional neighborhood scores were higher in TOF cases compared to OPGP controls by logistic regression analysis. All exonic CNVs (gains and losses) were used; unlike the gene-set association test, restricting to losses did not improve significance. We finally selected the three top-scoring known disease genes (GATA4, NKX2-5, TBX5).
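The per-subject neighborhood score can be sketched in Python as follows. The lookup table of pairwise weights is a hypothetical stand-in for the STRING data, gene pairs absent from the table are assumed to score zero, and the function name is illustrative.

```python
from typing import Dict, FrozenSet, Iterable


def neighborhood_score(disease_gene: str, cnv_genes: Iterable[str],
                       weights: Dict[FrozenSet[str], float]) -> float:
    """Highest functional weight linking the disease gene to any CNV gene.

    weights maps an unordered gene pair to its functional weight
    (hypothetical table; missing pairs are treated as weight 0).
    """
    return max((weights.get(frozenset((disease_gene, gene)), 0.0)
                for gene in cnv_genes), default=0.0)
```

A subject with no CNV-harboring genes, or none linked to the disease gene, scores zero under these assumptions; the resulting scores would then be compared between cases and controls.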
For visualization, we integrated results from gene-set association, disease gene neighborhood analysis and systematic CNV review as a gene-set overlap network using the Cytoscape plugin Enrichment Map [5,71]. Gene-sets significant after the gene-set association test were restricted to genes with higher prevalence in TOF cases than in OPGP controls [5], whereas functional neighborhood gene-sets included the known TOF disease gene as well as its neighbors that had high interaction confidence according to STRING (score >700, equivalent to interaction probability >70%) and harbored CNVs in 2 or more TOF case subjects but no OPGP control (Table 10 in Supporting Information S1). The combined Jaccard-overlap index was used to generate the gene-set network, setting a threshold of 0.2. Clusters of overlapping gene-sets were manually identified and colored.
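A combined Jaccard-overlap index between two gene-sets can be sketched in Python as below. The equal weighting of the Jaccard and overlap coefficients (k = 0.5) is an assumption made for the example and is not stated in the text; the function names are illustrative.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared genes divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a: Set[str], b: Set[str]) -> float:
    """Shared genes divided by the size of the smaller gene-set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def combined_index(a: Set[str], b: Set[str], k: float = 0.5) -> float:
    """Weighted mix of the two coefficients; k = 0.5 is an assumed weighting."""
    return k * overlap_coefficient(a, b) + (1 - k) * jaccard(a, b)
```

Two gene-sets would be connected in the network when this index reaches the 0.2 threshold.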

Supporting Information
Supporting Information S1 Table 1: List of rare CNVs in 340 TOF and/or pulmonary atresia cases of European ancestry. Table 2: List of rare CNVs in 416 OPGP control individuals. Table 3: Summary of Affymetrix 6.0 microarray CNV data TOF sample (N = 340). Table 4: Rare large CNVs (>500 kb) in 43 of 433 unrelated adults with tetralogy of Fallot. Table 5: Very rare CNVs overlapping 26 candidate genes for tetralogy of Fallot. Table 6: PLXNA2 sequence variants detected in 192 unrelated TOF cases of European ancestry. Table 7: Gene-set association results for all gene-sets tested, rare CNVs restricted to exonic losses. Table 8: Additional gene-set information for the 19 gene-sets selected for final results. Table 9: Known TOF disease genes used for the disease gene neighborhood analysis. Table 10: Test results on disease gene neighborhoods for all disease genes, using the STRING network. Table 11: Neighbor gene details for the three top disease genes. Figure 1: Overview of study design and CNV analysis workflow. Figure 2: Rare CNVs at chromosome region 1q21.1 in TOF cases. Figure 3: Rare CNVs at chromosome region 18q22.3-q23 in TOF cases.