Copy number variation analysis in Chinese children with complete atrioventricular canal and single ventricle

Congenital heart disease (CHD) is one of the most common birth defects. Copy number variations (CNVs) have been shown to be important genetic factors contributing to CHD. Because few studies have focused on complete atrioventricular canal (CAVC) and single ventricle (SV), we screened genome-wide CNVs in Chinese children with these two types of CHD. Using a customized SNP array, we screened CNVs in 262 sporadic CAVC cases and 259 sporadic SV cases. The detected CNVs were annotated and filtered using available databases. Among the 262 CAVC patients, we identified 6 potentially-causative CNVs in 43 individuals (16.41%, 43/262), including 2 syndrome-related CNVs (7q11.23 and 8q24.3 deletions). Notably, 90.70% (39/43) of the CAVC patients with detected CNVs carried a duplication of 21q11.2–21q22.3, consistent with trisomy 21 (Down syndrome, DS). Among CAVC patients with DS, the female-to-male ratio was 1.6:1.0 (24:15), and the rate of pulmonary hypertension (PH) was 41.03% (16/39). In addition, 6 potentially-causative CNVs were identified in the SV patients (2.32%, 6/259), none of which was trisomy 21. Most CNVs identified in our cohort were rare (< 1%), occurring only once among the CAVC or SV individuals, with the exception of the 21q11.2–21q22.3 duplication (14.89%) in the CAVC cohort. Our study identified 12 potentially-causative CNVs in 262 CAVC and 259 SV patients, representing the largest cohort of these two CHD types in the Chinese population. The results revealed a strong association between CAVC and DS, together with a sex difference and a high incidence of PH in this group. The presence of potentially-causative CNVs suggests that the etiology of complex CHD is highly diverse and that CHD candidate genes remain to be discovered.


Background
Congenital heart disease (CHD) is the most common birth defect, with an incidence of 1–1.2% in live births [1,2]. Arising from disrupted early heart development, CHD comprises a wide range of structural malformations of the cardiovascular system, from simple lesions such as atrial septal defects (ASD) and ventricular septal defects (VSD) to complex lesions such as tetralogy of Fallot (ToF), complete atrioventricular canal (CAVC) and single ventricle (SV). Although clinical treatment has improved significantly, complex CHD remains a leading cause of mortality in newborns [3].
*Correspondence: qihuafu@126.com; yuyongguo@shsmu.edu.cn; qingxiao18@163.com
Consistent with the complexity of early heart development, the etiology of CHD is multifactorial. To date, only about 20–30% of CHD cases can be attributed to genetic or environmental causes using available technologies [4][5][6]. The incidence of some specific CHD types shows sex or racial biases [7,8]. The recurrence risk of CHD in the offspring of an affected parent, as well as in the siblings of a child with CHD, has been reported to be higher than in the general population [9,10]. This evidence emphasizes that genetics plays an important role in the pathogenesis of CHD [11].
Both small and large genetic variants can contribute to CHD [12]. Small insertions and deletions (INDELs), ranging from 1 bp to 10 kb in length [13], are typically detected by sequencing technologies [14]. Putatively deleterious small variants in single genes can cause both syndromic and isolated CHD. For instance, Noonan syndrome, a common genetic disorder, is mostly caused by mutations in the PTPN11 gene, and pulmonary valve stenosis and CAVC are relatively common features of the syndrome [15,16]. Heterotaxy syndrome, a class of congenital disorders resulting from abnormal left-right body patterning, has been associated with mutations in NODAL, ACVR2B, LEFTY2, GDF1, ZIC3, CRELD1 and NKX2.5 [17]. Most patients with heterotaxy syndrome have severe CHD, including SV [17,18]. Beyond syndromic CHD, an increasing number of genes have been identified in individuals with isolated CHD [11]. Whole exome sequencing and whole genome sequencing can effectively identify small variants associated with CHD.
Large genetic variants, including aneuploidies, chromosomal rearrangements and copy number variations (CNVs), are also important genetic causes of CHD. CNVs range in size from single genes to large contiguous deletions or duplications of millions of base pairs [19,20]. Pathogenic CNVs tend to be large, to arise de novo and to disrupt coding regions [20]. Although recent advances in next-generation sequencing have shown potential for CNV detection, chromosomal microarray analysis, using either array comparative genomic hybridization or single nucleotide polymorphism (SNP) arrays, is still the gold standard for CNV detection and validation [21].
Investigating the genes in overlapping CNV regions can help identify relevant genes or refine critical intervals for specific genetic diseases [22][23][24]. Reflecting the heterogeneity of CHD etiology, a large number of CNVs associated with CHD have been identified over the past decades, especially for conotruncal anomalies including ToF, transposition of the great arteries (TGA) and pulmonary atresia (PA)/VSD [25][26][27]. Because few studies have focused on CAVC and SV, we screened genome-wide CNVs in Chinese children with these two types of CHD.
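The interval-refinement idea mentioned above can be illustrated with a simplified sketch: given CNVs from several patients with a shared phenotype, the region common to all of them (often called the shortest region of overlap) narrows the candidate interval, and the genes lying inside it become candidates. The sketch assumes that all CNVs lie on the same chromosome, are of the same type and use half-open base-pair coordinates; the coordinates and gene names (GENE_A, GENE_B) are hypothetical. It is not an analysis performed in this study, and real refinement studies must also handle CNVs that overlap only partially.

```python
# Hypothetical CNV intervals (start, end), half-open, same chromosome and CNV type.
def shortest_region_of_overlap(intervals):
    """Return the (start, end) region shared by every CNV, or None if they do not all overlap."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None


def genes_within(region, genes):
    """Names of genes lying entirely inside `region` (half-open coordinates)."""
    start, end = region
    return sorted(
        name for name, (g_start, g_end) in genes.items()
        if g_start >= start and g_end <= end
    )


patient_cnvs = [(100_000, 500_000), (200_000, 600_000), (300_000, 450_000)]
toy_genes = {"GENE_A": (320_000, 340_000), "GENE_B": (460_000, 480_000)}  # hypothetical

sro = shortest_region_of_overlap(patient_cnvs)
print(sro)                            # (300000, 450000)
print(genes_within(sro, toy_genes))   # ['GENE_A']
```

Taking the intersection is the most conservative choice: a gene outside it is not shared by every patient, so it is deprioritized, at the cost of missing genes that act only in some patients.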

Study subjects
We enrolled a cohort of 528 children diagnosed with CAVC (n = 264) or SV (n = 264) by echocardiography at the Shanghai Children's Medical Center between November 2010 and August 2019. The mean age of the patients was 8.77 ± 2.77 (mean ± SD) years. The phenotypic details of this cohort are summarized in Additional file 1: Table S1. The Ethics Committee of the Shanghai Children's Medical Center reviewed and approved this study (SCMCIRB-K2017009).

DNA extraction
Genomic DNA was isolated from peripheral blood samples of all patients using the Gentra Puregene Blood Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. A NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA) was used to assess the quantity and quality of the DNA samples. Only samples with OD260/OD280 ratios between 1.8 and 2.0 and OD260/OD230 ratios > 1.5 were used for further investigation.
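The purity thresholds above amount to a simple per-sample filter. The following minimal sketch assumes that "between 1.8 and 2.0" includes both endpoints; the DnaSample class, the passes_purity_qc function and the sample values are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    sample_id: str
    od260_280: float  # OD260/OD280 ratio (lowered by protein contamination)
    od260_230: float  # OD260/OD230 ratio (lowered by salts and organic contaminants)


def passes_purity_qc(sample: DnaSample) -> bool:
    """Thresholds from the text; the 1.8-2.0 range is treated as inclusive (assumption)."""
    return 1.8 <= sample.od260_280 <= 2.0 and sample.od260_230 > 1.5


samples = [
    DnaSample("S1", 1.85, 2.10),  # passes both checks
    DnaSample("S2", 1.70, 2.00),  # fails: OD260/OD280 below 1.8
    DnaSample("S3", 1.90, 1.40),  # fails: OD260/OD230 not above 1.5
]
kept = [s.sample_id for s in samples if passes_purity_qc(s)]
print(kept)  # ['S1']
```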

Microarrays
Custom microarrays were designed based on the Affymetrix CytoScan 750K array platform (Thermo Fisher Scientific, Waltham, MA). We removed probes with high population frequency and added probes specifically designed for sites with a two-star review status in ClinVar, as well as for pathogenic variants in HGMD. Probe design was also informed by clinical data on high-morbidity diseases in newborns, so that the array could be applied to screen for CHD, especially in CHD patients with extra-cardiac anomalies. Genomic DNA samples were amplified, fragmented and hybridized onto the arrays under stringent conditions according to the manufacturer's instructions. Microarrays were processed automatically on GeneTitan Multi-Channel instruments, with Affymetrix Command Console (AGCC) software used for instrument control and for generating probe cell intensity data (CEL files).

Data analysis
Microarray data were processed using the Affymetrix Chromosome Analysis Suite v2.0 (ChAS) software, and CNVs were called against the human reference assembly GRCh38 (hg38). In total, 521 patients (262 CAVC and 259 SV) passed the QC tests.
Only CNV calls (deletions or duplications) larger than 200 kb and supported by at least 50 probes were considered for further analysis. The detected CNVs were classified according to the following criteria: (1) CNVs with ≥ 70% overlap with CNVs reported in DGV were categorized as non-causative, and the remaining CNVs were classified as potentially-causative. Frequencies of non-causative and potentially-causative CNVs were calculated based on the DGV and DECIPHER databases, respectively. (2) In our cohort of 262 CAVC and 259 SV patients, CNVs with a frequency below 1% were defined as rare CNVs, and the others (≥ 1% in our dataset) as common CNVs. (3) Novel CNVs were those that had not been previously reported in the literature or in available public databases.
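The size filter and criteria (1) and (2) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the pipeline used in the study: it assumes that "≥ 70% overlap" means the fraction of the called CNV covered by a single DGV CNV of the same type on the same chromosome, that coordinates are half-open base-pair intervals, and that cohort frequency is the number of carriers divided by the size of the relevant cohort. The Cnv class, the function names and all coordinates are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the overlap definition below is an assumption.
MIN_SIZE_BP = 200_000   # keep calls strictly larger than 200 kb
MIN_PROBES = 50         # ...supported by at least 50 probes
DGV_OVERLAP = 0.70      # >= 70% overlap with a DGV CNV -> non-causative
RARE_FREQ = 0.01        # < 1% in the study cohort -> rare


@dataclass(frozen=True)
class Cnv:
    chrom: str
    start: int        # half-open [start, end) in bp (assumed convention)
    end: int
    cn_type: str      # "del" or "dup"
    n_probes: int = 0

    @property
    def length(self) -> int:
        return self.end - self.start


def passes_call_filter(cnv: Cnv) -> bool:
    """Size and probe-count filter applied before classification."""
    return cnv.length > MIN_SIZE_BP and cnv.n_probes >= MIN_PROBES


def overlap_fraction(query: Cnv, ref: Cnv) -> float:
    """Fraction of `query` covered by `ref`; 0 if chromosome or CNV type differ."""
    if query.chrom != ref.chrom or query.cn_type != ref.cn_type:
        return 0.0
    shared = min(query.end, ref.end) - max(query.start, ref.start)
    return max(shared, 0) / query.length


def classify(cnv: Cnv, dgv: list[Cnv], n_carriers: int, cohort_size: int) -> tuple[str, str]:
    """Return (causality, rarity) labels following criteria (1) and (2)."""
    if any(overlap_fraction(cnv, ref) >= DGV_OVERLAP for ref in dgv):
        causality = "non-causative"
    else:
        causality = "potentially-causative"
    rarity = "rare" if n_carriers / cohort_size < RARE_FREQ else "common"
    return causality, rarity


# Hypothetical coordinates, for illustration only.
dgv_catalog = [Cnv("chr7", 1_050_000, 1_400_000, "dup")]
call_a = Cnv("chr7", 1_000_000, 1_300_000, "dup", n_probes=120)
call_b = Cnv("chr8", 5_000_000, 5_500_000, "del", n_probes=300)

print(passes_call_filter(call_a), classify(call_a, dgv_catalog, 1, 262))
# True ('non-causative', 'rare')
print(passes_call_filter(call_b), classify(call_b, dgv_catalog, 39, 262))
# True ('potentially-causative', 'common')
```

Here call_a is 83% covered by the DGV entry and is therefore labeled non-causative, while call_b has no DGV match; a CNV seen once in 262 patients (0.38%) is rare, whereas one seen in 39 of 262 (14.89%) is common.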

Clinical characteristics of the 528 patients
Among the 264 CAVC patients, the male-to-female sex ratio was 0.98 (131:133) and the mean age was 8.80 ± 3.76 (mean ± SD) years. Among the 264 SV patients, the sex ratio was 1.34 (151:113) and the mean age was 7.04 ± 3.57 years. These results are summarized in Table 1.

CNV detection in CHD cases
In this study, seven samples failed the QC criteria (CAVC209, CAVC211, SV134, SV145, SV177, SV181, SV254). Among the remaining 521 samples (262 CAVC and 259 SV) that passed the QC tests, a total of 3465 CNVs were detected, with a median size of 922.3 kb (maximum 23.9 Mb, minimum 51.5 kb). Large CNVs greater than 200 kb were selected for further analysis.
We also consulted the DECIPHER and ISCA databases for evidence of clinical relevance. For example, duplication of 9p24.3-9p13.11 (CAVC207) has been reported in association with a ToF/TGA/coarctation of the aorta (CoA) phenotype.
Potentially-causative CNV frequencies in DECIPHER were calculated by dividing the count of each specific CNV by the number of open-access patient records.

Discussion
CAVC, which accounts for ~4% of CHD, is a complex cardiac malformation characterized by a variable deficiency of the atrioventricular area in the developing heart [28,29]. SV, one of the most common forms of severe CHD, comprises a spectrum of congenital cardiac malformations defined by severe underdevelopment of one ventricle [30].
Of the 262 CAVC patients reported here, 14.89% (39/262) carried the 21q11.2-21q22.3 duplication, consistent with a diagnosis of trisomy 21, namely Down syndrome (DS). This represents a striking association between CAVC and DS in our cohort. All DS patients carried the same ~3.3 Mb duplication at 21q11.2-21q22.3, and a systematic reanalysis has indicated that 21q22.13 is the minimal critical region for the DS phenotype [31]. Another study detected cardiac malformations in 57.6% of 500 patients with DS and found CAVC (35.1%) to be the most frequent heart anomaly [32]. CAVC is therefore thought to be the most frequent type of CHD in DS patients, and our study provides strong evidence for this association in the Chinese population. CAVC is also referred to as complete atrioventricular septal defect, and atrioventricular septal defects (AVSD) have been reported to be more common in female than in male DS patients [33]. In our study, the female-to-male ratio of CAVC patients with DS was 1.60 (24:15), suggesting a potential sex difference in the prevalence of CAVC among DS patients. We also noted that the rate of pulmonary hypertension (PH) in DS patients with CAVC was 41% (16/39), higher than in a previous report (28%, 364/1242) [34]. PH is well known to be common in children with DS, and our data further support this association.
Several genes located in the "CHD critical region" on chromosome 21 have been associated with CAVC, including DSCAM, COL6A1, COL6A2 and DSCR1 [35]. However, three DS patients in our cohort simultaneously carried another CNV on a different chromosome, and one of these CNVs (3q12.1-3q12.2 dup) has been reported in association with VSD in the DECIPHER database. Moreover, several DS patients showed not only CAVC (4/27) but also other cardiac anomalies, such as ToF, ASD, patent foramen ovale (PFO) and patent ductus arteriosus (PDA). Although the above-mentioned genes can explain some of the cardiac phenotypes in DS patients, the genetic causes remain difficult to clarify, especially when DS probands carry multiple CNVs and show diverse CHD phenotypes.
In our study, two of the potentially-causative CNVs are known to be the main causes of syndromes with heart anomalies. The 7q11.23 microdeletion causes Williams-Beuren syndrome (WBS; OMIM 194050), a multisystemic developmental disorder frequently accompanied by CHD [36,37]. More than 90% of WBS patients have the ~1.55 Mb deletion extending from FKBP6 to GTF2I, and deletion or mutation of one elastin (ELN) allele is widely accepted as a major cause of WBS [38]. One patient in this study (CAVC162) had a ~1.52 Mb deletion at 7q11.23 extending from NCF1B to GTF2I and encompassing the ELN gene. The other CNV, a microdeletion at 8q24.3, is associated with Verheij syndrome (OMIM 615583), which is characterized by growth retardation, developmental delay (DD), microcephaly, vertebral anomalies, dysmorphic features, and cardiac and renal defects [39]. Poly(U) Binding Splicing Factor 60 (PUF60) has been suggested as the main cause of the heart defects in this syndrome, since knockdown of Puf60 alone resulted in cardiac structural defects [40]. The patient reported here (CAVC145) had a ~2.5 Mb deletion at 8q24.3 and presented with growth retardation and heart anomalies.
Among the remaining CNVs identified in CAVC patients, those located at 8q21.13-8q21.2, 9p24.3-9p13.1 and 3q12.1-3q12.2 have rarely been reported. The potentially-causative deletion at 8q21.13-8q21.2 encompasses Zinc Finger and BTB Domain Containing 10 (ZBTB10), a known CHD gene that encodes a telomere-associated protein [41]. Recently, a GWAS of 4,000 unrelated Caucasian patients with CHD associated ZBTB10 with TGA, as two highly significant SNPs (rs148563140 and rs143638934) are located close to this gene [42]. The same study also suggested strong cell-type specificity of Zbtb10 in murine cardiac development. In addition to ZBTB10, this CNV region in the patient (CAVC102) also included STMN2, which is related to abnormalities of the cardiovascular system. STMN2 encodes a member of the stathmin family of phosphoproteins that functions in microtubule dynamics and signal transduction [43]. Compared with controls, methylation of STMN2 was significantly increased in VSD cases (FDR p value = 4.27 × 10⁻⁵¹) [44]. Moreover, according to the LifeMap Discovery database, Stmn2 is expressed in the atrioventricular node, endocardium and outflow tract in mouse. Of the remaining two duplications, one (9p24.3-9p13.1, CAVC207) has been reported in patients with VSD or ToF in DECIPHER. Within this region, only Rfx3 is annotated with the "ventricular septal defect" phenotype in the MGI (Mouse Genome Informatics) database. For the duplicated 3q12.1-3q12.2 region, a previous report described a VSD patient with a ~116 kb duplication of this region and identified TBC1D23 as the major candidate gene [25]. In our study, the patient (CAVC274) had a ~0.8 Mb duplication at 3q12.1-3q12.2 encompassing this CHD candidate gene.
The proband SV143 had a ~1.6 Mb duplication at 5q34-5q35.1, a region encompassing two genes related to septal defects of the heart, SLIT3 and TENM2. SLIT3 (Slit Guidance Ligand 3) is expressed in cardiomyocyte-like progenitor cells [49], and SLIT3-mutant mice exhibit membranous ventricular septal defects as well as atrioventricular and aortic valve abnormalities [50]. Recently, SLIT3 variants in humans have been associated with CHD, including ToF and septal and outflow tract defects [51]. TENM2 (Teneurin Transmembrane Protein 2) is abundantly expressed in the human fetal heart. Moreover, patients with loss of TENM2 presented with ASD in the DECIPHER database, whereas gain of TENM2 has not yet been associated with any CHD phenotype.
Structural genetic changes, especially CNVs, are a major source of genetic variation contributing to CHD. In recent years, a large number of CNVs associated with CHD have been identified [25][26][27]. Nevertheless, the role of pathogenic CNVs in SV and CAVC remains largely unknown because of the low incidence of these defects. Here, we screened genome-wide CNVs in 521 Chinese children with CAVC and SV.
A total of 27 CNVs ≥ 200 kb, comprising 10 deletions and 17 duplications, were detected in 11.52% (60/521) of CHD cases: 16.79% (44/262) of CAVC cases and 6.18% (16/259) of SV cases. Following our classification strategy, 6 potentially-causative CNVs were identified in 43 cases, accounting for 16.41% (43/262) of CAVC patients, whereas 6 potentially-causative CNVs were identified in 6 cases, accounting for 2.32% (6/259) of SV cases. CNVs in patients with isolated or syndromic CHD have been investigated previously, with reported genome-wide burdens of (likely) pathogenic CNVs ranging from 4.3% to 27.9% [52][53][54][55]. In our study, the rate of potentially-causative CNVs in the SV cohort is relatively low, possibly because of differences in CHD subphenotype and/or in the stringency of variant interpretation standards. A previous study showed that different cardiac subphenotypes have different enrichment of large CNV events [55]; that is, detection rates differ among CHD types. In addition, some CNVs < 200 kb, which were excluded from the present study, may be pathogenic. Furthermore, genome-wide, CNVs with a minor allele frequency (MAF) < 1% are generally recognized as an important contributor to CHD [56], and the majority of non-causative CNVs in our study had a MAF < 1% in the DGV database (Tables 2, 3). The conservative CNV analysis used in our study, including the restricted focus on CNVs absent from DGV, may therefore have missed some functional CNVs. Further study of these CNVs is needed to evaluate their clinical implications.
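The detection rates above are simple proportions of cases carrying a CNV. The minimal sketch below recomputes them from the counts stated in the text, rounding to two decimal places; the helper name percent is hypothetical.

```python
def percent(carriers: int, total: int) -> float:
    """Percentage of cases carrying a CNV, rounded to two decimals."""
    return round(100 * carriers / total, 2)


# (carriers, cohort size) as reported in the text
counts = {
    "CNVs >= 200 kb, all cases": (60, 521),
    "CNVs >= 200 kb, CAVC": (44, 262),
    "CNVs >= 200 kb, SV": (16, 259),
    "potentially-causative, CAVC": (43, 262),
    "potentially-causative, SV": (6, 259),
}

for label, (k, n) in counts.items():
    print(f"{label}: {percent(k, n)}% ({k}/{n})")
# CNVs >= 200 kb, all cases: 11.52% (60/521)
# CNVs >= 200 kb, CAVC: 16.79% (44/262)
# CNVs >= 200 kb, SV: 6.18% (16/259)
# potentially-causative, CAVC: 16.41% (43/262)
# potentially-causative, SV: 2.32% (6/259)
```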

Conclusion
In conclusion, we identified 12 potentially-causative CNVs in 521 CAVC and SV patients, representing the largest cohort of these two rare CHD types in China. Most CNVs identified in our study were rare (< 1%), occurring only once among the CAVC or SV samples, except for the 21q11.2-21q22.3 duplication in the CAVC cohort. In this study, most Chinese CAVC patients carrying potentially-causative CNVs had trisomy 21 (DS), consistent with previous reports, which suggests that the close association between CAVC and DS does not differ by race. Combined with existing CNV reports in CHD and the intolerance of genes within the CNV regions, our results provide novel genetic evidence that could help clarify the etiology of CHD. Additionally, the potentially-causative CNVs we detected rarely overlapped with known CHD loci, implying that many genes are involved in heart development and that the genetic causes of CHD are diverse.