Next-generation sequencing using a pre-designed gene panel for the molecular diagnosis of congenital disorders in pediatric patients

Next-generation sequencing (NGS) has revolutionized genetic research and offers enormous potential for clinical application. Sequencing the exome has the advantage of casting the net wide for all known coding regions while targeted gene panel sequencing provides enhanced sequencing depths and can be designed to avoid incidental findings in adult-onset conditions. A HaloPlex panel consisting of 180 genes within commonly altered chromosomal regions is available for use on both the Ion Personal Genome Machine® (PGM™) and MiSeq platforms to screen for causative mutations in these genes. We used this HaloPlex ICCG panel for targeted sequencing of 15 patients with clinical presentations indicative of an abnormality in one of the 180 genes. Sequencing runs were done using the Ion 318 Chips on the Ion Torrent PGM. Variants were filtered for known polymorphisms and analysis was done to identify possible disease-causing variants before validation by Sanger sequencing. When possible, segregation of variants with phenotype in family members was performed to ascertain the pathogenicity of the variant. More than 97 % of the target bases were covered at >20×. There was an average of 9.6 novel variants per patient. Pathogenic mutations were identified in five genes for six patients, with two novel variants. There were another five likely pathogenic variants, some of which were unreported novel variants. In a cohort of 15 patients, we were able to identify a likely genetic etiology in six patients (40 %). Another five patients had candidate variants for which further evaluation and segregation analysis are ongoing. Our results indicate that the HaloPlex ICCG panel is useful as a rapid, high-throughput and cost-effective screening tool for 170 of the 180 genes. There is low coverage for some regions in several genes which might have to be supplemented by Sanger sequencing. However, comparing the cost, ease of analysis, and shorter turnaround time, it is a good alternative to exome sequencing for patients whose features are suggestive of a genetic etiology involving one of the genes in the panel.


Background
Congenital disorders comprise conditions present at birth or those that develop during infancy or early childhood. Presentations include structural abnormalities, neuromuscular disorders, developmental delay, and intellectual disability, which collectively affect more than 10 % of children. The European Surveillance of Congenital Anomalies (EUROCAT) reported the prevalence of major congenital anomalies to be about 2.4 % of live births [1], while the Centers for Disease Control and Prevention (CDC) reported 3.3 % for birth defects [2]. The prevalence of developmental disabilities is reported to be 13.9 % in the USA [3].
Less than half of these disorders have an identifiable cause such as aneuploidy, metabolic disorder, maternal infection, parental exposure to teratogenic agents, or intrapartum events. The remaining cases are thought to have a genetic etiology such as submicroscopic chromosomal abnormalities or rare single/multiple nucleotide changes. The former can be detected by chromosomal microarray analysis (CMA), which is now the recommended first-tier test for children with dysmorphism, multiple congenital anomalies, developmental delay/intellectual disability, and/or autism spectrum disorder [4]. Although CMA is more sensitive than conventional karyotyping, the diagnostic yield for this group of disorders is still only about 20 % in multiple studies [5][6][7]. Genetic causes for the rest are likely to be small deletions and insertions, balanced translocations involving gene disruptions, and point mutations, none of which can be detected by commonly used CMA platforms.
With massively parallel sequencing, many regions and even the entire genome can be interrogated simultaneously to identify such mutations. Although the cost of whole genome sequencing has become progressively lower in the last few years, data analysis and interpretation remain challenging. Because of the large number of short reads, the sequence data have to be mapped back to the reference genome and filtered through known databases to identify variants for each individual, leading to a long turnaround time from clinical testing to reporting. There is also the issue of incidental findings unrelated to the indication for testing, and the American College of Medical Genetics and Genomics (ACMG) has recommended the reporting of pathogenic variants for 56 genes [8]. Subsequently, the ACMG recommended that patients be given the choice of opting out of receiving such information [9]. For these reasons, many laboratories still use Sanger sequencing of single or a few genes when there are known causal genes for the suspected disorders.
Exome sequencing can partly overcome the issue of data throughput but not the possibility of incidental findings. Targeted gene panels can address both by focusing on a set of relevant candidate genes with known diagnostic yield, while providing cost-related advantage as well as easier data analysis without the need for specialized computing infrastructure and expertise. The American Society of Human Genetics (ASHG) also recommends that gene testing should be limited to single genes or targeted gene panels based on the clinical presentations of the patient [10]. Compared to Sanger sequencing of single genes, targeted gene panel sequencing has much higher throughput, but each design needs to be evaluated for coverage and sensitivity before being put to routine clinical diagnostic use.
Among several pre-designed catalog panels for pediatric congenital disorders, there is one comprising 180 genes located within chromosomal regions with a high frequency of cytogenetic abnormalities in constitutional disorders [11], according to publicly available data from the International Collaboration for Clinical Genomics (ICCG, previously known as International Standards for Cytogenomic Arrays or ISCA) [12,13]. To assess the coverage and sensitivity of this ICCG gene panel for high-throughput next-generation sequencing in congenital disorders, we used the Ion Torrent PGM platform to perform mutation screening of 15 pediatric patients with suspected genetic disorders.

Ethics statement
The patients were previously recruited under two separate projects (CIRB Ref: 2007/831/F and 2010/238/F). Approval to conduct this sequencing study was provided by the SingHealth Central Institutional Review Board (CIRB Ref: 2013/798/F). All the subjects were minors, and written informed consent had been obtained from the parents.

Study samples
The 15 patients were previously recruited from the hospital's Genetics Clinics for testing of chromosomal imbalance using human 400 K CGH arrays (Agilent Technologies Inc., Santa Clara, USA). No significant pathogenic copy number changes were identified in any of the 15. Inclusion criteria included developmental delay/intellectual disability and multiple congenital anomalies. Each patient had been followed up and examined by a clinical geneticist. All of them had clinical features suggestive of a disorder associated with one of the 180 genes, although the features may not have been typical or may not have completely fulfilled the clinical criteria of a specific syndrome at the time of recruitment.

DNA extraction
Genomic DNA was manually extracted from peripheral blood collected in EDTA tubes using the Gentra Puregene Blood Kit (Qiagen Inc., Valencia, USA) according to the manufacturer's instructions. DNA quality and quantity were measured on a Nanodrop Spectrophotometer (Thermo Scientific, Wilmington, USA).

Library construction, sequencing, and data analysis
Genomic DNA (225 ng gDNA) was digested with 16 different restriction enzymes at 37°C for 30 min to create a library of gDNA restriction fragments. Both ends of the targeted fragments were selectively hybridized to biotinylated probes from the HaloPlex ICCG panel (Agilent Technologies Inc., Santa Clara, CA, USA), which resulted in direct fragment circularization. During the 16-h hybridization process, HaloPlex ION Barcodes and Ion Torrent sequencing motifs were incorporated into the
The data from the sequencing runs were analyzed using the Torrent Suite v4.0.2 analysis pipeline, which includes raw sequencing data processing (DAT processing), splitting of the reads according to the barcode for the individual sample output sequence, classification, signal processing, base calling, read filtering, adapter trimming, and alignment QC. Single-nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs), insertions, and deletions were identified across the targeted subset of the reference using the Torrent Variant Caller plug-in (v4.0-r76860), with the parameter settings optimized for germ-line high-frequency variants and minimal false positive calls. The output variant call format (VCF) file was then annotated through the web-based interface of GeneTalk (GeneTalk GmbH, Berlin, Germany) and the Ensembl Variant Effect Predictor [14].
Sequence variants were compared with data in dbSNP, 1000 Genomes, and the Human Gene Mutation Database (HGMD). Variants that had not previously been reported in healthy controls, or that had previously been classified as pathogenic, were evaluated for coverage depth and visually inspected using the Integrative Genomics Viewer before validation by dideoxy sequencing using the standard protocol for the BigDye® Terminator v3.1 Cycle Sequencing Kit (Life Technologies, Carlsbad, CA, USA). Sequencing was carried out on the Applied Biosystems® 3130 Genetic Analyzer (Life Technologies, Carlsbad, CA, USA). Segregation analysis was performed when DNA from family members was available. In addition, SIFT (sift.bii.a-star.edu.sg) and PolyPhen-2 (genetics.bwh.harvard.edu/pph2) were used to check the likely functional significance of missense variants for clinical interpretation.
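The prioritization rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, the placeholder gene names, and the 20× depth cut-off applied at this step are all hypothetical, and the database look-ups are reduced to pre-computed boolean flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified record for one annotated variant call."""
    gene: str
    hgvs: str
    depth: int                # read depth at the variant position
    in_population_db: bool    # seen in dbSNP / 1000 Genomes (healthy controls)
    known_pathogenic: bool    # previously classified as pathogenic (e.g. in HGMD)


# Assumed cut-off; the text only states that coverage depth was evaluated.
MIN_DEPTH = 20


def shortlist(variants, min_depth=MIN_DEPTH):
    """Keep variants that are absent from control databases or already
    classified as pathogenic, and that have enough supporting depth."""
    return [
        v for v in variants
        if (not v.in_population_db or v.known_pathogenic) and v.depth >= min_depth
    ]


# Placeholder calls, not data from the study.
calls = [
    Variant("GENE_A", "c.100C>T", depth=250, in_population_db=True, known_pathogenic=False),
    Variant("GENE_B", "c.200G>A", depth=180, in_population_db=False, known_pathogenic=False),
    Variant("GENE_C", "c.300del", depth=12, in_population_db=False, known_pathogenic=False),
    Variant("GENE_D", "c.400C>T", depth=90, in_population_db=True, known_pathogenic=True),
]

candidates = shortlist(calls)
for v in candidates:
    print(v.gene, v.hgvs, v.depth)
```

In this sketch the shortlisted variants would then go on to visual inspection and Sanger confirmation; a common polymorphism (GENE_A) and a low-depth call (GENE_C) are dropped.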

Results
An average of 790 Mb was generated per chip (range 748-828 Mb). Loading densities of the targeted sequencing of four libraries (four samples were multiplexed in each library) ranged from 75 to 81 %. The total number of reads (usable sequence) ranged from 5.8 to 6.4 M, and average read length ranged from 124 to 131 bp. After filtering out polyclonal reads, low-quality reads, and primer dimers, the percentage of usable reads ranged from 69 to 73 %. On average, each sample yielded 196 M bases from 1.5 M reads (Table 1 and Fig. 1) from 58,670 amplicons with a mean read length of 128 bp. One sample was sequenced twice, with nearly identical output obtained for both runs. The numbers of reads were 1,552,042 and 1,556,202 for total reads and 1,522,728 and 1,524,576 for mapped reads, and the total numbers of bases sequenced were 199,024,281 and 200,813,003.
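As a quick consistency check on these figures, the per-run mean read length and mapped fraction can be recomputed from the replicate counts quoted above. The sketch below assumes that the reported base totals refer to all reads of the run (not only mapped reads); the dictionary keys and the `summarize` helper are illustrative names only.

```python
# Replicate runs of the same sample, using the counts reported in the text.
runs = [
    {"total_reads": 1_552_042, "mapped_reads": 1_522_728, "bases": 199_024_281},
    {"total_reads": 1_556_202, "mapped_reads": 1_524_576, "bases": 200_813_003},
]


def summarize(run):
    """Derive mean read length (bp) and mapped fraction for one run."""
    return {
        "mean_read_length_bp": run["bases"] / run["total_reads"],
        "mapped_fraction": run["mapped_reads"] / run["total_reads"],
    }


summaries = [summarize(r) for r in runs]
for s in summaries:
    print(f"{s['mean_read_length_bp']:.1f} bp, {s['mapped_fraction']:.1%} mapped")

# Rough per-sample yield from the rounded averages (1.5 M reads x 128 bp),
# to compare against the reported average of about 196 M bases.
approx_bases_per_sample = 1.5e6 * 128
print(f"{approx_bases_per_sample / 1e6:.0f} M bases")
```

Both replicates come out close to the reported mean read length of 128 bp, and the rounded product of reads and read length lands within a few percent of the reported per-sample yield.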
Approximately 97.4 % of the reads were aligned to the reference genome (hg19) and 91 % mapped to the target regions, with average base coverage ranging from 203× to 256× for individual samples. In total, 97.7 % of the targets had a minimum read depth of 20×, 95.6 % were covered at >50×, and 88.2 % at >100×. Full coverage was achieved for more than 95 % of targets in all 15 samples, and most (approximately 89.9 %) target bases did not show any bias toward forward or reverse strand read alignment. The average total coverage of all targeted bases was 95.7 % at 20× and 82.38 % at 100×. Coverage was also uniform across all samples. More than 88 % of called bases had a quality score of ≥Q20 (Table 1).
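Coverage breadth figures of this kind are derived from per-base read depths over the target regions. The following minimal sketch shows the calculation; the `breadth_at` function, the use of "at least" as the comparison, and the toy depth values are assumptions for illustration and do not reproduce the Torrent Suite coverage report.

```python
def breadth_at(depths, thresholds=(20, 50, 100)):
    """Return the fraction of target bases with depth >= each threshold.

    `depths` is a sequence of per-base read depths over the target regions.
    """
    if not depths:
        raise ValueError("no target bases supplied")
    n = len(depths)
    return {t: sum(d >= t for d in depths) / n for t in thresholds}


# Toy per-base depths, not data from the study.
toy_depths = [0, 15, 20, 55, 120, 300, 250, 75, 30, 101]

breadth = breadth_at(toy_depths)
for threshold, fraction in breadth.items():
    print(f">={threshold}x: {fraction:.0%} of target bases")
```

In practice the per-base depths would come from an alignment-based depth report restricted to the panel's target regions.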
At the gene level, 137 of the 180 genes had mean coverage of at least 20×, of which 99 had a mean of >50× and 40 had a mean of >100× (Table 2). Despite the high target region coverage, amplification failed for at least 26 exons across the 180 genes. Thirteen genes (CFC1, CHRNA7, CYP21A2, EHMT1, F8, HBA1, HBA2, IKBKG, NOTCH2, PKD1, SGCE, SRY, TSC2) had at least one region that was not amplified and therefore not sequenced (lowest number of reads "0" in Table 2). The sequencing coverage of CFC1, IKBKG, HBA1, and HBA2 was low, with <50 % of these genes sequenced at >20× (Table 3). The gene with the highest mean coverage was SALL1 (358×). The poorest coverage was for CFC1. Mean read depths for individual exons of three different genes are shown in Figs. 2, 3, and 4.
Overall, 2326 single-nucleotide variants (SNVs) and 25 indels were identified in the 15 patients. These variants, identified from the Ion Reporter, had an average coverage of 595× and an average Qscore of 38. Variant annotation indicated that 2203 were common variants present in the dbSNP and 1000 Genomes Project databases. The number of variants ranged from 154 to 175 per patient, with an average of 9.6 novel variants each. Synonymous variants were the most common.
Variants were prioritized for Sanger confirmation based on the individual's clinical presentations. Pathogenic variants were confirmed in six patients. The identified CHD7 (two patients), SHH, TCF4, TSC2, and MECP2 variants and the clinical features of these six patients are listed in Table 4. Another five patients had candidate variants for which further evaluation and segregation analysis are ongoing.

Discussion
The HaloPlex ICCG panel is a pre-designed made-to-order panel targeting 180 genes. It follows the ICCG recommendations for design and resolution and is available through SureDesign from Agilent Technologies. The targeted panel includes genes in the most commonly altered chromosomal regions according to the ISCA/ICCG database. The 180 genes are covered by 2509 target regions which range in size from 2 to 6575 nucleotides.
Depending on its size, a region is covered by between 1 and 547 amplicons. The recommended minimum read depth for clinical diagnostic sequencing is 20× [15,16], which was achieved for over 90 % of the target for 170 genes. For CHD7, even the exon with the poorest coverage had a mean of 36× (Fig. 2). Of the remaining ten, four genes had 80-90 % coverage, and the other six (CFC1, CYP21A2, HBA1, HBA2, IKBKG, NOTCH2, PLP1) had <80 %. More than half of the targets in these individual genes are within GC-rich regions. Less efficient PCR for these templates might have resulted in sequencing failure during library preparation, or in insufficient sequence data being produced [17]. In addition, the HaloPlex protocol uses restriction enzymes, which are sequence-dependent and nonrandom; this might have contributed further to uneven coverage and to gaps in coverage [18]. For IKBKG, the presence of a pseudogene might have caused non-specific alignment and contributed to the low capture of target sequences [19]. Nijman et al. reported almost no mapped reads in IKBKG in their targeted sequencing, and generally poor coverage of CFC1 and IKBKG has been reported in multiple studies [20][21][22]. For CFC1, the gene with the poorest coverage, all six exons had no reads across all 15 samples. This gene is associated with the generation of left-right asymmetry via the TGF-β pathway. There are 23 CFC1 mutations listed in HGMD, 13 of which were found in patients with congenital heart disease [23]. This panel would not be useful for patients with clinical suspicion of CFC1 gene mutations.
The first exon of 64 genes was not included in the design (indicated with "*" in Table 2). All 64 genes have one or more non-coding exons. The entire exon 1 of these genes (and additional exons for some others) contains only untranslated regions. In general, amplification of exon 1 of some genes was problematic because of the generally higher GC content and sequence complexity [24][25][26]. Our results showed that MECP2 had an average target base read depth of 118×. The coverage for exon 1 was the lowest among all its exons, but it was still twice the minimum of 20× recommended for clinical diagnostics (Fig. 3). SATB2 had an average target base read depth of 300×, but exon 1 was not covered in the design (Fig. 4). Nevertheless, including non-coding exons in the design might improve the yield of NGS, as variants affecting splicing of non-coding exons have been reported to be disease-causing [27].
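The grouping of genes by the proportion of target bases reaching the 20× minimum can be expressed as a simple tiering rule. The sketch below is illustrative only: the tier labels, the `coverage_tier` helper, the placeholder gene names, and the example fractions are hypothetical, and the suggestion to flag lower tiers for Sanger supplementation follows the general recommendation in the text, not a validated laboratory rule.

```python
def coverage_tier(fraction_at_20x):
    """Assign a gene to a tier from the fraction of its target bases at >=20x.

    Thresholds mirror the groups described in the text:
    over 90 %, 80-90 %, and below 80 %.
    """
    if not 0.0 <= fraction_at_20x <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction_at_20x > 0.90:
        return "adequate"
    if fraction_at_20x >= 0.80:
        return "partial"
    return "poor"


# Placeholder per-gene fractions, not values from Table 2 or Table 3.
toy_genes = {"GENE_A": 0.99, "GENE_B": 0.85, "GENE_C": 0.42, "GENE_D": 0.0}

tiers = {gene: coverage_tier(frac) for gene, frac in toy_genes.items()}

# Genes below the "adequate" tier are candidates for supplementary
# Sanger sequencing of the poorly covered regions.
needs_follow_up = sorted(g for g, t in tiers.items() if t != "adequate")
print(tiers)
print(needs_follow_up)
```

A gene with no reads at all, as reported here for CFC1, falls into the lowest tier, which is where the panel offers no diagnostic value on its own.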
Many congenital disorders do not have unique and exclusive features, and the presentations may be nonspecific. Even for syndromic disorders, there are overlapping features, and the phenotypic features in some patients may be atypical, making it challenging for clinical geneticists to reach a diagnosis based on clinical history and examination. All 15 patients in this study had constitutional disorders and a suspicion of chromosomal disorders, but CMA did not find any pathogenic copy number abnormality. With this targeted panel, we were able to reach a molecular diagnosis for six patients after reviewing the results with their primary physicians (Table 4). Pathogenic CHD7 variants were detected in two patients with clinical features consistent with CHARGE syndrome. Both CHD7 variants identified (p.R2613X and p.Q201X) have been previously reported in other CHARGE patients [28]. A pathogenic p.R255X MECP2 variant was detected in a patient with clinical features of Rett syndrome. This variant has also been reported previously [29]. The patients with the truncating TSC2 variant and the missense SHH variant also showed clinical features consistent with the respective causative genes. These two variants are novel, and the missense variant is predicted to be pathogenic according to both SIFT and PolyPhen-2. Similarly, the clinical features of the patient with the TCF4 variant were found to be consistent with Pitt-Hopkins syndrome upon retrospective review of the patient's progressive features by the attending physician. This p.R580Q TCF4 variant has been reported as pathogenic in patients with Pitt-Hopkins syndrome [30]. The identification of a patient's causative mutation has the translational benefit of providing the parents with an answer for their child's condition. In addition, it provides a guide to the attending clinician on the management and prognosis of the patient.
A molecular diagnosis would also facilitate access to clinical trials and programs for special needs children. The use of appropriate gene panels obviates the need for a subjective clinical decision on which gene(s) to test in each patient, and may lead to a standard testing workflow for each group of disorders. Generally, for those whose diagnosis can be narrowed down to a few suspected genetic syndromes, targeted gene panels would be superior to exome sequencing, which has more limitations in the diagnostic setting due to coverage deficiencies in some genes and longer turnaround time. A higher average read depth could be attained at a lower cost, making the panel superior to exome sequencing in terms of cost, sensitivity, and expected diagnostic yield [31,32].

Conclusions
The HaloPlex ICCG panel had good coverage except for ten of the target genes. The low coverage for some regions in several genes has to be taken into account, and these regions might have to be supplemented by Sanger sequencing. However, comparing the cost, ease of analysis, and shorter turnaround time, it is a good alternative to exome sequencing for patients whose features are suggestive of a genetic etiology involving one of the genes in the panel.