Validation of a semiconductor next-generation sequencing assay for the clinical genetic screening of CFTR

Genetic testing for cystic fibrosis and CFTR-related disorders mostly relies on laborious molecular tools that use Sanger sequencing to scan for mutations in the CFTR gene. We have explored a more efficient genetic screening strategy based on next-generation sequencing (NGS) of the CFTR gene. We validated this approach in a cohort of 177 patients with previously known CFTR mutations and polymorphisms. Genomic DNA was amplified using the Ion AmpliSeq™ CFTR panel. The DNA libraries were pooled, barcoded, and sequenced using an Ion Torrent PGM sequencer. The combination of different robust bioinformatics tools allowed us to detect previously known pathogenic mutations and polymorphisms in the 177 samples, without detecting spurious pathogenic calls. In summary, the assay achieves a sensitivity of 94.45% (95% CI: 92% to 96.9%), with a specificity of detecting nonvariant sites from the CFTR reference sequence of 100% (95% CI: 100% to 100%), a positive predictive value of 100% (95% CI: 100% to 100%), and a negative predictive value of 99.99% (95% CI: 99.99% to 100%). In addition, we describe the observed allelic frequencies of 94 unique definitely and likely pathogenic, uncertain, and neutral CFTR variants, some of them not previously annotated in the public databases. Strikingly, a seven exon spanning deletion as well as several more technically challenging variants such as pathogenic poly-thymidine-guanine and poly-thymidine (poly-TG-T) tracts were also detected. Targeted NGS is ready to substitute classical molecular methods to perform genetic testing on the CFTR gene.

The challenge in CFTR genetic screening resides in its high allelic heterogeneity, with more than 1900 sequence variants reported (Cystic Fibrosis Mutation Database, January 2015, http://www.genet.sickkids.on.ca) since the discovery of the gene 25 years ago (Riordan et al. 1989; Rommens et al. 1989). Although one mutation (deltaF508) accounts for about 70% of CF alleles worldwide (Bobadilla et al. 2002), diverse heritages are reflected in the CFTR gene, and its variants are distributed with varying frequencies among populations, often complicating genetic analysis (Estivill et al. 1997). The current guidelines of the American College of Medical Genetics and Genomics (ACMG) recommend a panel of only 23 variants for population-based CF carrier screening (Watson et al. 2004), leaving the vast majority of possible genotype changes untested.
To date, the identification of CFTR pathologic variants relies on commercial tests that screen for specific common mutations and/or laborious direct Sanger sequencing of the moderately large CFTR gene (27 exons) (Nakano et al. 2014). Yet, more effective next-generation sequencing (NGS) technologies are rapidly being tested and introduced into clinical practice (Grosu et al. 2014). Here, we validated an NGS analysis pipeline based on the Ion Torrent PGM benchtop next-generation sequencer and the Ion AmpliSeq™ CFTR Panel (Life Technologies, Carlsbad, CA) combined with robust bioinformatics tools for CFTR genetic screening. To investigate its applicability to clinical genetic diagnostics, we performed a broad analysis of CFTR variants in 177 previously characterized patients with diverse CF and CFTR-related phenotypes. We were able to identify genetic alterations in CFTR, including single nucleotide variants (SNVs), insertions and deletions (InDels), and structural variants (SVs). Strikingly, a seven exon spanning deletion as well as several more technically challenging variants such as pathogenic poly-thymidine-guanine and poly-thymidine (poly-TG-T) tracts were detected.

DNA extraction
DNA was isolated from ethylenediaminetetraacetic acid (EDTA) blood using two automated procedures. The spin-column-based extraction was performed on the QIAcube instrument with the QIAamp DNA Blood Mini QIAcube Kit (Qiagen, Valencia, CA) following the manufacturer's instructions. Alternatively, the QIAsymphony DSP DNA Mini Kit (Qiagen) on the QIAsymphony instrument was used to purify the DNA from blood. Following extraction, all DNA samples were stored at −20°C. Prior to the analysis, the DNA quality and concentration were determined photometrically (OD260/OD280 of 1.8-2.0).

Amplicon library construction
The target regions in the CFTR gene were amplified using the Ion AmpliSeq™ CFTR Panel (Life Technologies). It consists of two primer pools (102 amplicons) that target the entire coding region of the gene, including 10-20 bp of intronic flanking sequences around all coding exons. To amplify each library, 4 µL of 5X Ion AmpliSeq™ HiFi mix, 10 µL of 2X Ion AmpliSeq™ primer pool (each of the two pools in a separate well for each sample), 10 ng of gDNA per reaction (2 µL of 5 ng/µL stock), and 4 µL of nuclease-free water were mixed together. The following temperature profile was applied to the final 20 µL of PCR mixture: 99°C for 2 min; 19 cycles of 99°C for 15 sec and 60°C for 4 min; with a final hold at 10°C. Then primer sequences were partially digested, and adapters and barcodes were ligated to the amplicons as described in the Ion AmpliSeq™ library preparation manual. Each library was marked with a unique adapter provided in the Ion Xpress™ barcode adapters 1-96 Kit (Life Technologies). Purified libraries were quantified with the Qubit® 2.0 fluorometer (Life Technologies) using the Qubit® dsDNA HS assay kit, diluted to ~100 pmol/L, and combined in equimolar proportion. Freshly prepared library stock dilutions were used on the same day for the preparation of enriched, template-positive ion sphere particles (ISPs). Automated protocols were run on the Ion OneTouch™ 2 System and the Ion OneTouch™ ES Instrument (Life Technologies) according to the version of the user guide and using the 200 bp chemistry kits.
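The reaction volumes and the dilution to ~100 pmol/L described above can be checked with a short calculation. The following is a minimal sketch, not the procedure of the library preparation manual: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, and the function names and the mean fragment length passed in are hypothetical inputs that would have to be set for the actual library.

```python
# Reaction components in microlitres, as listed in the text.
REACTION_MIX_UL = {
    "5X HiFi mix": 4.0,
    "2X primer pool": 10.0,
    "gDNA (5 ng/uL stock)": 2.0,
    "nuclease-free water": 4.0,
}

# Assumption: average molar mass of one base pair of dsDNA (g/mol).
BP_MOLAR_MASS = 660.0


def library_molarity_pmol_per_l(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) into a molarity (pmol/L)."""
    grams_per_litre = conc_ng_per_ul * 1e-3  # 1 ng/uL equals 1e-3 g/L
    mol_per_litre = grams_per_litre / (BP_MOLAR_MASS * mean_fragment_bp)
    return mol_per_litre * 1e12


def dilution_factor(conc_ng_per_ul: float, mean_fragment_bp: float,
                    target_pmol_per_l: float = 100.0) -> float:
    """Fold dilution needed to bring the library to the target molarity."""
    return library_molarity_pmol_per_l(conc_ng_per_ul, mean_fragment_bp) / target_pmol_per_l


if __name__ == "__main__":
    print("Total reaction volume (uL):", sum(REACTION_MIX_UL.values()))
    # Hypothetical library: 1 ng/uL with a mean fragment length of 200 bp.
    print("Fold dilution:", round(dilution_factor(1.0, 200.0), 1))
```

The gDNA input follows from the same numbers: 2 µL of a 5 ng/µL stock gives the 10 ng per reaction stated in the text.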

Sequencing on the Ion Torrent platform
All barcoded samples were sequenced on the PGM (Life Technologies) using 318 chips, with up to 48 samples on a single chip per sequencing run. The chip loading procedure was performed twice according to the user guide for the Ion PGM™ Sequencing 200 Kit v2.

Data analysis
Raw sequence data analysis, including base calling, demultiplexing, alignment to the hg19 human reference genome (Genome Reference Consortium GRCh37), and variant calling, was performed using the Torrent Suite Software v.4.0.2 (Life Technologies). For the variantCaller plugin we used the optimized parameters for the CFTR panel. Variants were annotated using Annovar (Wang et al. 2010) and in-house ad hoc bioinformatics tools. Alignments were visually verified with the Integrative Genomics Viewer v.2.1 (Robinson et al. 2011) and Alamut v.2.2 (Interactive Biosoftware, Rouen, France).
Variant analysis was performed without bias using a cascade of filtering steps described previously (Walsh et al. 2010). The reference sequence used for CFTR was NM_000492.3. All candidate variants were required to be present on both sequenced DNA strands and to account for ≥20% of total reads at that site, with a minimum depth of coverage of 80X. Common polymorphisms (≥5% in the general population) were discarded by comparison with dbSNP138, the 1000G (January 2015, http://www.1000genomes.org), the Exome Variant Server (January 2015, http://evs.gs.washington.edu), and an in-house exome variant database to filter out both common benign variants and recurrent artifact variant calls. However, as these databases also contain known disease-associated mutations, all detected variants were compared to our internal mutation database (CentoMD®) and HGMD® to directly identify and annotate changes previously described in the literature as definitely and likely pathogenic, uncertain, and neutral variants.
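The filtering cascade described above can be sketched as follows. This is an illustrative outline only, not the pipeline used in the study: the `VariantCall` record and its field names are hypothetical, and the rule that a variant listed in a curated mutation database is kept even when it is common is our reading of the text rather than a documented implementation detail.

```python
from dataclasses import dataclass

MIN_DEPTH = 80               # minimum depth of coverage (reads), from the text
MIN_ALLELE_FRACTION = 0.20   # variant must account for >= 20% of reads
MAX_POPULATION_FREQ = 0.05   # variants at >= 5% are treated as common polymorphisms


@dataclass
class VariantCall:
    """Hypothetical record for one candidate variant."""
    name: str
    depth: int                # total reads at the site
    alt_reads_forward: int    # variant-supporting reads, forward strand
    alt_reads_reverse: int    # variant-supporting reads, reverse strand
    population_freq: float    # highest frequency across population databases
    in_mutation_db: bool      # listed in a curated disease-variant database


def passes_technical_filters(v: VariantCall) -> bool:
    """Depth, strand support, and allele fraction criteria."""
    alt_reads = v.alt_reads_forward + v.alt_reads_reverse
    return (
        v.depth >= MIN_DEPTH
        and v.alt_reads_forward > 0
        and v.alt_reads_reverse > 0
        and alt_reads / v.depth >= MIN_ALLELE_FRACTION
    )


def keep_for_review(v: VariantCall) -> bool:
    """Keep rare variants, plus any variant with a curated database entry (assumption)."""
    if not passes_technical_filters(v):
        return False
    return v.in_mutation_db or v.population_freq < MAX_POPULATION_FREQ
```

Separating the technical criteria from the population frequency step makes it possible to report why a given call was rejected.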
The 95% confidence intervals (CI) were calculated by statistical inference using the standard deviation (SD) (Mattocks et al. 2010;Chan et al. 2012). In instances where there were no false positives (SD = 0), the 95% CI were produced with the Wilson score method (Newcombe 1998).
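For readers who wish to compute intervals of this kind, the sketch below implements the Wilson score interval and, as an assumption about what "statistical inference using the standard deviation" means, a normal approximation interval for a proportion. It is not claimed to reproduce the exact intervals reported in this article, which may have been derived differently.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes: int, total: int, z: float = Z_95) -> tuple:
    """Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def normal_approx_interval(successes: int, total: int, z: float = Z_95) -> tuple:
    """Normal approximation interval; it collapses to zero width when p is 0 or 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half), min(1.0, p + half)
```

When no false positives are observed, the normal approximation has zero width, whereas the Wilson lower bound equals n / (n + z²), which is why a score-based method is the usual choice in that case.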

Evaluation of the pathogenicity of the variants
Evaluation of the pathogenicity of the variants not previously described in the literature and absent from the CentoMD® and HGMD® databases was performed with the following criteria. Mutations predicted to result in a prematurely truncated protein (nonsense mutations, frameshift mutations, and large genomic rearrangements), as well as canonical splice site mutations, were classified as definitely pathogenic. Missense variants were considered a priori unclassified sequence variants (UCV), and their potential pathogenicity was evaluated taking into consideration the biophysical and biochemical difference between the wild-type and mutant amino acid, the evolutionary conservation of the amino acid residue in orthologs (Tavtigian et al. 2006), a number of in silico predictors (SIFT, PolyPhen, MutationTaster, and Condel), and population data. UCV were then classified into three groups: likely pathogenic, neutral, and variants of uncertain significance when conflicting information about their functionality had previously been published. Noncanonical splicing variants were analyzed using Alamut version 2.2 (Interactive Biosoftware), a software package that uses different splice site prediction programs to compare the normal and variant sequences for differences in potential regulatory signals.
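The first triage step of these criteria can be written as a simple rule table. This is only a schematic sketch: the consequence labels are hypothetical strings, and the subsequent expert evaluation of missense and noncanonical splicing variants (conservation, in silico predictors, population data) is deliberately not modeled.

```python
# Hypothetical consequence labels for variant types that the criteria
# classify directly as definitely pathogenic.
DEFINITELY_PATHOGENIC_TYPES = {
    "nonsense",
    "frameshift",
    "large_genomic_rearrangement",
    "canonical_splice_site",
}


def initial_classification(consequence: str) -> str:
    """Return the a priori class of a novel variant from its predicted consequence."""
    if consequence in DEFINITELY_PATHOGENIC_TYPES:
        return "definitely pathogenic"
    if consequence == "missense":
        # Refined later into likely pathogenic, neutral, or uncertain.
        return "unclassified sequence variant (UCV)"
    if consequence == "noncanonical_splice":
        return "requires splice site prediction"
    return "not covered by these criteria"
```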

Sequencing statistics
The Ion AmpliSeq™ CFTR Panel (Life Technologies) generates 102 amplicons of 150 bp on average that cover all targeted coding exons and exon-intron boundaries (including 10-20 bases of flanking sequences around all targeted coding exons) of the CFTR gene. It has been designed to yield sequence coverage redundancy with overlapping amplicons across exons. Sequencing of the CFTR gene in the 177 patients generated a mean of 90,650 reads per patient. On average, 98% of these reads mapped to the targeted regions of CFTR. An evenly distributed mean depth of coverage of 852X for CFTR was achieved (Table 1). Ninety-four percent of the targeted base pairs of CFTR were covered by more than 100 reads. To determine whether coverage was substantially lower for any region, we calculated the proportion of base pairs that were covered by <50 reads, which is the minimum that we required to perform variant calling. These poorly covered regions accounted for 2.35% of CFTR targeted base pairs, all of them randomly spread over intronic regions at the ends of the amplicons and sequencing reads.
From these data, we can conclude that all samples were uniformly covered at depths that in all cases far exceed the minimum coverage required for reliable variant calling (Fig. 1). The minor differences between samples were neutralized by the high overall coverage achieved by the assay. The sequence quality metrics of these data warrant a confident detection of variants in all patients.
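The coverage statistics reported in this section (mean depth, fraction of bases above 100 reads, fraction below 50 reads) can be derived from per-base depths along the following lines. This is a minimal sketch: the function name and the input, a plain list of per-base read depths over the targeted bases, are assumptions, and producing that list from the alignments is outside its scope.

```python
from statistics import mean


def coverage_summary(depths, low_threshold=50, high_threshold=100):
    """Summarise per-base read depths over the targeted region.

    depths: sequence of integers, one read depth per targeted base.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    return {
        "mean_depth": mean(depths),
        # Fraction of bases covered by more than high_threshold reads.
        "fraction_above_high": sum(d > high_threshold for d in depths) / n,
        # Fraction of bases covered by fewer than low_threshold reads.
        "fraction_below_low": sum(d < low_threshold for d in depths) / n,
    }
```

Bases counted in `fraction_below_low` correspond to the poorly covered positions that would be flagged for gap filling.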

Detection of CFTR variants
The samples for this study were selected to include as many different types of CFTR variants as possible, simulating a real-world diagnostics scenario, so that we could test the performance of the NGS assay for different types of genetic variation. To assess the sensitivity of the assay, we blindly inspected all mapped sequence reads from the 177 samples, whose sequence variants had previously been defined with the conventional diagnostic workflow. We identified 630 of 667 previously known mutations and variants in their correct zygosity status, including SNVs, InDels, and large SVs, achieving a sensitivity of 94.45% (95% CI: 92% to 96.9%). The 37 false negatives accounted for a total of four unique variants: c.744-37_744-34delATTA (seen in 2 patients), c.744-9_744-6del (seen in 30 patients), c.1647T>G (seen in 1 patient), and c.3718-2531A>G (seen in 4 patients). All of them are intronic benign variants, except c.1647T>G, which has been previously reported as pathogenic. All false negatives were a consequence of the absence of sequencing reads at their loci in the affected patients. However, these regions were identified by the bioinformatics pipeline and automatically reported as target regions with low or nonexistent sequence coverage that required Sanger sequencing repeats to fill the gaps in the NGS data.
To assess the specificity of the assay across the targeted bases of the CFTR gene, we evaluated all sequenced positions previously screened by Sanger sequencing. Genotype data were available across the 177 patients for a total of 1,823,100 sites within the targeted regions of CFTR. Specificity of detecting nonvariant sites from the CFTR reference sequence was 100% (i.e. 0 false positive calls). We also inspected the CFTR sequencing depth profile of all the patients with the aim of detecting large SVs. Notably, a previously described homozygous large deletion spanning exons 4-11 was detected in one of the patients (Fig. 2), confirming the previous MLPA results.
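The inspection of the depth profile for a homozygous deletion can be illustrated with a simple heuristic that looks for runs of consecutive exons with almost no reads in an otherwise well-covered sample. This is a hedged sketch rather than the method used in the study: the function name, the 5% threshold, and the minimum run length are hypothetical choices, and heterozygous deletions, which only reduce depth, would need a comparison against reference samples that is not shown here.

```python
from statistics import median


def find_dropout_runs(exon_depths, min_fraction=0.05, min_run=2):
    """Find runs of consecutive exons whose depth is far below the sample median.

    exon_depths: list of (exon_label, mean_depth) tuples in genomic order.
    Returns a list of runs, each a list of exon labels.
    """
    threshold = median(depth for _, depth in exon_depths) * min_fraction
    runs, current = [], []
    for label, depth in exon_depths:
        if depth < threshold:
            current.append(label)
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = []
    if len(current) >= min_run:  # close a run that reaches the last exon
        runs.append(current)
    return runs


if __name__ == "__main__":
    # Simulated profile: 27 exons at ~800X with exons 4-11 absent.
    profile = [(i, 0 if 4 <= i <= 11 else 800) for i in range(1, 28)]
    print(find_dropout_runs(profile))
```

Any run flagged this way would still require confirmation by an orthogonal method such as MLPA, as was done for the deletion reported here.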
An overview of the definitely pathogenic mutations, likely pathogenic mutations, uncertain, and neutral variants in these samples, as confirmed by conventional Sanger sequencing and MLPA, is listed in Table 2.
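The performance figures of the assay follow from the usual confusion matrix definitions, sketched below. The true positive, false negative, and false positive counts are taken from the text (630, 37, and 0). The true negative count is an assumption made only for illustration: it is obtained by subtracting the 667 known variant sites from the 1,823,100 genotyped sites, which the article does not state explicitly.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, specificity, and predictive values from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    TOTAL_SITES = 1_823_100     # genotyped sites, from the text
    KNOWN_VARIANTS = 667        # previously known variants, from the text
    ASSUMED_TRUE_NEGATIVES = TOTAL_SITES - KNOWN_VARIANTS  # assumption, see lead-in

    metrics = diagnostic_metrics(tp=630, fn=37, fp=0, tn=ASSUMED_TRUE_NEGATIVES)
    for name, value in metrics.items():
        print(f"{name}: {value * 100:.3f}%")
```

With zero false positives, specificity and positive predictive value are 100% whatever the true negative count is, so only the negative predictive value depends on the assumed denominator.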

Discussion
The accurate diagnosis of CF combines clinical evaluation, in particular medical symptoms of the CF phenotype and sweat test measurements, with CFTR genetic testing. To date, the molecular characterization of CFTR mutations in a given sample relies on commercial tests that screen for specific common mutations. Test panels range from 4 to 70 CFTR mutations and comprise technologies such as reverse dot blot INNO-LIPA CFTR (Innogenetics, Gent, Belgium), Cystic Fibrosis Genotyping Assay/OLA (Abbott, Chicago, IL), Elucigene CF-EU2 (Elucigene, Manchester, United Kingdom), and xTAG Cystic Fibrosis 71 kit v2 (Luminex, Austin, TX), among others. More recently, an FDA-approved NGS-based platform from Illumina that screens for 139 CFTR mutations has been released (Grosu et al. 2014). The detection rate of these panels varies depending on the mutations included and the molecular heterogeneity of each population. For many patients with common CFTR mutations that are present in these commercial panels, there is no need for additional studies. However, the high heterogeneity of CFTR mutations in some CF populations and in CFTR-RD often requires the complete molecular screening of the 27 exons and the regulatory regions of CFTR, a potentially costly and labor-intensive task.
The purpose of this study was to evaluate and establish an NGS workflow based on the Ion Torrent PGM benchtop next-generation sequencer (Life Technologies) as a routine method for comprehensive genetic screening of CFTR for CF and CFTR-related diagnostics. We show that it can be easily incorporated into clinical practice with low cost and short turnaround time. In addition, this strategy offers a complete characterization of the gene, including the 23-mutation panel recommended by the ACMG, without the need for stepwise testing and choosing which exon to sequence first. Although we did not assess carry-over, specimen stability, or intra- or interassay precision, we performed a comprehensive NGS versus Sanger genotype comparison that has statistically validated, in terms of both sensitivity and specificity, the Ion Torrent PGM benchtop NGS method for routine CF screening. The bioinformatics pipeline applied here shows high sensitivity, specificity, and predictive values in detecting different classes of sequence variants. We are able to identify the most important genetic alterations in CFTR, including SNVs, InDels, and large SVs. Strikingly, a seven exon spanning deletion as well as several more technically challenging variants such as the pathogenic poly-TG-T haplotypes were detected.
Recently, different NGS platforms and genomic enrichment strategies have been tested for the identification of sequence variants in CFTR, demonstrating performance comparable to that of the assay described here in terms of specificity, sensitivity, and time- and cost-effectiveness (Trujillano et al. 2013; Grosu et al. 2014). In our case, we decided to adopt the PGM in combination with the Ion AmpliSeq™ CFTR Panel (Life Technologies) in our CFTR diagnostics workflow because it delivers a fast turnaround time (TAT) coupled with throughput flexibility, enabling rapid time-to-results in processing either a small or a large number of samples. In addition, it offers fast library construction for affordable targeted sequencing of the CFTR gene, based on ultrahigh-multiplex PCR and requiring as little as 10 ng of input DNA. All these arguments make this system a convenient NGS configuration that can be easily adopted by diagnostic labs as an accurate, economical, and easy-to-implement end-to-end solution.
In routine diagnostics, the current stepwise Sanger sequencing approach, which requires choosing which genetic region to sequence first, often becomes time consuming and expensive. Additionally, Sanger sequencing has been shown to be incomplete in comparison to NGS in terms of the identification of disease-causing variants and intricate CFTR regions such as poly-TG-T (Chen and Prada 2014). Our NGS-based strategy not only enables rapid delivery of highly accurate results in processing either small or large sample numbers, but also offers fast library construction for affordable targeted sequencing of the CFTR gene based on ultrahigh-multiplex PCR.
In summary, we are opening new diagnostic avenues to concurrently investigate different types of pathogenic sequence variants by presenting an NGS-based CFTR genetic screening workflow as a precise and economical alternative to conventional CFTR genetic testing in medical laboratories. This straightforward one-assay approach offers high clinical convenience for the handling of CF genetic diagnostics, allowing test reporting 7 days after receiving the DNA samples.

Acknowledgments
We wish to thank the patients for taking part in this study, and the referring physicians who participated in this study. Sabrina Eichler, and Jenny Creed are employees of Centogene AG. Arndt Rolfs is the founder and CEO of Centogene AG; he is also Professor at the University of Rostock as head of the Albrecht-Kossel-Institute. The authors have no financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed above. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties. Erol Baysal, Iqbal Yousuf Jaber, Dina Ahmed Mehaney, and Chantal Farra declare no conflict of interest. The project was supported in part (consumables) by Life Technologies. No writing assistance was utilized in the production of this manuscript.