Accurate genetic diagnosis of Finnish pulmonary arterial hypertension patients using oligonucleotide-selective sequencing

The genetic basis of pulmonary arterial hypertension (PAH) among Finnish PAH patients is poorly understood. We adopted a novel targeted next-generation sequencing (NGS) approach called Oligonucleotide-Selective Sequencing (OS-Seq) and developed a custom data analysis and interpretation pipeline to identify pathogenic base substitutions, insertions, and deletions in seven genes associated with PAH (BMPR2, BMPR1B, ACVRL1, ENG, SMAD9, CAV1, and KCNK3) in Finnish PAH patients. This is the first clinical study to apply OS-Seq technology to patients suffering from a rare genetic disorder. We analyzed DNA samples from 21 Finnish PAH patients whose BMPR2 and ACVRL1 mutation status had previously been studied using Sanger sequencing. Our sequencing panel covered 100% of the targeted base pairs with >15× sequencing depth. Pathogenic base substitutions were identified in the BMPR2 gene in 29% of the Finnish PAH cases. Two of the pathogenic variant-positive patients had previously tested negative by Sanger sequencing. No clinically significant variants were identified in the six other PAH genes. Our study validates the use of targeted OS-Seq for genetic diagnostics of PAH and revealed pathogenic variants that had previously been missed by Sanger sequencing.


Introduction
The role of genetic diagnostics in the evaluation of patients with Mendelian disorders is increasing rapidly in all fields of medicine. Understanding the underlying genetic causes of inherited diseases can inform treatment and follow-up strategies, and may bring psychological comfort to the patient. Moreover, genetic diagnosis of the proband can allow effective risk assessment of family members, rationalize follow-up strategies, and enable earlier interventions.
Pulmonary arterial hypertension (PAH) is a severe and progressive disease characterized by vascular remodeling of small pulmonary arteries, resulting in an increase in pulmonary arterial pressure and eventually leading to right heart failure (Runo and Loyd 2003). Despite improvements in treatment, PAH remains a fatal disease with high mortality (Chakinala 2005). While the pathogenesis of idiopathic and hereditary forms of PAH (IPAH and HPAH) is complex and incompletely understood, their genetic basis is well recognized. Hundreds of mutations in eight genes have been reported to associate with both IPAH and HPAH. HPAH is commonly regarded as an autosomal dominant disease (Larkin et al. 2012; Machado et al. 2009).
Mutations in the bone morphogenetic protein receptor 2 (BMPR2) gene, a member of the transforming growth factor beta (TGF-β) superfamily, are currently considered the main cause of PAH (Machado et al. 2006). Over 300 different BMPR2 mutations have been identified in PAH patients (Machado et al. 2001, 2009). Mutations in the BMPR2 gene have been identified in up to 75% of HPAH cases and in 25% of IPAH cases. Previously, the penetrance of PAH among BMPR2 mutation carriers was considered low; however, a recent study based on the Vanderbilt Pulmonary Hypertension Registry found an overall penetrance of 27% (42% in females and 14% in males). The disease penetrance could be even higher, as the disease manifestation may be delayed for 75 years and express phenotypic variability (Larkin et al. 2012; Machado et al. 2009; Soubrier et al. 2013).
The genetic factors contributing to PAH in Finland, a genetic isolate in Europe, have not been comprehensively evaluated. In this study, we analyzed samples from Finnish PAH patients to detect pathogenic or likely pathogenic variants associated with PAH by utilizing the novel targeted Oligonucleotide-Selective Sequencing (OS-Seq) technology (Myllykangas et al. 2011). We designed target-specific oligonucleotides to capture all coding exons, exon-intron boundaries, and known intronic mutations in the PAH-associated target genes. All these target genes were comprehensively analyzed using OS-Seq. The 21 Finnish PAH patients had previously been tested for mutations in the BMPR2 and ACVRL1 genes using Sanger sequencing (Sankelo et al. 2005). With OS-Seq we detected all previously identified pathogenic or likely pathogenic variants of the BMPR2 gene and identified two new pathogenic BMPR2 variants that were previously missed using Sanger sequencing. Our results validate the use of the OS-Seq panel for diagnostics of PAH.

Patient samples
Blood samples for genetic studies were obtained from Finnish patients diagnosed with PAH with the approval of the Ethical Committees of the five University Hospitals and the Ministry of Social Affairs and Health of Finland. All participants provided written informed consent for genetic analysis as previously reported (Sankelo et al. 2005). Genomic DNA was extracted from the blood samples (Sankelo et al. 2005), and DNA samples of 21 Finnish IPAH (N = 18) and HPAH (N = 3) patients were selected for the study.

PAH panel
Seven genes were selected for the PAH panel based on a literature search in 2012-2013 (BMPR2, BMPR1B, ACVRL1, ENG, SMAD9, CAV1, and exon 2 of KCNK3), and all validated information about the mutations was incorporated into the bioinformatics pipeline (Table 1 and Fig. 1). Notably, exon 1 of KCNK3 was omitted from the analysis because it is difficult to sequence due to its extremely high GC content and because all mutations in KCNK3 that have been linked with PAH are located in exon 2 (Ma et al. 2013).

Sequencing
The quality of genomic DNA samples was evaluated using gel electrophoresis and DNA concentration was measured using Qubit (Life Technologies, Grand Island, NY). Genomic DNA samples were fragmented using Fragmentase (New England Biolabs, Ipswich, MA), and sequencing libraries were prepared using NEBNext Ultra (New England Biolabs) and PCR amplified. Target-specific oligonucleotides were designed to capture all coding exons, exon-intron boundaries, and known intronic mutations in the PAH-associated target genes and were obtained from Integrated DNA Technologies (Leuven, Belgium). Exon definitions for target genes were derived from the CCDS. The targeting and sequencing of DNA was automated using the MiSeq Desktop Sequencer (Illumina, San Diego, CA). Pathogenic and likely pathogenic mutations found in the patient samples were confirmed using Sanger sequencing.

NGS data analysis
Primary data analysis was carried out by Illumina's MiSeq Reporter software using the generateFASTQ workflow, which generates a FASTQ file with base calls and Phred quality scores for R1 and R2 reads separately. The data were transferred to a dedicated pipeline server for downstream analysis, which started with extraction of data for individual samples using a proprietary demultiplexing algorithm that analyzes the 9 bp index adapter sequence of each read to determine sample identity. Sequence reads were then trimmed to exclude adapter and capture oligo sequences using a proprietary algorithm that estimates the size of captured target DNA based on the genomic coordinates of read mate-pairs. Low-quality data were identified and removed using the Prinseq package (Schmieder and Edwards 2011), which excluded reads with more than 10% of unclassified nucleotides and trimmed nucleotides with Phred scores below 20 starting from the 3′-end. The cleaned-up sequence reads were subsequently aligned to the human reference sequence assembly GRCh37 using the Burrows-Wheeler Aligner (Li and Durbin 2009). [...] (Adzhubei et al. 2010; Henikoff 2001, 2002), and Mutation Taster (Schwarz et al. 2014) tools were used to predict the effects of variants on protein structure and function. Variants were classified as pathogenic or likely pathogenic when previous studies had identified them in other PAH patients. Furthermore, pathogenic gene variants associated with PAH are considered to be extremely rare or absent in control populations and are often considered deleterious by in silico prediction.
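The read-cleaning step described above can be illustrated with a minimal Python sketch. This is not the Prinseq implementation and not the authors' pipeline; it only restates the two stated rules (discard reads with more than 10% unclassified nucleotides, trim bases with Phred scores below 20 from the 3′-end). The function names, the Phred+33 quality encoding, and the order in which the two rules are applied are assumptions made for illustration.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def passes_n_filter(seq, max_n_frac=0.10):
    """Return True unless more than max_n_frac of the bases are N."""
    if not seq:
        return False
    return seq.upper().count("N") / len(seq) <= max_n_frac


def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def clean_read(seq, qual_string):
    """Apply the N filter, then 3' quality trimming (order is an assumption).

    Returns the cleaned sequence, or None if the read is discarded.
    """
    if not passes_n_filter(seq):
        return None
    trimmed_seq, _ = trim_3prime(seq, phred33(qual_string))
    return trimmed_seq or None
```

For example, a read whose last two bases carry the quality character `#` (Phred 2 under this encoding) loses those two bases, while a read with two N bases out of ten is discarded outright.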

Results
Here, we present the first clinical study on Finnish PAH patients using the novel targeted OS-Seq technology. We confirmed pathogenic or likely pathogenic variants of BMPR2 in four patients and detected two new pathogenic variants of the BMPR2 gene that were previously missed using Sanger sequencing (Sankelo et al. 2005) [...] c.2696G>C (p.Arg899Pro) located in the C-terminal cytoplasmic domain (Table 2). Both of these missense variants were classified as deleterious in all in silico variant effect prediction analyses. In addition to these five pathogenic variants, three other variants were detected in BMPR2. Polymorphisms were observed in two patients and one short GGC insertion in the promoter was identified in three patients (Table 3). The clinical significance of the identified GGC insertion is unknown because the region is not covered in control population databases, so the frequency of the variant is not known. Altogether, our OS-Seq panel revealed six (29%) PAH patients carrying a pathogenic, or likely pathogenic, variant of the BMPR2 gene (Table 2).
In addition, seven variants were detected in ACVRL1 and ENG (Table 3). One patient carried a novel missense variant, c.536A>C (p.Asp179Ala), in a serine and threonine residue-rich region of the ACVRL1 gene. This ACVRL1 variant was predicted to be benign by in silico analyses. This patient also tested positive for a pathogenic frameshift variant in BMPR2 (Table 3), and therefore the ACVRL1 variant was classified as likely benign. The variants identified in the ENG gene were not considered deleterious (Table 3).
[Table footnotes: Mutation nomenclature is based on GenBank accession NM_001204.6, with nucleotide one being the first nucleotide of the translation initiation codon. * Likely benign (c.536A>C, p.Asp179Ala).]
We evaluated the average sequencing depth of the PAH panel for the DNA samples of the 21 Finnish PAH patients by measuring the number of overlapping sequencing reads at each nucleotide position in the 12,638 base target region. The median sequencing depth was 791× and coverage of nucleotides with >15× sequencing depth was 100%. To demonstrate the sequencing coverage performance of the PAH panel, we calculated the percentage of nucleotides exceeding specific sequencing depth thresholds (1× to 500×) (Fig. 2). To further demonstrate the efficiency of targeted sequencing, the sequencing depth was also evaluated for individual patient samples (Fig. 3), showing that 100% of the target regions were covered with >15× sequencing depth. Furthermore, the coding exons of the BMPR2 gene in six PAH patient samples were sequenced at high depth: the median coverage was 978× and the coverage of nucleotides with >15× coverage was 100% (Fig. 4).
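The depth and coverage metrics reported here can be computed from a per-position depth vector, as in the following minimal Python sketch. It is an illustration rather than the authors' pipeline: the function name, the input format (one integer depth per targeted nucleotide), and the default thresholds are assumptions, and "exceeding" a threshold is interpreted as strictly greater, in line with the ">15×" wording.

```python
from statistics import median


def coverage_summary(depths, thresholds=(1, 15, 100, 500)):
    """Summarize per-position sequencing depth over a target region.

    depths: one read-depth value per targeted nucleotide position.
    Returns the median depth and, for each threshold t, the fraction of
    positions whose depth is strictly greater than t.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("depths must contain at least one position")
    coverage = {t: sum(d > t for d in depths) / n for t in thresholds}
    return {"median_depth": median(depths), "coverage": coverage}


# Toy example with six positions (values are illustrative, not study data).
summary = coverage_summary([0, 10, 16, 20, 600, 800])
```

In this toy example, four of the six positions exceed 15×, so the coverage at that threshold is 4/6; a panel reporting 100% coverage at >15× would yield 1.0 for the same key.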

Discussion
Diagnostics of inherited diseases has entered a new era, in which sophisticated sequencing and bioinformatics reveal pathogenic mutations efficiently. There are estimated to be more than 7000 disorders with Mendelian inheritance, and for many of these diseases the molecular genetic mechanism has been discovered. Although individual inherited diseases are rare, together they impose a significant clinical and economic challenge (Costa et al. 1985).
The increasing realization of genetic underpinnings of inherited diseases has led to diagnostic challenges. Obtaining a complete genetic view of a disease requires high-quality sequencing of a large number of genes and genomic regions with explicit clinical relevance, which is beyond the scope and capacity of Sanger sequencing applications. At the same time, treatment of inherited diseases in the clinic demands faster turnaround times and cost-effectiveness from genetic testing. Therefore, long odysseys of mutation exclusion by queuing Sanger sequencing experiments are already outdated in the clinic. Novel technologies have emerged to fulfill the growing clinical demand and to overcome the challenges associated with genetic testing of inherited diseases. Targeted next-generation sequencing (NGS) is a practical solution for diagnostic-grade, comprehensive analysis of the clinically relevant genomic regions. While whole-genome and whole-exome sequencing are powerful tools for discovery and research, they fall short of meeting the quality requirements of genetic diagnostics. A typical genome with 30× coverage, or an exome even with 100× coverage, contains gaps and regions where sequence information remains unreliable (Meynert et al. 2013; Rehm et al. 2013; Sims et al. 2014). To reach a complete clinical genetic interpretation and a solid diagnosis, regions where data are sparse or missing need to be patched with alternative approaches, such as Sanger sequencing. The necessity of gap-filling makes the application of whole-genome sequencing (WGS) and whole-exome sequencing (WES) for genetic diagnostics uneconomical and impractical. Targeted NGS approaches enable cost-efficient, reliable, and high-depth sequencing of clinically relevant genomic regions with complete coverage.
To demonstrate the utility of NGS in the diagnostics of inherited diseases, we developed an OS-Seq sequencing panel to accurately detect mutations in seven genes that have been implicated in the pathogenesis of PAH. We collected over 900 mutations from the literature and existing databases to support the interpretation of the patients' variants. Over 90% of the collected mutations were base substitutions causing missense, nonsense, and splice site mutations or small indels of up to 20 bp that can be identified with NGS (Fig. 1).
The overall performance of our test was high. Sequencing depth and coverage are the main indicators of the performance of NGS. Sequencing depth refers to the number of sequencing reads that pile up at a specific nucleotide and is directly associated with the confidence of the genotype call. The average (measured as median) sequencing depth was 791×, which illustrates the confidence in calling the genotypes. Sequencing coverage defines the breadth of the genomic area that has sequencing depth exceeding a specific threshold. Sequencing coverage is related to the sensitivity of detecting variants in the target regions (Sims et al. 2014). We regarded target regions as covered when they exceeded a 15× sequencing depth. It has been demonstrated that a sequencing depth of at least 15× is appropriate for making confident calls, as calling heterozygous variants rarely requires more than 13 overlapping sequence reads (Meynert et al. 2013; Sims et al. 2014). Sequencing coverage at >15× depth for our PAH panel was 100%, indicating a high sensitivity to detect variants in the target regions.
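The link between sequencing depth and confidence in heterozygous calls can be illustrated with a simple binomial model. The sketch below is a simplification for intuition only and is not the analysis of the cited studies or of this pipeline: it assumes each read independently samples the variant allele with probability 0.5 and that a variant is called when at least a chosen number of variant-supporting reads is observed. The function name and the `min_alt_reads` threshold are hypothetical, and sequencing error and mapping bias are ignored.

```python
from math import comb


def het_detection_probability(depth, min_alt_reads, alt_fraction=0.5):
    """Probability of seeing at least min_alt_reads variant reads at a
    heterozygous site, under an idealized binomial sampling model."""
    if min_alt_reads <= 0:
        return 1.0
    if min_alt_reads > depth:
        return 0.0
    # P(X < min_alt_reads) for X ~ Binomial(depth, alt_fraction)
    p_below = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Example: depth 15, requiring at least 3 variant-supporting reads
# (the threshold of 3 is a hypothetical choice for illustration).
p15 = het_detection_probability(15, 3)
```

Under these assumptions, the probability of missing the variant at depth 15 is 121/32768, which is under 0.4%, and it falls further as depth increases. This is consistent in spirit with treating 15× as a sufficient minimum depth.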
We analyzed 21 Finnish PAH patients using our novel targeted OS-Seq panel for PAH. Although the study cohort was small, it confirmed a relatively high incidence (29%) of pathogenic or likely pathogenic variants in the BMPR2 gene among Finnish PAH patients (Table 2). In this cohort, the analysis of other PAH-associated genes did not reveal any other genetic variants considered significant (Table 3), further pinpointing the pivotal role that BMPR2 has in the pathogenesis of PAH. Importantly, with our OS-Seq approach, we identified all previously detected BMPR2 mutations and identified two individuals with pathogenic or likely pathogenic BMPR2 gene variants that were previously missed by Sanger sequencing. In the Sanger method, nonspecific binding of the primers and the formation of DNA secondary structures may cause sequencing errors (Hert et al. 2008; Sanger et al. 1977). There is also an increased risk of human error when interpreting the raw sequencing results due to ambiguities in the capillary electrophoresis readouts. These factors could have caused the failure to identify the two BMPR2 mutations in the primary study (Sankelo et al. 2005).
ACVRL1 or ENG mutations are often found in PAH patients with a family history of hereditary hemorrhagic telangiectasia (HHT) (Harrison et al. 2003). As none of the studied Finnish PAH patients manifested clear HHT disease, it was not surprising that the analysis of ACVRL1 and ENG did not identify any pathogenic variants (Table 3). The patient with the variant in the ACVRL1 gene (c.536A>C, p.Asp179Ala) also carried a pathogenic deletion in the BMPR2 gene (c.1376_1377delGA, p.Arg459ThrfsX11) (Table 2). Interestingly, this pathogenic BMPR2 variant resulting in a frameshift was missed in the previous study (Sankelo et al. 2005) using Sanger sequencing, and the ACVRL1 mutation was considered a potential cause of PAH although the patient lacked clear symptoms associated with HHT. As the patient carries a BMPR2 deletion that is considered to be pathogenic, we suggest that it is unlikely that this ACVRL1 variant is an independent disease-causing mutation. The identified ENG gene variant, a missense mutation (c.14C>T, p.Thr5Met; rs35400405) found in two patients, is interesting as it leads to the replacement of threonine by methionine, resulting in suppression of ENG expression (Bourdeau et al. 2001). Despite its direct effects on the protein (Bourdeau et al. 2001), it is predicted to be non-disease causing, as the allele is common in the general population (MAF 0.04790) (Table 3).
Characterization of mutations in PAH enables estimates of prognosis to be established. BMPR2 mutations are associated with an earlier age of disease onset, regardless of familial history, and with a more severe disease phenotype. Furthermore, patients with BMPR2 mutations are less likely to respond to vasodilators than mutation-negative patients (Austin et al. 2009; Ma and Chung 2014). PAH patients carrying mutations in the ACVRL1 gene show more rapid disease progression than patients with BMPR2 mutations, despite responding to vasodilators at the time of diagnosis (Girerd et al. 2010). In addition, patients identified with BMPR2 and ACVRL1 mutations have a shorter survival time or an earlier need for lung transplantation (Austin et al. 2009; Chida et al. 2012b; Girerd et al. 2010; Ma and Chung 2014). Austin and coworkers reported in 2009 that patients with missense mutations in the BMPR2 gene are diagnosed at a much younger age and seem to have a significantly shorter survival time compared to patients with truncating mutations (Austin et al. 2009). In 2012, Liu and colleagues reported that male BMPR2 mutation carriers had a worse prognosis compared to female mutation carriers (Liu et al. 2012). Pediatric PAH patients, with or without a BMPR2 mutation, respond differently to vasodilators than adults and are more likely to have other associated genetic syndromes (Ma and Chung 2014).
Utilization of the novel targeted OS-Seq approach increases diagnostic efficacy by offering better quality and faster turnaround time. As shown in this study, OS-Seq technology proved to be a practical and effective tool for genetic profiling of Finnish patients with PAH. As genetic testing is increasingly applied in clinical practice, it is important to acknowledge the requirements for quality and performance of available genetic tests. Diagnostic-grade deep sequencing with 100% of target base pairs covered is becoming a standard requirement in today's clinical genetic testing. The high incidence of disease mutations in patients with IPAH and HPAH and elevated estimates of disease penetrance support the use of genetic testing as a routine procedure in the evaluation of patients with IPAH or HPAH. As in other hereditary cardiovascular diseases, genetic diagnosis can significantly rationalize risk stratification and follow-up strategies in the family and can help in estimating the index patient's prognosis. New sequencing strategies and bioinformatics tools are enabling diagnostics with faster turnaround times and decreased cost.