Analysis of genetic variants in myeloproliferative neoplasms using a 22-gene next-generation sequencing panel

The Philadelphia (Ph)-negative myeloproliferative neoplasms (MPNs), namely essential thrombocythaemia (ET), polycythaemia vera (PV) and primary myelofibrosis (PMF), are a group of chronic clonal haematopoietic disorders that have the propensity to advance into bone marrow failure or acute myeloid leukaemia; often resulting in fatality. Although driver mutations have been identified in these MPNs, subtype-specific markers of the disease have yet to be discovered. Next-generation sequencing (NGS) technology can potentially improve the clinical management of MPNs by allowing for the simultaneous screening of many disease-associated genes. The performance of a custom, in-house designed 22-gene NGS panel was technically validated using reference standards across two independent replicate runs. The panel was subsequently used to screen a total of 10 clinical MPN samples (ET n = 3, PV n = 3, PMF n = 4). The resulting NGS data was then analysed via a bioinformatics pipeline. The custom NGS panel had a detection limit of 1% variant allele frequency (VAF). A total of 20 unique variants with VAFs above 5% (4 of which were putatively novel variants with potential biological significance) and one pathogenic variant with a VAF of between 1 and 5% were identified across all of the clinical MPN samples. All single nucleotide variants with VAFs ≥ 15% were confirmed via Sanger sequencing. The high fidelity of the NGS analysis and the identification of known and novel variants in this study cohort support its potential clinical utility in the management of MPNs. However, further optimisation is needed to avoid false negatives in regions with low sequencing coverage, especially for the detection of driver mutations in MPL.

blood cells), and primary myelofibrosis (PMF) (characterised by progressive bone marrow fibrosis, and is further subdivided into an early, prefibrotic stage (pre-PMF) and a late, overt fibrotic stage (overt-PMF)). Among the three, ET is the most indolent, whereas PMF is associated with the highest symptom burden and worst prognosis. Common causes of morbidity and mortality include thromboembolic and/or haemorrhagic complications, as well as disease progression to myelofibrosis (MF) and/ or transformation to acute myeloid leukaemia (AML), all of which vary in frequencies between MPN subtypes [1]. To date, allogenic stem cell transplantation remains the only curative option for MPNs but it is often not considered due to age-related co-morbidities and high transplant-associated mortality rates. Hence, available treatment options such as phlebotomy, aspirin, hydroxyurea, anagrelide, pegylated interferons, and JAK inhibitors are primarily aimed at reducing the risk of disease complications, disease progression, and malignant transformation [2,3].
The discovery of driver mutations in the JAK2, CALR and MPL genes contributed towards the improved accuracy of MPN diagnostics. The JAK2 gene encodes the JAK2 non-receptor tyrosine kinase associated with several receptors which are critical for normal myelopoiesis, including the erythropoietin, thrombopoietin, and granulocyte colony-stimulating factor receptors; the MPL gene encodes the thrombopoietin receptor; while the CALR gene encodes the chaperone protein calreticulin. Driver mutations in these genes (JAK2 p.V617F or exon 12 mutations, CALR exon 9 insertions and deletions, and MPL W515 mutations) lead to the constitutive activation of associated receptors and the subsequent amplification of downstream signalling pathways, which result in the aberrant cell proliferation in MPN. However, these driver mutations are not MPN subtype-specific, and can also be found in other myeloid diseases such as myelodysplastic syndromes (MDS), myelodysplastic/myeloproliferative neoplasms (MDS/MPN) and AML [4,5]. In addition, there are also MPN cases that are triple-negative (TN) for all the driver mutations.
The management of MPNs requires an integrated, multimodal approach that includes the evaluation of clinical features, peripheral blood smear, bone marrow morphology, immunophenotype, cytogenetics, as well as molecular genetics, as recommended in the 2016 World Health Organisation (WHO) classification guidelines [4]. As MPNs are chronic in nature, cases are often asymptomatic upon diagnosis and discovered incidentally through physical examination or abnormal blood counts [4]. Typically, the diagnostic procedure includes molecular tests for MPN driver mutations, as well as the morphological analysis of the peripheral blood smear and bone marrow aspirate/trephine biopsy; the findings of which are correlated with the results of the full blood count. However, morphological features such as the degree of bone marrow fibrosis used for disease diagnosis are subjective in nature and thus have high inter-observer variability [6][7][8]. Therapeutic decisions are then made based on the patient's mutational landscape, disease burden, as well as disease prognosis. Nevertheless, the significant genotypic and phenotypic heterogeneity that exist between MPN subtypes and other myeloid disorders remains a challenge towards effective management of MPNs.
Rapid advancements in gene sequencing technology in the last decade have led to the discovery of other MPN-associated genetic aberrations, such as mutations in ASXL1, EZH2, TET2, IDH1, IDH2, SRSF2 and SF3B1, contributing towards improved disease prognostication. In a novel prognostic model developed by Grinfeld,et al. [9], patients with MPN were stratified into eight groups by combining genomic data with traditional laboratory and clinical findings. Compared to current prognostic models, the model was superior in performance and was able to better define disease outcomes, especially within the "intermediate-risk" categories of the current prognostic models [9]. However, more comprehensive genetic profiling of patients using next-generation sequencing (NGS) technology is required before such a model can be implemented in clinics worldwide.

Technical validation of custom NGS panel performance
The custom NGS panel performance was validated using four reference standards through two identical but independent NGS runs (Additional file 3: Table S3). The reference standards used were: (1) Tru-Q0 (100% wild-type) Reference Standard (Horizon, Cambridge, UK)-as a negative control; (2) Tru-Q1 (5% Tier) Reference Standard (Horizon)-as a positive control for 5% mutant allele in FLT3 (p.ΔI836), IDH1 (p.R132C), and JAK2  Table S3). The NGS assay performance was assessed based on the sensitivity, specificity, repeatability, concordance, and positive predictive values.

Study cohort
Ten patients who were clinically diagnosed with ET, PV or PMF according to the 2016 WHO classification guidelines [4] in the participating institutions were recruited for this study. All patients who agreed to participate in the study provided their written consent. Approximately 10 mL peripheral blood samples from each patient were collected in EDTA tubes and processed within 24 h to extract the DNA from the cells. Patient demographic and clinical data were recorded by using standardised forms.

Genomic DNA extraction and quality control
Genomic DNA was extracted from each blood sample using the Maxwell ® RSC Whole Blood DNA Kit (Promega, Madison, USA) according to the manufacturer's guidelines. The purity (A 260 /A 280 and A 260 /A 230 ) and concentration of extracted DNA were assessed using the Nanodrop ™ (Thermo Fisher Scientific, Waltham, USA) spectrophotometer. DNA samples with satisfactory purity ratios of 1.80-2.00 (A 260 /A 280 ) and 2.00-2.20 (A 260 /A 230 ) were visualised using 1% agarose gel electrophoresis and stored for downstream experiments.

Preparation of NGS libraries and sequencing
The NGS libraries were prepared according to the AmpliSeq for Illumina On-Demand, Custom and Community Panels Reference Guide Protocol, DNA Panels Standard Workflow Procedure for Three Primer Pools, using reagents from the AmpliSeq ™ Library PLUS for Illumina ® . The concentration of the input DNA was measured using the Qubit dsDNA BR Assay and the concentration of the  Figure 1 shows a schematic representation of the steps involved in the quality control and bioinformatics analysis of the NGS data. The sequencing metrics were visualized using the Illumina Sequence Analysis Viewer software. The quality of the raw NGS data was assessed using the FastQC software on the Illumina BaseSpace ™ Sequence Hub. The sequencing data was analysed using a combination of two sequence alignment and variant calling applications (apps) also on the Illumina Bas-eSpace ™ Sequence Hub-the DNA Amplicon app and Pindel app, at a 1% somatic variant allele frequency (VAF) detection limit, aligned against the Genome Reference Consortium human genome build 37 (GRCh37). The DNA Amplicon app is designed to detect small variants, whereas the Pindel app is designed to detect larger structural variants such as large deletions (as large as 10 kb), medium sized insertions, inversions and tandem duplications [10]. The DNA Amplicon app also generates a detailed report that summarises various information including sample read quality, amplicon data, base level statistics, and coverage by amplicon region. The called variants were then annotated using wANNOVAR (Wang Genomics Lab, Philadelphia, USA).

Quality control and bioinformatics analysis of NGS data
For the technical validation of the custom NGS panel, only variants that were present in the four reference standards were filtered-in and prioritized; whereas for the screening of the clinical MPN samples, all annotated variants were manually reviewed by filtering and prioritizing using the following criteria: (1) all variants except for exonic variants were excluded, (2) variants with minor allele frequencies (MAFs) of ≥ 1% (as reported in the 1000 Genomes Project, ExAC, ESP6500, and gnomAD databases) were excluded, and (3) potential sequencing errors (variants with VAF of < 5% and/or appear in majority of the samples) were excluded. In addition, variants with VAFs between 1 and 5% were inspected to check for any pathogenic/ likely pathogenic variants. Aligned read (.bam) files for all samples were then manually inspected to confirm the presence of the filtered-in and prioritized variants using the Integrative Genomics Viewer (IGV) software (Broad Institute, Cambridge, USA). In order to identify any putative novel variants, the variants were manually checked against dbSNP [13] as well as the ClinVar [14] and COSMIC [15] databases to determine if they had been previously reported as pathogenic.

Variant confirmation via Sanger sequencing
The Ensembl genome browser [16] and NCBI Primer-BLAST [17] were used to design PCR primers for the target variants (primer sequences are listed in Additional file 4: Table S4). Each PCR mixture contained 17 μL of nuclease free water, 25 μL of the Amplitaq Gold ® 360 Master Mix (Thermo Fisher Scientific), 2 μL 360 GC Enhancer (Thermo Fisher Scientific), 2 μL each of 5 μM forward and reverse primer, and 2 μL of 50 ng/μL sample DNA. For the TET2 exon 7 p.D1314Mfs*48 variant, the 360 GC Enhancer in the PCR mixture was found to reduce the PCR product yield, and was therefore replaced with nuclease free water. The PCR thermocycling conditions for all NGS detected variants were as follows: 95 °C for 10 min, 35 cycles at 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 1 min, followed by a final extension at 72 °C for 7 min. The PCR products were visualised using a 1% agarose gel electrophoresis to confirm successful amplification and then purified using Wizard ® SV Gel and PCR Clean-Up System (Promega) by centrifugation according to the manufacturer's instructions but eluted in 15 mL of nuclease free water twice to obtain higher product concentration. The concentration and purity of the PCR products were assessed using the Nanodrop ™ spectrophotometer as previously described. Samples with satisfactory concentration and purity ratios were sent for Sanger sequencing. Alignment of Sanger sequence (.seq) files were conducted via the ClustalW algorithm using  [12] the Jalview software [18] whereas the Sanger sequence chromatograms were analysed using the 4peaks software (Nucleobytes, Aalsmeer, Netherlands). The pathogenicity of the variants was assessed using ClinVar [14].

Technical validation of the custom NGS panel
The custom NGS panel performance was evaluated using a set of reference standards in two identical but independent NGS runs. After library preparation, all samples were confirmed to be of sufficient quality and quantity. The majority of the constructed libraries were within the targeted size range of 400 bp for sequencing using the custom NGS panel (Additional file 5: Fig. S1). Both NGS runs achieved cluster densities of 860 to 878 K/mm 2 , > 93% of clusters passed the quality filter, and > 95% of the read bases with quality scores of above Q30, which were close to The Miseq System specifications of 865-965 K/mm 2 cluster density and > 80% of bases above Q30 (Additional file 6: Table S5). All samples achieved ~ 99.5% on-target aligned reads and minimum amplicon mean coverage depths of between 4589x to 7944x (Additional file 7: Table S6). Analysis of the coverage depth per amplicon region revealed that 98.6% of the targeted regions (n = 216/219 amplicons) had average coverage depths of > 1000x. Two amplicons had average coverage depths of < 1000x, namely AMPL89337 (DNMT3A exon 17, chr2:25464411-25464625) and AMPL1156 (TP53 exon 4/exon 5, chr17:7578360-7578579), while AMPL117202 (MPL exon 10, chr1:43814902-43815103) had a coverage depth of below 100x (Additional file 8: Fig. S2).
Combined analysis of sequencing results with the DNA Amplicon and Pindel apps revealed that the former was able to detect all known variants in the reference standards except for large duplications in FLT3; whereas Pindel was able to detect all frameshift variants as well as large duplications in FLT3, but not SNVs (Fig. 2). Overall, the custom NGS panel has a sensitivity of 99.2%, a specificity of 96.3%, a positive predictive value of 97.7%, an average intra-run concordance of 98.8% [range 95.2-100%], an average inter-run concordance of 99.0% [range 95.2-100%], and a detection limit of 1% VAF (Fig. 2).

Study cohort demographics and clinical data
A total of 10 MPN patients (ET n = 3, PV n = 3, PMF n = 4) were recruited for this study (Sunway Medical Centre (n = 7), Ampang Hospital (n = 1) and Universiti Kebangsaan Malaysia Medical Centre (n = 2)) ( Table 1). All patients were diagnosed based on the latest WHO criteria [4]. The median age of diagnosis was 52 years (range: 30-79 years). The majority of the patients were male (n = 7/10). Half of the study cohort was of Chinese ethnicity, 4 were Malay, and 1 was Dutch. Six patients presented with constitutional symptoms, and only 1 patient (Sample 09, overt-PMF) presented with hepatosplenomegaly. Half of the study cohort had a history of smoking. Seven patients were tested positive for the JAK2 V617F driver mutation, and 3 were positive for CALR driver mutations. No patients tested positive for MPL driver mutations.

Identification of genetic variants in clinical MPN samples
An initial total of 314 unique variants were detected across all 10 clinical MPN samples (Fig. 3). After filtering out intronic and UTR variants as well as all variants with MAFs of ≥ 1%, 115 exonic variants remained. Subsequently, variants determined as sequencing errors were excluded. Ultimately, a total of 20 unique variants (which include known MPN driver mutations) with VAFs above 5% were identified across the 10 clinical MPN samples, and one variant with a VAF of 1.4% was found to be pathogenic as listed in the COSMIC database (COSM97191) ( Table 2).
The ABL1 p.Asn350Ser point mutation identified in this study (Sample 07, pre-PMF) was reported in dbSNP (rs144448357). However, it is unknown whether it was previously identified in MPN due to the lack of ClinVar and COSMIC data ( Table 2). The variant was identified in a patient diagnosed with pre-PMF at 50 years of age, with JAK2 V617F mutation. The patient presented with increased platelet (859 × 10 9 /L) and white cell counts (17.8 × 10 9 /L), as well as constitutional symptoms (Table 1).
Four ASXL1 variants were identified in this study, of which, the ASXL1 p.Gly646Trpfs*10 (c.1927_1928insGGG GGG GGT GGC CCG GGT GGA GGT GGC GGC GGG GCC ACC GAT GAG GGG GGG GGC AGA GGC AGC AGC A) stopgain variant (Sample 04, PV) was found to be located in the same dbSNP cluster rs750318549 as the previously reported ASXL1 p.Gly646Trpfs*12 (c.1934dupG) stopgain, which is the most common ASXL1 mutation accounting for > 50% of all identified ASXL1 mutations in myeloid malignancies [19] (Table 2). While ASXL1 p.Gly646Trpfs*12 is the result of a duplication of a G nucleotide within a homopolymer region of eight G nucleotides [19], the variant ASXL1 p.Gly646Trpfs*10 identified in this study is the result of an insertion of 67 nucleotides at position chr20: 31022442, making it a novel stopgain variant. Two other ASXL1 variants identified in this study, ASXL1 p.Leu731Tyrfs*12 (Sample 09, overt-PMF) and Q1433Q (Sample 03, ET) were also putative novel variants with no dbSNP, COSMIC or ClinVar data; whereas the ASXL1 p.Tyr591*variant (Sample 10, overt-PMF) has been reported in various diseases including ET and MF [9,20], MDS [21], chronic myelomonocytic leukaemia [22], AML [23], mast cell neoplasm [24] and CNL [25], as well as breast cancer [26], but has not been previously reported in PV (Table 2).

Discussion
In this study, we evaluated the performance of our custom 22-gene NGS panel using reference standards and subsequently, a small cohort of clinical MPN samples. The 22 genes selected in this panel are known to be frequently mutated in myeloid neoplasms (not necessarily MPNs), and are known disease markers with diagnostic, prognostic and/or therapeutic value. First, the performance of the custom NGS panel was technically validated using reference standards in two identical but independent sequencing runs. The combined analysis of variants detected by the DNA Amplicon and Pindel tools revealed that the panel achieved high sensitivity, specificity, concordance and positive predictive values. The overall good performance of the panel was further supported in its ability to detect variants with VAF values as low as 1% (Fig. 2). Overall, the depth-of-coverage achieved across all amplicons was around our targeted coverage of 5000x (Additional file 8: Fig. S2). However, the low average depth-of-coverage of the MPL amplicon (AMPL117202, MPL exon 10, chr1:43814902-43815103) Fig. 3 Variant filtering and prioritization process. Note that after this process, variants with VAFs between 1 and 5% were inspected for the presence of any pathogenic/likely pathogenic variants. Adapted from Zheng et al., 2018 [12] at < 100 × could potentially lead to false negative results. In order for the panel to be used in MPN diagnostics, optimisation of the sequencing coverage for the MPL region is critical for the detection of driver mutations in MPL. In addition, AMPL89337 (DNMT3A exon 17, chr2:25464411-25464625), AMPL1156 (TP53 exon 4/exon 5 chr17:7578360-7578579) and AMPL90417 (ASXL1 exon 13, chr20:31022935-31023187) were found to have average depths-of-coverage of < 1000x (Additional file 8: Fig. S2, Additional file 10).
Uneven coverage or coverage bias may originate from various steps in the NGS workflow. The use of the PCR method during NGS library preparation has been found to be the primary contributor [38][39][40]. In PCR, the annealing efficiency of primers is commonly hindered by templates that are GC-rich that tend to remain double-stranded, as well as templates that are AT-rich (or GC-poor) that anneal poorly to primers. As a result, GC-and AT-rich regions (also known as low-complexity regions) are often poorly amplified and manifest in NGS as regions with low sequencing coverage. Further analysis of AMPL117202 showed that it had a GC content of 61% and AT content of 39%, and a plot of the nucleotide distribution revealed that the GC content was > 60% in the majority of 30 bp windows (Additional file 11: Fig. S4). Hence, it is likely that the majority of the AMPL117202 amplicons remained annealed as doublestranded DNA during PCR, leading to inefficient amplification and subsequent low NGS coverage of the amplicons. Future studies should involve the review of targeted regions and further optimisation of the NGS library preparation to improve the coverage of GC-rich regions, i.e. optimising PCR amplification conditions by using lower primer-extension temperature [38]. In order to rule out false negatives due to coverage bias, variants located in DNMT3A exon 17 and TP53 exon 4/exon 5 should also be used to validate the performance of the custom panel.
From the screening of clinical MPN samples with the custom NGS panel, 21 unique variants including MPN driver mutations in JAK2 and CALR, and one pathogenic stopgain variant in TET2 with a VAF of 1.4% were identified. The ASXL1 p.Leu731Tyrfs*12, p.Gln1433Gln, p.Gly646Trpfs*10 and TET2 p.Tyr1245Cys variants were identified as putative novel variants, whereas reported and likely pathogenic variants were identified in ABL1, SF3B1 and U2AF1. The study findings support the notion that the co-presence of multiple variants within a single sample results in a potentially synergistic effect that promotes disease development and progression, and contributes towards higher symptom burden, poorer prognosis, higher risk of leukaemic transformation, and drug resistance [21,[41][42][43][44][45][46]. In ABL1, point mutations that confer resistance against tyrosine kinase inhibitors in patients with CML have been discovered in more than 50 different hotspots in the kinase domain [47][48][49][50]. It is possible that ABL1 p.Asn350Ser identified in this study may confer drug resistance in a similar fashion to those that have previously been identified.
As MPNs are a clonally heterogenous and multifactorial group of diseases, differences in gene dosage and clonal architecture, the order of mutation acquisition and even germline predisposition can contribute towards MPN pathogenesis as well as differences in disease phenotype and prognosis [51,52]. In addition, other factors such as lineage bias in haematopoietic stem cells, changes in the bone marrow microenvironment, and aging have also been reported [53]. Further studies should aim to experimentally characterise the functional impact of the variants identified and Fig. 4 Variants detected in the clinical MPN samples with VAF > 5%. Note that Sample 02 also carries another variant in TET2 with a VAF of 1.4% (not shown in Fig. 4) investigate their possible synergistic effects and biological significance in myeloid disorders, especially in MPN. The putative novel variants should be further investigated using in vitro candidate gene approaches via protein function assays and bioinformatics analyses in order to elucidate their impact especially in poorly characterized proteins [54,55].
In this study, all NGS-identified variants were confirmed via bidirectional Sanger sequencing except for TET2 p.Tyr1631*, JAK2 V617F and U2AF1 p.Gln157Pro where the allelic frequencies were < 15%, whereas DMNT3A p.Pro385Pro was not confirmed due to nonspecific primer binding during the PCR step ( Table 2, Additional file 9: Fig.  S3). Sanger sequencing is widely regarded as the 'gold standard' for the validation of NGS-detected variants as it can discriminate true variants from NGS artifacts or sequencing errors [56]. However, as the Sanger method relies on the detection of fluorescence, it has limited sensitivity when detecting variants with low allele frequencies; particularly in mosaic tumour samples, leading to false negative results [57][58][59]. The sensitivity of Sanger sequencing has been reported to be around 15-20% allele frequency, whereas NGS has a sensitivity of approximately 1% allele frequency [59,60]. Therefore, best practice standards for NGS variant confirmation should include alternative methods with higher sensitivity, such as droplet digital PCR or allele-specific qPCR, followed by high-resolution melting curve analysis for the confirmation of variants with low allele frequencies [56,60,61]. Nevertheless, such methods are also accompanied by technical limitations and caveats which should be addressed and optimised for their specific intended purposes [62].
Future studies may also benefit from the use of an expanded bioinformatics pipeline which will ultimately provide a greater wealth of information to the clinician, such as the indication of variant germline/somatic status, the identification of clonal haematopoiesis of indeterminate potential (CHIP, which shows evidence of clonal expansion), monitoring of minimal residual disease, determination of disease predisposition, as well as accurate prediction of treatment response/outcome [63,64]. A larger study cohort as well as the collection of samples and clinical data at follow-up as well as data on response to therapy and survival will allow for better genotype-phenotype associations and contribute towards the better understanding of MPNs.

Conclusions
In summary, the custom NGS panel enabled the detection of known MPN-associated genetic variants, as well as the identification of novel variants of potential biological significance, indicating its potential clinical utility in the genetic profiling of MPN patients. However, further performance optimisation is required especially for regions with poor coverage depth to ensure that the custom NGS panel will serve as a robust and reliable tool for the personalised management of MPNs. It is hoped that the data generated from the screening of MPN patients using the custom NGS panel will also contribute towards the MPN knowledgebase, and support the adoption of more accurate genomics-based disease classification as well as prognostic frameworks for the management of MPNs.