Advances in Molecular Diagnostics

Clinical Next-Generation Sequencing for Somatic Mutation Detection: Advancements and Commercialization Strategies
Andrew Hesse, Christopher Chen and Honey V. Reddi*

Copyright: © 2015 Hesse A, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Pathologists can readily score sections from biopsies to identify samples with greater than 10% tumor burden [9], but the emerging field of liquid biopsies will enable far more sensitive detection methods.
To lower the limit of mutation detection, researchers have developed ways to simplify extraction of liquid biopsies, improve sequencing technology, enrich for mutant populations, and enhance bioinformatics software (Figure 1). This review will focus on the latest developments within each of these methods and discuss the utilization of combinations of technologies and strategies for commercialization.

Sequencing Methods
Whole exome sequencing will typically identify more than 20,000 variants [10], but most of these are clinically insignificant polymorphisms, non-pathogenic missense mutations or false positive calls resulting from sequencing errors. The simplest way to sift through the noise of NGS data is to sequence more deeply. Izawa et al. verified this approach of adding coverage to increase base call reliability in a 2012 study, which demonstrated that a variant with a 1% allele fraction can be detected with statistical confidence at 700x coverage composed of 350 reads from each strand [11]. The drawback of deep sequencing is the increase in cost. A more practical way to increase coverage is to focus sequencing on a small panel of genes relevant to the disease state rather than on the whole genome or exome. Accordingly, for cancer, many companies are beginning to introduce either "broad spectrum" panels of common cancer mutations or panels tailored to a specific cancer type.
Another simple way to reduce the noise from NGS is to perform "paired tumor-normal" sequencing. This technique involves sequencing the tumor (somatic) sample and a matched normal sample, typically from whole blood, in independent, simultaneous runs. Common experimental designs produce independent sequence files that are imported into analysis software, which compares the germline and somatic data to eliminate non-specific variants [12]. Further refinement is accomplished with customized bioinformatics pipelines and confirmation of suspect mutations on an alternate technology. Illumina and Ion Torrent are the current market leaders in NGS platforms, having been largely adopted by industry because their error rates are lower than those of Pacific Biosciences and Oxford Nanopore, which makes them more suitable for somatic mutation sequencing.
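To make the depth argument concrete, the following is a minimal sketch of why a 1% variant becomes distinguishable from sequencing noise at 700x coverage. It is not the statistical method used by Izawa et al.; it is a plain binomial model. The per-base error rate (0.1%, split evenly over the three incorrect bases) and the calling threshold of 5 variant reads are assumptions chosen for illustration, and strand balance (350 reads per strand) is not modeled.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    below_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k


depth = 700             # total coverage quoted in the text
allele_fraction = 0.01  # 1% variant quoted in the text
error_rate = 0.001 / 3  # ASSUMPTION: 0.1% per-base error, spread over 3 wrong bases
min_alt_reads = 5       # ASSUMPTION: illustrative calling threshold

# Average number of variant-supporting reads expected at this depth.
expected_alt_reads = depth * allele_fraction

# Chance that a real 1% variant produces at least `min_alt_reads` reads.
p_detect = binom_tail(depth, min_alt_reads, allele_fraction)

# Chance that sequencing error alone produces that many identical miscalls.
p_false_call = binom_tail(depth, min_alt_reads, error_rate)

print(f"expected variant reads: {expected_alt_reads:.1f}")
print(f"P(true 1% variant reaches threshold): {p_detect:.3f}")
print(f"P(error alone reaches threshold):     {p_false_call:.2e}")
```

Under these assumptions a true variant is expected to contribute about 7 reads, while error alone rarely reaches the threshold. Lowering the depth shrinks that separation, which is the trade-off between cost and sensitivity described above.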

Illumina
Currently, Illumina is the most widely used NGS technology with 74% market share (www.marketsandmarkets.com/Market-Reports/next-generation-sequencing-ngs-technologies-market-546.html). Compared to other available sequencing technologies, Illumina offers the largest data output, the lowest cost per base and a relatively fast turnaround time. Illumina products amplify fragments by clonal bridge amplification and sequence by synthesis using reversible dye terminators. Three "off the shelf" somatic cancer panels are available: the TruSight Tumor Panel™, the TruSight Myeloid Panel™, and the TruSeq Amplicon Cancer Panel™. The TruSight Tumor Panel consists of 26 genes spanning 21 kb of sequence and achieves a minimum coverage of 1,000x per amplicon at 7,000x mean coverage (illumina/ datasheet.pdf). The TruSeq Amplicon Cancer Panel is composed of 48 genes spanning more than 35 kb of sequence with an average coverage of 1,000x per run [13]. The TruSight Myeloid Panel covers approximately 141 kb from 15 full genes (exons only) and hotspots from 39 additional genes. In this panel, sequencing depth is 500x for 95% of amplicons with a limit of detection (LOD) as low as 3% (illumina_trusight_tumor.pdf).
Illumina recently released the NextSeq 500 v2 kit, which is compatible with the TruSight Myeloid and TruSeq Amplicon Cancer Panels run on NextSeq sequencers. It improves upon the previous sequencing reagents and clustering chemistry, with error rates in line with those seen on MiSeq or HiSeq.

Thermo Fisher
Ion Torrent products amplify DNA fragments by emulsion PCR, and sequencing is performed directly on a silicon chip that detects changes in pH from the release of a proton during DNA polymerization. The Personal Genome Machine (PGM) and the Ion Proton exhibit the quickest run times, the former in as little as 3 hours [14], and yield roughly 1-2 GB and 10-15 GB of data per run, respectively. Ion Torrent has recently released several cancer panels, including the 50-gene AmpliSeq Hotspot Cancer Panel v2, an update of the previous AmpliSeq panel that adds 4 genes and about 2,000 COSMIC mutations (2,800 total). This assay has been validated using various carcinomas, gastrointestinal stromal tumors, melanoma, and brain tumors [13,15]. The panel covers a relatively small target (less than 13.5 kb of sequence), allowing for scalable runs generating as much as 5,000x coverage using Ion Torrent's 316 chip (www.edgebio.com/ampliseq-cancer-panel). Using a smaller "hotspot" panel on the Ion Torrent further increases the speed of sequencing and allows for faster reporting. The AmpliSeq™ Comprehensive Cancer Panel is
Ion Torrent recently upgraded the sequencing chemistry for the PGM with the launch of the Hi-Q™ sequencing kit. In developing the Hi-Q kits, mutated polymerases were screened to identify a novel enzyme that reduces the false positives caused by insertion/deletion polymerase errors by 90%. Furthermore, the new chemistry supports 400 base pair read lengths. Ion Torrent technology offers lower-cost equipment and faster turnaround times than Illumina, but more expensive sequencing runs. A comparison of the Illumina and Thermo Fisher commercial cancer sequencing kits and technologies is given in Tables 1 and 2.
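The relationship between panel size, target coverage and run output that underlies these panel designs can be sketched with simple arithmetic. The example below uses the 13.5 kb target and 5,000x coverage quoted above for the hotspot panel. Treating "1 GB" as one billion bases of usable, fully on-target sequence is an assumption made only for illustration; real runs lose output to off-target and low-quality reads, which the hypothetical `on_target_fraction` parameter is meant to represent.

```python
def required_on_target_bases(target_bp: int, mean_coverage: float) -> float:
    """Bases that must land on the target to reach the mean coverage."""
    return target_bp * mean_coverage


def samples_per_run(run_yield_bp: float, target_bp: int, mean_coverage: float,
                    on_target_fraction: float = 1.0) -> int:
    """Whole samples that fit in one run at the requested mean coverage."""
    usable_bases = run_yield_bp * on_target_fraction
    return int(usable_bases // required_on_target_bases(target_bp, mean_coverage))


panel_bp = 13_500          # upper bound on panel size quoted in the text
coverage = 5_000           # coverage quoted in the text
run_yield = 1_000_000_000  # ASSUMPTION: 1 GB read as 1e9 usable bases

per_sample = required_on_target_bases(panel_bp, coverage)
print(f"bases needed per sample: {per_sample:,.0f}")
print(f"samples per run (ideal): {samples_per_run(run_yield, panel_bp, coverage)}")
print(f"samples per run (50% on target): "
      f"{samples_per_run(run_yield, panel_bp, coverage, on_target_fraction=0.5)}")
```

The same two functions apply to any of the panels discussed in this section; only the target size, coverage goal and run yield change.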

Mutation Enrichment
One method to reliably sequence rare mutations below the existing limits of detection is to specifically enrich variants from the wild-type sequence to easily detectable levels before sequencing. Many methods have been developed for this purpose; they can be divided into those that detect specific known mutations and those that can enrich unknown mutations.
Enriching for known mutations can easily be done by designing PCR primers specific for the mutation. A number of methods have been developed on this basic premise, including amplification refractory mutation system (ARMS), allele-specific PCR (ASPCR), allele-specific amplification (ASA), PCR amplification of specific alleles (PASA), PCR amplification of multiple specific alleles (PAMSA), competitive oligonucleotide priming (COP), enriched or mutant-enriched PCR (EPCR or ME-PCR), mismatch amplification mutation assay (MAMA), mutant allele-specific amplification (MASA), antiprimer quenching-based real-time PCR (aQRT-PCR), restriction endonuclease-mediated selective PCR (REMS-PCR), Scorpion and Pointman. The differences among these methods are beyond the scope of this review, but they have been compared in detail by Milbury et al. [16].
Enriching unknown mutations introduces a level of complexity. Enzymatic digests using mismatch-specific endonucleases leave DNA products unavailable for sequencing. To preserve the DNA, more complex methods like high performance liquid chromatography (HPLC) have been utilized. More recently, CO-amplification at Lower Denaturation temperature, or COLD-PCR, was developed to circumvent the need for HPLC. COLD-PCR is an amplification performed at a reduced denaturation temperature, such that heteroduplex DNA containing a mixture of wild-type and mutant strands is preferentially amplified over wild-type homoduplexes. Improved and Complete Enrichment CO-amplification at Lower Denaturation temperature, or ICE-COLD-PCR (IC-PCR), goes one step further by also including synthetic reference sequence (RS) molecules that compete to bind with wild-type (WT) DNA strands. The synthetic reference sequence is also chemically modified to prevent primer binding and is phosphorylated on the 3' end to prevent polymerization. Spiking the amplification solution with WT RS establishes dynamically favorable binding of polymerase with mutant DNA strands, thereby preferentially amplifying the mutant strand in high number [3]. Interestingly, IC-PCR exhibits an inverse relationship between the amount of enrichment and the initial mutation abundance. Milbury et al. substantiated this inverse trend using IC-PCR to enrich mutations for subsequent pyrosequencing. The researchers observed a 5.5-fold increase in sensitivity with a 10% pre-enrichment abundance of the mutant allele, a 35-fold increase for a 1% mutant allele and a 75-fold increase when starting with a 0.1% mutant allele.
A potential drawback of mutational enrichment using PCR is that it is difficult to extrapolate back to determine the initial ratio of wild-type to mutant DNA. Quantitation is extremely valuable in liquid biopsies because it can be used for disease monitoring before and after treatment. Approximate quantitation can be achieved by comparing results to those obtained using wild-type DNA spiked with known mutant DNA as standards. With the recent development of digital PCR systems that compartmentalize individual template DNAs during PCR, it should be possible to obtain absolute quantitation of these rare mutations in the future.
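The absolute quantitation that digital PCR makes possible rests on a Poisson correction: because a partition can receive more than one template molecule, the mean number of copies per partition is estimated from the fraction of positive partitions rather than by counting positives directly. The sketch below illustrates that standard calculation. The partition counts are hypothetical, and treating the mutant and wild-type signals as independent is a simplifying assumption.

```python
from math import log


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson estimate of mean template copies per partition."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total partitions")
    return -log(1.0 - positive / total)


def mutant_fraction(mutant_positive: int, wildtype_positive: int, total: int) -> float:
    """Fraction of template molecules carrying the mutation."""
    lam_mut = copies_per_partition(mutant_positive, total)
    lam_wt = copies_per_partition(wildtype_positive, total)
    return lam_mut / (lam_mut + lam_wt)


# HYPOTHETICAL run: 20,000 partitions, 10 mutant-positive, 8,000 wild-type-positive.
fraction = mutant_fraction(mutant_positive=10, wildtype_positive=8_000, total=20_000)
print(f"estimated mutant allele fraction: {fraction:.4%}")
```

Because the estimate comes from counting partitions rather than from a standard curve, it does not depend on the spiked reference standards described above.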

Bioinformatics and Analysis Software
Calling mosaic variants can be challenging due to low allelic fraction and variability in depth of coverage. Additionally, sequencer error rates may exceed the natural mutation rate for low frequency variants, which results in an increased number of false negative calls [12]. An increasing number of software platforms are available to help overcome these issues and facilitate the process of variant calling. For applications that do not need de novo assembly, such as the re-sequencing tests performed in clinical labs, software can be optimized for low divergence and thus increase the detection of low-abundance mutations. In combination with the species-specific mutation rate and the known error rates of the sequencing platform, statistical assumptions can be made that decrease the demand for computational resources and increase accuracy [9]. The most popular software programs, such as VarScan 2 and MuTect, analyze paired tumor-normal samples, comparing normal tissue with somatic tissue for the purpose of eliminating polymorphisms [17]. This review will cover the more widely used software tools. A comprehensive evaluation of over 200 genome software tools has been published by researchers at the Innsbruck Medical University [18]. The somatic variant calling software covered in this review is summarized in Table 3.

VarScan 2
VarScan is a variant detection software package developed by the Genome Institute at Washington University with validated, high quality results for somatic mutation calling. The major advantage of VarScan 2 is that it directly performs simultaneous paired tumor-normal analysis position by position to maximize detection of low-abundance alleles that were under-sampled in normal tissue. Genotype calls are then made independently by a germline consensus method and compared using a parametric decision tree algorithm (varscan.sourceforge.net/somaticcalling.html). Koboldt et al. validated VarScan 2 in 2012 using 151 ovarian adenocarcinoma samples that underwent exome-scale sequencing [12]. The authors noted that VarScan 2 is an effective tool for the detection of somatic mutations and the identification of copy number variations (CNV) and loss of heterozygosity. Additionally, VarScan 2 has a notably low false-negative rate of 0.84%, making it a highly dependable analysis tool. It is important to note that variants missed by VarScan 2 in the Koboldt study were also missed by similar software [19], suggesting that this is a limitation of the sequencing rather than of the software itself.
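The idea of comparing tumor and normal read counts at a single position can be illustrated with a one-sided Fisher's exact test. This is a sketch of the general technique only and should not be read as VarScan 2's implementation, whose decision logic is described at the URL above. The function names, the read counts and both thresholds (`max_normal_vaf`, `p_cutoff`) are hypothetical.

```python
from math import comb


def fisher_upper_tail(tumor_alt: int, tumor_ref: int,
                      normal_alt: int, normal_ref: int) -> float:
    """One-sided Fisher's exact p-value: chance of seeing at least this many
    variant reads in the tumor if tumor and normal share one allele fraction."""
    n_tumor = tumor_alt + tumor_ref
    total_alt = tumor_alt + normal_alt
    total = n_tumor + normal_alt + normal_ref
    upper = min(total_alt, n_tumor)
    # Hypergeometric upper tail, summed with exact integer arithmetic.
    tail = sum(comb(total_alt, k) * comb(total - total_alt, n_tumor - k)
               for k in range(tumor_alt, upper + 1))
    return tail / comb(total, n_tumor)


def looks_somatic(tumor_alt: int, tumor_ref: int, normal_alt: int, normal_ref: int,
                  max_normal_vaf: float = 0.02, p_cutoff: float = 0.01) -> bool:
    """HYPOTHETICAL rule: variant nearly absent in normal and enriched in tumor."""
    normal_depth = normal_alt + normal_ref
    normal_vaf = normal_alt / normal_depth if normal_depth else 0.0
    p_value = fisher_upper_tail(tumor_alt, tumor_ref, normal_alt, normal_ref)
    return normal_vaf <= max_normal_vaf and p_value < p_cutoff


# Illustrative counts: a 4% tumor-only variant versus a germline heterozygote.
print(looks_somatic(tumor_alt=20, tumor_ref=480, normal_alt=0, normal_ref=500))
print(looks_somatic(tumor_alt=250, tumor_ref=250, normal_alt=240, normal_ref=260))
```

A variant present at similar fractions in both samples yields a large p-value and is set aside as a likely polymorphism, which is the filtering step that paired analysis provides.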

MuTect
The Genome Analysis Tool Kit (GATK), developed by the Broad Institute, is a popular software package for analysis of human germline mutations. With the increased demand for somatic analysis tools, the Broad Institute developed MuTect, which exhibits high sensitivity and reliable detection of low frequency variants [11,19]. In addition, MuTect can be used with an unmatched normal sample or in the absence of a normal sample; however, extensive post-software analysis would then be required to obtain actionable results. Wang et al. examined a number of tumor-normal pairs in order to determine the utility of six such variant-calling tools, including MuTect and VarScan 2. They found that MuTect outperforms other programs in making accurate calls on lower quality reads (those with low allelic fraction or low coverage), while VarScan 2 showed superiority for high quality calls and for SNVs with alternate alleles. Therefore, they concluded that running data through both programs, with their complementary strengths, should maximize the number of correctly identified variants [19].
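Combining the output of two callers, as Wang et al. recommend, amounts to set operations on the variants each one reports. The sketch below assumes each call has already been reduced to a (chromosome, position, reference allele, alternate allele) tuple. The variants shown are placeholders rather than real calls, and parsing of either tool's output files is deliberately left out.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def combine_calls(calls_a: set[Variant], calls_b: set[Variant]) -> dict[str, set[Variant]]:
    """Split two call sets into shared, caller-specific and combined groups."""
    return {
        "both": calls_a & calls_b,    # highest-confidence calls
        "only_a": calls_a - calls_b,  # candidates needing confirmation
        "only_b": calls_b - calls_a,
        "union": calls_a | calls_b,   # maximizes sensitivity
    }


# PLACEHOLDER variants standing in for parsed caller output.
caller_a = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
caller_b = {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")}

groups = combine_calls(caller_a, caller_b)
for name, variants in groups.items():
    print(name, sorted(variants))
```

Taking the union favors sensitivity, while the intersection favors specificity; calls reported by only one tool are natural candidates for confirmation on an alternate technology.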

Development and Commercialization Strategies
Despite the growing availability of somatic genetic testing protocols and commercially available assays, many clinical laboratories continue to offer only hereditary cancer panels. Wide-scale industry adoption of somatic testing will require not only extensive validation of high-throughput repeatability in wet lab, bioinformatics, and reporting procedures, but also cost-effective workflows that can meet the demand for expected turnaround times. Table 4 lists the commercial NGS-based tests from labs that have entered the somatic testing space by either utilizing commercially available kits or implementing proprietary assays, and liquid biopsy sample enrichment tests are collated in Table 5.
The Jackson Laboratory for Genomic Medicine has recently developed a somatic cancer panel, the Cancer Treatment Panel (JAX-CTP™). This test sequences 190 clinically actionable genes with an average coverage depth of 300x and is designed to detect mutation fractions as low as 10%. JAX validated its protocols and pipeline using HapMap and FFPE samples to evaluate: (1) repeatability, (2) reproducibility, (3) specificity, (4) sensitivity and (5) accuracy. Precision, measured as a composite of repeatability and reproducibility, met the 98% concordance requirement for validation. The JAX-CTP also includes copy number variation (CNV) analysis utilizing NanoString nCounter®, which distinguishes this panel from many other solid tumor NGS tests. However, there are two significant limitations to this component that must be considered when interpreting the data: (1) the design only allows for reliable detection of copy numbers of six or greater, which requires at least 50% tumor purity, and (2) the validation did not quantify detection limits for deletions due to sample availability [20].
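The link between tumor purity and detectable copy number can be illustrated with a simple mixing model: the signal measured from a specimen is a purity-weighted average of the tumor copy number and the two copies carried by normal cells. This is a generic sketch, not the JAX-CTP analysis itself, and it assumes diploid normal cells and a uniform copy number across all tumor cells.

```python
def observed_copy_number(tumor_copy_number: float, purity: float,
                         normal_copy_number: float = 2.0) -> float:
    """Average copies per cell in a tumor/normal mixture."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return purity * tumor_copy_number + (1.0 - purity) * normal_copy_number


# A six-copy amplification measured at decreasing tumor purity.
for purity in (1.0, 0.5, 0.25):
    observed = observed_copy_number(6, purity)
    print(f"purity {purity:.0%}: {observed:.1f} copies, "
          f"{observed / 2.0:.2f}x the diploid signal")
```

At 50% purity a six-copy amplification appears as four copies per cell on average, and at 25% purity as three, so the same event becomes progressively harder to separate from normal variation as purity falls.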

Mayo Medical Laboratories designed the Solid Tumor Targeted Cancer Gene Panel by Next-Generation Sequencing (CANCP), which consists of 50 genes sequenced by NGS at a 5-10% allelic fraction detection limit. CANCP is utilized with the goal of discovering mutations known to confer resistance or desirable responses to treatment therapies (www.mayomedicallaboratories.com/test-catalog/Clinical+and+Interpretive/35594).
ARUP offers the Solid Tumor Mutation Panel, a 48-gene NGS-based hotspot panel for solid tumor samples. ARUP has validated a process to sequence somatic tissue with a tumor percentage as low as 10% (not to be confused with mosaic detection limit) and can report results in less than 2 weeks (www.aruplab.com/files/resources/oncology/SolidTumor.pdf).
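The distinction drawn here between tumor percentage and the mutant allele fraction an assay must detect can be made explicit. Under the assumptions of a clonal mutation on one copy of a diploid locus and diploid normal cells, the expected variant allele fraction is half the tumor fraction. The function below is an illustrative sketch of that relationship, not a description of ARUP's validation.

```python
def expected_vaf(purity: float, mutant_copies: int = 1,
                 tumor_copy_number: int = 2, normal_copy_number: int = 2) -> float:
    """Expected variant allele fraction of a clonal mutation in a mixed sample."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mutant_alleles = purity * mutant_copies
    all_alleles = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mutant_alleles / all_alleles


for purity in (0.10, 0.20, 0.50):
    print(f"tumor fraction {purity:.0%}: expected allele fraction {expected_vaf(purity):.1%}")
```

A specimen with 10% tumor content therefore presents a heterozygous mutation at roughly a 5% allele fraction, and subclonal mutations will fall lower still.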
Washington University in St. Louis, through its Genomics and Pathology Services (GPS) laboratory, provides somatic cancer testing with the Solid Tumor Gene Test, a 65-gene panel for profiling tumors for diagnosis and treatment guidance (gps.wustl.edu/cancer#solid%20tumor).
Foundation Medicine offers FoundationOne™, a solid tumor panel consisting of 315 genes and intronic regions from an additional 28 genes with a median depth of 500x (foundationOne_technicalspecs.pdf). The panel is designed to detect SNVs, INDELs and CNVs as well as selected gene rearrangements with high sensitivity and a detection limit as low as 5% [21]. The turnaround time for test results is between 11 and 14 days from receipt of sample. Foundation Medicine also has ongoing clinical trials to evaluate the utility and performance of a liquid biopsy test, which is expected to launch in 2016 (foundationmedicine.com/releasedetail).
Memorial Sloan Kettering Cancer Center developed the Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT™) test to target both common and rare cancer variants using Illumina's HiSeq 2500 as the sequencing workhorse. The panel covers 410 genes with selected intronic regions and analyzes SNVs, INDELs and CNVs as well as some structural rearrangements. The test methods achieve low detection limits of 2% and 5% for hotspot and non-hotspot mutations, respectively. Currently, this test is only offered to MSK patients [22].
Knight Diagnostic Laboratories specializes in cancer diagnostics and has a line of somatic tumor panels, the GeneTrails® Cancer Panels. Using Ion Torrent PGMs, KnightDx has created tumor panels that yield a minimum of 100x coverage, an LOD of 5-15% and produce final
University of Pittsburgh Medical Center designed a custom thyroid cancer panel, ThyroSeq®, that covers 12 genes totaling 284 mutation hotspots. The test is run on an Ion Torrent 318 chip, achieving analytical accuracy of 100% and a mutation detection limit as low as 3%. This is accomplished by using the Torrent Suite pipeline from Ion Torrent. Additional analysis and annotation is performed by a custom in-house design developed by UPMC [24]. A second version, ThyroSeq v.2, has been released that now includes 14 genes covering over 1,000 mutation hotspots and 42 thyroid cancer specific gene fusions. Additionally, a 60-gene version of the ThyroSeq v.2 test is offered through a partnership with CBLPath (www.cblpath.com/).
Guardant Health developed a liquid biopsy test for commercial use in 2014, the Guardant360™, which tests for 68 clinically actionable cancer genes across more than 150 kb of DNA. The test has a reported specificity greater than 99.99% and detects SNVs, CNVs, INDELs and genomic rearrangements with an LOD of 0.1% and a turnaround time of 2 weeks (www.guardanthealth.com/guardant360/).
Transgenomic, Inc. developed MX-ICP (multiplexed ICE COLD-PCR™). MX-ICP technology produces as much as a 500-fold increase in mutation detection sensitivity, allowing a detection limit as low as 0.01%. Testing is currently offered for EGFR mutations to determine NSCLC and CRC treatment resistance and has a turnaround time of 7-10 days (www.transgenomic.com/clinical-applications/mx-icp-overview/).
Biodesix launched GeneStrat, a commercial liquid biopsy test, in May of 2015. The test targets 3 cancer genes (EGFR, KRAS and BRAF) for mutations that provide guidance for treatment decisions in advanced NSCLC patients, with a turnaround time of 72 hours. Post-enrichment sequencing is performed by droplet digital PCR (ddPCR) (www.biodesix.com/genestrat/).
Pathway Genomics released 2 new liquid biopsy tests to market in 2015, CancerIntercept™ Detect and CancerIntercept™ Monitor. The former is intended for early discovery and the latter for serial monitoring of tumor and treatment progress. The tests require 10 ml of blood in each of 2 specialized tubes, for a total of 20 ml, and can achieve a detection limit as low as 0.01% with 300 ng of DNA and 0.25% with as little as 10 ng of DNA. Furthermore, by enriching the sample for 9 well-known driver mutation genes affecting multiple cancer types, in combination with the ultra-sensitive design of CancerIntercept, the assay is able to function as an early screening test for common tumors. Post-enrichment sequencing is performed on Illumina instruments.
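The dependence of detection limit on DNA input can be reasoned through by counting genome copies. The sketch below converts input mass to haploid genome equivalents using the commonly cited approximation of 3.3 pg per haploid human genome (an assumption, not a figure from the sources above) and computes the lowest allele fraction at which even a single mutant molecule would be present on average. It says nothing about any vendor's actual assay chemistry.

```python
PG_PER_HAPLOID_GENOME = 3.3  # ASSUMPTION: common approximation for human DNA


def genome_equivalents(input_ng: float) -> float:
    """Approximate number of haploid genome copies in the DNA input."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def sampling_floor(input_ng: float) -> float:
    """Allele fraction at which one mutant copy is expected in the input."""
    return 1.0 / genome_equivalents(input_ng)


for input_ng in (10, 300):
    print(f"{input_ng} ng: about {genome_equivalents(input_ng):,.0f} genome copies, "
          f"sampling floor {sampling_floor(input_ng):.4%}")
```

Under this approximation both quoted detection limits sit above the corresponding sampling floor, consistent with lower inputs supporting only less sensitive detection.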

Challenges
Genome heterogeneity (genetic mosaicism) and heterogeneous tissue pose some of the larger biological problems for clinical laboratories. Genetic mosaicism is an unavoidable characteristic of tumor biology; however, improvements to bioinformatics processes can greatly reduce the burden this phenomenon imposes on clinical interpretation. Indeed, as population genetics data become more plentiful, mutation rate algorithms can be further optimized and accuracy subsequently improved. Heterogeneous tissue (the various normal cells mixed in with a tumor mass) complicates wet lab procedures, reducing the efficiency and accuracy of sample sequencing. To combat this challenge, significant progress is being made in the field of single cell genomics, which will substantially reduce neighboring cell contamination [25]. As it stands, there are a number of problems that must be resolved before single molecule sequencing can be used as a comprehensive diagnostic tool [26]. For example, Pacific Biosciences' RS II sequencer produces a relatively low output of 1 GB per run, with a relatively high instrument cost of roughly $700,000 and a considerably high error rate of 14% (compared to 0.1-1.0% for current leading technologies). Detection of CNV is another persistent issue for next-generation sequencing. Although read depth-based methods for CNV detection are used for clinical testing, they are limited to high-level amplifications (>6 copies) and homozygous deletions, and are sensitive to sample purity [20,21]. Despite these challenges, advancements in mutation enrichment of liquid biopsies are enabling genetic testing labs to march forward not only with increased diagnostic sensitivity, but also with a viable method of cancer treatment monitoring.
While this review aims to detail the technologies and software used for somatic mutation detection and their adoption by the diagnostic industry, the difficulties of reimbursement and coverage of such tests cannot be ignored. Diagnostic testing for medical treatment and management needs to go hand in hand with reimbursement to address the market need, thereby driving further advancements. According to Genome.gov, two major complications for reimbursement are insufficient data regarding the economics of such testing and the difficulty of evaluating the costs of the technologies used for testing. While these data will become available with time as more genetics laboratories continue to provide information, other, more intricate problems will persist. Insurance plans differ considerably on what qualifies for reimbursement regarding genetic tests, and many even use "evidence-based coverage" plans that attempt to justify the accuracy of tests and the availability of treatments. Dissimilarity not only between insurance providers but also between laboratory protocols, the technology used, the purpose of testing as determined by the patient's physicians, and the clinical utility of the test makes this an incredibly complicated issue.

Conclusion
In the past, testing for cancer and other somatic diseases with mosaic presentations was, for the most part, limited to germline risk assessment. Today, sequencing and computing technology permit targeting and identification of complex and low-abundance variation, which is forever changing the diagnostic landscape in medicine. Clinicians now have rapid access to accurate, low-cost genetic information and can therefore develop thorough, highly personalized treatment plans and track progress through non-invasive serial testing.
There is still a great deal of work to be done, however. While clinicians make use of what is currently available, laboratory and bioinformatics scientists need to design scalable, high-throughput processes that can handle commercial volume with high reproducibility.