The Validity and Predictive Value of Blood-Based Biomarkers in Prediction of Response in the Treatment of Metastatic Non-Small Cell Lung Cancer: A Systematic Review

With the introduction of targeted therapies and immunotherapy, molecular diagnostics have gained a more prominent role in the management of non-small cell lung cancer (NSCLC). This study aimed to systematically search for studies reporting on the use of liquid biopsies (LBs), the correlation between LBs and tissue biopsies, and finally the predictive value of LBs in the management of NSCLC. A systematic literature search was performed, including results published after 1 January 2014. Articles studying the predictive value or validity of a LB were included. The search (up to 1 September 2019) retrieved 1704 articles, of which 1323 were excluded after title and abstract screening. The remaining articles were assessed for eligibility by full-text review, after which 64 articles investigating the predictive value and 78 articles describing the validity were included. The majority of studies investigated the predictive value of LBs in relation to therapies targeting the epidermal growth factor receptor (EGFR) or anaplastic lymphoma kinase (ALK) receptor (n = 38). Of the studies describing the validity of a biomarker, 55 articles reported on one or more EGFR mutations. Although a variety of blood-based biomarkers are currently under investigation, most studies evaluated the validity of LBs to determine EGFR mutation status and the subsequent selection of EGFR tyrosine kinase inhibitors based on the mutation status found in LBs of NSCLC patients.


Introduction
Lung cancer is the leading cause of cancer-related deaths worldwide and is known for its high incidence and mortality rates [1,2]. The current treatment standard for early-stage non-small cell lung cancer (NSCLC) (stage I-II) is resection. In addition to resection, patients are offered stereotactic radiotherapy, adjuvant chemotherapy, or a combination of both depending on the tumor stage. Patients diagnosed with stage III-IV are eligible for systemic therapy such as chemotherapy, chemoradiotherapy, [...]
[...] the biomarker was also detected in a matched tissue sample. The number of included patients ranged from 10 to 989 with a mean of 55 patients. The majority of studies (72%, n = 56) reported validity of EGFR mutations, including exon 19 deletion, L858R, and T790M mutations. Reported sensitivity values for identified biomarkers ranged from 19.6% to 100%. In these studies, the sensitivity was reported for EGFR, exon 19 deletion, L858R, and T790M in 23, 21, 23, and 10 studies, respectively. The results indicate that next generation sequencing (NGS) is more sensitive than polymerase chain reaction (PCR) in the detection of EGFR and T790M mutations, but less for L858R mutations. Figure 2 depicts the sensitivity, specificity, and concordance reported by each study. As shown in Figure 2, the average sensitivity of NGS in the detection of EGFR and T790M mutations was 81% and 87%, respectively, while the average sensitivity of PCR in the detection of EGFR and T790M mutations was 62% and 64%, respectively. A slightly higher sensitivity of PCR compared to NGS was reported for exon 19 deletions (NGS 67%, PCR 76%).
Specificity was reported in 21, 20, 20, and 8 studies for L858R, exon 19 deletion, EGFR, and T790M mutations, respectively. A specificity of >90% was seen in most studies, with a few exceptions such as a study reporting a specificity of 47% for a 50-gene panel including EGFR, ALK, and KRAS [19]. The specificity of L858R mutation detection was 97.8% and 98.2% for PCR- and NGS-based methods, respectively, while the average specificity for PCR- and NGS-based methods in the detection of exon 19 deletions was 98% and 97%, respectively. In the detection of T790M mutations, the average reported specificity was 94% and 82% for NGS- and PCR-based methods, respectively.
Finally, the concordance between LBs and TBs is reported as a percentage agreement. Concordance rates of EGFR mutation detection were reported in 14, 15, 14, and 6 studies for L858R, exon 19 deletion, EGFR, and T790M mutations, respectively. Concordance ranged from 40% for detection of the T790M mutation to 98.7% for the detection of EGFR mutations. On average, reported concordance rates were higher for NGS-based methods than for PCR-based methods for all EGFR mutations, with average concordance rates for NGS and PCR of 91% vs. 88% for L858R mutations, 90% vs. 87% for exon 19 deletions, 89% vs. 84% for EGFR mutations, and 69% vs. 68% for T790M mutations.
Figure 2. Validity measures of all identified analytes, including the sensitivity, specificity, and concordance of liquid biopsy results compared to matched tissue samples. The y-axis presents each of the reported biomarkers together with the analysis platform used, separated by an underscore (e.g., EGFR_NGS). The size of the "circle" (see legend right of the figure) depicts the number of patients in whom the biomarker was detected in the tissue sample. Likewise, the "plus" shaped marker depicts the average of the reported values. A boxplot is used to present the range of the reported values; the box represents the 25th and 75th percentiles, while the whiskers extend to a maximum of 1.5 times the inter-quartile range.

Study Evidence Levels for Predictive Biomarkers
A description of all studies included as describing the predictive value of biomarkers is listed in Table 2. Studies were classified according to the evidence framework proposed by Rao et al. [147]. Six different evidence levels were identified, ranging from retrospective non-case/control studies to post-hoc biomarker correlative analysis of a prospective randomized clinical trial. The majority of studies were classified as III B, a prospective observational study (n = 38, 59%). Other classes included I D, a post-hoc biomarker correlative analysis of a prospective randomized controlled trial (n = 6, 10%); II B, a prospective biomarker-driven non-randomized clinical trial (n = 5, 8%); II C, a post-hoc biomarker correlative analysis of a non-randomized clinical trial (n = 3, 5%); III C, a case-control study (n = 1, 2%); and III E, a retrospective non-case/control study (n = 11, 17%) (Figure 3).

Evidence of Predictive Value of a Biomarker Based on LBs
A total of 64 studies were identified reporting on the predictive value of a LB to guide a specific treatment. The included studies tested 67 different analytes for 24 different treatments or treatment combinations. EGFR mutations (including exon 19 deletion, T790M, and L858R) were described in 18 studies (28%), while 10 studies described the predictive value of CTC count (16%). Nineteen studies (30%) evaluated the LB to indicate chemotherapy, either a single, doublet, or combination therapy. Targeted therapy agents (e.g., erlotinib, gefitinib, icotinib, afatinib) were the subject of evaluation in 31 (48%) of the identified studies, while immunotherapy agents (e.g., patritumab, nivolumab, bevacizumab) were described by 9 (14%) studies. Figure 4 depicts the analytes and therapies described in the different studies, stratified according to the evidence level. As previously shown in Figure 3, the majority of studies were classified as class III B, a prospective observational study. In this category, CTC count, EGFR mutations, and cfDNA level were identified most frequently. While CTC count and cfDNA level were researched in combination with several types of treatments, including chemotherapy, immunotherapy, and targeted therapies, EGFR mutations in this category were exclusively researched in combination with targeted therapies. Looking at the class with the highest evidence level (I D, post-hoc biomarker correlative analysis of a prospective randomized clinical trial), we see that the majority of studies in this class evaluated EGFR mutations, including exon 19 deletion and L858R.

Discussion
Our results provide a clear overview of the current developments within the field and the potential clinical utility of the biomarkers identified in our study. More specifically, our findings suggest that in the diverse and active landscape of biomarker research, many studies focus on EGFR mutation detection in LBs. The review also concludes that EGFR is a valid marker in comparison to tissue analysis. It was shown that these LB markers can indicate which treatment is likely to be effective.
Figure 4. Evidence level per analyte and with reference to the companion therapies. The data are presented for each evidence level (levels I-III; right y-axis) and studies were categorized based on the biomarker of interest. Different colors are used to indicate the treatment these biomarkers were compared to, and numbers within the bars refer to the corresponding reference number. The evidence levels were adopted from Rao et al. [147]. I D: Post-hoc biomarker correlative analysis of a prospective randomized clinical trial. II B: Prospective biomarker driven non-randomized clinical trial. II C: Post-hoc biomarker correlative analysis of a non-randomized clinical trial. III B: Prospective observational study. III C: Case-control study. III E: Retrospective non-case/control study.
The results show substantial variation in reported sensitivity, specificity, and concordance values for LB results compared to matched tissue samples. This variation might be explained by differences in sample preparation, sample volume, the assay used, lines of treatment received prior to study inclusion, disease stage, amount of tumor shedding, and the number of patients included in the study. The difference in sensitivity and specificity between the platforms used in mutation detection was also shown in a review by Li et al. [148], in which the authors compared the performance of multiple platforms in the detection of T790M mutations. In another review, Kim et al. [149] reported on the sensitivity, specificity, and concordance rate in the detection of EGFR mutations and found a variation in outcomes depending on the technology and the genomic mutation of interest. Variations in the sensitivity of mutation detection might indicate that at this moment, LBs are not ready to replace TBs in practice; however, LBs might be a good alternative in patients in whom a TB was deemed unfeasible. Moreover, as shown in Figure 2, there are studies reporting sensitivity values exceeding 90%, indicating that by selecting the correct analysis method and patient group, LBs might provide a satisfactory sensitivity for clinical applications. The variation in concordance and specificity might be an indication of tumor heterogeneity missed by TB and would indicate that there might be an added value of performing LBs alongside tissue analysis. In a report, the International Association for the Study of Lung Cancer recommended the use of LB techniques to detect EGFR mutations in treatment-naïve patients. However, a negative result should be considered uninformative and should be followed by a TB [150].
Moreover, the dominant presence of studies reporting on the clinical validity of LBs in the detection of EGFR mutations found in this review is in line with the view of the International Association for the Study of Lung Cancer, and it is to be expected that the first role of LBs in the management of NSCLC will involve detection of EGFR mutations. Our results are potentially biased towards the evaluation of the validity of LBs in the detection of EGFR mutations. The potential of LBs in the detection of acquired resistance to 1st and 2nd generation EGFR tyrosine kinase inhibitors (TKIs) has attracted considerable attention; however, the introduction of osimertinib, a 3rd generation EGFR TKI, lessens the need for detection of the T790M resistance mechanism, for which the FDA approved the use of plasma ctDNA analysis. Moreover, current guidelines now recommend the use of osimertinib in the first-line setting, further reducing the need for the detection of acquired resistance to 1st and 2nd generation EGFR TKIs [151]. Although the necessity for LBs in the detection of T790M mutations was diminished by the introduction of osimertinib, a more comprehensive frame of reference seems appropriate, since only 12% to 45% of NSCLC patients present with EGFR mutations, depending on geography, histology, and smoking status [152,153]. Meanwhile, more driver mutations, targetable pathways, drugs, and companion diagnostics are being discovered [154], indicating that LBs have considerable potential to provide clinical benefit beyond the detection of EGFR mutations and resistance mechanisms in the future. Figure 4 shows that many biomarkers are currently being investigated in relation to a large variety of treatments and treatment combinations. This indicates that this is an active field in which multiple research groups try to identify the most beneficial treatment for patients based on genomic mutations or other biomarkers identified by LBs.
In this review, we looked at studies reporting on treatment outcomes based on biomarker analysis prior to initiation of the study-related treatment. In a number of studies, patients had received previous lines of treatment before study inclusion. Response monitoring could also be considered a predictive application of LBs; however, response monitoring was not taken into account in this review.
Currently, most targeted treatments requiring a companion diagnostic focus on tissue-based analyses for treatment selection, as indicated by Bernabé et al. [155] and also supported by the classification of studies according to their evidence level in this review (Figures 3 and 4). The preliminary nature of the evidence makes it difficult to assess the clinical benefit of mutation detection using LBs, since the beneficial effect of the treatment is unclear in tissue-negative, plasma-positive patients. Therefore, more studies should aim to include LB analysis in the study design to build on the currently available preliminary evidence. In our review, we found that 59% of identified studies were of a prospective observational nature, while only 10% of the identified studies reported on a randomized clinical trial with post-hoc biomarker analysis. Future directions towards implementation might include large registry studies, which include matched tissue and LB results, and repeated LB measurements to possibly evaluate the predictive value of a LB in response monitoring.
Like every review, this review has potential limitations. Apart from the generally recognized problems of study selection, a more fundamental problem might be the decision not to report the specific methodology used in sample preparation and analysis. This was deliberately chosen as our focus was to review evidence levels for each of the analytes. However, it is acknowledged that specific analytic issues (such as DNA extraction) will potentially impact the clinical validity and predictive value. One of the reasons for this restriction was that more than half of the included studies in the validity group did not provide detailed information regarding the applied methodology (e.g., DNA input quantity), referred back to previous work, only listed the test kits used, stated that DNA purification or library preparation was performed according to manufacturers' instructions, or reported that sample analysis was performed in an external laboratory. This lack of information makes it difficult to compare different test accuracies, even within biomarkers analyzed using similar methodologies (e.g., NGS or PCR). Second, TBs are regarded as the gold standard in determining the sensitivity, specificity, and concordance rate of LBs. In this review we did not collect information on the methods used to detect biomarkers in tissue samples, although the accuracy of the methodologies used in the analysis of tissue samples directly influences the measured accuracy of LB results; e.g., mutations missed in the analysis of tissue samples potentially lead to a reduction in the specificity and concordance rate of the LB analysis. However, it was expected that all studies included in this review applied generally accepted methods or used commercially available equipment in the analysis of tissue samples.

Eligibility Criteria
Studies included in this review could cover a wide range of LBs, but had to present results of either the clinical validity or predictive validity. Original full-text articles published in English were selected for review.

Search
A systematic literature search was performed in September 2019 using the Scopus and PubMed databases to identify relevant studies published between 1 January 2014 and 1 September 2019. The time span was selected to cover all recent developments in LB development while maintaining a manageable number of search results. The search included the following keywords and allowed for variant word forms: NSCLC, non-small cell, ctDNA, microRNA (miRNAs), CTCs, extracellular vesicles, blood, and serum. The full search queries used to perform the literature search are depicted in Supplementary Materials I (S1). All article types were included in the initial search. This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [156].

Study Selection
After removing duplicate records, a review protocol (Figure 1) was used to select relevant articles. Prior to conducting a full-text review, one author (F.v.D.) reviewed the title and abstract of all records to determine their relevance. Exclusion of records from full-text review was based on article type (e.g., review, letter to the editor, short communication, meta-analysis), cancer type (other than NSCLC), number of patients included in the study (<20), clinical utility (the absence of a relation between the biomarker and treatment outcome or a comparison between the LB results and matched tissue samples), and biomarker type (single nucleotide polymorphisms (SNPs) were excluded from this review). Inclusion criteria were checked in a fixed order (as depicted in Figure 1). The criteria were not mutually exclusive, and exclusion of an article was based on the first unmet criterion. In case of doubt, reviewed abstracts were discussed with two co-authors (V.R. and H.K.) until a consensus was reached on the inclusion of the paper. The two co-authors independently reviewed 70 randomly selected abstracts (~4% of all records identified). Results were compared to check for disagreement between reviewers.
Articles were excluded if (1) the described study included fewer than 20 patients, (2) the intended use of the biomarker was not categorized in terms of being prognostic, predictive, or diagnostic, (3) the study did not report overall survival (OS), progression-free survival (PFS), sensitivity, specificity, and/or concordance rate, or (4) the full English text was unavailable.
Studies were excluded if they only reported on a biomarker of interest that could be classified as an SNP. Reported sensitivity, specificity, and concordance were only extracted in case the study included >10 patients in whom the biomarker was detected in matched tissue samples. Thresholds were chosen to ensure a minimal evidence base.
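The fixed-order screening described above, in which a record is excluded on the first unmet criterion, can be sketched as follows. This is a minimal illustration and not the authors' actual protocol: the order of the checks is assumed to follow the order in which the criteria are listed in the text (Figure 1 is not reproduced here), and the `Record` fields and the `first_exclusion_reason` helper are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical summary of one screened abstract."""
    article_type: str             # e.g., "original", "review", "letter"
    cancer_type: str              # e.g., "NSCLC"
    n_patients: int               # number of patients included in the study
    has_clinical_utility: bool    # relates biomarker to outcome or to matched tissue
    biomarker_is_snp: bool        # SNPs were excluded from the review


# Criteria in an ASSUMED fixed order; each check returns True when the record passes.
CRITERIA: List[Tuple[str, Callable[[Record], bool]]] = [
    ("article type", lambda r: r.article_type == "original"),
    ("cancer type", lambda r: r.cancer_type == "NSCLC"),
    ("number of patients", lambda r: r.n_patients >= 20),
    ("clinical utility", lambda r: r.has_clinical_utility),
    ("biomarker type", lambda r: not r.biomarker_is_snp),
]


def first_exclusion_reason(record: Record) -> Optional[str]:
    """Return the first unmet criterion, or None if the record is retained."""
    for reason, passes in CRITERIA:
        if not passes(record):
            return reason
    return None


# Hypothetical records for illustration only.
small_review = Record("review", "NSCLC", 10, True, False)
small_study = Record("original", "NSCLC", 15, True, False)
eligible = Record("original", "NSCLC", 120, True, False)
```

Because the criteria are not mutually exclusive, a record such as `small_review` fails more than one check, but only the first failure ("article type") is recorded as the reason for exclusion.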

Data Extraction
A full-text review was conducted on all records selected by title and abstract screening to determine the eligibility of the articles for data extraction.
Full-text articles were screened for relevant outcomes, including overall survival (OS, in months or days), progression-free survival (PFS, in months or days), sensitivity, specificity, and concordance rate (percentage of identical measurement outcomes).
Records included after full-text screening were classified into two categories, namely validity and predictive value. Articles describing a direct comparison between LB and tissue-based molecular analysis were assigned to the category validity. From these papers we extracted the sensitivity, specificity, and concordance rate. The sensitivity and specificity reflect the true positive and true negative rate, respectively, while the concordance rate reflects the overlap between LB and TB outcomes. The category predictive value was assigned to articles describing differences in clinical outcomes from study treatments based on the presence of a biomarker detected by LB analysis.
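As an illustration of the three validity measures, the sketch below computes them from a 2x2 table of paired results with the tissue biopsy as the reference standard. The class and function names are assumptions made for this example, and the counts are hypothetical and not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class PairedCounts:
    """Paired LB/TB results, with the tissue biopsy (TB) as reference."""
    tp: int  # biomarker detected in both LB and TB
    fp: int  # detected in LB but not in TB
    fn: int  # detected in TB but missed by LB
    tn: int  # detected in neither


def sensitivity(c: PairedCounts) -> float:
    """True positive rate: share of TB-positive patients also positive in LB."""
    return c.tp / (c.tp + c.fn)


def specificity(c: PairedCounts) -> float:
    """True negative rate: share of TB-negative patients also negative in LB."""
    return c.tn / (c.tn + c.fp)


def concordance(c: PairedCounts) -> float:
    """Share of all patients with identical LB and TB outcomes."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


# Hypothetical counts for illustration only.
example = PairedCounts(tp=30, fp=2, fn=10, tn=58)
```

With these hypothetical counts, 30 of the 40 tissue-positive patients are also positive in the LB (sensitivity 75%), 58 of the 60 tissue-negative patients are negative in the LB (specificity about 97%), and 88 of the 100 paired results agree (concordance 88%).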

Evidence Classification
To gain insight into the stage of biomarker research, we classified the level of evidence for all articles included after full-text review and categorized as describing the predictive value of a biomarker. For this purpose, the evidence framework proposed by Rao et al. was adopted [147]. Evidence levels were classified from level I A (high-quality meta-analysis) to level IV E (expert opinion). All records were classified by the first author (F.v.D.) and discussed with co-authors (V.R. and H.K.) in case of doubt. Records were classified according to the highest applicable evidence level.
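Selecting the highest applicable evidence level for a record can be sketched as below. The ordering is an assumption made for illustration: level I is treated as higher than level IV, consistent with the text, and within a level the letters are assumed to rank from A (highest) to E (lowest). The function names are hypothetical and do not come from Rao et al. [147].

```python
from typing import Iterable, Tuple

# Roman numeral part of the level, lower number = higher evidence (per the text).
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}


def rank(level: str) -> Tuple[int, str]:
    """Sort key for a level such as 'III B'; smaller keys mean higher evidence."""
    numeral, letter = level.split()
    return (ROMAN[numeral], letter)


def highest_applicable(levels: Iterable[str]) -> str:
    """Return the highest evidence level among those applicable to a record."""
    return min(levels, key=rank)
```

For example, a record to which both III B and I D could apply would be assigned I D under this assumed ordering.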

Data Interpretation
Information provided by the included studies was summarized to provide a comprehensible overview. Specifically, in studies classified as predictive, all chemotherapy agents mentioned, whether given as single, doublet, or combination therapy, were labeled as chemotherapy. The therapy of interest was labeled as EGFR TKI in case the study included multiple comparable EGFR therapies (e.g., erlotinib and gefitinib) without stratification of results based on the prescribed therapy. In studies describing the validity of a biomarker, the detailed description of the biomarker analysis method was reduced to the principal technique or method. The distribution and average of the reported sensitivity, specificity, and concordance values were estimated using a weighted approach based on the study size.
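A study-size-weighted average of reported values can be sketched as follows. The text does not state the exact weighting scheme, so this example assumes each study's reported value is weighted by its number of patients; the function name and the input values are hypothetical.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Average of `values`, each weighted by the matching entry in `weights`."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and equally long")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("sum of weights must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical reported sensitivities and study sizes, for illustration only.
reported_sensitivity = [0.60, 0.80, 0.90]
study_size = [20, 50, 30]
```

With these hypothetical inputs the weighted mean is (0.60 x 20 + 0.80 x 50 + 0.90 x 30) / 100 = 0.79, compared with an unweighted mean of about 0.77, because the larger studies contribute more to the estimate.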

Conclusions
Current literature shows that the field is moving towards the use of LBs in the detection of EGFR mutations and the prescription of EGFR TKIs. Moreover, the first adoption of LBs in practice is expected to involve the detection of EGFR mutations as an addition to currently employed TBs. The currently available evidence for most analytes is limited to observational studies, and the sensitivity, specificity, and concordance rates of LBs showed a strong variation between studies. Although the diagnostic accuracy of LB compared to TB results is not perfect, it should be noted that LBs might detect mutations missed in TBs, and further research is needed to evaluate the clinical benefit of adopting LBs in practice.