A serum circulating miRNA diagnostic test to identify asymptomatic high-risk individuals with early stage lung cancer

Lung cancer is the leading cause of cancer mortality worldwide, and its early detection is currently the main available strategy to improve disease prognosis. While early diagnosis can be successfully achieved through tomography-based population screenings in high-risk individuals, simpler methodologies are needed for effective cancer prevention programmes. We developed a test, based on the detection of 34 microRNAs (miRNAs) in serum, that identified patients with early stage non-small cell lung carcinomas (NSCLCs) in a population of asymptomatic high-risk individuals with 80% accuracy. The signature assigned disease probability accurately in both asymptomatic and symptomatic patients, distinguished between benign and malignant lesions, and captured the onset of malignant disease in individual patients over time. Thus, our test displays a number of clinically relevant features that support its potential utility in programmes for the early detection of NSCLC.

INTRODUCTION
Non-small cell lung carcinoma (NSCLC) is the leading cause of cancer mortality worldwide (Parkin et al, 2005), largely due to the lack of effective tools for early diagnosis. Patients diagnosed with early stage disease (stage I) have a reasonably favourable prognosis (60-80% survival at 5 years, Jemal et al, 2008). Unfortunately, only 20-25% of all NSCLCs are diagnosed early (Greenlee et al, 2000;Porter & Spiro, 2000). Tobacco exposure is the major risk factor for lung cancer (Peto et al, 1996), and even in former smokers the tobacco-associated risk remains high for a long time (Jemal et al, 2009). Thus, effective tools for the early detection of NSCLC, especially in high-risk individuals, are critically needed.
Low-dose spiral computed tomography (LD-CT) has been tested in prospective observational trials and has proven to be an effective method for detecting small nodules (even <5 mm) and for diagnosing asymptomatic lung cancers, most of which (67-85%) are at stage I (Henschke et al, 2006;Pastorino et al, 2003;Swensen et al, 2003;Veronesi et al, 2008). There is debate over whether annual LD-CT screening reduces the number of diagnoses of advanced stage disease (stages III-IV) and of lung cancer deaths (Bach et al, 2007;Chien & Chen, 2008;McMahon et al, 2008), and the possibility has been raised that LD-CT screening may lead to over-diagnosis and over-treatment of biologically indolent cancers, which would rarely reach clinical relevance or pose a health hazard for patients (Reich, 2008;Welch et al, 2007). However, the effectiveness of LD-CT has recently been confirmed: a large randomized trial (the National Lung Screening Trial, Aberle et al, 2011) was stopped early on the basis that 'the accumulated data now provide a statistically convincing answer to the study's primary question' (http://www.cancer.gov/newscenter/pressreleases/NLSTresultsRelease).
While early diagnosis is indeed an effective way to reduce lung cancer deaths, concerns remain regarding the feasibility of large-scale population screening by LD-CT: the technique is cumbersome, it requires specialized health centres, and the recruitment of at-risk individuals with no symptoms of disease is difficult. Therefore, the development of diagnostic blood tests is of paramount importance for the execution of effective population screenings. The detection of specific circulating microRNAs (miRNAs) might serve this purpose. Indeed, there is evidence that circulating miRNAs with diagnostic potential exist for almost every type of malignant (Lawrie, 2010;Skog et al, 2008;Tanaka et al, 2009) and nonmalignant disease (Laterza et al, 2009;Redell et al, 2010;Rong et al, 2011;Wang et al, 2009;Zhang et al, 2010), including lung cancer (Wittmann & Jack, 2010). Here, we describe a blood test, based on the detection of serum miRNAs, that identifies early NSCLC patients in a population of asymptomatic high-risk individuals with ~80% accuracy.

RESULTS
Serum miRNAs in asymptomatic NSCLC patients and healthy smokers
We took advantage of a large study (the COSMOS study, Veronesi et al, 2008), in which 5203 high-risk individuals were screened by annual LD-CT to detect lung cancer (Fig 1A; clinical and pathological information for all patients in this study is summarized in Table SI of Supporting Information). Ninety-three NSCLCs were diagnosed in the first 2 years of screening (55 at baseline and 38 at annual screening). Sera, collected before surgery, were available from 59 of these patients (Fig 1A). At the time of serum collection, all patients were asymptomatic, and their cancers were detected solely as a consequence of their enrolment in the observational project. We selected control sera (matched for confounding factors such as gender, age and smoking exposure) from 69 individuals enrolled in the same study in whom no cancer was found by LD-CT during the entire study (healthy smokers, henceforth 'normal'; Fig 1A). Sera from the COSMOS study were divided into two sets: a training set (39 normal subjects and 25 with adenocarcinomas, AC) and a testing set (30 normal subjects, 22 AC and 12 squamous cell carcinomas, SCC) (Fig 1A and Table 1). A total of 365 miRNA assays were employed in the study (listed in Table SII of Supporting Information). A series of calibration tests (schematized in Fig 1B and Table SIV of Supporting Information). Importantly, the trend of regulation in tumour versus normal of these two groups of miRNAs was maintained between the training and the testing set, confirming the stability of the expression profile (Fig S3).

Research Article
A 34-miRNA model for NSCLC diagnosis in asymptomatic high-risk individuals
We developed a multivariate risk predictor, using the weighted linear combination of the 34 miRNA expression values (see Materials and Methods Section and Table SI of Supporting Information), to assign each patient to a high- or low-risk category (with a cut-off score of 3.235, set when the classifier was trained on the training set). In the training set, the risk algorithm displayed an accuracy of 78% and an area under the curve (AUC) of 0.92 (Table 2, Fig 2A). The predictor was remarkably stable when applied to the testing set, displaying an accuracy of 80% and an AUC of 0.89 (Table 2, Fig 2A). Importantly, the algorithm, which was derived from a training set containing only ACs, performed well for both ACs and SCCs in the testing set (AUC: 0.85 and 0.94 for AC and SCC, respectively; Table 2, Fig 2A). In addition, fairly accurate predictions could also be obtained with models containing fewer miRNAs. As shown in Fig S5, a 5-miRNA model displayed an AUC of 0.77 when applied to the testing set. As the number of miRNAs increased, the predictor became progressively more accurate, reaching an AUC of 0.89 with the 34-miRNA model.
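A weighted linear combination compared with a fixed cut-off is exactly the form of a diagonal linear discriminant analysis (DLDA) rule, the classifier named in Materials and Methods. The sketch below shows how such weights and a cut-off could be derived, assuming a textbook two-class DLDA with equal class priors (weight = difference of class means divided by the pooled within-class variance; threshold = score of the midpoint between class means). The function names and the two-feature toy data are hypothetical; the study's 34 weights and its 3.235 cut-off came from BRB-ArrayTools and are not reproduced here.

```python
from statistics import mean, pvariance


def fit_dlda(tumour, normal):
    """Fit a two-class diagonal LDA; return (weights, threshold).

    tumour, normal: lists of samples, each a list of feature values
    (e.g. normalized miRNA expression). Features are treated as independent,
    which is what makes the discriminant 'diagonal'. Assumes equal class
    priors, so the threshold is the score of the midpoint between class means.
    """
    n_t, n_n = len(tumour), len(normal)
    weights, threshold = [], 0.0
    for j in range(len(tumour[0])):
        t = [s[j] for s in tumour]
        n = [s[j] for s in normal]
        # Pooled within-class variance: total squared deviation / (n_t + n_n - 2)
        pooled = (pvariance(t) * n_t + pvariance(n) * n_n) / (n_t + n_n - 2)
        w = (mean(t) - mean(n)) / pooled
        weights.append(w)
        threshold += w * (mean(t) + mean(n)) / 2
    return weights, threshold


def risk_score(weights, x):
    """Weighted linear combination: sum_i w_i * x_i."""
    return sum(w * xi for w, xi in zip(weights, x))


def classify(weights, threshold, x):
    """'high risk' if the score exceeds the cut-off, else 'low risk'."""
    return "high risk" if risk_score(weights, x) > threshold else "low risk"


# Toy training data with two hypothetical features (not the 34 study miRNAs)
tumour = [[2.0, 1.0], [3.0, 2.0]]
normal = [[0.0, 2.0], [1.0, 1.0]]
weights, threshold = fit_dlda(tumour, normal)
print(weights, threshold)                        # [4.0, 0.0] 6.0
print(classify(weights, threshold, [2.2, 1.0]))  # high risk
print(classify(weights, threshold, [0.8, 1.2]))  # low risk
```

The second toy feature has identical class means, so it receives zero weight; with real data every selected miRNA contributes in proportion to how well it separates the classes relative to its noise.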
It is also of note that, while our training set was intentionally enriched in stage I patients (one of our main goals was to develop a signature sensitive enough to detect early stage cancers, for which surgery is most effective), the testing set contained a substantial proportion of more advanced cancers (Table 1). In the testing set, the predictor performed well for cancers of all stages (I-IV), with an AUC of 0.89 for stage I tumours and 0.88 for stage II-IV tumours (Table 2, Fig 2A), arguing for its ability to detect NSCLCs of all stages in asymptomatic patients. Indeed, a sensitivity analysis showed that the 34-miRNA signature remained a strong predictor of risk independently of the subgroup of patients considered (Fig 2B). Finally, comparable results (in the above-described and in all subsequent analyses) were obtained using two different methods of data normalization (see Fig S4 of Supporting Information).

Fabrizio Bianchi et al.
Performance of the 34-miRNA predictor model. The area under the curve (AUC, from receiver operating characteristic curves, see Fig 2A) is shown together with the overall accuracy (ACC), the sensitivity (SEN, the probability that a tumour is correctly predicted as 'tumour') and the specificity (SPE, the probability that a normal sample is correctly predicted as 'normal'). (a) Accuracy, sensitivity and specificity in the training set are based on K-fold
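The metrics defined in this caption can be made concrete with a small sketch. Assuming labels coded 1 for tumour and 0 for normal, and a sample called 'tumour' when its risk score exceeds the cut-off, SEN, SPE and ACC follow from the confusion counts, while the AUC can be computed with the rank-based (Mann-Whitney) estimator: the probability that a randomly chosen tumour scores higher than a randomly chosen normal sample, which equals the area under the empirical ROC curve. Function names and scores are hypothetical; this is not the BRB-ArrayTools computation used in the study.

```python
def confusion_metrics(scores, labels, threshold):
    """SEN, SPE and ACC when a sample is called 'tumour' if score > threshold.

    labels: 1 = tumour, 0 = normal.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return {"SEN": tp / (tp + fn), "SPE": tn / (tn + fp), "ACC": (tp + tn) / len(labels)}


def auc(scores, labels):
    """Rank-based AUC: P(random tumour scores above random normal), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for three tumours and three normals
scores = [5.0, 4.0, 1.0, 3.5, 2.0, 0.5]
labels = [1, 1, 1, 0, 0, 0]
print(confusion_metrics(scores, labels, threshold=3.235))
print(auc(scores, labels))  # 7/9, about 0.778
```

SEN, SPE and ACC change with the chosen cut-off, whereas the AUC summarizes ranking quality over all cut-offs, which is why the two kinds of figure are reported side by side.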

Additional clinical validations
We performed a series of analyses to further validate the 34-miRNA predictor and to gain insight into its potential value in the clinical setting. First, we wanted to exclude the possibility that the performance of our predictor was linked to some unknown 'study bias'. Such a bias could hypothetically arise if tumours detected by LD-CT were intrinsically different from those that present symptomatically, or from other systematic biases in the management of the individuals enrolled in the COSMOS trial.
To address this issue, we employed pre-operative sera from an independent cohort of symptomatic NSCLC patients who underwent surgery at the European Institute of Oncology over a number of years as part of the routine clinical activity of the Institute (the Symptomatic set, 23 AC and 13 SCC; Fig 1A and Table 1). As a stringent control for the symptomatic set, we screened pre-operative sera from patients harbouring benign pulmonary hamartomas (PHs, 15 sera; Fig 1A). When the 34-miRNA predictor was applied to evaluate the risk in the symptomatic set and in the PH set, it performed remarkably well (Fig 2C). The average risk index of NSCLC patients was virtually indistinguishable, both for ACs and SCCs, between the symptomatic set and the testing set of the COSMOS trial (Symptomatic set: AC = 7.0, SCC = 13.6; COSMOS set: AC = 7.7, SCC = 14.6; Fig 2C). In addition, the average risk index of patients from the symptomatic set was significantly higher than that of PH-harbouring patients (average risk score: PHs = −1.9, AC = 7.0, SCC = 13.6; p < 0.01 and p < 0.001 between PHs and AC or SCC, respectively; Fig 2C, right). Finally, the risks of normal individuals from the testing set of the COSMOS trial and of PH-harbouring patients were similar, and the difference was not statistically significant (COSMOS set normal = −4.2, Symptomatic set PHs = −1.9, p = 0.328; Fig 2C).
We next took advantage of sera from a group of 33 patients who were found at baseline LD-CT to have benign lung nodules and did not develop lung cancer during the entire period of the study (nodules, Fig 1A). This provided us with the opportunity to test whether our predictor could distinguish between benign and frankly malignant lung disease in asymptomatic patients.
We therefore compared the performance of the predictor in the 'normal' sera of the testing set and in the sera of patients with LD-CT-detected benign nodules. There were no significant differences in the average risk of the normal and nodule categories, even though the 34-miRNA model and the risk algorithm were derived from a training set that did not include nodules (average risk score: normal = −4.2, nodules = −2.3, p = 0.36; Fig 3A). Indeed, the specificity of the predictor in scoring patients with benign nodules as 'normal' was 79% (26 out of 33 samples), whereas in normal high-risk volunteers it was 90% (27 out of 30 samples). In the COSMOS trial, 53% of screened individuals displayed non-calcified nodules (Veronesi et al, 2008). Thus, weighting the false positive rates of the two groups (21% and 10%) by their prevalence (53% and 47%), we predict that the false positive rate of our test applied to a population of high-risk individuals (which comprises both normal individuals and those with benign nodules) would be ~16%.
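The ~16% estimate is a prevalence-weighted average of the two subgroup false positive rates. A short sketch reproduces the arithmetic from the figures in the text; the function name is illustrative, and the calculation assumes, as the text does, that the screened population consists only of normal individuals and individuals with benign nodules, in the proportions observed in COSMOS.

```python
def expected_false_positive_rate(spec_nodules, spec_normal, nodule_prevalence):
    """Population FPR as a prevalence-weighted average of subgroup FPRs.

    Assumes the screened population consists only of individuals with
    benign nodules and normal individuals (cancers excluded).
    """
    fpr_nodules = 1 - spec_nodules
    fpr_normal = 1 - spec_normal
    return nodule_prevalence * fpr_nodules + (1 - nodule_prevalence) * fpr_normal


# Figures from the text: 26/33 nodule sera and 27/30 normal sera scored 'normal';
# 53% of COSMOS participants had non-calcified nodules.
fpr = expected_false_positive_rate(26 / 33, 27 / 30, 0.53)
print(f"{fpr:.1%}")  # 15.9%
```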
Next, we analyzed a group of sera collected before the onset of NSCLC (i.e. from patients who were negative at a screening round but developed lung cancer more than 1 year later). For 13 such cases, we had both the serum harvested before disease onset (BDO) and the tumour serum, which was already included in the training or testing set. When the risk predictor algorithm was applied, it indicated a significantly increased average risk index for sera collected after the onset of disease (average risk: BDO = −7.1; tumour = 10.4; p < 0.001, paired t-test; Fig 3B). Thus, at least in the cases analyzed, the 34-miRNA model was capable of detecting the conversion from a normal to a malignant state.
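Because each BDO serum has a matched tumour serum from the same patient, the comparison uses a paired t-test (one-tailed, according to the Methods), which works on within-patient differences and so removes between-patient variability in baseline risk. A minimal sketch of the statistic with hypothetical risk indices (not the study's data); the one-tailed p-value would be the upper-tail probability of Student's t with n − 1 degrees of freedom, which, for example, SciPy's scipy.stats.ttest_rel(after, before, alternative='greater') reports directly.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic for the mean of (after - before), with n - 1 df.

    For a one-tailed test of an increase, the p-value is the upper-tail
    probability of Student's t with n - 1 degrees of freedom at this value.
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical risk indices for four patients, before disease onset and at diagnosis
bdo = [-7.0, -6.0, -8.0, -5.0]
tumour = [9.0, 11.0, 10.0, 12.0]
t, df = paired_t(bdo, tumour)
print(round(t, 2), df)  # 41.64 3
```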
Finally, we addressed the specificity of the 34-miRNA predictor for NSCLC, as opposed to other types of cancer, by screening sera from a cohort of 18 patients with invasive ductal breast carcinoma and 10 with benign breast nodules. In unsupervised hierarchical clustering using all 147 miRNAs, the expression profile of these samples did not differ substantially from that of all other sera used in this study (Fig S2 of Supporting Information). When the 34-miRNA risk predictor algorithm was applied, it could not discriminate between breast tumours and benign breast nodules (p = 0.65; Fig 3C).
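According to the Methods, two-group comparisons of the average risk index, such as breast carcinomas versus benign breast nodules, used Welch's t-test, which does not assume that the two groups have equal variances. A minimal sketch of the statistic and of its Welch-Satterthwaite degrees of freedom, with hypothetical values; the p-value would then come from Student's t distribution (for example via scipy.stats.ttest_ind(a, b, equal_var=False)).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's two-sample t-test, the two groups' variances are not pooled.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical risk indices for two small groups
group_a = [1.0, 2.0, 3.0]
group_b = [1.0, 3.0, 5.0, 7.0]
t, df = welch_t(group_a, group_b)
print(round(t, 3), round(df, 3))  # -1.414 4.075
```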

DISCUSSION
We have developed a serum test that can identify patients with early stage NSCLC among asymptomatic high-risk individuals. The test displays several characteristics that are desirable in a clinical setting: (i) it can be performed on modest amounts of serum (0.5-1 ml) without any need for a pre-amplification step; (ii) it is likely to be considerably cheaper, easier and more immediately implementable (particularly from the point of view of patient accrual and compliance) than current screening procedures; (iii) it assigns disease probability accurately in high-risk asymptomatic individuals; (iv) it can distinguish malignant lesions from the benign nodules that are frequently found by LD-CT in high-risk populations; and (v) it captures the onset of malignant disease in asymptomatic individuals. When applied to a high-risk population screened by LD-CT, the most powerful tool for early diagnosis currently available, our test showed an accuracy of 80% and was remarkably stable between the training and testing sets. A number of signatures have been developed for the diagnosis of NSCLC (Foss et al, 2011;Heegaard et al, 2011;Hu et al, 2010;Shen et al, 2011). The majority of these signatures were obtained (and validated to various degrees) on symptomatic lung cancers, and it is not presently known whether they are useful for diagnosing asymptomatic patients, for the purpose of diagnostic anticipation. However, a study by Boeri et al (2011), with a design similar to ours, was published while we were completing the present manuscript. Among various signatures, these authors identified a diagnostic signature that is conceptually comparable to ours in that it allows diagnosis in asymptomatic subjects. Their diagnostic signature is composed of 13 miRNAs, five of which overlap with our signature, with concordant direction of variation (upregulated or downregulated).
Although the partial overlap between the two signatures may suggest that both studies capture the core of circulating miRNA differences between normal and cancer individuals, the majority of the miRNAs are different. The reason for this difference is presently unclear. As observed for cancer messenger RNA (mRNA) signatures derived from similar studies, differences could be due to intrinsic genetic heterogeneity between groups of patients. It should also be noted that Boeri et al performed their analyses in plasma, while we analyzed sera, and differences in the analyte are known to affect the results of studies of this nature (Heegaard et al, 2011;Kroh et al, 2010;McDonald et al, 2011). Finally, the signatures could differ because of intrinsically different properties. Our signature was trained on a large number of controls and, upon validation, showed remarkable specificity for both normal subjects and benign lung nodules. It will be interesting to determine whether the signature of Boeri et al possesses similar features when tested on a large number of screening-relevant controls, such as benign lung nodules, which are detected at high frequency in high-risk populations by LD-CT (Veronesi et al, 2008). Specificity is a relevant issue when developing a signature for diagnostic anticipation in high-risk individuals, and it becomes even more relevant in the case of benign nodules. Our signature also appears to be specific for lung cancer when compared with another type of cancer (i.e. breast cancer).
An important question relates to the tissue origin of the serum miRNAs of our signature. In principle, these miRNAs could originate from the tumour itself or from host responses to the tumour. In the former case, an additional question is whether the miRNAs of the signature play any role in lung tumourigenesis. Our data do not allow us to distinguish between these possibilities. However, it is interesting to note that several members of our predictor have been implicated in lung cancer. Members of the let-7 family (let-7a/b/e are present in the predictor and are downregulated in the tumour sera we tested) are often downregulated in lung cancer, with diagnostic and prognostic value (Yanaihara et al, 2006). These miRNAs regulate lung tumour growth in NSCLC cell lines and mouse models (Trang et al, 2010) by targeting several oncogenes, including rat sarcoma viral oncogene homolog (RAS) (Johnson et al, 2005). Members of the 17-92 cluster, the first oncomiR identified (He et al, 2005), are also present in our predictor (miR-17 and miR-92a). Interestingly, the 17-92 cluster, which is under the control of the myelocytomatosis viral oncogene homolog (MYC) oncogene (He et al, 2005;O'Donnell et al, 2005), has a role in lung development (Ventura et al, 2008) and interacts with multiple cancer-relevant pathways (reviewed in Mendell, 2008). Finally, miR-126 and miR-486-5p have both been found to be downregulated in primary cancer and in the sera of NSCLC patients in several independent studies (Boeri et al, 2011;Hu et al, 2010;Shen et al, 2011;Volinia et al, 2006;Yanaihara et al, 2006). In addition, miR-126 inhibits cancer cell growth and metastasis in vitro and in vivo (Crawford et al, 2008;Liu et al, 2009;Tavazoie et al, 2008). Although miR-486-5p has not been directly implicated mechanistically in lung cancer, it is of note that it is expressed in stem-like precursors, the bronchoalveolar stem cells (BASCs), of the mouse lung (Qian et al, 2008).
Thus, the potential role of our signature miRNAs in the pathogenesis of NSCLC deserves further experimental attention.
While a larger study that systematically and prospectively compares the results of LD-CT and our serum test will be needed to optimize our blood test, we anticipate that the composition of the 34-miRNA model will not change substantially, given its remarkable stability. However, large-scale validation might improve the predictive power of the test by allowing fine-tuning of the predictive algorithm.
We envision that our serum test for NSCLC will find its main application in the clinic as a 'first-line screening test' for high-risk individuals, to identify those who should undergo further testing, including LD-CT. Such a test might prove very useful for high-risk population screening, as it is cheap and minimally invasive. Furthermore, the simplicity of the procedure avoids the 'medicalization' of asymptomatic individuals, which could encourage population compliance with large-scale screening programmes.

Patient selection criteria
Patients with NSCLC and healthy individuals were selected from a consecutive series of 5203 individuals enrolled in the COSMOS study (Veronesi et al, 2008). All were current or former smokers with a smoking exposure of more than 20 pack-years, aged over 50 (see Table 1). Tumour stage at the time of diagnosis was determined according to the guidelines of the American Joint Committee on Cancer (http://www.cancerstaging.org/). Informed consent was obtained from all patients. Other cohorts of patients employed in the study are described in the main text. A summary of the clinical and pathological information for all patients is in Table SI […] Table SII of Supporting Information) were suitable for the purpose of our study; thus, only these assays were considered for data normalization and further analyses. In the training set, we selected six miRNAs behaving as 'housekeeping (HK) serum-miRNAs' (miR-197, miR-19b, miR-24, miR-146, miR-15b and miR-19a; Table SIII of Supporting Information) according to the following criteria: (i) they passed all the quality tests described in Fig S1; in particular, they were optimal quantitative variables (R² = 0.99); (ii) they were expressed at high levels and in all samples (median Ct < 30; detected in 100% of samples); (iii) they were not statistically different among the analyzed classes (Welch's t-test, p > 0.1); and (iv) they were the least variable miRNAs across all samples (SD < 0.9 Ct). Raw data were therefore normalized to the geometric mean of the HK serum-miRNAs. Only values below a threshold of Ct < 36 were normalized, in order to avoid artefactual regulation due to sample normalization. Values for the testing and symptomatic sets were normalized using the same HK serum-miRNAs. Expression values for all serum samples used in the study, before and after normalization, are shown in Fig S2 of Supporting Information.
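The housekeeping selection and normalization steps can be sketched under stated assumptions: Ct values are stored per miRNA as one value per sample, None marks an undetected assay, only criteria (ii) and (iv) are coded (the R² quality test and the Welch's t-test across classes are left out), and values at or above the Ct cut-off are simply set to missing, since the text does not say how they were handled. All names and values are hypothetical. Because expression is proportional to 2^−Ct, normalizing to the geometric mean of the HK miRNAs amounts to subtracting the arithmetic mean of their Ct values.

```python
from statistics import mean, median, stdev

CT_CUTOFF = 36.0  # only Ct values below this are normalized (as in the text)


def select_housekeeping(ct, max_median=30.0, max_sd=0.9):
    """Return candidate HK miRNAs from a {mirna: [Ct in each sample]} table.

    Implements criteria (ii) and (iv) only; None marks an undetected assay.
    Criteria (i) and (iii) are not coded in this sketch.
    """
    hk = []
    for mirna, values in ct.items():
        if any(v is None for v in values):  # must be detected in 100% of samples
            continue
        if median(values) < max_median and stdev(values) < max_sd:
            hk.append(mirna)
    return hk


def normalize_sample(sample_ct, hk):
    """Delta-Ct normalization of one sample against its HK miRNAs.

    Subtracting the arithmetic mean of the HK Ct values equals dividing the
    linear expression (2**-Ct) by the geometric mean of HK expression levels.
    Values at or above CT_CUTOFF, or missing, become None (an assumption).
    """
    ref = mean(sample_ct[m] for m in hk)
    return {m: (v - ref if v is not None and v < CT_CUTOFF else None)
            for m, v in sample_ct.items()}


# Hypothetical Ct table: three samples per miRNA
training_ct = {
    "mirA": [25.0, 25.5, 24.8],
    "mirB": [26.0, 26.2, 25.9],
    "mirC": [31.0, 33.0, 35.0],  # too weak and too variable
    "mirD": [28.0, None, 29.0],  # not detected in every sample
}
hk = select_housekeeping(training_ct)
print(hk)  # ['mirA', 'mirB']
print(normalize_sample({"mirA": 25.0, "mirB": 26.0, "mirC": 37.0, "mirD": 28.0}, hk))
# {'mirA': -0.5, 'mirB': 0.5, 'mirC': None, 'mirD': 2.5}
```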
In addition, to exclude the possibility that our results were influenced by the normalization method, we repeated all the analyses with a different method, i.e. median normalization, in which each sample was centred on the median value of all 147 expressed miRNAs. The two normalization methods yielded almost identical results (R² = 0.8, p < 0.0001; Fig S4 of Supporting Information).
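The alternative median normalization can be sketched in the same hypothetical representation: each sample is shifted so that the median of its expressed miRNAs (147 in the study) becomes zero. Which values enter the median is an assumption here (all non-missing values in the sample), and the function name is illustrative.

```python
from statistics import median


def median_normalize(sample):
    """Centre one sample on the median of its expressed miRNAs.

    sample: {mirna: Ct value}, with None for a miRNA not expressed in it.
    """
    centre = median(v for v in sample.values() if v is not None)
    return {m: (v - centre if v is not None else None) for m, v in sample.items()}


print(median_normalize({"mirA": 24.0, "mirB": 27.0, "mirC": 30.0, "mirD": None}))
# {'mirA': -3.0, 'mirB': 0.0, 'mirC': 3.0, 'mirD': None}
```

Unlike HK normalization, this reference depends on the whole profile of each sample rather than on a few stable miRNAs, so agreement between the two methods suggests that results are not driven by the choice of reference.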

Clustering and class prediction analyses
Hierarchical clustering analysis was performed using Cluster 3.0 for Mac OS X (http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm). Expression data were clustered using uncentred correlation and average linkage. Tree pictures were generated using Java TreeView software (http://jtreeview.sourceforge.net). For class prediction, log2 median-centred expression data were analyzed using BRB-ArrayTools version 3.8.0-Beta_1 Release (April 2009) (http://linus.nci.nih.gov/BRB-ArrayTools.html). MiRNAs significantly different between case and control classes in the training set at the 0.05 significance level (parametric t-test, random variance model) were used for class prediction. The misclassification rate of the classifier was computed by diagonal linear discriminant analysis (DLDA) and K-fold (K = 5) cross-validation repeated 100 times. Statistical significance of the DLDA classifier was assessed by 1000 random permutations of the class labels. Patients in the testing and symptomatic sets were classified blindly, using the following prediction rule from the DLDA predictor: a sample is classified as 'high risk' if the weighted sum of the expression values (xᵢ) of the 34 miRNAs, with weights wᵢ (see Table SI), is greater than the threshold determined in the training set, that is, Σ wᵢxᵢ > 3.235. To ensure that the 34-miRNA composition of our model was optimal, we repeated the entire analysis using four additional classifiers built from 5, 10, 15 and 30 miRNAs of the initial list of 34 miRNAs (Fig S5). The best performance in terms of AUC was obtained with the 34-miRNA classifier (Fig S5). Statistical significance of the differences in the average risk index among the various sets of patients was calculated using ANOVA (for more than two groups) or Welch's t-test, using Prism (GraphPad Software, Inc.). Statistical significance of the difference in the average risk index between BDO and matched tumour sera was calculated using the one-tailed paired t-test (GraphPad Software, Inc.). Sensitivity analyses and Forest Plots were prepared using the statistical software JMP IN (SAS), and p values were calculated with Fisher's exact test.

PROBLEM:
Non-small cell lung carcinoma is the leading cause of cancer mortality worldwide, largely due to a lack of effective tools for early diagnosis. Patients diagnosed with early stage disease (stage I) have a reasonably favourable prognosis (60-80% survival at 5 years). Unfortunately, only 20-25% of all NSCLCs are diagnosed early. Effective tools for the early detection of NSCLC, especially in high-risk individuals (i.e. smokers), are critically needed to reduce diagnoses of advanced stage disease (stages III-IV) and lung cancer deaths. Several studies have reported the usefulness of LD-CT, which is effective for the detection of nodules (even when these are small) and can be used to diagnose asymptomatic lung cancers, most of which are at stage I. There is, however, concern regarding the feasibility of large-scale population screening by LD-CT: the technique is cumbersome, it requires specialized health centres, and the recruitment of at-risk individuals with no signs of disease is difficult. A blood test, on the other hand, could be administered easily, would be fast and cheap, would not require specialized health centres and would encourage the recruitment of high-risk individuals for the diagnostic anticipation of lung cancer.

RESULTS:
The authors developed a blood test for lung cancer diagnosis in asymptomatic high-risk individuals (heavy smokers aged over 50) based on the detection of miRNAs in serum. Sera were collected from high-risk subjects enrolled in a large prospective trial for the early detection of lung cancer by annual LD-CT (the COSMOS study). A diagnostic signature of 34 serum miRNAs was identified. The signature can identify patients with early stage NSCLC with 80% overall accuracy. In addition, the signature can distinguish between benign lung nodules (which are a frequent occurrence in at-risk individuals) and frank NSCLCs. Finally, the signature can capture the time of disease onset in individual patients.

IMPACT:
The authors report a blood test for the diagnosis of NSCLC that could influence the design of screening programmes for early detection in at-risk individuals, with a potential improvement in the prognosis of the disease.

Author contributions
FB, FN, MM, VDO, LB and EB performed experimental work. GP, PM and GV performed the clinical part of the work (patient selection, sera procurement, histology, patient case collection and biostatistics). FB, FN and PPDF planned and supervised the project, performed data analysis and wrote the manuscript. FB, FN and PPDF had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.