Diagnostic value of circulating genetically abnormal cells to support computed tomography for benign and malignant pulmonary nodules

Background The accuracy of CT and tumour markers in screening for lung cancer needs to be improved. Computer-aided diagnosis has been reported to effectively improve the diagnostic accuracy of imaging data, and recent studies have shown that circulating genetically abnormal cells (CACs) have the potential to become a novel marker of lung cancer. The purpose of this research is to explore new ways of lung cancer screening. Methods From May 2020 to April 2021, patients with pulmonary nodules who had received CAC examination within one week before surgery or biopsy at First Affiliated Hospital of Zhengzhou University were enrolled. CAC counts, CT scan images, serum tumour marker (CEA, CYFRA21-1, NSE) levels and demographic characteristics of the patients were collected for analysis. CT images were uploaded to the Pulmonary Nodules Artificial Intelligence Diagnostic System (PNAIDS) to assess the malignancy probability of nodules. We compared diagnoses based on PNAIDS, CAC, the Mayo Clinic model and tumour markers, alone and in combination. The combination models were built through logistic regression and were compared using the area under the ROC curve (AUC). Results A total of 93 of 111 patients were included. The AUC of PNAIDS was 0.696, which increased to 0.847 when combined with CAC. The sensitivity (SE), specificity (SP), and positive (PPV) and negative (NPV) predictive values of the combined model were 61.0%, 94.1%, 94.7% and 58.2%, respectively. In addition, we evaluated the diagnostic value of CAC, which showed an AUC of 0.779, an SE of 76.3%, an SP of 64.7%, a PPV of 78.9%, and an NPV of 61.1%, higher than those of any single serum tumour marker and the Mayo Clinic model. The combination of PNAIDS and CAC exhibited a significantly higher AUC than PNAIDS (P = 0.009) or CAC (P = 0.047) alone. However, including additional tumour markers did not significantly alter the performance of CAC and PNAIDS.
Conclusions CAC had a higher diagnostic value than traditional tumour markers in early-stage lung cancer and a supportive value for PNAIDS in the diagnosis of cancer based on lung nodules. The results of this study offer a new mode of screening for early-stage lung cancer using lung nodules. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-022-09472-w.


Background
Lung cancer is the main contributor to cancer mortality globally [1,2]. The main screening method for the early diagnosis of lung cancer is low-dose spiral CT (LDCT), particularly when lung nodules are small and aspiration biopsy is not suitable. However, according to the report of the National Lung Screening Trial (NLST), only 3.6% of lung nodules screened by LDCT are diagnosed as lung cancer [3]. This situation often causes overdiagnosis or a significant delay in the early diagnosis of lung cancer, and patients may lose the opportunity to receive timely treatment [4]. Moreover, because imaging readers differ in experience and understanding, there remains a need for a method to assist in the analysis of CT results.
To date, there have been considerable efforts to improve the efficiency of diagnosis of lung cancer based on imaging, which includes computer-aided diagnosis (CAD) systems. Indeed, CAD systems can help in detecting lung nodules in LDCT and in determining the nature of nodules by extracting and analysing the imaging characteristics of nodules, including their size, shape, and density, among others [5]. A matched case-control study using NLST data found that the CAD image analysis method significantly improves diagnostic accuracy for lung nodules detected at low-dose CT [6]. Nevertheless, the imaging features of early-stage lung cancer are usually atypical, and it is still a challenge to use CAD alone to separate small malignant nodules from the majority of benign nodules. Furthermore, CAD lacks rigorous evidence to make explainable medical decisions because of the black-box-based inference process of deep learning [7]. Therefore, CAD cannot be applied for medical diagnosis and decision-making alone, yet the combination of multiple clinical indicators may help to improve diagnostic accuracy [7][8][9].
In addition to a more reliable method for analysing and interpreting CT results, blood-based biomarker tests also have great potential in lung cancer diagnosis. Beyond traditional tumour markers, noninvasive liquid biopsies, such as circulating free nucleic acids (RNA and DNA) and circulating tumour cells (CTCs), have been reported in recent years.
However, liquid biopsy has not yet been adopted in routine clinical practice owing to many limiting factors [10], and traditional tumour markers are limited because of their low sensitivity and false positives caused by infection or other factors [11]. Moreover, CTCs of lung cancer often display nonepithelial characteristics and are difficult to detect through epithelial cell adhesion molecule (EpCAM)-dependent methods [12]. The recently proposed biomarker of circulating genetically abnormal cells (CACs) may solve this dilemma.
CACs are defined as peripheral blood mononuclear cells carrying mutations on chromosome 3 (3p22.1, 3q29) and chromosome 10 (10q22.3, CEP10); the detection of these cells is not EpCAM dependent and therefore overcomes the limitation of CTC detection [13]. Abnormalities at the above loci have been shown through comparative genomic hybridization analysis to commonly occur in lung cancer [14]. Katz et al. then confirmed genomic abnormalities in the sputum, tissue and blood of patients with non-small-cell lung cancer (NSCLC) [15][16][17]. Katz et al. also showed that CACs have auxiliary diagnostic value in different stages of lung cancer, with the latest research showing a sensitivity and specificity of 88.8% and 100%, respectively, for lung cancer diagnosis [17]. Therefore, CACs have great potential for diagnosing pulmonary nodules [18].
In this work, we retrospectively analysed data for patients with pulmonary nodules and attempted to identify a novel biomarker to support the ability of CT to differentiate malignant from benign pulmonary nodules. The objective of this study was to explore new ways of diagnosing pulmonary nodules by establishing new diagnostic models based on artificial intelligence-based CAD and comparing the diagnostic efficiency of different models.

Study design and patients
This was a retrospective study of patients with pulmonary nodules detected by CT at First Affiliated Hospital of Zhengzhou University. In total, 111 patients were screened from May 2020 to April 2021.
The inclusion criteria for the study were as follows: (1) ≥ 18 years of age; (2) pulmonary nodule diameter no more than 30 mm (measured by CT scan), including single and multiple pulmonary nodules; (3) diagnosis histologically confirmed using nonsurgical biopsy (including fibre bronchoscope biopsy, computed tomography or ultrasonic-guided percutaneous transthoracic biopsy) or surgical resection; and (4) CAC tests performed within 1 week prior to surgery or biopsy. The exclusion criteria were as follows: (1) CT slice thickness greater than 2 mm; (2) a history of malignant tumours; (3) malignant nodules that were not classified as stage I based on the 8th edition of the American Joint Committee on Cancer (AJCC) staging system [19]; and (4) malignant nodules that were not primary malignant tumours of the lung. Ultimately, 93 patients were enrolled and divided into benign and malignancy groups based on histopathologic results (Fig. 1). Tumour pathology was classified according to the World Health Organization (WHO) classification standard of lung tumours (2015 edition) [20].
Keywords: Circulating genetically abnormal cells (CAC), Pulmonary nodules, Lung cancer, Early diagnosis, Computed tomography (CT)

Data collection
Clinical data for the patients were collected, including sex, age, smoking history and family history of malignant tumours. The results of preoperative serum tumour marker levels, including carcinoembryonic antigen (CEA), cytokeratin fragment 21-1 (CYFRA21-1) and neuron-specific enolase (NSE), were collected for 66 patients. The chest CT imaging data for the enrolled patients were separately exported, and the imaging features of nodules (including the diameter, type, location, counts, number and spiculation of nodules) were independently assessed by two senior physicians. When opinions differed, a consistent conclusion was reached through discussion with a third senior physician.

Pulmonary Nodules Artificial Intelligence Diagnostic System (PNAIDS) based CAD
PNAIDS is an artificial intelligence-based CAD that applies machine learning technology and a deep convolutional neural network to realize 3D reconstruction and segmentation of nodules and predict the malignancy probability of pulmonary nodules [21]. All chest CT scans were obtained during deep inspiration; the CT images were acquired with a layer thickness of no more than 5 mm and reconstructed with a slice thickness of less than 2 mm. Lung-window images were downloaded in DICOM format and uploaded to a cloud platform in the same format. The malignancy probability of each nodule was calculated. For patients with multiple nodules, the highest malignancy probability among all nodules was used for analysis.

CAC detection
Ten millilitres of peripheral venous blood was collected within one week before surgery or biopsy. Blood samples were collected into an anticoagulation tube containing EDTA and fixed with cell preservation solution (including solution A containing phosphatase inhibitor and protease inhibitor and solution B containing formaldehyde) within 2 h. Peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll-Hypaque density gradient centrifugation within 96 h. PBMCs were diluted to 40,000/100 μl, and a smear was prepared. Four-colour (3p22.1, 3q29 and 10q22.3, CEP10) fluorescence in situ hybridization was performed using a mononuclear cell chromosome abnormality detection kit (Zhuhai SanMed Biotech Inc.). The scanning, imaging and analysis procedures were automatically completed by a pathological section scanner (The Duet System, Allegro Plus, Bioview Ltd.). A total of 10,000 cells were randomly selected for a 15-layer cell scan, and the number of CACs was calculated. CACs were defined as cells exhibiting abnormal amplification at specific sites and at least three fluorescent signals at two or more specific probe sites (as presented in Fig. 2).
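The CAC definition can be sketched as a simple counting rule. The following Python fragment is only an illustration of one reading of the definition (a cell is a CAC when at least two of the four probe sites show three or more signals); the function names and the example cells are hypothetical and are not taken from the detection kit or the scanner software.

```python
from __future__ import annotations

# Loci named in the text. Which fluorescent colour marks which locus is not assumed here.
PROBES = ("3p22.1", "3q29", "10q22.3", "CEP10")


def is_cac(signals: dict[str, int]) -> bool:
    """Return True if at least two probe sites show three or more signals."""
    gained = [probe for probe in PROBES if signals.get(probe, 0) >= 3]
    return len(gained) >= 2


def count_cacs(cells: list[dict[str, int]]) -> int:
    """Count the cells of one sample that satisfy the CAC rule."""
    return sum(is_cac(cell) for cell in cells)


# Hypothetical diploid cell: two signals at every probe site.
normal_cell = {"3p22.1": 2, "3q29": 2, "10q22.3": 2, "CEP10": 2}
# Hypothetical cell with three signals at both chromosome 3 probe sites.
gained_cell = {"3p22.1": 3, "3q29": 3, "10q22.3": 2, "CEP10": 2}

print(count_cacs([normal_cell, gained_cell, normal_cell]))
```

In the study the count is taken over 10,000 scanned cells per sample; the three-cell list here only exercises the rule.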

Mayo Clinic model
The widely accepted Mayo Clinic model [22] was also applied to predict the malignancy probability of nodules. The model expresses the malignancy probability as a function of six predictors: (1) probability of malignancy = $e^{x} / (1 + e^{x})$; (2) $x = -6.8272 + (0.0391 \times \text{age}) + (0.7917 \times \text{smoking}) + (1.3388 \times \text{cancer}) + (0.1274 \times \text{nodule diameter}) + (1.0407 \times \text{spiculation}) + (0.7838 \times \text{upper lobe})$; (3) $e$ is the base of the natural logarithm; age is the patient's age (years); if the patient is a current or former smoker, smoking = 1 (otherwise = 0); if the patient has a history of extrathoracic malignancy more than 5 years previously, cancer = 1 (otherwise = 0); the nodule diameter is the diameter of the nodule (mm); if there are burrs at the edge of the nodule, spiculation = 1 (otherwise = 0); and if the nodule is located in an upper lobe, upper lobe = 1 (otherwise = 0).
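As a minimal sketch, the Mayo Clinic formula can be evaluated as follows. The coefficients are those quoted in the text; the function name, the argument names and the example patient are hypothetical and serve only to show the arithmetic.

```python
import math


def mayo_probability(age_years: float, smoker: bool, prior_cancer: bool,
                     diameter_mm: float, spiculation: bool, upper_lobe: bool) -> float:
    """Malignancy probability from the logistic formula quoted in the text."""
    x = (-6.8272
         + 0.0391 * age_years
         + 0.7917 * int(smoker)        # current or former smoker
         + 1.3388 * int(prior_cancer)  # history of extrathoracic malignancy
         + 0.1274 * diameter_mm
         + 1.0407 * int(spiculation)
         + 0.7838 * int(upper_lobe))
    # 1 / (1 + e^-x) is algebraically the same as e^x / (1 + e^x).
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical patient: 60 years old, smoker, 15 mm spiculated nodule, not in an upper lobe.
p = mayo_probability(60, True, False, 15, True, False)
print(f"Estimated malignancy probability: {p:.1%}")
```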

Statistical analysis
Statistical analyses were performed using SPSS 21.0. Quantitative variables are expressed as the mean ± standard deviation (X ± S) or median and quartiles [M(QL, QU)], and independent sample t-tests or Mann-Whitney U tests were applied. Categorical variables are expressed as n (%) and analysed using the Chi-square test or Fisher's exact test. Receiver operating characteristic (ROC) curves were drawn, and the area under the curve (AUC), sensitivity (SE), specificity (SP), positive predictive value (PPV) and negative predictive value (NPV) were calculated; the Youden index was used to determine the cut-off value. To validate the robustness of the diagnostic model, logistic regression and Fisher discriminant analysis were both performed. The Chi-square test was applied for correlation analysis of classification variables. Two-sided P < 0.05 was considered significant. Correlation between numerical variables was analysed by calculating the Spearman rank correlation coefficient, with two-sided P < 0.01 considered significant. Boxplots, forest plots, and heatmaps were drawn in R (v4.0.10). DeLong's test was applied to compare AUC between ROC curves (R package pROC).
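The relationship between SE, SP, PPV, NPV and the Youden index can be illustrated with a short Python sketch. The confusion-matrix counts below are not reported in the paper: they are back-calculated, as an assumption, from the percentages reported for the combined PNAIDS and CAC model and from the group sizes (59 malignant, 34 benign). The toy scores passed to `youden_cutoff` are likewise invented for illustration, and both function names are hypothetical.

```python
from __future__ import annotations


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Standard 2x2 diagnostic metrics plus the Youden index (SE + SP - 1)."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {"SE": se, "SP": sp,
            "PPV": tp / (tp + fp), "NPV": tn / (tn + fn),
            "Youden": se + sp - 1.0}


def youden_cutoff(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (cut-off, Youden index), calling a case positive when score >= cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = scores[0], -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Assumed counts, back-calculated from the reported percentages (not from raw data).
combined = diagnostic_metrics(tp=36, fn=23, tn=32, fp=2)
print({name: round(value * 100, 1) for name, value in combined.items()})

# Invented toy scores (1 = malignant, 0 = benign), used only to exercise the cut-off search.
print(youden_cutoff([0, 1, 2, 3, 4, 6], [0, 0, 0, 1, 1, 1]))
```

With these assumed counts the four percentages reproduce the values reported for the combined model, which is a useful consistency check but does not replace the original data.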

Patient characteristics
A total of 111 patients were initially screened in this study, among whom 18 were excluded for different reasons (7 cases were not stage I, 3 were not primary lung cancer, 4 had a history of malignancy, and 4 lacked thin-slice CT data) (Fig. 1). Ninety-three patients were ultimately included in the analysis, of whom 59 (63.4%) were diagnosed with lung cancer and 34 (36.6%) with benign nodules. There were 39 males (41.9%) and 54 females (58.1%), with a mean age of 53.11 ± 10.74 years.
There were statistically significant differences in sex (P = 0.003), smoking history (P = 0.035), and type of nodules (P = 0.001), whereas no differences in age, family history of cancer, diameter of nodules, multiple nodules, upper lobe nodules, or burr signs were found between the benign group and the malignancy group. As none of the females had a history of smoking, a subgroup analysis of smoking history was performed, stratified by sex. Stratified analysis showed a nonsignificant difference in smoking history between the benign and malignancy groups in the male subgroup, with a Chi-square test statistic of 0.300 (P = 0.584). The basic characteristics of the two groups are shown in Table 1.

PNAIDS and CAC counts in patients
Fig. 2 (caption fragment) a) ... the blue and yellow probes located on chromosome 10 have two signals (see blue and yellow arrows in Fig. 2a), indicating a normal cell. b) Both the green and red probes, which are located on chromosome 3, have three signals (see green and red arrows in Fig. 2b), and the blue and yellow probes located on chromosome 10 have two signals (see blue and yellow arrows in Fig. 2b), indicating that the cell has abnormal amplification on chromosome 3 and is therefore a CAC.
There was no significant difference between the benign group and the malignancy group in the time between CAC testing and surgery or biopsy (3 (1, 5) days for the benign group and 4 (2, 5) days for the malignancy group; P = 0.393). The median CAC count was 1.5 (0, 3) in the benign group and 4 (3, 6) in the malignancy group; the median PNAIDS was 67.5% (59.5%, 78.8%) and 82.0% (70.0%, 90.0%), respectively. The differences in the distribution of CAC (U = 1562.5) and PNAIDS (U = 1396.5) between the benign and malignancy groups were statistically significant, at P < 0.001 and P = 0.002, respectively (Fig. 3).
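The Mann-Whitney U statistics are directly related to the AUC values reported in the abstract, since AUC = U / (n1 × n2) when U counts the pairs in which a malignant case scores higher than a benign case (ties counted as one half). The short sketch below checks this with the reported group sizes; the function name is hypothetical, and it is an assumption that the reported U values use this orientation.

```python
def auc_from_u(u_statistic: float, n_pos: int, n_neg: int) -> float:
    """Convert a Mann-Whitney U statistic into an AUC: U / (n_pos * n_neg)."""
    return u_statistic / (n_pos * n_neg)


n_malignant, n_benign = 59, 34  # group sizes reported in the text

print(round(auc_from_u(1562.5, n_malignant, n_benign), 3))  # CAC counts -> 0.779
print(round(auc_from_u(1396.5, n_malignant, n_benign), 3))  # PNAIDS -> 0.696
```

Both results agree with the AUC values given for CAC (0.779) and PNAIDS (0.696).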

Diagnostic efficiency of different methods
Based on PNAIDS, CAC counts, Mayo Clinic model, and tumour marker levels in the benign and malignancy groups, the ROCs were drawn (Fig. 4). The AUC, 95% confidence interval (CI), and Youden index of all these indicators are presented in Fig. 5. SE, SP, PPV and NPV are shown in

Correlation between indicators
Correlation analysis among CAC counts, PNAIDS, age, CEA, CYFRA21-1, NSE and nodule diameter showed a weak correlation between CAC counts and age (r = 0.311, P = 0.002) and between NSE and diameter of lung nodules (r = 0.323, P = 0.008). PNAIDS did not exhibit a significant correlation with any of these indicators (Fig. 6). Notably, no significant correlation between PNAIDS and CAC was observed.
Numerical variables (PNAIDS and CAC counts) were converted to categorical variables according to cut-off values. Because PNAIDS was accurate to 2 decimal places, it was classified by whether it was less than 70.0%. In correlation analysis of PNAIDS, CAC counts and other categorical variables, PNAIDS correlated significantly with the type of nodule (P = 0.035), but no significant correlation with other indicators was detected. In addition, there was no significant correlation between CAC counts and any of these categorical variables (Table 3).

Combination of different indicators for diagnosing lung nodules
As tumour markers tend to be used in combination in the clinic, a logistic regression model named TM was established using CEA, CYFRA21-1 and NSE, and its ROC curve was used to diagnose lung nodules (Fig. 7). First, Model 1 was established by combining TM and PNAIDS. Second, PNAIDS and CAC counts were combined to build Model 2. To better apply CAC counts to the model, we transformed this marker into ln (CAC counts + 1), which was then applied to the logistic regression model. Similarly, the transformed data were used to build Model 3, which combined PNAIDS, CAC counts and TM. The ROCs of the three models are shown in Fig. 7; see also Fig. 5 and Table 2. The formulas of the logistic regression models are presented in Additional file 2.
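The structure of Model 2 can be sketched as follows. The fitted coefficients are given only in Additional file 2 and are not reproduced here, so the intercept and slopes are left as function arguments, and the numbers in the usage lines are arbitrary placeholders chosen purely to show the call, not estimates from the study. The function names are hypothetical.

```python
import math


def transform_cac(cac_count: int) -> float:
    """ln(CAC count + 1), the transformation described in the text."""
    return math.log(cac_count + 1)


def model2_probability(pnaids: float, cac_count: int,
                       intercept: float, b_pnaids: float, b_cac: float) -> float:
    """Logistic combination of PNAIDS and transformed CAC counts.

    The coefficients must be supplied by the caller; none are built in.
    """
    x = intercept + b_pnaids * pnaids + b_cac * transform_cac(cac_count)
    return 1.0 / (1.0 + math.exp(-x))


# PLACEHOLDER coefficients for illustration only (not the fitted values).
placeholder = {"intercept": -4.0, "b_pnaids": 3.0, "b_cac": 1.5}
print(model2_probability(0.82, 4, **placeholder))
print(model2_probability(0.82, 0, **placeholder))
```

The transformation maps a count of 0 to 0 and compresses large counts, which limits the influence of a few very high CAC values on the fitted slope.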
The best cut-off point was obtained according to the maximum Youden index in the ROC curve of Model 2 to divide patients into predicted benign and predicted malignancy groups. The distribution of CACs (U = 1936.0) and PNAIDS (U = 1550.5) between these groups was significantly different, with both at P < 0.001 (Fig. 8).

Discussion
In this study, the value of CT in discriminating lung cancer from small lung nodules was questioned, even when applying a current advanced artificial intelligence screening method which significantly increased the  [23]. Therefore, the traditional mode of small lung nodule diagnosis should be investigated. Previous work and the present study suggest that CACs are an ideal candidate marker, as the test requires only simple blood collection and its accuracy has been reported in early lung cancer patients [18,24,25].
Despite the unsatisfactory result of the efficiency of CT alone, this approach still performed better than the Mayo Clinic model and tumour markers. As a common clinical imaging examination, CT has unquestionable value in the diagnosis of a variety of lung diseases [26]. A multicentre study involving 534 patients showed that PNAIDS had a higher diagnostic accuracy than the Mayo Clinic model and radiologists [21], consistent with our results. Several classic clinical indicators and imaging features are included in the Mayo Clinic model but show poor efficiency in distinguishing early lung cancer from benign pulmonary nodules. Thus, more specific imaging data may have higher diagnostic value than traditional imaging features.
Moreover, the diagnostic value of CACs in comparison with other traditional biomarkers has been confirmed using a cohort of patients with lung nodules, which is an independent validation of the work conducted by Ye et al. [24]. In the present study, the highest diagnostic efficiency was achieved when a CAC count of 3 was chosen as the cut-off value; this result is similar to those of the studies conducted by Qiu et al. [25] and Ye et al. [18]. Overall, CAC counts presented better diagnostic value than commonly used tumour markers (CEA, CYFRA21-1, and NSE), which agrees with the results of several studies reporting the advantages of CACs for the diagnosis of lung cancer [17,18,24,25,27]. CACs therefore have the potential to become a better novel diagnostic marker of lung cancer.
Biomarkers and imaging are often used in combination to improve diagnostic accuracy [8,28]. Our results also showed that efficiency was greatly improved when CAC was combined with PNAIDS in the diagnosis of lung nodules. Correlation analysis further suggested that PNAIDS and CACs are independent of each other, which is consistent with the premise of the model that variables are independent. Interestingly, Model 2, which combined CAC counts and PNAIDS, displayed significantly higher diagnostic efficiency than CAC counts or PNAIDS alone. CAC counts and PNAIDS reflect the biogenetics and imaging features of patients, respectively. The 95% confidence intervals of the AUC of NSE, CYFRA21-1, CEA and the combined index TM all contained 0.5, and Models 1 and 3, which further included TM, did not show improved diagnostic efficiency compared with PNAIDS or with PNAIDS combined with CAC. This result also suggests the limitation of the TM currently used in the clinic, that is, a lack of sensitivity and specificity.
In addition, we analysed the correlation between the indicators and demographic characteristics, which indicated a weak correlation between CAC counts and age. In the study by Liu [27], there was no relationship between a positive CAC result (CAC counts ≥ 1) and age (age ≥ 60), which is contrary to our finding. Liu et al. treated age and CAC counts as dichotomous variables, which may have led to poorer testing efficiency, whereas we directly analysed the correlation between age and CAC counts. The observed correlation may be due to genetic mutations that accumulate in cells with age, which also suggests that the age of the population may be a factor that needs to be controlled for or corrected in CAC detection. It should be noted that age is also a risk factor for NSCLC; further research is needed to explore whether the association between CAC and age has biological significance. The serum level of NSE had a significant correlation with nodular diameter, which can be explained by tumour burden [29,30]. PNAIDS only showed a correlation with the type of nodule, suggesting the independence of imaging features and the value of imaging data for early screening of lung cancer.
Nevertheless, there are limitations in this study. First, as most malignant nodules screened by CT were adenocarcinoma, stratified analysis of different pathological types could not be applied to further explore the potential bias resulting from other pathological types of lung cancer. Second, smoking history was not common among the female patients, who comprised most cases; therefore, a larger sample size is required to assess the association between smoking and CAC counts or other indicators. Third, the sample size of this study was relatively small. Although the main statistical analysis yielded positive results, more studies with larger sample sizes are still needed to further confirm the practicability of our findings. Fourth, there is still scope to improve the diagnostic accuracy of PNAIDS; more data will be included in the future to train the PNAIDS model and construct a predictive model with higher accuracy. It is noteworthy that detection results of CACs can be obtained within 5 working days, quickly providing a more reliable auxiliary diagnostic basis when combined with PNAIDS, with a wide range of clinical application prospects. Our results indicate that this diagnostic model is promising for lung nodule diagnosis; with more supporting data in the future, it may be applicable more widely.

Conclusions
In conclusion, this work suggests that CACs, as a novel lung cancer biomarker from liquid biopsy, show higher diagnostic value than traditional tumour markers in early-stage lung cancer and a supportive value for CT scans in the diagnosis of cancer based on small lung nodules. The results of this study pave the way for further applications of CACs and offer a potential new mode for screening early-stage lung cancer using small lung nodules.
