Multimodal fusion of liquid biopsy and CT enhances differential diagnosis of early-stage lung adenocarcinoma

This research explores the potential of multimodal fusion for the differential diagnosis of early-stage lung adenocarcinoma (LUAD) (tumor sizes < 2 cm). It combines liquid biopsy biomarkers, specifically extracellular vesicle long RNA (evlRNA), with computed tomography (CT) attributes. The fusion model achieves an impressive area under the receiver operating characteristic curve (AUC) of 91.9% for the four-class classification of adenocarcinoma, along with a benign-malignant AUC of 94.8% (sensitivity: 89.1%, specificity: 94.3%). These outcomes outperform the diagnostic capabilities of the single-modal models and of human experts. A comprehensive SHapley Additive exPlanations (SHAP) analysis is provided to offer deep insights into model predictions. Our findings reveal the complementary interplay between evlRNA and image-based characteristics, underscoring the significance of integrating diverse modalities in diagnosing early-stage LUAD.

A total of 146 individuals had available preoperative blood samples and chest CT scans. Among them, 111 patients were diagnosed with LUAD, while 35 were categorized as benign. The LUAD group was subdivided into three pathological categories: adenocarcinoma in situ (AIS; N = 36), minimally invasive adenocarcinoma (MIA; N = 34), and invasive adenocarcinoma (IA; N = 41).
Model development details are illustrated in Fig. 1A. We extracted imaging features, referred to as Rad features, from a pre-trained multitask 3D DenseSharp neural network 15. These features included malignancy probability, IA probability, invasiveness category, attenuation category, 2D diameter, and volumetric consolidation tumor ratio (vCTR). In addition, blood samples were collected in 10 mL K2EDTA anticoagulant vacutainer tubes. Subsequent steps for serum extracellular vesicle (EV) purification, RNA isolation, and RNA-seq analysis followed procedures from our prior study 12.
We selected 17 evlRNA features from differentially expressed genes (DEGs) between the LUAD and control groups. Moreover, to compare our methods against human performance and to investigate whether integrating human expertise could further enhance diagnosis, we conducted an observer study involving both a senior and a junior investigator.
For multimodal fusion, incorporating Rad features extracted by AI from CT, evlRNA features from liquid biopsy, and observation features from clinicians, we employed the XGBoost machine learning framework 16. Separate XGBoost models were established for each feature fusion scenario, with a primary training objective of multi-class classification (IA, MIA, AIS, Benign). The flexibility to use different feature combinations allows for diverse subgroup analyses. A 5-fold cross-validation scheme was adopted, and average results are reported.
The performance evaluation of the multimodal fusion is shown in Fig. 1B-D, revealing several intriguing discoveries: (1) Combining evlRNA and Rad features results in a highly effective diagnostic method, with an impressive AUC of 0.919 (Fig. 1B). This combined model surpasses the unimodal models and is comparable to the performance of the senior expert. (2) Integrating human expertise with the combination of evlRNA and Rad characteristics leads to improved results, with AUC values of 0.934 and 0.924 for the inclusion of the senior and junior experts, respectively (Fig. 1C). (3) Furthermore, evlRNA-based and image-based features complement each other, displaying a mutually reinforcing relationship (Fig. 1D). The three subplots illustrate that combining evlRNA with image-based attributes (Rad, (v)CTR, observer) leads to better performance than using a single modality.
The evlRNA + Rad model outperforms the other multimodal fusion models without human expert intervention; in the subsequent text, we use it as our standard fusion model. We conducted a detailed assessment of the model's performance, concentrating on three vital clinical subtasks (see Table 1): (a) discriminating malignant nodules (IA, MIA, or AIS) from benign nodules, (b) discriminating invasive nodules (IA or MIA) from preinvasive nodules (AIS or Benign), and (c) discriminating IA nodules from MIA nodules. The IA/MIA distinction is clinically important: patients with AIS or MIA have an excellent disease-free survival rate after resection 18. In contrast, IA patients have a lower disease-free survival rate, around 60% to 70% 19,20. The fusion model achieved excellent results for this task, with an AUC of 92.1%, a sensitivity of 92.8%, and a specificity of 88.6%. Our fusion model consistently outperforms single-modal models across the different subtasks, just as it did in the four-class classification. Notably, our fusion method significantly improves specificity, effectively reducing false positives and overdiagnosis. The fusion model exceeds the specificity of the senior expert by 14.3%, 9.7%, and 11.9% in subtasks (a), (b), and (c), respectively. Furthermore, when combining evlRNA, Rad, and senior expert inputs, our model achieves 100% specificity in distinguishing malignant from benign nodules during cross-validation.
To enhance the understanding of feature importance in predictive modeling, we employed the SHapley Additive exPlanations (SHAP) post hoc explanatory framework 21. We applied this framework to three models: evlRNA, Rad, and evlRNA + Rad. The feature impacts for the 4-category classification are depicted in Fig. 1E. Notably, in the fusion model, vCTR is the most crucial feature. Furthermore, the SHAP framework was extended to individualized validation predictions (Fig. 1F). The visual illustration reveals that the patient has a high probability (0.94) of being classified as IA. This probability primarily results from factors such as an IA probability of 0.9312, a vCTR value of 0.2426, a CCND gene expression value of 15.89, and other risk-contributing factors. Understanding individual predictions is valuable for clinical decision-making. In addition, we explored how feature values relate to predicted categories (Fig. S1 in Supplementary). In the evlRNA analysis, certain genes exhibit distinct correlations with category predictions, which become more evident in the Rad feature analysis. We believe Rad features, being AI-generated, naturally possess discriminative abilities. In the joint analysis of Rad and evlRNA features, the top five crucial features combine genetic and imaging traits, highlighting their synergistic effects (details in Supplementary Results).
Assessing a model's robustness is crucial for both evaluation and practical use. We evaluated the robustness of our XGBoost model by adding Gaussian noise to the input features (Fig. S2). With low noise, the model's performance declines only slightly, but the degradation intensifies as the noise increases. Remarkably, across a broad noise range, our multimodal fusion model consistently outperforms the single-modal models, showcasing its robustness.
Our study has a few limitations. First, we included only 146 participants, owing to the difficulty of obtaining both evlRNA detection data and CT imaging samples; collecting evlRNA information is time-consuming and expensive. In the future, a larger dataset is needed to avoid overfitting and improve validation accuracy. Second, our study involved only internal validation and did not include external validation, leaving the model's applicability and generalizability unexplored.
In summary, our study has underscored the complementary nature of evlRNA-based and image-based features, with the integration of human analysis leading to further improved performance. These results emphasize the critical importance of multimodal fusion for enhancing the differential diagnosis of early-stage lung adenocarcinoma in LDCT screening.

Data characteristics
The study involved 146 participants who underwent lung surgery for pulmonary nodules between 2018 and 2020. This group included 111 patients diagnosed with lung adenocarcinoma (LUAD) and 35 controls classified as benign cases. Essential participant characteristics are provided in Table S1. The following inclusion criteria were applied to the LUAD patients: (a) pathologically proven LUAD (tumor size < 2 cm), (b) obtainable preoperative blood samples, (c) an obtainable chest CT scan, and (d) informed consent given before enrollment.
This study was approved by the ethics committee of Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, and complied with all relevant ethical regulations, including the Declaration of Helsinki. All participants were from a registered lung cancer screening study (China Lung Cancer Screening Study, NCT03975504) and signed informed consent to take part in the research. Our study did not specifically address cases involving multiple nodules. In our cohort, only two individuals had multiple pulmonary nodules; for these cases, we chose to analyze only the most severe nodule.
In this study, the CT images of the latest CT examination before surgery were collected from a single clinical center (Chest Hospital affiliated to Shanghai Jiaotong University School of Medicine). Slice thicknesses of these scans range between 0.625 mm and 1.5 mm. The pathological label and mass center of each lesion were manually labeled by a junior thoracic radiologist, according to the corresponding pathological reports. These annotations were then confirmed by a senior radiologist with 15 years of experience in chest CT. Patient identities were anonymized for privacy protection.

Pre-trained DenseSharp model
To extract nodule features from CT images, we utilized a pre-trained 3D DenseSharp neural network 15, which had undergone extensive training on two internal datasets: pretraining cohort A contained 651 subcentimeter nodules 15, and pretraining cohort B comprised 4728 nodules from the Pulmonary-RadPath dataset 22. The number of nodules used for pretraining can be found in Table S2. The DenseSharp model generates outputs through five heads: four for classification and one for creating a 3D nodule segmentation.
We conducted standard data preprocessing adhering to common practices: (1) resampling CT volumes to an isotropic voxel spacing of 1 mm × 1 mm × 1 mm; (2) normalizing Hounsfield Units to the range [−1, 1]; (3) cropping a 32 × 32 × 32 volume centered at the centroid of each lesion. In our proposed model, the input therefore consists of a cubic CT volume patch measuring 32 mm × 32 mm × 32 mm.
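As a rough illustration, the three preprocessing steps above can be sketched as follows. The HU clipping window of [−1000, 400] and the border-padding behavior are assumptions made for illustration; the original pipeline's exact choices are not specified here.

```python
import numpy as np
from scipy import ndimage

def preprocess_ct(volume, spacing, center, crop_size=32):
    """Sketch of the preprocessing steps: resample, normalize, crop.

    volume: 3D numpy array of Hounsfield Units (HU).
    spacing: (z, y, x) voxel spacing in mm of the original scan.
    center: lesion centroid in voxel coordinates of the original volume.
    """
    # (1) Resample to isotropic 1 mm x 1 mm x 1 mm voxels; zoom factors
    # equal the original spacing since the target spacing is 1 mm.
    zoom = np.asarray(spacing, dtype=float)
    volume = ndimage.zoom(volume.astype(float), zoom, order=1)
    center = np.round(np.asarray(center) * zoom).astype(int)

    # (2) Normalize HU to [-1, 1]; the clipping window is an assumption
    # (a lung-style window of [-1000, 400] HU).
    lo, hi = -1000.0, 400.0
    volume = np.clip(volume, lo, hi)
    volume = 2.0 * (volume - lo) / (hi - lo) - 1.0

    # (3) Crop a 32^3 patch centered at the lesion centroid; pad with the
    # normalized air value (-1) so border lesions still yield a full cube.
    half = crop_size // 2
    padded = np.pad(volume, half, mode="constant", constant_values=-1.0)
    c = center + half
    patch = padded[c[0] - half:c[0] + half,
                   c[1] - half:c[1] + half,
                   c[2] - half:c[2] + half]
    return patch.astype(np.float32)
```

The returned patch can be fed directly to a 3D CNN such as DenseSharp after adding batch and channel dimensions.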
Training employs early stopping based on the validation loss: training stops if the validation loss does not decrease within 10 epochs. We incorporate online data augmentations, such as random rotation, flipping, and translation, on every volume. We use the Adam optimizer 23 to train all models end-to-end for up to 200 epochs. Our experiments are conducted using PyTorch 1.11 (ref. 24) on two NVIDIA RTX 3090 GPUs.

Extracting imaging features
We employed the pre-trained 3D DenseSharp neural network to perform the classification task and generate the nodule mask. We collected prediction logits from the classification task, resulting in four nodule attributes. Since the size of the solid component within subsolid nodules (SSNs) observed on CT images is closely related to the extent of tumor infiltration 25,26, we developed an internal tool to calculate the 2D diameter (mm), the consolidation tumor ratio (CTR), and the volumetric CTR (vCTR). A nuanced differentiation exists between the two: CTR measures the diameter fraction of the solid component in a nodule, whereas vCTR quantifies the volumetric proportion. By combining these features with the previously established fundamental attributes, we derived a set of six nodule imaging features known as Rad features. These Rad features encompass malignancy probability, IA probability, invasiveness category, attenuation category, clinically measured 2D diameter (mm), and vCTR.
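The CTR/vCTR distinction described above can be sketched from binary masks of the nodule and its solid component. The maximal-axial-extent definition of diameter used below is an assumption for illustration; the paper's internal measurement tool is not public.

```python
import numpy as np

def ctr_and_vctr(nodule_mask, solid_mask, spacing=(1.0, 1.0, 1.0)):
    """Illustrative CTR (diameter fraction) and vCTR (volume fraction).

    nodule_mask: 3D boolean mask of the whole nodule (z, y, x).
    solid_mask:  3D boolean mask of its solid (consolidated) component.
    spacing:     voxel spacing in mm.
    """
    def max_axial_extent(mask):
        # Largest in-plane extent over all axial slices, in mm
        # (an assumed stand-in for a clinical diameter measurement).
        best = 0.0
        for z in range(mask.shape[0]):
            ys, xs = np.nonzero(mask[z])
            if ys.size == 0:
                continue
            dy = (ys.max() - ys.min() + 1) * spacing[1]
            dx = (xs.max() - xs.min() + 1) * spacing[2]
            best = max(best, dy, dx)
        return best

    d_nodule = max_axial_extent(nodule_mask)
    d_solid = max_axial_extent(solid_mask)
    ctr = d_solid / d_nodule if d_nodule > 0 else 0.0    # diameter fraction
    vctr = solid_mask.sum() / max(nodule_mask.sum(), 1)  # volumetric fraction
    return ctr, vctr
```

For a part-solid nodule, vCTR is typically much smaller than CTR, since volume scales with the cube of the diameter; this is why the two carry different information.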

Integrating multimodality features
We employed the machine learning framework XGBoost 16 to perform multimodal fusion of the various features. To integrate human expertise, we gathered pathological four-type judgments from both doctors for all samples. These judgments were then used as features, combined with the other modal features, and fed as fusion features into the XGBoost model for training and validation. In our experiments, we established separate XGBoost models for each feature fusion scenario.
The primary training objective of our model is multi-class classification, specifically distinguishing between the (IA, MIA, AIS, Benign) categories. This is achieved using the multiclass softmax objective function, which generates a probability distribution over the classes. For prediction, we have the flexibility to use different combinations based on specific needs, allowing for various subgroup analyses. As an illustration, the positive and negative probabilities for each task are calculated as follows: in task (a), y_positive = y_IA + y_MIA + y_AIS and y_negative = y_Benign; in task (b), y_positive = y_IA + y_MIA and y_negative = y_AIS + y_Benign; in task (c), y_positive = y_IA and y_negative = y_MIA. We adopted a 5-fold cross-validation approach, wherein the entire dataset was evenly divided into five distinct subsets. During each iteration, four subsets were used for training, leaving one subset for validation. Training stopped after the maximum number of boosting iterations (num_boost_round = 10, the default). The reported performance metrics represent the average results obtained across the five validation folds. To assess the effectiveness of a diagnostic test in distinguishing between positive and negative cases, we selected the classification threshold using the Youden index, which strikes a balance between sensitivity and specificity. Our multimodal fusion model's parameters are shown in Table S3.
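The subgroup aggregation and Youden-index thresholding above can be sketched as follows. The class ordering [IA, MIA, AIS, Benign] is an assumed convention for illustration; the probabilities would come from the trained XGBoost model's softmax output.

```python
import numpy as np

# Assumed column order of the 4-class softmax output.
IA, MIA, AIS, BENIGN = 0, 1, 2, 3

def subtask_scores(proba, task):
    """Collapse 4-class probabilities into a binary 'positive' score.

    proba: (n_samples, 4) array whose rows sum to 1.
    task 'a': malignant (IA + MIA + AIS) vs. benign
    task 'b': invasive (IA + MIA) vs. preinvasive (AIS + Benign)
    task 'c': IA vs. MIA
    """
    if task == "a":
        return proba[:, [IA, MIA, AIS]].sum(axis=1)
    if task == "b":
        return proba[:, [IA, MIA]].sum(axis=1)
    if task == "c":
        return proba[:, IA]
    raise ValueError(task)

def youden_threshold(y_true, scores):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    best_t, best_j = 0.5, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1)); fn = np.sum(~pred & (y_true == 1))
        tn = np.sum(~pred & (y_true == 0)); fp = np.sum(pred & (y_true == 0))
        sens = tp / max(tp + fn, 1)
        spec = tn / max(tn + fp, 1)
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t
```

Because the negative probability is simply one minus the positive score (the softmax rows sum to 1), only the positive score is needed for ROC analysis of each subtask.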

Observation study
To compare our methodologies with human proficiency, an experienced senior radiologist (with over a decade of expertise in chest CT interpretation) and a junior radiologist (with 3 years of experience in chest CT interpretation) from Chest Hospital affiliated to Shanghai Jiaotong University School of Medicine were consulted. These professionals, blinded to the histopathological findings and clinical information, independently classified and diagnosed all the nodules. The outcomes of their expert-based image interpretations were referred to as observation features.

Model robustness analysis
During the experiments, we injected Gaussian noise with a mean of 0 and a standard deviation of σ into the input features of the XGBoost model, and observed the impact on the model's performance (4-category classification AUC) as the perturbation, i.e., the standard deviation, increased. We selected three models for this analysis: Rad, evlRNA, and the multimodal fusion model evlRNA + Rad. We then tracked the trend of the 5-fold validation AUC on the dataset as the standard deviation varied. For each selected standard deviation, we randomly generated Gaussian noise 100 times and averaged the AUC over these 100 runs. In Fig. S3, the average result of the 100 runs is represented by a solid line, with shading indicating the standard-deviation interval over the 100 noise injections.
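The noise-injection loop above can be sketched generically. Here `score_fn` is a hypothetical stand-in for "refit/evaluate the model and return its 5-fold validation AUC", since model training is outside the scope of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_metric_curve(score_fn, X, sigmas, n_repeats=100):
    """Average a performance metric under Gaussian input perturbations.

    score_fn: callable mapping a feature matrix to a scalar metric
              (e.g. a model's cross-validated AUC); assumed, not the
              paper's actual evaluation code.
    X:        (n_samples, n_features) input feature matrix.
    sigmas:   standard deviations of the zero-mean Gaussian noise.
    """
    means, stds = [], []
    for sigma in sigmas:
        runs = []
        for _ in range(n_repeats):
            noise = rng.normal(0.0, sigma, size=X.shape)
            runs.append(score_fn(X + noise))
        runs = np.asarray(runs)
        means.append(runs.mean())  # solid line in the plot
        stds.append(runs.std())    # shaded band in the plot
    return np.asarray(means), np.asarray(stds)
```

Plotting `means` against `sigmas`, with `means ± stds` as a shaded band, reproduces the style of the figure described above.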

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The sequencing data have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE200288. Additional data utilized and/or analyzed during the current study are available from the corresponding author upon reasonable request.
https://doi.org/10.1038/s41698-024-00551-8

Fig. 1 | Development, diagnostic performance and post hoc explanation of the multimodal fusion model. A Multimodal fusion model development and post hoc explanation. B-D The validation performance of the different models was evaluated using 5-fold cross-validation for four-category classification (Benign, AIS, MIA, IA). The mean ROC curves are depicted with dark lines, while the ROC curves for each fold are shown with light-colored lines. The shaded area surrounding the average curves indicates the standard deviation of the 5 folds. The legend for the ROC curve includes the mean AUC (±SD). The senior or junior models are represented by the marker "x", with their corresponding sensitivity provided in the legend. B Single-modal models compared with evlRNA + Rad multimodal models. C Collaboration of evlRNA, Rad, and human analysis from the junior and senior experts. D Multimodal fusion of evlRNA and image-based features (Rad, (v)CTR, human expert). E Feature importance for the 4-category classification in the SHAP post hoc explanation for three models (evlRNA, Rad, evlRNA + Rad). In each subplot, the horizontal axis denotes the feature names, and the vertical axis denotes SHAP values; features with larger SHAP values are more important. The four distinct colors correspond to the four categories. Some features in the figure are abbreviated: malignancy probability (malig_prob), IA probability (IA_prob), and invasiveness classification (Rad_invas). F Explanation of SHAP values for a patient prediction from the evlRNA + Rad model. This patient is pathologically diagnosed as IA. The function f(x) is the output of the model (the predicted probability 0.94), and the base value follows the average of the model predictions. Features that increase the prediction (i.e., higher risk) are highlighted in red, while features that decrease the prediction (i.e., lower risk) are highlighted in blue. The size of the arrow denotes the effect of the feature.
npj Precision Oncology (2024) 8:50

Table 1 | Classification performance overall and across three clinical subtasks: discrimination between malignant nodules (IA, MIA, or AIS) and benign nodules, discrimination between invasive nodules (IA or MIA) and preinvasive nodules (AIS or Benign), and discrimination between IA nodules and MIA nodules. The performance metrics AUC (%), AUC p-value (P), sensitivity (Sens, %), and specificity (Spec, %) are presented as the mean of 5-fold cross-validation, with 95% confidence intervals (CI) provided. Bold: best-performing model within its subtask and subgroup analysis.