CT-based deep learning radiomics biomarker for programmed cell death ligand 1 expression in non-small cell lung cancer

Background Programmed cell death ligand 1 (PD-L1), as a reliable predictive biomarker, plays an important role in guiding immunotherapy of lung cancer. This study investigated the value of a CT-based deep learning radiomics signature for predicting PD-L1 expression in non-small cell lung cancers (NSCLCs). Methods A total of 259 consecutive patients with pathologically confirmed NSCLCs were retrospectively collected and divided into a training cohort and a validation cohort in chronological order. Univariate and multivariate analyses were used to build the clinical model. Radiomics and deep learning features were extracted from preoperative non-contrast CT images. After feature selection, the radiomics score (Rad-score) and the deep learning radiomics score (DLR-score) were calculated as linear combinations of the selected features weighted by their coefficients. Predictive performance for PD-L1 expression was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis. Results The clinical model based on cytokeratin 19 fragment and lobulated shape obtained an AUC of 0.767 (95% CI: 0.673–0.860) in the training cohort and 0.604 (95% CI: 0.477–0.731) in the validation cohort. Eleven radiomics features and 15 deep learning features were selected by LASSO regression. The AUCs of the Rad-score were 0.849 (95% CI: 0.783–0.914) and 0.717 (95% CI: 0.607–0.826) in the training and validation cohorts, respectively. The AUCs of the DLR-score were 0.938 (95% CI: 0.899–0.977) and 0.818 (95% CI: 0.727–0.910) in the training and validation cohorts, respectively. The AUCs of the DLR-score were significantly higher than those of the Rad-score and the clinical model. Conclusion The CT-based deep learning radiomics signature achieved clinically acceptable predictive performance for PD-L1 expression and showed potential as a surrogate imaging biomarker or as a complement to immunohistochemistry assessment.


Background
Lung cancer is the second most commonly diagnosed cancer worldwide [1,2]. Although early-stage detection through low-dose CT screening and minimally invasive video-assisted thoracoscopic surgery have greatly improved patients' survival and quality of life, lung cancer remains the leading cause of cancer-related death, because approximately 80% of lung cancers are diagnosed at an advanced stage, when they are unresectable and systemic chemotherapy is the only option [3,4]. Only 20-40% of patients respond to standard platinum-based chemotherapy [5]. With the development of immunotherapy, the management of lung cancer has evolved enormously in recent years. Non-small cell lung cancer (NSCLC) accounts for about 85% of all lung cancers and includes the most common subtypes, such as lung adenocarcinoma and lung squamous cell carcinoma [6]. First-generation antibody-based immunotherapy, which modulates antitumor responses by blocking receptor and/or ligand interactions of molecules such as programmed cell death protein 1 (PD-1) and its ligand (PD-L1) or cytotoxic T lymphocyte antigen-4, has shown remarkably durable responses in NSCLCs [7]. Unfortunately, only 17-21% of patients with NSCLCs demonstrated a response to anti-PD-1 or anti-PD-L1 therapy [8,9]. Therefore, identifying which patients would benefit from immune checkpoint inhibitors (ICIs) is needed in NSCLC treatment strategies. Many studies have revealed that tumor mutational burden and PD-L1 expression are independent predictive factors for the response to ICIs [10,11]. However, assessing these biomarkers requires invasive procedures to obtain tumor tissue specimens for gene sequencing or immunohistochemistry (IHC) staining, which are time-consuming and expensive. Furthermore, obtaining tissue specimens is difficult, or even impossible, in most clinical scenarios for patients with advanced NSCLC.
As a noninvasive technique, CT has been widely used in the diagnosis, staging, treatment planning, and response assessment of NSCLCs through radiologists' visual interpretation, which uses only a few imaging metrics. The development of computer science and artificial intelligence has led to the emergence of radiomics, which extracts high-dimensional features from medical imaging data to decode the imaging phenotype and achieve comprehensive clinical goals [12]. In recent years, convolutional neural networks (CNNs) with various network structures have been widely used in radiological tumor research. They can extract a large number of useful deep learning (DL) features for tumor grade prediction, lymph node metastasis prediction, and risk prognosis prediction [13][14][15]. However, constructing a CNN often requires a large number of samples, and most medical studies have a small sample size. Therefore, transfer learning, which can alleviate the limitation of small data sets, is widely used in medical deep learning [16]. Transfer learning uses neural networks pretrained on other images and allows existing trained models to be applied to unsolved problems, thus greatly reducing the need for a large amount of training data. The purpose of this study was to develop a deep learning radiomics signature as a surrogate imaging biomarker for PD-L1 expression in NSCLC, using transfer learning to extract features from CT images, in order to provide decision-making support for selecting patients who would benefit from ICIs treatment.

Patients
This study was approved by the Ethics Committee of our hospital with a waiver of informed consent due to its retrospective nature. We searched our hospital's database for patients who had a pathological diagnosis of NSCLC via biopsy or surgery, had undergone a CT examination within 3 months before biopsy or surgery, and had IHC staining for PD-L1 expression. Exclusion criteria were as follows: (1) insufficient image quality for nodule segmentation (n = 5); (2) anti-tumor therapy (radiotherapy, chemotherapy or chemoradiotherapy) before biopsy or surgery (n = 7); (3) failure of radiomics feature or deep learning feature extraction (n = 30). Initially, 301 eligible patients were identified. After excluding 42 patients, the final study cohort included 259 patients, who were divided into a training cohort and a validation cohort at a ratio of 7:3 in chronological order (Fig. 1). Patients before October 2021 were assigned to the training set (PD-L1 negative n = 131, PD-L1 positive n = 32), and patients after that were assigned to the validation set (PD-L1 negative n = 67, PD-L1 positive n = 29). There was no significant difference in the distribution of PD-L1 expression between the training set and the validation set (P = 0.053), so the two sets could be used for model establishment and validation.
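As a rough check of the reported P = 0.053, the cohort counts above can be arranged in a 2 × 2 table. The sketch below assumes a Pearson χ² test without continuity correction; the text does not name the test used for this comparison, and the function name `chi2_2x2` is purely illustrative.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: training, validation; columns: PD-L1 negative, PD-L1 positive.
stat, p_value = chi2_2x2(131, 32, 67, 29)
print(f"chi2 = {stat:.3f}, P = {p_value:.3f}")
```

Under this assumption the statistic is about 3.75 and P is about 0.053, which agrees with the reported value.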

CT acquisition and interpretation
Preoperative chest CT examinations were performed on CT scanners (Brilliance iCT, Philips Medical Systems). The acquisition parameters were as follows: collimation of 128 × 0.625 mm; tube voltage of 120 kVp; tube current controlled by automatic exposure control (AEC); reconstruction slice thickness of 1.5 mm with a gap of 1 mm; field of view of 350 × 350 mm; matrix of 512 × 512.
Two radiologists (with 2 years and 5 years of chest CT interpretation experience), who were blinded to the clinical data and PD-L1 expression, interpreted the thin-slice CT images in the lung window setting on the PACS (Picture Archiving and Communication System) to obtain the semantic features of the nodules. The semantic description of nodules included the following CT characteristics: vacuolar sign, cavity, pleural thickening, pleural indentation, hilar adenopathy, mediastinal adenopathy, vessel convergence, location, lobulated shape, spiculation, air bronchogram sign, nodule type, and size.

Feature extraction, feature selection and signature construction
The thin-slice images in Digital Imaging and Communications in Medicine (DICOM) format derived from the PACS were transferred to ITK-SNAP 3.8.0 (http://www.itksnap.org) on a personal computer for tumor segmentation. The regions of interest (ROIs) of the tumor were manually drawn on each thin slice using ITK-SNAP and combined into a three-dimensional volume of interest (3D-VOI). Two radiologists with 2 years and 5 years of experience in thoracic CT interpretation independently segmented the tumors in 30 randomly selected patients. The radiologist with 2 years of experience segmented all the nodules manually. A total of 1834 radiomics features were extracted from the 3D-VOIs using the open-source software package Pyradiomics (https://github.com/Radiomics/pyradiomics).
A pre-trained CNN, ResNet 50, was used for transfer learning to extract deep learning features from thin-slice CT images of NSCLCs. First, the image with the largest tumor area for each patient was selected and the grayscale values were normalized to the range [−1, 1] using a min-max transformation. Then each cropped subregion image was resized to 224 × 224, and the resulting image was used as model input [17,18]. Z-score normalization was applied to reduce the influence of the features' scales. Interobserver intraclass correlation coefficients (ICCs) were used to rule out radiomics or deep learning features with low repeatability (ICC ≤ 0.75). Pearson correlation was used to exclude radiomics or deep learning features with high correlations (r > 0.90). The retained features were then entered into least absolute shrinkage and selection operator (LASSO) regression with 5-fold cross-validation to select the radiomics or deep learning features that were strongly associated with PD-L1. The radiomics score (Rad-score) and deep learning radiomics score (DLR-score) for each patient were calculated as a linear combination of the selected features weighted by their coefficients. The feature selection procedure was applied to both radiomics features and deep learning features in the training cohort. The signatures trained on the training cohort were applied to the validation cohort for testing in independent cases. The workflow of model building is shown in Fig. 2.
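Three of the steps described above (Z-score normalization, the Pearson correlation filter, and the linear-combination score) can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the toy data, the coefficients, the intercept, the greedy order of the filter, and the use of |r| instead of r are all assumptions, and the ICC filter and the LASSO fit are omitted.

```python
import math

def zscore(column):
    """Standardize one feature column to zero mean and unit (population) SD."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.90):
    """Greedy filter (assumed): keep a feature only if |r| <= threshold
    with every feature that has already been kept."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def linear_score(row, coefficients, intercept=0.0):
    """Score = intercept + sum of coefficient * feature value."""
    return intercept + sum(coefficients[name] * row[name] for name in coefficients)

# Hypothetical toy data: three features measured on five patients.
raw = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 4.0, 6.2, 8.1, 9.9],  # almost a multiple of f1
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
scaled = {name: zscore(col) for name, col in raw.items()}
kept = drop_correlated(scaled)

# Hypothetical coefficients standing in for a fitted LASSO model.
coefficients = {"f1": 0.8, "f3": -0.3}
first_patient = {name: scaled[name][0] for name in kept}
score = linear_score(first_patient, coefficients, intercept=-1.2)
print(kept, round(score, 3))
```

In this toy example `f2` is removed because it is almost perfectly correlated with `f1`, and the score of the first patient is computed from the two retained features only.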

PD-L1 testing
PD-L1 expression of NSCLCs was assessed using immunohistochemical staining and reported as the tumor proportion score (TPS). TPS is defined as the percentage of tumor cells showing PD-L1 membrane staining of any intensity. PD-L1 expression was dichotomized according to the TPS level (TPS < 1% is negative, TPS ≥ 1% is positive). In total, there were 198 PD-L1 negative cases and 61 PD-L1 positive cases.

Statistical analysis and Model Development
For clinical metrics, univariate analysis (t-test, Mann-Whitney U test, χ² test, or Fisher's exact test) was used to select those related to PD-L1 expression, which were then entered into a multivariate logistic regression model. The predictive performances of the clinical model, the Rad-score and the DLR-score were assessed by the area under the receiver operating characteristic (ROC) curve (AUC), and the AUCs were compared using the DeLong method. The calibration, the goodness of fit, and the net benefit and clinical effectiveness of the better model were evaluated using the calibration curve, the Hosmer-Lemeshow test, and the decision curve, respectively. ROC analysis of the models was performed to obtain the optimal cut-off value [19,20].
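To make the ROC-based quantities concrete, the sketch below computes an AUC through its rank interpretation and picks a cut-off by maximizing the Youden index. The scores and labels are hypothetical, and the choice of the Youden index as the cut-off criterion is an assumption here, since this section only states that ROC analysis was used.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one
    (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(scores, labels):
    """Threshold maximizing J = sensitivity + specificity - 1,
    where score >= threshold is called positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # on ties, the lowest threshold is kept
            best_j, best_t = j, t
    return best_t, best_j

# Hypothetical scores and PD-L1 labels (1 = positive, 0 = negative).
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
auc = roc_auc(scores, labels)
cutoff, j = youden_cutoff(scores, labels)
print(f"AUC = {auc:.4f}, cut-off = {cutoff}, J = {j:.2f}")
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; the DeLong comparison of two AUCs is not shown.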

Patient characteristics and clinical model
Demographic characteristics of all patients are shown in Table 1. In the univariate analysis, cytokeratin 19 fragment (P = 0.047) and lobulated shape (P = 0.014) were significantly related to PD-L1 in the training cohort (Table 2). Multivariate logistic regression revealed that cytokeratin 19 fragment and lobulated shape were independent risk factors for PD-L1 (Table 2). A clinical model based on these two clinical features yielded an AUC of 0.767 (95% CI: 0.673-0.860) for PD-L1 in the training cohort and 0.604 (95% CI: 0.477-0.731) in the validation cohort. Gradient-weighted class activation mapping (Grad-CAM) can provide a rough localization map that highlights the areas important to the classification target (Figs. 4 and 5). The calibration curves of the radiomics model showed good calibration for the prediction of PD-L1 expression in the training cohort (Fig. 6A), and the Hosmer-Lemeshow test was nonsignificant in the training cohort (P = 0.141), indicating that there was no significant difference between the prediction and the pathology result. The decision curve analysis showed that the radiomics model achieved a high net benefit at most probability thresholds, indicating that the DLR-score could achieve excellent clinical effectiveness when the threshold probability is between 0% and about 80% in the training cohort (Fig. 6B).
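The net benefit plotted in a decision curve is commonly defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below evaluates this standard formula for a model and for the "treat all" reference strategy; the probabilities and labels are hypothetical and are not taken from the study.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of calling positive every case with probability >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy in which every case is called positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical predicted probabilities and PD-L1 labels (1 = positive).
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(probs, labels, pt)
    all_nb = net_benefit_treat_all(labels, pt)
    print(f"pt = {pt:.1f}: model {model_nb:.3f}, treat all {all_nb:.3f}")
```

A model is considered useful at a given threshold when its net benefit exceeds both the "treat all" curve and the "treat none" line, which is fixed at zero.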

Discussion
In this study, we investigated whether a deep learning signature derived from preoperative CT could be used to predict PD-L1 expression. We developed and validated three models (clinical, radiomics, and DLR-score) for predicting PD-L1 expression by quantitative analysis of CT images of NSCLCs. In both the training and validation cohorts, the DLR-score showed the best predictive performance among the models. The AUCs of the DLR-score were 0.938 (95% CI: 0.899-0.977) in the training cohort and 0.818 (95% CI: 0.727-0.910) in the validation cohort. DCA showed that the DLR-score can improve the prediction of PD-L1 expression. The high predictive performance of the DLR-score suggests that it could serve as a noninvasive surrogate biomarker for PD-L1, facilitating the selection of patients who would benefit from ICIs treatment.
PD-L1 plays an important role in guiding immunotherapy of lung cancer. When PD-L1 on tumor cells binds to PD-1 on the surface of immune cells, a negative immune response occurs, resulting in the escape of tumor cells and promoting the occurrence, development and metastasis of tumors [23]. PD-1/PD-L1 inhibitors kill tumor cells by blocking PD-1/PD-L1 binding, relieving the negative regulation of immune cells and preventing immune escape [24]. PD-1/PD-L1 inhibitors are included in the first category of recommendations in the NCCN guidelines for NSCLC [25,26]. In analyses stratified by PD-L1 expression in tumor tissue, PD-1/PD-L1 inhibitors were found to be more effective in PD-L1 positive patients [27][28][29]. Therefore, there is an urgent need to identify the patients most likely to benefit from ICIs treatment. Recent studies showed that PD-L1 expression, tumor mutational burden and the tumor immune microenvironment can be used as biomarkers to predict the response to ICIs treatment, but these require not only invasive procedures to obtain tissue specimens, but also time-consuming and expensive laboratory tests [7,30]. Therefore, these biomarkers are not broadly available in clinical scenarios, especially in patients with advanced NSCLCs, for whom tissue specimens are sometimes difficult or even impossible to obtain. Furthermore, malignancies are heterogeneous, and tissue specimens, especially those obtained through biopsy, may be subject to sampling errors. Therefore, clinicians still face the challenge of choosing suitable patients for ICIs treatment.
Radiomics is a data-driven discipline based on widely available imaging data that can be used to improve diagnosis, prognosis, and clinical decision support [31]. To provide decision support for ICIs treatment, several studies have investigated the association of radiomics signatures with PD-L1 expression and the tumor immune microenvironment in several kinds of solid tumors [32][33][34]. Regarding PD-L1 expression in NSCLC, Jiang et al. [35] derived radiomics signatures from CT, PET, and PET/CT, which identified PD-L1 expression over 1% with AUCs of 0.86, 0.62, and 0.85, respectively. Using the same algorithm, Sun et al. [36] reported that a preoperative CT-derived radiomics signature obtained AUCs of 0.786 and 0.807 in the training and validation cohorts, respectively. When combined with clinicopathological features, the predictive performance increased to 0.829 and 0.848, respectively. However, the authors did not evaluate whether the difference was statistically significant. In our study, the AUCs of the clinical model and the radiomics score were 0.767 (95% CI: 0.673-0.860) and 0.849 (95% CI: 0.783-0.914), respectively, which is similar to the results of these two studies.
Deep learning has made significant progress in the field of medical image analysis by mining high-throughput information from medical images for image recognition or prediction of gene expression [37][38][39][40]. Wang et al. developed an end-to-end deep learning model based on CT images to predict EGFR mutation status in lung cancers, which obtained AUCs of 0.85 (95% CI 0.83-0.88) in the primary cohort and 0.81 (95% CI 0.79-0.83) in the independent validation cohort [38]. Chen et al. developed a deep learning-based method for the automatic segmentation of meningiomas from multiparametric MR images, and the AUC of the radiomics model with automatic segmentation was comparable to that of the manual segmentation model in the internal (0.95 vs. 0.93, p = 0.176) and external (0.88 vs. 0.91, p = 0.419) test cohorts [37]. In this study, a pre-trained CNN, ResNet 50, was used to extract 2048 deep learning features from CT images of NSCLCs, and 15 features were found to be strongly associated with PD-L1 expression. Compared to the clinical model and the radiomics model, the DLR-score demonstrated the highest predictive performance for PD-L1 expression, with AUCs of 0.938 (95% CI: 0.899-0.977) and 0.818 (95% CI: 0.727-0.910) in the training cohort and the validation cohort, respectively. At the optimal cut-off value of 0.246, the DLR-score achieved a sensitivity, specificity, PPV, NPV, and accuracy of 90.6%, 87.8%, 64.4%, 97.5%, and 88.3%, respectively, which were better than those of the other two models. The decision curve analysis also showed that the clinical net benefit of the DLR-score was higher than that of the clinical model and the radiomics score in both the training and validation cohorts.
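The five performance figures quoted at the cut-off follow from a single 2 × 2 confusion matrix. As an illustration, the sketch below uses counts that are inferred rather than reported: with the 32 PD-L1 positive and 131 PD-L1 negative training cases, 29 true positives and 115 true negatives reproduce the quoted percentages, which assumes that those percentages refer to the training cohort.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic metrics from the four cells of a confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Inferred counts (an assumption, not reported in the text):
# 32 positives split as 29 TP + 3 FN, 131 negatives split as 115 TN + 16 FP.
metrics = diagnostic_metrics(tp=29, fn=3, tn=115, fp=16)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The low PPV next to the high NPV reflects the class imbalance: with about 20% positive cases, 16 false positives out of 131 negatives weigh heavily against 29 true positives.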
Several limitations of this study need to be acknowledged. First, this was a retrospective study with a small sample size and no external validation cohort. Second, the imbalanced distribution between PD-L1 positive and negative expression may affect the predictive performance of the model. Third, to avoid overfitting, transfer learning often requires a large sample size, and the sample size in this study was clearly not sufficient for 3D analysis, so we used the image with the largest tumor area rather than the 3D whole tumor volume to extract DL features [17]. However, this approach is time-saving and may be more clinically appropriate. Finally, because the cases were surgically confirmed at an early stage and the follow-up period was short, only a few patients have so far received ICIs treatment after surgery due to recurrence or metastasis. Therefore, the predictive performance of the CT-based deep learning radiomics signature for treatment response was not evaluated in this study.

Conclusion
In conclusion, this study developed clinical, radiomics and deep learning models to predict PD-L1 expression in NSCLCs non-invasively. The CT-based deep learning radiomics model achieved clinically acceptable predictive performance in both the training and validation cohorts. The deep learning radiomics signature could offer a surrogate imaging biomarker or a complement to IHC analysis, which could facilitate clinical decision support in identifying NSCLC patients who are likely to benefit from ICIs treatment.

Fig. 3 The ROC curves of the clinical model, Rad-score, and DLR-score in the training cohort (A) and validation cohort (B), respectively

Fig. 4 A 51-year-old man with NSCLC. (A) Axial CT image shows a solid nodule of the left upper lobe. DLR-score 0.839. (B) The photomicrograph shows positive expression of PD-L1 (IHC; ×400). (C) Grad-CAM visualization. Grad-CAM, gradient-weighted class activation mapping

Table 1
Demographic and CT characteristics of all patients
… in the training cohort and validation cohort, respectively. The predictive performance of the DLR-score was significantly higher than that of the clinical model and Rad-score in both cohorts (Fig. 3). The results of the ROC analysis of each model are shown in Table 3. The Youden index of … 8%, 64.4%, 97.5%, and 88.3%, respectively. The comparison of clinical and CT features between high (more than the cut-off value) and low (less than the cut-off value) DLR-score is shown in Table 4. To examine the interpretability of the DL features, we also visualized the network by applying gradient-weighted class activation mapping (Grad-CAM).

Table 2
The univariate and multivariate logistic regression of PD-L1 expression based on clinical and CT characteristics in training cohort

Table 3
The performance of deep transfer learning radiomics signature to predict PD-L1 expression
The P values of the clinical model and Rad-score were obtained by performing the DeLong test in the two cohorts with reference to the AUC of the DLR-score. PPV: positive predictive value; NPV: negative predictive value