Nomogram model combining macro and micro tumor-associated collagen signatures obtained from multiphoton images to predict the histologic grade in breast cancer.

The purpose of this study was to develop and validate a nomogram model combining macro and micro tumor-associated collagen signatures obtained from multiphoton images to differentiate tumor grade in patients with invasive breast cancer. A total of 543 patients were included; computer-generated random numbers were used to assign 328 of them to the training cohort and 215 to the validation cohort. Macroscopic tumor-associated collagen signatures (TACS1-8) were obtained by multiphoton microscopy at the invasion front and inside the primary breast tumor. TACS-corresponding microscopic features (TCMF), including morphology and texture features, were extracted from the segmented regions of interest using Matlab 2016b. Using ridge regression analysis, we obtained a TACS-score for each patient from the combined TACS1-8, and least absolute shrinkage and selection operator (LASSO) regression was applied to select the most robust TCMF features to build a TCMF-score. Univariate logistic regression analysis showed that the TACS-score and TCMF-score were significantly associated with histologic grade in the training cohort (TACS-score: odds ratio, 2.994; 95% CI, 2.013-4.452; P < 0.001; TCMF-score: odds ratio, 4.245; 95% CI, 2.876-6.264; P < 0.001). The nomogram (collagen) model combining the TACS-score and TCMF-score stratified patients into Grade1 and Grade2/3 groups with AUCs of 0.859 and 0.863 in the training and validation cohorts, respectively. Predictive performance was further improved by adding clinical factors, achieving an AUC of 0.874 in both cohorts. The nomogram model combining the TACS-score and TCMF-score may be useful in differentiating breast cancer patients with Grade1 tumors from those with Grade2/3 tumors.


Introduction
Breast cancer is the most prevalent malignancy among women worldwide, accounting for almost 25% of all cancer cases in women, and it is also the leading cause of cancer-related death in women worldwide. According to the latest global cancer statistics for 2020, there were 2.26 million new cases of breast cancer, and breast cancer has overtaken lung cancer as the most commonly diagnosed cancer in the world [1]. Breast cancer is a highly heterogeneous disease with diverse clinical manifestations, morphological appearances, molecular features, responses to treatment, and clinical outcomes. Current routine management of breast cancer relies on clinical and pathological prognostic factors to guide patient decision making and treatment selection [2]. A consensus statement by the College of American Pathologists identified several factors that determine the prognosis of breast cancer, including tumor size, lymph node status, histologic type, histologic grade, and hormone receptor status [3]. Histological grade is one of the best-established of these factors: it represents a morphological evaluation of tumor biology and has been shown to provide important information about the clinical behavior of breast cancer [2,3]. Studies have shown that histological grade is an independent prognostic factor in specific subgroups of patients, including those with estrogen receptor-positive [4], lymph node-negative [5,6], or lymph node-positive [6,7] disease. Moreover, histological grade has been incorporated into several prognostic tools used to guide breast cancer treatment, such as the Nottingham Prognostic Index, Adjuvant! Online, and the St. Gallen guidelines [8,9,10]. Accurate identification of histological grade in invasive breast cancer is therefore important for guiding appropriate treatment and estimating prognosis.
For patients diagnosed with breast cancer, histological grade reflects the invasive potential of the tumor and is a composite score of tubule formation (TF), nuclear pleomorphism (NP), and mitotic count (MC) based on microscopic evaluation by a pathologist. Each component is scored from 1 to 3, and the overall histological grade is determined by the sum of the three component scores. Grade 3 tumors are poorly differentiated, highly proliferative, and the most aggressive; Grade 2 tumors are moderately differentiated; and Grade 1 tumors are well differentiated, slow growing, and the least aggressive. A lower grade therefore indicates a better prognosis, whereas a higher grade is associated with lower survival [11,12]. In general, pathologists examine postoperative histopathology slides under high-resolution microscopes and determine histological grade by assessing tumor cell morphology and tissue structure, without considering the extracellular matrix in the tumor microenvironment. Multiphoton microscopy (MPM) has been widely applied in biological imaging since its introduction in 1990 [13]. Because it offers high resolution at the cellular and subcellular levels, rapid acquisition, and label-free contrast, it is particularly suitable for imaging unprocessed tissue samples. Multiphoton imaging combines tumor cell information from the two-photon excited fluorescence (TPEF) image with collagen fiber information from the second harmonic generation (SHG) image, and therefore has the potential to reveal morphological changes in both tumor cells and their surrounding collagen fibers. With the growth of interdisciplinary medicine, multiphoton imaging is now widely used in biomedical research [14,15]. Previous studies suggested that SHG imaging can identify collagen patterns related to the invasion and metastasis of breast tumors and can predict the prognosis of breast cancer [16,17].
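The grading rule described above can be made concrete with a short Python sketch. It assumes the standard Elston-Ellis cut-offs for the summed component score (3-5 = Grade 1, 6-7 = Grade 2, 8-9 = Grade 3), which the text does not spell out; the function names are hypothetical. The second helper shows the binary Grade1 vs. Grade2/3 outcome used throughout this study.

```python
def nottingham_grade(tubule_formation: int, nuclear_pleomorphism: int,
                     mitotic_count: int) -> int:
    """Combine the three component scores (each 1-3) into an overall grade.

    Assumed Elston-Ellis cut-offs: total 3-5 -> Grade 1, 6-7 -> Grade 2,
    8-9 -> Grade 3.
    """
    scores = (tubule_formation, nuclear_pleomorphism, mitotic_count)
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each component score must be 1, 2 or 3")
    total = sum(scores)
    if total <= 5:
        return 1
    if total <= 7:
        return 2
    return 3


def is_grade2_or_3(grade: int) -> int:
    """Binary outcome used in this study: 0 = Grade1, 1 = Grade2/3."""
    return int(grade >= 2)


# Example: TF = 3, NP = 2, MC = 2 gives a total of 7, i.e. Grade 2.
print(nottingham_grade(3, 2, 2), is_grade2_or_3(nottingham_grade(3, 2, 2)))
```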
Using SHG imaging, Kakkad et al. reported that significantly increased collagen density is associated with lymph node metastasis in breast cancer [18]. Chen et al. revealed that collagen alterations in the tumor microenvironment of early gastric cancer significantly predict lymph node metastasis [19]. In particular, our previous study showed that different tumor-associated collagen signatures (TACS) are strongly correlated with the prognosis of breast cancer [17].
Nevertheless, the association between collagen structure and the histological grade of breast tumors has not been reported. Therefore, building on our previous research, we investigated the relationship between macroscopic TACS and tumor grade. In addition, we extracted the corresponding microscopic features of TACS (TCMF) and combined them with TACS to characterize the relationship between collagen signatures and tumor grade more comprehensively. Finally, we developed and validated a nomogram model that combines the macro and micro tumor-associated collagen signatures derived from multiphoton images for personalized prediction of the histological grade of breast tumors.

Patients
This retrospective study was approved by the Institutional Review Board of Fujian Medical University Union Hospital. Patients were excluded if they had received neoadjuvant systemic therapy, if the tissue section was damaged or contained no tumor, or if histological grading information or the pathological report was unavailable. The patient selection pathway is shown in Fig. S1. A total of 543 patients (mean age, 49 years; range, 21-84 years) were included: 99 were classified as low grade (Grade1), 280 as intermediate grade (Grade2), and 164 as high grade (Grade3). Tumor histological grade was assessed with the Elston-Ellis system [12]. The clinical characteristics collected were age at surgery, molecular subtype (Luminal A, Luminal B, HER2-enriched, and triple-negative), tumor size (≤2 cm: longest tumor diameter ≤2 cm; 2-5 cm: longest diameter >2 cm and ≤5 cm; >5 cm: longest diameter >5 cm), and node status (0: no positive lymph nodes; 1-3: one to three positive lymph nodes; ≥4: four or more positive lymph nodes). The clinical information of patients in the training and validation cohorts is given in Table 1.
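A minimal sketch of how the tumor size and node status categories above could be coded. The function names are hypothetical, and a tumor measuring exactly 2 cm is assumed to belong to the ≤2 cm group, as the category label indicates.

```python
def tumor_size_category(longest_diameter_cm: float) -> str:
    """Bin the longest tumor diameter into the three size groups."""
    if longest_diameter_cm <= 0:
        raise ValueError("diameter must be positive")
    if longest_diameter_cm <= 2.0:
        return "<=2cm"
    if longest_diameter_cm <= 5.0:   # >2 cm and <=5 cm
        return "2-5cm"
    return ">5cm"


def node_status_category(positive_nodes: int) -> str:
    """Bin the number of positive lymph nodes into 0, 1-3 or >=4."""
    if positive_nodes < 0:
        raise ValueError("node count cannot be negative")
    if positive_nodes == 0:
        return "0"
    if positive_nodes <= 3:
        return "1-3"
    return ">=4"


print(tumor_size_category(3.4), node_status_category(5))  # 2-5cm >=4
```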

Workflow
The complete workflow of the study is shown in Fig. 1 and consists of four steps: region of interest (ROI) selection, MPM image acquisition, feature extraction, and construction of the histological grade classifier.
Across the entire tissue section, several non-overlapping ROIs at the invasion front and inside the tumor were marked on the H&E images, and the corresponding MPM images were then acquired. The macroscopic tumor-associated collagen signatures (TACS) were visually assessed by three independent reviewers who were blinded to the final pathological outcomes. TACS-corresponding microscopic features (TCMF) were then automatically extracted from the segmented ROIs to build a model for classifying tumor grade. Two separate cohorts were used to develop and validate the histological grade classifier: patients from Fujian Medical University Union Hospital were randomly divided into a training cohort, used to develop the classifier, and a validation cohort, used to verify it.
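The random division into training and validation cohorts can be sketched as a simple, unstratified shuffle. The seed value and the function name are hypothetical; the study states only that computer-generated random numbers were used.

```python
import random


def random_split(patient_ids, n_train, seed=2016):
    """Randomly assign patients to a training and a validation cohort.

    The seed is a hypothetical value; fixing it makes the split reproducible.
    """
    ids = list(patient_ids)
    if not 0 < n_train < len(ids):
        raise ValueError("n_train must be between 1 and len(patient_ids) - 1")
    rng = random.Random(seed)   # private generator, no global state touched
    rng.shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])


# 543 patients -> 328 training / 215 validation, as in this study
train, valid = random_split(range(1, 544), n_train=328)
print(len(train), len(valid))  # 328 215
```

Balance between the cohorts would then be checked afterwards, for example with the χ² and Mann-Whitney U tests described in the statistical analysis.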

Sample preparation, multiphoton image acquisition, TACS quantification and TCMF extraction
To obtain a large number of samples with relevant clinical data, formalin-fixed paraffin-embedded (FFPE) tissues were used in this study. Two serial 5-µm-thick sections were cut from each tissue sample, one for MPM imaging and one for H&E staining. Multiphoton imaging was performed with a previously described nonlinear optical imaging system. Briefly, a commercial laser scanning microscope (LSM 880, Zeiss, Germany) equipped with a mode-locked femtosecond Ti:sapphire laser (Chameleon Ultra, Coherent; excitation wavelength 810 nm) was used to obtain high-resolution images. Backscattered signals were collected simultaneously in two independent channels: one channel detected the second harmonic generation (SHG) signal (green) between 395 nm and 415 nm, and the other detected the two-photon excited fluorescence (TPEF) signal (red) between 428 nm and 695 nm. A Plan-Apochromat 20× objective (NA = 0.8, Zeiss, Germany) was used to acquire images from the tissue samples. The protocol for TACS quantification has been described in detail in our previous study [17]. For TCMF extraction, a 150 µm × 150 µm region of interest was then cropped from each non-overlapping MPM image. Four types of collagen features (142 in total) were extracted from the MPM images using Matlab 2016b: 8 morphological features, 6 histogram-based features, 80 gray-level co-occurrence matrix (GLCM)-based features, and 48 Gabor wavelet transform features. The morphological features are collagen area, number, length, width, straightness, crosslink density, crosslink space, and orientation. The histogram-based features are the mean, variance, skewness, kurtosis, energy, and entropy of the SHG pixel intensity distribution. The GLCM-based and Gabor wavelet transform features have been described previously [20].
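The six histogram-based features can be illustrated as follows. This sketch assumes that "variation" means the variance, uses non-excess kurtosis, and computes energy and entropy (base 2) from the normalized gray-level histogram; the exact definitions in the study's Matlab code may differ, and the function name is hypothetical.

```python
import math
from collections import Counter


def histogram_features(pixels):
    """First-order statistics of the SHG pixel-intensity distribution.

    pixels: flat iterable of integer gray levels from one ROI.
    """
    values = list(pixels)
    n = len(values)
    if n == 0:
        raise ValueError("empty region of interest")
    mean = sum(values) / n
    # Central moments (population form)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0  # non-excess: 3 for a Gaussian
    # Normalized histogram: probability of each gray level present in the ROI
    probs = [count / n for count in Counter(values).values()]
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs)
    return {"mean": mean, "variance": m2, "skewness": skewness,
            "kurtosis": kurtosis, "energy": energy, "entropy": entropy}


print(histogram_features([0, 0, 255, 255]))
```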
The GLCM is a second-order statistical representation of the spatial relationship between neighboring pixels in an image; it is constructed by counting how often pairs of gray levels occur at a given pixel offset. Gabor wavelet transform features describe image patterns at a range of scales and orientations. In this work, the image matrix from which the GLCM was computed was 273 × 273 pixels.
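A minimal pure-Python sketch of a GLCM for a single pixel offset, followed by four common Haralick-style statistics. The image is assumed to be already quantized to `levels` gray levels; the offset, level count, symmetry option, and choice of statistics are illustrative assumptions, not the configuration that produced the study's 80 GLCM features.

```python
import math


def glcm(image, dx=1, dy=0, levels=8, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy).

    image: list of rows of integer gray levels in [0, levels).
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:          # count the pair in both directions
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, homogeneity and entropy of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n) if p[i][j] > 0]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells)
    return {"contrast": contrast, "energy": energy,
            "homogeneity": homogeneity, "entropy": entropy}


print(glcm_features(glcm([[0, 1], [0, 1]])))
```

Repeating this over several offsets (distances and angles) and statistics is how GLCM feature sets are typically enlarged.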

Ridge and LASSO regression to build TACS-score and TCMF-score
Based on the quantified TACS and tumor grades in the training cohort, we used ridge regression to estimate a coefficient for each TACS and built the TACS-score from the 8 TACS coefficients. To build a reliable TCMF-score, least absolute shrinkage and selection operator (LASSO) regression was applied to select the most robust and non-redundant features from the 142 TCMF features. The TCMF-score was then calculated as a linear combination of the selected features weighted by their coefficients.
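The two scoring steps can be sketched with a hand-rolled penalized logistic regression fitted by (proximal) gradient descent: an L2 penalty for the ridge step and an L1 penalty with soft-thresholding (ISTA) for the LASSO step, which sets weak coefficients exactly to zero. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the penalty `lam` is fixed rather than cross-validated, features are assumed standardized, and the ridge step is assumed to be logistic. In practice an established package such as R's glmnet would normally be used.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_penalized_logistic(X, y, penalty="l1", lam=0.1, lr=0.1, n_iter=2000):
    """Minimize mean logistic loss + lam*||w||_1 ("l1") or lam*||w||^2/2 ("l2").

    The intercept is not penalized. Returns (intercept, coefficients).
    """
    if penalty not in ("l1", "l2"):
        raise ValueError("penalty must be 'l1' or 'l2'")
    n, d = len(X), len(X[0])
    b0, w = 0.0, [0.0] * d
    for _ in range(n_iter):
        # Residuals p_i - y_i give the gradient of the mean negative log-likelihood
        resid = [_sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
                 for xi, yi in zip(X, y)]
        g0 = sum(resid) / n
        g = [sum(r * xi[j] for r, xi in zip(resid, X)) / n for j in range(d)]
        b0 -= lr * g0
        if penalty == "l2":
            w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, g)]
        else:
            w = [wj - lr * gj for wj, gj in zip(w, g)]
            # Proximal (soft-threshold) step: small weights become exactly zero
            w = [math.copysign(max(abs(wj) - lr * lam, 0.0), wj) for wj in w]
    return b0, w


def signature_score(x, coefs, intercept=0.0):
    """Linear combination of features weighted by their fitted coefficients."""
    return intercept + sum(c * v for c, v in zip(coefs, x))


# Toy data: feature 0 separates the classes, feature 1 is pure noise
X = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
b0, w = fit_penalized_logistic(X, y, penalty="l1", lam=0.05)
print(w)  # informative weight > 0, noise weight exactly 0.0
```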

Statistical analysis
Univariate and multivariate logistic regression analyses were used to evaluate the conventional clinical risk factors, the TACS-score, and the TCMF-score, and to explore the association of these variables with histologic grade (Grade1 vs. Grade2/3). The nomogram, which gives clinicians an intuitive, quantitative tool for quickly estimating a patient's likely outcome, was developed in the training cohort and validated in the validation cohort. The discriminative performance of the nomogram was evaluated with receiver operating characteristic (ROC) curves. Its calibration was evaluated with a calibration plot of the observed incidence against the predicted probability; the closer the calibration curve lies to the diagonal, the better the predicted probabilities agree with the observed outcomes. Patients were classified into the Grade1 and Grade2/3 groups using the cutoff that maximized the Youden index (sensitivity + specificity - 1) in the training cohort; this cutoff defined the optimal sensitivity and specificity and was applied unchanged to the validation cohort. The Mann-Whitney U test was used to compare collagen signatures between groups, and the χ² test was used to compare categorical clinical variables. All statistical analyses were performed with R 3.5.2 and IBM SPSS Statistics 24.
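The ROC AUC and the Youden-index cutoff can be sketched in a few lines. The AUC is computed through its equivalence with the Mann-Whitney statistic (the probability that a randomly chosen Grade2/3 patient scores higher than a randomly chosen Grade1 patient, ties counting one half). The function names are hypothetical, and the study's own computations were done in R and SPSS.

```python
def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability P(pos score > neg score)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A patient is called positive (Grade2/3, label 1) when score >= cutoff.
    """
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes")
    best = None
    for t in sorted(set(scores)):   # every observed score is a candidate cutoff
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


scores = [0.1, 0.3, 0.35, 0.8, 0.6, 0.9]
labels = [0, 0, 1, 0, 1, 1]
print(auc(scores, labels), youden_cutoff(scores, labels))
```

In use, `youden_cutoff` would be called on training-cohort scores only, and the returned cutoff applied unchanged to the validation cohort, as described above.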

Patients' characteristics
Of the 584 patients diagnosed with breast cancer between November 2003 and June 2017, 543 met the inclusion criteria and were enrolled; 328 were randomly assigned to the training cohort and 215 to the validation cohort (Table S1). The two cohorts have similar distributions of patient characteristics, with no significant differences in pathologic grade or clinical characteristics (age, molecular subtype, tumor size, and node status) (P > 0.05). The detailed distribution of clinical characteristics in the Grade1 and Grade2/3 groups is summarized in Table 1. The molecular subtype, TACS-score, and TCMF-score differ significantly between the Grade1 and Grade2/3 groups in both the training and validation cohorts.

TACS-score and TCMF-score
As shown in Fig. 1, the grade-related TACS-score combines the eight TACS; the formula used to calculate it for each patient is given in the supplementary material. A total of 142 TCMFs were extracted from each SHG image. Seven grade-related features with nonzero coefficients were selected by a LASSO logistic regression model in the training cohort (Fig. S2). In univariable logistic regression analysis, each of these seven features was significantly associated with histologic grade (Table S3). The TCMF-score formula based on these features is given in the supplementary material. Grade2/3 patients generally have higher TACS- and TCMF-scores than Grade1 patients (Table 1).

Histologic grade prediction using the TACS- and TCMF-scores
The TACS-score yields AUCs of 0.722 (95% CI, 0.671-0.770) and 0.733 (95% CI, 0.668-0.791) for predicting histologic grade (Grade1 vs. Grade2/3) in the training and validation cohorts, respectively (Fig. 2). The TCMF-score performs better, with AUCs of 0.812 (95% CI, 0.765-0.853) in the training cohort and 0.805 (95% CI, 0.746-0.856) in the validation cohort (Fig. 2). When the two scores are combined, the AUC increases to 0.859 (95% CI, 0.816-0.894) and 0.863 (95% CI, 0.809-0.906), respectively (Fig. 2). The TACS + TCMF-score of each patient is shown in Fig. 3(A) and (B). Patients were classified as Grade1 or Grade2/3 using the cutoff that maximized the Youden index. The sensitivity is 88.7% and 83.8%, and the specificity is 68.3% and 77.8%, in the training and validation cohorts, respectively, indicating that the combined collagen signatures discriminate well between Grade1 and Grade2/3. In Fig. 3(A) and (B), the vertical black dashed line marks the optimal cutoff: patients to its left are predicted as Grade1 and those to its right as Grade2/3. Green dots indicate actual Grade1 patients and red dots actual Grade2/3 patients; markers inside the red rectangles are misclassified patients. Most patients are correctly classified. Figure 3(C) and (D) show that the TACS + TCMF-score is significantly lower in Grade1 than in Grade2/3 patients.
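Applying a fixed cutoff and counting the resulting errors, as done for Fig. 3 and the model comparison, can be sketched as below. The function name is hypothetical and label 1 denotes Grade2/3; the false positive rate reported later for the models is simply 1 - specificity.

```python
def evaluate_at_cutoff(scores, labels, cutoff):
    """Classify with a fixed cutoff (score >= cutoff -> Grade2/3, label 1)
    and summarize the resulting confusion matrix."""
    if not (0 in labels and 1 in labels):
        raise ValueError("need both classes")
    tp = fn = tn = fp = 0
    misclassified = []
    for idx, (score, label) in enumerate(zip(scores, labels)):
        predicted = 1 if score >= cutoff else 0
        if predicted != label:
            misclassified.append(idx)
        if label == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    return {
        "sensitivity": tp / (tp + fn),          # Grade2/3 correctly identified
        "specificity": tn / (tn + fp),          # Grade1 correctly identified
        "false_positive_rate": fp / (tn + fp),  # equals 1 - specificity
        "misclassified": misclassified,         # indices of wrongly called patients
    }


print(evaluate_at_cutoff([0.2, 0.7, 0.4, 0.9], [0, 0, 1, 1], cutoff=0.5))
```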
Furthermore, the TACS-score and TCMF-score were used to construct a nomogram for personalized prediction of histological grade (Fig. 4(A)). For example, a patient with a TACS-score of 1.96 and a TCMF-score of 1.01 would receive a total of 90.8 points, corresponding to a predicted probability of Grade2/3 of 82.3%. To examine the predictive performance of the TACS + TCMF-score across different subgroups, we conducted subgroup analyses according to clinical variables. The TACS + TCMF-score performs well in all subgroups except patients with tumors larger than 5 cm, in whom prediction was unsatisfactory (Table 2).
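A nomogram is a graphical form of a logistic regression model. Assuming the standard binary logistic form (the fitted coefficients β0, β1 and β2 are not reported in this section), each predictor's points are a linear rescaling of its contribution to the linear predictor, and the total-points axis maps back to a probability through the logistic function. In the worked example, a predicted Grade2/3 probability of 82.3% therefore corresponds to a linear predictor of about 1.54:

```latex
\begin{aligned}
\operatorname{logit}(P) &= \ln\frac{P}{1-P}
  = \beta_0 + \beta_1\,x_{\mathrm{TACS}} + \beta_2\,x_{\mathrm{TCMF}},\\
P &= \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \beta_1\,x_{\mathrm{TACS}} + \beta_2\,x_{\mathrm{TCMF}}\right)\right]},\\
\operatorname{logit}(0.823) &= \ln\frac{0.823}{0.177} \approx 1.54 .
\end{aligned}
```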

Performance comparison of different predictors and prediction models
We assessed the association of histologic grade with age, molecular subtype, tumor size, and node status in the training and validation cohorts (Table 3 and Table S2). In univariate analysis, molecular subtype, tumor size, node status, TACS-score, and TCMF-score are significantly associated with histologic grade. In multivariate analysis, the TACS-score, TCMF-score, and molecular subtype remain independent predictors. Notably, the TACS- and TCMF-scores are the most significant factors (with the smallest P-values of all variables) compared with the clinicopathological factors in both the training and validation cohorts. By contrast, the clinical model combining age, molecular subtype, tumor size, and node status shows weak predictive performance, with an AUC of 0.717 (95% CI, 0.665-0.765) in the training cohort and 0.672 (95% CI, 0.605-0.734) in the validation cohort. Combining the clinical factors with the TACS + TCMF-score gives the best performance (AUC, 0.874; 95% CI, 0.833-0.908 in the training cohort; AUC, 0.874; 95% CI, 0.823-0.916 in the validation cohort). Likewise, the sensitivity and specificity of the combined model are much higher than those of the clinical model (Table 4). The specificity of the combined model in the validation cohort is slightly lower than that of the TACS + TCMF model, possibly because of the low specificity of the clinical model (50%). In the training and validation cohorts, the false positive rates of the combined model were 23.8% and 25.0%, respectively, much lower than those of the clinical model (41.3% and 50.0%).

Discussion
In this study, we demonstrate the association of macro and micro tumor-associated collagen signatures with the histological grade of breast cancer. Previous studies suggested that macroscopic TACS play a crucial role in tumor formation and progression [16,21,22]. Our results show that a higher TACS-score is associated with a higher tumor grade, which is consistent with our previous finding that a higher TACS-score is associated with a worse prognosis [17]. Among the TCMF features selected by LASSO, the wavelet features have the largest weights in the collagen signature, indicating that wavelet-based features play a vital role in the prediction model. The Gabor wavelet transform decomposes an image into frequency components at multiple scales and orientations and may therefore capture spatial heterogeneity at different scales and directions [23]. This result is similar to that of previous studies in which wavelet-based features were incorporated into collagen models [20]. Histogram, gray-level co-occurrence matrix (GLCM)-based, and Gabor wavelet transform features are textural features of collagen fibers that have been reported in several studies and have potential clinical applications in disease diagnosis [19,20,24,25]. After univariate and multivariate logistic regression analyses, molecular subtype and the TACS- and TCMF-scores remain independent predictors of histological grade. Previous studies have shown that high-grade tumors are significantly associated with hormone receptor negativity in breast cancer [26]. Ehinger et al. suggested that histological grade can be used as a surrogate for molecular subtype in breast cancer; for example, patients with ER-positive/HER2-negative/Grade 1 breast cancer have a prognosis similar to that of 'Luminal A-like' disease and might be spared chemotherapy in the absence of other adverse prognostic factors [27].
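To illustrate how a Gabor filter responds selectively to orientation, here is a minimal pure-Python sketch of a real-valued Gabor kernel (a Gaussian envelope times a cosine carrier) and its mean absolute filter response. All parameter values (wavelength, σ, γ, the orientations tested) and the function names are illustrative assumptions, not the settings used to compute the 48 Gabor features in this study.

```python
import math


def gabor_kernel(wavelength, theta, sigma, gamma=0.5, psi=0.0, half_size=None):
    """Real part of a 2-D Gabor filter; theta is the direction (radians)
    along which the cosine carrier oscillates."""
    if half_size is None:
        half_size = int(math.ceil(3 * sigma))  # cover about 3 sigma of the envelope
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)    # along the carrier
            yr = -x * math.sin(theta) + y * math.cos(theta)   # across the carrier
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def mean_abs_response(image, kernel):
    """Mean |filter response| over positions where the kernel fits in the image."""
    k = len(kernel) // 2
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(k, rows - k):
        for c in range(k, cols - k):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    acc += kernel[dy + k][dx + k] * image[r + dy][c + dx]
            total += abs(acc)
            count += 1
    return total / count


# Vertical stripes with a 4-pixel period (intensity varies along x only)
stripes = [[math.cos(2 * math.pi * c / 4) for c in range(24)] for _ in range(24)]
for theta in (0.0, math.pi / 4, math.pi / 2):
    print(round(theta, 3), mean_abs_response(stripes, gabor_kernel(4.0, theta, 2.0)))
```

The θ = 0 filter, whose carrier matches the stripe direction and period, responds far more strongly than θ = π/2; repeating this over several wavelengths and orientations and summarizing each response map yields a bank of texture features.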
Consistent with these reports, our study shows that the triple-negative subtype is strongly associated with an increased risk of high tumor grade. Moreover, the TACS-score and TCMF-score are effective for breast tumor grade classification: together they stratify patients into Grade1 and Grade2/3 groups with AUCs of 0.859 and 0.863 in the training and validation cohorts, respectively. Predictive performance is further improved by combining the collagen signatures with the clinical model, achieving an AUC of 0.874 in both cohorts. In addition, we developed a nomogram based on the TACS-score and TCMF-score for individual estimation of histological grade in breast cancer patients. The nomogram was validated in the separate validation cohort, supporting its reproducibility and reliability.
Breast tumor grade is an important prognostic factor and remains a key pathologic feature for treatment planning; it has been incorporated into prognostic staging in the most recent AJCC staging manual [6,28]. Accurate assessment of the degree of differentiation of breast cancer helps clinicians plan comprehensive treatment. X-ray mammography, magnetic resonance imaging (MRI), and ultrasonography are currently the most common clinical imaging modalities for breast cancer screening and diagnosis. These techniques can not only distinguish breast lesions but also predict histopathological characteristics of breast cancer. For example, Forgia et al. reported that a radiomics signature from contrast-enhanced spectral mammography (CESM) can predict histological outcomes of breast cancer [29]. Fan et al. integrated radiomic features from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) and T2-weighted imaging (T2W) to predict histological grade in ductal breast carcinoma [30]. Grajo et al. demonstrated that ultrasound elastography findings are related to breast tumor grade [31]. These findings can, to some extent, help clinicians make better treatment decisions. However, because of the limited resolution of these imaging methods, they cannot assess histopathological characteristics at the cellular level as accurately as H&E staining.
Multiphoton microscopy offers resolution comparable to that of H&E-stained histology, and SHG imaging is considerably more specific for collagen fibers than H&E staining. Histological grade depends on the degree of differentiation of the tumor tissue and is mainly a semi-quantitative evaluation of tumor cell morphology. Since Paget's "seed and soil" hypothesis, it has been recognized that the tumor microenvironment, the soil surrounding the tumor seed, plays a vital role in tumor development [32]. Our results show that changes in collagen fibers in the tumor microenvironment are significantly associated with histological grade in breast cancer. We observed that combining the macro and micro tumor-associated collagen signatures (TACS-score and TCMF-score) predicts histological grade more accurately than either signature alone, suggesting that the two provide related but complementary information for collagen-based grade prediction. Future work could integrate the morphological characteristics of both tumor cells and the extracellular matrix to achieve even more accurate prediction. This work also has limitations: it is a single-center retrospective study, so our results need external validation in large, multicenter, prospective cohorts.
In summary, our preliminary results suggest that the histological grade of breast tumors can be differentiated with satisfactory accuracy using the TACS-score and TCMF-score derived from multiphoton images. The nomogram model we developed and validated may be useful for prognosis and treatment management in patients with breast cancer.