In-vitro study on the identification of gastrointestinal stromal tumor tissues using laser-induced breakdown spectroscopy with chemometric methods

Early-stage detection of tumors helps to improve the patient survival rate. In this work, we demonstrate a novel discrimination method to distinguish gastrointestinal stromal tumor (GIST) tissues from healthy formalin-fixed paraffin-embedded (FFPE) tissues by combining chemometric algorithms with laser-induced breakdown spectroscopy (LIBS). Chemometric methods including partial least squares discriminant analysis (PLS-DA), k-nearest neighbor (k-NN) and support vector machine (SVM) were used to build the discrimination models. A comparison of the k-NN, SVM and PLS-DA classifiers shows an increase in accuracy from 94.44% to 100%. A comparison of the LIBS signals of healthy and tumor tissues shows an enhancement of the calcium lines, which is a signature of the presence of GIST in the FFPE tissues. Our results may provide a complementary method for the rapid detection of tumors for the successful treatment of patients.


Introduction
Gastrointestinal stromal tumors (GISTs), or stomach cancer, continue to be the second leading cause of cancer-associated deaths worldwide [1]. GISTs can arise anywhere in the digestive tract and were first supposed to originate from gastrointestinal stromal cells; it is now understood that this kind of tumor arises from the interstitial cells of Cajal [2]. Current epidemiology shows that about 0.70 per 100 000 people in the United States are affected by GIST every year, and the trend is increasing [3]. Gastric cancer has long been regarded as a destructive disease with a five-year survival rate of less than 30% [4]. In the last few decades, its incidence and mortality rate have increased gradually, and it has become the sixth most common cancer causing death in developed countries [5][6][7].
Early detection of gastric cancer can effectively control the spread of malignancy and decrease mortality worldwide. Diagnosis of gastrointestinal stromal tumor depends on qualitative histological evaluation of tissue samples. This method is time-consuming and relies on the histopathologist's interpretation. It is very difficult to identify or diagnose gastric cancer, especially at early stages, using state-of-the-art methods such as computed tomography virtual gastroscopy (CTVG), fibro gastroscopy, three-dimensional imaging by multi-slice spiral CT (3DMSCT) and ultrasonic gastroscopy [8]. Moreover, some emerging optical techniques such as optical coherence tomography (OCT), near-infrared spectroscopy (NIRS), autofluorescence imaging (FAF) and Raman spectroscopy (RS) have been used for the diagnosis and classification of different malignant tissues [9]. OCT is an imaging technique that probes marked changes in biological tissue structure caused by infections [10]. Guo et al. used Raman and surface-enhanced Raman spectroscopies to diagnose gastric cancer with accuracies of 94.1% and 98.5% using principal component analysis and the characteristic ratio method, respectively [11]. However, Raman signals are still not very robust, and surface-enhanced Raman spectroscopy (SERS) requires the introduction of exogenous nanoparticles. Huang et al. used fiber-optic Raman endoscopy to find primary lesions of the stomach for in vivo differentiation of gastric cancer tissue from healthy tissue [12]. With this technique, it is hard to capture images by inserting a camera because of the limited space in the stomach. Furthermore, blood loss and cerebrospinal fluid affect the imaging results. Therefore, there is an urgent need for experimental techniques for the rapid detection of gastrointestinal stromal tumor.
Laser-induced breakdown spectroscopy (LIBS) is an attractive technique because of its high resolution and real-time multi-element detection [13,14]. LIBS has been used for the detection and classification of different kinds of biomedical species, such as bio-aerosols, calcified materials and soft tissues [15]. El-Hussein et al. performed an experiment on the discrimination and classification of colorectal and breast cancers using LIBS, and showed that malignant tissues contained calcium and magnesium in higher excess than healthy ones [16]. Recently, Teng et al. proposed LIBS as a potential diagnostic tool to discriminate between glioma and the infiltrating boundary with k-NN and SVM [17]. A LIBS experiment was also performed to discriminate lung tumor and boundary tissues with different classification methods, including SVM and Boosting Tree classification models combined with principal component analysis (PCA) or random forest (RF), where the RF-Boosting Tree model achieved the highest accuracy of over 98.9% [18]. Therefore, the LIBS technique has potential for the rapid detection and discrimination of different biological tissues.
Many researchers have presented remarkable work on the diagnosis and prognosis of different tumor tissues and serum samples by LIBS combined with machine learning algorithms including PLS-DA, LDA, SVM and k-NN models. k-NN, ANN and SVM classification models were used for the diagnosis and staging of multiple myeloma (MM) with accuracies of over 90%. The area under the curve (AUC) values of these classifiers were 97%, with a sensitivity of ∼93% and a specificity of 91%, for the discrimination between MM and healthy serum samples. Similarly, the AUC values for the staging of MM were about 97%, with 91% sensitivity and 93% specificity. That study concluded that the diagnosis and staging of MM serum samples with machine learning methods can achieve high accuracy using the LIBS technique [19]. LIBS with chemometric methods has also been used to discriminate and identify melanoma and normal FFPE tissues, achieving accuracy, sensitivity and specificity of 100% with PLS-DA models [20]. Therefore, LIBS with chemometric algorithms is a powerful analytical tool for the discrimination and classification of human and animal malignancies.
In this work, an effective and rapid method is proposed, for the first time, to identify gastric tumor based on its elemental components and to discriminate it from healthy tissue by using LIBS combined with chemometric algorithms. This is made possible by comparing LIBS spectra recorded from normal and tumor tissues. Classification models including PLS-DA, SVM and k-NN were used for discrimination analysis with two ways of selecting feature lines, for the efficient detection of GIST. We found that PLS-DA gave better accuracy than the k-NN and SVM algorithms.

LIBS experimental setup
Gastrointestinal stromal tumor tissue samples were analyzed using the LIBS experimental setup shown in Fig. 1. A flash-pumped Q-switched Nd:YAG laser (λ = 1064 nm, pulse frequency 1 Hz, pulse duration τ = 5 ns, beam diameter Ø6 mm, energy 40 mJ/pulse) was used for ablation. The laser beam was guided by three plane reflectors R1, R2 and R3 and focused on the sample surface by a convex lens with a 100 mm focal length. A three-dimensional translation stage was used to present a fresh spot of the sample for each laser shot. Plasma emission produced on the surface of the tissue samples was collected by a convex lens of diameter Ø25 mm and focal length 36 mm into a fiber optic bundle (two fibers, each with a Ø600 µm aperture). The optical fiber bundle was connected to a two-channel gated charge-coupled device (CCD) spectrometer (AvaSpec 2048-2-USB2, Avantes). The spectral range of this spectrometer is 200 nm to 900 nm, with a spectral resolution of 0.20-0.30 nm. The external trigger used in the system included a photodetector and a digital delay generator (SRS-DG535, Stanford Research System). The delay between the laser pulse and spectral acquisition was optimized to 1.29 µs in order to reduce the bremsstrahlung. The integration time of the CCD was set to 2 ms. The experiment was performed under ambient air conditions.

Tissue samples
In this work, a total of twenty FFPE tissue samples from twenty different patients, ten GIST and ten healthy controls, were obtained from Origene Technologies Inc. (USA) and verified by pathology report. Each tissue originated from the stomach with a size of 7 cm × 7 cm × 6 cm; each section was 5 µm thick and 5 × 5 mm in area. Images of GIST and non-GIST tissue obtained from pathology are shown in Fig. 2. There were 10 tissue slides of the GIST class and 10 tissue slides of the healthy class. For each FFPE sample, 90 spectra were collected, giving 900 spectra per class; these were averaged in groups of 10 to yield 90 spectra per class. In total, 180 LIBS spectra were obtained for the discrimination between the two classes. The FFPE samples of GIST and healthy tissue on glass slides after laser ablation are shown in Fig. 3.
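The reduction from 900 single-shot spectra to 90 averaged spectra per class can be illustrated with a minimal sketch. It assumes that consecutive spectra are grouped in blocks of 10 (the grouping order is not stated in the text), and the function name `average_in_groups` and the synthetic data are hypothetical.

```python
import random

def average_in_groups(spectra, group_size=10):
    """Average consecutive groups of `group_size` spectra, channel by channel."""
    if len(spectra) % group_size != 0:
        raise ValueError("number of spectra must be a multiple of group_size")
    averaged = []
    for start in range(0, len(spectra), group_size):
        group = spectra[start:start + group_size]
        n_channels = len(group[0])
        averaged.append(
            [sum(s[c] for s in group) / group_size for c in range(n_channels)]
        )
    return averaged

# Synthetic stand-in for the 900 single-shot spectra of one class
# (4 channels each; real spectra have thousands of channels).
random.seed(0)
raw_gist = [[random.random() for _ in range(4)] for _ in range(900)]
avg_gist = average_in_groups(raw_gist, group_size=10)
print(len(avg_gist))  # number of averaged spectra for this class
```

Averaging over several shots is a common way to reduce shot-to-shot fluctuation before classification; the same call applied to the healthy class gives the second set of 90 spectra.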

LIBS spectra
The emission spectra from the GIST and non-GIST tissue samples and the empty glass slide are shown in Fig. 4. In the average spectrum of the empty slide, significant emission lines of Ca, K, Na, N, H, CN and O can be seen. In the combined spectra of GIST and healthy tissues, the red line shows the average spectrum of tumor tissues and the blue line shows the average spectrum of healthy tissues. The spectra of the GIST and healthy tissue samples are quite different from that of the empty slide. The NIST atomic emission database was used to label the atomic emission lines in these spectra [21]. Strong spectral lines of Ca, Mg, Fe, Na, CN, K, N, H and O were observed in the GIST and healthy tissues. In these spectra, the intensities of several elements are higher in tumor tissue than in healthy tissue samples. A total of 37 emission lines were present in the LIBS spectra of both GIST and healthy tissues, including Ca, Mg, Fe, K, Na, H, N, O and the CN band. The biological samples mainly contained Ca, Mg, Na, K and Fe in different compositions, and the LIBS spectra of the empty slide contained Ca, Na, K, O, H, N and the CN band. Significant emission lines of Na, Ca, N, H and O were present in the spectra of both the empty slide and the malignant samples [22]. Emission lines of Ca, Mg, K, Fe and Na with higher intensities were observed in the tissue samples, while Ca, K, Na, N, H and O were observed in the spectra of both the tissue and the empty slide. Paraffin wax might also produce differences in the intensities of C, H, N and O. Therefore, the atomic emission lines of H, N and O were not included in the analysis because of the potential interference from elements in the atmosphere. El Sherbini et al. concluded that the LIBS intensities of specific elements such as Ca, K, Mg, Mn, Cu, Na and Fe were dramatically elevated in malignant liver tissues compared with healthy ones [23].
On the basis of the strong and significant spectral emission lines present in GIST tissues, twenty-one atomic lines of Ca, Fe, K, Na and Mg were chosen for the discrimination analysis. These elemental lines are listed in Table 1. A histogram comparing the average normalized intensities of the 21 intense LIBS spectral lines of tumor and healthy samples is shown in Fig. 5, where the error bars are the standard deviations of the intensities of independent spectra. Some of the Fe, Ca, Mg, Na and K lines in the histogram show higher intensities in GIST tissues than in healthy tissues. From these lines, seven specific Ca lines were selected to further improve the discrimination results, as listed in Table 2. These Ca lines were selected because their intensities are higher in tumor tissues than in normal tissues. In this preliminary study, Ca lines were abundant in both GIST and healthy tissues, so the classification results obtained with the Ca lines differ from those of the other elements. A similar approach was used by Seifalinezhad et al. to distinguish between neoplastic and non-neoplastic gastric tissues by using spark discharge LIBS (SD-LIBS). Ca, Mg, K, Na, Fe, Ti, C, O and N were identified in the spectrum of gastric tissue. The authors showed that specific elements such as Ca and Mg have higher intensities in malignant tissue spectra than in normal tissue spectra from the same patient [24].

Discrimination analysis
Principal component analysis (PCA) was performed with the two feature line selections mentioned above, namely the 21 feature lines and the 7 Ca feature lines. Figure 6

SVM and k-NN analysis
For discrimination, the two sets of feature lines were used as the input of the PLS-DA, SVM and k-NN classifiers, which produced classification accuracies as the output. First, the data were divided into two sets: 108 spectra as the training set to build the model and 72 spectra as the test set. In building the discrimination models by 10-fold cross validation, the training data are divided into ten parts, nine of which are used for training and one for verification, and the average of the ten results is used as an estimate of the accuracy of the algorithm. After the model was built in this way, it was tested on the test data set. The training set included 54 spectra each of GIST and normal samples (108 spectra in total) and the test set included 36 spectra each of GIST and normal samples (72 spectra in total). Among the test data, the 36 GIST spectra were named class 1 and the 36 healthy tissue spectra were named class 2 for the discrimination analysis shown in Fig. 7. The support vector machine (SVM) is a supervised machine learning method based on the principle of structural risk minimization and the Vapnik-Chervonenkis dimension theory of statistical learning. SVM finds the classification hyperplane that maximizes the margin on both sides of the hyperplane while maintaining the classification accuracy [25]. The radial basis function was selected as the kernel function, and its parameters, the penalty parameter "c" and the kernel function parameter "g", were tuned by the particle swarm optimization (PSO) algorithm.
k-NN is used for pattern identification and classification. The Euclidean distance is measured between a particular point "x" and the known training data points; the k training points with the shortest distance to "x" are taken, the class to which most of these k points belong is determined, and "x" is allocated to that class [26]. The Euclidean distance was used to select the nearest samples, with k = 3. This procedure was repeated many times to improve the accuracy, and the averaged sensitivity, specificity and accuracy were used to analyze the classification performance. In medical diagnosis, sensitivity is the proportion of malignant samples that are correctly identified (the true positive rate), whereas specificity is the proportion of nonmalignant samples that are correctly identified (the true negative rate). The positive predictive value (PPV) is the probability that a sample with a positive test result is truly malignant, and the negative predictive value (NPV) is the probability that a sample with a negative test result is truly nonmalignant.
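The data partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions: the split is taken to be random and balanced per class (54 training and 36 test spectra per class), and the helper names `stratified_split` and `k_fold_indices` are hypothetical. The SVM training itself is not reproduced here.

```python
import random

def stratified_split(labels, n_train_per_class, seed=0):
    """Randomly pick n_train_per_class indices per class for training; the rest are test."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        train += idx[:n_train_per_class]
        test += idx[n_train_per_class:]
    return train, test

def k_fold_indices(indices, n_folds=10, seed=0):
    """Yield (fit, validation) index lists for n_folds-fold cross validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        fit = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield fit, folds[i]

# 90 averaged spectra per class: class 1 = GIST, class 2 = healthy.
labels = [1] * 90 + [2] * 90
train_idx, test_idx = stratified_split(labels, n_train_per_class=54)
print(len(train_idx), len(test_idx))  # training and test set sizes

for fit_idx, val_idx in k_fold_indices(train_idx, n_folds=10):
    pass  # fit a model on fit_idx, score it on val_idx, then average the ten scores
```

The held-out test indices are never used inside the cross-validation loop, which mirrors the order of operations in the text: cross-validate on the training set first, then evaluate once on the test set.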
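The k-NN decision rule can be written in a few lines. The sketch below uses the Euclidean distance and k = 3 as in the text; the two-feature toy data and the function name `knn_predict` are illustrative assumptions, not the measured line intensities.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Assign x to the majority class among its k nearest training points."""
    # Euclidean distance from x to every training point, sorted ascending.
    dists = sorted(
        ((math.dist(row, x), label) for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy normalized intensities of two hypothetical lines; class 1 = GIST, class 2 = healthy.
train_X = [[1.0, 0.9], [0.9, 1.1], [1.1, 1.0], [0.2, 0.3], [0.3, 0.2], [0.25, 0.25]]
train_y = [1, 1, 1, 2, 2, 2]

print(knn_predict(train_X, train_y, [0.95, 1.0], k=3))  # -> 1
print(knn_predict(train_X, train_y, [0.3, 0.3], k=3))   # -> 2
```

An odd k avoids tied votes in a two-class problem, which is one practical reason for choices such as k = 3 or k = 1.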
The final accuracy, sensitivity and specificity of the SVM and k-NN models are:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2)$$
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (3)$$
$$\text{Positive predictive value (PPV)} = \frac{TP}{TP + FP} \qquad (4)$$
$$\text{Negative predictive value (NPV)} = \frac{TN}{TN + FN} \qquad (5)$$
where TP, FP, TN, and FN represent true positive, false positive, true negative, and false negative, respectively [27,28]. Classification results for the SVM and k-NN models are shown in Fig. 7. During classification, the accuracy decreased as the number of feature lines increased. With the 21 feature lines, the SVM accuracy was 95.83%, with a sensitivity of 91.67% and a specificity of 100%. Its positive predictive value (PPV) was 1 and its negative predictive value (NPV) was 0.923 according to Bayes' formula in Eqs. (4) and (5). The k-NN classifier achieved an accuracy of about 94.44%, with a sensitivity of 88.89% and a specificity of 100%. Its PPV was 1.00 and its NPV was 0.90. These results reflect differences between spectra of the same kind of tissue caused by individual variation among the GIST and healthy tissue samples. The classification results of SVM and k-NN are not as good, possibly because the training datasets do not completely cover the spectral variations arising from these differences.
The seven most important feature lines were those of Ca, which gave the highest accuracy of 100% in both the k-NN and SVM models. Ca played a more important role in classification than the other elements. Meanwhile, other emission lines of Ca show different classification results in gastric and healthy tissues. Calcium is more abundant in tumor tissues, with higher line intensities than in normal tissues. These results are consistent with previous studies of Ca emission lines in gastric tumors [29,30]. Therefore, it is more reliable to choose calcium for discriminating tumors from normal tissues. The remaining Ca, Mg, Fe, K and Na emission lines in the spectral data may be caused mainly by the glass slide substrate, the paraffin wax and the environment, so they did not contribute significantly to the classification and even reduced the accuracy for gastric and healthy tissues.
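The reported figures of merit can be cross-checked with the standard confusion-matrix definitions. The counts used below are not given in the text: they are inferred here from the reported percentages and the 36 + 36 test spectra (for example, a sensitivity of 91.67% on 36 GIST spectra corresponds to 33 true positives), so they should be read as an assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Counts inferred from the reported percentages (assumption, 21 feature lines).
svm = diagnostic_metrics(tp=33, fp=0, tn=36, fn=3)
knn = diagnostic_metrics(tp=32, fp=0, tn=36, fn=4)

for name, m in (("SVM", svm), ("k-NN", knn)):
    print(name, {key: round(value, 4) for key, value in m.items()})
```

With no false positives, the PPV is exactly 1 for both classifiers, and the NPV values of 36/39 ≈ 0.923 and 36/40 = 0.90 agree with the values quoted for SVM and k-NN.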

PLS-DA analysis
PLS-DA is a supervised method that deals with both classification and regression analysis. The PLS-DA model is a multivariate data analysis technique that is very suitable for the spectral analysis of biological samples [31,32]. The primary goal of this model is to reduce the independent variables to a smaller number of latent variables (LVs) formed as linear combinations of the original variables. For classification, the number of latent variables plays an important role because the LVs take the place of the independent variables. If the number of LVs is too small, the model cannot achieve good accuracy, but if it is too large, the model overfits [33]. In this paper, PLS-DA was used with the LIBS technique to distinguish GIST tissues from healthy ones. The results of the PLS-DA algorithm are the regression coefficients of the independent variables (the LIBS spectral peaks in our analysis). Regression coefficients with high absolute values are usually associated with peaks that are important for class discrimination, and peaks with positive coefficients contribute to an increase in the calculated class response [34]. The model was built with classification_toolbox_5.0 (Eigenvector Technologies, Inc.) in MATLAB. The PLS-DA algorithm showed good classification performance, separating the diseased tissues from the healthy tissues with an accuracy of about 100%. Thus, PLS-DA is an effective classification method for discriminating between GIST and healthy tissues based on the atomic and ionic emission lines of LIBS spectra. The total analysis time was 2 seconds, including spectrum calculation and predictive classification. All calculations were conducted on a computer with an Intel Core i7-6600U CPU at 2.60 GHz, 16 GB RAM and Windows 10. All classification models were run in MATLAB 2016a. Figure 8 shows the accuracy of the SVM, PLS-DA and k-NN classifiers for different numbers of Ca feature lines.
Figure 9 shows the receiver operating characteristic (ROC) curves obtained from the normalized intensities of the 21 lines and of the 7 selected Ca lines with the three discrimination models of GIST. The ROC curve plots sensitivity versus (1-specificity) for the different classification models. Using the DeLong test, the area under the curve (AUC) for the 21 lines was 1.00 for PLS-DA, 0.9583 for SVM and 0.9444 for k-NN. For the 7 Ca lines, the AUC of all three discrimination models was 1.00. The working points are marked in the upper left corner of the ROC curves. Therefore, the tumor tissue slices could be unambiguously discriminated from the healthy ones with all three models, which showed near-ideal ROC behavior for both the 21 feature lines and the 7 Ca lines. Thus, our results demonstrate that LIBS is a promising analytical method for discriminating GIST from the corresponding healthy tissues.
The LIBS technique has broad application prospects in biomedicine, especially in the detection of tumor tissues [35][36][37]. In the future, it may be possible to build an in-vitro early diagnosis system based on LIBS. The classification results for the 21 feature lines and the 7 selected Ca line intensities are listed in Table 3. In our results, the PLS-DA model shows the highest accuracy of 100% with 9 latent variables for both the 21 lines and the 7 Ca lines. With the 21 feature lines, k-NN had an accuracy of 94.44% with an optimal k of 3, whereas with the 7 Ca lines the k-NN model had an accuracy of 100% with an optimal k of 1. SVM showed 95.83% accuracy with optimal values of c and g of 0.041 and 0.846. With the 7 selected Ca lines, the SVM model had 100% accuracy with optimal values of c and g of 0 and 1. It was concluded that a smaller number of selected spectral lines of a specific element is important for the best classification results. Over multiple verifications, the accuracy of the discriminant models was very stable, with a variance of less than 4%, so the overall accuracy is used as the comparison standard. Therefore, the PLS-DA, k-NN and SVM models gave better accuracy with the selection of 7 feature lines.
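The role of latent variables in PLS-DA can be illustrated with a small PLS1 regression fitted by the NIPALS algorithm. This is a generic sketch, not the MATLAB toolbox used in this work: the class coding (+1 for GIST, -1 for healthy), the decision threshold of 0, the toy three-feature data and all function names are assumptions made for illustration.

```python
import math

def fit_pls1(X, y, n_components=1):
    """Fit PLS1 by NIPALS; returns the means and the (w, p, q) of each latent variable."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def predict_pls1(model, x):
    """Predicted response for one sample, applying the same deflation steps."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat

def classify(model, x):
    """Threshold the predicted response at 0 (assumed +1/-1 class coding)."""
    return "GIST" if predict_pls1(model, x) > 0 else "healthy"

# Toy intensities of three hypothetical lines; only the first separates the classes.
X = [[2.0, 1.0, 0.5], [2.2, 0.9, 0.4], [1.9, 1.1, 0.6],
     [1.0, 1.0, 0.5], [1.1, 0.9, 0.6], [0.9, 1.1, 0.4]]
y = [1, 1, 1, -1, -1, -1]
model = fit_pls1(X, y, n_components=1)
print(classify(model, [2.1, 1.0, 0.5]), classify(model, [1.0, 1.0, 0.5]))
```

In this toy case a single latent variable is enough because one feature carries all of the class information; with real spectra the number of latent variables would be chosen by cross validation to balance underfitting against overfitting.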
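As a consistency check on the AUC values, the area under a ROC curve that passes through a single operating point can be computed with the trapezoid rule. The sketch below is only an observation under that single-point assumption, using the sensitivities and specificities reported for the 21 feature lines; it is not a statement of how the AUC values in this work were computed, and the function name `auc_trapezoid` is hypothetical.

```python
def auc_trapezoid(points):
    """Area under a ROC curve through the given (1 - specificity, sensitivity) points."""
    # Always include the trivial end points (0, 0) and (1, 1).
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )

# Operating points for the 21 feature lines: specificity 100%, so 1 - specificity = 0.
print(round(auc_trapezoid([(0.0, 0.9167)]), 4))  # SVM
print(round(auc_trapezoid([(0.0, 0.8889)]), 4))  # k-NN
print(round(auc_trapezoid([(0.0, 1.0)]), 4))     # PLS-DA
```

For a single operating point the trapezoid area reduces to (sensitivity + specificity) / 2, which gives approximately 0.958 for SVM and 0.944 for k-NN, in line with the AUC values quoted for the 21 lines.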
For future in-vivo measurements, the wetness and softness of the sample surface are the biggest challenge for LIBS, because they affect the energy coupling and plasma excitation in biological samples. The inhomogeneity of the sample and the fluctuating performance of the LIBS system also influence the properties of the laser-matter interaction. As a preliminary study, this paper demonstrates the potential of LIBS for the in-vitro discrimination of GIST and healthy FFPE tissues.

Conclusion
We introduce the LIBS technique combined with machine learning to discriminate between GIST and healthy tissues for the first time. In this work, atomic emission lines of Ca, Mg, Fe, K and Na were chosen for the discrimination analysis of the tumor. Chemometric algorithms, namely PLS-DA, k-NN and SVM, were used to build the classification models with two selections of feature lines. From the 21 selected lines of Ca, Fe, Mg, K and Na, 7 specific Ca lines were selected to further improve the discrimination results. These 7 Ca lines were selected on the basis of their high intensities in tumor tissues and the better discrimination results they gave. In the PCA score plots, the dispersion of points within the region of each class is relatively small, which further shows that LIBS combined with PCA can distinguish GIST from healthy tissues. In the comparison of the k-NN, SVM and PLS-DA classifiers with the 21 selected lines, the accuracy increased from 94.44% to 100%. Both feature selections (21 and 7 emission lines) showed that the PLS-DA model achieved a remarkable discrimination performance for GIST, with 100% diagnostic sensitivity and specificity, which indicates the potential of LIBS analysis for exploring and screening soft tissues. The SVM and k-NN models both gave 100% accuracy with the 7 selected Ca lines, whereas with the 21 feature lines the classification accuracy was 95.83% for SVM and 94.44% for k-NN. Therefore, it is concluded that Ca lines are more important than the other lines for the identification of GIST. Regarding GIST diagnosis, it is logical to argue that higher peaks of specific Ca lines in part of a gastric tissue, in comparison with the related healthy tissue, are associated with the existence of a tumor. It can be concluded that this in-vitro study showed high detection sensitivity, highlighting the potential of this tool in clinical applications.
On the basis of these results, we will conduct further research on the detection of fresh soft tissues to promote the application of LIBS in the biomedical field.

Disclosures. The authors declare no conflicts of interest.
Data availability. Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the corresponding author upon reasonable request.