Analysis and classification of kidney stones based on Raman spectroscopy

The number of patients with kidney stones is increasing worldwide, which makes accurate diagnostic methods particularly important. Accurate analysis of the stone type plays a crucial role in the patient's follow-up treatment. This study used micro-Raman spectroscopy to analyze and classify the different mineral components present in kidney stones. Distinct Raman features were observed for the four types of kidney stones studied: oxalate, phosphate, purine and L-cystine stones. We then combined machine learning techniques with Raman spectroscopy. K-nearest neighbor (KNN) and support vector machine (SVM) classifiers, each combined with principal component analysis (PCA-KNN, PCA-SVM), were applied to the same spectral data set. The diagnostic accuracy was 96.3% for both the PCA-KNN and PCA-SVM methods, with high sensitivity (0.963, 0.963) and specificity (0.995, 0.985). These results show that the proposed method classifies the experimental Raman spectra of kidney stones with high accuracy. This approach can support physicians in making treatment recommendations for patients with kidney stones.

Raman spectroscopy allows real-time, non-destructive and non-invasive measurements and requires only minimal sample preparation. It has a better signal-to-noise ratio than X-ray diffraction, suffers less from spectral line overlap than FTIR, and has a short detection time. However, no one has attempted to classify kidney stones using Raman spectroscopy combined with machine learning.
Raman spectroscopy is therefore well suited to the analysis and classification of kidney stones. Raman spectroscopy is now combined with machine learning in medical diagnostics [19,20]. In this study, we used micro-Raman spectroscopy to analyze kidney stones obtained by percutaneous nephroscope lithotripsy. The contributions of this paper are as follows. First, we used Raman spectroscopy in combination with machine learning methods to classify kidney stones for the first time; compared with methods already used in hospitals, Raman spectroscopy is faster. Second, 135 samples of kidney stones were collected, which makes this the first large-scale sample analysis compared with previous studies. Third, a comparison of several machine learning models showed that most of them are well suited to the classification of kidney stones.

Sample
The major crystalline components of human urinary tract stones are listed in Table 1. A total of 135 kidney stone samples were obtained from patients at the First Hospital of China Medical University. The samples comprised 34 L-cystine stones, 34 purine stones, 32 phosphate stones and 35 oxalate stones. The mean patient age was 62 years; the oldest patient was 88 years old and the youngest was 31 years old. All kidney stones were obtained by percutaneous nephroscope lithotripsy. The experimental procedure in this study was performed after obtaining written permission from the First Hospital of China Medical University and the patients. The stones were washed with deionized water to remove debris such as blood, mucus and gypsum. The washed kidney stones were dried in a moisture-proof box for 1 hour. The dried samples were then ground into a fine powder using an agate mortar and pestle. In addition, we photobleached all kidney stone samples to reduce the effect of fluorescence.

Raman spectroscopy
Spectra of the kidney stone samples were recorded in the spectral range of 200 cm⁻¹ to 1700 cm⁻¹ using a Raman system (Horiba JY HR Evolution, France) with a spectral resolution of 1 cm⁻¹. A 785 nm diode laser with a spatial resolution of 1 μm was used as the excitation source. The laser power was 19.2 mW, and the samples were excited through a 20x microscope objective lens (NA = 0.40).
During the collection of the Raman spectra, we measured three different locations on each sample. Each spectrum was acquired with an integration time of 1 s and accumulated 5 times.

Preprocessing
The spectra collected with the Horiba JY HR Evolution Raman spectrometer contained noise and a fluorescence background. The noise was removed by Savitzky-Golay filtering, and the fluorescence background was also removed [22,23]. Savitzky-Golay filters are widely used for smoothing data streams and reducing noise. Their most important feature is that they preserve the shape and width of the signal while filtering out noise. A fifth-order polynomial fit was used to estimate the fluorescence background, which was then subtracted from the original spectrum. To compare changes in spectral shape and relative peak intensity across the different kidney stone samples, all Raman spectra were normalized.
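The preprocessing chain described above (smoothing, polynomial background subtraction, normalization) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The window length, the smoothing polynomial order, the single-pass background fit and the normalization to the strongest band are all assumptions, since the text specifies only the fifth-order background polynomial. The function names and the synthetic spectrum are hypothetical.

```python
import numpy as np

def savgol_smooth(intensity, window=11, polyorder=3):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a sliding window.

    `window` must be odd; both defaults are assumed values.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse holds the weights that evaluate the
    # fitted polynomial at the window centre (offset 0).
    weights = np.linalg.pinv(vander)[0]
    padded = np.pad(intensity, half, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")

def subtract_background(shift, intensity, order=5):
    """Fit a polynomial of the given order and subtract it from the spectrum."""
    # Rescale the axis to [-1, 1] so the high-order fit stays well conditioned.
    x = 2.0 * (shift - shift.min()) / (shift.max() - shift.min()) - 1.0
    coeffs = np.polyfit(x, intensity, order)
    return intensity - np.polyval(coeffs, x)

def normalize(intensity):
    """Assumed scheme: scale so the largest absolute intensity equals 1."""
    return intensity / np.max(np.abs(intensity))

def preprocess(shift, intensity):
    smoothed = savgol_smooth(intensity)
    corrected = subtract_background(shift, smoothed)
    return normalize(corrected)

# Synthetic spectrum (hypothetical): one band at 1000 cm^-1 on a sloping background.
rng = np.random.default_rng(0)
shift = np.linspace(200.0, 1700.0, 1501)
band = 10.0 * np.exp(-0.5 * ((shift - 1000.0) / 8.0) ** 2)
background = 5.0 + 2e-3 * (shift - 200.0)
raw = band + background + rng.normal(0.0, 0.1, shift.size)
clean = preprocess(shift, raw)
```

A single least-squares fit also pulls the baseline slightly toward strong bands. Iterative variants that exclude peak regions are common, but the text does not say whether one was used.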

Classification and quantification
Principal component analysis (PCA) is a statistical method that reduces the dimensions of high-dimensional data to simplify complex data sets. A dependent-sample t test was conducted to select the most diagnostically significant principal components (PCs) (P < 0.05).
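As an illustration of the dimension reduction step, the sketch below computes PCA through a singular value decomposition of the mean-centered spectra and reports the fraction of variance carried by each component. It is a generic sketch under assumptions, not the authors' code. The function name and the random test data are hypothetical, and the t-test-based selection of components is not reproduced.

```python
import numpy as np

def pca(spectra, n_components):
    """Project spectra (rows = samples, columns = wavenumbers) onto the leading PCs."""
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_ratio = singular_values**2 / np.sum(singular_values**2)
    components = vt[:n_components]
    scores = centered @ components.T
    return scores, components, explained_ratio[:n_components]

# Hypothetical data: 12 spectra of 50 points built from two patterns plus weak noise.
rng = np.random.default_rng(1)
patterns = rng.normal(size=(2, 50))
weights = rng.normal(size=(12, 2))
spectra = weights @ patterns + 0.01 * rng.normal(size=(12, 50))
scores, components, ratio = pca(spectra, n_components=2)
```

The cumulative sum of `ratio` is the quantity usually reported as the share of the original information retained by the first few components.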
Two classifiers were adopted for identification analysis: KNN and SVM. Both are well studied for classification problems and have been used in biophotonics, pattern recognition and classification for many years. The KNN algorithm is a mature and basic classification and regression method proposed by Cover and Hart in 1967 [24]. For a new instance, the KNN algorithm predicts its category by majority vote over the categories of its k nearest neighbors in the training set. The algorithm is as follows.
Input: the training set T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x_i is the feature vector of an instance and y_i is the class of the instance, i = 1, 2, …, N; the instance feature vector x.
Output: the class y to which instance x belongs.
1. According to the given distance metric, find the k points in the training set T that are nearest to x; the neighborhood of x covering these k points is denoted N_k(x).
2. Determine the class y of x according to the classification decision rule (such as majority vote) in N_k(x).
The standard SVM was proposed by Cortes and Vapnik [25]. It is a supervised learning algorithm that selects its optimal parameters by minimizing structural risk. Through a nonlinear kernel function, the support vector machine maps the input vectors nonlinearly into a high-dimensional feature space, where the samples become linearly separable and can be classified effectively.
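The two KNN steps described above (find the k nearest training instances, then take a majority vote) can be sketched in a few lines of standard-library Python. The Euclidean distance, the value k = 3 and the toy two-dimensional feature vectors are assumptions made for illustration; the text does not state the metric or the k used in the study.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Return the majority class among the k training instances nearest to x."""
    # Step 1: indices of the k nearest training points (Euclidean distance assumed).
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], x))[:k]
    # Step 2: majority vote over the classes in that neighborhood.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical PC scores for two stone types.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["oxalate", "oxalate", "oxalate", "phosphate", "phosphate", "phosphate"]

label_a = knn_predict(train_x, train_y, (0.15, 0.10))
label_b = knn_predict(train_x, train_y, (0.95, 1.00))
```

In a PCA-KNN pipeline the feature vectors passed to `knn_predict` would be the retained principal component scores rather than the full spectra.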
The following kernel functions are used in this study:

Spectral preprocessing
The data preprocessing substantially improved the quality of the Raman spectra. The spectra were smoother, and the Raman peaks of the different kidney stones could be distinguished.
Fig. 1. Typical Raman spectra of kidney stones after preprocessing.
Figure 1 shows typical Raman spectra for the four categories of kidney stones. The spectral intensities of the samples differed from each other. Specific assignments of individual peaks can be found in Table 2. The Raman spectrum of L-cystine stones, shown in Fig. 1(a), has a strong S-S stretching peak at 499 cm⁻¹. There are some faint Raman bands at 614, 679, 1341, 1387 and 1408 cm⁻¹. Figure 1(b) shows that the Raman spectra of uric acid stones have four main characteristic peaks at 626, 997, 1039 and 1405 cm⁻¹. There are two more peaks at 1501 and 1648 cm⁻¹. Figure 1(c) shows Raman spectra of phosphate stones, which consist mostly of phosphate. In calcium phosphate stones, the spectrum mainly shows the HPO₄²⁻ and PO₄³⁻ bands in the spectral range of 900-1000 cm⁻¹. The intense band in the range of 1000-1100 cm⁻¹ belongs to the bending vibration of the PO₄³⁻ group. Figure 1

Principal component analysis
PCA was performed to reduce the number of variables in the analysis. The data in Table 3 show that the first four principal components of the Raman spectra of the kidney stone samples retained 94.94% of the original information. A substantial amount of the original information is compressed into principal components 1, 2 and 3, which together account for 89.19% of the original information. Figure 3 shows the PC1, PC2 and PC3 scores. The different types of kidney stones are well distinguished: the scores of each type fall within their own region without overlapping the others. In addition, the dispersion of points within each region is relatively small, which further shows that Raman spectroscopy combined with principal component analysis can distinguish kidney stone types. The spectra also contain redundant data and noise, which limit the efficiency of the KNN and SVM techniques. It is therefore important to reduce the dimensions of the spectral data using PCA, both to simplify the implementation of the SVM algorithm and to improve its performance. PCA can thus capture the changes among different types of stones, reduce the dimensions of the data for subsequent machine learning, and reduce computation.

Classification
An accurate discrimination algorithm is required to make proper use of all the information contained in Raman spectra for classification. KNN and SVM models were developed to classify the Raman spectra of kidney stones. The classification models were evaluated using 5-fold cross validation. This approach divides the whole data set into 5 subsets. Each subset is used in turn as the test set while the remaining four serve as the training set, so the process is repeated 5 times until all the samples have been predicted. This method makes full use of all the samples. The classification results are given in Table 4. These are considered high-quality results for the classification of kidney stones. The performance of a model is usually evaluated based on accuracy, sensitivity and specificity. We further assessed the diagnostic models developed with the KNN and SVM algorithms using receiver operating characteristic (ROC) curves for all classification algorithms (Fig. 4). The ROC curve is a graph that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The integrated area under the ROC curve (AUC) is a quantitative indicator of classifier performance. Sensitivity is the ability to correctly identify patients with the disease, and specificity is the ability to correctly identify patients without the disease (Table 5). A larger AUC value means that the classifier has higher prediction accuracy. These results confirm that the polynomial kernel SVM algorithm produces better diagnostic accuracy than the other algorithms.
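The 5-fold procedure and the sensitivity and specificity definitions above can be made concrete with a small standard-library sketch. The shuffling seed, the function names and the toy labels are hypothetical. For a four-class problem, sensitivity and specificity are computed here one class against the rest, which is an assumption because the text does not state how the per-class values were combined.

```python
import random

def kfold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices); every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# 135 samples split into 5 folds of 27, each used once as the test set.
splits = list(kfold_splits(135))

# Toy labels (hypothetical) to show the metric calculation.
y_true = ["cystine", "cystine", "purine", "purine", "oxalate", "oxalate"]
y_pred = ["cystine", "purine", "purine", "purine", "oxalate", "oxalate"]
sens, spec = sensitivity_specificity(y_true, y_pred, "purine")
```

In the toy example both purine samples are recovered, so the sensitivity is 1.0. One of the four non-purine samples is mislabeled as purine, so the specificity is 0.75.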

Conclusions
This study demonstrates that Raman spectroscopy combined with machine learning techniques can classify spectral data obtained from different kidney stones. The combination of Raman spectroscopy and statistical tools has great potential for the effective diagnosis and study of kidney stones. This is the first report classifying kidney stones using Raman spectroscopy combined with machine learning methods.
It is feasible to classify kidney stones optically. We used Raman spectroscopy combined with PCA to investigate the identification of kidney stones. Our findings indicate that kidney stone samples can be satisfactorily discriminated. PCA considerably reduces the computational complexity without sacrificing the performance of the algorithms. The experimental results show that the proposed classification algorithms are effective and that KNN and SVM achieved high classification accuracy. The approach therefore has the potential to provide an effective and accurate means of diagnosing kidney stone types. Future research should examine more carefully how different optical methods perform in classifying kidney stones. Further work is also required to address the complexities of classifying multi-component kidney stones. Our laboratory is currently using different optical imaging methods to diagnose kidney stones accurately. We hope this work will help establish the clinical advantage of Raman spectroscopy in the diagnosis of kidney stone types.