Exploratory Study on Screening Chronic Renal Failure Based on Fourier Transform Infrared Spectroscopy and a Support Vector Machine Algorithm

Chronic renal failure (CRF) is a serious clinical kidney disease. If the patient is not treated in a timely manner, CRF may progress to uremia. However, current diagnostic methods, such as routine blood examinations and medical imaging, have low sensitivity. Therefore, it is important to explore new and effective diagnostic methods for CRF, such as serum spectroscopy. This study proposes a cost-effective and reliable method for detecting CRF based on Fourier transform infrared (FT-IR) spectroscopy and a support vector machine (SVM) algorithm. We measured and analyzed the FT-IR spectra of serum from 44 patients with CRF and 54 individuals with normal renal function. The partial least squares (PLS) algorithm was applied to reduce the dimensionality of the high-dimensional spectral data. The samples were divided into training and test sets by the Kennard–Stone (KS) algorithm and then input into the SVM. Among the models compared, the SVM optimized by a grid search (GS) algorithm performed best. The sensitivity of our diagnostic model was 93.75%, the specificity was 100%, and the accuracy was 96.97%. The results demonstrate that FT-IR spectroscopy combined with a pattern recognition algorithm has great potential for screening patients with CRF.


Introduction
Chronic renal failure (CRF) refers to chronic, persistent renal impairment of various causes that results in renal sclerosis and nephron loss; it is accompanied by many complications, such as cardiovascular disease [1–4]. If the patient is not treated promptly, CRF may develop into uremia, in which the patient can no longer maintain basic physiological functions, leading to severe systemic involvement [5,6]. Current routine examinations for the diagnosis of CRF include blood tests, urine tests, and tests of glomerular filtration function. Glomerular filtration function refers to the ability of the kidneys to remove metabolites, toxins, and excess water from the body. The main methods for examining glomerular filtration function are measurement of the serum creatinine (Scr) concentration and creatinine clearance (Ccr), as well as radionuclide measurement of the glomerular filtration rate (GFR) [7,8]. However, Scr and Ccr values can vary considerably between individuals, which may make it difficult for clinicians to reach a correct diagnosis. According to research by Sanchez, the sensitivity of Scr is only 46% when a GFR of 90 mL/min is used as the cutoff value [9]. CRF can also be diagnosed by medical imaging such as ultrasonography, but this relies mainly on the doctor's expertise and subjective experience. Therefore, it is important to find a rapid, objective, and accurate method for diagnosing CRF.
In recent years, infrared spectroscopy combined with pattern recognition algorithms has provided a new approach to the early screening of many diseases [10]. Fourier transform infrared (FT-IR) spectroscopy is a cost-effective, noninvasive technique that measures infrared absorption arising from transitions between vibrational and rotational energy levels [11–13]. FT-IR spectroscopy can detect differences in serum components such as glucose, protein, and cholesterol [14,15]. Over the past few years, FT-IR spectroscopy has proved to be of great value in clinical applications. It requires only minimal sample preparation [16], the measurement process is largely automated, and operators need little professional training. FT-IR spectroscopy has been used to diagnose many diseases, for example, oral cancer [17], dengue fever [18], leukemia [19], and human immunodeficiency virus (HIV) infection [20]. However, to date there has been no study of the diagnosis of CRF by FT-IR spectroscopy.
Since its introduction, the support vector machine (SVM) algorithm has attracted widespread attention in the field of machine learning because of its complete theoretical foundation and the good results it has achieved in practical applications [21,22]. Through the kernel function, an SVM implicitly maps the data into a high-dimensional feature space in which a linearly separable solution can be found, without computing coordinates in that space explicitly, thereby avoiding the complexity of high-dimensional computation [23].
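To make the kernel idea concrete, the following minimal sketch (our own illustration; the study itself used an RBF kernel) checks that the degree-2 polynomial kernel k(x, z) = (x·z)², evaluated in the original 2-D space, equals an ordinary dot product after an explicit quadratic feature map φ into 3-D space. An SVM only ever needs such kernel values, so it never has to construct φ. The helper names `phi`, `dot`, and `poly2_kernel` are hypothetical.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def dot(a, b):
    """Plain Euclidean dot product."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2, computed without leaving 2-D space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))    # (1*3 + 2*(-1))**2 = 1.0
print(dot(phi(x), phi(z)))   # same value (up to rounding) via the 3-D feature space
```

For the RBF kernel the implicit feature space is infinite-dimensional, which is exactly why the kernel form, rather than an explicit map, is used in practice.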
Based on the above considerations, this study selected 44 patients with CRF and 54 individuals with normal renal function and measured the FT-IR spectra of their serum. The partial least squares (PLS) algorithm was applied to reduce the dimensionality of the FT-IR spectra. PLS projects the spectral data onto the directions of maximum covariance with the class information and decomposes the original infrared spectra into multiple principal component spectra. The principal components of the infrared spectra represent the contributions of different constituents and factors to the spectrum.
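As a worked reference, the textbook PLS1 formulation (assumed here; the paper does not give its equations) makes the "maximum covariance direction" precise. With X the mean-centered n × p matrix of spectra and y the mean-centered vector of class labels, the first weight vector, score vector, loading vector, and deflation step are:

```latex
\mathbf{w}_1 = \arg\max_{\|\mathbf{w}\| = 1} \operatorname{cov}(\mathbf{X}\mathbf{w}, \mathbf{y})
            = \frac{\mathbf{X}^{\top}\mathbf{y}}{\|\mathbf{X}^{\top}\mathbf{y}\|},
\qquad
\mathbf{t}_1 = \mathbf{X}\mathbf{w}_1,
\qquad
\mathbf{p}_1 = \frac{\mathbf{X}^{\top}\mathbf{t}_1}{\mathbf{t}_1^{\top}\mathbf{t}_1},
\qquad
\mathbf{X}_2 = \mathbf{X} - \mathbf{t}_1\mathbf{p}_1^{\top}
```

Repeating the same step on the deflated matrix X₂ yields the second component, and so on; the score vectors t₁, t₂, … are the low-dimensional features passed on to the classifier.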
Through appropriate selection, the components representing interference are removed, and the useful principal components are retained [24]. Then, we used the Kennard–Stone (KS) algorithm to divide the 98 samples into a training set and a test set. The KS algorithm treats all samples as candidates for the training set and selects them one at a time. First, the two samples separated by the largest Euclidean distance are placed in the training set. In each subsequent iteration, the candidate sample whose distance to its nearest already-selected sample is largest is added, and this continues until the training set reaches the required number of samples.
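The KS selection procedure described above can be sketched in NumPy as follows. This is a minimal illustration, assuming the samples are rows of a feature matrix (for example, the PLS scores) and that the training-set size is given; the function name `kennard_stone` is hypothetical, and the authors' MATLAB implementation is not reproduced here.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return indices of n_train samples chosen by the Kennard-Stone algorithm.

    X: (n_samples, n_features) array, e.g. PLS scores of the spectra.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distance matrix.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Step 1: start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    # Step 2: repeatedly add the candidate whose distance to its nearest
    # already-selected sample is largest (max-min criterion).
    while len(selected) < n_train:
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        best = remaining[int(np.argmax(min_dist))]
        selected.append(best)
        remaining.remove(best)
    return selected

X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
train_idx = kennard_stone(X, 3)                      # [0, 4, 3]
test_idx = [k for k in range(len(X)) if k not in train_idx]
```

The samples not chosen form the test set. Because selection is driven purely by distances in feature space, the training set covers the extremes of the data, which is the "even distribution" property attributed to KS in the text.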
This method ensures that the samples in the training set are evenly distributed according to their spatial distances [25]. Next, an SVM was selected as the classifier to build the diagnostic model, and it was optimized by the particle swarm optimization (PSO) algorithm and the grid search (GS) algorithm. PSO is a swarm-intelligence optimization algorithm from the field of computational intelligence. It is inspired by the migration and aggregation of foraging bird flocks and reaches its goal through cooperation and competition among the individuals in the swarm [26,27]. The GS algorithm divides the space of the parameters to be optimized into a grid with a fixed step size and evaluates every point in the grid to find the optimal parameters [28]. Additionally, to assess the performance of the SVM, we compared its accuracy with those of discriminant analysis (DA) [29], the extreme learning machine (ELM) algorithm [30], and the learning vector quantization (LVQ) algorithm [31]. To the best of our knowledge, this is the first study to explore the feasibility of detecting CRF from the FT-IR spectra of serum.
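A minimal global-best PSO, sketched below under stated assumptions, shows how the swarm searches a bounded parameter box. The defaults c1 = 1.5, c2 = 1.7, maxgen = 200, and sizepop = 20 mirror the settings reported for PSO-SVM in the Diagnostic Model section; the inertia weight `w = 0.7`, the velocity limit, and the function name `pso_minimize` are our own assumptions, not taken from the paper.

```python
import numpy as np

def pso_minimize(f, lower, upper, sizepop=20, maxgen=200, c1=1.5, c2=1.7,
                 w=0.7, seed=0):
    """Global-best particle swarm optimization of f over the box [lower, upper].

    c1 weights the pull toward each particle's own best position (local search),
    c2 the pull toward the swarm's best position (global search).
    The inertia weight w and the velocity limit are assumed values.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    vmax = 0.2 * (upper - lower)                 # assumed velocity clamp
    pos = rng.uniform(lower, upper, size=(sizepop, dim))
    vel = np.zeros((sizepop, dim))
    pbest = pos.copy()                           # each particle's best position
    pbest_val = np.array([f(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(maxgen):
        r1 = rng.random((sizepop, dim))
        r2 = rng.random((sizepop, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -vmax, vmax)
        pos = np.clip(pos + vel, lower, upper)   # keep particles inside the box
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val
```

For PSO-SVM, `f` would return the cross-validation error of an RBF-SVM at the particle position (C, g) inside [10⁻², 10²] × [10⁻², 10³]; that objective (e.g. a hypothetical `cv_error(C, g)`) is left to the reader's SVM library. On a simple test such as `pso_minimize(lambda p: ((p - [1, 2]) ** 2).sum(), [-5, -5], [5, 5])`, the returned position should lie close to (1, 2).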

Materials.
The fresh blood samples used in the experiment were provided by a hospital in Xinjiang. A GFR of 90 mL/min was used as the cutoff value and served as the criterion for including research subjects. Pregnant or lactating women and patients with nephritis due to systemic autoimmune diseases were excluded. A total of 98 fresh blood samples were selected, from 44 patients with CRF and 54 individuals with normal renal function. Informed consent was obtained from all participants, and the study did not involve any private patient information. First, 5 ml of fresh blood was collected from each subject. The blood was then centrifuged at 4°C with a relative centrifugal force of 4000 × g, and the clear upper layer was collected as serum. Finally, the serum was transferred to centrifuge tubes and stored in a freezer at −70°C.

Measurement of FT-IR Spectra.
The serum samples were first removed from the freezer and thawed at room temperature, and 5 μl of each serum sample was spread on a zinc selenide crystal. After drying at room temperature for approximately 15 minutes, the sample was measured with a VERTEX 70 FT-IR spectrometer (Bruker Corporation). The air background was measured with OPUS 6.5 software before each FT-IR measurement. The resolution was 8 cm⁻¹, the number of scans was 32, and the scanning range was 700–4000 cm⁻¹. CO₂ compensation was selected as the atmospheric compensation parameter. Each sample was scanned 5 times, and the spectra were averaged for further analysis. The spectrum acquisition procedure is shown in Figure S1 in the Supplementary Files.

Data Processing.
To reduce intensity variation between spectra, each FT-IR spectrum was normalized individually to the range [0, 1]. After normalization, each spectrum contained 855 intensity variables spanning 700–4000 cm⁻¹. The PLS algorithm was used for dimensionality reduction. Then, the 98 samples were divided into a training set and a test set by the KS algorithm. An SVM was used to classify CRF patients and individuals with normal renal function. The radial basis function (RBF) was selected as the kernel function of the SVM, and the SVM was optimized by the PSO and GS algorithms to find the best parameters. Finally, we compared the accuracy of PSO-SVM, GS-SVM, DA, ELM, and LVQ. All analyses were performed in MATLAB 2016a.
Journal of Spectroscopy
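The per-spectrum normalization can be sketched as min-max scaling applied row by row. The text only states that each spectrum was mapped to [0, 1]; min-max scaling is assumed here as the usual way to do that, and `minmax_normalize` is a hypothetical name.

```python
import numpy as np

def minmax_normalize(spectra):
    """Scale each spectrum (row) independently to the range [0, 1].

    spectra: (n_samples, n_wavenumbers) array, e.g. 98 x 855 absorbances.
    """
    spectra = np.asarray(spectra, dtype=float)
    lo = spectra.min(axis=1, keepdims=True)       # per-spectrum minimum
    hi = spectra.max(axis=1, keepdims=True)       # per-spectrum maximum
    span = np.where(hi > lo, hi - lo, 1.0)        # guard against a flat spectrum
    return (spectra - lo) / span
```

Scaling each row separately removes differences in overall intensity (for example, from sample thickness on the crystal) while preserving the relative band shapes that the classifier relies on.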

Model Evaluation.
Sensitivity, specificity, and accuracy were used to assess the classifier models [32]. These parameters are defined as follows:
$$\text{Sensitivity} = \frac{TP}{TP + FN},$$
$$\text{Specificity} = \frac{TN}{TN + FP},$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where TP, FP, FN, and TN represent the numbers of true-positive, false-positive, false-negative, and true-negative samples, respectively; positive indicates CRF patients, and negative indicates individuals with normal renal function.
Figure 1 shows the normalized mean FT-IR spectra of the CRF patients and the normal control group in the range of 700–4000 cm⁻¹, together with the difference between the two groups. As shown in Figure 1, the FT-IR spectra of both groups have characteristic peaks at 1080, 1243, 1314, 1400, 1453, 1543, 1650, 2931, and 3289 cm⁻¹. The FT-IR spectral intensities of CRF patients are higher than those of individuals with normal renal function near 1080, 1243, 1314, 1400, 1453, 1543, 1650, and 2931 cm⁻¹ and lower near 3289 cm⁻¹. The difference between the two groups is most pronounced near 1543 and 1650 cm⁻¹. The difference spectrum at the bottom of Figure 1 is the basis for distinguishing the FT-IR spectra of CRF patients from those of the control group. To better distinguish the two groups, we used different algorithms to build diagnostic models. All spectra are shown in Figure S2 in the Supplementary Files.

Feature Extraction.
Entering all spectral data directly into the classifier as modeling variables is impractical because it would make the calculations overly complex [33]. Therefore, the high-dimensional spectral data must be compressed. In this study, the PLS algorithm was chosen to compress the spectra (multidimensional data) into a lower-dimensional space. The optimal number of components was determined by 10-fold cross-validation (CV): the prediction error sum of squares (PRESS) was computed for each number of principal components, and the number with a small PRESS value was selected as optimal. As Figure S3 shows, the optimal number of principal components is 6.
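The component-selection step can be sketched as follows. This is a minimal illustration, assuming PLS1 regression of a numerically coded class label on the normalized spectra (the paper does not state how labels were coded), NIPALS for the PLS fit, and randomly assigned folds; `pls1_fit` and `press_by_components` are hypothetical names.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1. Returns (x_mean, y_mean, B) with y_hat = (X - x_mean) @ B + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk                     # direction of maximum covariance with y
        norm = np.linalg.norm(w)
        if norm == 0:                     # y already fully explained
            break
        w /= norm
        t = Xk @ w                        # scores
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        qk = yk @ t / tt                  # y loading
        Xk = Xk - np.outer(t, p)          # deflate X and y
        yk = yk - qk * t
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients in X space
    return x_mean, y_mean, B

def press_by_components(X, y, max_comp, n_folds=10, seed=0):
    """PRESS from k-fold CV for 1..max_comp components (entry a-1 is for a components)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    press = np.zeros(max_comp)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        for a in range(1, max_comp + 1):
            xm, ym, B = pls1_fit(X[train_idx], y[train_idx], a)
            resid = y[test_idx] - ((X[test_idx] - xm) @ B + ym)
            press[a - 1] += resid @ resid
    return press
```

One simple rule, assumed here, is `n_opt = int(np.argmin(press)) + 1`; the study reports choosing a number of components with a small PRESS value, which led to 6 components.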
Figure 2 shows a three-dimensional plot of principal components 2, 3, and 5. It reveals a certain degree of separation between CRF patients and control subjects. To distinguish the two groups more clearly, an effective classifier is needed in the subsequent analysis.

Diagnostic Model.
In this study, the 98 samples were divided into training and test sets by the KS algorithm, as shown in Table 1. Then, the SVM, DA, ELM, and LVQ were used to classify CRF patients and members of the control group. In the SVM model, the choice of kernel function has a great impact on classification ability. Common kernel functions for SVMs include linear, polynomial, RBF, and sigmoid kernels [34]. Because the RBF has few parameters and can classify multidimensional data, we chose it as the kernel function of our model [35]. In an SVM with the RBF kernel, the choice of the penalty parameter C and the kernel parameter g is very important for classification accuracy: C controls the penalty associated with misclassification, and g determines the width of the RBF, which affects the risk of overfitting [36]. The PSO and GS algorithms were used to optimize C and g, and 10-fold CV was used when training the SVM to reduce overfitting. In the PSO-SVM model, the PSO parameters were set as follows: the local search coefficient was 1.5, the global search coefficient was 1.7, the maximum number of generations (maxgen) was 200, the population size (sizepop) was 20, and the ranges of C and g were [10⁻², 10²] and [10⁻², 10³], respectively. For the GS algorithm, the ranges of C and g were both [2⁻⁴, 2⁴], and both search steps were set to 0.1. Table 2 shows the results for the five models. The sensitivity of GS-SVM is 93.75%, the specificity is 100%, and the accuracy is 96.97%.
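A minimal grid-search sketch for (C, g) is shown below. It assumes, as in common LIBSVM-based MATLAB scripts, that the range [2⁻⁴, 2⁴] and the step 0.1 refer to log₂ exponents, so the grid has 81 × 81 points; that reading is our assumption. The `score` argument is a placeholder: in the study it would be the 10-fold CV accuracy of an SVM with the RBF kernel K(x, x′) = exp(−g‖x − x′‖²) (the LIBSVM form), which is not implemented here. `grid_search` is a hypothetical name.

```python
import numpy as np

def grid_search(score, log2_min=-4.0, log2_max=4.0, step=0.1):
    """Evaluate score(C, g) at every point of a log2-spaced grid; return (best, C, g).

    C = 2**a and g = 2**b for a, b in [log2_min, log2_max] with the given step.
    score should return a quantity to maximize, e.g. 10-fold CV accuracy.
    """
    n = int(round((log2_max - log2_min) / step)) + 1
    exps = np.linspace(log2_min, log2_max, n)      # exact endpoints, no drift
    best = (-np.inf, None, None)
    for a in exps:
        for b in exps:
            C, g = 2.0 ** a, 2.0 ** b
            s = score(C, g)
            # Strict '>' keeps the first maximum found, i.e. among ties the
            # smallest C (and then the smallest g) wins.
            if s > best[0]:
                best = (s, C, g)
    return best
```

Exhaustive search is easy to reproduce and cannot get trapped in a local optimum on the grid, at the cost of evaluating every point; PSO trades that guarantee for fewer evaluations.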
To further evaluate the reliability of our model, we plotted the receiver operating characteristic (ROC) curves shown in Figure 3. Table 3 lists the area under the ROC curve (AUC) for the five models. The AUC of GS-SVM reached 0.969, indicating that this diagnostic model has high accuracy.
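The evaluation metrics can be checked with a short sketch. The sensitivity, specificity, and accuracy functions follow the definitions in the Model Evaluation section; the AUC is computed with the pairwise (Mann–Whitney) formulation, which equals the area under the empirical ROC curve, although the paper does not state how its AUC was computed. The confusion counts in the usage example (TP = 15, FN = 1, TN = 17, FP = 0) are hypothetical: they are one set of counts consistent with the reported 93.75%, 100%, and 96.97%, since Table 1 is not reproduced here.

```python
def sens_spec_acc(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2.

    labels: 1 for CRF (positive), 0 for normal renal function (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical counts reproducing the reported GS-SVM percentages.
for name, v in zip(("sensitivity", "specificity", "accuracy"), sens_spec_acc(15, 1, 17, 0)):
    print(f"{name}: {v:.2%}")
# sensitivity: 93.75%
# specificity: 100.00%
# accuracy: 96.97%
```

The pairwise AUC is quadratic in the number of samples, which is negligible for a test set of a few dozen spectra.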

Discussion
In this study, we explored for the first time the application of FT-IR spectroscopy combined with pattern recognition algorithms to the diagnosis of CRF, which could help prevent CRF from deteriorating into uremia. By analyzing the average FT-IR spectra of CRF patients and individuals with normal renal function, we found specific differences between the two groups of spectra. Based on these differences, serum FT-IR spectra combined with the SVM algorithm can be used to diagnose CRF.
Because of the metabolic abnormalities in CRF patients, the contents of proteins, lipids, and other biomolecules in serum change, which may alter the shape of the FT-IR spectra. Table 4 lists the main IR absorption bands and their assignments [15–17, 37]. The peaks at 1080 and 1243 cm⁻¹ correspond to the symmetric and asymmetric stretching vibrations, respectively, of P═O in nucleic acids [17]; the band at 1314 cm⁻¹ corresponds to amide III [16]; the bands at 1400 and 1453 cm⁻¹ are due to the stretching vibration of CH₃ and the asymmetric bending vibration of CH₂ in lipids, respectively [37]; proteins contribute the amide II peak at 1540 cm⁻¹ (N–H bending) and the amide I peak at 1650 cm⁻¹ (C═O stretching) [15]; the peak at 2931 cm⁻¹ is due to the CH₂ asymmetric stretching vibration of lipids [17]; and the peak at 3289 cm⁻¹ corresponds to the O–H stretching vibration of water [17].
Compared with those of individuals with normal renal function, the peaks of nucleic acids (1080 and 1243 cm⁻¹), lipids (1400, 1453, and 2931 cm⁻¹), and proteins (1314, 1540, and 1650 cm⁻¹) were stronger in the FT-IR spectra of CRF patients. Lipid metabolism disorders have been reported to be common in CRF patients and to become more pronounced as renal function deteriorates [38]. Patients with CRF may have higher lipid levels than the normal population because impaired renal excretion causes lipids to accumulate [39]. Disordered lipid metabolism in the serum of CRF patients aggravates kidney disease, severely damages kidney function, and harms the cardiovascular system, increasing the incidence of cardiovascular disease [40]. In addition, the FT-IR spectral intensities of CRF patients were significantly higher near 1543 and 1650 cm⁻¹ than those of individuals with normal renal function. We can therefore preliminarily conclude that the amide II and amide I bands of proteins differ between CRF patients and individuals with normal renal function. This study is exploratory; because serum is complex, further research is urgently needed to understand the possible changes in the serum composition of CRF patients.

Conclusion
In this paper, we analyzed and compared the serum FT-IR spectra of 44 patients with CRF and 54 individuals with normal renal function. The PLS algorithm was used to compress the high-dimensional spectral data, and the KS algorithm was used to divide the samples, which improves the representativeness of the training set. We selected an SVM as the classification algorithm to build the diagnostic model. Through analysis and comparison, GS-SVM was found to have the best performance. The sensitivity of the GS-SVM diagnostic model was 93.75%, the specificity was 100%, and the accuracy was 96.97%, indicating that the GS-SVM model proposed by our research group is reliable. Because this was an exploratory study, the number of samples collected was limited. In the next step, we will increase the sample size to validate these findings and further evaluate FT-IR spectroscopy for diagnosing CRF, with the aim of providing an efficient, low-cost, noninvasive, and reliable diagnostic method.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Figure S1. The flowchart of the measurement of FT-IR spectra.
Figure S2. Original spectra of the 98 samples.
Figure S3.