Dual-model analysis for improving the discrimination performance of human and nonhuman blood based on Raman spectroscopy

The discrimination accuracy for human and nonhuman blood is important for customs inspection and forensic applications. Recently, Raman spectroscopy has shown effectiveness in analyzing blood droplets and stains with an excitation wavelength of 785 nm. However, the discrimination of liquid whole blood in a vacuum blood tube using Raman spectroscopy, which is a form of noncontact and nondestructive detection, has not been achieved. An excitation wavelength of 532 nm was chosen to avoid the fluorescence background of the blood tube, at the cost of reduced spectroscopic information and discrimination accuracy. To improve the accuracy and true positive rate (TPR) for human blood, a dual-model analysis method is proposed. First, model 1 was used to discriminate human-unlike nonhuman blood. Model 2 was then used to discriminate human-like nonhuman blood from the “human blood” obtained by model 1. A total of 332 Raman spectra from 10 species were used to build and validate the model. A blind test and external validation demonstrated the effectiveness of the model. Compared with the results obtained by the single partial least-squares model, the discrimination performance was improved. The total accuracy and TPR, which are highly important for practical applications, increased to 98.1% and 99.1% from 87.9% and 87.2%, respectively.


Introduction
Blood carries genetic information. Moreover, blood from people of different nationalities carries different genetic information. Using the genetic information of whole blood, biological weapons against a particular nationality may be developed by terrorists. On the other hand, genetic information from blood can help identify a victim or suspect and solve a criminal case. To prevent genetic terrorist activities and combat crime, correctly discriminating human blood is important for customs inspection departments and forensic investigations [1][2][3][4]. Many methods, such as DNA tests [5, 6], latex agglutination [7], high-performance liquid chromatography [8][9][10], and mass spectrometry [11], have been used to discriminate human and nonhuman blood and have exhibited effectiveness. However, these methods require reagents and are time-consuming and destructive to samples. Moreover, they are dangerous for practitioners because of the pretreatment of the blood samples.
Spectral analysis technology combined with chemometric algorithms has recently attracted attention for the identification of the species of origin of blood samples. Attenuated total reflection Fourier transform infrared spectroscopy (ATR FT-IR) was first shown to be effective in body-fluid examination by Elkins [12]. Subsequently, the effectiveness of species differentiation according to dry traces of blood using ATR FT-IR was demonstrated by Lednev [13]. Dry blood samples, which are typically found at a crime scene, were used to demonstrate the method. Blood from humans, cats, and dogs was analyzed, and 100% accuracy was achieved. Near-infrared transmission spectroscopy was concurrently demonstrated by Zhang [1] and Lin [14] to be effective for discriminating human and nonhuman blood. In 2014, a method combining visible reflectance spectroscopy with partial least-squares discriminant analysis (PLS-DA) was demonstrated by Lin to be a powerful tool for species differentiation [15]. Moreover, Raman spectroscopy was reported as an effective technology for discriminating human and nonhuman blood [16][17][18][19][20][21][22]. Given its advantages, such as being nondestructive, not requiring sample preparation, and having high accuracy for the identification of organic, inorganic, and biological species, Raman spectroscopy has shown greater potential for species identification applications than other spectral methods. Because the compositions of human and nonhuman blood are similar, the Raman peaks for both types of blood samples are essentially the same. Chemometric algorithms, including principal component analysis [18], PLS-DA [19,22], support vector machines [23], and self-reference algorithms [18], are therefore used to identify the species of origin. All the aforementioned methods were proven to be effective for blood samples that had to be transferred by practitioners from the vacuum tube onto an aluminum-coated slide.
However, this sampling process may expose practitioners to infection risks. Thus, considering the safety of the practitioners, we explored a discrimination method involving measurement of blood samples in their original containers (vacuum blood tubes), an approach that has been considered in diffuse reflectance spectroscopy [24]. Thus far, the discrimination of human and nonhuman blood with the samples in the vacuum blood tubes via Raman spectroscopy has not been reported by other groups. This is possibly because the fluorescence of the vacuum blood tube is so strong that it obscures the Raman spectra of the blood at an excitation wavelength of 785 nm, which has been demonstrated to be the most suitable excitation wavelength for blood samples because it minimizes the fluorescence of blood. To avoid the influence of the fluorescence background of the vacuum blood tube, an excitation wavelength of 532 nm is used, and the Raman spectra of the blood in the vacuum blood tube are obtained. Analysis of the Raman spectra reveals that the identification accuracy of the PLS model is lower than that reported in Ref. [22] (in which the excitation wavelength is 785 nm) owing to the small difference in the Raman spectra of nonhuman blood from different species.
To solve this problem, in this study, we proposed a dual-model analysis method, in which the first model was used to discriminate human and nonhuman blood for all blood samples, and the second model was used to discriminate human and nonhuman blood using the "human blood" determined by the first model. The nonhuman blood samples differed between the training data sets used to build the two models. For the first model, the nonhuman-blood training data were those whose Raman spectra were not similar to those of human blood. As a result, the discrimination of human blood or human-like nonhuman blood and nonhuman blood was achieved. The human blood or human-like nonhuman blood determined by the first model was then processed using the second model. The training data set for the second model comprised human blood and human-like nonhuman blood, while that for the first model comprised the same human blood and human-unlike nonhuman blood. Compared with the results obtained by the traditional PLS model, the discrimination accuracy and especially the true positive rate (TPR) of human blood are improved.

System
The Raman spectroscopy system used in this work was similar to that in our previous work [25]. The optical system comprised three parts: a light source, a microscope, and a spectrograph. The wavelength of the light source (SLM G-100, Diode Tek) was 532 nm. A CX40M (Sunny Optical Technology, China) microscope was used to illuminate the sample and collect the Raman signal. The collected Raman signal was transmitted to a spectrograph (SR-303i-B, Andor) through a fiber. The spectral resolution of the spectrograph was ~2.6 cm⁻¹. To avoid photodegradation and sample damage, the laser power incident on the blood was limited to ~3 mW. The laser exposure time was 1 s, and the number of accumulations was 7.

Materials
Whole blood samples from a total of 159 human donors and 173 animal donors, including chickens, ducks, geese, doves, pigs, sheep, rabbits, mice, and dogs, were used. All the animal donors were provided by the Laboratory Animal Center of Soochow University. Before the blood was collected, all the donors were confirmed to be healthy and not taking medication. All experiments were performed in compliance with the law. The vacuum tube, which was made of glass, contained dipotassium ethylenediaminetetraacetic acid (K2EDTA) at a concentration of ~1.6 mg K2EDTA per 1 mL of whole blood. The blood samples were immediately preserved in an icebox after they were collected. The spectra were obtained between 24 and 72 h after the blood samples were acquired. The preservation time of the blood was demonstrated to be as long as 3 months [17]. The thickness of the vacuum blood tube was ~1 mm; thus, a long-working-distance objective lens (N PLAN H, 50×, 0.5 NA, 7.1-mm working distance, Leica) was used to focus the light onto the surface of the blood in the tube. A total of 332 Raman spectra were collected, including 159 Raman spectra of human blood, 26 Raman spectra of chicken blood, 29 Raman spectra of duck blood, 32 Raman spectra of goose blood, 27 Raman spectra of dove blood, 12 Raman spectra of pig blood, 21 Raman spectra of sheep blood, 16 Raman spectra of rabbit blood, five Raman spectra of mouse blood, and five Raman spectra of dog blood. Each spectrum was collected from one donor. To ensure diversity, the ages of the human donors varied (number of donors aged between 18 and 34: 38; number of donors aged between 35 and 60: 74; number of donors aged above 60: 47), and blood samples from both males and females (number of males: 78; number of females: 81) were used. All the human donors were Chinese. The blood samples were mixed before the Raman spectra were measured.

Model
The Raman spectra of the blood from different species are presented in Fig. 1. Figure 1(a) shows the original spectra, and Fig. 1(b) shows the preprocessed spectra after background subtraction and normalization. The background of the Raman spectra was removed using the airPLS method [26,27], and each background-subtracted spectrum was normalized by dividing it by its maximal value. In the figure, the main Raman peaks of human and nonhuman blood are similar. According to the literature, no obvious difference is observed among the Raman spectra of nonhuman blood collected from different species under 785-nm laser excitation [18,20]. However, when the excitation wavelength was 532 nm, the Raman spectra of the blood collected from chickens, ducks, geese, and doves differed from those collected from pigs, rabbits, and sheep. As shown in Fig. 1, the Raman spectra of the blood collected from pigs, rabbits, and sheep are similar to those for humans. To improve the discrimination accuracy, we divided the blood samples into three groups: 1) human blood, 2) human-unlike nonhuman blood, and 3) human-like nonhuman blood. Accordingly, the traditional PLS model used for discriminating blood was separated into two models. Figure 2 illustrates the analysis procedures for the traditional single-model method and the proposed dual-model method. The final discrimination of human blood was performed using model 2, whereas the samples finally discriminated as nonhuman blood were those discriminated as nonhuman blood by either model 1 or model 2. The training set was used to construct the PLS model with six latent variables, and the leave-one-out method was used for internal cross-validation. For the models, the values 1 and 2 represent human and nonhuman blood, respectively, and the threshold is set as 1.5.
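The preprocessing and decision logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The background is assumed to have been estimated already (for example by airPLS), and `model_1` and `model_2` are hypothetical callables that stand in for the two trained PLS models and return a predicted value near 1 (human) or 2 (nonhuman). Assigning a value of exactly 1.5 to the nonhuman class is also an assumption, since the text does not state how ties are handled.

```python
from typing import Callable, List, Sequence

HUMAN, NONHUMAN = 1, 2   # class codes used in the text
THRESHOLD = 1.5          # decision threshold stated in the text


def normalize(raw: Sequence[float], background: Sequence[float]) -> List[float]:
    """Subtract a pre-estimated background, then scale so the maximum is 1."""
    corrected = [r - b for r, b in zip(raw, background)]
    peak = max(corrected)
    if peak <= 0:
        raise ValueError("no positive signal left after background removal")
    return [c / peak for c in corrected]


def label(predicted_value: float) -> int:
    """Map a model output to a class code (a value of exactly 1.5 is
    treated as nonhuman here, which is an assumption)."""
    return HUMAN if predicted_value < THRESHOLD else NONHUMAN


def dual_model_label(spectrum: Sequence[float],
                     model_1: Callable[[Sequence[float]], float],
                     model_2: Callable[[Sequence[float]], float]) -> int:
    """Stage 1 rejects human-unlike nonhuman blood; stage 2 re-examines
    every spectrum that stage 1 called human."""
    if label(model_1(spectrum)) == NONHUMAN:
        return NONHUMAN
    return label(model_2(spectrum))


if __name__ == "__main__":
    spectrum = normalize([3.0, 5.0, 4.0], [1.0, 1.0, 2.0])
    # Toy stand-ins for the trained models (hypothetical values).
    print(dual_model_label(spectrum, lambda s: 1.2, lambda s: 1.8))
```

In this arrangement a spectrum is reported as human blood only when both stages place it below the threshold, which matches the statement that the final discrimination of human blood is made by model 2.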

Prediction value of prediction set obtained via dual-model analysis
A total of 38 Raman spectra of human blood and 36 Raman spectra of human-unlike nonhuman blood (spectra from nine chicken-blood samples, nine duck-blood samples, 11 goose-blood samples, and seven dove-blood samples) were used to build model 1. A total of 109 Raman spectra of human blood (samples 1-109) and 97 of nonhuman blood (samples 110 to 187, which included chicken, duck, goose, and dove, combined with samples 188 to 206, which included pig and sheep) were used to validate the effectiveness of the model. The results are shown in Fig. 3. The predicted values for samples 1 to 109 were <1.5, so these samples were discriminated as human blood by the PLS model. The predicted values for samples 110 to 187 were >1.5 (except for the 149th sample, whose predicted value was 1.43), so these samples were discriminated as nonhuman blood. However, for samples 188 to 206, the results obtained by the PLS model were incorrect: the nonhuman blood was discriminated as human blood. We defined this blood as human-like nonhuman blood. The same 38 Raman spectra of human blood and 30 Raman spectra of human-like nonhuman blood (16 rabbit blood, eight pig blood, and six sheep blood) were used to build model 2. The samples discriminated as human blood by model 1 were used to validate model 2. The results are shown in Fig. 4. Among the 109 samples of human blood, only the 21st was discriminated as nonhuman blood, and among the 20 samples of nonhuman blood (the 19 pig and sheep samples and the one chicken, duck, goose, or dove sample that model 1 passed on as human blood), the 149th, 113th, and 114th were discriminated as human blood. Combining the results obtained by models 1 and 2, only one human blood sample and three nonhuman blood samples were mis-discriminated by the dual-model method. The final discrimination accuracy was 98.1%, and the TPR for human blood was 99.1%.
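The sample bookkeeping and the two reported figures can be re-derived from the counts given in the Materials section and in this section. The short Python sketch below only repeats that arithmetic. The variable names are illustrative, and the species grouping follows the text.

```python
# Spectra collected per species (Materials) and spectra used for training.
collected = {"chicken": 26, "duck": 29, "goose": 32, "dove": 27,
             "pig": 12, "sheep": 21, "rabbit": 16}
training = {"chicken": 9, "duck": 9, "goose": 11, "dove": 7,   # model 1
            "rabbit": 16, "pig": 8, "sheep": 6}                # model 2

# Spectra left over for validation after the training spectra are removed.
left = {name: collected[name] - training[name] for name in collected}
human_unlike = sum(left[s] for s in ("chicken", "duck", "goose", "dove"))
human_like = sum(left[s] for s in ("pig", "sheep", "rabbit"))

human_total = 109
nonhuman_total = human_unlike + human_like

# Model 2 receives every sample that model 1 called human: all human-like
# samples plus the one human-unlike sample with a predicted value of 1.43.
stage_2_nonhuman = human_like + 1

# Final errors of the dual-model method reported in the text.
human_errors, nonhuman_errors = 1, 3
total = human_total + nonhuman_total
tpr = (human_total - human_errors) / human_total
accuracy = (total - human_errors - nonhuman_errors) / total

print(human_unlike, human_like, nonhuman_total, stage_2_nonhuman)  # 78 19 97 20
print(f"TPR = {100 * tpr:.1f}%, accuracy = {100 * accuracy:.1f}%")
# TPR = 99.1%, accuracy = 98.1%
```

The 78 and 19 remaining spectra correspond to sample ranges 110 to 187 and 188 to 206, respectively, and no rabbit spectra remain for validation because all 16 were used to build model 2.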

Blind test and external validation
A blind test and external validation [16] were performed to validate the performance of the model. A total of 12 spectra were randomly chosen from the above species for the blind test of the dual-model method. Raman spectra of blood originating from mice and dogs, which are not included in the above species, were adopted for external validation. The discrimination results of the blind test and external validation are shown in Fig. 5 and Table 1. The accuracies of the blind test and external validation were both 100%. These results illustrate the classification ability of the method for discriminating human blood.

Comparison with single PLS model
The traditional PLS model was analyzed for comparison. A total of 38 Raman spectra of human blood, 36 Raman spectra of human-unlike nonhuman blood, and 30 Raman spectra of human-like nonhuman blood were used to build the model. All the Raman spectra used to build and validate the model were the same as those used in the dual-model method, and the results are shown in Fig. 6. Among the 109 human blood samples used to validate the model, 14 were discriminated as nonhuman blood, and among the 97 nonhuman blood samples, 11 were discriminated as human blood. The discrimination accuracy was 87.9%, and the TPR for discriminating human blood was only 87.2%. For customs inspection and forensic applications, discriminating human blood correctly is more important than the overall accuracy of discriminating between human and nonhuman blood. Table 2 presents the results of the dual-model and traditional PLS methods. The TPR for human blood reached 99.1% for the dual-model method, a relative increase of 13.6% compared with the PLS method. All the parameters indicate the effectiveness of the proposed method for improving the accuracy of discrimination. For both models, the values 1 and 2 represent human and nonhuman blood, respectively. To evaluate the models for discriminating human blood, the receiver operating characteristic (ROC) curve was analyzed. To set human blood as positive, the reciprocal of the predicted value was used, so that 0.5 (negative) represented nonhuman blood and 1 (positive) represented human blood. In Fig. 7, the ROC curves of human blood versus nonhuman blood obtained using the dual-model and single-model methods are shown. The area under the ROC curve is one of the parameters that can be used to estimate the performance of the models. For the dual-model and traditional single-model analyses, the area under the curve was 0.998 and 0.947, respectively. The results of the dual-model analysis are more promising.
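Two steps in this comparison can be illustrated with a short Python sketch: how the reported percentages relate to each other, and how taking the reciprocal of the predicted value turns human blood into the positive class for the ROC analysis. The predicted values below are invented toy numbers, not data from this study, and the pairwise (rank-based) area computation is one standard way to obtain the area under the ROC curve, not necessarily the procedure used by the authors.

```python
from typing import Sequence


def relative_increase(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


def auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive scores higher; ties count one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Single-model TPR from the counts in the text: 14 of 109 human samples missed.
single_tpr = 100.0 * (109 - 14) / 109
print(f"{single_tpr:.1f}")                       # 87.2
print(f"{relative_increase(99.1, 87.2):.1f}")    # 13.6 (TPR)
print(f"{relative_increase(98.1, 87.9):.1f}")    # 11.6 (accuracy)

# Toy predicted values (hypothetical): near 1 for human, near 2 for nonhuman.
human_pred = [0.9, 1.1, 1.2, 1.6]
nonhuman_pred = [1.4, 1.8, 2.0, 2.1]

# The reciprocal maps 1 -> 1 (positive) and 2 -> 0.5 (negative), so a
# higher score now means "more human-like".
human_score = [1.0 / v for v in human_pred]
nonhuman_score = [1.0 / v for v in nonhuman_pred]
print(auc(human_score, nonhuman_score))          # 0.9375
```

The relative increases are computed from the rounded percentages reported in the text, so they describe ratios of rates rather than differences in percentage points.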

Conclusion
We report a dual-model method for improving the accuracy and TPR of human-blood discrimination. In total, 332 Raman spectra from 10 species were used to demonstrate the proposed method. The results achieved were compared with those of the traditional PLS model. The TPR for human blood reached 99.1%, a relative improvement of 13.6% over the PLS model, and the accuracy of discrimination of human and nonhuman blood reached 98.1%, a relative improvement of 11.6% over the PLS model. The comparison results reveal that the dual-model method can improve the accuracy and TPR for discriminating human blood and is highly suitable for customs inspection and forensic applications. In future work, the effects of the preservation time and conditions will be studied and discussed.