Rapid identification of pathogens in blood serum via Raman tweezers in combination with advanced processing methods

Pathogenic microbes contribute to several major global diseases that kill millions of people every year. Bloodstream infections caused by these microbes are among the most common causes of hospitalization and are associated with high morbidity and mortality rates. The search for the “Holy Grail” of clinical diagnostic microbiology, a reliable, accurate, low-cost, real-time, and easy-to-use diagnostic method, is one of the essential issues in clinical practice. These demanding requirements can be met by Raman tweezers in combination with advanced analysis methods. Here, we present a proof-of-concept study based on Raman tweezers combined with spectral mixture analysis that allows for the identification of microbial strains directly from human blood serum without user intervention, thus eliminating the influence of a data analyst.


Rapid identification of pathogens in blood serum via Raman tweezers in combination with advanced processing methods: supplemental document

SENSITIVITY AND SPECIFICITY
Sensitivity is defined in Eq. S1 as the number of True Positive (TP) identifications divided by the sum of the numbers of True Positive and False Negative (FN) identifications. Specificity is defined in Eq. S2, where TN stands for True Negative and FP for False Positive identifications.

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{S1}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{S2}$$
In our work, a correct identification is either correctly identified pure serum (TN) or the correctly identified presence of any microbe (TP). FN is any microbe identified as pure serum, and FP is pure serum identified as any microbe.
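The two ratios can be illustrated with a minimal Python sketch. It assumes that a positive means "a microbe is present" and a negative means "pure serum", so a microbe mistaken for another microbe still counts as a detected presence. The function names, the `"serum"` label, and the example labels are hypothetical and are not taken from the study.

```python
def binary_counts(true_labels, predicted_labels, negative="serum"):
    """Collapse multi-class labels to microbe-present vs. pure serum.

    Returns the tuple (tp, tn, fp, fn). The label used for pure serum
    is an assumption made for this illustration.
    """
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth_positive = truth != negative
        pred_positive = pred != negative
        if truth_positive and pred_positive:
            tp += 1  # a microbe was present and a microbe was reported
        elif not truth_positive and not pred_positive:
            tn += 1  # pure serum reported as pure serum
        elif truth_positive:
            fn += 1  # a microbe reported as pure serum
        else:
            fp += 1  # pure serum reported as a microbe
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Sensitivity as TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Specificity as TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical labels, for illustration only
true_labels = ["serum", "serum", "E. coli", "S. aureus", "C. albicans", "E. coli"]
predicted = ["serum", "E. coli", "E. coli", "S. epidermidis", "serum", "E. coli"]
tp, tn, fp, fn = binary_counts(true_labels, predicted)
print(sensitivity(tp, fn), specificity(tn, fp))
```

Collapsing the five classes to a binary decision before counting keeps both measures tied to the clinically relevant question of whether any microbe is present.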

OPTICAL TRAPPING
Fig. S1 shows an optically trapped E. coli bacterium. The image was obtained using the setup described in the main text. For better visualization, the concentration of bacteria in this sample is higher than that used for the collection of experimental data.

AVERAGE RAMAN SPECTRA
The averaged Raman spectra of microbes trapped directly in blood serum (and of pure blood serum) are plotted in Fig. S2. The cosmic ray peaks in the data were corrected using the method described in the main text, and the plotted spectra were normalized using the L2 norm. The data did not undergo any smoothing or background removal. The averaged spectra were obtained from 32 spectral acquisitions for the blood serum spectrum, 41 for the C. albicans spectrum, 42 for the E. coli spectrum, 28 for the S. aureus spectrum, and 35 for the S. epidermidis spectrum. These spectra are plotted using data from the serum E dataset.
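The normalization and averaging described above can be sketched in a few lines of NumPy. This is only an illustration under one assumption: each acquisition is L2-normalized first and the normalized acquisitions are then averaged (the text does not state the order). The function name `normalized_average` is hypothetical.

```python
import numpy as np


def normalized_average(acquisitions):
    """L2-normalize each acquisition, then average over acquisitions.

    acquisitions: array-like of shape (n_acquisitions, n_channels).
    Returns (mean_spectrum, std_spectrum), each of length n_channels.
    The order "normalize, then average" is an assumption of this sketch.
    """
    acquisitions = np.asarray(acquisitions, dtype=float)
    # Euclidean norm of every acquisition, kept as a column for broadcasting
    norms = np.linalg.norm(acquisitions, axis=1, keepdims=True)
    normalized = acquisitions / norms
    # The mean gives the plotted curve; the standard deviation gives a
    # per-channel measure of the variation between acquisitions
    return normalized.mean(axis=0), normalized.std(axis=0)


# Synthetic stand-in for 32 acquisitions with 1024 spectral channels
rng = np.random.default_rng(0)
synthetic = rng.random((32, 1024)) + 1.0
mean_spectrum, std_spectrum = normalized_average(synthetic)
```

The per-channel standard deviation returned here corresponds to the kind of variation band that a plot of averaged spectra can display around the mean.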

PRINCIPAL COMPONENT ANALYSIS LOADINGS
Principal component analysis (PCA) was used to reduce the dimensionality of the Raman spectral dataset and to extract the main features of the spectra. Principal component scores are used in the bacterial identification algorithm. Fig. S3 shows the first four PCA loadings for serum E.

Table S1. Summary of visible peaks/bands found in the Raman spectra of microorganisms, including suggested assignments of chemical compounds.
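As an illustration of how PCA loadings and scores relate to a matrix of spectra, the following NumPy sketch computes both via a singular value decomposition of the mean-centered data. It is a generic PCA, not necessarily the implementation used in the study; the function name `pca_loadings_scores` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_loadings_scores(spectra, n_components=4):
    """Generic PCA via SVD of the mean-centered data matrix.

    spectra: array-like of shape (n_spectra, n_channels).
    Returns (loadings, scores, explained_ratio):
      loadings        (n_components, n_channels), orthonormal rows
      scores          (n_spectra, n_components)
      explained_ratio (n_components,), fraction of total variance
    """
    spectra = np.asarray(spectra, dtype=float)
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]
    # Scores are the coordinates of each spectrum along the loadings
    scores = centered @ loadings.T
    variances = singular_values ** 2
    explained_ratio = variances[:n_components] / variances.sum()
    return loadings, scores, explained_ratio


# Synthetic stand-in: 20 spectra with 50 channels each
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(20, 50))
loadings, scores, explained_ratio = pca_loadings_scores(synthetic)
```

Each loading has the same length as a spectrum, which is why loadings can be plotted against the Raman shift axis, while the scores give a low-dimensional representation of each spectrum.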

CONFUSION MATRICES IN DETAIL FOR FULLY CONSTRAINED LEAST SQUARES
As mentioned in the main text, under our experimental conditions the Fully Constrained Least Squares (FCLS) algorithm gave the highest accuracy of microbe identification in blood serum. Normalized confusion matrices for each of the experimental datasets (one per serum) are shown in Fig. S4.
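To make the constraints behind FCLS concrete, the sketch below solves the fully constrained unmixing problem (abundances non-negative and summing to one) with projected gradient descent. This solver is swapped in for brevity and is not the algorithm or code used in the study. The function names, the Gaussian "endmember" spectra, and the mixing fractions are all hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    sorted_desc = np.sort(v)[::-1]
    shifted_cumsum = np.cumsum(sorted_desc) - 1.0
    counts = np.arange(1, v.size + 1)
    # Last index at which the sorted entry stays above the running threshold
    rho = np.nonzero(sorted_desc - shifted_cumsum / counts > 0)[0][-1]
    theta = shifted_cumsum[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fcls_abundances(endmembers, spectrum, n_iter=500):
    """Minimize ||M a - x||^2 subject to a >= 0 and sum(a) = 1.

    endmembers: (n_channels, n_endmembers) matrix M of reference spectra.
    spectrum:   (n_channels,) measured spectrum x.
    Uses projected gradient descent with step 1 / lambda_max(M^T M).
    """
    M = np.asarray(endmembers, dtype=float)
    x = np.asarray(spectrum, dtype=float)
    gram = M.T @ M
    step = 1.0 / np.linalg.eigvalsh(gram)[-1]  # eigenvalues are ascending
    a = np.full(M.shape[1], 1.0 / M.shape[1])  # start from equal abundances
    for _ in range(n_iter):
        gradient = gram @ a - M.T @ x
        a = project_to_simplex(a - step * gradient)
    return a


# Hypothetical endmembers: three well-separated Gaussian bands
channels = np.arange(100.0)
M = np.stack([np.exp(-0.5 * ((channels - c) / 5.0) ** 2) for c in (20, 50, 80)], axis=1)
true_abundances = np.array([0.7, 0.3, 0.0])
mixed = M @ true_abundances
estimated = fcls_abundances(M, mixed)
```

Dropping the sum-to-one constraint gives the non-negative variant, and dropping both constraints gives the unconstrained variant, which is the relationship between the FCLS, NNLS, and UCLS problems.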

CONFUSION MATRICES FOR THE OTHER ALGORITHMS
Confusion matrices for identification using the Non-Negative Least Squares (NNLS), Unconstrained Least Squares (UCLS), and K-Nearest Neighbors combined with Iterative Polynomial Fitting (IPF and KNN) algorithms are shown in Fig. S5.
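The row normalization (by true labels) and the averaging over datasets used for the confusion matrices can be sketched as follows. This is a minimal illustration; the function names, class list, and labels are hypothetical and do not reproduce any result of the study.

```python
import numpy as np


def confusion_counts(true_labels, predicted_labels, classes):
    """Count matrix with true labels as rows and predicted labels as columns."""
    index = {name: i for i, name in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)), dtype=float)
    for truth, pred in zip(true_labels, predicted_labels):
        counts[index[truth], index[pred]] += 1
    return counts


def normalize_rows(counts):
    """Divide every row by its sum so that each true class sums to 1."""
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without any true samples stay at zero instead of dividing by zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def average_normalized(count_matrices):
    """Normalize each dataset's matrix by rows, then average element-wise."""
    return np.mean([normalize_rows(c) for c in count_matrices], axis=0)


# Hypothetical two-class example
classes = ["serum", "E. coli"]
true_labels = ["serum", "serum", "E. coli", "E. coli", "E. coli", "E. coli"]
predicted = ["serum", "E. coli", "E. coli", "E. coli", "E. coli", "serum"]
normalized = normalize_rows(confusion_counts(true_labels, predicted, classes))
```

Normalizing before averaging gives every dataset equal weight regardless of how many spectra it contains; averaging raw counts first would weight larger datasets more heavily.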

Fig. S2. Average Raman responses of microbes (and blood serum E) trapped by Raman tweezers. The shaded areas represent the variation in measured spectral intensities (standard deviation).

Fig. S4. Confusion matrices of the FCLS identification results for serum A-E. The data are normalized by rows (true labels).
Fig. S5. Averaged confusion matrices of the identification results from A) non-negative least squares (NNLS), B) unconstrained least squares (UCLS), and C) k-nearest neighbors combined with iterative polynomial fitting (IPF and KNN) processing. The values within the confusion matrices were averaged over all 5 datasets (serum A-E). The data are normalized by rows (true labels).

Fig. S6.Raman spectra of microbes trapped by Raman tweezers in serum A-E that were misidentified using the FCLS algorithm.