In Situ Raman Hyperspectral Analysis of Microbial Colonies for Secondary Metabolites Screening

Since the discovery of penicillin, a vast array of microbial antibiotics has been identified and applied in the medical field. Globally, the search for drug candidates via microbial screening is ongoing. Traditional screening methods, however, are time-consuming and require labor-intensive sample processing, significantly reducing throughput. This research introduces a Raman spectroscopy-based screening system tailored to the in situ analysis of microbial colonies on solid culture media. Employing multivariate curve resolution-alternating least-squares (MCR-ALS) for spectral decomposition, our approach reveals the production of secondary metabolites at the single colony level. We enhanced the microbial culture method, enabling direct, high signal-to-noise (S/N) ratio Raman spectroscopic measurements of colonies of Escherichia coli and actinomycetes species. Through semisupervised MCR analysis using the known spectra of actinorhodin and undecylprodigiosin as references, we accurately assessed the production of these compounds by Streptomyces coelicolor A3(2). Furthermore, we herein successfully detected the production of amphotericin B by Streptomyces nodosus, even in the absence of prior spectral information. This demonstrates the potential of our technique in the discovery of secondary metabolites. In addition to enabling the detection of the above-mentioned compounds, this analysis revealed the heterogeneity of the spatial distribution of their production in each colony. Our technique makes a significant contribution to the advancement of microbial screening, offering a rapid, efficient alternative to conventional methods and opening avenues for secondary metabolites discovery.


Table of Contents
Proof of concept for semi-supervised MCR-ALS (Figure S2

Semi-supervised MCR-ALS: six compounds Raman spectral dataset
To demonstrate sparse spectral decomposition of a dataset containing a small number of components using a large number of reference spectra, a spectral dataset with six biomolecular components was prepared. These Raman spectral components were mixed according to the following intensity profiles (25 × 25 grid), together with artificially created background components and a random noise component (Figure S3a). For the tailored Raman spectral dataset (Figure S3b), background subtraction was performed by MCR-ALS: A = WH + E. In this process, components with broad spectral features and no sharp bands were extracted as background components. The contribution of the background spectral components was then calculated from the intensity information in the H matrix and subtracted from the original spectra. In the resulting Raman spectra, the background was mostly removed while the biomolecular Raman spectral features were preserved (Figure S4).
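The background-subtraction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the plain non-negative ALS loop, the second-difference roughness score used to flag broad components, its threshold, and all function names are hypothetical. A holds one spectrum per column, the columns of W are component spectra, and the rows of H are their intensities.

```python
import numpy as np

def mcr_als(A, n_components, n_iter=200, seed=0):
    """Plain non-negative alternating least squares for A ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], n_components))
    H = np.zeros((n_components, A.shape[1]))
    for _ in range(n_iter):
        # Solve for intensities with spectra fixed, then clip negatives.
        H = np.clip(np.linalg.lstsq(W, A, rcond=None)[0], 0.0, None)
        # Solve for spectra with intensities fixed, then clip negatives.
        W = np.clip(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0.0, None)
    return W, H

def is_broad(W, threshold=1e-3):
    """Flag spectra whose second difference is small relative to their norm.

    Hypothetical criterion for 'broad features without sharp bands'.
    """
    rough = np.linalg.norm(np.diff(W, n=2, axis=0), axis=0)
    return rough / np.maximum(np.linalg.norm(W, axis=0), 1e-12) < threshold

def subtract_background(A, W, H, broad_mask):
    """Remove the contribution W_bg @ H_bg of the flagged components."""
    return A - W[:, broad_mask] @ H[broad_mask, :]

# Toy check with known components: one sharp band, one broad background.
x = np.arange(500)
sharp = np.exp(-0.5 * ((x - 200) / 3.0) ** 2)
broad = np.exp(-0.5 * ((x - 250) / 150.0) ** 2)
W_true = np.column_stack([sharp, broad])
H_true = np.array([[1.0, 0.5, 2.0], [3.0, 3.5, 2.5]])
A = W_true @ H_true
mask = is_broad(W_true)
A_clean = subtract_background(A, W_true, H_true, mask)
```

In practice W and H would come from `mcr_als` applied to the measured dataset. The toy check uses known components so that the subtraction itself can be verified independently of ALS convergence.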

Figure S4. Background-subtracted Raman spectral dataset.
For the background-subtracted dataset, semi-supervised MCR-ALS analysis was performed using reference spectra of the previously mentioned compounds except for benzylpenicillin and pyruvate (Figure S2). Benzylpenicillin and pyruvate were treated as unknown compounds to mimic the detection of new biomolecules for which no reference spectra are available in the calculation. In the ALS calculation, spectral variation of the reference spectra was allowed within a cosine similarity of 0.999.
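One way to enforce such a cosine-similarity bound during the ALS updates is sketched below. The text specifies only the 0.999 threshold. The projection used here (pulling an updated spectrum back along the great circle toward its reference until the bound is met) and the function name `constrain_to_reference` are assumptions made for illustration.

```python
import numpy as np

def constrain_to_reference(s, r, min_cos=0.999):
    """Return a unit-norm spectrum whose cosine similarity to r is >= min_cos.

    s is the spectrum proposed by the ALS update, r the reference spectrum.
    """
    r_hat = r / np.linalg.norm(r)
    s_norm = np.linalg.norm(s)
    if s_norm == 0:
        return r_hat.copy()
    s_hat = s / s_norm
    cos = float(s_hat @ r_hat)
    if cos >= min_cos:
        return s_hat  # already within the allowed variation
    # Direction of s that is orthogonal to the reference.
    u = s_hat - cos * r_hat
    u_norm = np.linalg.norm(u)
    if u_norm == 0:  # s points exactly opposite to r
        return r_hat.copy()
    u /= u_norm
    # Closest unit vector to s_hat that sits exactly on the bound.
    return min_cos * r_hat + np.sqrt(1.0 - min_cos ** 2) * u

rng = np.random.default_rng(0)
reference = rng.random(50)
proposal = rng.random(50)  # unrelated spectrum, far from the reference
constrained = constrain_to_reference(proposal, reference)
```

Note that this projection does not by itself guarantee non-negativity, so a full implementation would need to combine it with the non-negativity constraint.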
In MCR-ALS, it is generally necessary to select a matrix rank, that is, to estimate the number of components. However, the semi-supervised MCR used in this study assumes that a large number of spectral components are supplied as reference spectra. Therefore, instead of estimating the rank of the matrix decomposition, we apply a LASSO regularization term to obtain a sparse solution.
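The effect of the LASSO term can be illustrated with the intensity update alone. The sketch below assumes the penalty acts on the H matrix (as the name L1H in the figure captions suggests) and minimizes 0.5*||A - WH||^2 + L1H*sum(H) subject to H >= 0 by projected proximal-gradient steps. The solver choice, the function name, and the toy data are assumptions, and the spectral (W) update and the unknown components are omitted.

```python
import numpy as np

def sparse_intensities(A, W, l1h, n_iter=500):
    """Non-negative LASSO update of H for fixed spectra W.

    Minimises 0.5 * ||A - W @ H||_F^2 + l1h * sum(H) subject to H >= 0.
    """
    lipschitz = np.linalg.norm(W, 2) ** 2  # largest eigenvalue of W.T @ W
    H = np.zeros((W.shape[1], A.shape[1]))
    for _ in range(n_iter):
        grad = W.T @ (W @ H - A)
        # Gradient step, L1 shrinkage and non-negativity in one operation.
        H = np.maximum(0.0, H - (grad + l1h) / lipschitz)
    return H

# Toy library of five reference spectra; only components 0 and 2 are present.
x = np.arange(200)
W = np.column_stack(
    [np.exp(-0.5 * ((x - c) / 4.0) ** 2) for c in (20, 60, 100, 140, 180)]
)
W /= np.linalg.norm(W, axis=0)
rng = np.random.default_rng(0)
A = W[:, [0, 2]] @ np.ones((2, 10)) + 0.01 * rng.standard_normal((200, 10))

H_dense = sparse_intensities(A, W, l1h=0.0)   # no sparsity penalty
H_sparse = sparse_intensities(A, W, l1h=0.1)  # absent components vanish
```

With the penalty switched off, noise can leak into the intensities of absent references. With a moderate penalty those rows are driven exactly to zero, at the cost of a small downward bias on the components that are present.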
The hyperparameter L1H of the LASSO regularization term was optimized by cross-validation, which evaluates the L2 norm of the residuals of the MCR result matrix. The dataset was subjected to 5-fold cross-validation to estimate the L1H value (Figure S5). From the cross-validation, appropriate L1H values were estimated to be around 1e-06 to 1e-05. Various values of L1H (0, 1e-05, 8e-03) were then applied in semi-supervised MCR-ALS to confirm the accuracy (Figure S6), using reference spectra including albumin, oleic acid, palmitic acid, citric acid, L(+)-ascorbic acid, and ergosterol. In the case of L1H = 0, the compounds present in the dataset were generally detected accurately, but several compounds not included in the dataset, such as citric acid, streptomycin, and avermectinB1a, were detected incorrectly (Figure S6). In the case of L1H = 8e-03, many of the components, such as albumin, oleic acid, palmitic acid, and starch, were not detected accurately. Moreover, the Raman spectra of benzylpenicillin and pyruvate, which were included as unknown compounds, were not extracted by the calculation.
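A 5-fold cross-validation over candidate L1H values could look like the sketch below. The text states only that 5-fold cross-validation was used and that the L2 norm of the residuals was evaluated. The fold design assumed here (holding out spectral channels while the reference spectra stay fixed), the solver, and the names `fit_intensities` and `cv_residual` are hypothetical.

```python
import numpy as np

def fit_intensities(A, W, l1h, n_iter=300):
    """Non-negative LASSO fit of H for fixed spectra W (proximal gradient)."""
    lipschitz = np.linalg.norm(W, 2) ** 2
    H = np.zeros((W.shape[1], A.shape[1]))
    for _ in range(n_iter):
        grad = W.T @ (W @ H - A)
        H = np.maximum(0.0, H - (grad + l1h) / lipschitz)
    return H

def cv_residual(A, W, l1h, n_folds=5, seed=0):
    """Mean L2 norm of the held-out residual over n_folds channel folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(A.shape[0]), n_folds)
    errors = []
    for test_idx in folds:
        train = np.ones(A.shape[0], dtype=bool)
        train[test_idx] = False
        # Fit intensities on the training channels only.
        H = fit_intensities(A[train], W[train], l1h)
        # Score the prediction on the channels that were held out.
        errors.append(np.linalg.norm(A[test_idx] - W[test_idx] @ H))
    return float(np.mean(errors))

# Toy data: five reference spectra, two of which are actually present.
x = np.arange(200)
W = np.column_stack(
    [np.exp(-0.5 * ((x - c) / 4.0) ** 2) for c in (20, 60, 100, 140, 180)]
)
W /= np.linalg.norm(W, axis=0)
rng = np.random.default_rng(0)
A = W[:, [0, 2]] @ np.ones((2, 10)) + 0.01 * rng.standard_normal((200, 10))

grid = [0.0, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
scores = {l1h: cv_residual(A, W, l1h) for l1h in grid}
best = min(scores, key=scores.get)
```

The numerical scale of L1H depends on how the spectra are normalized, so the grid values in this toy example are not comparable to the values reported in the text.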
Consequently, the analysis enabled sparse detection of biomolecules in the spectral dataset using a large number of reference Raman spectra, while simultaneously detecting unknown compounds.

Semi-supervised MCR-ALS: 14 compounds Raman spectral dataset
To confirm the validity of this analysis for Raman spectra with a larger number of mixed components, a spectral dataset with 14 biomolecular compounds was prepared and used for the demonstration. The 14 spectral components and an artificial background were mixed with the following profiles (Figure S9a). As in the previous demonstration, a random noise component was also added. The tailored Raman spectra had very strong background components (Figure S9b). The results showed that adjusting L1H allows us to evaluate biomolecular production with high accuracy. The spectral dataset was well decomposed with an L1H value of 4e-05 (Figure S12). The spectra extracted through the ALS calculation also included some additional components, such as pyridoxine, trans-o-coumaric acid, and others, that had been treated as unknown components (Figure S13). The newly extracted pyridoxine and trans-o-coumaric acid spectra closely preserved the Raman spectral features of these compounds (Figure S14).
A lower L1H value caused large errors in several components, such as citric acid, benzylpenicillin, and sparsomycin (Figure S12). On the other hand, a larger L1H value caused substantial misfitting for albumin, L(+)-ascorbic acid, ergosterol, and starch.
The results show that the analysis can be applied to Raman spectral datasets consisting of many components, including unknown components, by applying reference spectra and LASSO-regularized ALS optimization.

Figure S1. Schematic diagram of the custom-made Raman spectrometer.

Figure S3. Raman intensity profiles set for the artificial Raman spectral dataset. a) The intensity profile of each biomolecular and background component. b) Prepared Raman spectral dataset.
The reference spectra also included streptomycin, lincomycin, avermectinB1a, pyridoxine, trans-ferulic acid, and trans-o-coumaric acid. The results demonstrated that setting a proper L1H value is effective for detecting compounds accurately. In the case of L1H = 1e-05, the components were detected with high accuracy and sparseness, including the unknown compounds benzylpenicillin and pyruvate (Figures S6 and S7). As shown in Figure S8, the proper L1H value allowed accurate extraction of the Raman spectra of benzylpenicillin and pyruvate. Comparison of the extracted spectra with the standard spectra showed the possibility of exploring unknown compounds using pure Raman spectral information.

Figure S6. Comparison of Raman spectral intensity profiles between the set values and those resolved via MCR-ALS. "-" indicates that the component was not used for preparing the dataset.

Figure S7. Semi-supervised MCR-resolved spectra for the six-component Raman spectral dataset with L1H of 1e-05. Benzylpenicillin, sodium pyruvate, and Components 1-6 were detected during the calculation without reference spectra.

Figure S9. Raman intensity profiles set for the artificial Raman spectral dataset. a) The intensity profile set for each biomolecular and background component. b) Prepared Raman spectral dataset.

Figure S11. Cross-validation for L1H. The horizontal axis indicates L1H and the vertical axis indicates the L2 norm of the residual.

Figure S12. Comparison of Raman spectral intensity profiles between the set values and those resolved via MCR-ALS. "-" indicates that the component was not used for preparing the dataset.