Plasma infrared fingerprinting with machine learning enables single-measurement multi-phenotype health screening

Summary

Infrared spectroscopy is a powerful technique for probing the molecular profiles of complex biofluids, offering a promising avenue for high-throughput in vitro diagnostics. While several studies have showcased its potential in detecting health conditions, a large-scale analysis of a naturally heterogeneous potential patient population has not been attempted. Using a population-based cohort, here we analyze 5,184 blood plasma samples from 3,169 individuals using Fourier transform infrared (FTIR) spectroscopy. Applying a multi-task classification to distinguish between dyslipidemia, hypertension, prediabetes, type 2 diabetes, and healthy states, we find that the approach can accurately single out healthy individuals and characterize chronic multimorbid states. We further identify the capacity to forecast the development of metabolic syndrome years in advance of onset. Dataset-independent testing confirms the robustness of infrared signatures against variations in sample handling, storage time, and measurement regimes. This study provides the framework that establishes infrared molecular fingerprinting as an efficient modality for populational health diagnostics.

Supplementary Table S1. Description of study cohort, related to STAR Methods. Two time-separated samplings of a longitudinal population-based cohort were considered.

Both sample sets were combined into one dataset, and the classifier was trained to assign each measurement to one sample set to determine whether differences exist between the IR spectra of the sample sets. Two ROC curves of the classifications are depicted: one for normalized spectra and one for non-normalized spectra. The ROC curves are computed on test samples as determined via cross-validation. Both classifiers perfectly determined to which sample set each measurement belonged, revealing that measurement variations existed between the sample sets (even on normalized spectra), despite the overlapping cloud appearance depicted in (D).

Supplementary Figure S3. Receiver operating characteristic (ROC) curves for binary classifications modeling each listed outcome against all other outcomes (one-versus-rest) without the classifier chaining, related to Figure 2. For each classification, the mean cross-validated ROC curve of the test sets is depicted and the area under the curve is listed below, along with its standard deviation. Model validation was carried out as described in the main text. Compared to the ROC curves depicted in Figure 2D in the main text, the binary classification analysis here was performed with independent binary classifiers that do not utilize the chain mechanism.
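As an illustration of how an area under the ROC curve and its spread across test splits might be summarized, the following is a minimal sketch rather than the authors' pipeline. It assumes the AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties counting one half), and that the sample standard deviation across folds is reported; the source does not state either detail. The function name `roc_auc` and the two toy folds are hypothetical.

```python
from statistics import mean, stdev

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical test-fold outputs: (true labels, classifier scores).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    ([1, 0, 1, 0], [0.6, 0.7, 0.8, 0.1]),
]

aucs = [roc_auc(y, s) for y, s in folds]
auc_mean = mean(aucs)   # mean AUC across test folds
auc_sd = stdev(aucs)    # sample standard deviation (an assumption)
print(f"AUC = {auc_mean:.3f} +/- {auc_sd:.3f}")
```

In a one-versus-rest setting, the same computation would be repeated once per outcome, with that outcome labeled 1 and all others labeled 0.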
and creatinine) were used to compare the performance of simultaneously detecting the phenotypes of interest with the performance of using the IR molecular fingerprints as exclusive predictors (shown in Figure 2 in the main text). (A-B) Test predictive performance of a model using the clinical analytes as the exclusive predictors. Model validation was carried out as described in the main text.
S2. (A-E) Difference between the mean spectral measurement of cases and controls for each condition in the first sample set (dark blue) and second sample set (light blue). The "healthy" category includes individuals having none of the other listed conditions. (F) ROC curves for the binary classifications detecting each phenotype. The mean cross-validated ROC curve of the test sets is depicted and the area under the curve is listed, along with its standard deviation, for each binary classification.

Supplementary Table S3.

Supplementary Table S4. Predictive performance of multilabel classification using different label arrangements for optimal model selection, related to STAR Methods. To select an optimal classification order, several permutations were tested to observe their influence on the overall predictive efficacy. For each listed order, model performance was estimated by cross-validating on only sample set #1 to avoid introducing model selection bias by leaking information from sample set #2 (which was used for the independent testing). The mean and standard deviation of the calculated exact match ratio, Hamming score, and Hamming loss are listed for each of the 120 possible orderings of the 5 binary classifiers in the chain. The last entry in the table lists the performance of a non-chained model where each of the 5 phenotypes was detected with an independent binary classifier. Abbreviations: T2D, type 2 diabetes; PD, prediabetes.
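The three multilabel metrics and the count of 120 chain orderings can be illustrated with a minimal sketch. It makes two assumptions that the source does not confirm: the Hamming score is taken as the per-sample intersection-over-union of true and predicted label sets, and the label names and toy predictions are hypothetical placeholders for the five phenotypes.

```python
from itertools import permutations

# The five phenotypes, in an arbitrary (hypothetical) column order.
LABELS = ["dyslipidemia", "hypertension", "PD", "T2D", "healthy"]

def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_loss(y_true, y_pred):
    """Fraction of individual label assignments that are wrong."""
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))

def hamming_score(y_true, y_pred):
    """Mean per-sample |true AND pred| / |true OR pred| (assumed definition)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        total += inter / union if union else 1.0
    return total / len(y_true)

# Every possible ordering of the 5 binary classifiers in a chain.
orders = list(permutations(LABELS))

# Hypothetical toy data: one row per sample, one column per label.
y_true = [(1, 1, 0, 0, 0), (0, 0, 0, 0, 1), (1, 0, 0, 1, 0)]
y_pred = [(1, 1, 0, 0, 0), (0, 0, 0, 0, 1), (1, 0, 1, 0, 0)]

emr = exact_match_ratio(y_true, y_pred)
hl = hamming_loss(y_true, y_pred)
hs = hamming_score(y_true, y_pred)
print(len(orders), emr, hl, hs)
```

In an order-selection procedure of the kind described, each entry of `orders` would define one chain, and the three metrics would be averaged over the cross-validation test splits of sample set #1.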
Model validation procedure, related to STAR Methods. (A) The studied population, with naturally occurring heterogeneity between individuals, was sampled at two time points and is thus longitudinal. (B) On the first of the two sample sets, 10-fold cross-validation was performed to obtain metrics of model performance in 10 train and test splits of the samples. Each split was trained on approximately 90% of the samples and tested on the remaining approximately 10%. (C) On the second sample set, the same cross-validation procedure was performed, independently of the first. (D) In 10 splits, model training was performed on 90% of the first sample set, leaving out a different set of individuals in each iteration. Model testing was then performed on the individuals from the second sample set who were not included in the training data from the first sample set for each data split. The model was thus tested on a dataset of independent samples and individuals.

Effects of variations in sample and measurement conditions on IR molecular fingerprints, related to STAR Methods. (A-B) Difference between the mean measurement of the first sample set and the mean measurement of the second on (A) non-normalized spectra and (B) normalized spectra. Gray-shaded areas depict the standard deviation of the IR fingerprints. (C-D) Principal component analysis was applied to the combined set of spectra from the two sample sets on (C) non-normalized spectra and (D) normalized spectra, color-coding the measurements from each sample set. (E) Binary classifications were carried out to determine to which sample set a measurement belonged.
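The independent-testing scheme in panel (D) of the model validation procedure can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function name `independent_splits`, the use of integer individual IDs, the seeded shuffle, and the way the ten left-out groups are formed are all assumptions.

```python
import random

def independent_splits(set1_ids, set2_ids, n_splits=10, seed=0):
    """Yield (train_ids, test_ids) pairs of individual IDs.

    Each split trains on roughly 90% of the individuals in the first
    sample set and tests on those individuals of the second sample set
    who are absent from that split's training data.
    """
    ids = sorted(set(set1_ids))
    random.Random(seed).shuffle(ids)
    # Partition the first sample set into n_splits disjoint left-out groups.
    folds = [ids[i::n_splits] for i in range(n_splits)]
    for held_out in folds:
        train = set(ids) - set(held_out)
        test = set(set2_ids) - train
        yield train, test

# Hypothetical cohort: IDs 50-99 were sampled at both time points.
set1 = range(100)
set2 = range(50, 130)

splits = list(independent_splits(set1, set2))
for train, test in splits:
    # No individual appears in both the training and the test data.
    assert train.isdisjoint(test)
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Splitting by individual rather than by measurement is what keeps longitudinally resampled participants from appearing on both sides of a split.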
Prediction of anthropometric parameters using IR molecular fingerprints, related to Figure 2. (A) Test ROC curves for distinguishing males from females. Classification performance was estimated by repeatedly training on 90% of the first sample set and testing on the second sample set in 10 iterations, ensuring no overlap between the longitudinally sampled individuals. (B-C) Prediction error of regression models trained to estimate (B) age and (C) body mass index (BMI) of the sampled individuals, combining samples from both sets. The predicted values follow from the test sets of models fit on training sets in a 10-fold cross-validation where each point represents a measurement. (D-F) Same procedure, but applied to the healthy subset of the population to reduce the contribution of the health phenotypes to the anthropometric parameter predictions.

Supplementary Figure S6. Phenotype detection in gender, age, and body mass index matched cohorts using IR molecular fingerprints, related to Figure 2 and Table

Breakdown of metabolic risk factors for pre-MetS and MetS investigations, related to Figure 3. Each row represents a group of individuals with varying numbers of concurrent risk factors. The sample counts and percentages are listed according to the distribution of risk factors observed in each group. Distributions of the first and second sample sets are listed separately. Samples with unknown values for any of the risk factors were excluded.

Classical clinical analytes reflected in non-normalized IR molecular fingerprints, related to Figure 4. (A) Pearson correlation coefficient (red curves) between the concentrations of each clinical analyte and the absorbance at each wavenumber between 1000 and 3000 cm⁻¹ for non-normalized spectra. The mean absorbance spectrum of all measured IR spectra (n = 5184) is overlaid in gray on each panel as a visual reference for the shape of the spectrum. (B) Performance of quantitatively predicting clinical analytes using the non-normalized IR spectra of the measured population. Regression algorithms were trained for each parameter to capture the relations between the clinically measured values and the multivariate spectral features. The predicted values follow from the test sets of 10-fold cross-validations. Each point represents a measurement. The mean coefficient of determination (R²) and the root mean squared error (RMSE) are listed for each parameter, along with their standard deviations across the test splits. The diagonal (dashed red line) is a visual reference for a perfect fit.

Performance of quantitatively predicting cholesterol ratios using IR molecular fingerprints, related to Figure 4.
Ratios were calculated by dividing the clinically measured mmol/l concentrations of triglycerides by HDL cholesterol (A), LDL cholesterol by HDL cholesterol (B), and total cholesterol by HDL cholesterol (C). Regression algorithms were trained for each ratio to capture the relations between the calculated values and the multivariate spectral features. Predicted values follow from test sets of models fit on training sets in a 10-fold cross-validation. Each point represents a measurement. Model performance was measured by the mean coefficient of determination (R²) and the root mean squared error (RMSE), listed for each parameter along with the standard deviation across the test sets of the cross-validation. The diagonal (dashed red line) represents a visual reference for a perfect fit.
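The ratio targets and the two reported regression metrics can be written out in a minimal sketch. The function names, dictionary keys, and all numeric values below are hypothetical and chosen only to make the arithmetic easy to follow; the regression models themselves are not reproduced here.

```python
from math import sqrt

def lipid_ratios(tg, ldl, total, hdl):
    """Ratios of clinically measured concentrations (all in mmol/l)."""
    return {
        "TG/HDL": tg / hdl,      # triglycerides over HDL cholesterol
        "LDL/HDL": ldl / hdl,    # LDL cholesterol over HDL cholesterol
        "TC/HDL": total / hdl,   # total cholesterol over HDL cholesterol
    }

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical concentrations for one individual.
ratios = lipid_ratios(tg=1.5, ldl=3.0, total=5.0, hdl=1.25)

# Hypothetical measured and predicted ratios for one test split.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
r2 = r2_score(y_true, y_pred)
err = rmse(y_true, y_pred)
print(ratios, r2, err)
```

Computing both metrics per test split and then taking the mean and standard deviation across the 10 splits would give summary values of the form reported in the caption.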

Table S2. Description of cohorts used in case-control matched classifications, related to Figure 2 and Figure S6. Matching was performed based on gender, age, and body mass index of the sampled individuals in a pair-wise fashion.
