Introduction

Tuberculosis (TB), a communicable disease caused by Mycobacterium tuberculosis (Mtb), remains a global health crisis. The World Health Organization (WHO) estimated that TB was responsible for 1.5 million deaths and approximately 10 million new patients in 20201. The persistently high incidence and prevalence of TB in part reflect inadequate diagnostic approaches; it has been estimated that only 60% of new cases are detected, especially in countries with a high disease burden and low treatment coverage1. A sensitive and easy-to-implement test would provide an important initial improvement in diagnostic accuracy. However, the current standard in the diagnosis of TB is smear microscopy or culture tests, both of which have a low sensitivity, are laborious, and require a specialist laboratory2. Molecular tests, such as the Xpert MTB/RIF assay, have been introduced; however, their use is not economically feasible in primary care settings3. Moreover, TB tests often rely on sputum and thus have sub-optimal sensitivity, especially for patients with early active TB—such patients cannot consistently provide sputum4. To improve TB healthcare quality, the WHO has urged the development of a rapid and sensitive biomarker-based non-sputum test that can be implemented at the clinical site and utilizes accessible samples, such as blood, urine, or breath condensate4.

Omics-based discovery studies in TB patients—which involved comprehensive profiling of the host transcriptome, metabolome, and proteome—identified several biomarkers for the diagnosis of TB5,6,7,8,9,10,11. Moreover, implementing network analysis with multi-omics data could potentially verify and choosing the best among those biomarkers12. Transcriptomics is the most matured technology that identifies promising transcript biosignatures for TB diagnosis, treatment monitoring, and outcome prediction. Among the proposed signatures, Sweeney3, a host-response three-gene signature, has met the WHO’s target product profiles for a triage test13, whereas lipidomics research applications in TB management remain limited. Consequently, there is a need for further research, particularly into the altered lipidome of TB patients; lipids and lipid-related genes also have potential for use as diagnostic and prognostic biomarkers. Moreover, large-scale lipid profiling using plasma can provide insights into the disease because host lipids constitute a significant nutrition source for Mtb growth and reproduction14.

Mutual metabolic alterations constitute important aspects of host–pathogen interactions; together with regulatory factors, such alterations are responsible for drug tolerance but can be exploited to design effective host-tailored therapies15,16. Mtb lipid metabolism in host macrophages has a vital role in TB pathogenesis16,17. Lipid droplet (LD) formation, an important event in Mtb lipid metabolism, is a multifaceted process related to Mtb intracellular growth and drug tolerance; it also acts as a host defense mechanism to combat the pathogen17,18,19,20. Accordingly, studies that examine biomarkers related to lipid metabolism and immunology are expected to be fruitful.

There have been several investigations of the biological fluid lipidome in TB patients, with the goal of identifying biomarkers for TB diagnosis21,22,23,24,25. Chen et al. described changes in lipid levels during TB treatment, and the unbiased lipidomics approach of Shivakoti et al. revealed an association between the host lipidome and treatment failure26,27. These pioneering studies indicate significant differences in lipid profiles of patients with active TB and their counterparts; thus, they highlight potential applications of lipid and lipid-gene biomarkers in diverse clinical scenarios.

In the current study, a robust workflow was developed that facilitates the identification and validation of multi-omics metabolism-centric lipid and lipid-gene biomarkers for the diagnosis of active pulmonary TB.

Material and methods

Institutional review board statement for the clinical cohort

The Institutional Review Board of Korea University Guro Hospital reviewed and approved the study (No. 2017GR0012). All procedures were carried out following the Declaration of Helsinki. Written informed consent was obtained from all participants that allowed the blood and clinical data analysis to be used.

Sample characteristics

As mentioned elsewhere, plasma and clinical data were obtained from the Biobank of Korea University Guro Hospital28. Patients with malignant diseases, diabetes mellitus, hyperlipidemia, human immunodeficiency virus infection, and chronic liver or renal diseases were excluded. Thus, 35 patients with confirmed pulmonary TB and 37 controls were included in this study. The demographic information of included populations was described in Supplementary Table S1. There were no statistically significant differences between the two groups in terms of age (Wilcoxon rank sum test) or sex (Fisher’s exact test).

Available transcriptomics data

Three data sets with baseline gene expression profiles of TB patients and the counterparts were selected for the differentially expressed analysis and machine learning (ML)-based classification studies. The data sets are: GSE107991 (21 TB, 21 latent tuberculosis infection (LTBI), and 12 Control), E-MTAB-8290 (54 TB and 127 non-TB, including presumptive symptomatic adults with negative TB diagnosis controls and with or without human immunodeficiency virus infection), and GSE101705 (28 TB and 16 LTBI)29,30,31,32.

Chemicals, reagents, and consumables

The LC–MS grade ammonium formate, formic acid, methyl tertbutyl ether (MTBE), and toluene were purchased Sigma Aldrich (St. Louis, Missouri, USA). LiChroSolv® LC–MS grade solvents including water, methanol, acetonitrile, and isopropanol were purchased from Merck KGaA (Darmstadt, Germany). The SPLASH Lipidomix® Mass Spec Standard was purchased from Avanti Polar Lipids (Alabama, USA).

Acquity charged surface hybrid technology (CSH) C18 2.1 × 100 mm, 1.7 μm column and Acquity VanGuard CSH C18 2.1 × 5 mm, 1.7 μm pre-column were purchased from Waters (Milford, MA, USA).

Sample preparation and lipid extraction

Sample preparation and lipid extraction were performed in accordance with previously established methods, with a few modifications33,34. In brief, 55-μL plasma samples were thawed on ice for approximately 30 min; subsequently, 5 μL were removed from each sample and pooled to obtain a quality control (QC) sample. Five microliters of the lipid internal standard mixture were injected into each sample (1:10, v/v) and the sample was briefly vortexed. After the sample had been incubated on ice for 20 min with intermittent vortexing, 300 μL of methanol (− 20 °C) and 1000 μL of MTBE (− 20 °C) were added. The mixture was vortexed vigorously for 10 s, then incubated at 4 °C for 1 h with occasional vortexing. After the addition of 250 μL of water, vigorous vortexing for 20 s, and a 10-min incubation at 4 °C, the sample was centrifuged for 2 min at 4 °C and 14,000 rcf. The two supernatants (lipid fraction), each comprising 500 μL, were collected. One half was used for assessments in positive ion mode, and the other half was used for assessments in negative ion mode. The lipid fraction was completely dried in a vacuum at room temperature and stored at − 80 °C until needed.

Instrumental conditions for untargeted lipid profiling

An Acquity charged surface hybrid technology C18 column (2.1 × 100 mm, 1.7 μm) and Acquity VanGuard charged surface hybrid technology C18 pre-column (2.1 × 5 mm, 1.7 μm) were used for lipid separation with a binary gradient elution as described in Supplementary Table S2. A Shimadzu Nexera LC system (Kyoto, Japan) was utilized for the experiment. Lipid extracts were resuspended in methanol/toluene (9:1, v/v) and kept at 4 °C in an autosampler. The injected volume was ion-mode- and data-acquisition-dependent. From 200 μL of resuspended volume, 1 μL (scan profiling) and 2 μL (information-dependent acquisition and SWATH-based data-independent acquisition) were injected in positive ion mode; 3 μL (scan profiling) and 6 μL (information-dependent acquisition and SWATH) were injected in negative ion mode. The separated lipid ions were analyzed using an X500R QTOF with a Turbo V™ ion source with a TwinSpray probe (SCIEX, MA, USA). For the tandem MS analyses, either 45 eV (spread of 15 eV) or 25 eV (spread of 15 eV) were used. The MS parameters are shown in Supplementary Table S3. Mass calibration was automatically performed after every fifth injection through the instrument’s CDS system, using X500R positive or negative calibration solutions.

Lipid data processing, alignment, and lipid annotation

Raw data (wiff files) were directly input to MS-DIAL (version 4.8) for data processing, alignment, and lipid identification. The parameters were ion mode-dependent, as described in Supplementary Table S4. The aligned data were exported for subsequent use. Post-alignment data processing was performed using MetaboAnalyst 5.0 and features with missing rates ≥ 50% were removed; otherwise, the k-nearest neighbors algorithm was used to impute the missing features35. Features with relative standard deviation of ≥ 25% in the pooled QC were also removed. The MS-DIAL inbuilt library and Fiehn’s lab lipidomics library were used for lipid identification36,37.

Transcriptomics data processing and differential analysis

Raw counts of transcripts mapped into genes were summarized using the sum level. The annotated gene-level raw counts were normalized using Trimmed Mean of M-values. The pipeline was implemented using NetworkAnalyst 3.038. Differentially expressed analysis was applied for lipid-related genes in the three transcriptome profiles (i.e., E-MTAB-8290, GSE107991, GSE101705) using two-sided unpaired t-test (rstatix package version 0.7.0, implemented in R 4.1.2). Genes with a false discovery rate (FDR) less than 0.05 were considered as significant.

Data exploration and visualization

An unsupervised method, principal component analysis (PCA), was employed to explore and visualize the lipidome data. Prior to the analysis, the data were normalized (using the median method), log-transformed, and Pareto scaled. PCs that explained the most sample variance were plotted in a two-dimensional space (MetaboAnalyst 5.0) or three-dimensional space (R package, Plotly version 4.10.0). Heatmap and volcano plots (MetaboAnalyst 5.0) were also used for data visualization.

Statistical analysis and modeling of lipidomic data

Prior to univariate analysis using an unpaired t-test, the data were normalized using the median method and log-transformed. An FDR of 0.05 was set as the threshold for significant features. Fold-change (FC) thresholds of 1.2, 1.5, and 2 were also tested for biomarker candidate selection. Class discrimination between the lipid profiles of the two groups was achieved using partial least squares-discriminant analysis (PLS-DA). Because the discriminant model has tuning parameters (e.g., the number of components), the optimal model was selected in a tenfold cross-validation process. The variable importance in projection (VIP) score of the PC1 of the optimal model was set at ≥ 1.2 as the threshold of important features used to detect potential biomarker candidates. Statistical analyses were conducted using MetaboAnalyst 5.0 unless stated otherwise.

Internal and external biomarker validation

Univariate receiver operating characteristic (ROC) curve analysis was conducted to examine the potential biomarker applications of individual lipids. Random forest and linear support vector machine (SVM) were carried out to investigate the discriminatory capacity of the lipid biomarker candidates. Random forest is an ensemble method that generates many decision trees, then aggregates their outcomes to obtain greater prediction accuracy39. This powerful tool uses bagging and random feature selection to build multiple base learners. In SVM, a hyperplane is identified that maximizes the margin from data points. A larger margin leads to greater separation by the hyperplane, thus reducing generalization error. The performances of random forest and SVM are stable, regardless of the domain and data types.

For internal validation, the ROC curve-based exploratory analysis was utilized because it can automate important feature identification and performance evaluation. In the external validation, the biomarkers that were overlapped with the quantified lipids in the data of Cho et al. (at the fatty acyl/alkyl sum composition) were used to validate their performance in classifying TB patients from latent infection and controls21. All matched biomarkers were used to establish the biomarker models. The analyses were carried out in three different scenarios: TB vs. LTBI + control, TB vs. LTBI, and TB vs. control. The biomarker models using lipid-related genes in the datasets E-MTAB-8290 (54 TB, 127 control/non-TB), GSE107991 (21 TB, 12 controls, 21 LTBI), and GSE101705 (28 TB, 16 LTBI) were trained and validated using the same approach. In particular, the gene expression of matched lipid-related genes in four different data sets were utilized for the ROC-curve-based exploratory analysis. In all analyses, the area under the curve (AUC) and 95% confident interval (CI) of the best models are reported. The analyses were performed in the “Biomarker Analysis” module of MetaboAnalyst 5.0.

Correlation network analysis

Normalized expression levels of lipid biomarker candidates were visualized by correlation network analysis in the R package corrr (version 0.4.3). The network shows variables as nodes and their association as edges. The proximity of two nodes is determined by their correlation strength; their locations (or Euclidean coordinates) are found by multidimensional scaling. This method reduces the number of data dimensions to facilitate variable visualization.

Functional analysis

Biomarker candidate data were submitted to Lipid Ontology (LION) for lipid ontology enrichment analysis via the “LION-PCA heatmap” module40. In addition, lipid-gene association networks were analyzed using Lipidsig and lipid-genes were extracted41. For visualization, the R package ggplot2 (version 3.3.5) was used.

Results

Lipid profiles of TB patients are distinguishable from lipid profiles of controls

PCA was performed in positive ion mode to explore sample tendencies independent of sample source. The analysis was conducted using a total of 3791 detected lipid features of TB and control samples, with and without QC samples. In the PCA scores plot with QC samples, all QC samples clustered tightly together (Fig. S1A), and the relative standard deviation of the raw total ion chromatogram among QC samples was only 6.5%. These data indicated satisfactory repeatability of the untargeted lipid profiling analysis, which allowed subsequent data analyses and interpretation. In the PCA scores plot with TB and control samples, the three first PCs explained 52.1% of the variance: 23.2%, 21.1%, and 7.8% for PCs 1, 2, and 3, respectively. The relative separation of samples into two separate groups is evident in the three-dimensional PCA plot (Fig. 1A). Heatmap analysis captured relative differences between the two groups at the feature level; differences in lipid features were relatively clear (Fig. 1B). In addition, PCA analysis, which included 762 detected features, were also conducted in negative ion mode. Similar to positive ion mode, the QC samples clustered together (relative standard deviation of total ion chromatogram: 5.6%, Fig. S1B). The three-dimensional PCA plot indicated relative separation of the samples into two groups (Fig. 1C). At the feature level in the heatmap, we could also notice a proportionately contrast between the two groups (Fig. 1D). Taken together, the data exploration analyses in positive and negative ion modes indicated considerable differences between the lipid profiles of TB patients and of controls.

Figure 1
figure 1

Plasma lipidome data visualization of Tuberculosis patients (N = 35) and Control (N = 37) group. (a) Principal components analysis 3D score plot of the two group in the positive ion mode. (b) Heatmap of all lipidome features between two group in the positive ion mode. (c) Principal components analysis 3D score plot of the two group in the negative ion mode. (d) Heatmap of all lipidome features between the two group in the negative ion mode. C control group, T Tuberculosis group.

Univariate and multivariate analyses suggest numerous lipid biomarker candidates

Data exploration indicated considerable differences in lipid metabolic profiles between TB patients and controls, but a more sophisticated statistical approach was needed to identify lipids that could be regarded as biomarker candidates. Supervised investigation was conducted using PLS-DA and the lipid profiles of TB patients and controls. In positive ion mode, a PLS-DA model with five components classified the two groups with appropriate performance metrics (Fig. 2A, accuracy = 0.90, R2 = 0.92, and Q2 = 0.58). Similarly, in negative ion mode, a PLS-DA model with five components provided satisfactory classification (Fig. 2B, accuracy = 0.90, R2 = 0.95, and Q2 = 0.62) (Fig. S2A,B). The VIP score of the first PC, which explained the most sample variance, was extracted as an additional metric of biomarker candidate potential. A VIP score ≥ 1.2 was determined for 821 (21.66%) and 139 (15.71%) features in positive and negative ion modes, respectively. Finally, the random forest model demonstrated satisfactory performance in distinguishing the two groups. The cross-validated out-of-bag errors were 19.7% and 11.3% for positive and negative ion modes, respectively.

Figure 2
figure 2

Partial least squares-discriminant analysis (PLS-DA) score plots of Tuberculosis patients and controls plasma lipidome. (a) PLS-DA 3D score plot of the two group in the positive ion mode. (b) PLS-DA 3D score plot of the two group in the negative ion mode. C control group, T Tuberculosis group.

Univariate analysis using a t-test was employed to further explore potential biomarker candidates. In positive ion mode, 752 significant features (351 up- and 401 downregulated in TB patients) were found based on an FDR threshold of 0.05. Among these features, 743 (343 up- and 400 downregulated in TB patients), 404 (115 up- and 289 downregulated in TB patients), and 195 (51 up- and 144 downregulated in TB patients) exceeded the FC thresholds of 1.2, 1.5, and 2.0, respectively. In negative ion mode, 175 significant features (94 up- and 81 downregulated in TB patients) were identified with an FDR threshold of 0.05. With the FC thresholds of 1.2, 1.5, and 2.0, 156 (78 up- and 78 downregulated in TB patients), 74 (23 up- and 51 downregulated in TB patients), and 33 (8 up- and 25 downregulated in TB patients) features were selected, respectively. The volcano plots in Supplementary Fig. S3A (positive ion mode) and S3B (negative ion mode) show significant features based on the FC threshold of 1.5 and FDR threshold of 0.05.

The intersection of two criteria, a VIP score (PC1, PLS-DA model) of ≥ 1.2 and an FC (t-test) of ≥ 1.5, revealed 89 and 28 potential biomarker candidates in positive and negative ion modes, respectively. Among the selected features, 73 (positive ion mode) and 26 (negative ion mode) were successfully annotated as lipids, thus yielding 93 non-overlapping lipid biomarkers (Table 1).

Table 1 Statistics information of the potential biomarkers for TB versus control distinguish.

Internal and external validation indicate satisfactory performance of lipid biomarker candidates

Annotated lipid biomarker candidates were first subjected to univariate biomarker analysis. The ROC curves for those candidates were significantly associated with the TB status (Supplementary Table S5). Among 93 candidate lipid biomarkers, 21 had AUC values < 0.7, whereas 72 were considered promising (AUC ≥ 0.7); of the 72, 13 were considered good (AUC ≥ 0.8) and 2 were considered excellent (AUC > 0.9). The “excellent” lipid biomarkers were two ether-linked phosphatidylethanolamines: PE(O-38:5) and PE(O-40:5). The “good” biomarker candidates were from six lipid sub-classes: two phosphatidylcholine (PC), PC(36:0) and PC(38:7); two ether-linked phosphatidylcholines (PC(O-)), PC(O-36:0) and PC(O-34:0); two ether-linked lysophosphatidylethanolamines (LPE(O-)), LPE(O-16:1) and LPE(O-18:1); two phosphatidylethanolamines (PE), (PE(36:1) and PE(38:4); one PE(O-), PE(O-40:5); and four free fatty acids (FAs), FA(20:3), FA(20:5), FA(22:5), and FA(22:6). Multivariate biomarker analysis using the random forest method revealed that models with 93 variables had the best performance (AUC = 0.921, 95% confidence interval [95% CI] 0.834–0.987) (Fig. 3A). The result of the linear SVM method was approximately similar to the result of the random forest method (Fig. 3B). Correlation analysis showed a significant linear correlation among biomarkers for both TB patients (Fig. 3C) and controls (Fig. 3D), suggesting that a small number of lipids could be used as biomarkers to differentiate TB patients from controls.

Figure 3
figure 3

Lipid biomarkers multivariate and correlation analysis. (a) Random Forest predictive model of the lipid biomarkers. (b) Linear Support Vector Machine predictive model of the lipid biomarkers. (c) Correlation of the lipid biomarkers in Tuberculosis group (d) Correlation of the lipid biomarkers Control group. Var variable, AUC area under the curve, CI confidence interval, CAR acylcarnitine, Cer ceramide, Hex2Cer hexosylceramide, LPC lysophosphatidylcholines, LPC (O-) Ether-linked lysophosphatidylcholines, PC phosphatidylcholine, PC (O-) Ether-linked phosphatidylcholine, LPE lysophosphatidylethanolamines, LPE (O-) Ether-linked lysophosphatidylethanolamines, PE phosphatidylethanolamine, PE (O-) Ether-linked phosphatidylethanolamine, PI phosphatidylinositol, NAE N-acetyl ethanolamine, DG diacylglycerol, TG triacylglycerol, FA free fatty acid.

To rule out the possibility that the internal validation overestimated candidate biomarker performance, external validation was conducted using the dataset from Cho et al. (21 active TB patients, 20 patients with latent infections, 28 controls)21 but restricted to overlapping lipids (i.e., one LPC, 4 PCs, and 7 PC(O-)s). The validation was first conducted by dividing the samples into active TB vs. non-TB (patients with LTBI and controls) groups. In the univariate ROC analysis, six lipids exhibited an AUC ≥ 0.7. While LPC(20:3) and PC(O-34:0) were unable to differentiate between groups, the AUC of PC(O-40:4) and PC(O-42:5) was 1. Satisfactory performance (AUC 1, 95% CI 1–1) was obtained using the random forest model. Similar results were achieved in the comparison of active TB vs. LTBI or TB vs. control (Table 2). The results of the external validation partially supported the results of correlation network analysis, i.e., only a few lipids could differentiate TB from LTBI or controls.

Table 2 External validation performance of 12 overlapping lipid biomarkers.

Functional analysis reveals profound lipid metabolic alterations in TB patients

The LION ontology results indicated that PC(O-), PC, and PE were generally enriched in the TB group, whereas FAs and triacylglycerols (TAGs) with longer acyl chains were downregulated. The TB group also exhibited enrichment of lipids associated with mitochondrion, endoplasmic reticulum, and membrane components; it showed decreased levels of LD-related lipid species (Fig. 4A).

Figure 4
figure 4

Lipid ontology enrichment and lipid-gene association network analysis. (a) Lipid ontology (LION) PCA-heatmap of Tuberculosis and Control group. (b) Bubble plot of lipid-gene association pathways. C control group, T Tuberculosis group, PC phosphatidylcholine, TG triacylglycerol, LION Lipid ontology.

The reported biomarker candidates belonged to 15 (sub)-classes, of which six contained ≥ 3 lipid species. Those six sub-classes were subjected to lipid-gene analysis to identify TB-associated functional dysregulation and potential gene biomarkers. As shown in Fig. 4B, numerous pathways were altered. Significantly enriched biological processes included the PI3K-Akt, Rap1, calcium, and chemokine signaling pathways. Ether lipid metabolism, fat digestion and absorption, linolenic acid metabolism, and cholesterol metabolism were also altered. The full list of altered pathways and associated genes is provided in Supplementary Table S6.

Lipid-genes are excellent biomarkers for differentiating active TB from its counterparts

Among dysregulated pathways detected in the lipid-gene analysis (p < 0.01), 162 unique genes were identified. These genes were tested for their ability to differentiate active TB from LTBI or controls in three different data sets. The random forest classifier established from the expression of lipid-related genes was able to distinguish TB from its counterparts in three different TB cohorts, with an AUC ranging from 0.829 (95% CI 0.707–0.931, E-MTAB-8290) to 0.958 (95% CI 0.909–1, GSE101705) (Fig. 5A–D). The linear SVM model showed similar results (Fig. S4A–D). However, most genes exhibited a small FC.

Figure 5
figure 5

Tuberculosis (TB) and non-TB classification in three cohort by lipid-genes biomarkers using Random Forest predictive model. (a) Model performance (AUC = 0.919) of TB versus Control classification in GSE107991 dataset. (b) Model performance (AUC = 0.884) of TB versus latent tuberculosis infection (LTBI) classification in GSE107991 dataset. (c) Model performance (AUC = 0.829) of TB versus non-TB classification in E-MTAB-8290 dataset. (d) Model performance (AUC = 0.958) of TB versus Control classification in GSE101705 dataset. Var variable, AUC area under the curve, CI confidence interval, TB Tuberculosis, LTBI Latent tuberculosis infection.

Discussion

This study demonstrated differences in the lipidomes of TB patients and non-TB controls; it revealed 73 and 26 potential biomarker candidates, identified in positive and negative mode, respectively (six biomarkers were detected in both). In TB patients, the biomarkers had at least a 1.5-fold difference (in either direction). Among the significantly altered lipid sub-classes, ceramide (Cer), LPC, PC(O-), and PE were generally upregulated; certain PCs, diacylglycerols, and FAs were downregulated in TB patients. TAGs with shorter acyl chains were strongly increased in TB patients, while TAGs with longer acyl chains were decreased. The biomarkers mostly belonged to the lipid classes PC, PC(O-), PE, PE(O-), FA, and TAG, suggesting that these lipid classes are important in TB pathophysiology.

Among the putative lipid biomarkers, 12 were matched with previously reported lipid profiles that utilized a targeted approach21. PC(O-40:4), PC(O-42:5), PC(36:0), and PC(34:4) were prominent biomarker candidates identified by internal and external validation. Besides, our biomarkers showed concordance partially with the top biomarkers reported by Chen et al. and Han et al. in terms of lipid species. PC and PC (O-) were dominant in the top list, which suggests the importance of these lipids in differentiating TB from the controls23,26. However, different biological and technical factors can be attributed to the heterogeneity of the findings among studies, such as cohort characteristics, genetic background, sample treatment and instrumental data acquisition, and utilized statistical methods.

Lipid-related genes were associated with various TB pathologically comparable pathways; they also formed a distinctive signature that differentiated active TB from LTBI and other non-TB controls. Although gene expression was only subtly altered, it provided a consistent signature. Among the 162 lipid-related genes, 96 genes were found to be differentially expressed in TB versus non-TB in at least one comparison (Supplementary Table S7); these genes are presumably involved in crucial biological processes that underlie TB pathophysiology. For example, differentially expressed genes related to the biochemical regulations of PC, which is profoundly changed in TB21,23,26, included CHPT1, LPCAT2, LPCAT4, PLA2G4A, PLA2G4C, PLD2, PLD4, MBOAT2, ADORA2A and ADORA2B.

A significant increase in Cer(d34:1) was identified in TB patients. This biomarker has been previously reported to exhibit consistently higher levels in TB patients than in healthy individuals or patients with other respiratory diseases23,24,42,43. Shivakoti et al. showed that Cer(d34:1) is also related to TB treatment outcome; patients with the highest Cer levels had an increased risk of treatment failure27. The involvement of Cer in host immune responses against Mtb—via immune cell activation, phagocytosis, and other mechanisms—might explain its higher level in TB patients than in controls24,44,45.

Although Mtb relies on different carbon sources at different stages of pathogenesis, host lipids are generally the primary carbon source for Mtb in vivo46. The genome of Mtb laboratory strain H37Rv has > 250 genes related to lipid metabolism47. The infection of macrophages by Mtb triggers the formation of foamy macrophages through the accumulation of lipid bodies of TAGs and cholesterol esters48,49. Consistent with previous findings23,26, our study showed a decrease in TAGs in host plasma; this may be related to the uptake of host TAGs into foamy macrophages to form LDs, which can serve as nutrient sources14,48. While LD formation may be a host-driven immune response rather than an Mtb-mediated process19, the resulting physiological changes in Mtb lead to TAG accumulation, LD formation, growth reduction, decreased metabolic activity, and development of phenotypic drug resistance; these processes are associated with the persistent and non-dividing stages of Mtb50. We also found the downregulation of FAs in TB patients; this hinders the formation of longer-chain FAs that form the main components of Mtb cell wall lipids51. Fas I/II-induced elongation could partially explain the decrease in plasma FAs.

Lysophosphatidylcholine acyltransferase 2 (LPCAT2) induces LD accumulation in cancer patients during the onset of chemoresistance52. Our study is presumably the first to report an association of the upregulation of LPCAT2 with metabolic alterations in TB patients vs. non-TB controls, suggesting that LPCAT2 can serve as a biomarker in the diagnosis of TB.

There is also emerging evidence concerning the crucial role of metabolism in host–pathogen dynamics, with the transcription factor PPAR (peroxisome proliferator-activated receptor) implicated in LD buildup during inflammation and infectious diseases53,54,55. Our analysis demonstrated the enrichment of several lipid-related genes associated with PPAR signaling pathways; these genes include PPARA, CD36, FABP4, and ACSL156. Lipid mediators, cytokines, and chemokines may act in a paracrine manner to induce LD formation14,57.

We also identified the involvement of lipids and lipid-related genes in chemokine signaling (CDC42, FGR, IKBKB, RAC1), ether lipid metabolism, glycerophospholipid metabolism, sphingolipid metabolism, and phospholipase D signaling pathways. In a mouse model of TB58 and a study of T cells from TB patients59, Mtb was found to inhibit host proinflammatory cytokine production through the PI3K-Akt signaling pathway. In the ontology analysis of lipid genes, the PI3K-Akt signaling pathway exhibited the most significant functional dysregulations in TB patients. Together, these observations support a significant role of lipid metabolism and lipid-related genes in the host immune response.

Our study had several limitations. First, its focus was on the discovery and validation of lipid biomarkers; however, there is evidence to support the use of hydrophilic metabolites (e.g., glutamic acid and glutamine) as biomarkers21. The combined use of these metabolites and lipids would significantly improve the detection of active TB in clinical settings. Second, other infectious respiratory diseases (e.g., community-acquired pneumonia) were not included in the lipidomics analysis. Nevertheless, some markers were able to reliably distinguish TB and LTBI. Subsequent studies should examine the differential diagnostic performance of those biomarker candidates in other infectious respiratory diseases. Third, quantitative information is available for some biomarker candidates, based on isotopically labeled internal standards at ratios relative to human plasma. However, an accurate quantification strategy (e.g., AdipoAtlas60) is needed to facilitate clinical application. This can be readily achieved through targeted analysis of a subset of the most promising biomarkers. Fourth, through our exploratory analysis, we enable the identification of several altered lipids and lipid genes, as well as lipid-related metabolism and immune response pathway in TB patients. Experimental studies on in vitro or animal models are required to substantiate our findings. Finally, a prospective validation cohort study with actual concentrations of lipid biomarkers is required to examine the relevance of the identified biomarkers in TB manifestations.

In summary, our study identified and validated lipid-focused biomarkers. Multiple data mining methods with lipidome and lipid-related transcript signatures were used to obtain robust biomarkers and gain new mechanistic insights into TB. Lipid species that belonged to the PC(O-), PCs, TAGs, FAs, and Cer were identified as excellent candidate biomarkers. PC(O-40:4), PC(O-42:5), PC(36:0), and PC(34:4) were externally validated and had a good performance. Additionally, our study revealed systemic and multi-omics levels of biologically relevant processes involved in host responses to Mtb infection. Overall, comprehensive omics analyses employing a data-driven, knowledge-based approach can support metabolism-centric biomarker discovery and validation.