Host‐Variable‐Embedding Augmented Microbiome‐Based Simultaneous Detection of Multiple Diseases by Deep Learning

Microbiome has emerged as a promising indicator or predictor of human diseases. However, previous studies have typically labeled each specimen as either healthy or with a specific disease, ignoring the prevalence of complications or comorbidities in actual cohorts, which may confound microbial–disease associations. For instance, a patient may suffer from multiple diseases, making it challenging to detect their health status accurately. Furthermore, host phenotypes like physiological characteristics and lifestyles can alter microbiome structure, but such information has not yet been fully utilized in data models. To address these issues, a highly explainable deep learning (DL) method called Meta‐Spec is proposed. Using a deep neural network (DNN)‐based approach, it encodes and embeds refined host variables with microbiome features, enabling the detection of multiple diseases simultaneously. Experiments show that Meta‐Spec outperforms regular machine learning (ML) strategies for multilabel disease screening in several cohorts. More importantly, Meta‐Spec successfully detects comorbidities that are often missed by other approaches. In addition, owing to its high interpretability, Meta‐Spec captures key factors that shape disease patterns from host variables and microbes. Hence, these efforts improve the feasibility and sensitivity of microbiome‐based disease screening in practical scenarios, representing a significant step toward personalized medicine and better health outcomes.


Introduction
The dynamics of the human microbiome are closely associated with numerous diseases. [1–5] Typically, ML classifiers are trained on taxonomic or functional features of microbiomes from different health conditions to build classification models that predict the status of new samples. In this scenario, research cohorts are usually designed so that each specimen is marked with only one definitive status, denoted as a "label," that is, either healthy or with a specific disease. [6,7] This design reduces the confounding effect of complex factors and makes "single-label classification" a typical strategy (Figure 1a). However, complications or comorbidities are prevalent in actual cohorts (Figure 1b). For example, in the American Gut Project (AGP), [8] 61% of patients were diagnosed with at least two diseases. Our recent study has shown that gut microbiomes with comorbidities can have microbial patterns distinct from those with a single disease, even though they share common biomarkers. [9] As a result, comorbidities can significantly disrupt disease detection. On the other hand, the lifestyle and physiological variables of human hosts are strongly connected to various diseases, which can also interfere with microbiome-based status recognition. Age, for instance, is one of the major risk factors for cardiovascular disease [10] and is also associated with Crohn's disease. [11]
Moreover, body shape and BMI can act as predictors of metabolic syndrome [12] and type 2 diabetes. [13] These host variables provide crucial information that can aid disease prediction and diagnosis. Nevertheless, many existing ML classifiers focus solely on microbiome features, such as microbial diversity, abundance, and composition, while overlooking the potential of metadata in disease screening. Although these microbial features are important for disease screening and prediction, they are not the only factors that contribute to disease risk.
Here we present Meta-Spec, an explainable deep learning method for multilabel disease classification using microbiome data. Meta-Spec is based on a multigate mixture-of-experts (MMoE) model [14] and cross networks, [15] and detects multiple diseases simultaneously by integrating genotype data (microbiome features derived from sequences) and phenotype data (host variables). In addition, unlike other neural network-based methods that lack interpretability, Meta-Spec quantifies the importance of confounding factors for each disease by their relative contribution to status classification. Experiments on multiple cohorts showed that our method outperformed widely used ML classification strategies in comorbidity screening and in capturing disease correlations, while also providing insights into the underlying mechanisms of each specific disease.

Multilabel Disease Classification Based on Multitask Deep Learning
The framework of Meta-Spec for multilabel classification is illustrated in Figure 1c. During model training, microbial features (e.g., taxa, amplicon sequence variants (ASVs), operational taxonomic units (OTUs), functional gene families, etc.) are treated as dense features, while host variables (e.g., physiological characteristics and lifestyle habits) are treated as sparse categorical features and transformed into high-dimensional embedding vectors (details provided in the Experimental Section). A concatenation layer then merges the dense features and embedding vectors, followed by an MMoE layer that learns microbiome associations among diseases and two cross networks that capture microbial and host-variable interactions, respectively (refer to Experimental Section for details). Finally, the tower networks calculate each disease's probability by combining the outputs of the MMoE and cross networks. Thus, given a new microbiome and its metadata, the Meta-Spec classifier generates a binary array summarizing the prediction results, where each bit represents the presence of a particular disease (Figure 1b,c).
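The final step, turning per-disease tower probabilities into a binary array, can be pictured with a small sketch. It assumes a fixed threshold of 0.5 on each tower's sigmoid output; the text does not state which threshold Meta-Spec uses, and the probabilities below are invented for illustration.

```python
from typing import Dict, List, Sequence

def to_binary_array(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-disease tower probabilities into a presence (1) / absence (0) bit array."""
    return [1 if p >= threshold else 0 for p in probs]

def describe(bits: Sequence[int], diseases: Sequence[str]) -> Dict[str, bool]:
    """Pair each bit with its disease name so the prediction is human-readable."""
    if len(bits) != len(diseases):
        raise ValueError("one bit per target disease is expected")
    return {name: bool(bit) for name, bit in zip(diseases, bits)}

# Dataset 1 target diseases, in a hypothetical fixed order.
diseases = ["autoimmune", "lung", "thyroid", "cancer", "IBD", "cardiovascular", "ASD"]
# Made-up tower outputs for one new sample (not real Meta-Spec predictions).
probs = [0.12, 0.08, 0.71, 0.03, 0.64, 0.55, 0.01]

bits = to_binary_array(probs)
print(bits)                             # [0, 0, 1, 0, 1, 1, 0]
print(describe(bits, diseases)["IBD"])  # True
```

A sample predicted positive for several bits at once is exactly the comorbidity case that a single-label classifier cannot express.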

Deep Neural Network Largely Improves the Multilabel Disease Classification
We evaluated the efficacy of Meta-Spec in multilabel disease screening using Dataset 1 (Table 1; see Experimental Section for details), which was produced by the American Gut Project. [8] To minimize the impact of country on the gut microbiome, [16] we used only the US cohort. This dataset contained 5308 subjects, including 3767 patients and 1541 healthy controls. We filtered out invasive inspection information (e.g., blood) and kept only questionnaire-based host variables. Each patient in this dataset was diagnosed with at least one of 7 target diseases: autoimmune disease, lung disease, thyroid disorder, cancer, inflammatory bowel disease (IBD), cardiovascular disease, and autism spectrum disorder (ASD). Although previous studies have demonstrated associations between the gut microbiome and these target diseases, [17][18][19] status screening can be significantly affected by the combined influence of multiple diseases. We used 5-fold cross-validation to perform multilabel disease classification with Meta-Spec and other ML classifiers, including logistic regression (LR; a typical linear model), random forest (RF; commonly used in microbiome studies [5,20]), light gradient boosting machine (LGB; a gradient boosting framework developed by Microsoft [21,22]), and multilayer perceptron (MLP; a deep learning model with two hidden layers). Performance was evaluated using the area under the receiver operating characteristic curve (AUROC). In addition, since the number of samples was highly imbalanced among hosts and labels, we also compared performance using the area under the precision-recall curve (AUPRC), which is more sensitive to class imbalance.
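To make the two metrics concrete for a single disease label, here is a pure-Python sketch. AUROC is computed as the probability that a random patient outscores a random control, and AUPRC is estimated by average precision (the estimator behind scikit-learn's `average_precision_score`); the paper does not say which AUPRC interpolation it used, so that choice is an assumption, and the labels and scores are toy values.

```python
from typing import Sequence

def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive sample outscores a random negative one (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each true positive."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one positive sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos

# Toy imbalanced label: 2 patients among 8 subjects (prevalence 0.25).
y = [1, 0, 1, 0, 0, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(round(auroc(y, s), 3))              # 0.917
print(round(average_precision(y, s), 3))  # 0.833
```

A random scorer has an expected AUROC of 0.5 regardless of class balance, whereas its expected AUPRC equals the prevalence (0.25 here), which is why AUPRC exposes weak performance on rare labels such as ASD more clearly.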
The results presented in Figure 1d demonstrate that, using only the microbiome ASVs, the regular ML methods achieved low AUROC values, likely due to the confounding effects of host factors or multidisease interactions. When metadata were included through one-hot coding (denoted by a '+' symbol in Figure 1), the performance of all models increased substantially, highlighting the significant role of host variables in disease detection. Meanwhile, the competing methods still performed poorly in detecting ASD and thyroid disorder, as indicated by the per-disease AUPRC values in Figure 1e. Meta-Spec achieved the best overall AUROC and AUPRC (Figure 1d), outperforming all other models by a significant margin. This can be attributed to the deep neural network of Meta-Spec, which exploits the associations of microbiome patterns among diseases during model training (Figure 1f). For example, the network reported a positive correlation (PCC = 0.60; Pearson correlation coefficient) between IBD and cardiovascular disease, and a negative correlation (PCC = −0.84) between IBD and ASD in Dataset 1. An ablation experiment further confirmed the effectiveness of using MMoE and cross networks in Meta-Spec (Table S1, Supporting Information).
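The PCC values above are ordinary Pearson correlations between per-disease vectors. The text does not specify which vectors are correlated; the sketch below assumes, purely for illustration, that each disease is represented by its mean MMoE gate weights over the experts, and the numbers are invented.

```python
import math
from typing import Dict, Sequence, Tuple

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sa * sb)

def pairwise_pcc(profiles: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """PCC for every unordered pair of diseases, given one vector per disease."""
    names = list(profiles)
    return {(p, q): pearson(profiles[p], profiles[q])
            for i, p in enumerate(names) for q in names[i + 1:]}

# Hypothetical mean gate weights over four experts (each row sums to 1).
gates = {
    "IBD": [0.1, 0.2, 0.3, 0.4],
    "cardiovascular": [0.15, 0.2, 0.3, 0.35],
    "ASD": [0.4, 0.3, 0.2, 0.1],
}
for pair, r in pairwise_pcc(gates).items():
    print(pair, round(r, 2))
```

Diseases whose gates favor the same experts get a positive PCC, and diseases relying on disjoint experts get a negative one.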
We also validated Meta-Spec on Dataset 2 (Table 1), collected from the Guangdong Gut Microbiome Project (GGMP), [23] which consists of 5347 subjects (refer to Experimental Section for details). Patients were diagnosed with at least one target disease among metabolic syndrome, gastritis, type 2 diabetes mellitus (T2DM), and gout. Results for Dataset 2 followed the same trend as those for the AGP US cohort, that is, low AUPRC using OTUs only, which rose significantly when metadata were added to the classification (Figure S1, Supporting Information), verifying the superiority of the multilabel classification strategy in Meta-Spec (Figure S2, Supporting Information). In addition, Meta-Spec revealed a positive connection between gastritis and gout in Dataset 2 (PCC = 0.43; Figure S3, Supporting Information).

Disease Correlation Being Crucial for Comorbidity Detection
Comorbidities are prevalent in actual cohorts: of the 3767 patients in the Dataset 1 cohort, 1360 were identified as having two or more diseases. Because such comorbidities often go unnoticed by regular classification strategies, we further assessed the ability of different methods to detect them. To do so, we divided the patients of Dataset 1 and Dataset 2 into two groups (Figure 2a): the single-disease group, with only the target disease, and the comorbidity group, with additional disease(s). Comorbidity detection results were based on multilabel classification and took into account performance on both the target disease and the comorbidities.
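The grouping in Figure 2a can be sketched as a simple split over the multilabel matrix. In this illustration `labels[i][k]` is 1 if subject i has disease k; the column order and the toy subjects are hypothetical.

```python
from typing import List, Sequence, Tuple

def split_by_comorbidity(labels: Sequence[Sequence[int]], target: int) -> Tuple[List[int], List[int]]:
    """Split patients carrying the target disease into single-disease and comorbidity groups.

    labels[i][k] is 1 if subject i has disease k; returns two lists of subject indices.
    """
    single, comorbid = [], []
    for i, row in enumerate(labels):
        if row[target] != 1:
            continue  # subject does not have the target disease
        (single if sum(row) == 1 else comorbid).append(i)
    return single, comorbid

# Columns: IBD, cardiovascular, ASD (toy data).
labels = [
    [1, 0, 0],  # IBD only
    [1, 1, 0],  # IBD + cardiovascular
    [0, 1, 0],  # cardiovascular only
    [1, 0, 1],  # IBD + ASD
    [0, 0, 0],  # healthy control
]
single, comorbid = split_by_comorbidity(labels, target=0)
print(single, comorbid)                            # [0] [1, 3]
print(sum(1 for row in labels if sum(row) >= 2))   # 2 subjects with two or more diseases
```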
As illustrated in Figure 2b, when trained with microbiome structure and host variables, Meta-Spec achieved significantly higher AUPRC than the other classifiers, and its AUROC for comorbidities (Table S2, Supporting Information) further validated its superiority. This advantage mainly stemmed from the disease associations inferred by the MMoE gates (Figure 1f) and the combination of multiple host variables by the cross network in Meta-Spec. In contrast, the competing methods could miss comorbidities when focusing solely on target diseases such as IBD, autoimmune disease, and thyroid disorder (Figure 2c).

Feature Selection and Variable Refinement for Multilabel Disease Screening
To address the limited interpretability of most neural networks, Meta-Spec quantifies the contribution of microbial members and confounding factors to recognizing disease patterns using a Meta-Spec importance value (MSI; refer to Experimental Section for details). Sorted by MSI, host variables were the dominant features for disease screening in Dataset 1, although their values varied among diseases (Figure 3a and S4, Supporting Information). Taking cardiovascular disease as an example (Figure 3a), age was the most important feature for its detection. Specifically, older age, artificial sweetener consumption, and constipation were associated with greater susceptibility to cardiovascular disease. [10,24,25] The high MSI rankings of these host phenotypes also explain why adding them improved the performance of RF and LGB for cardiovascular disease classification (Figure 1e). Besides, some features derived from microbiome sequences also helped distinguish the disease, including ASV 1 (Escherichia_Shigella) and ASV 28 (Bacteroides), which have been reported previously, [26,27] although they ranked lower in importance in the Meta-Spec model.
By analyzing the most contributing features (top 20% MSI of host variables and microorganisms) of each disease, we observed that over 80% of them were shared by at least two diseases in both datasets (Figure 3b and S5, Supporting Information; Table 2). Such nonspecific associations among gut microbiota, host variables, and diseases [28] further explain the limitation of biomarker-based and regular ML-based strategies in comorbidity detection. Among them, a few factors ranked highly for most diseases, such as BMI, bowel movement quality, probiotic frequency, ASV 1 (Escherichia_Shigella), and ASV 10 (Bacteroides) in Dataset 1 (Figure 3b), and district, medication, OTU 4425571 (Escherichia_Shigella), and OTU 136025 (Ralstonia) in Dataset 2 (Figure S5, Supporting Information). On the other hand, we also found distinctive features that were sensitive to only a single disease. For instance, seafood frequency was strongly associated with cardiovascular disease, while OTU 4478762 (Lacticigenium) and BMI contributed to metabolic syndrome detection. Nevertheless, although these features were important for disease detection, none of them could serve as a disease indicator on its own. For example, the universal markers age, waist size, medication, and ASV 1 achieved AUPRCs of only 0.300, 0.248, 0.173, and 0.162, respectively, in status prediction (Figure S6, Supporting Information), far lower than the overall performance. Hence, proper classification methods and models are essential to fully exploit these important factors.
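The "shared by at least two diseases" statistic can be sketched as follows. The denominator behind "over 80%" is not stated, so this sketch assumes it is the pooled set of distinct top features across diseases; rounding the top 20% up and the toy MSI tables (left unnormalized, which does not affect ranking) are also assumptions.

```python
import math
from collections import Counter
from typing import Dict, Set

def top_features(msi: Dict[str, float], top_percent: int = 20) -> Set[str]:
    """Features in the top `top_percent`% of MSI for one disease (at least one feature)."""
    k = max(1, math.ceil(len(msi) * top_percent / 100))
    return set(sorted(msi, key=msi.get, reverse=True)[:k])

def shared_fraction(msi_by_disease: Dict[str, Dict[str, float]], top_percent: int = 20) -> float:
    """Share of pooled top features that are top features for two or more diseases."""
    counts = Counter(f for msi in msi_by_disease.values() for f in top_features(msi, top_percent))
    shared = [f for f, c in counts.items() if c >= 2]
    return len(shared) / len(counts)

features = ["BMI", "age", "ASV1", "seafood"] + [f"ASV{i}" for i in range(20, 26)]  # 10 features

def toy_msi(overrides: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical MSI table: every feature gets 1.0, then a few get larger values."""
    scores = {f: 1.0 for f in features}
    scores.update(overrides)
    return scores

msi_by_disease = {
    "IBD": toy_msi({"BMI": 30.0, "age": 25.0}),
    "thyroid": toy_msi({"BMI": 28.0, "ASV1": 20.0}),
    "cardiovascular": toy_msi({"age": 22.0, "seafood": 21.0}),
}
print(shared_fraction(msi_by_disease))  # 0.5 (BMI and age are shared; ASV1 and seafood are not)
```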
After preliminary manual curation of the original metadata, 71 host variables remained for model training and validation in Dataset 1. Since too many questionnaire items can hinder real applications, we performed a feature selection procedure to assess the effect of the amount of metadata on classification and thereby reduce the number of enrolled host variables. For Dataset 1, we first sorted all host variables by their mean MSI over all target diseases. Then, less important host variables were gradually eliminated, and model training and validation were repeated with Meta-Spec at each step. The performance curves (Figure 3c,d) show how multilabel classification performance depends on the number of enrolled host variables. With only 20 host variables, Meta-Spec still provided promising multidisease screening. This result is crucial for the practical application of Meta-Spec and enhances the potential of gut microbiomes in multiple disease screening with a small amount of easily collected host information.
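The refinement loop above (rank by mean MSI, then retrain with progressively fewer variables) can be outlined as below. `toy_evaluate` is a hypothetical stand-in for "train Meta-Spec with these host variables plus the microbial features and return the overall AUPRC"; its gains and the MSI tables are invented.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

def rank_by_mean_msi(msi_by_disease: Dict[str, Dict[str, float]]) -> List[str]:
    """Host variables sorted by their MSI averaged over all target diseases (highest first)."""
    tables = list(msi_by_disease.values())
    variables = list(tables[0])
    return sorted(variables, key=lambda v: mean(t[v] for t in tables), reverse=True)

def elimination_curve(ranked: Sequence[str],
                      evaluate: Callable[[Sequence[str]], float],
                      sizes: Sequence[int]) -> List[Tuple[int, float]]:
    """Retrain and evaluate with only the top-k variables, from the largest k to the smallest."""
    return [(k, evaluate(ranked[:k])) for k in sorted(sizes, reverse=True)]

msi_by_disease = {
    "IBD": {"age": 40.0, "BMI": 30.0, "diet": 5.0},
    "cardiovascular": {"age": 20.0, "BMI": 10.0, "diet": 5.0},
}
ranked = rank_by_mean_msi(msi_by_disease)  # ['age', 'BMI', 'diet']

def toy_evaluate(kept: Sequence[str]) -> float:
    """Hypothetical stand-in for model training plus validation on the kept variables."""
    gain = {"age": 0.20, "BMI": 0.10, "diet": 0.01}
    return 0.40 + sum(gain[v] for v in kept)

for k, score in elimination_curve(ranked, toy_evaluate, sizes=[3, 2, 1]):
    print(k, round(score, 2))
```

Plotting such (k, score) pairs gives curves like those in Figure 3c,d; the knee of the curve suggests how many questionnaire items are worth collecting.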

Meta-Spec with Hybrid Model Expanding the Application of Microbiome across Geographical Locations
Geography has been shown to have a significant impact on variation in the human microbiome. [16,23] Although a data model built from the local cohort is the most suitable option for microbiome-based detection, limited local training samples can pose a challenge to this approach. In such cases, adopting well-validated models from other locations becomes a more practical option. However, the applicability and compatibility of such models need to be taken into consideration. To study cross-cohort multilabel classification with Meta-Spec, we conducted fivefold cross-validation on the UK cohort of the AGP dataset (Table 1, Dataset 3). In each fold, we used one-fifth of the UK samples as the test set and two different sets for training (Figure 4a): 1) the local set, comprising the other four-fifths of the UK samples, and 2) the hybrid set, comprising the other four-fifths of the UK samples plus the US samples. Moreover, for both training sets, we varied the proportion of UK training samples from 10% to 100% to simulate a lack of local samples for modeling.
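The local and hybrid training sets for one fold can be sketched as follows. The sample IDs are hypothetical, and drawing an independent random UK subset for each proportion (rather than nested subsets) is an assumption; the text does not say how the subsets were drawn.

```python
import random
from typing import List, Sequence, Tuple

def five_folds(samples: Sequence[str], seed: int = 0) -> List[List[str]]:
    """Shuffle samples and deal them into five roughly equal folds."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::5] for i in range(5)]

def training_sets(uk_train: Sequence[str], us_all: Sequence[str],
                  uk_fraction: float, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Local set = a fraction of the UK training samples; hybrid set = that subset plus all US samples."""
    n_keep = max(1, round(len(uk_train) * uk_fraction))
    uk_subset = random.Random(seed).sample(list(uk_train), n_keep)
    return uk_subset, uk_subset + list(us_all)

uk = [f"UK{i}" for i in range(50)]   # hypothetical sample IDs
us = [f"US{i}" for i in range(200)]
folds = five_folds(uk)
test = folds[0]
uk_train = [s for f in folds[1:] for s in f]  # the other four-fifths of the UK samples

for frac in [i / 10 for i in range(1, 11)]:  # 10%, 20%, ..., 100%
    local, hybrid = training_sets(uk_train, us, frac)
    # ...train one model on `local`, another on `hybrid`, and evaluate both on `test`...
    print(f"{frac:.0%}: local={len(local)}, hybrid={len(hybrid)}")
```

Both models are always scored on the same held-out UK fold, so any gap between them reflects only the added US training data.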
As expected, the UK local cohort with limited learning data (only 10% of the training set) exhibited poor classification performance (as shown in Figure 4b).However, when combined with microbiome features and host variables from the US cohort, the detrimental effect of data scarcity was mitigated, resulting in a significant improvement in cross-cohort classification results.By introducing more UK samples into both sets, we found that the hybrid set consistently outperformed the UK local cohort in terms of AUPRC and AUROC on the performance curve.Even with only 40% UK samples, the hybrid set achieved similar levels of performance as those obtained utilizing the entire UK local set.Furthermore, the performance of the local set, comprising 100% UK samples, can be further optimized and enhanced by incorporating cross-cohort data.Thus, despite the challenges posed by interlocation variations in the gut microbiome, host variable embedding and multicohort model integration offer significant technical advantages in bridging this gap in status classification.

Applicability of Meta-Spec on Regular Single-Label Classification
To further verify the applicability of Meta-Spec to the widely studied single-label classification, we also used Dataset 4, with 3391 metagenomes from multiple cohorts (Table 1; refer to Experimental Section for details). [29] Here each patient was labeled with a single disease from four categories: acute cerebral vascular disease (ACVD), colorectal cancer (CRC), Crohn's disease, and type 2 diabetes (T2D). With this dataset, Meta-Spec, RF, and LGB were compared in two types of single-label classification tests using fivefold cross-validation. 1) Binary classification, which only distinguishes disease samples from healthy controls. We also calculated the Gut Microbiome Health Index (GMHI) [29] for each sample to predict the likelihood of disease. As shown in Figure 5a, RF and LGB exhibited higher AUROCs than GMHI in binary status classification using microbiome taxonomy features. When four available host variables (gender, age, BMI, and geographical region) were further embedded in model training, the average AUROC improved to 0.97. 2) Multi-category classification, which specifies the detailed disease type. Figure 5b shows that the overall kappa coefficient of the prediction models was sharply enhanced by the additional host information. Therefore, compared with regular machine learning methods, Meta-Spec also offers superior performance in single-label disease screening.
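The multi-category comparison uses Cohen's kappa, which discounts the agreement expected by chance from the label frequencies. A minimal pure-Python version, with toy labels drawn from the four Dataset 4 categories:

```python
from collections import Counter
from typing import Hashable, Sequence

def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement beyond what matching label frequencies would give by chance."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty label sequences of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: a single class everywhere
    return (observed - expected) / (1.0 - expected)

y_true = ["CRC", "CRC", "T2D", "T2D", "ACVD", "Crohn"]
y_pred = ["CRC", "T2D", "T2D", "T2D", "ACVD", "Crohn"]
print(round(cohen_kappa(y_true, y_pred), 3))  # 0.769
```

Here raw accuracy is 5/6 ≈ 0.833, but kappa is lower (10/13 ≈ 0.769) because some agreement would occur by chance given how often each class appears.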

Conclusion and Discussion
For years, researchers have been fascinated by the potential link between the human microbiome and various diseases. [30,31] By studying microbial features, scientists hope to predict changes in human health status. However, this is a complex challenge, as disease interactions and variations in host lifestyle can interfere with the human microbiota. While the gut microbiota has been shown to play a critical role in human health, it is important to consider the impact of host information on disease screening. Although these variables have been considered in experimental design, cohort recruitment, multivariate statistics, and effect size measurement, they have not yet been included in microbiome-based modeling. By incorporating easily collected host phenotype data, such as diet, BMI, and age, disease screening models can achieve significantly higher sensitivity and precision than those relying solely on microbial features. On the other hand, ML has been increasingly used to develop prediction models. [32,33] However, many training processes still rely on traditional ML techniques, such as support vector machines (SVM) and RF, which do not take advantage of recent developments in ML or DL. Using multiple datasets and cohorts, we have demonstrated the benefits of a state-of-the-art deep neural network for multidisease classification of biological data with inherent complexity. Furthermore, because each label is treated as a separate task, Meta-Spec can be quickly updated to accommodate additional diseases with only minor modifications, whereas regular ML models require significant reconstruction. These efforts represent an important step toward understanding the underlying properties of unknown microbiomes.

Experimental Section
Host Variable Embedding: A microbiome sample can be denoted as a vector $x = (x_1, \ldots, x_h, x_{h+1}, \ldots, x_d)$, where the first $h$ features are sparse features (host variables) and the last $d - h$ features are dense features (microbial members). Since the number of sparse features is much smaller than the number of dense features ($h \ll d - h$), the effect of host variables can be diluted in modeling due to the imbalanced feature numbers. To tackle this problem, in Meta-Spec we encoded each sparse feature as an $m$-dimensional embedding vector ($m$ was set to 128), and all embedding vectors were then reintegrated with the dense features into a high-dimensional vector $c$.
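The embedding and concatenation step can be sketched with plain lookup tables. This assumes each host variable has already been encoded as an integer category (how continuous variables such as age or BMI are discretized is not described here), and the tables are only randomly initialized, whereas in Meta-Spec they are learned during training. The three host variables are hypothetical; 1168 matches the number of ASVs selected for Dataset 1.

```python
import random
from typing import List, Sequence

class HostVariableEmbedding:
    """One lookup table per categorical host variable; each category maps to an m-dim vector."""

    def __init__(self, cardinalities: Sequence[int], m: int = 128, seed: int = 0):
        rng = random.Random(seed)
        self.m = m
        # tables[v][c] is the vector for category c of host variable v.
        # A real model would update these by backpropagation; here they are only initialised.
        self.tables = [[[rng.gauss(0.0, 0.01) for _ in range(m)] for _ in range(card)]
                       for card in cardinalities]

    def embed(self, codes: Sequence[int]) -> List[float]:
        """Concatenate the embedding of each host variable's category code (length h * m)."""
        if len(codes) != len(self.tables):
            raise ValueError("expected one category code per host variable")
        out: List[float] = []
        for table, code in zip(self.tables, codes):
            out.extend(table[code])
        return out

def build_input(emb: HostVariableEmbedding, host_codes: Sequence[int], dense: Sequence[float]) -> List[float]:
    """Hypothetical concatenation layer: embedded host variables followed by dense microbial features."""
    return emb.embed(host_codes) + list(dense)

# Three hypothetical host variables: sex (2 levels), age group (5), BMI category (4).
emb = HostVariableEmbedding([2, 5, 4], m=128)
dense = [0.0] * 1168   # e.g., 1168 selected ASV abundances, as in Dataset 1
c = build_input(emb, [1, 3, 2], dense)
print(len(c))          # 3 * 128 + 1168 = 1552
```

With h = 3 the host variables now occupy 384 of the 1552 input positions instead of 3 of 1171, which is the dilution problem the embedding is meant to counter.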

Utilization of Microbiome Associations among Diseases Based on MMoE: To capture microbiome associations among diseases, an MMoE layer was introduced in our deep learning framework (Figure 1c). The MMoE layer contains multiple experts, each a DNN that takes the high-dimensional vector $c$ as input, and multiple gates that learn different mixtures of the experts to capture disease relationships. [14] The output of the $l$th expert was denoted by $f_l(c)$, and $w_g^k \in \mathbb{R}^L$ was the weight vector of the $k$th gate. The MMoE output for the $k$th tower network, $T^k(c)$, was obtained by weighting the outputs of the experts as in Equation (1). As shown in Equation (1), the MMoE explicitly models the microbiome relationships among diseases and learns a shared representation. It automatically learns parameters for shared information and allocates weights to different diseases.
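Equation (1) did not survive extraction. The sketch below follows the original MMoE formulation of Ma et al. [14]: gate k produces L mixture weights g^k(c) = softmax(W_g^k c), and the tower input is T^k(c) = Σ_l g^k(c)_l · f_l(c). Whether Meta-Spec parameterizes its gates exactly this way is an assumption, and the two linear "experts" and all weights are toy values.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]

def softmax(z: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Matrix-vector product for small nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mmoe_forward(c: Sequence[float],
                 experts: Sequence[Callable[[Sequence[float]], Vector]],
                 gate_matrices: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Return T^k(c) = sum_l g^k(c)_l * f_l(c) for every task k, with g^k(c) = softmax(W_g^k c)."""
    expert_out = [f(c) for f in experts]          # f_l(c), one vector per expert
    dim = len(expert_out[0])
    outputs = []
    for W_g in gate_matrices:                      # one gate (L x len(c) matrix) per disease
        g = softmax(matvec(W_g, c))                # L mixture weights summing to 1
        outputs.append([sum(g[l] * expert_out[l][j] for l in range(len(experts)))
                        for j in range(dim)])
    return outputs

# Two toy linear "experts" and two diseases (all weights are made up).
experts = [lambda c: [c[0], 0.0], lambda c: [0.0, c[1]]]
gates = [
    [[0.0, 0.0], [0.0, 0.0]],   # disease 0: uniform mixture of both experts
    [[10.0, 0.0], [0.0, 0.0]],  # disease 1: strongly prefers expert 0
]
out = mmoe_forward([1.0, 2.0], experts, gates)
print(out[0])  # [0.5, 1.0]
```

Because every disease sees the same experts but through its own gate, diseases that rely on similar microbial patterns end up with similar gate weights, which is the signal behind the disease correlations in Figure 1f.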
Utilization of Microbial and Host Variable Interaction Using Cross Networks: Cross networks were constructed to capture microbial interactions and host-variable interactions (Figure 1c). A cross network learns feature interactions automatically and efficiently. [15] For microbial interactions, a dense cross network was constructed for microbes by Equation (2), where $x_{dense} \in \mathbb{R}^{d-h}$ is the combination of all dense features, $W_{dense} \in \mathbb{R}^{(d-h)\times(d-h)}$ is a weight matrix, $b_{dense} \in \mathbb{R}^{d-h}$ is a bias vector, and $\cdot$ represents the Hadamard product.
Similarly, we developed a cross network to learn host-variable interactions by Equation (3), where $x_{emb} \in \mathbb{R}^{h \times m}$ is the combination of all embedded sparse features, $W_{emb} \in \mathbb{R}^{(h \times m)\times(h \times m)}$ is a weight matrix, and $b_{emb} \in \mathbb{R}^{h \times m}$ is a bias vector. As Equations (2) and (3) show, the cross network is simple, memory efficient, and easy to implement.
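Equations (2) and (3) were also lost in extraction. Because the text mentions a weight matrix, a bias vector, and a Hadamard product, this sketch assumes the DCN-V2-style cross layer x_{l+1} = x_0 ⊙ (W x_l + b) + x_l; the same function would be applied to x_dense and to the flattened x_emb. The number of stacked layers and the toy weights are assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]

def cross_layer(x0: Sequence[float], xl: Sequence[float], W: Matrix, b: Sequence[float]) -> List[float]:
    """x_{l+1} = x0 ⊙ (W x_l + b) + x_l, with ⊙ the Hadamard (element-wise) product."""
    proj = [sum(w * v for w, v in zip(row, xl)) + bi for row, bi in zip(W, b)]
    return [a * p + v for a, p, v in zip(x0, proj, xl)]

def cross_network(x0: Sequence[float], layers: Sequence[Tuple[Matrix, Sequence[float]]]) -> List[float]:
    """Stack cross layers; every layer re-multiplies by x0, raising the interaction order by one."""
    x = list(x0)
    for W, b in layers:
        x = cross_layer(x0, x, W, b)
    return x

identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
x0 = [1.0, 2.0]  # e.g., two microbial abundances (toy values)
print(cross_network(x0, [(identity, zero_bias)]))                         # [2.0, 6.0]
print(cross_network(x0, [(identity, zero_bias), (identity, zero_bias)]))  # [4.0, 18.0]
```

The residual term x_l keeps lower-order information, while the element-wise product with x_0 builds explicit pairwise (then higher-order) interactions without enumerating feature crosses by hand.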
For each tower network, the outputs of the two cross networks were concatenated with the corresponding MMoE output to form its input. Each tower network consisted of a fully connected layer with a sigmoid activation that outputs the final prediction. Additionally, an automatic weighted loss function [34] was employed to combine the multiobjective losses by Equation (4), where $c_k$ is the trainable weight of the $k$th task, $w_k$ denotes the network parameters, and $l_k(x, y_k, \hat{y}_k; w_k)$ is the loss function of the $k$th task. During model construction, unlike traditional ML approaches that treat input features as fixed constants, Meta-Spec continuously updates the embedding vectors over training iterations. Therefore, Meta-Spec can not only learn associations among diseases but also take advantage of sparse features, which helps it outperform traditional ML methods.
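Equation (4) is missing from the text. One widely used form of automatic multitask loss weighting (Liebel and Körner, 2018) is sketched below with the document's symbols: L = Σ_k [ l_k / (2 c_k²) + log(1 + c_k²) ], with binary cross-entropy as each tower's loss l_k. Whether Equation (4) has exactly this form is an assumption, and the labels and predictions are made up.

```python
import math
from typing import Sequence

def binary_cross_entropy(y: int, p: float, eps: float = 1e-7) -> float:
    """Loss of one sigmoid tower output p for a 0/1 label y (p clipped away from 0 and 1)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def auto_weighted_loss(task_losses: Sequence[float], c: Sequence[float]) -> float:
    """sum_k [ l_k / (2 c_k^2) + log(1 + c_k^2) ]: learnable c_k rescale each task's loss.

    The log term penalises large c_k, so the model cannot shrink every loss to zero
    simply by inflating the weights.
    """
    return sum(l / (2.0 * ck ** 2) + math.log(1.0 + ck ** 2) for l, ck in zip(task_losses, c))

# One sample, two diseases: labels and tower outputs are made up.
losses = [binary_cross_entropy(1, 0.8), binary_cross_entropy(0, 0.3)]
print(round(auto_weighted_loss(losses, c=[1.0, 1.0]), 4))
```

In training, the c_k would be optimized jointly with the network parameters w_k, letting harder or noisier disease labels be down-weighted automatically instead of by hand-tuned coefficients.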
Calculation of Meta-Spec Importance: To rank and quantify the contribution of microbial features and host variables to disease screening, we defined a Meta-Spec importance (MSI) based on SHAP, [35] which is derived from game theory. [36] Given a Meta-Spec model and a test dataset, the MSI value expresses the proportion of a feature's contribution to the prediction. More specifically, for a feature $i$, we first computed its relative contribution $C_i$ by Equation (5), where $SHAP_{ij}$ denotes the SHAP value of the $i$th feature of the $j$th sample in the test set. The MSI was then obtained by normalizing the contributions $C_i$ in Equations (6) and (7). In this way, for a specific disease, the MSI values of all features sum to 100%.
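Equations (5) to (7) were lost in extraction. A common way to turn per-sample SHAP values into a global score is the mean absolute SHAP value, so this sketch assumes C_i = mean_j |SHAP_ij| and MSI_i = 100 · C_i / Σ_i C_i; the SHAP matrix is invented.

```python
from typing import Dict, Sequence

def msi(shap: Sequence[Sequence[float]], feature_names: Sequence[str]) -> Dict[str, float]:
    """MSI (%) per feature for one disease from a SHAP matrix shap[j][i] (sample j, feature i).

    Assumes C_i = mean_j |SHAP_ij| and MSI_i = 100 * C_i / sum_i C_i.
    """
    n = len(shap)
    contrib = [sum(abs(row[i]) for row in shap) / n for i in range(len(feature_names))]
    total = sum(contrib)
    if total == 0:
        raise ValueError("all SHAP values are zero; MSI is undefined")
    return {name: 100.0 * c / total for name, c in zip(feature_names, contrib)}

# Hypothetical SHAP values for two test samples and three features.
shap = [[0.2, -0.1, 0.0],
        [-0.4, 0.1, 0.1]]
scores = msi(shap, ["age", "ASV1", "BMI"])
print({k: round(v, 1) for k, v in scores.items()})  # {'age': 66.7, 'ASV1': 22.2, 'BMI': 11.1}
```

Taking absolute values means a feature that pushes predictions strongly in either direction counts as important; under this assumption the division by the number of test samples cancels in the normalization.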
Performance Evaluation and Comparison: We also used LR, RF, LGB, and MLP to build regular classifiers for comparison. For multilabel classification, the regular ML models were trained on the original vector $x$ (refer to Section 4.1) and organized by binary relevance, which decomposes the task into independent binary classifiers (one per label). Following Statnikov et al., [37] the parameters of each model were tuned as shown in Table 3. We applied nested repeated fivefold cross-validation in the test procedure, in which each fold used 80% of the data as the training set and 20% as the test set.
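Binary relevance and the outer fivefold split can be outlined as below. `PrevalenceClassifier` is a hypothetical toy stand-in for LR, RF, LGB, or MLP (it just predicts the training prevalence), and the inner loop of nested cross-validation used for hyperparameter tuning is omitted for brevity.

```python
import random
from typing import Callable, List, Sequence

def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

class PrevalenceClassifier:
    """Toy stand-in for a binary classifier: predicts the training prevalence for every sample."""
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "PrevalenceClassifier":
        self.p = sum(y) / len(y)
        return self

    def predict_proba(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.p] * len(X)

class BinaryRelevance:
    """Multilabel wrapper: one independent binary classifier per disease label."""
    def __init__(self, make_classifier: Callable[[], PrevalenceClassifier]):
        self.make_classifier = make_classifier
        self.models: List[PrevalenceClassifier] = []

    def fit(self, X, Y):
        self.models = [self.make_classifier().fit(X, [row[k] for row in Y]) for k in range(len(Y[0]))]
        return self

    def predict_proba(self, X) -> List[List[float]]:
        per_label = [m.predict_proba(X) for m in self.models]  # one column per disease
        return [list(row) for row in zip(*per_label)]          # transpose to one row per sample

X = [[float(i)] for i in range(10)]                  # toy feature vectors
Y = [[i % 2, 1 if i < 3 else 0] for i in range(10)]  # two toy disease labels
for fold in kfold_indices(len(X)):
    train = [i for i in range(len(X)) if i not in fold]
    model = BinaryRelevance(PrevalenceClassifier).fit([X[i] for i in train], [Y[i] for i in train])
    probs = model.predict_proba([X[i] for i in fold])  # len(fold) rows x 2 labels
```

Because each label gets its own model, binary relevance cannot share information between diseases, which is the gap the MMoE gates in Meta-Spec are designed to fill.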
In each fold, AUPRC, F1-macro, and AUROC were calculated to evaluate performance. AUROC is the area under the ROC curve, while AUPRC is the area under the precision-recall curve. The averages of AUROC and AUPRC across all diseases were treated as the overall AUROC and overall AUPRC. Additionally, F1-macro averages the F1 scores for the prediction of different diseases: $Precision_k = TP_k/(TP_k + FP_k)$, $Recall_k = TP_k/(TP_k + FN_k)$, $F1_k = 2 \cdot Precision_k \cdot Recall_k/(Precision_k + Recall_k)$, and $F1\text{-}macro = \frac{1}{K}\sum_{k=1}^{K} F1_k$, where $TP_k$, $FP_k$, $FN_k$, $Recall_k$, $Precision_k$, and $F1_k$ represent the true positives, false positives, false negatives, recall, precision, and F1 for detecting the $k$th disease, and $K$ is the number of target diseases.
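The F1-macro definition translates directly into code over binary prediction arrays. Returning 0.0 when a denominator is zero is an assumed convention, and the label matrices are toy values.

```python
from typing import List, Sequence

def per_label_f1(Y_true: Sequence[Sequence[int]], Y_pred: Sequence[Sequence[int]]) -> List[float]:
    """F1_k for each disease column k (0.0 when precision + recall is 0)."""
    scores = []
    for k in range(len(Y_true[0])):
        tp = sum(1 for t, p in zip(Y_true, Y_pred) if t[k] == 1 and p[k] == 1)
        fp = sum(1 for t, p in zip(Y_true, Y_pred) if t[k] == 0 and p[k] == 1)
        fn = sum(1 for t, p in zip(Y_true, Y_pred) if t[k] == 1 and p[k] == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return scores

def f1_macro(Y_true: Sequence[Sequence[int]], Y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-disease F1 scores."""
    scores = per_label_f1(Y_true, Y_pred)
    return sum(scores) / len(scores)

Y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y_pred = [[1, 0], [0, 1], [0, 1], [1, 0]]
print(per_label_f1(Y_true, Y_pred))  # [0.5, 1.0]
print(f1_macro(Y_true, Y_pred))      # 0.75
```

Macro averaging gives every disease equal weight, so a rare disease with poor F1 pulls the overall score down as much as a common one.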
Microbiome Datasets and Preprocessing: Brief information on all datasets is summarized in Table 1. Dataset 1 and Dataset 3 were collected from the AGP cohort. [8] ASVs of 16S rRNA gene amplicons and metadata of each subject were downloaded from Qiita (study ID: 10317). [38] The taxonomy of ASVs was then annotated with the Greengenes 13-8 database [39] using Parallel-Meta Suite. [40] A subject was treated as a patient if recorded as "Diagnosed by a medical professional (doctor, physician assistant)" for a specified disease in the metadata, or as healthy if marked as "I do not have this condition" for all diseases. To reduce the sparsity of ASVs, we performed a distribution-free independence test based on the mean variance index [41,42] and selected 1168 ASVs relevant to health status for disease detection. Dataset 2 was collected from the GGMP. [23] The 16S rRNA gene amplicon sequences and metadata of each subject were obtained from EBI (ID: PRJEB18535), and OTUs were picked by the GGMP pipeline (https://github.com/SMUJYYXB/GGMP-Regionalvariations). We also applied the distribution-free independence test based on the mean variance index and selected 449 OTUs relevant to the target diseases. Dataset 4 was a cross-cohort dataset produced by 34 studies. [29] Species-level taxonomy of raw shotgun stool metagenomes was analyzed with MetaPhlAn 2. [43] Additionally, in each dataset, a chi-square test was used to select host variables associated with at least one target disease.
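The AGP labeling rule can be expressed as a small function that returns a multilabel vector per subject. Two choices here are assumptions not stated in the text: for a patient, any answer other than the professional-diagnosis string counts as 0 for that disease, and subjects matching neither rule are excluded. The "Self-diagnosed" answer string is hypothetical.

```python
from typing import Dict, List, Optional, Sequence

DIAGNOSED = "Diagnosed by a medical professional (doctor, physician assistant)"
ABSENT = "I do not have this condition"

def label_subject(answers: Dict[str, str], diseases: Sequence[str]) -> Optional[List[int]]:
    """Multilabel vector for one subject, or None if it is neither a patient nor a healthy control."""
    if any(answers.get(d) == DIAGNOSED for d in diseases):
        # Patient: 1 for each professionally diagnosed disease, 0 otherwise (assumption for other answers).
        return [1 if answers.get(d) == DIAGNOSED else 0 for d in diseases]
    if all(answers.get(d) == ABSENT for d in diseases):
        return [0] * len(diseases)  # healthy control
    return None  # e.g. self-reported or missing answers: excluded (assumption)

diseases = ["IBD", "thyroid", "ASD"]
subjects = {
    "s1": {"IBD": DIAGNOSED, "thyroid": ABSENT, "ASD": ABSENT},
    "s2": {"IBD": ABSENT, "thyroid": ABSENT, "ASD": ABSENT},
    "s3": {"IBD": "Self-diagnosed", "thyroid": ABSENT, "ASD": ABSENT},  # hypothetical answer text
}
for sid, answers in subjects.items():
    print(sid, label_subject(answers, diseases))
```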
Code and Data Availability: The software package of Meta-Spec is available on GitHub (https://github.com/qdu-bioinfo/meta-spec). Source data of the datasets used in this work are summarized in Table 1.

Figure 1. Multilabel disease classification. a) In typical experimental design, each subject has only a single-status label. b) In actual cohorts, a subject can have multiple diseases. c) Deep learning framework of Meta-Spec. d) AUROC and AUPRC of Dataset 1. e) AUPRC of each disease in Dataset 1. f) Disease correlations in Dataset 1. In (d) and (e), '+' denotes models with host variables by one-hot coding.

Figure 2. Performance of comorbidity detection. a) Patients were divided by comorbidity status. b) Overall AUPRC of Dataset 1. c) Detailed AUPRC of Dataset 1. '+' denotes models with host variables by one-hot coding.

Figure 3. Meta-Spec importance and host variable refinement. a) Top host variables and microbial members for cardiovascular disease sorted by MSI. Numbers are their actual ranks. b) Distribution of the most important variables among diseases in Dataset 1. c) Change in AUROC of Meta-Spec with different numbers of host variables. d) Change in AUPRC of Meta-Spec with different numbers of host variables.

Figure 4. Performance comparisons of cross-cohort validation.a) Hybrid modeling for cross-location detection.b) Overall AUROC and AUPRC trend with the increase of UK training samples in hybrid modeling.

Figure 5. Performance comparisons on the multiclass dataset. a) ROC curve for healthy status detection. b) Performance comparisons on multiclass prediction by kappa coefficient. '+' denotes models with host variables by one-hot coding.

Table 2. Number of features among diseases.

Table 3. Parameters of model tuning.