Models for depression recognition and efficacy assessment based on clinical and sequencing data

Major depression is a complex psychiatric disorder that includes genetic, neurological, and cognitive factors. Early detection and intervention can prevent progression, and help select the best treatment. Traditional clinical diagnosis tends to be subjective and misdiagnosed. Based on this, this study leverages clinical scale assessments and sequencing data to construct disease prediction models. Firstly, data undergoes preprocessing involving normalization and other requisite procedures. Feature engineering is then applied to curate subsets of features, culminating in the construction of a model through the implementation of machine learning and deep learning algorithms. In this study, 18 features with significant differences between patients and healthy controls were selected. The depression recognition model was constructed by deep learning with an accuracy of 87.26 % and an AUC of 91.56 %, which can effectively distinguish patients with depression from healthy controls. In addition, 33 features selected by recursive feature elimination method were used to construct a prognostic effect model of patients after 2 weeks of treatment, with an accuracy of 75.94 % and an AUC of 83.33 %. The results show that the deep learning algorithm based on clinical and sequencing data has good accuracy and provides an objective and accurate method for the diagnosis and pharmacodynamic prediction of depression. Furthermore, the selected differential features can serve as candidate biomarkers to provide valuable clues for diagnosis and efficacy prediction.


Introduction
Major depressive disorder (MDD) is a multifaceted mental illness involving genetic, neurological, and cognitive factors.It is characterized by negative tendencies in emotion and suicidal tendencies in behavior.At present, MDD affects more than 350 million people worldwide [1].Among them, more than 98 million people in China suffer from MDD, and the suicide rate accounts for 50 %, higher than the global average of 3 % [2,3] According to the World Health Organization (WHO), MDD will have the highest global burden by 2030 [4].
However, the etiology of depression is unknown, but it is thought that genetic predisposition and environmental factors are influential.Such as stressful life experiences will exert epigenetic modification in these risk genes via DNA methylation to generate long-lasting effects on these gene expression, which in turn cause brain structural and functional alteration and finally increase the vulnerability to depressive disorders.Preclinical and clinical studies have shown significant correlations between stress and depression [5], stress and epigenetic [6], and depression and epigenetic changes [7].DNA methylation is one of the most important epigenetic modifications and is closely related to depression [8][9][10].DNA methylation of BDNF gene can be used as a biomarker for differentiating healthy controls (HCs) from patients with depression due to different methylation levels of CpG units between HCs and patients with depression (DPs) [11].DNA methylation in the 5-hydroxyptamine receptor (5-HTR) family, which includes HTR1A/2A/3A, is also associated with MDD [12,13].Selective serotonin reuptake inhibitors (SSRIs) were the most commonly used antidepressants [14].However, it does not work for all depressed people.Studies have suggested that functional genetic variants in HTR1A and HTR1B may be involved in individual differences in SSRI treatment response [15,16].Thus, epigenetic biomarkers such as DNA methylation, which is found in depressions and is thus important in depression development can help diagnose and predict response to adjuvant chemotherapy [17].
Methylation is a chemical modification of DNA molecules that involves the addition of methyl groups in DNA molecules [18].Sequencing data typically contains information about the DNA sequence and can therefore be used to detect methylation patterns in DNA [19].DNA methylation, which does not involve changes in DNA nucleotide sequence, occurs mainly at CpG site, the region adjacent to cytosine nucleotide and guanine nucleotide, and is one of the most studied epigenetic mechanisms [20][21][22].DNA methylation plays a key role in determining genome structure and function, including regulating cell differentiation and coordinating gene expression [23,24].DNA methylation disorders are associated with the development of atherosclerosis, cancer, obesity, type 2 diabetes, neuropsychiatric disorders, and other complex multifactorial diseases, and predict all-cause mortality [20,[25][26][27][28][29].Single nucleotide polymorphisms (SNPs) are single base variations in a DNA sequence and are one of the most common forms in the genome [30].While most SNPs have no biological consequences, a small number of alternatives have functional significance and are fundamental to human diversity.Moreover, there are complex interactions between methylation variants and SNPs, for example, variations at SNP sites may affect nearby or distant methylation patterns, thereby affecting gene expression or regulation [31].
Early detection and intervention can prevent progression, and predicting a patient's response to treatment can help select the best treatment.However, at present, the clinical diagnostic methods of MDD are mainly based on the diagnostic criteria of MDD in the International Classification of Diseases 10 (ICD-10) or the Diagnosis and Statistics of Mental Disorders V (DSM-V) [32], combined with doctor-patient communication, scale analysis and doctors' diagnostic experience [33,34].This method is easily affected by subjective factors such as the degree of patient cooperation and the doctor's proficiency, which may lead to misdiagnosis or missed diagnosis.
In recent years, with the rapid development of medical big data and artificial intelligence technology, machine learning (ML) and deep learning (DL) have been gradually applied in the early prediction and auxiliary diagnosis of depression due to their powerful data processing capabilities.ML and DL can capture features from large, multidimensional, and interconnected datasets and train them using automatic learning algorithms to find hidden phenotypic or genotypic structures and then predict treatment responses based on individual patient characteristics [35].A recent study using ML to predict antidepressant treatment response in MDD patients identified 25 characteristics that best predicted whether patients would respond to citalopram antidepressants and predicted the Fig. 1.Experimental design of this study.
Y. Hu et al. persistence, chronic, and severity of depression using a self-reported questionnaire [36,37].A study by Kautzky et al. demonstrated that a random forest prediction model for treatment outcome correctly identified 25 % of responders by using 3 SNPs (including BDNF rs6265, PPP3CC rs7430, and HTR2A rs6313) and a clinical variable [38].In addition, Patel et al. employed an alternating decision tree approach to predict treatment outcomes with 89 % accuracy in patients with late-life depression by using structural imaging data and clinical variables [39].There are ways to improve the accuracy of models by screening for some of the clinical features of patients with depression [40,41].
Nevertheless, the current methodologies suffer from inherent limitations, characterized by either a lack of precision or undue complexity [38].Consequently, the imperative exists to devise more streamlined and precise diagnostic techniques for depression or to unearth valuable biomarkers for forecasting depression treatment efficacy.This investigation delves into substantial disparities between individuals afflicted with MDD and HCs, exploiting these pronounced distinctions as a potential avenue to identify diagnostic and prognostic markers for depression.This pursuit is based on the integration of both sequencing and clinical diagnostic data, factoring in multiple variables for optimizing predictive accuracy.

Study design
The analysis of depression identification and prognosis assessment are summarized in Fig. 1.Multiple feature selection methods were carried out to find the best feature subset for the detection of depression patients and the efficacy of antidepressants.Seven binary classifiers were utilized to calculate the classification performances.RFE: recursive feature elimination; LR: logistic regression; SVM: support vector machine; GBDT: gradient boosted decision tree; XGBC: gradient boosting classifier; MNN: multiple neural network; MI: mutual information.

Subjects and clinical evaluation
A total of 300 patients with MDD and 100 HCs were recruited for the study, which was referred by Zhongda Hospital.As shown in Fig. 2, 291 DPs, and 100 HCs were retained for subsequent analysis.
According to the criteria from the Diagnostic and Statistical Manual of the American Psychiatric Association (APA), fourth edition (DSM-IV), all recruited DPs met the criteria for diagnosis of depression.

Antidepressant treatment and outcome assessment
The MDD patients were treated with a single antidepressant as recommended by a clinician (SSRI: n = 174, non-SSRI: n = 112).Before the study began, the assessment and treatment protocols used by psychiatrists were standardized.Other than supportive medical visits, the MDD patients did not receive other psychological interventions.The study excluded patients who needed to change antidepressants or performance noncompliance.The severity of depression symptoms was assessed with the HAMD-17 score at baseline and 2 weeks later.The effective, outcome response was defined as a reduction ratio of ≥50 % from the baseline HAMD-17 score after 2 weeks of treatment, based on the guideline of the World Federation of Societies of Biological Psychiatry.Otherwise, it was defined as ineffective.

Gene polymorphism and methylation detection
To fully detect biochemical markers, peripheral blood samples were collected from subjects at 8:00 a.m. at least 8 h after eating (collected before taking the medication).Blood samples for methylation analysis were collected in a 5 ml EDTA tube and stored at − 80 • C until genotyping.DNA methylation level was analyzed by MethyTarget® (Genesky Biotechnologies Inc. Shanghai, China), a multi-target CpG methylation analysis method based on NGS.According to the manufacturer's protocol, sequencing was performed on the Illumina HiSeq platform in a 2x150 bp paired-end mode.The data obtained by sequencing was analyzed (Fig. 3).The average effective sequencing depth of target regions for all samples was 600 X, more than 80 % samples have an effective depth greater than 400-fold, and 90 % samples exceed 10-fold.After rigorous quality control, 406 eligible CpG sites were detected in 300 depressed patients and 100 healthy controls.A total of 23 SNPs of the four candidate genes met the DNA methylation status criteria of the sequence to be tested for subsequent analysis.There were 5 SNP sites in HTR1A, 3 SNP sites in HTR1B, 1 SNP site in S100A10 and 14 SNP sites in BDNF.

Statistical analysis
Clinical and demographic data were given as mean ± standard deviation.Differences with a P-value <0.05 were considered statically significant.T-test was used to identify differences in continuous variables between groups.Differences in categorical variables were compared using the standard chi-square test.The statistical analysis was implemented using the scipy package of Python.Scipy (version number: 1.4.1)package of Python was used for differential statistical analysis.The Hard Weinberg equilibrium (HWE), the percentage of undeleted loci, and the minor allele frequency of SNP data were analyzed using Haploview 4.2.

Data preprocessing
The collected data contains noise, missing values, unbalanced and non-numerical data.Before feature selection and modeling, we need to preprocess the data, including missing value processing, feature coding, data normalization, and data integration.After pretreatment, the overall quality of big data can be greatly improved.

Data cleaning
These different missing data types were disposed of in different filling methods.Over 20 % of data loss is directly discarded, and less than 20 % loss data can be filled.The missing values of DNA methylation data and SNP data were deleted in this study.As for the clinical indicators, we adopt the mode value to fill the discrete indicators.However, in order to avoid the disadvantage of overconcentration of statistical filling values (mean, median and mode).Based on this, this study intends to use random filling method to interpolate continuous features.That is, substitute values are randomly extracted from the observation data of known features to fill the missing values, which can effectively avoid the fixity of data filling.

Feature coding
The value of DNA methylation level is a set of quantitative data between 0 and 1.For the encoding of methylation data, the median of each site was taken as the threshold.It was coded as 1 if it was greater than the median, or as 0. In addition, One-hot coding was applied to the upper-class genes of chromosomes, sex, education level, insomnia, suicide, and marital status.

Data normalization
The data normalization step is used to scale the feature values for several reasons: (1) To avoid features in greater numeric ranges dominating those in smaller numeric ranges; (2) To improve the efficiency and convergence rate; (3) To get higher classification accuracy.Here we used maximum-minimum normalization to normalize the data (also called features) into a fixed range between the maximum and minimum.Each feature can be scaled to range [0, 1] as follows: [1] where x is the original value, x min and x max are the lower and upper bounds of the feature values, and the X is the scaled value.

Data integration
Data integration involves combining heterogeneous data from different sources to better understand for subsequent analysis.Fig. 4 shows the data integration.

Feature engineering
Feature engineering is the process of converting raw data into features that better express the nature of a problem.Simply put, feature engineering is to find the features that have a significant impact on the dependent variable, so that these features can be applied to the prediction model to improve the prediction accuracy of invisible data.The depression recognition data set and depression efficacy assessment data set have the characteristics of a large feature number and small sample number.The feature selection method in feature engineering is used to identify the relevant feature subset in high-dimensional data and construct feature engineering based on it.This part is implemented by Python 3.7.6.T-test is conducted in the scipy 1.4.1 of Python.Other algorithms are carried out by the Python package sklearn 0.22.1.

Filter method
The filter method is based on the feature inherent characteristic to select feature subsets and does not involve any learning algorithm, which includes T-test and Chi-square test.Where T-test was performed to analyze the degree of difference between two types of samples for a certain feature.The significance level of the T-test was determined as 0.05, the T-value of each feature was calculated and the final difference feature was determined.The larger the statistic value is, the smaller the p-value is, which indicates the greater the importance of the feature.Chi-square is a correlation filtering specifically for classification problems.It calculates the chi-square statistics between each non-negative feature and the label, and ranks the features according to the chi-square statistics from high to low.T-test was used to screen the differential features for continuous features in DNA methylation data and clinical data, and the chisquare test was used to select the differential features for discrete features.

Wrapper method
In wrapper method, predictors are regarded as black boxes, while the performance of predictors is treated as an objective function to evaluate feature subsets.The recursive elimination method (RFE), a popular wrapper method, drops the feature with the lowest feature coefficient returned by an iterative train classifier.If multiple feature subsets reach the best accuracy, the subset with the least features is returned.Through continuous iteration, the feature subset is optimized to improve the discriminant performance.

Embedded method
Embedded method tunes feature selection and classifier as one.That is feature selection is a part of the training process.In this study, the feature importance of random forest was used for feature selection in embedded method.Random forest is a practical ensemble learning method, which is based on decision tree model and bagging to integrate multiple decision trees to form a forest.Then, the last prediction outcome is gained by voting through the multiple decision trees.This study used the Gini coefficient, which is widely used at present, as the branch rule.Specifically, the training set is used to build multiple classification trees, while the test set examines the importance score of each predictor in the decision tree one by one.The average value of the importance score of the predictor in all decision trees is regarded as the importance index of the predictor.The importance of predictors is ranked from high to low.The final feature subset is selected by the out-of-bag prediction error rate.

Classification algorithms
Classifiers may reach drastically different performances on the same dataset.In this study, seven popular classifiers were utilized to evaluate the degree of differentiation of feature subsets between two types of samples, including six machine learning algorithms (logistic regression, decision tree, SVM, random forest, GBDT and xgboost) and a backpropagation deep neural network model.
This study uses default Settings to implement logistic regression (LR), decision tree (DT) and SVM classifiers.Ensemble learning accomplishes learning tasks by building and combining multiple weak classifiers.Random forest algorithm integrates multiple decision trees through the idea of ensemble learning.Its basic decision unit is decision tree.The random forest (RF) algorithm integrates all decision tree classification results, and formulates the category with the most votes as the final output.Gradient boosting decision tree (GBDT) is an iterative algorithm that achieves the purpose of data classification by using an additive model.That is, the linear combination of basic functions reduces the residuals until clearly classification in the training process.XGBClassifier (XGBC) ensemble the individual prediction results and receives the most votes as the final results based on a series of weak classifiers.The six ML models were built on sklearn (version: 0.19.0)package of python.
Deep neural network is a feedforward neural network in deep learning.The deep learning model used in this study is based on the deep learning frame framework of Keras.The model consists of three layers: For the input layer, hidden layer, and output layer, relu is selected for the activation function of the input layer and hidden layer, and sigmoid is selected for the activation function of the output layer.dropout is used to prevent overfitting of the model, and random gradient descent is used to optimize the model parameters.The number of neurons, learning rate, number of training rounds, and batch data size should be adjusted according to the actual situation.At least 200 evaluations were conducted for each model evaluation.

Performance measurements
The area under the receiver operating characteristic curve (ROC) was calculated to evaluate the performance of the prediction model.AUC is a more objective performance evaluation index, which is popular with most researchers to evaluate the performance of the model.The better the prediction model, the higher the AUC.In addition, to better describe the overall performance of the prediction model, measures, such as accuracy (ACC), precision, recall, and F1 score were also used to evaluate the performance of classifiers, respectively.The precision and recall are integrated into the F1 score.The closer it is to 1, the better the model.In addition, ten-fold cross-validation was utilized to calculate the binary classification performance.Specifically, the whole dataset will be divided into 10 separate small datasets.At this point, a 9/10 dataset is used to train the predictive model.The remaining 1/10 dataset is used as a test set to test the performance of the predictive model.

Demographic and clinic characteristics of the participants
The demographics and clinical characteristics of the DPs and HCs were shown in Table 1.There were no significant differences in sex and age between DPs and HCs.NLES (Negative Life Events Scale score) [42], CTQ (Childhood Trauma Questionnaire score) [43], and one-NLES scores showed significant differences between DPs and HCs, as well as the NLES, CTQ and one-NLES of DPs were significantly higher than HCs.These scale data constitute clinical data.

Sequencing characteristics of participants
A total of 391 peripheral blood samples (291 DP samples and 100 HC samples) from recruiters were sequenced on the Illumina HiSeq sequencing platform.After strict quality control, a total of 406 CpG sites were obtained.Methylation levels of CpG sites were calculated with the ratio of methylated cytosines over total cytosines, which is defined as β.The β value is between 0 and 1, and 1 indicates complete methylation of the site.Statistical analysis of 406 CpG sites found that only 29 sites showed significant differences between DPs and HCs (Fig. 5).We also paid attention to the association among DPs, HCs and SNP data.A total of 23 SNP sites of 4 candidate genes were obtained.After the One-Hot coding process, the 23 SNPs became 68 features.After multiple comparative analysis, it was found that 22 SNP sites did not reach the significant level between DPs and HCs.The basic information of SNP data is shown in Table 2.

Machine learning and deep learning predictive results
Based on three independent datasets, we conducted independent analysis and combined analysis, respectively.Six ML algorithms (LR, DT, SVM, RF, GBDT and XGBC) were used for predictive analysis on 11 different datasets (methylation dataset; SNP dataset; clinic dataset; methylation encoding dataset; the dataset of methylation and clinic; the dataset of methylation encoding and clinic; the dataset of methylation and SNP; the dataset of methylation encoding and SNP; the dataset of SNP and clinic; the dataset of methylation combined SNP and clinic; the dataset of methylation encoding combined SNP and clinic).Fig. 6 shows that the ensemble learning prediction models (RF, GBDT and XGBC) were better than single prediction models (LR, DT and SVM) in general.
Since different classifiers produce different results for the same feature set, we compared the prediction results of six machine learning algorithms (LR, DT, SVM, RF, GBDT and XGBC) and a DL algorithm (backpropagation deep neural network model).To avoid the randomness of the dataset partition, we used ten-fold cross-validation to calculate the binary classification.Table 3 shows the prediction results of six machine learning algorithms and a DL algorithm.For the dataset of methylation, SNP and clinic, machine learning achieved good performance in distinguishing DPs from HCs.In general, the RF prediction algorithm has the best prediction effect on the dataset of methylation, SNP and clinic.While the DL algorithm has a poor effect among the seven algorithms, which is mainly reflected in the score of AUC, and F1 score (integration of precision and recall).This may be due to the fact that the amount of data in machine learning is within a certain range due to deep learning, which is also proven in industry.
A: methylation dataset; B: SNP dataset; C: clinic dataset; D: methylation encoding dataset; E: the dataset of methylation and clinic; F: the dataset of methylation encoding and clinic; G: the dataset of methylation and SNP; H: the dataset of methylation encoding and SNP; I: the dataset of SNP and clinic; J: the dataset of methylation combined SNP and clinic; K: the dataset of methylation encoding combined SNP and clinic.

Experimental results and analysis of feature selection
First, the features of 406 CpG sites, 7 clinical features, and 68 SNP sites were ranked by importance score from largest to smallest, and then a stepwise RF was performed to identify the optimal subset of features associated with the outcome variable.Based on the experimental analysis, we summarized the selected features in Table 4.Then, the model is modeled.Compared with the original data model, the predictive performance of the model constructed by using the feature subset after dimensionality reduction of RF is greatly improved.Table 5 shows the prediction results of the feature subset DL model after dimension reduction of RF.The accuracy of the model is 87.274 %, an increase of 6.07 %, the F1_score is 76.90 %, an increase of 72.42, and the AUC is 85.10 %, an increase of 61.17 %.
Then we used P-value≤0.05as the differential screening threshold.A total of 23 differential features were obtained from methylation, clinical and SNP datasets, including 20 methylation features, 2 clinical features and 1 SNP feature.The accuracy of DL prediction results of 23 feature subsets is 82.70 %, F1_score is 49.70 %, and AUC is 80.5 %. 33 differential features were obtained from the methylation, clinical, and SNP datasets, including 31 methylation coding features, 1 clinical feature, and 1 SNP feature.For the prediction performance of 33 feature subsets, the DL algorithm found that the accuracy of the model was 83.85 %, the accuracy of F1_score was 56.47 %, and the AUC was 81.35 %.
Finally, we then selected a RFE algorithm in the methylation combination of SNPs and clinical datasets for evaluation and feature selection.The score of each feature is constantly adjusted in repeated iterations.LR model was repeatedly constructed using RFE based on the variation maps of accuracy, F1_score and AUC in the process of screening optimal feature subsets of methylation, clinical and SNP data sets (Fig. 7).The two goals of reducing the number of features and improving the predictor are combined.We choose a feature subset of 18 features to build the model.The accuracy rate was 86.11 %, F1_score was 92.17 %, and AUC was 92.77 %.
In general, among the several methods, RFE was used to screen out 18 different features, which could effectively distinguish between depressed patients and healthy controls, with the highest accuracy of 87.26 % and AUC of 91.56 %, and the prediction effect was better than clinical judgment.

Demographic and clinic characteristics of the DPs
Table 6 describes the demographic and clinical features of the subjects.Among them, there were significant differences in gender, relapse times, HAMD-0W score of the subjects in the effective group and the ineffective group.The HAMD-0W score of the effective cohort was significantly lower than the ineffective cohort.Surprisingly, the females in effective group were more than the males.

Sequencing characteristics of DPs
The candidate genes' DNA methylation sequencing analysis was performed on the peripheral blood samples from 291 DPs.After strict quality control, there were a total of 406 CpG sites.Statistical analysis of 406 CpG sites found that only 18 CpG sites were significantly different between the effective group and the ineffective group.According to our previous studies, the primers were designed to cover 100 bp upstream and 100 bp downstream of four candidate genes (HTR1A, HTR1B, S100A10, BDNF) SNP sites.As shown in Fig. 8, there were 22 effective SNP sites captured.The basic information of SNP data is shown in Table 7. Analysis for single sites did not reveal any significant associations between the effective group and the ineffective group.

Performance of different classifiers
To reveal the potential value of methylation level, clinic indicators and SNP features in the evaluation of DPs during 2 weeks followups, a DL algorithm and six ML algorithms were constructed to implement the efficacy evaluation.The predictive performance of different classifiers is shown in Table 8.Ensemble learning algorithm combines multiple classifiers, which is better than a single   classifier.GBDT classifier obtained the optimal AUC, which was 51.32 %.The optimal accuracy came from the random forest classifier, which was 58.86 %.High-dimensional data brings dimension disaster to supervised learning, which makes the prediction performance of the model unsatisfactory.Therefore, we adopt the feature selection method to select the relevant feature subsets from highdimensional data in the follow-up.

Feature selection
First, we used T-test to obtain a total of 14 differential features from methylation, clinical and SNP data sets, including 11 methylation features and 3 clinical indicators.At the same time, we screened 18 features from the methylation coding, clinical, and SNP datasets using T-test.There were 15 methylation features and 3 clinical features.Table 9 shows the prediction results of DL.A model was constructed to evaluate the prognostic effect of feature selection by T-test.Compared with the original data model, the performance of the model has been greatly improved, especially in terms of AUC and accuracy.
We then used RF to rank methylation features, methylation coding features, SNPs, and clinical markers according to feature importance scores.By sorting the feature importance scores of the total features of these data sets, the size of the optimal feature subset obtained is shown in Table 10.The optimal feature subset is used to build a DL model, and the model performance is evaluated by judging the indicators.RF feature selection algorithm improves the performance of the model (Fig. 9).Especially in the data set of 11 methylation coding features and clinical features, the accuracy rate increased by 30.54 %, F1_score increased by 46.83 %, and AUC increased by 57.18 %. Surprisingly, the prediction performance decreased when the dataset combined 2 clinical features with 11 methylation encoding features.
It is best that we use the RFE approach to characterize methylation, clinical, and SNP datasets to evaluate the efficacy of two-week DPs.The results show that there is a certain relationship between feature number and evaluation index (accuracy, F1_score, AUC).With the increase of features, the evaluation index will have a certain upward trend, as shown in Fig. 10.A sustained increase tends to be a smooth fluctuation.To balance the two purposes of reducing the number of features and improving the evaluation index of the model, 33 features were selected as the final feature subset.The prediction performance of 33 feature subsets is analyzed by DL algorithm, and the accuracy of the model is 75.94 %, the accuracy of F1_score is 73.99 %, and the AUC is 83.33 %.It has good prediction effect.
In general, we found that among the 33 features selected by the RFE method, the model had the best prediction effect, with an accuracy of 75.94 % and an AUC of 83.33 %, which could achieve an accurate evaluation of drug efficacy in patients with depression

Discussion
This study leveraged clinical data and sequencing data to elucidate notable disparities between individuals diagnosed with DPs and HCs, while also discerning significant variances between subgroups exhibiting effective versus ineffective prognostic outcomes within the DPs cohort.The viability of utilizing these discernible alterations as biomarkers for both diagnosing depression and predicting treatment efficacy was rigorously examined.Notably, our investigation represents the pioneering endeavor to amalgamate DNA methylation sequencing data, SNP data, and clinical parameters to construct predictive models distinguishing between MDD sufferers and healthy counterparts.Furthermore, we extend this innovative approach to forecasting antidepressant treatment outcomes among MDD patients through the application of DL methodologies.
Firstly, this study analyzed clinical data and sequencing data among depressed patients and healthy controls.Clinical factors in depression patients and healthy controls are evaluated in this study.The result found that NLES, CTQ and one-NLES scores were significantly different between depressed individuals and healthy controls.As shown in the results, higher scores of NLES, CTQ and one-NLES were found in depression patients compared with healthy controls.In conjunction with earlier studies, these results suggest that a combination of information from questionnaires and rating scales can meaningfully contribute to identifying depression patients.Meanwhile, this study provides a prediction model to differentiate between DPs and HCs using our method.We evaluated the performance of using clinic data, methylation data and SNP data alone or in combination in the identification model of depression patients.After a comprehensive validation and comparison, we demonstrated that multiple dimensionality features improved the prediction performance and feature interpretability of the prediction model using six ML models.The result further confirmed the ensemble learning models were greater than single prediction models [44].In this study, we pinpointed that RF model outperformed other ML models in terms of AUC for distinguishing depression patients and healthy controls.
We focused on DL model, though six ML methods were compared.Our study achieved a high accuracy (82.383 %) between DPs and HCs with the dataset of methylation, clinic and SNP.Notably, it has a lower AUC (52.8 %) and F1_score (44.6 %).AUC is a better measure than accuracy based on formal definitions of discriminancy and consistency.The model showed high false positive considering that high-dimensional and low-sample datasets will affect the overall prediction performance of the model.To improve the performance of DL model, three different feature selection methods were used to identify the relevant feature subsets in the highdimension data and find the optimal feature subset.Finally, we identified 18 predictive variables of identification DPs from RFE, including 15 methylation features, 2 clinical biomarkers and 1 SNP site. 2 clinical biomarkers encompass CTQ and one-NLES.Previous studies have shown that early life stress was associated with depression episodes [45].The risk of depression will increase six-fold with six months of negative life events.Kessler conducted a study on negative life events of depression.The result showed that the more negative life events and the more serious they were, the higher the incidence of depression and the more serious the symptoms of depression would be.There was a relationship between the number of negative events and response [46].In the present study, we evaluated the role of rs10042486 in DPs through genetic association analysis.The major finding was a significant association of rs10042486 within HTR1A promoter region with depression in the Han Chinese population.This finding might implicate the susceptibility of rs10042486 to depression.Surprisingly, this finding has been consistent with the rs10042486 was significantly associated with schizophrenia from Zhang R [47].Moreover, of the 15 DNA methylation sites, seven CpG sites were significantly different between DPs and HCs, one at HTR1A, two at HTR1B, and four at the BDNF gene.Three of the four CpG sites on the BDNF gene were highly methylated.Studies have found that increased methylation of BDNF gene is associated with depression [48].HTR1B is involved in serotonin synthesis, which affects behavior and anxiety in animal models [49,50].The interaction of recent stressful life events showed a significant correlation between HTR1B SNP (rs6296C) and depression, but there was no significant difference in the number of life events, suggesting that the main contributor to this phenotype was epigenetic life events affecting the HTR1B gene [51].In addition, HTR1A autoreceptors medicate feedback inhibition of 5-HT neurons, which was found to be associated with MDD [52].
There are studies finding that HTR1A SNP rs6295G allele mutations were associated with increased susceptibility to depression [51].These results further suggest that the differential features of gene methylation can be used as potential biomarkers to guide the early

diagnosis of depression.
After this, we combined the 15 methylation predictors with 2 clinical biomarkers and 1 SNP site to establish the predictive models for depression identification by using the DL model.Our result indicated that our DL model may provide a suitable approach to create predictive models for identification of depression patients with 93.4 % AUC, 89.16 % accuracy and 71.3 % F1_score.Exhilaratingly, there is an overlap between the subset determined by RFE and the subsets gained from RF and T-test.This further suggests that these  features can be used as a valid indicator to explain the difference between DPs and HCs.The feature subset determined by RF obtained the best prediction accuracy (85.1 %).Our statistical learning approach allowed us to identify and rank the variables that contributed the variables that contributed the most and understand the mechanism underlying multivariate outcome prediction, which achieved 86.3 % AUC.The result further indicated that DPs and HCs could potentially be distinguished by the 18 predictive variables.The prediction was accurate enough to be clinically meaningful.The present results show that a combination of relatively few genetic and clinical feature variables can effectively identify DPs and HCs, which achieved the purpose of saving computing resources and improving performance.
In the second part, we focused on the efficacy prediction in patients with depression.In this study, we constructed the prediction model based on multiple features to predict 2 weeks of treatment response of antidepressants for DPs at baseline.Clinical and sequencing information were considered simultaneously.Serotonin reuptake inhibitors are the most commonly used antidepressant.In this study, 60.84 % DPs were using SSRI drugs and 39.16 % were patients of non-SSRI drugs.However, SSRI is not effective for all DPs.Approximately 38.14 % of patients did not significantly relieve after the first antidepressant treatment.Consistent with this result, previous studies indicated functional genetic variants in HTR1A and HTR1B may be involved in individual differences in SSRI treatment response [53].Subsequently, associations of DPs with clinical parameters and the efficacy were investigated.The result demonstrated that gender, relapse times, and HAMD-0W were significantly different in DPs.Relapse times of DPs were dominantly characterized by once or twice.Our investigation showed that the fewer recurrences, the higher number of efficacy of patients.The current series of results suggests that episodes of MDD sensitize a patient to further episodes, easily leading to recurrent episodes of the disease.Thereby, it may incur a greater risk of treatment resistance and persistent depression episodes in pathophysiological process,  suggesting more episodes will lead to a reduced response to antidepressants [54].Our finding is consistent with the majority of previous findings in the literature that indicate the number of episodes of MDD has a substantial causal relationship with the efficacy of patients [55].Hence, the number of depression episodes can be one of the best predictors of antidepressant efficiency [56].On the other hand, sex differences appear as a possible factor in predicting antidepressant response.After two weeks of antidepressant treatment, female patients represented a significantly higher proportion of effectivity.Previous studies also demonstrated women as having a more favorable response to antidepressant drugs than men [57].Based on previous studies of depressed patients, Higher scores of HAMD-0W were found in ineffective depression patients.HAMD score can reflect the severity of depression, the more severe the depression, the worse the treatment effect may be.Study comparing the methylated discrepancies of HTR1A, HTR1B, S100A10, BDNF between ineffective patients and effective patients.The result found 18 CpG sites were significantly different between the effective group and the ineffective group.Genetic studies have shown that HTR1A gene variation is associated with depression, especially a functional HTR1A SNP rs6295 in the promoter region was found to be associated with antidepressant pharmacogenetics in the different population [51].HTR1B, as a significant candidate gene, is also thought to be involved in the pathogenesis of MDD and antidepressant treatment [58,59].Therefore, the methylation information of candidate genes can provide beneficial information in depression efficacy assessment.Additionally, we evaluated the role of the SNP sites of HTR1A, HTR1B, S100A10, BDNF genes in depression patients through genetic association analysis.However, a significant association was not observed between SNP site and depression patients in our study.Interestingly, other studies have found that HTR1B SNP rs6298C shows a good response to SSRI antidepressants (citalopram).This may be the result of the number of patients taking citalopram was too small to show a significant relationship.
Subsequently, we assess the combined utility of pre-treatment clinical information and sequencing data, in predicting antidepressant response at week 2 of pharmacotherapy treatment using ML and DL methods.Pharmacodynamics is a complex effect result, affected by a variety of related factors.The results of general ML models and DL model were not ideal, which may have a better performance after feature selection.Meanwhile, in the aspect of integral prediction performance, the ensemble learning prediction models demonstrated significant improvement compared with single-level prediction models.To get better classification accuracy, we applied different feature selection strategies to get the optimal feature subset.Our statistical learning approach allowed us to identify and rank the variables that contributed the variables that contributed the most and understand the mechanism underlying multivariate outcome prediction.Despite the prediction was largely driven by specific symptom profiles.Because sex and HAMD-0W contributed to the prediction of the outcomes.Interestingly, the present analysis shows that the methylation encoding variables play an important role in the prediction, suggesting a complex interplay between methylation encoding information and other predictors in determining treatment outcomes, which adds to the interpretability of the result.
We applied an importance-ranked, forward-selection modeling approach of RF to search for the most predictive input variables from primary datasets of methylation, clinic and SNP.The experimental results clearly demonstrated that the feature subsets determined by random forest have a certain degree of improvement on the prediction effect and generalization performance of the model.The higher classification 67.57% accuracy was achieved by clinic variables in combination with 11 methylation encoding variables using DL model.However, there is a certain gap between the prediction results and our target.
Afterward, the RFE is introduced to realize the screening of feature subsets.Surprisingly, our study has achieved satisfied classification efficacy for distinguishing effective depression patients from ineffective depression patients based on the feature subset determined by RFE models.The DL prediction model is valuable and parsimonious, based on only 33 variables.The result found that the accuracy of the model was 75.94 %, an increase of 46.72 %, 73.99 % of F1_score, an increase of 80.46 %, and the largest increase was AUC, with an increase of 96.07 % from 42.5 % to 83.33 %.There are 30 DNA methylation features, 2 clinical indicators (HAMD score and age of the first onset) and 1 SNP (rs7119334C).As several studies have reported, age is an important factor associated with depression efficacy.The increase in age will cause adverse reactions in the course of depression.Older patients tend to have a worse course of depression than younger or middle-aged patients.However, it was found an important clinical index affecting the efficacy of depression was the age of the first onset rather than the age in the study.The older the age of first onset, the better the treatment effect of depression was (the effective rate was 54.29 % in 70 patients under 30 years old, and 64.44 % in 90 patients over 50 years old).The rs7119334 is a genetic polymorphism site located on BDNF gene, which is related to the mechanism of antidepressant.Previous studies have found that BDNF Val66Met SNP, as well as other two BDNF SNPs, were linked to a reduced response to antidepressant treatment suggesting that these SNPs may impact treatment responses to antidepressants [60,61].And the predictions are reproducible in non-overlapping validation dataset.Moreover, the present results show that a combination of relatively few genetic and clinical variables can predict whether an individual with depression may remission with a specific antidepressant.Additionally, these results demonstrate that a combination of genomic and clinical information has the potential to serve as a clinical decision-support tool that may help the clinician to select an antidepressant that an individual is more likely to benefit from.In this way, clinicians can pay attention to and adjust treatment according to the response results of different patients in the follow-up treatment.
In summary, DL algorithm, leveraging both clinical and sequencing data, exhibits commendable accuracy and offers an objective and precise diagnostic approach for discerning depression and predicting its pharmacodynamic responses.Moreover, the discerned differential features, once selected, hold promise as candidate biomarkers, furnishing valuable insights for the early detection and prognostic evaluation of depression.

Data availability statement
The data related to this study are not stored in publicly available repositories and are private data.All the data are available upon request to the corresponding author.

Fig. 5 .
Fig. 5. Visualization of differential CpG sites in DPs and HCs.

Fig. 7 .
Fig. 7.The change relationship between feature number and accuracy, F1 and AUC.

Fig. 8 .
Fig. 8. Heatmap of differential CpG sites for effective and ineffective groups.

Fig. 9 .
Fig. 9.The prediction performance of the DL model of different feature subsets.

Fig. 10 .
Fig. 10.The relationship between the number of features and the evaluation indicators (accuracy, F1_score, AUC).

Y
.Hu et al.

Table 1
Demographic and clinical characteristics of the participants.
CTQ: the Childhood Trauma Questionnaire score; NLES: the negative life events scale score; one-NLES: the negative life events scale score in one year; life: whether or not a negative event happened, where 0 means it has not happened and 1 means happened; N: sample size.Y.Hu et al.

Table 2
General characteristics of SNP site.
%gene: the percentage non-missing for this SNP site; HWpval: the Hardy Weinberg equilibrium p value; MAF: the minor allele frequency for this SNP site; alleles: the major and minor alleles for this SNP sites.Y.Hu et al.

Table 3
Prediction results of six ML algorithms and a DL algorithm on the dataset of methylation, SNP and clinic.

Table 4
Results of RF feature selection.
Y.Hu et al.

Table 5
Prediction results of the DL model of feature subsets after RF dimensionality reduction.

Table 6
Demographic and clinic characteristics of the depression patient subjects.
Tips: A total of 291 DPs matched the inclusion criteria but did not meet the exclusion criteria, which completed 2 weeks of treatment.Among them, 180 DPs were effective in antidepressant treatment, and 111 were ineffective.The effective rate of treatment was 61.86 %.In the subsequent analysis, 18 clinical biomarkers were used, including age, gender, marital status, HAMD-0W, relapse times before enrollment, etc. HAMD-0W: HAMD-17 scores at week 0; SHAPS score: Snaith-Hamilton Pleasure Scale score; TESS score: Treatment Emergent Symptom Scale score; BSI-CV score: Beck Scale score for Suicide Ideation-Chinese Version; TAS score: Toronto Alexithymia Scale score; SSRS score: Social Support Rating Scale score; SSRI: selective serotonin reuptake inhibitors; T: true, it means the patients have this symptom; F: false, it means no this symptom.Y.Hu et al.

Table 7
General characteristics of SNP site.: the percentage non-missing for this SNP site; HWpval: the Hardy Weinberg equilibrium p-value; MAF: the minor allele frequency for this SNP site; alleles: the major and minor alleles for this SNP site. %gene

Table 8
Performance of different classifiers on the dataset of methylation, SNP and clinic (All predictive performance is mean value).

Table 9
Prediction results of DL algorithm of T-test.

Table 10
The size of the optimal subsets of RF feature selection.