Mesocorticolimbic and Cardiometabolic Diseases—Two Faces of the Same Coin?

The risk behaviors underlying the most prevalent chronic noncommunicable diseases (NCDs) encompass alcohol misuse, unhealthy diets, smoking and sedentary lifestyle behaviors. These are all linked to the altered function of the mesocorticolimbic (MCL) system. As the mesocorticolimbic circuit is central to the reward pathway and is involved in risk behaviors and mental disorders, we set out to test the hypothesis that these pathologies may be approached therapeutically as a group. To address these questions, the identification of novel targets by exploiting knowledge-based, network-based and disease similarity algorithms in two major Thomson Reuters databases (MetaBase™, a database of manually annotated protein interactions and biological pathways, and IntegritySM, a unique knowledge solution integrating biological, chemical and pharmacological data) was performed. Each approach scored proteins from a particular approach-specific standpoint, followed by integration of the scores by machine learning techniques yielding an integrated score for final target prioritization. Machine learning identified characteristic patterns of the already known targets (control targets) with high accuracy (area under curve of the receiver operator curve was ~93%). The analysis resulted in a prioritized list of 250 targets for MCL disorders, many of which are well established targets for the mesocorticolimbic circuit e.g., dopamine receptors, monoamino oxidases and serotonin receptors, whereas emerging targets included DPP4, PPARG, NOS1, ACE, ARB1, CREB1, POMC and diverse voltage-gated Ca2+ channels. Our findings support the hypothesis that disorders involving the mesocorticolimbic circuit may share key molecular pathology aspects and may be causally linked to NCDs, yielding novel targets for drug repurposing and personalized medicine.


Introduction
According to the United Nation's (UN's) "Sustainable Development Goals set forth in 2015", all countries are committed to reducing premature deaths due to noncommunicable diseases (NCDs) by 33.3% by 2030.Despite this commitment, according to the 2022 Progress Monitor, only a few countries are on their way to meet these objectives [1].Lagging results are incomprehensible, as evidence-based guidelines are in order for both preventing and managing NCDs [2].The controversy may be resolved if the nature of NCDs is taken into consideration, as the most common risk factors for the four most prevalent noncommunicable diseases e.g., cardiovascular diseases, cancer, diabetes and chronic respiratory diseases are alcohol misuse, unhealthy diets, smoking and sedentary lifestyles.Given these antecedents of NCDs, the responsibility of the individual should be considered, especially in light of the fact that these four risk behaviors are all linked to the mesocorticolimbic (MCL) system (for a brief overview, see [3,4]).Hence, it may be proposed that altered reward-related processes contribute to the development of risk behaviors paving the path to NCDs.The ongoing discourse to include common mental disorders (CMDs) under the umbrella term of NCDs must also be noted.Shared etiology, high co-morbidity and overlapping pathways between NCDs and CMDs has been established previously [5].
Alteration of MCL function is well established in the case of the harmful use of alcohol [6], nicotine addiction [7] and hedonic eating [8].Additional to the aforementioned risk behaviors, numerous mental disorders, e.g., schizophrenia [9], depression [10], anxiety [11], eating disorders [12], pathological gambling [13], post-traumatic stress disorder [14], have also been associated with altered MCL function.Thus, it may be proposed that alteration of the MCL system may be a common factor in the etiopathogenesis of several risk behaviors and mental disorders and thus may emerge as a transdiagnostic marker linking NDCs and CMDs.Furthermore, by identifying shared molecular pathways, novel molecular targets that are potentially exploitable for drug repositioning may be identified.The need for new therapeutic approaches is pressing given the unmet medical need for combating NDCs and CMDs.A possible way to bridge the gap for unmet medical needs is to identify novel targets for drug repurposing.With increasing barriers for innovative de novo drug development (e.g., financial, regulatory and time constrains), focus has shifted toward identifying new targets for existing drugs.Identification of new targets may lead the way to target-, knowledge-and network-based drug repositioning [15,16].
Better understanding and identification of the common neurobiological underpinnings of diverse symptoms may yield new therapeutic targets.By unveiling novel therapeutic targets, personalized approaches may be identified, leading to higher response rates in therapy [17].Although classified under distinct diagnostic categories, altered functioning of the MCL system has been described for various mental disorders as well as risk behaviors (see above).Thus, we set out to test the hypothesis that the mesocorticolimbic system is a common denominator contributing to the pathogenesis of diverse mental disorders and risk behaviors and as such may be approached therapeutically as a group.

Performance of the Model
The performance of the model in terms of prediction error as a function of the number of features included in the model is shown in Figure 1.Removing features increased model performance up to a point (e.g., 47 features), because removing the less informative features reduced noise and biases.However, as the removal process progressed, elimination of explanatory features led to information loss and poorer predictive model performance.
The performance of the final model, as assessed by the area under curve for the receiver operator curve was 0.927, reflecting a good accuracy for the model, with good ability to recover a large portion of what is already known (Figure 2).The variation of model performance was ~90% when assessed in the cross-validation process over ten repeats of the outer loop (Figure 3).

Targets for Treating MCL Disorders
The integrative prioritization of proteins as candidate targets by machine yielded several potentially new therapeutic targets.Among the highest ranked 250 ta (53%) of the 137 predefined positive control genes/targets were recovered, correspo an approximate known target enrichment of 36.6-fold over baseline.(For the list of th targets yielded by machine learning see Supplementary Table S1).The average rank the set of the top 250 targets was 188.4 (median 189.3), with a minimum of 4.2 and a m of 339.4.The worst score in the entire table was 16,589 for the considered total of 18,4 Using information curated in MetaBase TM , the existence of direct connect tween the highest ranked 250 targets was assessed.Utilizing the information fro ture-derived protein-protein interaction targets seemed to closely interact, repres densely interconnected system involved in neurotransmission and related functio central nervous system (Figure 4).

Targets for Treating MCL Disorders
The integrative prioritization of proteins as candidate targets by machine learning yielded several potentially new therapeutic targets.Among the highest ranked 250 targets, 73 (53%) of the 137 predefined positive control genes/targets were recovered, corresponding to an approximate known target enrichment of 36.6-fold over baseline.(For the list of the top 250 targets yielded by machine learning see Supplementary Table S1).The average rank score in the set of the top 250 targets was 188.4 (median 189.3), with a minimum of 4.2 and a maximum of 339.4.The worst score in the entire table was 16,589 for the considered total of 18,493 genes.
Using information curated in MetaBase TM , the existence of direct connections between the highest ranked 250 targets was assessed.Utilizing the information from literaturederived protein-protein interaction targets seemed to closely interact, representing a densely interconnected system involved in neurotransmission and related functions of the central nervous system (Figure 4).

Assessment of the Top 25 Control and Novel Targets
The top 25 control target genes yielded by the final integrated model are shown in Table 1.The highest ranked target in the control target set was dopamine receptor D2.It should be noted that all dopamine receptors scored highly in the feature matrix, though not equally so: DRD2 (rank 1), DRD3 (rank 9), DRD4 (rank 19), DRD1 (rank 45), DRD5 (rank 278).The current results point to the higher relevance for the D2-like family (D2, D3, D4) regarding MCL disorders than the D1-like family (D1, D5).Another example of a well-established high scoring target is monoamine oxidase B (MAO-B).This target attained the second highest predicted score.In this analysis, for each pair of proteins prior knowledge of direct interaction included in MetaBase TM was assessed.Proteins with previous reports of direct interactions were connected with an arrow on the plot.Green arrows correspond to activating interaction mode, red-inhibition, grey-effect unspecified.Proteins are clustered by function: green (in the center)-ligands, yellow (in the center)-enzymes (kinases, phosphatases, etc.), blue (bottom, right)-plasma membrane receptors, purple (left)-ion channels, the four greenish proteins at the top are G-proteins; light blue at the top-proteins with miscellaneous functions.Red circles indicate that a protein belongs to the top 250 target list (all proteins on this plot).The top targets represent a dense interconnected system rather than a set of isolated molecules working independently in the cell.

Assessment of the Top 25 Control and Novel Targets
The top 25 control target genes yielded by the final integrated model are shown in Table 1.The highest ranked target in the control target set was dopamine receptor D2.It should be noted that all dopamine receptors scored highly in the feature matrix, though not equally so: DRD2 (rank 1), DRD3 (rank 9), DRD4 (rank 19), DRD1 (rank 45), DRD5 (rank 278).The current results point to the higher relevance for the D2-like family (D2, D3, D4) regarding MCL disorders than the D1-like family (D1, D5).Another example of a wellestablished high scoring target is monoamine oxidase B (MAO-B).This target attained the second highest predicted score.Upon further assessment of the enriched molecular functions of these top 25 control targets, it may be observed that these targets are mainly involved in neurotransmission and modulation of membrane potential (Table 2).
Table 2. Enriched molecular functions of the top 25 known targets.r-number of genes with a given molecular function found within the top 25 known targets.R-number of analyzed targets (top 25).n-total number of genes associated with the function in the human genome.N-gene universe size (=number of genes associated with at least one known molecular function in the Gene Ontology database).p-value-enrichment p-value (hypergeometric test).Taken together, the analysis recovered a number of previously well-established targets for disorders that belong to the group of MCL disorders at the top of the prioritization.This suggests that this computational model successfully captured current knowledge and thus may be used to generate novel predictions regarding putative targets for drug development in the field of MCL disorders.

Putative Targets for MCL Disorders
Assessment of the top 25 novel targets retrieved by the model yielded serendipitous results (Table 3).Highly ranked novel targets included DPP4, PPAR-gamma, NOS−1, BDNF, IL1 and IL6 (see Supplementary Table S1).The highest ranked novel target in this analysis was dipeptidyl-peptidase 4 (DDP4), a ubiquitous, intrinsic membrane glycoprotein.This target is mainly exploited for the management of hyperlipidemia, hypertension and diabetes.Another highly ranked novel target was peroxisome proliferator-activated receptor gamma.Similarly to DDP4, PPAR-gamma is mainly used in the context of metabolic diseases but not in psychiatry.Components of the renin-angiotensin aldosterone system, e.g., angiotensin converting enzyme (ACE) and angiotensin II receptor type 1, were also highly ranked, molecular targets mainly associated with the function of the cardiovascular system and metabolism.Furthermore, it should be noted that a key neurogenesis factor, brain derived neurotrophic factor (BDNF), was also predicted to be tightly connected with MCL disorders.Interestingly, IL1B and IL6 also scored relatively highly among the novel targets (55th and 69th out of thousands of proteins in the prioritization).When further evaluating the top 10 gene ontologies of molecular function with enrichment in this group of targets, components of Ca 2+ signaling converging on cAMP response element binding protein 1 (CREB−1) emerged (Table 4).

Discussion
The findings of the current study support the hypothesis that disorders involving the mesocorticolimbic circuit may share key molecular pathology aspects.Furthermore, this study aimed to identify the common targets for mesocorticolimbic system disorders as a group and test a hypothesis that these disorders may be approached therapeutically as a group.Selection of the initial scope of disorders was based on prior knowledge regarding the involvement of the mesocorticolimbic circuit.Using a combination of approaches, e.g., knowledge-, network-and disease similarity-based methods, candidate targets were retrieved for prioritization, leveraging the Thomson Reuters' bioinformatics portfolio.Each approach characterized proteins from a specific perspective to assess the likelihood of a protein to represent a target for the MCL disorder as a group.Next, using machine learning the individual approaches were combined into an integrated score, resulting in a final target prioritization.Machine learning identified characteristic patterns of the already known targets (control targets) from the final model, with high accuracy (the AUC of the ROC curve was ~93%).This is similar to model performance reported by others [18].As machine learning analysis was trained on the union of targets for MCL disorders, this high-accuracy coherent prediction, with the majority of already established targets readily being recovered at the top, further underscores the hypothesis of joint etiology.This result is unlikely for a collection of disconnected disorders, suggesting that these pathologies share key molecular links.Next, using rules obtained on this training set all other proteins (~18,000) were prioritized.The analysis resulted in a prioritized list of 250 targets for MCL disorders, many of these targets were already known to be involved with the mesocorticolimbic circuit.Upon construction of the molecular network from the highest ranked 250 targets, a densely interconnected network emerged, further supporting the hypothesis that the molecular targets of MCL disorders represent a molecular system with numerous avenues for crosstalk.The well-established top targets included dopamine receptors, MAOs and serotonin receptors (Table 1), whereas novel (not part of the training set) targets included DPP4, PPARG, NOS1, ACE, ARB, CREB1, proopiomelanocortin and diverse voltage-gated Ca 2+ channels (Table 3).These targets may represent candidates for further investigation in the examined disorders.
One of the fundamental characteristics of the MCL circuit is the involvement of the dopamine pathway (for an overview, see [3,4,19]).Dopamine receptors are primarily located in the caudate, putamen, nucleus accumbens and olfactory tubercle, where they are involved in the modulation of locomotion, reward, reinforcement, memory and learning [20,21].Thus, the mesocorticolimbic system may be regarded as a dopamine system, and an entire field of study concerns itself with the critical importance of dopamine signaling in the brain.In the current study, every dopamine receptor received a high rank, although the D2-like family (D2, D3, D4) seemed slightly more relevant for the treatment of MCL disorders than the D1-like family (D1, D5).Another example of a well-established highly ranked target is MAO-B (this target received the second highest rank).MAO-B is one of the two MAO isoforms that catalyze the oxidation of monoamines including dopamine, serotonin and adrenalin.Dopamine is mainly metabolized by MAO-B.Abnormal regulation of MAO-B is associated with depression, substance abuse and attention deficit disorder, among others [22].Thus, it seems that when MCL disorders were considered as a group, some generally known molecular targets were retrieved by the model.
That machine learning was able to predict known drug targets for a seemingly diverse group of mental disorders with high accuracy suggests that the computational model captures existing knowledge well.Taking these concepts one step further, the ability of this approach to make predictions regarding novel, unknown disease-specific drug target candidates for future research could be proposed.Others have shown that diseases that share a high number of differentially expressed genes have similar underlying pathologies and pathways [18].Using the results of target-based clustering, Emig and colleagues were able to identify diverse diseases that share common underlying biological processes and, based on these findings, novel disease-specific drug target candidates emerged [18].
It is interesting to note that several known targets for metabolic diseases, e.g., diabetes and insulin resistance, emerged from the model prediction.The highest ranked novel target, DDP-4, is a ubiquitous, intrinsic membrane glycoprotein and a serine exopeptidase that assumes a role in the post-translational modification of membrane proteins by cleaving X-proline dipeptides from the N-terminus of polypeptides.DDP-4 is present on the luminal side of epithelial and endothelial cells and on monocytes, lymphocytes and natural killer cells.The key enzymatic function of DDP-4 is the degradation of incretins, a polypeptide that stimulates pancreatic α and β cells and regulates glucose-stimulated insulin secretion [23].Membrane-bound DDP-4 may be cleaved and its enzymatically active soluble form is present in the circulation and was implicated in playing a causative role in development of insulin resistance [24].The third highest candidate target was PPAR-γ, which is known to improve insulin sensitivity and glucose metabolism by promoting fatty acid uptake and triglyceride formation and storage in lipid droplets [25].Furthermore, several other targets with links to insulin sensitivity were ranked highly among the list of novel targets, like angiotensin converting enzyme and angiotensin II receptor type 1 [26].Neuronal nitric oxide synthase, additional to playing a role in insulin sensitivity, assumes an important role in the central regulation of glucose homeostasis, in fact the central actions of apelin, ghrelin and leptin are known to be nitric-oxide-dependent processes (for an overview, see [27]).The emergence of druggable targets classically related to glucose and lipid homeostasis in the context of MCL disorders is worthy of notice.That hunger is a primary driver for organisms is well established [28].That sweet taste has inherent hedonic value and is considered as a primary reinforcer is also well known [29].Starting from that motivation, incentive salience and reward processing are at the heart of the MCL system; the emergence of novel targets that are involved in metabolism point to the fundamental two-way interaction between somatic and mental function.In agreement with these findings, others report similar results.In a small community sample of 109 healthy adults with normal blood pressure (120-139/80-89 mmHg), fronto-striatal cerebral blood flow was assessed in response to cognitive challenge and cardiometabolic risk was assessed at follow-up after 2 years.In this prospective study, the authors found that greater activation of fronto-striatal brain regions was associated with increases in cardiometabolic risk factors at follow-up [30], suggesting an interplay between the MCL circuit and the cardiovascular system.Heart-brain interactions in the context of mental disorders, e.g., dementia, depression, as well as vascular diseases (ischemic heart disease, heart failure stroke and Takotsubo syndrome), have been reviewed previously.The authors articulated the central role of the limbic system as forming the link between the two organ systems [31].Furthermore, they emphasized the need to further investigate the molecular mechanisms regulating the brain-heart interaction with the hope of identifying new individualized treatments that interrupt the pathogenic crosstalk.
Our findings that the MCL system may be a common pathogenic factor underneath CMDs and NCDs are further corroborated by epidemiologic observations.Comorbidity between cardiometabolic diseases and mental disorders has been reported in a large crosssectional study based on the UK Biobank.Overall, data from 391 083 participants with available information regarding cardiometabolic diseases (CMDs) and depression was analyzed.The presence of CMDs showed a dose-dependent and accumulative relationship with an increased risk of depression disorder after multivariable adjustments [32].Conversely, an increased risk for cardiovascular disease was identified in a longitudinal cohort study.The cohort included 900,240 patients newly diagnosed with psychiatric disorders, their 1,002,888 unaffected siblings and 1:10 sex-and age-matched reference population with no prior diagnosis of cardiovascular diseases at enrollment.The follow-up period was 30 years.A direct link between psychiatric disorders, hypertension, ischemic heart disease, angina pectoris, venous thromboembolism and stroke was established independent of familial factors.The authors concluded that people with psychiatric disorder are at increased risk for CVDs and that increased surveillance and treatment of CVDs should be an integral part of their clinical management [33].Moreover, the comorbidity of several cardiometabolic diseases and mental disorders has also been described in smaller studies [34], e.g., insulin resistance and ADHD [35], heart failure and cognitive impairment [36], and myocardial infarct and anxiety [37].
The study has some limitations in terms of its potential for translation, as data-driven identification of novel therapeutic drug targets may not yield viable therapeutic options due to, for example, dosing issues (e.g., s higher dose is needed for the new indication, raising concerns regarding safety), considerable off-target effects and possible barriers coming from intellectual property claims [38].Furthermore, the current analysis is based on two proprietary databases not in the public domain, which limits independent access.However, this adds some additional merit to the study, by allowing drawing inferences from such databases.A further limitation of the study is the need for future clinical validation.However, given the fact that several currently marketed drugs are available for several of the drug targets newly identified within the current study, new candidates with well-established safety profiles may be considered for proof-of-concept trials.For example, a PPARG agonist (e.g., pioglitazone), DDP4 antagonists (e.g., sitagliptin, linagliptin, etc.), ACE inhibitors (enalapril, captopril, etc.) and AGTR1 blockers (e.g., telmisartan, losartan, etc.) are currently available for various cardiometabolic indications, thus may serve as feasible candidates to pursue clinical proof-of-concept studies for use in MCL disorders.Alternatively, retrospective analysis of clinical records or health claim records may provide preliminary evidence for the translation potential of currently used medications (for example the use of ACE inhibitors for cardiomyopathy in patients suffering from comorbid depression).
In conclusion, our findings support the hypothesis that disorders involving the mesocorticolimbic circuit share key molecular pathways and may be causally linked to NCDs.Looking at the leading NCDs and their risk factors paralleled by the co-existence of CMDs, it is plausible that the intertwining relations between known targets for the MCL circuit and cardiometabolic diseases may yield already known well-tolerated drugs for drug repurposing and personalized medicine.

Selection of Disorders and Positive Control Targets
The current analysis included mental disorders with prior knowledge regarding the mesocorticolimbic system's involvement in their etiopathogenesis.Disorders included in the Thomson Reuters databases Integrity SM and Metabase TM were included in the analysis.Accordingly, Alzheimer's disease, anorexia nervosa, anxiety, bipolar disorder, bulimia, depression, mania, obsessive-compulsive disorder, Parkinson's disease, pathological gambling, phobic disorders, post-traumatic stress disorder and schizophrenia were selected for the current study (from here on collectively referred to as MCL disorders).After defining the scope of the mental disorders' network-and pathway-based algorithms, knowledge and statistical inference were used to identify links between a disorder and its target protein(s).
Integrity SM (http://integrity.thomson-pharma.com(accessed on 6 February 2016), a unique knowledge solution integrating biological, chemical and pharmacological data, access in not in the public domain) is a collection of integrated biological, chemical and pharmacological data on 400,000 compounds and 220,000 patient family records.Molecular targets, indication of use and clinical phase of development are assigned to each drug.According to the clinical development phase, drug targets may be validated, candidate, exploratory or none.The drug targets associated with launched drugs or drugs under active development for the indication of the disorder of interest are considered validated targets, and targets associated with drugs that are no longer under active development for the respective disorder are considered candidate targets.Exploratory drug targets are related to drugs that are under biological investigation for the disorder.Drug targets not assigned any status were not included in the current analysis.For each disorder, Entrez Gene drug target associations were identified and served as true positives for each disorder for evaluating drug target predictions.Targets that were associated with at least two of the predefined MCL disorders were used as positive controls when model predictions were assessed.Overall, 137 common druggable targets were identified based on information coded in the Integrity SM database.
MetaBase™ (a comprehensive, manually curated and semantically consistent knowledge source, access is not in the public domain [39]), on the other hand, is a database of manually annotated protein interactions and biological pathways.This database stores each molecule as a network object and contains the functional description of the molecule (receptor, kinase, transcription factor, etc.).Network objects may be proteins, protein complexes or protein families, or small molecules including non-coding RNAs and metabolites.The interactions included in MetaBase™ have been manually compiled from the literature and represent a physical relationship between pairs of elements in the network.The interactions are marked with additional information regarding bidirectionality (activation, inhibition or unspecified) and mechanism of action (e.g., binding, phosphorylation).Only proteinprotein interactions with direct mechanisms were used in this study.Accordingly, the mechanisms included: binding, competition, transformation, cleavage, catalysis, transport, receptor binding, transport catalysis and covalent modification (phosphorylation, dephosphorylation, ubiquitination, deubiquitination, sumoylation, desumoylation, neddylation, deneddylation, etc.).Indirect mechanisms (e.g., influence on expression) were omitted.
In addition, MetaBase™ stores information on linear pathways, representing the signaling cascade from receptor activation to the cellular response and pathway maps representing pathway depictions in a particular biological context.Pathway maps include several linear pathway components with crosstalk and regulatory feedback loops among them.Regular pathway maps capture normal signaling pathways.Disease-specific pathway maps depict signaling events most relevant to a disease or disorder in focus, e.g., gain or loss of protein function resulting in new or disrupted protein interactions compared with the respective normal pathway, etc.In the current study, over 700 regular and over 600 disease pathway maps were used from the MetaBase™ portfolio.
To attest if a common core molecular network could be identified for mental disorders with known links to the MCL system, proteins as targets were prioritized in this respect.Knowledge-based, network-based and disease similarity-based approaches were used to retrieve potential targets.The different approaches produced various scores for each protein encoded in the human genome.The scores were integrated into a final score by machine learning analysis in a way where integration combined the different scores with appropriate weights.Weights were iterated using the weights from targets that are already known for the disorders in question.

Knowledge-Based Approaches
Using knowledge-based approaches: a protein's role in MCL disorders was assessed based on prior knowledge in terms of this protein's connection to pathways, molecular interactions and biomarkers involved in the disorder.Prior knowledge was defined as information available in the Integrity SM and MetaBase TM databases.Overall, five knowledgebased methods were used as follows.First, the established biomarkers for each mental disorder involved in the analysis were inferred jointly from the two databases.For each indication, proteins received a score of 1 or 0 for being a biomarker or not, respectively.Next, using the MetaBase TM network, interaction inference was assessed by identifying the targets (proteins) that interact with targets known to be involved in the disorder.All 1-step-away direct neighbors of these proteins were identified and tested for enrichment with known biomarkers of each disorder under investigation.A score of a log-transformed enrichment p-value was computed for each protein.Third, using the MetaBase TM pathway maps, enriched pathway inference was retrieved to predict novel targets.Signaling pathways significantly enriched in biomarkers of the disorders were identified and proteins in these pathways were predicted as candidate therapeutic targets.Computationally, each pathway was tested for an enrichment in disease biomarkers yielding a −log(p-value) associated with each pathway in the database.Next, these pathway scores were transformed into protein scores by adding each pathway score for a protein of interest across relevant pathways (e.g., where a protein of interest is present).The fourth approach used pathways specifically constructed for biological processes or diseases curated by the MetaMiner consortium.For the present analysis, the pathways reconstructed for biological areas related to central nervous system diseases and disorders were assessed (e.g., "Neurophysiological process", "Depressive Disorder", "Mental Disorders", "Parkinson Disease"; as well as general biological themes informing us on basic protein functions in the cells: "Carbohydrates metabolism", "Energy metabolism", "Energy metabolism", etc.).The score for a potential target was obtained by counting the number of pathways on which the protein of interest was present.Finally, linear pathway inference was performed to retrieve potentially new targets using MetaBase™.In linear pathway inference, a protein is predicted as a target if it is frequently found on linear pathways containing biomarkers of the disease.For the present analysis, the inclusion criteria for linear pathways was to start with a ligand-membrane receptor interaction and end with a transcription factor.A protein score was calculated as the ratio of the number of linear pathways containing the protein of interest together with at least one known biomarker for the disorders divided by the total number of linear pathways containing the protein of interest.

Network-Based Approaches
Network-based approaches were used to predict novel protein-disorder associations based on the topological positions of proteins in the interactome.Network algorithms were used to assess the probability of a protein acting as a disease driver by testing its topological significance in the MetaBase™ network with regard to known biomarkers of the disorders.Four network algorithms were used to capture the complementary topological properties of the targets: network propagation, random walk, interconnectivity and neighborhood scoring.These methods score proteins in the MetaBase™ network by how strongly they are topologically associated with the biomarkers of the input disorders.
Network propagation, a global method, was performed as described previously [40].The starting nodes for network propagation were the predefined biomarkers for MCL disorders, from which flow is injected into the network.In each iteration, the flow was diffused further through the network until the changes of flow reached equilibrium.The final flow level for each node described the overall proximity of the node to all the starting points.To compensate for bias due to the network structure, node degrees were taken into account during computations and edge weights were normalized to the node degrees of the adjacent nodes.In each iteration t, the flow for each node was computed as follows: F is a vector containing the flow in iteration t. α is a weighting factor, set to 0.8 in this study.W' is the transition matrix (normalized by the node degrees).Y is the prior knowledge vector that contains a 1 for each starting node and 0 for all other nodes.
Random walk, a global method that utilizes the topology of the whole network, is defined as the transition of an iterative walker from its current node to a randomly selected neighbor [41].The starting points for the random walkers were the known biomarkers for MCL disorders, with random walkers moving to a neighboring node with equal probability at each iteration.Random walkers may also return to their starting points (random walk with restart).At each iteration the nodes are assigned a score that reflects the probability that a random walker visits the node at some point.Once the probability scores reach equilibrium, the random walk terminates.In each iteration t, the probability scores for the nodes are computed as follows: p is a vector containing the node probabilities in iteration t. r is a weighting factor describing the restart probability, set to 0.75 in this study.W is the transition matrix that is normalized to fulfill the properties of a stochastic matrix.p 0 is the starting vector that contains equal probabilities for each starting node and 0 for all other nodes.
Interconnectivity is a local method for prioritizing targets based on their overall connectivity to the set of starting points (e.g., biomarkers) [42].The method accounts for the direct connection between two genes i and j and the neighboring genes that are shared between i and j.Each gene in the network is scored according to its overall interconnectivity to all starting nodes.Similarly to the network propagation method, the interconnectivity corrects for the node degree removing the potential bias introduced by the network topology.The interconnectivity between two nodes i and j is computed as follows: e(i,j) equals 1 if an edge between nodes i and j exists, and is set to 0 otherwise.n(i,j) corresponds to the number of neighboring nodes that are shared by i and j. deg(i) describes the degree of node i, i.e., the total number of its neighbors [42].The final score for each network node is computed based on its overall interconnectivity to the starting points: Neighborhood scoring is a local method that only takes the direct neighborhood of the disease biomarkers into account [43].In neighborhood scoring, each starting point is assigned a differential expression level corresponding to the fold change of the differentially expressed gene (1 in the current study case).This differential expression level is then smoothed over the gene's direct neighborhood using the following formula: For a gene i, its score (FC' i ) is computed based on its own differential expression level (if applicable; this is the first part of the formula) and based on the average differential expression level of its neighbors j. α is a weighting factor and is set to 0.5 in this study.

Disease Similarity Approach
Starting from the contention that diseases with shared aspects of molecular pathology may serve as candidates for drug repositioning between them, disease similarity was investigated to predict further targets.To define disease similarity, the overlap between biomarkers was assessed by obtaining a metric for disease similarity as −log10(P) from a hypergeometric test.Disease similarity was used to select the 15 diseases most similar to the MCL disorders as a group.At least 3 targets had to be shared between the similar disease and the gold-standard training set.These similar disorders/symptoms were psychosis, memory impairment, sleep disorder, vascular dementia, panic disorder, alcoholism, spinal cord injury, mild cognitive impairment, renal failure, stroke, traumatic brain injury, atherosclerosis, ischemic stroke, obstructive sleep apnea and neurodegeneration.Next, each human protein received a disease-specific score of either 1 or 0 for being a target or not being a target, respectively.

Integration of the Evidence
The complementary methods yielded a vast array of input for drug target prediction.For optimal results, information was integrated in a weighted fashion by using machine learning (with higher weights assigned for more informative and smaller weights assigned for less informative methods).Weights were obtained by training the model on the already known targets, followed by application of the learned rule to prioritize all proteins as targets for MCL disorders.For training, the positive controls were defined as the targets associated with at least 2 indications from the predefined list of MCL disorders yielding the training set.The machine learning model was built using a partial least squares (PLS) regression algorithm.The predict function from the caret package was used to train PLS classifiers [44].The analysis was performed using a balanced (stratified) nested crossvalidation.The inner loop of the cross-validation feature selection using the recursive feature elimination approach was implemented.In the outer loop, the performance of the resulting classifiers that were trained on the best performing optimized feature set in the inner loop were tested.Both the outer and the inner loops represented 5-fold crossvalidations.The negative controls were defined as a random subset of all targets that were not positive controls.This allowed for avoiding a strong imbalance between the number of positive and negative controls during the learning process.The negative controls were defined in each fold of the outer loop independently, so the effect of random selection was averaged out across the cross-validation.The number of negative controls was set to 10 times the number of positive controls.The entire cross-validation-from splitting the samples in outer folds and to integrative protein scoring-was repeated 10 times.Target PLS scores were converted into genome-wide ranks within each repeat.The ranks were finally averaged across the repeats to produce the final target ranking.Additionally, recursive feature elimination was performed within the inner loop of the cross-validation and model performance was assessed.Accordingly, change in prediction error with respect to the number of features used to build the model was evaluated by eliminating the least informative features one by one.The optimal number of features was selected for the final model, e.g., the model that corresponded to the minimal prediction error.Feature selection was performed 50 times: 10 repeats multiplied by 5 outer folds.For each feature, the number of times it was selected into the final model (a value from the 0-50 interval) was recorded.This quantity characterized the relative importance of each feature in the modeling process (see also the Supplementary Materials below).
The performance of the final model was assessed using a receiver operating characteristics plot by plotting the true positive rate against the false positive rate for recovering the known control targets in the final target prioritization.The area under curve was used to characterize performance.Furthermore, to characterize model performance, the variation of the performance was assessed for the outer loop of the cross-validation across 10 repeats.

Conclusions
Our results provide molecular evidence for various mental disorders (e.g., Alzheimer's disease, depression, anorexia nervosa etc.) possibly forming a coherent group of disorders with alterations of the mesocorticolimbic circuit.As MCL disorders seem to share substantial molecular aspects, therapeutic exploitation of novel targets may pave the way to new approaches to combat mental disorders and NCDs.

Figure 1 .
Figure1.Predictive model performance characterized by prediction error as a function of the number of features included in the model.Features were removed successively; at each step the lowest ranked feature was removed.The lowest prediction error was seen at 47 features; thus, the final model was developed using the top 47 features.Initially (moving from right to left), prediction error falls as low impact features are removed.However, prediction error increases parallel to removing informative features.

Figure 2 .
Figure 2. Variation of the performance of the predictive model across 10 repeats in the outer loop of the cross-validation of the training set (e.g., targets associated with at least 2 individual MCL disorders were included).Box plots correspond to distributions across the 10 analysis repeats.

Figure 1 .
Figure 1.Predictive model performance characterized by prediction error as a function of the number of features included in the model.Features were removed successively; at each step the lowest ranked feature was removed.The lowest prediction error was seen at 47 features; thus, the final model was developed using the top 47 features.Initially (moving from right to left), prediction error falls as low impact features are removed.However, prediction error increases parallel to removing informative features.

Figure 1 .
Figure1.Predictive model performance characterized by prediction error as a function of the number of features included in the model.Features were removed successively; at each step the lowest ranked feature was removed.The lowest prediction error was seen at 47 features; thus, the final model was developed using the top 47 features.Initially (moving from right to left), prediction error falls as low impact features are removed.However, prediction error increases parallel to removing informative features.

Figure 2 .
Figure 2. Variation of the performance of the predictive model across 10 repeats in the outer loop of the cross-validation of the training set (e.g., targets associated with at least 2 individual MCL disorders were included).Box plots correspond to distributions across the 10 analysis repeats.

Figure 2 .
Figure 2. Variation of the performance of the predictive model across 10 repeats in the outer loop of the cross-validation of the training set (e.g., targets associated with at least 2 individual MCL disorders were included).Box plots correspond to distributions across the 10 analysis repeats.

Figure 3 .
Figure 3. ROC curve of the training set (e.g., targets associated with at least 2 individual orders were included).Box plots correspond to curve distributions across the 10 analysis

Figure 3 .
Figure 3. ROC curve of the training set (e.g., targets associated with at least 2 individual MCL disorders were included).Box plots correspond to curve distributions across the 10 analysis repeats.

Figure 4 .
Figure 4. Literature-derived direct physical interactions connecting proteins from the top 250 target set.In this analysis, for each pair of proteins prior knowledge of direct interaction included in MetaBase TM was assessed.Proteins with previous reports of direct interactions were connected with an arrow on the plot.Green arrows correspond to activating interaction mode, red-inhibition, grey-effect unspecified.Proteins are clustered by function: green (in the center)-ligands, yellow (in the center)-enzymes (kinases, phosphatases, etc.), blue (bottom, right)-plasma membrane receptors, purple (left)-ion channels, the four greenish proteins at the top are G-proteins; light blue at the top-proteins with miscellaneous functions.Red circles indicate that a protein belongs to the top 250 target list (all proteins on this plot).The top targets represent a dense interconnected system rather than a set of isolated molecules working independently in the cell.

Figure 4 .
Figure 4. Literature-derived direct physical interactions connecting proteins from the top 250 target set.In this analysis, for each pair of proteins prior knowledge of direct interaction included in MetaBase TM was assessed.Proteins with previous reports of direct interactions were connected with an arrow on the plot.Green arrows correspond to activating interaction mode, red-inhibition, grey-effect unspecified.Proteins are clustered by function: green (in the center)-ligands, yellow (in the center)-enzymes (kinases, phosphatases, etc.), blue (bottom, right)-plasma membrane receptors, purple (left)-ion channels, the four greenish proteins at the top are G-proteins; light blue at the top-proteins with miscellaneous functions.Red circles indicate that a protein belongs to the top 250 target list (all proteins on this plot).The top targets represent a dense interconnected system rather than a set of isolated molecules working independently in the cell.

Table 1 .
Top 25 control targets predicted for MCL disorders by machine learning.(Entrez IDs are available at https://www.ncbi.nlm.nih.gov/gene(accessed on 3 September 2024), gene symbols are indicated in accordance with the HUGO Gene Nomenclature Committee).

Table 1 .
Top 25control targets predicted for MCL disorders by machine learning.(Entrez IDs are available at https://www.ncbi.nlm.nih.gov/gene(accessed on 3 September 2024), gene symbols are indicated in accordance with the HUGO Gene Nomenclature Committee).

Table 3 .
The top 25 novel targets predicted for MCL disorders via the machine learning approach.(Entrez IDs are available at https://www.ncbi.nlm.nih.gov/gene(accessed on 3 September 2024), gene symbols are indicated in accordance with the HUGO Gene Nomenclature Committee).

Table 4 .
Enriched molecular functions of the top 25 novel targets.r-number of genes with a given molecular function found within the top 25 novel targets.R-number of analyzed targets (top 25).n-total number of genes associated with the function in the human genome.N-gene universe size (=number of genes associated with at least one known molecular function in the Gene Ontology database).p-value-enrichment p-value (hypergeometric test).