Structure-function similarities in deep brain stimulation targets cross-species

Deep Brain Stimulation (DBS) is an effective neurosurgical treatment to alleviate motor symptoms of advanced Parkinson's disease. Due to its potential, DBS is rapidly being extended to a large number of brain regions to treat a wide range of diseases and neuropsychiatric disorders. The identification and validation of new target regions rely heavily on insights gained from rodent and primate models. Here we present a large-scale automated meta-analysis in which the structure-function associations within and between species are compared for 21 DBS targets in humans. The results indicate that the structure-function associations of the majority of the 21 included subcortical areas were conserved across species. A subset of structures showed overlapping functional associations. This overlap can potentially be attributed to shared brain networks and might explain why multiple brain areas are targeted for the same disease or neuropsychiatric disorder.


Introduction
The modern era of Deep Brain Stimulation (DBS) started with the development of implantable electrodes in the mid-20th century (Miocinovic et al., 2013;Pycroft et al., 2018;Schwalb and Hamani, 2008). DBS involves the placement of a neurostimulator that delivers electrical pulses to specific brain regions through implanted electrodes (Lozano et al., 2019). With electrodes placed in the thalamus, globus pallidus internal segment (GPi) or subthalamic nucleus (STN), DBS has successfully alleviated the motor symptoms of a number of neuromotor disorders, including Parkinson's disease. Similarly, stimulation of the thalamus has been used to reduce (chronic) pain. This initial success has since been leveraged to expand the use of DBS to a wide range of diseases and neuropsychiatric disorders. As DBS is considered for an increasing number of conditions, the number of potential target regions increases correspondingly. Interestingly, a single brain region can now be considered a target for multiple diseases and neuropsychiatric disorders (Lozano et al., 2019;Pycroft et al., 2018).
There are a number of structures, such as the ventral posterolateral nucleus of the thalamus (VPlN) and the periaqueductal and periventricular grey matter (PaG, PvG), that are used to alleviate neuropathic pain (e.g., (Ben-Haim et al., 2018;Keifer et al., 2014;Pereira and Aziz, 2014)). The posterior hypothalamus and ventral tegmental area (VTA) are targeted for other pain-related disorders such as cluster headache (e.g., (Akram et al., 2017;Fontaine et al., 2010)).
Here, we aim to provide insight into this many-to-one mapping, in which multiple brain regions are targeted for the same disorder, while taking interspecies differences into account. The aim of the current study is twofold: to investigate the functional similarities between DBS targets, and to determine whether these associations can be transferred across species. To address these two questions, we applied an unsupervised machine-learning approach to conduct a large-scale analysis of the primate and rodent literature, focusing on 21 DBS targets.

PubMed search query
A comprehensive literature search was conducted by querying the PubMed database (www.pubmed.org) using the Entrez search tools implemented in the Biopython Bio.Entrez module (V1.83; (Cock et al., 2009)) and PyMed (V0.8.9; (Wobben, 2019)). The query was run on 30 March 2021 and had the following structure: structure name AND species [MESH]. We included the following 21 regions that have (recently) been used as DBS targets in humans: amygdala, caudate nucleus, fornix, globus pallidus external segment, globus pallidus internal segment, hypothalamus, internal capsule, lateral habenula nucleus, nucleus accumbens, periaqueductal grey substance, pedunculopontine tegmental nucleus, periventricular grey substance, putamen, red nucleus, subcallosal area, innominate substance, substantia nigra, subthalamic nucleus, ventral tegmental nucleus, ventral posterolateral nucleus of the thalamus and the ventral posteromedial nucleus of the thalamus. We did not include potential subnuclei of the 21 regions in our search query.
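As an illustration, the per-term query structure (structure name AND species [MESH]) could be issued through Biopython's `Bio.Entrez` roughly as follows. This is a minimal sketch, not the study's code (which is available on OSF): quoting each search term, the helper names `build_query` and `fetch_pmids`, and the `retmax` default are assumptions. `Entrez.esearch` and `Entrez.read` are existing Biopython functions; very large result sets would additionally need paging.

```python
"""Sketch of a per-term PubMed query (hypothetical helpers, not the study's code)."""

# The two MeSH species terms named in the text.
SPECIES_MESH = ("Primates", "Rodentia")


def build_query(search_term: str, species: str) -> str:
    """Combine one structure search term with one MeSH species term.

    Assumption: the term is wrapped in double quotes to force phrase
    matching; the original study does not state how terms were quoted.
    """
    return f'"{search_term}" AND {species}[MeSH]'


def fetch_pmids(query: str, email: str, retmax: int = 10_000) -> list:
    """Return the PubMed IDs for one query via Biopython's Bio.Entrez.

    Bio.Entrez is imported here so that build_query works without
    Biopython installed. NCBI asks for a contact e-mail address. One
    esearch call returns at most `retmax` IDs; larger result sets would
    need paging with `retstart` or the Entrez history server.
    """
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="pubmed", term=query, retmax=retmax)
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return list(record["IdList"])
```

For example, `build_query("subthalamic nucleus", "Primates")` returns `'"subthalamic nucleus" AND Primates[MeSH]'`; `fetch_pmids` performs a network request and requires Biopython.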
The spelling and synonyms of each structure name were based on the 2017 Terminologia Neuroanatomica (TNa; http://fipat.library.dal.ca/) from the Federative International Programme for Anatomical Terminology (FIPAT). All English and Latin spellings of each structure and its officially acknowledged equivalents or synonyms were included (FIPAT, 2017). As noted in our previous work (Keuken et al., 2018a), the TNa terminology is not fully adopted by the scientific community. We therefore also included the most common names and abbreviations listed on the English Wikipedia page for each structure. Finally, as the TNa nomenclature is based on human anatomy, we also included the rodent nomenclature for the 21 structures as proposed by (Hamani et al., 2011;Swanson, 2018;Wise, 2008).
For species, the two MeSH terms 'Primates' and 'Rodentia' were used. Because of the explosion feature in PubMed (a MeSH term automatically includes all narrower terms beneath it in the MeSH hierarchy), each term captures a separate eutherian mammalian order together with its suborders, families, genera and species. The 'Primates' term will therefore include literature on species such as Macaca mulatta and Homo sapiens, whereas the 'Rodentia' term will include literature on rodent species such as guinea pigs, mice and rats. All species per included order can be found online in the MeSH hierarchical tree (https://meshb.nlm.nih.gov/treeView).
The different nomenclatures resulted in 95 search terms for the 21 structures, an average of 4.52 (SD: 2.44) search terms per anatomical structure. The primate query returned one or more hits for 76 of the 95 search terms; the rodent query did so for 72 of the 95 search terms. In total, the PubMed query resulted in 144,394 and 165,083 hits for the primates and rodents, respectively. Because of the different spellings, synonyms and abbreviations, some publications were returned multiple times for the same structure. To reduce bias between structures, these duplicates were removed. A publication returned for more than one structure was, however, kept for each of those structures.
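The duplicate rule above (remove a PMID repeated across synonyms of one structure, but keep it under every structure it was returned for) can be sketched in plain Python. The input layout (a mapping from structure to search term to PMIDs), the toy PMIDs and the function name are hypothetical.

```python
def deduplicate_within_structure(hits):
    """Merge the PMIDs returned by all search terms of each structure.

    `hits` maps structure -> search term -> list of PMIDs (hypothetical
    layout). A PMID returned by several synonyms of the same structure is
    kept once, while a PMID returned for different structures is kept
    under each of them, mirroring the rule described in the text.
    """
    deduplicated = {}
    for structure, per_term in hits.items():
        # dict.fromkeys removes repeats while preserving first-seen order
        merged = dict.fromkeys(pmid for pmids in per_term.values() for pmid in pmids)
        deduplicated[structure] = list(merged)
    return deduplicated


# Toy PMIDs: "102" repeats within one structure, "103" spans two structures
hits = {
    "subthalamic nucleus": {"subthalamic nucleus": ["101", "102"], "STN": ["102", "103"]},
    "globus pallidus internal segment": {"GPi": ["103", "104"]},
}
print(deduplicate_within_structure(hits))
# {'subthalamic nucleus': ['101', '102', '103'], 'globus pallidus internal segment': ['103', '104']}
```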
For each PubMed ID, the title, keywords and abstract were used for further analysis. Any paper with an empty title or abstract field was excluded. We note that topic modeling is ideally done on the full text. However, given the corpus size and the high information density of abstracts, titles, keywords and abstracts are considered sufficient to reliably estimate the latent topics (Schuemie et al., 2004;Shah et al., 2003;Syed and Spruit, 2017). An additional benefit of not relying on full-text documents is that there are no paywall restrictions.
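A minimal sketch of how each document could be assembled from title, keywords and abstract while excluding records with an empty title or abstract. The record field names (`pmid`, `title`, `keywords`, `abstract`) are hypothetical, and joining the fields with single spaces is an assumption.

```python
def build_documents(records):
    """Join title, keywords and abstract into one text per PubMed ID.

    `records` is an iterable of dicts with hypothetical keys 'pmid',
    'title', 'abstract' and an optional list under 'keywords'. Records
    with an empty title or abstract are skipped, as described above.
    """
    documents = {}
    for record in records:
        title = (record.get("title") or "").strip()
        abstract = (record.get("abstract") or "").strip()
        if not title or not abstract:
            continue  # excluded: empty title or abstract field
        keywords = " ".join(record.get("keywords") or [])
        # Drop empty parts (e.g. no keywords) so no double spaces appear
        parts = (title, keywords, abstract)
        documents[record["pmid"]] = " ".join(part for part in parts if part)
    return documents
```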

Topic modeling
Topic modeling is an unsupervised machine-learning approach that identifies latent concepts, or topics, in a large corpus of documents (Blei et al., 2003;Griffiths and Steyvers, 2004). This automated approach makes it possible to analyze the ~310k PubMed hits, which would not be feasible manually.
Standard data cleaning steps were performed, including conversion of the text to lowercase and removal of numeric values, punctuation marks, double spaces and single-character words. The text was then lemmatized using the spaCy large NLP model (V3.0.0; (Honnibal and Montani, 2017)), keeping only nouns, and stop words were removed using the standard stop word list from NLTK (V3.5; (Loper and Bird, 2002)). Subsequently, bigrams were created for every individual document (Loper and Bird, 2002) and added to the corpus if they occurred three or more times in that document. Finally, the preprocessed data were used to create a dictionary from which words occurring in fewer than 10 documents or in more than 75 % of the documents were removed. These steps were taken to improve the interpretability of the resulting topics (Debortoli et al., 2016;Martin and Johnson, 2015;Schofield et al., 2017). The preprocessing resulted in 20,127 and 19,195 unique tokens for the primate and rodent literature, respectively.
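The dependency-free parts of this pipeline can be sketched as follows: text cleaning, per-document bigrams kept when they occur at least three times, and the document-frequency filter (fewer than 10 documents or more than 75 % of documents). This is an illustrative sketch, not the study's code. Lemmatization and noun selection (spaCy) and stop word removal (NLTK) are omitted, and the following are assumptions: deleting punctuation without inserting a space (consistent with tokens such as 'nacetylcysteine' and 'selfstimulation' in the reported topics), joining bigrams with an underscore, and appending one bigram token per occurrence. In Gensim, `Dictionary.filter_extremes(no_below=10, no_above=0.75)` applies the same frequency rule; its default `keep_n` cap of 100,000 tokens would not bind at the vocabulary sizes reported here.

```python
import re
from collections import Counter


def clean_text(text):
    """Lowercase, remove digits and punctuation, and drop 1-character words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # numeric values -> space
    text = re.sub(r"[^\w\s]", "", text)   # punctuation deleted (assumption: no space)
    # split() also collapses repeated whitespace (double spaces)
    return [token for token in text.split() if len(token) > 1]


def add_doc_bigrams(tokens, min_count=3):
    """Append 'w1_w2' tokens for adjacent pairs seen >= min_count times in this document."""
    pairs = list(zip(tokens, tokens[1:]))
    counts = Counter(pairs)
    # Assumption: one bigram token is appended per occurrence of a frequent pair
    bigrams = [f"{a}_{b}" for a, b in pairs if counts[(a, b)] >= min_count]
    return tokens + bigrams


def filter_by_document_frequency(documents, no_below=10, no_above=0.75):
    """Keep tokens found in >= no_below documents and in <= no_above of all documents."""
    n_documents = len(documents)
    document_frequency = Counter(token for doc in documents for token in set(doc))
    vocabulary = {
        token
        for token, df in document_frequency.items()
        if df >= no_below and df / n_documents <= no_above
    }
    filtered = [[token for token in doc if token in vocabulary] for doc in documents]
    return filtered, vocabulary
```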
The Latent Dirichlet Allocation (LDA) topic model (Blei et al., 2003) as implemented in Gensim (V3.8.3; (Hoffman et al., 2010;Rehurek and Sojka, 2010)) was applied to the resulting dataset using standard hyperparameter settings with a chunk size of 20k, 100 passes, and 500 iterations. Topic models were fitted to the primate and rodent literature separately. The number of topics was estimated using the coherence value (Röder et al., 2015;Syed and Spruit, 2017): a crude search over 16 levels of topic granularity (2, 12, 22 … 152; step size 10) was followed by a fine-grained search over 15 levels of topic granularity centered on the winning model of the crude search (step size 1). The winning models had 43 topics for the primate and 49 topics for the rodent literature.
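The two-stage search over the number of topics can be sketched as below. `select_num_topics` accepts any scoring function; `lda_coherence` shows one way to score a model with Gensim's `LdaModel` and `CoherenceModel` using the chunk size, passes and iterations given above. The coherence measure (`c_v`), the window of ±7 around the crude winner, and all function names are assumptions, not the study's code.

```python
def crude_grid(start=2, stop=152, step=10):
    """Candidate topic numbers 2, 12, ..., 152 (16 values)."""
    return list(range(start, stop + 1, step))


def fine_grid(center, half_width=7, minimum=2):
    """15 consecutive topic numbers centered on the crude winner (step size 1)."""
    return [k for k in range(center - half_width, center + half_width + 1) if k >= minimum]


def select_num_topics(score):
    """Return the best number of topics after a crude and then a fine search.

    `score` maps a number of topics to a coherence value (higher is better).
    """
    best_crude = max(crude_grid(), key=score)
    return max(fine_grid(best_crude), key=score)


def lda_coherence(num_topics, corpus, dictionary, texts):
    """Fit one Gensim LDA model and return its coherence (assumed measure: c_v)."""
    from gensim.models import CoherenceModel, LdaModel

    lda = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        chunksize=20_000,
        passes=100,
        iterations=500,
    )
    coherence = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v")
    return coherence.get_coherence()
```

With real data, the scorer could be `functools.partial(lda_coherence, corpus=corpus, dictionary=dictionary, texts=texts)`. As a toy check, a hypothetical score peaking at 43 makes the crude search pick 42 and the fine search recover 43; caching scores would avoid refitting the crude winner.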

Topic categorization
To assess whether the probabilistic topic modeling resulted in semantically coherent topics, the topics were labeled by two independent raters (AA, MCK). For each topic, the 40 most relevant terms were extracted and exported to an Excel spreadsheet. The order of primate and rodent topics was randomized, and both raters categorized each topic into one or more of the following (sub)categories: 1. Anatomy and Physiology, 1. […] (Gwet, 2019) as implemented in NLTK (V3.5; (Loper and Bird, 2002)).
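Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies, κ = (p_o − p_e) / (1 − p_e). A minimal sketch, assuming each rater assigns exactly one main category per topic (the study allowed one or more (sub)categories, which this version does not handle); the category labels in the example are hypothetical. NLTK exposes this statistic via `nltk.metrics.agreement.AnnotationTask(data=...).kappa()`; the pure-Python version is only an illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each give one label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_expected == 1:
        return float("nan")  # both raters used one identical label: kappa undefined
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example with hypothetical category labels for four topics
print(cohens_kappa(["anatomy", "anatomy", "disease", "disease"],
                   ["anatomy", "disease", "disease", "disease"]))  # 0.5
```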

Statistical analysis
The distribution of PubMed hits per structure and species was tested with Pearson's χ² test, and the effect size was estimated using Cramér's V as implemented in R (R Core Team, 2021). The structure-structure similarity within each species was quantified using a correlation matrix and hierarchical dendrogram as implemented in Seaborn, Python (Waskom, 2021). To identify clusters of structures, the Euclidean distance between topic loadings and Ward's clustering method were used (Müllner, 2011;Ward, 1963). Normality was assessed using quantile-quantile (Q-Q) plots. As the Q-Q plots indicated that the data were normally distributed, Pearson correlations were used to compare the structure-structure similarity matrices between species (R Core Team, 2021). To check whether the correlation between the similarity matrices could have arisen by chance, we randomly shuffled the structure labels in the primate and rodent similarity matrices 10k times. Each shuffle yields a matrix with the same mean and standard deviation as the observed similarity matrix but without the underlying anatomical structure. The 10k shuffled matrices were then correlated between species, yielding a null distribution of primate-rodent correlations. Finally, we calculated how many standard deviations the observed correlation lay from the mean of the permuted null distribution.
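The label-shuffling null distribution can be sketched in plain Python. Permuting the rows and columns of a symmetric similarity matrix with the same permutation only relabels the structures, so the set of off-diagonal values (and therefore their mean and SD) is unchanged. This is a minimal sketch under assumptions: only the upper triangle (excluding the diagonal) is correlated, only one of the two matrices is shuffled (shuffling both independently yields the same null distribution), and the function names are hypothetical.

```python
import random
import statistics


def upper_triangle(matrix):
    """Entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permute_labels(matrix, perm):
    """Relabel structures: apply the same permutation to rows and columns."""
    n = len(matrix)
    return [[matrix[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def permutation_z(sim_primate, sim_rodent, n_perm=10_000, seed=0):
    """Return observed r, its distance from the null mean in null SDs, and the null."""
    rng = random.Random(seed)
    x = upper_triangle(sim_primate)
    observed = pearson(x, upper_triangle(sim_rodent))
    labels = list(range(len(sim_rodent)))
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # a fresh uniformly random relabeling each time
        shuffled = permute_labels(sim_rodent, labels)
        null.append(pearson(x, upper_triangle(shuffled)))
    z = (observed - statistics.fmean(null)) / statistics.stdev(null)
    return observed, z, null
```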

Open science and data availability
All abstracts resulting from the search query, as well as the code used to analyze the data and to generate the figures, are available on OSF (DOI 10.17605/OSF.IO/GXCB5). The 40 most salient keywords per topic and species are also made available.

Number of PubMed IDs per structure and species
After the removal of duplicates and PubMed IDs without a title and/or abstract, the primate and rodent queries resulted in 121,967 and 138,323 IDs, respectively. The number of documents per structure and species is shown in Fig. 1. After data cleaning, there were on average 79.02 (SD: 29.27) and 80.67 (SD: 27.47) tokens per document for the primate and rodent literature, respectively.
Pearson's χ² test indicated that the distribution of PubMed IDs per structure differed significantly between species (χ²(20) = 26,789, p < 0.001; Cramér's V = 0.32, a moderate effect size). In other words, the relative research focus on a given structure depends on the species.
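As a consistency check, Cramér's V for the 21 (structures) × 2 (species) contingency table can be recomputed from the reported statistic, assuming N is the combined number of primate and rodent IDs after cleaning (121,967 + 138,323 = 260,290). With two species, min(r, c) − 1 = 1, and the degrees of freedom are (21 − 1)(2 − 1) = 20, as reported:

```latex
V = \sqrt{\frac{\chi^2}{N\,(\min(r, c) - 1)}}
  = \sqrt{\frac{26789}{260290 \times 1}}
  \approx \sqrt{0.1029}
  \approx 0.32
```

This agrees with the reported V = 0.32.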

Topic fingerprint
Based on the coherence value, the number of topics was 43 for the primate literature and 49 for the rodent literature. The interpretability of the topics was quantified by two independent raters who assigned the 92 topics to a number of categories. The interrater reliability was moderate (Cohen's Kappa for all categories: 0.54; for the main categories only: 0.58), indicating that the LDA topic modeling resulted in semantically coherent topics.
On average, a given primate topic was the dominant topic for 2836.44 documents (SD: 1514.35). For the rodent topics, this was 2822.92 (SD: 1619.73), indicating that topics were on average based on a similar number of documents across species. The mean number of topics per structure was 38.14 (SD: 7.79) for the primate literature and 44.71 (SD: 4.83) for the rodent literature. The topic fingerprints of the 21 structures per species are shown in Fig. 2.
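One way to obtain each document's dominant topic and a per-structure fingerprint is sketched below, assuming the fingerprint is the fraction of a structure's documents for which each topic is dominant; the study's exact definition of topic loading may differ, and the function names are hypothetical. The input format matches the (topic_id, probability) pairs returned by Gensim's `LdaModel.get_document_topics`. Because every document has exactly one dominant topic, the mean number of documents per topic is simply N / K, e.g. 121,967 / 43 ≈ 2836.44 for the primate corpus.

```python
from collections import Counter


def dominant_topic(doc_topics):
    """Topic id with the highest probability among (topic_id, probability) pairs."""
    return max(doc_topics, key=lambda pair: pair[1])[0]


def topic_fingerprint(doc_topics_by_structure, num_topics):
    """Per structure, the fraction of its documents for which each topic is dominant.

    Hypothetical fingerprint definition; `doc_topics_by_structure` maps a
    structure name to one list of (topic_id, probability) pairs per document.
    """
    fingerprints = {}
    for structure, documents in doc_topics_by_structure.items():
        counts = Counter(dominant_topic(doc) for doc in documents)
        fingerprints[structure] = [counts[k] / len(documents) for k in range(num_topics)]
    return fingerprints


def topics_present(fingerprint):
    """Number of topics that are dominant for at least one document."""
    return sum(1 for share in fingerprint if share > 0)
```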

Structure-function similarity
A number of brain areas, such as the NAc, amygdala, hypothalamus, SN, STN, and VTA, showed comparable structure-function associations in both groups of species. The two largest topics for the primate amygdala both focus on limbic processes, as indicated by the salient terms that contribute to each topic (primate topic 13: 'emotion', 'face', 'recognition' and 'aggression'; primate topic 33: 'stress', 'fear', 'ptsd', and 'threat'). For the rodent, the two largest topics associated with the amygdala relate to epilepsy and limbic processes (rodent topic 5: 'seizure', 'kindling' and 'convulsion'; rodent topic 36: 'memory', 'fear', 'learning', 'avoidance' and 'extinction'). The hypothalamus was strongly associated with topics related to metabolism, homeostasis, and hormones (primate topic 20: 'food', 'sleep' and 'obesity'; rodent topic 37: 'hormone', 'secretion' and 'crf'; rodent topic 14: 'food', 'leptin', 'intake', and 'weight'). Given the neurodegeneration of the SN in Parkinson's disease, it was expected that the largest primate topic for the SN would be disease specific (primate topic 22: 'pd', 'neuron', 'loss', and 'dopamine'); the second largest topic relates to biochemistry (primate topic 28: 'acid', 'enzyme', and 'phospholipid'). Similarly, for the rodent, the two largest topics for the SN are related to disease and biochemistry (rodent topic 7: 'neurodegeneration', 'parkinson', and 'model'; rodent topic 38: 'acid', 'phospholipid', and 'rhythm').
Fig. 1. The number of PubMed IDs for a given structure and species. The top three structures for the primate search query are the substantia nigra, hypothalamus, and amygdala. The top three structures for the rodent search query are the hypothalamus, substantia nigra, and the nucleus accumbens.
As a frequently used DBS target in Parkinson's disease, the STN has a similar structure-function association in both species (primate topic 30: 'stimulation', 'dbs', 'parkinson', and 'motor'; rodent topic 44: 'stimulation', 'frequency', and 'selfstimulation'). The VTA was associated with reward- and addiction-related topics in both species (primate topic 21: 'dopamine', 'alcohol', 'cocaine', 'addiction' and 'reward'; rodent topic 27: 'dopamine', 'nicotine', and 'accumbens').
At first glance, the NAc showed a different structure-function association across species, as the largest primate topic was cancer related (primate topic 25: 'cell', 'cancer', 'growth' and 'apoptosis') whereas the largest rodent topic relates to antioxidant processes (rodent topic 11: 'oxide', 'glutathione', 'oxygen' and 'nacetylcysteine'). In both cases, the association is most likely a misattribution: N-acetylcysteine is abbreviated NAC and the accumbens nucleus NAc, two abbreviations that a case-insensitive PubMed search cannot distinguish. The antioxidant properties of N-acetylcysteine are widely used in the prevention and therapy of a number of cancers (Breau et al., 2019;Lee et al., 2011, 2013). As a result, the PubMed query appears to have included studies that focus not on the accumbens nucleus but on N-acetylcysteine. The second largest topic for the accumbens nucleus in both primate and rodent is, however, reward and addiction related (primate topic 21: 'dopamine', 'alcohol', 'cocaine', 'addiction' and 'reward'; rodent topic 22: 'cocaine', 'reward', 'heroin' and 'addiction').

Structure-structure similarity
Using the topic loading fingerprint of each structure, we asked which subcortical structures are functionally similar and whether the relationships between structures are similar across species. As shown in Fig. 3, a number of structures cluster together irrespective of species: the internal capsule, caudate nucleus, and putamen; the amygdala and subcallosal area; the STN and GPe; and the innominate substance, ventral posteromedial nucleus of the thalamus and red nucleus.
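The two steps behind the structure-structure analysis, a Pearson correlation matrix between structure fingerprints and Ward clustering on Euclidean distances, can be sketched as follows. The pure-Python correlation matrix is illustrative; `ward_clusters` imports SciPy lazily and uses `scipy.cluster.hierarchy.linkage` and `fcluster`, whereas the study reports using Seaborn (whose `clustermap` also accepts `method="ward"`). Cutting the tree into a fixed number of clusters and the function names are assumptions.

```python
import statistics


def pearson(x, y):
    """Pearson correlation; undefined for a constant fingerprint."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def similarity_matrix(fingerprints):
    """Structure-by-structure Pearson correlation of topic fingerprints."""
    names = sorted(fingerprints)
    matrix = [[pearson(fingerprints[a], fingerprints[b]) for b in names] for a in names]
    return names, matrix


def ward_clusters(fingerprints, n_clusters):
    """Ward clustering on Euclidean distances between fingerprints (requires SciPy)."""
    from scipy.cluster.hierarchy import fcluster, linkage

    names = sorted(fingerprints)
    tree = linkage([fingerprints[name] for name in names], method="ward", metric="euclidean")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(names, labels.tolist()))
```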
Interestingly, a number of structures have different cluster members in the primate literature compared to the rodent literature. For instance, the NAc and VTA have comparable structure-function associations in the rodent literature but not in the primate literature, where the two structures belong to distinct cluster branches. Similarly, the PPN has a topic loading comparable to that of the STN and GPe in primates, whereas in rodents the PPN is more similar to the innominate substance, ventral posteromedial nucleus, and red nucleus. Despite these differences between species, the primate structure-structure similarity matrix correlates (moderately) with the rodent structure-structure similarity matrix (r = 0.60, p < 0.001). To test whether this correlation was spurious, a null distribution of correlations was estimated. The permutation results indicate that the observed correlation is 8.61 SD removed from the permuted null distribution (see Fig. 4).

Discussion
We set out to investigate the structure-function associations of DBS target sites and whether these associations were comparable across species. Topic modeling revealed that the topics most frequently associated with a number of structures seemed to capture similar semantic themes across species. For instance, in both groups, the amygdala is predominantly associated with limbic processes, the hypothalamus with homeostasis, the SN with Parkinson's disease and the VTA with reward- and addiction-related processes.

Limitations
Our study is inherently limited by publication bias towards statistically significant differences. Additionally, the published results potentially suffer from confirmation bias: cases in which experiments are unintentionally designed to further strengthen a structure-function association prevalent in the literature (Holman et al., 2015;Nickerson, 1998). As a result, the literature and resulting reviews become strongly biased towards a specific functional association and potentially overlook other functional roles that a structure might have (Greenwald et al., 1986;Keuken et al., 2012). Moreover, the search query only included scientific publications from a single database. As such, the inclusion of grey literature (i.e., literature that is not formally published in sources such as peer-reviewed journals) in the current study is limited (Haddaway et al., 2020;Shultz, 2007). Another factor that might have introduced unintended biases within and between species is the choice of anatomical nomenclature and the extent to which it is actually used by the scientific community (Keuken et al., 2018a). We tried to minimize this bias by including multiple recently published nomenclatures for different species, but we may still have missed certain (historical) naming conventions. Two examples are the entopeduncular nucleus and the ventral intermediate nucleus (VIM) of the thalamus. The term entopeduncular nucleus is frequently used to refer to the globus pallidus internal segment in rats. In the rodent nomenclature used in the current study, however, this term was deemed anomalous and was therefore not included in the search query (Swanson, 2018). For the VIM, the nomenclature was challenging because a number of competing and conflicting nomenclatures for the thalamus exist (Mai and Majtanik, 2019). This makes large-scale automated meta-analytic approaches challenging and calls for standardized anatomical nomenclature and its consistent use.
With these limitations in mind, a number of conclusions can be drawn from the current study.
Fig. 3. The correlation and hierarchical clustering between structures per species. Note that the order of structures follows the hierarchical clustering layout and therefore differs between the primate and the rodent panel.

Structure-function (dis)similarities
The similarity in topic fingerprint between a number of brain areas indicates overlapping functional associations. The amygdala and subcallosal area are an example of such a cluster where, irrespective of species, the main functional associations relate to limbic processes. It is therefore not surprising that these areas are candidate DBS target sites for neuropsychiatric disorders such as treatment-resistant depression (TRD) and post-traumatic stress disorder (PTSD) (Langevin et al., 2016;Merkl et al., 2013).
Based on the DBS literature on Parkinson's disease, we initially expected that the STN, GPi and posterior ventrolateral nucleus, which includes the VIM of the thalamus, would have similar structure-function associations (Hartmann et al., 2019). This was the case for neither the primate nor the rodent literature, as the STN did not belong to the same cluster as the GPi and posterior ventrolateral nucleus. Instead, the STN was found in the same functional association cluster as the GPe. This could be due to the substantial bidirectional connections between the two nuclei and their joint role in motor control and optimal action selection (Benarroch, 2008;Bogacz et al., 2016;Ditterich, 2010;Lepora and Gurney, 2012). This white matter connection is known to be present in both primates and rodents (Milardi et al., 2019).
The structure-structure cluster analysis also hinted at species-specific clusters. Based on the primate structure-function associations, the STN and PPN were similar in their topic loading. A possible explanation is that the STN is one of the first DBS target regions for alleviating PD symptoms, whereas the PPN is an emerging target for the same disease (Anderson et al., 2017;Hamani et al., 2016). We did not find such a functional association in the rodent literature, as the PPN was not associated with the STN, GPi, or posterior ventrolateral nucleus, the three most frequently used DBS targets in PD (Anderson et al., 2017). Whether this cross-species discrepancy for the PPN is caused by the substantial interspecies differences in its afferent and efferent white matter connections remains unknown (Alam et al., 2011).
There is considerable work highlighting commonalities and differences in white matter micro- and macro-anatomy between primates and rodents (Mota et al., 2019;Scholtens et al., 2018;Van Essen et al., 2019). An interesting future application of probabilistic topic modeling could be to focus solely on white matter tracts and test their associations with certain diseases and disorders. Such an analysis could potentially reveal novel white matter targets to consider for DBS (Rodrigues et al., 2018;Sui et al., 2021). Another factor to consider is the massive cortical expansion from rodents to primates, as a result of which the cortical regions projecting to the subcortex are less homologous between species (Fernández et al., 2016;Schaeffer et al., 2020;Van Essen et al., 2019). So while a single subcortical area might have similar functional associations across species, these areas and their associated networks in primates seem to receive a wider range of cortical information, which can accommodate more complex behavior (Buckner and Krienen, 2013;Halloway, 1967).

Many-to-one mapping
In cases where multiple structures are used as DBS targets for the same disorder, one might expect some overlap in their structure-function associations. An example of such a many-to-one mapping is the internal capsule and the NAc for obsessive-compulsive disorder (OCD) (Borders et al., 2018). Given the differences in cluster membership of the two structures, both within and between species, this is clearly not the case. One explanation for why both areas are considered for the treatment of OCD is that the same region is targeted, but that the nomenclature used to describe it is not precise enough (Haber et al., n.d.). Another explanation is that two adjacent areas are targeted and that, due to imprecise electrode placement and/or current spread, it is unclear which area is responsible for the clinical improvement (Horn et al., 2019). A third explanation for why multiple target regions successfully alleviate a disease or disorder is that they are part of the same structural and functional connectome (Clelland et al., 2014;Horn et al., 2021;Li et al., 2020). Finally, a more speculative explanation might be that as the disease progresses, networks are reorganized (Calabresi et al., 2007;Chu, 2020), and other regions become more clinically relevant targets for that disease stage.
We also identified many-to-many mappings, such as the nucleus accumbens and the subcallosal area for the treatment of both depression and anorexia. In such mappings, it is possible that the disorders share a number of symptomatologic and neurobiological features (Oudijn et al., 2013). To understand which disorders share a common structural and functional network, it is necessary to identify the entire network and the role of each individual nucleus in vivo. The subcortex is, however, notoriously difficult to image with conventional MRI methods, and a given structure requires tailored structural and functional sequences (Forstmann et al., 2017;Keuken et al., 2018b). While challenging, future work would benefit from focusing on the multimodal mapping of the human subcortex, including a detailed connectome, similar to what has been done for the cortex (Glasser et al., 2016;Van Essen et al., 1998). Such mapping will be invaluable for better understanding the occurrence of side effects as well as the many-to-one and many-to-many mappings between DBS targets and diseases and disorders.

Conclusion
Overall, while some differences are present, the structure-function associations of most of the 21 included subcortical areas were similar across species. A number of structures were also similar to one another in their functional associations. This is potentially due to their being part of the same brain network and might explain why multiple DBS target sites are considered for a single disease or neuropsychiatric disorder.
Fig. 4. The correlation coefficient of structure-structure association similarity between the two groups of species. The red line indicates the observed correlation coefficient (0.60), which is 8.61 SD removed from the permuted null distribution.

Declaration of Competing Interest
The authors declare no competing financial interests.