Oxytocin and vasotocin receptor variation and the evolution of human prosociality

Modern human lifestyle strongly depends on complex social traits like empathy, tolerance and cooperation. These diverse facets of social cognition have been associated with variation in the oxytocin receptor (OTR) and its sister genes, the vasotocin/vasopressin receptors (VTR1A/AVPR1A and AVPR1B/VTR1B). Here, we compared the available genomic sequences of these receptors between modern humans, archaic humans, and 12 non-human primate species, and identified sites that show heterozygous variation in modern humans and archaic humans distinct from variation in other primates, and for which we could find association studies with clinical implications. On these sites, we performed a range of analyses (variant clustering, pathogenicity prediction, regulation, linkage disequilibrium frequency), and reviewed the literature on selection data in different modern-human populations. We found five sites with modern human specific variation, where the modern human allele is the major allele in the global population (OTR: rs1042778, rs237885, rs6770632; VTR1A: rs10877969; VTR1B: rs33985287). Among them, variation in the OTR-rs6770632 site was predicted to be the most functional. Two alleles (OTR: rs59190448 and rs237888) present only in modern humans and archaic humans were putatively under positive selection in modern humans, with rs237888 predicted to be a highly functional site. Three sites showed convergent evolution between modern humans and bonobos (OTR: rs2228485 and rs237897; VTR1A: rs1042615), with OTR-rs2228485 ranking highly in terms of functionality and reported to be under balancing selection in modern humans (Schaschl, 2015) [1]. Our findings have implications for understanding hominid prosociality, as well as the similarities between modern human and bonobo social behavior.


Introduction
Modern humans are characterized by prosociality, a broad term that encompasses intraspecies empathy, social tolerance, cooperation and altruism. While our closest living relatives, chimpanzees (Pan Troglodytes) and bonobos (Pan Paniscus), live in highly organized social groups, modern human social networks are larger and denser, powered by a complex social cognitive machinery [2]. Modern humans are also characterized by great intrasocial compassion and display a clear tendency to act in concert, to the extent that our species has been labeled as 'ultra-social' [3].
Less is known about the sociality of our extinct human relatives, the Neanderthals and the Denisovans. Paleogenomic studies show that both Neanderthal and Denisovan populations endured a population size bottleneck (i.e., sharp reductions in their population size) [4]. In addition to this, high inbreeding rates evident in both the Altai Neanderthal [5] and the Denisovan [6] genomes, as well as lower genetic diversity in Neanderthals compared to modern humans [7] suggest that their social groups were most likely quite smaller than ours. According to Rogers and colleagues [8], Neanderthals were deeply subdivided into small isolated populations with scarce contact between them, which may be associated with a social profile distinct from Homo sapiens.
A wide range of studies have highlighted a role of oxytocin and vasopressin/vasotocin receptor genes in a broad range of behaviors, including social [9][10][11] and non-social behaviors [12,13]. Relevant to this study, some of the social behaviors these genes have been associated with are bond formation, parental care, social recognition and social vocalizations [9][10][11]. Alterations in these genes have been linked to disorders that include social deficits, such as Autism Spectrum Disorders (ASD) [10]. These findings have led some researchers, most prominently Hare [14], to ascribe to these genes a key role in the emergence of human ultra-social behavior.
Considering both the evidence on modern human prosociality and the evidence on the involvement of the oxytocin and vasopressin/ vasotocin receptors in social behaviors, we hypothesized that the evolution of these receptors might elucidate the genetic basis of the evolution of hominin prosociality. For this reason, we searched for sequence variation in the OTR-VTR1 family of genes in modern humans, archaic humans and non-human primates (specifically, OTR (a.k.a. OXTR), VTR1A (a.k.a. AVPR1A) and VTR1B (a.k.a. AVPR1B), following the nomenclature proposed in Ref. [15]). Since fixed or nearly fixed changes associated with sociality have not been found so far in these genes in comparisons between modern humans (MH) and archaic humans (AH) or non-human primates (NHP) (chimpanzees and bonobos in particular) [5,[16][17][18], we sought to identify potential polymorphic heterozygous sites in MH that are not found in AH or/and NHP, and that have been associated with social features/deficits; polymorphic heterozygous sites are those where at least two alternative sequences are found in a population. We identified 5 sites with MH specific variation, where the modern human alleles are the major alleles in the global population. Among them, there were sites in the OTR regulatory regions that are active in the brain and associated with prosocial behavior in modern humans. We also identified 3 convergent sites between modern humans and bonobos, a primate species that shows convergence of prosocial behaviors with humans, separate from chimpanzees. Some of the sites have been reported to be under balancing or positive selection in MH. Our findings shed light on the evolutionary questions on modern human and hominid prosociality in general, as well as on similarities in the social behavior of MH and bonobos.
Specifically, of the 61 sites analyzed, 29 had associations with sociality (Supplementary Tables 7-9) and simultaneously showed a variation pattern found only in MH, in AH and MH, or in MH and a NHP species: 19 OTR-SNPs, 7 VTR1A-SNPs and 3 VTR1B-SNPs. Based on their variation patterns in MH, AH and NHP, we classified the 29 sites into five categories (Fig. 1): 1) Modern Human Unique (MHU) for sites where the heterozygous allelic variation is unique to MH (e.g., T/A in MH and A/A in all other species); 2) Modern Human Specific (MHS) for sites where MH have a specific variation pattern (e.g. T/A) different from other species (e.g, C/A); 3) Homo Unique (HU) for sites that are variant in both MH and AH in the same fashion (e.g., T/A), while the rest of the NHP are invariant (e.g., A/A) or have a different variant (e.g., C/A); 4) Homo Specific (HS) for sites that are variant in MH (e.g., T/A), and invariant in AH (e.g., T/T), different from all NHP species (e.g. A/A); and 5) MH-NHP for sites with convergent variation between MH (e.g., T/A) and another NHP (e.g., T/A in bonobo), where other species/ populations are invariant for the ancestral allele (e.g., A/A). Of these 29 sites associated with sociality, most (13) were MHU in OTR, while the 3 in VTR1B were MHS. The others were in a mixture of gene-species combinations, with 2 HU sites in OTR and 1 in VTR1A; 4 HS sites in OTR and 1 VTR1A; and 3 convergent sites between MH and bonobos (2 in OTR and 1 in VTR1A) (Fig. 1, Table 1).
Ancestral allele and functional association analyses revealed that in almost all cases, the ancestral and the MHU, MHS, HU or HS alleles showed mixed associations with sociality, i.e., the same allele was associated with both 'prosocial' and 'antisocial' features in the literature. In other words, we did not find a general trend where the ancestral allele was associated with 'antisocial' features, and the new allele with 'prosocial' ones, or the other way around (Supplementary Tables 7-9). For example, the MHU OTR-rs1042778(G), which is very frequent in MH populations, has been associated with ASD [27], but also with altruistic and comforting behaviors [28] (Supplementary Table 7). A similar trend of mixed social associations for each variant, or absence of evidence for one of the allelic variants, was found for the rest of the sites (Supplementary Tables 7-9).

Pathogenicity prediction analyses
Most of the sites we identified were located in regulatory regions of the genes, but some in exons giving rise to missense substitutions ( CADD scores that predict variant deleteriousness showed that MH-NHP OTR-rs2228485(A) and VTR1A-rs1042615 (G, C, T) alleles are likely deleterious (i.e., scores >20 predict deleteriousness), while HS OTR-rs4686302(T) also scored very close to this value (19; Table 1). Scores for most other sites ranged from ~0 to 6, meaning their predicted effect is likely benign. GERP tests showed positive evolutionary rate scores (Methods) only for MHU OTR-rs2254298 and HS OTR-rs13316193 (Table 1), which represent highly conserved sites that are likely to be functional. Combined REVEL, MetaLR, SIFT and PolyPhen-2 scores classified HS OTR-rs4686302 as more likely benign, with only the Mutation Assessor classifying it as likely deleterious (Table 1). MH-NHP VTR1A-rs1042615 was ranked by most of the aforementioned analyses as deleterious or likely deleterious, with only MetaLR ranking it as benign. SIFT and PolyPhen-2 tests categorized MHS VTR1B-rs28632197 as benign. The finding that in both MH-NHP sites we identified deleterious signals for the ancestral alleles, namely those present in MH and all NHP, including bonobos, and not for the MH-NHP convergent alleles, perhaps suggests selection differences in social behavior in MH and bonobos vs. other NHP.

Regulation analyses
We performed regulation analyses using RegulomeDB [36] and the Ensembl regulation data resources [37]. RegulomeDB makes use of large datasets, including ChIP-seq, FAIRE-seq and DNase I hypersensitive information for a variety of important regulatory factors across a diverse set of cell types; this includes chromatin state information for over 100 cell types, transcription factor binding sites (including Position-Weight Matrix and DNase Footprinting), and expression quantitative trait loci (eQTL) information allowing the association of distal sites with gene promoters. The Ensembl resources provide a functional annotation of the regulatory elements in the human genome, including data on epigenetic marks, transcription factor binding and DNA methylation, as well as microarray probe mappings.
Our RegulomeDB analyses ranked highly the MHU OTR-rs237889 variant site as lying in a functional genomic location (rank 1f; Table 2). Consistent with these findings, this same site is found in a genomic location with peak Chip-seq and DNase-seq signal for open chromatin in the brain and other cell types (Fig. 2). Open chromatin regions are regions that can be bound by protein factors and play various roles in gene transcription, DNA replication, and nuclear organization. Accordingly, this region (OTR-rs237889) contains binding sites for the EGR2 (Early Growth Response 2) and ZFHX2 (Zinc Finger Homeobox 2) transcription factors and is located in an eQTL that affects the expression of CAMK1 (Calcium/Calmodulin Dependent Protein Kinase I) ( Table 2). The second most highly ranked site was the MH-NHP OTR-rs2228485 allele (rank 2b). ChIP-seq, DNase-seq and FAIRE-seq data indicate that this site is in a genomic region active in the brain, and in particular the dopaminergic substantia nigra neurons and one of its projection targets, the caudate nucleus of the striatum, as well as the hippocampus, other parts of the temporal lobe, the cingulate and angular gyri, and the middle frontal area 46 ( Table 2). This site is implicated in binding with CTCF (CCCTC-Binding Factor), EZH2 (Enhancer Of Zeste 2 Polycomb Repressive Complex 2 Subunit), POLR2A (RNA Polymerase II Subunit A), RAD21 (RAD21 Cohesin Complex Component) and IKZF1 (IKAROS Family Zinc Finger 1) transcription factors, but with the SNP potentially altering the binding for PAX6 (Paired Box 6) and ZSCAN4C (Zinc Finger And SCAN Domain Containing 4) ( Table 2).
MH-NHP OTR-rs2228485, MHU OTR-rs2268493, HU OTR-rs60902022, HU OTR-rs237888 and MHS VTR1B-rs28632197 were all less likely to affect regulation (rank 3a), but still had open chromatin regions that either fell within the binding site of different proteins, or alter gene motifs, or both, and showed activity in brain regions like the cingulate gyrus and the hippocampus ( Table 2; Fig. 2a and b). The remaining sites (OTR: rs2268490, rs4686302, rs237917, rs6770632, rs59190448 and rs9872310; VTR1A: rs10877969 and rs1042615; VTR1B: rs33985287) did not rank highly in changing regulation, according to the RegulomeDB scores (ranks 4-6), but were also found in open chromatin or promoter regions (according to Ensembl or Reg-ulomeDB data), or in a transcription factor binding domain (Fig. 2a); all showed activity in the brain. A total of 8 of these 15 sites were located within CTCF and the POLR2A transcription factor binding motifs ( Table 2; Supplementary Table 10). These findings indicate that variant sites associated with sociality are located within transcription factor binding motifs, which could cause variant brain gene expression among the MH population.

Linkage disequilibrium analysis
We ran Linkage Disequilibrium analysis (using LDmatrix [38]) to test if any of the sites we identified are co-inherited, using data from all populations included in the 1000 Genomes Project, using a cut off of R 2 > 0.8. This analysis revealed four haplotype blocks: one containing co-inheritances among rs60902022, rs13316193 and rs11131149 in OTR; a second between rs10784339 and rs10877969 in VTR1A; a third between rs11174811 and rs3021529 also in VTR1A; and a fourth between rs33985287 and rs28676508 in VTR1B (Fig. 3, Supplementary  Table 11). This suggests that the combinations of these variant alleles might act in concert to shape functional differences that end up affecting phenotypes controlled by these genes.

Population data and putative selection signatures
We further analyzed the human allele frequency data for these sites (1000 Genomes Project [39] Phase 3 and gnomAD [40] v2.1.1 and v3) and reviewed the literature for selection on these sites (Table 1, Supplementary Tables 12-13). We used the supplementary information in Schaschl et al., 2015 [1], and filtered the results of their FDIST, Bayescan, and extended Lewontin and Krakauer (FLK) tests, reporting here only those on our identified SNPs with significant values (p or q values < 0.05) (Supplementary Table 13). We also analyzed data from Kovalaskas and colleagues [41] on 13 bonobos, where strong positive selection signals in OTR and VTR1A were identified. We recognize that selection signals are hard to ascertain, and no strong claims should be laid on them. We report them as an additional source of available information concerning the alleles under discussion.

Table 1
List of identified variants in Modern humans and pathogenicity prediction scores. For each variant, we have listed where it is found in the genetic sequence (intronic, 3 ′ -UTR) and if in the exon, whether it gives rise to a missense or a synonymous mutation ('Consequence'); if exonic, and what 'Amino acids' it gives rise to; all MH (Modern human) alleles found on this site, with the ancestral being in bold; the 'Type' of each site based on our 5 categories ( Table 1). Several of these sites have a high derived allele frequency of ≥70% globally (OTR-rs6770632(C), VTR1A-rs10877969 (T), VTR1B-rs33985287(T)), while OTR-rs1042778(G) and OTR-rs237885(T) showed lower frequencies (59% and 51%, respectively). Similarly, we found one HU and one HS case, where the new hominin (AH or MH) allele is the major allele in the global MH population (HU: OTR-rs59190448(G); HS: OTR-rs237888(T); Table 1). Both sites show derived allele frequencies of >85%, and they have been associated with signals of positive selection. Specifically, based on our filtering of the results in Ref. [1], all three approaches (FDIST, BayeScan and FLK) identified the OTR-rs59190448(G) to be strongly under positive directional selection (FDIST p = 0.0004, BayeScan q = 0.0004, FLK p = 0.001), while only the FLK test (p = 0.011) detected the OTR-rs237888 to be under positive selection (Supplementary Table 13). Interestingly, both the OTR-rs237888(T) and the OTR-rs59190448(G) are present in the majority of the AH genomes (6/8 for OTR-rs237888(T) and 5/7 for OTR-rs59190448(G); Supplementary Table 1); however, we acknowledge that the genomes of a small number of AH individuals currently available might not be representative of the general archaic population. These results, though tentative, show a similar frequency trend in AH and MH.
Several sites where we encountered a new allele either in AH or in MH are currently the minor allele in MH populations, with some of them being present at low percentages in the global population (<25%) (OTR-rs9872310, rs2268493, rs2254298, rs4686302, rs237911; VTR1A-rs11174811, rs3803107, rs3021529, rs3759292; and VTR1B-rs28676508, rs28632197). Our filtering of the results in Ref. [1] showed that some of them are under balancing selection: OTR-rs237911  Table 13). Interestingly, VTR1A-rs3759292 is also in a region that is found deleted in 0.3% of the macaques sampled on that site (739; Supplementary Table 2), and according to Donaldson and colleagues [26], who found sites with both variation and deletion events in the MH and NHP VTR1A microsatellite, such regions might have influenced sociobehavioral traits during primate evolution.
Among the three sites where MH and bonobos showed convergent variation (OTR-rs2228485 and rs237897; VTR1A-rs1042615), the OTR-rs2228485(G) was present in 13/15 (87%) bonobos we studied, and the alternate allele (A) was present in 2/15 (13%) bonobos. Interestingly, the 13 G-carriers were all from the same bonobo population [42], while the remaining 2 A-carriers were bonobos whose genome was sequenced for reference genomes. Although in all these sites, the convergent allele was not the major allele in the global MH population, in some MH populations it is present at ~50% (Table 1, Supplementary Table 12). In particular, the convergent OTR-rs2228485 G-allele that was found at 87% among the bonobo population, is present at 28% in the global MH population, while at 46% in the American population. OTR-rs237897(A) was found in 40% of the MH global population (50% in Europeans) and 13% in bonobos. VTR1A-rs1042615(A) reached 35% in the global MH population (49% in Europeans), and 87% in bonobos. Of these sites, only the OTR-rs2228485 was found to show signs of balancing selection (FDIST, p = 0.0149) in MH (Supplementary Table 13). None of these convergent sites were among the specific sites of strong positive selection identified independently in bonobos [41], but several of these sites were located in close vicinity to our identified sites in the OTR and VTR1A (e.g., one site was found at a ~14 kb distance from VTR1A-rs1042615). These findings suggest a correlation of several sites associated with sociality also showing convergent changes in MH and Table 2 RegulomeDB high-ranked results and Ensembl regulation results. For each site, we list the gene they are found in, ranking scores from RegulomeDB, if they are found at open chromatin regions based on Chip-seq, DNase-seq or FAIREseq (✓) data from Ensembl regulation database. This includes sites found on promoter or open chromatin regions, expression quantitative trait loci (eQTL) regions, proteins bound (* for those mentioned only in Ensembl, *+ for those supported from both RegulomeDB and Ensembl data. All the rest come from RegulomeDB), PWM (positional weight matrix), whether they are active in the brain (e.g. in chromatin with evidence for strong transcription in the brain) and in which brain regions (SN: Substantia nigra; Cd: Caudate nucleus; CG: Cingular gyrus; Hi: Hippocampus; MFA 46: Middle Frontal Area 46; ANG: Angular Gyrus; TL: temporal lobe).
bonobos, not found in chimpanzees and other NHP examined.

Discussion
As Ambrose [43] stated more than a decade ago, "Neanderthals may have been deficient in the neuroendocrine receptor variants that responded to the hormones that have been shown to correlate with trust, reciprocity, and cooperation". When they made that statement, they pointed out that "comparison of Neanderthal and modern human genes for these neuroendocrine systems could help test this hypothesis." In some sense, we followed this suggestion, taking advantage of the increasing number of (ancient) genomes available.
We found multiple cases of allelic heterozygosity variation in the OTR, VTR1A and VTR1B genes that are specific to MH or to both MH and AH, where one allele is unique to them among primates and the other shared with other primates. Variants in all three genes have been associated with sociality, although more so for OTR than VTR1A and VTR1B. We are aware that this could also be influence by a publication bias favoring sociality studies with OTR. Taken as a whole, our results paint a mosaic picture concerning the evolution of our prosocial nature. In line with other works [44,45], we do not contend that a specific allele made all the difference for any increase in prosociality. Rather, we believe it is the combination of heterozygous alleles among the population that can have different impacts on genes and traits, such as the protective affect for sickle cell anemia [46]. In the same way that SNPs act as general risk factors, and not as specific modifier alleles in certain behavioral or biological contexts (i.e. disorder, gender) [47] (Supplementary Note 2), we suggest that these variants could have a synergistic heterozygous impact on the evolution of social behavior in hominids. In other words, we found that certain alleles that were associated with more social deficits in one disorder (e.g., ASD) had either no effect, or conferred a benefit with respect to social abilities in another disorder (e.g., ADHD). Additionally, the impact of the alleles of a certain variant site is different depending on environmental factors, because environmental processes may be differentially accentuated through hormone signaling. We thus focus our interpretations on the functional and frequency analyses.
We consider that of the MHU and MHS SNPs, only those five where the new allele is the major allele in MH populations could be responsible for beneficial changes in MH relevant to sociality (OTR-rs6770632, rs1042778, rs237885; VTR1A-rs10877969; VTR1B-rs33985287), especially those found at ≥70% (OTR-rs6770632, VTR1A-rs10877969, VTR1B-rs33985287). OTR-rs6770632 is the only site among these three that falls in open chromatin regions active in the brain, and particularly in the cingulate gyrus, a brain region involved in social cognition [48]. Moreover, this site is in a genomic region containing the binding domain of EZH2, a gene that is involved in the etiology of memory impairments [49], autism [50] and Weaver syndrome, an overgrowth/intellectual disability syndrome [51].
These MHS and MHU sites bear on several working hypotheses that account for MH prosociality. Among them, the 'self-domestication hypothesis' [14,[52][53][54][55] posits that certain physiological and behavioral traits that MH share with domesticated animals support a significant turning point exclusive of MH on the prosocial continuum. Interestingly, Hare [14] and Theofanopoulou [53] have hypothesized that human self-domestication was likely facilitated by changes in the oxytocin/vasotocin system, based in part on studies showing that oxytocin and vasotocin ligands or receptors are under relaxed selective constraint in domesticated species [56], and their gene expression and methylation Fig. 3. Linkage Disequilibrium analysis. a, OTR heatmap matrix of pairwise linkage disequilibrium statistics. Pairwise statistics are colored based on their R 2 and D' significance scores, with the darkest shades of red being the most significant R 2 score, and white the least, and the darkest shades of blue being the highest D' scores, and white the least. ✓, pairwise R 2 > 0.8, our threshold for a significant co-inheritance of the variant alleles. The same coloring and symbols were used in the rest of the heatmaps. b, VTR1A heatmap matrix of pairwise linkage disequilibrium statistics. c, VTR1B heatmap matrix of pairwise linkage disequilibrium statistics. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Fig. 4. Hypothesized functional links between iden-
tified SNPs on the OTR-VTRs and sociality-related changes/similarities across species/lineages. On the left, a schematic tree between portraying the phylogenetic relationships between chimpanzee, bonobo, MH and AH. On the right, a table showing the evolutionary comparisons discussed (1st Column), the sociality-related functional links reviewed for each respective evolutionary comparison (2nd Column; all references can be found in the Discussion), and the strongest putative SNPs on the OTR-VTRs we identified (3rd Column) for each comparison (1st Column), which we hypothesize might be underlying the neurobiology of their functional differences/similarities (2nd Column). patterns differ in wild versus domesticated species [57][58][59]. We also found that some of the variants present only in MH among primates (VTR1A-rs11174811 and VTR1A-rs3021529) show signs of relaxed selection. Moreover [60], it is likely that just as argued for other species' domestication [60], oxytocin did not play a single role in human self-domestication. Different stages in the self-domestication process likely drew on different properties of the oxytocin system. This in part explains why not all our results align on a single 'prosocial' dimension.
These sites can also shed light on the genetics underlying possible sociality differences identified between MH and AH (Fig. 4). For example, they might be relevant to the smaller social groups attributed to Neanderthals and Denisovans [4] or to the decreased MH androgenization revealed by both craniofacial anatomy and digit ratio measurements compared to AH. According to Nelson and colleagues [61], the digit ratios of Neanderthals could be linked to a polygynous social structure and a higher level of male-male competition than most contemporary MH populations. Interestingly, one of the MHU sites identified in this study (OTR-rs53576) has been found to predict digit ratios, emotional (anger) and cognitive (hostility) aggression in 7 ethnic MH groups, something that could be the case also between MH and AH.
HU sites, where AH is homozygous with a new mutation not found in NHP, and MH are variant, can offer information on those SNPs that were likely present in AH and were selected for or purged out in the MH populations. OTR-rs59190448 and OTR-rs237888 are the only two sites where the HU allele ended up being positively selected in MH [1], possibly resulting in beneficial phenotypes. With the open chromatin region for OTR-rs237888 also in the cingulate gyrus, we propose that this site along with the aforementioned OTR-rs59190448, could account for a more prosocial phenotype in the human lineage compared to our non-human primate ancestors at the time of the Pan-Homo lineage split [45,62]. These sites can elucidate the genetic underpinnings of the progressive social tolerance needed for the intensive cultural transmission of technological innovations (e.g., fire use) in the evolution of humankind [63], as well as for the reduced aggression indicated by several markers in early hominid evolution, such as the reduction of male canine size [64]  For the three sites of convergent evolution between MH and bonobos, OTR-rs2228485 was found to be under balancing selection in MH and is also one of the most highly ranked sites in our RegulomeDB analysis, pointing to putative high functionality. Only the OTR-rs2228485(G) and the VTR1A-rs1042615(A) sites were the major alleles in our bonobo sample, but in order to conclusively resolve the state of these sites in bonobos, larger population samples will be needed. Nonetheless, our findings on convergent changes for MH with bonobos could be insightful for understanding the posited similarities in prosociality, social tolerance and cooperation between MH and bonobos, and the differences of both compared to chimpanzees [66,67]. For example, bonobos outperform chimpanzees on tasks relevant to social causality or theory of mind [68] and are more attentive to the face and eyes [69], suggestive of higher empathic sensitivity. In addition, both adult testosterone [70] and prenatal androgen [71] levels are lower in bonobos than in chimpanzees, with the latter being more in line with human measures in relevant correlates (i.e., second-to-fourth finger length ratio), and associated with higher empathy [72]. Added to that, according to a recent study [41], bonobos, since their split from chimpanzees, show strong positive selection signals on variants located in genes of the oxytocin and vasotocin pathway (including OTR and VTR1A), something viewed as evidence for reduced aggression and self-domestication in bonobos (we note though that a more recent study [73] did not find such selection signatures using branch and branch-site models).
Staes and colleagues [73] in a recent comparison of several neurochemical receptor genes in MH, chimpanzees and bonobos, but not AH, showed partial overlap with our results, which we confirmed (Nicky Staes; personal communication) was due to them not including the reference bonobo genomes in their sample, as well as using different ancestral state inferences. Our results for convergent evolution between MH and bonobos for the OTR-rs237897(A) agree, although in our sample bonobos are heterozygous (A/G), like MH, and in their sample homozygous (G/G). Our findings disagree for our identified OTR-rs2228485(G) convergence between MH and bonobos, where we found that both of them were heterozygous (A/G), with the ancestral state being homozygous (A/A) in all studied NHP. Due to different bonobo samples used, where all bonobos were homozygous (G/G), and no further ancestral state being considered, Staes et al. deemed this was a MH-chimpanzee convergence, something that contradicts our data. Lastly, when it comes to variants they identified in MH only, our results widely agree, although in our study, since we included AH genomes, we found that some of their MH-variants were present already in AH (e.g., OTR-rs59190448 and OTR-rs237888). Similarly, other contradictions can be explained due to our use of macaque SNPs in our inference of the ancestral state. For instance, since we found the OTR-rs237911(G) allele in one of the macaque databases we used (Supplementary In summary, we used an interdisciplinary approach to understand the evolution of hominid prosociality through the lens of the OTR-VTRs, where we combined evidence from modern and archaic genomics, population genetics, transcriptomics, and behavioral and neuroscientific studies, among others (see Supplementary Table 14 for all our results at a glance). Our approach was hypothesis-driven, narrowed down to 3 gene candidates, which we deem necessary in conjunction with genomewide studies, whose strict filters end up discovering what is present only in very high percentages in certain species' comparisons. In doing this, we identified several strong allelic candidates pointing towards a prosocial shift in MH (5 sites in OTR-VTRs) and in the hominin populations altogether (2 sites). Future studies are needed to investigate further their presence in larger sample sizes, as well as their functionality. So far, our results and interpretations are broadly compatible with information based on the fossil record, paleogenomic evidence [5][6][7][8], and with behavioral differences between chimpanzees and bonobos [66,67]. We lastly suggest that other receptors and ligands should be studied as well (glutamate receptors [74], dopamine [45], β-endorphines [62], as well as other genes in the genetic signaling pathways where OTR-VTRs function), so that we can achieve a broader understanding of how they could all have acted in concert to form a broader set of changes that set the stage for our prosocial profile.
We ran a multi-alignment between the modern human, archaic human, chimpanzee, bonobo and macaque gene sequences of OTR, VTR1A and VTR1B. Of the differences we found, we focused on those which are polymorphic in MH with literature associated with sociality or social deficits. We performed the alignments using the 'Phylogenetic Context' tool built in Ensembl [75], tools in the Max Planck for Evolutionary Anthropology Ancient Genome Browser (https://bioinf.eva. mpg.de/jbrowse/), including MUSCLE [76] and MView [77] to generate the alignments, and Aliview [78], Decipher for R [79], Bedtools to edit and analyze them. We used all the genomic sequence of the genes we aligned, as provided in the standard layout of the files of the genomic sequences in the Ensembl database, namely with 600 bp upstream and downstream. We defined the genomic sequences in the same way when we extracted the gene sequences from the archaic genomes. The only additional regions we included ad hoc in our analysis that fall further out of 600bp window were those of several VTR1A-microsatellites that have been associated with social-related phenotypes [25,26] (RS3-(CT) 4 TT (CT) 8 (GT) 24 , RS1-(GATA) 14 , GT 25 , and AVR-(GT) 14 (GA) 13 (A) 8 ).
We then aligned these sites with the rest of the available Neanderthal genomes (AH: Spy [20] (2 individuals), Goyet [21], Les Cottés [22] (5 individuals), Hohlenstein-Stadel [23], Scladina [23], Chagyrskaya [80]) and additional NHP genomes, chimpanzee (Pan Troglodytes) (Pan_tro 3.0/panTro5 and Clint_PTRv2/panTro6), bonobo (Pan Paniscus) (Max-Planck Institute panpan1/panPan1) and Rhesus macaque (Macaca Mulatta) (Mmul_10/rheMac10). Of the AH sites, we included only sequence reads that had a mapping quality of ≥25. Due to low-quality DNA for most of the ancient samples, and given the heterogeneity of procedures for data production in AH genome sequencing, we considered variant alleles only those present in at least two AH genomes, to avoid false positives due to sequencing errors. Following [81], we filtered out the samples that strongly deviated from the remaining individuals in the dataset. This filtering of samples only affected two major alleles discussed in the study: the MH-bonobo convergent VTR1A-rs1042615(A), that is also found in the Denisovan sequence; and the MHS OTR-rs6770632(C) allele that is also found in the Altai Neanderthal sequence. All alleles we retrieved can be found in Supplementary  Table 1.
To assess variation in NHP, we used Single Nucleotide Variant (SNV)data from: bonobos (13 bonobos from Ref. [42], 27 individuals with only OTR and VTR1A data from Ref. [82], 113 individuals with only VTR1A data from Ref. [83]); chimpanzees (25 individuals from Ref. [42], 35 [82] and 62 [84] individuals with only VTR1A and OTR data); and macaques (1234 individuals from the mGAP database [85] and dbSNP 150 in Ensembl [75], where the mGAP data have been incorporated) (Supplementary Table 2). The SNV-data from bonobos and chimpanzees from Ref. [42] were lifted over to the GRCh38/hg38 version of the human genome (Homo sapiens), while the macaque SNVs [85] were lifted to the GRCh37/hg19 version of the human genome; they were lifted from different human references because these are where the different primate data sets were mapped to. We lastly used information from several other publicly available genomes of these species, since they were sequenced from different individuals (chimpanzee: Pan_tro 3.0/panTro5 and Clint_PTRv2/panTro6, bonobo: Max-Planck Institute panpan1/panPan1, and rhesus macaque: Mmul_10/rheMac10).

Ancestral allele inference
In order to infer the ancestral alleles of the identified sites, we aligned the MH sequences of OTR, VTR1A and VTR1B against 12 nonhuman primate species, and inferred the ancestral sequences through Ortheus [86] using the 'Phylogenetic Context' tool built in Ensembl [75]. Ortheus is a probabilistic method that infers the ancestral allele, using a phylogenetic model, that incorporates gaps, to infer insertion and deletion events. Ancestral sequences are predicted for each node of the phylogenetic tree. Multiple alignments of all the sites we identified in 12 primate species, with the inferred ancestral sequences in each node, can be found in Supplementary Note 1. Although we found several other putative sites of convergent evolution between humans and other non-human primates (e.g., mouse lemur, olive baboon, gelada etc.; Supplementary Note 1), in absence of variation data in these species we can't determine the reliability of this convergence as we have done with bonobo population data.

Association studies analysis
We went through all the association studies identified in the NCBI Variation Viewer [87] as of October 2020 (e.g. for OTR: https://www. ncbi.nlm.nih.gov/variation/view/?q=OXTR) for SNPs that have publications (Filters: 'Variant type' (: Single Nucleotide Polymorphism), and 'Has publications' (: Yes)), and report the association(s) identified for each SNP site (Supplementary Tables 3, 4 and 5). We then classified these associations on whether they were strongly related to sociality (e. g., 'prosociality', 'social cognition'), 'possibly related to sociality' (e.g., 'depression', 'Attention Deficit Hyperactivity Disorder'), or not (e.g. 'diabetes'). We calculated the percentage of the associations for each site first, to then add all the percentages for each category together averaged out by the total number of the SNPs. Although there are more studies on the OTR SNPs in total, mostly due to replication bias, we normalized this factor, by calculating an association-percentage for each SNP first (whether it came from one or several studies), and then used the number of the SNPs to average out the final results (instead of the number of studies). We calculated p values between the percentages, using a Chi-squared test (n exact sample sizes, degrees of freedom and confidence intervals are noted in Supplementary Tables 3, 4 and 5). We lastly calculated the difference between the proportions of the studies related to sociality for each pair-wise comparison of the analyzed genes, using a Chi-squared test (n exact sample sizes, degrees of freedom and confidence intervals are noted in Supplementary Table 6). All the sites we report have studies that we classified as clearly related to sociality, with the exception of rs10784339 that has been associated with substance use disorders (Supplementary Tables 7-9). Although we classified those studies as not related to sociality, we included this site in the analysis, recognizing that there might be subjectivity in the classification of substance use disorders as being related to sociality or not.
For some sites where we identified variation patterns we wanted to study further in terms of disorders, we performed an additional exhaustive literature research in the National Center for Biotechnology Information [87] (https://www.ncbi.nlm.nih.gov/pubmed/), SNPedia (http://snpedia.com) and Google Scholar (https://scholar.google.com/ platforms) (as of October 2020) for the clinical significance of each one of the SNPs we identified in present-day human populations. On Supplementary Tables 10-12, we have included most of studies that report an association of these variants, a short description of their exact impact, as well as the trial sample of each study. We have not included review studies that recapitulate findings of original studies or studies showing negative results.

Pathogenicity prediction analyses
We performed pathogenicity prediction analyses using the following seven tools: 1) CADD [29] (Combined Annotation Dependent Depletion), which scores the predicted deleteriousness of SNVs and insertion/deletions variants in the human genome by integrating multiple annotations including conservation and functional information into one metric. In our analysis, we display scores above 20 as likely deleterious and scores below 20 as likely benign. 2) GERP [30] (Genomic Evolutionary Rate Profiling), which identifies constrained loci in multiple sequence alignments by comparing the level of substitution observed to that expected if there was no functional constraint. Positive scores represent highly-conserved positions while negative scores represent highly-variable positions.
3) Mutation Assessor [31], which predicts the functional impact of amino-acid substitutions in proteins using the evolutionary conservation of the affected amino acid in protein homologs. The rank score is between 0 and 1, with variants with higher scores being more likely to be deleterious. 4) REVEL [32] (Rare Exome Variant Ensemble Learner), which is an ensemble method for predicting the pathogenicity of missense variants. It integrates scores from MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons. Scores range from 0 to 1, and variants with higher scores are predicted to be more likely to be pathogenic. We considered scores above 0.5 as 'likely disease causing' and below 0.5 is 'likely benign'. 5) MetaLR [33] (Meta Logistic Regression), which uses logistic regression to integrate nine independent variant deleteriousness scores and allele frequency information to predict the deleteriousness of missense variants. Variants are classified as 'tolerated' or 'damaging'; scores rank between 0 and 1, and variants with higher scores are more likely to be deleterious. 6) SIFT [34] (Sorting Intolerant From Tolerant), which predicts whether an amino acid substitution is likely to affect protein function based on sequence homology and the physico-chemical similarity between the alternate amino acids. Scores <0.05 are classified as 'deleterious' and all others as 'tolerated'. 7) PolyPhen-2 [35], which predicts the effect of an amino acid substitution on the structure and function of a protein using sequence homology, Pfam annotations, 3D structures from PDB where available, and a number of other databases and tools (including DSSP, ncoils etc.). The PolyPhen-2 score represents the probability that a substitution is damaging, so values nearer 1 are more confidently predicted to be deleterious (note that this is opposite to SIFT).

Regulation analyses
We performed regulation analyses using RegulomeDB [36] and Ensembl regulation data resources [37]. RegulomeDB analysis make use of large datasets, including ChIP-seq, FAIRE-seq and DNase I hypersens Itive information for a variety of important regulatory factors across a diverse set of cell types, chromatin state information across over 100 cell types, binding sites of transcription factors (including Position-Weight Matrix and DNase Footprinting), and expression quantitative trait loci (eQTL) information allowing the association of distal sites with gene promoters. The Ensembl resources provide a functional annotation of the regulatory elements in the human genome, including data on epigenetic marks, transcription factor binding and DNA methylation, as well as microarray probe mappings ( Table 2, Fig. 2).
RegulomeDB has developed a heuristic scoring system based on functional confidence of a variant. Category 1 variants are those that are known eQTLs for genes, and thus have been shown to be associated with expression. Within Category 1, subcategories indicate additional annotations from the most confident (1a, which has transcription factor (TF) binding, a motif for that TF, and a DNase footprint) to the least confident (1f, which has only TF binding or a DNase peak). According to Boyle and colleagues [36], who developed the system: "the additional categories represent analogous annotations to Category 1 but without eQTL data and, thus, no known direct effect on binding. Category 2(a-c) demonstrates direct evidence of binding through ChIP-seq and DNase with either a matched PWM (positional weight matrix) to the ChIP-seq factor or a DNase footprint. Category 3(a-b) is considered less confident in affecting binding due to a more incomplete set of evidence. These sites have ChIP-seq evidence and either a motif that matches the ChIP-seq data but no DNase evidence, or DNase evidence and any other motif. Finally, Categories 4-6 lack evidence of the variant actually disrupting the site of binding. These include DNase and ChIP-seq evidence (Category 4), DNase or ChIP-seq evidence (Category 5), or any single annotation not in the above categories (Category 6)." A list of the scores of all variants studied can be found in Supplementary   Table 10.

Linkage disequilibrium analysis
We ran Linkage Disequilibrium analysis (using LDmatrix [38]) to test if any of the sites we identified get coinherited, using data from all population included in the 1000 Genomes Project and a cut off of R 2 > 0.8. LDMatrix (https://ldlink.nci.nih.gov/?tab=ldmatrix) creates an interactive heatmap matrix of pairwise linkage disequilibrium statistics. R 2 and D ′ scores for all sites' combinations can be found in Supplementary Table 11.

Population data and selection analysis
We collected human allele frequency data for our identified sites using data from the 1000 Genomes Project [39] (Phase 3) and gnomAD v2.1.1 and v3 [40]. We listed the minor allele frequency (MAF) and the highest MAF in present day human populations in Table 1, and specific frequencies only from the 1000 Genomes Project in Supplementary  Table 12. For bonobos, we also analyzed the data from 13 bonobos by Kovalaskas and colleagues [41] (raw data provided by Sarah Kovalaskas), where strong positive selection signals on OTR and VTR1A were identified.
We also reviewed the literature for studies on selection signals of these sites in MH. We used the supplementary information found in Schaschl et al., 2015 [1], and filtered the results of their FDIST, Bayescan and extended Lewontin and Krakauer (FLK) tests, reporting only those on our identified SNPs with significant values (p or q values < 0.05) (Supplementary Table 13

Author contributions statement
CTh conceptualized and designed the study, ran variation, pathogenicity prediction, regulation, human population and macaque data, and association analyses. AA ran variation, pathogenicity prediction, human, chimpanzee and bonobo variation analyses and reviewed the literature. EDJ and CB supervised the study.

Data availability statement
All data generated or analyzed during this study are included in the Suppl. Notes and Tables.