Chimpanzee vocal communication: what we know from the wild

Vocal communication plays a vital role in the daily lives of our closest living relatives, chimpanzees. Unpacking the adaptive function of vocalisations, and the cognitive mechanisms underlying their production and comprehension, is not only crucial for understanding chimpanzee behaviour, but also for inferring the capacities of our last common ancestors. Here, we review how observational and experimental methods have advanced our understanding of the vocal production and comprehension of wild chimpanzees. We discuss the impact of social and ecological factors on chimpanzee vocal communication, and review the inroads that have been made in elucidating the cognitive processes underpinning call production. We highlight approaches that may offer substantial future advances in knowledge and argue that, whilst challenging to collect, data from wild populations are critical to building a comprehensive and accurate understanding of the communicative and cognitive abilities of our closest living relatives, and to tracing the evolutionary roots of human language.
This study provides important insights into the variability in call production rates across age-sex demographics and socioecological contexts in chimpanzees. Understanding the social and ecological parameters of calling can provide important insights into the function of vocalisations, and estimation of calling rates also has important conservation implications: call densities acquired from Passive Acoustic Monitoring can be converted into estimates of population densities if calling rates are known. This paper showcases the value of cross-site collaboration, which enabled the inclusion of data from five chimpanzee communities from two geographical regions.
Analyses showed that the function of greeting calls can be modulated by their specific acoustic variants and the visual gestural signals accompanying them: two types of greeting calls (pant grunts and pant barks) were more likely to be produced by subordinate individuals if an approaching dominant individual had displayed aggression. The production of both calls was also predicted by the subordinate individual's nonvocal behaviour, specifically postures and gestures linked to submission. These patterns were stronger when the utterance contained a pant bark rather than a pant grunt alone.
of our closest living relatives meet several of the key hallmarks of intentional signal production, in a directly comparable way to great ape gestures. Two of three wild chimpanzee alarm call types examined exhibited characteristics which had previously been used to argue for intentionality in gestural communication. This field experiment showed that these calls were socially directed (given to the arrival of friends), goal-directed (calling when recipients were safe), and associated with visual monitoring of the audience and gaze alternation. These findings contrast with the traditional characterisation of great ape vocal production as an inflexible and involuntary zero-order intentional process, and crucially undermine a central argument of gestural theories of language evolution. These findings instead open up the possibility of a multimodal origin of human


Introduction
Six decades ago, researchers ventured into the forests of Africa and the first pioneering studies of chimpanzee behaviour began [22,46]. Among investigations into hunting and tool use, early descriptive work highlighted the crucial role of communication for negotiating the complex social worlds chimpanzees are immersed in. In the last 25 years, significant research effort has been dedicated to systematically decoding the information content, adaptive function and proximate mechanisms underpinning communicative signals. Whilst chimpanzees communicate with vocal, gestural, facial, olfactory, and multimodal signals [39], vocalisations are particularly important since fission-fusion social dynamics and low-visibility natural habitats often mean group members are out of sight. Vocal communication is also critical and complex in many other primate and nonprimate species (e.g. [5,57,59,49,15,62,48]); however, this review focuses on chimpanzee vocal communication due to its pivotal role in reconstructing the evolutionary roots of human speech and language (e.g. [15,37,21,12]). Studying the communicative behaviour of our extant primate relatives allows us to draw inferences about the capacities of our extinct ancestors [27], and chimpanzees are a crucial evolutionary model for reconstructing the primitive language traits our last common ancestor may have shown 5-7 million years ago [13].
Given the central importance of vocalisations, a diverse suite of approaches has been applied to unpack their information content and the cognitive mechanisms underlying their production and comprehension. First, on a neural level, the use of Positron Emission Tomography brain imaging has revealed that brain areas homologous to those recruited during language production and perception are active when chimpanzees deploy vocal and gestural signals [58]. Second, ape language projects probed the capacity of chimpanzees to acquire and comprehend aspects of human language, and have demonstrated that enculturated chimpanzees raised in language-rich environments have limited vocal, compared to gestural, plasticity, highlighting critical differences in the cognitive mechanisms underpinning chimpanzee vocal and human speech production [28,29,31]. Third, dedicated research effort to understand naturalistic communication between conspecifics has been successfully applied in captivity. Captive settings allow high levels of control over physical and social aspects of the environment, permitting researchers to probe factors driving call production [4,47], and to conduct playback experiments to test receiver understanding [52,55]. Despite these advantages, captive-based work is limited by an intrinsic lack of ecological validity. When trying to understand the adaptive function of vocalisations, in terms of the benefits call production accrues to the signaller and responding to calls accrues to the receiver, it is critical to study signals in an environment similar to the one in which they evolved. Research in the wild with free-ranging individuals is therefore necessary for a comprehensive understanding of chimpanzee vocal communication. Moreover, understanding the adaptive function of calls can help us to understand the selective pressures that may have driven the evolution of chimpanzee vocal communication and inform theories of language evolution.
To unpack the informational content of signals and their adaptive function, it is essential to examine communication from both the signaller's and receiver's perspectives. Specifically, it requires understanding the effect vocalisations have on receivers and the factors influencing call production, alongside an examination of the fitness benefits associated with both sending and responding to a signal [3]. Investigating the ecological and social factors that influence call production, both in terms of usage conditions and the fine acoustic structure of calls, is one key evidential prong. Once observational data have been used to identify putative informational content (in the form of reduction of uncertainty, see Ref. [49]) for a signal, the next crucial step is to test receiver understanding of the call, using methods such as playback experiments. These experiments allow researchers to present calls to receivers in the absence of other behavioural or environmental cues that may be driving the responses in naturalistic interactions, to test what receivers can extract from the calls alone. Observational and experimental data have been crucial for identifying functionally referential calls: calls that are produced reliably to a specific external event or object, and that receivers respond to as if they refer to that event or object [42]. The uncertainty about the cognitive mechanisms underpinning functionally referential call production has led some to question the relevance of functionally referential vocalisations to understanding human referential abilities [66]. However, given chimpanzees' other advanced social cognitive abilities [36], it is possible that functionally referential calls in this species are underpinned by mechanisms more similar to those of humans than to those of nonprimate species (e.g. chickens, [14]; and meerkats, [43]).
Given the relevance of informational content and particularly functionally referential chimpanzee calls for understanding language evolution, we next review recent research conducted with wild communities of chimpanzees focusing on both call production and perception.

Call production
The chimpanzee vocal repertoire comprises 13 identifiable call types, many of which grade into further subtypes and each other [44,54]. Calls are commonly given in response to specific ecological or social events, such as food or predator discovery, and agonistic and affiliative interactions [23]. Research addressing the production of chimpanzee vocalisations has broadly taken two main approaches with wild populations: (i) observationally documenting call occurrence and accompanying contexts, and (ii) experimentally eliciting calls with the presentation of conspecific calls or predator models. Both of these approaches have the potential to help us understand the factors influencing call production, which are important to identifying the adaptive function and potential information content of vocalisations.
In terms of understanding when chimpanzees vocalise, observational data have enabled researchers to identify a variety of socio-ecological factors that influence wild chimpanzee calling rates. For example, Crunchant et al. [10] studied a community of chimpanzees living predominantly in savanna woodland (Issa valley, Tanzania) and showed that loud calls (pant hoots, pant barks, and screams) were given at higher rates when individuals were in larger parties (subgroups of individuals), were travelling, or were in open habitats. Furthermore, individual attributes such as age and sex also influenced loud call production rates, with adults vocalising at higher rates than juveniles, and males vocalising more than females. Sex differences in production rates of one type of loud call, the pant hoot, have also been found in the Taï Forest, Côte d'Ivoire, but when all call types (loud and quiet) were considered there were no overall differences in call rate in this community, indicating that sex differences in calling rates are modulated by the type of call [33]. Such variation in calling behaviour as a function of, for example, sex, group membership or behaviour is suggestive of a dynamic system, and further focus on the socio-ecological variables that predict the production of specific call types has allowed hypotheses about the putative functions of these call types to be generated.
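Calling rates of this kind are also the quantity needed to turn call densities from Passive Acoustic Monitoring into population density estimates. The following is a minimal sketch of that arithmetic, assuming a simplified relation D = C / (A × T × r), where C is the number of calls detected, A the monitored area, T the monitoring duration, and r the calling rate per individual. The function name, parameters, and numbers are hypothetical, and the sketch assumes every call in the area is detected and correctly classified, which real surveys would need to correct for.

```python
def population_density(n_calls, area_km2, hours, calls_per_ind_per_hour):
    """Individuals per km^2 from a call count.

    Simplifying assumption: every call in the monitored area is detected
    and there are no false positives.
    """
    if min(area_km2, hours, calls_per_ind_per_hour) <= 0:
        raise ValueError("area, duration and calling rate must be positive")
    call_density = n_calls / (area_km2 * hours)   # calls per km^2 per hour
    return call_density / calls_per_ind_per_hour  # individuals per km^2


# Hypothetical numbers, chosen only to illustrate the arithmetic.
density = population_density(n_calls=120, area_km2=4.0, hours=60.0,
                             calls_per_ind_per_hour=0.5)
print(density)  # 1.0
```

Because calling rates vary with party size, activity, habitat, age and sex, a single value of r is itself a simplification; in practice the rate would need to match the population and conditions being monitored.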
Food-associated calls have received particular attention, as aspects of the external environment seem to correlate with the production of these calls, and captive work has confirmed that one type of call (rough grunt) is functionally referential [52]. In the wild, observational data have confirmed that rough grunts are produced in a highly context-specific manner, with 93% of calls associated with a feeding context [50]. Males from the Sonso community, Uganda, are more likely to produce rough grunts (food-associated calls) and combinations of pant hoot and rough grunt calls when encountering large rather than small food patches [38,56]. Additionally, in the Taï south community, the acoustic structure of rough grunts has been found to differ systematically with tree size and species, with calls given to large trees being more successful in recruiting individuals to the food source [34]. Thus, research in the wild builds on captive findings and indicates that rough grunts may provide information to listeners about not only the presence and value of food [52], but also the size of the food patch and possibly the type of food discovered. Despite the potential richness of the informational content of these calls, if they are produced unintentionally as a result of excitement or increased arousal elicited by the discovery of a large or preferred food patch, then the parallels between these functionally referential calls and human reference are limited [66]. However, calls are produced by males on arrival at only 56% and 45% of feeding events in the Sonso and Kanyawara communities, respectively [16,56], indicating that rough grunts are not an automatic, reflexive response to food. Indeed, audience effects have been found, with male chimpanzees from the Sonso and Kanyawara communities, Uganda, being more likely to produce food calls when close social partners are in the foraging party [56] or in close proximity [16].
Field experiments in which pant hoot calls were played back to simulate the arrival of a specific individual close to a male's feeding tree confirmed that these calls are selectively produced for individuals with whom the feeding male had a close affiliative relationship, and for higher-ranking individuals [50]. In line with this, recent observational research has also found that combinations of pant hoots and rough grunts were most likely to be given when high-ranking individuals joined a feeding party [38]. This indicates that chimpanzees have a degree of voluntary control over call production and that these calls are directed at specific individuals and used tactically. These findings have also led to the hypothesis that in male Eastern chimpanzees, one adaptive function of rough grunts may be to facilitate social bonding between signallers and receivers.
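Whether a combination such as pant hoot plus rough grunt occurs more often than chance would predict can be checked statistically. The sketch below uses a simple permutation test on adjacent call pairs; it is an illustrative stand-in, not the collocation analysis used in [38]. The call labels, the bout data, and the function names are all hypothetical.

```python
import random


def count_pair(bouts, first, second):
    """Count adjacent (first, second) call pairs within bouts."""
    return sum(
        1
        for bout in bouts
        for a, b in zip(bout, bout[1:])
        if a == first and b == second
    )


def permutation_p_value(bouts, first, second, n_perm=2000, seed=0):
    """One-sided permutation test: is the pair more frequent than chance?

    Null model (an assumption of this sketch): call labels are shuffled
    across all positions while bout lengths are kept fixed.
    """
    rng = random.Random(seed)
    observed = count_pair(bouts, first, second)
    pool = [call for bout in bouts for call in bout]
    lengths = [len(bout) for bout in bouts]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        shuffled, start = [], 0
        for n in lengths:
            shuffled.append(pool[start:start + n])
            start += n
        if count_pair(shuffled, first, second) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical bouts, invented purely for illustration.
bouts = (
    [["pant_hoot", "rough_grunt"]] * 8
    + [["pant_grunt"]] * 6
    + [["scream", "pant_grunt"]] * 4
)
observed, p = permutation_p_value(bouts, "pant_hoot", "rough_grunt")
print(observed, p)
```

A small p-value here only indicates non-random co-occurrence under the shuffling null; showing that the component calls are acoustically unchanged within the combination requires separate acoustic analysis.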
Vocalisation types produced to negotiate social interactions have also been shown to vary in usage conditions or acoustic structure as a function of social variables. For instance, whether or not subordinate individuals produce greeting calls (pant grunts and pant barks) when dominant individuals approach is predicted by aggressive behaviour in the dominant recipient, and nonvocal submissive behaviour in the subordinate signaller. These associations are much stronger for call bouts containing pant barks than for those containing pant grunts alone [19]. The number of greeting vocalisation repetitions has also been found to be lower for dyads with a strong social bond [41]. Similar audience effects also modulate production of other social calls. For example, the likelihood of female copulation call production is suppressed when dominant females are in the vicinity [61], and the acoustic structure of victim screams is modulated by the presence of third-party individuals that may be able to effectively support the victim in an agonistic interaction [67]. This highly selective and targeted call production highlights that, contrary to traditional views of nonhuman primate vocal production being the product of arousal-based processes [2,20,60], vocal production in chimpanzees may be under some degree of voluntary control.
Determining whether vocal signals are under voluntary control is an important step towards establishing whether signals are intentionally deployed, a hallmark of human communication [25]. Although theoretical and operational definitions of intentionality are debated, much animal research aims to distinguish reflexive, unintentional signal production from goal-directed, socially directed signals that are voluntarily produced [24]. As outlined previously, primate vocal production has traditionally been assumed to be the product of reflexive processes (e.g. [60]), so identifying species, call types and contexts in which vocalisations are voluntarily produced, socially directed and goal-directed (first-order intentionality; [11]) is important, as these vocalisations may represent a stepping stone towards the higher levels of intentionality humans regularly engage in when producing language. To examine intentional call production, research has focused on establishing whether vocal signals meet criteria for first-order intentionality, in which a signaller intends to change a recipient's behaviour [1,24]. Indeed, Schel et al. [51] explicitly tested whether alarm calls produced by chimpanzees belonging to the Sonso community, in response to a model python presentation, met established markers of first-order intentional signal production that had previously been applied to great ape gestural signals [39]. By presenting the model to chimpanzees in different social contexts, Schel et al. [51] were able to show that two out of three alarm call types investigated met key criteria of intentional production. These calls were socially directed (only produced in the presence of others and not when the snake model was encountered alone, and more likely to be given when friends rather than nonfriends arrived into the vicinity of the snake), callers visually monitored their audience rather than just the snake, and the calls were goal-directed in that callers stopped producing alarm calls when audience members were safe from the predator.
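The three behavioural markers described above can be treated as a checklist applied to each call type. The following is a minimal sketch of that logic, assuming each marker has already been scored as present or absent from field data; the class name, field names, and call-type labels are hypothetical and do not correspond to the coding scheme used in [51].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEvidence:
    """Presence/absence of each behavioural marker for one call type."""
    call_type: str
    socially_directed: bool    # given with an audience, more for friends
    audience_monitoring: bool  # caller looks at the audience, not only the threat
    goal_directed: bool        # calling stops once the audience is safe

    def meets_first_order_markers(self) -> bool:
        # All three markers are required in this sketch.
        return (self.socially_directed
                and self.audience_monitoring
                and self.goal_directed)


# Hypothetical scores, invented for illustration only.
evidence = [
    CallEvidence("call_A", True, True, True),
    CallEvidence("call_B", True, False, False),
]
for e in evidence:
    print(e.call_type, e.meets_first_order_markers())
```

Requiring all markers together is a deliberately conservative choice for the sketch; how markers should be weighted or combined is part of the ongoing debate over operational definitions of intentionality.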
Further research has probed the possibility that chimpanzee alarm calls are intentionally produced not only to influence the behaviour of others (first-order intentionality; [51]), but also to change the knowledge state of others (second-order intentionality; [9]). Whilst captive experiments provide convergent evidence for chimpanzees understanding knowledge and ignorance states in others [26,35], whether this mental state understanding influences call production remains contentious. In two related studies, Crockford and colleagues [8,9] aimed to test whether chimpanzee alarm call production was mediated by the knowledge state of audience members. Through presenting travelling parties of chimpanzees with snake models, the researchers were able to show that subjects were more likely to produce alarm calls to a snake model when audience members had partial knowledge of the snake (heard previous alarm calls, but not seen it) compared to full knowledge (seen the snake). However, as this analysis was focused on cases where an audience member was approaching the snake, simpler behaviour-reading explanations for this pattern of results cannot be ruled out [51]. In order to control for this potential confound, in a subsequent study Crockford et al. [8] presented individuals with a playback of a group member that simulated either a knowledgeable or an ignorant individual in the vicinity. Acoustically distinct variants of the quiet hoo vocalisation, to which listeners are sensitive [6,7], were played back, with rest hoos simulating resting callers (ignorant of danger) and alert hoos simulating callers who had detected a threat (knowledgeable of danger). Chimpanzees were more likely to mark the presence of the snake with gaze alternation between the snake and recipient, and to give alarm calls, when the simulated group member was ignorant compared to knowledgeable. Although the nonvocal marking behaviour is a convincing demonstration of knowledge state attribution in the wild, whether it mediated their vocal behaviour is still unclear, as recipient knowledge was confounded with signaller knowledge in this set-up (hearing an alert hoo gives the signaller partial knowledge of the snake before they encounter it, whereas the rest hoo gives the signaller no warning of the snake). Taken together, these studies suggest that chimpanzee vocal production meets the behavioural markers for first-order intentionality, and possibly second-order intentionality, although further research is needed to confirm this. As such, the differences between humans and chimpanzees in the cognitive mechanisms guiding the production of vocalisations may be more of degree than of kind.

Call comprehension
To date, only a handful of studies have experimentally probed the response of chimpanzee receivers to various call types in the wild, data which are key to confirming the putative information content and function of vocalisations. Such experiments are practically and ethically challenging to conduct in the field, which limits which vocalisations can be targeted and investigated in detail. Most likely due to its conspicuous nature, the long-distance pant-hoot vocalisation has received most interest. This call, which is individually distinctive [18], is thought to function to maintain contact between fissioned parties within communities [45], but also to regulate spacing across communities [63]. With a focus on the latter function, studies with the Kanyawara community and three communities from Taï (North, South, and Middle) exposed parties of chimpanzees to playbacks of pant hoots from different communities, and found that behavioural responses to calls were strongest when receivers were in larger parties [30,64]. Moreover, individuals distinguished between calls produced by neighbours and strangers [30]. Together these studies suggest that chimpanzees can recognise group membership from pant-hoots and use this information strategically to guide their territorial responses.
More recent research has focused on the degree to which receivers can use the information encoded in the fine acoustic structure of calls produced by community members to inform their responses. Observational field work suggested screams produced by victims in agonistic encounters varied acoustically, with the variation systematically mapping on to the severity of the aggression received [67]. High-severity screams, for example, are higher pitched and longer in duration compared to lower-severity screams. Slocombe et al. [53] then employed field experiments to investigate whether listeners were sensitive to these acoustic differences, and used them to make sense of agonistic interactions they could hear but not see. Adult males and females were found to respond more strongly to high-severity victim screams than low-severity screams, a difference that could not simply be explained by the more arousing acoustic properties of the high-severity calls, given that similar responses were not elicited by infant tantrum screams (which are acoustically indistinguishable from high-severity victim screams) [53]. One implication is therefore that chimpanzees are capable of decoding the precise contextual information pertaining to the ongoing aggressive interaction from screams, information which they can use to inform their own behavioural decisions regarding whether or not to intervene.
More recently, attention has turned to less conspicuous, lower amplitude, social vocalisations, and the corresponding meaning attributed to such calls. Specifically, Crockford et al. [7] exposed chimpanzees to playbacks of 'hoo' variants that observational data confirmed were acoustically distinct: namely resting hoos and alert hoos [6]. In response to both calls, chimpanzees spent time looking in the direction of the speaker and even searching the associated area from which the sound was broadcast, perhaps seeking out the call provider. Critically, however, stronger responses were elicited by the alert hoo playbacks than resting hoos, potentially to acquire more information regarding a putative threat. In sum, these findings suggest hoos are perceptually discriminated and, minimally, chimpanzees attribute a threat-based meaning to alert hoos, though precisely how resting hoos are comprehended still needs exploration.
In summary, both observational and experimental research with wild populations of chimpanzees have shed light on function, usage and informational content of calls and have started to probe the cognitive processes underpinning vocal behaviour. Although this progress should be celebrated, we have a number of suggestions for how future work in the wild could strive to deepen our understanding of vocal communication in our closest living relatives.

Future directions
Despite an upsurge in wild empirical vocal work at both production and comprehension levels [40], a key limitation is that, to date, most studies conducted originate from a single chimpanzee community. If we are to make broader generalisations regarding chimpanzee vocal capacities, data from multiple sites are crucial. Cross-community work offers the potential to reveal group and individual-level variation and flexibility in call structure, function and meaning: features with important parallels to human language. Such approaches are slowly becoming more common (e.g. [19]), but collecting data from multiple field sites is logistically challenging. An alternative approach is to form cross-site collaborations, ideally at the start of a project so that consistent data collection protocols can be agreed or, where this is not possible, to share hard-earned datasets to allow questions of common interest to be addressed. Engagement with online data and protocol sharing platforms such as the Open Science Framework (https://osf.io) and projects such as Many Primates offers a promising way forward.
Investigations into the influences of socioecological variables on call production would be greatly aided if long-term social data (rank and association) were collected in a directly comparable way across field sites (e.g. using the same behavioural sampling techniques and collecting the same behavioural markers of rank and association). Collaboration between research groups may also facilitate understanding the function and evolution of rarer signals where sufficient data sets can seldom be collected by individual researchers. A greater team effort could also bolster longitudinal data on developmental trajectories of communicative behaviour, with sequential teams of researchers continuing to collect data on individuals as they age. This would facilitate understanding of an essential but understudied aspect of communication.
Finally, many aspects of long-distance vocal communication between parties of wild chimpanzees, and their role in social decision making, remain a mystery, as researchers are currently constrained by reliance on visual identification of callers and observation of in-sight social interactions. Playback experiments show that chimpanzees can identify individual community members from their vocalisations [65] and make sense of social interactions they can hear but not see (e.g. [55]). Observational data also indicate long-distance vocalisations play a crucial role in fission-fusion grouping decisions [17]; however, how the identity and behavioural context of the caller influence the social decisions of listening chimpanzees remains unknown. Indeed, in the absence of an army of researchers interspersed throughout the forest, a large proportion of chimpanzee vocal and social interactions currently remain intractable. Technological advances, including supervised machine learning, may represent one promising route out of this impasse, allowing access to out-of-sight social interactions through the real-time automated analysis of accompanying acoustic footprints and revealing the intricacies of an important, but currently hidden, part of the chimpanzee world.
In conclusion, studying vocal communication in the wild has shed much needed light on the communicative complexity of chimpanzees and offered insights into the cognitive mechanisms underpinning call production and perception. Furthermore, although conducting rigorous vocal research in the wild is challenging, such research is also central to a phylogenetic reconstruction of the behavioural and cognitive profile of our last common ancestor, information that is central to painting an accurate and comprehensive picture of the evolutionary origins and uniqueness of our species.

Disclosure statement
Given her role as Guest Editor, Katie Slocombe had no involvement in the peer review of the article and has no access to information regarding its peer review. Full responsibility for the editorial process of this article was delegated to Zarin Machanda.
Using collocation analyses (used in language sciences), the authors confirmed that the combination of pant hoots with food calls was not a random co-occurrence, but instead a consistently produced structure. Following this, they used acoustic analyses to confirm that the pant hoots and food calls comprising the combination were acoustically indistinguishable from the same calls produced in isolation. This shows that the combination is composed of individual meaning-bearing units, a key criterion of linguistic syntax. Last, they demonstrate that the call combination was more likely to be produced when feeding on larger patches, and when a high-ranking individual joined the feeding party, suggesting the production was context specific. These findings provide promising evidence for combinatorial structuring in wild chimpanzee vocalisations, and further expand the known phylogenetic distribution of such abilities to include our closest living relatives.