Angry birds calling: an advanced system of signalling aggression to moderate conflict in the common chiffchaff

Many animals produce vocal signals during agonistic interactions that convey their ability and motivation to escalate the conflict. Often, such disputes follow ritualized, sequential phases with increasing aggression, favouring the assessment of individuals' chances of winning. This allows individuals to withdraw before getting involved in a serious fight. Birds are well known for the variety of information encoded in their vocalizations, sometimes including specific features that signal aggression. A particular case may be so-called 'tret calls' produced by chiffchaffs, Phylloscopus collybita, a song affix for which the communicative role is not well understood. Here, we describe the acoustic structure and use of tret calls during simulated conflicts using playback. Our results show that tret calls are a low-amplitude vocalization and that birds increased tret calling after playback stimulation, suggesting a role as an aggressive signal similar to soft song in other species. However, we did not find that the tret call rate predicted subsequent aggressive behaviour. Our analysis of song variation in response to playback suggested that birds respond to a simulated vocal competitor in two phases: a highly aggressive phase where syllable rate goes up, which is predictive of physical attack, while simultaneously tret call rate goes down, and a low-aggression phase, in which tret call rate goes up, while syllable rate returns to baseline levels. These stereotypic fluctuations in two acoustic features may be part of an advanced communication system to moderate aggression. Finally, our results show a seasonal increase in tret calling after birds

Moderating aggression through vocal signals is a widespread behaviour during disputes over resources among individual animals. During agonistic interactions, it may be relevant to assess the fighting capacity of your opponent as well as its motivation to escalate the conflict (Bradbury & Vehrencamp, 1998). Such information can be conveyed vocally through acoustic signals that send different messages through different parameters. For instance, spectral variation of vocalizations can be a cue for body size, which is usually a relevant aspect of the fighting capacity of an individual (Charlton et al., 2020). On the other hand, the production rate of such a vocalization can be modulated to show the willingness to escalate a conflict (Briefer, 2020). During agonistic encounters, contestants may assess their chances of winning by assessing their own fighting capacity, generally referred to as resource-holding potential (RHP), but also by assessing their opponent's RHP. During mutual assessment, contests often follow ritualized phases with sequential changes in signalling as the conflict escalates into a more aggressive interaction (Enquist & Leimar, 1983). Such ritualization in escalated phases allows individuals to assess their chances of winning and continue with increased aggression only when it is the best strategy (Enquist et al., 1990). Searcy and Beecher (2009) distinguished three criteria to establish aggressive intent in a particular signal: (1) it must occur significantly more often in agonistic than in nonagonistic interactions (context criterion); (2) it should correlate with other aggressive behaviours or predict a subsequent escalation of the conflict (predictive criterion); and (3) it should elicit a differential reaction (i.e. approach or withdrawal) in receivers (response criterion). All three criteria can be experimentally tested by playback exposures. 
The context criterion can be investigated by comparing signal occurrence before (nonagonistic context) and during and/or after playback of stimuli (simulation of agonistic context). The predictive criterion can be investigated by correlating variation in the signal produced by the subject before playback with the intensity of the subject's response to playback. The response criterion can be investigated by varying the occurrence of the signal in the playback stimuli and testing for an impact on response intensity.
Songbirds are well known for their ability to vocally convey a variety of messages, including aggression (Marler & Slabbekoorn, 2004). Typical aggressive signals are singing at a high song or syllable rate, singing songs at high amplitude, or matching or overlapping the opponent's song (Collins, 2004). 'Soft song', defined as a quiet delivery of typical song with largely decreased active range (omnidirectional distance over which critical signal components are audible), is also considered an agonistic signal (Nice, 1943; Searcy & Beecher, 2009). In swamp sparrows, Melospiza georgiana, and song sparrows, Melospiza melodia, soft song is produced during territorial interactions and is a good predictor of attack (Ballentine et al., 2008; Searcy et al., 2006). In other species, soft song is composed of twitter sections, made up of short, soft, broadband sounds, which typically signal elevated levels of arousal (Dabelsteen et al., 1993, 1997; Lampe, 1991). European blackbirds, Turdus merula, use so-called 'strangled song', composed of low-amplitude twitter fragments, as an aggressive signal (Dabelsteen & Pedersen, 1990; Snow, 1958). Some authors have suggested that soft song is used for close-range communication while avoiding eavesdropping from neighbouring conspecifics or predators (McGregor & Dabelsteen, 1996; Vargas-Castro et al., 2017). Others have suggested that soft song mainly serves to address, stimulate and thereby locate the intruder (Jakubowska & Osiejuk, 2018).
Although many species show aggression by modulating the delivery of otherwise 'normal' song, via a quantitative change, some species produce qualitatively different calls or song components. Cetti's warblers, Cettia cetti, generally sing one song type (S-song) while a particular type (I-song) is used solely during escalated aggression (Luschi & Seppia, 1996). In water pipits, Anthus spinoletta, the production of a particular syllable within song, the 'snarr', has been suggested to signal social dominance (Rehsteiner et al., 1998). Male barn swallows, Hirundo rustica, emphasize the rattle, which is a specific section of the song, during agonistic encounters (Galeotti et al., 1997) and in willow warblers, Phylloscopus trochilus, the production of a specific song type, the A-song, has been suggested as a strong signal of aggression, predicting imminent attack (Järvi et al., 1980). Seasonal changes in singing activity are well known in northern temperate songbirds, with a clear increase and decrease in song output, normally peaking during the fertile window (Gahr, 2020; Hinde, 1952). This is usually explained by the role of song in territorial defence and mate attraction (Collins, 2004). The aspects of song that encode aggression or motivation to fight may fade in and out accordingly. Such aggressive signals play an important role in territory acquisition, taking place early in the season, but also during the mating period, especially in polygynous species that compete to gain extrapair copulations. For example, a song component of the European blackbird, the twitter, is considered an important signal during agonistic interactions (Dabelsteen & Pedersen, 1990) and this twitter part exhibits a proportional increase as the breeding season progresses.
Other species also show clear seasonal changes in song structure, such as the rate of syllable production (Leitner et al., 2001), song length (Apfelbeck et al., 2013; Smith et al., 1997) and repertoire size (Nottebohm et al., 1986; Van Hout et al., 2009), features often associated with sexual selection and territorial encounters.
In urban habitats, or particularly noisy areas, birds have been found to be more aggressive than their nearby conspecifics in less urbanized, or quieter, areas (Phillips & Derryberry, 2018). Common chiffchaffs, Phylloscopus collybita, living under extremely noisy conditions at an airport, but in otherwise rather natural habitat, showed increased levels of aggression in response to playbacks (Wolfenden et al., 2019). This study suggested that noisy conditions could be responsible for increased aggression, perhaps due to a deterioration in hearing thresholds resulting from auditory overexposure (Dooling & Popper, 2007) or to a noise-dependent increase in chronic stress levels (Goudie & Jones, 2004). Other studies have shown that noise masking can also lead to a disproportionately weaker response if birds are unable to assess the stimulus accurately (Kareklas et al., 2019). Nevertheless, noise-dependent variation may also be driven by behavioural changes in signal production to avoid, or compensate for, masking by noise (Brumm & Slabbekoorn, 2005; Derryberry et al., 2020). Noise-masking avoidance strategies include frequency shifts (Slabbekoorn & Peet, 2003), increased amplitude (Brumm, 2004) and serial redundancy (Potash, 1972), and often involve a combination of these (Slabbekoorn, 2013; Verzijden et al., 2010).
The common chiffchaff is well studied for its vocal variation related to natural interactions and simulated intrusions via playback experiments. Playback studies in this species have shown that individuals increase their syllable rate during territorial conflicts (Linhart et al., 2013; Sierro et al., 2020) and that birds singing at high syllable rates are more likely to attack (Linhart et al., 2013). Other studies have shown that longer songs elicit a stronger response in territory holders (Linhart et al., 2012), but that playback stimulation had no effect on the song length of the responding individuals (McGregor, 1988). Linhart et al. (2012) also showed that song frequency is associated with body size in chiffchaffs and that contestants respond differently based on their relative difference in song frequency, suggesting a mutual assessment model in contests of this species (Taylor & Elwood, 2003). Songs varying in the switching syllable pattern do not seem to elicit differential responses, while higher diversity of syllable types did trigger stronger responses after playback (Sierro et al., 2020). Song units of this species can be introduced or interspersed by so-called 'tret calls'. Although most studies have not included them in their song analysis, they are tightly associated with singing and likely play a similar role. Indeed, some authors claim that chiffchaffs react to these calls as to typical song syllables and that tret calling increases when excited (Cramp & Brooks, 1993). Production of tret calls seems to appear sometime after individuals return from their wintering ground to settle in their breeding territories (Geissbühler, 1954; Homann, 1960).
We conducted a series of playback experiments to investigate the role of syllable diversity and syllable switching (Sierro et al., 2020), during which we noted the variable occurrence of tret calls. For the current paper, we investigated the role that tret calls play in communication, by describing (1) the acoustic features and (2) the occurrence of tret calls during simulated territorial intrusion. We compared some of these characteristics to the typical song syllables, for which the structure and function are well studied in this species. We aimed to assess whether tret calls are aggressive signals by testing the context criterion (increased tret calling after playback stimulation) and the predictive criterion (aggression during contests is correlated with tret calls before playback). Given that the song stimuli designed for the experiments did not include tret calls, we could not test the response criterion. To find other cues that could inform the communication role of tret calls, we studied the seasonal variation in tret calling and measured the correlation of tret calling with the ambient noise levels. If tret calls signal aggression, we expected to find a seasonal increase in tret call occurrence, with a potential decrease towards the end of the breeding season. In relation to ambient noise, we expected to find more tret calls in noisier territories.

Model Species and Study Site
The chiffchaff is a small migratory passerine of a light greenish-brown colour that weighs around 8 g (Bairlein et al., 2006). During the breeding season, from late February to August, it is widespread over the Western Palearctic region, with a population density that may reach 22–50 breeding pairs per km² (Cramp & Brooks, 1993; Helbig et al., 1996). Our study population was located in and around the city of Leiden, The Netherlands (52°09′16″N, 4°29′41″E), in residential areas with a fair number of trees and urban parks. In 2014, the singing activity started in early March in our study area, as breeding pairs began to establish breeding territories (Cramp & Brooks, 1993; Rodrigues, 1996). The song of this species is composed of a series of relatively slow-paced syllables, typically alternating between two or more syllable types that vary in relatively higher and lower pitch (Figs. 1, 2; Cramp & Brooks, 1993). In this species, song is given mostly by males and seems rare, or absent, in females (Cramp & Brooks, 1993), although this should be taken with caution, since sex identification in the field is not possible. Songs typically last between 5 and 10 s and, in the gaps between songs, birds produce tret calls, which are short, soft vocalizations (Figs. 1, 2). See Fig. A1 for similar call affixes in closely and distantly related species. The birds for this study were not marked but we avoided double sampling by detailed mapping of song posts. Territory fidelity is high during and across breeding seasons (Cramp & Brooks, 1993; Piotrowska & Wesołowski, 1989; Rodrigues, 1996) and it is therefore feasible to avoid mixing up different individuals. Given that we always followed a single individual throughout a test, the fact that the individual could be a male or a female should not bias our results.

Playback Experiment
We performed two playback experiments. In experiment 1, we tested the role of syllable diversity and in experiment 2, we tested the role of syllable switching during territorial conflicts (Sierro et al., 2020). Both experiments took place in the same breeding season from 24 April 2014 to 21 June 2014, and we carried out playback tests from 0600 to 1300 hours. In experiment 1, we compared the response of the same individual to high and low syllable diversity songs. In experiment 2, we compared the response to high and low syllable switching rate songs. One trial consisted of two tests that were carried out on consecutive days and on the same subject, one for each treatment depending on the experiment. The order of treatment presentation within each trial was alternated and balanced to control for a possible bias due to an order effect. Once we detected a bird singing in its territory, we began our test by placing the loudspeaker on the ground (facing upwards) inside the territory, approximately 10 m from the singing bird. While placing the speaker, we took visual cues to delimit a 3 m perimeter around the speaker, ensuring there was a potential perch within this range. Each playback test was divided into three phases: before playback, during playback and after playback. Before playback, we recorded 11 songs of spontaneous, unstimulated singing. Then, we played the stimulus from the loudspeaker. After playback we continued recording until the bird sang another 11 songs. In all cases, subjects stopped singing during the playback phase and began searching for the simulated intruder, as is common in natural interactions of many species (Catchpole, 1989; Fishbein et al., 2018). Sometime after the playback stimulus ended, birds resumed singing (Sierro et al., 2020), usually moving away from the speaker while decreasing searching behaviour and ultimately singing normally from a single perch. The distance from the bird to the microphone was usually less than 10 m.
The next day, we arrived at the same territory at a similar time of day and waited for the individual to sing at the same song post. Then, we carried out the second playback test using the complementary treatment. To play back the song stimuli, we used an Intertechnik M 130 KX4 speaker with a Monacor IPA-10 amplifier placed inside the territory and a smartphone as digital audio player. We recorded song with a Marantz PMD661 recorder (48 kHz sampling rate and 24-bit depth), together with a Sennheiser ME 67 directional microphone, covered with a foam windshield. After each test, we took four sound pressure level measurements of the ambient noise, each pointing towards one of the four cardinal directions (American Recorder Technologies, Simi Valley, CA, U.S.A., SPL-8810, response set FAST and A-weighting).
Overall, we carried out 86 playback tests on 43 individuals, 22 in experiment 1 and 21 in experiment 2 (Sierro et al., 2020). Note that the subjects tested in experiment 1 were not tested in experiment 2. For the present study, we pooled all playback trials to study the role of tret calls during territorial conflicts and the effect of season and anthropogenic noise on the production of these calls. For more detailed information on the study design see Sierro et al. (2020).

Song and Behavioural Analyses
During the playback response, we recorded very few cases of direct physical aggression, that is, directly perching and pecking on the speaker. Thus, we used a proxy to assess the intensity of the aggressive response by measuring the time spent within 3 m of the speaker (Burt et al., 2001; Catchpole, 1977; Krebs et al., 1981; Linhart et al., 2013). We analysed songs produced before and after playback using spectrograms (window type 'Hanning', window length 1024 samples, 90% overlap and −80 to 0 dB range) produced in Audacity (Mazzoni & Dannenberg, 2014). We defined a song as any group of at least three syllables separated from the next syllable by more than 0.5 s (Sierro et al., 2020), which is 1.5 times the average intersyllable interval within a song, as obtained from a previous study in this species (Linhart et al., 2013). Once the songs were defined, we counted the tret calls in the intervals between the songs. Given that the length of intersong intervals was variable and changed after playback presentation (Sierro et al., 2020), we measured tret calling as the rate of calls/s, dividing the number of tret calls by the length of the intersong interval. Previous authors have considered tret calls as introductory calls (Cramp & Brooks, 1993); therefore, we assigned all tret calls within a song gap to the next song in the recording.
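The song definition and the tret call rate described above can be sketched in a few lines. This is an illustrative Python sketch, not the procedure actually used (songs were delimited from spectrograms in Audacity). The function names, the (start, end) tuple layout and the reading of the 0.5 s criterion as the silence between the end of one syllable and the start of the next are all assumptions.

```python
GAP_S = 0.5         # silence longer than this ends a song (value from the text)
MIN_SYLLABLES = 3   # minimum number of syllables in a song (value from the text)

def group_songs(syllables):
    """Group (start_s, end_s) syllable times, sorted by start, into songs.

    Tret calls are assumed to be annotated separately and are not in this list.
    """
    songs, current = [], []
    for syl in syllables:
        # A silent interval above GAP_S closes the current group.
        if current and syl[0] - current[-1][1] > GAP_S:
            if len(current) >= MIN_SYLLABLES:
                songs.append(current)
            current = []
        current.append(syl)
    if len(current) >= MIN_SYLLABLES:
        songs.append(current)
    return songs

def tret_rates(songs, tret_times):
    """Tret calls/s in each intersong gap, assigned to the song that follows."""
    rates = []
    for prev, nxt in zip(songs, songs[1:]):
        gap_start, gap_end = prev[-1][1], nxt[0][0]
        n_calls = sum(gap_start < t < gap_end for t in tret_times)
        rates.append(n_calls / (gap_end - gap_start))
    return rates

# Hypothetical annotation: two three-syllable songs separated by a 2 s gap.
syllables = [(0.0, 0.25), (0.5, 0.75), (1.0, 1.25),
             (3.25, 3.5), (3.75, 4.0), (4.25, 4.5)]
songs = group_songs(syllables)
rates = tret_rates(songs, tret_times=[1.5, 2.0, 2.5, 3.0])
```

Dividing by the gap length rather than reporting raw counts keeps the measure comparable when intersong intervals lengthen or shorten after playback.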
To describe the temporal and spectral properties of tret calls, we selected only high-quality recordings with a high signal-to-noise ratio. We manually marked the start and end times of each call and exported the time markings to a text file using Audacity software. We then conducted a detailed acoustic analysis in R software (R Development Core Team, 2016), package tuneR (Ligges, 2013) and package seewave (Sueur et al., 2006). We cut out each call from the recording and saved it as a single, normalized WAV file. As each tret call is normally composed of two or more separate notes, we computed the normalized amplitude envelope of the entire call. The amplitude of the sound is received by the microphone as an electric signal in volts and then translated by an analogue-to-digital converter into bits. We obtained the amplitude envelope using a smoothing function with window length of 256 samples and a 90% window overlap. This amplitude envelope was normalized so that the maximum amplitude of the signal equals 1. Note that this is a linear scale of amplitude and is not transformed into decibels, which is a logarithmic scale that reflects perceived volume. Within each call, we selected the start and end times of each note as the time point where the amplitude went over, and below, a relative amplitude of 0.3. This allowed us to measure a standardized duration of each note within the call.
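As a rough illustration of the note-delimiting step, the sketch below normalizes an amplitude envelope to a maximum of 1 and marks each note as a run of samples above a relative amplitude of 0.3. It is written in Python rather than the R packages used for the actual analysis, the function names are hypothetical, and the envelope is assumed to have been smoothed already (the 256-sample smoothing window is not reproduced here).

```python
def normalise(envelope):
    """Scale a linear amplitude envelope so that its maximum equals 1."""
    peak = max(envelope)
    return [v / peak for v in envelope]

def find_notes(envelope, sample_rate, threshold=0.3):
    """Return (start_s, end_s) for each run of samples above the threshold."""
    env = normalise(envelope)
    notes, start = [], None
    for i, v in enumerate(env):
        if v > threshold and start is None:
            start = i                      # envelope rises above threshold
        elif v <= threshold and start is not None:
            notes.append((start / sample_rate, i / sample_rate))
            start = None                   # envelope falls back below threshold
    if start is not None:                  # note still open at end of the call
        notes.append((start / sample_rate, len(env) / sample_rate))
    return notes

# Hypothetical envelope of a two-note call, sampled at 1 kHz for readability.
example_env = [0, 1, 4, 5, 1, 0, 2, 5, 10, 2, 0]
example_notes = find_notes(example_env, sample_rate=1000)
```

Because the threshold is relative to the call's own peak, note durations stay comparable between recordings made at different distances from the bird.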
On the same normalized amplitude envelope, we selected the amplitude peaks of each note within the call, and, at those time points, we computed the normalized power spectrum to extract the spectral features (window size 1024 samples, maximum dB: 0; Figs. 1, 2a). On the power spectrum, we measured the minimum and maximum frequency as the lowest and highest frequency above a −15 dB amplitude threshold (Podos, 1997). To measure the minimum and maximum frequency of song syllables, we selected each syllable manually and computed the power spectrum of the entire syllable. For a comparison of song syllables and tret calls, we associated each tret call with the song that followed in the recording.
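The threshold-based frequency measurement can be illustrated as follows. This Python sketch assumes the spectrum is supplied as linear power values (hence 10·log10; an amplitude spectrum would need 20·log10) and normalizes it so that the peak sits at 0 dB. The function name and the example spectrum are hypothetical and do not come from the recordings.

```python
import math

def freq_bounds(freqs_hz, power, threshold_db=-15.0):
    """Lowest and highest frequency whose level is within threshold_db of the peak."""
    peak = max(power)
    # Convert linear power to dB relative to the spectral peak (peak = 0 dB).
    levels_db = [10 * math.log10(p / peak) if p > 0 else float("-inf")
                 for p in power]
    above = [f for f, d in zip(freqs_hz, levels_db) if d >= threshold_db]
    return min(above), max(above)

# Hypothetical coarse spectrum with its peak at 4 kHz.
freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000]
power = [0.001, 0.01, 0.5, 1.0, 0.2, 0.05, 0.001]
f_min, f_max = freq_bounds(freqs, power)
```

A fixed threshold below the peak makes the measurement independent of absolute recording level, although it remains sensitive to background noise that exceeds the threshold.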
Finally, we also measured the relative amplitude of the tret calls in relation to the song syllables, computing the normalized amplitude envelope of the entire song gap (where tret calls were recorded) plus the first three syllables of the following song. In this case the amplitude envelope was normalized for the entire WAV file that included the tret calls and song syllables. Thus, it was possible to measure the maximum relative amplitude of each tret call relative to the maximum amplitude of the song.

Ethical Note
The study adheres to the ASAB/ABS Guidelines for the Use of Animals in Research. The birds were never caught or handled in any way and therefore no specific permits were needed to conduct this study. Each playback test lasted an average ± SE of 8.83 ± 0.2 min causing some disturbance to the birds that responded as if there was an intruder in their territory. We can confirm that none of the birds left the territory after the trials were conducted since we encountered them during following visits.

Statistical Analyses
All measures are presented as mean ± one SD, unless otherwise indicated. All statistical analyses were carried out in R software (R Development Core Team, 2016). We calculated a mean value of minimum and maximum frequency per individual for the tret calls and the song syllables. Then, we carried out a paired t test comparing minimum and maximum frequencies and duration of tret calls with song syllables. For the statistical analysis we used packages lme4 (Bates, 2010) and MuMIn (Barton, 2011) and for data management and visualization we used stringr (Wickham & Wickham, 2019), dplyr (Wickham et al., 2022) and ggplot2 (Wickham, 2016).
To investigate the variation in tret call production during simulated conflicts, we fitted a linear mixed-effects model (LMM) using the rate of tret calls/s for each song as response variable. The explanatory factors in the model were the playback phase (before versus after playback), the song position (from 1 to 11 in each phase) and the interaction between these two factors. We observed in preliminary analysis that changes in tret calling during the response were not linear; hence, we applied a logarithmic transformation to the song position variable. This model structure was already tested in our previous study (Sierro et al., 2020) and was based on a priori experimental design; thus, we did not carry out a model selection process. In our model, song position was a numerical variable from 1 to 11 songs and the reference level at intercept was the first song (song 1). Therefore, the estimate for the parameter of phase (before versus after) compared the first song before to the first song after playback. We then fitted a second model setting the last song (song 11) as the reference at the intercept to compare the tret call rate at the end of each phase. Finally, we also included the territory identity as a random effect to deal with repeated measurements and avoid pseudoreplication.
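The model structure described above can be written out as follows. The notation is an illustrative assumption and not the exact specification fitted in R: y_ij is the tret call rate for song j in territory i, P_ij is 0 before and 1 after playback, s_ij is the song position (1 to 11) and u_i is the random intercept for territory.

```latex
y_{ij} = \beta_0 + \beta_1 P_{ij} + \beta_2 \log(s_{ij})
       + \beta_3 P_{ij}\,\log(s_{ij}) + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Because log(1) = 0, the coefficient β1 is the difference between the first song after and the first song before playback. Replacing log(s_ij) with log(s_ij) − log(11) = log(s_ij/11) moves the reference to song 11, where the phase estimate corresponds to β1 + β3 log(11) in the parameters of the first model. This identity holds for any base of the logarithm.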
To investigate the variation in production of tret calls during unstimulated song, we standardized our measurement by counting all tret calls within 90 s before the onset of playback, hence considering only unstimulated songs without any artificial stimulation. This was the response variable in an LMM model. On the right-hand side of the formula, we included season, as Julian date (origin was 1 January 2014), the ambient noise levels in dB(A) and the approach response as time (s) spent within 3 m of the speaker. Among these explanatory variables, we tested for potential multicollinearity estimating the variance inflation factor (VIF) (vif function from the car package, Fox & Weisberg, 2019), considering multicollinearity was present with a VIF greater than 3 (Zuur et al., 2009). We scaled all regression predictors of the full model, following Gelman (2008), which facilitates the interpretation and direct comparisons of the parameter estimates (Schielzeth, 2010). As each individual was tested twice, we included the territory identity as a random effect, thereby taking repeated measurements into account and avoiding pseudoreplication.
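The predictor scaling can be illustrated with a short sketch. Gelman (2008) proposes centring each numeric input and dividing by two standard deviations; the Python below assumes that this exact variant was applied, which the text does not spell out, and uses made-up noise values.

```python
from statistics import mean, stdev

def scale_2sd(values):
    """Centre a numeric predictor and divide by two sample standard deviations."""
    m, s = mean(values), stdev(values)
    return [(v - m) / (2 * s) for v in values]

# Hypothetical ambient noise levels in dB(A); not measurements from this study.
noise_dba = [48.0, 52.0, 55.0, 61.0, 64.0]
noise_scaled = scale_2sd(noise_dba)
```

After this transformation each coefficient describes the change in the response for a two-SD change in the predictor, so the effects of date, noise and approach time can be compared on one scale.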
We then conducted a model selection process to select which factors were important in explaining variation in tret call production. For this we used an information theory approach, computing all possible model combinations and ranking them by the Akaike information criterion for small samples (AICc). This procedure compares the fit of all possible models while penalizing model complexity, in terms of the number of explanatory variables included. We selected all models that had ΔAICc < 2, in relation to the model with the lowest AICc score (best model), to compute the full average model as the final model (Burnham & Anderson, 2002; Burnham et al., 2011). We used the relative importance of each factor in the final model together with the coefficients and estimated confidence intervals (CI) with a threshold of 95% (Burnham et al., 2011; Nakagawa & Cuthill, 2007), considering there was a significant effect if the CI did not overlap with zero. We inferred a nonsignificant trend if the 95% CI overlapped with zero but not the 90% CI. In that case, we give the model estimate and the 90% CI in the text.
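The ranking step can be sketched with the standard small-sample correction, AICc = AIC + 2k(k + 1)/(n − k − 1), where k is the number of estimated parameters and n the sample size. The Python below is a minimal illustration only: the model names, log-likelihoods and parameter counts are invented, and the actual analysis was run with R packages.

```python
def aicc(log_lik, k, n):
    """AIC with the small-sample correction, from log-likelihood, k and n."""
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def top_models(models, n, cutoff=2.0):
    """Names of models with delta AICc below the cutoff, best model first.

    models maps a name to (log_likelihood, number_of_parameters).
    """
    scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
    best = min(scores.values())
    kept = [name for name, s in scores.items() if s - best < cutoff]
    return sorted(kept, key=scores.get)

# Hypothetical candidate set; values are invented for illustration.
candidates = {
    "season": (-100.0, 4),
    "season + noise": (-99.5, 5),
    "null": (-110.0, 3),
}
selected = top_models(candidates, n=86)
```

In this invented example the second model lies within two AICc units of the best one, so both would enter the averaged model; when no competitor falls below the cutoff, the best model alone is retained.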

Structure of Tret Calls
The sample of high-quality tret calls used in spectral analysis included 29 individuals and a total of 443 individual tret calls (15.28 ± 18.12, number of tret calls per individual). In this sample, 97% of the tret calls were composed of two separate notes, each with a duration of 19.2 ± 4.2 ms, while the entire call was 70.2 ± 7.9 ms in duration. Our analysis showed that tret calls had a minimum frequency of 2.88 ± 0.16 kHz, which is significantly lower than the minimum frequency of song syllables (3.58 ± 0.20 kHz; t27 = −14.3, P < 0.001, 2.5% CI = −0.77, 97.5% CI = −0.60). The maximum frequency in tret calls was 6.28 ± 0.57 kHz, which is not significantly different from the maximum frequency of song syllables (6.45 ± 0.45 kHz; t27 = −1.12, P = 0.27, 2.5% CI = −0.41, 97.5% CI = 0.08). The averaged power spectrum of the tret calls revealed the multiharmonic structure of this call, with at least three clear harmonics at ca. 3 kHz, ca. 4.5 kHz and ca. 6 kHz (Fig. 3). The mean maximum relative amplitude of the tret calls was 0.16 ± 0.08, which is 6.14 times lower in amplitude than the loudest syllable in the song.

Tret Calls During Simulated Agonistic Interaction
Immediately after playback stimulation, territorial chiffchaffs decreased the tret call rate significantly, as we can see in the parameter estimate for 'Playback phase' in the model with 'song 1' as reference level (Table 1, Fig. 4). After such a drop, we found a significant increase in tret call rate after playback, rising to levels that were significantly higher than before playback, as shown by the estimate for playback phase in the second model, with 'song 11' as the reference level (Table 1, Fig. 4). Predicted values derived from the model indicate that tret call rate was higher after the fourth song of the playback response, compared to any song before the playback. Furthermore, the highest tret call rate was measured in the last song of the response, suggesting that it was still increasing when we stopped recording. Fig. 4 also shows the pattern observed for syllable rate during the same experiment (from Sierro et al., 2020).

Tret Call Variation with Season and Noise
All models computed during the selection process, describing variation in unstimulated production of tret calls, are shown in Table 2. For the second-best model ΔAICc was greater than 2; therefore, only the best model was selected as the final model. The production of tret calls during unstimulated singing before playback did not correlate with the intensity of aggressive responses during playback (Table 3, Fig. 5a). As the season progressed, there was a significant increase in tret call occurrence, from the end of April to the end of June (Table 3, Fig. 5b). We also found that the number of tret calls tended to increase with rising ambient noise levels (Table 3), although this was a nonsignificant trend (5% CI: 0.21; 95% CI: 11.78; Fig. 5c).

DISCUSSION
When singing, common chiffchaffs often produce so-called 'tret calls' interspersed with typical song syllables. Here, we have shown that tret calls are normally composed of two separate notes, with a multiharmonic structure, and are low in amplitude and frequency relative to the typical song syllables. Our experimental results showed that the production of tret calls increased significantly following a simulated territorial intrusion, which is in line with tret calls serving as an aggressive signal according to the context criterion. However, the tret call rate did not predict subsequent aggressive behaviour, which does not support the predictive criterion of an aggressive signal. Finally, we found a significant seasonal rise in production of tret calls and a nonsignificant trend for a positive correlation with ambient noise levels.

Signal in Tret Call Structure
Our acoustic analysis showed that tret calls are structurally distinct from the typical song syllables in being multiharmonic and of relatively low amplitude. The three sound frequencies at 3, 4.5 and 6 kHz match the first three harmonics of a sound with a fundamental frequency (F0) of 1.5 kHz, which is not heard. Such a structure is a common phenomenon in bird vocalizations, likely realized by the production of a multistacked harmonic signal, including the F0, at the source (the 'syrinx'), after which all sound other than the first three harmonics is filtered out by the vocal tract (Fletcher & Tarnopolsky, 1999; Nowicki, 1987). The tret calls also had a significantly lower minimum frequency and six times lower amplitude than the song syllables, which fits with a typical structure for signals of aggressive intent, according to the so-called motivation-structured rules (Morton, 1982). The much lower amplitude of tret calls relative to song syllables suggests that they serve to reach a different audience at a different distance.
As low-amplitude, broadband and short vocalizations, tret calls seem best suited for close-range communication (Slabbekoorn, 2004), as a sort of 'soft song' in chiffchaffs. In this sense, tret calls could be used to avoid eavesdropping by conspecifics or predators further away, while interacting with nearby conspecifics (McGregor & Dabelsteen, 1996; Vargas-Castro et al., 2017). However, in the current case, this would only partially work as the tret calls are interspersed with song syllables of regular amplitude. A possible scenario could be that receivers further away would get the impression that aggression levels have decreased, based on aggressive signals encoded in song syllables, while the nearby receiver still hears tret calls, a signal of elevated aggression. The potential advantage of such an advanced multireceiver strategy is difficult to assess and requires more exploration. Soft song has also been attributed a role in locating the intruder (Jakubowska & Osiejuk, 2018), but our data are not in line with this suggestion either, since tret calling increased over the series of response songs, when searching behaviour typically dropped (i.e. singing from a single perch). Alternatively, the low detectability of 'soft' tret calls may be compensated by the harmonic structure and low frequency and hence best interpreted as a variant of singing in the chiffchaff. In this sense, tret calls would serve, not to avoid eavesdropping, but as a singing strategy to modulate aggression by adding qualitatively different calls to otherwise 'normal' song. Other examples are the 'snarr' call, a song component which is an indicator of social dominance in water pipits (Rehsteiner et al., 1998) and the A-song, a specific type of song that is associated with imminent attack in willow warblers (Järvi et al., 1980). Exploring the detailed temporal dynamics of chiffchaffs' response to the playback may provide more insight into the functional role of tret calls.

Temporal Dynamics of Tret Calls During Contests
We found that chiffchaffs produced tret calls during spontaneous, unstimulated song, but also that tret call rate increased after playback stimulation to higher levels than during unstimulated song. Chiffchaffs became silent during the playback presentation and did not overlap the song of the playback. This has been observed in other species during natural and experimental interactions, and has been suggested to aid in the location and assessment of the opponent (Catchpole, 1989; Yang & Slabbekoorn, 2014). After the playback ended, chiffchaffs began to produce tret calls at low rates (nearly zero), but tret calling increased quickly to higher levels than before playback for most of the response. These findings are in line with tret calls serving as aggressive signals, meeting the context criterion, as the tret call rate was higher during most of the playback response than during unstimulated song (Searcy & Beecher, 2009). However, we did not find an association between production of tret calls before playback and subsequent aggressive behaviour after the simulated intrusion, so the predictive criterion was not met. It is important to mention that our proxy for aggressive behaviour was the approach to the speaker, which sometimes does not correlate with physical attack (Linhart et al., 2012). We thus found a discrepancy between the predictive and the context criteria for tret calls to be a signal of aggression. Below, we use other signals in chiffchaff song and associated literature to seek an interpretation of this discrepancy.
Syllable rate is a known signal of aggression in this species, and it also increased after playback in the current experiments. However, we found opposite patterns in the temporal variation in syllable rate and tret calling during the simulated contest. First, right after playback, tret calling dropped to a minimum while syllable rate rose to its maximum. Then, while tret calling increased with time in the series of songs after playback, syllable rate decreased. If we only considered the syllable rate at the end of the response (11th song after playback), we could conclude that song features related to aggression had nearly returned to preplayback values (see Sierro et al., 2020). However, we have shown that taking tret calls into account reveals a very different picture.
Hence, our findings with respect to the context criterion were in favour of tret calls being an aggressive signal, but they were not in line with our findings with respect to the predictive criterion nor did they match the pattern observed in the syllable rate, a known aggressive signal in chiffchaffs. An explanation may be that aggressive encounters follow a sequential assessment model in two phases, since previous studies suggest that chiffchaffs mutually assess each other during fights (Linhart et al., 2012). First, the most aggressive phase is characterized by high syllable rate, an aggressive signal that is predictive of physical attack. Second, there is a less aggressive phase when syllable rate goes down but in which arousal levels are still up relative to unstimulated song as shown by the high rate of tret calls. In sequential assessment models, the initial phases of aggression typically imply low-aggression signals, as we found for tret calls in the chiffchaff, and the contest escalates only if rivals are matched in RHP (Arnott & Elwood, 2009). Hence, our findings could reflect an advanced system for communication about escalated levels of aggression (Bradbury & Vehrencamp, 1998), where high syllable rate is a signal of imminent attack and tret calls increase during a less aggressive phase of conflict escalation (Enquist & Leimar, 1983; Enquist et al., 1990).
A future experiment to explore such an advanced signalling system could be to simulate a two-phase territorial intrusion, first with a low-threat stimulation outside but near the territory border, followed by an escalated conflict where the simulated intruder moves inside the territory.

Table note: Model estimates are shown with the associated confidence interval (CI) and t statistic. The variable playback phase is a two-level categorical variable, comparing vocal behaviour before and after playback. The variable song position is an integer. We fitted two separate models, one with the first song as the reference level in song position and a second with the last song (song 11) as the reference level in song position. From the second model, only the estimates of the intercept and of the effect of playback phase are shown, since the other estimates were identical.
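The model description above states that changing the reference song position from song 1 to song 11 alters only the intercept and the playback phase estimate. The sketch below illustrates why, assuming a linear predictor with a playback phase × song position interaction (the interaction is implied by the description but is an assumption here). All coefficient values and function names are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def shift_reference(coefs, old_ref, new_ref):
    """Re-express coefficients for a new reference song position.

    Assumed linear predictor (an illustration, not the fitted model):
      eta = intercept + phase_coef * phase
            + position_coef * (pos - ref)
            + interaction_coef * phase * (pos - ref)
    """
    shift = new_ref - old_ref
    return {
        "intercept": coefs["intercept"] + shift * coefs["position"],
        "phase": coefs["phase"] + shift * coefs["interaction"],
        "position": coefs["position"],        # slope is unchanged
        "interaction": coefs["interaction"],  # interaction is unchanged
    }


def predict(coefs, ref, phase, pos):
    """Evaluate the assumed linear predictor for one song."""
    d = pos - ref
    return (coefs["intercept"] + coefs["phase"] * phase
            + coefs["position"] * d + coefs["interaction"] * phase * d)


# Hypothetical coefficients with song 1 as the reference level.
at_song_1 = {"intercept": 2.0, "phase": -1.5, "position": 0.25, "interaction": 0.5}
at_song_11 = shift_reference(at_song_1, old_ref=1, new_ref=11)
print(at_song_11)

# Both parameterizations give the same prediction for every song and phase.
for phase in (0, 1):
    for pos in range(1, 12):
        a = predict(at_song_1, 1, phase, pos)
        b = predict(at_song_11, 11, phase, pos)
        assert abs(a - b) < 1e-9
```

With these hypothetical values, the playback phase effect is negative at song 1 and positive at song 11, while the slope and interaction terms are identical in both parameterizations, so the two models describe the same fitted pattern from two reference points.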

Tret Call Use Across the Season and in Noisy Conditions
Further evidence for a general association of tret calls with territorial signalling comes from our findings of a seasonal increase in tret calling upon return to the breeding grounds, as observed by earlier studies (Geissbühler, 1954; Homann, 1960). Agonistic interactions are likely to increase over the first weeks after arrival as birds settle on their breeding grounds, which would explain the increase in tret calling during this period. In line with this argument, we would expect a decline in territoriality later in the season when territory boundaries are established. But, contrary to expectation, we found that tret calls kept rising until the end of our recording period (21 June). It is possible that we have not covered the complete breeding season and missed the later decline in tret calls, since some studies show that breeding attempts can occur until the end of July (Rodrigues & Crick, 1997). It could also be that tret calls play an important role during agonistic conflicts beyond establishing territories, such as mate guarding, which may be common later in the season in their polygynous mating system (Rodrigues, 1996).

Table note: logLik: log likelihood. AICc: Akaike information criterion corrected for small sample size. All models with a ΔAICc lower than 2 included the season factor, the only variable in the model with a significant effect on tret call production. The 'Akaike weight' represents the normalized, relative likelihood of a model, following Burnham and Anderson (2002, p. 75): '[…] is considered as the weight of evidence in favor of model […]'.

Table note: Model estimates are given with the associated confidence interval (CI) and t statistic. The variable Julian date is measured in days relative to 1 January of the same year (2014). Noise levels were measured as the mean decibels of four separate measures taken immediately after the recording was made. Finally, the time spent within 3 m of the speaker during the following playback stimulation is included as a measure of aggressive response. Note that the response variable is the total number of tret calls recorded during 90 s before the playback began (unstimulated song).
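The Akaike weights mentioned in the model selection note above can be computed from AICc values with the standard formula of Burnham and Anderson (2002): each model's relative likelihood is exp(-Δ/2), where Δ is its AICc minus the lowest AICc in the set, and the weights are these likelihoods normalized to sum to 1. The sketch below is a minimal illustration; the AICc values are hypothetical and are not the values from the study.

```python
import math


def akaike_weights(aicc_values):
    """Return (deltas, weights) for a set of candidate models.

    deltas:  AICc difference of each model from the best (lowest) AICc.
    weights: normalized relative likelihoods exp(-delta / 2), summing to 1.
    """
    best = min(aicc_values)
    deltas = [value - best for value in aicc_values]
    rel_likelihoods = [math.exp(-0.5 * delta) for delta in deltas]
    total = sum(rel_likelihoods)
    weights = [rel / total for rel in rel_likelihoods]
    return deltas, weights


# Hypothetical AICc values for three candidate models (illustration only).
aicc = [100.0, 101.2, 104.5]
deltas, weights = akaike_weights(aicc)
for value, delta, weight in zip(aicc, deltas, weights):
    print(f"AICc={value:.1f}  delta={delta:.1f}  weight={weight:.3f}")
```

Models within 2 AICc units of the best model are conventionally treated as having comparable support, which is the threshold referred to in the note.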
We also found an increase in the number of tret calls with rising noise levels, independent of the season, although the relationship was a nonsignificant trend. As a potential consequence of noise-induced stress and aggression, we expected to find an increase in tret calling with higher noise levels (Phillips & Derryberry, 2018; Wolfenden et al., 2019). Moreover, as low-amplitude, low-frequency and broad bandwidth sounds, tret calls are likely to be easily masked by low-frequency urban noise. Although some studies suggest that birds decrease the production of song elements that are heavily masked by noise (Sierro et al., 2017), the role of tret calls during singing contests may render this strategy ineffective. Alternatively, masking avoidance through increased temporal redundancy could explain the observed rise in the number of tret calls with increasing levels of noise. However, we need to be cautious in drawing conclusions as the correlation we found was only a nonsignificant trend. Furthermore, our noisiest sites had ambient noise levels below 60 dB(A) and it would be useful to include recordings at a higher range of noise levels, above 70 dB(A), such as at heavy traffic crossroads or at airports.

Conclusions
Vocal interactions among birds provide an excellent opportunity to study advanced strategies of acoustic communication in the animal kingdom. Our acoustic analyses provided structural detail of tret calls, a common and well-known, but little studied call type in common chiffchaffs. The multiharmonic structure, relatively low frequency and low amplitude of these calls make them suitable candidates for an aggressive signal. Our playbacks revealed a mixed picture, with strong evidence for an association with the aggressive context of territorial intrusion, but no evidence for tret calls being a predictor of aggression level. The high temporal resolution analyses showing opposite patterns between tret calls and syllable rate indicate a two-phase agonistic signalling system. High aggression is conveyed by increased syllable rate, while tret calls replace this signal as aggression fades during a (de)escalated phase of the conflict. This would keep the message alive for a bit longer, especially for nearby competitors, to which tret calls are audible, and not for more distant neighbours or predators, which would only hear the much louder song syllables. We can hereby add the chiffchaff to the few other cases where a specific, qualitatively different vocalization plays a specific role during singing interactions in an advanced system of signalling about escalation in animal conflicts.

Declaration of Interest
The authors declare that there was no financial or nonfinancial conflict of interest during the development of this study.

These calls are produced in gaps between songs, apparently with a multiharmonic structure and much lower amplitude than song syllables. Delivery of these calls seems very similar in these other species, suggesting that a system similar to the one described here can be found in other, distantly related species.