Introduction

Amid a cacophony of environmental sounds, juvenile songbirds can reliably learn a species-specific song. In general, male birdsong has the functions of territory defense and mate attraction, and learning these songs is a costly process (Catchpole and Slater 2003; Nowicki and Searcy 2004). The song control system in the brain of vocal learners is a network of discrete, interconnected neural circuits with two primary pathways, one for song learning and one for song production (Nottebohm et al. 1976, 1990; Scharff and Nottebohm 1991; Brenowitz et al. 1997). By decreasing the resources available for brain development, nutritional and developmental stress in the nest can lead to deficits in this song system (MacDonald et al. 2006; Schmidt et al. 2013) and thus in song learning. Therefore, learned song can be a reliable indicator of a male’s fitness potential (Nowicki et al. 1998a; Nowicki and Searcy 2004) and general cognitive capacity (Boogert et al. 2008, 2011; Templeton et al. 2014). Early-life stresses can lead to perceptible differences in song repertoire size (Nowicki et al. 2000), the amount of song produced (Buchanan et al. 2003), song complexity (Spencer et al. 2003, 2004; Soma et al. 2006), and learning accuracy (Holveck et al. 2008). The effects of early development on song can be reflected in sexual selection and fitness; for example, female swamp sparrows (Melospiza georgiana) prefer the songs of males that had exhibited faster growth as nestlings (Searcy et al. 2010).

Although songs are learned, birds often exhibit unlearned predispositions for the kinds of songs they learn. For example, swamp sparrows (M. georgiana), when raised in isolation and exposed to various recorded and manipulated songs, learned only songs composed of swamp sparrow syllables (Marler and Peters 1977). Further, juvenile white-crowned sparrows (Zonotrichia leucophrys) could be ‘tricked’ into learning the syllables of other species when the manipulated playbacks began with a white-crowned sparrow whistle, indicating that this initial syllable may act as a cue to alert the bird that the syllables that follow should be learned (Soha and Marler 2000). In a discussion of the unlearned aspects of this learned behavior, Peter Marler suggested that “[i]n the evolution of birdsong there is tension between selection pressures to develop along species-specific lines and pressures to systematically or opportunistically develop novel behaviors” (Marler 1984).

The balance between these pressures can differ dramatically from species to species. In certain species, more accurately learned songs (Melospiza melodia, Nowicki et al. 2002) or more species-typical songs (M. georgiana, Lachlan et al. 2014) elicited more mating displays from females. However, in other species, such as the great reed warbler (Acrocephalus arundinaceus), females in natural settings preferred males with larger repertoires (Catchpole 1986). Larger repertoire sizes have also been associated with larger song control areas in the brain (Devoogd et al. 1993; Szekely et al. 1996), with increased testosterone levels (Van Hout et al. 2012), with polygynous species compared to monogamous ones, and with increased migratory behavior across species (Read and Weary 1992). As well as general features of songs and song repertoires, specific elements of a song can also be under selection: in a long-term study of Savannah sparrows (Passerculus sandwichensis), certain syllable types were associated with reproductive success, and these increased in frequency over time (Williams et al. 2013). Further, the song features that are the least variable within a species might be salient cues for species recognition, although the level of support for this “invariant feature hypothesis” (Marler 1960) depends on the species studied (Emlen 1972; Nelson 1989; Dabelsteen and Pedersen 1992; Shackelton et al. 1992).

In addition to this tension between selection for accurately learned or species-typical song and selection for complex songs, different learning programs are exhibited by different songbird species. Species with age-limited or closed-ended learning, such as zebra finches (Taeniopygia guttata, Zann 1990) and song sparrows (M. melodia, Marler and Peters 1987), crystallize their song repertoires at a relatively early age, whereas species with open-ended learning, such as northern mockingbirds (Mimus polyglottos, Howard 1974) and European starlings (Sturnus vulgaris, Buchanan et al. 2003), are capable of learning new song elements throughout their lives. In certain species, a larger repertoire size in males has been shown to correspond to earlier pairings and an increased number of offspring (e.g. Howard 1974; Catchpole 1980), indicating that repertoire size could be under strong sexual selection. In some cases, this earlier pairing allowed birds to produce an additional clutch of offspring (Price et al. 1988), corresponding to a dramatic fitness benefit.

Using published estimates for the number of syllables in a species-typical repertoire and the length of the song-learning program (Appendix 1), we found that open-ended learners had significantly larger repertoire sizes than closed-ended learners, even when controlling for phylogeny (\( p = 4 \times 10^{-5} \), Fig. 1). We suggest that sexual selection on repertoire size can be characterized as a niche-constructing process (Laland et al. 2000; Odling-Smee et al. 2003b; Rendell et al. 2011) that shifts the cost-benefit balance of continuing to learn throughout life. Changes in neural circuits in the song system, perhaps building on existing plasticity, are hypothesized to underlie the shift to open-ended learning (Brenowitz 2004). The connections between different parts of the song system in the brain can change with age, hormone levels, and experience in both open- and closed-ended learners (Nottebohm 1992). In addition, specific regions of the song system have been shown to play different roles in the canary (Serinus canarius), an open-ended learner, and the zebra finch (T. guttata), a closed-ended learner (Nottebohm 1992). Since there is a metabolic and developmental cost to the juvenile song-learning phase (Nowicki et al. 1998a), there is almost certainly a cost to maintaining the neural underpinnings of song learning into adulthood, although this has yet to be directly quantified. However, this capacity to learn throughout life could be advantageous if females of a species are judging male quality based on size or complexity of repertoires.
Extending the song-learning timeframe could also be beneficial when selection does not favor larger repertoires, for example (a) when females of a species prefer accurate learning or species-typical song, especially for males that have learned an inappropriate song early in life, or (b) when females prefer a song that closely matches a local dialect or a nearby neighbor, especially when birds establish a territory near others that they did not hear as juveniles, for example after dispersal or migration. This type of extension of the sensitive period, which seems capable of correcting juvenile learning errors or adapting a song to a local environment, is observed in certain migratory birds (e.g. Liu and Nottebohm 2007), although the song crystallizes very shortly after migration.

Fig. 1

Species data from the analysis of published syllable repertoire sizes and learning programs (Appendix 1) were plotted on a log scale and grouped by learning mode (either closed-ended or open-ended learning). The difference between these distributions is significant in a phylogenetically controlled analysis (phylogenetic ANOVA, F = 52.97, \( p = 4 \times 10^{-5} \))

Homophily (assortative mating) based on song type is also thought to play an important role in evolutionary processes such as pre-mating isolation, mate recognition, and speciation. This kind of homophily or mating preference is a different form of sexual selection in that it does not involve a preference for one song type (such as a complex or a simple song); instead, females may favor a familiar song based on early experience (Laland 1994). In two subspecies of captive zebra finches (T. guttata guttata and T. guttata castanotis), cross-fostered females preferred the song of the foster father’s subspecies, demonstrating assortative mating according to song type (Clayton 1990). Similarly, the songs of male medium ground finches (Geospiza fortis) resemble their fathers’, and their songs differ quantitatively from songs of the sympatric species, the cactus finch (G. scandens). In the former species, females avoid mating with males that sing a heterospecific song, with very rare exceptions that generally have a behavioral explanation, such as the death of the father before song learning can take place (Grant and Grant 1996). In other species of Darwin’s finches, Grant and Grant (1989) observed similar patterns of transmission from fathers to sons, with females showing a preference for their fathers’ song types in potential mates. Grant and Grant (1989, 1992, 1996) concluded that song type was an important factor in species recognition and mate choice, with rare mistakes potentially leading to interspecific hybridization.

Evolutionary feedback between learning strategies and repertoire size may have been facilitated by sexual selection and homophily in songbirds. To study this possibility, we develop an age-structured model of gene-culture niche construction that is based on a model described by Creanza et al. (2013). This framework produces feedback between homophily, sexual selection, and learning strategies when song learning is costly. We are particularly interested in whether a costly cognitive ability for open-ended learning can increase in frequency in a population through sexual selection for the increase in song repertoire size that this learning can facilitate. Conversely, if sexual selection favors small repertoire size and effective learning, we may see the fixation of closed-ended learning, or selection may favor an extra chance to learn the correct song accurately. This represents a form of cultural niche construction whereby a costly biological trait (open-ended learning) can hitchhike to high frequency (or be removed from the population) due to an associated cultural trait that is under direct selection (Rendell et al. 2011).

Methods

To investigate the relationship between repertoire size and open- or closed-ended learning programs in songbirds, we performed a quantitative review of the literature on birdsong learning and adapted the model described by Creanza et al. (2013) to account for unique features of songbird learning, such as sexual selection favoring either accurate learning of species-typical songs or large song repertoires and song complexity (Howard 1974; Catchpole 1986; Nowicki et al. 2002a; Lachlan et al. 2014).

The study of birdsong often faces a “unit problem,” since different researchers define “repertoire size” by different criteria. For example, repertoire size has been defined as the number of song types, strophe or phrase types, syllable types, or note types, taking account of the features of particular songs that are most salient for the specific species studied (Macdougall-Shackleton 1997). For this study, we used syllable repertoire size as the unit because it is relevant both for birds with short songs of few repeated elements and for birds with long strings of variable elements.

We performed a literature search for data on repertoire size and learning programs using the terms “syllable repertoire” and “syllable repertoire size” combined with “bird,” “birdsong,” “song,” or “oscine.” This yielded the individual papers cited in Appendix 1 as well as two papers that compiled data on syllable repertoire size (Moore et al. 2011 and Szekely et al. 1996). We then performed another literature search for those species with available syllable repertoire size data, combining the Latin or common species names with the key terms “open-ended,” “closed-ended,” “adult learning,” “song plasticity,” “sensitive period,” “crystallization,” and “crystallized song” to determine the learning mode for each species. The resulting papers were categorized according to features of the studies and species they described. For example, some studies that reported open-ended learning recorded the difference in repertoire size between first-year birds and second-year or older birds, but did not record further changes in repertoire size throughout the life of the birds. These studies, noted in Appendix 1, could not eliminate the possibility that these birds crystallized their song after their second year and so might have experienced a kind of extended learning that may or may not have been truly open-ended.

We also reviewed studies on species that were known to “overproduce” song elements (Marler and Peters 1982; Nelson et al. 1996). Juveniles in these species learn more song elements than necessary and crystallize one particular song, which may follow the local song dialect or match their neighbor’s song after migration and before mating. Species with this type of closed-ended learning are also noted in Appendix 1.

To account for phylogenetic relationships when analyzing repertoire size and learning mode, we sampled 1000 trees for the subset of species listed in Appendix 1 from the avian phylogeny assembled by Jetz et al. (2012, data available at birdtree.org), rooted with Sayornis phoebe as an outgroup. We generated a consensus topology with the ‘consense’ function in PHYLIP (Felsenstein 1989). We then performed a phylogenetic ANOVA (Garland et al. 1993) implemented with the ‘phytools’ R package (Revell 2012) to compare the distribution of repertoire sizes between the two groups (open-ended and closed-ended learners) while controlling for the phylogenetic relationships between species. This test determines the significance of the ANOVA with an empirical null distribution, generated by simulating 100,000 times the evolution of (log10-transformed) syllable repertoire size along the tree with a Brownian motion algorithm.
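The logic of a simulation-based phylogenetic ANOVA can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the phytools implementation: the tree is represented by a hypothetical matrix `C` of shared branch lengths between tips, trait values under Brownian motion are drawn as multivariate normal with covariance proportional to `C`, and the p-value is the fraction of simulated F statistics at least as large as the observed one. The function names, the four-species tree, and the trait values are all invented for illustration and are not taken from Appendix 1.

```python
import numpy as np

def f_statistic(y, groups):
    """One-way ANOVA F statistic for values y split by group labels."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = y.mean()
    ss_between = 0.0
    ss_within = 0.0
    for g in labels:
        y_g = y[groups == g]
        ss_between += len(y_g) * (y_g.mean() - grand_mean) ** 2
        ss_within += ((y_g - y_g.mean()) ** 2).sum()
    df_between = len(labels) - 1
    df_within = len(y) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)

def phylo_anova(y, groups, C, n_sim=10_000, seed=0):
    """Observed F and a p-value from a Brownian-motion null distribution.

    C[i, j] is the branch length shared by tips i and j (an assumed input).
    The F statistic is scale-invariant, so the Brownian rate is set to 1.
    """
    rng = np.random.default_rng(seed)
    f_obs = f_statistic(y, groups)
    sims = rng.multivariate_normal(np.zeros(len(y)), C, size=n_sim)
    f_null = np.array([f_statistic(s, groups) for s in sims])
    return f_obs, float(np.mean(f_null >= f_obs))

# Hypothetical tree ((A,B),(C,D)) with made-up log10 repertoire sizes.
C = np.array([[1.0, 0.6, 0.0, 0.0],
              [0.6, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.0, 0.6, 1.0]])
log10_size = np.array([1.1, 1.3, 2.4, 2.0])   # illustrative values only
mode = np.array(["closed", "closed", "open", "open"])
f_obs, p_value = phylo_anova(log10_size, mode, C, n_sim=2000)
```

Because closely related species share much of their evolutionary history, group differences that coincide with deep splits in the tree yield large F values under the null as well, which is why the empirical null is typically more conservative than the standard F distribution.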

To investigate possible evolutionary mechanisms that may facilitate or explain an association between repertoire size and learning mode, we analyzed a mathematical model that treats repertoire size as a selectively advantageous culturally transmitted trait and investigated a possible cultural niche-constructive relationship between learning mode, song homophily (assortative mating), and this repertoire size. With this modeling framework, we tracked the spread of the repertoire size trait denoted by T, with alternative forms, T, representing a relatively large, complex repertoire, and t, representing a relatively limited or simple repertoire. Two other dichotomous traits influence the evolution of T: a trait, S, that determines the learning program used by an individual, with S representing open-ended learning and s representing closed-ended learning, and a trait, M, that determines the rate at which females preferentially mate with males of a certain repertoire size, with M representing a preference for a song she heard early in life (i.e. the T state she inherited from her father) and m representing random mating. Haploid genetic structure is modeled for simplicity.

There are, therefore, eight pheno-genotypes, whose frequencies sum to one: TSM (frequency denoted as \( x_{1} \)), TSm (\( x_{2} \)), TsM (\( x_{3} \)), Tsm (\( x_{4} \)), tSM (\( x_{5} \)), tSm (\( x_{6} \)), tsM (\( x_{7} \)), and tsm (\( x_{8} \)). The form of the M trait (M or m) determines a female’s probability of departure from random mating by choosing a mate based on a comparison between features of the potential partner’s song and the song she learned from her father (i.e., based on a match between T states). This mating preference is measured by the parameter α (0 ≤ α ≤ 1), which is the probability that a female with genotype M departs from random mating: a proportion 1 − α of M females choose their mates randomly, and the remainder of the M females, α, mate preferentially with individuals of the T state that they inherited from their father. The frequency of mating between pheno-genotypes j and k is described by \( \mu_{j,k} \) for j, k ∈ {1, 2, …, 8}. For example, matings between females with pheno-genotype TSM (frequency \( x_{1} \)) and males with pheno-genotype TsM (frequency \( x_{3} \)) occur at frequency:

$$ \mu_{1,3} = (1 - \alpha )x_{1} x_{3} + \frac{{\alpha x_{1} x_{3} }}{{x_{1} + x_{2} + x_{3} + x_{4} }}. $$
(1)

See Appendix 2 for the corresponding equations for the 64 possible mating pairs.
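As a hedged illustration of how Eq. 1 generalizes, the sketch below builds an 8 × 8 table of mating frequencies. It assumes that the preferential share α of an M female's matings is distributed over males that match her T state in proportion to their frequency (as in Eq. 1) and is zero for non-matching males, and that m females mate at random. The function name and array layout are hypothetical, and the cost factor applied to S males in Appendix 2 is deliberately left out.

```python
import numpy as np

# Pheno-genotype order follows the text: TSM, TSm, TsM, Tsm, tSM, tSm, tsM, tsm.
HAS_T = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)  # large repertoire
HAS_M = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)  # song-based preference

def mating_frequencies(x, alpha):
    """Return mu with mu[j, k] = frequency of (female j, male k) pairs.

    Indices are zero-based, so mu[0, 2] corresponds to mu_{1,3} in Eq. 1.
    """
    x = np.asarray(x, dtype=float)
    random_part = np.outer(x, x)                    # x_j * x_k
    same_state = HAS_T[:, None] == HAS_T[None, :]   # do female and male match?
    # Total frequency of males sharing each female's T state.
    match_total = np.where(HAS_T, x[HAS_T].sum(), x[~HAS_T].sum())
    match_total = np.maximum(match_total, np.finfo(float).tiny)  # avoid 0/0
    # Assumption: preferential matings go only to matching males.
    preferred = np.where(same_state, random_part / match_total[:, None], 0.0)
    mu_M = (1.0 - alpha) * random_part + alpha * preferred
    # m females (rows where HAS_M is False) mate at random.
    return np.where(HAS_M[:, None], mu_M, random_part)
```

Under these assumptions each female type still accounts for a total mating frequency equal to her own frequency, so the 64 entries sum to one before any fitness cost is applied.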

To investigate the evolution of a simple and idealized system of open- or closed-ended learning, we introduced an age-structured model in which there are two available learning opportunities: one as a juvenile and one as an adult. Individual birds can learn from their parents as juveniles only (closed-ended learning) or can learn from their parents and continue to learn, after they mature, from the population as a whole (a simplified version of open-ended learning). These learning differences depend on the form of the S trait. Here, we represented learning early in life as occurring primarily from parent to offspring, although it is known that a juvenile’s repertoire is often influenced by other nearby adults (Baptista and Morton 1988; Williams 1990; Beecher et al. 2007). This vertical transmission represents learning of repertoire size, not necessarily particular syllables, so it could also be conceptualized as a form of imprinting in which the juvenile learns from the father the length of a species-typical song.

We further assumed that a bird is predominantly exposed to unrelated birds after reaching sexual maturity, so learning at the second opportunity is oblique. We assumed the repertoire size (T) is a culturally transmitted trait (Table 1a) that is transmitted from father to offspring with some rate of error. Finally, the preference for one type of song repertoire over another (M) and the trait determining learning mode, S, are treated as genetically transmitted traits with Mendelian inheritance (Table 1b–c). As described above, particular forms of the T trait can be transmitted at two life stages. First, juvenile birds with either learning mode (S or s) can acquire a large (T) or small (t) repertoire through vertical learning from their fathers at rates \( b_{T \to T} \) and \( b_{t \to t} \), respectively (Table 1a). With the mating frequency between each pair of pheno-genotypes denoted by \( \mu_{j,k} \), the full recursions for this learning step are of the form

$$ \bar{w}x_{1}^{v} = \left( {\begin{array}{*{20}l} {\mu_{1,1} b_{T \to T} + \frac{{\mu_{1,2} b_{T \to T} }}{2} + \frac{{\mu_{1,3} b_{T \to T} }}{2} + \frac{{\mu_{1,4} b_{T \to T} }}{4} + \mu_{1,5} (1 - b_{t \to t} ) + \frac{{\mu_{1,6} (1 - b_{t \to t} )}}{2}} \hfill \\ { + \frac{{\mu_{1,7} (1 - b_{t \to t} )}}{2} + \frac{{\mu_{1,8} (1 - b_{t \to t} )}}{4} + \frac{{\mu_{2,1} b_{T \to T} }}{2} + \frac{{\mu_{2,3} b_{T \to T} }}{4} + \frac{{\mu_{2,5} (1 - b_{t \to t} )}}{2}} \hfill \\ { + \frac{{\mu_{2,7} (1 - b_{t \to t} )}}{4} + \frac{{\mu_{3,1} b_{T \to T} }}{2} + \frac{{\mu_{3,2} b_{T \to T} }}{4} + \frac{{\mu_{3,5} (1 - b_{t \to t} )}}{2} + \frac{{\mu_{3,6} (1 - b_{t \to t} )}}{4}} \hfill \\ { + \frac{{\mu_{4,1} b_{T \to T} }}{4} + \frac{{\mu_{4,5} (1 - b_{t \to t} )}}{4} + \mu_{5,1} b_{T \to T} + \frac{{\mu_{5,2} b_{T \to T} }}{2} + \frac{{\mu_{5,3} b_{T \to T} }}{2} + \frac{{\mu_{5,4} b_{T \to T} }}{4}} \hfill \\ { + \mu_{5,5} (1 - b_{t \to t} ) + \frac{{\mu_{5,6} (1 - b_{t \to t} )}}{2} + \frac{{\mu_{5,7} (1 - b_{t \to t} )}}{2} + \frac{{\mu_{5,8} (1 - b_{t \to t} )}}{4}} \hfill \\ { + \frac{{\mu_{6,1} b_{T \to T} }}{2} + \frac{{\mu_{6,3} b_{T \to T} }}{4} + \frac{{\mu_{6,5} (1 - b_{t \to t} )}}{2} + \frac{{\mu_{6,7} (1 - b_{t \to t} )}}{4} + \frac{{\mu_{7,1} b_{T \to T} }}{2}} \hfill \\ { + \frac{{\mu_{7,2} b_{T \to T} }}{4} + \frac{{\mu_{7,5} (1 - b_{t \to t} )}}{2} + \frac{{\mu_{7,6} (1 - b_{t \to t} )}}{4} + \frac{{\mu_{8,1} b_{T \to T} }}{4} + \frac{{\mu_{8,5} (1 - b_{t \to t} )}}{4}} \hfill \\ \end{array} } \right) $$
(2)

where \( x_{i}^{v} \) represents the frequency \( x_{i} \) after vertical transmission, \( \bar{w} \) is the average fitness of the population, the \( \mu_{j,k} \) values and full recursions are given in Appendix 2, and the b values are given in Table 1.
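The coefficients in Eq. 2 follow a regular pattern that can be generated programmatically. The sketch below is one way to do so, assuming (consistent with the terms of Eq. 2) that the first index of \( \mu_{j,k} \) is the mother and the second the father, that the offspring's T state depends only on the father's T state, and that the S and M alleles are each inherited from either parent with probability 1/2, independently. The function name is hypothetical, and the full recursions in Appendix 2 remain the authoritative definition.

```python
import numpy as np

# Pheno-genotype order: TSM, TSm, TsM, Tsm, tSM, tSm, tsM, tsm.
HAS_T = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
S_ALLELE = np.array([1, 1, 0, 0, 1, 1, 0, 0], dtype=float)  # 1 = S, 0 = s
M_ALLELE = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=float)  # 1 = M, 0 = m

def vertical_transmission(mu, b_TT, b_tt):
    """Offspring frequencies x^v from an 8x8 table mu[mother, father]."""
    mu = np.asarray(mu, dtype=float)
    # Probability that the offspring sings T, given the father's T state.
    p_T = np.where(HAS_T, b_TT, 1.0 - b_tt)[None, :]
    # Probability of inheriting S (or M): the mean of the two parents' alleles.
    p_S = 0.5 * (S_ALLELE[:, None] + S_ALLELE[None, :])
    p_M = 0.5 * (M_ALLELE[:, None] + M_ALLELE[None, :])
    x_v = np.empty(8)
    for i in range(8):
        t = p_T if HAS_T[i] else 1.0 - p_T
        s = p_S if S_ALLELE[i] == 1.0 else 1.0 - p_S
        m = p_M if M_ALLELE[i] == 1.0 else 1.0 - p_M
        x_v[i] = (mu * t * s * m).sum()
    # Dividing by the total plays the role of the mean fitness in Eq. 2.
    return x_v / x_v.sum()
```

For instance, a mating between a TSM female and a Tsm male contributes to TSM offspring with weight \( b_{T \to T}/4 \), matching the \( \mu_{1,4} \) term of Eq. 2.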

Table 1 Vertical transmission probabilities for T, S and M

Second, open-ended learners (S, represented by the frequencies \( x_{1} \), \( x_{2} \), \( x_{5} \), and \( x_{6} \)) have another chance to learn T or t as adults, through horizontal or oblique learning from randomly chosen members of the population at large (see Eq. 3 below). Closed-ended learners (s, represented by the frequencies \( x_{3} \), \( x_{4} \), \( x_{7} \), and \( x_{8} \)) do not have the opportunity to change their T state after the initial vertical learning step. Note that a smaller repertoire (t) can be transmitted by oblique learning, depending on its frequency in the parental generation. We used numerical iteration to explore the dynamics of the evolutionary system across a wide range of parameter values.

After this sequence of vertical learning → maturation → oblique learning, reproduction occurs at the same age for both open- and closed-ended learners, and selection acts via a fitness cost imposed on males with the S genotype relative to those with the s type (Fig. 2). In Appendix 2, the mating frequencies involving S males were multiplied by a factor of 1 − ε. Here, ε represents the constant fitness cost of open-ended learning to S males, for example the potential metabolic cost of maintaining the learning pathways of the song system. After the oblique learning step, selection acts on males with either a large (T) or a small (t) repertoire by increasing the number of offspring of T or t fathers. Oblique transmission and selection are described by the following equations:

$$ \begin{aligned} x_{1}^{o} = & (1 + \sigma_{T} )(x_{1}^{v} + \gamma x_{5}^{v} (x_{1}^{p} + x_{2}^{p} + x_{3}^{p} + x_{4}^{p} ) - \gamma x_{1}^{v} (x_{5}^{p} + x_{6}^{p} + x_{7}^{p} + x_{8}^{p} )) \\ x_{2}^{o} = & (1 + \sigma_{T} )(x_{2}^{v} + \gamma x_{6}^{v} (x_{1}^{p} + x_{2}^{p} + x_{3}^{p} + x_{4}^{p} ) - \gamma x_{2}^{v} (x_{5}^{p} + x_{6}^{p} + x_{7}^{p} + x_{8}^{p} )) \\ x_{3}^{o} = & (1 + \sigma_{T} )(x_{3}^{v} ) \\ x_{4}^{o} = & (1 + \sigma_{T} )(x_{4}^{v} ) \\ x_{5}^{o} = & (1 + \sigma_{t} )(x_{5}^{v} + \gamma x_{1}^{v} (x_{5}^{p} + x_{6}^{p} + x_{7}^{p} + x_{8}^{p} ) - \gamma x_{5}^{v} (x_{1}^{p} + x_{2}^{p} + x_{3}^{p} + x_{4}^{p} )) \\ x_{6}^{o} = & (1 + \sigma_{t} )(x_{6}^{v} + \gamma x_{2}^{v} (x_{5}^{p} + x_{6}^{p} + x_{7}^{p} + x_{8}^{p} ) - \gamma x_{6}^{v} (x_{1}^{p} + x_{2}^{p} + x_{3}^{p} + x_{4}^{p} )) \\ x_{7}^{o} = & (1 + \sigma_{t} )(x_{7}^{v} ) \\ x_{8}^{o} = & (1 + \sigma_{t} )(x_{8}^{v} ) \\ \end{aligned} $$
(3)

where \( x_{i}^{o} \) is the frequency after oblique transmission, \( x_{i}^{p} \) is the frequency \( x_{i} \) in the parental generation, and the efficacy of learning obliquely is given by the parameter γ.
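Eq. 3 translates directly into code. The following sketch applies the oblique-learning flow between the T and t forms of the two open-ended pheno-genotype pairs (TSM with tSM, and TSm with tSm) and then the selection coefficients. The function name is hypothetical, and the result is returned unnormalized, exactly as Eq. 3 is written; any renormalization is assumed to happen elsewhere in the recursion.

```python
import numpy as np

def oblique_and_selection(x_v, x_p, gamma, sigma_T, sigma_t):
    """Apply Eq. 3 to post-vertical frequencies x_v given parental x_p.

    Order of entries: TSM, TSm, TsM, Tsm, tSM, tSm, tsM, tsm.
    """
    x_v = np.asarray(x_v, dtype=float)
    x_p = np.asarray(x_p, dtype=float)
    freq_T = x_p[:4].sum()   # parental frequency of large repertoires
    freq_t = x_p[4:].sum()   # parental frequency of small repertoires
    x_o = x_v.copy()
    # Only open-ended learners (S) can switch repertoire as adults.
    for T_idx, t_idx in ((0, 4), (1, 5)):
        # Net flow from the t form to the T form of the same genotype.
        flow = gamma * (x_v[t_idx] * freq_T - x_v[T_idx] * freq_t)
        x_o[T_idx] += flow
        x_o[t_idx] -= flow
    # Selection on repertoire size.
    x_o[:4] *= 1.0 + sigma_T
    x_o[4:] *= 1.0 + sigma_t
    return x_o
```

Because each unit of frequency leaving one form arrives in the other, the oblique step alone conserves total frequency; only the selection coefficients change the sum.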

Fig. 2

Schematic of the mathematical model with relevant parameters showing the life cycle of birds in the model (in the dashed box) and the stages of the life cycle at which learning events occur (shown in shaded boxes) for open-ended learners (S birds) and closed-ended learners (s birds) on the left and right, respectively. Closed-ended learners learn either a large (T) or small (t) repertoire only through vertical learning before sexual maturity. Open-ended learners undergo a second, oblique learning event before mate choice and reproduction but after sexual maturity

Coefficients \( \sigma_{T} \) and \( \sigma_{t} \) (\( 0 \le \sigma_{T}, \sigma_{t} \le 1 \)) determine the strength of selection (Table 2), with \( \sigma_{T} > 0 \) representing a fitness advantage for larger repertoires (T) and \( \sigma_{t} > 0 \) representing a fitness advantage for smaller, simpler repertoires (t). Studies of sexual selection based on repertoire size generally measure one of two proxies: (1) female preference for song repertoires of different sizes, often quantified as number of copulation solicitation displays or degree of approach to the source of song playback, and (2) the outcome of mate choice, which assumes that more desirable males will be preferentially chosen by females and is usually measured by pairing date, egg-laying date, or mating success (Byers and Kroodsma 2009). Following this tradition, there are two forms of sexual selection that can operate in our modeling framework: females can have a mating preference based on song repertoire (α), and males with certain song repertoires are favored by sexual selection (through, for example, an earlier pairing date or increased mating success) and thus have a fitness advantage (σ) as suggested in the literature (Price et al. 1988).

Table 2 Fitness effects

Using initial pheno-genotype frequencies near fixation of tsm (\( x_{1} = 0.011 \), \( x_{2} = 0.012 \), \( x_{3} = 0.013 \), \( x_{4} = 0.01 \), \( x_{5} = 0.015 \), \( x_{6} = 0.014 \), \( x_{7} = 0.016 \), \( x_{8} = 0.909 \)), we introduced large repertoires (T), open-ended learning (S), and homophily (M) at low frequency, and the system was numerically iterated for 100,000 timesteps or until the pheno-genotype frequencies (\( x_{i} \)) reached equilibrium, which almost always occurred in many fewer timesteps. Finally, we made the assumptions that there can be cultural mutation between T and t when \( b_{T \to T} \) and \( b_{t \to t} \) are not equal to 0 or 1 (Table 1) and that a short or simple repertoire (t) can be more easily learned by a juvenile bird than a long or complex repertoire (T): \( b_{t \to t} > b_{T \to T} \). In other words, a juvenile whose father or tutor has a smaller repertoire (t) will only rarely produce a larger repertoire (T), perhaps through copy error, improvisation, or eavesdropping (Beecher et al. 2007; Sober and Brainard 2012). However, a juvenile whose father/tutor has a large repertoire (T) might, with a larger probability (\( 1 - b_{T \to T} > 1 - b_{t \to t} \)), learn a smaller repertoire (t) than his tutor, perhaps because of the challenges of learning a large repertoire or because he was not exposed to his father’s entire repertoire (Botero et al. 2008). Since some closed-ended learners are known to learn song elements obliquely early in life (e.g. Baptista and Morton 1988; Liu and Nottebohm 2007), we note that the vertical transmission here denotes repertoire size learning, not necessarily transmission of particular syllables, and so could represent a form of imprinting where a juvenile vertically learns characteristics of a species-typical song.
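The iteration procedure described above can be sketched as a small driver loop. This is an illustration under assumptions: `step` stands for a one-generation update of the eight frequencies (mating, vertical transmission, oblique transmission, and selection), which is not defined here, and the convergence tolerance `tol` is a hypothetical choice, since the text does not state how equilibrium was detected.

```python
import numpy as np

# Initial frequencies from the text, ordered TSM, TSm, TsM, Tsm, tSM, tSm, tsM, tsm.
X0 = np.array([0.011, 0.012, 0.013, 0.010, 0.015, 0.014, 0.016, 0.909])

def iterate_to_equilibrium(step, x0, max_steps=100_000, tol=1e-12):
    """Apply `step` repeatedly until frequencies stop changing.

    Returns the final frequency vector and the number of steps taken.
    `tol` is an assumed threshold on the largest per-generation change.
    """
    x = np.asarray(x0, dtype=float)
    for n in range(1, max_steps + 1):
        x_next = np.asarray(step(x), dtype=float)
        if np.max(np.abs(x_next - x)) < tol:
            return x_next, n
        x = x_next
    return x, max_steps
```

Any function that maps a frequency vector to the next generation's frequency vector can be passed as `step`, so the same driver can be reused across parameter sweeps such as those over γ, ε, and the selection coefficients.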

Results

There is a clear relationship between the mode of learning employed by a bird species and the size of its reported syllable repertoire. In Fig. 1, the repertoire sizes from the review of published data described above are plotted on a log scale and grouped by learning program. We found a significant difference in repertoire size distributions between open- and closed-ended learners even when accounting for phylogenetic relationships (phylogenetic ANOVA, F = 52.97, \( p = 4 \times 10^{-5} \)). Thus there is an association between open-ended learning and large repertoire sizes, warranting further investigation into the effects of, for example, phylogenetic distance and life history traits.

In the model of cultural niche construction, when \( \sigma_{T} > 0 \) (Table 2) selection favored larger repertoires, and males of type T left more offspring than those with smaller repertoires, t. Here both closed-ended (s) and open-ended (S) learners had a chance to learn a large repertoire from their fathers when young, and open-ended learners (S) had a further chance to change their repertoires by copying others when older. In these numerical simulations, we introduced large repertoires (T) and open-ended learning (S) in a small fraction of the population and tracked their frequency across generations. Since cultural mutation is possible in this system, a small repertoire can be learned from a tutor with a large repertoire and vice versa; T and t thus coexist at equilibrium, and we display the frequency of T in the figures. When open-ended learning (S) approached fixation, large repertoires became dramatically more frequent (upper right of Fig. 3a). When a large repertoire had a strong fitness advantage and adult learning was very effective (\( \sigma_{T} \) and γ both large), all individuals in the population eventually had open-ended learning (S) and nearly all had a large repertoire (T). As γ increased, i.e. adult learning became more effective (x-axis), a smaller fitness advantage (\( \sigma_{T} \)) was sufficient for fixation of S and the associated sharp increase in T. When the cost of extending the learning window into adulthood was higher (ε larger), the fitness benefit of a large repertoire had to be greater and the effectiveness of learning higher in order for adult learning (S) to approach fixation (Fig. 3b).

Fig. 3

The effect of oblique, open-ended learning, γ (x-axis), and the fitness benefit (\( \sigma_{T} \)) of a large repertoire that is difficult to learn vertically or the fitness benefit (\( \sigma_{t} \)) of a small repertoire that is easier to learn vertically, on the frequency of T in the population. In panels a and b, \( \sigma_{t} = 0 \) and the y-axis is \( \sigma_{T} \); in panels c and d, \( \sigma_{T} = 0 \) and the y-axis is \( \sigma_{t} \). The color bar shows the frequency of T, from 0 (darkest) to 1 (lightest). Red lines enclose areas where S has fixed in the population; otherwise, s has fixed (closed-ended learning). In panel d, S did not fix in the population for any combinations of \( \sigma_{t} \) and γ. Panels a and c show cases where the cost of open-ended learning is low (ε = 0.05) and panels b and d show cases where it is higher (ε = 0.1). For each panel, other parameters are: α = 0, \( b_{T \to T} \) = 0.7 and \( b_{t \to t} \) = 0.9

When \( \sigma_{t} \) was positive and \( \sigma_{T} = 0 \), individuals with t had higher fitness than their T counterparts with larger complex repertoires. It is useful to imagine that larger repertoires here represent a collection of new or learned sounds picked up through mistakes in learning or in song recognition. In this case, a small set of specific song elements (t) was favored. When open-ended learning allowed a second chance to learn and was not too costly (\( \sigma_{t} > 0 \), ε = 0.05), S was driven to high frequency along with the favored song, t, when selection (\( \sigma_{t} \)) and oblique transmission (γ) were both strong (Fig. 3c). However, if continuing to learn was more expensive (ε = 0.1), S did not rise in frequency even when selection (\( \sigma_{t} \)) and oblique transmission (γ) were both strong, and a smaller repertoire (t) was found at high frequency along with closed-ended learning (s) for all values of \( \sigma_{t} \) and γ (Fig. 3d); this was also the case if \( b_{t \to t} = 1 \). These results suggest that if continuing to learn is very costly, we should expect to see that sexual selection for a trait that is relatively easy to learn as a juvenile should lead to the fixation of closed-ended learning (Fig. 3d). Conversely, if a trait is relatively difficult to learn initially but strongly favored, selection for the trait should facilitate the spread of open-ended learning (Fig. 3a–b).

Similarly, as the fitness benefit of a large repertoire (\( \sigma_{T} \)) increased (\( \sigma_{t} = 0 \)), open-ended learning was able to fix in the population at increased cost (ε) of this extended learning window (Fig. 4a). In effect, open-ended learning (S) was able to hitchhike on the fitness benefits of larger repertoires, even when costly, because it enabled the more efficient spread of the beneficial trait T. When small repertoires (t) were favored (\( \sigma_{t} \) is high, \( \sigma_{T} = 0 \)), open-ended learning (S) facilitated the spread of the favored t trait, but only when costs were low or the fitness advantage of t was large (Fig. 4b).

Fig. 4

The effect of the cost of oblique, open-ended learning, ε (x-axis), and the fitness benefit \( \sigma_{T} \) of a large repertoire that is difficult to learn vertically (panel a) or \( \sigma_{t} \) of a small repertoire that is easier to learn vertically (panel b), on the frequency of T in the population. In panel a, \( \sigma_{t} = 0 \) and the y-axis is \( \sigma_{T} \); in panel b, \( \sigma_{T} = 0 \) and the y-axis is \( \sigma_{t} \). Color bar and red lines as in Fig. 3. Parameters are α = 0, \( b_{T \to T} \) = 0.7, \( b_{t \to t} \) = 0.9, γ = 0.5 for each panel

Intuitively, when the vertical transmission of a large or complex repertoire was relatively inefficient (b TT was low), a large fitness advantage (σ T ) was necessary for T to spread, which in turn led to the fixation of open-ended or adult learning, S (top left of Fig. 5a). However, when the vertical transmission of these large repertoires was very accurate (b TT  > 0.9), the benefit of oblique song learning as an adult did not outweigh its cost, and large repertoires could spread in the population without open-ended learning (top right of Fig. 5a). A similar pattern was seen for the transmission (b tt ) and fitness (σ t ) of t, a relatively simple repertoire (Fig. 5b). In other words, when a certain repertoire size is favored by sexual selection, but juveniles do not necessarily learn it accurately, then open-ended learning provides a “second chance” to acquire the favored repertoire size. However, if juvenile learning is very accurate, these costly second chances are not necessary. On the other hand, if juvenile learning is inaccurate, the favored repertoire size might remain at low frequency in the population, in which case random oblique tutors are less likely to provide the correct repertoire size.
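The “second chance” reasoning can be illustrated with a toy calculation. The sketch below is not the recursion system of the model (which is given in the appendices); it assumes, purely for illustration, that a juvenile acquires the favored repertoire vertically with probability b, and that an open-ended learner who failed gets one further attempt that succeeds with probability γ multiplied by the frequency of the favored repertoire among randomly encountered adult tutors. The function names and the parameter p_favored are hypothetical.

```python
def acquisition_probability(b, gamma, p_favored, open_ended):
    """Chance that a male ends up with the favored repertoire (toy assumption).

    b          -- accuracy of vertical (juvenile) transmission
    gamma      -- effectiveness of oblique learning as an adult
    p_favored  -- frequency of the favored repertoire among adult tutors
    open_ended -- True if the male can still learn as an adult
    """
    if not open_ended:
        return b
    # Second chance: only males that failed as juveniles (1 - b) can gain,
    # and only if the random oblique tutor carries the favored repertoire.
    return b + (1.0 - b) * gamma * p_favored


def second_chance_gain(b, gamma, p_favored):
    """Extra acquisition probability that open-ended learning provides."""
    open_p = acquisition_probability(b, gamma, p_favored, open_ended=True)
    closed_p = acquisition_probability(b, gamma, p_favored, open_ended=False)
    return open_p - closed_p


if __name__ == "__main__":
    # Illustrative values only; gamma = 0.5 matches the figure captions,
    # the tutor frequencies are arbitrary.
    for b in (0.3, 0.7, 0.95):
        for p_favored in (0.1, 0.9):
            gain = second_chance_gain(b, gamma=0.5, p_favored=p_favored)
            print(f"b={b:.2f}  p_favored={p_favored:.1f}  gain={gain:.3f}")
```

Under these assumptions the gain from adult learning shrinks both when juvenile learning is already very accurate (b close to 1) and when the favored repertoire is rare among tutors (p_favored small). In the full model the tutor frequency is itself a dynamic outcome of transmission and selection, so this sketch shows the direction of the effects rather than the boundaries in Fig. 5.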

Fig. 5

The effect of vertical transmission (\( b_{T \to T} \)) and the fitness benefit of a large repertoire that is difficult to learn vertically (σ T ) on the frequency of T in the population (panel a), and the effect of the efficacy of vertical transmission (\( b_{t \to t} \)) and the fitness benefit of a relatively small repertoire that is easier to learn vertically (σ t ) on the frequency of T in the population (panel b). Color bar and red lines as in Fig. 3. Parameters are α = 0, γ = 0.5, and ε = 0.05 for each panel. σ t  = 0 in panel a and σ T  = 0 in panel b

In addition, when α > 0, M females preferentially mate with males that have the same relative repertoire size as the one they learned from their father (Appendix 2). This homophily, or assortative mating based on song, promotes the spread of open-ended learning (S) in a number of ways. First, homophily lowers the fitness advantage to T (σ T ) necessary for S to spread (Fig. 6a). Second, homophily lowers the threshold for the rate of oblique learning (γ) necessary for open-ended learning to reach high frequency (Fig. 6b); as the degree of homophily (α) increases, open-ended learning can be less effective (smaller γ) and still approach fixation. Third, homophily allows open-ended learning to evolve in cases where the cost of adult learning (ε) would otherwise be prohibitively high (Fig. 6c). Finally, homophily expands the range of transmission rates of T for which open-ended learning can spread (Fig. 6d). In other words, homophily allows T to spread more effectively even when it is weakly vertically transmitted, in turn favoring the evolution of open-ended learning.

Fig. 6

The effect of homophily, α, and other parameters on the frequency of T in the population. Color bar and red lines as in Fig. 3. The frequency of M in the population increases in the direction of the black arrows. Parameters are: a \( b_{T \to T} \) = 0.7, \( b_{t \to t} \) = 0.9, γ = 0.5, ε = 0.05, and σ t  = 0, b \( b_{T \to T} \) = 0.7, \( b_{t \to t} \) = 0.9, σ t  = 0, σ T  = 0.6, and ε = 0.05, c \( b_{T \to T} \) = 0.7, \( b_{t \to t} \) = 0.9, σ t  = 0, σ T  = 0.6, and γ = 0.5, d \( b_{t \to t} \) = 0.9, σ t  = 0, σ T  = 0.6, ε = 0.05, and γ = 0.5

Since the cultural trait in question, T, reflects repertoire size in relative rather than absolute terms, we assumed fairly high accuracy of transmission in the simulations used to generate the previous figures. However, in a study of great tits (Parus major), the narrow-sense heritability of absolute repertoire size was found to be quite low (~0.36, McGregor et al. 1981). With this in mind, we tested the predictions of our model with a low transmission accuracy of repertoire size, retaining the constraint that as a juvenile it is easier to learn a small repertoire than a large one (b tt  > b TT ). The results of these simulations were qualitatively very similar to those with high transmission rates. As selection favoring large repertoires (σ T ) increases (y-axes, Fig. 7a–c), open-ended learning (S) can approach fixation with a reduced rate of adult learning (γ, Fig. 7a) and with increased costs to adult learning (ε, Fig. 7b), and there is an associated increase in the frequency of T. When the fitness advantage of a large repertoire is strong (σ T  > 0.6), S approaches fixation and T increases in frequency for low and intermediate values of b TT , the transmission of T; however, when the transmission of large repertoires is very accurate (b TT near 1), the benefits of adult learning (S) do not outweigh their costs, and closed-ended learning (s) remains fixed in the population (Fig. 7c). A similar pattern was seen when comparing the transmission of t (b tt ) to the fitness advantage of a small repertoire (σ t , Fig. 7d). An overview of the predictions of the model at relatively high and low values of each parameter is given in Fig. 8.

Fig. 7

The effect of selection on repertoire size when the transmission of repertoire size is low (\( b_{T \to T} \), \( b_{t \to t} \) < 0.5). There is a tradeoff between the fitness advantage of a large repertoire that is difficult to learn vertically (σ T ) and the efficacy of adult learning (γ, panel a), the cost of open-ended learning (ε, panel b), and the rate of transmission of a large repertoire (\( b_{T \to T} \), panel c). Similarly, in panel d, there is a tradeoff between the fitness advantage of a small repertoire (σ t ) and the rate of transmission of a small repertoire (\( b_{t \to t} \)). In panels a–c, σ t  = 0 and the y-axis is σ T ; in panel d, σ T  = 0 and the y-axis is σ t . Color bar and red lines as in Fig. 3. Unless varied, parameters are α = 0, \( b_{T \to T} \) = 0.3, \( b_{t \to t} \) = 0.48, γ = 0.5, ε = 0.05 for each panel

Fig. 8

Overview of the predictions of the model at different rates of transmission of large repertoires: low (left column; \( b_{T \to T} \) = 0.25, \( b_{t \to t} \) = 0.4) and high (right column; \( b_{T \to T} \) = 0.75, \( b_{t \to t} \) = 0.4). We summarize the predictions of the model by varying several parameters: the selective advantage of large (top row; σ T  = 0.8, σ t  = 0) and small (bottom row; σ T  = 0, σ t  = 0.8) repertoires, the cost of adult learning (low: ε = 0.01; high: ε = 0.1), and the effectiveness of adult learning (low: γ = 0.09; high: γ = 0.9). Homophily was not included. When σ T  ≈ σ t (middle row), the transmission parameters for T and t predict the relative frequency of T and t in the population at equilibrium, and adult learning (S) is not present. When selection for a certain repertoire size is strong (σ T or σ t high, top and bottom rows), adult oblique learning (S) approaches fixation in the population unless its cost is high and its effectiveness is low. A high effectiveness (γ) of adult learning leads to a higher frequency of the favored T phenotype. Note that, as shown in Fig. 5, if the transmission of the favored phenotype is higher than the b values presented here, S may be lost from the population when oblique learning is not necessary for the favored T phenotype to spread quickly. The value of the transmission parameter (\( b_{T \to T} \) or \( b_{t \to t} \)) at this boundary between fixation of S and fixation of s decreases as the cost of the adult oblique learning step (ε) increases

Discussion

Cultural niche construction occurs when one or more cultural traits can alter the evolutionary pressures on other cultural or genetic traits; models of cultural niche construction can incorporate, for example, fitness effects, mating preferences, cultural transmission differences, and age-structured learning (Odling-Smee et al. 2003; Ihara and Feldman 2004; Creanza et al. 2012, 2013; Fogarty et al. 2013). The literature on niche construction in birds has focused primarily on foraging and nest-building behaviors (Tebbich et al. 2001; Odling-Smee et al. 2003; Jones et al. 1996; Harrison and Whitehouse 2011). Previous models have explored various aspects of avian song learning and its evolution: the origin (Aoki 1989) and maintenance (Lachlan and Slater 1999) of vocal learning itself, the preservation of dialects (Planqué et al. 2014), the restrictiveness of “innate learning preferences” (sensu Marler 1990; modeled in Lachlan and Feldman 2003), and the effect of the song learning program on song divergence and male dispersal (Ellers and Slabbekoorn 2003). Other theoretical work has investigated the influence of song learning on evolutionary processes, including the link between learned song and male quality (Ritchie et al. 2008; Lachlan and Nowicki 2012), as well as the effects of song learning on the evolution of brood parasitism (Beltman et al. 2003), on speciation after colonizing a new niche (Beltman et al. 2004), and on population divergence in allopatric and sympatric contexts (Lachlan and Servedio 2004; Olofsson and Servedio 2008; Olofsson et al. 2011; Rowell and Servedio 2012). Here, we have proposed a model for the interactions between three traits that may be important in songbird evolution: the culturally transmitted trait of song repertoire size, the capacity for song learning in adulthood, and mating preference for a song heard early in life. These traits can indeed influence one another in this theoretical evolutionary framework, sometimes in complex ways.

First, we explored the association between sexual selection for a large repertoire size and the spread of open-ended learning. Maintaining the neural circuitry for song learning into adulthood can allow a bird to increase its repertoire size but is hypothesized to be costly; we observed a tradeoff in the effect of this cost. The rate of adult song learning (γ), which can help a bird acquire an attractive repertoire, had a nonlinear relationship with the fitness advantage (σ T ) necessary for open-ended learning to spread in a population. In other words, if adult song learning is very effective (large γ) at increasing repertoire size, the sexual selection pressure on large repertoires (σ T ) does not need to be as strong for the benefits of open-ended learning to outweigh its costs (Fig. 3).
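The nonlinear relationship between γ and the fitness advantage needed for open-ended learning to pay can be made concrete with a back-of-the-envelope threshold. This is a sketch under simplifying assumptions that are not taken from the model's recursions: the frequency of the large repertoire among adult tutors (p_favored) is held fixed, a closed-ended learner has expected relative fitness 1 + σ·b, and an open-ended learner has (1 − ε)(1 + σ·P), where P = b + (1 − b)·γ·p_favored. Setting the second expression greater than the first gives σ > ε / ((1 − ε)·P − b) whenever the denominator is positive. All names below are hypothetical.

```python
import math


def min_selection_for_open_ended(gamma, epsilon, b, p_favored):
    """Smallest sigma at which open-ended learning pays in the toy comparison.

    Assumed (illustrative) expected fitnesses:
      closed-ended: 1 + sigma * b
      open-ended:   (1 - epsilon) * (1 + sigma * p_open)
    where p_open = b + (1 - b) * gamma * p_favored.
    Returns math.inf when no positive sigma can offset the cost.
    """
    p_open = b + (1.0 - b) * gamma * p_favored
    margin = (1.0 - epsilon) * p_open - b
    if margin <= 0.0:
        return math.inf
    return epsilon / margin


if __name__ == "__main__":
    # b = 0.7 and epsilon = 0.05 follow the figure captions; p_favored = 0.5
    # is an arbitrary fixed tutor frequency chosen for illustration.
    for gamma in (0.1, 0.3, 0.5, 0.7, 0.9):
        threshold = min_selection_for_open_ended(
            gamma, epsilon=0.05, b=0.7, p_favored=0.5
        )
        print(f"gamma={gamma:.1f}  minimum sigma={threshold:.3f}")
```

Because γ enters the denominator, the required selection strength falls steeply as adult learning becomes more effective and becomes unbounded once adult learning is too ineffective to offset its cost. The simulations track tutor frequencies dynamically, so the numerical thresholds here should not be read as those of Fig. 3.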

The fidelity of transmission of a large repertoire to juveniles (b TT ) and the sexual selection for a large repertoire (σ T ) also interacted in a nonlinear way to shift the balance of costs and benefits of open-ended learning (Fig. 5). As juveniles learn more effectively (increased b TT ), the benefits of open-ended learning outweigh the costs under weaker selection for a large repertoire (decreased σ T ) and open-ended learning (S) approaches fixation in the population; however, if birds can very easily learn a large repertoire as juveniles, open-ended learning ceases to have an advantage. The benefit of being able to learn as an adult seems most likely to outweigh its cost when a large repertoire is favored by sexual selection but cannot be easily learned as a juvenile. Juveniles have been tutored by an adult tutor in captivity in only a few species of open-ended learners; European starlings (S. vulgaris) learned 62–76 % of their tutor’s repertoire with twelve weeks of exposure, and canaries (S. canarius) raised with a tutor from the juvenile stage until sexual maturity learned 76–91 % of their tutor’s repertoire. In contrast, in the closed-ended learners studied (e.g. T. guttata, M. georgiana), juveniles are assumed to be capable of imitating the complete syllable repertoire of the tutor, and learning accuracy can be measured by comparing the fine-scale properties of sound spectrograms from full pupil and tutor songs (Tchernichovski et al. 2001; Nowicki et al. 2002b). For both open- and closed-ended learners studied, tutoring with few syllables led the pupil to have a very limited repertoire, but birds tutored with abnormally large repertoires only learned species-typical repertoire sizes (Kroodsma and Canady 1985; Clayton 1989; Airey et al. 2000; Tchernichovski et al. 2001). The exception to this pattern is open-ended learners that improvise most of their syllable repertoire (Kroodsma et al. 1997; Leitner et al. 2002); these species are capable of producing a species-typical repertoire in isolation and are not within the scope of the current model.

Homophily, incorporated here as a female’s preference (α) for males that sing a song of the type she remembers hearing as a juvenile, can also shift the balance between the cost (ε) of adult learning and the benefit (σ T ) of learning a selectively advantageous song, thus enhancing the spread of open-ended learning with a smaller fitness advantage of large repertoire size, reduced effectiveness of adult learning, increased open-ended learning cost, and a larger range of juvenile learning rates (Fig. 6). In these simulations (Figs. 3, 4, 5, 6, 7 and 8), the populations displayed variation in repertoire size (both T and t present) but approached fixation for either S, open-ended learning, or s, closed-ended learning (except in certain cases with very high homophily); this is consistent with field and laboratory studies of songbird species.

In at least some species, song is a reliable indicator of male fitness, and different aspects of song are thought to be under sexual selection pressure in different species (Macdougall-Shackleton 1997; Nowicki et al. 1998b). Examining the broad phylogenetic distribution of both large and small repertoires, Macdougall-Shackleton noted that “selection has probably acted both to increase and to decrease repertoire size to different degrees in different lineages” (1997). Likewise, while less well studied, open- and closed-ended learning may appear in closely related branches of the oscine songbird lineage (Appendix 1).

Evidence that repertoire size is under selection is somewhat inconsistent. Under laboratory conditions, females offered the choice between two songs will often pick the more complex one, but this preference does not necessarily hold in a natural setting (Searcy and Marler 1984; Searcy 1992). Evidence from field tests of song repertoire size preference in female birds has not pointed to a consistent effect of repertoire size (Searcy 1992; Byers and Kroodsma 2009; Soma and Garamszegi 2011). However, one source of these apparent conflicts may be that the learning program of the species studied was not taken into account. For example, species surveyed by Searcy (1992) that showed earlier pairing dates for males with larger repertoires were open-ended learners (Howard 1974; Catchpole 1980), and species that showed no relationship between pairing date and repertoire size were closed-ended learners (Krebs et al. 1978; Searcy 1984; but see Reid et al. 2004). Thus, sexual selection based on repertoire size might interact with learning programs and learning modes in natural populations.

With this in mind, it may be necessary to take account of these learning modes and the length of the sensitive period for learning in studies of mate choice, honest signaling, and species recognition. Our model hints at an evolutionary interaction that can be tested in future field studies and evolutionary analyses. In addition, this model could be extended to account for more learning scenarios. For example, in the case of avian species that are capable of prolific mimicry, the oblique learning step of a juvenile or adult would not be constrained to members of its own species. This learning step would therefore depend not only on the repertoires of conspecifics but also on the diversity of ambient sounds. Further, our current framework of open- and closed-ended learning does not discriminate between species that can increase their repertoire size throughout life and species that are capable of adult learning only until their second year. In future work, our model could be modified to include mimicry of interspecific sounds and additional learning steps.

The model presented here is, of course, a simplified representation of a very complex process, but it demonstrates that culturally transmitted song can be a niche-constructing trait, influencing the spread of other traits that are likely to have genetic underpinnings, such as those that affect neural development and mating preferences. The niche-constructed selection pressures on a bird’s song can shift the balance between the costs of maintaining the neural architecture for adult song learning and the benefits of increasing repertoire size with age and experience.