Zebra finch behaviour differs consistently between individuals but is not affected by early life adversity

Individual variation in animal personality is ubiquitous, but little is known of its proximate causes. Theory predicts individuals will be more risk prone when their life expectancy is short, and adverse environmental conditions during development shorten life span. We modified developmental conditions of zebra finches, Taeniopygia castanotis, by manipulating parental foraging conditions to be either 'harsh' or 'benign' and studied offspring behaviour in adulthood using multiple tests: exploration, novel object, tonic immobility, dominance and sociality. Each test was done in duplicate, and repeatabilities of test scores ranged from 0.31 to 0.55 independent of treatment with one exception: tonic immobility was repeatable in individuals reared in harsh conditions only. Correlations between behaviours (i.e. behavioural syndromes) were generally weak and nonsignificant. Growing up in harsh conditions and in larger broods both negatively affected offspring growth and thereby presumably their life expectancy, but neither affected behaviour in standardized tests. This raised the question whether the origin of variation in personality was perhaps largely genetic, but heritability estimates were low to moderate. We conclude that variation in personality in our study can be attributed to environmental effects, but independent of early life adversity as manipulated here. To explain this finding, which runs counter to our expectation, we present two hypotheses. First, the adolescent social environment impacted the ontogeny of personality, overshadowing developmental effects. Second, the risk associated with a behaviour may be state dependent, with identical behaviours encompassing a greater risk in low-quality than in high-quality individuals. Thus, while individuals with different developmental backgrounds displayed similar behaviour, the perceived risk this entailed may have been very different and in accordance with theoretical predictions.

Consistent interindividual differences in behaviour, or personality (Dingemanse & Wright, 2020; Réale et al., 2007), have been well studied in recent years. In particular, ecological and evolutionary consequences of personality variation have been elucidated (e.g. Dingemanse et al., 2004), revealing that personality variation can be maintained by natural and sexual selection (Carere & Eens, 2005; Dingemanse & Réale, 2005; Schuett et al., 2010; Wolf & Weissing, 2010). Furthermore, the genetic causes of personality variation have been well studied (Edwards et al., 2015; Korsten et al., 2010; Lesch et al., 1996; Mueller et al., 2013; Thys et al., 2021; Verhulst et al., 2016), and on average 13.7% of personality variation is attributable to additive genetic effects (Dochtermann et al., 2015). Thus, the largest part of the proximate causes of personality variation remains unexplained. Yet the study of the proximate causes of variation in animal personality can provide insights into the evolutionary and functional constraints, and further our understanding of how selection acts on (the plasticity of) personality (Stamps & Groothuis, 2010).
On a functional level, life history theory predicts that individuals with higher fitness expectations, that is, a higher residual reproductive value (RRV), should be more risk averse (Wolf et al., 2007). The relationship between the benefits and costs of taking risks shifts towards taking less risk when the RRV is high because these individuals have more to lose (i.e. 'asset protection', Clark, 1994;Roff, 2002). A core assumption of this theory is that individuals with different life history strategies end up with similar fitness, but individuals with different personalities will not have similar fitness when the expression of personality is state or condition dependent. Such associations could, for example, arise when the risk associated with specific behaviours is state dependent, such as a better capability to escape predators (Stamps, 2007), where individuals in a better state face reduced risk. In this scenario, state-dependent behaviour would negate the functional prediction that individuals with higher fitness expectations behave as if more risk averse, and alternative predictions are not obvious.
Testing the relationship between fitness expectations and behaviour is preferably done in wild populations, where resources such as energy are limited, creating an incentive to take risks. However, results from such studies are mixed. For example, lower RRV was associated with more risk-prone behaviour in superb fairy wrens, Malurus cyaneus (Hall et al., 2015), in line with the functional framework. However, RRV was not associated with risk-taking behaviour in field crickets, Gryllus campestris (Fisher et al., 2018) or in yellow-bellied marmots, Marmota flaviventris (Petelle et al., 2019). Moreover, recent meta-analyses found 'no directional relationship between riskier behaviour and greater mortality' (Moiron et al., 2020, p. 399), and generally weak relationships between behaviour and 'pace-of-life' (Royauté et al., 2018). A potential explanation for the weak support is the correlational nature of most studies, making it difficult to separate genetic and environmental (e.g. developmental) effects on behaviour. This can potentially be resolved with an experimental approach, for example manipulating factors that modulate state and associated fitness prospects, followed by the assessment of individual variation in risk-taking behaviour.
In this study, we manipulated developmental conditions of captive zebra finches, Taeniopygia castanotis, and thereby their fitness expectations, and studied their behaviour in adulthood, to test for effects of life expectancy on risk-prone behaviour. Developmental conditions were manipulated by breeding zebra finches in aviaries where the parents experienced either benign or harsh foraging conditions (Jimeno et al., 2019; Koetsier & Verhulst, 2011). We previously showed that harsh foraging conditions during development negatively affected offspring growth (Gerritsma et al., 2022), indicating reduced life span expectations (Briga & Verhulst, 2021; De Kogel, 1997). We measured behaviour in multiple behavioural tests: (1) exploration in a novel environment (Fidler et al., 2007; Verbeek et al., 1994; Wuerz & Krüger, 2015), (2) boldness in a novel object setting (King et al., 2015; Spencer & Verhulst, 2007), (3) response in a tonic immobility test (Gallup, 1977, 1979; Peixoto et al., 2020), (4) dominance in a competition over a nonfood resource (Spencer & Verhulst, 2007) and (5) sociality in a preference setting (Kelly et al., 2011; Kelly & Goodson, 2015). All tests were first optimized to yield high repeatabilities, which is essential to characterize individual variation in behaviour. In the context of our study, risk-prone behaviour was assumed to be expressed as faster exploration of a novel environment, faster investigation of a novel object (boldness), less time spent immobile in the tonic immobility test and more dominant behaviour. For the sociality test we did not see a clear prediction with respect to the risk involved in being more or less social.
We further investigated whether behavioural syndromes, that is, correlations between different personality traits, emerged depending on developmental conditions (Bell & Sih, 2007;Dhellemmes et al., 2020;Dingemanse et al., 2007) or sex (Fresneau et al., 2014). This is of interest in itself but of particular relevance with respect to the prediction that individuals with higher fitness expectations should be more risk averse, because such a prediction is only tenable when individuals consistently differ in the risk they take, regardless of the nature of the behavioural test.

Ethical Note
All methods and experiments in this research were performed with approval of the Central Committee for Animal Experiments ('Centrale Commissie Dierproeven') of the Netherlands, under licence AVD1050020174344. Minor stress from transporting individuals from and to test locations was unavoidable; however, test room and indoor aviaries were located close to each other which limited transport times. No injuries or mortality were observed during the behavioural tests. Animals were kept alive for further experiments after this study.

Animals and Developmental Treatment
We used 93 zebra finches (57 females, 36 males) raised in 58 different broods. The experimental setting in which the birds were reared is described in Gerritsma et al. (2022, cohorts 2018 and 2019). In brief, breeding took place in four outside aviaries (320 × 150 cm and 210 cm high) with up to 24 individuals each (sex ratio approximately 1:1). Two aviaries had low foraging costs and two had high foraging costs, as described in Koetsier and Verhulst (2011). Briefly, food, a commercial seed mixture, was offered in a single container (120 × 10 cm and 60 cm high), which was suspended from the ceiling. Food containers had five holes on each side (i.e. 10 in total), where the birds could access the seeds. In the benign foraging environment, there were perches beneath the holes, allowing the birds to perch while eating (low costs). In the harsh foraging environment, these perches were absent, forcing the individuals to fly back and forth between a distant perch and the food box for each seed.
The aviaries were equipped with nestboxes and ad libitum nest material (hay) from mid-April until mid-November each year. We measured offspring mass and biometry at the ages of 15 (±1) and 31 (±1) days, and growing up in harsh foraging conditions substantially reduced offspring growth (Gerritsma et al., 2022). Based on our earlier work, we assume that individuals growing up in harsh conditions had reduced fitness expectations and refer to them as low-quality individuals. Offspring did not learn to eat from the foraging box in either benign or harsh foraging conditions. Consequently, they were nutritionally dependent on their parents until removal from the aviary to an identical aviary with food offered in containers on the floor, from which they would readily eat. At the age of 65 ± 4 days, the offspring were moved to single-sex groups. Between removal from the natal aviary and age 100 days, offspring were housed with unrelated adults, two males and two females, for sexual imprinting and song learning. After reaching the age of 100 days, individuals were moved to indoor aviaries (295 × 150 cm and 230 cm high, 14:10 h light:dark, lights on 0700–2100, ±20 °C, relative humidity ca. 50%, food (a seed mixture) and water ad libitum) with up to 24 individuals each. Individuals were colour ringed (size SP2, A.C. Hughes, Redruth, U.K.) for identification purposes and behavioural testing commenced afterwards (age at first test: mean ± SD: 296 ± 92 days).

Behavioural Tests
Behavioural tests were carried out in a fixed order to standardize potential sequence effects (Bell, 2013). The order of testing was: (1) novel environment, (2) novel object, (3) tonic immobility, (4) sociality and (5) dominance. Each individual was subjected to each behavioural test twice before the next test in the sequence. All tests were recorded using a video camera (Polaroid Cube+ for the novel environment, JVC Everio R for all others). The novel environment, tonic immobility and dominance videos were analysed using Solomon coder (v19.08.02, https://solomon.andraspeter.com/). We tracked an individual's position in the novel object and sociality videos using EthoVision XT 11.5 (Noldus Information Technology, Wageningen, the Netherlands). Researchers were blind to the developmental manipulation of the individual when analysing the video recordings.
We performed the novel object, sociality and dominance tests in the same specially designed cage (104 × 52 cm and 52 cm high), with six perches, equally distributed over the length of the cage, which was permanently placed in a test room, in which we also performed the tonic immobility test. Prior to each behavioural test, birds were moved from the indoor aviaries to this designated test room. Hence, only the novel environment test was performed in a different room. To habituate the birds to the test cage and room, we moved each individual with three same-sex conspecifics to the testing cage for 1 h per day, for 4 consecutive days, before the first behavioural test. On the fifth day, each individual was individually habituated to the behavioural testing cage for 1 h, with sounds from an unfamiliar zebra finch colony playing in the background.

Novel environment
The novel environment was a room (200 × 150 cm and 200 cm high) containing three artificial trees with four branches each (the same trees as used in Verbeek et al., 1994). The position and the colour (either wood or white) of the trees were changed between the first and second test to keep the environment novel. Individuals were transported in a small opaque wooden box (15 × 15 cm and 20 cm high) to the test room, where the box was put on a pedestal (1 m high) against a screen that was one wall of the test room. Immediately after leaving the novel environment, the experimenter opened the box with a pulley system from behind a screen. The individual was then recorded for 60 min. We measured the latency in leaving the box (log10 transformed), the number of unique locations explored (up to 12 'branches') and the number of movements made (log10 transformed). We combined these three variables into a single variable using principal component analysis (PCA; see Statistics for details) and used the first principal component (PC) as the dependent variable. The first PC explained 72.7% of the variation. Individuals with a higher score on this PC, that is, individuals that left the box early, explored more unique locations and made more movements were considered to be fast explorers (David et al., 2011; Dingemanse et al., 2003).
In 22 tests (N = 18 birds), the individual remained in the box the full 60 min. After observing this a few times, we wondered whether these birds were not exploratory or did not leave the box for a different reason. Thus, if an individual remained in the box for 60 min, the experimenter slightly moved the box through the screen, which stimulated the individual to exit the box after which it was recorded for another 10 min. From the 10 tests (N = 9 birds) in which this occurred, the individual visited only one perch and remained immobile there for 10 min in seven tests. In the other three tests, the individual visited three, seven or nine unique locations. The additional 10 min were excluded from the analysis, that is, if a bird remained in the box for the full 60 min it explored 0 unique locations and made 0 movements.

Novel object
Each test consisted of two runs which were performed on consecutive days, and there were at least 3 days between the second run of the first test and the first run of the second test. For each run, individuals were moved to the behavioural testing cage, and given a 30 min habituation period. The experimenter then returned and divided the behavioural testing cage into two compartments with an opaque divider, leaving the bird in one half of the cage. The experimenter then put their hand in the empty cage-half, with or without placing a novel object there, after which the divider was removed. The novel object was placed on the second perch, counting from the far end from the perspective of the bird. The run without a novel object served as a within-individual control. Half of the birds received the novel object in the first run and the other half in the second run and the order was reversed for each individual in the second test. Two different novel objects (a green clothes peg and a black hair clamp) were used, balanced across the first and second test. We recorded the latency to first move and the latency to visit the perch with (novel object treatment) or without (control treatment) the novel object for up to 30 min after removal of the partition. We combined these variables using PCA, on the (1) latency to inspect the novel object (log10 transformed) and (2) the Δ latency to move between phase 2 (disturbance) and phase 3 (disturbance + novel object) (log10 transformed). The first PC explained 62.9% of the variation (for loadings, eigenvector values and directions of both PCAs, see Appendix Figs A1 and A2 and Table A1). Individuals with a higher score on this PC, i.e. those that took longer until their first move when a novel object was present and that took more time before perching next to the novel object, were considered to be shy (Wuerz & Krüger, 2015).

Tonic immobility
We measured tonic immobility in a quiet room with minimum disturbance, and the two tests were performed on consecutive days. For each test, an individual was fixed on its back, with wings pressed to its body, on a felt-lined aluminium cradle. Fixation took place by holding the index and middle finger gently on its chest for 5 s (mean ± SD = 5.5 ± 1.1 s) and then slowly removing them. Immobilization was deemed successful if the bird did not move for 5 s, and we measured the time spent immobile with a maximum score of 600 s. If immobilization was unsuccessful, the bird was fixed again for up to 10 times in total, using only the time spent immobile after successful immobilization. If immobilization was not successful 10 times, we assigned an immobility time of 1 s. The time spent immobile was used as the dependent variable (log10 transformed to normalize the distribution), and individuals that spent more time immobile were considered more anxious (Brust et al., 2013; Gallup, 1977; Wuerz & Krüger, 2015).
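The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and its input (a list with the time the bird stayed immobile after each successive fixation attempt) are hypothetical, and the original analysis was not necessarily scripted this way.

```python
import math

MAX_ATTEMPTS = 10       # fixation attempts allowed per test
SUCCESS_SECONDS = 5.0   # immobile for at least this long = successful immobilization
MAX_SCORE = 600.0       # ceiling on the time spent immobile (s)
FAILED_SCORE = 1.0      # score assigned when no attempt succeeds (s)


def tonic_immobility_score(immobile_seconds_per_attempt):
    """Return (seconds, log10_seconds) for one tonic immobility test.

    `immobile_seconds_per_attempt` is a hypothetical input format: the time
    the bird stayed immobile after each successive fixation attempt.
    """
    # Only the first MAX_ATTEMPTS attempts count; the first successful one is scored.
    for seconds in immobile_seconds_per_attempt[:MAX_ATTEMPTS]:
        if seconds >= SUCCESS_SECONDS:
            capped = min(float(seconds), MAX_SCORE)
            return capped, math.log10(capped)
    # No successful immobilization within the allowed attempts.
    return FAILED_SCORE, math.log10(FAILED_SCORE)
```

Assigning 1 s to failed tests keeps the log10-transformed score defined, because log10(1) = 0.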

Sociality
For the sociality tests, side cages (52 × 52 cm and 52 cm high) were fitted to the sides of the behavioural testing cage, separated by metal bars, limiting physical contact, but otherwise allowing all communication. In each side cage, there was one perch, ca. 4 cm from the outer perches of the behavioural testing cage. Prior to the introduction of the focal individual, the experimenter placed 10 individuals in one side cage and two individuals in the other side cage, with sides balanced between first and second tests and individuals. These individuals had never previously interacted with the focal individual. For 10 min, we recorded the time spent on each of the six perches in the behavioural testing cage, where we took the time spent on the outer perches as time spent interacting with the conspecifics.
Group preference was taken to be the proportion of the 10 min spent on the outer perches, and group size preference was the time spent next to the group of 10 individuals divided by the total time spent interacting with either group. Group preference and group size preference were used as the dependent variables, where individuals that spent more time next to the groups, and those that preferred the large group, were considered more social. Both group preference and group size preference were square-root transformed to obtain a normal distribution.
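As a worked illustration of the two sociality measures, the sketch below computes both proportions and their square-root transforms from the time spent on the two outer perches. The function name and arguments are hypothetical, and returning None for group size preference when the bird never visited an outer perch is an assumption, since the text does not say how such cases were handled.

```python
import math

TEST_SECONDS = 600.0  # 10 min observation period


def sociality_scores(seconds_near_large_group, seconds_near_small_group):
    """Return square-root transformed (group preference, group size preference).

    Arguments are the seconds spent on the outer perch next to the group of
    10 and next to the group of two, respectively (hypothetical input format).
    """
    seconds_interacting = seconds_near_large_group + seconds_near_small_group
    # Proportion of the test spent next to either group.
    group_preference = seconds_interacting / TEST_SECONDS
    if seconds_interacting == 0:
        # Assumption: group size preference is undefined without any interaction.
        return math.sqrt(group_preference), None
    # Share of the interaction time spent next to the large group.
    group_size_preference = seconds_near_large_group / seconds_interacting
    return math.sqrt(group_preference), math.sqrt(group_size_preference)
```

For example, a bird that spent 240 s next to the large group and 60 s next to the small group has a group preference of 0.5 and a group size preference of 0.8 before transformation.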

Dominance
Social dominance score was the outcome of tournaments among single-sex groups of five or six individuals, and competition was over the single perch in the testing cage on which there was room for only one bird (Spencer & Verhulst, 2007). Each individual competed against every other individual in their group in dyadic interactions, with a maximum of one match per day for each individual. Thus, there were 10–15 dyadic matches per tournament, and each tournament was held twice, yielding two dominance scores per individual. Assignment to groups was such that the groups did not differ in size (based on their biometry at day 100, using the mean and SD of mass, tarsus and head + bill length) and birds from the benign and harsh treatments were equally represented in the groups. Unfortunately, we were not able to assign groups in such a way that they also did not differ in the mean and SD of age.
The behavioural testing cage was split into two compartments with an opaque divider, each with a low perch. Two birds were then moved to the behavioural cage, one in each compartment, and 30 min later the low perches and the divider were removed. Removal of the divider triggered a single, higher, perch to drop into place in the middle of the cage. We measured the time each individual spent on the perch for 30 min, assuming that the perch is the preferred position. To test this assumption, we included the single dominance perch in the solo habituation that was performed prior to the behavioural tests (ca. 1 h) in the test cage of the last 47 individuals. Of these 47 individuals, 41 perched, with 34 of them perching within 1 min. For the 41 individuals that perched, a mean ± SD of 3157 ± 916 s were spent on the perch. For the second run of each dyadic interaction, the starting positions were switched, with half of the individuals starting in the left half of the behavioural testing cage. We used the time spent on the perch to calculate the David's score (Gammell et al., 2003;Hemelrijk et al., 2005). The David's score was used as the dependent variable, and individuals that received a higher David's score were considered more dominant. Individuals that never sat on the perch, with their opponents also not perching, had a David's score of 0, and were not included in the analysis.
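For readers unfamiliar with David's score, the following sketch shows the standard calculation (DS = w + w2 − l − l2) applied to perch times. How perch time was converted into dyadic proportions is not specified in the text, so taking each bird's share of the total perch time within a dyad is an assumption, as are the function name and the input format.

```python
def davids_scores(perch_seconds, birds):
    """Compute David's score for each bird in one tournament.

    `perch_seconds[(i, j)]` is the time (s) bird i spent on the perch when
    matched against bird j (hypothetical input format).
    """
    # Dyadic proportion P[i][j]: share of the dyad's total perch time held by i.
    # Assumption: dyads in which neither bird perched contribute 0 to both birds.
    P = {i: {j: 0.0 for j in birds} for i in birds}
    for i in birds:
        for j in birds:
            if i == j:
                continue
            time_i = perch_seconds.get((i, j), 0.0)
            time_j = perch_seconds.get((j, i), 0.0)
            if time_i + time_j > 0:
                P[i][j] = time_i / (time_i + time_j)

    # w: summed proportions won; l: summed proportions lost.
    w = {i: sum(P[i][j] for j in birds) for i in birds}
    l = {i: sum(P[j][i] for j in birds) for i in birds}
    # w2 and l2 weight each dyad by the opponent's own w or l.
    w2 = {i: sum(P[i][j] * w[j] for j in birds) for i in birds}
    l2 = {i: sum(P[j][i] * l[j] for j in birds) for i in birds}
    return {i: w[i] + w2[i] - l[i] - l2[i] for i in birds}
```

When every dyad in a tournament has a nonzero total perch time, the scores sum to zero across the group, so a positive score indicates an above-average competitor.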

Repeatability
Repeatability was calculated as the between-individual variance divided by the sum of the between- and within-individual variance. Repeatability estimates were obtained using the 'rpt' function from the 'rptR (v0.9.22)' package (Nakagawa & Schielzeth, 2010), with variance components extracted from mixed-effects models. Here, we included individual identity as a random effect, and the test number (1 or 2) as a fixed effect. We used a Gaussian distribution for all (transformed) metrics. Confidence intervals and significance were calculated by parametric bootstrapping, with 500 iterations and 0 permutations. Additional random effects were added per behavioural test: for the novel object, we included the phase the individual was subjected to first (disturbance or novel object). For the dominance and sociality tests we included a group identifier, for the group of individuals the focal bird was competing against or the group of stimulus individuals present. For tonic immobility, we included an identity for the experimenter performing the immobilization.
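The repeatability ratio can be illustrated with a simplified estimator for two measurements per individual. Note that this sketch uses a one-way ANOVA decomposition, not the mixed-effects models fitted with rptR in this study, and it ignores the fixed effect of test number and the additional random effects; the function name and input format are hypothetical.

```python
def repeatability(pairs):
    """ANOVA-based repeatability for paired measurements.

    `pairs` is a list of (test1, test2) scores, one tuple per individual.
    Returns V_between / (V_between + V_within).
    """
    n = len(pairs)   # number of individuals
    k = 2            # measurements per individual
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    individual_means = [(a + b) / k for a, b in pairs]

    # Sums of squares among and within individuals.
    ss_among = k * sum((m - grand_mean) ** 2 for m in individual_means)
    ss_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, individual_means))

    ms_among = ss_among / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    # Variance components implied by the expected mean squares.
    var_between = (ms_among - ms_within) / k
    var_within = ms_within
    return var_between / (var_between + var_within)
```

With this estimator, the value can fall below zero when individuals differ less from each other than repeated tests of the same individual do, which is one reason model-based estimates with bootstrapped intervals are preferable in practice.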
Additionally, we tested whether repeatability estimates differed between individuals from benign or harsh developmental conditions by separately estimating repeatability for each group. If repeatability estimates were similar, this did not necessarily mean that individuals from one group behaved more consistently, as there may have been larger variance between individuals (Dochtermann & Royauté, 2019). Therefore, after estimating the repeatability for both groups (i.e. benign and harsh development), we tested whether within- or between-individual variance in behaviour differed significantly between these groups with a Levene's test ('car' package, v3.0-10, Fox & Weisberg, 2019). Within-individual variance was calculated by subtracting the value of the first test from the value of the second test.

Multivariate analyses
As we had up to five behavioural tests per individual, we employed multivariate response models, allowing us to estimate individual level pairwise residual correlations between behaviour in the different tests (i.e. behavioural syndromes). We fitted Bayesian multivariate response models with the 'brms' package (v2.14.4; Bürkner, 2017), interfaced with the MCMC sampler of RStan (v2.19.2; Stan Development Team, 2020). All response variables were transformed to a standard normal distribution (mean = 0, SD = 1), resulting in informative main effects even when included in interaction terms. Furthermore, standardization facilitates comparison of effect sizes and increases the efficiency of the MCMC sampler. For fixed effects, we used weakly informative Gaussian priors (mean = 0, SD = 1, Lemoine, 2019). Default priors of 'brms' were used for random effects, a Student's t density with three degrees of freedom for standard deviations and an LKJ density (Lewandowski et al., 2009) for correlations.
As fixed effects, we included test number (1 or 2), developmental treatment, sex, and the interaction between sex and developmental treatment. We initially included an individual's age at the first test but found no significant effect thereof in any of the analyses and decided to exclude it from further models. For the tonic immobility test, we included the fixation time as a fixed effect. As Gaussian random intercepts, we included individual identity and the identity of the genetic parents. We further included the same random factors per behavioural test as in the repeatability analysis (Appendix Table A2). As we previously found that both the developmental treatment and brood size affected offspring growth (Gerritsma et al., 2022), and thereby their fitness expectations, we ran additional multivariate models where we replaced developmental treatment with either brood size or mass on day 15.
Multivariate models were run on three chains, with 1000 warmup iterations each, followed by 3333 iterations per chain resulting in 9999 iterations in total. Adapt_delta was set to 0.999 to prevent divergent transitions. Trace plots were used to check proper mixing of chains and convergence of chains was checked by checking whether Rhat values were (close to) 1. Posterior predictive checks were inspected to evaluate model fits by using the pp_check function in the 'brms' package.
As multivariate models require complete data sets, we ran these models with (N = 68) and without (N = 93) the dominance test results (sample sizes are given in Table 1).
For hypothesis testing, we calculated the probability of direction (Pd), following Makowski, Ben-Shachar, Chen, et al. (2019). The Pd is defined as the portion of the posterior that has the same sign as the mean and can be considered the Bayesian equivalent of the frequentist P value, following the formula P = 2 × (1 − Pd) (Makowski, Ben-Shachar, Chen, et al., 2019). Following this formula, Pd values between 0.95 and 0.975 are described as 'weak evidence' (corresponding to 0.1 ≥ P ≥ 0.05), Pd values between 0.975 and 0.99 as 'moderate evidence' (0.05 ≥ P ≥ 0.02) and Pd values above 0.99 as 'strong evidence' (P ≤ 0.02). Effect sizes were calculated as the posterior means, with the highest density intervals (95% confidence interval, CI) of the posterior mean calculated with the 'hdi' function from the package 'bayestestR' (v0.8.2; Makowski, Ben-Shachar, & Lüdecke, 2019).
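The conversion between Pd and the approximate P value, and the evidence labels used here, can be sketched as follows. The function names are hypothetical, and assigning a Pd that falls exactly on a boundary (0.975 or 0.99) to one category or the other is an assumption, since the text gives overlapping ranges.

```python
def probability_of_direction(draws):
    """Share of posterior draws with the same sign as the posterior mean."""
    mean = sum(draws) / len(draws)
    if mean > 0:
        same_sign = sum(1 for d in draws if d > 0)
    else:
        same_sign = sum(1 for d in draws if d < 0)
    return same_sign / len(draws)


def pd_to_p(pd):
    """Approximate two-sided P value implied by Pd: P = 2 * (1 - Pd)."""
    return 2.0 * (1.0 - pd)


def evidence_label(pd):
    """Map Pd onto the verbal labels used in the text."""
    if pd > 0.99:
        return "strong"
    if pd >= 0.975:   # boundary handling is an assumption
        return "moderate"
    if pd >= 0.95:
        return "weak"
    return "none"
```

For instance, a Pd of 0.954 corresponds to P = 0.092 and is labelled as weak evidence.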
To test for behavioural syndromes, we estimated an individual level pairwise residual correlation matrix from the multivariate models that included only effects related to the methodology of the tests, thereby yielding correlations between the 'raw' behavioural observations in the different tests and its 95% CI. In these models, fixation time was included for the tonic immobility trait, and all random effects that were used in the repeatability analysis were kept (Appendix Table A2). Also, here we ran separate multivariate models, excluding and including dominance, and used the latter for correlations between dominance and other traits only.
To test whether individual level behavioural syndromes differed between individuals from benign or harsh early life conditions, or between males and females, additional multivariate models were performed on subsets of the data set. More specifically, residual correlations were extracted from the multivariate models using the VarCorr function from the 'brms' package. We subtracted the residual correlations from the benign development model from the residual correlations from the harsh development model. For sex, we subtracted the residual correlations from the male model from the residual correlations from the female model. We calculated the mean and 95% CI of these differences, where an overlap of the CI with zero indicated that behavioural syndromes did not differ significantly between individuals with different developmental backgrounds or between the sexes. Sample sizes were not sufficient to determine whether behavioural syndromes differed between males and females depending on the developmental treatment.

Heritability
The heritability (h²) of the behavioural observations was estimated following equation 8 (given below) from de Villemereuil et al. (2018). Heritability estimates were calculated for the genetic father and the genetic mother as the ratio of the additive genetic variance, Va, which is the amount of variance explained by the identity of the genetic parent, to the total phenotypic variance. As suggested by de Villemereuil et al. (2018), we included only 'natural' fixed effects (i.e. sex, and not developmental treatment) and included the random effects that were also included in the repeatability analysis (Appendix Table A2). The total phenotypic variance consisted of (1) Va, (2) Vf, the variance from fixed effects, (3) Vr, the variance of additional random effects, and (4) Vre, the residual variance component.
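Written out with the four variance components listed above, the heritability ratio takes the following form. This expression is assembled from the definitions in the text, so de Villemereuil et al. (2018) should be consulted for the exact notation of their equation 8.

```latex
h^2 = \frac{V_a}{V_a + V_f + V_r + V_{re}}
```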

Sample Size per Behavioural Test
Some tests were unsuccessful (see Appendix Table A3 for sample sizes), in particular the dominance test, usually because neither of the two contestants used the perch (45/114 cases in females, 3/72 cases in males).

Developmental Manipulation
Offspring growth was reduced in harsh foraging conditions (Gerritsma et al., 2022). In brief, at age 15 days, offspring reared in harsh foraging conditions were 0.7 g lighter (mean ± SD = 8.57 ± 1.99 versus 7.88 ± 1.80 g, benign versus harsh, respectively) and had shorter wings (39.84 ± 5.21 versus 38.47 ± 6.30 mm) and head + bill lengths (18.77 ± 1.40 versus 18.28 ± 1.51 mm), but tarsus length (13.62 ± 0.96 versus 13.41 ± 1.03 mm) did not differ. At the age of 31 days, offspring born in the harsh foraging environment were still significantly lighter (11.57 ± 1.59 versus 11.14 ± 1.45 g), but wing, tarsus and head + bill length were not significantly smaller. As the growth of the offspring was negatively affected, we refer to growing up in harsh foraging conditions as being faced with early life adversity.

Individual Repeatability
Behaviour in the standardized tests varied repeatably between individuals (Fig. 1), with repeatability estimates ranging from 0.31 to 0.55 (Table 2). An exception was group size preference in the sociality test (R = 0.11, P = 0.17), and we therefore excluded this variable from further analysis. Repeatability estimates did not depend on developmental conditions, except for the time spent immobile in the tonic immobility test, which was repeatable in individuals from harsh developmental conditions (R = 0.62, P < 0.01), but not in individuals from benign developmental conditions (R = 0.17, P = 0.12). This difference was due to significantly smaller within-individual variance in individuals from harsh developmental conditions, while between-individual variance was independent of treatment (Table 2).
Effects (standardized effect sizes) of developmental conditions on behaviour in the standardized tests ranged from −0.27 to 0.38 with CIs including zero in all cases (Appendix Table A4). Thus, we conclude that early life adversity as applied in the present study did not affect adult behaviour in the standardized tests. Neither did behaviour in the standardized tests differ significantly between the sexes (standardized effect sizes ranging from −0.45 to 0.20 with CIs overlapping zero). There was weak evidence for individuals to be more social in the second test (Pd = 0.954, posterior mean, 95% CI = 0.158 [−0.027, 0.338]). There was no evidence for an interaction between development and sex (Fig. 2).
Behaviour in standardized tests was independent of mass on day 15, with standardized effect sizes ranging from −0.05 to 0.26, but CIs always overlapping with zero (Appendix Table A5). However, dominance and tonic immobility were affected by an interaction between mass on day 15 and sex. To elucidate this effect, we
Behaviour in the standardized tests was independent of brood size, with standardized effect sizes ranging from −0.20 to 0.14, with CIs overlapping with zero (Appendix Table A6).

Behavioural Syndromes
Correlations between behavioural scores in the different standardized tests were low (Fig. 3, Appendix Table A7), with two exceptions: exploration was positively correlated with dominance and negatively with tonic immobility (Appendix Table A7). Developmental conditions may affect the development of behavioural syndromes, and we therefore repeated this analysis for birds reared in benign and harsh conditions separately. This yielded evidence for two behavioural syndromes in individuals reared in harsh conditions only. First, there was a correlation between exploration and tonic immobility (Appendix Table A7), where individuals that explored faster spent more time immobile. These correlations differed significantly between benign and harsh developmental conditions (difference and 95% CI = 0.533, [0.130, 0.930]). Second, there was a correlation between novel object and tonic immobility (Appendix Table A7), where individuals that were bolder when faced with a novel object spent more time immobile, but this difference was not statistically significant (difference and 95% CI = −0.14, [−0.53, 0.25]). When examining behavioural syndromes in the sexes separately, we found exploration to be positively correlated with dominance in females, but not in males (Appendix Table A7), but the difference between the sexes was not statistically significant (estimate and 95% CI = −0.437, [−0.908, 0.069]).

Heritability
Heritability estimates of the behavioural scores ranged from 0.08 to 0.12, with the heritability of the behaviour towards a novel object being somewhat higher at 0.23 (Table 3).

Causes of Individual Variation
We successfully manipulated early life adversity in zebra finches, as evidenced by their growth, and subjected these birds to multiple behavioural tests. However, early life adversity did not affect the behaviour in any of these tests. This lack of effect can in principle be due to imprecise measurements, but repeatabilities of the behavioural scores were substantial (mean 0.45, range 0.31–0.55), and in the range expected for behavioural traits (Appendix Table A8). Moreover, each individual was subjected to each test twice, and the repeatability of the mean of two observations is higher, ranging in this case from 0.48 to 0.71 (mean 0.62; calculated using equation 37 in Nakagawa & Schielzeth, 2010). Given that the sample size was substantial at 93 birds, we assume that any undetected effects of early adversity must have been small relative to the total variance.
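The step from single-test repeatability to the repeatability of the mean of two tests can be sketched as follows. The sketch assumes that the equation referred to has the usual Spearman-Brown form, R_n = nR / (1 + (n − 1)R), where R is the repeatability of a single measurement and n the number of measurements averaged. The function name is hypothetical.

```python
def repeatability_of_mean(r_single: float, n_measurements: int) -> float:
    """Repeatability of the mean of n measurements (assumed Spearman-Brown form)."""
    return n_measurements * r_single / (1 + (n_measurements - 1) * r_single)


# Single-test repeatabilities quoted in the text: lower bound, mean and upper bound.
for r_single in (0.31, 0.45, 0.55):
    r_mean_of_two = repeatability_of_mean(r_single, 2)
    print(f"R = {r_single:.2f} -> R of the mean of two tests = {r_mean_of_two:.2f}")
```

Under this assumption the mean (0.45) and upper bound (0.55) map onto the reported 0.62 and 0.71. The lower bound (0.31) maps onto 0.47 rather than the reported 0.48, a difference that could arise if the published inputs were rounded.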
The absence of a behavioural response to adversity in our study is consistent with a meta-analysis that found no effect of maternal undernutrition during pregnancy on offspring behaviour in mammals (Besson et al., 2016). It is also consistent with the results of a meta-analysis of associations between 'state' and personality traits, which concluded that such associations explain at most 8% of the variation (Niemelä & Dingemanse, 2018). Furthermore, high heterogeneity in these meta-analyses is reported (44.1–85.6%, Besson et al., 2016; 80%, Niemelä & Dingemanse, 2018), indicating a need for more studies to unravel the source of this variation.
High heritability of behaviour could in principle explain the lack of experimental effects in our study, because variation in personality is known to have a heritable component (Dochtermann et al., 2015). However, on average only 13.7% of the variation in animal personality can be attributed to additive genetic variation (Dochtermann et al., 2015), and our mean estimate (±SE: 12.2 ± 1.2%) is close to this estimate. Our heritability estimates were constrained because variation in relatedness was limited to full siblings and half siblings. However, this limitation is likely to have generated an overestimation due to shared environments, and hence is unlikely to affect our conclusion that a genetic constraint does not explain the absence of experimental effects.
The finding that neither early life adversity nor additive genetic effects explains the substantial individual variation in behaviour indicates that the causes of this variation must be found elsewhere. Zebra finches live in flocks in the wild (Zann, 1996), and the quantity and quality of interactions within flocks vary widely (Boogert et al., 2014; Brandl et al., 2019). Most studies of social environment effects on later behaviour investigated the pre- or postnatal environment (Caldji et al., 1998; Kaiser & Sachser, 2005; Kemme et al., 2008), and attention has only recently shifted to the adolescent stage (Sachser et al., 2011). In zebra finches, the adolescent social environment affects adult affiliative and aggressive behaviour (Bölting & von Engelhardt, 2017; Ruploh et al., 2013, 2014, 2015), suggesting it to be an important period in shaping zebra finch personality. In our study, birds were always in groups, and interactions in these groups at any point before the behavioural testing may therefore have shaped their personality as expressed in our behavioural assays, perhaps through processes as described by reinforcement sensitivity theory (Corr, 2008), but this remains to be investigated.

Functional Aspects
Life history theory predicts individuals with reduced life expectancy will be more risk prone (Wolf et al., 2007). The results from our study and from Krause et al. (2017), who studied long-term effects of early life nutrition, do not agree with this prediction, since neither study found effects of developmental conditions on behaviour in adulthood. However, a core assumption of this theory is that individuals with different life history strategies ultimately achieve similar fitness. Given that early life adversity negatively affected growth in our study, fitness prospects are likely to differ between these individuals, and this complicates predictions. Specifically, the risk associated with behaviour could be state dependent (Luttbeg & Sih, 2010), where low-quality individuals may be at greater risk in identical contexts, for example if they are less able to escape a predator. This implies that while early life adversity did not affect behaviour, it may well have had an undetected effect on the risk that they took in expressing it. Testing this hypothesis requires different tests, for example quantifying escape behaviour.
Behavioural syndromes potentially limit the behavioural expression of an individual, and can be the result of specific selection pressures, when combinations of behavioural traits are favoured by selection. For example, in sticklebacks, Gasterosteus aculeatus, and lemon sharks, Negaprion brevirostris, behavioural syndromes only formed when individuals were exposed to predation (Bell & Sih, 2007; Dhellemmes et al., 2020; Dingemanse et al., 2007). However, greater plasticity can also be selected for, as early life adversity disrupted the emergence of a behavioural syndrome in cichlids (Hope et al., 2020). We therefore tested whether early life adversity affected the development of behavioural syndromes in our study. However, except for a negative correlation ('syndrome') between exploration and tonic immobility in birds that experienced early life adversity, we found little evidence for behavioural syndromes.
The ability to detect behavioural syndromes depends on the repeatability of the behavioural scores. For example, when the repeatability of two behavioural tests is 0.5, the maximum correlation that can be observed between the two behavioural scores is 0.5 × 0.5 = 0.25. Thus, even if the two behaviours in this example were perfectly correlated (r = 1) when averaged over many measurements in each individual, we would observe a correlation of only 0.25 in a data set where each behaviour is measured once, and the difference is due to the within-individual variability of the behaviour. In our study, the (extrapolated) repeatabilities ranged from 0.48 to 0.71, and the upper limits of the correlations we could find (i.e. the products) therefore ranged from 0.26 to 0.49 across all combinations. We were very likely to detect correlations at this level, that is, perfect correlations (r = 0.26, power = 0.73; r = 0.49, power > 0.99 when using Pearson correlation), but if the true correlation were 0.5, our power would be substantially lower (range 0.24–0.66). It remains difficult to exclude the possibility, therefore, that behavioural syndromes were present but undetected in our study. On the other hand, due to the within-individual variability in behaviour, the expression of undetected behavioural syndromes is nevertheless likely to have been relatively weak. This reduces their relevance when trying to understand causes and consequences of variation in behaviour, because selection will act on the expression of the phenotype.
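The arithmetic in this paragraph can be sketched in a few lines. The sketch takes the upper limit of an observable correlation to be the product of the two repeatabilities, as stated in the text, and approximates the power of a two-sided Pearson correlation test with the Fisher z transformation. The Fisher z approximation and the assumption that all 93 birds contribute to every correlation are ours and may differ from the power calculation actually used, so the resulting values should be read as approximate. All function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def max_observable_correlation(rep_a: float, rep_b: float) -> float:
    """Upper limit as described in the text: the product of the two repeatabilities."""
    return rep_a * rep_b


def pearson_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power to detect a correlation r with n individuals.

    Uses the Fisher z approximation (an assumption of this sketch).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)  # expected test statistic under the alternative
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


N_BIRDS = 93  # assumption: every correlation is based on all 93 birds

print(max_observable_correlation(0.5, 0.5))  # the worked example from the text
for r in (0.26, 0.49):
    print(f"r = {r:.2f}: approximate power = {pearson_power(r, N_BIRDS):.2f}")
```

Because the approximation ignores small-sample corrections, its output can differ by a few hundredths from an exact power calculation.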
A starting point of this study was the assumption that individuals can be distinguished on the basis of the amount of risk they take, combined with the prediction based on life history theory that individuals with lower fitness prospects would take more risk (Clark, 1994; Roff, 2002; Wolf et al., 2007). However, zebra finches in our study did not differ consistently in the amount of risk they took, as evidenced by the weak correlations between behaviour in the different tests, despite the consistent individual differences in behaviour. In this setting, theories that assume consistent individual differences in risk taking are of limited relevance to functionally explain the individual variation in behaviour. This raises the question how to explain the consistent individual variation in behaviour we observed from a functional perspective. If fitness payoffs are frequency dependent (e.g. being aggressive or not; Duckworth, 2008), this could lead to consistent interindividual differences in behavioural traits through diversifying selection, which can act on multiple behavioural traits, yet not necessarily in a similar direction (e.g. towards risk taking; Dall et al., 2004). In this scenario, there is no reason to anticipate differences in fitness between individuals with different behavioural scores. Such a diversifying process may be at the phenotypic level only, in accordance with low to modest heritability of behavioural scores in our study. Furthermore, such a diversifying process could, for example, take place during adolescence, as discussed above, with individuals continuing to behave in the way that proved to be beneficial. In this scenario, small initial differences in behaviour, potentially with a largely stochastic origin, may become amplified through reinforcement, resulting in consistent interindividual differences in adulthood.

Figure 2. Effects of developmental treatment, sex and test sequence on behavioural test results, expressed as standardized effect size (vertical line) with SE (thick whisker) and 95% confidence interval (thin whisker). Pd: the probability of direction, where values below 0.95 are not significant and values above 0.95 are significant. Exp: exploration in novel environment; Soc: group preference in the sociality test; NO: novel object; Imm: tonic immobility; Dom: dominance. Dominance tests were always intrasex, so there can be no differences between the sexes. Negative effect sizes indicate lower scores for individuals reared in harsh conditions (DT), males (Sex) or the second test. Note that the x-axis for novel object is on a longer scale, and that variables were mean-centred, rendering the main effect estimates informative, even in the presence of the interaction term in the model.

Figure 3. The colour of the point and line corresponds to whether the 95% CI overlaps with zero (grey/white = yes, black = no), and denotes whether behavioural traits were correlated within that group (all, development or sex). Exp: exploration in novel environment; Soc: group preference in the sociality test; NO: novel object; Imm: tonic immobility; Dom: dominance. A small asterisk on the y-axis denotes whether the correlations differed significantly between the groups.

Table 3. Estimates are given as the additive genetic variance explained by the genetic mother or genetic father with the 95% confidence interval between brackets.

Data Availability
Data are stored on Zenodo (DOI: 10.5281/zenodo.6406723) and will be made available by the authors upon request.

Declaration of Interest
None.

The sample size per developmental treatment is given between brackets: the first number is for the benign and the second for the harsh treatment. Sample sizes are given for all individuals for which we have data for both tests or for at least one test. a: All females, 33 from a benign and 24 from a harsh developmental treatment. b: All males, 16 from a benign and 20 from a harsh developmental treatment.
Δ first move between phase 2 and 3: 0.71

APPENDIX
As higher scores on the first PC of the novel object corresponded with higher latency to inspect the novel object, and a lower latency to first move when a novel object was present, indicating lower levels of boldness, we multiplied it by −1.

Identical fixed effects are included for all response variables, with identity of the individual and the identity of its genetic parents included as random Gaussian intercepts. DT: developmental treatment. Additional random Gaussian intercepts included are specified per response variable.

For each posterior mean estimate, the other predictors were kept averaged over levels. The variable not shown is fixation time, concerning tonic immobility, which was kept at its mean value when calculating posterior means. DT: developmental treatment.

For each posterior mean estimate, the other predictors were kept averaged over levels (factors) or at their median (continuous predictor, i.e. mass). The variable not shown is fixation time, concerning tonic immobility, which was kept at its mean value when calculating posterior means.

For each posterior mean estimate, the other predictors were kept averaged over levels (factor) or at their median (continuous predictor, i.e. brood size). The variable not shown is fixation time, concerning tonic immobility, which was kept at its mean value when calculating posterior means.

Figure A1. PCA on the novel environment test. The direction of each metric is given for the first two principal components, including the percentage of variance explained by each. All variables were standardized prior to the PCA (referred to as 'St.') and the latency in starting exploring (latexp) and the activity during the test (actexp) were log10 transformed.

Figure A2. PCA on the novel object test. The direction of each metric is given for the first two principal components, including the percentage of variance explained by each. All variables were standardized prior to the PCA (referred to as 'St.') and the latency to inspect the novel object (latno) was log10 transformed. Deltamoveno refers to the difference in the latency to the first move between phase 2 and phase 3.