Inhibitory control performance is repeatable over time and across contexts in a wild bird population

Inhibitory control is one of several cognitive mechanisms required for self-regulation, decision making and attention towards tasks. Inhibitory control is expected to influence behavioural plasticity in animals, for example in the context of foraging, social interaction or responses to sudden changes in the environment. One widely used inhibitory control assay is the ‘detour task’ where subjects must avoid impulsively touching transparent barriers positioned in front of food, and instead access the food by an alternative but known route. However, because the detour task has been reported to measure factors unrelated to inhibitory control, including motivation, previous experience and persistence, the task may be unreliable for making cross-species comparisons, estimating individual differences and linking performance with socioecological traits. To address these concerns, we designed a variant of the detour task for wild great tits, Parus major, and deployed it at the nesting site across two spring seasons. We compared task performance of the same individuals in the wild across 2 years, and with their performance in captivity when tested using the classical cylinder detour task during the nonbreeding season. Potential confounds of motivation, previous experience, body size, sex, age and personality did not significantly predict performance, and temporal and contextual repeatability were low but significant. These results support the hypothesis that our assays captured intrinsic differences in inhibitory control. Instead of dismissing detour tasks and ‘throwing the baby out with the bathwater’, we suggest confounds are likely system and experimental-design specific, and that assays for this potentially fundamental but largely overlooked source of behavioural plasticity in animal populations should be validated and refined for each study system.

Inhibitory control is a form of self-regulation that allows an individual to suppress predispositions in favour of more appropriate actions (Diamond, 2013). It is a domain-general cognitive ability and one of several executive functions required for decision making and attention towards tasks. Inhibitory control is increasingly being used to explain sociality and functional behaviour (reviewed in Kabadayi et al., 2018). This includes foraging flexibility when predation risk varies (Coomes et al., 2021), dietary breadth in individuals and species (MacLean et al., 2014; van Horik et al., 2018) and inhibiting previously employed social strategies depending on the identity of social group members (Amici et al., 2008, 2018; Reddy et al., 2015). Moreover, inhibitory control is likely to be responsive to selection given its links with brain size in primates and its heritability in humans (Friedman et al., 2008), dogs, Canis lupus (Gnanadesikan et al., 2020), fish (Lucon-Xiccato & Bertolucci, 2020) and birds. However, the inferential power of these studies depends on cognitive assays that accurately characterize inhibitory control and on whether these assays are applicable to how animals behave in natural settings.
Recently, researchers have taken a variety of classical cognitive tasks to the field (e.g. Johnson-Ulrich et al., 2020; Muth et al., 2018; Reichert et al., 2020; Sonnenberg et al., 2019; Toledo et al., 2020) where cognition can be assayed under natural conditions with high ecological validity. Likewise, depending on the scope of the planned research and institutional policies, welfare concerns and administrative burdens associated with housing animals in captivity are reduced under natural settings. However, inhibitory control assays in the wild are scant and rely on experimenters in close proximity to habituated animals (Ashton et al., 2018; Shaw et al., 2015). Moreover, the extent to which extraneous variables contribute to, or confound, cognitive performance may differ for captive-bred, temporarily captive wild individuals and free-living animals (Morand-Ferron et al., 2016), but comparisons across such experimental contexts have rarely been made for any cognitive trait (see Benson-Amram et al., 2013; Cauchoix et al., 2017; Forss et al., 2015; McCune et al., 2019; Mouchet & Dingemanse, 2021 for some examples).
One way of measuring inhibitory control is the widely used 'detour' task. In this task, subjects must avoid and move around a transparent barrier to retrieve a food reward that is positioned directly in front of them, but behind the barrier (Diamond, 1990). A central premise of this task is that the visible reward generates a strong prepotent impulse to approach it directly. This impulse must be inhibited, and the subject must initially move in a direction away from the reward to detour around the barrier successfully. The classic detour task is dependent on an experimenter rebaiting the apparatus with a food reward, such that its scope for field-based assays has been restricted to wild, human-habituated New Zealand robins, Petroica australis, and Australian magpies, Gymnorhina tibicen (e.g. Ashton et al., 2018; Shaw, 2017). Another potential limitation for cognitive tasks in the field is that food rewards may be accessible to multiple individuals at once (but see Cauchard et al., 2013, 2017; Shaw et al., 2015). Consequently, individual performance is likely to be subject to social interference, for example through social learning or competition from other group members (Reichert et al., 2020).
Attributing a cognitive capability to an individual is potentially challenging if the cognitive trait of interest is confounded by some other factor, such as motivation, personality and experience (Morand-Ferron et al., 2016). In the case of inhibitory control, experience with experimental contingencies (e.g. odour, Santacà et al., 2019; human training history, Fagnani et al., 2016, Jelbert et al., 2016), apparatuses (e.g. transparent plastic, Santacà et al., 2019) and motor routines can predict performance, as can motivational factors such as body condition as a proxy for energetic state (Shaw, 2017). Animal personality traits, for example consistent differences between individuals in how readily they engage with and explore their environment, have been shown to affect how animals perform on cognitive tasks generally (Dougherty & Guillette, 2018; Guillette et al., 2017; Sih & Del Giudice, 2012), including on tests of inhibitory control (Gomes et al., 2020; Savaşçı et al., 2021), although these effects are not universal (Guillette et al., 2015; Vernouillet et al., 2018). The extent to which cognitive tasks that target inhibitory control tap into additional cognitive and noncognitive processes may also be dependent on the type of task used (e.g. Völter et al., 2018), the context (e.g. captivity versus the wild) and temporal aspects (e.g. season). Therefore, pinpointing confounds and correlates is a critical step towards accurately assaying inhibitory control for any given study system. Examining the consistency of task performance has the potential to shed light on task validity. Typically, consistency is quantified by estimating repeatability, that is, the proportion of variance in a repeatedly measured trait that is explained by between-individual differences.
Significant repeatability suggests consistent individual differences in behaviour, and although it sets the upper limit of heritability (Dohm, 2002), equally these individual differences could be caused by permanent environmental effects (Wilson, 2018). Of the few studies that have explicitly examined repeat performance using the detour task, evidence for consistency between individuals is mixed. Captive zebrafish, Danio rerio, were consistent in their performance on the same task (Lucon-Xiccato & Bertolucci, 2020), but captive song sparrows, Melospiza melodia, captive pheasants, Phasianus colchicus, and wild New Zealand robins were not (Shaw, 2017; Soha et al., 2019; van Horik et al., 2018). Across different tasks (i.e. different types of transparent apparatuses), captive zebrafish performance was correlated, and wild Australian magpies showed significant repeatability (Ashton et al., 2018). Repeatability may not necessarily be strong proof of task validity since the task could be consistently measuring the same confounds, especially when using the same task under the same conditions over time. For example, in a recent study on problem solving in great tits, Parus major, performance was repeatable but this repeatability was largely explained by experimentally manipulated motivational effects that were present in both repeat measures. This is less likely to be a problem, however, when repeat measures are taken under very different conditions (contextual repeatability) that are unlikely to have shared environmental confounds, and especially when intrinsic confounds are also likely to differ (Nakagawa & Schielzeth, 2010). Finally, as discussed above, confounding variables can explain variation in single measures of cognitive performance, but can also act on the between- and/or within-individual variance component when dealing with multiple measures from the same individual.
It follows that a further important test is to establish whether the consistent between-individual differences specifically are driven by these confounding effects (Nakagawa & Schielzeth, 2010). Demonstrating temporal and contextual repeatability in performance on the detour task would provide compelling support for a common cognitive basis of these tasks, especially when controlling for potential confounds. Our aim here was to examine the extent to which variations in the detour task assay inhibitory control in our study system of wild great tits across time and contexts.
We designed a modified version of the detour task and presented it at the nestbox over two breeding seasons. The task involved minimal experimental disturbance or social interference from conspecifics, and no need for food rewards. Additionally, motivational factors related to brood age, size or satiety were not expected to confound our task performance (Cauchard et al., 2017). We also ran a classic version of the task by testing wild-caught great tits in captivity during one winter season. This approach allowed us to test for both contextual and temporal repeatability, and to examine statistically whether environmental or state variables caused between-individual differences in detour task performance, rather than the hypothesized cognitive mechanisms underpinning inhibitory control. These variables included experience, motivation, body size (because large birds may have more difficulty avoiding the barrier) and habitat as potential ecological confounds linked to differences in population density (O'Shea et al., 2018) and/or environmental variability (van Horik et al., 2019). Moreover, we also examined whether 'exploration behaviour in a novel environment', a commonly used assay of the fast–slow exploration personality axis (Dingemanse et al., 2002), explained performance on the captive detour task. If these variables did not explain task performance in general, and the repeatability of performance in particular, this would lend support for the detour task's utility as a robust measure of inhibitory control, where individual performance is not sensitive to bias from extraneous influences.

Study System and Field Sites
The great tit is a model species in ecology and evolution field studies and has also been used for exploring the evolutionary ecology of cognitive variation. Great tits breed in nestboxes and can be fitted with passive integrated transponder (PIT) identification tags for remote detection at nestboxes using radio frequency identification (RFID) technology. They readily engage with experimental apparatuses in the wild and in captivity (Cauchard et al., 2013; Troisi et al., 2021), allowing for comparison of cognitive performance across different settings. Our study took place across 10 distinct woodland sites in the Bandon Valley area, Co. Cork, Ireland. Five sites were mixed deciduous and five were conifer plantations (Table A1). Nestboxes, hung approximately 1.5 m from the ground, were distributed across these sites at a density of two nestboxes per ha. Nestboxes were monitored for breeding data (O'Shea et al., 2018; lay dates, incubation period, hatching date, clutch size and number of fledglings). Adults were trapped at the nestbox between day 10 and day 13 posthatching to measure biometrics (including weight, wing and tarsus length), and to tag individuals with a PIT tag and a coloured ring with a unique alphanumerical code for identification. Chicks were ringed and weighed on day 15 posthatching. Brood size was recorded on experimental dates.

Experimental Procedures: Detour Task in the Wild
To measure individuals' detour task performance in the wild, we designed a detour task for the nestbox requiring birds to avoid initially opaque, and subsequently transparent, flexible barriers to gain access to their nest hole. To do so, birds were required to alter their normal flight path (i.e. level with the nest hole), such that they could enter from the gap under the barrier (Fig. 1). The experiment consisted of three phases: (1) habituation to the opaque experimental apparatuses, (2) training to go under an opaque barrier and (3) testing with a transparent but otherwise similar barrier (Fig. 2). In both years, a Perspex cover (20 × 20 cm) was positioned horizontally on top of the box during the training and test phases, to provide birds with experience with transparent plastic (see Isaksson et al., 2018; van Horik et al., 2018) and to act as a cover for the transparent barrier in case of rain (although most experiments occurred when there was no rain). Experiments took place between 25 May and 25 June 2017 (hereafter year 1), and between 26 May and 15 June 2018 (hereafter year 2).
At the start of an experimental session, that is, when the experimental apparatus was placed on the nestbox for each phase, the standard nestbox front was replaced with an identical-looking front with integrated photodiode sensors and RFID technology that logged visits from birds with or without PIT tags (RFID logger for Schwegler 1B from Dominic Goodwin, NatureCounters, www.naturecounters.com). A Panasonic HC camera was mounted on a 50 cm high tripod, positioned approximately 10 m from the nest. Both the nest front and the camera were synchronized and used to quantify the number of times each individual bird landed on/entered the nest hole, and to identify the parents by colour ring and/or plumage characteristics. Each experimental session was presented at the nestbox for approximately 1 h each day over consecutive days between 0700 and 1300 hours. Three boxes received a test experiment between 1300 and 1600 hours (one box in year 1, two boxes in year 2). RFID and photodiode data were reviewed on the same day to determine whether birds passed the habituation and training phases (Fig. 2, see Appendix for criterion details). In year 1 the video footage confirmed the accuracy of the RFID data at the habituation and training phase, so that in year 2 only RFID data were required. Video footage was reviewed to quantify performance at the test phase in both years.
We measured the wild detour task performance as the proportion of successful trials out of total trials (see Statistical Analysis: Detour task in the wild), where higher values indicated better performance. A bird was successful if it flew under the barrier, by flying lower than normal or by landing on the perch/box and jumping under. A bird failed if it touched the barrier either by flying into it, by jumping into it from the perch, or by perching on the side and tapping the edge with the beak. On some occasions the lip of the underside of the barrier touched the bird's back as it jumped under, but this was not considered a fail as it was a clear attempt to avoid the barrier, and likely an artefact of the bird's size rather than a lack of inhibitory control. If a bird was perched at the nest hole and under the barrier, but did not enter the box, it was still scored as having completed a successful trial. Birds had to fly away from the nestbox (i.e. out of the camera view) for a new trial to be scored. If a bird repeatedly jumped between the nest hole and the perch (either contacting the barrier or not), its score was based on its first action, which reflects our scoring criterion in the captive task (see below).
If neither parent passed a phase after 2 days, the experiment was abandoned for that nest. If only one parent passed the phase after 2 days, then the experiment was advanced for the participating parent. Therefore, the experiment took three to six experimental sessions (i.e. days) to complete in year 1 (mean = 3.7 days ± 0.13 SE). None of the birds abandoned the nest during or after the experiments. Experiments started at day 10 posthatching, except for 12 boxes in year 1 that started at day 5–7 due to logistical constraints. Experiments did not occur on the trapping day. In year 2 the experimental protocol was refined to reduce the number of experimental trials by eliminating the need for a habituation phase and reducing the criterion for the training phase. Instead of the 1 h habituation phase described above, a dummy apparatus was placed on the box permanently from day 8 posthatching until the start of the training phase. This consisted of wire mesh around the top of the box attached to a plastic rectangle covered with camouflage tape (0.2 × 6 cm and 5 cm high), and a wooden perch below the nest hole. The criterion for passing the training phase was reduced to one successful approach, instead of three, as our year 1 data showed that the number of training trials did not influence performance during the test phase (see also Results). The experiment took 2–4 days to complete in year 2 (mean = 2.4 days ± 0.12 SE).

Aviary Housing
Birds were caught from nine of the 10 sites described above between January and early March 2018, four of which were mixed deciduous woodlands and five were conifer plantations. Birds were transported from the field sites to the aviary in cloth bags within 2 h of being caught and brought into captivity for approximately 3 weeks before being released at the site of capture. Housing details are described in Coomes et al. (2021). Birds were not deprived of food prior to experiments, and wax worms, Galleria mellonella larvae, were only provided as experimental rewards.

Experimental Procedures: Detour Task in Captivity
The captive detour task consisted of the same three experimental phases described above (Fig. 2). We piloted different sizes of tubes on a cohort of birds not included in the main analyses and chose a 3.5 cm diameter × 3 cm wide cylinder tube so that the complexity of the task did not cause ceiling or floor effects in performance caused by the difficulty in detouring around the barrier (Farrar et al., 2020; Völter et al., 2018). A 5 cm high perch was positioned 15 cm in front of the cylinder to standardize the approach direction for each trial (Fig. 1). The habituation phase was presented the day after birds arrived in the aviary. Birds had to retrieve a wax worm placed at the edge of an opaque plastic cylinder three times before they received the training phase. Wax worms were euthanized by head compression, so they did not move during the trial. Depending on the bird's progress, the training phase occurred either on the same day or the following day.
In the training phase, birds had to retrieve a wax worm placed in the centre of the same plastic cylinder, without touching the exterior of the cylinder, on four of five consecutive attempts to retrieve the worm (Boogert et al., 2011). The test phase was always performed the day after a bird passed the training phase. During this phase birds received 10 trials with a transparent cylinder of the same dimensions as the one in the habituation and training phases. To ensure birds did not become sated, half a wax worm was used as a reward during the test trials. A trial was defined as a bird approaching the cylinder and making contact with the barrier (scored as a fail) or retrieving the worm from the side of the tube without making contact with the barrier (scored as a success). The trial ended when the bird retrieved the worm, or flew away from the apparatus, at which point the tube was removed from the testing enclosure, rebaited by the experimenter, and placed back in the enclosure. This procedure was designed such that each approach was measured as a success/fail, as opposed to the number of pecks until success, the latter of which may be guided by individual persistence. Allowing birds to consume the worm at each trial, whether they failed or succeeded, controlled for reward history that may have influenced reinforcement and/or motivation through hunger. All birds ate 8–10 worms at the test phase, except two birds, which ate four and seven worms. We recorded the time it took each bird (N = 35) to complete 10 trials as there was no limit to how long birds had to approach the cylinder for each trial. As for the wild task, we measured the captive detour task performance as the proportion of successful trials out of total trials, such that high values indicated putatively high inhibitory control.

Experimental Procedures: Exploration Behaviour
The morning following the birds' arrival to the aviary, we performed an 'exploration in a novel environment' assay, henceforth referred to as exploration behaviour (see also Coomes et al., 2021; adapted from Dingemanse et al., 2002). An access hatch at the back of the bird's cage that led to a larger room (4.60 × 3.10 m and 2.65 m high) was opened. The light in the home cage was turned off and birds were free to enter the room. Once the bird entered the room, the number of hops and flights, within and between trees, was recorded from the adjacent corridor through one-way glass. 'Trees' were made of a wooden upright support and two thick dowels running at right angles to each other (see also Coomes et al., 2021). The trial was complete after 2 min, at which point the birds were returned to their home cage. Exploration behaviour was recorded as the sum of the number of hops and flights, and has been shown to be repeatable in our population (O'Shea et al., 2017).
Figure 2. Schematic of the experimental phases (Habituation, Training, Test) for the wild tasks (year 1 and year 2) and the aviary task. Birds advanced to the next phase if they met the criterion, indicated by either entering the box or eating the worm the required number of times (indicated by 'x'). If they did not meet the criterion, the phase was repeated the following day. The criterion for the wild task in year 2 was reduced to minimize disturbance at the box. The experiment ended when birds performed the required number of trials in the test phase. Further details are provided in the Appendix.

Statistical Analysis: General Comments
All models, unless otherwise specified, were run as generalized linear (mixed) models (GLMMs) in lme4 (Evans, 2016) in the R statistical software interface (R Core Team, 2017). P values were generated using lmerTest (Kuznetsova et al., 2017). Plots were generated using ggplot (Wilkinson, 2011). We used the dredge function from the MuMIn package (Barton, 2019) and an information-theoretic approach in combination with model averaging (Grueber et al., 2011). We generated models from a global model from our GLMMs and retained models with an Akaike's information criterion corrected for small sample sizes (AICc) within 2 units of the top model. We report the conditional averaged weighted parameter estimates across the retained models. All continuous variables were scaled. We used the vif function in the usdm package (Naimi et al., 2014) to test for collinearity between fixed factors. All variables had a variance inflation factor less than 2.5 and were not considered to show multicollinearity. Our R code is included as Supplementary material.
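As an illustration of the retention rule described above, the following minimal Python sketch computes AICc from a log-likelihood and keeps the candidate models within 2 units of the best one. It is not the authors' R code: the function names, the model labels and the log-likelihood values are hypothetical, and renormalising the Akaike weights over the retained set is an assumption of this sketch.

```python
import math

def aicc(log_lik, k, n):
    """AICc = -2*logL + 2k + 2k(k+1)/(n-k-1), for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def retain_top_models(models, n, max_delta=2.0):
    """Keep models whose AICc is within max_delta of the lowest AICc.

    models: dict mapping a (hypothetical) model label to (log_lik, k).
    Returns a list of (label, delta_aicc, weight), sorted by delta, where
    weights are Akaike weights renormalised over the retained set
    (an assumption of this sketch).
    """
    scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
    best = min(scores.values())
    deltas = {name: s - best for name, s in scores.items() if s - best <= max_delta}
    rel = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(rel.values())
    return sorted(((name, d, rel[name] / total) for name, d in deltas.items()),
                  key=lambda item: item[1])

# Hypothetical candidate set; log-likelihoods are invented for illustration.
candidates = {"sex + lay_date": (-100.0, 4), "sex": (-100.5, 3), "null": (-105.0, 2)}
for name, delta, weight in retain_top_models(candidates, n=84):
    print(f"{name}: delta AICc = {delta:.2f}, weight = {weight:.2f}")
```

In this invented example the null model falls more than 2 AICc units behind the best model and is dropped, while the other two are retained for averaging.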

Statistical Analysis
Detour task in the wild
The number of trials undertaken by each bird during the 1 h test phase varied (mean ± SE = 9.6 ± 0.57 trials, range 1–26). Our analyses included a maximum of the first 10 trials as this was consistent with the number of trials in the captive task and existing literature (MacLean et al., 2014). We also confirmed that the number of trials used to calculate overall performance did not bias wild detour task performance, if, for example, birds with more trials had higher scores if they learned over successive trials to avoid the barrier (see Results).
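The scoring rule can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the function name and the example trial sequence are hypothetical, and each trial is assumed to have already been coded as a success (no barrier contact) or a fail.

```python
def detour_score(outcomes, max_trials=10):
    """Score one bird from its ordered trial outcomes.

    outcomes: sequence of booleans in trial order, True = success
    (barrier avoided), False = fail (barrier touched).
    Only the first max_trials trials are used.
    Returns (successes, failures, proportion of successes).
    """
    used = list(outcomes)[:max_trials]
    if not used:
        raise ValueError("at least one trial is required")
    successes = sum(1 for outcome in used if outcome)
    failures = len(used) - successes
    return successes, failures, successes / len(used)

# Hypothetical bird with 12 trials; only the first 10 contribute.
trials = [False, False, True, True, False, True, True, True, True, True, True, True]
print(detour_score(trials))  # (7, 3, 0.7)
```

The success and failure counts are returned alongside the proportion because a binomial model takes the pair of counts as its response, so birds with few trials carry less weight than birds with many.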
Initially we examined what fixed effects had a potentially confounding influence on detour task performance in the wild. Lack of any strong effects would lend support for the wild detour task being a reliable test of inhibitory control. It is also important to identify which fixed effects could be driving between-individual variation in detour task performance (see repeatability below). We modelled wild detour task performance in a GLMM with a binomial distribution and logit link function, with the number of successes as the numerator and the total trials as the denominator (in R, using cbind, the response variable is entered as two variables, number of successes and number of fails). Our global model included the following fixed effects as potential sources of variation in performance: the number of training trials because the motor action of flying under a barrier could carry over to the test phase; wing length as an index of body size that could influence the ability to pass underneath the barrier; year, lay date and brood size as these could be sources of motivation that may influence parental impulses to feed their chicks; and sex, which has been reported to predict inhibitory control performance in a stop-signal task (Lacreuse et al., 2016). Continuous variables were scaled and mean-centred to zero. Site, nest and bird identity (ID) were included as nested random terms. Owing to convergence issues associated with overfitting the model with categorical variables, we did not include habitat (conifer versus deciduous) or age in our global model, although visual inspection of these variables and reduced models in which these variables were included suggest they had no effect on performance (Fig. A1).

Detour task in captivity
We modelled captive detour task performance in a GLMM with a binomial distribution and logit link function. Our global model included reward history (i.e. number of worms eaten), motivation (i.e. time to complete all 10 test trials), habitat, personality, sex and age as fixed factors, and site of capture as a random effect. Habitat was included as a potential ecological confound linked to differences in population density (O'Shea et al., 2018) and/or environmental variability (van Horik et al., 2019). Exploration behaviour was included as a fixed factor as personality may influence how birds engage with the task, or form part of so-called cognitive 'styles' (Sih & Del Giudice, 2012). Sex and age were also included as sources of variation in task performance (Lacreuse et al., 2016; Macdonald et al., 2014).

Repeatability
We investigated whether individuals showed consistent differences in the wild detour task performance across years (temporal repeatability; but not for the captive task for which we had no repeats). Significant temporal repeatability in performance would suggest that the task measured an intrinsic trait, indicating a permanent environment effect and/or heritability (e.g. Quinn et al., 2009). The wild detour task data set included repeat measures (N = 16 observations, eight individuals), as well as single measures (N = 68) to increase power (Martin et al., 2011). We ran a GLMM as described above, with year as a fixed effect and bird ID as a random effect, and compared this model with another that excluded bird ID using the anova function in R. The repeatability estimate was calculated from the variance components and the residual variance, taken as 1/[p(1 − p)], where p is the expected probability of success calculated as the mean wild detour task performance in the data set (Nakagawa et al., 2017). The 2.5% and 97.5% confidence intervals (CI) were calculated with the function confint() using the bootstrap argument with 1000 simulations. Sigma (residual standard deviation) was estimated to be 1. We also tested whether the inclusion of fixed effects resulted in any change in the repeatability estimate (adjusted repeatability). If the repeatability of wild detour task performance remained significant after inclusion of these effects, this would point to consistent individual differences being explained by inhibitory control.
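To make the calculation concrete, the sketch below computes a latent-scale (logit-link) repeatability using the delta-method distribution-specific variance 1/[p(1 − p)]. It is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical, the variance values are invented, and placing the between-individual variance over the sum of all variance components is our reading of the approach, since the text does not spell out exactly how the terms were combined.

```python
def latent_scale_repeatability(var_id, mean_p, var_other=0.0):
    """Repeatability on the latent (logit) scale for a binomial GLMM.

    var_id:    between-individual (bird ID) variance from the model.
    mean_p:    expected probability of success (mean task performance).
    var_other: sum of any other random-effect or overdispersion
               variances (assumed 0 unless supplied).
    The distribution-specific variance is approximated by the
    delta method as 1 / (p * (1 - p)).
    """
    if not 0.0 < mean_p < 1.0:
        raise ValueError("mean_p must lie strictly between 0 and 1")
    if var_id < 0.0 or var_other < 0.0:
        raise ValueError("variance components must be non-negative")
    var_distribution = 1.0 / (mean_p * (1.0 - mean_p))
    return var_id / (var_id + var_other + var_distribution)

# Invented values for illustration only (not estimates from this study).
print(latent_scale_repeatability(var_id=1.0, mean_p=0.5))  # 0.2
```

Because 1/[p(1 − p)] is smallest at p = 0.5, the same between-individual variance yields a lower repeatability when mean performance is closer to 0 or 1.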
We also estimated contextual repeatability between tasks, significant levels of which would suggest that performance on these tasks could be attributed to a common factor, supporting the hypothesis that these tasks reflect, at least in part, inhibitory control where the prepotent impulse to go directly towards a positive stimulus (either a food reward or a begging offspring) must be inhibited. We ran an additional two GLMMs (with and without bird ID as a random effect) using a data set that included repeat measures (N = 21 observations, 10 individuals, one of whom was measured both in year 1 and in year 2 of the wild task), and singletons (N = 98). Differences in results were negligible if we excluded year 2 performance from the bird for which we had repeated measures in years 1 and 2 (see Appendix). We included task (wild versus captivity) as a fixed effect, and bird ID and site as random effects. We then repeated these analyses to control for fixed effects that were common between tasks and had been retained in the model selection for temporal repeatability, to ensure that any significant repeatability was not driven solely by common factors unrelated to inhibitory control.

Ethical Note
This study was conducted under licences from the Health Products Regulatory Authority (AE19130_P017), The National Parks and Wildlife Services (C11/2017, 004/2017, C02/2018 and 001/2018) and the British Trust for Ornithology. The research project received ethical approval from the Animal Welfare Body at University College Cork, and was in accordance with the ASAB/ABS Guidelines for the Treatment of Animals in Behavioural Research and Teaching. We observed no nest desertion as a result of our experiment.

Participation
In year 1, 19 males and 24 females from 29 nests participated in the wild detour task. The experiment was attempted at three additional nests, but these were excluded because neither parent passed the habituation or training phases. In year 2, 21 females and 20 males from 22 nests participated, where at least one parent from all nests reached and participated in the test phase. One bird did not participate in the captive detour task as it would not approach or consume the worm next to the apparatus.
We found that the total number of test trials did not correlate with overall performance (Kendall's tau test: z = -0.23, tau = -0.02, P = 0.79, N = 84, Fig. A2a). Moreover, for birds that completed at least 10 trials, their overall performance calculated from the first five trials was highly correlated with their overall performance calculated from the first 10 trials (z = 9.42, tau = 0.86, P < 0.001, N = 69, Fig. A2b). Eight individuals completed fewer than five trials and were included in the analysis (three birds completed one trial, one bird completed two trials, one bird completed three trials and three birds completed four trials).

Detour Task in the Wild
The number of instances that individuals successfully went under the opaque barrier during the training phase varied across individuals (mean = 7.11 ± 0.59 SE, range 1–37). In year 1, during the training phase there were eight instances from seven individuals when birds made contact with the opaque barrier, whereas during the test phase there were 130 instances from 36 individuals when birds made contact with the transparent barrier (mean = 3.02 ± 0.45 SE), confirming that the transparent barrier elicited the prepotent response of flying straight to the nest hole (see van Horik et al., 2018). Year, lay date, sex and wing length were retained in the top models, but there was no evidence that any influenced detour task performance (Fig. 3, Table 1), although males performed marginally worse than females, and there was a tendency for performance to decline with lay date. The number of training trials, brood size and year were not retained in any of the top models and did not predict task performance (Fig. 3, global model test statistics are provided in Table A2).

Detour Task in Captivity
The mean captive detour task performance was 0.41 ± 0.04 SE, substantially lower than that observed in the wild task. The time it took birds to complete the test phase, the number of worms eaten, sex and exploration behaviour were the retained fixed terms in the top models, but evidence that any of these variables had an effect was weak because none were statistically significant (Fig. A3). Age (Fig. A3c) and habitat (Fig. A3f) were not retained in any of the top models and did not predict task performance (global model test statistics are provided in Table A2).

DISCUSSION
We have shown that, in wild great tits, the detour task is repeatable across years and testing environments (wild versus captivity). Furthermore, we controlled for a range of possible confounding variables, and they did not explain the repeatability of performance. Although our sample size was small, these results suggest an underlying trait that is common across tasks in our system, which we interpret as the inhibition of a prepotent response/habitual behaviour. To our knowledge, two other studies have investigated repeatability of inhibitory control in wild birds (Ashton et al., 2018; Shaw, 2017), only one of which also reported significant repeatability (Ashton et al., 2018). Moreover, our findings contrast with recent reports that the detour task is correlated with cognitive and noncognitive traits unrelated to inhibitory control (Shaw, 2017; van Horik et al., 2018; van Horik et al., 2020). By synthesizing our findings across both wild and captive versions of the detour task, we discuss the extent to which we can attribute task performance to inhibitory control.
Performance on the wild detour task was repeatable across time, and between wild and captive versions of the task. Although repeatability estimates were low, these results point to an underlying, inherent trait that was consistently measured despite differences in year, season, testing location and apparatus type. Repeatability sets the upper limit for heritability but does not preclude permanent environment effects driving some or even all the intrinsic differences observed. Nevertheless, genetic pedigree studies have shown that detour task performance is heritable in birds, and that executive functioning is the most heritable psychological trait in humans (Friedman et al., 2008). While we acknowledge that the relatively small sample size of within-individual measurements may render parameter estimation less reliable (Nakagawa & Schielzeth, 2010), our repeatability estimates were consistent with previous reports in animal cognition. A meta-analysis of repeatability estimates across a range of different cognitive tasks including inhibitory control, problem solving, discrimination and reversal learning, memory, physical and spatial cognition found low to moderate R values for temporal repeatability (0.15 and 0.28) and contextual repeatability (0.20–0.28; Cauchoix et al., 2018). The two detour task studies in the meta-analysis had opposing results: no repeatability in New Zealand robins (R = 0.002; Shaw, 2017) but high repeatability in Australian magpies (R = 0.80; Ashton et al., 2018). This suggests that repeatability of the detour task across species and populations may be considerably variable if temporary environmental effects vary across space and time.
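The statement that repeatability sets the upper limit for heritability can be made explicit with a standard variance decomposition. The notation below is an assumption introduced for illustration and does not appear in the original text: V_P is total phenotypic variance, V_A additive genetic variance, V_{NA} non-additive genetic variance, V_{PE} permanent environment variance and V_R residual within-individual variance.

```latex
\begin{align}
V_P &= V_A + V_{NA} + V_{PE} + V_R, \\
R   &= \frac{V_A + V_{NA} + V_{PE}}{V_P}, \\
h^2 &= \frac{V_A}{V_P} \le R .
\end{align}
```

Under this decomposition, a significant R shows only that the among-individual components are jointly nonzero. It could arise entirely from V_{PE} with V_A = 0, which is why repeatability alone does not demonstrate heritability.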
The particularly high repeatability reported for the Australian magpies, for example, is likely due to the repeats being taken just 2 weeks apart (in comparison to the 12 months in this study), which is likely to inflate differences between individuals due to transient factors (Bell et al., 2009; Cole et al., 2011). In addition, cognitive tasks that quantify performance as binary success/fail achieve higher R estimates than those that tally the number of trials to reach criterion (Cauchoix et al., 2018), and therefore, without an additional measure of performance to compare with our success/fail measurement, our repeatability estimates may be overestimates. Differences in performance and effect sizes are to be expected across detour task studies, and comparative cognition studies generally (Farrar et al., 2020). Our finding that the same individuals performed better on the wild than the captive task highlights how different variants of the detour task are likely to affect how well individuals perform, which makes comparisons difficult.
Experience (Fagnani et al., 2016; Santacà et al., 2019; van Horik et al., 2019, 2020), motivation (Shaw, 2017) and persistence have been proposed to confound individual differences in inhibitory control. Performance on the wild task was higher than performance on the captive task, perhaps because birds also experienced the transparent barrier as they exited the box, not just when they approached it. Motivational factors could also explain differences in performance between the wild and captive tasks. The use of food rewards may contribute to motivational effects attributed to between-individual differences in hunger state, body condition and/or food preferences (Shaw, 2017). Placing the detour task in front of the nestbox access hole allowed us to measure performance independently of food rewards and we expected that all individuals would be highly motivated to participate in and complete the inhibitory control tasks as quickly as possible. Nevertheless, motivation could vary depending on the reproductive value and the viability of each parent's offspring. However, we found no evidence for this because performance was unaffected by brood size and lay date. Although we did not control for chick age and satiety in our own study due to model overparameterization, these variables were not shown to influence great tit problem solving performance at the nestbox (Cauchard et al., 2017). Moreover, performance was repeatable despite different reward incentives between wild and captive tasks. Approach latencies, or time to complete a task, have been interpreted as proxies for motivation to obtain a reward, but we found no effect of these variables on performance in our captive task, nor did a similar study with common pheasants. Body condition may reflect an individual's energetic state, and has been linked with performance in the detour task in wild New Zealand robins (Shaw, 2017). We did not test for such an effect in the current study as we did not have an accurate measure of body weight at the time of the experiments. Weights were only taken when birds were handled (i.e. at capture), and can fluctuate by as much as 5% for great tits in captivity (G. Davidson, personal observation) and while parents are provisioning chicks.
We controlled for many potential proxies of motivation and found that they did not affect performance, but it remains possible that there are intrinsic individual differences that we did not account for (Morand-Ferron et al., 2016). Between-individual differences in animal personality may influence task performance if personality is intrinsically linked to so-called cognitive 'styles' (Sih & Del Giudice, 2012). For example, behaviours commonly attributed to the reactive–proactive animal personality axis include slow-exploring, environmentally sensitive individuals at one extreme and fast-exploring, routine-forming individuals at the other (Réale et al., 2007). These definitions have many parallels with definitions associated with human inhibition and impulsivity, including an impulsive behaviour with no forethought of consequences (Moeller et al., 2001). We found no evidence that exploratory behaviour, a common behaviour associated with the fast–slow exploration personality axis, was associated with performance on the detour task in captivity, thus excluding this as a dominant influence on cognitive performance. This is consistent with reports in black-capped chickadees, Poecile atricapillus (Guillette et al., 2017) and domestic dogs (Bray et al., 2015), but not common waxbills, Estrilda astrild (Gomes et al., 2020). We note that our measure of personality is an index of the fast–slow personality axis (Bell, 2007), and it may be that specific facets of this axis, for example responsiveness, the quality of exploration (i.e. information gathered) and neophobia, need to be measured in isolation to detect links with inhibitory control. Similarly, other kinds of personality axes, or behavioural variation in general, could play a role. Equally, we note that our measure of inhibitory control likely captures only one facet of self-regulation, and there are many others (e.g. delayed gratification; Mischel et al., 1989) that themselves may be controlled by distinct but related forms of inhibitory control we did not measure here. Despite the challenges in teasing apart different elements of inhibitory control, and cognition generally, much of the psychology literature suggests that different measures of inhibitory control are a component of a wider latent cognitive variable, such as general inhibition, executive functioning and general intelligence (Anderson & Weaver, 2009; Aron et al., 2004; Bari & Robbins, 2013), which has also been demonstrated in wild systems (Shaw et al., 2015; Ashton et al., 2018).
The field of animal cognition has made major advances in describing cognitive variation between and within species, yet obtaining unbiased and realistic estimates of cognitive variation in natural populations remains a significant challenge (Morand-Ferron et al., 2016; Rowe & Healy, 2014; Thornton et al., 2014, but see, for example, Reichert et al., 2020). Limited participation in self-administered cognitive trials in the wild potentially leads to bias towards some kinds of individuals, for example those with higher cognitive abilities (Reichert et al., 2020). By deploying a modified version of the detour task that we developed specifically for our system, where birds were compelled to visit, we were able to minimize participation bias at the population level, which can have a large impact on parameter estimation generally. Additionally, our approach limited the effects of human interference and social interactions by conducting the task at isolated locations. Finally, our results support the traditionally held view that the detour task is a reliable measure of inhibitory control, a cognitive process that is likely to be an important driver of functionally important behavioural plasticity.
While it may never be feasible to study inhibitory control as a discrete module in isolation from extraneous, integrated processes, which is true for most cognitive processes, it may not be advisable or necessary to do so when addressing questions in evolutionary ecology (Morand-Ferron & Quinn, 2015) since selection rarely acts on individual genes or traits. Overall, cognitive estimates derived from the detour task can hold value either as a stand-alone task specifically measuring inhibition or as part of a larger test battery aimed at understanding general cognitive ability.

Methods
During the habituation phase, birds were familiarized with the components of the experimental apparatus. These consisted of an opaque rectangular piece of PET plastic film cut from a mobile phone screen protector (0.04 × 10 cm and 5 cm high) covered in camouflage tape attached to the top of the box above the nest hole, a wooden perch positioned at the base of the box attached with metal mesh and wire, and an RFID-equipped nestbox front. To pass the habituation phase, individuals had to enter the nestbox at least three times before moving on to the training phase.
During the training phase, the opaque barrier was inverted so that it covered the nest hole (Fig. 1a). This phase ensured birds could perform and were familiar with the motor action of going under the barrier. Birds were visually observed from the camcorder footage, and if they went under the barrier without touching it a minimum of three times over the course of the training phase, they were advanced to the test phase the following day. During the test phase, the barrier was transparent (i.e. without camouflage tape on the plastic film; Fig. 1b). The test phase was completed once the birds had made a minimum of five attempts to enter the nest hole. These attempts, hereafter referred to as 'trials', were defined as either colliding with or touching the transparent barrier or accessing the nest hole under the barrier without touching it. A minimum of five trials was set as the criterion as this was the lowest provisioning rate (i.e. five visits within 1 h) recorded in our population (W. O'Shea, personal communication, 2016). However, some birds did not achieve five trials in a 1 h experiment, and therefore the test phase was repeated the following day. For welfare reasons associated with nest disturbance, we allowed up to 6 days of experiments across all phases.
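The trial definition above lends itself to a simple scoring sketch. It assumes, as an illustration only, that performance is the proportion of trials in which a bird passed under the barrier without touching it; the exact scoring formula is not given in this section. The function name `detour_performance`, the `first_n` parameter and the example sequence are hypothetical.

```python
def detour_performance(trials, first_n=None):
    """Proportion of successful trials in a bird's test-phase sequence.

    trials  : list of bool, True = entered under the barrier without touching
              it, False = collided with or touched the transparent barrier.
    first_n : if given, score only the first `first_n` trials (assumed here to
              mirror scoring from the first five or first 10 trials).
    """
    scored = trials if first_n is None else trials[:first_n]
    if not scored:
        raise ValueError("no trials to score")
    return sum(scored) / len(scored)


# Hypothetical bird that touched the barrier on its first two attempts.
example = [False, False, True, True, True, True, False, True, True, True]
score_first_five = detour_performance(example, first_n=5)
score_first_ten = detour_performance(example, first_n=10)
```

Scoring the same sequence at two cut-offs is the kind of comparison used to check that performance from the first five trials tracks performance from the first 10.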

Results
One of the birds for which we had repeat measures was recorded in year 1, year 2 and in captivity. To control for potential effects of memory, we reran the contextual repeatability model excluding this bird's performance in year 2. Nonadjusted contextual repeatability in performance across tasks was low but significant (R = 0.17, P < 0.001, CI = 0.13, 0.18), and remained significant when controlling for sex (adjusted R = 0.16, P < 0.001, CI = 0.12, 0.17).

We show the coordinates in decimal degrees (DD) and the main habitat type (mixed deciduous forest versus coniferous plantation) for each site. *Sites that were included in year 2. ǂA site where birds were not caught for the captive task.
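To make the repeatability statistic R reported above more concrete, the following sketch computes the classical one-way ANOVA estimator for a balanced design. This is an assumption-laden illustration and not the model fitted in the study: it treats scores as Gaussian, requires the same number of measures per bird, and gives no P value or confidence interval. The function name `anova_repeatability` and the toy scores are hypothetical.

```python
def anova_repeatability(groups):
    """ANOVA-based repeatability (intraclass correlation), balanced design.

    groups : list of lists, one inner list of k repeated scores per individual.
    Returns (MS_among - MS_within) / (MS_among + (k - 1) * MS_within).
    """
    n = len(groups)
    k = len(groups[0]) if n else 0
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 individuals with the same k >= 2 measures")
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    # Mean squares among individuals and within individuals.
    ms_among = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, means) for v in g
    ) / (n * (k - 1))
    denom = ms_among + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("repeatability is undefined when all scores are equal")
    return (ms_among - ms_within) / denom


# Hypothetical scores for four birds, each measured in two contexts.
toy_scores = [[0.8, 0.6], [0.4, 0.6], [1.0, 0.8], [0.2, 0.6]]
r_toy = anova_repeatability(toy_scores)
```

For proportion or binary data such as detour task success, variance-component methods based on generalized mixed models (see Nakagawa & Schielzeth, 2010) are generally more appropriate than this Gaussian shortcut.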