A field guide for aging passerine nestlings using growth data and predictive modeling

Accurate nestling age is valuable for studies of nesting strategies, productivity, and impacts on reproductive success. Most aging guides consist of descriptions and photographs that are time consuming to read and subjective to interpret. The Western Bluebird (Sialia mexicana) is a secondary cavity-nesting passerine that nests in coniferous and open deciduous forests. Nest box programs for cavity-nesting species have provided suitable nesting locations and opportunities to collect data on nestling growth and development. We used model training and validation to develop models that predict the age of Western Bluebird nestlings from four morphometric measurements: mass, tarsus, and two different culmen measurements. Every model predicted age to within less than 1 day on average, and each model performed best over a specific age range. The mass and tarsus models can be used to estimate the ages of Western Bluebird nestlings 0–10 days old and were accurate to within 0.5 days (mass) and 0.7 days (tarsus). The culmen models can be used to estimate the ages of nestlings 0–15 days old and were also accurate to within less than 1 day. We provide the daily mean, minimum, and maximum of each morphometric measurement so that nestling age can be estimated accurately in the field in real time. The model training and validation procedures used here demonstrate that this method can produce highly accurate aging models, and the methods can be applied to any passerine species for which sufficient nestling morphometric data are available.


Background
The nestling period is a sensitive time within the life cycle of altricial birds that strongly influences their survival and reproductive success (Langham 1972; Bryant 1978; Amiot et al. 2014). Quantitative measures of nestling growth and development are important for studying avian breeding biology and reproductive strategies (Amiot et al. 2014). Accurately aging young nestlings is challenging but important for data quality throughout the nestling period. Nest box studies often require collecting data on nestlings multiple times during the nestling period, so if the initial age estimate is not accurate, data quality issues will persist throughout the study. Because young nestlings do not yet display some obvious physical characteristics (e.g., feather tract development), photographic guides and models based on morphometric measurements aid in accurately estimating age (Bortolotti 1984a; Wails et al. 2014; Costa et al. 2020).
The ability to accurately age nestlings is an important aspect of avian ecology that yields insight into the effects of different nesting strategies on nest success (O'Connor 1978; Shaffer 2004) and helps identify impacts of environmental variables on fitness, feeding habits, growth, and reproductive success (Jongsomjit et al. 2007). Studies that quantify nest success often use models (e.g., Dinsmore et al. 2002; Shaffer 2004) that directly input nest age (determined from the nestlings within that nest) as a covariate to predict nest success, so accurate nestling age is also critical for any study employing these models. Analyses of overall trends in population demographics, and of which stages of nestling development are most affected by adverse environmental conditions, also benefit from accurately aged nestlings (Partridge and Harvey 1988; Brawn 1991). Without an accurate way to estimate age, productivity can easily be overestimated or underestimated, depending on the methodology used (Wails et al. 2014). Accurate age is also important for determining when nestlings can be banded (Murphy 1981; Costa et al. 2020). Missing these opportunities could result in incomplete and poor-quality data if birds fledge before they can be banded. Other examples of single-opportunity sampling (i.e., only one chance to obtain data) are studies that rely on taking blood samples or sexing birds at certain ages.

Open Access
Avian Research *Correspondence: audrey_a@lanl.gov 1 Environmental Stewardship, Los Alamos National Laboratory, Los Alamos, NM, USA Full list of author information is available at the end of the article

The most accurate method of aging nestlings is to check nests daily until the eggs hatch, which gives researchers a specific hatch date. However, this is not always feasible; checking a nest daily is time consuming and expensive, causes stress to the birds, and can lead to increased predation (Wails et al. 2014). Some researchers have used feather tract development and other physical characteristics, documented in photographic guides, to determine age for a variety of species (Murphy 1981; Podlesak and Blem 2002; Jongsomjit et al. 2007; Fernaz et al. 2012; Brown et al. 2013; Amiot et al. 2014; Wails et al. 2014; Costa et al. 2020). However, the use of photographs alone can be highly subjective because of differences in quality, perspective, and image scale (Bechard et al. 1985; Brown et al. 2013). There are also guides that contain written physical descriptions of nestlings for each day (Pinkowski 1975; Amiot et al. 2014). Costa et al. (2020) developed a photographic guide for aging European Bee-eaters (Merops apiaster) that is accurate to within 3 days. This guide also contains written descriptions of eye development, bill size and color, feather color and stage of development, motor coordination, and overall size for 3-day intervals throughout the nestling period. For researchers who manage large nest box networks and have crew members with varying levels of experience, reading a written description for each nestling is time consuming, and interpretation is subjective.
Morphometric data have been used to estimate the age of nestlings of various species [see Wails et al. (2014) for a list of published studies]. Linear regression that predicts age from a morphometric measurement, with the measurement as the independent variable, has been used so that researchers can age nestlings in the field (Bortolotti 1984b; Gilliland and Ankney 1992; Palacios and Anderson 2018). This method of nestling age estimation has been used for Bald Eagles (Haliaeetus leucocephalus) (Bortolotti 1984b) and California Brown Pelicans (Pelecanus occidentalis californicus) (Palacios and Anderson 2018). Validating the accuracy of a given aging method is a critical step, so that researchers know the uncertainty of their estimates (Wails et al. 2014; Costa et al. 2020). Not knowing the accuracy of age estimates could result in poor-quality data. One way to validate age estimates is to use predictive modeling and validation procedures: the model is applied to a subset of known-age test data that were withheld from the model-building dataset, and a metric such as root mean squared error quantifies how closely the model-predicted values match the known test data.
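The validation metric described above can be illustrated in a few lines. The Python below is a minimal sketch, not the authors' R code: the function name `rmse` and the five withheld ages and predictions are hypothetical values chosen only to show the arithmetic. RMSE squares each prediction error before averaging and then takes the square root, so it is expressed in the same units as age (days) and penalizes large misses more than small ones.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and known ages (days)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical withheld test data: known ages (days) and model predictions.
known_ages = [2, 4, 5, 7, 9]
predicted_ages = [2.4, 3.6, 5.0, 7.5, 8.7]

print(f"RMSE = {rmse(predicted_ages, known_ages):.2f} days")  # RMSE = 0.36 days
```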
Another method of decreasing uncertainty in age estimates is to use multiple morphometric measurements. Brown et al. (2011) compared the accuracy of inexperienced and experienced researchers using two methods of age determination: feather tract development and morphometric data. They found that age estimates were most accurate for both inexperienced and experienced researchers when more than one morphometric measurement was used. Additionally, using a combination of morphometric measurements and physical characteristics is recommended if time allows and birds are not stressed (Murphy 1981; Haggerty 1994; Podlesak and Blem 2002; Jongsomjit et al. 2007).
The Western Bluebird (Sialia mexicana) is a small territorial passerine that is widely distributed in western North America from southern British Columbia to southern Mexico (Dickinson and Leonard 1996; Keyser et al. 2004). It is a secondary cavity-nesting songbird that inhabits coniferous and open deciduous forests. Ponderosa pine (Pinus ponderosa) forests provide nest cavities and low perches for insect hunting and constitute one of the Western Bluebird's typical habitats (Kozma and Kroll 2010). Western Bluebirds also live and nest in pinyon-juniper woodlands composed of pinyon pine (Pinus edulis) and juniper (Juniperus sp.) trees.
Bluebirds also use nest boxes for breeding when there is competition for cavities (Brawn and Balda 1988; Brawn 1991) or when breeding habitat has been lost (Keyser et al. 2004). Nest box programs not only provide suitable nesting locations but also provide a mechanism for data collection and a way to monitor and evaluate environmental and anthropogenic impacts on populations over long time periods (Musgrave et al. 2019; Wysner et al. 2019).
Despite various nest box studies on Western Bluebirds (Brawn 1991; Wang and Weathers 2009; Fair et al. 2010), no nestling aging guides exist for this species. There is an image-based guide for aging Eastern Bluebird (S. sialis) nestlings (Pinkowski 1975), a non-peer-reviewed guide describing the general development of all bluebird species across different age ranges (Graham 2006), and an assessment testing the accuracy of three photographic guides for aging Eastern Bluebird nestlings (Wilkins and Brown 2015).
Here, we developed methods to age Western Bluebird nestlings using data collected from nest boxes over three breeding seasons in northern New Mexico. Our specific objectives were to create linear regression models built from morphometric data and present measurement ranges for tarsus, mass, and two different culmen measurements for each nestling age (0-21 days) to accurately age nestlings. Our regression approach included using multiple morphometric measurements to develop a series of validated predictive models that can be used to calculate age from measurements taken in the field. We provide information on how to use our data for researchers working with Western Bluebirds and how these methods can be applied to other passerine species using R code that is shared on a public data repository.

Study site
Our field work was conducted using an existing nest box network at sites on and around Los Alamos National Laboratory in Los Alamos County, New Mexico. Los Alamos County is located on the eastern flanks of the Jemez Mountains in north-central New Mexico, approximately 2200 m in elevation, on the Pajarito Plateau. The landscape consists of narrow mesas separated by steep-sided canyons. The dominant habitat type is dependent on elevation but there are four major habitat types in Los Alamos County: mixed conifer forest, ponderosa pine forest, pinyon-juniper woodland, and juniper grasslands (Fair and Myers 2002).

Field work
Nest boxes were placed on trees in locations dominated by ponderosa pine. A total of 14 nests over three breeding seasons (2015-2017) were monitored in this study. The number of nestlings studied varied by year: 10 in 2015, 26 in 2016, and 27 in 2017. We checked nest boxes every 2 weeks beginning in April until a clutch of eggs was discovered. Following egg discovery, we monitored those nest boxes daily to ensure that the hatch day was recorded. Western Bluebirds generally lay one egg per day, averaging five eggs per clutch (Brawn 1991), and do not begin incubation until all eggs are laid (Guinan et al. 2020). Hatching can still be asynchronous on a much finer timescale (Ferree et al. 2010). However, we could not monitor the nest boxes hourly to determine the exact hatch order, so we assumed that all eggs hatched within a 24 h period.
Beginning with the hatch day (day 0) through fledging, we collected morphometric data daily for each nestling between 10:00 and 12:00 to ensure that a constant interval of development was recorded for each day. Two researchers were present for all measurements over all 3 years to limit observer bias. We weighed each nestling using a digital scale to the nearest tenth of a gram. Tarsus length and culmen length were also measured daily. We used digital calipers to measure the length of the tarsus and culmen to the nearest hundredth of a millimeter. Tarsus was measured from the tibiotarsal joint to the distal end of the last leg scale (Jongsomjit et al. 2007).
We used two different methods for measuring culmen. In 2015, we measured the total culmen length, from the tip of the bill to the edge of the skull (Baldwin et al. 1931; Winker 1998). In 2016 and 2017, we measured the culmen from the tip of the bill to the distal side of the nares (Baldwin et al. 1931; Borras et al. 2000). The two culmen measurements resulted from two different field crews interpreting the measurement protocol differently.

Statistical analysis
All statistical analyses were performed using R version 3.5.1 (R Core Team 2018). For all three morphometric growth measurements (mass, tarsus, and culmen), we developed models for age prediction using linear mixed models (LMM), model training, and model validation. We used the 'lme4' package (Bates et al. 2015) for the LMMs. P-values presented for the mixed models were derived from Satterthwaite's degrees of freedom method using the 'lmerTest' package (Kuznetsova et al. 2017). The 95% confidence intervals are presented for each coefficient and were calculated using the confint function. We used the package 'caret' (Kuhn 2020) for splitting data that were used for model training and validation. Data were plotted using the 'ggplot2' package (Wickham 2009). The data analysis procedure is described below.
First, we plotted each measurement variable (mass, tarsus, and culmen) against nestling age and visually assessed the plots to determine the age ranges over which predictions would be most accurate and informative. Mass and tarsus predicted age less well after day 10 than from day 0 to day 10 because variation increased after day 10 (Fig. 1a, b). Therefore, mass and tarsus were used to predict nestling age only up to day 10. As expected, the two culmen measurements differed from each other and could not be used in the same predictive model (Fig. 1c). Specifically, 2015 differs from both 2016 and 2017 because of the two different measurement methods described previously. Therefore, we completed model training and validation separately for the two culmen measurements. We also determined that culmen remained a reliable predictor through day 15. For 2015, age became slightly harder to predict after nestlings reached 15 days old, based on visual assessment of the greater variation in culmen length at a given age. Initial analyses also suggested that models were much less accurate if the full nestling period (0-21 days old) was included. Additionally, there are no data for 2017 after day 15 (Fig. 1c). Therefore, both culmen datasets were used to predict nestling age only up to day 15.
Mass, tarsus, and culmen were highly correlated (Pearson; mass-tarsus: r = 0.98, p < 0.001, n = 698; mass-culmen: r = 0.87, p < 0.001, n = 698; tarsus-culmen: r = 0.85, p < 0.001, n = 698) because all three measurements increase as nestlings get older. To avoid multicollinearity in a multiple regression model, we created a separate model for each measurement variable. We pooled all 3 years for mass and tarsus to obtain more general models, that is, models less likely to overfit the data from any single year. Culmen was again separated into two datasets (2015 and 2016-2017), which gave us four measurement variables for the modeling procedure described next.
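The reported correlations can be translated into a rough measure of how much multicollinearity a two-predictor model would suffer. For a model with exactly two predictors, the variance inflation factor (VIF) of each coefficient is 1 / (1 - r²). The Python sketch below is an illustration we added, not part of the original analysis: `pearson_r` computes r from paired raw values, and `vif_two_predictors` converts the r values reported above into VIFs. Common rules of thumb treat VIF values above roughly 5 to 10 as problematic, a threshold the mass-tarsus pair (r = 0.98) exceeds by a wide margin.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two correlated predictors."""
    if abs(r) >= 1:
        raise ValueError("VIF is undefined for perfectly correlated predictors")
    return 1.0 / (1.0 - r ** 2)


# Correlations reported in the text for pairs of growth measurements.
reported = {"mass-tarsus": 0.98, "mass-culmen": 0.87, "tarsus-culmen": 0.85}
for pair, r in reported.items():
    print(f"{pair}: VIF = {vif_two_predictors(r):.1f}")
# Prints:
# mass-tarsus: VIF = 25.3
# mass-culmen: VIF = 4.1
# tarsus-culmen: VIF = 3.6
```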
We created four LMMs with nestling ID as a random effect to account for repeated measures of the same nestling over the course of the nestling period. The fixed effect in each model was the measurement variable (mass, tarsus, culmen 2015, or culmen 2016-2017). Models were fit using restricted maximum likelihood. The following model training and validation procedure was applied to each measurement variable. We randomly split the data into two separate datasets: a model training dataset containing 90% of the original data and a model test dataset containing 10% of the original data. Given our sample sizes, we wanted to use enough data in model training to minimize the error surrounding the estimates while still keeping enough points to test the models.
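A random 90/10 split like the one described above can be sketched as follows. The authors used the R package 'caret' for this step; the Python below only illustrates the idea, and the function name, seed, and the 10 nestlings x 11 daily records are hypothetical. Because this sketch draws rows at random, measurements of the same nestling can land in both the training and test sets; holding out whole nestlings is a stricter alternative when repeated measures are a concern.

```python
import random


def split_train_test(records, test_fraction=0.10, seed=1):
    """Randomly assign records to a training set and a held-out test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    order = list(range(len(records)))
    rng.shuffle(order)
    n_test = max(1, round(len(records) * test_fraction))
    test_idx = set(order[:n_test])
    train = [rec for i, rec in enumerate(records) if i not in test_idx]
    test = [rec for i, rec in enumerate(records) if i in test_idx]
    return train, test


# Hypothetical records: (nestling_id, age_in_days) for 10 nestlings aged 0-10 days.
records = [(f"N{n:02d}", age) for n in range(10) for age in range(11)]
train, test = split_train_test(records)
print(len(train), len(test))  # 99 11
```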
The mixed model, with the structure described above, was fit to the training dataset using the lmer function in the 'lme4' package. Because some relationships appeared curvilinear when graphed, we fit two models for each variable, one linear and one quadratic (eight models total). For each model, the test dataset was used to validate the model derived from the training dataset by making predictions from the mixed models. We used the 'predict' function in R to predict the age (in days) of nestlings from the individual nestling measurement. This resulted in a predicted age and an actual age (from the test data) for each nestling. For each model, we calculated the root mean square error (RMSE). RMSE is reported in days and measures how much the predicted ages differed, on average, from the actual ages. We chose between the linear and quadratic models using RMSE, because it measures how well a model predicts nestling age. If RMSE dropped substantially (by more than 10%) from the linear to the quadratic model, the quadratic model was used as the final model on which nestling age predictions were based. These final models were plotted with the best-fit line, the actual ages (test data), and the predicted ages. Equations were obtained from the model coefficients and the y-intercept. The R² values were obtained using the r.squaredGLMM function in the 'MuMIn' package (Barton 2019).
Fig. 1 Initial mass (a), tarsus (b), and culmen (c) measurements collected from nestlings 0-21 days of age. For the final models, mass (a) and tarsus (b) were analyzed from 0 to 10 days because variation increased after day 10 and the predictive power of the models was reduced. Culmen (c) was analyzed from 0 to 15 days because variation increased after day 15 in 2015 and data were not collected past day 15 in 2017. Red dashed lines show the age below which data were used for model development. In all three panels, points are jittered to show the spread of the data.
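The rule for choosing between the linear and quadratic models reduces to a relative change in test RMSE. The function below is a minimal Python sketch of that rule as described in the statistical analysis (keep the quadratic model only when it lowers RMSE by more than 10%); the function name and the RMSE values in the example are hypothetical and are not the Table 1 results.

```python
def choose_model(rmse_linear, rmse_quadratic, min_relative_drop=0.10):
    """Return "quadratic" only if it lowers test RMSE by more than min_relative_drop."""
    if rmse_linear <= 0:
        raise ValueError("linear-model RMSE must be positive")
    relative_drop = (rmse_linear - rmse_quadratic) / rmse_linear
    return "quadratic" if relative_drop > min_relative_drop else "linear"


# Hypothetical RMSE values (days) for two measurement variables.
print(choose_model(0.45, 0.44))  # linear (about a 2% drop)
print(choose_model(0.90, 0.60))  # quadratic (about a 33% drop)
```

A quadratic fit that only matches the linear one, or performs worse, falls back to the simpler linear model.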
The code and documentation for data processing, model training, and model validation can be found on GitHub (https://github.com/ChanLand/nestling_age_model). This code can be used to conduct these analyses for nestlings of any other bird species, provided sufficient data are available on known ages and the corresponding measurement(s) of interest.

Results
The results of the model selection process (here, based on RMSE) used to determine whether the linear or quadratic model should be used for predicting nestling age are presented for each development variable (Table 1). For mass and tarsus, the quadratic model did not improve predictive power based on RMSE values (Table 1). Therefore, we chose the linear mixed model as the final model for predictions from mass and tarsus. Mass had an RMSE of 0.43 days, which means that age predicted from mass is accurate, on average, to within half a day. Tarsus is accurate to within 0.68 days. For both culmen measurements, RMSE decreased from the linear to the quadratic model (Table 1). Because predictive power improved, we chose the quadratic model as the final model. Culmen 2015 is accurate to within 0.59 days and culmen 2016-2017 is accurate to within 0.70 days.
The model validation is shown graphically in Fig. 2. The full dataset used is shown for mass (Fig. 2a), tarsus (Fig. 2b), culmen 2015 (Fig. 2c), and culmen 2016-2017 (Fig. 2d). In each panel, the full dataset is split into the training data (gray points) and the test data (purple points). The green points are the predicted nestling ages from the final mixed models (described above) using the nestling measurement variables from the test data. Each green point is connected to the corresponding purple point (actual nestling age) by a vertical black line. The average difference between the predicted age and the actual age is measured by the RMSE (Table 1).
The output of the final models (linear for mass and tarsus, quadratic for both culmen measurements) is presented in Table 2, along with an equation derived from each model's coefficients. These equations can be used with measurement data for each variable to estimate the age of nestlings of unknown age. Additional file 1: Table S1 contains the mean, minimum, and maximum values of mass, tarsus, and both culmen measurements for each nestling age from 0 to 21 days old.
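Applying a Table 2 equation in the field amounts to evaluating a polynomial in the measurement and refusing results that fall outside the validated age range. The Python sketch below illustrates that idea. The class name, the half-day tolerance, and the coefficients in the example are hypothetical placeholders, not the published Table 2 values, which should be substituted before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeEquation:
    """Age (days) = b0 + b1 * x + b2 * x**2; set b2 = 0 for a linear model.

    The coefficients used below are HYPOTHETICAL; substitute the values from Table 2.
    """

    b0: float
    b1: float
    b2: float
    max_age: float  # upper end of the validated age range (10 for mass/tarsus, 15 for culmen)
    tolerance: float = 0.5  # hypothetical slack (days) allowed around the range ends

    def predict(self, x: float) -> float:
        age = self.b0 + self.b1 * x + self.b2 * x ** 2
        if age < -self.tolerance or age > self.max_age + self.tolerance:
            raise ValueError(
                f"predicted age {age:.1f} d is outside the validated range 0-{self.max_age:g} d"
            )
        return max(age, 0.0)  # small negative predictions are reported as hatch day (0)


# Hypothetical linear mass equation: age = -1.0 + 0.5 * mass (g).
mass_equation = AgeEquation(b0=-1.0, b1=0.5, b2=0.0, max_age=10)
print(mass_equation.predict(10.0))  # 4.0
```

Raising an error instead of silently extrapolating reflects the recommendation not to use the equations beyond their valid age ranges.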

Discussion
The most accurate nestling age would result from checking a nest daily until eggs hatch to determine a hatch date; however, that is not always feasible. Checking a nest daily is time consuming and expensive, can cause stress to the birds, and can lead to increased predation (Wails et al. 2014). In this study, we developed a set of field tools designed to improve age estimates of Western Bluebird nestlings for use by researchers of varying levels of experience without having to determine the exact hatch date. We transformed 3 years of morphometric field data into predictive models for estimating nestling age. All of the measurement variables we used to develop our models are accurate to within less than a day: mass is accurate to within 0.43 days, tarsus to within 0.68 days, culmen 2015 to within 0.59 days, and culmen 2016-2017 to within 0.70 days. Our high-accuracy, species-specific guide is a resource for ensuring data quality in research on the Western Bluebird. The novelty of our study is that the model training and validation methods used here can be applied to any other passerine species, provided enough accurate daily growth data are collected to develop robust growth models. In addition, assigning process-based age estimates ensures that all field crews within a study, and across different studies, age birds in exactly the same way, which reduces the subjectivity of the estimates. This results in more robust comparisons between studies of a given species.
Growth curves of most passerines follow a logistic pattern characterized by an inflection point that occurs when 50% of the asymptote is attained (Pinkowski 1975; Starck and Ricklefs 1998). Traditional growth models for avian species are presented with age as the independent variable (Holcomb and Twiest 1971; Lyons and Mosher 1983; Bortolotti 1984a; Bechard 1985; Carlsson and Hörnfeldt 1994; Rodway 1997) to demonstrate the logistic growth pattern.
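The logistic pattern also suggests why a measurement loses predictive power late in the nestling period. Assume the standard logistic form W(t) = A / (1 + exp(-k(t - t_i))), where A is the asymptote, k is a growth-rate constant, and t_i is the age at the inflection point, where W = A/2. Age can then be recovered as t = t_i - ln(A/W - 1)/k, and the change in age per unit change in size is A / (k W (A - W)). That sensitivity is smallest at the inflection point and grows without bound as W approaches the asymptote, so a small measurement error near the plateau translates into a large age error. The Python sketch below illustrates this with hypothetical parameters (A = 30 g, k = 0.5 per day, t_i = 6 days); it is not the model fitted in this study, which used linear and quadratic mixed models with age as the response.

```python
import math


def logistic_size(t, asymptote, k, t_inflection):
    """Size at age t (days) under a logistic growth curve."""
    return asymptote / (1.0 + math.exp(-k * (t - t_inflection)))


def age_from_size(w, asymptote, k, t_inflection):
    """Invert the logistic curve: the age at which size w is reached."""
    if not 0 < w < asymptote:
        raise ValueError("size must lie strictly between 0 and the asymptote")
    return t_inflection - math.log(asymptote / w - 1.0) / k


def days_per_unit_size(w, asymptote, k):
    """|dt/dW| = A / (k * W * (A - W)): age change per unit of size change."""
    return asymptote / (k * w * (asymptote - w))


# Hypothetical parameters: asymptote 30 g, k = 0.5 per day, inflection at day 6.
A, K, T_I = 30.0, 0.5, 6.0
for mass in (15.0, 25.0, 29.0):
    print(f"{mass:4.1f} g -> age {age_from_size(mass, A, K, T_I):5.2f} d, "
          f"{days_per_unit_size(mass, A, K):.2f} d per g")
```

In this hypothetical example, the age change per gram is close to eight times larger at 29 g than at the 15 g inflection point.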
The predictive models we developed treat age as the dependent variable, determined from morphometric field measurements, because age is the variable we are predicting. By first graphing the dataset, we were able to visually assess the age range over which each morphometric measurement was accurate. Model validation then provided metrics of accuracy for each measurement variable.
Young nestlings do not yet display some obvious physical characteristics used for aging (e.g., feather tract development). Therefore, growth models can aid in capturing data on early life stages. Reliable methods for estimating nestling age are important for investigating ecological and behavioral aspects of a species (Amiot et al. 2014).
Fig. 2 Purple points are the test data (actual ages), while the green points are the ages predicted from the final model. The predicted age from the model is connected to the actual age by the vertical black lines. The black lines are the regression lines and are linear for mass and tarsus and quadratic for both culmen measurements. Shaded blue areas represent the 95% confidence intervals. In all four panels, points are jittered to show the spread of the data.
The reliability of our results depends on normal variation among individuals and may differ between growth parameters. In addition, factors such as asynchronous hatching on a finer timescale could add some variation between nestlings. Western Bluebirds are facultative cooperative breeders, with a small proportion of breeding pairs having helper males at the nest (Duckworth and Badyaev 2007; Potticary and Duckworth 2018). We had no way to identify helpers in this study, but they should be considered in future work, especially for species that use this tactic more often, such as the Acorn Woodpecker (Melanerpes formicivorus) (Koenig et al. 2011). Single morphometric measurements can be used to predict nestling age with high accuracy. However, combining two or more measurements may increase the accuracy of age prediction.
The two different culmen measurements show the importance of choosing a measurement technique, and of training field crews, before collecting measurements to estimate age. Misinterpretation of measurement techniques by different field crews can result in inaccurate age predictions. We provide models for both culmen measurements so that either technique can be used to predict nestling age. However, we emphasize that researchers should pick one technique for a given field season, to avoid erroneous measurements and age estimates, and should document the method used.
If researchers plan to use our data for other Western Bluebird populations, we recommend that they test the models developed in this study with a few nestlings of known age before relying exclusively on our models, because passerine nestlings can vary in their developmental patterns both within and among geographic locations (Ardia 2006). The data ranges in Additional file 1: Table S1 should be used to aid in age determination in the field. If researchers plan to use these techniques to develop their own models for other species, we recommend gathering morphometric data on individual nestlings for which the hatch date is known or can be inferred from frequent visits to the nest. Measurements should be taken on at least 30 nestlings, ideally over at least two field seasons to account for potential variation between years. Variation can arise from multiple factors including, but not limited to, environmental influences, researcher bias, sibling competition, hatch order if eggs hatch asynchronously, and whether helpers assist with nest provisioning. Nestlings should be measured every day from day 0 until they fledge. A greater number of nestlings in the dataset increases the accuracy of the models in predicting age from the morphometric measurements. Once the morphometric data are collected, researchers should use the R code and documentation for data processing, model training, and model validation on GitHub referenced in the methods.
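Using the daily ranges in Additional file 1: Table S1 amounts to finding the ages whose minimum-maximum interval contains a field measurement and, when several measurements are taken on the same nestling, keeping only the ages consistent with all of them. The Python sketch below shows that logic. The range values are hypothetical placeholders rather than the Table S1 data, and intersecting candidate ages is one simple way to combine measurements, not a procedure evaluated in this study.

```python
def candidate_ages(value, daily_ranges):
    """Ages (days) whose observed [min, max] interval contains the measurement."""
    return sorted(age for age, (low, high) in daily_ranges.items() if low <= value <= high)


def consistent_ages(measurements, tables):
    """Ages consistent with every measurement taken on the same nestling."""
    age_sets = [set(candidate_ages(value, tables[name])) for name, value in measurements.items()]
    return sorted(set.intersection(*age_sets)) if age_sets else []


# HYPOTHETICAL daily (min, max) ranges; replace with the values from Table S1.
tables = {
    "mass_g": {0: (2.0, 3.5), 1: (3.0, 5.0), 2: (4.5, 7.0), 3: (6.5, 9.5)},
    "tarsus_mm": {0: (5.0, 6.5), 1: (6.0, 8.0), 2: (7.5, 10.0), 3: (9.5, 12.5)},
}

print(candidate_ages(4.8, tables["mass_g"]))                       # [1, 2]
print(candidate_ages(9.8, tables["tarsus_mm"]))                    # [2, 3]
print(consistent_ages({"mass_g": 4.8, "tarsus_mm": 9.8}, tables))  # [2]
```

An empty result signals that the measurements disagree, which may point to a measurement error or to a nestling outside the range of the reference data.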

Conclusions
All measurement variables were accurate to within less than a day, and mass was accurate to within half a day. We discourage use of the equations beyond the valid age ranges shown in Table 2. The models for calculating age from tarsus and mass are valid only for nestlings 10 days old and younger, because these metrics begin to plateau at this point, so later changes are most likely due to observer bias (especially for tarsus) and variation in parental feeding success (mass). Similar patterns are seen in the culmen measurements, which makes them less reliable for estimating the age of nestlings older than 15 days. Once a given measurement is not reliable, the range of measurements for 0-21 days of age provided (Additional