Paired sampling standardizes point count data from humans and acoustic recorders

Acoustic recordings are increasingly used to quantify occupancy and abundance in avian monitoring and research. The recent development of relatively inexpensive programmable autonomous recording units (ARUs) has further increased the utility of acoustic recording technologies. Despite their potential advantages, persistent questions remain as to how comparable data are between ARUs and traditional (human observer) point counts. We suggest that differences in counts obtained from ARUs versus human observers primarily stem from differences in the effective detection radius of humans (EDRH) versus ARUs (EDRA). We describe how paired sampling can be used in conjunction with generalized linear (GLM) or generalized linear mixed models (GLMM) to estimate correction factors (δ) to remove biases between ARUs and traditional point counts. Furthermore, if human observers conduct distance estimation, we show that density estimates can be derived from single ARUs by estimating EDRA as a function of EDRH and δ, thus providing alternatives to more complicated and expensive approaches. We demonstrate our approach using data from 363 point count stations in 105 unique boreal study sites at which field staff conducted point count surveys that were simultaneously recorded by an ARU and later transcribed in the lab. Finally, we used repeated random subsampling of the data to split the data into model creation (70%) and validation (30%) subsets to iteratively estimate δ and validate density estimates from ARUs against densities calculated from human observers at the same independent validation locations. We modeled density of 35 species of boreal forest birds and show that incorporating δ in statistical offsets successfully removes systematic biases in estimated avian counts and/or density between human and ARU derived surveys. 
Our method is therefore easily implemented and will facilitate the integration of ARU and human observer point count data, enabling expanded monitoring efforts and meta-analyses with historic point count data.

Standardization of point count data from paired samples of observers and acoustic recordings

RÉSUMÉ. Acoustic recordings are increasingly used to quantify occurrence and abundance in avian monitoring and research. The recent development of programmable and relatively inexpensive autonomous recording units (ARUs) has increased the utility of acoustic recording technologies. Despite their possible advantages, questions remain about the comparability of data from traditional (human observer) point counts and from ARUs. We propose that differences between counts from ARUs and counts from observers stem from differences in the effective detection radius (EDR) of humans (EDRH) compared with that of ARUs (EDRA). We describe how paired sampling can be used together with generalized linear models or generalized linear mixed models to estimate correction factors (δ) that remove biases between traditional point counts and those from ARUs. Furthermore, if observers estimate distances, we show how density estimates can be derived from single ARUs by estimating EDRA as a function of EDRH and δ, thus providing an alternative to more complex and costly approaches. We demonstrate our approach using data from 363 point count stations located in 105 boreal study sites, where field staff conducted point counts that were simultaneously recorded by an ARU and later transcribed in the laboratory. Finally, we used repeated random subsampling to split the data into subsets for model creation (70%) and validation (30%) in order to iteratively estimate δ and to validate ARU-based density estimates against those calculated from observer data at the same independent validation sites. We modeled the density of 35 boreal forest bird species and show that incorporating δ in statistical offsets correctly removes systematic biases in estimated avian counts and/or densities between surveys conducted by observers and those obtained from ARUs. Our method is simple to use, facilitates the integration of point count data from observers and ARUs, and will allow expanded monitoring efforts and meta-analyses with historic point count data.


INTRODUCTION
Long-term monitoring programs such as the North American Breeding Bird Survey (hereafter BBS; Sauer et al. 2014) and the Christmas Bird Count (Link et al. 2006) provide crucial data that help to guide species status assessments and conservation efforts (Downes et al. 2016). Despite the success of those monitoring programs, many species are insufficiently monitored because of large geographic gaps in avian monitoring programs (Sauer et al. 2003, Francis et al. 2009, Machtans et al. 2014), or because the species' life history traits, e.g., crepuscular/nocturnal behavior, make them hard to monitor using traditional census methods (Goyette et al. 2011, Zwart et al. 2014). Many of the gaps in monitoring efforts exist in regions, e.g., boreal forest, or habitats where poor access and logistical constraints, high cost, and a lack of skilled observers have hindered monitoring efforts (Sauer et al. 2003, Francis et al. 2009, Machtans et al. 2014). Thus, it would be beneficial to augment monitoring efforts with alternative methods for poorly sampled species, habitats, and regions to better guide species assessments and conservation prioritization, and to test hypotheses about causes of population change.
One potential solution to monitor bird populations in regions and habitats in which it is difficult to send skilled observers is to augment human observers with stereo recordings (Hobson et al. 2002, Francis et al. 2009, Klingbeil and Willig 2015). This approach is a plausible solution because most avian monitoring and research programs collect count data using point count surveys that primarily rely upon detection of acoustic cues (Hobson et al. 2002, Blumstein et al. 2011, Matsuoka et al. 2014). Indeed, several comparisons of data from stereo recordings against human observers in the field suggest that recordings provide relatively similar estimates of species abundance and community composition (Hobson et al. 2002, Blumstein et al. 2011, Venier et al. 2012, Klingbeil and Willig 2015).
Although many comparisons of acoustic recordings with point counts conducted by human observers suggest the data are generally comparable, subtle differences nonetheless do exist (Hutto and Stutzman 2009, Venier et al. 2012). For example, Hutto and Stutzman (2009) found that significantly more species were detected using point counts than acoustic recordings and speculated that this may in part be due to differences in the radius over which species were audible between methods. In addition to potential differences between a given recording system and human observers, previous research has also shown variable species detection between different recording systems (Rempel et al. 2013). Thus, broader incorporation of ARU-based data into monitoring programs may require correcting for differential detectability between ARUs and human observers in the field (Sidie-Slettedahl et al. 2015). We describe a sampling design and analysis framework to relate counts from ARUs to traditional point count data and derive estimated bird densities for both by correcting data for biases in species availability and perceptibility following earlier developments by Bart and Earnst (2002) and Sólymos et al. (2013). Development of this method will allow for efficient and cost-effective augmentation of acoustic monitoring programs with ARU technology for species and regions with poor survey coverage.
We develop a framework to use simultaneously conducted human point count and ARU surveys to estimate statistical offsets to adjust for systematic differences between counts conducted by humans versus ARUs (following Sólymos et al. 2013). We use field data to test whether our approach removes bias in ARU-based counts relative to densities estimated from field observers conducting point counts using both distance estimation and time removal sampling. Our proposed approach to correcting counts between ARUs and humans would be most easily applied if the statistical offsets do not vary with other factors affecting detection, and we therefore tested whether the offsets differed by habitat or with environmental noise. Based on the literature and field experience, we hypothesized that the ratio of counts from ARUs to counts from human observers conducted at the same time and location would be < 1, on the assumption that the detection radius of the ARU would be less than that of a human observer. We also hypothesized that the same ratio would be smaller when recordings were made in deciduous vs. other forest types and/or in windy conditions because microphones tend to amplify leaf rustle (B. Turnbull, personal communication).

METHODS

Avian sampling
We conducted our study in the boreal forest of Saskatchewan, Canada. Study sites were located in Bird Conservation Region 6 (Boreal Plains Ecozone) and Bird Conservation Region 8 (Boreal Shield Ecozone) between 53° 34' N, 103° 43' W and 58° 08' N, 109° 28' W (Fig. 1) in the summers of 2014 and 2015. Surveys were conducted at 363 unique point count stations distributed among 105 study sites, each of which constituted a unique forest stand. We sampled between 1 and 12 point count locations per study site (median = 3). Effort was approximately equally distributed between years (n = 205 in 2014 and n = 200 in 2015), with 42 point count stations in 9 separate study sites surveyed in both years. Based on the 250 m resolution Land Cover Map of Canada 2005 (LCC05; Latifovic et al. 2008), the most frequently sampled land cover classes were open coniferous (29.2%), closed mature mixed (16.3%), and open mature deciduous (11.6%) forest, with the remainder of the point count samples distributed among 13 other land cover classes (Table 1).
All surveys were conducted by one of five observers between 15 minutes prior to sunrise and 4.5 hours after sunrise from 1 to 29 June. Upon arriving at the point count station, observers attached a Song Meter SM2+ ARU with a pair of SMX-II microphones (Wildlife Acoustics Inc. ©, Maynard, MA) to the nearest tree at approximately head height and began a manual recording. ARUs were programmed to record in stereo in wav file format, using a sampling rate of 44,100 samples per second and factory default acoustic gain settings for the microphone preamplifier. The observer stood approximately 3 m from the ARU to avoid introducing extraneous noise into the recordings. Observers then announced when they began and ended a simultaneous 10-minute point count to ensure data from the subsequent transcription of ARU recordings were collected over the identical time frame to point counts conducted by the human observers. Our point count protocol followed that recommended in Matsuoka et al. (2014). In brief, field observers placed observed or acoustically detected individuals into one of three distance bins (0-50, 50-100, > 100 m) while conducting point counts. Observers were trained in distance estimation prior to field work and opportunistically ground-truthed distance estimates using GPS units to estimate distances between point count centroids and birds heard or observed while walking between point count locations. To account for differences in availability, observers additionally coded observations to the time interval (0-3, 3-5, and 5-10 minutes) of initial detection, thus treating any subsequent detections of the same individual as though it were "removed" from the population (Farnsworth et al. 2002).
To avoid introducing additional observer biases, ARU recordings were transcribed by the same observer that conducted the field count. During transcriptions, each acoustically identified individual was coded into one of ten 1-minute time periods (0-1, 1-2, through 9-10 minutes, respectively) to facilitate estimation of availability using a count-removal approach. ARU based intervals were subsequently collapsed to the 0-3, 3-5, and 5-10 minute intervals to match the human observer-based design.
Transcribers were not privy to field notes during transcription, and transcription was conducted after the field season. Unlike counts conducted under field conditions, transcribers were allowed to pause and/or rewind the recording, e.g., to confirm identification, as is frequently done in data transcription. Finally, transcribers categorized environmental noise recorded from the ARU data for each point count using a five point scale (1 = none, 2 = light, 3 = moderate, 4 = heavy, and 5 = excessive).
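The collapsing of 1-minute transcription periods into the three removal intervals can be sketched as follows. This is only an illustration under assumed conventions: the function names and the zero-based minute index are hypothetical and are not taken from the authors' workflow.

```python
from collections import Counter

# Removal intervals used by the human observers, with their upper bounds in minutes.
INTERVAL_LABELS = ["0-3", "3-5", "5-10"]
INTERVAL_UPPER = [3, 5, 10]

def collapse_minute(minute_index):
    """Map a 1-minute period (0 = minute 0-1, ..., 9 = minute 9-10) to its removal interval."""
    if not 0 <= minute_index <= 9:
        raise ValueError("minute_index must be between 0 and 9")
    for label, upper in zip(INTERVAL_LABELS, INTERVAL_UPPER):
        if minute_index < upper:
            return label

def removal_counts(first_detection_minutes):
    """Count individuals by the interval in which each was first detected."""
    tally = Counter(collapse_minute(m) for m in first_detection_minutes)
    return [tally.get(label, 0) for label in INTERVAL_LABELS]

# Hypothetical transcription: minute of first detection for six individuals.
example = [0, 0, 2, 3, 6, 9]
print(removal_counts(example))  # [3, 1, 2]
```

Only the first detection of each individual is tallied, which is what allows the counts to be treated as removal data.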
Framework

Sólymos et al. (2013) previously demonstrated that point count data can be adjusted for differences in field methodologies if the data include information on the time (Farnsworth et al. 2002, Sólymos et al. 2013) and distance (Matsuoka et al. 2012, Sólymos et al. 2013) intervals in which the individuals were first heard. These extra data allow the application of removal (Farnsworth et al. 2002, Sólymos et al. 2013) and distance modeling (Buckland et al. 2001, Matsuoka et al. 2012) to estimate components of detection probability. Specifically, removal or time-of-detection methods allow the estimation of the probability that an individual bird present at the time of survey gave a visual or auditory cue and was therefore available for detection, i.e., availability (p), while distance sampling allows estimation of the probability that the available birds were detected (perceptibility [q]) given that they were available (Alldredge et al. 2007a, Nichols et al. 2009). The two components of the observation process can be estimated independent of each other using conditional maximum likelihood estimation (see Appendix in Sólymos et al. 2013). Sólymos et al. (2013) established that incorporating the components of detection probability as statistical offsets in generalized linear (GLM) or generalized linear mixed effects (GLMM) models effectively adjusts count data for differences in point count methodology. The offset based method of Sólymos et al. (2013) forms the basis of our approach to placing ARUs and human observers on a similar footing, but assumes we can approximate both components of detection for both humans and ARUs.
Obtaining p for both ARUs and humans is simply a matter of removal sampling (Farnsworth et al. 2002, Sólymos et al. 2013) or employing time-of-detection methods (Alldredge et al. 2007a, b). The use of Global Positioning System (GPS) synchronized ARU arrays allows distance to sound source to be directly estimated via differences in timing of sound arrival to linked ARUs (Dawson and Efford 2009, Mennill et al. 2012). The use of synchronized ARU arrays can provide accurate and precise estimates of density (Dawson and Efford 2009, Mennill et al. 2012), but is expensive owing to the need for many ARUs spaced over small distances, e.g., ~30 m (Mennill et al. 2012). To reduce costs, it would therefore be advantageous to devise methods of estimating distance-related detection error for single ARUs sampled with an unknown effective detection radius (hereafter EDR). Thus, we require a method to indirectly estimate detectability for single ARUs to apply the methods of Sólymos et al. (2013).

Avian Conservation and Ecology 12(1): 13 http://www.ace-eco.org/vol12/iss1/art13/
For a count conducted by a human observer (H), the expected value of a count for a single species from a point count survey can be expressed as:

$$E[Y_H] = N\,p(t_j)\,q(r_k) = D\,A_H\,p(t_j)\,q(r_k) \tag{1}$$

where $Y_H$ is the count, $N$ is the species' abundance, $D$ is the point level density (per unit area), $A_H$ is the area sampled, $p(t_j)$ is the probability of an individual singing (and being detected) at least once during the cumulative duration of the count ($t_j$) given that it is present to be detected ($j = 1, \ldots, J$; the number of time intervals), and $q(r_k)$ is the probability that an individual bird within point count radius ($r_k$) is detected given that it provides a cue, e.g., song ($k = 1, \ldots, K$; the number of distance intervals). Although the area sampled ($A_H$) is typically unknown, it can be estimated via distance sampling, for example using binomial or multinomial distance estimators to estimate the effective detection radius (EDR, denoted here as τ) assuming perfect detectability (q = 1) within this effective distance:

$$E[Y_H] = D\,\pi\,\tau_H^2\,p(t_j) \tag{2}$$

The simplest approach to determine the relationship between counts from human observers and those from ARUs is to conduct paired sampling (or "double sampling" sensu Bart and Earnst 2002). If we simultaneously use an ARU (A) to record the same acoustic environment in which a human observer (H) is conducting a point count, the population density to which both are exposed is identical by design; i.e., $D = D_H = D_A$. As a result, if all else is equal then differences in the observed counts from ARUs and human observers should be primarily due to differences in the area sampled by each method. We note, however, that minor differences in estimated abundances could also be due to differences in the probability of detecting cues from individual birds ($p_H$ versus $p_A$) related to differences in how detections are made in the field versus in the laboratory, e.g., the possibility of double checking recordings or the lack of external distractions in the laboratory. This assumption ($p_H = p_A$) can be explicitly tested by estimating $p_H$ and $p_A$ from the data by recording time intervals in which individuals were first detected and using removal models (Farnsworth et al. 2002, Sólymos et al. 2013) or using time-of-detection methods (Alldredge et al. 2007a, b). For the sake of simplicity, we start by assuming that $p_H$ and $p_A$ are equal. If we divide the expected values of the counts, we can observe the expected relationship between the areas sampled by humans versus ARUs:

$$\frac{E[Y_A]}{E[Y_H]} = \frac{D\,\pi\,\tau_A^2\,p_A}{D\,\pi\,\tau_H^2\,p_H} = \frac{\tau_A^2}{\tau_H^2} \tag{3}$$

So, if we let δ be such that:

$$\delta^2 = \frac{\tau_A^2}{\tau_H^2} \tag{4}$$

and

$$\tau_A = \delta\,\tau_H \tag{5}$$

Therefore, Equation 3 could be written as:

$$\frac{E[Y_A]}{E[Y_H]} = \delta^2 \tag{6}$$

As a result, the ratio of mean counts derived from the ARU to mean counts by the human observer provides an estimate of a squared scaling constant (δ²) that mathematically relates $\tau_H$ to the unknown EDR of an ARU ($\tau_A$). Poisson or negative binomial GLMs or GLMMs can be fit to counts in this fashion, combining human observer and ARU based counts using an indicator function ($I_A$) taking the value 0 for human observers and 1 for ARU based counts:

$$\log(E[Y]) = \log(D) + \log(A_H\,p) + I_A \log(\delta^2) \tag{7}$$

Log density is estimated as a linear combination of predictor variables and corresponding coefficients.
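As a rough numerical illustration of the relationship between paired counts, δ, and density, the following sketch assumes equal availability for both survey types. The function names and all input values are hypothetical and chosen only for illustration.

```python
import math

def delta_from_counts(mean_aru_count, mean_human_count):
    """delta = sqrt(E[Y_A] / E[Y_H]), assuming equal availability (p_H == p_A)."""
    return math.sqrt(mean_aru_count / mean_human_count)

def aru_edr(delta, edr_human_m):
    """Effective detection radius of the ARU: tau_A = delta * tau_H (metres)."""
    return delta * edr_human_m

def density_per_ha(count, edr_m, availability):
    """D = Y / (pi * tau^2 * p), with the sampled area converted from m^2 to hectares."""
    area_ha = math.pi * edr_m ** 2 / 10_000.0
    return count / (area_ha * availability)

# Hypothetical inputs: mean paired counts, human EDR (m), and availability.
mean_human, mean_aru = 1.00, 0.81
tau_h, p = 70.0, 0.8

delta = delta_from_counts(mean_aru, mean_human)
tau_a = aru_edr(delta, tau_h)

d_human = density_per_ha(mean_human, tau_h, p)
d_aru_naive = density_per_ha(mean_aru, tau_h, p)   # ignores delta, so it is biased low here
d_aru_corrected = density_per_ha(mean_aru, tau_a, p)

print(round(delta, 3), round(tau_a, 1))
print(round(d_human, 3), round(d_aru_naive, 3), round(d_aru_corrected, 3))
```

With δ < 1, using the human EDR for ARU counts underestimates density, whereas scaling the radius by δ recovers the human-based density by construction.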

Data analysis
Prior to analysis, we removed species, e.g., gulls and ducks, that are poorly monitored using point count methods because they are frequently detected as flyovers and thus violate the closure assumption. We then began by estimating $\mathrm{EDR}_H$ based on our model calibration data. We limited analyses to species with at least 15 detections, and fit half-normal binomial distance models to estimate EDRs (Matsuoka et al. 2012). We then fit count removal models to both the human observer and ARU data using a model in which we included survey type as a factor to test for a difference in species availability. We considered p unequal if the 95% confidence interval (hereafter 95% CI) for the ARU survey parameter estimate did not overlap zero. Distance and removal models were fit using the "detect" package, which is based on a conditional maximum likelihood estimation procedure (Sólymos et al. 2016).
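To make the distance-model step concrete, here is a minimal sketch of a two-bin (binomial) half-normal estimator. It assumes the detection function g(r) = exp(-r²/τ²) and unlimited-distance counts split at a single cut point. The function name and the example numbers are hypothetical, and this closed form is not the "detect" package routine itself.

```python
import math

def edr_binomial_half_normal(n_within, n_total, cut_point_m):
    """Closed-form MLE of tau for a two-bin half-normal distance model.

    Under g(r) = exp(-r^2 / tau^2), the probability that a detected bird
    lies within distance r of the observer is 1 - exp(-r^2 / tau^2).
    """
    if not 0 < n_within < n_total:
        raise ValueError("need detections both within and beyond the cut point")
    prop_within = n_within / n_total
    return cut_point_m / math.sqrt(-math.log(1.0 - prop_within))

# Hypothetical data: 30 of 50 detections were within 50 m of the observer.
tau = edr_binomial_half_normal(30, 50, 50.0)
print(round(tau, 1))
```

The estimate simply inverts the expected proportion of detections within the cut point, so a larger share of nearby detections implies a smaller EDR.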
Although δ can be approximated as the square root of the ratio between arithmetic mean ARU and human observer counts (see above), we are interested in deriving maximum likelihood estimates and associated confidence intervals of δ that account for sampling design. These can be derived from coefficients (δ² = exp[β]) from Poisson or negative binomial regression, which can be interpreted as the ratio of the count between levels of a treatment. We used Poisson GLMMs to estimate δ by including a fixed effect factor for survey type (ARU vs. human as the reference category), and included random intercepts for station and visit to account for paired observations between human observers and ARUs. Following Sólymos et al. (2013), we used our human observer data to derive species specific estimates of $\log(\mathrm{EDR}_H^2 \cdot \pi \cdot p)$ and included these as statistical offsets in our GLMMs. We estimated δ for each species in which the comparison of availability between ARU and humans (above) showed that $p_H$ is approximately equal to $p_A$, as per the assumptions of our approach.
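The regression idea can be sketched with a fixed-effects Poisson model fit by Newton-Raphson in plain Python. This is a simplification for illustration only: it omits the random intercepts for station and visit used in the GLMMs, it is not the authors' software, and the function name, counts, and offset value are hypothetical.

```python
import math

def fit_poisson_indicator(counts, is_aru, offsets, n_iter=50, tol=1e-10):
    """Fit log E[Y] = b0 + b1 * I_A + offset by Newton-Raphson.

    Returns (b0, b1); exp(b0) estimates density and exp(b1) estimates delta^2.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the score vector (g0, g1) and the information matrix entries.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, a, o in zip(counts, is_aru, offsets):
            mu = math.exp(b0 + b1 * a + o)
            g0 += y - mu
            g1 += a * (y - mu)
            h00 += mu
            h01 += a * mu
            h11 += a * a * mu
        # Solve the 2 x 2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical paired counts at six stations (human first, then ARU).
human_counts = [2, 1, 3, 0, 2, 1]
aru_counts = [2, 1, 2, 0, 1, 1]
# Offset log(pi * EDR_H^2 * p) in hectares, assuming EDR_H = 70 m and p = 0.8.
offset = math.log(math.pi * 70.0 ** 2 / 10_000.0 * 0.8)

counts = human_counts + aru_counts
is_aru = [0] * 6 + [1] * 6
offsets = [offset] * 12

b0, b1 = fit_poisson_indicator(counts, is_aru, offsets)
delta = math.sqrt(math.exp(b1))
print(round(math.exp(b0), 3), round(delta, 3))
```

Because both survey types share the same offset in this toy example, exp(b1) reduces to the ratio of summed ARU to summed human counts, which matches the ratio interpretation of the regression coefficient.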
We validated the predictive performance of our models and examined bias in density estimates (relative to those estimated by the human observers in the field) by using repeated random subsampling of the data. In each repeated subsample, data were partitioned by randomly selecting 70% of the study sites (n = 74) for developing GLMMs from which we estimated δ, and 30% of the study sites (n = 31) were withheld as independent validation samples. We repeated this random sampling 50 times. In each repeated sample, we estimated δ using the aforementioned GLMM structure and calculated the 95% CI across the 50 replicated analyses. We also calculated empirical estimates of δ by dividing the mean ARU count by the mean human observer count in the withheld validation data for each of the 50 repeated subsampling events. We then assessed whether the 95% CIs for the GLMM-based estimates of δ overlapped with 95% CIs from the empirical estimates of δ and examined the (Pearson's) correlation between both estimates of δ. In addition, we also examined whether the inclusion of δ in statistical offsets reduced bias in predicted densities from ARU surveys within each random subsampling. We began by estimating density for human observations by fitting a GLMM to the subset of each validation subsample in which we included a random intercept for study site and a statistical offset, i.e., $\log(\mathrm{EDR}_H^2 \cdot \pi \cdot p)$, to generate mean study site level density estimates. We then fit two competing models to the ARU data from the same sites with the same random effects structure as used for the human observer data, but fit one (our "null" model) in which we included the statistical offset used for the human observer data, and a competing model in which we used the δ estimate from the model calibration data within the same iteration to estimate the offset as $\log([\delta \cdot \mathrm{EDR}_H]^2 \cdot \pi \cdot p)$.
Based on these models, we calculated bias as the difference between the mean density predicted from the models fit to the ARU data minus the predicted mean density estimated from the human observer data. We were interested in testing whether δ values estimated by different approaches were statistically different. Because δ represents a relative difference between two methods and we used the same $\mathrm{EDR}_H$ estimates, incorporating the uncertainty around $\mathrm{EDR}_H$ would not have changed our results. It should be noted, however, that propagating the error through modeling (e.g., as described in Sólymos et al. 2013) might be required when estimating bird densities.
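The site-level partitioning used in the repeated random subsampling can be sketched as below. The data, function names, and seed are hypothetical. The sketch reports the ARU-to-human count ratio in each subset rather than fitting GLMMs. Under the framework above, that ratio corresponds to δ², so its square root is taken to express it as δ.

```python
import math
import random

def split_sites(site_ids, train_fraction=0.7, rng=random):
    """Randomly partition unique study sites into calibration and validation sets."""
    sites = sorted(set(site_ids))
    rng.shuffle(sites)
    # Round up to a whole number of sites (e.g. 73.5 -> 74); epsilon guards float error.
    n_train = math.ceil(train_fraction * len(sites) - 1e-9)
    return set(sites[:n_train]), set(sites[n_train:])

def count_ratio(records, sites):
    """Ratio of summed ARU counts to summed human counts within the given sites."""
    human = sum(h for s, h, a in records if s in sites)
    aru = sum(a for s, h, a in records if s in sites)
    return aru / human

def repeated_subsampling(records, n_repeats=50, seed=1):
    """Return (calibration ratio, validation ratio) for each random 70/30 site split."""
    rng = random.Random(seed)
    site_ids = [s for s, _, _ in records]
    ratios = []
    for _ in range(n_repeats):
        calibration, validation = split_sites(site_ids, 0.7, rng)
        ratios.append((count_ratio(records, calibration),
                       count_ratio(records, validation)))
    return ratios

# Hypothetical paired counts: (site, human count, ARU count), two stations per site.
records = [(site, 2 + site % 3, 1 + site % 3) for site in range(10) for _ in range(2)]
ratios = repeated_subsampling(records)
calibration_deltas = sorted(math.sqrt(c) for c, _ in ratios)
print(calibration_deltas[0], calibration_deltas[-1])
```

Splitting by study site rather than by station keeps all stations from a site in the same subset, so the validation sites stay independent of the calibration data.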
Finally, we used our full data set to assess whether δ varied between habitat types (deciduous/mixed wood habitat types versus all other categories) and environmental noise conditions. We constructed six a priori GLMMs that all included random intercepts for station and visit as per above and offsets based on those calculated from human observer data. We included a null (intercept only) model, a model with a fixed effect for survey type (two-level factor: ARU vs. human), and models with both survey type plus habitat type (two-level factor: deciduous/mixed vs. other) or survey type plus environmental noise as main effects. Although noise was an ordinal variable, previous analyses suggest a reasonably linear response of counts to our noise variable and thus we treated noise as a linear covariate. Finally, in addition to the main effects models, we included two models that incorporated interactions: one between survey type and habitat type, and one between survey type and noise. We did not consider models including all three main effects and interactions. We selected among competing models based on Akaike's information criterion (AIC, Burnham and Anderson 2002), and we only considered models with a ΔAIC of < 2 as potentially competitive. If habitat or environmental noise differentially impacted the detection radius of an ARU relative to a human observer, models including the interaction terms should receive the greatest support.
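The ΔAIC screening rule can be illustrated with a short sketch. The log-likelihoods and parameter counts below are invented for illustration and do not come from the study. The model names are shorthand labels.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: AIC = 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood

def delta_aic(models):
    """models maps name -> (log_likelihood, n_params); returns name -> delta AIC."""
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}

# Hypothetical fits for one species.
models = {
    "null": (-250.0, 3),
    "survey": (-248.5, 4),
    "survey + habitat": (-248.0, 5),
    "survey * habitat": (-247.9, 6),
}
deltas = delta_aic(models)
competitive = sorted(name for name, d in deltas.items() if d < 2)
print(competitive)  # ['null', 'survey', 'survey + habitat']
```

In this invented example the interaction model improves the log-likelihood too little to offset its extra parameter, so it falls outside the ΔAIC < 2 set.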

RESULTS
Forty-one species met our minimum sample size criteria (Table A1.1). Across species, effective detection radii ($\mathrm{EDR}_H$) ranged from ~34 to 167 m (median = 68 m; Table A1.1). Of the 41 species for which we estimated availability based on count-removal models, models did not successfully converge for one species (Northern Flicker, Colaptes auratus), and five species did not meet the assumption of equal availability based on parameter estimates for the ARU factor in removal models. Estimates of availability (p) were strongly correlated (Pearson's r = 0.81, p < 0.001) between count removal models fit to human observer versus ARU count data (Fig. 2; Table A1.

Fig. 2. Relationship between probability of a bird singing at least once (and being detected) during a 10-minute point count survey estimated from count-removal models fit to data from 41 species from simultaneous point counts conducted by human observers in the field (x-axis) versus from autonomous recording units (ARUs, y-axis). Dashed diagonal line indicates 1:1 correspondence.

Models examining variation in paired counts between human observers and ARUs suggested that the majority of species were slightly less detectable on ARU recordings than in the field (Fig. 3; Table A1.2). Across species, the median estimate of δ was 0.95 (minimum = 0.78, maximum = 1.11); however, 95% CIs overlapped one for 18 out of 35 species (Fig. 3; Table A1.2).
Comparison of 95% CIs around estimates of δ against those for empirical ratios of ARU to human observer counts showed overlap for all 35 species (Table A1.2). Across species, estimates of δ from our calibration models were positively correlated with empirical ratios derived from the withheld validation samples (Fig. 3; Pearson's r = 0.84, p < 0.001). Applying δ estimates within the statistical offsets resulted in reduced bias for 33 out of 35 species compared to modeling the data using offsets taken solely from human observer data (Fig. 4). Failing to incorporate δ estimates within the statistical offsets resulted in 24 species with negative biases in their density estimates (Fig. 4). Of the 24 species with negatively biased density estimates derived using uncorrected offsets (taken from human observer data), five species (Ovenbird [Seiurus aurocapilla], Dark-eyed Junco [Junco hyemalis], Ruby-crowned Kinglet [Regulus calendula], Clay-colored Sparrow [Spizella pallida], and Connecticut Warbler [Oporornis agilis]) had 95% CIs that did not overlap zero (Fig. 4), whereas density estimates for these same species were unbiased when δ was incorporated in the offsets (Fig. 4). For Ovenbird, failing to incorporate δ within the offset resulted in density being underestimated by 0.10 birds ha⁻¹ on average. Similarly, densities of Dark-eyed Junco, Ruby-crowned Kinglet, Clay-colored Sparrow, and Connecticut Warbler were underestimated by 0.07 birds ha⁻¹, 0.04 birds ha⁻¹, 0.02 birds ha⁻¹, and 0.01 birds ha⁻¹, respectively. Conversely, estimates of Philadelphia Vireo (Vireo philadelphicus) and Cedar Waxwing (Bombycilla cedrorum) densities were less biased on average when the statistical offsets did not incorporate δ; however, 95% CIs overlapped zero for both offset approaches (Fig. 4), suggesting both methods produced unbiased estimates. Although the remainder of the species had 95% CIs that overlapped zero for both of the statistical offset approaches, incorporating δ also (on average) reduced overestimation of densities (Fig. 4). For example, using uncorrected versus δ corrected statistical offsets resulted in greater overestimation of density on average for Cape May Warbler (Setophaga tigrina; 0.02 vs. 0.01 birds ha⁻¹), Palm Warbler

Based upon AIC model selection, the null model was the most parsimonious for 18 species, the model only including the survey type factor was the most parsimonious for two species, the model including factors for both survey and habitat type was the most parsimonious for 10 species, and the model including survey type and noise was the most parsimonious model for five species (Table 2). There was substantial model uncertainty for virtually all species; however, neither of the interaction models received substantial support (Table 2). Across species, the minimum ΔAIC (3.43) for the survey type by habitat type interaction was observed for Dark-eyed Junco, and parameter estimates from the interaction in that model show little evidence for an effect (β = -0.38, SE = 0.51). Similarly, the most substantial support for the survey type by noise interaction was observed for American Robin (Turdus migratorius; ΔAIC = 3.21), which similarly showed little evidence for an effect (β = 0.17, SE = 0.32).

DISCUSSION
Our results provide further evidence supporting the conclusions of previous researchers (Haselmayer and Quinn 2000, Hobson et al. 2002, Celis-Murillo et al. 2009, Blumstein et al. 2011) that the raw counts derived from both acoustic recordings and human observers are relatively comparable. In paired comparisons between ARU and human observers, we found that the null models were favored over models incorporating a survey type effect for 18 out of 35 species. This was further supported by parameter estimates for the survey effect (δ) that overlapped one for the majority of the species that we modeled. Together, these lines of evidence suggest occasional minor biases exist in count data from ARUs relative to human observations.
Despite the relative similarity of many of the raw counts, systematic biases were apparent, and for five species the 95% CIs for estimated bias in densities did not overlap zero if the statistical offsets did not incorporate δ; thus analytically dealing with these biases will be important for data integration.This may be especially important since the biases may differ between acoustic recorder types and brands (Rempel et al. 2013, Yip et al. 2017) and/or may change with equipment wear (Turgeon et al. 2017).
We demonstrated that correcting ARU data for differential detectability can effectively remove the majority of these biases. Because our experimental design and statistical analysis resulted in similar species availability, we suggest that the key source of bias in the counts derives from differences in detection radius between human observers and ARUs.
Our results suggest that the relatively simple approach of pairing human observers with ARUs allows data to be successfully corrected for systematic biases between counts. Repeated random subsampling of our data suggested that δ estimates were relatively robust to sampling variation and were correlated with empirical estimates calculated from withheld data from independent study sites. In addition, applying offsets incorporating δ reduced bias for almost every species examined. Furthermore, we found little support for effects on δ of interactions between survey type and habitat type or between survey type and environmental noise. The lack of support for the interaction models suggests that biases between ARUs and human observers appear relatively consistent (but see below). Therefore, paired sampling can generally be used to derive corrections using relatively common Poisson regression models (GLM or GLMM), available in most modern statistical software, simply by including survey type as a factor.
In addition to our approach facilitating integration of ARU data with human point counts, it has the added benefit that avian density estimates can be derived from single ARUs as long as the human observers include distance estimation in their survey protocol. Alternative methods exist to derive densities from ARUs, typically employing acoustic localization from arrays of synchronized ARUs (Dawson and Efford 2009, Campbell and Francis 2012, Mennill et al. 2012). Acoustic arrays require a greater financial investment in ARUs because multiple ARUs are required for an array and each is more expensive owing to additional hardware (GPS). For example, as of the time of writing, Wildlife Acoustics Inc. (http://www.wildlifeacoustics.com/store#song-meter-sm3) charges US$1049 for a single SM3 ARU plus an additional US$299 for the accompanying GPS module. Although acoustic localization is rapidly evolving, it can be logistically and computationally difficult. Thus, our approach provides a logistically feasible and affordable alternative to more complicated designs. Future paired comparisons between ARU and/or human point counts placed within acoustic arrays or traditional spot mapping grids (Bart and Earnst 2002) would further improve certainty in point count density estimation and would naturally fit with the analytical approach we describe here.
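To illustrate how a single ARU count could be converted to a density, the sketch below applies the offset form given in the Fig. 4 caption, log[(δ · EDR_H)² · π · p], and divides the count by the exponentiated offset. The function names and all numeric values are hypothetical, and the conversion of EDR from metres to units of 100 m (so that the area is in hectares) is our assumption for the example. In practice the offset would enter a Poisson model rather than be applied to a single count.

```python
import math

def qpad_offset(edr_h_m, p, delta=1.0):
    """Return log[(delta * EDR_H)^2 * pi * p].

    edr_h_m : effective detection radius of human observers, in metres
    p       : availability probability
    delta   : scaling constant between ARU and human surveys
              (delta = 1 reproduces the uncorrected human offset)
    """
    edr_100m = edr_h_m / 100.0  # radius in units of 100 m gives area in ha
    return math.log((delta * edr_100m) ** 2 * math.pi * p)

def density_per_ha(count, offset):
    """Convert a count to birds per hectare given a log-scale offset."""
    return count / math.exp(offset)

# Hypothetical values for one species.
edr_h_m = 100.0   # EDR_H in metres
p = 0.8           # availability
delta = 0.9       # survey-type scaling constant
aru_count = 2

uncorrected = density_per_ha(aru_count, qpad_offset(edr_h_m, p))
corrected = density_per_ha(aru_count, qpad_offset(edr_h_m, p, delta))
print(uncorrected, corrected)
```

Because δ enters the offset squared, a δ below 1 shrinks the effective area sampled by the ARU and raises the corrected density by a factor of 1/δ² relative to the uncorrected value.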
An alternative way to correct biases between ARUs and human point counts would be through playback experiments. Yip et al. (2017) conducted playback experiments in which species calls were played along transects at various distances away from human observers and ARUs. An experimental approach has the advantage of the calls coming from known distances; however, it also requires the experimenter to make assumptions about the amplitude at which birds sing or call, because how loud wild birds sing is variable and generally unknown (Brackenbury 1979). In addition, the effects of directionality on song amplitude in wild birds are not well described but are known to impact detection probability from experimental playbacks (Alldredge et al. 2007a, c). Experimentally replicating the impact of bird orientation relative to the point count location therefore complicates the approach. In contrast, for density estimation our approach assumes that observers estimated distances accurately, although such estimates can be inaccurate (Alldredge et al. 2007c). We suggest that both paired comparisons and experimental playbacks could be used in a complementary fashion to estimate correction factors between human point counts and ARUs. Further, we suggest that paired sampling is a pragmatic approach to obtain statistical offsets for the majority of species and does not require assumptions about amplitude and directionality of songs. Where sample sizes become limiting because of species rarity, the experimental approach of Yip et al. (2017) would allow estimates to be obtained for species for which δ cannot be estimated because of a lack of detections. Similar to our results, Yip et al. (2017) generally found estimates of δ that were less than 1. Unlike our results, however, Yip et al. (2017) found evidence for habitat-related variation in δ.
Presumably detection by both human observers and ARUs was similarly affected by habitat and environmental noise in our experiment, and thus our density estimates may be biased low; however, systematic differences between survey types were apparently corrected. Given the apparent difference between our results and those of Yip et al. (2017), future analyses under a broader set of habitat conditions and with a broader range of species may provide evidence suggesting the need for stratification to improve the corrections we have employed here. For example, we did not fit observer- or habitat-specific distance models, which would presumably improve precision because there can be substantial interobserver variation in distance estimation (Nadeau and Conway 2012). Although we did not estimate observer- or habitat-specific offsets and sampled randomly among habitats and observers, our validation still suggests a substantial reduction in bias. Greater effort should be put into replicating our design with more combinations of species, habitats, and environmental conditions to help estimate how much annual effort should be placed on paired sampling, because the added time for transcription is an added cost of our method.
Our results and external validation provide evidence that data from both human observers and ARUs can be placed on a similar footing. We therefore recommend that monitoring and research programs begin further integration of ARUs and human-observed point counts to take advantage of the relative merits of both methods. Not only would this improve sample sizes, but it would also allow researchers to gain a better understanding of factors influencing detection probability, owing to the ease of obtaining repeated samples with programmable ARUs. Although further sampling could provide refinements to our estimates, we have shown that our approach reduces bias related to survey type. Our approach is therefore easily implemented, facilitates the integration of ARU data with human observer point counts to allow expanded monitoring efforts, and will facilitate meta-analyses with historic point count data to examine factors influencing avian populations (Cumming et al. 2010, Sólymos et al. 2013).
Responses to this article can be read online at: http://www.ace-eco.org/issues/responses.php/975

Fig. 3. Comparison of Poisson generalized linear mixed model (GLMM) based estimates of δ (± 95% confidence intervals) against empirical estimates of δ (± 95% confidence intervals) calculated by dividing the mean ARU count by the mean human observer count in the withheld validation data. GLMM estimates were derived from iteratively (50 repeated random subsamples) fitting GLMMs to 70% of the study sites and compared against the empirical estimates of δ, which were estimated from the 30% of withheld study sites. Dot-dash diagonal line (red) indicates 1:1 correspondence.
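The validation scheme in this caption can be sketched as follows: study sites are repeatedly split at random into 70% for model fitting and 30% withheld, and the empirical estimate of δ is computed in the withheld sites as the mean ARU count divided by the mean human observer count. The function names, site labels, and counts are hypothetical toy values, and the GLMM fit to the 70% subset is omitted here.

```python
import random

def empirical_delta(pairs):
    """Mean ARU count divided by mean human count.

    pairs : list of (human_count, aru_count) tuples from paired surveys.
    Both means share the same n, so the ratio of totals is equivalent.
    """
    human_total = sum(h for h, _ in pairs)
    aru_total = sum(a for _, a in pairs)
    return aru_total / human_total

def split_sites(site_ids, train_frac, rng):
    """Randomly split unique site IDs into (training, withheld) lists."""
    ids = sorted(set(site_ids))
    rng.shuffle(ids)
    n_train = round(train_frac * len(ids))
    return ids[:n_train], ids[n_train:]

# Hypothetical paired counts: site -> list of (human, ARU) counts per station.
paired_counts = {
    "s01": [(3, 2), (2, 2)], "s02": [(1, 1), (2, 1)],
    "s03": [(4, 3), (2, 2)], "s04": [(2, 2), (3, 2)],
    "s05": [(1, 1), (1, 0)], "s06": [(2, 1), (2, 2)],
    "s07": [(3, 3), (1, 1)], "s08": [(2, 2), (4, 3)],
    "s09": [(1, 1), (2, 2)], "s10": [(3, 2), (1, 1)],
}

rng = random.Random(1)  # fixed seed so the example is repeatable
validation_deltas = []
for _ in range(50):  # 50 repeated random subsamples
    train_sites, withheld_sites = split_sites(paired_counts, 0.7, rng)
    # A model would be fitted to train_sites here; only the empirical
    # estimate from the withheld sites is computed in this sketch.
    withheld_pairs = [pc for s in withheld_sites for pc in paired_counts[s]]
    validation_deltas.append(empirical_delta(withheld_pairs))

print(min(validation_deltas), max(validation_deltas))
```

Splitting by site rather than by station keeps all stations from a site on the same side of the split, so the withheld estimates come from independent study sites.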

Fig. 4. Bias in estimated densities (birds ha⁻¹) from point count data derived from autonomous recording units (ARUs) compared to densities derived from human point counts conducted at the same time and location. Densities from human point counts were derived by adjusting counts for biases in availability (p) and perceptibility (q) using the QPAD approach (Sólymos 2013) by inclusion in a statistical offset (i.e., log[EDR_H² · π · p], see Methods). Densities from ARU surveys were derived by applying QPAD offsets from human observer data (open circles, log[EDR_H² · π · p]), versus adjusting the offsets to account for the scaling constant δ (i.e., log[(δ · EDR_H)² · π · p], closed red circles). Bias was estimated from fitting models to 70% of the study sites and validated against the withheld external validation sites (30%) over 50 repeated random subsamples of the data. See Table 2 for scientific names of bird species.

Table 1. Distribution of point count samples amongst land cover classes derived from the 250 m resolution Land Cover Map of Canada 2005 (LCC05; Latifovic et al. 2008).
Sólymos et al. (2013) log-linear Poisson generalized linear (GLM) or generalized linear mixed (GLMM) models. If we estimate τ_H and p_H using distance and removal sampling, respectively, following Sólymos et al. (2013), we can calculate a correction factor (C) and the mean for a count made at point count location i (i = 1, ..., n; number of locations) by the human observer that can be expressed as:

Table 2. Model selection based on Akaike's Information Criterion (AIC) for Poisson generalized linear mixed effects models examining variation in counts of 35 species of boreal forest birds. Presented are ΔAIC values that rank models relative to the model with the lowest AIC value, with the lowest value (i.e., 0) representing the most parsimonious model.