Breeding origins of a uniquely regular migrant songbird in the Galápagos Islands

Abstract Little is known about the causes and consequences of alternative pathways flown by long‐distance migratory birds. Bobolinks (Dolichonyx oryzivorus) breed in grasslands across northern North America and migrate from their breeding grounds toward the eastern Atlantic Coast and then proceed through the Caribbean to South America. However, a small but regular number of Bobolinks have been recorded on the Galapagos Islands. We collected genetic samples from nine Galapagos Bobolinks and performed double‐digest restriction site‐associated sequencing. We compared them with samples from seven locations across their breeding distribution to determine their population of origin. Galapagos Bobolinks shared the genetic structure of a cluster in the eastern portion of the breeding range that includes New Brunswick and Ontario, Canada, and Vermont, United States. Genetic assignment tests largely corroborated this finding, although slightly different results were obtained for the two methods. All individuals were assigned to the Ontario breeding population using AssignPop, while Rubias assigned six of the migrants to Ontario and three to a Midwest breeding population. Low average relatedness among Galapagos individuals indicates that they are not more related to one another than to individuals within a breeding population and are therefore likely not from a single, small isolated population. Our results do not support the probability hypothesis—that Galapagos Bobolinks originated from the region that includes the greatest proportion of their breeding range (Great Plains)—or the vagrant hypothesis—that migrants are displaced onto Galapagos due to weather events. Instead, our findings support the proximity hypothesis, where migrants originate from the geographically closest‐breeding populations.


| INTRODUC TI ON
Birds face many ecological and evolutionary trade-offs when responding to the challenges of long-distance migration, particularly when navigating large ecological barriers. For example, morphological traits that favor higher maneuverability can enable individuals to take a less risky but more energetically challenging route (Corman et al., 2014). Increased body size, enabling greater cold tolerance, can differentiate migration strategies between sexes (MacDonald et al., 2016). Responses can also be plastic as an individual songbird's migration strategies can differ notably between years (Fraser et al., 2019). Some birds balance the timing of their migration relative to their fat stores, exchanging the ability to expediently complete the full migration with achieving optimal timing of arrival on the breeding grounds (Prop et al., 2003). Cumulatively, these and other trade-offs result in a complex matrix of between-and within-species variation in migration strategies.
A songbird's breeding longitude and latitude can explain variation in migration routes that entail circumnavigating or crossing large barriers (Fraser et al., 2013) or lead to more time spent on migration (Neufeld et al., 2021). For example, all nightingales (Luscinia megarhynchos) tracked from three populations across their European breeding distribution took detours from the shortest route possible, and the detour extent varied among populations and between spring versus fall migration (Hahn et al., 2014). When crossing large barriers, birds may take the shortest path with a tailwind, rather than wait for optimal wind conditions that would enable a longer non-stop flight (Abdulle & Fraser, 2018). Individuals who are capable of crossing a large barrier without stopping may experience an advantage by minimizing total migration time (Bennett et al., 2019) and reducing the overall energy costs of migration (Schmaljohann et al., 2010).
The Bobolink (Dolichonyx oryzivorus), a grassland obligate songbird in the blackbird family (Icteridae), migrates between North and South America each year. Its breeding range spans nearly coast to coast in Canada and the United States. Populations from British Columbia to Nova Scotia migrate through the Caribbean and the Llanos of Venezuela and Colombia to and from their wintering grounds in Bolivia and Argentina . There is at least one consistent exception to their general migratory route.
Beginning with Charles Darwin in 1835 (Darwin, 1963), observations of small numbers of Bobolinks on the Galápagos Islands have been documented on numerous occasions, including observations in at least 13 different years between 1957 and 2021 ( Table 1).
Numbers of birds and observations are greatest during October and November, and generally higher during fall and spring than any other time of the year, although individuals have been observed on the islands in most months of the year (Lévêque et al., 1966). Harris (1973) suggested that numbers of individuals "vary greatly from year to year," although systematic surveys have never been conducted.
The breeding origins, migratory routes, and non-breeding locations of Bobolinks that stop in Galapagos are unknown . Pettingill Jr. (1983) hypothesized that they breed west of the Continental Divide, migrate south through western Mexico, and continue across the Pacific Ocean. Bobolinks breeding in Oregon and British Colombia, however, have so far been shown to use the species' main migratory route by flying east from their breeding grounds toward the East Coast and then proceeding through the Caribbean to northern South America (Renfrew et al., , 2020. Thus, to date, we lack an understanding of the migratory pathways, including the breeding grounds of origin, for the birds that stopover on Galapagos.
Identifying the breeding origins of Bobolinks stopping in the Galapagos Islands will not only contribute to our understanding of avian migration patterns but it is also important to the conservation of resident bird species on the islands. Bobolink is the only songbird that regularly migrates through the Galapagos Islands, and it carries at least two Plasmodium lineages that infect endemic Galapagos birds (Levin et al., 2013), presumably through the vector of two recently introduced species of mosquitoes. Based on a database of parasites in co-occurring grassland bird species, one of these lineages is likely obtained by Bobolink on migration stopover, while the other is more likely transmitted to Bobolink on its breeding grounds (Levin et al., 2016). However, although four of nine Bobolinks sampled in the Galapagos Islands during their southbound migration carried Plasmodium, the lineages were not those previously identified in Galapagos endemic birds; these infections provide a catalog with which to monitor for future parasite infections (Perlut et al., 2018 Here, we use genetic analyses to examine the breeding origins of Bobolinks that stop in Galápagos during their southbound migration. We assign individuals to their breeding region based on the known genetic population structure of Bobolink across its breeding range (Renfrew et al., 2021) using two genetic assignment methods, and we assess whether they originated from a single, local breeding area by determining their genetic relatedness.

| Sample collection and preparation
On October 12-23, 2015, we searched for Bobolinks on migration stopover in native and agricultural grasslands in the highlands of San Cristóbal Island, Galápagos, Ecuador .
We used mist nets and playbacks to capture nine Bobolinks at Finca de las Gemelas (422 m elevation; 0°53′26.23″S, 89°27′15.35″W), a 2.5 ha grassland patch, and in a pasture at Santa Monica, an agricultural complex 9.15 km west of Gemelas, owned by the Ecuadorian military (435 m elevation). We fitted each bird with a unique U.S. Geological Survey leg band, recorded morphometric measurements, and collected a blood sample from the brachial vein into lysis buffer for DNA preservation (Longmire et al., 1988). We used a standard phenol-chloroform extraction protocol following Sambrook et al. (1989) of each sample. Samples below 10 ng/μl were concentrated through vacuum centrifuge, while samples above 25 ng/μl were diluted, resulting in a concentration range 5-25 ng/μl across samples.

| Molecular and bioinformatic analyses
The nine Galápagos samples presented here were sequenced in conjunction with 174 samples collected from seven populations ( Figure 1) in the Bobolink breeding range, as part of a study on population structure (Renfrew et al., 2021). Analyzing these datasets to-  Renfrew et al. (2021).
Briefly, we digested 200-500 ng of DNA with the enzymes SbfI and MsPI, followed by ligation of P1/P2 adapters using T4 DNA ligase.
Samples were pooled into index groups using unique P1 adapters (Peterson et al., 2012)  Bioinformatic data processing and SNP identification are also described in Renfrew et al. (2021). In summary, we checked all raw DNA sequences for quality using FastQC (Andrews, 2010). Using the FASTX-Toolkit (Hannon, 2010), we trimmed all sequences on their 3′ end to 97 bp and eliminated reads that had a Phred quality score ≤10, in addition to 95% of the reads that scored <20. Using process_radtags in STACKS (version 2.0), we demultiplexed all reads, while also filtering for low-quality sequences, those that did not meet the Illumina chastity/purity filter, had an uncalled base, or had one or more adapter mismatches. We further required that reads had an intact SfbI RAD cut site and one unique barcode. The resulting reads were trimmed using fastx_trimmer to the length of the shortest sequence (90 bp). We used the STACKS (version 2.0) pipeline to identify SNPs from these filtered sequences. We optimized parameters for the de novo assembly, as described in Renfrew et al. (2021), resulting in m = 3 (number of identical reads required to initiate a new putative allele) and M and n = 5 (number of mismatches allowed between stacks within individuals and number of mismatches allowed between stacks between individuals). We used program Populations in STACKS to group individuals into eight geographic sampling locations (seven breeding locations and Galápagos), from which we identified polymorphic SNPs. We required an SNP to be present in 70% of individuals and six populations to be called. We also required a minimum minor allele frequency of 0.01 and maximum observed heterozygosity of 0.5 to process a nucleotide site at a locus. Only the first SNP per stack was used to avoid linkage in the dataset. This resulted in a final SNP panel of 3234 SNPs. We tested for conformance to Hardy-Weinberg equilibrium and validated that missing data in our final dataset had no effect on inference of population structure (Renfrew et al., 2021). We also used the program BayeScan (version 2.1; Foll & Gaggiotti, 2008) to identify loci putatively under selection within our dataset, and we removed eight loci that were identified as outliers, to form a putatively neutral SNP panel (3226 loci).

| Population-level analyses and relatedness
To characterize genetic structure of the Galápagos migrants in relation to the four previously identified breeding populations (Renfrew et al., 2021), we used a Bayesian clustering algorithm in program STRUCTURE (version 2.3.4; Pritchard et al., 2000). Using our neutral SNP panel, we ran 10 runs of 150,000 MCMC repetitions with a burn-in of 50,000 repetitions for each K value (K = 1-8). We used the LocPrior model (Hubisz et al., 2009) with correlated allele frequencies. We determined the allele frequency parameter used in subsequent analyses by performing an initial run of K = 1, allowing lambda to vary and settle at the correct value (lambda = 0.37). To assess the results of STRUCTURE and determine the best-supported model, we used a combination of the delta K method (Evano et al., 2005) and plateau in lnPD criterion (Pritchard et al., 2000) from program Structure Harvester (Earl & vonHoldt, 2012). We visualized results using CLUMPAK (Kopelman et al., 2015).
We performed a discriminate analysis of principle components (DAPC) with the Adegenet package in program R as an additional method to assess genetic differentiation among the samples (Jombart, 2008) and determine with which breeding population(s) Galápagos migrants cluster most closely. We first transformed the data using a principal components analysis and then determined genetic clusters using discriminant analysis. We ran this analysis with our neutral SNP panel (3226) with inferred populations as our sampling locations. We performed a cross-validation to accurately determine the number of principal components to retain (78 PCs and seven discriminant functions retained).
To test whether Galápagos birds likely originated from a single breeding locale or had more diverse population origins, we evaluated population-level inbreeding coefficients and relatedness values, in comparison with the birds from the seven breeding locations. We hypothesized that if Galápagos birds had higher than average population-level relatedness and inbreeding values, this would provide evidence consistent with originating from a single locale. We calculated inbreeding coefficients (F is values) for each of the seven breeding locations and the nine Galápagos migrants, in the program Populations of STACKS. We calculated pairwise individual relatedness values for each bird using --realtedness2 function in VCF tools (Danecek et al., 2011) following the method described in Manichaikul et al. (2010), and averaged them across sampling locations.

| Assignment tests
We used two methods of genetic assignment testing to determine the origin of Galápagos migrants to their breeding population-AssignPop (Chen et al., 2018) and Rubias (Moran & Anderson, 2018), both implemented in program R (R Core Team). Both methods employ a framework that is designed for solving the upward bias issue previously documented with these types of analyses (Anderson, 2010; Waples & Do, 2010) but differs in their algorithms and analytical approach. Since different population genetic methods may lead to slightly different results, we followed standard practice of employing more than one method.
AssignPop uses a supervised machine-learning method to evaluate discriminatory power of a dataset to correctly assign individuals of unknown origins to source populations (Chen et al., 2018).

It uses principal component analysis for dimensionality reduction,
Monte-Carlo resampling cross-validation to estimate mean and variance in assignment accuracy, and machine-learning classification algorithms to build predictive models. We used individuals from our seven breeding population sampling locations as a training dataset to subsequently assign Galápagos birds back to their breeding population of origin. Because outlier loci may improve assignment testing power, and our prior work confirmed that the population structure recovered by outliers does not differ from that of neutral loci (Renfrew et al., 2021), we performed this analysis with all SNP loci. To confirm the outlier loci did not provide any assignment bias, we also ran the assignment testing using the neutral SNP panel. We used the Support Vector Machine (SVM) model (30 iterations First, we assessed the power of the reference dataset using the self-assign function, by which reference individuals were assigned back to sampling collections using a leave-one-out procedure, as summarized in Anderson et al. (2008). We then evaluated assignment accuracy through simulated mixtures, using the same leaveone-out approach. We estimated mixing proportions of simulated individuals using a maximum-likelihood EM algorithm and posterior mean from MCMC. We ran these simulations with a maximum size of 50 individuals in each simulated mixture, with 100 MCMC repetitions. Simulations were carried out using full multilocus genotypes, as well as resampling over gene copies, as the latter method was found to yield a better power assessment in some instances (Anderson et al., 2008). We compared estimated reporting unit mixing proportions to simulated values to assess the degree and direction of reporting unit bias within the data. Subsequently, we ran a mixture analysis on the nine Galápagos birds using both the default MCMC method as well as the parametric bootstrapping method to account for reporting unit bias found in our data. We ran the above procedure with reporting units defined at both a regional (four genetically distinct populations of Renfrew et al., 2021) and local scale (seven breeding sampling locations) to determine the breeding population of origin of the birds in the mixture.

| RE SULTS
We had a total of 111,771,061 raw sequencing reads, of which we  Table 3). Similar results were found when the analysis was performed using neutral loci (3226 SNPs) only, although with slightly lower assignment scores (Appendix S3).
In Rubias, the self-assignment testing of reference individuals resulted in high average posterior means of group membership (P of Z) for the four regional breeding population groups (eastern, midwestern, Oregon, and British Columbia), with all groups having >89% correct assignment (Table 4). Self-assignment testing also resulted in high P of Z values for six of the seven breeding sampling locations (range 90%-100%), apart from North Dakota which had an average posterior means of group membership of 40% (Table 4).
For the simulated mixtures in Rubias, the average P of Z was consistently high across regional breeding population simulations (eastern, midwestern, Oregon, and British Columbia), regardless of the method of resampling, ranging from 99% to 100% with the gene copy method and 91%-100% with the full multilocus method (Appendices S5 and S6). Average P of Z levels were also high (range 93%-100%) for simulations performed with the seven sampling locations as reporting units using the gene copy method (Appendix S4).
These levels were lower and more similar to the self-assignment re- Mixture analysis in Rubias assigned the nine Galápagos individuals back to the four regional breeding populations, as well as the seven breeding sampling locations with high levels of accuracy, using both the default method and parametric bootstrapping correction for reporting unit bias. Six Galápagos migrants were assigned to the eastern cluster and three individuals were assigned to the midwestern cluster with 100% certainty (Table 5), using the bootstrapping correction. When we used the seven sampling locations as reporting units, we were able to get better resolution on breeding origin of the birds with similarly high levels of certainty (>91%). We found the same six birds that were assigned to the eastern group were assigned to Ontario and the same three individuals that were assigned to the midwestern group were assigned to Nebraska (Table 5). Assignments were identical when we ran the analysis without the bootstrapping correction, with >92% average P of Z (Appendix Table S1).
Mixture proportions (pi) for the analysis with the seven breeding populations matched the results of the assignment testing, with approximately 0.606 proportional assignment to Ontario, 0.315 to Nebraska, and 0-0.019 to the other five populations. Bootstrap corrected proportions were largely similar to the uncorrected mixture proportions, albeit slightly higher to Nebraska and Ontario, thereby further strengthening the evidence for these assignments (Table 4).
Uncorrected and corrected mixture proportions differed, however, for the results with the four regional breeding populations. suggesting stronger evidence for assignments to the East (Table 4).

| DISCUSS ION
Using genetic data in conjunction with assignment testing and mixture analysis, we identified the breeding population of origin

F I G U R E 4
Results of the Monte-Carlo cross-validation performed in AssignPop to evaluate the baseline Bobolink data (reference dataset of 174 individuals from seven breeding locations). Assignment accuracy was estimated through resampling random training individuals using all loci (3234 SNPs). Each inset shows assignment accuracy in the baseline data across each sampling location and overall at varying proportions of individuals used in the training dataset (0.5, 0.7, and 0.90).

TA B L E 3 Results from AssignPop assignments of nine migrant
Bobolinks caught on the Galápagos Islands back to breeding locations using all loci (3234 SNPs).

Individual % Assignment to Ontario
Galápagos 44619  of the Galápagos migrants is the eastern breeding group, and specifically, the Ontario area. Furthermore, the relatively small F is and negative relatedness values for the Galápagos Bobolinks (Table 2) suggest that they are not from a single, closely related breeding population. This indicates that it is unlikely that the Galápagos migrants are from a small, isolated eastern population and may instead stem from several locations within and around the Ontario area.
In trying to explain why a small number of Bobolinks maintain a historical migration through Galápagos, our results do not support the probability hypothesis, which predicts that they originate from the region that includes the greatest proportion of the breeding population. The core of the Bobolink breeding range with the highest TA B L E 4 Results of evaluation from the Self Assign function in Rubias, for the 174 Bobolink reference individuals from the seven breeding locations (left) and the four regional breeding groups (right). Note: The average posterior means of group membership (P of Z) show the average posterior probability of assignment to the correct group for each reporting unit. Simulated mixture proportions (pi) and parametric bootstrapped corrections on those mixture proportions for the migrant Galápagos Bobolink mixture dataset are shown after the self-assignment results, calculated in program Rubias for the seven breeding locations (left) and the four regional population groups (right).

TA B L E 5
Results of the mixture analysis of nine migrant Galápagos Bobolinks from program Rubias. Note: The table includes each individual caught on the Galápagos Islands, the reporting unit to which it was assigned (from one of the seven breeding locations or one of the four regional groups), and the associated probability of assignment (P of Z).
abundances lies in the Great Plains, particularly in North Dakota, which has a different genetic population structure from the eastern cluster (Renfrew et al., 2021). While our results could not completely rule out the Midwest as the origin of three of the migrants, these assignments were to Nebraska and not North Dakota, thereby also inconsistent with the core portion of the breeding range.
Our results also do not support the vagrant hypothesis. If Bobolinks in Galápagos were vagrants, they would be present as a result of weather or other stochastic events that alter their normal migratory route. As vagrants, we would expect the Bobolink population on the islands to comprise a mix of individuals from different locations in the breeding range. If samples collected from individuals across multiple years all originated from the eastern breeding range, this hypothesis may be ruled out, unless there is an annual weather event that always causes a detour for a few eastern individuals only.
Reporting on an 1898-1899 Galápagos expedition, Snodgrass and Heller (1903) (Nocera et al., 2006), and on non-breeding grounds, they form flocks and use vocal and visual cues to avoid predation and find foraging areas (Renfrew & Saavedra, 2007). Social information influences young Bobolinks especially, such that individuals will select lowquality habitats in response to social information over other means of habitat evaluation (Nocera et al., 2006(Nocera et al., , 2009. Young Bobolinks especially may follow the "wrong" migration route in the process of following erroneous social cues (Betts et al., 2008, found this with respect to selecting breeding habitat). Therefore, one or a few Bobolinks that make an error could have followers. Finally, the route could be maintained to minimize exposure to parasites (Gill et al., 2009;Levin et al., 2016); in this case, Bobolinks avoid continental or island stopovers with known parasite histories but going to the Galápagos prior to ~2000 had no known blood parasites.
However, if this migration was maintained to avoid parasites, we are unsure why such a small proportion of the global population uses it, particularly when only 18% of Bobolinks carry Plasmodium (Levin et al., 2013). Likewise, we do not yet know if and how parasites, particularly Plasmodium, impact Bobolink life histories.
In conclusion, following the proximity hypothesis, our findings provide evidence that the Bobolinks that stopped in Galápagos

ACK N OWLED G M ENTS
The University of New England and the Vermont Center for Ecostudies provided funding for this project. Laboratory costs were provided by the University of New Hampshire. We thank the staff at the Charles Darwin Foundation and Galápagos National Park for their support and logistics. We are deeply grateful to J. Megyesi and S. Neumann, who provided essential, priceless assistance in all forms on San Cristobal. We thank Andrea Trigueros and Sage Rohrer from the Parker lab at UMSL for conducting the initial DNA extractions and preparing samples for shipment.

DATA AVA I L A B I L I T Y S TAT E M E N T
Data will be archived in Dryad (doi: 10.5061/dryad.9s4mw 6mmj).

B EN EFIT-S H A R I N G S TATEM ENT
Benefits from this research accrue from the sharing of our data and results on public databases as described above.