Uncovering population structure in the Humboldt penguin (Spheniscus humboldti) along the Pacific coast at South America

The upwelling hypothesis has been proposed to explain reduced or lack of population structure in seabird species specialized in food resources available at cold-water upwellings. However, population genetic structure may be challenging to detect in species with large population sizes, since variation in allele frequencies are more robust under genetic drift. High gene flow among populations, that can be constant or pulses of migration in a short period, may also decrease power of algorithms to detect genetic structure. Penguin species usually have large population sizes, high migratory ability but philopatric behavior, and recent investigations debate the existence of subtle population structure for some species not detected before. Previous study on Humboldt penguins found lack of population genetic structure for colonies of Punta San Juan and from South Chile. Here, we used mtDNA and nuclear markers (10 microsatellites and RAG1 intron) to evaluate population structure for 11 main breeding colonies of Humboldt penguins, covering the whole spatial distribution of this species. Although mtDNA failed to detect population structure, microsatellite loci and nuclear intron detected population structure along its latitudinal distribution. Microsatellite showed significant Rst values between most of pairwise locations (44 of 56 locations, Rst = 0.003 to 0.081) and 86% of individuals were assigned to their sampled colony, suggesting philopatry. STRUCTURE detected three main genetic clusters according to geographical locations: i) Peru; ii) North of Chile; and iii) Central-South of Chile. The Humboldt penguin shows signal population expansion after the Last Glacial Maximum (LGM), suggesting that the genetic structure of the species is a result of population dynamics and foraging colder water upwelling that favor gene flow and phylopatric rate. Our findings thus highlight that variable markers and wide sampling along the species distribution are crucial to better understand genetic population structure in animals with high dispersal ability.


Introduction
In species with high dispersal ability and no geographical barriers in their distribution, it is expected found low genetic population structure. For instance, weak or no population genetic structure has been frequently recorded for seabird species along the Atlantic coast of South America (e.g. Kelp gull, Larus dominicanus [1,2]; Magellanic penguin, Spheniscus magellanicus [3]; South-American tern, Sterna hirundinacea [4]), along the Pacific coast of South America (e.g. Peruvian pelican, Pelecanus thagus [5]), and around Antarctica (Emperor penguin, Aptenodytes forsteri, [6,7]; Adélie penguin, Pygoscelis adelia [8]; P. antarticus Chinstrap penguin [9]). Therefore, relative importance of factors that influence the population structure of seabirds have been under debate, such as the presence of physical or non-physical barriers [10,11], the foraging ecology of the species [12], and/or their philopatric behavior [13].
The upwelling hypothesis has been proposed to explain reduced or lack of population genetic structure in seabird species specialized in food resources available at cold-water upwellings, which are regularly influenced by largescale climatic events [12]. For instance, the Southeast Pacific coast is characterized by the Humboldt Current System (HCS) of cool sea surface temperature and high biological productivity. In the HCS, the coastal upwellings provide more than 10% of global fish catch [14], however, these areas are not temporally continuous or spatially uniform: indeed, El Niño Southern Oscillation (ENSO) reduces the upwellings' intensity leading to the warming of surface waters, reducing productivity [15,16,17,18], and affecting the survival and dispersal of fishes, seabirds and marine mammals [19,20,21]. Thus, during El Niño events, adult seabirds disperse long distances to find new productive upwelling areas to forage and colonize new area, according to the availability of breeding grounds, increasing gene flow and consequently reducing population genetic structure. The weak population genetic structure and high genetic diversity of Humboldt Current endemic seabirds can be explained by the upwelling hypothesis [5], such as described for Peruvian pelicans, Peruvian boobies [12], and also for Humboldt penguins [22].
Another hypothesis to explain reduced population genetic structure is the effect of Pleistocene glaciations, which is frequently proposed for seabirds from South America [23] and from the North Hemisphere [24,25]. During the Last Glacial Maximum (LGM), the southern portion of the Pacific coast was covered by an ice sheet [26,27]. However, in Chile, the region between 33˚S and 46˚S was considered to have been climatically stable [28]. Thus, distinct climate conditions throughout the Pacific coast could play an important role in defining the demographic history of populations, affecting species distribution, and leaving a genetic signature on populations. Therefore, low genetic structure could reflect the signature of population expansion after the LGM associated with high population size (Ne), thus recent gene flow would not be easily detected.
Gene flow among populations derives from contemporary and historical factors. However, detecting population genetic structure in species with large effective population size is a challenge, since variation in allele frequencies is masked by genetic drift that is inversely proportional to population size [29]. There is also debate on the power of algorithms of clustering to detect genetic structure in species with large population [6,30,31,32]. For instance, Chinstrap penguins showed weak genetic population structure and a pattern of isolation by distance (IBD) when evaluating four breeding colonies [33] at southernmost Western Antarctic Peninsula (WAP), but no differentiation from the distant Bouvet Island and 11 WAP locations [9]. The number of molecular markers, loci, sample size, and the number of locations across the species distribution might be important to fully understand these patterns. Therefore, the detection of population genetic structure for seabirds from HCS may not only reject the upwelling hypothesis, but may propose new hypotheses to explain the new patterns of species across the region.
Penguins are monogamous seabirds, with intense biparental care, philopatric behavior, high capacity of dispersion, and specialist diet [34]. The Humboldt penguin (Spheniscus humboldti) is an HCS specialist, widely distributed along the Pacific coast of South America, from La Foca Island (05˚12'S; 81˚12'W) in Peru to Metalqui Island (42˚12'S; 74˚09'W) in Chile [35]. There are records that Humboldt penguins have been seen at Guafo Island (43˚36'S) in a mixed-species colony, however there is no report of breeding activity [36]. On the southern limit of its distribution, there is information about hybridization between Humboldt and Magellanic penguins in mixed-species colonies [37]. The current global population size estimation is ca. 32,000 to 36,982 breeding adults, and the Humboldt Penguin is listed as Vulnerable by International Union for Conservation of Nature (IUCN) due to population size reductions attributed to exploitation or habitat alteration, as well as the effect of ENSO events [38,39].
Migration rates among colonies are not well known, however there is evidence of individuals from Pan de Azucar colony migrating over 600 km, and birds from Puñihuil were found over 1,000 km northwards from their original colonies [40,41]. In addition, it was proposed that during ENSO events, Humboldt penguin abundance and distribution might have been shifted southward, causing the reduction of populations in the North (e.g. Punta San Juan, Peru) and an increase in some colonies in Chile, such as Chañaral Island [39]. Intense migration rate has been corroborated by low genetic structure estimated among some colonies of Humboldt penguins [22].
The present study aims to evaluate population genetic structure of the Humboldt penguin, testing upwelling and glaciation hypotheses to explain lack or reduced population structure as previously proposed for this species, and to investigate the potential contribution of choice of molecular markers for population genetic structure of the species along the HCS. To achieve these goals, we (1) characterize the genetic diversity and geographical structure across the entire species range, (2) determined the effects of historical climate changes over the species demography, and (3) evaluated if there is sex-biased philopatry and dispersal, bringing light to the questions about low structure recorded on several seabird´species.

Ethics statement
This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Animals at research of the National Council of Animal Experiments from Brazil and Bioethics Guideline from CONICYT (Comisíon Nacional de Investigacion Cientifica y Tecnologica) del Chile.

Sampling
Blood samples for genetic analysis were collected during penguin breeding seasons between 2005 and 2013. We collected a total of 487 samples from adult Humboldt penguins from 11 breeding colonies distributed along its entire distribution range in Peru and Chile (Fig 1). Penguins were captured quietly with a noose pole 1.5 m in length used to lead the penguins out of their nests, and they were hold manually. The heads of captured penguins were covered by one mask to reduce the visual stress. Approximately 500 μl (microliters) of blood samples were obtained from the internal metatarsal vein or the brachial vein, using a 23G needle and 3 mL syringe and stored in 96% ethanol P.A. for genetic analysis. To avoid re-sampling, penguins were marked temporarily with water-resistant color markers, with exception in Punta San Juan colony where penguins already had flipper bands. DNA was isolated for genetic analysis using standard phenol-chloroform extraction protocols followed by ethanol precipitation and DNA resuspension in sterile water [42].

Molecular sexing
Sex was determined by Polymerase chain reaction (PCR), using primer pairs P 2 and P 8 [43], prepared to 10 μL volume containing 50 ng of total DNA, 1X of Taq buffer, 200 μM of each dNTP, 3.5 mM MgCl 2 , 0.5 μM of each primer, and 0.5 U of Taq DNA polymerase. Amplifications were performed with an initial step at 94˚C for 4 min, 53-55˚C for 30 s, and 1 min at 72˚C, followed by 40 cycles of 30 s at 92˚C, 50˚C for 30 s, and 45 s at 72˚C, followed by a final extension of 7 min at 72˚C. The PCR amplifies regions of the CHD1 gene found on the sex chromosomes [43]. Gender identification was based on the number of bands for a given sample visualized on 3% agarose gel. Males have a single band that corresponds to the intron on Z chromosomes, whereas females have two bands, corresponding to introns on the ZW chromosomes that showed distinct size.

Microsatellite genotyping
We genotyped 13 microsatellite loci developed for Spheniscus [22,44,45]. PCR amplification and genotyping procedures were performed using fluorescently labeled primers (Thermofisher, São Paulo, Brazil). PCR were conducted separately for each locus in a 10 μL volume containing 20-40 ng of total DNA, 1X of Taq buffer, 200 μM of each dNTP, 0.5 μM of each primer, and 0.5 U of Taq DNA polymerase. Amplifications were performed with an initial step at 95˚C for 4 min and 37 cycles of 30 s at 94˚C, 30 s at annealing temperature according to each locus (S1 Table), and 1 min at 72˚C, followed by a final extension of 10 min at 72˚C. PCR products were diluted 5X (1:5). A volume of 1.5 μL diluted PCR from FAM labeled and 3.0 μL from HEX labeled was then suspended in 7.75 μL HiDi Formamide (Applied Biosystems, Foster City, CA) with 0.25 of GeneScan 500 ROX size standard, and analyzed on ABI 3500 (Applied Biosystems, Foster City, CA). The fragments were genotyped using GeneMapper v.4 software (Applied Biosystems, Foster City, CA).
Genotyping error or null alleles were tested for each colony using the program Micro-Checker [46]. Hardy-Weinberg and linkage equilibrium for each locus within each colony, as well as all colonies and all loci together were evaluated in GenAlEx [47]. Allele frequencies, allelic richness, observed and expected heterozygosity were estimated for each colony using GenAlEx. Differences in the distribution of genetic variation among colonies were analyzed by R ST between colonies using ARLEQUIN v.3.1 [48]. Possible correlations between the indices of genetic differentiation and geographic distances between all pairs of colonies were tested using the Mantel test, which was performed with Mantel Non-parametric Test Calculator v.2.0 [49] using 10,000 randomizations. The distances between the pairwise colonies (islands) were calculated from geographic coordinates by the program GPS coordinate (https://gps-coordinates.org/distance-between-coordinates.php) considering a Euclidean distance.
Bayesian clustering analyses was performed to estimate the optimal number of genetic clusters (K) using STRUCTURE v2.3.3 [50]. To test for population genetic structure without prior knowledge of sampling locations, we estimated the posterior probability of the data fitting the hypothesis of K clusters [Pr(X|K)], where K is the number of putative populations. STRUC-TURE was run using the admixture and noadmixture model, without localities as priors and assuming uncorrelated allelic frequencies. Preliminary runs, testing from K = 1 to K = 10, were repeated 10 times, each run had 100,000 cycles of burn-in and 10,000 cycles of MCMC. STRUCTURE HARVESTER was used to infer the simplest model to genetic population structure performing Evanno's method (ΔK) [51]. Also, a Discriminant Analysis of Principal Components (DAPC) was carried out to determine the number of cluster of genetically related individuals, with a non-Bayesian approach. DAPC uses sequential K-means and model selection to identify genetic cluster [52]. For this, adegenet package in R [53] was used, retaining all principal components.
Assignment of each individual to their reference putative population was evaluated for microsatellite data using GeneClass [54], employing the likelihood method based on allele frequencies [55], as well as a Bayesian approach [56]. The probability that each individual was assigned to a candidate population was estimated using a Monte Carlo resampling method [57] (number of simulated individuals = 10,000; type I error = 0.01, applying rejection threshold of 0.05). The same program was used to detect first generation migrants employing the Bayesian criterion [56] applying the Monte Carlo resampling method [57] with 10.000 simulated individuals and an alpha of 0.01. We used the highest likelihood value among all available populations (L = L_home/L_max). Philopatric rate and migrate rate difference between sexes were tested by Mann-Whitney non-parametric test using Bioestat Software [58]. Gene flow between population also were estimated through coalescence-based maximum likelihood (LMAX) method implemented in MIGRATE-n 3.2.6 [59], considering geographical distance. MIGRATE assumes an n-island model at mutation migration-drift equilibrium with values of M and θ constant over time. The Brownian motion model was used as an approximation of the stepwise mutation model, and 10 following initial trials, search criteria for the MCMC sampler were set to 20 short chains of 20 000 steps and 3 long chains of 200 000 steps.
Deviations from mutation/drift equilibrium were tested with the program BOTTLENECK [60,61]. Three models of microsatellite evolution were tested: infinite allele model (IAM), two-phase model (TPM), and the stepwise mutation model (SMM). The TPM is the most realistic model for microsatellite mutation because it assumes mainly stepwise mutations with some multi-step mutations [62]. Parameters for the TPM were set at 90% single step mutations, as suggested for microsatellite data [60,61].

Mitochondrial DNA and nuclear DNA sequences
The mtDNA Control region was amplified with primers D-loop C and D [63]. The PCR reaction (10 μL) contained 20 ng of DNA, 1x of Taq buffer, 200 μM of each dNTP, 1.0 μM of each primer, and 0.5 U of Taq polymerase. Amplifications were performed with an initial step at 94˚C for 2 min and 35 cycles of 30 s at 94˚C, 40 s at 62˚C and 90 s at 72˚C, followed by final extension of 10 min at 72˚C. PCR products were cleaned by precipitation using 20% Polyethylene Glycol with 2.5M NaCl. Sequences were obtained on an ABI 3500.
RAG-1 (Recombination activating gene 1) was amplified with primers RAG17 and RAG22 [64]. The PCR reaction as prepared for 10 μL as before (D-loop) Amplifications were performed with an initial step at 94˚C for 2 min and 35 cycles of 30 s at 94˚C, 40 s at 59˚C and 90 s at 72˚C, followed by final extension of 10 min at 72˚C. PCR products were cleaned and sequenced as before.
Sequences were visualized using ChromasLite 2.1 (www.technelysium.com.au). The alignments were adjusted by manually in Bioedit v.5.06 [65]. A Bayesian approach run with the program PHASE [66] was used to identify haplotypes of heterozygotes in the nuclear intron: this program reconstructs the haplotype as implemented in DNAsp v.5.10.01 software [67].
Descriptive analyses, including haplotype diversity (h), nucleotide diversity (π), and theta (θ), were done in DNAsp v.5.10.01 [67]. We used the Network software version 4.6 (www. fluxus-technology.com) with median joining method [68] to draw relationships among haplotypes. Additionally, we calculated Tajima's [69] and Fu's [70] statistics to test bias from neutrality in DNAsp v.5.10.01 [67]. We selected these statistical tests due to their power to detect population expansion scenarios in specific sampling conditions and with specified population expansion rate, time since the expansion, sample size and number of segregating sites [71].
We used AIC as implemented in the software JModeltest [72], to select the best-fit evolutionary model. The evolutionary model selected for control region was T92+G with a discrete gamma distribution (α = 0.06), and JC for RAG1 region. These evolutionary models were used in Bayesian Skyline Plots to analyze population size dynamics through time [73], implemented in BEAST 1.6.1 [74]. We performed runs of 200,000,000 steps, logged every 5,000 steps, and burn-in of 20,000,000. For BEAST analysis, we considered the mutation rate of 0.86 substitution/site/myrs to D-loop (HVRI) described for the Adélie penguin [75] and 1.9 X 10 −3 substitution/site/myrs to RAG1 [76]. To evaluate the convergence of parameters between runs and the performance of analysis (ESS values > 200), we used TRACER 1.7.5 (http://beast.bio.ed.ac. uk/Tracer) [74]. To check the level of population genetic structure among localities, we performed an analysis of molecular variance (AMOVA) with two hierarchical levels using ARLE-QUIN 3.5 [46].

Genetic diversity
The Humboldt penguin showed high genetic diversity for all markers in all colonies. For microsatellites, the number of alleles found in each locus ranged from eight (locus Sh2Ca58) to 23 (both Sh2Ca40), averaging 15.89 over all loci (S1 Table). Private and rare alleles were found in almost all breeding colonies, except for Pupuya and Isla Grande de Atacama (frequencies of 0.004 to 0.056).
The analyses performed by Microchecker showed evidence for null alleles at locus M1-11 in seven breeding colonies (Algarrobo, Cachagua, Tilgo, Pajaros, Choros, Chañaral and Punta San Juan) and out of Hardy-Weinberg Equilibrium (HWE), thus it was excluded from population analysis. Loci Sh2Ca58, Sh2Ca12 and Sh2Ca9 were also excluded from population analysis due genotype proportions deviating from H-W expectations (S2 Table). Thus, the population analysis was conducted with 10 microsatellite loci. Humboldt penguin breeding colonies through the Pacific coast from South America showed relatively high levels of genetic diversity, with mean heterozygosity 0.72 ± 0.014 (Table 1). The minimum value observed was from Algarrobo Island (H o = 0.66) and the maximum from Tilgo (H o = 0.76).
Analysis of the 401 bp fragment of the D-loop HVRI mtDNA from 175 individual Humboldt penguins showed a total of 37 haplotypes, high haplotype diversity (H d = 0.906) and high nucleotide diversity (π = 0.008). A RAG1 fragment of 876 bp from 54 individuals showed 23 haplotypes, also high haplotype diversity (H d = 0.876) and nucleotide diversity (π = 0.002). These patterns of high genetic diversity were also observed in all colonies analyzed (Table 1).

Population genetics structure
Genetic variability using AMOVA was found mainly within populations rather than among populations. For microsatellite loci, 96.89% of genetic variability was detected within populations and only 3.11% among populations (p < 0.001), compared to 98.36% within and only 1.64% among populations for mtDNA (p = 0.10), and 78.51% within and 21.49% among populations for RAG1 (p < 0.001).
Bayesian structure analyses of microsatellite data suggested that the existing global population is composed of 3 groups (K = 3) of Humboldt penguins: 1) Punta San Juan, Peru, 2) Pan de Azucar and Isla Grande de Atacama and 3) the remaining locations (Fig 2, S4 Fig and    S8 Table). Despite the large geographical distance, Pupuya from the central coast was grouped with the locations from the north (i.e. Pan de Azucar and Isla Grande). However, this is probably due the low sample size from this location. In addition, DAPC estimated the optimal number of cluster to K = 2, being one included Pan de Azucar and Isla Grande de Atacama and another included all remaining locations, despite into second group Punta San Juan was slight differentiation (Fig 3). No isolation by distance was identified between all locations, with absence of correlations between geographic and genetic distance (Mantel test; r = 0.04, t = 0.17, p = 0.57). Population genetic structure with significant R ST values for microsatellite were observed between the majority of pairwise locations, mainly between the groups detected by STRUC-TURE, with significant R ST values ranging between Punta San Juan and the remaining locations of 0.011 to 0.088, or among Pan de Azucar and Isla Grande and remaining locations of 0.001 to 0.158; and within the third group higher values were found between Algarrobo and the remaining locations (0.059 to 0.158), except Pupuya and Chiloé (Fig 4, S3 Table).
RAG1 corroborated with high population structure based on significant ϕ ST for 15 of 36 pairwise comparisons, mainly among Punta San Juan and Pan de Azucar, Chañaral, Choros, Cachagua and Algarrobo; and among Pan de Azucar and Chañaral, Choros, Cachagua and Algarrobo (S4 Table). Cachagua is significantly distinct of almost all colonies, except Tilgo. On the other hand, D-loop region was non-informative, since only one pairwise comparison showed a significant value (S4 Table). In addition, the assignment test indicated that philopatry of the Humboldt penguin is higher than 86% (Table 2).

Historical versus recent dispersal
Inference of recent migration indicated low, asymmetric and bidirectional gene flow among Humboldt penguin colonies (Table 2). In contrast, historical gene flow was observed among all colonies (Table 3). In total 368 Humboldt penguins were sexed, where 190 were males and 178 were females (S5 Table), with no significant bias from expected 1 male:1 female proportion. However, females showed lower philopatry than males (S6 Table).       Table). Furthermore, there is evidence of recent expansion of the Humboldt penguin in Punta San Juan (Fs = -5.87, p = 0.02) and Pan de Azucar (Fs = -9.31, p = 0.01; D = 1.63, p = 0.03) only with D-loop region ( Table 1). Corroborating this expansion is the network shaped as a star topology, with few haplotypes in high frequency and several in low frequency with few mutations among them ( Fig  5). Skyline plots showed coalescence around 145,000 years ago to RAG1 and 25,000 years to D-loop region (S2 Fig).

Discussion
Our study reveals that the Humboldt penguin exhibits a clear population genetic structure along the Pacific coast of the South America, as observed in other previous studies on Humboldt penguins [22]. However, our outcome did not corroborate isolation by distance pattern [22], probably due to a gap of sampling. The present study used of several markers, a higher sample size, the distribution and the number of breeding colonies sampled throughout the whole range, and the new methods of data analysis such as the Bayesian methods applied in this study. The combination of all these appeared to overcome the effects of large population size and pulses of migration related to climate oscillations (e.g. ENSO) in the Humboldt penguin, which frequently limit the power of detection of population genetic structure. Bayesian genetic structure analyses revealed three genetic clusters in the Humboldt penguin: 1) Peru (Punta San Juan), 2) north Chile (region of Pan de Azucar and Isla Grande de Atacama), and 3) Central-South of Chile (Pajaros, Chañaral, Tilgo, Choros, Cachagua, Algarrobo) (Fig 2). This structure needs to be considered while implementing management and conservation action plans for the Humboldt penguin along the Southern Pacific coast. Also, taking into account the population data for Humboldt penguin (numbers of breeding pairs in each colony) that indicate Punta San Juan as a key colony in Peru, supporting around 3,160 breeding pairs [38]; and Pajaros, Chañaral, Tilgo and Choros together supporting around 21,700 Population genetic of Humboldt penguin breeding pairs (14,000 at Chañaral, 1,860 at Choros, 2,640 at Tilgo, and 1,200 at Pajaros), and Pan de Azucar with 3,000 breeding pairs. These regions need to be monitored to avoid population decline. Although some endemic seabirds from the Humboldt Current show weak population genetic structure, such as Peruvian pelicans [5]; Peruvian boobies [12], however, other marine vertebrates, such as the Marine otter (Lontra feline) exhibit higher genetic differentiation from populations from Peru compared to those distributed along Chile [77]. Despite the population genetic structure among Humboldt penguin colonies, historical gene flow among several colonies was also observed (Table 3), but recent gene flow was reduced. Gene flow among seabird colonies can be associated to colder-water upwelling [12]: these areas that retain high productivity, becoming important forage sites during ENSO years. Along the Chilean coast, several upwellings have been identified, the main sites being at Antofogasta and Mejilliones, Coquimbo, and Concepcíon [16]. Thus, Humboldt penguins travel long distances to find these highly productive regions [78] to supply their diet necessity, and search main food items such as the Peruvian anchovy (Engraulis ringens), the Araucanian herring (Strangomera bentincki), the Silverside (Odontesthes regia), the Common hake (Merluccius gayi), the Inca scad (Trachurus murphyi), the Garfish (Scomberesox saurus scombroides), and the South American pilchard (Sardinops sagax) [79]. Therefore, the genetic structure that has been observed in the present study may result of foraging in distinct upwelling regions. It is possible that individuals from Pan de Azucar and Isla Grande de Atacama forage in Mejilliones' colder-water upwellings and in Iquique, leading to reduced gene flow with the other colonies. This isolation of Pan de Azucar is corroborated by a known foraging radius of 35 km during the breeding season and 640 km to the north (near Iquique) during the winter [40], reducing gene flow with the other Chilean colonies. Around Isla Choros is also frequent region of colder-water upwellings favoring intense gene flow and reduced genetic structure among some islands, as Pajaros, Chañaral and Tilgo. Pajaros and Cachagua showed lower philopatry rates (0.89 and 0.86, respectively), indicating high gene flow with the other colonies. Thus, it is possible that individuals from Punta San Juan could be forage near Choros and Antofogasta in the North of the Chile, and breed there leading to gene flow among Peru and northern-central Chilean colonies, corroborating colder-water upwelling hypothesis. Vianna et al. (2014) proposed that changes in population size might have been the result of irrupt Humboldt penguin to favorable and productivity areas, moved from Punta San Juan to North of the Chile.
The population genetic structure can be explained by the species philopatric behavior that reduces gene flow. Philopatry was confirmed by assignment test based on microsatellites for all colonies of Humboldt penguins, showing individuals assignment the population origin above 86% (Table 2). Ecological data also indicate strong fidelity to natal colonies on Punta San Juan in Peru [80], and Cachagua and Algarrobo in Chile [40,81,82].
The evolutionary history, such as isolation, expansion or retraction of populations, affects the population genetic structure and the genetic diversity of a species. In the present study in the Humboldt penguin, the D-loop and RAG1 analyses showed a stability and recent coalescence (around 24,917 years ago to D-loop and 145,000 years ago to RAG1). Thus, it is possible that, during the glaciation when Pacific coast experienced a change in its productivity, leading to an intense reduction of the global population of the Humboldt penguin, followed by an expansion of the population after the LGM. However, the population expansion was observed only in neutrality tests to D-loop region and in network (Table 1, Fig 4). The large population size of Humboldt penguin can be masked the expansion, being important increase genomic markers to recovery demographic history. But it is not possible to discard the influence of glaciation in Humboldt penguin. Other penguin species showed the influence of climate effects on their distribution and speciation, experienced strong demographic fluctuations: during the LGM, the Gentoo penguin (Pygoscelis papua) maintained large effective population sizes in Antarctica and the Scotia Arc followed by an expansion [83,84,85], while the Adélie penguin showed two divergent lineages due to different glacial refugia in Antarctica [86,87]. The Magellanic penguin also showed a signal of expansion after the LGM, probably due to increase of available breeding sites [3].
Penguin species have demonstrated, in general, high genetic diversity, as detected here for the Humboldt penguin, and recorded for the Magellanic penguin [3,88], the Gentoo penguin [83,84,85], the Adélie penguin [84,86,89], and the Chinstrap penguin [9,84]. High genetic diversity resulted from several evolutionary factors, such as large population size, low inbreeding rate, and equal sex ratio. Genetic data showed signature of drastic reduction at Pupuya, Tilgo, Pajaros, Choros, Chañaral, Pan de Azucar and Punta San Juan, indicating a bottleneck in these colonies. Despite of these bottlenecks, the Humboldt penguin has maintained high genetic diversity in all colonies.

Considerations to conservation
Our study shows that the population structure in the Humboldt penguin can be better investigated and understood by increasing the number of markers and the sampling effort to cover the whole species' distribution. Low sampling may not reflect real allele frequencies of the global population, showing a simplification of the genetic structure. This study of the Humboldt penguins' population genetic structure revealed three major regions: 1) Punta San Juan in Peru, with clear genetic differences from the Chilean colonies, 2) Pan de Azucar and Isla Grande de Atacama in the North of Chile, which need a special attention as the most genetically differentiated colonies and 3) the breeding colonies from the Central-South of Chile.
Based on our results, following recommendations arise in relation to conservation initiatives. It is important to expand population genetic studies to cover other breeding colonies in Peru, to better understand the relationship between the population of Punta San Juan and the other two genetic groups detected in Chile. Chañaral and Choros are part of the Humboldt Penguin National Reserve, but the other islands (Tilgo and Pajaros)-should also be considered into this system to maintain genetic diversity and a more integral form of population management.
The Humboldt Penguin suffers from the impacts of several factors, such as the interaction with industrial fisheries (overexploited foraging species, incidental catch), the pressure from alien species (e.g. rats, Rattus rattus and R. norvegicus, and feral dogs) that predate on unattended eggs and chicks [90,91], human perturbations due to tourism and guano harvesters [92], and habitat loss. Furthermore, predictions of the effects of future climate change include increases in rainfall (locally) and temperature in South America [93]. It is important to have solid monitoring systems for breeding colonies that could be affected directly by these factors, especially regarding the reduction of chick survival and reproductive success. Thus, outcomes this study help to make better decisions regarding conservation actions for this species.