Genomics reveal the origins and current structure of a genetically depauperate freshwater species in its introduced Alaskan range

Abstract Invasive species are a major threat to global biodiversity, yet also represent large-scale unplanned ecological and evolutionary experiments to address fundamental questions in nature. Here we analyzed both native and invasive populations of predatory northern pike (Esox lucius) to characterize landscape genetic variation, determine the most likely origins of introduced populations, and investigate a presumably postglacial population from Southeast Alaska of unclear provenance. Using a set of 4329 SNPs from 351 individual Alaskan northern pike representing the most widespread geographic sampling to date, our results confirm low levels of genetic diversity in native populations (average π of 3.18 × 10⁻⁴) and even less in invasive populations (average π of 2.68 × 10⁻⁴) consistent with bottleneck effects. Our analyses indicate that invasive northern pike likely came from multiple introductions from different native Alaskan populations and subsequently dispersed from original introduction sites. At the broadest scale, invasive populations appear to have been founded from two distinct regions of Alaska, indicative of two independent introduction events. Genetic admixture resulting from introductions from multiple source populations may have mitigated the negative effects associated with genetic bottlenecks in this species with naturally low levels of genetic diversity. Genomic signatures strongly suggest an excess of rare, population-specific alleles, pointing to a small number of founding individuals in both native and introduced populations consistent with a species' life history of limited dispersal and gene flow. Lastly, the results strongly suggest that a small isolated population of pike, located in Southeast Alaska, is native in origin rather than stemming from a contemporary introduction event. Although theory predicts that lack of genetic variation may limit colonization success of novel environments, we detected no evidence that a lack of standing variation limited the success of this genetically depauperate apex predator.



| INTRODUCTION
Biological invasions have been likened to one of the four horsemen of the ecological apocalypse (Diamond, 1984) and remain a leading cause of local and species-wide extinctions (Clavero & García-Berthou, 2005; Mooney & Cleland, 2001; Moyle & Leidy, 1992; Patankar et al., 2006; Ricciardi & MacIsaac, 2010; Vitousek et al., 1996). Indeed, the legacy effects of species introductions that result in invasive hybridization (e.g., Muhlfeld et al., 2017) or novel predation by apex predators (e.g., Côté & Smith, 2018) create especially pernicious problems for natural resource management and conservation. Despite this adversity, biological invasions provide large-scale, unplanned, replicated natural experiments to investigate fundamental ecological and evolutionary questions (Sax et al., 2007; Westley, 2011). Here we apply modern genomic tools to the invasion of a top freshwater predator, northern pike (Esox lucius: Esocidae; hereafter "pike"), to explore patterns of population divergence and to shed light on the origins of the invasion in Alaska, USA.
Pike is a circumpolar species of fish found throughout much of the Northern Hemisphere (e.g., Seeb et al., 1987). Despite their broad native range, pike are genetically depauperate with low diversity compared to other species of freshwater fish (Seeb et al., 1987; Senanan & Kapuscinski, 2000; Skog et al., 2014; Wennerström et al., 2018). This is likely due to their history: modern-day populations were established after postglacial expansion from multiple geographically restricted refugia. However, other fishes with similar ranges, which presumably experienced similar bottlenecks, do not show the same reductions in genetic diversity (Miller & Senanan, 2003).
Although it is well established that pike originated in Eurasia, how pike colonized North America and subsequently dispersed there remains unknown (e.g., Johnson, 2019). There does appear to be genetic separation between native populations of Alaska and eastern North American pike, and a reduction in genetic diversity in eastern populations strongly suggests a Beringian origin of colonization into North America with subsequent expansion eastwards (Johnson, 2019; Senanan & Kapuscinski, 2000; Skog et al., 2014).
Introductions of pike outside their native range have increased worldwide (Johnson et al., 2009; McMahon & Bennett, 1996). As a prized sportfish and common target of aquaculture enhancement, pike introductions are the product of natural and human-assisted movements (Dunker et al., 2018). In Alaska, pike are an ecologically and culturally important native fish species north and west of the Alaska Range, but do not occur naturally south and east of the Alaska Range mountains (Figure 1). A possible exception, which has not been evaluated genetically, are populations of pike found in the Antlen River near Yakutat, Alaska (Morrow, 1980). The origins of these pike, relative to other native and introduced Alaskan populations, are unclear, and genetic studies may help elucidate them. Elsewhere south and east of the Alaska Range, pike occurrences are clearly the result of human-mediated introductions. Although the history of the introduced population is relatively unknown, records suggest that in the 1950s, a floatplane pilot transported pike from Minto Flats, AK, USA (i.e., their native range), and released them into Bulchitna Lake (Matanuska-Susitna Borough, AK, USA; Dunker et al., 2020). This illegal stocking event is the first known instance of pike in the Matanuska-Susitna basin, although prior introductions may have occurred. Since then, multiple stocking events have occurred throughout Southcentral Alaska and in lakes on the Kenai Peninsula, Alaska, until at least the early 2000s (Dunker et al., 2018; Haught & von Hippel, 2011). Continued identification of newly established populations associated with human-assisted introductions and subsequent range expansion of non-native populations indicates that this invasion is an ongoing process and that many currently un-invaded sites remain vulnerable to colonization (Jalbert et al., 2021).
Despite widespread concern about the impacts of pike, particularly as predators of native Pacific salmonids (Oncorhynchus spp.), virtually nothing is known about the genetic diversity of Alaskan pike (although see Seeb et al., 1987; Skog et al., 2014; Wooller et al., 2015). We used genome-wide RAD-seq data from pike in both native and introduced populations in Alaska to illuminate whether the introductions were likely the result of a single source or of multiple introduction events. Specifically, our objectives were (1) to characterize the level of genetic variation in both invasive and native populations of Alaska pike, (2) to determine the most likely origin of invasive populations of pike, and (3) to clarify the origins of a presumed postglacial relict population of pike in the Antlen River near Yakutat in the context of our findings regarding known invasive and known native pike populations.
[...] ranged (Southwest Alaska, Fairbanks) (Dunker et al., 2018; Jalbert et al., 2021).
Pike were captured using a combination of gillnet, seine net, rotenone treatment, and angling methods. All pike captured in the invasive range and up to 50 pike from each native site were euthanized.
Pectoral fin clip tissues were placed in reagent alcohol (95%) or a solution of dimethyl sulfoxide (DMSO), ethylenediaminetetraacetic acid (EDTA), and salt for preservation (Seutin et al., 1991) immediately after collection. Sample collection and associated protocols were approved by the Alaska Department of Fish and Game (ADF&G) and the Institutional Animal Care and Use Committee (IACUC), and samples were collected under Fish Resource Permit number SF2017-168 and IACUC protocol number 921163-3.

| Genomic DNA isolation and RADseq
Total genomic DNA was isolated from preserved tissue samples using the reagents and protocols of the Qiagen Gentra Puregene Tissue kit (QIAGEN Inc., Valencia, CA, USA) with a modification to the final elution buffer. Specifically, isolated genomic DNA was dissolved in 50 μL of low-EDTA TE buffer (pH 8.0) to minimize potential inhibition of downstream reactions. Quantity and purity of DNA preparations were assessed through fluorometry, spectrophotometry, and electrophoresis. We used a double-digest restriction-site-associated sequencing (ddRAD-seq) approach (Peterson et al., 2012) to broadly characterize genetic variation.
A sample size of up to 25 randomly selected individuals per site was chosen based on previous studies that used ddRAD-seq datasets to determine genetic diversity (Hale et al., 2012; Willing et al., 2012).
Briefly, ≥200 ng of DNA from each sample was digested using the restriction enzyme combination MspI (C|CGG) and EcoRI (G|AATTC). Restriction digest products were used to build individually tagged libraries, which were then pooled, and size selected to a range of 200-500 base pairs (Blue Pippin, Sage Science) and sequenced on an Illumina HiSeq platform using 2 × 125 bp PE V4 chemistry. Library construction, sequencing, and demultiplexing were performed by GENEWIZ LLC (South Plainfield, NJ, USA) with a targeted 5 million paired reads per individual.
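The double digest and size selection described above can be illustrated with a small in silico sketch. It is not the pipeline used in this study: the function names are hypothetical, coordinates are top-strand cut positions that ignore overhangs, and the assumption that only fragments flanked by one cut from each enzyme are retained reflects the general ddRAD design rather than anything stated in this section.

```python
import re

# Recognition sites and top-strand cut offsets: MspI cuts C|CGG, EcoRI cuts G|AATTC.
ENZYMES = {"MspI": ("CCGG", 1), "EcoRI": ("GAATTC", 1)}


def cut_positions(seq, site, offset):
    """Return the top-strand cut coordinate for every occurrence of `site`."""
    # A lookahead pattern is used so that overlapping occurrences are not skipped.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq, min_len=200, max_len=500):
    """Return (start, end) for fragments between adjacent cuts made by
    different enzymes whose length falls inside the size-selection window."""
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        cuts += [(pos, name) for pos in cut_positions(seq, site, offset)]
    cuts.sort()
    fragments = []
    for (left, left_enz), (right, right_enz) in zip(cuts, cuts[1:]):
        length = right - left
        if left_enz != right_enz and min_len <= length <= max_len:
            fragments.append((left, right))
    return fragments
```

Applied to a reference genome, a function like this gives a rough expectation of how many loci fall in the 200-500 bp window; real libraries deviate because of incomplete digestion, methylation sensitivity, and imprecise size selection.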

| Sequence data quality assessment and alignment to pike genome
We used FASTQC version 0.11.7 (Andrews et al., 2010) to assess the quality of the sequence reads and MULTIQC version 1.5 (Ewels et al., 2016) to obtain a consolidated summary of individual FASTQC module reports. Sequences and individuals with more than 20% of [...]

FIGURE 1 Sampling sites of pike sequenced in this study as detailed in Table 1. Pike are native to Alaska north and west of the Alaska Range (squares), except for a putative relictual population in Southeast Alaska, represented in this study by the Antlen River (triangle; Morrow, 1980). Sampling sites in Southcentral Alaska, bounded by a box and indicated by circles in the main figure, are considered introduced (Dunker et al., 2020). The inset shows Southcentral Alaska sampling sites with the size of points scaled to sample size. Sample site names are prefaced with labels corresponding to genetic grouping as explained in the text: I, Southcentral I; II, Southcentral II; K, Kenai Peninsula. Eagle Lake (diamond) is a site geographically distant from Alaska with a pike population representative of non-Beringian pike.

| Population genomic analyses
Genotypes were called from sorted BAM files using ANGSD (v0.922; Korneliussen et al., 2014).
Admixture analysis was performed using NGSadmix (Skotte et al., 2013). Beagle files generated by ANGSD (minMaf 0.05, minQ 20, minMapQ 20) were used as input for NGSadmix. NGSadmix was run on two datasets: (1) all Alaska pike samples from sites with n > 6 individuals (n = 344, 17 sites) and (2) introduced samples from Southcentral Alaska (n = 225, 11 sites). Analyses were repeated with K set to varying numbers (range from 2 to 10 for the complete dataset and 2-5 for the Southcentral Alaska samples). Ancestry plots were then created in R (R Core Team, 2018) using both the ggplot2 and melt packages (Wickham, 2009). Optimum K values were inferred based on a combination of maximum likelihood scores and the observed relationship between populations.
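As an illustration of how per-individual admixture proportions can be summarized by sampling site before plotting, the following is a minimal sketch. It assumes the proportions have already been parsed into one list per individual (for example from the whitespace-separated proportions file that NGSadmix writes) and that site labels are supplied in the same order; the function name and the labels in the usage note are hypothetical.

```python
from collections import defaultdict


def mean_ancestry_by_site(q_rows, sites):
    """Average per-individual admixture proportions within each sampling site.

    q_rows: one list of K proportions per individual (each row sums to ~1).
    sites:  one site label per individual, in the same order as q_rows.
    """
    if len(q_rows) != len(sites):
        raise ValueError("one site label is required per individual")
    sums = {}
    counts = defaultdict(int)
    for row, site in zip(q_rows, sites):
        if site not in sums:
            sums[site] = [0.0] * len(row)
        # Accumulate each cluster's proportion for this site.
        sums[site] = [s + q for s, q in zip(sums[site], row)]
        counts[site] += 1
    return {site: [s / counts[site] for s in total] for site, total in sums.items()}
```

For example, `mean_ancestry_by_site([[1.0, 0.0], [0.5, 0.5]], ["SiteA", "SiteA"])` averages the two individuals into a single two-cluster profile for the hypothetical site "SiteA".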
To examine population structure with an alternative method to NGSadmix, we applied Discriminant Analysis of Principal Components (DAPC) as implemented in the R package adegenet (Jombart, 2008; Jombart et al., 2010). Genotypes were called with the same parameters as the admixture analysis, additionally enforcing a posterior cutoff (-postCutoff 0.95) in ANGSD, and output as a PLINK-formatted file. We then used PLINK (Purcell et al., 2007) to convert from PLINK to VCF format, and functions from the vcfR R package to read the VCF file and convert it to a genind object for use with adegenet within R (Knaus & Grünwald, 2017).
We examined K = 2 genetic clusters to identify the largest division in the dataset and subsequently identified an optimal number of genetic clusters, K. The optimal K was determined by successive k-means clustering as implemented in the find.clusters function of adegenet, with K selected using the Bayesian information criterion (BIC). Subsequently, an undivided geographic grouping of introduced pike was investigated by separate genotype calling and DAPC analysis using the same parameters and methods.
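The idea of choosing K by successive k-means clustering and BIC can be sketched in a few lines. This is an illustration rather than the adegenet implementation: it uses a plain Lloyd's algorithm with a deterministic initialization, takes points that would in practice be principal component scores, and assumes a BIC of the form n·ln(WSS/n) + k·ln(n), where WSS is the within-cluster sum of squares. All function names are hypothetical.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(members) for c in zip(*members))


def assign(points, centers):
    """Index of the nearest centre for each point (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with farthest-point initialization; returns (labels, WSS)."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(tuple(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    for _ in range(n_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean_point(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


def bic_by_k(points, max_k):
    """BIC for k = 1..max_k; lower values indicate a better trade-off."""
    n = len(points)
    scores = {}
    for k in range(1, max_k + 1):
        _, wss = kmeans(points, k)
        # Guard against log(0) when every point sits exactly on its centre.
        scores[k] = n * math.log(max(wss, 1e-12) / n) + k * math.log(n)
    return scores
```

In practice the BIC curve often keeps declining slowly as k grows, so the selected K is usually read from where the curve flattens rather than from a strict minimum.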

| Phylogenetic analyses of Alaska pike
We examined the relationships of Alaska pike with a phylogenetic approach that utilized 358 samples from 20 sites (including Eagle Lake from Canada) collected and sequenced in this study (Table 1). These 20 sites included 11 introduced pike sites and 8 natural pike sites from within Alaska. Genotypes were generated as a PLINK-formatted file. The PLINK file was converted to a VCF-formatted file with PLINK v1.9 (Chang et al., 2015), and the +prune plugin of BCFtools v1.10.1 was applied (bcftools +prune -l 0.9 -w 10000). The pruned SNPs were prepared for SNP-based phylogenetic analysis by removing sites that would be considered invariant due to ambiguities and formatted as a PHYLIP file (https://github.com/btmartin721/raxml_ascbias/blob/master/ascbias.py, https://github.com/edgardomortiz/vcf2phylip). We then generated a consensus tree based on 1000 ultrafast bootstrap replicates (-bb 1000) in IQ-TREE v2.0-rc1 (Minh et al., 2020). Concatenated phylogenetic inference does not model recent or ancient admixture, so TreeMix (Pickrell & Pritchard, 2012) was applied to the same SNP alignment used for the IQ-TREE analysis to identify potential admixture events in a phylogenetic framework.
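To make the linkage-disequilibrium pruning step concrete, the sketch below shows a simplified greedy version of the idea: a SNP is kept unless its r² with an already retained SNP inside the window exceeds the threshold. This is a stand-in for illustration only and does not reproduce the logic of the bcftools plugin. It assumes genotypes are coded as alternate-allele dosages (0, 1, 2, or None for missing), measures the window in number of sites, and uses genotype rather than haplotype r²; the function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors,
    using only individuals genotyped at both SNPs (None marks missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic among shared individuals: LD is undefined
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return sxy * sxy / (sxx * syy)


def prune_ld(genotypes, max_r2=0.9, window=10000):
    """Return indices of SNPs kept after greedy pruning.

    genotypes: list of per-SNP dosage vectors, in genomic order.
    A SNP is dropped if r² with any kept SNP within `window` preceding
    sites is greater than `max_r2`.
    """
    kept = []
    for i, snp in enumerate(genotypes):
        in_window = [j for j in kept if i - j <= window]
        if all(r_squared(snp, genotypes[j]) <= max_r2 for j in in_window):
            kept.append(i)
    return kept
```

With a threshold of 0.9, only near-redundant SNPs are removed, which is why a RAD dataset with several SNPs per locus can shrink substantially under pruning.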

| Origins of Antlen River northern pike near Yakutat
We examined the relationships of Antlen River pike to native pike collected from natural occurrences in northern North America to determine which lineage (Beringian or Eastern North American) Antlen River pike are most closely related to and to further investigate if these fish are an introduced population from elsewhere in Alaska.
An input file for SVDQuartets was made by grouping samples by location and data type. A set of 1,000,000 random quartets was evaluated and 100 bootstrap replicates were undertaken to assess confidence in inferred relationships.

| Data summary
We generated ddRAD-seq data from 351 individuals from 19 sites across Alaska that included sites within the species' native (eight sites, n = 83) and introduced (11 sites, n = 268) ranges. In addition, we sequenced seven individuals from a single site in Western Canada, yielding a total of 358 pike. The sample distribution is shown in Figure 1 and summarized in Table S1.

| Population structure and genetic differentiation
The introduced populations were split into three groups, referred to here as Southcentral I (Alexander Creek, Alexander Lake, Deshka River, Otter Lake and Shell Lake), Southcentral II (Bulchitna Lake, Anderson Lake, Tukallah Lake and Yentna River), and Kenai Peninsula (Stormy Lake and Tiny Lake) (Figure 1). These groupings were evident and consistent across analyses of population genetic structure and differentiation.
Pairwise estimates of FST varied widely, ranging from 0.419 (between Antlen River and Otter Lake) to 0 (−0.003; between Deshka River and Alexander Lake) (Table 2). [...]
A major aim of this research was to address the origin and relationships of introduced populations of pike in Alaska. Therefore, we also focused our admixture analyses on the introduced populations.
Here, the optimal K value was 3 and could be broadly grouped into a Southcentral I genetic cluster, a Southcentral II genetic cluster, and the two lakes on the Kenai Peninsula. Increasing the K values suggested some lakes contained a high level of admixture and might be indicative of stocking from multiple populations and/or gene flow through dispersal. For example, when K = 4 the Southcentral I group began to divide, with Otter Lake and Shell Lake differentiated from Alexander Lake, whereas Deshka River and Alexander Creek appeared to contain alleles from both non-native Southcentral groups suggesting admixture. Lastly, restricting our admixture analysis to just the Southcentral I populations suggested admixture in both Alexander Creek and Deshka River from fish from Otter Lake and Alexander Lake.
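The pairwise FST values reported in this section, including the slightly negative estimate between Deshka River and Alexander Lake, can be understood through a worked example of one common estimator. The estimator used for this study is not stated in this section, so the sketch below uses Hudson's estimator in its ratio-of-averages form purely as an illustration; the function name and the input format (alternate-allele count and total allele count per SNP) are assumptions.

```python
def hudson_fst(counts1, counts2):
    """Hudson's FST as a ratio of sums across SNPs.

    counts1, counts2: per-SNP (alt_allele_count, total_allele_count) tuples
    for population 1 and population 2, in the same SNP order.
    """
    numerator = 0.0
    denominator = 0.0
    for (alt1, n1), (alt2, n2) in zip(counts1, counts2):
        if n1 < 2 or n2 < 2:
            continue  # the sampling correction needs at least two alleles per population
        p1, p2 = alt1 / n1, alt2 / n2
        # Squared frequency difference, corrected for finite sample size.
        numerator += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else float("nan")
```

Because of the sample-size correction, two populations with identical allele frequencies yield a small negative value rather than exactly zero, which is why estimates such as −0.003 are conventionally read as no detectable differentiation.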

| Discriminant analysis of principal components
A total of 345 individuals and 4860 loci were passed to DAPC after calling genotypes and removing sampling locations with few individuals (Selawik River, North Slope and Eagle Lake).
When K = 2 genetic clusters is specified, the two observed clusters are composed of (1) native populations and Kenai Peninsula introduced pike and (2) all other introduced pike (Figure 5).
Successive k-means clustering indicated support for K = 8 genetic clusters with Antlen River, Yukon River, Minto Flats, and Fairbanks sampling locations identified as unique clusters.
Groupings of sampling locations at K = 8 are Southwest Alaska and Kenai Peninsula pike (Lake Clark, Lake Nerka, Stormy Lake, Tiny Lake), and Southcentral I sampling locations are divided into two groups: one of Alexander Creek, Alexander Lake, and the Deshka River and a second of Otter Lake and Shell Lake.
The genetic cluster present at K = 8 of Southcentral II (Anderson Lake, Bulchitna Lake, Tukallah Lake, and Yentna River) sampling locations was further examined by additional genotype calling for these fish (86 individuals, 4163 loci). This genetic cluster was subdivided by an optimal K of 2 splitting Anderson Lake from Bulchitna Lake, Tukallah Lake, and Yentna River sampling locations. The average membership in each genetic cluster for introduced pike sampling sites at K = 8 is plotted geographically in Figure 6.

| Phylogenetic analyses of Alaska pike
An initial 3614 SNPs were reduced to 907, with the consensus tree shown in Figure 7. [...] Peninsula, the Stormy Lake and Tiny Lake fish separated into two groups based on site (Figure 7c). TreeMix analysis produced topologies largely congruent with the concatenated phylogeny; notably, a mixture event into the base of the Southcentral I grouping of pike is inferred (Figure S1).

| Origins of Antlen River northern pike near Yakutat
Initially, from 139 samples from 12 presumably native pike populations, 837 SNPs met minimum MAF and other thresholds. These populations include eight sites in Alaska with RADseq data covering the broad natural range of pike in Alaska, a geographically distant site of native pike in Canada (Eagle Lake) and three locations from western North America from WGS data. Pruning reduced this number to 425 SNPs across native pike populations and the Antlen River pike. Plotting of PC 1 (30.58% of variance) and PC 2 (13.75% of variance) indicates Antlen River pike is a discrete group, separate from all other sampling locations examined (Figure 8a). Principal component 1 corresponded to the separation of Antlen River from all other pike, with Eagle Lake the closest on the PC 1 axis. The second PC separated Antlen and Southwest Alaska pike from Interior Alaska and the Yukon River pike. Principal component 3 (6.69% of variance) largely separated Interior Alaska and Yukon River pike (Figure 8b).

[...] Comparisons within the introduced range are indicated as circles with those between Stormy Lake (Kenai Peninsula) and all other locations colored in blue. Pairwise comparisons with Antlen River are indicated by red triangles.

FIGURE 4 Admixture plots generated by NGSadmix of Alaska pike at K = 2 and K = 6. Sampling locations are subdivided by status (introduced vs. native) and then by geography, that is, Interior, Western Alaska, Kenai Peninsula, then other Southcentral locations. Note that the classification of genetic clusters as Southcentral I vs. Southcentral II was based on allele sharing rather than geography (see Figure 1). Introduced populations were further subdivided, based on allele sharing, into either two genetic clusters (Southcentral I) or three genetic clusters (Southcentral II). Values of K were chosen based on the lowest log-likelihood as implemented in NGSadmix software within ANGSD.
For the construction of a species tree, 306 SNPs are available for use with SVDQuartets. Bootstrapping found maximal support for all inferred nodes. Antlen River and Eagle Lake were identified as most-closely related (Figure 8c). Minto Flats origin pike of different data types were placed together in the tree and a close relationship indicated between North Slope and Southwest Alaska samples.
Broadly, pike were distributed from east to west across the tree, the exception being Antlen River, which was most closely related to the easternmost sampling location (Eagle Lake).

[...] Figure 5b across collection sites are presented as pie charts. Four genetic clusters are present in Southcentral Alaska; genetic cluster 2 is shared between Stormy Lake and Tiny Lake (Kenai sites) and Lake Clark and Lake Nerka (Western Alaska). Other genetic clusters are found in introduced pike only.

[...] (Figure 7). This divergence is obvious when K = 4 and is maintained at higher estimates of K. These clusters were composed of Alexander Creek, Alexander Lake, Deshka River, Otter Lake, Shell Lake (Southcentral I), and Anderson Lake, Bulchitna Lake, Tukallah Lake, and Yentna River (Southcentral II). While some of these groupings conform with expectations of geographic and hydrologic connectivity (e.g., Alexander Creek flows out of Alexander Lake), others do not (Figure 6). For example, Bulchitna Lake (Southcentral II), the putative first site of pike introduction south of the Alaska Range mountains, is physically close to both Alexander Creek and Shell Lake (Southcentral I) but appears to be genetically distinct. It is not clear to what extent the genetic distinctiveness of Bulchitna Lake reflects unknown idiosyncratic aspects of the local site rather than restricted gene flow. It is notable, however, that the outlet of Bulchitna Lake has a makeshift, unmaintained dam that likely reduces gene flow between the lake and nearby populations, while other sites do not have obvious barriers to dispersal (Jalbert, 2018).
Despite lingering uncertainty regarding the origins of non-native populations, admixture plots in Figure 4 shed some light on allele sharing with native populations. For example, Southcentral I contains alleles shared with Fairbanks, Lake Clark, and Lake Nerka. This is also shown in FST estimates, which are lower for comparisons between [...] Creek and its headwater, Alexander Lake. Despite close geographic proximity, pike in Alexander Creek exhibited more private alleles and higher estimates of [...] than pike in its headwater lake, Alexander Lake (Susitna, Alaska, USA), and measures of differentiation between the groups were not zero (FST = 0.047 ± 0.011).
In contrast to the Southcentral I and II groupings, the two sites on the Kenai Peninsula most likely originated via introductions from Western Alaska. Genome-wide FST estimates were high between the Kenai Peninsula samples and the other populations (both introduced and native) but lower between Stormy Lake and the two sampled Western Alaska populations (Lake Clark and Lake Nerka; 0.224 and 0.199, respectively). A Western Alaska origin for the Kenai Peninsula samples was supported by both population genetic and phylogenetic analyses, indicating that either these lakes or other lakes in the vicinity of Lakes Clark and Nerka were the origins of pike in the Kenai Peninsula. The first documented recovery of pike in Stormy Lake was in 1970, providing several decades for pike to disperse, but movement between Stormy Lake and Tiny Lake would involve dispersal through drainage basins and crossing of a saltwater barrier, which, as discussed previously, is unlikely for this scenario.
The low likelihood of dispersal by pike without assistance between Stormy Lake and Tiny Lake paired with evidence of admixture within Stormy Lake (Figure 4) and higher nucleotide diversity than most introduced populations suggest that translocations within and/or multiple introduction events to the Kenai Peninsula have occurred.
Our results suggest that the Kenai Peninsula pike originated from multiple sources from Western Alaska, and, at least in Stormy Lake, [...]

The pike population in the Antlen River in the Yakutat region of Southeast Alaska has long been suspected to be a postglacial relict, but its origin has remained unclear. Our genetic data from the Antlen River strongly suggest a native origin, based on high genome-wide pairwise FST with all other populations studied and on phylogenetic analyses that place Yakutat pike as distantly related to all other Alaska pike.
In addition, nucleotide diversity of Antlen River pike exceeds that of numerous introduced populations (Figure 2a) and Tajima [...]

Our results contribute to a broader emerging story of pike as an increasingly successful global invader (Dunker et al., 2018). In the Colorado River, pike have been a conservation challenge since the 1970s (Tyus & Beard, 1990) and the recent invasion and spread of pike in tributaries of the Columbia River is a cause of widespread concern (Carim et al., 2019; Carim et al., 2022; Muhlfeld et al., 2008). Our findings, coupled with the observations of successful invasions elsewhere, make clear that pike are not limited, at least in the short term, by their lack of genetic diversity. Although it is unknown whether pike have adapted to novel environments through genetic or plastic mechanisms (Berghaus et al., 2019), the lack of diversity appears not to preclude genetic-based adaptation in at least some other non-native freshwater fishes (Koskinen et al., 2002). While it is possible that over the long term non-native pike will decline or even exhibit local extinctions as other invasive species have been known to do (e.g., Cooling & Hoffmann, 2015), it seems equally possible that pike are not only here to stay, but likely to flourish in new habitats, especially in the light of widespread warming. Patterns from Europe serve as a cautionary tale, suggesting that co-existence between salmonids and pike may only be possible where sufficient spatial and thermal refugia exist within habitats (Hein et al., 2014). In Alaska, it remains unclear how native fish communities will adapt to the combined effects of ongoing pike invasions and other ecological stressors in an era of rapid environmental change (Schoen et al., 2017; Westley et al., 2021).

ACKNOWLEDGMENTS
The staff and facilities of the Alaska Cooperative Fish and Wildlife Research Unit and the Institute of Arctic Biology at the University of Alaska Fairbanks were instrumental to the success of this project. We thank the students of the Alaska-Kamchatka exchange, the Alaska

CONFLICT OF INTEREST STATEMENT
The authors have no competing interests to declare.

DATA AVAILABILITY STATEMENT
Sequence data generated for this study have been deposited in the National Center for Biotechnology Information Sequence Read Archive under BioProject PRJNA949726.