Identifying the genetic structure of introduced populations of northern snakehead ( Channa argus ) in Eastern USA

With new introductions of invasive species occurring at an alarming rate, resource managers must be able to rapidly determine the source of introduction if there is to be a chance of preventing further spread or future invasions. The first North American populations of reproducing northern snakehead (Channa argus) were detected in Maryland in 2002 and have continued to spread into new watersheds. We used four microsatellite markers to describe genetic characteristics of four established C. argus populations in Eastern U.S.A., a collection of samples of unknown origin from a Chinatown market in Manhattan, New York, and of a C. argus population of uncertain status in the Upper Hudson River. We aimed to determine the probable source of the introduction of C. argus to the Upper Hudson River basin and to clarify the genetic structure of C. argus populations in northeast U.S.A., overall. Results from population structure analysis infer two distinct genetic groups among the specimens sampled. Measures of genetic distance suggest the C. argus population in the Upper Hudson is most similar to the population in the Lower Hudson near Queens, NY. Results conclude that the Potomac River and Chesapeake Bay basins represent one genetic population, which suggests that introductions to the Chesapeake Bay were sourced from the Potomac population and/or that the Bay does not represent a barrier to C. argus dispersal. Overall, our analysis provides evidence of multiple introductions into U.S. waters and human mediated secondary spread from these founding populations.


Introduction
Effective containment of invasive species relies on an underlying knowledge of the invader (e.g.dispersal ability, physiological limits, and competitive ability), but also critically on an understanding of the pathway(s) of introduction and spread (Koenig et al. 1996;Palumbi 2003;Rollins et al. 2009;Estoup and Guillemaud 2010;Lowe and Allendorf 2010).However, information about invasion routes is often based on historical and observational data, which can be difficult to discern or obtain (Estoup and Guillemaud 2010).For example, global trade and transport systems are well-documented vectors for species dispersal (Mills et al. 1993;Johnson et al. 2001;Bax et al. 2003;Kilian et al. 2012), but documentation of the presence of an invasive species through routine quarantine and inspection in harbors or at airports cannot provide information regarding the subsequent steps in the invasion process (i.e. survival, reproduction, dispersal).
Genetic tools provide a rapid means for verifying each step of the invasion process, including molecular surveillance for rare species in the environment and genetic stock analysis to delineate management and eradication units (Kolbe et al. 2004;Darling et al. 2008;Ficetola et al. 2008;Rollins et al. 2009;Jerde et al. 2011).The use of microsatellite markers is a routine and simple method commonly used in population studies (e.g.Paetkau et al. 1995;Darling et al. 2008) to detect subtle and recently evolved genetic differences on which statistics (F ST , F IS , etc.) can be computed (e.g.Nei 1972;Weir and Cockerham 1984;Slatkin 1995).A microsatellite locus is a short tandem repeat on the non-coding region of the DNA strand.These loci have high mutation rates, which allows researchers to determine distinct populations.This is based on the alleles of a given locus that are present in a given sample.Microsatellite loci are extremely useful in that they can be used to cluster genetically similar groups without prior knowledge of sampling locations of individuals (Paetkau et al. 1995;Pritchard et al. 2000;Rollins et al. 2009).
Snakeheads (Channa spp.and Parachanna spp.) are an imminent threat to invade U.S. waters by way of aquarium release, live trade, and natural dispersal post-introduction (Courtenay and Williams 2004;Fuller et al. 2015).Channa argus (Northern snakehead;Cantor, 1842) pose the greatest threat in the United States because of their aggressive piscivory, growth rate, and tolerance to low temperatures and ice coverage (Odenkirk and Owens 2005;Herborg 2007).Native to China, C. argus is an aggressive predator, shown to compete with and prey upon native and sport fishes in North America (Love and Newhard 2012;Saylor et al. 2012).This species threatens to wipe out native forage populations and displace sport fishes that contribute millions of dollars to local economies (Love and Newhard 2012).The first reproducing C. argus population in America was documented in 2002 in a Maryland pond less than 300 m from the Little Patuxent River, a tributary of the Chesapeake Bay.In 2004, both adult and juvenile C. argus were captured in a Potomac River tributary, Little Hunting Creek.In follow up surveys, the abundance of age three fish (3 yo) in the Little Hunting Creek suggests that this population may have predated the discovery of the Maryland pond population (Odenkirk and Owens 2005).The species (C.argus) is now established throughout the Potomac River basin in Virginia and Maryland, on both sides of the Chesapeake Bay basin, and in the Hudson River and Delaware River basins (Figure 1; Fuller et al. 2015).King and Johnson (2011) developed and tested nineteen microsatellite loci on a C. argus population in Meadow Lake, New York and determined that these markers could sufficiently be used to elucidate fine-scale population structure and define management/eradication units.Unfortunately, no such study has been conducted for all C. argus populations in the eastern U.S. and, therefore, the genetic structure of these populations is poorly understood.A populationlevel study (n=8 populations) using C. argus microsatellite markers was conducted in China (Zhuo et al. 2012), but a more comprehensive study is required in U.S. waters to determine population boundaries and identify potential sources of secondary spread (ANS Task Force 2014).
Natural dispersal and genetic admixture of North American populations of C. argus are presumably limited by isolation and distance between populations (Lapointe et al. 2010; Aquatic Nuisance Species (ANS) Task Force 2014; Figure 1).Although expanses of salt water were once considered a barrier to snakehead dispersal, long distance dispersal events through the Chesapeake Bay (up to 40 km over 2 months) have been documented (Lapointe et al. 2013) Northern snakehead are a popular food fish and much of the recent spread is thought to be driven by the desire to establish new local fisheries (ANS Task Force 2014).Prevention of future spread requires that established populations are contained and nascent outlying populations are extirpated where possible.The latter relies upon treatment of waterways with general piscicides like rotenone that can be expensive, subject to failure, and or can result in non-target impacts that can be socially and politically unpalatable (Cailteux et al. 2001).However, an understanding of the genetic structure of C. argus populations and sources of secondary spread could help direct management decisions and justify actions, including eradication efforts, to prevent future spread.Insight into the genetic structure of populations may also aid in identifying the source of new C. argus introductions in the future.
that snakehead may be reintroduced to this or a neighboring water body remains.In the current study, we use microsatellite markers to elucidate the probable source of the introduction of northern snakehead into the Upper Hudson River basin by screening samples from four established C. argus populations in the northeastern U.S. and samples from a New York City fish market of unknown origin.More generally, we use four microsatellites to clarify the genetic structure of northern snakehead populations throughout the Eastern USA in order to identify potential dispersal patterns and populations that could facilitate secondary spread by humans.

Sample collection
Muscle tissue and fin clips from C. argus specimens (n=274 total individuals) were provided by universities along with state and federal agencies in Maryland, Virginia, Delaware, North Carolina, Pennsylvania, and New York (See supplementary materials for sample collection information, Table S1, Table S2).Tissue samples were also obtained from C. argus specimens purchased from a live fish market in Chinatown, Manhattan, New York (n=3).Samples used in this study varied in their preservation method, including fresh, frozen, and preserved in ethanol after their collection.Upon arrival in the lab, tissues were transferred to and stored in 95% non-denatured alcohol until DNA extraction occurred.Putative populations used for this study include the Upper Hudson River basin, Potomac River basin, Lower Hudson River basin, Delaware River basin, Chesapeake Bay basin, and Chinatown market (Table S1) and samples sizes were 112, 91, 11, 57, 7, and 3 individuals, respectively.Populations represent individual samples collected from larger, encompassing basins and may contain individuals from multiple waterbodies.While individual samples

Laboratory methods
DNA extractions were performed on muscle tissue or fin clips using Qiagen DNeasy Blood and Tissue kits (Qiagen Inc., Valencia CA).Four primer pairs developed in previous studies (Table 1) were used to amplify microsatellite loci using end-point PCR.One primer from each pair was commercially synthesized with a fluorescent dye -FAM, HEX, VIC, or NED (Applied Biosystems Inc.) -on the 5' or 3' end (Table 1).We chose these four loci because they were proven effective for discerning population boundaries by Zhuo et al. (2012) and shown to be highly polymorphic by King and Johnson (2011) and Wen and Sun (2010)

Statistical analyses
We tested for deviations from Hardy-Weinburg equilibrium (HWE) using Genepop version 4.2 (Table 2; Raymond and Rousset 1995;Rousset 2008).Likewise, we tested for significant linkage disequilibrium for each pair of loci for each population with 1000 iterations.We used sequential Bonferroni correction to compute the critical significance levels for simultaneous statistical tests for deviations from HWE or linkage disequilibrium (Rice 1989).Genetic diversity across the invaded range of C. argus in the eastern USA was investigated using several genetic metrics.We calculated the average number of alleles at each locus for each putative population group.To account for small sample sizes (in some cases) we used the R package 'diveRsity' to calculate a standardized allelic richness based on 1000 re-samples (with replacement), with n = smallest sample size (Keenan 2014).We also calculated observed heterozygosity, expected heterozygosity, and inbreeding coefficient (F IS ) for each population at each locus using 'diveRsity'.Analysis of Molecular Variance (AMOVA) was used to describe genetic variation within and among populations, as calculated in GenAlEx (Excoffier et al. 1992;Peakall andSmouse 2006, 2012).We also used GenAlEx to compute pairwise F ST values for each putative population, and used a Principle Coordinate Analysis to visualize genetic distances relative to each individual C. argus sample.
Structure v2.3.4 was then used to infer the number of distinct genetic groups based on Bayesian assignment analysis (Pritchard et al. 2000;Falush et al. 2003).Parameters in our model include admixture, correlated allele frequencies, a single α, and a total run length of 150,000 generations with a burn-in period of 20,000 generations.We ran the model in three iterations for K (number of distinct genetic clusters) values 1-8 to evaluate likelihood scores and consistency among runs (Pritchard et al. 2000).K=8 was chosen as the maximum number of clusters based on preliminary analyses, which yielded likelihoods lower than those observed at lower values of K. Greater values were eliminated from the formal analysis to reduce computation time (Darling et al. 2008).We selected the smallest K value that captured the major structure in the data by assessing increases in posterior probability as K increases; the value of K that preceded the plateau in the plot of posterior probability versus K was chosen (Garnier et al. 2004;Darling et al. 2008).Falush et al. (2003) suggest the value of K (K = number of populations) that maximizes log-likelihood of posterior probability is an appropriate choice when inferring the number of populations.We also consider the method suggested by Evanno et al. (2005) to select K based on the greatest rate of change in the log probability of the data between K values (Figure S1).Evanno's delta K was computed in the program Structure Harvester (Earl and von Holdt 2012).

Results
Eleven of the 20 HWE tests were significant after sequential Bonferroni correction (Table 2).Deviations from HWE occurred in all loci across the Upper Hudson basin population and all loci except locus WL28 in the Potomac River population.Locus CarD108 and locus WL8 also deviate from HWE in the Delaware River population.Loci did not exhibit significant linkage disequilibrium after Bonferroni correction, except for the Upper Hudson River basin population at loci WL8 and WL28.Notably, most populations with non-significant departures from HWE (and F ST , discussed later) occur in populations with small sample sizes (Table 2).We found 48 unique alleles across four microsatellite loci.Locus CarD108 exhibited the greatest number of alleles (20), while locus CarC6 had the least (7).The average number of alleles per locus (N A ) range from 2.8 to 5.5 (Table 3).Average gene diversities (H e ) were variable across all four loci, likely due to comparatively small and large sample sizes.F IS values indicate a considerable degree of inbreeding within some populations and not others (Table 2), which could also be due to the variability in sample sizes.We also found three rare alleles shared only between the Lower Hudson and Upper Hudson populations -the greatest number of shared rare alleles between the Upper Hudson and any other population.The only other putative population pair that shares more rare alleles is Chesapeake Bay and Potomac River populations (five rare alleles in common).
We found significant genetic differentiation among putative C. argus populations.Analysis of molecular variance (AMOVA) reveals 11% variation partitioned among populations (p-value < 0.001) and 85% of the variation partitioned within populations.Twelve of the fifteen possible pairwise F ST comparisons were significant after sequential Bonferroni corrections (Table 3).
Comparisons with the Chinatown population resulted in non-significant F ST, likely due to the extremely small sample size (N=3; see below for discussion).The greatest genetic differentiation between populations occurred in the Upper Hudson and Chesapeake Bay populations, while the narrowest differentiation was between the Potomac River and Chesapeake Bay populations (F ST = 0.0032; p-value<0.001).The percentage of variation from Principal Coordinate Analysis explained by the first and second axes was 23 percent and 12 percent, respectively.
Bayesian clustering analysis indicated either two or three distinct genetic clusters for C. argus populations.We adopted the K=2 model based on its biological significance to our dataset and geographic relevance (Darling et al. 2008).While northern and southern populations are distinct and separate, uncertainty in genetic structure arises among the northern populations; population structure is indiscernible between the Delaware River population and the three populations in New York State (Figure 2).As documented in Figure 3, the C. argus population in the Delaware River basin may have received genetic influence from the northern populations (in New York) and southern populations (in the Chesapeake Bay and Potomac Rivers).This suggests that fish could have been transferred from both locations to the Delaware River basin, as unique alleles from each of these populations appear within these samples.

Source population identification
Based on sample sizes and our use of four microsatellite loci, we can only speculate as to which population acted as the source of the C. argus introduction into the Upper Hudson River basin.The number of rare alleles shared exclusively between the Upper Hudson and Lower Hudson leads us to purport that Lower Hudson  basin population of C. argus (also the closest geographic population) is the source of the snakehead introduction into the Upper Hudson River Basin.This affirms one of two things: the population introduced into the Upper Hudson River basin was either the result of fish moved directly from the Lower Hudson population (Willow and Meadow Lakes) or, fish in both populations are derived from the same source population in a different location that was not sampled in the current study (Slatkin 1985).

Genetic structure of populations
Bayesian clustering results suggest that two genetic populations of C. argus exist in the eastern United States, possibly as a result of two unique introductions from a source population in its native range or from some other undiscovered and unsampled population not included in our dataset.Assignment of northern individuals to a single genetic stock indicates a common ancestry and that founding individuals may have come from a common source population (Alfaya et al. 2012).The apparent lack of genetic mixing between northern and southern genetic populations suggests that a barrier to either natural dispersal or reproduction exists between the two, and that admixture only occurs when facilitated by secondary spread by humans (LaRue et al. 2010).However, it remains unknown what this barrier is, whether anthropogenic, hydrological, or some other mechanism of separation.Both northern and southern clusters of populations were not distinguishable using the Structure software package, despite high and significant F ST among nearly all comparisons.
Individual-based Bayesian clustering analyses (Figure 2) and Principle Coordinate Analysis (Figure 3) provide strong support for grouping the Chesapeake Bay population with the population that has developed throughout the Potomac River.A distinct separation between northern and southern populations is apparent (Figure 3).Based on observations of dispersal behavior, Lapointe et al. (2013) noted that snakehead in the Potomac River readily crossed the main channel and he proposed that the Potomac River system contains a single, patchy population of C. argus.This patchy population likely explains the separation of two "sub-populations" in the southern cluster (Figure 3).Lapointe et al. (2013) also noted that some snakehead dispersed out into Chesapeake Bay and passed considerable distances through estuarine environments.Our analyses provide further evidence that the brackish waters of Chesapeake Bay have not been a barrier to natural dispersal, as was initially believed (ANS task Force 2014).In light of this evidence, managers may also want to re-examine whether salinity and environmental conditions in the lower Hudson and East Rivers of New York will prevent the natural spread of the established C. argus populations from Willow and Meadow lakes (Queens, NY).
The four microsatellite markers used were effective in discerning the genetic structure of C. arugs populations in eastern U.S., and were used to characterize C. argus populations in China (see Zhuo et al. 2012).The inclusion of additional loci in our study could facilitate further resolution of population structure, and may provide more clarity into the Delaware River basin population.The authors of Structure suggest that additional markers may decrease the variance of posterior probability estimates, but the program is already effective at discerning population structure with just a few loci (Pritchard et al. 2000).Cavers et al. (2005) determined that microsatellite studies using few loci can be sufficient if they provide highly informative content (i.e.rare alleles).Until more advanced methods (i.e.SNPs, RAD-tags) become less expensive to implement, using the fewest possible microsatellite markers to achieve meaningful results such as we have achieved in this study is the most cost effective method for determining population boundaries and provides a useful tool for management groups in the immediate future.
Few conclusions can be made for populations with very small sample sizes (Lower Hudson River basin, Chinatown, and Chesapeake Bay basin), but it is important to include these samples in general analyses due to their potential role in secondary spread.Many of the C. argus populations in the U.S. are small and/or have already been extirpated.By including all available snakehead genetic tissue, we can understand what role, if any, some of these populations had in the establishment of C. argus in the U.S. based on their genetic proximity to all available tissues.Additionally, Hubisz et al. (2009) suggest that new models implemented into Structure can infer weak population structure despite small sample sizes or variable data.Likewise, the models developed by Hubisz et al. (2009) do not infer population structure when none is present, supporting the robustness of our conclusions for K=2.

Invasion pathways and management implications
Orrell and Weigt (2005) compared snakehead haplotypes from fish captured in the Potomac River to those from (then) established populations using mitochondrial DNA.That study determined no haplotypes were shared between population areas, indicating several independent introductions of northern snakehead into each area (Orrell and Weigt 2005).Populations of C. argus have since expanded from their previous range and new populations have emerged (Courtenay and Williams 2004;Odenkirk and Owens 2007).Our study supports a minimum of two unique introductions into U.S. waters from a foreign stock (Figure 2), but in contrast to Orrell and Weight (2005) our data indicate that certain C. argus populations in Eastern U.S. waters may be acting as bridgehead populations (e.g.Lower Hudson River) from which other snakehead populations are derived.Genetic analyses of fish from the native range of C. argus in China should be analyzed to confirm this hypothesis.
Currently, prohibiting the sale and possession of live C. argus has not prevented the movement and introduction of snakehead along the U.S. eastern sea board.Hulme (2009) determined that eliminating any variables that contribute to invasive species dispersal can greatly reduce the risk of intentional or inadvertent introduction; specifically, eradication of incipient populations may be the best solution if the introduction is a single event and the invaded area is relatively small (Clout and Veitch 2002;Hulme 2006).While eradication may not be a feasible option for expanded populations in tributaries of Chesapeake Bay and the Potomac River, it is conceivable that populations that occur in small, discrete waterways could be eradicated (e.g.Lower Hudson River).Hulme (2006) suggests that a hybridized management strategy of eradication and control may be optimal in some cases.For C. argus, eradication of small outlying populations like the those in Queens, New York and Philadelphia, Pennsylvania, in concert with enhanced efforts to contain and control snakehead in Potomac and Chesapeake Bay tributaries, may be necessary to mitigate a much broader snakehead invasion in the eastern U.S.This is especially true in light of the apparent ability of snakehead to disperse long distances through estuarine environments.

Figure 1 .
Figure 1.Distribution of sampling sites and the collection locations of putative C. argus populations in the Eastern United States.For details see Supplementary material, TableS1.

Figure 2 .
Figure 2. Results from Bayesian clustering analysis when K=2.Each individual is represented by a vertical bar in K colored segments, where K is the number of clusters and the length of the segment is proportional to the individual's membership in the cluster.The run (out of three replicates) with the highest posterior probability (and determined ΔK) is shown.Black vertical bars bisecting the plots delineate putative populations.

Figure 3 .
Figure 3. Principle Coordinate Analysis (PCoA) of standardized genetic distance matrix of C. argus populations in eastern U. S. Coord. 1 explains 23% of variation in the data; Coord. 2 explains 12% of variation in the data.

Table 1 .
Primer sequences for northern snakehead microsatellite loci listed 5' to 3' with fluorescent tags (in bold).

Table 2 .
Genetic variability of four microsatellite loci for six northern snakehead groups in the United States.

Table 3 .
Pairwise F ST values (below diagonal) between six northern snakehead populations in the United States based on four microsatellite loci.Significant F ST values were calculated after sequential Bonferonni corrections and are shown in bold.