Patterns of genetic structuring at the northern limits of the Australian smelt (Retropinna semoni) cryptic species complex

Freshwater fishes often exhibit high genetic population structure due to the prevalence of dispersal barriers (e.g., waterfalls) whereas population structure in diadromous fishes tends to be weaker and driven by natal homing behaviour and/or isolation by distance. The Australian smelt (Retropinnidae: Retropinna semoni) is a native fish with a broad distribution spanning inland and coastal drainages of south-eastern Australia. Previous studies have demonstrated variability in population genetic structure and movement behaviour (potamodromy, facultative diadromy, estuarine residence) across the southern part of its geographic range. Some of this variability may be explained by the existence of multiple cryptic species. Here, we examined genetic structure of populations towards the northern extent of the species’ distribution, using ten microsatellite loci and sequences of the mitochondrial cyt b gene. We tested the hypothesis that genetic connectivity among rivers should be low due to a lack of dispersal via the marine environment, but high within rivers due to dispersal. We investigated populations corresponding with two putative cryptic species, SEQ-North (SEQ-N), and SEQ-South (SEQ-S) lineages occurring in south east Queensland drainages. These two groups formed monophyletic clades in the mtDNA gene tree and among river phylogeographic structure was also evident within each clade. In agreement with our hypothesis, highly significant overall FST values suggested that both groups exhibit very low dispersal among rivers (SEQ-S FST = 0.13; SEQ-N FST= 0.27). Microsatellite data indicated that connectivity among sites within rivers was also limited, suggesting dispersal may not homogenise populations at the within-river scale. Northern groups in the Australian smelt cryptic species complex exhibit comparatively higher among-river population structure and smaller geographic ranges than southern groups. These properties make northern Australian smelt populations potentially susceptible to future conservation threats, and we define eight genetically distinct management units along south east Queensland to guide future conservation management. The present findings at least can assist managers to plan for effective conservation and management of different fish species along coastal drainages of south east Queensland, Australia.


INTRODUCTION
Genetic structure in aquatic fauna is strongly influenced by the characteristics of the ambient environment. Freshwater species typically exhibit higher levels of genetic differentiation than those living in estuarine or marine habitats (Ward, Woodwark & Skibinski, 1994;Sharma & Hughes, 2009). These greater levels of genetic structure in freshwater fish from different drainages are the result of the isolating nature of drainage systems and relatively smaller population sizes compared with marine species (McGlashan & Hughes, 2002;Huey, Baker & Hughes, 2010). Strong to moderate genetic structure is observed in many freshwater fish species suggesting restricted gene flow and limited dispersal among populations (Leclerc et al., 2008;Pereira, Foresti & Oliveira, 2009;Huey, Baker & Hughes, 2010). Movement by obligate freshwater organisms is limited to the water column and the freshwater environment, preventing inter-catchment dispersal via the sea (Burridge et al., 2008;Hughes, Schmidt & Finn, 2009;Bernays et al., 2015). In addition, various in-stream barriers to dispersal including waterfalls, dam walls, habitat heterogeneity, dried river reaches, and steep catchment gradients may act to restrict gene flow among populations within catchments (Amoros & Bornette, 2002;Huey, Baker & Hughes, 2010). As a consequence of the physical limitations to dispersal in freshwater environments, populations of aquatic organisms are often highly differentiated both among and within catchments (McGlashan & Hughes, 2000;Hughes, 2007;Sharma & Hughes, 2009). For a particular species, it is essential to understand the levels of population differentiation, genetic diversity, and rates of gene flow among populations for proper conservation and management of freshwater ecosystems (Geist, 2011).
The southern smelt (Retropinnidae: Retropinna spp.) is a common fish species distributed throughout the rivers of south-eastern Australia (McDowall, 1996). Individuals reach a maximum length of about 100 mm total length (TL), although adults are usually 50-60 mm TL (Pusey, Kennard & Arthington, 2004). Australian smelts are currently formally recognised as two described species R. semoni Weber, and R. tasmanica McCulloch, but recent genetic analyses have identified a complex of five or more cryptic species across their geographic range based on allozymes, microsatellites and mitochondrial DNA data (Hammer et al., 2007;Schmidt, Islam & Hughes, 2016). Otolith chemistry studies in the southern part of their distribution have shown that Australian smelt exhibit a range of life history patterns, including freshwater residency, facultative diadromy and estuarine residency (Crook, Macdonald & Raadik, 2008;. In inland regions of Australia, large numbers of Australian smelt have been observed moving upstream through fishways (e.g., Baumgartner & Harris, 2007) and the species is widely described as potamodromous (i.e., migration within freshwater for the purpose of breeding) (e.g., Rolls, 2011). Nonetheless, Woods et al. (2010) found strong genetic structure among inland populations of Australian smelt and suggested low levels of dispersal in at least some regions. These differences between studies could reflect differing life-histories among the cryptic species.
In most studies to date, diadromous behaviour has been shown to facilitate genetic connectivity among river catchments and typically results in ''isolation-by-distance'' (IBD) patterns of population genetic structure (Keenan, 1994;Jerry & Baverstock, 1998). In Australian smelt, however, there is strong genetic differentiation among catchments across the southern part of the range-even among populations containing diadromous individuals, suggesting high retention of fish within estuaries and a lack of marine dispersal . The aim of the current study was to examine patterns of genetic connectivity of populations in the north of the geographic range of Australian smelt, which contains two putative cryptic species ('SEQ', 'CEQ' sensu. Hammer et al., 2007) that differ from those studied in detail previously ('MTV', 'SEC' sensu. This study focuses on the SEQ lineage which was further subdivided into northern and southern groups (SEQ-N, SEQ-S) following Page & Hughes (2014). Mitochondrial DNA sequence data combined with nuclear data from 10 microsatellite loci were used to test the hypotheses that, (i) northern R. semoni would display high population structure among rivers similar to southern populations; and (ii) that genetic structure within rivers would be low due to dispersal within rivers.

Sampling strategy
A total of 389 individual samples were collected from 15 locations in south-east Queensland, Australia ( Fig. 1; Table 1A). Samples were collected using a hand-held seine net from an upstream and a downstream site from each of eight river systems (except the Noosa River-downstream only). We aimed to collect at least 30 individuals per site but this was not always possible, as the species was not abundant in all rivers. Fin clips or entire individuals were placed in 95% ethanol in the field and stored prior to preparation for analysis. All procedures were carried out according to Australian Ethics Commission protocol number ENV/23/14/AEC.

Molecular methods
Genomic DNA was extracted from fin tissue using the DNeasy Blood and Tissue kits (Qiagen, Germantown, MD, USA) following the manufacturer's directions. Microsatellite markers developed for R. semoni were amplified and genotyped using primers developed by Islam, Schmidt & Hughes (2017). Ten loci were screened across all individuals. The ten loci were BS18, BS3, BS4, BS5, BS20, BS21, BS22, BS24, BS8 and MS24. Microsatellite screening was carried out in 10 µl polymerase chain reaction (PCR) consisting of 0.5 µl of genomic DNA, 0.2 mM reverse primer, 0.05 mM tailed forward primer, 0.2 mM tailed fluorescent tag (either FAM, VIC, NED or PET, Applied Biosystems), 1 × PCR buffer (Astral Scientific, Caringbah, Australia) and 0.02 units of taq polymerase (Astral Scientific, Caringbah, Australia). The following basic thermocycler settings for PCR were performed: initial denaturation at 94 • C for 4 min, followed by 35 cycles at 94 • C for 1 min, 57 • C for 30 s, 72 • C for 1 min and a final extension at 72 • C for 7 min. Fluorescently  labelled PCR products were pooled and added to 10 µl of Hi-Di TM formamide with 0.1 µl of GeneScan TM 500 LIZ size standard. Fragment analysis was conducted on an ABI PRISM 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions. Data were scored using GENEMAPPER version 3.1 software (Applied Biosystems, Foster City, CA, USA). Two individuals from each of the 15 populations represented in the microsatellite study were randomly selected for mtDNA analysis except the Mooloolah_upper (MLU) site from which one additional sample was included for analysis. Samples from four additional sites not included in microsatellite analysis were also sequenced-two sites from the Mary River and two sites from the Brisbane River ( Fig. 1 and Table 1B). In total 68 individuals from nineteen sites were sequenced. A 666 bp fragment of the cytochrome b region of the mtDNA genome was selected for sequence analysis. The primers HYPSLA and HYPSHD (Thacker et al., 2007) were used to amplify the region in 10 µL reaction mixtures. PCR conditions were 4 min at 95 • C, followed by 45 cycles of 30 s at 95 • C, 45 s at 53 • C, 45 s at 72 • C and a final extension cycle of 7 min at 72 • C. MtDNA sequences were edited and aligned using Geneious version 9.1.5 (Kearse et al., 2012).

Genetic diversity
Microsatellite genotype frequencies were checked for the presence of null alleles, large allele dropout and stuttering artefacts using MICRO-CHECKER v2.2.3 (Van Oosterhout et al., 2004). Tests for linkage disequilibrium (LD) and departures of genotypic proportions expected under Hardy-Weinberg Equilibrium (HWE) were undertaken with exact tests for each population and over all loci using default settings in GENEPOP v4 (Rousset, 2008). Probability values were corrected using standard Bonferroni correction (Rice, 1989) whenever multiple testing was performed.

Population genetic structure
Genetic structure among the 15 populations was quantified through estimating pairwise and global F ST values in ARLEQUIN. These were tested for significant deviation from panmictic expectations by 10,000 permutations of individuals among populations. Populationspecific F ST values were calculated using GESTE v2.0 (Foll & Gaggiotti, 2006) to evaluate the contribution of individual population samples to overall F ST .
ARLEQUIN was used to evaluate the geographic structuring of genetic variation. F ST was calculated for each locus separately and as a weighted average over the ten micosatellite loci. Statistical significance of Bayesian clustering methods implemented in STRUCTURE v.2.3.1 (Pritchard, Stephens & Donnelly, 2000) were applied to estimate the number of genetically homogeneous clusters (Latch et al., 2006;Hasselman, Ricard & Bentzen, 2013). This programme builds genetic clusters by minimizing linkage disequilibrium and deviations from Hardy-Weinberg equilibrium expectations within clusters. All individuals were assigned to clusters without prior knowledge of their geographic origin using the admixture model with correlated allelic frequencies. Ten independent runs with the number of potential genetic clusters (K ) from 1 to 16 were carried out to verify that the estimates of K were consistent across runs. The burn-in length was set at 250,000 iterations followed by a run phase of one million iterations. The generated results were imported into the software STRUCTURE HARVESTER (Earl & vonHoldt, 2012) to calculate the ad hoc K statistic (Evanno, Regnaut & Goudet, 2005). The K value where K had the highest value was identified as the most likely number of clusters.

Analysis of isolation by distance
A test for a positive association between genetic and geographic distances (Isolation by distance (IBD)) based on microsatellite DNA loci was carried out using a Mantel test (10,000 permutations) in ARLEQUIN. Genetic distance was represented as F ST . Stream distances were calculated between river mouths and then sample sites using Google Earth.
Migration and gene flow BAYESASS v1.3 was used to calculate contemporary migration rates over the past few generations, where m is the proportion of immigrants in a focal population i that arrive from a source population j (Wilson & Rannala, 2003). Migration rates were estimated for all pairs of sites and rate were reported that fell outside the 95% confidence interval simulated for uninformative data (Wilson & Rannala, 2003). We also used the Bayesian assignment procedure of Rannala & Mountain (1997), as implemented in GENECLASS 2 (Piry et al., 2004) to estimate whether our samples might contain individuals that were first generation (F 0 ) immigrants from unsampled populations. Here we used the Paetkau et al. (2004) method to compute probabilities from 10,000 simulated genotypes to identify F 0 immigrants.

Analysis of mtDNA sequence data
A neighbour-joining (NJ) tree analysis was performed using the HKY distance model in Geneious version 9.1.5 with 1,000 bootstrap replicates (Kearse et al., 2012). In addition to the 68 sequences generated from this study, two Genbank accessions were used, one representing R. tasmanica: JN232589; and one representing R. semoni: JN232588 (Burridge et al., 2012). The R. semoni sequence JN232588 lacks locality information (C Burridge, pers. comm., 2017), but likely belongs to a southern lineage of R. semoni which are known to have a closer mtDNA relationship with R. tasmanica than to northern lineages .

Genetic variability and levels of differentiation
After Bonferroni correction, 3 out of 15 populations exhibited deviations from HWE in only one or two loci (Table S1). All loci were kept for further analyses since deviations were not consistent across populations. Instances of null alleles estimated using MICRO-CHECKER were rare and not consistently associated with specific loci or populations (Table S2). No evidence for genotypic linkage disequilibrium was observed between any pair of loci across all populations.
Population genetic diversity indices are shown in Table 1A. Microsatellite genetic diversity was high. Mean number of alleles per population ranged from 4.60 (MLD) to 14.70 (CMU) and heterozygosity averaged across loci ranged from 0.566(MLU) to 0.887 (CMD and CMU). Most sites exhibited positive F IS values, indicating that most of the populations had slight heterozygote deficits.
Most of the pairwise F ST values between the 15 populations were significant and ranged from −0.018 to 0.404. The SEQ-N populations were more diverged from one another than the populations in the SEQ-S group. The lowest pairwise F ST value (F ST = −0.018; P < 0.05) was observed between populations NRD and NRU. The highest genetic divergence (F ST = 0.404; P < 0.05) was observed between populations NSD and MLU. Out of 105 comparisons, only two comparisons were non-significant (P > 0.05) and each of these pairs was from within the same river (Logan-Albert and Nerang). However, one site of each of these pairs contains a small number of samples which probably reduces the power to detect the significant F ST values. Generally F ST comparisons revealed much less divergence among populations within the same river than between populations from different rivers ( Table 2). STRUCTURE analysis revealed the highest likelihood was at K = 8 clusters (Average log probability of data Ln[P(DK)] ± SD = −15125.4 ± 0.584618) (Fig. 2, Table S3) indicating this as the best estimate of the true number of genetic clusters. The height of K was used as an indicator of the strength of the signal detected by STRUCTURE (Evanno, Regnaut & Goudet, 2005). K also showed the highest peak at K = 8, suggesting eight genetically homogeneous clusters across the sampled populations (Fig. S1) and negligible migration was observed among rivers (Fig. 2).
Strong genetic structure was observed by AMOVA. The AMOVA showed significant genetic differentiation between the two groups (SEQ-N and SEQ-S) (F CT = 0.05), but also among populations within groups (F SC = 0.19) (Table 3A). There were similar patterns observed in the two groups when each group (SEQ-N and SEQ-S) was analysed separately, with the F CT value (among rivers) higher than the F SC value (among sites within rivers) in both groups (Table 3Bi and 3Bii). However, the overall F ST values, and each of the other F statistics in the hierarchy were slightly higher within the SEQ-N group than the SEQ-S group.

Isolation-by-distance
There was a significant correlation between genetic differentiation and stream distance among populations in the SEQ-S group (R 2 = 0.3687, p = 0.001; BRD, BRU, LAD, LAU,   (Table 1A) in south-east Queensland, Australia. Individual sampling locations is listed below the figure and see Table 1A for site codes and locations. Each column represents one individual and the colours represent the probability membership coefficient of that individual for each genetic cluster. Results of Evanno's Mean LnP(D) and K (Table S3;  CMD, CMU, NRD, NRU, CRD, and CRU) (Fig. 3A), but not for the SEQ-N group (R 2 = 0.0355, p = 0.302; MRD, NSD, MRU, MLD and MLU) (Fig. 3B).

Contemporary migration
Very little contemporary migration was observed among the coastal river populations. Only six sampled populations contained individuals that were identified as potential immigrants from the BAYESASS analysis. In all cases, the putative source population was the paired site within the same catchment (Table 4). In most cases, dispersal was from the upstream to the downstream site. Only individuals from Currumbin Creek were estimated to have dispersed in an upstream direction. Only thirty-six (<10%) of 389 individuals across all sites were identified as F 0 migrants using the ''detection of first generation migrants'' option in GENECLASS2 (Table 5).

MtDNA sequences analysis
The edited alignment for the cyt b gene was 575 bp and included 121 variable positions. All sequences are lodged under GenBank accession numbers MG867590-MG867657. The neighbour-joining tree revealed two strongly supported clades (bootstrap 89% SEQ-S; 96% SEQ-N; Fig. 4). Phylogeographic structure was also clearly evident within clades. All individuals from four sites in the Brisbane River formed a distinct clade, and all three rivers sampled for the SEQ-N lineage formed shallow clades (i.e., Mary, Noosa and Mooloolah rivers; Fig. 4). Genetic distance was high between northern smelt lineages and the southern smelt sequences used as outgroups (uncorrected mean nucleotide distance 0.15-0.17). The mean nucleotide distance between the two northern lineages (SEQ-N and SEQ-S) was 0.04 (SE = 0.007).

Population structure and dispersal
Based on previous studies of Australian smelt (Woods et al., 2010;, we had hypothesized that R. semoni in south-east Queensland would exhibit limited genetic connectivity among river systems. Our findings of strong genetic differentiation among rivers support this hypothesis. For both groups (SEQ-N and SEQ-S), there were highly significant F ST values, which indicated that populations were not panmictic. The results of the significant genetic differentiation among rivers were also consistent with the results of population structure revealed from the STRUCTURE analysis, suggesting restricted gene flow and limited dispersal among populations in both groups. Limited dispersal Table 4 Estimates of contemporary migration rates (m) for each populations based on microsatellite data. Migration rate estimates were based on unidirectional assignment of microsatellite genotypes in BAYESASS v1.3. Proportion of nonmigrant values represents the proportion of individuals assigned back to their sampling site of origin. Mean and 95% CI (confidence interval) values for uninformative data were 0.833 (0.675, 0.992). Putative source of migrants represents source population supplying migrants into the focal population where the estimated migration rate exceeds the upper 95% CI value for uninformative data (0.0124). Migration rates and source of migrants were not provided for cases where the 95% CI of estimated migration rate overlapped with the 95% CI value for uninformative data. was supported by our first-generation migrant detection analysis in Geneclass2, which demonstrated that less than 10% of individuals in each population were immigrants. The sample from Tinana Creek (MRD site), was differentiated from the rest of the populations in the SEQ-N group (Table 2). This might be the result of a barrier which separates Tinana Creek from the rest of the Mary River system despite their close proximity to one another. Tinana Creek runs into the Mary River not far from the mouth, with both drainages having tidal estuarine reaches in the lower sections. The differentiation of the Tinana Creek population from the main stem of the Mary River is also observed in a number of other freshwater species including Mary River Cod, Maccullochella mariensis (Huey, Espinoza & Hughes, 2013), Mary River Turtle, Elusor macrurus (Schmidt et al., 2018), freshwater crayfish Cherax dispar (Bentley, Schmidt & Hughes, 2010) and Australian lungfish Neoceratodus fosteri (Hughes et al., 2015).
Although populations in the SEQ-N group were slightly more highly structured than those in the SEQ-S group, fishes in both groups exhibited restricted gene flow. The high genetic structuring in SEQ-N populations might be the result of genetic divergence within the lineage that occurs over a very small geographical range. The lower levels of genetic structure of R. semoni populations in the SEQ-S group than the SEQ-N group suggests that more current or recent gene flow occurs within and among the catchments in SEQ-S group than for the SEQ-N group. This could be because the catchments in the SEQ-S group have Table 5 Results of the assessment for detecting first-generation migrants performed using GENECLASS2 showing the number of individual migrants (P < 0.05) detected per sampling location. Results are based on the L_home/L_max statistic for microsatellite data. L_home/L_max is the ratio of the likelihood computed from the population where the individual was sampled (L_home) over the highest likelihood value among all population samples including the population where the individual was sampled. F 0 is the first generation migrant. See Table 1A for site codes and locations.  been connected more recently and/or more often than those in the SEQ-N group. This could be because during extreme flood events, SEQ-S catchments might be connected by freshwater plumes flowing into Southern Moreton Bay. Another plausible reason might be that these cryptic species (SEQ-N and SEQ-S) have different microhabitat preferences, with SEQ-N being less tolerant of brackish conditions (Hughes, 2015). An important model for stream dwelling species is isolation by distance (IBD). In this model, equilibrium between genetic drift and gene flow may be reached in species where the life time dispersal distance is less than the range. Here, a relationship between stream distance and genetic differentiation should be evident (Wright, 1943). In this study, a strong IBD relationship was identified among the SEQ-S populations, but not among SEQ-N populations. This suggests that for SEQ-S populations, dispersal, when it occurs, is more likely between nearby catchments. Similar IBD relationships have been reported for other coastline restricted species (Keenan, 1994;Jerry & Baverstock, 1998;Shaddick et al., 2011;Schmidt et al., 2014). Lack of IBD for the SEQ-N group may be attributed to an insufficient number of population samples available for comparison and/or the greater degree of population isolation within this group relative to the SEQ-S group, consistent with the overall higher F ST estimates among SEQ-N populations.  observed similarly contrasting patterns of population genetic structure between cryptic species groups of southern Australian smelt. In that study, two informal species groups (MTV and SEC) with adjacent distributions along the western and eastern coast of southern Victoria had microsatellite-based F ST values of 0.19 and 0.07 respectively . Using otolith microchemistry,  also showed that the more structured western group (MTV) had a greater proportion of nondiadromous populations relative to the weaker structured eastern group (SEC). The similar pattern of contrasting structure observed here between northern groups in the Australian smelt complex (SEQ-N and SEQ-S), is probably not due to differences in diadromous behaviour because preliminary evidence from otolith chemistry suggests all of these populations are nondiadromous (R Islam, 2017, unpublished data). Higher structuring of the SEQ-N group could possibly be due to genetic drift if these populations have been established for a longer period of time at the northern-most limit of Australian smelt distribution relative to the SEQ-S populations.

Sample to
The complementary pattern of population structuring in both microsatellite and mtDNA data between the SEQ-N and SEQ-S groups could reflect long-term phylogeographic isolation or perhaps further evidence for cryptic species within Australian smelts as first highlighted by Hammer et al. (2007). Mean cyt b divergence of 4% between SEQ-N and SEQ-S samples is close to the 3.6% divergence observed for the full mitochondrial molecule reported by Schmidt, Islam & Hughes (2016), and is within the range of lineage divergence reported for R. semoni in southern Queensland (Page & Hughes, 2014). It should be noted that Schmidt, Islam & Hughes (2016) incorrectly attributed mitochondrial sequence data for the SEQ-N group to the CEQ group (i.e., Genbank accession KX421785 corresponds to SEQ-N and KX421784 corresponds to SEQ-S). The level of cyt b divergence between the SEQ groups and southern lineages of R. semoni is very large (15-17%) and adds to previous studies that have highlighted the likely existence of a cryptic species complex within the taxon currently referred to as R. semoni (Hammer et al., 2007;.

Contemporary migration
The Bayesian assignment analysis detected contemporary movement of individuals only between proximate sites within rivers (Table 4). Contemporary dispersal was not observed between rivers. Although, most of the sites that we sampled were within 10-60 km stream distance of another sampled site, there was no contemporary dispersal among the majority of those rivers in either group. The present results coincide with the findings of some southern smelt populations, where contemporary movement was observed among populations within a catchment (Woods et al., 2010). The species appears to occur in a wide range of freshwater habitats, many of which are isolated by long stretches of unfavourable habitat. Our data therefore suggest that if local extinctions occur in one or more of these streams within a reach of the river, then recolonization from elsewhere is unlikely to occur rapidly.

CONCLUSION
Finally, the present study revealed high levels of population structuring within and between drainages, which suggested that contemporary movement among sites was rare and limited to sites within the same river. Little conservation attention has been given to the Australian smelt since it has long been considered a common species distributed widely across southeastern Australia. The findings of the present study and other recent research (Hammer et al., 2007;Crook, Macdonald & Raadik, 2008; suggest that southern smelts are a genetically complex and ecologically diverse taxonomic group. Therefore, proper conservation and management will require appropriate taxonomic treatment to align species with the clear genetic divisions now recognised across the range of Australian smelt. Further, eight isolated management units (MUs) along the south-east Queensland drainages were detected in R. semoni from the microsatellite dataset (Fig. 2) demonstrating little to no gene flow between them. These management units align with individual coastal catchments, which suggest that other genetically distinct populations may exist in coastal rivers not sampled in this study. High levels of genetic divergence between the two lineages have important implications for the conservation of these endemic freshwater cryptic fish species. Therefore, findings of the present study on population structure of Australian smelt will help to formulate effective management and conservation plans for this cryptic species complex across their geographic range.