Introduction

Tropicalisation, the displacement of species from equatorial latitudes to temperate locations, is one of the major reported consequences of climate changes1,2,3. The increase of sea surface temperatures (SSTs) in the last decades has promoted shifts in the distributional ranges of species (e.g.,4,5) with individuals moving into areas best corresponding to their physiological optimum. Additionally, the ability of a species to colonise new habitats is influenced by oceanographic currents, the existence of adequate resource availability (i.e., habitat and food) and life-history patterns (e.g., number of eggs produced, age or parental care). These movements lead to the colonization of more poleward habitats by low latitude species, and create new interactions between species and new trophic assemblages. In commercial species, these shifts due to climate change can be magnified by fishing pressures, as reported for the North Sea cod (e.g.,6).

Poleward colonization by organisms with a typically equatorial distribution was described almost three decades ago for terrestrial organisms in association with postglacial recolonisation routes (e.g.,7,8,9,10). As a general rule, organisms follow a leptokurtic distribution type, in which the majority of individuals stay at or near the original area, and only a fraction disperse to longer distances. This range extension is usually done in a stepping-stone manner, implying that each settlement has fewer individuals compared with the previous one. Theoretically, this process corresponds to multiple successive genetic founder events associated with the corresponding genetic implications of the downsize in the effective population numbers: the erosion of the genetic diversity by genetic drift induces allele loss, the “southern richness and northern purity” paradigm11. Additionally, the “central-marginal hypothesis”12 posits that populations at the centre of the distribution have higher population sizes and gene flow, and peripheral (leading edge) populations, are smaller in size, have lower genetic diversity and will be more genetically differentiated12.

Phylogeographic studies using mitochondrial and nuclear genes revealed that species with similar environmental requirements and life-history traits often present distinct genetic and demographic historical patterns13,14. Among fish species of the northeast Atlantic, we can find the co-existence of: (1) panmictic populations without latitudinal variation in genetic diversity (e.g. Lipophrys pholis15); (2) significant population structure with similar levels of genetic diversity throughout the entire species range (e.g. Taurulus bubalis16); and (3) sharp decline of genetic diversity from south (west European) to northern (Scandinavian) populations (e.g. Pomatoschistus microps17; Symphodus melops18; Labrus bergylta19). It is less common the observation of geographical range expansion of marine species without loss of genetic diversity but, for examples, see16,20,21. Contemporary and increasingly visible tropicalization will contribute to more records of fast range expansion events.

The Senegal seabream, Diplodus bellottii Steindachner, 1882 (Sparidae), displayed an outstanding range expansion from its endemic origin (Senegal to Cape Blanco in Mauritania22) to the Atlantic coast of the Iberian Peninsula within a decade (see Fig. 1 for specific dates). There are no records from the archipelagos of Madeira, Canaries or Cape Verde23. However, its presence is reported in Mauritania24 and Morocco25 during the late 60’s. In the 1970’s, the species was consistently recorded in the Iberian Peninsula, first in Cadiz26 then in Algarve27, moving progressively towards North to Sado and Tagus (1995) estuaries28, and Ria de Aveiro in the north Atlantic coast of Portugal29 (Fig. 1). These movements are thought to result from the sea temperature rise30. The Tagus estuary experienced a major increase in the abundance of the Senegal seabream between 1979–1981 and 1995–199728. Presently, the species is a common bycatch species of the artisanal fisheries around the Lisbon area31. The species is therefore a good candidate to illustrate the effect of tropicalisation on the genetic make-up of the species, and on the relationship between colonisation success and genetic diversity, without the contamination of past recolonisation events.

Figure 1
figure 1

Present-day distributional range, sample locations and consensus modeled distribution of the Senegal seabream (Diplodus bellottii) in Northern West Africa and Iberia. Locations: MAU—Mauritania; CAD—Cadiz; ALG—Algarve; SES—Sesimbra; LIS—Lisbon. Perennial range in grey and year of first record of expanded range in horizontal lines. Color-coded sampled locations over-layed on the consensus current modeled probability of occurrence distribution (see “Material and methods” section). Note: The original estimated suitability value was divided by 1000 to convert the suitability value into a probability of occurrence. As a rule of thumb, sites with suitability higher than 0.5 predict presence, while sites with suitability lower than 0.5 indicate absence. Figure generated using the Biomod2 package (https://cran.r-project.org/web/packages/biomod2) implemented in the R32 (version 3.5.3), and Adobe Illustrator CC2019 (version 23.0.1) (https://www.adobe.com/products/illustrator.html).

Diplodus bellottii breeding is synchronous and occurs mostly from April to June and sexual maturity is reached at 1 year old for females and two for males31. The same study analysed the otoliths of this species and found that the oldest specimens were 9 years old, corresponding to females with at least seven reproductive seasons31.

In this study, we used the documented expansion of D. bellottii as an opportunity to assess putative founder effects by looking for evidence of reductions in the number of mitochondrial or nuclear singleton haplotypes (haplotypes seen only once in a sample, i.e. an unshared haplotype) coherent with a predictable loss of overall genetic diversity. We hypothesise that similarly to several other marine fish species in this area of the Northeastern Atlantic such as Halobatrachus didactylus33, Symphodus melops34 or Labrus bergylta19, the recent observed range expansion will lead to a substantial loss of diversity. Also, the expanding low-density leading edge is by definition constituted by edge patches, which are most typically affected by founder events35, where demographic and allelic composition stochasticity36 is predictably stronger. If no significant reduction in singleton haplotypes is observed, and no differences among haplotype frequencies are found, there would be no evidence for a founder event or genetic bottleneck and the species may be panmictic. Alternatively, if no significant reduction in singleton haplotypes is observed but there are differences among haplotype frequencies, the species may be genetically structured. This study is the first appraisal on the genetic variability of this species across its range, a key information to understand its astonishing northward expansion. Additionally, we used species distribution modelling with present-day conditions to evaluate suitable areas for potential colonization, and explored whether the sampling coverage allowed an effective representation of the genetic diversity of the species.

Results

A total of 357 base pairs of the mitochondrial control region were sequenced for 124 individuals, defining 118 haplotypes and 548 base pairs of the S7 first intron were resolved for 96 individuals, defining 69 haplotypes, from five sampling locations (Fig. 1). Haplotype and nucleotide diversities were high in all locations (Table 1), except in Cadiz where diversities were low. Haplotype diversities were close to 1.000 (hCR = 0.997 to 1.000 and hS7 = 0.960 to 1.000) and nucleotide diversities were high, ranging from πCR = 0.4% (Cadiz) to 10.4% (Mauritania) and πS7 = 0.0% (Cadiz) to 2.7% (Mauritania) with πCR = 7.9 and πS7 = 0.1% for the entire data set (Table 1).

Table 1 Sample locations, sample abbreviations, collection dates, sample sizes and summary statistics for one mitochondrial fragment of mtDNA control region (CR) and first intron of the S7 ribosomal protein (S7) of Diplodus bellottii.

In the entire data set the three estimates of fixation GST, G′ST, θ and the differentiation estimate D reveal significant values for CR (GST = 0.068, CI = 0.053–0.089; G′ST = 0.981, CI = 0.940–0.999; θ = 0.045, CI = 0.032–0.062; D = 0.979, CI = 0.936–0.999) and for S7 (GST = 0.233, CI = 0.203–0.292; G′ST = 0.990, CI = 0.979–0.998; θ = 0.076, CI = 0.054–0.105; D = 0.988, CI = 0.972–0.997). The same parameters behaved differently in pairwise comparisons for both markers (Table 2), with GST and θ revealing largely non-significant comparisons, and G′ST and D showing all comparisons statistically significant. This means that haplotypes are usually distinct among the five sampling sites, with complete haplotypic differentiation (D = 1) involving four pairs of sampling sites in CR and six in S7, including locations that are not the most distant ones, such as Algarve and Cadiz.

Table 2 Pairwise CR and S7 differentiation in Diplodus bellottii.

The haplotype accumulation curves failed to reach the asymptote for any of the markers (Fig. 2), indicating that only part of the actual genetic diversity was captured. Both markers had largely overlapping haplotype rarefaction curves and confidence intervals, although the Chao 1 estimated total diversity for the CR region of 690 haplotypes and 206 for the S7.

Figure 2
figure 2

Haplotype rarefaction curves and the Chao 1 estimated total haplotypes for each marker in Diplodus bellottii. Shaded areas indicate 95% confidence interval from 1000 permutations. The 1:1 line indicates the maximum where each new sample represents a new haplotype. Figure generated using the chaoHaplo function (https://CRAN.R-project.org/package=spider) of the spider package implemented in R language32 (version 3.5.3), and Adobe Illustrator CC2019 (version 23.0.1) (https://www.adobe.com/products/illustrator.html).

The main characteristic exhibited by the two markers’ haplotype networks is their bush-like complexity, with no central or otherwise extremely abundant haplotype. The networks displayed a large number of singleton haplotypes and very few shared haplotypes among sampling sites (coloured circles), a pattern characteristic of DNA hyperdiversity (Fig. 3). The CR region had only five shared haplotypes, with the most frequent haplotype shared among Mauritania, Cadiz and Lisbon, and the others between Lisbon–Sesimbra (the closest locations), Sesimbra–Algarve, Sesimbra–Cadiz and Lisbon–Cadiz. The S7 intron has seven shared haplotypes, four between Lisbon and Sesimbra (the closest locations), and one between Lisbon and Algarve, Lisbon and Mauritania, and Sesimbra and Mauritania.

Figure 3
figure 3

Statistical parsimony network illustrating the genealogical relationships of Diplodus bellottii among CR (a) and S7 (b) haplotypes (threshold of statistical significance = 95%). The size of the circle corresponds to the haplotype frequency. Pie charts indicate the proportion at which each haplotype occurs at each location. Black dots indicate the number of mutational differences separating sampled haplotypes or hypothetical (unsampled) haplotypes. Figure generated using the TCSBU37 and Adobe Illustrator CC2019 (version 23.0.1) (https://www.adobe.com/products/illustrator.html).

The neighbour-net network (Fig. 4) showed two lineages diverging from an unresolved polytomy, as represented by the box-like appearance of the internal branches, evident in the CR less so in the S7 network. The two lineages occurred at all sampling sites, but the χ2 test of the total frequencies of individual membership per sampling site yielded significant results only for the CR (χ2 = 10.31, p = 0.036) and not for the S7 (χ2 = 5.70, p = 0.222).

Figure 4
figure 4

Neighbour-net networks based on 124 mitochondrial DNA CR sequences (a) and 96 S7 nuclear DNA (b) of Diplodus bellottii. Value between groups in CR corresponds to bootstrap replicates. Figure generated using the Splitstree38 and Adobe Illustrator CC2019 (version 23.0.1) (https://www.adobe.com/products/illustrator.html).

One of the main features of the BIOMOD2 software is the ability to combine predictions made in single models in an ensemble. The species distribution modelling produced six models (ANN, FDA, GBM, GLM, MARS, and RF) that can be considered good to excellent while CTA, Maxent and SRE have a poor fit with TSS and ROC values under 0.7 (Fig. 5). Among the best performance models were RF (ROC = 0.950, TSS = 0.826) and GBM (ROC = 0.951, TSS = 0.822).

Figure 5
figure 5

Performance statistics of all the species distribution models. Plot of the mean (dot) and standard deviation (horizontal and vertical lines) of evaluation scores for the different modeling algorithms used with Biomod2, by algorithm (a) or cross-validation (b). Best models are closer to the top-right corner of the graph. TSS: true skill statistics; ROC: area under the receiver operating characteristic curve. Figure generated using the Biomod2 package39 implemented in the R language32 (version 3.5.3), and Adobe Illustrator CC2019 (version 23.0.1) (https://www.adobe.com/products/illustrator.html).

The percentage of the correctly predicted presence (sensitivity) ranged from 65 to 92% considering all models individually, with the highest value for the GBM model and the lowest for SRE (Fig. 5). Specificity (i.e., the % of absences correctly predicted) ranged between 79 and 90%, with the highest value for the MAXENT.Phillips model and the lowest for ANN. The visual assessment showed that the predictions of the consensus model reflected realistic spatial patterns of distribution of D. bellottii without apparent anomalies (Fig. 1). The highest probabilities of occurrence were in the Mauritania southern coast and Morocco, with discontinuous distribution patches, in contrast with south Iberia, that is associated with areas of high habitat suitability, and the Lisbon area. The overall model did not predict suitable habitats north of the Lisbon region, although the species is present in Aveiro, and predicted the area corresponding to the west Alboran Sea as a suitable habitat.

Discussion

Remarkably high haplotype and nucleotide genetic diversities, and significant population differentiation were consistently found throughout the present-day geographical range of the Senegal seabream. These results are not consistent with the central-marginal hypothesis nor with the expectations of a leptokurtic distribution, and the Senegal seabream seems to be able to maintain very high levels of diversity in marginal and recently-colonised areas. Before dissecting these results, it is appropriate to address three main caveats regarding this work. Firstly, sample sizes are small, considering the observed hypervariability of the mitochondrial control region and even smaller in the nuclear S7, as many individuals failed to amplify. This limitation constrains the breadth of the discussion, and results are therefore interpreted with caution. Moreover, no inferences on dispersal patterns can be made from the genetic data here presented. Secondly, there is a geographic sampling gap between Mauritania and Cadiz, where the species may be present, albeit in small numbers. We could not obtain any samples from that region, and species distribution modelling may be affected by that gap. Thirdly, the small number of existing geo-referenced presence records limited the use of more than four environmental predictors to be included in the consensus prediction D. bellottii potential distribution model.

Hyperdiversity in Diplodus bellottii

Benthopelagic fish such as D. bellottii usually display genetic diversity levels typically lower than pelagic species, due to differences in population sizes (e.g.40). Nevertheless, the diversity levels recorded in D. bellottii were similar to values observed in widely distributed pelagic species such as sardines (e.g.41) or other fish species with larger population sizes (see Table 1 in42). Interestingly, in the congeneric D. vulgaris43 and D. puntazzo13, the overall CR haplotype (0.983 and 0.998 respectively), and nucleotide diversity (1.8% and 4.1%) are also very high. Contrariwise, in another species of Diplodus, D. sargus, diversity measures were much lower (h = 0.700 and π = 0.8%)13. The differences between D. puntazzo and D. sargus genetic diversity values were attributed to a suite of fortuitous occurrences such as extinction and colonisation rather than to ecological differences between the two species13.

The results of the four statistics (GST, G’ST, θ, and D) were very different with GST; and θ, showing few pairwise significant differences, and G’ST, and D returning all pairwise comparisons significant. It is important to refer that both θ and GST suffer from the fact that this dataset is hypervariable, therefore restricting these estimators to values close to zero and overestimating gene flow (for a review see44). On the other hand, both G’ST and D, because they measure differentiation as the fraction of a deme's population that consists of private or near-private alleles45, will always be significantly different from zero. The hypervariable mtDNA in D. bellottii leads to a lack of shared haplotypes (with rare exceptions) among locations (Supplementary Table S1), but this does not necessarily reflect population genetic differentiation by fixation of alternative haplotypes as previously reported by46. This result may be explained by a single or a combination of small sampling sizes and very high mutation rates. Also, effective population size and demographic stability can be contributing factors, not in the recent colonised locations which may still be locally expanding in numbers, but in the putative source, the most southern location. We have shown that with the number of singletons present, to capture all the haplotypes in the species we would need to elevate the sampling to almost 700 individuals (see Fig. 2) which makes the first explanation plausible but does not rule out the other ones.

The high diversity recorded in D. bellottii is mainly due to a large proportion of singletons (91% and 74%, for CR and S7, respectively) and very few among-location shared haplotypes, 4% and 13%, for CR and S7, respectively. This fact makes all locations virtually significantly different from one another, with statistically significant values of D ranging between 0.847–1.000 and 0.849–1.000 for CR and S7, respectively. The expanding low-density leading edge is by definition constituted by edge patches, which are typically affected by founder events35, where demographic36 and allelic composition stochasticity are predictably stronger. However, alleles present in the leading edge patches, even if absent from other marginal locations are frequently shared with sites at the centre of the distribution47, which is not observed in our results.

The extremely high level of genetic diversity found across the entire distribution of D. bellottii and the paucity of shared haplotypes throughout all the sampled locations, including the native southern site, place our findings in apparent conflict with “southern richness and northern purity”11 and the peripheral populations (“central-marginal hypothesis”12) hypotheses. Non-expected high diversity was also found in the leading edge population of other species (e.g. Taurulus bubalis) suggesting that marine organisms with a long planktonic stage and high numbers of colonisers can travel long distances and export their diversity to the newly colonised areas16. In these cases, bottlenecks may not occur because populations are not close to the carrying capacity of the habitats. Diplodus bellottii is a prolific species at least in its southern distribution limit48 and it may have sufficiently large numbers of individuals to colonise in significantly large waves of individuals.

Possible causes of hyperdiversity

The hyperdiversity displayed by both CR and S7 markers does not automatically suggest restricted gene flow and/or population genetic differentiation. The sampling size did not capture the entire genetic diversity of the species as shown by the rarefaction curve, and did not provide a reasonable number of shared haplotypes. Moreover, abnormally high mutation rates may obscure the signal of gene flow49. The haplotype network (Fig. 2) with many unsampled haplotypes and haplotype accumulation curves (Fig. 4) indicate an acute subsampling of haplotypes, despite sampling sizes ranging from 11 to 55 per site, in a total of 124 specimens. As the expected diversity is ca. 690 haplotypes for the CR region and ca. 206 for the S7, many more individuals would be needed to capture a representative level of the genetic diversity of the species. It is therefore plausible that with increased sample sizes, one could detect more haplotypes in more southern locations, such as Mauritania, and shared haplotypes between these locations and the more recent colonised northern sites.

High genetic diversity in CR may have several explanations. The most plausible hypotheses involve: (1) the mutation rate of the fragment, (2) the evolutionary rates hypothesis, or (3) the metabolic rate theory (e.g.50). The mitochondrial control region is a high-mutation region organized in hypervariable regions flanking a central conserved region and is considered the most variant non-coding gene in fishes (e.g. for groupers51). This mitochondrial region is a marker of choice in phylogeographic and connectivity studies (but on the use of hyperdiverse mtDNA for population differentiation see46). The gastropod Melarhaphe neritoides revealed hypervariable mitochondrial DNA markers (16S, COI and Cytb) as a consequence of high mutation rates49. The second hypothesis used to explain high mutation rates in tropical areas (D. bellottii is considered a subtropical fish species), suggests that temperature and UV tend to generally increase mutation rates in the mtDNA of vertebrates52. The third hypothesis, the metabolic rate theory, posits that smaller animals accumulate mutations quicker than large animals53,54. However, it has been shown in the cold-water blue crab Callinectes sapidus, that in each recruitment event, the genetic diversity could be just as high as the one found in the adult population, although with different composition55. This would be exactly the opposite of a ‘sweepstakes effect,’ in which a small proportion of the gene pool contributes to the recruitment of the population56, therefore promoting reduced genetic diversity of populations. This way, in each seasonal recruitment, a new combination of alleles may constitute the new generation, which could help to explain the high genetic diversity in this blue crab species, even in populations near the limits of the species range55. Indeed, a similar pattern was found in two fish species, Lipophrys pholis and Atherina presbyter, with contrasting life-history patterns in a nursery rocky coastal area in Portugal57. In L. pholis only two out of 171 haplotypes were shared between three sampling periods, and in A. presbyter 25 haplotypes out of 155 were shared, which reveals an increase of private alleles in the gene pool over the years, even without a significant corresponding increase in diversity indices.

Geographic structure

Despite the results of this study in which all sampling sites are significantly different from each other, no geographical associated structure could be inferred from the networks. Lack of geographic genetic structure can be rooted in large effective population sizes, high potential for dispersal and weak physical barriers to gene flow58,59, all of which possible in D. bellottii. Regarding population sizes, some sparids present low effective size (e.g. Pagrus auratus,60), while other species present contrasting high values (as the congeneric D. sargus61). While the dispersal distance is often used as a function of the pelagic larval duration (PLD)62,63, there is ample controversy on the subject, with studies finding evidence for uncorrelated individuals’ PLDs and net distance travelled64,65. Although the exact PLD for our species is unknown, it is expected to be around 17 days, value reported for other Diplodus species (e.g.66); this value may however vary with latitude (a proxy of sea temperature). Finally, there are no apparent physical barriers between sampled locations, which may promote considerable admixture among individuals from different origins. However, the species distribution modelling evidenced stretches of coast along northwestern Africa that seem to be an unsuitable habitat for the species, which could theoretically foster some level of geographical isolation. Apparently, D. bellottii overreaches those soft barriers keeping its genetic diversity undisturbed.

The concept of “spatial selection” includes the idea that more available resources coupled with less competition may lead to spatial allele sorting and increased growth at the invasion front, with the corresponding increase in reproduction67. Spatial selection would favour dispersal, and since both dispersal and reproductive rate limit invasion (colonisation) speed, high reproductive rates promote faster range expansions. Because D. bellottii is under thermal stress at 30 °C68, a scenario of spatial selection in the most southern range is rather probable. The reduced tolerance to higher temperatures, coupled with the species preference for estuaries with large areas and high volumes but low depth nursery grounds, means that an increase in sea surface temperatures may result in habitat loss for juveniles, as the estuaries of southern Europe may get too warm. During the summer of 2006, D. bellottii was first detected at 40° N (north of Portugal;69) showing the quick continued northward expansion of this species. Although the results from the species distribution modelling do not show a particular affinity of this species for areas north to Lisbon, the continued raising of sea surface temperature will make these areas suitable for continual range expansion of D. bellottii. The impacts of this species in the recently colonised ecosystems, mainly in what concerns interactions with other species occupying the same ecological niche, has not been fully evaluated. The spatial overlap of D. bellottii with other sparid species is not consistent, across the literature, ranging from 47%69 to 94%30. These large differences may be explained by the dynamics of the estuaries, which suffer major changes in their hydrological features across the years. A noteworthy observation is that the spatial niche overlap between D. bellottii and D. vulgaris has been shown to increase with latitude69.

Origin of the two groups

The origin of the two groups (Fig. 4) is debatable and somewhat challenging to explain. When an isolation scenario is plausible, the existence of sister groups is easy to envisage. However, it is difficult to propose an allopatric (or peripatric) origin for the geographically co-distributed groups. Their distribution encompasses a continuous coastline with an apparent lack of geographical hard or soft barriers. Moreover, the observed northwards expansion of the two groups makes it difficult to conclude that they have dispersed independently along the Atlantic coastline. Without isolation, different lineages in a continuously distributed coast can arise in a large panmictic population with a long history70. Sometimes it is hard to realize that any particular genealogy is the single outcome of an infinite number of equally possible genealogies resulting from stochastic outcome71.

Future work and management implications

This species is an important commercial species in Mauritania, but its smaller length in the Lisbon region31, and the existence of more valuable commercial species, has not made the Senegal seabream a main fisheries target. In Lisbon, the species is mostly a bycatch of a traditional shallow artisanal fishery31 and for official statistics, D. bellottii is merely recorded as Diplodus sp. The fact of this species may constitute a future target fishery should prompt an adequate biological survey to assist its proper management. The unexpected pattern of very high diversity in recently-colonised areas within a climate change scenario, deserves further investigation and this species can serve as a useful model for studying the impact of a changing environment on the genetic diversity of a coastal species. Surveys need to be implemented, ecological implications evaluated, and a genomic temporal approach assessed to understand the causes for a successful colonisation without the loss of genetic diversity levels.

Materials and methods

Sampling

Adult specimens of D. bellotti were collected from five locations along its distributional range in the northeastern Atlantic (Fig. 1 and Table 1). These sites included: Mauritania, MAU; Cadiz, CAD in South Spain; Algarve, ALG in South Portugal, and Sesimbra, SES and Lisbon, LIS in Central Portugal. After asserting the species identification, fin clips were provided by fishermen. Samples were preserved in 96% ethanol.

DNA extraction, amplification and sequencing

Total genomic DNA was extracted with the REDExtract-N-Amp Kit (Sigma-Aldrich) following the manufacturer's instructions. The mitochondrial control region (CR) and the first intron of the nuclear S7 ribosomal protein gene (S7) were amplified, with the same kit, in a Bio-Rad Mycycler thermal cycler, using L-pro1 and H-DL172, and S7RPEX1F and S7RPEX2R73, using the following PCR conditions: initial denaturation at 94 °C for 7′, followed by 35/30 cycles (denaturation at 94 °C for 30/45ʺ, annealing at 55 °C for 30/45ʺ, and extension at 72 °C for 1′; values CR/S7, respectively) and a final extension at 72 °C for 7′. The same primers were used for the sequencing reaction, and the PCR products were purified and sequenced in STABVIDA (Sanger sequencing, https://www.stabvida.net/).

Chromatograms were edited with Codon Code Aligner (Codon Code Corporation, https://www.codoncode.com/index.htm) and sequences were aligned with Clustal X 2.174. For the S7 gene forward and reverse strands were obtained and some samples were sequenced twice for error checking. All sequences were deposited in GenBank (Accession Numbers MF120288-MF120372; and MF120408-MF120464, respectively to CR and S7).

Genetic diversity and geographic structure

The standard measures of genetic diversity were computed for the five location sites separately and after pooling (referred to as “total population”). We used a variety of packages developed for R v.3.6.132 to estimate population genetic statistics. We used pegas75 R-package32 to estimate standard descriptive measures of genetic diversity, including number of haplotypes and private haplotypes, haplotype diversity (h)76,77 and nucleotide diversity (π)76 and respective standard deviations.

To assess population structure, we estimated FST‐like statistics78, such as: (1) Nei's GST,77, (2) Hedrick's GST79, (3) Weir and Cockerham θ80 and (4) Jost's D81 using diveRsity R-package82. While the first three estimators measure nearness to fixation, the last quantifies the relative degree of allelic differentiation45. We report these four estimates to provide complementary information on the genetic structure among the Senegal seabream sampled locations. The diveRsity R-package was used to estimate 95% confidence intervals for all estimators using 10,000 bootstrap iterations.

Haplotype accumulation curves

The spider R-package83 was used to obtain haplotype accumulation curves by 1000 random permutation subsampling using the functions haploAccum() and plot.haploAccum() supplied in the spider R-package83 and the Chao 1 estimator84, with chaoHaplo() function. Haplotype accumulation curves provide a graphical way to assess the extent of haplotype sampling by representing the extent of saturation as a function of the number of individuals sampled and the number of haplotypes accumulated. Curves displaying weak asymptotic behaviour suggest further sampling is necessary to capture the genetic variation of the species, while curves with rapid saturation indicate that much of the intraspecific haplotype diversity was sampled. The Chao1 estimates the appropriate minimum sample size to account for all haplotype diversity based on the sample size and number of detected haplotypes as well as the number of singleton and doubleton sequences (those occurring once and those appearing twice) in the dataset.

Genetic connectivity

Population genetic connectivity in D. bellottii was investigated by reconstructing haplotype networks using a statistical parsimony algorithm85, implemented in TCS v.1.2186. The raw output was visualized in the web implementation of tcsBU37. Additionally, an unrooted neighbour-net network was constructed using SplitsTree 487 to investigate the genealogical relationships among haplotypes. The network was built using HKY + I + G nucleotide substitution model for both markers, as the best selected with the modelTest function of the phangorn88 R-package according to the Akaike Information Criterion89 and compatible with the distance algorithms available in SplitsTree 4, and equal-angle split transformation90 with 1000 bootstrap replicates.

Species distribution data: occurrence data

The georeferenced locations of the observed distribution of Diplodus bellottii was obtained from occurrence records from two databases: OBIS (http:// www.iobis.org/) and GBIF (https://data.gbif.org/). We used 94 records of Diplodus bellottii compiled from the Global Biodiversity Information Facility (GBIF, https://www.gbif.org/) and 72 points lifted from the Ocean Biogeographic Information System (OBIS, http:// www.iobis.org/) and added five own records. Spurious geo-reference errors and duplications were deleted from the initial data. Presence points were limited to within 200 m depth to consider the recorded depth range for this species (between 10 and 90 m depth)91. Finally, the remaining 39 presence cells were georeferenced to a 0.083° × 0.083° (= 5 arc min ~ 9.2 km) grid resolution.

Species distribution data: environmental data

The occurrence data was linked to 43 environmental variables available from Bio‐ORACLE92 and MARSPEC93 with a spatial resolution of 5 arcmin using the R package sdmpredictors94. All predictors were delimited up to 200 m depth, given the species known bathymetric preference95, which enhances modelling performances by limiting the extent of extrapolation.

Species distribution data: variable selection

The first level of predictor selection, was done by reducing collinearity between predictors, with variance inflation factor (VIF)96 implemented in R package uSDM97,98. We used a stringent VIF value (< 2) to retain predictors, as recommended for ecological studies see99. The final set of variables was reduced to four (chlorophyll a range, cloud fraction maximum, North–South aspect and sea surface temperature range) complying with two criteria: being conceptually meaningful variables100 and proportion of occurrences to predictors obeying to the general rule of thumb of 10 to 1101 (Table 3).

Table 3 Environmental variables used to model current potential distribution of the Senegal seabream (Diplodus bellottii).

Species distribution data: modelling approach

We produced an ensemble approach to obtain predictions for the current distribution implemented in biomod2 package102 (v. 3.3–7) for R32. One of the main features of the BIOMOD2 software is the ability of producing an ensemble model by combining predictions from single models nine different methods: maximum entropy (MAXENT), general linear models (GLM), general boosted models (GBM, also referred to as boosted regression trees), classification tree analysis (CTA), artificial neural networks (ANN), surface range envelope (SRE), flexible discriminant analysis (FDA), multiple adaptive regression splines (MARS), random forests (RF), with default settings39. Explicit D. bellottii absence records were not available so we used two sets of pseudo-absences (background data) extracted at random in a proportion of ten pseudo-absences per presence cell103. Data (presences and pseudo-absences) were split at random into a calibration (70%) and a validation set (30%). We evaluated 140 models: 2 pseudo-absence sets × 10 iterations × 7 models. Models’ performance was evaluated using two measures: the true skill statistic (TSS104) and the area under the receiver operating characteristic (ROC) curve (AUC)105. For the ensemble modeling, only those models with a TSS and ROC scores above 0.8, and 0.9, respectively (i.e., with minimum ‘good’ rating) were considered104,106.

All calculations and analyses were performed with R version 3.0.3, with the following R packages raster107, rgdal108, dismo109, vegan110, sp111, rJava112, performed in R32 using the R2C2 computational facility (https://rcastilho.pt/R2C2/R2C2_cluster.html).