The taxonomy of Alosa caspia (Clupeidae: Alosinae), using molecular and morphometric specifications, in the South Caspian Sea

Abstract In this paper, classification of subspecies of Alosa caspia (Eichwald, 1838) was examined using morphometric and molecular methods. Preliminarily, four groups of Alosa caspia were identified based on morphometric analyses of specimens taken from the south Caspian Sea. Based on molecular analyses, two of three clades created in a phylogenetic tree with different sequences encoding a certain type of protein were found; therefore, it was assumed that Alosa caspia had no subspecies in the south Caspian Sea. Specimens obtained from different sites were placed in similar clades, which is in contradiction with the idea that the members of each subspecies belong to a separate geographic region. Our morphological results confirm the existence of three subspecies of Alosa caspia, while the phylogenetic analysis based on the cytochrome b oxidase gene does not support the traditionally morphological subdivision of Alosa caspia into three subspecies. The phenotypic differences in the specimens can be generalized to the effect of their different environmental conditions, and from the molecular results it can be concluded that this species does not have any subspecies. Indeed, the selected fragment of cytochrome b was not able to separate the studied groups from one another. It seems, according to these results, that the specimens studied were populations of Alosa caspia species in the south Caspian Sea.


Introduction
The Clupeidae is one of the world's most commercially important families of fishes. In this family, the genus Alosa (Linck, 1790) (subfamily Alosinae) has received research attention concerning its ecology and life history (Baglinière 2000), as well as conservation issues due to the recent declines in many of its species (Waldman 2003). The genus Alosa (Linck, 1790) (shad) is classified as part of the Clupeidae family and comprises about 64 genera and about 218 species worldwide (Nelson et al. 2016;Coad 2017). The Caspian Sea species of Alosa were formerly placed in the genus Caspialosa (Berg 1949). Svetovidov (1952) synonymized Caspialosa with Alosa. Five to seven species of Alosa have been reported from the Caspian Sea (Coad 2017;Esmaeili et al. 2017).
Three subspecies of Alosa caspia have been reported from the Caspian Sea, namely caspia (Eichwald, 1838) (= North Caspian, Central Caspian, Caspian or il'men shad); knipowitschi (Iljin, 1927) (= Enzeli or Anzali shad); and persica (Iljin, 1927) (= Astrabad shad) (Coad 2017). Knipovich (1921) has recorded a species, Caspialosa enzeliensis Iljin, from the south Caspian Sea, which he categorizes as a subspecies of caspia. It is probably an unused manuscript name for what Iljin later described as knipowitschi (Coad 2017). In the south part of the Caspian Sea the subspecies persica is found in the southeast, near Gorgan or Astrabad Bay. Holcik and Olah reported Alosa caspia from the western basin of the Anzali Lagoon. This same species has been reported from the Safid River and Anzali Lagoon as subspecies persica and from the Anzali Lagoon alone as knipowitschi (Holcik & Olah 1992;Abbasi et al. 1999).
Despite the application of morphological characteristics in the classification of species, morphometric and meristic traits are usually changed and their ontogeny is repressed through filial levels (Ihssen et al. 1981;Lança et al. 2014). Morphometric characteristics are resistant and remain unchanged through generations, but when environmental stressors and other factors such as inbreeding, hybridization and bottleneck heavily influence a population, these traits begin to show significant disparity (Beacham 1990). If the above-mentioned environmental and genetic stressors remain unchanged for a considerable amount of time, the fish will be highly susceptible to stressors and their identification will be subject to their existence in their specific habitat and ecological niche (Ihssen et al. 1981). Body shape differences reflect not only the genetic characteristics of populations but also the environmental parameters (Guill et al. 2003). Morphological changes induced by environmental factors can help give a better understanding of phenotypic plasticity (Khataminejad et al. 2013). Factors such as stock density, aggressiveness, stress, food type and quality, swimming performance and fish mobility, salinity, temperature and sea floor substrate influence the presence of polymorphisms in fishes (Favaloro & Mazzola 2003;Hanson et al. 2007;Cadrin 2014). It has been postulated that the pattern of morphometric variations in fish may indicate differences in life history such as growth and maturation rates, mainly because body form is a product of ontogeny (Cadrin 2014).
Geometric morphometrics (GM) is used to quantify the body shape of biological structures (Bookstein 1991;Zelditch et al. 2012) and has been applied to distinguish and differentiate populations based on morphological characteristics (Loy et al. 1999;Turan 1999). This method helps to quantify the changes in shapes and patterns of morphometric variations within and among all groups (Eagderi & Kamal 2013;Khataminejad et al. 2013). Cytochrome b (cyt b) is commonly used as a region of mitochondrial DNA for determining phylogenetic relationships among organisms due to their sequence variability (Castresana 2001). It is considered most useful in determining relationships within families and genera. Comparative studies involving cyt b have resulted in new classification schemes and have been used to assign newly described species to a genus as well as to deepen the understanding of evolutionary relationships (Castresana 2001). In the present study, geometric morphometric traits and one mtDNA gene segment were used to investigate the systematic relationships of Alosa caspia subspecies.

Morphological analyses
Morphometric and meristic traits. Taxonomic identification of the samples was carried out according to Svetovidov (1952), Whitehead (1985), Turan et al. (2007), Eschmeyer (2014) and Coad (2017) (Table I). Thirty traditional morphometric characteristics were measured using a digital caliper to the nearest 0.01 mm (Lagler 1956), and 10 meristic variables were counted in each specimen by direct observation (relative to Standard length (SL) and Head length (HL)) ( Table II). The morphometric measurements and meristic counts were performed following standard methods (Jayaram 2002). Based on previous identification keys, the fish (three subspecies) with different geometric morphometric and meristic traits were divided into four groups (A, B, C and D) (Svetovidov 1952;Whitehead 1985;Coad 2017) (Table III). According to previous studies (Svetovidov 1952;Hoestlandt 1991;Coad 2017), especially Coad (2017) andSvetovidov (1952), Alosa caspia has three subspecies in the Caspian Sea basin. We aimed to verify these subspecies, using geometric morphometric, meristic and molecular techniques.
In the present study, group A and group B specifications were similar to subspecies Alosa caspia caspia and Alosa caspia persica, respectively, and therefore we considered them two subspecies. The morphometric and meristic characteristics of the remaining specimens could not be identified as Alosa caspia knipowitschi. Therefore, they were classified into groups C and D because of their morphometric and meristic discrepancies.

Geometric morphometrics
Geometric morphometric measurements were employed in this experiment. The left side of the specimens (with dorsal and anal fins held erect using pins) was photographed using a digital camera (Canon Power Shot SX 30 IS). Fifteen homologous landmark points were defined and digitized on twodimensional images using tpsDig2 software v. 2.16 (Rohlf 2004). Landmarks (putatively homologous points on anatomical features) were chosen to best represent the external shape of the body (Bookstein 1991) (Figure 2). Information unrelated to shape, including size, orientation and position, was removed by generalized least squares (GLS) Procrustes superimposition. Moreover, the landmarks were submitted to a generalized Procrustes analysis (GPA) and partial warp (shape variables) and relative warp scores (with α = 0 which is a principal component analysis of shape variables; Rohlf 1993) were calculated using the software tpsRelw v. 1.46 (Rohlf 2008).

Statistical analysis
Analysis of variance (ANOVA) was performed for each morphometric characteristic to evaluate the significance of differences between the groups (Zar 1984). The principal component analysis (PCA) was performed to summarize the variation among the groups in as few dimensions as possible. To examine the suitability of the data for PCA, Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) measures were performed. KMO statistics vary between 0 and 1 and values greater than 0.5 are acceptable (Nimalathasan 2009;Yakubu & Okunsebor 2011). Canonical variant analysis (CVA)/Multivariate Analysis of Variance (MANOVA) was carried out to investigate the power of distinction among the groups. As a complement to discriminant analysis, morphometric distances between the individuals of four groups were inferred from cluster analysis (CA) by adopting the Euclidean square distance as a measure of dissimilarity and the unweighted pair group method with arithmetical average (UPGMA) as the clustering algorithm (Sneath & Sokal 1973). A discriminant function analysis (DFA) was also performed to compare the body shape of the specimens using consensus configuration and deformation grids.
Statistical analyses for morphological data were performed using the SPSS v. 22 software package, Numerical Taxonomy, Multivariate Analysis System (NTSYS-pc) (Rohlf 1990), and Microsoft Excel 2010. To determine the most effective morphometric measurement to differentiate the studied  subspecies, the contributions of variables to principal components (PC) were examined.

Molecular analysis
DNA extraction, PCR and sequence acquisition. DNA extraction from fin tissue was done using the Roche® kit. We collected DNA sequence data from a 515-bp fragment of the cytochrome b oxidase (cyt b) mitochondrial gene. This mtDNA gene was chosen due to the fact that it offers a different level of sensitivity for phylogenetic analysis. The cyt b fragment was amplified using Alosa-specific primers: Alocytbf1 (CCT TCT AAC ATT TCA GTC TGA TG) and Alocytbr1 (AGG ATT GTG GCC CCT GCA ATT AC), both located at cyt b (Faria et al. 2006). A 515-bp fragment of the cyt b was amplified by polymerase chain reaction (PCR). PCR amplification conditions were as follows: one preliminary denaturation at 94°C for 5 min followed by 35 PCR cycles. Strand denaturation was made at 94°C for 45 s, annealing at 61°C/45 s, and primer extension at 72°C for 45 s. A final extension at 72°C for 5 min was performed. The PCR cocktail used for mtDNA gene was 15 μL master buffer, 1.5 μL of each primer, 7.5 μL template DNA, and 4.5 μL water for a total reaction volume of 30 μL. The PCR fragments of the DNA samples were separated by electrophoresis on 2% agarose gel and visualized by safe staining on a gel doc instrument.
Purification and organization of the PCR products were conducted at Macrogen Korea (Seoul, Korea) Laboratories using the aforementioned primer pairs. No indications of unexpected stop codons occurred in any sequence. A consensus sequence made from the 60 specimens of Alosa caspia using the International Union of Pure and Applied Chemistry (IUPAC) codes was deposited in GenBank as MG878385.

Editing and alignment of the sequences
After the forward and reverse sequences were edited in the CLC Genomics Workbench software program and the consensus sequence was made for each individual, a Multi-FASTA file was made including the 60 sequences of the studied specimens of Alosa caspia, along with 15 sequences of some Alosa species other than Alosa caspia. The Multi-FASTA file was in the AliView software (Larsson 2014) using the MUSCLE algorithm. The alignment file was adjusted manually by eye using Mesquite v. 2.5 (Maddison & Maddison 2008).

Data analysis
For the phylogenetic analysis, Alosa pseudoharengus was used as an outgroup (Faria et al. 2006). The Bayesian analysis was conducted with MrBayes, v. 3.2.2. Using JModeltest (Darriba et al. 2012) on the CIPRES Science Gateway web portal (Miller et al. 2010), the General Time Reversible model incorporating invariant sites and a gamma distribution (GTR + I + G) was selected based on the Akaike information criterion (AIC). The setting for the priors to estimate the posterior probabilities and branch lengths of the tree was based on software defaults. We carried out two independent Markov chain Monte Carlo (MCMC) runs with 20,000,000 generations. Four parallel chains were applied for each run. The printfreq and samplefreq parameters were both set to 1000 and a 50% majority rule consensus tree was built after we considered burninfrac = 0.25, sump burnin = 5000 and sumt = 5000. The convergence of all the parameters was checked in Tracer v. 1.6 (reference). In our analysis, gaps were treated as missing data.
The bootstrapping analysis was carried out in PAUP* v. 4.0bl0 (Swofford 2002) using the heuristic search of maximum parsimony, the options for which were the random addition sequence and tree bisection-reconnection of branch swapping in 10,000 replications. The consensus tree was visualized in FigTree (v. 1.4.2) and in Adobe Illustrator CC 2017.
To measure the genetic distances in the software MEGA v. 7, based on the Kimura two-parameter substitution model (Kimura 1980), we had to make an alignment file in which all the sequences are aligned with a specific fragment. Hence, we had to cut the beginning and the end of all the sequences used in the phylogenetic analysis in the AliView program to make an alignment in which each sequence was made of 294 bp. This was the maximum length, which was common in all of the 60 sequences of specimens from the three provinces that we used in the phylogenetic analysis. All generated DNA barcodes are deposited in the National Center for Biotechnology Information (NCBI) GenBank, and given respective accession numbers (Table IV).

Morphometric and meristic results
Alosa caspia is characterized by a relatively deep and compressed body likened to a "shad" shape, and not elongated and rounded as in some related species that are likened to a "herring" shape. It has between 50 and 180 gill rakers, which have been variously reported as thin or thick, long (obviously longer than the gill filaments), and forming a convex outline on the lower arch. Its teeth are poorly developed in both jaws and Unbranched rays dorsal fin (DHR) values were observed for 3-4 and 12-15 Branched rays dorsal fin (DSR), Unbranched rays anal fin (AHR) for 3-4, usually 3, and 15-20 Branched rays anal fin (ASR) measurements (Table V). It also has 49-54 scales in each lateral series (Coad 2017). The morphometric and meristic characteristics of this species are listed in Tables III and V, respectively. ANOVA showed that all of the morphometric measurements except caudal peduncle width (P = 0.062), post-orbital distance factor (P = 0.36) and meristic measurements except for unbranched (P = 0.12) and branched rays (P = 0.38) in the anal fin and unbranched rays in the dorsal fin (P = 0.27) were significantly different between the samples (P < 0.000).
The KMO coefficient obtained was 0.935 and 0.514 for morphometric and meristic characteristics, respectively, designating the appropriation of this test at good and medial levels (Field 2000), and the sampled data is appropriate to proceed with multivariate analysis.
PCA of 30 morphometric measurements was extracted from 21 factors with eigenvalues > 1, explaining 99.79% of the variance. The first principal component (PC1) accounted for 81.99% of the variation, and the second principal component (PC2) accounted for 11.11%. The most significant loadings on PC1 were total length, standard length and fork length. In the PCA of the 10 meristic data points, the first component explained 99.17% of the total variation and the subsequent components (nine PCs) represented the remaining 0.83%. For the DFA, the averages of the percentage correctly classified (PCC) were 65.1% and 57.4% for morphometric and meristic characteristics, respectively. The DFA results showed that function 1 had the greatest effect on group separation in morphometric and meristic characteristics, with a variance of 83.2% and 96.4%, respectively. The scatter plot of canonical DFA for morphometric and meristic data indicates that group A had the greatest distance from the other three groups on the x-axis, and groups B, C and D were not nearly as separated on that same axis (Figure 3).
The dendrogram derived from CA of Euclidean square distances showed four main groups based on meristic and morphometric characteristics (Figures 4  and 5). However, the cluster result for the meristic characteristics showed that the individuals of these groups had the same clades with great homogeneity, which was in accordance with results obtained from PCA, CVA and DFA. The scatter plot of the CVA for morphometric and meristic characteristics showed that groups C and D were overlapping with each other. Group A and (partially) group B were separate from the other groups (Figures 6 and 7). Finally, these results confirmed the existence of three groups (named subspecies).

Geometric morphometric results
The PCA for all specimens explained 79.39% of shape variations by the first two PCs extracted from the variance-covariance matrix (PC1 = 56.42% and PC2 = 22.97%). The PCA for all specimens revealed significant differences in body shape among the studied specimens, with apparent changes in head size (landmarks 4-6), body height (landmarks 8-10) and anal fin measures (landmarks 11 and 12) (Figure 8).
The CVA/MANOVA revealed significant differences in body shape among the groups (P < 0.05). The CVA plot revealed a clear separation in groups A and B from C and D, which still had overlaps with each other (Figure 8). Based on CVA results, the observed differences are related to head, snout, and caudal peduncle regions and body depth in landmarks 5, 6 and 8-12 (Figure 9).
There was a high degree of separation in body shape characteristics between group A and the  other three groups, with a slight degree of separation between group B and the groups C and D ( Figure 10). The DFA classified 66.3% in origin data and 63.9% in cross validation of specimens into the correct groups.
The greatest percentage of the separation of each group from the others was 73.3%, 67.5%, 53.8% and 47.5%, which belonged to groups A, B, C and D, respectively (Table VI). In the DFA, function 1 with 76.3% of variance had the greatest effect on the groups' separation. The dendrogram showed that groups A and B of Alosa caspia sp. were completely distinct from each other, but groups C and D were in the same clade in terms of morphometric geometric characteristics ( Figure 10).
The grids of Figure 11 indicate a large head and snout, short and deep caudal peduncle and large body height in midsection for group A; shorter body height, narrower caudal peduncle and smaller head for group B compared to group A; and shallow body depth, elongated head and snout and longer caudal peduncle for groups C and D. The body shape of Alosa caspia in groups C and D showed  close similarity. Therefore, the GM result revealed differences in body shape among the three groups (named subspecies).

Molecular results
The final alignment yielded an average of 294 bp for cyt b. The genetic distances within/between subspecies based on combined sequences were calculated using MEGA and the results are shown in Table VII. The range of genetic distance between the specimens of clades 1 and 3 was 1.5, and the distance between the specimens of clades 1 and 2 and clades 3 and 2 was 1.7 and 0.9, respectively. The values for all statistical indices were similar, indicating very low genetic distance between the groups (Table VII).
Two different phylogenetic analyses produced similar topologies based on the cyt b data set: Bayesian and maximum parsimony (Figure 12). Based on this tree a polytomy clade was formed between the present study's specimens and the four species Alosa alosa, Alosa fallax voucher, Alosa killarensis and Alosa masedonica. The results of the molecular analyses revealed two molecular groups with a special type of protein encoded within them. A gene mutation was, however, observed in the Golestan 118 specimen, which differed from the other specimens in the coding of that protein.
In the comparison of sequences between Alosa caspia and the other species of the genus Alosa, six out of 15 species were placed inside and between the two clades.

Discussion
The multivariate analysis of morphometric characteristics classified the named subspecies of Alosa caspia of the south Caspian Sea into four distinct groups (three named subspecies). While groups A and B were presumably Alosa caspia caspia and Alosa caspia persica, respectively, the remaining specimens with different morphological characteristics were divided into two groups (C and D) with little similarity to Alosa caspia knipowitschi specifications. In general, fishes demonstrate greater variance in morphological traits both within and between populations than other vertebrates do, and are more susceptible to environmentally induced morphological variation (Turan et al. 2006), which might reflect different feeding environments, prey types, food availability or other features (Turan et al. 2006). Therefore, the distinctive environmental conditions of the regions of the present study in the south Caspian Sea may impress morphological differentiation on the specimens from these three locations.
Physical differences in water temperature, water velocity and salinity between western and eastern coastal waters of the Caspian Sea, especially in the nursery areas of the Sefidrood and Gorganrood rivers (Mazlomi et al. 2009), are the presumed causes of the ecological differences between Caspian Sea fishes. Differences in water temperature and seasonal variation in ecological conditions may also cause varying selection pressure on growth and morphology (Pakkasmaa & Piironen 2001). Slower growth rates could be responsible  for the differences in head proportions. In general, larger head and body proportions are observed in individuals that grow slowly (Guenette et al. 1992). The average surface temperature is generally higher in the eastern region than in the western region (Ginzburg et al. 2005). At optimal temperatures, growth and development can proceed more rapidly, while at sub-optimal high or low temperatures, growth and development are slow and lead to larger fins (Walsh et al. 2001). The overlapping distribution of some samples may be attributable to extensive intra-species reproduction. According to Coad (2017), the subspecies of Alosa caspia have separate spawning areas. The most common spawning occurs in the north Caspian Sea near the outflow of the major rivers, particularly the Volga. The subspecies Alosa caspia knipowitschi spawns in the Anzali Wetland (and probably the Chemkhaleh River to the east of the Safid River) in May and June after a spring migration from the sea. Spawning of the subspecies Alosa caspia persica takes place in Gorgan Bay. However, separation of the spawning area alone does not support the existence of subspecies.
The present study on the shape indices of four groups (three named subspecies) of Alosa caspia demonstrated significant variation among the groups. Differences among the groups, divided into the three categories of head and snout, caudal peduncle, and body depth, are as follows: (1) large head and snout, short and deep caudal peduncle and large body depth in the midsection for group A; (2) slight body height, narrow caudal peduncle and comparatively small head for group B; and (3) shallow body depth, elongated head and snout and longer caudal peduncle for groups C and D. The results revealed a positive association between body depth and the shortness and depth of the caudal peduncle. The CVA revealed that the observed differences in Alosa caspia subspecies were related to head, snout and caudal peduncle regions and body depth. This result is consistent with previous studies on other fish species such as darters (Page & Swofford 1984), in which the situation was explained as an adaptation for inhabiting stream riffles.
In contrast, a relatively shallow body may be useful for benthic pool fishes, as pointed out by Wood and Bain (1995). Adult morphology may be determined by diversifying selection and ecological adaptation (Cadrin 2014). For example, observed   (Cadrin 2014), and changes in head and mouth shape can be considered reflective of differences in selection of food items and direction of feeding (Langerhans et al. 2003).
The molecular results showed two clades in the calculations of the genetic distance. However, these clades do not indicate the existence of two subspecies as they had different sequences that encoded a certain type of protein. Specimens belonging to different sites were placed in similar clades, which contradicts the concept that the members of each subspecies belong to a separate geographic region. Therefore, although the individuals of Alosa caspia were nested in two of three clades, we did not find enough genetic distance to consider them different subspecies. This result contradicts the Alosa subspecies specification, as described by Coad (2017) based on key morphometric characteristics, because the members of each clade had rather different morphometric characteristics from each other.
In the Bayesian tree of members of Alosa caspia species, specific members of clade 3 were closer to the European species, especially the species that inhabit the Black Sea and the Volvi (A. macedonica) lake. Therefore, they were placed in same clade and there was little genetic difference from each other. In clade 2, Alosa fallax, Alosa killarnensis and Alosa alosa x Alosa fallax with a very low genetic difference from each other had close genetic relationship with the members of the clade 1 and clade 3 with values 1.7% and 0.9%, respectively. The members of clade 2 were close to Alosa caspia in comparison to the American species such as Alosa sapidissima, Alosa alabamae, Alosa pseudoharengus, Alosa aestivalis, Alosa sapidissima and Alosa alabamae that were available in the Bayesian tree. This is due to the association of the Black Sea with the Caspian Sea and the identical origins of these two seas as well as the effect of the geographical distance. According to Pourkazemi et al. (1999), geographic distance is the most important factor in the formation of the genetic structure of populations, species and subspecies. In addition to the small geographical distance, the origins of the Black Sea and the Caspian Sea are the same as both are remnants of the Tethys Sea, which have been separated for several million years (Biju-Duval et al. 1977;Gorur 1988).
In fact, the Caspian Sea is considered a unique geographic habitat (Firoozfar et al. 2014), and since members in the former classification (named subspecies) do not have separate reproductive areas and immigration outside the Caspian Sea (for example, freshwater rivers), the classification of Alosa caspia species into three subspecies would be fundamentally ambiguous. The poor phenotypic differences between the members can be generalized to the effect of different regional conditions.
In conclusion, the groups of Alosa caspia distinguished in this study are distinct based on discriminant function and morphometric indices, yet the molecular analysis of cytochrome b failed to distinguish among these groups, and it seems that we are witnessing a polytomy. The existing morphological differences can be related to the interactions of the Alosa caspia species as well as environmental adaptation (Kishida et al. 2011). In present study, the phylogenetic analysis of cytochrome b does not support the traditional morphological subdivision of Alosa caspia into three subspecies. Due to the length of the selected sequence, it is possible the selected fragment of cyt-b gene in the species Alosa caspia does not have enough polymorphism for exploring the phylogenetic relationships among the studied groups in the south Caspian Sea. Therefore, to complete and validate the information obtained from the comparison of the cyt-b gene fragment among the studied groups, we suggest that such studies include complete genes or multiple molecular markers to ensure that some of the typical caveats of mtDNA-based phylogenies are considered and the morphological information is integrated. These data could serve as an objective template to aid systematic revision, or more detailed local studies on the status of specific taxa.