Population structure of the Indonesian giant tiger shrimp Penaeus monodon: a window into evolutionary similarities between paralogous mitochondrial DNA sequences and their genomes

Here we used both microsatellites and mtCR (mitochondrial DNA control region) sequences as genetic markers to examine the genetic diversity and population structure of Penaeus monodon shrimp from six Indonesian regions. The microsatellite data showed that shrimp from the Indian and the Pacific Ocean were genetically distinct from each other. It has been reported previously that P. monodon mtCR sequences from the Indo-Pacific group into two major paralogous clades of unclear origin. Here we show that the population structure inferred from mtCR sequences matches the microsatellite-based population structure for one of these clades. This is consistent with the notion that this mtCR clade shares evolutionary history with nuclear DNA and may thus represent nuclear mitochondrial pseudogenes (Numts).


Introduction
Penaeus monodon (black tiger shrimp) (Fig. 1) is a marine crustacean widely cultured in the Indo-Pacific, especially in the Indonesian coastal region (Holthuis 1976). During the past two decades, the dramatic growth in mariculture industry has propelled Indonesia into one of the top producers of shrimp worldwide (FAO (Food & Agriculture Organisation), 2014). As a result, wild shrimp populations have become liable to devastation, as most fry are sourced from the wild due to their superior fertility and fecundity (Rosenberry 2004;Gillett 2008). In order to improve the output of local farmers as well as to understand the effect of this exponential growth of production of P. monodon in Indonesian waters, it is necessary to better understand the structure and migration patterns of wild shrimp populations in these waters.
Studies carried out during the last two decades aimed at better understanding the underlying population structures, phylogenetic relationships, and geographic distributions of various marine organisms inhabiting the Indonesian coastal region have consistently found this region to be a major center of marine biodiversity (Klinbunga et al. 1998;Benzie et al. 2002;Sugama et al. 2002). While a number of historical biogeographic factors have been proposed to explain the diversity in this region, the factor that is most widely accepted is the complete closure of the Indian and Pacific oceans during the ice ages (Palumbi 1997).
A number of approaches using a variety of molecular tools such as allozymes , mtDNA-RFLP (Brooker et al. 2000), or sequencing the mtCR (mtDNA control region) (Walther et al. 2011) have been used to study the population structure of P. monodon. Interestingly, Walther et al. (2011) inferred the presence of paralogous nuclear copies of mitochondrial origin (Numts) in addition to the putatively true mitochondrial sequences in a large number of P. monodon samples, concluding that this may have misled inferences of population structure in P. monodon in the past.
Here, we aim to further explore the genetic diversity and population structure of P. monodon in Indonesian waters by combining microsatellite and mtCR data, specifically taking into account the presence of nuclear copies of mtCR.

Amplification and analyses of microsatellite markers
According to Pan et al.'s (2004) study, a set of 15 polymorphic markers (Table A1) which resulted in high expected heterozygosity values (H E >0.92) were chosen for genotyping.
All PCR experiments were performed in 96-well plates using an Applied Biosystem Veriti 96-well fast Thermal Cycler (Carlsbad, CA, USA). Each reaction (20 lL) contained 10 ng DNA, 10 lL of 2x QIAGEN Fast Cycling PCR Master Mix (Venlo, Limburg, Netherlands), and 0.02 lmol/L of both, forward and reverse primers.
The thermocycling profile consisted of an initial activation step of 3 min at 95°C, followed by 40 cycles of denaturation for 10 sec at 95°C, annealing for 30 sec at 50-60°C (according to primers used) and extension period of 2 min at 72°C. PCR products were visualized on high-resolution synthetic polymer-based matrix, Spreadex EL 400-600 gels (Elchrom Scientific, Cham, Switzerland) run in an Elchrom ORIGINS apparatus and sized with M3 size marker (Elchrom Scientific). Staining was performed with 1xGelStar nucleic acid gel stain (Lonza, Basel, Switzerland). For confirmation, samples with ambiguous genotypes were independently amplified and scored twice.

Data analysis
Based on microsatellite data (Jarne and Lagoda 1996), FSTAT version 2.9 (Goudet 1995) was used to describe the intrapopulation genetic diversity using the number of alleles (A), the number of private alleles per loci (A p ), the allelic richness based on sample size (A r ), the expected heterozygosity (H E ) in comparison with the observed heterozygosity (H O ) (Saitou and Nei 1987), and the inbreeding coefficient (F IS ) for the six shared microsatellite loci that were examined.
Null alleles were mitigated using FreeNA (Chapuis and Estoup 2007). To test whether the observed genotype frequencies of individual loci deviated from the expected HWE (Hardy-Weinberg equilibrium) within a population, and to test for heterozygote deficiency across loci and populations, GENEPOP v 4.0 (Raymond and Rousset 1995) was used.
To investigate interpopulation structures, the genetic distance (pairwise F ST value) between any two populations was calculated by comparing allele frequencies (Wright 1949). FSTAT v.2.9 (Goudet 1995) was used to determine the statistical significance of an F ST value by performing 10,000 permutations. A matrix of F ST values was then used to visually display the relationships between populations. The resulting matrices were also used to construct phylo- genetic trees using the neighbor-joining (NJ) method as implemented in QuickTree (Howe et al. 2002).
To determine the most likely sources of genetic ancestry (K) without considering sampling locations, a Bayesian clustering analysis was performed using the admixture model of ancestry and correlated allele frequency parameters in STRUCTURE v2.3 (Pritchard et al. 2000) with 1.2 9 10 6 MCMC iterations (2 9 10 5 of which were discarded as burn in). In order to obtain the real number of sources of genetic ancestry, the modal value of ΔK, a quantity based on the second-order rate of change with respect to K of the likelihood function was used (Evanno et al. 2005).
Tested K values ranged from 1 to 8, and for each value of K, six replicates were performed. The most likely K was determined by plotting their posterior probabilities. Comparing posterior probabilities predicted for each K value, the value immediately below the posterior probability values with the smallest differential was taken as the most likely number of sources of genetic ancestry.
For our analysis, sequences from haplogroups A and C were analyzed. These sequences were obtained post-editing and FaBox v1.41 (Villesen 2007) was used to collapse the sets of sequences into haplotypes and convert them into input files for Arlequin v3.5 (Excoffier and Lischer 2010) which was used to calculate the pairwise F ST matrices. The resulting matrices were then used to construct phylogenetic trees using the neighbor-joining (NJ) method utilizing the QuickTree software (Howe et al. 2002). Sequences were deposited at GenBank (accession numbers HQ724331-HQ724473).

Comparative analysis for the elucidation of the likely source of the paralogous sequences
To elucidate the most likely source of the paralogous sequences found (haplogroup C) in addition to the putatively authentic mitochondrial sequences (haplogroup A), for this analysis only shrimp individuals were retained for which both, microsatellite data and mtCR, data were available. This resulted in a reduction in the total sample size for haplogroup A (n = 96) comprised of 20 individual shrimp from Aceh, 9 from Cilacap, 15 from Grajagan, 17 from Bali, 15 from Sumbawa, and 20 from Timika, while haplogroup C (n = 24) retained the same sample size. Both microsatellite data analyses and mtCR sequence analyses as described above were repeated for the reduced data sets.

Microsatellite analysis
DNA from 90 of the 115 P. monodon samples from six locations was successfully genotyped at the 15 microsatellite loci targeted. In order for a rigorous comparison of all individuals from all populations, analysis was restricted to only six microsatellite loci for which data were generated across all individual P. monodon samples.
Across the 90 genotyped P. monodon individuals, allelic polymorphism was present at each of the six microsatellite loci examined, with a total of 38 distinct alleles being identified across all loci (Table A1). Locus Pm5271 (21 alleles) displayed the most polymorphism followed by loci Pm528 (20 alleles) and Pm4858 (17 alleles). Locus Pm5625 (8 alleles) displayed the smallest amount of polymorphism. When allelic polymorphism across all six loci was considered collectively, the mean number of alleles (A) per locus ranged from 6.83 in Sumbawa (17 shrimp) to 3.16 in Cilacap (9 shrimp). Within each of the sampled populations, mean observed heterozygosity (H o ) determined across all microsatellites ranged from 0.784 for Aceh (22 shrimp) to 0.532 for Cilacap (9 shrimp) (see Table A2). The mean expected heterozygosity (H E ) ranged from 0.425 for Timika to 0.262 for Bali.
The highest allelic mean number (A r ) across all six loci was observed in Sumbawa (3.12) followed by Aceh (3.10) and Timika (3.01) and was lowest for Grajagan (2.67). Most of these were rare alleles, with allele frequencies <5% at each locus, in each population.
Shrimp from Sumbawa possessed the highest (11) private alleles (A P ), while Cilacap showed the least (2). Inbreeding coefficient (F IS ) calculated for each of the six microsatellite loci for each of the six populations showed statistically significant departure from HWE (P < 0.05; Table A1). F IS was greatest in Timika (0.50) indicating a low level of genetic diversity compared to Cilacap (F IS = 0.27) indicating a higher level of genetic diversity.
The estimated genetic distance among populations (e F ST values) revealed statistically significant (P < 0.05) differences between all pairs except two population pairs ( Table 1). The populations for which pairwise F ST values were not significant were Aceh-Sumbawa (0.043) and Bali-Cilacap (0.049). In comparison, the highest F ST values to other populations were observed for Timika (0.078-0.140) and Grajagan (0.052-0.140). A neighborjoining tree generated from the pairwise F ST data clustered P. monodon from the six locations into three distinct clades, one comprised of shrimp originating only from the Pacific Ocean, another comprised of shrimp from the center of the Indonesian isles, while the third clade comprised of shrimp predominantly from the Indian Ocean. (Fig. 3). Clade 1 comprised of one branch of shrimp from Aceh, Clade 2 comprised only of a branch from Timika, and Clade 3 comprised of two sub-branches each split to include shrimp from Bali and Cilacap on one and Grajagan and Sumbawa on the other.
Bayesian analysis of microsatellite allele frequencies suggested that the 90 P. monodon samples most likely segregate into three sources of genetic ancestry (Fig. 4). Most individuals from Sumbawa, Bali, and Grajagan derived their genetic ancestry from a predominantly single source, and individuals from Timika derived their ancestry almost exclusively from the two other sources, while individuals from Aceh and Cilacap showed admixture from all three ancestral sources to varying degrees.

mtDNA control region sequence analysis
For haplogroup A, a total of 38 haplotypes were identified in an alignment of a 465-bp mtCR sequence amplified from 111 P. monodon which consisted of samples from all six populations. Of these haplotypes, 27 (71%) occurred only once, these haplotypes predominately occurred among shrimp derived from Sumbawa and Grajagan. Both the F ST matrix (Fig. A3) and the neighbor-joining tree clustered the P. monodon individuals into three clades. Clade 1 comprised of shrimp from Bali, Clade 2 from Sumbawa, Clade 3 from Cilacap, and Clade 4 consisted of two branches with shrimp from Grajagan and Aceh forming a branch and Shrimp from Timika forming another branch as an outgroup to the first (Fig. 5).
For haplogroup C, a total of 18 haplotypes were identified in alignments of 465-bp mtCR sequence amplified from 24 P. monodon samples. A total of 14 (78%) of these haplotypes occurred only once, which were mostly found among shrimp from Aceh. The F ST matrix  ( Fig. A4) and the neighbor-joining tree constructed clustered the 24 P. monodon of predefined populations into different clades than that of haplogroup A sequences: Clade 1 comprised of shrimp from Grajagan, Clade 2 those from Bali, Clade 3 clustered shrimp from Cilacap and Aceh together, and Clade 4 comprised only of shrimp from Sumbawa (Fig. 6).
Comparative analysis of microsatellite and mtCR for the elucidation of the probable source of the paralogous sequences When the F ST values from both the microsatellite alleles and the mtCR sequences were calculated for the reduced data set of haplogroup A (96 individuals), the resulting F ST matrix (Figs. A5 and A6) as well as the neighbor-joining trees generated were then compared. This neighbor-joining tree generated from the F ST calculations of microsatellite DNA clustered the predefined shrimp populations into three clades, Clade 1 comprised of shrimp from Aceh, Clade 2 those from Sumbawa and Grajagan, Clade 3 consisted of two branches; one comprising of Bali and Cilacap and the other with shrimps from Timika which formed a sister branch to the above (Fig. 7). This was mostly similar to that of the full data set with the only difference being the clustering of Timika as an outgroup in the third clade as opposed to clustering it as a clade on its own.
In similarity to the full data set, the Bayesian analysis of the microsatellite allele frequencies from the reduced data set for haplogroup A also suggested a likely segregation of the 96 P. monodon samples into three sources of genetic ancestry (Fig. 8).
The neighbor-joining tree created from mtCR F ST values for the reduced data set were completely different from   those of the microsatellite data. Its predefined populations were clustered into four clades with Clade 1 comprised of Cilacap, Clade 2 of Sumbawa, Clade 3 branched into two with Timika on one and the other consisted of two nodes with Grajagan and Aceh, and Clade 4 with Bali (Fig. 9).
For the reduced data set of haplogroup C (24 individuals), F ST matrix (Figs. A6 and A7) and the subsequent neighborjoining tree generated from microsatellite data clustered the predefined populations into three Clades as in the full data set. The constituents of the three clades were varied greatly from that of the full data set. The three clades of the reduced data set consisted of Clade 1 with Sumbawa, Clade 2 with two branches, one of which consisted of Grajagan and another with two sub-branches comprised of Cilacap and Aceh, while Clade 3 consisted of only Bali (Fig. 10). The Bayesian analysis of the microsatellite DNA data revealed no clear source between these three populations, and this may be due to the small sample size (Fig. 11).
The F ST matrix and neighbor-joining tree generated from mtCR data clustered the predefined populations into three clades which, while being very different from those of the reduced data set of haplogroup A and the full data set, were similar to those of haplogroup C reduced microsatellite DNA data set. Clade 1 consisted of two branches, Cilacap and Aceh; Clade 2 with Grajagan and Sumbawa; and Clade 3 of only Bali (Fig. 12). Comparatively Clade 3 was seen too similar to that of the microsatellite data, while Clade 1 was seen to closely resemble that of Clade 2 of the microsatellite data.

Discussion
Genetic diversity and population structure of P. monodon in Indonesian coastal waters Here we examined the genetic diversity and population structure of 90 specimens of P. monodon obtained from six geographic locations. These locations are widely spread across an area of 6000 KM of the coastal regions   of the Indonesian islands. Both microsatellite (6 loci) and mtCR sequences were used in the analysis.

Microsatellite analysis
Based on the microsatellite analysis of the full data set, the genetic diversity among the six P. monodon populations was found to be relatively low with respect to the mean number of alleles and heterozygosity values (Table A2). This is consistent with a previous analysis based on allozyme data for Indonesian coastal samples . Our results also revealed that based on the full data set, P. monodon from Aceh and Timika are genetically distinct from each other as well as those from the central Indonesian isles of Cilacap, Grajagan, Sumbawa, and Bali.
Bayesian analysis suggested that the likely genetic ancestry of shrimp from the central Indonesian isles (Sumbawa, Bali, and Grajagan) was different from those from the Pacific Ocean (Timika). This corresponds to findings from other samples from the Indo-Pacific on limited gene flow between the Indian Ocean and the Pacific Ocean (Palumbi 1997;Duda and Palumbi 1999;You et al. 2008). This strong differentiation has been hypothesized to be mostly due to the separation of the Indian and Pacific Ocean populations when lower sea levels restricted the tropical sea passages between these oceans. Another barrier to the movement of P. monodon larvae may have been the upsurge of cold water at the base of the Indonesian arc resulting in narrow channels between the eastern Indonesian islands (Palumbi 1997). In addition, this may also explain the high F IS value observed for Timika. Due to the geographic remoteness of Timika in the Pacific Ocean from the rest of the study locations, such genetic distinction was expected. We predict this could be explained with the Member-Vagrant evolutionary model that restricted gene flows result in locally adapted gene pools and stable genetic structures.

mtDNA control region sequence analysis
Penaeus monodon mtCR sequence data for this study was obtained from previously published literature (Walther et al. 2011) where it was found that about a quarter of the sequences were paralogous sequences (haplogroup C), highly diverged from the major haplogroup A. In this publication, we reanalyzed the mitochondrial sequences in conjugation with microsatellite data to elucidate the possible source of the paralogous sequences.
When the data were initially analyzed, the resulting mtCR data for each of the two haplogroups A and C were compared with the microsatellite data. The results of the analysis in terms of the interpopulation structure between the six population groups showed strong differences between microsatellite and mtCR analysis. The analysis of the mtCR sequences for haplogroup A clustered it into four distinct clades, that is, Bali, Sumbawa, Cilacap, and a single clade comprising of Timika and a branch with Grajagan and Aceh. Haplogroup C was also clustered into four distinct clades but differentiated from haplogroup A; with Grajagan, Bali, and Sumbawa forming three distinct clades, and Cilacap and Aceh jointly forming the fourth clade. In the case of haplogroup C, DNA samples from shrimps from Timika were not amplified as this haplogroup represented only the western edge of the Indo-Pacific (Walther et al. 2011). While no clear conclusion could be drawn from this comparison, it is evident that the analysis was being skewed by data from shrimps for which only microsatellite but no mtCR data was available and vice versa. To control for this bias, the data set was rearranged to include only individuals for which both data types were available.
With the new data set, clearer inferences became possible. While the microsatellite analysis of haplogroup A data set resulted in a different interpopulation structure in comparison with the mtCR analysis, haplogroup C had mostly similar interpopulation structure between the microsatellite analysis and the mtCR analysis. Based on haplogroup A's mtCR analysis being very different from the microsatellite DNA, we hypothesize that haplogroup A is likely comprised of the true mitochondrial sequences which evolved separately from nuclear DNA. In addition, it has been found that phylogenetically, the mitochondrial reference sequence for P. monodon (Wilson et al. 2000) belongs to haplogroup A (Walther et al. 2011). Haplogroup C with a comparatively similar interpopulation structure between the microsatellite analysis and the mtCR analysis may point toward it sharing evolutionary history with nuclear DNA. We further hypothesize that haplogroup C may represent Numts (Bensasson et al. 2000(Bensasson et al. , 2001Frey and Frey 2004;Thalmann et al. 2004). Further evidence is the indication on the basis of phylogenetic trees from (Walther et al. 2011) pointing toward the sequences in haplogroup C being more ancestral than the sequences of haplogroup A.

Conclusion
Using microsatellite analysis of Indonesian samples, we present further evidence of genetic differentiation between P. monodon from the Indian and the Pacific Oceans corroborating earlier reports using older molecular tools Sugama et al. 2002;Waqairatu et al. 2012;You et al. 2008). This variation can provide a basis for a more accurate assignment of P. monodon stocks in order to ensure sustainable aquaculture practices as well as the conservation of wild P. monodon. Our results also show that the mtCR-like sequences belonging to haplogroup C share evolutionary history with nuclear DNA (Zhang and Hewitt 1996) as would be expected for Numts and hence provide additional evidence that only the mtCR sequences in haplogroup A are authentic mitochondrial sequences. This corroborates previous tentative conclusions regarding the evolutionary source of haplogroup C mtCR sequences in P. monodon (Walther et al. 2011) and points to the possibility that there are different migration and dispersal rates for male and female P. monodon shrimps in Indonesian coastal waters. You, E.-M., T.-S. Chiu, K.-F. Liu, A. Tassanakajon, S. Klinbunga, and K. Triwitayakorn. 2008. Microsatellite and mitochondrial haplotype diversity reveals population differentiation in the tiger shrimp (Penaeus monodon) in the Indo-Pacific region. Anim. Genet., 39:267-277. doi:10.1111/ j.1365-2052.2008.01724.x Zhang, D., and G. Hewitt. 1996