DNA barcoding of marine fish species from Rongcheng Bay, China

Rongcheng Bay is a coastal bay of the Northern Yellow Sea, China. To investigate and monitor the fish resources in Rongcheng Bay, 187 specimens from 41 different species belonging to 28 families in nine orders were DNA-barcoded using the mitochondrial cytochrome c oxidase subunit I gene (COI). Most of the fish species could be discriminated using this COI sequence, with the exception of Cynoglossus joyneri and Cynoglossus lighti. The average GC content of the 41 fish species was 47.3%. The average Kimura 2-parameter genetic distances within species, genera, families, and orders were 0.21%, 5.28%, 21.30%, and 23.63%, respectively. Our results confirmed that combining morphological and DNA barcoding identification methods facilitated fish species identification in Rongcheng Bay, and also established a reliable DNA barcode reference library for these fish. DNA barcodes will contribute to future efforts to achieve better monitoring, conservation, and management of fisheries in this area.


INTRODUCTION
There are approximately 30,000 fish species worldwide, constituting slightly more than one-half of the recognized living vertebrates (Nelson, Grande & Wilson, 2016). The Chinese fish fauna also has high species richness, with more than 3,200 marine species described from Chinese coastal waters (Liu, 2011). Owing to dramatic expansion in fish production, particularly from aquaculture, fish availability in China has grown steadily, with apparent per capita fish consumption increasing by an average of 6% annually in the period 1990-2010 (FAO, 2014). It is imperative that the ichthyofauna of China be well studied for effective conservation and resource management. Accurate and unambiguous identification of fish will assist in managing fisheries for long-term sustainability, and will improve ecosystem research and conservation.
However, this presents a resource challenge. Traditional determination methods, such as morphological identification, require a considerable amount of taxonomic expertise.
The DNA-based barcoding method has been proven to be a valuable molecular tool for species identification and it is accessible to non-specialists (Hebert, Ratnasingham & Dewaard, 2003; Frézal & Leblois, 2008; Leray & Knowlton, 2015). A number of international campaigns are focused on DNA barcoding whole biota, including fish; FISH-BOL (http://www.fishbol.org), for example, is now well established and aims at DNA barcoding all the fishes of the world (Ward, Hanner & Hebert, 2009). DNA barcoding is useful not only for the identification of whole fish but also for the identification of larvae, eggs, fillets, fins, and other fragments of the body that are difficult to identify based on morphology (Trivedi et al., 2016). The mitochondrial COI gene has been accepted as the standard region for DNA barcoding (Hebert, Ratnasingham & Dewaard, 2003; Hajibabaei et al., 2007a; Hajibabaei et al., 2007b) and it is extremely effective at discriminating fish species (Ward et al., 2005; Hubert et al., 2008; Valdez-Moreno et al., 2009). Approximately 98% of reported marine fish species can be distinguished by COI barcoding, and this approach has been used to catalogue and record fish in many geographic regions (Aquilino et al., 2011; Asgharian et al., 2011; Cawthorn, Steinman & Witthuhn, 2011; Lakra et al., 2011; Becker et al., 2015). However, there have been only a few DNA barcoding studies of the marine fish resources of China, and most of them have focused on the South China Sea (Zhang, 2011; Wang et al., 2012; Zhang & Hanner, 2012).
Rongcheng Bay is a coastal bay of the Northern Yellow Sea in Shandong Province, China, and it is one of the most important coastal regions for fisheries in China. The fisheries in this area are a major source of food and have helped coastal communities to maintain their livelihoods and community structure. Although it is an important spawning and nursery area for many marine species, scant information is available on fish species richness, barring two reports by Yu et al. (2013) and Wang et al. (2016). These investigators studied species composition and seasonal variation in community structure and reported the length-weight relationships of 13 fish species in Rongcheng Bay, but they did not undertake systematic DNA barcoding or other molecular data collection. Our study aimed to complement their dataset using DNA barcoding in order to better investigate and monitor fish resources and implement conservation efforts.

Ethics statement
The study was conducted in accordance with the guidelines and regulations established by the China Government Principles for the Utilization and Care of Animals Used in Testing, Research, and Training. All other applicable international, national, and institutional guidelines for the care and use of animals were followed by the authors. The animal work and animal protocols were approved by the Institute of Oceanology, Chinese Academy of Sciences. Permits and approval for the field studies were obtained by the authors from the Institute of Oceanology, Chinese Academy of Sciences (201305043 and 200805069).

Fish samples
Fish samples were collected from Rongcheng Bay, Northern Yellow Sea (37°12′-37°24′N, 122°33′-122°48′E) (Fig. 1), using bottom trawl nets and stake nets from spring 2011 to winter 2014. One to 10 individual specimens were collected for each fish species. All captured fish samples were immediately stored on ice and transported to the laboratory for identification. Whole-specimen vouchers were deposited in the Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences. All specimens were identified to the species level based on morphology by consulting a standard fish reference (Cheng & Zheng, 1987) and FishBase (http://www.fishbase.org). Muscle tissue was excised from each specimen and stored at −20 °C until use. To avoid cross-contamination between fish samples, the inner muscle tissue of each individual was collected with tweezers and scissors that had been sterilized with alcohol and an alcohol lamp. A separate tool set was used for each sample.

DNA extraction and PCR amplification
Total genomic DNA was extracted from muscle tissue by salt extraction (Aljanabi & Martinez, 1997). Concentration and purity of the extracted DNA were measured using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). Approximately 650 bp was amplified from the 5′ region of the mitochondrial COI gene using different combinations of the following primers designed by Ward et al. (2005):
FishF1 5′-TCAACCAACCACAAAGACATTGGCAC-3′
FishF2 5′-TCGACTAATCATAAAGATATCGGCAC-3′
FishR1 5′-TAGACTTCTGGGTGGCCAAAGAATCA-3′
FishR2 5′-ACTTCAGGGTGACCGAAGAATCAGAA-3′
The 50-µl PCR mixtures included 5 µl of 10× PCR buffer, 2.5 µl of MgCl2 (50 mM), 1 µl of dNTP (0.05 mM), 2 µl of each primer (0.01 mM), 0.125 µl of each dNTP (0.05 mM), 1.25 U of Taq polymerase, 2.0 µl of DNA template, and ultrapure water to 50 µl. The amplification procedure consisted of an initial denaturation step at 95 °C for 5 min, followed by 35 cycles of 95 °C for 30 s, 52 °C for 30 s, and 72 °C for 60 s, then a final extension at 72 °C for 10 min. PCR products were visualized on 1.2% agarose gels and purified using the EZNA™ gel extraction kit (Omega Bio-Tek, Norcross, GA, USA). The purified PCR products were sent to Shanghai Sunny Biotechnology Co. Ltd., China, for bidirectional sequencing using an ABI 3730 capillary sequencer (Applied Biosystems, Foster City, CA, USA).

DNA sequence analysis
The DNA sequences were assembled, aligned, and annotated using BioEdit software (Hall, 1999). The obtained sequences were compared against fish sequences in both the GenBank and BOLD databases, and the similarity with available sequences of the same fish species in these databases was over 98% for all sequences. Distance- and character-based DNA barcoding methods for species discrimination were used in this study. Pairwise genetic distances were calculated using the Kimura 2-parameter (K2P) distance model (Kimura, 1980). Neighbor-joining (NJ) trees of K2P distances with 1,000 bootstrap replications (Saitou & Nei, 1987) were generated to provide a graphic representation of the patterning of divergence between species. The K2P distances and the NJ trees were calculated and generated using MEGA version 5 (Tamura et al., 2011). BLOG 2.0 (Weitschek et al., 2013) was used for character-based identification of fish species represented by more than two individuals. The COI sequences of Cynoglossus joyneri and Cynoglossus lighti downloaded from GenBank (accession numbers: GU479053.1, JQ738602.1, JQ738613.1, DQ116752.1, JQ738430.1, JQ738456.1, JQ738468.1, KF979127.1, HQ711865.1), together with the sequences generated in this study, were used to construct an NJ tree with 1,000 bootstrap replications in MEGA.
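For readers who wish to reproduce the pairwise distances outside MEGA, the K2P model can be sketched as follows. With P the proportion of transitional differences and Q the proportion of transversional differences between two aligned sequences, the distance is d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)] (Kimura, 1980). The function name and the handling of gaps and ambiguous bases (pairwise deletion) are assumptions made for this illustration and are not taken from the MEGA settings used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Illustrative sketch: sites where either sequence has a gap or an
    ambiguous base are skipped (pairwise deletion, an assumption here).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Multiplying the returned value by 100 gives the distance as a percentage, the unit used in the tables of this study.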

RESULTS
A total of 187 COI sequences (GenBank accession numbers: KU236800-KU236892 and KY275270-KY275363) were obtained from 41 fish species belonging to 28 families and nine orders (Table 1). Sequence lengths ranged from 605 to 655 bp, with an average of 635 bp. No stop codons, insertions, or deletions were observed in any of the amplified sequences. The overall average nucleotide composition of the sequences was 23.5% A, 29.2% T, 18.7% G, and 28.5% C, with an A + T bias (Table 2). The GC content of five fish species, namely Chelidonichthys kumu, Neosalanx anderssoni, Lateolabrax japonicus, Saurida elongata, and Zoarces elongatus, was found to be more than 50% (Table 3). Among these, N. anderssoni (Osmeriformes) showed the highest GC content (53.4%). Among the other 36 species, Pterogobius zacalles (Perciformes) showed the lowest GC content (41.4%).
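The composition statistics reported above can be illustrated with a short sketch. It computes the percentage of each base, the overall GC content, and the GC content at each codon position for a single sequence. The function names are hypothetical, and the reading-frame offset (`frame`) is an assumption that would have to be set from the actual alignment.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percent A, T, G, and C among the unambiguous bases of a sequence."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases")
    return {b: 100.0 * counts[b] / total for b in "ATGC"}


def gc_content(seq: str) -> float:
    """Percent G + C among the unambiguous bases."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def gc_by_codon_position(seq: str, frame: int = 0) -> list:
    """GC% at the 1st, 2nd, and 3rd codon positions.

    `frame` is the 0-based offset of the first complete codon
    (an assumption; it depends on where the barcode fragment starts).
    """
    coding = seq.upper()[frame:]
    return [gc_content(coding[pos::3]) for pos in range(3)]
```

Averaging `gc_content` over the sequences of each species would give per-species values comparable to those in Table 3.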
The K2P genetic distances within each taxonomic level are summarized in Table 4. The average K2P genetic distance between individuals within species was 0.21% (Table 4). Within genera, families, and orders, the distances were 5.28%, 21.30%, and 23.63%, respectively (Table 4). An increase in genetic variation at increasing taxonomic levels was observed, but the rate of increase declined in the higher taxonomic categories (Fig. 2). In this study, some species were represented by a single specimen, but for the majority of the species (29), multiple specimens were analyzed (Table 1) and character-based analyses successfully identified those species. The BLOG method of species diagnosis is based only on nucleotides of specific sites in particular taxa, and these diagnostic sites are referred to as nucleotide diagnostics (NDs). Thirteen species were identified with two NDs, while the other species were diagnosed using a combination of three to four nucleotide positions (Table 5). A phylogenetic NJ tree was generated based on all individuals' DNA barcode sequences (Fig. 3). Most individuals from each species belonged to single monophyletic clusters, except for Cynoglossus joyneri and Cynoglossus lighti. Instead of forming two separate branches, these individuals clustered under a single node. In order to validate our sequences, additional COI sequences of these species were downloaded from GenBank and analyzed together with the sequences generated in our study. The resulting NJ tree (Fig. 4) showed that the C. joyneri and C. lighti voucher sequences were also non-monophyletic. The K2P distances within C. joyneri ranged from 0% to 1.23%, with an average value of 0.48%, and the values within C. lighti ranged from 0.35% to 0.70%, with an average value of 0.53%. The minimum, maximum, and average K2P distances between C. joyneri and C. lighti were 0%, 1.23%, and 0.48%, respectively.
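The way a nucleotide diagnostic is applied to a sequence can be illustrated with a minimal sketch. It parses an ND string of the form used in Table 5 (for example, T-250+C-286+G-319) and checks whether an aligned sequence carries the stated bases at the stated positions. The function names are hypothetical, and treating the positions as 1-based alignment coordinates is an assumption; this is not the BLOG algorithm itself, which also derives the diagnostics from training data.

```python
def parse_nd(nd: str) -> list:
    """Parse an ND string such as 'T-250+C-286' into [(250, 'T'), (286, 'C')]."""
    sites = []
    for term in nd.split("+"):
        base, pos = term.strip().split("-")
        sites.append((int(pos), base.upper()))
    return sites


def matches_nd(aligned_seq: str, nd: str) -> bool:
    """True if the sequence carries every diagnostic base of the ND.

    Positions are assumed to be 1-based coordinates in the alignment.
    """
    seq = aligned_seq.upper()
    return all(
        1 <= pos <= len(seq) and seq[pos - 1] == base
        for pos, base in parse_nd(nd)
    )
```

For instance, `matches_nd("AATAAACAAA", "T-3+C-7")` is true because the toy sequence has T at position 3 and C at position 7.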

DISCUSSION
Traditional morphological species identification requires experienced taxonomists, and the phenotypic plasticity of taxa may lead to misidentifications. The DNA barcoding method has been proven to be an effective tool for species identification, particularly with specimens that are damaged, incomplete, or consisting of several morphologically distinct stages. Nevertheless, DNA barcoding also has limitations. In some cases, related species may present identical sequences, making DNA barcodes useless for discriminating between them. Therefore, DNA barcoding can serve as a complementary tool for the identification of species, but it cannot replace morphological taxonomic analyses (Pečnikar & Buzan, 2014).
In this study, DNA barcode analysis based on the COI gene was able to identify most fishes in Rongcheng Bay, and the identification results were in agreement with that of [...] (... Landi et al., 2014). In this study, a reliable DNA barcode reference library for the fish in Rongcheng Bay was established, which could be used to assign fish species by screening sequences against it in the future. This would contribute to achieving better monitoring, conservation, and management of fisheries in this area.
Figure 3 Neighbor-joining (NJ) tree of 187 COI sequences from 41 fish species, using K2P distances. This phylogenetic tree was constructed using the NJ method, and bootstrap analysis with 1,000 replicates was used to assess the strength of the nodes.
The variation in GC content differs among codon positions. Generally, the second codon position shows the lowest variation and the third codon position shows the largest range of variation (Min & Hickey, 2007), which is consistent with our results (Table 2). These differences between the codon positions may reflect the degree of selective constraint; GC content could therefore provide insight into the nature of the selective pressures affecting nucleotide usage (Clare et al., 2008). GC content has also been reported to correlate with some biological properties, such as those of the DNA helix (Vinogradov, 2003) and gene expression (Quax et al., 2015). The mitochondrial GC shifts in nucleotide composition can be explained by mutational biases, natural selection (Mooers & Holmes, 2000), and other factors, such as environmental temperature (Bernardi & Bernardi, 1986) and amino acid content (Foster & Hickey, 1999). The sequences obtained in this study, together with other datasets, may facilitate further investigation of these hypotheses.
Table 5 Nucleotide diagnostics (NDs) for 29 fish species collected from Rongcheng Bay.
Species               Nucleotide diagnostics
Lophius litulon       T-250+C-286+G-319
Konosirus punctatus   T-286+A-493
Thryssa kammalensis   A-294+G-501
Okamejei kenojei      A-321+T-495
Although it is highly controversial (Srivathsan & Meier, 2012), the distance-based technique remains the standard approach in DNA barcoding (Reid et al., 2011). In this study, the K2P model was used to ensure consistency and comparability with other barcoding studies. The intraspecific genetic divergence (about 0.2%) was much smaller than the interspecific genetic divergence (about 10%) in Rongcheng Bay (with the exception of C. joyneri and C. lighti). These results indicate that using COI gene sequences as DNA barcodes to discriminate fish species in Rongcheng Bay is feasible. Increasing average genetic distance values were obtained at increasing taxonomic levels in this study. The average genetic distances between individuals within species, genera, families, and orders were 0.21%, 5.28%, 21.30%, and 23.63%, respectively, consistent with the patterns observed in other fish barcoding studies. For example, the K2P values of Australian fish within species, families, and orders were 0.39%, 15.46%, and 22.18%, respectively (Ward et al., 2005); the values of Indian marine fishes were 0.30% within species, 9.91% within families, and 16.00% within orders (Lakra et al., 2011); and the values of fishes from the South China Sea within species, families, and orders were 0.32%, 20.20%, and 24.66%, respectively (Zhang, 2011).
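The comparison between intraspecific and interspecific divergence described above can be sketched as a simple check for a "barcode gap". The function takes precomputed pairwise distances labeled with the species of each sequence and reports whether the smallest interspecific distance exceeds the largest intraspecific distance. The function name and input layout are hypothetical choices for this illustration.

```python
from statistics import mean


def barcode_gap(pairs) -> dict:
    """Summarize intra- versus interspecific distances.

    `pairs` is an iterable of (species_a, species_b, distance) tuples,
    one per pair of sequences; distances may be in any consistent unit.
    """
    pairs = list(pairs)
    intra = [d for a, b, d in pairs if a == b]
    inter = [d for a, b, d in pairs if a != b]
    if not intra or not inter:
        raise ValueError("need both intra- and interspecific comparisons")
    return {
        "mean_intra": mean(intra),
        "mean_inter": mean(inter),
        "max_intra": max(intra),
        "min_inter": min(inter),
        # A gap exists when no interspecific distance is as small as
        # the largest intraspecific distance.
        "gap": min(inter) > max(intra),
    }
```

Using the extreme values reported for the two tongue soles as a two-pair illustration, `barcode_gap([("C. joyneri", "C. joyneri", 1.23), ("C. joyneri", "C. lighti", 0.0)])` returns `gap` as false, which matches the overlap described for this species pair.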
Most of the 41 fishes in this study have been recorded previously in the Yellow Sea (Liu & Ning, 2011; Yu et al., 2013; Wang et al., 2016). According to Liu & Ning (2011), five of these fish are warm-water species, 20 are warm-temperate species, and 16 are cold-temperate species (Table 6). No cold-water species were present. However, cold-temperate fish accounted for a high proportion of the species observed in this study (39% of the total). This may be a result of the Yellow Sea cold water mass: a 70-80 meter depression in the central part of the Yellow Sea (He, Wang & Lei, 1959) holds cold water throughout the year and provides an important habitat for cold-temperate fish. Based on the literature and fish databases (Froese & Pauly, 1998; Liu & Ning, 2011; Shao, 2011), the habitat preferences of these 41 fish species can be grouped into five categories. In the dominant category, 31 species were associated with continental shelf demersal habitats. The other species sampled included two continental shelf reef-associated species, four continental shelf pelagic-neritic species, three oceanic pelagic species, and one oceanic bathydemersal species. The species composition in Rongcheng Bay is similar to that in other bays along the coast of the Yellow Sea, such as Laizhou Bay and Jiaozhou Bay (Xu et al., 2013; Sun et al., 2014). However, these regions do not yet have an analogous DNA barcoding database for comparison with our study.
Of the 41 species sampled in this study, 14 have high commercial value (Table 6). Some traditionally and economically important fishes, such as Larimichthys polyactis and Scomber japonicus, were caught rarely or only once during our investigation. Meanwhile, some less-valuable species have become dominant, implying that fish species assemblages and population sizes in this area have changed, in keeping with other reports (Jin & Tang, 1996; Jin, 2003; Xu & Jin, 2005). Therefore, the economically valuable fish species should be better protected, and DNA barcoding of the ichthyofauna of this bay has contributed additional information toward this goal.
Finally, our study has brought to light an interesting relationship between two species belonging to the subfamily Cynoglossinae, C. joyneri and C. lighti, which exhibited very small genetic distances (from 0% to 1.23%). The resulting NJ tree showed that these two fishes did not form distinct monophyletic clusters and were not clearly separated from each other. Voucher sequences from GenBank were consistent with these findings. Moreover, the morphological taxonomy of the subfamily Cynoglossinae is contentious, and several disputes about species delimitation have arisen (Matsubara, 1955; Ochiai, 1963; Li & Wang, 1995). It is difficult to morphologically distinguish C. joyneri from C. lighti. The appearance of these two fishes is very similar, and the main differences are the number of lateral-line scales and the head to tail length ratio (Li & Wang, 1995). On a molecular level, our study did not find sufficient interspecific genetic differentiation to regard these species as truly separate. These results support the findings of Liu et al. (2010), who analyzed genetic differentiation among individuals of these two species using partial 16S rRNA and Cyt b mitochondrial gene sequences, and found that genetic differentiation in these gene sequences was small. On this basis, they hypothesized that C. joyneri and C. lighti are probably the same species. Owing to the limited number of samples and genes surveyed in our study, however, we hesitate to say conclusively that they are conspecific. Further investigation combining morphological data and the divergence of multiple molecular markers, including nuclear genes, will be required to confirm the taxonomic status of C. joyneri and C. lighti.

CONCLUSIONS
Rongcheng Bay is a coastal bay of the Northern Yellow Sea, China, and its ichthyofauna has its own unique features. Therefore, it is very important to identify the fish species from this area. DNA barcoding is a molecular method that uses a short standardized DNA sequence of the mitochondrial COI gene as a species identification tool. In this study, 187 specimens from 41 different species belonging to 28 families in nine orders were DNA-barcoded. The average K2P genetic distances between individuals within species, genera, families, and orders were 0.21%, 5.28%, 21.30%, and 23.63%, respectively. There was no overlap in pairwise genetic variation between conspecific and interspecific comparisons, apart from Cynoglossus joyneri and Cynoglossus lighti. Our results confirm that DNA barcoding can be used as an effective tool for fast and accurate fish identification in Rongcheng Bay.