DNA Barcoding and Phylogenetic Relationships of Nine Catfish Species from Mekong Basin, Vietnam

Fishes of the family Pangasiidae are widely recognized to have good potential for aquaculture and are highly valued as food in the markets of Vietnam. However, there is much debate on the identification and phylogeny of the Pangasiidae species found in Asia. In the present study, nine species of two genera (Pangasianodon and Pangasius) of Pangasiidae were investigated using partial sequences of two mitochondrial genes, COI and cytochrome b (cyt b), to differentiate among them and study their phylogenetic relationships. A total of 42 haplotypes were identified (21 haplotypes for each gene). The highest interspecies genetic distance was between Pangasius larnaudii and Pangasius bocourti (0.189) for COI and between Pangasius macronema and Pangasianodon hypophthalmus (0.179) for cyt b, whereas the lowest genetic distance was between Pangasius macronema and Pangasius conchophilus for both genes (0.065 for COI and 0.092 for cyt b). The phylogenetic tree analyses of the two genes showed two major, genetically distant clusters corresponding to the two genera. The results obtained in this study also show that, besides the COI gene, the cyt b gene region can be successfully used for differentiating between species and can be accepted as a standard region for DNA barcoding.


Introduction
The Vietnamese Delta forms an integral part of the Lower Mekong Basin (LMB). The LMB is supplied with rich alluvial deposits from the Mekong River, which originates in the Tibetan Plateau and has a high biodiversity of fishes [1]. The delta region has diverse aquatic environments, including large freshwater rivers, irrigation canals, brackish estuaries, mangrove creeks, and mudflats [1]. Thus, fishery resources play an important role in the economy of south Vietnam. Many large- and medium-sized fishes, especially those that belong to the family Pangasiidae, have good potential for aquaculture and are highly valued as food in the markets of Vietnam, for example, the sutchi catfish (Pangasianodon hypophthalmus), basa catfish (Pangasius bocourti), ca bong lao (Pangasius krempfi), and spot pangasius (Pangasius larnaudii). In 2016, the production of Pangasianodon hypophthalmus was 1.2 million tons, and it was exported to 136 countries across all continents, with an estimated export income of US$ 1.67 billion [2].
The family Pangasiidae belongs to the order Siluriformes, includes a large number of species, and has a relatively wide distribution from Southwest to Southeast Asia. According to Roberts and Vidthayanon (1991), 11 species of the family Pangasiidae are found in Thailand, 10 in Indonesia, and 3 in Peninsular Malaysia, and 4 are endemic to Borneo Island. Most of the species in the family Pangasiidae are freshwater fishes. Five species also occur in brackish water: Pangasius pangasius [3], Pangasius krempfi [4], Pangasius kunyit [5], Pangasius sabahensis [6], and Pangasius mekongensis [6]. In Vietnam, according to some researchers, the family Pangasiidae has 13 species that belong to 4 genera: Pangasianodon, Pangasius, Pseudolais, and Helicophagus [7,8]. Most of these species are distributed in freshwater environments; only three of them inhabit the brackish waters of estuaries: Pangasius krempfi, Pangasius mekongensis, and Pangasius elongatus.
Although many studies on the classification of Pangasiidae according to external morphological characteristics have been conducted, there is still much debate on this issue because the species in the family have very similar morphologies, which has made classification difficult in recent years. The website www.fishbase.org provides information on the morphology of fishes, but it is not always accurate. In many cases, morphological classification is limited by the influence of living conditions or by the processing of specimens into products. Identification then becomes difficult or even impossible. Recently, with developments in molecular biology, several markers have been used as effective tools for the identification of fish species in particular and many other animal species in general. In particular, over the last decade, DNA barcoding has emerged as a molecular method for species identification. DNA barcoding is based on the principle of sequencing a short segment of DNA from a standardized region of the mitochondrial genome of the target specimen and comparing these unknown barcodes to an existing barcode database to identify the species [9].
DNA barcoding is also used to refine species identification by assigning query specimens with probabilistic algorithms once a set of barcodes of known species is established. Information on the phylogeny of the family Pangasiidae is scarce [10], and some species of the family have similar morphological features, such as Pangasius bocourti and Pangasius nasutus. Rainboth (1996) stated that Pangasius bocourti and Pangasius nasutus have been misidentified. Gustiano et al. (2003) used biometric measurements to distinguish between seven species of four genera in four main rivers in Sumatra; however, the classification was based solely on morphology without any molecular evidence. Generally, the cytochrome c oxidase I (COI) or cytochrome b (cyt b) sequence is used for DNA barcoding [11,12]. The cyt b gene has been considered one of the most useful genes for phylogenetic studies, and it is probably the best-known mitochondrial gene with respect to the structure and function of its protein product [13]. The cyt b gene contains both slowly and rapidly evolving codon positions as well as more conservative and more variable regions or domains overall. It is a powerful indicator for identifying species with DNA analysis techniques [14-16], and it is also used in molecular evolution studies [17]. In this study, we used DNA barcoding for the identification of nine species of Pangasiidae from the Mekong Delta, Vietnam. The DNA barcoding data obtained in this study will be used for better monitoring, conservation, and management of the family Pangasiidae.

Sample collection and DNA isolation
A total of 50 individuals (9 species) of Pangasiidae were obtained from local fish markets or fishermen in provinces of the Mekong Delta, including An Giang, Dong Thap, Can Tho, Soc Trang, Tien Giang, and Ben Tre. The fishes were caught in rivers by using lines and nets or were taken from aquaculture facilities (cages or ponds). The specimens were identified according to the descriptions published by Roberts and Vidthayanon and in [18], and approximately 1 to 3 g of fin clips or muscle tissue was collected from each specimen and stored in 95% ethanol until further use. A summary of the statistics and sampling localities of the species of the family Pangasiidae is presented in Table 1.
The genomic DNA was isolated from the fin clips or muscle tissues by using the standard phenol-chloroform-isoamyl alcohol method described by Sambrook and Russell [19] with some minor modifications. The DNA quality was checked using 1% agarose gel electrophoresis, and the absorbance at 260 nm was measured using the Ultrospec 2100 Pro UV/visible spectrophotometer to determine the DNA concentration. Fragments were gel-purified and sequenced using an ABI 3730XL DNA Sequencer (Applied Biosystems).
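The concentration estimate from the 260 nm absorbance reading can be illustrated with a short calculation. The sketch below assumes the widely used conversion of 50 ng/µL of double-stranded DNA per absorbance unit at a 1 cm path length; the function name, the reading, and the dilution factor are hypothetical and are not values reported in this study.

```python
# Assumed standard conversion: A260 of 1.0 ~ 50 ng/uL dsDNA (1 cm path length).
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration (ng/uL) from an A260 reading.

    `dilution_factor` is how many times the stock was diluted before
    measurement (hypothetical parameter, e.g. 20 for a 1:20 dilution).
    """
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor must be > 0")
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


# Illustrative reading only: A260 = 0.25 measured on a 1:20 dilution.
example_concentration = dsdna_concentration(0.25, dilution_factor=20)
```

The conversion factor applies to double-stranded DNA only; a different factor would be needed for RNA or single-stranded DNA.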

Genetic diversity analysis and phylogenetic tree construction
The sequence data were edited manually to confirm all base-pair assignments from the chromatograms by using the BioEdit software by Hall [22]. All the COI and cyt b sequences of the mtDNA regions were homologous in length and could be easily aligned using Clustal X. For sequence comparisons, pairwise genetic distances were quantified on the basis of the Kimura 2-parameter (K2P) distance model [23] using MEGA 6.06 [24].
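To make the K2P distance concrete, the following is a minimal Python sketch of the standard Kimura formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. It is an illustration only, not the MEGA implementation used in this study; the function name and the toy sequences are hypothetical, and the choice to skip gaps and ambiguous bases is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: sites with gaps or ambiguous bases are ignored
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition (A->G) in ten aligned sites.
toy_distance = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

Because transitions and transversions are weighted separately, the K2P distance is slightly larger than the raw proportion of differing sites when transitions dominate.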
Sample identification based on the COI sequence similarity approach was conducted using two databases: BOLD and GenBank. The highest percent pairwise identity of the consensus sequence from each species searched (BLASTN) in NCBI was compared to the percent specimen similarity scores of the consensus sequence from each species in the BOLD Identification System (BOLD-IDS) [25]. We searched for the consensus sequences in the NCBI database and downloaded one sequence for each species for the phylogenetic tree. The neighbor-joining (NJ) method was used to infer the phylogenetic relationships among the species by using MEGA version 6.06 [24]. The NJ tree was constructed, and support for monophyly was assessed with 1000 bootstrap pseudo-replicates [27].
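The bootstrap step can be sketched as follows. Each pseudo-replicate is built by resampling the alignment columns with replacement, a tree is then inferred from each replicate, and the support for a clade is the percentage of replicates in which it is recovered. The sketch covers only the resampling and the support calculation; tree inference itself (done with MEGA in this study) is not shown, and the function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one pseudo-replicate: alignment columns resampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    # Draw as many column indices as there are sites, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def bootstrap_support(clade_found_flags):
    """Percentage of replicates in which a given clade was recovered."""
    flags = list(clade_found_flags)
    return 100.0 * sum(flags) / len(flags)


# Toy alignment (hypothetical haplotypes), resampled 1000 times.
toy_alignment = {"hap1": "ACGTAC", "hap2": "ACGTTC", "hap3": "ATGTTC"}
rng = random.Random(42)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(1000)]
```

In a full analysis, each replicate would be passed to the tree-building method, and the resulting flags (clade present or absent) would be passed to `bootstrap_support`.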

Species identification and genetic distance
Two mtDNA regions in all the samples were successfully amplified using PCR; 21 haplotypes of COI and 21 haplotypes of cyt b were identified from the nine species, and all the sequences were submitted to the NCBI database with accession numbers KY398017-KY398037 (COI) and KY451455-KY451455 (cyt b) (Table 1). Table 2 shows the comprehensive barcoding identification results for the COI and cyt b genes by using the GenBank and BOLD databases. The results for the COI gene in both databases revealed definitive identity matches in the range of 98% to 100% for the consensus sequences of six species (Pangasius macronema, Pangasius conchophilus, Pangasianodon hypophthalmus, Pangasianodon gigas, Pangasius sanitwongsei, and Pangasius larnaudii). GenBank-based identification for all the species yielded an alignment E-value of 0.0. The BOLD-IDS results were consistent with the GenBank results with respect to the identification of these species and yielded 100% identity, except for Pangasianodon gigas (100% maximum identity in GenBank, whereas the percent similarity in BOLD was 99.64%). The present study also highlighted that the GenBank database lacks a record for the cyt b gene of Pangasius mekongensis and provided a top hit for a related species, Pangasius (92% identity).
Pairwise genetic distances based on the K2P model are presented in Table 3 (for COI) and Table 4 (for cyt b). For the COI gene, the highest interspecies genetic distance (0.189) was observed between Pangasius larnaudii and Pangasius bocourti, and the lowest genetic distance (0.065) was between Pangasius macronema and Pangasius conchophilus. For the cyt b gene, the lowest genetic distance (0.092) was also between Pangasius macronema and Pangasius conchophilus; however, the highest interspecies genetic distance (0.179) was observed between Pangasius macronema and Pangasianodon hypophthalmus.

Phylogenetic relationships among the Pangasiidae species
The phylogenetic trees (NJ (Figures 1 and 2), MP, and ML) for the two genes revealed almost identical phylogenetic relationships among the species. Two major clusters were revealed: the first cluster was formed by Pangasianodon hypophthalmus and Pangasianodon gigas, and the second cluster was further divided into two subclades in all the constructed trees, in which Pangasius macronema, Pangasius conchophilus, Pangasius ...

Discussion
Morphological studies of specimens raise questions regarding observed features versus described features. In a few cases, key morphological characteristics are difficult to discern [28]. The DNA barcoding approach has resolved some identification issues and elucidated the actual species composition in certain regions [29]. In this study, we sequenced the COI region and cyt b gene of mtDNA to create a set of barcode sequences and identify nine catfish species belonging to two genera (Pangasianodon and Pangasius) in Vietnam. We extensively compared our results to the BOLD and GenBank databases. We found that, of the nine species studied, only six matched the reference sequences in both databases. [6,31]. BOLD-IDS validates the identification search only if the species in the reference database has at least three barcoded specimens and identifies the query sequences if they match the reference sequence within a conspecific distance of less than 1% [25]. Therefore, correct species labeling, morphological taxonomy, and voucher documentation should be prioritized in cases where reassessment of spurious data is necessary [20].
In this study, we also aimed to understand the potential of the COI and cyt b genes as DNA barcoding tools for identifying almost all the species of the two genera (Pangasianodon and Pangasius) in Vietnam. A total of 42 haplotypes were identified (21 haplotypes per gene). For the nine species studied, the interspecies distances were greater than 0.02 for both genes and ranged from 0.065 to 0.189 for COI and from 0.092 to 0.179 for cyt b. No overlap between intraspecies and interspecies distances was detected, and a distinct barcoding gap was found between intraspecies and interspecies distances in each species. These results indicate that the COI and cyt b gene sequences can be effectively used to identify the nine species of the two genera with DNA barcoding. The observed transition vs. transversion ratios in pangasiids are also comparable to those in many teleosts [32,20]. Generally, a large excess of transitions relative to transversions is observed in teleost mtDNA [20].
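The barcoding-gap check described above can be sketched in a few lines of Python. The sketch is a simplified, global version: it compares the largest intraspecies distance with the smallest interspecies distance across all pairs, whereas the study reports a gap for each species. The function name, haplotype labels, and distance values are hypothetical and only illustrate the logic.

```python
from itertools import combinations


def barcoding_gap(distances, species_of):
    """Compare intraspecies and interspecies pairwise distances.

    distances: dict mapping frozenset({id_a, id_b}) -> pairwise distance
    species_of: dict mapping sequence id -> species name
    Returns (max_intraspecies, min_interspecies, gap_present).
    """
    intra, inter = [], []
    for a, b in combinations(sorted(species_of), 2):
        d = distances[frozenset((a, b))]
        if species_of[a] == species_of[b]:
            intra.append(d)
        else:
            inter.append(d)
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    # A gap exists when every interspecies distance exceeds
    # every intraspecies distance.
    return max_intra, min_inter, min_inter > max_intra


# Hypothetical toy data: two haplotypes of species A, one of species B.
toy_species = {"h1": "species_A", "h2": "species_A", "h3": "species_B"}
toy_distances = {
    frozenset(("h1", "h2")): 0.004,
    frozenset(("h1", "h3")): 0.065,
    frozenset(("h2", "h3")): 0.070,
}
gap_result = barcoding_gap(toy_distances, toy_species)
```

A per-species variant would repeat the same comparison for each species in turn, using only the pairs that involve that species.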
The phylogenetic trees for the nine species belonging to Pangasiidae in Vietnam on the basis of the COI and cyt b genes are shown in Figures 1 and 2. The phylogenetic relationships based on the COI sequence are concordant with the morphological and osteological comparisons reported in [1,8]. Our results are consistent with those of Azlina et al. (2013), who studied the phylogenetic relationships of Pangasiidae in Malaysia.

Conclusion
In conclusion, DNA barcoding is emerging as an invaluable tool for species identification. The gene sequences of COI and cyt b have been submitted directly to GenBank. The results obtained in this study show that the cyt b gene region can be successfully used for differentiating between species and can be accepted as a standard region for DNA barcoding. The phylogenetic trees showed two major, genetically distant clusters corresponding to the two genera.