DNA barcoding of fish fauna from low-order streams of the Tapajós River basin

The Amazon basin harbors a megadiverse fish fauna spread across an intricate network of large rivers and small streams. Amazonian streams are home to many small-sized fishes that remain poorly documented. To accelerate scientific knowledge of these important aquatic systems, we adopted an integrative approach combining morphological and molecular tools to investigate the fish assemblages of low-order streams in the lower Tapajós River basin. Cytochrome c Oxidase I (COI) DNA barcodes were obtained from 252 specimens collected at 10 stream sites. The combined analysis revealed 29 species, 21 genera and 11 families. Cryptic diversity was evident in Knodus sp.1, Aequidens epae and Copella callolepis, in which deep genetic divergences were detected (maximum intraspecific distances: 20.48%, 7.99% and 3.77%, respectively). The putative new species were most closely related to counterparts occurring in the Tapajós and Xingu drainages.


Introduction
A significant part of the megadiverse Amazonian ichthyofauna inhabits the extensive network of streams and small rivers (igarapés). These streams are scattered throughout the Amazonian uplands and lowlands, penetrating the forest matrix, where they are mostly shaded by the forest canopy. Their waters are usually acidic, with low temperature variation (24-26 °C) and low primary productivity [1]. The riparian forest surrounding the watercourse plays a pivotal ecosystem role as a nutrient source, and the meanders, with a variety of obstacles in the channel (e.g. trunks, stones and marginal vegetation roots), provide a succession of microhabitats commonly exploited by small fishes of the orders Characiformes, Cichliformes, Siluriformes and Gymnotiformes [2], [3]. For a long time the ichthyofauna of Amazonian streams remained almost unknown, and major scientific efforts on fish biodiversity were concentrated on the easily accessible large floodplain rivers. In recent years, several studies have assessed Amazonian streams, addressing sampling methods [4], [5], fish assemblages [6][7][8][9], fish ecology [7], [10] and environmental degradation [11], [12].
Amazonian rivers and streams show high endemism and high taxonomic and ecological diversity, as reflected by the roughly 2,500 fish species already described and more than 1,000 species still undescribed [13], [14]. In the present decade, hundreds of new fish species have been discovered in the Amazon region [14], [15], including in the streams of the lower Tapajós River [16], a warning that the region's true fish diversity remains severely underestimated. The Tapajós River drainage has a large fish diversity, with 494 recorded species, of which 17% are restricted to this drainage [17]; 117 species have previously been recorded in the streams of the lower portion of this basin [9].
Traditional morphological examination remains the preferred method for species identification among many Brazilian fish scientists; however, complementary molecular studies (DNA barcoding) are promising and are encouraged as a way to accelerate knowledge of fish taxonomy.
DNA barcoding [18] is well established as a successful tool for species identification based on a short segment of the mitochondrial Cytochrome c Oxidase I gene (COI). Many papers demonstrate the efficacy of DNA barcoding for recognizing and discovering new fish species in habitats ranging from marine to freshwater [19][20][21][22][23][24].
In the present study, we investigated the fish diversity of streams of the lower Tapajós River using morphological and molecular tools, in order to provide taxonomic data on an underexplored and highly speciose Amazonian region. The need for taxonomic characterization and conservation policies for the Tapajós ichthyofauna arises from imminent threats to the local aquatic biota posed by gold mining and large hydroelectric projects run by the government and multinational companies.

Ethics statement
This research was conducted in accordance with the guidelines of the National Council for Control of Animal Experimentation and the Federal Board of Veterinary Medicine. The protocols were approved by the Committee on the Ethics of Animal Use of the Instituto Nacional de Pesquisas da Amazônia (040/2012; 041/2012). Sampled individuals were euthanized using a lethal concentration of eugenol, and whole specimens or tissue samples were stored in ethanol. Fish sampling and tissue collection were authorized by the Brazilian Government and carried out under ICMBIO licenses (SISBIO 32653-3, 44215-1 and 44215-2).

Study area
A total of 10 low-order streams on the right bank of the Tapajós River were surveyed. The streams are located within the municipalities of Santarém, Belterra and Rurópolis, Pará State, Brazil (Table 1, Figs 1 and 2).

Sampling and morphology analysis. Samples were collected from 2012 to 2015 (S1 File). Before each capture session, we delimited a 50 m long stretch of stream and blocked the channel at the limits of the stretch using filament nets attached to the margins and the stream bottom [30]. Fish were captured by three collectors using sieves and small nets (2 and 3 m long) during 2 hours of active capture. Sample size for any given location ranged from 2 to 24 individuals. All captured specimens were collected and, when possible, a maximum of 10 specimens per species and sampling event was chosen for DNA barcoding analysis, while the remaining specimens were fixed and preserved for the museum collection.
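The rule of keeping at most 10 specimens per species and sampling event for barcoding can be sketched as below. This is a hypothetical illustration, not the authors' procedure: the tuple layout is assumed, and the random choice within each group is our assumption, since the text does not say how specimens were picked.

```python
import random
from collections import defaultdict


def select_for_barcoding(specimens, max_per_group=10, seed=0):
    """Split specimens into a barcoding subset and a museum-only remainder.

    specimens: list of (specimen_id, species, sampling_event) tuples
    (hypothetical layout). At most `max_per_group` specimens per
    (species, sampling_event) pair go to barcoding; the choice within a
    group is random here, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for spec_id, species, event in specimens:
        groups[(species, event)].append(spec_id)

    barcode, museum = [], []
    for key in sorted(groups):
        ids = groups[key][:]
        rng.shuffle(ids)
        barcode.extend(ids[:max_per_group])   # up to 10 for DNA barcoding
        museum.extend(ids[max_per_group:])    # remainder fixed for the collection
    return barcode, museum
```

For example, 13 specimens of one morphospecies from a single event would yield 10 barcoding specimens and 3 museum-only vouchers.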
The collected fishes were provisionally classified in the field as morphospecies. Each specimen was then labelled, photographed and sampled for tissue. Vouchers were fixed in 10% formalin for 24 h, rinsed with water, transferred to 70% ethanol and deposited in the Fish Collection of the Institute of Water Science and Technology, Federal University of Western Pará, Brazil (http://www.ufopa.edu.br/ufopa/institucional/unidadesacademicas/icta/) (S2 File). Tissue samples were stored at the Laboratory of Genetics and Biodiversity, Federal University of Western Pará, Brazil.
Definitive species-level identification was based on the examination of morphological characters, using identification keys [31][32][33][34] and taxonomic material deposited in scientific collections. When a specimen could not be confidently assigned to a species, the abbreviations "sp.", "cf." and "aff." were applied [35].
Molecular methods. Before specimen fixation, muscle tissue samples were excised, preserved in 96 °GL ethanol and stored at -20 °C. Genomic DNA was purified with the salting-out protocol adapted by Vitorino and colleagues [36]. Briefly, lysis was carried out in a microtube containing 440 μL of lysis buffer (10 mM Tris-HCl, 2 mM EDTA, 400 mM NaCl, 2% SDS) plus 10 μL of proteinase K (10 mg/mL), incubated in a water bath at 55 °C for 3 h or, alternatively, overnight. To precipitate proteins, we added 300 μL of 5 M NaCl; the microtubes were inverted manually and centrifuged for 10 min at 10,000 rpm. The supernatant containing the DNA was collected, precipitated with 500 μL of 100% isopropanol and centrifuged for 10 min at 10,000 rpm. The DNA was washed with 700 μL of 70% ethanol, dried and resuspended in 30 μL of sterile water. Finally, 5 μL of RNase (10 mg/mL) was added and incubated at 37 °C for 30 min. The purity and concentration of the extracted DNA were evaluated by electrophoresis on a 1% agarose gel stained with GelRed (Biotium-Uniscience).
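As a worked check of the working enzyme concentrations in the lysis and RNase steps, the dilution relation C1V1 = C2V2 gives the values below. This assumes the added volumes are simply additive; the protocol itself does not state final concentrations.

$$c_{\mathrm{ProtK}} = \frac{10\ \mathrm{mg/mL} \times 10\ \mu\mathrm{L}}{(440 + 10)\ \mu\mathrm{L}} \approx 0.22\ \mathrm{mg/mL}, \qquad c_{\mathrm{RNase}} = \frac{10\ \mathrm{mg/mL} \times 5\ \mu\mathrm{L}}{(30 + 5)\ \mu\mathrm{L}} \approx 1.43\ \mathrm{mg/mL}$$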
PCR-positive products were purified with a column system (E.Z.N.A. Cycle Pure Kit, Omega Bio-tek) following the manufacturer's instructions. DNA barcode sequences were obtained by the dideoxy chain-termination (Sanger) method using the ABI PRISM BigDye Terminator v.3 Cycle Sequencing Kit (Applied Biosystems). Sequencing reactions were set up in 96-well plates in a final volume of 10 μL containing 5 μL of sterile H2O, 1.5 μL of 5X sequencing buffer, 0.5 μL of primer (10 μM), 1 μL of BigDye mix and 2 μL of purified PCR product. Cycle-sequencing conditions were as follows: 96 °C (1 min); 35 cycles of 96 °C (15 s), 50 °C (15 s) and 60 °C (4 min). The reactions were precipitated with ethanol/EDTA and dried at 90 °C for 2 min. The samples were resuspended in 10 μL of Hi-Di formamide, heated at 94 °C for 3 min and sequenced on an ABI 3500 Genetic Analyzer (Applied Biosystems).
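Scaling the 10 μL sequencing reaction to a full plate is simple arithmetic. The sketch below builds a master mix of the shared components from the per-reaction volumes given above. The 10% overage is our assumption, not a value from the protocol, and the template is pipetted separately into each well.

```python
# Per-reaction volumes (µL) of the shared components of the 10 µL
# cycle-sequencing reaction described in the text.
PER_REACTION_UL = {
    "sterile H2O": 5.0,
    "sequencing buffer 5X": 1.5,
    "primer (10 uM)": 0.5,
    "BigDye mix": 1.0,
}
TEMPLATE_UL = 2.0  # purified PCR product, added individually to each well


def master_mix(n_reactions, overage=0.10):
    """Total volume (µL) of each shared component for n reactions.

    `overage` (default 10%) is an assumed pipetting-loss margin.
    """
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


# Each well receives 8 µL of master mix plus 2 µL of template = 10 µL.
MIX_PER_WELL_UL = sum(PER_REACTION_UL.values())
```

For a full 96-well plate, `master_mix(96)` gives about 528 μL of water, 158.4 μL of buffer, 52.8 μL of primer and 105.6 μL of BigDye mix.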
Data analysis and species delimitation. DNA barcode sequences were first edited to remove primer sequences and ambiguous bases, inspected for premature stop codons, and then aligned with BioEdit [37] and MEGA v.7 [38]. The DNA barcode sequences and standard associated metadata were uploaded to the BOLD Systems platform (www.boldsystems.org) and assigned to the project "Peixes de Igarapés da Bacia do Tapajós" (IGTAP label) as part of the Br-BOL-Project 09 campaign. The barcode dataset was analysed with the online BOLD tools Distance Summary, Barcode Gap Analysis and the Barcode Index Number (BIN) System. To illustrate the phylogenetic arrangement of species and groups, we generated a Neighbor-Joining dendrogram under the Kimura 2-parameter (K2P) model [39] using MEGA v.7 [38]. Branch support was evaluated by a bootstrap test with 1000 pseudo-replicates.
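All distances in this study are K2P distances. The Kimura 2-parameter distance separates transitions (A↔G, C↔T) from transversions. With P and Q the proportions of transitional and transversional differences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below computes it for two aligned sequences. It is not MEGA's implementation. Skipping sites with gaps or ambiguous bases (pairwise deletion) is our assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with a gap or non-ACGT symbol in either sequence are skipped
    (pairwise deletion, an assumption of this sketch). Multiply by 100
    to express the result as a percentage, as in the text.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = valid = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        valid += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine change = transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if valid == 0:
        raise ValueError("no comparable sites")
    p = transitions / valid
    q = transversions / valid
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

A single A→G transition across 10 comparable sites gives d = -0.5 ln(0.8), about 0.112 (11.2%).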
To delimit cryptic and candidate species, we followed the criteria adopted by Pugedo and colleagues [24]. Potential candidate species were flagged if they 1) formed a concordant BIN cluster and 2) had a nearest neighbor distance (NND) greater than 2%. Cryptic species were recognized when the intraspecific distance exceeded 2% and the specimens showed no morphological distinctiveness a priori.
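The two flagging rules above can be expressed as a small decision function. The sketch assumes precomputed per-species summaries. Reading "intraspecific distance" as the maximum intraspecific K2P distance is our interpretation. The record layout is hypothetical.

```python
from dataclasses import dataclass

THRESHOLD = 2.0  # % K2P distance, as in the criteria above


@dataclass
class SpeciesSummary:
    name: str
    bin_class: str         # "C" concordant, "D" discordant, "S" singleton
    max_intra: float       # maximum intraspecific K2P distance (%), assumed metric
    nnd: float             # nearest-neighbour distance (%)
    morph_distinct: bool   # specimens morphologically distinguishable a priori?


def flag(s: SpeciesSummary) -> list:
    """Return the delimitation flags raised for one species."""
    flags = []
    # Candidate species: concordant BIN and NND above the threshold.
    if s.bin_class == "C" and s.nnd > THRESHOLD:
        flags.append("candidate species")
    # Cryptic diversity: deep intraspecific divergence without morphological differences.
    if s.max_intra > THRESHOLD and not s.morph_distinct:
        flags.append("cryptic diversity")
    return flags
```

With hypothetical values, a species in a concordant BIN with NND = 5% and low intraspecific distance is flagged only as a candidate species. A morphologically uniform species with 7.99% intraspecific distance is flagged for cryptic diversity.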
A total of 252 DNA barcode sequences longer than 500 bp, without stop codons or indels, were obtained. The mean base composition was 18.06% G, 27.76% C, 23.49% A and 30.7% T. The sample included two to 24 individuals per species, with an average of eight (Table 2). The mean intraspecific distance was 0.46%, ranging from zero to 20.48%, based on 1627 comparisons. This extraordinarily high maximum intraspecific distance was recorded only in Knodus sp.1; the second highest value was observed in Aequidens epae (7.99%). The mean intrageneric distance was 14.79% (range 3.55% to 22.14%), while the mean distance within families was 24.25% (range 4.12% to 32.76%).
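Summary statistics of the kind reported above come from pooling all sequences (base composition) and all within-species sequence pairs (intraspecific distances). A minimal sketch follows. It assumes aligned sequences labelled by species and accepts any pairwise distance function, for example a K2P implementation. It does not reproduce BOLD's Distance Summary tool.

```python
from collections import Counter
from itertools import combinations


def base_composition(sequences):
    """Percentage of G, C, A and T pooled over all sequences (other symbols ignored)."""
    counts = Counter()
    for seq in sequences:
        counts.update(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: 100 * counts[b] / total for b in "GCAT"}


def intraspecific_summary(labelled, distance):
    """Mean and maximum distance over all within-species pairs, and the number of pairs.

    labelled: list of (species, aligned_sequence) tuples.
    distance: function taking two sequences and returning a distance.
    """
    by_species = {}
    for species, seq in labelled:
        by_species.setdefault(species, []).append(seq)
    # A species with n specimens contributes n*(n-1)/2 comparisons.
    dists = [distance(a, b) for seqs in by_species.values() for a, b in combinations(seqs, 2)]
    if not dists:
        raise ValueError("no species with two or more sequences")
    return sum(dists) / len(dists), max(dists), len(dists)
```

The comparison count returned here corresponds to the "1627 comparisons" figure, which sums n(n-1)/2 over all species.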

… ACZ2937 and ADC4164), with at least three species clearly evidenced by DNA barcodes. Additionally, Copella callolepis (n = 18), with three lineages and three BINs (ACX6532, ACH3210 and ACH3211), and Aequidens epae, with two lineages and two BINs (ACH3650 and ADC3786), revealed cryptic diversity that went undetected by traditional morphological methods alone. The clusters recovered in the barcode-based phylogenetic reconstruction agreed fully with the species assigned a priori by morphology-based identification (Fig 3), with the exception of Knodus sp.1, which was split into multiple branches forming a paraphyletic array. In contrast, the Barcode Index Number (BIN) analysis implemented in the BOLD Systems workbench revealed 35 clusters. Of these, 29 were classified as concordant (clusters comprising a single species), two as singletons (Knodus sp. 1, IGTAP264-16, BIN ACG7692; Copella callolepis, IGTAP045-13, BIN ACH3211) and four as discordant (clusters comprising more than one species). The discordant clusters included the following species: Hoplias malabaricus (BIN ABZ3047), Knodus sp. 1 (BIN ADC4164), Bryconops cf. transitoria (BIN ACG8555) and Apistogramma agassizii (BIN AAJ1190).
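The three BIN categories used here can be illustrated with a simplified classifier that looks only at the species names of the records in each BIN. This is a sketch of the definitions given in the text, not BOLD's own algorithm, which also considers higher-level taxonomic conflicts. The input layout is hypothetical.

```python
from collections import Counter


def classify_bin(species_in_bin):
    """Classify a BIN from the species names of the records it contains.

    species_in_bin: list with one species name per record in the BIN.
    """
    if len(species_in_bin) == 1:
        return "singleton"      # a single sequence
    if len(set(species_in_bin)) == 1:
        return "concordant"     # several records, one species
    return "discordant"         # records assigned to more than one species


def tally(bins):
    """Count BIN classes; bins maps BIN id -> list of species names."""
    return Counter(classify_bin(members) for members in bins.values())
```

For instance, six Bryconops cf. transitoria records clustered with one Hyphessobrycon pulchripinnis record are classified as "discordant", as happened with BIN ACG8555.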

Table 3. BIN classification and measures of intraspecific genetic distance (I.D.) and nearest neighbor distance (NND) of fish species from the streams of the lower Tapajós River. Species with high intraspecific divergence (> 2%) are shown in bold. BIN classification: C (concordant), D (discordant) and S (singleton). Species complexes are shaded in blue. Distances were estimated under the Kimura 2-parameter model.

The integrative approach, following the species delimitation criteria of Pugedo et al. (2016), highlighted 25 species (concordant BIN and NND > 2%; Table 3). Based on examination of morphological traits, we identified 15 of these species, but the taxonomic status of the remaining 10 could not be precisely determined. Some of them presumably have poorly diagnostic characters, resulting in taxonomic confusion with congeners (e.g. Bryconops aff. caudomaculatus, Hemigrammus cf. vorderwinkleri; FRVR, personal observation). Others, however, are better characterized as candidate undescribed species (e.g. Hyphessobrycon gr. heterorhabdus, Melanocharacidium sp., Aequidens sp. and Bujurquina sp.).

Discussion
The Brazilian inland aquatic biota has been investigated for centuries, but we are far from considering this megadiverse biota reasonably well studied. A striking example of our limited knowledge is the recent discovery of a new fish family (Tarumaniidae, Characiformes) from deep fossorial Amazonian habitats [40].
Using an integrative approach, we delimited 29 nominal fish species from Amazonian streams, some of which clearly harbour cryptic diversity. The DNA barcoding evidence suggests that the taxa most likely to contain putative new undescribed species are Knodus sp.1, Aequidens epae and Copella callolepis. Three species (Bryconops cf. transitoria, Hoplias malabaricus and Apistogramma agassizii) were recognized on the basis of morphology but not under the DNA barcode criteria adopted here, because their BINs were discordant. Such BIN discordance can arise from erroneous entries stored in the BOLD Systems database or from clustering with known complexes of undescribed species (e.g. Hoplias malabaricus).
To explore the phylogenetic relationships of Knodus sp.1 from the Cupari-Tapajós drainage, we assembled DNA barcodes downloaded from GenBank (accessions KF210030-KF210276) assigned to species included in Knodus sensu stricto [48]. The Knodus sp.1 BINs ACZ2936 and ACZ2937 were most closely related to Knodus sp. Xingu, whereas BIN ADC4164 was recovered as sister to Knodus sp. Teles Pires. The Knodus sp.1 singleton BIN (ACG7692) branched as a separate lineage, clearly distinct from all species included in Knodus sensu stricto (Fig 4). Therefore, this complementary analysis revealed that Knodus sp.1 hides three new species that belong to Knodus sensu stricto [48] and share molecular and geographic affinities with taxa from the Xingu-Tapajós drainages.

The genus Aequidens Eigenmann and Bray [50] encompasses 18 valid species, widely distributed across South American drainages [47], [51], [52]. Previous records of Aequidens in the Tapajós basin include A. epae, A. mauesanus and A. tetramerus [9], [53]; additionally, Silva-Oliveira and colleagues [9] reported Aequidens sp. from the Cupari River drainage. In the present study, A. epae was split into two lineages as divergent as full species (BINs ACH3650 and ADC3786). The first clade occurred in the lowermost portion of the Tapajós River near its confluence with the Amazonas River, whereas the second occurred in the Cupari River drainage. A phylogenetic analysis of Aequidens from the Tapajós, supplemented with DNA barcodes of congeners downloaded from BOLD Systems, revealed that A. epae Cupari River nested with A. diadema (GenBank accession GU817291), while A. epae Lowest Tapajós River was linked to this branch as a basal lineage. In contrast, our specimens delimited as Aequidens sp. from the São Jorge/Branco streams (BIN ACH3483) showed phylogenetic affinity with A. tetramerus.
In summary, it is reasonable to recognize at least three putative new Aequidens species from the Tapajós basin: Aequidens sp. Cupari River [9], A. epae Cupari River (BIN ADC3786) and Aequidens sp. São Jorge/Branco streams.
The lebiasinid genus Copella was recently revised and comprises six valid nominal species: C. arnoldi, C. callolepis, C. compta, C. eigenmanni, C. nattereri and C. vilmae [54]. Based on morphological traits, all specimens from the Tapajós basin streams in the present study were identified as C. callolepis; however, the molecular evidence suggests a species complex.
Two of the putative new species (C. callolepis BIN ACH3211 and BIN ACX6532) occurred in the UDV and Irurá streams, both situated in the peri-urban area of Santarém, the most populous city in the lower Tapajós region. The third (BIN ACH3210) occurred in the UDV, São Bras and Sonrisal streams. These aquatic systems lie near an important paved road (PA-457) that links Santarém to the village of Alter do Chão. Because of their proximity to an urban centre, these natural habitats are highly disturbed by human pressures such as riparian deforestation, water pollution and water withdrawal for domestic and aquaculture use. This scenario threatens the long-term persistence of these populations. Conservation and management plans, as well as further integrative studies, should be carried out to minimize the risk of premature local extinction of undescribed fish species.
DNA barcoding has been widely regarded as a powerful tool for species delimitation and has proven effective for discovering new fish species; however, some operational factors inherent to this methodology may reduce its precision and resolving power. For instance, in the present study our DNA barcode approach recovered 25 species under Pugedo's criteria, whereas morphological examination pointed to 29 species. These discrepancies arose because a few OTUs showed mismatches between molecular and morphological identifications, being classified as discordant BINs or species complexes. Factors contributing to these inconsistencies may include the incomplete reference library of standard DNA barcodes available on the BOLD Systems platform and the presence of imprecise or erroneous records in this repository that are used in BIN analysis.
The insufficient coverage of public repositories for standard DNA barcode sequences of Neotropical freshwater fishes is evident from the fact that 67% of the species listed in the present paper had no previously published DNA barcode. Moreover, DNA barcodes failed to delimit Bryconops cf. transitoria (BIN ACG8555), despite a consistent cluster of six individuals with a mean intraspecific distance of 0.06% and an NN distance greater than 2%. This BIN was discordant because B. cf. transitoria clustered with a Hyphessobrycon pulchripinnis record previously deposited in BOLD Systems from a single individual [27]. Because these species are clearly distinguishable by morphology, we suspect that this H. pulchripinnis record is an erroneous entry. Notably, the NN distance between H. pulchripinnis and H. rosaceus was 22.55%, more than twice the next-largest NN distance (10.45%, between H. eques and H. copelandi) [27].
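The nearest-neighbour (NN) distance discussed above is the smallest distance between any specimen of a species and any specimen of a different species. A minimal sketch computing it from a precomputed pairwise distance table is shown below. The data layout and the toy values in the usage note are hypothetical, and this is not BOLD's own implementation.

```python
def nearest_neighbour_distances(species_of, dist):
    """Return {species: (NN distance, nearest other species)}.

    species_of: dict specimen_id -> species name (hypothetical layout).
    dist: dict frozenset({id_a, id_b}) -> pairwise distance in %.
    Pairs of conspecific specimens are ignored.
    """
    best = {}
    for pair, d in dist.items():
        a, b = tuple(pair)
        sp_a, sp_b = species_of[a], species_of[b]
        if sp_a == sp_b:
            continue  # intraspecific pair: irrelevant for the NND
        # Update the running minimum for both species in the pair.
        for sp, other in ((sp_a, sp_b), (sp_b, sp_a)):
            if sp not in best or d < best[sp][0]:
                best[sp] = (d, other)
    return best
```

With toy specimens s1 and s2 of species A, s3 of B and s4 of C, and distances A-B of 5.0% and 4.2%, the NND of A is 4.2% (to B). Intraspecific pairs such as s1-s2 never enter the minimum.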
The Tapajós basin is a hotspot for stream ichthyofauna, and integrative taxonomy is a powerful methodology for prospecting biodiversity in Amazonian waters. Further studies are needed to shed light on the obscure taxonomy of Knodus, as well as on the phylogenetic and geographic relationships of the regional stream fish assemblages.