Dataset on the distribution location and biological traits of freshwater fishes in the Yangtze River Basin

In this data article, we provide taxonomic data on the fishes of the Yangtze River, including class, order, family, and genus. The Yangtze basin is divided into 56 units, and their geographical information, including longitude, latitude, altitude, and channel length, is recorded. Fish presence/absence data at the unit scale are reported. Biological traits, including morphological, physiological, and ecological characters of each fish species, are also described, numerically coded, and reported. These data are the foundation of the analyses and results in the article “Continental-scale analysis of taxonomic and functional fish diversity in the Yangtze River” (Kang et al., 2018).


Specifications table
Subject area: Biology
More specific subject area: Biodiversity and conservation
Type of data:

Fish distribution data can be incorporated into other biogeographical studies concerning environmental pollution, climate change, and human activities.
Surveyed biological traits of fishes across the whole basin can provide insights into species interactions, ecosystem function, and conservation decision-making.

Data
The Yangtze River, covering multiple types of landforms and climatic zones, supports abundant fish diversity and resources. In this data article, based on the natural river system and discharge, we divided the basin into 56 units [2] and extracted their geographical data (Table 1, see Figure 1 in Ref. [1]), including the longitude, latitude, altitude, and channel length of each unit. On the basis of the available literature, we reviewed and updated the list of freshwater fish species and their distribution at the unit scale (Appendix A) to determine the species richness. According to the taxonomic classification of the species (class, order, family, and genus; Table S2), the taxonomic diversity of each unit was calculated. Biological traits were used to determine functional diversity. The trait categories include body shape, feeding habit, trophic level, water zone, water column position, and water temperature (Table 2), and the details for each species were determined (Appendix B). We quantified the dissimilarities in species richness, taxonomic diversity, and functional diversity among all the units, and traced the processes of species turnover and nestedness.
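The dissimilarity, turnover, and nestedness mentioned above can be illustrated with a minimal sketch. The text does not name the index used, so the code below assumes the widely used partition of pairwise Sørensen dissimilarity into a turnover (Simpson) component and a nestedness-resultant component. The function name `beta_partition` and the species labels are hypothetical and are not taken from the dataset.

```python
def beta_partition(unit_a, unit_b):
    """Split pairwise Sorensen dissimilarity into turnover and nestedness.

    Assumption: this partition is used for illustration only; the article
    text does not state which dissimilarity index was applied.
    unit_a, unit_b: collections of species names present in each unit.
    """
    unit_a, unit_b = set(unit_a), set(unit_b)
    if not unit_a or not unit_b:
        raise ValueError("each unit must contain at least one species")

    a = len(unit_a & unit_b)  # species shared by both units
    b = len(unit_a - unit_b)  # species found only in the first unit
    c = len(unit_b - unit_a)  # species found only in the second unit

    sorensen = (b + c) / (2 * a + b + c)      # total dissimilarity
    turnover = min(b, c) / (a + min(b, c))    # species replacement
    nestedness = sorensen - turnover          # richness-difference part
    return {"sorensen": sorensen, "turnover": turnover, "nestedness": nestedness}


# Hypothetical species lists for three units (placeholders, not real records).
unit_1 = {"sp_a", "sp_b", "sp_c", "sp_d"}
unit_2 = {"sp_a", "sp_b"}                  # strict subset of unit_1
unit_3 = {"sp_a", "sp_b", "sp_e", "sp_f"}  # same richness, two species replaced

nested_pair = beta_partition(unit_1, unit_2)
turnover_pair = beta_partition(unit_1, unit_3)
```

In this sketch, a unit whose fauna is a strict subset of another yields zero turnover, so all dissimilarity is attributed to nestedness, whereas two units of equal richness that differ in composition yield zero nestedness.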

Experimental design, materials, and methods
The Yangtze Basin was divided into 56 geographic units (sub-basins) according to the natural river system and discharge, each unit having an annual discharge larger than 300 × 10⁸ m³. The maximum, minimum, and mean values of longitude, latitude, and altitude were then extracted, and the channel length of each unit was calculated.
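As a rough illustration of the extraction step, the sketch below summarizes coordinate samples per unit into minimum, maximum, and mean values. The input layout (one tuple per sampled point), the unit identifiers, and all numbers are hypothetical; the article does not describe the software or data format actually used.

```python
from statistics import mean


def summarize_units(points):
    """Return min, max, and mean of longitude, latitude, altitude per unit.

    points: iterable of (unit_id, longitude, latitude, altitude) tuples.
    This layout is an assumption made for the example.
    """
    grouped = {}
    for unit_id, lon, lat, alt in points:
        values = grouped.setdefault(
            unit_id, {"longitude": [], "latitude": [], "altitude": []}
        )
        values["longitude"].append(lon)
        values["latitude"].append(lat)
        values["altitude"].append(alt)

    summary = {}
    for unit_id, values in grouped.items():
        summary[unit_id] = {
            name: {"min": min(vals), "max": max(vals), "mean": mean(vals)}
            for name, vals in values.items()
        }
    return summary


# Hypothetical sample points (not taken from Table 1).
sample_points = [
    ("U01", 100.0, 30.0, 3000.0),
    ("U01", 102.0, 32.0, 2000.0),
    ("U02", 110.0, 29.0, 500.0),
]
unit_summary = summarize_units(sample_points)
```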
To clarify the fish fauna in the basin, we collected the available literature, including monographs, published papers, investigative reports, and additional records. The compiled data were revised following FishBase [3] to avoid invalid species, as well as synonyms and homonyms.
At the unit scale, species locality records were identified to construct the species distribution data matrix. Presence/absence was scored as '1' for the presence of a species in a unit and '0' for its absence. An aggregate data matrix on species taxonomy was compiled with order, family, and genus as columns and species as rows.
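The scoring rule above can be sketched as follows, assuming the locality records are available as (species, unit) pairs. The record format, the function names, and the placeholder species and unit labels are hypothetical and do not reproduce Appendix A.

```python
def build_presence_absence(records, units):
    """Build a species-by-unit matrix scored 1 (present) or 0 (absent).

    records: iterable of (species, unit_id) locality records.
    units: ordered list of unit identifiers defining the columns.
    """
    present = set(records)
    species = sorted({sp for sp, _ in present})
    return {
        sp: [1 if (sp, unit) in present else 0 for unit in units]
        for sp in species
    }


def species_richness(matrix, units):
    """Count the species scored as present in each unit (column sums)."""
    return {
        unit: sum(row[i] for row in matrix.values())
        for i, unit in enumerate(units)
    }


# Placeholder records; species and unit names are illustrative only.
units = ["U01", "U02", "U03"]
records = [
    ("Species A", "U01"),
    ("Species A", "U02"),
    ("Species B", "U02"),
    ("Species C", "U03"),
]
matrix = build_presence_absence(records, units)
richness = species_richness(matrix, units)
```

Summing each column of the matrix gives the species richness of the corresponding unit.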
We constructed a functional trait matrix covering morphological, physiological, and ecological characters. Morphological parameters (the shape of the body, head, eye, fin, etc.) were measured directly from available formalin-fixed specimens. We extracted data on feeding habit and trophic level from the literature [4] and Appendix A. Data on the water zone, water column position, and water temperature suitable for a species were extracted by reviewing the distributional information in reference sampling reports. When physiological and ecological knowledge of a species was not available, we extrapolated the data for the genus to the species level. Ordinal, nominal, and continuous traits were then converted to numerical values: for the numerical data, an average value was calculated and assigned to each individual [5] when more than one value was available for a given species; for the nominal data, each trait state received a different value according to its category.
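The coding rules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: nominal categories are mapped to consecutive integer codes (the article does not specify the coding scheme), the genus is taken as the first word of the binomial name, and all species names, categories, and trait values are hypothetical.

```python
from statistics import mean


def average_numeric(values):
    """Average several reported values of a numerical trait for one species."""
    return mean(values)


def encode_nominal(categories):
    """Map each distinct category to an integer code (assumed scheme)."""
    return {cat: code for code, cat in enumerate(sorted(set(categories)), start=1)}


def trait_with_genus_fallback(species, species_traits, genus_traits):
    """Use the species-level value if known, else the genus-level value."""
    value = species_traits.get(species)
    if value is None:
        genus = species.split()[0]  # assumes "Genus epithet" naming
        value = genus_traits.get(genus)
    return value


# Hypothetical inputs, for illustration only.
trophic_level = average_numeric([2.0, 3.0])
water_column_codes = encode_nominal(["benthic", "pelagic", "benthic"])

species_traits = {"Genusone alpha": 3.2}
genus_traits = {"Genusone": 3.0}
known = trait_with_genus_fallback("Genusone alpha", species_traits, genus_traits)
inferred = trait_with_genus_fallback("Genusone beta", species_traits, genus_traits)
```

Integer codes for nominal traits carry no inherent order, so any distance-based analysis of such a matrix would need a measure that treats them as categories rather than quantities.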