R language-based methods for revealing taxonomic and functional diversity and tracing process of fish fauna

Graphical abstract The figure showed the spatial pattern of an integrated index of multifaceted diversity in the Yangtze River Basin, considering the higher fish species richness and functional contribution. Headwater (characterized by abundant endemic species), the middle mainstream (necessary to fish migration), and the lakes (showing higher diverse species) were primary zones for conservation.


Geological parameters
To determine the spatial pattern process of fish diversity at taxonomic and functional aspects concerning to the environmental characters, we divided the Yangtze River Basin could be into 11 subbasins as Jinsha, Min-Tuo, Jialing, Han, Wu, the upper mainstem, the middle mainstem, the lower mainstem, Dongting, Poyang, and further into 56 geographic units, according to the natural river system and annual discharge.
The longitude, latitude, altitude, and channel length data of each unit were derived through the Spatial Analyst Tool in ArcGIS 10.0 [1]. The longitude and latitude extents of a unit were represented by a rectangle envelope spanning the minimum and maximum x,y coordinates of the region, the four points and the centre of gravity of an envelope enclosing the unit were computed using the toolset. At the end, the maximum, minimum and mean values of each factor were calculated by comparing and averaging the values of each grid within the sub-basin through the Zonal Statistics tool of Spatial Analyst. The altitude data with a spatial resolution of 3 arc-seconds ($90 meters) was provided by the Shuttle Radar Topography Mission database (https://lta.cr.usgs.gov/SRTM) download from the International Scientific & Technical Data Mirror Site (Computer Network Information Centre, Chinese Academy of Sciences, http://datamirror.csdb.cn).
The hydrologic analysis in Spatial Analyst Toolbox of ArcGIS was used to model the movement of water across a basin. An eight-direction (D8) flow model [2] was used to determine the direction of flow (a direction of steepest descent) from every cell in the elevation raster. After creating a hydrologically conditioned elevation model (the sinks were identified and filled) from a digital elevation model (DEM), the flow accumulation for each cell location could be calculated. Supposing when enough water flows through a cell, there should be a stream passing through it, thus defining the stream network. The length of all streams combine in a unit was summed as channel lengths.

Taxonomic diversity approach
Taxonomic diversity (TD), incorporating the degree to which the species are taxonomically related to each other, can be used to describe biological diversity of an area more than the numbers of species present. For example, TD of an assemblage containing n species belonging to x genus is larger than that of another assemblage containing n species belonging to y genus (x>y). The Average Taxonomic Distinctness (AvTD), defined as the average taxonomic path length, was calculated based on the taxonomic distance through a classification tree at four taxonomic levels (species, genus, family, order) between every pair of species, with an advantage of not being affected by the number of species or the sampling efforts [3]. The AvTD (D+) is determined from data consisting only of presence or absence of taxa: Total Taxonomic Diversity (TTD, sD+) sums up distinctiveness values for particular species across the community: where s is the number of species present, the double summation is over the set {i = 1, . . . s; j = 1, . . . s, such that i < j}, and vij is the 'distinctness weight' between species i and j.
All the taxonomic diversity indices calculation was undertaken using R language [4] (Appendix A in Supplementary materials).

Functional diversity indices
Functional richness, representing the amount of functional space filled by a community, quantifies the functional diversity for unit with species distributed in a multidimensional functional space [5]. The relationship between taxonomic and functional diversity suggests a predominant influence of environmental filters on the functional structure of communities, which is not necessarily reflected in their taxonomic structure. As probably several species showed similar functional traits, FD variation was small than SR. Similarly, as several species with differentiated functional traits belonged to the same taxon, TD variation was small than FD.
In order to estimate the volume filled in the T dimensional space by the community of interest, the quick hull algorithm was used to compute the minimum convex hull including all considered species.
First, a functional distance matrix based on biological traits of each species was computed, using Gower's distance, to compile a metric able to accommodate nominal, ordinal, continuous and missing data. Then, a Principal Coordinates Analysis (PCoA) was constructed to represent the multivariate traits space occupied by the fishes. Following a trade-off between information quality and computation time, we finally kept the species coordinates on the first three axes as the values of three synthetic functional traits describing fish functional strategies. All calculations of functional diversity indices were performed in the R statistical and programming environment using the 'FD' and 'vegan' packages [4] (Appendix B in Supplementary materials).

The β-diversity and its components
The pairwise β-diversity (species richness and taxonomic diversity) of units' fish communities was measured using the Jaccard dissimilarity index (β jac ) based on the number of species, and divided into β-diversity turnover (β jtu ) and β-diversity nestedness (β jne ) [6]. The β jtu measures the proportion of species that would be replaced between assemblages if both assemblages had the same number of species, hence accounts for species replacement without the influence of richness differences; β jne reflects the increasing dissimilarity between nested assemblages resulting from the increasing differences in species richness. The calculation formulas were: + 2 min (b, c), Where a is the number of species present in both sites, b is the number of species present in the first site but not in the second, and c is the number of species present in the second site but not in the first.
Based on the convex hull volume, functional β-diversity = (functional space not shared) / (total functional space filled) was decomposed into functional turnover and functional nestedness components as: For each unit, we calculated the mean β-diversity by computing the average of the pairwise comparisons between each focal basin and all the remaining basins. All calculations of functional diversity indices were performed in the R statistical and programming environment using the 'FD', 'betapart', and 'vegan' packages [4] (Appendix C in Supplementary materials).

Integrated diversity index
The 9 aspects of calculated diversity indices (species richness, β-species richness turnover, βspecies richness nestedness, taxonomic diversity, β-taxonomic diversity turnover, β-taxonomic diversity nestedness, functional diversity, β-functional diversity turnover, β-functional diversity nestedness), each reflecting an aspect of fish fauna, were normalized to eliminate the gaps of original numerical differences: y i = [x i Àmin(x j )]/[max(x j )Àmin(x j )], (1 i n, 1 j n), where x i is the original data, max(x j ) and min(x j ) are the maximum and minimum values of the sample data, respectively, and y i is the normalized data.
A trade-off integrated index was determined by summing all the normalize values, and then sorted to screen the prior zones for conservation; where i means different aspect of diversity.