Individual body mass and length dataset for over 12,000 fish from Iberian streams

We provide a unique fish individual body size dataset collected from our own sampling and public sources in north-eastern Spain. The dataset includes individual body size measures (fork length and mass) of 12,288 individuals of 24 fish species within 10 families collected at 118 locations in large rivers and small streams. Fish were caught by one-pass electrofishing following European standard protocols. The fish dataset has information on the local instream conditions including climatic variables (i.e., temperature and precipitation), topography (i.e., altitude), nutrient concentration (i.e., total phosphorus and nitrates), and the IMPRESS values (a measure of cumulative human impacts in lotic ecosystems). The potential uses of this new fish dataset are manifold, including developing size-based indices to further estimate the ecological status of freshwater ecosystems, allometric models, and analysis of variation in body size structure along environmental gradients.


Specifications Table
Subject: Biology, Zoology
Specific subject area: Freshwater fish ecology, community ecology, environmental science
Type of data: Table
How data were acquired: All data were provided by two public institutions: Agència Catalana de l'Aigua (ACA) and Confederación Hidrográfica del Ebro (CHE). Fish individual body sizes, water samples, anthropogenic pressures, and topographic variables were obtained in the field at each stream location following official biomonitoring programs. Field data were recorded on sheets and later organized in text files. Water samples were analysed in the laboratory to obtain nutrient concentrations. The climate data were retrieved from the Global Climate Data. All data were curated and analysed in the statistical environment R 4.

Value of the Data
• Assessing biodiversity status and trends in fish communities is critical to maintaining ecosystem services.
• Individual body-size fish data is rarely available but size-based approaches can be useful to integrate with official biomonitoring programs.
• The database can be used by other researchers to investigate the community patterns in Mediterranean streams and to assess the biological status of fish species.
• The database also contributes to enhancing the knowledge of the ecology and biology of fish in the streams of the Iberian region, a region heavily impacted by human activities but holding a unique fauna.

Data Description
The present data article includes 12,288 individual records of body length (mm) and body weight (g) for 24 fish species at 118 stream locations in the north-eastern Iberian Peninsula (latitudinal gradient from 40.73° to 43.08° N and longitudinal gradient from 4.17° W to 2.29° E; Fig. 1). Specifically, the stream locations lie in an area mostly characterized by a Mediterranean climate in the western part of the Palearctic ecoregion, within the Ebro basin (n = 103) and the smaller river basins of the Llobregat (n = 6), Besós (n = 5), Francolí (n = 3), and Gaià (n = 1). The data became accessible through a recent scientific publication by Arranz et al. [1]. They were collected by two public institutions, Agència Catalana de l'Aigua (hereafter, ACA) and Confederación Hidrográfica del Ebro (hereafter, CHE). The dataset can be found in the Zenodo data repository [2] and includes three text files with tab-separated values. The first file, named 0_Data_Dictionary, contains a detailed description of the variables in the other two files, including the definition and attribute of each variable (see also Table 1 for a summary). The second file, named 1_stream_information, provides the complete records of the stream locations (toponymy and data source from the two public institutions), fish sampling (date, number of fish caught, and sampling area), local environmental information, and a measure of anthropogenic pressure for each stream location. Most of the environmental information, which includes water samples and topographic variables, was obtained on the same day as the fish sampling. Water samples were frozen and transported to the research laboratory for the analysis of nutrient concentrations (see Material and Methods for further details).
Additional environmental data, which includes climate information, was obtained from the geographic coordinates of each location (longitude and latitude in the World Geodetic System 84, WGS84) and retrieved from the Global Climate Data (hereafter, WorldClim) with a spatial resolution of 1 km² using the statistical environment R version 4.1.0 [3]. The anthropogenic pressure was described as the IMPRESS value, a standardized metric derived for the Water Framework Directive to assess the ecological health of European streams [4]. It evaluates multiple pressures including hydrological alterations, point and diffuse sources of pollution, and riparian landscape changes [4]. The third and last file, named 2_fish_information, contains the individual fish body size (length and weight) and the scientific Latin name of the fish species. The files 1_stream_information and 2_fish_information can be joined on the variable Code_ID, which is present in both files and consists of a numeric sequence of values from 1 to 118. In addition, we provide sensitivity analyses of the fish individual body size through mass-length relationships for each fish species. The mass-length relationships can be useful for other studies in the same region when direct measures of individual fish body mass, which are usually more time-consuming to take in the field, are not available. All data were curated, organized, and analysed in the statistical environment R version 4.1.0 [3].
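As an illustration of how the two data files relate, the following minimal sketch joins stream-level and fish-level records on Code_ID. It is written in Python with the standard library only and is not the authors' R workflow. Only the variable name Code_ID comes from the description above; the other column names (Basin, IMPRESS, Species, Length_mm, Mass_g) and all values are hypothetical placeholders, and the real names should be taken from 0_Data_Dictionary.

```python
import csv
import io

# Tiny in-memory stand-ins for the two tab-separated files. Only Code_ID is a
# column name taken from the text; every other column name and every value is
# a hypothetical placeholder, not data from the repository.
STREAM_TSV = (
    "Code_ID\tBasin\tIMPRESS\n"
    "1\tEbro\t2.5\n"
    "2\tLlobregat\t4.0\n"
)
FISH_TSV = (
    "Code_ID\tSpecies\tLength_mm\tMass_g\n"
    "1\tSpecies A\t182\t71.3\n"
    "1\tSpecies B\t120\t21.0\n"
    "2\tSpecies A\t95\t9.8\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on_code_id(stream_rows, fish_rows):
    """Attach the stream-level record to every fish sharing its Code_ID."""
    streams = {row["Code_ID"]: row for row in stream_rows}
    joined = []
    for fish in fish_rows:
        stream = streams[fish["Code_ID"]]  # KeyError flags an unmatched site
        joined.append({**stream, **fish})
    return joined


merged = join_on_code_id(read_tsv(STREAM_TSV), read_tsv(FISH_TSV))
for row in merged:
    print(row["Code_ID"], row["Basin"], row["Species"], row["Mass_g"])
```

With the real files, the in-memory strings would be replaced by the file contents opened with newline="" as the csv module recommends. The result has one row per fish, each carrying the environmental descriptors of its stream location.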

Data selection
Complete fish and stream data are available at https://www.chebro.es/ for the CHE and from the corresponding author upon reasonable request. We carried out an initial data screening to select comparable stream locations with robust data that largely represent the species composition and body size structure of each fish assemblage. To do this, we restricted the selection to streams sampled from May to October (both included) to avoid the transient effects of seasonal events and sudden increases in fish density following spring reproduction. We further retained only samples with at least 34 measured individuals to minimize statistical biases associated with low fish catches. The number of fish caught varied substantially among sites (median = 91; SD = 54.3). Where a location was sampled more than once per year within the seasonal range, we kept a single sample (the one with the most fish caught). In total, the selection comprised 118 stream reaches sampled between 2003 and 2009.
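The screening rules above can be sketched as a small filter. This is an illustrative Python example under stated assumptions, not the authors' code: the record keys (site, date, n_fish) and the example samples are hypothetical, and the sketch assumes that "one sample per year" is applied per stream location.

```python
from datetime import date

MIN_INDIVIDUALS = 34          # minimum number of measured fish stated in the text
SEASON_MONTHS = range(5, 11)  # May to October, both included


def select_samples(samples):
    """Keep, per site and year, the in-season sample with the most fish.

    `samples` is a list of dicts with the hypothetical keys
    'site', 'date' (datetime.date) and 'n_fish'.
    """
    best = {}
    for s in samples:
        if s["date"].month not in SEASON_MONTHS:
            continue  # outside the May-October window
        if s["n_fish"] < MIN_INDIVIDUALS:
            continue  # too few individuals measured
        key = (s["site"], s["date"].year)
        if key not in best or s["n_fish"] > best[key]["n_fish"]:
            best[key] = s
    return sorted(best.values(), key=lambda s: (s["site"], s["date"]))


# Invented example records, for illustration only.
samples = [
    {"site": "A", "date": date(2005, 4, 20), "n_fish": 120},  # out of season
    {"site": "A", "date": date(2005, 6, 10), "n_fish": 60},   # superseded
    {"site": "A", "date": date(2005, 9, 2), "n_fish": 91},    # kept
    {"site": "B", "date": date(2007, 7, 15), "n_fish": 20},   # too few fish
    {"site": "C", "date": date(2008, 10, 30), "n_fish": 34},  # kept (at threshold)
]
selected = select_samples(samples)
for s in selected:
    print(s["site"], s["date"].isoformat(), s["n_fish"])
```

Ties in catch size are resolved in favour of the first sample encountered, which is a design choice of this sketch rather than something the text specifies.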

Fish sampling
Fish sampling was carried out through one-pass electrofishing (from 2.5 to 4.5 kW and from 300 V to 800 V, pulsed DC current) with the help of operators holding dip nets to catch fish stunned by the electric field [5]. Each sampling location covered all mesohabitats (e.g., runs, riffles, pools), and the sampled stream length varied according to the stream width (from 20 m in small streams to 50 m of river margin [5]). Depending on the stream width, electrofishing was carried out by boat (usually in near-shore areas) in large rivers or by wading on foot in small streams [5]. The capture efficiency of the electrofishing gear used has been analysed elsewhere, and electrofishing is one of the most efficient methods for biomonitoring programs [6, 7]. Water conductivity was measured prior to electrofishing to determine the appropriate output voltage for effective sampling while minimizing unwanted fish mortality. Additionally, fish sampling was done when the water temperature was > 5 °C because fish catchability is low below that temperature [5]. Fishes stunned by electrofishing were anesthetized using MS222 (tricaine methanesulfonate), identified to species level, measured (fork length, mm), weighed (g), checked for DELT anomalies (Deformities, Eroded Fins, Lesions, and Tumors), and released back to the stream after recovery from MS222.

Local abiotic information
A one-liter water sample was taken at each stream location and transported in cool boxes to the laboratory. Water samples were immediately filtered in the laboratory through Whatman GF/F filters (0.7 μm pore size and 47 mm diameter) and stored frozen until nutrient analyses, following International Organization for Standardization (ISO) standards. For the concentration of total phosphates (mg · l⁻¹), a Continuous Flow Analysis (CFA) was conducted for each sample from each stream location following the UNE-EN ISO 6878:2005 [8]. For the total nitrates (mg · l⁻¹), a chemiluminescent technique for the determination of nanomolar quantities of nitrate, nitrate plus nitrite, or nitrite alone in the stream water was applied for each stream location following the UNE-EN ISO 11905-1:1998 [9]. Climate-related variables were represented by mean annual air temperature (°C), precipitation (mm), and altitude (m). They were calculated from the geographic coordinates of each location (that is, longitude and latitude in the World Geodetic System 84, WGS84) as the 20-year average at 1 km² spatial resolution from the WorldClim database [10]. The calculations of the climate-related variables were carried out using the statistical environment R version 4.1.0 [3].

IMPRESS values
The cumulative effects of anthropogenic pressures are represented by the IMPRESS values. IMPRESS is an analysis that identifies pressures and assesses impacts, derived under the Water Framework Directive (WFD) to assess the ecological health of European lotic habitats [4]. Specifically, the IMPRESS values encompass cumulative pressures related to the presence of contaminants, hydromorphological alterations, and land-use changes (see details and formulas for each pressure in [11, 12, 13]). A greater IMPRESS value means greater anthropogenic pressure, and thus failure to achieve the Directive's environmental objectives [4]. The European Commission, in the context of the WFD, developed a protocol explaining how to calculate the IMPRESS value using data on the presence of contaminants, hydromorphological alterations, and land use [4]. Each public agency then implemented this protocol using data it had collected itself [11, 12, 13]. The IMPRESS values for our dataset were obtained from these public agencies. CHE provided IMPRESS data for the Ebro basin and its tributaries, whereas ACA provided IMPRESS data for the Llobregat, Besós, Francolí, and Gaià streams.

Table 2. Information on the mass-length relationships of the stream fish species with more than 25 individuals. Descriptive and regression statistics of the linear regressions ($\log_{10} M = a + b \log_{10} L$) between fish individual mass ($M$) and length ($L$) for each fish species. Length is fork length, except for eel, mosquitofish, blenny, and catfish, where it is total length. Min. = minimum; max. = maximum; CI = confidence interval.

Data validation
To confirm the robustness of our stream fish dataset, we used mass-length relationships (MLs) for each fish species with more than 25 individuals (in total, 18 out of 24 fish species). We regressed log10-transformed fish mass on log10-transformed fish length for each focal species, and used the coefficient of determination (r²) as a measure of goodness of fit (Table 2). As measurements taken in the field may often contain errors in fish body size [14], we removed individuals whose residuals from the MLs on the log-log scale exceeded two standard deviations (in total, 405 outliers representing 3.19% of the 18 fish species selected for MLs).
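The validation step can be illustrated with a short, self-contained Python sketch of an ordinary least-squares fit of log10 M = a + b log10 L followed by residual-based outlier flagging. It is not the authors' R code. Two points are assumptions of the sketch: the threshold is applied to the absolute residual, and the standard deviation is the sample standard deviation of the residuals. The example data are synthetic, generated from M = 10⁻⁵ L³ with one deliberately corrupted record.

```python
import math


def fit_mass_length(lengths, masses):
    """OLS fit of log10(M) = a + b*log10(L); returns a, b, r2, residuals."""
    x = [math.log10(v) for v in lengths]
    y = [math.log10(v) for v in masses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    r2 = 1.0 - sum(r * r for r in residuals) / syy
    return a, b, r2, residuals


def flag_outliers(residuals, k=2.0):
    """True where |residual| exceeds k sample standard deviations."""
    n = len(residuals)
    mean = sum(residuals) / n  # close to zero for OLS with an intercept
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    return [abs(r) > k * sd for r in residuals]


# Synthetic data (not from the dataset): M = 1e-5 * L^3, one corrupted mass.
lengths = list(range(50, 210, 10))        # fork length in mm, 16 fish
masses = [1e-5 * L ** 3 for L in lengths]  # mass in g
masses[5] *= 3                             # simulate a recording error

a, b, r2, residuals = fit_mass_length(lengths, masses)
flags = flag_outliers(residuals)

# Refit on the records that were not flagged.
kept = [(L, M) for L, M, bad in zip(lengths, masses, flags) if not bad]
a_clean, b_clean, r2_clean, _ = fit_mass_length(*zip(*kept))
print(f"flagged: {sum(flags)}, a = {a_clean:.3f}, b = {b_clean:.3f}")
```

In practice the fit would be run separately for each species with more than 25 individuals, as described above.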

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that have or could be perceived to have influenced the work reported in this article.

Ethics Statement
The work did not involve the use of human subjects, animal experiments, or data collected from social media platforms. The fish assemblage was sampled by electrofishing following the European Committee for Standardization protocol [5]. Stunned fishes were collected with nets, placed at the shore in buckets (with aerators), identified to species level, measured, and immediately released at the same site.