A reconstructed database of historic bluefin tuna captures in the Gibraltar Strait and Western Mediterranean



Abstract
This data paper presents a reconstruction of a small but consistent database of historical capture records of bluefin tuna (Thunnus thynnus; BFT hereafter) from the Gibraltar Strait and Western Mediterranean (Portugal, Spain and Italy). The compilation comes from diverse historical and documentary sources and spans the interval from 1525 to 1936, covering a period of 412 years. There is a total of 3074 data points, which represent 67.83% of the potential records, implying 32.17% of missing data. However, we have reconstructed the captures only for the interval 1700-1936, and only for 9 out of 11 series, due to the scarcity and inhomogeneity of the two oldest capture time series. This reconstructed database provides an invaluable opportunity for fisheries and marine research as well as for multidisciplinary research on climate change.
This database provides an invaluable opportunity for fisheries and marine research (e.g., resources management) as well as for multidisciplinary research on climate change.
These datasets will be beneficial for understanding bluefin tuna population dynamics and their relationship with different environmental variables.

Data
The historical BFT captures span the interval from 1525 to 1936, covering a period of 412 years (Fig. 1). There is a total of 3074 data points, which represent 67.83% of the potential records, implying 32.17% of missing data (Fig. 2). Data were manually digitized from diverse documentary and historical sources as well as from some "recent" publications [1-9]. Moreover, the database was double-checked by the investigators for potential typographical errors. In addition, we have compared our compilations, visually and quantitatively (as far as possible), with previous works [2,4,6-10]. After a preliminary inspection, we decided to limit our data reconstructions to the interval from 1700 to 1936, due to the scarcity and inhomogeneity of the two oldest capture time series (Conil and Zahara; Fig. 1 in [9]). As a consequence of these drawbacks, we removed Conil and Zahara from our data reconstructions (Fig. 3).
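The completeness figures quoted above can be verified arithmetically, assuming (our reading, not stated explicitly in the text) that the percentages refer to the 11 series times the 412-year span of potential annual records:

```python
# Sanity check of the completeness percentages quoted in the text.
# Assumption: the denominator is 11 series x 412 years of potential annual records.
n_series = 11
n_years = 1936 - 1525 + 1          # 412 years
potential = n_series * n_years     # 4532 potential annual records
present = 3074                     # records actually compiled

pct_present = round(100 * present / potential, 2)
pct_missing = round(100 * (1 - present / potential), 2)
print(pct_present, pct_missing)    # -> 67.83 32.17
```

Both percentages match the values reported in the abstract and in this section, which supports that reading of the denominator.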

Experimental design, materials and methods
We have reconstructed the missing data using the Data INterpolating Empirical Orthogonal Functions (DINEOF) technique [11-13], as implemented in the R package sinkr [14]. This statistical reconstruction technique is based on the decomposition of the time series into Empirical Orthogonal Functions (EOFs), and it was first applied to fisheries by [13]. DINEOF is a self-consistent method for reconstructing missing values in geophysical data (e.g., oceanographic, meteorological) [15]. The method relies on the fact that an optimal number of EOFs, usually very small compared to the total number of EOFs, retains a large fraction of the total variance of the whole dataset. DINEOF fills the missing data by means of an iterative process [12,13]: 1) the leading EOF is computed; 2) the leading EOF is used to estimate the anomalies at the missing points; 3) this process is iterated until the anomalies at the missing values change, from one iteration to the next, by less than a prescribed tolerance; 4) once convergence is reached, the number of computed EOFs is increased, from 1 to 2 and so on up to k_max EOFs; and 5) the missing data are thus reconstructed using 1, 2, ..., k_max EOFs. The optimum number of EOFs to use in the reconstructions is usually determined by cross-validation [16]. However, in this data paper we have used the maximum number of EOFs, which corresponds to the number of reconstructed time series (i.e., 9 series).
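The reconstruction itself was performed with the R package sinkr, as stated above. Purely as an illustration of the iterative EOF-filling loop described in steps 1)-5), the following is a minimal Python sketch (the function name and all simplifications are ours: it omits the anomaly centring and the cross-validation step of the full DINEOF procedure):

```python
import numpy as np

def dineof_fill(X, k_max, tol=1e-6, max_iter=500):
    """Simplified DINEOF-style gap filling for a (years x series) matrix X.

    Missing entries (NaN) are initialised with their series means, then
    repeatedly replaced by a truncated-EOF (rank-k SVD) reconstruction
    until the change at the missing points falls below `tol`, increasing
    k from 1 up to k_max as in the iterative scheme described in the text.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    # Step 0: initial guess for the missing entries (series means).
    filled = np.where(miss, np.nanmean(X, axis=0), X)
    for k in range(1, k_max + 1):            # step 4: grow the number of EOFs
        for _ in range(max_iter):
            # Steps 1-2: compute EOFs and estimate values at missing points.
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            recon = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k reconstruction
            # Step 3: iterate until the missing-value estimates converge.
            delta = np.abs(recon[miss] - filled[miss]).max()
            filled[miss] = recon[miss]
            if delta < tol:
                break
    return filled
```

For example, removing one entry from an exactly rank-1 matrix and filling with k_max=1 recovers the deleted value to within the tolerance, which is the behaviour the iterative scheme is designed to achieve.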