A Standardized Abundance Index from Fishery-Independent Data: A Case Study of Swordfish (Xiphias gladius) from the Indonesian Tuna Longline Fishery

Most billfish caught in the Indian Ocean are either swordfish or Indo-Pacific sailfish. Swordfish is mostly taken as bycatch in tuna longline fisheries, except by the South African, Spanish and Portuguese fleets. Despite its importance, little is known about its abundance. Relative abundance indices are the input data for stock assessments, which provide useful information for decision making and fishery management. In this paper, a Generalized Linear Model (GLM) was used to standardize catch-per-unit-effort (CPUE) and to estimate relative abundance indices from the Indonesian longline dataset. The data were collected by scientific observers from August 2005 to December 2016. Conventional models for count data were used, but zero-inflated and hurdle models were also considered because of the high number of zero-catch-per-set records. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were applied to select the best model among those evaluated. Both AIC and BIC indicated that the simple negative binomial (NB) model was the best option. The standardized trends were relatively similar to the nominal series, but with smoother peaks. In general, there was a tendency toward a positive trend over the last decade, with the series varying throughout the period.


Introduction
Swordfish (Xiphias gladius) is a large oceanic apex predator that inhabits all the world's oceans. It is exploited worldwide, mainly in the Pacific Ocean, the Atlantic Ocean and the Mediterranean Sea [1]. Throughout the Indian Ocean, swordfish are caught primarily by longline fisheries; commercial catches were first recorded by Japanese fleets in the early 1950s as bycatch of their tuna longline fisheries [2]. From the 1990s, swordfish catches increased sharply to a peak of 35,000 t in 1998 [2], driven by the shift of Taiwanese longline fleets from targeting tunas to targeting swordfish, the growing number of longline operations by various nations (e.g., Indonesia, Australia, La Reunion, Seychelles and Mauritius), and the arrival of longline fleets from the Atlantic Ocean (e.g., Portugal, Spain and the United Kingdom). In recent years (2013-2017), Indonesian fleets were responsible for approximately 20% of the total swordfish catch in the Indian Ocean (~8,000 MT), followed by Taiwan (17%), Sri Lanka (12%) and Spain (12%) [2]. However, the Indonesian catch was later revised to just under 3,000 MT (9%) following a refined catch-estimation methodology provided by the IOTC Secretariat [3]. The revision is also consistent with the impact of Ministerial Regulations No. 56/2014 and No. 57/2014, which imposed a moratorium on foreign fishing vessels and prohibited transshipment at sea within Indonesian national jurisdiction, and which reduced the number of operating longline vessels from 584 in 2015 to 271 in 2016. Despite these substantial catches, swordfish is still considered bycatch in the Indonesian commercial tuna longline fishery [4].
The swordfish stock in the Indian Ocean is currently considered neither overfished nor subject to overfishing [2]. Nevertheless, the most recent catches (31,407 t in 2016) were at the Maximum Sustainable Yield (MSY) level (31,590 t). Hence, given the uncertainty in the Indonesian catch, catches in 2017 should not have exceeded the MSY level. Relative abundance indices (e.g., standardized catch-per-unit-effort, CPUE) provide important information on the status of fish stocks [5]. Furthermore, these indices are required to run simple assessment models and are also used as auxiliary data in more detailed stock assessment models [6]. Standardizing CPUE for species with low catches and a high proportion of zero catches, such as billfishes, requires more inclusive models that either analyze the proportion of zeros and the positive catch rates separately (i.e., delta distribution models) or use zero-inflated models [5]. In many cases, longline data are well suited to two-part (hurdle) models, because these offer more flexibility than single-part distributions, particularly for rare and non-target species [7].
Our analytical objective was to investigate whether data from a data-limited swordfish fishery can produce a reasonably robust relative abundance index that fills the "spatial gap" in the existing datasets used for CPUE standardization in the eastern Indian Ocean (e.g., the Japanese and Taiwanese longline datasets). We believe the results provide valuable information for assessing the status of swordfish in the Indian Ocean.

Data collection
This study analyzed data collected by Indonesian scientific observers aboard commercial tuna longline vessels based mainly at Benoa Fishing Port, Bali. The dataset included the number of fish caught by species, the total number of hooks, the number of hooks between floats (HBF), the start time of the set, the soak time, and the geographic position (latitude and longitude) at which the longline was deployed. The response variable in the models was the swordfish catch in numbers (N). Year and quarter were used as categorical (factor) explanatory variables. The following additional explanatory variables were used:
− Area: a categorical variable distinguishing catches within and outside the Indonesian EEZ; latitude and longitude were also used as additional quantitative variables;
− Start time of the set: a quantitative variable, rounded to the nearest integer;
− Soak time: the time elapsed between the start of setting the longline and the start of hauling, treated as a continuous variable and rounded to the nearest integer;
− Moon phase (29.5-day cycle): grouped into light and dark periods; the first and last quarters (demilunes), the waning gibbous and the full moon were assigned to the light period, and the new moon and crescents to the dark period [8];
− Hooks between floats: treated as a quantitative variable rather than a factor.
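The derived covariates listed above could be computed along the following lines. This is a minimal Python sketch, not the authors' processing code: the function names are hypothetical, start time is assumed to be rounded to the nearest hour, and the moon phase is approximated from the mean synodic month anchored to an assumed reference new moon (2000-01-06 18:14 UTC) rather than taken from an ephemeris. Treating the waxing gibbous as light is also our assumption, since the text names only the waning gibbous.

```python
from datetime import datetime, timezone

SYNODIC_MONTH = 29.530588853  # mean length of the lunar cycle, in days
# Assumed epoch: a known new moon at 2000-01-06 18:14 UTC.
REF_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def round_half_up(x):
    """Round a non-negative value to the nearest integer (0.5 rounds up)."""
    return int(x + 0.5)


def soak_time_hours(set_start, haul_start):
    """Hours from the start of setting to the start of hauling, rounded."""
    return round_half_up((haul_start - set_start).total_seconds() / 3600.0)


def start_set_hour(set_start):
    """Clock hour at which setting started, rounded to the nearest hour (0-23)."""
    return round_half_up(set_start.hour + set_start.minute / 60.0) % 24


def moon_period(t):
    """'light' or 'dark' from the mean lunar age (an approximation, not an ephemeris).

    `t` must be timezone-aware so it can be compared with REF_NEW_MOON.
    """
    age = ((t - REF_NEW_MOON).total_seconds() / 86400.0) % SYNODIC_MONTH
    # Eight named phases, each 1/8 of the cycle centred on its nominal age:
    # 0 new, 1 waxing crescent, 2 first quarter, 3 waxing gibbous,
    # 4 full, 5 waning gibbous, 6 last quarter, 7 waning crescent.
    phase = int((age / SYNODIC_MONTH + 1 / 16) * 8) % 8
    return "dark" if phase in (0, 1, 7) else "light"
```

Half-up rounding is used deliberately: Python's built-in `round` rounds halves to the nearest even integer, so a 12.5 h soak would become 12 rather than 13.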

CPUE standardization
We considered six GLMs with the nominal catch of swordfish (number of fish) as the response variable and fishing effort included as an offset. These were the Poisson (P) and negative binomial (NB) models, which we refer to as the standard models, and the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle (PH) and negative binomial hurdle (NBH) models.
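To make the six candidate distributions concrete, the sketch below writes out their probability mass functions for a single set, with the expected catch linked to effort through a log offset, mu = hooks * exp(eta). It is illustrative only: the names and parameter values are hypothetical, using log(hooks) as the offset is an assumption (the text does not say whether hooks or hooks/1000 was used), and the NB follows the mean/size (theta) parameterization used by MASS::glm.nb. The zero-inflated form mixes an extra point mass at zero with the count distribution, whereas the hurdle form models zero versus positive catch with a separate probability and uses a zero-truncated count distribution for positive sets.

```python
import math


def set_mean(eta, hooks):
    """Expected catch per set: log(mu) = eta + log(hooks) (log link with effort offset)."""
    return math.exp(eta + math.log(hooks))


def poisson_pmf(y, mu):
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def nb_pmf(y, mu, theta):
    """Negative binomial with mean mu and variance mu + mu**2 / theta."""
    return math.exp(
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zero_inflated(pmf, pi0):
    """ZIP / ZINB: with probability pi0 the set is an extra ('structural') zero."""
    def f(y, *params):
        p = (1 - pi0) * pmf(y, *params)
        return pi0 + p if y == 0 else p
    return f


def hurdle(pmf, p_pos):
    """PH / NBH: P(y > 0) = p_pos; positive catches follow the zero-truncated model."""
    def f(y, *params):
        if y == 0:
            return 1 - p_pos
        return p_pos * pmf(y, *params) / (1 - pmf(0, *params))
    return f


mu = set_mean(eta=-6.5, hooks=1300)  # hypothetical linear predictor and effort
models = {
    "P": lambda y: poisson_pmf(y, mu),
    "NB": lambda y: nb_pmf(y, mu, 0.5),
    "ZIP": lambda y: zero_inflated(poisson_pmf, 0.4)(y, mu),
    "ZINB": lambda y: zero_inflated(nb_pmf, 0.4)(y, mu, 0.5),
    "PH": lambda y: hurdle(poisson_pmf, 0.3)(y, mu),
    "NBH": lambda y: hurdle(nb_pmf, 0.3)(y, mu, 0.5),
}
for name, pmf in models.items():
    print(f"{name}: P(0) = {pmf(0):.3f}")
```

In a fitted model, eta, pi0, p_pos and theta would themselves depend on the explanatory variables; they are fixed constants here only to show the shape of each distribution.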
We applied a forward approach to select the explanatory variables and the order in which they entered the full model. First, simple models were fitted with one variable at a time, and the variable whose model had the lowest residual deviance was selected. Second, each remaining variable was added one at a time to the model containing the selected variable, and the model with the lowest residual deviance was again retained. This procedure was repeated until adding a new variable no longer reduced the residual deviance. Finally, all main effects and first-order interactions were examined, and a backward procedure based on the Akaike Information Criterion (AIC) [9] and the Bayesian Information Criterion (BIC) [10] was used to select the final model for each of the six approaches. AIC and BIC were also used to compare the six final models.
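The forward step can be sketched as a greedy loop over candidate terms. The sketch assumes a hypothetical callable `deviance(terms)` that refits the GLM with the given terms and returns its residual deviance. The `min_drop` threshold is our addition, because in practice residual deviance almost always decreases slightly when a term is added. The `aic` and `bic` helpers use the standard definitions AIC = 2k − 2 logLik and BIC = k ln(n) − 2 logLik, which would drive the backward step; in R, `stats::step` with `k = log(n)` applies the BIC-type penalty.

```python
import math


def forward_select(candidates, deviance, min_drop=0.0):
    """Greedy forward selection on residual deviance.

    `deviance(terms)` is a hypothetical callable that fits the GLM with the
    given tuple of terms (in entry order) and returns its residual deviance.
    """
    selected, remaining = [], list(candidates)
    current = deviance(tuple(selected))  # null model (intercept + offset)
    while remaining:
        # Try each remaining term on top of the terms already selected.
        trials = {v: deviance(tuple(selected + [v])) for v in remaining}
        best = min(trials, key=trials.get)
        if current - trials[best] <= min_drop:  # no (sufficient) decrease: stop
            break
        selected.append(best)
        remaining.remove(best)
        current = trials[best]
    return selected, current


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik
```

With a toy deviance table in which adding `Moon` after `Start_Set` and `Year` lowers the deviance by only 0.1, `min_drop=1.0` stops after two terms, whereas the default `min_drop=0.0` keeps adding terms as long as the deviance falls at all.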
Goodness of fit was assessed by comparing the observed frequency distribution of the number of fish caught per set with the frequency distribution predicted by each selected model. A Kolmogorov-Smirnov test was used to assess whether the difference between the observed and predicted distributions was significant. Maps were produced using QGIS version 2.14 [11], and statistical analyses were carried out in R version 3.5.3 [12] using the packages pscl [13], lsmeans [14], MASS [15], Hmisc [16] and statmod [17].
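A sketch of this goodness-of-fit comparison for the NB model follows. The predicted frequency of each count is the sum, over sets, of the model probability of that count given each set's fitted mean, and the Kolmogorov-Smirnov statistic D is the largest gap between the observed and predicted cumulative proportions. The function names and the pooling of the upper tail into the last bin are assumptions; the first element of the predicted frequencies corresponds to the number of predicted zero catches. Because catches are discrete counts, standard KS p-values are only approximate, so the sketch returns D alone.

```python
import math
from collections import Counter


def nb_pmf(y, mu, theta):
    """Negative binomial with mean mu and variance mu + mu**2 / theta."""
    return math.exp(
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def predicted_frequencies(means, theta, max_count):
    """Expected number of sets with 0, 1, ..., max_count fish (last bin pools the tail)."""
    freq = [sum(nb_pmf(y, mu, theta) for mu in means) for y in range(max_count)]
    freq.append(len(means) - sum(freq))
    return freq


def ks_statistic(observed, expected_freq):
    """Largest gap between observed and predicted cumulative proportions."""
    max_count = len(expected_freq) - 1
    obs = Counter(min(c, max_count) for c in observed)  # pool the observed tail too
    n_obs, n_exp = len(observed), sum(expected_freq)
    cum_o = cum_e = d = 0.0
    for y in range(max_count + 1):
        cum_o += obs[y] / n_obs
        cum_e += expected_freq[y] / n_exp
        d = max(d, abs(cum_o - cum_e))
    return d
```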

Descriptive Catch Statistics
RITF scientific observers recorded catch and operational data at sea aboard Indonesian commercial tuna longline vessels from 2005 to 2016. The dataset comprised 100 trips, 2,565 sets, 2,797 days at sea and more than 3.3 million hooks deployed (Table 1).
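Nominal CPUE, the series against which the standardized index is later compared, could be computed per year as total catch over total hooks, scaled to fish per 1000 hooks. The record layout (`year`, `hooks`, `swo`) and the function names are hypothetical, and the three records below are invented for illustration.

```python
from collections import defaultdict


def nominal_cpue_by_year(sets):
    """Nominal CPUE per year as fish per 1000 hooks: 1000 * sum(catch) / sum(hooks)."""
    catch, hooks = defaultdict(int), defaultdict(int)
    for s in sets:
        catch[s["year"]] += s["swo"]
        hooks[s["year"]] += s["hooks"]
    return {yr: 1000.0 * catch[yr] / hooks[yr] for yr in sorted(hooks)}


def zero_catch_proportion(sets):
    """Share of sets in which no swordfish was caught."""
    return sum(1 for s in sets if s["swo"] == 0) / len(sets)


sets = [  # hypothetical records: one dict per observed set
    {"year": 2005, "hooks": 1200, "swo": 0},
    {"year": 2005, "hooks": 1300, "swo": 2},
    {"year": 2006, "hooks": 1000, "swo": 1},
]
print(nominal_cpue_by_year(sets))  # {2005: 0.8, 2006: 1.0}
print(zero_catch_proportion(sets))
```

Taking the ratio of sums rather than the mean of per-set ratios is one common choice: it weights each set by its effort, so sets with few hooks do not dominate the annual value.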

Fig. 1.
The distribution of the Indonesian observer data used in this swordfish (SWO) CPUE standardization. Effort, measured as the total number of hooks, is shown on a 2x2 degree grid, with darker and lighter colors representing areas of more and less effort, respectively.
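The 2x2 degree effort map could be produced by binning each set's position into a grid cell and summing hooks per cell, as in the sketch below. Grid alignment on even degrees, the field names and the sample positions are assumptions for illustration.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, size=2.0):
    """South-west corner of the size x size degree cell containing (lat, lon)."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def hooks_by_cell(sets, size=2.0):
    """Total hooks deployed per grid cell (the quantity shaded in an effort map)."""
    total = defaultdict(int)
    for s in sets:
        total[grid_cell(s["lat"], s["lon"], size)] += s["hooks"]
    return dict(total)


sets = [  # hypothetical set positions (decimal degrees; south latitudes negative)
    {"lat": -9.5, "lon": 115.3, "hooks": 1200},
    {"lat": -8.1, "lon": 114.9, "hooks": 1400},
    {"lat": -12.2, "lon": 110.0, "hooks": 1000},
]
print(hooks_by_cell(sets))
```

Using `math.floor` rather than `int` matters for southern latitudes: `int(-4.75)` truncates toward zero to -4, which would put -9.5° S in the wrong cell.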

CPUE standardization
Table 2 shows the number of parameters (k), AIC, BIC, log-likelihood (logLik), number of predicted zero catches and Kolmogorov-Smirnov p-values for the six model structures (P, NB, ZIP, ZINB, HP and HNB). The log-likelihoods of the zero-inflated and, in particular, the hurdle models were high, but these models become considerably more complex as more parameters are involved. The database contained 1,823 zero catches. The negative binomial model was selected because it had the lowest AIC and BIC (4330.40 and 4599.47, respectively). Several explanatory variables tested in the SWO CPUE standardization were significant and explained a substantial part of the deviance. Some interactions were also significant and were therefore included in the final model. In the final model, the factor contributing most to the explained deviance was Start_Set, followed by Year, HBF, Moon, Quarter, Area, Long, Lat and the interactions (Table 3). For model validation, the residual analysis, including residuals plotted against fitted values, QQ plots and residual histograms, showed that the model was adequate, with no major outliers or trends in the residuals (Fig. 4).

The final standardized SWO CPUE index (N/1000 hooks) from Indonesian data in the eastern Indian Ocean between 2005 and 2016 is shown in Figure 5 and Table 4. The trends were relatively similar to the nominal series, but with smoother peaks. In general, there were no noticeable trends, with the series varying throughout the period.
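As a consistency check on the reported values, the definitions AIC = 2k − 2 logLik and BIC = k ln(n) − 2 logLik imply BIC − AIC = k(ln n − 2). Assuming that BIC was computed with n equal to the 2,565 observed sets (the text does not state n), the reported AIC and BIC for the NB model imply roughly k = 46 parameters and logLik ≈ −2119.2. This is a back-of-the-envelope check, not a value taken from Table 2; if the table lists a different k, BIC was probably computed with a different n.

```python
import math

AIC_NB, BIC_NB = 4330.40, 4599.47  # reported for the selected NB model
N_SETS = 2565  # assumption: BIC uses n = number of observed sets

# BIC - AIC = k * (ln n - 2)  =>  k = (BIC - AIC) / (ln n - 2)
k_implied = (BIC_NB - AIC_NB) / (math.log(N_SETS) - 2)
k = round(k_implied)
loglik = (2 * k - AIC_NB) / 2  # from AIC = 2k - 2 logLik

print(f"implied k = {k_implied:.2f} -> {k}, logLik = {loglik:.1f}")
```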

Discussion
As shown in this study, a simple negative binomial distribution can perform well for data containing many zero observations. Although the final model included many interactions among variables, overfitting appears unlikely. Overall, the series showed a positive trend, with high variability toward the end of the series. This variability is probably driven by the large number of zero-catch sets in the data; the nominal CPUE is therefore strongly influenced by fleets that probably did not target swordfish, which may bias the calculation. Several workarounds could reduce unwanted zero catches. For example, restricting the data to a core area [18], defined as 1x1 degree blocks with a minimum constant catch in at least 6 years (not necessarily consecutive), could reduce the proportion of zero catches from 93% to 58%. Other options include incorporating random effects in a Generalized Linear Mixed Model (GLMM) [19], using a more complex model such as the delta-lognormal [20], or applying smoothing in a zero-inflated negative binomial model [21]. Environmental factors could also be included; sea surface temperature (SST), for instance, is highly correlated with swordfish CPUE [22,23]. The high proportion of zero-catch sets during the observations may be related to the low-productivity areas of the southeastern Indian Ocean. High swordfish catch rates recorded by Spanish, Portuguese, Japanese and Taiwanese fleets are mostly concentrated in the western and northeastern Indian Ocean [19,23-25], whereas the waters south of Java, Bali and Nusa Tenggara are better known as a spawning ground of southern bluefin tuna [26]. Swordfish is a by-product of a tuna longline fishery targeting yellowfin, albacore and bigeye tuna, so a large proportion of sets (~80%) were deployed at dawn, whereas fisheries targeting swordfish mostly set at night [27], following the species' diel vertical migration [28].
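The core-area screening described above could be sketched as follows. We interpret "a minimum constant catch for at least 6 years" as at least one swordfish caught in a 1x1 degree block in at least six distinct years; that interpretation, the field names and the function names are assumptions, and reference [18] may define the criterion differently. All sets in a core block, including zero-catch sets, are kept, so the screening removes blocks that rarely produce swordfish rather than individual zero sets.

```python
import math
from collections import defaultdict


def cell_1x1(lat, lon):
    """South-west corner of the 1x1 degree block containing (lat, lon)."""
    return (math.floor(lat), math.floor(lon))


def core_cells(sets, min_years=6):
    """Blocks with swordfish caught in at least `min_years` distinct years (not necessarily consecutive)."""
    years = defaultdict(set)
    for s in sets:
        if s["swo"] > 0:
            years[cell_1x1(s["lat"], s["lon"])].add(s["year"])
    return {cell for cell, yrs in years.items() if len(yrs) >= min_years}


def restrict_to_core(sets, min_years=6):
    """Keep every set (zero-catch sets included) that falls inside a core block."""
    core = core_cells(sets, min_years)
    return [s for s in sets if cell_1x1(s["lat"], s["lon"]) in core]


def zero_share(sets):
    """Proportion of sets with zero swordfish catch."""
    return sum(s["swo"] == 0 for s in sets) / len(sets)
```

Comparing `zero_share(sets)` with `zero_share(restrict_to_core(sets))` gives the before/after proportion of zero catches that the screening is meant to improve.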
Overall, the model explained only ~20% of the variability; the interactions played a part but contributed only ~6% improvement. The start time of the set (Start_Set) was the main driving factor, substantially reducing the residual deviance, followed by Year, hooks between floats (HBF) and moon phase (Moon). In contrast, spatial variables such as latitude and longitude had little influence on swordfish catch rates. Few studies have confirmed an effect of setting time on the variability of swordfish CPUE, because most assessments rely on logbook data; Setyadji et al. [29] found that setting time was not influential and dropped it from the model, at least for black marlin (Istiompax indica). A plausible explanation is that, although most sets were deployed at dawn, the probability of catching swordfish was higher at night, albeit by a small margin (26% and 35% for dawn and night sets, respectively). HBF is mainly correlated with set depth, which reflects targeting strategy [30] and may vary spatially with different HBF configurations [31]. It is also an integral variable in CPUE standardization models when available [19,20]. With deep longline configurations, swordfish are more likely to be caught during night sets than during daylight sets (30% and 17%, respectively). Swordfish are mainly caught during the first and last quarters of the lunar cycle in longline fisheries [32] and during the new moon (dark period) in gillnet fisheries [8]. Soak time (Soak_Time) and quarter were also included but explained only a negligible portion of CPUE variability. Likewise, Carruthers et al. [27] found that neither soak time nor temperature at various depths affected swordfish catch rates; these variables were instead highly correlated with the catch of blue shark (Prionace glauca).
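The "share of variability explained" figures can be read as deviance-based pseudo-R² values. The sketch below computes the proportion of null deviance explained and the sequential contribution of each term. The numbers are illustrative only, not the study's values, and reading the ~6% interaction effect as percentage points of null deviance is our assumption.

```python
def deviance_explained(null_dev, resid_dev):
    """Share of the null deviance explained by a fitted GLM (a pseudo-R^2)."""
    return 1.0 - resid_dev / null_dev


def sequential_contributions(null_dev, steps):
    """Percent of null deviance removed by each term, in entry order.

    `steps` lists (term, residual deviance after adding that term); results
    depend on entry order, as in a sequential (Type I) analysis of deviance.
    """
    rows, prev = [], null_dev
    for term, dev in steps:
        rows.append((term, 100.0 * (prev - dev) / null_dev))
        prev = dev
    return rows


# Illustrative numbers only (not the study's values), chosen to mirror
# "~20% explained, of which ~6% from interactions".
null_dev = 1000.0
steps = [("Start_Set", 930.0), ("Year", 890.0), ("HBF", 870.0),
         ("Moon", 860.0), ("interactions", 800.0)]
print(deviance_explained(null_dev, steps[-1][1]))
for term, pct in sequential_contributions(null_dev, steps):
    print(f"{term}: {pct:.1f}%")
```

Because the contributions are sequential, a term entered later (such as the interactions) is credited only with deviance not already removed by earlier terms.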
The model fits very well, and on closer inspection the standardized trend is similar to the Japanese and Taiwanese indices for approximately the same period (2005-2015) and area (southeastern Indian Ocean) [19,20]. Therefore, to obtain a clear picture of the swordfish population in the Indian Ocean, particularly in the southeastern area, a joint CPUE initiative, as has been carried out for the yellowfin tuna and albacore fisheries [33,34], should be a priority.
Most catch and effort data are obtained from fishery-dependent sources such as logbooks. In this study, however, fishery-independent data (from scientific observers) were used instead, because the logbook system has not met the requirements for analysis, at least over the last two decades. Data from exploratory surveys or scientific observers are usually more reliable because they are collected by trained scientists or technicians. Nevertheless, the substantial costs involved could be a barrier to further development. Continuation of the program is necessary, with a focus on greater spatial coverage and a larger quantity of observed effort (at least 5% of total effort).
Both AIC and BIC indicated that the simple negative binomial (NB) model was the best for determining the abundance index for swordfish. Overall, the trends were relatively similar to the nominal series, but with smoother peaks. In addition, there was a tendency toward a slightly positive trend over the last decade, with the series varying throughout the period.
To obtain a more robust analysis given the constraints on spatial coverage, future work could explore other models, such as the delta-lognormal model, and more advanced data screening to reduce the proportion of zero-catch sets.