Elsevier

Ecological Informatics

Volume 25, January 2015, Pages 35-42

Evaluating machine-learning techniques for recruitment forecasting of seven North East Atlantic fish species

https://doi.org/10.1016/j.ecoinf.2014.11.004

Highlights

  • Fish recruitment forecasting is crucial for fisheries management; however, data are sparse and difficult to gather.

  • A series of machine learning methods are presented and compared.

  • A series of performance metrics and statistical validation methods are presented and used.

  • Probabilistic classification methods are shown to be adequate for dealing with the uncertainty in forecasting fish recruitment.

  • In particular, the flexible naive Bayes is used and tested on real-world data for fisheries management purposes.

Abstract

The effect of different factors (spawning biomass, environmental conditions) on recruitment is a subject of great importance in the management of fisheries, recovery plans and scenario exploration. In this study, recently proposed supervised classification techniques, tested by the machine-learning community, are applied to forecast the recruitment of seven fish species of the North East Atlantic (anchovy, sardine, mackerel, horse mackerel, hake, blue whiting and albacore), using spawning, environmental and climatic data. In addition, the use of the probabilistic flexible naive Bayes classifier (FNBC) is proposed as a modelling approach in order to reduce uncertainty for fisheries management purposes. These improvements aim to produce better probability estimates of each possible outcome (low, medium and high recruitment) based on kernel density estimation, which is crucial for informed management decision making under high uncertainty. Finally, a comparison between goodness-of-fit and generalization power is provided, in order to assess the reliability of the final forecasting models. It is found that in most cases the proposed methodology provides useful information for management, whereas the case of horse mackerel is an example of the limitations of the approach. The proposed improvements allow for a better probabilistic estimation of the different scenarios, i.e. they reduce the uncertainty in the provided forecasts.
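As an illustration of the idea behind a flexible naive Bayes classifier, the following minimal sketch estimates each class-conditional feature density with a Gaussian kernel density estimate and returns a probability for each recruitment level. It is not the authors' implementation: the class name `FlexibleNaiveBayes`, the bandwidth heuristic `1 / sqrt(n_c)`, and the toy data (two standardised, hypothetical predictors) are all assumptions made for the example.

```python
import math


def kernel_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x, averaged over the training samples."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return norm * total / len(samples)


class FlexibleNaiveBayes:
    """Naive Bayes with one kernel density estimate per class and per feature."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.n_features_ = len(X[0])
        self.priors_, self.samples_, self.bandwidths_ = {}, {}, {}
        for c in self.classes_:
            rows = [row for row, label in zip(X, y) if label == c]
            self.priors_[c] = len(rows) / len(y)
            # Store the observed values of each feature for this class.
            self.samples_[c] = [[row[j] for row in rows] for j in range(self.n_features_)]
            # Assumed bandwidth heuristic: shrinks as the class gets more data.
            self.bandwidths_[c] = 1.0 / math.sqrt(len(rows))
        return self

    def predict_proba(self, x):
        """Return {class: probability} for one observation x."""
        log_scores = {}
        for c in self.classes_:
            score = math.log(self.priors_[c])
            for j in range(self.n_features_):
                density = kernel_density(x[j], self.samples_[c][j], self.bandwidths_[c])
                score += math.log(max(density, 1e-300))  # guard against log(0)
            log_scores[c] = score
        # Normalise in log space for numerical stability.
        top = max(log_scores.values())
        unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
        total = sum(unnorm.values())
        return {c: v / total for c, v in unnorm.items()}


# Hypothetical, standardised toy data: two predictors, three recruitment levels.
X = [[-1.2, -0.9], [-1.0, -1.1], [-0.8, -1.0],
     [0.0, 0.1], [0.1, -0.1], [-0.1, 0.0],
     [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

model = FlexibleNaiveBayes().fit(X, y)
probs = model.predict_proba([1.0, 1.0])
print(probs)
```

The output is a full probability distribution over the three recruitment levels rather than a single label, which is the property the abstract highlights as useful for decision making under uncertainty.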

Introduction

Early on in fisheries research, recruitment was identified as a key element in management. As a result, recruitment and the factors determining it have been the subject of intense research (e.g. Cushing, 1971, Myers et al., 1995, Ricker, 1954, Rothschild, 2000). Such research has evolved from considering only the biomass of spawners to also including environmental factors that can modulate recruitment (e.g. Planque and Buffaz, 2008, Schirripa and Colbert, 2006). From a data analysis perspective, the main limitation to achieving good forecasts is the sparse and ‘noisy’ nature of the available data (Fernandes et al., 2010, Francis, 2006).

A further problem is that data about some of the factors that may control recruitment directly (e.g. food availability, larval growth) may be more laborious to obtain than the recruitment estimate itself (Irigoien et al., 2009, Zarauz et al., 2008, Zarauz et al., 2009). Based on a simplified approach, fisheries management has been moving towards the use of environmental relationships based on oceanographic data, which are collected routinely, as proxies of recruitment conditions (Bartolino et al., 2008, Borja et al., 2008, De Oliveira et al., 2005). Nevertheless, the problem remains difficult because the mechanisms behind such relationships are often poorly understood; this, in turn, makes it difficult to determine the robustness of the forecast estimates, leading to the failure of some proposed relationships, methods and performance estimations when new data become available (Myers et al., 1995). Such failures may be related to new controls that were not considered previously (Myers et al., 1995, Planque and Buffaz, 2008), or to limitations in the available data (Schirripa and Colbert, 2006).

Recruitment forecasting is a problem of high uncertainty (Mäntyniemi et al., in press). Machine-learning techniques have been proposed as an appropriate approach with some desirable properties to address such problems (Dreyfus-León and Chen, 2007, Dreyfus-León and Schweigert, 2008, Fernandes et al., 2010, Fernandes et al., 2013, Uusitalo, 2007). In this study, an update of a previously proposed machine-learning based framework (Fernandes et al., 2010) is applied to several North Atlantic species of commercial interest, which share a spawning and nursing environment at the shelf break (Ibaibarriaga et al., 2007, Sagarminaga and Arrizabalaga, 2010). The main properties of this methodology are: (i) forecasts with their uncertainty estimated; (ii) forecasts and scenarios that are easy to interpret; (iii) recruitment and factor boundaries that can be interpreted easily; (iv) high stability of the selected factors, using a ‘leaving one out’ schema; (v) error balanced across all recruitment levels; and (vi) robust, as well as honest, performance estimation.

Within this context, this work has three aims: (i) to identify factors for forecasting the recruitment of North Atlantic species that share a spawning and nursing area; (ii) to propose a novel model to modify the previous framework in order to produce more accurate probabilistic forecasts; and (iii) to provide a comparison between goodness-of-fit and generalization power, in order to assess the reliability of the final forecasting models. This comparison is necessary since the methods used are non-parametric and might over-fit the data. The three objectives are crucial to produce reliable forecasts that can be used for decision making in the fisheries management of those species that share a spawning and nursing area.
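The gap between goodness-of-fit and generalization power can be illustrated with a small sketch. The example below uses a 1-nearest-neighbour classifier as a stand-in non-parametric method (it is not one of the classifiers evaluated in the paper) and compares accuracy on the training data itself (resubstitution) with leave-one-out accuracy. The function names and the one-dimensional toy data are hypothetical.

```python
def nearest_neighbour_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def resubstitution_accuracy(X, y):
    """Goodness-of-fit: evaluate on the same data used for training."""
    hits = sum(nearest_neighbour_predict(X, y, x) == label for x, label in zip(X, y))
    return hits / len(y)


def leave_one_out_accuracy(X, y):
    """Generalization estimate: each case is predicted by a model that never saw it."""
    hits = 0
    for i in range(len(y)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_neighbour_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(y)


# Hypothetical noisy toy series: one predictor, two recruitment levels.
X = [[0.10], [0.22], [0.30], [0.50], [0.55], [0.70], [0.82], [0.95]]
y = ["low", "low", "high", "low", "high", "low", "high", "high"]

fit_acc = resubstitution_accuracy(X, y)
loo_acc = leave_one_out_accuracy(X, y)
print(f"goodness-of-fit accuracy: {fit_acc:.2f}")
print(f"leave-one-out accuracy:   {loo_acc:.2f}")
```

With short, noisy series such as this one, a flexible model can reproduce the training data perfectly while performing much worse on held-out cases, which is why both figures need to be reported side by side.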

Target species

The species recruitment time series analysed for the North East Atlantic that share the shelf break as spawning and nursing area are summarized below: 1) The anchovy recruitment mixed time-series (ARM) is a combination of two anchovy recruitment time-series; the long anchovy recruitment index time-series (ARI; Borja et al., 1996) established from the percentage of age 1 in the landings (40 years) and the Anchovy Recruitment (AR; ICES, 2008a; 23 years). The resulting time-series contains 45 years

Pipeline comparison

Missing-value imputation can also be applied to the ‘NBC-Pipeline’; however, no significant improvement was observed. This result was expected, since the NBC can be learned with missing data and no factor had a high proportion of missing values.

Both the NB and FNB classifiers show a good fit for most of the considered species (Fig. 1). The ‘MIS + FNB-Pipeline’ produces the best fit for the seven species (Table 2). The most interesting property of this fitting for fisheries management is

Discussion

The main contribution of this work is the application of the methodology developed in Fernandes et al. (2010) to a broad set of species using a global set of variables. The forecast estimates for each species can be improved by applying more specific knowledge (e.g. more specific environmental data) to each species. However, the results show that, even using a global approach, useful information can be obtained using machine learning techniques applied to the recruitment forecasting problem. The

Acknowledgements

The research of Jose A. Fernandes and Nerea Goikoetxea is supported by a Doctoral Fellowship from the Fundación Centros Tecnológicos Iñaki Goenaga. This study has been supported by the following projects: Ecoanchoa (funded by the Department of Agriculture, Fisheries and Food of the Basque Country Government); the Saiotek and Research Groups 2007–2012 (IT-242-07) programs (Basque Government), TIN2008-06815-C02-01 (Spanish Ministry of Education and Science); COMBIOMED network in computational

References (48)

  • A. Borja et al.

    Climate, oceanography, and recruitment: the Bay of Biscay anchovy paradigm

    Fish. Oceanogr.

    (2008)
  • G.W. Brier

    Verification of forecasts expressed in terms of probability

    Mon. Weather Rev.

    (1950)
  • D. Cushing

    The dependence of recruitment on parent stock in different groups of fishes

    ICES J. Mar. Sci.

    (1971)
  • T. Delavallade et al.

    Using entropy to impute missing data in a classification task

  • R.O. Duda et al.

    Pattern Classification and Scene Analysis

    (1973)
  • B. Efron

    Bootstrap methods: another look at the jackknife

    Ann. Stat.

    (1979)
  • U.M. Fayyad et al.

    Multi-interval discretization of continuous valued attributes for classification learning

  • R.I.C. Francis

    Measuring the strength of environment-recruitment relationships: the importance of including predictor screening within cross-validations

    ICES J. Mar. Sci.

    (2006)
  • S. García et al.

    An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons

    J. Mach. Learn. Res.

    (2008)
  • L. Ibaibarriaga et al.

    Egg and larval distributions of seven species in north-east Atlantic waters

    Fish. Oceanogr.

    (2007)
  • L. Ibaibarriaga et al.

    A two-stage biomass dynamic model for Bay of Biscay anchovy: a Bayesian approach

    ICES J. Mar. Sci.

    (2008)
  • ICCAT
  • ICES

    Report of the ICES/GLOBEC Workshop on Long-term Variability in SW Europe (WKLTVSWE), February 13–16, Lisbon, Portugal

    (2007)
  • ICES

    Report of the Working Group on the Anchovy, ICES Headquarters, June 13–16

    (2008)