Assessing the performance of statistical classifiers to discriminate fish stocks using Fourier analysis of otolith shape

The assignment of individual fish to its stock of origin is important for reliable stock assessment and fisheries management. Otolith shape is commonly used as the marker of distinct stocks in discrimination studies. Our literature review showed that the application and comparison of alternative statistical classifiers to discriminate fish stocks based on otolith shape is limited. Therefore, we compared the performance of two traditional and four machine learning classifiers based on Fourier analysis of otolith shape using selected stocks of Atlantic cod (Gadus morhua) in the southern Baltic Sea and Atlantic herring (Clupea harengus) in the western Norwegian Sea, Skagerrak, and the southern Baltic Sea. Our results showed that the stocks can be successfully discriminated based on their otolith shapes. We observed significant differences in the accuracy obtained by the tested classifiers. For both species, support vector machines (SVM) resulted in the highest classification accuracy. These findings suggest that modern machine learning algorithms, like SVM, can help to improve the accuracy of fish stock discrimination systems based on the otolith shape.


Introduction
Discrimination of fish stocks is essential for reliable fisheries resource management and is currently an integral part of modern fish stock assessments (Begg et al. 1999). Many commercially exploited fish stocks show strong habitat overlaps, resulting in temporal mixing. Disregarding stock mixing, particularly when stocks differ in productivity, may lead to the overexploitation of unique spawning components (Kell et al. 2004; Kerr et al. 2017). Therefore, individuals from mixed-stock catches need to be assigned to their stock of origin using reliable stock discrimination methods with high classification accuracy (Cadrin et al. 2014).
One widely applied stock discrimination technique involves otoliths, which are calcium carbonate structures located in the inner ear of fishes (Campana and Casselman 1993). Otolith shape is mostly driven by a combination of environmental and genetic factors and contains stock-specific features, which are usable as a relevant marker of distinct stocks (Vieira et al. 2014; Berg et al. 2018). In recent years, diverse methods enabling the description of the otolith shape were developed and tested, such as curvature-based descriptors, wavelets, shape geodesics, or mirroring techniques (Parisi-Baradad et al. 2005; Nasreddine et al. 2009; Harbitz and Albert 2015). However, otolith outlines are still most frequently investigated with a mathematical scheme of Fourier decomposition, namely fast Fourier transform or elliptical Fourier analysis (Stransky 2014). Both fast Fourier transform and elliptical Fourier techniques decompose shape, which is a polygon of two-dimensional coordinates, into a spectrum of harmonically related trigonometric curves and calculate coefficients describing each of these curves (for details see Haines and Crampton 2000; Kuhl and Giardina 1982). Calculated coefficients may then be used as predictors for the discrimination of fish stocks in multivariate statistical analysis (Stransky 2014). However, once shape coefficients are extracted, little attention has been paid to applying and comparing the performance of alternative statistical systems to assign fish individuals to known groups (stocks or species) based on their otolith shape. Available classifiers arise from different fields, like statistics (e.g., linear discriminant analysis), artificial intelligence and data mining (e.g., decision trees), or connectionist approaches (e.g., neural networks) (Fernández-Delgado et al. 2014). Most machine learning (ML) algorithms are not yet part of the traditional statistical modeling; hence, their application in ecology is still scarce (Olden et al. 2008).
However, modern ML algorithms have a high potential to outperform traditional parametric classifiers in solving real-world classification problems (Fernández-Delgado et al. 2014). They are much more flexible than conventional models and are able to handle the nonlinear relationships and interacting elements that often characterize biological data (Guisan and Zimmermann 2000). Current computational capabilities and freely available statistical software allow relatively easy implementation of these modern algorithms, and they may be valuable in the development of fish stock discrimination routines. The advantages of ML applications have been already considered in other stock discrimination approaches, like in otolith chemistry (e.g., Mercier et al. 2011) or analysis of parasitological markers (e.g., Perdiguero-Alonso et al. 2008). These studies strongly suggest that current ML classifiers are already well suited to assign fish to stocks and that classification abilities are improved compared with traditional discriminant analysis.
Few studies used ML algorithms and Fourier analysis of otolith shape to discriminate fish stocks (e.g., Zhang et al. 2016; Mapp et al. 2017). However, these studies did not compare the ML performance with traditional classifiers like linear discriminant analysis. Only recently Jones and Checkley (2017) compared random forest with discriminant analysis to identify otoliths found in sediment cores and showed that the ML approach outperformed the traditional classifier. However, they applied these algorithms to distinguish between species (i.e., between higher taxonomic groups that naturally show stronger otolith shape differences than between fish stocks). To the best of our knowledge, no comprehensive comparison of traditional and modern ML classifiers to assign individuals to fish stocks has been conducted.
Here, we apply six statistical classifiers (two traditional: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and four ML classifiers: K-nearest neighbors (KNN), classification and regression trees (CART), random forest (RF), and support vector machines (SVM)) to discriminate stocks of two commercially exploited fish species, where Fourier analysis of otolith shape is required for accurate estimation of mixing ratios for a proper stock assessment: Atlantic cod (Gadus morhua) in the southern Baltic and Atlantic herring (Clupea harengus) in the northeastern Atlantic.
This paper aims to (i) conduct a systematic review of the available scientific literature focusing on statistical classifiers associated with Fourier analysis of otolith shape for discrimination purposes; (ii) investigate the otolith shape variability of cod and herring stocks by applying elliptical Fourier analysis; and (iii) assess the performance of traditional and recent ML classifiers to assign fish individuals to their group of origin based on their otolith shape.

Literature review of the use of statistical classifiers
Peer-reviewed literature was searched in the Web of Science Core Collection database using the keywords "otolith$" and "Fourier". Only English-language studies on otolith shape that applied Fourier analysis to discriminate fish groups at different biosystematics levels (ecotype, stock, population, species) were chosen for further investigation. Selected literature was reviewed to analyze which statistical classification algorithm was applied to discriminate different fish groups. Different types of algorithms based on the framework of Fisher discriminant analysis (Fisher 1936), including parametric and nonparametric extensions, were aggregated as one group (discriminant analysis). The list of 106 publications used in the review process is given in the online Supplementary materials (Table S1).

Atlantic cod (Gadus morhua)
Atlantic cod is one of the most important commercially exploited fish species across the North Atlantic Ocean, also inhabiting the brackish waters of the Baltic Sea. Here, Baltic cod is managed as two separate stocks: one western stock (ICES subdivisions (SDs) 22-24) and one eastern stock (SDs 24-32; ICES 2019a). The genetically distinct cod stocks coexist in the Arkona Basin (SD 24; Hemmer-Hansen et al. 2019; Weist et al. 2019), resulting in uncertainties in the stock assessment. Since the ICES benchmark in 2015, otoliths of cod from commercial samples from the mixing area have been assigned to their respective stock of origin using elliptic Fourier descriptors and LDA (ICES 2015, 2019b; Hüssy et al. 2016). For this study, we used otolith images of genetically validated Baltic cod samples (N = 507; Weist et al. 2019) from the mixing area (SD 24; Fig. 1) and from adjacent areas (Belt Sea (SD 22), Øresund (SD 23), and Bornholm Basin (SD 25)). The dataset consists of 52% western Baltic cod (WBC) and 48% eastern Baltic cod (EBC) (Table 1). For further details refer to Schade et al. (2019).

Atlantic herring (Clupea harengus)
Atlantic herring is a commercially exploited fish species in the northeastern Atlantic that has been a key species for stock discrimination studies (Geffen 2009). Herring stocks in this region consist of multiple spawning components. In this study, we analyzed otoliths from four distinct spawning components (  (WBSS), whereas CBNC is part of the central Baltic herring (CBH) stock. To ensure that distinct components were sampled, we only used herring sampled in spawning condition. Further, only herring of ages 5-6 were selected to reduce age effects on shape variability (Libungan et al. 2015). Herring were mainly collected during scientific surveys, except for GB and some samples of CSS that were caught by local fishers using gillnets.

Otolith shape analysis
For cod and herring, shape images of clean and unbroken sagittal otoliths were used. Images of the right otolith were preferred; otherwise, the image of the left otolith was flipped. There are no differences between left and right otoliths for cod (Campana and Casselman 1993; Cardinale et al. 2004) and herring (Libungan et al. 2015). High-resolution images were binarized using the threshold function of the GNU Image Manipulation program (Natterer and Neumann 2008).
For the shape analysis, outlines were automatically obtained from converted images using the Momocs package (Bonhomme et al. 2014) in the R environment (R Core Team 2018). Elliptical Fourier analysis proposed by Kuhl and Giardina (1982) was used to quantify otolith outlines. This technique decomposes a two-dimensional shape into a sum of harmonics, where each harmonic is described by four coefficients (two for x-axis and two for y-axis coordinates). Precision of approximate reconstruction of shapes increases with the number of harmonics used, but it is recommended to reduce the number of harmonics for multivariate analysis. To define the appropriate number of harmonics, 100 otoliths were randomly sampled from the whole set, and the Fourier power ($PF_n$) spectrum and cumulated Fourier power ($PF_c$) were calculated with the following formulas: $PF_n = \frac{A_n^2 + B_n^2 + C_n^2 + D_n^2}{2}$ and $PF_c = \sum_{i=1}^{n} PF_i$, where $A_n$, $B_n$, $C_n$, and $D_n$ are the coefficients of the nth harmonic (Lord et al. 2012). The number of harmonics needed to reach 99% of the cumulated Fourier power of 30 harmonics was chosen to summarize shapes of otoliths (Stransky et al. 2008b; Vieira et al. 2014). The first three coefficients were taken as fixed values ($A_1 = 1$; $B_1 = C_1 = 0$) to normalize otoliths for size, orientation, and starting point (Tracey et al. 2006). Mean otolith shapes of different stock components were calculated by inverse transformation of Fourier coefficients. Overall variance in the shape of otoliths was assessed with principal component analysis (PCA) integrated with morphospaces (theoretical shapes were reconstructed based on the PCA scores; Bonhomme et al. 2014).
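The harmonic-selection rule can be illustrated with a short sketch. It is written in Python for illustration only (the study used R and Momocs) and assumes the usual definition of harmonic power as half the sum of the four squared coefficients. The function names and the example coefficients are hypothetical.

```python
def fourier_power(coeffs):
    """Fourier power of each harmonic.

    coeffs: list of (A_n, B_n, C_n, D_n) tuples, one tuple per harmonic.
    Assumes power = (A^2 + B^2 + C^2 + D^2) / 2.
    """
    return [(a * a + b * b + c * c + d * d) / 2.0 for a, b, c, d in coeffs]


def harmonics_for_threshold(coeffs, threshold=0.99):
    """Smallest number of harmonics whose cumulated power reaches
    `threshold` of the total power of all supplied harmonics."""
    power = fourier_power(coeffs)
    total = sum(power)
    cumulated = 0.0
    for n, p in enumerate(power, start=1):
        cumulated += p
        if cumulated >= threshold * total:
            return n
    return len(power)


# Hypothetical coefficients for 30 harmonics, decaying with harmonic rank.
example = [(1.0 / n ** 2, 0.0, 0.0, 0.5 / n ** 2) for n in range(1, 31)]
n_needed = harmonics_for_threshold(example, threshold=0.99)
```

With real otolith outlines the power decays more slowly than in this toy example, so more harmonics are typically needed to reach the same threshold.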

Statistical classifiers
Among the six selected algorithms, LDA and QDA were chosen as two of the most popular classifiers, widely implemented in otolith-based fish stock and species discrimination (e.g., Paul et al. 2013; Zhang et al. 2013). They are applied to predict the affiliation of observations from two or more known classes. Both classifiers use the combination of several characters that provides the strongest separation of classes by maximizing the variance between groups relative to the variance within groups (Fisher 1936).
The KNN algorithm is one of the simplest ML classifiers and can be applied to both binary and multiclass problems (Hall et al. 2008). In the first step, it selects the nearest neighbors of an observation and then determines the class of that observation from the classes of these neighbors.
One of the advantages of KNN is its higher tolerance of the data structure (Hastie et al. 2009).
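The two KNN steps can be sketched as follows. This is a minimal Python illustration, not the caret implementation used in the study; it assumes Euclidean distance and an unweighted majority vote, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training observation.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    ]
    # Step 1: keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 2: the most frequent label among them is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-descriptor data for two stocks.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["WBC", "WBC", "WBC", "EBC", "EBC", "EBC"]
prediction = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
```

An odd k avoids tied votes in two-class problems such as the cod example.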
Similarly, CART, a nonparametric procedure, requires no assumptions about the distribution of the data. These models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree (Loh 2011).
RF is an ensemble technique, based on a set of CARTs, where a bootstrap approach is implemented to select a random set of observations and variables used to construct each tree in the ensemble. Finally, the decisions of all trees on the allocation of an object are aggregated, and the majority vote provides the final class prediction (Breiman 2001).
SVM was selected among the broad range of ML approaches because of its ability to deal with high-dimensional datasets and its flexibility in modeling diverse data sources (Ben-Hur et al. 2008). This technique uses kernel functions to project the predictive variables into feature space with more dimensions than the initial space of the input data, allowing the construction of linear models (Cortes and Vapnik 1995).
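As an illustration of a kernel function, the radial basis function (RBF) kernel used for the SVM in this study can be written in a few lines. The sketch below is in Python rather than R and assumes the parameterization documented for kernlab, k(x, y) = exp(-sigma * ||x - y||^2); the function names are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel value exp(-sigma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * squared_distance)


def kernel_matrix(points, sigma=1.0):
    """Pairwise kernel values; the SVM works on this matrix instead of
    on explicit coordinates in the higher-dimensional feature space."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]


# Hypothetical scaled descriptors of three otoliths.
gram = kernel_matrix([(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)], sigma=0.5)
```

Larger values of sigma make the kernel more local, so the decision boundary can follow the training data more closely; this is why sigma is tuned together with the cost parameter C.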

Statistical analysis
All predictors (Fourier coefficients) were examined for normality with graphical tools (Zuur et al. 2010). None of the variables showed significant deviation from normal distribution. For each fish species, differences in total fish length between stock components were tested and found to be significant using one-way ANOVA (Tukey HSD, p < 0.001). To test allometric effects of fish length on shape coefficients, we conducted analyses of covariance (ANCOVAs). Stock component of origin was included in the model as a fixed factor and fish length as a covariate. If the interaction between the fixed factor and the covariate was significant, the variable was excluded from the dataset; otherwise, shape coefficients with a significant fish length effect were standardized using the common slope for all stock components (Zhuang et al. 2014).
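The length standardization can be sketched as below. This is a Python illustration, not the authors' R code. It assumes that the common slope is the pooled within-group regression slope and that each coefficient is adjusted to the overall mean length, neither of which is specified in the text. All names and data are hypothetical.

```python
def common_slope(lengths, values, groups):
    """Pooled within-group regression slope of `values` on `lengths`."""
    sums = {}
    for length, value, group in zip(lengths, values, groups):
        n, sum_l, sum_v = sums.get(group, (0, 0.0, 0.0))
        sums[group] = (n + 1, sum_l + length, sum_v + value)
    means = {g: (sl / n, sv / n) for g, (n, sl, sv) in sums.items()}
    sxy = sxx = 0.0
    for length, value, group in zip(lengths, values, groups):
        mean_l, mean_v = means[group]
        sxy += (length - mean_l) * (value - mean_v)
        sxx += (length - mean_l) ** 2
    return sxy / sxx


def length_standardize(lengths, values, groups):
    """Remove the length effect: value - slope * (length - mean length)."""
    slope = common_slope(lengths, values, groups)
    overall_mean = sum(lengths) / len(lengths)
    return [v - slope * (l - overall_mean) for l, v in zip(lengths, values)]


# Hypothetical descriptor with the same length effect in two components.
lengths = [30, 40, 50, 30, 40, 50]
groups = ["A", "A", "A", "B", "B", "B"]
values = [0.4, 0.5, 0.6, 0.6, 0.7, 0.8]
adjusted = length_standardize(lengths, values, groups)
```

After adjustment, the toy descriptor is constant within each component (0.5 for A and 0.7 for B), so only the between-component difference remains.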
Classification and regression training package caret (Kuhn 2008) for R was used to compare performances of selected classifiers. The package allows for different algorithms to be trained in a consistent environment and to conduct the tuning of the ML parameters. All predictor variables were scaled and centered in a preprocessing stage. Optimal hyperparameters of KNN (k), CART (cp), RF (mtry), and SVM (σ and C) were defined during preliminary tuning (Figs. S1 and S2). Following Mercier et al. (2011) and Zhang et al. (2016), a fourfold cross-validation resampling method was used to provide the data for the assessment of the performance of each classifier. This validation method is advised as a reasonably stable and low-bias measure of model performance (Hastie et al. 2009), but typically indicates lower accuracy of the evaluated algorithms than the most often applied leave-one-out cross-validation. Datasets were randomly split into four equal subsets with preservation of class ratios, where three subsets (75% of observations) were used as training data to classify the remaining subset (25% of observations). Validation was repeated for each of the four splits. Additionally, 100 repetitions of the whole process were conducted using a bootstrap approach with independent resampling (Hastie et al. 2009). Confusion (error) matrices (e.g., Kuhn 2008; Perdiguero-Alonso et al. 2008) were generated, and classification accuracy (the percentage of fish correctly assigned to their actual class) was calculated as a measure of classifier quality. To assess the influence of the number of Fourier harmonics used for the shape representation on classification accuracy, we conducted each cross-validation procedure (400 repetitions) on datasets produced with from 2 to n harmonics, where n is the number of harmonics that reach 99% of cumulated Fourier power.
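The stratified fourfold split and the accuracy measure can be sketched as follows. This is a simplified Python illustration of a single repetition, not the caret workflow used in the study; the nearest-mean classifier merely stands in for the six classifiers, and all names and data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=4, seed=0):
    """Split observation indices into k folds that preserve class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validated_accuracy(features, labels, fit_predict, k=4, seed=0):
    """Percentage of held-out observations assigned to their true class."""
    correct = 0
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
    return 100.0 * correct / len(labels)


def nearest_mean(train_x, train_y, test_x):
    """Stand-in classifier: assign each value to the closest class mean."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y in zip(train_x, train_y):
        sums[y][0] += x
        sums[y][1] += 1
    means = {y: total / count for y, (total, count) in sums.items()}
    return [min(means, key=lambda y: abs(means[y] - x)) for x in test_x]


# Hypothetical one-descriptor dataset with two well-separated classes.
features = [0.1 * i for i in range(8)] + [10 + 0.1 * i for i in range(8)]
labels = ["A"] * 8 + ["B"] * 8
accuracy = cross_validated_accuracy(features, labels, nearest_mean, k=4)
```

Repeating the call with different seeds and averaging the results corresponds to the repeated resampling described above.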
When the number of variables was lower than the specified optimal hyperparameter mtry for RF, the default mtry was applied, which equals the square root of the number of variables. Moreover, to assess the influence of the number of classes on the performance of classifiers, the herring dataset was split into two-class subsets and similar cross-validation was run for each pair of spawning components. The algorithms were developed in parallel, using the same training and test sets. Therefore, paired t tests with adjusted p values to control false discovery rates (Benjamini and Hochberg 1995) were used to test differences in accuracies of classifiers in relation to the dataset with the n number of Fourier harmonics. The importance of Fourier descriptors was calculated with the varImp function of the caret package and was visualized in decreasing order using mean importance for all models. All of the models were built using the following R packages: LDA and QDA with MASS (Ripley et al. 2015), KNN with caret (Kuhn 2008), CART with rpart, RF with randomForest (Liaw and Wiener 2002), and SVM based on the radial basis function (RBF) kernel with kernlab (Karatzoglou et al. 2015).
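The false discovery rate adjustment applied to the pairwise tests can be sketched as follows. This is a Python illustration of the standard Benjamini-Hochberg step-up adjustment, not the R code used in the study; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p values from four pairwise classifier comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

A comparison is then declared significant when its adjusted p value falls below the chosen level.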

Literature review of the use of statistical classifiers
Among 106 selected papers published in the period from 1990 to 2018 that incorporate Fourier analysis as the method for otolith shape description, the framework of Fisher discriminant analysis (DA) was the most popular statistical approach. Studies that applied only DA constituted ~92%, while one study (<1%) used DA and RF in parallel (Jones and Checkley 2017). The remaining ~7% of the publications applied classifiers other than DA to assign samples to their respective class (e.g., SVM or KNN classifier (Reig-Bolaño et al. 2010b; Benzinou et al. 2013), boundary-based shape classification (Nasreddine et al. 2009), between-class correspondence analysis (Ponton 2006), or RF (e.g., Zhang et al. 2016)).
Application of more than one classifier in the same analysis was scarce (~8% of papers). Comprehensive comparison of accuracy of nine ML algorithms was done by Mapp et al. (2017), including naive Bayes, Bayesian networks, logistic regression, HyperPipes, C4.5, RF, KNN, SVM, and rotation forest. Jones and Checkley (2017) showed that RF algorithms outperformed DA in terms of accuracy. Torres et al. (2000) reported that QDA was superior to LDA, while Finn et al. (1997) found no differences between LDA and QDA models. SVM performed better than KNN in terms of correct classification rate, but the latter classifier resulted in more stable performances across the classes and has been chosen for discrimination of fish based on otolith shape in Benzinou et al. (2013).

Otolith shape variability
Precision of approximate reconstruction of shapes increased with the number of harmonics used (Fig. 2). For both species, 13 harmonics were needed to achieve 99% of cumulative Fourier power summarizing the otolith shapes. Consequently, the first 13 harmonics were used in further analyses. Owing to the significant interaction between stock components and fish total length in the ANCOVA models (p < 0.001), six and 12 Fourier descriptors were excluded from cod and herring data, respectively. A further 23 (cod) and 29 (herring) descriptors were corrected for the fish length effect using a common slope.
Visual inspection of mean otolith shape identified differences between cod stocks and herring components (Fig. S3). Among cod stocks, WBC had wider otoliths than EBC. Otoliths of NSS and CBNC herring were generally wider than those of CSS and GB herring, whose mean otolith shapes were very similar.
For cod, the first two PCA axes explained 72.6% of the overall variance in the shape of otoliths (Fig. 3a). The two cod stocks were mainly separated along the first axis, even though a strong overlap was observed. For herring, 66.3% of the overall variance was explained by the first two axes (Fig. 3b).

Classification accuracy
The classification accuracy of cod otoliths increased with increasing number of harmonics but stayed relatively constant for six and more harmonics (Fig. 4a). One exception is QDA, where the accuracy slightly decreased with a higher number of harmonics. In comparison, the accuracy continued to increase for herring otoliths with increasing number of harmonics (Fig. 4b).
The accuracy differed significantly between classifiers, except for QDA and KNN for cod otoliths as well as LDA and KNN for herring (Table 2). For both species, SVM resulted in the highest classification accuracy (Fig. 4), even when herring data were sequentially split into two-class subsets (Fig. S4). LDA resulted in slightly but significantly lower accuracy for cod (Fig. 4a; Table 2).
The fourfold cross-validation using SVM (best classifier) and 13 harmonics (accounting for 99% variance of the otolith shape) resulted in an accuracy of 79.54% for cod and 74.13% for herring (Table 3). For cod, the misclassification rate was equal in both stocks (~10%). For herring, the highest misclassification occurred between GB and CSS herring (~7%). Misclassification among the other herring components was low (<1%).
The relative importance of individual Fourier descriptors was consistent among statistical classifiers for both species (Fig. 5), except for CART. CART and RF both rely on the importance of only a few descriptors (about eight or fewer), while the other classifiers rely on the importance of a higher number of Fourier descriptors.

Discussion
The presented review of the literature showed that the application and comparison of alternative classifiers to discriminate fish groups based on their otolith shape is limited. In this study, stock-specific differences in otolith shapes for cod and herring could be detected, which enables the assignment of individual fish to its respective stock of origin. Moreover, a comparison of different statistical classifiers suggested that ML algorithms, in particular SVM, can improve the accuracy in stock discrimination approaches using the shape of otoliths.

Literature review of the use of statistical classifiers
The literature review emphasized that traditional DA was used in most of the studies for the classification of fish groups based on the elliptical Fourier descriptors of otolith shape, while application of alternative classifiers was less common. For example, Zhang et al. (2016) used RF to discriminate stocks of the Japanese Spanish mackerel (Scomberomorus niphonius) based on Fourier descriptors of otolith shapes, but no comparison with other classifiers was reported. Mapp et al. (2017) used nine ML algorithms for fish stock separation of two clupeid species using otolith shapes. However, the study of Mapp et al. (2017) was not focused on the absolute classification accuracy, but on the applicability of morphometric approaches that incorporate size information. No comparison with traditional classifiers, like LDA, was made in Mapp et al. (2017), while Jones and Checkley (2017) showed that RF algorithms were superior to DA during classification of fish individuals into different taxonomic groups based on the morphological descriptors and elemental compositions of otoliths.
Studies comparing more than one statistical classification algorithm indicated that the success of fish classification can be significantly improved by alternative classifiers (Torres et al. 2000). These findings stress the need for the comparison of different classifiers (i.e., different approaches should be explored so that the best method is used to achieve the best possible assignment). More accurate assignment of individual fish allows for more robust estimation of the contribution of different fish stocks within the mixing areas (i.e., a mixed stock scenario; Hüssy et al. 2016). Accurate estimates of mixing levels can help to understand how movement and mixing affect stock dynamics and provide the quantitative basis for annual stock assessments and scientific advice (Horbowy 2005; Taylor et al. 2011).

Otolith shape variability
Our results support the previous studies showing that Baltic cod stocks can be successfully discriminated based on the elliptical Fourier analysis of otolith outlines (Paul et al. 2013; Hüssy et al. 2016). Significant differences in otolith shape were also reported for other stocks and spawning populations of cod (e.g., the Northeast Arctic and Norwegian coastal cod (Stransky et al. 2008a), Faroe Plateau cod (Cardinale et al. 2004) or Icelandic cod (Petursdottir et al. 2006)). Mean shapes reconstructed on the calculated Fourier descriptors indicated that the otolith outline of WBC and EBC differ in the large-scale shape characteristics (mainly length-width relationship), where otoliths from the western stock are wider and rounder than those from the eastern stock, which is in line with previous observations (Paul et al. 2013; Hüssy et al. 2016). Differences in circularity and rectangularity of otoliths were also reported in other cod stocks (Campana and Casselman 1993; Cardinale et al. 2004).
Similarly, discrimination methods based on the analysis of otolith outlines were applied to separate populations of herring in the northern Atlantic (e.g., Burke et al. 2008; Libungan et al. 2015). Our study revealed differences in otolith shape between herring components. Most of the differences were based on the relationships between the length and width of the whole otolith. NSS and CBNC have wider otoliths, but the rostrum of NSS herring otoliths is clearly longer. Confusion matrices of the cross-validated models (Table 3) indicated that a relatively large number of individuals from the CSS and GB were misassigned, suggesting similarity in otolith shape. This result supports the current assessment approach, where both spawning components are considered as one stock (WBSS) because of the high level of overlap (ICES 2018b). Although selected herring spawning components were discriminated with a high level of accuracy, further studies need to include other stock components in this region, such as the autumn spawners and the southern component of CBH (ICES 2018a).
The differences in the shape of fish otoliths, for both fish species, may be associated with a combination of environmental and genetic drivers (Cardinale et al. 2004; Vignon and Morat 2010). To explore how these factors influence otolith shape, we need to perform further analyses, including experimental and laboratory studies with appropriate control of the potentially confounding variables (e.g., Berg et al. 2018). However, even without the mechanistic understanding of the sources of shape variability, these results support the applicability of Fourier analysis of otolith shape in stock discrimination routines and assessment of fish stocks (Cadrin et al. 2014). The use of otoliths as indicators of stock identity has been previously advocated because otoliths are routinely collected for aging in traditional fish monitoring, providing a robust and cost-effective method for stock discrimination (Campana and Casselman 1993; Cardinale et al. 2004).

Assessment of statistical classifiers
There were significant differences in accuracy among the six statistical classifiers tested. The highest accuracy of fish classification was achieved by SVM, one of the rapidly developing ML classifiers. Accuracy of the SVM model trained on cod data was only 0.9% higher than of the second-best performing classifier (LDA), but differences were significant. However, the accuracy of the SVM trained on herring data was 7% to 20% higher than the other classifiers. Good performance of the SVM algorithm, as well as other ML algorithms, has been previously shown in discrimination studies of stocks, species, or higher taxonomic levels of fishes based on their otolith shapes (Reig-Bolaño et al. 2010a; Benzinou et al. 2013; Zhang et al. 2016; Mapp et al. 2017).
Fig. 5. Variable (Fourier descriptors) relative importance obtained for cod (a) and herring (b) from otolith shape classification models. Refer to Fig. 4 for definitions of acronyms. [Colour online.]
These findings suggest that ML algorithms are a good alternative to traditional classifiers and can help to improve the accuracy of routine fish stock discrimination using the shape of the otolith. Although SVM achieved the highest accuracy in this study, we strongly advise to test a range of statistical classifiers in discrimination studies, because the selection of the best-performing algorithm can be case-specific and depends, for example, on the number of classes, similarity between groups, or type and number of variables in the dataset (Fernández-Delgado et al. 2014).
Caution is warranted, however. The proposed benchmark of different statistical classifiers should be conducted only in systems with well-defined units. The ability of ML classifiers to find structures and clusters in the data needs to be considered with caution. Application of the ML algorithms for the discrimination of fish groups, where training baselines are not validated (e.g., by genetics or by sampling spawning individuals in their respective spawning area), may potentially lead to confusing results and recognition of subgroups, which may not represent the real biological or management units. The practical problems of managing natural resources with poorly defined units continue to be an important issue (Geffen 2009). For these reasons, the definition of robust baselines for the training of classification algorithms is a crucial point in the development of operational discrimination systems (Cadrin et al. 2014; Hüssy et al. 2016; Schade et al. 2019).

Study limitations and future implications
In this study, a simple approach was applied, using only Fourier descriptors of otolith shapes as predictors of fish stock affiliation. The focus was exclusively on the differences of statistical classifier accuracies on the length-normalized descriptors of otolith shape (Hüssy et al. 2016). However, incorporation of other potentially informative variables, such as shape indices or routinely collected information on length-at-age and sex of individual fish can further improve the predictive abilities of classification algorithms (Burke et al. 2008;Mapp et al. 2017). Further, alternatives to reconstruct the otolith shape like wavelet transformation or curvature scale space representation should be reconsidered. Fourier descriptors focusing on periodic phenomena (Harbitz and Albert 2015) might be more suited for cod otoliths that are almost elliptical. For more complex otolith shapes with very localized landmarks, like herring otoliths, wavelet transformation could be better-suited (Sadighzadeh et al. 2014). Besides otolith shapes, ML algorithms were already used successfully in other stock discrimination fields (e.g., population genetics (Guinand et al. 2002), otolith microchemistry (Mercier et al. 2011), hydroacoustics (Robotham et al. 2010), or parasitology (Perdiguero-Alonso et al. 2008)), even though the application is still rare.
In our study, the analysis of the Fourier power spectrum indicated that 13 harmonics were needed to explain 99% of the variance in otolith shape for both cod and herring. Interestingly, high accuracy for the cod assignment was already obtained with only five to six harmonics, suggesting that additional higher-frequency harmonics add little information for the discrimination of these stocks. These results are in line with the analysis of variable importance, which showed that lower-rank descriptors (D5, D1, describing the global form of otoliths) were the most powerful predictors in all models. The broadly applied practice of including only a fixed subset of harmonics (e.g., the first N harmonics needed to describe 99% of shape variance) may not be optimal in the context of classification model performance. For fish species with simple otolith shapes, a reduced number of Fourier harmonics may be advantageous. Conversely, including a larger number of harmonics in classification systems developed for species with more complex otolith structures, like herring, can help to achieve better classification models. In our study, a steady improvement of model accuracy with an increasing number of harmonics was observed for SVM and RF trained on the herring dataset. As dimensionality increases, the ML algorithms clearly outperform traditional classifiers because they can integrate information from many variables without a high risk of overfitting (Breiman 2001; Ben-Hur et al. 2008). The accuracy of ML models can also be improved by eliminating noninformative variables during model building (e.g., Smoliński 2019). Furthermore, heterogeneous ensemble techniques combining predictions of different model types could also be applied to improve the classification of fish stocks. Such an approach could help to minimize model-specific errors in class predictions and to obtain a more robust assignment of fish origin.
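As a rough illustration of the 99% criterion discussed above, the following sketch finds the smallest number of leading harmonics whose cumulative Fourier power reaches a given share of the total. It assumes elliptic Fourier coefficients (a_n, b_n, c_n, d_n) per harmonic and the common definition of harmonic power as half the sum of the squared coefficients. The function names and the example coefficients are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple

# Assumption: each harmonic is described by four elliptic Fourier
# coefficients (a_n, b_n, c_n, d_n).
Harmonic = Tuple[float, float, float, float]


def harmonic_power(coeffs: Sequence[Harmonic]) -> List[float]:
    """Power of each harmonic: half the sum of its squared coefficients."""
    return [(a * a + b * b + c * c + d * d) / 2.0 for a, b, c, d in coeffs]


def harmonics_needed(coeffs: Sequence[Harmonic], threshold: float = 0.99) -> int:
    """Smallest number of leading harmonics whose cumulative share of
    the total power reaches `threshold`."""
    power = harmonic_power(coeffs)
    total = sum(power)
    if total <= 0:
        raise ValueError("total power must be positive")
    cumulative = 0.0
    for n, p in enumerate(power, start=1):
        cumulative += p
        if cumulative / total >= threshold:
            return n
    # Fallback for rounding effects when threshold is close to 1.0.
    return len(power)


# Hypothetical coefficients (not data from the study): amplitudes decay
# with harmonic order, as is typical for smooth outlines.
example = [(1.0 / n**2, 0.0, 0.0, 0.5 / n**2) for n in range(1, 21)]
n_keep = harmonics_needed(example, threshold=0.99)
```

A threshold chosen this way describes shape reconstruction only. As noted above, the number of harmonics that maximizes classification accuracy may be smaller or larger, so it would be reasonable to treat the harmonic count as a tuning parameter evaluated by cross-validation.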
The ability of SVM and other ML algorithms to model complex and nonlinear patterns without strict distributional assumptions is of great importance in many biological applications (Noble 2006). Variable transformations are therefore not needed for these algorithms, which makes preprocessing more straightforward and faster. Moreover, variables with non-normal distributions (normality is typically required by traditional parametric models) do not need to be excluded after an unsuccessful transformation, which prevents the loss of information potentially valuable for the discrimination of fish groups (Mercier et al. 2011).
Operationalizing the stock discrimination methods under development will require thorough analyses of the temporal variability of within- and between-group differences, particularly in otolith shapes. The presented results are based on samples collected within a short period of time, which limits the influence of year classes and long-term environmental effects on otolith shape. However, if the temporal stability of otolith shapes can be confirmed for particular stocks, it may enable the continuous enlargement of databases. As a consequence, better performance of ML algorithms can be achieved, because their classification accuracy typically improves with increasing size of the training dataset.

Conclusions
Our study emphasized the potential for applying novel ML algorithms to improve the accuracy of classification systems based on the otolith shape of fish. We recommend conducting comparisons of different statistical classifiers in systems with well-identified stock structures using validated baselines. When temporal mixing of different fish stocks or stock components occurs, as with Baltic cod and herring in the Northeast Atlantic, possible improvements of stock discrimination processes by modern classifiers may be of great importance. More accurate assignment of individual fish may help to estimate more precisely the contribution of different fish stocks within the mixing areas and, in consequence, provide a more reliable quantitative basis for annual stock assessments and scientific advice.

FMS was partly funded by the European Maritime and Fisheries Fund (EMFF) of the European Union (EU) under the Data Collection Framework (DCF, Regulation 2017/1004 of the European Parliament and of the Council). FB was funded by the Research Council Norway project 254774 (GENSINC).