Fasciola gigantica, F. hepatica and Fasciola intermediate forms: geometric morphometrics and an artificial neural network to help morphological identification

Background Fasciola hepatica and F. gigantica cause fascioliasis in both humans and livestock. Some adult specimens of Fasciola sp. referred to as “intermediate forms” based on their genetic traits, are also frequently reported. Simple morphological criteria are unreliable for their specific identification. In previous studies, promising phenotypic identification scores were obtained using morphometrics based on linear measurements (distances, angles, curves) between anatomical features. Such an approach is commonly termed “traditional” morphometrics, as opposed to “modern” morphometrics, which is based on the coordinates of anatomical points. Methods Here, we explored the possible improvements that modern methods of morphometrics, including landmark-based and outline-based approaches, could bring to solving the problem of the non-molecular identification of these parasites. F. gigantica and Fasciola intermediate forms suitable for morphometric characterization were selected from Thai strains following their molecular identification. Specimens of F. hepatica were obtained from the Liverpool School of Tropical Medicine (UK). Using these three taxa, we tested the taxonomic signal embedded in traditional linear measurements versus the coordinates of anatomical points (landmark- and outline-based approaches). Various statistical techniques of validated reclassification were used, based on either the shortest Mahalanobis distance, the maximum likelihood, or the artificial neural network method. Results Our results revealed that both traditional and modern morphometric approaches can help in the morphological identification of Fasciola sp. We showed that the accuracy of the traditional approach could be improved by selecting a subset of characters among the most contributive ones. The influence of size on discrimination by shape was much more important in traditional than in modern analyses. In our study, the modern approach provided different results according to the type of data: satisfactory when using pseudolandmarks (outlines), less satisfactory when using landmarks. The different reclassification methods provided approximately similar scores, with a special mention to the neural network, which allowed improvements in accuracy by combining data from both morphometric approaches. Conclusion We conclude that morphometrics, whether traditional or modern, represent a valuable tool to assist in Fasciola species recognition. The general level of accuracy is comparable among the various methods, but their demands on skills and time differ. Based on the outline method, our study could provide the first description of the shape differences between species, highlighting the more globular contours of the intermediate forms.

286 shape variables (log-shape ratios, LSR). The influence of size on interspecific discrimination by 287 LSR was estimated by regressing on log-size the discriminant factors. 288 289 Outline data (OD) 290 The contour of each specimen was manually digitized (see Fig. 1A), and the resulting coordinates 291 were subjected to an elliptic Fourier analysis. The "outline data" (OD) used as the input for 292 validated classification were the normalized elliptic Fourier coefficients (NEF). Their numbers 293 were restricted to an average set giving a power of at least 99% (n = 30). Their first eight principal 294 components were used directly as input for the validated classification tests. An estimate of the 295 allometric residue contained in shape variables was performed by regressing the principal 296 components of the NEF on the square root of the area delimited by the contours. The non-297 parametric estimation of the statistical significance of this multivariate regression followed the 298 procedure of Good (2000). On the other hand, the influence of the global size on interspecific 299 discrimination itself was estimated after regressing the first discriminant factor on size. For the 300 artificial neural network technique (ANN, see below), the OD were also combined with different 301 numbers (4, 6, 8, 10, and 12) of LN. 302 303 Landmarks data (LD) 304 The following anatomical landmarks (LM) were digitized: LM1 = left-side of the cephalic cone, 305 LM2 = left-side of the oral sucker, LM3 = right-side of the oral sucker, LM4 = right-side of the 306 cephalic cone, and LM5 = end of the testis (see Fig. 1B). The five landmarks were subjected to a 307 generalized Procrustes analysis, and the resulting aligned individuals were orthogonally projected 308 onto the Euclidean space tangent to the consensus of forms (Rohlf & Slice, 1990). The principal PeerJ reviewing PDF | (2019:08:40125:2:0:NEW 7 Jan 2020) 309 components (PC) of these data (called "Procrustes residuals", "tangent space variables", or 310 "orthogonal projections") were referred to here as "landmarks data" (LD) and used as the input for 311 validated classification. The influence of size on interspecific discrimination by LD was estimated 312 by regressing the discriminant factors on the centroid size. 313 314 Measurement error due to imaging and digitizing 315 The precision of our digitization was evaluated for both landmarks and outlines. For size we used 316 the method recommended by Arnqvist & Mårtensson (1998), and for shape we used the Procrustes 317 analysis of variance (Procrustes ANOVA) (Klingenberg and McIntyre, 1998). 318 319 Validated classification 320 To assess how our results could be generalized to an independent data set, each individual was 321 iteratively removed and its assignation computed following the analysis of the remaining data (the 322 "leave-one-out" method, also known as "validated reclassification") (see Supplementary data 1). 323 The analyses of the remaining data used three different statistical tests: a distance-based (Dist) 324 method, a maximum likelihood (MxL) method, and an ANN method. 325 326 In the first method (Dist), the Mahalanobis distance was computed between the testing set (one 327 individual) and the average of each training group (the remaining data), and the tested individual 328 was assigned to the closest group. In the second method (MxL), the testing set (one individual) was 329 assigned to the group it was thought most likely to belong to assuming a normal distribution and 330 independence between characters (Polly & Head, 2004;Dujardin et al., 2017). For the Dist and 331 MxL methods we tried to avoid using a number of input variables that would exceed the number PeerJ reviewing PDF | (2019:08:40125:2:0:NEW 7 Jan 2020) Manuscript to be reviewed 332 of individuals in the smallest sample. There are various methods to deal with multidimensionality 333 and we selected the simplest one: the set of first PCs lower than the number of individuals in the 334 smallest sample. Note that for the analyses related to the use of five anatomical landmarks, the 335 total number of final shape variables was actually six (two-times the number of LM minus 4 in the 336 case of two-dimensional analyses).  Soda, Slice & Naylor, 2017). We used the multilayer perceptron program written 344 in JavaScript, which is available at https://www.npmjs.com/package/mlp. 345 346 Following a process of trial-and-error, a single (hidden) layer composed of three neurons provided 347 the best results, irrespective of the type of input data. We used as the input the total number of 348 variables instead of a subset of their PCs. Inputs comprised the morphometric variables (LN, LSR, 349 OD, or LD, see above), while the final outputs were the probabilities of inclusion in each group 350 (in a Bayesian sense). Forms were then classified to the species for which their probability of 351 inclusion was the greatest (Soda et al., 2017). The "weights", i.e., the coefficients allowing the 352 outputs to be computed, were applied to the individuals being tested to compute categories and 353 suggest a classification. Thus, each individual was tentatively assigned to its putative species after 354 computing the weights from the remaining data. For a single individual to identify, the remaining PeerJ reviewing PDF | (2019:08:40125:2:0:NEW 7 Jan 2020) Manuscript to be reviewed 355 data were subjected to a large number of iterations to reduce the classification error to a minimum 356 (less than 25%). All specimens were separately classified 30 times, then an average classification 357 and its standard error were computed (see Supplementary data 1).

359
To make the most of the less stringent statistical assumptions of the ANN method with regard to 360 the statistical distribution of the input data (Ripley, 1996), we explored its capacity to deal with 361 different metrics combined into a single matrix. Thus, we built a matrix with four LN and made 362 another (normalized) matrix combining them with the OD. We performed a validated 363 reclassification on each matrix and then compared the scores with the correct identification. We

RESULTS
368 External morphological classification 369 The rough identification of our 90 field specimens did not detect the presence of Fasciola 370 intermediate forms despite them being present, as determined by the molecular analyses (see next 371 paragraph). Moreover, the rough identification wrongly assigned 30% of the specimens as F. 372 hepatica species, despite the complete absence of this species, as revealed by the molecular data 373 (Table 1). Manuscript to be reviewed 378 patterns that enable the three Fasciola species to be distinguished (Fig. S2, S3). The results showed 379 that 74 worms were classified as F. gigantica and 16 worms were Fasciola intermediate forms. F. 380 hepatica was not present in the samples collected in the field in Thailand (Table 1).  (Table 2).

388 Traditional morphometrics (multivariate analyses)
389 Based on the complete set of 20 linear measurements, the validated classification produced 390 excellent scores of correct species attribution, ranging from 89% to 92% according to the 391 reclassification method (Table 3, column "20LN"). In order to tentatively reduce the number of 392 characters, we selected a subset of contributing variables. We did not select them according to 393 univariate statistical differences between groups but according to their contribution to the first and 394 second principal components. This selection allowed us to restrict the set of LN from 20 to 12 395 variables (BL, BW, BP, CL, Vsmin, A-VS, OS-VS, VS-Vit, Vit-P, VS-P, TL, and TW). The 396 exclusion of poorly contributing variables was beneficial and substantially improved the validated 397 reclassification score from 88% to 97% (Table 3, column "12LN"). The shape variables derived 398 from LN (either LSR from 20LN or LSR from 12LN) produced very similar scores as log-399 transformed raw variables (Table 3).    (Fig. 3A). The factor maps of the discriminant analyses 422 (Fig. 3B) showed non-overlapping distributions. Accordingly, satisfactory validated classification 423 scores were obtained, ranging from 89% to 94% (Table 3, column "OD").
PeerJ reviewing PDF | (2019:08:40125:2:0:NEW 7 Jan 2020) Manuscript to be reviewed 424 Allometry 425 The contribution of global size variation to the first shape-derived discriminant factor derived from 426 OD was 23%, while it was 6% for the second one. According to the non-parametric method used 427 to evaluate the multivariate regression significance between size and shape, the allometric residue 428 contained in OD shape variables was significant (p < 0.001).
429 430 Landmark-based analyses 431 The mean specimens aligned according to the five anatomical landmarks used here showed some 432 separation between the three forms, with a seemingly more critical difference for Fasciola 433 intermediate forms (Fig. 4A), as for the OD (Fig. 3). The factor map of the discriminant analysis 434 showed that Fasciola intermediate forms could diverge from F. gigantica and F. hepatica (Fig.   435 4B). However, some overlapping was observed, and the validated classification scores based on 436 landmark data were not satisfactory, ranging from 58% to a maximum of 69% (Table 3,   Manuscript to be reviewed 446 The first and second discriminant factors derived from the Procrustes residuals were still under the 447 influence of size (42% and 4%, respectively) after regression on centroid size, which was more 448 than for the outline-based analysis (23% and 6%, respectively) but much less than for the 449 traditional method (61% and 63%, respectively).  (Table 7). For instance, with ten traditional measurements as the 456 input, the ANN performance achieved 88% of correct identification, with OD only it reached 89% 457 (Table 3), whereas by combining both types of data the total score jumped to 94% (Table 7).  Manuscript to be reviewed 468 To reduce the effects of growth on the phenotype, only fully mature stages were selected from 469 each molecularly defined group. However, in addition to possible growth heterogeneity, other 470 divergent influences could have affected phenotypic variations in our total sample, such as 471 different geographic origins, microhabitats, host, and seasonality. We considered that, within each 472 species or group, these environmental effects would probably have had much more of an impact 473 on size than on shape. We used the geometric morphometrics approach to tentatively set apart 474 these size effects from the morphometric data, extracting the remaining shape variables and using 475 them exclusively to describe the groups distinguished by molecular data. The allometric residue 476 that possibly remained in the shape variables was not removed because our comparisons concerned 477 species rather than intraspecific groups. Species may differ by their allometric growth, which 478 becomes part of their evolutionary divergence (Klingenberg, 2016).  495 This procedure allowed multidimensionality to be reduced where necessary; this did not make our 496 samples more representative, but at least the information they provided was not over-interpreted.

498
Since linear measurements can provide information about both size and shape, and because of the 499 relatively small size of Fasciola intermediate forms (Fig. 2), it was expected that this sample could 500 be identified using traditional techniques. Nevertheless, linear measurements did not just reflect 501 global size variation since they could also allow an excellent score of recognition for the pair F. 502 hepatica and F. gigantica, which are parasites of similar global size (Fig. 2, Table 4). The 503 extraction of shape (LSR) did not produce different results, confirming the relatively poor 504 efficiency of shape removal from our linear measurements. With or without shape extraction, 505 traditional morphometrics appeared to be a powerful means of species determination. It does, 506 however, require an excellent knowledge of the parasite's internal anatomy.  Ashrafi et al., 2006). By considering the contribution of each variable to their first 511 two PC, we were able to reduce this number from 20 to 12, with similar or better results (see Table   512 3). This improvement could be explained by the elimination of redundancy among measurements, 513 and a possible lack of precision for some of them. Either way, our examination of repeatability PeerJ reviewing PDF | (2019:08:40125:2:0:NEW 7 Jan 2020) Manuscript to be reviewed 514 produced satisfactory results for both shape (93%) and size (99%), although these results were 515 lower than the results of the outline-based technique (98% and 99%, respectively).

517
The outline-based morphometrics analysis was based on manual contour digitization. Although 518 this type of manual data collection is relatively fast, an automated capture technique might be 519 preferable. Automated curve tracing is already included in many types of imaging software, and it 520 may be more appropriate for generalizability (Yang et al., 2015), but its useful application is 521 dependent on the complexity of the contours (Sheets at al., 2006;Firmat et al., 2010). We would 522 recommend that a study be performed that is specifically aimed at comparing automatic and 523 manual capture techniques. The fact remains, however, that whether manual or automatic 524 techniques are used, the outline digitization process is a relatively easy task and does not require 525 highly specialized skills from the user. 526 527 The landmark-based approach required a little more knowledge about the internal anatomy of the 528 parasites and did not provide satisfactory results. The lower reclassification scores of the landmark 529 data relative to the OD should not be regarded as a rule. In our case, very few reliable LM could 530 be suggested, reducing the amplitude of shape capture, which could explain the unsatisfactory 531 scores obtained. It is possible that better results might be obtained with, for instance, a different 532 landmark configuration, or a different number of landmarks. 533 534 In our validated method of reclassification each individual was sequentially removed from the total 535 sample and its identification performed using the model computed without it, a method also called 536 a "leave-one-out" or "jack-knife" procedure (Manly, 1986). Thus, individual identification was PeerJ reviewing PDF | (2019:08:40125:2:0:NEW 7 Jan 2020) Manuscript to be reviewed 537 repeated as many times as there were individuals in the total sample (see Supplementary data 1). 538 Our "accuracy" for each method (Tables 3 and 7) was the percentage of individuals that were 539 correctly identified at the end of each such procedure. For the ANN classification, each 540 classification session was repeated 30 times, and the accuracies were presented as the average and 541 standard deviation of the 30 sessions (Tables 3 and 7). 542 543 The validated classification applied to different kinds of data used different methods (Table 3). 544 The one based on the Mahalanobis distance (Dist) generally produced higher scores with more  Since the MxL method assumes normally distributed data and independence among the characters 554 used, our best results were obtained when applied to (a subset of) the PC of the log-transformed 555 input variables, rather than when applied to the raw variables (detailed comparisons not shown). 556 In agreement with previous comparisons (Dujardin et al., 2017), the MxL technique provided a 557 slightly lower accuracy than the Dist method. Manuscript to be reviewed 559 In our study, the ANN produced slightly lower scores than the Dist or the MxL approaches. Our 560 idea was to compare the taxonomic signal embedded in different types of data rather than to deeply 561 explore the power of each classification method. So, for each classification method, we tried to 562 subject each type of data to the same process. The neural network was thus configured in the same 563 way, irrespective of which data were used as the input (see Supplementary data 2). Certainly, some 564 improvements could be obtained by refining the configuration according to the kind of data being 565 used. Moreover, a possible explanation for the relative weakness of the ANN method used here is 566 the low sample sizes of the training data. The ANN method may require many data to achieve an 567 accurate level of performance (Soda, Slice & Naylor, 2017). However, the general scores were 568 satisfactory, and we showed that this technique could greatly benefit from using combined data as 569 the input, for example using a few traditional measurements together with outline shape data 570 (Table 7).      Manuscript to be reviewed Non-parametric comparisons of global size estimations, 1000 permutations (P-values) P, percentage of values equal or higher than the observed differences of mean values between species after 1000 random permutations. It is the empirical risk of error saying that the differences are significant. BP, body perimeter.
PeerJ reviewing PDF | (2019:08:40125:2:0:NEW 7 Jan 2020) Manuscript to be reviewed 1 Manuscript to be reviewed Scores of validated classification using as input for the multilayer perceptron either logtransformed linear measurements (LN) alone, or a combination of LN with outline data.
Percentages are the average of 30 ANN classification sessions (see Supplementary data 1), with the standard deviation between square brackets. LN, log-transformed linear measurements; OD, outline data, i.e., normalized elliptic Fourier coefficients. The first row may be read as follows: with only four log-transformed linear measurements (4LN, first column), 78% of correct species assignment was obtained after validated classification (second column), 88% when these data were combined with outline data (third column).
PeerJ reviewing PDF | (2019:08:40125:2:0:NEW 7 Jan 2020) Manuscript to be reviewed 1 Table 7. Scores of validated classification using as input for the multilayer perceptron either log-2 transformed linear measurements (LN) alone, or a combination of LN with outline data.

LN LN + OD
average stdev average stdev