Comprehensive Morphometric Analysis of Apple Fruits and Weighted Class Assignation using Machine Learning

doi:10.21203/rs.3.rs-2860631/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Abstract

Fruit morphology description for variety registration or evaluation is mostly based on human visual inspection. However, an objective and efficient method for evaluating apple fruit shape would be of significant value. Furthermore, if such a method could provide a comprehensive assessment of the multiple attributes encompassed by the term “shape”, it would have great potential for genomic studies. Here, we investigated the potential of a shape analyzer software originally developed to study tomato fruits (Tomato Analyzer) for the morphometric description of apple fruits. We analyzed 12,920 images of apple sections from 364 genotypes collected across three harvest seasons. We also assigned the images to classes by visual inspection. The software detected the contour of the fruits in most of the images, but with some imprecision, particularly in the stalk and calyx regions. After manual correction of the contours, we obtained 15 measurements of shape and size attributes. In general, size traits had higher broad-sense heritability (H²) than shape traits (0.72 vs 0.45 on average, respectively). A Random Forest model was used to identify the most important variables determining fruit shape. The fruit shape index external I (FSII) stood out in importance, followed by the fruit shape triangle (FST), the distal angle macro (DAMa), the eccentricity (ECC), and the proximal angle macro (PAMa). Incorporating these parameters into fruit description guides could provide more precise descriptions of apple cultivars. Additionally, these data will be useful to investigate the potential genetic control of these traits through genomic studies.

apple morphology

apple fruit shape

apple cultivar characterization

random forest

apple shape classification

Introduction

Apples traded in the international fresh market must meet quality standards such as those established by the OECD (2021), which cover aspects including fruit size and shape. For fruit size, apples must be larger than 60 mm in diameter (or 90 g), although smaller fruits may be accepted only if they have a high °Brix content. Size uniformity is also required, with a narrower range of variation allowed for higher quality classes. Fruit shape defects are considered a sign of insufficient development and are permitted only in lower quality classes. Furthermore, since each apple variety has a characteristic fruit shape, fruits that deviate from it are assigned to lower quality classes.

Although fruit shape is one of the breeding criteria in commercial apple breeding programs, breeders do not typically breed for a specific apple shape; any shape from flat-oblate to tall-conic is acceptable (Brown 1960). However, once a new variety is developed and ready for registration, its fruit shape must be described following the standard descriptors established by the International Union for the Protection of New Varieties of Plants (UPOV), as it is a varietal trait used in distinctness, uniformity and stability (DUS) assessments.

Experts score key shape-related characteristics following guidelines in the form of written instructions or visual reference sketches. The UPOV descriptors (CPVO-UPOV 2006) distinguish six shape classes and are typically used to register new varieties. Other descriptors, such as those recommended by the European Cooperative Programme for Plant Genetic Resources (ECPGR), identify 13 visual classes (Szalatnay 2006) and are usually applied to characterize germplasm repositories and gene bank collections. The ECPGR also includes metric descriptors based on Dapena et al. (2009), which define shape categories by combining two measures: the ratio between fruit height and width (the fruit shape index, FSI) and the conical aspect, given by the ratio between the fruit width at the eye basin and at the stalk cavity. These authors set the FSI class boundaries at < 0.75, 0.85, 0.95, 1.05 and > 1.15, from flat to very long shapes, and the conicity boundaries at < 0.715 and > 0.815, from conical to cylindrical shapes. Other works reduce the number of FSI-derived classes to three, with boundaries at 0.95 and 1.05 separating the oblate spheroid, spheroid and oblong spheroid shapes (Keshavarzpour and Rashidi 2010). Some descriptors also include ribbing and eye basin depth (Dapena et al. 2009).
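As an illustration, the FSI and conicity boundaries quoted above can be applied with a simple threshold lookup. This is a minimal Python sketch, not part of the original methodology: it assumes that a value lying exactly on a boundary belongs to the upper class (the tie rule of the cited guides is not stated here), and the intermediate class names are numbered placeholders because only the end classes are named in this text.

```python
from bisect import bisect_right

# FSI (height/width) boundaries after Dapena et al. (2009), as quoted above.
FSI_BOUNDS = [0.75, 0.85, 0.95, 1.05, 1.15]
# Only "flat" and "very long" are named in the text; middle labels are placeholders.
FSI_LABELS = ["flat", "FSI class 2", "FSI class 3", "FSI class 4", "FSI class 5", "very long"]

# Conicity boundaries (eye-basin width / stalk-cavity width).
CONIC_BOUNDS = [0.715, 0.815]
CONIC_LABELS = ["conical", "intermediate (placeholder)", "cylindrical"]

# Three-class scheme (Keshavarzpour and Rashidi 2010).
THREE_BOUNDS = [0.95, 1.05]
THREE_LABELS = ["oblate spheroid", "spheroid", "oblong spheroid"]


def classify(value, bounds, labels):
    """Return the label of the interval containing value.

    bisect_right sends a value equal to a boundary to the upper class (assumption).
    """
    return labels[bisect_right(bounds, value)]


def fsi_class(fsi):
    return classify(fsi, FSI_BOUNDS, FSI_LABELS)


def conicity_class(ratio):
    return classify(ratio, CONIC_BOUNDS, CONIC_LABELS)


def three_class(fsi):
    return classify(fsi, THREE_BOUNDS, THREE_LABELS)


print(fsi_class(0.70), conicity_class(0.90), three_class(1.00))
# flat cylindrical spheroid
```

Using `bisect_right` keeps each scheme as a sorted list of cut points, so adding or moving a boundary only changes the list, not the logic.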

Currently, these parameters are measured manually and/or established by comparison with the sketches provided in the guides, ultimately to sort apples into defined classes. While these classifications are useful for describing cultivars, and although some are based on objective measures, they do not provide quantitative, objective phenotypic evaluations of the whole fruit aspect that could be used for genetic analysis. An exhaustive study of fruit shape variation therefore first requires objective measurements of all the fruit aspects that best describe shape variation in diverse germplasm collections.

Several software applications can automatically analyze fruit images to obtain a range of morphometric measurements. One of them is Tomato Analyzer, which was developed to analyze tomato images and can measure up to 37 different attributes (Gonzalo et al. 2009). Although originally developed for tomato, it has been successfully applied to other crops such as melon, eggplant and bell pepper (Pereira et al. 2018; Hurtado et al. 2013; Mangino et al. 2021; Nankar et al. 2020). However, to our knowledge, the use and efficiency of Tomato Analyzer for analyzing apple fruit images, whose calyx and stalk regions contain shaded internal and external areas, have not yet been reported.

In this work, we study the efficiency of Tomato Analyzer in the analysis of apple images and provide a quantitative description of apple fruit size and shape attributes. For this, we used Tomato Analyzer v3 to measure 12,920 two-dimensional images of apple sections from 364 genotypes collected in three harvest seasons. We analyzed the distribution of attribute values to show their variability between accessions and years, as well as their heritability. In addition, we determined which attributes carry the most weight in assigning apples to visual classes such as those regularly used by breeders, evaluation officers and germplasm curators. For this, we used a Random Forest classifier, an ensemble learning method that builds N decision trees and derives a consensus prediction from them (Breiman 2001). The method is considered reliable and interpretable because it returns the importance of each variable. The results provide new relevant measures that could help to refine fruit characterization.
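The consensus step can be made concrete with scikit-learn, the library used for the models in this study. Breiman's original formulation lets each tree cast one vote; scikit-learn's RandomForestClassifier instead averages the class probabilities of the trees (a soft vote). The sketch below checks that averaging on synthetic data (not the apple dataset) and shows the per-variable importances the forest returns; the data generator and tree count are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 300 samples, 5 features, 3 classes.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each fitted tree gives its own class-probability estimate ...
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
# ... and the forest's consensus is their average over the trees.
consensus = per_tree.mean(axis=0)
assert np.allclose(consensus, forest.predict_proba(X))

# Impurity-based variable importances, one per feature, summing to 1.
print(forest.feature_importances_.round(3))
```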

Materials and Methods

Fruit sampling and processing

Apples from 364 genotypes of the apple REFPOP collection were collected in three years: 2018 (143 genotypes), 2019 (276 genotypes) and 2020 (346 genotypes), with 94 genotypes common to all three years (Supplementary Information 1). Each REFPOP genotype was planted in duplicate following a randomized block design in the IRTA fields in Gimenells (Catalonia, Spain). The REFPOP collection was managed as indicated in Jung et al. (2022).

At least three fruits representative of the whole tree were collected at harvest maturity (assessed with an iodine solution test) and stored at 4 °C until processing. For processing, clean apples were fixed with a bench vise and cut into two halves along their longitudinal axis (from stalk to calyx) with a double-handled knife. For each genotype, sections from three to five fruits from the same tree were scanned into a single image using Mustek A3 S-Series and HP Scanjet Pro 4500 fn1 image scanners.

Measurements obtained with Tomato Analyzer and Fruit Shape Visual Categorization

A total of 12,920 images of fruit sections were analyzed using Tomato Analyzer (TA) version 3 (Gonzalo et al. 2009; Rodríguez et al. 2010). After visual inspection and, when needed, manual correction of the contours determined by the software, fifteen measurements describing fruit size (4) and shape (11) attributes were obtained for each fruit section (Fig. 1 and Supplementary Information 2). In addition, images were visually classified into three major classes (spheroid oblate or flat, spheroid or round, and spheroid oblong) as well as into the ECPGR classes described in Szalatnay (2006). Although the ECPGR classification considers 13 classes, only six were observed in the studied dataset (Supplementary Information 3).

Statistical analysis

Each image processed with Tomato Analyzer contained six to ten fruit sections from each tree. The mean value of all sections from each genotype was used for analysis. Data from all genotypes in each year were displayed using boxplots. To assess environmental (year) effects, we evaluated the subset of 94 genotypes common to all three years. Data were tested for normality and homoscedasticity using the Shapiro-Wilk and Bartlett tests, respectively. Genotypes that produced heteroscedastic data were excluded, and differences between years were then tested with ANOVA or Kruskal-Wallis tests for normally and non-normally distributed data, respectively. The null hypothesis of equal means across years (H₀: µ₂₀₁₈ = µ₂₀₁₉ = µ₂₀₂₀) was tested in this way, and pairwise differences between years were identified with Tukey HSD (for normally distributed data) or Dunn tests. PMCMRplus (Pohlert 2014) and ggplot2 (Wickham 2016) were used for statistical analysis and visualization, respectively. Spearman correlations (r) were computed with the cor function in R (R Core Team 2022) and visualized as a heatmap using the corrplot R package (Wei and Simko 2021). Data dispersion for the FSII, FST, PAMa, DAMa and ECC measures was visualized using the seaborn library in Python (Waskom 2021). Broad-sense heritability (H²) was estimated in the 94-genotype subset with the R package inti (Lozano-Isla 2021). H² was calculated as the ratio between the genotypic variance (σ²_g) and the phenotypic variance (σ²_p), H² = σ²_g / σ²_p, where σ²_p includes the genotype-by-year interaction variance.
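The test-selection logic described above was run in R (PMCMRplus). A rough Python analogue with SciPy is sketched below for readers working in Python; it is not the authors' code. Assumptions: a Shapiro-Wilk threshold of 0.01 (the value reported in the Results), Bartlett's p-value is only returned rather than used to drop genotypes, `scipy.stats.tukey_hsd` requires SciPy 1.8 or later, and Dunn's test is left out because it is not part of SciPy (third-party packages provide it). The three groups are synthetic.

```python
import numpy as np
from scipy import stats


def compare_years(groups, alpha=0.01):
    """Choose ANOVA or Kruskal-Wallis for one trait measured in several years.

    groups: one 1-D array of genotype means per year.
    alpha: Shapiro-Wilk threshold for accepting normality (assumed 0.01).
    """
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    # Bartlett's test for equal variances across years (reported only).
    bartlett_p = stats.bartlett(*groups).pvalue
    if normal:
        test, p = "ANOVA", stats.f_oneway(*groups).pvalue
        # Pairwise Tukey HSD p-values (matrix of years x years).
        posthoc = stats.tukey_hsd(*groups).pvalue
    else:
        test, p = "Kruskal-Wallis", stats.kruskal(*groups).pvalue
        posthoc = None  # Dunn's test is not available in SciPy
    return {"test": test, "p": p, "bartlett_p": bartlett_p, "posthoc": posthoc}


# Synthetic example: 94 genotypes per year, with the third year shifted.
rng = np.random.default_rng(1)
years = [rng.normal(0.83, 0.07, 94), rng.normal(0.83, 0.07, 94), rng.normal(0.90, 0.07, 94)]
result = compare_years(years)
print(result["test"], result["p"])
```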

Machine Learning classifier

Random Forest

A random forest model was trained using the scikit-learn library (Pedregosa et al. 2011). The dataset was split into 70% for training and 30% for validation, with the random state set to 80. We used 500 estimators, with a maximum depth of 5 for the three CAT-own classes (our simplified visual classification: spheroid oblate, spheroid and spheroid oblong) and of 10 for the five ECPGR classes observed. We excluded three samples classified as “narrow conical” on the ECPGR scale because of the low frequency of this class in the dataset. The shape variables FSII, FST, PAMa, DAMa and ECC were used as independent variables, and the CAT-own and ECPGR categories as dependent variables. Model performance was assessed with confusion matrices, and variable importance for each model was visualized using Matplotlib (Hunter 2007) and Seaborn in Python (Waskom 2021).
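A minimal scikit-learn sketch of the configuration described above is given below. Assumptions, not stated in the text: the random state 80 is applied to both the split and the forest, the split is not stratified, and the table is synthetic, with placeholder CAT-own labels derived from FSII alone; for the ECPGR model, `max_depth=10` would be used instead. `classification_report` produces the same rows and columns as Table 1.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

FEATURES = ["FSII", "FST", "PAMa", "DAMa", "ECC"]

# Synthetic placeholder table (not study data): one row per genotype-year.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "FSII": rng.normal(0.83, 0.07, n),
    "FST": rng.normal(1.10, 0.10, n),
    "PAMa": rng.normal(150, 15, n),
    "DAMa": rng.normal(140, 15, n),
    "ECC": rng.normal(0.5, 0.05, n),
})
# Hypothetical labels driven by FSII only, for demonstration.
df["CAT_own"] = np.where(df["FSII"] < 0.80, "spheroid oblate",
                         np.where(df["FSII"] > 0.95, "spheroid oblong", "spheroid"))

# 70/30 split with random_state=80, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["CAT_own"], test_size=0.3, random_state=80)

# 500 trees, max_depth=5 for the three CAT-own classes.
model = RandomForestClassifier(n_estimators=500, max_depth=5, random_state=80)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred, zero_division=0))
print(confusion_matrix(y_test, y_pred, labels=model.classes_))

# Impurity-based variable importance, highest first.
importance = pd.Series(model.feature_importances_, index=FEATURES).sort_values(ascending=False)
print(importance)
```

Because the placeholder labels depend only on FSII, FSII should rank first here; with real data the ranking is an empirical result.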

Results

Use of Tomato Analyzer software for apple shape detection

Tomato Analyzer was used to process the apple images, but manual editing was required because the software generally failed to detect the calyx and stalk cavities (Supplementary Information 4). The data obtained with the software included measures of fruit size (4) and shape (11) attributes, all of them quantitative and continuous. Some shape descriptors (Supplementary Information 2) were derived from ratios between fruit size attributes, such as the fruit shape index external I (FSII), the ratio between fruit height and width. The fruit blockiness and triangle measures (PFB, DFB and FST) were also ratios between widths at different fruit positions and reflected conical proportions. Other shape parameters described the degree to which the fruit section resembled an ellipse, a circle or a rectangle (E, C, R), and the extent of deviation from a circular form (eccentricity, ECC, and FSIINT). Angles were also measured in specific regions of the apple: the proximal angle macro (PAMa) at the stalk end and the distal angle macro (DAMa) at the calyx end (Fig. 1).
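The ratio descriptors can be written out explicitly. In this hypothetical sketch, the input widths and the reference width used for blockiness (assumed here to be the mid-height width) are illustrative; Tomato Analyzer's exact measurement positions may differ. It does show one property that holds for any shared reference width: FST equals PFB divided by DFB.

```python
def ratio_descriptors(height, max_width, w_proximal, w_distal, w_ref):
    """Ratio-based shape descriptors from raw section measurements (hypothetical inputs).

    w_proximal / w_distal: widths near the stalk and calyx ends.
    w_ref: reference width for blockiness (assumed: width at mid-height).
    """
    fsii = height / max_width      # fruit shape index external I
    pfb = w_proximal / w_ref       # proximal fruit blockiness
    dfb = w_distal / w_ref         # distal fruit blockiness
    fst = w_proximal / w_distal    # fruit shape triangle
    return {"FSII": fsii, "PFB": pfb, "DFB": dfb, "FST": fst}


d = ratio_descriptors(height=60.0, max_width=75.0, w_proximal=55.0, w_distal=44.0, w_ref=70.0)
# FST equals PFB / DFB whatever reference width is used.
assert abs(d["FST"] - d["PFB"] / d["DFB"]) < 1e-12
print(round(d["FSII"], 3), round(d["FST"], 3))  # 0.8 1.25
```

Since FST = PFB/DFB under these definitions, FST rises with PFB and falls with DFB, which is consistent with the correlation pattern among these three measures reported in this section.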

Data and descriptive statistics for all measurements and years are given in Supplementary Information 5 and 6, respectively, and are visualized as boxplots in Fig. 2. The measurements with the lowest coefficients of variation (CV) were rectangular shape (R), with an average CV of 2.6%, and distal fruit blockiness (DFB), with an average CV of 5%. In contrast, fruit area (A) showed higher CV values, averaging 22.1%. In general, size attributes had higher CVs than shape attributes, with average CVs of 14.18% and 10.87%, respectively.
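For reference, the CV is the standard deviation divided by the mean, expressed as a percentage. The sketch below assumes the sample standard deviation (ddof=1) and interprets "average CV" as the mean of the per-year CVs of an attribute; neither choice is stated in the text.

```python
import numpy as np


def cv_percent(values):
    """Coefficient of variation (%) = sample SD / mean * 100."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean() * 100


def average_cv(values_by_year):
    """Mean of the per-year CVs of one attribute (assumed meaning of 'average CV')."""
    return float(np.mean([cv_percent(v) for v in values_by_year]))


print(round(cv_percent([2.0, 4.0, 6.0]), 1))  # 50.0
```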

Within each year, strong to very strong correlations were observed between size attributes, with values ranging from 0.71 to 0.95 (p < 0.05) across years (Fig. 3; Supplementary Information 7). Among the shape measures, FST was strongly positively correlated with PFB and strongly negatively correlated with DFB, all three being measures of fruit conicity. The indexes FSII and FSIINT were also strongly correlated, with correlation values between 0.70 and 0.74 (p < 0.05). In addition, moderate to strong correlations were observed between FSIINT, ECC and the homogeneity measures (E, C and R).
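The correlations above were computed in R. An equivalent Spearman analysis in Python is sketched here on synthetic values (not study data), with PFB and DFB drawn independently and FST formed as their ratio (an assumed relationship). It reproduces the sign pattern reported for FST, PFB and DFB: positive with PFB, negative with DFB.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Synthetic blockiness values: PFB and DFB drawn independently.
rng = np.random.default_rng(42)
pfb = rng.lognormal(mean=0.0, sigma=0.1, size=200)
dfb = rng.lognormal(mean=-0.1, sigma=0.1, size=200)
df = pd.DataFrame({"PFB": pfb, "DFB": dfb, "FST": pfb / dfb})

# Spearman rank correlation matrix (pandas) ...
rho = df.corr(method="spearman")
print(rho.round(2))

# ... and one coefficient with its p-value (SciPy).
r, p = spearmanr(df["FST"], df["DFB"])
print(round(r, 2), p < 0.05)
```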

Variance between years

The number of scanned genotypes differed among the three years of evaluation, but 94 genotypes were common to the three sample sets, allowing the estimation of environmental effects on the traits. In this 94-genotype subset, the data of five attributes (FST, ECC, FSIINT, E and C) were normally distributed (p > 0.01) after logarithmic transformation (Supplementary Information 8). The homoscedasticity assumption was not met for PFB, PAMa, DAMa and E. To analyze the values properly across years, we removed the genotypes responsible for the heteroscedasticity. This slightly reduced the sample size from 94 to 90 for PFB, to 91 for PAMa and to 92 for E. For DAMa, all 2020 data were discarded.

An ANOVA or a Kruskal-Wallis test was conducted for normally and non-normally distributed traits, respectively. Significant differences between years (p < 0.001) were observed in five of the 15 attributes, all of them measuring fruit shape: FST, ECC, FSIINT, E and C (Supplementary Information 8). Multiple comparison tests showed that, for three of them, the differences involved the 2020 data. We did not observe significant variation between years in the size attributes (area, width and height).

Broad-sense heritability (H²) ranged from 0.15 (DAMa) to 0.82 (FSII) (Supplementary Information 9). In general, size traits showed higher H² than shape traits (0.72 vs 0.45 on average, respectively). After FSII, the shape-related traits with the highest H² were C, FSIINT and PAMa (H² of 0.62, 0.57 and 0.52, respectively).
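The heritability values above were obtained with the R package inti. As a simplified, independent cross-check only, H² = σ²_g / σ²_p can also be estimated with the classical ANOVA (method-of-moments) estimator on a balanced genotype × year table of genotype means. This is a swapped-in technique and may not reproduce the inti estimates. It assumes one value per genotype and year, so the genotype-by-year interaction is confounded with residual error, and takes σ²_p on a genotype-mean basis (σ²_g + σ²_gy / number of years).

```python
import numpy as np


def entry_mean_heritability(y):
    """ANOVA-based broad-sense heritability from a balanced table.

    y: 2-D array, rows = genotypes, columns = years, one value per cell.
    """
    g, t = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)
    col_means = y.mean(axis=0)
    # Genotype and residual (genotype x year) sums of squares.
    ss_g = t * np.sum((row_means - grand) ** 2)
    resid = y - row_means[:, None] - col_means[None, :] + grand
    ss_res = np.sum(resid ** 2)
    ms_g = ss_g / (g - 1)
    ms_res = ss_res / ((g - 1) * (t - 1))
    # Method-of-moments variance components; negative estimates set to 0.
    sigma2_g = max((ms_g - ms_res) / t, 0.0)
    sigma2_gy = ms_res
    # Phenotypic variance of a genotype mean over t years.
    sigma2_p = sigma2_g + sigma2_gy / t
    return sigma2_g / sigma2_p


print(entry_mean_heritability(np.array([[1.0, 2.0], [3.0, 5.0]])))  # 0.96
```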

Main fruit shape descriptors

The fruit shape index (FSI), i.e. the ratio between fruit height and width, and the conical aspect of the apple are two of the most important attributes used to classify fruits into categorical classes (Dapena et al. 2009). In our dataset, the FSII obtained with Tomato Analyzer corresponds to the FSI, and the blockiness measures (FST, PFB and DFB) capture the conical aspect. These traits showed relatively low coefficients of variation (CV) across years, ranging from 7.90 to 9.59%, indicating moderate variation in the evaluated sample subsets.

FSII ranged from 0.67 to 1.14 (Fig. 4a, e; Supplementary Information 5 and 6), with a mean of 0.825 and an average CV of 8.22%. The varieties ‘Gros Api’ (MUNQ 33) and ‘Belle Flavoise’ (MUNQ 270), described as “flat” in the National Fruit Collection (NFC) database (http://www.nationalfruitcollection.org.uk/index.php), were among those with the lowest FSII, together with others such as ‘Grenadier’ (MUNQ 93) and ‘Szaszpap Alma’ (MUNQ 2990), described as “broad globose conical”. In contrast, the variety with the highest FSII value was ‘Skovfoged’ (MUNQ 345), described as “narrow conical”.

FST, the ratio between the widths at the stalk cavity (proximal) and eye basin (distal) shoulders, had a mean value of 1.1 and an average CV of 9.3% across the three years (Fig. 4b, f). Some varieties showed contrasting values between years (Supplementary Information 5), for example ‘Reinette d’Anthezieux’ (MUNQ 405) (FST₂₀₁₉ = 1.743; FST₂₀₂₀ = 1.052), described as “broad globose conical”, and ‘Reinette Sanguine du Rhin’ (MUNQ 491) (FST₂₀₁₉ = 1.155; FST₂₀₂₀ = 0.643), described as oblong. A review of the images revealed a detection error of Tomato Analyzer, which does not correctly recognize the shoulder limits in asymmetric apple sections (Supplementary Information 4).
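Because contrasting year-to-year values such as these turned out to reflect detection errors, a simple screen could flag genotypes whose FST varies strongly between years so that their images are reviewed. This is a hypothetical sketch: the max/min ratio criterion and the 1.3 threshold are illustrative choices, not values used in this study; the two example genotypes use the FST values reported above.

```python
def flag_inconsistent(values_by_year, max_ratio=1.3):
    """Return True if the largest/smallest yearly value exceeds max_ratio.

    values_by_year: mapping year -> attribute value (positive ratios such as FST).
    max_ratio: hypothetical review threshold, not taken from the study.
    """
    vals = list(values_by_year.values())
    return max(vals) / min(vals) > max_ratio


fst = {
    "Reinette d'Anthezieux (MUNQ 405)": {2019: 1.743, 2020: 1.052},
    "Reinette Sanguine du Rhin (MUNQ 491)": {2019: 1.155, 2020: 0.643},
}
for name, by_year in fst.items():
    print(name, flag_inconsistent(by_year))
```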

Supervised machine learning

The apples were annotated following two visual classifications: 1) a simple one with only three classes (spheroid oblate, spheroid and spheroid oblong), which we called CAT-own, and 2) the ECPGR catalog (Supplementary Information 3). To identify the key attributes for fruit classification, we employed a Random Forest supervised machine learning classifier and evaluated the importance of the traits in both the CAT-own and ECPGR classifications. For CAT-own, the model achieved high accuracy (0.90), with F1-scores ranging from 0.82 to 0.92 across classes (Table 1). Similarly, for the ECPGR classification, the model achieved an accuracy of 0.90 and F1-scores ranging from 0.71 to 0.98 across classes.

The confusion matrix revealed that, in the CAT-own model, 10 of the 102 spheroid fruits (10%) were misclassified as spheroid oblate, while nine of the 118 spheroid oblate fruits (8%) were misclassified as spheroid (Fig. 5a). Additionally, the model incorrectly predicted three of the 10 spheroid oblong fruits (30%) as spheroid (Fig. 5b). FSII was the most relevant attribute in both classification methods (Fig. 5c). Furthermore, in the ECPGR classification, FST (a descriptor of fruit conicity) and DAMa (a descriptor of the eye basin) were the second and third most important traits, respectively (Fig. 5d). We computed the confusion matrix for the five most frequent categories observed in the sample set and found 22 misclassifications, of which 15 (68%) were between the broad-globose-conical and flat globose classes.

Table 1

Classification results from the random forest models using CAT-own and ECPGR classes.
Model    Category                Precision  Recall  F1-score  Support
CAT-own  Spheroid                0.88       0.90    0.89      102
         Spheroid oblate         0.92       0.92    0.92      118
         Spheroid oblong         1.00       0.70    0.82      10
         Accuracy                                   0.90      230
         Macro avg               0.93       0.84    0.88      230
         Weighted avg            0.91       0.90    0.90      230
ECPGR    Broad-globose-conical   0.90       0.94    0.92      110
         Flat                    1.00       0.97    0.98      32
         Flat globose            0.75       0.67    0.71      27
         Globose                 0.85       0.85    0.85      13
         Globose conical         0.93       0.93    0.93      46
         Accuracy                                   0.90      228
         Macro avg               0.89       0.87    0.88      228
         Weighted avg            0.90       0.90    0.90      228
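The CAT-own rows of Table 1 follow directly from the confusion matrix. The matrix below is reconstructed from the text (the off-diagonal counts reported in the Supervised machine learning section), the class supports and recalls in Table 1, and the 1.00 precision of the spheroid oblong class, which implies that no other class was predicted as oblong. Computing precision, recall and F1 from it reproduces the CAT-own rows of Table 1 to two decimals.

```python
import numpy as np

labels = ["Spheroid", "Spheroid oblate", "Spheroid oblong"]
# Rows = true class, columns = predicted class (reconstructed from the text).
cm = np.array([
    [92, 10, 0],
    [9, 109, 0],
    [3, 0, 7],
])

tp = np.diag(cm)
precision = tp / cm.sum(axis=0)   # correct / all predicted as the class
recall = tp / cm.sum(axis=1)      # correct / all truly in the class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)
accuracy = tp.sum() / cm.sum()

for name, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{name:16s} {p:.2f} {r:.2f} {f:.2f} {s}")
print(f"accuracy {accuracy:.2f}, macro F1 {f1.mean():.2f}, "
      f"weighted F1 {np.average(f1, weights=support):.2f}")
```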

The dispersion of the data for the five important shape parameters highlighted by the random forest analysis (FSII, FST, PAMa, DAMa and ECC) is shown in Fig. 6. Confirming the random forest results, FSII was the attribute that best discriminated the samples classified into the CAT-own classes, whereas some overlap between classes was observed with the ECPGR classes, for example between broad-globose-conical and flat globose, and between globose conical and globose.

Discussion

To classify apples by shape, guidelines typically rely on the ratio between fruit height and diameter, as well as on the ratio between the widths of the shoulders at both ends. While these attributes are important, other, less prominent attributes can also contribute to the overall perception of fruit shape. To gather objective data on these attributes, we conducted a comprehensive morphometric analysis of apple fruit shape.

To carry out this analysis, we used 364 genotypes from a wide germplasm collection representing the genetic diversity of European apple germplasm (the apple REFPOP). We cut the apples into two halves, scanned the sections and processed the resulting 2D images with Tomato Analyzer to obtain data on 15 size- and shape-related attributes. While this software has been used in previous studies to evaluate the shape of fruits from different species and even leaves (Gonzalo et al. 2009; Nankar et al. 2020; Pereira et al. 2021; Sierra-Orozco et al. 2021), we found that its use for analyzing a large number of apple fruit sections was limited by the need for manual correction of the fruit boundaries.

Guidelines for characterizing apple cultivars, such as that of UPOV (CPVO-UPOV 2006), include descriptors for tree morphology, flowering, productive cycle and fruit quality. Shape descriptors typically rely on FSI ratios and comparison with sketches and reference cultivars. Other guides also use additional measurements, such as the ratio between the diameters of the stalk cavity and eye basin, in addition to the FSI (Szalatnay 2006; Dapena et al. 2009). In our study, we obtained measurements similar to those commonly used on whole fruits, such as FSII (height/diameter) and FST (average stalk cavity/eye basin diameter), as well as to other measurements usually taken manually with a caliper. However, although we observed asymmetry in fruit shape, we were unable to measure it with Tomato Analyzer.

Breeding studies have shown that apple size and shape are inherited independently (Chang et al. 2014). However, the few available studies on the variability and heritability of apple shape used a limited number of varieties. Our findings confirm previous research by Crane and Lawrence (1933), which showed that variation in FSI between years was not significant. Interestingly, Brown (1960) found that the mean FSI of offspring was usually close to the midpoint between those of the two parents. These results suggest a strong genetic component of FSI and demonstrate the potential of this parameter for use in breeding programs.

Random Forest models are suitable for modelling multidimensional data (Qi 2012) and have been widely used to classify crop-related traits. For example, Sánchez-Galán et al. (2021) used supervised classification to analyse local Panamanian rice crops using plant phenology and near-infrared (NIR) traits, while Moradi et al. (2021) applied this approach to predict growth regions for Moringa peregrina, a tree species that contributes to the restoration of fragile ecosystems. Zhang et al. (2021) used Random Forest to predict winter wheat leaf water content, and Tatsumi et al. (2021) used this algorithm to identify important variables for predicting tomato yields from unmanned aerial vehicle imagery. In our study, the Random Forest algorithm predicted most of the shapes of the two classifications with high precision, without confounding the extreme classes. This analysis revealed that fruit morphological attributes beyond FSII (namely FST, DAMa, PAMa and ECC) are important for accurately classifying apple fruits. Although Tomato Analyzer provided interesting measures, specialized software able to accurately recognize apple fruit morphology and provide such measures is required. Pipelines have recently been published for 2D fruit image segmentation (Zingaretti et al. 2021) and even for fruit reconstruction from 3D images (Wang and Chen 2020). However, free and user-friendly software for the morphological study of apple fruits is still lacking.

Conclusion

In this study, we conducted a high-throughput phenotyping assay of apple fruit morphology using 2D images and analysed the variability of the germplasm across years and measurements. Tomato Analyzer generated accurate measurements, but manual correction of the contours was necessary for most samples, limiting its use for high-throughput phenotyping of apples. Several of the evaluated traits showed high heritability, suggesting that genetic factors play an important role in determining fruit morphology.

The most informative traits for determining fruit shape were FSII and FST, followed by DAMa, ECC and PAMa. The high heritability of several of these traits, notably FSII (H² = 0.82), indicates their potential usefulness in genomic studies.

Declarations

Acknowledgements

The authors would like to thank the field technicians at Gimenells, IRTA, as well as several undergraduate and master’s students who completed their internships in the Rosaceae Genomic Group at CRAG.

Funding

CD was supported by “DON CARLOS ANTONIO LOPEZ” Abroad Postgraduate Scholarship Program, BECAL-Paraguay. FJR is the recipient of grant PRE2019-087427 funded by MCIN/AEI/10.13039/501100011033 and by “ESF Investing in your future”. This research was supported by projects RTI2018-100795-B-I00 and PID2021-128885OB-I00 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 817970 (INVITE). We acknowledge support from the CERCA Programme (“Generalitat de Catalunya”), and the “Severo Ochoa Programme for Centres of Excellence in R&D” 2016-2019 (SEV-2015-0533) and 2020-2023 (CEX2019-000902-S), both funded by MCIN/AEI/10.13039/501100011033.

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Author contributions

The experiment was conceived by MJA and CD, who also collected and analyzed the data and wrote the manuscript. FD contributed the machine learning analysis and provided critical support for the data analysis and manuscript. All authors have reviewed and approved the final manuscript.

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Breiman L (2001) Random forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324
Brown AG (1960) The inheritance of shape, size and season of ripening in progenies of the cultivated apple. Euphytica, 9, 327-337. https://doi.org/10.1007/BF00029485
Chang Y, Sun R, Sun H, Zhao Y, Han Y, Chen D, Wang Y, Zhang X, Han Z (2014) Mapping of quantitative trait loci corroborates independent genetic control of apple size and shape. Scientia Horticulturae, 174, 126-132. https://doi.org/10.1016/j.scienta.2014.05.019
CPVO-UPOV 2006. Protocol for Distinctness, uniformity, and stability tests – Malus domestica Borkh. – Apple, Angers, pp. 43.
Crane MB, Lawrence WJC (1933) Genetical studies in cultivated apples. Journal of Genetics, 28(2), 265-296.
Dapena E, Blazquez MD (2009) Descripción de las variedades de Manzana de la D.O.P Sidra de Asturias. Villaviciosa. 69pp. http://www.serida.org/pdfs/4071.pdf
Gonzalo MJ, Brewer MT, Anderson C, Sullivan D, Gray S, van der Knaap E (2009) Tomato fruit shape analysis using morphometric and morphology attributes implemented in Tomato Analyzer software program. Journal of the American Society for Horticultural Science, 134(1), 77-87. https://doi.org/10.21273/JASHS.134.1.77
Hunter JD (2007) Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(03), 90-95. https://doi.org/10.1109/MCSE.2007.55
Hurtado M, Vilanova S, Plazas M, Gramazio P, Herraiz F J, Andújar I, Prohens J (2013) Phenomics of fruit shape in eggplant (Solanum melongena L.) using Tomato Analyzer software. Scientia Horticulturae, 164, 625-632. https://doi.org/10.1016/j.scienta.2013.10.028
Jung M, Keller B, Roth M, Aranzana MJ, Auwerkerken A, Guerra W, Al-Rifaï M, Lewandowski M, Sanin N, Rymenants M, Didelot F, Dujak C, Font i Forcada C, Knauf A, Laurens F, Studer B, Muranty H, Patocchi A (2022) Genetic architecture and genomic predictive ability of apple quantitative traits across environments. Horticulture research, 9. https://doi.org/10.1093/hr/uhac273
Keshavarzpour F, Rashidi M (2010) Classification of apple size and shape based on mass and outer dimensions. American-Eurasian Journal of Agricultural & Environmental Sciences, 9(6), 618-621.
Lozano-Isla F (2021) Inti: Tools and Statistical Procedures in Plant Science. R Package Version 0.1, 3.
Mangino G, Vilanova S, Plazas M, Prohens J, Gramazio P (2021) Fruit shape morphometric analysis and QTL detection in a set of eggplant introgression lines. Scientia Horticulturae, 282, 110006. https://doi.org/10.1016/j.scienta.2021.110006
Moradi E, Abdolshahnejad M, Hassangavyar MB, Ghoohestani G, da Silva AM, Khosravi H, Cerdà A (2021) Machine learning approach to predict susceptible growth regions of Moringa peregrina (Forssk). Ecological Informatics, 62, 101267. https://doi.org/10.1016/j.ecoinf.2021.101267
Nankar AN, Tringovska I, Grozeva S, Todorova V, Kostova D (2020) Application of high-throughput phenotyping tool Tomato Analyzer to characterize Balkan Capsicum fruit diversity. Scientia Horticulturae, 260, 108862. https://doi.org/10.1016/j.scienta.2019.108862
OECD (2021) Apples, International Standards for Fruit and Vegetables, OECD Publishing, Paris. https://doi.org/10.1787/12ebba9f-en-fr
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
Pereira L, Ruggieri V, Pérez S, Alexiou KG, Fernández M, Jahrmann T, Pujol M, Garcia-Mas J (2018) QTL mapping of melon fruit quality traits using a high-density GBS-based genetic map. BMC plant biology, 18(1), 1-17. https://doi.org/10.1186/s12870-018-1537-5
Pereira L, Zhang L, Sapkota M, Ramos A, Razifard H, Caicedo AL, van Der Knaap E (2021) Unraveling the genetics of tomato fruit weight during crop domestication and diversification. Theoretical and Applied Genetics, 134(10), 3363-3378. https://doi.org/10.1007/s00122-021-03902-2
Pohlert T (2014) The pairwise multiple comparison of mean ranks package (PMCMR) R package, http://CRAN.R-project.org/package=PMCMR.
Qi Y (2012) Random forest for bioinformatics. In Ensemble machine learning (pp. 307-323) Springer, Boston, MA.
R Core Team (2022) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Rodríguez GR, Moyseenko JB, Robbins MD, Morejón NH, Francis DM, van der Knaap E (2010) Tomato Analyzer: a useful software application to collect accurate and detailed morphological and colorimetric data from two-dimensional objects. JoVE (Journal of Visualized Experiments), 37, e1856.
Sánchez-Galán JE, Barranco FR, Reyes JS, Quirós-McIntire EI, Jiménez JU, Fábrega JR (2021) Using Supervised Classification Methods for the Analysis of Multi-spectral Signatures of Rice Varieties in Panama. Advances in Science, Technology and Engineering Systems Journal, 6, 552-558. https://doi.org/10.25046/aj060262
Sierra-Orozco E, Shekasteband R, Illa-Berenguer E, Snouffer A, van der Knaap E, Lee TG, Hutton SF (2021) Identification and characterization of GLOBE, a major gene controlling fruit shape and impacting fruit size and marketability in tomato. Horticulture research, 8. https://doi.org/10.1038/s41438-021-00574-3
Szalatnay D, Bauermeister R (2006) Obst-Deskriptoren NAP. Stutz Druck AG, 8820.
Tatsumi K, Igarashi N, Mengxue X (2021) Prediction of plant-level tomato biomass and yield using machine learning with unmanned aerial vehicle imagery. Plant methods, 17(1), 1-17. https://doi.org/10.1186/s13007-021-00761-2
Wang Y, Chen Y (2020) Fruit morphological measurement based on three-dimensional reconstruction. Agronomy, 10(4), 455. https://doi.org/10.3390/agronomy10040455
Waskom ML (2021) Seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021
Wei T, Simko V (2021) R package 'corrplot': Visualization of a Correlation Matrix (Version 0.92) https://github.com/taiyun/corrplot
Wickham H (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4. https://ggplot2.tidyverse.org
Zhang J, Zhang W, Xiong S, Song Z, Tian W, Shi L, Ma X (2021) Comparison of new hyperspectral index and machine learning models for prediction of winter wheat leaf water content. Plant Methods, 17(1), 1-14. https://doi.org/10.1186/s13007-021-00737-2
Zingaretti LM, Monfort A, Pérez-Enciso M (2021) Automatic fruit morphology phenome and genetic analysis: An application in the octoploid strawberry. Plant Phenomics, 2021. https://doi.org/10.34133/2021/9812910
