FruitSeg30_Segmentation dataset & mask annotations: A novel dataset for diverse fruit segmentation and classification

Fruits are the mature ovaries of flowering plants and are integral to human diets, providing essential nutrients such as vitamins, minerals, fiber, and antioxidants that are crucial for health and disease prevention. Accurate classification and segmentation of fruits are essential in the agricultural sector for enhancing the efficiency of sorting and quality control processes, which significantly benefit automated systems by reducing labor costs and improving product consistency. This paper introduces the "FruitSeg30_Segmentation Dataset & Mask Annotations", a novel dataset designed to advance the capability of deep learning models in fruit segmentation and classification. Comprising 1969 high-quality images across 30 distinct fruit classes, this dataset provides the diverse visuals essential for a robust model. Utilizing a U-Net architecture, the model trained on this dataset achieved a training accuracy of 94.72 %, validation accuracy of 92.57 %, precision of 94 %, recall of 91 %, F1-score of 92.5 %, IoU score of 86 %, and a maximum dice score of 0.9472, demonstrating superior performance in segmentation tasks. The FruitSeg30 dataset fills a critical gap and sets new standards in dataset quality and diversity, enhancing agricultural technology and food industry applications.


Value of the Data
• Automated Quality Assessment: In the agricultural sector, the precise annotations and diverse image collection of the FruitSeg30 dataset enable the development of advanced deep-learning models for automated fruit quality assessment. These models can be integrated into sorting and grading systems in processing facilities to classify fruits based on quality metrics, leading to more efficient packing and distribution processes.
• Yield Prediction: Agronomists and agricultural researchers can leverage the dataset to train models capable of predicting fruit yield from images captured throughout the growing season. Such predictive models assist in estimating yield quantities, facilitating better resource management and crop planning, thus optimizing agricultural output.
• Nutritional Monitoring and Diet Management: The dataset can be utilized by developers of health and wellness applications to recognize and classify fruits in meal photos, providing users with instant nutritional information, including calorie estimates. Such applications support dietary monitoring and encourage healthier eating habits.
• Educational Tools: Educational institutions and software developers can use the FruitSeg30 dataset to create tools and applications that educate students and aspiring professionals in image processing, machine learning, and segmentation techniques. These teaching technologies can greatly improve learning experiences by offering practical exercises with authentic data.
• Market Analysis and Trend Prediction: The dataset's comprehensive coverage of fruit varieties across different geographical regions provides valuable insights for market analysis within the agricultural sector. Businesses can use these insights to forecast market trends, plan crop production, and strategize on distribution to meet regional demands effectively.

Background
The advent of deep learning has significantly enhanced the capabilities of image recognition systems, particularly in the agricultural sector, where such advancements can lead to improved outcomes in areas like yield estimation, fruit sorting, and disease detection [1][2][3][4][5]. Despite these advancements, the performance of machine learning models heavily relies on the quality and diversity of the datasets used during the training process [6][7][8]. Traditionally, fruit recognition systems have employed datasets captured under controlled environmental conditions, involving fruits presented against uniform backgrounds and consistent lighting conditions [9]. While this simplifies the image segmentation and recognition task, it does not adequately prepare models for the complexities and variabilities in natural environments [10].
Current datasets typically lack the variability necessary to mimic real-world conditions, where fruits appear in diverse settings, often under varying lighting conditions and backgrounds [11]. Furthermore, these datasets rarely include a wide range of fruit types, further limiting the robustness and applicability of the resulting models [12].
The "FruitSeg30_Segmentation Dataset & Mask Annotations" was developed to address these shortcomings by providing a richly annotated dataset captured from a variety of natural and uncontrolled environments across Malaysia, Bangladesh, and Australia. This dataset comprises 1969 images across 30 distinct fruit classes, each annotated with precise segmentation masks. This dataset not only advances the field by filling the gap in available resources but also supports the development of more sophisticated image segmentation models that are crucial for applications such as automated fruit quality assessment, yield prediction, and real-time monitoring of fruit growth and health. Additionally, the diverse backgrounds and lighting conditions included in the dataset challenge existing models, pushing the envelope on what these algorithms can achieve and where they fail, thereby providing a path toward significant improvements in model accuracy and reliability.

Data Description
The "FruitSeg30_Segmentation Dataset & Mask Annotations" consists of a comprehensive collection of high-resolution fruit images meticulously annotated with segmentation masks. This dataset is segmented into 30 distinct fruit classes, providing a total of 1969 images alongside their corresponding segmentation masks. Each image and mask pair has a 512 × 512 pixel resolution, ensuring uniformity and high quality for detailed image processing tasks.
The images in this dataset were captured under diverse environmental conditions in Malaysia, Bangladesh, and Australia, ensuring variability in lighting, angles, and backgrounds. This diversity enhances the robustness of the dataset, making it suitable for training and evaluating image segmentation models in real-world scenarios.
Each class folder is organized into two subfolders:
• Images: This subfolder contains high-quality JPG images of fruits. The images feature various backgrounds, ranging from natural outdoor settings (with natural lighting and real-world elements) to controlled indoor environments (with consistent lighting and minimal distraction). This variety includes different textures, lighting conditions, and scenes, providing a comprehensive training ground for machine-learning models.
• Mask: This subfolder contains the corresponding segmentation masks in PNG format. These masks are binary images where the fruit is clearly delineated from the background. In these masks, white pixels represent the fruit, while black pixels represent the background. The masks provide a precise annotation, which is essential for accurate segmentation tasks.

Data collection techniques
• Capture Devices: Images were taken using various cameras to capture a broad spectrum of lighting conditions, angles, and backgrounds. This approach ensures a realistic representation of fruits in different environments.
• Environmental Conditions: Images were captured in both natural and controlled settings, reflecting a wide array of scenarios that fruits may be subjected to in real-world conditions. This includes different times of day, varying light intensities, and diverse geographical locations.

• Segmentation Masks:
The mask creation process for the "FruitSeg30_Segmentation Dataset & Mask Annotations" dataset involves automated background removal and manual verification to ensure high accuracy and reliability. The primary tool used for this process is the 'rembg' library, which leverages advanced machine learning techniques to separate the foreground (fruit) from the background and create precise binary masks. Each image I is loaded using the Python Imaging Library (PIL).
The 'rembg' library is applied to remove the background, producing an image I_no_bg with the fruit in the foreground and a transparent background.
A binary mask M is created from the processed image. The mask is a grayscale image where the fruit is represented by white pixels (value 255) and the background by black pixels (value 0). This is achieved by extracting the alpha channel α of I_no_bg.
Then, the processed image I_no_bg and the binary mask M are saved in the respective directories. The entire process of generating segmented masks is illustrated in Algorithm 1.
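The alpha-channel-to-mask step described above can be sketched as follows. This is a minimal illustration, not the authors' exact code: the 'rembg' background-removal call is assumed to have already produced an RGBA image I_no_bg, so a small RGBA array is fabricated here to keep the sketch self-contained.

```python
# Sketch of the alpha-channel -> binary-mask step. Background removal
# itself (rembg's output) is assumed; a tiny RGBA image stands in for it.
from PIL import Image
import numpy as np

def alpha_to_binary_mask(rgba_image: Image.Image, threshold: int = 0) -> Image.Image:
    """Extract the alpha channel and threshold it to a 0/255 grayscale mask M."""
    alpha = np.array(rgba_image.getchannel("A"))
    mask = np.where(alpha > threshold, 255, 0).astype(np.uint8)
    return Image.fromarray(mask, mode="L")

# Fabricated example: a 4x4 RGBA image whose left half is opaque (the "fruit").
rgba = np.zeros((4, 4, 4), dtype=np.uint8)
rgba[:, :2, 3] = 255  # opaque alpha on the left half
mask = alpha_to_binary_mask(Image.fromarray(rgba, mode="RGBA"))
print(np.array(mask).tolist())  # left columns 255 (fruit), right columns 0
```

In the actual pipeline, the mask would then be saved as a PNG alongside the processed image.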

Preprocessing
The "FruitSeg30_Segmentation Dataset & Mask Annotations" dataset comprises a meticulously curated collection of 1969 images across 30 distinct fruit classes, selected from various environments to ensure a representative and diverse sample for the purpose of training a deep learning model. Each image and its corresponding mask were resized to a consistent resolution of 512 × 512 pixels to standardize input processing across the dataset. Furthermore, the pixel values were normalized to the range [0, 1] by dividing by 255, ensuring data scaling and improving deep learning model training consistency. Fig. 3 represents the entire procedure of the dataset collection and processing.
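The two preprocessing steps above (resizing and normalization) can be sketched in a few lines. This is an illustrative sketch rather than the authors' code; file loading is replaced by a synthetic image.

```python
# Preprocessing sketch: resize to 512x512, then scale pixel values to [0, 1].
from PIL import Image
import numpy as np

TARGET_SIZE = (512, 512)

def preprocess(image: Image.Image) -> np.ndarray:
    """Resize to 512x512 and normalize pixel values by dividing by 255."""
    resized = image.resize(TARGET_SIZE, Image.BILINEAR)
    return np.asarray(resized, dtype=np.float32) / 255.0

# Example on a synthetic 300x200 RGB image with constant value 128.
dummy = Image.fromarray(np.full((200, 300, 3), 128, dtype=np.uint8))
x = preprocess(dummy)
print(x.shape, float(x.min()), float(x.max()))
```

The same function would be applied to both images and masks so that every input to the network shares one shape and scale.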

Segmentation using U-Net
The U-Net model [13] employed for segmentation consists of an encoder-decoder architecture with skip connections. The encoder (contracting path) reduces the spatial dimensions while increasing the feature dimensions, and the decoder (expansive path) progressively recovers the spatial resolution and detail.

U-Net architecture
• Encoder: The contracting path includes four levels of double convolutional layers, each followed by a max-pooling layer. Each convolutional layer uses a 3 × 3 kernel with ReLU activation. The sequence of feature channels across the encoder is as follows: 64 → 128 → 256 → 512. Max-pooling is performed with a 2 × 2 window and stride of 2, reducing the spatial dimensions by half after each level.
• Bottleneck: At the bottom of the U-Net, a bridge connects the contracting and expansive paths, consisting of two 3 × 3 convolutions with 1024 filters each, followed by ReLU activation. This section is crucial as it processes the most compressed feature representation.
• Decoder: The expansive path mirrors the encoder, using transposed convolutions for upsampling. At each stage, the feature map is upsampled, followed by a concatenation with the corresponding cropped feature map from the encoder path, maintaining high-resolution features for precise localization. The layers in the decoder are structured as follows: 512 → 256 → 128 → 64.
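The channel and spatial-size progression described above can be traced with a small shape-bookkeeping sketch (not a trainable model). The 512 × 512 input size is taken from the dataset description; 'same' padding in the convolutions is an assumption.

```python
# Shape-bookkeeping sketch of the U-Net described above: it traces spatial
# size and channel count through encoder, bottleneck, and decoder.

def unet_shapes(size=512):
    encoder_channels = [64, 128, 256, 512]
    trace = []
    for ch in encoder_channels:
        trace.append(("enc_conv", size, ch))       # two 3x3 convs ('same' padding assumed)
        size //= 2                                 # 2x2 max-pool, stride 2
        trace.append(("pool", size, ch))
    trace.append(("bottleneck", size, 1024))       # two 3x3 convs, 1024 filters
    for ch in reversed(encoder_channels):
        size *= 2                                  # transposed-conv upsampling
        trace.append(("up+concat", size, ch * 2))  # skip concatenation doubles channels
        trace.append(("dec_conv", size, ch))       # two 3x3 convs back to ch
    return trace

for stage, s, c in unet_shapes():
    print(f"{stage:10s} {s:4d}x{s:<4d} channels={c}")
```

Note the symmetry: the decoder returns to 512 × 512 with 64 channels, from which a final 1 × 1 convolution would produce the binary mask.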

Loss function and optimization
The model optimization was performed using the Adam optimizer [14] with a learning rate of 1 × 10⁻⁴. A custom dice loss [15] function was utilized, defined as:

L_dice = 1 − (2 Σ(y_true · y_pred) + 1) / (Σ y_true + Σ y_pred + 1)

This loss function is particularly effective for data with imbalanced classes, as it helps achieve a balance between precision and recall by maximizing the overlap between predicted and true masks. Algorithm 2 provides a detailed overview of the applied U-Net model.
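A NumPy sketch of the smoothed dice loss is shown below; the smoothing constant of 1 corresponds to the "+1" terms in the standard smoothed formulation. This is an illustrative re-implementation, not the authors' training code.

```python
# NumPy sketch of the smoothed dice loss:
#   L = 1 - (2*sum(y_true*y_pred) + 1) / (sum(y_true) + sum(y_pred) + 1)
import numpy as np

def dice_loss(y_true, y_pred, smooth=1.0):
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
    return 1.0 - dice

# Perfect overlap gives loss 0; disjoint masks give a loss near 1.
a = np.array([1, 1, 0, 0])
print(dice_loss(a, a))      # 0.0
print(dice_loss(a, 1 - a))  # 0.8  (= 1 - 1/5 with smoothing)
```

Because the loss depends only on the overlap ratio, it is insensitive to the large number of background pixels that dominate binary segmentation masks, which is why it suits imbalanced data.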

Training, validation and analysis
The U-Net model was methodically trained over 100 epochs using a batch size of 8. Several strategies were employed to mitigate the risk of overfitting, which was particularly important due to the model's complexity and the detailed nature of image segmentation tasks. Callbacks for early stopping were set to monitor the validation loss, terminating training if the loss did not improve for a specified number of epochs. This approach ensures that the model retains its generalization capabilities and does not merely memorize the training data. Additionally, a model checkpoint strategy was implemented to save the model weights at the epoch where it achieved the lowest validation loss, thereby capturing the most effective version of the model during the training process. Table 1 provides a summary of the performance of the applied U-Net model on our dataset, the "FruitSeg30_Segmentation Dataset & Mask Annotations". The model processes images with an input specification of (256, 256, 3), enabling detailed feature extraction crucial for precise image segmentation. 1579 samples were trained with an 80:20 train/validation split over 100 epochs, achieving an outstanding training and validation accuracy of 94.72 % and 92.57 %, respectively. The model's precision was 94 %, recall 91 %, F1-score 92.5 %, and IoU score 86 %, indicating high accuracy and balanced performance in segmentation tasks. These impressive metrics highlight the robustness and effectiveness of the "FruitSeg30" dataset, which includes 30 distinct fruit classes with comprehensive segmentation masks. Furthermore, the outputs (Table 2) from the U-Net model applied to our dataset demonstrate a broad spectrum of segmentation capabilities, with varying dice scores across different fruit types that underscore the dataset's robustness and diversity. The scores highlight the dataset's correctness in presenting challenging scenarios involving reflective surfaces and complex textures, providing invaluable opportunities for rigorous model testing and enhancement. The scores also illustrate the dataset's comprehensive coverage of textural and color complexities, making it an excellent tool for advancing segmentation techniques across varied conditions. This dataset not only includes a wide array of fruit types, each presenting a unique challenge in terms of shape, texture, and background conditions, but it also enriches the field of machine learning by introducing real-world complexities that are crucial for developing robust models.
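The early-stopping and best-checkpoint strategy described above can be captured in a short pure-Python sketch of the control flow (in practice this is what Keras's EarlyStopping and ModelCheckpoint callbacks do). The patience value here is an illustrative assumption, not a figure reported in the paper.

```python
# Sketch of early stopping + best-epoch checkpointing over validation losses.

def train_with_early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) given per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # save checkpoint here
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # terminate training early
    return len(val_losses) - 1, best_epoch

# Loss improves for three epochs, then stalls: training stops, and the
# weights from the best epoch (lowest validation loss) are kept.
stop, best = train_with_early_stopping([0.9, 0.7, 0.6, 0.65, 0.66, 0.7])
print(stop, best)  # stops at epoch 5; best weights are from epoch 2
```

Restoring the epoch-2 checkpoint at the end is what "capturing the most effective version of the model" refers to.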

Confusion matrix analysis
The confusion matrix presented in Fig. 5 provides a detailed analysis of the U-Net model's performance on the validation set of the dataset, which comprises 390 images across 30 different fruit classes. Each cell in the matrix represents the number of instances where the actual class (true label) matches the predicted class, with diagonal elements indicating correct predictions and off-diagonal elements indicating misclassifications.
The diagonal elements, which represent the correct classifications for each fruit class, show high values, reflecting the model's high accuracy. For instance, the model correctly identified 12 out of 13 Apple_Gala images, 11 out of 11 Apple_Golden Delicious images, and 13 out of 14 Avocado images. Similar high accuracies are observed across most classes, highlighting the model's robustness and reliability. Misclassification, represented by the off-diagonal elements, is relatively sparse, indicating that the model rarely confuses one fruit class with another. For example, there is only one misclassification each for Apple_Gala, Avocado, and Date Palm, and two for Olive, which was incorrectly predicted as another class. Such low misclassification rates further underline the effectiveness of the FruitSeg30 dataset in training accurate segmentation models.
The confusion matrix also emphasizes the model's ability to generalize well across diverse fruit classes. This strong performance can be attributed to the high-quality annotations and diverse environmental conditions captured in the dataset, which include variations in lighting, angles, and backgrounds.

Hardware and software
Experiments were conducted on a computing system equipped with a Ryzen 7 3800X processor, 32 GB of RAM, and an NVIDIA RTX 4070 GPU. The software stack was managed via Anaconda, facilitating an organized package management and deployment environment. The model was developed and tested using the Spyder IDE, part of the Anaconda suite, which provided an efficient and user-friendly interface for writing and debugging Python code. The model was implemented using TensorFlow 2.x and Keras, providing a flexible and powerful platform for designing and training deep learning models.

Table 2
Overview of U-Net outputs displaying the original image with true mask, predicted segmented image, and dice scores.

Analysis of the Model Performance and Dataset Characteristics
The comparative analysis in Table 3 demonstrates the robustness and high performance of the U-Net model trained on the FruitSeg30 dataset. The U-Net model achieved an accuracy of 92.57 %, precision of 94 %, recall of 91 %, F1-score of 92.5 %, and an IoU score of 86 %. These metrics are competitive with and often surpass those achieved with other datasets. For instance, M.D. Barbole et al. [25] reported a precision of 95 % and an F1-score of 90.28 % using the U-Net model on the Embrapa Wine Grape Instance Segmentation Dataset (WGISD). In comparison, our study achieved a higher F1-score of 92.5 % and comparable precision of 94 % using the U-Net model on our novel dataset. Similarly, the IoU score of 86 % on the FruitSeg30 dataset is significantly higher than the 71.6 % reported by X. Ni et al. [27] using Mask R-CNN on the blueberry traits extraction and analysis dataset. Additionally, the comparison shows that while the total generalized variation fuzzy C-means (TGVFCMS) model by V. G. Krishnan et al. [26] achieved a high accuracy of 93.45 %, it lacks reported precision, F1-score, and IoU, making it difficult to assess the overall segmentation performance comprehensively. In contrast, the FruitSeg30 dataset's comprehensive metrics provide a clearer view of its effectiveness.
Table 4 provides a comprehensive analysis of the "FruitSeg30_Segmentation Dataset & Mask Annotations" in relation to other datasets within the field, highlighting its distinctive attributes and strengths. This dataset stands out significantly compared to existing collections due to its diverse geographical coverage, extensive class variety, and superior annotation quality. It is distinguished from other datasets by its international scope, which further enhances its robustness and applicability in various global contexts, a critical factor in the development of universally effective machine learning models. Although it comprises 1969 images, which may seem modest relative to larger datasets, it uniquely covers 30 classes, thereby offering a broader range of categories for more comprehensive image analysis tasks. A distinguishing feature of this dataset is the inclusion of segmentation masks, which are invaluable for precise image segmentation tasks and are paralleled by only a few other datasets. The images are also provided at a resolution of 512 × 512 pixels, which is particularly suitable for practical machine learning applications that require moderate image detail without the computational burden associated with higher resolutions. This resolution balances the need for detail with computational efficiency. Consequently, this dataset not only satisfies but also surpasses the conventional standards for dataset construction, providing a versatile instrument for advancing research in image processing and machine learning in various realistic environments.

Limitations
The FruitSeg30 dataset, although characterized by a variety of unique attributes, is limited by its relatively small size, consisting of 1969 images. To maintain the original quality and integrity of the dataset, it was kept in its unmodified, raw form without any augmentation. This limitation may restrict the generalizability of models trained on this dataset. To address this issue, researchers can employ various data augmentation techniques, including rotations, flips, scaling, cropping, and color adjustments. These methods help artificially expand the dataset's size. Additionally, synthetic data generation methods such as Generative Adversarial Networks (GANs) can be used to create more training samples. Despite its limitations, the dataset's high quality, detailed annotations, and diverse environmental conditions significantly enhance its robustness and applicability for fruit segmentation tasks.
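For segmentation data, the augmentation techniques mentioned above must be applied identically to each image and its mask so that the annotation stays aligned. The following is a minimal sketch of that idea using flips and 90-degree rotations; the function name and parameters are illustrative, not part of the dataset's tooling.

```python
# Augmentation sketch: apply the SAME random flip and 90-degree rotation
# to an image and its mask so the segmentation annotation stays aligned.
import numpy as np

def augment_pair(image, mask, rng):
    """Randomly flip and rotate (by multiples of 90 degrees) both arrays."""
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    k = int(rng.integers(0, 4))  # 0, 90, 180, or 270 degrees
    return np.rot90(image, k), np.rot90(mask, k)

rng = np.random.default_rng(0)
img = np.arange(16).reshape(4, 4)
msk = (img > 7).astype(np.uint8)  # toy "mask" derived from the toy image
aug_img, aug_msk = augment_pair(img, msk, rng)
print(aug_img.shape, aug_msk.shape)
```

Because both arrays receive the same geometric transform, the per-pixel correspondence between image and mask is preserved, which is the invariant any segmentation augmentation pipeline must maintain.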

Ethics Statement
This article does not involve any research on human or animal subjects by any authors. The datasets used are publicly accessible. It is essential to adhere to established citation guidelines when employing the datasets.

Fig. 1 .
Fig. 1. Representative images from each of the 30 classes in the FruitSeg30 dataset.

Fig. 2 .
Fig. 2. Overview of the dataset's number of images and segmentation masks.

Fig. 5 .
Fig. 5. Confusion matrix illustrating the U-Net model's performance on the validation set of the dataset, highlighting correct classifications and misclassifications across 30 fruit classes.

Table 1
Parameters and performance summary of the U-Net model.

Table 3
Comparison of the model's performance on the FruitSeg30 dataset with other existing datasets.

Table 4
A comparative analysis of the related datasets based on various properties.