DeepFruit: A dataset of fruit images for fruit classification and calories calculation

A dataset of fully labeled images of 20 different kinds of fruits has been developed for research on the detection, recognition, and classification of fruits. Applications range from fruit recognition to calorie estimation and other innovative uses. The dataset gives researchers the opportunity to develop automatic systems for detecting and recognizing fruits in images using deep learning, computer vision, and machine learning algorithms. The main contribution is a very large dataset of fully labeled images that is publicly accessible and available to all researchers free of charge. The dataset, called “DeepFruit”, consists of 21,122 fruit images covering 8 different fruit set combinations. Each image contains a different combination of four or five fruits. The fruit images were captured on plates of different sizes, shapes, and colors, with varying angles, brightness levels, and distances. Because the images were captured at various angles and distances, they can be cleaned up with preprocessing techniques such as noise removal, centering of the image, and others. Preprocessing such as image rotation and cropping, scale normalization, and others was applied to the dataset to make the images uniform. The dataset is randomly partitioned into an 80% training set (16,899 images) and a 20% testing set (4,223 images). The dataset along with the labels is publicly accessible at: https://data.mendeley.com/datasets/5prc54r4rt.

Value of the data
• This dataset is useful for fruit recognition and calorie estimation from images, which can be helpful for diet control [1][2][3]. The dataset contains images of different combinations of fruits, which makes it possible to develop multi-type fruit identification models. These models can be used in health monitoring applications to track fruit intake and estimate calories.
• The data can be used by machine learning researchers and companies to develop models for recognizing different fruits [4][5].
• Current research trends in deep learning and machine learning mainly target the development of applications for everyday use, such as face recognition and fingerprint recognition, or applications in healthcare, engineering, and many other fields. Image recognition applications usually go through different phases, from preprocessing to recognition, to automate tasks that are usually done by people. Preprocessing may consist of processes such as data cleaning, dimensionality reduction, resizing, and labeling. Recognition is the process of categorizing the objects of interest. This is a general process used in machine learning. The Fruits images dataset can be used or reused as a unique resource by researchers developing machine learning applications for fruit recognition, daily diet intake plans, education about the nutrition facts of different fruits, and other related purposes.
• The DeepFruit dataset, with its different combinations, is a comprehensive repository of fully labeled images of 20 different types of fruits for developing automated applications related to fruit recognition and diet plans.
• The Fruits images dataset serves as a base that researchers can enhance and develop further by producing more images with more variations. Researchers can add more classes to the dataset, increase the number of images per class, increase the complexity of the images, etc. Researchers can then develop algorithms and applications that consumers can use for dietary or other purposes.
• Using existing nutritional databases, it is easy to calculate the calories once the number and types of fruit are identified, as every fruit has a specific calorie content [6]. The following are general steps to calculate the calories of a fruit plate:
1. Determine the types and number of fruits in the fruit plate;
2. Use a calorie database to find the calorie content of each type of fruit in the fruit plate;
3. Multiply the calorie content of each fruit by its portion size on the plate;
4. Add up the calorie values of all the fruits in the fruit plate to get the total calorie count of the plate.
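The calorie calculation steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset or its tooling: the function name `plate_calories`, the table `CALORIES_PER_PIECE`, and the calorie values in it are hypothetical placeholders, and real values should be taken from a nutritional database. Portion size is assumed here to be the number of pieces of each fruit.

```python
from typing import Dict

# Hypothetical per-piece calorie values, for illustration only.
# Real values should come from a nutritional database (step 2).
CALORIES_PER_PIECE: Dict[str, float] = {
    "apple": 95.0,
    "banana": 105.0,
    "orange": 62.0,
}


def plate_calories(
    fruit_counts: Dict[str, int],
    calorie_table: Dict[str, float] = CALORIES_PER_PIECE,
) -> float:
    """Return the total calories of a plate.

    fruit_counts maps each recognized fruit type to the number of
    pieces on the plate (step 1).
    """
    total = 0.0
    for fruit, count in fruit_counts.items():
        if fruit not in calorie_table:
            raise KeyError(f"no calorie entry for fruit: {fruit}")
        # Step 3: calorie content of the fruit times its portion size.
        total += calorie_table[fruit] * count
    # Step 4: the sum over all fruits is the plate total.
    return total


# Example plate: two apples and one banana.
example_total = plate_calories({"apple": 2, "banana": 1})
```

In practice, the `fruit_counts` input would come from a recognition model trained on the dataset, and the lookup table from whichever nutritional database is used.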

Objective
The existing Fruits dataset (Fruits 360 [7]) contains a smaller number of images, lacks variation, and does not contain multiple fruit sets in a single image. In recent literature [1][2][3][4][5], fruit recognition is mainly done using a single fruit per image rather than multiple types of fruits in a single image. Therefore, there is a need to develop a comprehensive dataset containing a variety of fruits with different fruit set combinations, captured on plates of different sizes, shapes, and colors with varying angles, brightness levels, and distances. Further, the fruit datasets used in recent studies are not publicly available for future research.

[...] columns 7 to 21 have a value of 0, meaning those fruits are not present in the image. Similarly, the "Fruits_Dataset_Train" directory has the fruit images for training, and their labels are provided in "Labels_Train.csv". The "Sample_Images" directory contains sample fruit images for a quick review before downloading the complete dataset. The dataset is comprehensive and proved sufficient for training and classification, based on an initial accuracy of 94.72% using a Convolutional Neural Network model (GoogleNet) [9]; detailed experimental results will be published in a separate journal/conference paper. The current version of the published dataset can be used as is, while future versions may include more image variations, fruit types, and numbers. The fruits images dataset is not without limitations: insufficient light and the restriction to 20 different types of fruits, rather than all existing fruit classes, are limitations that will be addressed in future versions of the dataset.
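The label convention mentioned above, where a value of 0 in a fruit's column means that fruit is not present in the image, could be read as in the following sketch. The column names (`FileName`, `Mango`, `Grape`, `Plum`), the sample rows, and the helper `fruits_present` are hypothetical and chosen only for illustration; the actual layout of "Labels_Train.csv" should be checked against the published files.

```python
import csv
import io
from typing import Dict, List


def fruits_present(row: Dict[str, str], fruit_columns: List[str]) -> Dict[str, int]:
    """Return the fruit columns of one label row whose value is nonzero.

    A value of 0 (or an empty cell) is treated as "fruit not present".
    """
    present: Dict[str, int] = {}
    for name in fruit_columns:
        value = int(row.get(name) or 0)
        if value != 0:
            present[name] = value
    return present


# Hypothetical label rows; the real CSV layout is an assumption here.
SAMPLE_CSV = """FileName,Mango,Grape,Plum
img_0001.jpg,1,0,2
img_0002.jpg,0,3,0
"""

FRUIT_COLUMNS = ["Mango", "Grape", "Plum"]

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
labels = [fruits_present(r, FRUIT_COLUMNS) for r in rows]
```

To use this with the real labels, the in-memory sample would be replaced by an opened CSV file and `FRUIT_COLUMNS` by the actual fruit column headers.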

Experimental design, materials, and methods
DeepFruit is a fully labeled Fruits images dataset captured with different combination sets. The dataset was developed at Prince Mohammad Bin Fahd University. Students volunteered to capture and label the images (under the supervision of the authors), which were taken using a smartphone camera (Samsung Galaxy S10) with the default settings: resolution 3840 × 2160 and JPG image format. The authors also double-checked each image after the students to ensure that it was within the scope and specifications of this research. The images were taken at different times of the day with different light conditions (bright light, low light, backlighting, with flash, and without flash), positions, and distances (between 1 and 2 feet from the fruit plate). The number of images per fruit type varies; the dataset consists of 21,122 images in total. The newly developed dataset is stored as 3-channel RGB images with different dimensions and variations.
The following steps can be followed to reproduce the dataset:
1. Collect different fruits (refer to Table 1).
2. Collect different plates with varying designs and sizes.
3. Place the fruits randomly on different plates with varying types of fruits (3 to 5 types of fruits).

Fig. 1. Sample fruit images from the dataset with different combination sets (total of 21,122 images).

Table 1
Summary of the newly constructed Fruits Images dataset (DeepFruit).