FruitNet: Indian fruits image dataset with quality for machine learning applications

Fast and precise classification or recognition of fruits according to their quality is an unmet need of the agriculture business. It remains an open research problem that continues to attract researchers. Machine learning and deep learning techniques have shown very promising results for classification and object detection problems. A clean, well-organized dataset is an elementary requirement for building accurate and robust machine learning models for real-time environments. With this objective, we have created an image dataset of Indian fruits that are highly consumed or exported, annotated with their quality. Accordingly, we have considered six fruits, namely apple, banana, guava, lime, orange, and pomegranate. The dataset is divided into three folders: (1) good quality fruits, (2) bad quality fruits, and (3) mixed quality fruits. Each folder consists of six fruit subfolders. In total, more than 19,500 processed images are available in the dataset. We believe that the proposed dataset will be very helpful for training, testing, and validating machine learning models for fruit classification or recognition.


Value of the Data
• The dataset is comprehensive, consisting of more than 19,500 high-quality images of six different fruit classes.
• The dataset consists of good quality, bad quality, and mixed quality fruit images.
• To the best of our knowledge, this is the first open access dataset of Indian fruits that includes good, bad, and mixed quality fruits.
• This dataset is useful for building applications that classify and detect fruits by quality.
• The dataset will be useful for training, testing, and validating fruit classification or recognition models.
• The dataset is useful for building quality-aware fruit classification applications, which benefit farmers, agriculture industries, wholesalers, hawkers, customers, and fruit export companies.
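Because the dataset is intended for training, testing, and validation, users will need to partition it. The following is a minimal sketch of a stratified split, assuming each image has already been paired with a label such as its quality and fruit folder. Each label group is shuffled and split separately, so every class appears in each partition in roughly the same proportion. The function name stratified_split, the 70/15/15 default ratios, and the fixed seed are illustrative assumptions; the dataset itself does not prescribe a split.

```python
import random
from collections import defaultdict


def stratified_split(items, ratios=(0.7, 0.15, 0.15), seed=0):
    """Split (path, label) pairs into train/val/test lists, per label.

    Hypothetical helper: the ratios and seed are illustrative defaults.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    # Group items by their label so each class is split independently.
    by_label = defaultdict(list)
    for item in items:
        by_label[item[1]].append(item)
    rng = random.Random(seed)  # local RNG keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        # Whatever remains after rounding down goes to the test set.
        test += group[n_train + n_val:]
    return train, val, test
```

Splitting per label, rather than shuffling the whole list once, avoids a small class ending up missing from the validation or test partition.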

Data Description
The fruit market accounts for a substantial percentage share of profit relative to the total agricultural output [1][2][3]. In the agro-industry, fast and accurate fruit classification is a pressing need. Fruits can be classified into different classes according to external features such as shape, size, and color using computer vision and deep learning techniques [4][5][6][7][8]. The FruitNet dataset was created to cover, together with their quality, the Indian fruits that are highly consumed or exported according to [9]. It consists of six classes of Indian fruits, namely apple, banana, guava, lime, orange, and pomegranate, which are further categorized into good quality, bad quality, and mixed quality. The fruit images were taken against different backgrounds and under different lighting conditions in indoor and outdoor environments. Fig. 1 shows sample images from the dataset taken in these various environments.
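Because the labels are encoded by the folder hierarchy (quality folder, then fruit subfolder), a training pipeline can recover them from the paths alone. The following is a minimal sketch, assuming the layout root/<quality folder>/<fruit folder>/<image file>. It deliberately does not hard-code folder names, since the exact names in the released archive are not given here. The function name index_dataset and the accepted file extensions are hypothetical.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the downloaded files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_dataset(root):
    """Return (image_path, quality_folder, fruit_folder) tuples.

    Hypothetical helper; assumes root/<quality>/<fruit>/<image>.
    """
    records = []
    # Top-level directories are the quality groups (e.g. good, bad, mixed).
    for quality_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Second-level directories are the six fruit classes.
        for fruit_dir in sorted(p for p in quality_dir.iterdir() if p.is_dir()):
            for img in sorted(fruit_dir.iterdir()):
                # Skip stray non-image files such as notes or thumbnails.
                if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES:
                    records.append((img, quality_dir.name, fruit_dir.name))
    return records
```

The quality and fruit names can then be combined into a single class label or used as two separate targets, depending on whether the task is fruit recognition, quality grading, or both.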

Experimental design
The image data acquisition process is shown in Fig. 2. The fruit images were acquired using the high-resolution rear cameras of three different mobile phones, i.e. the iPhone 6 (Apple), ZUK Z2 Plus, and Realme 5 Pro. In all, more than 19,500 images were captured and then segregated and saved in their respective folders according to quality and fruit class.
The steps of the data acquisition process are listed in Table 1. The fruit images were captured from July to October under natural and artificial lighting conditions, from different angles and against different backgrounds. Image pre-processing was done using a Python script that resized the images to 256 × 256, a resolution commonly used as input for image classification and object detection models.

Table 1
Data acquisition process.
Step | Duration | Activity
1. Data gathering | July to October | Fruit images captured daily under natural and artificial light, from different angles and against different backgrounds.
2. Pre-processing and dataset creation | November | A Python script was run to pre-process the images (resize all images to 256 × 256) and save them into their respective folders according to quality (i.e. bad, good, and mixed) and fruit class.
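The pre-processing step in Table 1 can be sketched as follows. This is a minimal sketch, not the authors' script. It assumes Pillow (PIL) is installed and that the raw images are already sorted into root/<quality folder>/<fruit folder>/ subfolders. It also uses bicubic resampling, which is an assumption because the text does not name the filter. The function names plan_outputs and resize_all are hypothetical. Pillow is imported inside resize_all, so the path-planning step can run without it.

```python
from pathlib import Path

TARGET_SIZE = (256, 256)  # output resolution stated in the text
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed input extensions


def plan_outputs(src_root, dst_root):
    """Pair each source image with its destination under dst_root.

    The relative <quality>/<fruit>/<file> path is kept unchanged, so the
    folder layout of the processed copy mirrors the raw one.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    pairs = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            pairs.append((src, dst_root / src.relative_to(src_root)))
    return pairs


def resize_all(src_root, dst_root, size=TARGET_SIZE):
    """Resize every planned image to `size` and save it; returns the count."""
    from PIL import Image  # imported lazily; requires Pillow

    pairs = plan_outputs(src_root, dst_root)
    for src, dst in pairs:
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as img:
            # convert("RGB") drops alpha/palette modes so JPEG saving works;
            # bicubic filtering is an assumption, not the authors' stated choice.
            img.convert("RGB").resize(size, Image.BICUBIC).save(dst)
    return len(pairs)
```

For example, resize_all("raw", "FruitNet_256") would write the resized copies under FruitNet_256 (both folder names are hypothetical). Resizing directly to 256 × 256 preserves the aspect ratio only for square inputs such as the 3024 × 3024 originals. Non-square images would be stretched unless they are cropped first.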

Method
All fruit images were acquired using the high-resolution rear cameras of three mobile phone models, from different angles and against different backgrounds. The original images, of size 3024 × 3024, were resized to 256 × 256 using a Python script. Table 4 describes the classes, the number of images taken, and the environments in which the images were taken.
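Table 4 reports the number of images per class. The following minimal sketch tallies images per quality folder and fruit subfolder, for example to cross-check a downloaded copy against Table 4. It assumes the same root/<quality folder>/<fruit folder>/<image> layout. The function name count_images and the default extensions are hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_images(root, suffixes=(".jpg", ".jpeg", ".png")):
    """Count image files per (quality folder, fruit folder) pair.

    Hypothetical helper; assumes the layout root/<quality>/<fruit>/<image>.
    """
    counts = Counter()
    # "*/*/*" matches entries exactly three levels below root.
    for path in Path(root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            counts[(path.parent.parent.name, path.parent.name)] += 1
    return counts
```

A call such as sorted(count_images("FruitNet").items()) (root folder name hypothetical) lists the counts in folder order. Summing the values gives the overall total for comparison with the 19,500+ figure.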

Ethics Statement
This work received no funding. There is no conflict of interest. The data are publicly available.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.