A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning

Plants are as vulnerable to diseases as animals. Citrus is a major plant grown mainly in the tropical areas of the world and is valued for its richness in vitamin C and other important nutrients. Citrus fruit production has been widely affected by citrus diseases, which degrade fruit quality and cause financial loss to growers. During the past decade, image processing and computer vision methods have been broadly adopted for the detection and classification of plant diseases. Early detection of diseases in citrus plants helps prevent them from spreading in the orchards, which minimizes financial loss to farmers. In this article, an image dataset of citrus fruits, leaves, and stems is presented. The dataset holds images of citrus fruits and leaves from healthy plants and from plants infected with diseases such as Black spot, Canker, Scab, Greening, and Melanose. Most of the images were captured in December in orchards in the Sargodha region of Pakistan, when the fruit was about to ripen and most diseases were present on citrus plants. The dataset is hosted by the Department of Computer Science, University of Gujrat, and was acquired through the mutual cooperation of the University of Gujrat and the Citrus Research Center, Government of Punjab, Pakistan. The dataset could help researchers who use machine learning and computer vision algorithms to develop applications that help farmers detect plant diseases early. The dataset is freely available at https://data.mendeley.com/datasets/3f83gxmv57/2.



Data
Fruit plants play a significant role in the economic growth of any state. One of the best-known fruit plants is citrus, which is rich in vitamin C and widely used in the Middle East and Africa [1]. In agro-industries, some kinds of citrus are used as raw material [2]. Citrus plant diseases are a major cause of reduced production of citrus fruits and of their use in several industries. The most common diseases identified by domain experts and researchers are Greening, Melanose, Downy, Black spot, Canker, Scab, and Anthracnose. These diseases can be identified from their visual symptoms by applying computer vision and deep learning techniques [3]. This article presents a data set containing 759 images of healthy and unhealthy citrus fruits and leaves. Researchers can use it to apply different computer vision and image processing algorithms for the detection of citrus plant diseases. The data set is associated with the articles [4,5]. All images were acquired manually using a DSLR camera, with the help of a domain expert in citrus disease, in the Sargodha region, Pakistan. All images are 256 × 256 pixels at a resolution of 72 dpi. The infected images were classified into four different diseases, separately for citrus fruits and for citrus leaves. The diseases targeted in the data set were Black spot, Canker, Scab, Greening, and Melanose. Table 1 contains the description of the data set against each of its disease classes.
Table 1. Description of the data set against each of its disease classes (columns: Citrus Leaves, Citrus Fruits). Bold represents the total number of disease images captured in citrus leaves and fruits.
Citrus black spot (CBS) appears when the plant is infected and the climate favours the disease. On citrus leaves and fruits, the symptoms are small, round, sunken spots with dark borders. The spot diameter ranges from 0.12 to 0.4 [4]. Canker is spread by wind-driven rain. On citrus leaves, small circular lesions appear, ranging from 2 to 10 mm in size. On citrus fruit, lesions range from 1 to 10 mm, and their sizes vary from one lesion to another [6]. On citrus fruit, scab pustules are composed of both parasitic (fungal) and host tissue. These pustules are not raised, and their color varies from pink to light brown [6].
Melanose is caused by a saprophytic fungus, and disease severity is determined by the total amount of inoculum produced on dead wood in the tree canopy. The disease first appears as small dark spots that become filled with a reddish-brown gum. Symptoms on fruit, however, depend on the age of the fruit at the time of infection [4]. Figs. 1–4 show self-annotated images of healthy and unhealthy citrus fruits and plants.
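A common first step with a class-labelled image collection like this one is to index the files and count the images in each class, for example to compare them with Table 1. The sketch below assumes a hypothetical folder layout, `<root>/<part>/<class>/<image>.jpg` (for instance `Leaves/Melanose/...`). The actual layout of the Mendeley archive may differ. The names `index_dataset`, `count_by_class`, and the folder `citrus_dataset` are illustrative only.

```python
from collections import Counter
from pathlib import Path

# Hypothetical layout (an assumption, not documented by the dataset page):
#   <root>/<part>/<class>/<image>, e.g. <root>/Leaves/Melanose/img_001.jpg
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_dataset(root):
    """Return sorted (path, part, label) tuples for every image two levels below root."""
    records = []
    for path in sorted(Path(root).glob("*/*/*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # part = e.g. "Fruits" or "Leaves"; label = disease (or healthy) folder name
            records.append((path, path.parent.parent.name, path.parent.name))
    return records


def count_by_class(records):
    """Count images per (part, label) pair, e.g. ("Fruits", "Canker")."""
    return Counter((part, label) for _, part, label in records)


if __name__ == "__main__":
    records = index_dataset("citrus_dataset")  # hypothetical local folder
    for (part, label), n in sorted(count_by_class(records).items()):
        print(f"{part:<8} {label:<12} {n}")
    print("total images:", len(records))
```

Counting by the `(part, label)` pair keeps fruit and leaf classes apart, because a disease name may occur under both parts.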

Camera specification
The data set was gathered using a DSLR camera (Canon EOS 1300D) with a CMOS sensor and a resolution of 5202 × 3465 pixels (about 18 Mpix). The sensor size of the Canon EOS 1300D is 14.9 × 22.3 mm. Each image is stored in JPG format in the RGB color space, with 256 levels per RGB channel, i.e., 8 bits per channel.
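Simple arithmetic can cross-check the camera figures. 5202 × 3465 pixels is about 18 Mpix, 256 levels per channel need 8 bits, and dividing each sensor dimension by its pixel count gives the pixel pitch. The helper below is a hypothetical illustration that uses only the numbers quoted above. It also computes the uncompressed size of one 256 × 256 RGB image, ignoring JPG compression.

```python
def camera_summary(width_px=5202, height_px=3465,
                   sensor_w_mm=22.3, sensor_h_mm=14.9,
                   levels=256, out_side=256):
    """Derive simple quantities from the camera specification (hypothetical helper)."""
    bits = (levels - 1).bit_length()  # 256 levels -> 8 bits per channel
    return {
        "megapixels": width_px * height_px / 1e6,
        "bits_per_channel": bits,
        # pixel pitch in micrometres: sensor side length / pixels along that side
        "pixel_pitch_um": (sensor_w_mm / width_px * 1e3, sensor_h_mm / height_px * 1e3),
        # uncompressed bytes for one out_side x out_side RGB image
        "raw_bytes_resized": out_side * out_side * 3 * bits // 8,
    }


if __name__ == "__main__":
    for key, value in camera_summary().items():
        print(key, value)
```

Both pitch values come out near 4.3 µm when 22.3 mm is taken as the long side. This suggests the stated resolution and sensor size are mutually consistent. Each resized image holds 196,608 bytes (192 KiB) of raw RGB data before compression.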

Processing
All images are resized to 256 × 256 pixels with a resolution of 72 dpi. In Ref. [4], the authors use the presented data set for the detection and classification of citrus diseases. Their pipeline comprises four major steps: (a) enhancing the images, (b) lesion segmentation, which highlights the infected region, (c) extracting features from the infected region, and (d) selecting features and performing classification. First, all original input images were filtered with a top-hat operation, and a Gaussian function was then applied to improve the contrast of the infected region. Second, the enhanced images were segmented using weighted segmentation and a saliency map, which identifies the lesion spots. Next, the segmented images were passed to feature extraction and selection algorithms. These extract texture, color, and geometric features, then select among them using skewness, PCA, and entropy methods. Finally, classification assigns each image to its disease class. Fig. 5 shows the steps that make up each process.
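The four steps can be sketched end to end in NumPy. This is a minimal illustration under stated assumptions, not the method of Ref. [4]. Each assumption and substitution is listed here:

- Enhancement adds each channel's white top-hat (the image minus its morphological opening) before Gaussian smoothing. This is one common way to combine the two operations.
- Segmentation uses a frequency-tuned-style saliency map in place of the weighted segmentation of Ref. [4]. The map measures the distance of each blurred pixel from the mean colour and is thresholded at twice the mean saliency.
- Skewness and entropy appear as features rather than as selection criteria, and PCA alone performs the selection.
- A nearest-centroid rule stands in for the classifier.

All function names and parameter values (window size 15, the sigma values, 32 histogram bins) are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_filter(img, size, reducer):
    """Grey-level erosion (np.min) or dilation (np.max) over a size x size window."""
    pad = size // 2
    windows = sliding_window_view(np.pad(img, pad, mode="edge"), (size, size))
    return reducer(windows, axis=(-2, -1))


def white_top_hat(gray, size=15):
    """Image minus its opening: keeps bright details smaller than the window."""
    opened = _window_filter(_window_filter(gray, size, np.min), size, np.max)
    return gray - opened


def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian smoothing of a 2-D array with edge padding."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    conv = lambda v: np.convolve(v, kernel, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)  # filter along each row
    return np.apply_along_axis(conv, 0, rows)    # then along each column


def enhance(rgb, size=15, sigma=1.0):
    """Step (a), assumed form: add each channel's white top-hat, clip, then smooth."""
    out = np.empty_like(rgb)
    for ch in range(rgb.shape[-1]):
        c = rgb[..., ch]
        out[..., ch] = gaussian_blur(np.clip(c + white_top_hat(c, size), 0.0, 1.0), sigma)
    return out


def saliency_mask(rgb, sigma=2.0, factor=2.0):
    """Step (b) stand-in: frequency-tuned-style saliency with an adaptive threshold."""
    blurred = np.stack([gaussian_blur(rgb[..., ch], sigma) for ch in range(rgb.shape[-1])], axis=-1)
    mean_colour = rgb.reshape(-1, rgb.shape[-1]).mean(axis=0)
    sal = np.linalg.norm(blurred - mean_colour, axis=-1)
    return sal > factor * sal.mean()


def region_features(rgb, mask):
    """Step (c): colour moments, grey-level entropy and lesion area of the masked region."""
    pixels = rgb[mask] if mask.any() else rgb.reshape(-1, rgb.shape[-1])
    mu, sd = pixels.mean(axis=0), pixels.std(axis=0)
    skew = ((pixels - mu) ** 3).mean(axis=0) / np.maximum(sd, 1e-12) ** 3
    hist, _ = np.histogram(pixels.mean(axis=1), bins=32, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    entropy = -(p * np.log2(p)).sum()
    return np.concatenate([mu, sd, skew, [entropy, mask.mean()]])


def image_features(rgb):
    """Run steps (a) to (c) on one float RGB image with values in [0, 1]."""
    enhanced = enhance(rgb)
    return region_features(enhanced, saliency_mask(enhanced))


def pca_fit(X, k):
    """Step (d), PCA only: standardise features and keep the top-k principal axes."""
    mean, scale = X.mean(axis=0), X.std(axis=0)
    scale[scale == 0] = 1.0
    _, _, vt = np.linalg.svd((X - mean) / scale, full_matrices=False)
    return mean, scale, vt[:k]


def pca_transform(X, mean, scale, components):
    return ((X - mean) / scale) @ components.T


def fit_centroids(Z, labels):
    """Nearest-centroid classifier (a simple stand-in, not the classifier of Ref. [4])."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    return classes, np.stack([Z[labels == c].mean(axis=0) for c in classes])


def predict(Z, classes, centroids):
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]
```

A typical run would build `X = np.stack([image_features(img) for img in images])`, where `images` are float RGB arrays in [0, 1] such as 256 × 256 × 3. It would then fit `pca_fit` on the training rows and apply `fit_centroids` and `predict` to the projected features. The window-based filters keep the sketch dependent on NumPy alone. A library such as SciPy or scikit-image would be faster on full-size data.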