Sugarcane leaf dataset: A dataset for disease detection and classification for machine learning applications

Sugarcane, a vital crop for the global sugar industry, is susceptible to various diseases that significantly affect its yield and quality. Accurate and timely disease detection is crucial for effective management and prevention strategies. We present the “Sugarcane Leaf Dataset”, consisting of 6748 high-resolution leaf images classified into nine disease categories, a healthy leaves category, and a dried leaves category. The dataset covers smut, yellow leaf disease, pokkah boeng, mosaic, grassy shoot, brown spot, brown rust, banded chlorosis, and sett rot. The dataset is well suited for reuse: it serves as a resource for researchers and practitioners developing machine learning algorithms for disease detection and classification in sugarcane leaves. Techniques such as deep learning, feature extraction, and pattern recognition can be applied to it to improve the accuracy and efficiency of automated sugarcane disease identification systems. The open availability of the dataset encourages collaboration within the scientific community, expediting research on disease control strategies and the improvement of sugarcane production. Using the “Sugarcane Leaf Dataset”, researchers can advance disease detection, monitoring, and management in sugarcane cultivation, supporting better agricultural practices and higher crop yields.

© 2024 The Author(s).

Data Description
Image datasets play a crucial role in many fields, ranging from computer vision and machine learning to medical research and the social sciences [1]. They provide a rich source of visual information that enables researchers, developers, and practitioners to train and validate models, algorithms, and theories. Access to diverse, well-curated image datasets lets researchers explore new possibilities, improve the accuracy and robustness of their models, and gain insight into patterns, trends, and relationships within visual data [6]. An image dataset specific to sugarcane leaf diseases is particularly important in the agricultural domain. Such a dataset gives researchers, agronomists, and farmers a resource for identifying, classifying, and studying the leaf diseases that affect sugarcane crops [2]. By analysing these images, experts can develop more accurate disease detection algorithms and early warning systems, which support prompt disease management and help prevent widespread crop damage and yield loss. A comprehensive dataset also allows the exploration of disease patterns, environmental factors, and potential mitigation strategies. In summary, a sugarcane leaf disease image dataset can advance research, improve crop management practices, and support the overall health and productivity of sugarcane crops [3].
The Sugarcane Leaf Dataset contains a diverse collection of 6748 images of sugarcane leaves. The images are stored in JPEG format with dimensions of 768 × 1024 pixels. The dataset is divided into 11 classes: nine disease categories, a healthy leaves category, and a dried leaves category (Fig. 1). The disease categories cover common sugarcane leaf diseases: smut, yellow leaf disease, pokkah boeng, mosaic, grassy shoot, brown spot, brown rust, banded chlorosis, and sett rot (Table 1). Each category is stored in its own labelled folder, so samples of a specific disease are easy to locate. The images were collected through extensive field surveys in sugarcane-growing regions. They were captured with a smartphone camera from various angles, including both sides of the leaves, both in the field and from cut or detached individual leaves, so that different stages and manifestations of each disease are represented. This approach gives the dataset a comprehensive representation of the visual characteristics of sugarcane leaf diseases. The image files carry a resolution tag of 72 dots per inch (dpi); the level of visual detail is determined by the 768 × 1024 pixel dimensions rather than by this tag.
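Because each class is stored in its own folder, the folder name can serve directly as the class label. The following is a minimal sketch, assuming the downloaded archive is unpacked so that the class folders sit directly under one root directory; the function names `index_dataset` and `count_per_class` are hypothetical helpers, not part of the published dataset.

```python
from collections import Counter
from pathlib import Path

# The article states that images are JPEG files; both common suffixes are accepted.
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def index_dataset(root):
    """Return a sorted list of (image_path, class_name) pairs.

    Assumes one sub-folder per class directly under `root`, and uses the
    folder name as the class label.
    """
    samples = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            # Skip non-image files such as notes or thumbnails.
            if image_path.is_file() and image_path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image_path, class_dir.name))
    return samples


def count_per_class(samples):
    """Count how many images each class folder contributes."""
    return Counter(label for _, label in samples)
```

For example, `count_per_class(index_dataset("path/to/dataset"))` (placeholder path) should report 11 classes totalling 6748 images if the local copy matches the layout described above.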

Experimental design
The Sugarcane Leaf Dataset was generated by acquiring images with the high-resolution rear camera of a Samsung Galaxy F23 5G mobile phone. Table 2 summarizes the data acquisition steps undertaken for the project.
Step 1: Image Acquisition (Duration: April to June): During this period, field/farm visits were conducted during the daytime to capture images. The objective was to gather a collection of images of sugarcane leaf diseases.
Step 2: Image Pre-processing (Duration: June): The gathered images were reviewed, and suitable images were selected for the dataset. The selected images were then pre-processed, which may have included resizing, cropping, and enhancement as necessary.
The data acquisition process involved capturing images during field visits and subsequently preparing the images for inclusion in the dataset through pre-processing.

Materials or specification of image acquisition system
The camera used in the data acquisition process and the specifications of the captured images are as follows:
1. Samsung Galaxy F23 5G Android mobile:
• Make and Model: Samsung Galaxy F23 5G (SM-E236B) Android mobile.
• Battery: The mobile is equipped with a 5000 mAh battery.
The captured images were saved in JPEG format and resized to 768 × 1024 pixels. These specifications document the camera and image properties used in the data acquisition process.
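Since every image is stated to be a 768 × 1024 JPEG, a quick consistency check on a downloaded copy can be useful. The sketch below reads the frame size directly from each file's JPEG start-of-frame (SOF) header using only the Python standard library; this approach, and the helper names `jpeg_size` and `matches_expected_size`, are assumptions for illustration. It accepts 768 × 1024 in either orientation, because the text does not say which side is the width and phone images are often stored in landscape with an EXIF orientation flag.

```python
import struct

# SOF (start-of-frame) markers carry the frame size. 0xC4 (DHT), 0xC8 (JPG)
# and 0xCC (DAC) share the 0xC0-0xCF range but are not frame headers.
SOF_MARKERS = {m for m in range(0xC0, 0xD0) if m not in (0xC4, 0xC8, 0xCC)}


def jpeg_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the first SOF segment of JPEG bytes."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    i = 2
    while i < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at byte {i}")
        # Skip optional 0xFF fill bytes before the marker code.
        while i < len(data) and data[i] == 0xFF:
            i += 1
        if i >= len(data):
            break
        marker = data[i]
        i += 1
        if marker in (0x01, *range(0xD0, 0xD8)):
            continue  # standalone markers have no length field
        if marker in (0xD9, 0xDA):
            break  # end of image or start of scan: no frame header was found
        (length,) = struct.unpack(">H", data[i:i + 2])
        if marker in SOF_MARKERS:
            # Segment layout: length(2), precision(1), height(2), width(2), ...
            height, width = struct.unpack(">HH", data[i + 3:i + 7])
            return width, height
        i += length  # the length field counts its own two bytes
    raise ValueError("no SOF segment found")


def matches_expected_size(path, expected=(768, 1024)) -> bool:
    """True if the image has the expected size in either orientation."""
    with open(path, "rb") as f:
        width, height = jpeg_size(f.read())
    return sorted((width, height)) == sorted(expected)
```

Parsing the header avoids decoding the full image, so the check stays fast even across all 6748 files.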

Method
The sugarcane leaf disease dataset was compiled through fieldwork at a farm in the village of Kendur, Taluka Shirur, District Pune, Maharashtra, India (geographical coordinates: 18°47′06.4″N 74°01′19.5″E). Images were captured under diverse scenarios, covering leaves in their natural setting on the plant as well as leaves that had been detached or cut from the plant, all from a distance of 30-50 cm. This approach aimed to provide a comprehensive and varied representation of sugarcane leaf diseases under different environmental conditions. To ensure accurate disease identification, the collected images were sent to the Botany Department of Rashtrapita Mahatma Gandhi Arts and Science College in Nagbhid, Chandrapur, India, whose experts confirmed the disease categories present in the images. The images then underwent a pre-processing phase of resizing and renaming, performed with the IrfanView software [4]. The resized and renamed images were organized into folders corresponding to their disease categories. This curation makes the dataset suitable for scientific analysis and research on sugarcane leaf diseases (Fig. 2).
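The renaming step was done in IrfanView, and the article does not give the naming pattern. For readers who want to reproduce a comparable step in a script, here is a minimal sketch of a per-folder sequential rename using the Python standard library; the pattern `<ClassName>_<NNNN>.jpg` and the helpers `plan_renames` and `apply_renames` are hypothetical and not the authors' actual scheme. Resizing is omitted because it would require an image library.

```python
from pathlib import Path


def plan_renames(class_dir, prefix=None, width=4):
    """Map each JPEG in `class_dir` to a sequential name such as 'Smut_0001.jpg'.

    The naming pattern is a hypothetical example; the folder name is used as
    the prefix unless one is given.
    """
    class_dir = Path(class_dir)
    prefix = prefix or class_dir.name
    originals = sorted(
        p for p in class_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    )
    return {
        src: class_dir / f"{prefix}_{index:0{width}d}.jpg"
        for index, src in enumerate(originals, start=1)
    }


def apply_renames(plan):
    """Rename in two passes so a target name never clobbers a pending source."""
    staged = {}
    for src, dst in plan.items():
        tmp = src.with_name(src.name + ".renaming")  # temporary unique name
        src.rename(tmp)
        staged[tmp] = dst
    for tmp, dst in staged.items():
        tmp.rename(dst)
```

Building the plan first makes it possible to review the mapping before any file is touched; the two-pass apply avoids collisions when some originals already use names from the target sequence.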
Table 3 presents the distribution of images across the sugarcane leaf disease, healthy leaf, and dried leaf categories. The dataset contains 6748 images in total, and the number of images differs between categories, ranging from 246 to 1722. The categories are Yellow leaf disease (YLD), Smut, Pokkah boeng, Mosaic (viral disease), Grassy Shoot, Brown Spot, Brown Rust, Banded Chlorosis, Sett rot, Dried Leaf, and Healthy Leaves. The original images are publicly available through Mendeley [5].

Fig. 2. Architectural presentation of the image acquisition and image pre-processing.

Table 1
Sample images of different sugarcane leaf classes (diseases, healthy, dried). The continued part of the table shows Grassy shoot, Sett Rot, Brown Spot, Brown Rust, Banded Chlorosis, Dried Leaf, and Healthy Leaf.

Table 2
Data acquisition steps.

Table 3
Total number of images per category in the sugarcane leaf dataset.
Total number of images in the dataset: 6748

Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
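With category sizes ranging from 246 to 1722 images, the largest class is 1722 / 246 = 7 times the size of the smallest, so classifiers trained on this dataset may benefit from class weighting. The sketch below assumes per-class counts are supplied as a dictionary (for instance, taken from Table 3); the helper names are hypothetical. The weighting formula, n_total / (n_classes × n_class), is the same one scikit-learn uses for `class_weight="balanced"`.

```python
def imbalance_ratio(counts):
    """Largest class size divided by smallest class size."""
    return max(counts.values()) / min(counts.values())


def inverse_frequency_weights(counts):
    """Weight each class by n_total / (n_classes * n_class).

    A perfectly balanced dataset gets a weight of 1.0 for every class;
    rarer classes receive proportionally larger weights.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Only the extreme class sizes are quoted in the text, so this example uses them.
print(imbalance_ratio({"largest": 1722, "smallest": 246}))  # 7.0
```

The weights can be passed to a loss function (for example, a weighted cross-entropy) so that errors on under-represented categories count more during training.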