SoyNet: A high-resolution Indian soybean image dataset for leaf disease classification

High-quality images are essential for addressing the challenges of classifying and recognizing diseased and healthy soybean leaves. A carefully curated dataset named “SoyNet” has been created to provide clean and comprehensive data for research purposes. The dataset comprises over 9000 high-quality soybean images of healthy and diseased leaves, captured from various angles directly in soybean agriculture fields. The soybean leaf images are organized into two sub-folders: SoyNet Raw Data and SoyNet Pre-processing Data. The SoyNet Raw Data folder contains separate folders for healthy and diseased images captured using a digital camera. The SoyNet Pre-processing Data folder contains the images resized to 256 × 256 pixels and the grayscale versions of the diseased and healthy images, following a similar organizational structure. We captured the images using a Nikon digital camera and a Motorola mobile phone camera, under different angles, lighting conditions, and backgrounds at soybean cultivation fields, to represent real-world scenarios accurately. The proposed dataset is valuable for training, validating, and testing soybean leaf disease classification models.


© 2023 The Author(s)

The dataset's utility also extends to the creation of high-quality applications for soybean leaf classification, benefiting farmers, agriculture industries, researchers, and companies involved in soybean pesticide development.

Objective
The main goal of this dataset is to aid researchers by providing a wide range of soybean leaf images taken directly under real-field conditions. Because the images were captured from different angles in soybean agriculture fields, SoyNet is a valuable resource for those working on classifying and recognizing soybean leaf diseases.

Data Description
India, known for its strong agricultural sector, relies heavily on crop production, which serves as the backbone of its economy. Among the crops grown in the country, soybean holds a significant position, with India ranking fifth in terms of production [1]. The Soybean Processors Association of India (SOPA) reports that Madhya Pradesh, often referred to as the "Soybean State" of India, accounts for 55% of the national soybean cultivation area [2,3]. However, soybean production faces challenges from diseases and insect pests, which can cause substantial losses. It is therefore crucial to diagnose these problems accurately and address them promptly to safeguard soybean crops [4]. To aid in this process, a dataset has been compiled, with images organized into two sub-folders: SoyNet Raw Data and SoyNet Pre-processing Data. SoyNet Raw Data includes images captured with a digital camera, in folders for both healthy and diseased soybean leaves, and images taken with a mobile phone, which focus specifically on diseased leaves. The images in the SoyNet Pre-processing Data folder have been resized to 256 × 256 pixels and converted to grayscale, mirroring the organization of the diseased and healthy data. The images were captured using a Nikon digital camera and a Motorola mobile phone, in various lighting conditions and against various backgrounds at soybean cultivation fields. The SoyNet dataset is intended for training, validating, and testing soybean classification or recognition models. For further details on the organization of the SoyNet dataset images, refer to Table 1.
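As an illustration of how a folder-per-class layout like the one described above could be used for training, validation, and testing, the following minimal Python sketch indexes images by their parent folder name and splits each class separately. The function names, the accepted file extensions, the 70/15/15 split, and the seed are assumptions made for this example; they are not part of the SoyNet release.

```python
import random
from collections import defaultdict
from pathlib import Path

# Assumed image file types; the actual dataset may use others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_images(root):
    """Return sorted (path, label) pairs; the label is the parent folder name."""
    root = Path(root)
    return sorted(
        (path, path.parent.name)
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def stratified_split(samples, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (path, label) pairs into train/val/test lists, class by class."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = int(len(items) * fractions[0])
        n_val = int(len(items) * fractions[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

Splitting per class keeps rare disease classes represented in every subset, which a single shuffle over the whole dataset does not guarantee.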

Experimental Design
The Jawaharlal Nehru Krishi Vishwa Vidyalaya (JNKVV), located in Jabalpur, Madhya Pradesh, India, played a crucial role in the acquisition of soybean leaf images for this research. The field at JNKVV served as the primary source for collecting samples of soybean leaves under real-field conditions. Its latitude and longitude coordinates are 23°12'36.9"N, 79°56'47.7"E [6].
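For readers who need the field location in decimal degrees (for example, for mapping software), the conversion is simple arithmetic. The sketch below assumes the coordinates read as 23°12'36.9"N, 79°56'47.7"E; the helper name `dms_to_decimal` is introduced here for illustration only.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return -value if hemisphere in ("S", "W") else value


latitude = dms_to_decimal(23, 12, 36.9, "N")
longitude = dms_to_decimal(79, 56, 47.7, "E")
print(f"{latitude:.5f}, {longitude:.5f}")  # prints "23.21025, 79.94658"
```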

Description of the Soybean Crop and Management
Soybean plays a crucial role in the agricultural landscape of Madhya Pradesh, contributing significantly to its economic growth, particularly during the kharif season (June-November). The average productivity has been recorded between 796 and 885 kilograms per hectare. However, specific challenges hinder soybean production in the region, such as the limited availability of high-quality seeds of improved varieties, inadequate adoption of advanced production techniques, and the risks associated with rainfed crop cultivation. Using good-quality seeds of improved varieties, including JG 71, JG 315, JG 322, Pusa 391, Vishwas (Phule G 5), Vijay, Vishal, JG 218, JG 16, JG 130, JGG 1, JGK 1, and BGD 128(K), is one way to address these challenges.
The procedure for acquiring soybean leaf images is illustrated in Fig. 1. The images were obtained using two cameras: a Nikon L810 digital camera and the rear camera of a Motorola G40 mobile phone. Over 9000 images were captured with these cameras and subsequently categorized and stored in designated folders based on their quality and classification. Data acquisition took place from October to November, when the soybean leaves had reached full growth. The images were captured under natural lighting, from different angles and against different backgrounds, to obtain a diverse set of images that represents soybean plants in their natural environment.
The SoyNet dataset covers several soybean leaf diseases: bacterial blight, rust, bacterial pustule, brown spot, downy mildew, frogeye leaf spot, and soybean yellow mosaic. The images of each disease are stored in the corresponding disease folder. This approach aimed to create a dataset that accurately reflects real-field conditions and the challenges associated with soybean leaf diseases.
In the subsequent phase, the acquired images were pre-processed using Python. During pre-processing, the images were resized to a standard resolution of 256 × 256 pixels [5], a size commonly used when building image classification models. Table 2 summarizes the timeline and activities involved in capturing and preparing the SoyNet dataset for further analysis and research. Table 3 provides an overview of common soybean leaf diseases, including their symptoms, causes, and corresponding images.
It serves as a valuable reference for identifying and understanding various diseases affecting soybean crops.

Table 2. Data acquisition steps.

S. No. | Step | Duration | Activity
1. | Data acquisition | October to November | Soybean images were captured daily, taking advantage of natural lighting conditions. The photographs were taken from various angles and backgrounds to ensure a comprehensive representation of the soybean plants in their natural environment.
2. | Pre-processing of the SoyNet dataset | December to January | Python is used to pre-process the images, resize all images to a standardized resolution of 256 × 256 pixels, and then organize them into separate folders based on their classification.

Specifications of the Soybean Leaf Image Acquisition Model
Resolution unit 2 2 8
Color representation sRGB Grayscale
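The two pre-processing operations described in this paper, resizing to 256 × 256 pixels and converting sRGB images to grayscale, can be illustrated with a small dependency-free Python sketch. This is not the authors' code: the paper states only that Python was used, so the nearest-neighbour resampling, the ITU-R BT.601 luma weights, the order of the two steps, and all function names are assumptions made for this example. Images are represented as nested lists of (R, G, B) tuples.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) pixels to rows of 0-255 luminance values."""
    # ITU-R BT.601 luma weights (assumed; the paper does not state its method).
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in rgb_rows
    ]


def resize_nearest(rows, width=256, height=256):
    """Resize a pixel grid to width x height using nearest-neighbour sampling."""
    src_h, src_w = len(rows), len(rows[0])
    return [
        [rows[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def preprocess(rgb_rows, size=256):
    """Resize to size x size, then convert to grayscale (assumed order)."""
    return to_grayscale(resize_nearest(rgb_rows, size, size))
```

In practice an imaging library such as Pillow (`Image.resize` and `Image.convert("L")`) would typically perform these steps with better interpolation and far better speed; the pure-Python version is meant only to make the operations explicit.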

Ethics Statements
There are no conflicting interests involved, and the data is freely accessible in the public domain.

Data Availability
SoyNet: Indian Soybean Image dataset with quality images captured from the agriculture field (healthy and disease Images) (Original data) (Mendeley Data).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.