A comprehensive dataset of eight Thai cannabis classes for botanical exploration

This dataset presents a comprehensive collection of images representing both dried and live samples from eight distinct Thai cannabis classes. The dataset includes a total of 14,094 images, with images depicting dried and healthy specimens. These images serve as a valuable resource for researchers engaged in botanical exploration, machine learning, and computer vision studies. Additionally, the dataset facilitates investigations into the medicinal properties of Thai cannabis. Interdisciplinary collaboration is encouraged, providing opportunities for innovative insights spanning biology, horticulture, and data science. Beyond fundamental research, this dataset holds practical implications for agriculture, technology development, and disease prevention, offering insights into both dried and live states of Thai cannabis plants across various strains.


Data Description
The implementation of image processing and computer vision methods can serve as an alternative approach to accelerate the process of identifying or classifying plants.The dataset consists of a total of 14094 images, distributed across categories as shown in the Table 1 .
The dataset comprises a collection of 14094 images, categorizing eight distinct Thai cannabis classes: Foi Thong, Hang Kra Rog Phan ST1, Hang Suea Sakonnakhon TT1, KD KT, Kroeng Krawia, Purple Huai Khrai, Tanao Si Kan Khaw WA1, and Tanao Si Kan Dang RD1.This diverse collection includes both dried and healthy specimens, offering researchers a comprehensive dataset for various research areas, including botanical exploration, machine learning, medicinal plant studies, education, and cross-disciplinary research initiatives.Each category is organized within distinct folders, ensuring straightforward access and identification of specific samples.Refer to Fig. 2 for sample images of cannabis plant varieties included in the dataset.

Experimental Design
The dataset images were captured using the Iphone 13 pro mobile phone, ensuring consistent image quality and resolution for each plant sample.Fig. 2 shows the experimental setup for dataset creation.The dataset encompasses eight categories, introducing variability in lighting and environmental factors to mimic real-world scenarios.Fig. 3 shows experimental configuration setup for the Thai Cannabis Plant dataset.Fig. 4 shows data acquisition process and Fig. 5 illustrates pre-processed image.
Step 1: Image Capture (September 2023) -In this step, we conducted field visits during both daytime and nighttime to capture images related to various conditions.The primary objective was to compile a comprehensive collection of images relevant to Thai Cannabis plant categories.
Step 2: Image Pre-processing (January 2024) -During this step, we enhanced the quality of Thai Cannabis plant images by resizing them to 1024 × 768 using FastStone Photo Resizer.The data acquisition process involved capturing images during field visits and subsequently preparing them through pre-processing for inclusion in the dataset.

Materials or Specification of Image Acquisition System
The mobile phone (Iphone 13 pro used in the data acquisition process and the specifications of the captured images are: Sensor Type: 64 MP GW1 / S5KGW3 Focal Length: 26mm Aperture Range: f/1.79-f/2.2Aspect Ratio: 4:3 The images taken were saved in JPG format and were resized to a resolution of 1024 × 768 pixels using FastStone Photo Resizer.These specifications provide crucial details about the cameras used and the image properties obtained during the data acquisition process.

Preprocessing Method
In our study, we initiated image preprocessing using FastStone Photo Resizer, a versatile tool widely known for batch image resizing.This step, outlined in Fig. 6 , streamlines the resizing process for image batches, proving valuable for preprocessing in diverse research applications, such  as image-based machine learning, analysis, and data augmentation.For subsequent preprocessing, we adopted the 'preprocess_input' function within the Keras library, tailored for pre-trained models.This built-in function encompasses mean subtraction, channel reordering, scaling, and resizing, ensuring proper formatting of images for pre-trained models.Its role is crucial in guaranteeing that input images align with the specific requirements of the chosen pre-trained model.Fig. 7 shows the augmentation code of Thai cannabis dataset and Fig. 8 represents image enhancement steps during preprocessing.

Demonstrating the Significance of the Thai Cannabis Plant Dataset
In the realm of machine learning datasets, several notable contributions have emerged recently [2-8 , 10] catering to machine learning applications.We wanted to show just how valuable our Thai Cannabis Plant Dataset [1 , 9] is, so we ran some experiments using well-known pre-trained models like VGG19, DenseNet201, and EfficientNetB7.Our goal was to see how this dataset can boost the accuracy of machine learning models, especially when it comes to identifying Thai Cannabis plants.
First, we ran these pre-trained models without any tweaks using our dataset as a sort of benchmark.Then, we gave these models a boost by training them on our dataset.What we found was pretty exciting.When we fine-tuned these models with our dataset, there was a significant jump in accuracy.This was most apparent in how well the models could detect and classify Thai cannabis plants.Table 2 shows accuracy of pretrained machine learning models on the Thai cannabis plant dataset before and after training with our dataset.Similarly, Table 3 shows confusion matrix of pretrained machine learning models on the Thai cannabis plant dataset before and after training with our dataset.
In a nutshell, our dataset plays a crucial role in making these machine learning models, like VGG19, DenseNet201, and EfficientNetB7, perform much better.By offering a solid resource for training and fine-tuning, our dataset becomes a vital tool in creating more reliable models that can help improve Thai Cannabis Plant cultivation and keep those plants healthy ( Fig. 1 ).
The above confusion matrix in the Table 3 , offers a detailed breakdown of the model's predictive accuracy.It allows us to discern where the model excels, correctly identifying instances within each class (true positives), and where it stumbles, making classification errors (predicted positives and predicted negatives).

Limitations
Expanding the dataset to encompass a wider range classes and samples from diverse global regions would enhance its overall diversity and applicability.

Fig. 2 .
Fig. 2. Sample images of cannabis plant varieties in the dataset.

Table Subject
The data collection process encompassed capturing images of both dried and live samples from eight distinct Thai cannabis classes.A total of 14094 images were meticulously collected, comprising images of dried and healthy specimens.These images were saved in JPG format and underwent resizing to achieve a resolution of 1024 × 768 pixels using FastStone Photo Resizer.Special emphasis was placed on capturing images of dried plant leaves to provide comprehensive coverage of each cannabis class, including Foi Thong, Hang Kra Rog Phan ST1, Hang Suea Sakonnakhon TT1, KD KT, Kroeng Krawia, Purple Huai Khrai, Tanao Si Kan Khaw WA1, and Tanao Si Kan Dang RD1.This carefully executed data collection process resulted in a diverse and extensive dataset, offering valuable resources for researchers delving into image classification, machine learning, and computer vision within the realm of botanical studies.
• Interdisciplinary researchers find opportunities for collaboration and innovative insights across biology, horticulture, and data science in this diverse dataset, while considering both the dried and healthy states of the plants.•This dataset is a valuable resource with applications in research, agriculture, technology development, and disease prevention.It not only addresses essential questions but also provides practical tools for improving plant health assessment, ultimately contributing to more sustainable and efficient agricultural practices.

Table 1
Distribution of Thai cannabis plant images.

Table 2
Accuracy of pretrained machine learning models on the Thai cannabis plant dataset: before and after training.

Table 3
Confusion Matrix of pretrained models.