Dataset of cannabis seeds for machine learning applications

Recent policy changes regarding cannabis use in several countries have increased both cannabis usage and research [1,2]. Cannabis is the second most used psychoactive substance worldwide [3] and remains the subject of many research works. Cannabis can be classified into different classes according to external features such as colour, shape, and size using computer vision and machine learning techniques. Precise classification or recognition is an unmet need of the agriculture business, which has attracted many researchers to develop solutions based on machine learning and deep learning techniques. A neat and clean dataset is the primary requirement for building accurate and robust machine learning models and for minimizing misclassification in real-time environments. To achieve this objective, we have created an image dataset of cannabis seeds covering seventeen seed types. The dataset contains 17 subfolders, each named after the category of seed it holds. We believe the cannabis seeds dataset will be helpful for training, testing, and validating machine learning models for cannabis classification or recognition.


Specifications Table

Subject: Machine Learning, Agriculture Science
Specific subject area: Image dataset of cannabis seeds for classification
Type of data: Cannabis seed images
How data were acquired: High-quality cannabis seed images were captured using a mobile phone camera with different backgrounds and artificial lights.
Data format: Raw
Description of data collection: The high-resolution rear camera of an iPhone was used to capture the different classes of cannabis seeds. The images were taken in JPG format with dimensions of 3024 × 4032. The dataset is organized into 17 subfolders of cannabis seeds, namely Ak47 photo, blackberry (auto), cherry pie, gelato, gorilla purple, hang kra rog ku, hang kra rog phu phan st1, hang suea sakon Nakhon tt1, kd, kd_kt, krerng ka via, purple duck, skunk (auto), sour diesel (auto), tanaosri kan Daeng rd1, tanaosri kan kaw wa1, and thaistick foi thong.

Value of the Data
• The dataset consists of 3434 high-quality original images of seventeen different classes of cannabis seeds.
• To the best of our knowledge, this is the first open-access dataset of cannabis seeds.
• The dataset is useful for building applications for cannabis seed classification, counting, and quality detection.
• The dataset will be useful to researchers for training, testing, and validating their classification or recognition machine learning models for cannabis seeds.
• The dataset is useful for building high-quality cannabis seed classification applications, which are beneficial for farmers, agriculture industries, wholesalers, and cannabis seed export companies.
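As an illustration of the training, testing, and validation use mentioned above, the following is a minimal sketch of a per-class (stratified) split. It is not part of the dataset or of the authors' workflow. The function name `stratified_split`, the 70/15/15 fractions, and the fixed seed are assumptions chosen for the example, and the input is assumed to be a list of (image path, class name) pairs.

```python
import random
from collections import defaultdict


def stratified_split(samples, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (path, label) pairs into train/val/test lists, class by class.

    The fractions are illustrative defaults, not values from the article.
    The test split receives whatever remains after train and val.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        n_train = int(len(paths) * fractions[0])
        n_val = int(len(paths) * fractions[1])
        train += [(p, label) for p in paths[:n_train]]
        val += [(p, label) for p in paths[n_train:n_train + n_val]]
        test += [(p, label) for p in paths[n_train + n_val:]]
    return train, val, test
```

Splitting within each class keeps all seventeen seed types represented in every subset, which matters when some classes have fewer images than others.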

Objectives
• To provide a dataset of different types of cannabis seeds that can help AI/ML algorithms detect and classify cannabis seeds in real time.
• To provide a neat and clean dataset of cannabis seeds for building AI/ML models and minimizing misclassification by algorithms.

Data Description
This dataset consists of seventeen classes of cannabis seeds, namely Ak47 photo, blackberry (auto), cherry pie, gelato, gorilla purple, hang kra rog ku, hang kra rog phu phan st1, hang suea sakon Nakhon tt1, kd, kd_kt, krerng ka via, purple duck, skunk (auto), sour diesel (auto), tanaosri kan Daeng rd1, tanaosri kan kaw wa1, and thaistick foi thong. According to [4], cannabis seeds contain approximately 29 to 34 percent oil by weight. Cannabis seeds are also used to produce a clear yellow liquid. Cannabis has multiple uses; for example, it can be used in cosmetic preparations such as skin care products in the form of moisturizers, shampoos, lotions, and lip balms. Cannabis seed oil is used as an ingredient in body oils and lipid-enriched creams [4]. Multiple datasets exist for fruits and vegetables [5,6,7], but researchers still need a cannabis seed dataset to develop machine learning models and applications. This dataset contains images of cannabis seeds, not of the plants' leaves. Cannabis is cultivated in indoor and/or outdoor environments. The images were captured using a mobile phone, and the cannabis seed images were taken on a white background. Fig. 1 shows sample images from each class in the dataset.
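Because each class is stored in its own subfolder, the folder name can serve as the label. The sketch below shows one way to index such a layout with the Python standard library. The function names `index_dataset` and `class_counts` are hypothetical, and the accepted file suffixes are an assumption based on the JPG format stated for the images.

```python
from collections import Counter
from pathlib import Path

# Assumption: images carry a .jpg or .jpeg suffix, in any letter case.
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def index_dataset(root):
    """Return sorted (image_path, class_name) pairs.

    Each immediate subfolder of `root` is treated as one class, and its
    name is used as the label for every image file it contains.
    """
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image_path, class_dir.name))
    return samples


def class_counts(samples):
    """Count how many images were indexed for each class name."""
    return Counter(label for _, label in samples)
```

Calling `class_counts(index_dataset(root))` on a local copy of the dataset gives a quick check that all 17 classes are present and that the total matches the 3434 images reported.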

Experimental Design
The image data acquisition process is shown in Fig. 2. The seed images were acquired using the high-resolution rear camera of an iPhone 13 Pro mobile phone. In all, 3434 images were captured, then segregated and saved in their respective folders.
The data acquisition steps are shown in Table 1. The seed images were captured under natural and artificial lighting conditions, with different angles and backgrounds, from June to October. Images are stored in their original format in the dataset. Researchers can resize them to 256 × 256 or 224 × 224 as needed when building machine learning models with the cannabis seed dataset.
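The original images are not square, so resizing them directly to 224 × 224 or 256 × 256 changes their aspect ratio. One common option, shown here only as a sketch and not as the authors' procedure, is to take a centred square crop first and then downscale. The helper names are hypothetical, and treating 3024 as the width and 4032 as the height is an assumption about orientation.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centred square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def downscale_ratio(width, height, target):
    """Return how many source pixels map to one pixel after cropping
    to a centred square and resizing it to target x target."""
    return min(width, height) / target


# For a 3024 x 4032 image the crop keeps the middle 3024 rows.
box = center_crop_box(3024, 4032)          # (0, 504, 3024, 3528)
ratio_224 = downscale_ratio(3024, 4032, 224)  # 13.5
```

The returned box follows the (left, upper, right, lower) order that image libraries such as Pillow accept in `Image.crop`, after which `Image.resize((224, 224))` would produce the model input. Whether cropping or plain resizing is preferable depends on where the seed sits in the frame.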

Method
All seventeen types of cannabis seeds (Ak47 photo, blackberry (auto), cherry pie, gelato, gorilla purple, hang kra rog ku, hang kra rog phu phan st1, hang suea sakon Nakhon tt1, kd, kd_kt, krerng ka via, purple duck, skunk (auto), sour diesel (auto), tanaosri kan Daeng rd1, tanaosri kan kaw wa1, and thaistick foi thong) were purchased from a local market in Thailand. The seeds were brought to the Kasetsart University laboratory. Images were captured every day using the high-resolution rear camera of an iPhone 13 Pro mobile phone, at different angles and on white backgrounds. Table 4 describes the classes, the number of images taken, and the environments in which the images were taken.

Table 1
Data acquisition steps.

Sr. No. | Step | Duration | Activity
1. | Data gathering | June to October 2022 | Purchase of cannabis seeds in Thailand for the dataset. Daily capture of cannabis seed images in natural and artificial light with different angles and a white background.
2. | Pre-processing and creating dataset | October 2022 | Save the images into their respective folders according to their classification.

Resolution unit: 2

Ethics Statement
The data are publicly available. No ethics approval was needed for this study. There is no conflict of interest.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Dataset of Cannabis Seeds (Original data) (Mendeley Data).