IDENTIFYING MEDICINAL PLANT LEAVES USING TEXTURES AND OPTIMAL COLOUR SPACES CHANNEL

This paper presents an automated medicinal plant leaf identification system. Colour texture analysis of the leaves is done using statistical, Grey Tone Spatial Dependency Matrix (GTSDM) and Local Binary Pattern (LBP) based features in 20 different colour spaces (RGB, XYZ, CMY, YIQ, YUV, YCbCr, YES, U*V*W*, L*a*b*, L*u*v*, LMS, lαβ, I1I2I3, HSV, HSI, IHLS, IHS, TSL, LSLM and KLT). Classification of the medicinal plants is carried out with 70% of the dataset in the training set and 30% in the test set. The classification performance is analysed with Stochastic Gradient Descent (SGD), k Nearest Neighbour (kNN), Support Vector Machine with a Radial Basis Function kernel (SVM-RBF), Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) classifiers. Classification of a dataset of 250 leaf images belonging to five plant species gives an identification rate of 98.7%. The best identification is obtained with the YUV, L*a*b* and HSV colour spaces.


Introduction
In the Western Ghats region of India, and especially in Kanyakumari, the southernmost district of the country, traditional home herbal remedies are readily available in most homes. These herbal plants are used to treat common sicknesses such as the common cold, diarrhoea and headache. Herbal practitioners, herbal medicine industries and even lay people have a good knowledge of how to use these plants. They collect the required plants from the available sources and use the leaves to prepare medicines.
The world has seen steadily growing interest in, and use of, traditional herbal remedies and herbal products [1]. The World Health Organization (WHO) reports that about 65-80% of the world's population in developing countries depends on plants for primary healthcare, owing to poverty and lack of access to modern medicine [2].
The leaf is one of the major identifying features of a plant. Leaves are far from being the only discriminant visual key between species, but, owing to their shape and size, they have the advantage of being easily observed, captured and described [3]. Although other plant parts such as roots, flowers and bark are also used for medicinal purposes, fresh leaves are the most commonly used part and carry the most medicinal value. Medicinal plant leaves are normally identified by botanists using plant taxonomic field guides. Siddha medicinal practitioners (traditional herbal physicians), who rely mostly on medicinal plant leaves, and households depend on the knowledge and experience they have gained in identifying the leaves. Although this has been the regular practice for generations, modernisation has had its impact, and most of the younger generation lacks the knowledge needed to identify medicinal plant leaves. Wrong identification of these leaves can cause harm when treating patients.
As there is a need for precise and quick identification of the plant leaves, an automatic identification system would prove to be an effective solution.
Five medicinal plant leaves are considered in the present work; some of their medicinal uses are listed below, and a sample of the leaf dataset is shown in Figure 1.
(0) Desmodium gyrans: antidote for snake venom; effective for heart diseases, rheumatic complaints, diabetes and skin ailments.
(1) Butea monosperma: promotes diuresis and is anthelmintic; treats leucorrhoea and diabetes.
(2) Malpighia glabra: helps lower blood sugar, increases collagen and elastin production, and treats diarrhoea, dysentery and liver problems.
(3) Helicteres isora: helps treat intestinal complaints, colic pains and flatulence.
(4) Gymnema sylvestre: suppresses the sensation of sweetness; anti-diabetic.
Automatic plant leaf identification is a difficult problem because there is often high intra-species variability and low inter-species variation [4]. Nevertheless, many promising approaches have emerged. Image-based plant leaf identification uses morphological characteristics such as shape, colour, texture and margin [4][5][6][7]. Automatic identification of medicinal plant leaves has been done using texture, colour, statistical and Local Binary Pattern (LBP) features [8][9][10]. Plant leaf identification finds application in the food processing, medical, botanical gardening and cosmetic industries [11].
Colours are represented using different colour spaces, and no single colour space has proven best for all colour images. The use of colour improves the performance of standard grey-level texture analysis techniques [12,13]. To exploit the colour information of medicinal plant leaves, which are mostly green, this work identifies medicinal plant leaves using colour. Although leaf colour is not stable during the growth period, colour spaces can provide additional information that helps identification. The choice of a suitable colour space depends on the specific image [14]. Image processing based on different colour components produces reliable and accurate results [15] and improves classification performance [16]. Several approaches to automatic plant leaf identification use the HSV colour space, which results in better performance [17].
The decorrelated colour space L*a*b* is reported to perform best in colour transfer algorithms and yields better quantification results for foods with curved surfaces [18]. The YCbCr colour model outperforms the other models considered (RGB, YIQ, HSV and HSI) in objective quality assessment for colour image fusion [19], and the YUV colour space has been used to solve the parameter optimisation problem of blind colour image fusion [20]. The hybrid colour space RCrQ possesses complementary characteristics and enhances discriminative power for face recognition [21].
The objective of the present work is to compute colour texture features in various colour spaces using grey-level statistics, the Grey Tone Spatial Dependency Matrix (GTSDM) and LBP operators, and to find the colour space best suited to automatic identification of medicinal plant leaves.
Suitable image classification approaches can be used for the identification process. The rest of the paper is organized as follows. Section 2 discusses the methodology, including texture feature extraction, colour transforms, classification and the dataset, together with the design and implementation of the work. The results and discussion are given in Section 3, followed by the conclusion in Section 4.

Methods
The general approach adopted for identifying medicinal plant leaves is explained here. The different stages, namely texture feature extraction, colour transform and classification, are presented in detail, and the primary dataset is described. Algorithm 1 gives an overview of the procedure for classifying a medicinal plant leaf.

Texture Feature Extraction
Texture features for identifying medicinal plant leaves can be computed using first-order statistical moments (mean, variance and skewness), second-order statistics from the Grey Tone Spatial Dependency Matrix (GTSDM: energy, homogeneity, dissimilarity, entropy, correlation and contrast) and Local Binary Pattern operators (LBP: mean and standard deviation) [9]. First-order grey-level statistical features are the simplest texture descriptors. The GTSDM, also known as the grey-level co-occurrence matrix (GLCM), yields second-order statistical features, which are more sensitive and more intuitive than first-order features. LBP operators are known for their invariance to local grey-scale variations and monotonic photometric changes, and for their high descriptive power.
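A minimal sketch of the first-order features, assuming a single colour channel stored as a NumPy array. The function name `first_order_features` is hypothetical, the variance is the population variance, and skewness is taken as the standardised third central moment (other definitions, such as bias-corrected ones, also exist).

```python
import numpy as np


def first_order_features(channel):
    """Return (mean, variance, standard deviation, skewness) of one channel.

    Variance is the population variance (ddof=0); skewness is the third
    central moment divided by the cube of the standard deviation. A flat
    channel (zero variance) is assigned skewness 0 to avoid dividing by zero.
    """
    x = np.asarray(channel, dtype=float).ravel()
    mean = x.mean()
    var = x.var()
    std = np.sqrt(var)
    skew = 0.0 if std == 0 else np.mean((x - mean) ** 3) / std ** 3
    return mean, var, std, skew


# Usage on a small made-up 2 x 3 channel
mean, var, std, skew = first_order_features([[0, 0, 1], [1, 2, 5]])
print(mean, var, std, skew)
```

The long tail of large values in the example channel gives a positive skewness, which is the kind of asymmetry this feature is meant to capture.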

Classification
Texture-based image classification involves deciding the most pertinent texture category for the observed image [35]. Classification here means machine learning classification into the different classes, not the Linnaean taxonomic system of plant classification [36]. When prior knowledge of the established classes is available and the texture features have been extracted, a given image can be assigned to the appropriate class.
The texture class i, consisting of a set of n images, can be represented as
$C_i = \{ I_{i1}, I_{i2}, \ldots, I_{in} \}$ (1)
where $I_{ij}$, $j = 1, 2, \ldots, n$, is a member image.
The classifiers used in this work are Stochastic Gradient Descent (SGD) [37,38], k Nearest Neighbour (kNN) [39,40], Support Vector Machine with a Radial Basis Function kernel (SVM-RBF) [41], Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) [42]. SGD is an approach to discriminative learning of linear classifiers under convex loss functions. kNN computes the distance between a test point and all points in the training set. SVMs are supervised learning methods that, like SGD-trained linear classifiers, are effective in high-dimensional feature spaces. LDA is a classifier with a linear decision boundary and QDA a classifier with a quadratic decision boundary; both are generated by fitting class-conditional densities to the data. Parameter tuning is important for some of these classifiers. The value of k in the kNN classifier ranges from 1 to a maximum of the square root of the number of instances [43]. A small k makes the result more sensitive to noise, while a large k increases the computation time. Moreover, k is chosen to be odd so that the majority vote is less likely to end in a tie [44]. The parameters of an SVM with a Gaussian RBF kernel are C and γ. The parameter C controls the influence of each individual support vector: a larger C gives low bias and high variance, and vice versa. The parameter γ sets the width of the RBF kernel and therefore how non-linear the decision boundary can be: a higher γ gives low bias and high variance, and vice versa. A standard method for choosing C and γ is grid search (GS) [45].
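As a sketch of the parameter tuning described above, the following uses scikit-learn's `GridSearchCV` on a synthetic stand-in for the 12-feature leaf data (generated with `make_classification`, not the paper's dataset). The helper `odd_k_candidates` is hypothetical and encodes the "odd k from 1 up to the square root of the number of instances" rule of thumb; the C and γ grids are illustrative values, not the ones used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def odd_k_candidates(n_train):
    """Odd k values from 1 up to floor(sqrt(n_train)) (hypothetical helper)."""
    k_max = max(1, int(np.sqrt(n_train)))
    return list(range(1, k_max + 1, 2))


# Synthetic stand-in: 250 samples, 12 features, 5 classes (NOT the leaf data)
X, y = make_classification(n_samples=250, n_features=12, n_informative=6,
                           n_redundant=0, n_classes=5, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Grid search over C and gamma for the RBF-kernel SVM (illustrative grid)
svm_search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]},
    cv=5)
svm_search.fit(X_train, y_train)

# Grid search over odd k values for kNN
knn_search = GridSearchCV(
    make_pipeline(StandardScaler(), KNeighborsClassifier()),
    {"kneighborsclassifier__n_neighbors": odd_k_candidates(len(X_train))},
    cv=5)
knn_search.fit(X_train, y_train)

print("SVM-RBF", svm_search.best_params_, svm_search.score(X_test, y_test))
print("kNN", knn_search.best_params_, knn_search.score(X_test, y_test))
```

Features are standardised inside each pipeline so that the scaling is learned only from the training folds; this matters for both the RBF kernel and kNN, which are distance-based.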
The classifier places all N training images in a feature space whose dimension equals the number of features extracted from each image. When a new, uncategorised test image is placed in that space, the best-matching training images are found using a pairwise distance function. The pairwise distance between two images $d_1$ and $d_2$ is calculated as
$D(d_1, d_2) = \sqrt{\sum_{l=1}^{L} (d_{1l} - d_{2l})^2}$ (2)
where $d_{1l}$ and $d_{2l}$ are the l-th feature values of the two images and L is the number of features compared. The training images with the smallest distance to the test image are its closest matches.
After its features have been computed, the test image is assigned to the category to which the majority of its nearest neighbours belong.
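The nearest-neighbour decision can be sketched in a few lines of NumPy, assuming that the pairwise distance of equation (2) is the Euclidean distance between feature vectors. The function names `euclidean` and `knn_predict` are hypothetical and the training points are made up.

```python
from collections import Counter

import numpy as np


def euclidean(d1, d2):
    """Pairwise distance between two feature vectors (Euclidean form assumed)."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    return np.sqrt(np.sum((d1 - d2) ** 2))


def knn_predict(train_X, train_y, test_x, k=3):
    """Label `test_x` by majority vote among its k nearest training vectors."""
    dists = np.array([euclidean(row, test_x) for row in train_X])
    nearest = np.argsort(dists, kind="stable")[:k]   # indices, closest first
    # Counter.most_common orders equal counts by first insertion, so a tied
    # vote goes to the label whose neighbour is closest to the test vector.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Usage on made-up 2-D feature vectors for two classes
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, [0.2, 0.2]))   # near the class-0 cluster
print(knn_predict(train_X, train_y, [5.5, 5.5]))   # near the class-1 cluster
```

This brute-force search computes every distance, which matches the description of kNN above; libraries such as scikit-learn can switch to tree-based indexes for larger training sets.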

Datasets
The medicinal plant leaves were collected from the forests of Kanyakumari in the Western Ghats region. Digital images of the abaxial (lower) surfaces of the leaves were taken with a Canon EOS 1000D digital camera in uncompressed JPEG format, with dimensions of 3420 x 4320 x 3, in a closed environment to keep the illumination constant. As leaf characteristics, especially colour, vary widely from the tender stage to the mature stage, the proposed algorithm is restricted to images of mature leaves of a

Design and Implementation
The proposed algorithm is implemented in Python on the Linux platform, using the open-source libraries numpy, scipy, scikit-learn, colorsys and grapefruit. The stages of the algorithm developed for identifying medicinal plant leaves using colour textures are given in Algorithm 2 and explained here.
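The colour transform equations themselves come from the cited references. As a rough illustration only, the sketch below shows how a few of the 20 colour spaces can be obtained from an RGB image held as an m x n x 3 NumPy array with values in [0, 1]: HSV and YIQ through Python's standard `colorsys` module (applied per pixel with `np.vectorize`, and using `colorsys`'s own YIQ coefficients), CMY as the complement of RGB, and YUV with the common BT.601 coefficients. The function name `to_channels` and the YUV coefficients are assumptions, not necessarily the paper's exact equations.

```python
import colorsys

import numpy as np


def to_channels(rgb, space):
    """Return the three channels of `space` for an m x n x 3 RGB array in [0, 1]."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    if space == "RGB":
        return r, g, b
    if space == "CMY":                 # subtractive complement of RGB
        return 1.0 - r, 1.0 - g, 1.0 - b
    if space == "HSV":                 # colorsys works on scalars, so vectorise
        return np.vectorize(colorsys.rgb_to_hsv)(r, g, b)
    if space == "YIQ":                 # colorsys uses its own YIQ coefficients
        return np.vectorize(colorsys.rgb_to_yiq)(r, g, b)
    if space == "YUV":                 # assumed BT.601-style analogue YUV
        y = 0.299 * r + 0.587 * g + 0.114 * b
        return y, 0.492 * (b - y), 0.877 * (r - y)
    raise ValueError(f"colour space not covered by this sketch: {space}")


# Usage: a made-up 1 x 2 image holding a pure red and a pure green pixel
img = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
h, s, v = to_channels(img, "HSV")
y, u, v_yuv = to_channels(img, "YUV")
print(h, y)
```

Per-pixel `np.vectorize` is slow on 3420 x 4320 images; it is used here only because `colorsys` operates on scalars. A faster version would write each transform as whole-array NumPy arithmetic, as in the YUV branch.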
The medicinal leaf image from our dataset is available as an m x n x p colour image L with p colour channels, in the form
$L = \{ LI_k(i, j) \}, \quad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n;\ k = 1, 2, \ldots, p$ (3)
where p is the number of colour channels of the image.
The colour space transformation $CT_k$ in equation (4) is calculated from the colour channels $LI_k$ using the corresponding colour transform equations given in the references cited in Section 2:
$CT = \{ CT_k(i, j) \}, \quad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n;\ k = 1, 2, \ldots, p$ (4)
Sample leaf images in all the colour spaces considered are shown in Figure 3. The statistical features mean, variance, standard deviation and skewness are calculated directly from the transformation $CT_k$. The co-occurrence matrix and the Haralick features energy, homogeneity, dissimilarity, entropy, correlation (ρ) and contrast (λ) are calculated from the selected channel of the transformation $CT_k$ [35]. These feature values are used for classification.
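The co-occurrence step can be sketched with NumPy alone. This is a minimal sketch under several assumptions that the text does not fix: the channel is first quantised to 8 grey levels, a single horizontal offset of one pixel is used, the matrix is made symmetric and normalised, energy is taken as the angular second moment (the sum of squared entries; some libraries report its square root instead), entropy uses log base 2, and a constant channel is given correlation 1. The helper names are hypothetical.

```python
import numpy as np


def quantise(channel, levels=8):
    """Map a channel of arbitrary range onto integer grey levels 0..levels-1."""
    x = np.asarray(channel, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:                       # flat channel: everything in level 0
        return np.zeros(x.shape, dtype=int)
    q = np.floor((x - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)   # the maximum value would land on `levels`


def glcm(q, levels=8, dy=0, dx=1):
    """Symmetric, normalised co-occurrence matrix for a non-negative offset (dy, dx)."""
    rows, cols = q.shape
    ref = q[:rows - dy, :cols - dx]    # reference pixels
    nbr = q[dy:, dx:]                  # their neighbours at (dy, dx)
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)
    P = P + P.T                        # count each pair in both directions
    return P / P.sum()


def haralick_features(P):
    """Six Haralick-style features of a normalised co-occurrence matrix P."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    nz = P[P > 0]                      # skip 0 * log(0) terms
    if sd_i == 0 or sd_j == 0:
        correlation = 1.0              # undefined; convention assumed here
    else:
        correlation = np.sum(P * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    return {
        "energy": np.sum(P ** 2),      # angular second moment
        "homogeneity": np.sum(P / (1.0 + (i - j) ** 2)),
        "dissimilarity": np.sum(P * np.abs(i - j)),
        "entropy": -np.sum(nz * np.log2(nz)),
        "correlation": correlation,
        "contrast": np.sum(P * (i - j) ** 2),
    }


# Usage: a made-up 2 x 4 channel that already has two grey levels
q = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
features = haralick_features(glcm(q, levels=2))
print(features)
```

For a real colour channel the call would be `haralick_features(glcm(quantise(channel)))`; the number of grey levels and the set of offsets are tuning choices that affect the feature values.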
The image $CT_k$ is used to calculate the local binary pattern operator with 8 neighbours and a unit radius. From the computed LBP histogram, the mean (μL) and standard deviation (σL) are calculated and used as features.
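A minimal NumPy sketch of the 8-neighbour, radius-1 LBP operator follows. It samples the eight integer-offset neighbours directly (no interpolation), skips the one-pixel border, and sets a bit when a neighbour is greater than or equal to the centre. Because the mean of a normalised 256-bin histogram is always 1/256, the sketch assumes that μL and σL are the mean and standard deviation of the LBP code image; that reading and the function names are assumptions.

```python
import numpy as np

# (row, column) offsets of the 8 neighbours at radius 1, clockwise from the
# top-left; neighbour number `bit` contributes 2**bit to the LBP code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_8_1(channel):
    """Basic LBP codes (0..255) for every interior pixel of a 2-D channel."""
    x = np.asarray(channel, dtype=float)
    rows, cols = x.shape
    centre = x[1:rows - 1, 1:cols - 1]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = x[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        codes |= (neighbour >= centre).astype(np.int64) << bit
    return codes


def lbp_features(channel):
    """(mu_L, sigma_L): mean and standard deviation of the LBP code image."""
    codes = lbp_8_1(channel)
    return codes.mean(), codes.std()


# Usage on a made-up channel where only the top-left neighbour is >= the centre
print(lbp_8_1([[5, 0, 0], [0, 1, 0], [0, 0, 0]]))
```

A histogram of the codes, if needed, is `np.bincount(lbp_8_1(channel).ravel(), minlength=256)`.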
The general classification procedure shown in Figure 2 is followed with the dataset. After the twelve features (the four statistical features, the six Haralick features, and the LBP mean μL and standard deviation σL) have been extracted for the dataset, the data are divided into two sets and classification is done in two stages: training and testing. 70% of the dataset is used as training samples and the remaining 30% as testing images. The medicinal plant leaves can be classified using the trained
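The 70/30 training and testing protocol can be sketched with scikit-learn as follows. The 250 x 12 feature matrix here is synthetic (from `make_classification`), standing in for the real leaf features; the split is stratified so that every class appears in both sets, and the classifier settings are library defaults or illustrative choices rather than the paper's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 250 x 12 feature matrix in 5 classes (NOT the paper's measurements)
X, y = make_classification(n_samples=250, n_features=12, n_informative=8,
                           n_redundant=0, n_classes=5, n_clusters_per_class=1,
                           random_state=1)

# 70% training, 30% testing, stratified by class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

classifiers = {
    "SGD": SGDClassifier(random_state=1),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM-RBF": SVC(kernel="rbf"),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

accuracy = {}
for name, clf in classifiers.items():
    model = make_pipeline(StandardScaler(), clf)   # scale features, then classify
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)   # fraction of test samples correct

for name, acc in accuracy.items():
    print(f"{name:8s} {100 * acc:5.1f}%")
```

With only about 35 training samples per class, QDA has to estimate a full 12 x 12 covariance matrix per class; that is workable here but becomes fragile as the number of features grows.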

Results and Analysis
The different colour spaces play a key role in this classification. The leaf images from the dataset are transformed into the different colour spaces. Each colour space has several colour channels, and the feature values are computed for every channel. Sample feature values for various classes are given in Table 1. Classification is done separately using the feature values of each colour channel.

Performance Analysis
The recognition rate obtained for each colour space varies with the colour channel. For each colour space, the result reported here is that of its best-performing channel. The detailed classification results for the colour channels of the various colour spaces are shown in Table 3.
Primary Colour Spaces: In the RGB colour space, the blue channel gives a better result (96%) than the red and green channels with the kNN, SVM-RBF and QDA classifiers. In the XYZ colour space, the Y channel gives the best accuracy of the three channels, 96% with the QDA classifier. The CMY colour space performs best on the Y (yellow) channel, with 96% for the kNN, SVM-RBF and QDA classifiers. The consolidated performances of the primary colour spaces are given in Figure 4: RGB-B and CMY-Y both give the best performance of 96% with kNN, SVM-RBF and QDA.
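Selecting the best channel of a colour space amounts to scoring each channel's feature matrix separately and keeping the highest. The sketch below does this with 5-fold cross-validated QDA accuracy; the three channel feature matrices are synthetic, with hand-picked (hypothetical) class separations so that one channel is clearly more discriminative. Cross-validation replaces the paper's single 70/30 split here only to keep the example short and stable.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
N_CLASSES, N_PER_CLASS, N_FEATURES = 5, 50, 12
labels = np.repeat(np.arange(N_CLASSES), N_PER_CLASS)


def synthetic_channel(separation):
    """Hypothetical channel features: class means drawn with spread `separation`."""
    means = rng.normal(0.0, separation, size=(N_CLASSES, N_FEATURES))
    return means[labels] + rng.normal(0.0, 1.0, size=(labels.size, N_FEATURES))


def best_channel(channel_features, y, classifier, cv=5):
    """Score every channel separately; return the best channel and all scores."""
    scores = {name: cross_val_score(classifier, X, y, cv=cv).mean()
              for name, X in channel_features.items()}
    return max(scores, key=scores.get), scores


# Stand-ins for the Y, U and V channels of one colour space (made-up separations)
channels = {"Y": synthetic_channel(0.1),
            "U": synthetic_channel(3.0),
            "V": synthetic_channel(0.5)}
best, scores = best_channel(channels, labels, QuadraticDiscriminantAnalysis())
print(best, scores)
```

Repeating this for every colour space and classifier, and keeping the maximum per colour space, gives the kind of per-space summary reported in this section.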
Luminance Chrominance Colour Spaces: The YIQ colour space gives its best recognition rate, 96%, on the Y channel with the QDA classifier. In the YUV colour space, the U channel performs better than all other colour spaces and channels, with 98.7% for the QDA classifier. The YCbCr colour space achieves 96% on the Cb channel with the QDA classifier. In the YES colour space, the E channel gives a recognition rate of 97.3% with the QDA classifier. The U*V*W* colour space gives 96% on the V* channel with the QDA classifier.
Other Colour Spaces: LSLM-LM gives the maximum performance of 97.3%, with QDA. The KLT-K colour space performs best at 96.0% with SVM-RBF and QDA. The consolidated performances are given in Figure 7.
When classifier performance is considered, QDA performs best in most colour spaces, and SVM-RBF comes next. For L*a*b* and HSV, SVM-RBF produces the best result of 98.7%; for YUV, QDA produces the best result of 98.7%.

Precision Recall
Precision, recall, F1-score and support are calculated for one of the best-performing classifiers, QDA, which achieves 98.7% in identifying the leaves; they are given in Table 4. The confusion matrix for the QDA classifier with the YUV-U channel is given in Figure
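Per-class precision, recall, F1-score and support can be read directly off a confusion matrix. A minimal NumPy sketch, assuming rows are true classes and columns are predicted classes (the convention of scikit-learn's `confusion_matrix`); the function name is hypothetical and the 3-class matrix in the example is a made-up toy, not the paper's result.

```python
import numpy as np


def per_class_metrics(cm):
    """Precision, recall, F1-score, support and overall accuracy from a confusion matrix.

    Rows are true classes and columns are predicted classes; a class that is
    never predicted (or never present) gets 0 instead of a division by zero.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm).copy()
    support = cm.sum(axis=1)             # images truly in each class
    predicted = cm.sum(axis=0)           # images assigned to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, support.astype(int), accuracy


# Usage on a made-up 3-class confusion matrix (not the paper's result)
precision, recall, f1, support, accuracy = per_class_metrics(
    [[5, 1, 0],
     [0, 4, 1],
     [0, 0, 6]])
print(precision, recall, f1, support, accuracy)
```

Precision divides by the column totals (everything predicted as that class) and recall by the row totals (everything truly in that class), which is why a single off-diagonal entry lowers the precision of one class and the recall of another.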

Comparison with Related Works
The proposed work uses texture features and colour transforms; leaf shape and venation are not considered. It produces better results than the previous work [9] on the same database. The results of the different methods are compared in Table 5. The computations in the compared methods were done with grey-scale operations. The neighbourhood method [46], applied to 30 plants, gives 95.83% accuracy. On the Flavia database, the Fourier moments method [47] gives 62.00%, the PNN-PCNN method [47] gives 91.00% and the SVM-BDT method [47] gives 96.00%. The 1-NN method [48] achieves 82.33% accuracy on 60 species. The texture feature method [9] for 5 species of medicinal plants achieved an accuracy of 94.7%. The proposed method, applied to the individual channels of the 20 colour spaces considered, yields a higher accuracy of 98.7%.

Conclusion
In this paper, a method is proposed for identifying medicinal plants automatically from colour textures, using statistical, GTSDM and LBP texture features. Although leaf colour is not stable during the growth period, the use of colour spaces provides additional information that helps identification. The limitations of this method are that it works only with mature leaves and that it demands more computational time. YUV-U with QDA, L*a*b*-a* with SVM-RBF and HSV-H with SVM-RBF give the best performance, 98.7%, among the twenty colour spaces considered; hence the most suitable colour spaces for identifying medicinal plant leaves are YUV with the U channel, L*a*b* with the a* channel and HSV with the H channel. The QDA classifier performs best in most of the colour spaces considered.

Figure 2. Block diagram for Classification

Algorithm 1
1: Get image L from the leaf dataset.
2: Apply the colour transform to L, producing the transformed image T.
3: Extract the features fi from the transformed image T (where i = 1 to 12).
4: Store the feature values fi in the database.
5: Repeat steps 1 to 4 for all training images.
6: Read a test image I from the dataset.
7: Extract the features ti from the test image I (where i = 1 to 12).
8: Apply the classifiers using the features fi and ti.
9: Display the classified output.
10: Calculate the classification accuracy.
11: Repeat steps 6 to 10 for all testing images.
end
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information), Volume 10, Issue 1, June 2017

Algorithm 2
1: Let L be the leaf image, where L = LIj and j = 1, 2, …, p.
2: Form the transformed image T using the colour transform for the leaf image L, where T = CTj and j = 1, 2, …, p.
3: Calculate the following features for each channel of the transformed matrix T:
a) statistical features: mean, variance, standard deviation and skewness;
b) GTSDM features: energy, homogeneity, dissimilarity, entropy, correlation (ρ) and contrast (λ);
c) Local Binary Pattern (LBP) features: mean (µL) and standard deviation (σL).
4: Insert the calculated features into the feature vector fi for each channel of the transformed matrix T.
5: Repeat steps 1 to 4 for all the training set images.
6: Read the test image I from the dataset.
7: Form the transformed image K for the test image I, where K = CTj and j = 1, 2

Figure 4. Primary Colour Spaces
Figure 5.
Figure 6. Perceptual Colour Spaces
The L*a*b* colour space performs very well, reaching 98.7% with the SVM-RBF classifier; its a* channel gives the highest recognition rate in L*a*b*, 98.7%. The L*u*v* colour space achieves 96% on the L* channel with the SGD classifier and 97.3% on the same channel with the QDA classifier. The S channel of the LMS colour space achieves 96% with three classifiers. The consolidated performances of the luminance chrominance colour spaces are given in Figure 5; YUV-U with QDA and L*a*b*-a* with SVM-RBF give the best performance of 98.7%.
Independent Colour Space: The I1I2I3 colour space gives 96% on the I1 channel with the SVM-RBF and QDA classifiers.
Perceptual Colour Spaces: The HSV colour space is the best-performing colour space for all classifiers except LDA. With the SVM-RBF classifier, HSV-H achieves the maximum performance of 98.7%. The colour spaces HSI-I, IHLS-H, IHS-I and TSL-T perform best at 96% with SVM-RBF and QDA. The performances of the perceptual family of colour spaces are given in Figure 6; HSV-H yields the best performance of 98.7% with SVM-RBF.

H Arun, Identifying Medical Plant Leaves