CNN Approaches for Classification of Indian Leaf Species Using Smartphones

Leaf species identification has a multitude of societal applications, and there is a large body of research on plant identification using pattern recognition. With robust algorithms for leaf identification, rural medicine has the potential to regain the role it held in previous decades. This paper discusses CNN-based approaches for identifying Indian leaf species photographed against a white background using smartphones. Variations of CNN models are explored for efficient leaf classification over traditional features such as shape, texture, color and venation, as well as finer features such as the uniformity of edge patterns, leaf tip, margin and other statistical features.


Introduction
Leaf detection and classification are fundamental to agriculture, forestry, rural medicine and other commercial applications. Precision agriculture demands plant leaf disease diagnosis and automatic weed identification [Ahmad, Muhammad, Ahmad et al. (2018); Bakhshipour and Jafari (2018); Bakhshipour, Jafari, Nassiri et al. (2017); Dos Santos Ferreira, Freitas, da Silva et al. (2017)]; environment and forestry need solutions for automatic tree species identification [Bakhshipour, Jafari, Nassiri et al. (2017); Ghasab, Khamis, Mohammad et al. (2015); Goyal and Kumar (2018); Mouine, Yahiaoui and Verroust-Blondet (2012); Mouine, Yahiaoui, Verroust-Blondet et al. (2013c); Mzoughi, Yahiaoui and Boujemaa (2013); Pahalawatta (2008); Yahiaoui, Mzoughi, Boujemaa et al. (2012)]; and rural medicine [Ahmad, Muhammad, Ahmad et al. (2018); Pornpanomchai, Rimdusit, Tanasap et al. (2011)] involves recognizing plant species to decide whether they are suitable for consumption. Freshness of leaves is an important trait in processing tea leaves. The problems in all of the above areas rely to a large extent on leaf classification. By taking advantage of leaf features, advanced machine learning algorithms can be applied for automatic leaf detection. Most of the existing literature on leaf classification focuses on shape, texture and color based features. In spite of the availability of various large datasets [Dobrescu, Valerio and Tsaftaris (2017)] for leaf classification research, ensemble learning over high-dimensional features of leaf image data is less addressed. This paper proposes deep learning based approaches for plant leaf classification using a large feature set for Indian leaf species.
literature [Wang, Yang, Tian et al. (2007)]. Active polygons [Bell and Dee (2019); Rabatel, Manh, Aldon et al. (2001)] and active contours [Mishra, Fieguth, Clausi et al. (2010); Qiangqiang, Zhicheng, Weidong et al. (2015)] are noteworthy. Histograms [Pape and Klukas (2014)] are widely used for background image separation. For faster detection, leaves are required to have a plain white background. Overlapping leaves are also dealt with in the literature [Pape and Klukas (2014); Soares and Jacobs (2013); Wang and Min (2012)]. Deep CNNs have been proposed for leaf counting applications [Aich and Stavness (2017); Dobrescu, Valerio and Tsaftaris (2017)]. Pyramid CNN [Morris (2018)] seeks to combine statistical boundary detection approaches [Arbelaez, Marie, Fowlkes et al. (2010); Dollár and Zitnick (2014); Kirk, Anderson, Thomson et al. (2009)] and CNN-based boundary detection algorithms [Shen, Wang, Wang et al. (2015); Xie and Tu (2017)] with additional advanced CNN architectures [Newell, Yang, Deng et al. (2016)] for dense leaf segmentation. However, it has not been tested on data from wild forestry. Although leaf boundary detection in dense setups was successful, including approaches for closed-boundary leaf segmentation, it was not convincing for leaves with internal textures. Strong additional cues were also necessary for achieving high precision. Colour characteristics were predominantly used to distinguish green plants from soil for leaf area estimation purposes [Rasmussen, Norremark, Bibby et al. (2007); Meyer and Neto (2008); Kirk, Anderson, Thomson et al. (2009)]. Cues like ExG (Excess Green Index) and ExR (Excess Red Index) provide a clear contrast between plants and soil, and have been widely used in separating plants from non-plants [Zheng, Zhang, Wang et al. (2009); Burgos-Artizzu, Ribeiro, Guijarro et al. (2011); Guerrero, Pajares, Montalvo et al. (2011)].
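As a rough illustration of how a colour cue such as ExG separates plant pixels from background, the sketch below uses the commonly quoted form ExG = 2g - r - b on chromatic coordinates (each channel divided by the channel sum). The threshold of 0.1, the function names and the sample pixels are assumptions made for this example only; they are not taken from the cited works.

```python
def excess_green(r, g, b):
    """ExG = 2g - r - b, computed on chromatic coordinates (channel / channel sum)."""
    total = r + g + b
    if total == 0:
        return 0.0  # a black pixel carries no colour information
    rn, gn, bn = r / total, g / total, b / total
    return 2 * gn - rn - bn


def plant_mask(pixels, threshold=0.1):
    """Mark a pixel as plant when its ExG value exceeds an (assumed) threshold."""
    return [[excess_green(*px) > threshold for px in row] for row in pixels]


# Hypothetical 2 x 2 RGB image: two greenish pixels, one brownish, one grey.
image = [
    [(40, 160, 50), (120, 100, 80)],
    [(30, 140, 40), (200, 200, 200)],
]
mask = plant_mask(image)
print(mask)
```

Grey and brownish pixels give an ExG value near zero, so only the two greenish pixels are kept by the mask.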
Colour Index of Vegetation Extraction (CIVE) was proposed for measuring the growth status of crops. Other combined indices derived from primary color cues were also proposed [Meyer and Neto (2008); Zheng, Zhang, Wang et al. (2009)], and Fisher Linear Discriminant (FLD) [Zheng, Shi, Zhang et al. (2010)] proved to improve the quality of segmentation. Other methods like Affinity Propagation-Hue Intensity (AP-HI) [Yu, Cao, Wu et al. (2013) (2014)] applied 3D histograms for pixel-level classification and robust leaf edge detection [Pape and Klukas (2015)]. Vukadinovic et al. [Vukadinovic and Polder (2015)] use neural network based pixel classification for background separation and proceed with watershed segmentation to segment leaves. Yin et al. [Yin, Liu, Chen et al. (2014); Ye, Cao, Yu et al. (2015)] use chamfer matching techniques. A superpixel approach [Shen, Wang, Wang et al. (2015)] is also proposed for color-based segmentation. Ren and Zemel [Ren and Zemel (2017)] report another remarkable advance in leaf identification research: they use RNN models which remember the previously identified leaves. Leaf edge detection approaches are mainly used for counting the leaves and estimating the leaf area of a growing plant. A shallow CNN [Bell and Dee (2019)] is used to distinguish plant edges from leaf edges. Canny edge detection is applied before region-based segmentation. This sequence of approaches helps in better elimination of occluded leaf images. The literature on plant species detection is also shown in Tab. 1. Considering the progress of the above literature, this paper proposes various approaches for Indian leaf species identification using deep learning.

Automated identification of leaf species
The idea is to classify the plant species after proper edge detection and segmentation. The proposed work uses a set of edge detection algorithms, which are discussed in the following subsections and shown in Fig. 1 (1-14).

Prewitt edge detection
Prewitt is a discrete differentiation operator which computes an approximation of the gradient of the image intensity. In other words, the Prewitt operator evaluates the intensity change at each point of the leaf image in order to capture variations in any direction. Horizontal and vertical intensity changes are calculated and then examined for the direction with the largest intensity variation. The operator uses two 3×3 kernels, one for horizontal and one for vertical changes. For the leaf image, assuming $G_x$ and $G_y$ are the gradient images in the horizontal and vertical directions respectively, the resulting gradient approximation is given by $G = \sqrt{G_x^2 + G_y^2}$.
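The operator can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a grayscale image stored as a list of rows and leaving border pixels at zero; the names `gx`, `gy`, `apply_kernel` and `prewitt_magnitude` are chosen for this example, and the magnitude is combined as the usual square root of the summed squares.

```python
import math

# Standard 3x3 Prewitt kernels for horizontal and vertical intensity changes.
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def apply_kernel(image, kernel, row, col):
    """Correlate a 3x3 kernel with the neighbourhood centred on (row, col)."""
    return sum(
        kernel[i][j] * image[row + i - 1][col + j - 1]
        for i in range(3)
        for j in range(3)
    )


def prewitt_magnitude(image):
    """Gradient magnitude for every interior pixel; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            gx = apply_kernel(image, PREWITT_X, row, col)
            gy = apply_kernel(image, PREWITT_Y, row, col)
            out[row][col] = math.hypot(gx, gy)
    return out


# Toy image with a vertical step edge: dark left half, bright right half.
image = [[0, 0, 255, 255] for _ in range(4)]
edges = prewitt_magnitude(image)
print(edges[1])
```

On the toy image only the horizontal kernel responds, which is the expected behaviour for a vertical edge.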

Sobel edge detection and Laplacian edge detection approaches
The conventional Sobel and Laplacian edge detectors are also applied for leaf edge and vein segmentation. The outputs of the Sobel and Laplacian operators are averaged with the Prewitt edge detection output, and the skeleton of the leaf is obtained for further classification.
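The averaging step might look like the sketch below. It assumes a pixel-wise mean of the three edge maps, the 4-neighbour Laplacian kernel, and the absolute value of the Laplacian response; the text does not specify any of these choices or whether the maps are rescaled first, and skeleton extraction is not shown. All function names are illustrative.

```python
import math

PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]  # assumed 4-neighbour variant


def correlate(image, kernel):
    """Apply a 3x3 kernel to every interior pixel; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            out[r][c] = float(sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)
            ))
    return out


def gradient_magnitude(image, kernel_x, kernel_y):
    """Combine horizontal and vertical responses into one magnitude map."""
    gx, gy = correlate(image, kernel_x), correlate(image, kernel_y)
    return [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


def averaged_edges(image):
    """Pixel-wise mean of the Prewitt, Sobel and |Laplacian| edge maps."""
    prewitt = gradient_magnitude(image, PREWITT_X, PREWITT_Y)
    sobel = gradient_magnitude(image, SOBEL_X, SOBEL_Y)
    laplace = [[abs(v) for v in row] for row in correlate(image, LAPLACIAN)]
    return [[(p + s + l) / 3.0 for p, s, l in zip(rp, rs, rl)]
            for rp, rs, rl in zip(prewitt, sobel, laplace)]


image = [[0, 0, 255, 255] for _ in range(4)]  # toy vertical step edge
combined = averaged_edges(image)
print(combined[1])
```

Because the three operators produce responses on different scales, a practical pipeline would probably normalise each map before averaging; that step is left out here to keep the sketch short.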

K-Nearest neighbor classification
The edge-detected leaf images are classified using the k-NN approach. The PSNR value for each image is multiplied by 100 and taken as input to the k-NN code. The k-NN uses Manhattan distance to find the k nearest neighbors and takes a majority vote to classify a particular image. Extra values are added for normalization; they do not affect the k-NN calculation because the same values are used for each dataset, so their contribution to the distance is 0. Leaves of Pipal, Nerium, Neem, Ashoka, Crown flower, Cannonball tree, Hibiscus, Mango, Mint, Lemon, Moringa, Betel, Jackfruit and Curry Tree were photographed with an Android smartphone and considered for examination. Ten positional variations of each species were captured with the mobile phone camera against a white background. The algorithm achieved 72% accuracy for detecting 9 leaf species' positions and 79% accuracy for detecting all 14 leaf species.
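A minimal sketch of the classification rule described above, with Manhattan distance and a majority vote. The feature values stand in for PSNR multiplied by 100 and are invented for illustration, as are the value of k and the function names.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, label) pairs; returns the majority label."""
    neighbours = sorted(train, key=lambda item: manhattan(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional features (PSNR x 100) for two species.
train = [
    ([2510], "Neem"), ([2540], "Neem"), ([2495], "Neem"),
    ([3120], "Mango"), ([3090], "Mango"), ([3150], "Mango"),
]
print(knn_predict(train, [2520]))
print(knn_predict(train, [3100]))
```

A constant extra component appended to every feature vector would add zero to each Manhattan distance, which is why such padding leaves the neighbour ranking unchanged.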
Structural similarity values indicated poorer recognition accuracy across the various positions. Overall, PSNR values were highest for leaves of Crown flower and Cannonball tree, while the lowest PSNR values, and the worst recognition, were obtained for Curry leaves. The reason is that the dataset contains Neem leaves, which are close to Curry leaves in structure and shape, whereas Crown flower and Cannonball tree have distinct characteristics in color, shape, vein and texture, which resulted in much higher accuracies.
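For reference, PSNR can be computed with the standard definition 10 log10(MAX^2 / MSE). The sketch below assumes 8-bit grayscale images stored as lists of rows; which pair of images is compared for each leaf is not stated in the text, so the `reference` and `processed` arguments are placeholders.

```python
import math


def psnr(reference, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    ref = [v for row in reference for v in row]
    proc = [v for row in processed for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, proc)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2 x 2 images differing in two pixels.
reference = [[100, 100], [100, 100]]
processed = [[110, 90], [100, 100]]
value = psnr(reference, processed)
feature = round(value * 100)  # scaled as described for the k-NN input
print(value, feature)
```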

SVM classification without edge detection
The fundamental approach to classification using SVM is adopted here. The 14 Indian leaf species were examined using a basic SVM. Fig. 2 shows the SVM classification accuracy of Nerium against the other species without edge detection. It is interesting to note that Nerium is misclassified as Mango and Neem in various experiments. This emphasizes the need for edge detection before classification. The accuracy for the other species before edge detection is presented in Figs. 3-16. Although the detection rate is reasonably high, the misclassification rate is also high.
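Per-species accuracy and the species a leaf is confused with can be tallied from classifier output as sketched below. The labels and predictions are invented for illustration and do not reproduce the reported experiments; the function names are likewise hypothetical.

```python
from collections import Counter, defaultdict


def confusion_rows(true_labels, predicted_labels):
    """For each true species, count how often each species was predicted."""
    rows = defaultdict(Counter)
    for truth, predicted in zip(true_labels, predicted_labels):
        rows[truth][predicted] += 1
    return rows


def per_class_accuracy(rows, label):
    """Fraction of samples of `label` that were predicted as `label`."""
    total = sum(rows[label].values())
    return rows[label][label] / total if total else 0.0


# Hypothetical SVM output for five Nerium and three Mango images.
true_labels = ["Nerium"] * 5 + ["Mango"] * 3
predicted = ["Nerium", "Nerium", "Mango", "Neem", "Nerium",
             "Mango", "Mango", "Nerium"]
rows = confusion_rows(true_labels, predicted)
print(dict(rows["Nerium"]), per_class_accuracy(rows, "Nerium"))
```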

Deep learning based approaches
We also explored using k-NN and SVM for pre-training with an ANN. The results are promising when compared to all earlier approaches. First, we defined two preprocessing functions using the OpenCV package. The first, called image to feature vector, resizes the image and then flattens it into a list of raw pixel intensities. The second, called extract color histogram, extracts a 3D color histogram from the HSV color space, normalizes it with cv2.normalize and then flattens the result. We use 85% of the dataset as the training set and 15% as the test set. Finally, we applied k-NN and SVM for pre-training and an ANN to evaluate the data. With k-NN, the raw pixel accuracy and the histogram accuracy are roughly the same. On the 5-label sub-dataset the histogram accuracy is slightly higher than the raw pixel accuracy, but overall the raw pixels give the better result. With the ANN classifier, the raw pixel accuracy is much lower than the histogram accuracy. For the whole dataset (10 labels), the raw pixel accuracy is even lower than random guessing. Based on these results, we found that in order to improve the accuracy listed in Tab. 2, it is necessary to use a deep learning method. In addition, we have implemented leaf detection with an MLP (multi-layer perceptron).
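The two preprocessing steps and the 85/15 split can be approximated without OpenCV as sketched below. This is a stand-in under several assumptions: the image is a list of rows of (r, g, b) tuples, resizing is omitted, the histogram uses 4 bins per HSV channel, and it is normalised to sum to 1 rather than with cv2.normalize. The underscore function names are a rendering of the names used in the text, and `train_test_split` is a hypothetical helper.

```python
import colorsys
import random


def image_to_feature_vector(image):
    """Flatten an RGB image into one list of raw channel intensities."""
    return [channel for row in image for pixel in row for channel in pixel]


def extract_color_histogram(image, bins=(4, 4, 4)):
    """Flattened 3D HSV histogram, normalised so that the bins sum to 1."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    count = 0
    for row in image:
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            hi = min(int(h * hb), hb - 1)  # clamp the value 1.0 into the last bin
            si = min(int(s * sb), sb - 1)
            vi = min(int(v * vb), vb - 1)
            hist[(hi * sb + si) * vb + vi] += 1
            count += 1
    return [x / max(count, 1) for x in hist]


def train_test_split(samples, test_fraction=0.15, seed=42):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
raw = image_to_feature_vector(image)
hist = extract_color_histogram(image)
train, test = train_test_split(list(range(20)))
print(len(raw), len(hist), len(train), len(test))
```

The raw-pixel vector grows with image resolution while the histogram has a fixed length, which is one plausible reason the two feature types behave differently across classifiers.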
Although multi-layer perceptron (MLP) models have been used successfully for image recognition, the full connectivity between their nodes makes them suffer from the curse of dimensionality, so they do not scale well to higher-resolution images. In this part we therefore built a CNN using TensorFlow, the deep learning framework by Google. TensorFlow defines the CNN architecture as a stack of distinct layers that transform the input volume into an output volume (e.g. holding the class scores) through a differentiable function. The first layer holds the images and is followed by 3 convolutional layers with 2 x 2 max-pooling and Rectified Linear Units (ReLU). The input to a convolutional layer is a 4-dim tensor with the following dimensions: image number, Y-axis of each image, X-axis of each image, and channels of each image. The output is another 4-dim tensor with the following dimensions: image number (same as the input), Y-axis of each image, X-axis of each image, and channels produced by the convolutional filters. If 2x2 pooling is used, the height and width of the input images are divided by 2. Two fully-connected layers were built at the end of the network. Their input is a 2-dim tensor of shape [num_images, num_inputs] and their output is a 2-dim tensor of shape [num_images, num_outputs]. To connect the convolutional layers to the fully-connected layers, a flatten layer is needed to reduce the 4-dim tensor to a 2-dim tensor that can be used as input to the fully-connected layer. The very end of the CNN is a softmax layer, which normalizes the output of the fully-connected layer so that each element lies between 0 and 1 and all the elements sum to 1. Cross entropy is used as the cost function for training. The optimization method is the Adam optimizer, an advanced form of gradient descent. We also explored another variation of CNN: we retrained the last layer of a pre-trained deep neural network called Inception V3, also provided by TensorFlow.
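The tensor shapes through such a network, together with the softmax and cross-entropy computations, can be traced with the short sketch below. The batch size, image size, filter counts and fully-connected width are assumptions for illustration, as is the choice of padding that preserves height and width before pooling; the code only tracks shapes and does not train a network.

```python
import math


def conv_pool_shape(shape, num_filters, pool=2):
    """Shape after a size-preserving convolution followed by pool x pool max-pooling."""
    n, height, width, _ = shape
    return (n, height // pool, width // pool, num_filters)


def flatten_shape(shape):
    """Collapse [images, y, x, channels] into [images, features]."""
    n, height, width, channels = shape
    return (n, height * width * channels)


def softmax(logits):
    """Map scores to values in (0, 1) that sum to 1 (shifted for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_index])


shape = (32, 128, 128, 3)        # hypothetical batch of 32 RGB images, 128 x 128
for filters in (32, 32, 64):     # hypothetical filter counts for the 3 conv layers
    shape = conv_pool_shape(shape, filters)
flat = flatten_shape(shape)      # input shape of the first fully-connected layer
fc1 = (flat[0], 128)             # hypothetical hidden width
fc2 = (fc1[0], 14)               # one output per leaf species
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, 0)
print(shape, flat, fc2, loss)
```

Each pooling stage halves the height and width, so three stages reduce a 128 x 128 input to 16 x 16 before flattening.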
Inception V3 is trained for the ImageNet Large Visual Recognition Challenge using the data from 2012. This is a standard task in computer vision, where models try to classify entire images into 1000 classes, like "Zebra", "Dalmatian", and "Dishwasher". Before retraining this pre-trained network, we ensured that our own dataset was not already part of its pre-training data. Modern object recognition models have millions of parameters and can take weeks to fully train. Transfer learning is a technique that shortcuts a lot of this work by taking a model fully trained on a set of categories like ImageNet and retraining it from the existing weights for new classes; the results are shown in Tab. 3. Though it is not as good as a full training run, this is surprisingly effective for many applications and can be run in as little as thirty minutes on a laptop, without requiring a GPU. First, the old top layer is removed from the pre-trained model, and a new layer is trained on the dataset. None of the leaf images were involved in pre-training. The strength of transfer learning is that lower layers that have been trained to distinguish between some objects can be reused for many recognition tasks without any alteration. The script runs for 4,000 training steps. Each step chooses ten images at random from the training set, finds their bottlenecks in the cache, and feeds them into the final layer to get predictions. Those predictions are then compared against the actual labels to update the final layer's weights through back-propagation. Among the CNN-based models, we also examined a CNN with a sigmoid output.
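The final-layer training loop described above can be imitated with a small softmax classifier trained on fixed feature vectors. This is a toy sketch and not the retraining script itself: the two-dimensional "bottlenecks", the learning rate, plain gradient descent as the update rule, and all function names are assumptions, and images are drawn with replacement for simplicity.

```python
import math
import random


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(weights, biases, x):
    """Linear layer: one score per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def predict(weights, biases, x):
    probs = softmax(scores(weights, biases, x))
    return probs.index(max(probs))


def train_final_layer(bottlenecks, labels, num_classes,
                      steps=4000, batch_size=10, lr=0.1, seed=0):
    """Train only a new top layer on cached, frozen bottleneck features."""
    rng = random.Random(seed)
    dim = len(bottlenecks[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for _ in range(batch_size):          # ten random images per step
            i = rng.randrange(len(bottlenecks))
            x, y = bottlenecks[i], labels[i]
            probs = softmax(scores(weights, biases, x))
            for c in range(num_classes):     # cross-entropy gradient: p - one_hot
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for d in range(dim):
                    grad_w[c][d] += err * x[d]
        for c in range(num_classes):         # averaged gradient-descent update
            biases[c] -= lr * grad_b[c] / batch_size
            for d in range(dim):
                weights[c][d] -= lr * grad_w[c][d] / batch_size
    return weights, biases


# Hypothetical cached bottlenecks for two well-separated classes.
bottlenecks = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [1.0, 0.2], [0.9, 0.0], [0.7, 0.1],
               [0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [0.2, 1.0], [0.0, 0.9], [0.1, 0.7]]
labels = [0] * 6 + [1] * 6
weights, biases = train_final_layer(bottlenecks, labels, num_classes=2)
correct = sum(predict(weights, biases, x) == y for x, y in zip(bottlenecks, labels))
print(correct / len(labels))
```

Because the bottleneck vectors are computed once and cached, each step only touches the small top layer, which is what keeps this kind of retraining cheap.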
The detection accuracy shown in Tab. 4 compares favorably with earlier models like k-NN and SVM. However, the models were subjected to sample image edge detection before feature learning and classification. Binary CNNs were used in the sigmoid variation. The CNN models were pre-trained with the plain SVM discussed in the earlier section. Twenty epochs were planned, and the validation loss and accuracy were recorded. Up to about 12 epochs, the validation loss is reduced to 50% as compared to the training loss (Figs. 17-20). The validation loss increases after 19 epochs; therefore, a stopping criterion of 20 epochs is chosen for the proposed work. The validation accuracy for every iteration per epoch is presented in Figs. 21-22. Fig. 23 presents the leaf identification accuracy of the binary CNN without pre-training. The detection accuracy is much lower than with pre-training, which supports the finding that pre-training using the proposed methods improves the CNN classification accuracy (Fig. 24).
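A stopping point can also be picked automatically from the validation loss curve. The sketch below shows a patience-based rule, a common alternative to a fixed epoch budget and not the criterion used in this work; the loss values and the patience of 2 are invented for illustration.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-based.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs after the best epoch so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses)


# Hypothetical validation losses: improving for four epochs, then rising.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.60]
best, stopped = early_stopping(losses, patience=2)
print(best, stopped)
```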

Conclusion and future work
This paper proposes CNN-based approaches for detecting Indian leaf species. The experiments were conducted with pre-training and edge detection, and the CNN was evaluated with a softmax as well as a sigmoid layer. The results show that, with proper edge detection and pre-training, a binary CNN with sigmoid is able to detect the leaf species more accurately. In future, further exploration of fast and robust CNNs with multiple deep layers would support real-time leaf detection using smartphones.