Research on plant leaf classification and retrieval method based on machine learning

In this article, a variety of leaf images are taken as the research objects. The images were preprocessed, and the color, shape and texture features of the leaf images were extracted. Five machine learning algorithms were used to classify and retrieve the feature values of these leaf images, and the recognition effect of each algorithm was obtained.


Introduction
The first and most important step in protecting green plants is to classify them efficiently and quickly [1]. Because human cognitive ability is limited and plant species are extremely varied, manual classification and identification of plants is inefficient and difficult. In recent years, with the rapid development of computer technology, image recognition and artificial intelligence have become research focuses, and image retrieval and classification technologies have found a wide range of applications in various fields [2].
In this article, feature extraction methods are applied to obtain the relevant feature values of leaf images. A machine learning classifier is then trained, and finally a classification test is carried out to achieve leaf classification and retrieval. By comparing the retrieval rate and recognition rate, a relatively good machine learning algorithm can be identified.

Classification of plant leaves
Generally speaking, the leaf recognition process can be divided into two stages, namely the training stage and the testing stage. During the training stage, a trained classifier is formed by learning the feature vectors of the leaves in the training data set. During the testing stage, the feature vectors of the test data set are input into the classifier for prediction, and the corresponding plant species are obtained. The machine learning algorithm completes the classification of plant leaves through these two stages of training and testing.
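The two stages can be illustrated with a minimal sketch. It uses a hand-written k-nearest neighbour classifier only because that method is easy to read; the feature vectors, species names and the helper names train and predict are hypothetical and are not taken from the experiments in this article.

```python
import math
from collections import Counter


def train(features, labels):
    """Training stage: k-NN simply stores the labelled feature vectors."""
    return list(zip(features, labels))


def predict(model, query, k=3):
    """Testing stage: majority vote among the k stored vectors nearest to query."""
    nearest = sorted(model, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two made-up species.
train_x = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_y = ["maple", "maple", "maple", "oak", "oak", "oak"]

model = train(train_x, train_y)

# Feature vectors of unseen leaves are passed to the trained model.
test_x = [(1.1, 0.9), (4.9, 5.1)]
predictions = [predict(model, x) for x in test_x]
print(predictions)
```

Other classifiers follow the same pattern: a fitting step on the training features, then a prediction step on the test features.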
In the grayscale conversion, f represents the gray value of each pixel, r represents the value of the red component of each pixel, g represents the value of the green component, and b represents the value of the blue component. Leaf image contour extraction can be divided into two parts: contour recognition and contour drawing. This article extracts the contour by calling the library function in OpenCV. The result of contour extraction is shown in Figure 1.

Percentage of pixels 1    0.1563    F [4]
Percentage of pixels 2    7.9958    F [5]
The mean value of tonality    4.7244
The mean value of saturation    0.1587

Table 2. A group of contour features extracted from a certain leaf.
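As an illustration of how a gray value f can be obtained from r, g and b, the sketch below uses the widely used weights 0.299, 0.587 and 0.114, which are also the weights OpenCV applies in its own RGB-to-gray conversion. These weights are an assumption here, since the article's exact formula is not reproduced in this text, and the function names and the 2x2 test image are hypothetical.

```python
def to_gray(r, g, b):
    """Weighted sum of the colour components (assumed weights, see lead-in)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_image(rgb_rows):
    """Convert a nested list of (r, g, b) pixels to rounded gray values."""
    return [[round(to_gray(r, g, b)) for (r, g, b) in row] for row in rgb_rows]


# Hypothetical 2x2 image: red, green, blue and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = gray_image(image)
print(gray)
```

Green contributes most to the gray value and blue least, which roughly follows the sensitivity of human vision to each component.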

Machine learning algorithm
After the feature extraction of the leaf images is completed, five commonly used machine learning algorithms in the sklearn package are called for training and testing. The trained algorithm is used to predict the species of the test data set, and the accuracy rate and log loss value are then calculated by comparing the predicted species with the actual species of the test leaves. These two measures are used to compare the advantages and disadvantages of the machine learning algorithms: (1) k-nearest neighbour algorithm, (2) support vector machine algorithm, (3) decision tree algorithm, (4) random forest algorithm. The support vector machine is evaluated in two variants, SVC and NuSVC, which gives the five algorithms compared in the results.
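The two measures can be sketched in plain Python as follows. This is a minimal illustration of the usual definitions of accuracy and multi-class log loss, not the article's code; in practice the corresponding functions in sklearn.metrics would normally be used. The labels and probabilities are made up.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, probabilities):
        # Clip to avoid log(0) when a classifier outputs exactly 0 or 1.
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical true labels, predicted labels and class probabilities.
y_true = ["maple", "oak", "oak", "birch"]
y_pred = ["maple", "oak", "birch", "birch"]
probs = [
    {"maple": 0.8, "oak": 0.1, "birch": 0.1},
    {"maple": 0.2, "oak": 0.7, "birch": 0.1},
    {"maple": 0.1, "oak": 0.4, "birch": 0.5},
    {"maple": 0.0, "oak": 0.1, "birch": 0.9},
]

acc = accuracy(y_true, y_pred)
loss = log_loss(y_true, probs)
print(acc, round(loss, 4))
```

Accuracy only counts whether the top prediction is right, while log loss also penalises low confidence in the correct class. A classifier that assigns probability 1 to every correct class therefore has a log loss of essentially zero.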

Leaf image retrieval method
Common image retrieval methods mainly include text-based image retrieval (TBIR) and content-based image retrieval (CBIR). The TBIR method marks the features by hand and then retrieves images by those marks. The CBIR method is generally divided into three parts, namely the user interface, database image processing and the image retrieval module. Feature extraction is also the basic part of image retrieval, and covers color features, texture features, shape features, spatial features, etc. After the feature values of the images are extracted, a measure is needed to compare the similarity between the feature vector of the query image and the feature vectors in the image index feature database. Common measures are the Minkowski distance, the histogram and the Euclidean distance. Machine learning can make leaf image retrieval more efficient through training and learning; the foundation of these algorithms is statistical theory. Common machine learning methods include the KNN algorithm, the support vector machine (SVM) algorithm and the decision tree algorithm, etc. [3][4][5].
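The similarity comparison can be sketched as follows, assuming each indexed image is represented by a numeric feature vector. The Euclidean distance is the Minkowski distance with p = 2. The file names, feature values and the helper name retrieve are hypothetical.

```python
def minkowski(u, v, p):
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def euclidean(u, v):
    """Euclidean distance, i.e. the Minkowski distance with p = 2."""
    return minkowski(u, v, 2)


def retrieve(query, index, top_n=2):
    """Return the names of the top_n indexed images closest to the query."""
    ranked = sorted(index.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical feature index: image name -> feature vector.
index = {
    "leaf_a.jpg": (0.0, 0.0),
    "leaf_b.jpg": (3.0, 4.0),
    "leaf_c.jpg": (1.0, 1.0),
}
query = (0.9, 1.2)
results = retrieve(query, index)
print(results)
```

A smaller distance means a more similar image, so retrieval reduces to sorting the index by distance to the query vector.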

Results and analysis
For this study, over 1500 leaf images from more than 30 plant species were selected for the experiment; half were used as the training data set and half as the testing data set. The experiments were programmed in Python under the PyCharm environment. The color, shape and texture features were extracted by calling functions in the OpenCV library. Several machine learning algorithms from the sklearn library were then called for classification, and the corresponding accuracy rate and log loss value were calculated by comparing the true labels of the test samples with the values predicted by each algorithm.
The results are as follows. Figure 5 shows part of the feature extraction results, Figure 6 shows the output of the classifier run, Table 4 compares the performance of the five machine learning algorithms, and Figures 7 and 8 are bar charts of the accuracy rate and log loss results. In terms of accuracy, the decision tree algorithm performs best, reaching 100%; it is followed by the random forest, KNN and NuSVC algorithms, all above 90%; the worst is the SVC algorithm, at only 83%. In terms of log loss, the SVC and NuSVC algorithms have distinct log loss values, while the other three algorithms have no obvious log loss value.

Conclusion
In this article, we find that the accuracy rate reflects the effect of machine learning well for a given test data set. As can be seen from the results, the decision tree algorithm is significantly better than the other four algorithms, while the SVC algorithm is not as good as the others. However, there are many methods and indexes for evaluating machine learning, and the two indexes used in this article cannot fully reflect the advantages and disadvantages of these algorithms. The classification effect of a machine learning algorithm should also be evaluated comprehensively using statistical indicators such as ROC-AUC and the Gini coefficient. In addition, comparing the classification effect of a modified algorithm with that of the original one is another effective research angle.