Combining Leaf Shape and Texture for Costa Rican Plant Species Identification

In the last decade, research in Computer Vision has developed several algorithms to help botanists and non-experts to classify plants based on images of their leaves. LeafSnap is a mobile application that uses a multiscale curvature model of the leaf margin to classify leaf images into species. It has achieved high levels of accuracy on 184 tree species from Northeast US. We extend the research that led to the development of LeafSnap along two lines. First, LeafSnap’s underlying algorithms are applied to a set of 66 tree species from Costa Rica. Then, texture is used as an additional criterion to measure the level of improvement achieved in the automatic identification of Costa Rica tree species. A 25.6% improvement was achieved for a Costa Rican clean image dataset and 42.5% for a Costa Rican noisy image dataset. In both cases, our results show this increment as statistically significant. Further statistical analysis of visual noise impact, best algorithm combinations per species, and best value of k, the minimal cardinality of the set of candidate species that the tested algorithms render as best matches is also presented in this research.


Introduction
Plant species identification is fundamental to conduct studies of biodiversity richness of a region, inventories, monitoring of populations of endangered plants and animals, climate change impact on forest coverage, bioliteracy, invasive species distribution modelling, payment for environmental services, and weed control, among many other major challenges for biodiversity conservation.Unfortunately, the traditional approach used by taxonomists to identify species is tedious, inefficient and error-prone [1].In addition, it seriously limits public access to this knowledge and participation as, for instance, citizen scientists.In spite of enormous progress in the application of computer vision algorithms in other areas such as medical imaging, OCR, and biometrics [2], only recently have they been applied to identify organisms.In the last decade, research in Computer Vision has produced algorithms to help botanists and non-experts classify plants based on images of their leaves [3,4,5,6,7,8,9,10,11,12].However only a few studies have resulted in efficient systems that are used by the general public, such as [13].The most popular system to date is LeafSnap [13].It is considered a state-of-the-art mobile leaf recognition application that uses an efficient multiscale curvature model to classify leaf images into species.LeafSnap was applied to 184 tree species from Northeast USA, resulting in a very high accuracy method for species recognition for that region.It has been downloaded by more than 1 million users [13].LeafSnap has not been applied to identified trees from tropical countries such as Costa Rica.The challenge of recognizing tree species in biodiversity rich regions is expected to be considerably bigger.Noisy Subset Fresh leaf images were captured during field trips to both La Sabana and INBiopark.No press was used to flatten them.A total of 2345 fresh leaf images were captured.This subset was captured against white uniform backgrounds (normally a sheet of paper).Each image has a 3000x4000 pixel resolution, in JPG format.No artifacts were removed manually.However as explained in Section 3.3 several automated image enhancements were performed both on the clean subset and the noisy subset.Figure 1b presents a noisy leaf image sample.The camera used is a Canon PowerShot SD780 IS.

Image Leaf Segmentation
The first step to process the leaf image is to segment which pixels belong to a leaf and which do not.We used the same approach as LeafSnap by applying color-based segmentation.

HSV Color Domain
When segmenting with color it is imperative to use the right color domains in order to exclude undesired noise.[13] states how, in the HSV domain, Hue had a tendency to contain greenish shadows from the original leaf pictures.Saturation and Value however, had a tendency to be clean.So we also used those two color components for leaf segmentation.Figure 2 shows the noise present in the Hue channel, but also shows how Saturation and Value are cleaner.This was useful for posterior segmentation using Expectation-Maximization (EM).We used OpenCV [24] to convert the original images into the HSV domain.Then, by using NumPy [25] , we extracted the Saturation and Value components, which were fed to the Expectation-Maximization (EM) algorithm.

Expectation-Maximization (EM)
Once images were converted to HSV and the desired channels were extracted, we applied EM to the color domain in order to cluster the pixels into one of 2 possible groups: leaf and non-leaf groups [13].Figure 3 shows several samples of the final segmentation after applying EM.As shown, EM segments the image into the leaf and non-leaf pixel groups by assigning a 1 to the leaf pixels and a 0 to the non-leaf pixels.This method also works well on both simple and compound leaves.It is important to highlight that we did not assign weights to each cluster manually as the work done by [13], because we wanted to leave the process as automatic as possible.In their work, they improve the segmentation of certain types of leaves, especially skinny ones, by manually assigning different weights to each cluster.Weights play a fundamental role into the segmentation process as reported in [26].
Training Algorithm 1 describes the process to train the EM algorithm.We used OpenCV's implementation of EM.First we stacked all the pixels of the image matrix into a single vector.Then we trained the model using a diagonal matrix as a co-variance matrix, and we assigned two clusters to it, which internally were translated into two Gaussian Distributions, one for the leaf cluster and one for the non-leaf cluster.Once trained, we returned the EM object.

Algorithm 1 EM Training
stackedP ixels ← ∅ for all pixelRow in image do for all pixel in pixelRow do stackedP ixels ← stackedP ixels ∪ pixel end for end for EM ← OpenCV.EM (nClusters = 2, covM atT ype = OpenCV.DIAGON AL) EM.train(stackedP ixels) return EM Pixel Prediction Algorithm 2 explains how the owning cluster of a single pixel of the image was predicted.Once the EM object was trained, the OpenCV's implementation allowed to compute the probabilities of the pixel belonging to each cluster.However, for more efficiency, we created a dictionary containing each unique (Saturation, V alue) pair as key, and the cluster as value.If the key was not found in the dictionary, we then proceeded to predict the probabilities for each cluster, added the key and cluster to the dictionary, and returned the associated cluster with the biggest probability.After segmentation of the leaf using EM, some extra work was needed to clean up several false positives areas.We followed the process of LeafSnap [13].First of all, each image was clipped to the internal leaf size provided by the segmentation.Then the image was resized to a common leaf area, followed by a heuristic applied to delete undesired objects.Finally, the stem was deleted since it added noise to the model of curvature (not that much to the texture model).

Clipping
Before extracting features, a clipping phase was needed in order to resize the region where the leaf was present to a common size.The clipping algorithm was trivial to implement once the contours were calculated using OpenCV.As shown in Algorithm 3, the minimum and maximum coordinates were calculated for all contour x and y components, followed by a cut of the leaf image matrix to those resulting minimum and maximum coordinates.The was used to allow posterior algorithms ignore false positives regions that intersect the border.The results of the Clipping phase can be seen in Figure 4.

Resizing Leaf Area
Once the leaf area had been clipped, a resize was applied in order to standardize the leaf areas inside all images.If not, the model of curvature would be affected negatively since the amount of contour pixels varied significantly [13].Our implementation of the resize was applied to the whole clipped image.Images may end up having different sizes, but the internal leaf areas were the same or almost the same.Algorithm 4 shows how a new width and height were obtained by calculating the ratio between the current leaf area, the desired new leaf area, and the current height and width of the image.Finally, OpenCV was used to resize the clipped image to a constant leaf size of 100, 000 pixels.This number was used empirically based on LeafSnap's original dataset resolution and the internal regions associated with leaf pixels.This approach means that the absolute measures of leafs are lost.

Deleting Undesired Objects
Even when uniform background images were used, initial segmentation turned out not to be enough when the image contained undesired objects, such as dust, shadows, among others.[13] attempted to delete these noisy objects by using the same heuristic we implemented as shown in Algorithm 5.By using Scikit-learn [27] we calculated the connected components of the segmented image.We deleted the "small" components by area (in pixels).Small components were normally dust, small bugs or pieces of leaves, among other things.Once all small components were deleted, if the remaining was only one then we took that to be the leaf.If more than one component remained, then we calculated for each remaining component how many pixels had intersections with the image margin.We then deleted the component with the biggest number of intersections.The thinking behind this is to get rid of components that were not centered on the image, which tend to be non-leaf objects.Finally, the component with the biggest area from the remaining components was taken as the leaf.

Deleting the stem
We followed the approach for stem deletion described in [13].If the stem was left intact, it would add noise to the model of curvature, given all the possible sizes the stem may take.Algorithm 7 shows the procedure.First, a Top Hat transformation was applied to the segmented image in order to leave only possible stem regions, as shown in Figure 5. Then all connected components were calculated from the Top Hat transformed image, and also their quantity.Then we looped over all the components, deleting every single one from the original segmentation and recalculating the new number of connected components.If the original number of recalculated connected components did not change upon deletion, that meant the current component was a good stem candidate (heuristically, a stem does not affect how many original connected components there are).Once all stem candidates were calculated, the one with the biggest area and largest aspect ratio was chosen to be the stem, as described in Algorithm 6.

Leaf Feature Extraction
Feature extraction was designed and implemented considering three main design goals: • Efficiency: algorithms should be fast enough to support future mobile apps.
• Rotation invariance: the leaf may be rotated by any angle within the image.Two different feature sets were calculated.The first one captures information about the contour of the leaf, while the second one captures information about its texture.Section 3.4.1 describes how we implemented Histogram of Curvature over Scale (HCoS) [13] to extract contour information.Section 3.4.2describes how we implemented Local Binary Pattern Variance (LBPV) to extract texture information.Both models generate histograms that are suitable for distance metric calculations.

Extracting contour information (HCoS)
The model of curvature used by LeafSnap comprises several steps.Previously explained segmentation and post-processing resulted in a mask of leaf and non-leaf pixels.The non-leaf pixels have values of 0, and the leaf pixels have values of 1. First, the different contour pixels were found, then 25 different masks with disk shapes were applied on top of each contour point, providing both an area of the intersection and an arc length.Then all calculations at each scale were turned into a histogram, resulting in 25 different histograms per image, one per scale.Finally, the 25 resulting histograms were concatenated, conforming the HCoS.
Contours On a binary image (resulted from the previous segmentation), the OpenCV implementation of contour finding worked very well, based on the original algorithm of [28] for contour finding.The algorithm generated in a vector of pairs (x, y) that represent the coordinates where a contour pixel was found.A contour pixel can be defined as a pixel which is surrounded by at least another pixel with the opposite color of it.Figure 6 shows in red the contour pixels detected in the original image, calculated from the segmented mask.Notice how shadows affect the contour algorithm, since they were not segmented perfectly.

Scales
The original algorithm of [13] makes use of 25 different scales, creating one disk per scale.We implemented a discrete version of the disks making use of matrices based on [29], whose code is available in 0 0 1 0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 0 0 0 1 0 0 (a) A filled disk of radius=2 pixels used to calculate area of the intersection with the leaf segmentation 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 (b) An unfilled disk (ring) of radius=2 pixels used to calculate arc length of the intersection with the leaf segmentation The disks used are actually matrices of 1's and 0's.They were applied as masks over specific parts of the segmented leaf image (mostly contour points).The idea was to count how many pixels intersected the segmented image and each disk mask.We created two different types of disks.The first type is filled up with 1's, as shown in Figure 7a.It is used to measure the area of intersection.The second type is more like a ring, where 1's are present only in the circumference of the disk (see Figure 7b).It is used to determine the arc's length of the intersection of the disk with the leaf, at a given contour point.
Once all disks were created for both area and arc length versions, we applied them to each pixel of the contour vector, as shown by Algorithm 8.

Algorithm 8 Area and Arc Length Vector Calculation
arcs ← empty areas ← empty for all pixel of the contour vector do for all areaM ask, arcM ask = 1 to 25 do center areaM ask, arcM ask at current contour pixel area ← count(areaM ask ∩ segmentation) areas ← areas ∪ area arc ← count(arcM ask ∩ segmentation) arcs ← arcs ∪ arc end for end for Figure 8 shows how one specific area disk was applied to the segmented image, for an specific scale (radius=18 in this case), at a given contour pixel.The gray area shows the intersection of pixels with the leaf segmentation.This procedure was then repeated over all the pixels from the contour vector in the same way.
Histograms Using NumPy at each scale, a histogram was created from all the values generated from all contour pixels, as described by Algorithm 8. We used histograms of 21 bins, as [13] did.This means a total

Extracting texture information (Local Binary Pattern Variance (LBPV))
We aimed at improving the model of curvature by adding texture analysis.We used a Local Binary Pattern Variance (LBPV) implementation called Mahotas [30] that is invariant to rotation, multiscale, and efficient.This implementation of LBPV is based on the algorithm of [31] and makes use of NumPy libraries to represent the image and the resulting histograms.It works on gray images, so we used OpenCV to convert the Red Green Blue (RGB) images to gray scale images.The LBPV approach detects micro structures such as lines, spots, flat areas, and edges [31].This is useful to detect patterns of the veins, areas between them, reflections, and even roughness.Figure 9 shows what two different LBPV implementations look like.The upper image shows a radius = 2, pixels = 16 (R2P16) implementation, and the one below shows a radius = 1, pixel = 8 (R1P8) pixel implementation.The different variants of the LBPV used are shown in Table 1.In some cases we concatenated two histograms of different scales such as R1P8 & R2P16.It is important to note that we did not use the variant which samples 24 pixels, since it generated too large histograms.We did, however, run some tests in which we noticed the 24 pixels variation didn't add more accuracy, so we decided to ignore this method.
Just like the HCoS, LBPV generates histograms that can be used for similarity search.Several histograms were generated at different radius sizes and different circumference pixel sampling, in order to validate which combinations provided the best results.The Mahotas implementation returned a histogram of the feature counts, where position i corresponds the count of pixels in the leaf texture that had code i.Also, given that the implementation is a LBPV, non-uniform codes are not used.Thus, the bin number i is the i − th feature, not just the binary code i [30].Figure 10 describes at a very high level how the process of extracting the local patterns histograms works.First, the image is converted to a gray scale image.Then, for each pixel inside the segmented leaf area, we calculated the local pattern with different radius and circumference using the mahotas implementation.Finally, each pattern was assigned to a bucket in the resulting histogram.Each pixel has a number assigned to it corresponding to a pattern, and the histogram was created using all those numbers from the segmented leaf pixels.

Species Classification based on Leaf Images
Once all histograms were ready and normalized, a machine learning algorithm was used to classify unseen images into species.We implemented the same classification scheme used by LeafSnap.The following paragraphs describe how k Nearest Neightbors (kNN) was implemented.
Scikit-learn's kNN implementation was used for leaf species classification.This process was fed with previously generated histograms from both the model of curvature using HCoS and the texture model using LBPV.Additional code was created to take into consideration only the first matching k species, not the first k images, as shown by Algorithm 9.The difference resides in taking into account only the best matching image per species, until completing the first k species [13].We used 1 <= k <= 10 in order to measure how different algorithms behaved as the value of k increased.

Distance Metric -Histogram Intersection
We tested the basic Euclidean distance to measure similarity between histograms, however the results were not encouraging.We implemented the histogram intersection shown on Equation 1, where I(x, y) is the histogram intersection between a histograms x and y of same size, n is the number of bins, and x i and y i are each a bin in histograms x and y, respectively.This distance metric is also normalized to unit length.

Accuracy
Let E be an identification experiment that consists of a model M , a set S that contains n images of leaves of n (not necessarily different) unknown tree species to be identified, and an integer value k, k ≥ 1.We define hit(M, k, x) as a boolean function that indicates if model M generates a ranking in which one of the top k candidate species is a correct identification of sample x.Equation 2 formally defines Accuracy(M, S, k).

Experiments
Several model variations were used in the experiments (see Table 2).
2. Several scales of the texture model based on LBPV.One Versus All One approach to test a model is to partition a dataset into two datasets: one for training and one for testing.Another approach is to use One versus All, that is, each image in a dataset with n elements is considered a test image and the remaining n − 1 images the training subset.We used both approaches as explained at the end of this section.
Combining Curvature and Texture When combining two different models, we faced the issue of having different scales in the resulting ranking of each model.This was resolved by normalizing the rankings to unit length.After normalizing the rankings (one per combined algorithm), we assigned a factor to each combined model in order to rank the predicted species into a single ranking.This factor sums 1 in total.However we varied the factor associated with each model to see the behavior across different combinations.We used factors of (0.10, 0.90), (0.20, 0.80), (0.30, 0.70), (0.40, 0.60), (0.50, 0.50), (0.60, 0.40), (0.70, 0.30), (0.80, 0.20), (0.90, 0.10).For example, (0.50, 0.50) means we gave the same level of importance to each model on that combination.Algorithm 10 describes how the merge between two methods was achieved.

Texture and Curvature Model Experiments
We ran all models M described in Table 2, with 1 ≤ k ≤ 10, and the following data sets: Costa Rica clean subset (One versus All, n = 1468), Costa Rica noisy subset (One versus All, n = 2345), and Costa Rica complete data set (training set with all 1468 clean images and testing set with all 2345 noisy images).In each experiment, Accuracy(M, S, k) was calculated for the corresponding dataset S. In addition, for model Using the clean and noisy datasets, we calculated a General Linear Model (GLM) per species over a total of 65 species.We aimed at discovering the following: • What is the minimum value of k that provides results statistically equivalent to those obtained when k = 10 for each species?Obviously, accuracy increases as the value of k increases.However, for practical reasons, we would like to test if there is a threshold value k after which accuracy remains statistically equivalent to using k = 10.For example, in a mobile app users would appreciate if the number of best ranked species is not the maximum value 10, but a smaller number.
• What is the best algorithm or combination of algorithms for each species?For this we used five different algorithms: R1P8 & R3P16 (texture alone), 0.1 HCoS and 0.9 R1P8 & R3P16, 0.5 HCoS and 0.5 R1P8 & R3P16, 0.9 HCoS and 0.1 R1P8 & R3P16 and HCoS (curvature alone).This also includes creating clusters of species based on their most significant algorithms, and understanding the clusters with more species and best accuracies.
• Does noise decrease the accuracy level obtained per species?Can we find some species that are not affected by noise in the data?
To achieve this, we calculated a GLM per species to detect significance of noise, algorithm used and value of k.We used a confidence level of 0.95.Once each GLM was calculated and each main effect significance known and proven, we calculated if all levels within each factor were statistically equivalent.We are actually trying to find the levels that are significantly different, for all three factors..We used a Tukey statistical test for each factor.Table 3 shows the different factors and levels used during this experiment.

Statistical Analysis of Best Algorithms for k = 5
Because k = 5 has become an informal benchmarking value in other research [13], it is important to discover what algorithms got the best accuracy when k = 5.For this experiment, we ran a Binary Logistic Regression and optimized it, thus maximizing the probability of a successful identification.Based on the resulting regression model, we calculated the two best algorithms for both noisy and clean factors, and k = 5, per species.

Comparison with Others Studies
In order to set a baseline, several other studies have used the Flavia dataset for their research [6].Table 4 shows the comparison of these studies and our approaches.Some studies do not report accuracy but precision only.The best accuracy of our work was achieved, on this dataset, by adding 0.5 HCoS and 0.5 R1P8 & R3P1 with k = 10 for a 0.991.We also attempted to use texture only, which shows to be very extremely accurate with up to 0.98.This dataset has been, however, artificially cleaned, so other studies should be evaluated on more complex datasets.

Texture and Curvature Model Experiments
Clean Subset As shown in Table 5, the best results were obtained when k = 10 and the model is 0.5 HCoS and 0.5 R1P8 & R3P16.The resulting accuracy is 0.945, in contrast with the accuracy of HCoS which is 0.79.Notice however that 0.5 HCoS and 0.5 R1P8 & R3P16 is also the best for all values of 6 <= k <= 10.For 1 <= k <= 5, 0.5 HCoS and 0.5 R1P8 & R3P16 and 0.1 HCoS and 0.9 R1P8 & R3P16 have very similar levels of accuracy.Figure 11a more clearly depicts these comparisons.
Noisy Subset Figure 11b clearly shows that 0.1 HCoS and 0.9 R1P8 & R3P16 has the best accuracy for all values of k.In addition, the level of accuracy improvement with respect to HCoS is considerably larger, ranging from 35.2% when k = 10 to 42.5% when k = 4 as shown in Table 7.
Complete Dataset As Figure 11c shows, the level of accuracy is considerably lower for all models, as compared to the previous two experiments.Even the best model achieves levels of accuracy in a poor [14.5%, 43.9%] range.
Discussion These experiments show how, in general, the combination of HCoS and LBPV consistently increases the accuracy of HCoS alone.Accuracy declines as the combination factor assigned to curvature reaches 1.Overall, the best combination seems to be 0.1 HCoS and 0.9 LBPV.It is also important to notice how the accuracy is sensitive to the quality of the dataset.The clean subset has a tendency to improve the recognition accuracy, in contrast with the noisy subset.This reflects the importance of good pre-processing and good segmentation.Shadows, dust, and other artifacts affect the final accuracy results.

Measuring Significance of the Accuracy Increase
As shown in the previous section there is an increase in accuracy when texture is added to our implementation HCoS.This, however, may not be statistically significant.We proceeded then to apply a Statistical Proportion Test for Two Samples.Our null hypothesis H0 is that the accuracy of the implementation of HCoS equals the ones obtained by combining curvature and texture.In contrast, our alternative hypothesis H1 is that the accuracy of the implementation of HCoS is less than the combinations.
Proportion Tests on the Clean Subset Table 6 shows the results obtained for all the proportion tests for the clean subset.Most combinations of HCoS and R1P8 & R3P16 for 1 <= k <= 10 resulted in very low p-Values, which reject H0.However a few accuracy increases from 0.9 HCoS and 0.1 R1P8 & R3P16 did fail the test.This means that, as the weight increases for HCoS, it starts getting non-significant accuracy increases, which makes sense since it is almost equal to HCoS alone.
Proportion Tests on the Noisy Subset Table 7 shows the results obtained for all the proportion tests for the noisy subset.All combinations of HCoS and R1P8 & R3P16 resulted in very low p-Values, which reject H0.
Proportion Tests on the Complete Dataset Table 8 shows the results obtained for all the proportion tests on the complete dataset of leaf images from Costa Rica.Almost every single test rejected H0.For k = 1 the results are not significant.In all Proportion Tests, by adding texture with a bigger factor the model improves significantly the accuracy.As the factor assigned to texture declines, the improvement becomes statistically insignificant.

Processing Time
As shown in Figure 12, times range from 2.76 to 12.81 seconds.However, the median of the elapsed time is 5.70 seconds for the clean subset and 5.66 seconds for the noisy subset.These are suitable times even for mobile applications that use the developed back-end.

Statistical Analysis of Noise Affectation, Best Algorithms per Species, and best value of k
Table 9 shows the results of each GLM per species.Each species has the accuracy maximum, mean and median.Also, a cluster has been assigned regarding the best algorithms resulted from the Tukey test per species.Table 10 depicts the algorithms in each cluster for reference.Additionally, column "Best Without Noise" indicates if noise does affect or not the accuracy for each species.Finally, column " k" indicates the threshold value k per species.As indicated before, any k > k will be slightly better, but this is not statistically significant.Noise Affectation.As Table 9 shows, most species are affected negatively by noise in the data.However, four species do show no significant difference between noisy and clean data.Blackea maurafernandesiana, Brosimum alicastrum, Hura crepitans, and Picramnia antidesma seem to be fairly resilient to noise with these algorithms.Table 9 shows some species which got low accuracy values at the bottom.Annona mucosa and Dendropanax arboreus got a median accuracy of 0.48, and Aegiphila valerioi of 0.45. Figure 13 shows 4 images of these 3 species.We suspect the reasons behind the low accuracy for these species are the shadows present inside the leaf and outside, against the paper sheet.Also it can be noticed some leaves also have physical damage.Dendropanax arboreus in Figure 13a and Figure 13b also shows how different both sides of the same species are, suggesting we need to separate the dataset in both sides of the leaves.
Value of threshold k.The best k is achieved by species Muntingia calabura, with k = 3. Bauhinia purpurea also shows a low value of k = 4. Eugenia hiraeifolia, Genipa americana, Hura crepitans, Quercus corrugata and Urera caracasana have k = 5.Overall, 13 species show a k = 6 value, while 20 species have k = 7, and the rest result in 8 <= k <= 10.As k is lower, then the potential maximum accuracy for that   Best Algorithms per species.Several clusters were identified based on algorithms that showed the best accuracy per species.Table 10 shows the list of clusters.Each cluster contains one to three algorithms that had the same statistical significance during our experiments per species.We have in total 10 clusters based on the best, second best and third best algorithm per species.Table 9 shows that most of the species belong to clusters that had the combination of 0.1 HCoS and 0.9 R1P8 & R3P16.Some also have R1P8 & R3P16 which is texture alone without curvature.Figure 14 shows the accuracy distribution across the 10 clusters formed after carrying out Tukey tests on the different algorithms.The best algorithms have the biggest factor for texture.Cluster 3, Cluster 8 and Cluster 10, have as the best algorithm the combination 0.1 HCoS and 0.9 R1P8 & R3P16, and have the second best accuracy across all species.The very best cluster is Cluster 9, reaching an Accuracy of 1.
Figure 15 shows the distribution of species count per cluster.The cluster with the most species is Cluster 5, with more than 25 species.This cluster, as shown in Table 10, contains two best algorithms: 0.1 HCoS and 0.9 R1P8 & R3P16 combination, and 0 HCoS and 1 R1P8 & R3P16 combination.This means both of them are statistically equivalent for these species.11 shows what algorithms maximize the probability of a good identification, given that k = 5 and the noisy dataset is used.Most algorithms are combinations, from pure texture, to a 0.5 HCoS and 0.5 R1P8 & R3P16 combination.No combination gets near a pure curvature algorithm.Similar results are noted in Figure 16 where the best probabilities are around the 0.2 HCoS and 0.8 R1P8 & R3P16 combination.In general the distribution is very homogenous.
Best Algorithms for k = 5 and clean dataset.Table 12 shows what algorithms maximize the probability of a good identification, given that k = 5 and the clean dataset is used.In this case the best algorithms per species are more spread across most combinations.This is due the lack of noise in the data and the lesser affectation of the curvature algorithms.This, compared with the data of Table 11 confirms how texture seems to be more robust with noise.Figure 17 shows the distribution of probabilities of a good identification per algorithm.On clean data it seems the biggest probabilities are near the center with combinations around the 0.3 HCoS and 0.7 R1P8 & R3P16 combination.

Conclusions
The addition of texture increases significantly the accuracy of our implementation of the HCoS.When comparing HCoS versus the combination of 0.1 HCoS and 0.9 R1P8 & R3P16, for the Costa Rican clean subset, the improvement ranges from 14.1% to 25.5%, depending on the value of k.Similarly, with the noisy subset, the improvement ranges from 35.5% to 42.5%.These improvements were proved to be statistically significant in our experiments.
The complete dataset experiments demonstrated that poor accuracy levels are achieved when noisy images are classified against clean images.We speculate that this is due to the many enhancements that leaf images underwent before being added to the clean dataset.First, leaves were pressed for 24 hours in order to flatten them and thus minimize the presence of shadows.Secondly, Photoshop was used to manually remove artifacts.Finally, image enhancement algorithms (e.g., stem removal) were applied.This result has important implications if a mobile application is developed, given that users will take noisy pictures.As a result we are left with two alternatives.The first one is to use a noisy dataset to train the classifier.Alternatively a clean dataset could be used but user images would need to undergo further automated image enhancements comparable to those performed manually with Photoshop.
Experiments for individual species provided some interesting results.Concerning minimal values of k, i.e., the size of the set of candidates that are considered best possibilities in an identification process, good levels of accuracy were obtained for k = 7 in 63% of the species.Working with noisy images had a negative effect on levels of accuracy on 61 out of 65 species studied, as compared to clean images and a clean dataset.Finally, texture also stands out in most individual cases as the determining factor for high accuracy levels as compared to leaf shape.
Our statistical analysis of best algorithms for k = 5 did not render a clear winner but highlighted that the best combination of algorithms should use weigths smaller than 0.2 to HCoS.

Future Work
A natural next step in this research is to develop a mobile app that uses the georeference of photographs of leaves as an additional criterion to classify species.Most modern mobile phones already include excellent cameras and provide the option of automatically georeferencing any picture taken with these cameras.In addition to the reference image dataset such as the one developed for this research, maps of potential distribution of species of Costa Rican trees would be needed.Atta, a comprehensive and fully georeferenced database of thousands of species of organisms from Costa Rica developed by the National Biodiversity Institute (INBio)2 and GBIF's database3 are excellent foundations to generate these potential distribution maps of species.In addition to curvature, texture, and georeferencing as discriminating factors, morphological measures of leaves are also frequently used by specialists to identify plant species.Some of these measures are: aspect ratio, which is the ratio of horizontal width to vertical length; form coefficient, which is a numerical value that grades the leaf shape as between circular (shortest perimeter for a given area) and filliform (longest perimeter for a given area); and blade and petiole length.Algorithms to calculate these measures have already been developed (e.g., WinFOLIA).However, they have not been integrated in computer vision systems for automatic identification of plant species.
A crowd sourcing approach could be a very efficient way to increase the size of the image dataset that currently comprises 66 plant species from Costa Rica.In addition, crowdsourcing could also be used to clean noisy pictures as a citizen science project.
Finally, the individual contribution of texture features such as venation, porosity, and reflection in characterizing a plant species has not been formally established.A more elaborate analysis of the leaf texture that disaggregates it into a separate layer for each these features would help understand and quantify their individual contribution.
(a) A Robinsonella lindeniana var.divergens sample scanned from a leaf sample from INBiopark, using a HP ScanJet 300 scanner, then cleaned using Photoshop CS6 (b) A Bauhinia ungulata sample taken using a Canon PowerShot SD780 IS camera at the Sabana Park

Figure 3 :
Figure 3: Segmented Samples After applying EM to different Costa Rican species

Figure 4 :
Figure 4: Clipping of a Coccoloba floribunda sample The left image is the original leaf image, and the right one is clipped to the leaf size

Figure 5 :Figure 6 :
Figure 5: Top Hat Transformation applied to a segmented compound leaf image to detect the stem of the leaf

Table 2 : 3 .
Models used in the experiments including curvature, variants of texture model, and combination of The combination of HCoS and the best LBPV variant, which according to our tests was R1P8 & R3P16.This combination was further disaggregated by assigning different weights to HCoS and the texture model.

Figure 11 :
Figure 11: Comparison of HCoS and Combinations

Figure 12 :
Figure 12: Box Plot of leaf image recognition times simulating a mobile app back-end, for Costa Rican noisy and clean subsets

Figure 13 :
Figure 13: Leaf samples of species with low accuracy

Figure 14 :Figure 15 :
Figure 14: Accuracy distribution across different clusters found on the species

Figure 16 :
Figure 16: Distribution of Probability of successful identification with noisy data and k = 5

Figure 17 :
Figure 17: Distribution of Probability of successful identification with clean data and k = 5

Table 3 :
Factors and levels for GLM per species Algorithm 10 was used to comprehensively consider different weight combinations for HCoS and the texture model.Table5summarizes the results obtained.To understand the duration of the recognition process, we measured the recognition time for all images from both Costa Rican noisy and clean subsets, as if a back-end received images from a mobile app.The measured time includes image loading, segmentation, stem deletion, normalization, curvature calculations, texture calculations, and similarity search.It does not include network related times.We used a MacBook Pro with an Intel Core i7, 2.8 GHz, and 8 GBs on RAM.
4.3 Statistical Analysis For Noise Affectation, Best Algorithms per Species, and best value k

Table 4 :
Other studies comparison of obtained results on the Flavia dataset

Table 5 :
Accuracy obtained when combining curvature and texture over the clean subset, the noisy subset, and the complete Costa Rican dataset

Table 6 :
Proportion Test results over the Costa Rican Clean Subset

Table 7 :
Proportion Test results over the Costa Rican Noisy Subset

Table 8 :
Proportion Test results over the Costa Rican Complete Dataset

Table 9 :
Per Species Table with Accuracy Mean, Maximum, Best Algorithms, Affectation by Noise and k

Table 10 :
Cluster definition and most significant Algorithms per Cluster