Automatic 3D Pollen Recognition Based on Convolutional Neural Network

The importance of automatic pollen recognition has been examined in several areas, ranging from paleoclimate studies to daily practice such as pollen hypersensitivity forecasting. This paper presents an automatic 3D pollen image recognition method based on a convolutional neural network. To achieve this purpose, high feature dimensionality and complex posture transformations must be taken into account. Therefore, this work focuses on a novel three-part approach: constructing spatial local key points to obtain local stable points of pollen images, computing the orientational local binary pattern using the local stable points as inputs, and identifying the pollen grains using a convolutional neural network as the classifier. Experiments are performed on two standard pollen image datasets: the Confocal-E dataset and the Pollenmonitor dataset. The results indicate that the proposed approach can effectively extract the features of pollen images and is robust to posture transformation, slight occlusion, and pollution.


Introduction
According to previous research [1][2][3], a high concentration of allergenic pollen in the air may undermine the health of people who are pollen-hypersensitive. Exposure to pollen allergens can cause allergic reactions or exacerbate related diseases. Thus, researchers have sought to alleviate this threat by improving the accuracy of allergenic pollen forecasting. Computer-based recognition has been shown to be more effective than manual analysis for recognizing pollen grains. Many researchers regard computer vision and machine learning techniques as an improvement for automated pollen image classification, since intrinsic digital features, such as shape, texture, and geometric and statistical features, can be easily captured from microscope images [4][5][6][7][8]. Thus, the principal task is to recognize plant species from their pollen using computer techniques. Allergy control scientists can analyze allergen levels from the recognition results for pollen grains collected in a specific area, so that the relevant departments can effectively forecast the concentration of highly allergenic pollen for those who are pollen-hypersensitive. However, some categories of pollen have similar surface structures but significant differences in their inner structure. Therefore, these types of pollen grains can be recognized more easily in 3D.
Several studies have discussed the use of computer vision techniques for classifying or recognizing pollen grains. Stillman and Flenley first proposed the need for automated palynology in 1996 [9]. Since then, recognition and classification of pollen have been widely studied at a microscopic scale. Recent achievements in automatic pollen recognition fall into two widely accepted approaches, namely, spectra analyzing recognition methods and texture analyzing recognition methods. Some scholars have contributed to the establishment of spectra recognition methods. Ribeiro evaluated the capacity of Raman parameters of pollen spectra, calculated for only 7 common band intervals in a limited spectral range, to be used for automatic pollen identification. In the testing step, using a support vector machine as the classifier, 14 out of the 15 pollen species were correctly assigned and a 93.3% recognition rate was achieved [10]. Sauliene et al. evaluated the capabilities of the new Plair Rapid-E pollen monitor and constructed a first-level pollen recognition algorithm. This method was evaluated on three devices located in Lithuania, Serbia, and Switzerland, with independent calibration data and classification algorithms. The algorithm achieved 80% accuracy for 5 out of 11 species. Fluorescence spectra showed similarities among different species, ending up with three well-resolved groups [11]. Seifert et al. analyzed Surface Enhanced Raman Scattering (SERS) and constructed an artificial neural network to extract the taxonomically relevant information under high intraspecies spectral variation caused by signal fluctuations and preparation specifics. The results show that SERS can be used for the reliable characterization and identification of pollen samples [12].
According to the researchers, the spectra analyzing recognition methods have their own advantages, but there are also many drawbacks, such as vulnerability to the effect of optical system parameters and curve nonlinear problems.
As for the texture analyzing recognition methods, some studies support the view that these methods can effectively describe image features and are robust to the source images. Ronneberger et al. extracted 14 different invariant gray-scale features based on the Euclidean 3D transformation group with nonlinear kernels; the 26 most important German pollen taxa were classified using a support vector machine, achieving an 82% recognition rate on a confocal pollen dataset [13]. Bourel et al. presented multi-CNNs, based on multiple convolutional neural networks, which can assist paleontologists in dealing with poorly preserved pollen samples. The proposed algorithm recognizes intact, damaged, and fossil pollen grains with misclassification rates of 2.8%, and 3.7% on 3 types, namely Amaranthaceae, Poaceae, and Cyperaceae [7]. Han and Xie proposed the local decimal pattern (LDP), which compares the gradient magnitude of pixel blocks for pollen image recognition. LDP extracts a single texture feature in three directions, which decreases the dimensionality of pollen features, but reducing its time complexity remains a challenge. The average correct recognition rate of LDP on the Pollenmonitor dataset is 90.95% [14]. Amu and Hasi described 1320 microscopic images of pollen granules from 13 different species and identified them by pseudo-Jacobi Fourier moments. The proposed algorithm achieved satisfactory results in some categories such as Saffron, but it was not suitable for some other pollens; a 90.2% identification rate was reported [15]. Filipovych et al. claim that analyzing the visual texture of pollen grains in each focal image and performing identification with a fast sequence-matching algorithm can effectively identify pollen grains from sets of multifocal image sequences obtained from optical microscopy [16].
They proposed a method to recognize pollen grains using two classification stages, which increased the classification rate by 6% [17]. Although the texture analyzing algorithms have achieved satisfactory classification results on laboratory pollen databases, they still have disadvantages that cannot be ignored, such as high time complexity and limitations in the recognition of specific pollen categories.
Moreover, although great progress has been made in the 2D pollen image recognition research mentioned above, some categories remain difficult to classify correctly. These kinds of pollen, such as Betula & Tilia and Larix & Poaceae, have similar surface structures but differ significantly in their inner structures. This key information is usually missing in 2D images, which leads to poor results compared to 3D approaches. Figure 1 shows two examples of pollen grains in different dimensions.
To address the disadvantages of the pollen recognition methods mentioned above, we propose an efficient and robust algorithm to identify different categories of pollen grains. Firstly, we detect local stable regions of pollen images using the spatial local key point (SLKP) [18] descriptor as the input of the next stage.
Then, the orientational local binary pattern (OLBP) [19] of the pollen image is calculated by expanding the focus area of the local stable regions mentioned above. Finally, the processed feature is used as the input of a convolutional neural network to identify the category of the target. Experimental results on the standard pollen image datasets show that the proposed method is robust to posture changes of pollen grains and can effectively reduce the time complexity of the algorithm. Figure 2 shows the flowchart of the proposed method.

Proposed Approach
In this paper, we introduce an automatic 3D pollen image recognition method based on a convolutional neural network that combines the spatial local key point and the orientational local binary pattern. Firstly, a 3D model is built to extract OLBP and SLKP from the collected pollen images; it divides the original data into blocks according to a scale factor k, where k is the scale factor used to divide the image and x is the sampling point consisting of the pixels in the related neighborhood. The core steps of the feature extraction method are described in the following subsections. Figure 3 shows the sampling model in the 3D neighborhood.

Extraction of the Spatial Local Key Point.
SLKP is a lightweight statistical feature extraction method based on SIFT and histogram algorithms, which can reduce the high dimensionality of the descriptors for 3D pollen images and effectively indicate the spatial relationship among 3D pixels. It is shown in [18] that SLKP extends SIFT from two dimensions to three dimensions, which addresses the information loss mentioned above. The main steps of SLKP are as follows:
Step 1. Constructing the 3D Gaussian pyramid. The function L(x, y, z, σ) defines the Gaussian scale space as the convolution of the Gaussian kernel G(x, y, z, σ) with the original image I(x, y, z), that is, L(x, y, z, σ) = G(x, y, z, σ) * I(x, y, z); L is the output of this step.
Step 2. Detecting the extremum of the local gradient at each sampling point, instead of calculating the difference-of-Gaussian pyramid as in SIFT, to locate the stable blocks between different layers of the Gaussian pyramid. A coordinate system with the sampling center as the origin is set up, and the local gradient is used to calculate the positive differential vector and the negative differential vector. The gradient differential vector is determined from the difference between the neighborhood sampling points and the center sampling point, where loc and σ indicate the location and layer of the current block, respectively.
Step 3. Once the gradient differential vector, which indicates the trend of the gray level in the image domain, has been determined, the region of interest (ROI) can be detected from the norm of the gradient vectors in different layers. Inst, the set of ROIs, is the output of this step, where r is the threshold on the norm of the gradient vector. From the set Inst, we choose the blocks that have similar gradient vectors in the same region across different layers of the Gaussian pyramid as the local key points mentioned above.
Step 4. Because the previous steps lack rotation robustness, such features cannot be used directly as inputs to the classifier. Therefore, a descriptor representation is necessary in SLKP. To deal with the rotation problem, a 3D rotation matrix is introduced to transform the differential vector from one base coordinate system to another, where α, β, and γ are the Z-Y-X Euler angles and sine and cosine are abbreviated as s and c. According to Lowe [20], normalization of the coordinate system should be taken into account to enhance robustness to rotation. In traditional SIFT, a consistent orientation is assigned to each keypoint based on local image properties. The descriptor can be represented relative to this orientation and therefore achieves invariance to image rotation. Similarly, two consistent orientations are assigned under the 3D coordinate system. Therefore, rotation invariance can be ensured by transforming the gradient vectors into this coordinate system. The filtered ROIs obtained by comparing the rotated vectors are taken as the SLKP.
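As an illustration of the rotation in Step 4, the following minimal sketch builds a rotation matrix from Z-Y-X Euler angles using the standard composition R = Rz(α) Ry(β) Rx(γ). The paper's exact matrix is not reproduced in this text, so this composition order and the helper names `rotation_zyx` and `rotate` are assumptions made only for illustration.

```python
import math

def rotation_zyx(alpha, beta, gamma):
    """Return R = Rz(alpha) * Ry(beta) * Rx(gamma) as a 3x3 nested list.

    Assumption: the standard Z-Y-X Euler convention; angles are in radians.
    """
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]

def rotate(matrix, vector):
    """Apply a 3x3 matrix to a 3-component vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def norm(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vector))

# A hypothetical gradient differential vector, rotated into another frame.
gradient = [1.0, 2.0, 2.0]
rotated = rotate(rotation_zyx(0.3, -0.7, 1.1), gradient)
print(norm(gradient), norm(rotated))  # the norm is preserved by the rotation
```

Because a rotation preserves vector norms, a norm threshold such as r in Step 3 selects the same blocks before and after the change of coordinate system.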

Transformation of the Orientational Local Binary Pattern.
Orientational local binary pattern is a local feature for three-dimensional pollen image recognition, with which we can effectively extract the 3D texture and analyze the relationship among spatial voxels of pollen images. The main steps of OLBP are as follows:
Step 1. Constructing nine feature planes for each sliding window as the γ direction. The original image is binarized following the typical local binary pattern, and the threshold direction vector α_l is computed, where C_x(δ) is the x coordinate value and l is the threshold control parameter, with 0 indicating the low threshold direction vector and 1 indicating the high threshold direction vector.
Step 2. Calculating the deviation between the direction vector and the normal vector of each feature plane in the coordinate system based on the sampling center. The feature plane with the minimum deviation is chosen as the optimal feature plane of the sampling center, where Γ is the collection of the feature planes' normal vectors and f(β, γ) is the evaluation function.
Step 3. Following Ojala et al. [21], the local binary pattern can be extracted in a planar neighborhood. We calculate the traditional local binary pattern in the optimal feature plane of most sampling points. In particular, the local binary pattern is computed over a wider neighborhood at the spatial local key points mentioned above, where (x_c^o, y_c^o) is the center pixel of the sampling center in the coordinates of the optimal feature plane; t is the range factor of the spatial local key points; i_c is the gray value of the sampling center; and i_p is the gray value of a point in its neighborhood.
Figure 3: Sampling model in the 3D neighborhood (legend: sampling center, sampling point, scale factor).
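The planar local binary pattern of Step 3 can be sketched as follows. This is the classic 8-neighbor LBP of Ojala et al., not the paper's full OLBP: the optimal-plane selection is omitted, and the `radius` parameter is a hypothetical stand-in for the range factor t that widens the neighborhood at key points. The neighbor ordering (clockwise from the top-left) and the function name `lbp_code` are likewise assumptions.

```python
def lbp_code(image, row, col, radius=1):
    """Return the 8-bit LBP code of image[row][col].

    image  : 2D nested list of gray values (a slice along a feature plane)
    radius : distance to the sampled neighbors; a hypothetical stand-in
             for the range factor t (larger at spatial local key points)
    """
    center = image[row][col]
    # Eight neighbors, clockwise starting at the top-left (assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        neighbor = image[row + dr * radius][col + dc * radius]
        # s(i_p - i_c) = 1 if the neighbor is at least as bright as the center.
        if neighbor >= center:
            code |= 1 << bit
    return code

patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))  # 241
```

In the example patch, the neighbors at bit positions 0, 4, 5, 6, and 7 are at least as bright as the center value 6, giving 1 + 16 + 32 + 64 + 128 = 241.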

Recognition by Convolution Neural Networks.
OLBP is applied to compute the 2D local binary pattern from the spatial dimensions by determining the optimal feature plane. The convolutional neural network is a variant of the feedforward neural network that was first introduced by LeCun et al. [22] and has proved to be a desirable classifier for identifying pollen images [23][24][25]. We use the processed features as the input of the convolutional neural network, which can capture the complete information of pollen images. As shown in Figure 4, the proposed convolutional neural network consists of two convolution layers, C1 and C3, both with 5 × 5 kernels; two pooling layers, P2 and P4, both with 3 × 3 subsampling regions; and a fully connected layer, F5, as the output layer.
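To make the layer arrangement concrete, the sketch below traces the spatial size of a feature map through C1, P2, C3, and P4. The text does not state the input size, the stride, or the padding, so the 64 × 64 input, the unpadded stride-1 convolutions, and the non-overlapping 3 × 3 pooling are assumptions chosen only for illustration.

```python
def output_size(size, kernel, stride=1):
    """Spatial size after an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1

def trace_feature_map_sizes(input_size):
    """Return the side length of the feature map after each stage.

    Assumed layout: C1 (5x5 conv), P2 (3x3 pool, stride 3),
    C3 (5x5 conv), P4 (3x3 pool, stride 3).
    """
    sizes = [input_size]
    sizes.append(output_size(sizes[-1], 5))            # C1
    sizes.append(output_size(sizes[-1], 3, stride=3))  # P2
    sizes.append(output_size(sizes[-1], 5))            # C3
    sizes.append(output_size(sizes[-1], 3, stride=3))  # P4
    return sizes

# Hypothetical 64 x 64 input feature image.
print(trace_feature_map_sizes(64))  # [64, 60, 20, 16, 5]
```

Under these assumptions, the fully connected layer F5 would receive 5 × 5 feature maps from P4, one per output channel of C3.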

Experimental Results
Experiments are conducted on two standard pollen image datasets using a PIV computer with a 2.8 GHz CPU and 16 GB of memory. The Confocal-E dataset is a classic 3D pollen dataset that includes 5360 pollen grains from 27 different categories, collected by confocal laser scanning microscopy in Germany [13]. The pollen images, including Secale, Poaceae, and Fagus, are divided into three groups by sensitization, namely, highly allergenic, moderately allergenic, and lowly allergenic [26][27][28][29][30][31]. To validate the geometric invariance of the proposed method, the dataset is augmented with different transformations, especially rotation, which increases the volume of the labeled training set while preserving the class labels. The Pollenmonitor dataset is a real-world dataset with 22750 pollen grains from 33 categories, in which all the images were automatically collected by dedicated equipment, the first Pollenmonitor prototype in Europe [32]. To keep enough local structural features while reducing the complexity of the algorithm, images are preprocessed by filtering and interpolation before the experiments. The preprocessing does not deform the samples, as it is performed in square dimensions. The average recognition rate for different values of the scale factor k is shown in Figure 5. The best recognition rate is obtained at k = 3. Precision rate (PR), recall rate (RR), and F1-score are used to evaluate the recognition performance of the proposed method. To validate this performance, the average recognition results are also compared with those of four mainstream methods on the two datasets: SLKP descriptors, LDP descriptors, pseudo-Jacobi Fourier moments (PJFM), and traditional CNN.
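For reference, the three performance indices can be computed from per-category counts as in the sketch below. It uses the standard definitions of precision, recall, and F1-score; the counts in the example are made up for illustration and are not taken from the paper's tables.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for one pollen category.

    true_pos  : grains of the category that were recognized correctly
    false_pos : grains of other categories assigned to this category
    false_neg : grains of this category assigned to other categories
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical counts: 9 correct, 1 false alarm, 3 misses.
pr, rr, f1 = precision_recall_f1(9, 1, 3)
print(f"PR={pr:.2%} RR={rr:.2%} F1={f1:.2%}")
```

The F1-score is the harmonic mean of PR and RR, so it is high only when both are high.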

Experimental Results in the Confocal-E Dataset.
Representative experimental results for 6 pollen categories from the Confocal-E dataset are shown in Figure 6. Most of the images in the Confocal-E dataset, which have clear edges and backgrounds, are correctly recognized by the proposed method. Table 1 shows the experimental results for the 6 representative pollen categories. The recognition performance still differs between pollen categories. Among all the pollen categories, the precision rate reaches 96.60% for Compositae, which has the most distinctive spatial structure. For pollen images that have a similar appearance, such as Acer and Alnus, the precision rate is about 82.47%. These results suggest that the performance of the proposed algorithm is affected by the spatial structure and specificity of the pollen images. Most false recognitions result from deformation of pollen grains or environmental interference. The experimental results validate that the proposed method is an effective algorithm for pollen recognition. Figure 7 presents the recognition results for 6 representative pollen categories in the Pollenmonitor dataset.

Experimental Results in the Pollenmonitor Dataset.
The experimental results indicate that the Pollenmonitor dataset has lower-quality pollen images than the Confocal-E dataset, which could be the result of different collecting equipment. Table 2 presents the detailed recognition results for the 6 pollen categories. The recognition performance varies between categories. Although the precision rate on the Pollenmonitor dataset is influenced by the quality of the pollen images, most pollen can still be correctly recognized. The highest precision rate, 91.35%, is obtained on the classification of Alnus, while the precision rate on Fagus pollen still reaches 76.67%. Most pollen images with different postures can be classified correctly. False examples are mainly pollen images that were squeezed or polluted during the automatic collection process.

Ablation Studies.
In this subsection, we show the effectiveness of our design choices. The ablation study is split into the following parts: First, we use the original pollen images as the input of the convolutional neural network as the base architecture. Second, OLBP descriptors of pollen images are extracted as the inputs of the CNN without detecting local stable regions. Then, SLKP descriptors are used as inputs as well. Finally, we combine both SLKP and OLBP to verify the classification results. Table 3 shows the effect of removing different components on the experimental results.
In this experiment, it is observed that SLKP features are as meaningful as OLBP features, comparing the accuracy improvement from the baseline. Moreover, we find that the false results of two features occurred in different pollen categories. For OLBP, there is no obvious difference in texture among most of the misclassified pollens on the surface or in inner structures. It is implied that OLBP shows a good distinguishing ability for pollen categories that have obvious texture characteristics compared with SLKP. On the other hand, most of the false results of SLKP are similar in

Results and Discussion
The experimental results are compared with those of SLKP descriptors [18], LDP descriptors [14], faster CNN [33], and multi-CNNs [7] to verify the validity of the proposed algorithm on the Confocal-E dataset. The three-dimensional pollen images show that different textures cover the external wall of various pollen grains, such as thorn, tumor, rod, cave, and net, and these are more obvious than in the two-dimensional pollen images.
Thus, the proposed method produces better recognition results on both standard datasets. Table 4 compares the average precision rate of the proposed method with that of the other algorithms on the two datasets. The table shows that the recognition performance of the proposed method is superior in some respects to that of some other methods on the pollen images. In the experiments, the average precision rate for the pollen images reaches 90.25%, which is higher than that of LDP by 8.6%. The complexity of the algorithm is reduced by combining the OLBP and SLKP descriptors in the proposed method. Besides, the average recall rate of faster CNN is lower than that of the proposed method by 22.94%, which indicates that the proposed method may have some advantages in recognizing pollen grains with unexpected appearance. The line graph in Figure 8 plots the average recognition precision against the average recall rate for the different descriptors. It can be inferred from the line graph that the proposed approach performs better on the Confocal-E dataset.
The proposed method combines both SLKP and OLBP to describe different features of pollen grains. SLKP is a lightweight algorithm that can reduce the high dimensionality of the descriptors for 3D pollen images. OLBP is a low-complexity local feature extraction method based on the local binary pattern in the space domain, which can reduce the feature dimension of pollen images by selecting the optimal feature plane and ignoring redundant information. In the proposed method, local stable points are detected by SLKP and are later used in OLBP. Abundant textures, edges, and corners in pollen images make it easier and faster for the gradient vectors to detect the stable points between layers with different scales in the 3D Gaussian pyramid, so that SLKP can be effectively extracted using the proposed method. The average recognition time, including preprocessing time and feature extraction time, is about 1.6 s, which is shorter than that of some other recognition algorithms. Satisfactory results on the pollen datasets further validate the good geometric invariance of the proposed method.

Conclusion
In this paper, we introduce a pollen recognition method based on a convolutional neural network and apply it to pollen recognition and classification experiments on two standard pollen image datasets. It contributes to the improvement of pollen classification in the following respects: Firstly, the combination of SLKP and OLBP contributes to the integrity of the extracted pollen features, since more factors are taken into consideration. Secondly, the experimental results show that the proposed method is robust to illumination change and to geometric transformations such as rotation, scaling, and affine transformation. Finally, the convolutional neural network proves to be a better classifier than some other classification methods. In future work, we hope to design a lightweight CNN architecture and improve the recognition rate.

Data Availability
The data used to support this study are available at 10.1109/ICPR.2002.1048297. The prior studies and datasets are cited at relevant places within the text as [13].

Conflicts of Interest
The authors declare no conflicts of interest.