Semantic segmentation of pollen grain images generated from scattering patterns via deep learning

Pollen can cause allergic rhinitis, with a person's vulnerability depending on the species and the amount of pollen. Therefore, the ability to precisely quantify both the number and species of pollen grains in a certain volume would be invaluable. Lensless sensing offers the ability to classify pollen grains from their scattering patterns using very few optical components. However, since there could be 1000s of species of pollen one may wish to identify, we propose using two separate neural networks in order to avoid having to collect scattering patterns from all species (and mixtures of species). The first neural network generates a microscope-equivalent image from the scattering pattern, having been trained on a limited amount of experimentally collected pollen scattering data. The second neural network segments the generated image into its components, having been trained on microscope images, allowing pollen species identification (and potentially allowing the use of existing databases of microscope images to expand the range of species identified by the segmentation network). In addition to classification, segmentation also provides richer information, such as the number of pixels and therefore the potential size of particular pollen grains. Specifically, we demonstrate the identification and projected area of pollen grain species, via semantic image segmentation, in generated microscope images of pollen grains containing mixtures and species that were previously unseen by the image generation network. The microscope images of mixtures of pollen grains, used for training the segmentation neural network, were created by fusing together microscope images of isolated pollen grains, while the trained neural network was tested on microscope images of actual mixtures.
The ability to carry out pollen species identification from reconstructed images without needing to train the identification network on the scattering patterns is useful for the real-world implementation of such technology.


Introduction
It is estimated that, depending on geographical location, 10% to 40% of the population in certain areas of Europe suffer from allergic rhinitis (hay fever) [1], and there is also evidence that susceptibility to different pollen species varies with an individual's age [2]. Whilst the local pollen count is an important factor in the likelihood that symptoms are displayed, this broad measure does not account for the species of pollen present [3,4]. Thus, a sensor that could identify the levels of pollen species at a specific location in real-time, so that an individual can either (a) determine the species that causes them the most severe symptoms, or (b) mitigate the effects by avoiding that pollen, could aid in reducing the effects of hay fever (which can include asthma attacks). In addition, monitoring pollen can also be a useful indicator of climate [5,6], insect migration [7] and crop production [8].
Current techniques for real-time sensing of pollen particles are very limited, as optical particle counters only detect particles of an approximate size, and not the type of particle, while pollen collected via Burkard traps [9,10] requires laboratory examination to determine the pollen species [3,11]. Although automated methods for pollen identification from traps using optical and laser fluorescence techniques have recently been developed [12,13], these devices can be relatively large, and so a sensor that can image a pollen grain with minimal optics, at lower cost and with a small footprint (e.g. a lensless-based Raspberry Pi [14]) would be invaluable for nationwide, and even worldwide, distribution.
Although pollen identification from scattering patterns has been achieved via deep learning convolutional neural networks [30,31], for a sensor to be deployed in the real world, it may need to be able to identify pollen grains within mixtures of different species or agglomerations with other particulates (e.g. urban particulate matter [32]). The objective of this work was to devise a deep learning methodology for the identification of pollen grains from scattering patterns. However, if a single neural network were applied to this task, the training data would need to include experimental scattering patterns from all pollen species, including mixtures of species in all combinations. Since existing databases contain 1000s of microscope images of pollen grains [33] (rather than scattering patterns from pollen grains), in this work we propose that separating this objective into two components alleviates this challenge, namely via (i) a neural network for transforming a scattering pattern into an image and (ii) a neural network for identifying pollen species in an image via semantic segmentation [34]. Here, these two neural networks are synergistic, as the output of the image reconstruction network can be used as the input for the segmentation network.
The neural network that is trained to transform scattering patterns to images can be applied to pollen species that were not used during training (because the physical principles of interference and scattering are equivalent for all pollen species). In contrast, the neural network that identifies pollen species via image segmentation must be trained using examples of all pollen species that it is to recognise. Separating the task into two parts means that the identification network identifies pollen species from images (rather than scattering patterns) and can therefore also be trained on images. Critically, this allows us to efficiently utilise existing databases of microscope images of pollen species, rather than having to systematically collect the scattering patterns from all species and combinations of species. In addition, this approach allows the use of data augmentation techniques (which are appropriate for image data but not for scattering patterns). For example, artificially producing data corresponding to mixtures of pollen grains, hence enabling the identification of multiple pollen grains from a scattering pattern, despite having no experimental scattering patterns for mixtures of species in the training dataset.
While some methods have used convolutional neural networks for identifying pollen grains from microscope images, and others have used convolutional neural network-based object detection, these are limited to single output numbers or bounding boxes [35][36][37]. Object detection, such as via YOLO (You Only Look Once [38]), typically produces a bounding box (i.e. a rectangle) around the region of an image that corresponds to the specified object. This bounding box is generally defined by the coordinates of the top-left and bottom-right corners of the rectangle. Image segmentation, such as the approach presented here, generally produces a mask that has the same size as the input image but with Boolean values annotating whether each image pixel corresponds to part of the specified object. Both approaches are effective methods for defining the presence of an object in an image. However, image segmentation can provide a much richer description of the extent of the object in the image, as each individual pixel is labelled rather than the provision of a bounding box [39]. Consequently, image segmentation can also enable a more accurate estimation of the size of an object. The size and shape of pollen grains have been shown to assist in identifying seed-siring success [40], pollen grain viability [41], pollinator feeding strategies [42], and the identification of micro-mutations [43].

Figure 1. Two-step concept showing image generation from experimental scattering patterns followed by identification via semantic segmentation of the generated images. A neural network is trained using microscope images and scattering patterns to be able to generate a microscope equivalent image of previously unseen pollen grains. Then (using only microscope images for training) a neural network is trained to segment a generated image into its constituent parts, for example as shown here, background, Iva xanthiifolia or Narcissus pollen grains.
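The difference between the two output types can be made concrete with a short sketch: for a Boolean segmentation mask, the object's pixel count (and hence projected area) follows directly, whereas the axis-aligned bounding box that an object detector would return can only overestimate it. This is an illustrative example, not code from the paper.

```python
import numpy as np

def mask_vs_bbox_area(mask):
    """Compare the pixel area of a Boolean mask with the area of its
    axis-aligned bounding box (top-left / bottom-right corners)."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    mask_area = int(mask.sum())                     # per-pixel label count
    bbox_area = int((r1 - r0 + 1) * (c1 - c0 + 1))  # rectangle area
    return mask_area, bbox_area

# A circular "pollen grain" mask: the bounding box overestimates its area.
yy, xx = np.mgrid[:64, :64]
grain = (yy - 32) ** 2 + (xx - 32) ** 2 <= 10 ** 2
m, b = mask_vs_bbox_area(grain)
```

For a roughly circular grain, the bounding box overestimates the area by a factor of about 4/π, which is why per-pixel labelling gives a more faithful size estimate.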
A phyletic relationship has been shown between pollen size and plant variability, such as pistil length [44], and hence pollen size and shape could be used for statistical analysis of the variability of associated plants. Pollen grain size and number have been shown to be traded off under limited resources [45], with size also acting as an indicator of soil fertility and mycorrhizal infection [46], and of soil nitrogen [47] and phosphorus levels [48]; hence such analysis could provide insight into nutrient levels. The size and shape of pollen grains are also strongly affected by environmental conditions, including moisture [49,50] and temperature [51], with resultant consequences on germination rates, and hence size and shape identification could be used for monitoring the effects of changing climatic conditions. In this work, as shown in figure 1, we use collected scattering patterns from different pollen grain species and train a neural network to generate images of the pollen grains from their scattering patterns. Subsequently, we use semantic segmentation (colour labelling of images), performed via deep learning neural networks, to identify the pollen grains in these generated images and use the labelled pixels to calculate their projected area.
The key part here is that only microscope images (and not generated images) were used to train the segmentation neural networks, thus allowing the potential for previously experimentally unseen pollen (hence, unseen scattering patterns) to be identified from their image reconstructions.
As noted above, the network that transforms scattering patterns to images can be applied to pollen species that were not used during training, since the physical principles of interference and scattering are equivalent for all pollen species; this is evident from the application of machine learning to create general solutions to the phase retrieval problem [52][53][54]. In contrast, the segmentation network must be trained using examples of all pollen species that it is to recognise: if no label exists for an object, a segmentation neural network will be unable to identify that object. The network might be able to determine that an unrecognised object is present, but with no associated label it cannot identify the object type [55].

Experiments
The experimental work was split into two parts. The first experiment involved collecting scattering patterns and images of single pollen grains from ten different species. Experimental scattering patterns and images from eight species were used to train an image generation network, while data from the remaining two species were used to test it. Subsequently, an image labelling neural network was trained using the microscope images from all pollen grain species (excluding the microscope images equivalent to the generated images used in testing) and, once trained, was tested on images generated from experimental scattering patterns. This segmentation neural network labelled objects in the generated pollen grain images by producing a colourised version of the image, with each colour acting as a label corresponding to a specific pollen species.
The second experiment involved collecting experimental scattering patterns and microscope images from single pollen grains, multiple pollen grains and mixtures of two different species. An image generation network was trained on non-mixed species (single and multiple pollen grains), whilst a segmentation neural network was trained on images of mixed species produced via augmentation of single pollen grains. Testing was carried out on images generated from experimental scattering patterns and contained mixtures of species.
For ease of reading, the experiment involving image generation and subsequent identification of species from single-species images will be termed the 'isolated species' experiment, and the experiment involving image generation and subsequent identification from images of mixtures of pollen grains will be termed the 'mixed species' experiment.

Sample fabrication
Iva xanthiifolia and Populus deltoides pollen grains were purchased from Sigma-Aldrich, Narcissus pollen grains were collected from the University of Southampton grounds, and Bellis perennis, Populus tremuloides, Hyacinthus orientalis, Chrysanthemum, Antirrhinum majus, Chamelaucium and Rosa were obtained from flowers purchased at a local convenience store. Two samples were fabricated, in which pollen grains were dispersed onto different regions of 25 mm×75 mm×1 mm borosilicate glass slides. One sample was a glass slide for the isolated species experiment, which was covered with all the pollen grain species listed above, and the other sample was a glass slide for the mixed species experiment, which was covered with pollen grains from species Narcissus and Iva xanthiifolia.

Experimental setup
As per our previous work [26], three collinear laser beams at different wavelengths, 450 nm (blue), 520 nm (green) and 635 nm (red), were focussed using a 50× objective lens onto pollen grains present on a glass slide (see figure 2 for a diagram of the experimental setup). This glass slide was mounted on a 3-axis manual translation stage (5 cm maximum movement in each direction) to allow positioning of the pollen grains within the focus of the laser beams. To aid in this alignment, and to identify and obtain microscope images of the pollen grains, the pollen was imaged via back-reflection using a beam splitter and a CMOS camera sensor (Thorlabs Inc., DCC3260C, 1936×1216 pixels). The camera sensor used for collecting the forward-scattered light from the pollen grains (Thorlabs Inc., DCC1645C, 1280×1024 pixels) was positioned ∼3 mm after the glass slide, perpendicular to the laser propagation axis.
Although in this work a 50× microscope objective, beam splitter and imaging camera sensor are present in the same setup as the scattering camera sensor, they are used here merely for the convenience of obtaining microscope images at the same time as focussing the laser light for the scattering. In a real-world lensless sensor, such optics and equipment would not be present, and only the forward scattering would be captured, with the image generation neural network essentially replacing the 50× magnification objective, and the microscope images used for training collected externally to the setup, allowing a more compact, cheaper sensor to be fabricated.

Neural network training and implementation
As stated in section 2.1, for each of the two experiments (isolated and mixed species), two separate neural networks were created, one for generating an image and one for segmenting the generated image into different labels. The image generation neural networks used the pix2pix architecture [56], as shown in [57,58]. All the scattering patterns and microscope images were cropped and resized to 256×256 pixels. 300 epochs of training and a learning rate of 0.0002 were used in the isolated species experiment, and 1000 epochs and a learning rate of 0.00001 were used in the mixed species experiment. Both cases used a minibatch size of 2, the adaptive moment estimation (ADAM) optimizer and the cross-entropy loss function. Narcissus and Chamelaucium pollen grain images and their corresponding scattering patterns were excluded from the isolated species image generation training so they could be used for testing, in order to mimic a potential real-world scenario in which the image generation network could be presented with scattering patterns from species of pollen that were not encountered during training. For the mixed species dataset (Narcissus and Iva xanthiifolia), training of the image generation network used only scattering pattern/image pairs of isolated species, and subsequent testing of the image generation network was carried out using scattering patterns produced by mixtures of pollen grain species. In total, 75 pairs of images were used for training the isolated species image generation neural network, and 80 pairs of images were used to train the mixed species image generation neural network.

Figure 2. Diagram showing the experimental setup, which includes three collinear laser beams that were focussed onto pollen grains present on a glass slide. The light scattered forward from the pollen grains was collected by a camera sensor placed ∼3 mm away from the glass slide.
The pollen grains were simultaneously imaged via back-reflection using a beam splitter that reflected light from the glass slide's surface onto a camera's sensor. Inset: Iva xanthiifolia experimental scattering pattern and corresponding microscope image.
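The preprocessing step described above (cropping and resizing every scattering pattern and microscope image to 256×256 pixels) could be sketched as follows; the centre-crop and block-average used here are simple stand-ins for whatever interpolation was actually applied, so treat the details as assumptions.

```python
import numpy as np

def center_crop(img, size):
    """Centre-crop an H×W(×C) array to size×size."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def downsample_to_256(img):
    """Crop to the largest square multiple of 256, then block-average
    down to 256×256 (a crude substitute for bicubic resizing)."""
    side = (min(img.shape[:2]) // 256) * 256
    crop = center_crop(img, side)
    f = side // 256
    return crop.reshape(256, f, 256, f, *crop.shape[2:]).mean(axis=(1, 3))

# e.g. a 1024×1280 sensor frame becomes a 256×256 network input
frame = np.random.rand(1024, 1280)
patch = downsample_to_256(frame)
```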
For the segmentation, a ResNet-18-based [42] DeepLabV3 by Google [59] was used, which has been shown to have the highest accuracy in the Pascal visual object classes challenge, a benchmark in visual object category recognition [60]. The parameters used in the segmentation neural networks were the same in both experiments: an initial learning rate of 0.001, a learning rate drop factor of 0.1 with a period of 10, 10 epochs and a minibatch size of 4. The optimizer was stochastic gradient descent with momentum (SGDM) and the loss function was the cross-entropy loss function. Training data for all neural networks were augmented via random translation, rotation and mirroring; additionally, for the segmentation neural networks, random contrast and brightness adjustments of the microscope images were performed. The neural networks were trained and tested on two computers, one with an Intel® Core™ i7-6700 CPU @ 3.40 GHz, 64 GB RAM and an NVIDIA GeForce RTX 2080 Ti, 11 GB GDDR6 graphics processing unit (GPU), and the other with an Intel® Core™ i7-8750H CPU @ 2.20 GHz, 16 GB RAM and a GeForce RTX 2070 Max-Q design, 8 GB GDDR6 GPU. Unlike the image generation training, which excluded Narcissus and Chamelaucium pollen grain images, the segmentation training for isolated species included microscope images from all species (excluding the microscope images equivalent to the generated images used in testing), and segmentation testing was carried out on generated images. Training of the mixed species segmentation neural network involved augmenting single grains from isolated species via fusing together microscope images of Narcissus and Iva xanthiifolia pollen (and their corresponding colour label images), and testing was carried out on images generated from the experimental scattering patterns of such species mixtures.
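The augmentation scheme just described can be sketched as below: the geometric transforms must be applied identically to the image and its label map, while the photometric jitter applies to the image only. As a simplification, 90° rotations and mirroring stand in for the free rotations and translations used in practice, so this is illustrative rather than the authors' pipeline.

```python
import numpy as np

def augment_pair(image, label, rng):
    """Apply the same random geometric transform to an image and its
    label mask, and photometric jitter (contrast, brightness) to the
    image only, as required for segmentation training data."""
    k = int(rng.integers(0, 4))                  # rotation by k * 90 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.random() < 0.5:                       # random horizontal mirror
        image, label = np.fliplr(image), np.fliplr(label)
    gain = rng.uniform(0.8, 1.2)                 # contrast adjustment
    bias = rng.uniform(-0.05, 0.05)              # brightness adjustment
    image = np.clip(image * gain + bias, 0.0, 1.0)
    return image, label

rng = np.random.default_rng(0)
img = rng.random((256, 256, 3))                  # RGB microscope image in [0, 1]
lbl = rng.integers(0, 3, (256, 256))             # integer label map
aug_img, aug_lbl = augment_pair(img, lbl, rng)
```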
A total of 95 pairs of images and labelled images were used for training the isolated segmentation neural network, whereas 184 augmented pairs of images and labelled images were used in training the mixed species segmentation neural network. The minibatch loss as well as the accuracy for the isolated species and mixed species segmentation neural network training is shown in figure A1 in the appendix.
All networks were trained and tested in Matlab 2020b, using the Deep Learning Toolbox, with the image generation neural networks each taking on average ∼0.09 s to generate an image, and the segmentation neural networks each taking on average ∼0.05 s to segment a test image. Such speed in automated segmentation could allow a sensor to process images in real-time, thus unlocking the potential for live information to be provided to an individual accessing the sensor.
The development of a network that can convert scattering pattern data to an approximation of a microscope image removes the requirement for large quantities of scattering pattern data (as would be needed for pollen species identification directly from scattering patterns, or from subsequently generated images via segmentation). This allows pollen species identification to be handled by the image segmentation network, which was trained using microscope images. The image segmentation network was trained exclusively using experimentally captured microscope images (although it is potentially possible to obtain such data from external sources). By training on real microscope images, it is ensured that the neural network incorporates data of the highest possible quality, in order to accurately learn the features of different pollen species. In testing and subsequent use of the identification network, accurate results can be obtained when inputting generated images (even though images generated from scattering patterns may be slightly less clear).

Data labelling and augmentation
For training of the isolated species segmentation, a different colour was selected for each pollen grain species. This manual labelling was applied using the Matlab Image Labeler, which utilizes flood fill and GrabCut algorithms [61]. Figure 3 shows an example of the scattering patterns and images of each pollen grain species and their corresponding segmented colour image. Note that the background label colour is black.
In the case of training the mixed species segmentation neural network, microscope images (and corresponding segmented colour labelled images) containing single Narcissus and Iva xanthiifolia pollen grains, were randomly translated and/or rotated and then fused together (see figure 4), in order to train a neural network to be able to correctly identify the pollen grains from mixtures. Only fused microscope images using isolated pollen grains were used for training the mixed species neural network, whereas experimentally-collected microscope images of mixtures were used for testing.
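A minimal sketch of this fusion step is given below, with hypothetical label values (0 = background, 1 = Narcissus, 2 = Iva xanthiifolia) and a wrap-around roll standing in for whatever translation and padding the authors used; the second grain is pasted wherever its label mask is non-background.

```python
import numpy as np

def fuse_pair(img_a, lbl_a, img_b, lbl_b, shift):
    """Fuse two single-grain microscope images (and their label masks)
    into one synthetic 'mixture': grain B is translated by `shift` and
    pasted wherever its label mask is non-background (label 0)."""
    img_b = np.roll(img_b, shift, axis=(0, 1))
    lbl_b = np.roll(lbl_b, shift, axis=(0, 1))
    fused_img, fused_lbl = img_a.copy(), lbl_a.copy()
    fg = lbl_b > 0                    # pixels belonging to grain B
    fused_img[fg] = img_b[fg]
    fused_lbl[fg] = lbl_b[fg]
    return fused_img, fused_lbl

# toy example: grain A (label 1) top-left, grain B (label 2) shifted away
img_a = np.zeros((64, 64)); lbl_a = np.zeros((64, 64), dtype=int)
img_a[5:15, 5:15] = 1.0;    lbl_a[5:15, 5:15] = 1      # "Narcissus"
img_b = np.zeros((64, 64)); lbl_b = np.zeros((64, 64), dtype=int)
img_b[5:15, 5:15] = 0.5;    lbl_b[5:15, 5:15] = 2      # "Iva xanthiifolia"
mix_img, mix_lbl = fuse_pair(img_a, lbl_a, img_b, lbl_b, (30, 30))
```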

Results and discussion
To determine the capability of the segmentation neural network to identify the correct species from the generated image, we compare it to the manually labelled generated image, since the comparison between the labelled generated image and the labelled microscope image is dependent on the accuracy of the image generation neural network. For estimating the accuracy of the image generation, the difference in pixel intensity between the microscope and generated images was calculated as a percentage of the total intensity of the microscope image. For the isolated species experiment, the image generation from the scattering patterns of (a) Narcissus and (b) Chamelaucium pollen grains, and the subsequent image segmentation by the neural network trained on ten species of pollen grains, is shown in figure 5. The image generation accuracy for (a) was ∼98% and for (b) was ∼92%. The figure shows, from left to right, the microscope images, the images generated from the scattering patterns, the manually coloured ground truth segmented images, the neural network-predicted segmented images, and the error between the truth and predicted segmented images (in the error images, black corresponds to correct pixels while white corresponds to pixels that differ from the ground truth).

Figure 5. The capability of the neural networks for two different species of pollen grain is displayed showing, from left to right, the microscope image, generated image, the ground truth label, the predicted label and the error between the truth and the predicted label for (a) Narcissus and (b) Chamelaucium. Black in the error images corresponds to correctly labelled pixels, and white incorrectly labelled pixels.
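The image generation accuracy described above reduces to a one-line calculation; the exact formula below is our reading of the text (one minus the summed absolute intensity difference as a fraction of the total microscope-image intensity), so treat it as a sketch rather than the authors' code.

```python
import numpy as np

def generation_accuracy(microscope, generated):
    """Image-generation accuracy: one minus the summed absolute
    pixel-intensity difference, expressed as a fraction of the total
    intensity of the microscope image."""
    m = microscope.astype(float)
    g = generated.astype(float)
    return 1.0 - np.abs(m - g).sum() / m.sum()

# an identical pair scores 1.0; a uniformly dimmed copy scores lower
ref = np.full((256, 256), 0.5)
dim = 0.9 * ref
```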
The accuracy of the segmentation neural network for the results of the isolated species experiment shown in figure 5 is displayed in table 1. Despite the difference between the microscope and generated pollen images, the global accuracy (the percentage of correctly classified pixels, irrespective of label, out of the total number of pixels) is very high at 99.2% and 99.3%, while the mean accuracy for each label is 97.3% and 95.2% for Narcissus and Chamelaucium pollen grain images, respectively. The mean intersection over union (mean IoU), defined as the percentage of correctly labelled pixels out of the total number of ground truth and predicted pixels of that label, is 62.9% and 36.2% for Narcissus and Chamelaucium pollen grain images, respectively. The mean boundary F1 score (mean BF score) [62], which is a measure of how well the predicted boundary of each label matches the true boundary, is also higher in (a) compared with (b). The lower accuracy in the Chamelaucium case is likely due to the difference between the generated image and the microscope image, as shown in figure 5(b), where it is evident that the edges of the pollen grain have significantly brighter regions than its centre, unlike the generated pattern, which has a more uniform intensity profile. The accuracy of the image generation, calculated as per the isolated case, was ∼89% for the image generated in (a) and ∼80% for the image generated in (b). In addition, error has also occurred in the prediction due to misidentification of pollen grain sap (see the red in the predictions in figures 5(a) and (b)), which would have been labelled as black in the manually labelled images. Note that this sap can be present in the generated images, since it is also present in microscope images of the pollen grains used in training the image generation neural network.
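The metric definitions above (global accuracy, per-label accuracy and mean IoU) can be reproduced from two integer label maps in a few lines of NumPy. This is a sketch of our reading of the standard definitions, not the authors' code; the mean BF score is omitted as it additionally requires a boundary-matching tolerance.

```python
import numpy as np

def segmentation_metrics(truth, pred, n_classes):
    """Global accuracy (fraction of correctly classified pixels),
    mean per-class accuracy, and mean intersection-over-union."""
    global_acc = (truth == pred).mean()
    accs, ious = [], []
    for c in range(n_classes):
        t, p = truth == c, pred == c
        if t.any():
            accs.append((t & p).sum() / t.sum())        # per-class accuracy
            ious.append((t & p).sum() / (t | p).sum())  # IoU for class c
    return float(global_acc), float(np.mean(accs)), float(np.mean(ious))

# toy 2x2 example with classes 0 (background) and 1 (a pollen species)
truth = np.array([[0, 0], [1, 1]])
pred  = np.array([[0, 1], [1, 1]])
g, a, i = segmentation_metrics(truth, pred, 2)
```

Note how one mislabelled background pixel leaves the global accuracy high but pulls the mean IoU down sharply, mirroring the gap between the ∼99% global accuracies and the much lower IoU values in table 1.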
For the generated images in figure 5, we calculate the area of the species label (estimated area in μm², where 1 square pixel ≈ 0.1 μm² in the microscope image) for Narcissus in (a) to be 348 μm², and for Chamelaucium in (b) to be 198 μm². The results of image generation and subsequent segmentation for two different mixtures of Narcissus and Iva xanthiifolia are shown in figure 6. Again, from left to right, the figure shows the microscope images, the images generated from the experimentally obtained scattering patterns, the ground truth labelling, the prediction by the segmentation network and the error between that prediction and the truth. In the segmented images, the Narcissus pollen is shown in green, the Iva xanthiifolia pollen is shown in blue and the background is black. In the error images, as per figure 5, the incorrectly labelled pixels are coloured white and the correctly labelled pixels are coloured black.
The accuracy of the predictions made by the mixed species segmentation network is displayed in table 2, showing that the global accuracy is ∼99% for both mixed pollen images, while the mean IoU is 96.1% and 85.4%, and the mean BF score is 96.7% for (a) and 81.2% for (b). Although high global accuracy has been achieved even though the generated images appear different from the microscope images, the lower values of the mean IoU and mean BF score for (b) are likely due to the less accurate image reconstruction, which contains some regions of low intensity at the centre of the generated Narcissus pollen grain. The accuracy of prediction could therefore be improved with more accurate generated images, which could be achieved with a larger training dataset and/or higher resolution imaging. The calculated area of the labels in the generated images in figure 6 is 504 μm² for Narcissus and 216 μm² for Iva xanthiifolia in (a), and 576 μm² for Narcissus and 95 μm² for Iva xanthiifolia in (b).
As a final piece of work, we test the neural networks on additional generated pollen grain data of isolated and mixed species, as shown in figure 7, which displays scattering patterns and corresponding generated pollen grain images with an overlay of the segmentation label. This image style could be desirable as a live feed from a lensless sensor device, whereby one is able to see both the image and the label. The mean accuracy of each label and the size of each label are also shown in figure 7. For the isolated Narcissus pollen grain in (a), we estimate its projected area to be 174 μm²; for the isolated pollen grain in (b), we estimate its area to be 567 μm²; for the 2× Iva xanthiifolia data shown in (c), we estimate the area of the top pollen grain to be 173 μm² and the bottom pollen grain to be 209 μm²; while the estimated areas for the species in the mixture shown in (d) are 801 μm² for Narcissus and 90 μm² for Iva xanthiifolia.
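Given the calibration stated earlier (1 square pixel ≈ 0.1 μm² in the microscope image), each of these projected areas reduces to a pixel count over the predicted label map. A minimal sketch, with a hypothetical label map and label value:

```python
import numpy as np

UM2_PER_PIXEL = 0.1   # from the calibration: 1 square pixel ~ 0.1 um^2

def projected_area_um2(label_map, species_label):
    """Projected area of one species: count its labelled pixels and
    scale by the calibrated area of a single pixel."""
    return (label_map == species_label).sum() * UM2_PER_PIXEL

# e.g. a label map in which 3480 pixels carry the species label gives ~348 um^2
lbl = np.zeros((256, 256), dtype=int)
lbl.flat[:3480] = 1
area = projected_area_um2(lbl, 1)
```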

Conclusion
To conclude, we have shown that neural network image segmentation can be used to determine the species of pollen grains and their projected area in images generated from scattering patterns via deep learning. We were able to obtain a mean global pixel-identification accuracy of >98% both in generated images of single pollen grains and in generated images containing mixtures of pollen grain species. To scale up to real-world sensing, and to further improve the accuracy and range of pollen species recognised, it would be advisable to expand both neural networks to permit input of higher resolution images (comparable to those available in online databases), since this would allow such databases to be used as a convenient source of mass training data and identification, whilst higher resolution image generation could enable segmentation of pollen grain features. Future work will also look at expanding the training data and testing to 100s of different species of pollen, which might involve exploring whether segmentation can be done with one network, or whether it is necessary to identify groups of species with certain shapes (circular, triangular, etc) and then have segmentation neural networks trained specifically for each group, to obtain greater accuracy in labelling.