Interactive Deep Learning for Exploratory Sorting of Plant Images by Visual Phenotypes

This paper proposes an interactive system called Andromeda 1 that enables users to perform exploratory sorting of images by interacting with machine learning models through a reduced-dimension plot. In our system, a dimension reduction algorithm projects the images into a 2D space representing similarities between the images based on visual features extracted by a deep neural network. With Andromeda, users can alter the projection by dragging a subset of the images into groups according to their domain expertise. The underlying machine learning model learns the new projection by optimizing a weighted distance function in the feature space and re-projects the images accordingly. Users can explore multiple custom projections to learn about the visual support for different groupings based on explainable-AI feedback. Our approach incorporates user preferences into machine learning model construction and allows transfer learning from pre-trained image processing models to accomplish new tasks based on user inputs. Using edamame pod images as an example, we interactively re-project the images into different groupings based on maturity and disease, and identify important visual features from the pixels highlighted by the model.


INTRODUCTION
In recent years, computer vision and artificial intelligence (AI) have played crucial roles in automating image-based decision processes in agricultural research and production. Many AI models have been developed to diagnose plant diseases, 2 determine plant species, 3,4 and assess plant product quality. 5 Most published models follow a simple workflow consisting of a label-train-test-release cycle.
However, researchers sometimes need a more exploratory approach in which they interactively explore many alternative labelings based on their domain expertise. Thus, the goal of this paper is to create an algorithm that aids users in performing exploratory sorting on the fly, as they might sort physical seed pods on a tabletop, augmented with interactive machine learning that incorporates user perception feedback.
To dynamically incorporate users' exploratory feedback into the machine learning process, we developed an interactive machine learning platform, called Andromeda, within a computational notebook (Jupyter). Users provide feedback to the machine learning models by regrouping the projected images. First, we exploit transfer learning 6 to extract image features using a popular pre-trained image-processing neural network. Second, the user is provided with a 2-D interactive projection in which they can visualize similarities and re-group the images. Third, a machine-learning algorithm for inverse projection learns the revised similarities between the user-specified groups and renders a new projection of the images. It uses the interactive feedback from the user to learn a novel distance model that weights the visual features extracted by the network. Fourth, an explainable-AI method highlights the pixels in the images corresponding to the up-weighted image features, thus providing visual justification for the user's grouping. In this manuscript, using images of edamame pods as our model system, we demonstrate the functionality and utility of our interactive machine learning approach.
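The third step, learning per-feature distance weights from the user's rearranged layout, can be sketched as follows. This is a minimal illustration, not Andromeda's actual optimizer: the function name `learn_feature_weights`, the projected-gradient scheme, and the normalization choices are all assumptions for this sketch.

```python
import numpy as np

def learn_feature_weights(X, user_xy, n_iter=500, lr=0.01):
    """Hypothetical sketch of the inverse step: given high-dimensional
    features X (n x d) for the images the user dragged and their new 2D
    positions user_xy (n x 2), learn non-negative per-feature weights w
    so that weighted high-dimensional distances match the user's 2D
    distances (a stress-minimization objective)."""
    n, d = X.shape
    # Pairwise 2D distances implied by the user's layout (the target).
    target = np.linalg.norm(user_xy[:, None, :] - user_xy[None, :, :], axis=-1)
    target /= target.max() + 1e-12          # normalize to [0, 1]
    # Squared per-feature differences for every image pair: (n, n, d).
    sq = (X[:, None, :] - X[None, :, :]) ** 2
    w = np.full(d, 1.0 / d)                 # start from equal weights
    for _ in range(n_iter):
        dist = np.sqrt(sq @ w + 1e-12)      # weighted high-D distances
        dist_n = dist / (dist.max() + 1e-12)
        resid = dist_n - target             # stress residuals
        # Gradient of 0.5 * sum(resid^2) w.r.t. w
        # (treating the normalizing max as a constant).
        grad = (0.5 * resid / (dist + 1e-12) / (dist.max() + 1e-12))[..., None] * sq
        w -= lr * grad.sum(axis=(0, 1))
        w = np.clip(w, 0.0, None)           # keep weights non-negative
        w /= w.sum() + 1e-12                # renormalize onto the simplex
    return w
```

Features whose pairwise differences align with the user's layout receive higher weight; the learned weights then drive both the re-projection and the explainable-AI highlighting.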

Dataset and Preprocessing
Images used in this paper were collected by the Li Lab of Applied Machine Learning in Genomics and Phenomics at Virginia Tech. 7 This dataset comprises ready-to-harvest, late-to-harvest, and diseased pod images (100 images with 10-20 pods in each image). Figure 1 shows the sample raw data and image pre-processing results. We used an improved vegetation index, Excess Green minus Excess Red (ExG-ExR), 8 to identify pods for our data sets. ExR was subtracted from ExG with a zero threshold to create the ExG-ExR binary image. After computing the binary image from the vegetation indices, we applied several morphological transformations. 9,10 We used closing and opening to clean background noise and impute missing pixel values, and dilation to increase the object area. Finally, after the vegetation indices and morphological transformations, we obtained a binary image mask with pods as white and background as black. Pods were detected by finding the contours of these masks.
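The pre-processing pipeline above can be sketched as follows. This is an illustrative sketch, not the lab's production code: the function names, the 3x3 structuring element, and the exact ExG/ExR coefficients (the common 2G-R-B and 1.4R-G formulations) are assumptions, and connected-component labeling stands in for contour detection.

```python
import numpy as np
from scipy import ndimage

def pod_mask(rgb):
    """Sketch of the pre-processing step, assuming an RGB image scaled
    to [0, 1]: compute ExG - ExR, threshold at zero, then clean the
    binary mask with morphological closing, opening, and dilation."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    exg = 2 * g - r - b                     # Excess Green
    exr = 1.4 * r - g                       # Excess Red
    mask = (exg - exr) > 0                  # zero-threshold binary image
    struct = np.ones((3, 3), dtype=bool)
    mask = ndimage.binary_closing(mask, structure=struct)   # fill small holes
    mask = ndimage.binary_opening(mask, structure=struct)   # drop speckle noise
    mask = ndimage.binary_dilation(mask, structure=struct)  # grow pod regions
    return mask

def pod_regions(mask):
    """Locate individual pods as connected components of the mask,
    analogous to finding contours."""
    labels, n = ndimage.label(mask)
    return labels, n
```

The greenness-minus-redness difference makes vegetation pixels strongly positive while soil and background stay near or below zero, so a simple zero threshold suffices before the morphological clean-up.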

Feature Extraction and Visual Back Propagation
We use a convolutional neural network (CNN) to convert images into meaningful quantitative representations that capture multiple levels of abstraction; CNN models are widely used in computer vision. 11,12 Although any pre-trained CNN model could be applied, in this paper we use the ResNet-18 model pre-trained on ImageNet. 13 ResNet was introduced in 2015, won several competitions in computer vision, 14 and remains one of the most widely used CNN models for extracting features from images. We use the last convolutional layer to extract 512 features from each image. 15
To visualize each feature extracted from the ResNet-18 model, we utilize a modified version of the visual back propagation 16 method, which highlights the sets of pixels in the input image that contribute most to each feature. Starting from the 512 feature maps of the last convolutional layer, we back-propagate each feature and average the feature maps after each ReLU layer. The averaged feature map of the last convolutional layer is scaled up via deconvolution and multiplied by the averaged feature map from the previous layer. The resulting intermediate mask is again scaled up and multiplied, and this process is repeated until we reach the input image layer. We initiate visual back propagation using feature weights from the interactive dimension reduction model, thus enabling explainability of the projection.
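The scale-up-and-multiply cascade described above can be sketched as follows. This is a simplified illustration of the idea, not our exact implementation: nearest-neighbor upsampling stands in for the learned deconvolution, the layer shapes are assumed to halve at each stage, and the unweighted channel average omits the feature-weight initialization step.

```python
import numpy as np

def visual_backprop(feature_maps):
    """Sketch of the visual back-propagation cascade, assuming
    feature_maps is a list of post-ReLU activation tensors ordered from
    the input side to the last convolutional layer, each shaped
    (channels, H, W) with H and W halving at every layer."""
    # Channel-averaged relevance map at each layer.
    avgs = [fm.mean(axis=0) for fm in feature_maps]
    mask = avgs[-1]                          # deepest (coarsest) map
    for prev in reversed(avgs[:-1]):
        # Scale up 2x (stand-in for deconvolution), then multiply
        # point-wise by the shallower layer's averaged map.
        mask = mask.repeat(2, axis=0).repeat(2, axis=1)
        mask = mask * prev
    # Normalize to [0, 1] for overlay on the input image.
    mask = mask - mask.min()
    return mask / (mask.max() + 1e-12)
```

Because each multiplication keeps only locations that are active at every depth, the final mask concentrates on pixels that consistently drive the selected feature through the whole network.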

Dimension Reduction and Visual Analytics
To visualize similarities between images, we use a dimension reduction (DR) algorithm to project the 512-dimensional data into a 2-dimensional plot. To allow users to drag images and form new projections, we use an interactive framework called Andromeda. 1 A weighted Multi-Dimensional Scaling (MDS) algorithm with a weighted distance metric enables both forward and inverse projection. Although any dimension reduction algorithm could be applied (such as PCA or t-SNE), MDS closely matches the user's cognitive task of sorting images by mapping high-dimensional image similarities to 2D distances: proximal images in the projection are similar in the weighted high-dimensional image feature space. 1 MDS also easily adapts to different distance functions for experimentation. While MDS projects the high-dimensional data to a 2D scatter plot, 17 a weighted distance function with user-specified weights on each dimension enables alternative projections that emphasize different dimensions. After interactive sorting, an inverse dimension reduction algorithm learns distance function weights for the user-modified projections of the images. Figure 3 shows the system design.
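The forward projection can be sketched as follows. This is a minimal sketch, assuming classical MDS (double-centering plus eigendecomposition) on weighted Euclidean distances; Andromeda's actual stress-based optimizer may differ, and the function name is illustrative.

```python
import numpy as np

def weighted_mds(X, w):
    """Sketch of the forward projection: compute weighted Euclidean
    distances in the (n x d) feature space X under per-feature weights
    w, then embed them into 2D with classical MDS."""
    w = np.asarray(w, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2 * w).sum(axis=-1))     # weighted distances
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:2]              # top-2 eigenpairs -> 2D
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
```

Raising a feature's weight stretches the distance matrix along that feature, so images that differ in it move apart in the 2D plot; this is exactly the degree of freedom the inverse step learns from the user's dragging.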

INTERACTIVE CASE STUDY
These case studies validate two hypotheses: first, that our interactive system, using features extracted by the deep neural network, can dynamically capture the novel abstractions users specify by dragging images in the plot; and second, that the visual back propagation method provides the explainability needed to uncover the important visual features, identified through the interactive learning process, that support the users' abstractions.

Pods Based on Maturity Stage
The maturity stage of each pod, as either diseased, late-to-harvest, or ready-to-harvest, is a phenotype that can be determined by trained observers. Here we test whether a user can sort the images according to these phenotypes with the help of the machine learning. The image data for 30 randomly chosen edamame pods are displayed on the 2D projection as shown in Figure 4 (a); note that the color coding was added to show that the default plot does not capture the desired phenotypes as clear clusters. The initial weights for each feature are equal. In Figure 4 (b) the user interactively drags 15 pods, highlighted in green, to group them into 3 clusters according to the desired phenotype categories. Figure 4 (c) shows the updated projection, which produced three main clusters of pods according to their maturity stage: the red cluster contains pods that are too late to harvest, the blue cluster contains diseased pods, and the green cluster contains pods that are ready to harvest. This indicates that the desired phenotypes of each pod were effectively captured by the weighted features and successfully learned by Andromeda.
Furthermore, the explainable-AI visualizations of specific pods depict the most important visual features in our re-grouping as learned by the interactive model. In Figure 4 (d) we see that one of the more important visual features learned to determine the disease phenotype is a salient discolored spot. Similarly, in Figure 4 (e,f), areas of each pod corresponding to important features are highlighted. This provides insight into which parts of the pod are important for visually discerning a diseased, late-to-harvest, or ready-to-harvest product.

Pods Based on Seed Numbers
The number of seeds per pod is an important phenotype that potentially affects consumer acceptance of the product. However, the images were not originally collected to determine the number of seeds. The number of seeds is a novel visual feature that can be observed directly by the end users but is not initially clustered in the default projection. Images of 30 edamame pods are displayed in the 2D plot in Figure 5 (a) with equal weights applied to each feature. The user interactively drags 15 pods, highlighted in green, to group them into 3 clusters according to the number of seeds, as shown in Figure 5 (b). Figure 5 (c) shows the updated projection with clear clustering.
We find that the "number of seeds" phenotype is well captured by the weighted features learned by Andromeda. To explain the feature space with updated weights, we examine the features with the highest weights as an example. In Figure 5 (d,e,f), the most relevant CNN features mainly capture the overall shape of the pod to differentiate pods with different numbers of seeds.
These case study results indicate that Andromeda, with features extracted by a deep CNN, can indeed enable interactive sorting of pod images according to various human-guided visual phenotypes. In future work, the resulting weighted feature models could be used as classifiers for larger collections of images. We plan to extend these methods to more complex input scenarios, such as images of pods on live plants captured in the field with mobile phones.