Metric Learning for Large Scale Agricultural Phenotyping



INTRODUCTION
High-throughput phenotyping in agriculture seeks to automate the measurement of plant properties as plants grow. For example, in field-grown sorghum, measurements like plant height, canopy cover, leaf length, and leaf width are visually observable phenotypes that may be measurable automatically. Characterizing these phenotypes over time in large-scale field trials opens up opportunities for improving plant breeding and for better understanding genotype-phenotype relationships. Some analytics pipelines have been built on platforms like PlantCV, 1 but classical approaches often struggle in field conditions.
More recently, convolutional neural networks (CNNs) have provided an approach that can use the availability of large training datasets to better accommodate real-world conditions. CNNs have been the backbone of approaches to extract traditional phenotypes such as leaf counting and segmentation, 2-4 panicle counting, 5 and prediction of end-of-season biomass. 6 Most closely related is recent work on Latent Space Phenotyping, 7 which trains a CNN/LSTM combination on time sequences of imagery labeled by different conditions (such as drought stress or nitrogen deficiency). Using simulated data and datasets captured in controlled environments, the authors show that they can predict the presence of these conditions quite well on unseen cultivars.
In our approach, we demonstrate that the embedding can be trained using only labels of the cultivar variety. Even with these simple labels, the features learned to distinguish a large number of varieties are also sufficient to predict a wide variety of phenotypic features and genetic variations under realistic field conditions.

APPROACH

Dataset
To test our method, we create a curated RGB-image dataset based on TERRA. 8,9 Our dataset contains about 58,000 images from 350 cultivars (varieties) of the sorghum Biomass Association Panel 10 grown under the TERRA-REF gantry in 2017. Each cultivar was grown in two spatially separated plots, for 700 total plots. Images were captured on 105 days over the course of the 140-day growing season (some days did not have data captured).
Raw images from the field RGB sensor are 3296 × 2016 pixels. We split the dataset into training and testing sets: 315 cultivars (630 plots) were randomly chosen as training data, and 35 cultivars (70 plots) were held aside as test data. There is no overlap of either plots or cultivars between the training and test sets.

Network Architecture and Training
We follow the Proxy Loss metric learning 11 approach, implemented on a ResNet-50. We use 25 images in each batch, where each image is labeled by plot number. The data augmentation strategy is to randomly crop a 1000 × 1000 sub-image and resize it to 448 × 448 (with appropriate blurring and sub-sampling to create antialiased images). The 448 × 448 image is then randomly flipped horizontally and vertically. At test time, the same pre-processing is applied, except the crop is always the center 1000 × 1000 image region and there is no flipping.
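The crop/resize/flip pipeline above can be sketched as follows. This is a minimal NumPy-only illustration (the `augment` function name is ours, and the nearest-neighbour resize stands in for the blurred, antialiased resize described in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=1000, out=448, train=True):
    """Random (train) or center (test) 1000x1000 crop, resized to 448x448."""
    h, w, _ = img.shape
    if train:
        y = rng.integers(0, h - crop + 1)
        x = rng.integers(0, w - crop + 1)
    else:  # test time: always the center crop
        y, x = (h - crop) // 2, (w - crop) // 2
    patch = img[y:y + crop, x:x + crop]
    # nearest-neighbour resize to out x out (the paper uses an antialiased resize)
    idx = np.linspace(0, crop - 1, out).astype(int)
    patch = patch[idx][:, idx]
    if train:
        # random horizontal and vertical flips
        if rng.random() < 0.5:
            patch = patch[:, ::-1]
        if rng.random() < 0.5:
            patch = patch[::-1, :]
    return patch

img = rng.integers(0, 255, (2016, 3296, 3), dtype=np.uint8)  # raw sensor size
out = augment(img)
```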
We use the SGD 12 optimizer with an initial learning rate of 0.01 and a momentum term of 0.9. We train for 40 epochs, at which point the training loss has converged. We follow the loss function in, 11 where each class has a 'desired' location (proxy) and the neural network learns a function such that points from that class are close to that location. As our embedding, we follow the suggestion of Nam et al. 13 and use the activation of the penultimate CNN layer as the embedding feature (as opposed to the final output), because penultimate features typically generalize better to unseen classes. This yields a 2048-dimensional feature for each image.
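A forward pass of a proxy-style loss can be sketched in NumPy as below. This is our own illustrative simplification, a softmax over negative squared distances to per-class proxies that includes the positive proxy in the denominator, which is a common variant of the referenced formulation:

```python
import numpy as np

def proxy_nca_loss(emb, proxies, labels):
    """Mean -log prob of each embedding's own class proxy under a
    softmax over negative squared distances to all proxies."""
    # L2-normalize embeddings and proxies
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    # squared Euclidean distance from each embedding to every proxy: (B, C)
    d = ((x[:, None, :] - p[None, :, :]) ** 2).sum(axis=2)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

rng = np.random.default_rng(0)
emb = rng.normal(size=(25, 64))            # one batch of 25 embeddings
proxies = rng.normal(size=(630, 64))       # one learnable proxy per training plot
labels = rng.integers(0, 630, 25)
loss = proxy_nca_loss(emb, proxies, labels)
```

In training, both the network weights and the proxies would be optimized by SGD to pull each embedding toward its class proxy.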

Prediction Tasks
In this section, we illustrate how we use the image embedding for three different tasks: cultivar retrieval, phenotype prediction, and genetic marker classification.

Cultivar Retrieval
We first consider the task of identifying the cultivar observed in a test image. We perform this task using the common gallery-style image retrieval approach from metric learning: we split the test set into a query set and a gallery set such that every test cultivar is present in both sets, then extract features from our metric learning model for all images in the two sets. For any single image in the query set, we compute the distance to the features of all images in the gallery set and sort by similarity. We then infer the cultivar class from the classes of the k nearest neighbors. Since all of the cultivars in the test set are unseen during training, this task tests the generalizability of our model to unseen classes, showing whether the neural network learns to extract features for cultivar differentiation outside the training set.
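The gallery retrieval step can be sketched as below, a minimal NumPy version with made-up embedding sizes (the function name `retrieve_labels` is ours):

```python
import numpy as np

def retrieve_labels(query, gallery, gallery_labels, k=5):
    """Return the cultivar labels of the k nearest gallery embeddings
    for each query embedding, by Euclidean distance."""
    d = np.linalg.norm(query[:, None, :] - gallery[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest gallery images
    return gallery_labels[nn]           # shape (n_query, k)

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 8))            # toy gallery embeddings
glabels = rng.integers(0, 35, 100)             # 35 test cultivars
query = gallery[:10] + 0.01 * rng.normal(size=(10, 8))  # near-duplicates
labels = retrieve_labels(query, gallery, glabels, k=5)
```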

Phenotype Prediction
In the task of phenotype prediction, we attempt to predict ground-truth phenotypes using our learned feature representations, the idea being that a representation that can differentiate between varieties of sorghum has presumably learned something about the most important visual phenotypes that define those varieties. To perform this prediction, we use our trained model to create feature representations for images in both the training and test sets. We then create a per-plot, per-day feature, v, by max pooling all of the feature representations from a single plot on a single day. For each test per-plot, per-day feature, we then perform weighted k-NN and support vector machine regression to infer the phenotype of interest.
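The max-pooling and weighted k-NN regression steps can be sketched as follows; this is an illustrative NumPy version (function names and toy data are ours, and inverse-distance weighting is one common choice for "weighted" k-NN):

```python
import numpy as np

def plot_day_feature(image_feats):
    """Max-pool the per-image embeddings from one plot on one day."""
    return image_feats.max(axis=0)

def weighted_knn_regress(train_X, train_y, q, k=5):
    """Inverse-distance-weighted average of the k nearest training phenotypes."""
    d = np.linalg.norm(train_X - q, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] + 1e-8)
    return (w * train_y[nn]).sum() / w.sum()

rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 2048))           # 10 images from one plot-day
v = plot_day_feature(feats)                   # per-plot, per-day feature
train_X = rng.normal(size=(50, 2048))         # training plot-day features
train_y = rng.uniform(50, 300, 50)            # e.g. canopy height in cm (toy values)
pred = weighted_knn_regress(train_X, train_y, v)
```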

Genetic Marker Classification
The final task that we perform is to predict genetic markers for images in the test set. There are five genetic marker families whose genotype × phenotype relationships are well documented in the sorghum genomics literature: the leaf wax group; 14 the dw group; 15,16 the dry stalk (d) locus group; 17 the ma group; 18,19 and the tan group. 20 Each sorghum variety in our dataset is considered reference (if the genetic marker is in its default configuration) or alternate (if it is mutated in one or more of the known locations in that marker family). Sorghum is a diploid species, meaning it has two copies of each of its 10 chromosomes; if a variety is reference in one copy and alternate in the other, we drop that variety from this analysis.
For genetic marker classification, we use an approach similar to that described in Section 2.3.2 to predict the genetic marker. Since this is a classification task, in the k-NN model, after the nearest neighbors are acquired, the genetic markers of the neighbors vote to produce the prediction (reference vs. alternate) for the query plot; for the SVM model, we use an RBF-kernel SVM classifier.
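The k-NN majority vote can be sketched as follows, with toy, well-separated clusters standing in for plot-day features (the `knn_vote` name and the cluster geometry are ours):

```python
import numpy as np

def knn_vote(train_X, train_labels, q, k=5):
    """Majority vote among the k nearest training features.
    Labels: 0 = reference, 1 = alternate."""
    d = np.linalg.norm(train_X - q, axis=1)
    nn = np.argsort(d)[:k]
    return int(train_labels[nn].sum() * 2 > k)

rng = np.random.default_rng(1)
ref = rng.normal(0.0, 1.0, size=(30, 16))   # toy reference-variety features
alt = rng.normal(4.0, 1.0, size=(30, 16))   # toy alternate-variety features
X = np.vstack([ref, alt])
y = np.array([0] * 30 + [1] * 30)
pred = knn_vote(X, y, alt[0] + 0.1)         # query near the alternate cluster
```

For the SVM variant, a standard RBF-kernel classifier (e.g. scikit-learn's `SVC(kernel='rbf')`) would be fit on the same features and labels.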

Embedding Space Visualization
In Figure 1, we show a UMAP visualization 21 of the features from all images in the dataset. UMAP projects the 2048-dimensional features into 2D such that points that are neighbors in high dimensions remain neighbors in low dimensions. In the figure, we highlight four example cultivars in detail; in the zoomed view, they are colored by plot. Because the metric learning embedding function was only given plot labels during training, the fact that both plots from the same cultivar are mixed together indicates that the features are heritable and descriptive of sometimes subtle distinctions between similar cultivars.

Cultivar Retrieval
For this task, we use the precision@k metric to evaluate performance. For each query image, k images are retrieved. Every retrieved neighbor with the same cultivar scores 1, and 0 otherwise; this score is summed and divided by k. When using the GAP feature for this task, precision@1 is 0.644 (compared to a chance performance of 1/35 ≈ 0.029, since there are 35 test cultivars) and precision@5 is 0.744. This shows the model can extract useful features to describe and identify unseen cultivars.
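The precision@k computation described above amounts to the following (a minimal sketch with made-up labels; the function name is ours):

```python
import numpy as np

def precision_at_k(retrieved_labels, query_labels):
    """Mean fraction of the k retrieved neighbours that share each
    query's cultivar label. retrieved_labels: (n_query, k)."""
    return (retrieved_labels == query_labels[:, None]).mean()

# two queries, k = 3: each has 2 of 3 correct neighbours
retrieved = np.array([[1, 1, 2],
                      [3, 0, 3]])
queries = np.array([1, 3])
p = precision_at_k(retrieved, queries)  # (2 + 2) / 6
```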

Phenotype Prediction
In this task, we use the r²-value as an evaluation metric. We consider three phenotypes available in the TERRA-REF dataset: canopy height, leaf length, and leaf width. The scores are shown in Figure 2a for both the k-NN and SVM models.
It is instructive to look at specific predictions and ground-truth values both at the scale of the whole season and for measurements within a single day. We show this in Figure 2b. In these figures, the x-axis is the predicted value and the y-axis is the ground truth, with points colored by date. In Figure 2b1, we enlarge the points from days 45 and 103 to emphasize performance on single days. These plots show that predictions from data-driven features capture the overall trend of the phenotypes through the season well, even if the trend within a single day is somewhat weaker.

Genetic Marker Classification
For this task, we create a balanced test set for each genetic marker by selecting an equal number of reference and alternate cultivars from the test set, ensuring that chance performance on this task is 50%. The genetic marker classification results are shown in Figure 3a. We compare results from the k-NN model with different values of k and from the SVM model. The results show that both classification approaches work, but the SVM has consistently better performance. This shows the embedding can capture the genotype × phenotype relationship for different genetic markers using visual features extracted from images.
In Figure 3b, we further illustrate genetic marker prediction accuracy on each day. The results show that features extracted from early-season images have a harder time predicting the genetic marker. Accuracy increases over the course of the season, as the plants grow and the phenotypes controlled by each of the genetic marker families become better expressed.

CONCLUSION
In large-scale field trials, it is often the case that some labels come for free, such as the date of images and the spatial plot labels. In this paper we explore the hypothesis that learning features tied to crop variety creates a rich descriptive space, and we demonstrate that this space can be used to extract common plant phenotypes and known genetic markers. For large, field-scale plant trials (in contrast to greenhouse or smaller-scale controlled experiments), there may often be large amounts of visual data with limited labels, and such data already exists for many types of plants including wheat, 22 sunflower, 23 and lettuce. 24

Figure 1. UMAP visualization color coded by cultivar, with examples of the same cultivar color coded by plot.

Figure 2. The phenotype prediction r² results for canopy height, leaf length, and leaf width are shown on the left, for both the k-NN and SVM regression models. On the right, we show the comparison of model predictions and ground truth for these three phenotypes. Points are colored by date. In panel b1, the results on days 45 and 103 are highlighted.

Figure 3. Genetic marker prediction results are shown on the left. The per-day genetic marker prediction accuracy plot is shown on the right; it shows that accuracy increases as the plants grow.
Future research directions include exploratory work to understand and visualize what other known phenotypes are encoded within the automatically learned features, and potentially what novel phenotypes are available in the data beyond those that have been explicitly measured in the past.