Quantifying barley morphology using the Euler Characteristic Transform



Introduction
Biologists are accustomed to thinking about how the shapes of biomolecules, cells, tissues, and organisms arise from the effects of genetics, development, and the environment. Traditionally, biologists use morphometrics to compare and describe shapes. The shape of leaves and fruits is quantified based on homologous landmarks (similar features due to shared ancestry from a common ancestor) or on harmonic series from a Fourier decomposition of their closed contour. While these methods are useful for comparing many shapes in nature, they cannot always be used: there may be no homologous points between samples, or a harmonic decomposition of the shape may not be appropriate. Topological data analysis (TDA) offers a more comprehensive, versatile way to quantify plant morphology. In particular, Euler characteristic curves (ECCs) serve as a succinct, computationally feasible topological signature that allows downstream statistical analyses [12]. For example, ECCs have been used to determine a morphospace for all leaves and then predict plant family and location [5]. Further analysis has determined the genetic basis of 2D leaf shape in apple [9], tomato [6], and cranberry [3]. Here, as a proof of concept, we use the Euler characteristic to comprehensively describe the shape of barley seeds from 3D voxel-based X-ray CT scans.

Methods
Consider a cubical complex $X$ of dimension $d$. For a fixed direction $\nu \in S^{d-1}$ and a height value $h \in \mathbb{R}$, we define
$$X(\nu)_h = \{\Delta \in X : \langle x, \nu \rangle \le h \text{ for all } x \in \Delta\}$$
to be the subcomplex containing all cubical cells below height $h$ in the direction $\nu$. The Euler characteristic at height $h$ is $\chi(X(\nu)_h)$, the alternating sum of counts of cells in the subcomplex $X(\nu)_h$. The Euler Characteristic Curve (ECC) of direction $\nu$ is defined as $\{\chi(X(\nu)_h)\}_{h \in \mathbb{R}}$, exemplified in Figure 1. The Euler Characteristic Transform (ECT) is defined as the collection of all ECCs corresponding to all possible directions. To be more precise, the ECT of complex $X$ is defined as the function
$$\mathrm{ECT}(X) : S^{d-1} \to \mathbb{Z}^{\mathbb{R}}, \qquad \nu \mapsto \{\chi(X(\nu)_h)\}_{h \in \mathbb{R}}.$$

We focus on the shape of barley seeds from a collection of 28 different barley accessions from diverse regions across the Eurasian continent. Using X-ray computed tomography (CT) scanning technology, we have created voxel-based 3D reconstructions of over 875 spikes, from which we have isolated 3121 parental seeds. Since the seeds are oblong in shape, we aligned them according to their three main principal components.
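The alignment step can be illustrated with a minimal sketch, assuming each seed is stored as a 3D array whose nonzero entries mark seed voxels. The function name `align_to_principal_axes` is hypothetical, and the eigendecomposition of the coordinate covariance matrix is one standard way to obtain principal components, not necessarily the procedure used for the data in this work. The sketch returns rotated coordinates rather than a re-rasterized image, and it leaves the sign of each axis undetermined.

```python
import numpy as np

def align_to_principal_axes(voxels):
    """Rotate the coordinates of nonzero voxels onto their principal axes.

    Hypothetical helper: returns (aligned_coords, variances), where column 0
    of aligned_coords follows the direction of largest variance.
    """
    # Coordinates of every nonzero voxel, one row per voxel.
    coords = np.argwhere(np.asarray(voxels) > 0).astype(float)
    coords -= coords.mean(axis=0)  # center at the centroid

    # Principal axes are the eigenvectors of the 3x3 covariance matrix.
    cov = np.cov(coords, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

    # Reorder so the longest axis comes first.
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]
    return coords @ axes, eigvals[order]
```

For an oblong object, the first aligned coordinate then spans its length, the second its width, and the third its depth.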
On the one hand, we computed 11 traditional quantifiable shape descriptors for each seed, such as length, width, and volume. On the other hand, we computed topological shape descriptors using the ECT. For topological purposes, we treated each voxel-based image as a dual cubical complex where each nonzero voxel is treated as a vertex [13].
We favor the use of the ECT for two reasons. First, the ECT is computationally inexpensive; since it is based on successive alternating sums of counts of cells, a single ECC can be computed in linear time with respect to the number of voxels in the image [10].
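To make the alternating sum concrete, the following is a minimal sketch of a single ECC on a voxel image, under the dual cubical complex convention described above: nonzero voxels are vertices, and an edge, square, or cube is present when all of its corner voxels are nonzero. For a 3D complex the Euler characteristic is $\chi = V - E + F - C$. The function name, the centering at the centroid, and the choice of uniformly spaced thresholds between the lowest and highest vertex are assumptions made for illustration. The sketch sorts cell heights for brevity, which costs $O(n \log n)$; counting cells into threshold buckets would give the linear-time behavior mentioned in the text.

```python
import itertools
import numpy as np

def euler_characteristic_curve(voxels, direction, n_thresholds=16):
    """ECC of a 3D voxel image along `direction` (hypothetical helper)."""
    occupied = np.asarray(voxels) > 0
    nu = np.asarray(direction, dtype=float)
    nu = nu / np.linalg.norm(nu)

    # Height of every voxel center along nu, measured from the centroid.
    coords = np.indices(occupied.shape).astype(float)
    coords -= coords[:, occupied].mean(axis=1).reshape(3, 1, 1, 1)
    height = np.tensordot(nu, coords, axes=1)
    height = np.where(occupied, height, np.inf)  # absent vertices never enter

    lowest, highest = height[occupied].min(), height[occupied].max()
    thresholds = np.linspace(lowest, highest, n_thresholds)
    curve = np.zeros(n_thresholds, dtype=np.int64)
    shape = occupied.shape

    # extent = (0,0,0) gives vertices; one 1 gives edges; two, squares; three, cubes.
    for extent in itertools.product((0, 1), repeat=3):
        cell_height = None
        # A cell enters the filtration when its highest corner does.
        for offset in itertools.product(*(range(e + 1) for e in extent)):
            window = tuple(slice(o, n - e + o)
                           for o, e, n in zip(offset, extent, shape))
            corner = height[window]
            cell_height = corner if cell_height is None else np.maximum(cell_height, corner)
        present = np.sort(cell_height[np.isfinite(cell_height)])
        sign = (-1) ** sum(extent)  # + vertices, - edges, + squares, - cubes
        curve += sign * np.searchsorted(present, thresholds, side="right")
    return thresholds, curve
```

As a sanity check, a solid block of voxels ends at $\chi = 1$, while a hollow $3 \times 3 \times 3$ shell (a discretized sphere) ends at $\chi = 2$.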
Second, the ECT effectively summarizes all the morphological features of any 3D complex, as it encodes sufficient information to reconstruct the initial complex [12], a result later extended to the n-dimensional case [2]. Nonetheless, an efficient procedure for reconstructing an arbitrary 3D object solely from its ECT remains elusive [1,4,8].
In our case, we used 158 different directions with 16 uniformly spaced thresholds each. We emphasized directions toward the crease of the seeds. This yielded a 2528-dimensional vector for every seed. These high-dimensional vectors were later reduced in dimension, separately, using non-linear kernel PCA (KPCA) [11] and UMAP [7].
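The arithmetic behind the vector length is 158 directions times 16 thresholds, giving 2528 entries. The sketch below shows how such a vector could be assembled. It samples directions with a Fibonacci lattice, which is roughly uniform on the sphere; this is an assumption made only for illustration, since the text states that directions toward the crease were emphasized and does not specify the sampling scheme. Both function names are hypothetical, and `ecc_fn` stands for any routine that returns one ECC per direction.

```python
import numpy as np

def fibonacci_sphere(n_directions):
    """Roughly uniform unit vectors on the sphere (illustrative choice only)."""
    k = np.arange(n_directions) + 0.5
    z = 1.0 - 2.0 * k / n_directions           # evenly spaced heights in (-1, 1)
    radius = np.sqrt(1.0 - z**2)
    theta = np.pi * (3.0 - np.sqrt(5.0)) * k   # golden-angle increments
    return np.column_stack((radius * np.cos(theta), radius * np.sin(theta), z))

def assemble_ect(ecc_fn, directions, n_thresholds=16):
    """Concatenate one ECC per direction into a single feature vector.

    `ecc_fn(direction, n_thresholds)` is a hypothetical callable that
    returns an array of length n_thresholds.
    """
    curves = [np.asarray(ecc_fn(nu, n_thresholds)) for nu in directions]
    return np.concatenate(curves)

directions = fibonacci_sphere(158)
feature_length = directions.shape[0] * 16  # 158 * 16 = 2528
```

Every seed must use the same directions and thresholds in the same order so that the entries of the resulting vectors are comparable across seeds.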
The descriptiveness of both traditional and topological measures was tested by training five non-linear support vector machines (SVMs). These characterized and predicted the seeds from the 28 distinct barley accessions based on three different collections of descriptors: traditional, topological, and combined. We also varied the dimension-reduction method. In every case, all the descriptors were centered and scaled to variance 1 prior to classification. Given that SVM is a supervised learning method, we first randomly sampled 75% of the seeds from every founder as our training data set. The remaining 25% was used to test the accuracy of our prediction model. This setup was repeated 100 times. Average scores were considered for the overall data set (Table 1), as well as for individual accessions (Fig. 2).
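The sampling scheme described above can be sketched as follows, assuming the descriptors are rows of a matrix and the accession of each seed is given by a label array. The function names are hypothetical. Scaling the whole matrix before splitting follows the wording of the text; whether the scaling statistics came from all seeds or only from the training seeds is not stated, so that detail is an assumption. The classifier itself is omitted: an SVM from any standard library would be fit on the training rows and scored on the test rows.

```python
import numpy as np

def standardize(features):
    """Center every descriptor and scale it to unit variance."""
    features = np.asarray(features, dtype=float)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant descriptors unscaled
    return (features - features.mean(axis=0)) / std

def stratified_split(labels, train_fraction=0.75, rng=None):
    """Sample train_fraction of the seeds within every class.

    Returns (train_indices, test_indices); rng may be a seed or a Generator.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    train_mask = np.zeros(len(labels), dtype=bool)
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        rng.shuffle(members)
        n_train = int(round(train_fraction * len(members)))
        train_mask[members[:n_train]] = True
    return np.flatnonzero(train_mask), np.flatnonzero(~train_mask)
```

Repeating the split with 100 different seeds and averaging the resulting scores mirrors the repetition described in the text.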

Results and Conclusions
The majority of the barley accessions studied are more easily distinguished through the topological lens than with traditional measures, with few exceptions (Fig. 2). Exceptions like Hannchen, Han River, and Palmella Blue have slightly distinctive traditional trait distributions, so seed size does matter and it is important to take it into account. At the same time, we observe accessions like Alpha, Glabron, Minia, and Wisconsin Winter that are poorly differentiated with traditional information but report considerably higher classification accuracies whenever topological information is used. With a more robust dimension-reduction technique like UMAP, classification accuracy increases when the topological descriptors are combined with size-related information.
The Euler characteristic is a simple yet powerful way to reveal features not readily visible to the naked eye. There is "hidden" morphological information that traditional and geometric morphometric methods are missing. The Euler characteristic, and Topological Data Analysis in general, can be readily computed from any given image data, which makes it a versatile tool for a vast number of biology-related applications. TDA provides a comprehensive framework to detect and compare morphological nuances that traditional measures fail to capture and that remain unexplored using simple geometric methods. In the specific case of barley seeds presented here, these "hidden" shape nuances provide enough information to characterize not only specific accessions, but also the individual spikes from which the seeds are derived. Our results suggest an exciting new path, driven by morphological information alone, to further explore the phenotype-genotype relationship.

Figure 1: Filtration of a barley seed along the z-axis with 32 thresholds and its corresponding Euler Characteristic Curve.

Table 1: SVM classification accuracy of barley seeds from 28 different accessions after 100 randomized training and testing sets. Classification scores were computed for each accession; the weighted average of each score was taken afterwards, where the weight depended on the number of test seeds used. The use of topological descriptors outperforms the use of exclusively traditional descriptors.
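The weighting described in the caption can be sketched as follows. The per-accession score used here is the fraction of test seeds of an accession that were assigned to that accession; this is an assumption, since the caption does not name the scores. The function name is hypothetical, and the weights are simply the number of test seeds per accession.

```python
import numpy as np

def weighted_average_score(y_true, y_pred):
    """Per-class scores and their average weighted by test-set class size."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels, counts = np.unique(y_true, return_counts=True)
    # Fraction of each class's test seeds that received the correct label.
    per_class = np.array([(y_pred[y_true == label] == label).mean()
                          for label in labels])
    return per_class, np.average(per_class, weights=counts)
```

With this particular per-class score, the weighted average coincides with the overall fraction of correctly classified test seeds; other scores, such as precision, would not reduce to it.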