MorphoFeatures: unsupervised exploration of cell types, tissues and organs in volume electron microscopy

Electron microscopy (EM) provides a uniquely detailed view of cellular morphology, including organelles and fine subcellular ultrastructure. While the acquisition and (semi-)automatic segmentation of multicellular EM volumes is now becoming routine, large-scale analysis remains severely limited by the lack of generally applicable pipelines for automatic extraction of comprehensive morphological descriptors. Here, we present a novel unsupervised method for learning cellular morphology features directly from 3D EM data: a convolutional neural network delivers a representation of cells by shape and ultrastructure. Applied to the full volume of an entire three-segmented worm of the annelid Platynereis dumerilii, it yields a visually consistent grouping of cells supported by specific gene expression profiles. Integration of features across spatial neighbours can retrieve tissues and organs, revealing, for example, a detailed organization of the animal foregut. We envision that the unbiased nature of the proposed morphological descriptors will enable rapid exploration of very different biological questions in large EM volumes, greatly increasing the impact of these invaluable, but costly resources.


Development of multicellular organisms progressively gives rise to a variety of cell types, with differential gene expression resulting in different cellular structures that enable diverse cellular functions. […] Self-supervised learning methods have removed the necessity for generating massive amounts of manual annotations previously needed to train a network. Besides the obvious advantage of being less labour-intensive, self-supervised methods do not optimise features for a specific end-task, but instead produce descriptors that have been shown to be useful in a variety of downstream tasks, such as image classification […] or subcompartments within neural cells in 3D electron microscopy of brain volumes (Huang et al. (2020); Schubert et al. (2019)). Building on the latest achievements, we aim to expand the feature extraction methods to automated characterisation of cellular morphology at the whole-animal level.

Here we present the first framework for the fully unsupervised characterisation of cellular shapes and ultrastructures in a whole-body dataset for an entire animal. We apply this new tool to the fully segmented serial block-face electron microscopy (SBEM) volume of the 6 days post fertilisation young worm of the nereid Platynereis dumerilii, comprising 11,402 mostly differentiated cells with distinct morphological properties (Vergara et al. (2021)). […] high correlation with genetically defined types such as muscle cells and neurons. Our pipeline also allows for the characterisation of rare cell types such as enteric neurons and rhabdomeric photoreceptors, and detects distinct cell populations within the developing midgut. Finally, by defining feature vectors that also represent the morphology of immediate neighbours, we group cells into tissues, and also obtain larger groupings that represent entire organs.
We show that such neighbourhood-based MorphoContextFeature vector clustering reproduces manually annotated ganglionic nuclei in the annelid brain, and represents a powerful tool to automatically and comprehensively detect the distinct tissues that make up the foregut of the nereid worm, including highly specialised and intricate structures such as the neurosecretory infracerebral gland (Baskin (1974); Hofmann (1976); Golding (1970)) and the axochord that has been likened to the chordate notochord ([…] (2014)). We further show that such morphologically defined tissues and organs correlate with cell-type- and tissue-specific gene expression. Our work thus sets the stage for linking genetic identity and structure-function of cell types, tissues and organs across an entire animal.

The MorphoFeatures for the Platynereis dataset, as well as the code to generate and analyse them, are available at https://github.com/kreshuklab/MorphoFeatures.git.

Unsupervised deep learning extracts extensive morphological features

Our pipeline has been designed to extract morphological descriptors of cells (MorphoFeatures) from EM data. It requires prior segmentation of all the cells and nuclei of interest. The pipeline utilises segmentation masks to extract single cells/nuclei and represents their morphology as a combination of three essential components: shape, coarse texture and fine texture (Figure 1A). We train neural networks to represent these components independently, using point clouds as input for shape, low-resolution raw data with sufficient context for coarse texture, and small, high-resolution […] supervision is not only difficult, but might also bias the exploration towards the "proxy" groups used for supervision. To enable immediate data exploration even for datasets where direct supervision is hard to define and the ground truth cannot easily be obtained, we developed a fully unsupervised training pipeline that is based on two complementary objectives (Figure 1B,C). The first is an autoencoder reconstruction loss, where a network extracts a low-dimensional representation of each cell and then uses this representation to reconstruct the original cell volume. This loss encourages the network to extract the most comprehensive description. The second objective is a contrastive loss that ensures the feature vectors extracted from two similar-looking cells (positive samples) are closer to each other than to feature vectors extracted from more dissimilar cells (negative samples). Since we do not know in advance which samples can be considered positive for our dataset, we use slightly different views of the same cell, generated by applying realistic transformations to cell volumes (see Methods).
The combination of these two losses encourages the learned features to retain the maximal amount of information about the cells, while enforcing distances between feature vectors to reflect the morphological similarity of cells.
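As an illustration, the two training objectives described above can be sketched in a few lines of NumPy. This is a simplified, hypothetical stand-in (a triplet-style margin loss and a mean-squared-error reconstruction term on toy vectors), not the actual network losses used in the pipeline:

```python
import numpy as np

def contrastive_loss(anchor, positive, negative, margin=1.0):
    """Triplet-style contrastive term: the anchor should lie closer to the
    positive (an augmented view of the same cell) than to the negative
    (a different cell), by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def reconstruction_loss(volume, reconstructed):
    """Autoencoder term: mean squared error between the input cell volume
    and its reconstruction from the low-dimensional feature vector."""
    return float(np.mean((volume - reconstructed) ** 2))

# Toy example: 8-dimensional feature vectors standing in for three cells.
rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.01 * rng.normal(size=8)   # augmented view, nearly identical
negative = rng.normal(size=8)                   # unrelated cell

total = contrastive_loss(anchor, positive, negative) + \
        reconstruction_loss(rng.normal(size=(4, 4, 4)), rng.normal(size=(4, 4, 4)))
```

In practice both terms are computed on network outputs and minimised jointly by gradient descent; the sketch only shows how the two terms pull in complementary directions.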

The pipeline was trained and applied on the cellular atlas of the marine annelid Platynereis dumerilii (Vergara et al. (2021)). It comprises a 3D serial block-face electron microscopy volume of the whole animal that has sufficient resolution to distinguish ultrastructural elements (organelles and inclusions, nuclear and cytoplasm texture, etc.) and an automated segmentation of 11,402 cells and nuclei (Figure 1A). Additionally, whole-animal gene expression maps are available that cover many differentiation genes and transcription factors.

Figure 1. A. Cell segmentation is used to mask the volume of a specific cell (and its nucleus) in the raw data. Neural networks are trained to represent shape, coarse and fine texture from the cell volume (separately for cytoplasm and nuclei). The resulting features are combined in one MorphoFeatures vector that is used for the subsequent analysis. B. Training procedure for the shape features. A contrastive loss is used to decrease the distance between the feature vectors of two augmented views of the same cell, and to increase the distance to another augmented cell. C. Training procedure for the texture features. Besides the contrastive loss, an autoencoder loss is used that drives the network to reconstruct the original cell from the feature vector.

MorphoFeatures allow for accurate morphological class prediction

Good morphological features should distinguish visibly separate cell groups present in the data.

To estimate the representation quality of our MorphoFeatures, we quantified how well they can be used to tell such groups apart. We took the morphological cell class labels available in the dataset, […] Figure 2B and Figure S1. This reveals, as expected, that while morphologically […]
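One simple way to quantify how well learned features tell annotated groups apart is a leave-one-out nearest-neighbour vote in feature space. The sketch below is purely illustrative and does not reproduce the paper's exact evaluation protocol; the function name and toy data are hypothetical:

```python
import numpy as np

def knn_accuracy(features, labels, k=5):
    """Leave-one-out k-nearest-neighbour accuracy: for each cell, predict
    its morphological class from the majority vote of its k closest cells
    in feature space. Higher accuracy means the features separate the
    annotated classes better."""
    n = len(labels)
    # Pairwise Euclidean distances between all feature vectors.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude the cell itself
    correct = 0
    for i in range(n):
        neighbours = np.argsort(dists[i])[:k]
        votes = np.bincount(labels[neighbours])
        correct += int(np.argmax(votes) == labels[i])
    return correct / n

# Toy data: two well-separated clusters standing in for two cell classes.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.1, (20, 8)), rng.normal(5, 0.1, (20, 8))])
labs = np.array([0] * 20 + [1] * 20)
acc = knn_accuracy(feats, labs)
```

For well-separated clusters like the toy data above, the accuracy approaches 1.0; overlapping morphological classes would lower it.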

MorphoFeatures correspond to visually interpretable morphological properties
Neural networks are often referred to as "black boxes" to signify that it is not straightforward to […]

[…] To further describe the morphological clusters we took advantage of the whole-animal cellular […] Beyond that, we noted considerable genetic heterogeneity in the midgut cluster. Subclustering […] midgut, which we interpret as midgut smooth musculature and digestive epithelia. We also detected an enigmatic third cluster located outside of the midgut, in the animal parapodia, comprising cells that resemble midgut cells morphologically (Figure 6D). For this subcluster, the current gene repertoire of the cellular expression atlas did not reveal any specifically expressed gene. […] (Figure 8A,B). Notably, manual segmentation of brain tissues relied on visible tissue boundaries, and thus has lower quality in areas where such boundaries are not sufficiently distinct.

In essence, all three ways of defining ganglia lead to very similar results (Figure 8B), yet MorphoContextFeatures […] neuron-like cells that surround the foregut like a collar (Figure 9B,C).
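The neighbourhood-based MorphoContextFeature idea introduced earlier can be illustrated with a minimal sketch: each cell's feature vector is extended with an aggregate over its spatial neighbours, so that clustering reflects tissue context as well as individual morphology. The concatenate-with-neighbour-mean construction and function name below are hypothetical, not the paper's exact definition:

```python
import numpy as np

def context_features(features, neighbours):
    """For every cell, concatenate its own feature vector with the mean
    feature vector of its spatial neighbours. The neighbour lists are
    assumed to come from the cell segmentation (e.g. cells sharing a
    boundary). Cells with no neighbours get a zero context part."""
    out = []
    for i, nbrs in enumerate(neighbours):
        nbr_mean = features[nbrs].mean(axis=0) if nbrs else np.zeros_like(features[i])
        out.append(np.concatenate([features[i], nbr_mean]))
    return np.stack(out)

# Toy example: three cells with 4-dimensional features in a chain 0-1-2.
feats = np.array([[1., 0., 0., 0.],
                  [0., 1., 0., 0.],
                  [0., 0., 1., 0.]])
adj = [[1], [0, 2], [1]]
ctx = context_features(feats, adj)  # each row doubles in length
```

Clustering such context-augmented vectors groups cells whose neighbourhoods look alike, which is what allows tissues and organs, rather than single cells, to emerge as clusters.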

Further inspection revealed that the latter structure is located underneath the brain neuropil and […] to the foregut neurons that express the sodium channel scn8aa, the infracerebral gland cells specifically express the homeobox protein onecut/hnf6 (Figure 9D,E). We also noted that the tissue stains positive for EdU applied between 3 and 5 dpf, indicating that it is still proliferating. We then identified the prominent muscles surrounding the foregut as the anterior extension of the […] networks to intensity variations, it will be essential to avoid non-biological systematic changes in the visual appearance of cells, such as intensity shifts, both within and between EM volumes.

The presented pipeline can also be adjusted for other types of data. For example, since our analysis revealed a high variability in neuronal soma morphologies, it appears reasonable to also apply the pipeline to brain EM volumes, where MorphoFeatures could supplement neuron skeleton features for a more precise description of neuronal morphology. We also expect the approach to contribute to […]

The revolution in multi-omics techniques and volume electron microscopy has unleashed enormous potential for generating multimodal atlases of tissues as well as entire organs and animals.

The main unit of reference in these atlases is the cell type, which has been defined, and is manifest, […]

[…] apart. For learning representations of shape, we follow a purely contrastive approach, whereas for texture representations we combine the CL objective with an autoencoder reconstruction loss.

Both are described in more detail below.

The training procedure for CL is the same for shape and texture representations. First, we randomly […]

[…] showing differential expression in these groups (Figures 5, 8 and 9), the corresponding regions of the UMAP representation were cut out and these genes were plotted on top.
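The generation of positive pairs for CL training (two views of the same cell, produced by realistic transformations) can be sketched as follows. The flip-and-jitter transformations here are simplified, hypothetical stand-ins for the augmentations described in Methods:

```python
import numpy as np

def augmented_view(volume, rng):
    """Produce one 'view' of a cell volume by applying simple
    transformations: a random axis flip and a small additive intensity
    jitter. Real pipelines would use richer, more realistic augmentations."""
    view = volume.copy()
    axis = int(rng.integers(0, view.ndim))
    view = np.flip(view, axis=axis)
    view = view + rng.normal(0.0, 0.01, size=view.shape)
    return view

# Two independently augmented views of the same toy cell volume form a
# positive pair for the contrastive loss; a view of a different cell
# would serve as a negative sample.
rng = np.random.default_rng(42)
cell = np.ones((4, 4, 4))
view_a = augmented_view(cell, rng)
view_b = augmented_view(cell, rng)
```

Because both views originate from the same cell, the contrastive objective pulls their feature vectors together while pushing them away from views of other cells.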

Data visualisation

We used matplotlib (Hunter, 2007) and seaborn (Waskom, 2021) for plotting the data, and MoBIE (Ver- […]

Figure S4. Infracerebral gland. A. The location of the gland in the head. The neuropil and the secretory cells are indicated by black arrows, the surrounding muscle layers by red arrows. B. The shape of the gland and its position relative to the posterior pair of adult eyes (black arrows). C. A cavity, likely a developing blood vessel (black arrow), on top of the gland.