Computer Methods and Programs in Biomedicine

Background and objectives: In recent years, electron microscopy is enabling the acquisition of volumetric data with resolving power to directly observe the ultrastructure of intracellular compartments. New insights and knowledge about cell processes that are offered by such data require a comprehensive analysis which is limited by the time-consuming manual segmentation and reconstruction methods. Method: We present methods for automatic segmentation, reconstruction, and analysis of intracellular compartments from volumetric data obtained by the dual-beam electron microscopy. We speciﬁcally address segmentation of fusiform vesicles and the Golgi apparatus, reconstruction of mitochondria and fusiform vesicles, and morphological analysis of the reconstructed mitochondria. Results and conclusion: Evaluation on the public UroCell dataset demonstrated high accuracy of the proposed methods for segmentation of fusiform vesicles and the Golgi apparatus, as well as for reconstruction of mitochondria and analysis of their shapes, while reconstruction of fusiform vesicles proved to be more challenging. We published an extension of the UroCell dataset with all of the data used in this work, to further contribute to research on automatic analysis of the ultrastructure of intracellular compartments


Introduction
Volumetric distribution of intracellular compartments and their temporospatial dynamics of interactions define the function and state of eukaryotic organisms [ 1 ].The Golgi apparatus (GA), mitochondria, and endolysosomal compartments play the main role in the modification and sorting of membrane proteins, cell energy production, and degradation of endocytosed or autophagic materials in all cells [ 1 ].Although typical functions are assigned to these organelles, there are differences in their distribution, organization and consequently, in the roles within different cell types.Moreover, there can be large differences between differentiation stages of the same cell types, and among normal cells and cells with dis-eases (i.e., the structure of organelles may reflect pathological conditions) [2][3][4][5].
The urothelium is the epithelium of the urinary bladder and presents a good model to study intracellular compartments involved in the formation and polarized transport of membranes [ 6 ].In differentiated urothelial cells, GA glycosylates transmembrane proteins (i.e., uroplakins), which are organized into 2D crystalline membrane domains (i.e., urothelial plaques) in the post-Golgi compartments are termed fusiform vesicles (FVs) [7][8][9].FVs transport urothelial plaques to the apical plasma membrane of superficial urothelial cells, where they crucially contribute to the blood-urine barrier [ 10 ].Plaques, endocytosed from the plasma membrane, or other compounds of the cell, can be degraded and removed by the endolysosomes if sufficient mitochondria-derived energy (i.e., ATP) is available [ 11 ].It has been shown that many bladder diseases (e.g., urinary bladder cancer, cystitis, uroinfections), chemical agents, aging, or even relatively distant events (e.g., spinal cord injuries), affect urothelial functions, and that is reflected in the altered ultrastructure of the GA, mitochondria, and endolyso-somes [ 12 ].Thus, observing the ultrastructure of these compartments may lead to novel findings on how different factors affect the cells.
Electron microscopy (EM) is the only method with resolving power to directly observe the ultrastructure of intracellular compartments.In recent years, the availability of dual-beam microscopes (i.e., FIB-SEM) pawed the way to acquire volumetric EM data on a large scale [ 13,14 ], but their (manual) segmentation and reconstruction, needed to study the ultrastructure of the compartments, is a major bottleneck, both in terms of time and expertise.Computer-assisted automatization of the process is expected to significantly facilitate and simplify analysis of volumetric electron microscopy data, however, currently available approaches are still not accurate enough.
We are addressing this bottleneck by proposing software methods for fully automatic segmentation and reconstruction of intracellular compartments from the FIB-SEM data.We refer to segmentation as the process of labeling each voxel of a volume with its corresponding semantic label, e.g., mitochondria, GA, fusiform vesicle and background (semantic segmentation), whereas we refer to reconstruction as the process of assigning organelle voxels to individual instances, thus separating them (instance segmentation).
This work is an extension of our previous work on segmentation of mitochondria and endolysosomes [ 15 ].We pre-sent the following novel contributions: (1) methods for segmentation of FVs and the GA in FIB-SEM data, (2) reconstruction methods for FVs and mitochondria, and (3) methods for morphological analysis of mitochondria.Except for the methods for the reconstruction of mitochondria and segmentation of the GA, we are to our knowledge the first to present methods for the described tasks and we believe our work will enable further studies of the ultrastructure of cellular compartments.In addition, we also make public an extension of the UroCell dataset [ 15 ] by including all of the data used in this work.

Related work
We divide related work into three sections: segmentation, reconstruction and datasets.

Segmentation
Segmentation of volumetric data labels each voxel of the 3D volume with an appropriate label (e.g., mitochondria, endolysosome, background).It has become a fast-growing field with the success of convolutional deep neural networks, which can be applied to complex segmentation problems.The most widely adopted architectures in the field of volumetric segmentation of biomedical images such as the 3D U-Net [ 16 ], the V-Net [ 17 ], DeepMedic [ 18 ], and HighRes3DNet [ 19 ] are achieving high segmentation accuracy in different domains.Although general segmentation approaches can be successful, added expert knowledge about the target domain is usually as important as the choice of the segmentation network [ 20 ].Lately Isensee et al. [ 21 ] presented the importance of parameter configuration.With the nnU-Net self-configuring method for deep learning-based biomedical image segmentation, they showed that with the right choice of parameters they can overcome most of the existing state-of-the-art methods on many public challenges.
In our work, we present a method for segmentation of FVs and the GA.We are aware of only a few works that address automatic segmentation of the GA, and none so far for FVs.Recently, Müller et al. [ 22 ] presented a method for segmentation of the GA from FIB-SEM data of the primary mouse β cell using the 3D U-Net architecture.The focus of the approach is on the reconstruction of microtubule-organelle interaction in cells and not so much on segmentation accuracy, therefore they do not provide an evaluation of the segmentation algorithm.Another method for segmentation of multiple organelles, including the GA, was presented by Heinrich et al. [ 23 ].Their method is based on several multi-channel 3D-U-Net architectures, that predict signed tanh boundary distances of binary labels for each organelle class.The distances are finally converted to binary labels of each target class.The authors present an extensive dataset and segmentation approaches for 35 different organelle classes, which will have a significant impact on the development of the field.The GA is segmented in the same fashion as all other organelles, but the authors do not evaluate its segmentation, since the test volumes do not include GA.
In the paper, we make the following contributions to segmentation of EM data.We present a new method for segmentation of the GA, which in contrast to the presented works does not require a precisely annotated training dataset.This is important, because such annotations are very time consuming to obtain due to the complex shape of the GA.We also quantitatively evaluate the segmentation accuracy of the proposed method on a dataset obtained from the urothelial tissue.We should note that the used dataset is challenging for segmentation of the GA, because it contains stacks of FVs that look very similar to the GA.Additionally, we present a method for segmentation of FVs, which to our knowledge is the first of its kind.

Reconstruction
Segmentation labels individual voxels with their respective class labels.With reconstruction, the individual compartment instances and their shapes are determined from the output of the segmentation process, thus enabling the counting of the compartments, measurement of their lengths, volumes, distribution, etc.
Works in automatic reconstruction of volumetric electron microscopy data are frequent in brain research, where the task, also referred to as connectomics, aims to reconstruct the full neuroanatomy, including synaptic contacts [24][25][26][27][28].The reconstruction methods are not generally applicable to other domains, as they focus on the morphology of the target domain.Related to reconstruction, instance segmentation finds and extracts individual instances of objects in images and volumes.Many successful instance segmentation algorithm are derived from the MaskR-CNN [ 29 ].Even though there are examples of successful application of such algorithms to volumetric data (e.g., [30][31][32]), their reliance on bounding boxes is limiting with closely spaced objects or elongated curved objects, as the boxes may highly overlap.This is also the main reason that they are not used for instance segmentation of neurons.
In our work, we present methods for reconstruction of mitochondria and FVs in EM data.Segmentation of mitochondria has been addressed by many authors [33][34][35][36][37], a detailed overview has been presented in our previous paper [ 15 ].Lately, several methods were proposed that also include reconstruction, where the main goal is to rectify locally incorrect segmentation results.Liu et al. [ 35 ] presented a method for automatic reconstruction of 3D mitochondria from automatic 2D segmentations.They use a multi-layer information fusion algorithm and in the end, discard all objects with length smaller than a selected threshold.Heinrich et al. [ 23 ] present a method to prevent over merging of instances in the segmentation output.After smoothing, hole filling, and size filtering, they create an over-segmentation of objects using a watershed algorithm which is followed by an agglomeration, where neighboring segments are merged together depending on the voxel values along the shared edges.Both mentioned works describe reconstruction of mitochondria, which is used to extract parameters for biomedical analysis (e.g., length, size, po-sition...), but they are evaluated only voxel-wise.Such evaluation is valuable to show improvements in segmentation, however to show that the reconstruction and thus the calculated mitochondria parameters are reliable, it is necessary to evaluate the results with metrics that are used for object detection or instance segmentation.Namely, voxel-based metrics show little difference in cases where several neighboring mitochondria are falsely merged into one instance (and vice-versa), because such fusions consist of a negligible number of voxels in comparison to the size of mitochondria.Such errors will however affect the counting of instances, estimation of object sizes, etc.Recently Wei et al. [ 38 ] presented a large dataset for instance segmentation of mitochondria, where they showed that existing reconstruction methods are not very accurate when evaluated object-wise.The method with the best performance for instance segmentation of mitochondria on their dataset, is 3D U-Net, which predicts instance contours and binary masks, followed by a marker-controlled watershed (U3D-BC+MW), which we also evaluate on our data and show that our approach yields better results.
In this paper we present three contributions to reconstruction of EM data.We introduce a benchmark method for reconstruction of mitochondria based on the segmentation algorithm presented in Meku č et al. [ 15 ].The method consists of a set of postprocessing steps that make use of topological properties of mitochondria [ 39 ] to reconstruct their shapes.We evaluate the approach with object-wise metrics.We also introduce novel methods for morphological analysis of mitochondria, by classifying their shape as classical, branched, and contacting.The introduction of such shape-based classification is important for biomedical analysis since experimental evidence demonstrate that mitochondrial ultrastructure affects mitochondrial and cellular function [40][41][42].Both contributions are novel, as in the related work, authors usually calculate only the basic quantification parameters, such as counts, volume and surface area, and don't evaluate their accuracy [ 23,43 ].Finally, we present and evaluate a method for reconstruction of FVs, which, to our knowledge, is the first of its kind.The proposed method is unsupervised and generic and can be applied to reconstruction of objects with similar topological properties.

Datasets
The evolution of the fields of automatic segmentation and reconstruction of intracellular compartments from volumetric microscopy data is highly dependent on the availability of annotated EM data.
Among the intracellular compartments that we are focusing on, several public volumetric datasets exist with annotated mitochondria [ 15,22,23,34,38,[44][45][46][47].For the GA, there are to our knowledge only two public volumetric FIB-SEM datasets [ 22,23 ], whereas no public datasets with FVs are currently available.All the datasets consist of binary class labels, except for the one presented by Wei et al. [ 38 ], which also provides labels for instance segmentation of mitochondria.Another relevant recently published dataset is the CEM500K dataset [ 48 ], which is a large unlabeled EM dataset beneficial for unsupervised pre-training of deep network features for segmentation of intracellular compartments.
With this paper, we make public an extension of the UroCell dataset [ 15 ], which contains annotations for mitochondria and endolysosomes.We extend it with manual annotations of FVs and the GA.We also add labels for instance segmentation of mitochondria and FVs, as well as labels for classification of mitochondria shape as classical, branched or contacting.This is to our knowledge the only public dataset obtained from the urothelial tissue, as well as the only one containing labels for instance segmentation of FVs and classification of mitochondria instances based on their morphological properties.

Materials and methods
In this section, we describe the dataset and the proposed methods for segmentation and reconstruction of intracellular compartments.In Section 3.1 we first describe the dataset used for training and evaluation of the proposed methods.We present a method for segmenting FVs and the GA in Section 3.2 , and the reconstruction methods for mitochondria and FVs in Section 3.3 .

Data
The data that we use were acquired from tissue sample of the urinary bladder of male mice with a FIB-SEM dual-beam electron microscope.The obtained volume is of size 1560 × 1366 × 1180 voxels, where the voxel dimensions are approximately x = 16 nm, y = 16 nm, and z = 15 nm.Details about the data acquisition are provided in Meku č et al. [ 15 ].The dataset used in this study consists of several sub-volumes (256 × 256 × 256 voxels) of the entire volume, cut out from different locations to make the dataset as diverse as possible.
This data is publicly available in the UroCell dataset [ 15 ].Next to the raw data, it includes manual annotations of mitochondria and endolysosomes in five sub-volumes.There are from 15 to 85 mitochondria and approximately 10 endolysosomes in each of the sub-volumes.In this work, we extend the dataset with additional annotations, as described below.The enriched dataset is available on: https://github.com/MancaZerovnikMekuc/UroCellunder the CC-BY-NC-SA 4.0 licence. 1 All the manual annotations were created with the open-source software Slicer3D [ 49 ].In the end, they were revised by a cell biologist with expertise in urothelial biology.An example of a sub-volume showing raw data as well as manual annotations is shown in Fig. 1 .

The extended dataset
In this work, we extend the dataset with manual annotations of FVs and the GA.We also include non-binary instance labels for mitochondria and FVs, and labels denoting branched and contacting mitochondria instances.
The annotations for FVs are given in four sub-volumes.There are 253 to 574 instances of FVs per sub-volume.For the GA, we provide precise annotations for two sub-volumes, which we use for evaluation of the proposed segmentation algorithm.For seven subvolumes we provide an approximate manual annotation, where the region of the GA is roughly labeled.Every annotated sub-volume contains from 0 to 3 instances of the GA.
The labels for instanced, branched, and contacting mitochondria are provided for the five sub-volumes that also contain binary mitochondria labels.To label the instances, we inspected all of the existing binary components and divided or merged them, based on a thorough manual inspection of the underlying data.Labels for branched, and contacting mitochondria are based on the obtained instances and were defined with the help of a domain expert.We provide two sets of labels, one where each mitochondria instance is labeled as branched or non-branched and the other where each mitochondria instance is labeled as contacting or non-contacting.Mitochondria that are not in contacting and branched category are a part of classical category.
Classical mitochondria show "normal", clearly outlined organelles typically found in cells.Branched category shows mitochondria with branched architecture.We call the third category  contacting mitochondria: such situations represent an intermediate state during mitochondrial fission or fusion, and the ability to sort out these mitochondria gives researchers the opportunity to assess the significance of this situation.We label the mitochondria as contacting if there is a visible narrow connection between the instances, but the category is borderline: they could represent functional contacts or be a technical artifact.Nevertheless, it is important that the interpretation is available to the researcher.The necessary certainty is going to be obtained through the accumulation of knowledge gained through advances in electron microscopic and complementary methods.

Segmentation
We present methods for segmentation of FVs and the GA (outlined in Fig. 2 ).Both methods use deep neural networks and different postprocessing steps to improve the segmentation accuracy.

Segmentation of fusiform vesicles
For segmentation of the FVs, we used the nnU-Net method described in Isensee et al. [ 21 ].In the postprocessing stage, components smaller than 4 × 10 5 nm 3 were removed (an average size of the FVs in the dataset is approximately 2 × 10 7 nm 3 ).We also removed FVs that were detected over an area with curtaining artifacts, as these are usually false positives due to the artifacts.

Segmentation of the Golgi apparatus
The GA has a complex outline, due to its numerous cisternae and tubulo-vesicular network, which is difficult and timeconsuming to accurately annotate manually -obtaining accurate ground truth for deep learning is thus difficult.We therefore propose a segmentation approach that relies on approximately annotated ground truth, where GA borders are only roughly traced; such annotations can be obtained much faster.An example of an approximate and precise annotation of a GA is shown in Fig. 3 .
We train the HighRes3DZMNet network with approximate annotations.Highres3DZMNet is a deep neural network for segmentation of volumetric data, first presented in Meku č et al. [ 15 ].It consists of 19 layers with 3 × 3 × 3 convolutions and one last 1 × 1 × 1 convolution layer followed by a softmax.Convolutions in the first seven layers are 1-dilated, in the next six layers they are 2-dilated and in the following six layers, they are 4-dilated.Between every two consecutive layers, except for the first and the last one, are residual connections.In the first layer, the network uses zero-mean convolutions, which makes it more robust to varying brightness levels in different patches of the input volume.Because the network is trained with approximate annotations, it will output volumes with regions that likely contain the GA, however GA borders will not be accurately segmented.To refine the segmentation, we use the active contours algorithm [ 50 ].The active contours, also called snakes, are an iterative region-growing process that pushes the initially given object estimate to the object boundaries by minimizing an energy function.In our case, the initial estimates are segmentations obtained by the deep network.We use a 2D active contours algorithm [ 51 ], which we run on every slice of the analyzed volume along all three dimensions.This results in three different output volumes, which we merge using disjunction (a voxel is considered as part of the GA if it is labeled as GA in at least one of the three volumes).
The GA is morphologically similar to FVs that are very common in our dataset.Thus, false positives may occur as the vesicles get marked as a GA by the segmentation network.We therefore added an additional postprocessing step that removes such false positives.Because FVs are usually smaller in size, this step simply removes all detected GAs smaller than 0.19 μm 3 which is 80% of the smallest GA in our ground truth.
With the described approach we obtain a precise segmentation of the GA even though we only rely on approximate ground truth for training, while the process is still completely automatic.

Reconstruction
Each segmentation network outputs a volume, where each voxel is labeled with its corresponding class (or as background).To count the organelles in a class and to make quantitative analyses of their sizes, shapes and distribution, we need to reconstruct individual instances of these organelles.This is trivial, if their voxels are already separated; however mitochondria and FVs are often found in groups with their membranes touching, thus forming large sets of connected voxels, which must be divided into individual instances.This is the goal of reconstruction -extracting the 3D shape of each organelle instance.We propose two algorithms for this purpose, both making use of the domain knowledge on the shape of individual organelles.The algorithms are designed for reconstructing mitochondria and FVs.Although we also segment the GA and endolysosomes, these are very rarely touching in our dataset, so their reconstruction is simple.In addition, we present methods for exploiting the reconstructed information to classify mitochondria shape into classical, branched and contacting.

Reconstructing mitochondria
Mitochondria may often touch, thus forming connected sets of voxels.The goal of reconstruction is to separate such instances.An example is shown in Fig. 4 .
The main goal of reconstruction is to separate closely-spaced mitochondria, which are falsely connected across membranes, because the deep network labeled the voxels between them as mitochondria (instead of background).We use the active contours algorithm to shrink the labeled components towards mitochondria membranes and thus separate the mitochondria.We use the edgebased model for active contours, which is similar to geodesic active contours [ 52 ].We set the maximum number of iterations of evolving contours to 3 and set the smoothing factor and contraction bias to 0. We run the algorithm along each dimension of the 3D volume, which we merge as already described for segmentation of the GA.After applying the process of active contours, we perform noise removal by removing all components labeled as mitochondria and smaller than 4 .7 × 10 6 nm 3 , which is approx.50% of the size of the smallest mitochondrion in the ground truth.In this way, we remove small noisy chunks sometimes attached to mitochondria as the result of poor segmentation (often near the cell membrane), and now disconnected by our reconstruction method.

Reconstructing fusiform vesicles
FVs densely populate the urothelial tissue (see Fig. 1 d).Their shape can be described as a pancake in 3D space and they are usually found in stacks, which makes it difficult to distinguish be-tween individual instances.Successful segmentation will result in blobs of voxels classified as FVs, however to actually analyze their counts, sizes and interactions, we need a reconstruction method to divide the blobs into individual vesicle instances.An example of such stacks and blobs is shown in Fig. 5 .
Because we are dealing with thin objects, placed very close together, existing methods for volumetric instance segmentation, such as the MaskRCNN [ 29 ], do not yield good results, as the bounding boxes overlap too much.We therefore present a new unsupervised reconstruction method that can be used as a benchmark for the problem.It is based on fitting thin disks to the vesicle stacks, output by the segmentation algorithm.As fitting multiple 3D shapes in 3D space can be a challenging optimization problem, due to the many free parameters, we add steps that limit the number and ranges of parameters.We outline the method in Algorithm 1 and describe it in more detail below.
input : Segmentation of FVsdenoted by vol and its corresponding FIB-SEM volume denoted by volFIB output : A set of divided FVs,denoted by The reconstruction method operates on the output of the algorithm for segmentation of FVs, as well as on the corresponding original FIB-SEM volume.We first find all connected components (vesicle stacks) in the input volume.Each stack is first ro-  tated so that the vesicles are parallel to the sagittal plane (line 4 of Algorithm 1 ).To determine the rotation angle and axis, we analyze the cross section of the stack within the original FIB-SEM volume.We estimate stack orientation by combining edge detection with the Hough transform to detect lines parallel with vesicles in the stack.According to the average line rotation, we rotate the stack to position it parallel to the sagittal plane.An example is shown as the first step in Fig. 6 .
In the next step, we use an optimization approach to fit a set of disks, representing individual vesicles, to the rotated stack.The disks represent a rough approximation of the vesicle shape, we use them because their shape can be described by only one parameter (radius), in contrast to more flexible representations such as parametric surfaces.Together with their 3D position, each disk is defined by 6 varying parameters and one fixed parameter (thickness) listed below.As we start the optimization with a stack that is parallel to the sagittal plane, we can make a good initial estimate of parameter values for each disk.We can also limit their range of values, in order to reduce the optimization search space, as we already know their approximate orientation and have knowledge of the typical sizes of FVs.We use the following parameters and ranges (in voxels): • coronal rotation crot ∈ [ crot curr − 10 , crot curr + 10] , • axial rotation arot ∈ [ arot curr − 10 , arot curr + 10] , • position x ∈ [0 , s x ] , • position y ∈ [ s y − 10 , s y + 10] , • position z ∈ [0 , s z ] .
• thickness h = 4 , s x , s y and s z denote the size of the bounding box of the vesicle stack, and crot curr , arot curr the coronal and axial rotation of the current optimization iteration.Rotations are initially set to zero, while other parameters are set to the middle of their ranges.Thickness is constant and does not vary, its value was determined from observation of the vesicles in our data.
We perform optimization with a pattern search algorithm minimizing an objective function with bound constraints [ 53 ].To estimate the number of vesicles in the stack, we iteratively add new disks and run the optimization algorithm, while the objective descreases by more than a threshold (we selected the threshold value of 0.02).
We define the objective function as follows: where c is the analyzed component, v is a volume containing the already fitted disks and n the new disk that is being added in the current iteration.The first part of the objective function measures the Dice similarity coefficient (defined in Section 4.2 ) between the component and the volume containing the fitted disks -the better the fit, the larger the value.As disks should not overlap, the second part of the objective function ( overlap (v , n ) ) returns the percent of overlapping disk voxels thus penalizing the overlap.An example of the output of this step is shown in Fig. 6 .
After the disks are fitted, we rotate them back to their initial orientation.In the end, we assign instance labels to the voxels by assigning each voxel to its nearest disk (lines 6 and 7 of the Algorithm 1 and the last two steps in Fig. 6 ).

Shape classification
Reconstruction enables researchers to obtain detailed information on the distribution, organization, sizes and shapes of cellular compartments.This in turn enables the study of the differences in compartments between different cell types, between differentiation stages of the same cell types, and among normal cells and cells with diseases (i.e., the structure of the compartments may reflect pathological conditions).To this end, we present two approaches to classify mitochondria according to their shape as classical, branched and contacting.

Detection of contacting mitochondria
Contacting mitochondria contain regions, where they are significantly narrower with respect to their overall shape.Examples of such objects are shown in Fig. 7 .
The contacting mitochondria are detected within the set of reconstructed mitochondria instances obtained by the algorithm described in Section 3.3.1 .The main idea of the proposed approach is to calculate the skeleton of each instance and then detect the narrowed parts by observing the distances of voxels within the instance to the skeleton.The approach is outlined in Algorithm 2 .The method iterates through all the reconstructed mitochondria instances.Holes within each instance are first filled by a simple hole filling algorithm, to avoid errors in further steps.To obtain the center of the instance, its 3D medial skeleton is calculated with the homotopic thinning algorithm [ 54 ].In the next step, the shortest distance to the skeleton is calculated for each instance voxel, thus adding a measure of thickness to the instance and resulting in an instance volume skeldist.All the voxels in skeldist with thickness lower than a selected threshold θ 1 are removed in the next step.The parameter θ 1 controls the thickness margin at which we denote a mitochondrion as contacting.In the final step, we count the components in the skeldist; if more than one is found, the instance is classified as contacting.We do not count components smaller than 5 voxels, which sometimes appear at the ends of the instances, as they do not represent the narrowings that we are interested in.

Detection of branched mitochondria
Mitochondrial fusion and division gives rise to branched mitochondria instances, as well as entire mitochondria networks.Examples of branched objects are shown in Fig. 8 .To automatically detect branched mitochondria, we again observe the reconstructed instances obtained by the algorithm described in Section 3.3.1 .As with contacting mitochondria, we also observe 3D mitochondria skeletons, which are transformed into graphs.If these graphs contain multiple edges, the instance is labeled as branched.The approach is outlined in Algorithm 3 .
The method iterates through all the reconstructed mitochondria instances.Holes within each instance are first filled by a simple hole filling algorithm, to avoid errors in further steps.The 3D skeleton is extracted with the same method as in the previous algorithm for detection of contacting mitochondria.The skeleton is then transformed to a graph [ 55 ], where branches smaller than a threshold θ 2 voxels are removed, as they likely represent small protrusions or noisy instance boundaries.In the final step, we add the instance labeled as branched if there are more than two edges in the obtained graph.

Implementation details
For the segmentation of FV with nnU-Net we ran 4-fold crossvalidation for all the proposed architectures.For the prediction, we used automatically selected 3D U-Net cascade.Predictions were made for each fold separately.We used the publicly available code [ 21 ].
HighRes3DZMNet for the segmentation of GA was implemented with the Nifty-Net framework [ 56 ].We ran 30,0 0 0 iterations of training with a learning rate of 0.001 using the Adam optimizer.
For both approaches, we used L2 regularisation with λ equal to 10 −5 .The reconstruction algorithms are implemented in Matlab.The thresholds used by the shape classification methods were manually set based on observations of typical sizes of the analyzed compartments.Their values (in voxels) were set as: θ 1 to 4.25 and θ 2 to 2. When applied to volumes of different resolution, these should be changed accordingly.Code is available at https://github.com/MancaZerovnikMekuc/HighRes3DZMNetand https://github.com/MancaZerovnikMekuc/Reconstruction .
For the mitochondria reconstruction, we evaluated the U3D-BC+MW method presented in Wei et al. [ 38 ] to compare the methods on our data.Here we also ran 30,0 0 0 iterations of training.For the watershed algorithm, we used the same parameters as proposed for the MitoEM dataset.

Evaluation metrics
To quantitatively evaluate the results, we use the wellestablished metrics from the field of semantic segmentation and object detection: the Dice similarity coefficient (DSC) also called the F1 score (F1), intersection over union (IOU) also called the Jaccard similarity coefficient (JAC), true positive rate (TPR) also called recall (REC) or sensitivity, true negative rate (TNR) also called specificity, and precision (PRE).They are defined as follows: In the upper equations TP (true positives) is the number of elements that are correctly classified as the target class, TN (true negatives) is the number of elements that are correctly classified as the background class, FP (false positives) is the number of elements which are falsely classified as the target class, and FN (false negatives) is the number of elements which are falsely classified as the background class.The highest possible value for all the metrics is 1, which we would get if all the elements would be classified correctly.Depending on the evaluated task, those elements can be voxels or individual compartment instances.
For evaluation of segmentation, we measure the DSC, TNR, and TPR.All the metrics are measured voxel-wise.DSC gives us information on the similarity between the ground truth annotations and the output of the segmentation.TPR measures the proportion of voxels that are correctly classified as the target class, and TNR measures the proportion of voxels that are correctly classified as the background class.
To evaluate reconstruction, where we extract individual compartment instances, we measure object detection accuracy.First, we determine the number of TP, TN, FP, and FN instances.An instance in the result is considered as a TP if it overlaps with an instance in the ground truth and if this overlapping, which is measured by an IOU metric voxel-wise, is higher than a selected threshold.If we have multiple instances for one ground truth object, the one with the highest IOU is considered as the TP and all the others are counted as FP.From the instance-wise TP, TN, FP, and FN, we calculate instance-wise PRE, REC, F1.PRE measures the proportion of correctly classified instance among all of the detected instance, REC measures the proportion of detected instance among all the ground truth instances, and F1 is the harmonic mean of PRE and REC.We also calculated the Panoptic Quality (PQ) metric [ 57 ], which aggregates the semantic and instance segmentation quality in one metric.It is defined as follows: In the upper equation the TP, FP, and FN are numbers calculated instance-wise with an IOU threshold of 0.5.In the numerator, p is a true positive object from the prediction and g stands for its corresponding object in the ground truth.

Results
In the following subsections, we present the qualitative and quantitative results of the proposed approach.It is important to look at both aspects, because the numbers do not always accurately describe the quality or usability of the results.

Segmentation results
Table 1 shows results for the segmentation of FVs.They were obtained by 4-fold cross-validation, so that each sub-volume from the dataset was once used as a test volume.The table shows averages of all four runs, with and without postprocessing.We also visualize the segmentation of the sub-volumes with the highest and the lowest DSC scores and the corresponding ground truth in Fig. 10 .Table 2 shows results for segmentation of the GA.We show evaluation for each step of the proposed method.We trained the network on five approximately annotated sub-volumes and evaluated it on two sub-volumes where precise ground truth annotations were available (these were not included in training).One of the test subvolumes is also visualized in Fig. 11 .
We also evaluated the time needed for annotation of the GA.As noted previously, exact annotation is extremely time consuming, as the shape of the GA is complex, so we based our segmentation algorithm on approximate annotations.An experienced annotator needed approximately 2.5 hours for precise annotation and 0.5 hours for approximate annotation of one GA, resulting in a 5fold speed up for making the ground truth annotations.

Reconstruction results
Results for the reconstruction of mitochondria, based on instance-wise metrics, are shown in Table 3 .They were calculated with an IOU threshold of 0.75 (the values remain very similar for other threshold values and the F1 measure only falls to 0.7 for the threshold value of 0.9).Reconstruction is based on the segmentation algorithm presented in Meku č et al. [ 15 ], which we also evaluated; results are shown in the second line of the table.We evaluated the approach on all sub-volumes that contain instance labels.We summed the TP, FP, and FN for all sub-volumes and calculated the measures in the table from the summed metrics.As mitochondria touching the borders of a sub-volume may only be partially included in the sub-volume and may thus be incorrectly treated by the proposed approach, we removed the border objects smaller than 10,0 0 0 voxels from evaluation.A visualization of reconstruction on a 2D slice is shown in Fig. 9 .With the same procedure, we evaluated the output of the U3D-BC+MW method on our dataset.The results are shown in the first line of the table.We can see that our approach outperforms the U3D-BC+MW method, especially because of its higher precision.
For reconstruction of FVs, we used the same instance-wise evaluation metrics.They were calculated from TP, FP, and FN values summed over all four evaluated sub-volumes.Results are presented in Table 4 .In the table, we show the results calculated with IOU thresholds of 0.3 and 0.5.We compare the results with the output of the segmentation algorithm, where we label each connected component as a separate object.Results of the proposed algorithm are shown in the last line of both parts of the table.Visualization of reconstruction on one of the sub-volumes is shown in Fig. 12 .

Shape classification
For the detection of contacting and branched mitochondria, we use the same evaluation metrics as for the reconstruction algorithm.Results are shown in Table 5 .Again we used the IOU threshold of to 0.75.Examples are visualized in Figs. 13 and 14 .

Reconstruction of the entire volume
To demonstrate the capability of our approach, we ran the segmentation and reconstruction algorithms on the whole volume with resolution: 1280 × 1024 × 1024 voxels.The results are visualized in Fig. 15 , where one can see the dense packing of intracellular compartments within a diferentiated superficial urothelial cell.

Discussion
In the following subsections, we discuss the presented results.

Segmentation
We presented two approaches for the segmentation of intracellular compartments, which were to our knowledge not yet evaluated on the FIB-SEM data.We also make public manual annotations for both classes as part of the UroCell dataset.
Our method for the segmentation of FVs achieved the average DSC score of 0.763.We can explain this value better by looking at the qualitative results.The method segmented the vesicles correctly most of the time ( Fig. 10 (b)), however, results were less satisfying for one sub-volume cut from the edge of the cell ( Fig. 10 (d)).This sub-volume contains more artifacts, as well as the cell membrane, which is sometimes very similar to the FVs and is thus incorrectly segmented ( Fig. 16 ).Even though the method itself doesn't present much technical novelty, we pose new challenges to the field by providing benchmark results and analysis of the method's drawbacks.
The GA was successfully segmented with a high DSC score (0.926).In the result table, we can see that the DSC is lower after the second step of the method (active contours).The reason lies in the false positives which are usually enlarged by the active contours, but are later eliminated in the postprocessing step.The presented method didn't miss any of the GA in the test set, except for one on the border of the sub-volume, which was removed, because its part inside the sub-volume was too small.Several false positives were caused by stacks of FVs labeled as the GA and empty space in the middle of the GA, which was not labeled as empty space.The number of GA in the test set was relatively low, so we also ran the segmentation on the entire volume, which qualitatively showed that the method successfully segmented the GA also in other sub-volumes.With our approach, we also showed that we can get accurate segmentation results with approximate ground truth labels, thus significantly reducing the time needed for annotation (5-fold).We believe that a similar approach is also useful for other segmen-tation problems, where objects with complex border needs to be segmented.

Reconstruction
Reconstruction of mitochondria consists of a few steps.Results in Table 3 show that the noise removal step removes many false positives and that the further step of dividing the mitochondria instances successfully divides most of the touching mitochondria.There are still some cases where division does not succeed, which happens if instances touch along a larger surface, so that the active   shows the reconstruction with the proposed approach: mitochondria (green), endolysosomes (purple), FVs (orange), and the GA (pink).contours method does not separate them.An example is shown in Fig. 17 .
Reconstruction of FVs is a harder problem, especially because the vesicles are very thin and close together, which makes it hard to distinguish the individual instances, even to biomedical experts.
The results in Table 4 show a low F1 measure, although the increased number of true positives indicate that some stacks were successfully separated into vesicle instances.We can also observe this in the qualitative results visualized in Fig. 10 .Analysis shows that simple stacks are usually well resolved (an example is shown in Fig. 6 ), but the method suffers on the more complex ones, mainly because of the narrow parametrization and oversimplification of using disks as vesicle models.The vesicles are often curved, so disks do not cover their volume well.Vesicles may also have different orientations in a stack and if several stacks are touching, they will not be correctly resolved, yielding many false negatives.An example of multiple touching stacks is shown in Fig. 18 .
Another problem arises with small vesicles, as the objective function will prefer fitting a disk to several vesicles, instead of a single smaller one.We visualize the problem in Fig. 19 , where we   can see that the red disk was fitted over two vesicles, which has a domino effect on the fitting of additional disks.Overall, the proposed method can be viewed as a baseline with many opportunities for improvements in the future.

Shape classification
We presented two algorithms for detecting contacting and branched mitochondria.The algorithms both start with the reconstructed mitochondria instances as their input, so their results are limited by the accuracy of reconstruction (and segmentation).The detection of contacting mitochondria is affected by errors in the input data, as erroneous touching or divisions of mitochondria are usually narrow.This can be observed in the lower two red mitochondria in Fig. 13 (d), which have a narrow connection in the ground truth but not in the input volume of the detection algorithm.The method itself is sensitive to cases where the contacting section is quite thick and is thus not labeled as such.This may be observed in Fig. 13 (d) -the upper red false negative object.We show an enlarged image of it in Fig. 20 .
The detection of branched mitochondria is mostly functioning correctly.False positives and false negatives are usually caused by the problems in the input data: mitochondria which were not divided by the division algorithm and are therefore branched are detected as false positives (an example is the purple object in Fig. 14 (d)), objects with incomplete segmentation (example in Fig. 21 ) are detected as false negatives, objects missing in the segmentation are detected as false negatives and there are also a few false negatives where the objects are not connected in the segmentation, although they should be.Errors caused by the algorithm itself and not the input data are mostly due to V-shaped branched objects (the red objects in Fig. 14 (d)).
We also evaluated the sensitivity of shape classification to different IOU thresholds and found that overall it is not very sensitive to the choice of the threshold.Performance slightly decreases when the IOU threshold increases (up to 0.75), due to small components, which are not taken into account when the threshold is lower than approx.0.75.The performance is stable up to approx.0.9, when it drops more significantly because of segmentation errors -some noisy components are not counted as TP anymore.Both presented algorithms are nevertheless extracting most of the objects of interest; as they are general, they could also be applied also to different domains with tubular objects such as vessels, neurons, etc.

Conclusion
For a comprehensive analysis of volumetric microscopy data, we need solutions that combine multiple steps: robust segmentation, quality reconstruction, and quantification of important parameters which in addition to simple quantities such as count, density, and position of the observed compartments, also includes more complex ones related to the analysis of component shapes.In the paper, we addressed several of these challenges and next to the proposed methods, we are also releasing a public dataset to encourage further research in the field.
We presented methods for segmentation and reconstruction of FVs which is to our knowledge the first of its kind.We presented a method for segmentation of the GA, which relies on approximately labeled ground truth, thus decreasing the time needed for creating manual annotations; a similar methodology can be applied to many related segmentation problems.We presented a method for reconstruction of mitochondria that improves on the original segmentation by separating the (incorrectly) merged components.Such object-wise reconstruction is necessary, as it allows for reliable quantification of objects, such as counting, measuring volume, etc.We also presented two methods and ground truth data for analyzing the shape of mitochondria and labeling them as contacting or branched.Such properties are important to analyze as they can indicate different physiologically significant states such as stress or spinal injuries.To our knowledge, such methods have not yet been proposed.We also showed that the presented methods can generalize well to data outside our test and training sets by showing the reconstruction of a large FIB-SEM volume.
We released a public dataset, which is to our knowledge the first volumetric EM dataset with manual annotation of FVs.Next to the recently published dataset by Wei et al. [ 38 ], which provides labels for instance segmentation of mitochondria, our dataset is the only one providing instance labels for several intracellular compartments.A step from semantic segmentation to more complex instance segmentation is crucial to ensure that we can trust in quantification measures derived from reconstructed volumesin current related works, these measures are usually reported and used for the biomedical analysis, but their quality is not analyzed.
There is still a lot of room for improvements of the presented methods.Our first plans are to improve the reconstruction of FVs, which we will address with proposal-free instance segmentation methods.Inclusion of more diverse datasets will enable training of better segmentation models, which will also positively affect the quality of reconstruction.Since many of the presented methods address challenges not tackled before, we hope that with the paper we set a benchmark that will be improved in the future by us and other researchers.
We will also continue our cooperation with biomedical experts and hope that the work presents a step towards a comprehensive analysis of volumetric microscopy data.By focusing on the urothelial tissue we want to support researches of the connections between quantitative parameters and bladder diseases, aging, and spinal cord injuries for which it has been shown that they are reflected in the altered structures of intracellular compartments.

Declaration of Competing Interest
Authors declare that they have no conflict of interest.

Fig. 2 .
Fig. 2. Steps involved in the segmentation of FVs (upper diagram) and the GA (bottom diagram).

Fig. 4 .
Fig. 4. Two mitochondria, classified as one connected set of voxels (left).Analysis of EM data (right) shows that they are only touching and are thus two separate instances.

Fig. 5 .
Fig. 5. Two stacks of FVs.(a) A 2D slice from the FIB-SEM data showing the two stacks.(b) Manually labeled ground truth for binary segmentation.(c) Instanced ground truth.(d) Binary ground truth shown in 3D.(e) Instanced ground truth shown in 3D.

Fig. 6 .
Fig.6.Reconstruction of FVs.The steps are illustrated left-to-right as follows: first the vesicle stack segmented from the FIB-SEM data is shown.The stack is then rotated to be parallel to the saggital plane.Two disks were fitted with the proposed optimization approach, then rotated back and finally merged with the original segmentation, where a separate label is assigned to each vesicle instance.

Fig. 7 .
Fig. 7. Two examples of contacting mitochondria.The upper example has two contacting sections, the lower has one contacting section.For each mitochondria, we show the underlying microscopy data.

Fig. 9 .
Fig. 9.A slice showing two mitochondria (green and yellow) that touched and were thus connected in the output of the segmentation (left).They were divided by the proposed reconstruction algorithm (right).

Fig. 10 .
Fig. 10.Segmentation of FVs. ( a ) Ground truth of the sub-volume with the highest DSC. ( b ) Segmentation of the sub-volume with the highest DSC. ( c ) Ground truth of the sub-volume with the lowest DSC. ( d ) Segmentation of the sub-volume with the lowest DSC.

Fig. 11 .
Fig. 11.Segmentation of the GA on one of the test sub-volumes.( a ) Output of the HighRes3dZMNet.( b ) Segmentation after applying active contours.( c ) Final segmentation after postprocessing.( d ) Ground truth.

Fig. 12 .
Fig. 12. Reconstruction of FVs. ( a ) Segmentation.( b ) Disks fitted to the segmentation.Each disk has its own label.( c ) The final output from the algorithm.( d ) Ground truth.( a-d ) Different colours present different labels.The colours don't match among the different sub-volumes.The only important thing is that a separate instance is of a different colour.

Fig. 13 .
Fig. 13.Detection of contacting mitochondria.( a ) Results of reconstruction.( b ) The output of detection of contacting objects.( c ) Ground truth.( d ) Differences between the ground truth and the result showing two missing false negatives in red.

Fig. 14 .
Fig. 14.Detection of branched mitochondria.( a ) Results of reconstruction.( b ) The output of detection of branched objects.( c ) Ground truth.( d ) Differences between the ground truth and the result.False negatives are labeled with red and false positives with purple.

Fig. 15 .
Fig. 15.Segmentation of the entire volume shows how densely populated the cells are with the intracellular compartments.(a) shows the raw microscopy data, and (b)shows the reconstruction with the proposed approach: mitochondria (green), endolysosomes (purple), FVs (orange), and the GA (pink).

Fig. 17 .
Fig. 17.An example of two mitochondria which are erroneously not divided.

Fig. 18 .
Fig. 18.Multiple stacks of FVs touching each other.Raw data is shown on the left, and the ground truth on the right.

Fig. 21 .
Fig. 21.A branched mitochondria (right) that was not correctly segmented (left) and was thus incorrectly labeled as non-branched.

Fig. 19 .
Fig. 19.Errors in reconstruction of FVs.(a) Raw data.(b) Output from the segmentation algorithm.(c) Disks fitted to the component from the segmentation step.(d) Final (incorrect) output of our approach.(e) Ground truth -four instances of FVs.

Table 1
Results for segmentation of FVs.

Table 2
Results for segmentation of the GA.

Table 3
Results for reconstruction of mitochondria.

Table 4
Results for the reconstruction of FVs.

Table 5
Results for detection of contacting and branched mitochondria.