Multi-class point cloud completion networks for 3D cardiac anatomy reconstruction from cine magnetic resonance images

Cine magnetic resonance imaging (MRI) is the current gold standard for the assessment of cardiac anatomy and function. However, it typically only acquires a set of two-dimensional (2D) slices of the underlying three-dimensional (3D) anatomy of the heart, thus limiting the understanding and analysis of both healthy and pathological cardiac morphology and physiology. In this paper, we propose a novel fully automatic surface reconstruction pipeline capable of reconstructing multi-class 3D cardiac anatomy meshes from raw cine MRI acquisitions. Its key component is a multi-class point cloud completion network (PCCN) capable of correcting both the sparsity and misalignment issues of the 3D reconstruction task in a unified model. We first evaluate the PCCN on a large synthetic dataset of biventricular anatomies and observe Chamfer distances between reconstructed and gold standard anatomies below or similar to the underlying image resolution for multiple levels of slice misalignment. Furthermore, we find a reduction in reconstruction error compared to a benchmark 3D U-Net by 32% and 24% in terms of Hausdorff distance and mean surface distance, respectively. We then apply the PCCN as part of our automated reconstruction pipeline to 1000 subjects from the UK Biobank study in a cross-domain transfer setting and demonstrate its ability to reconstruct accurate and topologically plausible biventricular heart meshes with clinical metrics comparable to the previous literature. Finally, we investigate the robustness of our proposed approach and observe its capacity to successfully handle multiple common outlier conditions.


Introduction
Cardiac magnetic resonance imaging (MRI) is the gold standard for the assessment of a large number of cardiovascular pathologies due to its excellent soft-tissue contrast, lack of ionizing radiation, and minimal use of contrast agents (Stokes and Roberts-Thomson, 2017). In current clinical practice, most cine cardiac MRI acquisitions consist of a stack of two-dimensional (2D) short-axis (SAX) slices that provide a cross-sectional view of the heart, as well as multiple 2D long-axis (LAX) slices that intersect the heart longitudinally at different angles. While this allows the visualization of cardiac anatomy from multiple different views, the cine slices only capture information in 2D planes and are therefore unable to truly represent the inherent three-dimensional (3D) structure of the heart (O'Dell, 2019). However, accurate 3D cardiac anatomy models are necessary for a wide variety of applications in both clinical practice and research settings, including the accurate measurement of image-based biomarkers, discovery of novel biomarkers, visualization of healthy and pathological cardiac anatomy, and the development of both population-wide and case-specific modelling of cardiac mechanics and electrophysiology (Yang et al., 2017a; Gilbert et al., 2019; Attar et al., 2019; Mincholé et al., 2019; Mauger et al., 2019; Corral Acero et al., 2020; Levrero-Florencio et al., 2020; Beetz et al., 2021c, 2022a,c,d; Mauger et al., 2022; Beetz et al., 2023b,c).
Consequently, multiple research efforts have been dedicated to developing MRI-based methods capable of creating 3D representations of the human heart. The first group of approaches attempts to achieve this by increasing the spatial resolution of the MRI acquisition itself, e.g., 3D MRI (Mascarenhas et al., 2006; Jeong et al., 2015). However, most techniques suffer from lower temporal resolution and reduced image quality compared to 2D acquisitions, making an accurate assessment of cardiac function more difficult (Amano et al., 2017; Usman et al., 2017). While more recent works have mitigated these shortcomings considerably, they were only tested on a small number of cases, are dependent on the availability of the most recent scanner hardware and software, come with long reconstruction times, and often only allow partial heart coverage (Wetzl et al., 2018; Küstner et al., 2020a,b).
A second group of approaches aims at reconstructing true 3D representations of the heart from the available clinically standard 2D cine MRI slices, which is also the focus of this work. These techniques typically first segment the cardiac structures of interest in the images and then use the resulting contours to reconstruct the corresponding 3D anatomical surface models. Similar to many other medical image analysis tasks, deep learning methods, such as the fully convolutional neural network (FCN) (Long et al., 2015), U-Net (Ronneberger et al., 2015), and their derivatives (Çiçek et al., 2016), have become the state-of-the-art approach for cardiac image segmentation in the recent past (Chen et al., 2020). While most cardiac MRI segmentation research has focused on biventricular segmentation from SAX images (Chen et al., 2020), some works have also included the two-chamber (2ch LAX) and four-chamber long-axis (4ch LAX) views (Bai et al., 2018). More recent efforts aim to extend this success to more complex, multi-domain imaging datasets suffering from domain shift (Campello et al., 2021; Eisenmann et al., 2022; Martín-Isla et al., 2023). Here, the task is to either detect erroneous segmentations with improved quality control measures (Tarroni et al., 2020; Wang et al., 2020a; Machado et al., 2021) or avoid such failures altogether by using more robust algorithms that incorporate topological information about the underlying anatomy (Oktay et al., 2017; Byrne et al., 2020).
Given a set of contours derived from the 2D segmentation masks, the task of 3D surface reconstruction is a challenging, ill-posed optimization problem for two main reasons. First, the available 2D information is extremely sparse compared to a 3D representation, making an accurate surface reconstruction difficult, especially in regions with little or no data. In addition, artifacts caused by various types of motion (cardiac, respiratory, patient) during image acquisition result in slice misalignment and potentially erroneous anatomical information (Sievers et al., 2005; Scott et al., 2009; Bogaert et al., 2012).
In this work, we propose to utilize recent advances in geometric deep learning on point clouds (Qi et al., 2017a,b) to design a novel cardiac surface reconstruction method. Of particular importance for this work is 3D point cloud completion, which tries to predict the complete shape of a point cloud surface from a partial input (Yang et al., 2017b; Achlioptas et al., 2018; Yuan et al., 2018). Point cloud-based deep learning methods have recently also been applied to various cardiac image analysis tasks, including segmentation (Ye et al., 2020), anatomy generation (Beetz et al., 2021b), deformation prediction (Beetz et al., 2021c), pathology classification (Chang and Jung, 2020; Beetz et al., 2023d), and the combined modeling of cardiac anatomy and electrophysiology data (Beetz et al., 2022a,c; Li et al., 2022).
To the best of our knowledge, this work is the first point cloud-based deep learning approach for multi-class bitemporal cardiac anatomy reconstruction from 2D cine MRI slices. Previous approaches lacked validation on real data and used inefficient voxel grid representations (Xu et al., 2019), did not incorporate class-specific and temporal information (Beetz et al., 2021a), or relied on the different approach of mesh template deformation with graph neural networks while only using SAX information (Chen et al., 2021). Our main contributions are summarized as follows:
• We develop a 3D biventricular surface reconstruction pipeline with a novel point cloud-based deep learning network capable of addressing data sparsity, motion artifacts, and potential errors introduced during the segmentation or contouring process in a single model, while at the same time maintaining both multi-class and bitemporal anatomy information;
• We evaluate our proposed multi-class point cloud completion network (PCCN) on a large-scale dataset of synthetic biventricular anatomies and demonstrate highly accurate reconstruction performance in a multi-temporal setting, at both the diastolic and systolic ends of the cardiac cycle;
• We compare our PCCN to a state-of-the-art 3D U-Net approach and show its advantages in terms of reconstruction results and efficiency in data representation;
• We successfully apply and validate the complete reconstruction pipeline on cine MRI acquisitions of 1000 UK Biobank (UKB) cases;
• We calculate common clinical metrics from our method's UKB reconstructions and find plausible values compared to other population-wide cardiac anatomy studies; and
• We conduct a robustness analysis of our PCCN with respect to erroneous input contours and increasing levels of misalignment.
The rest of the paper is organized as follows: Sec. 2 describes the two datasets used for method development and evaluation in this work. A detailed description of our proposed pipeline is provided in Sec. 3, while the experiments conducted for method evaluation along with the corresponding results are presented in Sec. 4. Finally, Sec. 5 provides a discussion of the proposed technique and our experimental findings, before Sec. 6 concludes the paper.

Datasets
We use both a synthetic dataset generated from a high-resolution statistical shape model (SSM) (Sec. 2.1) and the real cine MRI acquisitions of the UK Biobank study (Sec. 2.2) to develop and evaluate our method.

3D MRI-based statistical shape model
The first dataset of this work is based on the biventricular shape model from Bai et al. (2015), which was created from the 3D cardiac MRI scans of 1084 healthy volunteers with a 3D cine balanced steady-state free precession (b-SSFP) sequence and a resolution of 1.25 × 1.25 × 2 mm. The authors registered and segmented all images at the end-diastolic (ED) and end-systolic (ES) phases of the cardiac cycle to construct two 3D biventricular surface meshes and applied principal component analysis to determine the 100 most important modes of variation of these two mean shapes. We use this SSM to derive a population of 3D biventricular anatomies and corresponding sparse 2D cine MRI inputs to train and evaluate our PCCN (Sec. 3.3.3). The SSM was selected as a basis for our synthetic data generation process for multiple reasons. First, it is based on 3D MRI acquisitions, which offer high spatial resolution both in-plane and between image planes without the effects of slice misalignment and data sparsity. Second, the dataset was derived from a large and representative number of volunteers, increasing its robustness and ability to accurately capture the true variability in the population. Third, only healthy individuals were considered and consistent scanning protocols were used, making it compatible with large-scale cardiac imaging studies such as the UK Biobank dataset. Hence, we consider the shapes generated from the SSM as the ground truth for our method development.
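To illustrate how a PCA-based shape model of this kind can generate new anatomies, the following minimal sketch draws a shape as the mean shape plus a weighted sum of modes of variation. The function name, the toy two-mode model, and the choice of standard normal mode coefficients are assumptions made for illustration; the text does not state how the coefficients are drawn.

```python
import random

def sample_shape(mean_shape, modes, variances, rng):
    """Draw one shape: mean + sum_k c_k * sqrt(var_k) * mode_k, with c_k ~ N(0, 1).

    mean_shape: flat list of vertex coordinates [x0, y0, z0, x1, ...]
    modes:      list of modes, each a flat list of the same length
    variances:  variance (PCA eigenvalue) associated with each mode
    """
    shape = list(mean_shape)
    for mode, var in zip(modes, variances):
        coeff = rng.gauss(0.0, 1.0) * var ** 0.5  # assumed sampling rule
        for j, component in enumerate(mode):
            shape[j] += coeff * component
    return shape

# Toy model with 2 vertices (6 coordinates) and 2 modes; the SSM in the text uses 100 modes.
mean_shape = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0]
modes = [
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # mode 1: moves the second vertex along x
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],  # mode 2: moves both vertices along y
]
variances = [4.0, 1.0]

rng = random.Random(0)
population = [sample_shape(mean_shape, modes, variances, rng) for _ in range(5)]
```

Setting all variances to zero returns the mean shape, which is a quick sanity check for any implementation of this scheme.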

UK Biobank
The second dataset used in this work consists of the 2D cine MRI acquisitions of 500 male and 500 female cases randomly selected from the UK Biobank study (Petersen et al., 2013, 2015). For each case, we consider the first temporal frame of the cine sequence as the ED phase of the cardiac cycle and determine the frame of the ES phase from the segmented SAX stack as the cardiac phase with minimum LV volume (Banerjee et al., 2021a). As our dataset, we select all SAX slices as well as the two-chamber (2ch) LAX and four-chamber (4ch) LAX slices for both ED and ES phases of the cine sequence for each case. The dataset's large sample size and typical image resolution (1.8 × 1.8 × 8.0 mm), the availability of metadata for each case (sex, age), and the usage of a clinically established acquisition protocol (b-SSFP) make it an ideal choice for the evaluation of our proposed cardiac surface reconstruction pipeline under real-world conditions. Including both ED and ES phases in the dataset allows us to additionally analyze the performance of our pipeline in a multi-temporal setting, which is crucial for many follow-up cardiac function tasks (Beetz et al., 2021c, 2022c).
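The phase selection rule described above can be sketched in a few lines: ED is the first frame and ES is the frame with the smallest LV volume. The function name and the volume values are hypothetical and serve only as an illustration.

```python
def find_ed_es_frames(lv_volumes_ml):
    """Return (ed_index, es_index) for one cine sequence.

    ED is taken as the first frame; ES is the frame with minimum LV volume.
    """
    if not lv_volumes_ml:
        raise ValueError("need at least one frame")
    ed_index = 0
    es_index = min(range(len(lv_volumes_ml)), key=lv_volumes_ml.__getitem__)
    return ed_index, es_index

# Hypothetical LV cavity volumes (mL) over a short cine sequence.
volumes = [142.0, 120.5, 88.3, 61.7, 58.9, 73.2, 110.4, 139.8]
ed, es = find_ed_es_frames(volumes)
```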

Methods
In this work, we propose a fully automatic 3D biventricular surface reconstruction pipeline consisting of four steps outlined in Fig. 1-c. First, three pre-trained convolutional neural networks (CNN) are applied to segment the SAX, 4ch LAX, and 2ch LAX slices of the input cine MRI acquisition (Sec. 3.1). Second, the anatomical contours obtained from the segmentation step are positioned in 3D space and converted into point clouds (Sec. 3.2). Third, a pre-trained Point Cloud Completion Network (PCCN) is used to reconstruct a dense multi-class point cloud representation of the biventricular anatomy from the sparse, misaligned input point cloud in what constitutes the key step of the pipeline (Sec. 3.3). Finally, the dense point cloud is transformed into an anatomical mesh (Sec. 3.4). The pre-training step of both the CNNs (Fig. 1-a) and PCCN (Fig. 1-b) is conducted before the application of the full reconstruction pipeline. The following subsections describe the four steps of the pipeline in greater detail.

Cine MRI segmentation
The first step consists of the segmentation of the SAX, 4ch LAX, and 2ch LAX image slices of the cine MRI acquisitions of the UK Biobank dataset. To this end, we employ the fully convolutional network (FCN)-based approach proposed by Bai et al. (2018) for the segmentation of the SAX stack and 4ch LAX slices, since it has been shown to segment heart structures from cine MR slices with human-level accuracy. A detailed description of the segmentation method is provided in the Supplementary Material.
Due to the lack of a publicly available pre-trained network for automated UK Biobank 2ch LAX slice segmentation, we also train a separate conditional generative adversarial network (Isola et al., 2017) with a U-Net generator (Rezaei et al., 2017) for this task. The training data consists of 200 2ch LAX frames chosen at random from separate UK Biobank subjects, equally distributed across the whole cardiac sequence. We extract endocardial and epicardial contours, as well as valvular contours along the mitral valve, using the open source tool ImageJ (Schneider et al., 2012; Rueden et al., 2017), from which segmentation masks are computed. Image and segmentation mask pairs are rigidly augmented using rotations, translations, and crops around the LV center, yielding 1500 training and 250 validation pairs.

Conversion of 2D contours to 3D point cloud
The objective of the second step of our reconstruction pipeline is to convert the 2D segmentation masks of the different views (SAX stack, 4ch LAX, 2ch LAX) obtained in the previous step into a 3D sparse representation of the cardiac anatomy. To this end, we first extract the LV endocardial, LV epicardial, and RV endocardial contours from their respective segmentation masks. We then fit a B-spline curve separately to each contour and resample the same number of points as in the original contour along the obtained curve at equidistant intervals. We repeat this procedure for both SAX and LAX images and finally place all resulting points in the same 3D space as the original cine MR slices to create the corresponding biventricular point clouds for each case.
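The equidistant resampling step can be sketched as follows. Note that this simplified example interpolates linearly along the closed contour polygon rather than along a fitted B-spline, so it only illustrates the arc-length resampling idea; the function name and the square test contour are assumptions.

```python
import math

def resample_closed_contour(points, num_samples):
    """Resample a closed 2D contour at equal arc-length intervals.

    points: list of (x, y) vertices in order; the last vertex connects back to the first.
    Linear interpolation along the polygon stands in for the B-spline curve here.
    """
    n = len(points)
    # Segment lengths, including the closing segment from the last to the first vertex.
    seg_lengths = [
        math.hypot(points[(i + 1) % n][0] - points[i][0],
                   points[(i + 1) % n][1] - points[i][1])
        for i in range(n)
    ]
    step = sum(seg_lengths) / num_samples

    resampled = []
    seg = 0          # index of the current segment
    seg_start = 0.0  # arc length at the start of the current segment
    for k in range(num_samples):
        target = k * step
        # Advance to the segment that contains the target arc length.
        while seg < n - 1 and seg_start + seg_lengths[seg] < target:
            seg_start += seg_lengths[seg]
            seg += 1
        length = seg_lengths[seg]
        t = (target - seg_start) / length if length > 0 else 0.0
        (x0, y0), (x1, y1) = points[seg], points[(seg + 1) % n]
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled

square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
samples = resample_closed_contour(square, 8)
```

For the 4 × 4 square, the eight samples fall on the corners and edge midpoints, spaced 2 units apart along the perimeter.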

Multi-class point cloud completion network
The third step of our pipeline aims to address both the sparsity and misalignment challenges of cardiac surface reconstruction with a single deep learning model, while maintaining the spatial and temporal information of all anatomical structures. To this end, we propose a novel multi-class Point Cloud Completion Network, which acts directly on the sparse, misaligned point cloud representations of the biventricular anatomy. The following subsections explain the network architecture (Sec.
The PCCN consists of an encoder-decoder structure with a latent space vector of size 1024. The encoding part of the network is a version of PointNet (Qi et al., 2017a) adapted to allow multi-class point cloud processing with different resolutions for input and output data. The network input is a sparse, misaligned point cloud of size n × 4, where a scalar class variable identifying the cardiac substructure (LV cavity, LV myocardium, RV cavity) is concatenated to the 3 spatial coordinate values (x, y, z) of each of the n points. Inspired by the design of PointNet++ (Qi et al., 2017b), the input is fed through two combinations of PointNet-style (Qi et al., 2017a) convolutional blocks and pooling operations as well as a skip connection to allow the network to access information at different scales and across per-point feature maps, before passing the output vector to the decoder.
The decoder architecture exhibits a similar two-step design as the decoder of the Point Completion Network (Yuan et al., 2018), but is also adapted to our high-density and multi-class setting. The first step is inspired by Achlioptas et al. (2018) and inputs the latent space vector into a shared multilayer perceptron (MLP) followed by a reshaping operation to generate a coarse 3D point cloud with m points separately for each of the three anatomical classes. The goal of this low-resolution point cloud is to capture the global shape of the biventricular anatomy by distributing the 3D points along the surfaces of the respective anatomical structures so that the highest-possible coverage is achieved. The second part of the decoder is based on FoldingNet (Yang et al., 2017b). Here, points are first initialized as grid-structured patches of size 4 × 4 with the tiling operation, where each patch corresponds to one of the points in the coarse 3D point cloud. Each patch is then iteratively deformed to obtain the best-possible fit with the dense target surface of the ground truth point cloud. This leads to an effective increase in point cloud resolution on a local level to obtain the final dense output point cloud while maintaining the global information of the coarse point cloud output. The size of the final dense output point cloud is p × 3 × 3, where p refers to the number of points, the first 3 to the spatial coordinates, and the last 3 to the respective cardiac substructures. In this work, we set n, m, and p to 36000, 750, and 12000, respectively.
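The point counts in the decoder follow from simple arithmetic: expanding each of the m = 750 coarse points into a 4 × 4 patch gives 750 × 16 = 12000 points per class, which matches the reported p. The sketch below reproduces only this tiling bookkeeping; the learned deformation of each patch is not modelled, and the grid half-width and the placeholder coarse cloud are hypothetical.

```python
M_COARSE = 750   # coarse points per class (m in the text)
GRID_SIDE = 4    # each coarse point is expanded into a 4 x 4 patch
NUM_CLASSES = 3  # three cardiac substructures

def make_grid(side, half_width):
    """Return side * side 2D grid offsets spanning [-half_width, half_width]^2."""
    ticks = [-half_width + 2 * half_width * i / (side - 1) for i in range(side)]
    return [(u, v) for u in ticks for v in ticks]

def tile_patches(coarse_points, grid):
    """Pair every coarse point with every grid offset (the tiling step).

    Returns (coarse_xyz, grid_uv) tuples. In the network, a learned MLP would map
    each pair to one dense output point; that mapping is omitted in this sketch.
    """
    return [(p, uv) for p in coarse_points for uv in grid]

grid = make_grid(GRID_SIDE, half_width=0.05)               # half_width is hypothetical
coarse = [(float(i), 0.0, 0.0) for i in range(M_COARSE)]   # placeholder coarse cloud
tiled = tile_patches(coarse, grid)

dense_points_per_class = len(tiled)
dense_output_shape = (dense_points_per_class, 3, NUM_CLASSES)  # p x 3 x 3
```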

Loss function and training
We base the loss function to train our PCCN on Yuan et al. (2018) and extend it to a multi-class setting by summing over the loss values of each class to obtain a combined total loss. The class-specific loss function consists of two loss terms defined at two different stages of the decoder path as

$$L_{total} = \sum_{i=1}^{C} \left( L_{coarse,i} + \alpha \, L_{dense,i} \right), \qquad (1)$$

where C refers to the number of classes in the biventricular anatomy. Although we have only tested C = 3 in this work, the proposed approach can be easily extended to any number of classes. The first loss term L_coarse compares the coarse 3D point cloud after the first decoder step with the dense ground truth point cloud and forces the sparse, intermediate point cloud to be a good representation of the global shape. The second loss term L_dense acts on the final high-resolution point cloud prediction and enforces the desired smooth shape representation on both a global and local level. The weight α is used to control the importance of each of the two loss terms in the total loss. We choose a low α of 0.01 at the beginning of training to allow the network to first learn a good coarse representation of the global anatomy. As training progresses, α is gradually increased to focus on local anatomical details. We use the Chamfer distance between reconstructed and ground truth point clouds for both loss terms in (1):

$$CD(P_1, P_2) = \frac{1}{2} \left( \frac{1}{|P_1|} \sum_{x \in P_1} \min_{y \in P_2} \lVert x - y \rVert_2 + \frac{1}{|P_2|} \sum_{y \in P_2} \min_{x \in P_1} \lVert y - x \rVert_2 \right), \qquad (2)$$

where P_1 refers to the predicted point cloud and P_2 to the ground truth point cloud.
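A brute-force sketch of this loss is given below. It assumes a symmetric Chamfer distance with unsquared Euclidean norms, averaged over both directions, and a total loss that sums the coarse term plus α times the dense term over all classes. These are assumptions about the exact form; the function names and toy point clouds are illustrative only.

```python
import math

def chamfer_distance(p1, p2):
    """Symmetric Chamfer distance between two point lists (unsquared Euclidean norm).

    Averages the mean nearest-neighbour distance from p1 to p2 and from p2 to p1.
    The factor 0.5 is an assumed convention. Brute force, O(len(p1) * len(p2)).
    """
    def mean_nn(src, dst):
        return sum(min(math.dist(a, b) for b in dst) for a in src) / len(src)
    return 0.5 * (mean_nn(p1, p2) + mean_nn(p2, p1))

def total_loss(coarse_pred, dense_pred, ground_truth, alpha):
    """Sum over classes of L_coarse + alpha * L_dense, each a Chamfer distance.

    Each argument is a list holding one point cloud per class.
    """
    loss = 0.0
    for coarse, dense, gt in zip(coarse_pred, dense_pred, ground_truth):
        loss += chamfer_distance(coarse, gt) + alpha * chamfer_distance(dense, gt)
    return loss

# Toy example: the coarse prediction is offset by 1 unit, the dense one is exact.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
coarse = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
loss = total_loss([coarse] * 3, [gt] * 3, [gt] * 3, alpha=0.01)
```

A production implementation would use a batched GPU nearest-neighbour search instead of the nested Python loops shown here.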
We train the network for 2000 epochs on a GeForce RTX 2070 graphics card using the Adam optimizer (Kingma and Ba, 2015) and a batch size of 8. The learning rate is initially set to 0.0001 and reduced every 30k steps with a decay rate of 0.7 to enable finer network updates as training progresses.
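The learning rate schedule can be written as a one-line function. The sketch assumes a staircase decay, where the rate is multiplied by 0.7 once every 30,000 steps; whether the original training used a staircase or a continuous exponential decay is not stated.

```python
def learning_rate(step, initial_lr=1e-4, decay_rate=0.7, decay_steps=30_000):
    """Staircase exponential decay: multiply by decay_rate every decay_steps steps."""
    return initial_lr * decay_rate ** (step // decay_steps)

schedule = [learning_rate(s) for s in (0, 29_999, 30_000, 90_000)]
```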

Synthetic dataset generation
Since we do not have access to a large number of ground truth 3D anatomies, we construct a synthetic dataset from the statistical shape model (SSM) described in Sec. 2.1 to train our PCCN.
To this end, we design a three-step process to synthesize the sparse and misaligned input point clouds as well as the corresponding dense ground truth point clouds (Fig. 3). First, we generate a virtual population of biventricular anatomy meshes by sampling from the SSM (Fig. 3-a). Next, we determine the slice planes of the generated meshes that best mimic the clinically standard cine MRI acquisition (Taylor and Bogaert, 2005; Walsh and Hundley, 2007; Margeta et al., 2014). We introduce small random translations to the chosen landmark points to recreate possible human errors during acquisition. We then artificially introduce misalignment artifacts due to respiratory and patient motion to each SAX and LAX slice, allowing us to train the PCCN under realistic conditions. We assume that the misalignment can be fully described by rigid transformations, which we found to be a good approximation of real conditions. In order to introduce different random misalignments for each case, we sample the transformation parameters from a normal distribution with zero mean separately for the x, y, z translations and the rotations around the x, y, z-axes, respectively. To systematically analyze the performance of our method for different misalignment amounts, we introduce five subgroups with different average levels of randomly introduced misalignment, starting with no misalignment and then increasing the misalignment amount for each level (mild, medium, strong, severe). We choose five separate normal distributions with increasing standard deviation values for each of the subgroups to induce these differences in average misalignment between the subgroups. Following previous pertinent literature (McLeish et al., 2002; Shechter et al., 2004; Chandler et al., 2008; Villard et al., 2016; Xu et al., 2019; Tarroni et al., 2020) (see Supplementary Material for more details), we select the standard deviation values in Table 1 as reasonable approximations for typical misalignment amounts found in real acquisitions for each severity level.
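The per-slice rigid misalignment can be sketched as follows: three translations and three rotation angles are drawn from zero-mean normal distributions and applied to all points of one slice. The standard deviations used in the example are placeholders rather than the values of Table 1, and rotating about the slice centroid with a Z-Y-X rotation order is an assumption of this sketch.

```python
import math
import random

def rotation_matrix(rx, ry, rz):
    """3x3 rotation R = Rz @ Ry @ Rx for angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def misalign_slice(points, trans_std_mm, rot_std_deg, rng):
    """Apply one random rigid transform to all points of a slice.

    The rotation is taken about the slice centroid (an assumption), followed by
    a translation. All six parameters are drawn from zero-mean normal distributions.
    """
    t = [rng.gauss(0.0, trans_std_mm) for _ in range(3)]
    angles = [math.radians(rng.gauss(0.0, rot_std_deg)) for _ in range(3)]
    rot = rotation_matrix(*angles)
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    moved = []
    for p in points:
        d = [p[k] - centroid[k] for k in range(3)]
        moved.append(tuple(
            sum(rot[r][k] * d[k] for k in range(3)) + centroid[r] + t[r]
            for r in range(3)
        ))
    return moved

# Placeholder standard deviations; the actual values per severity level are in Table 1.
slice_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
moved = misalign_slice(slice_points, trans_std_mm=2.0, rot_std_deg=1.5, rng=random.Random(42))
```

Because the transform is rigid, distances between points within a slice are preserved, which is a useful check for any implementation.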
We introduce random misalignment in this way to each slice of the whole SAX stack and both LAX slices before converting the slices into 3D point clouds, which now represent the sparse, misaligned cine MRI contours of a realistic acquisition (Fig. 3-b). Finally, we extract the vertices of the corresponding deformed meshes generated from the SSM to obtain the dense ground truth point clouds for network training (Fig. 3-c).
We run this three-step synthetic dataset generation process (Fig. 3) separately for each of the five levels of misalignment to create five different SSM-based datasets. For each of the four datasets with slice misalignment (mild, medium, strong, severe), we first generate 250 deformed meshes for both ED and ES phases and then apply 10 different sets of random misalignment transformations to each of the meshes, resulting in 5000 sparse, misaligned point clouds per misalignment level. In the case of no misalignment, we sample 500 different shapes from the SSM for both ED and ES phases, and apply 5 different random transformations to mimic errors in slice plane selection. Each of the five datasets is split into train, validation, and test datasets with sizes 80%, 5%, and 15%, respectively.
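The dataset sizes implied by these counts can be checked with a few lines of arithmetic. The total for the no-misalignment dataset is inferred here from the stated counts (500 shapes, two phases, 5 transformations), and the rounding rule in the split helper is an assumption.

```python
def split_sizes(total, fractions=(0.80, 0.05, 0.15)):
    """Return integer (train, val, test) sizes; the test set absorbs any rounding."""
    train = round(total * fractions[0])
    val = round(total * fractions[1])
    return train, val, total - train - val

misaligned_total = 250 * 2 * 10  # meshes x phases (ED, ES) x random misalignments
aligned_total = 500 * 2 * 5      # shapes x phases x plane-selection perturbations (inferred)

train_n, val_n, test_n = split_sizes(misaligned_total)
```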

Surface mesh generation from dense point cloud
The last step of our surface reconstruction pipeline consists of transforming the multi-class biventricular point clouds into triangular meshes. To this end, we select the Ball Pivoting algorithm (Bernardini et al., 1999) and apply it separately for each of the three cardiac substructures of the reconstructed point clouds. This allows us to use different hyperparameter settings in the meshing algorithm for each class to account for their specific topological requirements.

Experiments
In this section, we first evaluate our proposed point cloud completion network on the SSM dataset (Sec. 4.1) and compare its performance to a 3D U-Net benchmark (Sec. 4.2). We then validate the complete cardiac surface reconstruction pipeline on the UK Biobank dataset from both a geometric (Sec. 4.3) and clinical perspective (Sec. 4.4) and analyze its robustness (Sec. 4.5).

Statistical shape model dataset
We choose the synthetic SSM dataset (Sec. 3.3.3) for the first evaluation of our point cloud completion network, as it enables a direct comparison between the available ground truth anatomies and the reconstructed point clouds and meshes. By introducing slice misalignment at five different levels of severity, we can also analyze the effect of two cardiac phases perform similarly well in the reconstruction task.
In order to quantify the reconstruction ability of our point cloud completion network for different misalignment amounts, we calculate the Chamfer distances between the dense predicted point clouds and the corresponding ground truth point clouds in each of the five test datasets. We report the results in Fig. 6, split by cardiac substructure and phase for each of the five levels of slice misalignment.
We find median Chamfer distances considerably below or close to the underlying image resolution (1.8 × 1.8 mm) with low quartile deviation values for all levels of misalignment, cardiac phases, and substructures. Both median and quartile deviation values increase with rising levels of misalignment. Chamfer distances are generally higher for the right ventricular anatomies than for the left ventricular ones, while only marginal differences generally exist between the ED and ES phases.

Comparative analysis
We compare our PCCN with a state-of-the-art 3D U-Net architecture (Çiçek et al., 2016), which has previously been applied to biventricular surface reconstruction (Xu et al., 2019). For this task, we select the SSM dataset with medium misalignment as it represents the mean slice misalignment expected in a typical cine MRI acquisition (McLeish et al., 2002; Shechter et al., 2004; Chandler et al., 2008; Villard et al., 2016; Xu et al., 2019; Tarroni et al., 2020).
Since U-Nets operate on grid-based structures, we first convert the sparse, misaligned input point clouds and the ground truth point clouds of our dataset to voxel grid representations of the biventricular anatomy. We set the voxel size to 1.5 × 1.5 × 1.5 mm, chosen as a trade-off between closeness to the 3D MRI resolution underpinning the SSM dataset (1.25 × 1.25 × 2 mm) and ensuring that the complete biventricular anatomy fits into the fixed-size 128 × 128 × 128 voxel grid for all cases. It is also smaller than the pixel size of the underlying 2D image acquisition (1.8 × 1.8 mm), which acts as a lower accuracy limit of the point cloud representation. Furthermore, the voxel resolution is slightly higher than the 2 × 2 × 2 mm used in the work by Xu et al. (2019), enabling a more accurate reconstruction.
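The conversion from a labelled point cloud to a voxel grid can be sketched as below. With 128 voxels of 1.5 mm per axis, the grid covers 192 mm in each direction. The grid origin, the sparse dictionary storage, and the "last point wins" rule for voxels hit by several points are assumptions of this illustration.

```python
VOXEL_SIZE_MM = 1.5
GRID_DIM = 128  # 128 * 1.5 mm = 192 mm field of view per axis

def voxelize(points, labels, origin):
    """Map labelled 3D points (mm) to a sparse voxel grid.

    Returns a dict {(i, j, k): label}; points outside the grid are skipped.
    `origin` is the physical position (mm) of the grid corner, a hypothetical choice.
    If several points fall into one voxel, the last one determines the label.
    """
    grid = {}
    for point, label in zip(points, labels):
        idx = tuple(int((c - o) // VOXEL_SIZE_MM) for c, o in zip(point, origin))
        if all(0 <= i < GRID_DIM for i in idx):
            grid[idx] = label
    return grid

points = [(0.0, 0.0, 0.0), (1.6, 0.0, 0.0), (191.9, 0.0, 0.0), (192.0, 0.0, 0.0)]
labels = [1, 2, 3, 3]
voxels = voxelize(points, labels, origin=(0.0, 0.0, 0.0))
field_of_view_mm = GRID_DIM * VOXEL_SIZE_MM
```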
With both point cloud and voxel grid representations available for each case, we train both our PCCN and a 3D U-Net for biventricular surface reconstruction on the same dataset. To allow a comparison of results that is as fair as possible, we convert both the point clouds and voxel grids to multi-class triangular meshes as a neutral data type by using the Ball Pivoting (Bernardini et al., 1999) and Marching Cubes algorithms (Lorensen and Cline, 1987), respectively. The resulting meshes predicted by the PCCN and the 3D U-Net as well as the corresponding ground truth meshes are shown for two sample cases in Fig. 7.
We observe that both the PCCN and the U-Net are able to accurately reconstruct different cardiac shapes for all cardiac substructures and phases. On a global level, we notice only minor differences between the results, which are mostly caused by the lower smoothness of the U-Net outputs as a result of deriving them from gridded data. Visible differences are larger on a local level, where the U-Net reconstructions exhibit erroneous outward bulging in some surface regions that do not align with the ground truth and are correctly smoothed out in the respective PCCN predictions. These differences most commonly occur in the LV cavity substructure and are slightly more pronounced in ES than in ED.

[Figure panel labels: 3D U-Net | Ground Truth | Proposed]
In order to quantify the differences between the PCCN and the 3D U-Net, we calculate the Hausdorff distances, the mean surface distances (MSD), and the Chamfer distances between predicted and ground truth meshes of the unseen SSM test dataset for both methods and report the results in Table 2. In addition, we also provide information about the number of network parameters and data representations used in the respective approaches.
We find that the PCCN outperforms the 3D U-Net by 32% and 24% in terms of average Hausdorff distance and mean surface distance, respectively. Standard deviations of both distance metrics are also lower for the PCCN than for the U-Net reconstructions, while only minor differences exist between the ED and ES results. Due to its usage of memory-efficient point clouds, the PCCN achieves this outperformance despite using 13 times less storage space for each anatomy.
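For reference, the two surface metrics can be approximated on mesh vertices as in the sketch below. It uses common symmetric definitions (worst-case and average nearest-neighbour distance) computed between vertex sets; a mesh-based evaluation would measure point-to-surface distances instead, so this is an illustration under stated assumptions rather than the exact evaluation code.

```python
import math

def _nn_distances(src, dst):
    """Distance from every point in src to its nearest neighbour in dst."""
    return [min(math.dist(a, b) for b in dst) for a in src]

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance."""
    return max(max(_nn_distances(a, b)), max(_nn_distances(b, a)))

def mean_surface_distance(a, b):
    """Symmetric mean surface distance: average of the two directed mean distances."""
    d_ab, d_ba = _nn_distances(a, b), _nn_distances(b, a)
    return 0.5 * (sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba))

# Toy vertex sets: the prediction has one outlier vertex 3 units away.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
prediction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
hd = hausdorff_distance(truth, prediction)
msd = mean_surface_distance(truth, prediction)
```

The toy example shows why the two metrics are complementary: a single outlier dominates the Hausdorff distance but only moderately raises the mean surface distance.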

UK Biobank
After evaluating our PCCN on the synthetic SSM datasets, we assess the ability of the complete cardiac surface reconstruction pipeline to transform raw cine MR images into triangular mesh representations of the biventricular anatomy on the real-world dataset of the UK Biobank study (Petersen et al., 2013, 2015). Since the first two steps of our four-step reconstruction pipeline were both developed and validated on the MRI acquisitions of the UK Biobank (Bai et al., 2018; Banerjee et al., 2021a), we can directly use them for this task without any further adjustments. As the third step of our pipeline, we want to directly apply a PCCN pre-trained on one of the SSM datasets to the sparse, misaligned UK Biobank point clouds obtained in the second pipeline step in a cross-domain transfer setup. To this end, we first assess which of the five networks, trained on different amounts of misalignment, is the best fit for the UK Biobank dataset. We refer to these networks as no, mild, medium, strong, or severe misalignment networks for the remainder of this paper.
Since this is not a straightforward task due to the lack of available 3D ground truth for the UK Biobank data, we create a set of approximate ground truth meshes to act as a benchmark for our analysis. We first select the 10 cases with the least amount of misalignment in the UK Biobank dataset. We determine the misalignment amount of each case by calculating the average shortest distance of each point in each slice to the remaining slices in the given point cloud. The corresponding 3D point clouds are then reconstructed for both ED and ES using the PCCN trained on the SSM dataset with no misalignment. We consider these 3D reconstructions as our pseudo-gold standard for this experiment. However, we note that some reconstruction error is still expected to be present, as the selected cases are not completely without misalignment, come from a different, unseen domain compared to the PCCN's training SSM dataset, and might contain segmentation errors.
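One way to compute such a misalignment score is sketched below: for every contour point, the shortest distance to any point belonging to a different slice is found, and these distances are averaged over the whole point cloud. Measuring distances to the points of the other slices (rather than to continuous contours) is an assumption, as are the function name and the toy slices.

```python
import math

def misalignment_score(slices):
    """Average, over all points, of the shortest distance to any point in another slice.

    slices: list of slices, each a list of (x, y, z) points.
    """
    total, count = 0.0, 0
    for i, pts in enumerate(slices):
        others = [q for j, s in enumerate(slices) if j != i for q in s]
        for p in pts:
            total += min(math.dist(p, q) for q in others)
            count += 1
    return total / count

# Two slices 8 mm apart: perfectly stacked versus shifted in-plane by 3 mm.
aligned = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 8.0)]]
shifted = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 8.0)]]
score_aligned = misalignment_score(aligned)
score_shifted = misalignment_score(shifted)
```

Cases can then be ranked by this score and the lowest-scoring ones selected.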
Given this set of pseudo ground truth anatomies, we can apply each of the four pre-trained PCCN candidate models to the sparse input point clouds and compare the predicted 3D reconstructions with the corresponding pseudo ground truths. However, this would only assess the performance on UK Biobank cases with very little misalignment, which are not representative of the whole dataset. Hence, we first artificially introduce random slice misalignments to each of the sparse input point clouds to mimic real-world misalignment conditions, while still maintaining our pseudo gold standard point clouds required for the comparative evaluation. Similar to our experiments on the SSM dataset, we include both ED and ES point clouds of each case in the dataset and introduce the misalignment at four different levels of severity (mild, medium, strong, severe). For each level, 10 random amounts of misalignment are applied to each of the 10 pseudo ground truth cases, resulting in 100 misaligned and sparse point clouds per misalignment level. We use the Chamfer distances between the predicted and pseudo gold standard point clouds as our evaluation metric in all cases and report the UK Biobank reconstruction results separated by cardiac substructure and sex in Fig. 8.
We observe that the mild misalignment PCCN generally achieves the best overall results across all misalignment levels, cardiac substructures, and sexes. Its distance scores are the lowest for mild, medium, and strong UKB misalignments, as well as for severe UKB misalignment in the RV endocardium. The medium misalignment PCCN performs best on severely misaligned left ventricular UKB data and second-best overall. We also see a general decrease in performance of all four analyzed networks with increasing misalignment in the UKB data.
Based on these quantitative evaluation results on the UK Biobank data, we select the mild misalignment PCCN for the third step of our reconstruction pipeline. With all components of the full reconstruction pipeline available, we apply it to the randomly selected 1000 subjects of the UK Biobank dataset. We visualize the sparse, misaligned input point clouds, the corresponding dense output point clouds, and the output meshes for two sample UK Biobank cases in Fig. 9.
We find realistic and plausible 3D reconstructions that align very well with the 2D anatomical information in the sparse input point clouds for all cardiac substructures and phases. Furthermore, the meshing step is able to successfully preserve the cardiac surface anatomy of the reconstructed point clouds in the final output meshes and create topologically accurate two-manifold meshes for 97% of all cases. Only small differences in reconstruction performance between the ED and ES phases are observed.
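As an illustration of what the two-manifold property implies for a triangle mesh, the following sketch tests a necessary condition: in a closed two-manifold mesh, every edge is shared by exactly two faces. How the authors verify topological correctness is not stated here, so the function `is_closed_edge_manifold` is a hypothetical helper, and it does not cover the additional per-vertex condition of a full manifold test.

```python
from collections import Counter


def is_closed_edge_manifold(faces):
    """Return True if every edge of the triangle mesh belongs to exactly two faces.

    faces : iterable of (i, j, k) vertex index triples.
    """
    edge_count = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_count[(min(u, v), max(u, v))] += 1  # undirected edge key
    return bool(edge_count) and all(n == 2 for n in edge_count.values())


# A tetrahedron is the smallest closed two-manifold triangle mesh.
tetrahedron = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
closed = is_closed_edge_manifold(tetrahedron)         # all six edges are shared by two faces
with_hole = is_closed_edge_manifold(tetrahedron[:3])  # removing a face leaves boundary edges
```

A mesh that fails this check has holes or non-manifold edges, in which case volumetric biomarkers computed from it are not well defined.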

Clinical metrics
Next, we evaluate the ability of our cardiac surface reconstruction method to generate clinically plausible meshes on a population level. To this end, we select two population-wide studies of healthy cardiac anatomy and function and compare their results with ours in terms of multiple clinically established cardiac image-based biomarkers. Table 3 provides an overview of the two benchmark studies along with our proposed method.
We select the LV and RV volumes at both ED and ES phases as well as the LV myocardial mass as image-based biomarkers for the assessment of cardiac anatomy, while stroke volume (SV) and ejection fraction (EF) are used to quantify cardiac function for both the LV and RV. We calculate these metrics for all cases of our UK Biobank dataset using both the modified Simpson's rule on the 2D slice segmentations and the direct calculation from our reconstructed 3D meshes. The results are shown in Table 4, along with the corresponding values reported in the benchmark studies of Petersen et al. (2017) and Bai et al. (2015). We split the scores by sex to analyze whether subpopulation-specific differences are accurately reflected in our method's reconstructions, providing additional validation of our proposed pipeline. We note that, while the analysis of Petersen et al. (2017) is also based on the UK Biobank study, we use a different subset of cases in this work. We observe that our 3D reconstruction pipeline achieves plausible scores for all analyzed metrics and is able to accurately capture sex-related differences. This is shown by the higher left and right ventricular volumes reported for male cases compared to the female ones, which is also present in all three benchmark studies. Comparing our 3D mesh-based approach with the two 2D slice-based calculation methods (Simpson's rule and Petersen et al. (2017)), we find similar values for left ventricular volume (LV end-diastolic volume, LVEDV; LV end-systolic volume, LVESV) and function (LV stroke volume, LVSV; LV ejection fraction, LVEF) metrics, but larger values for LV mass and right ventricular volumetric metrics (RV end-diastolic volume, RVEDV; RV end-systolic volume, RVESV; RV stroke volume, RVSV). At the same time, our pipeline's scores are lower than those of the other 3D mesh-based approach by Bai et al. (2015) in three out of the four available metrics and comparable for LVESV. The comparative analysis shows similar trends for both sexes with slightly larger differences for male cases.
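The two calculation routes compared above can be sketched with standard textbook definitions. The code is an illustration under stated assumptions and not the authors' implementation: the mesh volume uses the signed-tetrahedron (divergence theorem) formula for a closed, consistently oriented triangle mesh, the slice-based volume is the plain sum of segmented areas times slice spacing, and the myocardial density of 1.05 g/mL is a commonly used value that is assumed here. All function names and numbers are hypothetical.

```python
import numpy as np

MYOCARDIAL_DENSITY_G_PER_ML = 1.05  # commonly used value, assumed for this sketch


def mesh_volume(vertices, faces):
    """Volume enclosed by a closed, consistently oriented triangle mesh."""
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    v0, v1, v2 = v[f[:, 0]], v[f[:, 1]], v[f[:, 2]]
    # Sum of signed tetrahedron volumes spanned by each face and the origin.
    signed = np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum() / 6.0
    return abs(float(signed))


def slice_summation_volume(slice_areas_mm2, slice_spacing_mm):
    """Slice-based volume: sum of segmented areas times the slice spacing (in mm^3)."""
    return float(np.sum(slice_areas_mm2) * slice_spacing_mm)


def stroke_volume(edv, esv):
    return edv - esv


def ejection_fraction(edv, esv):
    """Ejection fraction in percent."""
    return 100.0 * (edv - esv) / edv


def myocardial_mass(epi_volume_ml, endo_volume_ml):
    """Mass in grams from epicardial and endocardial volumes in mL."""
    return MYOCARDIAL_DENSITY_G_PER_ML * (epi_volume_ml - endo_volume_ml)


# Sanity check on a unit tetrahedron with outward-oriented faces (volume 1/6).
verts = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
tetra_volume = mesh_volume(verts, faces)

# Illustrative volumes in mL (not taken from the paper).
sv = stroke_volume(150.0, 60.0)      # 90.0 mL
ef = ejection_fraction(150.0, 60.0)  # 60.0 %
```

Because the mesh-based volume integrates over the whole reconstructed surface while the slice-based volume only covers the acquired slices, the two routes can differ near the base and apex even for the same anatomy.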
In order to further analyze the ability of our method to take into account subpopulation-specific differences in its reconstruction task, we also calculate the same clinical metrics for three different age groups. The results of our 3D mesh-based calculations, the 2D slice-based approach using modified Simpson's rule, and the corresponding values reported by Petersen et al. (2017) are reported for each of the three age groups in Table 5. We only show the scores for female cases since the observed trends are similar for both sexes. The corresponding table for male cases can be found in the supplementary material.
Similar to the sex-specific results, we find generally plausible scores for our 3D reconstructions and comparable trends between our 3D and the two 2D-based calculations, with LV mass and RV metrics showing higher values and the remaining metrics similar values. Our method is able to successfully capture clinically established age-related changes for all metrics. Examples include the decline in left and right ventricular volume at both ED and ES with increasing age and the consistent EF values across all age groups. In the former case, both our 3D and 2D-based calculations show decreases for both older age groups, while Petersen et al. (2017) report small increases for the oldest age group compared to the medium one.

Robustness analysis
To further validate the accuracy of our proposed reconstruction method on the UK Biobank dataset, we investigate its robustness to various common outlier conditions. In this regard, the image segmentation step of our pipeline is of considerable importance as it affects all downstream tasks, including the 3D surface reconstruction step with the PCCN. While the segmentation performance of modern deep learning approaches has generally been shown to be on par with human experts on a population level for healthy cases (Bai et al., 2018), individual cases or slices often still result in erroneous outputs. These include the breakage of the LV myocardium in the apical region of the heart, the erroneous inclusion of papillary muscles in the myocardial region, anatomically incorrect segmentation of the basal plane slices, or the complete failure of the [...]
In order to investigate the effects of such errors on the 3D surface reconstruction ability of the PCCN, we first select various UK Biobank cases that suffer from either myocardial breakage or erroneous segmentation of papillary muscles in the predicted segmentation masks. We then compare the affected regions in the sparse, misaligned input point clouds and the dense output point clouds reconstructed by the PCCN. The results are depicted for two sample cases of the UKB dataset in Fig. 10. We see that in both cases the PCCN is able to correct the myocardial breakage at the apex and reconstruct a smooth, continuous myocardium at the affected region. The bottom case in Fig. 10 also depicts an erroneous segmentation of the papillary muscles, which are included in the myocardial region. This results in an inward bulging myocardium in the left mid-cavity region of the sparse input point cloud. However, similar to the myocardial breakage, the PCCN has successfully removed it from the dense output point cloud. We find this corrective ability of the PCCN present in all UK Biobank cases where either myocardial breakage or wrong papillary muscle segmentation occurs. In addition, the myocardial thickness in both reconstructed point clouds is smaller than suggested by the 2ch LAX view alone, but larger than the spatially corresponding information in the SAX slices. This shows that the PCCN is able to utilize the available data from multiple views and select the best trade-off between the available information as the final output.

Discussion
We have developed and successfully validated a fully automatic 4-step pipeline for cardiac surface reconstruction from raw cine MR images. The PCCN as the main step of the pipeline is able to solve both the sparsity and misalignment issues in a single model, while retaining both class-specific information of the different cardiac substructures and cardiac phases (ED or ES). Its architecture is specifically designed for direct and effective point cloud processing. On the one hand, this enables a more memory-efficient data storage and the usage of higher resolutions to represent anatomical surfaces, which is beneficial for many downstream tasks (Beetz et al., 2021b,c; Corral Acero et al., 2022; Di Folco et al., 2022). On the other hand, the fact that only the surface level information is processed by the network facilitates the reconstruction task and ultimately leads to better performance than inefficient grid-based CNNs, which require considerably larger amounts of memory to store the same 3D surface data and force the network to manage the additional difficulty of processing highly sparse data. No postprocessing step needs to be applied to our reconstructed point clouds, making their application easier than voxel grid-based approaches, which often require further processing (e.g. selection of largest connected component) (Xu et al., 2019). While we develop the PCCN for three classes and two cardiac phases in this work, the network design can easily be extended to additional cardiac substructures or cardiac phases.
The point cloud-based deep learning approach also allows a straightforward and effective integration of both SAX and LAX information, which is crucial for an accurate 3D surface reconstruction, especially in information-sparse regions between slices or in the apical and basal areas of the heart. This in turn is of high importance for many downstream tasks, such as the accurate measurement of longitudinal strain, which would be considerably more noisy when based only on SAX information. In addition, the PCCN can also be applied over manually delineated contours through a graphical interface (Banerjee et al., 2021a), providing flexibility to the first step of the reconstruction pipeline. Furthermore, the PCCN does not require any landmark detection, point-to-point correspondence or registration between the input and output point clouds for training, does not need any specific normalization to be applied to the input point clouds, and also does not rely on any template shapes, as opposed to many deformation-based reconstruction approaches (e.g. Lamata et al. (2014)).
Our PCCN achieves mean Chamfer distances between the reconstructed and gold standard point clouds that are below or similar to the underlying image resolution for all tested misalignment levels, cardiac substructures, and cardiac phases. This demonstrates that the PCCN is able to reconstruct a large variety of cardiac shapes that differ both spatially and temporally with high accuracy on both a local and global level. This is facilitated by the design of the PCCN decoder with both a coarse and dense output point cloud attending to information at different scales. The low standard deviation values of the Chamfer distances show that this high reconstruction quality is consistently obtained throughout the dataset, indicating a high robustness of the network against outlier cases. Once trained, the PCCN also offers considerable speed advantages compared to traditional non-deep-learning-based reconstruction techniques (Lamata et al., 2014; Villard et al., 2018a; Banerjee et al., 2021a), making it particularly advantageous for large-scale data processing. The combined multi-class anatomy processing is especially beneficial in this regard as it avoids the need for separate reconstruction processes to be run for each cardiac substructure.
We observe that the PCCN pre-trained on the 3D MRI-based SSM dataset can be successfully applied to the UKB dataset in a cross-domain transfer setting as part of the full reconstruction pipeline. This indicates that both the shape deformations and virtual slice planes selected during the creation of the SSM dataset are a realistic representation of real-world conditions. Furthermore, we did not observe any major negative bias or smoothing effects in the reconstructed shapes, which showcases the suitability of the PCCN for cross-domain applications. As expected, we observe larger reconstruction errors for larger amounts of introduced misalignment, which reflects the more difficult task. Male hearts generally show larger Chamfer distances than female ones across all misalignment levels and substructures. We believe this to be primarily a consequence of using the same point cloud resolution to represent the larger male hearts. This results in typically larger spatial distances between individual points even in case of similarly high reconstruction quality, an effect which is no longer present once the values are normalized by heart size. We find that the PCCN pre-trained on mildly misaligned SSM data achieves the best performance on the UKB dataset. This is somewhat surprising as the medium level was originally selected to reflect the average misalignment of typical acquisitions as in the UKB study. We hypothesize that on the one hand, the UK Biobank cohort could suffer from smaller amounts of misalignment than comparable studies due to its usage of a coherent acquisition protocol or the selection of relatively healthy volunteers. On the other hand, the small misalignment amounts in the SSM dataset might also act as a regularizer during network training, which in turn helps the PCCN's generalization ability to the new UKB domain. It should also be noted that stronger misalignments are still present in the mildly misaligned SSM dataset, albeit to a lesser extent. Finally, the selected misalignment amounts for each level are only chosen as an approximation derived from literature and could therefore also exhibit some degree of error. However, since the Chamfer distances show high reconstruction accuracy for all four PCCNs pre-trained on different misalignment levels, we conclude that a different choice in pre-training dataset would result in only a marginal performance drop. Furthermore, since there are no ground truth shapes available for the UK Biobank dataset, we base our evaluation on a comparison with a pseudo ground truth created by artificially introducing misalignment to selected real data. This likely results in a certain amount of noise in the pseudo ground truth and hence limits the accuracy of the obtained results. However, the misalignment was introduced in a way to approximate real conditions as closely as possible based on findings in prior work. Furthermore, we have also qualitatively assessed the quality of the artificial misalignment with a comparison to the true misalignment in the UK Biobank cases to ensure a high degree of realism. While the manual creation of a potentially more accurate ground truth is a possibility, this would also introduce a degree of subjectivity into the gold standard and significantly complicate the application to larger datasets.
Using the mild misalignment PCCN, we observe a high degree of alignment between the clinical metrics calculated directly from our 3D reconstructed meshes and the respective benchmark methods. This shows that both cardiac anatomy and function are accurately represented in the reconstructions while successfully taking into account the differences in subpopulations (sex, age), cardiac structures, and phases on a real-world dataset. It also further corroborates the accuracy of our pre-training and cross-domain transfer steps. Furthermore, it provides evidence of the effectiveness of our proposed meshing procedure, as a topologically correct two-manifold mesh is required for accurately calculating volumetric biomarkers. While no such topological correctness was achieved for some cases with the current approach, additional fine-tuning of the relevant hyperparameters and pipeline would likely further improve the quality of the resulting meshes.
The most noticeable differences in clinical metrics between the 3D and 2D-based calculations are found in the larger values obtained for the LV myocardial mass and RV volumetric metrics. The latter is an expected outcome that we believe to be a consequence of the general RV mesh shape in the original SSM by Bai et al. (2015) that we used to derive our SSM dataset and pre-train our PCCN. In the SSM, the RV extends considerably above the basal SAX plane, which leads to higher 3D volumes compared to a 2D-based calculation where the disk around the basal plane position serves as the boundary for calculating the respective volumes. These larger RV volumes are reflected in the reported scores in Tables 4 and 5. This explanation is further corroborated by the RVEF values, which show high similarity with the 2D-based approaches because RVEF is a relative metric that normalizes out raw size differences in volumes. We also note that the three comparative benchmarks rely on manual (Petersen et al., 2017), semi-automatic (Bai et al., 2015), and fully automatic (Simpson's rule applied to our UK Biobank dataset) approaches, respectively, to obtain the image segmentations required for their biomarker calculations. This further corroborates the good performance of our method, as its reconstructions exhibit clinical metrics similar to those of multiple ground truth benchmarks derived in different ways.
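The argument that the ejection fraction is insensitive to raw size differences can be made explicit with a short derivation. It relies on a simplifying assumption that is not stated in the text, namely that the 3D-based calculation scales both the ED and ES volumes by the same factor k > 1:

```latex
\mathrm{EF} = \frac{\mathrm{EDV} - \mathrm{ESV}}{\mathrm{EDV}}, \qquad
\mathrm{EF}' = \frac{k\,\mathrm{EDV} - k\,\mathrm{ESV}}{k\,\mathrm{EDV}} = \mathrm{EF}, \qquad
\mathrm{SV}' = k\,\mathrm{EDV} - k\,\mathrm{ESV} = k\,\mathrm{SV}.
```

Under this assumption, the absolute metrics (EDV, ESV, SV) grow with k while EF is unchanged, which is consistent with the larger RV volumes and similar RVEF values discussed above. If the additional volume were instead a constant offset at both phases, EF would decrease slightly, so the invariance only holds approximately in practice.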
Finally, we find that the PCCN is able to successfully correct common errors in the segmentation contours of the precursor task by providing continuous and smooth myocardium boundaries with appropriate thickness even in cases of myocardial breakage or erroneous inclusion of the papillary muscles. This indicates that the PCCN is capable of implicitly learning an accurate anatomical prior during training, which in turn allows it to automatically adjust anatomical inconsistencies.

Conclusion
We have developed a novel multi-class Point Cloud Completion Network capable of reconstructing 3D biventricular surface anatomies from sparse and misaligned cine MRI contours with high accuracy, while taking both temporal and spatial differences in the underlying cardiac substructures into account. We have also shown that the PCCN trained on a synthetic 3D MRI-based dataset can be successfully applied as the key component of a multi-step 3D cardiac surface reconstruction pipeline from raw 2D cine MRI acquisitions of the UK Biobank dataset in a cross-domain transfer setting. Finally, we have thoroughly evaluated both the PCCN and the complete 4-step pipeline on two different datasets and found very high reconstruction accuracy and robustness in terms of a variety of both geometric and clinical metrics. In future work, we plan to investigate the possibility of further architectural improvements, for example by using point cloud-based attention mechanisms, and to extend the presented method to other cardiac substructures and the full cardiac cycle. We also plan to evaluate the cardiac reconstruction performance over varying cardiac pathologies in the near future.

• Tarroni et al. (2020): "Inter-slice misalignment (Fig. 5, left) had a median value of 2.29 mm and an interquantile range (IQR) of 1.17 mm."
• Chang and Jung (2020): "The ranges of the translation in the x, y and z directions were ±1.9 mm, ±3.6 mm, and ±12.2 mm respectively, and the ranges of the rotations in the x, y and z directions were ±0.8 degrees, ±3.2 degrees, and ±0.4 degrees respectively."
• Shechter et al. (2004) (based on Free Breathing Angiogram): "For all patients, the heart translated caudally (mean, 4.9 ± 1.9 mm; range, 2.4 to 8.0 mm) and underwent a cranio-dorsal rotation (mean, 1.5° ± 0.9°; range, 0.2° to 3.5°) during inspiration. In eight patients, the heart also translated anteriorly (mean, 1.3 ± 1.8 mm; range, -0.4 to 5.1 mm) and rotated in a caudo-dextral direction (mean, 1.2° ± 1.3°; range, −1.9° to 3.2°)."
• McLeish et al. (2002) (based on difference between maximum inhale and maximum exhale): "[...] typical deformations were 3-4 mm with deformations of up to 7 mm observed in some subjects."
• Villard et al. (2016): "Table 1 shows the mean, median, and standard deviation resulting from contour to contour distance calculations before the alignment [...]: Median = 2.19; Mean = 2.82; Std = 2.48".

Shechter et al. (2004) reported values for respiratory motion artifact in free breathing angiograms. This is different from the cine MRI acquisitions with breath holds used in this work. McLeish et al. (2002) measured motion values between maximum exhale and inhale positions: "The Volunteer Results (V) Show the Movement Between Maximum Exhale and Maximum Inhale", "The patients were asked to hold their breath at the normal end-expiratory and the normal end-inspiratory positions". Consequently, the reported values are likely considerably larger than what would typically be expected for a standard cine MRI acquisition with breath hold.

Appendix C. Analysis of differences between 3D and 2D-based metrics calculations
In order to analyze the observed LV myocardial mass differences between the 3D and 2D-based calculations in greater detail, we select four UK Biobank cases with particularly large and small difference values and visualize them in Fig. C.11. The differences are high for cases with high amounts of misalignment and low for cases with very little slice misalignment.
We presume that these results are caused by the usage of misalignment correction in the calculations based on 3D reconstructions which is missing in the 2D-based approach and therefore leads to particularly large differences when stronger misalignment is present.

Figure 1: Overview of our proposed 3D cardiac surface reconstruction pipeline from cine MR images. (a) We train three separate CNNs to segment SAX, 4ch LAX, and 2ch LAX cine MR images. (b) We train a Point Cloud Completion Network on a 3D MRI-based dataset to reconstruct a dense 3D point cloud with corrected misalignment from a sparse, misaligned input point cloud. (c) We propose a 4-step pipeline to reconstruct 3D multi-class cardiac meshes from raw cine MRI acquisitions using the pre-trained networks (a,b) in steps 1 and 3 of the pipeline.
[...] 3.3.1), loss function and training procedure of the PCCN (Sec. 3.3.2), including the generation process of a synthetic biventricular anatomy dataset (Sec. 3.3.3) for network training and an initial validation.

Figure 2: Architecture of the proposed point cloud completion network. The input is a 3D point cloud, which represents the sparse and misaligned cine MRI acquisitions as an n × 4 tensor where n refers to the number of points and 4 to the spatial x, y, z coordinates with a class label for each point. The network is tasked to reconstruct both a coarse, low-density point cloud to capture the global surface structure and a dense, high-resolution point cloud to accurately represent the cardiac anatomy on both a local and global level and serve as the final network prediction. The three anatomical substructures are encoded as separate sets of x, y, z point coordinates in each of the output point clouds. Accordingly, the output dimensionality is m × 3 × 3 for the coarse point cloud and p × 3 × 3 for the dense point cloud where m and p refer to the respective number of points.

Figure 3: Overview of the synthetic dataset generation from a 3D MRI-based statistical shape model.
[...] of normal distributions with zero mean for each level of misalignment. [...] each of the 1000 point clouds. We note that no individual correspondence between generated ED and ES shapes is present in the dataset based on the available SSM data.

Figure 4: Qualitative reconstruction results of an ED sample case for each of the five levels of misalignment from the SSM dataset.

Figure 6: Boxplots presenting Chamfer distances between ground truth and reconstruction results of our method on five different SSM datasets with increasing levels of slice misalignment.

Figure 7: Qualitative reconstruction results on two sample cases of the SSM dataset for the Point Cloud Completion Network and a 3D U-Net.

Figure 8: Boxplots showing the reconstruction performance of networks trained on SSM datasets with different misalignment levels (in columns) and applied to UKB data with different misalignment levels (in rows).

Figure 9: Qualitative reconstruction results for two sample cases of the UK Biobank dataset.

Figure 10: Two sample cases with myocardial breakage in the apical region of the 2ch LAX segmentation mask.
Figure C.11: Sample cases at ES with largest (a) and smallest (b) differences between 2D and 3D-based calculations of clinical metrics.

Table 1: Misalignment amounts per severity level.

Table 2: Comparison of cardiac mesh reconstruction methods using the SSM dataset with medium misalignment. The distance scores are averaged across the three cardiac substructures.

Table 3: Dataset comparison of the proposed and benchmark studies.

Table 4: Clinical metrics of female and male cases reported by different studies.

Table 5: Clinical metrics of female cases split by age group as reported by different studies.

Table A.6: Clinical metrics of male cases split by age group and calculated using different methods. Values represent mean ± standard deviation.
Table A.7: Chamfer distances between dense ground truth point clouds and corresponding sparse and misaligned input contours in different SSM datasets with increasing levels of slice misalignment. Values represent mean ± standard deviation.

[...] cardiovascular imaging and briefly discuss relevant differences with the setup in this work. The key statements regarding misalignment values in the referenced sources are as follows:
• Xu et al. (2019): "To mimic the misalignment caused by motion artifacts, we kept fixed image planes and applied 3D rigid transformations to the model (random rotations no larger than 10° and random translations of no more than 4 mm) [...]."