Neural network-based left ventricle geometry prediction from CMR images with application in biomechanics.

Combining biomechanical modelling of left ventricular (LV) function and dysfunction with cardiac magnetic resonance (CMR) imaging has the potential to improve patient-specific prognosis of cardiovascular disease risk. Biomechanical studies of LV function in three dimensions usually rely on a computerized representation of the LV geometry based on finite element discretization, which is essential for numerically simulating in vivo cardiac dynamics. Detailed knowledge of the LV geometry is also relevant for various other clinical applications, such as assessing the LV cavity volume and wall thickness. Accurately and automatically reconstructing personalized LV geometries from conventional CMR images with minimal manual intervention is still a challenging task, yet it is a prerequisite for any subsequent automated biomechanical analysis. We propose a deep learning-based automatic pipeline for predicting the three-dimensional LV geometry directly from routinely available CMR cine images, without the need to manually annotate the ventricular wall. Our framework takes advantage of a low-dimensional representation of the high-dimensional LV geometry based on principal component analysis. We analyze how the inference of myocardial passive stiffness is affected by using our automatically generated LV geometries instead of manually generated ones. These insights will inform the development of statistical emulators of LV dynamics that avoid computationally expensive biomechanical simulations. Our proposed framework enables accurate LV geometry reconstruction, outperforming previous approaches by delivering a reconstruction error 50% lower than reported in the literature. We further demonstrate that for a nonlinear cardiac mechanics model, using our reconstructed LV geometries instead of manually extracted ones only moderately affects the inference of passive myocardial stiffness described by an anisotropic hyperelastic constitutive law.
The developed methodological framework has the potential to make an important step towards personalized medicine by eliminating the need for time-consuming and costly manual operations. In addition, our method automatically maps the CMR scan into a low-dimensional representation of the LV geometry, which constitutes an important stepping stone towards the development of an LV geometry-heterogeneous emulator.


Background
Computational studies of left ventricular (LV) mechanics, when integrated with cardiac magnetic resonance (CMR) imaging, can lead to a better understanding of LV dysfunction [1,2,3]. For instance, biomechanical parameters that describe LV function provide new insights into the heart's pump function, related e.g. to myocardial stiffness or contractility [4,5]. Biomechanical studies of LV mechanics typically rely on a discrete representation of the LV geometry [6,3]. This discretized LV geometry is necessary as the cardiac mechanics equations admit no closed-form solution and have to be solved numerically, e.g. using the finite element method [7]. Moreover, such a geometry itself has direct clinical applications, as it can be used to derive various cardio-physiological quantities of interest (e.g. the LV cavity volume and local wall thickness), and provide a realistic 3D shape visualization to clinicians. Hence, obtaining an accurate ventricular geometry is an important diagnostic task, especially for personalized health care [8].
Despite the significance of LV geometry reconstruction in clinical applications, so far the literature on how to make this process fast, reliable or automatic has been limited. In general, a typical procedure of LV geometry reconstruction from in vivo CMR data involves the following four steps: 1) segmentation (either manual or automatic) of the LV wall in selected CMR images of a given subject; 2) stacking segmented LV wall boundaries in a selected 3D coordinate system with necessary motion correction [9]; 3) 3D LV geometry reconstruction either through surface fitting, or direct 3D shape generation; 4) generation of a discrete representation of the LV geometry, such as a finite element mesh. This procedure requires specialist knowledge, and it is time-consuming and prone to human error, which prohibits its wide adoption in the clinic. More recent methods for cardiac geometry reconstruction include manual iterative interventions for reconstruction [10], and warping an idealized ventricular geometry, e.g. an ellipsoid, into patient data [11]. These methods, however, require a separate (manual) segmentation step. In addition, few studies have examined how using geometries reconstructed with such techniques, instead of manually reconstructed geometries, affects simulations of LV biomechanics. In particular, reconstructions based on parametric LV geometry representations, like ellipsoids, are likely to result in a systematic bias, which could have potentially severe consequences for subsequent biomechanical analysis.
These difficulties can potentially be addressed with modern machine learning, which has been successfully applied to challenging problems across numerous domains, and its application in medical contexts has the potential to lead to long-lasting advances in healthcare [12]. Specifically, convolutional neural networks (CNNs) have proven valuable in various tasks related to cardiac image analysis, such as automated segmentation of CMR scans [13], survival prediction based on sequences of CMR scans [8], or 3D bi-ventricular segmentation from CMR images [14]. Most of the previous studies have focused primarily on image segmentation tasks [12], while comparatively fewer studies have aimed to learn the ventricular geometry from cardiac images. For example, Bai et al. [13] considered only ventricular wall and cavity segmentation. In Bello's study [8], the ventricular geometries were not learned directly from CMR images, but obtained via a non-rigid registration approach by mapping each patient's data into a template geometry.
The current paper differs from previous approaches, as we go directly from CMR images to the LV geometry by learning its low-dimensional representation. The same idea was initially proposed in [15], where the authors developed a one-stage approach, with a single CNN predicting the LV geometry from CMR images. In this study we build on the approach in [15] by substantially extending the underlying CNN methodological framework. In particular, we propose a two-stage approach, designating CMR image segmentation and geometry reconstruction to two specialist networks, respectively.
The main obstacle for translation and impact of state-of-the-art cardiac mechanics models in the clinic is the need for patient-specific model calibration and parameter estimation [4]. This calibration is based on an iterative optimization procedure. Since a separate numerical solution of the underlying cardiac mechanics equations with the finite element method is required in each step of this iteration, the computational costs are exorbitant, making clinical decision support in real time elusive [3]. There is currently substantial interest in reducing these computational costs by building a statistical surrogate model or emulator, which would dispense with the need for any finite element simulations. However, an emulator requires a low-dimensional representation of a patient's LV geometry as a functional input, and its manual reconstruction is itself a slow process. For this reason, a method that extracts such a low-dimensional representation in a fully automated way directly from CMR images would have the potential for paradigm-shifting real-time cardiac clinical decision support.
In the present work, we have developed an automatic pipeline based on machine learning techniques to learn the LV geometry directly from CMR images, without requiring manual segmentation and geometry reconstruction. Specifically, we train a convolutional neural network (CNN) to predict the LV geometry by learning its principal component representation directly from CMR scans via automatic pixel labelling. Our approach delivers two outputs simultaneously: LV wall segmentation and a low-dimensional representation of the LV geometry. We investigate the consequences that the automatic generation of LV geometries has on estimating passive myocardial stiffness and compare the results with those obtained when using manually generated geometries, as in [16]. Figure 1 presents an overview of the proposed framework. Our main contribution in the present paper is the development of a methodological framework for a fully automated pipeline that provides an automatic extraction of the LV geometry directly from CMR scans.

Data
In this section we first describe the protocols used to collect the data, including in vivo CMR imaging protocols for both healthy volunteers and patients with acute myocardial infarction. Next, we discuss how the original CMR scans were used to prepare the data for our analysis, consisting of annotated images, LV geometries and corresponding computational finite element meshes. Finally, we summarize the ground truth data used in this study.

Figure 1 An overview of the proposed framework: learning 3D LV geometries automatically from CMR images based on a convolutional neural network (CNN). Important applications of the outputted LV meshes are statistical emulators (thanks to the CNN-predicted low-dimensional representation of the LV geometry) and parameter inference in cardiac mechanics models.

Study population and in vivo imaging
The study population consists of 182 subjects; 64 healthy volunteers (HVs) and 118 myocardial infarction (MI) patients. The study was approved by the National Research Ethics Service, and all participants provided written informed consent. All methods, including CMR imaging, were performed in accordance with the relevant guidelines and regulations. Healthy volunteers with no prior history of cardiovascular disease were enrolled for CMR imaging. A 12-lead electrocardiogram was obtained in all subjects and a normal ECG was an eligibility requirement. Other exclusion criteria included standard contraindications to MRI, such as metallic implants and metallic foreign bodies. Demographics of all healthy volunteers can be found in [17]. The MI patients were selected from a larger population of patients with acute ST-elevation MI (STEMI), obtained within a prospective, observational cohort MR-MI study carried out between 14 July 2011 and 22 November 2012 funded by the British Heart Foundation (ClinicalTrials.gov identifier: NCT02072850). Three hundred and forty-three patients with acute STEMI were eligible for enrolment in this MR-MI study if they had an indication for emergency percutaneous coronary intervention (PCI) due to a history of symptoms consistent with acute MI. Exclusion criteria represented standard contraindications to MRI, such as a pacemaker and an estimated glomerular filtration rate less than 30 ml/min/1.73 m². The CMR study of MI patients involved CMR scans at 2.2 ± 1.9 days (the acute state) and 6 months post-MI. Acute STEMI management followed contemporary guidelines. In this study, only the CMR scans at the acute state were chosen.
The CMR imaging protocol involved steady-state free precession cine imaging, which was used for LV structural and functional assessment. The short-axis cine stack of the left ventricle from the base to the apex was acquired with 7 mm thick slices and a 3 mm inter-slice gap. Typical imaging parameters were: matrix = 180×256, flip angle = 80°, TR = 3.3 ms, TE = 1.2 ms, bandwidth = 930 Hz/pixel, and voxel size = 1.3×1.3×7 mm³. Standard cine images were also acquired in the LV inflow and outflow tract (LVOT), the horizontal long-axis (HLA), and the vertical long-axis (VLA) planes. In the STEMI group, typical imaging parameters were: matrix = 192×256, flip angle = 25°, TE = 3.36 ms, bandwidth = 130 Hz/pixel, echo spacing = 8.7 ms and trigger pulse = 2. The voxel size was 1.8×1.3×8 mm³. The CMR methods and analyses have been previously described in detail in [18].
Non-automated (state-of-the-art) ventricular geometry reconstruction
Below, we describe a non-automated method for ventricular geometry reconstruction, which represents the current state-of-the-art and serves as a benchmark for the automated procedures proposed in the present study. As in the previous study in [4], six short-axis (SA) and three long-axis (LA) cine images were chosen for each subject in order to construct the 3D LV model at early-diastole (when the LV pressure is at its lowest), which is used for further biomechanical analysis. The LV wall boundaries were manually segmented at each imaging plane using in-house Matlab code, and the short-axis LV wall boundaries were further aligned to the boundaries in the corresponding HLA, LVOT, VLA images, respectively. Details of this manual procedure can be found in [4]. Ground-truth data of the segmented LV wall in SA and LA images were then generated based on manual segmentation, with pixels labelled 2 for the LV cavity, 1 for the myocardium and 0 for the background.
A prolate spheroidal coordinate system was used to reconstruct the LV geometry after the manual segmentation and alignment as in [19,16]. In detail, for a point with Cartesian coordinates (x, y, z), the corresponding spheroidal coordinates (u, v, w) are

x = α sinh(w) cos(u) cos(v), y = α sinh(w) cos(u) sin(v), z = α cosh(w) sin(u),

in which α is a scaling factor, u ∈ [−π/2, π/2), v ∈ [0, 2π), and w ∈ (0, +∞). After aligning the most-basal plane to the z = 0 plane, we have u = 0 at the basal plane and u = −π/2 at the apex. By assuming w to be a cubic B-spline function of u and v, we can fit the endocardial and epicardial surfaces separately using the segmented boundaries from the 6 SA and 3 LA cine images, respectively. For details of this surface fitting procedure, the reader is directed to [19]. The LV geometry is the region enclosed by the fitted endocardial and epicardial surfaces, each of which is represented by 2865 quadrilateral patches. In total, 5792 vertices are used for representing one LV geometry. In the Cartesian coordinate system, each vertex has three components, thus the LV geometry lies in a 17376-dimensional space (denoted "17k").
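The coordinate mapping can be sketched in code as follows (a minimal sketch consistent with the definitions above; the function name is hypothetical):

```python
import numpy as np

def prolate_to_cartesian(u, v, w, alpha=1.0):
    """Map prolate spheroidal coordinates (u, v, w) to Cartesian (x, y, z).

    u: latitude-like angle in [-pi/2, pi/2); u = 0 at the base, -pi/2 at the apex
    v: circumferential angle in [0, 2*pi)
    w: radial-like coordinate in (0, +inf)
    alpha: scaling factor
    """
    x = alpha * np.sinh(w) * np.cos(u) * np.cos(v)
    y = alpha * np.sinh(w) * np.cos(u) * np.sin(v)
    z = alpha * np.cosh(w) * np.sin(u)
    return x, y, z
```

Note that u = 0 yields z = 0 (the basal plane) and u = −π/2 collapses x and y to zero with z < 0 (the apex), as required by the alignment described above.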

Data for inference in biomechanical models
Biomechanical meshes
A 3D biomechanical cardiac model requires a volumetric finite element discretization of the LV geometry [3]. To generate it, we first divide the wall thickness between the endocardial and epicardial surfaces into 10 equal divisions, which means that for the ith division across the wall we have

x_i = x_endo + (i/10)(x_epi − x_endo), i = 0, 1, . . . , 10,

where x_endo and x_epi are corresponding points on the endocardial and epicardial surfaces. In the same way, equal divisions along the circumferential and longitudinal directions are generated. Finally, a layered hexahedral mesh is generated suitable for finite element simulations of LV dynamics. The left panel in Figure 2 presents segmented LV boundaries from in vivo CMR cine images, and the right panel shows the corresponding computational finite element mesh.

CMR-derived volumes and strains
In order to infer myocardial passive stiffness from in vivo CMR scans, we measured the end-diastolic LV cavity volume and 24 segmental circumferential strains directly from cine CMR images, similarly as in [20,4]. The segments are defined in four short-axis CMR cine images as specified by the American Heart Association [21].
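The transmural layering step can be sketched as a linear interpolation between matched endocardial and epicardial vertices (a minimal sketch; array names are hypothetical, and the full pipeline also subdivides circumferentially and longitudinally):

```python
import numpy as np

def transmural_layers(endo, epi, n_layers=10):
    """Interpolate n_layers + 1 vertex sheets between matched surfaces.

    endo, epi: (M, 3) arrays of corresponding surface vertices.
    Returns an (n_layers + 1, M, 3) array; sheet 0 is the endocardium,
    sheet n_layers the epicardium.
    """
    fractions = np.linspace(0.0, 1.0, n_layers + 1)  # i / n_layers
    return endo[None] + fractions[:, None, None] * (epi - endo)[None]
```

Connecting corresponding vertices of consecutive sheets then yields the hexahedral cells of the layered mesh.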

Methodology overview
Our primary aim is to develop an automatic method based on convolutional neural networks (CNNs) that reads in cine CMR images and predicts the 3D LV geometry, represented by a cloud of points. This cloud of points can then be used to generate the LV mesh for subsequent cardiac biomechanical modelling. Our approach is based on two cornerstones: using PCA for dimensionality reduction of high-dimensional LV geometries and separating the geometry reconstruction task from CMR image segmentation in a two-stage framework. Learning a high-dimensional, high-resolution representation of the LV geometry directly from cine CMR images is a challenging problem, owing to the high-dimensional nature of both the input and the output domains, which risks serious overfitting. Cutting-edge (deep) machine learning methods are "data hungry" and require dataset sizes much larger than typically available in cardiac studies (see e.g. the discussion in [22]). For datasets of realistic size, rigorous regularization is needed to prevent overfitting. As we have shown in [23], this loses the non-linear model flexibility that makes (deep) neural networks so powerful in the first place and reduces their predictive performance to that of simple statistical linear predictors.
To address these difficulties, our first cornerstone is to carry out PCA on a large population of LV geometries for dimension reduction, and to map all LV geometries into a low-dimensional space spanned by a few leading principal components. This substantially reduces the output dimension from the order of 17,000 to no more than 10 (the number of leading principal components). This pre-processing procedure is hard-coded into our CNN. The effect is that the weights on the connections feeding into the output layer of the CNN have been pre-trained and are kept fixed, which substantially reduces the network complexity. These fixed non-adaptable weights on the edges between the inner bottleneck layer and the output layer of the CNN constitute the PCA basis, where the weights represent the projection into the PCA domain, and the number of nodes in the bottleneck layer corresponds to the number of principal components. The effect of this procedure is a novel trade-off between reducing excessive model flexibility that could otherwise cause serious overfitting (by constraining the output weights based on PCA) while keeping enough adaptable degrees of freedom to maintain non-linear model flexibility (all other weights).
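The PCA basis that is hard-coded into the CNN output layer can be sketched as follows (a minimal NumPy sketch, assuming a plain SVD-based PCA; function names are hypothetical):

```python
import numpy as np

def fit_pca_decoder(geometries, n_components=8):
    """Fit a PCA basis on a (J, D) matrix of flattened LV geometries.

    Returns the mean geometry and the (n_components, D) basis that play
    the role of the fixed, non-trainable output-layer weights of the CNN.
    """
    mean = geometries.mean(axis=0)
    # SVD of the centred data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(geometries - mean, full_matrices=False)
    return mean, vt[:n_components]

def decode(scores, mean, basis):
    """Map low-dimensional PCA scores back to the full geometry vector."""
    return mean + scores @ basis

# Scores for a geometry g are obtained by projection: (g - mean) @ basis.T;
# decoding the scores gives the best rank-n_components approximation of g.
```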
The second cornerstone of our procedure is separating the segmentation and geometry reconstruction tasks. To this end we train two CNNs, a segmentation network (Section 2.4) and a geometry prediction network (Section 2.3). The segmentation network is based on the CNN developed in [13] for segmenting cine CMR images, i.e. assigning distinct labels to pixels (LV wall, LV cavity, and background). Next, the predictions from the segmentation network are used by the geometry prediction network to predict the 3D LV geometry. Below we first describe the latter network as it directly relates to our final object of interest, i.e. the predicted LV geometry. Next, we describe the former network, which provides inputs to the geometry prediction network. Each network is trained independently in a supervised manner on population-wide data and both networks contribute valuable complementary information. The advantage of our two-stage approach is that we do not need to reconstruct the high-dimensional LV geometries (i.e. 17k components in this study) directly from noisy images: this task is divided between specialist networks. In the Results section (Section 3.3) we show that the proposed two-stage approach achieves a considerable reduction of the LV geometry reconstruction error compared with an approach based on a single-task CNN predicting LV geometries from CMR scans, such as the one in [15]. Figure 3 presents the overview of the proposed approach.

Figure 3 Two-stage framework overview. Given the CMR images, pixel segmentation is predicted using the segmentation network. Next, the geometry prediction network predicts the 3D LV meshes, given the segmentations.
In addition, our framework delivers two extra outputs. Firstly, we obtain a low-dimensional representation of the high-dimensional LV geometry, which is important in the context of developing statistical emulators. Secondly, for the given dataset, we deliver an automatic segmentation of the LV wall in cine images, in particular in LA cine images, which has not been considered in [13].

Preparation: PCA and baselines
Our dataset of LV geometries consists of 182 observations, with each 3D LV geometry represented by a 17k-dimensional vector. As discussed above, learning to predict such a high-dimensional vector as a direct function of the CMR image from a relatively small training set is an ill-posed problem, leading to serious identifiability and overfitting problems. To address this difficulty, we follow [15] and learn a low-dimensional representation of the LV geometries using PCA, which substantially reduces the complexity of the LV geometry reconstruction problem. This approach also provides us with a low-dimensional LV geometry encoding, which is paramount for the development of statistical emulators. For evaluation of the reconstruction accuracy we consider two baselines: the results reported in [15] and the predictions obtained using the mean LV geometry. This mean geometry is the LV geometry obtained by taking the mean of coordinates along each of the three dimensions over our LV geometry dataset. More formally, let G*(j) = (g_1(j), . . . , g_M(j)) denote the jth GT LV geometry from our dataset, j = 1, . . . , J, where M is the total number of vertices describing the LV geometry, in our case M = 5792, and g_i(j) is a vector of 3D Cartesian coordinates of the ith vertex. Then the mean geometry is given as

Ḡ = (1/J) Σ_{j=1}^J G*(j).
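The mean-geometry baseline can be sketched directly from the stacked vertex coordinates (a minimal NumPy sketch; the per-subject error helper is our own illustrative addition, using the mean squared vertex-to-vertex distance):

```python
import numpy as np

def mean_geometry(geometries):
    """Vertex-wise mean over a (J, M, 3) array of LV geometries.

    Averaging each Cartesian coordinate over the J subjects yields a
    single (M, 3) template geometry used as a baseline predictor.
    """
    return geometries.mean(axis=0)

def baseline_error(geometries):
    """Per-subject mean squared vertex distance to the mean geometry."""
    mean = geometries.mean(axis=0)
    return np.mean(np.sum((geometries - mean) ** 2, axis=-1), axis=-1)
```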

Network architecture
The previous work dealing with predicting LV geometries directly from CMR scans [15] considered a CNN consisting of seven layers (five convolutional and two fully connected) taking CMR images (SA and LA views together) as inputs and outputting a four principal components encoding of the LV geometry. Our proposed network architecture (see Figure 4) differs from that in [15] in a number of aspects. The first and most fundamental difference is that in the present study we split the network into two branches; one each for the aligned SA and LA slices respectively. The reason for this split is the inherent difference between LA and SA views. Dividing their processing into two separate CNN branches is therefore a natural "divide-and-conquer" strategy that allows each branch of the CNN to focus on the particular features of the respective view.
To combine the predictions from both branches, we allow each sub-network to output eight PCA coefficients, which we then combine via element-wise addition. We have chosen eight PCA components using cross-validation, discussed later in Section 2.3.5. The added PCA coefficient layer is then followed by an eight-unit linear output layer. Our prior experimentation revealed that this approach performs better (in terms of the reconstruction error defined below and convergence speed) than an alternative network with an extra layer [1]. In the convolutional layers we use leaky ReLU (f(x) = max(0.2x, x)) activations.

Inputs and aligning LA slices
Another difference between our geometry prediction network and that in [15] is that we do not use the original CMR images as inputs but the corresponding segmented labelled images. The labelled image is encoded using 3 values: 2 for the LV cavity, 1 for the myocardium, 0 for the background; see the inputs in Figure 4. We then scale the input to the [0,1] range. In addition, we align the LA images so that the long-axis is horizontal. The long-axis is defined as the axis perpendicular to the most basal plane and passing through the gravity centre of the LV cavity at that plane (see the lower branch of the CNN in Figure 4). We find that this LA alignment enables better convergence and more accurate LV geometry reconstruction.

Figure 4 Geometry prediction network: CNN consisting of two identical branches, one for six short-axis (SA) slices and one for three long-axis (LA) slices. Each branch consists of five convolutional layers (Conv. 1-5) and a fully connected layer with leaky ReLU activations outputting eight PCA coefficients, which are then added together using the element-wise addition operator. Inputs: LV segmentations encoded 2 for the cavity, 1 for the myocardium and 0 for the background. Output: a left ventricular geometry represented by approx. 17 thousand values (17k).

LV geometry alignment -alternative GT targets
The LV geometries in our original dataset discussed in Section 2.3.3 are aligned such that the gravity centres of the LV cavity in the most-basal SA plane coincide. Here we introduce two additional datasets, derived from the original basal-aligned dataset that will serve as alternative GT targets for the geometry prediction network (see also Figure 7). The basic idea is to align the LV geometries in a way that will make it easier for the geometry prediction network to learn the target LV geometries (see Figure 4). To this end we consider a class of methods from statistical shape analysis called Procrustes analysis [24,25]. In particular we consider its two variants: ordinary Procrustes registration (OPR) and generalized Procrustes registration (GPR).
OPR uses the operations of rotation, scaling and translation to best match a discretized shape X to another shape Y. This is expressed as minimizing

‖Y − βXΓ − 1γᵀ‖²

with respect to the transformation parameters, where β and γ are the scaling and translation parameters respectively, Γ is the rotation matrix, and 1 is a vector of ones. X and Y are matrices of equal dimension, where each row gives the position vector of a point in the discretized geometry. In addition, both matrices are assumed to have centroid zero.
GPR expands on OPR to align a set of shapes X_1, . . . , X_N to a reference geometry X̄ as closely as possible. This is done by iteratively minimizing

Σ_{i=1}^N ‖β_i X_i Γ_i + 1γ_iᵀ − X̄‖²

with respect to the transformation parameters, where at each iteration the reference shape X̄ is the mean of the transformed shapes from the previous iteration. Again, the shapes are assumed to have centroid zero. This procedure is iterated until convergence.
Our first alternative dataset alignment is based on GPR where only translations of the LV geometries are considered. Under the standard Euclidean metric, the optimal translation corresponds to simply aligning each LV geometry to have a common centroid, which we set to the origin. We call this data set "centered-GT". The second alternative alignment is constructed using GPR where only translation and rotation operations are used, which we refer to as "rotated-GT". We do not unify the scaling of the LV geometries because we want to predict the size of each LV geometry explicitly. Note that the rotations applied to the LV geometries by GPR were small, as all the LV geometries in the original dataset share a common orientation. Any further rotation is to account for intrinsic shape variation in each specific LV geometry.
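The rotation-and-translation variant of Procrustes registration used for the "rotated-GT" alignment admits a closed-form solution via the singular value decomposition (a minimal sketch, assuming point-to-point correspondence between shapes; function names are hypothetical):

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Rotate the centred shape X to best match the centred shape Y.

    X, Y: (M, 3) matrices of corresponding points, assumed centroid-zero.
    Returns the rotation matrix Gamma minimizing ||Y - X @ Gamma||.
    """
    u, _, vt = np.linalg.svd(X.T @ Y)
    gamma = u @ vt
    # Guard against reflections: force det(Gamma) = +1.
    if np.linalg.det(gamma) < 0:
        u[:, -1] *= -1
        gamma = u @ vt
    return gamma

def align(X, Y):
    """Centre both shapes, then rotate X onto Y (no scaling)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return Xc @ procrustes_rotation(Xc, Yc)
```

Omitting the scaling step, as in the sketch above, matches the choice in the text of preserving the size of each LV geometry.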

Training
First, we train the geometry prediction network using the ground-truth (GT) segmentations as inputs, which we have described in Subsection 2.1.2. Later on, in Section 3.3, we will discuss the results of the two-stage framework in which the geometry prediction network is trained using the segmentations predicted by the segmentation network discussed in Section 2.4.
In the training we minimize the LV geometry reconstruction error, taken as the mean squared error (MSE) between the GT LV geometry and its reconstruction. Formally, using the notation introduced in Section 2.3.1, let G*(j) = (g_1(j), . . . , g_M(j)) denote the jth GT LV geometry from our dataset, j = 1, . . . , J. Furthermore, denote the corresponding predicted geometry Ĝ(j) = (ĝ_1(j), . . . , ĝ_M(j)). We then calculate the MSE between G*(j) and Ĝ(j) as

MSE(j) = (1/M) Σ_{i=1}^M ‖g_i(j) − ĝ_i(j)‖².

Finally, we consider the average MSE over the whole dataset of J LV geometries, (1/J) Σ_{j=1}^J MSE(j). We use the average MSE to train the geometry prediction network as well as to assess the reconstruction accuracy for different datasets, see Figure 7.
To prevent overfitting we use 14 fold cross-validation, i.e. we divide our dataset of 182 subjects into 14 folds, with 13 subjects each, with HV and MI patients randomly shuffled. We then train the network on 12 folds and use one of the remaining two folds for cross-validation (to be discussed below) and the other one for testing. Importantly, PCA is performed for each training split individually, without the test examples.
As the CMR images usually have different pixel spacing, we first unify them by applying a common pixel spacing so that all CMR images and corresponding labelled images are expressed in the same resolution. We then crop 64 × 64 pixel patches from the unified labelled images so that the LV cavity centre is in the centre of the crop. We then take 60 × 60 random crops from the 64 × 64 image patches for training, and the central crop of the same size for validation and testing. Taking random crops for training corresponds to a crop noise of ±2 pixels in each direction. Random cropping is a standard data augmentation technique in computer vision, aimed at preventing overfitting, see [26] for details.
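The cropping scheme can be sketched as follows (a minimal sketch; function names are hypothetical, and the ±2 pixel jitter arises from taking a 60 × 60 crop inside a 64 × 64 patch):

```python
import numpy as np

def random_crop(patch, size=60, rng=None):
    """Take a random size x size crop from a larger 2D patch (training)."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = patch.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return patch[top:top + size, left:left + size]

def central_crop(patch, size=60):
    """Deterministic central crop, used for validation and testing."""
    h, w = patch.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return patch[top:top + size, left:left + size]
```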
The CNN training makes use of two regularization parameters (often referred to as "hyperparameters" in the deep learning literature): the learning rate and the L2 regularization strength. In addition to controlling the convergence rate, the learning rate also provides regularization given a fixed number of training epochs via early stopping. The L2 regularization strength prevents overfitting by keeping the weights of the CNN low. This is done by adding the squared L2 norm of the total weight vector, multiplied by the L2 regularization strength, to the standard MSE loss function. In some layers (see Figure 4 for details) we also apply another regularization technique called dropout [27]. Dropout involves randomly switching a subset of the network's weights to zero at a given rate.
The values of the CNN regularization parameters and the dropout rate are the same for each cross-validation fold, with the L2 regularization parameter set equal to 0.001, the learning rate to 0.0004 and the dropout rate to 0.01. We select these values manually, based on the cross-validated CNN performance after 300 epochs (following [15]). We implement the geometry prediction network using Python and TensorFlow. Training of a single network takes about two minutes on a NVIDIA Quadro P4000 GPU for each cross-validation partition.
Training
We need to train the network from [13] on our own dataset, individually for SA and LA slices, since the network originally trained in [13] does not segment the LV wall in LA images. Moreover, we also found that it did not work properly for the SA images from our dataset [2]. The general approach to training our segmentation network is similar to that for our geometry prediction network detailed in Section 2.3.5. Hence, below we focus on the modifications required to train the segmentation network. As in [13] we use the mean cross entropy as the loss function, computed between the GT annotations and the probabilistic labels predicted by the network.
We resize the CMR images to 160 × 160 pixels as inputs to the network. We train the network for 2000 iterations, using a batch size of four. The latter is due to the memory limit of our GPU, as the network uses approximately 1 GB of GPU memory per image. As for the geometry prediction network, we perform 14-fold cross-validation, which means that each training split uses 12 folds with 13 subjects each, with each subject consisting of 6 SA and 3 LA slices, resulting in 936 SA images (12×13×6) and 468 LA images (12×13×3) for training. We explore various data augmentation techniques when training the segmentation network, such as the addition of noise to the image scale, as well as image rotations and shifts, but none yields performance improvements.
Evaluation For the segmentation task we use the Dice score, separately for the LV wall and the LV cavity, as the evaluation measure. The Dice score is defined as 2|P ∩ G|/(|P | + |G|), where P and G are the predicted and the ground-truth pixel sets, respectively. The Dice metric is also used in [13]. It is similar to another popular criterion called intersection over union (IoU), defined as |P ∩ G|/|P ∪ G|.
However, the Dice score focuses more on the average prediction, whereas IoU emphasizes the worst-case prediction [2]. The originally trained network from [13] often worked reasonably well for the basal and mid-ventricle SA slices of healthy volunteers, but not for SA slices close to the apex, and it performed poorly for MI patients.
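Both overlap measures are straightforward to compute from binary masks; a minimal numpy sketch (function names are ours):

```python
import numpy as np

def dice(pred, gt):
    """Dice score 2|P ∩ G| / (|P| + |G|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over union |P ∩ G| / |P ∪ G| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union
```

The two scores are monotonically related (Dice = 2·IoU/(1 + IoU)), so they rank predictions identically; they differ only in how strongly partial overlaps are penalized.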

Passive myocardial stiffness inference using biomechanical models
Accurate biomechanical modelling of cardiac function relies on the choice of constitutive laws and the associated constitutive parameter values. The latter cannot be measured directly in vivo and therefore have to be inferred non-invasively from indirect measurements, which is essential for the application of cardiac models in the clinic [3]. Below, we summarize how to infer passive myocardial properties characterized by the widely used Holzapfel-Ogden (HO) law [29]. The procedure has been developed for healthy volunteers and is based on a finite element cardiac mechanics model in diastole, as shown in Figure 6. The HO law has eight constitutive parameters, denoted a, b, a_f, b_f, a_s, b_s, a_fs, b_fs. We refer to [29] for more details on the HO law, including the formula for the strain energy function.

A multi-step procedure for inferring myocardial properties under the HO law has been developed in [20]. It is based on matching the model-predicted LV volume (V) and 24 circumferential strains ε_i, i = 1, ..., 24, to the measured ones extracted from in vivo CMR scans (see Section 2.1.3), denoted V* and ε*_i, i = 1, ..., 24, respectively, using a loss function of the form

L = ((V − V*)/V*)² + Σ_{i=1}^{24} (ε_i − ε*_i)².   (7)

Since the study in [20], inferring the HO law parameters has been considered in a series of subsequent studies [4,30,31]. Details on the forward biomechanical LV model and the CMR-derived strains and volumes can be found in [20]. The LV model is simulated for diastolic filling with a population-based end-diastolic pressure of 8 mmHg. This allows us to infer myocardial passive properties as in [20].
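The precise weighting of the volume and strain terms is given in [20]; as an illustration only, a sum-of-squares mismatch of this kind might be sketched as follows (the function name and the exact form are our assumptions, not the published ones):

```python
import numpy as np

def mismatch(V, strains, V_star, strains_star):
    """Illustrative sum-of-squares mismatch between model outputs
    (volume V, 24 circumferential strains) and CMR-derived measurements.
    The relative volume term and unweighted strain term are assumptions."""
    strains = np.asarray(strains, dtype=float)
    strains_star = np.asarray(strains_star, dtype=float)
    vol_term = ((V - V_star) / V_star) ** 2       # relative volume error
    strain_term = np.sum((strains - strains_star) ** 2)  # strains are dimensionless
    return vol_term + strain_term
```

During inference, this scalar loss is what the optimizer (here, Bayesian optimization) drives towards zero by adjusting the constitutive parameters fed to the forward simulator.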
Figure 6 Schematic of the cardiac mechanics model: the constitutive parameters and an LV pressure profile drive the simulated diastolic filling of the LV between two time points t_1 and t_2 > t_1.
The eight parameters of the HO law are challenging to infer because they are highly correlated and weakly identifiable, see [20]. Therefore, we follow the approach proposed in [30] and reparametrize the original eight-dimensional inference problem [5] in terms of a four-dimensional vector θ of scaling factors applied to reference parameter values (indicated by nought subscripts), which can be taken from published studies [7,5]. Details on the reparameterization can be found in [30,5].

Bayesian optimization
A disadvantage of the approach in [20] is its high computational complexity and the fact that it is not guaranteed to converge to the global minimum of the objective function (7). To circumvent these problems we use Bayesian optimization (BO), which is a global optimization method based on sequential training of a statistical approximation to the unknown target function. BO is particularly suited for problems in which a single evaluation of the objective function is time consuming. We refer to [32] for a review of the BO methodology.
As in [33], we consider two versions of BO, based on "target surrogates" and "partial error surrogates". In the former approach the objective function (7) is approximated as a whole using a single Gaussian process (GP) regression, while in the latter each error term in (7) is approximated separately, with its own GP regression. We refer to [34] for a detailed treatment of GP methods. Note that our interest in this paper is not in comparing different variants of BO, but in assessing the robustness of the parameter inference by treating the two variants as independent BO runs for basic uncertainty quantification.
We initialize BO using an initial design based on the Latin hypercube. We follow the standard recommendation [35] to use 10 × D points, where D is the dimensionality of the problem (in our case the number of parameters to be inferred).
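To make the "target surrogate" idea concrete, the following is a minimal self-contained sketch of BO: a Latin hypercube initial design of 10 × D points, a single GP fitted to the objective, and an expected-improvement acquisition maximized over a random candidate pool. The function names, the RBF kernel and its hyperparameters, and the candidate-pool strategy are illustrative assumptions, not our actual implementation.

```python
import numpy as np
from scipy.stats import norm, qmc

def rbf(X1, X2, ls=0.2):
    """RBF kernel with unit prior variance and lengthscale ls (assumed values)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(Xtr, ytr, Xte, jitter=1e-6):
    """Standard GP posterior mean and std via a Cholesky factorization."""
    K = rbf(Xtr, Xtr) + jitter * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v * v).sum(0), 1e-12, None)  # prior variance is 1
    return mu, np.sqrt(var)

def bayes_opt(f, D, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    n_init = 10 * D                                   # the 10 × D rule from [35]
    X = qmc.LatinHypercube(d=D, seed=seed).random(n_init)  # LHS initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        cand = rng.random((500, D))                   # random candidate pool
        mu, sd = gp_posterior(X, y, cand)
        best = y.min()
        z = (best - mu) / sd
        ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)  # expected improvement
        x_next = cand[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()

# e.g. minimize a cheap quadratic on [0, 1]^2 (stand-in for the expensive loss)
x_best, f_best = bayes_opt(lambda x: ((x - 0.3) ** 2).sum(), D=2)
```

In the real setting each evaluation of f is a forward finite element simulation taking minutes, which is precisely why the sequentially updated GP surrogate pays off.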

Evaluation methods
We are interested in comparing the parameter inference performance obtained with the LV geometries generated automatically by our proposed CNN approach against the results obtained with the manually segmented geometries (see Section 2.1.2) and with the baseline of the mean geometry (defined in Eq. (2)). To make this comparison, we use three separate evaluation methods, which we outline below.
BO is designed to optimize a given objective function, thus the best objective function value obtained by the BO optimisation routine will serve as our first evaluation method. Here we are interested in the minimum of the mismatch function (7), hence we prefer approaches leading to lower loss values. A low value of the objective function indicates that the underlying parameters have led to outputs (LV volume and circumferential strains) closely agreeing with the CMR-derived measurements described in Section 2.1.3 (real or synthetic).
For synthetic data, generated with known GT parameters, we can assess the inference accuracy in parameter space by calculating a weighted L2 distance between the GT and inferred values. In particular, we use the relative root mean squared error (rRMSE), calculated as

rRMSE = [ (1/D) Σ_{i=1}^{D} ((θ̂_i − θ*_i)/θ*_i)² ]^{1/2},   (9)

where θ*_i denotes the GT value of θ_i, θ̂_i is its estimate and D is the dimensionality of the problem, in our case D = 4. Scaling each error term by the corresponding GT value makes the measure robust against becoming inflated by large parameter values.
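The measure is straightforward to compute; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def rrmse(theta_hat, theta_star):
    """Relative RMSE between estimated and ground-truth parameter vectors:
    each error is scaled by the corresponding ground-truth value."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    return np.sqrt(np.mean(((theta_hat - theta_star) / theta_star) ** 2))
```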
The problem with evaluating in parameter space is that the parameters of the HO model are not guaranteed to be uniquely identifiable from the available experimental data, as discussed e.g. in [20]. For that reason, differences in the biomechanical parameter space may not be informative about differences in myocardial stiffness. In contrast, the stretch-stress behaviour derived from the HO law under a prescribed stretching mode (e.g. uniaxial) is directly related to myocardial stiffness and is often employed in ex vivo stretching experiments [36].
Therefore, as an additional evaluation metric, we use stretch-stress curves along two principal directions according to a layered myofibre structure approximation. The first direction considered is stretch along the myocyte direction (λ f ) and the second is stretch along the sheet direction (λ s ). These curves allow us to gain insight into the underlying myocardial stiffness values, because in general, higher stress values for a given stretch level indicate higher stiffness.
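As an illustration, the fibre-direction curve can be sketched from the HO strain energy under the usual simplifying assumptions (incompressibility, transverse stretch λ^(-1/2), tension-only fibre response, shear term neglected); the parameter values below are placeholders, not the values inferred in this study:

```python
import numpy as np

# Placeholder HO parameters (a, a_f in kPa; b, b_f dimensionless) -- NOT the
# values inferred in this study, chosen only to produce a plausible curve.
a, b, a_f, b_f = 0.24, 5.08, 1.46, 4.15

def cauchy_stress_fibre(lam):
    """Cauchy stress (kPa) under uniaxial stretch lam along the myocyte
    direction, assuming incompressibility (transverse stretch lam**-0.5)
    and neglecting the a_fs shear term of the HO law."""
    I1 = lam**2 + 2.0 / lam          # first invariant for this deformation
    I4f = lam**2                     # squared stretch along the fibre
    W1 = 0.5 * a * np.exp(b * (I1 - 3.0))
    # fibres contribute only in tension (I4f > 1)
    W4f = a_f * max(I4f - 1.0, 0.0) * np.exp(b_f * (I4f - 1.0) ** 2)
    return 2.0 * W1 * (lam**2 - 1.0 / lam) + 2.0 * W4f * lam**2

stretches = np.linspace(1.0, 1.2, 21)
stresses = [cauchy_stress_fibre(l) for l in stretches]
```

The exponential terms make the stress grow sharply with stretch, which is why differences in stiffness between geometries are most visible at the high-stretch end of these curves.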

Results
In this section we first present the results for each CNN separately (segmentation network and geometry prediction network). We then move to assessing the performance of the whole two-stage approach. Finally, we investigate the usefulness of the reconstructed meshes for inferring the parameters of a biomechanical model using the HO law. Figure 7 illustrates the evaluation methods that will be used and the datasets of interest in each case.

Geometry prediction network
The geometry prediction network, visualized in Figure 4, requires segmentations of the CMR scans as inputs. When obtained via the segmentation network, these will inevitably be subject to noise that affects the accuracy of the geometry prediction. Therefore, as a sanity check, we initially make geometry predictions from the GT segmentations. This provides a lower bound on the error achievable with our fully automated approach and also allows additional exploration of the methodology. This initial study reveals eight to be the optimal number of PCA coefficients in the output layers of the two sub-networks: more components allow extra flexibility at an increased risk of overfitting, while fewer components fail to sufficiently explain the variation in the data.
Note that the one-stage framework developed in [15] used four principal components, which we find insufficient to accurately represent LV geometries. The final MSE for the GT segmentations is 0.030, roughly 60% of the MSE of 0.049 reported in [15]. We treat this value as the lower error bound for the current architecture.

Figure 7 An overview of the evaluation methods and datasets used for CMR image segmentation, geometry reconstruction and parameter inference. For parameter inference, "×2" refers to repetitions of Bayesian optimization with two algorithms for basic uncertainty quantification. Dashed arrows indicate how the datasets are related. In the case of parameter inference, the synthetic data was constructed using the information (final parameter estimates) obtained using the real data, which is additionally denoted by "inform" below the corresponding dashed arrow.

Segmentation network
As noted in Section 2.4, we retrain the network from [13] for SA and LA images separately. Below, we discuss the results for each of the two networks in turn. The top panel in Figure 7 outlines our approach to segmentation evaluation.

Figure 8 shows examples of the predicted segmentation for a HV at multiple SA slice locations. We observe that, compared with the manual segmentation, the network-predicted segmentation becomes less accurate near the apex. It is worth noting, however, that due to the lack of a clear boundary in the CMR scan, the manual segmentation also becomes more erroneous in this region.

Table 1 summarizes the SA segmentation results for the entire dataset. When training from scratch, using randomly initialized weights, we obtain Dice scores of 80.5% for the LV wall and 90.9% for the cavity. These results improve when the weights are initialized using the trained network published in [13], with Dice scores increasing to 85.0% and 92.5%, respectively. The latter results are comparable with the scores reported in [13] (88% for the wall and 94% for the cavity), which were based on a much larger dataset of approximately 5000 subjects. One property common to all the networks is the larger Dice score for cavity segmentation. This is unsurprising, since its higher contrast makes the cavity easier to segment than the LV wall, which is sometimes indistinguishable from the surrounding tissues. Note that despite a much smaller sample size, our adopted segmentation network achieves accuracy comparable with [13] for segmenting the LV wall from cine CMR images.

Long axis slices
We train the LA segmentation network using all three LA views together, initializing with the pre-trained network from [13]. Examples of segmentations from the trained network and their specific Dice scores are given in Figure 9, with overall results provided in Table 2. The mean Dice score obtained is 86.60%, slightly lower than that for SA (88.75%). Despite this slight deterioration, our segmentation network still delivers a rather accurate segmentation of the LV wall in the long-axis views.

The two-stage framework
Below we report the results obtained from our two-stage reconstruction framework, whereby the CMR images are first segmented using the segmentation network, before these segmentations are passed to the geometry reconstruction network, which predicts the 3D LV geometry.
The middle panel in Figure 7 outlines how the reconstruction results (i.e. the predicted geometries) are evaluated. We note that the MSE scores are very stable: running the CNN several times with different random number generator seeds usually changes the results only in the fourth digit. We therefore report the MSE scores to three digits, at which precision they do not change over repeated runs.
The geometry prediction network trained on the predicted segmentations (for both SA and LA) achieves an MSE score of 0.033, which is very close to the lower bound of 0.030 obtained with the GT segmentations. As expected, using both SA and LA slices improves upon using either of them separately, with MSEs of 0.038 and 0.044 for the SA-only and LA-only networks, respectively. The top part of Table 3 summarizes these results.
Aligning the dataset of LV geometries using the two Procrustes techniques from Section 2.3.4 (see also the middle panel in Figure 7) achieves further reconstruction improvements, as reported in the bottom part of Table 3. We observe a decrease in the MSE to 0.026 when using the centred dataset, and a further slight gain with the rotated dataset, with an MSE of 0.025. However, using the aligned LV geometries also improves the accuracy of the baseline mean-geometry prediction: the MSE declines from 0.074 to 0.058 and 0.057 for the centred and rotated datasets, respectively. Since for each dataset (original, centred, rotated) we calculate a separate mean geometry (the mean of that dataset), aligning the dataset reduces its variance and hence improves the performance of the corresponding mean-geometry benchmark.

Table 3 The two-stage approach results (MSE). The top part of the table reports the results for the original dataset of LV geometries, and the bottom part presents the results for the two alternative alignments of the dataset of LV geometries described in Section 2.3.4.
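The two alignment steps can be sketched with numpy: centring removes translation, and an optimal rotation onto a reference geometry can be found with the Kabsch algorithm (a sketch of the general technique under assumed point correspondence, not our exact implementation):

```python
import numpy as np

def center(X):
    """Remove translation: subtract the centroid of an (n_points, 3) geometry."""
    return X - X.mean(axis=0)

def align_rotation(X, ref):
    """Rotate centred geometry X onto centred reference ref in the
    least-squares sense (Kabsch algorithm, assuming matched node indices)."""
    U, _, Vt = np.linalg.svd(ref.T @ X)
    if np.linalg.det(U @ Vt) < 0:    # guard against reflections
        U[:, -1] *= -1
    R = U @ Vt
    return X @ R.T
```

Applied to every geometry in the training set (with one geometry fixed as the reference), this removes pose variability so that PCA and the CNN only need to model genuine shape variation.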

Model                                   MSE
Dataset of original LV geometries
  Mean geometry                         0.074
  CNN baseline from [15]                0.049
  Two-stage CNN (SA only)               0.038
  Two-stage CNN (LA only)               0.044
  Two-stage CNN (SA and LA)             0.033
Dataset of centred LV geometries
  Mean geometry                         0.058
  Two-stage CNN (SA and LA)             0.026
Dataset of rotated LV geometries
  Mean geometry                         0.057
  Two-stage CNN (SA and LA)             0.025

Table 4 shows results post factum, i.e. based on the test set, for different numbers of PCA components. In our main simulations we consistently used eight components, chosen from preliminary explorations and without further adaptation. That is, we did not select the number of PCA components based on the test-set results, which would be methodologically flawed (selection bias). Interestingly, Table 4 suggests that the performance is rather robust and varies little as the number of PCA components ranges over quite a substantial interval, between 4 and 10.
In particular, we note that for the aligned datasets four components work well, consistent with the results in [15].

Table 4 The two-stage approach results for different numbers of PCA components. The first four components are the most important and were used in [15]. For the segmentation input we used eight components. The minimum is obtained with eight components for the original data and seven components for the centred/rotated data.

Figure 10 presents examples of reconstructions from the proposed two-stage approach. We illustrate typical cases whose reconstruction errors are approximately equal to the median reconstruction error. The reconstructions are accurate, particularly with respect to the general size and height of the LV, but some small misalignments occur near the edges.
As an additional practical experiment, we calculate the predicted cavity volumes of the 3D LV geometries and compare these with the GT data. As shown in Figure 7, we perform this exercise only for the best-performing (in terms of MSE) dataset of rotated LV geometries. Using the two-stage network, we achieve an RMSE of 12 ml. This is lower than the value of 26 ml obtained using the mean-mesh baseline and is comparable with values from the literature. For instance, the network developed in [37], which was trained using a volume-based objective on a larger dataset, achieved an RMSE of 10 ml.

Figure 10 The two-stage approach: example reconstructions. Depicted are typical cases, i.e. reconstructions that obtained approximately the median reconstruction error. Blue shape: reconstruction; grey wireframe: GT. The reconstructions are accurate, the LV sizes and heights are similar, and there are small misalignments at the edges.

Parameter inference
Having evaluated the LV geometry reconstruction in geometry space, we now investigate the effect this has on the inferred mechanical properties, as discussed in Section 2.5. We make use of four LV geometries corresponding to randomly selected HVs, which we label HV A, HV B, HV C and HV D. We consider three ways to represent each LV geometry: the manually obtained original geometry (ground truth), the geometry reconstructed using the proposed CNN-based approach, and the mean geometry specified in Eq. (2) (serving as the baseline). For each of the four subjects we have measured the end-diastolic volume V* and the 24 circumferential strains ε*_i, i = 1, ..., 24, from CMR scans, using the procedures discussed in Section 2.1.3.
We carry out two studies: a real data study and a synthetic data study, each evaluated using the corresponding methods discussed in Section 2.5.2 (see also the bottom panel in Figure 7). In the real data study we use the measurements V* and ε*_i, i = 1, ..., 24, available for each HV. However, we do not know the underlying true parameters with which the estimated values could be compared, and we therefore treat the estimates from the original LV geometries as gold standards. These gold-standard parameter values are also used to generate synthetic data: for each LV geometry we run the forward simulator with the corresponding optimized parameter values and treat the resulting outputs as synthetic measurements of the LV volume and the 24 circumferential strains. Knowing the GT parameters allows us to assess, directly in parameter space, the effect that the geometry approximations (the proposed CNN-based reconstruction and the mean geometry) have on the optimization.

Objective function Table 5 presents the final values of the loss objective function from Eq. (7) for the four LV geometries under consideration. In all cases except HV B the reconstructed meshes lead to lower errors than the mean mesh. Moreover, the results appear stable: independent runs of BO, even with the two different surrogates described in Section 2.5.1, lead to almost identical values.

Stretch-stress curves and parameter estimates Figures 11 and 12 present the stretch-stress curves corresponding to the final θ values from the three geometries (reconstructed, original, mean) for each HV under consideration. For all the HVs except HV C the curves obtained for the CNN-reconstructed geometries are closer to those for the original geometries than the curves obtained using the mean geometry.
For HV C, the parameters obtained by BO with partial error surrogates for the reconstructed and mean geometries lead to much stiffer responses at higher stretches than those found with the other methods. We report the final parameter estimates used to generate the stretch-stress curves in Table 6. These results are visualized in Appendix A by means of Bland-Altman plots, comparing the differences between the original and reconstructed geometries with the differences between the original and mean geometries. Overall, taking the estimates obtained for the original geometries as benchmarks, we conclude that the CNN-reconstructed geometries lead to better estimates than those recorded for the mean mesh.

Finally, we observe that θ_4 is often hard to identify, even when using the original geometry. For instance, for HV A and HV D the estimates of θ_4 for the original geometry obtained with the two variants of BO vary noticeably. A similar behaviour can be noted for HV C, for both the CNN-reconstructed and the mean geometry, and for HV B for the mean geometry. Therefore, in our next study, based on synthetic data, we fix θ_4 = 1 (i.e. no scaling of the reference values).

Figure 11 Bayesian optimization, real data study: stretch-stress curves for HV A and HV B (in rows). Blue: BO with target surrogates (targ.); red: BO with partial error surrogates (part.); solid lines: reconstructed geometries (recon. mesh); dashed lines: original geometries (org. mesh); dotted lines: mean geometries (mean mesh); solid black lines: ground-truth values.

Table 6 Bayesian optimization, real data study: final parameter estimates (parameters leading to the minimum value of the objective function) for four healthy volunteers (HVs) obtained with the reconstructed geometry (geo.), the original geometry and the mean geometry. BO repeated with two algorithms (target surrogates, targ., and partial error surrogates, part.) for basic uncertainty quantification.

Turning to the synthetic data study, Table 7 presents the final values of the loss objective function from Eq. (7) for the four HVs under consideration. Notice that, due to the absence of measurement errors in the synthetic data, the mismatch values are much lower than those in Table 5. These values are particularly close to zero for the original geometries, for which there is no geometry approximation at all. As in the real data study, the reconstructed geometries lead to lower losses than the mean geometry for all cases except HV B. In addition, the choice of surrogate has little effect on the parameter estimates, as was also found in the real data case.
Table 7 Bayesian optimization, synthetic data study: the lowest values of the objective function (×100) for four healthy volunteers (HVs) obtained with the reconstructed geometry (geo.), the original geometry and the mean geometry. BO repeated with two algorithms (target surrogates, targ., and partial error surrogates, part.) for basic uncertainty quantification.

Parameter estimates Table 8 and Figure 13 present the results in parameter space using the rRMSEs defined in (9). As expected, optimization based on the original geometry performs best, with very low rRMSEs. Moreover, parameter estimates obtained with the CNN-reconstructed geometries are typically much more accurate than those obtained with the mean geometry, leading to lower rRMSEs. For all HVs but HV C the CNN-reconstructed geometries clearly outperform the mean geometry for both BO variants, and for HV B the reconstructed geometry leads to results very close to those based on the original geometry. For HV C, comparing the objective function values from Table 7 with the rRMSEs in Table 8 reveals that the optimization problem is difficult, owing to local optima and ridges in the objective function. Even though independent BO runs converge to very close values of the objective function (e.g. 0.809×10⁻² and 0.800×10⁻² for the reconstructed mesh), the corresponding parameters may result in very different rRMSEs (e.g. 0.6377 and 0.1362 for the reconstructed mesh). This illustrates the point made in Section 2.5.2 that evaluation in parameter space alone may not be sufficiently informative and needs to be complemented with the analysis of stretch-stress curves.

Figure 13 Bayesian optimization, synthetic data study: relative RMSEs from Eq. (9) between the parameters leading to the minimum value of the objective function and the GT parameter values for four healthy volunteers (HVs) obtained with the reconstructed geometry (recon. mesh), the original geometry (org. mesh) and the mean geometry (mean mesh). BO repeated with two algorithms (target surrogates, targ., and partial error surrogates, part.) for basic uncertainty quantification.

Figures 14 and 15 present the stretch-stress curves for the four HVs under consideration. In all cases the curves based on the original geometries almost perfectly match the GT ones. For HV A, the curves based on the reconstructed mesh and those based on the mean mesh closely agree with the GT curve for small stretches (up to approximately λ_f = 1.1 and λ_s = 1.15). For higher stretches, however, the curves based on the reconstructed mesh are noticeably closer to the GT curve than those based on the mean mesh. For HV B we record an almost perfect agreement between the GT curves and those based on the reconstructed mesh, both for stretches along the myocyte direction and along the sheet direction. As in the HV A case, the curves based on the mean mesh deviate considerably from the GT curve for high stretches. Finally, there is a very close agreement between the curves obtained for the CNN-reconstructed geometries, those for the original geometries and the GT curves. On the other hand, the curves predicted for the mean geometry deviate noticeably from the remaining curves, especially at higher stretches.

Summary
We have developed a two-stage framework, visualized in Figure 3, for automatic prediction of the LV geometry directly from CMR images. First, the segmentation network provides accurate annotations of the LV wall in both SA and LA slices. Second, the geometry prediction network uses the segmentation to predict a 17k-dimensional LV geometry. We have shown that the proposed two-stage approach leads to considerably more accurate predictions of LV geometries than the previous approach from [15]. This earlier one-stage approach struggled to accurately localize the LV wall in SA and LA slices, which in our framework is conveniently achieved by the segmentation network. The additional enhancements we propose, such as aligning the LA slices (inputs to the geometry prediction network) and aligning the GT LV geometries (outputs of the geometry prediction network), allow for a substantial reduction in the reconstruction error.
In the context of parameter inference for cardiac mechanics models, we have demonstrated that the CNN-predicted LV geometries can perform much closer to the original (manually obtained) geometries than the mean geometry, in the sense of leading to more similar stretch-stress curves and lower values of the loss objective function in Eq. (7).

Principal component analysis
There has recently been substantial interest in representing the LV geometry for cardiac mechanical modelling by means of statistical dimension reduction techniques. In particular, methods based on principal component analysis (PCA) [38] have been especially popular. The study in [39] implemented PCA for load-free geometry estimation in the context of inferring the passive stiffness of the myocardium. The work in [40] used a PCA-reduced geometry representation of the LV in a deep neural network model; the neural network approximated the diastolic filling process with the aim of inferring the mechanical behaviour of the myocardium. Our work relates to this strand of literature as we apply PCA within our geometry reconstruction network. In our case, PCA provides an essential dimension reduction and regularization step, enabling the application of convolutional neural networks to learning the LV geometry from CMR images when the training set is comparatively small (on the order of several hundred exemplars). See also the related discussion in Section 4.4.
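The reduction step can be sketched as follows (with synthetic stand-in data and small dimensions; in the pipeline the rows would be flattened LV meshes with roughly 17k coordinates, reduced to eight coefficients):

```python
import numpy as np

# Synthetic stand-in for the dataset: n_geom flattened geometries, each
# with n_coord coordinates (~17k in the real setting; small here).
rng = np.random.default_rng(0)
n_geom, n_coord, n_comp = 200, 300, 8
geoms = rng.normal(size=(n_geom, n_coord))

mean_geom = geoms.mean(axis=0)
Xc = geoms - mean_geom
# principal directions from the SVD of the centred data matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:n_comp]                 # (n_comp, n_coord) orthonormal basis

scores = Xc @ components.T               # low-dimensional representation
reconstruction = mean_geom + scores @ components  # back to geometry space
```

The CNN then only has to regress the `scores` vector; any predicted score vector is mapped back to a full geometry through the fixed PCA basis, which also acts as a strong shape prior.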

Applications and potential impact in the clinic
Statistical emulators As argued in Section 2.5, the cardiac mechanics equations need to be solved numerically. However, a single run of the associated forward simulator (see Figure 6) typically takes 8-15 minutes. During inference, forward simulations have to be carried out repeatedly as part of an iterative optimization procedure, leading to computational execution times of several days, which is not viable for a practical clinical decision support system. Consequently, there is currently much interest in surrogate models and statistical emulators, in which a substantial part of the computation is performed in advance, prior to recording any subject-specific measurements.
Recent proof-of-concept studies in the context of cardiac mechanical modelling have demonstrated that the computational complexity can be reduced by several orders of magnitude at a negligible loss in accuracy [30,31,41]. However, those studies assumed that the LV shape, extracted from a CMR scan of a healthy volunteer, was fixed. For real clinical applications based on personalized medicine, variations of the LV shape have to be included in the emulator. Unfortunately, exploring the full geometry representation space discussed in Section 2.1.2 would require the impossible task of building an emulator in a 17k-dimensional space. A solution to this problem is to consider a low-dimensional representation of the LV geometry, in a space that can be explored more thoroughly during emulator training. For clinical purposes we also require an efficient method for projecting a new geometry into this space, ideally with minimal manual intervention by a practitioner. Our two-stage method does exactly this, by predicting a low-dimensional representation directly from CMR images.
Other applications Accelerating and automating LV geometry reconstruction is a critical step towards personalized medicine that will obviate inefficient manual CMR image processing for LV geometry extraction. Once reconstructed, the predicted LV geometries can serve multiple purposes, of which the most important from our perspective is numerically solving the cardiac mechanics equations with the finite element method. Moreover, the predicted LV geometries are valuable for the analysis of a number of clinically relevant quantities, such as LV volume, wall thickness and cavity volume. Finally, they provide clinicians with a promising 3D visualization tool.
In addition, our proposed approach delivers automatically obtained LV wall segmentations in both SA and LA images. As we have demonstrated, the segmentation accuracy on the test set is high: over 80% for the wall and over 90% for the cavity. Should higher accuracy be required, the predicted segmentations can serve as initializations to be corrected manually.

Limitations
Performance against GT meshes While the improvement over the mean-geometry reference benchmark is encouraging, there is still a significant performance gap between the reconstructed and the GT geometries, both in terms of direct geometric features and in terms of derived features related to cardiac mechanics (constitutive parameters and stretch-stress curves). This suggests that further work is required to achieve the ultimate objective of reliable and automated clinical decision support.
Limited data The main challenge for future work is enlarging the training set. The set of LV geometries available in our study is limited to about 200 exemplars. This does not bring out the full potential of CNNs, which were conceived for "big data" problems, owing to the need for restrictive regularization. Obtaining a dataset that is orders of magnitude larger than the current one is fundamentally difficult due to the excessive computational cost of reconstructing a single LV geometry.
Motion correction Motion artefacts, resulting from the cardiac and respiratory cycles and from involuntary patient movement, remain a great challenge in CMR imaging [42,43], in particular for acute-MI patients with shortness of breath. Accurate reconstruction of the LV geometry is critical for biomechanical studies [3,1], thus motion correction is generally needed when constructing patient-specific geometries from cine CMR images. In this study, a rigid-body translation has been applied to realign the endocardial and epicardial boundaries from the SA images to the three LA images, as in [4], so that the ventricular boundaries from the SA images overlap with the wall boundaries in the LA images. Other approaches, such as tracking the imaging plane throughout the cardiac cycle or deformable image registration schemes, could further reduce motion artefacts [42]. However, it would be challenging to apply those methods to our CMR images retrospectively, hence a simple rigid-body translation has been chosen. Currently, there may be confounding effects from incompletely corrected motion artefacts, which are likely to lead to spurious deformations, such as excessively skewed or bulged LV geometries [9]. Since machine learning has been successfully applied to CMR image analysis and interpretation [12], we expect that incorporating machine-learning based motion correction algorithms into the developed segmentation network will help alleviate motion artefacts.
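For point sets with known one-to-one correspondence, the least-squares optimal rigid-body translation is simply the mean offset between the sets; a minimal sketch (an illustration of the general idea, not the specific boundary-matching procedure of [4]):

```python
import numpy as np

def rigid_translation(moving, target):
    """Least-squares optimal translation mapping the `moving` point set
    onto `target`, assuming one-to-one point correspondence: the mean offset."""
    moving = np.asarray(moving, dtype=float)
    target = np.asarray(target, dtype=float)
    return (target - moving).mean(axis=0)

# applying the shift realigns the moving contour:
#   aligned = moving + rigid_translation(moving, target)
```

In practice the SA contour points and their LA counterparts first have to be matched (e.g. at the plane intersections), after which the estimated shift is applied to the whole SA slice.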
Bi-ventricular geometry prediction In the present study, only the LV geometry has been directly learned from CMR images, but not the bi-ventricular geometry. By including both LV and RV (right-ventricular) geometries, there will be more geometrical features (e.g. RV-LV insertions) that could potentially enhance the accuracy and reliability of CNN-based geometry prediction. For example, Duan et al. [14] combined a multi-task deep learning approach along with atlas propagation to segment short-axis CMR volumetric images by learning the segmentation and landmark localization simultaneously, and ventricular shape prior knowledge was applied to overcome image artefacts. In future work, we will explore the prediction of the bi-ventricular geometry directly from conventional CMR images by incorporating into our CNN prior knowledge on ventricular shapes, geometrical landmarks, measured wall motions, etc.

Conclusions
We have developed an automatic CNN-based framework for predicting LV geometries directly from CMR images, without the need to manually annotate any CMR scans. We have achieved a noticeably lower LV mesh reconstruction error than previous methods. We have also demonstrated that, in the context of cardiac mechanics parameter inference, our predicted LV geometries perform closer to the original, ground-truth geometries than the LV mean mesh does. A low-dimensional representation of the LV geometry, delivered as a by-product of our proposed framework, has the potential to be a stepping stone towards patient-specific statistical emulation, as necessary for applications in personalized medicine.

Ethics approval and consent to participate The UK Research Ethics Service (ethics committee references: 10/S0703/28 and 11/AL/0190) approved the study and all of the participants provided written informed consent.

Consent for publication Not applicable.
Availability of data and materials The original CMR cine images are not available due to privacy considerations. However, pre-processed data (a subset of re-scaled and cropped CMR images as well as segmented images and LV geometries) and the code are available in the following GitHub repository: https://github.com/aborowska/LVgeometry-prediction.

Figures 16 and 17 visualize the estimates from Table 6 by means of Bland-Altman plots. They are based on taking the estimates obtained with the original meshes as the benchmark and calculating the differences between this benchmark and the estimates obtained with the other mesh types, i.e. the CNN-reconstructed meshes (blue markers) and the mean mesh (red markers). For each mesh type we take the "representative" estimates, i.e. the average of the estimates from the two BO runs (with different emulation schemes). Since the differences can be both positive and negative, Figure 16 shows that on average there is hardly any difference between the performance of the reconstructed meshes and the mean mesh (solid horizontal lines). The former, however, produce differences much more concentrated around 0, as reflected by the considerably narrower confidence bounds (dashed lines). This is clearly visible in Figure 17, which depicts the absolute values of the differences against the benchmark: the mean absolute difference with respect to the benchmark is much smaller for the estimates obtained with the reconstructed meshes than for those obtained with the mean mesh.

Figure 16 Bayesian optimization, real data study: Bland-Altman plot comparing the differences between the final "representative" estimates based on the original and the reconstructed meshes (blue) with the differences between the final estimates based on the original meshes and the mean mesh (red). Each symbol represents a different parameter (circle: θ1, square: θ2, triangle: θ3, star: θ4), so each occurs eight times, twice (blue, red) for each HV (A, B, C, D). The horizontal lines are the mean (solid) and ±1.96 standard deviations (dashed). For each mesh type (original, reconstructed, mean) we take the average between two BO runs (with target and partial error emulation) as the "representative" estimate for the corresponding HV and mesh type.

Figure 17 Bayesian optimization, real data study: Bland-Altman plot comparing the absolute values of the differences between the final "representative" estimates based on the original and the reconstructed meshes (blue) with the absolute values of the differences between the final estimates based on the original meshes and the mean mesh (red). Each symbol represents a different parameter (circle: θ1, square: θ2, triangle: θ3, star: θ4), so each occurs eight times, twice (blue, red) for each HV (A, B, C, D). The horizontal lines are the mean (solid) and q_L and q_U, where q_L and q_U are the empirical 2.5th and 97.5th percentiles, respectively, of the absolute differences. For each mesh type (original, reconstructed, mean) we take the average between two BO runs (with target and partial error emulation) as the "representative" estimate for the corresponding HV and mesh type.
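The summary lines shown in the two Bland-Altman figures can be computed from the paired estimates as follows. This is a generic sketch of the standard Bland-Altman statistics, not code from the paper's repository; the function names are our own:

```python
import numpy as np

def bland_altman_limits(benchmark, other):
    """Mean difference (solid line) and mean +/- 1.96 sample standard
    deviations (dashed lines), as in Figure 16."""
    d = np.asarray(other) - np.asarray(benchmark)
    m, s = d.mean(), d.std(ddof=1)
    return m, m - 1.96 * s, m + 1.96 * s

def abs_difference_limits(benchmark, other):
    """Mean absolute difference with the empirical 2.5th/97.5th
    percentiles (q_L, q_U) as bounds, as in Figure 17."""
    a = np.abs(np.asarray(other) - np.asarray(benchmark))
    return a.mean(), np.percentile(a, 2.5), np.percentile(a, 97.5)
```

Here `benchmark` would hold the parameter estimates obtained with the original meshes, and `other` the estimates obtained with either the CNN-reconstructed meshes or the mean mesh.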