PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI

In this paper, we present a novel method for the correction of motion artifacts that are present in fetal magnetic resonance imaging (MRI) scans of the whole uterus. Contrary to current slice-to-volume registration (SVR) methods, requiring an inflexible anatomical enclosure of a single investigated organ, the proposed patch-to-volume reconstruction (PVR) approach is able to reconstruct a large field of view of non-rigidly deforming structures. It relaxes rigid motion assumptions by introducing a specific amount of redundant information that is exploited with parallelized patchwise optimization, super-resolution, and automatic outlier rejection. We further describe and provide an efficient parallel implementation of PVR allowing its execution within reasonable time on commercially available graphics processing units, enabling its use in the clinical practice. We evaluate PVR’s computational overhead compared with standard methods and observe improved reconstruction accuracy in the presence of affine motion artifacts compared with conventional SVR in synthetic experiments. Furthermore, we have evaluated our method qualitatively and quantitatively on real fetal MRI data subject to maternal breathing and sudden fetal movements. We evaluate peak-signal-to-noise ratio, structural similarity index, and cross correlation with respect to the originally acquired data and provide a method for visual inspection of reconstruction uncertainty. We further evaluate the distance error for selected anatomical landmarks in the fetal head, as well as calculating the mean and maximum displacements resulting from automatic non-rigid registration to a motion-free ground truth image. These experiments demonstrate a successful application of PVR motion compensation to the whole fetal body, uterus, and placenta.


I. INTRODUCTION
T HE recent advent of single shot fast spin echo (ssFSE) T2-weighted sequences has enabled Magnetic Resonance Imaging (MRI) to play an essential role in fetal diagnosis [1] and research.In particular, cases for which ultrasound (US) fails to acquire conclusive image data benefit from fetal MRI [2].Fetal MRI is able to distinguish individual fetal structures such as brain, lung, kidney and liver, as well as pregnancy structures such as the placenta, umbilical cord and amniotic sac [3].It provides improved visualization and structural information of the fetal anatomy, which helps to study abnormalities during pregnancy such as neuro-developmental disorders [4], placental pathologies [5], fetuses with congenital lung masses [6], and conjoined twins [7].MRI is considered to be safe after the first trimester [3] for 1.5T [8]  3T [9] without the use of contrast agents, which may have teratogenic effects.Furthermore, this technology paves the way for researchers and clinicians to analyze correlations between childhood development and prenatal abnormalities.
During image acquisition the fetus is not sedated and moves freely as well as the mother breathes normally.As a result, movements are likely to corrupt the scans, hiding pathology and causing overlap between different anatomical regions.In order to limit these artifacts, fast scanning sequences such as ssFSE [10] allow for the rapid acquisition of single slices at high in-plane resolution in a large field of view and good tissue contrast of the uterus.However, when acquiring a 3D volume through a stack of slices, inter-slice artifacts in the outof-plane views are highly likely.Consequently, this restricts reliable diagnostics to individual slices in the current clinical practice.Fig. 1 depicts a typical example of motion related artifacts in a fetal single-shot fast spin echo (ssFSE) scan.The observed motion (c.f.Fig. 1 b & c) is of unpredictable nature and consists of a combination of maternal respiration movements, fetal movements and bowel movements.Slice-to-volume registration (SVR): SVR combined with super-resolution image reconstruction techniques [11] can be applied to compensate motion between single slices by reconstructing a high-resolution (HR) image from multiple, overlapping low-resolution (LR) images, as shown in Fig. 2. To provide a sufficiently high number of samples for such an approach, multiple stacks of 2D-slices need to be acquired, ideally in orthogonal orientations.A simple LR → HR reconstruction model [11] can be formalized as: where x i denotes the i-th LR image of total N images, and y is the HR image.The matrix W i combines motion, subsampling and degradation effects: W i = DBT i , where D is the sub-sampling matrix, B is the blurring matrix, and T i is the transformation matrix of observation i.  and typically present with anisotropic voxel sizes and intensity inhomogeneities.
Practical limitations: SVR methods have been successfully used to address these problems in fetal MR and are typically applied to small regions and organs with rigid body characteristics that are identified by manual, labor intensive [12], [13] annotations or less precise, automated segmentations [14].Such approaches are prohibitive to whole body and uterus reconstruction because of the assumption of rigid motion in the 2D to 3D registration step of SVR.As a result, different areas in each slice that are likely to move in different directions will break this assumption, e.g., the head and thorax.
Further, an extension of 2D-3D registration to include nonrigid deformations is only well defined with each slice and not well-constrained in 3D.Current SVR approaches will fail in presence of non-rigid deformations and unpredictable organ shapes.This restricts the application of SVR to regions that are manually or automatically annotated.Thus, most of the previous SVR methods have been limited to the fetal brain as the main region of interest for fetal reconstruction due to the high incidence of neuro-developmental disability in premature infants.Only recently, [15], [16] proposed a motion corrected 3D reconstruction of fetal thoracic structures from prenatal MRI.Moreover, SVR is computationally expensive due to the exponential increase of computations with the size of the target area.This leads to prohibitive post-processing times in the clinical practice.Parallelized implementations [17] can address run-time problems, however, methodologically the SVR is still restricted to small, rigid body areas.
Reconstruction of large-scale anatomy: MRI has also been shown to be very useful for the evaluation of the whole uterus and structures like the placenta.During both, normal and high-risk pregnancies, the whole uterine appearance and the condition of the placenta are considered to be an indicator for fetal health after birth [18].Placental functions affect the birth weight as it controls the transmission of nutrients from the maternal to the fetal circulation [19].However, the whole fetal body and secondary uterine parts can be inherently inconsistent.Different fetal body parts can move independently from the uterus.This makes the application of SVR and 2D-3D registration to the full uterus impossible in the presence of fetal motion and maternal respiration.Besides, multiple births is a case where classical SVR pipelines based on preprocessing steps to identify consistent rigid regions will likely fail.The presence of multiple instances of the same fetal structure is usually not considered in previous methods.Therefore, a fully automatic motion correction method for the whole uterus, as it is presented in this paper, is very desirable and will enable the application of standard 3D image analysis techniques, e.g., [20], [21].

A. Related Work
Most motion compensation approaches for fetal MRI are based on SVR techniques that aim to obtain a motion-free and high resolution volume of a fetal target region.Registration of individual 2D slices with a higher resolution 3D volume [22] is the core approach of these algorithms.SVR methods assume that all acquired image stacks are centered at a specific fetal organ (e.g., brain, thorax) and cover three orthogonal image directions.Fig. 3 shows the core elements of SVR and the contribution of previous frameworks from the literature.
The first SVR-based reconstruction framework for fetal imaging was introduced by Rousseau et al. [12].It includes steps to correct 2D slice misalignments, intensity inhomogeneity distortions, and reconstructs an isotropic HR fetal MRI brain image from sets of LR images.Motion correction is done by applying a global 3D rigid alignment between the LR images using one image as a reference to define the global coordinate system.Then every slice of these images is aligned to the initial reconstructed HR volume.Normalized mutual information is maximized using the gradient ascent method for

Initialization
• Volumetric Registrations [12] • Repeated Sampling of Overlapping Slice Planes [23] • Intensity Matching and Bias Correction [25] • Automatic Detection of the Stack with Least Motion [17] Fig. 3. Overview of the required modules of state-of-the-art SVR methods and main components introduced by previous work.
both registration steps.A narrow Gaussian kernel is applied as a point spread function (PSF) for volume reconstruction and empty voxels are filled using the mean of the surrounding voxels.The image contrast is corrected using one LR image as a reference.Jiang et al. [23] introduces the acquisition of many thin slices to provide sufficient sampling of the region of interest.Cross correlation is used as a cost function for the SVR steps assuming that the data have consistent contrast properties.After that, multilevel B-splines are applied to the volumetric reconstruction for data interpolation, which has the advantage of reducing blurring of the reconstructed image supported by including the thin slices.
Kim et al. [13] propose a method for slice intersection motion correction (SIMC) of multi-slice MRI for 3D fetal brain image formation.The method is based on slice-to-slice registration using spatially weighted mean square intensity differences (MSD) of the signal between slices as an energy, assuming that the MRI contrasts are identical.Maternal tissues are excluded from the energy computations using a windowing function of a parametric ellipsoid model.Similar to [12], temporally adjacent slices are grouped together then divided into half iteratively.The splitting process is performed using discrete cosine basis functions.
Gholipour et al. [24] were the first to introduce a mathematical model for super-resolution (SR) volume reconstruction from slice acquisitions of fetal brains.The main difference to previous reconstruction methods is that it includes knowledge of the slice acquisition model and the SR reconstruction is performed based on maximum likelihood and a robust M-estimation minimization for an error norm function.A regularization term is also added to the cost function in order to enforce a solution when the number of acquired samples is not high enough for solving the reconstruction problem.
Murgasova et al. [25] were able to reconstruct the fetal brain using intensity matching and complete outlier removal.The main steps of their reconstruction method are: (i) 3D registration of the acquired stacks using a template stack; (ii) extracting region of interest (the fetal head) from all the stacks; (iii) intensity matching and bias correction between the slices based on an EM framework, where the differential bias fields and slice-dependent scaling factors are estimated during the reconstruction; (iv) motion correction using [12] based on the normalized cross correlation as a similarity measure and an approximated 3D Gaussian PSF similar to [23].A posterior probability is used to define the inlier and outlier voxels within the EM framework in order to remove the motion-corrupted artifacts and misaligned data.Blurring in reconstructed images is reduced by integrating edge-preserving regularization based on anisotropic diffusion within the SR reconstruction framework.
Kainz et al. [17] developed a fast multi-GPU accelerated implementation for the method presented in [25], which is based on 2D-3D registration, SR with automatic outlier rejection and an optional intensity bias correction.They extended the reconstruction framework by automatically selecting the stack with least motion as the reference stack and using a fully flexible and accurate PSF instead of approximated functions.Using a multi-GPU framework enabled the SR reconstruction process to be approximately five to ten times faster than using a multi-CPU framework.

B. Contributions
In this paper we propose and evaluate a new paradigm for motion correction based on SVR and flexible subdivision of the input space into overlapping, highly redundant and partly rigid image patches [26], thus solving the motion compensation problem for large field of view reconstructions.We split the input into small overlapping areas and find these, which contain rigid components.This allows to iteratively learn their consistency compared to a global reconstruction optimization volume.Corrupted and inconsistent data is automatically identified and excluded using robust statistics.The proposed approach facilitates the automatic reconstruction of whole collections of motion corrupted stacks without the need of corresponding image segmentations.By treating rigid image patches as piecewise constant segments of organs further allows limited correction of non-rigid tissue motion.The presented patch-to-volume reconstruction (PVR) method finds rigidly connected areas automatically, which can be used as segmentation prior for further refinement using conventional SVR in small regions of interest.In contrast to [17], we further introduce a multi-scale patch definition approach and thoroughly evaluate the reconstruction quality of the whole uterus including the fetal brain and placenta.We test the breaking points of SVR and variations of PVR on synthetically motion corrupted brain phantom data.The presented approach is the only currently available method that is able to reconstruct fetal organs and detailed 3D volumes of secondary, non-rigidly moving structures such as the placenta.II.METHOD SVR-based motion compensation methods make use of the assumption that rigid regions, e.g., brain and thorax, of 2D input slices deforms rigidly, where a global 3D volume is reconstructed by iteratively registering these 2D input slices.We propose to increase the granularity of the input data by using 2D data patches of arbitrary shape instead of whole slices for SVR reconstruction.We explore square patches and dilated superpixels [27] for the definition of the patch shape.Superpixels provide a method to define semantically meaningful regions while reducing the required data redundancy and computational overhead.
PVR relies on the fact that certain regions of the scanned anatomy are rigid and can be reconstructed with SVR superresolution algorithms.However, unlike SVR, it is fully automatic and provides a full field of view reconstruction.Data consistency is obtained by oversampling a region of interest at different scan orientations.Robust statistics can be used to identify mis-registered or heavily corrupted data [24], [25].Fig. 4 depicts a schematic overview of the proposed PVR framework.
Input Data & Initialization: A template stack is either randomly or automatically chosen from available input stacks by detecting the stack with the least motion artifacts [17].Global intensity matching is applied to normalize intensity values of all input images followed by global 3D-3D alignments to spatially initialize the reconstruction target.Input data can be represented as stacks of 2D images (patches) consisting of where y s is a patch of arbitrary 2D-shape and indexed by the location s. S is the set of all locations in all p stacks, S = {s 1 , s 2 , ...s M }, and M is total number of patches.Patch Extraction: In the simplest, naïve case the shape of y s is square, and defined via its size a and stride ω.Such definition is generally applicable to any kind of oversampled motion corrupted data.If a and ω are fixed, no prior knowledge about the data is assumed.However, ideally each y s corresponds to a meaningful subregion of the volume in which motion can be characterized as rigid.Typically, the square patches are overlapping to provide redundant representations of the same locations.Such approach is computationally expensive with increasing number and patch size a and additional consideration must include the inherent trade-off between a and the assumption of it containing rigid motion.An alternative to naïve shape definitions of y s is to find correlation between voxel locations and its neighbors, which can be found by unsupervised image segmentation techniques such as superpixels (SP) [27].These techniques allow for obtaining similar-sized segments from local intensity information (see Fig. 5) instead of employing dense sampling of overlapping patches, enabling the image reconstruction with fewer but more useful data blocks.Further, reducing the total amount of required data blocks for reconstruction lowers the computational overhead, positively impacting the overall runtime.Additionally, larger rigid areas require less computational effort for image registration and super-resolution, and more importantly less dependency on inherent image data parameters (e.g., voxel spacing, organ size, subject size).
While there are several techniques for generating SP in the literature [28], [29], [27], a fast and efficient SP approach is desirable for the clinical practice.Simple linear iterative clustering (SLIC) [27] allows to obtain regular SP based on minimizing the distance D between the centroids of SP with an initial step size a.D is defined as: where d c and d s are the intensity and spatial Euclidean distances that are controlled by an adaptive compactness parameter t for each SP.Similarly to naïve shape definitions, we can define the initial size of the SP as a and its dilation size γ%.
Multi-scale Patches: Although larger patch regions are less likely to include rigidly connected regions, they may perform better during 2D-3D registration due to the additional contextual information of each patch.In contrast, smaller patch sizes are more likely to represent rigidly deformed regions, but provide less contextual information, potentially affecting the 2D-3D registration.A good trade-off between the size of the patch region and the likelihood of rigid motion needs to be found.Here, we propose the use of multi-scale patches for reconstruction to exploit the advantages of different patch sizes.We represent input data as stacks of 2D patches: where, instead of using the same Y as a unique input, different scales Y i are used for each iteration i at γ% of its original size.S i is similar to Eq. 2 the set of all locations in all p stacks but with different size for each iteration i.Additionally, to increase contextual information for estimating the transformations, we compute overlapping y s patches and dilate each superpixel y s by γ pixels using a flat structuring element b with a fixed neighborhood (26 px in our case), hence ȳs = y s ⊕ b.Clearly, the smaller the choice of γ, the faster is the reconstruction.Ideally γ is > 50% of a, ensuring that every pixel is covered by multiple samples.Patch to Volume Registration: An HR-image X is reconstructed from a number of motion corrupted patches y s using 2D-3D registration-based super-resolution similar to [25], [17], where an accurate PSF calculation is used to generate a gradually improving approximation of X and further employed to initialize the 2D-3D registration and computation of robust statistics.In [25], [17], the PSF equals to a sinc function for the in-plane and the slice profile for the through-plane, measured for the employed MRI sequence (ssFSE), according to [23].
We employ an implementation of the PSF function by applying a Taylor series for a better approximating of small values close to 0. We cut the series after several terms and bound the remainder based on relative error .The Taylor series approximation of the sinc function is defined as sinc The proposed approximate PSF achieves substantial qualitative improvement in the quality of the reconstructed image compared to the sinc implementation.An example from the first iteration of a fetal brain reconstruction is shown in Fig. 6.During the optimization process, individual 2D patches are continuously rigidly registered to the current 3D reconstruction of X and reintegrated into X using iterative super-resolution with gradient descent optimization.Any similarity metric can be used as a cost function for the registration step such as mutual information [12], [25], cross correlation (CC) [23], [17], or mean square intensity differences [13], [24].Choosing the best similarity metric for reconstruction depends on the input data.CC has been found to be effective for input data with similar intensity distribution [30].In our experiments, we employ CC as the similarity metric for 2D-3D registration, after rescaling the intensities between the input stacks.
EM-Based Outlier Removal: Correctly registered patches ȳs should provide a higher contribution to the final reconstruction, presenting with a low error e when compared to the original image data.[24] initially introduced an approach to account for outliers during super-resolution based on Huber function statistics.Similar to [25], we employ the expectation maximization algorithm for outlier removal by classifying ȳs and the included pixels into an inlier and outlier class.A zeromean Gaussian distribution G σ (e) with variance σ 2 is used for the inliers and a uniform distribution with constant density for the outliers.This makes use of available, highly redundant information (i.e., overlapping ȳs ), to find partly matching patches and to depreciate or fully reject erroneous voxels.We aim to maximize the log-likelihood for each patch to be part of a region of rigid motion.Φ is the current estimate of the reconstructed volume X, the variance σ 2 of the errors e, and the proportion of correctly matched voxels c.The posterior probability for a pixel ∈ ȳs being identified as inlier is We perform the updates of c and σ 2 according to [25]: where N is the number of pixels in ȳs .We further define an inlier and outlier probability for each ȳs and exclude it from processing if classified as an outlier (e.g., if it contains structures moving in opposite directions during scanning, such as the fetal head and thorax).Only if information in ȳs is consistent with the originally acquired data, the registered patch will remain contributing to the SR reconstruction of X.

Identification of Rigid Regions and SVR Refinement:
The rigidity of regions is measured by keeping track of the probability p of each pixel of every ȳs .This allows to identify locations best fitting the rigid 2D-3D registration constraints.Integrating p and p into a 3D volume using the same PSF as for the reconstruction identifies candidate regions, solely containing rigid motion components [26].This can further be used to initialize the rigid SVR reconstruction or to visualize the data uncertainty during reconstruction.

III. IMPLEMENTATION
Parallelization: The high data redundancy required for the proposed approach makes conventional single threaded implementation practically not feasible.Computational complexity of PVR is exponentially higher than SVR, depending on the employed patch overlap.For optimal performance we implemented our approach via General-Purpose Programming on Graphics Processing Units (GPGPU) using the Compute Unified Device Architecture (CUDA, NVIDIA, Santa Clara, CA) language [31], [32].CUDA is a highly evolved single instruction multiple data (SIMD) programming language, which allows a large part of the proposed framework to be mapped onto GPU hardware.Currently, CUDA is the only high-level general purpose GPU language that provides, for example, bidirectional texture access via surfaces in a kernel, which is essential for the efficient implementation of certain parts our framework.In this section we discuss the key implementation details.
We use a modular design to allow experimentation with the separate components of the algorithm.An overview of this design is shown in Fig. 7.The modules are encapsulated in a CUDA library, which can be used independently from the instantiating framework.We employ the successor of IRTK 1for interfacing with medical image data.
PVR is parallelized on three levels: I. Patch-level: Individual patches are mapped to blocks of a CUDA computing grid and the contained voxels are mapped to individual threads.Depending on the used GPU hardware, patch processing can be also mapped directly to the computing grid, such that each thread works on a complete patch (limited by the employed patch size).The resulting thread divergence provides opportunities for advanced GPU scheduling strategies [33] and for a direct translation of optimization strategies for image registration, for example patch-wise gradient descent.II.Voxel-level: For the parallelization of PSF-based superresolution and robust statistics we follow a similar three-folded procedure definition as used in [17].The voxels within each patch are processed using kernel level parallelization and parallel pixel-volume, volume-pixel, and volume-volume procedures are applied.III.Patch-batch: PVR scales to multiple GPUs through distributing independent subsets of patches over the desired number of devices.Synchronisation is done through averaging of the resulting sub reconstruction volumes on the master GPU.Initial 3D-3D registration is performed on a single master GPU, which allows optimal coalesced memory access.

IV. EVALUATION & EXPERIMENTS Evaluation of Adult Brain MRI Reconstruction:
We evaluate the performance and limitations of PVR in terms of accuracy and robustness with synthetic non-rigid deformations of adult brain data.Similar to [24], an isotropic 1 mm 3 T2weighted adult brain phantom with no noise obtained from the Brainweb database [34] is used for this experiment.Synthetic non-rigid motion artifacts are generated by skewing (shearing) the original image using: where we use one combined skewing value in the xyzdirection defined by S xyz = tan(±θ • xyz ).After that a motioncorrupted 3D stack is constructed by sampling 2D images from both skewed and motion-free stacks in an interleaved manner similar to fetal MRI acquisition [35].Three stacks are used for the reconstructions where each stack is sampled with a voxel size of 1.25x1.25x2.5 mm 3 .We use standard axial, sagittal, and coronal orientations as shown in Fig. 9.An HR image with isotropic voxel size 1.25 mm 3 is reconstructed using SVR [17], square patch-and superpixel-based PVR.
Evaluation of Fetal Organ MRI Reconstructions: Evaluating the quality of reconstructed fetal MRI is challenging due to the absence of motion-free ground truth data.For this purpose, we introduce a novel approach for this evaluation problem based on the originally acquired slice images.Assuming that 2D in-plane patches extracted from the original stacks contain no motion artifacts, we use them as gold standard and compare them with corresponding simulated patches from the reconstructed volume.Evaluation metrics (see Sec. IV-A) are computed between the reconstructed input stacks and the final motion corrected image and averaged over the whole volume.The fetal brain is typically used to assess the quality of reconstruction as it moves rigidly, fulfilling the rigid motion assumption for SVR-based methods in the 2D-3D registration step.However, soft tissue organs such as the placenta deform non-rigidly.For this reason, we additionally chose to reconstruct the placenta and the whole uterus as challenging test cases for PVR and SVR.Fig. 7.The software modules defined for the implementation of the proposed approach.For implementation details, please refer to the provided source code.

A. Evaluation metrics
We employ the following metrics for measuring the quality of the reconstructed image: Cross-correlation (CC) to measure the similarity between the intensities of input I(i, j) and reconstructed image Ĩ(i, j) at the location (i, j), which is defined as: where N and M are the dimensions of a 2D slice.
The peak signal-to-noise ratio (PSNR) is used to measure the error introduced by motion and is based on the mean squared error (MSE) between the original 2D in-plane patch and the reconstructed image.PSNR is defined as: where I max is the maximum intensity in the original image.
An improved reconstruction quality usually results in higher PSNR.However, PSNR does not reflect well subjective human perception of image quality as it is mainly based on estimating absolute errors between individual pixels.The structural similarity index (SSIM) accounts for image degradation as perceived changes in structural information [36].It measures the structural similarity by comparing normalized local patterns of pixel intensities, which is similar to the human visual system's abilities to extract information based on structure.The SSIM is defined as: where µ I , µ Ĩ , σ 2 I and σ 2 Ĩ are the average and variance of the intensities of the original 2D in-plane slice and the reconstructed slice respectively.σ I Ĩ is the covariance of I and Ĩ. c 1 and c 2 are defined as (k 1 L) 2 and (k 2 L) 2 in order to balance the division with weak denominator, where L is the dynamic range of the intensities in image I and k 1 .k 2 equal to 0.01 and 0.03 similar to [36].Structural dissimilarity (DSSIM) heat maps are calculated in order to visualize the dissimilarities between original and reconstructed images.DSSIM is calculated as a distance metric derived from SSIM: V. RESULTS

Reconstruction of Adult Brain MRI:
Experiments on adult brain MR data using the Brainweb database [34] included introducing synthetic non-rigid motion artifacts as described in  10 for PSNR, SSIM and CC.For all metrics, PVR shows an improved performance over SVR, particularly in presence of deformations with higher skewing angles.Further, we observe that superpixel-based PVR achieves similar performance as PVR using arbitrary square patches, while requiring a lower amount of input patches.show an improved visual appearance and less blurring in the region with severe motion artifacts (arrow).An example of a challenging clinical case with a kidney malformation in one of twin fetuses, is shown in Fig. 8.Our clinical partners confirmed that such complications are easier to examine and to quantify after PVR-based reconstruction.
Comparative experiments of PVR variants were carried out on 32 fetal MR scans at gestational ages of approximately 20 weeks, presenting with challenging image corruption.Tab.I (a) & (b) show numerical results of evaluating individual stacks before reconstruction (baseline), and the final reconstructed image using square patches, superpixels and multiscale variants of PVR.Statistical testing between baseline and PVR variants was carried out using paired T-Tests and differences between using fixed or multi-scale and using square patches or superpixels were assessed via Two-factor ANOVA with repeated measures.In Tab.I (a) & (b) the names of PVR variants are marked in bold if statistically significant differences have been found during analysis, i.e., FS and MS and/or Square Patches and Superpixel pairs are bold if the results between them differ significantly.
The evaluation of the reconstruction quality of a whole 3D image into a single-valued metric may not properly reflect the performance differences, as it is based on averaging values of all the pixels of all the input stacks.Furthermore, Tab.I indicates significant differences between variants of PVR but these differences have only minimal qualitative effect on reconstruction accuracy.Therefore, Fig. 12 evaluates the reconstruction quality of PVR additionally using dissimilarity heat maps based on the measured DSSIM (see Sec. IV-A).This approach allows further qualitative evaluation and allows for uncertainty visualization of PVR reconstructions.Performance Analysis: We further evaluate the computational performance of each PVR variant.Measuring the overall runtime is not meaningful because this would be highly machine specific and would include data transfer overhead and non optimized functions.The runtime varied between 2000-4000s on our testing machines, depending on the system configuration.Instead we are analyzing the computational overhead introduced by PVR compared to SVR.The overhead can be measured by counting the number of processed patches and the number of additionally processed voxels.We compare these values to the achieved reconstruction quality in Fig. 13.Multi-scale superpixels show significantly better performance than other PVR variants and introduce the minimum necessary overhead while gaining the same image quality than more naïve PVR variants.Multi-scale superpixels are potentially five times faster than other variants.

VI. DISCUSSION & CONCLUSION
We have introduced the concept of patch-to-volume reconstruction (PVR) in order to compensate non-rigid motion arti-  facts from fetal MRI scans without requiring a defined region of interest.PVR splits the 3D input image into overlapping square patches and superpixels and employ automatic EMbased outlier rejection to find consistent data.
Our method is able to automatically reconstruct whole collections of motion corrupted stacks without the need of image segmentations and manual identification of rigid regions.We have shown that PVR can reconstruct the whole uterus, selected fetal organs, and secondary, non-rigidly moving pregnancy structures such as the placenta.
PVR's reconstruction quality has been evaluated quantitatively and qualitatively on an adult phantom T2-weighted brain with synthetic non-rigid motion artifacts, as well as on the whole uterus from motion corrupted fetal MRI data including fetal brain, placenta and cases with multiple births.
PVR surpasses the state-of-the-art SVR method especially for considerable non-rigid deformations.We have evaluated different variants of PVR using fixed-size and multi-size square patches and fixed-size and multi-size superpixels.ANOVA analysis has shown significant differenced between these approaches for different areas of the uterus.However, evaluation of motion compensation methods is difficult especially due to the lack of ground truth in fetal MRI.Mapping the reconstruction quality of a whole 3D volume into a singlevalued metric may not properly reflect qualitative differences, as it is based on averaging all measured values of all the input stacks.Therefore, we have performed extensive qualitative analysis and present several examples and evaluation based on structural dissimilarity (DSSIM) heat maps.
In addition to reconstruction and motion correction of the whole uterus, we have also shown that our method works for multiple births cases with multiple fetuses sharing the same womb.These cases are more likely to have complications and to undergo MRI during pregnancy but would require extensive manual effort to be successfully reconstructed with state-ofthe-art methods.
Although our method is able to reconstruct the whole uterus automatically, small parts like limbs that move rapidly between the acquisition of individual slices are more difficult to recover.This is especially problematic for very young fetuses that have more space to move inside the womb.In cases of extreme limb movements (>2 cm between individual slices) PVR is not able to find structural consensus between overlapping patched and blurry image regions will be reconstructed.This is a general problem of automatic intensity-based optimization methods and different methods that are able to understand the semantic content of each patch will be required for future improvements.
PVR introduces a considerable computational overhead to the reconstruction stage of fetal MR image processing pipelines.We have evaluated the amount of necessary additional redundant information to give a general idea about the expected runtime of different PVR variants.Patches based on multi-scale superpixels are significantly more efficient than a naïve implementation of overlapping square patches, while maintaining a similar reconstruction accuracy.Quantitatively, square patches perform slightly better for the brain, which is most likely due to the rigid nature of the enclosing skull.Superpixel-based patches achieve better results for regions that are likely affected by non-rigid movements like the placenta and the whole uterus.

Fig. 1 .Fig. 2 .
Fig. 1.Three view-planes for raw 3D data acquired through stacks of ssFSE images covering the whole uterus.The transverse (a) is the in-plane view, i.e., native 2D slice scan orientation.Motion causes streaky artifacts for multiplanar reconstructions (MPR) in orthogonal views (b) and (c) caused by both maternal and fetal movements between the acquisition of individual slices.

Fig. 5 .
Fig. 5.An illustrative figure showing both square patches and superpixels methods for the patch extraction step.A 2D superpixel shows more flexibility than a square patch in extracting rigid regions or similar voxels.In practice, superpixels are dilated with few pixels to include some contextual information in order to increase the accuracy of the patch to volume registration step.

Fig. 4 .
Fig. 4. A schematic and modular overview of the proposed patch-to-volume reconstruction (PVR) framework.The key parts are 3D-3D registration, patch extraction, 2D-3D registration, super-resolution, and EM-based outlier removal.Core contributions of PVR are written in red and marked with asterisk.

Fig. 6 .
Fig. 6.Example for the observed differences in the first iteration of a fetal brain MRI reconstruction (a).(b) shows a magnified region using a sinc function for the PSF similar to [26] and (c) shows the result from using a Taylor series approximation of the sinc function as used in this work.Taylor series approximation allows a better approximating of small values close to zero.(d) shows the difference between both images.

Fig. 8 .
Fig. 8. Three viewing planes through the originally scanned (a) and the reconstruction (b) of a motion corrupted scan from moving twins with a gestational age of 28 weeks using multi-scale superpixels.For this dataset we used a mask of the uterus to save unnecessary computation time in areas containing maternal tissue.The white arrow points at a unilateral multicystic kidney of one of the twins.

Fig. 9 .Fig. 10 .
Fig. 9. Synthetic motion artifacts caused by skewing on the Brainweb Adult MRI Phantom [34].Rows: MRI in standard orientations: coronal, axial, and sagittal.Columns: original, coronally, axially and sagitally sampled.Reconstruction of Fetal Organs: Exemplary PVR and SVR reconstructions under motion introduced by kicking of the fetus are shown in Fig. 11.PVR reconstruction results

Fig. 11 .
Fig. 11.Example reconstructions of consecutive MR scans of a moving fetus (kicking): input data (a) and corresponding cutting planes through a SVRreconstructed (b) and PVR-reconstructed (c) volume.SVR produces blurry but readable results because of high data redundancy and outlier rejection through robust statistics.PVR with square patches of a = 32 and ω = 16 appears visually superior.The arrow points at an area of substantial quality differences caused by independent rapid movements of the leg.

Fig. 12 .Fig. 13 .
Fig. 12.A sample 2D cutting plane through a motion-corrupted fetal brain (a) and placenta (f), after PVR using square patches with a = 32 and ω = 16 (b) and (g).The DSSIM heat map for a baseline before reconstruction (c) and (h), and after PVR (d) and (i).The average DSSIM of the fetal brain equals 0.497 (c) and 0.248 (d), while for the placenta equals to 0.491 (h) and 0.214 in (i).
The noise of observation i is represented by n i .Any LR image can be considered as a down-sampled, motion corrupted, blurred, and noisy version of the HR image.The resulting reconstruction problem can be divided into two main parts: motion correction (estimating W i ) and super-resolution reconstruction (estimating y).Image registration can be used for estimating motion, interpolation for obtaining a uniformly spaced HR image, and regularized super resolution with automatic outlier rejection for removing blur and noise.Volumetric fetal MR image reconstruction is more challenging than typical image reconstruction problems due to unconstrained random motion during slice acquisitions.Slice misalignments can lead to a loss of spatial coherence arXiv:1611.07289v2[cs.CV] 25 Nov 2016

TABLE I AVERAGE
(A) PSNR AND (B) SSIM RESULTS (N = 32) FOR THE INPUT STACK (BASELINE) AND PVR VARIANTS WITH FIXED (FS) AND MULTI-SCALE (MS) VARIANTS OF SQUARE PATCHES AND SUPERPIXELS.ALL MEAN DIFFERENCES OF PVR AGAINST BASELINE ARE STATISTICALLY SIGNIFICANT (P <0.05).NAMES OF ALL STATISTICALLY SIGNIFICANTLY DIFFERENT PVR VARIANTS ARE STATED IN BOLD.