Segmented simultaneous multi‐slice diffusion‐weighted imaging with navigated 3D rigid motion correction

To improve the robustness of diffusion‐weighted imaging (DWI) data acquired with segmented simultaneous multi‐slice (SMS) echo‐planar imaging (EPI) against in‐plane and through‐plane rigid motion.


| INTRODUCTION
Diffusion-weighted imaging (DWI) is a key contrast in both diagnostic imaging and neurological applications. 1,2 Strong diffusion gradients encode diffusion information on the scale of 10 microns, 1 but make the sequence also susceptible to various types of patient motion. 2 Model-based image reconstructions 3 can help alleviate the trade-offs between scan design and motion sensitivity by properly modeling the signal variations over time.
Single-shot echo-planar imaging (EPI) has for years been the clinical standard for DWI, as it suppresses motion artifacts by fast one-time k-space traversal 2 being commonly accelerated by parallel imaging 4 using SENSE 5,6 or GRAPPA. 7 Multi-shot techniques 8 have been developed to mitigate the shortcomings of single-shot EPI, such as limited resolution and susceptibility to geometric distortions, but come at the cost of further sensitivity to shot-to-shot variations. Modelbased reconstructions are usually classified according to the signal that is used to sense the shot-to-shot variations. Navigated methods acquire additional MR signals for this purpose, while extra-navigated methods use external sensor signals. Self-navigated methods derive navigation signals from the imaging data itself and navigator-free methods yield problem formulations that do not require explicit navigation.
In addition to the in-plane accelerations through segmentation/multi-shot, simultaneous multi-slice 38 (SMS) approaches offer scan time reductions without an immediate loss of signal-to-noise ratio (SNR). Controlled aliasing in parallel imaging (CAIPI) techniques 39,40 improved the coil encoding efficiency and were successfully combined with EPI by blipped-CAIPI 41 and SENSE. 42,43 Combining SMS and segmented DWI, several algorithms have been proposed for shot-to-shot phase corrections with navigated, 44 self-navigated, 45,46 and navigator-free 47 approaches. Further macroscopic motion corrections have been developed using external tracking devices and intermediate non-diffusionweighted navigators. 48 Recent approaches exploit SMS acceleration for single-shot EPI to obtain signal support in the slice direction and propose SMS-to-volume registration for 3D rigid motion-corrected fMRI 49,50 and diffusion tensor fitting. 51 In this work, we extend the navigated IRIS approach for segmented DWI 16 with SMS 44 and investigate the potential of 3D rigid motion-corrected DWI reconstructions. The proposed algorithm, termed motion-aware SMS-accelerated and interleaved image creation (MoSaIC) for DWI, estimates shot-specific phase maps and 3D rigid motion from low-resolution SMS navigators and reconstructs motioncorrected DWI volumes per diffusion direction. We further implemented an SMS extension for the IRIS 16 algorithm, termed SMS-IRIS, as a phase-navigated reference algorithm without rigid motion correction.

| Segmented SMS DWI sampling
The proposed method is based on a navigated Stejskal-Tanner spin-echo sequence for DWI. 16 The first spin echo samples the interleaved high-resolution image echo, while a second spin echo acquires a low-resolution navigator echo at a lower undersampling rate. The echo-spacings are adjusted to match the off-resonance-induced distortions in phase encoding direction for both samplings. The navigated DWI sequence is extended to SMS and blipped-CAIPI encoding as described by Dai et al. 44 The pulse sequence is visualized in Supporting Information Figure S1, which is available online.
The reconstruction problem for segmented SMS DWI under shot-to-shot phase variations und macroscopic intershot motion is summarized in Figure 1. The high-resolution images of one SMS slice group for two interleaves in Figure 1B were produced from an in vivo case using the motion-informed forward model. The shot-specific phase and macroscopic motion variations cause severe ghosting and blurring for the motion-unaware 2D-SENSE 43 in Figure 1C. The MoSaIC algorithm corrects for the shot-to-shot phase variations and 3D rigid motion.

| Model formulation for 3D motion corrected DWI
The model extension to address through-plane motion requires a full-volume reconstruction framework as used in Ref. 28, rather than the standard slice-by-slice reconstruction.
The slice positions of an SMS stack in the scanner frame are in the first place determined by the slice gradients and the RF excitation, independent of subject motion. For pure in-plane motion, the anatomies therefore stay in the same SMS stack as long as the object remains within the field of view (FOV). For through-plane motion, new anatomies enter the slice positions decoupling the strict assignment of anatomies to SMS slice stacks so that a full-volume reconstruction is required to resolve the motion. The model furthermore assumes that the signal encoding remains unchanged in the scanner frame and is explicitly unaffected by macroscopic subject motion. Thus, the spatial profiles of the coil sensitivities stay valid during the whole scan and continuously weight the new anatomies entering the imaged slice stack.
To facilitate the model description, some notations are given here in advance. N x , N y and N z are the number of voxels in readout, phase, and slice encoding direction, respectively, giving in total N = N x N y N z voxels. The volume is sampled with N c coils, N MB simultaneously acquired slices and N i interleaves. A full volume is covered by N g = N z ∕N MB SMS slice group excitations. Each of the N d diffusion directions is thus sampled by N shots = N i N g shots. For fully interleaved sampling, the number of samples is N b = N c N shots N y ∕R P N x with an effective shot reduction of R P = N i in phase encoding direction.
The individual shot operators of the forward model are visualized in Figure 2. The individual (and parallelizable) shot operators are stacked in block-diagonal structure into the following multi-shot operators. Based on the formulated assumptions, the forward model relates the multi-shot and multi-coil data b ∈ ℂ N b to the image volume ∈ ℂ N by several linear operators and a complex Gaussian white noise vector ∈ ℂ N b : Segmented SMS DWI sampling and reconstruction problem for a four-fold interleaved 3-SMS DWI acquisition. A, This example visualizes the segmented SMS sampling. The variations of two interleaves from the same SMS slice group are shown (please note that here magnitude and phase are illustrated for each dataset) (B), along with two SENSE-based reconstructions (C). The two interleaves contain diffusionrelated physiology-induced phase variations (orange arrows) causing ghosting and signal dropouts in a CAIPI-adapted 2D-SENSE reconstruction. Second, macroscopic motion, here in-plane, (red arrow) is a common problem, which blurs anatomical structures. Moreover, through-plane motion mixes the anatomies of different slice groups. MoSaIC is a navigated technique that produces full volume reconstructions per diffusion direction with 3D rigid motion and physiological phase correction. (C) This can be seen for the transversal example slice as well at the coronal reformat next to it volume according to the macroscopic shot motion parameters. Second, the slice-sampling operator M sms (N shots N MB N y N x × N shots N z N y N x ) selects the SMS slices excited for each shot. Third, the physiological motion operator Φ (N shots N MB N y N x × N shots N MB N y N x ) applies the shot-and slice-specific phase variations, followed by the coil weighting of the sensitivity operator C (N c N shots N MB N y N x × N shots N MB N y N x ). Fifth, the CAIPI operator Θ (N c N shots N y N x × N c N shots N MB N y N x ) combines the SMS slices including CAIPI shifts. Finally, the coil images are Fourier transformed by F to sample the shot-specific trajectories using M trj (N c N shots N y ∕R P N x × N c N shots N y N x ).

| Model-based image reconstruction
The optimization problem for motion-corrected segmented SMS DWI is stated as a regularized data discrepancy minimization: where the weighted norm ‖⋅ ‖ 2 Ψ − 1 = (⋅) H Ψ − 1 (⋅) integrates the noise covariance matrix Ψ for SNR-optimal reconstruction as used for SENSE. 5 R is a regularization function and a weighting factor. The proposed MoSaIC algorithm employs an ℓ1-norm regularization R ( ) = ‖T ‖ 1 in a wavelet domain 52 with transform operator T (N × N ). The image volume , the shot phase operator Φ and the macroscopic motion operator Ω are considered unknown. The problem is difficult to optimize due to the non-convexity associated with both Ω and Φ.
Navigation approaches are used to linearize and simplify the nonconvex optimization by estimating reconstruction parameters from an additional signal. 16,29,44 The proposed method leverages the low-resolution navigator echo to estimate the shot phase operator Φ and the macroscopic motion operator Ω. For known Φ and Ω, Equation (2) with ℓ1-norm regularization represents a convex optimization problem for , which can be readily solved using fast gradient projections (FGP). 53

| Data acquisition and preprocessing
DTI data were acquired using a navigated Stejskal-Tanner spin-echo sequence with blipped-CAIPI 41 SMS excitation similar to the work by Dai et al. 44 Unlike this publication, the prewinder phase offset offset 44 was implemented as originally proposed, 41 as for this SENSE implementation the samples are not required to lie on an integer-valued k-space grid. Thus, the k z -sampling is centered around zero by offset = − (D − 1) 2 2 D , D ∈ ℕ for a FOV y ∕D CAIPI shift between two adjacent SMS slices. The first image echo was Illustration of the shot-specific forward model employed for motion-corrected segmented SMS DWI (using a SMS factor of 2). The image volume is resampled by the macroscopic motion operator Ω s according to the motion transformation of shot s. The slice-sampling operator M sms s selects the shot-specific slices. The signal is weighted by the diffusion shot phases and coil sensitivities in Φ s and C (s) . Θ (s) applies the blipped-CAIPI encoding and slice combination followed by the Fourier operator F (s) and shot-specific trajectory sampling M trj s . The subscript (s) with shot index s in parentheses indicates shot operators that are equal for all shots. The multi-shot operators (Equation 1) are obtained by embedding the shot operators in block-diagonal structure sampled at high-resolution in an interleaved fashion with a CAIPI shift of FOV y ∕N MB (D 1 = N MB ). The second echo used a moderately accelerated low-resolution sampling with a fixed CAIPI shift of FOV y ∕2 (D 2 = 2). The navigator CAIPI shift was fixed for simplicity reasons. Given the z-encoding capabilities of the used 32-channel coil, the CAIPI encoding showed sufficient slice disentangling capabilities. The SMS slice groups were sampled in an interleaved ordering to reduce slice crosstalk. 38 Fat suppression was performed by spectral pre-saturation with inversion recovery. The SMS slices were excited with slice-specific radiofrequency (RF) phases to reduce the peak B1. 38 Further sampling parameters are listed in Table 1.
The data were acquired on a 3T Philips Ingenia Scanner (Best, The Netherlands) using a 32-channel head coil. The scan session was carried out on five healthy volunteers. d. DW directions #10-14 continuous through-plane motion ("nodding" trajectory) The subjects were asked to move at moderate rates of change to avoid the dominance of macroscopic intra-shot motion effects on the sensitive DWI sequence and provide sufficiently non-compromised data. Coil sensitivity maps were acquired once in advance using a gradient-echo prescan. 5 The EPI data preprocessing involves gridding the ramp samples in the readout direction and applying odd/even echo Nyquist ghost correction, based on EPI reference data jointly acquired for all SMS slices. The SMS phase from isocenter offsets in the slice direction was corrected for both the image and the navigator echo in advance. 41,42

| Proposed algorithm: MoSaIC
The proposed navigated algorithm, termed motion-aware SMS-accelerated and interleaved image creation (MoSaIC) for DWI, is described in the following four sections: navigator reconstruction, navigator analysis, linear operator construction and full-volume image reconstruction. Figure 3 provides an overview of the reconstruction pipeline.

| Navigator reconstruction
The navigator data are upsampled to the high-resolution voxel size of the image echo using zero padding in k-space and a triangular apodization window to reduce Gibb's ringing. The navigators are reconstructed by 2D-SENSE 43 with Tikhonov regularization (regularization parameter nav = 0.05) recovering the unfolded slice groups of each shot (green box on the bottom left of Figure 3). Finally, residual slice-specific phases from RF excitation and blipped-CAIPI slice encoding are corrected.

| Navigator analysis
The navigator analysis contains a shot phase extraction, a shot rejection scheme and a macroscopic motion estimation. The phase extraction module obtains the phase maps s,l of shot s and slice l of all diffusion-weighted shots from the low-resolution navigators.
Compromised shots are rejected using robust statistics on the ℓ2-norm of the diffusion-weighted navigator shot data. The shot rejection module uses the median absolute deviation (MAD) over all diffusion-weighted shot datasets and excludes shots whose ℓ2-norm drops below the threshold of 5 ⋅ MAD from the median. The threshold was empirically set from analyzing the motion-corrupted navigators over multiple subjects and diffusion directions. Rejected shots are excluded from the registration strategy and from the fullvolume reconstruction.
The macroscopic motion module estimates 3D rigid motion parameters using a shot-wise multi-slice-to-volume registration together with a motion detection strategy. The navigators are downsampled to roughly isotropic resolution of (4 mm) 3 . The first full diffusion-weighted navigator volume (orange square in Figure 3) is used as the reference volume and is set as the moving image for the registration. This choice avoids resampling issues of the discrete SMS data. 54 The non-DWI data (b = 0 s/mm 2 ) is assumed static and not registered in this work.
The multi-slice-to-volume registration was implemented in SimpleITK 55 with the following subtasks: 1. Volume-to-volume (Vol2Vol) registration of each full navigator volume per repetition time (TR) a. Method of moments pre-alignment b. Gradient descent-based registration 2. Motion detection per diffusion direction comparing TRwise Vol2Vol parameters 3. SMS-to-volume (SMS2Vol) registration per shot (in acquisition order) if motion detected a. Linear interpolation of Vol2Vol parameters to shot time points F I G U R E 3 Overview of the proposed MoSaIC algorithm for a DTI acquisition. The DTI acquisition timeline loops over the diffusion-/T 2weightings, the interleaves and the SMS groups (listed from outer to inner loops). All shot navigators are reconstructed by 2D-SENSE 43 (left green box) and the shot navigators from the reference TR (orange) are stacked to a full reference volume for registration. Next, the navigators are used to calculate shot rejection criteria, phase maps per shot and slice and macroscopic motion by multi-slice-to-volume registration. The full-volume reconstruction uses the shot-specific parameters (for Ω s , Φ s , R s ) to reconstruct motion-corrected high-resolution image volumes separately for each diffusion direction | 1707 RIEDEL (NÉ STEINHOFF) ET aL.
b. Warm start: choose Vol2Vol or previous SMS2Vol shot parameters by registration metric c. Gradient descent-based registration 4. Median filter The method of moments estimates rigid translations and rotations from the first and second statistical moments of the volumes. The registration uses the mutual information metric with 25 histogram bins, a 3D rigid versor transform and B-spline interpolation. The maximum number of iterations for gradient optimizations was 1000. For the Vol2Vol registration, the gradient descent uses a regular step size strategy with a minimum step of 10 −6 .
The motion detection checks for each diffusion direction whether any rigid Vol2Vol parameter deviates more than 0.2 mm or 0.2 deg from the median among the N i sub-volumes (TRs). This threshold was empirically set from accuracy analyses in registration simulations. If no motion is detected for a diffusion direction, the macroscopic motion operator is replaced by the identity operator I. Otherwise, the Vol2Vol preregistration parameters of the full TR volume are linearly interpolated to the shot time points. Then, a warm start strategy is employed that compares the registration metric for the interpolated Vol2Vol parameters and the SMS2Vol parameters of the previous shot. The subsequent gradient-based optimization is started from the preferred initial parameters and uses a line search strategy with at most 20 (line search) iterations. For SMS2Vol registration, the metric is evaluated only at SMS slice positions using a metric mask. Finally, a median filter (kernel size 5) is applied to each rigid motion parameter over time to ensure smoothness and to filter outliers.

| Linear operator construction
The macroscopic motion operator Ω for the high-resolution reconstruction is implemented as described by Cordero-Grande et al. 27 Translations use the Fourier shift theorem, whereas rotations are factored into three shears, 56 which are implemented via three 1D-FFTs. Moreover, the image is reconstructed on an extended FOV to handle the fast Fourier transform (FFT)-related interpolation boundary conditions. The operator Φ multiplies the sampled slices by the shotand slice-specific navigator phase maps s,l . Moreover, encoding phases from the blipped-CAIPI image-space representation, off-isocenter encoding and slice-specific RF excitation phases RF are integrated into Φ, ensuring a continuous volume phase for the complex interpolation in Ω. The coil sensitivities in operator C are masked using a reference image and compressed 57 using a principal component analysis (PCA) with a threshold of 97% (resulting in 13 from initially 32 channels). The CAIPI operator Θ applies the integer-valued shifts in image space and adds up the SMS group. The EPI data are then masked in k y -xspace according to the shot-specific EPI sampling trajectories by M trj .

| Full-volume image reconstruction
The full-volume reconstruction is based on FGP. 53 The algorithm is initialized by the motion-unaware SMS-IRIS solution described below. Then, gradient updates and soft thresholding in the Daubechies 4 wavelet domain with transform T are iteratively performed. was empirically set to 30. The FGP is stopped if either 100 iterations are exceeded or the normalized ℓ2-norm difference of two subsequent iterations s 21 drops below 10 − 5 : As the estimation of the Lipschitz constant for the FGP algorithm is demanding, a backtracking strategy was implemented. 53 The Lipschitz constant is initialized by the squared maximum absolute value of the coil sensitivity profiles. The Lipschitz constant is then increased by a factor of 1.5 if the objective function for the image estimate exceeds a quadratic approximation bound. 53

| Reference algorithm: SMS-IRIS
The reference algorithm omits macroscopic motion during the image reconstruction. The static conditions, represented by an identity matrix I for the macroscopic motion operator Ω s = I, decouple the SMS slice group signals from each other and reduce the SENSE problem to small groups of aliasing pixels.
IRIS, 16 short for image reconstruction using image-space sampling function, is a SENSE formalism incorporating shot-to-shot phase variations from a navigator for interleaved single-slice EPI. In this work, we introduce a SMS extension, named SMS-IRIS, which requires adaption of IRIS to the interleaved phase encoding in k z -k y -space. 43 SMS-IRIS can also be interpreted as a SENSE-based formulation of the navigated method by Dai et al. 44 or as a navigated version of SMS-based MUSE. 45 For SMS-IRIS, the shot-and slice-specific phase maps s,l are estimated as described for MoSaIC. The non-iterative algorithm incorporates a weighted Tikhonov regularization R ( ) = ‖W ‖ 2 2 with weighting matrix W (N × N ). W is constructed from the inverse absolute values of a motion-free T 2 -weighted image filtered by a triangular window of about 4 mm isotropic k-space extent. = 10 − 2 was set empirically. Moreover, the shot rejection is adopted to exclude severely compromised shots. This navigated algorithm yields timeefficient reconstructions without macroscopic motion correction providing the MoSaIC initialization and a reference method in this work.

| Experimental design
The proposed algorithms were evaluated in simulation and in vivo studies. The static and motion-compromised in vivo data were reconstructed using SMS-IRIS and MoSaIC. Unsampled k-space portions from partial Fourier acquisitions are recovered by projections onto convex sets (POCS). 58 The algorithms were implemented in Python 3.6.9. Computations were performed on a cluster node with 48 GB RAM. A tensor model was fit to the reconstructed images using Dipy 59 after affine registration, rotation correction of the diffusion directions and PCA-based DTI denoising. 60 The registration was performed in two subsequent steps using a rigid and an affine preregistration of the fast elastic image registration (FEIR) framework 61 with a normalized gradient field metric. 62 The first rigid alignment uses the FFT-based resampling 27 described above. The second affine alignment is resampled with the SimpleITK 55 B-Spline interpolation. The first T 2 -weighted image of the static dataset was set as the registration reference. The diffusion direction per image volume (after the multi-shot reconstruction) was corrected for the estimated rotations from the average MoSaIC and the DTI alignment parameters. 29 The PCA thresholding -factor was set to the default value 2.3 and the SNR was estimated using Dipy's PCA noise estimation. Fractional anisotropy (FA) maps 1 and isotropic diffusion images 63 were calculated.
For the simulations, the motion-free SMS-IRIS reconstructions were used as ground truth data. Ten rigid motion trajectories were simulated by Gaussian processes for four motion scenarios, namely no, rigid in-plane, rigid throughplane and fully 3D rigid motion. The variances were set to 0.01 rad 2 and 0.5 mm 2 . Diffusion phase maps were created by 3D second-order polynomials with random polynomial weights sampled according to Hu et al. 64 The sampling was adapted to the in vivo data with four-shot 3-SMS acquisition as well as FOV/3 and FOV/2 CAIPI shifts for image and navigator echo, respectively. The simulation data were created by selecting random diffusion directions from the ground truth data, applying the forward model and adding uncorrelated complex Gaussian noise in k-space, whereby the SNR was matched to the in vivo data.
The simulation data were recovered by SMS-IRIS and four MoSaIC variants to assess the registration components. All variants exclude rejected shots beforehand: The simulation results of all methods were registered to the ground truth by FEIR 61 with a rigid model, as the final volume registration of SMS-IRIS was considered standard DWI processing for a fair normalized root-mean-square error (nRMSE) comparison.
The simulation results were compared by the target registration error (TRE), nRMSE and reconstruction time. The TRE 65 evaluates the mean Euclidian distance over the registered coordinates u r of target points r: where T ′ s is the estimated and T * s the true coordinate transformation of shot s. The target was defined as a mask derived from the ground truth by thresholding the absolute values at 5% of its maximum and selecting the SMS slices of each shot s. The cardinality |Target| is the number of target points over all shots and slices (per diffusion direction).

| Simulated DWI results
The DWI motion simulations provide a quantitative assessment of the registration and reconstruction performance.  Figure 4C shows two transversal slice examples. For the first motion-free case, the results appear similar to the ground truth. The interhemispheric fissure is well resolved for all methods. For the second 3D motion case, SMS-IRIS shows heavy blurring artifacts from inter-shot motion. MoSaIC is able to mitigate the motion artifacts and shows less blurring than MoSaIC Vol2Vol. The underlying motion estimates for the fully 3D rigid motion case are provided in Figure 4D. The simulations show that the navigator with 5 × 5 × 4 mm 3 resolution enables submillimeter TREs and improved F I G U R E 4 Full-volume reconstruction results of simulated four-shot 3-SMS data with random rigid trajectories and diffusion phases. Ten cases were simulated for different motion states, namely static, in-plane, through-plane and fully 3D rigid motion. A,B, The TRE and nRMSE, respectively, for SMS-IRIS and four MoSaIC variants as standard boxplots with whiskers of 1.5 times the interquartile range. C, Reconstruction examples without motion and with heavy fully 3D rigid motion are compared. D, Rigid motion estimates including the final full-volume registration to the reference (leading also to non-zero motion parameters for SMS-IRIS and MoSaIC static). Without motion, all methods provide results with similar visual appearance (arrows). If motion is present, MoSaIC with its SMS2Vol registration improves anatomical delineation (arrows) by taking into account sub-volume (sub-TR) shot-to-shot motion image quality of the high-resolution full-volume reconstructions. The achieved registration accuracy does not visibly differ between in-and through-plane motion ( Figure 4A), although the nRMSE shows increased interquartile ranges for through-plane motion implying higher variation ( Figure 4B). This could be related to the increased susceptibility to interpolation errors in the coarse slice direction. If motion is present, the high motion estimation resolution of SMS2Vol outperforms the Vol2Vol registration, which, in turn, improves on SMS-IRIS. Thus, the shot-to-shot motion estimation per SMS slice group captures continuous motion trajectories better, while having the same image reconstruction complexity as MoSaIC Vol2Vol. The rigid motion estimates in Figure 4D support this observation overall. Nevertheless, the motion parameters can contain temporary discrepancies above the sampling time resolution at some points, for example, for the y-rotation at approximately 5.0 s.
If no motion is present, the SMS2Vol registration (used for MoSaIC SMS2Vol and MoSaIC) is more instable and introduces higher errors, whereby the visual appearance is not visibly degraded ( Figure 4C). MoSaIC uses the motion detection that switches off the SMS2Vol registration for small Vol2Vol estimates and thereby mitigates this downside by incorporating awareness of the achievable registration F I G U R E 5 Segmented SMS image reconstructions without and with motion correction (in vivo data). A, Example slices of the full-volume reconstructions are shown for SMS-IRIS and MoSaIC, whereby different motion types have been present during the associated acquisitions. B, The estimated rigid trajectories of MoSaIC are plotted and associated to the images by a color code. MoSaIC provides similar image quality for the static dataset (blue) and mitigates macroscopic motion artifacts in the presence of both in-plane (orange and green) and through-plane (red and purple) motion (compare close-ups). The proposed motion detection was triggered for all cases accuracy. The nRMSE improvement for through-plane motion demonstrates the benefits of the 3D motion correction over 2D in-plane motion correction.

| In vivo DWI results
An overview of the in vivo full-volume DWI reconstructions with different motion types is given in Figure 5. Reconstruction examples of SMS-IRIS and MoSaIC in Figure   5A are related to the associated rigid motion estimates of MoSaIC in Figure 5B by a color code. The images appear similar for the static case, whereas MoSaIC mitigates motion artifacts for the remaining cases (arrows). The motion detection was triggered for all datasets, whereby the SMS2Vol registration parameters remain almost constant for the supposedly static case (blue). Strong in-plane (z) rotations are detected for the orange and green datasets, leading to reduced blurring for MoSaIC. Light (red) and heavy (purple) nodding motion with rotations about the right-left axis (x) smear the structures in the coarse through-plane direction for SMS-IRIS, which are reduced by MoSaIC. Despite the 3D rigid motion correction, residual artifacts remain for cases with heavy motion and the image quality of the static datasets might not be fully recoverable. For the motion datasets of the five subjects, the data rejection excluded [23,9,24,0,3] of 680 total shots, while [1, 1, 2, 0, 1] were rejected for the corresponding static datasets.  Figure 6A,B, respectively, and rejected shots are indicated. The time resolution is TR∕N g = 3∕10 s with the number of SMS groups N g yielding >3 Hz motion sampling frequency. For the static DTI dataset, the affine alignment resulted in 1.0042 ± 0.0047 and −3 ⋅ 10 − 5 ± 0.0035 for the non-rigid zoom and shear parameters, respectively. The DTI analysis underlines the benefits of shot-to-shot motion correction for in vivo data. The static reconstructions comprise high SNR and clear delineation of the brain gyri and fractional anisotropies. The estimated motion trajectories reflect the motion study design, comprising roughly no motion in the first, in-plane in the second and through-plane motion in the last third. The shot rejection is active around 85 s and 140 s. Compared to SMS-IRIS, MoSaIC reduces blurring of motion-corrupted brain structures and improves the grayto-white matter differentiation (orange arrows) as well as the directional accuracy of the tensor results (yellow arrows).

| In vivo DTI results
A quantitative histogram evaluation of the FAs and tensor traces between static and motion-corrupted reconstructions is presented for three subjects in Supporting Information Figure S2. The overlaps of the histograms from motioncorrupted datasets with the histograms from the static datasets are evaluated using the Kullback-Leibler divergence (KLD). 66 MoSaIC leads to a consistent, although for some cases modest, reduction of dissimilarity measured by the KLD.
The image quality improvements of MoSaIC come at the cost of higher computational complexity. The durations of the main processing steps are given in Table 2 for the DTI dataset in Figure 6. The highest computational cost of MoSaIC is placed by the FFT-based interpolation requiring multiple 1D-FFTs over the 3D data per shot and FGP iteration.

| Residual artifact evaluation
The MoSaIC reconstructions are affected by several remaining artifact types. Figure 7 compiles four types of artifacts (white arrows) that we encountered for MoSaIC. In all four cases, MoSaIC improves the blurring from head motion, but some artifacts remain, such as residual blurring of the interhemispheric fissure ( Figure 7A), overlays of different susceptibility-induced deformations ( Figure 7B), residual signal shadings ( Figure 7C) and low SNR as well as speckled noise structures for strong motion cases ( Figure 7D).
Another problem arises from the fact that some anatomies might remain unsampled if they have left the FOV, especially due to through-plane motion in feet-head direction. Nevertheless, the motion estimates give valuable information about the sampling locations, which can be translated to sampling densities in image space as outlined in Supporting Information Figure S3.

| DISCUSSION
MoSaIC provides 3D rigid motion-corrected full-volume reconstructions for navigated DWI with SMS and interleaved EPI sampling. The navigator shot images are employed to estimate the shot-specific phase variations, rigid motion states, and data rejection criteria. The multi-band excitation provides valuable data support in the slice direction enabling high temporal resolution through-plane motion evaluation. The performance of MoSaIC was evaluated in simulations and in vivo for DTI scans with 4 shots and N MB = 3 yielding shot motion estimates at roughly 3 Hz temporal resolution.

| Unmodeled shot variations
Several unmodeled shot variations 2 spoil the shot similarity for image registration and multi-shot image reconstructions. First, local off-resonance effects manifest as geometric distortions in phase encoding direction for EPI. As head rotations vary the effective phase encoding direction, the local distortions differ per shot ( Figure 7B). Geometric distortions can be reduced by readout-segmented acquisitions 14,15 or corrected by model extensions accounting for off-resonance effects using B 0 map estimates. 67 This improves the shot navigator similarity for registration and offers higher local shot consistency for multi-shot reconstructions. In the current setting, the distortions also introduce mismatches between the EPI data and the gradient-echo sensitivity pre-scan, which could be avoided estimating the sensitivities from the non-DWI EPI data using ESPIRiT. 68 Second, field inhomogeneities can introduce slice-specific gradient timing offsets that are not captured by the current pre-scan and introduce Nyquist ghosting. The slice-adaptive odd-even corrections could, for example, be addressed by model-based estimation. 69 Third, eddy currents especially from the strong diffusionsensitizing gradients induce generally affine transformations and exceed rigid modeling. Although the zoom and shear parameters were rather small in this study, they depend on the sequence and the system tuning. The field deviations can be reduced by twice-refocused sequences 12 or corrected by model-based reconstructions. 3 Alternatively, the motion model could be extended to affine transformations.
Fourth, the effective diffusion direction is affected by subject rotations leading to shot-wise diffusion contrast variations. 31 The presented framework assumes for each diffusion direction that the induced contrast variations are negligible under sufficiently small rotations. Contrast corrections can be introduced by imposing q-space relations 2,31,70 between the shots.

| Navigator analysis
Navigation strategies are prone to potential signal differences between the image and the navigator data. Intra-shot motion might spoil the acquisition windows differently, affect the navigator refocusing and lead to a geometric mismatch. 24 A central assumption of the current multi-shot model is that the object comprises one consistent phase map, whose shot variations are completely described by the navigator phase and the encoding model. Residual phase inconsistencies and noise from intra-shot motion impede proper pixel unfolding in the full-volume image reconstruction, which can lead to ghosting ( Figure 7C) and speckled noise ( Figure 7D) artifacts. Self-navigation is currently intractable for the studied macroscopic motion-disturbed data due to the large g-factor penalty. The lack of viable shot reference data from the image echo impedes the evaluation of intra-shot motion leaving this as an open issue for further studies. The resolution of the single-shot navigator introduces another signal discrepancy for phase and macroscopic motion estimation.
The SMS-to-volume registration is itself a challenging problem 54 that requires careful design of the parameter initialization and gradient optimization. Regarding the warm start strategy, the exploitation of time correlations in combination with the challenging optimization landscape can cause temporal discrepancies of the registration parameters as described in the last section for Figure 4. The registration accuracy is further impeded by the low SNR of DWI, the coarse navigator resolution, intra-shot motion and the unmodeled shot-specific variations. Thus, MoSaIC might nevertheless suffer from residual blurring artifacts ( Figure 7A). In addition, motion for T 2 -weighted data (b = 0 s/mm 2 ) could be included by registrations to the first T 2 -weighted navigator volume. Conversely, inter-contrast registrations between diffusion-and T 2 -weighted data remained unstable.

| Full-volume image reconstruction
The encoding model mainly determines the quality of the shot navigators and the full-volume image reconstructions comprising the k-space sampling trajectory and the sensitivity encoding. Therefore, the deployment of other potentially non-Cartesian trajectories and improved coil setups might be beneficial. 6 As part of the sampling trajectory, this also applies to optimizations of the navigator CAIPI shift, which is set to FOV/2 in this work.
MoSaIC solves the multi-shot problem with ℓ1-norm regularization by fast gradient projections, and the sparsityenforcing regularization was shown to reduce the nRMSE in simulations. With an ℓ2-norm regularization, the multi-shot problem is tractable by the conjugate gradients method, 66 whereby both methods require one application of the forward model and its adjoint per iteration.
The choice of interpolation is a crucial trade-off balancing image quality and computational cost. The macroscopic motion operator Ω resamples the image volume for all shots, twice per FGP iteration. GPU-based implementations 27 can significantly accelerate this processing. Furthermore, the motion detection of MoSaIC analyses the volumetric preregistration parameters for significant variations and allows circumventing the expensive image resampling.
MoSaIC is currently restricted to anisotropic resolution with coarse slice thickness, which might be overcome by improved slice encoding schemes. The reduction of the slice thickness quickly becomes SNR-critical for the proposed acquisition and requires enhanced slice encoding, such as simultaneous 3D multi-slab approaches 46 or improved radiofrequency slice encoding. 71 The use of thinner slices should ease the relative interpolation burden, but, at the same time, thinner slices are affected more severely by motion.

| Informed sampling and reconstruction
MoSaIC uses a rather simple shot rejection analyzing the navigators' energy content similar to Ref. 11. More elaborate criteria to detect degenerate signals involve correlation measures 17 or k-space signal moments. 15 The reacquisition of deteriorated signals 15 presents a valuable extension to exclude malicious data without sacrificing SNR. Through the 3D rigid motion estimation, MoSaIC is furthermore aware of the spatial sampling density making reacquisitions of undersampled areas in image-space feasible as discussed for Supporting Information Figure S3.
Prospective methods 36 represent an important alternative for macroscopic motion corrections. Prospective control allows also for affine online corrections of the gradient coordinate system mitigating the motion artifacts directly in the acquisition and thereby facilitates the computational burden. The online adaption of the diffusion gradient 35 and phase encoding direction also reduce shot contrast and distortion variations. Conversely, they depend on the functionality of the external system and cannot account for local motion transformations and phase variations. 36 In synergistic use, prospective methods could improve the database and ease the interpolation by reducing the residual motion, while retrospective corrections cover remaining artifacts to enable highly motion-robust DWI.

| CONCLUSIONS
The proposed MoSaIC framework improves diffusion-weighted imaging quality in the presence of head motion. The use of navigation and SMS acquisitions enable 3D rigid registration for motion-corrected volume reconstructions, which was presented for four-shot, 3-SMS DTI. The presented algorithms make combined use of SMS and interleaving allowing flexible balancing of SNR and scan time. This model-based strategy offers potentials for smart motion-aware image sampling and reconstruction to improve the robustness of DWI in clinical practice.