Complex diffusion-weighted image estimation via matrix recovery under general noise models

We propose a patch-based singular value shrinkage method for diffusion magnetic resonance image estimation targeted at low signal to noise ratio and accelerated acquisitions. It operates on the complex data resulting from a sensitivity encoding reconstruction, where asymptotically optimal signal recovery guarantees can be attained by modeling the noise propagation in the reconstruction and subsequently simulating or calculating the limit singular value spectrum. Simple strategies are presented to deal with phase inconsistencies and optimize patch construction. The pertinence of our contributions is quantitatively validated in real examples from adult data and challenging neonatal and fetal cohorts. Our methodology is compared with related approaches, which generally operate on magnitude-only data and use data-based noise level estimation and singular value truncation. Visual examples are provided to illustrate effectiveness in generating denoised and debiased diffusion estimates with well preserved spatial and diffusion detail.


Introduction
Signal to noise ratio (SNR) is the ultimate limiting factor for high resolution magnetic resonance imaging (MRI) (Edelstein et al., 1986;Fuderer, 1988). Diffusion weighted imaging (DWI), aiming at probing the microstructure as given by the diffusivity of water molecules, here with a focus on brain applications, is particularly SNR limited, which is due to the exponential decay of the water signal with the diffusion sensitization factor b. In addition, accelerated encodings have been proposed for denser diffusion sampling, which generally reduce the SNR per volume but increase the SNR per unit time (Setsompop et al., 2012). This has motivated an extensive literature on filtering methods for DWI that have evolved from standard filtering techniques on a per-volume basis (Basu et al., 2006) to more comprehensive procedures that jointly consider the information sampled in the diffusion space (Tristán-Vega and Aja-Fernández, 2010). In parallel, different spatial and diffusion representations have been suggested for DWI, which could aid denoising methods based on prior models. Recently, principal component analysis (PCA) based methods have been proposed for DWI denoising (Pai et al., 2011;Manjón et al., 2013). These techniques are based on the assumption that sampling has been made redundant enough so that the covariance of the signal when arranged in a matrix of local spatial patches of M elements times the number of sampled volumes N can be explained by a number of components R much lower than min(M, N ), with the remaining observed covariance being attributable to noise. Further contributions have proposed to estimate R by resorting to random matrix theory, namely to the Marčenko-Pastur law that governs the empirical spectral noise distribution, both for DWI noise estimation (Veraart et al., 2016a) and denoising (Veraart et al., 2016b). Assuming that the number of components is much lower than the matrix dimensions and the signal is strong enough, the noise will be confined into an interval of well separated small eigenvalues, so that a truncation threshold can be defined from a statistical viewpoint.
Most of the denoising techniques for DWI have been applied to magnitude-only datasets provided by scanner reconstructions, the most widely accessible image data type. However, in magnitude reconstructions the noise statistics depend on the signal level, especially at low SNR. This is typically addressed by postprocessing or iterative techniques for Rician bias correction (Koay and Basser, 2006) but these may only provide moderate benefits, particularly when considering the price of significantly increased modeling and computing complexity. Following a different path that recognizes the complex nature of the acquired data simplifies the statistical treatment, particularly in very low SNR regimes, potentially the most rewarding situation when denoising. This was demonstrated a long time ago (Bernstein et al., 1989;Wood and Johnson, 1999) and applied to wavelet-based DWI denoising on a volume per volume basis in Wirestam et al. (2006). Additional studies have investigated the negative influence of the bias introduced by magnitude data in the reliability of derived diffusion features (Jones and Basser, 2004;Bernd et al., 2009).
By founding the denoising criterion in universal properties of noise and being model-free, random matrix based approaches are attractive tools for denoising the complex raw data right after the reconstruction. At this stage, the statistical noise properties can be well characterized and the local inter-volume information content of the signal may admit a compact singular value decomposition (SVD). In contrast, model-based approaches, unless strongly founded in diffusion data properties, may be more prone to suppress relevant information or to introduce spurious features, with magnified risks of altering the diffusion estimates considering usual needs for subsequent data pre-processing. Convincing results were provided in Veraart et al. (2016b), for instance, about the preservation of high resolution data features after filtering.
Consequently, we propose an extension of the random matrix denoising method in Veraart et al. (2016b) that departs from homoscedastic and uncorrelated noise and hard separability of signal and noise assumptions by making use of the results in Benaych-Georges and Nadakuditi (2012) for optimal singular value shrinkage under a general Marčenko-Pastur law. This extension allows our method to be applied in general acquisition settings not covered by the independent and identically distributed noise model, such as partial Fourier and temporally alternating encodings, and to replace the truncation operation with optimal shrinkage rules stemming from the noise model under consideration. Our pipeline preserves noise additivity by operating in the complex domain, with a strategy being introduced to limit the signal complexity due to unpredictable temporal phase variations. In addition, random matrix theory results are fully exploited by making use of a criterion to determine optimal patch sizes ( § 2). In-vivo experiments are performed on high b-value adult data, a challenging dataset comprised of highly accelerated neonatal DWI acquired with a particularly involved encoding strategy to maximize data robustness against sporadic motion and distortions, and strongly motion degraded low SNR fetal DWI acquisitions. Benefits and/or increased flexibility of modeling the noise properties by accounting for the adopted reconstruction operations, working with complex data while correcting for phase inconsistencies, simulating the spectral noise properties by Monte Carlo methods or computing them by solving generalized Marčenko-Pastur equations, and using singular value shrinkage versus singular value truncation are quantitatively assessed by making use of estimation risks. In addition, added value in retrieved diffusion information and derived measures is visually illustrated ( § 3). The source code of a MATLAB implementation of the signal recovery method and the data required to reproduce some of the results in Figs. 4, 5 and 8 is made available at https: //github.com/mriphysics/complexSVDShrinkageDWI/releases/tag/1.0.1. Finally, we discuss the benefits and limitations of the proposed approach and provide some lines for improvement and extension ( § 4) to end up with some concluding remarks ( § 5).
2 DWI estimation as a matrix recovery problem DWI estimation in the presence of noise can be formulated as a set of matrix recovery subproblems by arranging the DWI data into a set of matrices Y of size M × N . Each of these matrices is assumed to be the result of the additive corruption of an underlying (approximately) low but unknown rank R matrix X with a random matrix with Gaussian noise entries W, so we can write Y = X + W. The objective of each subproblem is to provide an estimate of the signal matrixX with some guarantees on the estimation risk induced by a given loss function C(X,X) that quantifies the error between the estimated and the actual data. In what follows we examine how the DWI data can comply with the requirements of this model and describe the proposed image estimation techniques. First we pay attention to the nature of the complex diffusion signal ( § 2.1), then we describe the propagation of noise in the reconstruction ( § 2.2), later on we present the adopted algorithmic pipeline ( § 2.3), and finally we describe the procedure for matrix recovery ( § 2.4)

Low-rank DWI signal
For the low-rank condition R ≪ min(M, N ) to reasonably hold, we should arrange the DWI data in a way that maximizes the redundancy in the entries of the matrices Y. Arranging the data from local patches along the rows and the diffusion measures along the columns of Y, simultaneously taking advantage of spatial and diffusion redundancy, has been proposed for instance in Pai et al. (2011);Manjón et al. (2013); Veraart et al. (2016b) as the local PCA model for diffusion denoising. We can define a set of local patch extraction matrices indexed by their center voxel index q, P [q] , q ∈ Q = {1, . . . , Q} of size M × Q, with Q the number of voxels in each data volume, Q the set of voxel indexes, and M the number of voxels in the local patch, which are indexed by m ∈ M = {1, . . . , M }. We make p [q 1 ] m←q 2 = 1 if the voxel indexed by q 2 is the m-closest to the voxel indexed by q 1 , and 0 otherwise and we define a spatial redundancy metric using the Euclidean distance between voxels d q 1 q 2 = |r q 2 − r q 1 |, with r q the position vector of the voxel indexed by q.
A potential source of spurious complexity in the data is given by artifacts. For complex DWI data, we have to pay special attention to the phase of the signal. The DWI phase varies in a largely unpredictable manner during the acquisition mainly due to patient motion during the scan (Bammer et al., 2010). Larger gradient strengths imply increased levels of phase fluctuations and reduced SNR, so accurate phase corrections are especially difficult for high b-values. Thus, our method performs phase corrections on the basis of a robust gross approximation of the underlying phase (see § 2.3), which can increase the redundancy of the signal without introducing undesired biases. For now, we denote the reconstructed volumes for the set of diffusion measurements as y, their phase corrected versions asỹ, the local information for the patch centered at [q] after phase correction asỹ [q] = P [q]ỹ and express patch-to-matrix assignments as Y [q] ←ỹ [q] . In this setting, DWI denoising involves solving Q ′ ≤ Q matrix denoising subproblems that provide a set of estimates for the signal at q, with one estimate for each patch the voxel q belongs to, so that a patch combination rule can be implemented to build the resulting denoised DWI volumes (see § 2.4).

Additive Gaussian noise in DWI reconstruction
Measured DWI data is assumed to follow a circularly symmetric complex Gaussian stationary noise distribution of zero mean and a given variance σ 2 , CN (0, σ 2 ) modeling the Johnson-Nyquist noise observed in the quadrature receiver. In parallel imaging, levels and correlation of noise in each of the receivers are described by a noise covariance matrix Λ z , which reads as the covariance Λ of the measured data z, so the noise follows CN (µ z , Λ z ) where µ z = 0. Image reconstruction of each of the DWI volumes n ∈ N = {1, . . . , N } can be performed independently as a linear inversion -a.k.a. sensitivity encoding (SENSE) method (Pruessmann et al., 1999) with E n the encoding operator.
Assuming for now that the encoding is the same for all volumes, the basic encoding operator in parallel MRI can be expressed as E = FS, with F accounting for Fourier encoding (encompassing the applied spectral sampling structure) and S for sensitivity encoding. A whitening transformation Υ z of the measured receiver noise satisfying Υ H z Υ z = Λ −1 z is used in practice to transform the data z = Υ z z and the encoding operator E = Υ z E = FΥ z S = FS. This generates an equivalent inverse problem where the channel noise is decorrelated and standardized (Pruessmann et al., 2001), with noise in z following CN (0 KLN , I KLN ), with 0 KLN and I KLN denoting respectively the null vector and identity matrix of size KLN , where K is the number of spectral samples and L is the number of coils.
Due to reconstruction linearity, y follows CN (µ y , Λ y ). The expected value of noise in the reconstructed data is simply with ⊗ denoting the Kronecker product. On the other hand, the noise covariance is the same for all volumes: with Σ Q referring to a given covariance matrix of size Q. As for patch-based denoising, we are interested on the local patch marginalization of the noise covariance: Before building our random matrix recovery model, let's review the properties of noise in different reconstruction scenarios: • Additive noise. This property will generally hold for complex reconstructed data, as the application of a linear reconstruction operator preserves noise additivity (3). However, this additive property is broken for magnitude-only data due to the non-linearity of the magnitude operation, particularly at low SNR. Using magnitude data, the noise standard deviation will depend on the signal level in agreement to a Rician distribution (Gudbjartsson and Patz, 1995), which prevents a straightforward formulation of the matrix denoising problem. In a somewhat naive manner, one could still apply the singular value shrinkage to magnitude data but, as we will illustrate in § 3.5, for low SNR the noise bias would blend into the signal components so, even if there is some noise reduction, the diffusion contrast appears hampered.
• Standard noise. This property is satisfied when Λ y = I QN , which is generally not the case for reconstructed MRI data. We consider different scenarios with increased noise modeling complexity: -Full noise independence. The simplest situation occurs when the Fourier encoding operator is given by the discrete Fourier transform (DFT), F = F , which corresponds to Cartesian sampling without acceleration and with matched acquired and reconstructed grids (identified hereafter as case FULL). Due to unitarity of the DFT operator, F H = F −1 , the spatial noise covariance can be written S is a spatially diagonal operator, so it acts on each sampled voxel q separately. Then, it can be arranged for each voxel as a column vector of size L × 1, S q = s l , and Σ FULL Q gets reduced to a diagonal matrix comprised of scalar variances: with σ 2 q referring to the elements in the diagonal. -Local independence. A common reconstruction setting is that of parallel imaging acceleration via homogeneous spectral undersampling (case UNDER). In this case the Fourier encoding operator can be expressed as F = F U, with U a spatial overlapping operator from the reconstruction to the acquisition field of view (FOV). Accordingly, the noise covariance is given by: The spatial noise correlation can be represented by a sparse matrix with non-zero entries only along the main diagonal and the U − 1 element pairs corresponding to U aliased voxels in the reduced acquisition FOV. Considering a given aliased dimension i, these correlations will only affect those voxels separated by uK i , with u ∈ U = {0, . . . , U − 1} and K i the number of acquired samples along dimension i. One can define a sensitivity matrix * S q U = s l,u for the overlapped voxels q U = {q 0 , . . . q U −1 } so that noise covariances are given by with Q/U the quotient of Q by U. The local covariance matrix in (5) becomes diagonal due to local independence, and we can write Σ UNDER,Q 1 = σ 2 q U ,q U , computed using the set of overlapping voxels including voxel q. Normalization by this matrix has been previously proposed as the reconstruction in SNR units (Kellman and McVeigh, 2005) and, for accelerated acquisitions, the ratio between σ q U ,q U and σ q corresponds to the g-factor times the sampling ratio between non-accelerated and accelerated acquisitions at voxel q (Pruessmann et al., 1999).
-Spatial correlations. More general settings arise for more complex reconstruction techniques or when the data is acquired in a non-Cartesian manner. Typical examples include when the Fourier operator involves regridding, or Gibbs ringing filtering and/or zero-filling interpolation are applied to the reconstructed data. These situations will generally break the noise independence. The specific noise properties will depend on the reconstruction operator. However, it is generally possible to completely characterize the noise propagation through a linear reconstruction method by making use of a given noise covariance matrix over the local patches, as defined in (5), which can be obtained either analytically or by Monte Carlo simulation. Here, as an example of interest in our data, we focus on partial Fourier (PF) or half scan acquisitions (case HALF) where the reconstruction model in (2) is extended by adding a spectral filter G, for instance the ramp filter used for homodyne detection in Noll et al. (1991) or simply a zero-filling filter. Note that this model only includes the k-space weighting required to retrieve the target signal resolution. While standard PF reconstruction involves a separate phase estimation step, this can be integrated in the signal retrieval formulation within our setting. The noise covariance within this model is written: -Temporal heteroscedasticity. This can be the case when the encoding structure is non stationary, typically from an acquisition technique where more evenly distributed net information is pursued by alternating the encodings for different excitations. Here we focus on the method we have used in our neonatal cohort (Hutter et al., 2018), where the phase encoding (PE) direction is interleaved volumewise among the four possible Cartesian in-plane directions for increased tolerance to main field inhomogeneity induced distortions (Bhushan et al., 2014). We can write the reconstruction for this case (case INTER) using the most general noise model of those previously described, HALF, as where a(n) indexes the encoding used for volume n, with a ∈ A = {1, . . . , A}. In this situation we can define a covariance tensor where Σ HALF,a Q represents the spatial covariance matrix for the encoding a and Σ a N is an indicator diagonal matrix with ones in those entries that correspond to the volumes acquired with encoding a. Figure 1 sketches the steps of our denoising procedure. Here is a detailed description:

Denoising algorithm
• Reconstruction. We use the SENSE reconstruction framework in (2). Our datasets contain both in-plane and simultaneous multi-slice (SMS) accelerated echo planar imaging acquisitions, so the reconstruction method follows the extended SENSE technique in Zhu et al. (2016). Noise correlation introduced by the readout gradient ramps is considered negligible. Sensitivity estimates from a conventional reference scan are refined with the information from non-SMS reference acquisitions with matched readouts (Hennel et al., 2016) to promote matched coil map and image distortions, which explains why the sensitivity matrices S depend on the applied encoding in (11). Sensitivity estimation uses the variational formulation in Allison et al. (2013).
• Demodulation. A model for phase corruption of the DWI signal in the presence of bulk motion was derived in Anderson and Gore (1994). Using this model, the temporal phase variations can be corrected by estimating a linear phase Φ for each of the reconstructed slices, which we have done by  Figure 1: Block diagram of the DWI denoising algorithm exemplified with a fetal dataset. Note PF data z has been acquired which produces a non-symmetric spectrum in the PE (horizontal) dimension. After the inversion of the DFT and spatial folding during the reconstruction, we have access to the magnitude and phase of y and the noise covariance Λ y , here illustrated by the spatial noise amplification levels. Later on, a linear phase Φ is estimated and removed from the data, so we obtainỹ. Optimal patch sizeγ is estimated to perform signal prediction,x, and phase demodulation is reversed to provide the final estimatex.
detecting the position of the spectral peak and its phase, and subtracting it from the data,ỹ = Φ H y. This simple phase model is expected to be robust enough to be applied in low SNR regimes and to largely reduce the dynamic phase fluctuations, thus increasing the redundancy of the data and consequently reducing the rank of the matrices to be recovered. Phase correction of linear ramps in the PE direction may alter the temporal noise properties for PF, which could be incorporated into a model like (12), but for computational simplicity we assume that this effect is negligible.
• Patch estimation. The choice of the patch dimension M , or equivalently the matrix aspect ratio γ = M/N , may impact recovery performance. Using patches that are too large may prevent retrieving highly localized data features while if they are too small the denoising potential will get compromised. Thus, we propose to optimize the patch size by minimizing the relative asymptotic mean squared error (RAMSE) of matrix estimation over a set of candidate γ (j) : 0 < γ (j) < 1, j ∈ J = {1, . . . , J}. The procedure is analogous to matrix recovery, so both will be described in § 2.4.
• Signal recovery. The optimal aspect ratioγ is used to build the patches for matrix recovery, which allows a denoised estimatex to be obtained from the observed dataỹ.
• Modulation. The phase subtraction performed in the demodulation step can be reversed to obtain an estimate of the complex DWI signal,x = Φx.

Matrix recovery
In this Section we describe both the procedure for matrix-based signal recovery and the related patch size estimation technique. The generic recovery pipeline is sketched in Fig. 2. Its inputs are the reconstructed and demodulated data from one of the reconstruction techniques described in § 2.2,ỹ, the noise covariance induced by the reconstruction, Λ y , and the optimal aspect ratio provided by the patch size estimation procedure described at the end of this Section,γ. Here we summarize its blocks: Figure 2: Block diagram of the signal recovery procedure. Local information within ball-shaped sliding patches is extracted by P. Random signal plus noise matrices Y and covariance tensors Λ Y are constructed for each of the patches [q ′ ]. SVD of these matrices is performed with singular vectors in V andṼ H and singular values in η Y . In parallel, noise properties are modelled by a spectral distribution f (η W ), which allows optimally shrunk singular values,η X , to be computed. Matrix estimatesX are synthesized using the shrunk singular values and overlapped patch information is assembled onto the signal estimatesx.
• Patch extraction. Patch extraction uses the procedure described in § 2.1. To accelerate computation, the number of patches to cover the FOV is lower than the total number of voxels, Q ′ ≤ Q. In our tests, patch sliding is performed with a subsampling factor of 2 in all directions, so Q ′ ≃ Q/8, with negligible impact in quality. This step outputs a patch extraction operator P, used later for patch assembling, the matrices to be recovered, with local information for a given patch arranged along the rows and corresponding diffusion measures along the columns, Y [q ′ ] , and a set of local noise covariance matrices, Λ Y in its diagonal.
• Shrinkage. The matrix recovery problem via SVD is grounded on the related spiked covariance model (Johnstone, 2001), which was proposed to study the distribution of eigenvalues of sample covariance matrices drawn from population covariances whose eigenvalues η 2 X are all equal to 1 rather than for a small subset of them η 2 Xr , 1 ≤ r ≤ R, R ≪ min(M, N ), that satisfy η 2 X1 ≥ . . . ≥ η 2 XR > 1 and are independent of the number of variables M and observations N . Then, defining an asymptotically constant ratio of variables and observations γ = M/N , it has been shown (Baik and Silverstein, 2006) that, whenever η 2 Xr > 1 + √ γ, the spiky eigenvalues of the sample covariance matrix η 2 Yr tend to a deterministic limit when N → ∞ given by η 2 Yr = η 2 Xr (1 + γ/(η 2 Xr − 1)); i.e., there is a positive bias in the sample spiky eigenvalues as compared to the eigenvalues of the population η 2 Xr . Building on this model, a set of optimal shrinkage criteriaη 2 Xr = h(η 2 Yr ) for different cost functions L(X, X) have been derived in Donoho et al. (2018). In this paper we focus on the Frobenius loss criterion but keeping in mind that similar ways of proceeding are possible for other losses.
The condition of almost all population eigenvalues equal to 1 in Donoho et al. (2018) makes this covariance estimation problem analogous to the matrix denoising problem in the presence of additive independent and identically distributed (i.i.d.) Gaussian noise, for which optimal shrinkers for different cost functions have been derived in Gavish and Donoho (2017). However, the description of noise propagation in the reconstruction in § 2.2 shows that the i.i.d. condition is generally not valid in DWI reconstructed data. Benaych-Georges and Nadakuditi (2012), using a more general model than Gavish and Donoho (2017), showed that the asymptotic limits of the singular values of additive perturbations of zero-mean Gaussian not necessarily i.i.d. noise-only matrices W only depend on integral transforms of the empirical spectral distribution (ESD) of noise, that is to say, on the distribution of the singular values of matrices with noise-only entries, provided that the perturbation signal matrix is drawn from a bi-unitarily invariant distribution. Nadakuditi (2014) showed that those limits can be used to build optimal shrinkers, to be understood as those that minimize the asymptotic loss among all possible shrinkers, for unitarily invariant costs such as (13).
Following Benaych-Georges and Nadakuditi (2012), we can define the D-transform as with the ESD probability density f (η W ) -or alternatively the ESD cumulative density F (η W )depending on the matrix noise covariance and the aspect ratio as described in the next item, and η + W referring to the upper bound of the support of f (η W ). Then, it turns that an optimal shrinker for the Frobenius norm in (13) can be obtained in terms of the ratio of the D-transform of the observed singular values above the detection threshold and its derivative (Nadakuditi, 2014): Note that defining the set of non-fully suppressed components R = {r : η Yr > η + W }, a rank estimate is obtained asR = |R|, with | · | denoting the set cardinality. Consequently, in contrast to other DWI denoising techniques (Pai et al., 2011;Manjón et al., 2013;Veraart et al., 2016b), our proposal does not impose truncation of the SVD components, as truncation thresholds, even when optimally chosen, have been shown to compare unfavourably with optimal shrinkage Donoho, 2014, 2017).
The benefit of shrinking with respect to truncating is relevant in situations where a given component may include comparable contributions of signal and noise, which we have seen to generally occur in our datasets.
• ESD modelling. The Marčenko-Pastur theorem (Marčenko and Pastur, 1967;Silverstein, 1995) states that the ESD of the population covariance matrix Σ M converges almost surely to a bounded distribution that can be described by a fixed point equation. Numerical tools (Dobriban, 2015;Ledoit and Wolf, 2017) have been developed to compute the ESD for noise covariance matrices of the form Λ Y = Σ M ⊗ I N , which applies to the stationary encoding models in (7), (9) and (10). Although analogous fixed point equations (Wagner et al., 2012) and eigenvalue confinement guarantees (Kammoun and Alouini, 2016) are available for more general noise models such as those of non stationary encodings in (12), we do not know of any numerical tool developed for general random interactions between the matrix entries. Thereby, here we adopt a more flexible approach that approximates the asymptotic spectrum of noise by Monte Carlo simulations (Jing et al., 2010). This can be performed by drawing with η + W ≃η W1 . However, as a result of our investigations to directly compute the asymptotic noise spectrum, we have developed some techniques for solving the generalized Marčenko-Pastur equation in the setting of (12). Although due to computational efficiency these developments have not been adopted in this paper, we have collected them in a companion report .
• Synthesis. Estimated signal matrices are reconstructed for each patch using the shrunk singular values, a diagonal matrix built from the singular valuesη [q ′ ] X obtained by (15).
• Patch assembling. The redundancy metric in (1) can be used to define a Gaussian weight for the estimate of the voxel indexed by q 2 from the patch centered at q 1 : M representing the distance of the M -closest voxel to q 1 . Thus, larger weights are given to those estimates for which the patch center q 1 is closer to the point location q 2 where we are estimating the diffusion signal, as corresponding patches are more likely to contain similar diffusion components to those at q 2 . Then, the estimated DWI volumes are obtained aŝ Limitations and alternatives will be discussed in § 4.
The procedure for patch size estimation is sketched in Fig. 3. It uses the same inputs and analogous methods than the decomposition for matrix recovery, but the objective is to select the patch size that yields the best RAMSE. In what follows we describe the implementation details and the formulas for RAMSE estimation: Figure 3: Block diagram of the patch estimation procedure. Averaged RAMSE estimatesĈ over a subset of the patches used for denoising Q ′′ are computed for a set of candidate matrix aspect ratios γ (j) (which induce a set of patch radii). The minimum RAMSE candidateγ is used for denoising.
• Patch extraction. This step is analogous to that for actual denoising but a smaller number of patches Q ′′ is used to cover the FOV. Drawing a moderate number of patches will generally provide enough information for reliable global RAMSE estimates. We have used a subsampling ratio of 6 in each direction, so Q ′′ ≃ Q/216, with which the computation time of patch estimation remains below the time for denoising.
• SVD / ESD computation. These steps are the same as for actual denoising.
• RAMSE estimation. This step uses additional results provided in Nadakuditi (2014) to obtain estimates of the asymptotic mean squared error of matrix recovery based on integral transforms of the ESD evaluated at the observed singular values above the detectability threshold. Namely, for the optimal shrinker in (15) To determine the optimal aspect ratio, these costs are normalized as followŝ This feature is computed for a set of candidate γ (l) and theγ offering the minimum RAMSEĈ is chosen for denoising. Moreover, these asymptotic error estimates can be used to assess the expected performance of different denoising alternatives, which will be a key ingredient for validating our approach in § 3.

Validation and results
In this Section we validate the different contributions of our framework for the brain DWI domains and specific sequences described in § 3.1. In § 3.2 we show the robustness of the noise measurement and propagation approach. Improved recovery by using phase correction is tested in § 3.3. Effects and pertinence of applying our tools for generic noise models are studied in § 3.4. Finally, in § 3.5 we visually compare our method and the related approach in Veraart et al. (2016b).

Explored applications
A series of experiments have been conducted including three distinct brain DWI application domains: high b-value exploration in adult brain imaging, multi-shell high angular resolution neonatal examinations, and multi-shell fetal imaging. All the volumes in the experiments have been acquired using a 3 T PHILIPS ACHIEVA TX with L = 32-channel coils.  DWI recovery in these domains confronts different challenges. In the adult case, the DWI signal for high b-values may fall below the Rician noise floor so the complex valued signal model seems advisable. In the neonatal case, we encounter a challenging acquisition protocol, where high spatial resolution DWI information has been acquired along four different PE directions interleaved in the diffusion space, using a SMS factor of 4 and with large distortions due to low in-plane acceleration. In addition, these datasets are frequently hampered by motion artifacts, partly because the newborns have been scanned without sedation. Finally, in fetal examinations the targeted brain data is collected from locations that may be far away from the coils, with consequent low SNR levels. In addition, there is breathing motion and perhaps fetal motion throughout the acquisition, so that retrieval and representation of DWI information becomes extremely challenging. Two T E values are reported for this dataset as it has been acquired using a dual hybrid spin echo and field echo sequence where the dead time after the application of the spin echo is used to collect some field echo information that is later used to correct for dynamic main magnetic field variations, which will be the subject of a future publication. Both echoes are acquired with exactly the same encoding structure, so we can easily incorporate the multiple echo information to our denoising technique, as previously proposed in Bydder and Du (2006).

Propagation of noise measures
When the noise level is not available before denoising, different techniques have been proposed for its estimation. In Veraart et al. (2016b) the underlying matrix rankR and the noise levelσ are simultaneously estimated using statistics on the expectation of the i.i.d. noise eigenvalue distribution. This estimator is identified hereafter as EXP1. The procedure is summarized by (21) within Table 2; the candidate matrix rank is iteratively increased until a given criterion for the rank estimate depending on standardized and decreasingly sorted eigenvalues is satisfied, where the standardization is performed using the estimate for σ at the previous iteration. We have studied this estimator and observed a biased behaviour when applied to close to square matrices, so we propose a simple modification that we have identified as EXP2 in (21). On the other hand, in Gavish and Donoho (2014) a simple estimator is proposed where the standard deviation of noise is computed as the ratio of the median sample eigenvalue, η 2 YM/2 , and the median of the Marčenko-Pastur law that governs the distribution of sample eigenvalues for standard additive white Gaussian noise, θ 2 M/2 , as shown in (22). This estimator will be identified as MED.

Technique / Reference
Noise estimate Spectral expectation Veraart et al. (2016b) Spectral median Gavish and Donoho (2014) To show the correctness of noise propagation through the reconstruction as well as the limitations of these data-based noise estimation methods, we use the full Fourier adult dataset (PF factor of 1 in Table 1). In this case, noise is captured by (8). Strictly speaking, the noise level within a given patch is not unique as it changes according to the spatial variation of the receiver field and the g-factor profiles, with the latter potentially inducing abrupt changes in the noise properties of neighboring locations. To avoid this inherent limitation of data-based noise level estimation when operating in signal units, our pipelines are modified in this experiment so that the data is standardized by the application of a whitening transformation Υ M that normalizes by the noise factors. Note that this transformation will preserve the rank of the signal matrix provided that Σ M is positive definite and optimal asymptotic loss guarantees can now be pursued on the invariant Frobenius loss (Canu and Fourdrinier, 2017), C FROB,Υ M (X, X) = ∥Υ M (X − X)∥ 2 2 . If our noise measurement and propagation model is correct, the estimated noise level after standardization should beσ = 1 everywhere. We check whether this is the case by running the estimators in Table 2 for reconstructions of noise-only data with the noise injected in place of the original data after the channel noise standardization described in § 2.2. These results are compared with the estimations obtained when reconstructing the original signal plus noise data. Fig. 4a compares the cumulative histograms of noise estimates at the different spatial locations when using the assembling technique in (18). Results show that when estimating noise levels in the absence of additive signal perturbations (casesσ W ) the distribution of estimates takes the value 1 everywhere, with only minor numerical deviations. This shows both the consistency of the estimators in Table 2 and the correctness of our noise propagation method. However, in the presence of signal perturbations (casesσ Y ), the estimators in Table 2 start to deviate from their asymptotic behaviour, with positive biases for EXP2 and MED and potentially larger dispersions for EXP1, which shows the limitations of data-based noise estimation techniques. Figs. 4b,c show exemplary spatial maps with the estimated noise levels for the noiseonly data and the signal plus noise data. For this experiment only the shape corrected version of (21), EXP2, has been used, which has been judged to provide the most benign dispersion and bias tradeoff according to Fig. 4a. On the other hand, Figs. 4d,e compare the estimated ranks when using EXP2 and our noise propagation and asymptotic random matrix rank estimation. The results are in agreement with the elementary interplay where noise overestimation of EXP2 (Fig. 4c) is accompanied by lower rank estimates (Fig. 4d).  Figure 4: a) Cumulative spatial distributions of noise estimates for different estimators without (σ W ) and with (σ Y ) signal perturbation. b,c) Spatial maps of noise for estimator EXP2 b) without and c) with signal perturbation. d,e) Rank estimatesR, given as a percentage of the number of volumes N , for estimators d) EXP2 and e) ours. All results correspond to the RAMSE-optimal aspect ratioγ = 0.85.

Effect of phase correction
In Fig. 5a we show the spatial distributions of the improvement ratio in the RAMSE when using the full method (PC) versus removing the phase correction in § 2.3, which is computed as ρ = 1 −Ĉ PC /Ĉ NPC and given as a percentage. Plots are generated using brain-only information and they include the adult case and exemplary neonatal and fetal studies drawn at random. A reduction of the RAMSE is observed in all tested cases when using phase correction, with reduction ratios ranging from 0 to 30% depending on the location. In Figs. 5b-d we illustrate with comparisons of original data, results of denoising without phase correction and results of denoising with phase correction in an exemplary subject, slice and orientation within the highest b-value shell for the neonatal case. Despite the estimator operates on complex data, for the sake of economy we only provide magnitude displays of the results. Emergence of plausible anatomical features with better preserved resolution can be observed when using phase correction, for instance in the interfaces indicated by the arrow.

Adult
Neonatal Fetal 0% 5% 10% 15% 20% 25% 30% Figure 5: a) RAMSE improvement ratio ρ of denoising with phase correction (PC) versus denoising without phase correction (NPC). b-d) NPC and PC denoising results for an exemplary slice at b = 2.6ms/µm 2 in the neonatal case: b) original data, c) NPC denoising, d) PC denoising . Same estimated aspect ratiosγ have been obtained for NPC and PC denoising, respectively of 0.85, 0.8 and 0.9 for adult, neonatal and fetal data. Mean RAMSEs with phase correction have been 0.011, 0.012 and 0.068 respectively for the adult, neonatal and fetal cases.

Denoising with a generic noise model
The flexibility of the denoising scheme is illustrated by its application under different reconstruction operators and associated noise models. In a first experiment we use the fetal data to compare three approaches for PF reconstruction. In the first approach, hereafter referred to as regridded, the reconstructed PF data is regridded so that only the sampled area of the spectrum is back-transformed and consequently the data is represented in a coarser grid in the PE direction but with preserved local noise independence. In this case, the noise model follows (8) and the spatial patches are drawn elliptically rather than spherically. After denoising, the data is zero-filled and ramp-filtered as in Noll et al. (1991) to retrieve the desired grid and resolution. As for the second approach, referred to as zero-filled, G in (10) takes the role of a zero filling filter, G ZF , with a ramp filter being applied after denoising. The third approach, referred to as ramped, applies the zero filling and ramp filters before denoising, so G = G RA . Figs. 6a,b show, for the spin echo data at the highest b-value shell, the power spectral density (PSD) of the recovered magnitude signal in both the readout and PE directions for the three denoising approaches and the original data. The PSD along a particular dimension is estimated by averaging along the remaining dimensions and the diffusion orientations. The PSDs generally follow a plausible linear decay after denoising. Overall, the ramped approach provides stronger noise suppression than the other two in the readout direction (as given by more pronounced PSD decay ratios) and similar suppression than the zero-filled approach in the PE direction, with weaker suppression and implausible shape for the regridded PE PSD at high frequencies. These results are in agreement with the implicit SNR-resolution tradeoffs of the different models used for denoising, with lower SNR of ramped when compared to zero-filled (Noll et al., 1991) and larger targeted resolution of zerofilled versus regridded. The low rank prediction adapts to the different SNR-resolution tradeoffs as reflected by the differences in resulting PSD profiles. To gain further insights about these results, the denoised images are compared in Figs. 6c-f, showing the reference data (Fig. 6c) and the data denoised with the considered PF approaches. Regridded denoising (Fig. 6d) produces some spurious oscillations along the PE (horizontal) direction (see the area enclosed by the ellipse), which agrees with the shape of the PE PSD response at high frequencies. As for the zero-filled (Fig. 6e) and ramped ( Fig. 6f) results, differences are more subtle but stronger background noise suppression can be appraised in the ramped case (see the area enclosed by the circle), which seems also in agreement with the slightly stronger decay of the readout PSD for this approach. These experiments show that our denoising method provides flexible ways of balancing the SNRresolution tradeoff in PF reconstructions considering, for instance, the noise amplification introduced when ramp filtering in homodyne reconstructions. They also show that while it is possible to reduce the problem to one with independent noise samples along the local patches by manipulating the reconstruction grid, this has an impact on the results at the prescribed grid and resolution. Resulting images c) before denoising, and when denoising using the d) regridded (γ = 0.65), e) zero-filled (γ = 0.85) and f) ramped (γ = 0.9) approaches.
In the neonatal case, four PE directions were interleaved throughout the acquisition, each of them generating its own spatial noise distribution. Two strategies to deal with these datasets are compared. First, the problem is reduced to four independent denoising subproblems that use the volumes corresponding to each PE separately (SPE method). Second, the noise model in (12) is considered so the different PEs are treated jointly (JPE method). The comparisons are performed at a group level using a sample of 87 subjects chosen uniformly at random from the set of complete scans within the neonatal cohort and using G = G RA for both strategies. The RAMSE averaged on automatically extracted brain voxels is compared in Fig. 7a. A reduction of the average RAMSEĈ has been observed for all subjects when using the JPE approach with an average improvement ratio of ρ = 11.24% (obtained as in § 3.3). These results give a negligible p-value when performing a paired right-tailed sign test against the null hypothesis that the median ofĈ JPE −Ĉ SPE is greater than zero or zero. Left column of Figs. 7b,c shows that the recovered brain signal appears similar at coarse scales. However, when zooming in, as in the central column, better delineated edges are observed when using the JPE model, as observed in the ascending tract feature highlighted by the arrow. The right column explains this effect by resorting to the rank estimates, with JPE estimated signal ranks generally lower and smoother than the sum of SPE estimates for each PE; more data samples have been made available by leveraging all PEs so that any residual redundancy in the information contained in the different PEs will increment the denoising potential.

Complex denoising for unbiased diffusion measures
This Section provides visual evidence of the benefits of operating with complex data for denoising and performs comparisons with the state of the art method in Veraart et al. (2016b), which has been modified so as to use the EXP2 noise estimation correction in (21). To separately assess the technical refinements introduced in matrix recovery, comparisons are also performed when adapting this method to operate on complex data. Due to complexity of neonatal and fetal data preprocessing for motion and distortion correction and the limited applicability of Veraart et al. (2016b) under general noise models, we use diffusion measures of non-preprocessed adult data. Similar properties of magnitude and complex data retrieval have been observed in neonatal and fetal examples, sometimes with differences between them being enhanced by the prepocessing pipeline limitations for low SNR. Fig. 8 shows the denoising results for the b = 10ms/µm 2 shell using an exemplary slice and orientation (top) together with the estimated spherical harmonic power at orders l = 0 (mid) and l = 8 (bottom). The denoising capabilities are evident when comparing with the original data in Fig. 8a. However, the retrieved signal differs noticeably for the tested approaches. Magnitude-only denoising in Figs. 8b, using Veraart et al. (2016b), and 8c, adapting our method, provides biased estimates when compared to complex denoising in Figs. 8d, adapting Veraart et al. (2016b), and 8e, using our method, especially in free diffusion areas. The bias reduces the contrast between restricted and free diffusion, with hampered detectability of the optical nerve for l = 0 (highlighted by the arrows).
Complex denoising also appears more effective preserving the diffusion features for l = 8, for instance at the cerebellum (highlighted by the arrows). Differences between data-based noise estimation and hard thresholding (Fig. 8d) versus propagation of noise measures and optimal shrinkage (Fig. 8e) are more subtle.
However, it appears that the high spatial frequency harmonics are more accurately recovered when using the methods here proposed, with more compactly structured features in 8e versus 8d (top row and l = 8). A quantitative estimate of improvement can be obtained by looking at the differences in the asymptotic mean squared error of both approaches, which can be accomplished by using (19) and the distance between the estimated SVs and the optimal SVs (Nadakuditi, 2014). For this dataset we have observed an estimated asymptotic error reduction of 33.20 % when using our method versus the complex version of Veraart et al. (2016b).

Discussion
We have presented a signal recovery algorithm for DWI based on singular value shrinkage of Casorati matrices where local spatial information and diffusion measures are respectively arranged along the rows and the columns. Due to the limited diversity of spatial and diffusion information, particularly when correcting for unpredictable phase variations due to bulk motion during the application of the diffusion gradients, it is reasonable to assume that the signal component within these matrices is approximately low rank, while the sampling noise properties can be accurately modeled when the scanner noise measures are propagated through the reconstruction. This allows the application of recent results on the asymptotic limits of additive perturbations of random matrices, most essentially the confinement of the noise bulk singular values, to obtain an objective estimate of the signal components as well as to provide a criterion for optimal patch size selection. The method prevents recovery biases by using complex data information and has been applied to different brain DWI cohorts with scans obtained by advanced protocols that make use of SMS accelerated and interleaved encodings. Quantitative experiments have been conducted to validate the different assumptions and to compare to a related approach taking advantage of the estimates of the asymptotic error as provided by the theory. Our proposal guarantees optimal asymptotic risks when estimating diffusion information by singular value mappings, which stems from a synergistic combination of recent results on signal prediction using random matrix theory and the usage of the scanner noise measures. Signal recovery based on singular value shrinkage looks for the best possible asymptotic signal predictor obtained by only manipulating the singular values of convenient matrix arrangements while corresponding singular vector orientations (Benaych-Georges and Nadakuditi, 2011) are left unaltered. This model does not exploit other reasonable assumptions about the data such as the sparsity of the singular vectors in appropriate representation domains, which may contribute to a more accurate retrieval. In this regard, the applicability of approaches such as Ding (2017) could be a matter of future investigations.
Phase retrieval and correction has been proposed as a separate step to obtain a diffusion measure with additive noise properties, for instance, in Prah et al. (2010);Eichner et al. (2015). These methodologies, however, inherit the limitations of the implemented phase recovery strategy, namely, potential resolution loss when low pass filtering (Prah et al., 2010), or non comprehensive use of available information when  Veraart et al. (2016b); c) magnitude data denoising by adapting our method; d) complex data denoising by adapting Veraart et al. (2016b); e) complex data denoising by our method. Top: denoising results in an exemplary slice and orientation at b = 10ms/µm 2 . Mid and bottom: estimates of the spherical harmonic power for the same shell and slice respectively for orders l = 0 and l = 8. All results correspond to the optimal aspect ratioγ = 0.85. operating slicewise (Eichner et al., 2015). In contrast, our approach considers all the available information to simultaneously estimate the magnitude and the phase of the signal, so the retrieved signal accuracy is boosted by any given amount of redundancy in the complex data.
Patch construction in our denoising method is based on a spatial proximity metric but other criteria to build the spatial patches or the introduction of some notion of locality also in the diffusion space may help to improve the estimates provided that they are able to better promote the low-rankness of the induced matrices. We have used a simple patch assembling technique where the weights are computed by the Gaussian kernel in (17). This has been motivated by our main application domain, neurodevelopmental DWI, where the acquisitions are usually performed without the subject being able to stay still. In this situation, we think it becomes reasonable to give larger weights to the estimates of those patches whose center is closer to the voxel where we are estimating the diffusion signal, as they are more likely to include the diffusion components corresponding to that location throughout the scan. Statistically grounded alternatives have been proposed for patch assembling, for instance, in Dabov et al. (2007), where the variance of the estimation is used to weight the patch contributions. Although the asymptotic mean squared error could be used for that purpose, as already discussed in Dabov et al. (2007), minimum variance linear combination when patches overlap should consider the error covariances of the estimates, which is more involved.
Finally, attention should be paid to other degrading factors that may increase the rank of the matrices considered, most notably motion and non stationary distortions. In this regard, similarly to the phase correction adopted in this paper, the denoising model could include other correction steps prone to promote low-rankness while keeping the noise propagation tractability. In case that the low rank assumptions or the asymptotic regime become less justifiable, the matrix recovery problem may be reformulated using different estimation criteria (Yadav et al., 2017).

Conclusions
We have proposed a method for patch-based DWI retrieval based on random matrix theory. Our proposal extends previous contributions in the field by pushing the application of the theory to discontinuous, correlated and temporally heteroscedastic noise and considering more refined manipulations on the empirical singular values. Importantly, it overcomes the limitations inherent to the simultaneous estimation of the signal and the noise properties; in our approach, the latter are calculated or simulated by the propagation of the scanner measures through the reconstruction and spectral decomposition operators. Moreover, the objective nature of this approach to denoising, using only the asymptotic properties of additive noise perturbations of low rank matrices to find the optimal denoising strength without resorting to empirical parameter tuning, is used for patch size selection by means of the RAMSE estimates. Experiments have been conducted quantifying the benefits of noise modeling and SV shrinkage versus noise estimation and SV truncation as proposed previously in the DWI literature. In addition, evidence has been provided on improvements when promoting low-rankness by phase demodulation and on efficient operation in accelerated scanning regimes including inplane spectral subsampling, simultaneous multi-slice and partial Fourier as well as in temporally interleaved acquisition strategies. Finally, we have demonstrated the ability to produce denoised and debiased DWI estimates in challenging fetal, neonatal and adult neuroimaging applications.