HARDI DATA DENOISING USING VECTORIAL TOTAL VARIATION AND LOGARITHMIC BARRIER

In this work, we wish to denoise HARDI (High Angular Resolution Diffusion Imaging) data arising in medical brain imaging. Diffusion imaging is a relatively new and powerful method to measure the three-dimensional profile of water diffusion at each point in the brain. These images can be used to reconstruct fiber directions and pathways in the living brain, providing detailed maps of fiber integrity and connectivity. HARDI data is a powerful new extension of diffusion imaging, which goes beyond the diffusion tensor imaging (DTI) model: mathematically, intensity data is given at every voxel and at any direction on the sphere. Unfortunately, HARDI data is usually highly contaminated with noise, depending on the b-value, a tuning parameter pre-selected for data collection. Larger b-values help to collect more accurate information in terms of measuring diffusivity, but they also generate more noise, through many factors. So large b-values are preferred, if we can satisfactorily reduce the noise without losing the data structure. Here we propose two variational methods to denoise HARDI data. The first one directly denoises the collected data S, while the second one denoises the so-called sADC (spherical Apparent Diffusion Coefficient), a field of radial functions derived from the data. These two quantities are related by an equation of the form S = S_0 exp(−b · sADC) (in the noise-free case). By applying these two different models, we will be able to determine which quantity will most accurately preserve data structure after denoising. The theoretical analysis of the proposed models is presented, together with experimental results and comparisons for denoising synthetic and real HARDI data.


Introduction to the HARDI data
Currently, HARDI data is used to map cerebral connectivity through fiber tractography in the brain. HARDI is a type of diffusion MRI, which was introduced in the mid-1980s by Le Bihan et al. [27,28,29] and Merboldt et al. [35]. It is based on the idea that the magnetic resonance signal, which forms the basis of MRI, is attenuated when water diffuses out of a voxel, and the degree of attenuation can be used to measure the rate of water diffusion in any arbitrary three-dimensional direction, via the Stejskal-Tanner equation [42]. Water diffusion occurs preferentially in directions that are aligned with axonal fiber pathways, and is hindered in orthogonal directions by the myelin sheaths that coat the axons. Because of this diffusion anisotropy, initial approaches to assess fiber directions modeled the three-dimensional diffusion profile at each point as a single tensor (Beaulieu et al. [6]), in which the principal eigenvector of the diffusion tensor can be used to recover the dominant fiber pathway at that voxel. The diffusion tensor model (Basser et al. [5]) describes the anisotropic nature of water diffusion in tissues (inside a typical 1-3 mm sized voxel) by estimating, from a set of K diffusion-sensitized images, the 3×3 covariance matrix of a Gaussian distribution (Beaulieu et al. [6]). Each voxel's signal intensity in the k-th image is decreased, by water diffusion, according to the Stejskal-Tanner equation [42]

S_k = S_0 exp(−b g_k^T D g_k),

where S_0 is the non-diffusion weighted signal intensity, D is the 3×3 diffusion tensor, g_k is the direction of the k-th diffusion gradient, and b is Le Bihan's factor with information on the pulse sequence, gradient strength, and physical constants.
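As a quick numerical illustration of the Stejskal-Tanner attenuation, the sketch below uses a toy diffusion tensor and hypothetical parameter values of our own (not taken from the paper) to compare gradients parallel and perpendicular to a synthetic fiber:

```python
import numpy as np

def stejskal_tanner(S0, D, g, b):
    """Signal attenuation S_k = S0 * exp(-b * g^T D g) for a unit gradient g."""
    g = np.asarray(g, dtype=float)
    g = g / np.linalg.norm(g)
    return S0 * np.exp(-b * (g @ D @ g))

# Toy anisotropic tensor: fast diffusion along x, slow along y and z (mm^2/s).
D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
S0, b = 1.0, 1000.0  # b in s/mm^2

s_parallel = stejskal_tanner(S0, D, [1, 0, 0], b)  # gradient along the fiber
s_perp = stejskal_tanner(S0, D, [0, 1, 0], b)      # gradient across the fiber
print(s_parallel < s_perp)  # diffusion along the fiber attenuates more: True
```

Measuring S_k for many gradient directions g_k and fitting D to the ratios S_k/S_0 is exactly the DTI estimation problem described above.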
Unfortunately, although it is widely used, the diffusion tensor model breaks down for voxels in which fiber pathways cross or mix together, and these are ubiquitous in the brain, which is highly interconnected. More advanced image acquisition techniques, such as HARDI (Tuch et al. [47,48]) or diffusion spectrum imaging (also called q-ball imaging; Tuch et al. [49]), have been introduced in the past five years; this type of data recovers the local microstructure of water diffusion more accurately than standard DTI (diffusion tensor imaging) data. HARDI, DTI and other similar modalities permit non-invasive quantification of the water diffusion in living tissues. The tissue structure affects the Brownian motion of the water molecules, which leads to an anisotropic diffusion. By imaging diffusion in an arbitrary number of directions (often 100 or more), HARDI overcomes the limited accuracy of the tensor model in resolving the highly complex fiber structure of the brain, particularly in regions with fiber crossings.
HARDI data makes it possible to compute the orientation diffusion function over a sphere of possible directions. Tuch [46,48] developed the first HARDI acquisition and processing methods, and later Frank [19] used spherical harmonic expansions for processing HARDI data sets. A very active area of research has grown up around processing HARDI signals, leading to methods for HARDI denoising, segmentation, and registration using metrics on spherical functions (Lenglet et al. [30]). Most of these signal processing methods still model the diffusion signal as a tensor, rather than exploiting the full information in the spherical harmonic expansion. For example, Khurd et al. [25] used isometric mapping and manifold learning (eigen-decomposition of the distance matrix) to directly fit a manifold to the tensors, compute its dimensionality, and distinguish groups using Hotelling's T² statistic. Initial image processing on the full HARDI signal has focused on fitting a discrete mixture of k distinct tensors to the signal, and later on fitting a continuous mixture model for modeling the MR signal decay and multi-fiber reconstruction (Jian et al. [21], [22]), or fitting a continuous mixture of tensors using a unit-mass distribution on the symmetric positive definite tensor manifold (Leow et al. [31]). Initial work on the nonlinear (fluid) matching of HARDI images has taken a more nonparametric approach, and has used the Kullback-Leibler divergence to measure the discrepancy between ODF (Orientation Distribution Function) fields (Chiang et al. [11,12]), using a 3D fluid transform to minimize the discrepancy between two fields of ODFs. As information theory can be used to measure the overlap between diffusion probability density functions, there is much promising work using metrics derived from information theory (e.g., the Fisher-Rao metric, von Mises-Fisher distribution, etc.; McGraw et al. [34]; Srivastava et al. [41]; Chiang et al. [12]).
Other work has modeled the HARDI signal as high-order tensors (Barmpoutis et al. [4]) or as a stratification (a mixture of manifolds with different dimensions; Haro et al. [20]).
The HARDI data is the MRI signal attenuation information after time t > 0, modeled by

S_t(x, θ, Φ) = S_0(x) exp(−b d_t(x, θ, Φ)),

where Ω is a bounded open subset of ℝ³, x ∈ Ω, θ ∈ [0, 2π), and Φ ∈ [0, π). Here S_0 is the MRI signal that is obtained when no diffusion gradient vector is applied; this is considered to be a reference image, relative to which the diffusion-attenuated signal is measured. The function d_t(x, θ, Φ) is called the spherical Apparent Diffusion Coefficient (sADC), which measures how much the water molecules diffuse in the given direction, and b is a parameter pre-selected to collect the data.
We note that, in our work, the sADC is a function that takes values on the unit sphere, whereas in diffusion imaging ADC is sometimes used to refer to a scalar field derived from the diffusion images.
In experimental data, a higher b-value (e.g., 3000 s/mm²) tends to lead to a smaller SNR (Signal-to-Noise Ratio) in the obtained images, thus more noise [13]. Hence, we are led to consider the following simplified degradation model:

S_t(x, θ, Φ) = S_0(x) exp(−b d_t(x, θ, Φ)) + noise.   (1)

The baseline signal collected without any diffusion gradient applied, which is S_0, may also be contaminated by noise, but here we assume that this can be neglected, or we simply consider the noise term in (1) to encompass all types of noise, which we do not know exactly. This is a reasonable approximation because, in practice, it is common to collect several non-diffusion weighted images (several S_0 images) whose average may be used as a reference signal S_0 (see, e.g., Zhan et al. [51]). If we let S̃_t(x, θ, Φ) be a denoised dataset, then we expect that for all x, θ, Φ,

0 ≤ S̃_t(x, θ, Φ) ≤ S_0(x).   (2)

As already mentioned, the data has to be denoised before extracting the fibers, or before registration. Although HARDI is a relatively recent type of data acquisition, several HARDI processing methods have already been proposed; we mention just a few. In McGraw et al. [33] and Jonasson et al. [23], curve evolution techniques are applied to the segmentation of HARDI data. Descoteaux, Deriche and collaborators, among others, have also proposed methods to segment HARDI data [15], to estimate the sADC [16], and to map neuronal fiber crossings [17]. The work of Delputte et al. [14] deals with denoising and regularization of fields of ODFs.
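A small numerical sketch of the degradation model (1) follows; the additive Gaussian term is only a stand-in for the unspecified noise in (1), and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Degradation model (1): S_t = S_0 * exp(-b * d_t) + noise, with a generic
# additive noise term standing in for the unknown noise.
def degraded_signal(S0, d, b, sigma):
    return S0 * np.exp(-b * d) + rng.normal(0.0, sigma, size=np.shape(d))

b, sigma, S0 = 1000.0, 0.05, 1.0
d = np.full(1000, 0.05e-3)          # weak diffusion: clean signal near S_0
S = degraded_signal(S0, d, b, sigma)
# Noise can push S above S_0, which is exactly what constraint (2) rules out
# for the denoised data.
print(int((S > S0).sum()), "of", S.size, "samples violate S <= S_0")
```

This is why the denoising models below enforce (2) explicitly rather than trusting the raw measurements.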
The prior work most relevant to ours is by McGraw et al. [32]: the noisy data S = S_t(x, θ, Φ) is regularized to remove noise in a functional minimization approach; a standard L² data fidelity term is used, combined with a weighted version of vectorial total variation regularization in space, and H¹ regularization of the data at every voxel with respect to direction. At every voxel, the data is mapped onto a two-dimensional plane using spherical coordinates and then discretized using finite elements. Denoising results for synthetic and real HARDI data are presented in [32]. In our proposed work, we also use vectorial total variation for the regularization. However, in our first model we also impose the constraint (2). Our second proposed model differs even more from the one in [32], since we faithfully follow the signal degradation model (1) and we denoise d instead of S.
The outline of the paper is as follows. In Section 2 we recall the logarithmic barrier function for general optimization problems with inequality constraints. In Section 3 we introduce two variational models for HARDI denoising, together with theoretical results, the associated Euler-Lagrange equations, and the numerical approximations. Finally, in Section 4 we present several numerical results of denoising synthetic and real HARDI data obtained using the proposed models, error assessment and comparisons.
We would like to mention that a preliminary version of this work has been presented and published in IPMI 2009 [26].

Logarithmic barrier functions
In this section, we briefly recall the logarithmic barrier method for optimization problems with inequality constraints, one of the barrier function methods used to force the unknown variables of interest to remain feasible. Suppose that we have the following optimization problem: let φ : ℝ^n → ℝ and c : ℝ^n → ℝ^m be smooth functions with c = (c_1, …, c_m), and consider

min_x φ(x) subject to c_i(x) ≥ 0, i = 1, …, m.   (3)

Let O be the feasible domain, i.e., O = {x ∈ ℝ^n : c_i(x) ≥ 0, i = 1, …, m}. We also assume that the first and second-order sufficient conditions hold at a point x* so that x* is a strict local solution of (3). Then the logarithmic barrier function for (3) defines the unconstrained optimization problem [37]

min_x P(x; μ) = φ(x) − μ ∑_{i=1}^m log(c_i(x)).   (4)

This method appears in many works on optimization and applications. An initial guess x_0 in the interior of the feasible set, i.e. c_i(x_0) > 0, i = 1, …, m, is chosen. Then an iterative method (such as gradient descent) is applied, for a fixed small μ_0 > 0, to obtain x_1 in the interior of the feasible set that minimizes P(x; μ_0). Then μ_1 is selected, such that 0 < μ_1 < μ_0, and x_2 is the minimizer of P(x; μ_1) in the interior of the feasible set, and so on. Thus the basic idea of the method is the computation of the minimizers x_k of P(·; μ_k) for a sequence of values μ_k ↓ 0, with the goal that x_k converges to x*.
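A toy sketch of this scheme, with an illustrative problem and step sizes of our own choosing (not from the paper): minimize φ(x) = (x − 2)² subject to c(x) = 1 − x ≥ 0, whose solution is x* = 1.

```python
def barrier_minimize(phi_grad, c, c_grad, x0, mu0=1.0, shrink=0.1,
                     n_outer=6, n_inner=2000, step=1e-3):
    """Log-barrier method: minimize P(x; mu) = phi(x) - mu*log(c(x)) by
    gradient descent, for a decreasing sequence mu_k = shrink^k * mu0."""
    x, mu = x0, mu0
    for _ in range(n_outer):
        for _ in range(n_inner):
            g = phi_grad(x) - mu * c_grad(x) / c(x)   # grad of P(x; mu)
            t = step
            while c(x - t * g) <= 0:                   # backtrack: stay strictly feasible
                t *= 0.5
            x = x - t * g
        mu *= shrink
    return x

# Toy problem: minimize (x - 2)^2 subject to 1 - x >= 0; solution x* = 1.
x_star = barrier_minimize(phi_grad=lambda x: 2 * (x - 2),
                          c=lambda x: 1 - x,
                          c_grad=lambda x: -1.0,
                          x0=0.0)
print(x_star)  # close to 1, approached from inside the feasible set
```

Note how the iterate approaches the constrained optimum from the interior, mirroring the μ_k ↓ 0 continuation described above.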
The authors of [18] and [37] present general results about the existence of minimizers of the logarithmic barrier functions in the vicinity of x* and the convergence of the sequence of minimizers to x* as μ k ↓ 0.

Variational denoising models
We propose here two variational denoising methods: one recovers a denoised S, while the other recovers a denoised d. Again, we let Ω be a bounded open subset of ℝ³. The HARDI data is a collection of intensity values at uniformly pre-selected directions on the sphere, along which the diffusion-sensitizing gradients are applied; that is, at each position x ∈ Ω ⊂ ℝ³, we measure values in several different directions. We note briefly that the actual set of directions is typically computed using an electrostatic repulsion PDE, to optimize the sampling of a spherical signal with a finite set of observations (see Tuch et al. [49] for a discussion of spherical sampling schemes).
In the continuous setting, we obtain a function defined on the manifold Ω × S², where Ω is the spatial domain and the sphere S² is the space of gradient directions. However, it is not easy to work with the entire domain Ω × S² for computational purposes. Instead, we will use a discretized version of S², given by a finite set of directions {s_1, …, s_n} ⊂ S² that are uniformly chosen. Then the function that we obtain has the form f = (f_1, …, f_n) : Ω → ℝ^n, where each f_i : Ω → ℝ corresponds to the direction s_i. We will also use a function representation f_0 : Ω → ℝ for the data S_0, which has only spatial information (also assumed to be given). Note that what is important in this model is to measure the correct diffusivity along each direction at each position, which is based on the difference between the noisy data S and the reference data S_0; thus our main concern is to deal with nonnegative differences. To impose the right amount of smoothness and discontinuity on the denoised data, we will use the space BV(Ω; ℝ^n), the n-dimensional version of BV(Ω), defined as follows.
The space of functions of bounded variation BV(Ω; ℝ^n) is defined by

BV(Ω; ℝ^n) = { u ∈ L¹(Ω; ℝ^n) : |Du|(Ω) < ∞ },

where the total variation of u is

|Du|(Ω) = sup { ∫_Ω u · ∇·φ dx : φ ∈ C¹_c(Ω; ℝ^{n×3}), |φ(x)| ≤ 1 for all x ∈ Ω }.

Note that Du is an n × 3 matrix of Radon measures when Ω ⊂ ℝ³. If u is differentiable, or if u ∈ W^{1,1}(Ω; ℝ^n), then Du = ∇u dx in the distributional sense, and |Du|(Ω) = ∫_Ω |∇u(x)| dx. The total variation has been successfully introduced and used in image restoration for grayscale images by Rudin, Osher and Fatemi [39]. The vectorial total variation for color images has been analyzed in Blomgren-Chan [8] and in the PhD theses of Blomgren [7] and Tschumperlé [43], together with other regularizations for vector-valued data, such as regularizations based on the structure tensor (see also work by Tschumperlé and Deriche [44], [45]).
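In the discrete setting, the vectorial total variation couples all n direction channels inside one square root at each voxel. A small sketch, with discretization choices of our own (forward differences, Neumann far face), for an array of shape (n, X, Y, Z):

```python
import numpy as np

def vectorial_tv(u, h=1.0):
    """Discrete vectorial TV of u with shape (n, X, Y, Z):
    sum over voxels of sqrt( sum_i |grad u_i|^2 ), using forward differences
    with a zero difference at the far face (homogeneous Neumann)."""
    grads = []
    for axis in (1, 2, 3):
        d = np.diff(u, axis=axis, append=np.take(u, [-1], axis=axis)) / h
        grads.append(d)
    # Sum squares over the 3 spatial axes, then over the n channels.
    mag = np.sqrt(sum(d ** 2 for d in grads).sum(axis=0))
    return mag.sum() * h ** 3

# Example: a single channel with a unit jump across one plane of a 4x4x4 grid.
u = np.zeros((1, 4, 4, 4))
u[:, :2] = 1.0
print(vectorial_tv(u))  # the area of the jump plane: 16.0
```

Duplicating the channel multiplies the TV by √2, not 2, which is precisely the channel-coupling effect of the vectorial (as opposed to channel-by-channel) total variation.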

The first model
In this model we will denoise the collected data S_t in (1). In what follows, we will drop the subscript t and denote the data by S, and f will be its corresponding function notation. The functional that we want to minimize is the following:

F(u) = |Du|(Ω) + λ ∑_{i=1}^n ∫_Ω |f_i(x) − u_i(x)| dx,   (5)

subject to the constraints 0 ≤ u_i ≤ f_0 in Ω, i = 1, …, n.

A more general regularization. The regularization that we use in (5) is mathematically well studied and has the property of allowing discontinuities along hypersurfaces. The vectorial total variation regularization is a particular case of the following regularization for vector-valued data, which utilizes eigenvalues, inspired from [43]. For a given u = (u_1, …, u_n), we define g_u(x) to be a symmetric matrix built from the spatial gradients ∇u_i(x) and the directions s_i, where u_i is the intensity value corresponding to the spherical vector s_i as before. The matrix g_u(x) is different from the usual structure tensor [43] (the structure tensor would not depend on the spherical directions s_i). Here, g_u(x) was chosen so that it would be a measure of intensity variation along the corresponding direction s_i, at each point x. Like the structure tensor, the matrix g_u(x) is symmetric and positive semi-definite, and denoting its eigenvalues by λ_1(x) ≥ λ_2(x) ≥ λ_3(x) ≥ 0, we can define a regularizer ∫_Ω Φ(λ_1, λ_2, λ_3) dx such that Φ is nondecreasing in each variable and tends to infinity when the first variable tends to infinity.
Now we formulate the following general functional:

E(u) = ∫_Ω Φ(λ_1(x), λ_2(x), λ_3(x)) dx + λ ∑_{i=1}^n ∫_Ω |f_i(x) − u_i(x)| dx,

where the function f and the directions s_i are given as before. The models proposed here use Φ(λ_1, λ_2, λ_3) = √(λ_1 + λ_2 + λ_3), in which case the regularizing term becomes

∫_Ω √( ∑_{i=1}^n |∇u_i(x)|² ) dx,

which realizes the total variation of the n-dimensional vector function u(x) = (u_1(x), …, u_n(x)) (but is independent of the spherical directions s_i). Note that, in two spatial dimensions, it would be easy to consider other examples of integrands Φ as in [43] (such as integrands with dependence on the directions s_i); however, this becomes very cumbersome in three dimensions in practice, where the eigenvalues no longer have such simple explicit expressions.
The L¹ fidelity term that we use in (5) differs from the more standard L² term of [39], which corresponds to additive Gaussian noise. Since the type of noise in MRI data is more complex (usually Rician noise), we think the L¹ norm is more appropriate in this case than the L² norm, and it is easier to implement than directly imposing the Rician noise model. For total variation based denoising models with L¹ fidelity term, we refer to [1], [2], [3] in one dimension and, more recently, to [40], [36], [10], [9] in the case of planar images, among other work.
In what follows, we will consider an unconstrained version of the first model (5), using the logarithmic barrier method to remove the constraints. The unconstrained version that we will minimize is the following:

F(u) = |Du|(Ω) + λ ∑_{i=1}^n ∫_Ω |f_i − u_i| dx − μ ∑_{i=1}^n ∫_Ω log(f_0 − u_i) dx.   (6)

Note that the last soft constraint term does not incorporate the non-negativity constraint in (5). We prove that if the function f already satisfies f = (f_1, …, f_n) ≥ 0 everywhere, then the minimizer u_min, if it exists, also satisfies u_min ≥ 0, due to a maximum principle for our associated diffusion equation.
To simplify this model further, we require a minor change in the last term of (6). Instead of the integrand given in the last term of (6), we will use H(f_i(x), f_0(x)) log(f_0(x) − u_i(x)). This function H depends only on the two data f and f_0, so the computations will not become more complicated. Our choice of the function H is

H(f_i(x), f_0(x)) = 1 if f_i(x) − f_0(x) > 0, and H(f_i(x), f_0(x)) = 0 otherwise.   (7)

If we use this weight H, then we only penalize the terms at those points x ∈ Ω with f_i(x) − f_0(x) > 0, which violate the constraint (2). Therefore, the first minimization problem that we are interested in is

inf_{u ∈ BV(Ω;ℝ^n)} F(u),   (8)

where

F(u) = |Du|(Ω) + λ ∑_{i=1}^n ∫_Ω |f_i − u_i| dx − μ ∑_{i=1}^n ∫_Ω H(f_i, f_0) log(f_0 − u_i) dx.   (9)

We can prove existence of minimizers for the constrained problem (5) and for the unconstrained problem (9). In what follows, we will use Ω_i = {x ∈ Ω : H(f_i(x), f_0(x)) > 0} for i = 1, …, n, and since we assume that f ∈ L¹(Ω; ℝ^n) and f_0 ∈ L¹(Ω), we know that Ω_i and Ω \ Ω_i are measurable sets.
Theorem 1. The constrained problem (5) and the unconstrained problem (9) both have minimizers.
Proof. It is standard to prove the existence of a minimizer for (5), since the functional is bounded from below by 0. Let {u_k} be a minimizing sequence of (5). The boundedness of the domain Ω and the constraints 0 ≤ u_i^k ≤ f_0, for i = 1, …, n and k = 1, 2, …, guarantee that u_k ∈ L¹(Ω; ℝ^n) for all k. Since F(u_k) ≤ M for all k = 1, 2, …, for some positive constant M, we deduce that |Du_k|(Ω) ≤ M for all k ≥ 1. Hence, the sequence {u_k} is bounded in BV(Ω; ℝ^n), and we can extract a subsequence (still denoted u_k) and u ∈ BV(Ω; ℝ^n) such that u_k → u strongly in L¹(Ω; ℝ^n) and Du_k converges to Du weakly in the sense of measures. By the lower semicontinuity of the total variation, we deduce that F(u) ≤ lim inf_{k→∞} F(u_k). Moreover, u must satisfy the constraints 0 ≤ u_i ≤ f_0 for a.e. x ∈ Ω, due to the strong convergence of the subsequence u_k to u in L¹(Ω; ℝ^n); thus u is a minimizer of the constrained problem (5).
Almost the same steps can be applied to (9), except that we have to make sure that the functional is bounded from below. By the definition of Ω_i, we have

F(u) = |Du|(Ω) + λ ∑_{i=1}^n ∫_Ω |f_i − u_i| dx − μ ∑_{i=1}^n ∫_{Ω_i} log(f_0 − u_i) dx.

Note that for z > 0, λz − μ log(z) ≥ μ − μ log(μ/λ). Hence, we can bound the last term of (9) from below: on each Ω_i, writing z = f_0 − u_i > 0,

λ|f_i − u_i| − μ log(f_0 − u_i) ≥ λz − μ log(z) − λ|f_i − f_0| ≥ μ − μ log(μ/λ) − λ|f_i − f_0|.

Therefore F is bounded from below for any λ, μ > 0, and we can take a minimizing sequence {u_k}. There exists 0 < M < ∞ such that F(u_k) ≤ M for all k. Then the above inequalities show that the total variations |Du_k|(Ω) are uniformly bounded. It is enough to show that the sequence {u_k} is bounded in L¹(Ω; ℝ^n). It is obvious that, for each i, ∫_{Ω \ Ω_i} |u_i^k| dx is uniformly bounded, since there the fidelity term controls u_i^k. What remains to prove is that, for each i, ∫_{Ω_i} |u_i^k| dx is uniformly bounded as well.

Let z_i^k = f_0 − u_i^k. Then z_i^k > 0 a.e. on Ω_i, since F(u_k) < ∞. Note that ∫_{Ω_i} |u_i^k| dx ≤ ∫_{Ω_i} z_i^k dx + ∫_{Ω_i} f_0 dx. Suppose that, for some i, ∫_{Ω_i} z_i^k dx → ∞ along a subsequence. Since the function z ↦ λz − μ log(z) is convex for z > 0, Jensen's inequality implies

(1/|Ω_i|) ∫_{Ω_i} (λ z_i^k − μ log(z_i^k)) dx ≥ λ z̄_k − μ log(z̄_k), where z̄_k = (1/|Ω_i|) ∫_{Ω_i} z_i^k dx,

and the right-hand side tends to ∞, so that F(u_k) → ∞. This is a contradiction. Hence, {u_k} is bounded in BV(Ω; ℝ^n) and there exists u ∈ BV(Ω; ℝ^n) such that (up to a subsequence) u_k → u in L¹(Ω; ℝ^n), and thus F(u) ≤ lim inf_{k→∞} F(u_k), so u is a minimizer of (9). The purpose of the function H is to impose as few constraints as possible, to minimize the effect caused by the logarithmic functions. The following theorem ensures that the function H does the job.
Theorem 2. Assume that f ≥ 0 a.e. in Ω. Then any minimizer u of (9) satisfies u ≥ 0 a.e. in Ω.

Proof. Let u be a minimizer of (9). Let v = max(u, 0). We claim that F(v) ≤ F(u). It is easy to see that |Dv|(Ω) ≤ |Du|(Ω), and that the fidelity term does not increase, since f ≥ 0. What remains to prove is that the barrier term does not increase either. Let g(z) = (λ/μ)z − log(z) for z > 0. By the assumption, g(z) > 0 and g is increasing on {z ≥ μ/λ}, which implies the desired inequality on each Ω_j. Therefore F(v) ≤ F(u), which implies that v is also a minimizer. Since F(v) = F(u), all the inequalities above become equalities, and we have v = u a.e.; that is, any minimizer u satisfies u ≥ 0.
In particular, if we take H ≡ 1, then (9) is the same as (6) and Ω_i = Ω for all i. The logarithmic term in (6) then guarantees that any minimizer u satisfies u ≤ f_0. Therefore, any minimizer u of (6) satisfies 0 ≤ u ≤ f_0.

Euler-Lagrange equations and a numerical algorithm
The Euler-Lagrange equations associated with the minimization (9) are, for i = 1, …, n,

−∇·( ∇u_i / |∇u| ) + λ (u_i − f_i)/|u_i − f_i| + μ H(f_i, f_0)/(f_0 − u_i) = 0 in Ω,   (10)

where |∇u| = √(∑_{j=1}^n |∇u_j|²), and ∇ and ∇· refer to the spatial gradient and divergence, respectively. To find a minimizer numerically, we impose Neumann boundary conditions and use the gradient descent method with a finite difference scheme, solving the evolution PDEs, for i = 1, …, n,

∂u_i/∂t = ∇·( ∇u_i / |∇u| ) − λ (u_i − f_i)/|u_i − f_i| − μ H(f_i, f_0)/(f_0 − u_i).   (11)

In practice, the norms |∇u| and |u_i − f_i| are replaced by smoothed versions to avoid division by zero. For the divergence term, we use forward and backward differences, and we employ an explicit scheme. Experimental results using this discrete model 1 for HARDI denoising will be shown in Section 4.
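A two-dimensional sketch of such an explicit gradient-descent step follows; the grid, parameter values, smoothing constant eps, and step size dt are our own illustrative choices (the paper's 3D discretization follows the same pattern):

```python
import numpy as np

def model1_step(u, f, f0, lam=1.0, mu=1e-6, dt=1e-3, eps=1e-8):
    """One explicit gradient-descent step for the unconstrained first model:
    u_t = div(grad u_i / |grad u|) - lam*sign(u_i - f_i) - mu*H/(f0 - u_i),
    with |.| and sign(.) smoothed by eps. Shapes: u, f (n, X, Y); f0 (X, Y)."""
    # Forward differences per channel (zero difference at the far face).
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1, :] = u[:, 1:, :] - u[:, :-1, :]
    gy[:, :, :-1] = u[:, :, 1:] - u[:, :, :-1]
    norm = np.sqrt((gx ** 2 + gy ** 2).sum(axis=0) + eps)  # couples channels
    px, py = gx / norm, gy / norm
    # Backward-difference divergence (adjoint of the forward gradient).
    div = np.zeros_like(u)
    div[:, 1:, :] += px[:, 1:, :] - px[:, :-1, :]
    div[:, :1, :] += px[:, :1, :]
    div[:, :, 1:] += py[:, :, 1:] - py[:, :, :-1]
    div[:, :, :1] += py[:, :, :1]
    H = (f > f0).astype(float)           # weight (7): penalize only f_i > f_0
    barrier = mu * H / np.maximum(f0 - u, eps)
    fidelity = lam * (u - f) / np.sqrt((u - f) ** 2 + eps)
    return u + dt * (div - fidelity - barrier)

rng = np.random.default_rng(1)
f0 = np.ones((8, 8))
f = np.clip(0.5 + 0.1 * rng.normal(size=(3, 8, 8)), 0.0, 1.0)  # noisy toy data
u = f.copy()
for _ in range(200):
    u = model1_step(u, f, f0)
print(float(np.abs(u - f).max()))  # u stays bounded and near the data
```

Running a few hundred steps visibly smooths the noisy channels while the fidelity term keeps the result close to f.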

The second model
In this model, we will denoise the sADC (spherical Apparent Diffusion Coefficient) d_t in (1).
In what follows, we will also drop the subscript t, denote the sADC by d, and borrow the notations f, f_0 from the previous section. In this section we also assume, for theoretical purposes, that there exist 0 < c_1, c_2 < ∞ such that c_1 ≤ f_i ≤ c_2 and c_1 ≤ f_0 ≤ c_2. Our proposed second minimization problem is

inf_d G(d) = |Dd|(Ω) + λ ∑_{i=1}^n ∫_Ω |f_i − f_0 e^{−d_i}| dx,   (12)

subject to the constraints d_i ≥ 0, i = 1, …, n. Note that d in (12) realizes b·d in (1); this is not a modification, since b is a constant. Also note from the HARDI data model (1) that knowing the true S is equivalent to knowing the true d, which leads us to consider the problem of denoising d instead of denoising S. This model seems to have some advantages over the first model. First of all, d appears as an exponent of the exponential function, which means that the range of the values of d in (12) is much smaller than that of u in (5), so that when we minimize the two functionals F and G, G does not seem to smooth out d as much as F does u. Secondly, we directly impose the assumed image formation model. Thirdly, the constraints in (12) are simpler than those in (5). Hence, if we look at an unconstrained version of the second model obtained in the same way as in the first model, then we end up minimizing

G(d) = |Dd|(Ω) + λ ∑_{i=1}^n ∫_Ω |f_i − f_0 e^{−d_i}| dx − μ ∑_{i=1}^n ∫_Ω log(d_i) dx.   (13)

Unfortunately, the problem (13) has no global minimizer: for any d ∈ BV(Ω; ℝ^n) with G(d) < ∞, letting d_i grow on a set ω_i with |ω_i| > 0 drives the term −μ ∫ log(d_i) dx to −∞ faster than the remaining terms grow. To prevent this unboundedness, a slight modification is needed: we modify G with the function H defined in (7), obtaining

G(d) = |Dd|(Ω) + λ ∑_{i=1}^n ∫_Ω |f_i − f_0 e^{−d_i}| dx − μ ∑_{i=1}^n ∫_Ω H(f_i, f_0) log(d_i)/d_i dx.   (14)

An advantage of the function −μ log(z)/z is that, on one hand, unlike the logarithmic function −μ log(z), it spreads rather uniform weights on {z > ε} for ε > 0 when μ > 0 is small. On the other hand, −μ log(z)/z generates a stronger repelling force from z = 0 than −μ log(z). In this sense, −μ log(z)/z realizes the constraints in (12) better than −μ log(z) does.
A discretized version G_D of G is given in (15), where Ω is the spatial domain and x = (x_1, x_2, x_3). First of all, we can prove that there is a minimizer of the constrained problem (12).

Theorem 3. The constrained problem (12) has a minimizer d ∈ BV(Ω;ℝ n ).
Proof. In the proof we assume, for simplicity, that a sequence and its subsequences are denoted the same. Let {d_k} be a minimizing sequence for (12), so that, for some constant M < ∞, G(d_k) ≤ M for all k (16).

Let d̃_i^k = min(d_i^k, log(c_2/c_1)) for each i. Then |Dd̃_k|(Ω) ≤ |Dd_k|(Ω).

If d_i^k(x) > log(c_2/c_1), then f_0(x) e^{−d_i^k(x)} < c_1 ≤ f_i(x), while f_0(x) e^{−d̃_i^k(x)} still lies below c_1 and above f_0(x) e^{−d_i^k(x)}. That is, |f_i(x) − f_0(x) e^{−d̃_i^k(x)}| ≤ |f_i(x) − f_0(x) e^{−d_i^k(x)}|, which finally implies that G(d̃_k) ≤ G(d_k). Hence, the sequence {d̃_k} is also a minimizing sequence, which is uniformly bounded in L¹(Ω; ℝ^n), since 0 ≤ d̃_i^k ≤ log(c_2/c_1). In other words, each sequence {d̃_i^k}_k is bounded in BV(Ω). Therefore, there exists d_i ∈ BV(Ω) for each i such that, up to a subsequence, d̃_i^k → d_i in L¹(Ω) and Dd̃_i^k → Dd_i weakly in the sense of measures. Note that d = (d_1, …, d_n) ≥ 0.

In addition to the strong L¹ convergence of the sequence {d̃_k}, we may assume that d̃_k → d pointwise a.e.; since Ω has finite measure and the integrands are uniformly bounded, this guarantees the convergence of the fidelity term in the functional. Together with the lower semicontinuity of the total variation, this shows that d is a minimizer.
The purpose of imposing the logarithmic constraint in problem (14) is to ensure that d ≥ 0 is satisfied. It turns out that it is sufficient to impose the constraint on the partial domain Ω_i = {x ∈ Ω : H(f_i(x), f_0(x)) > 0}. Now we prove the existence of a minimizer for (14).

Theorem 4. The unconstrained problem (14) has a minimizer d ∈ BV(Ω; ℝ^n), and any minimizer h of (14) satisfies h ≥ 0.

Proof. We prove first that the functional G in (14) is bounded from below. It suffices to prove that there exists −∞ < C < 0 such that, for each i, −∫_{Ω_i} log(d_i)/d_i dx ≥ C. Since log(z)/z ≤ 1/e for all z > 0, if we let C = −|Ω|/e, then −∫_{Ω_i} log(d_i)/d_i dx ≥ C. Next, we may assume that log(c_2/c_1) ≥ e. Otherwise, we can always choose c_0 > 0 so that c_0 < c_1 ≤ f ≤ c_2, c_0 < c_1 ≤ f_0 ≤ c_2 and log(c_2/c_0) ≥ e; then we set c_1 to be c_0. Now we choose a minimizing sequence {d_k}. Let d̃_k = min(d_k, log(c_2/c_1)). As was shown in Theorem 3, the truncation does not increase the first two terms of G. If, in addition, d_i^k(x) > log(c_2/c_1) ≥ e for some x ∈ Ω_i, then −log(d̃_i^k(x))/d̃_i^k(x) ≤ −log(d_i^k(x))/d_i^k(x), since z ↦ log(z)/z is decreasing on {z ≥ e}, so the barrier term does not increase either. Also, if we let v_k = max(d̃_k, 0), then |Dv_k|(Ω) ≤ |Dd̃_k|(Ω), and the remaining terms do not increase for x ∈ Ω \ Ω_i. Since {v_k} is also a minimizing sequence, which is uniformly bounded above and below on the bounded set Ω, the same argument as in Theorem 3 guarantees the existence of a minimizer d ∈ BV(Ω; ℝ^n) with d ≥ 0. Let h be any minimizer of (14). Then h̃ = max(h, 0) is also a minimizer, as shown above, and it is easy to see that h̃ = h a.e. Hence, any minimizer h satisfies h ≥ 0.

Euler-Lagrange equations and a numerical algorithm
The Euler-Lagrange equations for the model (14) are similar to those found in the previous section: for i = 1, …, n,

−∇·( ∇d_i / |∇d| ) + λ f_0 e^{−d_i} (f_i − f_0 e^{−d_i})/|f_i − f_0 e^{−d_i}| − μ H(f_i, f_0) (1 − log(d_i))/d_i² = 0 in Ω,

with Neumann boundary conditions. We end up solving the following PDEs, for i = 1, …, n, based on the gradient descent method:

∂d_i/∂t = ∇·( ∇d_i / |∇d| ) − λ f_0 e^{−d_i} (f_i − f_0 e^{−d_i})/|f_i − f_0 e^{−d_i}| + μ H(f_i, f_0) (1 − log(d_i))/d_i².
As before, we use finite differences to discretize the PDEs, and we implement an explicit scheme for the divergence term. With the same notations as in Subsection 3.1.1, we obtain the corresponding explicit update formula. Experimental results using this HARDI denoising model 2 will be shown in the next section.
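Under model (1), moving between the signal domain and the sADC domain is a simple pointwise transform; a sketch with hypothetical values, where the clipping enforces the constraints S ≤ S_0 and d ≥ 0:

```python
import numpy as np

b = 1000.0  # hypothetical b-value, s/mm^2

def signal_to_sadc(S, S0, b, eps=1e-12):
    """Invert S = S0*exp(-b*d): d = -(1/b)*log(S/S0), clipped so that d >= 0
    (the second model's constraint) even when noise pushes S above S0."""
    ratio = np.clip(S / np.maximum(S0, eps), eps, 1.0)  # enforce S <= S0
    return -np.log(ratio) / b

def sadc_to_signal(d, S0, b):
    """Reconstruct the signal from a (denoised) sADC field."""
    return S0 * np.exp(-b * d)

S0 = 1.0
d_true = np.array([0.3e-3, 1.0e-3, 2.0e-3])
S = sadc_to_signal(d_true, S0, b)
d_back = signal_to_sadc(S, S0, b)
print(np.max(np.abs(d_back - d_true)))  # round trip is exact up to float error
```

This is also the transform used later to visualize and evaluate the second model's output: the evolved quantity is d, but the displayed signal is S0*exp(-b*d).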

Numerical results
We recall that in practice we work with a decreasing sequence of values {μ_k}_{k=1,2,…} and find minimizers of the corresponding functionals F_k or G_k (depending on the choice between F and G). The initial values μ_0 ≈ 10⁻⁶ and μ_0 ≈ 10⁻⁸ were chosen for the two models respectively, and the iterations were performed, with μ_{k+1} = 10⁻¹ μ_k, until μ_k reached 10⁻⁶ μ_0 or until we reached a possible steady state. We would like to mention that, even though the denoised quantity obtained from the second model, minimizing the functional G, is the function d, we use the reconstructed signal f_0 e^{−d} to visualize the results and to compute the root-mean-square-error (RMSE) values and the sKL distance, to be explained below. For visualization, we show both the raw HARDI plots and the ODF plots obtained by postprocessing. The ODFs are computed from the raw HARDI data using a truncated spherical harmonic series.
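The μ-continuation loop can be sketched generically. Here the inner minimization is a closed-form toy of our own (the minimizer of (x − a)² − μ log x is x = (a + √(a² + 2μ))/2), purely to illustrate the warm-started schedule:

```python
import math

def continuation_minimize(minimize_at_mu, x0, mu0=1e-6, factor=0.1, n_levels=7):
    """Warm-started continuation: solve at mu_k = factor**k * mu0, feeding
    each minimizer in as the initial guess for the next, smaller mu."""
    x = x0
    for k in range(n_levels):
        x = minimize_at_mu(x, mu0 * factor ** k)
    return x

a = 0.5  # toy "data" value; the barrier keeps the iterate positive
# Closed-form inner solve for argmin (x - a)^2 - mu*log(x); it happens to
# ignore the warm start, which a gradient method would actually use.
solve = lambda x, mu: (a + math.sqrt(a * a + 2.0 * mu)) / 2.0
x_final = continuation_minimize(solve, x0=1.0)
print(x_final)  # approaches a = 0.5 from above as mu decreases
```

In the paper's setting, `minimize_at_mu` would be the gradient-descent PDE solver for F_k or G_k, warm-started from the previous level's minimizer.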
Since the functionals (9) and (14) are nonconvex, there might be many local minima, which might cause visibly unsatisfactory results or computational instability. We thus need to choose an appropriate initial guess at t = 0. We notice that the feasible domain for (9) contains a set that might contain a minimizer u* of (9), and the feasible domain for (14) contains a set that contains a minimizer d* of (14); numerical results show that the computed minimizer u* of (9) also satisfies the constraints of (5). This led us to choose initial guesses u_0 for (9) and d_0 for (14) inside the corresponding feasible sets. We first show denoising results on a synthetic data set that was kindly provided to us by the authors of [32], who generated the data set using the technique described in [38]. The authors of [32] proposed a variational principle for smoothing the data f and approached this model with a membrane-spline deformation energy minimization method. They used the finite element method (FEM method) for smoothing the spherical information at each voxel. For spatial smoothing, the variational principle (TV method) given in (18) was used, where n is the number of gradient directions, as in our proposed models. In (18), the coupling between different directions is introduced not through the vectorial total variation, but through a weight function g defined in terms of GA, the generalized anisotropy defined in [38]. The authors of [32] also kindly provided us with their two denoising results, one obtained by the FEM method only and the other obtained by the combination of the FEM and TV methods. These are shown in Figure 7 and Figure 8 for comparison with our denoising results. Table 1 shows the RMSE values for each model. The RMSE formula is presented in (19).
The size of the synthetic data set is 16 × 16 × 81, which is a set of 81 2D-images of size 16 × 16; that is, there are 81 gradient directions. In Figure 1 and Figure 2, we show two different angle views of the synthetic, noise-free raw HARDI data and its noisy version. In Figure 3 and Figure 4, we show the same angle views obtained from the denoised raw HARDI data using the proposed two models. All the results shown in Figures 1 to 10 were obtained from the 16 × 16 × 81 synthetic data; however, we visualize only 1/3 of the whole volume in Figures 1-8, to better see the finer details. Figure 9 shows the ODF (Orientation Distribution Function) visualization of the whole volume of the data, and Figure 10 is a magnified view of Figure 9.
One way to measure the difference between two data sets uses the RMSE (Root Mean Square Error). We compute the RMSE values between the noise-free data and the noisy data, and between the noise-free data and the denoised data. The RMSE values corresponding to the synthetic experimental results are shown in Table 1. Since the intensity values in the data range between 0 and 1, we multiplied all the intensity values by 255 and then computed the RMSE values in the following way: let {s_1, …, s_n} be the discrete set of gradient spherical directions and let Ω = {x_1, x_2, …, x_N} be a subset of ℝ² or ℝ³, a discrete version of the spatial domain. Then the noise-free data, the noisy data, and the denoised data are functions on the discrete domain Ω × {s_1, …, s_n}. If p and q are two such images, then

RMSE(p, q) = √( (1/(Nn)) ∑_{j=1}^N ∑_{i=1}^n (p(x_j, s_i) − q(x_j, s_i))² ).   (19)

Additionally, to assess the accuracy of our results in the synthetic experiments, and since we obtain the ODFs by postprocessing, we can compute the sKL distance between two probability density functions, defined, for two probability densities p(x), q(x), by

sKL(p, q) = ∫ p(x) log( p(x)/q(x) ) dx + ∫ q(x) log( q(x)/p(x) ) dx.   (20)

Let q be the ODF of the noise-free data and, in each case, let p be the ODF of either the noisy data or the denoised data. As can be seen from (20), a smaller sKL value implies more similarity between the two probability density functions. We show the sKL values in Table 2.
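Both error measures are easy to sketch; the sKL convention below is the plain symmetrized KL divergence on discrete densities, and the paper's normalization may differ:

```python
import numpy as np

def rmse(p, q):
    """Root mean square error between two arrays of intensity values."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sqrt(np.mean((p - q) ** 2)))

def skl(p, q, eps=1e-12):
    """Symmetrized Kullback-Leibler distance between two discrete densities:
    sKL(p, q) = KL(p||q) + KL(q||p). eps avoids log(0); inputs are
    renormalized to sum to 1."""
    p = np.asarray(p, float) + eps; p = p / p.sum()
    q = np.asarray(q, float) + eps; q = q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(rmse(p, q), skl(p, q))  # sKL is symmetric and vanishes iff p == q
```

Note that sKL, unlike RMSE, is insensitive to the overall scale of the ODFs, which is why it complements the raw-intensity comparison.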
The visualization of ODFs and the calculation of sKL distances were kindly done by Liang Zhan, Laboratory of Neuro Imaging. Table 2 needs more explanation. Note that the two noisy data sets which appear in Table 2 are the same, which means that their sKL distances should theoretically be the same. However, due to differences in discretizations, we could not recover the same sKL distance value of 1.0409 for the noisy data presented in [32]. Hence, we also look at the ratio between the sKL distance of the denoised data and that of the noisy data, for our results and for the results in [32]. Table 3 presents those ratios; smaller ratios mean better denoised results.
Next we apply the proposed models to the denoising of two real HARDI brain data sets (one with 30 directions, artificially corrupted by Rician noise, for which we have the original clean ground truth; another with 94 directions, for which we do not have the ground truth).
Since in general the real data sets do not provide us with the true noise-free data, we need to obtain a noise-free "real" data set to make sure that our denoising methods actually improve the data quality. For this purpose, we fit sixth order spherical harmonics to the real HARDI data sets with 95 gradient directions. Then we sample a new set of 30 unit vectors uniformly distributed on the unit sphere, at which we evaluate the spherical harmonics expansions. We consider these data sets as noise-free real HARDI data sets with 30 diffusion-sensitized gradient directions. This procedure was done by J. Eugenio Iglesias from Medical Imaging Informatics, UCLA. We still use the same 11 baseline images for f_0. It is now easy to corrupt these data sets with Rician noise, which is described below. Figures 11-17 show this noise-free data and their noisy and denoised versions.
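Rician corruption of a magnitude image can be sketched as follows (σ is our own illustrative value); the key point is that the noise enters through the complex channels before the magnitude is taken:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_rician_noise(img, sigma):
    """Corrupt a magnitude image with Rician noise: add i.i.d. Gaussian noise
    to the real and imaginary channels, then take the magnitude."""
    real = img + rng.normal(0.0, sigma, img.shape)
    imag = rng.normal(0.0, sigma, img.shape)
    return np.hypot(real, imag)

clean = np.full((64, 64), 0.2)
noisy = add_rician_noise(clean, sigma=0.05)
# Rician noise is signal-dependent and biased: the noisy mean sits above the
# clean value, especially at low SNR, which is one reason the L1 fidelity is
# preferred over the Gaussian L2 model.
print(float(noisy.mean()))
```

This is the standard way magnitude MR noise is simulated; the actual σ used in the experiments is not reproduced here.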
Since skull-stripping was performed beforehand to remove non-brain regions from the image, the background is automatically set to zero (black). Thus, we restrict the calculations to the brain region, combined with homogeneous Dirichlet boundary conditions in the xy plane. To reduce the computational cost we used only 3 slices from the whole volume, imposing Neumann boundary conditions along the z-axis, and we visualize results from the middle slice. Figures 11, 12, and 13 show all 30 direction images of the original noise-free data, the noisy data, and the denoised data using the second model G.
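One possible discretization of this mixed boundary handling (our own illustrative sketch, not the authors' implementation) uses forward differences with zero (Dirichlet) values beyond the volume in x and y, and replicated (Neumann) edges in z:

```python
import numpy as np

def masked_gradient(u, mask):
    """Forward differences of a 3-D volume u restricted to a brain mask:
    zero (homogeneous Dirichlet) outside the volume in the xy plane,
    replicated edges (homogeneous Neumann) along the z axis."""
    u = np.where(mask, u, 0.0)                 # background forced to zero
    uz = np.pad(u, ((0, 0), (0, 0), (0, 1)), mode="edge")  # Neumann in z
    gx = np.diff(u, axis=0, append=0.0)        # Dirichlet: 0 beyond last row
    gy = np.diff(u, axis=1, append=0.0)        # Dirichlet: 0 beyond last col
    gz = np.diff(uz, axis=2)                   # replicated edge => zero flux
    return gx, gy, gz
```

With the replicated edge in z, the one-sided difference vanishes at the last slice (zero normal derivative), while appending a zero in x and y forces the signal to zero across the in-plane boundary.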
Since we also have the noise-free data for this HARDI brain data experiment, we can compute the RMSEs of the noisy data and the denoised data; Table 4 shows those values. We observed that the first model using F worked quite well on the synthetic data provided by the authors of [32], which has a rather simple structure; however, the first model was not suitable for the real noisy HARDI data with 30 directions, so we only show results obtained with the second model using G for that data set.
Finally, we present denoised results for a noisy real HARDI data set with 94 diffusion-sensitized gradient directions, so that the number n is 94. Briefly, we used the same data as in [31], in which 3D structural brain MRI scans and DT-MRI scans were acquired from healthy young adults on a 4 Tesla Bruker Medspec MRI scanner using an optimized diffusion tensor sequence. Imaging parameters were: TE/TR 92.3/8250 ms, 55 × 2 mm contiguous slices, FOV = 23 cm. 105 directional gradients were applied: 11 baseline (S_0) images with no diffusion sensitization (i.e., T_2-weighted images) and 94 diffusion-weighted images (b-value 1159 s/mm^2), in which gradient directions were evenly distributed on the hemisphere [24]. The reconstruction matrix was 128 × 128, yielding a 1.8 × 1.8 mm^2 in-plane resolution. The total scan time was 14.5 minutes. We set f_0 = S_0 to be the average of the 11 baseline images. Figure 19 shows the slices of one of the actual datasets used in our numerical calculations, together with the denoised results obtained using model G, and Figure 18 shows the corresponding non-diffusion weighted image S_0 = f_0.
Next we visualize the ODFs of the data and compare the ODFs of the noisy data with those of the denoised data. Notice that calculating the ODF of a noisy dataset already involves smoothing the data. Moreover, since the original noisy data usually violates the constraints imposed in our models, visualizing the ODF of the noisy data requires a preprocessing step to adjust the violating values, a step that may not be fully consistent with the underlying acquisition model. To see the difference between the ODFs of the noisy data and of the denoised data, we consider some parts of the whole brain image and magnify them, especially in regions where fibers cross. The data and the denoised results obtained using models F and G are shown in Figures 19-21.
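The preprocessing step mentioned above can be as simple as projecting the noisy signal into the admissible range implied by the acquisition model S = S_0 exp(−b · sADC): with sADC > 0 this forces 0 < S < S_0. The following is a hypothetical sketch; the exact adjustment applied to the noisy data before ODF computation is not specified here.

```python
import numpy as np

def clip_to_constraints(S, S0, eps=1e-6):
    """Project noisy HARDI measurements into the admissible range
    implied by S = S0 * exp(-b * sADC) with sADC > 0, i.e.
    0 < S < S0, before computing ODFs.  eps is a small relative
    margin keeping the values strictly inside the open interval."""
    return np.clip(S, eps * S0, (1.0 - eps) * S0)
```

Values already inside the interval pass through unchanged, so only constraint-violating voxels are altered.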

Conclusion
We proposed two slightly different models to denoise HARDI data. The difficulty in dealing with this data lies in its complex structure. One model denoises the signal itself by the standard TV + L^1 method, combined with the logarithmic barrier function to enforce the constraints that arise naturally from the acquisition model. The other model uses the acquisition formula and denoises d instead of S; similarly, we denoise d by the TV + L^1 method combined with the logarithmic barrier function, adapting the data fidelity term from the image formation model. As seen in the numerical computations, our model for denoising d is more stable and gives more satisfactory results for real HARDI data. We also compared our denoising methods with the methods described in [32]. Our proposed models outperformed the models in [32], providing smaller RMSE values, and our denoised data is visually more similar to the noise-free data.

Figure captions:
Figure 1. Two different angle views of noise-free raw synthetic HARDI data.
Figure 2. Two different angle views of noisy raw synthetic HARDI data.
Figure 3. Two different angle views of raw synthetic HARDI data denoised by our first model using F.
Figure 4. Two different angle views of raw synthetic HARDI data denoised by our second model using G.
Figure 5. Comparison between models F and G. Top row, left to right: noise-free synthetic raw HARDI data, noisy synthetic raw HARDI data. Bottom row, left to right: denoised synthetic raw HARDI data using model F, denoised synthetic raw HARDI data using model G.
Figure 6. Comparison between models F and G (different view); same layout as Figure 5.
Figure 7. Comparison between models F, G, FEM [32], and FEM+TV [32] on the synthetic data. Top row, left to right: denoised raw HARDI data using F, denoised raw HARDI data using G. Bottom row, left to right: denoised raw HARDI data using only FEM in [32], denoised raw HARDI data using FEM+TV in [32].
Figure 8. Comparison between models F, G, FEM [32], and FEM+TV [32] on the synthetic data (different view); same layout as Figure 7.
Figure 9. Top row, left to right: ODFs of noise-free synthetic HARDI data, ODFs of the noisy version. Bottom row, left to right: ODFs of HARDI data denoised using the functional F, ODFs of HARDI data denoised using the functional G.
Figure 10. Magnified views of Figure 9; same layout.
Figure 11. Noise-free real HARDI data with 30 diffusion-sensitized gradient directions.
Figure 12. Noisy real HARDI data with 30 diffusion-sensitized gradient directions, artificially corrupted by Rician noise.
Figure 13. Denoised real HARDI data with 30 diffusion-sensitized gradient directions, by the second model using G.
Figure 14. Small part of the noise-free real HARDI data with 30 diffusion-sensitized gradient directions.
Figure 15. Small part of the noisy real HARDI data with 30 diffusion-sensitized gradient directions.
Figure 16. Small part of the denoised real HARDI data with 30 diffusion-sensitized gradient directions, using model G.
Figure 17. ODF visualization. Left to right: noise-free real HARDI data, noisy real HARDI data, denoised real HARDI data using model G, with 30 diffusion-sensitized gradient directions.
Figure 18. 19th slice of the real brain data with 94 gradient directions.
Figure 19. Only 16 randomly picked directions out of 94, corresponding to the 19th slice of the noisy (above) and the denoised (using G, below) HARDI data.
Figure 20. ODF: 19th slice of the real brain data with 94 gradient directions.
Figure 21. ODF visualization for real brain data with 94 gradient directions. Left to right: ODFs of noisy data, denoised result using the first model F, denoised result using the second model G.

Table 1. RMSE between the mentioned synthetic data and the noise-free synthetic data.

                                      RMSE
  Noisy data                          17.7079
  Our denoised data using F            5.8991
  Our denoised data using G            7.6081
  Denoised data by FEM in [32]        11.9964
  Denoised data by FEM+TV in [32]      7.6367

Table 2. sKL distances for the synthetic ODF data.

                                      sKL distance
  Noisy data (our discretization)      4.2206
  Our denoised data using F            1.8243
  Our denoised data using G            1.8143
  Noisy data (as reported in [32])     1.0409
  Denoised data by FEM in [32]         0.9088
  Denoised data by FEM+TV in [32]      0.6576

Table 3. Ratios between the sKL distances of ODFs, for the synthetic data.

                                                       Ratio between the sKL distances
  sKL(our denoised using F) / sKL(noisy data)          0.4322
  sKL(our denoised using G) / sKL(noisy data)          0.4299
  sKL(denoised by FEM in [32]) / sKL(noisy data)       0.8730
  sKL(denoised by FEM+TV in [32]) / sKL(noisy data)    0.6318

Table 4. RMSEs: the noisy real data with 30 diffusion-sensitized gradient directions and the denoised version using model G.
                           RMSE
  Noisy data               10.5448
  Denoised data using G     4.7268