NEURONAL FIBER–TRACKING VIA OPTIMAL MASS TRANSPORTATION

Diffusion Magnetic Resonance Imaging (MRI) is used to (noninvasively) study neuronal fibers in the brain white matter. Reconstructing fiber paths from such data (tractography problem) is relevant in particular to study the connectivity between two given cerebral regions. Fiber-tracking models rely on how water molecules diffusion is represented in each MRI voxel. The Diffusion Spectrum Imaging (DSI) technique represents the diffusion as a probability density function (DDF) defined on a set of predefined directions inside each voxel. DSI is able to describe complex tissue configurations (compared e.g. with Diffusion Tensor Imaging), but ignores the actual density of fibers forming bundle trajectories among adjacent voxels, preventing any evaluation of the real physical dimension of these fiber bundles. By considering the fiber paths between two given areas as geodesics of a suitable well-posed optimal control problem (related to optimal mass transportation) which takes into account the whole information given by the DDF, we are able to provide a quantitative criterion to estimate the connectivity between two given cerebral regions, and to recover the actual distribution of neuronal fibers between them.

(Communicated by Martino Bardi) Abstract. Diffusion Magnetic Resonance Imaging (MRI) is used to (noninvasively) study neuronal fibers in the brain white matter. Reconstructing fiber paths from such data (tractography problem) is relevant in particular to study the connectivity between two given cerebral regions. Fiber-tracking models rely on how water molecules diffusion is represented in each MRI voxel. The Diffusion Spectrum Imaging (DSI) technique represents the diffusion as a probability density function (DDF) defined on a set of predefined directions inside each voxel. DSI is able to describe complex tissue configurations (compared e.g. with Diffusion Tensor Imaging), but ignores the actual density of fibers forming bundle trajectories among adjacent voxels, preventing any evaluation of the real physical dimension of these fiber bundles.
By considering the fiber paths between two given areas as geodesics of a suitable well-posed optimal control problem (related to optimal mass transportation) which takes into account the whole information given by the DDF, we are able to provide a quantitative criterion to estimate the connectivity between two given cerebral regions, and to recover the actual distribution of neuronal fibers between them.

Introduction.
1.1. General introduction. Diffusion magnetic resonance imaging (MRI) is a powerful imaging modality for non-invasive and in vivo study of the anatomy of the brain white matter. Starting from the observation that water molecules diffusion in living tissues is highly affected by the cellular organization of the observed tissue, MRI exploits this dependency to probe tissue structure architecture at a microscopic scale far beyond the usual imaging resolution. Moreover, since it is well known that in the brain there is a strong relationship between water diffusion and axonal orientations (see [44], [18], [26]), in the last decade diffusion MRI has been widely applied to the study of fiber bundle trajectories into the brain white matter yielding a high number of "tractography" or "fiber-tracking" methods. In general terms, a fiber-tracking algorithm exploits water diffusion data of each 3D resolution element (voxel) in order to reconstruct the fiber bundle trajectories present into the brain white matter. Various models are used to describe the water molecules diffusion at the scale of an MRI voxel.
In the simplest case of Diffusion Tensor Imaging (DTI), displacements of water molecules subject to magnetic field are assumed to follow a Gaussian distribution and therefore in each voxel there will be only one direction of maximum diffusion. On the contrary, in the state-of-the-art technique Diffusion Spectrum Imaging (D-SI), the diffusion at each voxel is described by a displacement distribution or equivalently by a probability density function (DDF) defined for a set of given directions inside the voxel. The probability density function DDF is reduced to an orientation distribution function (ODF) by summing the probabilities of diffusion in each direction. As opposed to DTI, hence, DSI is able to successfully characterize more complex axonal configurations, such as fiber crossing, fanning and bending.
From a medical point of view, fiber-tracking is relevant to get information about the connectivity between different cerebral regions. It turns out to be important to know not only whether or not two given regions are connected by a bundle of fibers, but also the quantity of fibers forming that connecting bundle. This connectivity problem is relevant in particular to describe situations where, due to medical pathologies, we have two cerebral areas connected by fibers that pass through a damaged region where the connection is interrupted or strongly hampered: if the connection was ensured by several fibers, the damaged area will not jeopardize the connection between the two given areas as much as in the case of a scarce number of connecting fibers.
Despite some important progresses made by recent studies focused on assessing the ability of diffusion tractography to estimates anatomically correct fiber bundles, the validation of reconstructed fiber trajectories and their interpretation remain the two major shortcomings. As regards the interpretation of tractography results, present fiber-tracking methods, DSI-based included, build fiber trajectories disregarding the "quantity" of fibers going to make up a bundle trajectory among The technique allows us to define an ODF for each voxel. An ODF is represented as a spherical polar plot coloured according to local diffusion orientations, and it describes the "average amount of diffusion" along each direction over the sphere. adjacent voxels, thus preventing any evaluation, for example, of the real physical dimension of a fiber bundle or the global quantity of fibers crossing a brain region.
1.2. Related work. In the literature, diffusion fiber-tracking algorithms are generally divided into two major classes: deterministic and probabilistic ones.
Deterministic tractography algorithms [32], [11], [34] incrementally reconstruct fiber trajectories starting from a seed point and following the maximum local diffusion information (i.e., a streamline process). Such algorithms are relative fast and easy to compute, but lack of robustness and errors are propagated exponentially.
To overcome the shortcomings of deterministic tractography in dealing with the uncertainty in fiber orientations, probabilistic algorithms have been introduced [35], [27], [24]. They generate multiple trajectories from a given seed point (up to thousands) by independently repeating a streamline tracking process and randomly choosing the next direction to follow at each step. Some of the reconstructed paths will correspond to anatomically meaningful fiber bundles, others no. The output of such algorithms is not a trajectory, but a map of "probability" for a given point to be traversed by an anatomically genuine fiber bundle. The computation of these maps is burdensome, and their interpretation is still controversial.
More recently, a new approach called global tractography has been proposed by means of some fiber-tracking algorithms [29], [45], [46], [23], [30]. These algorithms share the idea of searching a fiber path as a global parameter optimization process in order to improve robustness to the noise or to local errors of diffusion directions. In [29], the authors model fiber trajectories as spline curves having control points randomly drawn as a Bayesian based distribution. The obtained results are interesting, but the method is computational expensive because the initialization of the control points and the sampling process are two very time-consuming processes.
Another recent promising approach has been introduced by [23] and [30], in which each fiber bundle trajectory is obtained by minimizing Ising spin glass-type model. In general, such approach requires a very high computational time.
1.3. Comparison with previous Hamilton-Jacobi-Bellman based models. In [36] it is proposed a variational model for diffusion MR, settled in a continuous framework, based on ideas from control theory.
In this model, it is imagined that an infinite number of particles starts at time 0 from the boundary ∂S of a given seed region S evolving along the streamline given by the gradient ∇C * S of a suitable cost function C * S which assigns to each point x a cost to reach the seed region S related to an optimal control problem. At time t the evolution front (i.e. the level curves of C * S ) will represent the propagation along the medium. If the seed region is sufficiently small, for short time 0 < t < τ we have an approximate picture of the neural net near the seed region.
More precisely, the procedure used by the authors is summarized here (see Section §2 in [36] and Section 4 for the notations): ds.

Given a seed region
3. The propagation front is the level set of the value function C * S , and it evolves with normal speed F given by 4. Diffusion data define the front speed propagation F . The local cost Ψ and the front speed F are related by the Legendre transform: : v,n ≥ 0 , and, in the setting of the problem, the proposed local cost function Ψ is related to the diffusion tensor D(p) (see Section 4.1 in [36]). 5. Finally, we can consider different cost functions, i.e. different functions Ψ 1 , Ψ 2 , which yield different value functions C * S,1 , C * S,2 related to the same seed region. The quotient C * S,1 (x)/C * S,2 (x) can be used to compare different costs. If we take Ψ 1 to be identically 1, the related cost C * S,1 (x) is the Euclidean distance of x from S. This method turns out to be very useful to establish the connectivity of a single point p with the region S, since it is sufficient to consider the optimal trajectory (minimizer of the cost) steering p on S, but gives few information about the connectivity between two regions.
Experimental data show that neuronal fibers are not uniformly distributed in the brain, hence it is natural to associate also a neuronal density to the seed region and let this density evolve along the streamlines. This approach is not covered by [36], since they do not consider the possibility to associate a density to the seed region. Indeed, this will provide more robust information about the real path of neuronal fibers, moreover the problem can be set in the more general framework of optimal transportation theory.
The basic problem that we are going to investigate is about the connectivity between two regions, each endowed with a (possibly nonconstant) density. If we represent initial and final densities as gray scale images, the problem can be seen as a find a continuous interpolant (warping) from the first image to the second one. In Section §5.2.3 of [7], it is described a technique, based on optimal transportation theory, to obtain this interpolation. It can be viewed as a geodesic problem in the space of probability measures endowed with a metric called Wasserstein distance. According to this, if we have two compactly supported probability measures µ 1 , µ 2 (in the smooth case, where ψ is a scalar convex function, and the geodesic joining µ 0 and µ 1 according to the Wasserstein distance is µ t := (tx + (1 − t)∇ψ(x)) µ 0 , i.e. the push forward of the initial measure along the points of the straight line segment joining Id R d and ∇ψ.
The fundamental idea of our approach is to combine the optimal control and optimal transportation approaches, i.e. construct a continuous warping that, in a certain sense, will take into account the geometry induced by the optimal control metric (hence the transport will not be done on straight lines). In the smooth case, where most of our computation are still developed, the problem is essentially reduced to an optimal control problem where densities are transported along optimal trajectories, but this quite general approach allows in principle to treat much more difficult situations.
1.4. An optimal control approach based on optimal mass transport theory. In this paper we propose a model that takes the microstructural information into account during the fiber trajectories building process in the case of DSI tractography. Exploiting microstructural information, our model can be used to investigate some ambiguities that often occur in white matter and are difficult to deal with, e.g. distinguish between fiber crossing and kissing, improving the specificity and accuracy of tractography results.
We aim to take into account the whole information given by the DDF in order to reconstruct the actual underlying diffusion process, which as observed above is highly anisotropic and takes place essentially along the fiber paths.
The knowledge of the underlying diffusion process allows in principle to recover the whole neuronal fiber net by considering the streamlines corresponding to displacements along high probability directions as encoded in the DDF. These should correspond to the actual fibers. Moreover, the ratio between the actual number of fibers crossing in a point associated to different directions will be equal to the ratio between the respective direction probabilities. Another constraint to take into account is that each fiber has no endpoints in the interior of the brain white matter: in other words, the unit tangent field to each fiber has to be divergence free.
Having in mind the previous considerations, we will focus here on the connectivity problem, i.e. the problem of finding the bundle of fibers connecting two given (small) cerebral regions, by considering it as the superposition of paths that fulfill the following requirements: their tangents form a solenoidal field that fits as much as possible with the velocity distribution given by the DDF.
Observe that we may consider each single fiber path connecting two given regions as the trajectory of a particle moving along this fiber, starting from the first region and reaching the second one in a fixed lapse of time T > 0.
By considering in the first region a mass distribution given by one moving particle on each fiber, we obtain an evolution law relating at each time 0 < t < T the velocity of the particles with their distribution, which is quantitatively expressed by the continuity equation (see 2.2). Since neither sources nor sinks are present along the fibers (solenoidality assumption), the total mass during the process is conserved.
We may thus formulate the following problem: consider two given regions, assign to each one a mass distributions with same total mass, and transport the first mass distribution to the second one by respecting as much as possible the probability distribution on the velocities given by the DDF.
This amounts to assign a cost to any such mass transportation process that penalizes particle trajectories (transport paths) with lower probability velocity distribution according to the DDF data.
The minimizing set of trajectories will give as a byproduct a reasonable picture of the neuronal fiber bundle connecting the two regions.
This kind of problem is known as an optimal mass transportation problem, and in general it admits a solution under mild hypothesis. However, without some regularity assumption, solutions turn out to be difficult to interpret.
In the regular cases, the optimal transport solutions can be viewed as a superposition of noncrossing trajectories (transport paths) starting from the initial configuration to the final one (in our situation, they are the trajectories followed by each particle), and the cost function can be expressed as an integral of a cost function assigned to every single trajectory.
The optimization procedure reduces then to minimize the cost of every trajectory. This can be done with standard techniques in the framework of optimal control theory. The superposition of optimal trajectories will give us the picture of the bundle of fibers connecting the two considered regions.
We propose here two different cost functions, which are described in Sections 2.3 and 2.4, which provide optimal transport paths solutions between two given region thanks to Theorem 2.5, yielding in particular well-posedness of the proposed model. In this way we are able to provide a quantitative criterion that measures the connectivity between two localized cerebral regions (see formula (10)), extending similar criteria in [36]. Roughly speaking, we compare the Euclidean distance of points belonging to the distinct regions with the geodesic distance measured along the corresponding optimal transport paths, which in turn corresponds to the length of the fibers connecting the two regions. For instance, a geodesic distance considerably greater than the euclidean one has to be interpreted as a poor or totally absent neuronal fiber connection between those regions. The plan or the paper is the following. In the next section we first recall and analyze those quantities related to the DSI data that will enter in our modelization. We then recall basic theoretical background on mass transportation theory and its relation with optimal control and Hamilton-Jacobi PDE's, proving in particular Theorem 2.5. We hence discuss two kind of cost functions related to our problem and analyze their regularity properties to ensure existence of optimal transport paths solutions according to Theorem 2.5.
The final section is devoted to the discussion on advantages and shortcomings of our model, the formula giving our quantitative criterion for brain connectivity, together with further remarks and open problems. In the Appendix we fix notations and give some basic results.
2.1. Discussion of the DSI data. The DDF function assigns to each voxel a probability distribution on the space of velocities. A voxel is represented by a set Q(α), defined as the unit cube centered at Due to the highly anisotropic character of the medium where diffusion occurs, the fraction of molecules displaced in the direction of a neuronal fiber will be significatively different from the fraction of molecules displaced in the orthogonal directions. Moreover, many fibers aligned in the same direction will result in an higher value of the DDF corresponding to that direction than a single fiber. By construction, we have is a probability density on R 3 . This means that if we consider all possible displacements v ∈ R 3 and sum all the fractions of molecules which moved in that directions, we recover the whole of the initial number of molecules in Q(α).
The main properties of f D are the following: 1. if we fix α and a direction w ∈ S 2 , we have that the probability of a displacement in direction w follows a normal distribution according to its magnitude; 2. for α fixed, we have that if two directions are sufficiently close, the respective distributions are close, i.e. w → f D (α, w) is continuous; Of particular significance for the construction of our model is the set M (α) of displacements with cumulative probability less or equal 1/2. Remark that this set may be different from the graph of the ODF usually considered in current models.
The set M (α) contains the zero vector and its boundary is given by those vectors v whose modulus is the median of the magnitudes of observed displacements in the direction v/|v|. This set is in general nonconvex, however it is centrally symmetric. We can think to ∂M (α) as a set of averaged observed velocities.
We associate the set M (α) to the center of the voxel. We consider a suitable continuous extension f (x, v) of the DDF to the whole voxel in such a way that and, for any direction w ∈ S 2 and λ ∈ R, the profile of f (x, λw) is similar to f D (α, λw). We set for instance where a, r : R 3 × S 2 →]0, +∞[ are continuous strictly positive functions bounded away from 0 such that with dΣ the area element of S 2 . We assume that, for fixed x, w → r(x, w) and w → a(x, w) are Lipschitz continuous functions. We may hence define the following set of displacements M (x) for any x ∈ M (α): Boundary points of this set are precisely those displacement with cumulative probability function 1/2. We also define the dual Z(x) of M (x) by setting: The sets M (x) inherit from f some regularity properties: x → M (x) is continuous with respect to the Hausdorff distance, moreover M (x) is compact, star-shaped and centrally symmetric. These properties of M (x) will be crucial in the regularity issues for the optimal mass transportation problem, defined in term of the sets M (x), that we discuss in the next sections. Let us summarize these and other immediate properties of the sets M (x) and Z(x).

Proposition 1.
We give a first set of properties on the sets M (x): is strictly star-shaped with respect to 0, for every x ∈ R 3 .
Proof. All the listed properties follow immediately from the definition of M (x).

Remark 1.
We briefly discuss the physical meaning of some of these properties: -Property (M2) implies that we should allow small displacements in all the directions, due to possible measurement uncertainity. This follows from the fact that r(x, w) > 0 and hence inf{r(x, w) : w ∈ S 2 } = min{r(x, w) : w ∈ S 2 } > 0. -Property (M3) says that only bounded displacements are of physical significance, indeed the probability of a displacement in a given direction decays exponentially with its magnitude. This follows from the fact that sup{r(x, w) : w ∈ S 2 } = max{r(x, w) : w ∈ S 2 } < +∞.
These properties of M (x) will be crucial in the regularity issues for the optimal mass transportation model, defined in term of the sets M (x), that we discuss in the next sections.

2.2.
Relation with optimal transportation and Hamilton-Jacobi theory. Assume to have two compactly supported mass distributions with densities ρ 0 and ρ 1 , such that R 3 ρ 0 (x) dx = R 3 ρ 1 (x) dx, representing respectively the initial and the final configuration of a distribution of particles transported trough the neuronal fiber paths. As described in Subsection 1.4, a way to recover the neuronal fiber bundle connecting ρ 0 with ρ 1 is to consider the superposition of the particle trajectories as the solution of an optimal mass transportation problem with marginals ρ 0 and ρ 1 . Namely, we consider the problem of transporting ρ 0 in ρ 1 minimizing a suitable cost which depends on the path followed by the mass particles and which reflects the behavior of the extended DDF. This problem is referred to as a Monge-Kantorovich problem, and it admits an equivalent fluid dynamics interpretation due to Benamou and Brenier (see [12], [13], and Chapter 8 in [42]), that we describe next.
Adopting the Eulerian point of view, for 0 < t < 1 we denote by ρ t (x) the density of a fluid at time t, such that ρ t=0 (x) = ρ 0 (x) and ρ t=1 (x) = ρ 1 (x), and by v t (x) the corresponding velocity field: these quantities are related by the continuity equation expressing conservation of the fluid mass. For (ρ t , v t ) we consider the following minimization problem: among all the absolutely continuous solution t → ρ t dx of the continuity equation (which must be understood in the weak sense, see Section 4) such that µ t := ρ t dx converges in the sense of measure respectively to the initial and final data µ 0 := ρ 0 dx and µ 1 := ρ 1 , dx when t → 0 + and t → 1 − . The function L : R 3 × R 3 → [0, +∞[ will be referred as a suitable current cost related to our problem. It will penalize the velocities with lower probability according to the DDF. If the vector field v t is sufficiently smooth, it is possible to consider the characteristic system and for each initial datum µ 0 we have that the solution µ t is given by µ t = T t µ 0 , where T t µ 0 denotes the push forward of the measure µ 0 by the vector field T t , namely In this case (1) reduces to which can be rewritten as: If we introduce the cost: the problem can be recasted in the Monge's form: In this case, if we set γ x (t) = v t (x) and assume that v t is regular enough to solve (2), we have that γ x (t) is optimal for We refer to such a v t as an optimal interpolating vector field, which allows us to reconstruct the optimal trajectories. It is well known that Monge's problem requires some regularity assumptions to admit a solution. But due to lack of compactness the problem cannot be restricted in general to the class of absolutely continuous measures. A relaxed formulation on the space of probability neasures, introduced by Kantorovich in 1942, involves the notion of transport plans rather than transport paths or optimal transport vector fields, leading to generalized solutions. Consider the set of transport plans from µ 0 to µ 1 defined by Π(µ 0 , µ 1 ) = π ∈ P(R 3 × R 3 ) : π(A × R 3 ) = µ 0 (A), for all µ 0 -measurable A π(R 3 × B) = µ 1 (B), ∀ µ 1 -measurable B , and the following class of functions: The Kantorovich's problem is to find This is of course a relaxation of the Monge's problem: indeed if γ x (1) is optimal in the Monge's problem, then (id × γ x (1)) µ 0 is optimal in the Kantorovich's problem. We recall the following result (see Theorem 1.3 in Ref. [42]) ensuring existence of a solution for Kantorovich's problem: Moreover the infimum in the left hand side is attained, and to compute the supremum in the right hand side we can restrict to bounded and continuous functions in F(µ 0 , µ 1 , c).
The model case is given by c(x, y) = |y − x| p , i.e. a power of the Euclidean distance between x and y. In this case we can give the following: Definition 2.2. Let X = Y = R 3 , 1 ≤ p < +∞, and µ 0 , µ 1 be probability measures on R 3 . We define the p-Wasserstein distance by setting this turns out to be a metric on the space of probability measures with compact support (see Section 7.1 in [6]).
To have more insight on optimal transport plans, we recall the following (cfr. Definition 2.33 in [42]) c-concave functions possess a PDE characterization that turns out to be extremely useful in our setting (we refer to [10] for a comprehensive introduction to the theory of viscosity solutions of Hamilton-Jacobi equations): Then we have that ϕ(0, x) = f (x), ϕ(1, x) is c-concave, and ϕ(t, x) is the viscosity solution of the following Hamilton-Jacobi equation: where H is the Legendre-Fenchel transform of L (see Corollary 3.6 p.151 in [10]).
We recall next the following characterization of optimal transport plans, referring to Section 2.4 in Ref. [42] and [2] for the proof and further references: Assume that there exists a µ 0 -measurable function c 1 : R 3 → R and a µ 1 -measurable function c 2 : c(x, y) dπ(x, y) < +∞.

Then:
1. there exists a c-concave function f ∈ L 1 (R 3 , µ 0 ) such that f c ∈ L 1 (R 3 , µ 1 ) and (f, f c ) achieves the supremum in (7). 2. Kantorovich's problem (6) has a solution, and a plan π ∈ Π(µ 0 , µ 1 ) is optimal iff π is concentrated on the set 3. If c(x, y) is continuous, bounded below, and µ 0 , µ 1 are compactly supported, then f, f c are upper semicontinuous. If x → c(x, y) is locally Lipschitz on a set U and the Lipschitz constant is locally independent of y, then f can be chosen to be locally Lipschitz on U .
We notice that Proposition 3 applied taking as f the c-concave function given by Theorem 2.4 produces a function ϕ, solution of (9) (for further details on this, we recall Subsection 2.5.2, Subsection 5.4.6 in Ref. [42]). If ϕ is sufficiently smooth, problem (6) admits a solution in the original Monge's formulation.
More precisely we have: Theorem 2.5. Assume that 1. L : R 3 ×R 3 → R is continuous in the x-variables and convex in the v-variables; and it is C 1 in the p-variables; 4. the cost c(x, y) satisfies hypothesis of Theorem 2.4. Then problem (6) admits a solution with a C 1,1 displacement interpolation, and the corresponding optimal interpolating velocity vector field v t in the characteristic D x ϕ(t, x)).
Proof. The proof is inspired by the remark at the beginning of p.181 in [42]. We recall that a function h(x) is called semiconcave if there exist C > 0 such that the function x → h(x) − C|x| 2 is concave.
According to Theorem 2.4, the initial datum f of Equation (8) can be chosen to be locally Lipschitz continuous. By Theorem 5.3.8 in Ref. [15], we have that viscosity solutions of Equation (8) are locally semiconcave in ]0, T [×R 3 . Observe that since L(x, q), and hence H(x, p) are even respectively in q and p, we have c(x, y) = c(y, x).
Let us now consider the optimal transport from µ 1 to µ 0 . By the strict convexity assumption on L, it is unique and it is simply obtained by reversing the velocity vector where v t is the optimal interpolating vector field related to optimal transportation from µ 0 to µ 1 for 0 ≤ t ≤ T . By Equation 13.5 in Ref. [43], we have a.e., where ψ solves the Hamilton Jacobi equation corresponding to the optimal transport from µ 1 to µ 0 . This yields D x ϕ(t, x) = −D x ψ(T − t, x). Since both ϕ and ψ are semiconcave in ]0, T [×R 3 , we have that there exists K > 0 such that for fixed s, t ∈]0, T [ we have This implies that, for fixed t, x → ϕ(t, x) is semiconcave and semiconvex in R 3 and hence it is C 1,1 (R 3 ) (see [15] for details on properties of semiconcave functions).
According to Chapter 13, Equation 13.5 in Ref. [43], the two relations define a vector field v t which, through the solution T t (x) of the characteristic system (2), realizes the minimum in the transport problem between ρ 0 and T t ρ 0 for every t ∈]0, T [, i.e. v t is the optimal interpolating velocity vector field.
In the models proposed in Subsections 2.3 and 2.4, we choose a current cost L penalizing the deviations of velocities from being on the boundary of M (x) and ensuring the regularity properties requested in Theorem 2.5. In particular, the whole information on the optimal interpolating transport vector field v t is encoded into the solution ϕ of the Hamilton-Jacobi equation (8) associated to the problem.
Viscosity solutions ϕ of Equation (8) can be viewed as value functions of a suitable optimal control problem (cfr. [10]). We will actually formulate our models relying on this point of view, partially following ideas from [21] and [2]. Recall that there is a well-established literature to effectively treat, also from a numerical point of view, such class of optimal control problems.
The transport cost associated to this choice is given by: i.e. c 1 is the minimum time needed to steer x to y along absolutely continuous curves ξ(·) satisfyingξ(t) ∈ M (ξ(t)) for a.e. t. Notice that according to Aumann's Theorem (see e.g. Theorem 5.15 in Ref. [17]), in the definition of c 1 , M (x) can be replaced by its convex hull . If we set H 1 (x, p) = δ M (x) (−p) − 1, where δ C (p) := sup q∈C q · p is the support function, it is known (see Proposition IV.2.3 and Theorem IV.2.6 in [10] for the proof) that for fixed x, the function y → c 1 (x, y) is the unique continuous and bounded below viscosity solution of H 1 (y, Du(y)) = 0 for y ∈ R 3 \ {x} coupled with u(x) = 0. Moreover c 1 is a metric on R 3 and there are constants k 1 , k 2 > 0 such that k 1 |x−y| ≤ c 1 (x, y) ≤ k 2 |x−y|. Unfortunately H 1 is not strictly convex, whence the solution of (8) may not satisfy the regularity assumptions in Theorem 2.5. To tackle this difficulty, we add a quadratic perturbation depending on a small parameter ε > 0 in order to obtain smooth approximate solutions. We set: Notice that for ε → 0 + we have c ε 1 (x, y) → c 1 (x, y). For every fixed x, the function v → L ε 1 (x, v) := 1 + ε|v| 2 /2 is strictly convex, and we denote by p → H ε 1 (x, p) its Legendre transform, which turns out to be C 1 . We notice that, according to Proposition 1, Theorem 2.5 applies for L ε 1 , yielding a C 1,1 solution ϕ ε of (8). In particular, the optimal interpolating vector field v ε t is given by v ε t (x) = ∇ p H ε 1 (x, D x ϕ ε (t, x)) for a.e. x. We associate to this cost a generalized Wasserstein distance, defined, for two given probability measures µ 0 , µ 1 , by In Section 2.5 we will formulate a quantitative brain connectivity criterion in term of this distance.
2.4. Intrinsic metric model. In this second model, we modify the infinitesimal metric on R 3 in order to take into account the anisotropic character of the diffusion as expressed by the DDF, by using the gauge function (see 4 for a definition) to penalize v t (x) according to its distance to M (x). This equips R 3 with the structure of a geodesic space, where the gedesics correspond to the optimal transport trajectories. Due to technical reasons (see Subsection 3.2), we relax the problem replacing the set M (x) with its convex hull and set We define the following cost function for x, y ∈ R 3 : and accordingly we define the distance In this case there are no constraints on velocities, so that the Hamiltonian H 2 associated to the Lagrangian L 2 is simply given by Also in this case, thanks to Propositions 1 and 2, Theorem 2.5 applies, due to strict convexity of L 2 , so we obtain a solution of the Monge problem and the optimal interpolating vector field v t is given by v t (x) = ∇ p H 2 (x, D x ϕ(t, x)) for a.e. x.

2.5.
A quantitative brain connectivity criterion. From a medical point of view, it turns out to be important to know not only whether or not two regions are connected by a fiber bundle, but also the quantity of fibers forming that connecting bundle. This connectivity problem is relevant in particular to describe situations where, due to medical pathologies, we have two cerebral areas connected by fibers that pass through a damaged region where the connection is interrupted or strongly hampered: if the connection was ensured by several fibers, the damaged area will not jeopardize the connection between the two given areas as much as in the case of a scarce number of connecting fibers. We suggest the following criterion in order to quantitatively estimate the connectivity of two regions D 0 and D 1 , by means of the transport distance W ci 2 : let µ 0 and µ 1 be two absolutely continuous measures with the same mass, constant densities, and support coinciding respectively with D 0 and D 1 . Define the ratio of the transport distance W ci 2 (related to the transport cost c i between µ 0 and µ 1 , as defined in Subsections 2.3 and 2.4) and the Hausdorff distance d H (related to the Euclidean metric) between D 0 and D 1 . A small quotient in (10) implies a high fiber density connection between D 0 and D 1 . This kind of information is apparently not provided by other tractography models that ignore information on fiber density.
The comparison between the value of Q(D 0 , D 1 ) obtained for the same brain areas in healty (reference) and pathological subjects could be an indicator of the effective damage occurred.
2.6. Exact solutions. In both the models described in Subsections 2.3 and 2.4, in order to apply Theorem 2.5 to obtain the required smoothness yielding the optimal interpolating vector field v t , some manipulations to the cost functions were performed. We underline the fact that, indeed, the transport problem admits a solution in the Kantorovich's reformulation even when, due to the lack of smoothness, it is not possible to define a classical optimal interpolant vector field and a classical solution to the characteristic system (see Theorem 2.4).
Indeed, considering the original cost function c 1 of Subsection 2.3 (corresponding to ε = 0) and the unmodified LagrangianL 2 (x, v) = γ 2 M (x) (v) of Subsection 2.4 does not prevent the application of Theorem 2.4 and the definition of corresponding generalized Wasserstein distances even in this case. In particular, the connectivity criterion of Subsection 2.5 can be stated even in these cases.
The quadratic perturbation added to c 1 in Subsection 2.3 is a regularizing term in order to have a representation of quasi-optimal transport problem solutions by means of interpolating vector fields, which is much more intuitive than to make use of so-called dynamical transport plans, i.e. transporting plans depending on time, which would characterize solutions in a more general setting.
In [38] and [39] it is shown that, under suitable assumptions, it is possible to associate to an Hamilton-Jacobi equation H(x, Du) = 0 a generalized distance (the model case is H(x, p) = |p| 2 − 1, which is related to Euclidean distance). More precisely, it is possible to construct a metric S(y, x) such that, for fixed x, the Lipschitz continuous map u(y) := S(y, x) is a viscosity solution of H(y, Du(y)) = 0 in R d \ {x}, with boundary datum u(x) = 0. This result, which is well known in the convex case, still holds in certain nonconvex cases. Indeed, in our framework, if we set H(x, p) = γ 2 M (x) (v) − 1, by Proposition 1, the results of Section 4 in [38] can be applied, ensuring the existence of such metric. Accordingly, we can set c(x, y) = S(x, y) and solve the Monge-Kantorovich transport problem, stating the connectivity criterion. However, in the nonconvex cases, it may happen that the metric S is not a path metric: the distance between two points is no longer given by the infimum of the (intrinsic) lenght of the curves joining x, y, i.e. it may happen that where ξ : [0, T ] → R d is a continuous curve joining x to y, and the supremum is taken with respect all the finite increasing sequence t 1 < ... < t n with t 1 = 0 and t n = T .
S is a metric too, and it is called the path metric associated to S. We have S = S if and only if S is a path metric.
This fact prevents the recovering of an optimal interpolating vector field and the solution of the characteristic system. Theorem 5.1 in [38] states that, however, in our cases the path metric S (y, x) is the metric associated to the convexification of p → H(x, p).

Final remarks and open problems.
In this section we discuss some issues related to our model, and present some directions that need further investigation.

3.1.
Advantages of the proposed model. It is an experimental fact that the measurements in each voxel enjoy very poor robustness properties, and sometimes we can have a voxel where no information are available, surrounded by regular voxels. This missing cell problem is not adequately faced by classical models, since they assume that no fibers can cross such voxels. In order to avoid this situation, we propose to assign to each empty voxel Q α a constant profile M (α) given by a ball, whose radius should be determined by experimental data. In this way, the missing information in the voxel will be in some sense reconstructed by the information contained in the neighboring ones. Indeed, in our construction, if the infinitesimal metric is given by (the dual of) a ball, it will not change the direction of integral lines hence will not affect the data coming from the neighborhood. So fibers will be propagated in the missing voxel according to the behavior in the nearby ones. The smaller the radius of the assigned ball, the stronger must be the surrounding information to have a fiber passing through the missing cell. The lowest possible radius is given by the sensitivity of instruments.

3.2.
Mathematical issues and drawbacks. From a mathematical point of view, the main issue here is to provide from an optimal transport plan, solution of the Monge-Kantorovich problem, a unique optimal interpolating velocity vector field v t describing the evolutive process. The obstruction is given by the lack of smoothness of v t , due to the presence in a voxel of multiple fibers crossing along different directions. The strategy that can be used to overcome these difficulties are based on explicit representation formulas for the solutions, regularization procedures and, in the case of nonuniqueness, also on selection principles (e.g. selecting the smoothest one among all the solutions). We refer to [6] and [5] for a complete analysis of these topics, just recalling that the use of selection principles with respect to certain criteria (e.g. curvature of fibers) is extensively used also in other tractography models (e.g. [45]). In the second model, described in Subsection 2.4, the problem was relaxed by making a convexification of the sets M (x), similar to the (implicit) convexification made in [36] by using the Legendre transform. This relaxation has no physical meaning, since the sets M (x), giving the average displacement probabilities, are characterized by strong anisotropy, hence are nonconvex. However, in the nonconvex case, since Lax-Hopf's representation formula no longer holds for the associated Hamilton-Jacobi equation, it is not possible in general to define a optimal interpolating velocity vector field v t based on characteristic curves. For details and counterexamples on this delicate point, we refer to [38].
3.3. Work in progress. Reconstructing a reliable picture of the whole net of neuronal fibers will actually require a large number of implementations of our model, choosing several pairs of brain areas according to experimental data and/or our quantitative criterion. The algorithmic implementation and the numerical validation of our model is currently still under investigation. Our aim is to provide also an equivalent description of this model in the framework of mean field games, and take advantage of the associated numerical methods (see e.g. [1]). 4. Appendix. Our environment will be the Euclidean n-dimensional space R n . We will denote by L n and H n respectively the Lebesgue and the Hausdorff ndimensional measures.

General notations and definitions.
Our main reference for the properties of convex sets in R n is [37]. A detailed introduction to the theory of viscosity solutions and its applications is given in [10].
Definition 4.1. Let K 1 , K 2 be compact subsets of R n . The Hausdorff distance between K 1 and K 2 is given by: This function is a pseudometric on the nonempty elements of the power set 2 R n := {A : A ⊆ R n } and a metric on the compact subsets of R n . Unless explicitly stated, we will always equip the compact subsets of R n with this metric. Definition 4.2. Let K ⊂ R n , a ∈ K. We say that K is convex iff for every x, y ∈ K, λ ∈ [0, 1] we have λx + (1 − λ)y ∈ K. Given K ⊆ R n , we will denote by co(K) its convex hull, namely the intersection of all convex subsets of R n containing K. Definition 4.3. Let K be a closed convex subset of R n . We define: 1. the support function to K is the function δ K : R n → R defined by δ K (q) = sup p∈K q, p , and we recall that δ K (q) = p 0 , q for some p 0 ∈ K iff q ∈ N K (p 0 ), where N K (p 0 ) := {v ∈ R n : v, p − p 0 ≤ 0 for all p ∈ K} is the normal cone in the sense of convex analysis. 2. if 0 ∈ int(K), the gauge (Minkowski) function of K is defined by: γ K (q) = inf{λ > 0 : q/λ ∈ K}.

4.2.
Measure theoretic solutions of the continuity equation. We collect here the basic facts on the measure solutions of the continuity equation, we refer for all the details to [6].
Definition 4.4. We denote by M loc (R d ) the set of signed Radon measures on R d , by M + loc (R d ) the set of non-negative Radon measures on R d and by M loc (R d ; R h ) the set of R h -valued Radon measures on R d . We recall that the push-forward G µ of a measure µ on R d through a measurable vector field G : R d → R d is defined by setting for all ϕ ∈ C 0 c (R d ) by equivalently G µ(B) = µ(G −1 (B)) for every B ⊆ R d . We will denote by P(R d ) the set of all probability measures on R d . For p ∈ R we define the set P p (R n ) := µ ∈ P(R n ) : R n |x| p dµ < +∞ .
Definition 4.5. Let {µ n } n∈N , µ be measures on R d . We say that µ n converges narrowly (or weakly * ) iff for every continuous bounded function f : R d → R.