Multilevel bioluminescence tomography based on radiative transfer equation Part 2: total variation and l1 data fidelity

Abstract: In this paper we study regularization with both the l1 and total-variation (TV) norms for bioluminescence tomography based on the radiative transfer equation, compare l1 data fidelity with l2 data fidelity for different types of noise, and propose novel interior-point methods for solving the related optimization problems. Simulations show that our approach is not only capable of preserving the shapes, details and intensities of bioluminescent sources in the presence of sparse or non-sparse sources with angular-resolved or angular-averaged data, but is also robust to noise, and thus has potential for efficient high-resolution imaging with only boundary data.


Introduction
In bioluminescence tomography (BLT) [1][2][3][4], the diffusion approximation (DA) is commonly taken as the forward model, and the reconstruction is then formulated as a standard least-squares problem, i.e.,

\min_Q \|JQ - P\|_2^2 + \lambda_2 \|Q\|_2^2, \qquad (1)

in which J is the system matrix, usually called the Jacobian or sensitivity matrix, corresponding to the forward model, P is the measurement vector, and \lambda_2 is the l2-regularization parameter. However the reconstructed images based on DA from (1) are not satisfactory, mainly for two reasons. First, physically, DA may not be accurate, especially in the non-diffusive regime, e.g., close to the source, near the boundary, with forward-peaking scattering, with discontinuous absorption or scattering, or when absorption is smaller than or comparable with scattering. Second, mathematically, BLT with DA as the forward model is severely ill-posed. Since the boundary measurement provides only angular-averaged data whose dimension does not match that of the interior region, i.e., the number of measurements M is much smaller than the total number of unknowns N_s numerically, it was shown mathematically [3] that there is no unique solution for this inverse source problem based on DA. Moreover, J comes from a smoothing diffusion operator which destroys image singularities and features. In practice all the above means that one has to solve a very ill-conditioned and underdetermined system for DA-based BLT. In general, one does not expect good image quality with a clean background from l2-regularized BLT with the DA forward model alone, without either more data or a reduction of unknowns [5][6][7][8][9][10][11][12].
As a remedy for poor reconstruction due to forward modeling by DA, we use the radiative transfer equation (RTE) [13][14][15][16] as the forward model. Physically, RTE is an accurate model for photon migration in tissues. Moreover, with RTE one can utilize much richer data, such as boundary angular-resolved measurements, whose dimension can match that of the interior region. Mathematically, it was shown [17,18] that the RTE-based inverse source problem with angular-resolved boundary measurements almost always has a unique solution with certain stability. Numerically this means that RTE-based BLT results in a better conditioned and less underdetermined system than DA-based BLT. So in applications in the non-diffusive regime, it is more appropriate to use the RTE model, and better image reconstruction is expected.
A powerful and general framework for dealing with inverse problems is the variational formulation. It usually contains two key ingredients in the energy functional: data fidelity, which matches the solution with measurements based on an appropriate forward model, and regularization, which regularizes an ill-posed problem. With a probabilistic interpretation, the form of the data fidelity should be determined by the noise model and the form of the regularization by a priori knowledge. The balance between these two is usually determined by the signal-to-noise ratio of the data and the smallest scale one wants to resolve in the reconstruction.
In general l2 regularization tends to penalize large elements and create spurious small elements, which smears the reconstructed image and produces a noisy background when the inverse problem is severely ill-posed. On the other hand, l1 regularization weights all elements equally and tends to produce a sparse solution with a clean background, which is a suitable choice for sparse sources [19][20][21][22]. As a result, we studied an l1-regularization strategy in a multilevel framework and showed its superiority over the conventional l2 regularization for sparse sources in our previous work [23], in which we minimize

\min_Q \|JQ - P\|_2^2 + \lambda_1 \|Q\|_1. \qquad (2)

However, l1 regularization tends to find sparse solutions, which on one hand makes it successful for sparse source reconstruction, but on the other hand may shrink the support for non-sparse sources. Therefore, it is desirable to design more appropriate formulations for non-sparse sources that also inherit the merits of l1 regularization. In this paper we propose and compare a few different combinations of data fidelity and regularization for RTE-based BLT and justify their performance in different scenarios.
For a nearly piecewise-constant source distribution in BLT, a natural choice is total variation (TV) regularization, which was first proposed in [24] and has wide applications in image processing. We consider the following minimization problem

\min_Q \|JQ - P\|_2^2 + \lambda_{TV} \|Q\|_{TV}, \qquad (3)

in which \lambda_{TV} is the TV-regularization parameter and \|\cdot\|_{TV} denotes the TV norm that we shall specify in the next section. It is well known that TV regularization [25] tends to be effective for piecewise-constant reconstruction and can preserve the boundary of large objects well while removing small features and thin or small objects such as small inclusions or sparse sources. Therefore we propose a combination of l1 and TV regularization, with an interesting geometric balance between them that will be explained in the next section. We shall call it l1 + TV regularization; it combines the merits of both l1 and TV regularization in the sense that it successfully reconstructs piecewise-constant images that preserve both geometry and details with little background noise for both sparse and non-sparse sources. The optimization problem becomes

\min_Q \|JQ - P\|_2^2 + \lambda_1 \big( \|Q\|_1 + r \|Q\|_{TV} \big). \qquad (4)

However, if the source distribution is smooth, it is well known that the H1 seminorm, i.e., the l2 norm of the gradient, is a good regularization. For example, H1 regularization with l2 data fidelity was used for RTE-based optical tomography in [26]. For comparison purposes, we present the mathematical formulation and simulation results with H1 regularization in Section 3.
Most often in practice the data may be contaminated by various types of noise. For Gaussian noise, the l2 data fidelity term is a good and natural choice. However, the l2 data fidelity term does not handle outliers well since it is biased towards large residuals. On the other hand, nonsmooth data fidelity [27,28] can reconstruct correct images under some assumptions. As a successful example of nonsmooth data fidelity, l1 data fidelity was first introduced in [29], and later found applications in medical imaging [30,31], computer vision [32] and image processing [33]. Therefore, we consider the following optimization problem with l1 data fidelity to deal with practical noise involving outliers:

\min_Q \|JQ - P\|_1 + \lambda_1 \big( \|Q\|_1 + r \|Q\|_{TV} \big). \qquad (5)

Besides, from our simulations, we find that l1 data fidelity with l1 + TV regularization tends to reconstruct results as good as l2 data fidelity with H1 regularization for smooth, non-piecewise-constant sources.
Finally, the multilevel reconstruction developed in [23] can be implemented for all the above formulations. If the source is sparse, the coarse mesh reconstruction, which costs much less computational effort and is better conditioned, provides not only a good initial guess but also a good localization for the fine mesh. Hence, the reconstruction on the fine mesh can be restricted to the localized region, which again reduces the computational cost. As another consequence, the number of measurements can be reduced. On the other hand, if the source is not sparse, the coarse mesh solution still provides a good initial guess for the fine mesh and improves overall computational efficiency and stability. Moreover, the coarse mesh reconstruction can also help to choose a better ratio r to balance the l1 and TV norms for the fine mesh reconstruction.
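The multilevel workflow described above can be sketched as follows. This is a minimal illustration in which a ridge-regression solve stands in for the paper's l1 + TV interior-point solves; the two-level mesh hierarchy, the thresholding rule for the support, and all function names are our assumptions, not the authors' implementation.

```python
import numpy as np

def ridge_solve(J, P, lam):
    # Stand-in coarse/fine solver (Tikhonov); the paper uses l1 + TV solves.
    N = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(N), J.T @ P)

def multilevel_reconstruct(J_coarse, J_fine, P, fine_to_coarse, lam=1e-8, tol=0.1):
    # 1) cheap, better-conditioned reconstruction on the coarse mesh
    q_coarse = ridge_solve(J_coarse, P, lam)
    # 2) localize: keep fine elements whose parent coarse element is in the support
    support = np.abs(q_coarse) > tol * np.abs(q_coarse).max()
    keep = support[fine_to_coarse]            # boolean mask on fine elements
    # 3) fine reconstruction restricted to the localized region (fewer unknowns)
    q_fine = np.zeros(J_fine.shape[1])
    q_fine[keep] = ridge_solve(J_fine[:, keep], P, lam)
    return q_fine, keep
```

The coarse solution also serves as an initial guess and, as noted above, can guide the choice of the ratio r on the fine mesh.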
Here is the outline of this paper. In section 2, we shall first briefly introduce RTE-based BLT, define TV norm on triangular mesh and describe our interior-point methods for solving (4) and (5); in section 3, we shall validate our algorithms with simulations; in section 4, we shall conclude and comment on our future work.

RTE-based BLT
In this paper, we use RTE as the forward model for the reasons described in the introduction. The details of an efficient RTE solver can be found in [34]. Next we introduce notation for the presentation.
For the forward problem, in vivo light propagation is modeled by the following RTE with the vacuum boundary condition

\hat{s}\cdot\nabla\Psi(r,\hat{s}) + (\mu_a + \mu_s)\Psi(r,\hat{s}) = \mu_s \int_{S^{n-1}} f(\hat{s}\cdot\hat{s}')\,\Psi(r,\hat{s}')\,d\hat{s}' + q(r,\hat{s}), \qquad \Psi(r,\hat{s}) = 0 \ \ \text{for } \hat{s}\cdot\hat{n} < 0 \text{ on the boundary}, \qquad (6)

where the quantities in the equation are the photon flux \Psi, which depends on both space r and angle \hat{s}, the light source q, the Henyey-Greenstein scattering function f with anisotropic factor g, the absorption coefficient \mu_a, the scattering coefficient \mu_s, the outer normal \hat{n} at the boundary, and the angular space S^{n-1}, i.e., the unit circle in 2D or the unit sphere in 3D. After discontinuous Galerkin discretization of (6), we denote the piecewise-constant isotropic bioluminescent source numerically by Q = [q_1, \ldots, q_{N_s}]. In this paper, we shall use two types of measurements, i.e., the boundary angular-resolved measurement \Psi itself and the boundary angular-averaged measurement \int_{\hat{s}\cdot\hat{n} > 0} \hat{s}\cdot\hat{n}\,\Psi\,d\hat{s}. For BLT, which is a linear inverse source problem, the Jacobian matrix J is composed of Green's functions of RTE. We refer readers to [23] for the computation of the Jacobian J. We emphasize here that, since we solve RTE without forming the system matrix, the adjoint solve can be realized in three steps: first, reverse the directions of the detectors and use them as sources, due to reciprocity; then compute in the same way as for the direct solver; and last, reverse the directions of the photon fluxes to obtain the adjoint fluxes.
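For illustration, once a set of discrete angular directions is fixed, the boundary angular-averaged measurement can be approximated by quadrature over the outgoing directions. The uniform-weight rule below is a sketch for the 2D unit circle, not the discontinuous Galerkin discretization actually used in [34]:

```python
import numpy as np

def angular_averaged(psi, dirs, n_hat):
    """Approximate the outgoing partial current  int_{s.n > 0} (s.n) Psi(s) ds
    on the unit circle by a midpoint rule over uniformly spaced directions."""
    w = 2.0 * np.pi / len(dirs)      # uniform angular quadrature weight in 2D
    s_dot_n = dirs @ n_hat           # s . n for every discrete direction
    outgoing = s_dot_n > 0.0         # keep only outgoing directions
    return np.sum(s_dot_n[outgoing] * psi[outgoing]) * w
```

For a constant flux psi = 1 and outer normal (1, 0), this quadrature converges to the exact value \int_{-\pi/2}^{\pi/2} \cos\theta\, d\theta = 2 as the number of directions grows.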

Regularization by TV
Although l1 regularization performs well for sparse sources and is capable of recovering high-resolution details, as demonstrated in [23], it tends to shrink the support of large or non-sparse sources, especially in the isotropic-scattering or diffusive regime. To overcome this, we introduce regularization by the TV norm \|Q\|_{TV} for nearly piecewise-constant source reconstruction.
Since our algorithm is discretized on a triangular mesh instead of a Cartesian grid, we shall use a discretized version of the TV coarea formula,

\|Q\|_{TV} = \sum_k L_k \, |q_{k,l} - q_{k,r}|, \qquad (7)

in which L_k denotes the length of the k-th interior edge, and q_{k,l} and q_{k,r} represent the left element and the right element with respect to the k-th edge, with the convention that the spatial index of the left element q_{k,l} is always smaller than that of the right element q_{k,r}, to avoid counting edges twice.
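A direct transcription of (7), assuming the mesh connectivity is given as an array of (left, right) element-index pairs per interior edge (a data layout we assume purely for illustration):

```python
import numpy as np

def tv_norm(q, edges, lengths):
    """Discrete TV norm (7) on a triangular mesh:
    sum_k L_k |q_{k,l} - q_{k,r}|, where edges[k] = (l, r) are the indices of
    the two elements sharing interior edge k and lengths[k] is its length."""
    left, right = edges[:, 0], edges[:, 1]
    return float(np.sum(lengths * np.abs(q[left] - q[right])))
```

For example, with element values (1, 1, 0) and two interior edges of lengths 2 and 3 separating elements (0, 1) and (1, 2), the discrete TV norm is 2·|1−1| + 3·|1−0| = 3.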

Optimization Algorithm for l1 and TV regularization
In this section, we shall describe an interior-point method [36] for solving (4), which can be easily modified for the minimization problems (2) and (3).
First we recast (4) as a constrained convex optimization problem with a differentiable functional:

\min_{Q,u,v} \ \|JQ - P\|_2^2 + \lambda_1 \Big( \sum_i A_i u_i + r \sum_k L_k v_k \Big) \quad \text{s.t.} \quad -u_i \le q_i \le u_i, \quad -v_k \le q_{k,l} - q_{k,r} \le v_k. \qquad (8)

Here A_i is the area of the i-th element. We incorporate the area weights A_i and edge weights L_k into the scheme to account for the non-uniformity of the mesh; this is not necessary on a uniform mesh.
Next we transform (8) into a sequence of t-dependent unconstrained convex optimization problems by penalizing the element and edge constraints with logarithmic barriers:

\Phi_t(Q,u,v) = t \Big[ \|JQ - P\|_2^2 + \lambda_1 \Big( \sum_i A_i u_i + r \sum_k L_k v_k \Big) \Big] - \sum_i \big[ \log(u_i + q_i) + \log(u_i - q_i) \big] - \sum_k \big[ \log(v_k + q_{k,l} - q_{k,r}) + \log(v_k - q_{k,l} + q_{k,r}) \big]. \qquad (9)

The standard interior-point method for solving the constrained convex problem (8) minimizes the sequence of unconstrained convex problems (9), whose solutions converge to that of (8) as t \to \infty; in fact the solution of (9) is no more than 2(N_s + N_e)/t-suboptimal, where N_e is the number of interior edges. Next we shall describe our PCG method for computing the Newton search direction by solving

\nabla^2 \Phi_t \, (\Delta Q, \Delta u, \Delta v) = -\nabla \Phi_t. \qquad (10)

Before that, we first compute the gradient \nabla\Phi_t and the Hessian \nabla^2\Phi_t, which are assembled from local vectors and matrices; the equalities should be understood in the sense that the left-hand side is assembled from the matrices on the right-hand side with the correct ordering of unknowns.
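As a sanity check on the formulation, the log-barrier functional (9) can be evaluated directly at a strictly feasible point; the sketch below is only illustrative, and the exact scaling of the fidelity term is an assumption.

```python
import numpy as np

def barrier_objective(t, q, u, v, J, P, lam1, r, areas, lengths, edges):
    """Evaluate Phi_t = t * [ ||Jq - P||_2^2 + lam1 * (sum_i A_i u_i
    + r * sum_k L_k v_k) ] minus log barriers for the element constraints
    -u_i <= q_i <= u_i and the edge constraints |q_{k,l} - q_{k,r}| <= v_k.
    Returns +inf outside the strictly feasible region."""
    resid = J @ q - P
    dq = q[edges[:, 0]] - q[edges[:, 1]]
    if np.any(u - np.abs(q) <= 0) or np.any(v - np.abs(dq) <= 0):
        return np.inf                  # barrier only defined when feasible
    f = resid @ resid + lam1 * (areas @ u + r * (lengths @ v))
    barrier = (np.sum(np.log(u + q)) + np.sum(np.log(u - q))
               + np.sum(np.log(v + dq)) + np.sum(np.log(v - dq)))
    return t * f - barrier
```

Such a routine is useful inside the backtracking line search, which must reject any step that leaves the interior of the feasible region.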
We then assemble, into both sides of (10), the following local vectors and matrices corresponding to each pair of constraints for an element or an edge. First, with the first column or row for q_i and the second for u_i, we form the local gradient vector and 2x2 Hessian block for the pair of constraints of the i-th element. Second, with the first column or row for q_{k,l}, the second for q_{k,r} and the third for v_k, we form the local gradient vector and 3x3 Hessian block for the pair of constraints of the k-th edge. It is easy to see that the assembled Hessian is symmetric positive definite. In order to make the PCG method work, we need a preconditioner P that is not only fast to invert but also a good approximation of the Hessian; here we choose a sparse approximation of the Hessian.

Algorithm: barrier method for optimization problem (8), given tolerance \varepsilon > 0, \lambda_1 and r.
Initialize t > 0 and a strictly feasible (Q, u, v). Repeat the optimization of subproblems (9): 1. compute the search direction (\Delta Q, \Delta u, \Delta v) by solving (10) with PCG; 2. find a step size s by backtracking line search; 3. update (Q, u, v) \leftarrow (Q, u, v) + s(\Delta Q, \Delta u, \Delta v); and increase t between subproblems until 2(N_s + N_e)/t < \varepsilon. Next, we consider the balance between l1 and TV regularization. For a source distribution that is piecewise constant with uniform intensity I, the TV norm is approximately IC, where C is the perimeter of the boundary of the support, and the l1 norm is approximately AI, where A is the area of the support. So in this case a good choice of the ratio r should be roughly proportional to A/C. Note that A/C can be made independent of the length unit, since one can always nondimensionalize RTE (6) so that A/C is dimensionless, and r should be of this size. This implies that the ratio r gives a geometric balance between the l1 norm and the TV norm. For example, l1 regularization should weigh more than TV regularization for sources with a smaller ratio A/C, e.g., sources with small or thin support, which correspond to sparse sources. As a result, with sparse sources as a priori knowledge, one should put more weight on l1 regularization by choosing a small r in the algorithm to get better results. We use numerical experiments to demonstrate our rule for the optimal choice in different configurations in Section 3. On the other hand, our tests seem to suggest that the results are not very sensitive to the choice of r; in our numerical experiments, the value of r is typically between 0.1 and 10. Last but not least, we normalize J by columns to get J*, and then actually solve, instead of (8),

\min_{Q^*,u,v} \ \|J^* Q^* - P\|_2^2 + \lambda_1 \Big( \sum_i A_i u_i + r \sum_k L_k v_k \Big) \quad \text{with the same constraints}. \qquad (13)

This is crucial for successful reconstruction. Physically this means the total energy of the measurements with respect to each unknown is on the same level, while mathematically diag(J*^T J*) is the identity matrix, which makes the problem isotropic with respect to all unknowns. After solving (13), we need to map Q* back to Q correspondingly, which reflects the sensitivity of each unknown to the measurements. This is also equivalent to optimization in a weighted norm.
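The column normalization leading to (13) can be sketched directly; the guard against zero columns is our addition.

```python
import numpy as np

def normalize_columns(J):
    """Normalize J by columns so that diag(J*^T J*) is the identity, and
    return the scales needed to map the normalized solution Q* back to Q."""
    scale = np.linalg.norm(J, axis=0)   # l2 norm of each column
    scale[scale == 0.0] = 1.0           # guard against empty columns
    return J / scale, scale

# After solving the normalized problem for Q*, recover Q = Q* / scale,
# since J* Q* = (J / scale) Q* = J (Q* / scale).
```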

Optimization Algorithm for l1 Data Fidelity
To simplify the discussion, we only present the treatment of the l1 data fidelity term in the optimization problem (5); the l1 or TV regularization can be adapted to (5) in the same way as was done for (4) via (8). Introducing an auxiliary variable y for the residual, we recast the l1 data fidelity as a constrained problem with the constraints

-y_m \le (JQ - P)_m \le y_m, \quad m = 1, \ldots, M, \qquad (14)

minimizing \sum_m y_m plus the regularization terms, and these 2M residual constraints are penalized by logarithmic barriers exactly as in (9). To find the search direction, we need to compute the gradient and Hessian of the resulting barrier functional; in the formulas, 1_{[y]} denotes the all-ones vector of the size of y, \alpha_{[y]} the constraint vector for y, and diag(\cdot) the diagonal matrix with the bracketed elements on its diagonal.
To invert (17) efficiently, we use a similar PCG method with a sparse symmetric positive definite preconditioner. Last, we emphasize that in our algorithm we actually minimize (5) with the normalized Jacobian instead of (14): in the barrier method, we solve a sequence of subproblems coming from the combination of (9) and (16), with the l2 data fidelity replaced by the l1 data fidelity.
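If the regularization terms are dropped for brevity, the auxiliary-variable treatment of the l1 data fidelity reduces to a linear program, which makes the constraint structure concrete. This sketch uses an off-the-shelf LP solver purely for illustration; the paper instead solves the full regularized problem with the interior-point method described above.

```python
import numpy as np
from scipy.optimize import linprog

def l1_fit(J, P):
    """Solve min_Q ||J Q - P||_1 via the standard reformulation:
    introduce y >= 0 with -y <= J Q - P <= y and minimize sum(y)."""
    M, N = J.shape
    c = np.concatenate([np.zeros(N), np.ones(M)])        # minimize 1^T y
    A_ub = np.block([[J, -np.eye(M)],                    #   J Q - P <= y
                     [-J, -np.eye(M)]])                  # -(J Q - P) <= y
    b_ub = np.concatenate([P, -P])
    bounds = [(None, None)] * N + [(0, None)] * M        # Q free, y >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:N]
```

For instance, fitting a single constant to the data (0, 0, 0, 0, 10) in this l1 sense yields the median 0, not the mean 2, illustrating the robustness to a single outlier.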

Simulations
For comparison purposes, besides (1) for l2 regularization, we also consider H1 regularization using a discrete version of (though not exactly) the H1 seminorm,

|Q|_{H1}^2 = \sum_k L_k \, (q_{k,l} - q_{k,r})^2, \qquad (18)

and the regularized problem

\min_Q \|JQ - P\|_2^2 + \lambda_2 \big( \|Q\|_2^2 + r_2 |Q|_{H1}^2 \big), \qquad (19)

where r_2 is the parameter balancing the l2 norm and the H1 seminorm. In this section we use various numerical simulations to validate our novel l1 + TV regularization strategy for BLT and to compare l1 data fidelity with l2 data fidelity. In all figures shown in this section, the reconstructed images are plotted exactly as they come out of the reconstructions, without further image processing such as thresholding. As pointed out at the beginning, the radiative transfer equation (RTE) is used as the underlying model for both the forward and inverse problems. Data for the forward problem are generated on a fine mesh that is different from the fine mesh used for the inverse problem. The fast forward solver developed in [34] is used to compute both the forward problem and the Green's functions for the inverse problem. The methods for solving l2-regularized BLT can be found in [23]. In the following figures for l2 regularization, except Fig. 14 and 15, we only plot results from (1), since the l2 regularizations (18) and (19) give similar results. In particular, we design different simulations to demonstrate the following points.
First, with a priori knowledge of the rough size of the bioluminescent source, one can choose the ratio r between the TV norm and the l1 norm to optimize the reconstruction. On the other hand, from our numerical experiments, the reconstructed images do not seem very sensitive to r, in the sense that l1 + TV regularization with a ratio around unity in general produces satisfactory images. See Fig. 1.
Second, with angular-averaged or angular-resolved data for non-sparse or sparse sources, our l1 + TV regularization reconstructs better than either l1 or TV regularization alone. Moreover, all of these perform better than the conventional l2 regularization. See Fig. 2-13.
Third, our multilevel approach is more efficient than single-level reconstruction. See Fig. 11. Fourth, for the reconstruction of non-piecewise-constant images with l2 data fidelity, l2 regularization (1) or H1 regularization (18) may give better results than l1 + TV regularization (4). On the other hand, l1 data fidelity (5) with l1 + TV regularization gives results as good as l2 data fidelity with l2 or H1 regularization. We show examples in Fig. 14 and 15.
Last, l1 data fidelity is capable of dealing with noise containing outliers, while l2 data fidelity fails to handle such noise. See Fig. 16.
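The point about outliers is easy to see in the simplest possible setting, fitting a single constant to data: the l2 minimizer is the mean, which one gross outlier drags far away, while the l1 minimizer is the median, which ignores it.

```python
import numpy as np

# Fit a constant c to data containing one gross outlier.
data = np.array([1.0, 1.1, 0.9, 1.0, 100.0])   # last entry is an outlier
l2_fit = data.mean()       # argmin_c sum (c - d_i)^2  -> dragged to 20.8
l1_fit = np.median(data)   # argmin_c sum |c - d_i|    -> stays at 1.0
```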
Before going into the simulations, we comment on reconstructions with a "false" optical background, i.e., inaccurate absorption or scattering coefficients. Since the sensitivity, or Jacobian, for sources is much stronger than that for absorption or scattering, a false background does not necessarily cause large errors in the source reconstruction. Therefore, to simplify the discussion and emphasize the improvement due to the proposed methods, we only show simulation results with the correct background in the following. Regarding the difference between reconstruction with angular-resolved data and with angular-averaged data, angular-resolved data improve the conditioning of the Jacobian and also alleviate the underdeterminedness, as shown in [23].
The following simulations are performed on a square domain with 20 mm side length with anisotropic factor g = 0.9 or 0.0 and a homogeneous optical background; g = 0.9 corresponds to a domain of 2 transport mean free paths, while g = 0.0 corresponds to a domain of 20 transport mean free paths. As a result, the reconstruction is expected to be better when g = 0.0, as can be observed in Fig. 3-10.
The reconstruction mesh has 32 angular directions, 1397 nodes and 2672 elements. To avoid an inverse crime, all forward data for the various simulations are generated with 64 angular directions on the corresponding spatial mesh, which is different from the reconstruction mesh. All the reconstruction results use either 120 boundary angular-averaged data or 900 boundary angular-resolved data. For all simulations with l1 + TV regularization, \lambda_1 = 0.01 and r = 1 unless otherwise indicated.
The purpose of the simulations in Fig. 1 is to demonstrate the dependence of the choice of the ratio r between the TV norm and the l1 norm on A/C. The results in Fig. 1 show that the reconstructions are not very sensitive to r; however, the optimal choice does need a priori knowledge. In Fig. 2, Source I (a) is composed of three objects with a forward mesh (b) of 1445 nodes and 2768 elements; Source II (c) is composed of three letters with a forward mesh (d) of 1509 nodes and 2896 elements. Regarding the relative size of the sources, Source I is non-sparse and Source II is sparse. We show below, in Fig. 3-10, 12 and 13, that l1 + TV regularization (4) in general gives the best results among l2 regularization (1), l1 regularization (2) and TV regularization (3) for both Source I and II. For the non-sparse source with g = 0.9, as shown in Fig. 3 and 4, regularization by the l1 + TV norm indeed gives better results than the l2, l1 or TV norm. In Fig. 3, for the reconstruction with angular-averaged data, a severely ill-posed inverse problem, the l1 + TV method (d) is still able to separate the three objects and roughly preserve the image boundaries with acceptable background noise, while the l1 norm (b) shrinks the support significantly and produces too-high intensity, the TV norm alone (c) gives a blurred image with some noise in between the objects, and one can hardly see anything with the l2 norm (a). In Fig. 4, with angular-resolved data, the l1 + TV norm (d) still gives the most accurate reconstruction of image edges, details and intensities. Similar reconstruction results with g = 0.0, which is in the diffusive regime, are shown in Fig. 5 and 6; here l1 + TV is even more impressive for angular-averaged data in Fig. 5. Next, we evaluate the reconstruction performance for sparse sources with g = 0.9, as shown in Fig. 7 and 8. For the reconstruction with angular-averaged data in Fig. 7, the conclusion is the same as for non-sparse sources: the l1 + TV method (d) gives the best result.
However, since in this case not only are the three letters small, but the data are also limited, even the l1 + TV method is unable to resolve the image details. On the other hand, with angular-resolved data in Fig. 8, the l1 norm (b) gives the best details, although with nonsmooth intensities, and l1 + TV (d) is the second best, although it blurs the image a little. This makes sense since l1 is the most appropriate choice for sparse sources. Similar reconstruction results with g = 0.0 are shown in Fig. 9 and 10.
Then, we compute an example of our multilevel approach for Source I ((a) in Fig. 2) with g = 0.0, as shown in Fig. 11. The single-level reconstruction (a) is done on the reconstruction mesh specified at the beginning of this section with 360 angular-resolved measurements. The two-level reconstruction, with 180 angular-resolved measurements and the same reconstruction mesh as the fine mesh, uses a coarse mesh with 365 nodes and 668 elements. Both use l1 + TV regularization. The results in Fig. 11 show that our multilevel approach is more efficient than the single-level reconstruction, saving at least half of the computational time.
In our multilevel approach [23] with l1, TV or l1 + TV regularization, we choose l1 + TV regularization for the coarse reconstruction, since it localizes the source with a proper boundary and little background noise for both sparse and non-sparse sources. On the other hand, although l1 regularization is favorable for sparse sources, it is not necessary on the coarse mesh, since it is difficult to obtain high-resolution images through the coarse reconstruction anyway. Next, on the fine mesh, the element edges in (7) are restricted to the support of Q found through the coarse reconstruction, with the support boundary as the boundary for the unknowns. Meanwhile, the coarse reconstruction also helps to choose a proper ratio r in (8) to balance the l1 and TV norms for the fine reconstruction, i.e., more TV norm for non-sparse sources or more l1 norm for sparse sources. Last, we reconstruct Source I with g = 0.9 and varying intensities, i.e., 1 for the hexagonal inclusion, 2 for the square inclusion and 4 for the triangular inclusion, with the reconstruction results in Fig. 12 and 13, which show that our algorithms can successfully reconstruct sources with different intensities. In Fig. 14 and 15 we show an example, with g = 0.0 or 0.9, in which l2 regularization gives better results than l1 or TV regularization for a non-piecewise-constant source with angular-resolved data. In the simulations, an elliptic source inclusion (a) is centered at (10mm, 10mm) with major radius 8mm and minor radius 4mm, its major axis 45 degrees from the x direction, and a compactly supported smooth intensity Q(x, y) that is a smooth exponential bump inside the ellipse E and zero outside, where (x', y') is the coordinate of (x, y) after the 45-degree rotation. The forward mesh has 64 angular directions, 1421 nodes and 2720 elements.
In Fig. 16 we compare the performance of l1 data fidelity with that of l2 data fidelity with respect to Gaussian noise, salt-and-pepper noise, or both. Five small inclusions with 1mm diameter are centered at (5mm, 5mm), (5mm, 10mm), (5mm, 15mm), (10mm, 5mm) and (15mm, 5mm), and the forward spatial mesh has 1917 nodes and 3696 elements. The simulations are performed with l1 regularization and angular-resolved data, with g = 0.9 and \lambda_1 = 1. We add two different types of noise: one is Gaussian noise and the other is salt-and-pepper noise. For the salt-and-pepper noise, we randomly pick a certain number of measurements and set their values to a totally irrelevant one, e.g., 0.
With l2 data fidelity, the result without noise is shown in (a), and the result with 10% Gaussian noise in (c). Adding salt-and-pepper noise, we plot in (e) and (g) the results of randomly setting 180 and 300 of the 900 measurements to zero, respectively, on top of 10% Gaussian noise. Similarly, images (b), (d), (f) and (h) show the corresponding results with l1 data fidelity. From the reconstruction results, we conclude that l1 data fidelity performs well for both Gaussian noise and salt-and-pepper noise, while l2 data fidelity handles salt-and-pepper noise poorly. Therefore, l1 data fidelity (5) should be used instead of l2 data fidelity (4) when the data contain outliers. Each PCG iteration is cheap, costing O(N) operations, thanks to the sparse and positive definite preconditioner P. Finally, on a desktop with an Intel CPU E6850 (3.00 GHz), each forward solve takes 3 to 4 seconds and 900 measurements take around 50 minutes for the adjoint method to compute the Jacobian J; optimization with l2 data fidelity takes 3-5 minutes, while optimization with l1 data fidelity takes 10-15 minutes. In summary, the major computational time is spent on the computation of the Jacobian J; the optimization with l2 data fidelity usually takes less than ten percent of the computational time of J, and that with l1 data fidelity around twenty percent. On the other hand, the computation of the Jacobian can be easily parallelized to reduce the computational time significantly. There are also other potentially efficient methods for problems (4) or (5), such as [44-46], which may substantially improve the optimization time.
With our multilevel approach, we may reduce M and N_s substantially, especially for sparse sources, and thus reduce both memory storage and computational time. Note, however, that the memory for the forward solver is not reduced and the computational time of each forward solve does not decrease.

Conclusions and discussions
In this paper, we have introduced the l1 + TV regularization strategy for BLT and developed a novel optimization algorithm for solving l1 + TV-regularized BLT based on RTE. Simulations have validated that, for nearly piecewise-constant sources, among regularizations by the l2, l1, TV and l1 + TV norms, l1 + TV gives the best results, in the sense that it combines the merits of both l1 and TV and is in general able to reconstruct image details, edges and intensities for both sparse and non-sparse images with boundary angular-averaged or angular-resolved data. We have also studied optimization with l1 data fidelity and developed the corresponding optimization algorithm. Simulations have shown that l1 data fidelity is preferred over l2 data fidelity for dealing with outliers. On the other hand, for non-piecewise-constant sources, l1 data fidelity with l1 + TV regularization reconstructs results as good as l2 data fidelity with H1 regularization, and both are in general better than the other strategies. Besides, our optimization method with either l1 + TV regularization or l1 data fidelity is quite general, in the sense that it can be easily adapted to other problems similar to (4) and (5).