Nonuniform update for sparse target recovery in fluorescence molecular tomography accelerated by ordered subsets.

: Fluorescence molecular tomography (FMT) is a promising imaging modality and has been actively studied in the past two decades since it can locate the speciﬁc tumor position three-dimensionally in small animals. However, it remains a challenging task to obtain fast, robust and accurate reconstruction of ﬂuorescent probe distribution in small animals due to the large computational burden, the noisy measurement and the ill-posed nature of the inverse problem. In this paper we propose a nonuniform preconditioning method in combination with L 1 regularization and ordered subsets technique (NUMOS) to take care of the different updating needs at different pixels, to enhance sparsity and suppress noise, and to further boost convergence of approximate solutions for ﬂuorescence molecular tomography. Using both simulated data and phantom experiment, we found that the proposed nonuniform updating method outperforms its popular uniform counterpart by obtaining a more localized, less noisy, more accurate image. The computational cost was greatly reduced as well. The ordered subset (OS) technique provided additional 5 times and 3 times speed enhancements for simulation and phantom experiments, respectively, without degrading image qualities. When compared with the popular L 1 algorithms such as iterative soft-thresholding algorithm (ISTA) and Fast iterative soft-thresholding algorithm (FISTA) algorithms, NUMOS also outperforms them by obtaining a better image in much shorter period of time.


Abstract:
Fluorescence molecular tomography (FMT) is a promising imaging modality and has been actively studied in the past two decades since it can locate the specific tumor position three-dimensionally in small animals. However, it remains a challenging task to obtain fast, robust and accurate reconstruction of fluorescent probe distribution in small animals due to the large computational burden, the noisy measurement and the ill-posed nature of the inverse problem. In this paper we propose a nonuniform preconditioning method in combination with L 1 regularization and ordered subsets technique (NUMOS) to take care of the different updating needs at different pixels, to enhance sparsity and suppress noise, and to further boost convergence of approximate solutions for fluorescence molecular tomography. Using both simulated data and phantom experiment, we found that the proposed nonuniform updating method outperforms its popular uniform counterpart by obtaining a more localized, less noisy, more accurate image. The computational cost was greatly reduced as well. The ordered subset (OS) technique provided additional 5 times and 3 times speed enhancements for simulation and phantom experiments, respectively, without degrading image qualities. When compared with the popular L 1 algorithms such as iterative soft-thresholding algorithm (ISTA) and Fast iterative soft-thresholding algorithm (FISTA) algorithms, NUMOS also outperforms them by obtaining a better image in much shorter period of time.

Introduction
In the past two decades, fluorescence molecular tomography (FMT) has been playing an important role in a number of preclinical research fields [1,2]. In FMT, fluorescent agents (e.g. fluorophores) will be injected into the object such as mice and then near-infrared (NIR) laser beams will activate those fluorophores. The fluorophores will then emit fluorescence photons propagating out of the mouse surface, which can be captured by detectors such as CCD cameras. Meanwhile, the geometry of the mouse will be extracted so that along with the optical properties of the mouse, the system matrix can be set up. Then all the captured images can be combined tomographically to recover the distribution of the fluorophores inside the mouse. Due to the high scattering effects of photons in tissues, the system matrix of FMT is usually ill-conditioned and the reconstruction problem is ill-posed. Regularization methods have been employed to obtain reasonable results [3][4][5][6][7]. One may also construct an overall preconditioning matrix first before proceeding with any regularization models [8]. But ensuring the nonnegativity constraint in FMT may be time consuming, for instance, an expensive line search is needed when the popular preconditioned conjugate gradient algorithm is used [5].
Also in the past two decades, the majorization-minimization (MM) algorithm has attracted considerable attention in medical imaging due to its advantages such as separating the high dimensional variable for an easier iterative update in a parallel way [9] and applying the nonnegative constraints straightforwardly. Yet the MM algorithm has a critical issue of choosing the appropriate surrogate functions [9]. In medical imaging arguably the most well known application of MM framework was originated in Fessler et al. [10] and Erdogan & Fessler [11] for transmission tomography, where separable quadratic surrogate (SQS) functions were introduced based on a uniform additive type of weight functions. The uniform weighting has a clear advantage in the sense that it can be pre-computed and needs no iterative updates [10]. In FMT, Dutta et al. [5] followed [10,11] and employed the uniform additive weight function to study the effects of the joint L 1 and total variation regularization method. Our previous work [6,7] also followed this additive type of weighting when we investigated a family of nonconvex regularization methods. Nevertheless, uniform weighting tend to ignore the different updating needs at different nodes, which could hinder the convergence of iterations to the true solution.
Recently a nonuniform weighting strategy has been reported to improve image qualities in computed tomography [12,13]. In this paper, we follow the spirit of [12,13] and propose a non-uniform multiplicative weighting with ordered subsets (NUMOS) technique, which is a generalization of the image space reconstruction algorithm (ISRA) [14,15], for the MM algorithm in FMT in small animal imaging, and validate its advantages over its uniform counterpart using both simulated data and phantom experiment.

Forward modeling
In the continuous wave domain, photon transfer is modeled by the following coupled diffusion equation, along with Robin type (mixed) boundary conditions: where ∇ denotes the gradient operator, r the location vector, D ex (r) = [3(µ s,ex (r) + µ a,ex (r))] −1 and D em (r) = [3(µ s,em (r) + µ a,em (r))] −1 , with µ a,ex (r), µ a,em (r) being the absorption coeffi- cients and µ s,ex (r), µ s,em (r) being the reduced scattering coefficients at excitation and emission wavelengths, Φ ex (r), Φ em (r) are the photon fluxes, S ex (r) is the excitation source term, S em (r) is the fluorescence yield to be reconstructed, and n is the outward unit normal vector of the boundary, and α ex , α em are the Robin boundary coefficients.
The above equations can be solved using the finite element method (FEM), leading to a simple linear system of equations [16,17]: where A = (a i j ) ∈ R N m ×N n , a i j > 0, is the system matrix, x = (x j ) ∈ R N n ×1 is a short way to write the fluorescence yield S em (r), b = (b i ) ∈ R N m ×1 the measurements, and N m , N n the total number of measurements and FEM mesh nodes, respectively. Note that N m = N d * N s with N d , N s being the number of detectors and excitation sources.

Regularized least squares
A typical solution of (2) is obtained by minimizing the following regularized squared datameasurement misfit under the non-negativity constraint: where λ is the regularization parameter and R(x) is the regularization term, such as the L p (semi-)norm: R(x) = x p p , p ≥ 0 . Recall in the MM algorithm, the definition of a surrogate function Φ sur (x) in the minimization problem, the following three conditions should hold: We follow the SQS routine [11] to construct Φ sur (the right-hand side of the inequality below) for the data-fitting term in (3) as follows: where constant is independent of x j , β i j > 0 and ∑ N n j=1 β i j = 1. Note that superscript k refers to values of variables at k th iteration, which are known.
The regularized non-negative minimization problem (3) may then be easily solved, especially when we employ the i j β i j by κ j and the regularization parameter by λ 1 , each x j has the following analytical iterative updating formula: where u + = max(0, u), representing the positive part of any function u. For the details of the above derivations, we refer readers to [6], where analytical formulas for nonconvex L p , 0 < p < 1, and log regularizations were also presented.

Nonuniform update
As we mentioned in the introduction section, we clearly see in (5) and (6) that choosing a suitable weight β i j is critical for the success of the MM algorithm, since it determines κ j , which dictates the convergence speed of the algorithm. Contrary to the popular uniform additive weighting [5,6,10,11]: , we propose to adopt in this paper the following nonuniform multiplicative weighting [11,15]: We briefly analyze why (7) meets our needs better in the following. Substituting (7) into the κ j in (6), we have κ M j = (A t Ax k ) j /x k j and (6) has the following much simplified form, noting that the positive function (·) + is not needed if λ 1 = 0: In comparison with its counterpart obtained by using the additive form, where κ A j = (A t A1 N n ) j with 1 N n being the N n -dimensional vector with all entries equal 1 : we observed two potential advantages of (8): firstly, the nonuniform update (8) clearly uses only part of the calculations in (9) so it is less expensive; and secondly, (8) implies that during the iterative process, once x k j = 0, then x˜k j remains 0 for allk ≥ k, as long as not all components are 0, so it naturally promotes sparsity, which is desirable in most FMT problems [18].
We then utilized a random version of the ordered subsets (OS) technique [5,11,19] to enhance the convergence speed of (8). In case an even division of subsets is not possible, we simply ignore those "extra" measurements. We then have the following NUMOS algorithm: on a random partition of the N d detectors, and select the corresponding B i ; where 0 < x 0 < 1 is randomly picked, N max is the number of iterations, nOS number of subsets, . * and ./ entry-wise multiplication and division respectively, and δ stop the empirically chosen stopping criterion.

Remarks
We remark here that the better performing nonconvex (L p , 0 < p < 1, and log) regularizations as identified by our previous work [6,7] and a recent work from another group [20] are essentially doing nonuniform updates, the effects of which will be undesirably canceled in part, if not all, by the effects of the nonuniform updating of κ j . So in this paper we only include L 1 regularization to retain the nonuniform updating of κ j in hope of further gain in image qualities.
We also remark that the recent NU-SQS multiplicative variant [13]: does not lead to a simple update as (8), hence it does not carry any of the aforementioned benefits when compared with the uniform additive weighting. It may however significantly enhance the convergence if the initialization is close to the truth as shown in [13]. But since we don't have a good initialization as the filtered-back-projection result used in [13] for computed tomography, we will investigate this variant in the future.

Image quality metrics
In the analysis of the qualities of the reconstructed images and the selection of the best regularization parameter, we used on the same four metrics as proposed in our previous work [6]: the volume ratio (VR) [21] that is the ratio of the reconstructed volume (typically defined as the voxels with intensities greater than half of the maximum) to the true volume, the dice similarity coefficient (Dice) [22] that is the ratio of the intersection to the average of two volumes, and the well known mean squared error (MSE) and contrast-to-noise ratio (CNR) [23]. VR is a measure of sparsity of the reconstructed target and Dice quantifies the shape and location accuracy. The closer VR and Dice are to 1, the better. MSE is only calculated if the ground-truth is known and usually the smaller it is, the better. CNR is about the reconstructed image only and higher values are preferred.

Numerical simulations and results
To validate the advantages of NUMOS, we first tested it on simulated data. For a fair comparison with the uniform additive weighting without OS we adopted in [6], here we used the same simulated data set and first set nOS = 1, i.e. no OS, then increased nOS by orders of 2 [11]: 2, 2 2 , ..., 2 7 . We briefly recapitulate the simulation setup as follows. Two fluorescent capillary tube sources were inserted inside a mouse mesh, which consisted of 32,332 nodes and 161,439 tetrahedral elements. The tubes diameters were 2 mm and lengths 20 mm. The fluorophore concentration inside the tubes were 1 and outside 0. A total of 60 laser excitations source nodes were uniformly selected along the trunk of the mouse and all the 4020 surface nodes covering the trunk were set as detectors. The tissue optical properties were µ a = 0.007 mm −1 , µ s = 0.72 mm −1 at 650 nm, the excitation wavelength, and µ a = 0.014 mm −1 , µ s = 0.78 mm −1 at 700 nm, the emission wavelength. White Gaussian noise with a signal-to-noise ratio of 1 was added to the simulated measurements. All our calculations were done on an Intel i5 2400 3.1GHz PC with 16GB memory.
The simulation result is shown in Fig. 1, with detailed image quality metrics in Table 1. Note that in this and the following table, we also included the metrics for the best images reconstructed using the uniform additive weighting (Uniform) along with nonconvex regularizations, which were directly copied from our previous work [6]. We clearly see that the proposed NU-MOS with nOS = 1 significantly outperforms "Uniform" in obtaining a more localized and less noisy image. The sparsity (VR) and location accuracy (Dice) have been significantly enhanced, CNR is more than two times higher, and MSE is about 40% lower, as indicated by Table 1. The computational cost of NUMOS, nOS = 1, is less than 1/4 of Uniform. This is partly because that NUMOS needed only about 1400 iterations to stop, while Uniform took 2000 iterations to reach that result. As for results from different nOS, we presented the cases when nOS = 16, 32, 64 and 128 in Fig. 1(c)-(f). The results shown are averaged over 6 runs. For nOS = 16 and 32, we obtained further speed gain, without sacrificing much of the image qualities. From Fig. 1(d) and Table 1, we can see that nOS = 32 is a reasonably good choice since we have about 5.4X speed gain against the nOS = 1 case. For nOS = 64 and 128, image qualities are still acceptable although they start to deteriorate. We may still opt for these cases if we want further reduction of computation time by another 24% and 40%, respectively. Table 1. Metrics for simulated mouse: reconstructed using NUMOS with nOS = 1, 16, 32, 64, and 128, respectively, and λ 1 = 1.0E-03 v.s. uniform weighting with nonconvex L 1/2 regularization, λ 1/2 = 4.0E-05, the best result among different regularization methods [6]. The overall best performance is highlighted in bold.

Phantom experiments and results
We further validated NUMOS with phantom experiment and also tested it with nOS = 1 and with nOS ranging from 2 1 to 2 7 . The cubic phantom was again the same as we used in [6] with dimension 32 mm by 32 mm by 29 mm and was composed of 1% intralipid, 2% agar, and water in the background. In the two capillary target tubes, both 6.5 µm DiD fluorescence dye solution and uniformly distributed 18 [F]-fluoro-2-deoxy-D-glucose (FDG) at activity level of 100 µCi were injected for simultaneous FMT and positron emission tomography (PET) scans. The PET result will be used to validate our FMT. The excitation laser at wavelength of 650 nm scanned the front surface of the phantom at 20 illumination nodes. The emission wave length was 700 nm and there were 1057 detectors for measurement collection. The tissue optical properties were µ a = 0.0022 mm −1 , µ s = 1.10 mm −1 at both 650 nm and 700 nm wavelengths. The FEM mesh has 8690 nodes and 47,581 tetrahedral elements. The PET images were thresholded at 20% of the maximum FDG concentrations to identify the positions of the capillary tubes. For more details of the PET imaging, please refer to [24]. The phantom experiment result is shown in Fig. 2, with detailed image quality metrics in Table 2. The reconstructed images are normalized so that they can be compared with the truth  we obtained from PET, where intensity information is not comparable. The findings here are similar to the simulation case. We see that in comparison with "Uniform", NUMOS, nOS=1 takes 12x less of time to obtain a results with improved metrics: VR much closer to 1, CNR much higher, and Dice coefficient also slightly higher. The results for different nOS's are averaged from 6 runs. The results from cases of nOS = 16 and 32 are again very comparable to those of the case of nOS = 1. And nOS = 32 provides an additional about 3x speed gain. When nOS is further increased, the image qualities start to drop although we again see some speed gain. Table 2. Metrics for cubic phantom, reconstructed using NUMOS with λ 1 = 5.0E+03, and nOS = 1, 16, 32, 64, and 128, respectively, v.s. uniform weighting with L 1/2 , λ 1/2 = 5.0E+07, the best result among different regularization methods [6]. The overall best performance is highlighted in bold.

Discussion and conclusion
In this paper, we proposed NUMOS, a robust nonuniform multiplicative weighting strategy for the MM framework in FMT with L 1 regularization and ordered subsets acceleration. Using both simulation and phantom data, we compared NUMOS with the other popular uniform additive weighting strategy in combination with best possible regularizations. We started the iteration from an arbitrary uniform initialization, ran a maximum of 5000 steps and stopped when x k+1 − x k 2 / x k 2 < δ stop * nOS, with δ stop empirically set as 4E-04 for simulations and 9E-04 for phantom. NUMOS was robust for noise levels as high as having SNR of 1. NU-MOS performed better in the sense that the reconstructed targets were more localized (smaller VR, higher Dice, higher CNR, and smaller MSE (only for simulation case) than those from its uniform counterpart, and also in terms of its significantly faster speed of convergence. On an Intel i5 2400 3.1GHz PC with 16GB memory, each iteration took about 0.6 seconds (v.s 1.6 seconds for the uniform weighting in [6]) for the simulated data with SNR of 1, and about 0.025 (vs. 0.09) seconds for the cubic phantom. NUMOS (nOS = 1) also needed much less iterations to converge, bringing the total speed to 4x faster for the simulation and 12x faster for the phantom than its uniform counterparts. By setting the ordered subset number to be 32, we were able to get an additional 5.5x and 3x speed enhancements for the numerical simulation and cubic phantom respectively. For a better understanding of the uniform additive and nonuniform multiplicative types of β i j in section 2.3, we examined a small variation of the multiplicative form (7), where z j is a small positive number [10]. It is essentially using the maximum of β M i j and β A i j at each step, which performed almost the same as β M i j , but with a slightly slower convergence speed as expected. We also examined a more general additive form But not much improvement in the reconstructed images was observed when we varied the p to different values other than 1, with the exception of 0.9 ≤ p < 1, where we were able to see some slight improvement in the convergence rate. Although the L 1 regularization is not our emphasis as we already remarked in the method section, we understand that there are a lot of efficient L 1 algorithms available. A good summary of five types of closely related L 1 optimization models can be found in [25], where more than ten specific algorithms are also discussed. However, as the authors there stated, all those algorithms are not equivalent to the regularized model we adopted (QR λ in [25]). There are a lot of subtleties to be taken care of before they can be adopted into FMT. Here we briefly compared NUMOS with the popular iterative soft-thresholding algorithm (ISTA) [26] and fast iterative soft-thresholding algorithm (FISTA) [27] algorithms, which have been successfully implemented in FMT [28,29] recently. We first tested the algorithms on the phantom experiment data since its size is much smaller and the calculation of its largest eigenvalue is easy. The results are shown in Fig. 3 (a) and (b) and Table 3. We have not been able to directly carry out the calculation on the digimouse due to the fact that ISTA and FISTA need the estimation of the eigenvalues of the system matrix, which is too large (requires ∼ 62GB memory for storage alone). We noticed there was a lot of background noise in the reconstruction result of FISTA as shown in Fig. 3(b). Further investigation revealed that FISTA did not monotonically decrease the objective function when nonnegative solutions are sought. We applied the backtracking technique to guarantee the strict decrease of the objective function hoping to get rid of the noise. The result is shown in Fig. 3(c). To the best of our knowledge, FISTA with backtracking line search has not been applied to FMT before. The image quality has been greatly improved and is comparable to the result from our proposed NUMOS with nOS = 1 as can be seen from Fig. 3(d). Yet NUMOS is already about 10 time faster. It will be a few extra times faster if OS feature is turned on. In short, our proposed NUMOS outperforms both ISTA and FISTA in yielding an accurate image in much short period of time.
For regularization problems, selection of the optimal regularization parameters has always been a issue. Although we mentioned in section 2.5 that we followed our previous work [6] where a detailed comparison of metrics VR, Dice, CNR and MSE were used, it would still be interesting if we can find an automatic way for quick selection. This is also related to how the   It was brought to our attention that a clustered/group sparsity promoting algorithm has been introduced into diffuse optical imaging recently [31]. We will consider similar techniques in FMT as well in the future.
For the OS technique, the choice of nOS depends on the size of our measurements, the stopping criterion, and how much we want to sacrifice the image qualities for a faster convergence speed [11,30]. For our simulation data, we found that for nOS = 16, 32, the result is not too different from the case for nOS = 1 and for nOS = 64, 128, the results are still quite good. For phantom experiments, nOS = 16, 32, 64 are all good but nOS = 128 is not working well. A possible explanation is that much less measurements in each subset may cause the algorithm unstable. One important requirement for the OS technique is to choose the subsets in a balanced way [19]. That's why we did the randomization trick, which has unfortunately increased quite some computational cost and prevented us from obtaining a theoretical nOS times of speed increase [11]. We plan to further study the distribution of our detector positions so that a deterministic way of choosing ordered subsets for the FMT system may become possible, which may make it possible for NUMOS to take even less time.
Lastly, as we mentioned in the introduction, when λ 1 = 0 and nOS = 1, the NUMOS updating formula (8) was also known as the ISRA algorithm [14,15], and the "non-negative matrix factorization" (NNMF) algorithm [32]. It is interesting to note that this special case of NUMOS, under the NNMF framework, was used to remove background autofluorescence effectively [33,34]. In our future in vivo study, we may follow the NNMF framework and preprocess the data straightforwardly to guarantee sparsity.
In conclusion, our proposed NUMOS algorithm is superior to its uniform counterpart that was popular for the MM framework in FMT and it is also significantly faster than some state-ofthe-art algorithms such as FISTA, even when we disable the OS feature for speed enhancement. When a reasonably large number of ordered subsets are used, our NUMOS can be 3 ∼ 5 extra times faster.