Improving mesoscopic fluorescence molecular tomography via preconditioning and regularization

: Mesoscopic fluorescence molecular tomography (MFMT) is a novel imaging technique capable of obtaining 3-D distribution of molecular probes inside biological tissues at depths of a few millimeters with a resolution up to ~100 μ m. However, the ill-conditioned nature of the MFMT inverse problem severely deteriorates its reconstruction performances. Furthermore, dense spatial sampling and fine discretization of the imaging volume required for high resolution reconstructions make the sensitivity matrix (Jacobian) highly correlated, which prevents even advanced algorithms from achieving optimal solutions. In this work, we propose two computational methods to respectively increase the incoherence of the sensitivity matrix and improve the convergence rate of the inverse solver. We first apply a compressed sensing (CS) based preconditioner on either the whole sensitivity matrix or sub sensitivity matrices to reduce the coherence between columns of the sensitivity matrix. Then we employed a regularization method based on the weight iterative improvement method (WIIM) to mitigate the ill-condition of the sensitivity matrix and to drive the iterative optimization process towards convergence at a faster rate. We performed numerical simulations and phantom experiments to validate the effectiveness of the proposed strategies. In both in silico and in vitro cases, we were able to improve the quality of MFMT reconstructions significantly.


Introduction
As a novel imaging modality combining the benefits of millimeter deep molecular imaging with high sensitivity, Mesoscopic Fluorescence Molecular Tomography (MFMT) can resolve the distribution and concentration of fluorophores inside biological tissues based on boundary measurements [1,2]. MFMT has the unique potential to cover the gap between microscopic and macroscopic examination of biotissues by enabling resolution up to hundred micrometers level at a depth of a few millimeters in intact tissues (without optical windows or clearing agents). MFMT has already found applications in imaging bio-printed thick constructs [3,4], brain molecular [5,6], dental imaging [7] and tumor xenografts studies [8,9]. However, performances of MFMT are intrinsically associated with the inverse optical problem that need to be solved. Typically, the MFMT inverse problem is ill-posed and ill-conditioned. Hence it is very challenging to solve it in a robust fashion. Among the various techniques developed to solve ill-conditioned problems, iterative methods, embedded with various optimization algorithms, have become an effective approach that use successive approximations to obtain solutions with the help of additional constraints introduced by appropriate regularization techniques [10,11]. However, the performance of the iterative method adopted depends greatly on the incoherence and the spectrum of the forward sensitivity matrix. Additionally, many iterative methodologies rely on preconditioners to improve performance and ensure fast convergence [12]. In the case of the MFMT inverse problem, the measurements acquired are characterized with high redundancy due to the diffuse nature of the light collected, reflection geometry employed and dense spatial sampling. Hence, the associated Jacobian demonstrates high coherence between its columns and high condition number due to fine discretization of the volume to be imaged to attain hundred micrometers resolution. In turn, the designed iterative optimization algorithm will slowly converge and may be susceptible to the propagation and amplification of errors after successive iterations [13].
On the other hand, Compressed Sensing (CS) methodologies have been proved to be successful methods to recover sparse signals from far fewer samples than required by the traditional Shannon-Nyquist sampling theorem [14,15]. The CS framework guarantees accurate recovery of sparse signals under certain conditions, one of which is incoherence, i.e. in our case, the degree of the orthogonality of forward sensitivity matrix. Recent works in diffuse optical tomography (DOT) have demonstrated that using a properly preconditioning matrix can reduce the coherence between columns of the forward sensitivity matrix, and as a consequence alleviate the ill-posedness of the underdetermined linear system [16,17]. Similarly, our group has proposed a preconditioning strategy to reduce the coherence of sensitivity matrix for wide-field fluorescence molecular tomography [18]. As for the MFMT inverse problem itself, Yang et al. [19] investigated the influence of noise on reconstruction performance and addressed a two-step data reduction approach to achieve high fidelity results. Nevertheless, the implementation of effective preconditioning methodologies in MFMT has not been reported to date but is expected to greatly improve the performances of the reconstruction process.
In this paper, we propose two computational methods to improve both incoherence and condition number of the sensitivity matrix of MFMT. First, we propose a preconditioner to reduce the correlation between columns of the Jacobian, which contribute to mitigating the ill-posedness of the inverse problem. Then another well-designed regularization parameter is added to improve the convergence rate of the inverse solver to the targeted tolerance. The proposed strategies are described in Section 2 in details. Section 3 presents the objective metrics to evaluate spectrum property, coherence of Jacobian, noise level of measurements, and the MFMT reconstruction performances. Section 4 summarizes the in silico and in vitro experiment setups with reconstruction results. The discussion and conclusion are provided in Section 5.

Configuration of the MFMT optical system
The optical configuration of our second generation MFMT system has been described in details in previous reports [4,19]. Here, we provide the main salient features that are relevant to the optical inverse problem. Briefly, the system is built around the raster scanning of an illumination spot over the specimen with detection performed by an electron-multiplying CCD camera (emCCD) (iXon EM+ DU-897 back illuminated, Andor Technology) acquiring data in a descanned configuration. This set up enables the acquisition of up to 512 × 512 measurements in parallel per illumination spot, leading to very dense spatial data sets. To obtain a data set that can be inverted efficiently, the emCCD data can be binned into superpixels that still provide high accuracy and resolution [19]. The discretization of the image space and positioning of the optodes configuration are illustrated in Fig. 1, where the green squares, red squares and blue squares respectively represent the discretized voxels of the imaging volume, pixels of emCCD camera, and super-pixels binned at 2 × 2 to provide enhanced SNR. This configuration and on the chip binning strategy leads to a total of 256 × 256 super pixels as detectors that cover a detection FOV of 6.4 × 6.4 mm 2 . To cast the inverse problem, we further down sample the measurements space by selecting a 7 × 7 detector matrix (D 1 , D 2 , ……, D 49 ) with a separation of ~0.6mm. To improve the dynamical range, the central detection which coincides with the position of the illumination spot is occluded during acquisition so detector 25 (D 25 ) is not employed. On the illumination side, the raster scanning step size and dwelling time are typically set to 100 µm, 20 ms respectively, leading to a total of 961 scanning locations in the illumination FOV of 3.1 × 3.1 mm 2 . Last, the volume to be imaged is discretized uniformly in voxels of 100 μm length, leading to R i (i = 1, ……, 31), C j (j = 1, ……,31), and Z k (k = 1, 2,……, 30) number of voxels along the x, y and z axes respectively. Overall, this configuration leads to 46,128 measurements that are employed in the inverse optical problem (out of 251,920,384 possible without binning).

Forward model and inverse problem
The source-detector separation in our MFMT implementation can be as small as 600 μm. Hence, in view of typical optical properties of tissues, the collected photons cannot be accurately modeled by the Diffusion Equation [20]. Therefore, we employ Monte Carlo (MC) method, commonly known as the gold standard [21]. The MC method is well established as being accurate in modeling photon propagation in shallow tissues and near sources [22,23]. To meet the accuracy requirements under high resolution of MFMT operating in the mesoscopic regime, up to 10 8 photons are simulated for each source/detector with the GPUbased software MCX to simulate continuous-wave (CW) Green's functions G x and G m at excitation and emission wavelengths [24]. The weight function W for one source-detector pair is then computed with an adjoint formulation for efficiency [25]: where r s , r d , and r are locations for source, detector and any location inside the specimen being probed, respectively. The measured fluorescence intensity for the corresponding source-detector pair U(r s , r d ) can be formulated as the integral equation below: where η(r) is the 3-D distribution of the fluorophore's effective quantum yield. As at this time our experimental system is limited to the continuous wave (CW) data type for MFMT applications, however it's straightforward to expand both Eqs. (1,2) to time-dependent form given time-resolved Green's functions and the lifetime of fluorophore τ [26]. With M sourcedetector pairs, and N discretized voxels of the Region of Interest (ROI), the inverse problem for MFMT in the matrix form becomes: where A∈R M × N is the Jacobian, X∈R N × 1 represents the effective quantum yield, b∈R M × 1 is the vector of detector readings at all source positions. The CS based framework has been demonstrated as an effective methodology to exactly retrieve sparse solutions from under-determined linear systems [10,11,14]. Fortunately, the solutions in our MFMT application are inherently sparse because fluorescent biomarkers are supposed to be clustered within small regions of tissues such as tumors. To achieve sparsity in the image space, we add the commonly used l 1 -norm regularization term so that the optimization problem now becomes: where n i 1 1 X = x  denotes the l 1 -norm of fluorophore distribution vector and λ is the regularization parameter. The minimization problem in Eq. (4) can then be transformed to a convex quadratic problem with linear inequality constrains, and solved by a variety of inverse solvers such as the interior-point method that we applied in this work [27].

Preconditioning to reduce the coherence of the sensitivity matrix
The first preconditioning strategy applied aims at reducing the coherence of the sensitivity matrix, so that the performance of CS-based sparse signal recovery algorithms is improved. The design of preconditioner follows a similar approach as described in [16-18] so we only briefly introduce the derivation. When a preconditioner M A is applied, the original Jacobian matrix becomes: For simplicity, we assume that the preconditioned sensitivity matrix A pre is also columnnormalized, and to minimize the coherence of A pre , we seek to determine M A such that the Gramian matrix T pre pre A A approximates the identity matrix [10,11,28], Equation (6) still holds after multiplying A on the left and A T on the right, Taking the Singular Value Decomposition of A = U A  A T A V , and substitute it into Eq. (7), we get: Thus, we choose M A as the preconditioner, During the actual implementation, a regularization parameter is usually added for stabilization when having a large condition number, so the final form of our first preconditioner becomes: ( ) where ϵ = 0.1 is empirically selected to provide the best trade-off between reducing matrix coherence and suppressing noise propagation in the measurements under the experimental noise level (SNR = 4) [18]. Note that although the construction of MFMT Jacobian also follows the adjoint form, we cannot separately reduce the coherence of excitation and emission matrices and compute the preconditioned sensitivity matrix through the Kronecker product, referred as "separate masks" in [18]. This is because the excitation and emission light fields of MFMT are coupled with each other, i.e., the source and CCD camera always move together and remain the same relative position as the scanning proceeds. Thus, only "global mask" strategy can be applied to MFMT, although it's shown that "separate masks" are slightly superior in suppressing noise and improving resolution in the reconstruction results [16,18]. Though, we can harness the symmetry of the descanned acquisition configuration that leads to a block configuration in the forward model (one source-48 detectors).
Herein, we propose a "sub-preconditioning" approach in which sub-matrices are preconditioned to overall lead to a reduction of the coherence of the whole matrix. In our case, 48 measurements are collected from each of the 961 scanning positions, so we can either perform preconditioning on 961 sub-matrices with 48 rows, or on 48 sub-matrices with 961 rows. In practice, we chose the latter one because it's more effective at reducing the overall coherence of the Jacobian. Thus, each sub-sensitivity matrix A p (p = 1, ……, 48) corresponds to measurements from the same detector at all scanning positions. In the following studies, preconditioning strategies will be performed both globally and block by block, referred as "whole-preconditioning" and "sub-preconditioning", and the performance of the two approaches are compared and analyzed.

WIIM to speed-up the convergence rate of the inverse solver
The convergence rate of an iterative optimal algorithm to solve a large linear system depends mainly on the condition number of its coefficient matrix [13,29]. Unfortunately the condition number in MFMT application is usually very large due to the high correlation between different source-detector pairs resulting from dense sampling and fine discretization. Hence an effective method to reduce the condition number of the sensitivity matrix is highly desired.
To this end, here we employ a regularization approach, Weighted Iterative Improvement Method (WIIM) [30], to effectively improve the ill-conditioned situation and thus contribute to a faster convergence rate of the iterative algorithm. We first construct the iterative process as following, where Λ = A T A is a square matrix, y = A T b, I is the identity matrix, and k = 1, 2, 3, … is the iteration number. The nonzero constant γ determines the degree of improvement for the condition number of Λ. A small γ leads to minor improvement of convergence rate while a very large γ could result in stagnation of iteration or even divergence. It's usually determined from the following empirical formula [30]: where δ is the lower bound of all nonzero elements in the sensitivity matrix. The initial value of γ can be given according to Eq. (13) where β is the absolute value of the smallest eigenvalue of matrix Λ and log is logarithm at the base 10. Let x k+1 = x k + e k and r k = y -Λx k , we can substitute them into Eq. (11) and get: We can apply the same CS-based solver as we solve Eq. (3) because e k is the difference of fluorophore concentration between two iterations and thus is also sparse. With an initial guess 0 x , we can then repeatedly solve Eq. (14) and update the values of e k , r k , and x k . If the last residual norm ||r k || reaches the targeted tolerance, or the iteration number reaches the maximum, the approximated solution is reached. Note that the iteration steps described above can be easily implemented to be embedded in any selected inverse solver. Last, a positive constraints was applied during the iterative process.

The effect of the condition number on the precision of the solution
The condition number of the sensitivity matrix determines how much a small disturbance in the measurement b could affect the solution of the inverse problem x in Eq. (3): where cond(A) = ||A −1 || ||A|| is the condition number. A smaller condition number could lead to results that are accurate and stable against noise, because the error magnification during iterations is minimized. Therefore, it's beneficial to apply preconditioning methods to reduce the large condition number of the MFMT sensitivity matrix.

Coherence of the sensitivity matrix
One of the most commonly used metric in the compressed sensing literature to assess the incoherence of a matrix A is the mutual coherence [16], defined as the largest normalized inner product of two different columns a p and a q : , ,

Signal to Noise ratio
Noise is an important factor impacting the accuracy of the MFMT inverse problem. To describe the noise level of the MFMT system more faithfully, we define the Signal-to-Noise Ratio (SNR) in our experiment as follows: where S f is the fluorescence signal from all source positions, and S b is the average of S f with the ROI. S r is also the fluorescence signal but with the sample replaced by a beam dump. So For in silico experiments, a Gaussian noise with mean of zero and standard deviation of SNR is added to the simulated data set to mimic experimental measurements. We use SNR values ranging from 2 to 6 to explore the robustness of the proposed computational approaches on different noise levels.

Quantitative metrics to assess 3D reconstructions
For consistency, the same normalized figures of merit as [19] are used to quantify the difference between reconstruction results and the ground truth. Four normalized similarity metrics, normalized sum squared difference (nSSD), normalized sum absolute difference (nSAD), normalized disparity (nD), and normalized correlation (nR), are defined as follows.
where A(i, j, k), B(i, j, k) are the value of the normalized numerical model and reconstructed result at coordinates (i, j, k), N = R × C × Z is the total voxels in a cubic phantom and R, C, Z are the number of voxels three axes, respectively. , ′ ′ A B are binaries (0 or 1) of A, B through thresholding, and ⊕ represents the exclusive or (XOR) operation. The values of these four metrics are between 0 (for two absolute different models) and 1(for two coincident models). In other words, a larger value of the metrics indicates higher similarity between the reconstruction result and the ground truth. A numerical phantom was designed to evaluate the performance of the proposed postprocessing methods, as shown in Fig. 2. The imaging domain had a surface area of 3.1 × 3.1 mm 2 with a depth of 3 mm and was uniformly discretized into 31 × 31 × 30 voxels with 100 μm resolution. A vascular tree with a main trunk and three groups of offshoots was placed within the phantom at z = 1 mm. The diameter of the trunk and offshoots were 400 μm and 200 μm separately, and the separation distance between two adjacent off shoots was 100 μm. The fluorophore concentration was assumed to be uniform in the vessel with effective quantum yield equal to 1. The optical properties of domain were assumed to be homogeneous at the excitation wavelength, with absorption coefficient μ a = 0.02 mm −1 , scattering coefficient μ s ' = 1 mm −1 , index of refraction n = 1.34, and anisotropy factor g = 0.81. These values are derived from the collagen scaffold typically employed in our bio-printing application at 6-9 mg/ml density [32] and μ s ' is also on the same level with many biological tissues [33].

Simulation settings
We replicated the imaging system configuration as used in a real experiment: 961 scanning positions and 48 detectors at each source location. The in silico measurements are generated by multiplying the sensitivity matrix with the bio-printed vasculature model. We added Gaussian noise with SNR as low as 2. Then reconstructions are performed as described in Section 2. For each of the following simulations and experiment cases, the optimal regularization parameter is chosen through L-curve analysis [34].

Coherence reduction via preconditioning
We first evaluated the performance of the preconditioning method on improving the incoherence of the sensitivity matrix and reconstruction results with the numerical phantom. As we mentioned in Section 2.3, it could be directly applied on the whole sensitivity matrix (whole-pre) or block by block (sub-pre). We compared the distribution of largest 30% of normalized inner products between two columns of the sensitivity matrix in Fig. 3(a), and the cumulative coherence of the sensitivity matrix as a function of column number k in Fig. 3(b). As shown in the insert in Fig. 3(a), the sub-pre reduces the relative Area Under the Curve (AUC) to 45.27% while whole-pre reduces the AUC to 34.79%. The average slope listed in the insert in Fig. 3(b), a quantitative index of cumulative coherence, also shows a slower increment after applying the preconditioner on whole sensitivity matrix and sub sensitivity matrix, which demonstrates both preconditioning could effectively improve the orthogonality of the matrix. Fig. 3. Curves of the normalized products (a) and cumulative coherence (b) before and after applying the preconditioner on whole sensitivity matrix and block by block sensitivity matrix.
We then compared the reconstruction results between non-pre, whole-pre and sub-pre, at different measurement noise levels (from 2 to 6). The comparison of four reconstruction metrics between non-pre and whole-pre is shown in Fig. 4(a), and those between whole-pre and sub-pre is shown in Fig. 4(b). At all noise levels, the reconstruction quality with the proposed preconditioning methods outperforms that without preconditioning. Moreover, though less effective in reducing the coherence of sensitivity matrix, sub-pre can retrieve the in silico model with greater similarity and smaller disparity compared to whole-pre. The 3D reconstruction images of three cases are shown in Fig. 4(c)-4(e). In accordance with the evaluation metrics, sub-preconditioning provides visually the best result. Although whole preconditioning is better at reducing matrix coherence, the reconstruction results are less ideal than sub preconditioning. This observation is in agreement with the results in [18] and [16], which can be explained by the higher condition number and thus more severe noise amplification when applying whole-preconditioning.

Faster convergence via the weighted iteration improvement method
We further evaluated the performance of the proposed Weighted Iteration Improvement Method (WIIM) on accelerating the convergence rate of the inverse problem in four scenarios, named whole-pre without WIIM (M1), sub-pre without WIIM (M2), whole-pre with WIIM (M3) and sub-pre with WIIM (M4). It can be seen from Figs. 5(a)-5(c) that the convergence rates without WIIM always stagnate before the desired tolerance, especially for the cases with low SNR level. However, after our l 1 -norm regularization based iterative algorithm is embedded with WIIM, an effective improvement on the convergence rate can be seen. The last residual norm ||r k || always becomes much smaller, as reported in Table 1. By comparing M1 with M3 (both whole-preconditioning) and M2 with M4 (both subpreconditioning), we can see from Table 1 that WIIM could help improve reconstruction results for both Jacobians. On the other hand, the evaluation metrics of M3 worse than M2 shows that preconditioning has greater impact on the results than WIIM because the latter one is just a minor improvement for the iterative optimization algorithm. In addition, sub-pre converges faster than whole-pre when WIIM is applied, as illustrated by the green and blue curves in Fig. 5(a)-(c). Overall, the proposed WIIM works well for both whole sensitivity matrix and sub sensitivity matrix to fasten convergence rate.
The visual reconstruction results in Fig. 5(d)-(f) further demonstrate the improvement of reconstruction quality with WIIM. The obvious improvement from Fig. 5(d)-5(f) indicates that the deterioration of reconstruction could likely results from the stagnation before the iterative algorithm converges to the desired tolerance.

Performance of preconditioning methods on experimental data
Next we evaluated the performance of the proposed preconditioning and regularization methods on an experimental data set. The collagen phantom to be explored is homogeneous with optical properties μ a = 0.02 mm −1 , μ s ' = 1 mm −1 , n = 1.34, and g = 0.81. It has a size of 3.1 × 3.1 × 3.0 mm 3 , and voxels of 100 × 100 × 100 μm 3 , same as the simulated case. Four polystyrene fluorophore beads (GFP 488/509, Cospheric) were placed 1.7~1.8 mm beneath the sample surface under standard protocols. The transversal slices across the x-y and y-z planes taken from a micro-MRI are shown in Figs. 6(a)-(b) and the whole 3-D view of the phantom, overlaid with the best reconstruction result, is shown in Fig. 6(c). The measurements were recorded as described in Section 2.1 and the data acquisition time at each scanning position was less than 20 ms. Similar to the in silico cases, we compared the condition number, convergence rate, and reconstruction metrics, when no / whole / subpreconditioning is preformed, with and without WIIM. All the inverse problems were solved with our l 1 -norm reconstruction algorithm as described in Section 2.2 and the optimal regularization parameters were determined through L-curves. As an example, Fig. 7(a) shows the L-curve for experimental data to determine the optimal regularization parameter when applying sub-preconditioning. As shown in Table 2, the proposed preconditioner can effectively reduce the condition number as well as the coherence between columns of the sensitivity matrix in both formations of. It leads to a reduction of 2,524 and 3,628 times of condition number in the two sensitivity matrix formations, respectively. Meanwhile, the last residual norm also drops from 1.1479 to 0.0081with the help of WIIM, which is very close to the targeted tolerance. The four evaluation metrics of reconstruction results in Table 2 further indicate the proposed computational methods greatly improve the reconstruction fidelity. Although whole-pre and sub-pre strategies have the same last residual norm, sub-pre achieves lower condition number as well as faster convergence rate. Visual results in Fig. 7(b)-7(e) and evaluation metrics in Table 2 also validate the superiority of reconstruction quality from sub-pre. However, as shown in Figs. 7(d), even the best results still suffer from some artifact, which illustrates that the proposed methods may still be sensitive to noise. Figure 7(e) shows the reconstruction result when applying both methods on the sub sensitivity matrices and noise suppression strategy as described in [19], which is very close to the ground truth obtained from micro-MRI.

Conclusion
We have proposed two computational methods to reduce the coherence and condition number of the sensitivity matrix of MFMT, and to accelerate convergence rate of the adopted iterative algorithm when solving the inverse problem. The preconditioning method can be directly applied on the whole sensitivity matrix or on sub-matrices of each detector, while the subpreconditioning strategy provides fast convergence rate and better reconstruction quality than whole-preconditioning despite larger coherence between columns of the sensitivity matrix. This is in agreement with the observations in [16-18], because direct preconditioning usually lead to a large condition number and amplification of noise in the measurement vector. The regularization method based on WIIM proves to be effective to avoid stagnation before the iterative algorithm reaches the target tolerance. We have tested the performance of these two methods on in silico as well as in vitro data sets. In both cases, the reconstruction fidelity is significantly improved compared to previous results. We plan to further investigate the potential of the proposed methods for augmented data sets, such as in time-resolved cases [26] when applied to FRET tomography [35] to enable to monitor cellular processes [36,37] and in the case of phased array systems [38][39][40][41].