Experimental Bioluminescence Tomography with Fully Parallel Radiative-transfer-based Reconstruction Framework

Bioluminescence imaging is a very sensitive imaging modality, used in preclinical molecular imaging. However, in its planar projection form, it is non-quantitative and has poor spatial resolution. In contrast, bioluminescence tomography (BLT) promises to provide three dimensional quantitative source information. Currently, nearly all BLT reconstruction algorithms in use employ the diffusion approximation theory to determine light propagation in tissues. In this process, several approximations and assumptions that are made severely affect the reconstruction quality of BLT. It is therefore necessary to develop novel reconstruction methods using high-order approximation models to the radiative transfer equation (RTE) as well as more complex geometries for the whole-body of small animals. However, these methodologies introduce significant challenges not only in terms of reconstruction speed but also for the overall reconstruction strategy. In this paper, a novel fully-parallel reconstruction framework is proposed which uses a simplified spherical harmonics approximation ( SP N ). Using this framework, a simple linear relationship between the unknown source distribution and the surface measured photon density can be established. The distributed storage and parallel operations of the finite element-based matrix make SP N -based spectrally resolved reconstruction feasible at the small animal whole body level. Performance optimization of the major steps of the framework remarkably improves reconstruction speed. Experimental reconstructions with mouse-shaped phantoms and real mice show the effectiveness and potential of this framework. This work constitutes an important advance towards developing more precise BLT reconstruction algorithms that utilize high-order approximations, particularly second-order self-adjoint forms to the RTE for in vivo small animal experiments.


Introduction
Bioluminescence Imaging (BLI) has become an increasingly important tool for in vivo preclinical research [1] [2]. Currently, planar (two-dimensional) BLI is the conventional imaging modality used in optical imaging. Optical photons emitted from bioluminescence sources in the small animal body (usually mice) propagate in biological tissues and are detected once emitted from the subject surface. Since in vivo biological tissues have high absorption and scattering characteristics, planar BLI only indirectly reflects the activity of the targeted biological object via the surface photon distribution [3]. Optical signals are easily confounded by the tissue properties in in vivo mouse experiments, especially when monitoring the initiation and progression of tumors over time within the same subject. The aim of BLT is to quantitatively acquire 3D information of bioluminescence sources, significantly improving the information quality of bioluminescence imaging.
Monte Carlo (MC) methods used with statistical models can obtain very precise simulations by tracking photon propagation in biological tissues [4]. However, these methods are severely time-consuming. Solving the radiative transfer equation (RTE) (i.e. Boltzmann equation) can also obtain precise simulation results as there is no Poisson noise in the simulation data [5]. Currently, nearly all 3D BLT reconstruction algorithms are based on the diffusion equation, which is a simple approximation to the RTE [6] [7] [8] [9]. Simulation and experimental reconstructions have shown that the diffusion approximation (DA) introduces significant artifacts in BLT reconstruction [3] [10]. Therefore, it is necessary to develop high-order approximation-based reconstruction methods to improve BLT reconstruction. The first-and second-order formulations of the RTE are usually used to directly solve the RTE [5]. Discrete ordinates (S N ) and spherical harmonics (P N ) methods, as two usual numerical approximations, can yield simulation solutions based on two types of formulations. Compared with first-order formulations, the operators acting on the second-order forms such as the even-odd parity (EOP) equations are self-adjoint [5]. This distinct advantage provides a straightforward application of the finite element methods (FEM) easily executed on complex heterogeneous geometries [11]. Furthermore, the acquired FEM matrix in the second-order equations is sparse positivedefinite (SPD), which yields better numerical stability and efficiency and benefits the development of reconstruction algorithms [12]. In order to generate an accurate simulation model, and regardless of the first-and second-order equations, one has to set N as large as possible and then N(N + 2) and (N + 1) 2 coupled equations corresponding to S N and P N methods need to be solved. This computational complexity, especially for the whole-body of small animals, creates a substantial challenge in the development of novel BLT reconstruction algorithms. Recently, a novel type of second-order approximation form, the simplified spherical harmonics (SP N ) method, has been developed for optical imaging [13], improving computational efficiency. Furthermore, a fully parallel adaptive FEM method was proposed to improve the simulation speed [14]. However, to obtain more accurate BLT reconstructions, a novel BLT reconstruction framework needs to be developed with the radiative transfer-based approximations to the RTE.
BLT is an inverse source problem and in the general case, its solution is not unique [15]. A priori information plays an indispensable role in BLT reconstruction. Among the various types of a priori information, multispectral measurements are important for achieving BLT reconstructions [16] [17][18] [19]. However, spectrally-resolved data sets can significantly increase the computational burden, especially when non-contact measurements are made using highly sensitive CCD cameras which acquire detailed surface photon distribution. In addition, due to the curved surface topography of the mouse and the necessity of using the heterogeneous characteristics of mouse tissues, numerical reconstruction algorithms, such as those based on FEM, are more suitable compared to analytical and statistical modeling-based methods [4].
BLT reconstruction is, in principle, similar to that of single photon emission computed tomography (SPECT) and positron emission tomography (PET) imaging. Therefore, reconstruction algorithms appropriate for PET and SPECT can be introduced to realize the BLT reconstruction [17]. In this case, the system response P-matrix needs to be computed, which is a very time-consuming step, although it can be obtained prior to acquiring the measured data. The BLT reconstruction is sensitive to various factors. Precalculating the Pmatrix can affect the reconstruction quality due to the use of different heterogeneous geometries between the calculation and the experiment. Diffuse optical tomography (DOT) has been investigated for several decades and its reconstruction algorithms are easily applied to BLT. In this case, the Jacobian matrix needs to be calculated for each iteration, which is timeconsuming [7]. Since the BLT problem is linear, the least-square problem can be solved to realize the BLT reconstruction by establishing the linear relationship between the unknown source variable and the measured surface data [6]. Until now, this method has not been extended to the radiative-transfer-based model. In addition, matrix inversion needs to be performed when using this strategy. Additional investigations should be performed to improve the reconstruction speed.
In this paper, a radiative-transfer-based fully parallel BLT reconstruction framework is developed using simplified spherical harmonics (SP N ) equations. This framework uses finite element methods to process complex reconstruction domain geometries. The linear relationship between the unknown source and the spectrally-resolved measured data using the SP N approximation is established to achieve BLT reconstruction. To improve the reconstruction speed and enable BLT reconstruction for the whole body of the mouse, the finite element-based matrices are stored and operated in a parallel distribution mode. Furthermore, for the timeconsuming problems of the key steps in the reconstruction, corresponding improvements are also performed, which significantly accelerate the reconstruction. Timing analysis demonstrates the improved performance of the proposed framework. Experimental reconstructions using mouse-shaped phantoms and real mice show the potential of this framework for practical BLT applications. The next section introduces the proposed fully parallel framework using the SP N approximation. In the third section, the performance tests and analysis are described and experimental BLT reconstructions also are demonstrated. Finally, we discuss relevant issues.

Radiative transfer equation and SP N approximation
The radiative transfer equation (RTE) is an approximation to Maxwell's equations. In bioluminescence imaging, the source intensity is generally assumed to be time invariant during the data acquisition. In addition, the photons at different wavelengths are considered to be independent, therefore we get (1) where ψ(r,ŝ,λ) denotes photons in the unit volume traveling from point r in direction ŝ. Based on the principle of energy conservation, the RTE suggests that the radiance ψ(r,ŝ,,λ) is equal to the sum of all factors affecting it (including absorption μ a (r,λ), scattering μ s (r,λ), and source energy S(r,ŝ,λ) ) when light photons cross a unit volume [20]. p(ŝ,ŝ′) is the scattering phase function and gives the probability of a photon scattering anisotropically from direction ŝ′ to direction ŝ. Generally, the Henyey-Greenstein (HG) phase function is used to characterize this probability [21]: (2) where g is the anisotropy parameter; cosθ denotes the scattering angle and is equal to ŝ·ŝ′ when we assume that the scattering probability only depends on the angle between the incoming and scattering directions. When photons reach the body surface of a mouse, that is r ∈ ∂Ω, some of them are reflected and cannot escape from the mouse body Ω because of the mismatch between the refractive indices n b for Ω and n m for the external medium. When the incidence angle θ b from the mouse body is not larger than the critical angle θ c (θ c = arcsin(n m /n b ) based on Snell's law), the reflectivity R(cosθ b ) is given by [22]: (3) where θ m is the transmission angle. Furthermore, we can get the exiting partial current J + (r) at each boundary point r [13]: (4) where v is the unit outer normal vector. After a series of deductions with the P N method, the SP N approximation is obtained [13] are the Legendre moments of ψ (λ) (2 ≤ n ≤ N, N is an odd positive integer). Although the SP N solution is asymptotic and cannot converge to an exact radiative transfer solution with the increase of N, the simulation results have shown good agreement between the SP 7 approximation and Monte Carlo methods [14]. Through some further deductions, (N + 1)/2 boundary conditions can be obtained corresponding to (N + 1)/2 Eqs. (5). These boundary conditions are mixed and consist of linear combinations of the even-order ϕ n and their first derivatives. With respect to the composite moments φ n of ϕ n , which are (6) We can get the general equations of the SP N approximation and its boundary conditions when practical measurements are performed at the wavelength λ k using bandpass filter: where , , , and can be calculated. The details and the above coefficients of the SP 1 to SP 7 approximations can be found in [13].

Fully parallel reconstruction algorithm
Based on the finite element analysis, we can get a general weak formulation for the SP N approximation [23]: (8) To avoid the processing of v·φ i in boundary integration, we assume v·φ i are unknown variables in the boundary equations (Eq. (7b)). We obtain f v·φ i (·) by solving a set of first order equations. The boundary conditions can be easily processed using this method. Figure 1 shows the flowchart of the proposed fully-parallel framework. Fully-parallel reconstruction means that almost all of the steps in the reconstruction framework should work in parallel mode. After the reconstruction domain Ω is discretized into the volumetric mesh , the next step is to partition this mesh into N c mesh subdomains , where N c is the number of the utilized CPUs. Regarding the finite element implementation, the space of linear finite elements is introduced on . φ i (λ k ) and S i (λ k ) are approximated: where φ i,p (λ k ) and s i,p (λ k ) are the discretized values at a discretized point p when using the basis function υ p (r); is the total number of the discretized points over the entire domain. Considering Eqs. (8), (9a), and (9b), for a volumetric element τ e , we have (10) where and ∂τ e is the boundary element if τ e is on the boundary and belongs to the respective subdomain in the parallel implementation. After assembling the submatrices on all the elements, we get (11) After inverting the entire matrix at the left side of Eq. (11), we have (12) Where are the submatrices of the entire inverse matrix IM(λ k ) corresponding to . Note that matrix inversion is calculated with respect to the entire matrix M(λ k ) although are used to describe the subdomain submatrices. Because matrix inversion is always time-consuming, to accelerate the reconstruction, direct and iterative matrix inversion methods are compared to optimize the execution time. Details can be found in Section 2.3.
After removal of the rows in the matrices corresponding to the non-boundary measurable discretized points, we further manipulate Eq. (4) to get [13] (13) where β j (λ k ) can be calculated with respect to Eq. (4) when ψ(r,ŝ,λ) is expanded; are the corresponding matrices on subdomain after removing the rows in Eqs. (12). When the surface optical data are collected at K wavelengths, we get (14) where (15) is the relationship matrix between and ; γ k is the percentage at the wavelength λ k of the total energy. It is usually considered as an ill-conditioned matrix because of the illposedness of BLT. The surface measured data corresponding to will likely lead to reconstruction failure when solving Eq. (14) directly due to the noise factor. Through solving the bound-constrained least-square problem (16) We can generate the BLT reconstruction, where S sup is the upper bound of the source density; δ the regularization parameter; and η(·) the penalty function. Since all the data in Eq. (16) are distributed on N c CPUs, the optimization algorithms should work in parallel mode.

Further demonstration
Overall, BLT reconstruction can be obtained by solving Eq. (16) using parallel optimization algorithms. Although a fully parallel spectrally-resolved reconstruction framework can be realized, performance optimization is necessary to accelerate the reconstruction. With respect to the steps of the proposed framework, three aspects are further demonstrated: Load balancing is critical for high performance parallel reconstructions. If there are large differences between the sizes of the mesh subdomains on different CPUs, the performance is adversely affected. At the beginning of the proposed framework, mesh partitioning is an important step. In this framework, a multilevel k-way method is used to perform this [24]. The method results in improved performance by reducing the dimensions of the mesh, partitioning it into smaller sizes, and refining it to the original one. Another consideration is the distribution of the relationship matrix between the spectrally-resolved measured data and the unknown source variable. The distribution of the measurable boundary discretized points is not uniform on each CPU. In addition, is formed by combining . The redistribution of is necessary in order to optimize the performance of parallel optimization algorithms.
Matrix inversion is the major component in the Relationship Forming step. Although it is very time-consuming, it is unavoidable in the proposed framework. Furthermore, with the increase of the number N in the expansion to the radiance, the dimensions of the matrix M(λ k ) become very large. It is essential to find a very efficient inversion matrix algorithm [25]. Since M(λ k ) is sparse positive-definite, LU factorization has good performance in various direct inversion algorithms. Another strategy is to use iterative methods. Preconditioning strategies in recent years have significantly improved the performance of iterative methods [26]. Since the performance of two strategies depends on the matrix properties, it is necessary to decide the type of strategy suitable for the specified problem.
Optimization methods Eq. (16) is a least squares problem, and it is easy to obtain the Hessian matrix in Newton-type optimization methods [27]. However, due to the large scale of the problem, a significant amount of memory is required during the optimization procedure. Even if the Hessian matrix can be calculated at each iteration, the process is extremely time-consuming. In addition, when computing the search direction, it is necessary to invert the Hessian matrix, which is also time-consuming. The speed of BLT reconstruction is therefore severely affected using Hessian matrixbased optimization algorithms. One solution to this problem is to use quasi-Newton methods. Generally, these methods build up an approximate Hessian matrix by using gradient and iteration algorithms. This approximate matrix is obtained by vectorvector multiplications in real time and is easy to inverse, saving memory and time costs. Here, the limited memory variable metric bound constrained quasi-Newton method (BLMVM) in parallel mode is used for BLT reconstruction [28].

Results
The bioluminescence imaging experiments were performed on a Maestro 2 in vivo imaging system (CRI, Woburn, Massachusetts). This system uses a cooled CCD camera with a custom lens as the detector. The distinct characteristic of this system is that a liquid crystal tunable filter (LCTF) is used to acquire multispectral data. Generally, the filter bandpass width was set to 20nm and the optical data was collected from a single view. The exposure time for each wavelength was adjusted to obtain high signal-to-noise ratio (SNR). After completing each optical signal acquisition, the phantom or mouse were scanned using an X-ray microCAT system (Siemens Preclinical Solutions, Knoxville, TN) to obtain CT images. The software Amira (Mercury Computer Systems, Inc. Chelmsford, MA) used the CT images to generate volumetric meshes for BLT reconstructions.
The framework was implemented in libMesh [29]. LibMesh is an open-source, high-quality software package and is developed to meet the needs of parallel FEM-based simulation. LibMesh provides almost all of the components used in parallel PDE-based simulation with unstructured discretization. Its design concept is to use existing software packages as far as possible. PETSc developed by Argonne National Laboratory (ANL) was used to solve the linear systems in parallel mode [30]. By default, an open-source serial graph partitioning package, METIS, realizing the multilevel k-way partitioning algorithm was used to partition the whole domain in libMesh [24]. In addition, in order to observe the effect of the model errors in the reconstruction quality, the regularization parameter δ was set to zero in the reconstructions. In order to test the proposed framework, we selected the SP 1 (DA), SP 3 and SP 7 approximations to perform BLT reconstructions. All the simulations were performed on a cluster of 27 nodes ( 2 CPUs of 3.2GHz and 4 GB RAM at each node).

Mouse-shaped phantom experiments
In the first case, a commercial mouse-shaped phantom (Caliper Life Sciences, Hopkinton, Massachusetts, USA) was used to acquire multispectral data. The phantom was fabricated from a polyester resin, TiO 2 and Disperse Red. To imitate the bioluminescence source, an optical fiber coupled to a green LED was embedded within the phantom. The emission spectrum of the LED was similar to that of a bioluminescence source. Its wavelength range was from 500nm to 700nm with a peak at about 567nm. The photon distribution data at two wavelengths (580nm and 640nm) were used in the BLT reconstruction. Table 1 shows the optical properties (μ a and ) at two wavelengths measured with the inverse adding-doubling method [31]. For the high-order SP N approximations, we set the anisotropic parameter g to 0.9. More detailed information about this phantom can be obtained elsewhere [16]. Figure 2(a) shows a photograph of the phantom in the Maestro 2 system. To avoid the curved surface effect in the measured data, the bottom flat surface was used as the detection surface. Figures 2(b) and 2(c) show the photon distribution at 580nm and 640nm respectively. They were acquired using an exposure time of 5min. There are distinct differences between them because of the different optical properties at different wavelengths, which benefit the 3D source localization.
With respect to the photon distribution, about two-thirds of the overall phantom was selected for mesh generation. The volumetric mesh, as shown in Fig. 3(a), contained 4,969 nodes and 21,348 tetrahedral elements. Figure 3(a) also shows that the photon distribution was mapped on the mesh surface using a manual co-registration method. Figure 3(b) shows the results after the mesh was partitioned when 10 CPUs were used for the BLT reconstruction. The number of discretized points in each subdomain is similar, avoiding a load imbalance. Figure 4 further shows the reconstructed results based on SP 1 , SP 3 and SP 7 approximations. Due to the absence of a regularization parameter, the SP 1 -based reconstruction was very sensitive to the measured noise and we could not obtain good source localization, as shown in Fig. 4(a). Figures 4(b) and 4(c) show the reconstructed results when SP 3 and SP 7 approximations were used. From the figures it is clear that the source positions are reconstructed well when using high-order approximations. The localization errors are less than 1mm in two directions, which can clearly be observed from Figs. 4(b) and 4(c). These reconstructed results show the importance and performance of high-order SP N approximations for BLT reconstruction.

Reconstruction performance optimization
Although the high-order SP N -based BLT reconstructions yield good source localization, the reconstruction memory and time costs are significantly increased with respect to the first order approximation (SP 1 ). In order to save a dense inverse matrix IM(λ k ), the SP 1 -based reconstruction requires only about 94 MB of space compared to about 1.5 GB in the SP 7 -based reconstructions. Although the proposed fully parallel framework has the ability to process matrices with large dimensions by distributing the storage, reconstruction time becomes impractical with the increase of the approximation order N and the number of used total wavelengths K. Performance optimization is indispensable to improve the efficiency of the proposed framework. The quasi-Newton optimization method (BLMVM) has been selected to significantly reduce reconstruction time when compared to general Newton-type methods. Additionally, matrix inversion and the number of utilized CPUs further optimizes the reconstruction framework.

Direct vs. iterative inversions-When the reconstruction domain is discretized into
points, the SP N -based BLT reconstruction must process a matrix. The computational complexity of the matrix inversion is if direct inversion methods are used. The computational burden is significantly increased with the increase of N and . When 10 CPUs were used in the above BLT reconstructions, the SP 1 -based reconstruction required only 1,163.1sec, as opposed to 3,086.1sec for SP 3 and 10,754.5sec for SP 7 (Table 2). LU-factorization-based relationship forming (i.e. forming Eq. (14)) utilized most of the total reconstruction time. For the SP 1 and SP 7 approximations, the percentage increased from 58.0% to 96.8%, making it critically important to improve the performance of the matrix inversion. For iterative matrix inversion methods, the parallel incomplete LU (ILU) conjugate gradient (CG) method was used to accelerate the inversion. This preconditioner was provided by the Hypre open source package [32], developed by Lawrence Livermore National Laboratory (LLNL). For the SP 1 -and SP 7 -based reconstructions, the total reconstruction time sharply decreased from 644.5 to 3,367.5sec, as shown in Table 2. Although the percentage of the relationship forming part in the total time increased regardless of SP 1 and SP 7 approximations, the reconstruction speed was improved by a factor of 1.80 and 3.19 corresponding to the SP 1 and SP 7 approximations.

3.2.2.
Parallel reconstruction performance-Iterative matrix inversion methods show the improved performance in the Relationship Forming step. Ideally, parallel execution on an increased number of CPUs should provide improved reconstruction performance. In order to evaluate the proposed fully-parallel reconstruction framework, BLT reconstructions with SP 1 , SP 3 and SP 7 approximations were performed using different number of CPUs. Iterative matrix inversion was used in these evaluations. Figure 5(a) shows the total reconstruction time depending on the CPU number. The SP 1 -based reconstruction time increased with an increased number of CPUs. For SP 3 -and SP 7 -based reconstructions, there were an optimal CPU number which will provide the shortest reconstruction time (3 and 23 CPUs corresponding to SP 3 and SP 7 ). In the proposed framework, the four main steps are 1) Mesh Partitioning, 2) Matrix Assembly, 3) Relationship Forming, and 4) Optimization. To further observe the above behavior, time analysis was performed while observing these steps, as shown in Figs. 5(b) and 5(c). Since Mesh Partitioning required the least time among the four steps, it was negligible with respect to the entire reconstruction (data is not shown in Figs. 5(b) and 5(c)). With the increase in CPU number, the time cost of Matrix Assembly was gradually reduced. Despite the fact that the matrix assembly time increased with the application of high-order approximation, Matrix Assembly required a small percentage of the overall reconstruction. Relationship Forming and Optimization comprised nearly all the reconstruction time. Furthermore, both of these steps had an optimal number of CPUs to obtain the minimal time cost. Since BLT reconstructions are performed on a cluster, the time cost of the communication between the CPUs becomes significant compared with the performance improvement from parallel execution. Higher speed communication methods, such as shared memory mode, could significantly improve the reconstruction speed. With the current hardware architecture and software settings, the number of CPUs must be preselected to obtain optimal reconstruction time.

Real mouse experiments
To further validate the proposed framework, experiments with a living mouse were performed in the Maestro 2 system. To simulate the bioluminescence source, a luminescent bead (Mb-Microtec, Bern, Switzerland) whose emission spectrum is similar to the in vivo spectrum of a firefly luciferase-based source was used. This bead uses tritium (the half life is about 12 years) to excite phosphor which generates photons, making it a very stable source. The bead dimensions are 0.9mm in diameter and 2.5mm in length. Prior to performing the experiments, the mouse was anesthetized and the bead was injected into the thigh using a syringe. The photon distribution data at 580nm and 660nm were collected from the ventral view. The exposure time was set to 1.5min. The volumetric mesh used in the reconstruction was generated using CT images of the mouse and contained 5,932 points and 24,120 tetrahedral elements. Figure 6(a) shows the mapped photon distribution after co-registration between the photograph of the mouse and the volumetric mesh. From the CT images, the tritium source can be clearly identified, as shown in Fig. 7. Since the photon propagation path consists almost totally of muscle, the reconstruction domain was considered to be homogeneous muscle tissue. The corresponding mouse muscle optical properties were then used in the reconstruction (Table 1). Figure 6(b) shows the partitioned mesh when 30 CPUs were used in the BLT reconstruction. The reconstructed results corresponding to SP 1 , SP 3 , and SP 7 approximations are shown in Fig. 7. The actual center position of the tritium source was at (51.8,-0.1). The reconstructed center position obtained from SP 1 , SP 3 , and SP 7 approximations was at about (51.1,0.2). The SP 7 -based reconstruction was similar with the SP 3 -based one regarding the source center position. Although the SP 1 -based reconstruction was somewhat different compared to the SP 3 -and SP 7 -based results, the source localization errors were measured to be less than 1mm in two directions. This result was similar to the SP 3 -and SP 7 -based reconstruction in the phantom experiments. The difference between phantom-and real mouse-based BLT reconstructions was that the tritium source could be localized well in the SP 1 -based reconstruction. This was likely because the tritium source was superficial compared to the LED source in the mouse-shaped phantom. In addition, the mouse surface was more irregular than the mouse-shaped phantom surface, it should have the effect in the reconstruction.

Discussions and Conclusion
In this paper, a radiative-transfer-based fully-parallel reconstruction framework was developed for spectrally-resolved bioluminescence tomography. Although the BLT reconstruction was performed based on the simplified spherical harmonics approximation, the proposed framework was also suitable for other high-order self-adjoint approximations to the RTE. The application of the finite element methods made the framework suitable for processing complex geometries. Fully-parallel execution made the BLT reconstruction for the whole-body of a small animal feasible. The reconstruction performance optimization significantly improved the reconstruction speed. The experimental reconstruction using the mouse-shaped phantom and real mouse further demonstrated the effectiveness of the proposed framework.
Since bioluminescence tomography can provide more accurate bioluminescent source information, the importance of developing mature BLT technologies is critical, given the successful and extensive application of planar bioluminescence imaging in biological research. Diffusion theory approximation leads to inaccurate reconstructions and more accurate approximation models are necessary for BLT reconstruction. However, the corresponding computational burden prevents the realization of such reconstruction algorithms. The proposed framework addresses this problem. Although good reconstruction performance can be obtained using high-order approximation models, the memory and time cost cannot be neglected. The balance between reconstruction quality and cost should be further explored. From the reconstruction cases presented in this paper, we find that the SP 3 approximation is suitable for obtaining good BLT reconstructions after comparing its results with those based on SP 1 and SP 7 approximations.
The performance optimization presented here is relevant not only to the improvement of the algorithms involved, but also to the development of the computational hardware. While the application of quasi-Newton optimization methods and iterative matrix inversion have effectively improved the reconstruction performance, further optimization strategies should be developed to obtain higher reconstruction speed. Since the matrix M(λ k ) is sparse, sparse approximate matrix inversion can be considered to accelerate the reconstruction. With respect to the hardware, it is necessary to improve the data communication speed between CPUs. Developing multi-core CPU technology and shared-memory high performance computer will significantly benefit the proposed algorithm.
In conclusion, we have developed a fully-parallel spectrally-resolved BLT reconstruction framework for radiative-transfer-based high-order approximations. A performance optimization was also performed and described. Preliminary experimental reconstruction verifications show the feasibility and effectiveness of the proposed framework. Further investigations will focus on real mouse experiments used as disease models. The flowchart of the proposed fully parallel framework. (a) Photograph of Caliper mouse-shaped phantom in a Maestro 2 system; (b) and (c) are the photon distribution at 580nm and 640nm corresponding to (a).      Performance comparison between direct and iterative inversions when 10 CPUs are used in reconstructions. DI is Direct Inversion; II denotes Iterative Inversion; and DI/II is the ratio of total time between DI and II.