L1-norm based nonlinear reconstruction improves quantitative accuracy of spectral diffuse optical tomography

Spectrally constrained diffuse optical tomography (SCDOT) is known to improve reconstruction in diffuse optical imaging; constraining the reconstruction by coupling the optical properties across multiple wavelengths suppresses artefacts in the resulting reconstructed images. In other work, L1-norm regularization has been shown to improve certain types of image reconstruction problems as its sparsity-promoting properties render it robust against noise and enable the preservation of edges in images, but because the L1-norm is non-differentiable, it is not always simple to implement. In this work, we show how to incorporate L1 regularization into SCDOT. Three popular algorithms for L1 regularization are assessed for application in SCDOT: iteratively reweighted least square algorithm (IRLS), alternating directional method of multipliers (ADMM), and fast iterative shrinkage-thresholding algorithm (FISTA). We introduce an objective procedure for determining the regularization parameter in these algorithms and compare their performance in simulated experiments, and in real data acquired from a tissue phantom. Our results show that L1 regularization consistently outperforms Tikhonov regularization in this application, particularly in the presence of noise.


Introduction
Diffuse optical tomography (DOT) is a non-invasive, non-ionizing and low-cost imaging technology with applications in diagnosing breast cancer [1][2][3], analyzing brain function for functional neuroimaging [4][5][6][7][8], and imaging small animals for the study of disease detection, progression/regression and treatment [9,10]. The imaging process typically involves the injection of near-infrared (NIR) light in the spectral range of 650−900nm into the imaging volume of interest (e.g. breast, head, mouse) through sources on the surface of the volume. The transmitted light is then measured at different locations using detectors that are also on the same volume surface. A number of measurements are acquired using different source-detector pairs, and the internal distribution of the tissue's optical properties is reconstructed using a transport-model-based image reconstruction algorithm [11].
When a single wavelength continuous-wave (CW) light source is used, only the amplitude of the fluence can be measured at the surface boundary. In this case, only tissue absorption at that wavelength can be estimated. However, since the quantities of interest in DOT experiments are typically chromophore concentrations (mainly oxyhemoglobin (HbO 2 ) and deoxyhemoglobin (Hb)) rather than the absorption itself, measurements have to be taken at multiple CW wavelengths in order to provide sufficient information to recover the distributions of these chromophores. There are two main approaches for reconstruction of chromophore images using multiple wavelength measurements: non-spectral methods and spectrally constrained methods. Non-spectral methods (traditional DOT) reconstruct the absorption coefficients at each wavelength independently and then calculate the chromophore concentrations using Beer's law [12]. Spectrally constrained DOT (SCDOT) directly reconstructs the chromophore distributions by using the known absorption spectra of the chromophores to constrain the reconstruction process. Compared with non-spectral methods, spectrally constrained reconstruction has been shown to be better at suppressing artefacts in the resulting reconstructed images and to reduce crosstalk between chromophores and scatter parameters in breast imaging [13][14][15]. It has also been shown that boundary measurements at two NIR wavelengths are sufficient to recover the concentrations of HbO 2 and Hb [16].
Reconstruction of images from DOT measurements is a difficult inverse problem. The limited availability of boundary measurements and the diffusive nature of light propagation in tissue [11,17] make the problem non-linear and ill-posed, and iterative solutions with effective methods for regularization are necessary to obtain unique solutions. Many approaches have been used [18][19][20] and quadratic Tikhonov (L 2 -norm) regularization is the most popular approach as the solutions to each iteration step can be computed analytically, simplifying the reconstruction process. This is known to suppress the high-frequency components of the reconstructed image (normally noise) leading to smooth reconstructions, but this has the drawback of being unable to preserve sharp features in the reconstructed images [19]. L 1 -norm regularization has recently been adopted for single wavelength DOT image reconstruction. Features of interest in DOT, such as tumours in the breast or activations in the brain are typically spatially localized and in this case the vector corresponding to the difference in the optical properties relative to the background is sparse with only a few non-zero elements [21][22][23][24]. L 1 -norm regularization is known to induce sparsity in the solution to inverse problems [24][25][26], and has been shown to give essentially the same sparsity as the true sparsity measure (L 0 -norm) [27]. Compared to L 2 regularization, L 1 regularization is less sensitive to outliers, which correspond to sharp edges in image processing applications [28] and is thus able to preserve edge-like features. Both L 2 and L 1 regularization methods can be solved by convex optimization schemes where a unique solution can be guaranteed [24,25]. The more general L p -norm (0 < p < 1) regularization schemes have also been studied for DOT image reconstruction [23,29], and are also known to introduce sparsity to the reconstructed image [30]; however, L p regularization is known to be nonconvex meaning that local minima exist [31] and unique solutions cannot be guaranteed.
In traditional DOT, L 1 regularization can indeed be applied to each wavelength independently. However, there are no guarantees that the solutions will be consistent with Beer's law. It must be assumed that the regularizer will have the same sparsifying effect at all wavelengths. This may not necessarily be true, given that SNR, scattering etc are different. SCDOT can be used to constrain the solution space to those solutions that are physically plausible. Therefore, reconstruction with spatial and spectral regularization simultaneously applied will constrain the solution space much more reliably than their sequential application. To the best of our knowledge, L 1 -norm has not yet been used in SCDOT image reconstruction. We introduce a novel algorithm, spectral-L 1 , which combines the sparsity-preserving advantages of L 1 regularization with the spectral constraints imposed by coupling optical properties across multiple wavelengths, to solve the inverse problem for image reconstruction in SCDOT. The key advance is to adapt the DOT reconstruction process to incorporate efficient methods for solving each iterative step. These are necessary because the L 1 -norm is non differentiable and the update terms in the reconstruction process cannot be computed analytically. We investigate three algorithms for solving the update term: iteratively reweighted least square algorithm (IRLS) [32], alternating directional method of multipliers (ADMM) [33-36] and fast iterative shrinkage-thresholding algorithm (FISTA) [37,38]. All three methods have been widely used to obtain sparse solutions to linear systems. IRLS and ADMM are second-order algorithms that require explicit inversion of a large matrix; FISTA is a first-order algorithm that does not require explicit matrix inversion, but does require a gradient operator to be constructed.
We adapt the DOT reconstruction process to use these methods for the solution of the update terms. An automated method to automatically select the regularization parameters is developed which is based on the L-curve method but is modified for this use-case. Then we perform a systematic comparison of the different regularization methods (L 1 and L 2 ) and optimization algorithms (IRLS, ADMM and FISTA) on simulated data in two-and threedimensions. The comparison evaluates the methods on the accuracy of image reconstruction; ability to preserve edges; robustness against noise; and computational efficiency. Comprehensive and robust qualitative and quantitative evaluations are performed to quantitatively compare the reconstruction results using average contrast (AC), Pearson correlation (PC) and peak signal-tonoise ratio (PSNR). To our knowledge, this is the first systematic study in the area of spectral DOT reconstruction to perform such a comprehensive evaluation. We then apply our methods to the reconstruction of functional activations in simulated human brain imaging data using realistic anatomical models and finally evaluate the proposed algorithms using experimentally acquired data, by imaging a tissue-mimicking, plastic phantom of known optical properties using a multispectral DOT system. The paper is organized as follows: Section 2 introduces the theory for image reconstruction in SCDOT and proposes a new spectral-L 1 inverse model; Section 3 investigates the performance of the candidate three reconstruction algorithms for our proposed model; In section 4, a principled method for selecting the regularization parameter is described; Section 5 presents extensive comparative experiments in simulated models, and the results of experiments using tissue phantoms. In section 6, the conclusions that can be drawn from our results are discussed.

Theory
Image reconstruction in SCDOT aims to find the tissue composition that best explains the boundary measurements. It typically requires the repeated evaluation of a forward model of light propagation in biological tissues as part of an inverse model that minimizes the difference between the measurements and the model's predicted measurements. In this section, the forward model for CW light propagation is introduced, followed by the spectrally constrained inverse model. The L 1 and L 2 regularization methods for the inverse problem are described at the end of the section.

The forward model
It is generally accepted that if the scattering coefficients dominate over absorption coefficients in tissues and the region of interest is far from the light sources, light propagation can be modelled by a diffusion equation (DE) [11]. The DE is able to generate isotropic fluence fields given a distribution of source fibres and the tissue optical properties. For a CW system, the DE can be written as Here, µ a (r, λ i ) is the absorption coefficient at position r for wavelength λ i , κ (r, is the photon density at position r and wavelength λ i , and the isotropic source term at wavelength λ i is given by q 0 (r, λ i ). It should be noted that in CW imaging, the value of µ s (r, λ i ) is not updated by the reconstruction algorithm and is assumed to be a known constant. The air-tissue boundary is represented by an index-mismatched type III condition (Robin or mixed boundary condition) in which the fluence at the edge of the tissue exits and does not return [39,40]. A finite element method (FEM) is used to numerically solve Eq. (1) on a discretized mesh, which has been implemented in several open-source software packages, notably TOAST++ [41] and NIRFAST [12]. In this work, the NIRFAST package is used for all computations.
In CW systems, the tissue absorption µ a depends on the concentration of chromophores in the tissue. The relationship between the absorption coefficients at different wavelengths is therefore constrained by the intrinsic absorption properties of the chromophores via Beer's law. For a dual-wavelength imaging system, and for two chromophores, Beer's law is written in matrix-vector form as where c 1 and c 2 are chromophore concentrations and λ 1 and λ 2 are two measurement wavelengths.
In the remainder of the paper c 1 and c 2 correspond to oxyhemoglobin HbO 2 and deoxyhemoglobin Hb respectively, with λ 1 = 750nm and λ 2 = 850nm. ε c i ,λ i (i = 1, 2) are the extinction coefficients of the two chromophores at the corresponding wavelength λ i . The values of ε c i ,λ i have been documented by Zeff et al (2007) [5].

The inverse model for SCDOT image reconstruction
In SCDOT, chromophore concentrations c 1 , and c 2 are directly estimated from the boundary measurements in preference to explicitly reconstructing optical properties at each wavelength.
The following SCDOT inverse model allows direct estimation of chromophore parameters from two measurement wavelengths (i.e. 750 and 850nm) using some form of iterative procedure. Using a block notation, in which ( · · ) represents the concatenation of two column vectors, we have: where chromophores c 1 and c 2 are the model parameters to be recovered. Φ M λ i (i = 1, 2) is the measured fluence at the tissue surface and Φ C λ i is the calculated data using the forward solver. The superscript k denotes the iteration number. Eq. (3) defines a non-linear least square problem which can be solved via the classical Gauss-Newton method in which the first order Taylor series is used to expand the forward solution Φ C λ i as in which the spectral Jacobian J (also known as the sensitivity matrix) relates the changes in boundary data to changes in chromophore concentrations and can be constructed directly with the incorporation of spectral prior information using the adjoint method [42]. Note that when k = 1, an initial guess of the chromophore concentrations c 0 1 and c 0 2 is required which can be obtained by a data-calibration procedure explained elsewhere [43]. The spectral Jacobian J can be derived as [44]: where J λ i relates the changes in boundary data to changes in the absorption coefficient at wavelength λ i . The size of J in this case is the number of wavelengths times the number of measurements per wavelength, by number of finite element nodes times number of chromophore parameters.
Substituting Eq. (4) into Eq. (3) leads to where ∆c k is the change in the chromophore parameters at the k-th iteration and can be written as ∆Φ in Eq. (6) is the data-model mismatch which is given by Minimizing Eq. (6) leads to the normal equations which can be solved to find the update term ∆c k using the Gauss-Newton algorithm which is summarized in Algorithm 1. It is however non-trivial to calculate the inverse of J (k−1)T J (k−1) in Eq. (9) (i.e. step 6 in Algorithm 1) because it is normally singular or close to singular. Furthermore, experimental noise in the measurements Φ M λ i tends to lead to reconstruction artefacts if this inversion is computed directly. Strategies that can be employed to invert such ill-posed matrices includes Algorithm 1: Gauss-Newton Algorithm for Minimizing Eq. (3).
1 and c k 2 using Eq. (7) end for RETURN c k 1 and c k 2 algebraic reconstruction technique (ART), truncated singular value decomposition (TSVD) or the simultaneous iterative reconstruction technique (SIRT) [1,11,[45][46][47]. Regularization can also be employed to reduce model errors and artefacts caused by measurement noise. In the following section, we introduce an L 1 -based regularization technique to solve this ill-posed inverse problem.

The proposed spectral-L 1 inverse model
To convert Eq. (6) into a more readily solvable problem, a Tikhonov (L 2 ) regularization term is usually introduced into the inverse problem: The regularization parameter λ determines the degree of regularization that will be imposed on the model. This can be solved analytically to give I is the identity matrix and its size is the same as that of J (k−1)T J (k−1) . The introduction of λI effectively reduces the condition number of the matrix, thus stabilizing the matrix inversion. This is widely known as the Levenberg-Marquardt algorithm and is in general more robust than the Gauss-Newton method. An analytical solution to this problem is possible because Eq. (10) is convex and quadratic which makes L 2 -norm regularization an attractive choice for many inverse problems. However, in image reconstruction problems, it tends to over-smooth the result and sharp features such as object boundaries in the reconstructed images are often smeared. Moreover, L 2 -norm discourages sparsity, and is not suitable for sparse image reconstruction. In SCDOT image recovery, the perturbation/change ∆c k is usually zero or close to zero when the region to be recovered is not in the vicinity of the region of interest. In this case, ∆c k is spatially sparse. Recently, L 1 regularization has been widely studied because of several useful properties: it is sparsity-promoting, convex, edge-preserving, and is more robust against noise. Therefore, we propose a new inverse model for SCDOT image recovery based on L 1 regularization that we refer to as spectral-L 1 . This is formulated as

Candidate algorithms for solving the proposed spectral-L 1 method
We now consider three algorithms for the solution of Eq. (12): iteratively reweighted least squares (IRLS) [32]; alternating directional method of multipliers (ADMM) [33]; and fast iterative shrinkage-thresholding algorithm (FISTA) [37]. These algorithms will be incorporated into the image reconstruction process by substituting them into step 6 of Algorithm 1, which solves for the update term.

IRLS
Instead of solving the L 1 -minimization problem directly, IRLS reformulates the problem as a sequence of weighted L 2 minimization problems. Specifically, by introducing a weight matrix W, the L 1 minimization can be converted into finding the optimal solution of the quadratic problem W is a diagonal matrix with weights, w s , along its diagonal that are given by The superscript i above denotes the i'th IRLS iteration (Algorithm 2), and it should be distinguished from the superscript k denoting the iterations of Algorithm 1. A small positive number 0 < ε 1 is used to avoid the possibility of division by zero. It has been suggested that ε should be a series of positive real numbers that decay to zero over iterations [48]. In practice, we have found that using a fixed value in the range 0.001 ≤ ε ≤ 0.01 does not give significantly different results. Eq. (13) results in the normal equation This is known as the weighted L 2 -minimization scheme. We note that if the diagonal weights w s are set to 1, the normal equation reduces to the conventional L 2 -minimization scheme (Eq. (11)). The IRLS algorithm employing this method is summarized in Algorithm 2. The calculation of the elements of W requires an initial guess for ∆c for which we use the solution to the L 2 -regularized problem, Eq. (11). One of the biggest advantages of IRLS is that Eq. (15) has an analytical solution which allows Eq. (13) to be solved exactly, making IRLS almost as easy to implement as the Levenberg-Marquadt scheme. In common with many sparsity-promoting optimization methods, the sparsity level in IRLS is controlled by the regularization parameter λ which must be chosen carefully for each specific problem.

ADMM
ADMM has been widely used to solve optimization problems in machine learning, signal processing, and standard image restoration and reconstruction. This method has become particularly important in the field of variational image processing, which frequently requires the minimization of non-differentiable objectives [33,34,49,50]. It has been shown to be able to solve constrained optimization problems effectively and efficiently. The basic idea is to decompose a complex optimization problem into several simpler subproblems, which usually have closed-form solutions [35]. Its simplicity, flexibility, and broad applicability have made it an important part of the modern optimization toolset. To apply ADMM to our spectral-L 1 problem, we first introduce an auxiliary splitting vector variable v, an augmented Lagrangian multiplier b, and a positive penalty parameter θ, reformulating Eq. (12) as the following unconstrained optimization problem This multivariate optimization problem corresponds to a sub-minimization problem with respect to ∆c, v and b, separately. When all the subproblems converge, the solution for ∆c approximately represents that of Eq. (12). In order to find the minimizers for all of the subproblems, ADMM searches all the saddle points of Eq. (16) by first fixing the variables (v, b) and minimizing the subproblem with respect to ∆c using the following normal equation By inverting the matrix on the left-hand side of Eq. (17), a unique solution for ∆c is found. We then fix variables ∆c and b and set the first order derivative with respect to v to zero. This leads to which can be solved component-wise using an analytical shrinkage-thresholding method to give where • and sign symbols denote component-wise multiplication and the signum function, respectively. The last step of ADMM is to update the augmented Lagrangian multiplier b, The complete method is presented in Algorithm 3. The key advantage of ADMM is that Eqs. (17) and (18) have closed-form solutions. We note that the augmented Lagrangian multiplier means that different choices of the penalty parameter θ will provide similar results but with different rates of convergence. In all the experiments we have conducted, we used θ = 0.01 to achieve fast convergence.

FISTA
FISTA is a very efficient optimization approach that uses the forward-backward splitting technique (FBS) [37,38]. It is an extension of the classical gradient descent method and therefore belongs to the class of first order methods that are a better choice for large-scale problems than second-order methods such as IRLS and ADMM because they do not require the explicit construction of very large matrices. Let us consider minimizing the L 1 -regularized data fitting energy given by Eq. (12). We begin by analyzing the standard unregularized problem with λ = 0. Let F(∆c) = ||∆Φ k−1 − J k−1 ∆c|| 2 2 and ∇F(∆c) = J (k−1)T (J (k−1) ∆c − ∆Φ k−1 ) denote its gradient. We apply the gradient descent algorithm

Algorithm 3: Alternating Directional Method of Multipliers (ADMM)
Tol, otherwise go to step 1 end for RETURN ∆c k = ∆c i where t > 0 is a suitable stepsize which controls how far the iteration moves along the gradient direction in the i'th iteration. The value of t is initialized by estimating the Lipschitz constant L of ∇F as L = L (∇F) and then backtracking rules are adopted to guarantee that the objective has decreased sufficiently [38]. The gradient iteration given by Eq. (20) can be understood as a proximal regularization [51] of the linearized function F(∆c) at ∆c i−1 , which corresponds to the following optimization problem: Analogously, adopting the same gradient descent idea to solve Eq. (12) with λ 0 leads to the following minimization problem ∆c = arg min Minimizing this results in a formulation similar to Eq. (18) and can be solved in the same way to give The minimizer of the Eq. (12) can then be found by iterating ∆c i in Eq. (23) to convergence. In isolation, Eq. (23) is known as the iterative shrinkage-thresholding algorithm (ISTA) [52][53][54][55][56][57], whose global convergence rate is O(1/N), where N is the iteration counter. This is improved upon by using a Nesterov-type acceleration technique to obtain faster convergence. In the FISTA algorithm, the iterative shrinkage operator is not used on the value obtained from the previous iteration ∆c i−1 , but rather on a combination of the values from the previous two iterations. Thus, in FISTA, Eq. (23) is replaced with where ∆y i comes from the prediction procedure given in step 4 of Algorithm 4. This step can help to push the solution to the current iteration further in the direction it moved during the previous iteration, which can significantly improve the computational efficiency. The complete FISTA method is presented in Algorithm 4, where steps 2 and 3 implement the acceleration strategy and can be viewed as an over-relaxation step that improves the global convergence rate from O(1/N) to O(1/N 2 ).

The spectral-L 1 algorithm
We have introduced three methods for solving the chromophore update terms in SCDOT with a sparsity enforcing constraint: IRLS, ADMM, and FISTA. These algorithms replace the single a curve which is typically L-shaped can be constructed. Figure 2 shows the L-curves obtained from each of the four candidate optimization schemes using the numerical experiments described by Zhan et al [44]. The optimal trade-off occurs at the "elbow" of the L-shaped curve and this can be located by determining the point of maximum curvature of the curve. Since strong regularization can improve the conditioning of the linear system, we solve the formulations given by Eqs. (10) and (12) with a relatively large regularization parameter λ and then decrease it gradually by a fixed factor until the curvature of the L-curve starts to decrease. This corner point is considered to be at the optimum value of λ where both the model fit and the regularization function are simultaneously near to their minimum values. In principle, computing the L-curve requires the full image reconstruction process to be run multiple times which is computationally very expensive. We have found that it is sufficient to compute the L-curve for one iteration of the outer loop of Fig. 1, and then to use the resulting optimal value of λ for the remaining iterations. In addition, in order to avoid the special case where the L-curve does not allows an optimal value of λ to be found by purely numerical means [58,59], we select a range around the parameter with the highest curvature value. We then adjust the values manually to get the final optimal parameter by visually inspecting the solutions and choosing the one that generates the sparsest solution with a well-defined compact localization. This approximate optimum is then used for subsequent iterations. We note that the choice of algorithm for L 1 regularized reconstruction significantly affects the shape of the L-curve and the optimal value of λ. In addition to the regularization parameter λ that is common to all three L 1 algorithms, we have considered how to select the other parameters of each method to ensure that our comparison is fair and unbiased. IRLS has one parameter ε and we set this to 0.001 ≤ ε ≤ 0.01 following the recommendations set out by Shaw and Yalavarthy [48]. ADMM has one parameter θ and the use of the augmented Lagrangian multiplier means that different choices of θ provide similar results but lead to different rates of convergence. In all the experiments, θ was set to 0.01 to achieve fast convergence. FISTA has two parameters t and α. t is initialized by estimating the Lipschitz constant and then backtracking rules are adopted to guarantee that the objective has decreased sufficiently [38]. t is therefore updated automatically. α is involved in an over-relaxation step (i.e. step 4 in Algorithm 3) and its update is also automatic (i.e. step 3 in Algorithm 3). The regularization parameter λ is therefore the only parameter that must be optimized for a specific problem.

Experiment setup
We have performed extensive experiments to evaluate the performance of different models and algorithms qualitatively and quantitatively. We first define three evaluation metrics to quantify the quality of the reconstructed images. We then describe simulated numerical experiments, and then real experiments performed on phantom samples. For experiments in which measurement noise was added, ten repeats were performed. In all cases, the forward model was implemented using the NIRFAST package [12] in Matlab R2013a (Mathworks, Natick, USA).

Quantitative evaluation metrics
Three quantitative evaluation metrics are considered: the average contrast (AC), Pearson correlation (PC) and peak signal-to-noise ratio (PSNR). Ideally, if the reconstructed image is exactly same as the ground truth image, AC is equal to 1. For PC and PSNR, the recovered image has higher quality if higher PC or PSNR values are obtained.
Average constrast (AC) is based on the mean value of the region of interest and is defined as: where c j i denotes the recovered values of chromophore i on the finite element node j. N is the number of nodes in the activation region which is selected by thresholding the recovered changes based on 50% of the maximum recovered changes.c i are the ground truth values of the chromophores in the activation region.
The second evaluation metric PC is given by The numerator is the covariance (COV) of the recovered images with the ground truth and σ indicates standard deviation. The PC is thus a measure of the joint variability of the ground truth image with the reconstructed image. Finally, PSNR evaluates the difference between the ground truth image and the recovered image. Larger PSNR values means less difference between the target and the recovered image. This measure is defined as follows PSNR = 10 · log 10 MAX 2 Here, MAX c i is the maximum pixel value of c i and MSE is the mean squared error between the reconstructed and ground truth values.
For AC, values closer to 1 indicate better performance. For PC and PSNR, higher values are better.

Three-dimensional head numerical experiments
We first evaluate our proposed methods using a physically realistic three dimensional head model derived from T1-weighted MPRAGE scans originally acquired by Eggebrecht et al [8] that were kindly provided to us by the authors of that work. Following the process described by Wu et al [60], Statistical Parametric Mapping (SPM) software [61] was used to perform a parametric segmentation of the 5 tissue types (scalp, skull, cerebrospinal fluid (CSF), gray matter, white matter) based on the pixel intensity probability function distribution. These five different layers were then further processed in NIRFAST to create masks and layered volumetric FEM meshes.

Scalp
Skull CSF Gray matter White matter Overall Fig. 3. Three-dimensional surface mesh for each of the five head layers. The mesh is composed of 101046 nodes corresponding to 589658 tetrahedral elements. Each node is labelled by one of the five segmented head tissue types, as shown in Fig. 3. Chromophore concentrations assigned to each layer are computed from the tissue optical properties at 750nm and 850nm in a previous in vivo study [62] (Table 1) using Beer's law and Mie scattering formulae. A high-density (HD) imaging array containing 158 sources and 166 detectors (Fig. 4) [8] was placed over the whole head, with source-detector (SD) separation distances ranging from 1.3 to 4.8cm. In this study, 3478 differential measurements per wavelength were used to image the hemodynamic changes in the brain. Two individual activations were simulated in the visual cortex with chromophore concentrations of HbO 2 and Hb respectively increasing to double the background level in the gray matter (Fig. 5). Each simulated activation has a radius of 5mm. 0% to 2% distributed Gaussian noise at 0.5% intervals was added to the measurement vector.
Reconstructed chromophore concentrations of the simulated activation using the Tikhonov model (Eq. (10)) and the spectral-L 1 model (Eq. (12)) on noise-free data are displayed in Fig. 6, while those on data with 1% Gaussian noise are displayed in Fig. 7. We only show the area with changes in chromophore concentration greater than 0.0001mM. Compared to the results from the spectral-L 1 model, Tikhonov reconstructions have lower image contrast, which can be clearly seen from the first column of Fig. 6 and Fig. 7. Some artifacts (areas contained within green ellipses) can be easily observed around the source and detector areas. With increased levels of noise, larger artefacts are seen with Tikhonov regularization and the results are spatially smeared. In contrast, results from the spectral-L 1 model show fewer artifacts in the non-activation area. Higher noise does not noticeably affect the L 1 -regularized reconstructions. IRLS produces visually more compact localizations than ADMM and FISTA, whilst ADMM appears to have better sparsity-inducing properties compared with IRLS and FISTA.  Evaluation metrics from this experiments are shown in Fig. 8. It is clear that the spectral-L 1 model can achieve higher AC, PC and PSNR values than the Tikhonov model which means higher image contrast and accuracy can be achieved with L 1 regularization. Although the results of FISTA show more visual artifacts than other L 1 -norm methods, it is still able to produce

More realistic three dimensional head numerical experiments
Following the proof-of-concept experiments described in the previous section, we extended our analysis to a more realistic case with much smaller changes in chromophore concentrations. In the activation area we model a small region with changes in HbO 2 (c 1 ) of 5µM and Hb (c 2 ) of −5µM, relative to the background concentrations given in Table 1 (Fig. 9). The mesh is the same as that used in the previous section. In line with the expected in vivo performance of imaging systems, 0.12%, 0.15%, 0.41% and 1.42% Gaussian random noise was added to first (13mm), second (30mm), third (40mm) and fourth (48mm) nearest neighbour measurements to provide realistic data [63]. Reconstruction using the four methods considered here are shown in Fig. 10 with noisefree and noisy simulated data. With reference to results shown earlier in this paper, we make a similar observation that in comparison to the ground truth values, results using Tikhonov regularization are visually inferior to those from L 1 regularization. With increased noise, Tikhonov regularization performs progressively worse with more artefacts visible in the source-detector areas. L 1 regularization induces sparse results with fewer artefacts in non-activated areas. Visual inspection of the results from the three L 1 algorithms suggests that IRLS produces over-sparse reconstructions with strong activations confined to a small area. ADMM and FISTA results are much more visually realistic and they are seen to give higher quantitative accuracy. A quantitative evaluation using AC, PC and PSNR is given in Table 2 and Table 3 and these support the conclusion that even at small changes in chromophore concentration, the spectral-L 1 model can still guarantee higher image contrast and accuracy, with FISTA performing consistently better by all measures (AC closer to 1, higher PC, higher PSNR).

Experiments with phantom data
To evaluate the proposed algorithms on real experimental data, a multispectral, non-contact CW-DOT system designed for hand imaging [64,65] was used to image a solid plastic cylindrical phantom (INO, Quebec, Canada) of radius 12.3mm and length 50mm. Boundary data was collected at five wavelengths (650nm, 710nm, 730nm 830nm and 930nm), in a transmission setup with a 7 x 5 grid of source positions on the underside of the phantom and a 11 x 9 grid of virtual detectors on top, displayed in Fig. 11(b). The spatially constant, but spectrally varying optical properties of the phantom were measured previously in time resolved experiments [66]. The absorbing dye within the phantom was treated as a chromophore that has unit concentration in the bulk of the phantom, the extinction coefficient of which was modelled by the measured absorption coefficient. A heterogeneous version of the phantom was imaged which contained a cylindrical rod of radius 3mm and length 50mm, at a depth of 5mm ( Fig. 11(a)). The rod has twice the absorption coefficient of the bulk phantom which provides a 2:1 contrast in dye concentration compared to background (Fig. 12 left). A homogeneous version was also imaged, enabling calibration of the model/data mismatch and any source or detector coupling variation. The mesh, as shown in Fig. 11(a), consists of 85,205 nodes and 451,821 linear tetrahedral elements. Ground truth data and images reconstructed with L 2 and L 1 methods are shown in Fig. 12 respectively. The experiments described in the previous sections showed that the particular choice of L 1 method makes only a very small difference to the quality of the reconstruction, but there are very large differences in computational efficiency, with FISTA being far more efficient in this domain because of its superior ability to deal with large problems (as will be shown in section 5.5). Therefore in this experiment, we only use FISTA as the L 1 solver. It can be clearly observed from Fig. 12 that L 2 regularization over-smooths the reconstructed images which have much lower image contrast than the ground truth. Some artefacts can be seen in the source and detector areas. We note that only the central region can be reconstructed in both cases because the sources and detectors are confined to this region, with very low sensitivity away from the centre. The image contrast reconstructed by L 1 regularization is much closer to the ground truth but with more compact results. We calculate the three evaluation metrics in the volume of illumination (Table 4) and these support the same conclusions.

Comparison of CPU time consumed in the inverse model
We now compare the computational efficiency of the proposed methods. All experiments are performed using Matlab 2013a (Mathworks, Natick, USA) on a Windows 7 (Microsoft, Redmond, USA) platform with an Intel Core CPU i7-6700 at 3.40GHz and 64.0GB memory. The simulated experiments described in section 5.2 were used to perform this comparison. CPU times used in the inverse procedure only are measured. We run each method over ten realisations of noise at each of five noise levels to obtain reliable statistics. Figure 13 shows the CPU time consumed for the four different methods (Tikhonov, IRLS, ADMM, FISTA). In order to display the recorded times  from ten repetitions clearly, CPU times for one iteration of the outer loop of the reconstruction algorithm are shown in Fig. 13. Total CPU times for iteration to convergence are given in Table 5. FISTA is clearly the fastest L 1 regularization method amongst those considered here, and it is faster even than Tikhonov regularization which does not use an inner iteration. FISTA only involves the computation of J T J which is much more computationally efficient than the computation of JJ T . IRLS and ADMM are substantially slower because they require an inner iteration and inversion/multiplication of large matrices.

Conclusion
In this paper, a new model for spectrally constrained DOT reconstruction with L 1 regularization is proposed. Numerical experiments showed that compared to the L 2 -norm, L 1 regularization can reduce crosstalk and maintain image contrast by inducing sparsity. These findings were tested on real experimental data using a tissue-simulating phantom and similar results were found. Although all L 1 -based methods perform similarly in terms of reconstruction quality, FISTA performs marginally better than ADMM and IRLS by the measures of AC, PC, and PSNR, and is far more computationally efficient as it avoids direct matrix inversion and large matrix-matrix multiplications. The contributions of this paper can be summarized as follows: 1) It is the first time that L 1 regularization methods and spectrally constrained DOT methods have been used together and it is their combination (i.e. spectral-L 1 model) that is original. We have given detailed descriptions of how this can be done, and performed systematic comparisons of the performance and efficiency of the different methods on both simulated and real data. We are not aware of any previously published work that proposes such a model and performs such a detailed analysis of its performance; 2) We have developed a method to automatically select the regularization parameters. This is based on the L-curve method but is modified for efficient application in this use-case; 3) We have conducted extensive numerical experiments on simulated data and on real experimental data, and have performed comprehensive and robust qualitative and quantitative evaluations. To the best of our knowledge, this is the first systematic study in the area of spectral DOT reconstruction to perform such a comprehensive evaluation.