Graph- and finite element-based total variation models for the inverse problem in diffuse optical tomography

Total variation (TV) is a powerful regularization method that has been widely applied in different imaging applications, but is difficult to apply to diffuse optical tomography (DOT) image reconstruction (inverse problem) due to complex and unstructured geometries, non-linearity of the data fitting and regularization terms, and non-differentiability of the regularization term. We develop several approaches to overcome these difficulties by: i) defining discrete differential operators for unstructured geometries using both finite element and graph representations; ii) developing an optimization algorithm based on the alternating direction method of multipliers (ADMM) for the non-differentiable and non-linear minimization problem; iii) investigating isotropic and anisotropic variants of TV regularization, and comparing their finite element- and graph-based implementations. These approaches are evaluated on experiments on simulated data and real data acquired from a tissue phantom. Our results show that both FEM and graph-based TV regularization is able to accurately reconstruct both sparse and non-sparse distributions without the over-smoothing effect of Tikhonov regularization and the over-sparsifying effect of L$_1$ regularization. The graph representation was found to out-perform the FEM method for low-resolution meshes, and the FEM method was found to be more accurate for high-resolution meshes.


Introduction
Diffuse optical tomography (DOT) is an important non-invasive imaging technique whose major applications include diagnosing breast cancer [1][2][3], imaging small animals for the study of disease and analyzing brain function in functional neuroimaging [4][5][6][7].In DOT, near-infrared light (spectral range of 650-900 nm) is injected into the object through optical fibers placed on the surface of the object.The transmitted light is then collected using optical detectors, forming a series of boundary measurements, each of which corresponds to the signal received by a single detector during illumination by a single source.Image reconstruction algorithms are then used to recover the internal distribution of the underlying optical properties of the object from the boundary measurements.
Due to the limited availability of boundary measurements and diffusive nature of near-infrared light propagation, image reconstruction in DOT is an underdetermined, ill-posed and nonlinear inverse problem.Regularization is often used to constrain the inverse problem to yield physiologically and anatomically plausible solutions, resulting in the following minimization problem with respect to the optical properties µ arXiv:1901.01969v2 [cs.CV] 5 Apr 2019 where Φ M represents the boundary measurements acquired from the optical detectors, F is the non-linear operator induced from the forward model [8], R is a general regularization term, and λ is a weight that determines the extent to which regularization will be imposed on the solution µ * .The quadratic Tikhonov-type regularization is widely used.However, it promotes smooth solutions and thereby smears sharp features embedded in the image [9].Regularizations based on the L 1 -norm of the solution have been also extensively studied [10][11][12], as they impose a sparsity constraint on the solution, enabling the recovery of sharp edges of objects in reconstructed images.Eq. ( 1) with either regularization mentioned results in a convex optimization problem, where highly efficient algorithms [13] are available.Recently, the more general L p regularization (0 < p < 1) that promotes sparsity into the resulting image [14], has been employed for DOT image reconstruction [15,16].However, L p regularization is nonconvex and therefore difficult to optimize.
Regularizations involving the L 1 or L p -norm of the solution are used under the assumption that the optical properties (representing the image) to be reconstructed are spatially sparse.These regularizations tend to oversparsify the distribution of the optical properties when such an assumption does not hold [13], for example, in the case of multiple activations or complex injuries in the brain, where the features of interest are not spatially localized and the optical properties relative to the background are therefore non-sparse [7].In order to be able to reconstruct images in which edges are preserved and features are not spatially sparse, a different approach is required.Total variation (TV) regularization, which uses the L 1 -norm of the gradient of the solution as a regularizing term (the detailed forms will be given in Section 2.1 and 2.2), can be used to overcome the limitations.The gradient operator can transform the solution µ * to a sparse space where non-zero values only occur at sharp features.As such, TV can perform better than the pure sparsity preserving regularizations at preserving edges of objects in the images that are not sparse.Further, gradient is a highpass operator, which imposes smoothness to the solution.This improves the conditioning of the minimization problem (Eq.( 1)), thus enabling a robust numerical solution.Due to these advantages, TV has been adapted from applications in imaging processing [17][18][19] to various medical image reconstruction problems, including photoacoustic tomography (PAT) [20], bioluminescence tomography (BLT) [21], fluorescence tomography (FT) [22], as well as DOT [23][24][25].
In most TV-associated imaging problems, the minimization problem is carried out on a Cartesian grid where each element represents a pixel (voxel in 3D) in the image [26].In this case, the differential operators resulting from minimizing the TV regularization, such as gradient, divergence, Laplacian and curvature, are discretized straightforwardly using the finite difference method (FDM) [26].In DOT, it is however non-trivial to represent the complex geometry (i.e. the multi-layer head used in our experiment) using a Cartesian grid and the FDM is not always practical.Two representations are often employed to model complex geometries: finite element and graph representations.In the former, the object geometry is represented by a polygon/polyhedron, over which a series of disjoint triangles (tetrahedron in 3D) are usually generated.In the latter, the object geometry is represented by an unstructured graph, defined by vertices, edges and weights.Such a graph is normally constructed by exploring neighborhood relationships between vertices.For each representation, there is a systematic discretization scheme (finite element discretization or graph discretization) for the differential operators, which can be readily applied to the minimization of TV-associated problems.We note that although TV regularization has previously been studied in DOT [23][24][25], none of these studies have provided detailed information about the discretization schemes they used.The performance of different discretization schemes for TV regularization in DOT has not been systematically investigated.
The minimization of a TV-associated problem can be non-trivial due to the non-linearity and nondifferentiability of the TV regularization term.In image processing, many efficient optimization algorithms have been developed for this task, including iteratively reweighed least squares [27], primal dual [28], split Bregman [29], and fast iterative shrinkage-thresholding algorithm (FISTA) [30,31].Recently, alternating direction method of multipliers (ADMM) [18,26,32,33] has become increasingly popular.The elegance of ADMM lies in decomposition of the original minimization problem into several simple subproblems, each of which either has a closed-form solution or can be iteratively solved with efficient numerical methods.However, since ADMMbased methods have been implemented mainly for Cartesian grids using a forward-backward FDM [34] and it is not straightforward to generalize them to solve the inverse problem on an unstructured domain.Moreover, the non-linearity of the data fitting term in Eq. ( 1) further complicates the DOT reconstruction problem, making the minimization process required to solve Eq. ( 1) difficult.
In this paper, we address these limitations and develop TV regularization approaches for the inverse problem in DOT.More specifically, we make the following three distinct contributions: (1) We introduce finite element and graph representations to discretize the TV regularization term in DOT reconstruction enabling the minimization of the inverse problem associated with TV regularization to be carried out on unstructured domains.To the best of our knowledge, this is the first time that finite element-based discretization methods have been provided in detail for DOT image reconstruction with TV regularization.Additionally, we are not aware of any previous work that attempts to formulate the TV-regularized inverse problem using a graph representation.
(2) We propose an efficient algorithm based on ADMM to minimize the TV-regularized inverse problem.Our algorithm can handle geometries with unstructured grids, and also reduces the computational difficulties arising from the non-differentiability and dual non-linearities in the inverse problem.(3) We further investigate the isotropic and anisotropic variants of the TV regularization, and compare their finite element-and graph-based implementations against a baseline Tikhonov model, both qualitatively and quantitatively using extensive numerical experiments.

Discretizations using finite element and graph representations
In this section, we first show how an unstructured computational domain can be modelled using finite element (FE) and graph representations.The definitions of TV regularizations (anisotropic and isotropic versions) and corresponding discrete differential operators are then given accordingly for each representation.In Fig. 1, we show a circle discretized with the two representations.For the FE representation (left), the circle is divided by a series of elements joint at different vertices.In the FE representation, we normally term a discrete geometry as a FE mesh.Within the circle mesh, a triangle (representing one element) is highlighted comprising three disjoint vertices.For the graph representation (right), the circle is simply discretized with a set of vertices and edges.In this representation there is no concept of 'element'.Note that it is easy to convert the FE representation to the graph representation.For example, the FE mesh can be viewed as a graph if we consider only the vertices and edges in it.In this section, two representations are introduced to model the unstructured computational domain of complex DOT geometries.

Finite element-based discrete differential operators
We apply the Galerkin FE method to the computational domain in DOT [35], the first step of which is to approximate a continuous function by a piecewise-polynomial function.Using the FE representation, let us first discretize an unstructured 2D domain Ω by M triangles jointed at N vertex nodes (e.g.Fig. 1 left).V = {V k } N k=1 denotes a finite number of N nodes.Let ϑ h be the 2D vector space of continuous piecewise-linear functions on the triangles in the FE mesh.The continuous and piecewise-linear function U(x, y) : Ω → R, approximating the optical properties on Ω, can be written in the form of Here {ϕ i } N i=1 , ϕ i ∈ ϑ h are linear basis functions defined as ϕ j (V i ) = 1 if i = j and ϕ j (V i ) = 0 if i j. µ i : V → R is the value of optical property on each vertex in the FE mesh, i = 1, ..., N.
Eq. ( 2) means that the optical property value inside a triangle is associated with the optical property values on all vertices in the mesh.Given three vertices of a triangle T, i.e. v 1 = (x 1 , y 1 ), v 2 = (x 2 , y 2 ) and v 3 = (x 3 , y 3 ), there are three linear basis functions ϕ i associated with the vertices, which are respectively expressed as and c 3 = (x 1 y 2 − x 2 y 1 ) /(2A T ).(x, y) represents any point inside of the triangle T. A T denotes the triangular area of T, which is computed as In FE, one starts from a continuous problem and approximates the solution with a piecewisepolynomial function U.As such, we define the following anisotropic and isotropic TV regularizations ∫ In Eq. (4) and Eq. ( 5), the continuous TV regularizations and their resulting discretized versions are shown on the left-hand side and right-hand side, respectively.The two discrete versions respectively are the anisotropic and isotropic definitions of TV regularization.∂ x and ∂ y are continuous partial derivatives along the x and y directions, respectively.D x is a matrix of size M × N which, when applied to µ gives the discrete partial derivative of µ along the x direction.D y is the derivative matrix along the y direction.D x µ and D y µ are therefore two vectors of size M × 1, where M is the number of triangles in the mesh.We note that the main idea of FE is to break down the calculation domain Ω onto the local elements individually.Afterwards, the derived local matrices are assembled element by element to enable the final computation.Eq.
(4) and Eq. ( 5) can be proved by expressing the partial derivatives ∂ x U and ∂ y U in terms of a basis.To illustate this idea we prove the first term of Eq. ( 4): where A T i denotes the area of triangle T i and {i, 1}, {i, 2}, {i, 3} in a i,1 µ i,1 + a i,2 µ i,2 + a i,3 µ i,3 represent the indices of the vertices of the ith triangle.As a i,1 µ i,1 + a i,2 µ i,2 + a i,3 µ i,3 is a linear combination, we can thus construct the discrete derivative matrix D x with the following steps: • Initialize all-zeros matrix D x of size M × N.
• Loop over M triangles; for each triangle i, compute the coefficients a 1 , a 2 and a 3 using the coordinates of the three vertices and fill in the three columns in the ith row of matrix D x corresponding to the position of the three vertices in the node sequence.
The discrete derivative matrix D y can be obtained in a similar way.Note that D x and D y are sparse matrices as their most entries are zeros.With D x and D y defined, we can therefore minimize the TV regularization (either anisotropic (Eq.( 4)) or isotropic (Eq.( 5)) version) with the data fidelity term in (Eq.( 1)) for DOT reconstruction over 2D unstructured geometries.The corresponding 3D counterparts were also implemented in this paper, as shown in the experiments.

Graph-based discrete differential operators
In this section, we introduce discrete differential operators on graphs and from these derive the TV regularization terms.First, we discretize an unstructured domain Ω by a weighted graph G = (V, E, w) (e.g.Fig. 1 right).In the graph G, V = {V k } N k=1 denotes a finite set of N vertices, and E ∈ V × V is a finite set of weighted edges.We assume that G is an undirected simple graph (no multiple edges) in this study.Let (i, j) ∈ E be an edge of E that connects the vertices i and j in V. Let µ i : V → R denote the value of the optical properties on i.The DOT reconstruction problem then reduces to finding an optimal value of the optical properties for each of N vertices in V.
The discrete differential operators are defined on the graph based on nonlocal methods [18,37,38].First, we define the nonlocal gradient operator ∇ w acting on µ i For vertex i ∈ V, ∇ w µ i is a vector with a length of N. The weight w i, j : V × V → R + represents the similarity between nodes i and j, which is nonnegative and symmetric.The weight function can be determined in many ways.In this study we choose to use the Euclidean distance to define the weight function with w i, j = 1/d i, j where d i, j represents the distance between vertex i and j.We note that an important difference between the FE gradient and the nonlocal gradient is that the former has two directions in 2D or three directions in 3D, whilst the latter is a vector of partial derivatives along all edges connected to the node.

Given a vector function ν
where ν i j is the jth index in the vector ν i .Based on Eq. ( 7) and Eq. ( 8), the nonlocal Laplace operator ∆ w acting on µ i is written as which is a linear operator also known as the graph Laplacian.With these discrete differential operators defined on graph, we propose the anisotropic graph TV and the isotropic graph TV regularization In the above definitions, the constructed graph G is assumed to be fully connected, meaning that each vertex is connected to all other vertices in G.In this case, the computational load for the minimizations of Eq. ( 10) and Eq. ( 11) will be extremely heavy.Spectral graph theory [39,40] or nearest neighbour [41,42] techniques are typically employed to limit the number of edges that are considered.For example, [39] and [40] use spectral approaches and the Nystrom extension method [43] to efficiently calculate the eigen-decomposition of a dense graph Laplacian.In this work, we build the graph by borrowing the positions of the vertices and the connectivity between vertices in the finite element mesh as the vertices and edges in the graph.With this structure, the graph is sparsified and only connected vertices for a given vertice are taken into consideration.In such a case, Eq. ( 10) and Eq. ( 11) are equivalent to and where We note that the 2D and 3D implementations of these differential operators are identical, making the resulting minimization processes of Eq. ( 12) and Eq. ( 13) more straightforward than the FE implementation.

Minimization of TV-associated DOT inverse problems
Due to the non-linearity of the data fitting term and the non-differentiability of the TV regularizations, it is non-trivial to minimize a TV-regularized inverse problem.It is harder than minimizing the standard L1-regularized inverse problem [13] because of the existence of the gradient operator.
In this section, we propose an efficient algorithm based on ADMM to address this, the idea of which is to first linearize the non-linear inverse problem and afterwards apply ADMM to the resulting linearized problem.The whole process is then iterated until convergence.We note that due to the use of the differential operators in Sections 2.1 and 2.2, the proposed algorithm can handle complex geometries with unstructured grids, and also can ease the computational difficulties arising from the non-differentiability and non-linearities in the inverse problem.We now describe the details of this algorithm.

Linearization
Non-linear problems are technically difficult to tackle directly, so iterative linearization can be used to convert a non-linear problem into a series of local linear problems.To do so, Taylor's series expansion is first used to approximate F (µ) in the fitting term of Eq. ( 1) as where J denotes the Jacobian calculated from the (k − 1)th iteration and is defined as The Jacobian in DOT is normally calculated using the adjoint method [44].
With this first order approximation, the non-linear problem Eq. ( 1) can be converted to which is an iterative linearized algorithm.∆Φ k−1 is the data-model mismatch which is given by Φ M − F µ k−1 and ∆µ k is the change in the optical property at the k-th iteration.With a proper initialization µ 0 , a local minimizer to Eq. ( 1) can be found by iteratively updating this step.In Eq. ( 4), Eq. ( 5), Eq. ( 12) and Eq. ( 13), we have defined the four types of TV regularizations using different representations.We then apply them to R(∆µ) in Eq. ( 15), resulting in the following four linearized minimization problems in Table 1.However, it remains unclear how to optimize the second step in Eq. (15).We address this using ADMM.
Table 1.Four TV-regularized minimization problems obtained by applying different TV regularizations to Eq. (15).A-FETV, I-FETV, A-GTV and I-GTV respectively represent anisotropic finite element total variation, isotropic finite element total variation, anisotropic graph total variation and isotropic graph total variation.

Name Formulation
A-FETV

ADMM Implementations
In this section, we introduce ADMM in detail to minimize the A-FETV, I-FETV, A-GTV and I-GTV problems.We begin with the ADMM implementation for A-FETV.Specifically, auxiliary splitting vectors ν x and ν y are introduced to represent D x (∆µ) and D y (∆µ) respectively.Therefore the A-FETV problem is transformed into the following unconstrained optimization problem: where superscript n denotes the nth ADMM iteration and b x and b y are iterative parameters.In order to find the minimizer of Eq. ( 16), an alternating optimization method is used where Eq. ( 16) is split into several subproblems with respect to ∆µ, ν x , ν y , b x and b y , each of which can be solved separately.
First the iterative minimization approach requires us to solve the subproblem with respect to µ which has the optimality condition As the inversion matrix of Eq. ( 18) has size N × N, in order to achieve high efficiency, we use a gradient descent method to optimize the functional iteratively, in which the step size controls how far the iterate moves along the gradient direction during the current iteration.Instead of setting the step size manually, we use a backtracking line search to enforce convergence [31].
The next subproblem with respect to ν x and ν y is given as It should be noticed that, in A-FETV, there is no coupling between ν x and ν y .We can explicitly compute the optimal value of ν x and ν y using the generalized shrinkage operators with the convention that 0/0 = 0.The last one is to update the iterative parameters b x and b y , as In I-FETV, using the same alternating optimization method, the original minimization problem can be transformed as where and M is the number of finite elements.The first subproblem (L 2 component) with respect to ∆µ is the same as A-FETV.It should be noted that the ν x and ν y variables cannot be decoupled as they were in A-FETV.In order to solve the subproblem with respect to ν x and ν y , we can explicitly solve the minimization problem for ν n x , ν n y , using a generalized shrinkage formula with the convention that 0/0 = 0 and We then propose ADMM-based algorithm to address the minimizations of A-GTV and I-GTV.For A-GTV, we first introduce an auxiliary splitting vector variable ν, an iterative parameter b, and a positive penalty parameter θ.The sizes of ν and b are both of N × N where N represents the number of vertices.The A-GTV problem can be reformulated as the following unconstrained optimization problem Since Eq. ( 25) is a multivariate minimization problem, we first solve the subproblem with respect to ∆µ which gives the the optimality condition With the definition of the nonlocal divergence operator (Eq.( 8)) and the nonlocal Laplace operator (Eq.( 9)), the point-wise equation system (Eq.( 27)) can be equivalently converted to the following matrix-based equation system L above is the graph Laplacian in matrix form, whose entries are In Eq. ( 28), the vector . Eq. ( 28) is a system of linear equations, the solution ∆µ n can be acquired iteratively using the same method in A-FETV.Then we minimize the following subproblem with respect to ν which has an analytical solution, calculated from the generalized shrinkage formula with the convention that 0/0 = 0. Lastly, we update the iterative parameter b with We can similarly apply ADMM to the minimization of I-GTV, which can be transformed into the following unconstrained problem with the auxiliary splitting vector variable ν, an augmented Lagrangian multiplier b, and a positive penalty parameter θ.
The L 2 subproblem with respect to ∆µ is the same as the one in A-GTV and can be computed with Eq. ( 28).We then fix ∆µ to minimize the second subproblem with respect to ν: which can be solved with the following soft thresholding equation Algorithm 2: ADMM-based algorithm for I-GTV and A-GTV.
Therefore the whole procedure for minimizing the TV-regularized inverse problem (Eq.( 15)) is given in Algorithm 3, in which outer − loop represents the number of iterations required for the DOT reconstruction and is set to 40 for all experiments in this paper.Algorithm 3: Algorithm for minimizing the TV-associated inverse problem.

Experiments
In this section, we describe extensive experiments to qualitatively and quantitatively evaluate the performance of finite element and graph representations on the two variants of TV regularization in DOT.We use the finite element representation for the forward modelling in all the experiments and use both the finite element and graph representations to discretize the TV regularization term during the solution of the inverse problem.We first define four evaluation metrics to quantify the quality of the reconstructed images.Then we describe simulated numerical experiments on 2D circle and 3D head samples, and real experiments performed on phantom samples.Fig. 2 shows the unstructured grids of the three computational domains.Red dots represent the vertices in the computational domain.In 2D, using the finite element representation, the computational domain is discretized with a finite number of triangles (Fig. 2 (d)) while in 3D, tetrahedra are taken as the basic element (Fig. 2 (e)).However the graph representation is the same in both 2D and 3D because the graph method requires only vertices and edges of the mesh.For simulated experiments in which measurement noise was added, ten repeats were performed.In all experiments, the forward model was implemented using the NIRFAST package [45] in Matlab R2017a (Mathworks, Natick, USA).The simulated experiments conducted are all based on single wavelength continuous-wave (CW) measurements where the optical property to be recovered is the tissue absorption coefficient µ a at that wavelength.

Quantitative Evaluation Metrics
The quantitative evaluation is performed using four evaluation metrics: the localization error, average contrast, peak signal-to-noise ratio (PSNR) and relative recovered volume.If the reconstructed image is identical to the ground truth image, the localization error is 0, average contrast and relative recovered volume are both equal to 1. PSNR is higher if the reconstructed image is closer to the ground truth image.
Localization error is defined as the Euclidian distance between the central nodes X s of the simulated activation region and X r of the recovered activation region.The recovered activation regions are selected by thresholding the recovered changes based on 60% of the maximum recovered changes.Localization error = X s − X r 2 . ( The second evaluation metric is the average contrast which is based on the mean value of the region of interest: where µ i denotes the recovered optical property on the finite element node i. N r is the number of nodes in the recovered activation region.µ is the ground truth values of the optical property in the recovered activation region.PSNR is the third evaluation metric, which aims to evaluate the difference between the ground truth image and the recovered image.Larger PSNR values means less difference between these two types of images.PSNR is defined as follows Here, MAX µ is the maximum pixel value of µ and MSE is the mean squared error between the recovered and ground truth images with MSE = N i=1 µ i − μi 2 /N.Finally, we measure the relative recovered volume which is given as where V r and V s denote the volume of the recovered activation region and simulated activation region, respectively.

Experiments on anisotropic TV regularization
Anisotropic TV regularization is easy to implement because the partial derivatives along different directions can be decoupled as explained in Section 3.2.It is based on the assumption that the shape of the region of interest is aligned with the coordinate axes.Its minimization favors horizontal and vertical structures, because oblique structures cause the TV regularization to increase [46].In DOT, this assumption does not necessarily hold as the region of interest is normally random and structures are not normally aligned with the coordinate system.Therefore anisotropic TV regularization seems to be a poor choice for discrete TV in DOT, as it yields 'blocky' artefacts.However no research has been carried out about the relationship between the 'blocky' artefacts and the representation employed to discretize over the unstructured computational domain.In this section we investigate the anisotropic TV regularization in DOT reconstruction and compare their FE-and graph-based implementations.The effect of the representation method adopted on anisotropic TV regularization will be evaluated.
A two dimensional (2D) circular geometry is simulated with one anomaly centered at (-10mm,10mm).The 2D model has a radius of 43mm while the radius of the anomaly is 10mm.Sixteen source-detector fibres are placed equidistant around the external boundary for CW boundary data acquisition.When one fibre as a source is turned on, the rest are used as detectors, leading to 240 total boundary data points per wavelength.All sources were positioned one scattering distance within the outer boundary because the source is assumed to be spherically isotropic.In order to evaluate the effect of mesh resolution on the representation method, two reconstruction meshes are created with different spatial resolutions.The coarser mesh has 1785 nodes and 3418 linear triangle elements with the average element size 1.6977mm 2 (Fig. 3 (a)) while the finer one has 5133 nodes and 10013 elements with the average element size 0.5801mm 2 (Fig. 3 (d)).The background absorption coefficient µ a is set as 0.01mm −1 and µ a for the anomaly is set as 0.03mm −1 (Fig. 3 (b) and (e)).µ s remains constant as 1mm −1 .To represent various realistic cases, normally distributed randomly generated Gaussian noise ranging from 0% to 3% at 1% intervals was added to the boundary measurements.Reconstructed images of the absorption coefficient are shown in Figs. 3 (c) and (f).

Two dimensional circular experiments
Using the same reconstruction meshes described in section 4.2, we compare I-FETV, I-GTV against a baseline Tikhonov model.To represent various realistic cases, normally distributed randomly generated noise ranging from 0% to 3% at 1% intervals was added to the amplitude of the boundary data.Reconstructed images of absorption coefficient are shown in Figs. 4 (c) and (f).The 1D cross sections and evaluation metrics comparisons are displayed in Fig. 5 and Fig. 6.

Three dimensional head numerical experiments
We now evaluate the isotropic TV model with two discrete differential operator definitions on a physically realistic three dimensional head model.This was created from T1-weighted MPRAGE scans acquired by Eggebrecht et al [7] using the process described by Wu et al [47] in which Statistical Parametric Mapping (SPM) software [48] was used to perform parametric segmentation of five tissue types (scalp, skull, cerebrospinal fluid (CSF), gray matter, white matter) based on the pixel intensity probability function distribution.The five layers were processed in NIRFAST to create masks and layered volumetric FEM meshes.The reconstruction mesh consists of 50721 nodes associated with 287547 tetrahedral elements, with the average element size 9.2676mm 3 .Each node is labeled by one of the five segmented head tissue types.Absorption coefficients assigned to each layer are from an in vivo study [49] at 750nm (Table 2).
A high-density (HD) imaging array with 158 sources and 166 detectors (Fig. 7 first column) [7] was placed over the whole head, with source-detector (SD) separation distances ranging from 1.3 to 4.8cm.In this study, 3478 differential measurements per wavelength were used to image hemodynamic changes in the brain.Two distinct anomalies were simulated simultaneously in the brain, with each 15mm radius.In order to simulate the traumatic brain injury (TBI) cases where tissue oxygen saturation (StO 2 ) is normally between 50% and 75% [50,51], the absorption coefficient in the two anomalies are calculated using Beer's law [45] with 55% StO 2 (Fig. 7 second column).In line with the expected in vivo performance of the imaging system, 0.12%, 0.15%, 0.41% and 1.42% Gaussian random noise was added to first (13mm), second (30mm), third (40mm) and fourth (48mm) nearest neighbor measurements to provide realistic data [52].Reconstructed absorption coefficients using different model are displayed in the third to fifth column of Fig. 7. Corresponding 2D slices are displayed in Fig. 8 and the evaluation metrics are presented in Fig. 9.

Experiments with phantom data
In the final experiment we evaluate different methods on real experimental data which is collected from a solid plastic cylindrical phantom using one non-contact CW-DOT system designed for hand imaging [53].The phantom has size of radius 12.3mm and length 50mm.35 sources and 99 detectors are positioned on the underside and top of the phantom respectively (Fig. 11(a)).The absorbing dye within the phantom was treated as a chromophore that has unit concentration in the bulk of the phantom.Its extinction coefficient was modelled by the measured absorption coefficient.A cylindrical rod was placed at the depth of 5mm to simulate the heterogeneous version of the phantom (Fig. 11(a)).The rod has radius 3mm and length 50mm and provides a 2:1 contrast in dye concentration compared to background (Fig. 11 (c) top row).Five wavelengths (650nm, 710nm, 730nm 830nm and 930nm) in a transmission setup are used to collect the boundary data.The reconstruction mesh consists of 9082 nodes and 48099 linear tetrahedral elements with the average tetrahedral elements size 0.4218mm 3 .Ground truth data and images reconstructed with Tikhonov, I-FETV and I-GTV are shown in Fig. 11(c).The four evaluation metrics in the volume of illumination are given in Table 3.For all the experiments above, the regularization parameter λ is determined using an L-curve method [13].The 2D images reconstructed using A-FETV and A-GTV are shown in Fig. 3, together with the original target distributions.The results show that A-FETV keeps reconstructing the target with boundaries that align with the coordinate axes which is same to the assumption of the anisotropic TV regularization.In addition, some artefacts are observed inside the recovered region of interest when the mesh resolution is low.The reconstructions using A-GTV do not feature these artefacts, and the recovered shape is more accurate, with no bias towards the coordinate axes.The experiments reveal that the blocky artefacts of anisotropic TV regularization are associated with the discretization method used.The blocky artefacts are clearly visible in reconstructions based on the finite element representation, but not in the ones based on the graph representation.This is because in the graph representation, the region is discretized along all edge-based directions, leading to nearly isotropic solutions.Therefore in DOT, A-GTV can adapt to the ground truth solution.However, it is not a good method to preserve anisotropy if anisotropy is a desired property of the solution.

Discussion
For the 2D case which uses isotropic TV regularization, Fig. 4, Tikhonov reconstruction over-smooths the results and smears the edges.The results become smoother with increases in measurement noise.Little difference can be visually observed between the reconstruction by I-FETV and I-GTV when the reconstruction mesh resolution is low.However when the reconstruction mesh has higher resolution, Fig. 4 (f), the results by I-FETV is visually closer to the ground truth than the ones by I-GTV.Similar findings are observed from the corresponding 1D cross sections (Fig. 5).Tikhonov reconstruction produces a single peaked distribution in the piecewise constant target area, and edges of the objects are over-smoothed.Both TV methods are able to reconstruct a piecewise constant distribution.However when the mesh resolution is lower (first column in Fig. 5), fluctuations in the target regions are observed in the results by I-FETV.When the mesh resolution is higher (second column in Fig. 5), the cross-section from I-FETV reconstruction is almost identical to the ground truth.In Fig. 6, red and blue areas represent 25% to 75% value among the ten repeats' experiments.We see that the performance of I-FETV improves with an increase of mesh resolution: by 25% in localization error, 26% in average contrast and 11% in PSNR, while the performance of I-GTV is relatively unaffected by the mesh resolution.These 2D experiments confirm that the discrete differential operators based on graph representation are not affected by the mesh resolution while the ones based on a finite element representation become more accurate when the reconstruction mesh is finer.The 3D images reconstructed from the head geometry represent a physically realistic case, in which two anomalies are simulated simultaneously in the brain.From Fig. 7, Tikhonov reconstruction lead to many visible artefacts near the source and detector area.Due to smoothing induced by Tikhonov regularization, sharp features are not present in the image recovered.I-FETV and I-GTV both can eliminate the surface artefacts resulting from Tikhonov regularization and reconstruct tightly localized results.These findings can be clearly observed in the 2D slice images shown in Fig. 8 and Fig. 9.It should be noticed that, in Fig. 8, the colorbar values corresponding to the green and red parts remain 0.001.It is because only three digits are selected after the radix point and in this study we use rounding off to constrain the three digits.From the visualization of the results, there is no obvious difference between the reconstruction performance of I-FETV and I-GTV because both are based on TV regularization.However it can be observed from the evaluation metrics comparison in Fig. 10 that I-GTV achieves the lowest localization error, where the spatial resolution of the reconstruction mesh is lower.Second column corresponds to Fig. 4 (f) where the spatial resolution of the reconstruction mesh is higher.Top to bottom row: 0%, 1%, 2% and 3% added Gaussian noise.
highest peak signal-to-noise ratio and average contrast much closer to 1.The average relative recovered volume achieved by I-GTV is 77%, compared with I-FETV (66%) and Tikhonov (64%).This experiment confirms the lower performance of I-FETV on reconstruction meshes with low spatial resolution.
In the experiments with phantom data, only the central region is reconstructed in all the cases because the positions of sources and detectors lead to very low sensitivity away from the centre.It can be seen from the second row of Fig. 11(c), that Tikhonov regularization over-smooths the reconstructed images which have much lower image contrast than the ground truth, especially in the first slice image.Artefacts are clearly observed near the source and detector areas.Even though total variation regularization can alleviate the over-smoothing effect caused by Tikhonov regularization, discretization methods still play an important role in the reconstruction performance.It should be noticed that I-FETV can alleviate the artefacts near to the sources and detectors but introduce some artefacts (staircase effect) in the non-anomaly area and does not preserve edges.I-GTV is seen to recover the anomaly with clear edges and high Tikhonov more fluctuatio n along with the increase of noise  image contrast.It is interesting to compare these results to those of our previous work [13] where L 1 regularization was applied to the phantom data.Reconstructions by L 1 regularization were found to be over-sparsified and over-compact.In this work, TV regularization, which induces sparsity to the gradient of the solution, is seen to effectively alleviate the over-sparsifying effect of L 1 regularization and is therefore suitable for non-sparse coefficient distributions.We calculate the four evaluation metrics in the volume of illumination (Table 3) and these support the same conclusions.Similar localization errors are obtained by the different methods with only 1mm difference.Comparing to I-FETV, I-GTV can obtain the highest average contrast and PSNR values with similar relative recovered volume.

I-FETV
I-GTV Tikhonov Ground truth Fig. 8. 2D slices of the reconstructions of the absorption coefficient changes on the forehead anomaly (first row in Fig. 7).The ground truth areas are highlighted in white ellipses.

Conclusion
In this paper, we introduce finite element and graph representations to discretize the TV regularization term in DOT reconstruction.Isotropic and anisotropic variants of the TV regularization are also investigated and compared between their FE-and graph-based implementations.The ADMM-based algorithms are proposed for each TV-regularized inverse problem.Experiments on the anisotropic TV regularization reveal that finite element representation yields the 'blocky' artefacts which is the designed in feature in the anisotropic TV regularization.However the graph representation favors the underlying shape of the region of interest so that the 'blocky' artefacts are not realized.Graph discretization on anisotropic TV regularization can adapt to the ground truth solution, but is not a good way to preserve anisotropy.
Numerical experiments on isotropic TV regularization illustrate that, comparing to Tikhonov regularization, TV regularization can alleviate the over-smoothing effect of Tikhonov regulariza-I-FETV I-GTV Tikhonov Ground truth Fig. 9. 2D slices of the reconstructions of the absorption coefficient changes on the back-head anomaly (second row in Fig. 7).The ground truth areas are highlighted in white ellipses.tion and localize the anomaly by inducing sparsity of the gradient of the solution.These findings were tested on real experimental data using a tissue-simulating phantom.I-FETV does not perform well on low resolution reconstruction meshes because of the discrete nature of the finite element representation.Because the finite element representation operates on each element, the discretization becomes more accurate when the mesh resolution increases.I-GTV is shown to be more stable and robust to changes in mesh resolution because I-GTV is discretized on the graph directly, having no information of elements.Hence I-GTV can give more accurate discretization when the reconstruction mesh is a coarse mesh which is the usual case in DOT.However, I-FETV will outperform I-GTV when an reconstruction mesh with high resolution is available.

Fig. 2 .
Fig. 2. (a)-(c): Discretized computational domain of the three experimental samples; (d): Detailed mesh composition of 2D geometry in finite element and graph representation respectively; (e): Detailed mesh composition of 3D geometry in finite element and graph representation respectively.

Fig. 3 .
Fig. 3. (a)-(c): Reconstruction on the 2D mesh with low spatial resolution.(d)-(f): Reconstruction on the 2D mesh with high spatial resolution.(a) and (d): 2D reconstruction mesh with sixteen co-located sources and detectors.(b) and (e) give the original target distributions.First row in (c) and (f) represents the results using A-FETV on 0% , 1% , 2% and 3% noisy data while the second row shows the results using A-GTV.

Fig. 4 .
Fig. 4. (a)-(c): Reconstruction on the 2D mesh with low spatial resolution; (d)-(f): Reconstruction on the 2D mesh with high spatial resolution.(a) and (d): 2D reconstruction mesh with sixteen co-located sources and detectors.(b) and (e) give the original target distributions.First row in (c)and (f) represents the results using I-FETV on 0% , 1% , 2% and 3% noisy data while the second row shows the results by I-GTV.

1 )Fig. 5 .
Fig. 5. 1D cross section of images recovered in Fig. 4. First column corresponds to Fig. 4 (c)where the spatial resolution of the reconstruction mesh is lower.Second column corresponds to Fig.4(f) where the spatial resolution of the reconstruction mesh is higher.Top to bottom row: 0%, 1%, 2% and 3% added Gaussian noise.

Fig. 6 .
Fig. 6.Evaluation metrics comparing the performance of different methods at four different noise levels.Top to bottom row: localization error index; average contrast index; PSNR index and relative recovered volume.Left column corresponds to the reconstructions in Fig. 4 (c) where the reconsturction mesh resolution is low.Right column corresponds to Fig. 4 (f) where the reconsturction mesh resolution is relatively high.

Fig. 7 .
Fig. 7. First column: distribution of the imaging array with 158 sources (red dots) and 166 detectors (white dots) and the positions of the two simultaneous simulated anomalies.Second to final column: Ground truth and reconstructions by Tikhonov, I-FETV and I-GTV.

Fig. 10 .
Fig.10.Evaluation metrics comparing the performance of different methods on a 3D head model.The left column represents the reconstruction of the forehead anomaly (first row in Fig.7), while the right column gives the reconstruction of the back-head anomaly (second row in Fig.7).

Fig. 11 .
Fig. 11.(a): Distribution of sources and detectors.(b): Illustration of the overall distribution of three slices.(c): Ground truth and reconstruction results with different methods.From top to bottom: ground truth; results with Tikhonov regularization; results with I-FETV regularization and results with I-GTV regularization.

Table 2 .
Head tissue optical property for each of five layers.

Table 3 .
Evaluation of different methods for reconstruction on a tissue-simulating phantom.Localization error / mm Average contrast / -PSNR / -Relative recovered volume / %