References

Optical tomography schemes using non-linear optimisation are usually based on a Newton-like method involving the construction and inversion of a large Jacobian matrix. Although such matrices canbe efficiently constructed using a reciprocity principle, their inversion is still computationally diffcult. In this paper we demonstrate a simple means to obtain the gradient of the objective function directly, leading to straightforward application of gradient-based optimisation methods.


Introduction
Optical tomography is potentially a powerful tool to noninvasively obtain spatially resolved data of the optical parameters of tissue, from which physiologically relevant information such as local oxygenation can be calculated.Primary applications of this new imaging modality are the monitoring of cerebral blood and tissue oxygenation of newborn and preterm infants [1,2] to prevent death or permanent brain damage caused by asphyxiation during birth, functional mapping of brain activation during physical or mental exercise [3], and imaging of the breast to detect tumours [4].Recent developments can be found in several review articles in the recent Royal Society meeting [5], and other journals [6,7].
The principal problem that arises in optical tomography is the dominance of scattering, which causes light to propagate diffusely in tissue and thus prohibits the application of direct reconstruction methods using the Radon transform.Various reconstruction methods, including ad hoc backprojection [8,9] and semi-analytic [10] schemes have been proposed, but increasingly attention is turning to iterative, optimisation based reconstruction methods [11][12][13].
In this approach, the problem is regarded as the optimisation of an objective function representing the sum-squared difference of the data to the model, plus additional regularisation terms representing prior knowledge.The most prevalent approach to date has been a Newton-like method such as Levenberg-Marquardt which involves repeated construction and inversion of the problem Jacobian.This method rapidly becomes intractable as the size of the problem domain increases, for example when a fully 3D problem is considered.An alternative is to consider gradient-based algorithms, such as conjugate gradients (CG).In this paper we present a method to derive the gradient efficiently using an adjoint source scheme.

Problem definition
Let Ω be the domain under consideration, with surface ∂Ω.We consider S source positions ζ j ∈ ∂Ω (j = 1...S) and M j measurement positions ξ j,i ∈ ∂Ω for each source j (i = 1...M j ), resulting in a total number of measurements M TOT = S j=1 M j , with M UNIQ ≤ M TOT distinct measurement positions.The forward problem is non-linear and is represented by: Under the usual assumptions of a system corrupted by multivariate Gaussian noise, and a maximum-likelihood approach to the solution, we may define the inverse problem as the optimisation of an objective function which we write in vector form as where y j,i is the data for the i th measurement from source j with standard deviation σ j,i and b j,i = σ −1 j,i (y j,i − F j,i ) is the residual data for this measurement.We use subscripted vectors y j , F j , σ j to represent all the relevant quantities for a single source j.
Thus P j [µ, κ] is the projection operator for source j, F j is the projection data obtained by sampling P j at the discrete measurement positions ξ j,1 , ξ j,2 . . .ξ j,Mj and b j = R −1 j y j − F j .Bold vectors y, P, F, b represent the combined quantities from all sources.R is the data-space correlation matrix, which we here take to be

Transport model
Data acquisition for optical tomography is not limited to steady-state attenuation measurements.Time-of-flight systems using an ultra-short pulsed laser as a light source and time-resolved detectors allow the boundary measurement of the temporal intensity response function with a resolution of a few picoseconds [14].Frequency domain systems use a radio frequency modulated light source and measure the phase shift and modulation amplitude of the transmitted light [15].While the data acquisition equipment for the steady-state, temporal and frequency domain are quite different, the methods are related from the theoretical viewpoint: the time-of-flight method includes the steadystate case via the integral over time of the detected light, and is related to the frequency domain case via its Fourier transform.
We assume here that the propagation model is the diffusion approximation to the radiative transfer equation which in the frequency domain is: and in the time domain : where Φ is the isotropic photon density, q 0 is an isotropic source distribution and c is the speed of light in the medium.For a detailed discussion of the physical and mathematical models employed in this field, refer to the recent review articles by Hebden and Arridge [6,7].The model is characterised by the two spatially varying functions µ( r) (absorption) 1 and κ( r) (diffusion), which gives rise to the dual search-space nature of the optimisation problem defined in Eq. 2.
For both the frequency-domain and time-domain cases, the boundary measurement Γ(ξ) at ξ ∈ ∂Ω is related to Φ( r) by where n is the outer normal of ∂Ω at ξ.We use the Robin-type boundary condition where α is a term to incorporate boundary reflections as a result of a refractive index mismatch at ∂Ω [16,18].A collimated source incident at ζ ∈ ∂Ω is commonly represented by a diffuse point source q 0 ( r) = δ( r − r s ) where r s is located at a depth of one scattering length below the surface.

Finite element approach
Although the above approach is general, and can be applied to analytical forward models, the most successful approaches use a numerical approach to solve the Forward problem.Here we assume it is an FEM approach.The domain Ω is divided into P elements, joined at D vertex nodes.The solution Φ is approximated by the piecewise linear function Φ h ( r) where h is a finite dimensional subspace spanned by basis functions u i ( r), i = 1 . . .D chosen to have limited support.The problem of solving for Φ h becomes one of sparse matrix inversion for which standard methods such as Cholesky decomposition or conjugate gradient solvers are readily available.The advantage of the FEM approach is its versatility which makes it applicable to complex geometries and highly inhomogeneous parameter distributions.
As developed in [17,18], the diffusion equation in the FEM framework is expressed, in the frequency domain as : and in the time-domain as : where the system matrices K,C, A and B have entries given by : To relate the FEM approach to the the forward model, we define : where Φ j is the solution to Eq. 9 or 10 for the j th source, and M : L 2 (Ω) → L 2 (∂Ω) is a measurement operator.

Measurement types
The simplest measurement operator to consider is the boundary operator B which implements Eq. 7. In the discrete case this is a M j × D matrix.Since B is linear it can be applied to either temporal data Φ(t) or frequency domain data Φ(ω).We define the integrated intensity measurement In our approach we have advocated the use of measurement operators that are self-normalised where T is the integral of Γ(t) weighted by a temporal filter.Examples that have been suggested are : time-gated intensity: n-th temporal moment: n-th central moment: Additional features could be chosen, such as the logarithmic slope of the temporal decay of Γ, the peak intensity, etc.However, we aim to chose features that satisfy two conditions: robustness of the experimental measurements with respect to systematic errors such as fluctuations of the source power, detector sensitivity, or fibre coupling losses, and secondly the ability of the forward model to generate the corresponding data efficiently.Both conditions are satisfied for measurement types ( 18), ( 19) and (20), since these are normalised and therefore not dependent on absolute intensity measurements, and they can be calculated directly by our forward model without the need for explicitly generating the temporal profile of Γ(t) [22,23].

Reconstruction methods
To solve the optimisation problem, we consider the gradient of the objective function.Letting x represent either µ or κ the k th component of the gradient is given by : leading to the vector form The Jacobian J of the forward problem is a M TOT × N TOT matrix where M TOT = M j and N TOT is the number of coefficients of µ and κ expressed in some basis.In this paper we take the basis to be the element values, although other choices are possible.J has the structure #4014 -$15.00US Received November 24,1997 (C) 1998 OSA where we define J j as the M j × N TOT matrix that is the Jacobian for the objective function corresponding to source j and J j,(µ) , J j,(κ) as the sub-matrices corresponding to the two variables µ, κ.The Jacobian may be found in a row-by-row fashion if we establish the notation that the vector corresponding to the i th row of J j is the Photon Measurement Density Function PMDF T (j,i) , as defined in [19,20] which we represent : Previously reported reconstruction algorithms are of Newton-type, a prototypical example being the Levenberg-Marquardt algorithm [24,25].The feature of such methods is that J needs to be explicitly created, and repeatedly inverted.Whenever the number of parameters of the optimisation problem is large, explicit calculation and inversion of the linearised step is regarded as intractable.To overcome this difficulty we have previously reported the use of ART (Algebraic Reconstruction Technique) methods that operate in a row-by-row fashion using only the implicit representation of J given by Eq. 25.However these methods still have a storage overhead that is quite costly [29].Instead, gradient methods require only the computation of the gradient z defined in Eq. 22, the proto-typical example being conjugate gradients.In this method a set of conjugate search directions is generated to find the minimum of the objective function.At each iteration step a one-dimensional line minimisation along the current search direction is performed.Conjugate gradient methods are well-established in nonlinear optimisation.See for example [21].A brief outline is given in Algorithm 1.

Algorithm 1 Nonlinear conjugate-gradient method
, 0 Different choices for β exist.Above we used the Polak-Ribière method.Note that this requires restarting of the conjugate gradient method whenever β < 0 to guarantee convergence.A variety of line search methods can be used, either utilising gradient information, such as the secant method, or using only function evaluations such as the quadratic fit method.Often an exact line search is too computationally expensive due to the large number of function or derivative computations.Inexact line search methods such as Armijo's Rule define the bounds for acceptable step lengths which guarantee convergence.There the line search is terminated when the step length is within the valid range.

Gradient calculation
If the gradient is found by the explicit form of Eq. 23 then no advantage is gained.The use of an adjoint scheme to calculate z has been discussed previously for the timedomain diffusion model using a Finite Difference Scheme [26], the time-domain transport model using a Finite Difference Scheme [27], and the steady-state DC case using a Finite Element scheme [28].A summary of these approaches and their relation to other reconstruction methods is given in [29].Here we give a derivation for the normalised measurement operators that we use in our application.
Consider first the frequency domain problem.We have that If we define the Adjoint problem where is the adjoint source [20], then we have that PMDF T (j,i) is a vector in two parts, whose k th components are given by σ −1 j,i Φ + i (ω)× Φ j (ω) and σ −1 j,i ∇ Φ + i (ω)×∇ Φ j (ω) respectively.Here the notation a × b as defined in [20] is taken to mean the vector of samples of the product of the fields a, b and ∇ a × ∇ b is the vector of samples of the scalar product of the fields ∇a • ∇b.We may now write But Φ + m (ω) is the solution to Eq. 29, so we form a vector and solve leading to:

Results
We present simultaneous reconstructions of µ a and µ s using the nonlinear conjugate gradient algorithm in conjunction with the adjoint source scheme for two different test cases: a circular object with three absorbing and/or scattering objects embedded in a homogeneous background, and the model of a slice of the head of a neonate, with a complex distribution of different tissue types and embedded lesions.
The forward data used in the reconstruction of these test cases were simulated by the FEM light transport model, using highly resolved meshes to ensure good quality of the data.Coarser meshes were used for the inverse solution to speed up computation times.For all following reconstructions we used a combination of skew and Laplace transform data.We have argued previously that this yields the best results among the data types listed above [31].

Circular test object
The test object was a circle of radius 25 mm and three embedded objects, each with a radius of 1.5 mm.The optical parameters of the background medium and the objects are listed in Table 1.The two images in the left column of Fig. 1 show the locations of the perturbations.
Forward data were calculated for 32 source and 32 detector positions, equally spaced along the circumference.Noise was simulated with a noise model presented previously [30] [31], assuming that 10 4 photons were collected for each measurement.The meshes for the forward and inverse solvers contained 7200 and 2808 triangular elements, respectively.Figure 1 shows the target images, the reconstruction results obtained with the CG solver, and, for comparison, reconstructions obtained with a steepest descent and an ART (algebraic reconstruction technique) solver.All reconstructions started from the homogeneous background, and were terminated after 100 iterations.The ART solver used was Block-ART with random source ordering as described in [29].
All three algorithms manage to distinguish between absorption and scatter features with relatively little cross-talk, with the exception of the ART algorithm which shows a strong artefact in the absorption image in the position of the scattering feature.None of the methods recovers the absolute values of the perturbations very well, but the CG and ART methods obtain higher peak values than the steepest descent method after a given number of iterations.Target image (col.1), reconstruction after 100 iterations with conjugate gradient method (col.2), with steepest descent method (col.3) and with ART method (col.4).In all cases, the top image is µa, and the bottom image is µs.The animations linked to the figure show the first 50 reconstruction iterations, and the final 50 in steps of 10.
. The L 2 norms of the data errors, b 2 , for the three algorithms are shown in Fig. 2, as a function of the iteration number (left graph) and as a function of the reconstruction runtime (right graph).We find that ART performs better than the CG method at the beginning of the reconstruction, but levels out very quickly.After 100 iterations both reconstructions arrive approximately at the same error norm.With respect to runtime, the CG solver performs better than ART, because a single ART iteration is computationally more expensive than a CG iteration.Figure 3 shows the L 2 norms of the reconstructed images, ( x− x target ) 2 , for both absorption and scatter as a function of the iteration number.The results are similar to the data errors.Again we find that ART performs well during the first few iterations but quickly levels out, in particular for µ s .For higher iterations the CG algorithm becomes superior.The steepest descent method performs poorly for µ a and µ s .

Neonatal head model
To investigate a more complex case we use a mesh that simulates the parameter distribution in a cross-section of a neonatal head.The outline and internal boundaries between tissue types were derived from an MRI image.The sagittal diameter is 80 mm.We distinguish four tissue types and assigned literature values for their optical properties (Table 2).In addition we placed absorbing and scattering perturbations in random locations to simulate lesions or similar high-contrast features.
Skew and Laplace transform forward data for 32 sources and 32 detectors, equidistantly spaced at the boundary, were generated with a high-resolution mesh (41500 elements).For the reconstruction a coarser mesh of 10500 elements was used.The reconstruction was started from a homogeneous distribution, corresponding to the skin parameters (µ a = 0.022, µ s = 1).This is likely to be a worst-case scenario, since in practice it may be possible to assume more prior knowledge.
Figure 4 shows the target images and CG reconstructions after 100 iterations.The high contrast objects were generally well localised, in particular for µ s , while on the other hand the central µ a object is very diffuse, and the small µ a object at the lower right was not recovered.There is also some cross-talk from the µ s object at the upper left into the µ a image.The fine detail of the background tissues was not recovered.This example demonstrates the difficulties one might expect in the reconstruction of clinical data obtained from complex tissue structures, and it shows that simplistic models of blobs in a homogeneous background, as presented in the previous example, may not be very relevant in the assessment of reconstruction performance of clinical images.

Conclusions
Often it is thought that iterative optimisation based approaches to image reconstruction poses an insurmountable computational burden, especially when the fully threedimensional and time-dependent problem is addressed.In this paper we have presented a method for deriving the gradient of the objective function in a Finite Element Method based scheme for optical tomography.We presented only the conjugate gradient optimisation method.The use of higher order methods such as Truncated-Newton, is being investigated as well as the choice of appropriate regularisation schemes.
We have presented simultaneous reconstructions of absorption and scatter from simulated data for two cases: a simple homogeneous circle with embedded absorbing and scattering features, and a complex neonatal head model with an inhomogeneous background and several high-contrast lesions.We have presented a comparison between conjugate gradient (CG), steepest descent and ART implementations and shown that after the first iterations, where ART is favourable, CG performs equivalently or even better, if runtime is considered.
We have also demonstrated that the reconstruction of complex problems such as the neonatal head model in the second example, is fundamentally more difficult than the often used blobs in a homogeneous background, and we suggest that such complex cases must be considered in evaluating reconstruction algorithms that are to be employed in e. the solution to the DC steady problem) or E[Φ(t)] = B[ Φ(t)dt] (the sum over time of all arriving photons).

Figure 1 .
Figure 1.Target image (col.1), reconstruction after 100 iterations with conjugate gradient method (col.2), with steepest descent method (col.3) and with ART method (col.4).In all cases, the top image is µa, and the bottom image is µs.The animations linked to the figure show the first 50 reconstruction iterations, and the final 50 in steps of 10.

Figure 2 . L 2
Figure 2. L 2 data norms as a function of iteration number, for different algorithms, as a function of iteration number (left) and of runtime (right).

Figure 4 .
Figure 4.Target image (left col.), reconstruction after 100 iterations with conjugate gradient method (right col.).Top row is µa, bottom row is µs.The animations linked to the figure show the first 50 reconstruction iterations, and the final 50 in steps of 10.

Table 1 .
Optical parameters of circular test object.

Table 2 .
Optical parameters of neonatal head model.