Direct Quantitative Photoacoustic Tomography for realistic acoustic media

Quantitative photo-acoustic tomography (QPAT) seeks to reconstruct a distribution of optical attenuation coefficients inside a sample from a set of time series of pressure data that is measured outside the sample. The associated inverse problems involve two steps, namely acoustic and optical, which can be solved separately or as a direct composite problem. We adopt the latter approach for realistic acoustic media that possess heterogeneous and often not accurately known distributions for sound speed and ambient density, as well as an attenuation following a frequency power law that is evident in tissue media. We use a Diffusion Approximation (DA) model for the optical portion of the problem. We solve the corresponding composite inverse problem using three total variation (TV) regularised optimisation approaches. Accordingly, we develop two Krylov-subspace inexact-Newton algorithms that utilise the Jacobian matrix in a matrix-free manner in order to handle the computational cost. Additionally, we use a gradient-based algorithm that computes a search direction using the L-BFGS method, and applies a TV regularisation based on the Alternating Direction Method of Multipliers (ADMM) as a benchmark, because this method is popular for QPAT and direct QPAT. The results indicate the superiority of the developed inexact Newton algorithms over gradient-based Quasi-Newton approaches for a comparable computational complexity.


Introduction
Quantitative photo-acoustic Tomography (QPAT) is a steadily growing hybrid imaging paradigm that simultaneously takes advantage of the high spatial resolution provided by ultrasound imaging and the rich contrast attributed to optical imaging [48]. Typically, nanosecond-duration pulses of electromagnetic energy, in the visible or near-infrared ranges, are used to irridiate a sample. Depending on the optical properties of the sample, a fraction of the optical energy is absorbed, and converted into heat [50]. The generated heat induces a local increase in pressure via thermoelastic expansion effects [50]. Because of the elasticity of soft tissues, the locally induced pressures propagate outwards as acoustic waves, and carry information about the optical properties of the sample to the surface. These acoustic waves are measured in time by ultrasound sensors located outside the surface of the sample [48]. Given a set of time series of data at the boundary, the objective in QPAT is to calculate a quantitative image of a distribution of optical absorption coefficients of the sample [48].
QPAT involves two distinct inverse problems, namely acoustic and optical. The acoustic inverse problem, often referred to as PAT, seeks to reconstruct a distribution of the spatially varying initial pressure from the boundary data. This is a linear inverse problem, for which a vast number of reconstruction methods are available [34]. For acoustically homogeneous media, there are exact inversion methods, e.g., back-projection [51,52,9,28], or frequency-domain techniques [53,55]. Time reversal (TR) is a less restrictive approach since it can be adapted to heterogeneous acoustic media and arbitrary detection surfaces [17,54,9,23,24]. Model-based iterative inversion approaches, in which the discrepancy between modeled data and the measured data is iteratively reduced, are also widely used in ill-posed cases that arise from data incompleteness, modelling errors, finite sampling or noise [25]. The iterative methods can be categorised into convergent Neumann-series methods [39,33] or variational (optimisation) approaches [49,25,15,7,21].
The optical inverse problem is devoted to further reconstruction of an image of distributions of optical absorption and scattering coefficients from a recovered initial pressure distribution. This is a highly nonlinear and ill-posed inverse problem, and is commonly solved by iterative model-based approaches [41,42,19]. Using model-based approaches for solving these two inverse problems, an accurate, yet efficient, modelling of the associated physical processes is required [43].
A very accurate model for propagation of light is the Radiative Transfer Equation (RTE), which has been used for the optical portion of 2D QPAT, e.g. [41,35,20]. An analysis of the optical inverse problem of QPAT using RTE has been given in [3]. Since RTE is computationally very expensive, the applicability of this method for 3D QPAT is very limited. A Diffusion Approximation (DA) of the RTE is more efficient than RTE [43], and is thus more practical for 3D QPAT. In this study, we use the DA for modelling the optical portion of the inverse problem. The DA model is sufficienly accurate when the medium is highly scattering and the scattering is near-isotropic [43]. The DA has been widely used as a light propagation model for QPAT [41,42,32,19,22].
The DA model is defined as a function of optical absorption and scattering coefficients. It turns out that the recovery of both these coefficients from one optical source and wavelength is nonunique [6]. To have uniqueness for the inversion, three approaches are used, i.e., the assumption of the scattering coefficient as known [12], using more than one optical wavelength [5,11,32], or using more than one optical source [4,19,18]. In this study the latter approach is used.
For modelling the acoustic portion of the problem, the dependence of shape, spectrum and amplitude of propagating acoustic waves on properties of the medium [13], together with the highly nonlinear and ill-conditioned nature of the optical portion of the forward operator, motivates enriching the QPAT problem by simulation of tissue-realistic acoustic properties. To incorporate these effects, we use an acoustic model based on a linear system of three-coupled first-order wave equations which can be adapted to spatially varying sound speed and density [40,13], and include two fractional Laplacian operators in order to account separately for acoustic absorption and dispersion following a frequency power law, which is evident in tissue media [46,45]. For a numerical implementation of this acoustic model, we use a k-space pseudo-spectral method, which is popular for iterative PAT because of the high efficiency arising from a requirement of only two grid points per wave-length for defining a field, and a fast computation of the spatial gradient in the frequency domain [40,13].
For ill-posed problems of QPAT, for example, when data are available merely on a part of the boundary, or when acoustic properties of the medium are not known exactly, the reconstruction of initial pressure distribution using the acoustic portion of the inverse problem is not sufficiently accurate to further serve as data for the optical inverse problem [18]. It turns out that a direct estimation of the optical coefficients from time series of measured data, referred to here as direct QPAT, is more stable than classical variants of QPAT, in which the two inverse problems are solved distinctly. The direct QPAT has thus recently received much attention, where the forward operator is defined using a composite opto-acoustic model [20,18,31]. Using direct QPAT, it will be possible to incorporate prior information about the optical parameters into the acoustic inverse problem [18], and mutually the optical inverse problem can utilise information about noise included in the boundary acoustic data [31]. Additionally, the acoustic portion of the inverse problem can benefit from multi-source (resp. multi-wavelength) configurations, since the optical coefficients (resp. chromophore concentrations), as opposed to the initial pressure distribution, are independent from the changes in optical illumination (resp. wavelength) [16].
In [37,38], a linear Born approximation of the DA model based on a Green's function approach is coupled with an acoustic model that uses a free-space Green's function method in order to directly reconstruct perturbations in the optical absorption and diffusion coefficients from time series of measured acoustic data. A simultaneous reconstruction of perturbations in optical absorption coefficient and sound speed given an optical scattering coefficient was also studied using a Born approximation [16]. To the best of our knowledge, existing studies for the direct variant of QPAT have been so far limited to homogeneous and non-attenuating acoustic media, and the acoustic portion of the forward operator and its adjoint are computed based on exact formulae using Green's function approaches [20,18,31].
The optimisation approaches for QPAT can be categorised into linearised (Jacobian-based) or nonlinear (gradient-based) approaches. A majority of Jacobian-based methods for the optical portion of QPAT can be fit into a class of Gauss-Newton methods [36]. For application of these methods to classical (resp. direct) QPAT, see [41,42,32] (resp. [31]). These studies utilise an explicit form of the Jacobian matrix. For direct QPAT, because of a very large size of time series of measured data, an explicit computation and storage of the Jacobian matrix is very expensive [31].
To avoid this problem, nonlinear gradient-based approaches have received much interest for the direct problem of QPAT [19,18]. A majority of these approaches use a search direction based on Quasi-Newton methods which utilise only gradient information for an approximation of the Hessian matrix, and are thus memory-efficient [19,18]. The computation of the gradient for direct QPAT is based on an opto-acoustic forward operator and an acousto-optic adjoint of the Fréchet derivative of the forward operator [20]. A memory-efficient Quasi-Newton method is Limited-memory BFGS (L-BFGS), for which the inverse of Hessian matrix is approximated using the gradient information stored in a user-adjusted number of iterations. Using a total variation (TV) regularisation approach based on an Alternating Direction Method of Multipliers (ADMM), L-BFGS has been used for computing an associated search direction for the classical QPAT [19] and direct QPAT [18].
Contribution. Here, we develop two preconditioned inexact Newton (Newton-Krylov) algorithms for solving the direct problem of QPAT. In our first approach, the residual function is iteratively linearised, and each linearised subproblem is solved using a subspace Krylov method in a matrix-free manner, for which a total variation matrix is used as a preconditioner for accelerating the convergence of the Krylov method. We use the Preconditioned Conjugate Gradient (PCG) as the Krylov method, for which a TV-based preconditioning is applied using the Lagged Diffusivity (LD) method [47]. Our second approach uses two linearisations, the first of which is enforced to the nonlinear residual function, and the second is applied in order to handle the nonlinearity of a TV functional using a Primal-Dual Interior Point Method (PD-IPM). Using this approach, for each linearisation of the residual function, a sequence of normal equations is derived, and is solved using a subspace Krylov method in a matrix-free manner. We solve the arising normal equations using a PCG method, but we emphasise that an extension to other Krylov methods, e.g. a preconditioned variant of LSQR [2], is straightforward.
We implement our algorithms by assuming tissue realistic but erroneous properties for the acoustic medium with a limited-view setting for boundary measurements. We model the acoustic portion of the forward operator using a linear system of three-coupled first-order wave equations that can be adapted to heterogeneous media and account for acoustic absorption and dispersion following a frequency power law using two fractional Laplacians [45]. This acoustic model is very popular for PAT since it simulates an acoustic attenuation that is evident in many materials of interest including tissues [45]. To the best of our knowledge, this is the first study for direct QPAT that uses an acoustic model that accounts for tissue-realistic acoustic properties of the medium. For the acoustic portion of the problem, we include the action of perfectly matched layers (PMLs) in calculation of the adjoint operator. To the best of our knowledge, these effects have not been accounted for in existing studies (See [25]).
It is worth mentioning that a singular value decomposition (SVD) analysis on the acoustic forward operator we use has shown that as time steps increase some of the singular values of the forward operator become very small [46], and make the acoustic forward operator ill-conditioned. However, the use of an acoustically realistic forward operator may be necessary for direct QPAT, since the optical portion of the forward operator is highly nonlinear and ill-conditioned, and thus errors in acoustic modelling may quickly grow in the inversion process, and dominate signal data.
Our numerical results show that the developed preconditioned Newton-Krylov optimisation algorithms perform much better than nonlinear Quasi-Newton methods that have been used in existing studies for direct QPAT. The algorithm we use as a benchmark utilises a TV regularisation based on an ADMM method, together with a search direction using an L-BFGS method. (See [19,18] for application on classical QPAT and direct QPAT, respectively).

Direct QPAT on a continuous domain
In this section, we define our forward operator as a composite map on a continuous domain.

2.1.
Modelling the optical portion of the problem. The time scale for propagation of acoustic waves is on the order of a micro-second, which is three orders of magnitude larger than the timescale for illumination, propagation and absorption of light. Therefore, the generated pressure distribution is regarded as instantaneous for the acoustic problem, and is referred to as initial pressure distribution p 0 . One way to define a forward operator for our QPAT problem is to combine the physics of the optical and acoustic portions of the problem using a simple composition of two maps, one modelling propagation and absorption of optical photons and the other modelling propagation of acoustic waves [20].
Accordingly, let Ω ⊂ R d be a convex bounded domain with Lipschitz boundary ∂Ω and d ∈ {2, 3} the spatial dimension. Additionally, let φ ∈ H 1 (Ω) denote the photon density. For modelling the propagation of light, we use a time-independent variant of DA equations with the well-known Robin boundary condition [41]. This is written as where µ(r), κ(r) ∈ L ∞ + (Ω) denote the positive-valued optical absorption and diffusion coefficients, respectively. Here, r denotes the spatial position. Additionally, γ d is a dimension-dependent factor (γ 2 = 1/π and γ 3 = 1/4),n is an outward unit normal, and I is an inward directed diffuse boundary current [29,43]. Following the absorption of photons, a spatially varying heating field h(r) is generated in the form h[κ, µ, I](r) = µ(r)φ[κ, µ, I](r). (2) Because of thermo-elastic expansion effects, the induced spatially varying heating field causes an instant local increase in pressure that follows where Γ is the Gruneisen parameter, and describes the efficiency of conversion of heat into pressure [20]. Here, we assume Γ(r) constant and rescaled to 1, and thus p 0 (r) = h(r) [20].

2.2.
Modelling the acoustic portion of the problem. We use a linear system of three-coupled first-order equations for describing the propagation of acoustic wavefields in an acoustically heterogeneous and lossy medium [46,45]. To explain this, we define our fields, which are the acoustic pressure field p(r, t), particle velocity vector v(r, t) and acoustic density ρ(r, t), where r ∈ R d and t ∈ [0, ∞) denote the spatial position and time. Additionally, we define the medium's acoustic parameters as sound speed c 0 (r), ambient density ρ 0 (r), attenuation coefficient α 0 (r), and frequency power law exponent y. The acoustic wavefield propagation is now modeled by three equations, i.e., linearised equation of motion (conservation of momentum) ∂v ∂t linearised equation of continuity (conservation of mass) ∂ρ ∂t (r, t) = −ρ 0 (r)∇ · v(r, t), and equation of state with initial conditions p(r, 0) = p 0 (r), v(r, 0) = 0. (7) Here, τ (r) and η(r) are, respectively the absorption and dispersion proportionality coefficients, and are given by 2.3. Opto-acoustic forward operator. Having given the models for describing the optical and acoustic portions of the forward operator, we now define our opto-acoustic forward operator. To do this, we also require the measurement operator M, which at each time step maps the pressure p(r, t) to the measured data at sensors. The time series of acoustic data are denoted byP ∈ R NsNt with N s , N t ∈ N the number of detectors and the number of measurement time instants, respectively.
Definition 1. For a fixed illumination I, the optical portion of the forward operator is a nonlinear map in the form where h(r) is given by (2). Also, the acoustic portion of the forward operator is a linear map in the form [1] Λ a : Using (9) and (10), the coupled opto-acoustic forward operator is defined by the composite map Λ, i.e., 2.4. Model-based approach for inverse problem. The inverse problem of QPAT is a simultaneous reconstruction of µ, κ fromP = Λ[κ, µ]. Applying a model-based approach for solving this problem, the objective is to minimise an error functional, the sum of squared differences between modeled data and the measured data via an iterative adjustment of the unknown optical coefficients µ, κ. Using N q optical illuminations, the error functional is defined by [18] (κ, µ) = 1 2 where, q indexes a set of N q illuminations I q with corresponding acoustic dataP q [18]. We combine the unknown parameters as x = [κ, µ] ∈ L ∞ + (Ω) × L ∞ + (Ω). Let D x Λ denote the Fréchet derivative of the forward operator at x. The Fréchet derivative of at x is given by Here, D * x Λ denotes the adjoint of the Fréchet derivative of the forward operator, and is given by where Λ * a denotes the adjoint of the linear acoustic forward operator Λ a (10) and D * x Λ o represents the adjoint of the Fréchet derivative of the optical forward operator (9) both with respect to the L 2 (Ω) inner product. Formulae for Λ * a can be found in [26], while for D * x Λ we have the next lemma. Lemma 1. Let us denote the solution of (1) for the fixed illumination I, diffusion κ 0 and absorption µ 0 by φ 0 . Then the Fréchet derivative of the optical portion of the forward operator D x Λ o at x 0 = [κ 0 , µ 0 ] applied to the perturbations δκ and δµ is given by where δφ(r) satisfies −∇ · κ 0 (r)∇δφ(r) + µ 0 (r)δφ(r) = ∇ · δκ(r)∇φ 0 (r) − δµ(r)φ 0 (r), r ∈ Ω δφ(r) The adjoint map D * x Λ o can then be calculated from where the adjoint fieldh(r) satisfies Lemma 1 can be proven using integration by parts.

Numerical computation
3.1.1. Variational formulae. We use a first-order Galerkin finite element method (FEM) for approximation of the optical portion of our QPAT problem. For an approximation of Λ o , a variational form of (1) is derived, i.e., where ν ∈ H 1 (Ω) is a test function. Additionally, for an approximation of the Fréchet derivative operator D x Λ o using FEM, a variational form of (16) is derived, i.e., In the same way, for an approximation of D * x Λ o , a variational form of (18) is obtained, i.e., Ω µ 0h νdr + Ω κ 0 ∇h · ∇νdr + 2γ d ∂Ωh νds = − µ 0 hνdr.
3.1.2. Discretisation of optical coefficients and fields. Let T denote a triangulation of Ω with N e elements, i.e., T = t j | j = 1, ..., N e . Applying an approximation using FEM, we discretise the optical coefficients in a piecewise-constant basis {χ j = 1 t j | j = 1, ... , N e }. Using this, the optical fields are approximated as [41] κ 0 (r) ≈ κ e (r) = Ne j=1κ j χ j (r) whereκ j andμ j denote the discretised absorption and diffusion coefficients at element t j . Additionally, φ 0 (r) is approximated in a piecewise-linear basis {ϕ k | k = 1, ... , N n } in the form where Φ 0,k denotes the discretised photon density at node k, and N n is the total number of nodes. We also approximate the adjoint fieldh(r) in a piecewise-linear basis as In the sequel, a field that is discretised at nodes as in (23) and (24) (resp. elements as in (22)) is called a nodal (resp. elemental) vector. In the same way as the continuous formulae, we use δ for signifying a perturbation in a discretised coefficient or field.
3.1.3. Matrix form of variational formulae. For a discretisation of the problem, a matrix form of the variational formulae in section 3.1.1 is derived. [43,41]. To do this, we define a system matrix A o in the form with where p, k = 1, ..., N n denote nodal indices. We also define matrix A δ,o in the form Using the above, we now define the discretised optical forward operators. We stress that this definition is setting the notation for the discretised operators, which will be described in detail below.

Definition 2.
A discretisation of the optical forward operator Λ o gives a map from a vector space of discretised (elemental) optical coefficients to a vector space of discretised (elemental) heating field coefficients in the form Additionally, a discretisation of the Fréchet derivative operator D Now, we give further details on these operators. (19), together with taking ν(r) to be a basis function ϕ p (r), gives a linear system Using this, the heating field is approximated in a piecewise constant basis as [41,19,18] H =μ • IΦ.
Here, • denotes an element-wise product and I is a node-to-element map. For I we use the L 2 orthogonal projection from the space of nodal representations to elemental representations. The action of I on a nodal vector θ restricted at element j is in the form where p denotes the nodal index, and (j) is a set of d + 1 nodes that correspond to element j. We also define I + as a map from the space of elemental vectors to nodal vectors. The action of I + on an elemental vector Θ restricted at node p is given by where S is the vector of volume of elements, and l(p) is the set of elements that are connected to node p. Also note that from (32) and (33), In the sequel, we will use the notation X = [κ,μ] and δX = [δκ, δμ].

3.1.5.
The matrix-free action of J o [X] on perturbation δX. Here, we explain how the action of J o [X] on a perturbation δX is approximated in a matrix-free manner. To do this, we approximate the perturbation field δφ(r) in a piecewise-linear basis in the same way as (23). Plugging the nodal vector δΦ into (20) gives a linear system for calculation of δΦ in the form where Φ 0 has been computed using (30). Finally, the perturbation in the heating field δH is computed as 3.1.6. The matrix-free action of J * o [X] on H. Here we explain how the action of the adjoint of Fréchet derivative J * o on an elemental vector H can be approximated using an adjoint-then-discretise method. PluggingH from (24) into the variational form of the adjoint formula (21) gives a linear system in the form where Finally, given the nodal vectorsH and Φ 0 , we will approximate the action of the adjoint using (17). For this we must choose how to calculate the products that appear in (17), and then how to project onto the space of elemental vectors. We will use the following Lemma which also makes use of the matrices and If the products in (17) are calculated by first multiplying the nodal functions, and then using an L 2 -orthogonal projection on the space of elemental functions, then for the discretisation of the adjoint we have Remark 1. Using a discretise-then-adjoint method, the adjoint of Fréchet derivative of the optical forward operator will be in the form (cf. [18], eq. (27)) Using (33) and the fact that I + Θ = I T (S • Θ), it can be shown that (41) matches (40).
Remark 2. From a theoretical point of view, it is perhaps more natural to make the projections from nodal to elemental representations in the L ∞ norm since κ, µ ∈ L ∞ + . The κ component of the adjoint, i.e., ∇φ 0 (r) · ∇h(r), is constant at each element, and thus the L ∞ projection is the same as the L 2 orthogonal projection. However, the two terms included in the µ component of the adjoint, i.e., φ 0 (r)h(r) + φ 0 (r)h(r), will be different if L ∞ projection is used. Our simulation experiment showed us that an L ∞ projection gives almost the same reconstruction as the L 2orthogonal projection, but dramatically increases the computational cost. Therefore, we used the L 2 -orthogonal projection for approximation of the unknown optical coefficients.

3.2.
Numerical computation of the acoustic operators (Λ a and its adjoint). Here, for a numerical computation of the acoustic portion of the problem, we used a k-space pseudo-spectral time-domain (PSTD) method [40,13]. Applying this method, the spatial gradients are approximated in a frequency domain using a Fast Fourier Transform (FFT), and the temporal gradients are approximated using finite difference schemes [40,13].
3.2.1. Discretisation of acoustic fields. We approximate the acoustic forward and adjoint operators on a uniform rectilinear grid staggered in space and time [40]. We denote the position of a given grid point in Cartesian coordinates by r ζ , where ζ = (ζ 1 , ..., N i the total number of grid points. We denote the grid separation along direction i by ∆r i . The time accessible to detectors, i.e., t ∈ (0, T ), is sampled with a temporal separation of ∆t so that the time step n corresponds to the time instant t n = n∆t. To accommodate a staggered spatial grid, we introduce operator T i , which shifts the point r by ∆r i /2 in coordinate i, and acts on a field as To accommodate a staggered temporal grid, we shift the field at time t n by −∆t/2, i.e., t n−1/2 = n∆t − ∆t/2. Based on the above, the discretised particle velocity vector at time step n is denoted This approximates the actual velocity on the staggered spatial grid as We also approximate the acoustic density as n and the scalar pressure as p (ζ;n) ≈ p(r ζ , t n ) ∈ R N ζ × R Nt+1 n . (Note that acoustic density is not a physical vector, but it is changed to a vector to accommodate the numerical method.)

Discretisation of medium's acoustic properties.
We define a discretised variant of acoustic parameters as diagonal matrices of size N × N acting on the spatial index ζ of acoustic fields. We denote a discretised variant of c 0 (r) byc. Also, an approximation of the ambient density ρ 0 (r) is denoted byρ. We also define ρ 0 (r) on a spatial grid staggered in coordinate i byρ i .
Additionally, we define a discretised variant of the absorption and dispersion proportionality coefficients asτ andη, respectively. Also, a discretisation of the associated fractional Laplacian operators N × N is given by where F and F −1 denote the FFT and its inverse, respectively. Applying the k-space pseudospectral method on a staggered spatial grid, the spatial gradient in coordinate i is in the form where sinc(c ref k∆t)/2 enforces a k-space correction to the spatial gradient using a reference sound speedc ref in order to minimise the numerical dispersion errors accumulated by the temporal integrations [40]. For further details on the k-space pseudo-spectral method, see [40,13].
Because of the computation of the spatial derivatives via an FFT, wave wrapping may occur, i.e., the acoustic waves leaving one side of the grid reenter the opposite side. To avoid wave wrapping, an absorbing boundary condition, referred to as perfectly matched layer (PML), is added to each side of the grid. Here, the action of a PML on a field in coordinate i is denoted by A i ∈ R N ×N . This is defined as [40] where α a,i is the attenuation enforced by the PML in coordinate i.
Having defined a discretisation of the optical forward operator, as well as acoustic fields and medium, we now complete the definition of our discretised composite operator.
Definition 3. A discretisation of the composite opto-acoustic forward operator Λ gives a map from optical coefficients to a set of time series of measured data in the form where H o and H a represent the discretised optical and acoustic forward operators, respectively. Also, an operator representing a discretisation of the Fréchet derivative of the forward operator D X Λ is defined by Using (48), we will use as a discretisation of the adjoint of Frećhet derivative operator the map 3.2.3. Numerical computation of the acoustic forward operator H a . We now explain a discretisation of the acoustic forward operator (H a ). We will use this later for calculation of the acoustic adjoint operator. For a numerical implementation of the acoustic forward operator given by (4), (5), (6) and (7) using a k-space pseudo-spectral method, we used an open-source code, which is freely available on the k-Wave website [44]. Using this code, the discretised initial pressure distribution at t = 0, denoted by P 0 is applied as an injection of mass, referred to as additive source. To do this, P 0 must be split over two time steps n = {−1/2, +1/2}. (cf. [1], Appendix B, or the k-Wave manual [44]). This gives a source in coordinate i as [1] where S is a symmetric smoothing operator that is applied in order to mitigate unexpected oscillations of P 0 . (cf. the k-Wave manual in [44]). Additionally, since s i is added to the mass equation, a factor 1 ∆tc 2 has been applied in order to account for the conversion of units from pressure to the time rate of density (cf. [1], Appendix B, or the k-Wave manual in [44]). Using the definitions given above, the calculation of H a proceeds as follows. Start at iterate n = −1 with initial conditions p (ζ;n=−1) = 0, v (i;ζ;n=−3/2) = 0 and ρ (i;ζ;n=−1) = 0, and terminate at iterate n = N t − 2. 1. Update the particle velocity vector field (conservation of momentum (4)): 2. Update the acoustic density field (conservation of mass (5)) also adding source: 3. Update the scalar pressure field (equation of state (6)): where I N denotes an identity matrix of size N × N , and the last term is actually the action ofτ Y abs on d i=1 ∂ ∂t ρ (i;ζ;n+1) , and is derived from (52). 4. Compute the measured pressure at detectors: This is defined usinĝ where M denotes a discretised variant of M (cf. section 2.3), and includes an interpolation operator for mapping the acoustic pressure field from grid points to position of ultrasound detectors. It is worth mentioning that M also depends on the size and properties of detectors. These effects are neglected in our study by assuming the detectors sample the pressure pointwise.

3.2.4.
A discretise-then-adjoint method for derivation of the acoustic adjoint operator. Having defined a discretisation of the acoustic forward operator (H a ), we now explain how to calculate the acoustic adjoint operator (H * a ) using a discretise-then-adjoint method. The derivation of H * a in this section is a modification to the study of [25] in the sense that the effects of the PMLs and an additive source are incorporated in calculation of the adjoint. Additionally, in contrast to [25], we will represent H * a as a discretised linear system of PDEs, the same as our representation for H a [25].
To derive this adjoint, a matrix form of H a must be derived using the details given in section 3.2.3. To do this, we start with definition of diagonal matrices C =c 2 ∈ R N ×N and Q ∈ R dN ×dN with diagonal a d-times stack of diagonals ofρ ∈ R N ×N . Also, we define a diagonal matrix Q s ∈ R dN ×dN with diagonal a stack of diagonals ofρ i , (i ∈ {1, ..., d}). We will also use A ∈ R dN ×dN as a diagonal matrix with diagonal a stack of diagonals of A i (i ∈ {1, ..., d}).
We now define a stack of coordinate-dependent particle velocity and acoustic density fields v (i;ζ,n) ∈ R N , ρ (i;ζ,n) ∈ R N (i ∈ {1, ..., d}) asv n−1/2 ∈ R dN andρ n ∈ R dN , respectively. We also definep n = p (ζ,n) ∈ R N . A stack of all these vector fields yieldsz n = {v T n−1/2ρ Also, let S ∈ R 7N Nt×N be the map from the discretised sought after initial pressure P 0 to an additive source S. In particular, at time step n, S n+1/2 = S n P 0 , where and S n P 0 = 0 for n = −1, 0. Here, s (i;ζ;n+1/2) has been derived from (50), (52) and (53), I z s ∈ R 7N ×dN is the map from s (i;ζ;n+1/2) toz n in the form We now introduce an operator T ∈ R 7N ×7N for defining a discretised formula for the time sequence of fields asz Additionally, we introduce a measurement matrix M = MI p z with I p z ∈ R N ×7N the projection from the space ofz n to the space ofp n . Based on this, we now define our discretised acoustic forward operator.

Definition 4. The discretised acoustic forward operator is defined by
where Here,z n is determined by (57) with initial conditionz −1 = 0, andP ∈ R NsNt is a time-series stack of measured data at iterates n ∈ {0, ..., N t − 1}.
Proof. We have given the proof in [27], Lemma 2 and and Corollary 1.
Note that in the case we are considering the sum in (61) is actually just two terms. By commuting n = −1, 0 in (55) to M using the fact the the forward operator is linear, we define a source for our acoustic adjoint operator as From (63), equation (62) in Lemma 2 can be modified as where Now, we derive a matrix form of T and T * using the forward model presented in section 3.2.3 to show how multiplication by each of them may be computed. We also denote a matrix form of the k-space spatial gradient in (45) as Using this, we will use Φ ∈ R dN ×dN and Ψ ∈ R dN ×dN composed of submatrices and To make the notation more compact we also introduce matrices D, E ∈ R N ×dN and G ∈ R N ×N whose actions are given by The matrix T is then given, in block form, by From (72), the adjoint operator T * will be in the form From (73) and using (56), (64) and (65), as well as calculating the adjoints of D, E and G from (69), (70), and (71), the time sequence of adjoint fields is given iteratively bȳ wherep sol := H * a (P ), and ϑ = S 1 2dC using (55). Now, applying the replacementsρ n+1 = QA −1ρ Our numerical experiments showed that an operator H * a that is calculated using a discretise-thenadjoint method (75) satisfies an adjoint test with a higher accuracy than using an adjoint-thendiscretise method used in [26]. Therefore, we used (75) for the acoustic adjoint operator.

Iterative model-based approaches for the direct problem of QPAT
Having defined a discretisation of opto-acoustic forward operator H, the Fréchet derivative operator J, and its adjoint J * , we now explain the iterative approaches we will use for minimisation of (12).
Considering (2), the dependence of heating field h on κ, µ is nonlinear. Furthermore, because of high scattering of light in tissue media, simultaneous reconstruction of κ, µ from h can be highly ill-posed. As a result, a minimisation of (12) is a non-convex, nonlinear and ill-posed inverse problem. It has been shown that simultaneous reconstruction of κ and µ using a single optical excitation, i.e., N q = 1, does not have a unique solution [4]. However, the uniqueness and stability, in appropriate norms, of this inverse problem using N q > 1 optical excitations under some geometric constraints has been established [4]. It is also worth mentioning that using our forward acoustic operator, which can be adapted to acoustically heterogeneous and lossy media, the direct problem of QPAT is more ill-posed than existing studies, for which spherical mean Radon transform or Green's function techniques have been used for solving the acoustic portion of the forward operator using an acoustically homogeneous and lossless medium, which does not hold in practice [20,18,31].
Remark 3. Since the magnitude ofκ is often 1 to 2 orders of magnitude greater thanμ, we follow [19,18] and minimise (cf. (12)) with respect to scaled coefficients We will use two types of scaling. For the method described in section 5.1 we use a linear scaling (see (97)) while for the methods described in sections 5.2.1 and 5.2.2 we use logarithmic scaling (see (98)). In our numerical experiments the method of section 5.1 did not converge with logarithmic scaling.
Using these, we now consider a minimisation problem with respect to a scaled vector of optical coefficientsX in the form where is a non-convex, nonlinear and smooth function. Also,X has been constrained by a lower boundX l and an upper boundX u . We will take two main approaches for solving (77), the first of which is a direct minimisation of the nonlinear objective function using a Quasi-Newton approach, and the second is to solve the minimsation problem as a sequence of convex and linearised subproblems using a matrix-free Jacobian-based method. We now explain these approaches. A fixed point iteration arising from the optimality conditions of gives a sequencē where d k is a search direction for iteration k, and α k is a step size along search direction d k .
(Throughout this manuscript, a subscript (resp. superscript) k indicates an iteration for an inner (resp. outer) loop.) 4.1. Newton's methods. From the second-order optimality condition for minimising , the Newton search direction, is derived using Here, ∇ k is the first-order derivative of atX k , and is computed using and H k , is the second-order derivative of (the Hessian matrix). Using a Gauss-Newton method, the Hessian matrix is approximated using where a term including the second-order derivatives of H q has been neglected as compared to the exact Hessian. A class of approaches that utilise (79) for minimisation of are called Newton's method. Although Newton's methods benefit from a quadratic (optimal) rate of convergence, they pose some practical limitations for QPAT, i.e., a) The computation, storage and inversion of the Hessian matrix is expensive. To address this problem, an implicit inversion of H k using an explicit form of J and J * has been used in the context of diffuse optical tomography [36].
b) The size of the discretised heating field H, which represents data for the optical portion of the forward operator, is on the same order as the size of unknown parameters (cf. (28)). It has been shown that an explicit computation of the matrix J o (resp. J) requires at least N s + N e times implementation of H o (resp. H) (See [19], page 11). c) Considering the time series of measured dataP , which is of size N s N t , the Jacobian matrix J is dense, and thus a storage of J is impractical.
We will later explain how we have addressed these challenges using an inexact Newton method.

4.2.
Nonlinear gradient-based methods. An alternative to Newton's method is using gradientbased Quasi-Newton approaches, for which H k is not computed explicitly, but is approximated using solely information included in the first-order gradients ∇ possibly at previous steps. Additionally, an inversion of H k can be avoided using a direct approximation of H −1 k , for which the so-called BFGS method is often used. Since H −1 k is a dense matrix, and BFGS method poses challenges regarding memory, a limited-memory variant of BFGS (L-BFGS) method is used [19,18]. Using L-BFGS, H −1 k is updated using the most recent m pairs of (s, y) given by (In our study, we empirically use m = 5.) We will also use ρ k = 1/y T k s k , and an initial guess for the Hessian matrix in the form where I is the identity matrix. By applying L-BFGS method to the constrained minimisation problem (77), the search direction d k is computed using Algorithm 1. (See [19] and [18] for applications on QPAT and direct QPAT.) From the First-order Karush Kuhn Tucker (KKT) conditions associated with the constraint on X, d k is projected onto the feasible region using (See [18], equation (37)) Using d k given by (84), the step size in (78) is chosen by a standard backtracking line search satisfying the Wolfe conditions as well as the constraints together with enforcing the bounds associated with the constraint, i.e., (See [19,18].) In (85), 0 < c 1 < c 2 < 1 are user-defined parameters. Applying these conditions, α k is chosen using a backtracking line search, as given in Algorithm 2. Here, τ < 1 is a user-defined

Total Variation (TV) regularisation
To mitigate the ill-posedness of the problem, a regularisation functional must be added to the data fidelity term in (77) [19,18]. This results in a minimisation problem in the form where J [X] and λ, respectively denote the regularisation functional and the regularisation parameter, the latter of which makes a balance between a fidelity to the measured dataP and to a priori knowledge about the true solution. In our study, based on an assumption that the optical coefficients are piecewise constant with sharp edges, we use J [X] : with u eitherκ orμ. Using (88), Here, D u ∈ R N l ×Ne is a sparse matrix with N e and N l the total number of elements and the total number of internal edges between elements, respectively. Each row D u l ∈ R 1×Ne (l ∈ {1, ..., N l }) has two nonzero components at indices j 1 and j 2 , which correspond to two elements connected by the internal edge l. These have values a l and −a l with a l the length (or area) of internal edge l [8]. Using this, the gradient of J (X) is a nonlinear operator in the form where M (X) is given by with C(X) a diagonal matrix Here, the smoothing parameter β is added in order to make ∇J [X] differentiable. Having defined our regularisation functional, we now explain the minimisation approaches we use for solving the direct problem of QPAT.

TV regularisation using Alternating Direction Method of Multipliers (ADMM).
Two major issues for minimisation of F is the nonlinearity of ∇J [X] and a loss of accuracy due to the smoothing parameter β. Note that a small value for β may deteriorate the convergence [47]. One way for addressing these difficulties is to use a slack variable for shifting the gradient of DX 1 out of the non-differentiable region and penalising the applied shift. To do this, the Augmented Lagrangian is introduced which, following [19,18], may be further rewritten as where ν and are constants and U w and U p,q are rescaled Lagrange multipliers. Minimisation of F is then accomplished by alternating minimisation of (94) in W andX, and updating of the Lagrange multipliers, using Algorithm 3. In Algorithm 3, T ol out is a terminating threshold, and Algorithm 3 Gradient-based Quasi-Newton method using ADMM 1: Input:P q , (q ∈ {1, ..., N q }) 2: Initialise: Output: X * 9: end while X * denotes an optimal solution. The line 4 in Algorithm 3 is a minimisation of the first term in (94) with respect to W , and can be calculated exactly using a scalar-wise Shrinkage formula of the form Additionally, the line 5 in this algorithm is a minimisation of F A at W k+1 with respect toX, and is done using L-BFGS algorithm, as explained in section 4.2. Note that we should replace by F A , and we hope this is not confusing for the reader. To do this, the first-order derivative of F A with respect toX at inner iteration k is computed using For this method we use a linear scalinḡ where the dominators are the mean value of initial guessesκ 0 andμ 0 .

5.2.
Linearised matrix-free Jacobian-based method. In this section, we explain two methods that we use for solving the problem (77) as a sequence of linearised subproblems. Here, we set X l = −∞ and X u = +∞. This gives a non-constrained form of the problem. Our numerical results showed that enforcing bounds on the solutions is not required because of a good stability provided by these approaches. Additionally, we will make a balance between reconstruction ofκ andμ by using the logarithmic rescalingX = log X X 0 (98) which also implicitly enforces positivity onκ andμ. Here X 0 denotes an initial guess. Accordingly, given an iterateX k and a pointX in a neighborhood ofX k , the forward operator H is linearised using an approximation Applying the approximation (99) on the problem (77) yields the minimisation problem where we have changed fromX to d =X −X k in the minimisation. Note that we have used k as a superscript in order to indicate a linearised subproblem (outer iteration), as opposed to a subscript in (79) that indicates an inner iteration. The k-th linearised subproblem (100) gives a normal equation in the form of where ∇ k and H k are obtained from (80) and (81), respectively. The normal equation (101) is a variant of (79), for which H k is approximated using a Gauss-Newton method. Here, to avoid a storage of J, we solve each linearised subproblem (101) using a Krylov subspace method in a matrix-free manner, for which implicit forms of operators J and J * are used. As discussed in section 4.1, Newton's methods converge rapidly, but solving a normal equation with a high accuracy for each linearisation is very expensive. From a theoretical point of view, using Krylov methods, the total number of iterations for reaching a minimiser is on the same order of the number of unknowns. Therefore, we solve (101) roughly using a loose stopping tolerance, i.e., whered k denotes a rough solution. It has been shown that under assumptions that H k is symmetric and positive definite, the solutionsd k are sufficiently small and η k < 1, the local convergence is guaranteed using a step size α k = 1 (cf. [14], section 2). A class of approaches that use (102) for minimisation of (77) are called Inexact Newton's methods [14]. In the sequel, we use d k , rather thand k , for indicating a rough solution of (102), and we hope this is not confusing for the readers.

5.2.1.
Lagged diffusivity (LD) method with priorconditioning. As discussed above, for our direct QPAT problem, the Hessian matrices H k are ill-conditioned. Therefore, a regularisation functional must be added in order to stabilise the problem. Our first approach for an inclusion of TV regularisation in (102) is based on solving (87) via an iterative linearisation of an associated objective function F. Strictly speaking, we first add the regularisation functional to an original nonlinear problem, and then the linearisations are applied to a regularised form of a nonlinear objective function. The k-th linearisation of the data fidelity term using (100), together with ∇J [X] defined by (91), gives a TV regularised variant of (102) in the form Let us denote an initial guess for k-th subproblem byX k 0 , which is calculated using the previous linearised subproblem k − 1. One way for addressing the nonlinearity of M Linearisation of M using the above equation is called the Lagged Diffusivity (LD) method [47].
(See [22] for an application of LD on the purely optical problem of QPAT.) Our method now is to follow [2] in order to convert (104) into a similar problem in which the regularisation is obtained by early termination of an iterative method rather than tuning of parameter λ. First, since M k may only be positive semi-definite we approximate it by M k γ = M k + γI. Replacing M k by M k γ in (104) we next multiply by (M k γ ) −1 so that (104) becomes Applying Krylov methods for solving (104), the iterates lie in a subspace [2] where i k max is the maximum number of iterations for the Krylov method. Our numerical experience shows that using small values for λ, the term λX k 0 will have a small effect, and indeed we drop this in our method by taking λ = 0.
Using this approach for applying regularisation on a normal equation is called priorconditioning. Note that in this approach we adjust the regularisation by i max . In contrast to an empirical choice for the regularisation parameter λ, which requires a recomputation of the problem, i max can be implicitly controlled by a stopping tolerance [2]. Using the above, our subproblem is to solve a priorconditioned variant of (104) in the form It turns out that (107) provides a better convergence than (104) since the structure of the prior is directly included in the Jacobian matrix [2]. Here, we solve (107) using the Preconditioned Conjugate Gradient (PCG) method, as outlined in Algorithm 4.
Algorithm 4 PCG algorithm for solving linearised subproblem k

10:
Solve M k γ z i+1 = r i+1 11: 14: Output:X * We will terminate Algorithm 4 if i > i m with i m a user-adjusted number of inner iterations, and a relative reduction in r T i z i during i m inner iterations becomes less than a user-adjusted threshold T ol in . Also, the PCG algorithm is unconditionally stopped whenever i > i max .
Remark 4. Each inner iteration of the PCG loop involves an implicit inversion of the sparse matrix M k γ (See [2]). For the direct QPAT, the cost of an inversion of M k γ is negligible, compared to an implementation of the Jacobian and its transpose.
Using the LD method, together with a logarithmic scaling, our inexact Newton algorithm is outlined in Algorithm 5.

Primal-Dual Interior-Point-Method (PD-IPM).
A technique for linearisation of M was developed using a primal-dual method, and was shown to give better convergence than the LD method, especially for small values of β [10]. In [10], the PD-IPM technique was used for enforcing TV regularisation when inverting a linear blurring operator. In contrast to our first approach using the LD method, here we first linearise the data fidelity function, and then add a TV regularisation function to each linearised subproblem using the PD-IPM approach. Using this, we iteratively solve a TV regularised variant of the linearised subproblem (100) in the form Algorithm 5 Inexact Newton method using LD 1: Input:P q (q ∈ {1, ..., N q }) 2: Initialise: Apply linearisation on (87)

6:
ComputeX k+1 using (78) and α k = 1 7: where we have removed the superscripts indicating subproblem k for brevity. A linearisation of this system with respect to (χ, d) gives If we make the replacement ξ = C[d]Dd, then above linearised system gives decoupled equations and where the subscript k indicates an inner sub-subproblem. The same as the LD method, we solve a priorconditioned form of (111) with λ = 0, i.e., and −∇˜ k is actually the second term in the right-hand-side of (111). Here, the subscript k indicates the fact that for each linearised subproblem k (superscript), we solve a sequence of subsubproblems (112) and (113) using an update of d k , M γ,k and ∇˜ k . The developed inexact Newton method using a TV regularisation based on the PD-IPM approach is outlined in Algorithm 6. We use step sizes α k = 1 and α k = 1 for all k and k, respectively. The step size s k is described below the algorithm (see (115)).

5:
Initialise: Compute δd k from (113) using Algorithm 4 8: Compute δχ k using (112) 10: χ k +1 = χ k + s k δχ k 11: end while 12: ComputeX k+1 using (78) and α k = 1 13: X k+1 = X 0 eX k+1 14: end while 15: Output: X * Here, each iteration k amounts to solving a PCG loop. The optimal r provided by each PCG loop k is denoted by r k , * . Using this, we terminate each outer subproblem k using a stopping criterion given in line 6 in Algorithm 6. This stopping criterion uses a stopping threshold T ol med .
Additionally, following [8], we choose s k using a step length rule. Using this approach, we choose Here, ϕ * is the largest ϕ that satisfies a feasibility condition where j denotes the index of components of χ.

Numerical results
The TV regularised minimisation approaches that have been explained in section 5, i.e., ADMM, LD and PD-IPM, were used for a simultaneous reconstruction of images of optical absorption coefficient µ and diffusion coefficient κ for 2D and 3D phantoms. 6.1.1. Optical excitation. Four different optical excitation patterns were used, i.e., N q = 4. For each optical excitation q, we used a discretisation of an inward directed diffuse boundary current I s,q (J/mm) that obeys where ι q ∈ ∂Ω denotes the source position for optical excitation q, and was set each side of the square for each optical excitation (cf. the second line in equation (1).) Table 1. The minimal and maximal values for acoustic properties of the 2D phantom.
c 0 (ms −1 ) ρ 0 (kgm −3 ) min max min max Data generation 1.129 × 10 3 1.902 × 10 3 0.620 × 10 3 1.390 × 10 3 Image reconstruction 1.276 × 10 3 1.725 × 10 3 0.750 × 10 3 1.250 × 10 3 6.1.2. Discretisation for data generation. For generation of time series of boundary data, the square domain was discretised using a grid with 128 × 128 nodes and an even separation distance of 7.81 × 10 −2 mm along both Cartesian coordiantes. For the optical portion of the problem, a triangulation was applied so that each two finite elements form a pixel, and the centre of the pixel matches an associated node on the acoustic grid. For the acoustic portion of the problem, to mitigate wave wrapping [40], a perfectly matched layer (PML) having a thickness of 20 grid points and a maximum attenuation coefficient of 2 nepers per grid point was added to each side of the grid. The propagated wavefield was detected in 1017 time steps using 158 detectors that are equidistantly placed on the left and top sides of the computational grid, as shown in figure 1. A 30 dB Additive White Gaussian Noise (AWGN) was then added to the simulated data.

Discretisation for image reconstruction.
To avoid an inverse crime for discretisation, the image reconstruction was done using a computational grid made up of 80 × 80 nodes with an even separation distance of 1.25 × 10 −1 mm. 6.1.4. Acoustic properties. To the best of our knowledge, our manuscript reports the first results on the direct QPAT for realistic acoustic media, for which acoustic characteristics of tissue media such as heterogeneity and attenuation are taken into account.
In addition, PAT and QPAT use an assumption that the acoustic properties of the medium are known. This assumption does not hold in practical cases. For example, it has been shown that the acoustic properties of the breast vary up to 15 %. These variations are often not exactly known for reconstruction. As a result, using the same acoustic properties for data generation and image reconstruction may be an inverse crime. To avoid this, for data generation, we corrupted the acoustic properties of the medium with 30 dB AWGN noise. Figures 1(a) and 1(b) show the contaminated distributions of sound speed (c 0 ) and ambient density (ρ 0 ) for data generation, respectively. The minimal and maximal values of these maps are given in the top row of Table 1.
Using these values, the computational grid for data generation supports a maximal frequency up to 7.223 MHz. (See the k-Wave manual [44].) Also, in figures 1(a) and 1(b), the position of detectors is shown by the black circles matching the left and top sides of the grid.
For image reconstruction, we used the clean maps that are shown in figures 1(c) and 1(d). The minimal and maximal values of these maps are given in the bottom row of Table 1. Using these acoustic maps, the grid for image reconstruction supports a maximal frequency up to 5.101 MHz. From Table 1, the incorporated noise has provided a 10-15% relative discrepancy between the acoustic properties used for data generation and image reconstruction.
Furthermore, the acoustic medium was assumed attenuating, where acoustic absorption and dispersion follow a frequency power law [46]. Accordingly, we used a constant attenuation coefficient α 0 = 0.75 dB MHz −y cm −1 and an exponent factor y = 1.5 for both data generation and image reconstruction (cf. (8)). These values were chosen so that they approximately simulate the acoustic attenuation properties of the breast.  6.1.5. Optical phantom. We simulate the distributions of optical coefficients so that they follow the optical properties of soft tissues in the sense that they often possess a broad range of values for the optical coefficients. Our numerical experience showed that this is a challenge for image reconstruction, although this issue has been neglected in many of studies of QPAT. The reader is referred to [31] for a study on the direct QPAT, in which this issue has been considered. Accordingly, we simulate a distribution for optical absorption coefficient with 20 values within a range µ ∈ [0.025, 0.325] mm −1 with a background 0.075 mm −1 . Also, a distribution for the diffusion coefficient is simulated so that it has 6 values within a range κ ∈ [0.2, 0.4] mm −1 with a background 0.3 mm −1 . Figures 2(a) and 2(b) show the map for µ and κ, respectively. 6.1.6. Image reconstruction. We initialised all algorithms using values 1.2 times more than the mean of optical coefficients for the associated phantoms.
ADMM. The ADMM approach (cf. Algorithm 3) was applied for a simultaneous reconstruction of µ and κ. Since ADMM is our benchmark method, the associated parameters were chosen very carefully in order to obtain the best possible image. The line 5 in Algorithm 3 is a minimisation of F A (cf. (94)) with respect toX using the L-BFGS algorithm given in Algorithm 1. The L-BFGS algorithm uses a backtracking line search given in Algorithm 2 with c 1 = 1 × 10 −4 , c 2 = 0.9 and τ = 0.25. Also, we set = 1 (cf.(94)), and ν = 1 × 10 −3 for the Shrinkage operator (95). We terminated the ADMM algorithm using T ol out = 1 × 10 −2 (cf. Algorithm 3).
LD. The LD method was applied using Algorithm 5. Using this algorithm, each linearised subproblem is solved using a PCG loop (cf. Algorithm 4) by setting i max = 30, i m = 5, and T ol in = 0. The latter parameter implies that we terminate each PCG loop, if Note that we observed a nonmonotone convergence for iterates of each PCG loop (inner iterations) in regions close to an optimal solution X * , but the sequence (X k ) always monotonically converged to (X * ) using α k = 1. (We suggest using an Armijo condition for the outer iterations, although we observed that a nonmonotonic reduction in iterates of the PCG loop associated with outer iteration k is sufficient for providing a descent search direction, i.e., (X k+1 ) < (X k )). The TV preconditioner was applied using γ = 1 × 10 −6 and β = 2 × 10 −5 . Our LD algorithm was stopped using T ol out = 1 × 10 −3 (cf. Algorithm 5). PD-IPM. The PD-IPM technique was applied using Algorithm 6. Because of applying two layers of linearisation, each of the outer linearised problems are solved using a sequence of inner linearised subproblems. For solving a normal equation associated with each inner linearised subproblem, we terminated each PCG algorithm using the same parameters as in the LD method. The TV preconditioner was applied using γ = 1 × 10 −6 and β = 1 × 10 −6 .
6.1.7. Evaluation of image reconstruction. The criterion that we use for measuring the convergence of sequence X k to a ground truth image (phantom) is Relative Error (RE), which is calculated as Here, u k is the solution for either κ or µ at outer iteration k that is interpolated back to the grid for data generation (phantom). Also, the superscript phantom indicates the distribution of optical coeffcients for the phantom. We also consider (X k ) as the second criterion for convergence (cf. (77)).
6.1.8. Observations. ADMM. Using the parameters given in section 6.1.6, the ADMM algorithm was stopped after outer iteration 6. The final reconstructed images for the optical absorption coefficient µ and diffusion coefficient κ are shown in figures 2(c) and 2(d), respectively. Figure 3(a) shows the RE of sequence computed by the ADMM algorithm at outer iterations. LD. For the LD algorithm, the associated stopping criterion was satisfied after outer iteration 30. Figures 2(e) and 2(f) show the final reconstructed images for µ and κ, respectively. Figure 3(b) shows the RE of the iterates provided by LD for outer iterations k. The computed values for (X k ) are shown in figure 4(a). This figure is shown from an enlarged view around the optimal solution in figure 4(b). As shown in these figures, monotonically converges to a minimiser for all outer iterations using our choice for the step size α k = 1 (∀k).
PD-IPM. The stopping criterion for the PD-IPM algorithm was satisfied after 4 outer iterations. The final reconstructed images for µ and κ are shown in figures 2(g) and 2(h), respectively. Figure  3(c) shows the RE of solutions (optical coefficients) computed by the PD-IPM algorithm for outer iterations k. Figure 4(c) shows the obtained values for (X k ). This figure is shown from an enlarged view around the optimal solution in figure 4(d). As shown in these figures, our choices for the step sizes associated with outer (resp. inner) subproblems, which are α k = 1 (resp. α k = 1), provided a monotonic reduction for values of k (resp. k ) for all iterations.
Remark 5. Using PD-IPM, for both outer and inner linearised subproblems, the first step is a compuation of associated and the gradient ∇ (cf. Algorithm 6). As a result, using a line search for compuation of α k and α k , e.g., a backtracking line search using Wolfe conditions, is straightforward, and does not impose additional computational cost. However, our numerical experience showed that a reduction in r T i z i provided by the PCG loops (cf. Algorithm 4) is sufficient for a monotonic reduction of the objective function without using a line search. 6.2.1. Optical excitation. We used three optical excitation patterns, i.e., N q = 3. For each optical excitation q, we used a discretisation of an inward directed diffuse boundary current I s,q (J/mm 2 ) that obeys (117) with ι q ∈ ∂Ω two confronting faces of the grid, i.e. the left-right, posterior-anterior, and bottom-top faces.
6.2.2. Discretisation for data generation. For data generation, the cubic domain was discretised using a grid with 37 × 37 × 37 nodes and an even separation distance of 2.78 × 10 −1 mm along all Cartesian coordinates. For the optical portion of the problem, an FE mesh was simulated so that each set of six tetrahedral voxels forms a cubic pixel with a centre matching an associated node on the acoustic grid. For the acoustic portion of the problem, we added a perfectly matched layer (PML) with a thickness of 8 grid points and a maximum attenuation coefficient of 2 nepers per grid point. The acoustic wavefield was detected in 292 time instants using 2145 detectors that are equidistantly placed on two (the left and posterior) faces of the grid. A 30 dB Additive White Gaussian Noise (AWGN) was then added to the simulated data. 6.2.3. Discretisation for image reconstruction. For image reconstruction, we avoided an inverse crime for discretisation by using a grid made up of 33×33×33 nodes with a homogeneous separation distance of 3.125 × 10 −1 mm along all Cartesian coordinates. 6.2.4. Acoustic properties. As discussed in section 6.1.4, to avoid an inverse crime for acoustic properties of the medium, we corrupted the sound speed and ambient density with 30 dB AWGN noise for simulation of data, whereas we used the clean acoustic maps for image reconstruction. Table 3 shows the minimial and maximal values for these maps. Using this table, the grid for data generation (resp. image reconstruction) supports a maximal frequency of 2.0766 MHz (resp. 2.156 MHz). The distributions of the sound speed and ambient density for the 3D phantom for data generation (resp. image reconstruction) are shown from a top view in figures 5(a) and 5(b) (resp. 5(c) and 5(d)), respectively. Additionally, the acoustic attenuation coefficient and the associated exponent factor (cf. equation (8)) was simulated the same as the 2D phantom.    6.2.6. Image reconstruction. All algorithms were initialised using values 1.2 times more than the mean of optical coefficients for the 3D phantoms. ADMM. The parameters for an implementation of the ADMM algorithm were chosen carefully in order to obtain almost the best possible image. All these parameres match our choices for the 2D phantom.
LD. For an implementation of the LD algorithm, the arising linearised subproblems were solved using a PCG algorithm with the same parameters as the 2D phantom, except that for stopping each PCG loop, we used i m = 3. The TV preconditioner M was applied using γ = 1 × 10 −8 and β = 1 × 10 −6 . We stopped our LD algorithm using T ol out = 1 × 10 −3 .
PD-IPM. For an implementation of the PD-IPM algorithm, each outer linearised problem was solved using a sequence of inner linearised subproblems. The TV preconditioner was applied using γ = 1 × 10 −8 and β = 1 × 10 −8 . For termination of each PCG loop associated with each inner linearised problem, we used i m = 2 and i max = 30 (cf. Algorithm 4). We also terminated each outer linearised subproblem using k max = 25 and T ol med = 1 × 10 −3 . Also, our PD-IPM algorithm was terminated using a stopping threshold T ol out = 1 × 10 −3 .
6.2.7. Observations. ADMM. The stopping criterion for the ADMM algorithm was satisfied after outer iteration 7. In figures 6(a) and 6(b), the second columns (from the left side) show the final reconstructed images for µ and κ, respectively. These images are shown using horizonal slices the   LD. The LD algorithm was terminated after 10 iterations. The final reconstructed images for µ and κ are shown in the 3rd columns of figures 6(a) and 6(b), respectively. The images are shown in the same way as the ADMM method. Figure 7(b) shows the RE of the solutions computed by LD for outer iterations k. The computed values for (X k ) are shown in figure 8(a), and 8(b) from an enlarged view around the optimal point. As shown in these figures, monotonically converges to a minimiser for all outer iterations.
PD-IPM. The stopping criterion for the PD-IPM algorithm was satisfied after 6 outer iterations. The final reconstructed images for µ and κ are shown in the 4th columns of figures 6(a) and 6(b), respectively. Figure 7(c) shows the RE of solutions (optical coefficients) computed by the PD-IPM algorithm for outer iterations k. Figures 8(c) and 8(d) show the obtained values for (X k ). As shown in these figures, our choices for the step sizes associated with outer (resp. inner) subproblems, i.e., α k = 1 (resp. α k = 1), provided a monotonic reduction for k (resp. k ). Table 4 shows the RE values for the final reconstructed images shown in figure 6.

Discussion
In the previous section, we numerically evaluated the performance of the used iterative algorithms for a direct problem of QPAT for realistic acoustic media. In this section, we give further details on our numerical results.
The quality of images reconstructed by the ADMM algorithm was sensitive to choices for and T ol out , and thus these parameters were chosen very carefully. This leads to several repetition of the entire reconstruction. For exmple, by using a smaller T ol out (further proceeding of the iterations), the RE values started to increase. One way for avoiding this may be increasing the amount of regularisation via an increase in , but our numerical experience shows that choosing greater values for negatively affects the convergence of the algorithm. As shown in the second columns in figure 6, the reconstructed image for µ includes a high level of artifact, and also the ADMM algorithm failed to produce an accurate image for κ. (Note that better images were reconstructed using ADMM for 2D case.) The poor performance of the ADMM algorithm, especially in 3D case, may be because of assuming a high level of errors in the acoustic properties, as shown in figures 1 and 5.   The LD algorithm was not sensitive to γ, T ol in and T ol out . Both and RE will proceed with a monotonic reduction, if we choose smaller values for T ol out . This is indicated by figures 4(b) and 8(b). However, the convergence of the LD algorithm will be deteriorated, if we use very small values for β, for example the values that we used for the PD-IPM algorithm.
Our numerical experience showed that the PD-IPM algorithm was also not sensitive to choices for γ, T ol in , T ol med and T ol out . (For example, by choosing T ol med = 0, the performance of the algorithm is almost the same.) Additionally, this algorithm converged well using very small values for β.
Regarding the computational cost, the ADMM algorithm reconstructed final images using 100-150 gradient-based iterations for both 2D and 3D cases. (Note that each iteration involves at least an implementation of the forward operator and the adjoint of the Jacobian matrix.) Note also that for ADMM, as discussed in the first paragraph of this section, using a smaller stopping threshold deteriorates the quality of reconstructed images. The LD and PD-IPM approaches produced the final images in 150-200 inner iterations. Note that the major cost of each inner iteration for the PCG loop involves an implicit implementation of the Jacobian matrix and its transpose, and has almost the same computational cost as the gradient.
For a direct problem of QPAT for realistic acoustic media with an error in estimation of acoustic properties, our numerical results show that the developed matrix-free Jacobian-based inexact-Newton methods outperform gradient-based approaches that utilise a search direction using Quasi-Newton approaches like the L-BFGS method [19,18], at the same time does not impose a large computational cost due to an explicit construction of the Jacobian matrix [31].
Our next goal is an extension of multi-source QPAT to multi-spectral QPAT [5], which is more practical for biomedical cases, especially when a limited view is accessible for optical excitations. A simultaneous reconstruction of the optical coefficients and the sound speed using adjunct information obtained from ultrasound computed tomography may be promising for improving the quality of reconstructed images [30].