On the Adjoint Operator in Photoacoustic Tomography

Photoacoustic Tomography (PAT) is an emerging biomedical"imaging from coupled physics"technique, in which the image contrast is due to optical absorption, but the information is carried to the surface of the tissue as ultrasound pulses. Many algorithms and formulae for PAT image reconstruction have been proposed for the case when a complete data set is available. In many practical imaging scenarios, however, it is not possible to obtain the full data, or the data may be sub-sampled for faster data acquisition. In such cases, image reconstruction algorithms that can incorporate prior knowledge to ameliorate the loss of data are required. Hence, recently there has been an increased interest in using variational image reconstruction. A crucial ingredient for the application of these techniques is the adjoint of the PAT forward operator, which is described in this article from physical, theoretical and numerical perspectives. First, a simple mathematical derivation of the adjoint of the PAT forward operator in the continuous framework is presented. Then, an efficient numerical implementation of the adjoint using a k-space time domain wave propagation model is described and illustrated in the context of variational PAT image reconstruction, on both 2D and 3D examples including inhomogeneous sound speed. The principal advantage of this analytical adjoint over an algebraic adjoint (obtained by taking the direct adjoint of the particular numerical forward scheme used) is that it can be implemented using currently available fast wave propagation solvers.


Introduction
Photoacoustic Tomography (PAT ) is a biomedical imaging modality that has come to prominence over the past two decades due to its ability to provide images of soft tissue based on optical absorption with high spatial resolution. Optical absorption contrast is desirable because imaging at multiple wavelengths can in principle provide spectroscopic (chemical) information on the absorbing molecules (chromophores). However, due to the highly scattering nature of most biological tissues, imaging purely with light -light in, light out -can only be achieved with high spatial resolution in the vicinity of the tissue surface; beyond that the achievable image resolution falls off quickly with depth. PAT overcomes this problem by generating broadband ultrasonic pulses at the regions of optical absorption which then carry the information about the optical contrast to the tissue surface without being scattered significantly. PAT is increasingly widely used in preclinical studies to image small animal anatomy, pathology and physiology [37,41,38] and PAT's potential as a clinical technique is also being actively explored [20,42,31]. The spectroscopic information PAT provides (ie. differentiating endogenous chromophores or contrast agents based on their absorption spectra) opens up the possibility of molecular and genomic imaging, and even for functional imaging by monitoring changes in blood oxygenation and flow. For further references on the experimental aspects and applications of PAT see the review papers [36,3,27].

The Physics of Photoacoustic Tomography
PAT is based on the principle that acoustic waves can be generated by the absorption of modulated electromagnetic radiation. When the radiation is in the visible to nearinfrared spectrum this is known as the photoacoustic or optoacoustic effect. The term thermoacoustic is used when the radiation is in the microwave region. In PAT experiments, it is common to use pulses of a few nanoseconds duration, such as are generated by a Q-switched laser, to illuminate the tissue. The light will be scattered within the tissue and absorbed by any chromophores present. For the photoacoustic effect to take place, the energy must be subsequently thermalised (as opposed to emitted radiatively) and, if it occurs fast enough, this will give rise to a pressure increase localised to the regions of absorption. These regions of raised pressure and temperature will initiate acoustic and thermal waves as the tissue is elastic and thermally conducting. There are two relevant timescales to be considered. First, for the pressure and temperature fluctuations to be decoupled, the thermal relaxation time must be much longer than the characteristic timescale for the acoustic propagation. This condition is known as thermal confinement, and it must be satisfied if the acoustic pressure is to be modelled using a wave equation, rather than a set of coupled equations in temperature and pressure. The second condition, known as stress or pressure confinement, is that the characteristic timescale for the acoustic propagation, which will depend on the sound speed and the size of the absorbing regions, must be much slower than the time taken to deposit the optical energy as heat. This is essentially an isochoric condition requiring there to be no change is density of the tissue during heat deposition. When this condition is satisfied, the acoustic propagation can be modelled as an initial value problem with the initial (acoustic) pressure distribution as the initial condition. The second initial condition -that the time rate of the change of the pressure is zero -derives from the isochoric condition through the assumption of zero initial particle velocity. In PAT, measurements of the induced broadband acoustic pulses are recorded at the surface of the tissue by an array of ultrasound sensors. In order to form a PAT image, it is necessary to solve an inverse initial value problem by inferring an initial acoustic pressure distribution from measured acoustic time series. This image reconstruction step is the subject of this paper. A possible subsequent image reconstruction step, not considered here, is to recover images of the optical properties of the tissue. This requires a model of light transport and is sometimes known as quantitative PAT [10]. The solution of the acoustic inversion examined here provides the data for this optical inversion.

Challenges of Photoacoustic Tomography
Image reconstruction in PAT calls for the robust inversion of the linear system where p 0 represents the initial pressure distribution we wish to recover, f represents the measured acoustic pressure time series, ε an additional measurement error, and the linear operator A models both the ultrasonic wave propagation in the tissue and the measurement process (we will discuss the details of this modeling in the Section 2.1). Solving (1) can be done in several ways: (i) Assuming homogeneous acoustic properties and the absence of acoustic absorption, the measured time series can be related to the initial pressure distribution via the spherical mean Radon transform. In this case, the methods of integral geometry can be used to derive direct, explicit inversion formulae for certain sensor geometries, such as e.g. spherical arrays. See [21] and the references therein for an overview of such methods. (ii) In more general scenarios, a direct inverse operator corresponding to a forward model of the wave equation can be used. One commonly used example is given by time reversal (TR) approaches [12,40,39,15]: The measured time series at the boundary is used in reversed temporal order as a time dependent Dirichlet boundary condition for the backward wave equation and the pressure field at t = 0 is used as an estimate of the initial pressure distribution. With the development of fast numerical methods that are capable of computing full 3D wave propagation with high spatial and temporal resolution, TR and its enhanced variants [29,28] became an attractive choice in applications [23,16,18]. (iii) In particular in scenarios with incomplete or sub-sampled data, numerical models of wave propagation can be used within the variational image reconstruction framework to find a regularized least-squares solution of (1) by solving the optimization problem Here, λ > 0 is a regularization parameter and J (p 0 ) is a suitable regularization functional that aims to encode a-priori knowledge about the true solution, p 0 .
While variational image reconstruction in PAT has recently been shown to yield superior results compared to TR solutions [17,24,5], a potential challenge to its application is that solving (2) by first order optimization methods (see, e.g., [8] for an overview) requires an efficient numerical scheme for the application of the adjoint PAT operator, A * . One approach is to first discretise the forward operator and then to find its algebraic adjoint, for example by explicit reversal of the computational steps of the forward scheme [17]. The adjoint obtained by this discretise-then-adjoint approach however is not the adjoint of the continuous operator but of the particular numerical scheme. Therefore, in this paper we revisit the mathematical modelling of PAT to derive the general analytical form of the PAT adjoint. The explicit analytical form of the adjoint allows us new theoretical insights, in particular, it clarifies the relation between the adjoint and TR. The analytical adjoint is independent of discretization, i.e., both the forward and adjoint equations can be discretised with any suitable discretization scheme and pre-existing high performance codes can be used to apply the forward and adjoint operators. The remainder of the paper is structured as follows: In Section 2, we revisit the mathematical modelling of the forward problem in PAT and derive an explicit form of the adjoint PAT operator. Section 3 recapitulates how the adjoint operator can be used within a variational scheme. The utility of adjoint-then-discretise paradigm is evaluated in Section 4: After a direct comparison to the discretise-then-adjoint paradigm in [17], variational methods are compared to approaches based on TR for two image reconstruction problems including soundspeed inhomogeneity. In Section 5, we summarize our work and discuss its relations to other approaches. An efficient implementation of the adjoint operator using a k-space pseudospectral wave propagation model is given in Appendix B.

Mathematical Modeling
Wave propagation Acoustic wave propagation through a compressible fluid is usually modelled mathematically by linearising the equations of fluid dynamics derived from conservation laws and an equation of state. Under the condition that the acoustic particle velocity is much slower than the sound speed, an acoustic wave in tissue can then be modelled by the system of first order equations [34]: where p and ρ are the acoustic pressure and acoustic density fluctuations, u is the acoustic particle velocity (the time derivative of the acoustic displacement d), s is a mass source term, and ρ 0 and c 0 are the ambient density and sound speed respectively. Both ρ 0 and c 0 are positive and bounded from above and below. The system (3a-3c) can be combined into the second-order wave equation for a heterogeneous medium: where we introduced the inhomogenous d'Alembert operator 2 .
Initial and boundary conditions As described in Section 1.1, p 0 can be introduced as the initial value of the pressure rather than as a source term. Also, assuming the particle velocity u at t = 0 vanishes, (3a-3c) suggests that the time derivative of p should be initialized with zero. With these assumptions, the forward problem of PAT in d dimensions is stated as Because in practice it is possible to time-gate out any reflections from the experimental equipment, the propagation can be modelled as though it occurs in an unbounded domain i.e., no explicit boundary conditions are required.

Measurement
We deliberately model the measurement of the pressure field p(x, t) in two steps: Firstly, each element of the acoustic sensor array has only limited access to the pressure field: It only detects the field over a small, but finite, volume of space, and the measurement continues only for a finite time T . To signify this, we introduce the window function ω(x, t) ∈ C ∞ 0 (Γ × (0, T )), which maps p(x, t) to the field accessible by the sensor array: g(x, t) = ω(x, t)p(x, t). Here, Γ is a d-dim, open, bounded set with non-zero measure that contains the cumulated volume of all sensor elements. The sound speed on Γ is assumed to be constant. In a second step, each sensor element transforms the accessible spatio-temporal part of the pressure field into a sequence of measured values. This process can be modelled by a measurement operator M which Each type of ultrasonic detector elements commonly used for PAT (see [3,25] for comprehensive reviews) leads to a different particular form of M. For this work, we only assume that have access to the adjoint operator M * .

Derivation of the Adjoint PAT Operator
In this section, we derive the adjoint operator A * to the forward operator where p(x, t) is the time dependent solution of the PAT forward problem (5a-5c). The assumption p 0 ∈ C ∞ 0 (R d ), is justified by the short heat diffusion always present in the thermalisation process generating the initial pressure (cf. Section 1.1). As we assume that M * is given, with respect to the generic L 2 bilinear form in C ∞ 0 (Γ × (0, T )) and C ∞ 0 (R d ), respectively. By definition, this adjoint P * has to satisfy the equality . Therefore, we start with the left hand side ( ) of (8) and we seek to reformulate it in terms of an operator acting on the function g. The solution of the initial value problem on an unbounded domain (5a-5c) can be written as where G(x, t|x , t ) is the causal Green's function of the homogeneous problem corresponding to (4) [2]: Substituting P, (7b), and p(x, t), (9) into ( ) we obtain Shifting the time derivative of G into the integral and rearranging the order of integration, (11) becomes Performing integration by parts in t on ( ), we further obtain The window function ω ∈ C ∞ 0 (Γ × (0, T )) models which spatio-temporal domain is accessible to the sensor array (cf. Section 2.1). As the detection time is restricted, ω(·, 0) = ω(·, T ) = 0, and the second term (13b) vanishes. The next important step is to use the basic physical invariances of the Green's function [2], to rewrite (13a) as Applying a transformation of variables t ← T − t, (15) becomes If we define and compare with (8), we found the adjoint operator P * : is compactly supported due to the compact spatial support of ω, the finite propagation time and the upper bound on c 0 (x)). With the above derivation we can see that the adjoint can be defined through P * [g](x ) := q(x , T ), i.e. P * maps g to the function q(x , t) evaluated at t = T , where q is given by the following wave equation: One way to compare the adjoint operator defined by the above system (17a-17c) to time reversal approaches (cf. Section 1.2), is to choose a closed, sufficiently smooth d − 1 dimensional surface Θ inside the sensor volume Γ, on which we can impose a boundary condition. P can then be expressed as mapping g to the solution q(x , t) of the following wave equation, again, evaluated at t = T : P [g](x ) := q(x , T ). The main difference between time reversal and the adjoint is therefore given by how the time reversed pressure in the sensor element, g(x , T − t)ω(x , T − t), is introduced into the wave equation (4): In the adjoint (17a-17c), it enters as the mass source term s(x, t) while in time reversal (18a-18d), it enters as an explicit constraint on a sub-set.

Numerical Wave Propagation Model
The adjoint model derived above could be implemented numerically using any of the standard techniques for modelling wave propagation in the time domain, so long as they allow to include a time-varying mass source term. One possible realization of such a scheme is given by the k-space time domain method [26,11,35], which is already widely used for forward modelling in photoacoustics, e.g. in the k-Wave Matlab Toolbox [32]. In Appendix A, we give a full technical description of how to use k-Wave to implement discretised versions of the forward operator A = MP (6a-6b), the adjoint operator A * = P * M * (16) and the time reversal operator A = P M * (18a-18d), which we will denote by A ∈ R L×N , A * ∈ R N ×L and A ∈ R N ×L , respectively. Note that for the discretisation, we restrict the spatial domain to a d-orthotope Ω (bounding box ) with Γ ⊂ Ω, which is enclosed by a Perfectly Matched Layer (PML) to model wave propagation on an unbounded domain.

Image Reconstruction Methods
We will now recapitulate some image reconstruction methods that rely on the discrete PAT operators described in the previous section. The first two elementary methods are obtained by directly applying the time reversal and the adjoint operator to the data: In [29,28], an enhancement of standard time reversal was proposed that takes the form of a Neumann series approach: The k-th iterate is obtained as where I N is the identity in R N . A simple reformulation of (21) leads to an iterative image reconstruction algorithm: Note that in [29], a modified time reversal operator is used that replaces (18c) by a harmonic extension of the pressure field g(x , T ) to R d , which is particularly important if not all waves have left the domain, i.e., p(x, T ) = 0 for x ∈ Γ. However, in the scenarios we consider in the numerical simulations, T is chosen large enough so that the power of the residual waves is well below the noise level of the sensors. We therefore neglect the harmonic extension in this study. Next, we discuss iterative image reconstruction methods based on a variational model (2). The simplest case is given by omitting J (p), i.e., we are just looking for the least-squares solution (LS) p LS := argmin In principle, p LS is given as the solution of the normal equations, which we could iteratively solve with customized variants of the conjugate gradient scheme, such as the conjugate gradient least-squares (CGLS ) algorithm [6]. Equipped with a suitable stopping criterion, it yields a well known and understood regularization strategy for solving (1), i.e., a scheme that is robust to noise and ill-conditioning (cf. [14,9] and the references in [1]). However, for a better comparison to the previously presented iTR, we solve (23) by a first order method [8]. We only use the gradient of the cost function at p, in a simple gradient descent algorithm: For a given initialization p 0 , the iteration is given as which is very similar to (22): A has been replaced by A * and we introduced a step size η.
As a slight modification of the above, we can incorporate the a-priori knowledge p 0 0 and solve the positivity-constrained least-squares problem by modifying (25) to a gradient descent re-projection algorithm: where Π + is a projection onto R N + . Given the similarity between (22) and (25), it seems natural to define a positivityconstrained version of iterative time reversal as The next variational method employs the total variation (TV ) energy [7] as a regularizing functional: where λ > 0 is the regularization parameter and TV(p) is the 1 norm of the amplitude of the gradient field of p (the details of the implementation are given in the appendix). We can solve (29), by modifying (27) from a projected to a proximal gradient descent: where the proximal operator prox α,TV+ (y) := argmin x 0 is given as a positivity-constrained total variation denoising problem. For our purposes, solving (31) by the algorithm described in [4] is sufficiently fast. We chose the schemes (25), (27) and (30) because they are easy and intuitive to explain and provide an interesting connection to iterative time-reversal. In all the above algorithms, η ∈ (0, 2/θ) with θ being the largest singular value of A * A, ensures the convergence to a minimizer of the corresponding optimization problems, see [8]. For a given experimental PAT setting, θ only has to be computed once, which can be done by a simple power iteration.

Computational Scenarios
To illustrate our results, we use two simple settings: (I) A 3D scenario: Ω = [0, 1] 3 discretised by N = N 3 x isotropic voxels. The sound speed and ambient density are assumed to be homogeneous with c = 1500 (m/s), ρ 0 = 1000 (kg/m 3 ) and we add N PML voxels of PML at all sides of the cube. The pressure is measured in the plane x = 0, which corresponds to the top layer of voxels. This situation is, e.g., encountered in planar Fabry-Pérot (FP) photoacoustic scanners [43,22,23,18].
(II) A 2D scenario: Ω = [0, 1] 2 with inhomogeneous medium properties and an irregular sensor geometry, see Figure 1. The medium properties vary in the range encountered, e.g., in the human breast [19].
As our derivation in Section 2 is independent of the concrete model for M, we use simple point-sampling in space and time in both scenarios.

Precision Assessment
The discretised operators A and A * are not fully adjoint to each other anymore, which means that if we define then χ[A, A * ](x, y) > 0 for most x, y. In this section, we investigate this error by relating it to other sources of error such as the modeling error caused by the PML or numerical errors due to limited precision and examine its influence on inverse reconstructions. As a "baseline" to validate the adjoint-then-discretize approach described in this work, we will use the discretize-then-adjoint approach described in [17], which relies on the explicit adjoining of all the steps in the k-Wave iteration (cf. Appendix A). Hence, for the algebraic adjoint operatorĀ * obtained this way, χ[A,Ā * ] should be 0 up to the numerical precision.
To test this, we computed (32) for 100 random realizations of x and y in scenario (I). The results are summarized in Table 1. To interpret the results correctly, one has to bear in mind that the PML is known to be the largest limitation of numerical accuracy of the k-Wave iteration [11], effectively preventing the use of double precision to improve numerical accuracy. Therefore, the most relevant results in Table 1 are the ones for n = 128, PML= 16 and single precision. In this setting, the statistics of the error (32) are comparable for A * andĀ * . The results for double precision have to be regarded as a sanity check: While the error (32) forĀ * is, as expected, smaller than for A * , double precision is not used in practical computations. Nonetheless, we investigated the effects of this higher numerical precision in more detail in a second study using scenario (II). For this, we compare the iterates of scheme (27) to compute (26): In the first case, we useĀ * as the adjoint operator, and in the second case, we use A * . The corresponding iterates will be denoted by p k alg and p k ana , respectively. Figure 2 shows how the relative error between p k alg and p k ana develops with k. While the relative 2 error remains almost constant, the relative ∞ error grows slowly but does not indicate any instability. Table 2 completes our accuracy studies with the statistics of the 2 and ∞ norm of the distance A * y −Ā * y for 100 random realizations of y. We can see that in both norms the maximum/median discrepancy slightly decreases with decreasing N . However, we also see that this decrease can be compensated for by increasing the PML size. Furthermore, we cannot observe a difference between single and double precision in this measure anymore. Summarising, the studies indicate that the proposed implementation of the analytical adjoint A * is sufficiently close to the algebraic adjoint to be confidently used in variational reconstruction schemes. Figure 3 shows the results of the different inverse methods applied to scenario (I) using N = 256 3 voxels and a sphere as the ground truth for p 0 . Figure 4 shows the corresponding results for scenario (II), cf., Figure 1. The peak-signal-to-noise ratio   , y), B = A * (adjoint-then-discretize) and B =Ā * (discretize-then-adjoint), for 100 random realization of x and y. Displayed are maximum/median of log 10 applied to the data. log 10 (kp k alg ! p k ana k 2 =kp k alg k 2 ) log 10 (kp k alg ! p k ana k 1 = kp k alg k 1 ) Figure 2: Relative 2 and ∞ errors between the iterates of scheme (27) to compute (26) using either A * orĀ * as the adjoint operator. The corresponding iterates are denoted by p k ana and p k alg , respectively. Table 2: Statistics of the 2 and ∞ norm of A * y −Ā * y for 100 random realization of y (adjoint-then-discretize vs. discretize-then-adjoint). Displayed are maximum/median of log 10 applied to the data. All computations have been performed with single precision on a GPU. For all iterative methods, K = 100 iterations have been used and η = 1.8/θ was used as a step size. All results have been projected to R N + as a post-processing step. In line with the theoretical prediction (cf. (17a-17c) and (18a-18d)) and the numerical implementation (cf. Appendix B), the differences between BP and TR are rather subtle. A particular reason for this is that the measurement time T was chosen large enough, cf. [5]. However, Figure 3 shows that the limited-view geometry of scenario (I) leads to differences in the decay of intensity with depth and the appearance of the well-known circular limited-view artefacts [13]. The differences between the iterative approaches based on time reversal or least-squares minimization are even more subtle than between plain TR and BP. The results rather show that including a-priori information such as positivity or total variation constraints leads to a far more substantial improvement in terms of reconstruction quality.

Discussion, Conclusions and Outlook
In this work, we examined the adjoint of the continuous PAT operator (6a-6b) from different perspectives. We first described how the adjoint is connected to the solution of a wave equation with a time-varying mass source term (17a-17c) (cf. Section 2). The simple proof was based on two main concepts: • Our mathematical model (cf. Section 2.1) of the measurement process explicitly accounts for the fact that the sensor array can only access the pressure field within a finite spatial volume and within a finite time interval. This was modelled by the introduction of a window function, which causes the boundary terms arising from the integration by parts in time to vanish in a central step in the proof, cf. (13b).
• The use of the Green's function representation of the solution of the wave equation (4), which reflects the temporal invariances of the underlying physical system, cf. (14a-14b).
We would like to remark that our setting is more general in terms of choice of Γ and T than the standard one, which assumes measurements on the boundary of the domain.
In this standard setting, the boundary terms arising from the integration by parts in (13b) only vanish under the assumption that the support of p 0 is contained in the interior of the domain enclosed by the measurement surface and that the time T is large enough for all waves to leave this domain. The explicit form of the continuous adjoint not only allows for an easy theoretical comparison to time-reversal (18a-18d), it also enables its numerical implementation with any existing wave propagation code that is able to include a time-varying mass source. As a particular example thereof, we examined an implementation based on the k -space pseudospectral time domain method (cf. Sections 3 and 4): • We demonstrated that our adjoint-then-discretize approach is sufficiently precise compared to the discretize-then-adjoint approach [17] that has to be derived for each specific wave propagation code separately and yields a discrete, purely algebraic, characterization of the adjoint only.
• We illustrated how the adjoint can be used to solve variational image reconstruction problems (2) that incorporate a-priori constraints on the initial pressure.
An alternative description and implementation of the adjoint operator for PAT was recently presented in [5]: In comparison to our approach, the forward problem is modelled in a more restrictive way and with a different motivation, which, by using the weak formulation approach (as opposed to the integral formulation approach we take here) leads to the characterization of the adjoint as a wave transmission problem. The latter was solved by a specific BEM-FEM approach, which is computationally intensive and the numerical studies focus on short recording times T with full boundary measurements. Finally, these future directions of research are directly related to this work: • The inverse reconstructions (cf. Section 3.2 and 4.3) revealed interesting connections and differences between iterative methods based on time reversal and the adjoint operator. This topic has to be examined in more detail, both from theoretical and numerical perspectives: As (17a-17c) and (18a-18d) differ in the way the time-reversed pressure is introduced at the sensor elements, the effect of different sensor geometries, targets in close proximity to the sensors, artefacts caused by limited sensor coverage [13] and short recording times, T , need to be examined. Ultimately, this is not only a mathematical question: The different artefacts caused by time reversal and adjoint operator lead to different deficiencies in in-vivo images, each causing problems for the interpretation of those images.
• The optimization schemes (25), (27) and (30) were only used in this work because they are easy to describe and directly relate to iterative time-reversal. However, we note here that they are computationally rather inefficient and were not tuned for optimal performance. Therefore, the number of iterations used in the numerical examples does not provide a good indicator of the expected computational cost. More sophisticated algorithms for solving the corresponding optimization problems [8] will be examined in forthcoming work, in particular in the context of more challenging large-scale models encountered in many experimental data scenarios.
• Code to reproduce the results of this article will be provided as example scripts for a PAT image reconstruction toolbox that will be released to complement the k-Wave toolbox (cf. Appendix A).

Appendix B. Pseudospectral Implementation of the PAT Operators
Using the k-Wave iteration, the forward operator P (7a-7b) and the adjoint operator P * (17a-17c) can be implemented as follows: P is implemented by setting s n+1/2 = 1 2c and the result is given by f = M g. Here, M is a discrete implementation of the measurement operator M, i.e., it maps the discrete pressure time series in the sensor elements, g ∈ R NtNΓ , to the data f ∈ R L . In the simulation studies, M is just an identity matrix. In (B.1), Q is a smoothing operator introduced to eliminate the high spatial frequencies of p 0 to avoid unintended oscillations in the field being introduced by the band-limited interpolant (see Section 2.8 in the k-Wave manual and [33]). withg := M * f and omitting (A.6). The result is given by Qp Nt /(c 2 0ρ0 ), where the smoothing by the self-adjoint Q has to be included to obtain adjointness with the forward implementation. The rescaling of p Nt andg by c 2 0 andρ 0 is necessary since we are actually solving the adjoint of the first order system (3a-3c) instead of the second order system (4): p and ρ are essentially the same variables up to a scaling by c 2 0 . In the first order adjoint system, the scaling is transferred from ρ to p. The same holds for ρ 0 (x), which is absent from the second order system. It is interesting to compare the implementation of the adjoint with the implementation of TR where the time-reversed pressure field in the sensor elements,g Nt−n+1 , is introduced as a Dirichlet source: In the k-Wave iteration, (A.4) is replaced by