Joint reconstruction of initial pressure distribution and spatial distribution of acoustic properties of elastic media with application to transcranial photoacoustic tomography

Photoacoustic computed tomography (PACT) is an emerging computed imaging modality that exploits optical contrast and ultrasonic detection principles to form images of the photoacoustically induced initial pressure distribution within tissue. The PACT reconstruction problem corresponds to a time-domain inverse source problem, where the initial pressure distribution is recovered from the measurements recorded on an aperture outside the support of the source. A major challenge in transcranial PACT brain imaging is to compensate for aberrations in the measured acoustic data that are induced by propagation of the photoacoustic wavefields through the skull. To properly account for these effects, previously proposed image reconstruction methods for transcranial PACT require knowledge of the spatial distribution of the elastic parameters of the skull. However, estimating the spatial distribution of these parameters prior to the PACT experiment remains challenging. To circumvent this issue, in this work a method to jointly reconstruct the initial pressure distribution and a low-dimensional representation of the elastic parameters of the skull is developed and investigated. The joint reconstruction (JR) problem is solved by use of a proximal optimization method that allows constraints and non-smooth regularization terms. The proposed method is evaluated by use of large-scale three-dimensional (3D) computer-simulation studies that mimic transcranial PACT experiments.


Introduction
Photoacoustic computed tomography (PACT) is an emerging computed imaging modality that exploits optical contrast and ultrasonic detection principles to form images of the absorbed optical energy density within tissue (Kruger et al 1995, Kruger et al 1999, Xu et al 2011. In PACT, the object is irradiated with a short laser pulse. Under the condition of thermal confinement, the absorption of the optical energy within the object results in the generation of pressure waves via the photoacoustic (PA) effect (Xu and Wang 2006a, Oraevsky and Karabutov 2003, Wang and Wu 2012. These pressure waves propagate outside of the object and are subsequently detected by a collection of broadband ultrasound transducers located outside the support of the object. The goal of PACT in its canonical formulation is to reconstruct an image that represents a map of the photoacoustically-induced initial pressure distribution within the object from knowledge of the measured photoacoustically-induced pressure waves (Poudel et al 2019). The initial pressure distribution is proportional to the absorbed optical energy distribution within the object, which can reveal diagnostically useful information based on endogenous hemoglobin contrast or exogenous contrast if molecular probes are utilized (Jones et al 1980, Cheong et al 2003, Xu et al 2011. The majority of the currently available PACT reconstruction methods are based on idealized imaging models that assume a lossless and acoustically homogeneous fluid medium (Kunyansky 2007, Finch andRakesh 2004). This assumption, however, is grossly violated in transcranial PACT due to the presence of the skull. As result, images produced by conventional reconstruction methods can contain significant distortions and degraded spatial resolution , Xu and Wang 2006b, Jin et al 2008, Xu et al 2011, Nie et al 2011, Huang et al 2012.
To mitigate image artifacts caused by acoustic heterogeneties, heuristic methods such as the half-time and partial-time image reconstruction methods have been developed , Anastasio et al 2005a, Anastasio et al 2005b. This is a heuristic method that has been shown to mitigate distortions in the reconstructed image in certain applications. An attractive feature of such methods is that they do not require knowledge of the spatial distribution of acoustic parameters within the object. Such methods have been demonstrated to be effective for the case of fluid media containing weak acoustic heterogeneties; however, they are generally ineffective in the presence strong acoustic heterogeneities such as the skull (Poudel et al 2019). Moreover, truncation-based reconstruction methods generally require that pressure measurements are acquired on a closed measurement surface that surrounds the object (skull), in order to satisfy certain data-sufficiency requirements and microlocal stability. In 3D transcranial PACT measurement geometries, this cannot generally be achieved.
In addition, numerous image reconstruction methods that can account for (known) acoustic heterogeneities in a fluid medium have been proposed , Jin et al 2008, Poudel et al 2019, Grün et al 2008, Burgholzer et al 2007, Hristova et al 2008, Treeby et al 2010. However, bones, such as the skull, are more accurately described as elastic media that can support the propagation of both pressure and shear wavefields . The methods described above that assume a fluid medium fail to account for shear wavefield propagation within the skull and mode conversion between pressure and shear wavefields at the skull interface (Schoonover and Anastasio 2011, White et al 2006. The deleterious effects of neglecting these physical effects in transcranial PACT have been studied previously , Xu and Wang 2006b, Jin et al 2008, Xu et al 2011, Nie et al 2011, Huang et al 2012. To further advance transcranial PACT, image reconstruction methods based on the 3D elastic wave equation have been developed , Poudel et al 2020, Javaherian and Holman 2018, Poudel et al 2019. It has been demonstrated that the images reconstructed using such methods can contain significantly reduced artifact levels compared to those reconstructed by use of methods that assume a fluid medium , Poudel et al 2020. However, a limitation of these methods is that they require the spatial distribution of the elastic properties of the skull to be estimated prior to the PACT experiment. In practice, such subject-specific information is generally difficult to obtain. Furthermore, as the elastic wave equation-based iterative image reconstruction methods are computationally intensive, it can be difficult to manually tune the parameters of the medium based on the reconstructed initial pressure distribution. A natural approach to mitigating this problem is to reformulate the reconstruction problem as a joint reconstruction (JR) problem in which the initial pressure distribution and elastic medium parameters are concurrently estimated by use of the measured PACT data. This idea is the topic of this article and is developed and explored as described below.
Several theoretical and computational studies of the JR problem for PACT in fluid media have been reported (Stefanov and Uhlmann 2012, Huang et al 2016, Liu and Uhlmann 2015, Kirsch and Scherzer 2012. Theoretical work on the JR problem in fluid media that neglects discrete sampling effects has established that the initial pressure distribution and speed of sound (SOS) distribution can be uniquely determined from the measured PACT data alone only for certain special cases (Hickmann 2010, Hristova et al 2008. However, the uniqueness of the JR problem for more general cases has not been established. Stefanov and Uhlmann considered the joint estimation problem for the linearized wave equation in fluid media and demonstrated that, in general, the SOS and initial pressure distributions could not both be stably recovered Uhlmann 2013, Stefanov andUhlmann 2012). A similar conclusion for the full non-linear problem was established via computer-simulation studies (Huang et al 2016). That study also demonstrated that, in certain cases, JR can lead to improved estimates of the initial pressure distribution as compared to conventional methods, even if the acoustic parameters were not estimated accurately (Huang et al 2016).
The idea of stabilizing the JR problem by use of a low-dimensional parameterized representation of the spatial distribution of the acoustic properties of a fluid medium has been proposed (Matthews et al 2018, Zhang et al 2008. It has been demonstrated that the initial pressure distribution and a parameterized version of the SOS distribution can be jointly estimated by minimizing an objective function consisting of a data fidelity term and a regularization term by use of a gradient-based alternating minimization approach (Matthews et al 2018). Acoustic wave propagation was modeled by use of the 2D acoustic wave equation and the adjoint state method was employed to compute the gradients with respect to the initial pressure and acoustic medium parameters. Numerical studies confirmed the robustness of the proposed method when accurate low-dimensional representations of the SOS distribution were employed. A mathematical explanation for the effectiveness of the approach has also been provided (Acosta 2019).
In this work, inspired by the parameterized JR method, we develop and investigate a JR method for 3D transcranial PACT. The formulation of the previously developed parameterized JR problem for PACT with 2D fluid media is extended in several key ways. Firstly, the forward model is replaced with a 3D elastic wave equation-based model that allows for wave propagation in elastic solids. Secondly, motivated by the geometry of a subject's skull that is visible in adjunct x-ray CT or MRI data, low-dimensional parameterizations of the elastic properties of the skull are introduced. Thirdly, a memory-efficient implementation of the adjoint state method that utilizes binomial checkpointing is utilized for computing gradients of the cost functional with respect to the elastic model parameters. The JR problem is solved by use of a proximal optimization method that allows constraints and non-smooth regularization terms. The computational methods were implemented by use of parallel GPU-based computer code. The proposed JR is evaluated by use of largescale three-dimensional (3D) computer-simulation studies that mimic transcranial PACT experiments.
The remainder of the manuscript is organized as follows. In section 2, background information on the imaging model and existing image reconstruction methods is provided. In section 3, the proposed parameterized JR method for 3D transcranial PACT is introduced. A description of the computer-simulation studies and the results of these studies are presented in sections 4 and 5, respectively. Finally, the article concludes with a summary in section 6.

Imaging physics
Let the photoacoustically-induced stress tensor at location r ∈ R 3 and time t 0 be defined as where σ i j (r, t) represents the stress in the ith direction acting on a plane perpendicular to the jth direction. Additionally, let p 0 (r) denote the photoacoustically-induced initial pressure distribution within the object, referred to as the object function, and letu(r, t) ≡ (u 1 (r, t),u 2 (r, t),u 3 (r, t)) represent the vector-valued acoustic particle velocity. For conciseness, the velocity ∂ t u and the acceleration ∂ tt u are represented byu, and ∂ tu respectively. Here, ρ(r), λ(r) and μ(r) represent the spatial distribution of the medium's density and the Lamé parameters, respectively. The compressional and shear wave propagation speeds are given by c l (r) = λ(r)+2 μ(r) ρ (r) and c s (r) = μ(r) ρ(r) . The object function p 0 (r) and all functions that describe the elastic isotropic medium are assumed to be bounded with compact supports.
The acoustic absorption of the skull is described by a diffusive absorption model (Moczo et al 2007) in this study. The diffusive absorption model assumes that the wavefield absorption is independent of temporal frequency; this model does not accurately describe the power law based frequency dependent absorption characteristics of soft tissue and bone (Szabo 1994, Treeby andCox 2014). This model, however, can be sufficiently accurate in cases where the bandwidth of the photoacoustic signals is limited.
In a heterogeneous isotropic elastic medium with a diffusive acoustic absorption distribution α(r), the propagation ofu(r, t) and σ(r, t) can be modeled by the following two coupled equations (Boore 1972, Virieux 1986, Madariaga et al 1998, Alterman and Jr 1968: subject to the initial conditions Here, tr(·) is the operator that calculates the trace of a matrix, I ∈ R 3×3 is the identity matrix, and ∇· is the divergence operator. In equation (2c), p 0 (r) is assumed to be compactly supported in a fluid medium where the shear modulus μ(r) = 0. Hence, the formulation of the initial value condition assumes that there are no optical absorbers inside the elastic solid.

Forward model in continuous and discrete forms
Let p ≡ tr(σ) ∈ L 2 (Ω × [0, T]). The forward model can be described by a continuous-tocontinuous (C-C) mapping as: where p| ∂Ω×[0,T] ∈ L 2 (∂Ω × [0, T]) denotes the measured pressure wavefield data, ∂Ω ⊂ R 3 denotes a measurement aperture that encloses the domain Ω and p 0 ∈ L 2 (Ω) is the soughtafter photoacoustically-induced initial pressure distribution. The linear operator H : L 2 (Ω) → L 2 (Ω × [0, T]) maps p 0 to the pressure wavefield p via the initial value problem in equation (2), and the operator M is the restriction of H to ∂Ω × [0, T]. In practice, the detected pressure wavefield is discretized temporally and spatially at specific transducer locations. In this work, the discrete forward operator described below will be employed. Additional details can be found in a previous study that utilized the same operator . Let p ∈ R N r L denote the discretized pressure signal, where N r represents the number of transducers and L represents the total number of discrete temporal samples. In the forward model described in equation (3), the effects of the acousto-electric impulse response as well as the spatial impulse response of the transducer have been ignored. However, the transducer responses can be incorporated into the imaging model readily .
To obtain a discrete-to-discrete (D-D) imaging model for use with an optimization-based reconstruction method, finite-dimensional approximate representations of the object function p 0 (r) and the material parameters ρ(r), α(r), λ(r), and μ(r) need to be introduced, whose forms are prescribed by the numerical method employed to solve the wave equation. In this study, the numerical method employed to solve the wave equation in equation (2) was a 10th-order staggered grid finite difference time domain (FDTD) scheme. Note that for the staggered finite-difference (FD) scheme, the material properties, stress, and particle velocity functions are sampled at different points of a staggered FDTD cell. Let ρ, α, λ, μ, and p 0 ∈ R N be the finite dimensional representations of ρ(r), α(r), λ(r), μ(r) and p 0 (r), respectively, where N is the total number of grid points on the 3D grid.
For a given ρ, α, λ, μ, and p 0 , the FDTD method for solving the elastic wave equation can be described in operator form as where c ∈ R 4N×1 is a vector that represents the acoustic properties of the elastic medium and is given by c ≡ ρ, α, λ, μ T , H(c) ∈ R NL×N is the discrete approximation of the wave operator H for a fixed c that solves the initial value problem defined in equation (2), and M ∈ R N r L×NL is a sampling matrix that maps pressure data sampled on the computational grid onto the transducer locations. The transducers were assumed to be point-like and thus when the receiver and grid point locations do not coincide, an interpolation method is required. The elements of M were chosen such that trilinear interpolation is performed. The goal of image reconstruction in a discrete setting is to determine an estimate of p 0 given the measured data p and the forward model in equation (4).

Reconstruction of initial pressure distribution when the elastic medium is known
For a fixed c, an estimate of p 0 can be obtained as (Poudel et al 2020, Poudel et al 2019 p 0 = argmin where with p denoting the recorded pressure data. The non-negativity constraint p 0 0 is imposed because negative initial pressure distribution values are nonphysical. The regularization term R(p 0 ) incorporates a priori information about the sought-after initial pressure distribution with γ denoting the regularization parameter that controls the relative weight of the data fidelity and regularization terms. The regularization function in the studies below is chosen to be the total-variation (TV) seminorm (Chambolle et al 2010) Here, [p 0 ] n denotes the nth grid node, and [p 0 ] n1 − , [p 0 ] n2 − , and [p 0 ] n3 − are the neighboring nodes before the nth node along the first, second and third dimension, respectively. The fast iterative shrinkage/thresholding algorithm (FISTA) with backtracking linesearch and adaptive restart was employed to solve the optimization problem in equation (5) (O'donoghue and Candes 2015, Beck and Teboulle 2009b, 2009a.

Parameterized joint reconstruction (JR)
A JR method for transcranial PACT that circumvents the need to know the elastic properties of the skull is described below.

Parameterized representation of the elastic properties of the skull
To constrain and stabilize the JR problem, a low-dimensional parameterized representation of the spatially variant elastic properties of the skull is considered. Consider the vector c ≡ [ρ, α, λ, μ] T ∈ R 4N×1 that represents the concatenation of the four material properties defined on a 3D grid as described above. The vector c p ≡ [ρ p , α p , λ p , μ p ] T ∈ R 4Q×1 represents the parameterized representation of the elastic properties of the medium, where Q < N. Here, the subscript p is used to represent the parameterized representation of the elastic properties of the medium. The original and parameterized elastic properties of the medium can be related as where Φ is a mapping that relates the two representations. While equation (8) permits for many possible parameterized representations, the studies below will focus on the case where the skull properties at each pixel in the computational grid corresponds to one of 4Q possible values. In this case, Φ ∈ R 4N×4Q is a real-valued binary matrix defined as where I j denote the set of indices of pixels that correspond to the jth parameter value, and [Φ] i, j denotes the (i, j) element of the matrix Φ.

Formulation of JR problem
By use of the parameterization in equation (8), the JR problem can be formulated as The quantities p 0 , c p , and c ≡ Φ c p denote estimates of p 0 , c p , and c, respectively. The estimate c p in equation (10) is constrained to lie in the closed, convex set where c max i and c min i represent the maximum and minimum value associated with the ith parameter. The computation of the gradient ∇ c F(p 0 , Φc p ) can be accomplished by use of the adjoint state method described in appendix A. To mitigate the significant computational and memory burden involved in this computation, a binomial checkpointing method can be employed (Symes 2007, Yang et al 2016, Griewank and Walther 2000. Given ∇ c F(p 0 , Φc p ), the gradient of the data fidelity term with respect to c p can be expressed as For the choice of Φ defined in equation (9), one obtains According to equation (13), the magnitude of the jth component of ∇ c p F(p 0 , Φc p ) will depend on the cardinality of the indicator set I j , which represents the number of pixels in c whose values are specified by the jth component of c p . This can render the gradient poorly scaled, resulting in a slow convergence. To mitigate this, a diagonal, positive definite scaling matrix is introduced and defined as Algorithm 1. Parameterized joint reconstruction method.
This matrix can be employed to generate a new search direction B∇ c p F(p 0 , Φc p ) that represents the average gradient over the pixels associated with each component of c p . This new search direction is always a descent direction because B is positive definite by construction.
Other choices for B could result in even faster convergence; however, the choice given by equation (14) requires little additional computational cost and ensures that the magnitude of each gradient component does not depend strongly on the number of pixels corresponding to each parameter.

Reconstruction algorithm
Equation (10) can be solved by use of the weighted proximal gradient descent method summarized in algorithm 1. This algorithm contains two major parts, one in which p 0 is updated via a proximal gradient descent step and one in which c p is updated via a weighted gradient descent step. This is possible since the constraint and non-smooth regularization function R only involve p 0 . For brevity, the following notation is employed to describe the algorithm: where the superscript k refers to the kth iteration of the algorithm. In addition, the notation prox tR (·) denotes the proximal operator of tR and is defined as A two-parameter line search method, described in appendix C, can be employed to choose separate scalar step sizes for updating the initial pressure and sound speed distributions. The action of the projection operator P c p in algorithm 1 on an element of the vector c p is defined as (17)

Description of computer-simulation studies
Three-dimensional computer-simulation studies were performed to validate the proposed JR algorithm.  , Estrada et al 2018, Chang et al 2016. It is assumed that both the inner and outer table are made up of cortical bone that have distinct acoustic and elastic properties compared to the diploë layer and the background fluid medium (Estrada et al 2018, Chang et al 2016. Furthermore, the diploë layer is comprised of a layer of trabecular spongy bone that is described by another distinct set of acoustic parameter values (Estrada et al 2018, Chang et al 2016. Hence, in this form of parameterization, it is assumed that Q = 3 and c p ∈ R 12×1 . The mapping function Φ in equation (9) employed for this low dimensional skull model was generated based on 3D x-ray CT images of a human skull. The CT images from an intact human skull were employed to infer the thickness and contour of the inner table, outer table and the diploë layer. To extract the contour and location of the skull layers from CT images, a segmentation algorithm was employed. The segmentation algorithm generated a mask specifying the location of the inner table, outer table and the diploë layer within the 3D volume. A 2D slice of the CT image acquired from the human skull and the corresponding mask generated by use of the segmentation algorithm are shown in figures 1(a) and (b), respectively. Using the 3D segmented CT skull, the mapping function Φ for Q = 3 was established.
The second low-dimensional skull model also assumes that the skull can be characterized by a three layer model. However, in this model, it is assumed that the cortical bone comprising the inner table and outer table can be further divided into distinct sub-types, the frontal cortical bone, the left parietal cortical bone and the right parietal cortical bone, as illustrated by figure 2(a). In figure 2(a), the maximum amplitude projection of a 3D CT image of the skull along the z-direction is shown. From figure 2(a), one can observe the presence of three bones types fusing along the skull sutures. For illustration purposes, the three cortical bones are segmented and are shown in figure 2(b). The cortical bones are now described as the frontal cortical bone, left parietal cortical bone and right parietal cortical bone. Each of these bone types are assigned distinct acoustic and elastic parameter values. Similar to the Q = 3 model, it is also assumed that the diploë layer is comprised of a layer of trabecular spongy bone that is  (b) A segmented maximum amplitude projection of the 3D x-ray CT images of the skull illustrating the various cortical bone types. The frontal cortical bone is represented by the blue color, the left parietal cortical bone is represented by the red color, and the right parietal cortical bone is represented by the green color. described by another distinct set of acoustic parameter values. Hence, for this low-dimensional representation of the skull, it is assumed that Q = 5 and c p ∈ R 20×1 .
Similar to the Q = 3 skull model, the mapping function Φ employed in the Q = 5 was generated based on 3D x-ray CT images of a human skull. The CT images from an intact human skull were employed to infer the thickness and contour of the frontal cortical bone, left parietal cortical bone, right parietal cortical bone and the diploë layer. In order to extract the contour and location of the skull layers from CT images, a segmentation algorithm was employed. The segmentation algorithm generated a mask specifying the location of the frontal cortical bone, left parietal cortical bone, right parietal cortical bone and the diploë layer within the 3D volume. A 2D slice of the CT image acquired from the human skull and the corresponding mask generated by use of the segmentation algorithm are shown in figures 3(a) and (b), respectively.

Initial pressure distribution phantom.
A numerical blood vessel structure extracted from magnetic resonance angiography (MRA) data was employed as a numerical phantom that represented p 0 . The blood vessel phantom, shown in figure 4, was placed 6 mm below the inner surface of the skull. 4.1.3. Imaging system. The 3D hemispherical measurement system shown in figure 5(a) was assumed in the computer-simulation studies. The measurement system consisted of a total 38 400 ideal, point-like transducers that were distributed evenly in a spiral configuration over a hemispherical surface of radius 115.2 mm. The position of the measurement system with respect to the skull is shown in figure 5(b).

Simulation of pressure data.
In order to avoid inverse crime (Colton 2013), the measured pressure data were simulated with different temporal and spatial sampling rates than those employed for image reconstruction. When generating the measured data, the simulation grid consisted of a 3D Cartesian grid with 896 × 896 × 448 pixels and a pixel size of 0.3 mm. The pressure data were recorded for 7000 time steps at a temporal sampling rate of 100 MHz. Additive Gaussian white noise with zero mean and a standard deviation of 10 percent of the maximum pressure amplitude was added to the measured data. A MPI-based FDTD numerical wave solver  was employed to simulate the measured pressure data.
Two sets of pressure data were generated for the simulation studies. The first set of pressure data were generated using the initial pressure distribution shown in figure 4. Moreover, the spatial distribution of the acoustic properties of the skull employed for the generation of first set of pressure data correspond to the Q = 3 skull model. In this skull model, the diploë layer, cortical bone layers and the background fluid media were assigned distinct acoustic property values. The second set of pressure data were generated using the same initial pressure distribution. However, the pressure data was generated by use of the Q = 5 skull model. In this skull model, the diploë layer, the left parietal cortical bone, the right parietal cortical bone, the frontal cortical bone and the background fluid media were assigned distinct acoustic property values. The acoustic property values assigned to each of the parameters to generate the two sets of forward pressure data are summarized in table 1.

Image reconstruction.
During image reconstruction, the computational grid corresponded to a 3D Cartesian grid with 448 × 448 × 224 pixels and a pixel size of 0.6 mm. The pressure wavefield was simulated for 3500 time steps with a temporal sampling rate of 50 MHz. During the image reconstruction process, the pressure wavefield was simulated by use of a MPI-based FDTD numerical wave solver , implemented using NVIDIA's CUDA framework (CUDA C Programming Guide 2015). The image reconstruction algorithms were implemented by use of four NVIDIA V100 GPU cards. The gradient of the cost functional with respect to p 0 was computed using the action of the matched adjoint operator, while the gradient of the cost functional with respect to c was computed using the adjoint state method, described in appendix A. 4.1.6. Reconstruction of p 0 when the acoustic properties of the skull are known. The initial pressure distribution was reconstructed for a fixed c ≡ Φc p for both the Q = 3 and Q = 5 cases specified in table 2. The values for c p specified in table 2 were also used as initial guess for  the JR algorithm. The forward pressure data were generated by use of both the Q = 3 and Q = 5 skull models. The initial pressure distribution was then reconstructed by solving the optimization problem described in equation (5) for both the Q = 3 and Q = 5 skull models. The TV-regularization parameter (γ) was tuned so as to minimize the root mean squared error (RMSE) between the reconstructed initial pressure distribution and the ground truth initial pressure distribution. The value of the regularization parameter should, ideally, be determined so as to maximize a task-based image quality metric. Appendix B contains images corresponding to different values of the regularization parameter and a relevant discussion about parameter tuning. The initial guess for initial pressure distribution was the vector of all zeros. The stopping criterion employed for the algorithm was when the relative change in cost function between successive iterations was less than 10 −6 . The average time per iteration for the image reconstruction algorithm was approximately 8 min, and the results shown in section 5.1 converged within 350 iterations.

Joint reconstruction.
For all the computer-simulation studies, the density and the absorption coefficient were assumed to be known. Hence, the JR problem was restricted to the problem of estimating the spatial distribution of the longitudinal sound speed of the medium (c l ), the spatial distribution of the shear sound speed of the medium (c s ) along with p 0 . It has been shown that the spatial distribution of the density and the absorption coefficient of the skull can be estimated using raw 3D x-ray CT images expressed in Hounsfield units (Aubry et al 2003). The methodology outlined by Aubrey et al (Aubry et al 2003) was employed to compute and assign the density values and the longitudinal wave absorption coefficient values to the low-dimensional acoustic representation of the skull. The mapping function Φ employed for the Q = 3 and Q = 5 skull models were generated using the segmented 3D x-ray CT images of the skull. The initial guess for p 0 was the vector of all zeros. The initial guess for the parameterized sound speed distribution depended on the number of parameters employed and is summarized in table 2. The two versions of c p employed for the reconstruction of the initial pressure distribution in section 4.1.6 were employed as initial guesses to the JR algorithm. The stopping criterion employed for the JR algorithm was the same as the one employed in reconstruction of p 0 for known acoustic properties of the skull. Similar to section 4.1.6, the TV-regularization parameter (γ) in the JR algorithm was tuned so as to minimize the RMSE between the reconstructed initial pressure distribution and the ground truth initial pressure distribution. The average time per iteration for the joint image reconstruction algorithm was approximately 24 min, and the results shown in section 5.2 converged within 300 iterations.

Reconstruction of p 0 when the acoustic properties of the skull are known
The maximum intensity projection (MIP) images of the estimates of p 0 reconstructed by solving equation (5) when the Q = 3 and Q = 5 skull models were employed are shown in figures 6 and 7. Here, inverse crime with respect to the dimension of the skull model occurred when solving the reconstruction problem. From figures 6 and 7, it can be observed that the images reconstructed by solving the optimization problem defined in equation (5) contains artifacts. The artifacts are present because the acoustic property values assigned to the skull when reconstructing the initial pressure distributions were inaccurate. This demonstrates that image quality degrades considerably when the acoustic property values assigned to the parameters in the low-dimensional skull model are inaccurate. The artifacts present in figures 6 and 7 motivate the need to develop JR algorithms whereby the acoustic property values associated with the low-dimensional representation of the skull can be refined concurrently with p 0 .

Joint reconstruction
The 3D MIP images of the estimates of p 0 reconstructed by use of the JR method from the pressure data generated corresponding to the Q = 3 and Q = 5 skull models are shown in figures 8 and 9, respectively. Here, inverse crime with respect to the dimension of the skull model occurred. For the pressure data generated using Q = 5 skull model, JR of p 0 and the acoustic properties of a Q = 3 skull model was also performed. This study, i.e. non-inverse Figure 6. The maximum intensity projections of the reconstructed initial pressure distribution along the (a) x-axis, (b) y-axis and (c) z-axis, respectively. The initial pressure distribution was reconstructed using a conventional (non-JR) iterative image reconstruction algorithm for a fixed a Q = 3 skull model. The maximum intensity projections of the reconstructed initial pressure distribution along the (a) x-axis, (b) y-axis and (c) z-axis, respectively. The initial pressure distribution was reconstructed using a conventional (non-JR) iterative image reconstruction algorithm for a fixed Q = 5 skull model. crime with respect to the dimension of the skull model, was performed to help elucidate if performing JR with an inaccurate lower dimensional parameterized model had significant effect on the image quality of the reconstructed estimates of p 0 . In figure 10, MIP images of the reconstructed p 0 estimated with a JR algorithm employing a Q = 3 skull model and utilizing pressure data generated using a Q = 5 skull model, is shown. Also, shown for comparison in figures 8(d)-(f ) and 9(d)-(f ) are the MIP images of p 0 reconstructed using the half-time backprojection (BP) method . The uniform SOS value employed in the method was chosen by considering a range of values and selecting the value that gave us the best reconstructed image quality, as subjectively judged via visual inspection. Heuristic methods such as the half-time BP methods have been demonstrated to be effective for the case of fluid media containing weak acoustic heterogeneties; however, they are expected to be generally ineffective in the presence strong acoustic heterogeneities such as the skull. This is evident Figure 8. The maximum intensity projections of the reconstructed 3D initial pressure distribution. The first row of images correspond to the initial pressure distribution reconstructed using a parameterized JR algorithm with a Q = 3 skull model. The pressure data used to produce the images were generated using a Q = 3 skull model. The second row of images correspond to initial pressure distribution reconstructed using a half-time reconstruction method. Each column of images correspond to MIP projections along x-axis, y-axis, and z-axis, respectively.
when comparing the quality of the images reconstructed using the parameterized JR algorithm and the half-time BP method in figures 8 and 9. The uniform SOS values of 1.602 mm μs and 1.610 mm μs were used to reconstruct the images shown in figure 8 and 9, respectively. Comparing the reconstructed images shown in figure 6 with figure 8, and the images shown in figure 7 with figure 9, one can observe that the use of the proposed JR algorithm qualitatively improves the image quality of the reconstructed initial pressure distribution. It can also be observed from figure 10 that allowing the JR algorithm to effectively tune the acoustic properties of the medium, even when the parameterization is crude, can improve the image quality associated with the reconstructed initial pressure distribution. In addition to qualitative improvements, the proposed JR algorithm also quantitatively improves the accuracy of the estimated initial pressure distribution as demonstrated by the line profiles shown in figures 11 and 12. Table 3 summarizes the values of the acoustic properties of the skull estimated using the JR algorithm for the two skull models. In addition, table 3 also summarizes the values of the Figure 9. The maximum intensity projections of the reconstructed 3D initial pressure distribution. The first row of images correspond to the initial pressure distribution reconstructed using a parameterized JR algorithm with a Q = 5 skull model. The pressure data used to produce the images were generated using a Q = 5 skull model. The second row of images correspond to initial pressure distribution reconstructed using a half-time reconstruction method. Each column of images correspond to MIP projections along x-axis, y-axis, and z-axis, respectively. Figure 10. The maximum intensity projections of the reconstructed initial pressure distribution along the (a) x-axis, (b) y-axis and (c) z-axis, respectively. The initial pressure distribution reconstructed using a parameterized JR algorithm with a Q = 3 skull model. The pressure data used to produce the images were generated using a Q = 5 skull model.  acoustic properties of the Q = 3 skull model estimated when applying the JR algorithm on the data generated using a Q = 5 skull model. Comparing the results shown in table 3 with the original values shown in table 1, one observes that the JR algorithm can accurately estimate the acoustic and elastic property values associated with the low-dimensional skull model. When non-inverse crime with respect to the skull model was performed, i.e. when the JR algorithm with a Q = 3 skull model was applied on the pressure data generated utilizing the Q = 5 skull model, the true value of the acoustic properties of the skull were not recovered due to the choice of the parameterization. However, in this case, the JR algorithm estimated the effective/average value of the acoustic properties of the medium for certain regions.

Conclusion
In this work, a JR method for transcranial PACT was proposed in which the initial pressure distribution and the spatial distribution of the acoustic properties of the skull are jointly estimated from PACT measurements alone. This approach overcomes the instability associated with the JR problem from PACT measurements by utilizing a low dimensional representation of the spatial distribution of the acoustic properties of the skull. The developed reconstruction methodology was validated by use of computer-simulation studies that mimicked transcranial PACT experiments.
The computer-simulation studies compared and contrasted the performance of the proposed JR algorithm with a conventional optimization-based image reconstruction algorithm that assumed a fixed acoustic skull model. It was demonstrated that the conventional optimization-based image reconstruction algorithm yielded images with significant artifacts when the assumed skull model was inaccurate. The proposed JR algorithm, on the other hand, was able to refine the spatial distribution of the acoustic properties of the skull concurrently with the initial pressure distribution to yield accurate, high quality images of the initial pressure distribution. In addition, the proposed JR algorithm was also able to accurately estimate the acoustic property values associated with the low-dimensional skull model.
In addition, parameterized JR may offer advantages compared with manually tuning a parameterized representation of the low-dimensional skull model, particularly as the number of parameters in the skull model is increased. The number of parameters employed in the skull model is a design parameter that needs to be carefully considered. When the assumed parameterization is too simple to describe the true variations in acoustic properties within the skull, model error will lead to errors in the reconstructed images. When the assumed parameterization is very complex, the inverse problem may be poorly conditioned and may have many local minima or saddle points. Future studies can focus on considering a range of different parameterizations with different numbers of parameters to carefully examine the trade-off between model error and the conditioning of the inverse problem.
There remain several additional topics for further investigation. For instance, the parameterized low-dimensional representations of the skull developed in this work were developed based on 3D x-ray CT images of the skull. Hence, adjunct image data that provided information about the contour, location of the skull was required. There remains a need to investigate other sources of adjunct image that can help provide the contour, location of the skull without employing ionizing radiation. In addition, future investigations should also include validating the proposed JR algorithm using experimental data. Validating the proposed JR algorithm using experimental data will provide insight into the complexity of skull model required for accurate reconstruction of initial pressure distribution in transcranial PACT. Furthermore, future work can also focus on developing more efficient optimization methods for solving the JR problem, such as accelerated first-order or second-order methods, that would reduce the computational cost.

Acknowledgments
This work was supported in part by National Institutes of Health Grants R01 NS102213, T32EB01485505 and National Science Foundation Grant DMS1614305. The authors also thank Professor Lihong V Wang for his years of collaboration that inspired this work.

Appendix A. Application of the adjoint state method to the elastic wave equation
Here, the application of the adjoint state method for computing the gradients of the data fidelity term given in equation (6) with respect to the spatial distribution of the acoustic parameters of the elastic medium is described (Fabien-Ouellet et al 2016, Tromp et al 2005.
For simplicity, the presentation focuses on the first-order velocity-stress formulation of the elastic wave equation for transcranial PACT given in equation (2). The approaches for the second-order elastic wave equation follow the same basic method and have been detailed previously (Tromp et al 2005, Plessix 2006, Norton 1999. To start, consider the constraints correspond to the elastic wave equation given by equation (2), where σ andu are the state variables and C is a 4D tensor that describes the elastic moduli of the media which are the sought-after model parameters. A method for computing the values of the stress tensor and particle velocity given the elastic moduli and initial stress distributions, i.e. a method for solving the elastic wave equation, was previously discussed in section 2.2. Consider the following functional to be minimized: where p (r, t) is the pressure recorded over all transducer and M is the restriction of the diagonal components of the stress tensor over the whole domain to the pressure at the location of the transducers. Hence, the operator M computes the trace of the stress tensor over the whole grid and maps it to the pressure at the transducer locations. Note that this function depends only on the stress tensor and not directly on the particle velocity. The corresponding augmented functional is given by are the inner products for vector-valued and tensor-valued quantities, respectively. Here, n is the number of components in the vector-valued quantity, and x i represents the ith component of x. Furthermore, let m be the rank of the tensor and a i j represents the (i, j) element of the tensor a.
To compute the derivatives needed to solve the adjoint state equations, it is useful to rearrange the constraint terms. This is done by use of integration-by-parts and by the fact that the adjoint of the gradient operator is the negative divergence operator. With this, the first and the second term in the Lagrangian corresponding to equation (A.1a) can be rewritten as The third term in the Lagrangian corresponding to equation (A.1a) can be rewritten as Using integration by parts Similarly, the first term in the Lagrangian corresponding to equation (A.1b) can be rewritten as The second term in the Lagrangian corresponding to equation (A.1b) can be rewritten as To obtain an expression for the adjoint state equation, the computed ∂L ∂σ and ∂L ∂u are set to zero and the terms are grouped by their time dependence. This yields ρ∂ t γ 1 − ραγ 1 + ∇ · (C : γ 2 ) = 0 ( A . The set of differential equations in equation (A.6) bear a close resemblance to the first-order acoustic wave equation in equation (2). This suggests that a solution to these differential equations could possibly be computed by a similar approach. However, one difficulty is that equation (A.6) involves final conditions on φ 0 and φ 1 rather than the initial conditions specified in equation (2). A change of variables can be employed to change the final conditions to initial conditions: This gives a set of differential equations for the adjoint state variables that can be solved by the same method as employed to solve equation (2): In an isotropic media, C i j;kl (r) = κ(r) − 2 3 μ(r) δ i j δ kl + μ(r)(δ il δ jk + δ ik δ jl ), where κ(r) is the bulk modulus and μ(r) is the shear modulus. The bulk modulus is related to the Lamé parameters by the following relation The only term in the Lagrangian dependent on the elastic moduli of the medium is the term The maximum intensity projections of the reconstructed 3D initial pressure distribution using the JR algorithm with a Q = 5 skull model. The first row of images correspond to images reconstructed using a TV regularization value of γ = 0.02, the second row of images correspond to images reconstructed using a TV regularization value of γ = 0.05 and the third row of images correspond to images reconstructed using a TV regularization value of γ = 0.1. Each column of images correspond to MIP projections along x-axis, y-axis, and z-axis, respectively. random access memory for all time points and its entire history is made accessible when the adjoint problem is solved for efficient gradient computation (Matthews 2017, Matthews et al 2018, Huang et al 2016. For large scale problems, such as the 3D transcranial PACT problem, the storage requirement for gradient computation is severe. Hence, if the commonly employed strategy of storing forward state variables is adopted, it will require the algorithm to store the wavefield in the slowest levels of memory hierarchy such as a hard disk drive. This need for very large amounts of read/write operations performed on hard disk complicates the logistics