Photoacoustic computed tomography for functional human brain imaging [Invited

: The successes of magnetic resonance imaging and modern optical imaging of human brain function have stimulated the development of complementary modalities that offer molecular specificity, fine spatiotemporal resolution, and sufficient penetration simultaneously. By virtue of its rich optical contrast, acoustic resolution, and imaging depth far beyond the optical transport mean free path ( ∼ 1 mm in biological tissues), photoacoustic computed tomography (PACT) offers a promising complementary modality. In this article, PACT for functional human brain imaging is reviewed in its hardware, reconstruction algorithms, in vivo demonstration, and potential roadmap


Introduction
With approximately 100 billion neurons and 100 trillion connections, the brain remains one of the greatest mysteries and challenges in science and medicine [1,2]. Despite the tremendous advances in neuroscience, the underlying causes of many neurological and psychiatric disorders, such as Parkinson's disease, Alzheimer's disease, autism, and schizophrenia, remain largely unknown due to the limited information extractable from the brain and the limited knowledge to interpret the extracted information [2][3][4]. To catalyze neuroscience discovery, one primary goal of the community is to develop imaging tools that provide rich functional contrast, fine spatiotemporal resolution, and sufficient penetration.
Several neuroimaging technologies have been developed. Blood-oxygen-level dependent (BOLD) functional magnetic resonance imaging (fMRI) at high or ultrahigh fields has made tremendous improvements in spatiotemporal resolution, allowing brain function to be investigated at the level of cortical layers and columns [5][6][7][8]. However, the BOLD signal shows a nonlinear relationship with the deoxyhemoglobin (HbR) concentration and suffers substantial tissue background [9,10]. Moreover, the MRI system's bulkiness, high cost, and requirement of confining the subject in a noisy and magnetic enclosure limit use for certain subjects and activities [11]. Alternatively, positron emission tomography (PET) can image neurometabolism but requires radioactive tracers and lacks fine spatiotemporal resolution [12]. Diffuse optical tomography (also termed functional near-infrared spectroscopy (fNIRS)) is advantageous in molecular specificity, temporal resolution, cost, and portability but suffers from low spatial resolution due to light diffusion [13,14]. Electroencephalography (EEG) and magnetoencephalography (MEG) can detect electrical activities of neurons at a high speed but with poor spatial resolution [15]. Functional ultrasound (fUS) has been demonstrated as a valuable tool for monitoring newborns' brain activities through the fontanelles, to reveal adult brain micro-vasculature and hemodynamics through the temporal bone by the use of microbubbles, and to monitor brain activities during brain surgery [16][17][18][19][20]. Nevertheless, fUS lacks molecular specificity, and the round-trip acoustic attenuation and aberration induced by the skull remain challenges for label-free imaging. Generally, fUS is less sensitive to blood vessels running perpendicularly used as a virtual point detector to improve the spatial resolution and mitigate reconstruction artifacts ( Fig. 1(b)) [37]. The reconstructed image revealed the major cortical blood vessels. However, due to the lack of elevational resolution, the image was blurred by the PA signals outside the scanning plane. The same virtual detector concept was applied to a spherically focused transducer which offered both in-plane and elevational focusing [40]. The system was demonstrated to image through an ex vivo adult human skull of 4-9-mm thicknesses [40]. The researchers also used a photon recycler to improve the illumination efficiency and achieved a 2.4-times signal-to-noise ratio (SNR) improvement [40]. In this setup, the photon recycler also served as an acoustic barrier preventing superficial PA signals from propagating to the detector, resulting in further improved elevational resolution. The acquired image revealed a canine brain's major cortical vessels ( Fig. 1(c)). (a) PACT of the rat brain cortex through the skull and skin using a scanned cylindrically focused ultrasonic transducer [36]. (b) PACT of the monkey brain cortex through a monkey skull using a scanned cylindrically focused ultrasonic transducer [37]. (c) PACT of the canine brain cortex through an adult human skull using a scanned spherically focused ultrasonic transducer [40]. (d) (1) The ring-shaped animal PACT system was used to image (1) the mouse brain cortex through the skull in vivo, (2) the coronal plane of the mouse brain through the skin and skull in vivo, (3) the whole mouse brain ex vivo, and (4) the axial plane of the trunk with USCT and PACT in vivo [41][42][43][44][45][46]. (e) A wearable PACT device for imaging the brain of a free-moving rat [47]. (f) PACT of the human breast using a full-ring ultrasonic transducer array [48]. (g) PACT of the deep mouse brain using a scanned linear ultrasonic probe [49]. (h) PACT for brain surgical guidance using a linear ultrasonic probe without scanning [50]. In (b) and (c), the scalp was shown in the schematics but not present in the ex vivo experiments. Adapted with permission [36,37,[40][41][42][43][44][45][46][47][48][49][50].
To increase the imaging speed, researchers replaced the scanned single-element transducer with a ring-shaped confocal transducer array, which has become the primary form of modern 2D PACT systems ( Fig. 1(d1)) [41,42,46,[51][52][53]. Since each transducer element was cylindrically focused in the elevational direction with a small azimuthal dimension, no virtual point detector was assumed for high-quality imaging reconstruction. With 512 elements and 1:8 electronic multiplexing, a full-ring PACT system of a 5-MHz central frequency was developed to image the brain cortex at a 1-Hz frame rate ( Fig. 1(d2)) [42]. Other forms of ring-shape systems, including half-ring confocal transducer arrays consisting of 64 elements and a 3/4-ring confocal transducer array with 512 elements, were also reported to image brain cross-sections at a 10-Hz frame rate ( Fig. 1(d3)) and whole-brain morphology ( Fig. 1(d4)) [51,52]. Later, researchers developed full-ring systems with one-to-one-mapped acquisition channels to overcome the limited-view issues associated with the partial-ring systems and the limited imaging speed due to electronic multiplexing [43,41,46,53]. Some of these systems were also equipped with pulse-echo ultrasound imaging capability, enabling both ultrasound computed tomography (USCT) and PACT [54,46,43]. By incorporating the USCT-measured speed of sound (SOS) into the adaptive PACT reconstruction algorithms, the image distortion induced by the acoustic heterogeneity of biological tissues was substantially suppressed ( Fig. 1(d5)) [46,54]. To monitor brain function in awake animals, a miniaturized PACT device consisting of three ring-shaped 64-element transducer arrays was developed, allowing the brain to be simultaneously imaged at three elevational planes ( Fig. 1(e)) [47]. A scaled-up version of the animal full-ring system was later employed for human breast imaging ( Fig. 1(f)) [48]. However, the transducer elements were designed to be flat and unfocused in the elevational direction due to the large array diameter-the focal distance of an element with a limited elevational dimension is far from the array center. Although the unfocused ring array was unequipped with fine elevational resolution, it was scanned along the elevational direction to form a cylindrical detection aperture to improve the elevational resolution. Therefore, the 2D breast ring-array system still holds the potential for human brain imaging.
Off-the-shelf linear ultrasonic transducer arrays were also explored for 2D PACT. Functional imaging of the deep mouse brain was reported using a 15-MHz linear ultrasonic probe scanned around the animal's head ( Fig. 1(g)) [49]. Scanning a linear probe is analogous to using a full-ring array, except that the former can provide denser sampling along the azimuthal direction and requires fewer data acquisition channels at the cost of imaging speed. A linear probe was also demonstrated for surgical guidance in an ex vivo human skull with tooltip illumination (Fig. 1(h)) [50]. However, the spatial resolution of a linear probe (without scanning) is highly anisotropic due to the limited view, leading to considerable reconstruction artifacts [31,[55][56][57][58]. Another major drawback of using off-the-shelf probes is that they usually do not have pre-amplifiers closely connected to the transducer elements, compromising the SNR [56].
Owing to its simplicity, relatively low cost, and acceptable image quality, a 2D PACT system, especially the full-ring array system, is promising in functional human brain imaging. Nevertheless, three potential challenges need to be considered for in vivo studies. First, without elevational scanning, a large-diameter unfocused full-ring system (e.g., the breast system in Fig. 1(d)) cannot differentiate the brain signals from the scalp signals due to the poor elevational resolution. Adding elevational scanning improves the elevational resolution but decreases the imaging speed. Second, the scalp signals can propagate into the skull and cause reverberation, fundamentally a three-dimensional (3D) problem. Third, a 2D system is less sensitive than an ideal 3D imaging system.

3D PACT systems
Hemispherical detection aperture is considered the optimal and practical solution for 3D human brain PACT because it can provide 2π-solid-angle detection and accommodate the shape of the head. A system that employed a sparse hemispherical array was reported to scan around its central axis for mouse brain imaging and along a 3D spiral trace for trunk imaging ( Fig. 2(a)) [34,59]. A system with a similar transducer element arrangement but a larger array diameter was reported for human breast and extremity imaging ( Fig. 2(b)) [60,61]. The transducer array was scanned along a 2D spiral pattern. It provided dense sampling, allowing a high-quality image to be reconstructed. The key limitations of the system were its long acquisition time (120 s) and shallow penetration (<10 mm) [60]. To adapt the system to human brain imaging, a longer laser wavelength (e.g., 1064 nm) can be adopted to improve the imaging depth. Also, the array can be scanned for a smaller FOV to improve the imaging speed. Another system composed of 16 arc ultrasonic transducer arrays, each consisting of 32 elements of 1-MHz central frequency, was reported to image the human breast ( Fig. 2(c)) [62]. The system combined fixed and dynamic light sources to improve illumination uniformity. A similar but economic version was also reported for human breast imaging. It scanned one quarter-ring transducer array for ultrasound detection (Fig. 2(d)) [63]. The array was made of 96 wideband transducer elements, allowing for denser sampling in the elevational direction and higher resolution. Two fiber bundles were rotated together with the transducer array to illuminate the breast. For both systems in Figs. 2(c) and (d), the illumination strategies provided uniform illumination, but the inconstant light energy distribution at each scanning position violated the constant initial PA pressure assumption defined by the 3D image reconstruction algorithm [64]. Thus, the illumination needs to be fixed or numerically compensated for if the systems are used for functional human brain imaging. Recently, another similar system using fixed illumination was reported for ex vivo transcranial imaging ( Fig. 2(e)) [65]. A quarter-ring transducer array of 64 elements centered at 1 MHz was scanned to form a hemispherical detection aperture. The acquired data were also used to evaluate some advanced de-aberration reconstruction algorithms to be discussed in Section 3 [66]. More recently, a massively parallel system was developed to demonstrate in vivo functional human brain imaging for the first time [27,28]. The system was made of 1024 parallel detection channels evenly distributed on four quarter-ring arrays separated at 90 degrees. By mechanical scanning, it acquired a structural image in 10 s and a functional volumetric image in 2 s. More details regarding that study can be found in Section 4.  [59]. (b) A sparse hemispherical array for human breast and extremity imaging [60]. (c) A system composed of 16 arc arrays for human breast imaging [62]. (d) A system with a quarter-ring array for human breast imaging [63]. (e) A system with a quarter-ring array for transcranial imaging [65]. (f) A 1024-element parallel system for functional human brain imaging [28]. Adapted with permission [59,60,62,63,65,28].
The transcranial PA spectrum has been measured to peak at ∼0.75 MHz using 1-MHz ultrasonic transducers [65]. Given that these transducers have maximum sensitivity at 1 MHz, which is higher than 0.75 MHz, one can infer that the limiting factor for detecting higher-frequency PA signals is the skull-induced frequency-dependent attenuation rather than the transducer's limited bandwidth. In other words, using higher-frequency transducers will not improve the detection of transcranial PA signals. On the other hand, although using lower-frequency transducers can improve the detection of transcranial PA signals, the upper bandwidth of such transducers is not high enough to resolve superficial features, such as scalp vessels and boundaries, for use in skull co-registration and modeling. Therefore, the 1-MHz center frequency, which optimizes the tradeoff between the acquisition of transcranial PA signals and the resolution for superficial features, is desirable for transcranial human-brain PACT.
Other forms of 3D PA imaging systems include acoustic-resolution photoacoustic microscopy (AR-PAM), optical-resolution photoacoustic microscopy (OR-PAM), and Fabry-Pérot etalon sensor-based PACT scanner, among others [67,68]. Typical AR-and OR-PAM form images by scanning a single-element transducer in a 2D plane and reconstructing depth information based on the signal arrival time. Because cross-sectional images are acquired, but an inverse reconstruction problem is not involved, PAM systems can be referred to as photoacoustic tomography (PAT) but not computed tomography. Depending on the transducer frequency, AR-PAM can typically image ∼3 mm deep, whereas OR-PAM can image ∼1 mm deep [67]. The Fabry-Pérot etalon PACT uses an optical ultrasonic transducer and reconstructs images using the time reversal algorithm [68]. Overall, these systems are considered unsuitable for noninvasive human brain imaging due to their limited FOVs, low imaging speeds, and limited penetration.

Ultrasonic transducer technologies
Based on the sensing mechanism, ultrasonic transducers fall into two main categories: electric transducers and optical detectors. Electric transducers can further be classified as conventional piezoelectric transducers, piezoelectric micromachined ultrasonic transducers (PMUTs), and capacitive micromachined ultrasonic transducers (CMUTs) [69]. Based on the detection strategies, optical ultrasonic detectors can be classified as interferometric detectors and refractometric detectors [70]. Figure 3 shows the representative structures of electric transducers and optical detectors. Piezoelectric transducers are most often used [71]. In fact, the systems in Figs. 1 and 2, except Fig. 2(b), all adopted piezoelectric transducers (the transducer array in Fig. 2(b) was made of CMUTs). Compared with piezoelectric transducers, PMUTs and CMUTs are compatible with application-specific integrated circuits (ASICs), which may benefit future PACT systems [72]. Recently, transparent CMUTs have also been reported, potentially permitting more efficient illumination and detection for PACT [73][74][75]. Various types of optical detectors with excellent performance have been developed based on optical fibers [76], free-space optics [68], and polymer waveguides [77]. Benefitting from the modern semiconductor infrastructure, miniaturized optical detectors have recently been made into photonic chips, allowing for submicrometer element size [78] and fine pitch arrays with parallel readout [79]. Nevertheless, compared with electric transducers, optical detectors offer a relatively low sensitivity for an optimal half-wavelength detection area at frequencies below ∼1 MHz [80,81].
For functional human brain imaging, which aims to detect PA signal changes of a few percent, SNR is the most critical design criteria [27,28]. The SNR is mainly determined by the transducer sensitivity and the transducer array's filling factor. For a PACT system, although the transducer elements must be small enough to provide a sufficiently large coverage angle, the element size need not be smaller than half wavelength, because an overly small element size does not improve imaging performance. Therefore, practitioners should consider whether a specific technology allows an element size of half wavelength to be manufactured and evaluate the sensitivity at the optimal half-wavelength active area. Piezoelectric transducers can be made to optimal sizes, and their sensitivities are scalable with the active areas. For CMUTs and PMUTs, although their basic unit is a cell of dozens or hundreds of micrometers, multiple cells can be densely packed into an element of arbitrary size or shapes using the microfabrication process [82,83]. A typical cell filling factor of a CMUT or PMUT element is ∼0.4 [84]. Also, the sensitivities of CMUTs and PMUTs are scalable with their active areas. In contrast, an optical detector's sensitivity is generally independent of its active area. For a 1-MHz (upper cut-off frequency) PACT system, an element size of ∼0.75 mm should be used. Under this condition, the sensitivities, defined as the noise equivalent pressures per square root of unit bandwidth, are ∼1.5 mPa/ √ Hz for piezoelectric transducers [81], ∼1.4 mPa/ √ Hz for CMUTs [84], ∼0.3 mPa/ √ Hz for PMUTs [85], and ∼2.0 mPa/ √ Hz for optical detectors [68,80]. Overall, piezoelectric transducers, PMUTs, and CMUTs are all competitive in human brain PACT. However, PMUTs and CMUTs have not been widely commercialized, and the device performance may vary with wafer substrates, leading to potential challenges in fabricating large arrays with hundreds to thousands of device chips.

Properties and numerical models of the human skull
For transcranial PACT, the presence of the skull has both optical and acoustic effects. At 956-nm optical wavelength, the effective attenuation coefficient of the human skull was measured to be ∼1.9 cm -1 , corresponding to a transmittance of ∼0.5 for a bone thickness of 0.4 cm [86]. A higher transmittance is expected at 1064 nm due to the lower scattering coefficient. 1064 nm is the preferred optical wavelength for transcranial PACT. First, the dominant absorber at 1064 nm, HbO 2 , can be directly measured with a single wavelength. Second, high-energy pulsed Nd:YAG lasers at 1064 nm are widely available. Third, the ANSI safety limits allow for a higher maximum permissible exposure at 1064 nm than at a shorter wavelength [87]. Like transcranial pulse-echo ultrasound imaging, the major obstacle for transcranial PACT is the skull-induced acoustic aberration. Unlike conventional ultrasound imaging, PACT does not suffer speckle artifacts and only experiences one-way acoustic aberration, posing a more straightforward problem to solve [88,89]. Since the development of transcranial PACT reconstruction frameworks mainly centers around the skull-induced acoustic problems, we will discuss the skull acoustic properties and numerical models in this section before reviewing the image reconstruction algorithms in Section 4.

Human skull properties
The adult human cranial bone is a sandwiched structure made of the inner table, outer table, and diploe layer in the middle (Fig. 4) [90]. The inner and outer tables are cortical bones anatomically different from the diploe layer, which is trabecular (cancellous) bone. For infants under about four years old, the cranial vaults are typically unilaminar cortical bones [91]. The cortical bone is a solid and dense material with a density ranging from 1.8 to 2.2 g/cm 3 [92]. The trabecular bone consists of lamellar packets of irregular plates and rods, among which are fluid bone marrow cells [92]. The trabecular bone is highly heterogeneous, anisotropic, and porous. The element size is between 50 and 150 µm with a separation distance between 0.5 and 2 mm [93]. The trabecular bone's density falls between 0.3 and 1.3 g/cm 3 with an average porosity (void fraction) of 24% [90,93]. Since the cortical bone's density is higher than that of the trabecular bone, acoustic energy is mostly absorbed by the cortical bone [94,95]. At 1 MHz, the cortical bone's acoustic attenuation coefficient is ∼2.7 dB/cm for longitudinal waves and ∼5.4 dB/cm for shear waves. For solid bones, the phase velocities of longitudinal and shear waves are ∼3000 m/s and ∼1500 m/s, respectively [90,96]. The cranial bone's thickness is dependent on the anatomical location and the subject's gender and age. Figure 5 maps the average skull thicknesses based on the data acquired from 76 people, including 66 males and 10 females aged from 10 to 60 [97]. The temporal and parietal bones demonstrate relatively small thicknesses, making them a potential sally port for transcranial PACT.

Skull-induced acoustic aberration
The human skull presents impedance mismatch with the ambient soft tissues and coupling media and distorts acoustic amplitudes and phases. Figure 6 categorizes the skull's acoustic effects into three main categories: attenuation, waveform distortion, and signal contamination.
Wave reflection, absorption, and scattering are the main attenuation causes. Reflection results from the acoustic impedance mismatch between the cortical bone and ambient soft tissues at the inner and outer skull boundaries. Absorption is induced by the conversion of mechanical energy to heat in the cortical bone, whereas acoustic scattering mainly occurs in the diploe layer. Over the diagnostic ultrasound frequency range, the acoustic absorption can be described by the frequency power law [98,99]: where r ∈ V represents the spatial location, f denotes frequency in MHz, α 0 (r) is the frequencyindependent absorption coefficient in Np/MHz y /cm (1 Np/MHz y /cm = 8.686 dB/MHz y /cm), and y stands for the power law exponent (typically between 0.9 and 2.0) [98]. For a homogeneous attenuation coefficient of 2.7 dB/MHz 2 /cm and a scattering coefficient of 5.5 dB/cm [90], the transmittance of a normally incident acoustic wave was estimated at 0.75 MHz for different skull thicknesses in Fig. 7. Scattering and reflection are shown to dominate the attenuation, and the transmittance is ∼22% for a skull thickness of 4 mm. For a PA source in the skull, the acoustic waves' incident angles will differ at different skull locations, resulting in different transmission coefficients at the skull boundaries (to be shown in Fig. 8). The average acoustic transmittance measured over a hemispherical detection aperture that partially enclosed the skull was ∼4%, corresponding to ∼80% pressure attenuation [65]. Waveform distortion is another acoustic effect of the skull. Signal broadening is induced by the frequency-dependent attenuation. The relationship between acoustic dispersion and acoustic absorption is governed by [100] 1 where c 1 and c 2 are the sound speeds at frequencies f 1 and f 2 , and y ≠ 1. An alternate expression for y = 1 is available [101]. Equation (2) also indicates that when the absorption coefficient is frequency square-dependent (y = 2), acoustic dispersion is negligible. Wave refraction occurs at the skull boundaries due to the sound speed mismatch between the ambient soft tissues and the cortical bone. When acoustic waves propagate from the soft tissues to the bone or vice versa, wave conversion occurs at their interfaces due to the support of shear stress by the bone. The conversion introduces acoustic distortion due to the different phase velocities of longitudinal (compressive) and shear waves. The degree of wave mode conversion is dependent on the angles of incidence and refraction at the inner and outer skull boundaries (Fig. 8) [65,102]. At the soft-tissue-skull interface, the critical angle for an incident longitudinal wave is approximately 33°, below which the incident wave is partially converted to longitudinal and shear waves [65]. Above the critical angle, only shear waves are converted and transmitted into the skull. The pressure transmission coefficient can be greater than unity due to the higher acoustic impedance of the longitudinal wave in the skull than in the soft tissues, although the intensity transmittance can never surpass unity. There is no critical angle for the longitudinal wave at the skull-soft-tissue interface due to the higher phase velocity of longitudinal waves in the skull than in the soft tissues. The critical angle for the shear wave at the skull-soft-tissue interface is around 75°. Acoustic scattering in the diploe layer scrambles the waves. The degree of scattering depends on the ratio between the scales of the trabecular units and the acoustic wavelength, which typically falls between ∼0.025 and ∼0.075 for longitudinal waves at 0.75 MHz [90,93,96]. Another type of waveform distortion is the reverberation of the brain signals inside the skull, which prolongs the detected signals. Due to the exponential decay of optical fluence with depth, a large amplitude difference exists between the PA signals from the scalp and those from the brain. Similar to the brain signals' reverberation, these superficial signals propagate into the skull and reverberate, contaminating the signals from the brain in the temporal domain. One can visualize the impacts of the human skull on transcranial photoacoustic imaging in a recent simulation work [103].

Numerical models of the skull
Obtaining a numerical model of the skull is essential for de-aberrating the transcranial PA signals, similar to the situations for transcranial high-intensity focused ultrasound (HIFU) [104][105][106].
Theoretically, x-ray CT is most suitable for depicting cortical bone structures, but it has limitations for in vivo applications due to the ionizing radiation exposure [107]. In comparison, MRI is intrinsically less suitable for depicting cortical bone structures due to the low proton density (20% of water) and short signal lifetime (390 ms for T2 at 3T) [108]. Nevertheless, with the development of ultra-short echo time (UTE) and zero time-to-echo (ZT) sequences, MRI has been demonstrated to provide imaging characteristics ideally suited for depicting and segmenting the cranial bone structures [109][110][111][112]. Figure 9(a) displays the skull images acquired using MRI (ZT sequences; inverse logarithmic scaling; bias correction) and x-ray CT, demonstrating an excellent agreement [111]. An approximately linear correlation was also observed between the MRI and x-ray CT images for Hounsfield numbers between 300 HU and 1,500 HU ( Fig. 9(b)) [111]. Since an MRI image is convertible to a Hounsfield map using such a linear relationship, the skull's acoustic parameters can be retrieved based on the relationship between Hounsfield numbers and acoustic properties [112,113]. Alignment between the lab coordinates and the numerical skull model can be performed using the superficial blood vessels imaged by both MRI angiography and PACT [27]. When MRI or x-ray CT is used to improve transcranial PACT, PACT still holds great value in potential applications, such as routine assessment of post-operative brain restoration, studying brain networks during social interaction, which is impractical using a closed-bore MRI or x-ray CT machine, and non-invasive brain-computer interface. In these potential applications, the MRI or x-ray CT scanning only needs to be carried out once. However, the ideal form of transcranial PACT should be independent of MRI or x-ray CT.  (1) was used as the initial guess for AWI (2), which was later used as the preconditioner for FWI (3). The ground truth is displayed in (4). Adapted with permission [111,114].
To obviate the reliance on MRI or x-ray CT, we envisage two alternative approaches to modeling the skull. First, a PACT system can be designed to enable simultaneous USCT. It has been demonstrated that the combination of transmitted transcranial USCT with adaptive waveform inversion (AWI) followed by full-wave inversion (FWI) can produce a sub-millimeter-resolution skull model with a sufficient SNR using a sub-MHz ultrasound frequency (Fig. 9(c)) [114]. FWI is an iterative reconstruction technique originally developed by the petroleum industry to image hydrocarbon reservoirs and requires a relatively accurate initial guess of the model [115]. AWI is a modified form of FWI and is less sensitive to the preconditioner but typically cannot provide as well resolved models as those produced by FWI. Therefore, using the AWI-generated model as a preconditioner for FWI can improve the skull model quality [114]. Figure 9(d) displayed the recovered acoustic properties of the skull by FWI and AWI using USCT. In this approach, the concurrent measurements also allow for automatic co-registration between the USCT and PACT images. However, the major challenge associated with this approach is the considerable computational time. Another method of modeling the skull is to use a skull thickness atlas statistically built based on the established database, such as Fig. 5 [116]. The inner boundary of the scalp imaged by PACT can be treated as the skull's outer boundary [65]. Combining the outer boundary and the thickness, one can estimate the skull's inner boundary. However, this subject-nonspecific method does not measure the exact acoustic properties or geometries of each skull and may need parameter tuning and iterative reconstruction to achieve a sufficient reconstruction quality. Alternatively, this approach can potentially be combined with the first approach by using the atlas as a preconditioner for FWI, where one may not only increase the accuracy of the initial guess but also reduce the computational time by skipping the AWI step.

Computational frameworks for transcranial PACT
Given the considerable progress in solving the inverse problem of transcranial PACT, a survey on this topic may facilitate comparison among different algorithms and catalyze the implementation of the developed algorithms in solving in vivo problems. This section is restricted to the problem of estimating the initial acoustic pressure distribution. Recovering the optical properties of the brain, which is not necessarily required in quantifying the PA signals' fractional changes for functional imaging, will be touched on briefly in Section 5.

PACT forward model in a lossy and acoustically heterogeneous fluid medium
To describe the power-law absorption and dispersion effects, wave equations that employ timedomain fractional derivative operators were proposed [98,99,117,118]. However, the operators posed a significant memory burden for numerical implementation [119]. To overcome the issue, a wave equation that modeled the power law absorption using fractional Laplacian operators was proposed [120]. Later on, the dispersion term was introduced into the equation of state via another fractional Laplacian operator [121]: where p(r, t) is the acoustic pressure, c 0 (r) denotes the SOS in an adiabatic system, and ρ p (r, t) stands for the acoustic density (sound-induced perturbation of the fluid density from its ambient value ρ 0 (r)). The second and third terms on the right-hand side account for the required power law absorption and dispersion. τ(r) and η(r) describe the acoustic absorption and dispersion proportionality coefficients defined as [121] τ(r) = −2α 0 (r)c 0 (r) y−1 , η(r) = 2α 0 (r)c 0 (r) y tan (︂ πy 2 Additionally, the following two equations govern Newton's second law and conservation of mass: where u(r, t) denotes the particle velocity. The initial conditions are forms the forward wave model of PACT in a lossy and heterogeneous fluid medium.
Defining the pressure recording domain as R and the recording positions r ′ ∈ R, the initial pressure distribution p 0 (r) can be mapped to the measured time-varying pressure distribution p(r ′ , t) using a forward operator F: For a lossless and acoustically homogeneous infinite medium, Eq. (4) has the explicit form [64]: where c 0 is the constant SOS, and δ(·) is the Dirac delta function. One can rewrite Eq. (5) as where g(r ′ , t) = ∫ V d 3 rp 0 (r)δ(c 0 t − |r ′ − r|) represents the spherical Radon transform connecting g(r ′ , t) with the integral of rp 0 (r) over a spherical surface that has a radius of c 0 t.

PACT forward model based on elastic wave equations
The previous section assumed a lossy and heterogeneous fluid medium where the shear stress or viscosity was ignored by the fluid wave equations. To describe the propagation of transverse shear waves in the bone, a more accurate reconstruction method can be developed from the elastic wave equations. In the elastic wave equations, instead of using scalar pressure, a 3×3 stress tensor matrix is defined: Here, acoustic absorption within the skull is assumed to be frequency invariant [90]. This assumption is reasonable because the ultrasonic transducer's bandwidth limits the bandwidth of the detected PA signals. Defining u = [ u 1 (r, t), u 2 (r, t), u 3 (r, t)] as the particle velocity vector andα(r) in s −1 as the acoustic absorption coefficient which represents the damping force at unit particle speed for unit mass, the wave propagation can be modeled by the following two equations [122][123][124][125]: ∂u(r, t) ∂t +α(r)u(r, t) = 1 ρ 0 (r) (∇ · σ(r, t)), (7-b) ∂σ(r, t) ∂t = λ(r)tr(∇u(r, t))I + µ(r)(∇u(r, t) + ∇u(r, t) T ), (7-c) subject to the initial conditions: In Eq. 7-c, λ(r) and µ(r) represent the Lamé parameters describing the full elastic tensor of the linear isotropic medium. The tr(·) operator calculates the trace of a matrix, and I is the identity matrix. It should be noted that the initial pressure p 0 (r) is compactly supported in a fluid medium with shear modulus µ(r) = 0 [126]. Similar to Eq. (4), the initial pressure distribution p 0 (r) can be mapped to the measured pressure distribution p(r ′ , t) using a forward operator H: p(r ′ , t) = Hp 0 (r).
H depends implicitly on the discretized shear and longitudinal sound speed distribution, density distribution, and absorption distribution. These quantities can be specified by a skull model, which can potentially be obtained from other imaging modalities, such as MRI, x-ray CT, or ultrasound tomography [65]. The effects of the transducer's electrical and spatial impulse responses were ignored in Eqs. (3) and 7, but they can be incorporated readily [35,66,127].

Discretization of the forward models
To solve the wave equations using numerical methods, the detected pressure and object function need to be both temporally and spatially discretized, and the material parameters need to be spatially discretized. Numerical methods for solving the acoustic wave equations fall into three main categories: finite difference (FD) methods, finite element (FE) methods, and spectral methods [35,128,129]. FD and FE methods are local because the wave equations are solved at each point based on its neighboring points. Spectral methods are global as they employ the information from the entire wavefield, allowing computation to be performed on coarser grids without losing accuracy [129,130]. For transcranial PACT reconstruction, the skull boundaries need to be carefully handled because of the sharp transition of acoustic properties. For FD and spectral methods, a constant grid shape and size are commonly used across the entire finite domain [66,126,131]. To better capture the skull curvature, denser grids need to be defined, which poses a computational challenge. For the FD time-domain (FDTD) method, the widely adopted staggered grid finite difference scheme requires the skull boundaries to be numerically smoothed [35,66,90,126,127]. The effects of smoothing on reconstruction accuracies may require further evaluation. For spectral methods, when the entire finite domain is heterogeneous with a smooth transition or simply homogeneous, the solution can be of high accuracy [129]. However, the sharp acoustic property transition at the skull boundaries may violate the assumptions. FE methods can handle sharp boundaries well using adaptive meshing [128]. However, compared to FD and spectral methods, FE methods require a longer computation time given the same degrees of freedom [132]. Although the sharp skull boundaries can potentially restrict the implementation of a particular numerical method, no study has compared the reconstruction accuracies using the three methods side by side.

Image reconstruction
Comprehensive reviews of PACT reconstruction methods can be found in several review articles [35,[133][134][135]. The universal back-projection algorithm has been the most popular method for PACT reconstruction [64,133,65,33,47,60,62]. Other methods include the inverse Radon transform and time-domain delay-and-sum (beamforming) techniques [136,137]. These canonical methods assume a lossless and acoustically homogeneous medium and that the PA signals are densely sampled on a surface enclosing the object. To compensate for the acoustic heterogeneity induced by the skull, ray-based methods were proposed to divide the wave propagation into layers and account for the skull-induced acoustic aberration in each layer individually (Fig. 10) [65,138]. These approximate methods were computationally efficient and improved the reconstruction quality to some extent but could not separate the longitudinal and shear waves or correct for reverberation. Additionally, to prevent a single back-projected ray from crossing two or more layers, a limited number of virtual detectors in one layer were used to estimate the pressure in the next layer, introducing partial-view problems. Frequency-domain reconstruction methods have also been studied. Certain detection apertures can be implemented efficiently using the fast Fourier transform [139][140][141]. Nevertheless, frequency-domain reconstruction methods also need to assume an acoustically homogeneous medium. Time reversal reconstruction algorithms form a PACT image by running a numerical model of the forward wave equations backward in time [142][143][144]. When the detection aperture fully encloses the object and the sampling time is sufficiently long for the PA waves to completely escape the detection enclosure, time reversal yields a theoretically exact reconstruction of the object function p 0 (r) [145]. Like the back-projection methods, a partial-view detection aperture leads to information incompleteness and an inexact solution. However, time reversal methods can compensate for the medium loss using gain that inverts the attenuation [144]. For example, to account for the acoustic absorption, the absorption terms in Eqs. (3)-a and 7-b can be reversed in signs. However, care should be taken when performing compensation because numerical instability may occur. Heterogeneity in sound speeds can also be incorporated into time reversal methods [143,146]. The universal back-projection and time reversal algorithms were compared by imaging two line-shape targets through an ex vivo money skull (Fig. 11) [113]. PACT reconstruction can also be implemented in a lossy elastic medium by directly applying the adjoint of the discretized H operator to the discretized recorded pressure data. The exact form of the discrete and continuous adjoint operators have been derived, making this approach readily available and computationally efficient [126,147]. Compared with the universal back-projection algorithm, the adjoint reconstruction can effectively mitigate skull-induced wavefront aberration (Fig. 12) [126]. However, the adjoint operator does not compensate for the acoustic attenuation. Instead, the attenuation term applies the attenuation to the measured pressure signals twice if not turned off.  12. (a) Photograph of a skull-mimicking plastic globe with "blood vessels" painted on the inner surface. Ex vivo PACT images were reconstructed through the globe using (b) the universal back-projection algorithm and (c) the adjoint operator. Adapted with permission [126].
When the adjoint operator does not produce adequate image quality, the ability to compute the adjoint operator facilitates gradient-based iterative algorithms [66,126]. In fact, forming the reconstruction problem as an optimization problem has been widely adopted in modern medical imaging modalities, such as x-ray CT and PET [148][149][150]. An iterative reconstruction algorithm that seeks to compute the penalized least-squares estimates has been reported [66]: where vectors p and p 0 represent the discretized observation data (after being deconvolved with the system responses) and discretized initial pressure, respectively. γ is the regularization parameter, and R(·) is a regularizing penalty term, such as the total variation penalty [66]. The quantity ∥·∥ 2 W stands for a weighted l 2 norm where the diagonal weight matrix W contains elements inversely proportional to the variance of the measurement data. Modern iterative algorithms, such as the fast iterative shrinkage/thresholding algorithm (FISTA), can be implemented with parallel computing to improve computational speed [66,[151][152][153][154][155]. The main advantage of the optimization-based approach over the time reversal approaches is that it offers the flexibility to mitigate the effects of data incompleteness via the regularization term. Also, numerical stability issues that exist in time reversal can be mitigated [66,144]. Figure 13 compares the reconstructed images using the adjoint and optimization-based methods [66]. Given the potential estimation errors of the skull's acoustic properties, one can employ a joint reconstruction approach to further improve the image quality by concurrently optimizing the PACT initial pressure distribution and the skull acoustic parameters. Based on the elastic forward models, a PACT joint reconstruction problem can be described as [127,156] (p 0 ,ĉ l ,ĉ s ,ρ 0 ,α) = argmin p 0 ,c l ,c s ,ρ 0 ,α ∥︁ ∥︁ p − Hp 0 ∥︁ ∥︁ 2 W + γR(p 0 ), whereĉ l ,ĉ s ,ρ 0 , andα denote the estimates of the longitudinal wave velocity c l , shear wave phase velocity c s , density ρ 0 , and acoustic attenuation α, respectively. The algorithm iteratively optimizes the skull acoustic parameter estimates and the initial pressure distribution until convergence [127]. The gradients of the cost function can be computed using the adjoint operator [126]. Since this method allows the acoustic properties and initial pressure distribution to be estimated simultaneously, a more accurate estimation of the initial pressure distribution can be achieved. Figure 14 compares the images reconstructed using the universal back-projection, conventional optimization-based (Eq. (9)), and joint reconstruction methods based on simulation.
Recently, the Bayesian framework was proposed for PACT reconstruction [157][158][159][160]. It defines all parameters as random variables. The measurements, the model, and the prior information about the parameters were analyzed using maximum a posteriori estimate to solve the inverse problem. Figure 15 displays the mouse head images reconstructed using the Bayesian framework and time reversal. The Bayesian approach is promising for transcranial PACT as it accounts for the uncertainties in parameters, models, and geometries and demonstrates advantages over time reversal algorithms when the detection view is limited. However, further development is required before it can be applied to an acoustically absorptive and heterogeneous medium.
In addition to the purely model-based algorithms, data-driven techniques, especially deep learning (DL), have been increasingly investigated for PACT reconstruction [161][162][163][164][165]. Driven  by data, DL learns end-to-end transformations without the need for explicit definition of an analytical model. Training the DL network can also be understood as an optimization problem related to the aforementioned Bayesian framework [161]. So far, DL has been successfully applied to reconstruct PACT images from limited-band/limited-view/sparse measurements given forward operators [135,[166][167][168] and to approximate inverse operators, which otherwise involved solving the forward and adjoint problems repetitively [169,170]. Although the use of DL for transcranial PACT has not been established, some reported works are relevant and potentially transferrable to transcranial PACT. For example, an encoder-decoder network was developed to account for the acoustic and optical attenuation in the deep tissue during PACT reconstruction [171], a U-net-based convolutional neural network (CNN) was proposed to correct for the SOS-heterogeneity-induced aberration in PACT images [172], several DL networks were designed to produce full-bandwidth output signals from limited-band raw signals to improve the reconstruction resolution [166,173], and various DL networks were employed to remove the reconstruction artifacts or denoise the measured data [174][175][176]. In another relevant work, the vector space similarity (VSS) model was used in conjunction with a simulated training data set to correct for the skull-induced distortion in transcranial PAM [177]. However, this method is not strictly DL due to the lack of a layered network. For the problem of DL-based transcranial PACT reconstruction, two major issues remain to be addressed. One is the lack of paired training data which are essential for fully supervised training. Two approaches can potentially overcome this issue. Experimentally, an acoustic point source can be scanned over a volume inside the ex vivo skulls to acquire the location-dependent point spread functions. The acquired point source data can be synthesized to form arbitrary features to mimic realistic targets. The second approach uses pure simulation, where the ground truth can be extracted from the clinical x-ray CT or MRI images. By simulating the forward problems using these features as the initial pressure sources, one can create training data for the network. Except for the fully supervised training, a semi-supervised approach can also be considered. For example, physics-informed neural networks can be developed by directly incorporating the physical models into the loss function such that the network learns much of the physics by the terms in the loss function [161]. The other issue is the lack of a data-consistency term, which may cause challenges in assessing the reconstruction correctness. As a result, further networks are expected to consider uncertainty and provide an uncertainty estimate of the reconstruction. For instance, null-space projection can be used to ensure data consistency after postprocessing in DL-based PACT reconstruction [178]. Overall, given the many opportunities that a network can be incorporated into the reconstruction pipeline, it is expected that DL will play a revolutionary role in transcranial PACT reconstruction.

PACT of human brain function
Studies of functional brain PACT have mainly been reported in animal models, for which several comprehensive review articles can be consulted [22,45,179,180]. For human brain PACT, researchers have mainly focused on ex vivo structural imaging, and in vivo functional imaging had not been demonstrated until recently [27,28]. In that study, a newly developed 2.12-MHz 1024-element parallel system was reported. The system costs ∼US $0.44 million before tax, which is less expensive than a 7T MRI system (∼US $6.5 million for Siemens Terra) [181]. The PACT system was used to image the functional responses of hemicraniectomy human subjects. Although the unique subject population provided an acoustic window allowing for aberration-free image reconstruction, the achieved functional results revealed PACT's capability in mapping human brain function with comparable performance to 7 Tesla MRI-the current gold standard in the clinic. It is believed that some of the results and methodologies from that study can inspire future related research. Figure 16 shows a subject being imaged by the PACT system [27]. Supine, lateral, and prone imaging positions were adopted to optimize the tradeoff between comfort and stability [27,28]. A custom-designed head stabilizer was used to reduce the motion artifacts during scanning. Two laser wavelengths, 1064 nm from an Nd:YAG laser fired at 10 Hz and 694 nm from a ruby laser fired at 1 Hz, were employed to excite PA signals from endogenous hemoglobin (Hb). The measurement started with the baseline mode to acquire brain angiography followed by the functional mode to image brain function. The scanning configuration resulted in a 10-cm-diameter FOV on the head, an isotropic spatial resolution of 350 µm, and a temporal resolution of 10 s (1064 nm) and 100 s (694 nm) for the baseline mode and 2 s for the functional mode. Once configured to acquire data at a 20-MHz sampling frequency, 12-bit resolution, and 2000-point sampling length, one functional volumetric scan produces a data size of around 60 MB. If the reconstruction is performed at a volume size of 100×500×500 (resulting in a FOV of 20×100×100 mm 3 at a voxel size of 0.2×0.2×0.2 mm 3 ) with two-times azimuthally interpolated channel data, the GPU-accelerated computational time is around 140 s (GeForce GTX 1080 Ti, Nvidia, Corp.) Fig. 16. A subject's head being imaged by the PACT system. Adapted with permission [27].

Imaging schemes
Five block-designed benchmark motor and language tasks were investigated. They were finger tapping, lip puckering, tongue tapping, story listening, and word generation. N = 4 subjects were recruited for the study, and most tasks were repeated for three times. The BOLD fMRI results non-concurrently obtained using a 7 Tesla scanner were used to validate the PACT functional results.

Functional results
For both PACT and BOLD fMRI, the brain activities were extracted based on the PA signals measured at the 1064-nm optical wavelength at each voxel based on the General Linear Model and presented in t-scores on the top of T1-weight MRI images ( Fig. 17(a)) [27,28,182]. The maximum imaging depth was ∼19 mm below the skin surface or ∼11 mm under the cortical surface. An average dice coefficient and spatial correlation coefficient of ∼0.4 among all motor tasks indicated a fair-to-moderate agreement. An average center-of-mass error of ∼6 mm among all motor tasks demonstrated acceptable variations. For story listening, an average dice coefficient and spatial correlation coefficient of ∼0.5 indicated a moderate agreement. An average center-of-mass error of ∼6 mm of both language tasks denoted acceptable localization discrepancies. The quantified correspondence between the two modalities' functional results demonstrated that PACT could provide comparable performance to 7 T fMRI for functional human brain imaging.  [27]. Adapted with permission [27].
The authors quantified the fractional changes of the directly measured PA signals ( Fig. 17(b)) and the derived Hb concentration signals (Fig. 17(c)). When quantifying the fractional Hb concentration changes, the authors compensated for the optical fluence using the prior knowledge of the wavelength-dependent optical properties of the scalp and brain layers, which had been measured up to 30 mm in depth using diffuse optical imaging based on layered optical models [183]. However, the quantification was not validated against the ground truth, which was generally acquired invasively [184]. Other methods for estimating the optical fluence with a higher accuracy include modeling the light propagation as a relationship between the medium's optical properties and the light fluence, followed by an iterative calculation of the spatial distribution of the optical absorption coefficient [185,186]. However, such an approach suffers computational instability, especially in in vivo situations. Another approach, termed eigenspectra optoacoustic tomography, models optical fluence as an affine function of a few reference base spectra and has demonstrated 3-8-fold estimation improvement for imaging tissues at depths greater than 5 mm [187]. It was also shown to be independent of the specific distribution of optical properties. However, given the case-by-case differences in optical properties of the brain tissue, the preliminary results of simulated targets in the mouse brain require further validation with a larger pool of tissue physiology interrogations [188,189]. Overall speaking, the problem of optical fluence quantification in deep tissues has not been conclusively solved due to the dependency of light fluence on the medium's wavelength-dependent and location-dependent scattering and absorption, as well as the variance of optical properties between subjects [45].

SNR analysis
The functional data acquired in the hemicraniectomy subjects can be used to estimate the SNR of transcranial PACT, which has not been demonstrated in vivo. As shown in Fig. 18, the SNR at ∼11 mm below the cortical surface was measured to be ∼50, corresponding to ∼2% detectability of signal changes [27,28]. Based on the discussion in Section 3, the SNR of transcranial PACT is compromised by the skull-induced optical attenuation, acoustic attenuation, and acoustic waveform distortion (Fig. 6). To detect the functional changes of several percent, the hardware needs to be further improved, and a system with a lower central frequency (e.g., 1 MHz) is preferred [65]. Given the ∼50% optical attenuation and ∼80% acoustic pressure attenuation for a 4-mm skull thickness, the total PA signal attenuation induced by the skull will be ∼90%. As a result, the SNR will decrease to ∼5, corresponding to ∼20% detectability for transcranial imaging. However, the linear dimension of a 1-MHz transducer element will be doubled compared to the element size in this study, improving the SNR by a factor of ∼2.1. Besides, if the radiant exposure can be increased to the maximum permissible ANSI safety limit, another factor of ∼4 SNR improvement is expected [87]. Finally, if the element count of the transducer array can be doubled to 2048, the SNR can be further improved by ∼1.4 times. Since the dependence of the SNR on the system parameters, such as the ultrasonic transducer element size, element count, central frequency, and scanning time, holds the same for different systems, the above SNR is considered transferable to other PACT systems. Indeed, the thickness differs among skulls and locations on the skull, but one may estimate the SNR based on the skull properties summarized in Figs. 5 and 7. Currently, the functional signals are from hemoglobin. However, one may potentially estimate the SNR of other targets as long as its absorption coefficient is known. Notably, the SNR cited in this review is mainly for evaluating the feasibility of transcranial PACT. With more in vivo studies carried out by various labs, the SNR can be cross-validated.
Overall, given the above potential hardware improvement, the SNR of a future transcranial PACT system can potentially reach ∼59 at ∼10 mm below the cortical surface. One should note that the presented SNR analysis treated acoustic scattering as loss. However, if the next-generation reconstruction algorithms can incorporate the acoustic scattering effect, the SNR can be further increased by a factor of ∼1.3 (Fig. 7), resulting in an SNR of ∼77 at ∼10 mm below the cortical surface. SNRs of ∼59 and ∼77 allow for the detectability of ∼1.7% and ∼1.3% PA signal changes, respectively. For example, to detect 1% functional changes, the former SNR requires three times of averaging (SNR increases by √ 3), while the latter SNR needs approximately two times of averaging (SNR increases by √ 2), allowing for faster detection of function. On the other hand, since the SNR was predicted based on the assumption of successful correction of the skull-induced acoustic scattering, the reconstructed structural image is expected to present higher contrast. However, such a correction algorithm is still under development.

Summary and outlook
This article reviewed and discussed the technical state of the art and challenges of translating PACT to functional human brain imaging. A human brain PACT system was suggested to employ a 3D hemispherical detection aperture with massively parallel and sensitive ultrasonic transducers with a ∼1-MHz center frequency. The skull's acoustic properties and the mechanism behind skull-induced acoustic aberration were discussed, suggesting that the skull remained the main obstacle for functional human-brain PACT. Several numerical skull modeling approaches were envisaged to correct for the skull-induced acoustic aberration, including subject-specific and subject-non-specific ones. The subject-specific approaches involved x-ray CT, MRI, or USCT, and the subject-non-specific method was based on the skull atlas. Their feasibility was discussed, suggesting that the USCT-based modeling method could provide the optimal tradeoff between modeling accuracy and system complexity. Conventional reconstruction algorithms and modern full-wave-based acoustic de-aberration algorithms were reviewed, suggesting that optimization-based de-aberration approaches can potentially provide the best reconstruction quality but with considerable computational cost. Alternatively, although data-driven techniques have not been implemented in transcranial PACT reconstruction, they are expected to play a revolutionary role in this problem due to the high computational speed and potentially high tolerance to skull modeling errors. To date, functional human-brain PACT has not been realized in healthy adults, and in the authors' opinion, the primary goal should be as simple as to demonstrate the detectability of any functional changes on the superficial cortex. Based on the preliminary functional results acquired on the hemicraniectomy human patients and ex vivo skulls, the feasibility of transcranial functional PACT was analyzed, suggesting that with potential improvement in the hardware and reconstruction methodologies, an SNR of ∼77 at ∼10 mm below the cortical surface is achievable. The corresponding detectability enables 1.3% functional changes to be revealed. In conclusion, although functional human-brain PACT has been considered one of the most challenging directions in the photoacoustic field, the authors believe it is a feasible direction and look forward to seeing breakthroughs in the foreseeable future.