Feature coupling photoacoustic computed tomography for joint reconstruction of initial pressure and sound speed in vivo

: Photoacoustic imaging relies on diffused photons for optical contrast and diffracted ultrasound for high resolution. As a tomographic imaging modality, often an inverse problem of acoustic diffraction needs to be solved to reconstruct a photoacoustic image. The inverse problem is complicated by the fact that the acoustic properties, including the speed of sound distribution, in the image field of view are unknown. During reconstruction, subtle changes of the speed of sound in the acoustic ray path may accumulate and give rise to noticeable blurring in the image. Thus, in addition to the ultrasound detection bandwidth, inaccurate acoustic modeling, especially the unawareness of the speed of sound, defines the image resolution and influences image quantification. Here, we proposed a method termed feature coupling to jointly reconstruct the speed of sound distribution and a photoacoustic image with improved sharpness, at no additional hardware cost. Simulations, phantom studies, and in vivo experiments demonstrated the effectiveness and reliability of our method.

fat) to 1,700 m/s (for skin) [10], making the images reconstructed with the uniform SOS assumption subject to feature splitting, distortion, blurring, low contrast and poor quantitation. Therefore, taking into account the SOS distribution in PA image reconstruction is highly desirable.
With an extra acoustic source and its stepwise rotation, SOS distribution can be recovered through the analysis of the acoustic time of flight (TOF), at the expense of system complexity and imaging speed [11][12][13][14][15][16]. Alternatively, it is more tempting to jointly reconstruct the distributions of the SOS and the PA initial pressure (IP) by minimizing the difference between model-based predictions and experimental data, without resorting to additional SOS measurement hardware [17][18][19][20][21]. Unfortunately, most of these methods were only validated in simulations. Huang et al. proved that the conventional joint reconstruction problem is illconditioned, and numerically unstable [21]. The accuracy of the SOS estimation sensitively relies on the IP quantification accuracy, with the latter being hampered by the limited detection angle and temporal bandwidth of the sensors in real practice [21]. The instability of the linearized joint reconstruction method was also observed in [22].
There are some methods that mitigate the influence of the SOS heterogeneity, in which absolute quantification of IP is not necessary. For example, in half-time and partial-time reconstruction, a portion of the data is discarded [23][24][25]. Zhang et al. used signal correlation to obtain approximate TOF for each detector [26]. However, this method has an assumption that the detected target is relatively small compared to the detection geometry (i.e., the radius of the detection surface), because the integral surface can be approximated to a plane (in 3D) or a line (in 2D) and the calculation of time of flight between opposing detector pair can work. Moreover, the TOF of one point is calculated through linear scaling, which leads to large error if the SOS variation is not moderate. This method was tested in numerical experiments. Autofocusing finds a uniform SOS value for the whole imaged object, including the background, by optimizing the sharpness [27] or other image metrics [28]. However, the effectiveness of such methods are fundamentally limited when the SOS distribution is complex. Moreover, autofocusing is not an iterative search method based on gradient. It has to perform an exhaust search at a number of discrete SOS values, so the accuracy is influenced by the preset range and step. In a numerical experiment, Zhang et al. [29] obtained a single SOS value by minimizing the discrepancy between half-time reconstructions. This method is not based on gradient search either and its computational complexity depends exponentially on the number of unknowns.
We herein propose a simple and robust method to jointly reconstruct the SOS and IP distributions based upon only the PA data. We take advantage of the fact that any subsets of a full transducer array produce images with slightly displaced features, if the SOS distribution is not properly assigned during image reconstruction. Consequently, the image reconstructed from the full array will exhibit features that are split in various directions. Our method, termed feature coupling (FC), iteratively optimizes an SOS distribution by maximizing the similarity of images from the partial arrays. The optimization process gives rise to an IP image with minimum blur and simultaneously, a SOS map with high likelihood. One advantage of FC is its insensitivity to the quantification error in the IP. Technically, we adopted the gradient ascent with momentum algorithm in the iteration to accelerate convergence, and to avoid local optima. Furthermore, within each iteration, we designed and implemented accelerated fast marching method (AFMM) to accurately calculate TOF maps given SOS distributions of arbitrary complexity. The TOF maps are used in the successive image reconstruction. Being two hundred times faster than the original fast marching method, AFMM iteratively updates the TOF on a coarse grid in batch first, then on a fine grid through interpolation. We demonstrated the effectiveness and robustness of FC through numerical simulations, phantom studies, and in vivo experiments, on a panoramic (360° in-plane detection) PAT system. Similar systems have shown significance in both preclinical [30] and clinical applications [31].

Accelerated fast marching method
Filtered back-projection [32] is a commonly used method for IP reconstruction. Traditional back-projection assumes a uniform SOS distribution. For acoustically heterogeneous media, the calculation of TOF needs to take into account distinct SOS values of different part, and the formula of back-projection can be modified as in which 0 dΩ is the solid angle for a detection element at 0 r with respect to a reconstruction point at r, and 0 ( , ) TOF r r is the TOF between r and 0 .
r For an ideal detection (i.e., transducers respond instantly), the back-projection term takes the following form: In reality, the measurement signal 0 ( , ) S r t can be directly adopted as the back-projection term, that is [33], The fast marching method (FMM), which is based on the Eikonal equation (a non-linear partial differential equation whose solution tracks the TOF of a monotonically advancing front), can iteratively calculate the first arrival time of the photoacoustic wave with a known SOS distribution [34,35]. The Eikonal equation [34] is formulated as where ( ) T r is the TOF map and v is the SOS map. To solve Eq. (4), the domain is discretized into a uniform grid with a spacing h. Due to the monotone and causal properties of this discretization, let Eq. (4) turns to be a quadratic equation [34], whose solution is > assuming one of the partial derivatives is 0, the resolution is , , However, the original FMM has high computational complexity because it has to maintain the narrow band implemented using a sorted heap data structure [34]. To solve this problem, an accelerated version of the FMM, i.e., AFMM, is proposed. In this method, the slowly varying TOF can be calculated on a coarse grid first, and the corresponding TOF on a fine grid can be obtained through interpolation. Since the ray path is reversible, each sensor point is assigned as a point source, and the TOF at all pixel points of the reconstruction domain r Θ is calculated iteratively. The iterative procedure of AFMM can be summarized as Algorithm 1 in Appendix.
The TOF threshold ξ is formulated as where c h is the pixel size of the coarse grid, max v is the maximum SOS of the outer boundary of each iteration and α is a weight usually fixed to be 1.5. The pixel size of the fine mesh is set to be 50 m μ and that of the coarse mesh is 150 . m μ The reconstruction domain r Θ completely encloses the target (i.e., the cross-section of the animal or phantom) and is slightly larger than the target through image dilation, in order to avoid the detrimental interpolation error of the boundary of .  1)) can be performed, denoted as AFMM-BP.

Feature coupling method
The sound speed distribution ( ) v r is approximated using finite-dimensional parameterization, i.e., piecewise constant. An initial coarse segmentation of the target can be obtained using half-time back-projection [23] based on the SOS of water.
 denotes the SOS of different regions, except water, whose SOS can be estimated through temperature measurement [36]. For the ring array system, if the estimated SOS has an obvious deviation from the real distribution, the features of the reconstructed images due to two half rings do not completely co-locate, manifesting as feature splits in the full ring image. Then the SOS distribution can be obtained by maximizing the similarity of the half-ring reconstructions, mathematically where c is a similarity metric dependent upon Λ . We choose correlation coefficient as the similarity metric, the evaluation of which is performed on the region identified as the target, denoted as the correlation domain c Θ . Then, explicitly, P are the half-ring images reconstructed with AFMM-BP. To reduce computational complexity, for each half ring we only select a subset of elements for AFMM-BP. The chosen elements are evenly distributed along the array at a fixed interval. For our simulation and phantom experiments, the interval was N inter = 4 (elements); for our in vivo experiments, N inter = 2 (elements). To suppress negative-valued image artifacts due to sparse spatial sampling and limited detection angle, the negative parts in the reconstructed images are set to be zero before evaluating Eq. (8). The problem to be solved in (7) is not concave, so the gradient ascent with momentum algorithm is adopted to avoid local optima and accelerate convergence. The partial derivatives of the similarity function are calculated numerically by in which v Δ is a tiny SOS perturbation. The feature coupling algorithm is summarized in Algorithm 2 in Appendix. A simplified illustration of our panoramic PAT system is given in Fig. 1. A custom-made 256-element full-ring ultrasound detector array (Imasonic Inc.; 5.5 MHz central frequency; > 60% −6 dB bandwidth) is used for photoacoustic signal detection. As shown in Fig. 1, each element is geometrically focused in the elevational direction to provide sectioning. Two customized 128-channel preamplifier modules (20 dB gain) directly connect to the rear side of the transducer array. The outputs from the preamplifier modules are fed into two 128channel data acquisition units (Analogic Corp., SonixDAQ; 40 MHz sampling rate; 12-bit ADC; 36-51 dB programmable gain) for A/D conversion. 10 Hz laser pulsing is synchronized with data acquisition. In order to double the spatial sampling density, clamped mechanical rotation is applied to the transducer array, causing the array to rotate back and forth at ± 2π/512 (rad) between adjacent data acquisition. Two successive sinograms acquired at the interleaved positions are then merged to complete a 512-element measurement.

System
A 1,064 nm Nd:YAG laser (LOTIS, LS-2145-LT150, 10 Hz pulse repetition rate, 12 ns pulse duration) was employed for most of the experiments. The maximum energy density on the sample (11 mJ/cm 2 )) was below the American National Standards Institute (ANSI) security limits (100 mJ/cm 2 at 1,064 nm, 10 Hz laser pulse repetition rate). As for the multispectral experiment, an optical parametric oscillator (OPO) (SOLAR LP604,680-1,064 nm tuning range, 10 Hz repetition, 10 ns temporal width) was employed. The maximum energy density on the sample (11 mJ/cm 2 at 830 nm) was also below the ANSI limits. The light beams were expanded and subsequently coupled into a customized 1-to-10 fiber bundle (Silica fiber, transmission band 500-2,400 nm). The distal ends of the fiber bundle were arranged such that on the imaged object an annular illumination pattern was formed on the focal plane of the transducer array. The system was controlled by a high-precision delay generator (Stanford Research Systems, DG645). A heater with thermostat control was integrated into the system's water circulation unit to maintain a relatively constant water temperature of around 31°C. In our computer simulations, we used a 512-element ring array with a radius of 5 cm, resembling the real system. The SOS of water was set to be 1,480 m/s. The target was 2.85 cm in diameter and comprised five components with different SOS values (1,560, 1,580, 1,600, 1,620 and 1,640 m/s, respectively, Fig. 2(a)). Figure 2(b) denotes the 'gold standard' initial pressure distribution that we used in the K-wave toolbox [37] to generate the simulated pressure signals. We conducted two numerical experiments: 1. Ideal simulation (Expt. 1). The generated signals were contaminated by white Gaussian noise with a signal-to-noise ratio (SNR) of 40 dB. During the FC process, Eq. (2) was used as the back-projection kernel in AFMM-BP. 2. Partially real simulation (Expt. 2). The electrical impulse response (EIR) of our system was measured (with an absorber whose PA signal bandwidth greatly exceeds our system's bandwidth). Then the generated pressure signals were convolved with the EIR to better represent real measurements. White Gaussian noise was added, with a SNR of 40 dB. Equation (3) was used as the back-projection kernel in AFMM-BP.

Computer simulation
In the beginning of the iteration, all compartments shown in Fig. 2(a) shared a single SOS value of 1,625 m/s, in both Expts. 1 and 2. As Figs. 2(c) and 2(e) show, the SOS of different components gradually separate and approach the correct values. Figures 2(c) and 2(d) are the results of Expt. 1. The mean relative error of the recovered SOS is as small as 0.13%. In the IP image reconstructed using FC (Fig. 2(d)), all details are well recovered. After the pressure signals are filtered with the EIR (Expt. 2), the mean relative error of the recovered SOS is slightly larger but is still small (0.23%). In the reconstructed IP image (Fig. 2(f)), sharp features are smoothed but the image quality is still satisfactory. To better show the convergence capability of FC, a series of reconstructed IP images corresponding to the SOS distribution at different iteration steps are given in Visualization 1. A cylindrical dual-SOS phantom was made (Expt. 3) with agarose gel providing higher SOS than water in an enclosed cylindrical cavity [38] (Fig. 3(c) for the cross-section of the phantom). The India ink was diluted to be 0.625%v/v first. The body of the phantom was made of 5%w/w agar, 0.5%w/w intralipid and 0.85%w/w diluted ink. For the vessel-like PA features, we used 5%w/w agar, 0.5%w/w intralipid and 66%w/w diluted ink. Here, ink and intralipid were used to enhance optical absorption and scattering, respectively. We created a 9 mm-diameter cylindrical cavity in the phantom filled with water, whose SOS was assumed to be unknown. The temperature of the surrounding water was 25.6 ºC, so we calculated the SOS of water to be 1,498.3 m/s [36]. We used this value to infer the SOS of the inner cavity and compare with our reconstruction result. A lab-made phantom holder was used to stably mount the phantom in the system. For PA imaging, we used the 1,064 nm laser with an energy density on the sample of 6 mJ/cm 2 .

Phantom studies
The initial SOS values for the agarose background and the inner cavity (i.e., water) were set to be 1,620 and 1,530 m/s, respectively. These initial values were set for faster convergence. The reconstructed SOS for the agarose background and the inner cavity were 1,596.7 and 1,502.8 m/s, respectively. So the relative error of the water SOS was only 0.3% (as compared with 1,498.3 m/s). Figures 3(a) and 3(b) show the IP images corresponding to the initial and final SOS distributions, respectively. In Fig. 3(a), we can clearly observe feature splitting and artifacts due to an incorrect SOS map. With the SOS corrected, the PA image gets much sharper (Fig. 3(b)) to resemble the real object shown in Fig. 3(c).

In vivo experiments
For all in vivo experiments, mice were maintained under gaseous anesthesia of 1.3% volatile isoflurane. Each mouse was immobilized on a lab-made animal holder upright in the center of the ultrasound detector array. The trunk of the mouse was immersed in distilled water at around 31 °C. Raw data was averaged 131 times to enhance SNR. All animal experiments were conducted in conformity with the regulations of the Laboratory Animal Research Center at Tsinghua University, Beijing, China. Mice could be imaged repeatedly without noticeable adverse conditions due to imaging.

Single-wavelength imaging of the mouse trunk
A healthy nude mouse (8-week-old Crl: NU-Foxn 1 nu mouse) was scanned from liver to kidney with a step of 0.28 mm (Expt. 4). The excitation light was 1,064 nm. Figure 4 shows the IP images of some representative layers. Figures 4(a) and 4(c) were taken in the liver region, while Figs. 4(b) and 4(d) were taken in the kidney region. In the liver region, because the cross-section was almost entirely occupied by the liver, no segmentation was needed. The initial SOS values for different slices were set to 1,660 m/s. In the kidney region, three SOS values were taken into account (Fig. 4(e)), including kidney, bowel and intermediate tissue.
For slices in this region, the initial SOS values for kidney, bowel and intermediate tissue were 1,595, 1,620 and 1,620 m/s, respectively. Artifacts prevalent in the starting IP images vanished at the end of the iterations (Fig. 4(f)). A subset of the imaged cross-sections (layers 4, 6, …, 14 for liver, layers 25, 30, 35 and 40 for kidney) were processed with the proposed FC. The mean and standard deviation (std) of the liver SOS was 1,635.6 and 2.4 m/s, respectively. This indicates that the SOS of the whole liver is quite uniform. The mean and std of the kidney SOS were found to be 1,613.1 and 7.5 m/s, respectively, indicating a diverse distribution. A fly-though movie concatenating all layers is provided in Visualization 2. A movie showing feature convergence during the iteration is given in Visualization 3.  Multi-spectral PA imaging (or, multi-spectral optoacoustic tomography, MSOT) plays an essential role in functional and molecular imaging [1]. The key to multispectral imaging is that objects manifest wavelength-dependent features. Since only the IP image, rather than the SOS distribution, is wavelength dependent, ideally our method should converge to the same SOS distribution regardless of the excitation wavelength. To demonstrate this, laser pulses with different wavelengths were used for liver imaging (Expt. 5). A 8-week-old Crl: NU-Foxn 1 nu mouse was used in this particular experiment. The laser wavelengths was tuned from 680 to 920 nm, with a step of 30 nm. When a short wavelength (680 nm) was used, the light attenuated faster towards the center due to higher tissue absorption and scattering. As a result, the inner part of the body appeared vague (Fig. 5(a)). For longer wavelengths, the fluence distribution was more uniform and the inner vasculatures started to appear (Figs. 5(b) and 5(c)). The SOS of the liver was automatically sought by the program at these discrete wavelengths, see Fig. 5(d). Although slightly different, the estimated SOS at various wavelengths were quite consistent, with a mean of 1,620.5 m/s, and a standard deviation of 1.58 m/s. The remarkable consistency of the calculated SOS values at different excitation wavelengths proves that FC works robustly, even if some inner features are missing.

Pathological model: liver cancer
Liver cancer is the second leading cause of cancer-related death all over the world. Magnetic resonance (MR) elastography [39] and transient elastography [40] confirm that primary or metastatic liver tumors are stiffer than normal liver parenchyma, which indicates that the SOS may increase in the tumor region. Thus, we postulated that our method would uncover the SOS change on top of the tumor-induced angiogenesis contrast in the IP image (Expt. 6).
To prove the above hypothesis, three Crl: NU-Foxn 1 nu mice were orthotopically injected with HepG2 cells at the age of six weeks and then raised for 3 weeks. The excitation light was 1,064 nm. During imaging, six layers through the whole liver were measured for each mouse.
Then, all data were processed using FC, and an average liver SOS was calculated for each mouse. The mean SOS were 1,676.9 m/s (std: 4.0 m/s), 1,682.7 m/s (std: 9.6 m/s) and 1,686.0 m/s (std: 6.7 m/s), respectively. Compared to normal liver (Expts. 4 and 5), the average SOS increased by approximately 50 m/s. All mice were sacrificed after image acquisition. In one representative dissected liver, the tumor was clearly observed (indicated by the red arrow in Fig. 6(a)). According to the hematoxylin and eosin (H&E) stained histologic specimen, tumor cells underwent excessive proliferation, compared to normal tissue (red and black arrows in Fig. 6(b), respectively). On the in vivo IP image, although we expected to see blood vessels in the neighborhood of the tumor, it could be difficult to delineate the tumor boundary precisely without exogenous contrast agents. To dissociate FC from image-dependent SOS segmentation, we meshed the object into 39 triangles [41]. FC optimization was then performed over the 39 degrees of freedom and converged onto the final IP image shown in Fig. 6(c), with most of the features in sharp focus. The spiral 'feeding' vasculature of the tumor could be clearly seen in the figure (red arrow). The corresponding SOS map is shown in Fig. 6(d), with a high SOS region overlapping with the tumor.

Pathological model: hepatic injury
We applied the FC method to imaging of the IP feature and acoustic property changes due to hepatic injury (Expt. 7). In this experiment, six-week-old BALB/cAnNCrl mice were used. Subcutaneous injection with CCl4 and olive oil solution (20% volume fraction, bolus dose 0.1 mL/10g body weight) was carried out every 3 days for totally 10 weeks. In the last 3 weeks, intragastric infusion of ethanol (15%v/v) was performed. Before in vivo imaging, the hair of the trunk was removed, followed by an 8-hour abrosia with only water supplied, in order to reduce the strong PA signals from digested food.
Alcohol and its metabolites can induce hepatocyte injury via mitochondrial damage and endoplasmic reticulum stress, inhibit fatty acid oxidation, but favor fatty acid and triglyceride synthesis. The alcoholic liver diseases (ALDs) include steatosis, alcoholic fibrosis, and cirrhosis. Toxicosis gives rise to ballooning degeneration of hepatocytes. As can be seen in the H&E staining section (Fig. 7(e)), excess lipid droplets accumulated in vesicles. Because the SOS of fat is low (1,350 m/s) [10], the mean SOS of liver tends to decrease.
The excitation light was 1,064 nm. More than 3 layers of the liver were measured for each mouse. The mean SOS measured for two normal mice were 1,668.  Numerous hepatic nodules appeared in the dissected liver with injury ( Fig. 7(c)). The lobular architectures separated distinctly, so the IP features of the hepatic mouse liver shown in Fig. 7 (b) were different from those of a health mouse shown in Fig. 7(a). Our hypothesis about the SOS change was confirmed by histologic specimens stained with H&E (Figs. 7(d) and 7(e)). Compared to normal tissues (Fig. 7(d)), there were many vacuoles in the injured liver ( Fig. 7(e)), which contained lipid droplets and reduced the mean SOS of the whole liver. FC has high quantitative accuracy for multi-SOS problems in numerical experiments (Expts. 1 and 2). For both phantom and in vivo experiments, FC can converge to visually correct results and mitigate image distortion and blurring. Figure 8 shows the final IP images using the liver data in Expt. 4, to compare the performance of FC (Fig. 8(a)) and the popular halftime back projection ( Fig. 8(b)) [23]. The inferior vena cava and spinal cord have obvious distortion and splitting artefacts (highlighted in red box). However, relatively large computation burden is one of the limitations of FC. For FC, time cost for iteratively searching for the SOS map is 2403 s (with an element interval of eight), and the time for generating the final IP image using the full-ring elements (512 elements) is 313 s, while half-time back projection only takes 7 s (Intel(R) Core(TM) i7-7700 CPU @ 3.6 GHz). In the future, GPU acceleration will be needed.

Discussion and conclusion
Although deep vessels could not be clearly imaged with short wavelengths using our current imager, FC still showed good consistency for SOS estimation among different wavelengths (Expt. 5). The significant increase of the SOS for liver tumors (Expt. 6) can be measured with FC, indicating that FC not only produces better PA images, but also indicates the elastic property changes of solid tumors to enrich image contrast for cancer studies. Moreover, the successful application of triangular mesh proves that the implementation of FC is not restricted to image segmentation. If features are plenty in the initial IP reconstruction, blind meshing followed by FC reconstruction could be used to automatically identify the tumor region. The relevance of the FC method for pathological study was further verified by capturing the SOS drop in mouse livers with hepatic injury, and was shown to agree with the fact that the increased vacuoles and lipid droplets reduced the SOS of liver (Expt. 7).
To best implement FC reconstruction, the determination of the initial value for the iteration is a key, especially for the multi-SOS problem. If the initial value is too far from the optimum point, more iterations are needed. Moreover, the optimization problem for FC is not concave, so it may fall into suboptimal solutions. To guarantee the robustness and accuracy of the algorithm, the imaged object should contain features with high contrast, such as vessels and probes that have strong PA emissions. Another problem we encountered was that large structures, such as big vessels and the spleen, exhibited distinct features when reconstructed using the two respective half rings. For example, the part facing the half ring may appear bright, and cast small-valued or even negative 'shadows' away from the detector. The above polarity is inverted when reconstructed using the complementary half ring. The altered features dramatically reduce the sensitivity of our similarity-based method in finding the optimal SOS values. We are still working on this problem to improve the reliability of the method. Moreover, small target size and low image resolution can affect sound speed reconstruction. One last problem is that the blind segmentation shown in Fig. 6(d) is computationally demanding due to the large degrees of freedom. In addition, while it works fine to yield a sharp image, the SOS distribution may look 'jumpy' if no further regulation is applied (such as smoothness). We are still working on developing regulation techniques, or smarter fitting strategies, to make such blindly segmented SOS estimation more reliable.
Our method was demonstrated on a full-ring PACT system, but the basic concept of FC is rather generic and can be implemented on any detection geometry with moderate modification. The way we divided the array (into two adjacent half rings) is effective and simple for our system, but other division schemes may show improved efficiency and accuracy for other detection geometries. If the detection surface is divided into more than two sub-regions, image quality might be further improved, but how to evaluate the similarity of multiple images is one key problem. In sum, we envision that the FC concept has broader impacts in various PACT systems, in imaging applications where both the IP and SOS contrasts are of interest, and even in similar tomographic problems beyond photoacoustics. 4. k = k + 1.

If
I k N ≤ , go to step 1.

Funding
National Natural Science Foundation of China (61735016 and 61871251); Youth Innovation Fund of Beijing National Research Center for Information Science and Technology.