Enhanced contrast acoustic‐resolution photoacoustic microscopy using double‐stage delay‐multiply‐and‐sum beamformer for vasculature imaging

Abstract In acoustic‐resolution photoacoustic microscopy (AR‐PAM) systems, the lateral resolution in the focal zone of the ultrasound (US) transducer is determined by the numerical aperture (NA) of the transducer. To have a high lateral resolution, a large NA is used. However, the larger the NA, the smaller the depth of focus [DOF]. As a result, the lateral resolution is deteriorated at depths out of the focal region. The synthetic aperture focusing technique (SAFT) along with a beamformer can be used to improve the resolution outside the focal region. In this work, for image formation in AR‐PAM, we propose the double‐stage delay‐multiply‐and‐sum (DS_DMAS) algorithm to be combined with SAFT. The proposed method is evaluated experimentally using hair targets and in vivo vasculature imaging. It is shown that DS_DMAS provides a higher resolution and contrast compared to other methods. For the B‐mode images obtained using the hair phantom, the proposed method reduces the average noise level for all the depths by about 134%, 57% and 23%, compared to the original low‐ resolution, SAFT+DAS and SAFT+DMAS methods, respectively. All the results indicate that the proposed method can be an appropriate algorithm for image formation in AR‐PAM systems.


| INTRODUCTION
Photoacoustic imaging (PAI) is a promising imaging modality which provides structural, molecular and functional information. In PAI, a light source irradiates the tissue. Then, the subsequent thermal gradient induces wide-band acoustic waves based on the thermoelastic effects [1,2]. These waves are recorded by an ultrasound (US) transducer. The main advantage in PAI is to have both the optical contrast and US resolution in a single imaging modality. Compared to pure optical imaging, PAI provides a better resolution at more than one optical transport mean-free-path (more than $1 mm) due to the fact that the US waves get scattered 2-3 orders of magnitude less than the optical waves [3,4]. On the other hand, US imaging is based on reflections due to local differences in the mechanical properties of tissues, which makes it hard to detect early tumors. PAI addresses this problem since it is based on optical absorption, and the final photoacoustic (PA) image is the optical absorption distribution map of tissue [5,6].
Over the past decade, many papers have been published indicating different applications of PAI, for example, breast imaging [7], ocular imaging [8,9], cancer detection and staging [6] and sentinel lymph node (SLN) imaging [2,10,11]. Exogenous contrast agents have also been helpful during the development of PAI imaging systems [12][13][14]. Based on the imaging configuration, PAI systems are categorized into two groups: photoacoustic tomography [15,16] and photoacoustic microscopy (PAM) [17,18]. PAM is separated into two categories: acoustic-resolution PAM (AR-PAM) and optical-resolution PAM (OR-PAM). In AR-PAM, the resolution is determined by the focus of the US transducer while in the OR-PAM, the imaging resolution is determined by the optical focus [3,19,20]. In this work, we focus on the AR-PAM imaging systems. In addition, as the biomedical application, we focus on vasculature imaging. Clinically, noninvasive imaging of vasculature has a high value [3,[21][22][23][24][25][26][27]. Tumors usually lead to a distorted vascular architecture around them. Studies in oncology also indicated that angiogenesis (formation/growth of new blood vessels) has a critical value in clarification of status of a tumor and possibility of metastasis. Inability to form new blood vessels might be related to the low metastatic activity [3,28].
In PAM, usually a single element focused US transducer is used to detect the PA waves. Recording at one spatial coordinate of the XYZ system yields an A-line (1D image). By sweeping the US transducer in one direction, B-mode images (2D images) are obtained. In order to have 3D images, raster scanning is needed in both the X and Y directions (assuming the Z direction indicates depth). Once the imaging medium is fully covered, 3D images or maximum amplitude projection (MAP) can be used to visualize the optical absorption distribution of the imaging target [29].
The general method for image formation in PAM is putting all the A-lines next to each other to form a 2D/3D image. In AR-PAM, imaging resolution depends on the properties of the focused US transducer. The lateral resolution is calculated by R L, AR − PAM = 0.71c 0 /NAf 0 , where c 0 , NA, and f0 are the sound speed, numerical aperture (NA), and PA central frequency, respectively [1]. To have a high resolution, a large NA is desired. However, a large NA decreases the focal region (DOF) [30,31]. This is explained in Figure 1 where the beam B has a narrower beamwidth at the focal plane (indicating a higher lateral resolution compared to beam A), but its DOF is shorter than beam A. In such a condition, even though the lateral resolution in the focal zone (or plane) is improved, it gets deteriorated outside the focal region of the US transducer. Consequently, out-of-focus (out of DOF), the image quality is significantly reduced. To address this problem, the synthetic aperture focusing technique (SAFT) using the virtual detector (VD) concept was proposed for both US and PA imaging systems [32][33][34]. This method was also used for the in vivo imaging scenario in a PAM system [3]. In this method, it is assumed that the A-lines are detected at the VD location, which is the location of the focus of the US transducer. Then, new A-lines are synthetically generated by combining properly delayed versions of the detected A-lines. The 2D SAFT [35] and adaptive SAFT [36] are the expanded versions of the conventional SAFT [32][33][34].
For the synthesis of new A-lines, a beamformer is used. Usually, the delay-and-sum (DAS) beamformer (due to its simple implementation) is utilized to obtain the new A-lines. However, as demonstrated in previous publications, DAS is a blind beamformer which treats all the input A-lines equally. Consequently, it leads to a high noise level (low contrast) and a wide beam (low resolution). Different adaptive beamformers and weighting factors have been developed to address the incapabilities in DAS [37][38][39][40]. One of F I G U R E 1 Comparison of depth-of-focus and beamwidth for a large and small numerical aperture the methods is delay-multiply-and-sum (DMAS) which first was used for confocal microwave imaging (breast cancer detection) [41], and then for B-mode US imaging [40]. This algorithm was recently used in an AR-PAM system to improve the image quality [42]. We have previously reported an algorithm called double-stage delay-multiply-and-sum (DS_DMAS) [43][44][45], which outperforms DMAS in the terms of resolution and contrast.
In this work, due to the importance of vasculature, DS_DMAS algorithm has been used in an AR-PAM system to have a higher image quality and vasculature distinguishability. Since DS_DMAS outperforms DAS and DMAS in macroscopic-scale PAI scenarios, as demonstrated in [43,44], we expect to obtain a higher image quality with DS_DMAS in microscopic-scale PAI systems as well. DS_DMAS is utilized as the beamformer used in SAFT to synthesize the new A-lines. We have experimentally evaluated the DS_DMAS algorithm; using hair-target phantom and in vivo vasculature imaging. All the experimental results indicate that SAFT+DS_DMAS can be a proper algorithm to form images in AR-PAM imaging systems.
The rest of the paper is as follows: in Section 2, SAFT, the proposed method and the experimental setup are explained. The results are presented in Section 3. Section 4 contains discussion regarding the methods and results. The conclusion is provided in Section 5.

| Conventional AR-PAM
In conventional AR-PAM, the detected A-lines (1D data) are simply positioned next to each other, and at the end, a Bmode image is generated by showing the formed 2D data. Once all the B-mode images are arranged next to each other, then a 3D and/or MAP image can be generated. As mentioned, the problem with this method is that the lateral resolution is deteriorated [34].

| SAFT and VD
The focal point of the US transducer is considered as the VD (see Figure 2). Due to the diverging of the beam away from the focus, adjacent PA signals will overlap during the image formation procedure. As a result, a higher quality image is formed, compared to conventional method [3,33,34]. In other words, a set of new A-lines is generated by combining the detected A-lines and adjusting them based on the location of the VD and the synthesized imaging point.
The schematic presented in Figure 2 explains the way in which the intensity of a pixel in the final image can be obtained. As seen, the scan-lines in the vicinity of the synthesized pixel are used. At each location of the transducer, we have a laser excitation. Then, a time series of the PA waves are detected by the US transducer. This is considered as a low-resolution A-line. The A-lines are used as the input of the SAFT algorithm. In SAFT, it is assumed that the reflection from a synthesized point on the axis of the US transducer is also manifest of the detected A-lines next to this point (see Figure 2, red point in lines i − 1 and i + 1 have some information about the red point at line i). Therefore, by applying proper delays, the signals from these Alines can be made to overlap in the synthesized point. For this purpose, beamformers can be used.
Based on Figure 2, to form a high-resolution pixel at the depth z = ct and the i th scan-line, the following equation can be used: F I G U R E 2 The schematic of the synthetic aperture focusing technique (SAFT) along with the geometry of the virtual detector (VD) to find the appropriate time delays used in SAFT. z is depth of the synthesized point, z f is the depth of the VD, and r j ' is the distance between the VD of jth scan-line next to the ith one and the synthesized point where RF(i, t) is the received zero-mean signal at the ith scan-line. N D is the number of adjacent scan-lines (next to the ith scan-line) that contribute to the pixel. This number is determined by the opening angle of the transducer. Δt j is the time delay applied to the received signal of the jth scan-line next to the ith scan-line. The Δts for each scan-line are calculated based on the distance between the VD of scan-line and the synthesized pixel: where c is the speed of sound, z is the depth of the synthesized point, z f is the depth of the VD, and r j ' is the distance between the VD of jth scan-line next to the ith one and the synthesized point. Note that r j ' depends on z. The number of the involved scan-lines, considering the opening angle of the transducer, is indicated based on the depth of the synthesized imaging point: where w is the width of the transducer, and Δ Line is the separation between the scan-lines. Eq. (1) describes the standard DAS beamforming in which a geometrical time delay is applied to the detected A-lines, after which these are summed up. To overcome the blindness of DAS, DMAS was introduced in [40] for US imaging. The formula of the DMAS combined with SAFT can be written as follows: where c RF jk i, t ð Þ= S: The superiority of DMAS, in comparison with DAS, is due to the fact that DMAS uses a correlation process to reduce the noise level. This leads to a higher image quality [42]. The application of DAS and DMAS for AR-PAM systems has been reported before [33,34,42]. As evaluated in References [43,44] for PAI and [45] for US imaging, DS_DMAS even outperforms DMAS, mainly in terms of noise level (and hence contrast). In this paper, we propose to combine DS_DMAS algorithm with SAFT for AR-PAM systems. The DS_DMAS beamformer is generated from the expansion of the DMAS Equation [43,43,44], and its formula combined with SAFT is as follows: where e rf jk i, t ð Þ= S: and Readers are referred to the References [43][44][45] for further information regarding this image formation method. In the Section 3, it will be shown that DS_DMAS combined with SAFT provides a higher contrast (lower level of noise) and subsequently, a better image quality.

| Imaging setup
The imaging system used for data acquisition has been previously described in References [46,47]. The schematic of the AR-PAM setup used for imaging is shown in Figure 3. Laser pulses with a wavelength of 570 nm and a pulse repetition rate (PRR) of 5000 Hz were generated using a nanosecond pulsed laser system consisting of a diode-pumped solid-state Nd-YAG laser (INNOSLAB, Edgewave, Wurselen, Germany) and a dye laser (Credo-DYE-N, Sirah dye laser, Spectra Physics, Santa Clara, California). The laser beam was passed through a variable neutral density filter (NDF) (NDC-50C-4 M, Thorlabs), and focused on a multimode fiber (MMF) (M29 L01, Thorlabs) using an objective (M-10X, Newport, Irvine, California) placed on an XY translator (CXY1, Thorlabs). The output beam was collimated using a collimating lens L1 (LA1951, Thorlabs), and then converted into a ring-shaped beam by passing it through a conical lens L2 (1-APX-2-B254, Altechna, Vilnius, Lithuania), having an apex angle of 130 . The resultant beam was directed to a homemade optical condenser (OC) having cone angles of 70 and 110 . After total internal refection in the OC, the beam was refocused with an optical focal diameter of $2 mm. PA signal was generated on excitation of the sample by the resultant beam.
The reflected PA signal was collected using a 50 MHz ultrasound transducer (UST) (V214-BB-RM, Olympus-NDT, Waltham, Massachusetts) housed in the center of the OC. An acoustic lens (AL) (LC 4573, Thorlabs) with a radius of curvature of 4.6 and a 6 mm diameter was attached to the UST using a UV curing optical adhesive (NOA61, Thorlabs). Thus, the acoustic focal spot was $46 μm. To maximize sensitivity, the optical excitation and acoustic detection were cofocused such that the laser beam was loosely focused to the entire acoustic detection area. The scanning set-up was mounted on a three-axis motorized stage (TS)(PLS 85 for xand y-axis, VT 80 for z-axis, PI-Physik Instrumente, Karlsruhe, Germany), which was controlled by a three-axis controller (SMC corvus eco, PI miCos GmbH, Germany). A water tank (13 × 30 cm) with a 10 cm imaging window sealed with polyethylene membrane, was placed directly below the scanning head. Two amplifiers each having 24 dB gain (ZFL-500LN, Mini Circuits, Brooklyn, New York) amplified the PA signal before it was acquired by data acquisition (DAQ) card (M4i.4420, Spectrum, Grosshansdorf, Germany) installed in a desktop. The two channel DAQ card had a sampling rate of 250 Megasamples/second, 16 bit analogto-digital converter, and 4 GB onboard memory. A function generator (GW INSTEK, AFG-3051) triggered and synchronized the laser with DAQ. Data was acquired by performing two-dimensional raster scanning of imaging head using Labview (National Instruments) as an interface.

| Hair sample preparation
Six horse hairs having an average diameter of 80 μm were arranged at different depths in 1% agar solution in a Petridish. After solidification of agar, the petridish was directly placed in the water tank below the scanning head. To get a B-mode image containing all the 6 hair targets, multiple A-lines were collected by moving the scanning head continuously in direction perpendicular to the hair samples. The speed of motor was maintained such the distance between two A-lines was 3 μm.

| In vivo experiment
In vivo experiments were performed in accordance to the guidelines and regulations approved by the Institutional Animal Care and Use Committee of Nanyang Technological University, Singapore (Animal Protocol Number ARF-SBS/NIE-A0331). A 75 g adult female Sprague Dawley rat procured from InVivos Pte Ltd., Singapore, was used to image the vasculature surrounding the sentinel lymph node. The rat was anesthetized by intraperitoneal injection of 0.15 mL of cocktail containing ketamine and xylazine of dosage 85 and 15 mg/kg, respectively. During image acquisition, anesthesia was maintained by using vaporized isoflurane (1.2 L/min oxygen and 0.75% isoflurane, Medical Plus Pte Ltd., Singapore) delivered through custom made nose cone. The vitals of the animal were monitored using a pulse oximeter (Medtronic, PM1 0 N with veterinary sensor, Minneapolis, Minnesota) which was clipped to the hind-paw. F I G U R E 3 The schematic of the AR-PAM system used for data acquisition. AL: Acoustic lens, AR-PAM, acoustic-resolution photoacoustic microscopy; DAQ, data acquisition card, L1, L2, lenses; MMF, multimode fiber; NDF, neutral density filter; OC, optical condenser; TS, translation stage; UST, ultrasound transducer Before imaging, the hair above the fore-paw of the animal was depleted using commercially available hair removal cream. During imaging, the rat was placed on its side, such that the imaging area was directly below the polyethylene imaging window. Clear ultrasound gel was used as a coupling media to prevent any air gap between the rat and the imaging window.
A 12 × 15 mm 2 area around the SLN of the rat was scanned to acquire images of the vasculature. Initially, the set up was arranged such that most of the vessels were in the focal plane of the transducer. Raster scanning was performed by moving the scan head in step sizes of 30 and 3 μm along x-and y-axis, respectively. Next, the translation stage was moved twice with a step size of 1 mm along the Z-axis to increase the distance between the UST and the imaging plane. The same area around the SLN was scanned after each step to get out-of-focus images of the vasculature. It took approximately 32 minutes for our experimental system to scan the entire area. After imaging, the rat was euthanized by injecting an overdose of pentobarbital.

| B-mode images
In this section, the performance of the proposed method and other methods is evaluated for B-scans perpendicular to the hair targets. Therefore, it is expected to see cross-sections of the targets; hairs will look like points in the reconstructed images in Figure 4.
The focal zone of the transducer is close to the third target from the left. Figure 4A shows that the original lowresolution A-lines used in AR-PAM do not work well for targets outside the focal region of the US transducer. The targets, which are expected to be seen like a point, have an arc-shape geometry far from the focal zone. Using the SAFT +DAS ( Figure 4B), the detected signals are better aligned, and the arc-shape geometry of the targets outside the focal region of the US transducer gets closer to a point. However, the background noise affects the image quality. SAFT +DMAS reduces the background noise ( Figure 4C). As shown in Figure 4D, the degrading effects are even better suppressed by SAFT+DS_DMAS. The noisy line in the images obtained with SAFT is generated due to an inherent limitation of the VD method and SAFT, which is discussed in Section 4.
To better evaluate the improvements obtained by the proposed method, lateral variations of the six targets are presented in Figure 5. For all the targets, except the one in the focal zone of the transducer ( Figure 5C), the proposed method outperforms the other methods mainly because the lower noise level results in a higher contrast in the images.
To quantitatively evaluate the methods, the noise level is presented in Table 1. The amplitudes of the noise level in the first 0.75 mm in Figure 5 are averaged linearly, and this average is presented in dB in Table 1 Figure 6 shows the improvements obtained by SAFT+DS_DMAS at difference depths. The obtained improvement is low around the focal point of the transducer.

| Maximum amplitude projection
In order to evaluate the performance of the proposed method in 3D, MAP is used. For the hair-target phantom, the MAPs are shown in Figure 7 where SAFT+DS_DMAS provides a better-degraded noise. In addition, the width of the detected hair targets gets narrower by SAFT+DS_DMAS, compared to other methods. For better evaluation, the lateral variations of the MAPs are presented in Figure 9 along with a quantitative evaluation for the noise level in Table 2. The amplitudes of the noise level in the last 0.75 mm in Figure 9 are averaged linearly, and this average is presented in dB in Table 2.   The lower noise level provided by the proposed method is reflected by the numbers presented in Table 2 where, for instance, at the depth of 0.91 mm (far from the focal point), the SAFT+DS_DMAS leads to lowering the noise level by 30.7, 18 and 14.3 dB, in comparison with the low-resolution A-lines, SAFT+DAS and SAFT+DMAS methods, respectively. In F I G U R E 7 The maximum amplitude projection (MAP) of the hair-target phantom. The numbers on the right side indicate the depth of the hairs F I G U R E 8 Statistical data analysis for the noise level of the graphs presented in Figure 9. The mean values are presented in Table 2. Columns 1-4 represents low-resolution, SAFT+DAS, SAFT+DMAS and SAFT+DS_DMAS methods, respectively. A-F, These are corresponding to Figure 9A-F. DAS, delay-and-sum; DMS, delay-multiply-and-sum; DS, double-stage; SAFT, synthetic aperture focusing technique addition, considering the width of the mainlobe shown in Figure 9, it is clear that SAFT+DS_DMAS results in a narrower hair width, which indicates a better resolution. The adapted Figure 8 shows the statistical data analysis performed over the noise level of plots presented in Figure 9. On each box, the central mark indicates the median of the noise level inside the corresponding area. The bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. This box shows the noise level of 50% of the lateral variations inside the corresponding area. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually by the + symbol. It can be seen that the mean values (red line in the boxes) verify the values presented in Table 2. For all the depths, the SAFT+DAS and SAFT +DMAS provide distinguished boxes compared to the low-resolution method. However, the boxes related to these two methods have overlapping. It can be seen that the box related to SAFT+DS_DMAS has a lower value and no overlapping, in comparison with other methods, for all the imaging depths.

| In vivo imaging
The in vivo images with different methods of reconstruction are shown in Figure 10. In the first row, most of the absorbers (vasculature) are close to the focal point while in second row ( Figure 10B), most of the absorbers are 1 mm below the focal point of the US transducer. Figure 10C represents a zoomed version of regions indicated by the white rectangles in Figure 10A while most of vasculature are 2 mm below the focal point of the transducer.     The higher image quality obtained by the proposed method can be seen by considering the vasculature inside the blue circles used in Figure 10A,B. In addition, comparing the area indicated by the white dashed rectangle in Figure 10A, it can be seen that the vasculature is better distinguished using the proposed method, and a darker background is obtained. This indicates a higher contrast.

| DISCUSSION
Generally in PAM, the A-lines are put next to each other to form a 2D image (B-scan). In AR-PAM, since a focused US transducer detects the PA waves, a high resolution is obtained in the focal plane of the transducer. In applications in which usually a very high resolution is needed, a large NA (or a high frequency) US transducer will be used (see Figure 1). This leads to a deteriorated lateral resolution out of the DOF of the transducer.
To address this problem SAFT (using VD concept) was proposed in previous publications. Based on the angular extent of PA waves, SAFT assumes that each time sample in each A-line (Line i in Figure 2.2) has projections in A-lines in the vicinity of the line i, which is shown in Figure 2. In this case, based on the overlapping which exists between the triangular beam regions (above and below the VD), the synthetic high-resolution A-lines can be obtained. This method highly suppresses the background noise, as seen Figures 4  and 7. However, it should be noticed that this method does not remove imaging noise around the focal zone of the transducer (see the noisy area in Figure 4B). This so-called noisy line is due to the fact that we have a limited overlapping between the triangular beam regions of VDs around the focal plane. To clarify this, consider Figure 11 where the different amount of overlapping vs depth is shown. Combining more observations in a single pixel involves some kind of averaging and will reduce the noise level. This can be perceived from Equation (3), where N D is directly related to |z − z f |.
In AR-PAM systems, beamformers should be used combined with SAFT and VD techniques. In this paper, it is proposed to use the SAFT method with the DS_DMAS beamformer to improve image quality. DS_DMAS is previously evaluated for macroscopic PAI systems [43][44][45], and its superiority over DAS and DMAS has been proved before. Here, for the first time, it is used for AR-PAM. As proved by the lateral variations of the B-scans and MAPs, shown in Figures 5 and 9, respectively, the proposed method provides lower noise level and better contrast. Lateral variations for the hair target located at the depth of 3.42 mm ( Figure 5C) does not show a significant improvement since the focal plane of the US transducer is at the same depth. This depth, the original method already provides a high resolution and contrast. Therefore, using SAFT (ignoring the beamformer) does not significantly improve the image quality here. The proposed method does not improve the axial resolution in a noticeable manner while the range lobes are better suppressed (as seen in Figure 4) as a result of the higher noise suppression of DS-DMAS algorithm.
The noise level is directly related to the image contrast. Tables 1 and 2 represent the noise level for the B-mode and MAP images, respectively. As seen, the SAFT+DS_DMAS algorithm outperforms other methods. For the B-mode images obtained using the hair phantom, the proposed method reduces the average noise level for all the depths by about 134%, 57% and 23%, compared to the original lowresolution, SAFT+DAS and SAFT+DMAS methods, respectively. However, all the improvements achieved by the SAFT+DS_DMAS are obtained at the expense of a higher computational complexity. For each B-mode image, with our software implementation, it takes 0.02, 1.40, 6.69 and 14.45 seconds by the original, SAFT+DAS, SAFT +DMAS and SAFT+DS_DMAS methods, respectively. The system for image formation contains an Intel(R) Core(TM) i7-8650U @ 1.90 GHz-2.11 GHz processor and 8 GB of RAM. Recently, using implementation of the DS_DMAS on a Graphics Processing Unit (GPU) hardware, the processing time needed for DS_DMAS was reduced [48].
The results for the in vivo experiment (vasculature imaging) is presented in Figure 10. The higher performance of the proposed method can be appreciated by considering the background noise in the images and also the better distinguishability. The white rectangles in Figure 10A also shows the better separability of vasculature obtained by SAFT +DS_DMAS. It is also worth to mention that the proposed method provides a higher improvement when the targets are at a higher distance compared to the VD position, as indicated by Figure 6.

| CONCLUSIONS
In AR-PAM systems, to address the problem of the low resolution and contrast at regions outside the depth of focus of the US transducers, SAFT along with the VD was previously proposed by other researchers. In this paper, the DS_DMAS algorithm was used as the beamformer which works with SAFT to improve the contrast and resolution. The proposed method was evaluated experimentally using the hair-target phantom and in vivo vasculature imaging. Both the B-scans and MAP images were qualitatively and quantitatively evaluated. It was shown that the SAFT+DS_DMAS outperforms other methods in terms of contrast (affected by the noise level of an image) and resolution. For the images obtained from the hair phantom, SAFT+DS_DMAS reduces the average noise level for all the depths by about 18.95 dB, 12.28 dB and 7.56 dB, compared to the original low-resolution, SAFT+DAS and SAFT+DMAS methods, respectively.

CONFLICTS OF INTEREST
The authors declare no potential conflict of interests.

AUTHOR BIOGRAPHIES
Please see Supporting Information online.