Continuous real-time photoacoustic demodulation via field programmable gate array for dynamic imaging of zebrafish cardiac cycle

A four-dimensional data set of the cardiac cycle of a zebrafish embryo was acquired using postacquisition synchronization of real-time photoacoustic b-scans. Building upon our previous work with off-axis photoacoustic microscopy (OA-PAM), we have developed a system that sustains 100 kHz line rates while demodulating the bipolar photoacoustic signal in real time. Real-time processing was accomplished by quadrature demodulation on a field programmable gate array (FPGA) in line with the signal digitizer. Simulated data acquisition verified that the system is capable of real-time processing up to a line rate of 1 MHz. Galvanometer scanning of the excitation laser inside the focus of the ultrasonic transducer enables acquisition of a 200 by 200 by 200 pixel volumetric data set across a 2 mm field of view at a rate of 2.5 Hz.


Introduction
Photoacoustic microscopy (PAM) is a hybrid imaging modality that exploits the photoacoustic effect to provide optical absorption contrast via ultrasonic detection. Through the photoacoustic effect, pulsed optical excitation is absorbed by a chromophore, causing local thermoelastic expansion. This localized expansion propagates through the sample as a stress wave, which can be detected with an ultrasonic transducer [1]. PAM can provide optical diffraction-limited lateral resolution and ultrasonically limited axial resolution [2]. In tissue, ultrasonic radiation is attenuated three orders of magnitude less than visible radiation, enabling PAM to image biological structures deeper than other high-resolution optical imaging modalities [3]. The capability of PAM to collect an entire axial line (a-line) per laser pulse enables the capture of volumetric frames without scanning in the axial direction. This high speed allows volumetric data sets of dynamic biological processes to be captured in vivo. As such, real-time PAM has previously been demonstrated in biological tissues [4][5][6].
PAM has been demonstrated as an effective tool for imaging both exogenous and endogenous chromophores. It has been employed to image microvasculature, melanoma tumors, and cell mitochondria through endogenous chromophores such as hemoglobin, melanin, and cytochrome c [7][8][9]. In a clinical setting, these chromophores can potentially aid in tumor margining and in the characterization of port-wine stains [10]. These endogenous contrast agents also aid in research on tumor angiogenesis, hemodynamics, and cellular mitochondria. Furthermore, through exogenous contrast agents including methylene blue, rhodamine 6G, and gold nanoparticles, PAM has been used as a research and diagnostic tool for a variety of applications [11,12].
In most PAM systems, the optical excitation and acoustic pathways are arranged in one of three co-axial designs. A reflection mode co-axial design allows for optimal geometry between the excitation and detection pathways to maximize axial resolution in thick tissue samples. However, reflection mode co-axial PAM systems require technically complex custom optical/acoustic beam combiners to allow for the excitation and detection pathways to be combined [13]. Alternatively, a transmission mode co-axial design uses an ultrasonic transducer opposing an optical objective to remove the need for a custom beam combiner; however, this approach is not feasible for thick tissue samples or for most in vivo applications. A third approach for performing co-axial PAM utilizes dark field optical imaging to direct the beam around the ultrasonic transducer to the sample. This setup again requires bulky custom optics to direct the excitation beam to the sample. Alternatively, optical methods can be used to detect pressure waves using phase sensitive interferometry and Fabry-Perot interferometry [14,15]. Optical detection suffers from drawbacks in the form of decreased sensitivity to acoustic waves or the requirement of direct contact with the sample. Previously, we have demonstrated off-axis PAM (OA-PAM) which takes advantage of the isotropic generation of photoacoustic waves to allow for confocal alignment of the excitation and detection pathways in reflection geometry, without the need for custom optics [16]. OA-PAM enables the use of commercially available microscope objectives and ultrasonic transducers at the cost of a mild reduction in axial resolution. OA-PAM may also experience some loss of sensitivity at high angles due to refraction of sound waves that could occur at the tissue-water interface; however, due to the matching of the acoustic impedance between water and tissue, this loss is minimal. 
Additionally, the off-axis geometry in many circumstances enables placement of the transducer closer to the source of the PA signal, since no additional components are needed to combine the acoustic and optical pathways. A shorter path length translates into lower attenuation of the PA signal at the detector.
Real-time photoacoustic microscopy has previously been demonstrated through scanning of the excitation beam with galvanometers, microelectromechanical mirrors, and voice coils [4,5,7]. These systems allow an entire volume of data to be collected in real time; however, data storage and processing act as a bottleneck between acquisition and display. Most recently, Shi et al. demonstrated real-time capture of volumetric data sets through the use of PCI data acquisition cards [17]. This work overcame the data collection limitation typically associated with PAM, enabling collection of large PAM data sets in real time. However, little work has been done to enable the high-speed data processing required to obtain morphologically accurate images from the raw pressure profiles collected by ultrasound transducers. Since the PAM signal oscillates in time, the morphological information of the axial dimension is distorted by the bipolar nature of the signal. Other imaging techniques, including magnetic resonance imaging and ultrasound imaging, also collect bipolar signals; these techniques employ demodulation and signal enveloping to process the bipolar signal into morphologically accurate information [18]. Conversely, PAM images are typically published as maximum amplitude projection (MAP) images or processed through Wiener deconvolution [19]. While useful in many scenarios, MAP images take the highest-intensity pixel in depth, thereby removing axial information from the image. Wiener deconvolution, in contrast, corrects the bipolar PAM signal for accurate morphological display of the axial dimension; however, it is an iterative and consequently time-consuming process. Hilbert transform (HT) enveloping has also been employed for enveloping signals in photoacoustic imaging [20]. HT provides a robust method for enveloping the photoacoustic signal because it requires no a priori knowledge of the signal being generated; however, HT does not provide the increase in axial resolution achieved through deconvolution.
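For illustration, HT enveloping of a simulated bipolar a-line can be sketched as follows. This is a Python/SciPy sketch for clarity, not the authors' MATLAB processing, and the Gaussian-burst signal parameters are illustrative assumptions:

```python
import numpy as np
from scipy.signal import hilbert

# Synthetic bipolar a-line: a Gaussian-modulated sine burst standing in for
# the transducer response to a point absorber (illustrative parameters only).
fs = 80e6                       # sampling rate, Hz
f0 = 27e6                       # carrier (transducer center) frequency, Hz
t = np.arange(0, 2e-6, 1 / fs)
true_env = np.exp(-((t - 1e-6) ** 2) / (2 * (100e-9) ** 2))
a_line = true_env * np.sin(2 * np.pi * f0 * t)

# Build the analytic signal and take its magnitude; note that no a priori
# knowledge of the carrier frequency is required.
ht_envelope = np.abs(hilbert(a_line))
```

For a narrowband burst like this, the magnitude of the analytic signal closely tracks the true envelope of the bipolar signal.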
We chose HT and quadrature demodulation (QD) as methods for enveloping the bipolar signal of our PAM data, as both are commonly used for enveloping signals in medical imaging. QD computes an envelope of the signal by mixing a periodic signal of interest with in-phase and out-of-phase reference signals of the same frequency, then applying a low-pass filter to extract the output signal. HT combines the original signal with its Hilbert transform to build an analytic signal; the magnitude of the analytic signal is then computed to extract a time-domain envelope of the bipolar signal [21]. As previously mentioned, HT has a notable advantage in requiring no a priori knowledge of the frequency content of the a-line. Conversely, for QD to display an accurate signal, the reference signals must be matched to the center of the input signal's frequency spectrum. QD benefits from low processing overhead since all of the signal manipulation can be performed in the time domain, while HT requires two Fourier transforms, adding significant processing overhead. HT and QD processing were run in MATLAB on a raw photoacoustic data set of a Syrian hamster cheek pouch (Fig. 1(a)). Both QD processing (Fig. 1(b)) and HT processing (Fig. 1(c)) resulted in a morphologically accurate representation of the time-domain transducer response; however, due to the low-pass filter, QD results in a smooth profile, whereas HT generates a signal envelope with minor ripples throughout the peak. We chose to implement QD for its smoother profile and lower processing overhead. We identified field programmable gate arrays (FPGA) and graphics processing units (GPU) as hardware options for performing QD. Both FPGAs and GPUs are typically used to accelerate complex calculations by offloading data from the central processing unit (CPU), performing the calculations in parallel on the FPGA or GPU, then sending the results back upon completion [22]. In general, FPGAs have less overhead than GPUs when communicating with the CPU. For continuous PAM imaging, unexpected latency when communicating the PAM signal could lead to data loss. We therefore chose an FPGA to perform enveloping of the PAM signal.
FPGAs are reconfigurable chips with interconnected logic gates. An FPGA has low overhead and determinism similar to an application-specific integrated circuit, but with much more flexibility. The functionality of the FPGA is defined by using software to configure its gates. This reconfigurability is useful when performing QD, since a different set of reference signals and a different low-pass filter are required for each transducer used. Due to the architecture of the FPGA, many algorithms can be parallelized, making it an ideal tool for many biomedical applications [23][24][25]. By exploiting this parallelism, the time taken to perform QD can be greatly reduced.

Data processing
For QD, the demodulated output signal, PA, is obtained by multiplying the input signal by a sine wave of frequency ω equal to the center frequency of the transducer. Separately, the input signal is also multiplied by a cosine wave of frequency ω. Frequency mixing with the sine and cosine shifts the original signal to baseband and to 2ω. While this could be accomplished with just one reference signal, using both sine and cosine removes sensitivity to phase variations. The results are then sent through a low-pass filter with a pass band dependent upon the transducer bandwidth. Finally, the square root of the sum of the squares is taken to provide the enveloped signal. Since the result of QD has half the magnitude of the input signal, the result is multiplied by two to provide an accurate envelope. This process is defined in Eq. (1).
PA(t) = 2√( [LPF(S(t)·sin(ωt))]² + [LPF(S(t)·cos(ωt))]² )      (1)

where S(t) is the input signal and LPF denotes the low-pass filter. While requiring less computing power than HT, QD is still a computationally intensive process, with significant data processing required to perform the low-pass filter. A distinct benefit of the FPGA is the ability to pipeline data processing. Pipelining allows complex calculations requiring multiple clock cycles to be parallelized for increased efficiency, with the processing steps performed simultaneously on multiple data points. To illustrate the benefit of pipelining, consider an algorithm that takes 100 clock cycles to process a single pixel. On an FPGA with an 80 MHz clock, processing a 1000 by 1000 pixel image would take 1.25 seconds without pipelining. With pipelining, the 100 clock cycles per pixel of processing overhead become an initial latency of 100 clock cycles followed by one pixel output every clock cycle. On the same 80 MHz FPGA, the 1 megapixel data set then takes only 0.0125 seconds to process, an improvement of two orders of magnitude. An algorithm was developed to run on an 80 MHz FPGA (National Instruments 7965R) to perform QD on the PAM signal as it was collected. The low-pass filter would typically act as a bottleneck in signal processing, taking 36 clock cycles of the FPGA per pixel. At 80 MHz, this translates to a 450 ns delay per pixel. Through pipelining, this delay becomes a one-time 450 ns latency followed by constant throughput at 12.5 ns per pixel. Figure 2 illustrates the flow of data for QD through the FPGA. For a PAM system equipped with a 100 kHz laser, a 1000 by 1000 by 1000 pixel PAM data set could be collected in 10 seconds. With data pipelining, this image can be collected and processed in 10.575 seconds, a processing overhead of less than 6% of the total acquisition and processing time. At the same clock rate, it would take 550.25 seconds to collect and process an identical data set without pipelining.
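The QD algorithm of Eq. (1) can be sketched in software as follows. This is a Python/NumPy illustration of the algorithm, not the authors' FPGA implementation; the fourth-order Butterworth filter with a 10 MHz cutoff is an assumed stand-in for a pass band chosen from the transducer bandwidth:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 80e6                          # digitizer sampling rate, Hz
f0 = 27e6                          # assumed transducer center frequency, Hz
t = np.arange(0, 2e-6, 1 / fs)

# Synthetic bipolar PA signal with a known Gaussian envelope.
true_env = np.exp(-((t - 1e-6) ** 2) / (2 * (100e-9) ** 2))
signal = true_env * np.sin(2 * np.pi * f0 * t)

# Mix with in-phase and quadrature references at the center frequency,
# shifting the signal to baseband and 2*f0.
i_mix = signal * np.sin(2 * np.pi * f0 * t)
q_mix = signal * np.cos(2 * np.pi * f0 * t)

# Low-pass filter to keep only the baseband component (illustrative cutoff).
b, a = butter(4, 10e6, fs=fs, btype="low")
i_lp = filtfilt(b, a, i_mix)
q_lp = filtfilt(b, a, q_mix)

# Magnitude, doubled to compensate for the factor-of-two loss in mixing.
pa_envelope = 2 * np.sqrt(i_lp ** 2 + q_lp ** 2)
```

With the reference frequency matched to the signal, `pa_envelope` recovers the true envelope; a mismatched reference would distort the result, which is the sensitivity noted above.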
Table 1 shows the throughput of the system and the latency incurred at each step. The latency due to data processing could be removed entirely by implementing QD in hardware through signal mixers and analog filters; however, software-based QD provides the flexibility to optimize filters for multiple transducers.

Preparation of zebrafish embryos
Embryos were treated with 1-phenyl-2-thiourea during development to block the formation of melanin [26]. Prior to imaging, the zebrafish embryo was anesthetized using 0.0175% Tricaine and embedded in a 1% agarose solution to restrict movement.


System evaluation
The theoretical axial resolution of a PAM system is defined by Eq. (2),

R_axial = 0.88ν / (Δf·cos θ)      (2)

where ν is the speed of sound in tissue, Δf is the 6 dB bandwidth of the transducer in receiver mode, and θ is the angle of the transducer relative to the optical axis [27]. The constant of 0.88 comes from modeling the impulse response of the transducer as a Gaussian-modulated sine wave. For a co-axial system, θ is zero, whereas for an off-axis system the axial resolution is governed by the projection of the on-axis signal. For our off-axis PAM system, where Δf equals 27 MHz and θ equals 50°, the theoretical axial resolution is 76.1 µm. To evaluate the axial resolution experimentally, a volumetric image (Fig. 4(a)) was taken of a small ink mark on a coverslip and rendered in ImageJ. Given the intrinsically thin nature of the ink mark, the resultant a-line approximates the impulse response of the system, providing an estimate of the axial resolution [27]. An artifact due to the reflection of the acoustic signal through the coverslip is clearly visible in the b-scan of the data, displayed in Fig. 4(b). A demodulated a-line taken from the ink mark, shown in Fig. 4(c), was found to have a FWHM of approximately 90 µm. While this is in fair agreement with the theoretical axial resolution given by Eq. (2), discrepancies may have arisen from the transducer impulse response deviating from the Gaussian-modulated sine wave model, as well as from errors in measuring the precise transducer angle.
The minimum angle θ in Eq. (2) is restricted by the diameter and working distance of the objective lens and the diameter and working distance of the transducer. This constraint becomes important for high-NA (>0.5) objectives, which typically have large diameters and short working distances.
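The quoted theoretical value can be reproduced numerically from Eq. (2). The speed of sound ν = 1500 m/s is our assumption here (the text specifies only "the speed of sound in tissue"), but it recovers the stated 76.1 µm:

```python
import math

# Eq. (2): R_axial = 0.88 * v / (delta_f * cos(theta))
v = 1500.0           # assumed speed of sound in tissue, m/s
delta_f = 27e6       # 6 dB transducer bandwidth, Hz
theta = math.radians(50)   # transducer angle relative to the optical axis

r_axial = 0.88 * v / (delta_f * math.cos(theta))
print(f"{r_axial * 1e6:.1f} um")   # 76.1 um
```

The measured 90 µm FWHM is then about 18% above this prediction, consistent with the modeling and angle-measurement caveats above.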
Using an unfocused transducer has previously been demonstrated as an effective method for accommodating galvanometer scanning [28]. In place of an unfocused transducer, a focused transducer may be used to optimize the signal-to-noise ratio (SNR) at the cost of field of view (FOV). By operating the transducer slightly out of its optimal focal depth, a larger field of view can be obtained with minimal reduction of SNR. The transducer was translated along its focal axis such that the sample was located in the near field of the transducer focus. The loss of SNR with respect to the offset from the transducer focus was characterized using an ink sample, and the corresponding increase in field of view was characterized using an Air Force resolution chart. SNR was calculated as twenty times the base-10 logarithm of the ratio of the peak signal strength to the standard deviation of the noise floor. Figure 5(a) shows the adjusted design of the off-axis setup. Moving farther into the near field of the transducer provided an increase in lateral FOV, shown in Fig. 5(b). As expected, the increase in lateral FOV came with a reduction in SNR, shown in Fig. 5(c).
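The SNR definition above can be written out directly. The helper below is hypothetical (not from the paper), and choosing a sample window that contains only the noise floor is left to the experimenter:

```python
import numpy as np

def snr_db(a_line: np.ndarray, noise_region: slice) -> float:
    """SNR as 20*log10(peak signal / standard deviation of the noise floor)."""
    peak = np.max(np.abs(a_line))
    noise_std = np.std(a_line[noise_region])
    return 20 * np.log10(peak / noise_std)

# Illustrative use on a synthetic a-line: unit peak over noise of sigma = 0.01.
rng = np.random.default_rng(0)
a_line = rng.normal(0, 0.01, 1000)
a_line[500] = 1.0
print(round(snr_db(a_line, slice(0, 400)), 1))   # ≈ 40 dB
```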
Signal fluctuations are known to occur in the near field of a focused transducer and explain the noisy sinusoidal shape of the SNR curve. At the maximum field of view of 2.74 mm, the SNR was reduced by only 1.15 dB relative to the focus of the transducer; however, this point corresponded to a peak in the near-field fluctuations. Observing the SNR at a trough in the near-field fluctuations corresponded to a 3.85 dB reduction in SNR with a field of view of only 2.4 mm; it was therefore important to maximize both field of view and SNR with respect to the near-field fluctuations. Figure 5(d) characterizes the loss of SNR with increasing FOV, demonstrating the fluctuations in SNR with respect to FOV. If the optical excitation was moved too far into the near field of the transducer, the detection path of the transducer was obscured by the side of the objective, limiting further improvement in the FOV. Lateral and axial resolutions were not affected by utilizing the near field of the transducer.
Moving into the near field of the transducer causes fluctuations in signal intensity in both the lateral and axial dimensions. To characterize these fluctuations, a piece of black tape was imaged as a uniform absorber, and the SNR of the photoacoustic signal was monitored as a function of distance from the transducer. Figure 6(a) shows the 5.1 dB variance of the lateral SNR across the 6 dB field of view of the transducer. Similarly, the signal strength in the axial dimension was characterized by imaging an ink mark on paper and observing the SNR of the photoacoustic signal as the axial position of the ink mark was adjusted. Figure 6(b) characterizes the 0.4297 dB variance of the axial SNR across the 6 dB depth of focus. These near-field fluctuations of the PAM signal can limit the effectiveness of quantitative analysis of photoacoustic signals, and fluctuations across the entire field of view can reduce image quality. Near-field fluctuations can be processed out through normalization of captured data to the near field of the transducer. Since imaging of the zebrafish embryo only utilized 1 mm of the available 2.5 mm field of view, signal generation could be localized to a region with minimal near-field fluctuations, removing the need for signal normalization. Finally, to fully evaluate the potential for high-speed PAM applications, the FPGA was set up to run at its maximum sampling frequency of 80 MHz and to transfer demodulated data to the computer via a first-in, first-out (FIFO) buffer. With 80 pixels per a-line, a 200 by 200 by 80 pixel volume was simulated and the program ran without issue, translating to a line rate of 1 MHz. A data transfer rate of 800 MB/s between the FPGA and the host computer enables continuous transfer of full b-scans through the block memory of the FPGA.
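The simulated benchmark's line rate follows directly from the sample count per a-line. The volume rate below is our back-of-envelope implication, ignoring any flyback or transfer dead time:

```python
fs = 80e6                    # FPGA sampling clock, Hz
samples_per_aline = 80       # pixels per a-line in the simulated volume

# One a-line is emitted every 80 samples, giving the 1 MHz line rate.
line_rate = fs / samples_per_aline

# At that rate, a 200 x 200 a-line raster would complete in:
volume_time = (200 * 200) / line_rate    # seconds per simulated volume
print(line_rate, volume_time)            # 1000000.0 0.04
```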
Currently, the speed limitations of this system lie in the speed of the digitizer, the repetition rate of the laser, and the depth of focus of the objective lens. With a faster digitizer, greater line rates could be achieved; however, current hardware limitations between the FPGA and the host computer would eventually cap the maximum data throughput at around 800 megabytes per second. With a faster laser, greater line rates could also be achieved; however, at greater line rates it becomes important to limit the optical excitation volume, to prevent the temporal overlap of PA signals generated at shallow and deep locations in the tissue by two different laser pulses.

Dynamic imaging of zebrafish heart
To dynamically image the heart of a zebrafish embryo, demodulated b-scan sequences were captured in time across a 1 mm region of the zebrafish heart. Each frame in a sequence consisted of 200 a-lines captured at a line rate of 25 kHz. Each frame took 8 ms to capture, followed by a galvanometer flyback of 0.8 ms, for an effective frame rate of 113 Hz. Each sequence consisted of 200 b-scans, for a total capture time of 1.76 seconds. Figure 7 shows three frames from a sequence of demodulated b-scans. Demodulation was performed in parallel with data acquisition via the FPGA; no additional post-processing was performed. The full sequence is available in supplemental Media 1. A total of 31 b-scan sequences were captured in 5 µm increments across the zebrafish heart. The result was a collection of 31 data sets, each 200 by 200 by 200 pixels, of the beating zebrafish embryo heart. The collection of sequences was then assembled into a four-dimensional data set via postacquisition synchronization, which utilizes periodic events in a collection of videos to generate a time sequence of volumetric data sets [29]. In addition to postacquisition synchronization, the image was resampled 4 to 1 in the axial dimension to achieve symmetrical voxels. Figure 8 shows 16 volumetric images from a rendered four-dimensional data set of the beating heart of a developing zebrafish embryo. The video of the four-dimensional data set is available in supplemental Media 2.
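The frame timing quoted above can be checked with a short calculation using only the numbers given in the text:

```python
line_rate = 25e3            # a-line rate, Hz
alines_per_frame = 200
flyback = 0.8e-3            # galvanometer flyback per frame, s
frames_per_sequence = 200

frame_time = alines_per_frame / line_rate       # 8 ms of acquisition per frame
effective_rate = 1 / (frame_time + flyback)     # ~113.6 Hz effective frame rate
sequence_time = frames_per_sequence * (frame_time + flyback)   # 1.76 s total
```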
A long working distance, low-NA objective was used to image the zebrafish embryo heart tube. The long working distance was convenient for placing the embryo in the field of view of the imaging system while still providing enough resolution to resolve the heart tube. A zebrafish embryo 48-72 hours post fertilization has a documented heart rate of 147 beats per minute [30]. Capturing a 200 by 31 by 200 pixel volume sampled at the Nyquist frequency would require a laser repetition rate of 75.02 kHz. Due to power limitations of the laser itself, the system was limited to a maximum repetition rate of 25 kHz in biological samples. Thus, it proved necessary to use postacquisition synchronization to capture an appropriately sampled volume of the beating zebrafish heart. Each frame was generated using the ImageJ 3D Viewer.
Fig. 8. Three-dimensional volumes taken from a four-dimensional data set of a zebrafish heart. To show the progression of a single heartbeat, the figure shows every third volume from the data set of a single cardiac cycle. Chamber 1, on the left, starts in diastole while Chamber 2, on the right, is in systole. As the heartbeat progresses, Chamber 2 relaxes while Chamber 1 begins to contract. At 0.1760 seconds, Chamber 1 is in systole while Chamber 2 is in diastole. The heartbeat finishes at 0.3344 s as Chamber 1 returns to diastole. Key frames are marked by an asterisk for clarity. (Media 2).
The zebrafish embryo has a two-chambered heart that pumps asynchronously: as Figure 8 illustrates, when one chamber is in diastole, the other is in systole. Following the image from left to right, top to bottom, the left chamber starts in diastole while the right chamber is in systole. As the beat progresses, the left chamber contracts into systole while the right chamber relaxes into diastole. Finally, to complete the cycle, the left chamber relaxes back into diastole and the right chamber returns to systole. At 48-72 hours post fertilization, the heart of a zebrafish appears as a U-shaped tube. In Fig. 8, both chambers are shown, with the region connecting the two chambers lying outside the imaging plane.

Conclusion
Through the implementation of galvanometer scanning, we have improved upon our design of off-axis PAM, and through QD of the PAM signal via FPGA, we have enabled continuous collection and display of morphologically accurate PAM images. We have demonstrated the speed capabilities of the PAM system by capturing a four-dimensional data set of the beating heart of a zebrafish embryo. Currently, the limiting factor of the PAM system is the repetition rate of the laser. Using simulated data, we have verified the capability of the FPGA to handle laser repetition rates up to 1 MHz. With a higher repetition rate laser, four-dimensional images of dynamic processes could be captured without the need for postacquisition synchronization. By eliminating the delay between image capture and display, FPGA technology can further enable research applications of PAM.