Optoacoustic imaging at kilohertz volumetric frame rates

: State-of-the-art optoacoustic tomographic imaging systems have been shown to attain three-dimensional (3D) frame rates of the order of 100 Hz. While such a high volumetric imaging speed is beyond reach for other bio-imaging modalities, it may still be insuﬀicient to accurately monitor some faster events occurring on a millisecond scale. Increasing the 3D imaging rate is usually hampered by the limited throughput capacity of the data acquisition electronics and memory used to capture vast amounts of the generated optoacoustic (OA) data in real time. Herein, we developed a sparse signal acquisition scheme and a total-variation-based reconstruction approach in a combined space–time domain in order to achieve 3D OA imaging at kilohertz rates. By continuous monitoring of freely swimming zebrafish larvae in a 3D region, we demonstrate that the new approach enables significantly increasing the volumetric imaging rate by using a fraction of the tomographic projections without compromising the reconstructed image quality. The suggested method may benefit studies looking at ultrafast biological phenomena in 3D, such as large-scale neuronal activity, cardiac motion, or freely behaving organisms. State-of-the-art optoacoustic tomographic imaging systems have been shown to attain three-dimensional (3D) frame rates of the order of 100 Hz. While such a high volumetric imaging speed is beyond reach for other bio-imaging modalities, it may still be insufficient to accurately monitor some faster events occurring on a millisecond scale. Increasing the 3D imaging rate is usually hampered by the limited throughput capacity of the data acquisition electronics and memory used to capture vast amounts of the generated optoacoustic (OA) data in real time. Herein, we developed a sparse signal acquisition scheme and a total-variation-based reconstruction approach in a combined space – time domain in order to achieve 3D OA imaging at kilohertz rates. By continuous monitoring of freely swimming zebrafish larvae in a 3D region, we demonstrate that the new approach enables significantly increasing the volumetric imaging rate by using a fraction of the tomographic projections without compromising the reconstructed image quality. The suggested method may benefit studies looking at ultrafast biological phenomena in 3D, such as large-scale neuronal activity, cardiac motion, or freely behaving organisms.


INTRODUCTION
Imaging dynamics is essential for understanding the complex biology of living organisms. Optoacoustic tomography (OAT) holds significant advantages for monitoring of fast biological events, since the entire imaged volume can be excited with a single pulse of light, thus potentially enabling very high imaging rates ultimately limited by the time-of-flight of the generated ultrasound waves [1]. However, in contrast to previously demonstrated high frame rate two-dimensional (cross-sectional) imaging implementations [1], high-resolution three-dimensional (3D) optoacoustic (OA) imaging implies the acquisition of signals from a much larger number of positions around the imaged object [2][3][4][5][6], thus imposing a practical limitation for the temporal resolution or imaging rate that can be achieved. Indeed, even if sufficient numbers of OA signals are simultaneously collected with a suitable ultrasound array, data transmission, storage, and processing may turn very challenging as the frame rates increase. Recently, volumetric data acquisition (DAQ) and real-time 3D image rendering at 100 Hz frame rates have been demonstrated with high-end OAT systems employing matrix detection arrays and GPU-based processing of the generated data flows in the gigabits per second range [7][8][9]. With this volumetric imaging speed it became possible to efficiently monitor fast biological phenomena, such as calcium transients in the brain [10] or cardiovascular dynamics [9,11], whereas spiral volumetric optoacoustic tomography (SVOT) further enabled scaling of temporal resolution with the field of view to attain multi-scale dynamic imaging capabilities [12]. However, some biological events may still occur on a faster time scale. For instance, voltage transients (action potentials) in neurons have durations of the order of milliseconds and the recently developed voltage indicators have enabled both fluorescence-and absorption-based readings of those transients in one or two dimensions [13,14]. In addition, investigation of brain activity patterns is often done in freely behaving animals whose rapid movements may necessitate much faster volumetric imaging rates in order to avoid motion-related artifacts [15].
One possible way to boost the temporal resolution without increasing the average data flow consists of using partial DAQ and reconstruction procedures based on compressed sensing (CS) approaches. CS has attracted vast attention in the past decade, suggesting that it is possible to fully recover a signal from a lower number of samples than required according to the Shannon-Nyquist sampling criterion [16,17]. To this end, tomographic imaging modalities have greatly benefited from CS-based methods by reducing the amount of data needed for accurate reconstructions [18]. In OAT, recent works have showcased reconstructions from partial data using CS [19,20] .I nt h i sw o r k ,w et a k et h i sa p p r o a c h one step further by incorporating a total variation (TV) regularization term in the combined spatio-temporal domain in order to achieve significant acceleration of 3D imaging frame rates with negligible effects on the resulting image quality.

A. Experimental Setup
The DAQ protocol is schematically depicted in Fig. 1. The OAT system used for real-time acquisition of volumetric data employed a custom-made 30 mm radius spherical matrix transducer array (Imasonic SaS, Voray, France) spanning an angle of 140°(1.3π solid angle) and consisting of 512 piezocomposite elements having a size of ∼2.5m m . The 10 MHz central frequency and >80% detection bandwidth of the individual detection elements provided an isotropic 75 μm spatial resolution near the center of the spherical acquisition geometry and approximately 6m m× 6m m× 6m meffective field of view [21]. Light excitation with ∼10 ns duration pulses at 532 nm wavelength was provided with a diode-end-pumped Nd:YAG Q-switched laser (Model IS8II-E; EdgeWave GmbH, Würselen, Germany) whose pulse repetition frequency can be set up to 10 kHz. The light beam was guided through a custom-made fiber bundle (Ceramoptec GmbH, Bonn, Germany) providing approximately 1m J ∕cm 2 fluence in the imaged area. A dedicated multi-channel DAQ electronics was designed to simultaneously digitize the detected OA signals at a sampling rate of 40 megasamples per second. The DAQ allows for the acquisition of 494 samples per channel per laser pulse. The digitization was delayed by 13.3 μs with respect to the laser excitation pulse in order to efficiently capture signals originating from the imaged area located around the effective field of view of the spherical array, i.e., 30 mm away from its active surface. The average data throughput of the DAQ is limited by its 1 Gbps Ethernet interface. Thus, only a 100 Hz frame rate can be achieved when acquiring data from all 512 channels simultaneously for each laser pulse, corresponding to an average data flow of ∼400 Mbps including the transmission protocol overhead. As a result, the imaging frame rate can be increased only by reducing the number of channels recorded for each laser pulse, i.e., spatial subsampling of the tomographic data.
The sparse acquisition scheme was facilitated by splitting the detection elements into m randomly distributed groups of n elements, as labeled with different colors in Fig. 1(a), with m · n 512 being the total number of the array sensors. The designed DAQ system is capable of sequential acquisition of signals from each group, triggered with the excitation laser pulses [ Fig. 1(b)]. After signals from all 512 channels have been acquired, the same acquisition sequence was repeated. In this manner, the amount of data that needs to be transmitted per each laser pulse is reduced, thus allowing increasing the imaging frame rate by a factor of m without altering the average data transfer rate.

B. Image Reconstruction
The TV-based image reconstruction procedure was derived from the recently introduced model-based OA reconstruction approach [22]. The pressure waveforms p i detected by the transducer elements at the ith acquisition were modeled in a vector form via where h i is the distribution of the absorbed energy density in a grid of voxels enclosing the sample for the ith acquisition, A is the model matrix derived from the discretization of the OA forward model considering all transducer elements [23], and C i is a selection matrix that picks up only signals of the corresponding group. A uniform Grüneisen parameter was assumed for simplicity [22]. For l consecutive acquisitions, the full theoretical model is expressed as where , h l T , and I l is the l by l identity matrix. According to Nyquist sampling theory, accurate reconstruction of the full sequence of images h would in principle necessitate significantly more information than is actually available through our sparse sampling approach. However, it has been previously shown that sparsity of the solution can be exploited to recover information from fewer samples using CS principles [19]. For this, the signal being recovered must exhibit sparsity in some domain and the sampling matrix must have restricted isometry property (RIP). The latter property is generally hard to prove, but the probability of a randomized matrix to fulfill the RIP condition is known to be relatively high [19]. Since our acquisition electronics does not permit randomizing the temporal sampling space, we attempt to achieve RIP for matrix CA tot by randomizing the spatial subsampling imposed by the C matrix. Although here the RIP property was not rigorously demonstrated, such randomization is known to facilitate attaining RIP conditions [24]. Consequently, the CS-based reconstruction problem is defined as with p m being the measured pressure signals and λ a regularization parameter. khk TV is the spatio-temporal TV of the image sequence defined as where x, y, and z are spatial dimensions and t is the temporal dimension.
The TV norm represents a sparsity-enforcing regularization term that can be efficiently exploited with CS-based methods [17]. Equation (4) further leverages the temporal sparsity of the captured images. Here we created a custom solver for Eq. (3) based on the sub-gradient descent algorithm [25]. The latter was selected owing to its straightforward implementation.

Research Article
Yet other approaches based on, e.g., accelerated primal dual methods may provide better convergence [26], while more advanced algorithms for reconstructing spatio-temporal data are also available [27,28]. The gradient required for the sub-gradient descent method was expressed analytically as the divergence of the normalized gradient of the image: where div is the divergence operator [29] and a is a smoothing term that was added to the denominator to avoid indefinition when the gradient equals 0. This approximation of the gradient ∇ a results in a smooth function getting closer to the original gradient as the value of a approaches 0. The resulting iteration scheme is expressed as where and the step size for the descent algorithm was determined according to the Barzilai-Borwein method. The algorithm is iterated until a satisfactory image quality is achieved. Application of TV regularization in the temporal domain has recently been implemented in the field of MRI [28]. In our case, weighing of temporal domain gradients against their spatial counterparts was performed via (8) Assuming small frame-to-frame variations in the imaged object, it was heuristically determined that satisfactory image quality is achieved when w s w t , an assumption that significantly decreases the algorithm's complexity.
Note that due to the high data volumes resulting from the volumetric image acquisition, the suggested approach is not capable of reconstructing long image sequences in a single step-a 16000 frame sequence of 3D images containing 128 × 128 × 64 single precision voxels would require 312.5 gigabytes of memory. To overcome this memory-related limitation, we retained only the central image as an output from the L images reconstructed with the spatio-temporal TV-regularized inversion using L consecutive frames. In the subsequent sections, this method will be referred to as TV × L, with L representing the size of the temporal window.
The reconstructions were implemented using OpenCL framework and executed in Matlab (MathWorks, Natick, Massachusetts) as a mex function on Nvidia 900 Series Titan X GPU. A single 3D image on a 128 × 128 × 64 grid was rendered in 50 ms by using the accelerated 3D back-projection (BP) algorithm [30]. The TV-based reconstructions were done using 10 iterations with a custom sub-gradient descent solver using the same image grid. The most computationally extensive matrix-vector multiplications required by the solver were the multiplication with the model matrix (A) and its transpose.
The latter were calculated on the fly using the recently reported GPU implementation [31]. The remaining calculations were done on the Intel i7 3820 CPU. The reconstruction times were 6, 9, 15, 24, and 55 s per individual 3D frame when considering 1, 2, 4, 8, and 16 consecutive acquisitions with 512, 256, 128, 64, and 32 random channels per acquisition, respectively. Note that the total reconstruction time is chiefly conditioned by the number of computationally expensive matrix-vector multiplications, which is in turn proportional to the number of consecutive frames considered in the TV procedure.

C. Numerical Simulations
Performance and validity of the algorithms were first verified in numerical simulations. For the purpose of assessing the temporal and spatial resolution performance, a 75 μm diameter sphere (approximately the size of the spatial resolution of the system) was simulated moving at various velocities. Simulations were based on analytical calculation of OA signals generated by a sphere having a 3D parabolic absorption profile, for which the generated timeresolved pressure signal can be calculated analytically [32], thus prov i d i n gar e l i a b l em e t r i cf o ra n a l y z i n gt h es y s t e m 's performance. It was assumed that the sphere was illuminated uniformly in the entire reconstructed 3D volume by a laser pulse whose temporal profile was approximated by a Dirac delta function. The matrix array transducer geometry was approximated by 512 point detectors evenly distributed over the 140°spherical aperture. The individual signals received by each transducer were calculated using a Poisson-type solution to the OA wave generation equation [33]. The detectors were assumed to have 10 MHz central frequency and >80% bandwidth. The 512 detected signals were then combined to create a sinogram. An additive Gaussian white noise was added to the sinogram, resulting in SNR of 20 dB. For sparse acquisition, only the active channels were simulated. For each subsampling scheme, a new simulation was created, resulting in a slightly different set of signals due to the added noise. Each 3D image frame was reconstructed on a 128 × 128 × 64 grid enclosing a 1m m× 1m m× 0.5m m volume. For TV-based reconstructions, the λ value in Eq. (3) was selected manually to render the best perceived image.

D. In vivo Imaging of Zebrafish Larvae
Experiments were performed with five days post-fertilization (dpf ) wild-type zebrafish larvae. The reason for selecting zebrafish larvae for experimental demonstration was twofold. First, their swimming speed in water is high enough to account for significant displacements (on a spatial resolution scale) within a few milliseconds. For instance, when the zebrafish tail moves with 100 mm/s velocity, the inter-frame motion may reach 1 mm for a 100 Hz frame rate [34], thus exceeding by more than 1 order of magnitude the point spread function of our imaging system (75 μm). Imaging at kilohertz rates becomes then essential to smoothly track the larval motion. Second, their size is small compared to the effective field of view of the imaging system, which allowed efficiently exploiting image sparsity during the reconstruction procedure, as detailed in the next section. During the experiments, the spherical transducer array was pointing upward and the volume between the array and the imaged specimen was filled with agar gel to guarantee good acoustic coupling. A 1 cm diameter cavity was carved into the agar at approximately the center of the spherical geometry, where the larva was allowed to swim.

A. Numerical Simulations
Performance of the sparse acquisition TV-based reconstruction algorithm was first validated by numerically simulating volumetric imaging of the moving absorbing sphere. Since the algorithm exploits signal sparsity by finding piecewise constant solutions to the reconstruction problem, fast motion would unavoidably result in reduced frame-to-frame sparsity of the data. Furthermore, for rapidly moving absorbers, the given temporal sampling frequency may become insufficient to capture a complete tomographic information. Figure 2 shows dependence of the reconstructed size of the moving sphere (FWHM) versus its normalized (dimensionless) velocity calculated viaῡ v∕f s · D, where v m∕s is the actual velocity, f s Hz is the imaging frame rate, and D [m] is the size of the sphere. Normalization with D was necessary to create an independent variable for the motion speed that neutralizes the effect of the physical object size on the spatial resolution performance of the system. Clearly, for very slow relative velocity values, the TV algorithm is able to recover the correct size of the absorber using a very low number of random channels per acquisition with both TV × 4 (128 random channels per frame) and TV × 16 (32 random channels per frame) algorithms performing equally well. As the normalized velocity approaches 1, the reconstructed sphere gets smeared along the dimension of motion with its reconstructed size appearing ∼4 times longer that its actual diameter when using TV × 4 and ∼9 times longer when using the TV × 16 algorithm. These blurring artifacts are absent in reconstructions where no temporal sparsity is enforced (TV × 1; dashed lines in Fig. 2). This is expected since single frame OA reconstruction is not affected by motion. However, while the TV × 1 method introduces less spatial blurring for fast motion speeds as compared with the spatio-temporal TV × 16 and TV × 4 methods, it becomes inferior to the latter methods when the motion is relatively slow. This effect is expected to be even more pronounced when the images exhibit less sparsity in the spatial domain. Clearly, spatial blurring can be completely avoided by employing simultaneous sampling with all 512 channels, which, however, results in temporal blurring, as the imaging frame rate is limited to 100 Hz. In Fig. 3, reconstructions of the absorbing sphere made with the TV algorithm are further compared to the conventional BP algorithm, both for full channel sampling and sparse sampling. In this case, the sphere was moving at a normalized velocity of υ 0.1 for all consecutive image frames. While the TV-based reconstruction is able to render a sphere of correct size and shape for both sparse and full sampling, the BP algorithm results in a noisy reconstruction containing artifacts resulting from subsampled data when using only 32 random tomographic projections [ Fig. 3(a)]. Figure 3(b) further illustrates the dependency of contrast-to-noise-ratio (CNR) on the number of consecutive frames considered during the TV-based reconstruction. For the single frame reconstruction, the TV-based approach attains a better CNR as compared to BP for the same number of channels. As expected, the CNR increases as the number of consecutive image frames in the TV inversion increases. Note that combining multiple frames of a moving object into a single BP reconstruction would readily result in spatial blurring, thus only single frame BP reconstructions were considered in the CNR plot. Numerical simulations of the TV-based reconstructions of a moving absorbing sphere showcasing spatial resolution degradation as a function of the relative motion speed. The data is presented for the TV algorithm using 16 consecutive frames with 32 random channels per frame (TV × 16), four consecutive frames with 128 random channels per frame (TV × 4), and one frame (TV × 1) with 128 or 32 random channels. The reconstructed relative sphere size, i.e., the reconstructed full width at half-maximum (d) for the longest dimension divided by the actual sphere diameter, is plotted against its relative speed of motion measured as the ratio between the inter-frame displacement of the sphere and its diameter. The reconstructed 3D images of the swimming zebrafish larva as a function of the number of random tomographic projections (detection channels) per frame are displayed in Fig. 4. The leftmost column showcases the images reconstructed with the BP algorithm, whose image quality significantly deteriorates for partial DAQ, with the larva barely visible when acquiring with groups of 32 channels per laser pulse (fifth row). The maximum frame rate that can be achieved for the given average data transmission limit of the system is inversely proportional to the number of channels acquired per frame (laser pulse), as indicated on the vertical axes in Fig. 4. The images obtained with the newly developed CS-based method are showcased in the other columns of Fig. 4 as a function of the number of acquisitions employed for the reconstruction. The reconstructions in each column were performed by considering L consecutive frames (denoted as TV × L). The image artifacts are readily mitigated even when using the single-frame TV algorithm for the partial data reconstructions (TV × 1 in Fig. 4), which corresponds to TV regularization only in the spatial domain. Indeed, high contrast and good spatial resolution images are achieved when at least 64 random channels are used with TV × 1. Image quality further improves once spatio-temporal TV regularization is used. For example, the shape of the larva can be accurately reconstructed when considering two consecutive acquisitions with 32 random channels each (TV × 2). Image contrast and overall quality for a given number of channels per group increase with the number of frames used in the CS-based reconstruction, indicating the importance of TV regularization in the spatio-temporal domain.
A 3D movie of the swimming larva generated with the TV × 16 algorithm is available as Visualization 1. Note that the video is displayed at 30 frames per second for a better visualization of the motion, while the actual 3D acquisition frame rate is 1.6 KHz. It can be clearly seen that the suggested CS approach is able to fully recover temporal information without introducing spatial blurring, as long as the images are sparse in the spatio-temporal domain. Each video frame was generated by simultaneously reconstructing astackofL frames and then selecting the middle frame for display. Note that isotropic TV regularization was used, which resulted in a smooth video without jumps or frozen segments.
To further showcase the spatio-temporal resolution performance of the method, Fig. 5 displays two consecutive images of the swimming larva (in red and green) during an abrupt turn. The image sequence was reconstructed using TV regularization with different acquisition frame rates ranging from 100 Hz to 1.6 kHz, while the corresponding optimal number of channels per acquisition group ranged from 512 to 32. While a significant shift of the larva's head and tail is clearly visible for images acquired at 100 Hz acquisition rate, the two consecutive images progressively overlap as the frame rate increases. Indeed, while the inter-frame eye motion accounts for 512 μm at 100 Hz, it diminishes to 50 μm at 1.6 kHz, i.e., below the effective spatial resolution of the system. Thus, kilohertz frame rates are essential to accurately track the fish motion.

DISCUSSION AND CONCLUSIONS
The presented results demonstrate basic feasibility of OA imaging at an unprecedented 1.6 kHz volumetric frame rate. The proposed fast sparse acquisition imaging method was shown to capture accurate images without loss of spatial resolution when the inter-frame speed of object motion remained significantly below the available (diffraction-limited) spatial resolution of the system. For instance, when sparsely sampling 32 channels per frame and using the TV × 16 algorithm for image reconstruction, the normalized object velocityῡ should remain below 0.1 (Fig. 2). In other words, for the 75 μm resolution system running at a volumetric frame rate of 1.6 kHz, the object motion between the consecutive image frames should remain below 0.1 · 75 μm · 1.6k H z 12 mm∕s in order to avoid degradation of the spatial resolution. For a denser sampling with 128 channels and reconstruction with TV × 4, theῡ value can be increased to 0.2, leading to a maximal permissible object velocity of 24 mm/s provided the spatial and temporal resolution parameters remain unchanged. The ultrafast volumetric imaging capability introduced in this work can greatly benefit applications involving tracking of functional activity in freely behaving organisms [35]. Furthermore, in cardiac imaging the heart rate may reach 600 beats per minute in small animals, thus accounting for significant motion within several milliseconds. Thus, 3D imaging at kilohertz rates can facilitate accurate characterization of the cardiac cycle and valve motion [11], detection of functional alterations due to arrhythmias, and other cardiac diseases [36]. Since blood flow may reach significant velocities in major vessels, fast imaging frame rates can be used to facilitate more accurate spectral unmixing of oxygenation status and perfusion [37]. The methodology introduced in this work may further serve applications involving imaging and tracking of circulating cells or other extrinsically administered agents, and facilitate enhancement of the spatial resolution beyond the diffraction limit [38].
The ultrafast 3D imaging capacity is not only limited to investigations involving physical motion but may equally find use in other applications where visualization of (sub-) millisecond biological dynamics is of importance. One particular example is the field of functional neuroimaging with large-scale brain activity typically occurring on multiple temporal and spatial scales. While imaging the relatively slow hemodynamic processes remains an important application of OA imaging in neuroscience [8], functional OA neuro-tomography has been recently shown capable of detecting rapid millisecond-scale dynamics using calcium and voltage-sensitive indicators [14,21].
In conclusion, the newly introduced 3D OA imaging capacity at unprecedented kilohertz rates holds promise to provide new insights into biological function not attainable with the existing bio-imaging techniques.
Funding. H2020 European Research Council (ERC) (ERC-2015-CoG-682379); National Institutes of Health (NIH) (R21-EY026382). Inter-frame eye motion ( m) Frame rate (Hz) µ Fig. 5. Spatio-temporal resolution performance of the TV-based CS approach. Two consecutive images of a swimming larva rendered at different frame rates are superimposed in red and green. While a significant shift in the larva position is clearly visible for frames recorded at 100 Hz acquisition rate, the two consecutive images progressively overlap as the frame rate increases. The inter-frame motion of the left eye, marked with blue arrows, is plotted against the frame rate. The red arrows indicate parts of the fish that moved significantly between the two consecutive frames.