Sparse sampling and reconstruction for an optoacoustic ultrasound volumetric hand-held probe

: Accurate anatomical localization of functional information is the main goal of hybridizing optoacoustic and ultrasound imaging, with the promise of early stage diagnosis and disease pathophysiology. Optoacoustic integration to ultrasound is a relatively mature technique for clinical two-dimensional imaging, however the complexity of biological samples places particular demands for volumetric measurement and reconstruction. This integration is a multi-fold challenge that is mainly associated with the system geometry, the sampling and beam quality. In this study, we evaluated the design geometry for the sparse ultrasonic hand-held probe that is popularly associated with three-dimensional imaging of anatomical deformation, to incorporate the three-dimensional optoacoustic physiological information. We explored the imaging performance of three unconventional annular geometries; namely, segmented, spiral, and circular geometries. To avoid bias evaluation, two classes of analytical and model-based algorithms were used. The superior performance of the segmented annular array for recovery of the true object is demonstrated. Along with the model-based approach, this geometry oﬀers spatial invariant resolution for the optoacoustic mode for the given ﬁeld of view. The analytical approach, on the other hand, is computationally less expensive and is the method of choice for ultrasound imaging. Our design can potentially evolve into a valuable diagnostic tool, particularly for vascular-related disease.

with the minimum rate of twice the central frequency. Following the theory of compressed sensing, it was shown that this line of thought can be obviated at the acquisition site [25][26][27]. The geometrical distribution of transducers (elements) has a similar role to the sampling pattern in compressed sensing. To take the advantage of this framework, careful engineering must be implemented, in terms of a sensing mechanism that emulates a randomized incoherent sampling pattern. Hence, by practicing the three principles of incoherence, sparsity, and random subsampling, a drastic reduction in the number of transducers with confidence in reconstruction fidelity is expected. Indeed, sparsity is exploited with a proper reconstruction algorithm, where the recovery of important coefficients is guaranteed.
In our companion paper [28], we presented the conceptual framework for designing a bimodal array, and argued the performance of three sampling patterns (Fig. 1). Here, we complement our original study by providing reconstruction-based analysis and further validation for simulated data. We investigate feature fidelity to the object that is associated with the merit of the reconstruction algorithm. In practical terms, the use of inverse problems in array imaging forms the deconvolution problem, which faces the ill-posedness [29,30] and large-scale [31][32][33] challenges. Given the partial view angle of the hand-held probe, the forward model is rankdeficient [28], and thus effective regularization must be incorporated. In this paper, we seek the answer in Krylov subspace, in which a hybrid regularization method combines the few steps of conjugate gradient least squares (CGLS) with total-variation (TV) penalization. In general, the success of sparsity-based reconstruction appears to be tied to the incoherence and to the restricted isometry property [26]. However, to the best of our knowledge, there is no useful theoretical guarantees for compressed sensing in acoustic-array imaging, specifically with respect to the sampling pattern. Here, we pursued an empirical method to establish an intuitive link between the sampling pattern and the uniqueness of the solution. To avoid the bias evaluation, we compared our result with a modified version of back-projection developed in our earlier study [34], here referred to as virtual element weighted delay and sum back-projection (VE-WDSBP). As the hand-held probe is far from an ideal imaging system, mainly due to the associated limited angle of view, spatial under-sampling, and finite-element size, the potentials of both reconstruction algorithms to deal with nonideal imaging scenarios are investigated in detail. The results are evaluated in terms of reconstruction quality, achievable resolution, and quantitative metrics. for the large elements equates to the ensemble signals recorded by λ/2 sub-elements of the same size, which constitutes the sampling pattern.

Image reconstruction problem
In general, acoustic image reconstruction involves estimation of the object inner structure based on emitted or reflected waves. These waves are recorded by transducers in the form of time-gated radiofrequency signals, at several positions. Basically, each recorded signal equates to a projection in the tomographic set-up, and can thus be formulated in the form of least squares (Ax = b), and by using the inverse problem to recover the acoustic (or optical) properties of the tissue.

Problem formulation
A linear discrete forward model for an idealized discrete-time linear shift-variant system can be derived by modeling a spherical acoustic wave for each voxel-transducer pair [35]. The sources can be assumed to be uncorrelated bipolar (optoacoustic) or monopolar (ultrasound), and are arranged in a lattice pattern with spacings of the diffraction limit. In general, the recorded pressure in a linear system can be approximated by the convolution of three terms: the acoustic impulse response (AIR), the spatial impulse response (SIR), and the source signal itself [28]. Along with the AIR, the SIR represents the spatial variant low pass filter in a linear shift invariant system, otherwise known as the transfer function. Basically, it correlates the acoustic wave of the k th source S k to the recorded signal U r ec through the n th element of the array.
where the impulse response h I R is sampled temporally and discretized to consist of L temporal samples. Expanding the formula for every discretized point within the aperture field of view (FoV) yields a time-discrete matrix representation of the above formula.
For the array of N transducers, the received signals represent the response of the transducers to an ensemble of incident waves that emanate from K sources within the FoV, with a vector representation of S whereŜ is the discretized estimation of the source; viz. the final image. The quality ofŜ is profoundly dependent on the properties of M, and thence the aperture. For instance, it was shown that the number of projections is strictly bound to the number of elements, or the frequency response of the transducers influences the spatial and temporal resolution [36]. However, quality assessment of an imaging system solely on the basis of the final image is not a flawless approach, as the inversion step can be devious. Due to the limited number of elements, the forward model or sensing matrix can be 'fat' (i.e., more columns than rows). Therefore, M is a nonsquare matrix with large condition number that further increases the ill-posedness of the inversion, such that M −1 e S exact , thus hiding the S exact among the inverted errors (e). To cope with this issue, regularization can be imposed to reduce the sensitivity of the solution to error, and to compute a stable solution. However, when the variance of error is not known, the choice of method (e.g., generalized cross validation, normalized cumulative periodogram, L-curve, etc.) for finding the optimal regularization parameter would bias the results [37]. Regularization by projection is yet another way in which the solution to the least squares is restricted to lie in a low dimensional subspace W κ (i.e., Ax − b 2 is subjected to x ∈ W κ ). Theoretically, the subspace W κ is spanned by vectors that represent the desirable features for the regularized solution.
A particular examples of this type is the truncated SVD (TSVD), where the κ dimension subspace is the space spanned by the first κ right singular vectors v.
Knowing that the solution exists in this subspace with the requirement of S = W κ z, the regularized solution can be expressed as a projection problem: For small κ, MW κ can be explicitly calculated to solve the projected least-squares problem, and hence to achieve the low-rank approximation [38]. The advantage of the SVD basis is that it can adjust itself to the problem explicitly by accommodating to the matrix M, although there are limitations associated with the SVD, which include the computation cost. Another equally comprehensive yet computationally attractive subspace is Krylov, which is defined as: Krylov subspace with a maximum κ dimension can adapt itself fully to the case in hand by incorporating the information of both of the quantities M and U r ec . Let us acknowledge that the linear least-squares functions are associated with so-called normal equations of the form M T MS = M T U r ec ; with M T M being a Hermitian positive semi-definite matrix of M. This property allows us to use conjugate gradients to solve the least-squares problem. By applying κ steps of conjugate-gradient iteration on the normal function, the S (κ) will be realized. The use of CGLS in the computing of regularized solutions in the Krylov subspace K κ is referred to as regularizing iterations [37]. Intuitively, CGLS constructs a polynomial approximation to the regularized pseudo-inverse of M. The intrinsic polynomial of CGLS explains how it converges faster than SVD-based regularization. When the SVD component (u T i U r ec ) is large, with u i as the i th left singular vector, CGLS automatically constructs a polynomial with eigenvalues (σ) of the M T M projection on the K κ . This acts as a filter factor by enforcing 'near σ i roots' to knock out large SVD components (u T i U r ec ) 2 [39]. Therefore, the solution space is minimized to a subspace of small dimensionality. The downside is that it accommodates to the error correlated with the recorded data U rec , which can lead to false estimation (e.g., artifacts).

Total-variation minimization
The use of an accurate forward model is crucial for iterative reconstruction algorithms, although the choice of the regularization method is of critical importance as well. The aim of the regularization is to make the problem well-posed by constraining the estimate to a prior, based on the statistical assumption of the error. The Hessian matrix M T M has poor conditioning, which in turn results in very slow convergence and non-unique solutions [40]. As the sampling transducers are limited in number, M is an underdetermined matrix equation that obviates the Hadamard well-posedness definition. Thus, additional constraint is crucial to estimate the 'best' candidate. A well-known technique is to impose a linear transformation to precondition the linear least-squares problem. The conjugate gradient can be preconditioned by classical Tikhonov or other forms of 2 in order to face the erroneous. These quadratic techniques are more suitable when the object tends to be smooth, as high-frequency variations are penalized. To tackle the undersampled measurements in a much more efficient way, there is the need to incorporate the nonstandard data fidelity terms [41]. However, in practical computations, the estimateŝ is indistinguishable from anyŝ + e if ŝ + e 2 is less than the round-off error. These challenges can be faced by external regularization that considers nonquadratic penalties that incorporate different types of a priori, such as sparsity and nonnegativity. The remarkable gradient-based sparsifying characteristics of the TV allow achievement of the piece-wise constant 1 form of recovery [42]. TV overcomes the limitation of Tikhonov by preserving the sharp edges.
where ∆ x i and ∆ y i correspond to the horizontal and vertical first-order differences at the i th pixels. The superior performance of TV over the 1 norm ( 1 = i |∆ x i S| + |∆ y i S|) has been demonstrated in many applications of compressed sensing [43][44][45]. The advantage lies on penalization where the discontinuities, such as sharp edges, are preserved but are not necessarily preferred over the smooth ones. Instead, their presence hinges upon the detected signals. By applying TV to Krylov subspace, an inexact solution (although reliable) can be found in the restricted Krylov subspace [46].
that satisfies: where the regularization parameter (λ TV ) balances the trade-off between data fidelity (the first term) and regularization (the second term), and it is chosen heuristically. To the best of our knowledge, there is no general criteria, such as L-curve, to estimate the TV regularization parameter. Moreover, the nonquadratic properties of the TV norm hinders the integration with the linear conjugate gradient [47]. To solve the above equation, we followed the so-called superiorization approach [48,49], where the penalty term is replaced by perturbation steps between iterations, to shift the path toward an intermediate solution [50]. This shift is subjected to the second criterion, defined by the TV. Here we implemented the S-CG-K pseudo-code in Ref. [49] in Matlab.
There are two questions that need to be addressed here: (a) can S be represented sparsely in the formed basis; and (b) whether the enforced sparsity suits the sampling patterns. The answer might lie in the principle of incoherence, which demands a sampling pattern in which the induced artifacts spread through the sparse domain such that their corresponding coefficients can be easily penalized by λ TV [51]. For the given sampling patterns, we compare the performance of this approach with a variation of gold-standard analytical back-projections.

Virtual element weighted back-projections
Alternatively, one can back-project the recorded pressure signals with respect to the time of flight via a spherical polygon, which is centered at the detector element. For ultrasound imaging, the time of flight from the center of the transmitting elements centered at r t is added accordingly. When large defocused elements (i.e., convex surfaces, negative lens), the concept of the virtual element must be applied as well. Briefly, an extra distance that corresponds to the radius of curvature of the defocused element is added, and a weighting with respect to the calculated directivity would be applied on the polygon [52]. To form the image, the previous step is repeated for every receiving element or sub-aperture that is associated with a projection. Finally, all of the projections are compounded coherently on a voxel-by-voxel basis to perform synthetic focusing. The quality of this localization is conditional on the λ/2 inter-element spacing, the active area of the transducer, and the number of angular projections. Albeit, not all can be met flawlessly for the sparse aperture. Consequently, the back-projection estimation suffers from the reconstruction artifact. In our previous study [34], we suggested a modified weighting factor W that penalizes these artifacts in an adaptive manner. The performance of this approach is highly dependent on the number of projections, and it might suit ultrasound techniques better than optoacoustics. For the sake of completeness, we summarize here the algorithm, as follows: 1. Back-project the recorded signals for each element (or sub-aperture); 2. Repeat step 1 for every virtual sub-element of size λ/2; 3. Apply the inverted SIR map to compensate for the imposed directivity; 4. Coherently compound the projections associated with every sub-aperture; 5. Calculate and apply the adaptive weighting factor to surpass the quality of the image in terms of the SNR, contrast, and resolution.

Figures of merit
Here we take advantage of the commonly used image metrics to quantify the quality of the visual information. Meaningful visual information is conveyed by contrast. This is particularly important in ultrasound imaging, specifically for anechoic regions, such as cysts and blood vessels, where the off-axis clutter noise and phase aberrations complicate the anatomical measurements [53,54]. The contrast resolution is commonly quantified by the contrast-to-noise ratio (CNR), which indicates the relation between lesion detectability, object contrast, and acoustic noise (e.g., speckle variance, acoustic clutter.) where S i and S 0 are the spatial means, and σ 2 i and σ 2 0 are standard deviations of the log image inside and outside the lesion, respectively. Additionally, we considered other metrics, such as range of full width at half maximum (FWHM) for each measured point spread function, peak SNR (PSNR), and the structure similarity (SSIM) index. We used the point source/ reflector images to compute the local point spread function. As the point spread function is the measure of intensity distribution, -6 dB is often used in the logarithmic scale images. We studied the axial plane (C-scan) for optoacoustic imaging and the lateral plane (B-mode) for ultrasound imaging. Multiple point reflectors (for ultrasound) and point sources (for optoacoustics) are positioned within the FoV as a measure of the point spread function of the system. Certain structural artifacts occur due to the compressed sensing and reconstruction processes. Along with the FWHM, the PSNR and the SSIM provide additional information for system performance evaluation [55][56][57]. The PSNR is a common image quality assessment metric, and is defined as: where M AX I is the the maximum possible value of the image, and f and g are the reference and test images, respectively. A higher PSNR indicates less distortion and high reconstruction quality. However, the PSNR is not sensitive to the pixel spatial relationship. Another widely used metric, the SSIM, assesses the image quality based on distortion of the structural information.
where µ is the average intensity of the given image, and c is the infinitesimal variable to stabilize the division.

Results
A natural question arises on the abilities of the designed arrays, represented in Fig. 1, in delivering high-quality images. The two introduced reconstruction algorithms were used to evaluate the performance of arrays in terms of the quality of the detected signal and artifacts, and to characterize the data fidelity.

Ultrasound
To exploit the achievable ultrasound resolution through the proposed sampling patterns, we simulated the total focusing method, in which signals from every transmit-receive pair are processed [58]. Here, we only investigate the VE-WDSBP algorithm performance, as the forward model is relatively large [33] and requires a powerful computer [31]. Thus, the model-based reconstruction is deferred to later studies. Figure 2.(a) illustrates the results of the B-mode reconstruction of the numerical phantom that contains three anechoic cysts of 3 mm in diameter, using the VE-WDSBP method. The CNR of each image has been calculated, considering the speckle background as the signal-dependent noise. Even though the annular circular array shows better contrast (Fig. 2.(b)), the segmented annular array represents a uniform energy distribution and speckle shape across the entire FoV. Figure 2.(c) depicts the achievable resolution and the dynamic range (main lobe to side lobe ratio) for each array. The simulated phantom contains a set of point reflectors with unity amplitude distributed over the volume of 16 mm by 32 mm, for y-plane= 0. The superior estimation of the segmented annular array in obtaining an isotropic spatial response, precise localization and slightly better dynamic range (1-5 dB) is evident in Fig. 2.(d). The lateral resolution is the same for all of the apertures, and is in the range of 223-229µm.  Figure 3 shows the optoacoustic imaging performance of three virtual arrays in C-scan. A set of point sources situated at the depth of 20 mm are distributed equidistantly (1.5 mm) within the axial plane of 16 × 16 mm 2 , parallel to the surface of the arrays. The C-scan reconstruction was carried out using the two methods discussed, VE-WDSBP and CGLS. As these results are suggesting, it is only the segmented annular array that can retrieve all of the points in the medium, using the CGLS algorithm. The peculiar geometry of the segmented annular array allows for every voxel within the volume of interest to be sensed by all of the elements. Figure 3 suggests that VE-WDSBP is a reliable method only when the FoV is equal to the aperture size, and with lower resolution compared to CGLS. Regardless of the reconstruction algorithm, the other two apertures are not capable of isotropic focusing. Indeed, the estimated values for many of the points are below the artifact level.  Figure 4 shows the result of CGLS TV for the segmented annular array, which is compared with CGLS. These results are indicative and perhaps conclusive for subjective qualitative assessments. The CGLS TV shows a more robust estimation, with no sign of artifacts up to 30 dB. Table 1 quantifies these through a set of metrics as a measure of the data fidelity, and outlines the range of achievable resolution. Based on these metrics, the least distortion and best match with respect to the ground truth is provided by CGLS TV for the segmented annular array. Similarly, the best resolution with the smallest FWHM is achievable by CGLS TV for the segmented annular array.

Discussion and Conclusion
In this paper, we evaluated the effects of sensing element patterns suggested by our previous pre-reconstruction analysis on the imaging performance of a sparse bimodal hand-held probe.
A major emphasis was put on modifying the image reconstruction algorithm to increase the fidelity of the object estimation, and hence the accuracy of the reconstruction. The two befitting classes of the reconstruction algorithm were modified to incorporate the geometric properties of the aperture. An adaptive weighting factor along the virtual element concept was used in the VE-WDSBP algorithm, which allowed to relax the imposed λ/2 inter-element spacing required in the classical sampling. Despite the honed accuracy of the estimation, there are shortcomings in the optoacoustic images (Fig. 3) due to the limited number of available projections. Specifically, the marginal areas suffer the most due to the limited view angle associated with the individual elements in the aperture. The diffraction imposed by the finite size of the transducer is demanding for an alternate approach in which the effects of a spatiotemporal filter are nullified. By using the forward model as the imaging operator, the model-based approach offers a more accurate solution, with the results in favor of the segmented annular array. In the reconstruction of the C-scan optoacoustics, the CGLS clearly outperforms the VE-WDSBP. To increase the robustness for outliers and sharper profiles, a form of TV minimization using the superiorization technique was incorporated. The important feature is that the estimation errors can be minimized by perturbing the solution within each iteration. This algorithm remarkably mitigated the artifacts that arose from the limited angle of view of the hand-held probe, as well as the sparse sampling. Withal, VE-WDSBP has been used in ultrasound B-mode images for the sake of simplicity of calculation and to avoid the computational cost of the large total focusing method matrix inversion. Considering the valuable diagnostic information of the speckles in B-mode ultrasound, we noted the distinguishable different speckle patterns between the arrays. Initially, we postulated a granular structure as the resolution cells. Nonetheless, the shapes of these coherent interference artifacts are highly associated with the phase of back-scattered echoes, and their averaging over the surface of the aperture. Therefore, this leaves the question whether the structure offered by the annular spiral is better or worse in comparison with the segmented annular array. While we successfully demonstrated the superior performance of the segmented annular array in terms of resolution, uniform sensitivity, and CNR, the blurring artifact is clearly visible for the marginal points for all of the images. We interpret this as the limitation of the algorithm for FoVs larger than the aperture size, and not as a physical limitation of the array itself.
Based on the reconstructed images, a qualitative and quantitative comparison analysis was followed between the three proposed geometries. Overall, the imaging performance of the segmented annular array outweighs the others, considering the resolution, detectability, and uniform response for both modalities. In conclusion, this study presents the development of a hand-held probe and the adopted algorithms for volumetric optoacoustic ultrasound imaging. The dynamics of a new geometry for volumetric optoacoustic ultrasound imaging has been assessed. Last but not least, we restricted our analysis to the geometric properties of the arrays. As optoacoustic/ ultrasound imaging, the segmented annular array provided acceptable performance and is expected to have an impact on bimodal portable measurement systems. Further optimization with respect to the transducing properties and the experimental evaluation are deferred to future studies.