Compressed sensing and sparsity in photoacoustic tomography

Increasing the imaging speed is a central aim in photoacoustic tomography. This issue is especially important in the case of sequential scanning approaches as applied for most existing optical detection schemes. In this work we address this issue using techniques of compressed sensing. We demonstrate, that the number of measurements can significantly be reduced by allowing general linear measurements instead of point-wise pressure values. A main requirement in compressed sensing is the sparsity of the unknowns to be recovered. For that purpose we develop the concept of sparsifying temporal transforms for three-dimensional photoacoustic tomography. We establish a two-stage algorithm that recovers the complete pressure Signals in a first step and then applies a standard reconstruction algorithm such as back-projection. This yields a novel reconstruction method with much lower complexity than existing compressed sensing approaches for photoacoustic tomography. Reconstruction results for simulated and for experimental data verify that the proposed compressed sensing scheme allows to significantly reducing the number of spatial measurements without reducing the spatial resolution.


Introduction
Photoacoustic tomography (PAT), also known as optoacoustic tomography, is a novel non-invasive imaging technology that beneficial combines the high contrast of pure optical imaging with the high spatial resolution of pure ultrasound imaging (see [1,2,3]). The basic principle of PAT is as follows (compare Figure 1). A semitransparent sample (such as a part of a human patient) is illuminated with short pulses of optical radiation. A fraction of the optical energy is absorbed inside the sample which causes thermal heating, expansion, and a subsequent acoustic pressure wave depending on the interior absorbing structure of the sample. The acoustic pressure is measured outside of the sample and used to reconstruct an image of the interior. An object is illuminated with a short optical pulse that induces an acoustic pressure wave. The pressure wave is measured on discrete locations on a surface and used to reconstruct an image of the interior absorbing structure. The small spheres indicate the possible detector or sensor locations on a regular grid on the measurement surface.

Classical measurement approaches
The standard approach in PAT is to measure the acoustic pressure with small detector elements distributed on a surface outside of the sample; see Figure 1. The spatial sampling step size limits the spatial resolution of the pressure data and the (lateral) resolution of the final reconstruction. ‡ Consequently, high spatial resolution requires a large number of detector locations. Ideally, for high frame rate, the pressure data are measured in parallel with a large array made of small detector elements. However, the signal-to-noise ratio and therefore the sensitivity decreases for smaller detector elements and producing a large array with high bandwidth is costly and difficult to fabricate.
As an alternative to the usually employed piezoelectric transducers, optical detection schemes have been used to acquire the pressure data [4,5,6,7]. In these methods an optical beam is raster scanned along a surface. In case of non-contact photoacoustic imaging schemes the ultrasonic waves impinging on the sample surface change the phase of the reflected light, which is demodulated by interferometric means and a photodetector [5,6,7]. For Fabry-Perot film sensors, acoustically induced changes of the optical thickness of the sensor lead to a change in the reflectivity, which can be measured using a photo diode [4]. Equally for both techniques, the ultrasonic data are acquired at the location of the interrogation beam by recording the time-varying output of the photodetector. In order to collect sufficient data the measurement process has to be repeated with changed locations of the interrogation beam. Obviously, such an approach slows down the imaging speed. The imaging speed can be increased by multiplying the number of interrogation beams. For example, for a planar Fabry-Perot sensor a detection scheme using 8 interrogation beams has been demonstrated in [8].
Another, less straight forward, approach to increase the measurement speed is the use of patterned interrogation together with compressed sensing techniques. Patterned interrogation was experimentally demonstrated using a digital micromirror device (DMD) in [9,10]. Using digital micromirror devices or spatial light modulators to generate such interrogation patterns together with compressed sensing techniques allows to reduce the number of spatial measurements without significantly increasing the production costs. For such approaches, we develop a compressed sensing scheme based sparsifying temporal transforms originally introduced for PAT with integrating line detectors in [11,12].

Compressed sensing
Compressed sensing (or compressive sampling) is a new sensing paradigm introduced recently in [13,14,15]. It allows to capture high resolution signals using much less measurements than advised by Shannon's sampling theory. The basic idea in compressed sensing is replacing point measurements by general linear measurements, where each measurement consists of a linear combination Here x is the desired high resolution signal (or image), y the measurement vector, and A the m × n measurement matrix. If m ≪ n, then (1) is a severely under-determinated system of linear equations for the unknown signal. The theory of compressed sensing predicts that under suitable assumptions the unknown signal can nevertheless be stably recovered from such data. The crucial ingredients of compressed sensing are sparsity and randomness.
(i) Sparsity: This refers to the requirement that the unknown signal is sparse, in the sense that it has only a small number of entries that are significantly different from zero (possibly after a change of basis).
(ii) Randomness: This refers to selecting the entries of the measurement matrix in a certain random fashion. This guarantees that the measurement data are able to sufficiently well separate sparse vectors.
In this work we use randomness and sparsity to develop novel compressed sensing techniques for PAT.  In the PAT literature, two types of binary matrices allowing compressed sensing have been proposed (see Figure 3). In [9,10] scrambled Hadamard matrices have been used and experimentally realized. In [11,12] expander matrices have been used, where the measurement matrix is sparse and has exactly d ones in each column, whose locations are randomly selected. Another possible choice would be a Bernoulli matrix where any entry is selected randomly from two values with equal probability. In all three cases, the random nature of the selected coefficients yields compressed sensing capability of the measurement matrix (see Appendix A for details). As in [11,12], in this study we use expander matrices. For the experimental verification such measurements are implemented virtually by taking full point-measurements in the experiment and then computing compressed sensing data numerically. This can be seen as proof of principle; implementing pattern interrogation in our contact-free photoacoustic imaging device is an important future aspect. Besides the random nature of the measurement matrix, sparsity of the signal to be recovered is the second main ingredient enabling compressed sensing. As in many other applications, sparsity often does not hold in the original domain. Instead sparsity holds in a particular orthonormal basis, such as a wavelet or curvelet basis [16,17]. However, such a change of basis can destroy the compressed sensing capability of the measurement matrix (for example, in the case of expander matrices). In order to overcome this limitation, in [11,12] we developed the concept of a sparsifying temporal transformation. Such a transform applies in the temporal variable only and results in a filtered pressure signal that is sparse. Because any operation acting in the temporal domain intertwines with the measurement matrix, one can apply sparse recovery to estimate the sparsified pressure. The photoacoustic source can be recovered, in a second step, by applying a standard reconstruction algorithm to the sparsified pressure.

Outline of this paper
In this paper we develop a compressed sensing scheme based on a sparsifying transform for three-dimensional PAT (see Section 2). This complements our work [11,12], where we introduced the concept of sparsifying transforms for PAT with integrating line detectors. Wave propagation is significantly different in two and in three spatial dimensions. As a result, the sparsifying transform proposed in this work significantly differs from the one presented in [11,12]. In Appendix A we provide an introduction to compressed sensing serving as guideline for designing compressed sensing matrices and highlighting the role of sparsity. In Section 3 we present numerical results on simulated as well as on experimental data from a non-contact photoacoustic imaging setup [18]. These results indicate that the number of spatial measurements can be reduced by at least a factor of 4 compared to the classical point sampling approach. The paper concludes with a discussion presented in Section 4 and a short summary in Section 5.

Compressed sensing for PAT in planar geometry
In this section we develop a compressed sensing scheme for PAT, where the acoustic signals are recorded on a planar measurement surface. The planar geometry is of particular interest since it is the naturally occurring geometry if using optical detection schemes like the Fabry-Perot sensor or non-contact imaging schemes. We thereby extend the concept of sparsifying temporal transforms introduced for two-dimensional wave propagation in [11,12]. We emphasize that the proposed sparsifying transform for the three-dimensional wave equation can be used for any detection geometry. An extension of our approach to general geometry would, however, complicate the notation.

PAT in planar geometry
Suppose the photoacoustic source distribution p 0 (r) is located in the upper half space {(x, y, z) ∈ R 3 | z > 0}. The induced acoustic pressure p(r, t) satisfies the wave equation where ∆ r denotes the spatial Laplacian, ∂/∂t is the derivative with respect to time, c the sound velocity, and δ(t) the Dirac delta-function. Here (∂δ/∂t)p 0 acts as the sound source at time t = 0 and it is supposed that p(r, t) = 0 for t < 0. We further denote by the pressure data restricted to the measurement plane. PAT in planar recording geometry is concerned with reconstructing p 0 from measurements of Wp 0 . For recovering p 0 from continuous data explicit and stable inversion formulas, either in the Fourier domain or in the time domain, are well known. A particularly useful inversion method is the universal backprojection (UBP), Here r = (x, y, z) is a reconstruction point, r S = (x S , y S , 0) a point on the detector surface, and |r − r S | the distance between r and r S . The UBP has been derived in [19] for planar, spherical and cylindrical geometries. The two-dimensional version of the UBP where r = (x, z) and r S = (x S , 0) has been first obtained in [20]. In the recent years, the UBP has been generalized to elliptical observation surface in two and three spatial dimensions [21,22], and various geometries in arbitrary dimension (see [23,24,25]).

Standard sampling approach
In practical application, only a discrete number of spatial measurements can be made. The standard sensing approach in PAT is to distribute detector locations uniformly on a part of the observation surface. Such data can be modeled by The UBP algorithm applied to semi-discrete data (4) consists in discretizing the spatial integral in (3) using a discrete sum over all detector locations and evaluating it for a discrete number of reconstruction points. This yields to the following UBP reconstruction algorithm.
(S1) Filtration: (S2) Backprojection: For any k set In Algorithm 1, the first step (S1) can be interpreted as temporal filtering operation. The second step (S2) discretizes the spatial integral in (3) and is called discrete backprojection. The numbers w i are weights for the numerical integration and account for density of the detector elements.

Compressed sensing approach
Instead of using point-wise samples, the proposed compressed sensing approach uses linear combinations of pressure values where A is a binary m × n random matrix, and p[i, t] are point-wise pressure data.
In the case of compressed sensing we have m ≪ n, which means that the number of measurements is much smaller than the number of point-samples. As shown in Appendix A, Bernoulli matrices, subsampled Hadamard matrices as well as expander matrices are possible compressed sensing matrices.
In order to recover the photoacoustic source from compressed sensing data (5), one can use the following two-stage procedure. In the first step we recover the pointwise pressure values from the compressed sensing measurements. In the second step, one applies a standard reconstruction procedure (such as the UBP Algorithm 1) to the estimated point-wise pressure to obtain the photoacoustic source. The first step can Here Ψ ∈ R n×n is a suitable basis (such as orthonormal wavelets) that sparsely represents the pressure data and λ is a regularization parameter. Note that (6) can be solved separately for every t ∈ [0, T ] which makes the two-stage approach particularly efficient. The resulting two-stage reconstruction scheme is summarized in Algorithm 2.
As an alternative to the proposed two-stage procedure, the photoacoustic source could be recovered directly from data (5) based on minimizing the ℓ 1 -Tikhonov regularization functional [26,27] Here Ψ is a suitable basis that sparsifies the photoacoustic source p 0 . However, such an approach is numerically expensive since the three-dimensional wave equation and its adjoint have to be solved repeatedly. The proposed two-step reconstruction scheme is much faster because it avoids evaluating the wave equation, and the iterative reconstruction decouples into lower-dimensional problems for every t. In the implementation one takes the number of iterations (at least) in the order of N and therefore the two-step procedure is faster by at least one order of magnitude.
Compressed sensing schemes without using random measurements have been considered in [28,29,30]. In these approaches an optimization problem of the form (7) is solved, where A is an under-sampled measurement matrix. Especially when combined with a total variation penalty such approaches yields visually appealing result.
Strictly taken, the measurements used there are not shown to yield compressed sensing, which would require some form of incoherence between the measurement matrix and the sparsifying basis (usually established by randomness). For which class of phantoms undersampled point-wise measurements have compressed sensing capability for PAT is currently an unsolved problem.

Sparsifying temporal transform
In order that the pressure data can be recovered by (6) one requires a suitable basis Ψ ∈ R n×n such that the pressure is sparsely represented in this basis and that the composition A • Ψ is a proper compressed sensing matrix. For expander matrices these two conditions are not compatible. To overcome this obstacle in [11,12] we developed the concept of a sparsifying temporal transform for the two-dimensional case in circular geometry. Below we extend this concept to three spatial dimensions using combinations of point-wise pressure values.
Suppose we apply a transformation T to the data t → y[ · , t] that only acts in the temporal variable. Because the measurement matrix A is applied in the spatial variable, the transformation T and the measurement matrix commute, which yields We call T a sparsifying temporal transform, if Tp[ · , t] ∈ R n is sufficiently sparse for a suitable class of source distributions and all times t. In this work we propose the following sparsifying spatial transform The sparsifying effect of this transform is illustrated in Figure 4 applied to the pressure data arising from a uniform spherical source. The reason for choice of (9) is as follows: It is well known that the pressure signals induced by a uniform absorbing sphere has an N-shaped profile. Therefore, applying the second temporal derivative to p yields a signal that is sparse. The modification of the second derivative is used because the term ∂ t t −1 p appears in the universal backprojection and therefore only one numerical integration is required in the implementation of our approach. Finally, we empirically found that the leading factor t 3 results in well balanced peaks in Figure 4 and yields good numerical results. Having a sparsifying temporal transform at hand, we can construct the photoacoustic source by the following modified two-stage approach. In the first step recover an approximationq[ · , t] ≃ Tp[ · , t] by solving In the second step, we recover the photoacoustic source by implementing the UBP expressed in terms of the sparsified pressure,   Here r = (x, y, z) is a reconstruction point and r S = (x S , y S , 0) a point on the measurement surface. The modified UBP formula (11) can be implemented analogously to Algorithm 1. In summary, we obtain the following reconstruction algorithm.
(S2) UBP algorithm for sparsified data: Since (10) can be solved separately for every t, the modified two-stage Algorithm 3 is again much faster than a direct approach based on (7). Moreover, from general recovery results in compressed sensing presented in the Appendix, 3 yields theoretical recovery guarantees for Bernoulli, subsampled Hadamard matrices as well as expander matrices (adjacency matrices of left d-regular graphs); see Figure 3.    Table 1. Normalized ℓ α -reconstruction errors for α = 1, 2.

Results for simulated data
We consider reconstructing a superposition of two spherical absorbers, having centers in the vertical plane {(x, y, z) ∈ R 3 | y = 0}. The vertical cross section of the photoacoustic source is shown in Figure 5(a The choice m = 1024 corresponds to an reduction of measurements by a factor 4. The expander matrix A was chosen as the adjacency matrix of a randomly left d-regular graph with d = 15; see Example 10 in the Appendix. The pressure signals p[i, t] have been computed by the explicit formula for the pressure of a uniformly absorbing sphere [31] and evaluated at 243 times points ct uniformly distributed in the interval [0, 6]. Figure 5 shows the reconstruction results using 4096 point samples using Algorithm 1 (Figure 5(b)) and the reconstruction from 1024 compressed sensing measurements using Algorithm 3 ( Figure 5(c)). The reconstruction has been computed at 241 × 41 grid points in a vertical slice of size [−3, 3] × [0, 1]. The ℓ 1 -minimization problem (10) has been solved using the FISTA [32]. For that purpose the matrix A has been rescaled to have 2-norm equal to one. The regularization parameter has then been set to λ = 10 −5 and we applied 7500 iterations of the FISTA with maximal step size equal to one. We see that the image quality from the compressed sensing reconstruction is comparable to the reconstruction from full data using only a fourth of the number of measurements. For comparison purpose, Figure 5(d) also shows the reconstruction using 1024 point samples. One clearly recognizes the increase of undersampling artifacts and worse image quality compared to the compressed sensing reconstruction using the same number of measurements. A more precise error evaluation is given in Table 1, where we show the normalized ℓ α -error α k |p 0 [k] − p CS 0 [k]| α /N for α = 1 and α = 2. The reconstruction error in ℓ 1 -norm is even slightly smaller for the compressed sensing reconstruction than for the full reconstruction. This might be due to a slight denoising effect of ℓ 1 -minimization that removes some small amplitude errors (contributing more to the more ℓ 1 -norm than to the ℓ 2 -norm). Figure 6 shows the pressure corresponding to the   absorbers shown in Figure 5 together with the sparsified pressure and its reconstruction from compressed sensing data. Finally, Figure 7 shows the reconstruction (restricted to [−1, 1] × [0, 1]) using Algorithm 3 for varying compression factors n/m = 16, 8, 4, 2, 1. In all cases d = 15 and λ = 10 −5 have been used and 7500 iterations of the FISTA have been applied. As expected, the reconstruction error increases with increasing compression factor. One further observes that the compression factor of 4 seems a good choice since for smaller compression factor the error increases more severely. In our numerical studies (not shown) we observed that also for different discretizations a compression factor of 4 is a good choice.

Results for experimental data
Experimental data have been obtained from a silicone tube phantom as shown in Figure 8. The silicone tube was filled with black ink (Pelikan 4001 brillant black, absorption coefficient of 54 /cm at 740 nm), formed to a knot, and immersed in a milk/water emulsion. The outer and inner diameters of the tube were 600 µm and 300 µm, respectively. Milk was diluted into the water to mimic the optical scattering properties of tissue; an adhesive tape, placed on the top of the water/milk emulsion, was used to mimic skin. Photoacoustic signals were excited at a wavelength of 740 nm with nanosecond pulses from an optical parametric oscillator pumped by a frequency doubled Nd:YAG laser. The radiant expose was 105 Jm −2 , which is below the maximum permissible exposure for skin of 220 Jm −2 . The resulting ultrasonic signals were detected on the adhesive tape by a non-contact photoacoustic imaging setup as described in [18]. In brief, a continuous wave detection beam with a wavelength of 1550 nm was focused onto the sample surface. The diameter of the focal spot was about 12 µm. Displacements on the sample surface, generated by the impinging ultrasonic waves, change the phase of the reflected laser beam. By collecting and demodulating the reflected light, the phase information and, thus, information on the ultrasonic displacements at the position of the laser beam can be obtained. To allow three-dimensional measurements, the detection beam is raster scanned along the surface. The obtained displacement data do not fulfill the wave equation and cannot be used for image reconstruction directly. Thus, to convert the displacement data to a quantity (roughly) proportional to the pressure, the first derivative in time of the data was calculated [5]. each detector location has been used d = 10 times in total. Figure 9 shows the maximum amplitude projections along the z, x, and y-direction, respectively, of the three-dimensional reconstruction from compressed sensing data using Algorithm 3. The sparsified pressure has been reconstructed by minimizing (10) with the FISTA using 500 iterations and a regularization parameter of 10 −5 . Further, the three-dimensional reconstruction has been evaluated at 110 × 122 × 142 equidistant grid points. For comparison purpose, in Figure 10 we show the maximum amplitude projections from the UBP Algorithm 1 applied to the original data set. We observe that there is only a small difference between the reconstructions in terms of quality measures such as contrast, resolution and signal to noise ratio. Only, the structures in the compressed sensing reconstruction appear to be slightly less regular. A detailed quality evaluation is beyond the scope of this paper, which aims at serving as proof of principle of our twostage compressed sensing approach with sparsifying transform. However the compressed sensing approach uses only a fourth of the number of measurements of the original data set. This clearly demonstrates the potential of our compressed sensing scheme for decreasing the number of measurements while keeping the image quality.  Figure 11 shows histograms of the pressure values before and after applying the sparsifying temporal transform. In both cases the histograms are concentrated around the value zero. This implies the approximate sparsity and therefore justifies our approach, even if the phantom is not a superposition of uniformly absorbing spheres. It further shows that for the present situation one could even apply our two-stage procedure without applying the sparsifying transform.

Discussion
To ensure sparsity in PAT, the standard approach is choosing a suitable basis which sparsely represents the pressure data on the measurement surface. In this paper we developed a different concept based on sparsifying temporal transforms. Since any temporal transform intertwines with the spatial measurements our approach can be used in combination with any measurement matrix that is incoherent to the pixel basis. This includes binary random matrices such as Bernoulli, Hadamard, or expander matrices (see Appendix A for details). According to the compressed sensing theory, expander matrices can be used with binary entries 0 and 1. Bernoulli and Hadamard matrices, on the other hand, should be used with zero mean (achieved, for example taking ±1 as binary entries). As 0/1 entries can be practically most simple be realized, for Bernoulli and Hadamard matrices the mean value has to be subtracted after the measurements process [9]. Avoiding such additional data manipulations is one reason why we currently work with expander matrices. Another reason is the sparse structure of expander matrices which can be used to accelerate image reconstruction. In future work we will also investigate the use of Bernoulli and Hadamard in combination with sparsifying temporal or spatial transforms, and compare the performance of these measurement ensembles in different situations.
As mentioned in the introduction, patterned interrogation can be used to practically implement compressed sensing in PAT. It has been realized by using a digital micromirror device [9,10], where a Fabry-Perot sensor was illuminated by a widefield collimated beam. The reflected beam, carrying the ultrasonic information on the acoustic field, was then sampled by the DMD and the spatially integrated response was measured by a photodiode. Another possibility is the application of spatial light modulators (SLMs), which are able to modulate the phase of the light. By using such SLMs arbitrary interrogation patterns can be generated directly on a sample surface [33]. SLMs are commercially available for a wavelength of 1550 nm, which is the most common wavelength used in optical detection schemes. However, also for other wavelengths appropriate devices are available. State-of-the-art SLMs provide typical resolutions between 1920 ×1080 pixels and 4094 ×2464 pixels, which is sufficient for the compressed imaging scheme presented in this work. For a resolution of 1920 × 1080, the typically achieved frame rate is 60 Hz. This is faster than the pulse repetition rate of commonly used excitation laser sources for PAT, thus enabling single shot measurements. If a faster repetition rate is required, one could use SLMs with a higher frame rate. These, however, usually exhibit lower resolution.
For the Fabry-Perot etalon sensors, the wavelength of the interrogation beam has to be tuned, such that it corresponds to the maximum slope of the transfer function of the sensor. Since for the patterned interrogation scheme only one wavelength is used for the acquisition of the integrated response this demands high quality Fabry-Perot sensors with highly uniform sensor properties. For non-contact schemes, using Mach-Zehnder or Michelson based demodulation, the sensitivity does not depend on the wavelength as the sensor is stabilized by the relative phase between the interrogation beam and a reference beam. However, if the surface is not adequately flat, the phase of the reflected light is spatially varying. Thereby, the sensitivity changes over the detection surface, and even locations with zero or low response could exist. Here, the phase modulation capability of SLMs offers the possibility to compensate for this. In general, each pixel of a SLM can shift the phase of light at least up to 2π and the resulting phase distribution is impressed on the reflected beam. Separate lens functions can be applied to each detection point individually by using distinct kernels for each of these points [34]. In case the shape of the sample surface is known, the phase at each detection point can be chosen to compensate for the phase shifts caused by the imperfect sample surface. With this method it is even possible to choose different focal distances for each detection point, so that detection on even rougher surfaces could be facilitated.

Conclusion
To speed up the data collection process in sequential scanning PAT while keeping sensitivity high without significantly increase the production costs, one has to reduce the number of spatial measurements. In this paper we proposed a compressed sensing scheme for that purpose using random measurements in combination with a sparsifying temporal transform. We presented a selected review of compressed sensing that demonstrates the role of sparsity and randomness for high resolution recovery. Using general results from compressed sensing we were able to derive theoretical recovery guarantees for our approach based on sparsifying temporal transforms. Further, this comes with a fast algorithmic implementation.

Acknowledgments
This work has been supported by the Austrian Science Fund (FWF), project number P25584-N20, the Christian Doppler Research Association (Christian Doppler Laboratory for Photoacoustic Imaging and Laser Ultrasonics), the European Regional Development Fund (EFRE) in the framework of the EU-program Regio 13, the federal state Upper Austria. S. Moon thanks University of Innsbruck for its hospitality during his visit. The work of S. Moon has been supported by the National Research Foundation of Korea grant funded by the Korea government (MSIP) (2015R1C1A1A01051674) and the TJ Park Science Fellowship of POSCO TJ Park Foundation.

Appendix A. Ingredients from compressed sensing
In this section we present the basic ingredients of compressed sensing that explains the choice of the measurement matrices and the role of sparsity in PAT. The aim of compressed sensing is to stably recover a signal or image modeled by vector x ∈ R n from measurements y = Ax + e . (A.1) Here A ∈ R m×n with m ≪ n is the measurement matrix, e is an unknown error (noise) and y models the given noisy data. The basic components that make compressed sensing possible are sparsity (or compressibility) of the signal x and some form of randomness in the measurement matrix A.
Appendix A.1. Sparsity and compressibility The first basic ingredient of compressed sensing is sparsity, that is defined as follows. In Definition 1, ♯(S) stands for the number of elements in a set S. Therefore x 0 counts the number of non-zero entries in the vector x. In the mathematical sense · 0 is neither a norm or a quasi-norm § but it is common to call · 0 the ℓ 0 -norm. It § A quasi-norm satisfies all axioms of a norm, except that the triangle inequality is replaced by the weaker inequality stands for the ℓ p -norm. Recall that · p is indeed a norm for p ≥ 1 and a quasi-norm for p ∈ (0, 1). Signals of practical interest are often not sparse in the strict sense, but can be well approximated by sparse vectors. For that purpose we next define the s-term approximation error that can be used as a measure for compressibility.
Definition 2 (Best s-term approximation error). Let s ∈ N and x ∈ R n . One calls the best s-term approximation error of x (with respect to the ℓ 1 -norm).
The best s-term approximation error σ s (x) measures, in terms of the ℓ 1 -norm, how much the vector x fails to be s-sparse. One calls x ∈ R n compressible, if σ s (x) decays sufficiently fast with increasing s. The estimate (see [35]) shows that a signal is compressible if its ℓ q -norm is sufficiently small for some q < 1.

Appendix A.2. The RIP in compressed sensing
Stable and robust recovery of sparse vectors requires the measurement matrix to well separate sparse vectors. The RIP guarantees such a separation.
Definition 3 (Restricted isometry property (RIP)). Let s ∈ N and δ ∈ (0, 1). The measurement matrix A ∈ R m×n is said to satisfy the RIP of order s with constant δ, if, for all s-sparse x ∈ R n , We write δ s for the smallest constant satisfying (A.4).
In the recent years, many sparse recovery results have been derived under various forms of the RIP. Below we give a result derived recently in [36].
Theorem 4 (Sparse recovery under the RIP). Let x ∈ R n and let y ∈ R m satisfy y − Ax 2 ≤ ǫ for some noise level ǫ > 0. Suppose that A ∈ R m×n satisfies the RIP of order 2s with constant δ 2s < 1/2, and let x ⋆ solve minimize z z 1 such that Az − y 2 ≤ ǫ . (A.5) Then, for constants c 1 , c 2 only depending on δ 2s , x − x ⋆ 2 ≤ c 1 σ s (x)/ √ s + c 2 ǫ. Opposed to Bernoulli and subsampled Hadamard matrices, a lossless expander does not satisfy the ℓ 2 -based RIP. However, in such a situation, one can use the following alternative recovery result.
Choosing a d-regular bipartite graph uniformly at random yields a lossless expander with high probability. Therefore, Theorem 9 yields stable and robust recovery for such type of random matrices.
Example 10 (Expander matrix). Take A ∈ {0, 1} m×n as the adjacency matrix of a randomly chosen left d-regular bipartite graph. Then A has exactly d ones in each column, whose locations are uniformly distributed. Suppose further that for some constant c θ only depending on θ the parameters d and m have been selected according to m ≥ c θ s(log(n/s) + 1) d = 2 log(n/s) + 2 θ .
Then, θ s ≤ θ with probability tending to 1 as n → ∞. Consequently, for the adjacency matrix of a randomly chosen left d-regular bipartite graphs, called expander matrix, we have stable and robust recovery by (A.10).