On the Building Blocks of Sparsity Measures

Understanding the mathematics and the innate machinery of sparsity measures is instrumental in the proper usage of such information measures in various application arenas, ranging from information collection and sensing to communications and signal processing. In this letter, the structure of sparsity measures is investigated. Specifically, it is shown that sparsity measures satisfying proper sparsity axioms may only be constructed from vector norms. Moreover, the asymptotic behavior of sparsity measures is studied. Owing to their mathematical structure, our numerical results illustrate a convergence of sparsity measures as the number of input samples grows large.


I. INTRODUCTION
The property that most of the energy of an input signal concentrates in a few coefficients is commonly known as compressibility or sparsity [1]. Sparsity plays a central role in a diverse set of signal processing applications such as image processing [2], [3] and deep learning [4], as well as the development of signal processing techniques such as those in signal recovery [5], [6], denoising [6], compressed sensing [7], and sampling [8].
Definition 1 (Sparsity Measure): A sparsity measure s is a function projecting the n-dimensional vector of coefficients to a real number through the mapping s : C^n → R [1], [9], which must follow certain axioms as detailed in Table I. Here, n ∈ N is the number of coefficients (samples), and N and C denote the sets of natural and complex numbers, respectively. Since the nonnegative energy of the coefficients is of interest, sparsity is measured using the magnitudes of the coefficients [10].
Definition 2 (Core Sparsity Measure): A sparsity measure s satisfying all the axioms and constructed as a function of two distinct vector norms is called a core sparsity measure.
Definition 3 (Generalized Sparsity Measure): A sparsity measure s satisfying all the axioms and constructed as a function of more than two distinct vector norms is called a generalized sparsity measure.
As mentioned earlier, sparsity measures should satisfy a certain set of axioms which encapsulate the effect of different actions on their input argument, including permutations, ratio changes or relative differences [1], [10]. The axioms that must be satisfied by sparsity measures are summarized in Table I. Concentration, scaling, homogeneous growth and replication were originally used in [11] in economics to measure the inequity of wealth distribution; [12] added bounds for dispersion functions; [9] and [13] added symmetry and continuity for sparsity and fairness functions, respectively; [14] added quasi-convexity for reward-risk ratio functions; [9] enumerated concentration, scaling, homogeneous growth, replication, regularity and completeness as six desirable properties which a sparsity measure should have; and [1] proposed a mathematical formalism, a joint axiomatic characterization, towards understanding sparsity and entropy measures.
To perceive the physical meaning of the axioms for sparsity measures presented in Table I, one can view them through the lens of their impact on the energy of the signal coefficients. For instance:
• Replication: Sparsity is invariant under replication. If there is a twin signal with an identical energy distribution, the sparsity of energy in one signal is the same as that of the combination of the two.

In this letter, the structure of sparsity measures is investigated. Specifically, based on the axioms in Table I, it is shown that sparsity measures may only be constructed through vector norms. Moreover, the convergence of sparsity measures is studied. Interestingly, due to the mathematical structure of sparsity measures, our numerical results illustrate the convergence of such information measures as the number of input samples grows large. We enumerate the most common sparsity measures in the signal processing literature in Section II. In Section III, we present the building blocks of sparsity measures. Section IV is dedicated to a study of the convergence of sparsity measures, before concluding the letter in Section V.
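The replication and scaling axioms can be checked numerically. The sketch below uses a normalized ℓ_1/ℓ_2 norm ratio as an illustrative measure (a Hoyer-style construction assumed here for demonstration, not the paper's exact formula) and verifies that it is invariant under both replication (concatenating a signal with its twin) and positive scaling.

```python
import math

def sparsity(x):
    """Illustrative sparsity measure: s(x) = 1 - ||x||_1 / (sqrt(n) * ||x||_2).
    A Hoyer-style normalized l1/l2 ratio, used as a stand-in example."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return 1.0 - l1 / (math.sqrt(n) * l2)

x = [3.0, 0.1, -0.2, 5.0, 0.05]

# Replication: a k-times replicated signal has the same sparsity.
assert abs(sparsity(x + x) - sparsity(x)) < 1e-12

# Scaling: multiplying every coefficient by alpha > 0 changes nothing.
alpha = 7.3
assert abs(sparsity([alpha * v for v in x]) - sparsity(x)) < 1e-12
```

The sqrt(n) normalization is exactly what makes replication invariance work: concatenation doubles the ℓ_1 norm but multiplies the ℓ_2 norm only by sqrt(2), and the length factor absorbs the difference.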
Notation: We use bold lowercase letters for vectors and bold uppercase letters for matrices. ‖x‖_n denotes the ℓ_n norm of the vector x = [x_k], defined as (Σ_k |x_k|^n)^(1/n). We use {‖x‖_n} to denote a set of vector norms. R^n_+ is the real non-negative coordinate space of dimension n. s(·) denotes a sparsity measure. Given a pair of row vectors x = [x_k] and y = [y_k], the scaling αx with α ∈ R_+ and the concatenation x ⊕ y of the two vectors are used in the sequel.
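The ℓ_p norm and the two vector operators above can be sketched as follows (a minimal illustration; the function name lp_norm is ours):

```python
import math

def lp_norm(x, p):
    """The l_p norm of x = [x_k]: (sum_k |x_k|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0]
y = [1.0, 0.0, 2.0]

print(lp_norm(x, 2))      # l_2 norm of [3, -4] -> 5.0
print(lp_norm(x, 1))      # l_1 norm -> 7.0

# Scaling by alpha in R_+ and concatenation (the ⊕ operator).
alpha = 2.0
scaled = [alpha * v for v in x]
concat = x + y            # x ⊕ y
print(lp_norm(scaled, 2)) # norms are absolutely homogeneous: 2 * 5 = 10.0
```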

II. SPARSITY MEASURES
One of the most important applications of sparsity measures is the sparse recovery problem. Table II enumerates some commonly used measures of sparsity. In [15], the ℓ_0, ℓ_0^ε, ℓ_1, ℓ_p and κ_4 measures were compared. The ℓ_0 function is a traditional measure of sparsity in many mathematical settings; however, it is not practical, since an infinitesimally small coefficient is treated the same as a large one. In addition, the presence of noise (even a small amount) makes the ℓ_0 measure an inappropriate choice. Thus, the ℓ_0 measure is sometimes modified to ℓ_0^ε, which is also defined in Table II [16].
It is clear that a proper choice of ε is crucial in signal recovery when the ℓ_0^ε criterion is adopted. Therefore, ℓ_p with 0 < p < 1 is often preferred to ℓ_0 [17]. The ℓ_1 measure is another alternative to the ℓ_0 measure, and has been extensively used in research efforts dealing with sparse signal recovery as an optimization problem. This is due to the fact that the ℓ_1-based formulation paves the way for signal recovery through linear programming, thus offering a globally optimal and computationally efficient solution [18], [19]. The kurtosis κ_4 measures the peakedness of a distribution [20]. The Hoyer measure [21] is a normalized version of the ℓ_1/ℓ_2 norm ratio. The Gini index was originally applied in economics as a measure of the inequality of wealth distribution [11], [22], and has been used as a measure of sparsity in [23], [24]. The q-ratio is a family of entropy-based sparsity measures which was introduced in [25] and utilized for sparse recovery in [26]. The measures defined in Table II do not all satisfy the entirety of the axioms of sparsity measures [1], [9]; the only exception is the s_pq family defined in [1], which is constructed from the ratio of the ℓ_p and ℓ_q norms. Particular emphasis has been placed on two special cases of the s_pq measure [1], [27], [28], namely s_12 and s_1∞, where the latter is also known as max-sparsity.
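To make the discussion concrete, the sketch below implements textbook forms of four of these measures (the exact normalizations used in Table II are not reproduced here, so these are standard-literature versions, not the paper's definitions) and demonstrates why the raw ℓ_0 count fails in the presence of tiny noise while its ε-relaxation does not.

```python
import math

def l0(x):
    """Number of nonzero coefficients (not robust to noise)."""
    return sum(1 for v in x if v != 0)

def l0_eps(x, eps):
    """ε-relaxed l_0: coefficients with |x_k| <= eps are treated as zero."""
    return sum(1 for v in x if abs(v) > eps)

def hoyer(x):
    """Hoyer measure: the l_1/l_2 ratio normalized into [0, 1]."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

def gini(x):
    """Gini index of the magnitudes (0 = uniform energy, ->1 = sparse)."""
    a = sorted(abs(v) for v in x)
    n = len(a)
    l1 = sum(a)
    return 1.0 - 2.0 * sum(v * (n - k - 0.5) / n for k, v in enumerate(a)) / l1

clean = [0.0, 0.0, 0.0, 5.0]
noisy = [1e-9, -1e-9, 1e-9, 5.0]   # same signal plus tiny noise

print(l0(clean), l0(noisy))        # 1 4  <- l_0 is broken by the noise
print(l0_eps(noisy, eps=1e-6))     # 1    <- the ε-relaxation fixes this
print(hoyer(clean))                # 1.0  <- maximal sparsity for a 1-sparse vector
```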

III. THE BUILDING BLOCKS OF SPARSITY MEASURES
While sparsity measures are extensively used in the literature, not many efforts have been devoted to answering questions about their structure, reasons behind the differences among them and their construction. In the following, we will present a fundamental theorem on the building blocks of sparsity measures that satisfy the desirable sparsity axioms which have been previously identified in Table I.
Theorem 1: A sparsity measure s (which satisfies all the axioms) is constructed only by vector norms; that is, the dependence of s on x is only through the vector norms {‖x‖_j}.
Proof: Consider s(x) to be a sparsity measure satisfying all the axioms summarized in Table I. Continuity and symmetry imply that s(x) should have a general form (inspired by the original formula in [29]) involving arbitrary continuous functions {T_r(·)}, {f_r(·)}, {g_r(·)}, {R_1r(·)} and {R_2r(·)}, together with arbitrary functions {γ_r(n)} of the vector length n. Suppose y is a k-times replication of x. Then, the replication property implies (7). By substituting (8), we can rewrite (7) as (9), where Γ_rk(x) is a function of k for each r and all x, and Γ_r1 denotes the value of Γ_rk at k = 1. Since (9) holds for all integers k ≥ 1, we can conclude (10). Based on (8), we can explicitly express (10) for k ≥ 1 as (11). The left-hand side of (11) should not be a function of k, since the right-hand side of (11) is not a function of k. This is only true when {R_1r} and {R_2r} are homogeneous functions of degrees {p_r} and {q_r}, respectively, as expressed in (12). Therefore, the sparsity measure s satisfies the replication property when
γ_r(kn) k^(p_r − q_r) = γ_r(n), ∀ k ≥ 1, r ∈ {1, . . . , N}, (13)
which means {γ_r(n)} are homogeneous functions of degrees {q_r − p_r}, i.e., γ_r(n) = n^(q_r − p_r). The scaling property implies (14). Similar to the above approach (for the replication property), and considering the substitution in (15), one can rewrite (14) as (16), where η_rα(x) is a function of α for each r and x, and η_r1 denotes the value of η_rα at α = 1. Equation (16) holds for all α > 0. Therefore, we can conclude (17). The above equality holds when {f_r} and {g_r} are homogeneous functions of degrees {λ_r} and {β_r}, respectively, as expressed in (18). Thus, the sparsity measure s satisfies the scaling property when
α^(λ_r p_r − β_r q_r) = 1 ⇒ λ_r p_r = β_r q_r = c_r, (19)
Based on (18) and (19), we can conclude that s(x) has the form in (20), which proves that s(x) is constructed only by vector norms. An important special case of this theorem relates to core sparsity measures, and how they contribute to the construction of sparsity measures in general, as discussed below.
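As a quick sanity check on the replication step, one can verify that the homogeneous solution γ_r(n) = n^(q_r − p_r) indeed satisfies the functional equation (13):

```latex
\gamma_r(kn)\, k^{p_r - q_r}
  = (kn)^{q_r - p_r}\, k^{p_r - q_r}
  = k^{(q_r - p_r) + (p_r - q_r)}\, n^{q_r - p_r}
  = n^{q_r - p_r}
  = \gamma_r(n), \qquad \forall\, k \ge 1 .
```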
Corollary 1: Based on (20), for N = 1 we obtain (21), where s_c(x) denotes a core sparsity measure. In this case, the core sparsity measure s_c(x) is a function of x only through the ratio of the vector norms ‖x‖_λ and ‖x‖_β.
Corollary 2: Based on (20), for N ≥ 2 we obtain (22), where s_g(x) denotes a generalized sparsity measure. It is observed that s_g(x) is a function of x only through a combination of the ratios of the vector norms ‖x‖_{λ_r} and ‖x‖_{β_r}. In other words, we can say that s_g(x) is a combination of several core sparsity measures {s_{c_r}(x)}. Note that if the function f combining these core measures is not strictly increasing, the resulting measure will not satisfy at least one of the axioms, e.g., concentration. Therefore, f should be a strictly increasing function, proving the claim.

IV. NUMERICAL ANALYSIS
In this section, we numerically scrutinize sparsity measures through a comparison with well-known measures in the literature and an analysis of their asymptotic behavior.

A. Comparing Measures of Sparsity
We present a simple example to compare an arbitrary core sparsity measure s_12 with the popular −ℓ_1 measure. With this example, we also put further emphasis on the importance of the axioms for sparsity measures. In particular, we consider two types of input data: (i) x_g ∈ R^100 generated according to the Gaussian distribution x_i ∼ N(0, σ²), with standard deviation σ = 0.5, and (ii) x_c ∈ R^100 drawn from the Cauchy distribution x_i ∼ f(0, γ), where f(·) is the Cauchy density function with γ = 0.5 as its scale parameter. It is well known that data generated from the Cauchy distribution is typically sparser than data generated from the Gaussian distribution [30], since the Cauchy distribution is heavy-tailed. As such, a sparsity measure must similarly indicate that the sparsity of x_g is lower than that of x_c. In Table III, we observe that s_12 correctly unveils which data is sparser. On the other hand, −ℓ_1 claims x_g to be sparser than x_c, which is not correct. This reaffirms the significance of efforts to construct sparsity measures that satisfy the proper axioms, so as to serve as metrics for data sparsity.
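A sketch of this experiment is given below. The Hoyer-style normalized ℓ_1/ℓ_2 ratio stands in for s_12 (an assumption on our part; the paper's exact normalization of s_12 is not reproduced), and a larger sample size than in Table III is used so the ranking is stable under random draws:

```python
import math
import random

def hoyer(x):
    """Normalized l_1/l_2 ratio, used here as a stand-in for the core
    measure s_12 (illustrative assumption, not the paper's exact formula)."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

def neg_l1(x):
    """The -l_1 'measure': larger (less negative) is deemed sparser."""
    return -sum(abs(v) for v in x)

random.seed(0)
n, sigma, gamma = 10_000, 0.5, 0.5
x_g = [random.gauss(0.0, sigma) for _ in range(n)]
# Inverse-CDF sampling of a Cauchy(0, gamma) variate.
x_c = [gamma * math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

print(hoyer(x_g), hoyer(x_c))    # the axiomatic measure ranks the Cauchy data sparser
print(neg_l1(x_g), neg_l1(x_c))  # -l_1 ranks the Gaussian data sparser: the wrong order
```

The Cauchy sample's ℓ_2 norm is dominated by a few huge outliers, which drives its ℓ_1/ℓ_2 ratio down and its Hoyer-type sparsity up; −ℓ_1, by contrast, only sees that the Cauchy sample has larger total magnitude.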

B. The Asymptotic Behavior of Sparsity Measures
An arbitrary core sparsity measure and an arbitrary generalized sparsity measure are chosen to measure the sparsity of data x = [x_i] drawn from the Gaussian distribution x_i ∼ N(μ, σ), where μ and σ are the mean and the standard deviation of the Gaussian distribution, respectively. Specifically, we draw a variable number of coefficients from the Gaussian distribution, as was previously done in [9]. Since we expect sets of coefficients from the same distribution to have a similar sparsity behavior, it is expected that a sparsity measure will converge when we increase the number of coefficients [9]. From a mathematical viewpoint, if sparsity measures are constructed from vector norms or estimated moments, the convergence of sparsity measures can be readily understood from the law of large numbers. To expand on this concept, a generalized sparsity measure is defined in (23) as an example of applying the formula in (22), where s_12 and s_13 are members of the s_pq family. It can be easily verified that s, defined in (23), satisfies all the axioms in Table I. As expected, Fig. 1 shows the convergence of s_12 and of s defined in (23) as the number of input coefficients grows large.
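The law-of-large-numbers argument can be reproduced in a few lines. For i.i.d. Gaussian data, ‖x‖_1/(√n ‖x‖_2) converges to E|x_i|/σ = √(2/π), so a Hoyer-style normalized ℓ_1/ℓ_2 measure (again an illustrative stand-in for s_12, not the paper's exact formula) converges to 1 − √(2/π) ≈ 0.2021 as n grows:

```python
import math
import random

def hoyer(x):
    """Normalized l_1/l_2 sparsity measure (illustrative stand-in for s_12)."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

random.seed(1)
values = []
for n in (100, 1_000, 10_000, 100_000):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    values.append(hoyer(x))
    print(n, round(values[-1], 4))

# By the law of large numbers, l_1/(sqrt(n) l_2) -> E|x_i| / sigma = sqrt(2/pi),
# so the measure settles near this distribution-dependent limit.
limit = 1.0 - math.sqrt(2.0 / math.pi)
```

The printed values flatten out as n grows, mirroring the convergence behavior reported in Fig. 1.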

V. CONCLUSION
In this letter, the sparsity measures satisfying desirable sparsity axioms were studied and a fundamental theorem on their structure/construction was presented. Such a result provides deeper insights into the behavior of sparsity measures, and particularly their asymptotic behavior, as was illustrated.