Continuous warped time-frequency representations - Coorbit spaces and discretization

We present a novel family of continuous, linear time-frequency transforms adaptable to a multitude of (nonlinear) frequency scales. Similar to classical time-frequency or time-scale representations, the representation coefficients are obtained as inner products with the elements of a continuously indexed family of time-frequency atoms. These atoms are obtained from a single prototype function, by means of modulation, translation and warping. By warping we refer to the process of nonlinear evaluation according to a bijective, increasing function, the warping function. Besides showing that the resulting integral transforms fulfill certain basic, but essential properties, such as continuity and invertibility, we will show that a large subclass of warping functions gives rise to families of generalized coorbit spaces, i.e. Banach spaces of functions whose representations possess a certain localization. Furthermore, we obtain sufficient conditions for subsampled warped time-frequency systems to form atomic decompositions and Banach frames. To this end, we extend results previously presented by Fornasier and Rauhut to a larger class of function systems via a simple, but crucial modification. The proposed method allows for great flexibility, but by choosing particular warping functions $\Phi$ we also recover classical time-frequency representations, e.g. $\Phi(t) = ct$ provides the short-time Fourier transform and $\Phi(t) = \log_a(t)$ provides wavelet transforms. This is illustrated by a number of examples provided in the manuscript.


Introduction
In this paper, we introduce the notion of (continuous) warped time-frequency transforms, a class of integral transforms representing functions in phase space with respect to possibly nonlinear frequency scales. The goal of this contribution is to show the following properties: (a) The proposed transforms possess some basic, but central properties, namely continuity and invertibility in a Hilbert space setting. The latter is obtained through a variant of Moyal's formula [1,2]. (b) They give rise to classes of generalized coorbit spaces, i.e. nested Banach spaces of functions with a certain localization in the associated phase space. (c) They are stable under a sampling operation, yielding atomic decompositions and Banach frames of warped time-frequency systems.
In order to prove item (c), we will introduce a slight modification to the discretization theory for generalized coorbit spaces presented in [3], cf. Section 3, enabling discretization results for our own construction. The complementary contribution [5] investigates the construction of (Hilbert space) frames by means of discrete warped time-frequency systems.
In the last decades, time-frequency representations, in particular short-time Fourier [6,2] and wavelet [7,8] transforms, have become indispensable tools in many areas from theoretical and applied mathematics to physics and signal processing. The classical time-frequency schemes measure the time-frequency distribution of a function as the correlation of that function with a family of time-frequency atoms. These atoms originate from the application of a set of unitary operators to a prototype function or mother wavelet, e.g. translations and modulations in the short-time Fourier transform (STFT) and translations and dilations in the wavelet transform (WT). By the uncertainty principle [9,10], no mother wavelet can be arbitrarily concentrated in time and frequency simultaneously and thus the choice of the prototype function completely determines the time-frequency trade-off of the representation, i.e. constant resolution in the case of STFT and resolution strictly proportional to the center frequency for wavelets.
This rigidity of classical time-frequency systems, particularly the fixed resolution of the STFT, has led to the development of more general schemes for extracting time-frequency information from a function [11]. Some prominent examples include the α-transform [12-15], sometimes referred to as flexible Gabor-Wavelet transform, and generalized shift-invariant systems [16,17], known as nonuniform (analysis) filter banks in the signal processing community. Also of note are the countless variations on and extensions of the wavelet scheme, including but not limited to, wavelet packets [18], shearlets [19,20], curvelets [21] and ridgelets [22]. The previously mentioned transforms rely on the variation of resolution along frequency. The equivalent concept for variation along time are nonstationary Gabor systems [23-26], which consider semi-regular modulations of a family of prototype functions that vary over time.
Many applications of time-frequency representations require the transform used to be invertible, or more specifically bounded and boundedly invertible. The appropriate tool for the analysis of invertibility properties of time-frequency systems on Hilbert spaces is frame theory [27,28], given a countable family of time-frequency atoms. For uncountable families, the theory of continuous frames [29,30] is appropriate. Whenever a family of time-frequency atoms Φ forms a (discrete or continuous) frame for a Hilbert space H, then the following automatically hold:
• Every function f ∈ H is uniquely determined by its inner products with the frame elements, and
• every function f ∈ H can be written as a superposition of the frame elements with norm-bounded coefficients.
If we desire to analyze or decompose functions contained not in a Hilbert, but in a Banach space B, then the two properties above cease to be equivalent. In that case, we have to determine separately whether Φ forms a Banach frame and/or an atomic decomposition [31,32] for B. Where applicable, coorbit theory and its various generalizations [33,34,31,3] yield the appropriate Banach spaces for such an analysis, see below.
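In the Hilbert space case, both frame properties listed above can be checked by hand in a toy setting. The following minimal sketch uses three unit vectors in ℝ² at 120° spacing, a standard finite tight frame with A = B = 3/2 that is chosen here purely for illustration and is not taken from this paper. The names `FRAME`, `analysis` and `synthesis` are hypothetical.

```python
import math

# Three unit vectors at 120 degree spacing in R^2 (illustrative tight frame).
FRAME = [(math.cos(2 * math.pi * k / 3 + math.pi / 2),
          math.sin(2 * math.pi * k / 3 + math.pi / 2)) for k in range(3)]

FRAME_BOUND = 1.5  # A = B = 3/2 for this particular frame


def analysis(f):
    """Frame coefficients <f, phi_k>."""
    return [f[0] * p[0] + f[1] * p[1] for p in FRAME]


def synthesis(coeffs):
    """Superposition sum_k c_k phi_k of the frame elements."""
    return (sum(c * p[0] for c, p in zip(coeffs, FRAME)),
            sum(c * p[1] for c, p in zip(coeffs, FRAME)))


f = (0.3, -1.2)
c = analysis(f)

# Tightness: the coefficient energy equals A * ||f||^2.
energy = sum(x * x for x in c)

# Reconstruction: f is recovered from its coefficients as (1/A) * sum_k c_k phi_k.
f_rec = tuple(x / FRAME_BOUND for x in synthesis(c))
```

Here `energy` agrees with `FRAME_BOUND * (f[0]**2 + f[1]**2)` and `f_rec` agrees with `f` up to rounding, which is the finite-dimensional counterpart of the two bullet points.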
In this contribution, we introduce a novel family of time-frequency representations adapted to nonlinear frequency scales. Uniquely determined by the choice of a single prototype atom and a warping function that determines the desired frequency scale, our construction provides a family of time-frequency atoms with uniform frequency resolution on the chosen frequency scale. For particular choices of the warping function, we recover the continuous short-time Fourier and wavelet transforms. Hence, the proposed warped time-frequency representations can be considered a unifying framework for a large class of time-frequency systems.
We show that the transforms in this family provide continuous representations (considered as functions in phase space). Furthermore, the proposed systems form continuous tight frames, even satisfying orthogonality relations similar to Moyal's formula [1,2] for the STFT.
We obtain coorbit spaces associated to each frequency scale, i.e. classes of Banach spaces that classify the time-frequency behavior of a function in terms of the corresponding warped time-frequency representation. Through a minor extension of the generalized coorbit theory by Fornasier and Rauhut [3], we can also prove sufficient conditions for countable subfamilies of warped time-frequency atoms to form Banach frames and atomic decompositions for these coorbit spaces.

Related work
The idea of a logarithmic warping of the frequency axis to obtain wavelet systems from a system of translates is not entirely new and was, to our knowledge, first used in the proof of the so-called painless conditions for wavelet systems [8]. However, the idea has never been relaxed to other frequency scales so far. While the parallel work by Christensen and Goh [35] focuses on exposing the duality between Gabor and wavelet systems via the mentioned logarithmic warping, we allow for more general warping functions to generate time-frequency transformations beyond wavelet and Gabor systems. The warping procedure we propose has already proven useful in the area of graph signal processing [36].
A number of methods for obtaining warped time-frequency representations have been proposed, e.g. by applying a unitary basis transformation to Gabor or wavelet atoms [37-40]. Although unitary transformations bequeath basis (or frame) properties to the warped atoms, the warped system provides an undesirable, irregular time-frequency tiling, see [39].
Closer to our own approach, Braccini and Oppenheim [41], as well as Twaroch and Hlawatsch [42], propose a warping of filter bank transfer functions only, by defining a unitary warping operator. However, in ensuring unitarity, the authors give up the property that warping is shape preserving when observed on the warped frequency scale. In this contribution, we trade the unitary operator for a shape preserving warping.
A more traditional approach trying to bridge the gap between the linear frequency scale of the short-time Fourier transform and the logarithmic scale associated to the wavelet transform is the α-transform [12-15] that employs translation, modulation and dilation operators with a fixed relation between modulation and dilation, determined by the parameter α ∈ [0, 1]. For α = 0, the short-time Fourier transform is obtained, while the limiting case α = 1 provides a system with logarithmic frequency scale similar, but not equivalent, to a wavelet system. Both our construction and the α-transform can be considered special cases inside the framework of continuous nonstationary Gabor transforms, see [43], or the equivalent generalized translation-invariant systems [44].
Coorbit theory and discretization results for time-frequency systems on Banach spaces date back to the seminal work of Feichtinger and Gröchenig [45,33,34,31]. Their results are heavily tied to the association of a time-frequency system to a group, e.g. the Heisenberg group and STFT or the affine group and wavelet transforms. More precisely, the time-frequency atoms are obtained through application of a square-integrable group representation to a prototype atom. There have been several attempts to loosen these restrictions to accommodate other group-related transforms, e.g. the α-transform [46] and shearlet transform [19,20], see also the references given in [3]. Finally, the work of Fornasier and Rauhut [3] completely abolished the need for an underlying group in favor of general continuous frames that satisfy certain regularity conditions. Our systems lack the relation to a group representation. Therefore, the starting point of our investigation is an extension of their results. Although we largely follow [3], it should be noted that alternative proofs and some generalizations of results from [3] have been presented in [4].

Structure of this contribution
In the next section, we review some necessary theory and notation, including a short overview of the results presented in [3], the foundation on which the rest of this manuscript is built. In Section 3, we introduce a minor but useful extension to these results that allows the treatment of the systems we wish to construct, but also the (intuitive) construction of Banach frames and atomic decompositions for the STFT using the Fornasier-Rauhut theory. The rest of the paper is focused on warped time-frequency representations; their definition and basic properties are presented in Section 4. Section 5 provides conditions on the warped time-frequency system, such that generalized coorbit theory is applicable, i.e. the associated test function spaces and coorbit spaces can be defined and possess the desired properties. Finally, Section 6 investigates the feasibility of discretization of warped time-frequency systems on the associated coorbit spaces, while preserving the frame/decomposition properties.

Preliminaries
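We use the following normalization of the Fourier transform: $\hat{f}(\xi) := \mathcal{F}f(\xi) = \int_{\mathbb{R}} f(t)\, e^{-2\pi i t \xi}\, dt$ for $f \in L^1(\mathbb{R})$, and its unitary extension to $L^2(\mathbb{R})$. The inverse Fourier transform is denoted by $\check{f} := \mathcal{F}^{-1} f$. Further, we require the modulation operator and the translation operator defined by $M_\omega f = e^{2\pi i \omega (\cdot)} f$ and $T_x f = f(\cdot - x)$, respectively, for all $f \in L^2(\mathbb{R})$. The composition of two functions f and g is denoted by $f \circ g$. By a superscript asterisk (*), we denote the adjoint of an operator and the anti-dual of a Banach space, i.e. the space of all continuous, conjugate-linear functionals on the space. The usual notation O and Θ, see e.g. [47, Chapter 3], is used to describe asymptotic behavior of functions. The fundamental theorem of calculus, i.e. $f(b) - f(a) = \int_a^b f'(s)\, ds$, will be referred to as FTC.
Let H be a separable Hilbert space and (X, μ) a locally compact, σ-compact Hausdorff space with positive Radon measure μ on X. A nontrivial Banach space $(Y, \|\cdot\|_Y)$ of functions on X, continuously embedded in $L^1_{\mathrm{loc}}(X, \mu)$, is called solid if, for all measurable F and all G ∈ Y, $|F(x)| \le |G(x)|$ almost everywhere implies F ∈ Y and $\|F\|_Y \le \|G\|_Y$. A family $\Psi := \{\psi_x\}_{x \in X} \subset H$ is called a continuous frame if there are constants $0 < A \le B < \infty$ such that $A \|f\|_H^2 \le \int_X |\langle f, \psi_x \rangle|^2\, d\mu(x) \le B \|f\|_H^2$ for all f ∈ H, and the map $x \mapsto \psi_x$ is weakly continuous. A frame is called tight, if A and B can be chosen such that the inequalities above become equalities, i.e. A = B. For any frame, the frame operator defined (in the weak sense) by $S_\Psi f := \int_X \langle f, \psi_x \rangle \psi_x\, d\mu(x)$ is bounded, positive and boundedly invertible [29,49].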
A kernel on X is a function $K : X \times X \to \mathbb{C}$. Its application to a function F on X is denoted by $K(F)(x) := \int_X K(x, y) F(y)\, d\mu(y)$. Although the theory in Sections 2.1, 2.2 and 3 is valid in this general setting, the later sections mostly consider cases where X is a suitable subset of $\mathbb{R}^2$ endowed with the usual Lebesgue measure, and H is some subspace of $L^2(\mathbb{R})$.
The most important examples for $(Y, \|\cdot\|_Y)$ are weighted Lebesgue spaces from the family $L^p_w(X)$, for $1 \le p \le \infty$, $X \subseteq \mathbb{R}^{2d}$ and a continuous, nonnegative weight function $w : X \to \mathbb{R}$. These spaces consist of all Lebesgue measurable functions F, such that the norm $\|F\|_{L^p_w(X)} := \|wF\|_{L^p(\mathbb{R}^{2d})} = \left( \int_{\mathbb{R}^{2d}} |w(x) F(x)|^p\, dx \right)^{1/p}$ is finite. Here, F is identified with its zero-extension to a function on $\mathbb{R}^{2d}$. If p = ∞, the p-norm is replaced by the essential supremum as usual.
In the next subsections, we recall central results of generalized coorbit theory and their requirements. The interested reader can find a more detailed account and the necessary proofs in [3], where these results were first presented.

The construction of generalized coorbit spaces
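For the sake of brevity, we will assume from now on that $\Psi := \{\psi_x\}_{x \in X} \subset H$ is a tight frame, i.e. $S_\Psi f = Af$ for all f ∈ H, leading to considerable simplifications in the following statements. Define the following transform associated to Ψ, $V_\Psi : H \to L^2(X, \mu)$, $V_\Psi f(x) := \langle f, \psi_x \rangle$. The adjoint operator is given in the weak sense by $V_\Psi^* F := \int_X F(y)\, \psi_y\, d\mu(y)$. Furthermore, let $A_1$ be the Banach algebra of all kernels $K : X \times X \to \mathbb{C}$, such that the norm $\|K\|_{A_1} := \max\left\{ \operatorname{ess\,sup}_{x \in X} \int_X |K(x, y)|\, d\mu(y),\ \operatorname{ess\,sup}_{y \in X} \int_X |K(x, y)|\, d\mu(x) \right\}$ is finite. Note that the two suprema are equal if K is (Hermitian) symmetric. The algebra multiplication is given by $(K_1 \cdot K_2)(x, y) := \int_X K_1(x, z)\, K_2(z, y)\, d\mu(z)$. A weight function $m : X \times X \to \mathbb{R}$ is called admissible if $1 \le m(x, y) \le m(x, z)\, m(z, y)$, $m(x, y) = m(y, x)$ and $m(x, x) \le C$ for some C > 0 and all x, y, z ∈ X. For an admissible weight m, the weighted kernel algebra $A_m$ is the space of all kernels $K : X \times X \to \mathbb{C}$, such that $Km \in A_1$, with norm $\|K\|_{A_m} := \|Km\|_{A_1}$. Now, we can formulate the following theorem, combining several important results from [3].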
Theorem 2.1. Let m be an admissible weight function, fix z ∈ X and define $v := m(\cdot, z)$. If Ψ ⊂ Y is a continuous tight frame and the kernel $K_\Psi : X \times X \to \mathbb{C}$, given by is contained in $A_m$, then is the minimal Banach space B containing all the frame elements $\psi_x$ and satisfying $\|\psi_x\|_B \le C v(x)$ for all x and some C > 0. Furthermore, $H^1_v$ is independent of the particular choice of z ∈ X and the expression The result above enables the extension of $V_\Psi$ to the distribution space $(H^1_v)^*$ by means of $H^1_v$ possesses a number of additional nice properties. For an exhaustive list, please refer to [3]. We only wish to note that $H^1_v$ is dense and continuously embedded in H, whereas H is weak-* dense in $(H^1_v)^*$, giving rise to a Banach-Gelfand triple [50-52].
If a solid Banach space Y satisfies then we can define the coorbit of Y with respect to the frame Ψ, provided with natural norms
Theorem 2.2. Let Y be a solid Banach space that satisfies Eq. (16) and In particular, The coorbit spaces $(\mathrm{Co}Y, \|\cdot\|_{\mathrm{Co}Y})$ are independent of the particular choice of the continuous frame Ψ, under a certain condition on the mixed kernel associated to a pair of continuous frames.
Proposition 2.3. If Y and $\Psi^{(1)}$ satisfy the conditions of Theorem 2.2 and $\Psi^{(2)}$ is a continuous frame with $K_{\Psi^{(2)}}, K_{\Psi^{(1)},\Psi^{(2)}} \in A_m$, where $K_{\Psi^{(1)},\Psi^{(2)}}$ is the mixed kernel defined by $K_{\Psi^{(1)},\Psi^{(2)}}(x, y) := \langle \psi^{(2)}_y, \psi^{(1)}_x \rangle$ (19), then $\mathrm{Co}(\Psi^{(1)}, Y) = \mathrm{Co}(\Psi^{(2)}, Y)$. (20)
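The kernel algebra used throughout this subsection has a simple finite analogue. The sketch below assumes that X is a finite set with counting measure, so that a kernel is a matrix, and that the A_1 norm is the usual maximum of the two mixed sup-L¹ norms (largest absolute row sum and largest absolute column sum). The helper names are hypothetical, and the matrices `K1` and `K2` are arbitrary illustrative data.

```python
def row_sup(K):
    """sup_x sum_y |K(x, y)| for a matrix kernel K."""
    return max(sum(abs(v) for v in row) for row in K)


def col_sup(K):
    """sup_y sum_x |K(x, y)| for a matrix kernel K."""
    return max(sum(abs(row[j]) for row in K) for j in range(len(K[0])))


def a1_norm(K):
    """Discrete analogue of the A_1 norm: the larger of the two suprema."""
    return max(row_sup(K), col_sup(K))


def kernel_product(K1, K2):
    """Discrete analogue of (K1 . K2)(x, y) = sum_z K1(x, z) K2(z, y)."""
    inner, cols = len(K2), len(K2[0])
    return [[sum(row[k] * K2[k][j] for k in range(inner)) for j in range(cols)]
            for row in K1]


# A non-symmetric kernel and a Hermitian symmetric one.
K1 = [[1.0, -0.5, 0.0], [0.25, 1.0, -0.5], [0.0, 0.25, 1.0]]
K2 = [[0.5, 0.1j, 0.0], [-0.1j, 0.5, 0.1j], [0.0, -0.1j, 0.5]]

norm_product = a1_norm(kernel_product(K1, K2))
norm_bound = a1_norm(K1) * a1_norm(K2)
```

For the Hermitian kernel `K2` the row and column suprema coincide, and `norm_product <= norm_bound` reflects the submultiplicativity that makes the space of such kernels a Banach algebra.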

Discretization in generalized coorbit spaces
In the next steps, we investigate the discretization properties of the continuous frame Ψ, obtaining sufficient conditions for atomic decompositions and Banach frames in terms of a discrete subset of Ψ.
We only provide a review of the theory provided in [3], shortened to an absolute minimum. For a comprehensive treatment including the technical details, please refer to the original contribution.
Definition 2.4. A family $U = \{U_i\}_{i \in I}$ for some countable index set I is called an admissible covering of X, if the following hold. Every $U_i$ is relatively compact with non-void interior, $X = \bigcup_{i \in I} U_i$ and $\sup_{i \in I} \#\{j \in I : U_i \cap U_j \neq \emptyset\} \le N < \infty$ for some N > 0. An admissible covering is moderate, if $0 < D \le \mu(U_i)$ for all i ∈ I and there is a constant C with $\mu(U_i) \le C \mu(U_j)$ for all i, j ∈ I with $U_i \cap U_j \neq \emptyset$. The main discretization result states that any pair of a continuous tight frame Ψ and a covering U, such that the $A_m$-norm of the oscillation $\mathrm{osc}_{\Psi,U}$, defined below, is sufficiently small, gives rise to atomic decompositions and Banach frames for Co(Ψ, Y) in a natural way. Specifically, $\{\psi_{x_i}\}_{i \in I}$ is both a Banach frame and an atomic decomposition if $x_i \in U_i$ for all i ∈ I.
Definition 2.5. A family $\Psi := \{\psi_i\}_{i \in I}$ in a Banach space $(B, \|\cdot\|_B)$ is called an atomic decomposition for B, if there is a BK-space $(B^\natural, \|\cdot\|_{B^\natural})$ and linear, bounded functionals $\{\lambda_i\}_{i \in I} \subseteq B^*$ with the following properties
• if $(\lambda_i)_{i \in I} \in B^\natural$, then $f := \sum_{i \in I} \lambda_i \psi_i \in B$ (with unconditional convergence in some suitable topology) and there is a finite constant $C_2 > 0$ such that
• $f = \Omega\big((\psi_i(f))_{i \in I}\big)$, for all f ∈ B.
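The conditions on an admissible, moderate covering from Definition 2.4 are easy to test on a concrete example. The sketch below uses a hypothetical covering of a patch of ℝ² by open squares of side 2δ centered on the grid δℤ², which is not a covering used in this paper, and evaluates the overlap count and the measures on a finite patch only.

```python
from itertools import product


def open_boxes_intersect(a, b):
    """True if two open axis-aligned boxes ((x0, x1), (y0, y1)) intersect."""
    return all(lo_a < hi_b and lo_b < hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))


def covering(delta, n):
    """Boxes U_{k,l} = delta * (k, l) + (-delta, delta)^2 for |k|, |l| <= n."""
    return {(k, l): ((delta * (k - 1), delta * (k + 1)),
                     (delta * (l - 1), delta * (l + 1)))
            for k, l in product(range(-n, n + 1), repeat=2)}


def max_overlap(boxes):
    """sup_i #{j : U_i and U_j intersect}, evaluated on the finite patch."""
    return max(sum(open_boxes_intersect(u, v) for v in boxes.values())
               for u in boxes.values())


boxes = covering(delta=0.25, n=4)

# Overlap bound N from the admissibility condition.
N = max_overlap(boxes)

# Lebesgue measures of the covering elements (all equal here, hence moderate).
areas = {(hi_x - lo_x) * (hi_y - lo_y)
         for (lo_x, hi_x), (lo_y, hi_y) in boxes.values()}
```

Each interior square meets itself and its eight grid neighbours, so the overlap count is bounded independently of δ, while all elements have the same measure (2δ)².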
Definition 2.6. The oscillation of a continuous tight frame Ψ with respect to the moderate, admissible covering U of X is defined by where
Proposition 2.7. Let Y be a solid Banach space that satisfies Eq. (16), Ψ a continuous frame and define $v := m(\cdot, z)$, for some admissible weight m and arbitrary z ∈ X. If there is a moderate, admissible covering U of X, such that
Theorem 2.8. Let Y be a solid Banach space that satisfies Eq. (16) and Ψ a continuous tight frame. If the moderate, admissible covering U is such that for some $C_{m,U} \ge \sup_{i \in I} \sup_{x, y \in U_i} m(x, y)$, then $\{\psi_{x_i}\}_{i \in I}$ is a Banach frame and an atomic decomposition for Co(Ψ, Y) if $x_i \in U_i$ for all i ∈ I.
For details, e.g. about suitable associated sequence spaces, please refer to [3].
Our strategy for satisfying Theorem 2.8 will be the construction of a family of moderate, admissible coverings for δ sufficiently small. Then we can find $\delta_0 > 0$, such that Theorem 2.8 holds for all $U^\delta$ with $\delta \le \delta_0$.

The generalized oscillation
We now motivate and present a generalization of the discretization theory for generalized coorbit spaces. Since the derivation of our extended results is largely analogous to the content of [3, Section 5], we only provide the results and indicate the necessary changes here. However, the complete derivation can be found in [53]. There, we provide a variant of [3, Section 5] considering our changes, as well as some corrections and modifications to provide a more rigorous and accessible treatment of the theory provided in [3].
A closer investigation of the oscillation kernel associated to the short-time Fourier transform (STFT) shows that the sampling results obtained via classical coorbit space theory are not easily recovered using the theory presented in [3]. At the very least, no sequence of intuitive, regular phase space coverings with property (27) seems to exist, as Example 3.1 below demonstrates. We conclude that the construction of a moderate, admissible covering with $\|\mathrm{osc}_U\|_{A_m} < \delta$ is far from a trivial task, if at all possible. In the setting of the α-transform, Dahlke et al. [46] circumvent this problem by redefining the oscillation kernel to take into account the group action on the affine Weyl-Heisenberg group. An alternative approach, obtaining semi-regular Banach frames from sampled α-transforms, is presented in [54], which is in turn based on previous work by Feichtinger and Gröchenig [55]. The following examples serve to illustrate why the (unaltered) application of generalized coorbit theory to the STFT and α-transform presents a nontrivial task. At the same time, they motivate our own solution to the problem.
On the other hand, the oscillation kernel according to Definition 2.6 and [3] yields and, choosing z = y + 1 and With Q y,ω ⊆ U 2δ y,ω as before, we obtain where we applied the changes of variable $x_0 = y - x$, $\xi_0 = \omega - \xi$. Note that for any fixed 1 , 2 > 0, there always exists a choice of (y, For any fixed which in the short-time Fourier case equals $2\|K_{G(g)}\|_{A_1}$. Thus, the family U does not satisfy Eq. (27). Neither does any family of coverings constructed from regular phase space shifts of a fixed compact set $U \subset \mathbb{R}^2$.
A similar argument provides the same result for any other sensible definition of the STFT.
While we cannot prove that there is no family of moderate, admissible coverings with the property Eq. (27), it is surely much harder to satisfy using Definition 2.6 than using the classical theory [31]. A similar situation arises for the so-called α-transform [46,12]. However, in that situation, classical coorbit theory does not apply and we must rely on its generalized variant.

Example 3.2 (The oscillation for the α-transform). For
where This suggests the construction of a moderate, admissible covering from a countable subset of Although $e^{-2\pi i \omega \delta \beta_\alpha(\omega)} = e^{-2\pi i \delta \omega (1+|\omega|)^{-\alpha}}$ converges to 1 for δ → 0, convergence speed decreases in |ω|, for all 0 ≤ α < 1. Similar to Example 3.1, the phase factor can behave arbitrarily badly, independent of the size of the covering elements. In [46], this problem is circumvented by redefining the oscillation to respect the group action.
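The non-uniform convergence of the phase factor in Example 3.2 can be made visible numerically. The following sketch evaluates the deviation of $e^{-2\pi i \delta \omega (1+|\omega|)^{-\alpha}}$ from 1 on a logarithmic frequency grid. The grid `OMEGAS` and the helper names are hypothetical choices for illustration only.

```python
import cmath
import math


def phase_deviation(delta, omega, alpha):
    """|exp(-2 pi i delta omega (1 + |omega|)^(-alpha)) - 1|."""
    beta = (1.0 + abs(omega)) ** (-alpha)
    return abs(cmath.exp(-2j * math.pi * delta * omega * beta) - 1.0)


def worst_case(delta, alpha, omegas):
    """Largest deviation of the phase factor from 1 over the given frequencies."""
    return max(phase_deviation(delta, w, alpha) for w in omegas)


# Logarithmically spaced frequencies from 1 to 1e12.
OMEGAS = [10.0 ** (k / 20.0) for k in range(0, 241)]

# For alpha < 1, shrinking delta does not shrink the worst case over all omega.
coarse = worst_case(1e-3, 0.5, OMEGAS)
fine = worst_case(1e-5, 0.5, OMEGAS)

# For alpha = 1 the exponent delta * omega / (1 + |omega|) stays below delta.
bounded = worst_case(1e-3, 1.0, OMEGAS)
```

With α = 1/2 the worst-case deviation stays close to its maximal value 2 for both step sizes, since the exponent grows like $\delta\sqrt{\omega}$. For α = 1 the exponent is bounded by δ and the deviation is uniformly small.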
The negative results obtained in the examples above motivate a more general definition of the oscillation. With the following extended definition, the construction of a covering family with the property Eq. (27) becomes a properly intuitive task, similar to the classical case [31].
Definition 3.1 (Definition 2.6a). Let $\Gamma : X \times X \to \mathbb{C}$ be a continuous function that satisfies |Γ| = 1. The Γ-oscillation of a continuous tight frame Ψ with respect to the moderate, admissible covering U of X is defined by where At first glance, the above definition might seem arbitrary, but it actually gives rise to simple generalizations of Proposition 2.7 and Theorem 2.8.

Proposition 3.2 (Proposition 2.7a).
Let Y be a solid Banach space that satisfies Eq. (16), Ψ a continuous frame and define $v := m(\cdot, z)$, for some admissible weight m and arbitrary z ∈ X. If there is some Γ :
Theorem 3.3 (Theorem 2.8a). Let Y be a solid Banach space that satisfies Eq. (16) and Ψ a continuous tight frame. If there is some Γ : for some $C_{m,U} \ge \sup_{i \in I} \sup_{x, y \in U_i} m(x, y)$, then $\{\psi_{x_i}\}_{i \in I}$ is a Banach frame and an atomic decomposition for Co(Ψ, Y) if $x_i \in U_i$ for all i ∈ I.
Remark 3.1. The result above is only truly different from Theorem 2.8 when Γ is not separable into two independent phase factors of the same form, i.e.
Otherwise, $\tilde{\psi}_x := \Gamma_1(x)\psi_x$ defines a continuous frame that provides essentially the same transform, gives rise to the same coorbit spaces and satisfies the assumptions of Theorem 2.8.
Proving Theorem 2.8 is a lengthy affair, see [3], and requires a substantial number of interim results, most of which do not even reference the oscillation. All this preparation can be done in exactly the same way to prove Theorem 3.3. To be precise, the oscillation kernel appears only in the proofs for Lemmas 8, 9 and 10, as well as Theorem 7 in [3]. Moreover, the proofs of Lemmas 8, 9 and 10 can be executed identically for the generalized oscillation kernel from Definition 3.1, requiring only $|K_\Psi(x, y)| = |\Gamma(x, y) K_\Psi(x, y)|$. These results already imply Proposition 3.2.
The crucial step for proving Theorem 3.3, however, is the invertibility of the discretization operator U Ψ , defined by where and Φ := {φ i } i∈I is a partition of unity with respect to the moderate, admissible covering U := {U i } i∈I , i.e.
This is achieved by the following theorem, a variant of [3,Theorem 7].
Theorem 3.4. Let Y be a solid Banach space that satisfies Eq. (16) and Ψ be a continuous tight frame with In particular, $U_\Psi$ is bounded and if the RHS of Eq. (46) is strictly less than 1, then $U_\Psi$ is boundedly invertible on $K_\Psi(Y)$.
Proof. Let $F \in K_\Psi(Y)$ be arbitrary. For the assertion $U_\Psi F \in K_\Psi(Y)$, please refer to [3]. To prove the norm estimate, we introduce the auxiliary operator Using Proposition 3.2 (derived from [3, Corollary 4]), we can confirm that $K_\Psi(Y) \subseteq L^\infty_{1/v}$. Hence, we can apply Theorem 2.2 to show that $K_\Psi$ equals the identity on $K_\Psi(Y)$. By the triangle inequality, We now estimate both terms on the RHS separately.
In order to estimate In the derivations above, we used K Ψ (x, y) = K Ψ (y, x) and the property supp(φ i ) ⊆ U i ∈ U of the PU Φ = (φ i ) i∈I .We obtain where we used supp(φ i ) ⊆ U i ∈ U once more.
Define $H(y) := \sum_{i \in I} |F(x_i)|\, \varphi_i(y)$, then by [53, Lemma 10] (derived from [3, Lemma 10]) and solidity of Y: Since the above estimate holds for all F ∈ Y, inserting $\|\mathrm{osc}_{U,\Gamma}\|_{A_m} < \delta$ completes the proof. □
With the result above in place and the changes discussed earlier in this section, the proof of suitable variants of [3, Theorems 5 and 6] using $\mathrm{osc}_{U,\Gamma}$ is identical to the one presented by Fornasier and Rauhut [3]. The statement in Theorem 3.3 is weaker than these variants of [3, Theorems 5 and 6] and therefore implied.
This concludes our discussion of abstract coorbit and discretization theory. Note again that a fully fledged variant of [3, Section 5], adjusted to the Γ-oscillation, can be found in [53]. In the following sections, we will construct a family of time-frequency representations and apply the results obtained so far in their context.

Warped time-frequency representations
In this section, time-frequency representations with uniform frequency resolution on nonlinear frequency scales are constructed and their basic properties are investigated. In particular, we show that these transforms are continuous, norm preserving and invertible.
Our method, motivated by the discrete systems in [5], is based on the simple premise of a function system $(\psi_{x,\xi})_{(x,\xi) \in D \times \mathbb{R}}$, such that $\psi_{x,\xi} = T_\xi \psi_{x,0}$, where $\psi_{x,0}$ and $\psi_{y,0}$ are of identical shape when observed on the desired frequency scale, for all x, y ∈ ℝ. The frequency scale itself is determined by the so-called warping function. Generally, any bijective, continuous and increasing function $\Phi : D \to \mathbb{R}$, where D is an interval, specifies a (frequency) scale on D. More explicitly, for a prototype function $\theta : \mathbb{R} \to \mathbb{C}$ and warping function Φ, the time-frequency atoms are given by see below. For the sake of simplicity, we consider here only the two most important cases $D = \mathbb{R}$ and $D = \mathbb{R}^+$. This method allows for a large amount of flexibility when selecting the desired frequency scale, but we also recover classical time-frequency and time-scale systems: Clearly, a regular system of translates is obtained for any linear function Φ, while observing $(T_x \theta) \circ \log_a = (\theta \circ \log_a)(\cdot / a^x)$ shows that logarithmic Φ provides a system of dilates. Therefore, short-time Fourier [6,2,56,57] and wavelet [58,7] transforms will turn out to be special cases of our setting. In order to obtain nice systems, we require the derivative of the inverse warping function $(\Phi^{-1})'$ to be a v-moderate weight function.
Definition 4.1.
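A weight function $w : \mathbb{R} \to \mathbb{R}^+$ is called submultiplicative if $w(x + y) \le w(x)\, w(y)$ for all x, y ∈ ℝ, and v-moderate if $w(x + y) \le C\, v(x)\, w(y)$ for all x, y ∈ ℝ, for some submultiplicative weight function v and constant C < ∞.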
Submultiplicative and moderate weight functions are an important concept in the theory of function spaces, as they are closely related to the translation-invariance of the corresponding weighted spaces [59,2], see also [60] for an in-depth analysis of weight functions and their role in harmonic analysis.
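The moderateness requirement on $w = (\Phi^{-1})'$ can be probed numerically for simple warping functions. The sketch below assumes the standard inequalities $v(x+y) \le v(x)v(y)$ (submultiplicative) and $w(x+y) \le C\,v(x)\,w(y)$ (v-moderate), and tests them at random points for Φ = log on ℝ⁺ and for the odd warping $\Phi(t) = \operatorname{sgn}(t)\log(1+|t|)$ on ℝ. The choice $v(x) = e^{|x|}$, the sampling range and all function names are hypothetical.

```python
import math
import random


def w_log(s):
    """(Phi^{-1})'(s) for Phi = log on D = R^+, where Phi^{-1}(s) = exp(s)."""
    return math.exp(s)


def w_sym(s):
    """(Phi^{-1})'(s) for Phi(t) = sgn(t) log(1 + |t|) on D = R,
    whose inverse is Phi^{-1}(s) = sgn(s) (exp(|s|) - 1)."""
    return math.exp(abs(s))


def v(x):
    """Candidate submultiplicative weight v(x) = exp(|x|)."""
    return math.exp(abs(x))


def is_moderate(w, v, C=1.0, trials=10_000, seed=0):
    """Randomly test w(x + y) <= C v(x) w(y) for x, y in [-20, 20]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-20, 20), rng.uniform(-20, 20)
        if w(x + y) > C * v(x) * w(y) * (1 + 1e-12):
            return False
    return True


log_ok = is_moderate(w_log, v)          # exp(x + y) <= exp(|x|) exp(y)
sym_ok = is_moderate(w_sym, v)          # exp(|x + y|) <= exp(|x|) exp(|y|)
v_submultiplicative = is_moderate(v, v)
constant_fails = is_moderate(w_log, lambda x: 1.0)  # no bounded v works for exp
```

A random test of this kind can only refute moderateness, not prove it. Here the inequalities also follow directly from the triangle inequality.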
) and the associated weight function is v-moderate for some submultiplicative weight v.If D = R, we additionally require Φ to be odd.
Remark 4.1. Moderateness of $w = (\Phi^{-1})'$ ensures translation invariance of the associated weighted $L^p$ spaces. In particular, holds for all $\theta \in L^2_{\sqrt{w}}(\mathbb{R})$. Moreover, a similar estimation yields for all $\theta \in L^2_{\sqrt{v}}(\mathbb{R})$. Without loss of generality, we can assume that
Remark 4.2. The definition above only allows warping functions with nonincreasing derivative and, if D = ℝ, we also require point-symmetry. Both restrictions are first invoked in Sections 5 and 6 and not required for the results in the present section. We expect that, with appropriate changes to some of our proofs, it is possible to relax those conditions, as well as the restriction $D \in \{\mathbb{R}, \mathbb{R}^+\}$, but such modification is beyond the scope of this contribution.
From here on, we always assume Φ to be a warping function as per Definition 4.2 and $w = (\Phi^{-1})'$ the associated v-moderate weight. The resulting continuously indexed family of time-frequency atoms is given as follows. Let $\theta \in L^2_{\sqrt{w}}(\mathbb{R})$. The continuous warped time-frequency system with respect to θ and Φ is defined by $G(\theta, \Phi) := \{g_{x,\xi}\}_{(x,\xi) \in D \times \mathbb{R}}$, where The phase space associated with this family is D × ℝ.
Eq. (59) and the above definition immediately yield
Proposition 4.5. Let Φ be a warping function and
Proof. We compute the following estimate Since modulations are continuous on $L^2(D)$ it is sufficient to show that $\|g_{\tilde{x}} - g_x\|_{L^2} \to 0$, as $\tilde{x}$ tends to x.
To see this we calculate Now a $2\varepsilon$ argument finishes the proof since $\Phi'(\tilde{x})/\Phi'(x) \to 1$, $\Phi(\tilde{x}) \to \Phi(x)$ as $\tilde{x} \to x$ and translations are continuous on the weighted space $L^2_{\sqrt{w}}$ due to moderateness of the weight function w. □
Indeed, $V_{\cdot,\Phi}$ also possesses a norm-preserving property similar to the orthogonality relations (Moyal's formula [1,2]) for the short-time Fourier transform.
Theorem 4.6. Let Φ be a warping function and $\theta_1, \theta_2 \in L^2_{\sqrt{w}}$. Furthermore, assume that $\theta_1$ and $\theta_2$ fulfill the admissibility condition Then the following holds for all In particular, if $\theta \in L^2_{\sqrt{w}}$ is normalized in the (unweighted) $L^2$ sense, then
Proof. The elements of $G(\theta_1, \Phi)$ and $G(\theta_2, \Phi)$ will be denoted by $g^1_{x,\xi}$ and $g^2_{x,\xi}$, respectively. We use the fact that Using the substitution $s = \Phi(t) - \Phi(x)$ we can simplify the inner integral The desired results follow using Parseval's formula (and setting
The orthogonality relations are tremendously important, because they immediately yield an inversion formula for $V_{\theta,\Phi}$, similar to the inversion formula for wavelets and the STFT. They even imply that $\{g_{x,\xi}\}_{x \in D, \xi \in \mathbb{R}}$ forms a continuous tight frame with frame bound $\|\theta\|_2^2$. Note that the admissibility condition Eq. (65) is always satisfied if D = ℝ. In that case $w = (\Phi^{-1})'$ is bounded below, implying $L^2_{\sqrt{w}} \subseteq L^2$. On the other hand, if D = ℝ⁺, w can never be bounded from below and the admissibility condition is a real restriction. Moreover, for Φ = log, ,0 being admissible wavelets, i.e. $g^1_{\Phi^{-1}(0),0}$, $g^2_{\Phi^{-1}(0),0}$ satisfy the classical wavelet admissibility condition.
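The tight frame bound $\|\theta\|_2^2$ can be checked numerically in a simple case. The sketch below rests on an assumption about the form of the atoms, namely that on the frequency side they read $g_x(t) = \sqrt{\Phi'(x)}\,\theta(\Phi(t) - \Phi(x))$, which is consistent with the substitution $s = \Phi(t) - \Phi(x)$ used in the proof. Under this assumption, Plancherel's theorem in ξ reduces tightness to the statement that $\int_D |g_x(t)|^2\,dx = \|\theta\|_2^2$ for every t. The warping function, the Gaussian prototype and the quadrature parameters are illustrative choices.

```python
import math


def phi(t):
    """Odd warping function Phi(t) = sgn(t) log(1 + |t|) on D = R (illustrative)."""
    return math.copysign(math.log1p(abs(t)), t)


def dphi(t):
    """Phi'(t) = 1 / (1 + |t|)."""
    return 1.0 / (1.0 + abs(t))


def theta(s):
    """Gaussian prototype with ||theta||_2^2 = 1 / sqrt(2)."""
    return math.exp(-math.pi * s * s)


def atom_energy(t, radius=150.0, h=0.002):
    """Midpoint rule for the integral over x of |g_x(t)|^2, with the ASSUMED atoms
    g_x(t) = sqrt(Phi'(x)) * theta(Phi(t) - Phi(x)), truncated to [-radius, radius]."""
    n = int(round(2 * radius / h))
    phi_t = phi(t)
    total = 0.0
    for k in range(n):
        x = -radius + (k + 0.5) * h
        total += dphi(x) * theta(phi_t - phi(x)) ** 2
    return total * h


THETA_NORM_SQ = 1.0 / math.sqrt(2.0)

# The value should not depend on the frequency position t.
energies = [atom_energy(t) for t in (-2.0, 0.0, 5.0)]
```

All three entries of `energies` agree with `THETA_NORM_SQ` up to quadrature error, independently of t. This is exactly what the change of variables $u = \Phi(x)$ predicts.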
Corollary 4.7.Given a warping function Φ and some nonzero θ The equation holds in the weak sense.
Proof.The assertion follows easily from the orthogonality relations by setting θ = θ 1 = θ 2 since for any given f 2 ∈ F −1 (L 2 (D)) we have the relation To conclude this section, we give some examples of warping functions that are of particular interest, as they encompass important frequency scales.For a proof that the presented examples indeed define warping functions in the sense of Definition 4.2, please see [5,Proposition 2].
Example 4.1 (Wavelets). Choosing Φ = log, with D = ℝ⁺ leads to a system of the form This warping function therefore leads to $g_x$ being a dilated version of $g_1$. Note the interaction of the Fourier transform and dilation to see that G(θ, log) is indeed a continuous wavelet system, with the minor modification that our scales are reciprocal to the usual definition of wavelets.
Example 4.2. The family of warping functions $\Phi_l(t) = c\left((t/d)^l - (t/d)^{-l}\right)$, for some c, d > 0 and l ∈ ]0, 1], is an alternative to the logarithmic warping for the domain D = ℝ⁺. The logarithmic warping in the previous example can be interpreted as the limit of this family for l → 0 in the sense that for any fixed This type of warping provides a frequency scale that approaches the limits 0, ∞ of the frequency range D in a slower fashion than the wavelet warping. In other words, $g_x$ is less deformed for $x > \Phi_l^{-1}(0)$, but more deformed for $x < \Phi_l^{-1}(0)$ than in the case Φ = log. On the other hand, the property that $g_x$ can be expressed as dilation of $g_{\Phi_l^{-1}(0)}$, or any other unitary operator applied to $g_{\Phi_l^{-1}(0)}$, is lost.
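The dilation structure in the wavelet example comes from the elementary identity $\theta(\log t - \log x) = (\theta \circ \log)(t/x)$: translating on the logarithmic scale is the same as dilating on the linear scale. A minimal numerical check, with a Gaussian prototype and sample points chosen only for illustration and with normalization factors omitted, could look as follows.

```python
import math


def theta(s):
    """Illustrative Gaussian prototype."""
    return math.exp(-math.pi * s * s)


def warped_translate(x, t):
    """((T_{log x} theta) o log)(t) = theta(log t - log x)."""
    return theta(math.log(t) - math.log(x))


def dilated_prototype(x, t):
    """(theta o log)(t / x): the function for x = 1, dilated by x."""
    return theta(math.log(t / x))


samples = [(x, t) for x in (0.1, 1.0, 7.5) for t in (0.05, 0.5, 2.0, 40.0)]
max_gap = max(abs(warped_translate(x, t) - dilated_prototype(x, t))
              for x, t in samples)
```

The two expressions agree up to rounding, so `max_gap` is of the order of machine precision.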

Example 4.3 (ERBlets). In psychoacoustics, the investigation of filter banks adapted to the spectral resolution of the human ear has been the subject of a wealth of research, see [61] for an overview. We mention here the Equivalent Rectangular Bandwidth scale (ERB-scale) described in [62], which introduces a set of bandpass filters following human perception. In [63,64] the authors construct filter banks that are designed to be adapted to the auditory frequency scales. The warping function $\Phi_{\mathrm{ERB}}(t) = \operatorname{sgn}(t)\, c_1 \log\left(1 + \frac{|t|}{c_2}\right)$ can also be used to construct a continuous time-frequency representation on an auditory scale, as the ERB-scale is obtained for $c_1 = 9.265$ and $c_2 = 228.8$. Being adapted to the human perception of sound, this representation has potential applications in sound signal processing.
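A small sketch of the ERB warping, assuming the form $\Phi_{\mathrm{ERB}}(t) = \operatorname{sgn}(t)\, c_1 \log(1 + |t|/c_2)$ suggested by the constants above and by the case $c_1 = c_2 = 1$ used in Example 4.4. The function names are hypothetical; the inverse follows by solving for $|t|$.

```python
import math

C1 = 9.265   # ERB-scale constant c_1 from the text
C2 = 228.8   # ERB-scale constant c_2 from the text


def erb_warp(t, c1=C1, c2=C2):
    """Assumed form: sgn(t) * c1 * log(1 + |t| / c2), t in Hz."""
    return math.copysign(c1 * math.log1p(abs(t) / c2), t)


def erb_warp_inverse(s, c1=C1, c2=C2):
    """Inverse of erb_warp: sgn(s) * c2 * (exp(|s| / c1) - 1)."""
    return math.copysign(c2 * math.expm1(abs(s) / c1), s)


for freq in (0.0, 100.0, 1000.0, 8000.0):
    s = erb_warp(freq)
    print(freq, s, erb_warp_inverse(s))
```

Equal steps in the warped variable correspond to steps in frequency that grow roughly linearly with the center frequency, which is the behavior expected of an auditory scale.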
Example 4.4. The warping function $\Phi_l(t) = \operatorname{sgn}(t)\left((|t| + 1)^l - 1\right)$ for some $l \in\, ]0, 1]$ leads to a transform that is structurally very similar to the α-transform. Much in the same way, this family of warping functions can be seen as an interpolation between the identity ($l = 1$), which leads to the STFT, and an ERB-like frequency scale for $l \to 0$. This can be seen by differentiating $\Phi_l$ and observing that for $l$ approaching 0 this derivative approaches (up to a factor) the derivative of the ERB warping function for $c_1 = c_2 = 1$. The connection between this type of warping and the α-transform is detailed below.

The α-transform provides a family of time-frequency transforms with varying time-frequency resolution. Its time-frequency atoms are constructed from a single prototype by a combination of translation, modulation and dilation, see Example 3.2.
If $\mathcal{F}g$ is a symmetric bump function centered at frequency 0, with a bandwidth of 1 (see footnote 3), then $\mathcal{F}g_{x,0}$ is a symmetric bump function centered at frequency $x$, with a bandwidth of $1/\beta_\alpha(x)$. Up to a phase factor, $\mathcal{F}g_{x,\xi} = M_{-\xi}\mathcal{F}g_{x,0}$. Varying $\alpha$, one can interpolate between the STFT ($\alpha = 0$, constant time-frequency resolution) and a wavelet-like (or more precisely ERB-like) transform with the dilation depending linearly on the center frequency ($\alpha = 1$).
Through our construction, we can obtain a transform with similar properties by using the warping functions $\Phi_l(t) = l^{-1}\operatorname{sgn}(t)\left((1 + |t|)^l - 1\right)$, for $l \in\, ]0, 1]$, and $\Phi_0(t) = \operatorname{sgn}(t)\log(1 + |t|)$, introduced here and in Example 4.3. Take $\theta$ to be a symmetric bump function centered at frequency 0, with a bandwidth of 1. Then $\mathcal{F}g_{x,0} = \sqrt{\Phi'(x)}\,\theta(\Phi(t) - \Phi(x))$ is still a bump function with peak frequency $x$, but only symmetric if $l = 1$ or $x = 0$. Moreover, the bandwidth of $\mathcal{F}g_{x,0}$ equals […]. All in all, it can be expected that the obtained warped transforms provide a time-frequency representation very similar to the α-transform with the corresponding choice of $\alpha$.
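The interpolation between the STFT and an ERB-like scale can be illustrated numerically. The sketch below evaluates $\Phi_l(t) = l^{-1}\operatorname{sgn}(t)\left((1 + |t|)^l - 1\right)$ and its derivative $\Phi_l'(t) = (1 + |t|)^{l-1}$, and checks that $\Phi_l$ approaches $\Phi_0(t) = \operatorname{sgn}(t)\log(1 + |t|)$ for small $l$. The quantity $1/\Phi_l'(x)$ is printed as a first-order heuristic for the width, around $x$, of a bump that has unit width in the warped domain; this heuristic is an assumption made for illustration and is not the exact bandwidth expression of the text. The function names are hypothetical.

```python
import math


def phi(t, l):
    """Phi_l(t) = sgn(t) * ((1 + |t|)**l - 1) / l, with the log limit at l = 0."""
    if l == 0.0:
        return math.copysign(math.log1p(abs(t)), t)
    return math.copysign(((1.0 + abs(t)) ** l - 1.0) / l, t)


def phi_prime(t, l):
    """Derivative of Phi_l: (1 + |t|)**(l - 1), valid for l in [0, 1]."""
    return (1.0 + abs(t)) ** (l - 1.0)


# l = 1 is the identity (STFT case); small l approaches the logarithmic case.
print(phi(2.5, 1.0), phi(5.0, 1e-7), phi(5.0, 0.0))

# Heuristic local width 1 / Phi_l'(x) at a few center frequencies.
for l in (1.0, 0.5, 0.0):
    print(l, [round(1.0 / phi_prime(x, l), 3) for x in (0.0, 9.0, 99.0)])
```

For $l = 1$ the heuristic width is constant, while for $l = 0$ it grows linearly in $1 + |x|$.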

Coorbit spaces for warped time-frequency systems
In the previous section we have developed time-frequency representations for functions $f \in \mathcal{F}^{-1}(L^2(D))$. Due to the inner product structure of the coefficient computation, it seems natural to attempt the representation of distributions $f$ by restricting the pool of possible functions $\theta$, so that the resulting warped time-frequency system consists entirely of suitable test functions. For classical Gabor and wavelet transforms, the appropriate framework is Feichtinger and Gröchenig's coorbit space theory [33,34].
In addition to a Banach space of test functions and the appropriate dual (distribution) space, coorbit theory provides a complete family of (nested) Banach spaces, the elements of which are characterized by their decay properties in the associated time-frequency representation.However, most attempts to generalize coorbit space theory still require the examined TF representation to be based on an underlying group structure, similar to the STFT being based on the (reduced) Heisenberg group in the classical theory.
Since our warped TF transform $V_{\theta,\Phi}$ does not possess such a structure, the appropriate framework for the construction of the associated coorbit spaces is the generalized coorbit theory by Fornasier and Rauhut [3]. In other words, we aim to translate the results presented in Section 2.1 to the setting of warped time-frequency systems. The first step towards this is finding sufficient conditions for a prototype function $\theta$, such that $\mathcal{G}(\theta, \Phi)$ satisfies $K_{\theta,\Phi} := K_{\mathcal{G}(\theta,\Phi)} \in \mathcal{A}_m$, for suitable weights $m$. A large part of this section is devoted to proving the following main result.

Theorem 5.1. Let $\Phi : D \to \mathbb{R}$ be a warping function with $w = (\Phi^{-1})' \in C^1(\mathbb{R})$, such that for all $x, y \in \mathbb{R}$: […] for a symmetric, submultiplicative weight function $v_1$ and define $m(x, y, \xi, \omega) = \max\left\{\frac{m_1(x)}{m_1(y)}, \frac{m_1(y)}{m_1(x)}\right\}$. Then […].

Footnote 3. The exact definition of bandwidth, e.g. frequency support or −3 dB bandwidth, is not important for this example.
In fact, we prove a stronger result that provides weaker, but more technical conditions on $m$ and $\theta$. As indicated by (77), we will not be able to construct coorbit spaces for arbitrary warping functions. Indeed, we require $w = (\Phi^{-1})'$ to be self-moderate, i.e.

$$w(x + y) \leq C_w\, w(x)\, w(y), \quad \text{for all } x, y \in \mathbb{R} \tag{80}$$

and a suitable $C_w > 0$. For the remainder of this manuscript, we will assume Eq. (80) to hold and that $C_w$ denotes a constant such that the inequality is satisfied; without loss of generality we also assume $C_w \geq 1$.
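Self-moderateness, Eq. (80), can be probed numerically through the ratio $w(x+y)/(w(x)w(y))$, which must stay bounded by $C_w$. The sketch below is an illustration under stated assumptions: the weights are derived by hand from the inverse warping functions for $\Phi = \log$ (so $w = \exp$), for $\Phi(t) = \operatorname{sgn}(t)\log(1 + |t|)$ (so $w(s) = e^{|s|}$), and for Example 4.2 with the illustrative choice $c = d = l = 1$, i.e. $\Phi(t) = t - 1/t$ with $\Phi^{-1}(s) = (s + \sqrt{s^2 + 4})/2$. The function names are hypothetical.

```python
import math


def ratio(w, x, y):
    """w(x + y) / (w(x) * w(y)); bounded by C_w if w is self-moderate."""
    return w(x + y) / (w(x) * w(y))


def w_log(s):
    """Phi = log on R^+: Phi^{-1} = exp, hence w = exp."""
    return math.exp(s)


def w_erb_like(s):
    """Phi(t) = sgn(t) log(1 + |t|): Phi^{-1}(s) = sgn(s)(e^{|s|} - 1), w = e^{|s|}."""
    return math.exp(abs(s))


def w_ex42(s):
    """Phi(t) = t - 1/t: derivative of Phi^{-1}(s) = (s + sqrt(s^2 + 4)) / 2."""
    return 0.5 * (1.0 + s / math.sqrt(s * s + 4.0))


grid = [k / 2 for k in range(-20, 21)]
print(max(ratio(w_log, x, y) for x in grid for y in grid))
print(max(ratio(w_erb_like, x, y) for x in grid for y in grid))

# Along y = -x the ratio for the third weight equals (x^2 + 4) / 2.
for x in (1.0, 10.0, 100.0):
    print(x, ratio(w_ex42, x, -x))
```

The first two ratios never exceed 1 up to rounding, consistent with $C_w = 1$. For the third weight the ratio along $y = -x$ grows without bound, which suggests that this particular warping function does not satisfy Eq. (80).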
Remark 5.1. Since we require the warping functions to be self-moderate, the following results do not hold for the warping functions […].

We begin by noting that the norm condition $\|K_{\theta,\Phi}\|_{\mathcal{A}_m} < \infty$ reduces to $\operatorname{ess\,sup}_{x,\xi \in \mathbb{R}}$ […], where […] $ds\, d\eta\, dz$, […] and […]. The expression describing $I_{\theta,\Phi,m}(x, \xi)$ is obtained by (i) inserting the definition of $K_{\theta,\Phi}$ and $g_{x,\xi}$ while substituting $x \to \Phi^{-1}(x)$, and (ii) performing the following three changes of variable: $s = \Phi(t) - x$ and $z = \Phi(y) - x$ (both used at (1) below) and […].

In order to obtain an estimate $I_{\theta,\Phi,m} < C < \infty$, we first derive an estimate for the innermost integral that ensures convergence of the outer integrals. The most important tool for that purpose is the so-called method of stationary phase [65], a particular case of partial integration most widely known for being the classical method of proving $f \in$ […] for all $f \in C^1$, with $wf \in C_0$. Here, we use the differential operators $D_{w,x,\eta} : C^1(\mathbb{R}) \to C(\mathbb{R})$ defined by […]. We need to prove some auxiliary results for those operators: […] where […].

Proof. The assertion is proven by induction. By definition, […]. Therefore, the assertion holds for $n = 1$. For the induction step, note that […]. By the quotient rule, […]. Using the definition of $P_{n-1,k}$ it is easy to see that $P_{n-1,k}(s + x)T_{-x}w(s)$ and $P_{n-1,k}(s + x)w'(s + x)$ are sums of $n$-term products of $w$ and its derivatives of order no higher than $n - k - 1$ and $n - k$, respectively. Furthermore, the highest order derivative of $f$ appearing in $G_{w,x}$ is $f^{(n)}$. It remains to show that $P'_{n-1,k}$ is a sum of $(n - 1)$-term products of $w$ and its derivatives of order no higher than $n - k$. For any term of […]. Therefore, the individual terms of $G_{w,x}$ satisfy the conditions imposed on the terms of $P_{n,k}$, and reordering them by the appearing derivative of $f$ completes the proof. □

The following corollary shows that $D^n_{w,x,\eta}(f)(s)$ is uniformly bounded as a function in $x \in \mathbb{R}$, under suitable assumptions on the function $w$.
Corollary 5.3. Let $w \in C^n(\mathbb{R})$ be a self-moderate weight, Eq. (80). If there are constants […].

Proof. First, note that self-moderateness implies $w(x)/w(s + x) \leq C_w w(-s)$. Invoke Lemma 5.2 to see that […] is a viable choice for Eq. (92) to hold, provided it is finite. Use Lemma 5.2 again to obtain […]. Since the sets $\{P_{n,k}\}$ and $\Sigma_{n,k}$ are finite, the expression in Eq. (93) is finite and there is a finite $C_n > 0$. □

Lemma 5.4. Let $\Phi$ be a warping function such that $w = (\Phi^{-1})' \in C^n(\mathbb{R})$ is a self-moderate weight, Eq. (80), and $|w^{(k)}/w| \leq D_k$ for some constants […] and, with $C_n$ as in Corollary 5.3, […]. Furthermore, the LHS of Eq. (97) is bounded by […].

Proof. To prove the second assertion, simply note that […] by self-moderateness of $w$. Now note that […]. Combine Eq. (95) with Corollary 5.3 and the stationary phase method to find that […], where we used Eq. (96) to obtain the final estimate. This completes the proof. □

In our specific case, the function $f$ has a special form, namely $f(s) = \theta(s)T_z\theta(s)$. We now determine conditions on $\theta$ such that the estimates obtained through Lemma 5.4 are integrable. This is the final step for establishing convergence of the triple integral Eq. (82).

Lemma 5.5. Let $\Phi$ be a warping function such that $w = (\Phi^{-1})' \in C(\mathbb{R})$ is a self-moderate weight, Eq. (80). Let furthermore $v$ be a symmetric, submultiplicative weight function. […] with […]. Here, […] with $C_n > 0$ as in Corollary 5.3.
Proof. For ease of notation, in the following chain of inequalities we denote by $C$ a finite product of nonnegative, finite constants. Therefore, $C$ might have a different value after each derivation, but is always nonnegative and finite. There is $1 \leq k \leq n + 1$, such that the LHS of Eq. (103) equals […]. In this derivation, we used self-moderateness of $w$ repeatedly, as well as submultiplicativity and symmetry of both $(1 + |\cdot|)^{-1-\epsilon}$ and $v$. Furthermore, we used the product rule for differentiation and that the appearing sum is finite. In the final step, we applied the Cauchy-Schwarz inequality. □

We are now ready to prove the main result simply by collecting the conditions from the interim results above. The proof itself is little more than sequentially applying those interim results to the function $I_{\theta,\Phi,m}$ given by Eq. (82).
Theorem 5.6. Let $\Phi$ be a warping function such that $w = (\Phi^{-1})' \in C^{p+1}(\mathbb{R})$ is a self-moderate weight, Eq. (80), and $|w^{(k)}/w| \leq D_k$ for some constants […] with weight functions $m_1, m_2$ that satisfy […] for some $\epsilon > 0$. Then $\operatorname{ess\,sup}_{x,\xi \in \mathbb{R}}$ […].

Proof. Recall the definition of $I_{\theta,\Phi,m}$ in Eq. (82), […] $ds\, d\omega\, dz$, […]. We already know that $C_x(z) = \frac{w(z+x)}{w(x)} \leq C_w w(z)$ by the assumptions on $\Phi$. We now estimate the time-frequency weight $m$. To that end, observe $m(x, z, \xi, \eta)$ […]. Here we used Conditions (i) and (ii) on $m_1, m_2$. For the final inequality, we used that $w$ is nondecreasing on $\mathbb{R}^+$ and, if $D = \mathbb{R}$, symmetric. Note that for $D = \mathbb{R}^+$ we have $p = 0$. For sufficiently large $C > 0$, […], where we used the method of stationary phase, together with condition (a) on $\theta$.
To obtain the final estimate, we distinguish between the cases $|\eta| < 1$ and $|\eta| \geq 1$. In the first case, we use the estimate in Eq. (109) and obtain […] $ds\, d\eta\, dz$, where the derivation follows the steps in the proof of Lemma 5.5. […] $ds\, d\eta\, dz$. To obtain finiteness, we used Lemma 5.5 and […] (111) and (112) to prove the assertion. A more precise statement is obtained using the estimate in the proof of Lemma 5.5: […] with […]. This completes the proof. □

We now have a set of conditions on $\Phi$, $m$ and $\theta$ that guarantee $K_{\theta,\Phi} \in \mathcal{A}_m$ and therefore allow the construction of a set of (generalized) coorbit spaces by applying Theorems 2.1 and 2.2. It is easy to see that Theorem 5.1 is just a special case of Theorem 5.6.
Proof of Theorem 5.1. Set $m_2 = 1$ to see that the conditions on $\Phi$ and $m$ imply the conditions of Theorem 5.6. Furthermore, note that $\theta \in C_c^\infty$ implies $\theta \in L^2_{\sqrt{w}}$ and conditions (a-c). If furthermore $w = (\Phi^{-1})'$ and $v_2$ are polynomial, then $\theta \in \mathcal{S}$ suffices to guarantee $\theta \in L^2_{\sqrt{w}}$ and conditions (a-c). Therefore, the result follows immediately from Theorem 5.6. □

To ensure that $\mathrm{Co}(\mathcal{G}(\theta, \Phi), Y)$ is a Banach space, it remains to show that $K_{\theta,\Phi}(Y)$ is continuously embedded in $L^\infty_{1/v}$. This will be achieved in the next section, under slightly stronger conditions on $\theta$, through an application of Proposition 3.2. For now, we simply assume that embedding for all considered $\mathcal{G}(\theta, \Phi)$.

The results of this section enable the construction of coorbits of an abstract, solid Banach space $Y$ with respect to $\mathcal{G}(\theta, \Phi)$, provided $Y$ satisfies Eq. (16). Before considering the discretization problem in more detail, we discuss how these results can be applied to the exemplary warping functions provided at the end of Section 4.

Examples for the application of Theorem 5.1
Fix $1 \leq p \leq \infty$ and choose a continuous weight function $v : D \times \mathbb{R} \to \mathbb{R}^+$. Then by Schur's test, the weighted space $L^p_v(D \times \mathbb{R})$ satisfies Eq. (16) with […]. If $v$ is also bounded away from 0 (resp. bounded above), then $v_{y,\omega}(x, \xi) := m(x, y, \xi, \omega)$ and $v$ (resp. $v^{-1}_{y,\omega}$ and $v$) are equivalent weights, for any fixed $(y, \omega) \in D \times \mathbb{R}$.

Let additionally $v$ be such that there is an equivalent tensor weight $\tilde{v}(x, \xi) := \tilde{v}_1(x)\tilde{v}_2(\xi)$, i.e. there are […]. Then $m_v$ and $m_{\tilde{v}}$ are equivalent and we can apply Theorem 5.6 with regard to $\mathcal{A}_{m_v} = \mathcal{A}_{m_{\tilde{v}}}$. If $D = \mathbb{R}$ and $\tilde{v}_2$ is a polynomial weight, then condition (ii) in Theorem 5.6 is satisfied for some $p \in \mathbb{N}$. For $D = \mathbb{R}^+$, this is only possible if $\tilde{v}_2 \equiv 1$.

Discrete frames and atomic decompositions
We will now construct moderate, admissible coverings (see Definition 2.4) and show that families of covers and a canonical choice of $\Gamma$ exist, such that the associated $\Gamma$-oscillation converges to 0 in $\mathcal{A}_m$, i.e. […] for any admissible warping function $\Phi$ and sufficiently smooth, quickly decaying prototype $\theta$.

Consequently, the discretization machinery provided by Sections 2.2 and 3 can be put to work, providing atomic decompositions and Banach frames with respect to $\mathcal{G}(\theta, \Phi)$ and the family of coverings $\mathcal{U}^\delta$, $\delta > 0$. Let us first define a prototypical family of coverings induced by the warping function.

Definition 6.1. Let $\Phi$ be a warping function. Define […], where […]. We call $\mathcal{U}^\delta_\Phi$ the $\Phi$-induced $\delta$-cover.

For all $\delta > 0$, $\mathcal{U}^\delta_\Phi$ is a moderate, admissible covering with $\mu(U^\delta_{l,k}) = \delta^2$, where $\mu$ is the standard Lebesgue measure.
Let us state our second main result.

Theorem 6.2. Let $\Phi : D \to \mathbb{R}$ be a warping function with $w = (\Phi^{-1})' \in C^1(\mathbb{R})$, such that for all $x, y \in \mathbb{R}$: […]. Furthermore, let $\mathcal{U}^\delta_\Phi$ be the induced $\delta$-cover and $m_1$: […] for a symmetric, submultiplicative weight function $v_1$ and define $m(x, y, \xi, \omega) = \max\left\{\frac{m_1(x)}{m_1(y)}, \frac{m_1(y)}{m_1(x)}\right\}$. Then $\sup_{l,k \in \mathbb{Z}} \sup_{(x,\xi),(y,\omega) \in U^\delta_{l,k}} m(x, y, \xi, \omega) < \infty$ and […], where […]. For sufficiently small $\delta_0$ and $\delta \leq \delta_0$, there are constants […].

Similar to the previous section, Theorem 6.2 is a special case of a more general result with weaker conditions on $m$ and $\theta$. Once more, the proof of that result requires some preparation. First, we take a closer look at the sets $Q_{y,\omega}$ from the definition of the $\Gamma$-oscillation.

Lemma 6.3. Let $\Phi$ be a warping function, such that $w = (\Phi^{-1})'$ is self-moderate, Eq. (80), and $\mathcal{U}^\delta_\Phi$ the induced $\delta$-cover. For all $(y, \omega) \in D \times \mathbb{R}$ and all $\delta > 0$, […] where […] (121).

Proof. Assume that $(y, \omega) \in U^\delta_{l,k}$, then in turn […]. Furthermore, […]. Assume $D = \mathbb{R}^+$. Since $w$ is nondecreasing and self-moderate, […], where we applied the fundamental theorem of calculus (FTC). Therefore […]. This completes the proof for $D = \mathbb{R}^+$. For $D = \mathbb{R}$ and $|y| \geq \delta$, Eq. (126) holds by the same argument. For $|y| < \delta$, the FTC yields […]. On the other hand $w(0) \geq C_w^{-1} w(\Phi(y))/w(\Phi(y)) \geq C_w^{-1} w(\Phi(y))/w(\delta)$, showing that Eq. (126) holds for all $y \in \mathbb{R}$. □

The next two results are concerned with a certain family of operators. At this point, their definition might seem arbitrary, but their purpose will become clear once we investigate $\mathrm{osc}_{\mathcal{U}^\delta_\Phi,\Gamma}$ more closely. In particular, we show that they approximate the identity in a suitable way. For usage in the next two lemmas, we define the space […] equipped with the supremum norm.
Lemma 6.4. Let $X = L^p_w(\mathbb{R})$, $1 \leq p < \infty$, or $X = (C_0)_w(\mathbb{R})$, for some weight function $w$, and assume that $w = (\Phi^{-1})'$, for some warping function $\Phi$, is self-moderate, Eq. (80). For all $y \in \mathbb{R}$, $\epsilon \geq 0$, let $E_{y,\epsilon} : X \to X$ be the operator defined by

$$[\ldots] \quad \text{a.e., for all } f \in X. \tag{129}$$

The following hold: […].

Proof. We only provide the proof for $X = L^p_w(\mathbb{R})$; the proof for $X = (C_0)_w(\mathbb{R})$ is analogous. In order to prove (i), note that […]. By self-moderateness of $w$ and the FTC, […], where we used $w$ nondecreasing ($D = \mathbb{R}^+$), respectively nondecreasing on $\mathbb{R}^+$ and odd ($D = \mathbb{R}$). Therefore, $1 - e^{2\pi i \epsilon\left(\Phi^{-1}(t+y) - \Phi^{-1}(y)\right)}$ […] for all $0 \leq \epsilon \leq \frac{1}{2C_w \delta v(\delta)}$. Inserting into Eq. (131) proves (i). For proving (ii), note that we can construct, for any $f \in X$, a sequence $(f_n)_{n \in \mathbb{N}} \subset X$ of compactly supported functions, i.e. $\operatorname{supp}(f$ […]. By (i) however, $\|f_n - E_{y,\epsilon} f_n\|_X$ is bounded uniformly, independent of $y$, provided $\epsilon$ is small enough. Consequently, […], completing the proof. □

The next result clarifies the stability of $E_{y,\epsilon}$ when combined with differentiation.
Lemma 6.5. Let $X = L^p_w(\mathbb{R})$, $1 \leq p < \infty$, or $X = (C_0)_w(\mathbb{R})$, for some weight function $w$, and assume that $w = (\Phi^{-1})' \in C^{n-1}(\mathbb{R})$, for some warping function $\Phi$, is self-moderate, Eq. (80). If there are […], then $(E_{y,\epsilon}\theta)^{(n)} \in X$, for all $y \in \mathbb{R}$, and the map $\epsilon \mapsto \sup$ […], where […] with $\Sigma_{n,k,l} \subseteq \{\sigma = (\sigma_1, \ldots, \sigma_l) : \sigma_m \in \{0, \ldots, n - k - 1\}\}$ and some $C_\sigma \in \mathbb{R}$. By the conditions on $w$, […]. Since all the sums in Eq. (138) are finite, there is some $C > 0$ such that […] for all $0 \leq \epsilon \leq (2\pi)^{-1}$. For $\epsilon \to 0$, the first term converges to 0 by Lemma 6.4. To complete the proof, we need to show that Eq. (138) holds. Clearly, […], proving Eq. (138) for $n = 1$. Assume it holds for $n - 1$, then […]. We now consider the derivative of each term separately. For the first term, invoke Eq. (142) for $\theta^{(n-1)}$. All the other terms are of the form […] with […]. Reorder everything by the appearing derivative of $\theta$ to complete the proof. □

We are now ready to prove the central statements of this section, which we will split into two more compact results.
Proof. First, note that the conditions of Theorem 6.7 imply the conditions of Proposition 6.6. Therefore, $\mathcal{G}(\theta, \Phi)$ satisfies the conditions of Proposition 3.2 for any induced $\delta$-cover, with arbitrary $\delta$. In turn, Proposition 3.2 provides the continuous embedding $K_{\theta,\Phi}(Y) \subseteq L^\infty_{1/v}$. Finally, the conditions of Theorem 6.7 imply the conditions of Theorem 5.6. Assembling all the pieces, the spaces $\mathrm{Co}(\mathcal{G}(\theta, \Phi), Y)$ are well defined and, by Theorem 2.2, have the Banach space property. □

The statements we have just proven specify a set of conditions on $\Phi$, $m$ and $\theta$ such that we can construct atomic decompositions and Banach frames by invoking Theorem 3.3. That the conditions of Theorem 6.2 imply the conditions in Theorem 6.7 and Proposition 6.6 is easily seen.

Proof of Theorem 6.2. Analogous to the proof of Theorem 5.1, but use Theorem 6.7 and Proposition 6.6 instead of Theorem 5.6. □

Remark 6.1. Although we only state Theorems 6.2 and 6.7, as well as Proposition 6.6, for the induced $\delta$-cover, it is easily seen that any covering $\mathcal{U}$ that satisfies Lemma 6.3, for $\delta > 0$ small enough, guarantees $\|\mathrm{osc}_{\mathcal{U},\Gamma}\|_{\mathcal{A}_m} < \epsilon$ and $\sup_{l,k \in \mathbb{Z}} \sup_{(x,\xi),(y,\omega) \in U_{l,k}} m(x, y, \xi, \omega) \leq C_{m,\mathcal{U}} < \infty$. If $\epsilon > 0$ is in turn small enough, then Theorem 3.3 can be applied, providing atomic decompositions and Banach frames with respect to $\mathcal{U}$.

Conclusion and outlook
In this contribution, we introduced a novel family of time-frequency representations containing representations tailored to a wide range of nonlinear frequency scales. We have shown that the resulting integral transforms are invertible and produce continuous functions on phase space. Under mild restrictions on the chosen frequency scale, every such representation gives rise to a full family of (generalized) coorbit spaces. Furthermore, through a minor, but important generalization of existing discretization results in generalized coorbit theory, we are able to prove that atomic decompositions and Banach frames can be constructed in a natural way, provided that the system is discretized respecting suitable density conditions.

There are still many open questions regarding the finer structure of coorbit space theory for warped time-frequency representations, e.g. whether the generated coorbit spaces coincide with some known localization spaces. Since the warping functions $\Phi(x) = x$ and $\Phi(x) = \log(x)$ yield short-time Fourier and wavelet transforms, the associated coorbit spaces coincide with their classical counterparts. Furthermore, the close relationship between the α-transform and the warping functions discussed in Examples 4.3 and 4.4 suggests a connection to α-modulation spaces that requires closer study.

Another interesting question is the relation between the spaces $\{\theta \in L^2_{\sqrt{w}} : K_{\theta,\Phi} \in \mathcal{A}_m\}$ and $\{g \circ \Phi^{-1} : g \in \mathcal{H}^1_v\}$. Clearly, the first space is contained in the second, since $\mathcal{G}(\theta, \Phi) \subset \mathcal{H}^1_v$, but at this point it is unclear whether the inclusion is strict.
The construction of (Hilbert space) frames by means of discrete warped time-frequency systems is covered in [5].Therein generalizations of classical necessary and sufficient frame conditions, previously known to hold for Gabor and wavelet systems, are recovered.A special focus in [5] is the construction of tight frames with bandlimited elements, also illustrated through a series of examples.
Future work will investigate the extension of warped time-frequency representations to higher dimensional signal spaces. It will also address the modification of Fornasier and Rauhut's generalized coorbit theory in order to allow systematic treatment of the coorbit spaces $\mathrm{Co}(\Psi, L^{p,q}_w)$ associated to mixed-norm spaces $L^{p,q}_w$, which are important to describe functions that have significantly different properties in the space and frequency domains.

Definition 4.4. The $\Phi$-warped time-frequency transform of $f \in \mathcal{F}^{-1}(L^2(D))$ with respect to the warping function $\Phi$ and the prototype $\theta \in L^2_{\sqrt{w}}(\mathbb{R})$ is defined by

Proposition 6.6. Let $\Phi : D \to \mathbb{R}$ be a warping function satisfying Eq. (80). Let $p \in \mathbb{N}$ if $D = \mathbb{R}$ and $p = 0$ if D
if there is a BK-space $(B, \|\cdot\|_B)$ and linear, bounded operator $\Omega : B \to B$ with the following properties
• if $f \in B$, then $(\psi_i(f))_{i \in I} \in B$ and there are finite constants $0 < C_1 \leq C_2$ such that

moderate in all those cases. Hence, polynomial weights $\tilde{v}_1$ satisfy condition (i) in Theorem 5.6 for Examples 4.1, 4.3 and 4.4.