The Affine Wigner Distribution

We examine the affine Wigner distribution from a quantization perspective with an emphasis on the underlying group structure. One of our main results expresses the scalogram as an (affine) convolution of affine Wigner distributions. We strive to unite the literature on affine Wigner distributions and we provide the connection to the Mellin transform in a rigorous manner. Moreover, we present an affine ambiguity function and show how this can be used to illuminate properties of the affine Wigner distribution. In contrast with the usual Wigner distribution, we demonstrate that the affine Wigner distribution is never an analytic function. Our approach naturally leads to several applications, one of which is an approximation problem for the affine Wigner distribution. We show that the deviation of a symbol from being an affine Wigner distribution can be expressed purely in terms of intrinsic operator-related properties of the symbol. Finally, we present an affine positivity conjecture regarding the non-negativity of the affine Wigner distribution.


Introduction
The most studied quadratic time-frequency representation is the Wigner distribution defined by $W f(x, \omega) := \int_{\mathbb{R}^d} f\left(x + \frac{t}{2}\right) \overline{f\left(x - \frac{t}{2}\right)}\, e^{-2\pi i \omega \cdot t}\, dt$ (1.1). Originally introduced by Wigner in [24] almost a century ago, the Wigner distribution is essential in quantum mechanics as it gives the expectation values for Weyl quantization of symbols [8]. In recent decades, the Wigner distribution has found many applications in time-frequency analysis [15, Chapter 4] due to its connections with the short-time Fourier transform $V_g f$ defined precisely in (2.4). One of the more surprising connections is the convolution relation $\mathrm{SPEC}_g f = W f * W(Pg)$ (1.2), where $P$ is the reflection operator $P(g)(x) := g(-x)$. The function $\mathrm{SPEC}_g f(x, \omega) := |V_g f(x, \omega)|^2$ is called the spectrogram of $f$ with window $g$. The spectrogram is an important tool for analyzing time-frequency content and has been used extensively in the engineering literature since its introduction.
Parallel to the theory of time-frequency analysis is the time-scale (or wavelet) paradigm. Although there have been many attempts at finding a suitable Wigner distribution in the time-scale setting, there is no general consensus in the literature. We will motivate a particular choice of a time-scale Wigner distribution $W^{\psi}_{\mathrm{Aff}}$ given by $W^{\psi}_{\mathrm{Aff}}(x, a) := \int_{-\infty}^{\infty} \psi\left(\frac{a u e^{u}}{e^{u} - 1}\right) \overline{\psi\left(\frac{a u}{e^{u} - 1}\right)}\, e^{-2\pi i x u}\, du$ for $(x, a) \in \mathbb{R} \times \mathbb{R}_+$. The function $W^{\psi}_{\mathrm{Aff}}$ is called the affine Wigner distribution due to its relation to the affine group Aff. It was derived through a quantization procedure in [12]. The authors showed that the affine Wigner distribution satisfies $W^{\psi}_{\mathrm{Aff}} \in L^2_r(\mathrm{Aff})$ for every $\psi \in L^2(\mathbb{R}_+, a^{-1}da)$, where $L^2_r(\mathrm{Aff})$ denotes all measurable functions on the upper half-plane $\mathbb{R} \times \mathbb{R}_+$ that are square integrable with respect to the measure $a^{-1}\, da\, dx$.
The affine Wigner distribution $W^{\psi}_{\mathrm{Aff}}$ has appeared in the literature several times throughout the years: as a particular Bertrand distribution in [20], and as a tool for studying the quantum mechanics of the Morse potential in [19]. The basic properties of the affine Wigner distribution will be developed in a rigorous manner to fill gaps in the literature. In particular, for all sufficiently nice $\psi \in L^2(\mathbb{R}_+, a^{-1}da)$ we have the marginal properties $\int_{-\infty}^{\infty} W^{\psi}_{\mathrm{Aff}}(x, a)\, dx = |\psi(a)|^2$ and $\int_0^{\infty} W^{\psi}_{\mathrm{Aff}}(x, a)\, \frac{da}{a} = |\mathcal{M}(\psi)(x)|^2$ (1.3). The symbol $\mathcal{M}(\psi)(x)$ denotes the Mellin transform of $\psi \in L^2(\mathbb{R}_+, a^{-1}da)$ at the point $x \in \mathbb{R}$ given by $\mathcal{M}(\psi)(x) := \int_0^{\infty} \psi(a)\, a^{-2\pi i x}\, \frac{da}{a}$. The first significant contribution is to develop a connection between the affine Wigner distribution and the scalogram defined by $\mathrm{SCAL}_g f(x, a) := |W_g f(x, a)|^2$, where $W_g f$ denotes the continuous wavelet transform of $f$ with respect to $g$ defined precisely in (2.8). By comparing with (1.2) in the time-frequency setting, one would expect a simple convolution relation to hold. However, as the group underlying the symmetries in the time-scale case is the affine group, we obtain the following result.
Theorem. Let $f$ and $g$ be square integrable functions on the real line and assume their Fourier transforms $\phi := \hat{f}$ and $\psi := \hat{g}$ are supported in $\mathbb{R}_+$ and are in $L^2(\mathbb{R}_+, a^{-1}da)$. Then the scalogram of $f$ with window $g$ is given by the affine convolution where $\Delta$ and $I$ denote the modular function and the involution on the affine group, respectively.
The affine group Aff and constructions on it will be thoroughly described in the sections to come. We will use the affine group time and time again to shed light on various results we derive. In particular, it will be clear that the measure used in the marginal property (1.3) is the correct one.
We introduce an affine ambiguity function A ψ Aff for ψ ∈ L 2 (R + , a −1 da) given by The affine ambiguity function is intimately related to the radar ambiguity function in time-frequency analysis [15, Chapter 4.2]. We will show that the affine Wigner distribution and the affine ambiguity function are related through the Mellin transform by (1.4) The relation (1.4) is used in the proof of Proposition 6.5 to show that the affine Wigner distribution preserves Schwartz functions. It turns out that affine Wigner distributions are never analytic functions on the upper half-plane. This is a consequence of the non-existence of analytic functions in the space L 2 r (Aff). However, the space L 2 r (Aff) can be completely decomposed into "almost analytic" functions as the following result shows.
Proposition. We have the orthogonal decomposition A n (Aff) ⊕ A ⊥,n (Aff), (1.5) where A n (Aff) and A ⊥,n (Aff) denote the spaces of pure poly-analytic and pure anti-poly-analytic functions of order n, respectively. In particular, there are no analytic or anti-analytic functions of the form W ψ Aff for ψ ∈ L 2 (R + , a −1 da).
As an application of the theory developed we consider the approximation problem of understanding, for a given $f \in L^2_r(\mathrm{Aff})$, the quantity $\inf_{\psi \in L^2(\mathbb{R}_+)} \| f - W^{\psi}_{\mathrm{Aff}} \|_{L^2_r(\mathrm{Aff})}$ (1.6). Notice that (1.6) measures how far $f$ is from being an affine Wigner distribution. The analogous problem in time-frequency analysis has been recently studied in [2]. For each symbol $f \in L^2_r(\mathrm{Aff})$ there is a Hilbert-Schmidt operator $A_f$ on $L^2(\mathbb{R}_+, a^{-1}da)$ that is weakly defined by the relation , $\psi, \phi \in L^2(\mathbb{R}_+, a^{-1}da)$. (1.7) The following result shows that the quantity (1.6) is linked to how much $A_f$ deviates from being a rank-one operator.
Theorem. Let $f \in L^2_r(\mathrm{Aff})$ be real valued. Then (under a mild eigenvalue assumption on $A_f$) we have that inf where $\|\cdot\|_{HS}$ and $\|\cdot\|_{op}$ are the Hilbert-Schmidt norm and operator norm, respectively. Moreover, the precise number of distinct minimizers can be deduced from the spectrum of $A_f$.
The structure of the paper is as follows: In Section 2 we outline necessary definitions and briefly review the affine group as it will be central for many of the results we develop. In Section 3 we derive basic properties of the affine Wigner distribution. We devote Section 4 to uniting the literature and pointing out how the affine Wigner distribution can be derived by emphasizing symmetry. The convolution relation between the affine Wigner distribution and the scalogram will be proved in Section 5. In Section 6 we define the affine ambiguity function and show how this allows us to extend the affine quantization (1.7) to the distributional setting. We prove the decomposition (1.5) of L 2 r (Aff) in Section 7 and show how the Laguerre polynomials are a useful tool in our setting. In addition to the approximation problem described above, we show in Section 8 how basic questions regarding operators on R + can be answered with our framework. Finally, we discuss the affine Grossmann-Royer operator and the affine positivity conjecture in Section 9. The authors are grateful for helpful suggestions from Eirik Skrettingland and Luís Daniel Abreu.

Preliminaries
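The notation $\mathcal{S}(\mathbb{R}^d)$ will be used for the Schwartz space of rapidly decaying smooth functions on $\mathbb{R}^d$. Its dual space of tempered distributions is denoted by $\mathcal{S}'(\mathbb{R}^d)$. The Fourier transform of a function $f \in L^2(\mathbb{R}^d)$ will be denoted by $\mathcal{F}(f)(\omega) = \hat{f}(\omega) := \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i x \cdot \omega}\, dx$. We will frequently consider the space $L^2(\mathbb{R}_+) := L^2(\mathbb{R}_+, a^{-1}da)$ consisting of measurable functions $f : \mathbb{R}_+ \to \mathbb{C}$ such that $\|f\|^2_{L^2(\mathbb{R}_+)} := \int_0^{\infty} |f(a)|^2\, \frac{da}{a} < \infty$. This notation is consistent with our group theoretical approach as $a^{-1}da$ is the Haar measure on $\mathbb{R}_+$. If we want to consider the usual Euclidean measure on $\mathbb{R}_+$, we will explicitly write $L^2(\mathbb{R}_+, dx)$ to avoid any confusion. We use the notation $\mathcal{S}(\mathbb{R}_+)$ for the smooth functions $\psi : \mathbb{R}_+ \to \mathbb{C}$ such that $x \mapsto \psi(e^x)$ is in $\mathcal{S}(\mathbb{R})$. The reader is referred to Appendix 10.2 for notation and basic properties regarding Schatten class operators $\mathcal{S}_p(\mathcal{H})$ on a separable Hilbert space $\mathcal{H}$ for $1 \leq p < \infty$. In particular, the Hilbert-Schmidt operators on $\mathcal{H}$ will be denoted by $\mathcal{S}_2(\mathcal{H})$.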

The Classical Wigner Distribution and the Heisenberg Group
We begin by recalling basic definitions from time-frequency analysis and their connection with the Heisenberg group. The cross-Wigner transform W (f, g) of f, g ∈ L 2 (R d ) is defined to be Notice that the Wigner distribution W f given in (1.1) is precisely the diagonal term W (f, f ). The cross-Wigner transform satisfies the orthogonality property A key feature of the Wigner distribution is its connection with the Weyl calculus: For a symbol σ ∈ S ′ (R 2d ), the Weyl (pseudo-differential) operator L σ corresponding to the symbol σ is the operator (2. 2) The operators T −u and M ξ in (2.2) are respectively the time-shift operator and the frequency-shift operator defined by The association σ → L σ is called the Weyl transform and the operator L σ maps S(R d ) into S ′ (R d ) by [15,Lemma 14.3.1]. Moreover, the Weyl transform is a one-to-one correspondence between square integrable symbols σ ∈ L 2 (R 2d ) and Hilbert-Schmidt operators L σ ∈ S 2 L 2 (R d ) by a classical result of Poole [21,Proposition V.1]. The connection between the Weyl calculus and the cross-Wigner transform is the relation for σ ∈ L 2 (R 2d ) and f, g ∈ L 2 (R d ). Since the Weyl transform is a quantization procedure, one can think of the inverse transformation L σ → σ for L σ ∈ S 2 L 2 (R d ) as dequantization. In this terminology, the Wigner distribution W f for f ∈ L 2 (R d ) is the dequantization of the rank-one operator The reader should consult [16,Chapter 13] and [7,Chapter 4] for more details about the Weyl transform from a quantum mechanical perspective. Central to time-frequency analysis and its transforms is the representation theory of the Heisenberg group H 2d+1 . We can realize the Heisenberg group H 2d+1 as R d × R d × R where the multiplication between two elements is given by The Heisenberg group H 2d+1 acts on the space L 2 (R d ) by π(x, ω, t)(f ) := e 2πit e πixω T x M ω f.
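We refer to $\pi$ as the Schrödinger representation and it is both unitary and irreducible. A short computation reveals that the matrix coefficients of the Schrödinger representation are given by $\langle f, \pi(x, \omega, t) g \rangle = e^{-2\pi i t} e^{\pi i x \omega}\, V_g f(x, \omega)$, where $V_g f$ is the short-time Fourier transform (STFT) given by $V_g f(x, \omega) := \int_{\mathbb{R}^d} f(s)\, \overline{g(s - x)}\, e^{-2\pi i s \cdot \omega}\, ds$ (2.4). We have from [15, Lemma 4.3.1] that the cross-Wigner transform and the STFT are related by the formula $W(f, g)(x, \omega) = 2^d e^{4\pi i x \cdot \omega}\, V_{Pg} f(2x, 2\omega)$, where $P(g)(x) := g(-x)$. Thus the matrix coefficients of the Schrödinger representation are related to the Wigner distribution by the formula Moreover, the Stone-von Neumann theorem [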

Wavelet Transforms and the Affine Group
The two main operators in time-scale analysis are the time-shift operator $T_x$ and the dilation operator $D_a$ given by $T_x f(t) := f(t - x)$ and $D_a f(t) := \frac{1}{\sqrt{a}} f\left(\frac{t}{a}\right)$ for $x \in \mathbb{R}$, $a > 0$ and $f \in L^2(\mathbb{R})$. In time-scale (or wavelet) analysis, the affine group takes the role that the Heisenberg group has in time-frequency analysis. The affine group $\mathrm{Aff} := (\mathbb{R} \times \mathbb{R}_+, \cdot_{\mathrm{Aff}})$ has the group operation $(x, a) \cdot_{\mathrm{Aff}} (y, b) := (x + ay, ab)$ for $(x, a), (y, b) \in \mathrm{Aff}$, motivated by the composition rule $T_x D_a T_y D_b = T_{x + ay} D_{ab}$. We can represent the affine group Aff and its Lie algebra aff as Essential for computations is the fact that the exponential map exp : aff → Aff given by is a global diffeomorphism. The left Haar measure on Aff is given by $a^{-2}\, da\, dx$, while the right Haar measure is $a^{-1}\, da\, dx$. We will use the notation $L^2_r(\mathrm{Aff})$ and $L^2_l(\mathrm{Aff})$ to indicate if we are using the right or left Haar measure, respectively. The left and right Haar measures on Aff can be written in the coordinates induced by the exponential map as where the function $\lambda$ is given by $\lambda(u) := \frac{u e^{u}}{e^{u} - 1}$ (2.6).
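The group law and the two Haar measures stated above can be checked numerically. The following is a minimal sketch, not part of the original development: the helper names (`mul`, `inv`, `act`, `invariance_defect`) are hypothetical, and invariance is tested through the change-of-variables identity density(T(p)) · |det DT(p)| = density(p) for a translation T.

```python
def mul(g, h):
    """Group law (x, a) * (y, b) = (x + a*y, a*b) on Aff = R x R_+."""
    (x, a), (y, b) = g, h
    return (x + a * y, a * b)


def inv(g):
    """Inverse of (x, a) with respect to the group law above."""
    x, a = g
    return (-x / a, 1.0 / a)


def act(g, t):
    """Action of (x, a) on the real line: scale by a, then shift by x."""
    x, a = g
    return a * t + x


def jacobian_det(F, p, h=1e-6):
    """Central-difference Jacobian determinant of a map F: R^2 -> R^2 at p."""
    x, a = p
    fx_p, fx_m = F((x + h, a)), F((x - h, a))
    fa_p, fa_m = F((x, a + h)), F((x, a - h))
    d11 = (fx_p[0] - fx_m[0]) / (2 * h)
    d21 = (fx_p[1] - fx_m[1]) / (2 * h)
    d12 = (fa_p[0] - fa_m[0]) / (2 * h)
    d22 = (fa_p[1] - fa_m[1]) / (2 * h)
    return d11 * d22 - d12 * d21


def left_density(p):
    """Density of the left Haar measure a^{-2} da dx."""
    return p[1] ** -2


def right_density(p):
    """Density of the right Haar measure a^{-1} da dx."""
    return p[1] ** -1


def invariance_defect(density, translate, p):
    """Zero exactly when the measure with this density is invariant under translate at p."""
    return density(translate(p)) * abs(jacobian_det(translate, p)) - density(p)


if __name__ == "__main__":
    g, p = (0.7, 2.0), (0.3, 1.7)
    print("left Haar, left translation :", invariance_defect(left_density, lambda q: mul(g, q), p))
    print("right Haar, right translation:", invariance_defect(right_density, lambda q: mul(q, g), p))
    print("left Haar, right translation :", invariance_defect(left_density, lambda q: mul(q, g), p))
```

The first two defects vanish up to rounding, while the third does not, which reflects that Aff is not unimodular.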
A natural way the affine group can act on L 2 (R) is by translations and dilations, namely as This is a unitary representation, although it is not irreducible. The matrix coefficients of this representation are given by

A Quantization Approach to the Affine Wigner Distribution
We will briefly outline a procedure described in [12] to determine the affine Wigner distribution. The theory is based on Kirillov's theory of coadjoint orbits and we refer further explanations to the aforementioned paper.
The affine group Aff acts on its Lie algebra aff through the adjoint action A representation Φ of a Lie group G on a vector space V is always accompanied by a representation Φ * of G on the dual space V * defined by where the bracket denotes the natural pairing between V and V * . In the case of the adjoint action in (2.9) we denote the accompanying representation on aff * by Ad * and call it the coadjoint representation of the affine group. We can realize aff * as matrices of the form Any point of the form (x, 0) ∈ aff * is a fixed point for the coadjoint representation. The upper and lower half-planes both constitute distinct orbits. For reasons of symmetry it suffices to understand the representation corresponding to H + . It is convenient to identify H + ≃ Aff as sets and use the notation (x, a) for a general element in H + . From general coadjoint orbit theory [17, Chapter 1.2] it follows that Aff is equipped with a canonical symplectic structure. In fact, this symplectic structure is simply the right Haar measure a −1 da dx on Aff. The main idea of Kirillov's theory is to associate irreducible representations of the Lie group to orbits of the coadjoint representation in a one-to-one manner. A realization of the representation corresponding to H + is given by acting on ψ ∈ L 2 (R + ) by The representation U is (up to a normalization) the representation (2.7) on the Fourier side. Define the Stratonovich-Weyl operator on L 2 (R + ) by the formula where ψ ∈ L 2 (R + ), (x, a) ∈ Aff, and λ is the function defined in (2.6). The following result is given in [12, Corollary 4.3].
Proposition 2.1. There is an isometric isomorphism between L 2 r (Aff) and the space of Hilbert-Schmidt operators on L 2 (R + ). The isomorphism sends f ∈ L 2 r (Aff) to the operator A f on L 2 (R + ) defined by The association f → A f is called affine quantization, while the direction A f → f is referred to as affine dequantization. Moreover, we call f the (affine) symbol of A f . Recall that any Hilbert-Schmidt operator A on L 2 (R + ) has an associated integral kernel Motivated by (2.3), the affine Wigner distribution should be defined as the affine dequantization of a rank-one operator. Hence we have the following definition.
Definition 2.2. The affine cross-Wigner transform acts on functions ψ, φ ∈ L 2 (R + ) by for (x, a) ∈ Aff. We will refer to the diagonal W ψ Aff := W ψ,ψ Aff as the affine Wigner distribution of ψ.

Basic Properties
We begin by deriving basic properties of the affine Wigner distribution. Different approaches to the affine Wigner distribution will be discussed in Section 4. The affine Wigner distribution is fundamentally related to the transformation where λ is the function given in (2.6).
is an isometry. Moreover, the affine Wigner distribution W Aff can be factorized as where F 1 denotes the Fourier transform in the first component and ψ ⊗ φ(r, s) := ψ(r)φ(s) for r, s ∈ R + .
The factorization in Lemma 3.1 is key for understanding essential properties of the affine Wigner distribution. We illustrate its use by extending the orthogonality property of the classical Wigner distribution in (2.1) to the affine setting.

Proposition 3.2. The affine Wigner distribution satisfies the orthogonality relation
Proof. We use the factorization in Lemma 3.1 and obtain We will refer to (3.1) as the affine orthogonality relation motivated by the analogous result for the classical Wigner distribution in (2.1). Through a different (but ultimately equivalent) approach to the affine Wigner distribution taken in [3] and [19], the affine orthogonality relation is already known. The usefulness of the affine orthogonality relation can be readily demonstrated.
Aff then the affine orthogonality relation (3.1) shows that . This can only happen when ψ = c · φ with |c| = 1.
The marginal properties [15, Lemma 4.3.6] for the classical Wigner distribution strengthen a quantum mechanical interpretation of the Wigner distribution. For the affine Wigner distribution, we need an analogue of the Fourier transform on the group $\mathbb{R}_+$. This is the Mellin transform given by $\mathcal{M}(\psi)(x) := \int_0^{\infty} \psi(a)\, a^{-2\pi i x}\, \frac{da}{a}$ for $x \in \mathbb{R}$ and $\psi \in L^2(\mathbb{R}_+)$. There is little consensus regarding the exponent of $a$ in the literature and we recommend checking carefully which convention is used whenever the Mellin transform is encountered. The Mellin transform is related to the Fourier transform $\mathcal{F}$ by $\mathcal{M}(\psi)(x) = \mathcal{F}(\Psi)(x)$, where $\Psi(t) := \psi(e^t)$ for $t \in \mathbb{R}$. Hence the Mellin transform is a unitary map between the Hilbert spaces $L^2(\mathbb{R}_+)$ and $L^2(\mathbb{R})$. Additionally, the inverse of the Mellin transform is given by $\mathcal{M}^{-1}(f)(a) = \int_{-\infty}^{\infty} f(x)\, a^{2\pi i x}\, dx$ (3.2) for $a \in \mathbb{R}_+$ and $f \in L^2(\mathbb{R})$. Finally, the Mellin transform of a dilated function can be calculated to be $\mathcal{M}(\psi(r\, \cdot))(x) = r^{2\pi i x}\, \mathcal{M}(\psi)(x)$ for $r > 0$ (3.3). The following marginal properties have been stated in [22], where the proofs are deferred to the unpublished Ph.D. thesis of R. G. Shenoy. We provide a new proof of this remarkable fact to fill in the gaps left by the original sources.
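As a sanity check on the Mellin convention, the following sketch evaluates the transform numerically, assuming the normalization M(ψ)(x) = ∫ ψ(a) a^{-2πix} da/a (the one for which M(ψ) is the Fourier transform of ψ∘exp). The test function ψ(a) = exp(-π (log a)²) and all function names are illustrative choices, not taken from the paper; for this ψ the transform is exp(-π x²) in closed form.

```python
import cmath
import math


def mellin(psi, x, t_min=-8.0, t_max=8.0, n=1600):
    """Trapezoidal approximation of M(psi)(x) = int_0^inf psi(a) a^(-2 pi i x) da/a.

    The substitution a = e^t turns da/a into dt, so a uniform grid in t suffices.
    """
    h = (t_max - t_min) / n
    total = 0j
    for k in range(n + 1):
        t = t_min + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * psi(math.exp(t)) * cmath.exp(-2j * math.pi * x * t)
    return total * h


def haar_norm_sq(psi, t_min=-8.0, t_max=8.0, n=1600):
    """Trapezoidal approximation of int_0^inf |psi(a)|^2 da/a."""
    h = (t_max - t_min) / n
    total = 0.0
    for k in range(n + 1):
        w = 0.5 if k in (0, n) else 1.0
        total += w * abs(psi(math.exp(t_min + k * h))) ** 2
    return total * h


def psi(a):
    """Illustrative test function: a Gaussian in the variable log(a)."""
    return math.exp(-math.pi * math.log(a) ** 2)


if __name__ == "__main__":
    x, r = 0.3, 2.0
    print("numeric  M(psi)(x):", mellin(psi, x))
    print("closed form       :", math.exp(-math.pi * x * x))
    # Dilation rule: M(psi(r .))(x) = r^(2 pi i x) M(psi)(x)
    print("dilated           :", mellin(lambda a: psi(r * a), x))
    print("r^(2 pi i x) M    :", cmath.exp(2j * math.pi * x * math.log(r)) * mellin(psi, x))
```

The same grid also confirms the isometry in this example, since both ∫ |ψ(a)|² da/a and ∫ |M(ψ)(x)|² dx equal 1/√2 for this ψ.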

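Proposition 3.4. The affine Wigner distribution satisfies the marginal properties $\int_{-\infty}^{\infty} W^{\psi}_{\mathrm{Aff}}(x, a)\, dx = |\psi(a)|^2$ and $\int_0^{\infty} W^{\psi}_{\mathrm{Aff}}(x, a)\, \frac{da}{a} = |\mathcal{M}(\psi)(x)|^2$ for all $\psi \in \mathcal{S}(\mathbb{R}_+)$.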
Proof. To show the first marginal property, we again use the factorization in Lemma 3.1 and obtain In the last step, we used that λ(0) = 1 as a straightforward limit argument shows. The validity of the pointwise convergence in the Fourier inversion step is clear from the assumption on ψ.
For the second marginal property, we utilize a change of variables in the definition of the affine Wigner distribution to get the alternative form The isometry property of the Mellin transform can then be used to obtain By using the dilation relation (3.3) we can transform this expression to Finally, by using the inverse Mellin transform (3.2) we end up with Interchanging the order of integration and the pointwise convergence of the Mellin transform is easily justified under the assumption that ψ ∈ S(R + ).
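The first marginal property can be illustrated numerically. The sketch below rests on two assumptions that are not spelled out in the surviving text: that λ(u) = u e^u / (e^u - 1), and that the affine Wigner distribution is W(x, a) = ∫ ψ(aλ(u)) conj(ψ(aλ(-u))) e^{-2πixu} du. Both are consistent with the properties used in the proof (λ(0) = 1 and Fourier inversion in the first variable). The test function and grid sizes are illustrative.

```python
import cmath
import math


def lam(u):
    """Assumed form of lambda: u e^u / (e^u - 1), extended by lam(0) = 1."""
    if abs(u) < 1e-12:
        return 1.0
    return u / (-math.expm1(-u))


def psi(a):
    """Illustrative test function: a Gaussian in the variable log(a)."""
    return math.exp(-math.pi * math.log(a) ** 2)


def affine_wigner(psi, x, a, u_max=10.0, n=400):
    """Trapezoidal approximation of the assumed integral formula for W(x, a)."""
    h = 2 * u_max / n
    total = 0j
    for k in range(n + 1):
        u = -u_max + k * h
        w = 0.5 if k in (0, n) else 1.0
        integrand = psi(a * lam(u)) * complex(psi(a * lam(-u))).conjugate()
        total += w * integrand * cmath.exp(-2j * math.pi * x * u)
    return total * h


def x_marginal(psi, a, x_max=5.0, m=200):
    """Trapezoidal approximation of int W(x, a) dx for fixed a."""
    h = 2 * x_max / m
    total = 0j
    for j in range(m + 1):
        x = -x_max + j * h
        w = 0.5 if j in (0, m) else 1.0
        total += w * affine_wigner(psi, x, a)
    return total * h


if __name__ == "__main__":
    for a in (1.0, 1.5):
        print(a, x_marginal(psi, a), abs(psi(a)) ** 2)
```

Under these assumptions the integral over x reproduces |ψ(a)|², which is exactly the evaluation of the integrand at u = 0 predicted by Fourier inversion.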

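Remark. It follows from Proposition 3.4 that $\int_{-\infty}^{\infty} \int_0^{\infty} W^{\psi}_{\mathrm{Aff}}(x, a)\, \frac{da\, dx}{a} = \|\psi\|^2_{L^2(\mathbb{R}_+)}$ for all $\psi$ in the dense subspace $\mathcal{S}(\mathbb{R}_+) \subset L^2(\mathbb{R}_+)$. If $\|\psi\|_{L^2(\mathbb{R}_+)} = 1$ and $W^{\psi}_{\mathrm{Aff}}$ is everywhere non-negative, then the affine Wigner distribution would be a probability density function on the upper half-plane. We will elaborate on this in Section 9.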
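If $\psi \in \mathcal{S}(\mathbb{R}_+)$ has compact support and $a \in \mathbb{R}_+$ is outside the support of $\psi$, then Proposition 3.4 shows that $\int_{-\infty}^{\infty} W^{\psi}_{\mathrm{Aff}}(x, a)\, dx = |\psi(a)|^2 = 0$. This extreme case can be improved with the following finite support property.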
Proof. The functions ψ(aλ(u)) and ψ(aλ(−u)) are both non-zero if and only if If a > s then L ⊂ (0, 1). Hence it suffices to show that λ(u) and λ(−u) can not take values in (0, 1) simultaneously. This follows since λ(u) is an increasing function that only takes values in (0, 1) whenever u < 0. If a < r then L ⊂ (1, ∞). In this case, the result follows from the fact that λ(u) > 1 if and only if u > 0.
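The properties of λ used in this proof are easy to probe numerically. The sketch below assumes λ(u) = u e^u / (e^u - 1), which is an assumption on our part, although it matches every property quoted in the proof: λ(0) = 1, λ is increasing, and λ(u) < 1 exactly when u < 0. The helper `both_in_support` is hypothetical and simply tests whether aλ(u) and aλ(-u) can lie in a support interval [r, s] at the same time.

```python
import math


def lam(u):
    """Assumed form of lambda: u e^u / (e^u - 1), extended by lam(0) = 1."""
    if abs(u) < 1e-12:
        return 1.0
    return u / (-math.expm1(-u))


def both_in_support(a, r, s, u):
    """True if a*lam(u) and a*lam(-u) both lie in the interval [r, s]."""
    return r <= a * lam(u) <= s and r <= a * lam(-u) <= s


def overlap_exists(a, r, s, u_max=20.0, step=0.01):
    """Scan a grid of u values for a point where both factors are non-zero."""
    n = int(round(2 * u_max / step))
    return any(both_in_support(a, r, s, -u_max + k * step) for k in range(n + 1))


if __name__ == "__main__":
    r, s = 1.0, 2.0
    for a in (0.5, 1.5, 3.0):
        print(f"a = {a}: overlap found = {overlap_exists(a, r, s)}")
```

With support [1, 2], an overlap is found only for a inside the interval; for a = 0.5 and a = 3.0 the integrand of the affine Wigner distribution vanishes for every sampled u, in line with the argument above.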

Alternative Descriptions
Although the affine Wigner distribution was constructed rather recently, it has appeared in the literature several times in different disguises. In this section, we outline two instances of this and see how this enriches our understanding of the more subtle properties of the affine Wigner distribution. Consider a function ψ ∈ L 2 (R) ∩ L 2 (R + ) that is supported on R + and let f ∈ L 2 (R) be such that $\hat{f} = \psi$. The affine Wigner distribution W ψ Aff is related to the Bertrand P := (P 0 , 1) distribution described in [20] by the formula Notice that the convention for the Bertrand distribution is that the first entry is a positive number, while the second is real. The distribution P is called the Bertrand P 0 distribution and is in both the affine class and the hyperbolic class described in [20]. From this we can deduce several invariance properties of the affine Wigner distribution:
• The fact that P is in the affine class gives the invariance properties where M ω denotes the frequency-shift operator and D r denotes the dilation operator. These invariance properties can be summarized as where U is the action of the affine group on L 2 (R + ) given in (2.10).
• The fact that P is in the hyperbolic class gives the invariance property where H(c, f r ) is the transformation Pay attention to the fact that the positive reference frequency f r only appears on the left-hand side of (4.3).
The affine Wigner distribution W Aff can be derived in another way by emphasizing invariance properties as done in [3] and [19]. From this perspective, one starts with a general quadratic distribution and requires invariance under a group extension of the affine group. This will produce the distribution where µ(u) is a weight function that satisfies µ(u) = µ(−u). The requirement that W ψ satisfies the affine orthogonality relation forces µ ≡ 1 so that W ψ = W ψ Aff . Through this description, the affine orthogonality relation would not need to be proved as it is incorporated in the construction. However, this approach conceals the quantization picture. The affine Wigner distribution W Aff is a special case of a family of distributions that are called tomographic distributions in [3].
We would like to mention that there have been other attempts at defining a notion of affine Wigner distribution that do not coincide with our definition. As an example, we refer the reader to [14] and the recent successor paper [13] where an affine Wigner-like quasi-probability is defined through a semi-classical quantization approach. Although this is different from the approach in [12] that our work is based on, it has similarities in both motivation and properties.

Affine Convolution Representation of the Scalogram
Recall from the introduction that the classical Wigner distribution can represent the spectrogram through convolution where we used the orthogonality of the classical Wigner distribution given in (2.1). However, even though this might look like a convolution type representation of the scalogram, it does not incorporate one of the natural measures on the affine group. Hence we need to look elsewhere to obtain a proper convolution representation for the scalogram. Before stating the result, we recall some generalities from the theory of locally compact groups applied to the affine group: The affine convolution between two functions f, g on the affine group is given (whenever it is well-defined) by A departure from the usual Euclidean convolution is that the affine convolution is not commutative. The modular function $\Delta$ on any locally compact group measures the difference between the right and left Haar measure. We refer the reader to [11, Chapter 2.4] for a precise definition, as we only need that the modular function on the affine group is $\Delta(x, a) = \frac{1}{a}$. Finally, the (right) involution of a function f on the affine group is given by The following convolution formula should be compared with (5.1).
Theorem 5.1. Let $f$ and $g$ be square integrable functions on the real line and assume their Fourier transforms $\phi := \hat{f}$ and $\psi := \hat{g}$ are supported in $\mathbb{R}_+$ and are in $L^2(\mathbb{R}_+)$. Then for all $(x, a) \in \mathrm{Aff}$.
Proof. By using Parseval's identity and that the support of the Fourier transforms are in R + we obtain where U (x, a) is the action of the affine group given in (2.10). The affine orthogonality relation given in Proposition 3.2 and the invariance property given in (4.2) together show that We use the involution on the affine group to write Combining these observations shows that where ∆ is the modular function on the affine group.

The Affine Ambiguity Function and Distributional Extension of Affine Quantization
The cross-ambiguity function in time-frequency analysis of $f, g \in L^2(\mathbb{R})$ is defined to be $A(f, g)(x, \omega) := \int_{-\infty}^{\infty} f\left(t + \frac{x}{2}\right) \overline{g\left(t - \frac{x}{2}\right)}\, e^{-2\pi i t \omega}\, dt$. The ambiguity function $Af := A(f, f)$ of $f \in L^2(\mathbb{R})$ has been frequently used in radar applications [15, Chapter 4.2]. In the affine setting, we suggest that the affine cross-ambiguity function of $\psi, \phi \in L^2(\mathbb{R}_+)$ should be the function $A^{\psi,\phi}_{\mathrm{Aff}}$ on the upper half-plane defined by As before, we call the function $A^{\psi}_{\mathrm{Aff}} := A^{\psi,\psi}_{\mathrm{Aff}}$ the affine ambiguity function. In [22] the authors define a different notion of affine ambiguity function under the name wide-band ambiguity function. Notice that the definition of $A^{\psi,\phi}_{\mathrm{Aff}}$ incorporates the Haar measure on $\mathbb{R}_+$ in a natural way. Moreover, we will show that our definition possesses properties that justify the terminology affine ambiguity function. The first statement in the following lemma is a straightforward change of variables, while the last statement is a direct consequence of [15, Lemma 4.2.1].
Lemma 6.1. For $\psi, \phi \in L^2(\mathbb{R}_+)$ we define the functions $\Psi(x) := \psi(e^x)$ and $\Phi(x) := \phi(e^x)$ for $x \in \mathbb{R}$. Then $\Psi, \Phi \in L^2(\mathbb{R})$ and Moreover, the affine ambiguity function satisfies for every $(x, a) \neq (0, 1)$.
From Lemma 6.1 we obtain most of the expected results regarding the affine cross-ambiguity function. To illustrate this, we show two essential properties.
Proposition 6.2. The affine cross-ambiguity function satisfies the orthogonality relation
Proof. Let $\Psi_i(x) := \psi_i(e^x)$ and $\Phi_i(x) := \phi_i(e^x)$ for $i = 1, 2$ and $x \in \mathbb{R}$. We have by Lemma 6.1 that In [15, Lemma 4.3.4] it is shown that the ambiguity function is related to the usual cross-Wigner transform by $A(f, g) = \mathcal{U} \mathcal{F}\, W(f, g)$, where $\mathcal{F}$ is the Fourier transform and $\mathcal{U}$ is the rotation $\mathcal{U} F(x, \omega) := F(\omega, -x)$ for a function $F$ on $\mathbb{R}^2$. In particular, the ambiguity function satisfies the same orthogonality properties as the cross-Wigner transform (2.1). Hence we obtain that The second property is an uncertainty principle for the affine ambiguity function. Notice that if $U = U_1 \times U_2 \subset \mathrm{Aff}$ is a Borel set, then the right Haar measure $\mu_r(U)$ of $U$ is given by $\mu_r(U) = |U_1| \cdot |\log(U_2)|$, where $|\cdot|$ is the usual Euclidean measure and $\log(U_2) := \{\log(a) : a \in U_2\}$.
Corollary 6.3. Let $\psi \in L^2(\mathbb{R}_+)$ be normalized and let $U \subset \mathrm{Aff}$ be a Borel set. Assume that there is an $\epsilon > 0$ such that for all $p > 2$. In particular, setting $p = 4$ gives $\mu_r(U) \geq 2(1 - \epsilon)^2$ while letting $p$ tend to infinity shows that $\mu_r(U) \geq 1 - \epsilon$.
Proof. Notice that the assumption (6.2) is by Lemma 6.1 equivalent to where Ψ(x) := ψ(e x ). We can write AΨ(u, x) = e πiux V Ψ Ψ(u, x), where V is the STFT given in (2.4). The assumption We need to relate the affine ambiguity function to the affine Wigner distribution. Define the function , for y ∈ R and b > 0 with the convention that Θ(y, 1) = 1 for all y ∈ R. If we write b = e u for u = log(b), then where λ is the function given in (2.6). Hence we can think of Θ(y, b) as arising from a symmetrization of the function λ. We leave the verification of the following result to the reader as it is straightforward.
where (x, a) ∈ Aff and M is the Mellin transform.
We say that a smooth function f on the affine group Aff is rapidly decaying if The space of rapidly decaying smooth functions on Aff will be denoted by S(Aff). The following result illustrates how we can use the Mellin transform and the affine ambiguity function to deduce properties of the affine Wigner distribution.
Proposition 6.5. For ψ, φ ∈ S(R + ) the affine Wigner distribution satisfies W ψ,φ Aff ∈ S(Aff). Proof. If we let Ψ(x) := ψ(e x ) and Φ(x) := φ(e x ), then by Lemma 6.1 and Lemma 6.4 we want to show that . It follows from [15,Theorem 11.2.5] that the usual ambiguity function sends Schwartz functions on R to Schwartz functions on R 2 . Hence A(y, b) := A Ψ,Φ (log(b), y) ∈ S(Aff). Since Θ(y, b) is a smooth function with polynomially bounded derivatives, the same goes for the product Θ(y, b) · A(y, b). Recall that the Mellin transform is related to the Fourier transform by the formula M(ψ)(x) = F(Ψ)(x) for x ∈ R. Thus the result follows from the fact that the Fourier transform preserves Schwartz functions.
The dual space of S(Aff) will be denoted by S ′ (Aff) and called the tempered distributions on the affine group. The following is now a direct consequence of (2.11) and Proposition 6.5.
Corollary 6.6. The affine quantization f → A f extends to a well-defined map from f ∈ S ′ (Aff) to operators A f : S(R + ) → S ′ (R + ).
for f ∈ S(Aff) and (x, a) ∈ Aff. We compute for ψ, φ ∈ S(R + ) that Hence the operator A δ Aff (x,a) is weakly defined through the values of the affine Wigner distribution.

An Almost Analytic Decomposition
Recall that analytic and anti-analytic functions $f$ are characterized by the equations $\frac{\partial f}{\partial \bar{z}} = 0$ and $\frac{\partial f}{\partial z} = 0$, respectively. The fact that the affine Wigner distribution $W^{\psi,\phi}_{\mathrm{Aff}}$ is in the space $L^2_r(\mathrm{Aff})$ for $\psi, \phi \in L^2(\mathbb{R}_+)$ allows us to exclude (anti-)analytic functions from being in the image of the affine Wigner distribution.
Proposition 7.1. There are no analytic or anti-analytic functions in the space $L^2_r(\mathrm{Aff})$. In particular, functions of the form $f = W^{\psi,\phi}_{\mathrm{Aff}}$ for $\psi, \phi \in L^2(\mathbb{R}_+)$ can neither be analytic nor anti-analytic.
Proof. The conclusion is easier to obtain by looking at the isomorphic spaces in the unit disc $\mathbb{D}$ by applying the standard linear fractional transformation. Under this transformation, the analytic functions in $L^2_r(\mathrm{Aff})$ are transformed to the analytic functions $f$ in the unit disc satisfying the integrability condition $\int_{\mathbb{D}} \frac{|f(z)|^2}{1 - |z|^2}\, dz < \infty$ (7.1).
From Proposition 7.1 a few natural questions emerge: What kind of analytic-like functions are in the space $L^2_r(\mathrm{Aff})$? Is it possible to decompose the space $L^2_r(\mathrm{Aff})$ into pieces consisting of "almost analytic" and "almost anti-analytic" functions? By looking at the equivalent integrability condition in the disc, it is clear that the function $f(z) = 1 - |z|^2$ satisfies (7.1). Although it is neither analytic nor anti-analytic, it is almost both.
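The contrast between f(z) = 1 - |z|² and a genuinely analytic function can be seen in a short computation. The sketch below reads the integrability condition as ∫ |f(z)|² / (1 - |z|²) dA(z) < ∞ with dA the area measure on the disc, which is our interpretation of the flattened formula, and evaluates it for radial functions with a midpoint rule. The function names are illustrative.

```python
import math


def weighted_integral(f_abs_sq, R=1.0, n=20000):
    """Midpoint rule for int_{|z| < R} |f(z)|^2 / (1 - |z|^2) dA(z), radial |f|^2.

    In polar coordinates dA = r dr dtheta, so the angular part contributes 2*pi.
    """
    h = R / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += f_abs_sq(r) / (1.0 - r * r) * 2.0 * math.pi * r
    return total * h


def almost_analytic_sq(r):
    """|f|^2 for f(z) = 1 - |z|^2."""
    return (1.0 - r * r) ** 2


def constant_sq(r):
    """|f|^2 for the analytic function f(z) = 1."""
    return 1.0


if __name__ == "__main__":
    print("f(z) = 1 - |z|^2 on the full disc:", weighted_integral(almost_analytic_sq))
    for R in (0.9, 0.99, 0.999):
        print(f"f(z) = 1 on |z| < {R}:", weighted_integral(constant_sq, R))
```

The first integral converges to π/2, whereas the integral for the constant function equals -π log(1 - R²) on the disc of radius R and grows without bound as R approaches 1.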
Similarly, a function f : U → C will be called anti-poly-analytic of order n ∈ N if Notice that the function f (z) = 1 − |z| 2 is both poly-analytic and anti-poly-analytic of order two. The poly-analytic functions of order one are simply analytic functions, while the anti-poly-analytic functions of order one are the anti-analytic functions. We refer to an (anti-)poly-analytic function of order n as pure if it is not (anti-)poly-analytic of order n − 1 or lower. Poly-analytic functions and anti-poly-analytic functions do not inherit all the amazing properties that analytic functions are known for; the function f (z) = 1 − |z| 2 vanishes on the whole unit circle without being identically zero. The failure of the strong unique continuation principle for poly-analytic and anti-poly-analytic functions is what makes it possible for them to exist in L 2 r (Aff). We will show, inspired by a method in [23], that we can decompose the space L 2 r (Aff) into pieces consisting of poly-analytic and anti-poly-analytic functions. Before we can do this, we explore how generalized Laguerre polynomials give us a suitable orthonormal basis.
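For $\alpha > -1$ we have the orthogonality relation $\int_0^{\infty} x^{\alpha} e^{-x} L_n^{(\alpha)}(x) L_m^{(\alpha)}(x)\, dx = \frac{\Gamma(n + \alpha + 1)}{n!}\, \delta_{n,m}$ (7.2), where $L_n^{(\alpha)}$ denotes the generalized Laguerre polynomial of degree $n$ and $\Gamma$ denotes the Gamma function. Introduce the functions $\mathcal{L}_n^{(\alpha)}(x) := \sqrt{\frac{n!}{\Gamma(n + \alpha + 1)}}\, x^{\frac{\alpha + 1}{2}} e^{-\frac{x}{2}} L_n^{(\alpha)}(x)$ (7.3) for $\alpha > -1$. It is straightforward to check that the functions in (7.3) form an orthonormal basis for $L^2(\mathbb{R}_+)$ for each fixed $\alpha > -1$ by using (7.2). If $\alpha = 1$ we use the simplified notation $\mathcal{L}_n := \mathcal{L}_n^{(1)}$.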
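The orthonormality claim can be tested numerically. The sketch below assumes the normalization sqrt(n! / Γ(n + α + 1)) · x^{(α+1)/2} · e^{-x/2} · L_n^{(α)}(x) for the basis functions, which is the choice forced (up to a phase) by the classical Laguerre orthogonality relation and the Haar measure dx/x; the exact form in the original may differ in notation. Function names and grid parameters are illustrative.

```python
import math


def gen_laguerre(n, alpha, x):
    """Generalized Laguerre polynomial L_n^(alpha)(x) via the three-term recurrence."""
    prev, cur = 1.0, 1.0 + alpha - x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1 + alpha - x) * cur - (k + alpha) * prev) / (k + 1)
    return cur


def basis_fn(n, alpha, x):
    """Assumed normalized Laguerre function on R_+ (orthonormal w.r.t. dx/x)."""
    c = math.sqrt(math.factorial(n) / math.gamma(n + alpha + 1))
    return c * x ** ((alpha + 1) / 2) * math.exp(-x / 2) * gen_laguerre(n, alpha, x)


def inner(n, m, alpha, t_min=-30.0, t_max=6.0, steps=7200):
    """Inner product in L^2(R_+, dx/x), computed on a uniform grid in t = log(x)."""
    h = (t_max - t_min) / steps
    total = 0.0
    for k in range(steps + 1):
        x = math.exp(t_min + k * h)
        w = 0.5 if k in (0, steps) else 1.0
        total += w * basis_fn(n, alpha, x) * basis_fn(m, alpha, x)
    return total * h


if __name__ == "__main__":
    alpha = 1.0
    for n in range(3):
        print([round(inner(n, m, alpha), 10) for m in range(3)])
```

The printed Gram matrix is the identity up to quadrature error, and the same holds for other values of α > -1 provided the lower cutoff `t_min` is chosen so that x^{α+1} is negligible there.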
Lemma 7.4. If {ψ n } n∈N is an orthonormal basis for L 2 (R + ), then the functions {W ψn,ψm Aff } n,m∈N constitute an orthonormal basis for L 2 r (Aff). In particular, for a fixed α > −1, we can expand any f ∈ L 2 r (Aff) as Proof. The orthonormality of the functions W ψn,ψm Aff clearly follows from Proposition 3.2. To see the completeness in L 2 r (Aff) we assume that f ∈ L 2 r (Aff) satisfies f, W ψn,ψm for every n, m ∈ N. If we let A f be the Hilbert-Schmidt operator acting on L 2 (R + ) corresponding to f through the quantization procedure, then equation (2.11) implies that Since {ψ n } n∈N is an orthonormal basis for L 2 (R + ) we have that A f = 0. As the quantization correspondence between f and A f is a Hilbert space isomorphism, we conclude that f = 0.
Returning to the problem of decomposing L 2 r (Aff), we use the notation A n (Aff) and A ⊥,n (Aff) for all functions f ∈ L 2 r (Aff) that are poly-analytic and anti-poly-analytic of order n, respectively. Finally, we use the notation A n (Aff) ⊂ A n (Aff) and A ⊥,n (Aff) ⊂ A ⊥,n (Aff) for the subspaces of pure poly-analytic and pure anti-poly-analytic functions of order n, respectively.
Proposition 7.5. The space L 2 r (Aff) has the orthogonal decomposition Moreover, the spaces A n (Aff), A ⊥,n (Aff), A n (Aff), and A ⊥,n (Aff) for n ≥ 2 can be identified with the spaces We have delegated the proof of Proposition 7.5 to Appendix 10.1 as it is heavily inspired by a technique used in [23]. The poly-analytic functions have appeared prominently in the work of Abreu, see e.g. [1], in the context of wavelet analysis and sampling theory. However, a significant difference is that Abreu only considers poly-analytic functions and not the anti-poly-analytic functions.

Affine Wigner Approximation
Let us use the notation $\mathcal{W}(\mathrm{Aff}) := \{ W^{\psi}_{\mathrm{Aff}} : \psi \in L^2(\mathbb{R}_+) \}$ and call W(Aff) the affine Wigner space. The affine orthogonality relation (3.1) implies that W(Aff) is a closed subset of L 2 r (Aff). Although we can create orthonormal bases for L 2 r (Aff) by using the affine cross-Wigner transform as done in Lemma 7.4, the space W(Aff) is a proper subset of L 2 r (Aff). Despite the fact that an arbitrary function f ∈ L 2 r (Aff) is not in the affine Wigner space, it is natural to ask how far f is from being in W(Aff). Hence we are interested in the following affine Wigner approximation problem: Given a function f ∈ L 2 r (Aff), we want to understand the quantity $\inf_{g \in \mathcal{W}(\mathrm{Aff})} \| f - g \|_{L^2_r(\mathrm{Aff})}$ (8.1). At this point, it should be clear to the reader that the norm on L 2 r (Aff) is the most natural choice to consider. The analogous problem for the classical Wigner distribution has recently been investigated in [2]. Our approach will be different from the one taken in [2] as it will emphasize the quantization picture.
Let us begin by discussing what we might expect to obtain: Our goal is to understand (8.1) in terms of intrinsic properties of the function f . To be more precise, consider g = W ψ Aff for some ψ ∈ L 2 (R + ). Then (2.11) and the affine orthogonality relation show that $\langle A_g \varphi, \varphi \rangle = \langle W^{\psi}_{\mathrm{Aff}}, W^{\varphi}_{\mathrm{Aff}} \rangle_{L^2_r(\mathrm{Aff})} = |\langle \psi, \varphi \rangle|^2$ for φ ∈ L 2 (R + ). It follows that the Hilbert-Schmidt operator A g on L 2 (R + ) is the positive rank-one operator $A_g = \psi \otimes \psi$, that is, $A_g \varphi = \langle \varphi, \psi \rangle \psi$ (8.2). The converse follows as well, so there is a one-to-one correspondence between affine Wigner distributions and the positive rank-one operators given in (8.2). Hence the distance (8.1) should somehow be related to how far A f is from being a rank-one operator. We will show in Corollary 8.2 that, for a large class of functions f ∈ L 2 r (Aff), this heuristic is correct. We use the notation $\lambda^{+}_{\max}(A_f) := \max\big( \{0\} \cup \mathrm{Spec}(A_f) \big)$. If A f is a positive operator, then λ + max (A f ) coincides with the spectral radius of A f .
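Theorem 8.1. The affine Wigner approximation problem for a real-valued function f ∈ L 2 r (Aff) has the explicit solution $\inf_{g \in \mathcal{W}(\mathrm{Aff})} \| f - g \|_{L^2_r(\mathrm{Aff})} = \sqrt{ \| f \|_{L^2_r(\mathrm{Aff})}^2 - \lambda^{+}_{\max}(A_f)^2 }$, and a minimizer always exists. Moreover, when λ + max (A f ) > 0 the number of minimizers is equal to the multiplicity of λ + max (A f ). If λ + max (A f ) = 0, then the zero function is the unique minimizer.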
Proof. From Proposition 2.1 and the discussion above, we have that $\inf_{g \in \mathcal{W}(\mathrm{Aff})} \| f - g \|_{L^2_r(\mathrm{Aff})} = \inf_{\psi \in L^2(\mathbb{R}_+)} \| A_f - \psi \otimes \psi \|_{HS}$. Since A f is a Hilbert-Schmidt operator it is in particular a compact operator. Moreover, A f is self-adjoint since f is real-valued, so that $\langle A_f \psi, \varphi \rangle = \overline{\langle A_f \varphi, \psi \rangle}$ for ψ, φ ∈ L 2 (R + ). Thus the spectral theory for compact, self-adjoint operators implies that the spectrum Spec(A f ) = {λ k } ∞ k=0 of A f is countable with 0 ∈ Spec(A f ) as the only possible accumulation point. Moreover, there is by [11,Theorem 1.52] an orthonormal basis {φ k } ∞ k=0 for L 2 (R + ) such that φ k is an eigenvector of A f corresponding to the eigenvalue λ k . We use the convention that eigenvalues with multiplicity higher than one are repeated according to their multiplicity.
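We claim that we can write $A_f = \sum_{k=0}^{\infty} \lambda_k \varphi_k \otimes \varphi_k$, where the convergence is in the Hilbert-Schmidt norm. Notice that convergence of $\sum_{k=0}^{\infty} \lambda_k \varphi_k \otimes \varphi_k$ to A f is guaranteed in the operator norm from the theory of compact operators [4,Theorem 3.5]. Hence it suffices to show that $\sum_{k=0}^{\infty} \lambda_k \varphi_k \otimes \varphi_k$ converges in the Hilbert-Schmidt norm; together with the norm inequality $\| \cdot \|_{op} \leq \| \cdot \|_{HS}$, this implies that $\sum_{k=0}^{\infty} \lambda_k \varphi_k \otimes \varphi_k$ must converge to A f in the Hilbert-Schmidt norm. Since S 2 (L 2 (R + )) is complete, it suffices to show that the partial sums of $\sum_{k=0}^{\infty} \lambda_k \varphi_k \otimes \varphi_k$ form a Cauchy sequence. For n, m ∈ N with n < m we have $\big\| \sum_{k=n+1}^{m} \lambda_k \varphi_k \otimes \varphi_k \big\|_{HS}^2 = \sum_{k=n+1}^{m} \lambda_k^2$. The claim follows from the fact that A f is a Hilbert-Schmidt operator, so that the series $\sum_{k=0}^{\infty} \lambda_k^2$ converges.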
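Returning to the problem, we can now write $\inf_{g \in \mathcal{W}(\mathrm{Aff})} \| f - g \|_{L^2_r(\mathrm{Aff})} = \inf_{\psi \in L^2(\mathbb{R}_+)} \big\| \sum_{k=0}^{\infty} \lambda_k \varphi_k \otimes \varphi_k - \psi \otimes \psi \big\|_{HS}$ (8.4). Assume that λ j = λ + max (A f ). Then (8.4) is minimized when $\psi = \sqrt{\lambda_j} \, \varphi_j$. By orthogonality, we can rewrite (8.4) and obtain $\inf_{g \in \mathcal{W}(\mathrm{Aff})} \| f - g \|_{L^2_r(\mathrm{Aff})}^2 = \| A_f \|_{HS}^2 - \lambda^{+}_{\max}(A_f)^2 = \| f \|_{L^2_r(\mathrm{Aff})}^2 - \lambda^{+}_{\max}(A_f)^2$. We always have a minimizer as we can take h = W ψ Aff . The statement about uniqueness of minimizers is clear from (8.4).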
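The mechanism of the proof can be illustrated in finite dimensions. The sketch below is only an analogy and does not involve the affine quantization: it assumes that a real symmetric 3 × 3 matrix A plays the role of A f , that the Frobenius norm plays the role of the Hilbert-Schmidt norm, and that positive rank-one matrices ψψ^T play the role of the operators in (8.2). The chosen eigenvalues, eigenvectors and helper names are hypothetical.

```python
import math
import random


def outer(u, v):
    """Rank-one matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def frobenius_distance(a, b):
    """Frobenius (Hilbert-Schmidt) distance between two square matrices."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def from_spectrum(eigenvalues, eigenvectors):
    """Build A = sum_k lambda_k phi_k phi_k^T from an orthonormal eigenbasis."""
    n = len(eigenvectors[0])
    a = [[0.0] * n for _ in range(n)]
    for lam, phi in zip(eigenvalues, eigenvectors):
        for i in range(n):
            for j in range(n):
                a[i][j] += lam * phi[i] * phi[j]
    return a


s = 1 / math.sqrt(2)
eigenvectors = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
eigenvalues = [2.0, -3.0, 0.5]
A = from_spectrum(eigenvalues, eigenvectors)

# Largest non-negative eigenvalue and the candidate minimizer psi = sqrt(lambda_j) phi_j.
lam_plus = max(0.0, max(eigenvalues))
if lam_plus > 0:
    phi_j = eigenvectors[eigenvalues.index(lam_plus)]
    psi = [math.sqrt(lam_plus) * c for c in phi_j]
else:
    psi = [0.0] * len(A)

predicted = math.sqrt(sum(lam ** 2 for lam in eigenvalues) - lam_plus ** 2)
attained = frobenius_distance(A, outer(psi, psi))

# Crude random search: no sampled rank-one matrix should beat the predicted distance.
random.seed(0)
best_random = min(
    frobenius_distance(A, outer(v, v))
    for v in ([random.uniform(-2, 2) for _ in range(3)] for _ in range(2000))
)

print(predicted, attained, best_random)
```

The first two printed values should agree, and the third should not be smaller than them. Negative eigenvalues contribute fully to the distance because a positive rank-one matrix cannot cancel them, which is why only the largest non-negative eigenvalue enters.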

Remarks.
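• From the spectral theory of compact, self-adjoint operators, it also follows that the eigenspaces corresponding to non-zero eigenvalues are finite-dimensional. Hence, for a given f ∈ L 2 r (Aff), there is at most a finite number of minimizers h 1 , . . . , h k ∈ L 2 r (Aff) so that $\inf_{g \in \mathcal{W}(\mathrm{Aff})} \| f - g \|_{L^2_r(\mathrm{Aff})} = \| f - h_i \|_{L^2_r(\mathrm{Aff})}$ for i = 1, . . . , k.
Corollary 8.2. Let f ∈ L 2 r (Aff) be real-valued and assume that A f is a positive operator. Then $\min_{g \in \mathcal{W}(\mathrm{Aff})} \| f - g \|_{L^2_r(\mathrm{Aff})} = \sqrt{ \| A_f \|_{HS}^2 - \| A_f \|_{op}^2 }$ (8.5).
Proof. Since A f is self-adjoint it follows from [11,Proposition 1.24] that $\| A_f \|_{op} = \sup_{\| \psi \| = 1} | \langle A_f \psi, \psi \rangle | = \lambda^{+}_{\max}(A_f)$. Hence the result follows from Theorem 8.1.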

Remark.
Notice that under the assumptions in Corollary 8.2, the heuristic we presented regarding rank-one operators holds true: If A f is a rank-one operator, then the Hilbert-Schmidt norm and the operator norm coincide. Hence (8.5) is zero and thus f is in the affine Wigner space W(Aff). Conversely, the equations
Proposition 8.4. There are no non-zero dilation invariant Hilbert-Schmidt operators on L 2 (R + ).
Proof. Assume by contradiction that A ∈ S 2 (L 2 (R + )) is dilation invariant. The quantization correspondence implies that A is of the form A = A f for some f ∈ L 2 r (Aff). It follows from (4.1) that W for r > 0 and (x, a) ∈ Aff. Hence (2.11) implies that On the other hand, since A f is dilation invariant we also have By Lemma 7.4, this forces f ∈ L 2 r (Aff) to satisfy the homogeneity relation for all r ∈ R + and almost every (x, a) ∈ Aff. However, this implies that Hence f is not in L 2 r (Aff) unless f = 0, in which case A f is the zero operator.

Remarks.
• Notice that the proof of Proposition 8.4 actually shows that there can be no non-zero Hilbert-Schmidt operator A that satisfies (8.7) even for a single r ≠ 1. In particular, there are no non-zero Hilbert-Schmidt operators on L 2 (R + ) that are discretely dilation invariant in the sense that
• Consider the space M (0,∞) of all ψ ∈ L 2 (R + ) such that the Mellin transform of ψ satisfies supp(M(ψ)) ⊂ R + . Then the orthogonal projection P : L 2 (R + ) → M (0,∞) is dilation invariant due to the dilation property of the Mellin transform given in (3.3). Hence there are non-trivial dilation invariant operators on L 2 (R + ).
We end by giving an application to the trace class operators on L 2 (R + ). The following result is motivated by [7,Proposition 162].
Corollary 8.5. Let T ∈ S 1 (L 2 (R + )) be a trace class operator. Then we can write T = A f • A g for f, g ∈ L 2 r (Aff). Moreover, the trace of T can be calculated by the formula
Proof. As mentioned in Appendix 10.2, any trace class operator T on L 2 (R + ) can be written as a composition of Hilbert-Schmidt operators T = A • B with A, B ∈ S 2 (L 2 (R + )). We can now use the bijective correspondence between Hilbert-Schmidt operators on L 2 (R + ) and L 2 r (Aff) to write A = A f and B = A g for f, g ∈ L 2 r (Aff). Thus by (10.7) we can write
Remark. Notice that This can also be deduced from (10.6). In particular, the trace of T is real-valued whenever f and g are real-valued.
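The idea behind Corollary 8.5 can be illustrated with matrices. The sketch below is a finite-dimensional stand-in and not the affine quantization itself: it assumes the Hilbert-Schmidt inner product ⟨A, B⟩ = Tr(B* A) and checks the elementary identities Tr(AB) = ⟨A, B*⟩ and Tr(AB) = Tr(BA) on hypothetical 2 × 2 complex matrices. It also checks that the trace of a product of two self-adjoint matrices is real, mirroring the last statement of the remark under the assumption, used in the proof of Theorem 8.1, that real-valued symbols correspond to self-adjoint operators.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def adjoint(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def hs_inner(a, b):
    """Hilbert-Schmidt inner product <A, B> = Tr(B^* A) = sum_ij A_ij conj(B_ij)."""
    return sum(x * y.conjugate() for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical matrices standing in for the Hilbert-Schmidt operators A_f and A_g.
A = [[1 + 2j, 0.5], [-1j, 3.0]]
B = [[2.0, 1 - 1j], [4j, -0.5]]

trace_ab = trace(matmul(A, B))
trace_ba = trace(matmul(B, A))
pairing = hs_inner(A, adjoint(B))

# Self-adjoint (Hermitian) matrices: the trace of the product is real.
H1 = [[1.0, 2 - 1j], [2 + 1j, -3.0]]
H2 = [[0.5, 1j], [-1j, 2.0]]
trace_hermitian = trace(matmul(H1, H2))

print(trace_ab, trace_ba, pairing, trace_hermitian)
```

The first three printed values should coincide, and the last should have zero imaginary part. In the setting of the corollary, the Hilbert-Schmidt pairing on the operator side is carried over to L 2 r (Aff) by the quantization correspondence, which is what makes a formula for the trace in terms of the symbols f and g possible.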

Further Research
Affine Grossmann-Royer Operator
A standard tool for deriving properties of the classical Wigner distribution is the Grossmann-Royer operator R(x, ω) defined by the relation for f, g ∈ L 2 (R d ) and (x, ω) ∈ R 2d . The precise formula for R(x, ω) can be found in [7, Chapter 1] with a different normalization. An essential property of the Grossmann-Royer operator R(x, ω) is that for all f ∈ L 2 (R d ) and (x, ω) ∈ R 2d . This is immensely useful; to see that the classical cross-Wigner transform is bounded one simply needs to apply the Cauchy-Schwarz inequality to obtain
Analogously, we define the affine Grossmann-Royer operator R Aff (x, a) by the relation for ψ, φ ∈ S(R + ) and (x, a) ∈ Aff. We restrict our attention to Schwartz functions for convenience since then W ψ,φ Aff ∈ S(Aff) and hence has well-defined point values. Notice that the affine Grossmann-Royer operator R Aff (x, a) is precisely the affine quantization of the point mass δ Aff (x, a) given in Example 6.7.
Trying to generalize the strategy in (9.1) runs into a problem: The affine Grossmann-Royer operator is not a bounded operator on S(R + ) ⊂ L 2 (R + ) with respect to the norm $\| \cdot \|_{L^2(\mathbb{R}_+)}$. However, if ψ ∈ S(R + ) is supported in the interval [1/k, k] for some k > 0, then there is a constant We call the optimal constant C k in the inequality above the k-support constant. Hence if φ ∈ S(R + ) we have sup A trivial adaptation of [15,Lemma 4.3.7] gives the following relative uncertainty principle for the affine Wigner distribution.
Proposition 9.2. Let ψ ∈ S(R + ) be supported in the interval [1/k, k] for some k > 0 and let U ⊂ Aff be a Borel set. Assume there is an ε ≥ 0 such that Then the right Haar measure of U satisfies where C k is the k-support constant.
Motivated by Proposition 9.2, it is of interest to investigate the k-support constant C k both numerically and asymptotically.

Affine Positivity Conjecture
One of the major results about the classical Wigner distribution concerns positivity: when is W f a non-negative function on R 2d ? Normalized functions f ∈ L 2 (R d ) such that W f is non-negative would generate probability density functions on R 2d that represent the time-frequency distribution of f . However, a well-known result of Hudson [15,Theorem 4.4.1] shows that this can only happen for suitably perturbed Gaussians.
Turning to the affine setting, we would like to determine the normalized functions ψ ∈ L 2 (R + ) such that W ψ Aff is a non-negative function on the affine group. In [19] the authors showed that the affine Wigner distribution W ψs Aff is non-negative if ψ s is the so-called Morse ground state $\psi_s(r) := \frac{r^{s} e^{-r/2}}{\sqrt{\Gamma(2s)}}$, s ≥ 0.
In our case, we will only consider ψ s for s > 0 as ψ 0 ∉ L 2 (R + ). More generally, one can use the invariance properties (4.2) and (4.3) to show that the affine Wigner distribution W ψ Aff of $\psi(r) = C r^{-i(x + ia)} e^{i(y + ib) r}$ (9.2) is non-negative when C ∈ C and (x, a), (y, b) ∈ Aff. The functions of the form (9.2) are the generalized Klauder wavelets in [9,Equation (41)] that are in L 2 (R + ). This leads to the following affine positivity conjecture: The only functions ψ ∈ L 2 (R + ) such that W ψ Aff is non-negative on the affine group are the generalized Klauder wavelets in (9.2).
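The normalization of the Morse ground state, and the reason the case s = 0 is excluded, can be checked numerically. The sketch below rests on two assumptions that are not stated in this section: that L 2 (R + ) carries the measure r −1 dr, as in the introduction, and that the Morse ground state is ψ s (r) = r^s e^(−r/2) / sqrt(Γ(2s)). After the substitution r = e^t the squared norm becomes an integral over the real line, which is approximated by the trapezoid rule; the cutoffs and the step size are illustrative choices.

```python
import math


def log_scale_integral(s, lower=-40.0, upper=6.0, h=0.01):
    """Trapezoid approximation of the integral of r^(2s) e^(-r) dr / r over R_+,
    computed after the substitution r = e^t on the window lower <= t <= upper."""
    steps = round((upper - lower) / h)

    def g(t):
        return math.exp(2 * s * t - math.exp(t))

    total = 0.5 * (g(lower) + g(upper))
    for i in range(1, steps):
        total += g(lower + i * h)
    return total * h


def morse_norm_squared(s):
    """Squared norm of the assumed Morse ground state psi_s for s > 0."""
    return log_scale_integral(s) / math.gamma(2 * s)


for s in (0.5, 1.0, 2.5):
    print(s, morse_norm_squared(s))

# For s = 0 the unnormalized integral keeps growing as the lower cutoff decreases,
# reflecting that e^(-r/2) is not square integrable with respect to dr / r.
growth = log_scale_integral(0.0, lower=-80.0) - log_scale_integral(0.0, lower=-40.0)
print(growth)
```

Each printed squared norm should be close to one, while the last value should be close to 40, the length of the added integration window, indicating logarithmic divergence at the origin when s = 0.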
Hence the decomposition formulas for A n (Aff) and A ⊥,n (Aff) given in (7.5) imply the desired orthogonal decomposition of L 2 r (Aff) because the functions L n form an orthonormal basis for L 2 (R + , a −1 da). Moreover, the decompositions of the pure spaces A n (Aff) and A ⊥,n (Aff) also follow from the decompositions of A n (Aff) and A ⊥,n (Aff). We will focus on showing the decomposition since the decomposition of A ⊥,n (Aff) is similar. The idea of the proof is to define several isometries of the space L 2 r (Aff) so that the equation ∂ z f = 0 is transformed into something more manageable. Define the two isometries F ⊗ I : L 2 (R, dx) ⊗ L 2 (R + , da) −→ L 2 (R, dx) ⊗ L 2 (R + , da), where F denotes the Fourier transform and V is given by .
Thus we obtain the decomposition (10.1) and the rest of the result follows from the remarks made in the beginning of the proof.

Schatten Class Operators
As we need Hilbert-Schmidt operators throughout the paper and trace class operators in Corollary 8.5, we review basic facts about Schatten class operators S p (H) on an arbitrary separable Hilbert space H. In this framework, the trace class operators are simply S 1 (H), while the Hilbert-Schmidt operators are S 2 (H). A thorough reference for Schatten class operators is [4,Chapter 3]. We are primarily interested in the following two cases:
• For p = 1 we call S 1 (H) the trace class operators. The norm on S 1 (H) is equivalently given by $\| T \|_{S_1} = \sum_{n=0}^{\infty} \langle |T| \psi_n, \psi_n \rangle$ (10.5), where {ψ n } ∞ n=0 is an orthonormal basis for H. As expected, the expression (10.5) does not depend on the choice of orthonormal basis. Hence we can define the trace of any trace class operator T as the sum of the absolutely convergent series Tr(T) =