Relative entropy, Haar measures and relativistic canonical velocity distributions

The thermodynamic maximum principle for the Boltzmann-Gibbs-Shannon (BGS) entropy is reconsidered by combining elements from group and measure theory. Our analysis starts by noting that the BGS entropy is a special case of relative entropy. The latter characterizes probability distributions with respect to a pre-specified reference measure. To identify the canonical BGS entropy with a relative entropy is appealing for two reasons: (i) the maximum entropy principle assumes a coordinate invariant form; (ii) thermodynamic equilibrium distributions, which are obtained as solutions of the maximum entropy problem, may be characterized in terms of the transformation properties of the underlying reference measure (e.g., invariance under group transformations). As examples, we analyze two frequently considered candidates for the one-particle equilibrium velocity distribution of an ideal gas of relativistic particles. It becomes evident that the standard Jüttner distribution is related to the (additive) translation group on momentum space. Alternatively, imposing Lorentz invariance of the reference measure leads to a so-called modified Jüttner function, which differs from the standard Jüttner distribution by a prefactor, proportional to the inverse particle energy.


Introduction
The combination of variational principles and group symmetries has proven extremely fruitful in various fields of theoretical physics over the past century, with applications ranging from classical mechanics [1,2] to quantum field theory [3,4,5]. In this paper, we discuss how group and measure theoretical concepts may be incorporated into the maximum entropy principle (MEP) of canonical equilibrium thermostatistics [6]. To this end, we follow up on an idea by Ochs [7,8], who demonstrated that the canonical Boltzmann-Gibbs-Shannon (BGS) entropy is a special case of relative entropy (Sec. 2). The relative entropy [9] characterizes a probability distribution with respect to a pre-specified reference measure and allows a manifestly coordinate invariant formulation of the MEP. In particular, we will focus on how the choice of the reference measure affects the solution of the entropy maximization problem (i.e., the equilibrium distribution). It will thereby become clear that an acceptable MEP must include a postulate that determines which specific reference measure has to be used for a given class of physical systems. To obtain a mathematically meaningful characterization of potential reference measures, one can study their symmetry properties by means of their transformation behavior under group actions. The idea of combining measure and group theory goes back to the Hungarian mathematician Alfred Haar [10]. In the second part of the paper, this approach will be pursued in order to analyze the MEPs for two of the most frequently discussed candidates for the relativistic one-particle equilibrium velocity distribution (Sec. 3).

Thermodynamic entropy, relative entropy and Haar measures
We start out by summarizing the standard formulation of the canonical MEP in Sec. 2.1. We shall focus on the simplest paradigm, corresponding to a spatially homogeneous ideal gas of classical particles, as this is sufficient for illustrating the main ideas. The concept of relative entropy is reviewed in Sec. 2.2. The choice of the reference measure and its characterization in terms of symmetry groups is discussed in Sec. 2.3.

Standard formulation of the maximum entropy principle
The canonical one-particle equilibrium velocity distribution for a non-relativistic gas of weakly interacting particles (e.g., atoms or molecules) is the Maxwell distribution, corresponding to the normalized probability density function (PDF)

$$ f_M(v) = \left(\frac{\beta m}{2\pi}\right)^{d/2} \exp\left(-\frac{\beta m v^2}{2}\right), \qquad v \in V^d := \mathbb{R}^d. \tag{1} $$

Here, $\beta = 1/T$ is the inverse temperature, $m$ the mass of the particle, and $V^d$ denotes the space of the $d$-dimensional Cartesian velocity coordinates (throughout, we use units such that the speed of light $c = 1$, and Boltzmann constant $k_B = 1$). In principle, one can find several different arguments to justify Eq. (1) [11]; e.g., it can be shown [12] that the marginal one-particle PDF of an isolated, weakly interacting $N$-particle gas converges to $f_M(v)$ in the thermodynamic limit. An alternative derivation, on which we focus in the remainder of this paper, is based on the canonical maximum entropy principle (MEP). The MEP approach starts from postulating a canonical Boltzmann-Gibbs-Shannon entropy functional of the form

$$ S_B[f] = -\int_{V^d} d^dv\, f(v) \ln f(v), \tag{2a} $$

which is to be maximized subject to the constraints

$$ 1 = \int_{V^d} d^dv\, f(v), \qquad \epsilon = \int_{V^d} d^dv\, f(v)\, E(v). \tag{2b} $$

Here, $E(v) = m v^2/2$ is the non-relativistic kinetic energy of a single particle (measured in the lab frame), and $\epsilon$ the mean energy per particle, which is assumed to be known. By means of two Lagrangian multipliers $(\alpha, \beta)$, the MEP results in the condition

$$ 0 \equiv \frac{\delta}{\delta f}\left\{ S_B[f] - \alpha \int_{V^d} d^dv\, f - \beta \int_{V^d} d^dv\, f E \right\}_{f = f^*} = -(1 + \alpha) - \ln f^*(v) - \beta E(v). \tag{3} $$

Solving this equation for $f^*$ and determining $(\alpha, \beta)$ from the constraints (2b), one recovers the Maxwellian (1) with parameter $\beta = d/(2\epsilon)$. Hence, the MEP based on Eq. (2a) appears to be satisfactory at first sight, but a more careful analysis reveals the following drawback: In order to give the empirically established result (1), the BGS entropy (2a) must be written in terms of the 'correct' physical variable, and one has to use the 'correct' coordinate representation (in the above case, $v$, or a linear transformation thereof such as the momentum $p = mv$, expressed in Cartesian coordinates). Otherwise, one does not obtain the correct one-particle equilibrium distribution (1).
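To briefly illustrate this, consider the physically most relevant three-dimensional case ($d = 3$) and suppose that, instead of Cartesian coordinates $(v_1, v_2, v_3)$, we had started from polar coordinates $(v, \phi, \theta) \in [0, \infty) \times [0, 2\pi) \times [0, \pi] =: P^3$, i.e., by naively writing

$$ S_B[\bar f] = -\int_{P^3} dv\, d\phi\, d\theta\; \bar f(v, \phi, \theta) \ln \bar f(v, \phi, \theta), \tag{4a} $$

where $\bar f(v, \phi, \theta)$ is subject to the constraints

$$ 1 = \int_{P^3} dv\, d\phi\, d\theta\; \bar f(v, \phi, \theta), \qquad \epsilon = \int_{P^3} dv\, d\phi\, d\theta\; \bar f(v, \phi, \theta)\, \bar E(v, \phi, \theta), \tag{4b} $$

and $\bar E(v, \phi, \theta) = m v^2/2$ is the energy expressed in polar coordinates. Maximizing $S_B[\bar f]$ under the constraints (4b) yields

$$ \bar f^*(v, \phi, \theta) = \left(\frac{\beta m}{2\pi^5}\right)^{1/2} \exp\left(-\frac{\beta m v^2}{2}\right), \qquad \beta = \frac{1}{2\epsilon}. \tag{5} $$

For comparison, by transforming the Maxwell PDF (1) to polar coordinates we find

$$ \bar f_M(v, \phi, \theta) = |\bar J|\, f_M(v(v, \phi, \theta)) = \left(\frac{\beta m}{2\pi}\right)^{3/2} \exp\left(-\frac{\beta m v^2}{2}\right) v^2 \sin\theta, \tag{6} $$

where $\bar J = v^2 \sin\theta$ is the Jacobian determinant of the coordinate transformation $(v, \phi, \theta) \mapsto (v_1, v_2, v_3)$. Upon comparing Eqs. (5) and (6), we observe that $\bar f^* \neq \bar f_M$, due to the missing Jacobian prefactor in Eq. (5). This simple example illustrates that the above entropy definition is implicitly coordinate dependent, which is unsatisfactory: if the MEP is viewed as fundamental, it should be formulated in a way that works independently of the underlying coordinate representation. As we shall discuss next, this can be achieved by recognizing that the thermodynamic entropy (2a) is a special case of the so-called relative entropy [7,8,9].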
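The mismatch between Eqs. (5) and (6) can be made concrete numerically. The following is a minimal sketch, assuming units with $\beta = m = 1$ and a plain midpoint rule on a truncated velocity range; the function names are illustrative and not part of the original text. Both densities are normalized on $P^3$, yet at the same $\beta$ they imply different mean energies, $1/(2\beta)$ for the naive solution and $3/(2\beta)$ for the transformed Maxwellian.

```python
import math

def naive_polar_mep(v, theta, beta, m):
    """Eq. (5): maximizer of the naive BGS entropy in (v, phi, theta); no Jacobian."""
    return math.sqrt(beta * m / (2.0 * math.pi**5)) * math.exp(-0.5 * beta * m * v * v)

def maxwell_in_polar(v, theta, beta, m):
    """Eq. (6): d = 3 Maxwell PDF times the Jacobian v^2 sin(theta)."""
    prefactor = (beta * m / (2.0 * math.pi)) ** 1.5
    return prefactor * math.exp(-0.5 * beta * m * v * v) * v * v * math.sin(theta)

def polar_average(pdf, observable, beta, m, v_max=12.0, n_v=2000, n_theta=100):
    """Midpoint-rule average of observable(v) over P^3.

    Both PDFs are independent of phi, so the phi integral gives a factor 2*pi.
    The cutoff v_max is an assumption suited to beta * m of order one.
    """
    dv, dth = v_max / n_v, math.pi / n_theta
    total = 0.0
    for i in range(n_v):
        v = (i + 0.5) * dv
        for j in range(n_theta):
            theta = (j + 0.5) * dth
            total += pdf(v, theta, beta, m) * observable(v)
    return 2.0 * math.pi * total * dv * dth

def kinetic_energy(v, m=1.0):
    return 0.5 * m * v * v

beta, m = 1.0, 1.0
print("mean energy, naive polar MEP :", polar_average(naive_polar_mep, kinetic_energy, beta, m))
print("mean energy, Maxwell in polar:", polar_average(maxwell_in_polar, kinetic_energy, beta, m))
```

The factor of three between the two mean energies reflects the missing Jacobian $v^2 \sin\theta$: the naive solution is effectively a one-dimensional Gaussian in the speed $v$.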

Relative entropy
First, we summarize the definition of the relative entropy [7,8,9] and demonstrate its invariance under coordinate transformations. Subsequently, it will be shown how the BGS entropy (2a) is embedded into this concept.
Consider some set $X \subseteq \mathbb{R}^d$ and two measures $\mu$ and $\nu$ on $X$ that are absolutely continuous with respect to each other (i.e., $\mu$ and $\nu$ have the same null sets in $X$ [13]). The relative entropy of $\mu$ with respect to $\nu$ is defined by

$$ S[\mu|\nu] := -\int_X d\mu\, \ln f_{\mu|\nu}(x), \tag{7a} $$

where the function

$$ f_{\mu|\nu}(x) := \frac{d\mu}{d\nu}(x) \tag{7b} $$

is the so-called Radon-Nikodym density [13] of $\mu$ with respect to $\nu$. The measure $\nu$ plays the role of a reference measure. We briefly illustrate the meaning of the Radon-Nikodym density by two simple examples: The most prominent measure on $\mathbb{R}^d$ is the Lebesgue measure, denoted by $\lambda$ [13]. The measure $\lambda$ assigns to any $d$-dimensional rectangular parallelepiped $I^d := [a_1, b_1] \times \ldots \times [a_d, b_d] \subset \mathbb{R}^d$ its volume

$$ \lambda(I^d) = \prod_{i=1}^{d} (b_i - a_i), $$

where it is assumed that $b_i > a_i$ holds for all $i = 1, \ldots, d$. If, for example, $\mu$ is a probability measure on $X \subseteq \mathbb{R}^d$, then the Radon-Nikodym density $f_{\mu|\lambda}(x)$ of $\mu$ with respect to $\lambda$ is the 'ordinary' PDF of $\mu$.
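As the second example, consider two measures $\mu$, $\nu$ on $X \subseteq \mathbb{R}^d$ with non-vanishing densities $f_{\mu|\lambda} > 0$ and $f_{\nu|\lambda} > 0$ on $X$. In this case, the Radon-Nikodym density of $\mu$ with respect to $\nu$ is given by the quotient of their densities, i.e.,

$$ f_{\mu|\nu}(x) = \frac{f_{\mu|\lambda}(x)}{f_{\nu|\lambda}(x)}. $$

Accordingly, we may rewrite the relative entropy (7a) in terms of the two densities $f_{\mu|\lambda}$ and $f_{\nu|\lambda}$ as

$$ \begin{aligned} S[\mu|\nu] &= -\int_X d\lambda\; f_{\mu|\lambda} \ln\frac{f_{\mu|\lambda}}{f_{\nu|\lambda}} \\ &= -\int_X d^dx\; f_{\mu|\lambda}(x) \ln\frac{f_{\mu|\lambda}(x)}{f_{\nu|\lambda}(x)} =: s[f_{\mu|\lambda}|f_{\nu|\lambda}]. \end{aligned} \tag{9} $$

In the second line, we have inserted the equivalent notation $d^dx$ for the Lebesgue measure $d\lambda$ of an infinitesimal volume element in $\mathbb{R}^d$. Equation (9) will provide the basis for all subsequent considerations. We note that, in order to define relative entropy, it is a priori not required that the measures $\mu$ and $\nu$ are normalizable on $X \subseteq \mathbb{R}^d$; it suffices to assume that they have the same null sets, i.e., $f_{\mu|\lambda}(x) = 0$ implies $f_{\nu|\lambda}(x) = 0$ and vice versa, so that the argument of the logarithm is well-defined. Before discussing how the BGS entropy (2a) arises as a special case of Eq. (9), it is useful to give the general, coordinate invariant form of the MEP with Eq. (9) serving as the starting point. For this purpose, we impose the constraints

$$ 1 = \int_X d\mu = \int_X d^dx\; f_{\mu|\lambda}(x), \tag{10a} $$

$$ \epsilon = \int_X d\mu\; E(x) = \int_X d^dx\; f_{\mu|\lambda}(x)\, E(x), \tag{10b} $$

where $E(x) \ge 0$ is a non-negative 'energy' function. (In principle, one could also include more than two constraints.) Maximizing $S[\mu|\nu] = s[f_{\mu|\lambda}|f_{\nu|\lambda}]$ with respect to $\mu$ or, equivalently, with respect to $f_{\mu|\lambda}$, and taking into account the constraints (10a) and (10b), leads to the condition

$$ 0 \equiv -(1 + \alpha) - \ln\frac{f^*_{\mu|\lambda}(x)}{f_{\nu|\lambda}(x)} - \beta E(x). \tag{11} $$

Similar to Eq. (3), $\alpha$ and $\beta$ have entered here as Lagrangian multipliers for the normalization and 'energy' constraints, respectively. From Eq. (11) the solution of the variational problem is obtained as

$$ f^*_{\mu|\lambda}(x) = f_{\nu|\lambda}(x)\, \exp[-(1 + \alpha) - \beta E(x)]. \tag{12} $$

The parameters $(\alpha, \beta)$ are determined by means of the conditions (10a) and (10b). As is evident from Eq. (12), the 'equilibrium' PDF $f^*_{\mu|\lambda}$ depends on the choice of the reference density $f_{\nu|\lambda}(x)$.

We next show that the relative entropy definition (9) is manifestly coordinate invariant. For this purpose, consider a change of coordinates $x \to \bar x$, and denote by $\bar X$ the range of the new coordinates. Using the standard formulae for the transformation of volume elements and densities $f$,

$$ d^dx = |\bar J|\, d^d\bar x, \qquad \bar f(\bar x) = |\bar J|\, f(x(\bar x)), $$

where $\bar J := \det(\partial x/\partial \bar x)$ is the Jacobian determinant of the coordinate transformation, we find that

$$ \begin{aligned} s[f_{\mu|\lambda}|f_{\nu|\lambda}] &= -\int_X d^dx\; f_{\mu|\lambda}(x) \ln\frac{f_{\mu|\lambda}(x)}{f_{\nu|\lambda}(x)} \\ &= -\int_{\bar X} d^d\bar x\; \bar f_{\mu|\lambda}(\bar x) \ln\frac{\bar f_{\mu|\lambda}(\bar x)/|\bar J|}{\bar f_{\nu|\lambda}(\bar x)/|\bar J|} \\ &= s[\bar f_{\mu|\lambda}|\bar f_{\nu|\lambda}]. \end{aligned} $$

Hence, the relative entropy is indeed independent of the choice of the coordinates, due to the fact that the Jacobians in the argument of the logarithm cancel. As a consequence, the solution of the associated MEP becomes coordinate independent as well. To demonstrate this more explicitly, we first rewrite the constraint function $E$ in terms of the new coordinates by defining $\bar E(\bar x) := E(x(\bar x))$. Then, the constraints (10a) and (10b) may be expressed equivalently in the new coordinates as

$$ 1 = \int_{\bar X} d^d\bar x\; \bar f_{\mu|\lambda}(\bar x), \qquad \epsilon = \int_{\bar X} d^d\bar x\; \bar f_{\mu|\lambda}(\bar x)\, \bar E(\bar x). $$

Hence, the solution of the associated variational problem reads

$$ \bar f^*_{\mu|\lambda}(\bar x) = \bar f_{\nu|\lambda}(\bar x)\, \exp[-(1 + \alpha) - \beta \bar E(\bar x)]. $$

This is indeed the correct transformation law for the equilibrium PDF $f^*_{\mu|\lambda}$ from Eq. (12); i.e., once the reference measure $\nu$ and its density are properly specified, the MEP and its solution become independent of the choice of the coordinates.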
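The cancellation of the Jacobians can be checked numerically. The sketch below is an illustration under stated assumptions, not taken from the original: it picks $X = (0, \infty)$, an exponential density for $\mu$, a second exponential density for the reference measure $\nu$, and the coordinate change $x = y^2$, all chosen only for convenience. The relative entropy agrees in both coordinate systems, whereas the naive expression $-\int f \ln f$ does not.

```python
import math

def midpoint(func, a, b, n=100_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

# Densities with respect to Lebesgue measure in the original coordinate x > 0.
def f_mu(x):
    return math.exp(-x)

def f_nu(x):
    return 2.0 * math.exp(-2.0 * x)

# Same measures in the new coordinate y, with x = y**2 and Jacobian dx/dy = 2*y.
def fbar_mu(y):
    return f_mu(y * y) * 2.0 * y

def fbar_nu(y):
    return f_nu(y * y) * 2.0 * y

def relative_entropy(f, g, a, b):
    """s[f|g] = -int f ln(f/g), cf. Eq. (9)."""
    return -midpoint(lambda t: f(t) * math.log(f(t) / g(t)), a, b)

def naive_entropy(f, a, b):
    """-int f ln f, i.e. Eq. (9) with the reference density set to 1."""
    return -midpoint(lambda t: f(t) * math.log(f(t)), a, b)

X_MAX = 50.0                 # truncation of (0, infinity); the tail is negligible
Y_MAX = math.sqrt(X_MAX)

s_x = relative_entropy(f_mu, f_nu, 0.0, X_MAX)
s_y = relative_entropy(fbar_mu, fbar_nu, 0.0, Y_MAX)
h_x = naive_entropy(f_mu, 0.0, X_MAX)
h_y = naive_entropy(fbar_mu, 0.0, Y_MAX)

print("relative entropy in x and y:", s_x, s_y)
print("naive entropy in x and y   :", h_x, h_y)
```

For this particular pair of densities the relative entropy has the closed form $\ln 2 - 1$, which gives an independent check of the quadrature.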
Finally, it is straightforward to see that the BGS entropy (2a) is a special case of Eq. (9): We identify $X = V^d = \mathbb{R}^d$ and fix the reference measure as the Lebesgue measure in velocity space, $\nu = \lambda$. Then, taking into account that $f_{\lambda|\lambda}(v) \equiv 1$, Eq. (9) reduces to the BGS entropy (2a); i.e., explicitly,

$$ S[\mu|\lambda] = -\int_{V^d} d^dv\; f_{\mu|\lambda}(v) \ln f_{\mu|\lambda}(v) = S_B[f_{\mu|\lambda}]. \tag{17} $$

We thus note that the canonical BGS entropy corresponds to a specific choice of the reference measure, namely, the Lebesgue measure in velocity space. Put differently, whenever one writes an entropy in the 'standard' form (17), one has implicitly fixed an underlying reference measure (defined with respect to some set of primary variables).
With regard to the subsequent discussion it will be important to keep in mind that the solution (12) of the coordinate invariant MEP is determined by two ingredients: (i) the 'energy' function $E$ that specifies the mean value constraint; (ii) the underlying reference measure $\nu$. While the energy function $E$ is usually known, it is a nontrivial problem to identify the appropriate reference measure $\nu$ for a given class of physical systems. In the next section, we discuss how one can classify reference measures according to their transformation properties under symmetry groups.

Choice of the reference measure: Group invariance and Haar measures
The above discussion shows that the MEP is incomplete unless one is able to specify the reference measure $\nu$ on the state space $X$. Put differently, before accepting the MEP as a truly fundamental principle, one has to find a general method that allows one to determine $\nu$ for a given class of dynamical systems. A promising step towards solving this problem is to analyze potential reference measures with respect to their invariance properties under fundamental symmetry transformations. Conceptually, this idea is closely related to the theory of Haar measures [10,13]. In a seminal paper [10] published in 1933, the Hungarian mathematician Alfred Haar studied the possibility of introducing a measure $\mu_\circ$ on a continuous group $(G, \circ)$ such that $\mu_\circ$ is invariant under the group multiplication '$\circ$'. To briefly sketch this idea, consider a subset $A$ of the group $G$ and some arbitrary, fixed group element $g \in G$. By multiplying each element $a \in A$ with $g$, the subset $A$ is mapped onto another subset of $G$, denoted by

$$ g \circ A := \{ g \circ a \mid a \in A \}. $$

Now consider a measure $\mu_\circ$ on $G$ that assigns to $A \subseteq G$ some non-negative real number $\mu_\circ(A)$. The measure $\mu_\circ$ is called group invariant if

$$ \mu_\circ(g \circ A) = \mu_\circ(A) $$

holds for any $g \in G$ and $A \subseteq G$. + Haar was able to prove the existence of an invariant measure $\mu_\circ$, and its uniqueness apart from an irrelevant multiplicative constant, for locally compact topological groups. Such group invariant measures $\mu_\circ$ are nowadays referred to as Haar measures [13]. They give a mathematically precise meaning to the notion of a 'uniform distribution' by combining measure and group theoretical concepts. However, in physics one often encounters the slightly different situation where a certain symmetry group acts on the domain $X$ of a vector space, e.g., by means of a matrix representation. In this case, it is natural to extend the original ideas of Haar by considering measures on $X$ that are invariant under the group action. *

In order to link these concepts to thermodynamics, we return to the BGS entropy (17). This 'canonical' entropy was identified above as the relative entropy with respect to the Lebesgue measure $\lambda$ on the non-relativistic velocity space $V^d = \mathbb{R}^d$. Adopting the group-theoretical point of view, the defining property of the Lebesgue measure is given by the fact that $\lambda$ is the only ♯ translation invariant measure on $V^d$. To capture this fact more precisely, we define $w$-parameterized translations $G_w$ on $V^d$ by means of

$$ G_w:\; v \mapsto G_w(v) := v + w, \qquad w \in V^d. \tag{20} $$

The velocity translations $G_w$ form a group by means of the composition rule

$$ G_{w_1} \circ G_{w_2} = G_{w_1 + w_2}. $$

Now consider some subdomain $A \subset V^d$ and define the translation $G_w[A]$ of $A$ by

$$ G_w[A] := \{ G_w(v) \mid v \in A \}. \tag{22} $$

Then the Lebesgue measure $\lambda$ is the only measure satisfying [13]

$$ \lambda(G_w[A]) = \lambda(A) $$

or, equivalently, in differential notation

$$ d\lambda(v) = d\lambda(G_w(v)), \qquad \text{i.e.,} \quad d^dv = d^d(v + w). $$

From the physical point of view, Eq. (20) can be interpreted in two different ways:
(i) Geometric interpretation: Equation (20) describes a Galilei transformation, i.e., a change of the inertial frame of reference.
(ii) Kinetic interpretation: Equation (20) describes a momentum conservation law, with $\Delta p = mw$ corresponding to the particle's momentum gain in a collision.

Both interpretations are equally plausible here, because non-relativistic momentum and velocity differ by a mass constant $m$ only; in particular, the Lebesgue measure in velocity space transforms to a Lebesgue measure in momentum space when changing from velocity to momentum coordinates in the non-relativistic case. However, regardless of this ambiguity in the interpretation of Eq. (20), it is evident that the Lebesgue measure in velocity space (or, equivalently, in momentum space) plays a distinguished role in non-relativistic physics: It is the Haar measure of the Galilei group (or, equivalently, of the momentum translation group). This might explain why only the relative entropy with respect to this particular measure, $S[\mu|\lambda]$, yields the correct non-relativistic equilibrium distribution (1).

+ In the case of non-commutative (i.e., non-Abelian) groups, one may distinguish invariance under multiplications from the right or left.
* For example, in the one-dimensional case $d = 1$ the proper-orthochronous Lorentz group $L_+^\uparrow$ consists of boosts only and, therefore, it can be identified with the relativistic velocity space $RV^1 = (-1, 1)$; hence, the action of $L_+^\uparrow$ on $RV^1$ is just the action of $L_+^\uparrow$ on itself. This corresponds to the framework originally considered by Haar [10]. By contrast, in higher space dimensions $d > 1$ it is no longer possible to identify the relativistic velocity space $RV^d = \{ v \in \mathbb{R}^d \mid |v| < 1 \}$ directly with a subgroup of the Lorentz group, since then the number of group parameters is larger than $d$ (cf. Chap. 6 in Ref. [14]). Nevertheless, also in this case one can find a Lorentz invariant measure on $RV^d$, which is unique apart from an irrelevant multiplicative constant; cf. discussion in Sec. 3.
♯ We omit the phrase 'apart from an irrelevant multiplicative constant' from now on.
In the remainder of this paper, we are going to study generalizations of the Maxwell distribution (1) in the framework of special relativity. In particular, we shall identify the reference measures underlying two of the most commonly considered relativistic one-particle equilibrium distributions.

Relativistic velocity distributions
Six years after Einstein [15,16] had formulated his theory of special relativity, Ferencz Jüttner [17] presented in 1911 the first detailed study on the canonical thermostatistics of a relativistic (quasi-)ideal gas of classical particles. As the main result of his paper, he proposed the following three-dimensional relativistic generalization of Maxwell's non-relativistic momentum distribution [17,18,19]:

$$ \phi_J(p) = \frac{1}{Z_0} \exp[-\beta E(p)], \qquad p \in RP^3 := \mathbb{R}^3, \tag{24} $$

with $\beta = 1/T$ being the inverse temperature parameter, and the relativistic energy and the relativistic momentum

$$ E(p) = (m^2 + p^2)^{1/2} = m\gamma(v), \qquad p = m v \gamma(v), $$

with Lorentz factor $\gamma(v) = (1 - v^2)^{-1/2}$ (we continue to use units $k_B = c = 1$). The $d$-dimensional relativistic momentum space is denoted by $RP^d$. The constant $Z_0$ is determined by the normalization condition

$$ 1 = \int_{RP^d} d^dp\; \phi_J(p) $$

and, in the three-dimensional case, one finds [17]

$$ Z_0 = 4\pi m^3\, \frac{K_2(\beta m)}{\beta m}, $$

where $K_\nu$ denotes the modified Bessel function of the second kind. The one-particle momentum distribution (24) refers to a laboratory rest frame, where the container enclosing the gas is at rest. As usual, it is assumed that for an ordinary hard box potential the spatial part of the one-particle phase space PDF is trivial (i.e., constant), corresponding to a spatially homogeneous particle distribution in the box. The Jüttner function $\phi_J$ has been widely used in high energy and astrophysics over the past decades [20,21,22]. However, in recent years several authors [23,24,25,26,27] argued that Eq. (24) might not represent the correct relativistic equilibrium distribution, and several alternatives were suggested. Generalizing to an arbitrary number of space dimensions $d$, the proposed candidates can be summarized in terms of the following $\eta$-parameterized momentum PDF:

$$ \phi_\eta(p) = \frac{\exp[-\beta E(p)]}{Z_\eta\, E(p)^\eta}. \tag{28} $$

The normalization constant $Z_\eta$ depends on both $\eta$ and $d$. For $\eta = 0$ the PDF (28) reduces to the standard Jüttner function (24), $\phi_0 \equiv \phi_J$. The most frequently considered modification corresponds to $\eta = 1$ [23,24,25,26,27,28], while one author [26] has also included the case $\eta = 2$.
Compared with the Jüttner value $\eta = 0$, larger values $\eta > 0$ diminish the probability of particles having high absolute momentum at the same temperature $T = 1/\beta$. The one-particle velocity PDF corresponding to Eq. (28) is given by

$$ f_\eta(v) = \frac{m^d\, \gamma^{2+d}(v)}{Z_\eta\, [m\gamma(v)]^\eta}\, \exp[-\beta m \gamma(v)], $$

with $v$ taking values in the relativistic velocity space $RV^d := \{ v \in \mathbb{R}^d \mid |v| < 1 \}$; the factor $m^d \gamma^{2+d}(v)$ is the Jacobian determinant of the map $v \mapsto p = m v \gamma(v)$. Below we focus on the two most frequently considered values $\eta = 0$ (standard Jüttner distribution) and $\eta = 1$ (modified Jüttner distribution). Figure 1 shows the corresponding velocity PDFs $f_0$ and $f_1$ at two different temperature values for the one-dimensional case $d = 1$.
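To get a feeling for the two candidates, the one-dimensional velocity PDFs can be evaluated numerically. The sketch below is a minimal illustration assuming $d = 1$ and $\beta = m = 1$; the normalization constant is obtained by quadrature in the rapidity variable ($p = m \sinh\psi$, $dp = E\, d\psi$) rather than from a closed-form Bessel expression, and all function names are hypothetical.

```python
import math

def midpoint(func, a, b, n=100_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def partition_function(eta, beta, m, psi_max=25.0):
    """Z_eta for d = 1, integrating exp(-beta*E) / E**eta over dp = E dpsi."""
    def integrand(psi):
        energy = m * math.cosh(psi)
        return energy ** (1 - eta) * math.exp(-beta * energy)
    return midpoint(integrand, -psi_max, psi_max)

def velocity_pdf(v, eta, beta, m, z):
    """f_eta(v) for d = 1: phi_eta(p(v)) times the Jacobian dp/dv = m * gamma**3."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    energy = m * gamma
    return m * gamma**3 * math.exp(-beta * energy) / (z * energy**eta)

beta, m = 1.0, 1.0
z0 = partition_function(0, beta, m)   # standard Juettner, eta = 0
z1 = partition_function(1, beta, m)   # modified Juettner, eta = 1

norm0 = midpoint(lambda v: velocity_pdf(v, 0, beta, m, z0), -1.0, 1.0)
norm1 = midpoint(lambda v: velocity_pdf(v, 1, beta, m, z1), -1.0, 1.0)
print("normalization of f_0 and f_1:", norm0, norm1)

def tail_ratio(v):
    """f_0(v) / f_1(v); grows with |v| because eta = 1 suppresses high energies."""
    return velocity_pdf(v, 0, beta, m, z0) / velocity_pdf(v, 1, beta, m, z1)

print("f_0/f_1 at v = 0.1 and v = 0.9:", tail_ratio(0.1), tail_ratio(0.9))
```

Since $f_0/f_1 \propto \gamma(v)$, the ratio increases monotonically with $|v|$, in line with the statement that $\eta > 0$ reduces the weight of fast particles.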
In the remainder, we will analyze the MEPs that give rise to the standard and modified Jüttner distributions, respectively. In particular, the different underlying reference measures shall be characterized by means of their invariance under group actions.

Standard Jüttner distribution: Momentum translation symmetry
We first consider the MEP for the standard Jüttner distribution with $\eta = 0$. As discussed in Sec. 2.2, the MEP becomes coordinate independent if expressed in terms of relative entropy. In the relativistic case, it is most convenient to use the momentum coordinate $p \in RP^d := \mathbb{R}^d$. The Lebesgue measure on relativistic momentum space $RP^d$ has, by definition, a constant density denoted by $\ell$. Without loss of generality, we choose the normalization $\ell(p) = (mc)^{-d} = m^{-d}$ so that the integral of $\ell$ over some finite subset of $RP^d$ is a dimensionless number. With these preliminaries, we can state the MEP for the standard Jüttner function: Maximization of the relative entropy

$$ s_0[\phi|\ell] = -\int_{RP^d} d^dp\; \phi(p) \ln\frac{\phi(p)}{\ell(p)} \tag{30a} $$

under the constraints

$$ 1 = \int_{RP^d} d^dp\; \phi(p), \qquad \epsilon = \int_{RP^d} d^dp\; \phi(p)\, E(p), \tag{30b} $$

where now $E = (m^2 + p^2)^{1/2}$ is the relativistic energy, yields the standard Jüttner distribution $\phi_J$, corresponding to $\eta = 0$ in Eq. (28). It may be worth noting that, in the relativistic case, the Lebesgue measure on $RP^d$ does not transform into a Lebesgue measure on the relativistic velocity space $RV^d$ due to the nonlinear momentum-velocity relation $p = m v \gamma(v)$. Hence, if one rewrites the relative entropy $s_0$ in terms of the velocity $v$, an additional determinant factor enters in the argument of the logarithm. We now turn to the invariance properties of the specific reference measure required to obtain the standard Jüttner distribution with $\eta = 0$. Analogous to the discussion in Sec. 2.3, the Lebesgue measure in relativistic momentum space is singled out by the fact that it is the only translation invariant measure in momentum space; i.e., it is the Haar measure of the momentum translation group. Hence, the standard Jüttner function is consistent with the kinetic interpretation in Sec. 2.3. Put differently, if the Jüttner function turns out to be the correct relativistic one-particle equilibrium distribution, then the maximum principle for the relative entropy should be completed by the postulate that the reference measure must be translation invariant in momentum space.

Modified Jüttner distribution: Lorentz symmetry
As the second example, we consider the modified Jüttner distribution with $\eta = 1$ in Eq. (28). It is straightforward to verify that this distribution is obtained by maximizing the relative entropy

$$ s_1[\phi|\rho] = -\int_{RP^d} d^dp\; \phi(p) \ln\frac{\phi(p)}{\rho(p)}, \qquad \rho(p) := \frac{1}{E(p)}, $$

under the constraints (30b). In contrast to Eq. (30a), the reference density $\rho = 1/E$ is momentum dependent. The measure $\chi$ associated with $\rho$ assigns to any subset $A \subset RP^d$ the measure number

$$ \chi(A) = \int_A \frac{d^dp}{E(p)}. $$

It is interesting to explore the invariance properties of this measure. For this purpose, we consider an arbitrary proper-orthochronous Lorentz transformation. Such transformations are either spatial rotations, or boosts, or a combination of both [14]. They act as linear transformations on the energy-momentum vector $(E, p)$. Due to the fixed relation $E(p) = (m^2 + p^2)^{1/2}$ between energy and momentum, a Lorentz transformation can also be viewed as a transformation that operates on the momentum coordinates $p$ alone, denoted by $L: RP^d \to RP^d$. The functions $L$ are linear only in the case of pure rotations, but nonlinear otherwise [30]. However, analogous to Eq. (22), we may define the Lorentz transformation $L[A]$ of a set $A \subset RP^d$ by

$$ L[A] := \{ L(p) \mid p \in A \}. $$

By taking into account the well-known fact that [29,21,30]

$$ \frac{d^dp}{E(p)} = \frac{d^dp'}{E(p')}, \qquad p' = L(p), \tag{34} $$

holds under Lorentz transformations, one then finds that

$$ \chi(L[A]) = \chi(A). $$

Hence, the specific reference measure underlying the modified Jüttner distribution with $\eta = 1$ is distinguished by the property that it is Lorentz invariant. In view of the fact that the Lorentz group is the relativistic counterpart of the Galilei group, one can say that the modified Jüttner distribution is obtained when adopting the geometric interpretation in Sec. 2.3. Put differently, if the modified Jüttner function were the correct relativistic one-particle equilibrium distribution, then the maximum principle for the relative entropy should be completed by the postulate that the reference measure in momentum space must be invariant under the action of the fundamental symmetry group of the physical model (e.g., Galilei, Lorentz, etc.).
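The Lorentz invariance of the measure $d^dp/E$ can be verified numerically in the simplest setting. The sketch below assumes $d = 1$, $m = 1$ and a boost written as $p' = \gamma(w)\,[p + w E(p)]$; the sign convention for $w$ and the function names are illustrative assumptions. Because the boost is monotonically increasing in $p$, an interval $[a, b]$ is mapped onto $[L(a), L(b)]$.

```python
import math

def energy(p, m):
    return math.sqrt(m * m + p * p)

def boost(p, w, m):
    """Momentum after a boost with velocity parameter w (d = 1, c = 1)."""
    gamma_w = 1.0 / math.sqrt(1.0 - w * w)
    return gamma_w * (p + w * energy(p, m))

def chi(a, b, m, n=100_000):
    """Midpoint-rule estimate of the measure of [a, b] with density 1/E(p)."""
    h = (b - a) / n
    return h * sum(1.0 / energy(a + (k + 0.5) * h, m) for k in range(n))

m, w = 1.0, 0.6
a, b = -0.5, 2.0
a_boosted, b_boosted = boost(a, w, m), boost(b, w, m)

chi_before = chi(a, b, m)
chi_after = chi(a_boosted, b_boosted, m)
print("chi before / after boost     :", chi_before, chi_after)
print("Lebesgue length before / after:", b - a, b_boosted - a_boosted)
```

The Lebesgue length of the interval changes under the boost, while its $\chi$-measure does not; in $d = 1$ the latter equals the rapidity difference $\operatorname{arsinh}(b/m) - \operatorname{arsinh}(a/m)$.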
Explicit example: One-dimensional case $d = 1$. As remarked earlier, the one-dimensional case $d = 1$ is somewhat special, because (only) in this case the Lorentz boosts form a group that may be directly identified with the one-dimensional velocity space $RV^1 := (-1, 1)$. The composition of two Lorentz boosts induces a group multiplication $\oplus_\psi$ on $RV^1$, given by

$$ v_1 \oplus_\psi v_2 := \frac{v_1 + v_2}{1 + v_1 v_2}. $$

This group operation is well known as the Einstein addition of velocities. The task of introducing an invariant measure on the group $(RV^1, \oplus_\psi)$ falls exactly into the class of problems originally considered by Haar [10]. The subscript $\psi$ symbolizes that the Einstein addition $\oplus_\psi$ is equivalent to an ordinary addition '$+$' in the space $\Psi := (-\infty, \infty)$ of the rapidity variables $\psi := \operatorname{arctanh} v$. Put differently, the maps 'arctanh' and 'tanh' induce a group isomorphism between $(RV^1, \oplus_\psi)$ and $(\Psi, +)$. The latter fact makes it particularly simple to identify the Haar measure on $(RV^1, \oplus_\psi)$: One merely needs to rewrite the Lebesgue measure $\lambda_\psi$ on $\Psi$, which is invariant under the addition of rapidities, in terms of the velocity coordinate; in differential notation, one then finds

$$ d\lambda_\psi = d\psi = \frac{dv}{1 - v^2} = \frac{dp}{E(p)}, \tag{37} $$

corresponding to the Lorentz invariant measure on $RV^1$ and $RP^1$, respectively [cf. Eq. (34)]. As discussed above, using this measure as the reference measure in the MEP, one obtains the one-dimensional modified Jüttner distribution with $\eta = 1$.
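The isomorphism between Einstein addition and rapidity addition, and the resulting invariance of the measure $dv/(1 - v^2)$, can be checked with a few lines of code. This is a minimal sketch with illustrative function names and arbitrarily chosen sample velocities.

```python
import math

def einstein_add(v1, v2):
    """Relativistic velocity addition in d = 1 (units with c = 1)."""
    return (v1 + v2) / (1.0 + v1 * v2)

def haar_measure(a, b):
    """Measure of the interval (a, b) in RV^1 with density 1/(1 - v^2).

    The antiderivative of 1/(1 - v^2) is arctanh(v), so this is the
    rapidity difference of the two endpoints.
    """
    return math.atanh(b) - math.atanh(a)

v1, v2, u = 0.5, 0.8, 0.3

# tanh/arctanh map Einstein addition onto ordinary addition of rapidities.
via_rapidity = math.tanh(math.atanh(v1) + math.atanh(v2))
print("Einstein sum, direct and via rapidity:", einstein_add(v1, v2), via_rapidity)

# Shifting an interval by u under Einstein addition leaves its Haar measure
# unchanged, although its ordinary (Lebesgue) length changes.
shifted = (einstein_add(u, v1), einstein_add(u, v2))
print("Haar measure before / after :", haar_measure(v1, v2), haar_measure(*shifted))
print("Lebesgue length before / after:", v2 - v1, shifted[1] - shifted[0])
```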
For comparison, the ordinary addition $p_3 := p_1 + p_2$ in momentum space $RP^1 = (-\infty, \infty)$ induces another group operation $\oplus_p$ on $RV^1 = (-1, 1)$ by means of the map $v(p) = p/(m^2 + p^2)^{1/2}$. The corresponding velocity addition law reads explicitly

$$ v_1 \oplus_p v_2 = \frac{v_1 \gamma(v_1) + v_2 \gamma(v_2)}{\left\{ 1 + [v_1 \gamma(v_1) + v_2 \gamma(v_2)]^2 \right\}^{1/2}}. $$

Analogous to Eq. (37), the invariant Haar measure on $(RV^1, \oplus_p)$ is obtained by expressing the Lebesgue measure $\lambda_p$ on $RP^1$, which is invariant under the momentum addition, in terms of the velocity variable, yielding

$$ d\lambda_p = dp = m \gamma^3(v)\, dv \propto \gamma^3(v)\, dv. $$
As discussed in Sec. 3.1, by using this measure in the MEP one is led to the standard Jüttner function.
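The momentum-induced addition law can be checked in the same spirit. The sketch below assumes $m = 1$ and uses illustrative function names; it confirms that the explicit velocity formula agrees with adding momenta, that the measure $m\gamma^3\, dv$ of an interval is unchanged by an $\oplus_p$ shift, and that $\oplus_p$ differs from the Einstein addition.

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def momentum(v, m=1.0):
    """p(v) = m v gamma(v)."""
    return m * v * gamma(v)

def velocity(p, m=1.0):
    """v(p) = p / sqrt(m^2 + p^2)."""
    return p / math.sqrt(m * m + p * p)

def momentum_add(v1, v2):
    """Velocity addition induced by ordinary addition of momenta."""
    s = v1 * gamma(v1) + v2 * gamma(v2)
    return s / math.sqrt(1.0 + s * s)

def einstein_add(v1, v2):
    return (v1 + v2) / (1.0 + v1 * v2)

def momentum_measure(a, b, m=1.0):
    """Measure of (a, b) in RV^1 with density m * gamma(v)**3, i.e. p(b) - p(a)."""
    return momentum(b, m) - momentum(a, m)

v1, v2, u = 0.5, 0.8, 0.3
print("v1 (+)_p v2, formula vs. momenta:", momentum_add(v1, v2),
      velocity(momentum(v1) + momentum(v2)))
print("Einstein addition for comparison:", einstein_add(v1, v2))

shifted = (momentum_add(u, v1), momentum_add(u, v2))
print("measure before / after shift:", momentum_measure(v1, v2),
      momentum_measure(*shifted))
```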

Summary
We have studied the canonical maximum entropy principle (MEP) for thermodynamic equilibrium distributions by combining basic ideas from group and measure theory [10,13]. It has been demonstrated that the concept of relative entropy [7,8,9] provides a suitable basis for stating the MEP in a coordinate invariant way. Moreover, this approach clarifies that thermodynamic equilibrium distributions, if obtained from an MEP [6], are determined not only by their constraint functionals but also by the underlying reference measures. The latter may be characterized in terms of their symmetry properties, i.e., by their invariance under group actions. As examples, we analyzed the two most frequently considered candidates [17,18,19,20,21,22,23,24,25,26,27,28] for the relativistic generalization of the Maxwell distribution. We have shown that the two candidate distributions are based on different underlying reference measures. The reference measure leading to the standard Jüttner distribution [17,18,19] is uniquely characterized by the fact that it is invariant under momentum translations, whereas the modified Jüttner distribution [23,24,25,26,27,28] is related to a Lorentz invariant reference measure in momentum space. Even though the above approach clarifies the underlying mathematical differences on a fundamental level, it does not allow one to decide which distribution actually is the better candidate, as either reference measure has its own merits. In our opinion, this ambiguity deserves further consideration in the future.
We conclude this paper by mentioning two applications. The correct relativistic equilibrium distribution is required in order to calculate the friction coefficients and noise correlation functions of relativistic Langevin equations (RLEs) self-consistently [31]. An accurate determination of these quantities is essential, e.g., if RLEs are employed to estimate the outcome of high energy collision experiments, as recently done by van Hees et al. [32]. Another potential, astrophysical application concerns the Sunyaev-Zeldovich (SZ) effect [33,34], i.e., the distortion of the cosmic microwave background (CMB)