A Direct Link between Rényi–Tsallis Entropy and Hölder’s Inequality—Yet Another Proof of Rényi–Tsallis Entropy Maximization

The well-known Hölder’s inequality has been recently utilized as an essential tool for solving several optimization problems. However, such an essential role of Hölder’s inequality does not seem to have been reported in the context of generalized entropy, including Rényi–Tsallis entropy. Here, we identify a direct link between Rényi–Tsallis entropy and Hölder’s inequality. Specifically, we demonstrate yet another elegant proof of the Rényi–Tsallis entropy maximization problem. Especially for the Tsallis entropy maximization problem, only with the equality condition of Hölder’s inequality is the q-Gaussian distribution uniquely specified and also proved to be optimal.


Introduction
Tsallis entropy [1,2] has recently been utilized as a versatile framework for extending the realm of Shannon-Boltzmann entropy to nonlinear processes, in particular, those that exhibit power-law behavior. It shares a common structure with Rényi entropy [3], Daróczy entropy [4], and the probability moment presented in Moriguti [5], since the essential part of all these functionals is $\int p^q(x)\,dx$ (or $\sum_i p_i^q$) for certain constrained probability density functions $p(x)$ (or probabilities $p_i$). This common structure has naturally attracted interest in a variety of problems in information theory and related areas. For instance, in his pioneering work, Campbell [6] stated that "Implicit in the use of average code length as a criterion of performance is the assumption that cost varies linearly with code length. This is not always the case." Campbell [6] then introduced a nonlinear average length measure
$$L(t) = \frac{1}{t}\log_D \sum_{i=1}^{N} p_i D^{t l_i},$$
defined as an extension of Shannon's average code length, in which $D$ is the size of the alphabet, $p_i$ is the probability that a source produces symbol $x_i$, $l_i$ is the length of the codeword $c_i$ mapped from symbol $x_i$ (using $D$ letters of the alphabet) in the context of source coding, and $t$ is an arbitrary parameter ($0 < t < \infty$). One of the surprising facts proved in [6] is that the lower bound to $L(t)$, a normalized logarithm of the moment-generating function of the code lengths, is given by $H_{\frac{1}{1+t}}(p)$, namely, the Rényi entropy of order $(1+t)^{-1}$ of the source $p = \{p_i\}_{i=1}^{N}$. Moreover, Ref. [6] also shows that, if
$$l_i = \frac{1}{1+t}\log_D\frac{1}{p_i} + \frac{t}{1+t}H_{\frac{1}{1+t}}(p),$$
which is a mixture of the Shannon code length $\log_D\frac{1}{p_i}$ and the Rényi entropy of order $(1+t)^{-1}$, then the lower bound $L(t) = H_{\frac{1}{1+t}}(p)$ is attained. Baer [7] further generalized this result and constructed an algorithm for finding optimal binary codes under quasiarithmetic penalties. In addition, new extensions of [6] were obtained by Bercher [8] and by Bunte and Lapidoth [9].
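Campbell's bound can be checked numerically on a toy source. The sketch below assumes base $D = 2$, $t = 1$, a dyadic source chosen for illustration, and real-valued (non-integer) lengths of the mixture form above; the helper names are hypothetical and not taken from [6].

```python
import math


def renyi_entropy(p, alpha, D=2):
    """Discrete Renyi entropy of order alpha (alpha != 1), measured in base-D units."""
    return math.log(sum(pi ** alpha for pi in p), D) / (1 - alpha)


def campbell_length(p, lengths, t, D=2):
    """Campbell's average length L(t) = (1/t) log_D sum_i p_i D^(t * l_i)."""
    return math.log(sum(pi * D ** (t * li) for pi, li in zip(p, lengths)), D) / t


def mixture_lengths(p, t, D=2):
    """Real-valued lengths l_i = alpha*log_D(1/p_i) + (1 - alpha)*H_alpha(p), alpha = 1/(1+t)."""
    alpha = 1 / (1 + t)
    h = renyi_entropy(p, alpha, D)
    return [alpha * math.log(1 / pi, D) + (1 - alpha) * h for pi in p]


p = [0.5, 0.25, 0.125, 0.125]   # toy source (assumption)
t = 1.0
alpha = 1 / (1 + t)
bound = renyi_entropy(p, alpha)  # H_{1/(1+t)}(p)
ideal = mixture_lengths(p, t)

print(campbell_length(p, ideal, t), bound)           # coincide up to rounding
print(sum(2 ** (-l) for l in ideal))                 # Kraft sum of the ideal lengths
print(campbell_length(p, [1, 2, 3, 3], t) >= bound)  # an integer prefix code stays above the bound
```

The mixture lengths satisfy the Kraft inequality with equality, so they are admissible in the real-valued relaxation; integer codes can only approach the bound from above.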
Such an instance, where "a nonlinear measure" (i.e., generalized entropy) naturally arises, is also known for channel capacities. Daróczy [4] first analyzed a generalized channel capacity, which is a natural consequence of his extension of Shannon entropy (i.e., Daróczy entropy). This result initiated extensive work in this direction. For instance, Landsberg and Vedral [10] first introduced Rényi entropy and Tsallis entropy for a binary symmetric channel, and they suggested the possibility of "super-Shannon channel capacities." More recently, Ilić, Djordjević, and Küeppers [11] obtained new expressions for generalized channel capacities by introducing Daróczy-Tsallis entropy even for a weakly symmetric channel, the binary erasure channel, and the z-channel. Similar extensions have been explored for rate distortion theory. For instance, Venkatesan and Plastino [12] developed nonextensive rate distortion theory by introducing Tsallis entropy and constructed a minimization algorithm for generalized mutual information. More recently, Girardin and Lhote [13] covered the setting in [12] within a general framework of generalized entropy rates, which includes Rényi-Tsallis entropy.
In the context of generalized entropy just described, the q-Gaussian distribution [1,2] often emerges as a maximizer of Rényi-Tsallis entropy under certain constraints, and hence it has been extensively studied. Since the q-Gaussian effectively models power-law behavior with a single parameter q, it is used in various areas, including the new random number generators proposed by Thistleton, Marsh, Nelson, and Tsallis [14] and by Umeno and Sato [15]. In addition to such important applications in communication systems, queuing theory has recently incorporated the q-Gaussian to reflect the heavy-tailed traffic characteristics observed in broadband networks [16–19]. For instance, Karmeshu and Sharma [16] introduced Tsallis entropy maximization, in which the q-Gaussian emerges as the queue length distribution; this suggests that Jaynes' maximum entropy principle [20–22] can be generalized to the framework of Tsallis entropy. Some of the above issues are formulated as nonlinear optimizations with "a nonlinear measure" under certain constraints (which depend on each issue). The pair of Rényi-Tsallis entropy and the q-Gaussian is one such instance: the q-Gaussian maximizes Tsallis entropy under certain constraints. Therefore, a deeper understanding of such nonlinear optimization problems is useful. In this study, we find a direct link between Rényi-Tsallis entropy and Hölder's inequality that leads to yet another elegant proof of Rényi-Tsallis entropy maximization. The idea of the proof differs from those offered in previous studies (for instance, [23–27]), as explained below. The technique developed in this study may also be useful for tackling more complicated optimization problems in information theory and other research areas, such as those involving the conditional Rényi entropy (as in [28–30]).
Previous studies [23–27] share a common standpoint: the generalization of the moment-entropy inequality (cf. [25,26]). Namely, they generalize the fact that a continuous random variable with a given second moment and maximal Shannon entropy is Gaussian (cf. [3], Theorem 8.6.5). To do so, a generalized relative entropy is devised, which takes a different form (and has a different name) depending on the problem. First, Tsukada and Suyari's beautiful work [23] gave proofs for Rényi entropy maximization, which is also known as a bound on Moriguti's probability moment [5] (as posed in R1 in Section 2). Namely, they proved that the q-Gaussian distribution [1,2] is the unique optimal solution by utilizing the fact that all feasible solutions constitute a convex set. Although [23] does not explicitly construct a generalized relative entropy, the essential structure of the proofs inherits that of the proof of the moment-entropy inequality ([3], Theorem 8.6.5). Moreover, they identified an explicit one-to-one correspondence between feasible solutions to the problems of Rényi entropy maximization and Tsallis entropy maximization, which is also shown in ([31], p. 754). This implies that an 'indirect' proof of Tsallis entropy maximization (as posed in T1 in Section 2) was first obtained in [23]. In contrast, the first 'direct' proof of Tsallis entropy maximization was obtained in Furuichi's elegant work [24]. The proof in [24] utilizes the nonnegativity of the Tsallis relative entropy defined between the q-Gaussian distribution (i.e., a possible maximizer) and any other feasible solution.
On the other hand, the remarkable work of Lutwak, Yang, and Zhang first clarified that generalized Gaussians maximize the λ-Rényi entropy power under a constraint on the p-th moment of the distribution, for univariate distributions [25] and for the associated n-dimensional extensions [26]. The essential point in the proofs in [25,26] is the construction of the relative λ-Rényi entropy power, which is nonnegative and takes a form quite different from the Tsallis relative entropy in [24]. (More precisely, in [25], they prove the nonnegativity of the relative λ-Rényi entropy $\log N_\lambda[f, g]$ ([25], Lemma 1). Starting from this nonnegativity, $\log N_\lambda[f, G_t] \ge 0$ (i.e., $N_\lambda[f, G_t] \ge 1$), they construct a series of inequalities that saturate at the generalized Gaussian ([25], Lemma 2). Note, however, that, as observed in $N_\lambda[f, G_t]$, they start by giving a candidate for the maximizer ab initio, namely, the generalized Gaussian $G_t$.) Furthermore, Vignat, Hero, and Costa [32] obtained a general, sharp result using the Bregman information divergence for an n-dimensional extension of Tsallis entropy. In addition to [25,26,32], Eguchi, Komori, and Kato's interesting results [27] include the same n-dimensional extension of Tsallis entropy. (Ref. [32] has also identified an elegant structure regarding the projective divergence and the γ-loss functions in maximum likelihood estimation.) Similar to [24–26,32], the key component of the proof in [27] is a divergence, the projective power divergence, which again takes a form quite different from those in [24–26,32]. To prove the nonnegativity of the generalized relative entropy, Refs. [25–27] utilize Hölder's inequality, but Refs. [23,24,32] do not. Namely, Hölder's inequality has served as a useful auxiliary tool, but it has never played an essential role in these previous studies.
In addition to requiring the construction of generalized relative entropies, the optimal q-Gaussian distribution needs to be 'given ab initio' in [23–27,32], inheriting the framework of the proof that the Gaussian distribution maximizes Shannon entropy ([3], Theorem 8.6.5). Two natural questions now arise. Is it possible to systematically solve the problems of Rényi-Tsallis entropy maximization in a different (and hopefully simpler) way than in previous studies? Is it possible to 'construct' the q-Gaussian distribution? We answer both questions affirmatively from a new viewpoint, as follows. First, the q-Gaussian distribution is specified only by the equality (i.e., saturation) condition of Hölder's inequality, and, at the same time, its optimality is proved by Hölder's inequality for Tsallis entropy maximization with 1 < q < 3 (Theorem 1) and with 0 ≤ q < 1 (Theorems 2 and 3). This clarifies explicitly how and why the q-Gaussian distribution emerges as the maximizer, for the first time in the literature. (To the authors' knowledge, such a characterization of the q-Gaussian distribution has never been reported.) However, for Rényi entropy maximization with q > 1 (Theorem 4) and with $\frac{1}{3} < q < 1$ (Theorem 5), the q-Gaussian distribution is specified with the aid of the equality condition of Hölder's inequality, and the proof of its optimality additionally requires a simple inequality inspired by Moriguti [5]. Note that we do not intend to provide an explicit characterization of the q-Gaussian distribution in terms of the parameter q, since numerous previous studies (including [23–27]) have already clarified this. Nevertheless, for Tsallis entropy maximization with q = 0, which was previously studied in [2], a rigorous result (Theorem 3) is now obtained for the first time thanks to Hölder's inequality.
(For instance, in the framework of [24], the case q = 0 cannot be incorporated because the Tsallis relative entropy is not adequately defined.) We note that Hölder's inequality has recently been utilized as an essential tool for optimization in Campbell [6], Bercher [8], and Bunte and Lapidoth [9] on source coding; in Bercher [33,34] on generalized Cramér-Rao inequalities; and in Tanaka [35,36] on a physical limit of injection locking. However, such an essential role of Hölder's inequality does not seem to have been reported in the context of generalized entropy, including Rényi entropy (cf. [37]), except for its use as a means of proving the nonnegativity of a generalized relative entropy, as mentioned above.
In what follows, Section 2 introduces the basic definitions required for the analysis. Section 3 presents the main results on Rényi-Tsallis entropy maximization problems and also explains the link to Moriguti's argument in [5]. Section 4 gives the proofs of the results presented in Section 3. Finally, the Appendix provides supplementary information.

Basic Definitions and Problem Formulation
In this section, we first define Tsallis entropy [1,24] and Rényi entropy ( [3], pp. 676-679). Next, we reformulate Rényi-Tsallis entropy maximization problems in a unified way. Finally, we introduce Hölder's inequality in relation to the problems in this study.

Tsallis Entropy and Rényi Entropy
Tsallis entropy is elegantly presented in the context of q-analysis (cf. [1], p. 41) as follows. First, the q-exponential function $\exp_q$ is defined by
$$\exp_q x = [1 + (1-q)x]_+^{\frac{1}{1-q}}, \qquad [y]_+ = \max\{y, 0\},$$
while the inverse of the q-exponential function, namely, the q-logarithmic function, is defined by
$$\ln_q x = \frac{x^{1-q} - 1}{1-q} \qquad (x > 0).$$
Note that, as q → 1, we have $\exp_q x \to e^x$ and $\ln_q x \to \ln x$. We also note that the above definitions of $\exp_q x$ and $\ln_q x$ have recently been revised by Oikonomou and Bagci [38]. (In [38], they further developed 'complete' q-exponentials and q-logarithms.) Then, the Tsallis entropy $H_q^{\text{Tsallis}}$ is defined by
$$H_q^{\text{Tsallis}}[p] = \left\langle\!\left\langle p(x)\ln_q\frac{1}{p(x)}\right\rangle\!\right\rangle = \frac{1 - \langle\!\langle p^q(x)\rangle\!\rangle}{q-1} \qquad (1)$$
for univariate probability density functions (PDFs) p on $\mathbb{R}$; it is a natural generalization of the Boltzmann-Gibbs entropy and the Shannon entropy. Hereafter, $\langle\!\langle\cdot\rangle\!\rangle = \int_{-\infty}^{\infty}\cdot\,dx$ in (1) is used for notational simplicity. We write $\langle\!\langle\cdot\rangle\!\rangle$ instead of $\langle\cdot\rangle$ because $\langle\cdot\rangle$ is generally used for the expectation value. On the other hand, Rényi entropy is well known and can be found in textbooks on information theory (cf. [3], pp. 676-679); it is defined simply by
$$H_q^{\text{Rényi}}[p] = \frac{1}{1-q}\ln\langle\!\langle p^q(x)\rangle\!\rangle.$$
Finally, we note that only differential entropies (i.e., continuous probability distributions) are considered in this study, although our technique with Hölder's inequality can also be applied to discrete probability distributions.
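The definitions above can be sanity-checked numerically. The following sketch (our own illustration, not part of the original analysis) implements $\exp_q$, $\ln_q$, the Tsallis entropy (1), and the Rényi entropy from the moment $\langle\!\langle p^q\rangle\!\rangle$, and compares a trapezoidal quadrature against the closed form $\langle\!\langle p^q\rangle\!\rangle = (2\pi s^2)^{(1-q)/2} q^{-1/2}$ for a zero-mean Gaussian with standard deviation $s$. The helper names and the choice of quadrature are assumptions.

```python
import math


def exp_q(x, q):
    """q-exponential [1 + (1-q)x]_+^(1/(1-q)); reduces to exp(x) at q = 1."""
    if q == 1:
        return math.exp(x)
    base = 1 + (1 - q) * x
    if base <= 0:
        if q < 1:
            return 0.0  # cut-off [y]_+ = max(y, 0)
        raise ValueError("for q > 1, exp_q(x) is finite only for x < 1/(q-1)")
    return base ** (1 / (1 - q))


def ln_q(x, q):
    """q-logarithm (x^(1-q) - 1)/(1-q) for x > 0; reduces to ln(x) at q = 1."""
    if x <= 0:
        raise ValueError("ln_q requires x > 0")
    if q == 1:
        return math.log(x)
    return (x ** (1 - q) - 1) / (1 - q)


def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule; accurate for smooth, rapidly decaying integrands."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))


def tsallis_from_moment(m, q):
    """Eq. (1): H_q^Tsallis = (1 - <<p^q>>)/(q - 1), with m = <<p^q>>."""
    return (1 - m) / (q - 1)


def renyi_from_moment(m, q):
    """H_q^Renyi = ln(<<p^q>>)/(1 - q), with m = <<p^q>>."""
    return math.log(m) / (1 - q)


def gaussian_moment(q, s):
    """Closed form of <<p^q>> for a zero-mean Gaussian with standard deviation s."""
    return (2 * math.pi * s * s) ** ((1 - q) / 2) / math.sqrt(q)


def gaussian_pdf(x, s=1.0):
    """Zero-mean Gaussian density with standard deviation s."""
    return math.exp(-x * x / (2 * s * s)) / math.sqrt(2 * math.pi * s * s)


s, q = 1.0, 1.5
m_num = trapezoid(lambda x: gaussian_pdf(x, s) ** q, -12 * s, 12 * s)
print(m_num, gaussian_moment(q, s))  # quadrature vs closed form
print(tsallis_from_moment(m_num, q), renyi_from_moment(m_num, q))
```

As q → 1, both entropies computed this way approach the Shannon value $\frac{1}{2}\ln(2\pi e s^2)$, in line with $\exp_q x \to e^x$ and $\ln_q x \to \ln x$.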

Problem Formulation
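Let $D$ be the set of all PDFs on $\mathbb{R}$. We then define the set, as introduced in [24],
$$C_q = \left\{p \in D : \langle\!\langle p^q(x)\rangle\!\rangle < \infty,\ \langle\!\langle x^2 p^q(x)\rangle\!\rangle < \infty\right\}.$$
Following the problem formulation in [1,2,23,24,31], we first introduce the Tsallis entropy maximization problem for univariate PDFs p on $\mathbb{R}$:
$$\begin{aligned}
\text{T1:}\quad & \text{maximize} && H_q^{\text{Tsallis}}[p], \quad p \in C_q, && (2a)\\
& \text{subject to} && \langle\!\langle p(x)\rangle\!\rangle = 1, && (2b)\\
& && \langle\!\langle x^2\tilde P_q(x)\rangle\!\rangle = \sigma^2, && (2c)
\end{aligned}$$
in which q and $\sigma^2$ have fixed values, and $\tilde P_q(x) = p^q(x)/\langle\!\langle p^q(x)\rangle\!\rangle$. Note that $\langle\!\langle p^q(x)\rangle\!\rangle < \infty$ and $\langle\!\langle x^2 p^q(x)\rangle\!\rangle < \infty$ are assumed in T1. $\tilde P_q(x)$ is often called the escort probability [27,31]. This somewhat unusual form of expectation, $\langle\!\langle x^2\tilde P_q(x)\rangle\!\rangle = \sigma^2$, is called the q-normalized expectation [31] and has usually been assumed in Tsallis statistics. In contrast to the q-normalized expectation, as [31] pointed out, the usual expectation $\langle\!\langle x^2 p(x)\rangle\!\rangle = \sigma^2$ is also valid in Tsallis statistics. We note that the Tsallis entropy maximization problem under the constraint of this usual expectation is considered later in problem R2.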
For problem T1, using the Tsallis relative entropy, Furuichi [24] first proved that, for 0 < q < 3, the q-Gaussian distribution $p(x) = \frac{1}{Z_q}\exp_q(-\beta_q x^2)$ maximizes the Tsallis entropy among all univariate PDFs in $C_q$, where $Z_q$ and $\beta_q$ are constants determined by q and σ.
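Here, we formulate a slightly generalized optimization problem T2 as follows. First, replace (2c) with
$$\langle\!\langle (x^2-\sigma^2)\,p^q(x)\rangle\!\rangle = 0. \qquad (3c)$$
Note that now, as opposed to T1, neither $\langle\!\langle x^2 p^q(x)\rangle\!\rangle$ nor $\langle\!\langle p^q(x)\rangle\!\rangle$ is required to be finite; hence, $C_q$ is no longer required and is replaced with $D$. Next, notice that Tsallis entropy is maximal at the $p(x)$ for which $\langle\!\langle p^q(x)\rangle\!\rangle$ is minimal for q > 1 (correspondingly, maximal for 0 ≤ q < 1). Then, by introducing an additional arbitrary parameter $\lambda_q$, T1 is reformulated as
$$\begin{aligned}
\text{T2:}\quad & \text{minimize } (q > 1) \text{ or maximize } (0 \le q < 1)\\
& T_q[p;\lambda_q] = \sigma^2\langle\!\langle p^q(x)\rangle\!\rangle + \lambda_q\langle\!\langle(x^2-\sigma^2)p^q(x)\rangle\!\rangle = \left\langle\!\left\langle p^q(x)\left[\lambda_q x^2 + (1-\lambda_q)\sigma^2\right]\right\rangle\!\right\rangle, \quad p \in D, && (3a)\\
& \text{subject to } \langle\!\langle p(x)\rangle\!\rangle = 1 \ \ (3b) \ \text{ and (3c)},
\end{aligned}$$
where the constant $\sigma^2$ multiplies $\langle\!\langle p^q(x)\rangle\!\rangle$ in the first term of (3a) simply for notational convenience in the later analysis in Section 4. As opposed to the Tsallis entropy maximization problem T1, the Rényi entropy maximization problem is usually considered under the constraint of the usual expectation $\langle\!\langle x^2 p(x)\rangle\!\rangle = \sigma^2$, in other words,
$$\begin{aligned}
\text{R1:}\quad & \text{maximize} && H_q^{\text{Rényi}}[p], \quad p \in D, && (5a)\\
& \text{subject to} && \langle\!\langle p(x)\rangle\!\rangle = 1, && (5b)\\
& && \langle\!\langle x^2 p(x)\rangle\!\rangle = \sigma^2, && (5c)
\end{aligned}$$
which is equivalent to minimizing $\langle\!\langle p^q(x)\rangle\!\rangle$ for q > 1 (and maximizing it for 0 < q < 1) under (5b) and (5c). We note that this very problem for q > 1 was first posed and solved by Moriguti in 1952 [5]. (Later, in [39], the cases q > 1 and 0 < q < 1 were both analyzed in an n-dimensional spherically symmetric extension of [5] with the same approach as [5].) Similar to T1, by introducing an additional parameter $\lambda_q$ and the constraint
$$\langle\!\langle (x^2-\sigma^2)\,p(x)\rangle\!\rangle = 0, \qquad (6)$$
which is obtained from (5b) and (5c), R1 is reformulated as
$$\begin{aligned}
\text{R2:}\quad & \text{minimize } (q > 1) \text{ or maximize } (0 < q < 1)\\
& R_q[p;\lambda_q] = \langle\!\langle p^q(x)\rangle\!\rangle + \lambda_q\langle\!\langle(x^2-\sigma^2)p(x)\rangle\!\rangle = \left\langle\!\left\langle p^q(x)\left[1 + \lambda_q(x^2-\sigma^2)p^{1-q}(x)\right]\right\rangle\!\right\rangle, \quad p \in D, && (7a)\\
& \text{subject to (5b) and (5c)}.
\end{aligned}$$
As we observe in (3a) in T2 and (7a) in R2, both are inner products of two functions: $p^q(x)$ and $\lambda_q x^2 + (1-\lambda_q)\sigma^2$, and $p^q(x)$ and $1 + \lambda_q(x^2-\sigma^2)p^{1-q}(x)$, respectively. This suggests a direct link to Hölder's inequality.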

Hölder's Inequality for Later Analysis
Here, we provide the minimum information about Hölder's inequality needed for the analysis in Sections 3 and 4. The standard Hölder's inequality is given by
$$\|fg\|_1 \le \|f\|_\alpha\|g\|_\beta \qquad (8)$$
with $1 \le \alpha, \beta \le \infty$ and $\alpha^{-1} + \beta^{-1} = 1$ (cf. [40] for the one-dimensional case and [41] for general measurable functions). In general, f and g are measurable functions defined on a subset $S \subseteq \mathbb{R}^n$ with $\mu(S) > 0$, and we employ the compact notation
$$\|f\|_\alpha = \left(\int_S |f(s)|^\alpha\,d\mu(s)\right)^{\frac{1}{\alpha}}.$$
Although $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are no longer norms for $\alpha, \beta < 1$, in the context of this study we set $\alpha = q^{-1}$ and $\beta = (1-q)^{-1}$. Then, Hölder's inequality (8) takes the form
$$\|fg\|_1 \le \|f\|_{\frac{1}{q}}\|g\|_{\frac{1}{1-q}} \qquad (9)$$
(with the convention $1/0 = \infty$, so that q = 0 gives $\|fg\|_1 \le \|f\|_\infty\|g\|_1$). For the case 0 < q < 1, equality in (9) holds if and only if there exist constants A and B, not both 0, such that (cf. [40], p. 140)
$$A|f(s)|^{\frac{1}{q}} = B|g(s)|^{\frac{1}{1-q}} \qquad (\text{a.e. } s \in S). \qquad (10)$$
More specifically, if f is null (i.e., f(s) = 0 for a.e. s ∈ S), then B = 0; if g is null, then A = 0. In addition, for the exceptional case q = 0 (as well as q = 1), a condition for equality in (9) can be argued separately, as shown in Section 4.3, although the expression (10) is no longer valid in this case.
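A minimal numerical sketch of (9) and its equality condition (10) follows. It uses the counting measure on a finite set $S = \{0, \dots, n-1\}$ as a stand-in for a general measure (our simplification, so no quadrature is needed); the function names are hypothetical.

```python
import random


def power_norm(v, r):
    """(sum |v_i|^r)^(1/r) on the counting measure over {0, ..., n-1}."""
    return sum(abs(x) ** r for x in v) ** (1 / r)


def holder_sides(f, g, q):
    """Both sides of (9) for 0 < q < 1: (||fg||_1, ||f||_{1/q} * ||g||_{1/(1-q)})."""
    lhs = sum(abs(a * b) for a, b in zip(f, g))
    rhs = power_norm(f, 1 / q) * power_norm(g, 1 / (1 - q))
    return lhs, rhs


random.seed(0)
q = 0.3
f = [random.uniform(-2, 2) for _ in range(50)]
g = [random.uniform(-2, 2) for _ in range(50)]
print(holder_sides(f, g, q))     # first entry <= second entry

# Equality condition (10) with A = B = 1: |g|^(1/(1-q)) = |f|^(1/q), i.e. |g| = |f|^((1-q)/q)
g_eq = [abs(a) ** ((1 - q) / q) for a in f]
print(holder_sides(f, g_eq, q))  # both entries agree up to rounding
```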
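In contrast to (9), the reverse Hölder's inequality for q > 1 is given by
$$\|fg\|_1 \ge \|f\|_{\frac{1}{q}}\|g\|_{\frac{-1}{q-1}}, \qquad (11)$$
which is obtained directly from Hölder's inequality [40]. We note that f can be 0 over any subset $U \subseteq S$. As for g, on the other hand, we assume $g(s) \neq 0$ for almost every (a.e.) $s \in S$, taking care that the exponent $\frac{-1}{q-1} < 0$ in (11) (cf. [40], p. 140). Then, for the case q > 1, equality in (11) holds if and only if there exists $A \ge 0$ such that
$$|f(s)|^{\frac{1}{q}} = A|g(s)|^{\frac{-1}{q-1}} \qquad (\text{a.e. } s \in S). \qquad (12)$$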
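The reverse Hölder's inequality (11) and its equality condition (12) can be checked in the same way. Again, the counting measure on a finite set is our simplifying assumption, g is drawn away from 0 so that the negative exponent is well defined, and the names are hypothetical.

```python
import random


def power_mean_root(v, r):
    """(sum |v_i|^r)^(1/r) for any r != 0; for r < 0 every v_i must be nonzero."""
    return sum(abs(x) ** r for x in v) ** (1 / r)


def reverse_holder_sides(f, g, q):
    """Both sides of (11) for q > 1: (||fg||_1, ||f||_{1/q} * ||g||_{-1/(q-1)})."""
    lhs = sum(abs(a * b) for a, b in zip(f, g))
    rhs = power_mean_root(f, 1 / q) * power_mean_root(g, -1 / (q - 1))
    return lhs, rhs


random.seed(1)
q = 2.5
f = [random.uniform(-2, 2) for _ in range(50)]
# g must be nonzero on S: draw magnitudes away from 0 with random signs
g = [random.choice((-1, 1)) * random.uniform(0.1, 2) for _ in range(50)]
print(reverse_holder_sides(f, g, q))     # first entry >= second entry

# Equality condition (12) with A = 1: |f|^(1/q) = |g|^(-1/(q-1)), i.e. |f| = |g|^(-q/(q-1))
f_eq = [abs(b) ** (-q / (q - 1)) for b in g]
print(reverse_holder_sides(f_eq, g, q))  # both entries agree up to rounding
```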

Main Results
In this study, we focus on univariate PDFs on $\mathbb{R}$, and we consider f(x) and g(x) defined on $\mathbb{R}$ as a special case of the general f(s) and g(s) in Section 2.3. Hereafter, we refer to (10) and (12) as the equality conditions of Hölder's inequality and of the reverse Hölder's inequality, respectively. These equality conditions allow us to obtain our results systematically.
Let p(x) be a univariate PDF defined on $\mathbb{R}$, and assume that p(x) is a measurable function that is integrable with respect to x. In addition, let B(·, ·) denote the Beta function (cf. [42], p. 253). Then, the following statements hold.

Corollary 1. For q ≥ 3, the Tsallis entropy $H_q^{\text{Tsallis}}[p]$ is bounded, but has no maximizer. Namely, there exist PDFs p(x) for which $H_q^{\text{Tsallis}}[p]$ approaches its upper bound $\frac{1}{q-1}$ (i.e., $\langle\!\langle p^q(x)\rangle\!\rangle \to 0$) without attaining it. (The idea for constructing such PDFs is from Tsukada and Suyari [23], where they proved that R1 for $q \le \frac{1}{3}$ becomes unbounded, i.e., $\langle\!\langle p^q(x)\rangle\!\rangle \to +\infty$.) The proof of this theorem (and corollary) is given in Section 4.1. As mentioned in Section 1, the above statement itself has already appeared in [24]. However, our proof is quite different from the one in [24], in the sense that it does not require a generalized relative entropy, and the maximizer is explicitly 'specified' (not 'given ab initio'). Namely, the reverse Hölder's inequality aids in finding the optimal solution.
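The outline of the proof is as follows. First, for 1 < q < 3, the maximization of the Tsallis entropy $H_q^{\text{Tsallis}}[p]$, in other words, the minimization of $T_q[p;\lambda_q]$ in (3a), is related to the reverse Hölder's inequality (11). Second, we observe that $T_q[p;\lambda_q]$ has a lower bound through the reverse Hölder's inequality. Third, the minimizer $p_{\text{opt}}(x)$ achieving this bound is explicitly and uniquely constructed from the equality condition (12):
$$p_{\text{opt}}(x) \propto \left[\lambda_{q,\text{opt}}x^2 + (1-\lambda_{q,\text{opt}})\sigma^2\right]^{-\frac{1}{q-1}}, \qquad \lambda_{q,\text{opt}} = \frac{1}{2}(q-1),$$
normalized so that $\langle\!\langle p_{\text{opt}}(x)\rangle\!\rangle = 1$.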
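The construction for 1 < q < 3 can be sanity-checked with closed-form Beta integrals. The sketch below assumes, as the inner-product structure of (3a) suggests, $f = p^q$ and $g(x) = \lambda_q x^2 + (1-\lambda_q)\sigma^2$ in the reverse Hölder's inequality, so that (12) gives $p_{\text{opt}} \propto g^{-1/(q-1)}$ with $\lambda_{q,\text{opt}} = (q-1)/2$. It checks the constraint (3c), that the reverse-Hölder lower bound is attained, and that $\langle\!\langle p^q\rangle\!\rangle$ is smaller than for a Gaussian satisfying the same q-normalized constraint (a Gaussian with variance $q\sigma^2$). Function names are hypothetical.

```python
import math


def beta(a, b):
    """Euler Beta function B(a, b) via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def theorem1_candidate(q, sigma):
    """For 1 < q < 3: p_opt proportional to g(x)^(-1/(q-1)), g(x) = lam*x^2 + (1-lam)*sigma^2,
    lam = (q-1)/2. Writing g = c*(1 + b*x^2), all integrals follow from
    int (1 + b x^2)^(-k) dx = b^(-1/2) B(1/2, k - 1/2) and
    int x^2 (1 + b x^2)^(-k) dx = b^(-3/2) B(3/2, k - 3/2)."""
    lam = (q - 1) / 2
    c = (1 - lam) * sigma ** 2          # g(0)
    b = lam / c                         # so that g = c * (1 + b x^2)
    k = 1 / (q - 1)                     # p_opt ~ (1 + b x^2)^(-k)
    m = q * k                           # p_opt^q ~ (1 + b x^2)^(-m)
    Z = b ** -0.5 * beta(0.5, k - 0.5)                # normalizes p_opt, cf. (3b)
    I0 = b ** -0.5 * beta(0.5, m - 0.5) / Z ** q      # <<p_opt^q>>
    I2 = b ** -1.5 * beta(1.5, m - 1.5) / Z ** q      # <<x^2 p_opt^q>>
    T = lam * I2 + (1 - lam) * sigma ** 2 * I0        # T_q[p_opt; lam], cf. (3a)
    # Reverse-Hoelder bound ||p^q||_{1/q} ||g||_{-1/(q-1)}, where ||p^q||_{1/q} = <<p>>^q = 1
    g_int = c ** -k * b ** -0.5 * beta(0.5, k - 0.5)  # int g^(-1/(q-1)) dx
    bound = g_int ** (-(q - 1))
    return I0, I2, T, bound


def gaussian_moment_escort(q, sigma):
    """<<p^q>> for the Gaussian whose q-normalized variance is sigma^2 (its variance is q*sigma^2)."""
    s2 = q * sigma ** 2
    return (2 * math.pi * s2) ** ((1 - q) / 2) / math.sqrt(q)


q, sigma = 2.0, 1.0
I0, I2, T, bound = theorem1_candidate(q, sigma)
print(I2 / I0)                               # q-normalized variance; (3c) requires sigma^2
print(T, bound)                              # the lower bound is attained
print(I0, gaussian_moment_escort(q, sigma))  # smaller <<p^q>> means larger Tsallis entropy (q > 1)
```

For q = 2 the candidate is the Cauchy density, for which $\langle\!\langle p^2\rangle\!\rangle = 1/(2\pi)$ when σ = 1.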

Remark 1.
Even if we assume the additional constraint: p ∈ C q (⊂ D) in T2, the proof of this theorem (as well as of Theorems 2 and 3) remains the same, since we do not require the finiteness of x 2 p q (x) and p q (x) (i.e., p ∈ C q ) in the proof.

Remark 2.
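Another simple proof of the optimality of $p_{\text{opt}}$ is as follows. The idea is due to Moriguti's argument (cf. [5], p. 288), where $p^q_{\text{opt}}(x)$ and any $p^q(x)$ are directly related by the Taylor expansion, for each $x \in \mathbb{R}$,
$$p(x) = p_{\text{opt}}(x) + \frac{1}{q}p_{\text{opt}}^{1-q}(x)\left[p^q(x) - p^q_{\text{opt}}(x)\right] - \frac{q-1}{2q^2}\,p_{\text{int}}^{1-2q}(x)\left[p^q(x) - p^q_{\text{opt}}(x)\right]^2, \qquad (14)$$
where $p_{\text{int}}(x)\ (\ge 0)$ takes a value between p(x) and $p_{\text{opt}}(x)$. Substituting (13) into the second term on the right-hand side of (14), we have (15). With the constraints (3b) and (3c), integrating (15) over $\mathbb{R}$ yields $\langle\!\langle p^q(x)\rangle\!\rangle \ge \langle\!\langle p^q_{\text{opt}}(x)\rangle\!\rangle$ for q > 1, since the last term in (14) is nonpositive. Hence, $p_{\text{opt}}$ in (13) is a unique optimal solution to T2, as equality holds only if $p = p_{\text{opt}}$.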

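Theorem 2. (Tsallis entropy maximization for 0 < q < 1) The q-Gaussian distribution $p_{\text{opt}}(x)$ in (17) is the unique maximizer of the Tsallis entropy $H_q^{\text{Tsallis}}[p]$ under the constraints (3b) and (3c). The proof of this theorem is given in Section 4.2. In the case 0 < q < 1, the maximization of $T_q[p;\lambda_q]$ is recast as Hölder's inequality (9), where, similar to the argument in the proof of Theorem 1, the construction of $p_{\text{opt}}$ and $\lambda_{q,\text{opt}}$ and the verification of its optimality are carried out simultaneously. The maximizer $p_{\text{opt}}$ for 0 < q < 1 is uniquely determined from the equality condition (10):
$$p_{\text{opt}}(x) = \begin{cases}\dfrac{1}{Z_q}\left[\lambda_{q,\text{opt}}x^2 + (1-\lambda_{q,\text{opt}})\sigma^2\right]^{\frac{1}{1-q}}, & x \in \tilde S_{q,\text{opt}},\\[1ex] 0, & x \in \mathbb{R}\setminus\tilde S_{q,\text{opt}},\end{cases} \qquad (17)$$
where the normalization constant $Z_q$ and $\lambda_{q,\text{opt}} = \frac{1}{2}(q-1)\ (< 0)$ are uniquely determined, and the associated support $\tilde S_{q,\text{opt}}$ is uniquely determined as
$$\tilde S_{q,\text{opt}} = \left(-\sqrt{1-\lambda_{q,\text{opt}}^{-1}}\,\sigma,\ \sqrt{1-\lambda_{q,\text{opt}}^{-1}}\,\sigma\right).$$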
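The compactly supported case 0 < q < 1 can be checked in the same closed-form way. In this sketch (an illustration under our own assumptions, with hypothetical names), $g(x) = \lambda_{q,\text{opt}}x^2 + (1-\lambda_{q,\text{opt}})\sigma^2$ with $\lambda_{q,\text{opt}} = (q-1)/2 < 0$ is positive exactly on $(-a, a)$, $a = \sigma\sqrt{1-\lambda_{q,\text{opt}}^{-1}}$, and $p_{\text{opt}} \propto g^{1/(1-q)}$ there. We check the support width, the constraint (3c), and that $\langle\!\langle p^q\rangle\!\rangle$ exceeds that of a Gaussian with the same q-normalized variance.

```python
import math


def beta(a, b):
    """Euler Beta function B(a, b) via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def theorem2_candidate(q, sigma):
    """For 0 < q < 1: p_opt proportional to g(x)^(1/(1-q)) where
    g(x) = lam*x^2 + (1-lam)*sigma^2 > 0, with lam = (q-1)/2 < 0.
    On (-a, a), g = c*(1 - x^2/a^2), and int_{-a}^{a} x^(2j) (1 - x^2/a^2)^e dx = a^(2j+1) B(j+1/2, e+1)."""
    lam = (q - 1) / 2
    a = sigma * math.sqrt(1 - 1 / lam)   # half-width of the support
    kappa = 1 / (1 - q)                  # p_opt ~ (1 - x^2/a^2)^kappa
    e = q * kappa                        # p_opt^q ~ (1 - x^2/a^2)^e
    Z = a * beta(0.5, kappa + 1)
    I0 = a * beta(0.5, e + 1) / Z ** q          # <<p_opt^q>>
    I2 = a ** 3 * beta(1.5, e + 1) / Z ** q     # <<x^2 p_opt^q>>
    return a, I0, I2


def gaussian_moment_escort(q, sigma):
    """<<p^q>> for the Gaussian whose q-normalized variance is sigma^2 (its variance is q*sigma^2)."""
    s2 = q * sigma ** 2
    return (2 * math.pi * s2) ** ((1 - q) / 2) / math.sqrt(q)


q, sigma = 0.5, 1.0
a, I0, I2 = theorem2_candidate(q, sigma)
print(a, I2 / I0)                            # support half-width; (3c) requires I2/I0 = sigma^2
print(I0, gaussian_moment_escort(q, sigma))  # larger <<p^q>> means larger Tsallis entropy (0 < q < 1)
```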

Remark 3.
Another simple proof of the optimality of $p_{\text{opt}}$ is given by following Moriguti's argument ([5], p. 288). Similar to the case 1 < q < 3 in Remark 2, (14) holds for $x \in \tilde S_{q,\text{opt}} \subset \mathbb{R}$. For $x \in \mathbb{R}\setminus\tilde S_{q,\text{opt}}$, $p_{\text{opt}}(x) = 0$ from (17). We then have (18), where $\langle\!\langle\cdot\rangle\!\rangle_{\mathbb{R}\setminus\tilde S_{q,\text{opt}}} = \int_{\mathbb{R}\setminus\tilde S_{q,\text{opt}}}\cdot\,dx$ is used for notational simplicity. Integrating (18) over $\mathbb{R}$, for any p satisfying the constraints (3b) and (3c), we find (19), since the first term on the right-hand side is ≤ 0 as $\langle\!\langle p(x)\rangle\!\rangle_{\tilde S_{q,\text{opt}}} \le 1$, the second term is ≤ 0 from the definition of $\tilde S_{q,\text{opt}}$ and the fact that 0 < q < 1, and the third term is ≤ 0 because of the definition of $\tilde S_{q,\text{opt}}$. Since equality in (19) holds only if $p = p_{\text{opt}}$, $p_{\text{opt}}$ in (17) is a unique optimal solution to T2.
The proof of this theorem is given in Section 4.3, where the associated Hölder's inequality takes the form $\|fg\|_1 \le \|f\|_\infty\|g\|_1$, and we follow the arguments in the proof of Theorem 2 for 0 < q < 1. (This exceptional case (q = 0) is also considered in T1, where the same result is obtained through a more direct graphical argument, after proving that any candidate $p^*(x)$ for the maximizer $p_{\text{opt}}(x)$ is defined only on a simply connected interval $S^*$ that is symmetric about the origin O. The proof is straightforward but lengthy, so we omit it here.) However, as opposed to the case 0 < q < 1, the equality condition is not available in the form of (10) for q = 0, and we directly verify that $f^*(x) = \mathrm{sgn}[g(x)]$ is the unique solution satisfying the equality in (9), as shown in Lemma 1. Namely, for any feasible solution p(x) satisfying the constraints (3b) and (3c), we find that $f^*(x)$ is associated with the unique maximizer $p_{\text{opt}}(x)$ of $T_q[p;\lambda_q]$ from (52), and hence, $p_{\text{opt}}(x)$ for q = 0 is obtained as in (20).

Remark 4.
We note that the optimal solution shown in ( [2], p. 2399, Figure 1) for q = 0, which is obtained by setting q → 0 in (17), is a special case of (20).
The proof of this theorem is given in Section 4.4. The minimization of $R_q[p;\lambda_q]$ in (7b) for q > 1 is related to the reverse Hölder's inequality (11). In contrast to those of Theorems 1–3, the proof of Theorem 4 consists of two steps. In the first step, we construct a candidate for the minimizer (i.e., $p_{\text{opt}}(x)$; see (61) below), whose support is $\tilde S_{\text{opt}}$, and we determine the associated $\lambda_{q,\text{opt}}$ and $\tilde S_{\text{opt}}$ through the equality condition of the reverse Hölder's inequality. In doing so, as shown in Figure 1, we introduce a subset Q of the feasible solutions p(x), which satisfies the constraints (5b) and (5c) and an additional constraint (A1) (equivalently, $p(x) \ge \left(\max\{\lambda_{q,\text{opt}}(\sigma^2 - x^2), 0\}\right)^{\frac{1}{q-1}}$; see the Appendix). In the second step, after obtaining a candidate $p_{\text{opt}}(x) \in Q$, we verify that this $p_{\text{opt}}(x)$ is indeed the unique minimizer of $R_q[p;\lambda_q]$ by directly comparing $\langle\!\langle p^q_{\text{opt}}(x)\rangle\!\rangle$ and $\langle\!\langle p^q(x)\rangle\!\rangle$ for any feasible solution p(x) satisfying the constraints (5b) and (5c).
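For Rényi entropy maximization with q > 1, the sketch below checks a Moriguti-type candidate $p \propto \left(1 - x^2/a^2\right)_+^{1/(q-1)}$ whose support edge $a = \sigma\sqrt{(3q-1)/(q-1)}$ is the value quoted in the Appendix. Taking this form as given (an assumption of the sketch; the function names are hypothetical), it verifies the usual second-moment constraint (5c) and that $\langle\!\langle p^q\rangle\!\rangle$ is smaller than for the Gaussian of variance $\sigma^2$, i.e., the Rényi entropy is larger.

```python
import math


def beta(a, b):
    """Euler Beta function B(a, b) via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def theorem4_candidate(q, sigma):
    """For q > 1: p proportional to (1 - x^2/a^2)_+^(1/(q-1)) with a = sigma*sqrt((3q-1)/(q-1)).
    Returns (<<x^2 p>>, <<p^q>>) from int_{-a}^{a} x^(2j) (1 - x^2/a^2)^e dx = a^(2j+1) B(j+1/2, e+1)."""
    a = sigma * math.sqrt((3 * q - 1) / (q - 1))
    e = 1 / (q - 1)
    Z = a * beta(0.5, e + 1)                       # normalization, cf. (5b)
    second_moment = a ** 3 * beta(1.5, e + 1) / Z  # usual expectation, cf. (5c)
    I0 = a * beta(0.5, q * e + 1) / Z ** q         # <<p^q>>
    return second_moment, I0


def gaussian_moment(q, s):
    """<<p^q>> for a zero-mean Gaussian with standard deviation s."""
    return (2 * math.pi * s * s) ** ((1 - q) / 2) / math.sqrt(q)


q, sigma = 2.0, 1.0
second_moment, I0 = theorem4_candidate(q, sigma)
print(second_moment)                  # equals sigma^2 up to rounding
print(I0, gaussian_moment(q, sigma))  # smaller <<p^q>> means larger Renyi entropy (q > 1)
```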

Remark 5.
We note that the first proof for this optimality of p opt has been given in Moriguti [5], in which the essential idea is the Taylor expansion shown in the argument below (18).
The proof of this theorem is given in Section 4.5. Maximization of $R_q[p;\lambda_q]$ is related to Hölder's inequality (9), and the proof follows two steps, similar to the proof of Theorem 4. In the first step, we construct a candidate for the maximizer (i.e., $p_{\text{opt}}(x)$, given below by (72)) and determine $\lambda_{q,\text{opt}}$ through the equality condition of Hölder's inequality. In the second step, after obtaining a candidate $p_{\text{opt}}$, we verify that this $p_{\text{opt}}$ is indeed the unique maximizer of $R_q[p;\lambda_q]$ by directly comparing $\langle\!\langle p^q_{\text{opt}}(x)\rangle\!\rangle$ and $\langle\!\langle p^q(x)\rangle\!\rangle$ for any feasible solution p(x) satisfying the constraints (5b) and (5c), as in the proof of Theorem 4. Although omitted here, using essentially the same argument as in Remark 1, another simple proof based on Moriguti [5] is possible.

Remark 6.
Tsukada and Suyari [23] proved that R1 for $0 < q \le \frac{1}{3}$ becomes unbounded. For the exceptional case q = 0, the upper and lower bounds of $\langle\!\langle p^q(x)\rangle\!\rangle$ (= $\langle\!\langle p^0(x)\rangle\!\rangle$) are argued as follows. First, the Gaussian distribution satisfying (5b) and (5c) gives $\langle\!\langle p^0(x)\rangle\!\rangle = \langle\!\langle 1\rangle\!\rangle_{\mathbb{R}} = \infty$, which implies that there is no maximizer. Next, consider the particular distribution
$$\Delta(x) = \begin{cases}\delta^{-1}, & x \in [\sigma, \sigma+\delta],\\ 0, & \text{otherwise},\end{cases}$$
with δ > 0. This Δ(x) satisfies $\langle\!\langle\Delta(x)\rangle\!\rangle = \delta^{-1}\delta = 1$ in (5b), and it satisfies $\langle\!\langle x^2\Delta(x)\rangle\!\rangle = \sigma^2$ in (5c) when δ is arbitrarily small, in other words, $\langle\!\langle x^2\Delta(x)\rangle\!\rangle = \frac{(\sigma+\delta)^3 - \sigma^3}{3\delta} \to \sigma^2$ as δ → 0. This particular distribution gives $\langle\!\langle\Delta^0(x)\rangle\!\rangle = \langle\!\langle 1\rangle\!\rangle_{[\sigma,\sigma+\delta]} = \delta \to 0$ (δ → 0), which implies that there is no minimizer. Therefore, problem R1 (and R2) has neither a maximizer nor a minimizer for q = 0.
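For Theorem 5 (1/3 < q < 1), the formula (72) is not reproduced here, so the following sketch assumes the heavy-tailed form that the equality condition (10) points to in this regime, $p \propto (1 + b x^2)^{-1/(1-q)}$, with b fixed by the usual second-moment constraint (5c). Under that assumption it checks (5c) and that $\langle\!\langle p^q\rangle\!\rangle$ exceeds the Gaussian value, i.e., the Rényi entropy is larger. The restriction q > 1/3 appears naturally as the condition for a finite second moment. Names are hypothetical.

```python
import math


def beta(a, b):
    """Euler Beta function B(a, b) via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def theorem5_candidate(q, sigma):
    """For 1/3 < q < 1 (assumed form): p proportional to (1 + b x^2)^(-k), k = 1/(1-q),
    with b chosen so that <<x^2 p>> = sigma^2; returns (<<x^2 p>>, <<p^q>>)."""
    k = 1 / (1 - q)
    b = 1 / (sigma ** 2 * (2 * k - 3))   # finite variance needs k > 3/2, i.e. q > 1/3
    Z = b ** -0.5 * beta(0.5, k - 0.5)
    second_moment = b ** -1.5 * beta(1.5, k - 1.5) / Z
    I0 = b ** -0.5 * beta(0.5, q * k - 0.5) / Z ** q
    return second_moment, I0


def gaussian_moment(q, s):
    """<<p^q>> for a zero-mean Gaussian with standard deviation s."""
    return (2 * math.pi * s * s) ** ((1 - q) / 2) / math.sqrt(q)


q, sigma = 0.5, 1.0
second_moment, I0 = theorem5_candidate(q, sigma)
print(second_moment)                  # equals sigma^2 up to rounding
print(I0, gaussian_moment(q, sigma))  # larger <<p^q>> means larger Renyi entropy (q < 1)
```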

Proof of Main Results
Following the outlines leading to Theorems 1-5 in Section 3, here we give their proofs.
(24c) implies that $T_q[p;\lambda_q]$ has a lower bound (i.e., the Tsallis entropy $H_q^{\text{Tsallis}}[p]$ in (1) has an upper bound).

Proof of Lemma 1 and Theorem 3
Let p be an arbitrary feasible solution to T2 for q = 0, and let $p_{\text{opt}}$ be its optimal solution, which is eventually constructed in (53). Let $\lambda_{q,\text{opt}}$ be a particular value of the additional parameter $\lambda_q$ in T2, which is associated with $p_{\text{opt}}$ and is eventually constructed in (54). Then, for any p and the particular $\lambda_{q,\text{opt}}$ (= −1/2 in (54)), we define f and g as
$$f(x) = p^0(x), \qquad (45a)$$
$$g(x) = \lambda_{q,\text{opt}}x^2 + (1-\lambda_{q,\text{opt}})\sigma^2, \qquad (45b)$$
and we define an interval
$$\tilde S_{q,\text{opt}} = \{x \in \mathbb{R} : g(x) > 0\}. \qquad (46)$$
In (45a), as a convention, we take $0^0 = 0$, and $p(x) < \infty$ (a.e. $x \in \tilde S_{q,\text{opt}}$). Then, $\|f\|_\infty = 1$ follows from (45a). Now, we define $f^*$ as
$$f^*(x) = \mathrm{sgn}[g(x)] \qquad (x \in \tilde S_{q,\text{opt}}). \qquad (47)$$
This particular $f^*(x) = \mathrm{sgn}[g(x)]$ is proved to be the unique maximizer of $\|fg\|_{1,\tilde S_{q,\text{opt}}}$ in the following Lemma 1 (a minor modification of Lemma 4 in [35]).

Lemma 1. (cf. Lemma 4 in [35]). Let S be an arbitrary subset of $\mathbb{R}$ with $\mu(S) > 0$. For $f \in L^\infty(S)$ and $g \in L^1(S)$, assume $g(x) \neq 0$ a.e. on S. Then, $f^*(x) = \mathrm{sgn}[g(x)]$ (a.e. $x \in S$) is the unique maximizer of the functional $\|fg\|_{1,S}$ in (9).
Proof of Lemma 1. First, thanks to Hölder's inequality (9), $\|fg\|_{1,S}$ is maximized by $f^*$, since $\|f^*g\|_{1,S} = \|g\|_{1,S} = \|f^*\|_{\infty,S}\|g\|_{1,S}$, i.e., $f^*$ attains equality in (9) with q = 0. Second, the uniqueness of this maximizer $f^*$ is shown by contradiction, as follows. Suppose another maximizer $\bar f^*$ exists, in other words, $\|\bar f^* g\|_{1,S} = \|g\|_{1,S}$. Then, for any given $g \in L^1(S)$, (48) is satisfied. Now, using the identities $f^*(x)g(x) = \mathrm{sgn}[g(x)]\cdot g(x) \ge 0$ and $|g(x)| = f^*(x)g(x)$, we obtain (49). Substituting (49) into the left-hand side of (48) and using $|g(x)| = f^*(x)g(x)$, (48) is rewritten as (50). Now, keeping in mind $0 \le |\bar f^*(x)| \le 1$ and the assumption that $g(x) \neq 0$ a.e. on S, (50) implies $|\bar f^*(x)| = 1$; equivalently, $\bar f^*(x)$ takes either the value −1 or 1 (a.e. on S). However, among such functions $\bar f^*$ taking only the values ±1, $\mathrm{sgn}[g(x)]$ (= $f^*$) is clearly the only one that maximizes $\langle\!\langle fg\rangle\!\rangle_S$. Thus, no $\bar f^*$ other than $f^*$ can exist, and the uniqueness of the maximizer $f^*$ is verified.
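Lemma 1 can be illustrated on a finite set with the counting measure (our simplification). The sketch compares $\langle\!\langle fg\rangle\!\rangle_S$, the quantity used in the last step of the proof above, for $f^* = \mathrm{sgn}[g]$ against random competitors with $|f| \le 1$; the names are hypothetical.

```python
import random


def inner_product(f, g):
    """Discrete stand-in for <<f g>>_S, using the counting measure on S = {0, ..., n-1}."""
    return sum(a * b for a, b in zip(f, g))


def sgn(x):
    """Sign function with sgn(0) = 0."""
    return (x > 0) - (x < 0)


rng = random.Random(2)
n = 40
# g is nonzero on S, as Lemma 1 assumes
g = [rng.choice((-1, 1)) * rng.uniform(0.05, 3.0) for _ in range(n)]
f_star = [sgn(x) for x in g]
best = inner_product(f_star, g)  # equals ||g||_{1,S}
print(best, sum(abs(x) for x in g))

# Competitors with |f| <= 1 that differ from sgn[g] do strictly worse
trials = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(500)]
print(max(inner_product(f, g) for f in trials) < best)
```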
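Proof of Theorem 3. First, we show that $T_q[p;\lambda_q]$ is bounded from above as follows:
$$\begin{aligned}
T_q[p;\lambda_q] &= T_q[p;\lambda_{q,\text{opt}}] = \langle\!\langle fg\rangle\!\rangle && (51a)\\
&\le \langle\!\langle fg\rangle\!\rangle_{\tilde S_{q,\text{opt}}} = \langle\!\langle |fg|\rangle\!\rangle_{\tilde S_{q,\text{opt}}} && (51b)\\
&= \|fg\|_{1,\tilde S_{q,\text{opt}}} \le \|f\|_{\infty,\tilde S_{q,\text{opt}}}\|g\|_{1,\tilde S_{q,\text{opt}}} && (51c)\\
&\le \|g\|_{1,\tilde S_{q,\text{opt}}} \quad (\text{the upper bound}), && (51d)
\end{aligned}$$
where $\langle\!\langle\cdot\rangle\!\rangle = \int_{-\infty}^{\infty}\cdot\,dx$, $\langle\!\langle\cdot\rangle\!\rangle_{\tilde S_{q,\text{opt}}} = \int_{\tilde S_{q,\text{opt}}}\cdot\,dx$, and $\|\cdot\|_{1,\tilde S_{q,\text{opt}}} = \int_{\tilde S_{q,\text{opt}}}|\cdot|\,dx$, and $\|f\|_{\infty,\tilde S_{q,\text{opt}}}$ is the infinity norm of f(x) on $\tilde S_{q,\text{opt}}$, in other words, the essential supremum of |f(x)| over $x \in \tilde S_{q,\text{opt}}$. The first "=" in (51a) follows from the fact that $T_q[p;\lambda_q] = \sigma^2\langle\!\langle p^q(x)\rangle\!\rangle + \lambda_q\langle\!\langle(x^2-\sigma^2)p^q(x)\rangle\!\rangle$ in (3a) is independent of the value of $\lambda_q$, since any feasible solution p(x) satisfies $\langle\!\langle(x^2-\sigma^2)p^q(x)\rangle\!\rangle = 0$ in (3c), and the second "=" in (51a) is immediate from (45). The "≤" in (51b) is obtained by the same argument as the inequalities (34b) and (35) in the proof of Theorem 2. On the other hand, the "=" in (51b) is immediate from $f(x)g(x) \ge 0$ ($\forall x \in \tilde S_{q,\text{opt}}$). The first "=" in (51c) follows from the definition of $\|\cdot\|_{1,\tilde S_{q,\text{opt}}}$, and the "≤" in (51c) follows from Hölder's inequality (9). The final "≤" in (51d) follows from the definition of f(x) in (45a), in other words, $\|f\|_{\infty,\tilde S_{q,\text{opt}}} \le 1$, and the resulting $\|g\|_{1,\tilde S_{q,\text{opt}}}$ gives the upper bound of $T_q[p;\lambda_q]$ if $\lambda_{q,\text{opt}}$ exists for the given q and σ.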
Next, we construct a maximizer $p_{\text{opt}}(x)$ achieving this bound and show its uniqueness, which is done by checking the conditions under which all three "≤" in (51) become "=". The "≤" in (51b) becomes "=" if and only if p(x) is positive only in $\tilde S_{q,\text{opt}}$, namely,
$$p(x) = 0 \qquad (\text{a.e. } x \in \mathbb{R}\setminus\tilde S_{q,\text{opt}}). \qquad (52)$$
In other words, the "≤" in (51b) becomes "<" if condition (52) is violated, which is easily verified from the graph of g(x) and the above argument for the "≤" in (51b). On the other hand, the "≤" in (51c) becomes "=" if and only if $f(x) = \mathrm{sgn}[g(x)]$ (a.e. $x \in \tilde S_{q,\text{opt}}$), due to Lemma 1 (simply by replacing S with $\tilde S_{q,\text{opt}}$ in Lemma 1). The final "≤" in (51d) becomes "=" if and only if $\|f\|_{\infty,\tilde S_{q,\text{opt}}} = 1$ in (51c). From these three conditions, f is uniquely determined as $f^*$ in (47), and from (45a) the associated maximizer $p_{\text{opt}}$ for q = 0 is obtained as
$$p_{\text{opt}}(x) > 0 \ \ (\text{a.e. } x \in \tilde S_{q,\text{opt}}), \qquad p_{\text{opt}}(x) = 0 \ \ (x \in \mathbb{R}\setminus\tilde S_{q,\text{opt}}), \qquad (53)$$
where $p_{\text{opt}}(x)$ should satisfy $\langle\!\langle p_{\text{opt}}(x)\rangle\!\rangle_{\tilde S_{q,\text{opt}}} = 1$. Finally, substituting (53) into the constraint (3c), we have $\langle\!\langle(x^2-\sigma^2)p^0_{\text{opt}}(x)\rangle\!\rangle = \langle\!\langle x^2-\sigma^2\rangle\!\rangle_{\tilde S_{q,\text{opt}}} = 0$, and from (46), $\lambda_{q,\text{opt}}$ and $\tilde S_{q,\text{opt}}$ are uniquely obtained as
$$\lambda_{q,\text{opt}} = -\frac{1}{2}, \qquad \tilde S_{q,\text{opt}} = \left(-\sqrt{3}\,\sigma,\ \sqrt{3}\,\sigma\right), \qquad (54)$$
respectively. This shows the uniqueness of the representation of $p_{\text{opt}}$ in (53). (This exceptional case q = 0 is also argued in T1, where the same result is obtained through a more direct graphical argument, after proving that any candidate $p^*(x)$ for the maximizer $p_{\text{opt}}(x)$ is defined only on a simply connected interval $S^*$ that is symmetric about the origin O. The proof is straightforward but lengthy, and we omit it here.)
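For q = 0, $\langle\!\langle p^0\rangle\!\rangle$ is simply the length of the support S of p, and (3c) reduces to $\int_S (x^2 - \sigma^2)\,dx = 0$. With $\lambda_{q,\text{opt}} = -1/2$, $g(x) = (3\sigma^2 - x^2)/2$ is positive exactly on $(-\sqrt{3}\sigma, \sqrt{3}\sigma)$, of length $2\sqrt{3}\sigma$. The sketch below is our own illustration, not the proof above: it rescales random unions of intervals so that they satisfy the q = 0 constraint and checks that none is longer than $2\sqrt{3}\sigma$. The underlying reason is the elementary fact that, among sets of length L, the centered interval minimizes $\int_S x^2\,dx$ (= $L^3/12$). Names are hypothetical.

```python
import math
import random


def rescaled_support_length(intervals, sigma):
    """Rescale a union of disjoint intervals S by x -> s*x so that (3c) with q = 0,
    i.e. int_S (x^2 - sigma^2) dx = 0, holds; return the new length |S| = <<p^0>>."""
    L = sum(b - a for a, b in intervals)                  # |S|
    M = sum((b ** 3 - a ** 3) / 3 for a, b in intervals)  # int_S x^2 dx
    s = sigma * math.sqrt(L / M)  # scaling sends L -> s*L and M -> s^3*M
    return s * L


def random_support(rng, pieces):
    """Disjoint intervals built from sorted random breakpoints in [-5, 5]."""
    pts = sorted(rng.uniform(-5, 5) for _ in range(2 * pieces))
    return [(pts[2 * i], pts[2 * i + 1]) for i in range(pieces)]


sigma = 1.0
optimum = 2 * math.sqrt(3) * sigma
print(rescaled_support_length([(-1.0, 1.0)], sigma), optimum)  # symmetric interval attains it
rng = random.Random(3)
lengths = [rescaled_support_length(random_support(rng, rng.randint(1, 4)), sigma)
           for _ in range(1000)]
print(max(lengths) <= optimum + 1e-12)
```

Any density that is positive a.e. on the optimal interval, such as the uniform density or the parabola obtained from (17) as q → 0, gives the same $\langle\!\langle p^0\rangle\!\rangle$.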

Proof of Theorem 4
Proof. Let p be an arbitrary feasible solution to R2 for q > 1, and let $p_{\text{opt}}$ be its optimal solution, which is eventually constructed in (61). Let $\lambda_{q,\text{opt}}$ be a particular value of the additional parameter $\lambda_q$ in R2, which is associated with $p_{\text{opt}}$ and is eventually constructed in (65). First, for any p and the particular $\lambda_{q,\text{opt}}$ in (65), we define f and g as in (56), and we define a set $\tilde S_q$ in $\mathbb{R}$ as in (57). Next, we introduce a subset Q of the feasible solutions p, which is proved to be non-empty in Appendix A.1. First, we show that (58) holds if $p \in Q$. The first "=" in (58a) follows from the fact that $R_q[p;\lambda_q]$ in (7c) is independent of the value of $\lambda_q$, since any feasible solution p(x) satisfies $\langle\!\langle(x^2-\sigma^2)p(x)\rangle\!\rangle = 0$ in (6), and the second "=" in (58a) is immediate from the definitions (56) and (57). The first "=" in (58b) is also immediate from (56). The "≥" in (58c) follows from the reverse Hölder's inequality (11). If $R_q[p;\lambda_q]$ achieves the lower bound, and $\|f\|_{\frac{1}{q},\tilde S_q}\|g\|_{\frac{1}{1-q},\tilde S_q}$ in (58c) saturates at this bound for $p_{\text{opt}}\ (\in Q)$ and $\lambda_{q,\text{opt}}$, then, from (58), the following has to be satisfied: $R_q[p_{\text{opt}};\lambda_{q,\text{opt}}] = \|f\|_{\frac{1}{q},\tilde S_q}\|g\|_{\frac{1}{1-q},\tilde S_q}$ = the lower bound.

Conclusions and Discussion
We have identified a direct link between generalized entropy and Hölder's inequality, which yields yet another proof of Rényi-Tsallis entropy maximization: the q-Gaussian distribution is obtained directly from the equality condition of Hölder's inequality, and its optimality is proved by Hölder's inequality (for Rényi entropy, with the aid of Moriguti's argument). The simplicity of the proofs of Tsallis entropy maximization (Theorems 1, 2, and 3) is worth noting; essentially, a few lines of inequalities (including Hölder's inequality) suffice for the proof.
As an analogy, what we have described in this study can be explained as mountain climbing. For Tsallis entropy maximization, the top of the mountain, in other words, the upper/lower bound, is clearly visible from the starting point. Namely, the bounds in (24c), (34d), and (51d) are explicitly given by q and σ. Therefore, all we need to do is keep climbing to the top, in other words, construct a series of inequalities (24), (34), and (51) that saturate at the bound. On the other hand, for Rényi entropy maximization, the top of the mountain is not clearly visible from the starting point. Namely, the upper/lower bound is not given only by q and σ but contains p(x), as in (58c) or (69c). Even in such a case, Hölder's inequality is still useful for finding a peak of the mountain; in other words, it leads to a candidate for the global optimum, and we then verify that this candidate is really the top by using a GPS (global positioning system). This GPS is obtained as in (66) or (79), thanks to Moriguti [5].
Our technique with Hölder's inequality plus the additional parameter $\lambda_q$ could also be useful for other inequalities (e.g., Young's inequality), and it is an interesting open problem to clarify what sort of optimization problems can be solved with such a technique.

Appendix A.1

Here, we illustrate an example of Q introduced in Section 4.4. Having obtained $p_{\text{opt}}$ and $\lambda_{q,\text{opt}}$ in Section 4.4, an element p (∈ Q), which satisfies (5b), (5c), and (A1), is constructed from $p_{\text{opt}}$ in (61) as follows. Figure A1 shows how we construct p from $p_{\text{opt}}$; the basic idea is that such a p is obtained simply by slightly modifying $p_{\text{opt}}$ at its edge while keeping the constraint (A1). First, we choose small adjacent intervals $I_1$ and $I_2$ inside the interval $\left(\sigma, \sqrt{\frac{3q-1}{q-1}}\,\sigma\right)$. This choice is consistent with the fact that the constraint (A1) is equivalent to $p(x) \ge \left(\max\{\lambda_{q,\text{opt}}(\sigma^2 - x^2), 0\}\right)^{\frac{1}{q-1}}$ (as shown by the red dotted line in Figure A1), and hence p(x) can be 0 in $\left(\sigma, \sqrt{\frac{3q-1}{q-1}}\,\sigma\right)$ (as observed in the inset of Figure A1). Second, we shift $I_1$, $I_2$, and the associated values of $p_{\text{opt}}(x)$ originally defined on $I_1$ and $I_2$ all together, while keeping the original value of $\langle\!\langle(x^2-\sigma^2)p(x)\rangle\!\rangle$ at 0. As shown in the inset of Figure A1, one option for this shift is to move $I_1$ to the right and $I_2$ to the left. Such small shifts always exist because of the continuity of the integral $\langle\!\langle(x^2-\sigma^2)p(x)\rangle\!\rangle$ with respect to $I_1$ and $I_2$. Note that the resulting p shown in the inset of Figure A1 satisfies (5b), (5c), and (A1). The p constructed above shows that Q is non-empty, and Q is straightforwardly verified to be convex.

Figure A1. Construction of p (∈ Q) from $p_{\text{opt}}$.