(Generalized) Maximum Cumulative Direct, Residual, and Paired Φ Entropy Approach

A distribution that maximizes an entropy can be found by applying two different principles. On the one hand, Jaynes (1957a,b) formulated the maximum entropy principle (MaxEnt) as the search for a distribution maximizing a given entropy under some given constraints. On the other hand, Kapur (1994) and Kesavan and Kapur (1989) introduced the generalized maximum entropy principle (GMaxEnt) as the derivation of an entropy for which a given distribution has the maximum entropy property under some given constraints. In this paper, both principles are considered for cumulative entropies. Such entropies depend either on the distribution function (direct), on the survival function (residual) or on both (paired). We incorporate cumulative direct, residual, and paired entropies in one approach called cumulative Φ entropies. Maximizing this entropy without any constraints produces an extremely U-shaped (=bipolar) distribution. Maximizing the cumulative entropy under the constraints of fixed mean and variance tries to transform a distribution in the direction of a bipolar distribution, as far as the constraints allow. A bipolar distribution represents so-called contradictory information, which is in contrast to minimum or no information. In the literature, only a few maximum entropy distributions for cumulative entropies have been derived to date. In this paper, we extend the results to well known flexible distributions (like the generalized logistic distribution) and derive some special distributions (like the skewed logistic, the skewed Tukey λ and the extended Burr XII distribution). The generalized maximum entropy principle is applied to the generalized Tukey λ distribution and the Fechner family of skewed distributions. Finally, cumulative entropies are estimated under the assumption that the data are drawn from a maximum entropy distribution. This estimator is applied to the daily S&P500 returns and to the time durations between mine explosions.


Introduction
For a continuous random variable with density f, the classical differential (Shannon) entropy is defined by
$$E_S(f) = -\int_{-\infty}^{\infty} f(x)\ln f(x)\,dx. \tag{1}$$
Maximizing (1) with respect to f under the constraint of observed power or L-moments gives maximum entropy (ME) densities (see, e.g., [1,2]). This ME solution represents a distributional model that is compatible with the minimum information given by the fixed constraints. The task of deriving ME densities is important, as they are the only reasonable distributions to use for estimation: lower entropy distributions would mean assuming information that we do not possess. However, under different constraints. In Section 5, we propose an estimator for the entropy generating function. Finally, we apply this estimator to real datasets. In Appendix A, we apply the theoretical results to seven families of cumulative Φ entropies (MaxEnt) or families of distributions (GMaxEnt).

What does Maximizing Cumulative Direct, Residual, and Paired Shannon Entropies Mean?
In this section, we first discuss the concept of 'contradictory information' in contrast to no or minimum information and show that contradictory information corresponds to U-shaped/bipolar distributions. Then, by comparing the results with those of maximizing differential, cumulative residual, and cumulative direct entropies, we show that maximizing cumulative paired entropies best reflects this situation. Next, we see that maximizing cumulative residual and cumulative direct entropies no longer leads to a U-shaped distribution if the support of the random variable is restricted to non-negative values (with Q(0) = 0). Overall, the focus of this section is on Shannon entropies. However, all insights carry over immediately to arbitrary cumulative entropies.
The traditional ME approach starts with the result that the uniform distribution has minimum information (= maximum entropy) under the constraint that the area under the density sums up to one. However, there is another concept of maximum entropy in fuzzy set [19] and uncertainty theory [24,39]. Transferred to probability theory, maximum uncertainty means that an event A with probability 0 < P(A) < 1 and its complementary event Ā with probability P(Ā) = 1 − P(A) have identical probability, i.e., P(A) = 1/2. Since the Shannon entropy −P(A) ln P(A) − (1 − P(A)) ln(1 − P(A)) is maximized for P(A) = 1/2, this kind of entropy can serve as the basis for an uncertainty measure. For a continuous random variable X, the ensemble of events (X ≤ x), x ∈ R, such that 0 < P(X ≤ x) < 1 can be considered. It is natural to measure the amount of uncertainty of X by −∫ P(X ≤ x) ln P(X ≤ x) dx − ∫ (1 − P(X ≤ x)) ln(1 − P(X ≤ x)) dx, with integration area R. We set 0 ln 0 = 0. Let F be the cumulative distribution function of X. Then the cumulative paired Shannon entropy is defined by
$$CPE_S(F) = -\int_{-\infty}^{\infty}\Big(F(x)\ln F(x) + \big(1-F(x)\big)\ln\big(1-F(x)\big)\Big)\,dx = -\int_0^1 \big(u\ln u + (1-u)\ln(1-u)\big)\,q(u)\,du, \tag{2}$$
with probability integral transformation u = F(x), quantile function Q(u) = F^{−1}(u), and quantile density q(u) = dQ(u)/du = 1/f(Q(u)) for u ∈ [0, 1], where f denotes the density of X. If X has a compact support [a, b], CPE_S(F) attains its maximum for F(x) = 1/2, a ≤ x < b. This corresponds to a so-called bipolar distribution with P(X = a) = P(X = b) = 1/2. For this bipolar distribution, CPE_S(F) = (b − a) ln 2, a < b, holds. Therefore, the cumulative paired Shannon entropy increases with b − a. In contrast, the classical Shannon entropy (E_S) of the two-point distribution takes the value ln 2 for all bipolar distributions, regardless of how far apart the two mass points are. Rao [5] identified this property as an important advantage of cumulative entropies over the differential entropy.
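The two forms of the cumulative paired Shannon entropy defined above (an integral over x and an integral over u with the quantile density q) can be checked against each other numerically. Below is a minimal sketch, assuming SciPy. It uses the standard logistic distribution, whose quantile density is q(u) = 1/(u(1 − u)) and for which both forms should equal π²/3. It also confirms the bipolar value (b − a) ln 2 for [a, b] = [0, 3]. The helper names are illustrative only.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import entr, expit


def paired_shannon(p):
    """-p ln p - (1 - p) ln(1 - p), with the convention 0 ln 0 = 0 (entr handles it)."""
    return entr(p) + entr(1.0 - p)


def cpe_from_cdf(cdf, lo, hi):
    """CPE_S(F): integral of the paired Shannon term of F(x) over x."""
    value, _ = quad(lambda x: paired_shannon(cdf(x)), lo, hi, limit=200)
    return value


def cpe_logistic_quantile_form():
    """CPE_S in the quantile form with q(u) = 1/(u(1-u)) (standard logistic).

    The integrand is symmetric about u = 1/2, so we integrate over [0, 1/2]
    and double the result, which avoids evaluating points that round to u = 1.
    """
    value, _ = quad(lambda u: paired_shannon(u) / (u * (1.0 - u)), 0.0, 0.5, limit=200)
    return 2.0 * value


cpe_x = cpe_from_cdf(expit, -np.inf, np.inf)   # expit is the standard logistic cdf
cpe_u = cpe_logistic_quantile_form()
# Bipolar distribution on [a, b] = [0, 3]: F(x) = 1/2 for 0 <= x < 3.
cpe_bipolar = cpe_from_cdf(lambda x: 0.5, 0.0, 3.0)

print(cpe_x, cpe_u, np.pi ** 2 / 3)   # all approximately 3.2899
print(cpe_bipolar, 3 * np.log(2))     # both approximately 2.0794
```

The symmetric integration range in the quantile form is a numerical convenience for this particular (symmetric) example, not part of the definition.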
The different behaviors of differential (Shannon) entropy and cumulative Shannon entropy are illustrated in Example 1.

Example 1.
We consider the symmetric beta distribution with density
$$f(x;\alpha) = \frac{1}{B(\alpha,\alpha)}\,x^{\alpha-1}(1-x)^{\alpha-1},\quad 0<x<1,\ \alpha>0,$$
where the parameter α ranges over [0.1, 2]. This range covers almost bipolar distributions (α = 0.1), the uniform distribution (α = 1), and bell-shaped distributions (α = 2). Figure 1 compares the values of the differential entropy and the cumulative paired Shannon entropy over this range of the parameter α. We see that the differential entropy is non-positive everywhere and attains its maximum for the uniform distribution (α = 1). In contrast, the cumulative paired Shannon entropy starts with its maximum value for the almost bipolar distribution and decreases monotonically as α increases. The usual ME task searches for densities with maximum entropy because we do not want to assume information that we do not possess, and densities based on minimum information are regarded as the only ones that can reasonably be used. In this paper, we propose to rely on bipolar distributions instead, as they provide contradictory information, which is even less useful for prediction than minimum or no information.
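The comparison in Example 1 can be reproduced numerically. Below is a sketch, assuming SciPy. It takes the differential entropy from `scipy.stats.beta(a, a).entropy()` and obtains CPE_S by integrating −F ln F − (1 − F) ln(1 − F) over [0, 1]. The grid of α values is an illustrative choice. The expected pattern is that the differential entropy peaks at α = 1 (value 0), while CPE_S decreases in α, with value 1/2 for the uniform distribution.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import entr
from scipy.stats import beta


def cpe_symmetric_beta(a):
    """Cumulative paired Shannon entropy of Beta(a, a) on [0, 1]."""
    dist = beta(a, a)
    # Use cdf and sf separately so that 1 - F(x) keeps its precision near x = 1.
    integrand = lambda x: entr(dist.cdf(x)) + entr(dist.sf(x))
    value, _ = quad(integrand, 0.0, 1.0, limit=200)
    return value


alphas = [0.1, 0.5, 1.0, 1.5, 2.0]
diff_entropy = [float(beta(a, a).entropy()) for a in alphas]
cpe = [cpe_symmetric_beta(a) for a in alphas]

for a, h, c in zip(alphas, diff_entropy, cpe):
    print(f"alpha={a:4.1f}  differential={h: .4f}  CPE_S={c:.4f}")
```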
The following examples intend to explain where bipolar distributions appear in real situations and how this bipolarity affects the predictability of a random variable X.

Example 2.
In an opinion poll survey, individuals are asked to judge their political belief on a continuous left-right scale, where 0 (100) symbolizes an extremely left (right) political view. The survey's result maximizes the cumulative paired Shannon entropy if half of the people state that they are extremely left (= 0) and the other half state that they are extremely right (= 100). This is a situation of maximum uncertainty regarding how to predict the political view of an individual person.

Example 3.
If the task is to judge a product on a Likert scale with five ordered categories, the uniform distribution means that no category is favored by the majority of the voters. However, there could be a voting result that is even more confusing than the uniform distribution. What can we learn from the extreme situation in which half of the voters give their vote to the best and the other half to the worst category? What does this mean for a new customer considering buying the product? In this situation, buying means receiving either an excellent or a very bad product. This is the situation in which it is most difficult to predict the customer's decision.
Both situations of Examples 2 and 3 can be characterized by the term 'contradictory information' in contrast to minimum or no information. In general, information reduces uncertainty. Contradictory information, however, is implicitly defined by the fact that it increases uncertainty and leads to a high chance of a wrong decision. (Anti-information is a related, but less formal concept introduced by the information scientist J. Verhoeff [40].) Therefore, as bipolar distributions lead to contradictory information, it is important to consider entropies that are maximized by a bipolar distribution in the absence of constraints. Thus, we propose to use cumulative paired entropies to cover contradictory information. Example 1 already showed that the differential entropy does not capture contradictory information. In the following, we compare the information provided by cumulative residual and cumulative direct entropies with that provided by cumulative paired entropies. Rao et al. [4] introduced cumulative residual entropies as a Shannon entropy in which the density is replaced by the survival function. The resulting cumulative residual Shannon entropy
$$CRE_S(F) = -\int_{-\infty}^{\infty}\big(1-F(x)\big)\ln\big(1-F(x)\big)\,dx \tag{3}$$
was subsequently discussed in [5–17]. Di Crescenzo and Longobardi [41] applied the Shannon entropy to a distribution function and called it cumulative entropy. This gave the formula
$$CDE_S(F) = -\int_{-\infty}^{\infty}F(x)\ln F(x)\,dx. \tag{4}$$
We will call (4) cumulative direct Shannon entropy (CDE_S(F)) to distinguish it from the cumulative residual Shannon entropy (CRE_S(F)) and the cumulative paired Shannon entropy (CPE_S(F)). What does maximum entropy mean for the cases of cumulative residual and cumulative direct Shannon entropy? The entropy generating function −(1 − u) ln(1 − u) of (3) attains its maximum for u = 1 − 1/e ≈ 0.632 > 0.5. If the support is [a, b], the maximum CRE_S distribution is therefore bipolar. However, this bipolarity is less extreme than in the symmetric case.
This is due to the fact that P(X = a) = 1 − 1/e and P(X = b) = 1/e hold. Therefore, there is a preference for the alternative a that makes the prediction of X easier than in the symmetric case. However, this is still somewhat contradictory information rather than information. Regarding (4), the probabilities for a and b have to be interchanged to get a maximum CDE_S distribution. The following example illustrates this for a beta distribution with parameters α and β.

Example 4.
Let X be beta distributed with density
$$f(x;\alpha,\beta) = \frac{1}{B(\alpha,\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},\quad 0<x<1,\ \alpha,\beta>0.$$
In the following, we fix β and compute α such that (3) or (4) is maximized. In Table 1, α_cre and α_cde denote the corresponding maximizing values of α. Moreover, this table also contains the maximum values of CRE_S and CDE_S. We see that the maximum is attained for small values of α and β, corresponding to a slightly asymmetric U-shaped beta distribution. Figure 2 illustrates the maximum CRE_S and CDE_S beta distributions for the parameter settings displayed in Table 1.

So far, the support has not been restricted. However, in reliability theory, for example, the focus is on random variables with non-negative support only. Thus, it is important to discuss this situation as well. If only random variables with non-negative support are considered and the ME quantile function Q satisfies Q(0) = 0, maximizing CRE_S or CDE_S gives a distribution that is no longer U-shaped, and the maximum entropy situation no longer corresponds to contradictory information. We illustrate this in Example 5 using a special beta distribution. The parameter β will be set to 1 such that Q(0) = 0. Figure 3 displays, on the top row, the maximum CRE_S distribution at α = 0.48 and shows that it is a compromise between an extremely right-skewed (α = 0.01) and an extremely left-skewed (α = 3) distribution. On the bottom row, we see the maximum CDE_S distribution at α = 1.
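The maximizers discussed above and in Example 5 can be checked numerically. Below is a sketch, assuming SciPy. It does four things:

1. Locates the maximizers 1 − 1/e and 1/e of the CRE_S and CDE_S generating functions.
2. Evaluates CRE_S = (b − a)/e for the corresponding bipolar distribution on [a, b] = [0, 1].
3. Maximizes CRE_S over α for Beta(α, 1), whose distribution function is F(x) = x^α. The result should land close to α = 0.48 as stated in the text.
4. Maximizes CDE_S over α for Beta(α, 1). The closed form CDE_S = α/(α + 1)² gives exactly α = 1.

The search interval for α is an illustrative choice.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import entr

# Entropy generating functions of CRE_S and CDE_S.
phi_cre = lambda u: entr(1.0 - u)   # -(1 - u) ln(1 - u)
phi_cde = lambda u: entr(u)         # -u ln u

u_cre = minimize_scalar(lambda u: -phi_cre(u), bounds=(0.0, 1.0), method="bounded").x
u_cde = minimize_scalar(lambda u: -phi_cde(u), bounds=(0.0, 1.0), method="bounded").x
print(u_cre, 1 - 1 / np.e)   # both approximately 0.632
print(u_cde, 1 / np.e)       # both approximately 0.368

# Bipolar CRE_S maximizer on [a, b]: F(x) = 1 - 1/e on [a, b), hence CRE_S = (b - a)/e.
a, b = 0.0, 1.0
cre_bipolar = (b - a) * phi_cre(1 - 1 / np.e)


def cre_beta1(alpha):
    """CRE_S of Beta(alpha, 1), whose cdf is F(x) = x**alpha on [0, 1]."""
    return quad(lambda x: entr(1.0 - x ** alpha), 0.0, 1.0, limit=200)[0]


def cde_beta1(alpha):
    """CDE_S of Beta(alpha, 1); equals alpha / (alpha + 1)**2 in closed form."""
    return quad(lambda x: entr(x ** alpha), 0.0, 1.0, limit=200)[0]


alpha_cre = minimize_scalar(lambda s: -cre_beta1(s), bounds=(0.01, 3.0), method="bounded").x
alpha_cde = minimize_scalar(lambda s: -cde_beta1(s), bounds=(0.01, 3.0), method="bounded").x
print(alpha_cre, alpha_cde)
```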
The question raised in the section title, namely what maximizing cumulative direct, residual, and paired Shannon entropies means, can be answered as follows. Maximizing these entropies leads to a more or less skewed U-shaped distribution as long as there are no special constraints (like Q(0) = 0) that prevent this. This U-shaped distribution corresponds to contradictory information. Examples 2 and 3 illustrated that this kind of information is even less useful for prediction and estimation than minimum or no information. Therefore, these are the only reasonable distributions to consider if we do not want to assume information that we do not possess.
In the following section, we unify the diverse approaches of cumulative entropies and introduce the general class of cumulative Φ entropies. Then, we derive two general formulas for ME quantile functions under some restrictions.

Maximum Cumulative Φ Entropy Distributions
In the following, we will first introduce the general class of cumulative Φ entropies that incorporates and generalizes well-known entropies. Then, we will derive general formulas for maximum entropy distributions for this new class regarding arbitrary support as well as non-negative support.

General Class of Cumulative Φ Entropies
In this section, we incorporate cumulative direct, residual, and paired entropies into one approach. Additionally, instead of focusing on the Shannon case, we allow for a general so-called entropy generating function φ, which has to be non-negative and concave on [0, 1]. Typically, but not necessarily, φ attains a maximum in the interval [0, 1]. The corresponding cumulative φ entropies are the cumulative paired φ entropy
$$CPE_\varphi(F) = \int_{-\infty}^{\infty}\Big(\varphi\big(F(x)\big) + \varphi\big(1-F(x)\big)\Big)\,dx,$$
the cumulative residual φ entropy
$$CRE_\varphi(F) = \int_{-\infty}^{\infty}\varphi\big(1-F(x)\big)\,dx,$$
and the cumulative direct φ entropy
$$CDE_\varphi(F) = \int_{-\infty}^{\infty}\varphi\big(F(x)\big)\,dx.$$
To cover all three cases in one approach, we consider a general concave entropy generating function Φ on [0, 1], e.g., Φ(u) = φ(u) + φ(1 − u), Φ(u) = φ(1 − u), or Φ(u) = φ(u). Then,
$$CE_\Phi(F) = \int_{-\infty}^{\infty}\Phi\big(F(x)\big)\,dx = \int_0^1 \Phi(u)\,q(u)\,du$$
will be called cumulative Φ entropy. For the maximum entropy task, the objective is now to maximize this cumulative Φ entropy with respect to F under distinct constraints. First, we consider cumulative Φ entropies in a situation with fixed mean and variance. The restriction to these two moments is due to the fact that higher moments lead to equations for the ME quantile function that either cannot be solved explicitly or have no solution. Then, we discuss the same task with the additional requirement that Q(0) = 0. In this case, a solution can only exist for special relations between the fixed mean and the fixed k-th power moment.
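To illustrate how one generating function Φ covers all three cases, the sketch below (assuming SciPy) writes the Shannon versions as Φ(u) = φ(u) + φ(1 − u), φ(1 − u) and φ(u) with φ(u) = −u ln u. It then evaluates CE_Φ(F) = ∫Φ(F(x)) dx for the standard exponential distribution. The reference values are:

- 1 for the residual case (the mean of the exponential),
- π²/6 − 1 for the direct case,
- their sum π²/6 for the paired case.

The dictionary and function names are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import entr

phi = entr  # Shannon case: phi(u) = -u ln u, with 0 ln 0 = 0

# The three cumulative entropies as special cases of one generating function Phi.
PHI = {
    "paired": lambda u: phi(u) + phi(1.0 - u),
    "residual": lambda u: phi(1.0 - u),
    "direct": lambda u: phi(u),
}


def cumulative_phi_entropy(Phi, cdf, lo, hi):
    """CE_Phi(F) = integral of Phi(F(x)) dx over the support [lo, hi]."""
    value, _ = quad(lambda x: Phi(cdf(x)), lo, hi, limit=200)
    return value


def exp_cdf(x):
    """Standard exponential cdf F(x) = 1 - exp(-x) for x >= 0."""
    return -np.expm1(-x)


values = {name: cumulative_phi_entropy(P, exp_cdf, 0.0, np.inf) for name, P in PHI.items()}
print(values)
```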
In Section 3.2, we consider the situation where mean and variance are fixed and in Section 3.3, the situation with the additional requirement of Q(0) = 0.

General Results for Arbitrary Support
The focus in this section is on the situation where mean and variance are fixed and the support is arbitrary. First, the maximum cumulative Φ entropy principle and then the generalized maximum cumulative Φ entropy principle are introduced. General formulas for ME quantile functions are provided.

Maximum Cumulative Φ Entropy Approach
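In this section, for a given entropy and fixed constraints, general formulas for ME distributions are derived. This maximum cumulative Φ entropy approach follows the maximum entropy principle in the sense of [42,43]. The following theorem provides a general formula for the ME quantile function Q. Theorem 1. Let CE_Φ be the cumulative Φ entropy with concave entropy generating function Φ such that the derivative Φ′ exists and is square integrable over [0, 1], and |Φ(0)| < ∞, |Φ(1)| < ∞ hold. Then, the maximum CE_Φ distribution under the constraints of fixed mean µ and variance σ² is given by the quantile function
$$Q(u) = \mu - \sigma\,\frac{\Phi'(u) - \big(\Phi(1)-\Phi(0)\big)}{\sqrt{\int_0^1 \big(\Phi'(v) - (\Phi(1)-\Phi(0))\big)^2\,dv}},\quad u\in[0,1]. \tag{8}$$
Proof. The objective function ∫Φ(F(x)) dx = ∫_0^1 Φ(u) q(u) du is maximized under the constraints ∫_0^1 Q(u) du = µ and ∫_0^1 Q(u)² du = σ² + µ². This leads to the Lagrange function
$$\int_0^1 \Big(\Phi(u)\,q(u) - l_1\,Q(u) - l_2\,Q(u)^2\Big)\,du$$
with l_1 and l_2 denoting the Lagrange parameters. Since q(u) = Q′(u), the Euler-Lagrange equation gives
$$-l_1 - 2\,l_2\,Q(u) - \Phi'(u) = 0.$$
Solving this equation leads to the quantile function
$$Q(u) = -\frac{\Phi'(u) + l_1}{2\,l_2}. \tag{9}$$
l_1 and l_2 are determined by the moments µ and σ². Rearranging µ = ∫_0^1 Q(u) du = −(Φ(1) − Φ(0) + l_1)/(2l_2) gives l_1 = −2l_2µ − (Φ(1) − Φ(0)) and
$$Q(u) - \mu = -\frac{\Phi'(u) - \big(\Phi(1)-\Phi(0)\big)}{2\,l_2}.$$
From
$$\sigma^2 = \int_0^1 \big(Q(u)-\mu\big)^2\,du = \frac{1}{4\,l_2^2}\int_0^1 \big(\Phi'(u) - (\Phi(1)-\Phi(0))\big)^2\,du,$$
solving with respect to l_2 (taking the positive root, so that Q is increasing) leads to
$$l_2 = \frac{1}{2\sigma}\sqrt{\int_0^1 \big(\Phi'(u) - (\Phi(1)-\Phi(0))\big)^2\,du}.$$
Inserting l_1 and l_2 into (9) gives the quantile function (8).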
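A numerical sketch of the ME quantile function of Theorem 1, assuming SciPy. It uses the form that follows from the Lagrange argument, Q(u) = µ − σ(Φ′(u) − c)/√∫_0^1(Φ′(v) − c)² dv with c = Φ(1) − Φ(0). The first check uses the paired Shannon generating function, where Φ′(u) = ln((1 − u)/u); this should reproduce the logistic quantile function µ + σ(√3/π) ln(u/(1 − u)), i.e., solution no. 1. Under the same reading, the residual Shannon choice Φ′(u) = ln(1 − u) + 1 gives Q(u) = µ − σ(1 + ln(1 − u)), an exponential distribution shifted to mean µ and standard deviation σ. The sketch checks those moments numerically. The values µ = 1, σ = 2 are arbitrary.

```python
import numpy as np
from scipy.integrate import quad


def me_quantile(dPhi, Phi0, Phi1, mu, sigma):
    """ME quantile function for fixed mean mu and standard deviation sigma.

    Q(u) = mu - sigma * (Phi'(u) - c) / sqrt(int_0^1 (Phi'(v) - c)^2 dv),
    with c = Phi(1) - Phi(0); dPhi is the derivative Phi'.
    """
    c = Phi1 - Phi0
    norm2, _ = quad(lambda v: (dPhi(v) - c) ** 2, 0.0, 1.0, limit=200)
    scale = sigma / np.sqrt(norm2)
    return lambda u: mu - scale * (dPhi(u) - c)


mu, sigma = 1.0, 2.0

# Paired Shannon: Phi(u) = -u ln u - (1-u) ln(1-u), Phi'(u) = ln((1-u)/u).
Q_pair = me_quantile(lambda u: np.log((1.0 - u) / u), 0.0, 0.0, mu, sigma)
# Logistic quantile with mean mu and standard deviation sigma, for comparison.
Q_logistic = lambda u: mu + sigma * np.sqrt(3.0) / np.pi * np.log(u / (1.0 - u))

# Residual Shannon: Phi(u) = -(1-u) ln(1-u), Phi'(u) = ln(1-u) + 1.
Q_res = me_quantile(lambda u: np.log1p(-u) + 1.0, 0.0, 0.0, mu, sigma)

for u in (0.1, 0.5, 0.9):
    print(u, Q_pair(u), Q_logistic(u), Q_res(u))

# Check the moment constraints of the residual case numerically.
mean_res = quad(Q_res, 0.0, 1.0)[0]
var_res = quad(lambda u: (Q_res(u) - mean_res) ** 2, 0.0, 1.0)[0]
```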

Generalized Maximum Cumulative Φ Entropy Approach
In this section, for a given quantile function Q, the corresponding generating function Φ of the cumulative Φ entropy will be derived. This generalized maximum cumulative Φ entropy approach follows the generalized maximum entropy principle formulated by [44]. We also use formula (8) for this approach. For a simpler notation, we introduce the partial mean function. Let X be the random variable corresponding to Q and f be the density of X. Then, the partial mean function µ(u) is given by
$$\mu(u) = \int_{-\infty}^{Q(u)} x\,f(x)\,dx = \int_0^u Q(v)\,dv,\quad u\in[0,1]. \tag{10}$$
Obviously, µ(0) = 0 and µ(1) = µ hold.
The following corollary states that the negative of the partial mean function determines the entropy generating function such that Q is the ME quantile function under the constraints of given mean µ and variance σ². Corollary 1. Let Q be a quantile function. The entropy generating function Φ, such that Q is ME under the constraints of given mean and variance, is given by
$$\Phi(u) = u\,\mu - \mu(u),\quad u\in[0,1]. \tag{11}$$
Indeed, (11) gives Φ′(u) = µ − Q(u) and Φ(0) = Φ(1) = 0, so that (8) returns Q. The partial mean function µ(u) therefore plays a special role. As µ(u) accumulates the values x of X weighted with the density f(x) up to the u-quantile of X, it is piecewise linear with a single kink at the median for an extremely U-shaped (bipolar) distribution. Moreover, the heavier the tails of a distribution, the steeper the entropy generating function Φ(u) at u = 0 and u = 1. This leads to large absolute values of the derivative Φ′(u) at u = 0 and u = 1. If the support is R, then lim_{u→0} Φ′(u) = ∞ and lim_{u→1} Φ′(u) = −∞. In line with the generalized maximum entropy principle, we will use (11) to derive Φ such that a given distribution has the ME property under the constraints of fixed mean and variance. In Section 5, based on (11), we will propose an estimator for Φ.
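As a round-trip check of the generalized approach, the sketch below (assuming SciPy) computes Φ(u) = uµ − µ(u) by numerical integration of the standard logistic quantile function. Here Φ is the entropy generating function built from the partial mean µ(u) = ∫_0^u Q(v) dv, which is how Corollary 1 is read here. The result should coincide with the paired Shannon generating function −u ln u − (1 − u) ln(1 − u), which is consistent with the logistic distribution being the ME distribution for that entropy. The helper name `phi_from_quantile` is hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import entr


def phi_from_quantile(Q):
    """Phi(u) = u * mu - mu(u), with the partial mean mu(u) = int_0^u Q(v) dv."""
    mu = quad(Q, 0.0, 1.0, limit=200)[0]
    return lambda u: u * mu - quad(Q, 0.0, u, limit=200)[0]


Q_logistic = lambda v: np.log(v / (1.0 - v))   # standard logistic quantile, mean 0
Phi = phi_from_quantile(Q_logistic)
paired_shannon = lambda u: entr(u) + entr(1.0 - u)

for u in (0.1, 0.3, 0.5, 0.8):
    print(u, Phi(u), paired_shannon(u))
```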

General Results for Non-Negative Support
In the literature, the ME task has so far mainly been considered for lifetime distributions with the special property that the support is (0, ∞). Therefore, in this section, we focus on the situation where, in addition to moment constraints (fixed mean and fixed k-th power moment, in particular fixed variance for k = 2), the support is restricted to (0, ∞). As in Section 3.2, first the maximum cumulative Φ entropy principle and then the generalized maximum cumulative Φ entropy principle are introduced for this situation, and general formulas for ME quantile functions are provided.

Maximum Cumulative Φ Entropy Approach
In this section, following the maximum cumulative Φ entropy approach, we derive a general formula for ME distributions for a given entropy and given constraints when the support of the ME distribution is (0, ∞), so that Q(0) = 0 holds for the ME quantile function Q. This provides an additional constraint for the ME task. As further constraints, we consider a fixed mean µ and a fixed k-th power moment µ_k, k > 1. The following theorem shows how to derive the ME quantile function under these three constraints. An ME solution exists only if the fixed moments µ and µ_k satisfy a special relationship.
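is monotonically increasing. Then, the ME quantile function under the constraints of given mean µ and k-th power moment µ_k is
$$Q(u) = \mu\,\frac{\big(\Phi'(0)-\Phi'(u)\big)^{1/(k-1)}}{\int_0^1 \big(\Phi'(0)-\Phi'(v)\big)^{1/(k-1)}\,dv},\quad u\in[0,1], \tag{12}$$
if
$$\frac{\mu_k}{\mu^k} = \frac{\int_0^1 \big(\Phi'(0)-\Phi'(u)\big)^{k/(k-1)}\,du}{\Big(\int_0^1 \big(\Phi'(0)-\Phi'(u)\big)^{1/(k-1)}\,du\Big)^{k}}. \tag{13}$$
Otherwise, there is no solution of the ME task.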

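Proof. Due to the Euler-Lagrange equation for the Lagrange function ∫_0^1 (Φ(u)q(u) − l_1Q(u) − l_2Q(u)^k) du, it is
$$Q(u)^{k-1} = -\frac{\Phi'(u) + l_1}{k\,l_2}.$$
The constraint Q(0) = 0 leads to l_1 = −Φ′(0), and k l_2 can be derived from
$$\mu = \int_0^1 Q(u)\,du = (k\,l_2)^{-1/(k-1)}\int_0^1 \big(\Phi'(0)-\Phi'(u)\big)^{1/(k-1)}\,du.$$
Inserting k l_2 into Q(u) gives (12) immediately. There is the third constraint
$$\mu_k = \int_0^1 Q(u)^k\,du = \mu^k\,\frac{\int_0^1 \big(\Phi'(0)-\Phi'(u)\big)^{k/(k-1)}\,du}{\Big(\int_0^1 \big(\Phi'(0)-\Phi'(u)\big)^{1/(k-1)}\,du\Big)^{k}}.$$
Dividing both sides by µ^k gives (13).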
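In the most popular case, mean and variance are fixed. This means k = 2 and, since ∫_0^1 Φ′(u) du = Φ(1) − Φ(0), condition (13) becomes
$$\frac{\mu_2}{\mu^2} = \frac{\sigma^2 + \mu^2}{\mu^2} = \frac{\int_0^1 \big(\Phi'(0)-\Phi'(u)\big)^2\,du}{\big(\Phi'(0) - \Phi(1) + \Phi(0)\big)^2}.$$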
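Below is a sketch, assuming SciPy, of the existence condition of Theorem 2, written as µ_k/µ^k = ∫_0^1 g^{k/(k−1)} du / (∫_0^1 g^{1/(k−1)} du)^k with g(u) = Φ′(0) − Φ′(u). The example is the cumulative residual Shannon entropy, for which Φ′(u) = ln(1 − u) + 1 and hence g(u) = −ln(1 − u). The quantile function (Φ′(0) − Φ′(u))^{1/(k−1)}, up to scaling, is then that of a Weibull distribution with shape k − 1. The moment ratio should therefore match Γ(1 + k/r)/Γ(1 + 1/r)^k with r = k − 1. In particular, it equals 2 for k = 2 (exponential case). The helper name is hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma


def moment_ratio_13(dPhi, k):
    """Value that mu_k / mu^k must take for an ME solution with Q(0) = 0.

    With g(u) = Phi'(0) - Phi'(u):
    int_0^1 g^(k/(k-1)) du / (int_0^1 g^(1/(k-1)) du)^k.
    """
    g = lambda u: dPhi(0.0) - dPhi(u)
    num = quad(lambda u: g(u) ** (k / (k - 1)), 0.0, 1.0, limit=200)[0]
    den = quad(lambda u: g(u) ** (1.0 / (k - 1)), 0.0, 1.0, limit=200)[0]
    return num / den ** k


# Cumulative residual Shannon entropy: Phi'(u) = ln(1 - u) + 1, so g(u) = -ln(1 - u)
# and the ME quantile is proportional to (-ln(1 - u))^(1/(k-1)), a Weibull with shape k - 1.
dPhi_cre = lambda u: np.log1p(-u) + 1.0

for k in (2.0, 2.5, 3.0):
    r = k - 1.0
    weibull_ratio = gamma(1 + k / r) / gamma(1 + 1 / r) ** k
    print(k, moment_ratio_13(dPhi_cre, k), weibull_ratio)
```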

Generalized Maximum Cumulative Φ Entropy Approach
The generalized maximum cumulative Φ entropy approach for random variables with non-negative support and Q(0) = 0 remains to be discussed. We start with the quantile function Q and derive the corresponding generating function Φ of the cumulative Φ entropy such that Q is the ME quantile function for Φ under the constraints Q(0) = 0, fixed mean µ and fixed k-th power moment µ_k. For this purpose, we introduce a special partial mean function. Let f be the density of the random variable X corresponding to Q. µ_{k−1}(u) denotes the partial (k − 1)-th power mean function with
$$\mu_{k-1}(u) = \int_0^{Q(u)} x^{k-1} f(x)\,dx = \int_0^u Q(v)^{k-1}\,dv,\quad u\in[0,1],$$
for k = 2, 3, . . .. This partial (k − 1)-th power mean function is an important part of the entropy generating function, as the following corollary shows.

Corollary 2.
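Let Q be a quantile function with Q(0) = 0. The entropy generating function Φ, such that Q is ME under the constraints Q(0) = 0, fixed mean µ and fixed k-th power moment µ_k, is given by
$$\Phi(u) = u\,\mu_{k-1} - \mu_{k-1}(u),\quad u\in[0,1], \tag{14}$$
where µ_{k−1} = µ_{k−1}(1) denotes the (k − 1)-th power mean. Proof. Let X be the random variable corresponding to Q and f be the density of X. From (14), Φ′(u) = µ_{k−1} − Q(u)^{k−1}, so that Φ′(0) − Φ′(u) = Q(u)^{k−1}, which is monotonically increasing. Inserting this into (12) gives µQ(u)/∫_0^1 Q(v) dv = Q(u), and (13) holds because both sides equal µ_k/µ^k. In Section 5, we use (14) to estimate Φ from a data set such that the data are generated by the corresponding ME distribution under the constraints of Q(0) = 0, fixed mean and fixed k-th power moment.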
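Below is a small consistency check of Corollary 2, assuming SciPy and reading the corollary as Φ(u) = uµ_{k−1} − µ_{k−1}(u). Here µ_{k−1}(u) = ∫_0^u Q(v)^{k−1} dv is the partial (k − 1)-th power mean. Applied to a Weibull quantile function with scale 1 and shape k − 1, it should return the cumulative residual Shannon generating function −(1 − u) ln(1 − u) for every k. This mirrors the Weibull solution obtained from Theorem 2. The cases k = 2 (exponential) and k = 3 are shown. The helper names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import entr


def phi_corollary2(Q, k):
    """Phi(u) = u * mu_{k-1} - mu_{k-1}(u), with mu_{k-1}(u) = int_0^u Q(v)^(k-1) dv."""
    power = lambda v: Q(v) ** (k - 1)
    mu_km1 = quad(power, 0.0, 1.0, limit=200)[0]
    return lambda u: u * mu_km1 - quad(power, 0.0, u, limit=200)[0]


# Weibull quantile with scale 1 and the given shape (shape 1 is the exponential case).
weibull_q = lambda shape: (lambda v: (-np.log1p(-v)) ** (1.0 / shape))
phi_cre = lambda u: entr(1.0 - u)   # -(1 - u) ln(1 - u)

for k in (2, 3):
    Phi = phi_corollary2(weibull_q(k - 1), k)
    print(k, [round(Phi(u), 6) for u in (0.2, 0.5, 0.9)],
          [round(phi_cre(u), 6) for u in (0.2, 0.5, 0.9)])
```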

Applications
In this section, we give an overview of some ME distributions for cumulative entropies, applying the results of Section 3. For some choices of Φ, the ME task has already been solved. In the following, we consider further choices of Φ with a focus on those that lead to well known distributions. With the ME principle, it is easy to generate completely new distributions, but this is not the objective of this paper. Table 3 displays an overview of several entropy generating functions and the corresponding ME distributions. The table is divided by the situation where mean and variance are fixed, by the distinction between the MaxEnt and the GMaxEnt task, and by the situation with the additional requirement of Q(0) = 0. Moreover, while cases no. 1 to no. 4 require symmetry of the ME distribution, cases no. 5 to no. 12 allow for skewness of the ME distribution. f_N and F_N^{−1} denote the density and the quantile function of the standard normal distribution. We try to assign well known names to the cumulative entropies generated by the respective Φ. For the solutions of the GMaxEnt task (no. 9 and 10), such names are not available. The second column refers to the Appendix where the cases are discussed in detail.
Table 3. Entropy generating functions with corresponding maximum entropy distributions.
Some of the results presented in Table 3 are already known from the literature. These are the solutions of no. 1 [24], no. 2 [34], no. 3 [34], no. 4 [23,45] and no. 11 [5]. The remaining cases are new results, which are discussed in the Appendix for readers interested in flexible statistical distributions. The general finding can best be illustrated by solutions no. 1 and no. 2, where the ME distributions are the logistic and the Tukey λ distribution. Solving the ME task for the classical differential and the Havrda-Charvát (or Tsallis) entropy given fixed mean and variance results in the normal and the t or r distribution [46,47]. The difference is easy to explain. The cumulative entropy pulls the ME distribution towards a U-shaped distribution as far as the constraints allow. This leads to distributions with heavier tails (logistic instead of normal, Tukey λ instead of t or r).
There are many entropy generating functions well known from physics that could also be considered in the context of cumulative entropies. It is easy to show that the results of Theorem 1 and Theorem 2 can be applied to, e.g., the generating functions of the Rényi [27], the Kaniadakis [48], or the Hanel-Thurner entropy [49], to mention only a few. Another comment concerns the concept of skewness. Some families of distributions have natural skewness parameters. If the members of these families have closed-form quantile functions, Corollary 1 can be applied directly to derive the function Φ for the corresponding cumulative entropy (GMaxEnt task). This is the reason why we focus on the generalized Tukey λ distribution. It is worth noting that, again, a kind of non-symmetric cumulative Havrda-Charvát entropy appears as the solution (see no. 9). Other families of skewed distributions are defined by modifying a given symmetric distribution. Both the Fechner approach and the more popular Azzalini approach proceed in this way. The Fechner approach introduces skewness by splitting the scale parameter for the positive and the negative halves of the underlying symmetric distribution. This leads to a corresponding splitting of the quantile function. Corollary 1 can again easily be applied to solve the GMaxEnt task as long as the quantile function is available in a manageable form. The solution for the normal distribution is given by solution no. 10. For the more popular Azzalini approach [50,51], the quantile function is not available in such a form. Therefore, we do not discuss the GMaxEnt task for this way of generating skewed distributions. Table 3 only contains special choices of the entropy generating function Φ. The main question is how to choose Φ. The answer could be given by an axiomatic approach or empirically.
The starting point for the axiomatic approach is a set of fundamental requirements with a plausible and generally accepted interpretation in the considered scientific discipline. Such axiomatizations are available for the differential and the Tsallis entropy; a recent publication on this topic is [52]. In the context of cumulative entropies, we can go back to approaches in fuzzy set theory, in which measures of indefiniteness are axiomatized (see [19,20,53]). These axioms are directly applicable to cumulative entropies. (The discussion of alternative entropies, skewness and axiomatic approaches is based on valuable comments of two anonymous referees.) We do not discuss the axiomatic approach further here. Instead, in the next section, we focus on how to estimate the entropy generating function Φ.
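To illustrate how Corollary 1 applies to Fechner's split-scale construction mentioned above, the sketch below (assuming SciPy) works with a split-scale normal quantile function. The quantile is Q(u) = s1·F_N^{−1}(u) for u < 1/2 and s2·F_N^{−1}(u) for u ≥ 1/2, with location zero. Here s1 and s2 are hypothetical names for the two scale parameters. This is our own illustration and need not match the parametrization used for solution no. 10.

The sketch reads Corollary 1 as Φ(u) = uµ − µ(u). It uses the identity ∫_0^u F_N^{−1}(v) dv = −f_N(F_N^{−1}(u)) for u ≤ 1/2, which gives a closed-form partial mean µ(u) that is checked against quadrature. The resulting Φ should vanish at u = 1, be positive inside (0, 1), and reduce to f_N(F_N^{−1}(u)) when s1 = s2 = 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def fechner_quantile(s1, s2):
    """Split-scale normal quantile: s1 * z(u) below the median, s2 * z(u) above it."""
    return lambda u: (s1 if u < 0.5 else s2) * norm.ppf(u)


def partial_mean_closed(u, s1, s2):
    """mu(u) = int_0^u Q(v) dv, using int_0^u z(v) dv = -f_N(z(u)) for u <= 1/2."""
    c = norm.pdf(0.0)
    if u <= 0.5:
        return -s1 * norm.pdf(norm.ppf(u))
    return -s1 * c + s2 * (c - norm.pdf(norm.ppf(u)))


def phi_fechner(u, s1, s2):
    """Phi(u) = u * mu - mu(u), with mu = mu(1) = (s2 - s1) * f_N(0)."""
    mu = (s2 - s1) * norm.pdf(0.0)
    return u * mu - partial_mean_closed(u, s1, s2)


s1, s2 = 1.0, 2.0
Q = fechner_quantile(s1, s2)


def partial_mean_numeric(u):
    """Same partial mean by quadrature, split at the median where Q has a kink."""
    if u <= 0.5:
        return quad(Q, 0.0, u, limit=200)[0]
    return quad(Q, 0.0, 0.5, limit=200)[0] + quad(Q, 0.5, u, limit=200)[0]


grid = np.linspace(0.01, 0.99, 99)
for u in (0.1, 0.5, 0.7, 0.95):
    print(u, partial_mean_numeric(u), partial_mean_closed(u, s1, s2), phi_fechner(u, s1, s2))
```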

Estimating the Entropy Generating Function
Can we learn something from data about the entropy generating function Φ for which the data generating distribution is an ME distribution under the constraints of given mean and variance? According to Corollary 1, the entropy generating function Φ is given by the partial mean function:
$$\Phi(u) = u\,\mu - \mu(u),\quad u\in[0,1].$$
Therefore, we can estimate this partial mean function to get an estimator for Φ.
Let X_1, . . . , X_n be independent and identically distributed random variables, and let X_(n:1), . . . , X_(n:n) denote the corresponding order statistics. For a fixed value u ∈ [0, 1] such that nu ∈ {1, 2, . . . , n}, we consider an estimator of the form
$$\hat\mu(u) = \frac{1}{n}\sum_{i=1}^{nu} X_{(n:i)}.$$
Note that the mean and the variance are fixed to the values 0 and 1, so that Φ(u) = −µ(u). In Figure 4, we compare the estimated entropy generating function −µ̂(u) with the entropy generating functions of the standardized t distribution with 4 degrees of freedom and the standard normal distribution. The t distribution is also standardized to mean 0 and variance 1. The entropy generating function of the t distribution must be calculated by numerical integration. We chose the number of degrees of freedom by trial and error, but ML estimation gives a value not far from 4. We can see that by estimating the entropy generating function Φ by the partial mean function, the standardized daily logarithmic S&P500 returns can be fitted quite well.
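Below is a minimal sketch of the partial-mean estimator for standardized data, assuming NumPy and SciPy. It standardizes the sample, sorts it, and evaluates −µ̂(i/n) = −(1/n)·(sum of the i smallest values). The S&P500 data are not reproduced here, so the estimator is run on a synthetic standard normal sample, with an arbitrary seed and sample size. The result is compared with the standard normal benchmark Φ(u) = f_N(F_N^{−1}(u)), which is −µ(u) for the standard normal distribution.

```python
import numpy as np
from scipy.stats import norm


def estimate_phi_standardized(x):
    """Empirical Phi(u) = -mu_hat(u) at u = i/n after standardizing the sample.

    mu_hat(i/n) = (1/n) * sum of the i smallest standardized observations.
    """
    z = np.sort((x - x.mean()) / x.std())
    n = z.size
    u = np.arange(1, n + 1) / n
    return u, -np.cumsum(z) / n


rng = np.random.default_rng(0)
x = rng.standard_normal(20_000)       # synthetic data, not the S&P500 returns
u, phi_hat = estimate_phi_standardized(x)

# Standard normal benchmark: Phi(u) = f_N(F_N^{-1}(u)); drop u = 1 where F_N^{-1} = inf.
phi_normal = norm.pdf(norm.ppf(u[:-1]))
max_dev = np.max(np.abs(phi_hat[:-1] - phi_normal))
print(max_dev)
```

Because the sample is standardized, the estimate returns to zero at u = 1 by construction, matching Φ(1) = 0.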
In the following example, we consider a situation with non-negative support. We know from (14) that for a non-negative random variable with Q(0) = 0, fixed mean µ and fixed k-th power moment µ_k, the entropy generating function Φ is given by
$$\Phi(u) = u\,\mu_{k-1} - \mu_{k-1}(u),\quad u\in[0,1].$$
For this entropy generating function, Q is an ME quantile function.
To get an estimator for Φ, it is only necessary to estimate the (k − 1)-th power mean µ_{k−1} and the partial (k − 1)-th power mean function µ_{k−1}(u). For a fixed value u ∈ [0, 1] (such that nu ∈ {1, 2, . . . , n}), a natural estimator for the partial (k − 1)-th power mean function is
$$\hat\mu_{k-1}(u) = \frac{1}{n}\sum_{i=1}^{nu} X_{(n:i)}^{k-1}.$$
An estimator for the entropy generating function Φ is given by
$$\hat\Phi(u) = u\,\hat\mu_{k-1}(1) - \hat\mu_{k-1}(u).$$
We will show that this estimator works well for a real data set and the Weibull distribution. For this purpose, we need the partial (k − 1)-th power mean function for the Weibull distribution with shape parameter r and scale parameter λ, whose quantile function is Q(u) = λ(−ln(1 − u))^{1/r}. For this distribution, it holds
$$\mu_{k-1}(u) = \lambda^{k-1}\,\Gamma\Big(1+\frac{k-1}{r}\Big)\,F_\Gamma\Big(-\ln(1-u);\ 1+\frac{k-1}{r},\ 1\Big),$$
where F_Γ(x; a, β) denotes the distribution function of a Γ distribution with shape parameter a and scale parameter β. The corresponding entropy generating function Φ such that this Weibull distribution is CE_Φ maximum under Q(0) = 0 and the constraints of fixed mean µ and fixed k-th power moment µ_k is
$$\Phi(u) = \lambda^{k-1}\,\Gamma\Big(1+\frac{k-1}{r}\Big)\Big(u - F_\Gamma\Big(-\ln(1-u);\ 1+\frac{k-1}{r},\ 1\Big)\Big).$$
For given k, the shape parameter r is determined by the relation
$$\frac{\mu_k}{\mu^k} = \frac{\Gamma(1+k/r)}{\Gamma(1+1/r)^k}.$$
We set k = 2. This means that for every potential ME distribution, µ_2/µ² = 1.762 has to hold. This implies r = 1.148 for the shape parameter r of the Weibull distribution. In Figure 5, the estimated entropy generating function is compared with the corresponding function for this Weibull distribution. The fit seems to be rather good in view of the relatively small sample size. Further work will be conducted to estimate the number of degrees of freedom or the parameters of other flexible distributions by minimizing the distance between the easy-to-calculate empirical entropy generating function and the entropy generating function of the distribution from which we suppose the data could have been generated. The advantage of this procedure could be that the empirical entropy generating function is rather smooth.
Therefore, minimizing the distance between the entropy generating functions could be more accurate than considering the distance between the empirical quantile functions, a density estimator, or the empirical distribution functions and the corresponding theoretical counterpart. However, this will be investigated in future research.
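The Weibull steps above can be sketched as follows, assuming NumPy and SciPy. First, the shape r is solved from the moment relation µ_2/µ² = Γ(1 + 2/r)/Γ(1 + 1/r)² for the ratio 1.762 quoted in the text; this should give a value close to the stated r = 1.148. Second, the empirical Φ̂(u) = uµ̂_{k−1}(1) − µ̂_{k−1}(u) is compared with the Weibull expression λ^{k−1}Γ(a)(u − F_Γ(−ln(1 − u); a, 1)), where a = 1 + (k − 1)/r. The mine-explosion data are not reproduced here, so a synthetic Weibull sample (scale 1, arbitrary seed and size) stands in. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma
from scipy.stats import gamma as gamma_dist


def weibull_moment_ratio(r, k=2):
    """mu_k / mu^k = Gamma(1 + k/r) / Gamma(1 + 1/r)^k for a Weibull with shape r."""
    return gamma(1 + k / r) / gamma(1 + 1 / r) ** k


def shape_from_ratio(ratio, k=2):
    """Solve the moment relation for the shape r (the ratio decreases in r)."""
    return brentq(lambda r: weibull_moment_ratio(r, k) - ratio, 0.2, 50.0)


def weibull_phi(u, r, lam=1.0, k=2):
    """Phi(u) = u*mu_{k-1} - mu_{k-1}(u) for the Weibull(shape r, scale lam) quantile."""
    a = 1 + (k - 1) / r
    return lam ** (k - 1) * gamma(a) * (u - gamma_dist.cdf(-np.log1p(-u), a))


def estimate_phi(x, k=2):
    """Empirical Phi_hat(u) = u*mu_hat_{k-1}(1) - mu_hat_{k-1}(u) at u = i/n."""
    xs = np.sort(x)
    n = xs.size
    u = np.arange(1, n + 1) / n
    partial = np.cumsum(xs ** (k - 1)) / n
    return u, u * partial[-1] - partial


r = shape_from_ratio(1.762)           # ratio quoted in the text for k = 2
print(round(r, 3))

rng = np.random.default_rng(1)
x = rng.weibull(r, size=20_000)       # synthetic Weibull data (scale 1), not the mine data
u, phi_hat = estimate_phi(x)
max_dev = np.max(np.abs(phi_hat[:-1] - weibull_phi(u[:-1], r)))
print(max_dev)
```

The point u = 1 is dropped from the comparison only to avoid evaluating ln(0). Both functions vanish there.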

Conclusions
To be able to estimate and predict without using information that we do not possess, it is important to derive maximum entropy distributions. Maximizing Shannon's differential entropy under different moment constraints is a well-known task. Without any constraints, the differential entropy is maximized by a uniform distribution, representing the situation of no information. However, an extremely bimodal (=bipolar) distribution represents a situation of so-called contradictory information, since an event and its complement can happen with equal probability. In this situation, it is extremely hard to make a forecast, even harder than for a uniformly distributed random variable. Hence, this paper argues that contradictory information is even less useful than minimum or no information, as it increases uncertainty and provides a high chance for a wrong decision. Such a bipolar distribution is obtained by maximizing a cumulative entropy instead of the differential entropy without any constraints. Such a cumulative entropy depends either on the distribution function (direct), on the survival function (residual) or on both (paired). Under the constraints of fixed mean and variance, maximizing the cumulative entropy tries to transform a distribution in the direction of a bipolar distribution as far as the constraints allow. For so-called cumulative paired entropies and the constraints that mean and variance are known, solving the maximization problem leads to symmetric ME distributions like the logistic and the Tukey λ distribution [21,34]. Further ME distributions have been found for the cumulative paired Leik and Gini entropy [23,34,45]. There are two different principles to derive maximum entropy distributions. The maximum entropy principle in the sense of [42,43] is the task to derive an ME distribution for a given entropy and fixed constraints.
The generalized maximum entropy approach formulated by [44] starts from a given ME distribution and derives the corresponding generating function of the cumulative entropy. In this paper, we applied both approaches to the cumulative Φ entropy, which generalizes the cumulative paired entropy in several ways, and thus introduced the maximum cumulative Φ entropy approach and the generalized maximum cumulative Φ entropy approach. Moreover, we considered situations with different constraints: first, a situation with arbitrary support and given mean and variance, and second, a situation with non-negative support and the additional constraint Q(0) = 0 for the ME quantile function. The latter was considered because, in the literature, the ME task has mainly been studied for lifetime distributions with support [0, ∞) and Q(0) = 0. Under these additional constraints, we derived ME distributions for fixed mean and k-th power moment. For the situation with arbitrary support and given mean and variance, we introduced the cumulative paired Mielke(r) entropy and derived the ME distributions. The results already known for the cumulative paired Leik and Gini entropy are included for r = 1 and r = 2. Then, starting with a natural generalization of the derivative of the entropy generating function known from the logistic distribution, we immediately derived the generalized logistic distribution (GLO) as ME distribution. Considering a linear combination of entropy generating functions led to new ME distributions with skewness properties. Here, we derived the skewed logistic distribution and the skewed Tukey λ distribution in line with [55]. Next, using the generalized maximum cumulative Φ entropy approach, we derived an entropy generating function such that a pre-specified skewed distribution is an ME distribution. The generalized Tukey λ distribution served as an example.
Then, we considered Fechner's proposal to use different values of a scale parameter for the two halves of a distribution in order to obtain skewed distributions, and again derived the corresponding entropy generating function. The skewed normal distribution served as an illustrative example. Afterwards, we focused on the situation where the support of the ME distribution is restricted to (0, ∞), using the maximum cumulative Φ entropy approach. Here, we derived the Weibull distribution as ME distribution for the cumulative residual Shannon entropy and the extended Burr XII distribution as ME distribution for the cumulative residual Havrda-Charvát entropy. Finally, we proposed an estimator for the cumulative Φ entropy generating function that represents all the properties of the underlying ME data generating distribution. This provides an alternative to the non-parametric estimation of density or distribution functions. The usefulness of this estimator was demonstrated for two real data sets.
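The claim that a cumulative entropy, maximized without constraints, favours a bipolar distribution over the uniform one can be checked numerically. The following minimal Python sketch uses the cumulative paired Shannon entropy, the integral of −F ln F − (1 − F) ln(1 − F) over the support, on the fixed interval [0, 1]. The function names are hypothetical, and the midpoint rule is simply our choice of quadrature. Because the integrand never exceeds ln 2 and reaches this value only where F = 1/2, no distribution on [0, 1] can exceed ln 2. The bipolar distribution with mass 1/2 at each endpoint attains this bound, whereas the uniform distribution only reaches 1/2.

```python
import numpy as np

def paired_shannon_integrand(p):
    """-p ln p - (1 - p) ln(1 - p), with the convention 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    inside = (p > 0.0) & (p < 1.0)
    q = p[inside]
    out[inside] = -q * np.log(q) - (1.0 - q) * np.log(1.0 - q)
    return out

def cumulative_paired_shannon(cdf, a, b, n=200_000):
    """Cumulative paired Shannon entropy of a distribution on [a, b] (midpoint rule in x)."""
    x = a + (b - a) * (np.arange(n) + 0.5) / n
    return (b - a) * np.mean(paired_shannon_integrand(cdf(x)))

uniform = lambda x: x                        # F(x) = x on [0, 1]
bipolar = lambda x: np.full_like(x, 0.5)     # mass 1/2 at 0 and at 1, so F = 1/2 inside

print(cumulative_paired_shannon(uniform, 0.0, 1.0))   # about 0.5
print(cumulative_paired_shannon(bipolar, 0.0, 1.0))   # ln 2, about 0.693
```

Any other distribution function on [0, 1] gives a value between these two bounds only if it stays closer to 1/2 than the uniform one does; the pointwise bound makes the bipolar distribution the unconstrained maximizer on a fixed interval.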

Acknowledgments:
The authors would like to thank Paul van Staden for the hint to Hosking's work about the generalized logistic distribution and three anonymous reviewers for their constructive criticism, which helped to improve the presentation of this paper significantly.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In Appendixes A.1-A.5 the situation will be considered where mean and variance are fixed and the support is arbitrary. Then, in Appendixes A.6 and A.7 the support of the ME distribution has to be non-negative and the property Q(0) = 0 will be required as an additional constraint.
Furthermore, we will use the maximum cumulative Φ entropy approach in Appendixes A.1-A.3 as well as in Appendixes A.6 and A.7. In Appendixes A.4 and A.5 we will use the generalized maximum cumulative Φ entropy approach.
Appendix A.1. Cumulative Paired Mielke(r) Entropy and the Symmetric Beta Distribution
Mielke [56] as well as Mielke and Johnson [57] discussed two-sample linear rank tests for alternatives of scale with a score generating function parameterized by r. This family of tests includes the well-known test from [58] for r = 1 and the test from [59] for r = 2. Mood [59] also derived the symmetric distribution corresponding to an asymptotically optimal linear rank test with this score generating function. Similar to [34], there is a simple relationship between the score generating functions of two-sample linear rank tests for scale alternatives and the entropy generating function Φ of a corresponding cumulative paired Φ entropy. In the case considered by [56,57], we get the entropy generating function
This function is strictly concave on [0, 1] for r > 1 and concave for r = 1. Therefore, we will only consider these cases. Φ is differentiable (for r = 1, everywhere except at u = 1/2), and its derivative is square integrable.
In the following, we discuss the cumulative paired Mielke(r) entropy CPE_M(r). We show that the symmetric double beta distribution maximizes CPE_M(r), r > 1, if mean and variance are known (see Table 3 no. 5). For r = 1, we get an extremely bimodal distribution. For r > 1, the support of the ME distribution is a closed interval; for r = 1, it consists of two points. The four-parameter double beta distribution is defined by
Corollary A1. The distribution F maximizing CPE_M(r) under the constraints of known mean µ and known variance σ² is, for r > 1, a double beta distribution whose parameters depend on µ, σ, and r. For r = 1, we get an extremely bimodal distribution with support {µ − σ, µ + σ}.

Proof. The derivative of Φ is given by
In (8), we need the expected value of Φ′(U)² for U ∼ R(0, 1). This expected value is given by
With Φ(0) = Φ(1) = 0 and according to (8), the quantile function is given by
for u ∈ [0, 1], with the support given by
for r > 1 and {µ − σ, µ + σ} for r = 1. For r > 1, the corresponding distribution function is
and the corresponding density is
This is a beta distribution with parameters a = 1/(r − 1), b = 1 and support
For r = 1, we get an extremely bimodal distribution with mass points µ − σ and µ + σ.
In Figure A1, the entropy generating functions and ME densities demonstrate the impact of different settings of the parameter r. For r = 1, the entropy generating function is a triangle, such that the ME density is bipolar (extremely bimodal) (see Table 3 no. 3). Increasing r leads to a bimodal distribution (r = 1.5), a uniform distribution (r = 2) (see Table 3 no. 4), and finally to a very leptokurtic distribution with a singularity at 0 (r = 3). With increasing r, the entropy generating functions become more and more platykurtic. At u = 0 and u = 1, the derivative of the entropy generating function has a finite absolute value, smaller than 1/2. This characterizes the compact support of all ME distributions for the cumulative paired Mielke(r) entropy. All distributional characteristics can be learned from the form of the entropy generating function. Klein et al. [34] discussed the cumulative paired Leik entropy and Dai and Chen [45] the cumulative paired Gini entropy. Both are embedded in the class of cumulative paired Mielke(r) entropies, for r = 1 and r = 2, respectively.
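The statements of Corollary A1 can be checked numerically under two assumptions of ours. First, the generating function is taken proportional to 1 − |2u − 1|^r, which gives the triangle for r = 1 and a multiple of u(1 − u), the Gini case, for r = 2. Second, formula (8) is read as Q(u) = µ − σΦ′(u)/√E[Φ′(U)²] with U ∼ R(0, 1), so any positive constant in Φ cancels. Under these assumptions, Q(u) = µ + σ√(2r − 1) sign(2u − 1)|2u − 1|^(r−1). The support is therefore [µ − σ√(2r − 1), µ + σ√(2r − 1)], and |X − µ|/(σ√(2r − 1)) = |2U − 1|^(r−1) has distribution function t^(1/(r−1)) on [0, 1], i.e., it is beta(1/(r − 1), 1) distributed. All function names are hypothetical.

```python
import numpy as np

def mielke_phi_prime(u, r):
    """Derivative of the assumed generating function Phi(u) = 1 - |2u - 1|**r, r >= 1."""
    s = 2.0 * np.asarray(u, dtype=float) - 1.0
    return -2.0 * r * np.sign(s) * np.abs(s) ** (r - 1.0)

def me_quantile(u, r, mu=0.0, sigma=1.0, n=200_000):
    """Assumed form of (8): Q(u) = mu - sigma * Phi'(u) / sqrt(E[Phi'(U)^2]), U ~ R(0, 1)."""
    grid = (np.arange(n) + 0.5) / n                      # midpoint rule on (0, 1)
    norm = np.sqrt(np.mean(mielke_phi_prime(grid, r) ** 2))
    return mu - sigma * mielke_phi_prime(u, r) / norm

def support_half_width(r, sigma=1.0):
    """Closed form of Q(1) - mu under the assumptions above: sigma * sqrt(2r - 1)."""
    return sigma * np.sqrt(2.0 * r - 1.0)

def beta_cdf_b1(t, r):
    """Distribution function t**a of beta(a, 1) with a = 1/(r - 1), for r > 1."""
    return np.asarray(t, dtype=float) ** (1.0 / (r - 1.0))

u = (np.arange(100_000) + 0.5) / 100_000
for r in (1.0, 1.5, 2.0, 3.0):
    q = me_quantile(u, r)
    # mean about 0, variance about 1, upper end of the support about sqrt(2r - 1)
    print(r, q.mean(), (q ** 2).mean(), me_quantile(1.0, r), support_half_width(r))
```

For r = 1, the quantile function only takes the values µ ± σ, which reproduces the bipolar case; for r = 2, it is linear in u, i.e., uniform on [µ − √3σ, µ + √3σ].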

Appendix A.2. Cumulative Paired Shannon Entropy and the Generalized Logistic Distribution
Formula (8) can also be used by starting with the derivative of the entropy generating function instead of with the entropy generating function itself. Such a derivative could be
where φ is strictly increasing on [0, 1]. An example is φ(x) = ln x, x > 0, such that
In this special case, Φ belongs to the cumulative paired Shannon entropy, and the corresponding ME distribution under the constraints of fixed mean and variance is the logistic distribution [21,24,34,39].
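As a worked example of this special case, assume that formula (8) expresses the ME quantile function as Q(u) = µ − σΦ′(u)/√E[Φ′(U)²] with U ∼ R(0, 1) (our reading of (8)). Since Φ(0) = Φ(1) = 0 implies E[Φ′(U)] = 0, the mean is µ. Moreover, ln(U/(1 − U)) follows the standard logistic distribution, whose variance is π²/3, so the logistic distribution with scale √3σ/π results:

```latex
\begin{aligned}
\Phi(u) &= -u\ln u-(1-u)\ln(1-u), \qquad
\Phi'(u) = \ln\frac{1-u}{u}, \qquad
\mathbb{E}\big[\Phi'(U)^2\big] = \frac{\pi^2}{3},\\
Q(u) &= \mu-\sigma\,\frac{\Phi'(u)}{\sqrt{\mathbb{E}\big[\Phi'(U)^2\big]}}
      = \mu+\frac{\sqrt{3}\,\sigma}{\pi}\,\ln\frac{u}{1-u}, \qquad u\in(0,1).
\end{aligned}
```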
As an extension, we can consider For α → 1 we get (A2).
Corollary A2. The density of the ME distribution for the cumulative entropy with derivative (A3) is given by
Proof. Φ determines the ME quantile function via (8). This means
The support is given by
The considered ME task requires the existence of the variance. The k-th power moment exists if and only if k|α − 1| < 1, i.e., 1 − 1/k < α < 1 + 1/k. Therefore, the variance exists for 1/2 < α < 3/2. Solving Q(u) = x with respect to u gives
and differentiating the distribution function delivers the postulated density.
The quantile function (A4) was first introduced by [60] under the term 'generalized logistic distribution' (GLO). Further discussions of the GLO can be found in [61,62]. The GLO generalizes the logistic distribution by introducing skewness through the parameter α, on which the support also depends. For α → 1, we get the support R and the logistic distribution as ME distribution; this is the only symmetric distribution in the GLO class. For α > 1, the support is (−∞, 1/(α − 1)] and the GLO distribution is skewed to the left. For α < 1, we get the support [−1/(1 − α), ∞), and the ME distribution is skewed to the right.
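A minimal sketch of the GLO quantile function, assuming the standardized form Q(u) = (1 − ((1 − u)/u)^(α−1))/(α − 1) without location and scale. This form is our reconstruction: it reproduces the supports stated above, the logistic limit ln(u/(1 − u)) for α → 1, and Hosking's GLO with shape parameter k = α − 1. The helper names are hypothetical. The skewness is computed by the midpoint rule, which requires the third moment to exist, i.e., |α − 1| < 1/3.

```python
import numpy as np

def glo_quantile(u, alpha):
    """Assumed standardized GLO quantile (1 - ((1 - u)/u)**(alpha - 1)) / (alpha - 1)."""
    u = np.asarray(u, dtype=float)
    k = alpha - 1.0                          # Hosking's shape parameter under this reading
    if k == 0.0:
        return np.log(u / (1.0 - u))         # logistic limit alpha -> 1
    return (1.0 - ((1.0 - u) / u) ** k) / k

def glo_cdf(x, alpha):
    """Inverse of glo_quantile on its support: 1 / (1 + (1 - (alpha - 1) x)**(1/(alpha - 1)))."""
    x = np.asarray(x, dtype=float)
    k = alpha - 1.0
    if k == 0.0:
        return 1.0 / (1.0 + np.exp(-x))
    return 1.0 / (1.0 + (1.0 - k * x) ** (1.0 / k))

def glo_skewness(alpha, n=1_000_000):
    """Moment skewness by the midpoint rule on (0, 1); requires |alpha - 1| < 1/3."""
    u = (np.arange(n) + 0.5) / n
    d = glo_quantile(u, alpha)
    d = d - d.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

print(glo_quantile(1.0 - 1e-9, 1.2))          # just below the upper end 1/(alpha - 1) = 5
print(glo_skewness(1.2), glo_skewness(0.8))   # negative (left skew), positive (right skew)
```

Under this form, Q for α and Q for 2 − α are mirror images (Q_{2−α}(u) = −Q_α(1 − u)), which is why α > 1 and α < 1 give opposite skewness.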
A working paper by [63] gives an overview of other generalizations of the logistic distribution that are completely different from (A4). Nassar and Elmasry as well as Tripathi et al. [64,65] are also concerned with generalizations of the logistic distribution.
In this paper, as we started with the derivative of the entropy generating function, the entropy generating function Φ belonging to (A3) remains to be identified. By simple integration, we get the partial mean function, which is expressed through the distribution function of the β(a, b) distribution (see Table 3 no. 6), and from it the entropy generating function Φ. Figure A2 displays Φ as well as the corresponding density of the GLO distribution. The logistic distribution (α = 1) serves as a reference. The other settings show a right skewed distribution (α = 0.6) and two left skewed distributions (α = 1.2 and α = 1.4).

Appendix A.3. Asymmetric Entropy Generating Functions and 'New' Skewed Logistic and Tukey λ Distributions
Again, we apply the maximum cumulative Φ entropy approach where the entropy generating function Φ is fixed and we search for the corresponding ME distribution.
In [34], entropy generating functions of the symmetric form Φ(u) = ϕ(u) + ϕ(1 − u), with ϕ concave on [0, 1], have been discussed. Again, we quote the result that for ϕ(u) = −u ln u we get the logistic distribution and for ϕ(u) = u(u^(α−1) − 1)/(1 − α) the Tukey λ distribution as ME distributions under the constraints of given mean and variance (see Table 3 no. 1, no. 2). In the following, we consider the more general case Φ(u) = α₁ϕ(u) + α₂ϕ(1 − u) with weights α₁, α₂ ≥ 0. For α₁ = α₂, the common weight only plays the role of a scale parameter. Since we are interested in skewed ME distributions, this case will not be considered. Examples of asymmetric functions are
We search for ME distributions in this special case. To obtain only one skewness parameter, we set α₁ + α₂ = 1, such that α₁ = α and α₂ = 1 − α for α ∈ [0, 1]. The following corollary gives the corresponding ME quantile function.
Corollary A3. Let Φ(u) = αϕ(u) + (1 − α)ϕ(1 − u), u ∈ [0, 1], with ϕ from (A5), be the entropy generating function. Then, for λ > −1/2, the corresponding ME quantile function is
with a support depending on α and λ.
Proof. The claim follows immediately by calculating the derivative Φ′ and using the fact that Q(u) = −Φ′(u), u ∈ [0, 1]. The variance exists for λ > −1/2. The support is easily verified by calculating Q(0) and Q(1).
For α ≠ 1/2, the entropy generating function gives different weights to small and large values of u, which leads to an asymmetric ME quantile function determined by the negative of the derivative of Φ. In the case α = 1/2, we get the logistic distribution and the well-known Tukey λ distribution. Therefore, (A6) can be considered as the quantile function of a skewed logistic distribution or a skewed Tukey λ distribution (see Table 3 no. 7, no. 8). Both seem to be new alternatives to the generalized Tukey λ distribution and the GLO distribution. However, the author of [55] already introduced these alternatives in his Ph.D. thesis. In his joint paper with King [66], he investigated the properties of the skewed logistic distribution intensively. Now, we know what the cumulative paired Φ entropy looks like for which these distributions are ME distributions under the constraints of known mean and variance. [55] and [66] derived explicit expressions for the ML estimators of the location parameter, the scale parameter, the parameter α, and the parameter λ, as well as their asymptotic standard errors. Here, skewness and kurtosis are controlled jointly by the latter two parameters. More precisely, [55] discussed the quantile function with a location parameter a and a scale parameter b > 0. In this notation, Q corresponds to the setting a = 2α − 1 and b = λ + 1. Figure A3 gives an impression of the different entropy generating functions Φ and the densities of the skewed logistic (λ = 0) and the skewed Tukey λ distribution (λ = −0.4). The symmetric logistic and Tukey λ distributions (α = 1/2) serve as a point of reference. The parameter setting α = 0.1 results in left skewed logistic and Tukey λ distributions. For α = 0.7 and α = 0.9, we get right skewed distributions. Moreover, in Figure A3, the parameter λ was set to 2.0 to display a compact support and an extremely skewed distribution.
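The quantile function of Corollary A3 can be sketched in closed form under the assumption Φ(u) = αϕ(u) + (1 − α)ϕ(1 − u) with ϕ(u) = u(1 − u^λ)/λ (and ϕ(u) = −u ln u for λ = 0), and with Q = −Φ′ as in the proof. A short calculation gives Q(u) = (2α − 1) + (λ + 1)[α(u^λ − 1)/λ − (1 − α)((1 − u)^λ − 1)/λ], where (u^λ − 1)/λ is read as ln u for λ = 0. This is a location 2α − 1 and a scale λ + 1, which agrees with the setting a = 2α − 1 and b = λ + 1 mentioned above. The sketch compares the closed form with a numerical derivative of Φ; the function names are hypothetical.

```python
import numpy as np

def phi(u, lam):
    """phi(u) = u (1 - u**lam) / lam, with the limit -u ln u at lam = 0."""
    u = np.asarray(u, dtype=float)
    return -u * np.log(u) if lam == 0.0 else u * (1.0 - u ** lam) / lam

def Phi_skewed(u, alpha, lam):
    """Assumed asymmetric generating function alpha*phi(u) + (1 - alpha)*phi(1 - u)."""
    u = np.asarray(u, dtype=float)
    return alpha * phi(u, lam) + (1.0 - alpha) * phi(1.0 - u, lam)

def skewed_quantile(u, alpha, lam):
    """Closed form of -Phi'(u): location 2*alpha - 1 and scale lam + 1."""
    u = np.asarray(u, dtype=float)
    if lam == 0.0:
        left, right = np.log(u), np.log(1.0 - u)
    else:
        left, right = (u ** lam - 1.0) / lam, ((1.0 - u) ** lam - 1.0) / lam
    return (2.0 * alpha - 1.0) + (lam + 1.0) * (alpha * left - (1.0 - alpha) * right)

# Compare the closed form with a central difference of -Phi.
h, u = 1e-6, np.linspace(0.05, 0.95, 19)
for alpha, lam in [(0.5, 0.0), (0.1, 0.0), (0.9, -0.4), (0.7, 2.0)]:
    numeric = -(Phi_skewed(u + h, alpha, lam) - Phi_skewed(u - h, alpha, lam)) / (2 * h)
    print(alpha, lam, np.max(np.abs(numeric - skewed_quantile(u, alpha, lam))))  # tiny
```

For α = 1/2 and λ = 0, the closed form reduces to (1/2) ln(u/(1 − u)), a logistic quantile function, and swapping α with 1 − α mirrors the distribution.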
Appendix A.4. Skewed Tukey λ Distribution and Entropy Generating Function Φ
Now, we apply the generalized maximum cumulative Φ entropy approach. This means that we interpret formula (8) as a situation where a quantile function Q is fixed, and we derive the entropy generating function Φ such that Q represents the corresponding ME distribution. We want to apply this approach to a quantile function considered by [67]. They introduced the so-called generalized Tukey λ distribution with the help of the quantile function
Here, λ₄ is a location parameter, λ₃ is a scale parameter, and λ₁, λ₂ determine the distribution's skewness. Freimer et al. [67] showed that the k-th moment exists if and only if min(λ₁, λ₂) > −1/k. This means that the variance exists for min(λ₁, λ₂) > −1/2. Simple calculations lead to formulas for the mean and the variance
In the following corollary, we show the entropy generating function for which the generalized Tukey λ distribution is an ME distribution under the constraints of fixed mean and variance.
Corollary A4. Let Q be the quantile function of the generalized Tukey λ distribution. The entropy generating function Φ, such that Q is the ME quantile function under the constraints of known mean and variance, is given by
The setting λ = λ₁ = λ₂ results in the symmetric Tukey λ distribution (see Table 3 no. 9). Klein et al. [34] identified this distribution as ME distribution for the entropy generating function Φ(u) = ϕ(u) + ϕ(1 − u) with ϕ(u) = u(1 − u^λ)/λ, u ∈ [0, 1]. Up to the constant 1/(λ + 1), this is identical to the entropy generating function (A7). In the upper two panels of Figure A4, we see the corresponding entropy generating function and the ME density for several choices of λ. The range of distributional properties for the Tukey λ distribution is much richer than for the ME distributions belonging to the cumulative paired Mielke(r) entropy. For negative values of λ, the support is the whole real line R, since the entropy generating function has an infinite derivative at 0 and 1. For λ = 0, we get the logistic distribution. For positive values of λ, the support is a compact interval; λ = 1 and λ = 2 give uniform distributions.
Figure A4. Entropy generating function and density for the Tukey λ distribution (λ = 0.51, 0, 1, 2) and the generalized Tukey λ distribution (λ₁ = 0.51, 0, 1, 2; λ₂ = 0).
To demonstrate the consequences of skewness, the lower panels of Figure A4 show the entropy generating function and the density for λ₁ ≠ λ₂. We set λ₂ = 0 and vary only λ₁. Now, for λ₁ ≠ 0, we have Φ(1) ≠ 0. This can be explained by the skewness of the distribution. Φ represents the cumulative mean function. For a skewed distribution, integrating the quantile function over its positive part does not exactly compensate the integral of the quantile function over its negative part. This latter part can be smaller for a left skewed and greater for a right skewed distribution. Chalabi et al. [68] gave an excellent overview of the properties, parameter estimation, and applications of the generalized Tukey λ distribution. We recommend studying the long list of references in their paper. The author of [55] also discussed all these aspects in detail in his Ph.D. thesis at the University of Pretoria. He mentioned several applications in distinct scientific fields, ranging from actuarial science and finance to supply chain planning (p. 127). Chalabi et al. [68] focused on applications in finance. King and MacGillivray [69] discussed the ordering properties of skewness and kurtosis for the generalized λ distribution. Estimation in this family was studied in [70–72].
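The statement that Φ is built from the partial mean of the quantile function, so that Φ(1) differs from 0 for a skewed distribution with λ₄ = 0, can be illustrated numerically. The sketch assumes the quantile function of Freimer et al. [67] in the form Q(u) = λ₄ + [(u^λ₁ − 1)/λ₁ − ((1 − u)^λ₂ − 1)/λ₂]/λ₃ (with ln u and ln(1 − u) as limits for vanishing λ₁, λ₂), matching the roles of λ₁, ..., λ₄ described above. It also assumes Φ(u) = −∫₀^u Q(t) dt, i.e., the negative partial mean without further standardization. From E[(U^λ − 1)/λ] = −1/(λ + 1), the mean is λ₄ + [1/(λ₂ + 1) − 1/(λ₁ + 1)]/λ₃. The function names are hypothetical.

```python
import numpy as np

def _bc(x, lam):
    """(x**lam - 1) / lam, with the limit ln x at lam = 0."""
    return np.log(x) if lam == 0.0 else (x ** lam - 1.0) / lam

def gtl_quantile(u, l1, l2, l3=1.0, l4=0.0):
    """Assumed generalized Tukey lambda quantile: l4 + [bc(u, l1) - bc(1 - u, l2)] / l3."""
    u = np.asarray(u, dtype=float)
    return l4 + (_bc(u, l1) - _bc(1.0 - u, l2)) / l3

def gtl_mean(l1, l2, l3=1.0, l4=0.0):
    """Closed-form mean from E[(U**lam - 1)/lam] = -1/(lam + 1); needs min(l1, l2) > -1."""
    return l4 + (1.0 / (l2 + 1.0) - 1.0 / (l1 + 1.0)) / l3

def negative_partial_mean(quantile, n=1_000_000):
    """Phi(u) = -int_0^u Q(t) dt on the grid u = 0, 1/n, ..., 1 (midpoint rule per cell)."""
    t = (np.arange(n) + 0.5) / n
    partial = np.concatenate(([0.0], np.cumsum(quantile(t)) / n))
    return np.linspace(0.0, 1.0, n + 1), -partial

grid, Phi = negative_partial_mean(lambda t: gtl_quantile(t, 0.5, 0.0))
print(Phi[-1], -gtl_mean(0.5, 0.0))   # both close to -1/3: skewed case, Phi(1) != 0
grid, Phi_sym = negative_partial_mean(lambda t: gtl_quantile(t, 0.14, 0.14))
print(Phi_sym[-1])                    # close to 0: symmetric case
```

Because Φ(1) = −E(X) under this construction, Φ(1) vanishes exactly when the mean is 0, which for λ₄ = 0 is the case for λ₁ = λ₂.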
Appendix A.5. Fechner Approach of Skewness and Entropy Generating Function Φ
There are many other proposals to introduce a skewness parameter into a symmetric distribution besides those presented in Appendix A.3. One proposal can be traced back to [73], p. 295, who proposed using different values of a scale parameter for the left and the right half of a symmetric distribution. Klein and Fischer [74] showed that such a split of the scale parameter leads to a skewed distribution whose skewness parameter respects the skewness ordering of [75]. Arellano-Valle et al. [76] picked up Fechner's proposal and introduced one skewness parameter γ by considering functions f₁(γ) and f₂(γ) as different scale parameters for the two halves of a symmetric distribution. This approach includes the proposal made by [77] with f₁(γ) = 1/γ and f₂(γ) = γ, γ > 0. Let f be a density that is symmetric around 0. Then a density with skewness parameter γ is given by
for x ∈ R, γ > 0. The corresponding distribution function is
In this section, we follow the generalized maximum cumulative Φ entropy approach. This means identifying the entropy generating function Φ such that (A8) is an ME distribution under the constraints of fixed mean and variance.
Corollary A5. Let f be a density that is symmetric around 0, and let Q be the corresponding quantile function. Constraints are given by fixing mean and variance. Then, (A8) is the density of the ME distribution with entropy generating function
Proof. By inverting the distribution function (A9), we get the quantile function
Integrating this quantile function leads to the partial mean function
The negative of the partial mean function determines the entropy generating function Φ such that (A8) is an ME distribution under the constraints of given mean and variance.
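For the special case of [77], f₁(γ) = 1/γ and f₂(γ) = γ, and with f chosen as the standard normal density, the construction of Corollary A5 can be sketched numerically. The density is proportional to f(γx) for x < 0 and to f(x/γ) for x ≥ 0, the left half carries probability 1/(1 + γ²), the quantile function follows by inverting each half separately, and Φ is obtained as the negative partial mean. The choice of f and the function names are ours; Python's statistics.NormalDist supplies the normal distribution function and its inverse. For γ = 1, the standard normal distribution is recovered, and in general Φ(1) = −E(X) = −√(2/π)(γ − 1/γ).

```python
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()  # plays the role of the symmetric density f (our choice)

def fs_cdf(x, gamma):
    """CDF of the skewed normal with f1(gamma) = 1/gamma and f2(gamma) = gamma."""
    p0 = 1.0 / (1.0 + gamma ** 2)                    # F(0): probability of the left half
    if x < 0:
        return 2.0 * p0 * STD_NORMAL.cdf(gamma * x)
    return p0 + 2.0 * (1.0 - p0) * (STD_NORMAL.cdf(x / gamma) - 0.5)

def fs_quantile(u, gamma):
    """Quantile function obtained by inverting each half of fs_cdf, 0 < u < 1."""
    p0 = 1.0 / (1.0 + gamma ** 2)
    if u < p0:
        return STD_NORMAL.inv_cdf(u / (2.0 * p0)) / gamma
    return gamma * STD_NORMAL.inv_cdf(0.5 + (u - p0) / (2.0 * (1.0 - p0)))

def fs_Phi(gamma, n=20_000):
    """Negative partial mean Phi(u) = -int_0^u Q(t) dt on u = 0, 1/n, ..., 1 (midpoint rule)."""
    t = (np.arange(n) + 0.5) / n
    q = np.array([fs_quantile(float(ti), gamma) for ti in t])
    return np.linspace(0.0, 1.0, n + 1), -np.concatenate(([0.0], np.cumsum(q) / n))

grid, Phi = fs_Phi(2.0)
print(Phi[-1])   # close to -sqrt(2/pi) * (2 - 1/2), about -1.197
```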
As an illustrative example, we consider the skewed normal distribution in the following (see Table 3 no. 10). Figure A5 shows the entropy generating function and the density of the skewed normal distribution for different values of the skewness parameter γ. The symmetric standard normal distribution (γ = 1) serves as a point of reference.

Up to now, there was no constraint on the ME distribution's support, e.g., in the form Q(0) = 0. Therefore, we could always apply Theorem 1. However, as the ME task has mainly been considered in the literature for situations with support [0, ∞), we will consider this situation in the following two sections. Thus, in this section and in Appendix A.7, the support of the ME distribution has to be non-negative and the property Q(0) = 0 is required as an additional constraint for the ME task. Due to the non-negativity, it is possible to derive ME distributions under the constraint of a given mean and, more generally, of a given k-th power moment for k ≥ 2.

Appendix A.7. Cumulative Residual Havrda & Charvát Entropy and the Extended Burr XII Distribution
Now, we replace the generating function of the cumulative residual Shannon entropy with the more general generating function of the cumulative residual Havrda & Charvát entropy. This leads to a generalized Weibull distribution, also known as the extended Burr XII distribution. Again, following the maximum cumulative Φ entropy approach, the task is to search for the ME distribution while Φ is given.
The entropy generating function of the cumulative residual Shannon entropy can be generalized to
Again, we get Φ(1) = Φ(0) = 0. The derivative Φ′ satisfies Φ′(0) = 1 and
Now, we look for the corresponding ME distribution if Q(0) = 0 and µ, µₖ are fixed. The following corollary states that this ME distribution generalizes the Weibull distribution (see Table 3 no. 12).
Mudholkar et al. [78] introduced the quantile function (A13) with a different parametrization and called the corresponding distribution the 'generalized Weibull distribution'. Because the Burr XII distribution [79] is a special case (in our parametrization for α = 2), [80] use the term 'extended Burr XII distribution' (EBXII). A more recent paper that discusses this distribution is [81], whose authors were concerned with parameter estimation (see also [82–84]). Applications include lifetimes of devices with a bathtub-shaped hazard rate [78] as well as flood frequency and duration analysis [81–83].
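A minimal numerical sketch of the generalized generating function, under our assumption that it takes the usual residual Havrda-Charvát form Φ(u) = [(1 − u)^a − (1 − u)]/(1 − a) of order a > 0. This form satisfies Φ(0) = Φ(1) = 0 and Φ′(0) = 1, and it tends to the Shannon case −(1 − u) ln(1 − u) as a → 1. The order is called a here, a hypothetical name, because it need not coincide with the parameter α in (A13). As a check, the entropy ∫₀^∞ Φ(F(x)) dx of the standard exponential distribution equals (1/a − 1)/(1 − a) = 1/a, which reduces to the known value 1 of the cumulative residual Shannon entropy for a = 1.

```python
import numpy as np

def phi_crhc(u, a):
    """Assumed cumulative residual Havrda-Charvat generating function of order a > 0:
    ((1 - u)**a - (1 - u)) / (1 - a), with the Shannon limit -(1 - u) ln(1 - u) at a = 1."""
    v = 1.0 - np.asarray(u, dtype=float)
    if a == 1.0:
        # inner np.where avoids evaluating log(0); 0 ln 0 is taken as 0
        return np.where(v > 0.0, -v * np.log(np.where(v > 0.0, v, 1.0)), 0.0)
    return (v ** a - v) / (1.0 - a)

def cumulative_residual_entropy(cdf, a, upper=60.0, n=600_000):
    """int_0^inf Phi(F(x)) dx for a non-negative variable, truncated at `upper` (midpoint rule)."""
    x = (np.arange(n) + 0.5) * upper / n
    return upper * np.mean(phi_crhc(cdf(x), a))

exp_cdf = lambda x: 1.0 - np.exp(-x)     # Exp(1): support [0, inf) and Q(0) = 0
for a in (0.5, 1.0, 2.0):
    print(a, cumulative_residual_entropy(exp_cdf, a))   # close to 1 / a
```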