Inequalities of the Edmundson-Lah-Ribarič type for n-convex functions with applications

In this paper we derive some Edmundson-Lah-Ribarič type inequalities for positive linear functionals and n-convex functions. The main results are applied to the generalized f-divergence functional. Examples with the Zipf-Mandelbrot law illustrate the results.


Introduction
Let E be a non-empty set and let L be a vector space of real-valued functions f : E → R having the properties: (L1) f, g ∈ L ⇒ (af + bg) ∈ L for all a, b ∈ R; (L2) 1 ∈ L, i.e., if f (t) = 1 for every t ∈ E, then f ∈ L.
Since it was first proved, the famous Jensen inequality and its converses have been extensively studied by many authors and generalized in numerous directions. Jessen [17] gave the following generalization of Jensen's inequality for convex functions (see also [30, p.47]):

Theorem 1.1. ([17]) Let L satisfy properties (L1) and (L2) on a non-empty set E, and assume that f is a continuous convex function on an interval I ⊂ R. If A is a positive linear functional with A(1) = 1, then for all g ∈ L such that f(g) ∈ L we have A(g) ∈ I and

f(A(g)) ≤ A(f(g)). (1.1)

The following result is one of the most famous converses of the Jensen inequality, known as the Edmundson-Lah-Ribarič inequality; it was proved in [3] by Beesack and Pečarić (see also [30, p.98]):

Theorem 1.2. ([3]) Let f be convex on the interval I = [a, b], where −∞ < a < b < ∞. Let L satisfy conditions (L1) and (L2) on E and let A be any positive linear functional on L with A(1) = 1. Then for every g ∈ L such that f(g) ∈ L (so that a ≤ g(t) ≤ b for all t ∈ E) we have

A(f(g)) ≤ ((b − A(g)) / (b − a)) f(a) + ((A(g) − a) / (b − a)) f(b). (1.2)

For some recent results on the converses of the Jensen inequality, the reader is referred to [7], [19], [20], [27], [29] and [31].
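The two classical inequalities just quoted can be checked numerically for a discrete positive normalized functional. The following sketch (an illustration, not part of the paper; the weights and sample values are arbitrary choices) verifies Jensen's bound f(A(g)) ≤ A(f(g)) and the Edmundson-Lah-Ribarič bound for the convex function f = exp on [a, b] = [0, 2]:

```python
import math

# Discrete positive linear functional A(h) = sum_i w_i * h(t_i) with
# nonnegative weights summing to 1, so that A(1) = 1.
def A(h, w, t):
    return sum(wi * h(ti) for wi, ti in zip(w, t))

f = math.exp                          # convex on any interval
a, b = 0.0, 2.0
w = [0.2, 0.5, 0.3]                   # arbitrary weights, sum = 1
g = [0.1, 1.0, 1.9]                   # values of g(t), all in [a, b]

Ag = A(lambda x: x, w, g)             # A(g)
Afg = A(f, w, g)                      # A(f(g))
# Right-hand side of the Edmundson-Lah-Ribaric inequality:
elr = (b - Ag) / (b - a) * f(a) + (Ag - a) / (b - a) * f(b)

# Jensen and Edmundson-Lah-Ribaric chained together:
assert f(Ag) <= Afg <= elr
```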
Unlike the results from the above-mentioned papers, which require convexity of the involved functions, the main objective of this paper is to obtain inequalities of the Edmundson-Lah-Ribarič type that hold for n-convex functions; these also generalize the results from [24] and [25].
The definition of n-convex functions is characterized by n-th order divided differences. The n-th order divided difference of a function f : [a, b] → R at mutually distinct points t_0, t_1, ..., t_n ∈ [a, b] is defined recursively by

[t_i]f = f(t_i), 0 ≤ i ≤ n,
[t_0, ..., t_n]f = ([t_1, ..., t_n]f − [t_0, ..., t_{n−1}]f) / (t_n − t_0).

The value [t_0, ..., t_n]f is independent of the order of the points t_0, ..., t_n. A function f : [a, b] → R is said to be n-convex if [t_0, ..., t_n]f ≥ 0 for every choice of n + 1 mutually distinct points from [a, b]. The definition of divided differences can be extended to include the cases in which some or all of the points coincide (see e.g. [2], [30]): for sufficiently smooth f,

f[a, ..., a] (n times) = f^{(n−1)}(a) / (n − 1)!.

The results in this paper are obtained by utilizing Hermite's interpolating polynomial, so first we need to give a definition and some properties (see [2]).
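The recursion above can be sketched directly in code. The following illustration (ours, not from the paper) also checks two standard consequences: the n-th order divided difference of a polynomial of degree below n vanishes, and exp has nonnegative divided differences of every order, i.e. it is n-convex for every n:

```python
from math import exp

def divided_difference(f, points):
    """[t_0, ..., t_n]f at mutually distinct points, by the recursion above."""
    if len(points) == 1:
        return f(points[0])
    left = divided_difference(f, points[1:])    # [t_1, ..., t_n]f
    right = divided_difference(f, points[:-1])  # [t_0, ..., t_{n-1}]f
    return (left - right) / (points[-1] - points[0])

# A 3rd order divided difference of a degree-2 polynomial vanishes.
poly = lambda t: 3 * t**2 - t + 5
assert abs(divided_difference(poly, [0.0, 0.5, 1.3, 2.0])) < 1e-12

# exp has nonnegative divided differences of every order (n-convexity).
assert divided_difference(exp, [0.0, 0.4, 1.1, 1.7, 2.5]) >= 0
```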
Let −∞ < a < b < ∞ and let a ≤ a_1 < a_2 < ... < a_r ≤ b, where r ≥ 2, be given points. For f ∈ C^n([a, b]) there exists a unique polynomial P_H(t), called Hermite's interpolating polynomial, of degree (n − 1) fulfilling Hermite's conditions:

P_H^{(i)}(a_j) = f^{(i)}(a_j), 0 ≤ i ≤ k_j, 1 ≤ j ≤ r, with Σ_{j=1}^r k_j + r = n.

Among other special cases, these conditions include type (m, n − m) conditions, which will be of special interest to us: r = 2, a_1 = a, a_2 = b, k_1 = m − 1 and k_2 = n − m − 1 for some 1 ≤ m ≤ n − 1, that is,

P_H^{(i)}(a) = f^{(i)}(a), 0 ≤ i ≤ m − 1,
P_H^{(i)}(b) = f^{(i)}(b), 0 ≤ i ≤ n − m − 1.

To give a development of the interpolating polynomial in terms of divided differences, first let us assume that the function f is also defined at a point t ≠ a_j, 1 ≤ j ≤ n. In [2] it is shown that

f(t) = P_H(t) + R(t), (1.4)

where

R(t) = (t − a_1) · · · (t − a_n) f[t, a_1, ..., a_n]. (1.5)

In the case of (m, n − m) conditions, (1.4) and (1.5) become

f(t) = P_{m,n−m}(t) + R_m(t), (1.6)

where

R_m(t) = (t − a)^m (t − b)^{n−m} f[t, a, ..., a (m times), b, ..., b (n − m times)]. (1.7)

This paper is organized in the following manner: the main results, inequalities of the Edmundson-Lah-Ribarič type for n-convex functions, are given in Section 2; an application of the main results to the generalized f-divergence functional is given in Section 3; finally, in Section 4 the results for the generalized f-divergence are applied to the Zipf-Mandelbrot law.
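For mutually distinct nodes the Hermite polynomial reduces to the ordinary interpolating polynomial, and the remainder identity (1.5) can then be verified numerically. The following sketch (nodes and the evaluation point are arbitrary choices of ours) evaluates the interpolating polynomial in Newton (divided-difference) form and checks f(t) = P(t) + (t − a_1)···(t − a_n) f[t, a_1, ..., a_n]:

```python
from math import sin, prod

def dd(f, pts):
    """Divided difference [pts]f via the standard recursion."""
    if len(pts) == 1:
        return f(pts[0])
    return (dd(f, pts[1:]) - dd(f, pts[:-1])) / (pts[-1] - pts[0])

f = sin
nodes = [0.0, 0.7, 1.5, 2.2]          # distinct interpolation nodes

def newton_poly(t):
    """Interpolating polynomial of f at `nodes`, in Newton form."""
    total, basis = 0.0, 1.0
    for k, a in enumerate(nodes):
        total += dd(f, nodes[:k + 1]) * basis
        basis *= (t - a)
    return total

t = 1.1                               # any point distinct from the nodes
remainder = prod(t - a for a in nodes) * dd(f, [t] + nodes)
assert abs(f(t) - (newton_poly(t) + remainder)) < 1e-12
```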

Results
Throughout this paper, whenever mentioning the interval [a, b], we assume that −∞ < a < b < ∞ holds.
Let L satisfy conditions (L1) and (L2) on a non-empty set E, let A be any positive linear functional on L with A(1) = 1, and let g ∈ L be any function such that g(E) ⊆ [a, b]. For a given function f : [a, b] → R we use the notation introduced below. The following representations of the left-hand side of the Edmundson-Lah-Ribarič inequality are obtained by using Hermite's interpolating polynomials in terms of divided differences (1.6).
Lemma 2.1. Let L satisfy conditions (L1) and (L2) on a non-empty set E and let A be any positive linear functional on L with A(1) = 1. Let f ∈ C^n([a, b]), and let g ∈ L be any function such that f ∘ g ∈ L. Then the identities (2.2), (2.3) and (2.4) hold, where R_m(·) is defined in (1.7).

Proof. After some straightforward calculations, for different choices of 1 ≤ m ≤ n − 1, from (2.5) we obtain the expansions (2.6), (2.7) and (2.8), corresponding to m = 1, m = 2 and m ≥ 3 respectively. Since a ≤ g(t) ≤ b for every t ∈ E, we can replace t with g(t) in (2.6), (2.7) and (2.8).

Identities (2.2), (2.3) and (2.4) follow by applying the positive normalized linear functional A to the previous equalities, respectively.

Lemma 2.2. Let the assumptions of Lemma 2.1 hold. Then the identities (2.9), (2.10) and (2.11) hold, where m ≥ 3 and R*_m(·) is the corresponding remainder term.

Proof. Let us define an auxiliary function F to which (2.6), (2.7) and (2.8) can be applied, obtaining (2.13), (2.14) and (2.15) respectively. We can calculate the divided differences of the function F in terms of the divided differences of the function f, so (2.13), (2.14) and (2.15) become (2.16), (2.17) and (2.18). Let ḡ ∈ L be any function such that F ∘ ḡ ∈ L; after putting ḡ(t) in (2.16), (2.17) and (2.18) instead of t, we get the corresponding equalities for F(ḡ). Identities (2.9), (2.10) and (2.11) follow after applying the normalized positive linear functional A to the previous equalities, respectively.
Our first result is an upper bound for the difference in the Edmundson-Lah-Ribarič inequality, expressed by Hermite's interpolating polynomials in terms of divided differences.
Theorem 2.1. Let the assumptions of Lemma 2.1 hold, and let 3 ≤ m ≤ n − 1. If the function f is n-convex and n and m are of different parity, then inequality (2.19) holds. Inequality (2.19) also holds when the function f is n-concave and n and m are of equal parity. In case when the function f is n-convex and n and m are of equal parity, or when the function f is n-concave and n and m are of different parity, the inequality sign in (2.19) is reversed.
Proof. We start with the representation (2.4) of the left-hand side of the Edmundson-Lah-Ribarič inequality from Lemma 2.1, with a special focus on the last term. Since A is positive, it preserves the sign, so we need to study the sign of the expression

(g(t) − a)^m (g(t) − b)^{n−m} f[g(t), a, ..., a (m times), b, ..., b (n − m times)].
Since a ≤ g(t) ≤ b for every t ∈ E, we have (g(t) − a) m ≥ 0 for every t ∈ E and any choice of m. For the same reason we have (g(t) − b) ≤ 0. Trivially it follows that (g(t) − b) n−m ≤ 0 when n and m are of different parity, and (g(t) − b) n−m ≥ 0 when n and m are of equal parity.
If the function f is n-convex, then its n-th order divided differences are greater than or equal to zero, so the sign analysis above yields (2.19).

Theorem 2.2. Let the assumptions of Lemma 2.2 hold. If the function f is n-convex and if m ≥ 3 is odd, then inequality (2.20) holds. Inequality (2.20) also holds when the function f is n-concave and m is even. In case when the function f is n-convex and m is even, or when the function f is n-concave and m is odd, the inequality sign in (2.20) is reversed.
Proof. Similarly as in the proof of the previous theorem, we start with the representation (2.11) of the left-hand side of the Edmundson-Lah-Ribarič inequality from Lemma 2.2, with a special focus on the last term, whose sign is determined by the factor (g(t) − a)^{n−m} (g(t) − b)^m and the corresponding n-th order divided difference.
Since a ≤ g(t) ≤ b for every t ∈ E, we have (g(t) − a) n−m ≥ 0 for every t ∈ E and any choice of m. For the same reason we have (g(t) − b) ≤ 0. Trivially it follows that (g(t) − b) m ≤ 0 when m is odd, and (g(t) − b) m ≥ 0 when m is even.
If the function f is n-convex, then its n-th order divided differences are greater than or equal to zero, and if the function f is n-concave, then its n-th order divided differences are less than or equal to zero. Now (2.20) easily follows from Lemma 2.2.
Inequality (2.21) also holds when the function f is n-concave and m is even. In case when the function f is n-convex and m is even, or when the function f is n-concave and m is odd, the inequality signs in (2.21) are reversed. The corresponding known result holds when the function f is 3-convex, and if the function f is 3-concave, then the inequality signs are reversed. It is obvious that inequalities (2.21) from Corollary 2.1 provide a generalization of the result stated above.
The next result gives an upper and a lower bound for the difference in the Edmundson-Lah-Ribarič inequality, expressed by Hermite's interpolating polynomials in terms of divided differences; it is obtained from Lemma 2.1.
Theorem 2.3. Let the assumptions of Lemma 2.1 hold. If the function f is n-convex and n is odd, then inequalities (2.22) hold. Inequalities (2.22) also hold when the function f is n-concave and n is even. In case when the function f is n-convex and n is even, or when the function f is n-concave and n is odd, the inequality signs in (2.22) are reversed.
Proof. From the discussion about positivity and negativity of the term A(R_m(g)) in the proof of Theorem 2.1, for m = 1 it follows that: * A(R_1(g)) ≥ 0 when the function f is n-convex and n is odd, or when f is n-concave and n is even; * A(R_1(g)) ≤ 0 when the function f is n-concave and n is odd, or when f is n-convex and n is even. Now the identity (2.2) gives the first inequality in (2.22) for A(R_1(g)) ≥ 0, and in case A(R_1(g)) ≤ 0 the inequality sign is reversed.
In the same manner, for m = 2 it follows that: * A(R_2(g)) ≤ 0 when the function f is n-convex and n is odd, or when f is n-concave and n is even; * A(R_2(g)) ≥ 0 when the function f is n-concave and n is odd, or when f is n-convex and n is even. In this case the identity (2.3) for A(R_2(g)) ≤ 0 gives the second inequality in (2.22), and in case A(R_2(g)) ≥ 0 the inequality sign is reversed.
When we combine the two results from above, we get exactly (2.22).
By utilizing Lemma 2.2 we can get similar bounds for the difference in the Edmundson-Lah-Ribarič inequality that hold for all n ∈ N, not only the odd ones.

Theorem 2.4. Let the assumptions of Lemma 2.2 hold. If the function f is n-convex, then inequalities (2.23) hold. If the function f is n-concave, the inequality signs in (2.23) are reversed.
Proof. We return to the discussion about positivity and negativity of the term A(R*_m(g)) in the proof of Theorem 2.2. For m = 1 we have (g(t) − b)^1 (g(t) − a)^{n−1} ≤ 0 for every t ∈ E, so A(R*_1(g)) ≥ 0 when the function f is n-concave, and A(R*_1(g)) ≤ 0 when the function f is n-convex. Now the identity (2.9) for an n-convex function f gives the first inequality in (2.23), and if the function f is n-concave, the inequality sign is reversed.
Similarly, for m = 2 we have (g(t) − b)^2 (g(t) − a)^{n−2} ≥ 0 for every t ∈ E, so A(R*_2(g)) ≥ 0 when the function f is n-convex, and A(R*_2(g)) ≤ 0 when the function f is n-concave. In this case the identity (2.10) for an n-convex function f gives the second inequality in (2.23), and if the function f is n-concave, the inequality sign is reversed.
When we combine the two results from above, we get exactly (2.23).
When we take n = 3 in (2.22) or (2.23), we obtain the corresponding bounds for 3-convex functions.

Applications to Csiszár divergence
Let us denote the set of all probability distributions by P; that is, we say p = (p_1, ..., p_r) ∈ P if p_i ∈ [0, 1] for i = 1, ..., r and Σ_{i=1}^r p_i = 1. Numerous divergence measures between two probability distributions have been introduced and comprehensively studied. Their applications can be found in the analysis of contingency tables [13], in the approximation of probability distributions [8], [22], in signal processing [18], and in pattern recognition [4], [6].
Csiszár [9], [10] introduced the f-divergence functional

D_f(p, q) := Σ_{i=1}^r q_i f(p_i / q_i), (3.1)

where f : [0, +∞) → R is a convex function; it represents a "distance function" on the set of probability distributions P. A great number of divergences are special cases of the Csiszár f-divergence for different choices of the function f. As in Csiszár [10], we interpret undefined expressions by

f(0) = lim_{t→0+} f(t), 0 · f(0/0) = 0, 0 · f(a/0) = a · lim_{t→∞} f(t)/t for a > 0.

In this section our intention is to derive mutual bounds for the generalized f-divergence functional in the described setting. In such a way, we will obtain some new reverse relations for the generalized f-divergence functional that correspond to the class of n-convex functions; this generalizes the results obtained in [25]. Throughout this section, when mentioning the interval [a, b], we assume that [a, b] ⊆ R_+. For an n-convex function f : [m, M] → R we give the following definition of the generalized f-divergence functional (3.2). The first result in this section is carried out by virtue of our Theorem 2.1.

Theorem 3.1. Let f ∈ C^n([a, b]) and let p = (p_1, ..., p_r) and q = (q_1, ..., q_r) be probability distributions such that p_i / q_i ∈ [a, b] for every i = 1, ..., r.
If the function f is n-convex and if n and 3 ≤ m ≤ n − 1 are of different parity, then inequality (3.4) holds. Inequality (3.4) also holds when the function f is n-concave and n and m are of equal parity. In case when the function f is n-convex and n and m are of equal parity, or when the function f is n-concave and n and m are of different parity, the inequality sign in (3.4) is reversed.
Proof. Let x = (x_1, ..., x_r) be such that x_i ∈ [a, b] for i = 1, ..., r. In the relation (2.19) we can replace the functional A by the discrete positive normalized linear functional A(f) = Σ_{i=1}^r q_i f(x_i). In the resulting relation we can set x_i = p_i / q_i, and after calculating x̄ we get (3.4).
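The substitution used in this proof can be checked numerically. A minimal sketch (names and sample distributions are ours): for probability distributions p and q, the discrete functional A(h) = Σ_i q_i h(x_i) with x_i = p_i/q_i is positive, linear and normalized, A applied to the identity gives Σ_i p_i = 1, and A(f ∘ g) is exactly the Csiszár f-divergence (3.1):

```python
import math

def f_divergence(p, q, f):
    """D_f(p, q) = sum_i q_i * f(p_i / q_i); all q_i > 0 assumed here."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]          # arbitrary sample distributions
q = [0.4, 0.4, 0.2]
x = [pi / qi for pi, qi in zip(p, q)]

def A(values):
    """Discrete positive linear functional built from the weights q_i."""
    return sum(qi * v for qi, v in zip(q, values))

f = lambda t: t * math.log(t)                     # Kullback-Leibler generator

assert abs(A([1.0] * len(q)) - 1) < 1e-12         # A(1) = 1 (normalized)
assert abs(A(x) - 1) < 1e-12                      # A(g) = sum_i p_i = 1
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
assert abs(f_divergence(p, q, f) - kl) < 1e-12    # D_f = KL for f(t) = t log t
assert f_divergence(p, q, f) >= 0                 # f convex, f(1) = 0
```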
By utilizing Theorem 2.2 in an analogous way, we get an Edmundson-Lah-Ribarič type inequality for the generalized f-divergence functional (3.2) which does not depend on the parity of n; it is given in the following theorem.

Theorem 3.2. Let f ∈ C^n([a, b]) and let p = (p_1, ..., p_r) and q = (q_1, ..., q_r) be probability distributions such that p_i / q_i ∈ [a, b] for every i = 1, ..., r. If the function f is n-convex and if 3 ≤ m ≤ n − 1 is odd, then inequality (3.5) holds. Inequality (3.5) also holds when the function f is n-concave and m is even. In case when the function f is n-convex and m is even, or when the function f is n-concave and m is odd, the inequality sign in (3.5) is reversed.
Another generalization of the Edmundson-Lah-Ribarič inequality, which provides a lower and an upper bound for the generalized f-divergence functional, is given in the following theorem.

Theorem 3.3. Let f ∈ C^n([a, b]) and let p = (p_1, ..., p_r) and q = (q_1, ..., q_r) be probability distributions such that p_i / q_i ∈ [a, b] for every i = 1, ..., r. If the function f is n-convex and n is odd, then inequalities (3.6) hold. Inequalities (3.6) also hold when the function f is n-concave and n is even. In case when the function f is n-convex and n is even, or when the function f is n-concave and n is odd, the inequality signs in (3.6) are reversed.
Proof. We start with inequalities (2.22) from Theorem 2.3, and follow the steps from the proof of Theorem 3.1.
By utilizing Theorem 2.4 in an analogous way, we can get similar bounds for the generalized f-divergence functional that hold for all n ∈ N, not only the odd ones.

Theorem 3.4. Let f ∈ C^n([a, b]) and let p = (p_1, ..., p_r) and q = (q_1, ..., q_r) be probability distributions such that p_i / q_i ∈ [a, b] for every i = 1, ..., r. If the function f is n-convex, then inequalities (3.7) hold. If the function f is n-concave, the inequality signs in (3.7) are reversed.
⊲ The Kullback-Leibler divergence of the probability distributions p and q is defined as

D_KL(p, q) := Σ_{i=1}^r p_i log(p_i / q_i),

and the corresponding generating function is f(t) = t log t, t > 0. We can calculate

f^{(n)}(t) = (−1)^n (n − 2)! t^{1−n}, n ≥ 2.

It is clear that this function is (2n − 1)-concave and (2n)-convex for any n ∈ N.

⊲ The Hellinger divergence of the probability distributions p and q is defined as

D_H(p, q) := (1/2) Σ_{i=1}^r (√p_i − √q_i)^2,

and the corresponding generating function is f(t) = (1/2)(1 − √t)^2, t > 0. We see that

f^{(n)}(t) = (−1)^n ((2n − 3)!! / 2^n) t^{(1−2n)/2}, n ≥ 2,

so the function f is (2n − 1)-concave and (2n)-convex for any n ∈ N.
Two cases need to be considered: * if t < −1, then the function f is n-convex for every n ∈ N; * if t > −1, then the function f is (2n)-concave and (2n − 1)-convex for any n ∈ N.

⊲ The Jeffreys divergence of the probability distributions p and q is defined as

D_J(p, q) := Σ_{i=1}^r (p_i − q_i) log(p_i / q_i),

and the corresponding generating function is f(t) = (1 − t) log(1/t), t > 0. After calculating, we see that

f^{(n)}(t) = (−1)^n (n − 2)! t^{−n} (t + n − 1), n ≥ 2.
Obviously, this function is (2n − 1)-concave and (2n)-convex for any n ∈ N. It is clear that all of the results from this section can be applied to the special types of divergences mentioned in this example.
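The parity claims in this example can be spot-checked numerically through divided differences, since n-convexity means nonnegative n-th order divided differences (the evaluation points below are arbitrary choices of ours):

```python
import math

def dd(f, pts):
    """Divided difference [pts]f via the standard recursion."""
    if len(pts) == 1:
        return f(pts[0])
    return (dd(f, pts[1:]) - dd(f, pts[:-1])) / (pts[-1] - pts[0])

kl = lambda t: t * math.log(t)                  # Kullback-Leibler generator
hellinger = lambda t: 0.5 * (1 - math.sqrt(t)) ** 2

pts3 = [0.5, 0.9, 1.4, 2.0]          # 4 points -> 3rd order (odd)
pts4 = [0.5, 0.9, 1.4, 2.0, 2.7]     # 5 points -> 4th order (even)

for f in (kl, hellinger):
    assert dd(f, pts3) <= 0          # odd-order divided differences: concave
    assert dd(f, pts4) >= 0          # even-order divided differences: convex
```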

Examples with Zipf and Zipf-Mandelbrot law
Zipf's law [33], [34] has significant applications in a wide variety of scientific disciplines, from astronomy to demographics to software structure to economics to zoology, and even to warfare [12]. It is one of the basic laws in information science and bibliometrics, and it is also often used in linguistics. Typically one is dealing with integer-valued observables (numbers of objects, people, cities, words, animals, corpses) and the frequency of their occurrence.
The probability mass function of Zipf's law with parameters N ∈ N and s > 0 is

f(i; N, s) = 1 / (i^s H_{N,s}), i = 1, ..., N, where H_{N,s} = Σ_{k=1}^N 1/k^s.

In 1966 Benoit Mandelbrot gave an improvement of the Zipf law for the count of the low-rank words. Various scientific fields use this law for different purposes; for example, information sciences use it for indexing [11, 32], ecological field studies use it in the predictability of ecosystems [26], and in music it is used to determine aesthetically pleasing music [23].
The Zipf-Mandelbrot law is a discrete probability distribution with parameters N ∈ N and q, s ∈ R such that q ≥ 0 and s > 0, possible values {1, 2, ..., N}, and probability mass function

f(i; N, q, s) = 1 / ((i + q)^s H_{N,q,s}), i = 1, ..., N, where H_{N,q,s} = Σ_{k=1}^N 1/(k + q)^s. (4.1)

Let p and q be Zipf-Mandelbrot laws with parameters N ∈ N, q_1, q_2 ≥ 0 and s_1, s_2 > 0 respectively, and let us denote H_{N,q_1,s_1} = H_1 and H_{N,q_2,s_2} = H_2. Then

a_{p,q} := min_{1≤i≤N} p_i/q_i = (H_2/H_1) min_{1≤i≤N} (i + q_2)^{s_2} / (i + q_1)^{s_1},
b_{p,q} := max_{1≤i≤N} p_i/q_i = (H_2/H_1) max_{1≤i≤N} (i + q_2)^{s_2} / (i + q_1)^{s_1}. (4.2)
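As an illustration of the bounds on the likelihood ratio p_i/q_i (the parameter values below are arbitrary choices of ours), the following sketch builds two Zipf-Mandelbrot laws and computes the minimum and maximum of p_i/q_i:

```python
def zipf_mandelbrot(N, q, s):
    """pmf f(i; N, q, s) = 1 / ((i + q)^s * H_{N,q,s}), i = 1, ..., N."""
    weights = [1.0 / (i + q) ** s for i in range(1, N + 1)]
    H = sum(weights)                  # generalized harmonic number H_{N,q,s}
    return [w / H for w in weights]

N = 10
p = zipf_mandelbrot(N, 1.0, 1.2)      # parameters q_1 = 1.0, s_1 = 1.2
q = zipf_mandelbrot(N, 0.0, 0.8)      # parameters q_2 = 0.0, s_2 = 0.8

ratios = [pi / qi for pi, qi in zip(p, q)]
a_pq, b_pq = min(ratios), max(ratios)

assert abs(sum(p) - 1) < 1e-12 and abs(sum(q) - 1) < 1e-12   # both are pmfs
assert all(a_pq <= r <= b_pq for r in ratios)                # p_i/q_i in [a, b]
```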