Estimation of divergence measures via weighted Jensen inequality on time scales

The main purpose of this paper is to obtain time scale inequalities for various divergences and distances by using the weighted time scales Jensen inequality. These results provide new inequalities in h-discrete calculus and quantum calculus and extend some known results in the literature. Lower bounds for several divergence measures are also presented. Moreover, the obtained discrete results are illustrated in the light of the Zipf–Mandelbrot law and the Zipf law.


Introduction
Distance or divergence measures are of key importance in statistics and information theory. Depending upon the nature of the problem, different divergence measures are suitable. A number of measures of divergence that compare two probability distributions have been proposed (see [15,16,23,24,31,37] and the references therein). Csiszár [12] introduced the f-divergence functional as follows.

Definition 1.1 Suppose that f : R_+ → (0, ∞) is a convex function. Let r = (r_1, . . . , r_n) and s = (s_1, . . . , s_n) be such that ∑_{k=1}^{n} r_k = 1 and ∑_{k=1}^{n} s_k = 1. Then the f-divergence functional is defined as

I_f(r, s) := ∑_{k=1}^{n} s_k f(r_k / s_k).

Csiszár's f-divergence is a broad class of divergences that contains various divergence measures used to quantify the difference between two probability densities. A significant property of Csiszár's f-divergence is that several well-known divergence measures can be deduced from it by suitable substitutions for the convex function f. In recent years, several researchers have done considerable work providing various kinds of bounds on divergences and distances; see, e.g., [13,14,25,33].

Jensen's inequality plays an important role in obtaining inequalities for divergence measures: it helps to compute useful upper bounds for several entropic measures used in information theory. In [18], Jain et al. established an information inequality regarding the Csiszár f-divergence by utilizing the convexity condition and Jensen's inequality; this inequality is applied in comparing some well-known divergences that play a significant role in information theory. In [19], Khan et al. obtained new results for the Shannon and Zipf-Mandelbrot entropies and computed different bounds for these entropies by using refinements of the Jensen inequality. In [21], the authors established various inequalities for convex functions and applied them to the Csiszár divergence; they also obtained several results for the Zipf-Mandelbrot entropy.
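As a numerical illustration of Definition 1.1 (the function name below is ours, not the paper's), the following minimal Python sketch evaluates the f-divergence functional for discrete distributions; choosing f(t) = t log t recovers the Kullback-Leibler divergence.

```python
import math

def csiszar_f_divergence(r, s, f):
    """Csiszar f-divergence I_f(r, s) = sum_k s_k * f(r_k / s_k)
    for discrete probability distributions r and s (all s_k > 0)."""
    return sum(s_k * f(r_k / s_k) for r_k, s_k in zip(r, s))

# f(t) = t*log(t) recovers the Kullback-Leibler divergence D(r, s).
kl = csiszar_f_divergence([0.2, 0.8], [0.5, 0.5], lambda t: t * math.log(t))
```

For identical distributions every ratio r_k/s_k equals 1 and f(1) = 0, so the divergence vanishes, consistent with the role of I_f as a dissimilarity measure.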
In [27], Mehmood et al. obtained a new generalized form of cyclic refinements of Jensen's inequality, from convex to higher order convex functions, by utilizing Taylor's formula; they also computed bounds for various notable inequalities used in information theory. In [11], Butt et al. used discrete and continuous cyclic refinements of Jensen's inequality and extended them from convex to higher order convex functions by using new Green functions and the Abel-Gontscharoff interpolating polynomial; as an application, they established a connection between new entropic bounds for the relative, Shannon, and Mandelbrot entropies. In [22], Khan et al. obtained further related results. The monographs [8,9] on time scales are compact and cover a large part of time scales calculus. In the past years, new developments in the theory and applications of dynamic derivatives on time scales have emerged. Many results from the continuous case carry over to the discrete one very easily, but some turn out to be completely different; the study of time scales reveals such discrepancies and helps us understand the difference between the two cases. The Jensen inequality has been extended to time scales by Agarwal et al. (see [1,8]). Various classical inequalities and their converses for isotonic linear functionals on time scales were established in [5]. In [6], Anwar et al. gave properties and applications of Jensen functionals on time scales for one variable. Further, in [7], the authors obtained the Jensen inequality for several variables and deduced Jensen functionals; they also derived properties of Jensen functionals and applied them to generalized means. In recent years, the study of dynamic inequalities on time scales has been taken up by several authors; see [1,28,30,32,36,39,40]. In [3], Ansari et al. obtained Shannon type inequalities on an arbitrary time scale and deduced bounds for the differential entropy on time scales for various distributions.
Further, in [4], the authors established several inequalities for the Csiszár f-divergence between two probability densities on time scales. They also obtained new results for divergence measures in h-discrete calculus and quantum calculus.
Quantum calculus, or q-calculus, is often called calculus without limits. In 1910, Jackson [17] described a q-analogue of the derivative and the integral operator along with their applications; he was the first to develop q-calculus in an organized form. It is important to note that quantum integral inequalities are more significant and constructive than their classical counterparts, mainly because they can describe the hereditary properties of the phenomena and processes under consideration. Recently, q-calculus has developed rapidly, and new generalizations of the classical approach of quantum calculus have been proposed and analyzed in the literature. The concepts of quantum calculus on finite intervals were given by Tariboon and Ntouyas [34,35], who obtained certain q-analogues of classical mathematical objects, which motivated numerous researchers to explore the subject in detail. Subsequently, several new results related to quantum counterparts of classical mathematical results have been established.

Preliminaries
A time scale T is an arbitrary nonempty closed subset of the real line R. The subsequent definitions and results are given in [8].

Definition 2.2
Let T be a time scale and z : T → R be a function. Then z is called rd-continuous (right-dense continuous) if its left-sided limits exist (finite) at left-dense points of T and it is continuous at right-dense points of T. The set of rd-continuous functions z : T → R is denoted by C_rd.
Let us introduce the set T^κ as follows: if T has a left-scattered maximum M, then T^κ := T \ {M}; otherwise, T^κ := T.

Definition 2.3
Consider a function z : T → R and ζ ∈ T^κ. Then we define z^Δ(ζ) to be the number (when it exists) with the property that, given any ε > 0, there is a neighborhood U of ζ such that

|z(σ(ζ)) − z(s) − z^Δ(ζ)(σ(ζ) − s)| ≤ ε |σ(ζ) − s|  for all s ∈ U.

In this case, z is said to be delta differentiable at ζ.
For T = R, z^Δ becomes the ordinary derivative z′, while if T = Z, then z^Δ turns into the usual forward difference operator Δz(ζ) = z(ζ + 1) − z(ζ). If T = q^Z := {q^n : n ∈ Z} ∪ {0} with q > 1, then z^Δ becomes the so-called q-difference operator

z^Δ(ζ) = (z(qζ) − z(ζ)) / ((q − 1)ζ),  ζ ≠ 0.

Theorem 2.1 (Existence of antiderivatives) Every rd-continuous function has an antiderivative. If x_0 ∈ T, then F defined by

F(ζ) := ∫_{x_0}^{ζ} f(τ) Δτ,  ζ ∈ T,

is an antiderivative of f.
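To make the specializations above concrete, here is a small Python sketch (function names are ours) of the delta derivative on T = Z and on T = q^Z: on Z it is the forward difference, and on q^Z it is the q-difference operator.

```python
def delta_Z(z, t):
    """Delta derivative on T = Z: the forward difference z(t + 1) - z(t)."""
    return z(t + 1) - z(t)

def delta_qZ(z, t, q):
    """Delta derivative on T = q^Z (q > 1): (z(q*t) - z(t)) / ((q - 1)*t), t != 0."""
    return (z(q * t) - z(t)) / ((q - 1) * t)

square = lambda t: t * t
d_forward = delta_Z(square, 3)          # 4**2 - 3**2 = 7
d_quantum = delta_qZ(square, 3.0, 2.0)  # (36 - 9) / 3 = 9.0
```

For z(t) = t², the q-difference evaluates to (q + 1)t, which tends to the ordinary derivative 2t as q → 1, matching the T = R case.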
In [38], Wong et al. gave the weighted Jensen inequality on time scales which is stated as follows.

Theorem 2.2 Assume that I ⊂ R is an interval, a, b ∈ T with a < b, and let r ∈ C_rd([a, b]_T, R) with ∫_a^b |r(ζ)| Δζ > 0 and z ∈ C_rd([a, b]_T, I). If f : I → R is convex, then

f( ∫_a^b |r(ζ)| z(ζ) Δζ / ∫_a^b |r(ζ)| Δζ ) ≤ ∫_a^b |r(ζ)| f(z(ζ)) Δζ / ∫_a^b |r(ζ)| Δζ.   (1)

When f is a strictly convex function, the inequality sign in (1) is strict.
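The discrete case of inequality (1) can be checked numerically; in the sketch below (names are ours), the weights play the role of |r(ζ)| and the values that of z(ζ).

```python
def jensen_sides(weights, values, f):
    """Both sides of the weighted Jensen inequality:
    lhs = f(weighted mean of values), rhs = weighted mean of f(values).
    For convex f, lhs <= rhs."""
    W = sum(weights)
    lhs = f(sum(w * z for w, z in zip(weights, values)) / W)
    rhs = sum(w * f(z) for w, z in zip(weights, values)) / W
    return lhs, rhs

# x**2 is convex, so the left-hand side cannot exceed the right-hand side.
lhs, rhs = jensen_sides([1, 2, 3], [0.5, 1.5, 4.0], lambda x: x * x)
assert lhs <= rhs
```

All of the divergence bounds in the following sections arise from this single inequality by choosing a suitable convex f.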

Divergences on time scales
Consider the set of rd-continuous positive probability density functions on the time scale T:

Ω := { z ∈ C_rd([a, b]_T, R) : z(ζ) > 0 for all ζ ∈ [a, b]_T and ∫_a^b z(ζ) Δζ = 1 }.

In the sequel, we assume that r, s ∈ Ω and that all integrals appearing below exist.

Csiszár f -divergence
Csiszár f -divergence on time scale is defined in [4] as follows: where f is convex on (0, ∞).

Differential entropy (continuous entropy)
Consider a positive density function r on a time scale T associated with a continuous random variable X, with ∫_a^b r(ζ) Δζ = 1, whenever the integral exists. In [3], Ansari et al. defined the so-called differential entropy on time scales by

h̄_b(X) := ∫_a^b r(ζ) log_b(1/r(ζ)) Δζ,

where b > 1 is the base of log. In the sequel, we assume that the base of log is greater than 1.
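As an illustration (function name and test density are ours), the following Python sketch evaluates the differential entropy in the h-discrete case T = hZ, where the Δ-integral reduces to h times a finite sum.

```python
import math

def entropy_hZ(r, h=1.0, base=2):
    """Differential entropy on T = hZ: h * sum_j r_j * log_base(1 / r_j),
    assuming r is a density w.r.t. the Delta measure, i.e. h * sum_j r_j = 1."""
    return h * sum(r_j * math.log(1.0 / r_j, base) for r_j in r)

# Uniform density on 4 points of hZ with h = 0.5: r_j = 1 / (4 * 0.5) = 0.5.
H = entropy_hZ([0.5, 0.5, 0.5, 0.5], h=0.5, base=2)  # equals 1.0 (one bit)
```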

Theorem 3.2 Suppose that r, s ∈ C_rd([a, b]_T, R) are Δ-integrable functions and r is a positive probability density function. Then inequality (7) holds, where h̄_b(ζ) is defined in (6) and a, b ∈ T.

Proof The result follows from the weighted Jensen inequality (1) after simplification.
Remark 3.2 The inequality in (7) holds in the opposite direction for the base of log less than 1.
Remark 3.4 Inequality (9) contains the Shannon entropy and, to the best of the authors' knowledge, is new in quantum calculus.

Karl Pearson χ 2 -divergence
The χ²-divergence on time scales is defined in [4] as follows:

D_{χ²}(s, r) := ∫_a^b (s(ζ) − r(ζ))² / r(ζ) Δζ.   (10)

Theorem 3.3 Assume the conditions of Theorem 3.1. Then inequality (11) holds, where D_{χ²}(s, r) is defined in (10).
after simplification we get the stated result.

Example 3.7 If T = R, then (11) takes the corresponding integral form, with the Δ-integral replaced by the Riemann integral.

Example 3.9 Choose T = q^{N_0} (q > 1) in Theorem 3.3 to obtain a new lower bound for the χ²-divergence in quantum calculus.
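For the h-discrete case T = hZ discussed in the paper, the Δ-integral in (10) reduces to h times a finite sum; the following Python sketch (names ours) computes this specialization.

```python
def chi2_divergence_hZ(s, r, h=1.0):
    """Pearson chi^2-divergence on T = hZ:
    D_chi2(s, r) = h * sum_j (s_j - r_j)**2 / r_j."""
    return h * sum((a - b) ** 2 / b for a, b in zip(s, r))

d = chi2_divergence_hZ([0.2, 0.8], [0.5, 0.5])  # 0.09/0.5 + 0.09/0.5 = 0.36
```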

Kullback-Leibler divergence
The Kullback-Leibler divergence on time scales is defined in [4] as follows:

D(s, r) := ∫_a^b s(ζ) log(s(ζ)/r(ζ)) Δζ.   (15)

Theorem 3.4 Assume the conditions of Theorem 3.1. Then the corresponding lower bound holds, where D(s, r) is defined in (15).

Proof Apply the weighted Jensen inequality (1) and simplify to obtain the desired result.
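The h-discrete specialization T = hZ of the Kullback-Leibler divergence can be sketched as follows (names ours); the Δ-integral again reduces to h times a finite sum.

```python
import math

def kl_divergence_hZ(s, r, h=1.0):
    """Kullback-Leibler divergence on T = hZ:
    D(s, r) = h * sum_j s_j * log(s_j / r_j)."""
    return h * sum(a * math.log(a / b) for a, b in zip(s, r))
```

It vanishes exactly when s and r coincide on the grid, and is positive otherwise for densities, in line with the lower bounds of this section.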

Hellinger discrimination
The Hellinger discrimination on time scales is defined in [4] as follows:

h²(s, r) := (1/2) ∫_a^b (√s(ζ) − √r(ζ))² Δζ.   (19)

Theorem 3.5 Assume the conditions of Theorem 3.1. Then the corresponding lower bound holds, where h²(s, r) is defined in (19).

Proof Apply the weighted Jensen inequality (1); after simplification we obtain the stated result.
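The h-discrete case of (19) can be sketched as follows (names ours):

```python
import math

def hellinger_hZ(s, r, h=1.0):
    """Hellinger discrimination on T = hZ:
    h^2(s, r) = (h / 2) * sum_j (sqrt(s_j) - sqrt(r_j))**2."""
    return 0.5 * h * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(s, r))
```

Note that the Hellinger discrimination is symmetric in its arguments, unlike the Kullback-Leibler divergence.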

Bhattacharyya coefficient
The Bhattacharyya coefficient on time scales is defined in [4] as follows:

D_B(s, r) := ∫_a^b √(s(ζ) r(ζ)) Δζ.   (25)

Theorem 3.6 Assume the conditions of Theorem 3.1. Then the corresponding bound holds, where D_B(s, r) is defined in (25).

Proof Apply the weighted Jensen inequality (1) and simplify to obtain the desired result.
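The h-discrete case of (25) is sketched below (names ours). For discrete probability densities (h = 1), identical distributions give coefficient 1, and h²(s, r) = 1 − D_B(s, r) links it to the Hellinger discrimination.

```python
import math

def bhattacharyya_hZ(s, r, h=1.0):
    """Bhattacharyya coefficient on T = hZ:
    D_B(s, r) = h * sum_j sqrt(s_j * r_j)."""
    return h * sum(math.sqrt(a * b) for a, b in zip(s, r))

coeff = bhattacharyya_hZ([0.5, 0.5], [0.5, 0.5])  # identical densities -> 1.0
```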

Jeffreys distance
The Jeffreys distance on time scales is defined in [4] as follows:

D_J(s, r) := ∫_a^b (s(ζ) − r(ζ)) log(s(ζ)/r(ζ)) Δζ.   (29)

Theorem 3.7 Assume the conditions of Theorem 3.1. Then the corresponding lower bound holds, where D_J(s, r) is defined in (29).

Proof Apply the weighted Jensen inequality (1) and simplify to obtain the desired result.
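The h-discrete case of (29) can be sketched as follows (names ours); the Jeffreys distance is the symmetrized Kullback-Leibler divergence, D_J(s, r) = D(s, r) + D(r, s).

```python
import math

def jeffreys_hZ(s, r, h=1.0):
    """Jeffreys distance on T = hZ:
    D_J(s, r) = h * sum_j (s_j - r_j) * log(s_j / r_j)."""
    return h * sum((a - b) * math.log(a / b) for a, b in zip(s, r))
```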

Triangular discrimination
Triangular discrimination on time scales is defined in [4] as follows:

D_Δ(r, s) := ∫_a^b (s(ζ) − r(ζ))² / (s(ζ) + r(ζ)) Δζ.   (33)

Theorem 3.8 Assume the conditions of Theorem 3.1. Then the corresponding lower bound holds, where D_Δ(r, s) is defined in (33).

Example 3.23 Choose T = hZ, h > 0, in Theorem 3.8 to get a new lower bound for the triangular discrimination in h-discrete calculus, where the discrimination takes the form h ∑_j (s_j − r_j)² / (s_j + r_j).
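The h-discrete form appearing in Example 3.23 can be sketched as follows (names ours):

```python
def triangular_hZ(s, r, h=1.0):
    """Triangular discrimination on T = hZ:
    D_Delta(r, s) = h * sum_j (s_j - r_j)**2 / (s_j + r_j)."""
    return h * sum((a - b) ** 2 / (a + b) for a, b in zip(s, r))
```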

Zipf-Mandelbrot law
The Zipf-Mandelbrot law is a discrete probability distribution defined via the probability mass function

f(j; N, a, b) := 1 / ((j + b)^a H_{N,a,b}),  j = 1, . . . , N,   (37)

where

H_{N,a,b} := ∑_{k=1}^{N} 1 / (k + b)^a

is a generalization of a harmonic number and N ∈ {1, 2, . . . }, a > 0, and b ∈ [0, ∞) are parameters. If b = 0 and N is finite, then the Zipf-Mandelbrot law is commonly known as the Zipf law. By expression (37), the probability mass function in connection with the Zipf law is

f(j; N, a) = 1 / (j^a H_{N,a}),  j = 1, . . . , N,

where H_{N,a} := ∑_{k=1}^{N} 1/k^a. Using f(j; N, a, b) in (37) as a probability mass function, we state the obtained results via the Zipf-Mandelbrot law.
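The probability mass function (37) can be generated numerically; the following Python sketch (name ours) returns the full N-tuple, with b = 0 giving the Zipf law.

```python
def zipf_mandelbrot_pmf(N, a, b):
    """Zipf-Mandelbrot pmf: f(j; N, a, b) = 1 / ((j + b)**a * H), j = 1..N,
    where H = H_{N,a,b} = sum_{k=1}^{N} 1 / (k + b)**a; b = 0 gives the Zipf law."""
    H = sum(1.0 / (k + b) ** a for k in range(1, N + 1))
    return [1.0 / ((j + b) ** a * H) for j in range(1, N + 1)]

pmf = zipf_mandelbrot_pmf(10, 1.2, 0.5)  # probabilities decrease in rank j
```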
For this reason, we give results concerning the Csiszár functional Ĩ_f(s̃, r̃) for the Zipf-Mandelbrot law. Three cases are treated: in case-1, r is defined via the Zipf-Mandelbrot law; in case-2, both s and r are defined via the Zipf-Mandelbrot law; in case-3, both s and r are defined via the Zipf law for N-tuples, and the Csiszár functional (5) takes the corresponding form. We start from case-1, which is for the single Zipf-Mandelbrot law r_j, j = 1, . . . , N. The next result is for case-2, as both s_j and r_j are defined by the Zipf-Mandelbrot law.

Corollary 4.2 Assume that I ⊂ R, and let N ∈ N, a_1, a_2 > 0, and b_1, b_2 ∈ [0, ∞) be such that 1/((j + b_1)^{a_1} H_{N,a_1,b_1}) ∈ I for j = 1, . . . , N. If f is a convex function, then inequality (45) holds.

Proof Using r_j = 1/((j + b_2)^{a_2} H_{N,a_2,b_2}) and s_j = 1/((j + b_1)^{a_1} H_{N,a_1,b_1}) for j = 1, . . . , N, in (4), we get (45).

To give certain results related to the particular cases of f-divergences, we begin with the well-known Kullback-Leibler divergence (18), with s_j and r_j defined by the Zipf-Mandelbrot law; the resulting bound is (48).

Proof The function f(ζ) = ζ ln(ζ) is convex. Using f(ζ) = ζ ln(ζ) in (45), we get (48).

The following result holds as both s_j and r_j are defined by the Zipf law. Analogous results for the Hellinger distance (23) are given as follows: inequality (50) holds when both s_j and r_j are defined by the Zipf-Mandelbrot law, and the corresponding result holds when both s_j and r_j are defined via the Zipf law.