Exact Expressions for Kullback–Leibler Divergence for Multivariate and Matrix-Variate Distributions

The Kullback–Leibler divergence is a measure of the divergence between two probability distributions, often used in statistics and information theory. However, exact expressions for it are known for only a few multivariate and matrix-variate distributions. In this paper, exact expressions for the Kullback–Leibler divergence are derived for over twenty multivariate and matrix-variate distributions. The expressions involve various special functions.


Introduction
The Kullback–Leibler divergence (KLD), due to [1], is a fundamental concept in information theory and statistics used to measure the divergence between two probability distributions. It quantifies how one probability distribution diverges from a second, reference probability distribution. Specifically, it gives the expected extra amount of information required to represent data sampled from one distribution using a code optimized for another distribution. The KLD is asymmetric and is not a true metric, as it does not satisfy the triangle inequality. It is widely employed in various fields, including machine learning, where it serves as a key component in tasks such as model comparison, optimization, and generative modeling, providing a measure of dissimilarity or discrepancy between probability distributions [2].
Suppose X is a continuous vector-variate random variable or a continuous matrix-variate random variable having one of two probability density functions $f_i(\cdot; \theta_i)$, $i = 1, 2$, parameterized by $\theta_i$, $i = 1, 2$. The KLD between $f_1(\cdot; \theta_1)$ and $f_2(\cdot; \theta_2)$ is defined by

$$D\left(f_1 \,\|\, f_2\right) = E\left[\log \frac{f_1(X; \theta_1)}{f_2(X; \theta_2)}\right], \qquad (1)$$

where the expectation is with respect to $f_1(\cdot; \theta_1)$.
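Closed forms such as (1) can always be sanity-checked by Monte Carlo. The following is a minimal sketch, not from the paper, using the multivariate normal case (whose closed form is classical) as the reference example; all helper names are illustrative.

```python
# Monte Carlo check of (1) for two multivariate normals, compared with
# the classical closed form. A sketch for numerical verification only.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
p = 3
mu1, mu2 = np.zeros(p), np.ones(p)
S1, S2 = np.eye(p), 2.0 * np.eye(p)

# Monte Carlo estimate of E_{f1}[log f1(X) - log f2(X)]
x = rng.multivariate_normal(mu1, S1, size=200_000)
kld_mc = np.mean(multivariate_normal.logpdf(x, mu1, S1)
                 - multivariate_normal.logpdf(x, mu2, S2))

# Classical closed form for the KLD between two multivariate normals
d = mu2 - mu1
S2inv = np.linalg.inv(S2)
kld_exact = 0.5 * (np.trace(S2inv @ S1) + d @ S2inv @ d - p
                   + np.log(np.linalg.det(S2) / np.linalg.det(S1)))
print(kld_mc, kld_exact)  # the two values should agree closely
```

The same pattern (sample from $f_1$, average the log ratio) applies to every distribution treated below, which makes it a useful check on the special-function expressions.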
Because of the increasing applications of the KLD, it is useful to have exact expressions for (1). Apart from the multivariate normal distribution, not many expressions have been derived for (1) for multivariate or matrix-variate distributions. The KLD for the multivariate generalized Gaussian distribution was derived only in 2019; see [3]. The KLD for the multivariate Cauchy distribution was derived only in 2022; see [4]. The KLD for the multivariate t distribution was derived only in 2023; see [5].
The aim of this paper is to derive exact expressions for (1) for over twenty multivariate and matrix-variate distributions. The exact expressions for multivariate distributions are presented in Section 2. The exact expressions for matrix-variate distributions are presented in Section 3. The derivations of all of the expressions, including a technical lemma needed for the derivations, are presented in Section 4. The distributions considered in this paper are continuous. We shall not be considering discrete distributions, including mixtures.
The functions and parameters used in this paper are all real-valued. The calculations involve several real-valued special functions, listed in Appendix A.
A closed form for (1) for the multivariate generalized Gaussian distribution was derived by [3], but it involved a special function defined as a (p − 1)-fold infinite sum. The expression we give in Section 2.2 is much simpler in that it involves a single infinite sum. A closed form for (1) for the Dirichlet distribution is available in [9] and at https://statproofbook.github.io/P/dir-kl.html (accessed on 1 July 2024).
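For concreteness, that Dirichlet closed form can be written in code. The following is a minimal sketch of the formula from the linked proof, with illustrative parameter values; it is an aid to the reader, not part of the derivations below.

```python
# KLD between Dirichlet(alpha) and Dirichlet(beta), following the closed
# form at statproofbook.github.io/P/dir-kl.html. A sketch, not the paper's code.
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kld(alpha, beta):
    """KLD between Dirichlet(alpha) and Dirichlet(beta)."""
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    a0 = alpha.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(beta.sum()) + gammaln(beta).sum()
            + np.sum((alpha - beta) * (digamma(alpha) - digamma(a0))))

print(dirichlet_kld([2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))  # illustrative values
```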

Inverted Dirichlet Distribution
Consider the joint probability density functions

Multivariate Gauss Hypergeometric Distribution [11]
Consider the joint probability density functions

Multivariate Kotz Type Distribution [12]
Consider the joint probability density functions. The corresponding KLD is valid provided that the infinite sum converges.

Multivariate Logistic Distribution [6]
Consider the joint probability density functions. The corresponding KLD is valid provided that $\sum_{j=1}^{p} i_j b_j a_j < 1$ and the infinite series converges.

Multivariate Logistic Distribution [7]
Consider the joint probability density functions. The corresponding KLD is valid provided that $\sum_{j=1}^{p} i_j c_j a_j < b$ and the infinite series converges.

Sarabia's Multivariate Normal Distribution [8]
Consider the joint probability density functions

Multivariate Pearson Type II Distribution
Consider the joint probability density functions.

Multivariate Selberg Beta Distribution [13]

Consider the joint probability density functions.

Multivariate Weighted Exponential Distribution [14]
Consider the joint probability density functions. The corresponding KLD follows from properties stated in [14], provided that the infinite series converge.

Von Mises Distribution
Consider the joint probability density functions.
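For the univariate case, the KLD has a well-known closed form in terms of the modified Bessel functions $I_0$ and $I_1$. The following is a minimal sketch assuming the standard $vM(\mu, \kappa)$ parameterization, which may differ from the parameterization used in this section; names are illustrative.

```python
# KLD between vM(mu1, kappa1) and vM(mu2, kappa2) in the univariate case,
# using the standard closed form with modified Bessel functions. A sketch.
import numpy as np
from scipy.special import i0, i1

def von_mises_kld(mu1, kappa1, mu2, kappa2):
    A1 = i1(kappa1) / i0(kappa1)  # mean resultant length under f1
    return np.log(i0(kappa2) / i0(kappa1)) + A1 * (kappa1 - kappa2 * np.cos(mu1 - mu2))

print(von_mises_kld(0.0, 2.0, 0.5, 1.0))  # illustrative values
```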

Matrix-Variate Beta Distribution [15]
Consider the joint probability density functions, with $\Omega$, $x$, and $\Omega - x$ being $p \times p$ positive definite matrices. The corresponding KLD follows.

Matrix-Variate Gamma Distribution
Consider the joint probability density functions

Matrix-Variate Gauss Hypergeometric Distribution [16]
Consider the joint probability density functions.

Matrix-Variate Inverse Beta Distribution
Consider the joint probability density functions.

Matrix-Variate Inverse Gamma Distribution
Consider the joint probability density functions

Matrix-Variate Normal Distribution
Consider the joint probability density functions, with the scale matrices being positive definite symmetric matrices of dimension $p \times p$ and $M_1$, $M_2$ being matrices of dimension $n \times p$. The corresponding KLD follows.

Matrix-Variate Two-Sided Power Distribution [19]

Consider the joint probability density functions. The corresponding KLD follows.
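As a numerical companion to the matrix-variate normal case above: a minimal sketch, assuming the standard identification $\mathrm{vec}(X) \sim N(\mathrm{vec}(M), V \otimes U)$ for $X \sim MN_{n \times p}(M, U, V)$, which reduces the matrix-variate normal KLD to the classical multivariate normal closed form. The function and variable names are illustrative, and this reduction is a standard fact rather than the paper's own derivation.

```python
# Matrix-variate normal KLD via the Kronecker reduction vec(X) ~ N(vec(M), kron(V, U)).
# A sketch under the stated identification; not the paper's expression.
import numpy as np

def matrix_normal_kld(M1, U1, V1, M2, U2, V2):
    """KLD between MN(M1, U1, V1) and MN(M2, U2, V2); U is n x n, V is p x p."""
    S1, S2 = np.kron(V1, U1), np.kron(V2, U2)   # covariances of vec(X)
    d = (M2 - M1).flatten(order="F")            # vec of the mean difference
    S2inv = np.linalg.inv(S2)
    k = S1.shape[0]                             # k = n * p
    return 0.5 * (np.trace(S2inv @ S1) + d @ S2inv @ d - k
                  + np.linalg.slogdet(S2)[1] - np.linalg.slogdet(S1)[1])

# Example with illustrative 2 x 3 matrices and identity scale matrices
M1, M2 = np.zeros((2, 3)), np.ones((2, 3))
print(matrix_normal_kld(M1, np.eye(2), np.eye(3), M2, np.eye(2), np.eye(3)))
```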

Proofs
Before presenting the proofs of the expressions in Sections 2 and 3, we state a lemma and give its proof.

A Technical Lemma
Let $I(a_1, \ldots, a_p, t_1, \ldots, t_p, b)$ denote the integral of interest, for $t_j > 0$, $a_j > 0$, $j = 1, 2, \ldots, p$, and $b > 0$. Then $I(a_1, \ldots, a_p, t_1, \ldots, t_p, b)$ admits a closed form. Proof. Setting $y_j = \exp(-a_j x_j)$ and assuming the conditions of the lemma, we can rewrite the integral accordingly. The result follows. □
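The substitution step can be made explicit. A minimal worked form of the change of variables, assuming each $x_j$ ranges over $(0, \infty)$, is

$$y_j = e^{-a_j x_j}, \qquad x_j = -\frac{\log y_j}{a_j}, \qquad dx_j = -\frac{dy_j}{a_j y_j},$$

so each $x_j \in (0, \infty)$ maps to $y_j \in (0, 1)$ and the integral becomes one over the unit cube $(0, 1)^p$.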

Proof for Section 2.1
The corresponding KLD can be expressed as follows. It is easy to show that this reduces to the required result.

Proof for Section 2.2
The corresponding KLD can be expressed as follows. The second expectation in (3) can be expressed directly. Write $V = PDP^{-1}$, where $P$ is an orthogonal matrix composed of eigenvectors of $V$ and $D$ is a diagonal matrix composed of the eigenvalues, say $\lambda_i$, of $V$. Then, the first expectation in (3) can be expressed in terms of $x$ and $z = P^T y$. Using the pseudo-polar transformation $z_1 = r \sin\theta_1$, $z_2 = r \cos\theta_1 \sin\theta_2$, $\ldots$, $z_p = r \cos\theta_1 \cos\theta_2 \cdots \cos\theta_{p-1}$ (https://en.wikipedia.org/wiki/Polar_coordinate_system, accessed on 1 July 2024), (4) can be expressed in terms of $x_i = \cos^2\theta_i$ and $B_p(x_1, \ldots)$; we can then apply the generalized multinomial theorem to calculate (5), provided that the infinite sum converges. Hence, the required result.
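For reference, under this pseudo-polar parameterization the volume element transforms (a sketch of the standard change of variables, with the usual spherical angular ranges) as

$$dz_1 \cdots dz_p = r^{p-1} \prod_{i=1}^{p-2} \cos^{p-1-i}\theta_i \, dr \, d\theta_1 \cdots d\theta_{p-1},$$

which is the source of the $\cos^2\theta_i$ factors collected into the $x_i$.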

Proof for Section 2.3
The corresponding KLD can be expressed as follows. It is easy to show that this reduces to the required result.

Proof for Section 2.4
The corresponding KLD can be expressed as follows. It is easy to show that the needed expectations are obtained by differentiating the ratio $C(a_1, \ldots, a_i, \ldots, a_K, b, c) \,/\, C(a_1, \ldots, a_K, b, c - \alpha)$ with respect to $\alpha$ and evaluating at $\alpha = 0$, so (7) reduces to the required result.
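The step used here is the standard differentiation identity (a generic sketch, assuming the expectation is finite in a neighborhood of $\alpha = 0$ and differentiation under the integral sign is justified)

$$E[\log h(X)] = \frac{d}{d\alpha} \log E\!\left[h(X)^{\alpha}\right]\bigg|_{\alpha = 0},$$

and for densities of this type $E[h(X)^{\alpha}]$ is a ratio of normalizing constants such as $C(a_1, \ldots, a_K, b, c - \alpha)/C(a_1, \ldots, a_K, b, c)$, so the derivative at $\alpha = 0$ typically produces digamma-type terms. The same device is used again in the proof for Section 3.2.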

Proof for Section 2.5
The corresponding KLD can be expressed as follows. The second expectation in (8) can be calculated directly, where $y = \Sigma_1^{-1/2} x$ and $t = q (y^T y)^a$. The first expectation in (8) can be calculated similarly, where $y = \Sigma_1^{-1/2} x$ and $t = q (y^T y)^a$.
As in Section 4.3, write $\Sigma = PDP^{-1}$, where $P$ is an orthogonal matrix composed of eigenvectors of $\Sigma$ and $D$ is a diagonal matrix composed of the eigenvalues, say $\lambda_i$, of $\Sigma$. Then, the fourth expectation in (8) can be expressed in terms of $y = \Sigma_1^{-1/2} x$ and $v = P^T y$. Using the pseudo-polar transformation, (9) can be expressed in terms of $x_i = \cos^2\theta_i$, $t = q r^{2a}$, and $B_p(x_1, \ldots)$; we can then apply the generalized multinomial theorem to calculate (10), provided that the infinite sum converges. The third expectation in (8) can be expressed in terms of two integrals, $I_1$ and $I_2$. The integral $I_1$ can be calculated directly, where $t = q r^{2a}$.
The integral $I_2$ can be calculated similarly, provided that the infinite sum converges. Hence, the required result.

Proof for Section 2.6
The corresponding KLD can be expressed as follows, since the remaining expectations are zero. Using the Taylor expansion for $\log(1 + z)$, the first expectation in (11) can be expressed as a series, where the last step follows by Lemma 1, provided that $\sum_{j=1}^{p} i_j b_j a_j < 1$ and the infinite series converges. The second expectation in (11) can be expressed similarly, where the penultimate step follows by Lemma 1. Hence, the required result.
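For reference, the expansion invoked here (and again in the proofs for Sections 2.7 and 2.11) is the standard Taylor series

$$\log(1 + z) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} z^k, \qquad |z| < 1,$$

applied termwise before the expectations are taken; the convergence conditions quoted in Sections 2.6 and 2.7 are the conditions under which this termwise treatment is valid.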

Proof for Section 2.7
The corresponding KLD can be expressed as follows, since the remaining expectations are zero. Using the Taylor expansion for $\log(1 + z)$, the first expectation in (12) can be expressed as a series, where the last step follows by Lemma 1, provided that $\sum_{j=1}^{p} i_j c_j a_j < b$ and the infinite series converges. The second expectation in (12) can be expressed similarly, where the penultimate step follows by Lemma 1. Hence, the required result.
Proof for Section 2.8

The corresponding KLD can be expressed as follows. Using results in [8], we can calculate (13), giving the required result.
Proof for Section 2.9

The corresponding KLD can be expressed as follows. It is easy to show that (14) reduces to the required result.

Proof for Section 2.10
The corresponding KLD can be expressed as follows. Easy calculations show that (15) reduces to the required result.

Proof for Section 2.11
The corresponding KLD can be expressed as follows. Using the series expansion for $\log(1 + z)$, we can express (16) as a series. Hence, the required result.

Proof for Section 2.12
The corresponding KLD can be expressed directly. Hence, the required result.

Proof for Section 3.1
The corresponding KLD can be expressed as follows. The expectations in (17) can be calculated directly. Hence, the required result.

Proof for Section 3.2
The corresponding KLD can be expressed as follows. It is easy to show that the ratio $B_p(a_1, \ldots, a_i + \alpha, \ldots, a_n; a_{n+1}) \,/\, B_p(a_1, \ldots, a_i, \ldots, a_n; a_{n+1})$, differentiated with respect to $\alpha$ and evaluated at $\alpha = 0$, gives the needed expectations, and the KLD reduces to the required result.

Proof for Section 3.3
The corresponding KLD can be expressed as follows. The first expectation in (19) can be calculated directly.
Since $E(X) = 2a\Sigma_1$, the second and third terms in (19) reduce to the corresponding trace expressions, respectively. Hence, the required result.
Proof for Section 3.4

The corresponding KLD can be expressed as follows. The expectations in (20) can be easily calculated.
Hence, the required result.

Proof for Section 3.5
The corresponding KLD can be expressed as follows. The expectations in (21) can be calculated directly. Hence, the required result.

Proof for Section 3.6
The corresponding KLD can be expressed as follows. The first expectation in (22) can be calculated directly. The expectations in (24) can be easily calculated. Hence, the required result.
Proof for Section 3.9

The corresponding KLD can be expressed as follows. The second expectation in (25) can be expressed directly. The first expectation in (25) can be expressed as a series for $a_j > 0$, $j = 1, 2, \ldots, k + 1$, where $I_\nu$ denotes the modified Bessel function of the first kind of order $\nu$, defined by

$$I_\nu(z) = \sum_{k=0}^{\infty} \frac{1}{k!\, \Gamma(k + \nu + 1)} \left(\frac{z}{2}\right)^{2k + \nu}.$$