A Note on Equivalent Conditions for Majorization

In this paper, we introduce novel characterizations of the classical concept of majorization in terms of upper triangular (resp., lower triangular) row-stochastic matrices, and in terms of sequences of linear transforms on vectors. We use our new characterizations of majorization to derive an improved entropy inequality.


Introduction
The concept of majorization has a rich history in mathematics, with applications that span a wide range of disciplines. Majorization theory originated in economics [1], where it was employed to rigorously explain the vague notion that the components of a given vector are "more nearly equal" than the components of a different vector. Nowadays, majorization theory finds applications in numerous areas, ranging from pure mathematics to combinatorics [2][3][4], from information and communication theory [5][6][7][8][9][10][11][12][13] to thermodynamics and quantum theory [14,15], from mathematical chemistry [16] to optimization [17], and much more.
There are many equivalent conditions for majorization. We review the most common ones in Section 2. Then, we present our new conditions for majorization in Sections 3 and 4. Finally, in Section 5, we present an application to entropy inequalities.
There are many equivalent conditions for majorization (e.g., see [2], Chapter 4). The conditions that are most closely related to the subject matter of our paper are expressed in terms of doubly stochastic matrices and T-transforms.
A T-transform is represented by a matrix of the form

$$T = \lambda I + (1-\lambda)Q,$$

where 0 ≤ λ < 1, I is the n × n identity matrix, and Q = [Q_{ℓm}] is the permutation matrix given by

$$Q_{\ell m} = \begin{cases} 1 & \text{for } \ell = m \text{ and } \ell, m \notin \{i, j\}, \\ 1 & \text{for } \ell = j \text{ and } m = i, \\ 1 & \text{for } \ell = i \text{ and } m = j, \\ 0 & \text{otherwise}, \end{cases}$$

for some indices i, j ∈ {1, . . ., n}, i ≠ j.

Majorization by lower triangular stochastic matrices
We start this section by introducing the concept of A-transform.
Informally, an A-transform of a vector x = (x_1, . . ., x_n) is a transformation that involves two vector components, x_i and x_j, with i < j. The transformation operates on the vector x by increasing the value of the component x_i by the quantity λx_j and decreasing the value of the component x_j by the same value λx_j, where λ is a real number with λ ∈ [0, 1]. More formally, an A-transform can be described by the matrix

$$A = I + X, \tag{3.2}$$

where I is the n × n identity matrix and X = [X_{ℓm}] is a matrix with all entries equal to 0 except for two elements X_{ji} and X_{jj}, for a given pair of indices i, j with j > i, where X_{ji} = λ and X_{jj} = −λ. Thus, the vector xA has the form

$$xA = (x_1, \ldots, x_{i-1},\; x_i + \lambda x_j,\; x_{i+1}, \ldots, x_{j-1},\; x_j - \lambda x_j,\; x_{j+1}, \ldots, x_n). \tag{3.3}$$

Note that the matrix A = [A_{ℓm}] is lower triangular and row-stochastic, that is,

$$A_{\ell m} \ge 0 \quad \text{for all } \ell, m, \tag{3.4}$$

$$\sum_{m=1}^{n} A_{\ell m} = 1 \quad \text{for all } \ell. \tag{3.5}$$

The following theorem holds.
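To make the definition concrete, here is a minimal numerical sketch in Python (with NumPy); the vector x, the indices, and the value of λ are hypothetical, and 0-based indices are used in place of the paper's 1-based notation.

```python
import numpy as np

def a_transform_matrix(n, i, j, lam):
    """A = I + X for an A-transform on components i < j (0-based):
    X[j, i] = lam and X[j, j] = -lam, so x_i gains lam * x_j and x_j loses it."""
    assert 0 <= i < j < n and 0.0 <= lam <= 1.0
    A = np.eye(n)
    A[j, i] = lam
    A[j, j] = 1.0 - lam
    return A

x = np.array([0.5, 0.3, 0.2])            # hypothetical vector
A = a_transform_matrix(3, 0, 2, 0.5)     # move 0.5 * x_2 = 0.1 onto x_0
print(x @ A)                             # -> [0.6 0.3 0.1]
print(np.allclose(A, np.tril(A)))        # A is lower triangular
print(np.allclose(A.sum(axis=1), 1.0))   # rows sum to 1 (row-stochastic)
```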
Theorem 3.1. Let x, y ∈ R^n_+. It holds that x ≺ y if, and only if, y can be derived from x by the successive application of a finite number of A-transforms.
Proof. Let x ≺ y. To avoid trivialities, we assume x ≠ y. We shall prove that y can be derived from x by the successive application of a finite number of A-transforms.
Since the first condition of (2.1) holds, there is an index j such that

$$\sum_{i=1}^{j} x_i < \sum_{i=1}^{j} y_i. \tag{3.6}$$

For the smallest such index j, it holds that x_j < y_j. From (3.6) and the second condition of (2.1), we get that there exists an index k > j such that x_k > y_k.

Let j be the smallest index such that y_j > x_j, and let k be the smallest index greater than j such that x_k > y_k. We set the quantity δ as

$$\delta = \min\{y_j - x_j,\; x_k - y_k\}. \tag{3.7}$$

We define an A-transform as in (3.2), with λ = δ/x_k and X = [X_{ℓm}] defined as follows: X_{kj} = λ, X_{kk} = −λ, and X_{ℓm} = 0 otherwise. The application of such a matrix A to the vector x gives the vector xA = x′ with components

$$x' = (x_1, \ldots, x_{j-1},\; x_j + \delta,\; x_{j+1}, \ldots, x_{k-1},\; x_k - \delta,\; x_{k+1}, \ldots, x_n). \tag{3.8}$$

We pause here to illustrate the rest of our proof technique, which proceeds through the following steps: (1) We compute the smallest index j for which (3.6) holds. This means that the vectors x and y coincide on the first j − 1 components.
(2) We modify vector x according to the A operator defined above, to get vector x ′ as described in (3.8).
(3) We prove (3.9) below, without altering the order of the components of x′ = xA (this is crucial). (4) The number of components on which x′ and y now coincide is greater than the number of components on which x and y coincide.
Let us show that the new vector x′ satisfies the following property:

$$\sum_{i=1}^{\ell} x'_i \le \sum_{i=1}^{\ell} y_i \quad \text{for each } \ell = 1, \ldots, n, \ \text{with equality for } \ell = n. \tag{3.9}$$

From (3.8) and since the vectors x and y coincide on the first j − 1 components (see step (1) above), we get

$$\sum_{i=1}^{\ell} x'_i = \sum_{i=1}^{\ell} x_i = \sum_{i=1}^{\ell} y_i \quad \text{for each } \ell = 1, \ldots, j - 1. \tag{3.10}$$

From (3.7), we know that x_j + δ ≤ y_j. Thus, from (3.8) and (3.10), we get

$$\sum_{i=1}^{j} x'_i = \sum_{i=1}^{j-1} y_i + x_j + \delta \le \sum_{i=1}^{j} y_i. \tag{3.11}$$

By definition, the index k is the smallest index greater than j for which x_k > y_k. It follows that

$$x_\ell \le y_\ell \quad \text{for each } \ell = j + 1, \ldots, k - 1. \tag{3.12}$$

Therefore, from (3.8) and (3.11), we obtain that for each ℓ = j + 1, . . ., k − 1, it holds that

$$\sum_{i=1}^{\ell} x'_i = \sum_{i=1}^{j-1} x_i + (x_j + \delta) + \sum_{i=j+1}^{\ell} x_i \le \sum_{i=1}^{j} y_i + \sum_{i=j+1}^{\ell} y_i = \sum_{i=1}^{\ell} y_i \tag{3.13}$$

(since x_j + δ ≤ y_j and by (3.12)). From (3.8) and since the vector x satisfies the first condition of (2.1), we get

$$\sum_{i=1}^{k} x'_i = \sum_{i=1}^{k} x_i \le \sum_{i=1}^{k} y_i \tag{3.14}$$

(the quantity δ added to x_j cancels with the quantity δ subtracted from x_k). Finally, since the vector x satisfies the first and second conditions of (2.1), we have that

$$\sum_{i=1}^{\ell} x'_i = \sum_{i=1}^{\ell} x_i \le \sum_{i=1}^{\ell} y_i \quad \text{for each } \ell = k + 1, \ldots, n, \ \text{with equality for } \ell = n. \tag{3.15}$$

Therefore, from (3.10), (3.11), and (3.13)–(3.15), we have that (3.9) holds. Notice that if δ = y_j − x_j, then x′_j is equal to y_j; equivalently, if δ = x_k − y_k, then x′_k will be equal to y_k. Thus, the vector x′ = xA has at least one additional component (with respect to x) that is equal to a component of y. Moreover, since each A-transform preserves the property (3.9), we can iterate the process starting from x′ = xA. It follows that y can be derived from x by the application of a finite number of A-transforms.
Let us prove the converse part of the theorem. Hence, we assume that y can be derived from x by a finite number of A-transforms, and we prove that x ≺ y. Let

$$x = x^0, \quad x^1 = x^0 A_1, \quad x^2 = x^1 A_2, \quad \ldots, \quad y = x^k = x^{k-1} A_k \tag{3.16}$$

be the vectors obtained by the successive applications of a number k of A-transforms. Given an arbitrary vector z ∈ R^n_+, let us denote with z↓ the vector with the same components as z, ordered in a nonincreasing fashion. From the definition (3.8) of an A-transform, it follows that the partial sums of x′ = xA are greater than or equal to the corresponding partial sums of x. Therefore,

$$x^{\ell-1} \prec x^{\ell} \quad \text{for each } \ell = 1, \ldots, k.$$

By the transitivity of the partial order relation ≺, we get x ≺ y.
Corollary 3.2. Let x, y ∈ R^n_+. If x ≺ y, then y can be derived from x by the successive application of, at most, n − 1 A-transforms.
Proof. In the proof of Theorem 3.1, we have shown that the application of each A-transform equalizes at least one component of the intermediate vectors x^ℓ = x^{ℓ−1}A_ℓ to a component of y. Since all vectors appearing in (3.16) have an equal sum, the last A-transform always equalizes both the affected components to the respective components of y. As a result, y can be obtained by the application of at most n − 1 A-transforms.
Although it is evident, let us explicitly mention that if y can be derived from x by the application of at most n − 1 A-transforms, then it holds that y = xL, where the matrix L is the product of the individual A-transforms.
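The constructive argument in the proof of Theorem 3.1 translates directly into a procedure. Below is a sketch of that procedure in Python; the function name build_L is ours, the inputs are assumed to satisfy x ≺ y with equal sums, and floating-point tolerances are handled crudely.

```python
import numpy as np

def build_L(x, y, tol=1e-12):
    """Accumulate the A-transforms of Theorem 3.1 into L with y = x L."""
    cur = np.asarray(x, dtype=float).copy()
    n = len(cur)
    L = np.eye(n)
    while not np.allclose(cur, y, atol=1e-9):
        j = next(t for t in range(n) if y[t] - cur[t] > tol)         # smallest j: y_j > x_j
        k = next(t for t in range(j + 1, n) if cur[t] - y[t] > tol)  # smallest k > j: x_k > y_k
        delta = min(y[j] - cur[j], cur[k] - y[k])
        A = np.eye(n)
        A[k, j] = delta / cur[k]          # lambda = delta / x_k
        A[k, k] = 1.0 - delta / cur[k]
        cur = cur @ A
        L = L @ A
    return L

L = build_L([0.4, 0.35, 0.25], [0.6, 0.3, 0.1])   # hypothetical pair with x ≺ y
print(np.round(L, 4))                              # lower triangular, rows sum to 1
```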
The following technical lemma is probably already known in the literature. Since we have not found a source that explicitly states it, we provide a proof to keep the paper self-contained.

Lemma 3.3. Let C and D be two n × n lower triangular row-stochastic matrices. The product matrix CD is still a lower triangular row-stochastic matrix.
Proof. Since CD is the product of two lower triangular matrices, one can see that CD is a lower triangular matrix too. Thus, we only need to show that it is row-stochastic.
First of all, each entry (CD)_{ij} of CD is nonnegative, since it is a sum of nonnegative values. Let us consider the sum of the elements of the i-th row:

$$\sum_{j=1}^{n} (CD)_{ij} = \sum_{j=1}^{n} \sum_{k=1}^{n} C_{ik} D_{kj} = \sum_{k=1}^{n} C_{ik} \sum_{j=1}^{n} D_{kj} = \sum_{k=1}^{n} C_{ik} = 1.$$

Thus, since the above reasoning holds for each i = 1, . . ., n, the matrix CD is a lower triangular row-stochastic matrix.
It may be worth commenting that Lemma 3.3 also holds for a product of column-stochastic matrices, which gives a column-stochastic matrix. This holds since (CD)^T = D^T C^T, where, for an arbitrary matrix D, D^T denotes the transpose of D, and the transpose of a row-stochastic matrix is a column-stochastic matrix.
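As a quick numerical sanity check of Lemma 3.3 (an illustration, not a substitute for the proof), one can multiply two randomly generated lower triangular row-stochastic matrices and verify both properties:

```python
import numpy as np

def random_lower_row_stochastic(n, rng):
    """Row i has nonnegative entries in columns 0..i that sum to 1."""
    M = np.zeros((n, n))
    for i in range(n):
        r = rng.random(i + 1)
        M[i, : i + 1] = r / r.sum()
    return M

rng = np.random.default_rng(0)
C = random_lower_row_stochastic(4, rng)
D = random_lower_row_stochastic(4, rng)
P = C @ D
print(np.allclose(P, np.tril(P)))       # the product is lower triangular
print(np.allclose(P.sum(axis=1), 1.0))  # and row-stochastic (Lemma 3.3)
```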
The next theorem characterizes majorization in terms of lower triangular row-stochastic matrices.
Theorem 3.4. Let x, y ∈ R^n_+. It holds that x ≺ y if, and only if, there exists a lower triangular row-stochastic matrix L such that y = xL.
Proof. The implication that if x ≺ y, then there exists a lower triangular row-stochastic matrix L such that y = xL follows directly from the results of Theorem 3.1 and Lemma 3.3. Indeed, from Theorem 3.1, we know that y can be obtained from x by the successive application of A-transforms, and from Lemma 3.3, we know that the product of two consecutive A-transforms is still a lower triangular row-stochastic matrix. Hence, the matrix L obtained as the product of all the A-transforms is a lower triangular row-stochastic matrix for which y = xL.
We notice that a different proof of the above result has been given by Li in [18, Lemma 1], by an application of Algorithm 1 in [19].

Let us now prove the converse implication. Assume that there exists a lower triangular row-stochastic matrix L = [L_{ij}] for which y = xL. Let us prove that x ≺ y. For such a purpose, we can express each component of y in the following way:

$$y_j = \sum_{i=j}^{n} x_i L_{ij}, \qquad j = 1, \ldots, n. \tag{3.17}$$

Hence, by using (3.17), we can rewrite the sum of the first k components of y as follows:

$$\sum_{j=1}^{k} y_j = \sum_{j=1}^{k} \sum_{i=j}^{n} x_i L_{ij}.$$

By grouping the multiplicative coefficients of each x_i, we get:

$$\sum_{j=1}^{k} y_j = \sum_{i=1}^{k} x_i \sum_{j=1}^{i} L_{ij} + \sum_{i=k+1}^{n} x_i \sum_{j=1}^{k} L_{ij}. \tag{3.18}$$

Since the matrix L is lower triangular row-stochastic, we have that for each j = 1, . . ., n, it holds that $\sum_{i=1}^{j} L_{ji} = 1$. Hence, from (3.18), we get

$$\sum_{j=1}^{k} y_j \ge \sum_{i=1}^{k} x_i$$

for each k = 1, . . ., n, and

$$\sum_{j=1}^{n} y_j = \sum_{i=1}^{n} x_i.$$

Thus, x ≺ y.
We now bound the number of nonzero elements of the matrix L that appears in the "if" part of Theorem 3.4.

Corollary 3.5. Let x, y ∈ R^n_+. If x ≺ y, then there exists a lower triangular row-stochastic matrix L, with at most 2n − 1 nonzero elements, such that y = xL.

Proof. From Corollary 3.2 and Theorem 3.4, we know that the matrix L, for which it holds that y = xL, is the product of at most n − 1 A-transforms. Let

$$A_1, \ldots, A_t \tag{3.19}$$

be the individual matrices associated with such t A-transforms, t < n. By (3.2), the matrix A_1 has n + 1 nonzero elements. Let C be the matrix equal to the product of the first m − 1 A-transforms of (3.19), m ≤ t < n, that is,

$$C = \prod_{i=1}^{m-1} A_i. \tag{3.20}$$

Let j, k be the pair of indices chosen to construct A_m = I + X_m. From (3.20), we get

$$C A_m = C(I + X_m) = C + C X_m. \tag{3.21}$$

For every A-transform, we recall that we always choose the smallest index j such that y_j > x_j, and the smallest index k greater than j for which y_k < x_k. Therefore, all the previous A-transforms have chosen indices less than or at most equal to k. Consequently, in the matrix C, all the rows after the k-th row are equal to the rows of the identity matrix. By definition, the matrix X_m has nonzero elements only in the k-th row (in positions X_{kj} and X_{kk}, respectively). Hence, the matrix CX_m has nonzero elements only in the entries (CX_m)_{kj} and (CX_m)_{kk}. Since the element (CX_m)_{kk} will be added to the diagonal element of C, the only new nonzero element in CA_m (with respect to C) is (CX_m)_{kj}. Hence, from (3.21), we get that the product CA_m gives a matrix with only one new nonzero element with respect to C. Since each product generates a matrix with only one additional nonzero element with respect to the previous one, we obtain that the final matrix L has at most n + 1 + n − 2 = 2n − 1 nonzero elements.
We summarize our results in the next theorem, mirroring the classic Theorem 2.1.
Theorem 3.6. If x, y ∈ R^n_+, the following conditions are equivalent: (1) x ≺ y; (2) y = xL for some lower triangular row-stochastic matrix L; (3) y can be derived from x by the successive application of at most n − 1 A-transforms, as defined in (3.2).
Proof. The equivalences follow from the results of Theorem 3.1, Corollary 3.2, and Theorem 3.4.
Let us look at an example of how the matrix L is constructed.

The first A-transform is chosen as in the proof of Theorem 3.1. The second A-transform affects the elements x^1_1 and x^1_3 of the intermediate vector x^1 = xA_1, with δ = min{y_1 − x^1_1, x^1_3 − y_3}. The final matrix for which y = xL is the product L = A_1 A_2. It is worth noticing that the matrix L is not the inverse of the doubly stochastic matrix P, for which x = yP, obtained by applying a series of T-transforms (Theorem 2.1): in fact, the inverse of L is not a doubly stochastic matrix.
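Since the numerical details of this example were lost in the source we worked from, here is a small self-contained Python computation with hypothetical vectors that reproduces the construction and the final remark; the two A-transforms below follow the recipe of Theorem 3.1.

```python
import numpy as np

x = np.array([0.40, 0.35, 0.25])   # hypothetical x ≺ y with equal sums
y = np.array([0.60, 0.30, 0.10])

lam1 = 0.05 / 0.35                  # first A-transform: (j, k) = (1, 2), delta = 0.05
A1 = np.array([[1, 0, 0], [lam1, 1 - lam1, 0], [0, 0, 1.0]])
lam2 = 0.15 / 0.25                  # second A-transform: (j, k) = (1, 3), delta = 0.15
A2 = np.array([[1, 0, 0], [0, 1, 0], [lam2, 0, 1 - lam2]])

L = A1 @ A2
print(np.allclose(x @ L, y))              # y = x L
print(np.count_nonzero(L), 2 * 3 - 1)     # 5 nonzero entries = 2n - 1 (Corollary 3.5)
print(np.linalg.inv(L))                   # has negative entries, so L^{-1}
                                          # is not a doubly stochastic matrix
```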

Majorization by upper triangular stochastic matrices
We now present our characterization of majorization through upper triangular row-stochastic matrices.
A B-transformation or, more briefly, a B-transform, is a transformation of a vector y = (y 1 , . . ., y n ) that involves two vector components, y i and y j , with i < j.The transformation operates on the vector y by decreasing the component y i by the quantity λy i and increasing the component y j by the same value λy i , where λ is a real number such that λ ∈ [0, 1].
We can describe a B-transform by the matrix

$$B = I + Y, \tag{4.1}$$

where I is the n × n identity matrix and Y = [Y_{ℓm}] is a matrix with all entries equal to 0 except for two elements Y_{ii} and Y_{ij}, where Y_{ii} = −λ and Y_{ij} = λ. Thus, yB has the form

$$yB = (y_1, \ldots, y_{i-1},\; y_i - \lambda y_i,\; y_{i+1}, \ldots, y_{j-1},\; y_j + \lambda y_i,\; y_{j+1}, \ldots, y_n). \tag{4.2}$$

Note that the matrix B = [B_{ℓm}] is upper triangular and row-stochastic, that is,

$$B_{\ell m} \ge 0 \quad \text{for all } \ell, m, \tag{4.3}$$

$$\sum_{m=1}^{n} B_{\ell m} = 1 \quad \text{for all } \ell. \tag{4.4}$$

The next theorem relates the B-transforms to majorization.
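Analogously to the A-transform sketch above, a minimal Python illustration of a B-transform follows; the vector and parameters are hypothetical, and 0-based indices are used.

```python
import numpy as np

def b_transform_matrix(n, i, j, lam):
    """B = I + Y for a B-transform on components i < j (0-based):
    Y[i, i] = -lam and Y[i, j] = lam, so lam * y_i moves from y_i to y_j."""
    assert 0 <= i < j < n and 0.0 <= lam <= 1.0
    B = np.eye(n)
    B[i, i] = 1.0 - lam
    B[i, j] = lam
    return B

y = np.array([0.6, 0.3, 0.1])             # hypothetical vector
B = b_transform_matrix(3, 0, 2, 1/6)      # move (1/6) * y_0 = 0.1 onto y_2
print(y @ B)                              # -> [0.5 0.3 0.2]
print(np.allclose(B, np.triu(B)))         # B is upper triangular
print(np.allclose(B.sum(axis=1), 1.0))    # rows sum to 1 (row-stochastic)
```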
Theorem 4.1. Let x, y ∈ R^n_+. It holds that x ≺ y if, and only if, x can be derived from y by the successive application of a finite number of B-transforms.
Proof. Let x = (x_1, . . ., x_n) ≺ y = (y_1, . . ., y_n), with x ≠ y. We shall prove that x can be derived from y by the successive application of a finite number of B-transforms.
Let j be the largest index for which y_j > x_j, and let k be the smallest index greater than j such that x_k > y_k. Note that such a pair j, k must exist (as we argued in the proof of Theorem 3.1). We set the quantity δ as

$$\delta = \min\{y_j - x_j,\; x_k - y_k\}. \tag{4.5}$$

We define a B-transform as in (4.1), with λ = δ/y_j and Y such that

$$Y_{jj} = -\lambda, \quad Y_{jk} = \lambda, \quad Y_{\ell m} = 0 \ \text{otherwise}. \tag{4.6}$$

The application of such a matrix B to the vector y gives the vector yB = y′ with components

$$y' = (y_1, \ldots, y_{j-1},\; y_j - \delta,\; y_{j+1}, \ldots, y_{k-1},\; y_k + \delta,\; y_{k+1}, \ldots, y_n). \tag{4.7}$$

Let us show that the new vector y′ still majorizes x. From (4.7) and since x ≺ y, we get

$$\sum_{i=1}^{\ell} x_i \le \sum_{i=1}^{\ell} y_i = \sum_{i=1}^{\ell} y'_i \quad \text{for each } \ell = 1, \ldots, j - 1. \tag{4.8}$$

From (4.6) and (4.7), we know that y′_j ≥ x_j. Furthermore, by definition, the index j is the largest index such that y_j > x_j, and k is the smallest index greater than j such that y_k < x_k. It follows that

$$x_i \le y'_i \quad \text{for each } i = j, \ldots, k - 1. \tag{4.9}$$

Thus, from (4.8), we obtain that for each ℓ = j, . . ., k − 1, it holds that

$$\sum_{i=1}^{\ell} x_i = \sum_{i=1}^{j-1} x_i + \sum_{i=j}^{\ell} x_i \le \sum_{i=1}^{j-1} y'_i + \sum_{i=j}^{\ell} y'_i = \sum_{i=1}^{\ell} y'_i \quad (\text{from } (4.9)). \tag{4.10}$$

From (4.7), we get

$$\sum_{i=1}^{\ell} y'_i = \sum_{i=1}^{\ell} y_i \quad \text{for each } \ell = k, \ldots, n. \tag{4.11}$$

Finally, since x ≺ y, we have that

$$\sum_{i=1}^{\ell} x_i \le \sum_{i=1}^{\ell} y_i \quad \text{for each } \ell = k, \ldots, n, \ \text{with equality for } \ell = n. \tag{4.12}$$

Therefore, from (4.8) and (4.10)–(4.12), we have that x ≺ y′. Notice that if δ = y_j − x_j, then y′_j is equal to x_j; equivalently, if δ = x_k − y_k, then y′_k is equal to x_k. Thus, the vector y′ = yB has at least one additional component (with respect to y) that is equal to a component of x. Moreover, since each B-transform preserves the majorization, we can iterate the process starting from y′ = yB. It follows that x can be derived from y by the application of a finite number of B-transforms.
Let us now prove the converse part of the theorem. We prove it by contradiction. Hence, we assume that x ⊀ y, and show that if x can be derived from y by the successive application of a finite number of B-transforms, we get a contradiction.
Since x ⊀ y, there exists an index ℓ ∈ {1, . . ., n} such that

$$\sum_{i=1}^{\ell} x_i > \sum_{i=1}^{\ell} y_i. \tag{4.13}$$

Moreover, by the definition (4.1) of a B-transform, the quantity λy_j can be moved between two components y_j and y_k of y, with k > j, only from y_j to y_k. Therefore, the sum of the first ℓ components of y cannot be increased in any way through B-transforms. Consequently, this leads to a contradiction, because not all components y_1, . . ., y_ℓ can be transformed into their respective components of x. Thus, it must hold that x ≺ y.

Corollary 4.2. Let x, y ∈ R^n_+. If x ≺ y, then x can be derived from y by the application of at most n − 1 B-transforms.

Proof. In the proof of Theorem 4.1, we have shown that the application of each B-transform equalizes at least one component of the intermediate vectors y^j = y^{j−1}B_j to a component of x. Observe that since all vectors appearing in the sequence of transformations from y to x have an equal sum, the last B-transform always equalizes both the affected components to the respective components of x. As a result, x can be obtained by the application of at most n − 1 B-transforms.
With the same technique of Lemma 3.3, we can prove the following result.

Lemma 4.3. Let C and D be two n × n upper triangular row-stochastic matrices. The product matrix CD is still an upper triangular row-stochastic matrix.

The following theorem characterizes majorization in terms of upper triangular row-stochastic matrices.

Theorem 4.4. Let x, y ∈ R^n_+. It holds that x ≺ y if, and only if, there exists an upper triangular row-stochastic matrix U such that x = yU.

Proof. The implication that if x ≺ y, then there exists an upper triangular row-stochastic matrix U such that x = yU derives directly from the results of Theorem 4.1 and Lemma 4.3. Indeed, from Theorem 4.1, we know that x can be derived from y by the successive application of B-transforms, and from Lemma 4.3, we know that the product of B-transforms is still an upper triangular row-stochastic matrix. Hence, the matrix U obtained as the product of all B-transforms is an upper triangular row-stochastic matrix such that x = yU.
We now prove the converse implication. Let U be an upper triangular row-stochastic matrix such that x = yU. It follows that each component of x can be written as follows:

$$x_j = \sum_{i=1}^{j} y_i U_{ij}, \qquad j = 1, \ldots, n. \tag{4.14}$$

By (4.14), we can express the sum of the first k components of x, with k = 1, . . ., n, as follows:

$$\sum_{j=1}^{k} x_j = \sum_{j=1}^{k} \sum_{i=1}^{j} y_i U_{ij} = \sum_{i=1}^{k} y_i \sum_{j=i}^{k} U_{ij} \le \sum_{i=1}^{k} y_i,$$

since $\sum_{j=i}^{k} U_{ij} \le \sum_{j=i}^{n} U_{ij} = 1$, with equality for k = n. Thus, x ≺ y.
We now bound the number of nonzero elements in the matrix U of Theorem 4.4.
Corollary 4.5. Let x, y ∈ R^n_+. If x ≺ y, then there exists an upper triangular row-stochastic matrix U, with at most 2n − 1 nonzero elements, such that x = yU.
Proof. From Theorem 4.1 and Corollary 4.2, we know that the matrix U, for which it holds that x = yU, is the product of at most n − 1 B-transforms. Let B_1, . . ., B_t be the individual matrices associated with such t B-transforms, t < n. By (4.1), the matrix B_1 has n + 1 nonzero elements.

Let C be the matrix equal to the product of the first m − 1 B-transforms, m ≤ t < n, that is,

$$C = \prod_{i=1}^{m-1} B_i. \tag{4.15}$$

Let j, k be the pair of indices chosen to construct B_m = I + Y_m. From (4.15), we get

$$C B_m = C(I + Y_m) = C + C Y_m. \tag{4.16}$$

For every B-transform, we recall that we always choose the largest index j such that y_j > x_j, and the smallest index k greater than j for which y_k < x_k. Therefore, all the previous B-transforms have chosen pairs of indices i, ℓ such that i is greater than or at most equal to j. Consequently, in the matrix C, all the rows above the j-th row are equal to the rows of the identity matrix. In addition, by definition, the matrix Y_m has nonzero elements only in the j-th row, in positions Y_{jj} and Y_{jk}, respectively. Hence, the matrix CY_m has nonzero elements only in the entries (CY_m)_{jj} and (CY_m)_{jk}. Since the element (CY_m)_{jj} will be added to the diagonal element of C, the only new nonzero element is (CY_m)_{jk}. Hence, from (4.16), we get that the product CB_m gives a matrix with only one additional nonzero element with respect to C. Since each product generates a matrix with only one additional nonzero element with respect to the previous one, we obtain that the final matrix U has at most n + 1 + n − 2 = 2n − 1 nonzero elements.
We summarize our results in the next theorem, in the fashion of the classic Theorem 2.1.
Theorem 4.6. If x, y ∈ R^n_+, the following conditions are equivalent: (1) x ≺ y; (2) x = yU for some upper triangular row-stochastic matrix U; (3) x can be derived from y by the successive application of at most n − 1 B-transforms, as defined in (4.2).
Proof. The equivalences are a direct consequence of the results of Theorem 4.1, Corollary 4.2, and Theorem 4.4.

Let us now see an example of the construction of the matrix U. The first B-transform modifies the elements y_1 and y_2 with δ = min{y_1 − x_1, x_2 − y_2} = 0.1 and λ = δ/y_1 = 1/6. The second B-transform affects the elements y^1_1 and y^1_3 with δ = min{y^1_1 − x_1, x_3 − y^1_3} = 0.2 and λ = δ/y^1_1 = 2/5. The final matrix for which x = yU is the product U = B_1 B_2. It is worth pointing out that the above example also shows that the matrices U of this section are not simply the inverses of the matrices L of Section 3.
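The following Python computation rebuilds this example end to end; since the example's vectors themselves were lost in the source we worked from, we use hypothetical x and y chosen to be consistent with the δ and λ values quoted above.

```python
import numpy as np

y = np.array([0.6, 0.3, 0.1])    # hypothetical; consistent with delta = 0.1, lambda = 1/6
x = np.array([0.3, 0.4, 0.3])    # and with delta = 0.2, lambda = 2/5 in the second step

B1 = np.array([[5/6, 1/6, 0],    # first B-transform on (y_1, y_2): y -> (0.5, 0.4, 0.1)
               [0,   1,   0],
               [0,   0,   1]])
B2 = np.array([[3/5, 0, 2/5],    # second B-transform on (y'_1, y'_3): -> (0.3, 0.4, 0.3)
               [0,   1, 0],
               [0,   0, 1]])

U = B1 @ B2
print(np.allclose(y @ U, x))             # x = y U
print(np.allclose(U, np.triu(U)))        # U is upper triangular (and row-stochastic)
print(np.count_nonzero(U), 2 * 3 - 1)    # 5 = 2n - 1 nonzeros (Corollary 4.5)
```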

Applications
We recall that a real-valued function φ : A ⊆ R^n → R is said to be Schur-concave [2] if x ≺ y implies φ(x) ≥ φ(y). In the rest of this section, we will assume that the set A corresponds to the (n − 1)-dimensional probability simplex P_n, defined as

$$P_n = \Big\{ x = (x_1, \ldots, x_n) : \sum_{i=1}^{n} x_i = 1, \ x_i \ge 0 \ \text{for } i = 1, \ldots, n \Big\}.$$

It is well known that the Shannon entropy $H(x) = -\sum_{i=1}^{n} x_i \log_2 x_i$ is Schur-concave over P_n. Therefore, for any x, y ∈ P_n, if x ≺ y, it holds that

$$H(x) \ge H(y). \tag{5.1}$$

The above inequality (5.1) is widely used in information theory, and there are several improvements to it. For instance, the authors of [20] proved that for any x, y ∈ P_n, if x ≺ y, then it holds that

$$H(x) \ge H(y) + D(y \| x), \tag{5.2}$$

where $D(y \| x) = \sum_i y_i \log_2 (y_i / x_i)$ is the relative entropy between y and x.
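As a numerical illustration of (5.1) and (5.2), the following Python snippet checks both inequalities on a hypothetical pair x ≺ y (with components arranged in nonincreasing order):

```python
import numpy as np

def H(p):
    """Shannon entropy in bits, with the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def D(p, q):
    """Relative entropy D(p || q) in bits (assumes q_i > 0 wherever p_i > 0)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

x = np.array([0.4, 0.3, 0.3])   # hypothetical distributions with x ≺ y
y = np.array([0.6, 0.3, 0.1])
print(H(x) >= H(y))             # the basic inequality (5.1)
print(H(x) >= H(y) + D(y, x))   # the strengthening (5.2) from [20]
```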
The paper [21] proved a different strengthening of (5.1) (see also Proposition A.7.e of [2]). More precisely, Proposition A.7.e of [2] states that if x, y ∈ P_n and x ≺ y, then it holds that

$$H(x) \ge (1 - \alpha(P)) H(y) + \alpha(P) \log_2 n, \tag{5.3}$$

where P = [P_{ij}] is a doubly stochastic matrix for which x = yP, and α(P) is the Dobrushin coefficient of ergodicity of P, defined as

$$\alpha(P) = \min_{\ell, m} \sum_{i=1}^{n} \min\{P_{\ell i}, P_{m i}\}.$$
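The Dobrushin coefficient is straightforward to compute; a short Python sketch follows (the matrix P below is hypothetical):

```python
import numpy as np

def dobrushin_alpha(P):
    """alpha(P) = min over row pairs (l, m) of sum_i min(P[l, i], P[m, i])."""
    n = P.shape[0]
    return min(np.minimum(P[l], P[m]).sum()
               for l in range(n) for m in range(l + 1, n))

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.2, 0.2, 0.6]])   # hypothetical row-stochastic matrix
print(dobrushin_alpha(P))         # -> 0.5, attained by the first and third rows
```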
It might be useful to recall that some papers define the Dobrushin coefficient of ergodicity of P as 1 − α(P).
We show that our results from the previous sections can be used to obtain a different improvement of the basic inequality (5.1).In fact, we prove the following result.
Theorem 5.1. Let x, y ∈ P_n and x ≺ y. Moreover, let U be the upper triangular matrix obtained through the sequence of B-transforms described in Theorem 4.1 for which x = yU. If x ≠ y, it holds that

$$H(x) \ge (1 - \alpha(U)) H(y) + \sum_{i=1}^{n} x_i \log_2 \frac{1}{(uU)_i} - (1 - \alpha(U)) \log_2 n > H(y),$$

where u = (1/n, . . ., 1/n) and α(U) is the Dobrushin coefficient of ergodicity of U.
To prove the theorem, we need some intermediate results. We recall that a matrix C is said to be column-allowable if each of its columns contains at least one positive element [21]. From [21, Thm. 1.4], we obtain the following lemma†.

† Theorem 1.4 of [21] is stated for general f-divergences.

Lemma 5.2. Let C ∈ R^{n×n}_+ be a row-stochastic and column-allowable matrix, and let x, y ∈ P_n. Then

$$D(xC \,\|\, yC) \le (1 - \alpha(C)) \, D(x \| y),$$

where $D(x \| y) = \sum_{i=1}^{n} x_i \log_2 (x_i / y_i)$ is the relative entropy between x and y. We note that Lemma 5.2 is a classical example of the role of contraction coefficients. Contraction coefficients are important quantities in strong data-processing inequalities [8].
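A quick numerical check of the contraction stated in Lemma 5.2, on hypothetical inputs (the block is kept self-contained, so the helpers are repeated):

```python
import numpy as np

def D(p, q):
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

def dobrushin_alpha(P):
    n = P.shape[0]
    return min(np.minimum(P[l], P[m]).sum()
               for l in range(n) for m in range(l + 1, n))

C = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.2, 0.2, 0.6]])          # row-stochastic and column-allowable
x = np.array([0.5, 0.3, 0.2])            # hypothetical distributions
y = np.array([0.2, 0.3, 0.5])
lhs = D(x @ C, y @ C)
rhs = (1 - dobrushin_alpha(C)) * D(x, y)
print(lhs <= rhs, round(lhs, 4), round(rhs, 4))
```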
By exploiting the knowledge of the structure of the matrices U of Section 4, we obtain the following result.
Lemma 5.3. Let x, y ∈ P_n be such that x ≺ y, and let U be the upper triangular matrix obtained through the sequence of B-transforms described in Theorem 4.1 for which x = yU. If x ≠ y, it holds that

$$\sum_{i=1}^{n} x_i \log_2 \frac{1}{(uU)_i} > \log_2 n,$$

where u = (1/n, . . ., 1/n).

Proof. Let $U = \prod_{i=1}^{s} B_i$ be the upper triangular matrix obtained as the product of the s B-transforms B_1, . . ., B_s. To prove the lemma, we first show that

$$\sum_{i=1}^{n} x_i \log_2 \frac{1}{(uB_1)_i} > \log_2 n. \tag{5.8}$$

Next, we prove that for each ℓ = 2, . . ., s, it holds that

$$\sum_{i=1}^{n} x_i \log_2 \frac{(uC)_i}{(u(CB_\ell))_i} \ge 0, \tag{5.9}$$

where $C = \prod_{i=1}^{\ell-1} B_i$. Let j, k be the pair of indices, with j < k, chosen in the first B-transform B_1 = I + Y, with coefficient λ ∈ (0, 1); then we know that

$$x_j \ge x_k. \tag{5.10}$$

Hence, from (5.10) and by noticing that for each i = 1, . . ., n, it holds that $(uB_1)_i = (1/n) \sum_{\ell=1}^{n} (B_1)_{\ell i}$, so that $(uB_1)_j = (1-\lambda)/n$, $(uB_1)_k = (1+\lambda)/n$, and $(uB_1)_i = 1/n$ otherwise, we get the following series of equalities and inequalities:

$$\sum_{i=1}^{n} x_i \log_2 \frac{1}{(uB_1)_i} = \log_2 n + x_j \log_2 \frac{1}{1-\lambda} + x_k \log_2 \frac{1}{1+\lambda} \ge \log_2 n + x_k \log_2 \frac{1}{1-\lambda^2} > \log_2 n,$$

where the first inequality follows from (5.10) and the second one holds since 0 < 1 − λ² < 1. Thus, (5.8) holds.

Let now j, k, with j < k, be the pair of indices chosen in the ℓ-th B-transform B_ℓ = I + Y_ℓ, with coefficient λ. Observe that, by construction of C, it holds that C_{jj} ≤ 1 and that the only nonzero element in the j-th column of C is C_{jj}. Thus, it follows that $\sum_{t=1}^{n} C_{tj} = C_{jj}$. Similarly, it also holds that C_{kk} = 1. In fact, since y_k < x_k, the B-transforms do not modify the k-th row (which remains equal to the corresponding row of the identity matrix). Thus, it follows that $\sum_{t=1}^{n} C_{tk} \ge 1$. Therefore, since $(u(CB_\ell))_j = (1-\lambda)(uC)_j$ and $(u(CB_\ell))_k = (uC)_k + \lambda (uC)_j$, we can rewrite the left-hand side of (5.9) as follows:

$$\sum_{i=1}^{n} x_i \log_2 \frac{(uC)_i}{(u(CB_\ell))_i} = x_j \log_2 \frac{C_{jj}}{C_{jj} - \lambda C_{jj}} + x_k \log_2 \frac{\sum_t C_{tk}}{\sum_t C_{tk} + \lambda C_{jj}} \ge x_k \log_2 \frac{1}{1 - \lambda^2} \ge 0, \tag{5.16}$$

where we used x_j ≥ x_k, C_{jj} ≤ $\sum_t C_{tk}$, and 0 < 1 − λ² < 1. Thus, we proved that (5.9) holds. Since both (5.8) and (5.9) hold, we have proved the lemma, given that

$$\sum_{i=1}^{n} x_i \log_2 \frac{1}{(uU)_i} \ge \sum_{i=1}^{n} x_i \log_2 \frac{1}{(uB_1)_i} > \log_2 n.$$
We can now prove Theorem 5.1.
Proof. Observe that the matrix U is a row-stochastic and column-allowable matrix. Thus, from Lemma 5.2, we get

$$D(yU \,\|\, uU) \le (1 - \alpha(U)) \, D(y \| u). \tag{5.17}$$

Expanding the divergences in (5.17), and recalling that x = yU and that D(y‖u) = log₂ n − H(y), we get

$$\sum_{i=1}^{n} x_i \log_2 \frac{1}{(uU)_i} - H(x) \le (1 - \alpha(U)) (\log_2 n - H(y)). \tag{5.18}$$

From (5.18), we get the following lower bound on the entropy of x:

$$H(x) \ge (1 - \alpha(U)) H(y) + \sum_{i=1}^{n} x_i \log_2 \frac{1}{(uU)_i} - (1 - \alpha(U)) \log_2 n > (1 - \alpha(U)) H(y) + \log_2 n - (1 - \alpha(U)) \log_2 n = (1 - \alpha(U)) H(y) + \alpha(U) \log_2 n \ge H(y),$$

where the strict inequality follows from Lemma 5.3 and the last inequality holds since H(y) ≤ log₂ n.
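To see Theorem 5.1 in action, the Python snippet below evaluates the new lower bound on the hypothetical example of Section 4 (the matrix U built in the sketch there); everything other than the structure of the bound is illustrative.

```python
import numpy as np

def H(p):
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def dobrushin_alpha(P):
    n = P.shape[0]
    return min(np.minimum(P[l], P[m]).sum()
               for l in range(n) for m in range(l + 1, n))

U = np.array([[1/2, 1/6, 1/3],    # hypothetical U = B1 B2 from the Section 4 sketch
              [0,   1,   0],
              [0,   0,   1]])
y = np.array([0.6, 0.3, 0.1])
x = y @ U                          # x = (0.3, 0.4, 0.3)
n = len(y)
u = np.full(n, 1 / n)
a = dobrushin_alpha(U)
bound = (1 - a) * H(y) + x @ np.log2(1 / (u @ U)) - (1 - a) * np.log2(n)
print(H(x) >= bound > H(y))        # the chain of inequalities in Theorem 5.1
```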

Conclusions
In this paper, we have introduced two novel characterizations of the classical concept of majorization in terms of upper triangular (resp., lower triangular) row-stochastic matrices. An interesting feature of our upper triangular (resp., lower triangular) row-stochastic matrices is that they are quite sparse, in the sense that they have few nonzero elements; this property might be useful in practical applications. Finally, we have used our new characterization of majorization in terms of upper triangular row-stochastic matrices to derive an improved entropy inequality. We mention that one could derive a similar (albeit nonequivalent) improved entropy inequality by using the characterization of majorization in terms of lower triangular row-stochastic matrices that we have given in Section 3. To do so, the way to proceed is similar to the one we presented in Section 5.
