Two Bregman Projection Methods for Solving Variational Inequality Problems in Hilbert Spaces with Applications to Signal Processing

Studying Bregman distance iterative methods for solving optimization problems has become an important and very interesting topic because of the numerous applications of Bregman distance techniques. These applications are based on the type of convex function associated with the Bregman distance. In this paper, two different extragradient methods are proposed for studying pseudomonotone variational inequality problems using the Bregman distance in real Hilbert spaces. The first algorithm uses a fixed stepsize which depends on a prior estimate of the Lipschitz constant of the cost operator. The second algorithm uses a self-adaptive stepsize which does not require a prior estimate of the Lipschitz constant of the cost operator. Some convergence results are proved for approximating the solutions of the pseudomonotone variational inequality problem under standard assumptions. Moreover, some numerical experiments are given to illustrate the performance of the proposed algorithms using different convex functions such as the Shannon entropy and the Burg entropy. In addition, an application of the results to a signal processing problem is presented.


Introduction
Let H be a real Hilbert space with norm ∥·∥ and inner product ⟨·, ·⟩. We study the Variational Inequality Problem (shortly, VIP) of the form: find a point x* ∈ C such that

⟨Ax*, x − x*⟩ ≥ 0 for all x ∈ C,   (1)

where A : H → H is a given operator and C is a nonempty closed convex subset of H. We shall denote the set of solutions of the VIP (1) by Ω_VI. The VIP has drawn much attention from researchers because of its importance as a core model for studying many mathematical problems, including convex programming, equilibrium problems, inclusion problems, split feasibility problems, complementarity problems, minimization problems, etc.; see, e.g., [1,2]. Recently, the VIP has also been used for studying optimization problems arising in machine learning, signal processing and linear inverse problems; see, for instance, [3][4][5].
Due to its importance, many researchers have developed several iterative methods for solving the VIP (1) in Hilbert and Banach spaces; see for example [6–22] and the references therein. One of the simplest methods known for solving the VIP is the Extragradient Method (EM) introduced by Korpelevich [23] for the case where A is a monotone and Lipschitz continuous operator in a finite-dimensional space: given a_0 ∈ C ⊆ R^n, compute

b_n = P_C(a_n − γAa_n),
a_{n+1} = P_C(a_n − γAb_n), n ≥ 1,   (2)

where γ ∈ (0, 1/L) and A is monotone and L-Lipschitz continuous. The EM has been extended to infinite-dimensional spaces by many authors, e.g., [8,9,13,19,24,25]. However, the EM requires computing the projection onto C twice per iteration, which is not cost effective in practice. As a result of this drawback, there has been an increasing effort in finding more suitable methods for solving the VIP. Censor et al. [13] introduced the Subgradient Extragradient Method (SEM) by replacing the second projection in the EM with a projection onto a half-space, as follows:

Algorithm SEM: Given a_0 ∈ C and γ ∈ (0, 1/L), compute

b_n = P_C(a_n − γAa_n),
a_{n+1} = P_{T_n}(a_n − γAb_n),
where T_n := {x ∈ H : ⟨a_n − γAa_n − b_n, x − b_n⟩ ≤ 0}, n ≥ 0.   (3)

Note that the set T_n is a half-space whose projection can be calculated explicitly (see for instance [26]). Under some mild assumptions, Censor et al. [13] obtained a weak convergence result for solving the VIP using (3), and by modifying the SEM with the Halpern iterative scheme (see [27]), they proved strong convergence of the SEM under certain mild conditions (see for instance [14,15]).
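For concreteness, the following is a minimal sketch (our illustration, not taken from the paper; the Euclidean setting and the user-supplied helper proj_C are assumptions) of the EM and the SEM for a monotone, L-Lipschitz operator on R^n. It makes visible why the SEM is cheaper: its second projection is onto an explicit half-space rather than onto C, although both methods still evaluate A twice per iteration.

```python
# Sketch of EM (Korpelevich) and SEM (Censor et al.) for a monotone,
# L-Lipschitz operator A on R^n; gamma should lie in (0, 1/L).
# `proj_C` is an assumed user-supplied Euclidean projection onto C.
import numpy as np

def extragradient(A, proj_C, a0, gamma, tol=1e-8, max_iter=10_000):
    """EM: two projections onto C and two evaluations of A per iteration."""
    a = a0
    for _ in range(max_iter):
        b = proj_C(a - gamma * A(a))
        a_next = proj_C(a - gamma * A(b))
        if np.linalg.norm(a_next - a) < tol:
            return a_next
        a = a_next
    return a

def subgradient_extragradient(A, proj_C, a0, gamma, tol=1e-8, max_iter=10_000):
    """SEM: still two evaluations of A, but the second projection onto C is
    replaced by an explicit projection onto T_n = {x : <w, x - b> <= 0}."""
    a = a0
    for _ in range(max_iter):
        Aa = A(a)
        b = proj_C(a - gamma * Aa)
        w = a - gamma * Aa - b          # outward normal of the half-space T_n
        v = a - gamma * A(b)            # point to be projected onto T_n
        t = np.dot(w, v - b)
        if t > 0:                       # v violates <w, x - b> <= 0: project
            v = v - (t / np.dot(w, w)) * w
        if np.linalg.norm(v - a) < tol:
            return v
        a = v
    return a
```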
It is worth mentioning that both the EM and the SEM require two evaluations of A per iteration. Popov [28] later proposed a nice modification of the EM that requires the evaluation of A at only one point per iteration. Malitsky and Semenov [29] modified Popov's extragradient method by replacing the second projection onto C with a projection onto a half-space. A further extension of Popov's extragradient method was introduced by Nomirovskii et al. [30] using the Bregman distance function [31] in a finite-dimensional setting. Recently, Gibali [32] extended the methods of Malitsky and Semenov [29] and Nomirovskii et al. [30] to infinite-dimensional Hilbert spaces using a Bregman distance function. A weak convergence result was proved for the sequences generated by these methods provided the stepsize satisfies a condition of the form λ ∈ (0, c/L), where c > 0 is a constant depending on the convex function f and A is monotone and L-Lipschitz continuous.
An obvious disadvantage of the above-mentioned results is that the stepsize of the algorithms requires a prior estimate of the Lipschitz constant L. It is known that the Lipschitz constant is often difficult to estimate in general. Even when it is possible, the estimate is often too small, which slows the convergence of the algorithm. An effort towards solving this problem for the case of monotone VIPs was recently made by Hieu and Cholamjiak [33]. They introduced an extragradient method with Bregman distance which does not require a prior estimate of the Lipschitz constant. However, their algorithm is only applicable to monotone VIPs, not pseudomonotone ones. When the cost operator is not monotone, the result of [33] cannot be applied to such VIPs; see, for instance, Examples 3.12 and 4.2 of [19].
Motivated by these results, in the present paper we first introduce two Bregman extragradient methods of Popov's type for solving pseudomonotone variational inequalities in real Hilbert spaces. In the first algorithm, the stepsize requires a prior estimate of the Lipschitz constant. This extends the results of [30,32] from monotone VIPs to pseudomonotone VIPs in real Hilbert spaces. In the second algorithm, the stepsize is selected self-adaptively and does not require a prior estimate of the Lipschitz constant. This improves on the first algorithm of this paper and also on the results of [30,32]. It also improves the method of [33,34] from an extragradient type method with Bregman distance to a subgradient extragradient method with Bregman distance. Moreover, it extends the results of [33,34] from monotone VIPs to pseudomonotone VIPs. We further provide some numerical examples using different types of convex functions, and a numerical application of our results to the Least Absolute Shrinkage and Selection Operator (LASSO) problem in compressed sensing. We denote the strong and weak convergence of a sequence {a_n} ⊆ H to a point x ∈ H by a_n → x and a_n ⇀ x, respectively. The paper is organized as follows: We first recall some basic definitions and results in Section 2. We present our algorithms and discuss their convergence in Section 3. We give some numerical experiments in Section 4. We conclude with a final remark in Section 5.

Preliminaries
In this section, we recall some basic definitions and concepts that are needed for establishing our results.

Definition 1. [20] Let A : C → H be an operator. Then A is called
(i) monotone if ⟨Au − Av, u − v⟩ ≥ 0 for all u, v ∈ C;
(ii) pseudomonotone if, for all u, v ∈ C, ⟨Av, u − v⟩ ≥ 0 implies ⟨Au, u − v⟩ ≥ 0;
(iii) γ-strongly pseudomonotone if there exists γ > 0 such that, for all u, v ∈ C, ⟨Av, u − v⟩ ≥ 0 implies ⟨Au, u − v⟩ ≥ γ∥u − v∥²;
(iv) Lipschitz continuous if there exists a constant L > 0 such that ∥Au − Av∥ ≤ L∥u − v∥ for all u, v ∈ C;
(v) weakly sequentially continuous if for each sequence {a_n}, we have: a_n ⇀ p ⇒ Aa_n ⇀ Ap.
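Monotonicity implies pseudomonotonicity, but the converse fails. The following standard observation (our illustration, not from the paper) gives a simple way to produce pseudomonotone operators that are not monotone:

```latex
% If F : C \to H is monotone and g : C \to (0,\infty), then A := gF is
% pseudomonotone: for u, v \in C,
\langle A v,\, u - v\rangle \ge 0
  \;\Rightarrow\; \langle F v,\, u - v\rangle \ge 0
  \;\Rightarrow\; \langle F u,\, u - v\rangle \ge \langle F v,\, u - v\rangle \ge 0
  \;\Rightarrow\; \langle A u,\, u - v\rangle \ge 0.
% For instance, A(u) = u/(1 + u^2) on H = \mathbb{R} is pseudomonotone but
% not monotone, since its derivative (1 - u^2)/(1 + u^2)^2 changes sign.
```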

Definition 2. A function f : H → R ∪ {+∞} is said to be proper if its effective domain, i.e., dom f = {v ∈ H : f(v) < +∞}, is nonempty. The Fenchel conjugate of f is the convex function f* : H → R ∪ {+∞} defined by

f*(u) = sup{⟨u, v⟩ − f(v) : v ∈ H}.   (4)

The function f is said to be strongly convex with strong convexity constant σ > 0 if

f(αu + (1 − α)v) ≤ αf(u) + (1 − α)f(v) − (σ/2)α(1 − α)∥u − v∥²

for all u, v ∈ dom f and α ∈ (0, 1). The function f is said to be Gâteaux differentiable at u if the limit

f°(u, v) := lim_{t→0⁺} (f(u + tv) − f(u))/t   (5)

exists for any v ∈ H. The gradient of f at u is the linear function ∇f(u) defined by ⟨∇f(u), v⟩ = f°(u, v) for all v ∈ H. When the limit in (5) exists for all u ∈ int(dom f), we say that f is Gâteaux differentiable. Moreover, when the limit in (5) is attained uniformly for all v ∈ H with ∥v∥ = 1, we say that f is Fréchet differentiable. It is well known that f is Fréchet differentiable if and only if the gradient ∇f is norm-to-norm continuous at u (see [35]).
Given a Gâteaux differentiable function f : H → R ∪ {+∞}, the Bregman distance corresponding to f is defined by (see [31])

Δ_f(u, v) := f(u) − f(v) − ⟨∇f(v), u − v⟩, u, v ∈ dom f.   (6)

The Bregman distance is not symmetric in general, but it satisfies the following three point identity: for all u ∈ dom f and v, w ∈ int(dom f),

Δ_f(u, w) = Δ_f(u, v) + Δ_f(v, w) + ⟨∇f(v) − ∇f(w), u − v⟩.   (7)

Moreover, when f is strongly convex with strong convexity modulus σ > 0, the Bregman distance satisfies the following inequality:

Δ_f(u, v) ≥ (σ/2)∥u − v∥².   (8)

It is important to emphasize that various types of the function f give different Bregman distances. We give the following important examples of some practically important types of the function f and their corresponding Bregman distances (see, e.g., [33,36]):

Example 1. (a) If f(v) = (1/2)∥v∥², then its corresponding Bregman distance is Δ_f(u, v) = (1/2)∥u − v∥², which is the squared Euclidean distance (SE).
(b) If f(v) = Σ_{i=1}^m v_i log(v_i), called the Shannon entropy, then its corresponding Bregman distance is given as

Δ_f(u, v) = Σ_{i=1}^m ( u_i log(u_i/v_i) − u_i + v_i ).

This distance is called the Kullback-Leibler distance (KL) and has become a very important tool in several areas of applied mathematics such as machine learning.
(c) If f(v) = −Σ_{i=1}^m log(v_i), called the Burg entropy, then its corresponding Bregman distance is given as

Δ_f(u, v) = Σ_{i=1}^m ( u_i/v_i − log(u_i/v_i) − 1 ).

This is called the Itakura-Saito distance (IS), which is very important in information theory.
(d) If f(v) = (1/2)⟨Qv, v⟩, where Q is a symmetric positive definite matrix, then its corresponding Bregman distance is given as

Δ_f(u, v) = (1/2)⟨Q(u − v), u − v⟩,

which is called the Mahalanobis distance (MD), used for cluster analysis.
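The three point identity (7) can be checked directly from the definition (6); for the reader's convenience:

```latex
\Delta_f(u,v) + \Delta_f(v,w) - \Delta_f(u,w)
 = -\langle \nabla f(v),\, u - v\rangle - \langle \nabla f(w),\, v - w\rangle
   + \langle \nabla f(w),\, u - w\rangle
 = \langle \nabla f(w) - \nabla f(v),\; u - v\rangle ,
```

which rearranges to (7). The four distances in Example 1 are also straightforward to implement; the following is a minimal sketch (assuming strictly positive vectors where logarithms appear and a symmetric positive definite matrix Q):

```python
# Minimal sketch of the Bregman distances of Example 1:
# Delta_f(u, v) = f(u) - f(v) - <grad f(v), u - v>.
import numpy as np

def bregman_se(u, v):
    """f(v) = 0.5*||v||^2  ->  squared Euclidean distance (SE)."""
    return 0.5 * np.sum((u - v) ** 2)

def bregman_kl(u, v):
    """f(v) = sum v_i log v_i (Shannon entropy) -> Kullback-Leibler (KL)."""
    return np.sum(u * np.log(u / v) - u + v)

def bregman_is(u, v):
    """f(v) = -sum log v_i (Burg entropy) -> Itakura-Saito (IS)."""
    return np.sum(u / v - np.log(u / v) - 1.0)

def bregman_md(u, v, Q):
    """f(v) = 0.5*<Qv, v> (Q symmetric pos. def.) -> Mahalanobis (MD)."""
    d = u - v
    return 0.5 * d @ Q @ d
```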

Definition 3. [37] The Bregman projection with respect to f of v ∈ int(dom f) onto a nonempty closed convex set C ⊂ int(dom f) is the unique vector P^f_C(v) ∈ C satisfying

Δ_f(P^f_C(v), v) = inf{Δ_f(u, v) : u ∈ C}.

Similarly to the metric projection, the Bregman projection is characterized by the following identities (see [37]):

⟨∇f(v) − ∇f(P^f_C(v)), u − P^f_C(v)⟩ ≤ 0 for all u ∈ C,   (9)

and

Δ_f(u, P^f_C(v)) + Δ_f(P^f_C(v), v) ≤ Δ_f(u, v) for all u ∈ C.   (10)

Lemma 1. [38] Let E be a Banach space, let f : E → (−∞, ∞] be a proper strictly convex function which is Gâteaux differentiable, and let {a_n} be a sequence in E such that a_n ⇀ u for some u ∈ E. Then

lim inf_{n→∞} Δ_f(a_n, u) < lim inf_{n→∞} Δ_f(a_n, v)

for all v in the interior of dom f with v ≠ u.
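As an illustration of Definition 3 (our example, not from the paper): under the Shannon entropy f(v) = Σ v_i log v_i, the Bregman projection of a strictly positive vector onto the probability simplex has a simple closed form, in contrast with the Euclidean projection, which requires a sorting procedure:

```python
# Sketch (assumption: v has strictly positive entries): the KL/Bregman
# projection w.r.t. the Shannon entropy onto {x >= 0 : sum x = 1} is a
# simple normalization.
import numpy as np

def kl_projection_simplex(v):
    return v / np.sum(v)

v = np.array([0.2, 1.5, 0.8])
p = kl_projection_simplex(v)   # entries of p sum to 1
```

Indeed, ∇f(v) − ∇f(v/Σv_i) = log(Σv_i)·(1, . . . , 1)^T, so the characterization (9) holds with equality for every u in the simplex.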

The following result is well known; see, e.g., [30,32].

Lemma 2. Let {a_n} and {b_n} be two non-negative real sequences such that a_{n+1} ≤ a_n − b_n for all n ≥ 0. Then the limit lim_{n→∞} a_n exists and Σ_{n=0}^∞ b_n < +∞; in particular, lim_{n→∞} b_n = 0.
Definition 4. [39,40] The Minty Variational Inequality (MVI) is defined as finding a point x̄ ∈ C such that

⟨Ay, y − x̄⟩ ≥ 0 for all y ∈ C.   (11)

We denote by M(C, A) the set of solutions of (11). Some existence results for the MVI have been presented in [39]. The assumption that M(C, A) ≠ ∅ has already been used for solving V I(C, A) in finite-dimensional spaces (see, e.g., [41]).

Lemma 3. [40] Consider the VI (1). If the mapping h(t) := ⟨A(x + t(y − x)), y − x⟩ is continuous on [0, 1] for all x, y ∈ C (i.e., A is hemicontinuous), then M(C, A) ⊂ V I(C, A). Moreover, if A is pseudomonotone, then V I(C, A) is closed, convex and V I(C, A) = M(C, A).

Main Results
In this section, we present our algorithms and their convergence analysis. In the sequel, we assume that the following conditions hold.

Assumption 1.
(i) The feasible set C is a nonempty, closed and convex subset of H.
(ii) The operator A : H → H is pseudomonotone, L-Lipschitz continuous and weakly sequentially continuous.
(iii) The solution set of the VI (1) is nonempty, i.e., Sol := Ω_VI ≠ ∅.
(iv) The function f : H → R is strongly convex with strong convexity modulus σ > 0 and uniformly Fréchet differentiable on bounded subsets of H.

Bregman Projection Method with Fixed Stepsize
We present our first algorithm as follows:

Algorithm BPMFS
Initialization: Pick a_0, b_0 ∈ H arbitrarily and let λ > 0.
Step 1: Set a_1 = a_0 and compute b_1 = P^f_C(∇f*(∇f(a_0) − λAb_0)). Set n = 1.
Step 2: Given the current iterates a_n, b_n and b_{n−1}, calculate a_{n+1} and b_{n+1} as follows:

a_{n+1} = P^f_{T_n}(∇f*(∇f(a_n) − λAb_n)),
b_{n+1} = P^f_C(∇f*(∇f(a_{n+1}) − λAb_n)),   (12)

where

T_n := {x ∈ H : ⟨∇f(a_n) − λAb_{n−1} − ∇f(b_n), x − b_n⟩ ≤ 0}.   (13)

If a_{n+1} = a_n and b_n = b_{n−1}, or Ab_n = 0: STOP. Otherwise, set n ← n + 1 and repeat Step 2.
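The following is a minimal sketch of Algorithm BPMFS for the particular choice f(x) = (1/2)∥x∥² (so that ∇f = ∇f* = I and the Bregman projections reduce to metric projections). The helper proj_C is an assumed user-supplied projection onto C, and the stopping test is a simple tolerance rather than the exact equality test of Step 2:

```python
# Sketch of Algorithm BPMFS with f(x) = 0.5*||x||^2 (Euclidean case).
# Per iteration: one evaluation of A (at b_n), one projection onto C,
# and one explicit half-space projection, cf. Remark 2.
import numpy as np

def project_halfspace(w, b, v):
    """Metric projection of v onto T = {x : <w, x - b> <= 0}."""
    t = np.dot(w, v - b)
    return v if t <= 0 else v - (t / np.dot(w, w)) * w

def bpmfs(A, proj_C, a0, b0, lam, tol=1e-8, max_iter=10_000):
    a, b_prev = a0, b0
    Ab_prev = A(b_prev)
    b = proj_C(a - lam * Ab_prev)            # Step 1
    for _ in range(max_iter):                # Step 2
        Ab = A(b)                            # the single A-evaluation
        w = (a - lam * Ab_prev) - b          # normal of the half-space T_n
        a_next = project_halfspace(w, b, a - lam * Ab)
        b_next = proj_C(a_next - lam * Ab)
        if np.linalg.norm(a_next - a) < tol and np.linalg.norm(b_next - b) < tol:
            return b_next
        a, b, b_prev, Ab_prev = a_next, b_next, b, Ab
    return b
```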

Remark 2.
The projection onto the half-space T_n for computing a_{n+1} in Step 2 can be calculated explicitly using the projection formula in [26]. Algorithm BPMFS is constructed based on the concepts of [28,29] using Bregman projections. As in [29], the main task of Algorithm BPMFS is to compute one projection for b_{n+1} onto C and one evaluation of A at the current approximation b_n per iteration. Moreover, it is easy to see from (12) that C ⊂ T_n. Indeed, from (12) we have that

b_n = P^f_C(∇f*(∇f(a_n) − λAb_{n−1}));

hence, using (9), we obtain

⟨∇f(a_n) − λAb_{n−1} − ∇f(b_n), x − b_n⟩ ≤ 0 for all x ∈ C.   (14)

It follows from (14) and the definition of T_n in (13) that C ⊂ T_n.
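For completeness, in the Euclidean case the explicit formula referred to above reads as follows (a standard fact about metric projections onto half-spaces):

```latex
P_{T}(v) \;=\; v \;-\; \frac{\max\{0,\;\langle w,\, v - u\rangle\}}{\|w\|^{2}}\; w,
\qquad T = \{x \in H : \langle w,\, x - u\rangle \le 0\},\quad w \neq 0 .
```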

Remark 3.
Moreover, if for some n ≥ 1 we have a_{n+1} = a_n and b_n = b_{n−1} in Algorithm BPMFS, then b_n is a solution of the VI (1).
Proof. Since a_{n+1} = P^f_{T_n}(∇f*(∇f(a_n) − λAb_n)), the characterization (9) gives

⟨∇f(a_n) − λAb_n − ∇f(a_{n+1}), z − a_{n+1}⟩ ≤ 0 for all z ∈ T_n.

From a_{n+1} = a_n, this implies that ⟨−λAb_n, z − a_n⟩ ≤ 0, hence

λ⟨Ab_n, z − a_n⟩ ≥ 0 for all z ∈ T_n.

Equivalently,

⟨Ab_n, z − b_n⟩ ≥ ⟨Ab_n, a_n − b_n⟩ for all z ∈ T_n.

Since C ⊂ T_n, we get

⟨Ab_n, z − b_n⟩ ≥ ⟨Ab_n, a_n − b_n⟩ for all z ∈ C.   (15)

On the other hand, it follows from the definition of T_n that

⟨∇f(a_n) − λAb_{n−1} − ∇f(b_n), x − b_n⟩ ≤ 0 for all x ∈ T_n.

Substituting b_{n−1} = b_n and x = a_n ∈ T_n into the above inequality, we get

⟨∇f(a_n) − ∇f(b_n), a_n − b_n⟩ ≤ λ⟨Ab_n, a_n − b_n⟩.

Since f is strongly convex, the left-hand side is at least σ∥a_n − b_n∥² ≥ 0. Then we have

λ⟨Ab_n, a_n − b_n⟩ ≥ 0.   (16)

Combining (15) and (16), we get

⟨Ab_n, z − b_n⟩ ≥ 0 for all z ∈ C.

Therefore b_n is a solution of the VI (1). □

Lemma 4.
Let {a_n} and {b_n} be the sequences generated by Algorithm BPMFS. Then the following inequality holds for any z ∈ Sol:

Δ_f(z, a_{n+1}) ≤ Δ_f(z, a_n) − (1 − λL/σ)Δ_f(a_{n+1}, b_n) − (1 − 2λL/σ)Δ_f(b_n, a_n) + (2λL/σ)Δ_f(b_{n−1}, a_n).   (17)
Proof. Let z ∈ Sol. Since z ∈ C ⊂ T_n and a_{n+1} = P^f_{T_n}(∇f*(∇f(a_n) − λAb_n)), the characterization (9) gives

⟨∇f(a_n) − λAb_n − ∇f(a_{n+1}), z − a_{n+1}⟩ ≤ 0.   (18)

By the three point identity (7), we have

Δ_f(z, a_{n+1}) = Δ_f(z, a_n) − Δ_f(a_{n+1}, a_n) + ⟨∇f(a_n) − ∇f(a_{n+1}), z − a_{n+1}⟩.

Together with (18), this implies that

Δ_f(z, a_{n+1}) ≤ Δ_f(z, a_n) − Δ_f(a_{n+1}, a_n) + λ⟨Ab_n, z − a_{n+1}⟩.   (19)

Since z ∈ Sol and b_n ∈ C, we have ⟨Az, b_n − z⟩ ≥ 0. Since A is pseudomonotone and b_n ∈ C, it follows that ⟨Ab_n, z − b_n⟩ ≤ 0. Then this implies that

λ⟨Ab_n, z − a_{n+1}⟩ = λ⟨Ab_n, z − b_n⟩ + λ⟨Ab_n, b_n − a_{n+1}⟩ ≤ λ⟨Ab_n, b_n − a_{n+1}⟩.   (20)

Combining (19) and (20), we get

Δ_f(z, a_{n+1}) ≤ Δ_f(z, a_n) − Δ_f(a_{n+1}, a_n) + λ⟨Ab_n, b_n − a_{n+1}⟩.   (21)

However, since a_{n+1} ∈ T_n, by the definition of T_n we get

⟨∇f(a_n) − λAb_{n−1} − ∇f(b_n), a_{n+1} − b_n⟩ ≤ 0.   (22)

Using the three point identity (7) in (22), we obtain

Δ_f(a_{n+1}, a_n) ≥ Δ_f(a_{n+1}, b_n) + Δ_f(b_n, a_n) − λ⟨Ab_{n−1}, a_{n+1} − b_n⟩.   (23)

Combining (21) and (23), we get

Δ_f(z, a_{n+1}) ≤ Δ_f(z, a_n) − Δ_f(a_{n+1}, b_n) − Δ_f(b_n, a_n) + λ⟨Ab_{n−1} − Ab_n, a_{n+1} − b_n⟩.   (24)

Note that

λ⟨Ab_{n−1} − Ab_n, a_{n+1} − b_n⟩ ≤ λL∥b_{n−1} − b_n∥∥a_{n+1} − b_n∥
≤ (λL/2)∥b_{n−1} − b_n∥² + (λL/2)∥a_{n+1} − b_n∥²
≤ (2λL/σ)(Δ_f(b_{n−1}, a_n) + Δ_f(b_n, a_n)) + (λL/σ)Δ_f(a_{n+1}, b_n),   (25)

where in (25) we have used the Lipschitz continuity of A and the following basic inequalities together with (8): αβ ≤ (α² + β²)/2 and ∥b_{n−1} − b_n∥² ≤ 2∥b_{n−1} − a_n∥² + 2∥a_n − b_n∥². It follows from (24) and (25) that

Δ_f(z, a_{n+1}) ≤ Δ_f(z, a_n) − (1 − λL/σ)Δ_f(a_{n+1}, b_n) − (1 − 2λL/σ)Δ_f(b_n, a_n) + (2λL/σ)Δ_f(b_{n−1}, a_n),

which is (17). Hence, we obtain the desired result. □
Theorem 1. Suppose that Assumption 1 holds, and let λ ∈ (0, σ/(3L)). Then the sequences {a_n} and {b_n} generated by Algorithm BPMFS converge weakly to a solution of the VI (1).
Proof. Let z ∈ Sol. Combining Lemma 4 with the choice λ ∈ (0, σ/(3L)), one checks that the sequence Γ_n := Δ_f(z, a_n) + (2λL/σ)Δ_f(b_{n−1}, a_n) satisfies the hypothesis of Lemma 2. Hence, it follows from Lemma 2 that {a_n} is bounded and lim_{n→∞} Δ_f(b_n, a_n) = lim_{n→∞} Δ_f(a_{n+1}, b_n) = 0. Consequently, using (8), ∥a_n − b_n∥ → 0 and ∥a_{n+1} − b_n∥ → 0 as n → ∞. This implies that ∥a_{n+1} − a_n∥ → 0 as n → ∞. Furthermore,

∥b_{n+1} − b_n∥ ≤ ∥b_{n+1} − a_{n+1}∥ + ∥a_{n+1} − a_n∥ + ∥a_n − b_n∥ → 0 as n → ∞.
Since f is uniformly Fréchet differentiable, ∇f is norm-to-norm uniformly continuous on bounded subsets of H. Thus, we have

∥∇f(a_n) − ∇f(b_n)∥ → 0 as n → ∞.

Since {a_n} is bounded, we can choose a subsequence {a_{n_k}} of {a_n} such that a_{n_k} ⇀ p. Since ∥a_n − b_n∥ → 0, we also have b_{n_k} ⇀ p. Since {b_n} ⊂ C and C is weakly closed, p ∈ C. We now show that p ∈ Sol.
Since b_{n_k} = P^f_C(∇f*(∇f(a_{n_k}) − λAb_{n_k−1})), the characterization (9) gives

⟨∇f(a_{n_k}) − λAb_{n_k−1} − ∇f(b_{n_k}), v − b_{n_k}⟩ ≤ 0 for all v ∈ C.   (26)

This implies that

⟨∇f(a_{n_k}) − ∇f(b_{n_k}), v − b_{n_k}⟩ ≤ λ⟨Ab_{n_k−1}, v − b_{n_k}⟩ for all v ∈ C.   (27)

Hence, we have

λ⟨Ab_{n_k}, v − b_{n_k}⟩ ≥ ⟨∇f(a_{n_k}) − ∇f(b_{n_k}), v − b_{n_k}⟩ + λ⟨Ab_{n_k} − Ab_{n_k−1}, v − b_{n_k}⟩ for all v ∈ C.   (28)

Taking limits in the above inequality as k → ∞, and using ∥∇f(a_{n_k}) − ∇f(b_{n_k})∥ → 0, ∥b_{n_k} − b_{n_k−1}∥ → 0, the Lipschitz continuity of A and the boundedness of {b_{n_k}}, we get

lim inf_{k→∞} ⟨Ab_{n_k}, v − b_{n_k}⟩ ≥ 0 for all v ∈ C.   (29)

We choose a sequence {ε_k} of positive numbers such that {ε_k} is decreasing and converges to 0. For each k ≥ 1, we denote by N_k the smallest positive integer such that

⟨Ab_{n_j}, v − b_{n_j}⟩ + ε_k ≥ 0 for all j ≥ N_k.

Since {ε_k} is decreasing, it is easy to see that the sequence {N_k} is non-decreasing. Thus, for each k ≥ 1, since {b_n} ⊂ C, we have Ab_{N_k} ≠ 0 and, by setting

ϑ_{N_k} = Ab_{N_k}/∥Ab_{N_k}∥²,

we have ⟨Ab_{N_k}, ϑ_{N_k}⟩ = 1 for each k ≥ 1. Now, it follows from (29) that for each k ≥ 1,

⟨Ab_{N_k}, v + ε_k ϑ_{N_k} − b_{N_k}⟩ ≥ 0.

Since A is pseudomonotone, we get

⟨A(v + ε_k ϑ_{N_k}), v + ε_k ϑ_{N_k} − b_{N_k}⟩ ≥ 0.

This implies that

⟨Av, v − b_{N_k}⟩ ≥ ⟨Av − A(v + ε_k ϑ_{N_k}), v + ε_k ϑ_{N_k} − b_{N_k}⟩ − ε_k⟨Av, ϑ_{N_k}⟩.

Now we show that lim_{k→∞} ε_k ϑ_{N_k} = 0. Indeed, since a_{n_k} ⇀ p and lim_{k→∞} ∥a_{n_k} − b_{n_k}∥ = 0, we obtain b_{N_k} ⇀ p as k → ∞. Since {b_n} ⊂ C and A is weakly sequentially continuous on H, it follows that {Ab_{N_k}} converges weakly to Ap. We may suppose that Ap ≠ 0 (otherwise, p is a solution). Since the norm is sequentially weakly lower semicontinuous, we have

0 < ∥Ap∥ ≤ lim inf_{k→∞} ∥Ab_{N_k}∥,

and hence

0 ≤ lim sup_{k→∞} ∥ε_k ϑ_{N_k}∥ = lim sup_{k→∞} (ε_k/∥Ab_{N_k}∥) = 0.

Therefore, letting k → ∞ in the inequality above, for all v ∈ C we get

⟨Av, v − p⟩ ≥ 0.

Therefore, by Lemma 3, p ∈ Sol = Ω_VI. Finally, we show that the weak limit p is unique. Assume the contrary, i.e., that there exists a subsequence {a_{n_j}} of {a_n} such that a_{n_j} ⇀ q with q ≠ p. Following a similar argument as above, we get q ∈ Ω_VI. It then follows from the Bregman Opial-like property of H (more precisely, Lemma 1) that

lim inf_{k→∞} Δ_f(a_{n_k}, p) < lim inf_{k→∞} Δ_f(a_{n_k}, q) and lim inf_{j→∞} Δ_f(a_{n_j}, q) < lim inf_{j→∞} Δ_f(a_{n_j}, p),

which is a contradiction. Thus, we have p = q and the desired result follows. This completes the proof. □

Remark 4.
We note that Algorithm BPMFS converges strongly to a solution of the VI (1) if the operator A : H → H is γ-strongly pseudomonotone and L-Lipschitz continuous.
Since p is a solution of the VI (1), we have ⟨Ap, v − p⟩ ≥ 0 for all v ∈ C. Using the strong pseudomonotonicity of A, we obtain ⟨Av, v − p⟩ ≥ γ∥v − p∥² for all v ∈ C. Taking v = b_n ∈ C, we get

⟨Ab_n, b_n − p⟩ ≥ γ∥b_n − p∥².   (30)

Hence, we have

λγ∥b_n − p∥² ≤ λ⟨Ab_n, b_n − a_{n+1}⟩ + λ⟨Ab_n, a_{n+1} − p⟩.   (31)

Since a_{n+1} = P^f_{T_n}(∇f*(∇f(a_n) − λAb_n)) and p ∈ C ⊂ T_n, the characterization (9) gives

⟨∇f(a_n) − λAb_n − ∇f(a_{n+1}), p − a_{n+1}⟩ ≤ 0,

that is,

λ⟨Ab_n, a_{n+1} − p⟩ ≤ ⟨∇f(a_n) − ∇f(a_{n+1}), a_{n+1} − p⟩.   (32)

Combining (31) and (32), we get

λγ∥b_n − p∥² ≤ λ⟨Ab_n, b_n − a_{n+1}⟩ + ⟨∇f(a_n) − ∇f(a_{n+1}), a_{n+1} − p⟩.   (33)

Following a similar argument as in the proof of Theorem 1, the sequences {a_n} and {b_n} are bounded and ∥b_n − a_n∥ → 0, ∥a_{n+1} − b_n∥ → 0 and ∥a_{n+1} − a_n∥ → 0 as n → ∞. Since ∇f is norm-to-norm uniformly continuous on bounded sets, we also get ∥∇f(a_{n+1}) − ∇f(a_n)∥ → 0 as n → ∞. Therefore, letting n → ∞ in (33) and using the boundedness of {Ab_n} and {a_n}, we obtain

lim_{n→∞} λγ∥b_n − p∥² = 0.   (34)

It follows from (34) that lim_{n→∞} ∥b_n − p∥ = 0.
Since ∥a_n − b_n∥ → 0, we also have ∥a_n − p∥ → 0 as n → ∞. This completes the proof. □

Bregman Projection Method with Self-Adaptive Stepsize
Next, we propose a Bregman subgradient extragradient method whose convergence does not require a prior estimate of the Lipschitz constant of the cost operator A. The importance of the second algorithm is to cater for situations in which estimating the Lipschitz constant of A is very difficult. The algorithm therefore uses a self-adaptive process for selecting its stepsize and does not require a prior estimate of the Lipschitz constant of A.
Algorithm BPMSAS
Initialization: Pick a_0, b_0 ∈ H arbitrarily, and let λ_0 > 0 and α ∈ (0, σ/3).
Step 1: Set a_1 = a_0 and compute b_1 = P^f_C(∇f*(∇f(a_0) − λ_0 Ab_0)). Set n = 1.
Step 2: Given the current iterates a_n, b_n, b_{n−1} and λ_n, calculate a_{n+1} and b_{n+1} as follows:

a_{n+1} = P^f_{T_n}(∇f*(∇f(a_n) − λ_n Ab_n)),
b_{n+1} = P^f_C(∇f*(∇f(a_{n+1}) − λ_n Ab_n)),

where

T_n := {x ∈ H : ⟨∇f(a_n) − λ_n Ab_{n−1} − ∇f(b_n), x − b_n⟩ ≤ 0},

and

λ_{n+1} = min{ λ_n, α∥b_n − b_{n−1}∥ / ∥Ab_n − Ab_{n−1}∥ } if Ab_n ≠ Ab_{n−1}, and λ_{n+1} = λ_n otherwise.

If a_{n+1} = a_n and b_n = b_{n−1}, or Ab_n = 0: STOP. Otherwise, set n ← n + 1 and repeat Step 2.
Remark 5. Note that in Algorithm BPMSAS the stepsize λ_n is chosen by a self-adaptive process, i.e., the stepsize is updated along with the iterations. Moreover, since A is L-Lipschitz continuous, we have ∥Ab_n − Ab_{n−1}∥ ≤ L∥b_n − b_{n−1}∥, so the sequence {λ_n} is monotonically non-increasing and bounded below by min{λ_0, α/L}. Hence the limit lim_{n→∞} λ_n exists, and we denote it by λ, i.e., lim_{n→∞} λ_n = λ > 0.
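In code, the stepsize rule of Algorithm BPMSAS amounts to the following update (a sketch under the assumptions of Remark 5; the names are ours):

```python
# Self-adaptive stepsize of Algorithm BPMSAS:
# lambda_{n+1} = min(lambda_n, alpha*||b_n - b_{n-1}|| / ||A b_n - A b_{n-1}||).
# No knowledge of the Lipschitz constant L is needed, yet the sequence stays
# bounded below by min(lambda_0, alpha/L) whenever A is L-Lipschitz.
import numpy as np

def next_stepsize(lam, alpha, b, b_prev, Ab, Ab_prev):
    denom = np.linalg.norm(Ab - Ab_prev)
    if denom == 0.0:               # A(b_n) = A(b_{n-1}): keep the stepsize
        return lam
    return min(lam, alpha * np.linalg.norm(b - b_prev) / denom)
```

Only the already-computed values Ab_n and Ab_{n−1} enter the rule, so it adds no extra evaluations of A per iteration.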
We now give the convergence of Algorithm BPMSAS.

Theorem 2. Suppose that Assumption 1 holds. Then the sequences {a_n} and {b_n} generated by Algorithm BPMSAS converge weakly to a solution of the VI (1).
Proof. Following a similar argument as in the proof of Theorem 1, with λ replaced by λ_n and using lim_{n→∞} λ_n = λ > 0, we obtain that {a_n} is bounded. Hence

lim_{n→∞} ∥a_{n+1} − a_n∥ = lim_{n→∞} ∥a_n − b_n∥ = lim_{n→∞} ∥a_{n+1} − b_n∥ = 0.

Since f is uniformly Fréchet differentiable, ∇f is norm-to-norm uniformly continuous on bounded sets, and therefore ∥∇f(a_n) − ∇f(b_n)∥ → 0 as n → ∞. Since {a_n} is bounded, there exists a subsequence {a_{n_k}} of {a_n} such that a_{n_k} ⇀ x̄. Following a similar argument from (27) in the proof of Theorem 1, we get that x̄ ∈ V I(C, A). This completes the proof. □

Numerical Examples
In this section, we test the performance of Algorithm BPMFS and Algorithm BPMSAS on some VIPs available in the literature. All numerical computations are carried out using MATLAB R2019a (9.7) on a PC with an Intel(R) Core(TM) i7-5600U CPU @ 3.40GHz and 8.00 GB RAM.

Example 2.
First, we consider the variational inequality problem in the finite-dimensional space R^m. The cost operator A : R^m → R^m is defined by Ax = Gx + q with G = SS^T + Q + R, where S is an m × m matrix, Q is an m × m skew-symmetric matrix, R is an m × m diagonal matrix whose diagonal entries are positive (so G is positive definite) and q is a vector in R^m. The feasible set C is the closed convex polyhedron defined by C = {x ∈ R^m : Bx ≤ b}, where B is an l × m matrix (l = 10) and b is a non-negative vector in R^l. It is easy to see that A is monotone (thus pseudomonotone) and L-Lipschitz continuous with L = ∥G∥. For experimental purposes, all the entries of S, Q, R, B and b are generated randomly such that their properties are satisfied, and the starting points a_0, b_0 are generated randomly in R^m (m = 50, 100, 150, 200). The Bregman projection onto the feasible set C is computed using the fmincon solver in the MATLAB Optimization Toolbox. We set q = 0 ∈ R^m; in this case, we know that the unique solution of the VI is {0}. We choose f(x) = (1/2)∥x∥², which is strongly convex with modulus σ = 1. We test Algorithm BPMFS (named Alg. 1), Algorithm BPMSAS (named Alg. 2) and the SEM (Algorithm (3)) using the following parameters: for Algorithm BPMFS and the SEM we choose λ = 0.3/L, while for Algorithm BPMSAS we choose λ_0 = 0.3. The computation is stopped when D_n = ∥a_n − 0∥ < 10^{−7} is satisfied. We plot the graphs of D_n against the number of iterations in each case. The numerical results are shown in Table 1 and Figure 1. They show that Algorithm BPMSAS has a computational advantage over Algorithm BPMFS and the SEM in terms of the number of iterations and the CPU time taken for the computation. This is due to the fact that its stepsize is improved at each iteration, compared with Algorithm BPMFS and the SEM, which use a fixed stepsize. Moreover, Algorithm BPMFS performs better than the SEM.

Next, we compare the behaviour of the algorithms for the various convex functions given in Example 1. Note that for the Shannon entropy, the corresponding values of ∇f(x) and ∇f*(x) = (∇f)^{−1}(x) are, respectively,

∇f(x) = (1 + log(x_1), . . . , 1 + log(x_m))^T and ∇f*(x) = (exp(x_1 − 1), . . . , exp(x_m − 1))^T.

We test the algorithms using similar parameters as above with D_n = ∥a_n − 0∥ < 10^{−4} as the stopping criterion. The numerical results are shown in Table 2. They show that the IS has the best performance while the MD has the poorest performance for both Algorithm BPMFS and Algorithm BPMSAS.

Application to Signal Processing

Finally, we apply our results to the Least Absolute Shrinkage and Selection Operator (LASSO) problem [42]. Let z be a vector in R^M, let B be a matrix of order M × N (M ≪ N) and let λ > 0. The LASSO problem is given by

min_{x ∈ R^N} { (1/2)∥Bx − z∥² + λ∥x∥_1 },   (40)

where ∥x∥ is the Euclidean norm of x and ∥x∥_1 = Σ_{i=1}^N |x_i| is the l_1-norm of x. This problem has been considered a great tool in several branches of science and engineering. Clearly, (40) is a convex unconstrained minimization problem which appears in compressed sensing and image reconstruction, where the original signal (or image) is sparse in some orthogonal basis, via the process

z = Bx + ξ,

where x is the original signal (or image), B is the blurring operator, ξ is a noise term and z is the degraded or blurred data which needs to be recovered. Several iterative methods for solving (40) have been introduced, the earliest being the projection method of Figueiredo et al. [43]. Equivalently, the LASSO problem (40) can be expressed as a VI by setting Ax = B^T(Bx − z) (see [44]); a sketch of this formulation is given after the setup description below. We apply our algorithms to solve the signal recovery problem with N = 4096, where the original signal is generated by the uniform distribution on the interval [−2, 2] with 160 non-zero elements.
The matrix B is generated by the normal distribution with mean zero and variance one, while the observation z with M = 512 is generated with Gaussian noise at SNR = 40. The initial points a_0, b_0 are picked randomly, and in Algorithm BPMFS we choose λ = 0.35.
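The following sketch shows how this experiment can be set up with the BPMFS sketch given in Section 3 (our reconstruction under stated assumptions: the l_1-ball radius t, the noise level used to approximate SNR = 40 and the stepsize scaling are illustrative choices, not values taken from the paper):

```python
# Sketch of the signal-recovery experiment: the LASSO problem is handled as
# a VI with cost operator A(x) = B^T(Bx - z) over an l1-ball
# C = {x : ||x||_1 <= t}. Reuses `bpmfs` from the sketch in Section 3.
import numpy as np

def proj_l1_ball(x, t):
    """Euclidean projection onto {x : ||x||_1 <= t} via sorting."""
    if np.sum(np.abs(x)) <= t:
        return x
    u = np.sort(np.abs(x))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(x) + 1) > css - t)[0][-1]
    theta = (css[k] - t) / (k + 1.0)
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

rng = np.random.default_rng(0)
N, M, nnz = 4096, 512, 160
x_true = np.zeros(N)
idx = rng.choice(N, nnz, replace=False)
x_true[idx] = rng.uniform(-2, 2, nnz)            # sparse original signal
B = rng.standard_normal((M, N))                  # sensing/blurring matrix
z = B @ x_true + 1e-4 * rng.standard_normal(M)   # noisy observation (rough noise level)

A = lambda x: B.T @ (B @ x - z)                  # cost operator of the VI
t = np.sum(np.abs(x_true))                       # assumed sparsity budget
proj_C = lambda x: proj_l1_ball(x, t)
L = np.linalg.norm(B, 2) ** 2                    # Lipschitz constant of A
x_rec = bpmfs(A, proj_C, np.zeros(N), np.zeros(N), lam=0.3 / L)
```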

Conclusions
In this paper, we introduced two Bregman subgradient extragradient methods for solving variational inequalities with a pseudomonotone and Lipschitz continuous operator in real Hilbert spaces. We established some weak convergence results for the sequences generated by the algorithms. We also gave some numerical experiments to illustrate the performance of the proposed algorithms. The numerical results show that the proposed methods perform better than the usual subgradient extragradient method and that their efficiency depends on the type of convex function used in the algorithms.