Review

One-Rank Linear Transformations and Fejer-Type Methods: An Overview

by Volodymyr Semenov 1, Petro Stetsyuk 2, Viktor Stovba 2 and José Manuel Velarde Cantú 3,*
1 Faculty of Computer Science and Cybernetics, Taras Shevchenko National University of Kyiv, 03022 Kyiv, Ukraine
2 Department of Nonsmooth Optimization Methods, V.M. Glushkov Institute of Cybernetics of the NAS of Ukraine, 03187 Kyiv, Ukraine
3 Department of Industrial Engineering, Technological Institute of Sonora (ITSON), Navojoa 85800, Sonora, Mexico
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(10), 1527; https://doi.org/10.3390/math12101527
Submission received: 11 April 2024 / Revised: 6 May 2024 / Accepted: 9 May 2024 / Published: 14 May 2024
(This article belongs to the Special Issue Innovations in Optimization and Operations Research)

Abstract: Subgradient methods are frequently used for optimization problems. However, subgradient techniques are characterized by slow convergence for minimizing ravine convex functions. To accelerate subgradient methods, special linear non-orthogonal transformations of the original space are used. This paper provides an overview of these transformations based on Shor's original idea. Two one-rank linear transformations of Euclidean space are considered. These simple transformations form the basis of variable metric methods for convex minimization that have a natural geometric interpretation in the transformed space. Along with the space transformation, a search direction and a corresponding step size must be defined. Subgradient Fejer-type methods are analyzed to minimize convex functions, and the Polyak step size is used for problems with a known optimal objective value. Convergence theorems are provided together with the results of numerical experiments. Directions for future research are discussed.

1. Introduction

In the last few years, algorithms for convex optimization have revolutionized algorithm design, both for discrete and continuous optimization problems [1,2,3,4,5]. Popular convex optimization methods are subgradient-type methods [6,7,8,9].
Subgradient-type methods are characterized by slow convergence when minimizing ravine convex functions. This is due to the fact that the antisubgradient forms an angle close to π/2 with the direction toward a minimum point. The original way out of this situation was first realized and proposed by N.Z. Shor in [10] (see also [11]). It is based on the use of linear non-orthogonal space transformations (namely, the space dilation operator), which allow for changing the angles between the subgradient and the direction to a minimum. In particular, this leads to subgradient-type methods in an argument space transformed by a linear operator. That is why such methods can be considered variable metric methods. For other approaches to minimizing ravine convex functions, e.g., gradient averaging or the heavy ball method, see [1,4,8] and the references therein.
Historically, the first variable metric method was the one for minimizing twice continuously differentiable functions proposed by Davidon in [12] and developed by Fletcher and Powell in [13]. It is based on the idea of a quadratic approximation of the function being minimized, and it actually imitates the Newton–Raphson method without explicitly calculating second derivatives [14,15]. As a result, the name "quasi-Newton type method" has been attached to it, and its interpretation as a variable metric method is rarely found in the literature. This is also due to the written form of the Davidon–Fletcher–Powell method, which uses a symmetric matrix correction that does not lend itself to interpretation as a method in a transformed argument space. This circumstance also applies to other methods of the quasi-Newton type [8,16,17], as well as to a number of conjugate gradient methods [14]. Close to variable metric methods are mirror descent methods [18,19,20,21,22]. Note that the construction of fast and reliable mirror descent methods is an important problem.
Henceforth, variable metric methods will be considered to be those methods that use linear non-orthogonal space transformations, the convergence results of which are based on the study of the characteristic behavior of a function being minimized in the transformed space of arguments. The majority of variable metric methods are subgradient-type methods that use a space dilation operation both in the direction of the subgradient and in the direction of the difference of two consecutive subgradients (r-algorithms) [23,24,25,26]. The ellipsoid method also belongs to the first group, and the r-algorithms have proven themselves to be an effective tool for solving nonsmooth problems. Variable metric methods can also include two methods [27] which are essentially variants of r-algorithms with a variable coefficient of space dilation and classical Fejer adjustment of the step multiplier. A family of economical—in the sense of information processing—variable metric methods for convex programming problems allows one to obtain a fairly general scheme of the methods of simple-body centroids [28], which, along with space dilation, involves the use of other linear non-orthogonal space transformations.
The convergence results for the methods given above are based on the study of the behavior of the subgradient norm of a function being minimized in the dilated space, the study of the behavior of the distance to the minimum point in the transformed space, and the use of a monotonic reduction of the volume of the localization region of the extrema set. Obviously, there is a simple mechanism that allows one to extend the class of variable metric methods. This can be achieved both by increasing the number of linear non-orthogonal transformations and the number of investigated characteristics of a function being minimized in the transformed space of arguments. At the same time, by preserving the gradient nature of the methods in a transformed space, it is possible to construct methods that are close to quasi-Newton methods in efficiency but are applicable to a wider class of functions than smooth ones, including nonsmooth ones.
The practical effectiveness of r-algorithms for ravine problems is due to the fact that space dilation is aimed at improving the structure of the level surfaces of a function being minimized in the next space of arguments. However, in addition to space dilation, other linear non-orthogonal operators can be used for this. Two of these, which allow one to improve the structure of level surfaces in ravine functions, will be discussed in this paper. These transformations are quite simple, and for convex function minimization, it is possible to construct and substantiate variable metric methods that have a visual geometric interpretation in the transformed space of arguments. In addition to space transformation, the construction of such methods requires both the selection of movement direction from the current point and a certain method of adjusting the step multiplier in this direction. We will limit ourselves to the consideration of subgradient-type methods involving classical Fejer adjustment of the step multiplier in the direction of the antisubgradient in relation to convex function minimization with a known value of the function at the optimum point. This paper was prepared using References [29,30].
The content of this paper is presented as follows: Section 2 gives the formulation of the problem and briefly analyzes the Fejer-type methods. In Section 3, the one-rank ellipsoidal transformation of space is considered, and two simple methods with classical Fejer step control based on this transformation are given. In Section 4, the orthogonalizing space transformation and the family of orthogonal subgradient descent methods based on it with classical Fejer step control are considered. The practical effectiveness of the above methods is supported by numerical experiments. Section 3 and Section 4 also discuss possible schemes of a number of other algorithms, taking into account these transformations, as well as the constructive solution of a number of fundamental problems of Fejer-type methods for solving the convex programming problem.

2. Fejer-Type Methods

Let there be an unconstrained minimization problem
min_{x ∈ X} f(x),
where f(x) is a convex function of a vector argument x ∈ X, X = R^n; R^n is Euclidean n-dimensional space with scalar product (x, y). Here, X is the initial argument space, and n is the size of the problem. Let the extrema set X^* of the problem (1) be non-empty and the minimum value of the function f(x) be known: f^* = f(x^*), x^* ∈ X^*. Without loss of generality, for convenience of further considerations, let us assume that the set X^* consists of the single point x^*.
This formulation of the problem has a number of applications. In particular, it is easy to reduce the problem of finding a feasible point of a consistent system of convex inequalities to the problem (1). If the system of convex inequalities is inconsistent, analysis of this problem simplifies obtaining a sufficient condition of inconsistency. In addition, application of the exact penalty functions method to solve smooth constrained convex minimization problems leads to nonsmooth convex minimization problems [8,31].
Since we will further discuss numerical methods for solving the problem (1), let us formulate what we will consider as its solution, or, in other words, a stopping criterion for these methods. Let X_ε^* = {x : f(x) − f^* ≤ ε}, where ε > 0. We will consider the point x_k ∈ X_ε^*, for which f(x_k) − f^* ≤ ε is fulfilled, as a solution of the problem (1) with accuracy ε by functional (ε-solution). The input parameter of the methods will be the accuracy ε_f, with which the problem (1) is to be solved by functional, and a stop is performed as soon as a point x_k ∈ X_{ε_f}^* is reached.
Fejer methods go back to [32,33] and belong to the subgradient descent type, which involves adjustment of a step multiplier in the direction of a normalized subgradient. In order to solve the problem (1), Fejer methods with Polyak step size are formulated using the following iterative process:
x_{k+1} = x_k − h_k ∂f(x_k)/‖∂f(x_k)‖,   h_k = γ (f(x_k) − f^*)/‖∂f(x_k)‖,
where ∂f(x_k) is a subgradient of f(x) at the point x_k, and γ is a scalar such that 0 < γ < 2 [34]. This selection of the parameter γ guarantees a monotonic reduction of the distance to a minimum at each step of the process (2) and ensures the simplicity of proving the convergence of these processes. The step h_k at γ = 1 we will call the classical Fejer step, and the corresponding method we will call the classical Fejer method. For a modern presentation of the theory behind Fejer methods, see [5,35,36].
When solving the problem (1) with accuracy ε f using a functional, the classical Fejer method is quite constructive and has the following form.
At the beginning of the process, we have x_0 ∈ R^n, ε_f > 0.
Let x_k ∈ R^n, f(x_k), and ∂f(x_k) be obtained at the k-th step, these being the values of the function and subgradient calculated at it. If f(x_k) − f^* ≤ ε_f, then x_k is the sought point. Otherwise, we calculate another approximation
x_{k+1} = x_k − h_k ∂f(x_k)/‖∂f(x_k)‖,   h_k = (f(x_k) − f^*)/‖∂f(x_k)‖
and proceed to the ( k + 1 ) -th step.
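As an illustration, a minimal Python sketch of this process follows. It assumes that the optimal value f_star is known and that subgrad(x) returns an arbitrary subgradient; all names are illustrative and are not taken from the authors' software.

```python
import numpy as np

def fejer_polyak(f, subgrad, x0, f_star, eps_f=1e-6, gamma=1.0, max_iter=100000):
    """Subgradient process x_{k+1} = x_k - h_k g_k/||g_k|| with the Polyak step
    h_k = gamma (f(x_k) - f_star)/||g_k||; gamma = 1 gives the classical Fejer step."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        fx = f(x)
        if fx - f_star <= eps_f:           # stop: eps_f-solution by functional
            return x, fx, k
        g = subgrad(x)
        ng = np.linalg.norm(g)
        h = gamma * (fx - f_star) / ng     # Polyak step size
        x = x - h * g / ng                 # move along the normalized antisubgradient
    return x, f(x), max_iter
```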
The following theorem is valid.
Theorem 1
([9]). Let there exist a constant C > 0 for which ‖∂f(x_k)‖ ≤ C at all steps of the method (3). Then
min_{i=0,1,…,k} f(x_i) − f^* ≤ C‖x_0 − x^*‖/√(k + 1)
for any k ≥ 0. Therefore, the method (3) allows one to find the ε_f-solution of the problem (1) in no more than K = [(C‖x_0 − x^*‖/ε_f)^2] + 1 steps, where [a] is the integer part of the number a.
The proof of this theorem is based on the fact that at each step of the method (3), the squared distance to the minimum point decreases by at least the square of the shift in the direction of the normalized subgradient, as well as on the fact that the step h_k is bounded from below by the value ε_f/C. We note that the Theorem 1 condition can be replaced by the requirement that ∂f(x) be bounded in a sphere of radius ‖x_0 − x^*‖ centered at the point x^*, x_0 ∈ R^n.
Despite its theoretical appeal, the method (3) has a bad reputation in practical applications, even for smooth convex functions, due to its slow convergence for ravine functions. In particular, when the level surfaces of a function f(x) are not very elongated, it works well even for large-sized problems (n ≈ 100). However, if the direction of the antisubgradient at a point is quite close to orthogonal to the direction toward the minimum, which is typical for ravine functions, then the method (3) is almost hopeless even for small problems (n ≈ 10). For nonsmooth convex functions, where ravineness is the rule rather than the exception, the method (3) is unreliable for problems of even smaller sizes (n = 2–5). In particular, when minimizing the two-variable function f(x_1, x_2) = |x_1| + k|x_2|, it converges with the rate of a geometric progression with denominator 1 − 1/k^2, which is close to one already for not very large values of k (k ≈ 10–100).
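This behavior can be observed directly with the fejer_polyak sketch above (an assumed setup, not an experiment from the paper):

```python
k = 10.0
f  = lambda x: abs(x[0]) + k * abs(x[1])
# one valid subgradient selection (at kink points the value 1 is taken)
sg = lambda x: np.array([1.0 if x[0] >= 0 else -1.0,
                         k * (1.0 if x[1] >= 0 else -1.0)])
x, fx, iters = fejer_polyak(f, sg, x0=[1.0, 1.0], f_star=0.0, eps_f=1e-6)
print(iters, fx)   # the iteration count grows rapidly as k increases
```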
To accelerate convergence, based on a piecewise linear approximation of a functional near the minimum, a more general method is proposed in [34]:
x_{k+1} = P_{Q_k}(x_k),   Q_k = {x : f(x_i) + (∂f(x_i), x − x_i) ≤ f^*, i ∈ I_k}.
Here, I_k is any subset of indices from {0, 1, …, k} that necessarily contains k, and P_{Q_k} is the projection operator onto Q_k. The main problem of the method shown in (5) is to specify a constructive way of forming the set I_k in order to limit the number of subgradients being stored. The orthogonal descent method [37] can be considered the most meaningful answer here. It is based on the idea of the successive immersion of a non-obtuse cone containing X^* into a rectangular cone, and it uses no more than (n + 1) vectors. However, it does not always give good results in terms of the accuracy of solving the problem (1) by functional. This is evidenced by the results of a number of numerical experiments with both the orthogonal descent method and its modification, presented in [38,39]. Despite the small size of the problems (n ≤ 10) and the fact that they are not strongly ravine, the accuracy of their solution by functional is often only ε_f ≈ 10^{−4}–10^{−2}.
Another Fejer-type method is proposed in [40]. It uses only two vectors, and to construct the next descent direction, it uses a linear combination of the subgradient direction and the previous step direction (like conjugate gradients) so that it forms a sharper angle with the direction to the minimum. For ravine functions, it provides a “smoother” movement along the ravine than the above methods using only two vectors, the behavior of which is “jumpy”. However, it is also inefficient for ravine functions, especially when the ravine is multidimensional.
To improve the situation for the method (3) in terms of the accuracy of solving the problem (1), one can use a linear transformation of space that aligns the structure of the level surfaces of the function in the transformed space.
Let Y = AX be the space of arguments transformed using the linear operator A. Here, A is a non-degenerate matrix of dimensions n × n. Let B be the operator inverse to A: B = A^{−1}. Then, the classical Fejer method with the space transformation, written in the space X, is as follows:
x_{k+1} = x_k − h_k B B^T∂f(x_k)/‖B^T∂f(x_k)‖,   h_k = (f(x_k) − f^*)/‖B^T∂f(x_k)‖,
which corresponds to the classical Fejer step in the direction of the antisubgradient in the transformed argument space Y:
y_{k+1} = y_k − h_k ∂φ(y_k)/‖∂φ(y_k)‖,   h_k = (φ(y_k) − φ^*)/‖∂φ(y_k)‖.
Here, y = Ax are the images of points from X in the transformed space Y, and φ(y) is a convex function defined in the transformed space Y: φ(y) = f(A^{−1}y) = f(x); ∂φ(y_k) = B^T∂f(x_k) is the subgradient of the function φ(y) at the point y_k = Ax_k.
Let Y_k = A_kX be the current argument space. Then, in terms of improving the structure of the level surfaces of φ_{k+1}(y) in the next transformed space of arguments Y_{k+1} = T_{k+1}Y_k, it is natural to expect more efficient work from methods of type (6) than from the method (3). Here, T_{k+1} is a non-degenerate matrix of size n × n which specifies the transformation from Y_k to Y_{k+1}.
The main relations that are needed in order to implement such a process are the recalculation of the matrix B_{k+1} according to the relation
B_{k+1} = B_k T_{k+1}^{−1}
and the recalculation of the subgradients of φ_{k+1}(y) defined in Y_{k+1} according to the relation
∂φ_{k+1}(y) = (T_{k+1}^{−1})^T ∂φ_k(y).
Further, we will often use various identities that follow from the chain of equalities
A_{k+1} = B_{k+1}^{−1} = (B_k T_{k+1}^{−1})^{−1} = T_{k+1} B_k^{−1} = T_{k+1} A_k.
Taking into account
‖B^T∂f(x_k)‖^2 = (B^T∂f(x_k), B^T∂f(x_k)) = (∂f(x_k), BB^T∂f(x_k)),
the process (6) can be written in the form
x_{k+1} = x_k − [(f(x_k) − f^*)/(∂f(x_k), H∂f(x_k))] H∂f(x_k),
where H = BB^T is a positive definite symmetric matrix of dimension n × n. Depending on which of the relations is taken as a basis, there are two means of practical implementation of the methods. When referring to them, we will stick to the names that have been fixed for them for the subgradient-type methods with space dilation. The method based on (6) is usually called the method in the B form (the B method), and the method based on (9) is called the method in the H form (the H method). The advantage of the H method is that storing the symmetric matrix H requires almost half as much RAM (n × (n + 1)/2 cells) as the B method (n × n cells). However, the method in the B form allows for a simple interpretation of the process in the transformed space Y. Therefore, it is advisable to build methods in the B form and, as a consequence, to consider their analogues in the H form, mainly to save memory and calculations. Therefore, we will focus on methods based on the relation (6).
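The equivalence of the two forms is easy to check numerically. The following illustrative snippet (arbitrary test data, not from the paper) verifies that the B-form step and the H-form step with H = BB^T produce the same shift in X:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))            # non-degenerate with probability 1
g = rng.standard_normal(n)                 # plays the role of a subgradient
gap = 0.7                                  # plays the role of f(x_k) - f_star

Btg = B.T @ g
step_B = gap / np.linalg.norm(Btg) * (B @ Btg) / np.linalg.norm(Btg)   # B form

H = B @ B.T
step_H = gap / (g @ H @ g) * (H @ g)                                   # H form

print(np.allclose(step_B, step_H))         # True: both forms give the same shift
```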

3. One-Rank Ellipsoidal Operator

To solve the problem (1) in [27], two subgradient methods with a classical Fejer step are considered in a transformed argument space. They are based on volume reduction of the localization region of the extrema set and preserve the monotony of estimation of the distance to the optimum in the transformed space of arguments. The space transformation used in them is similar to that used in r-algorithms [23,24,25], and is reduced to two consecutive dilations of the argument space in orthogonal directions. The space transformation operator has the form
T_2(ξ, η) = R_{α_1}((ξ + η)/‖ξ + η‖) R_{α_2}((ξ − η)/‖ξ − η‖).
Here, ξ, η are normalized vectors from R^n such that (ξ, η)^2 < 1; R_α(d) = I + (α − 1)dd^T denotes Shor's operator of space dilation with coefficient α in the direction of a unit vector d; α_1 = 1/β_1, where β_1 = √(1 − (ξ, η)), and α_2 = 1/β_2, where β_2 = √(1 + (ξ, η)). Further, we will call the operator (10) a two-rank ellipsoidal operator.
The methods in [27] use trivial information about hyperplanes that cut off the extrema set. In particular, the first method uses only two subgradients at consecutive points obtained according to the classical Fejer step. The second method, in addition to two consecutive subgradients, also uses a third vector of the aggregate type, which is a convex combination of previously calculated subgradients. Despite modesty in the information used, computational experiments using these methods described in [27] for a number of both smooth and nonsmooth test problems turned out to be not so bad. Therefore, more appropriate strategies for using calculated subgradients can lead to more efficient methods based on the transformation (10).
However, despite the simple geometric interpretation, the two-rank ellipsoidal transformation is inconvenient because, for methods in the B form, when moving from the k-th to the (k+1)-th step, the calculation of the matrix B_{k+1} requires two one-rank corrections of a matrix of dimensions n × n, resulting in 4n^2 arithmetic operations. However, it turns out that there is a space transformation that allows for a one-rank correction of the B_{k+1} matrix and that basically does not differ from the two-rank ellipsoidal transformation. This is not surprising, since methods like (6) are defined by the matrix H_k = B_kB_k^T. Due to the ambiguity of the decomposition of the matrix H_k, the same method in the H form may correspond to different methods in the B form.
Let ξ, η ∈ R^n be vectors such that ‖ξ‖ = 1, ‖η‖ = 1, and their scalar product satisfies the condition (ξ, η)^2 < 1. A linear operator from R^n to R^n is called a one-rank ellipsoidal operator if it can be written in the following matrix form:
T_1(ξ, η) = I − (1/√(1 − (ξ, η)^2)) [ (1/√(1 − (ξ, η)^2) − 1) η − ((ξ, η)/√(1 − (ξ, η)^2)) ξ ] η^T.
Here, I is the identity matrix of size n × n. This operator has a number of interesting properties, which allow one to build and justify various variable metric methods on its basis. The following characteristics are valid for this matrix:
Lemma 1
([29]). For the operator T_1(ξ, η) given according to (11), if (ξ, η)^2 < 1, then there exists an inverse T_1^{−1}(ξ, η) and
T_1^{−1}(ξ, η) = I + [ (1/√(1 − (ξ, η)^2) − 1) η − ((ξ, η)/√(1 − (ξ, η)^2)) ξ ] η^T.
In addition, for T_1(ξ, η) and T_1^{−1}(ξ, η), the following relations are fulfilled:
T_1^T(ξ, η) T_1(ξ, η) = I + ((ξ, η)/(1 − (ξ, η)^2)) (ξη^T + ηξ^T),
T_1^{−1}(ξ, η) (T_1^{−1}(ξ, η))^T = I + ((ξ, η)^2/(1 − (ξ, η)^2)) (ξξ^T + ηη^T) − ((ξ, η)/(1 − (ξ, η)^2)) (ξη^T + ηξ^T).
A proof of Lemma 1 is not presented due to its bulkiness. We only note that the validity of (12)–(14) is easy to check by direct verification. The compact form of the expressions (13) and (14) is due to the fact that, when summing similar terms with the one-rank matrices ξξ^T, ξη^T, ηξ^T, and ηη^T, the terms containing √(1 − (ξ, η)^2) cancel.
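The reconstructed relations (11)–(14) can also be checked numerically. The sketch below (illustrative, not the authors' code) builds T_1 and T_1^{−1} for random unit vectors and verifies the identities:

```python
import numpy as np

def T1(xi, eta):
    """One-rank ellipsoidal operator T_1(xi, eta) = I - (1/s) u eta^T, s = sqrt(1-(xi,eta)^2)."""
    c = xi @ eta
    s = np.sqrt(1.0 - c * c)
    u = (1.0 / s - 1.0) * eta - (c / s) * xi
    return np.eye(len(xi)) - (1.0 / s) * np.outer(u, eta)

def T1_inv(xi, eta):
    """Its inverse, T_1^{-1}(xi, eta) = I + u eta^T."""
    c = xi @ eta
    s = np.sqrt(1.0 - c * c)
    u = (1.0 / s - 1.0) * eta - (c / s) * xi
    return np.eye(len(xi)) + np.outer(u, eta)

rng = np.random.default_rng(0)
n = 7
xi = rng.standard_normal(n);  xi /= np.linalg.norm(xi)
eta = rng.standard_normal(n); eta /= np.linalg.norm(eta)
c, I = xi @ eta, np.eye(n)

A, Ainv = T1(xi, eta), T1_inv(xi, eta)
rhs13 = I + c / (1 - c**2) * (np.outer(xi, eta) + np.outer(eta, xi))
rhs14 = (I + c**2 / (1 - c**2) * (np.outer(xi, xi) + np.outer(eta, eta))
           - c / (1 - c**2) * (np.outer(xi, eta) + np.outer(eta, xi)))

print(np.allclose(A @ Ainv, I))            # (12): Ainv is indeed the inverse of T_1
print(np.allclose(A.T @ A, rhs13))         # (13)
print(np.allclose(Ainv @ Ainv.T, rhs14))   # (14)
```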
Lemma 2
([29]). Let B_k be a non-degenerate matrix of dimensions n × n and p_1, p_2 be n-dimensional vectors such that (B_k^Tp_1/‖B_k^Tp_1‖, B_k^Tp_2/‖B_k^Tp_2‖)^2 < 1. Let
B_{k+1} = B_k T_1^{−1}(ξ, η),  where ξ = B_k^Tp_1/‖B_k^Tp_1‖ and η = B_k^Tp_2/‖B_k^Tp_2‖.
Then, the matrix B k + 1 is non-degenerate, and the following properties are fulfilled for it:
(a) det(B_{k+1}) = det(B_k)√(1 − (ξ, η)^2)   and   (b) (B_{k+1}^Tp_1, B_{k+1}^Tp_2) = 0.
To prove Lemma 2, expression (14) from Lemma 1 and some technical features are used.
The one-rank ellipsoidal operator (11) has a clear geometric interpretation (see Figure 1). It transforms the ellipsoid Ell(x_0, r) into a ball. The minimum volume ellipsoid Ell(x_0, r) is centered at x_0 and contains the convex set resulting from the intersection of the ball S(x_0, r) := {x ∈ R^n : ‖x − x_0‖ ≤ r} with two half-spaces
P(x_0, ξ) := {x ∈ R^n : (x − x_0)^Tξ ≤ 0},   P(x_0, η) := {x ∈ R^n : (x − x_0)^Tη ≤ 0},
where −1 < ξ^Tη < 0, ‖ξ‖ = 1, ‖η‖ = 1. Thus, the vector ξ remains unchanged under the transformation (11), while the vector η is transformed so that (ξ, η) = 0 (see Figure 1b).
One-rank ellipsoidal space transformation (11) is strongly related to two-rank ellipsoidal transformation (10). Specifically, for both of them, the same decrease in inverse matrix determinant is characteristic, which is equivalent to a volume decrease of the localization area of the extrema set under certain conditions. Both the first and the second orthogonalize vectors when transitioning to the transformed space of arguments. Further, we will see that both of these transformations lead to the same method in the H form. However, unlike the two-rank ellipsoidal transformation, the transformation (11) requires half as many arithmetic operations when recalculating the matrix B k , i.e., exactly as much as the space dilation in some direction. Therefore, its use leads to computationally more economical methods in the B form than the use of the two-rank ellipsoidal transformation (10).
The one-rank ellipsoidal operator (11) makes it simpler to justify methods for solving the problem (1) based on the volume reduction of the localization region of the extrema set and to improve the structure of the level surfaces of the function in the next transformed space. Let g_k = B_k^T∂f(x_k) and g_{k+1} = B_k^T∂f(x_{k+1}) be the subgradients of φ_k(y) in Y_k = A_kX at the points y_k = A_kx_k and y_{k+1} = A_kx_{k+1}. Here, y_{k+1} is obtained according to the classical Fejer step in the transformed argument space Y_k. If (g_k, g_{k+1}) ≥ 0, then the space transformation is not necessary. Let (g_k, g_{k+1}) < 0. Then, the space transformation Y_{k+1} = T_1(g_k/‖g_k‖, g_{k+1}/‖g_{k+1}‖)Y_k or Y_{k+1} = T_1(g_{k+1}/‖g_{k+1}‖, g_k/‖g_k‖)Y_k, according to Lemma 2, allows one to orthogonalize the images of the subgradients g_k and g_{k+1} in the transformed space of arguments Y_{k+1}, while providing better level surfaces for the function φ_{k+1}(y) in Y_{k+1}. These simple considerations lead to the following method for solving the problem (1).

3.1. Fejer-Type Method with One-Rank Ellipsoidal Space Transformation Using Two Successive Subgradients

Before starting calculations, we have x_0 ∈ R^n; B_0 = I is the identity matrix of dimensions n × n; ε_f is the accuracy by functional with which the problem (1) should be solved; f(x_0) and g(x_0) = ∂f(x_0) are the function and subgradient values calculated at x_0. Then, if f(x_0) − f^* ≤ ε_f, then x_0 is the sought point and STOP.
Let x_k ∈ R^n, f(x_k), g(x_k) = ∂f(x_k), and an n × n matrix B_k be obtained at the iteration k.
Step 1. Calculate the next approximation
x_{k+1} = x_k − [(f(x_k) − f^*)/‖B_k^Tg(x_k)‖] B_kB_k^Tg(x_k)/‖B_k^Tg(x_k)‖.
Step 2. Calculate f(x_{k+1}) and g(x_{k+1}) = ∂f(x_{k+1}). If f(x_{k+1}) − f^* ≤ ε_f, then x_{k+1} is the sought point and STOP. Otherwise, proceed to Step 3.
Step 3. Set
ξ_1 = B_k^Tg(x_k)/‖B_k^Tg(x_k)‖,   ξ_2 = B_k^Tg(x_{k+1})/‖B_k^Tg(x_{k+1})‖.
Step 4. If (ξ_1, ξ_2) ≥ 0, set B_{k+1} = B_k and go to Step 5. Otherwise, calculate
B_{k+1} = B_k T_1^{−1}(ξ_1, ξ_2)   or   B_{k+1} = B_k T_1^{−1}(ξ_2, ξ_1).
Step 5. Go to the iteration ( k + 1 ) with x k + 1 , B k + 1 , g ( x k + 1 ) , f ( x k + 1 ) .
The following theorem holds.
Theorem 2
([29]). The sequence {x_{k+1}}_{k=0}^∞, which is generated by the method (16)–(18), satisfies the inequalities
‖A_{k+1}(x_{k+1} − x^*)‖^2 ≤ ‖A_k(x_k − x^*)‖^2 − (f(x_k) − f^*)^2/‖B_k^Tg(x_k)‖^2.
Here, A_k = B_k^{−1}, A_{k+1} = B_{k+1}^{−1}, k = 0, 1, 2, ….
The proof of Theorem 2 is rather bulky, and it uses statements from Lemma 1. For the method (16)–(18), Theorem 2 provides the possibility of substantiating its convergence to the solution of the problem (1) by the reduction of the volume of the ellipsoid that localizes x^*. At the same time, the volume reduction is greater when the classical Fejer step in the transformed space is larger and the angle between successive subgradients is more obtuse. However, both of these circumstances depend on the specific properties of f(x), and it is impossible to obtain any estimates for a general convex f(x). Therefore, let us focus on the following variant of proving the convergence of the method (16)–(18).
Theorem 3
([29]). Let the conditions ‖B_k‖ ≤ c_1 and ‖∂f(x_k)‖ ≤ c_2 hold at each step of the method (16)–(18). Here, ‖B_k‖ is the Euclidean norm of the matrix B_k, i.e., ‖B_k‖ = (Σ_{i=1}^n Σ_{j=1}^n |b_{ij}^k|^2)^{1/2}. Then, the method (16)–(18) solves the problem (1) with an accuracy ε_f by functional in no more than K steps, where K = [(c_1c_2‖x_0 − x^*‖/ε_f)^2] + 1. Here, [a] is the integer part of the number a.
The proof of Theorem 3 is based on Theorem 1 and on an auxiliary argument concerning the assumptions about the method (16)–(18).
Therefore, if the conditions of Theorem 3 are fulfilled, then the method (16)–(18) converges from any initial approximation to the ε_f-solution of the problem (1) in a finite number of iterations. However, the estimate of the maximum number of iterations required to achieve the accuracy ε_f is very rough. In real work using the method, it should be expected that for ravine functions the stopping criterion will be triggered under the condition
(f(x_k) − f^*)/‖B_k^Tg(x_k)‖ ≤ √(‖x_0 − x^*‖^2 − Σ_{i=0}^{k−1} (f(x_i) − f^*)^2/‖B_i^Tg(x_i)‖^2),
since the value of ‖B_k^Tg(x_k)‖ will be rather small for large k due to the fact that det(B_k) goes to zero as k → ∞. Therefore, it is advisable to strengthen the space transformation in order to guarantee a stronger convergence of det(B_k) to zero as k → ∞. In particular, a variant of the method as in [27] is suitable for this, which uses an aggregate vector, a convex combination of previously calculated subgradients, to enhance the transformation. In this case, the method becomes a little more complicated, and, relative to the method (16)–(18), it is characterized by a stronger alignment of the level surfaces of the function in the transformed space, especially in the case of multidimensional ravines. Let us focus on computational schemes of these two methods, which are convenient for implementation, although it should be noted that the transformation (11) allows for justifying a whole family of Fejer-type methods. We will discuss some of them in more detail later.
Practical implementation of the methods. Despite the ease of proof, the method (16)–(18) is inconvenient to implement on a computer. In particular, when the space transformation operation is implemented, it requires calculation of the image of the subgradient ∂f(x_{k+1}) both in the space Y_k (as B_k^T∂f(x_{k+1})) and in the space Y_{k+1} (as B_{k+1}^T∂f(x_{k+1})). Calculation of each of them requires n^2 arithmetic operations of multiplication and the same number of additions, which results in a total of 4n^2 arithmetic operations. This can be avoided if the subgradient, when moving to the next transformed argument space, is recalculated according to the relation (8). Moreover, the methods considered use the antisubgradient in the transformed space of arguments as the direction of movement. The following lemma allows all of this to be achieved for the transformation (11).
Lemma 3
([29]). Let B_k be a non-degenerate matrix of size n × n and p_k, p_{k+1} be n-dimensional vectors such that (B_k^Tp_k/‖B_k^Tp_k‖, B_k^Tp_{k+1}/‖B_k^Tp_{k+1}‖)^2 < 1. Let B_{k+1} = B_k T_1^{−1}(ξ_k, ξ_{k+1}), where ξ_k = B_k^Tp_k/‖B_k^Tp_k‖, ξ_{k+1} = B_k^Tp_{k+1}/‖B_k^Tp_{k+1}‖. Then,
B_{k+1}^Tp_{k+1}/‖B_{k+1}^Tp_{k+1}‖ = ξ_{k+1},
B_{k+1}^Tp_k/‖B_{k+1}^Tp_k‖ = (1/√(1 − (ξ_k, ξ_{k+1})^2)) (ξ_k − (ξ_k, ξ_{k+1})ξ_{k+1}),
‖B_{k+1}^Tp_i‖ = √(1 − (ξ_k, ξ_{k+1})^2) ‖B_k^Tp_i‖,   i = k, k + 1.
For the vectors B_k^Tp_k and B_k^Tp_{k+1} in the transformed space Y_k = A_kX, the expressions (21) and (22) of Lemma 3 ensure the recalculation of both the normalized directions and their norms at the transition to the next transformed space of arguments obtained as a result of applying the operator T_1(B_k^Tp_k/‖B_k^Tp_k‖, B_k^Tp_{k+1}/‖B_k^Tp_{k+1}‖). However, for the B form of the methods, it does not matter which of the transformations—T_1(B_k^Tp_k/‖B_k^Tp_k‖, B_k^Tp_{k+1}/‖B_k^Tp_{k+1}‖) or T_1(B_k^Tp_{k+1}/‖B_k^Tp_{k+1}‖, B_k^Tp_k/‖B_k^Tp_k‖)—is chosen. The choice of the first one is more rational, since it ensures the recalculation of the subgradients, which allows one not to accumulate errors in the normalized directions for making a step in the next transformed space of arguments.
For the implementation of the method, recalculation of the classic Fejer step multiplier in the direction of the normalized subgradient is also required during the transition to the next transformed space of arguments. This does not lead to any problems, since their recalculation is associated with changing the norms of subgradients and, therefore,
h_{k+1} = h_k ‖B_k^Tg_k‖/‖B_{k+1}^Tg_k‖.
Here, h k + 1 and h k are classical Fejer steps in the direction of the normalized subgradient in the spaces Y k + 1 = A k + 1 X and Y k = A k X .
Therefore, Lemma 3 allows one to avoid unnecessary calculations for the B form of the method (16)–(18), and for the problem (1), it takes on a more rational form in terms of computation.
Before starting calculations, we have ε_f > 0, x_0 ∈ R^n. Then, if f(x_0) − f^* ≤ ε_f, then x_0 is the sought point and STOP. Otherwise, we set h_0 = (f(x_0) − f^*)/‖∂f(x_0)‖, ξ_0 = ∂f(x_0)/‖∂f(x_0)‖ ∈ R^n; B_0 = I is the identity matrix of dimension n × n.
Let x_k ∈ R^n, h_k, ξ_k ∈ R^n, and an n × n matrix B_k be obtained at the iteration k.
Step 1. Calculate the next approximation
x_{k+1} = x_k − h_k B_k ξ_k.
Step 2. Calculate f(x_{k+1}), ∂f(x_{k+1}). If f(x_{k+1}) − f^* ≤ ε_f, then x_{k+1} is the sought point and STOP. Otherwise, set
ξ_{k+1} = B_k^T∂f(x_{k+1})/‖B_k^T∂f(x_{k+1})‖,   h_{k+1} = (f(x_{k+1}) − f^*)/‖B_k^T∂f(x_{k+1})‖.
Step 3. If (ξ_k, ξ_{k+1}) ≥ 0, then B_{k+1} = B_k and go to Step 4. Otherwise, calculate
η = (1/√(1 − (ξ_k, ξ_{k+1})^2) − 1) ξ_{k+1} − ((ξ_k, ξ_{k+1})/√(1 − (ξ_k, ξ_{k+1})^2)) ξ_k,
B_{k+1} = B_k(I + ηξ_{k+1}^T),   h_{k+1} := h_{k+1}/√(1 − (ξ_k, ξ_{k+1})^2).
Step 4. Go to the iteration ( k + 1 ) with x k + 1 , B k + 1 , ξ k + 1 , h k + 1 .
Therefore, the method (23)–(26) is identical to the method (16)–(18), and Theorems 2 and 3 hold for it. However, it is computationally more rational than the method (16)–(18), and, further, we will choose it for the numerical experiments.
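A minimal Python sketch of the method (23)–(26) is given below. It assumes that f_star is known and that subgrad(x) returns an arbitrary subgradient of f at x; the names are illustrative and are not taken from the authors' software.

```python
import numpy as np

def fejer_one_rank(f, subgrad, x0, f_star, eps_f=1e-8, max_iter=10000):
    """Fejer-type method with the one-rank ellipsoidal space transformation
    using two successive subgradients (Steps 1-4 of the method (23)-(26))."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    B = np.eye(n)
    fx, g = f(x), subgrad(x)
    if fx - f_star <= eps_f:
        return x, fx, 0
    xi = g / np.linalg.norm(g)
    h = (fx - f_star) / np.linalg.norm(g)
    for k in range(max_iter):
        x = x - h * (B @ xi)                    # Step 1: Fejer step in the transformed space
        fx, g = f(x), subgrad(x)                # Step 2
        if fx - f_star <= eps_f:
            return x, fx, k + 1
        Btg = B.T @ g
        xi_new = Btg / np.linalg.norm(Btg)
        h = (fx - f_star) / np.linalg.norm(Btg)
        c = xi @ xi_new
        if c < 0.0:                             # Step 3: transform only for obtuse angles
            s = np.sqrt(1.0 - c * c)
            eta = (1.0 / s - 1.0) * xi_new - (c / s) * xi
            B = B @ (np.eye(n) + np.outer(eta, xi_new))   # one-rank correction of B
            h = h / s
        xi = xi_new                             # Step 4
    return x, f(x), max_iter
```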

3.2. A Fejer-Type Method with One-Rank Ellipsoidal Space Transformation Using Two Consecutive Subgradients and an Aggregate-Type Vector

Analogously to [27], it is easy to construct a method that uses two consecutive subgradients and some aggregate vector, which allows us to strengthen the space transformation and ensure faster convergence of the method in the sense of (20). Using Lemmas 2 and 3 given above, as well as Lemma 3 from [27], we obtain the following method for solving the problem (1).
Before starting calculations, we have ε_f > 0, x_0 ∈ R^n. If f(x_0) − f^* ≤ ε_f, then x_0 is the sought point and STOP. Otherwise, we set h_0 = (f(x_0) − f^*)/‖∂f(x_0)‖, ξ_0 = ∂f(x_0)/‖∂f(x_0)‖ ∈ R^n, p_0 = 0 ∈ R^n; B_0 = I is the identity matrix of dimension n × n.
Let x_k ∈ R^n, h_k, ξ_k ∈ R^n, p_k ∈ R^n, and an n × n matrix B_k be obtained at the iteration k.
Step 1. Calculate the next approximation
x_{k+1} = x_k − h_k B_k ξ_k.
Step 2. Calculate f(x_{k+1}), ∂f(x_{k+1}). If f(x_{k+1}) − f^* ≤ ε_f, then x_{k+1} is the sought point and STOP. Otherwise, set
ξ_{k+1} = B_k^T∂f(x_{k+1})/‖B_k^T∂f(x_{k+1})‖,   h_{k+1} = (f(x_{k+1}) − f^*)/‖B_k^T∂f(x_{k+1})‖.
Step 3. Calculate
λ_1 = −(p_k, ξ_{k+1})/√((p_k, ξ_{k+1})^2 + (ξ_k, ξ_{k+1})^2),   λ_2 = −(ξ_k, ξ_{k+1})/√((p_k, ξ_{k+1})^2 + (ξ_k, ξ_{k+1})^2)
and set
p_{k+1} = λ_1 p_k + λ_2 ξ_k, if λ_1 > 0 and λ_2 > 0;   p_{k+1} = p_k, if λ_1 > 0 and λ_2 ≤ 0;   p_{k+1} = ξ_k, if λ_1 ≤ 0 and λ_2 > 0;   p_{k+1} = 0, if λ_1 ≤ 0 and λ_2 ≤ 0.
Step 4. If (p_{k+1}, ξ_{k+1}) ≥ 0, then B_{k+1} = B_k and go to Step 5. Otherwise, calculate
η = (1/√(1 − (p_{k+1}, ξ_{k+1})^2) − 1) ξ_{k+1} − ((p_{k+1}, ξ_{k+1})/√(1 − (p_{k+1}, ξ_{k+1})^2)) p_{k+1},
B_{k+1} = B_k(I + ηξ_{k+1}^T),   h_{k+1} := h_{k+1}/√(1 − (p_{k+1}, ξ_{k+1})^2),
p_{k+1} := (1/√(1 − (p_{k+1}, ξ_{k+1})^2)) (p_{k+1} − (p_{k+1}, ξ_{k+1})ξ_{k+1}).
Step 5. Go to the iteration ( k + 1 ) with x k + 1 , B k + 1 , ξ k + 1 , h k + 1 , p k + 1 .
The method (27)–(32) is characterized by the use of a minimum of information about the function behavior while preserving the classic Fejer step in the direction of the last calculated subgradient. In particular, in addition to the last two subgradients, it also uses an aggregate-type vector p k , which is a convex combination of previously calculated subgradients, and sets a hyperplane that cuts off the set of extrema in transformed argument space from the image of the point x k . In this case, the problem of updating the p k + 1 vector is solved automatically by analyzing the angles between the last subgradients and the current p k .
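The following Python sketch (same assumptions as before; the sign convention for λ_1, λ_2 follows the reconstruction above) illustrates Steps 1–5 of the method (27)–(32):

```python
import numpy as np

def fejer_one_rank_aggregate(f, subgrad, x0, f_star, eps_f=1e-8, max_iter=10000):
    """Fejer-type method with one-rank ellipsoidal transformation using two
    consecutive subgradients and an aggregate-type vector p_k."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    B = np.eye(n)
    fx, g = f(x), subgrad(x)
    if fx - f_star <= eps_f:
        return x, fx, 0
    xi = g / np.linalg.norm(g)
    h = (fx - f_star) / np.linalg.norm(g)
    p = np.zeros(n)                                  # aggregate vector p_0 = 0
    for k in range(max_iter):
        x = x - h * (B @ xi)                         # Step 1
        fx, g = f(x), subgrad(x)                     # Step 2
        if fx - f_star <= eps_f:
            return x, fx, k + 1
        Btg = B.T @ g
        xi_new = Btg / np.linalg.norm(Btg)
        h = (fx - f_star) / np.linalg.norm(Btg)
        # Step 3: update the aggregate vector from the angles with the new subgradient image
        a, b = p @ xi_new, xi @ xi_new
        d = np.sqrt(a * a + b * b)
        lam1, lam2 = ((-a / d, -b / d) if d > 0 else (0.0, 0.0))
        if lam1 > 0 and lam2 > 0:
            p = lam1 * p + lam2 * xi
        elif lam1 > 0:
            pass                                     # keep p
        elif lam2 > 0:
            p = xi.copy()
        else:
            p = np.zeros(n)
        # Step 4: one-rank transformation if the aggregate still cuts off the optimum
        c = p @ xi_new
        if c < 0.0:
            s = np.sqrt(1.0 - c * c)
            eta = (1.0 / s - 1.0) * xi_new - (c / s) * p
            B = B @ (np.eye(n) + np.outer(eta, xi_new))
            h = h / s
            p = (p - c * xi_new) / s                 # re-orthogonalize and renormalize p
        xi = xi_new                                  # Step 5
    return x, f(x), max_iter
```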
For the method (27)–(32), as well as for the method (23)–(26), analogues of Theorems 2 and 3 hold. In particular, the following theorem is valid:
Theorem 4
([29]). Let the inequalities ‖B_k‖ ≤ c_1 and ‖∂f(x_k)‖ ≤ c_2 hold at each step of the method (27)–(32). Then, the method (27)–(32) solves the problem (1) with an accuracy ε_f by functional in no more than K steps, where
K = [(c_1c_2‖x_0 − x^*‖/ε_f)^2] + 1.
This theorem guarantees the finite convergence of the method to the ε_f-solution of the problem (1) from an arbitrary initial approximation. For ravine functions, compared to the method (23)–(26), it provides a faster tendency of the determinant of the matrix B_k to zero as k → ∞. Therefore, more effective practical work in the sense of (20) should be expected from it than from the method (23)–(26).
H form of the given methods. The methods considered above, which use the one-rank ellipsoidal transformation of space, are close to the methods based on the two-rank ellipsoidal transformation in [27]. In particular, the first of them corresponds to the method (28)–(33) and the second to the method (37)–(44) from [27]. Let all these methods start from the same starting point. Then, the B forms of the methods (23)–(26) and (27)–(32) and the B forms of their corresponding analogues from [27] are different in the sense that they generate different sequences of matrices B_k. At the same time, the directions of movement in the transformed spaces are also different. However, the sequences of points {x_k}_{k=0}^∞ generated by the methods (23)–(26) and (27)–(32) coincide with the corresponding sequences of their analogues from [27]. The fact is that the H forms of these methods coincide, that is, the sequences of matrices {H_k}_{k=0}^∞ are the same for them. In fact, the H form of the above algorithms is determined by the expression (14) of Lemma 1 for T_1^{−1}(ξ, η)(T_1^{−1}(ξ, η))^T. A similar product for T_2(ξ, η) has the form
T_2^{−1}(ξ, η)(T_2^{−1}(ξ, η))^T = R_{β_1}((ξ + η)/‖ξ + η‖) R_{β_2}((ξ − η)/‖ξ − η‖) R_{β_1}^T((ξ + η)/‖ξ + η‖) R_{β_2}^T((ξ − η)/‖ξ − η‖) = R_{β_1^2}((ξ + η)/‖ξ + η‖) R_{β_2^2}((ξ − η)/‖ξ − η‖) = [I + (β_1^2 − 1)(ξ + η)(ξ + η)^T/‖ξ + η‖^2][I + (β_2^2 − 1)(ξ − η)(ξ − η)^T/‖ξ − η‖^2] = I + (β_1^2 − 1)(ξ + η)(ξ + η)^T/‖ξ + η‖^2 + (β_2^2 − 1)(ξ − η)(ξ − η)^T/‖ξ − η‖^2 = I − ((ξ, η)/(2(1 + (ξ, η))))(ξ + η)(ξ + η)^T + ((ξ, η)/(2(1 − (ξ, η))))(ξ − η)(ξ − η)^T = I + ((ξ, η)^2/(1 − (ξ, η)^2))(ξξ^T + ηη^T) − ((ξ, η)/(1 − (ξ, η)^2))(ξη^T + ηξ^T)
(here, the dilation directions (ξ + η)/‖ξ + η‖ and (ξ − η)/‖ξ − η‖ are orthogonal, so the corresponding dilation operators commute), and this precisely corresponds to the expression (14) for T_1^{−1}(ξ, η)(T_1^{−1}(ξ, η))^T.
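This coincidence can also be checked numerically. The illustrative snippet below (random unit vectors, not the paper's code) compares T_1^{−1}(T_1^{−1})^T with T_2^{−1}(T_2^{−1})^T built from two Shor dilations:

```python
import numpy as np

def dilation(d, alpha):
    """Shor's space dilation R_alpha(d) = I + (alpha - 1) d d^T for a unit vector d."""
    d = d / np.linalg.norm(d)
    return np.eye(len(d)) + (alpha - 1.0) * np.outer(d, d)

def T1_inv(xi, eta):
    c = xi @ eta
    s = np.sqrt(1.0 - c * c)
    u = (1.0 / s - 1.0) * eta - (c / s) * xi
    return np.eye(len(xi)) + np.outer(u, eta)

def T2_inv(xi, eta):
    c = xi @ eta
    b1, b2 = np.sqrt(1.0 - c), np.sqrt(1.0 + c)      # inverse dilation coefficients
    return dilation(xi + eta, b1) @ dilation(xi - eta, b2)

rng = np.random.default_rng(1)
n = 5
xi = rng.standard_normal(n);  xi /= np.linalg.norm(xi)
eta = rng.standard_normal(n); eta /= np.linalg.norm(eta)

H1 = T1_inv(xi, eta) @ T1_inv(xi, eta).T
H2 = T2_inv(xi, eta) @ T2_inv(xi, eta).T
print(np.allclose(H1, H2))   # True: both transformations define the same H form
```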
Direct use of the relation (14) for the recalculation of the matrix H_{k+1} requires four one-rank corrections of n × n matrices. However, by appropriately grouping the one-rank terms, a two-rank correction is enough to recalculate H_{k+1}. In particular, for the H form corresponding to the method (23)–(26), the formulas for the recalculation of the matrix H_{k+1} take one of the forms
H_{k+1} = H_k − H_kg_{k+1}g_{k+1}^TH_k/(g_{k+1}, H_kg_{k+1}) + H_kpp^TH_k/(p, H_kp),   p = ((g_k, H_kg_{k+1})/(g_k, H_kg_k)) g_k − g_{k+1},
or
H_{k+1} = H_k − H_kg_kg_k^TH_k/(g_k, H_kg_k) + H_kpp^TH_k/(p, H_kp),   p = ((g_k, H_kg_{k+1})/(g_{k+1}, H_kg_{k+1})) g_{k+1} − g_k.
Here, g_k = ∂f(x_k) and g_{k+1} = ∂f(x_{k+1}) are the subgradients of f(x) at successive points. For the H form corresponding to the method (27)–(32), the formulas for recalculating the matrices H_{k+1} will be a little more complicated.
In addition to saving the memory needed to store the matrix, the use of the H form also leads to methods that are more economical in terms of computation than the above-mentioned B methods. In particular, the two-rank correction of the matrix H_{k+1}, taking into account its symmetry, requires no more operations than the one-rank correction of the matrix B_{k+1}, and instead of the two matrix–vector multiplications required by the methods (23)–(26) and (27)–(32), one is sufficient for the H form. It is quite possible that the means of justifying H methods can be no less good than those for B methods. However, these questions require a special study, which will not be conducted here. We only note that the formulas (33) and (34) for the H_{k+1} recalculation are very similar to the recalculation of similar matrices in quasi-Newton methods [8].
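The correctness of the reconstructed update (33) can be sanity-checked against the one-rank B update; the snippet below (arbitrary test data, illustrative only) verifies that both produce the same H_{k+1} = B_{k+1}B_{k+1}^T:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n))
gk, gk1 = rng.standard_normal(n), rng.standard_normal(n)
H = B @ B.T

# one-rank B update of the method (23)-(26) for the pair of subgradients gk, gk1
a, b = B.T @ gk, B.T @ gk1
xi, eta = a / np.linalg.norm(a), b / np.linalg.norm(b)
c = xi @ eta
s = np.sqrt(1.0 - c * c)
u = (1.0 / s - 1.0) * eta - (c / s) * xi
B_new = B @ (np.eye(n) + np.outer(u, eta))

# two-rank H update, first form of (33)
p = (gk @ H @ gk1) / (gk @ H @ gk) * gk - gk1
H_new = (H - np.outer(H @ gk1, H @ gk1) / (gk1 @ H @ gk1)
           + np.outer(H @ p, H @ p) / (p @ H @ p))

print(np.allclose(B_new @ B_new.T, H_new))   # True
```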

3.3. Numerical Experiments

Since the methods (23)–(26) and (27)–(32) are identical to the methods (28)–(33) and (37)–(44) in [27], there is no particular need for a numerical study of their behavior depending on size of the problems. It can be assumed that the results given in [27] (Table 1 and Table 2) are valid for them. However, some specification is needed here. The fact is that the set of test problems in [27] is characterized by strong ravineness. Given that different B methods correspond to different ways of accumulating errors in the B k matrix, the numerical calculations may not completely match, that is, slight discrepancies are possible.
Therefore, in the first series of experiments, we will limit ourselves to problems of small sizes n = 5–10 and check the numerical stability of the above methods in relation to the accuracy of solving problems by functional. As nonsmooth test problems, we will choose the two most frequently mentioned examples of piecewise-quadratic functions:
Shor (n = 5, f^* = 22.6001620958)   and   Maxquad (n = 10, f^* = −0.841408334596),
which are considered poor problems for subgradient methods. Such a scrupulous indication of f^* for the Shor and Maxquad problems is due to checking the methods' work at sufficiently small values of ε_f. As smooth test problems, we will choose the quadratic problems Quad(t) from [27] with different degrees of ravineness (Quad(10.) and Quad(3.)) and of different sizes (n = 5 and n = 10). Here, f^* = 0. All of these problems are characterized by a single minimum point, but for the nonsmooth problems it is determined with the accuracy to which f^* is specified. As ε_f, we take ε_0 and ε_0^2. Here, ε_0 = 10^{−5} for the nonsmooth problems and 10^{−10} for the quadratic problems.
The results of the methods' work are given in Table 1. Here, iter(ε) is the number of calculations of the function and its subgradient values necessary to achieve the accuracy ε by functional. The number of space transformations implemented in this case is given in parentheses. As can be seen from Table 1, the work of the methods (23)–(26) and (27)–(32) is not as hopeless as the work of the classic Fejer method without space transformation. This confirms to some extent that a space transformation aimed at aligning the level surfaces of the function can significantly increase the accuracy of solving problems by functional for Fejer-type methods.
The effort spent on solving these problems with the accuracy ε f = ε 0 2 by functional is often twice as much as that spent with the accuracy ε f = ε 0 . However, this gap narrows significantly as the problem size increases. This is evidenced by the second series of experiments with the method (27)–(32) for weakly ravine problems Q u a d ( t ) and S a b s ( t ) from [27]. The results are shown in Table 2. For the problems S a b s ( t ) , the space transformation is implemented at almost every step of the method (27)–(32). Therefore, only the total number of iterations is given for them in Table 2.
In Figure 2, the method (27)–(32) is compared with Shor's r-algorithm for seven unconstrained minimization problem instances with smooth and nonsmooth convex objective functions [28,41]. The number of variables varies from 10 to 50. Light grey columns in Figure 2 represent the number of iterations for the r-algorithm with ε_x = 10^{−6} and ε_g = 10^{−6}. Grey columns correspond to ε_f = 10^{−8} in the method (27)–(32), while dark grey columns represent the results for this method with ε_f = 10^{−12}. As can be seen from Figure 2, the method (27)–(32) outperforms the r-algorithm on almost all of the seven problems, the exception being N2.
Consider two more problems to demonstrate the efficiency of Fejer-type methods. The first problem is to minimize the ravine piecewise linear function f_1(x_1, x_2) = |x_1| + t|x_2|, t > 1, starting from different x_0 = (x_0^{(1)}, x_0^{(2)}). The method (27)–(32) finds the minimum point x^* = (0, 0) in no more than three iterations: 1. one iteration is required if |x_0^{(2)}| = t|x_0^{(1)}| (no space transformations); 2. two iterations if |x_0^{(2)}| < t|x_0^{(1)}| (one space transformation); 3. three iterations if |x_0^{(2)}| > t|x_0^{(1)}| (one space transformation). If |x_0^{(2)}| ≠ t|x_0^{(1)}|, then the method (2), the Fejer method with Polyak's step, converges with the rate of a geometric progression with denominator q(t) = 1 − 1/t^2 and requires a significant number of iterations if t is large. The convergence trajectories of the method (2) and the method (27)–(32) for the first problem with t = 10 are shown in Figure 3.
The second problem is to minimize the ravine piecewise quadratic function f_2(x_1, x_2) = max{x_1^2 + (2x_2 − 2)^2 − 3, x_1^2 + (x_2 + 1)^2}. The corresponding optimal solution is x^* = (0, 0) and f_2^* = 1. Starting from x_0 = (1, 1), the method (27)–(32) finds in
  • 16 iterations the point x_16, where f_2(x_16) − f_2^* ≤ 10^{−6};
  • 31 iterations the point x_31, where f_2(x_31) − f_2^* ≤ 10^{−10}.
For the same function f_2, the method (2) requires more than 10,000 iterations to obtain a solution with accuracy ε_f = 10^{−3} (see Figure 4).
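These two small examples can be reproduced approximately with the sketch of the method (27)–(32) given earlier (an assumed setup; at kink points one valid subgradient is selected):

```python
import numpy as np

t = 10.0
f1  = lambda x: abs(x[0]) + t * abs(x[1])
sg1 = lambda x: np.array([1.0 if x[0] >= 0 else -1.0,
                          t * (1.0 if x[1] >= 0 else -1.0)])

def f2(x):
    return max(x[0]**2 + (2*x[1] - 2)**2 - 3, x[0]**2 + (x[1] + 1)**2)

def sg2(x):
    if x[0]**2 + (2*x[1] - 2)**2 - 3 >= x[0]**2 + (x[1] + 1)**2:
        return np.array([2*x[0], 4*(2*x[1] - 2)])
    return np.array([2*x[0], 2*(x[1] + 1)])

# iteration counts comparable to those quoted above are to be expected
print(fejer_one_rank_aggregate(f1, sg1, [1.0, 1.0], f_star=0.0, eps_f=1e-10)[2])
print(fejer_one_rank_aggregate(f2, sg2, [1.0, 1.0], f_star=1.0, eps_f=1e-6)[2])
```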
So, the one-rank ellipsoidal transformation of space is quite a worthy replacement for the two-rank ellipsoidal transformation in the sense that B methods based on it require fewer arithmetic operations. As with the two-rank transformation, it allows one to build a variety of variable metric methods that have a simple geometric interpretation, and it provides a fairly convenient mechanism for justifying methods in the B form based on the external approximation of the extrema set by an ellipsoid in X that decreases monotonically in volume and is the image of a sphere in Y_k. This fact makes it possible to obtain quite satisfactory answers to a number of questions in convex programming.
In particular, Theorem 3 indicates the fundamental possibility of building a constructive stopping criterion for inconsistent systems of convex inequalities by analyzing the sums Σ_{k=0}^{K} h_k^2. If Σ_{k=0}^{K} h_k^2 > r^2 for some K, this means that the system of convex inequalities does not have a feasible point in a sphere of radius r centered at the point x_0. Thus, the above methods in a finite number of steps either find a feasible point of a convex inequality system or obtain a sufficient condition of its inconsistency.
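A sketch of this test (a hypothetical helper, assuming the Fejer steps h_k of the method are collected during the run) amounts to accumulating squared steps:

```python
def certify_infeasible(step_history, r):
    """Return True as soon as the sum of squared Fejer steps exceeds r**2, i.e.,
    the inequality system has no feasible point in the sphere of radius r around x_0."""
    total = 0.0
    for h in step_history:
        total += h * h
        if total > r * r:
            return True
    return False
```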

4. Orthogonalizing One-Rank Operator

The one-rank ellipsoidal operator does not allow for working easily with more than 2–3 vectors. Let K(g_1, g_2, …, g_m), m ≤ n, be an acute cone that localizes the extrema set. Here, g_1, g_2, …, g_m are linearly independent subgradients, and (g_i, g_j) < 0, i, j = 1, …, m, i ≠ j. For an effective transformation of the cone K(g_1, g_2, …, g_m) using the one-rank ellipsoidal operator in each of the transformed spaces, it is necessary to solve the problems of immersing this type of cone into simpler cones K(p_1, p_2) such that (p_1, p_2) < 0. These problems are not particularly difficult, and their solution requires a small computational cost. However, this can be avoided if at each step of the method we work with a simpler cone, for example, an orthogonal one. This possibility is easy to implement, and the linear non-orthogonal operator described below can be used for the space transformation.
Let there be a set of vectors from R n
P = {p_1, p_2, …, p_m},   ‖p_i‖ = 1,   i = 1, …, m,   m ≤ n,
for which the following conditions hold:
(p_i, p_j) = 0,   i ≠ j,   i = 1, 2, …, m − 1,   j = 1, 2, …, m − 1,
‖p_m − Σ_{i=1}^{m−1}(p_m, p_i)p_i‖^2 = 1 − Σ_{i=1}^{m−1}(p_m, p_i)^2 > 0.
That is, the vectors from P are linearly independent, the first (m − 1) vectors are mutually orthogonal, and p_m is not necessarily orthogonal to all previous ones.
We denote p = Σ_{i=1}^{m−1}(p_m, p_i)p_i. For the vectors (35) satisfying (36) and (37), the following properties are fulfilled:
(a) (p, p_m − p) = 0;   (b) (p_i, p_m − p) = 0, i = 1, …, m − 1;   (c) (p_m, p_m − p) = ‖p_m − p‖^2.
Further, we will use them rather often.
Let us consider a linear operator from R n to R n , which can be presented in matrix form as follows:
T_λ(p_m, p) = I + ((p_m − p)/‖p_m − p‖^2)((1/λ)p_m + p)^T,
where I is the identity matrix of dimension n × n and λ is a scalar such that λ(λ + 1) ≠ 0. We will discuss the selection of the λ parameter and its meaning a little later. The operator (39) will be called an orthogonalizing one-rank operator. The following statements hold for it.
Lemma 4
([30]). For the operator T_λ(p_m, p) with λ(λ + 1) ≠ 0, there exists an inverse T_λ^{−1}(p_m, p) and
T_λ^{−1}(p_m, p) = I − ((p_m − p)/‖p_m − p‖^2)((1/(λ + 1))p_m + (λ/(λ + 1))p)^T.
In addition,
T_λ^T(p_m, p) T_λ(p_m, p) = I + (1/‖p_m − p‖^2)[(2/λ + 1/λ^2)p_mp_m^T − pp^T + p_mp^T + pp_m^T].
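The reconstructed operator T_λ and its inverse from Lemma 4, as well as the invariance properties stated in Lemma 5 below, can be checked numerically. The snippet below (random orthonormal p_1, …, p_{m−1} and a random unit p_m, not the authors' code) does this for λ = −1/2:

```python
import numpy as np

def T_lambda(p_m, p, lam):
    d = p_m - p
    return np.eye(len(p_m)) + np.outer(d / (d @ d), p_m / lam + p)

def T_lambda_inv(p_m, p, lam):
    d = p_m - p
    return np.eye(len(p_m)) - np.outer(d / (d @ d), p_m / (lam + 1) + lam / (lam + 1) * p)

rng = np.random.default_rng(2)
n, m = 8, 4
P, _ = np.linalg.qr(rng.standard_normal((n, m - 1)))   # orthonormal columns p_1, ..., p_{m-1}
pm = rng.standard_normal(n); pm /= np.linalg.norm(pm)  # unit vector p_m
p = P @ (P.T @ pm)                                     # p = sum_i (p_m, p_i) p_i

lam = -0.5
T, Tinv = T_lambda(pm, p, lam), T_lambda_inv(pm, p, lam)
print(np.allclose(T @ Tinv, np.eye(n)))                # Lemma 4: Tinv inverts T_lambda
print(np.allclose(Tinv.T @ P, P))                      # p_1..p_{m-1} are left unchanged
print(np.allclose(Tinv.T @ pm, lam / (lam + 1) * (pm - p)))   # image of p_m
```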
Lemma 5
([30]). Let the vector set (35) satisfy (36) and (37), and let P̃ = {p̃_1, p̃_2, …, p̃_m} be the vector set obtained from (35) as p̃_i = (T_λ^{−1}(p_m, p))^Tp_i, i = 1, …, m, for λ(λ + 1) ≠ 0. Then, none of the vectors p̃_i is identically equal to the zero vector, (p̃_i, p̃_j) = 0, i ≠ j, i = 1, …, m, j = 1, …, m, and, in addition,
p̃_i = p_i,   i = 1, …, m − 1,   p̃_m = (λ/(λ + 1))(p_m − p).
The proofs of Lemmas 4 and 5 are mostly based on direct verification and using the properties (38).
Therefore, the operator (39) allows for a one-rank correction of the matrix B_{k+1} when moving to the next transformed space of arguments and, moreover, provides a rather convenient mechanism for working with a cone defined by a finite set of m ≤ n vectors. Its use makes it possible to build constructive variable metric methods with a fairly visual geometric interpretation in the transformed space of arguments.
1. If one were to choose subgradients as the vectors, one could obtain various methods, such as a conjugate gradient method for smooth functions. In particular, for the unconstrained minimization of positive definite quadratic functions, a simple procedure of the sequential accumulation of gradients with their orthogonalization leads to methods of this type that require no more than (n + 1) calculations of the function and gradient values. At the same time, by means of the step selection it is only necessary to guarantee the linear independence of the vectors y_0 − y_k, k = 1, …, m, m ≤ n, where y_0 = A_mx_0, y_k = A_mx_k, and, at the last step m of the method, when a linearly dependent gradient is obtained, to solve in the transformed space a simple system of linear equations with a diagonal matrix of dimension m × m in order to find the minimum point. Controlling the λ parameter in the operator (39) allows one to preserve the positive definiteness of the matrix H_k = B_kB_k^T and control its degree of degeneracy. This allows one to build methods like the conjugate gradient method for nonsmooth problems.
2. The use of the scalar (p_m, p)/‖p‖ provides a fairly convenient mechanism for refining the lower estimate of f^* in the transformed argument space. This circumstance makes it possible to provide a constructive stopping criterion in terms of the approximate fulfillment of the optimality conditions of convex programming problems. It can be used to construct variable metric methods using ε-subgradient-type procedures [42,43] but in a transformed argument space. Given that, at a certain value of the λ parameter, the external approximation of the extrema set in the transformed space by a sphere is preserved without increasing the radius, such procedures are also suitable for the implementation of the internal algorithm of the method of simple-body centroids [28].
3. Using the transformation (39), one can try to obtain computationally stable methods for solving systems of linear equations within the framework of the orthogonalization methods, which have proven to be numerically unstable [44,45]. At the same time, in addition to the transformation itself, there are a number of mechanisms both at the level of B form and H form for ensuring stable recalculation of the distances to hyperplanes in the transformed space of arguments.
4. The transformation (39) makes it easy to construct variable metric methods based on the localization of the extrema set by a cone in R n . We will focus on one of these variants of methods for solving the problem (1), which allows for a classical Fejer step in the direction of the antisubgradient, in this section. It is based on a simple geometric fact. Let y k be the vertex of the orthogonal cone K ( g 1 , , g m ) , which localizes the extrema set, and let g k be the subgradient calculated in it. Let K 1 ( g 1 , , g m 1 , g k ) be the next cone with the vertex at y k constructed taking into account this information. Here, g 1 , , g m 1 are the orthogonal subgradients of the cone K ( g 1 , , g m ) for which the inequality ( g i , g k ) < 0 is satisfied. By construction, this cone will contain an extrema set of the problem (1). The cone K 1 ( g 1 , , g m 1 , g k ) is non-obtuse, and its edges form acute angles. Its transformation into an orthogonal cone can naturally be used to improve the structure of the level surfaces of ravine functions.
The operator (39) and such a step-by-step cone construction strategy leads to a number of variable metric methods, which are based on the localization of the extrema set in transformed space by an orthogonal cone defined by previously calculated subgradients. At the same time, a solution of the problem of limiting the number of stored subgradients is automatically provided. Moreover, their screening is not a simple “forgetting”, but each of these vectors makes a certain contribution to space transformation, which is aimed at expanding the cone of possible directions of descent of the function.
For processes with a classical Fejer step in the direction of the antisubgradient and the described strategy of constructing a cone that localizes an extrema set, there are two possibilities for their implementation. The first is to primarily transform the space, and then, in the transformed space, to implement the classic Fejer step in the direction g ˜ k . Here, g ˜ k is the image of the subgradient g k in the transformed argument space. The second is to first make a classic Fejer step in the direction g k and then transform the argument space. In both cases, we will obtain localization extrema sets by an orthogonal cone in the transformed space of arguments. In the first case, for all of the previously calculated subgradients transferred to the current point, the maximum displacement along the convexity f ( x ) will be realized. In the second, each of the subgradients transferred to the current point, except for the last one, allows for a positive step in the direction of the antisubgradient so as not to disturb the localization of the extrema set by the orthogonal cone. So, in the second case, the method can be improved upon in the sense of the localization of the extrema set due to the step along the convex combination of orthogonal subgradients, not including the last one. This leads to a direction of movement different from the antisubgradient, and we will not consider this second option. Obviously, without a step along the convex combination of previous subgradients, it will lose to the first one.
Let us focus on the first variant and the described step-by-step strategy for building a cone. Such processes for solving the problem (1) will be called methods of orthogonal subgradient descent with a classical Fejer step, due to the fact that the movement from a point in the transformed space is determined by the subgradient at this point, which is orthogonal to the accumulated subgradients.

4.1. Orthogonal Subgradient Descent Method with a Classical Fejer Step (ORTGF)

Before proceeding to its description, we will define some additional parameters that have a purely practical meaning and specify the rule for building the next cone. In particular, in addition to the accuracy ε_f > 0 of solving the problem (1) by functional, we will use two more rather small positive scalars—ε_K and ε_R—as well as the parameter m_0 ≤ n − 1. Here, ε_K > 0 will determine the rule for screening subgradients, i.e., when moving to the (k + 1)-th step, we will keep only those orthogonal subgradients for which the condition (g_i/‖g_i‖, g_k/‖g_k‖) < −ε_K is fulfilled; ε_R > 0 will determine an additional screening of subgradients so as to avoid the accumulation of calculation errors in the orthogonalization process, that is, we will exclude those orthogonal subgradients for which |(g_i/‖g_i‖, g_j/‖g_j‖)| > ε_R. The m_0 parameter will set a limit on the maximum number of subgradients that are stored, and these will be the last computed subgradients. The set of stored subgradients at the step k will be denoted by P_k, and its size, that is, the number of accumulated subgradients, will be size(P_k). In particular, if P_k = ∅, then size(P_k) = 0. To work with P_k, we will use the union (∪) and difference (∖) operations.
To simplify the presentation of the material, let us impose some additional restrictions in the $ORTGF$ method. In particular, we will keep only normalized subgradients and, in addition, we will consider the parameter λ to be constant. It should be noted, however, that by using subgradient norms and a variable parameter $\lambda_k$, more sophisticated methods of orthogonal subgradient descent can be obtained. For example, by using the space dilation operator and the operator (39), it is possible to arrange the space transformation so that the direction of movement in $Y_k$ coincides with the shortest vector of the convex hull of the accumulated subgradients (see the formula below). This direction of movement is used in ε-subgradient methods and is considered to be a good replacement for the Newtonian direction.
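For reference, the direction meant here is the minimum-norm element of the convex hull of the accumulated (transformed) subgradients; in our notation (stated only to fix the idea, not taken from the method description),

$$z_k = \arg\min\Big\{ \|z\| \;:\; z \in \operatorname{conv}\{\, p_i : p_i \in P_k \,\} \Big\},$$

and the movement is performed along $-z_k$, exactly as in ε-steepest descent schemes.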
Given the above, O R T G F ( x 0 , ε f , λ , ε K , ε R , m 0 ) —the method of orthogonal subgradient descent with a classical Fejer step—takes the following form.
Before starting the calculations, we have $B_0 = I_n$, $P_0 = \emptyset$.
Let $x_k$, $B_k$, and $P_k$ be obtained at iteration k. Let us calculate $f(x_k)$ and a subgradient $f'(x_k)$. If $f(x_k) - f^* \le \varepsilon_f$, then $x_k$ is the sought point and STOP. Otherwise, we go to iteration $(k+1)$.
Step 1. Set $\xi_k = \dfrac{B_k^T f'(x_k)}{\|B_k^T f'(x_k)\|}$ and $h_k = \dfrac{f(x_k) - f^*}{\|B_k^T f'(x_k)\|}$.
Step 2. Form the set $\tilde P_k = \{\, p_i \in P_k : (p_i, \xi_k) < \varepsilon_K \,\}$, preserving the sequence order of $p_i$ in $P_k$.
Step 3. If $size(P_k) = 0$, then $B_{k+1} = B_k$, $\xi_{k+1} = \xi_k$, $h_{k+1} = h_k$, and we proceed to Step 4. Otherwise, calculate the vector $\tilde p_k = \sum_{p_i \in \tilde P_k} (p_i, \xi_k)\, p_i$ and recalculate the parameters
$$B_{k+1} = B_k T_\lambda^{-1}(\xi_k, \tilde p_k), \qquad \xi_{k+1} = \frac{B_{k+1}^T f'(x_k)}{\|B_{k+1}^T f'(x_k)\|}, \qquad h_{k+1} = \frac{f(x_k) - f^*}{\|B_{k+1}^T f'(x_k)\|}.$$
Computationally, this is equivalent to $B_{k+1} = B_k(I - \eta_1 \eta_2^T)$, where
$$\eta_1 = \frac{\xi_k - \tilde p_k}{\|\xi_k - \tilde p_k\|^2}, \qquad \eta_2 = \frac{1}{\lambda+1}\,\xi_k + \frac{\lambda}{\lambda+1}\,\tilde p_k, \qquad \xi_{k+1} = \frac{\frac{\lambda}{\lambda+1}(\xi_k - \tilde p_k)}{\left\|\frac{\lambda}{\lambda+1}(\xi_k - \tilde p_k)\right\|}, \qquad h_{k+1} = \frac{h_k}{\frac{\lambda}{\lambda+1}\|\xi_k - \tilde p_k\|}.$$
Step 4. Calculate the next approximation $x_{k+1} = x_k - h_{k+1} B_{k+1} \xi_{k+1}$.
Step 5. Form the next set
$$P_{k+1} = \{\, p_i \in \tilde P_k : |(p_i, \xi_{k+1})| < \varepsilon_R \,\} \cup \{\xi_{k+1}\},$$
preserving the sequence order of $p_i$ in $\tilde P_k$; $\xi_{k+1}$ will be the last vector in $P_{k+1}$. If $size(P_{k+1}) > m_0$, then $P_{k+1} = P_{k+1} \setminus \{p_1\}$, where $p_1$ is the first vector in $P_{k+1}$.
Step 6. Go to the iteration ( k + 1 ) with x k + 1 , B k + 1 , P k + 1 .
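To make the bookkeeping of Steps 1–6 concrete, below is a minimal NumPy sketch of how the λ = 1/2 variant (ORTGF(0.5)) can be organized under our reading of the method; it is an illustration, not the authors' reference implementation. The assumptions are ours: the one-rank transformation is applied in the explicit λ = 1/2 form $T_{1/2}(\xi_k,\tilde p_k) = I - (\xi_k - \tilde p_k)(2\xi_k - \tilde p_k)^T/\|\xi_k - \tilde p_k\|^2$ given later in this subsection, $\xi_{k+1}$ and $h_{k+1}$ are simply recomputed from $B_{k+1}$, the transformation is skipped when the screened set $\tilde P_k$ is empty or $\|\xi_k - \tilde p_k\|$ is negligible, and the ravine test function $f(x_1,x_2) = |x_1| + 10|x_2|$ with $f^* = 0$ is only a toy example.

```python
import numpy as np

def ortgf_half(f, subgrad, x0, f_star, eps_f=1e-8, eps_K=1e-4, eps_R=1e-8,
               m0=None, max_iter=1000):
    """Sketch of orthogonal subgradient descent with a classical Fejer
    (Polyak) step and the one-rank transformation for lambda = 1/2."""
    n = len(x0)
    m0 = n - 1 if m0 is None else m0
    x = np.asarray(x0, dtype=float)
    B = np.eye(n)          # B_k: accumulated space transformation
    P = []                 # stored orthonormal (transformed) subgradients
    for k in range(max_iter):
        fx, g = f(x), subgrad(x)
        if fx - f_star <= eps_f:
            return x, fx, k
        Bg = B.T @ g
        xi = Bg / np.linalg.norm(Bg)            # Step 1: normalized image
        P_t = [p for p in P if p @ xi < eps_K]  # Step 2: screening
        if P_t:                                  # Step 3: one-rank transform
            p_t = sum((p @ xi) * p for p in P_t)
            d = xi - p_t
            dd = d @ d
            if dd > 1e-14:
                # T_{1/2} = I - (xi - p_t)(2*xi - p_t)^T / ||xi - p_t||^2
                B = B @ (np.eye(n) - np.outer(d, 2.0 * xi - p_t) / dd)
                Bg = B.T @ g
                xi = Bg / np.linalg.norm(Bg)
        h = (fx - f_star) / np.linalg.norm(Bg)   # Polyak (Fejer) step size
        x = x - h * (B @ xi)                     # Step 4
        P = [p for p in P_t if abs(p @ xi) < eps_R] + [xi]  # Step 5
        if len(P) > m0:
            P = P[1:]                            # drop the oldest subgradient
    return x, f(x), max_iter

if __name__ == "__main__":
    # toy ravine function f(x1, x2) = |x1| + 10|x2|, minimum value 0
    f = lambda x: abs(x[0]) + 10.0 * abs(x[1])
    g = lambda x: np.array([np.sign(x[0]), 10.0 * np.sign(x[1])])
    x, fx, it = ortgf_half(f, g, x0=[3.0, 1.0], f_star=0.0)
    print(it, fx)
```

The stored set P plays exactly the role of $P_k$ above: its elements stay orthonormal because each new $\xi_{k+1}$ is, by construction, orthogonal to all vectors that survived the screening.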
Theorem 5
([30]). For the k-th step of the method O R T G F , the following inequality holds:
$$\big(A_k(x_k - x^*),\, \xi_k\big) \;\ge\; \frac{f(x_k) - f^*}{\|B_k^T f'(x_k)\|} \;\ge\; 0,$$
where $A_k = B_k^{-1}$. When $P_k \ne \emptyset$, then $\|p_i\| = 1$ for all $p_i \in P_k$ and $(p_i, p_j) = 0$ for $p_i, p_j \in P_k$ such that $i \ne j$; in addition, the following inequalities hold:
$$\big(A_k(x_k - x^*),\, p_i\big) \ge 0, \quad p_i \in P_k,$$
$$\big(A_k(x_k - x^*),\, \tilde p_k\big) \le 0.$$
To prove Theorem 5, the statements from Lemma 1 are used, as well as (44) and induction.
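For illustration, the first inequality already follows from the standard subgradient inequality (this is only the elementary part of the argument, not the full proof from [30]): since $f^* \ge f(x_k) + (f'(x_k), x^* - x_k)$ and $B_k A_k = I_n$,

$$\big(A_k(x_k - x^*),\, B_k^T f'(x_k)\big) = \big(x_k - x^*,\, f'(x_k)\big) \ge f(x_k) - f^* \ge 0,$$

and dividing by $\|B_k^T f'(x_k)\|$ gives the required estimate; the relations for the stored vectors $p_i$ and for $\tilde p_k$ are then established by induction over the steps of the method.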
Theorem 6
([30]). The sequence $\{x_{k+1}\}_{k=0}^{\infty}$ generated by the $ORTGF$ algorithm at $\lambda = 1/2$ satisfies the inequalities
$$\|A_{k+1}(x_{k+1} - x^*)\|^2 \;\le\; \|A_k(x_k - x^*)\|^2 \;-\; \frac{\big(f(x_k) - f^*\big)^2}{\|B_k^T f'(x_k)\|^2}.$$
Here, $A_k = B_k^{-1}$, $A_{k+1} = B_{k+1}^{-1}$, $k = 0, 1, 2, \ldots$
The proof of Theorem 6 is based on Theorem 5 and Lemma 4 statements.
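The role of the classical Fejer step in this estimate can be seen from a short computation (a sketch only; the two properties of the λ = 1/2 transformation it relies on—$\|A_{k+1}(x_k - x^*)\| \le \|A_k(x_k - x^*)\|$ and $h_{k+1} \ge h_k$—are taken for granted here, being exactly what Lemma 4 and Theorem 5 are invoked for). Writing $y = A_{k+1}x$, so that $y_k = A_{k+1}x_k$ and $y^* = A_{k+1}x^*$, Step 4 reads $y_{k+1} = y_k - h_{k+1}\xi_{k+1}$ with $\|\xi_{k+1}\| = 1$, and

$$\|y_{k+1} - y^*\|^2 = \|y_k - y^*\|^2 - 2h_{k+1}\big(y_k - y^*,\, \xi_{k+1}\big) + h_{k+1}^2 \le \|y_k - y^*\|^2 - h_{k+1}^2,$$

where the inequality uses $\big(A_{k+1}(x_k - x^*), \xi_{k+1}\big) \ge h_{k+1}$; combining this with the two properties above yields the inequality of Theorem 6.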
Theorem 6 means that choosing the parameter λ = 1/2 for the $ORTGF$ method allows one to preserve the external approximation of the extrema set by a sphere with decreasing radius when moving into the transformed space of arguments. Given that $|\det(B_k)| = 1$, this choice of the λ parameter allows one to substantiate the convergence of the $ORTGF$ method in the sense of the external localization of the extrema set by an ellipsoid with decreasing volume. However, the volume reduction here will be small, as it is provided only by the classic Fejer step from the center of the sphere. Nevertheless, the operator (39) at λ = 1/2 (unfortunately, it is the only such value) allows for improving the structure of the function's level surfaces without increasing the volume of the region of localization of the extrema set. This fact makes it possible to justify variable metric methods and other strategies for constructing a non-obtuse cone that localizes the extrema set.
In this case, the transformation (39) takes the form
$$T_{1/2}(p_m, p) = T_{1/2}^{-1}(p_m, p) = I - \frac{(p_m - p)(2p_m - p)^T}{\|p_m - p\|^2} = \left(I - 2\,\frac{(p_m - p)}{\|p_m - p\|}\,\frac{(p_m - p)^T}{\|p_m - p\|}\right)\left(I + \frac{(p_m - p)\,p^T}{\|p_m - p\|^2}\right)$$
and contains both a “pure” mapping defined by an orthogonal matrix and a transformation of the “dilation” type. Subgradient methods, as well as cutting-type methods, are invariant with respect to the orthogonal mapping, while a “dilation”-type transformation is able to improve the structure of the level surfaces of the function being optimized. The disadvantage of the operator (39) for λ = 1/2 is that it is not compressive for the subgradient space, i.e., $\det(B_k B_k^T) = 1$. This contributes to the accumulation of errors in the calculation of the normalized directions and can lead to unstable behavior of the methods for strongly ravine functions.
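This factorization is easy to verify numerically. The snippet below is only a check of the λ = 1/2 operator in the form written above (it is not code accompanying the paper): it builds a vector $p_m$ and its orthogonal projection $p$ onto a random subspace—so that $(p_m, p) = \|p\|^2$, as holds for $\xi_k$ and $\tilde p_k$—and confirms the factorization, the involution $T_{1/2} = T_{1/2}^{-1}$, and volume preservation, $|\det T_{1/2}| = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3

# p is the orthogonal projection of p_m onto a random m-dimensional
# subspace, so (p_m, p) = ||p||^2 holds by construction.
p_m = rng.standard_normal(n)
p_m /= np.linalg.norm(p_m)
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))   # orthonormal basis
p = Q @ (Q.T @ p_m)

d = p_m - p
T = np.eye(n) - np.outer(d, 2.0 * p_m - p) / (d @ d)

# Householder reflection and the "dilation"-type factor
u = d / np.linalg.norm(d)
H = np.eye(n) - 2.0 * np.outer(u, u)
D = np.eye(n) + np.outer(d, p) / (d @ d)

print(np.allclose(T, H @ D))                    # factorization holds
print(np.allclose(T @ T, np.eye(n)))            # T_{1/2} is its own inverse
print(np.isclose(abs(np.linalg.det(T)), 1.0))   # volume is preserved
```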
Theorem 6 implies Theorem 7.
Theorem 7
([30]). Let $\|B_k\| \le c_1$ and $\|f'(x_k)\| \le c_2$ at each step of the $ORTGF$ method. Then, the $ORTGF$ method solves the problem (1) with the accuracy $\varepsilon_f$ in no more than K steps, where $K = \left[\left(\dfrac{c_1 c_2 \|x_0 - x^*\|}{\varepsilon_f}\right)^2\right] + 1$.
Its proof is identical to the proof of Theorem 3.
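The form of the bound can be recovered by the usual summation argument (a sketch under the stated assumptions): while $f(x_i) - f^* > \varepsilon_f$, we have $\|B_i^T f'(x_i)\| \le \|B_i\|\,\|f'(x_i)\| \le c_1 c_2$, so each term of the sum below exceeds $\varepsilon_f^2/(c_1 c_2)^2$; on the other hand, summing the inequalities of Theorem 6 over $i = 0, \dots, K-1$ and using $A_0 = I_n$ gives

$$\sum_{i=0}^{K-1} \frac{\big(f(x_i) - f^*\big)^2}{\|B_i^T f'(x_i)\|^2} \le \|x_0 - x^*\|^2,$$

whence the number of steps at which the accuracy has not yet been reached is less than $\left(\frac{c_1 c_2 \|x_0 - x^*\|}{\varepsilon_f}\right)^2$, which gives the stated estimate.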
However, the parameter λ = 1/2 is the only one that allows one, based on the divergence, as $k \to \infty$, of the sums $\sum_{i=0}^{k} \frac{(f(x_i) - f^*)^2}{\|B_i^T f'(x_i)\|^2}$, to justify convergence of the $ORTGF$ method in the sense of $\varepsilon_f$-convergence by functional. To substantiate the convergence of $ORTGF$ for other values of the parameter λ, in particular variable ones, a different apparatus must be used; the most suitable one here is the interpretation of these methods as akin to conjugate direction methods [14]. However, this issue requires a separate discussion. Here, we limit ourselves to a numerical verification of the $ORTGF$ method for two values of the λ parameter: λ = 1/2 ($ORTGF(0.5)$) and λ = 1 ($ORTGF(1.0)$). While the convergence of $ORTGF(0.5)$ to an $\varepsilon_f$-solution of the problem (1) is justified, $ORTGF(1.0)$ still requires a rigorous justification.
The second choice of the λ parameter can be motivated by the following geometric argument. Let $\tilde P_k \ne \emptyset$ at the point $y_k$ at the k-th step of the $ORTGF(0.5)$ method. Then, before transforming the cone localizing the extrema set by the operator $T_{1/2}(\xi_k, \tilde p_k)$, let us set ourselves the goal of reducing the volume of the region of localization of the extrema set without spoiling the structure of the cone. This is easy to do within the framework of an ellipsoid like that in [27], where one of the axes coincides with the $(\xi_k - \tilde p_k)$ direction and the second coincides with a $p_i \in \tilde P_k$ for which $(\xi_k, p_i) \le -1/2$; this will generally be the case for strongly ravine functions. The transformation of such an ellipsoid into a sphere does not change the structure of the cone that localizes the extrema set. It requires space dilation in the direction $(\xi_k - \tilde p_k)$ and compression in the orthogonal direction $p_i$, and it leads to a decrease of the norm of the last subgradient in the transformed argument space. Choosing the parameter λ = 1 actually gives this meaning to the method, but without additional space transformations. At the same time, the norm of the last subgradient will decrease by a factor of two relative to what would be provided by the $T_{1/2}(\xi_k, \tilde p_k)$ operator.

4.2. Numerical Experiments

In the first series of experiments, we check the numerical stability of the $ORTGF(0.5)$ and $ORTGF(1.0)$ methods for the same set of test problems as in Section 2 and their ability to achieve the same accuracy $\varepsilon_f$ by functional. As the parameters of the cone formation, we choose $\varepsilon_K = 10^{-4}$, $\varepsilon_R = 10^{-8}$, and $m_0 = n - 1$.
The results are shown in Table 3. Here, the first number in parentheses indicates the number of transformations, and the second indicates the maximum number of accumulated subgradients. As can be seen from the table, the convergence of the methods to the ε f solution is quite stable and almost the same as for the method (27)–(32).
Although the $ORTGF$ method is designed to align the structure of the level surfaces of f(x) in order to overcome its ravineness, it will not always achieve this goal. The point is that the ravineness of φ(y) at the point $y_k$ characterizes the local behavior of φ(y) near $y_k$. However, when using the classic Fejer step and the accepted way of accumulating subgradients, there is a danger that the accumulated subgradients, especially the “oldest” ones, only spoil the picture of ravineness at the current point. For smooth functions, this usually does not happen, since convergence of the method is helped by the fact that the subgradient norm tends to zero; this ensures the accumulation of subgradients that reasonably characterize the ravine. However, this is not the case for piecewise-linear functions. Here, convergence will rather be ensured by computing the vertex of the cone determined by a number of linear pieces close to n.
Considering the above, for piecewise-linear functions, one should not expect good practical performance from $ORTGF(0.5)$ for $m_0 \ll n$. For strongly ravine functions, the situation is worsened further by the accumulation of errors in the $B_k$ matrix. Of course, the situation can be improved here both by additional space dilations aimed at volume reduction and by a “restoration” procedure, as well as by improving the way subgradients are accumulated. For $ORTGF(1.0)$, the transformation is aimed at a stronger dilation of the cone, and it has a much better chance of good practical performance at small values of $m_0$.
In the second series of experiments, we check the performance of the methods when minimizing a convex piecewise-linear function with a very large number of pieces. For this, we choose the problem TR48 ($n = 48$, $f^* = -638{,}565$) from ([46], p. 161), which is “bad” for subgradient methods. $\varepsilon_K$ and $\varepsilon_R$ are chosen the same as before. In this series of experiments, two different starting points are considered. The accuracy $\varepsilon_f = 50$ for the first starting point is chosen so that the number of calculations of f(x) and f′(x) can be compared with the results presented in [47], where points with $f(x) \le -638{,}500$ were considered an acceptable solution. The results of the experiment with the first starting point show that the method $ORTGF(0.5)$ with $m_0 = n - 1$ requires 139 iterations with 28 accumulated subgradients ($\varepsilon_f = 50$) and 222 iterations with 31 accumulated subgradients ($\varepsilon_f = 10^{-5}$). For the second starting point, it requires 72 iterations with 34 accumulated subgradients if $\varepsilon_f = 1$ and 151 iterations with 34 accumulated subgradients if $\varepsilon_f = 10^{-5}$. The method $ORTGF(1.0)$ was launched for the values $m_0 \in \{20; 10; 5\}$, and the iteration number grows as $m_0$ decreases. In particular, the method $ORTGF(1.0)$ with $m_0 = 5$ requires 199 iterations ($\varepsilon_f = 50$) and 412 iterations ($\varepsilon_f = 10^{-5}$) for the first point, and 207 iterations ($\varepsilon_f = 1$) and 357 iterations ($\varepsilon_f = 10^{-5}$) for the second point, respectively. It is easy to see that the methods are rather stable in the sense of $\varepsilon_f$-convergence by functional.
Finally, in the last, third series of experiments, we check the numerical stability of the $ORTGF(1.0)$ method for medium-sized, strongly ravine problems (n = 30–100). We choose Quad(t) and Sabs(t) as test problems. The results are shown in Table 4 and Table 5. Here, iter(ε) is the number of calculations of f(x) and f′(x). The number of transformations is given in parentheses, and for $ORTGF(1.0)$ at $m_0 = n - 1$, the maximum number of accumulated subgradients is given as well. If we consider n calculations of f(x) and f′(x) as one large iteration of the $ORTGF(1.0)$ method, which in terms of complexity is equivalent to one iteration of the Newton method, then, as can be seen from Table 4 and Table 5, the number of large iterations is not so big and is on the order of 10.
So, on the basis of the operator (39), it is also easy to construct variable metric methods with a simple geometric interpretation in the transformed argument space. At the same time, for the general problem of convex programming, it is quite possible to create practically effective methods with a strict justification of their convergence based on the reduction of the volume of the region of localization of the extrema set.
However, the transformation itself is only one component of such methods. The second component is the choice of the direction in which the function is investigated and the way the step multiplier is adjusted. The choices used in $ORTGF$ are unlikely to be the most successful ones, since the mandatory movement from the current point along the antisubgradient greatly complicates the analysis of the level lines of φ(y). Therefore, algorithmic schemes in which the level lines of φ(y) are analyzed or refined relative to a point fixed for some time are more rational. At the same time, for the calculation of f(x) and f′(x), simple procedures of the ε-steepest-descent type in the transformed space of arguments can be provided, using the orthogonality of the preserved subgradients or of their convex combinations. The issues of filtering out extra subgradients are then solved simply, the upper and lower estimates of f* are easily refined, etc.
It is natural to direct the space transformation so that the direction of investigation of the function f(x) from the point approaches the direction toward the optimum. Then, by changing the fixed point according to some rule (for example, to the point of the record value of f(x)) or to the projection of the fixed point onto the set
$$Q_k = \left\{\, y : \varphi(y_i) + \big(\varphi'(y_i),\, y - y_i\big) \le f_{\mathrm{record}},\ i \in I_k \,\right\}$$
and, repeating the same procedure with respect to the new point, we arrive at Newton-like (quasi-Newton) methods, which are applicable to both smooth and nonsmooth functions.
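To make the last step concrete: the projection of a fixed point $\hat y$ onto $Q_k$ mentioned above is the solution of the quadratic program (our notation for this standard operation)

$$\min_{y}\ \tfrac{1}{2}\|y - \hat y\|^2 \quad \text{s.t.} \quad \varphi(y_i) + \big(\varphi'(y_i),\, y - y_i\big) \le f_{\mathrm{record}}, \quad i \in I_k,$$

i.e., a projection onto the polyhedron cut out by the linearizations of φ accumulated over the index set $I_k$.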

5. Conclusions

Therefore, the application of the described space transformations, aimed at aligning the structure of the function level surfaces, allows us to significantly improve the performance of subgradient methods. For the convex programming problem, the use of operators (11) and (39) in combination with the space dilation operator allows us to construct a whole series of efficient variable metric methods. At the same time, for B-form methods, a simple geometric interpretation of the process in the transformed space of arguments can be given.
The considered transformations do not even come close to exhausting the list of effective one-rank space transformations. In particular, an optimal one-rank transformation like (11) is interesting, which would improve convergence in the sense of a reduction of the volume of the region of localization of an extrema set. Within the framework of orthogonalizing one-rank transformations, a simplex-like transformation is promising, which allows for building methods for convex problems of sufficiently large sizes while storing only a finite set of vectors instead of the full n × n -dimensional matrix. In addition, it is possible to construct a number of other interesting linear operators of the orthogonalizing type.
Using ellipsoidal operators in Fejer-type methods results in space transformations that refine dilated level surfaces of a ravine function. This improves the behavior of these methods in further iterations even for rather large problems if the level surfaces of the objective function are not significantly dilated.
The development of nonsmooth optimization and its applications to solving problems of large complexity can be found in [48]. In particular, applications in economics, finance, energy saving, agriculture, biology, genetics, environmental protection, information protection, decision making, pattern recognition, and self-adaptive control of complex objects are described.
One source of high-dimensional nonsmooth problems is the optimal control of systems with distributed parameters. The book [49] describes the application of nonsmooth optimization algorithms to some inverse problems and singular problems of optimal control of systems with distributed parameters.
Nonsmooth functions and algorithms for their minimization are also successfully applied in such fields as optimal set partition [50,51,52], reliable network design [53], machine learning [54], and many other applied areas [55,56].
Since the publication of the pioneering works of N.Z. Shor, many novel applications of nondifferentiable optimization have been recognized in, e.g., geometrical design problems [57,58], modified Lagrangian dual problems [59,60], and green logistics [61], to mention a few. Responding to new challenges in nondifferentiable optimization requires novel solution techniques. Combining artificial intelligence-based approaches with classical nondifferentiable optimization techniques is an interesting area for future research [62,63].

Author Contributions

Conceptualization, V.S. (Volodymyr Semenov), P.S., and J.M.V.C.; methodology, V.S. (Volodymyr Semenov) and J.M.V.C.; software, V.S. (Viktor Stovba) and P.S.; validation, V.S. (Volodymyr Semenov) and J.M.V.C.; formal analysis, V.S. (Volodymyr Semenov) and P.S.; investigation, V.S. (Volodymyr Semenov), P.S., and J.M.V.C.; resources, P.S. and V.S. (Viktor Stovba); data curation, V.S. (Volodymyr Semenov) and P.S.; writing—original draft preparation, V.S. (Volodymyr Semenov), P.S., V.S. (Viktor Stovba), and J.M.V.C.; writing—review and editing, V.S. (Volodymyr Semenov), P.S., V.S. (Viktor Stovba), and J.M.V.C.; visualization, V.S. (Volodymyr Semenov) and P.S.; supervision, V.S. (Volodymyr Semenov), P.S., and J.M.V.C.; project administration, J.M.V.C. and P.S.; funding acquisition, J.M.V.C. and P.S. All authors have read and agreed to the published version of the manuscript.

Funding

The first three authors were partially supported by the NASU (0124U002162), the second and the third authors were partially supported by the Volkswagen Foundation (grant No 97775), and the last author was partially supported by the Technological Institute of Sonora (ITSON), Mexico through the Research Promotion and Support Program (PROFAPI 2024).

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Acknowledgments

The authors would like to thank anonymous referees for careful reading the paper and constructive comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  2. Bertsekas, D.; Tsitsiklis, J. Parallel and Distributed Computation: Numerical Methods; Athena Scientific: Nashua, NH, USA, 2015. [Google Scholar]
  3. Vishnoi, N. Algorithms for Convex Optimization; Cambridge University Press: Cambridge, UK, 2021. [Google Scholar]
  4. Wright, S.J.; Recht, B. Optimization for Data Analysis; Cambridge University Press: Cambridge, UK, 2022. [Google Scholar]
  5. Ryu, E.K.; Yin, W. Large-Scale Convex Optimization; Cambridge University Press: Cambridge, UK, 2022. [Google Scholar]
  6. Shor, N.Z. Nondifferentiable Optimization Methods and Complex Extremal Problems. Collection of Selected Works; Evrika: Chișinău, Moldova, 2008. [Google Scholar]
  7. Shor, N.Z. Nonsmooth Function Minimization Methods and Optimization Matrix Problems. Collection of Selected Works; Evrika: Chișinău, Moldova, 2009. [Google Scholar]
  8. Polyak, B.T. Introduction to Optimization; Optimization Software: New York, NY, USA, 1987. [Google Scholar]
  9. Beck, A. First-Order Methods in Optimization; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2017. [Google Scholar]
  10. Shor, N.Z. Using of space dilation operations in problems of convex functions minimization. Kibernetika 1970, 1, 6–12. [Google Scholar]
  11. Sergienko, I.V.; Stetsyuk, P.S. On N.Z. Shor’s three scientific ideas. Cybern. Syst. Anal. 2012, 48, 2–16. [Google Scholar] [CrossRef]
  12. Davidon, W.C. Variable Metric Methods for Minimization; AEC Research and Development Report; Department of Commerce: Washington, DC, USA, 1959; ANL 5990 (Rev.). [Google Scholar]
  13. Fletcher, R.; Powell, M.J.D. A rapidly convergent descent method for minimization. Comput. J. 1963, 6, 163–168. [Google Scholar] [CrossRef]
  14. Pshenichnyi, B.N.; Danilin, Y.M. Numerical Methods in Extremal Problems; Mir Publishers: Moscow, Russia, 1975. [Google Scholar]
  15. Gill, P.E.; Murray, W.; Wright, M.H. Practical Optimization; Academic Press: London, UK, 1981. [Google Scholar]
  16. Bertsekas, D.P. Nonlinear Programming, 3rd ed.; Athena Scientific: Nashua, NH, USA, 2016. [Google Scholar]
  17. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 2006. [Google Scholar]
  18. Beck, A.; Teboulle, M. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 2003, 31, 167–175. [Google Scholar] [CrossRef]
  19. Nemirovski, A. Prox-method with rate of convergence O(1/T) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 2004, 13, 229–251. [Google Scholar] [CrossRef]
  20. Chabak, L.; Semenov, V.; Vedel, Y. A new non-Euclidean proximal method for equilibrium problems. In Recent Developments in Data Science and Intelligent Analysis of Information. ICDSIAI 2018. Advances in Intelligent Systems and Computing, vol 836; Chertov, O., Mylovanov, T., Kondratenko, Y., Kacprzyk, J., Kreinovich, V., Stefanuk, V., Eds.; Springer: Cham, Switzerland, 2019; pp. 50–58. [Google Scholar]
  21. Semenov, V.V.; Denisov, S.V.; Kravets, A.V. Adaptive Two-Stage Bregman Method for Variational Inequalities. Cybern. Syst. Anal. 2021, 57, 959–967. [Google Scholar] [CrossRef]
  22. Semenov, V.V. A Version of the Mirror descent Method to Solve Variational Inequalities. Cybern. Syst. Anal. 2017, 53, 234–243. [Google Scholar] [CrossRef]
  23. Shor, N.Z. Minimization Methods for Non-Differentiable Functions and Their Applications; Naukova Dumka: Kiev, Ukraine, 1979. [Google Scholar]
  24. Shor, N.Z. Minimization Methods for Non-Differentiable Functions; Springer: Berlin/Heidelberg, Germany, 1985. [Google Scholar]
  25. Shor, N.Z. Nondifferentiable Optimization and Polynomial Problems; Kluwer Academic Publishers: Boston, MA, USA; Dordrecht, The Netherlands; London, UK, 1998. [Google Scholar]
  26. Shor, N.Z.; Stetsenko, S.I. Quadratic Extremal Problems and Non-Differentiable Optimization; Naukova Dumka: Kiev, Ukraine, 1989. [Google Scholar]
  27. Stetsyuk, P.I. r-algorithms and ellipsoids. Cybern. Syst. Anal. 1996, 32, 93–110. [Google Scholar] [CrossRef]
  28. Stetsyuk, P.I. Ellipsoid Methods and r-Algorithms; Evrika: Chișinău, Moldova, 2014. [Google Scholar]
  29. Stetsyuk, P.I. Orthogonalizing linear operators in convex programming. I. Cybern. Syst. Anal. 1997, 33, 386–401. [Google Scholar] [CrossRef]
  30. Stetsyuk, P.I. Orthogonalizing linear operators in convex programming. II. Cybern. Syst. Anal. 1997, 33, 700–709. [Google Scholar] [CrossRef]
  31. Pshenichnyj, B. The Linearization Method for Constrained Optimization; Computational Mathematics; Springer: New York, NY, USA, 1994. [Google Scholar]
  32. Agmon, S. The relaxation method for linear inequalities. Can. J. Math. 1954, 6, 382–392. [Google Scholar] [CrossRef]
  33. Motzkin, T.; Schoenberg, I.J. The relaxation method for linear inequalities. Can. J. Math. 1954, 6, 393–404. [Google Scholar] [CrossRef]
  34. Polyak, B.T. Minimization of unsmooth functionals. Comput. Math. Math. Phys. 1969, 9, 507–521. [Google Scholar] [CrossRef]
  35. Vasin, V.V.; Eremin, I.I. Operators and Iterative Processes of Fejér Type: Theory and Applications; De Gruyter: Berlin, Germany; New York, NY, USA, 2009. [Google Scholar]
  36. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operators Theory in Hilbert Spaces; Springer: Cham, Switzerland, 2017. [Google Scholar]
  37. Shchepakin, M.B. On the orthogonal descent method. Kibernetika 1987, 1, 58–62. [Google Scholar] [CrossRef]
  38. Skokov, V.A.; Shchepakin, M.B. Numerical analysis of the orthogonal descent method. Cybern. Syst. Anal. 1994, 30, 274–282. [Google Scholar] [CrossRef]
  39. Shchepakin, M.B.; Shubenkova, I.A. A modified orthogonal-descent algorithm for finding the zero of a complex function. Cybern. Syst. Anal. 1993, 29, 522–530. [Google Scholar] [CrossRef]
  40. Camerini, P.; Fratta, L.; Maffioli, F. On improving relaxation methods by modified gradient techniques. Math. Program. Stud. 1975, 3, 26–34. [Google Scholar]
  41. Rzhevskiy, S.V. Monotonous Methods of Convex Programming; Naukova Dumka: Kiev, Ukraine, 1993. [Google Scholar]
  42. Hiriart-Urruty, J.B.; Lemarechal, C. Convex Analysis and Minimization Algorithms; Springer: Berlin/Heidelberg, Germany, 1994; Volumes I–II. [Google Scholar]
  43. Bonettini, S.; Benfenati, A.; Ruggiero, V. Scaling Techniques for ε-Subgradient Methods. SIAM J. Optim. 2016, 3, 1741–1772. [Google Scholar] [CrossRef]
  44. Demmel, J. Applied Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997. [Google Scholar]
  45. Golub, G.H.; Van Loan, C.F. Matrix Computations, 4th ed.; Johns Hopkins University Press: Baltimore, MD, USA, 2013. [Google Scholar]
  46. Lemarechal, C.; Mifflin, R. Nonsmooth Optimization; Pergamon Press: Oxford, UK, 1978. [Google Scholar]
  47. Lemarechal, C. Numerical experiments in nonsmooth optimization. In Progress in Nondifferentiable Optimization; Nurminski, E.A., Ed.; International Institute for Applied System Analysis: Laxenburg, Austria, 1982; pp. 61–84. [Google Scholar]
  48. Sergienko, I.V. Methods of Optimization and Systems Analysis for Problems of Transcomputational Complexity; Springer: New York, NY, USA, 2012. [Google Scholar]
  49. Lyashko, S.I. Generalized Optimal Control of Linear Systems with Distributed Parameters; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2002. [Google Scholar]
  50. Kiseleva, E.M.; Stroyeva, V.A. Algorithm of solving of nonlinear continuous multicomponent problem of optimal set partitioning with placement of subsets centers. J. Autom. Inf. Sci. 2012, 44, 15–29. [Google Scholar]
  51. Kiseleva, E.M.; Shor, N.Z. Continuous Problems of Optimal Set Partition: Theory, Algorithms, Applications; Naukova Dumka: Kiev, Ukraine, 2005. [Google Scholar]
  52. Kiseleva, E.M.; Koryashkina, L.S. Continuous Problems of Optimal Set Partition and r-Algorithms; Naukova Dumka: Kiev, Ukraine, 2015. [Google Scholar]
  53. Shor, N.Z.; Sergienko, I.V.; Shylo, V.P.; Stetsyuk, P.I.; Parasyuk, I.M.; Lebedeva, T.T.; Laptin, Y.P.; Zhurbenko, M.G.; Bardadym, T.O.; Sharifov, F.A.; et al. Problems of Optimal Design of Reliable Networks; Naukova Dumka: Kiev, Ukraine, 2005. [Google Scholar]
  54. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics: Textbook; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  55. Shor, N.Z.; Stetsyuk, P.I. Lagrangian bounds in multiextremal polynomial and discrete optimization problems. J. Glob. Optim. 2002, 23, 1–41. [Google Scholar] [CrossRef]
  56. Butenko, S.; Pardalos, P.; Shylo, V. (Eds.) Optimization Methods and Applications: In Honor of Ivan V. Sergienko’s 80th Birthday; Springer: Cham, Switzerland, 2017. [Google Scholar]
  57. Romanova, T.E.; Stetsyuk, P.I.; Fischer, A.; Yaskov, G.M. Proportional Packing of Circles in a Circular Container. Cybern. Syst. Anal. 2023, 59, 82–89. [Google Scholar] [CrossRef]
  58. Litvinchev, I.; Fischer, A.; Romanova, T.; Stetsyuk, P. A New Class of Irregular Packing Problems Reducible to Sphere Packing in Arbitrary Norms. Mathematics 2024, 12, 935. [Google Scholar] [CrossRef]
  59. Litvinchev, I.S. Refinement of lagrangian bounds in optimization problems. Comput. Math. Math. Phys. 2007, 47, 1101–1108. [Google Scholar] [CrossRef]
  60. Litvinchev, I.; Rangel, S.; Saucedo, J. A Lagrangian bound for many-to-many assignment problems. J. Comb. Optim. 2010, 19, 241–257. [Google Scholar] [CrossRef]
  61. Litvinchev, I.; Ríos-Solís, Y.; Ozdemir, D.; Hernandez-Landa, L. Multiperiod and stochastic formulations for a closed loop supply chain with incentives. J. Comput. Syst. Sci. Int. 2014, 53, 201–211. [Google Scholar] [CrossRef]
  62. Wang, H.; Feng, R.; Leung, C.-S.; Chan, H.P.; Constantinides, A.G. A Lagrange Programming Neural Network Approach with an L0-Norm Sparsity Measurement for Sparse Recovery and Its Circuit Realization. Mathematics 2022, 10, 4801. [Google Scholar] [CrossRef]
  63. Halimu, Y.; Zhou, C.; You, Q.; Sun, J. A Quantum-Behaved Particle Swarm Optimization Algorithm on Riemannian Manifolds. Mathematics 2022, 10, 4168. [Google Scholar] [CrossRef]
Figure 1. (a) 2d-ellipsoid Ell(x₀, r); (b) 2d-ellipsoid after transformation.
Figure 2. Comparison of the r-algorithm (light grey columns) and the method (27)–(32) with ε_f = 10⁻⁸ (grey columns) and ε_f = 10⁻¹² (dark grey columns) for seven problems.
Figure 3. Trajectories of (a) the method (2) and (b) the method (27)–(32) for the piecewise linear function f₁(x₁, x₂) = |x₁| + 10|x₂|.
Figure 4. Trajectories of (a) the method (2) and (b) the method (27)–(32) for the piecewise quadratic function f₂(x₁, x₂) = max{x₁² + (2x₂ − 2)² − 3, x₁² + (x₂ + 1)²}.
Table 1. ε_f-convergence of the methods (23)–(26) and (27)–(32).

| Problem, n | Method (23)–(26), iter(ε₀) | Method (23)–(26), iter(ε₀²) | Method (27)–(32), iter(ε₀) | Method (27)–(32), iter(ε₀²) |
|---|---|---|---|---|
| Shor, n = 5 | 112(109) | 227(224) | 38(36) | 70(68) |
| Maxquad, n = 10 | 120(113) | 293(286) | 41(35) | 85(79) |
| Quad(3), n = 5 | 40(11) | 73(11) | 40(11) | 73(11) |
| Quad(3), n = 10 | 82(60) | 115(74) | 76(59) | 109(80) |
| Quad(10), n = 5 | 60(22) | 93(22) | 57(21) | 90(21) |
| Quad(10), n = 10 | 187(141) | 220(152) | 148(124) | 181(141) |
Table 2. ε_f-convergence of the method (27)–(32).

| Problem, n | f(x₀) − f* | iter(10⁻⁵) | iter(10⁻¹⁰) | iter(10⁻²⁰) |
|---|---|---|---|---|
| Quad(1.1), n = 50 | 581.955 | 42(32) | 65(49) | 102(73) |
| Sabs(1.1), n = 50 | 1163.909 | 176 | 279 | 347 |
| Quad(1.05), n = 100 | 1305.013 | 51(43) | 79(65) | 124(97) |
| Sabs(1.05), n = 100 | 2610.025 | 318 | 424 | 614 |
Table 3. ε_f-convergence of the methods ORTGF(0.5) and ORTGF(1.0).

| Problem, n | ORTGF(0.5), iter(ε₀) | ORTGF(0.5), iter(ε₀²) | ORTGF(1.0), iter(ε₀) | ORTGF(1.0), iter(ε₀²) |
|---|---|---|---|---|
| Shor, n = 5 | 33(30,4) | 59(56,4) | 33(30,4) | 69(66,4) |
| Maxquad, n = 10 | 45(37,5) | 95(87,5) | 42(35,5) | 88(79,5) |
| Quad(3.0), n = 5 | 40(9,2) | 71(9,2) | 52(30,3) | 96(58,3) |
| Quad(3.0), n = 10 | 80(61,5) | 113(62,5) | 86(68,3) | 141(109,3) |
| Quad(10.0), n = 5 | 57(26,3) | 90(26,3) | 50(22,3) | 74(22,3) |
| Quad(10.0), n = 10 | 156(123,8) | 189(128,8) | 131(109,4) | 193(161,4) |
Table 4. ε_f-convergence of the method ORTGF(1.0) for the strongly ravine problem Quad(t).

| Method | | Quad(2.0), n = 30 | Quad(1.2), n = 60 | Quad(1.2), n = 100 |
|---|---|---|---|---|
| ORTGF(1.0), m₀ = n − 1 | iter(10⁻¹⁰) | 236(229,5) | 188(183,5) | 428(422,8) |
| | iter(10⁻²⁰) | 332(315,5) | 277(259,5) | 542(534,8) |
| ORTGF(1.0), m₀ = 10 | iter(10⁻¹⁰) | 236(229) | 188(183) | 428(422) |
| | iter(10⁻²⁰) | 332(315) | 277(259) | 542(534) |
Table 5. ε_f-convergence of the method ORTGF(1.0) for the strongly ravine problem Sabs(t).

| Method | | Sabs(2.0), n = 30 | Sabs(1.2), n = 60 | Sabs(1.2), n = 100 |
|---|---|---|---|---|
| ORTGF(1.0), m₀ = n − 1 | iter(10⁻¹⁰) | 476(464,12) | 464(462,21) | 1480(1475,19) |
| | iter(10⁻²⁰) | 527(524,12) | 541(539,21) | 1564(1559,19) |
| ORTGF(1.0), m₀ = 10 | iter(10⁻¹⁰) | 462(459) | 469(467) | 1293(1291) |
| | iter(10⁻²⁰) | 523(520) | 544(542) | 1375(1373) |
