A homogeneous Rayleigh quotient with applications in gradient methods

Given an approximate eigenvector, its (standard) Rayleigh quotient and harmonic Rayleigh quotient are two well-known approximations of the corresponding eigenvalue. We propose a new type of Rayleigh quotient, the homogeneous Rayleigh quotient, and analyze its sensitivity with respect to perturbations in the eigenvector. Furthermore, we study the inverse of this homogeneous Rayleigh quotient as stepsize for the gradient method for unconstrained optimization. The notion and basic properties are also extended to the generalized eigenvalue problem.

• A nonlinear Galerkin condition for this homogeneous Rayleigh quotient is derived.
• Asymptotic bounds on the relative error to an eigenvalue are obtained.
• The quotient is compared with standard and harmonic Rayleigh quotients.
• We study the inverse of this quotient as a stepsize for gradient methods.

Introduction
Let A be an n × n symmetric matrix with eigenvalues λ_1 ≤ ⋯ ≤ λ_n. We are interested in the eigenproblem Ax = λx. Let u ∈ R^n be an approximate eigenvector to x with unit 2-norm. Traditionally, the Rayleigh quotient
\[ \theta = \theta(u) = \frac{u^T A u}{u^T u} \tag{1} \]
is the standard approach to determine an eigenvalue approximation corresponding to u; it yields the corresponding eigenvalue when u is an exact eigenvector. This θ satisfies two (related) optimality properties, a Galerkin condition (cf., e.g., [1, pp. 13-14])
\[ Au - \theta u \perp u, \tag{2} \]
and a minimum residual condition
\[ \theta = \operatorname{argmin}_{\gamma} \|Au - \gamma u\|. \tag{3} \]
A second well-known quantity is the harmonic Rayleigh quotient, which is sometimes considered when one is interested in an eigenvalue near a given target τ ∈ R. It is defined as (cf., e.g., [2, p. 294])
\[ \tilde\theta_\tau = \tilde\theta_\tau(u) = \frac{u^T (A - \tau I)^2 u}{u^T (A - \tau I) u} + \tau, \tag{4} \]
provided that the denominator is nonzero. This quotient is also equal to the eigenvalue when u is an eigenvector. It satisfies two optimality conditions: a Galerkin condition (A − θ̃_τ I) u ⊥ (A − τ I) u, and a minimum residual condition (or secant condition) θ̃_τ = argmin_γ ‖(γ − τ)^{-1} (A − τ I) u − u‖ (cf. [3, Prop. 5]).
In this paper, we mainly focus on the case τ = 0; for nonzero u^T A u, the harmonic Rayleigh quotient is given by (cf. [4])
\[ \tilde\theta = \tilde\theta(u) = \frac{u^T A^2 u}{u^T A u}. \tag{5} \]
For τ = 0, the Galerkin condition becomes
\[ Au - \tilde\theta u \perp Au, \tag{6} \]
and the minimum residual condition is
\[ \tilde\theta = \operatorname{argmin}_{\gamma} \|\gamma^{-1} Au - u\|. \tag{7} \]
Condition (7) is easily verified by setting the derivative of ‖γ^{-1} Au − u‖ with respect to γ to zero. Property (7) is perhaps less widely known. Both Properties (3) and (7) have been exploited as secant conditions to determine a stepsize in gradient-type optimization methods, for instance for the well-known Barzilai and Borwein steplengths [5]. We return to this topic in Section 5; see also Vömel [6].
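The two classical quotients and their Galerkin conditions can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`dot`, `matvec`, `rayleigh`, `harmonic_rayleigh`) and the small diagonal test matrix are illustrative assumptions, not part of the paper.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, v):
    return [dot(row, v) for row in A]

def rayleigh(A, u):
    """Standard Rayleigh quotient u^T A u / u^T u."""
    return dot(u, matvec(A, u)) / dot(u, u)

def harmonic_rayleigh(A, u, tau=0.0):
    """Harmonic Rayleigh quotient with target tau (tau = 0: u^T A^2 u / u^T A u)."""
    w = [a - tau * x for a, x in zip(matvec(A, u), u)]  # w = (A - tau I) u
    denom = dot(u, w)
    if denom == 0.0:
        raise ZeroDivisionError("u^T (A - tau I) u is zero: quotient undefined")
    return tau + dot(w, w) / denom

# Hypothetical test data: a diagonal matrix and a rough approximation of e_2.
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]
u = [0.1, 1.0, 0.1]
Au = matvec(A, u)
theta = rayleigh(A, u)
theta_h = harmonic_rayleigh(A, u)

# Galerkin conditions: Au - theta*u is orthogonal to u, Au - theta_h*u to Au.
galerkin_std = dot([a - theta * x for a, x in zip(Au, u)], u)
galerkin_harm = dot([a - theta_h * x for a, x in zip(Au, u)], Au)
```

Both Galerkin residuals vanish up to rounding, and for an exact eigenvector either quotient returns the eigenvalue.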
As an alternative to (3) and (7), we propose and study a new homogeneous type of Rayleigh quotient in this paper. We work in the projective space P(R), and replace a real number α by a quotient employing homogeneous coordinates (α_1, α_2):
\[ \alpha = \frac{\alpha_1}{\alpha_2}, \qquad \alpha_1^2 + \alpha_2^2 = 1, \quad \alpha_2 \in [0, 1]. \]
Note that the restriction α_2 ∈ [0, 1] is without loss of generality. The pair (1, 0) corresponds to the point at infinity. Homogeneous techniques for eigenvalue problems have already been developed by various authors. Stewart and Sun [7, p. 283] exploit homogeneous coordinates for the generalized eigenvalue problem Ax = λBx, for B ∈ R^{n×n}, since this is "especially convenient for treating infinite eigenvalues" [8]. While the standard eigenvalue problem Ax = λx does not involve infinite eigenvalues, the harmonic Rayleigh quotient (5) may be infinite. Dedieu and Tisseur [9,10] exploit homogeneous techniques to study perturbation theory for generalized and polynomial eigenproblems. Inspired by this, a homogeneous Jacobi-Davidson method for subspace expansion has been proposed in [11]; this may be viewed as an inexact Newton method.
The contributions and outline of this paper are the following. We define a new homogeneous Rayleigh quotient as the minimizer of the residual quantity
\[ \min_{(\alpha_1, \alpha_2) \in \mathbb{P}} \|\alpha_1 u - \alpha_2 Au\|. \]
In Section 2, we show that there is a closed-form solution to express this homogeneous Rayleigh quotient, and that it can also be obtained from a certain nonlinear Galerkin condition. To quantify the quality of the homogeneous Rayleigh quotient as an approximate eigenvalue, a bound for the (chordal) distance between the homogeneous Rayleigh quotient and the spectrum of A is derived.
We compare the homogeneous Rayleigh quotient with the standard and harmonic Rayleigh quotients. Given the relation between the homogeneous residual quantity and the two residual quantities (3) and (7) (see also Section 2.3), the homogeneous Rayleigh quotient may be seen as the mediator between the two other Rayleigh quotients. In Section 3, we highlight this fact in the theoretical and experimental study on the relative errors of the three quotients. Asymptotic bounds on the accuracy of the three quotients are also derived and compared. Interestingly, the homogeneous Rayleigh quotient may be more accurate than its two competitors in some situations.
In Section 4, we extend the definition of the homogeneous Rayleigh quotient to the generalized eigenvalue problem (A, B), with A symmetric and B symmetric positive definite (SPD). Finally, in Section 5, we propose a new stepsize for gradient methods based on the homogeneous Rayleigh quotient. Conclusions are drawn in Section 6.
Added note: after writing this paper, we became aware of the very recent independent work [12] (with preliminary version [13]), where the authors obtained one of our results, the stepsize (31), in a different way. Whereas their approach is via a total least squares technique, ours is based on exploiting homogeneous coordinates. We will discuss some more details in Section 5.

A homogeneous Rayleigh quotient
For a given approximate eigenvector u, we consider the minimization of the homogeneous residual (as mediator of the two residual quantities in (3) and (7)):
\[ \min_{(\alpha_1, \alpha_2) \in \mathbb{P},\ \alpha_1^2 + \alpha_2^2 = 1} \|\alpha_1 u - \alpha_2 Au\|. \tag{8} \]
For nonzero α_2, we call the ratio α = α_1/α_2 between the coordinates of the solution the homogeneous Rayleigh quotient.

Key properties of the homogeneous Rayleigh quotient
We discuss some properties of the solution to (8). First, let us introduce the n × 2 matrix C = [u  −Au] and the associated 2 × 2 matrix
\[ C^T C = \begin{bmatrix} u^T u & -u^T A u \\ -u^T A u & u^T A^2 u \end{bmatrix} = u^T A u \begin{bmatrix} \theta^{-1} & -1 \\ -1 & \tilde\theta \end{bmatrix}, \tag{9} \]
where the second equality holds when u^T A u is nonzero. Although we usually impose ‖u‖ = 1, we will not exploit this simplification in view of more general use in Section 5.
In the next proposition, assuming that u^T A u ≠ 0, we can provide an explicit formula for the homogeneous Rayleigh quotient α. In addition, we show that α is an eigenvalue of A when u is an eigenvector, and that it is located between the Rayleigh quotient and the harmonic Rayleigh quotient, with an ordering that depends on the sign of u^T A u. Proposition 1. Suppose that u^T A u ≠ 0 and denote the smallest eigenvalue of C^T C by µ. Then the following properties hold.
(i) The value µ is a simple eigenvalue of C T C, or, equivalently, √ µ is a simple singular value of C; its corresponding eigenvector [α 1 , α 2 ] T is the unique minimizer of (8).
(ii) The value of µ is
\[ \mu = \tfrac12 \Big( u^T u + u^T A^2 u - \sqrt{(u^T A^2 u - u^T u)^2 + 4\,(u^T A u)^2} \Big). \tag{10} \]
(iii) For the homogeneous Rayleigh quotient α we have
\[ \alpha = \frac{u^T A^2 u - u^T u + \sqrt{(u^T A^2 u - u^T u)^2 + 4\,(u^T A u)^2}}{2\, u^T A u}. \tag{11} \]
In terms of the standard and harmonic Rayleigh quotient,
\[ \alpha = \tfrac12 (\tilde\theta - \theta^{-1}) + \operatorname{sign}(\theta)\, \sqrt{\tfrac14 (\tilde\theta - \theta^{-1})^2 + 1}. \]
(iv) µ = 0 if and only if u is an eigenvector of A. If u is an eigenvector, then the corresponding homogeneous Rayleigh quotient α is the corresponding eigenvalue.
(v) 0 ≤ µ < min(u^T u, u^T A^2 u); in particular, α has the same sign as u^T A u.
(vi) We have the following bounds:
\[ \theta \le \alpha \le \tilde\theta \quad \text{if } u^T A u > 0, \qquad \tilde\theta \le \alpha \le \theta \quad \text{if } u^T A u < 0. \]
Proof. It is well known that the minimizer of (8) is the eigenvector corresponding to the smallest eigenvalue of the matrix C^T C or, equivalently, the right singular vector of C corresponding to the smallest singular value. We start with the eigenvalue problem C^T C [α_1, α_2]^T = µ [α_1, α_2]^T for the matrix in (9). The eigenvalue µ of (9) is simple because u^T A u is assumed nonzero, so that the discriminant of the quadratic characteristic polynomial is positive. In particular, this implies that the eigenvector [α_1, α_2]^T belonging to the smallest eigenvalue is well defined. The result in part (ii) is an explicit expression for the eigenvalue µ. Its corresponding eigenvector [α_1, α_2]^T is used in item (iii) to give an expression for the ratio α = α_1/α_2, which gives the homogeneous Rayleigh quotient; this ratio can also be written in terms of the standard and the harmonic Rayleigh quotients. If µ = 0, then C has rank 1, which happens precisely when u is an eigenvector. Moreover, if u is an eigenvector corresponding to some eigenvalue λ, equations (10) and (11) give µ = 0 and α = λ, respectively. This concludes the proof of part (iv).
The nonnegativity of µ directly follows from the fact that C^T C is positive semidefinite. Since u^T A u ≠ 0, the smallest eigenvalue of C^T C is strictly smaller than both diagonal entries, so µ < min(u^T u, u^T A^2 u). The first row of the eigenvalue problem gives α = u^T A u / (u^T u − µ), which shows that the sign of α only depends on the quantity u^T A u. Given this fact, property (vi) is a direct consequence of part (v).
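The closed form for the homogeneous Rayleigh quotient can be sketched in a few lines. This is an illustrative implementation under the assumption u^T A u ≠ 0; the function name `homogeneous_rayleigh` and the test vectors are hypothetical, and the formulas are those obtained from the smallest eigenpair of the 2 × 2 matrix C^T C.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, v):
    return [dot(row, v) for row in A]

def homogeneous_rayleigh(A, u):
    """Return (alpha, mu): the homogeneous Rayleigh quotient and the smallest
    eigenvalue of C^T C with C = [u, -Au]. Assumes u^T A u != 0."""
    Au = matvec(A, u)
    uu, uAu, uAAu = dot(u, u), dot(u, Au), dot(Au, Au)
    if uAu == 0.0:
        raise ZeroDivisionError("u^T A u = 0: alpha is 0 or the point at infinity")
    root = math.hypot(uAAu - uu, 2.0 * uAu)  # square root of the discriminant
    mu = 0.5 * (uu + uAAu - root)
    alpha = (uAAu - uu + root) / (2.0 * uAu)
    return alpha, mu

# SPD example: theta <= alpha <= harmonic quotient.
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]
u = [0.1, 1.0, 0.1]
Au = matvec(A, u)
theta = dot(u, Au) / dot(u, u)
theta_h = dot(Au, Au) / dot(u, Au)
alpha, mu = homogeneous_rayleigh(A, u)

# Indefinite example with u^T A u < 0: the ordering is reversed.
A_neg = [[-3.0, 0.0], [0.0, 1.0]]
u_neg = [1.0, 0.1]
Au_neg = matvec(A_neg, u_neg)
theta_neg = dot(u_neg, Au_neg) / dot(u_neg, u_neg)
theta_h_neg = dot(Au_neg, Au_neg) / dot(u_neg, Au_neg)
alpha_neg, _ = homogeneous_rayleigh(A_neg, u_neg)
```

The expression for `mu` subtracts two nearly equal numbers when u is close to an eigenvector, so a production code might prefer a more careful evaluation; the sketch favors readability.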
In the exceptional case of u T Au = 0, the matrix C T C is diagonal, with zero Rayleigh quotient and infinite harmonic Rayleigh quotient. The homogeneous Rayleigh quotient α = (α 1 , α 2 ) is either zero (that is, (0, 1)), or the point at infinity (1, 0). The situation u T Au = 0 cannot happen if A is definite, such as for quadratic optimization problems with an SPD Hessian; see also Section 5. Moreover, u T Au = 0 may not be a very common situation in the context of eigenproblems; this usually means that we are approximating a zero eigenvalue or that u is a poor approximate eigenvector.
Rayleigh quotients have some natural invariance properties. It is easy to check that the standard Rayleigh quotient is invariant under nonzero scalings of u; it scales naturally with nonzero scalings of A, that is, θ(ζA) = ζ θ(A), and it shifts naturally with shifts of A, that is, θ(A + ηI) = θ(A) + η. The harmonic Rayleigh quotient (with target τ) only satisfies the first property, but modified properties still hold, as follows.
Proof. This is easy to check from (4).
As for the homogeneous Rayleigh quotient, it is easy to see (for instance, from (11)) that the solution to the minimization problem (8) is invariant under nonzero scaling of u: as function of u, we have α(ζu) = α(u) for ζ ≠ 0. Second, while the standard and harmonic Rayleigh quotients scale linearly under multiplications A → ζA, there is no easy relation between α(ζA) and α(A). However, in view of Proposition 1(vi), we establish these bounds:
\[ \min(\zeta\theta, \zeta\tilde\theta) \le \alpha(\zeta A) \le \max(\zeta\theta, \zeta\tilde\theta). \]
The same upper and lower bounds hold for ζα(A). Therefore, if the interval between ζθ and ζθ̃ is small, then α(ζA) and ζα(A) are close. On the size of this interval, we note that when θ ≤ θ̃ (which for instance holds in the SPD case), the "relative size" of the interval [θ, θ̃] can be expressed in terms of the angle between u and Au, since (θ̃ − θ)/θ = cos^{-2} ∠(Au, u) − 1 = tan^2 ∠(Au, u). This means that if ∠(Au, u) is small, the interval is small, and all three Rayleigh quotients are close. Still, Example 1 will show that one quotient can be considerably more accurate than the other two.
Moreover, writing t = tan ∠(Au, u), we have
\[ \frac{\alpha - \theta}{\theta} = \tfrac12 \big(t^2 - 1 - \theta^{-2}\big) + \sqrt{\tfrac14 \big(t^2 - 1 - \theta^{-2}\big)^2 + t^2}. \]
For θ^{-1} ≪ tan ∠(Au, u) (that is, for large eigenvalues), this expression is close to tan^2 ∠(Au, u), which means that α ≈ θ̃. In contrast, if θ^{-1} ≫ max(tan ∠(Au, u), 1) (i.e., for small eigenvalues), one may check that (α − θ)/θ = O(θ^2) ≈ 0, which implies that α ≈ θ. In conclusion, for a small ∠(Au, u), the homogeneous Rayleigh quotient is close to the Rayleigh quotient for small eigenvalues, and close to the harmonic Rayleigh quotient for large ones. Similar remarks can be made when θ̃ < θ.
Simple manipulations of Proposition 1(vi) also lead to some relations between the relative errors of the three Rayleigh quotients, when the estimated eigenvalues are either λ_1 or λ_n. If A is SPD, then both the standard and harmonic Rayleigh quotients lie in the interval spanned by the spectrum of A, i.e., θ, θ̃ ∈ [λ_1, λ_n]. Thus the following inequalities hold:
\[ \frac{\theta - \lambda_1}{\lambda_1} \le \frac{\alpha - \lambda_1}{\lambda_1} \le \frac{\tilde\theta - \lambda_1}{\lambda_1}, \qquad \frac{\lambda_n - \tilde\theta}{\lambda_n} \le \frac{\lambda_n - \alpha}{\lambda_n} \le \frac{\lambda_n - \theta}{\lambda_n}. \tag{12} \]
We conclude that, in the SPD case, the Rayleigh quotient is more accurate when we estimate the smallest eigenvalue of A, while the harmonic Rayleigh quotient is more accurate for the largest eigenvalue. The homogeneous Rayleigh quotient lies in between. This suggests the perhaps not so well-known result that, to approximate large eigenvalues, the harmonic Rayleigh quotient may be preferable over the standard Rayleigh quotient.
For indefinite matrices, while θ ∈ [λ 1 , λ n ], the harmonic and the homogeneous Rayleigh quotients can lie outside the spectrum. In particular, (12) does not hold if A is not SPD. Example 1 shows that the harmonic and homogeneous Rayleigh quotients may overestimate the largest eigenvalues, and the sensitivity of the homogeneous Rayleigh quotient may be smaller than the other two.
The quotients are θ(u) ≈ 1.9988 < λ_3, θ̃(u) ≈ 2.0003 > λ_3 and α(u) ≈ 2.00002 > λ_3, so the harmonic and the homogeneous Rayleigh quotients overestimate λ_3. Nevertheless, the homogeneous Rayleigh quotient is an order of magnitude more accurate than the harmonic Rayleigh quotient, and two orders more accurate than the standard Rayleigh quotient. This example highlights that the homogeneous Rayleigh quotient might be more accurate, especially for exterior eigenvalues of indefinite matrices. We will show another example in Figure 1 in Section 3.
When the wanted eigenvalue is in the interior of the spectrum, or if A is indefinite, it is more difficult to derive general conclusions about the ordering of the relative errors. We will discuss this in more detail in Section 3.

A Galerkin condition for the homogeneous Rayleigh quotient
In view of (2) and (6), the question arises whether there exists a Galerkin (orthogonality) condition based on the span of u and Au for the homogeneous approach. Let us first express the homogeneous Rayleigh quotient as a solution to a quadratic equation. As a direct consequence, this property connects the homogeneous Rayleigh quotient and the harmonic Rayleigh quotient with target.
Proposition 3. Let u^T A u ≠ 0. The homogeneous Rayleigh quotient (11) is the solution to
\[ u^T A u \; \alpha^2 + (u^T u - u^T A^2 u)\, \alpha - u^T A u = 0 \tag{13} \]
which satisfies α u^T A u > 0. Moreover, α is the harmonic Rayleigh quotient with target −α^{-1}, that is, θ̃_{−α^{-1}} = α.
Proof. It is straightforward to check that (11) is a solution to (13). The solutions to (13) have opposite signs since their product is −1. Given that 0 ≤ µ < min(u^T u, u^T A^2 u) from Proposition 1, the homogeneous Rayleigh quotient has the same sign as u^T A u, thus it corresponds to the solution to (13) for which α u^T A u > 0. The second part follows immediately from the fact that imposing θ̃_{−α^{-1}} = α is equivalent to solving (13).
We remark that the second solution of (13), which equals −α^{-1}, is related to the eigenvector of C^T C belonging to its largest eigenvalue, which maximizes (rather than minimizes) the homogeneous residual quantity (8).
Another equivalent point of view on this proposition is the following. Equation (13) can also be expressed as
\[ u^T (\alpha A + I)(A - \alpha I)\, u = 0. \tag{14} \]
Therefore, we have the nonlinear Galerkin condition
\[ (A - \alpha I)\, u \perp (\alpha A + I)\, u, \quad \text{or, in homogeneous coordinates,} \quad (\alpha_2 A - \alpha_1 I)\, u \perp (\alpha_1 A + \alpha_2 I)\, u. \]
This again highlights that α can be viewed as a harmonic Rayleigh quotient with −α^{-1} as target.
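The quadratic equation, the nonlinear Galerkin condition, and the interpretation as a harmonic Rayleigh quotient with target −α^{-1} can be verified numerically. The sketch below uses a hypothetical indefinite diagonal matrix and test vector; all variable names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, v):
    return [dot(row, v) for row in A]

# Hypothetical indefinite test problem.
A = [[-3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
u = [0.2, 0.1, 1.0]
Au = matvec(A, u)
uu, uAu, uAAu = dot(u, u), dot(u, Au), dot(Au, Au)

# Homogeneous Rayleigh quotient via the closed form (u^T A u != 0 assumed).
alpha = (uAAu - uu + math.hypot(uAAu - uu, 2.0 * uAu)) / (2.0 * uAu)

# Residual of the quadratic equation in alpha.
quadratic_residual = uAu * alpha**2 + (uu - uAAu) * alpha - uAu

# Nonlinear Galerkin condition: (A - alpha I) u is orthogonal to (alpha A + I) u.
r = [a - alpha * x for a, x in zip(Au, u)]
t = [alpha * a + x for a, x in zip(Au, u)]
galerkin_residual = dot(r, t)

# Harmonic Rayleigh quotient with target tau = -1/alpha reproduces alpha.
tau = -1.0 / alpha
w = [a - tau * x for a, x in zip(Au, u)]  # w = (A - tau I) u
theta_tau = tau + dot(w, w) / dot(u, w)
```

All three checks agree with α up to rounding error, and α carries the sign of u^T A u.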

Properties of the homogeneous residual quantity
There is a connection between the two minimal residual conditions of the standard and harmonic Rayleigh quotients ((3) and (7)) and the quadratic equation (13). The stationarity conditions for (3) and (7), respectively, can be stated as
\[ u^T (A - \gamma I)\, u = 0, \qquad u^T A\, (A - \gamma I)\, u = 0. \]
This means that (14) is a linear combination of the two stationarity conditions of the standard and the harmonic Rayleigh quotients. This fact, together with the bounds derived in Proposition 1(vi), lets us interpret the homogeneous Rayleigh quotient as a "mediator" between the standard and the harmonic Rayleigh quotient. Experiments in Section 3.3 will also highlight this mediating behavior.
It is an open question whether it is possible to relate the objective function (8) to a combination of the two residuals associated with the standard and the harmonic Rayleigh quotients. Nevertheless, it is possible to show that, when u^T A u ≠ 0, the homogeneous residual quantity is smaller than the other two residual quantities (see (3) and (7)). In fact, we can rewrite (8) in two ways:
\[ \min_{(\alpha_1, \alpha_2)} \|\alpha_1 u - \alpha_2 Au\| = \min_{\alpha}\, (1 + \alpha^2)^{-1/2}\, \|Au - \alpha u\| = \min_{\alpha}\, (1 + \alpha^{-2})^{-1/2}\, \|\alpha^{-1} Au - u\|, \tag{15} \]
meaning that the homogeneous residual quantity corresponds to the standard and harmonic residual quantities multiplied by a factor smaller than one.
From (15), we may also relate the homogeneous residual to a measure of distance between the homogeneous Rayleigh quotient and the spectrum of A. As stated in [1, Thm. 4.5.1] and [6, Thm. 3.7], for the standard Rayleigh quotient and the harmonic Rayleigh quotient there exist certain eigenvalues of A, say λ and λ̃, such that provided that Au ≠ 0 in the second case. These inequalities hold for any θ and θ̃, but the Rayleigh quotients θ and θ̃ have the advantage of minimizing the corresponding residual quantities. A similar result can be stated for the homogeneous Rayleigh quotient in terms of the chordal metric (see, e.g., [ where the upper bound is minimized by the choice of the homogeneous Rayleigh quotient for α. Note that, as explained in [7, Ch. 6], the chordal metric behaves counterintuitively for large eigenvalues: while the chordal metric is small, the relative error can still be large. For a fair comparison of the accuracy of the three Rayleigh quotients, we consider asymptotic bounds on their relative error when the vector u approximates an eigenvector in the next section.

Sensitivity analysis
We now study the sensitivity of the various Rayleigh quotients with respect to perturbations in the approximate eigenvector u. Note that this sensitivity is different from that expressed by the condition number κ(λ) of an eigenvalue, which is related to its perturbation as function of changes in the matrix A. We note that for a simple eigenvalue λ of a symmetric A, it holds that κ(λ) = 1; this means that the eigenvalue is perfectly conditioned (see, e.g., [1, p. 16]).
Studying the sensitivity with respect to the approximate eigenvector for the standard and harmonic Rayleigh quotient is certainly not new (cf., e.g., [14,15] and the references therein), but we derive a new result for the homogeneous Rayleigh quotient, and obtain expressions that allow an easy comparison of the three Rayleigh quotients. We will also comment on the differences between existing results and ours.

Rayleigh quotient and harmonic Rayleigh quotient
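As ansatz, suppose u = x + e is an approximate eigenvector, where ‖x‖ = 1, e ⊥ x, and ε := ‖e‖ is small. Then the perturbation of the standard Rayleigh quotient is
\[ \theta(x + e) = \frac{\lambda + e^T A e}{1 + \varepsilon^2} = \lambda + e^T (A - \lambda I)\, e + O(\varepsilon^4), \tag{17} \]
where we use the fact that (1 + ε^2)^{-1} = 1 − ε^2 + O(ε^4). Similarly, for the harmonic Rayleigh quotient with λ ≠ 0,
\[ \tilde\theta(x + e) = \frac{\lambda^2 + \|Ae\|^2}{\lambda + e^T A e} = \lambda + \lambda^{-1} e^T A (A - \lambda I)\, e + O(\varepsilon^4). \tag{18} \]
For the harmonic Rayleigh quotient with target τ ≠ λ the expression becomes
\[ \tilde\theta_\tau(x + e) = \lambda + (\lambda - \tau)^{-1} e^T (A - \tau I)(A - \lambda I)\, e + O(\varepsilon^4). \tag{19} \]
In particular, we note that θ(x) = θ̃(x) = θ̃_τ(x) = λ. We also point out that we have studied similar expressions for inverse Rayleigh quotients (as stepsizes for gradient methods) in [3].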
To derive the lower and upper bounds on the sensitivity of the approximate eigenvalues, we make use of this standard result for symmetric operators.
Lemma 2. Suppose u = x + e is an approximate eigenvector corresponding to a simple eigenvalue λ, with ‖x‖ = 1, e ⊥ x, and ε = ‖e‖, of a symmetric A. Let p be a polynomial. Then Proof. Since A is symmetric, it is diagonalizable by an orthogonal transformation, and we may assume that A = diag(λ_1, . . . , λ_n). With the notation λ = λ_j, it follows that |e^T p(A) e| = |Σ_{i≠j} e_i^2 p(λ_i)|, from which the result follows easily.
The above expressions imply the following lower and upper bounds on the sensitivity of the approximate eigenvalues as function of the approximate eigenvector. The assumption λ ≠ 0 in the next propositions may be viewed as non-restrictive: for zero eigenvalues we can consider the absolute error |θ(u) − λ| instead.
We mention that the bound (20) has a slightly improved version, as follows (cf., e.g., [15, Thm. 2.1] for the smallest eigenvalue). For this context we introduce ê = e/‖e‖ of unit length, u = x + e as before, and û = u/‖u‖, where ‖u‖^2 = 1 + ε^2. We can decompose û = cos ∠(û, x) x + sin ∠(û, x) ê. Then an easy computation gives û^T A û − λ = sin^2 ∠(û, x) ê^T (A − λI) ê and therefore To connect the approximate bound (20) to (22), we note that sin^2 ∠(û, x) = ε^2/(1 + ε^2), which is asymptotically equal to ε^2. The bound (22) is exact (i.e., not an asymptotic bound as in Proposition 4) and sharp, but it is asymptotically equal to our expression. An exact bound for the error of the harmonic Rayleigh quotient with target is provided in [14, Thm. 5.2]. Under certain assumptions mentioned in [14] for the target τ and ε, with x = x + ε e, it is shown that .
Since we do not make any assumptions on the target or the value of ‖e‖, our upper bound (21) includes absolute values. Discarding the ε^2-term in the denominator yields an approximate upper bound accurate to O(ε^4)-terms. In conclusion, (21) is less sharp in some situations but more general, and asymptotically the same as the result of [14]. Another good reason to consider (20) and (21) is that approximate bounds for the error of the homogeneous Rayleigh quotient can also be obtained via the second-order approximation of the quotient, as we have done in Proposition 4. At the end of this section, we will also point out that the three approximate bounds have the same form.

Homogeneous Rayleigh quotient
We now determine approximate bounds for the sensitivity of the homogeneous Rayleigh quotient.
Proposition 5. Suppose u = x + e is an approximate eigenvector corresponding to a simple eigenvalue λ ≠ 0 of a symmetric A, with ‖x‖ = 1, e ⊥ x, and ε = ‖e‖. Then, up to O(ε^4)-terms, for the sensitivity of the homogeneous Rayleigh quotient (as function of u) it holds that: Proof. From Proposition 1(iii), the homogeneous Rayleigh quotient can be written as
\[ \alpha(u) = -\omega + \operatorname{sign}(u^T A u)\, \sqrt{1 + \omega^2}, \qquad \omega = \frac{u^T u - u^T A^2 u}{2\, u^T A u}. \]
We can express ω in terms of λ and e as
\[ \omega = \frac{1 - \lambda^2 + \|e\|^2 - \|Ae\|^2}{2\, (\lambda + e^T A e)}. \]
Furthermore, when ‖e‖ = ε → 0, . For the square root, we use the first-order approximation √(1 + t) = 1 + ½ t + O(t^2). Discarding O(ε^4)-terms yields
\[ \alpha(x + e) \approx \lambda + (\lambda + \lambda^{-1})^{-1}\, e^T (A + \lambda^{-1} I)(A - \lambda I)\, e. \tag{23} \]
The claim follows from Lemma 2, with the polynomial p(t) = (t + λ^{-1})(t − λ). An alternative method to derive this result is via the Implicit Function Theorem. From (14), for the pair (α(u), u), we have the implicit equation
\[ u^T (\alpha(u) A + I)(A - \alpha(u) I)\, u = 0. \]
Since ∇α(x) = 0, we consider the second-order approximation of the homogeneous Rayleigh quotient α(x + e) ≈ α(x) + ½ e^T ∇^2 α(x) e. The action of ∇^2 α(x) can be obtained from the first-order approximation of the gradient in x: This also leads to the expression in (23).
Interestingly, by replacing τ = −λ −1 in the approximation of the harmonic Rayleigh quotient with target (19), we get the approximation (23) for the homogeneous Rayleigh quotient.
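The second-order approximation of the homogeneous Rayleigh quotient can be checked numerically: substituting τ = −λ^{-1} into the expansion of the harmonic Rayleigh quotient with target gives the prediction λ + (λ + λ^{-1})^{-1} e^T (A + λ^{-1} I)(A − λI) e, and the gap to the exact quotient should shrink like ε^4. The sketch below assumes a diagonal A; the eigenvalues, the perturbation direction, and the function names are hypothetical choices.

```python
import math

def homogeneous_rq_diag(lam, u):
    """Homogeneous Rayleigh quotient for A = diag(lam), assuming u^T A u != 0."""
    uu = sum(v * v for v in u)
    uAu = sum(l * v * v for l, v in zip(lam, u))
    uAAu = sum(l * l * v * v for l, v in zip(lam, u))
    b = uAAu - uu
    return (b + math.hypot(b, 2.0 * uAu)) / (2.0 * uAu)

def second_order_prediction(lam, j, e):
    """lam_j + (lam_j + 1/lam_j)^{-1} e^T (A + I/lam_j)(A - lam_j I) e for A = diag(lam)."""
    lj = lam[j]
    quad = sum(v * v * (l + 1.0 / lj) * (l - lj) for l, v in zip(lam, e))
    return lj + quad / (lj + 1.0 / lj)

lam = [-2.0, 0.5, 1.0, 3.0]          # hypothetical spectrum
j = 2                                # approximate the eigenvalue lam[2] = 1
direction = [1.0, -1.0, 0.0, 1.0]    # orthogonal to the eigenvector x = e_j
scale = math.sqrt(sum(d * d for d in direction))

gaps = []
for eps in (1e-1, 1e-2):
    e = [eps * d / scale for d in direction]
    u = list(e)
    u[j] += 1.0                      # u = x + e
    exact = homogeneous_rq_diag(lam, u)
    gaps.append(abs(exact - second_order_prediction(lam, j, e)))
```

Reducing ε by a factor 10 reduces the gap by roughly a factor 10^4, in line with the neglected O(ε^4) terms.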
We now compare the different asymptotic bounds. With a little abuse of notation, let θ(u) be any of the three Rayleigh quotients in u. Then the previous results can be summarized by where the different polynomials p λ (·) are reported in Table 1, along with their pointwise asymptotic behavior for λ → 0 and λ → ±∞. This table provides a naive explanation of what we will observe in the experiments: the homogeneous Rayleigh quotient tends to have a similar behavior to the Rayleigh quotient for small eigenvalues, and it follows the harmonic Rayleigh quotient for large eigenvalues.
We remark that, for the homogeneous Rayleigh quotient, p_λ = p_{−λ^{-1}}. In the special case that both λ and its anti-reciprocal −λ^{-1} lie in the spectrum of A, the asymptotic lower bound for the homogeneous Rayleigh quotient will be zero, since λ ≠ −λ^{-1} for real λ and both eigenvalues are roots of p_λ.
For the Rayleigh quotient, we have that max_{λ_i ≠ λ} |λ_i − λ| ∈ {|λ − λ_1|, |λ − λ_n|}. In contrast, the upper bounds for the harmonic Rayleigh quotient or the homogeneous Rayleigh quotient are more complicated. Viewing p_λ as a continuous function on [λ_1, λ_n], the maximum of the corresponding two upper bounds may be attained at the vertex of the parabola p_λ, or at the boundary points λ_1 and λ_n. Thus, in view of the discrete spectrum, we know that max where λ̂ = ½ λ for the harmonic Rayleigh quotient, while λ̂ = ½ (λ − λ^{-1}) for the homogeneous Rayleigh quotient. Although p_λ(t) always has a factor of t − λ, due to the quadratic nature of the upper bound for the harmonic and the homogeneous Rayleigh quotient, we cannot make any a priori comparison with the bound for the Rayleigh quotient; without additional information about the spectrum, we cannot improve on these considerations. The behavior of the relative errors and upper bounds for the various Rayleigh quotients is shown in the next section.

Comparison of the Rayleigh quotients
Without loss of generality, we may assume that A = diag(λ 1 , . . . , λ n ). We consider a family of 100 × 100 positive definite diagonal matrices where the eigenvalues have uniform distribution on the interval [0, 2σ], i.e., λ i ∼ U (0, 2σ), σ > 0. The second family contains 100 × 100 indefinite diagonal matrices where the eigenvalues have Gaussian distribution with mean 0 and variance σ 2 , which we indicate by λ i ∼ N (0, σ), σ > 0. Since the eigenvalues are drawn from continuous probability distributions, there is zero probability that two eigenvalues are equal, so the eigenvectors are well defined with probability one.
An eigenpair of a generic diagonal A is (λ, x), where x is one of the vectors of the canonical basis of R^100. To compute a random perturbation u = x + ε e, such that ‖e‖ = 1 and x ⊥ e, we start from a vector with random Gaussian components, project it onto the orthogonal complement of x and normalize to get e. Finally we normalize u.
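The construction of the perturbed eigenvector can be sketched as follows for a canonical basis vector x = e_j. The function name `perturbed_eigenvector` and the seeded generator are illustrative assumptions; for a canonical basis vector, projecting onto the orthogonal complement of x amounts to zeroing the j-th component.

```python
import math
import random

def perturbed_eigenvector(n, j, eps, rng):
    """Return (u, e) with u = (x + eps*e)/||x + eps*e||, x = e_j, ||e|| = 1, e orthogonal to x."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    g[j] = 0.0                                   # project onto the complement of x = e_j
    norm_g = math.sqrt(sum(v * v for v in g))
    e = [v / norm_g for v in g]                  # unit perturbation direction
    u = [eps * v for v in e]
    u[j] += 1.0                                  # u = x + eps * e
    norm_u = math.sqrt(sum(v * v for v in u))    # equals sqrt(1 + eps^2)
    return [v / norm_u for v in u], e

rng = random.Random(0)                           # fixed seed for reproducibility
u, e = perturbed_eigenvector(100, 3, 1e-3, rng)
```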
First, we study the behavior of the different Rayleigh quotients when perturbed, with σ ∈ {0.5, 1, 5, 10} and ε = 0.001. For each eigenvector x we draw 100 random perturbations u_j, and take the maximum relative error. The quantity of interest for, e.g., the standard Rayleigh quotient is max_j |θ(u_j) − λ| / |λ|. Figure 1 shows the maximum relative error as a function of the estimated eigenvalue λ. In these examples, there are peaks close to zero since we consider relative errors; therefore, the Rayleigh quotients are more sensitive when small eigenvalues are approximated. In particular, the harmonic Rayleigh quotient is always more sensitive than the other two for small eigenvalues. By looking at the two families separately, Gaussian and uniformly distributed, we notice that as σ increases, the curves of the Rayleigh quotient and the harmonic Rayleigh quotient remain almost unchanged. In the uniformly distributed case, when σ = 0.5, the homogeneous Rayleigh quotient closely follows the Rayleigh quotient in the first half of the spectrum and slightly departs in the second half. Starting from σ = 1, the curve gets increasingly closer to the harmonic Rayleigh quotient until the two curves are very similar, except for the smaller eigenvalues, where the relative error of the homogeneous Rayleigh quotient is lower than that of the harmonic Rayleigh quotient. This trend can be partially explained by the asymptotic behavior of the polynomials in Table 1: the homogeneous and harmonic Rayleigh quotient have the same behavior for large eigenvalues, while the homogeneous Rayleigh quotient tends to be similar to the standard Rayleigh quotient for small eigenvalues. We observe the same characteristics in the Gaussian family, with the interesting addition that when σ = 1, the homogeneous Rayleigh quotient is even less sensitive than the harmonic Rayleigh quotient.
We have also considered different magnitudes of perturbation ε, where we observe that the relative errors increase as ε increases, but the relative positions of the Rayleigh quotients remain the same. Finally, we look at the asymptotic bounds of Section 3, when λ_i ∼ N(0, 1) and λ_i ∼ U(0, 2). Instead of just showing the maximum relative error, we draw 10 random perturbations for each eigenvector, and plot the corresponding relative errors. The results are presented in Figure 2. We observe that the lower bounds are more erratic than the upper bounds. This can be explained as follows. In Section 3.2 we have remarked that all bounds depend on a continuous nonnegative function |p_λ(t)|, which has a zero in t = λ (cf. Table 1). Therefore, given a λ_j, the minimizer argmin_{λ_i ≠ λ} |p_λ(λ_i)| is typically either λ_{j−1} or λ_{j+1}; the maximizer is generally either λ_1 or λ_n (cf. Table 1). For this reason, the lower bound shows more variability. As already mentioned, while this statement holds for the Rayleigh quotient, this is not always true for the harmonic and the homogeneous Rayleigh quotients. Regarding the behavior of the relative errors, the plots in Figure 2 reasonably reflect the ones in Figure 1. For the uniformly distributed eigenvalues, the upper bound seems to be sharp at the extremes of the spectrum, while for the Gaussian family it is sharper only close to the smallest eigenvalues (in magnitude). The upper bound for the Rayleigh quotient is also sharp for the largest eigenvalues. While the lower bound of the uniformly distributed family poorly reflects the behavior of the relative error, in the Gaussian family it is tighter. In particular, it is very accurate for the homogeneous Rayleigh quotient. In Figure 1, we have already remarked that the maximum relative error in the homogeneous Rayleigh quotient is lower than for the other Rayleigh quotients. Now we see that for some perturbations, the sensitivity can be even much lower.
So far we have discussed the properties of the homogeneous Rayleigh quotient and compared it to the well-known standard and harmonic Rayleigh quotient. Now we extend the homogeneous Rayleigh quotient to the generalized eigenvalue problem and present an application of Rayleigh quotients to gradient methods for unconstrained optimization problems.
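A scaled-down version of this experiment can be sketched as follows. It is not the setup behind Figure 1: the matrix size (20), the number of perturbations (20), the seed, and all function names are assumptions made for illustration. The vector u is left unnormalized, since all three quotients are invariant under scaling of u.

```python
import math
import random

def quotients(lam, u):
    """Standard, harmonic and homogeneous Rayleigh quotients for A = diag(lam)."""
    uu = sum(v * v for v in u)
    uAu = sum(l * v * v for l, v in zip(lam, u))
    uAAu = sum(l * l * v * v for l, v in zip(lam, u))
    theta = uAu / uu
    theta_h = uAAu / uAu
    alpha = (uAAu - uu + math.hypot(uAAu - uu, 2.0 * uAu)) / (2.0 * uAu)
    return theta, theta_h, alpha

def max_relative_errors(lam, j, eps, trials, rng):
    """Maximum relative errors [standard, harmonic, homogeneous] over random perturbations of e_j."""
    worst = [0.0, 0.0, 0.0]
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in lam]
        g[j] = 0.0                               # perturbation orthogonal to x = e_j
        norm_g = math.sqrt(sum(v * v for v in g))
        u = [eps * v / norm_g for v in g]
        u[j] = 1.0                               # u = x + eps * e
        for i, q in enumerate(quotients(lam, u)):
            worst[i] = max(worst[i], abs(q - lam[j]) / abs(lam[j]))
    return worst

rng = random.Random(1)
sigma = 1.0
lam = sorted(rng.uniform(0.0, 2.0 * sigma) for _ in range(20))  # SPD family U(0, 2*sigma)
errors_smallest = max_relative_errors(lam, 0, 1e-3, 20, rng)
errors_largest = max_relative_errors(lam, len(lam) - 1, 1e-3, 20, rng)
```

In this SPD setting, the ordering of the errors at the two ends of the spectrum follows from the fact that the homogeneous quotient lies between the other two: for the smallest eigenvalue the standard quotient has the smallest error, for the largest eigenvalue the harmonic one.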

The generalized eigenvalue problem
Let us consider the generalized eigenvalue problem (GEP) Ax = λBx, where A is symmetric (definite or indefinite), and B is SPD. This problem has n real eigenvalues λ_1 ≤ ⋯ ≤ λ_n. Although a common generalized Rayleigh quotient is u^T A u / u^T B u (see, e.g., [1, Ch. 15]), it turns out that, in our context, the most relevant quantities are
\[ \theta = \frac{u^T B A u}{u^T B^2 u}, \qquad \tilde\theta = \frac{u^T A^2 u}{u^T A B u}, \tag{25} \]
for nonzero u^T A B u. The Rayleigh quotient θ satisfies Au − θBu ⊥ Bu, and, equivalently, it is the solution to min_γ ‖Au − γBu‖. The harmonic Rayleigh quotient θ̃ satisfies Au − θ̃Bu ⊥ Au and solves min_γ ‖γ^{-1} Au − Bu‖. The extension of the homogeneous Rayleigh quotient for the GEP is rather straightforward. It is the solution to
\[ \min_{(\alpha_1, \alpha_2) \in \mathbb{P},\ \alpha_1^2 + \alpha_2^2 = 1} \|\alpha_1 Bu - \alpha_2 Au\|. \tag{26} \]
As for the standard eigenvalue problem, this amounts to solving a reduced SVD of an n × 2 matrix, or an eigenvalue problem involving a 2 × 2 matrix. The next result is a generalization of Proposition 1. Let C = [Bu  −Au].
Proposition 6. Suppose that u^T A B u ≠ 0 and denote by µ the smallest eigenvalue of C^T C. Then the following properties hold.
(i) µ is a simple eigenvalue of C T C, or equivalently √ µ is a simple singular value of C; its corresponding eigenvector [α 1 , α 2 ] T is the unique minimizer of (26), up to orthogonal transformations.
(iii) For the generalized homogeneous Rayleigh quotient α we have (iv) µ = 0 if and only if u is an eigenvector. If u is an eigenvector, then the corresponding homogeneous Rayleigh quotient α is the corresponding eigenvalue.
(vi) We have the following inequalities Proof. This proof follows the exact same lines as those of Proposition 1.
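For the generalized problem, the same 2 × 2 reduction applies with u replaced by Bu. The sketch below computes the three quotients from the scalars ‖Au‖^2, ‖Bu‖^2 and u^T A B u; the function name `generalized_quotients` and the small test matrices are hypothetical, and the closed form is the one obtained by repeating the derivation for the standard problem with C = [Bu, −Au].

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def generalized_quotients(A, B, u):
    """Return (theta, theta_h, alpha) for the pencil (A, B); assumes u^T A B u != 0."""
    Au, Bu = matvec(A, u), matvec(B, u)
    aa, bb, ab = dot(Au, Au), dot(Bu, Bu), dot(Au, Bu)   # ab = u^T A B u (A symmetric)
    if ab == 0.0:
        raise ZeroDivisionError("u^T A B u = 0: quotients undefined")
    theta = ab / bb                                       # Au - theta*Bu orthogonal to Bu
    theta_h = aa / ab                                     # Au - theta_h*Bu orthogonal to Au
    alpha = (aa - bb + math.hypot(aa - bb, 2.0 * ab)) / (2.0 * ab)
    return theta, theta_h, alpha

# Hypothetical pencil: A symmetric, B SPD.
A = [[2.0, 1.0], [1.0, 3.0]]
B = [[2.0, 0.0], [0.0, 1.0]]
u = [1.0, 0.5]
theta, theta_h, alpha = generalized_quotients(A, B, u)

# Nonlinear Galerkin check: (A - alpha B) u is orthogonal to (alpha A + B) u.
Au, Bu = matvec(A, u), matvec(B, u)
galerkin_residual = dot([a - alpha * b for a, b in zip(Au, Bu)],
                        [alpha * a + b for a, b in zip(Au, Bu)])
```

For an exact eigenvector, all three quantities reduce to the generalized eigenvalue; for instance, with A = diag(4, 3), B = diag(2, 1) and u = e_1 they all equal 2.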
As in Section 2, we can show that the generalized homogeneous Rayleigh quotient is a solution to a quadratic equation.
Proposition 7. Let u^T A B u ≠ 0. The generalized homogeneous Rayleigh quotient (11) is the solution to
\[ u^T A B u \; \alpha^2 + (u^T B^2 u - u^T A^2 u)\, \alpha - u^T A B u = 0 \tag{28} \]
which satisfies α u^T A B u > 0.
Proof. The proof is similar to that of Proposition 3.
We derive a Galerkin condition from Proposition 7 as follows. Since u^T A B u = u^T B A u, (28) is equivalent to u^T (αA + B)(A − αB) u = 0. Therefore, we can write the corresponding nonlinear Galerkin condition in homogeneous coordinates as
\[ (\alpha_2 A - \alpha_1 B)\, u \perp (\alpha_1 A + \alpha_2 B)\, u. \]
Finally, we discuss a bound for the chordal metric, analogous to (16). For an SPD B, as a generalization to [1, Thm. 4.5.1], one can show that there exists a generalized eigenvalue λ = λ(A, B) of (A, B) such that |λ − α| ≤ Indeed, if α is not an eigenvalue of (A, B), Therefore, for the chordal distance between λ and α we have This upper bound is minimized for α equal to the generalized homogeneous Rayleigh quotient. We notice that this inequality differs from (16) by the factor (λ_1(B))^{-1}.
In the next section, we return to the standard homogeneous Rayleigh quotient, and study its use as stepsize in gradient methods.

A homogeneous stepsize for gradient methods
In gradient methods for nonlinear optimization, inverse Rayleigh quotients are popular choices for stepsizes. We refer to [16] for a nice recent review about steplength selection. We exploit the inverse of the homogeneous Rayleigh quotient as a new homogeneous stepsize for gradient-type methods. Consider the unconstrained minimization of a smooth function f : R n → R, min x∈R n f (x). Gradient methods are of the form where g k = ∇f (x k ), β k > 0 is the stepsize, and α k the inverse stepsize. As usual we write s k−1 = x k − x k−1 and y k−1 = g k − g k−1 .
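In [5], the Barzilai-Borwein stepsizes have been introduced:
$$\beta_k^{\mathrm{BB1}} = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}, \qquad \beta_k^{\mathrm{BB2}} = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}. \qquad (29)$$
The motivation is that, when we approximate the Hessian in $x_{k-1}$ by a scalar multiple of the identity, i.e., $\nabla^2 f(x_{k-1}) \approx \gamma I$, the corresponding inverse stepsizes satisfy the following least squares secant conditions:
$$\alpha_k^{\mathrm{BB1}} = \operatorname*{argmin}_{\gamma} \|y_{k-1} - \gamma\, s_{k-1}\|, \qquad \alpha_k^{\mathrm{BB2}} = \operatorname*{argmin}_{\gamma} \|\gamma^{-1} y_{k-1} - s_{k-1}\|. \qquad (30)$$
Moreover, both stepsizes (29) can be seen as inverse Rayleigh quotients of a certain matrix $H_k$ at $s_{k-1}$. In fact, for any twice continuously differentiable f it holds that $y_{k-1} = H_k\, s_{k-1}$, where
$$H_k = \int_0^1 \nabla^2 f\big((1-t)\, x_{k-1} + t\, x_k\big)\, dt$$
is an average Hessian on the line segment between $x_{k-1}$ and $x_k$. From this relation, it is easy to see that the minimum residual conditions (3) and (7) are equivalent to the secant conditions (30) for $u = s_{k-1}$ and $A = H_k$.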
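To illustrate the Rayleigh quotient interpretation, the sketch below evaluates the two BB stepsizes for a quadratic model, where the average Hessian is the constant matrix A and therefore y = A s holds exactly. It assumes the usual definitions BB1 = sᵀs / sᵀy and BB2 = sᵀy / yᵀy; the function names and the small test matrix are illustrative only.

```python
def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))


def matvec(M, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def bb_stepsizes(s, y):
    """Return (BB1, BB2) = (s^T s / s^T y, s^T y / y^T y)."""
    sy = dot(s, y)
    if sy == 0.0:
        raise ValueError("s^T y must be nonzero")
    return dot(s, s) / sy, sy / dot(y, y)


def rayleigh_quotient(A, u):
    """Standard Rayleigh quotient u^T A u / u^T u."""
    return dot(u, matvec(A, u)) / dot(u, u)


def harmonic_rayleigh_quotient(A, u):
    """Harmonic Rayleigh quotient with target 0: ||A u||^2 / u^T A u."""
    Au = matvec(A, u)
    return dot(Au, Au) / dot(u, Au)


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0]]  # SPD Hessian of a quadratic model
    s = [1.0, 1.0]
    y = matvec(A, s)              # y = A s for a quadratic
    bb1, bb2 = bb_stepsizes(s, y)
    # The inverse stepsizes coincide with the two Rayleigh quotients.
    print(1.0 / bb1, rayleigh_quotient(A, s))
    print(1.0 / bb2, harmonic_rayleigh_quotient(A, s))
```

When sᵀy > 0, the Cauchy-Schwarz inequality gives BB2 ≤ BB1, which the example reproduces.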
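We introduce the homogeneous BB stepsize (HBB) as the inverse of the homogeneous Rayleigh quotient of $H_k$ at $s_{k-1}$. The HBB stepsize is given by the quotient $\beta_k^{\mathrm{HBB}} = \alpha_{2,k} / \alpha_{1,k}$, where the pair $(\alpha_{1,k}, \alpha_{2,k})$ solves
$$\min_{\alpha_1^2 + \alpha_2^2 = 1} \|\alpha_2\, y_{k-1} - \alpha_1\, s_{k-1}\|.$$
As in (30), this is equivalent to the minimum residual condition (8) for $u = s_{k-1}$ and $A = H_k$. Therefore (cf. Proposition 1)
$$\beta_k^{\mathrm{HBB}} = \frac{\|s_{k-1}\|^2 - \|y_{k-1}\|^2 + \sqrt{\left(\|s_{k-1}\|^2 - \|y_{k-1}\|^2\right)^2 + 4\,(s_{k-1}^T y_{k-1})^2}}{2\, s_{k-1}^T y_{k-1}}. \qquad (31)$$
As mentioned at the end of Section 1, this stepsize has been proposed very recently and independently in [12], obtained from a total least squares secant condition. In this section, we will also introduce an alternating variant of this HBB step.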
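The HBB stepsize can be sketched in a few lines. The closed form below is an assumption of this sketch: it is the expression obtained by solving the 2 × 2 eigenvalue problem behind the constrained minimization, which leads to the quadratic (sᵀy) α² + (sᵀs − yᵀy) α − sᵀy = 0 for the inverse stepsize α. The name `hbb_stepsize` is hypothetical.

```python
import math


def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))


def hbb_stepsize(s, y):
    """Homogeneous BB stepsize beta = alpha_2 / alpha_1.

    Computes (d + r) / (2 s^T y) with d = s^T s - y^T y and
    r = sqrt(d^2 + 4 (s^T y)^2), using an equivalent form when d < 0
    to avoid cancellation. The sign of beta equals the sign of s^T y.
    """
    ss, yy, sy = dot(s, s), dot(y, y), dot(s, y)
    if sy == 0.0:
        raise ValueError("s^T y must be nonzero")
    d = ss - yy
    r = math.hypot(d, 2.0 * sy)
    return (d + r) / (2.0 * sy) if d >= 0.0 else 2.0 * sy / (r - d)


if __name__ == "__main__":
    s = [1.0, 1.0]
    y = [3.0, 4.0]
    beta = hbb_stepsize(s, y)
    bb1 = dot(s, s) / dot(s, y)
    bb2 = dot(s, y) / dot(y, y)
    print(bb2, beta, bb1)
```

In this example, with sᵀy > 0, the HBB value lies between BB2 and BB1. For an uphill direction (sᵀy < 0) the function returns a negative value, so a practical method has to replace it by a positive safeguard.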
In Section 5.1 we will carry out some experiments to test the behavior of the HBB stepsize when plugged into a gradient method for general differentiable functions. A pseudocode of this method is provided in Algorithm 1.
This algorithm is similar to the one proposed in [16,17], with the homogeneous stepsize (31) as the key difference in Line 9. In the presence of an uphill direction, i.e., when $s_k^T y_k < 0$, the homogeneous stepsize is negative (cf. Proposition 1 (vi)), and therefore must be replaced with a positive quantity. We choose the new stepsize as in [17] (cf. Line 8 of Algorithm 1). The convergence of the method is not affected by these choices, since $\beta_k$ stays uniformly bounded, i.e., $\beta_k \in [\beta_{\min}, \beta_{\max}]$ for all k. Therefore, the proof of global convergence of Algorithm 1 can be easily adapted from [17, Thm. 2.1]. While the convergence is not affected, choosing the homogeneous stepsize as the starting steplength in the nonmonotone line search might lead to fewer backtracking steps than the classical BB stepsizes.
We finally remark that, as for the BB stepsizes, no line search is required for HBB steps when the function f is strictly convex quadratic, i.e., $f(x) = \frac{1}{2} x^T A x - b^T x$, with A SPD (see the results in [18,19]).

Algorithm 1 A homogeneous gradient method for minimization of general functions
Input: Continuously differentiable function f, initial guess $x_0$, initial stepsize $\beta_0 > 0$, tolerance tol; safeguarding parameters $\beta_{\max} > \beta_{\min} > 0$; line search parameters $c_{\mathrm{ls}}, \sigma_{\mathrm{ls}} \in (0, 1)$; memory integer $M > 0$
Output: Approximation to minimizer $\operatorname{argmin}_x f(x)$
1: Set $g_0 = \nabla f(x_0)$
2: for k = 0, 1, . . .
4: Set $s_k = -\nu_k\, g_k$ and update $x_{k+1} = x_k + s_k$
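For the strictly convex quadratic case mentioned above, where no line search is needed, the gradient iteration with HBB steps can be sketched as follows. This is a simplified illustration under stated assumptions, not Algorithm 1 itself: it omits the nonmonotone line search and the safeguards, it uses the closed form of the HBB stepsize derived from the 2 × 2 eigenvalue problem, and all function names and default parameters are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def hbb_stepsize(s, y):
    """HBB stepsize (d + r) / (2 s^T y), d = s^T s - y^T y, r = sqrt(d^2 + 4 (s^T y)^2)."""
    ss, yy, sy = dot(s, s), dot(y, y), dot(s, y)
    d = ss - yy
    r = math.hypot(d, 2.0 * sy)
    return (d + r) / (2.0 * sy) if d >= 0.0 else 2.0 * sy / (r - d)


def hbb_gradient_quadratic(A, b, x0, beta0=1.0, tol=1e-8, max_iter=10000):
    """Minimize 0.5 x^T A x - b^T x (A SPD) with HBB steps, no line search.

    Stops when ||g_k|| <= tol * ||g_0|| or after max_iter iterations.
    Returns the final iterate and the number of iterations performed.
    """
    x = list(x0)
    g = [p - q for p, q in zip(matvec(A, x), b)]  # gradient A x - b
    g0_norm = math.sqrt(dot(g, g))
    beta = beta0
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol * g0_norm:
            return x, k
        s = [-beta * gi for gi in g]               # s_k = -beta_k g_k
        x = [xi + si for xi, si in zip(x, s)]      # x_{k+1} = x_k + s_k
        g_new = [p - q for p, q in zip(matvec(A, x), b)]
        y = [p - q for p, q in zip(g_new, g)]      # y_k = g_{k+1} - g_k
        g = g_new
        if dot(s, y) > 0.0:                        # always true in exact arithmetic for SPD A
            beta = hbb_stepsize(s, y)
    return x, max_iter


if __name__ == "__main__":
    A = [[1.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 10.0]]
    b = [1.0, 3.0, 10.0]                           # exact minimizer is (1, 1, 1)
    x, iters = hbb_gradient_quadratic(A, b, [0.0, 0.0, 0.0])
    print(x, iters)
```

For general, nonquadratic functions the stepsize would additionally be projected onto $[\beta_{\min}, \beta_{\max}]$ and used as the initial trial step of the line search, as described in the text.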

Numerical experiments
This subsection is devoted to testing the use of the homogeneous stepsize HBB, and an adaptive variant, in Algorithm 1 on a set of unconstrained optimization problems. We take some general continuously differentiable functions, with the starting points suggested therein, from the collections in [17,21,22], as listed in Table 2. For all the test functions, the size can be $n \in \{10^2, 10^3, 10^4\}$. The generalized Rosenbrock, generalized White and Holst, and extended Powell objective functions have been scaled by the Euclidean norm of the first gradient.
As for the parameters of Algorithm 1, we set $\beta_{\min} = 10^{-30}$, $\beta_{\max} = 10^{30}$, $c_{\mathrm{ls}} = 10^{-4}$, $\sigma_{\mathrm{ls}} = \frac{1}{2}$, $M = 10$, and $\beta_0 = 1$. The algorithm stops when $\|g_k\| \le \mathrm{tol} \cdot \|g_0\|$, with $\mathrm{tol} = 10^{-6}$, or when $5 \cdot 10^4$ iterations are reached. All the steps in Table 3 are tested, meaning that we change the stepsize choice in Line 9 of Algorithm 1 and obtain several gradient methods. Along with the homogeneous stepsize (HBB), BB1 and BB2, we also consider the adaptive ABB method [23]:
$$\beta_k^{\mathrm{ABB}} = \begin{cases} \min\{\beta_j^{\mathrm{BB2}} : j = \max\{1, k-m\}, \dots, k\} & \text{if } \beta_k^{\mathrm{BB2}} < \eta\, \beta_k^{\mathrm{BB1}}, \\ \beta_k^{\mathrm{BB1}} & \text{otherwise.} \end{cases} \qquad (32)$$
Default parameter values are η = 0.8 and m = 5; see, e.g., [16]. This stepsize is mostly known as ABB$_{\min}$, while ABB indicates the case m = 0; we choose to indicate it as ABB for any m to ease the notation. Inspired by this, we also propose a straightforward generalization of our HBB step: the adaptive HBB method (AHBB), which takes the stepsize

Since the stopping criterion is based on the gradient norm, and $s_k = -\nu_k\, g_k$ (cf. Line 4 in Algorithm 1), the homogeneous stepsize requires the computation of the two inner products $y_k^T y_k$ and $g_k^T y_k$, and therefore it has the same cost as BB2. In general, all the stepsizes in Table 3 have very similar cost; BB1 is slightly cheaper, since it does not require the inner product $y_k^T y_k$.

Table 4 reports the number of function evaluations and the number of iterations for each problem and stepsize. We remark that, for each problem, all methods converged to the same stationary point. It seems that either HBB or AHBB can be competitive with, or better than, ABB, which is well known for its generally good behavior. For example, HBB behaves nicely in problems FH1 and FH2; AHBB is the best method for the generalized White and Holst function and the perturbed quadratic ($n = 10^4$).
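The cost remark above can be made concrete with a short sketch. Because $s_k = -\nu_k g_k$, the products $s_k^T s_k = \nu_k^2\, g_k^T g_k$ and $s_k^T y_k = -\nu_k\, g_k^T y_k$ follow from quantities that are either already available ($g_k^T g_k$, from the stopping test) or counted in the text ($g_k^T y_k$ and $y_k^T y_k$). The closed form of the stepsize is the one obtained from the 2 × 2 eigenvalue problem and is an assumption of this sketch, as are the function names.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def hbb_stepsize(s, y):
    """Reference version that forms all three inner products from s and y."""
    ss, yy, sy = dot(s, s), dot(y, y), dot(s, y)
    d = ss - yy
    r = math.hypot(d, 2.0 * sy)
    return (d + r) / (2.0 * sy) if d >= 0.0 else 2.0 * sy / (r - d)


def hbb_from_gradient_products(nu, gg, gy, yy):
    """HBB stepsize from nu and the scalars g^T g, g^T y, y^T y, given s = -nu * g.

    Uses s^T s = nu^2 g^T g and s^T y = -nu g^T y, so only g^T y and y^T y
    are new inner products once g^T g is known from the stopping test.
    """
    ss = nu * nu * gg
    sy = -nu * gy
    if sy == 0.0:
        raise ValueError("g^T y must be nonzero")
    d = ss - yy
    r = math.hypot(d, 2.0 * sy)
    return (d + r) / (2.0 * sy) if d >= 0.0 else 2.0 * sy / (r - d)


if __name__ == "__main__":
    nu = 0.5
    g = [1.0, 2.0]
    y = [-1.0, -3.0]
    s = [-nu * gi for gi in g]
    print(hbb_stepsize(s, y), hbb_from_gradient_products(nu, dot(g, g), dot(g, y), dot(y, y)))
```

Both routes give the same value up to rounding; the second avoids forming $s_k$ explicitly for the stepsize computation.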
the homogeneous Rayleigh quotient seems to leverage the two Rayleigh quotients: it is less sensitive than the harmonic Rayleigh quotient when estimating the smallest eigenvalues, and less sensitive than the standard Rayleigh quotient when estimating the largest eigenvalues (in magnitude).
We have derived a nonlinear Galerkin condition for the homogeneous Rayleigh quotient, in contrast to the linear Galerkin conditions for the standard and the harmonic Rayleigh quotient. All the results extend to the homogeneous Rayleigh quotient for the generalized eigenvalue problem (A, B), when B is SPD and A is symmetric.
Finally, we have considered the homogeneous Rayleigh quotient as inverse stepsize (HBB) in gradient methods for unconstrained optimization problems. This has independently been obtained very recently from a different angle in [12]. We have also proposed the AHBB steplength as an alternative to the ABB stepsize [23], based on the homogeneous stepsize. Experiments show that this variant sometimes performs better than the classical BB steplengths and ABB.