A New Method Based on Locally Optimal Step Length in Accelerated Gradient Descent for Quantum State Tomography

Quantum state tomography (QST) is one of the key steps in determining the state of a quantum system, which is essential for understanding and controlling it. With statistical data from measurements and Positive Operator-Valued Measures (POVMs), the goal of QST is to find a density operator that best fits the measurement data. Several optimization-based methods have been proposed for QST, and one of the most successful approaches is based on Accelerated Gradient Descent (AGD) with a fixed step length. While AGD with a fixed step size is easy to implement, it is computationally inefficient when the time required to calculate the gradient is high. In this paper, we propose a new method for locally optimal step-length adaptation, which results in a much faster version of AGD for QST. Numerical results confirm that the proposed method is much more time-efficient than other similar methods due to the optimized step size.


Introduction
Quantum physics emerged from Albert Einstein's efforts to explain the "photoelectric effect", which suggested that light can behave like a particle. Other scientists explored the alternative idea that particles such as electrons can behave like waves [1], and this wave-like behavior of particles was later mathematically formulated by Erwin Schrödinger. Schrödinger's equation provides a theoretical foundation for quantum mechanics, but when it comes to making measurements and interpreting experimental data, statistical tools are essential. One of these tools is quantum state tomography (QST), which is briefly explained in the next section.
In quantum computing and quantum information theory, QST is a crucial step in determining the state of a quantum system, which is essential for understanding and controlling it. It uses many identical particles, each measured in a slightly different way. By piecing together these measurements, scientists can build a picture of the original quantum state. In QST, projections are the results of measurements on the quantum system, expressed as probabilities and expectation values.
It is worth noting that QST is a general concept that can be applied to any quantum system, including digital quantum computers [2] and analog quantum simulators [3]. In [3], a tomography approach (described in terms of the Positive Operator-Valued Measure (POVM) formalism) that is implementable on analog quantum simulators, including ultra-cold atoms, is proposed. Additionally, since quantum sensing and imaging technologies have many promising applications in optical measurements that exploit entanglement, such as quantum lithography [4], one application of QST is in quantum sensing [5].
The QST problem can be formulated as a smooth optimization problem, and as a result, gradient-based methods [6] can be used to solve it. If we denote the value of the objective function of the optimization (minimization) problem at point x by f(x), gradient descent uses x_{k+1} = x_k − ε∇f(x_k), where ∇f(x_k) is the gradient of f at x_k, i.e., the vector of first-order partial derivatives of f with respect to the decision variables, and ε is a small number representing the step size. However, while gradient-based methods are easy to implement and highly scalable, they are very slow to converge.
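For illustration, the basic gradient-descent iteration described above can be sketched in a few lines of NumPy (the function and variable names here are ours, purely for exposition):

```python
import numpy as np

def gradient_descent(grad_f, x0, eps=0.1, num_iters=100):
    """Plain gradient descent: x_{k+1} = x_k - eps * grad_f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - eps * grad_f(x)
    return x

# Example: minimize f(x) = ||x||^2, whose gradient is 2x; the minimizer is 0.
x_min = gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0])
```

On this toy quadratic, the iterates contract by a constant factor per step, which already hints at why the choice of step size matters.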
Another approach is to generalize the iteration by using x_{k+1} = x_k − εD∇f(x_k), where D is a square matrix. Note that if we set D equal to the identity matrix, then we recover x_{k+1} = x_k − ε∇f(x_k), as in gradient descent.
If we set D = (∇²f(x))⁻¹, then we obtain the Newton method [6], which converges very quickly in terms of the number of iterations, but (∇²f(x))⁻¹ is not only challenging and time-consuming to compute, it also requires a lot of memory (RAM). One solution is to use less idealistic but more practical choices of D. For instance, L-BFGS [7] tries to directly approximate the vector (∇²f(x))⁻¹∇f(x) using a limited number of previous gradients stored in RAM. Another class of methods that is less demanding in terms of memory usage is Accelerated Gradient Descent (AGD) [8-10], which is based on running two iterative procedures simultaneously. The advantages of this approach are its ease of implementation, relatively fast convergence, and lower memory footprint compared to L-BFGS.
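As a sketch of the two coupled iterative procedures underlying AGD, consider the following momentum-type recursion (this is one common form of AGD, not necessarily the exact variant of [8-10]; all names are illustrative):

```python
import numpy as np

def accelerated_gd(grad_f, x0, eta=0.1, mu=0.9, num_iters=200):
    """AGD with two coupled sequences:
        u_{k+1} = z_k - eta * grad_f(z_k)
        z_{k+1} = u_{k+1} + mu * (u_{k+1} - u_k)
    """
    u = np.asarray(x0, dtype=float)
    z = u.copy()
    for _ in range(num_iters):
        u_next = z - eta * grad_f(z)
        z = u_next + mu * (u_next - u)  # momentum extrapolation
        u = u_next
    return u

# Same toy quadratic as before: f(x) = ||x||^2, gradient 2x, minimizer 0.
x_min_agd = accelerated_gd(lambda x: 2 * x, x0=[3.0, -4.0])
```

The auxiliary sequence z_k extrapolates along the direction of recent progress, which is what gives AGD its faster convergence relative to plain gradient descent.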
To solve the QST problem, a solver based on Accelerated Gradient Descent (AGD) followed by a singular value decomposition (SVD) projection step is proposed in [2]; a MATLAB implementation is available on GitHub [11]. In [12], compressed sensing (CS) is proposed for QST. In [13], AGD is applied again; the SVD projection is bypassed by introducing a non-convex programming formulation of the QST problem, and the Python code is made available [14]. A neural network-based method is presented in [15], and the code is available on GitHub [16]. In [17], a combination of CS and projected least squares (PLS) is proposed, and the code is available in [18]. In [19], attention mechanisms are used in neural networks on informationally complete POVMs. For digital quantum computers, the complexity of tomography scales exponentially with the number of qubits. To address this challenge, in [20], POVMs are approximated with a low-rank approximation (rank-1 projectors of the K = 6 eigenstates of the Pauli matrices), which is similar to the approach that we use for generating simulated data to validate the proposed method. The authors in [5] performed QST for quantum sensing using singular value decomposition (SVD) techniques on a 2-qubit system to arrive at a robust method.
In this paper, we propose a new method based on a modification of AGD for the non-convex programming formulation. In our approach, one of the two step sizes is chosen adaptively, which increases the speed of convergence, as demonstrated in the numerical evaluations carried out. A comparison of the similarities and differences between the proposed method and the two most similar methods is presented in Table 1. The proposed method combines Momentum-inspired Factorized Gradient Descent (MiFGD) with an adaptive step length. The novel contribution is the introduction of a closed-form solution for the locally optimal step length in AGD, which significantly enhances the convergence speed of the original MiFGD.

Introduction to Quantum Tomography
Quantum physics began with Albert Einstein's attempts to explain the "photoelectric effect". Einstein's theory suggested that the energy carried by each single energy packet (quantum) of light can be computed as

E = h f, (1)

where f is the frequency of the light and h is Planck's constant. In 1924, Louis de Broglie suggested that particles such as electrons can behave like waves [1], and the equation for such wave-like behavior was then proposed by Erwin Schrödinger:

iħ ∂Ψ(x, t)/∂t = −(ħ²/2m) ∂²Ψ(x, t)/∂x² + V(x, t) Ψ(x, t), (2)

for matter particles with mass m, where V(x, t) is the potential. Indeed, physical quantities that physicists are interested in measuring, such as position, energy, momentum, and spin, are represented by Hermitian operators (acting on a Hilbert space), which are called observables. Since these operators are Hermitian, their eigenvalues are real, corresponding to the possible measurement outcomes of their corresponding observable. For example, the eigenvalues of the Hamiltonian operator Ĥ = −(ħ²/2m) ∂²/∂x² + V(x, t) are real numbers that correspond to the energy of the particle.
Indeed, it can be shown that if E_n is an eigenvalue with the corresponding eigenfunction Ψ_n(x, t), then, with some assumptions, the eigenfunctions are orthonormal in Ket-Bra notation: ⟨Ψ_m|Ψ_n⟩ = δ_mn. Also, the general solution of (2) is

Ψ(x, t) = C_1 Ψ_1(x, t) + C_2 Ψ_2(x, t) + C_3 Ψ_3(x, t) + . . .,

where C_1, C_2, C_3, . . . are complex numbers. Max Born (1882-1970) suggested that the set of possible outcomes of a measurement is exclusively restricted to Ψ_1(x, t), Ψ_2(x, t), Ψ_3(x, t), . . ., but with different probabilities, and these probabilities are proportional to |C_m|² = |⟨Ψ_m|Ψ⟩|². If we define P_m = |Ψ_m⟩⟨Ψ_m|, then

P_m |Ψ⟩ = C_m |Ψ_m⟩. (7)

Equation (7) means that P_m projects |Ψ⟩ onto |Ψ_m⟩, and for this reason P_m is called a projector.
Another property, which immediately follows from (5) and (7), is that

∑_{n=0}^{∞} P_n = I,

where I is the identity operator.
Also, note that P_n is a positive semi-definite (PSD) matrix. Now, if we define ϱ = |Ψ⟩⟨Ψ|, then it is obvious that ϱ is PSD. As a result, ϱ_ij = C_i C_j*, and therefore trace(ϱ) = |C_1|² + |C_2|² + |C_3|² + . . . = 1. Now, if y_m denotes the probability of outcome m, then

y_m = trace(ϱ P_m). (10)

For this reason, ϱ is called a density operator. The projection operators P_1, P_2, P_3, . . . can be generalized to PSD matrices M_1, M_2, M_3, . . ., such that ∑_{n=0}^{∞} M_n = I. In this case, they are referred to as POVMs. Now, given the statistical data y_1, . . ., y_m and the POVMs M_1, M_2, M_3, . . ., the goal is to find a PSD density operator ϱ that best fits the measurement data y_1, . . ., y_m.
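The connection between the density operator ϱ = |Ψ⟩⟨Ψ|, the projectors P_m, and the outcome probabilities y_m can be checked numerically for a single qubit (the state chosen below is arbitrary, purely for illustration):

```python
import numpy as np

# Illustrative single-qubit check of the outcome probabilities y_m.
psi = np.array([1.0, 1.0j]) / np.sqrt(2)    # an arbitrary normalized |Psi>
rho = np.outer(psi, psi.conj())             # rho = |Psi><Psi|: PSD, trace 1

# Projectors onto the computational basis; they sum to the identity.
P = [np.outer(e, e.conj()) for e in np.eye(2)]
y = [np.trace(rho @ Pm).real for Pm in P]   # y_m = trace(rho P_m)
```

For this state, both outcomes are equally likely, and the probabilities sum to one because the projectors sum to the identity.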
In other words, we want to solve the following optimization problem:

min_{ϱ ⪰ 0, trace(ϱ) = 1} ∑_i (trace(ϱ M_i) − y_i)². (11)

Optimization (11) is called quantum state tomography (QST) [2]. The term "tomography" comes from the ancient Greek words "tomos" ("slice" or "section") and "grapho" ("to draw" or "to write") [21]. Regular X-ray tomography (such as CT scans in medicine) performs a 3D reconstruction of an object that cannot be directly seen by piecing together 2D X-ray images (measurements) taken from different angles or sections. QST is, to a large extent, similar, but instead of 2D X-ray projections taken from different angles or perspectives, QST uses multiple measurements on many identical copies of the same quantum system, each measured in a slightly different way (by projecting the state onto various bases). The outcomes of these measurements are gathered as statistical data. Again, just as complex algorithms process the 2D X-ray data to reconstruct a 3D image, in QST, mathematical optimization based on linear algebra, like (11), is used to reconstruct the density matrix ϱ (which fully describes the quantum state). Having gathered the statistical data y_1, . . ., y_m, these optimization techniques (for instance, least-squares fitting) fit a model (the density matrix ϱ) that best fits the measurement data y_1, . . ., y_m. Here, we must deal with statistical noise from quantum measurements and reconstruct the most probable quantum state despite measurement imperfections. The main difference is that while 3D reconstruction in X-ray tomography can be undertaken via a series of measurements on the same (classical) object, in the case of a single quantum particle, measurement perturbs the state of the particle, often making its further investigation uninformative [21]. We therefore use a source that creates many identical particles in the same unknown quantum state. So, tomography uses many identical particles, each measured in a slightly different way; by piecing together these measurements, scientists can build a picture of the original quantum state.

Proposed Adaptive Method for QST
As problem (11) does not have a closed-form solution, we need to use an iterative approach to solve it.
In [13], problem (11) is formulated as follows:

min_U ∑_i (trace(U U† M_i) − y_i)², (12)

in which the matrix U is assumed to be low rank. Also, ∥U∥ is the Frobenius norm of U, which means that

∥U∥ = (∑_{i,j} |U_{ij}|²)^{1/2}. (13)

Then, an Accelerated Gradient Descent (AGD) method called MiFGD is proposed [8], as follows:

U_{k+1} = Z_k − η ∇f(Z_k), (14)
Z_{k+1} = U_{k+1} + µ (U_{k+1} − U_k), (15)

in which η and µ are two fixed step sizes.
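Under the standard momentum reading of (14)-(15), a minimal NumPy sketch of MiFGD on the factorized objective might look as follows. The gradient convention, measurement set, and all names here are ours for illustration; this is not the reference implementation [14]:

```python
import numpy as np

def mifgd(M_list, y, U0, eta=0.01, mu=0.95, num_iters=1000):
    """Sketch of MiFGD on f(U) = sum_i (trace(M_i U U^dag) - y_i)^2:
        U_{k+1} = Z_k - eta * grad f(Z_k)         (Equation (14))
        Z_{k+1} = U_{k+1} + mu * (U_{k+1} - U_k)  (Equation (15))
    Gradient convention and step sizes are illustrative, not tuned."""
    U = np.array(U0, dtype=complex)
    Z = U.copy()
    for _ in range(num_iters):
        R = Z @ Z.conj().T
        resid = np.array([np.trace(Mi @ R) for Mi in M_list]).real - y
        grad = sum(2 * ri * (Mi + Mi.conj().T) @ Z
                   for ri, Mi in zip(resid, M_list))
        U_next = Z - eta * grad
        Z = U_next + mu * (U_next - U)
        U = U_next
    return U

# Tiny single-qubit example: recover a rank-1 state from three measurement
# matrices (this measurement set is purely illustrative).
M_list = [np.array([[1, 0], [0, 0]], dtype=complex),
          np.array([[0, 0], [0, 1]], dtype=complex),
          0.5 * np.ones((2, 2), dtype=complex)]
rho_true = np.array([[1, 0], [0, 0]], dtype=complex)   # the pure state |0><0|
y = np.array([np.trace(Mi @ rho_true).real for Mi in M_list])
U_hat = mifgd(M_list, y, U0=np.array([[0.2], [0.1]], dtype=complex),
              eta=0.05, mu=0.5, num_iters=2000)
```

Note that the factorization ϱ = UU† keeps the iterates PSD by construction, which is what allows the SVD projection step of [2] to be bypassed.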
In this paper, we present a method for finding a closed-form solution for the greedy optimal choice of η, which means choosing η in a way that results in the biggest decrease in the objective function in that iteration.A description of the proposed method is given below and in Figure 1.
Here, it is worth mentioning that we keep the other step size, µ, constant; it is set to µ = 0.95 in our experiments.
For this reason, in two consecutive steps, there is a possibility that while η is chosen to produce the largest decrease in error in the next iteration, the step size µ remains out of our control and may push the iterates away from what we hope to see in the next run. However, the method still performs much better than MiFGD overall, which is something that we verify in our experiments (see Section 4).
Let us start by rewriting (15) as follows:

Z_{k+1} = (1 + µ) U_{k+1} − µ U_k. (16)

Now, if we plug in U_{k+1} from (14), we arrive at the following:

Z_{k+1} = V_k − η G_k, (17)

in which

V_k = (1 + µ) Z_k − µ U_k (18)

and

G_k = (1 + µ) ∇f(Z_k). (19)

As we want Z_{k+1} to achieve the largest possible decrease in the objective, we need to minimize

g(η) = ∑_i (trace(Z_{k+1} Z_{k+1}† M_i) − y_i)² (20)

with respect to η. It can be obtained that

trace(Z_{k+1} Z_{k+1}† M_i) = trace(V_k V_k† M_i) − η trace((G_k V_k† + V_k G_k†) M_i) + η² trace(G_k G_k† M_i). (21)

Now, if we define A, B, C and Y to be the vectors whose i'th entries are

trace(V_k V_k† M_i), trace((G_k V_k† + V_k G_k†) M_i), trace(G_k G_k† M_i) (23)

and y_i, respectively, then from (20) and (23), we have

g(η) = ∥(A − Y) − η B + η² C∥². (24)

It is worth noting that while A, B, C and Y are vectors, AᵀC, BᵀB, BᵀC and CᵀC are all real numbers independent of η. Now, from (21) and (24), we can conclude that we have to minimize the following 4th-order polynomial with respect to η:

g(η) = (A − Y)ᵀ(A − Y) − 2η (A − Y)ᵀB + η² (BᵀB + 2(A − Y)ᵀC) − 2η³ BᵀC + η⁴ CᵀC. (25)

This attains a minimum where the derivative with respect to η is zero:

4CᵀC η³ − 6BᵀC η² + 2(BᵀB + 2(A − Y)ᵀC) η − 2(A − Y)ᵀB = 0. (26)

Now, by solving (26), which is a humble cubic equation with respect to η, the optimal step size for the kth step is obtained.
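Given the vectors A, B, C and Y, the locally optimal η can be computed by solving the derivative cubic numerically. The sketch below is our own illustrative implementation, assuming the quartic can be written as ∥(A − Y) − ηB + η²C∥², which is consistent with the scalar products AᵀC, BᵀB, BᵀC and CᵀC appearing in the text:

```python
import numpy as np

def optimal_eta(A, B, C, Y):
    """Choose eta minimizing p(eta) = ||(A - Y) - eta*B + eta^2*C||^2.
    The stationarity condition p'(eta) = 0 is a cubic, which has at least
    one real root whenever its leading coefficient C.C is nonzero; we
    evaluate p at all real roots and keep the best."""
    E = A - Y
    # Quartic coefficients, highest power first (for np.polyval / np.roots):
    p_coeffs = [C @ C, -2 * (B @ C), B @ B + 2 * (E @ C), -2 * (E @ B), E @ E]
    d_coeffs = [4 * p_coeffs[0], 3 * p_coeffs[1], 2 * p_coeffs[2], p_coeffs[3]]
    roots = np.roots(d_coeffs)                 # np.roots trims leading zeros
    real_roots = roots[np.abs(roots.imag) < 1e-10].real
    return min(real_roots, key=lambda eta: np.polyval(p_coeffs, eta))

# Sanity check with C = 0: here p(eta) = (1 - eta)^2, minimized at eta = 1.
eta_star = optimal_eta(np.array([2.0, 0.0]), np.array([1.0, 0.0]),
                       np.array([0.0, 0.0]), np.array([1.0, 0.0]))
```

Evaluating the quartic at every real root of the cubic (rather than at a single root) guards against picking a local maximum or the wrong local minimum.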
Note that computing A, B, and C at each step is the most time-consuming operation, which makes the proposed method almost three times slower per iteration than MiFGD; however, since we choose a much better-optimized step size, our approach is faster overall. The numerical results are presented in Section 5.


Numerical Results
While what we have explained in the previous section applies to general quantum systems, in this section, we apply it to multi-qubit spin-half systems.
Suppose that we have a particle that has only two states; then the state of the particle has two entries:

Ψ = (α, β)ᵀ, (27)

where α and β are two complex numbers and |α|² and |β|² are the probabilities of being in each of the two states. The discretization of the Hamiltonian is then a 2 by 2 matrix. On the other hand, the Hamiltonian, as an operator, is expected to be Hermitian, which guarantees that its eigenvalues are real (and, as a result, that they correspond to the possible measurement outcomes). A convenient basis for writing such matrices is given by

σ₁ = [0 1; 1 0], σ₂ = [0 −i; i 0], σ₃ = [1 0; 0 −1],

which are called Pauli matrices.
Additionally, the following matrices can be defined:

S_x = (ħ/2) σ₁, S_y = (ħ/2) σ₂, S_z = (ħ/2) σ₃. (33)

These satisfy the following properties:

[S_x, S_y] = iħ S_z, [S_y, S_z] = iħ S_x, [S_z, S_x] = iħ S_y.

The above three equalities remind physicists of angular momentum, and for this reason they relate them to spin systems. The system is called "spin-1/2" because of the coefficient ħ/2 in Equation (33), which is the component of angular momentum along each axis. These spin-1/2 systems are interesting since they have two states and are the building blocks of qubits (in digital quantum computers). Now, the density operator in the case of a single qubit can be written as

ϱ = (1/2)(I + r_x σ₁ + r_y σ₂ + r_z σ₃),

and from (10) we can calculate r_x from our measurement data. Similarly, we can calculate the other coefficients, r_y and r_z, from our measurement data, which allows us to identify the density matrix ϱ. This is the simplest form of QST. But for more complicated cases (multi-qubit systems), more has to be undertaken.
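The single-qubit recovery of the Bloch coefficients described above can be verified numerically (the Bloch vector below is an arbitrary illustrative choice):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)   # sigma_1
s2 = np.array([[0, -1j], [1j, 0]])               # sigma_2
s3 = np.array([[1, 0], [0, -1]], dtype=complex)  # sigma_3

# The commutation relation [s1, s2] = 2i*s3 is the Pauli-matrix analogue of
# the angular-momentum relations mentioned in the text.
comm = s1 @ s2 - s2 @ s1

# Single-qubit density operator rho = (I + r_x s1 + r_y s2 + r_z s3) / 2
# for an arbitrary Bloch vector r with |r| <= 1 (so that rho is PSD).
r = np.array([0.3, 0.2, 0.5])
rho = (np.eye(2) + r[0] * s1 + r[1] * s2 + r[2] * s3) / 2

# Since trace(sigma_i sigma_j) = 2*delta_ij, each coefficient is recovered
# as r_i = trace(rho sigma_i): the simplest form of QST described above.
r_rec = np.array([np.trace(rho @ s).real for s in (s1, s2, s3)])
```

The orthogonality of the Pauli matrices under the trace inner product is exactly what makes this coefficient recovery a linear read-out of the measurement data.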
It is easy to see that

σ₃ (1, 0)ᵀ = (1, 0)ᵀ and σ₃ (0, 1)ᵀ = −(0, 1)ᵀ. (39)-(40)

Here, (1, 0)ᵀ is called spin-up and is denoted by |↑⟩, while (0, 1)ᵀ is called spin-down and is denoted by |↓⟩. Therefore, Equations (39) and (40) are sometimes expressed as follows:

S_z |↑⟩ = (ħ/2) |↑⟩, S_z |↓⟩ = −(ħ/2) |↓⟩,

which indicates that the eigenstates of σ₃ are |↑⟩ and |↓⟩, with the corresponding eigenvalues ħ/2 and −ħ/2, respectively. Also, it can be seen that the following two vectors are the eigenvectors of σ₁:

(1/√2)(1, 1)ᵀ and (1/√2)(1, −1)ᵀ,

and the following two are the eigenvectors of σ₂:

(1/√2)(1, i)ᵀ and (1/√2)(1, −i)ᵀ.

Furthermore, the real three-dimensional space that we live in relates to the two-dimensional complex vector space within which a qubit sits by rewriting Equation (27) as follows. Suppose that

Ψ = (α, β)ᵀ, where |α|² + |β|² = 1.

Using polar coordinates, we have α = r₁ e^{iϕ₁} and β = r₂ e^{iϕ₂}. But since r₁² + r₂² = 1, we can write r₁ = cos(θ/2), r₂ = sin(θ/2), and therefore, ignoring the global phase e^{iϕ₁} and writing ϕ = ϕ₂ − ϕ₁, we end up with the following:

Ψ = (cos(θ/2), e^{iϕ} sin(θ/2))ᵀ, (50)

which is called the Bloch sphere representation of the quantum state. The Bloch sphere representation (50) is important as it helps us figure out the state of a qubit on the surface of a unit sphere in real 3D space rather than in complex 2D space. Now, in this coordinate system, the state "z" (when θ = 0) is equivalent to |↑⟩ and "−z" (when θ = π, ϕ = 0) is equivalent to |↓⟩, which means that while in the complex vector space |↑⟩ and |↓⟩ are orthogonal states, once we represent them on the Bloch sphere they are antipodal states that point in opposite directions. Indeed, the angle θ is the angle between the unit vector representing the state of the qubit in 3D space and the z axis, and the angle ϕ is the angle between the projection of that unit vector onto the x-y plane and the x axis. In [13], the three eigenbases given above are used for QST. However, in [11], more complicated bases are introduced, where W_1, . . ., W_6 are the rows of a 6 by 2 matrix W built from states of the form (50). Now, again, similar to (51)-(53), the corresponding eigen-relations hold for these basis vectors. To obtain the above formulas, remember that what we mean by the state with angles (θ, ϕ) is (cos(θ/2), e^{iϕ} sin(θ/2))ᵀ, not (cos(θ/2), −i sin(θ/2))ᵀ.
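The Bloch sphere parameterization (50) and the recovery of the angles (θ, ϕ) from a state vector can be sketched as follows (function names are ours):

```python
import numpy as np

def bloch_state(theta, phi):
    """State (50): |psi> = cos(theta/2)|up> + e^{i phi} sin(theta/2)|down>."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def bloch_angles(psi):
    """Recover (theta, phi) from a normalized state, ignoring global phase."""
    theta = 2 * np.arccos(np.clip(np.abs(psi[0]), 0.0, 1.0))
    phi = np.angle(psi[1]) - np.angle(psi[0])
    return theta, phi % (2 * np.pi)
```

The round trip state → angles → state is well defined only up to a global phase, which is exactly the e^{iϕ₁} factor dropped in the derivation above.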
We create our data according to the procedure explained in [11]. As mentioned in [2], the reason for choosing this kind of procedure is that it allows for applications in which the measurement matrices are ill-conditioned [22-24].
To be more specific, the proposed procedure selects (with repetition) n rows of the matrix W in 6^n different ways, and for each of these choices we compute the tensor product of the selected rows. We thus obtain s = 6^n different row vectors, A_1, . . ., A_s, each with d = 2^n entries. Now, for an n-qubit system, the POVMs are M_1 = A_1† A_1, . . ., M_s = A_s† A_s. Also, a random PSD matrix ϱ is created, and the simulated data are constructed by the relation y_i = trace(ϱ M_i) + ε_i, where ε_i is a Gaussian noise term. Now, given the POVMs and the simulated measurement data y_1, . . ., y_s, the QST algorithm should be able to find a PSD matrix ϱ such that y_i ≈ trace(ϱ M_i), for which we solve (12) using the proposed method. As our simulated data are constructed by y_i = trace(ϱ M_i) + ε_i, we expect that the optimal value of problem (12) will be ∑_i ε_i², which is around 0.003 in all our experiments. So, we can approximately measure the optimality gap.
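The data-generation procedure above can be sketched as follows. The 2-row matrix W in the toy example below is a stand-in for the 6-row matrix of the text, chosen only to keep the example small; all names are illustrative:

```python
import numpy as np
from itertools import product

def make_povms(W, n):
    """Build the measurement vectors A_i as n-fold tensor products of rows
    of W (chosen with repetition), and the matrices M_i = A_i^dag A_i."""
    A_rows = []
    for idx in product(range(W.shape[0]), repeat=n):
        a = np.array([1.0 + 0j])
        for j in idx:
            a = np.kron(a, W[j])          # tensor product of the chosen rows
        A_rows.append(a)
    return A_rows, [np.outer(a.conj(), a) for a in A_rows]

# Toy example: a 2-row W (the computational basis) gives 2^2 = 4 measurement
# vectors of length 2^2 = 4 for an n = 2 qubit system.
W = np.eye(2, dtype=complex)
A_rows, povms = make_povms(W, n=2)
rho = np.eye(4) / 4                        # maximally mixed 2-qubit state
rng = np.random.default_rng(1)
y = np.array([np.trace(rho @ M).real for M in povms]) \
    + 1e-4 * rng.standard_normal(4)        # y_i = trace(rho M_i) + eps_i
```

With the 6-row W of the text, the same loop produces the s = 6^n vectors A_1, . . ., A_s of dimension d = 2^n.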
As an illustrative example of how the POVMs are created, let us take n = 3. Also, suppose that θ = π/2. Therefore, the matrix W takes the corresponding numerical form, and it is easy to check that (61)-(63) are verified.
Now, we have s = 6³ = 216 different ways to select (with repetition) 3 rows of the matrix W, which we index by the 216 triples [1 1 1], [1 1 2], . . ., [6 6 6]. Then, for example, for the 9th triple, which is [1 2 3], we have to pick the corresponding rows of W, namely W_1, W_2 and W_3. Now, as the tensor product of two row vectors u = (u_1, u_2) and v = (v_1, v_2) is defined as

u ⊗ v = (u_1 v_1, u_1 v_2, u_2 v_1, u_2 v_2),

we therefore have A_9 = W_1 ⊗ W_2 ⊗ W_3. In a similar way, A_1, . . ., A_216 are constructed. Now, for a 3-qubit system, the POVMs are M_i = A_i† A_i, i = 1, . . ., 216, consisting of 216 matrices of dimension 8 by 8.
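The row-vector tensor product used to build A_1, . . ., A_216 is exactly NumPy's Kronecker product; for example (with arbitrary integer vectors for readability):

```python
import numpy as np

# The tensor product of two row vectors concatenates u[0]*v and u[1]*v:
u = np.array([1, 2])
v = np.array([3, 4, 5])
uv = np.kron(u, v)   # -> [3, 4, 5, 6, 8, 10]
```

Chaining `np.kron` over three selected rows of W yields the length-8 vectors A_i of the 3-qubit example.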
We apply the proposed method to an n-qubit system, where in our case n = 6, 7, 8, each with θ = π/3 or θ = π/2 and rank(U) = 10 or rank(U) = d. As a result, we report the comparison of our method with MiFGD for 12 different configurations.
To be more specific, for instance, in the first configuration (as seen in Figure 2), after 3480 iterations and spending 318 s, MiFGD reaches a normalized error of 0.0082, while our method after just 61 iterations in 15 s reaches a normalized error of 0.0080.
As can be seen, the proposed method takes 0.24 s per iteration, while MiFGD takes 0.09 s, which means that the proposed method is 2.7 times slower per iteration, but overall much faster thanks to a very well-optimized step length.
In the case of n = 7, for the first configuration, MiFGD performed 3229 iterations in 3488 s, which is approximately 1 s per iteration, while the proposed method completed its iterations in 225 s, at approximately 3 s per iteration. However, overall, it performed much better, as can be seen from Figures 6-9. Indeed, in the proposed method, in each step, we have to perform three expensive operations to calculate the entries of the vectors A, B, and C. For n = 8, in the first configuration, MiFGD took 617.5 s per iteration, while the proposed method took 1596 s per iteration. This means the proposed algorithm is approximately 2.6 times slower per iteration, as it performs more calculations to obtain the optimal step size. In the first iterations, the proposed method does not seem to be progressing, but as time passes it starts to outperform the basic MiFGD.
Another interesting observation is that while the proposed method is supposed to choose the optimal step size, we can see some noticeable oscillation at the beginning. The reason for the oscillation is that, in Equations (14) and (15), two different step lengths, η and µ, are involved. What we have undertaken so far in the proposed method is finding the optimal η, while the other one is fixed (µ = 0.95). Indeed, the computation of Z_{k+1} is influenced not only by U_{k+1}, but also by U_k from the previous steps. Intuitively, µ determines to what extent the method "remembers" the past iterations (for instance, µ = 0 means no "memories" of the past iterations). As a result, while the random starting point might be very far from optimal, it gets reflected in the "memory" of the method, and it takes a while before it fades away and gets overshadowed by the accumulation of new memories. Consequently, we cannot expect a strictly decreasing error. This is an issue that we address next. In all the previous experiments, we considered µ = 0.95. The main reason was that this keeps the number of expensive operations small. However, by expressing Z_{k+1} directly in terms of Z_k, U_k and ∇f(Z_k), the traces needed for the step-length computation become polynomials in µ whose coefficients are linear combinations of a fixed set of precomputable trace terms. By having a fixed set of such terms, we can quickly (without much additional calculation) evaluate the vectors A, B and C for any value of µ and plug them into (23) to find the optimal step length η, as before. This means that by increasing the number of expensive operations per iteration from 3 to 6, we are able to use a grid search (for instance, varying µ from 0.01 to 0.99 in increments of 0.01) to adaptively obtain a good choice of µ in every iteration. Below, we compare this new version, with adaptively selected µ, against fixed µ (µ = 0.95) for the case of n = 8.
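Once the expensive trace terms are cached, the grid search over µ described above reduces to evaluating a cheap scalar function of µ on a grid. The following sketch uses a toy stand-in cost, since the actual trace bookkeeping is omitted here:

```python
import numpy as np

def best_mu(objective_for_mu, grid=np.arange(0.01, 1.0, 0.01)):
    """Grid search over mu: evaluate a (cheap, precomputed) objective at
    each grid point and keep the minimizer, as described in the text."""
    values = [objective_for_mu(mu) for mu in grid]
    return grid[int(np.argmin(values))]

# Toy stand-in cost with a known minimum near mu = 0.30; in the actual
# method the objective would be assembled from the cached trace terms.
mu_star = best_mu(lambda mu: (mu - 0.30) ** 2)
```

Because the per-point evaluation is cheap, scanning 99 candidate values of µ adds negligible cost on top of the six expensive trace computations per iteration.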
As can be seen from Figures 14-17, with this slight modification, we no longer observe the oscillatory behavior, and the convergence is slightly faster. Note that although the computational cost per iteration for the adaptive-µ case is twice that of the fixed-µ case, the overall convergence is still faster. However, it seems that η still plays the main role, compared to µ.


Conclusions
In this paper, we proposed a novel method for achieving a closed-form solution for optimal step length in an AGD approach to QST, leading to significantly improved performance compared to existing methods.Although the proposed method is three times slower per iteration, the numerical results demonstrate that it is more than ten times faster overall due to the educated step size that is chosen in each iteration.The key innovation of our method lies in the adaptive selection of the step length η, which ensures that each iteration leads to the maximum possible reduction in the objective function.This approach leverages the current state information to make more informed adjustments, thereby enhancing the overall efficiency of the algorithm.Looking ahead, this adaptive step-size strategy opens up several promising avenues for future research.One potential direction is to integrate similar adaptive step-size methods into more sophisticated gradient-based algorithms where the curvature information is better encoded in the iterative approach.


Figure 1. The proposed method: AGD with optimal step length η.

Figure 11. Numerical result for n = 8, θ = π/2, and rank = 10. In the first iterations, the proposed method does not seem to be progressing, but as time passes it starts to outperform the basic MiFGD.


Figure 13. Numerical result for n = 8, θ = π/3, and rank = 10. Despite instability at the beginning, the method does a better job compared to the basic MiFGD.


Figure 14. Numerical comparison of the case where µ is fixed versus the case of adaptively choosing µ for n = 8, θ = π/3, and rank = d. As can be seen, the new version does not suffer from instability at the beginning.



Figure 15. Numerical comparison of the case where µ is fixed versus the case of adaptively choosing µ for n = 8, θ = π/3, and rank = 10. As can be seen, the new version does not oscillate, in contrast to the previous version.


Figure 16. Numerical comparison of the case where µ is fixed versus the case of adaptively choosing µ for n = 8, θ = π/2, and rank = d. As can be seen, the adaptivity of µ helps the new method not to oscillate.


Figure 17. Numerical comparison of the case where µ is fixed versus the case of adaptively choosing µ for n = 8, θ = π/2, and rank = 10. As can be seen, the adaptivity of µ results in better performance.

Table 1. Comparison of AGD-based methods for QST.
As a result, both H₁₁ and H₂₂ should be real numbers, and H₂₁ should be the complex conjugate of H₁₂. After a straightforward calculation, physicists arrive at the following as the general formula for the Hamiltonian in the aforementioned binary state:

H = [H₁₁ H₁₂; H₁₂* H₂₂], with H₁₁, H₂₂ real.