Abstract
We consider the problem of minimizing a sum of functions having Lipschitz p-th order derivatives with different Lipschitz constants. To accelerate optimization in this case, we propose a general framework that allows one to obtain near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle for a function with a smaller Lipschitz constant is called fewer times. As a building block, we extend the current theory of tensor methods and show how to generalize near-optimal tensor methods to work with an inexact tensor step. Further, we investigate the situation when the functions in the sum have Lipschitz derivatives of different orders. For this situation, we propose a generic way to separate the oracle complexity between the parts of the sum. Our method is not optimal, which leads to an open problem of the optimal combination of oracles of different orders.
The work of D. Kamzolov in Sects. 1–4 is funded by RFBR, project number 19-31-27001. The work of A. Gasnikov and P. Dvurechensky in Sects. 1–4 of the paper is supported by RFBR grant 18-29-03071 mk. The work in Sect. 5 is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) No. 075-00337-20-03, project No. 0714-2020-0005.
References
Agarwal, N., Hazan, E.: Lower bounds for higher-order convex optimization. In: Conference on Learning Theory, PMLR (2018)
Arjevani, Y., Shamir, O., Shiff, R.: Oracle complexity of second-order methods for smooth convex optimization. Math. Program. 178(1–2), 327–360 (2019)
Beznosikov, A., Gorbunov, E., Gasnikov, A.: Derivative-free method for decentralized distributed non-smooth optimization. arXiv preprint arXiv:1911.10645 (2019)
Bubeck, S., Jiang, Q., Lee, Y.T., Li, Y., Sidford, A.: Near-optimal method for highly smooth convex optimization. In: Conference on Learning Theory, pp. 492–507 (2019)
Chebyshev, P.: Collected Works, vol. 5. Strelbytskyy Multimedia Publishing, Kyiv (2018)
Doikov, N., Nesterov, Y.: Local convergence of tensor methods. arXiv preprint arXiv:1912.02516 (2019)
Doikov, N., Nesterov, Y.: Minimizing uniformly convex functions by cubic regularization of Newton method. arXiv preprint arXiv:1905.02671 (2019)
Doikov, N., Richtárik, P.: Randomized block cubic Newton method. In: International Conference on Machine Learning, pp. 1290–1298 (2018)
Dvinskikh, D., Gasnikov, A.: Decentralized and parallelized primal and dual accelerated methods for stochastic convex programming problems. arXiv preprint arXiv:1904.09015 (2019)
Dvinskikh, D., Omelchenko, S., Tiurin, A., Gasnikov, A.: Accelerated gradient sliding and variance reduction. arXiv preprint arXiv:1912.11632 (2019)
Dvurechensky, P., Gasnikov, A., Ostroukhov, P., Uribe, C.A., Ivanova, A.: Near-optimal tensor methods for minimizing the gradient norm of convex function. arXiv preprint arXiv:1912.03381 (2019)
Gasnikov, A., Dvurechensky, P., Gorbunov, E., Vorontsova, E., Selikhanovych, D., Uribe, C.A.: Optimal tensor methods in smooth convex and uniformly convex optimization. In: Conference on Learning Theory, pp. 1374–1391 (2019)
Gasnikov, A., et al.: Near optimal methods for minimizing convex functions with Lipschitz \(p\)-th derivatives. In: Conference on Learning Theory, pp. 1392–1393 (2019)
Gorbunov, E., Dvinskikh, D., Gasnikov, A.: Optimal decentralized distributed algorithms for stochastic convex optimization. arXiv preprint arXiv:1911.07363 (2019)
Grapiglia, G.N., Nesterov, Y.: On inexact solution of auxiliary problems in tensor methods for convex optimization. arXiv preprint arXiv:1907.13023 (2019)
Grapiglia, G.N., Nesterov, Y.: Tensor methods for minimizing functions with Hölder continuous higher-order derivatives. arXiv preprint arXiv:1904.12559 (2019)
Jiang, B., Wang, H., Zhang, S.: An optimal high-order tensor method for convex optimization. In: Conference on Learning Theory, pp. 1799–1801 (2019)
Kantorovich, L.V.: On Newton’s method. Trudy Matematicheskogo Instituta imeni VA Steklova 28, 104–144 (1949)
Lan, G.: Lectures on Optimization Methods for Machine Learning. H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA (2019)
Lan, G.: Gradient sliding for composite optimization. Math. Program. 159(1–2), 201–235 (2016)
Lan, G., Lee, S., Zhou, Y.: Communication-efficient algorithms for decentralized and stochastic optimization. Math. Program. 180(1), 237–284 (2018). https://doi.org/10.1007/s10107-018-1355-4
Lan, G., Ouyang, Y.: Accelerated gradient sliding for structured convex optimization. arXiv preprint arXiv:1609.04905 (2016)
Monteiro, R.D., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Nesterov, Y.: Lectures on Convex Optimization. SOIA, vol. 137. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91578-4
Nesterov, Y.: Implementable tensor methods in unconstrained convex optimization. Math. Program., 1–27 (2019). https://doi.org/10.1007/s10107-019-01449-1
Rogozin, A., Gasnikov, A.: Projected gradient method for decentralized optimization over time-varying networks. arXiv preprint arXiv:1911.08527 (2019)
Song, C., Ma, Y.: Towards unified acceleration of high-order algorithms under Hölder continuity and uniform convexity. arXiv preprint arXiv:1906.00582 (2019)
Acknowledgements
We would like to thank Yu. Nesterov for fruitful discussions on the inexact solution of the tensor subproblem.
Appendices
A Proof of Composite Accelerated Taylor Descent
This section reworks the proof from [4], adding the composite part. The next theorem is based on Theorem 2.1 from [4].
Theorem 6
Let \((y_k)_{k \ge 1}\) be a sequence of points in \(\mathbb {R}^d\) and \((\lambda _k)_{k \ge 1}\) a sequence in \(\mathbb {R}_+\). Define \((a_k)_{k \ge 1}\) such that \(\lambda _k A_k = a_k^2\), where \(A_k = \sum _{i=1}^k a_i\). Define also, for any \(k\ge 0\), \(x_k = x_0 - \sum _{i=1}^k a_i (\nabla f(y_i)+g'(y_i))\) and \(\tilde{x}_k := \frac{a_{k+1}}{A_{k+1}} x_{k} + \frac{A_k}{A_{k+1}} y_k\). Finally, assume that for some \(\sigma \in [0, 1]\)
\[\bigl \Vert y_{k+1} - \tilde{x}_k + \lambda _{k+1}\bigl (\nabla f(y_{k+1}) + g'(y_{k+1})\bigr )\bigr \Vert \le \sigma \cdot \Vert y_{k+1} - \tilde{x}_k\Vert . \qquad (24)\]
Then one has, for any \(x \in \mathbb {R}^d\),
\[F(y_k) - F(x) \le \frac{\Vert x - x_0\Vert ^2}{2 A_k}\]
and
\[\sum _{i=1}^{k} \frac{A_i}{\lambda _i}\,\Vert y_i - \tilde{x}_{i-1}\Vert ^2 \le \frac{\Vert x_0 - x^{*}\Vert ^2}{1-\sigma ^2}.\]
To prove this theorem, we introduce auxiliary lemmas based on Lemmas 2.2–2.5 and 3.1 from [4]; Lemmas 2.6 and 3.3 can be taken directly from [4] without any changes.
Lemma 3
Let \(\psi _0(x) = \frac{1}{2} \Vert x-x_0\Vert ^2\) and define by induction \(\psi _{k}(x) = \psi _{k-1}(x) + a_{k} \varOmega _1(F, y_{k}, x)\). Then \(x_k =x_0 - \sum _{i=1}^k a_i (\nabla f(y_i) + g'(y_i))\) is the minimizer of \(\psi _k\), and \(\psi _k(x) \le A_k F(x) + \frac{1}{2} \Vert x-x_0\Vert ^2\) where \(A_k = \sum _{i=1}^k a_i\).
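As a quick check, assuming \(\varOmega _1(F, y, x) = F(y) + \langle \nabla f(y) + g'(y), x - y\rangle \) (the first-order lower model of F, consistent with the definition of \(x_k\) above), both claims follow directly: the first-order optimality condition
\[\nabla \psi _k(x) = x - x_0 + \sum _{i=1}^k a_i\bigl (\nabla f(y_i) + g'(y_i)\bigr ) = 0\]
is satisfied exactly at \(x = x_k\), and, since \(\varOmega _1(F, y_i, x) \le F(x)\) by convexity of f and g,
\[\psi _k(x) \le \frac{1}{2}\Vert x - x_0\Vert ^2 + \sum _{i=1}^k a_i F(x) = A_k F(x) + \frac{1}{2}\Vert x - x_0\Vert ^2 .\]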
Lemma 4
Let \((z_k)\) be a sequence such that
\[\psi _k(x_k) \ge A_k F(z_k).\]
Then one has, for any x,
\[F(z_k) - F(x) \le \frac{\Vert x - x_0\Vert ^2}{2 A_k}.\]
Proof
One has (recall Lemma 3):
\[A_k F(z_k) \le \psi _k(x_k) \le \psi _k(x) \le A_k F(x) + \frac{1}{2}\Vert x - x_0\Vert ^2 .\]
Lemma 5
One has for any x,
\[\psi _{k+1}(x) \ge \psi _k(x_k) + A_{k+1} F(z_{k+1}) - A_k F(z_k) + A_{k+1}\bigl (\varOmega _1(F, y_{k+1}, \tilde{x}) - F(z_{k+1})\bigr ) + \frac{1}{2}\Vert x - x_k\Vert ^2 ,\]
where \(\tilde{x} := \frac{a_{k+1}}{A_{k+1}} x + \frac{A_k}{A_{k+1}} z_k\).
Proof
Firstly, by a simple calculation we note that
\[\psi _k(x) = \psi _k(x_k) + \frac{1}{2}\Vert x - x_k\Vert ^2 ,\]
so that
\[\psi _{k+1}(x) = \psi _k(x_k) + \frac{1}{2}\Vert x - x_k\Vert ^2 + a_{k+1} \varOmega _1(F, y_{k+1}, x). \qquad (29)\]
Now we want to make the term \(A_{k+1} F(z_{k+1}) - A_k F(z_k)\) appear as a lower bound on the right-hand side of (29). Using the inequality \(\varOmega _1(F, y_{k+1}, z_k) \le F(z_k)\) and the linearity of \(\varOmega _1(F, y_{k+1}, \cdot )\), we have:
\[a_{k+1} \varOmega _1(F, y_{k+1}, x) \ge A_{k+1} \varOmega _1(F, y_{k+1}, \tilde{x}) - A_k F(z_k) = A_{k+1} F(z_{k+1}) - A_k F(z_k) + A_{k+1}\bigl (\varOmega _1(F, y_{k+1}, \tilde{x}) - F(z_{k+1})\bigr ),\]
which concludes the proof.
Lemma 6
Denoting \(\lambda _{k+1} := \frac{a_{k+1}^2}{A_{k+1}}\) and \(\tilde{x}_k := \frac{a_{k+1}}{A_{k+1}} x_{k} + \frac{A_k}{A_{k+1}} y_k\) one has:
\[\psi _{k+1}(x_{k+1}) - A_{k+1} F(y_{k+1}) \ge \psi _k(x_k) - A_k F(y_k) + \frac{A_{k+1}}{2\lambda _{k+1}}\Bigl (\Vert y_{k+1} - \tilde{x}_k\Vert ^2 - \bigl \Vert y_{k+1} - \tilde{x}_k + \lambda _{k+1}\bigl (\nabla f(y_{k+1}) + g'(y_{k+1})\bigr )\bigr \Vert ^2\Bigr ).\]
In particular, we have in light of (24)
\[\psi _{k+1}(x_{k+1}) - A_{k+1} F(y_{k+1}) \ge \psi _k(x_k) - A_k F(y_k) + \frac{(1-\sigma ^2) A_{k+1}}{2\lambda _{k+1}}\,\Vert y_{k+1} - \tilde{x}_k\Vert ^2 .\]
Proof
We apply Lemma 5 with \(z_k = y_k\) and \(x = x_{k+1}\), and note that (with \(\tilde{x} := \frac{a_{k+1}}{A_{k+1}} x + \frac{A_k}{A_{k+1}} y_k\)):
\[\tilde{x} - \tilde{x}_k = \frac{a_{k+1}}{A_{k+1}}\,(x - x_k), \quad \text{so that} \quad \frac{1}{2}\Vert x - x_k\Vert ^2 = \frac{A_{k+1}}{2\lambda _{k+1}}\,\Vert \tilde{x} - \tilde{x}_k\Vert ^2 .\]
This yields:
\[\psi _{k+1}(x_{k+1}) - A_{k+1} F(y_{k+1}) \ge \psi _k(x_k) - A_k F(y_k) + \min _{\tilde{x} \in \mathbb {R}^d}\Bigl \{ A_{k+1}\bigl \langle \nabla f(y_{k+1}) + g'(y_{k+1}), \tilde{x} - y_{k+1}\bigr \rangle + \frac{A_{k+1}}{2\lambda _{k+1}}\Vert \tilde{x} - \tilde{x}_k\Vert ^2 \Bigr \}.\]
The value of the minimum is easy to compute.
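Indeed, the minimum is attained at \(\tilde{x} = \tilde{x}_k - \lambda _{k+1}\bigl (\nabla f(y_{k+1}) + g'(y_{k+1})\bigr )\), and substituting this point back gives
\[A_{k+1}\bigl \langle \nabla f(y_{k+1}) + g'(y_{k+1}), \tilde{x}_k - y_{k+1}\bigr \rangle - \frac{\lambda _{k+1} A_{k+1}}{2}\bigl \Vert \nabla f(y_{k+1}) + g'(y_{k+1})\bigr \Vert ^2 = \frac{A_{k+1}}{2\lambda _{k+1}}\Bigl (\Vert y_{k+1} - \tilde{x}_k\Vert ^2 - \bigl \Vert y_{k+1} - \tilde{x}_k + \lambda _{k+1}\bigl (\nabla f(y_{k+1}) + g'(y_{k+1})\bigr )\bigr \Vert ^2\Bigr ),\]
which is exactly the bound stated in the lemma.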
For the first conclusion in Theorem 6, it suffices to combine Lemma 6 with Lemma 4, and Lemma 2.5 from [4]. The second conclusion in Theorem 6 follows from Lemma 6 and Lemma 3.
The following lemma shows that minimizing the \(p\)-th order Taylor expansion (4) can be viewed as an implicit gradient step with some “large” step size:
Lemma 7
Equation (24) holds true with \(\sigma = 1/2\) for (4), provided that one has:
\[\frac{1}{2} \le \lambda _{k+1}\,\frac{L_p \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}}{(p-1)!} \le \frac{p}{p+1}. \qquad (30)\]
Proof
Observe that the optimality condition for (4) gives:
\[\nabla \varOmega _p(f, \tilde{x}_k, y_{k+1}) + \frac{(p+1) L_p}{p!}\,\Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}(y_{k+1} - \tilde{x}_k) + g'(y_{k+1}) = 0. \qquad (31)\]
In particular we get:
\[\nabla f(y_{k+1}) + g'(y_{k+1}) = \nabla f(y_{k+1}) - \nabla \varOmega _p(f, \tilde{x}_k, y_{k+1}) - \frac{(p+1) L_p}{p!}\,\Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}(y_{k+1} - \tilde{x}_k).\]
By doing a Taylor expansion of the gradient function one obtains:
\[\bigl \Vert \nabla f(y_{k+1}) - \nabla \varOmega _p(f, \tilde{x}_k, y_{k+1})\bigr \Vert \le \frac{L_p}{p!}\,\Vert y_{k+1} - \tilde{x}_k\Vert ^{p},\]
so that we find:
\[\bigl \Vert y_{k+1} - \tilde{x}_k + \lambda _{k+1}\bigl (\nabla f(y_{k+1}) + g'(y_{k+1})\bigr )\bigr \Vert \le \lambda _{k+1}\frac{L_p \Vert y_{k+1} - \tilde{x}_k\Vert ^{p}}{p!} + \Bigl |1 - \lambda _{k+1}\frac{(p+1) L_p \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}}{p!}\Bigr |\cdot \Vert y_{k+1} - \tilde{x}_k\Vert = \Bigl (\frac{\eta }{p} + \Bigl |1 - \frac{p+1}{p}\,\eta \Bigr |\Bigr )\Vert y_{k+1} - \tilde{x}_k\Vert ,\]
where we used (31) in the second last equation and we let \(\eta := \lambda _{k+1} \frac{L_p \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}}{(p-1)!}\) in the last equation. The result follows from the assumption \(1/2 \le \eta \le p/(p+1)\) in (30).
Finally, if we replace \(\Vert x^{*}\Vert \) by \(\Vert x_0-x^{*}\Vert \) in Lemma 3.3 and use Lemma 3.4 from [4], we prove Theorem 6.
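To make the role of condition (30) concrete, the following is a minimal sketch (not the implementation from the paper) of one accelerated step with a bisection over \(\lambda _{k+1}\) on the logarithmic scale, in the spirit of the line search in [4]. Here taylor_step (an oracle solving subproblem (4) around a given point) and grad_F (an oracle returning \(\nabla f(\cdot ) + g'(\cdot )\)) are hypothetical callbacks supplied by the user.

    import numpy as np
    from math import factorial

    def catd_step(x_k, y_k, A_k, taylor_step, grad_F, L_p, p,
                  lam_lo=1e-12, lam_hi=1e12, tol=1e-10):
        """One ATD-style step: bisect over lambda until condition (30) holds,
        i.e. 1/2 <= lam * L_p * ||y_next - x_tilde||^(p-1) / (p-1)! <= p/(p+1)."""
        while lam_hi / lam_lo > 1.0 + tol:
            lam = np.sqrt(lam_lo * lam_hi)   # midpoint on the logarithmic scale
            # a_{k+1} is the positive root of a^2 = lam * (A_k + a)
            a = (lam + np.sqrt(lam ** 2 + 4.0 * lam * A_k)) / 2.0
            A_next = A_k + a
            x_tilde = (a / A_next) * x_k + (A_k / A_next) * y_k
            y_next = taylor_step(x_tilde)    # solve subproblem (4) around x_tilde
            eta = lam * L_p * np.linalg.norm(y_next - x_tilde) ** (p - 1) \
                / factorial(p - 1)
            if eta < 0.5:                    # step too cautious: increase lambda
                lam_lo = lam
            elif eta > p / (p + 1):          # step too aggressive: decrease lambda
                lam_hi = lam
            else:                            # condition (30) holds: accept the step
                x_next = x_k - a * grad_F(y_next)
                return y_next, x_next, A_next
        raise RuntimeError("no lambda satisfying (30) was found")

Each trial value of \(\lambda \) costs one call to the tensor-step oracle, so the search adds only a logarithmic factor to the per-iteration cost.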
B Inexact solution of the subproblem
Suppose that (4) cannot be solved exactly. Assume that we can find only an inexact solution \(\tilde{y}_{k+1}\) satisfying
In this case, Lemma 7 should be corrected as follows.
Lemma 8
Equation (24) holds true with \(\sigma = 3/4\) for (32), provided that one has:
Proof
Let us introduce
The main difference from the proof of Lemma 7 is in the following line:
To complete the proof, it remains to notice that, due to (32),
Based on (32), we now relate the accuracy \(\tilde{\varepsilon }\) to which we need to solve the auxiliary problem to the desired accuracy \(\varepsilon \) for problem (1). For this we use Lemma 2.1 from [15]. This lemma guarantees that if
then (32) holds true. So it is sufficient to solve the auxiliary problem in the sense of (33).
Assume that F(x) is an r-uniformly convex function with constant \(\sigma _r\) (\(r\ge 2\), \(\sigma _r > 0\), see Definition 1). Then from Lemma 2 of [7] we have
\[F(x) - F(x^{*}) \le \frac{r-1}{r}\,\Bigl (\frac{1}{\sigma _r}\Bigr )^{\frac{1}{r-1}} \Vert \nabla F(x)\Vert ^{\frac{r}{r-1}}. \qquad (34)\]
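For instance, for \(r = 2\) (strong convexity) the bound (34) specializes to the familiar inequality
\[F(x) - F(x^{*}) \le \frac{1}{2\sigma _2}\,\Vert \nabla F(x)\Vert ^{2}.\]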
Inequalities (33) and (34) guarantee that it is sufficient to solve the auxiliary problem with the accuracy
in terms of criterion (33). Since the auxiliary problem is itself r-uniformly convex every time, we can apply (34) to the auxiliary problem to estimate the required accuracy in terms of the function value gap. In any case, there is no need to track this accuracy carefully, since its effect on the complexity is only logarithmic. The only restrictive assumption we have made is that F(x) is r-uniformly convex. If this is not the case, as in Sect. 4, we may use regularization tricks [11]. This leads to \(\sigma _2 \sim \varepsilon \). The dependence on \(\tilde{\varepsilon }\) then becomes worse, but this does not change the main conclusion: one may skip the details concerning the accuracy of the solution of the auxiliary problem.
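As an illustration of the regularization trick (a sketch; the exact constants in [11] may differ), for a convex F and \(R \ge \Vert x_0 - x^{*}\Vert \) one can minimize
\[\hat{F}(x) := F(x) + \frac{\varepsilon }{2R^2}\,\Vert x - x_0\Vert ^2 ,\]
which is 2-uniformly convex with \(\sigma _2 = \varepsilon /R^2\); any \(\hat{x}\) with \(\hat{F}(\hat{x}) - \min _x \hat{F}(x) \le \varepsilon /2\) satisfies \(F(\hat{x}) - F(x^{*}) \le \varepsilon \).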
C CATD with restarts
Proof of Theorem 2.
Proof
As F is an r-uniformly convex function, we get
Now we compute the total number of CATD steps.
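For orientation, here is a sketch of this computation in the case \(r = p+1\), assuming the CATD rate from [4] in the form \(F(y_N) - F(x^{*}) \le c_p L_p \Vert x_0 - x^{*}\Vert ^{p+1} / N^{(3p+1)/2}\) for some constant \(c_p > 0\) (the constant and the case analysis in the paper may differ). If restart j starts from a point with gap \(\varDelta _j\), then uniform convexity gives \(\Vert x_j - x^{*}\Vert ^{p+1} \le \frac{p+1}{\sigma _{p+1}}\,\varDelta _j\), so
\[N_j = \Bigl \lceil \Bigl (\frac{2 c_p (p+1) L_p}{\sigma _{p+1}}\Bigr )^{\frac{2}{3p+1}} \Bigr \rceil \]
CATD steps halve the gap, \(\varDelta _{j+1} \le \varDelta _j / 2\). Hence \(O\bigl (\bigl (L_p/\sigma _{p+1}\bigr )^{2/(3p+1)} \log _2 (\varDelta _0/\varepsilon )\bigr )\) CATD steps in total suffice to reach accuracy \(\varepsilon \).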