Optimal Combination of Tensor Optimization Methods

Conference paper in Optimization and Applications (OPTIMA 2020)

Abstract

We consider the problem of minimizing a sum of functions having Lipschitz p-th order derivatives with different Lipschitz constants. To accelerate optimization in this case, we propose a general framework that allows obtaining near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle for a function with a lower Lipschitz constant is called a smaller number of times. As a building block, we extend the current theory of tensor methods and show how to generalize near-optimal tensor methods to work with an inexact tensor step. Further, we investigate the situation when the functions in the sum have Lipschitz derivatives of different orders. For this situation, we propose a generic way to separate the oracle complexity between the parts of the sum. Our method is not optimal, which leads to an open problem of the optimal combination of oracles of different orders.

The work of D. Kamzolov in Sects. 1–4 is funded by RFBR, project number 19-31-27001. The work of A. Gasnikov and P. Dvurechensky in Sects. 1–4 of the paper is supported by RFBR grant 18-29-03071 mk. The work in Sect. 5 is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) No. 075-00337-20-03, project No. 0714-2020-0005.


References

  1. Agarwal, N., Hazan, E.: Lower bounds for higher-order convex optimization. In: Conference On Learning Theory. PMLR (2018)


  2. Arjevani, Y., Shamir, O., Shiff, R.: Oracle complexity of second-order methods for smooth convex optimization. Math. Program. 178(1–2), 327–360 (2019)


  3. Beznosikov, A., Gorbunov, E., Gasnikov, A.: Derivative-free method for decentralized distributed non-smooth optimization. arXiv preprint arXiv:1911.10645 (2019)

  4. Bubeck, S., Jiang, Q., Lee, Y.T., Li, Y., Sidford, A.: Near-optimal method for highly smooth convex optimization. In: Conference on Learning Theory, pp. 492–507 (2019)


  5. Chebyshev, P.: Collected Works, vol. 5. Strelbytskyy Multimedia Publishing, Kyiv (2018)


  6. Doikov, N., Nesterov, Y.: Local convergence of tensor methods. arXiv preprint arXiv:1912.02516 (2019)

  7. Doikov, N., Nesterov, Y.: Minimizing uniformly convex functions by cubic regularization of Newton method. arXiv preprint arXiv:1905.02671 (2019)

  8. Doikov, N., Richtárik, P.: Randomized block cubic Newton method. In: International Conference on Machine Learning, pp. 1290–1298 (2018)


  9. Dvinskikh, D., Gasnikov, A.: Decentralized and parallelized primal and dual accelerated methods for stochastic convex programming problems. arXiv preprint arXiv:1904.09015 (2019)

  10. Dvinskikh, D., Omelchenko, S., Tiurin, A., Gasnikov, A.: Accelerated gradient sliding and variance reduction. arXiv preprint arXiv:1912.11632 (2019)

  11. Dvurechensky, P., Gasnikov, A., Ostroukhov, P., Uribe, C.A., Ivanova, A.: Near-optimal tensor methods for minimizing the gradient norm of convex function. arXiv preprint arXiv:1912.03381 (2019)

  12. Gasnikov, A., Dvurechensky, P., Gorbunov, E., Vorontsova, E., Selikhanovych, D., Uribe, C.A.: Optimal tensor methods in smooth convex and uniformly convex optimization. In: Conference on Learning Theory, pp. 1374–1391 (2019)


  13. Gasnikov, A., et al.: Near optimal methods for minimizing convex functions with Lipschitz \(p\)-th derivatives. In: Conference on Learning Theory, pp. 1392–1393 (2019)


  14. Gorbunov, E., Dvinskikh, D., Gasnikov, A.: Optimal decentralized distributed algorithms for stochastic convex optimization. arXiv preprint arXiv:1911.07363 (2019)

  15. Grapiglia, G.N., Nesterov, Y.: On inexact solution of auxiliary problems in tensor methods for convex optimization. arXiv preprint arXiv:1907.13023 (2019)

  16. Grapiglia, G.N., Nesterov, Y.: Tensor methods for minimizing functions with Hölder continuous higher-order derivatives. arXiv preprint arXiv:1904.12559 (2019)

  17. Jiang, B., Wang, H., Zhang, S.: An optimal high-order tensor method for convex optimization. In: Conference on Learning Theory, pp. 1799–1801 (2019)


  18. Kantorovich, L.V.: On Newton’s method. Trudy Matematicheskogo Instituta imeni VA Steklova 28, 104–144 (1949)


  19. Lan, G.: Lectures on Optimization Methods for Machine Learning. H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA (2019)


  20. Lan, G.: Gradient sliding for composite optimization. Math. Program. 159(1–2), 201–235 (2016)


  21. Lan, G., Lee, S., Zhou, Y.: Communication-efficient algorithms for decentralized and stochastic optimization. Math. Program. 180(1), 237–284 (2018). https://doi.org/10.1007/s10107-018-1355-4


  22. Lan, G., Ouyang, Y.: Accelerated gradient sliding for structured convex optimization. arXiv preprint arXiv:1609.04905 (2016)

  23. Monteiro, R.D., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)


  24. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)


  25. Nesterov, Y.: Lectures on Convex Optimization. SOIA, vol. 137. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91578-4


  26. Nesterov, Y.: Implementable tensor methods in unconstrained convex optimization. Math. Program., 1–27 (2019). https://doi.org/10.1007/s10107-019-01449-1

  27. Rogozin, A., Gasnikov, A.: Projected gradient method for decentralized optimization over time-varying networks. arXiv preprint arXiv:1911.08527 (2019)

  28. Song, C., Ma, Y.: Towards unified acceleration of high-order algorithms under Hölder continuity and uniform convexity. arXiv preprint arXiv:1906.00582 (2019)


Acknowledgements

We would like to thank Yu. Nesterov for fruitful discussions on inexact solution of tensor subproblem.

Author information


Correspondence to Dmitry Kamzolov.


Appendices

A Proof of Composite Accelerated Taylor Descent

This section is a rewriting of the proof from [4], with the composite part added. The next theorem is based on Theorem 2.1 from [4].

Theorem 6

Let \((y_k)_{k \ge 1}\) be a sequence of points in \(\mathbb {R}^d\) and \((\lambda _k)_{k \ge 1}\) a sequence in \(\mathbb {R}_+\). Define \((a_k)_{k \ge 1}\) such that \(\lambda _k A_k = a_k^2\), where \(A_k = \sum _{i=1}^k a_i\). Define also for any \(k\ge 0\), \(x_k = x_0 - \sum _{i=1}^k a_i (\nabla f(y_i)+g'(y_i))\) and \(\tilde{x}_k := \frac{a_{k+1}}{A_{k+1}} x_{k} + \frac{A_k}{A_{k+1}} y_k\). Finally, assume that for some \(\sigma \in [0, 1]\)

$$\begin{aligned} \Vert y_{k+1} - (\tilde{x}_k - \lambda _{k+1} (\nabla f(y_{k+1})+g'(y_{k+1})))\Vert \le \sigma \cdot \Vert y_{k+1} - \tilde{x}_k\Vert \,, \end{aligned}$$
(24)

then one has for any \(x \in \mathbb {R}^d\),

$$\begin{aligned} F(y_k) - F(x) \le \frac{2 \Vert x\Vert ^2}{\left( \sum _{i=1}^k \sqrt{\lambda _i} \right) ^2} \,, \end{aligned}$$
(25)

and

$$\begin{aligned} \sum _{i=1}^k \frac{A_i}{\lambda _i} \Vert y_i - \tilde{x}_{i-1}\Vert ^2 \le \frac{\Vert x^*\Vert ^2}{1-\sigma ^2} \,. \end{aligned}$$
(26)
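For illustration, the implicit definition of \(a_k\) in Theorem 6 is easy to resolve explicitly: \(\lambda _k A_k = a_k^2\) with \(A_k = A_{k-1} + a_k\) means that \(a_k\) is the positive root of \(a^2 - \lambda _k a - \lambda _k A_{k-1} = 0\). The following minimal Python sketch (not part of the original text; the sequence of \(\lambda _k\) values is a placeholder) generates \(a_k\) and \(A_k\) and checks the identity numerically.

import math

def next_a(A_prev, lam):
    # Positive root of a^2 - lam*a - lam*A_prev = 0, i.e. lambda_k * A_k = a_k^2
    # with A_k = A_prev + a.
    return 0.5 * (lam + math.sqrt(lam * lam + 4.0 * lam * A_prev))

# Example with a hypothetical sequence of step parameters lambda_k.
A = 0.0
for k, lam in enumerate([1.0, 0.5, 0.25, 0.125], start=1):
    a = next_a(A, lam)
    A += a
    # x_k would be updated as x_k = x_{k-1} - a_k * (grad f(y_k) + g'(y_k)),
    # and tilde_x_k = (a_{k+1}/A_{k+1}) * x_k + (A_k/A_{k+1}) * y_k.
    print(f"k={k}: a_k={a:.4f}, A_k={A:.4f}, lambda_k*A_k={lam*A:.4f}, a_k^2={a*a:.4f}")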

To prove this theorem we introduce auxiliary lemmas based on Lemmas 2.2–2.5 and 3.1 from [4]; Lemmas 2.6 and 3.3 can be taken from [4] directly, without any changes.

Lemma 3

Let \(\psi _0(x) = \frac{1}{2} \Vert x-x_0\Vert ^2\) and define by induction \(\psi _{k}(x) = \psi _{k-1}(x) + a_{k} \varOmega _1(F, y_{k}, x)\). Then \(x_k =x_0 - \sum _{i=1}^k a_i (\nabla f(y_i) + g'(y_i))\) is the minimizer of \(\psi _k\), and \(\psi _k(x) \le A_k F(x) + \frac{1}{2} \Vert x-x_0\Vert ^2\) where \(A_k = \sum _{i=1}^k a_i\).
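The first claim of Lemma 3 can be verified in one line. Assuming, as in the computations of Lemma 5 below, that \(\varOmega _1(F, y, x)\) denotes the linearization \(F(y) + (\nabla f(y)+g'(y))\cdot (x-y)\), the function \(\psi _k\) is quadratic and

$$\nabla \psi _k(x) = (x - x_0) + \sum _{i=1}^k a_i (\nabla f(y_i) + g'(y_i)) = 0 \quad \Longleftrightarrow \quad x = x_0 - \sum _{i=1}^k a_i (\nabla f(y_i) + g'(y_i)) = x_k .$$

The second claim follows by induction, using \(a_i \varOmega _1(F, y_i, x) \le a_i F(x)\), which holds by convexity of F.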

Lemma 4

Let \((z_k)\) be a sequence such that

$$\begin{aligned} \psi _k(x_k) - A_k F(z_k) \ge 0 \,. \end{aligned}$$
(27)

Then one has for any x,

$$\begin{aligned} F(z_k) \le F(x) + \frac{\Vert x-x_0\Vert ^2}{2 A_k} \,. \end{aligned}$$
(28)

Proof

One has (recall Lemma 3):

$$ A_k F(z_k) \le \psi _k(x_k) \le \psi _k(x) \le A_k F(x) + \frac{1}{2}\Vert x-x_0\Vert ^2 \,. $$

Lemma 5

One has for any x,

$$\begin{aligned}&\psi _{k+1}(x) - A_{k+1} F(y_{k+1}) - (\psi _k(x_k) - A_k F(z_k)) \\&\ge A_{k+1} (\nabla f(y_{k+1}) +g'(y_{k+1}))\cdot \left( \frac{a_{k+1}}{A_{k+1}} x + \frac{A_k}{A_{k+1}} z_k - y_{k+1} \right) + \frac{1}{2} \Vert x -x_k\Vert ^2 \,. \end{aligned}$$

Proof

Firstly, by simple calculation we note that:

$$ \psi _k(x) = \psi _k(x_k) + \frac{1}{2} \Vert x- x_k\Vert ^2, \text { and}\ \psi _{k+1}(x) = \psi _k(x_k) + \frac{1}{2} \Vert x-x_k\Vert ^2 + a_{k+1} \varOmega _1(F, y_{k+1}, x) \,, $$

so that

$$\begin{aligned} \psi _{k+1}(x) - \psi _k(x_k) = a_{k+1} \varOmega _1(F, y_{k+1}, x) + \frac{1}{2} \Vert x-x_k\Vert ^2 \,. \end{aligned}$$
(29)

Now we want to make the term \(A_{k+1} F(y_{k+1}) - A_k F(z_k)\) appear as a lower bound on the right-hand side of (29) when evaluated at \(x=x_{k+1}\). Using the inequality \(\varOmega _1(F, y_{k+1}, z_k) \le F(z_k)\) we have:

$$\begin{aligned} a_{k+1} \varOmega _1(F, y_{k+1}, x)= & {} A_{k+1} \varOmega _1(F, y_{k+1}, x) - A_k \varOmega _1(F, y_{k+1}, x) \\= & {} A_{k+1} \varOmega _1(F, y_{k+1}, x) - A_k \nabla F(y_{k+1}) \cdot (x - z_k) - A_k \varOmega _1(F, y_{k+1}, z_k) \\= & {} A_{k+1} \varOmega _1\left( F, y_{k+1}, x - \frac{A_k}{A_{k+1}} (x - z_k) \right) - A_k \varOmega _1(F, y_{k+1}, z_k) \\\ge & {} A_{k+1} F(y_{k+1}) - A_k F(z_k)\\+ & {} A_{k+1} (\nabla f(y_{k+1})+g'(y_{k+1})) \cdot \left( \frac{a_{k+1}}{A_{k+1}} x + \frac{A_k}{A_{k+1}} z_k - y_{k+1} \right) \,, \end{aligned}$$

which concludes the proof.

Lemma 6

Denoting \(\lambda _{k+1} := \frac{a_{k+1}^2}{A_{k+1}}\) and \(\tilde{x}_k := \frac{a_{k+1}}{A_{k+1}} x_{k} + \frac{A_k}{A_{k+1}} y_k\) one has:

$$\begin{aligned}&\psi _{k+1}(x_{k+1}) - A_{k+1} F(y_{k+1}) - (\psi _k(x_k) - A_k F(y_k)) \\&\ge \frac{A_{k+1}}{2 \lambda _{k+1}} \bigg ( \Vert y_{k+1} - \tilde{x}_k\Vert ^2 - \Vert y_{k+1} - (\tilde{x}_k - \lambda _{k+1} (\nabla f(y_{k+1})+g'(y_{k+1}))) \Vert ^2 \bigg ) \,. \end{aligned}$$

In particular, we have in light of (24)

$$\psi _{k}(x_{k})-A_{k}F(y_{k})\ge \frac{1-\sigma ^{2}}{2}\sum _{i=1}^{k}\frac{A_{i}}{\lambda _{i}}\Vert y_{i}-\tilde{x}_{i-1}\Vert ^{2}.$$

Proof

We apply Lemma 5 with \(z_k = y_k\) and \(x=x_{k+1}\), and note that (with \(\tilde{x} := \frac{a_{k+1}}{A_{k+1}} x + \frac{A_k}{A_{k+1}} y_k\)):

$$\begin{aligned}&(\nabla f(y_{k+1})+g'(y_{k+1})) \cdot \left( \frac{a_{k+1}}{A_{k+1}} x + \frac{A_k}{A_{k+1}} y_k - y_{k+1} \right) + \frac{1}{2 A_{k+1}} \Vert x - x_k\Vert ^2 \\&= (\nabla f(y_{k+1})+g'(y_{k+1})) \cdot (\tilde{x} - y_{k+1}) + \frac{1}{2 A_{k+1}} \left\| \frac{A_{k+1}}{a_{k+1}} \left( \tilde{x} - \frac{A_k}{A_{k+1}} y_k \right) - x_k \right\| ^2 \\&= (\nabla f(y_{k+1})+g'(y_{k+1})) \cdot (\tilde{x} - y_{k+1}) + \frac{A_{k+1}}{2 a_{k+1}^2} \left\| \tilde{x} - \left( \frac{a_{k+1}}{A_{k+1}} x_k + \frac{A_k}{A_{k+1}} y_k \right) \right\| ^2 \,. \end{aligned}$$

This yields:

$$\begin{aligned}&\psi _{k+1}(x_{k+1}) - A_{k+1} F(y_{k+1}) - (\psi _k(x_k) - A_k F(y_k)) \\&\ge A_{k+1} \cdot \min _{x \in \mathbb {R}^d} \left\{ (\nabla f(y_{k+1})+g'(y_{k+1})) \cdot (x - y_{k+1}) + \frac{1}{2 \lambda _{k+1}} \Vert x - \tilde{x}_k\Vert ^2 \right\} \,. \end{aligned}$$

The value of the minimum is easy to compute.
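Spelling this out (with the shorthand \(G := \nabla f(y_{k+1}) + g'(y_{k+1})\)): the minimizer is \(x = \tilde{x}_k - \lambda _{k+1} G\), and

$$\begin{aligned} \min _{x \in \mathbb {R}^d} \left\{ G \cdot (x - y_{k+1}) + \frac{1}{2 \lambda _{k+1}} \Vert x - \tilde{x}_k\Vert ^2 \right\}&= G \cdot (\tilde{x}_k - y_{k+1}) - \frac{\lambda _{k+1}}{2} \Vert G\Vert ^2 \\&= \frac{1}{2 \lambda _{k+1}} \left( \Vert y_{k+1} - \tilde{x}_k\Vert ^2 - \Vert y_{k+1} - (\tilde{x}_k - \lambda _{k+1} G)\Vert ^2 \right) , \end{aligned}$$

which, after multiplying by \(A_{k+1}\), is exactly the right-hand side in Lemma 6.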

For the first conclusion in Theorem 6, it suffices to combine Lemma 6 with Lemma 4, and Lemma 2.5 from [4]. The second conclusion in Theorem 6 follows from Lemma 6 and Lemma 3.

The following lemma shows that minimizing the \(p^{th}\) order Taylor expansion (4) can be viewed as an implicit gradient step for some “large” step size:

Lemma 7

Equation (24) holds true with \(\sigma = 1/2\) for (4), provided that one has:

$$\begin{aligned} \frac{1}{2} \le \lambda _{k+1} \frac{L_p \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}}{(p-1)!} \le \frac{p}{p+1} \,. \end{aligned}$$
(30)

Proof

Observe that the optimality condition gives:

$$\begin{aligned} \nabla _y f_p(y_{k+1}, \tilde{x}_k) + \frac{L_p \cdot (p+1)}{p!} (y_{k+1} - \tilde{x}_k) \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1} + g'(y_{k+1})= 0 \,. \end{aligned}$$
(31)

In particular we get:

$$\begin{aligned}&y_{k+1} - (\tilde{x}_k - \lambda _{k+1} (\nabla f(y_{k+1})+ g'(y_{k+1}))) = \lambda _{k+1} (\nabla f(y_{k+1})+ g'(y_{k+1}))\\&- \frac{p!}{L_p \cdot (p+1) \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}} (\nabla _y f_p(y_{k+1}, \tilde{x}_k)+g'(y_{k+1})) \,. \end{aligned}$$

By doing a Taylor expansion of the gradient function one obtains:

$$ \Vert \nabla f(y) - \nabla _y f_p(y, x)\Vert \le \frac{L_p}{p!} \Vert y - x\Vert ^p \,, $$

so that we find:

$$\begin{aligned}&\Vert y_{k+1} - (\tilde{x}_k - \lambda _{k+1} (\nabla f(y_{k+1})+ g'(y_{k+1}))) \Vert \\&\le \lambda _{k+1} \frac{L_p}{p!} \Vert y_{k+1} - \tilde{x}_k\Vert ^p + \left| \lambda _{k+1} - \frac{p!}{L_p \cdot (p+1) \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}} \right| \cdot \Vert \nabla _y f_p(y_{k+1}, \tilde{x}_k)+ g'(y_{k+1})\Vert \\&\le \Vert y_{k+1} - \tilde{x}_k\Vert \left( \lambda _{k+1} \frac{L_p}{p!} \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1} + \left| \lambda _{k+1}\frac{L_p \cdot (p+1) \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}}{p!} - 1\right| \right) \\&=\Vert y_{k+1}-\tilde{x}_{k}\Vert \left( \frac{\eta }{p}+\left| \eta \cdot \frac{p+1}{p}-1\right| \right) \end{aligned}$$

where we used (31) in the second-to-last inequality and set \(\eta := \lambda _{k+1} \frac{L_p \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}}{(p-1)!}\) in the last line. The result follows from the assumption \(1/2 \le \eta \le p/(p+1)\) in (30).
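In more detail, the last step reads as follows: for \(\eta \le p/(p+1)\) we have \(\eta \cdot \frac{p+1}{p} \le 1\), hence

$$\frac{\eta }{p}+\left| \eta \cdot \frac{p+1}{p}-1\right| = \frac{\eta }{p} + 1 - \eta \cdot \frac{p+1}{p} = 1 - \eta \le \frac{1}{2},$$

where the last inequality uses \(\eta \ge 1/2\); this is exactly (24) with \(\sigma = 1/2\).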

Finally, if we replace \(\Vert x^{*}\Vert \) by \(\Vert x_0-x^{*}\Vert \) in Lemma 3.3 from [4] and use Lemma 3.4 from [4], we prove Theorem 6.

B Inexact solution of the subproblem

Suppose that (4) cannot be solved exactly. Assume that we can find only an inexact solution \(\tilde{y}_{k+1}\) satisfying

$$\begin{aligned} \left\| \nabla \left( f_p(\tilde{y}_{k+1}, \tilde{x}_k) + \frac{L_p}{p!} \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert ^{p+1} +g(\tilde{y}_{k+1})\right) \right\| \le \frac{L_p}{2p!}\Vert \tilde{y}_{k+1}-\tilde{x}_k\Vert ^p. \end{aligned}$$
(32)

In this case, Lemma 7 should be corrected as follows.

Lemma 8

Equation (24) holds true with \(\sigma = 3/4\) for (32), provided that one has:

$$\begin{aligned} \frac{1}{2} \le \lambda _{k+1} \frac{L_p \cdot \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert ^{p-1}}{(p-1)!} \le \frac{p}{p+1} \,. \end{aligned}$$

Proof

Let’s introduce

$$\Xi _{k+1} = \nabla \left( f_p(\tilde{y}_{k+1}, \tilde{x}_k) + \frac{L_p}{p!} \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert ^{p+1} +g(\tilde{y}_{k+1})\right) .$$

The main difference from the proof of Lemma 7 is in the following chain of inequalities:

$$\begin{aligned}&\Vert \tilde{y}_{k+1} - (\tilde{x}_k - \lambda _{k+1} (\nabla f(\tilde{y}_{k+1})+ g'(\tilde{y}_{k+1}))) \Vert \\&\le \lambda _{k+1} \frac{L_p}{p!} \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert ^p + \\&\left| \lambda _{k+1} - \frac{p!}{L_p \cdot (p+1) \cdot \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert ^{p-1}} \right| \cdot \Vert \nabla _y f_p(\tilde{y}_{k+1}, \tilde{x}_k) + g'(\tilde{y}_{k+1})\Vert + \lambda _{k+1}\Vert \Xi _{k+1}\Vert \\&\le \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert \left( \lambda _{k+1} \frac{L_p}{p!} \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert ^{p-1} + \left| \lambda _{k+1}\frac{L_p \cdot (p+1) \cdot \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert ^{p-1}}{p!} - 1\right| \right) \\&+\Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert \cdot \frac{1}{2p}\cdot \lambda _{k+1} \frac{L_p \cdot \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert ^{p-1}}{(p-1)!} . \end{aligned}$$

To complete the proof, it remains to notice that, due to (32),

$$\Vert \Xi _{k+1}\Vert \le \frac{L_p}{2p!}\Vert \tilde{y}_{k+1}-\tilde{x}_k\Vert ^p.$$

Based on (32), we relate the accuracy \(\tilde{\varepsilon }\) to which we need to solve the auxiliary problem to the desired accuracy \(\varepsilon \) for problem (1). For this we use Lemma 2.1 from [15]. This lemma guarantees that if

$$\begin{aligned} \left\| \nabla \left( f_p(\tilde{y}_{k+1}, \tilde{x}_k) + \frac{L_p}{p!} \Vert \tilde{y}_{k+1} - \tilde{x}_k\Vert ^{p+1} +g(\tilde{y}_{k+1})\right) \right\| \le \frac{1}{4p(p+1)}\Vert \nabla F(\tilde{y}_{k+1})\Vert , \end{aligned}$$
(33)

then (32) holds true. Thus, it suffices to solve the auxiliary problem to the accuracy given by (33).
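For illustration only, here is how criterion (33) could be monitored inside an inner solver. This is a minimal Python sketch, where model_grad and full_grad are hypothetical callables returning, respectively, the gradient of the regularized Taylor model (the expression under the norm in (33)) and the (sub)gradient of F at \(\tilde{y}_{k+1}\).

import numpy as np

def subproblem_accurate_enough(model_grad, full_grad, y_tilde, p):
    # Criterion (33): the model gradient norm at y_tilde must not exceed
    # ||grad F(y_tilde)|| / (4 p (p + 1)).
    lhs = np.linalg.norm(model_grad(y_tilde))
    rhs = np.linalg.norm(full_grad(y_tilde)) / (4.0 * p * (p + 1))
    return lhs <= rhs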

Assume that F(x) is an r-uniformly convex function with constant \(\sigma _r\) (\(r\ge 2\), \(\sigma _r > 0\), see Definition 1). Then, by Lemma 2 of [7], we have

$$\begin{aligned} F(\tilde{y}_{k+1}) - \min \limits _{x\in E} F(x) \le \frac{r-1}{r}\left( \frac{1}{\sigma _r}\right) ^{\frac{1}{r-1}} \Vert \nabla F(\tilde{y}_{k+1})\Vert ^{\frac{r}{r-1}}. \end{aligned}$$
(34)

Inequalities (33) and (34) guarantee that it is sufficient to solve the auxiliary problem with accuracy

$$\tilde{\varepsilon } = O\left( \left( \varepsilon ^{r-1}\sigma _r\right) ^{\frac{1}{r}}\right) $$

in terms of criterion (33). Since the auxiliary problem is itself r-uniformly convex at each iteration, we can apply (34) to it to estimate the required accuracy in terms of the function value gap. In any case, there is no need to worry about this accuracy, since the dependence on it is only logarithmic. The only restrictive assumption we made is that F(x) is r-uniformly convex. If this is not the case, as in Sect. 4, we may use regularization tricks [11], which lead to \(\sigma _2 \sim \varepsilon \). The dependence on \(\tilde{\varepsilon }\) then becomes worse, but this does not change the main conclusion: the details concerning the accuracy of the solution of the auxiliary problem can be skipped.
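The expression for \(\tilde{\varepsilon }\) above can be recovered directly from (34): requiring the right-hand side of (34) to be at most \(\varepsilon \) yields

$$\Vert \nabla F(\tilde{y}_{k+1})\Vert \le \left( \frac{r}{r-1}\, \varepsilon \right) ^{\frac{r-1}{r}} \sigma _r^{\frac{1}{r}} = O\left( \left( \varepsilon ^{r-1}\sigma _r\right) ^{\frac{1}{r}}\right) ,$$

so a gradient norm of this order already certifies an \(\varepsilon \)-solution, and by (33) the auxiliary problem only needs to be solved to the corresponding gradient-norm accuracy.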

C CATD with restarts

Proof of Theorem 2.

Proof

As F is an r-uniformly convex function, we get

$$\begin{aligned} R_{k+1}&=\Vert z_{k+1}-x_{*}\Vert \le \left( \frac{r \left( F(z_{k+1})-F(x_{*}) \right) }{\sigma _r} \right) ^{\frac{1}{r}} \overset{(5)}{\le } \left( \frac{r \left( \frac{c_p L_p R_{k}^{p+1}}{N_k^{\frac{3p+1}{2}}} \right) }{\sigma _r} \right) ^{\frac{1}{r}}\\&=\left( \frac{r c_p L_p R_{k}^{p+1}}{\sigma _r N_k^{\frac{3p+1}{2}}} \right) ^{\frac{1}{r}} \overset{(7)}{\le } \left( \frac{ R_{k}^{p+1}}{2^r R_k^{p+1-r}}\ \right) ^{\frac{1}{r}} = \frac{ R_{k}}{2}. \end{aligned}$$

Now we compute the total number of CATD steps.

$$\begin{aligned} \sum \limits _{k=0}^K N_k&\le \sum \limits _{k=0}^K \left( \frac{r c_p L_p 2^r}{\sigma _r} R_k^{p+1-r} \right) ^{\frac{2}{3p+1}}+K= \sum \limits _{k=0}^K \left( \frac{r c_p L_p 2^r}{\sigma _r} (R_0 2^{-k})^{p+1-r} \right) ^{\frac{2}{3p+1}}+K\\&=\left( \frac{r c_p L_p 2^r R_0^{p+1-r}}{\sigma _r}\right) ^{\frac{2}{3p+1}} \sum \limits _{k=0}^K 2^{\frac{-2(p+1-r)k}{3p+1}}+K. \end{aligned}$$
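To make the bookkeeping concrete, here is a minimal Python sketch (an illustration under our reading of condition (7), not the authors' implementation) of the total iteration count, assuming each restart runs \(N_k = \lceil (r c_p L_p 2^r R_k^{p+1-r}/\sigma _r)^{2/(3p+1)}\rceil \) CATD steps and that \(R_k\) halves after every restart, as proved above.

import math

def total_catd_steps(p, r, c_p, L_p, sigma_r, R0, K):
    # Sum N_k over restarts k = 0, ..., K with R_k = R0 * 2**(-k).
    total = 0
    R = R0
    for _ in range(K + 1):
        N_k = math.ceil((r * c_p * L_p * 2**r * R**(p + 1 - r) / sigma_r) ** (2.0 / (3 * p + 1)))
        total += N_k
        R /= 2.0  # R_{k+1} <= R_k / 2 by the argument above
    return total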


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Kamzolov, D., Gasnikov, A., Dvurechensky, P. (2020). Optimal Combination of Tensor Optimization Methods. In: Olenev, N., Evtushenko, Y., Khachay, M., Malkova, V. (eds) Optimization and Applications. OPTIMA 2020. Lecture Notes in Computer Science, vol. 12422. Springer, Cham. https://doi.org/10.1007/978-3-030-62867-3_13
