
Quadratic programming over ellipsoids with applications to constrained linear regression and tensor decomposition

  • Original Article
  • Published in Neural Computing and Applications

Abstract

A novel algorithm to solve the quadratic programming (QP) problem over ellipsoids is proposed. This is achieved by splitting the QP problem into two optimisation sub-problems: (1) quadratic programming over a sphere and (2) orthogonal projection. Next, an augmented-Lagrangian algorithm is developed for this multiple-constraint optimisation. Benefitting from the fact that the QP over a single sphere can be solved in closed form by solving a secular equation, we derive a tighter bound on the minimiser of the secular equation. We also propose to generate a new positive semidefinite matrix with a low condition number from the matrices in the quadratic constraint, which is shown to improve the convergence of the proposed augmented-Lagrangian algorithm. Finally, applications of the quadratically constrained QP to bounded linear regression and tensor decomposition paradigms are presented.



Acknowledgements

The work of AHP and AC was supported by the Mega Grant Project (14.756.31.0001).

Author information


Corresponding author

Correspondence to Anh-Huy Phan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Proof of Lemma 1

Proof

We consider the case when some eigenvalues are identical, e.g. \(s_1 = s_2 = \cdots = s_L< s_{L+1}< \cdots < s_K\). If the entries of \(\varvec{c}_{1:L}\) are all zero, the objective function is independent of \(\tilde{\varvec{x}}_{1:L} = [\tilde{x}_1, \tilde{x}_2, \ldots , \tilde{x}_L]\). Hence, \(\tilde{\varvec{x}}_{1:L}\) can be any point on the sphere \(\Vert \tilde{\varvec{x}}_{1:L}\Vert ^2 = d^2 = 1 - \sum _{k = L+1}^{K} \tilde{x}_{k}^2\). Otherwise, with the remaining parameters \(\tilde{x}_{L+1}, \ldots , \tilde{x}_K\) held fixed, \(\tilde{\varvec{x}}_{1:L}\) is the minimiser of the norm-constrained linear programme

$$\begin{aligned} \min \,\, \varvec{c}_{1:L}^{\mathrm{T}} \, \tilde{\varvec{x}}_{1:L} \quad {\text {s.t.}} \quad \Vert \tilde{\varvec{x}}_{1:L}\Vert = d \end{aligned}$$

which yields \( \tilde{\varvec{x}}_{1:L}= \frac{-d}{\Vert \varvec{c}_{1:L}\Vert } \varvec{c}_{1:L}\).

In both cases, we can define \( \varvec{z}= [-d, \tilde{x}_{L+1}, \ldots , \tilde{x}_{K}]\), \(\tilde{\varvec{c}} = [\Vert \varvec{c}_{1:L}\Vert , c_{L+1}, \ldots , c_{K}]\) and \(\tilde{\varvec{s}} = [s_1, s_{L+1}, \ldots , s_K]\), and perform a reparameterisation to estimate \(\varvec{z}\) from a similar SCQP with distinct eigenvalues \(\tilde{\varvec{s}}\), as

$$\begin{aligned}&\min \,\, \frac{1}{2} \, \varvec{z}^{\mathrm{T}} \, {\text {diag}}(\tilde{\varvec{s}} ) \, {\varvec{z}} + \tilde{\varvec{c}}^{\mathrm{T}} {\varvec{z}}\quad {\text {s.t.}} \quad {\varvec{z}}^{\mathrm{T}} {\varvec{z}} = 1. \end{aligned}$$
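The reparameterisation above is straightforward to implement. The following sketch (our illustration, not the authors' code; the function name and interface are ours) collapses a block of identical leading eigenvalues into a single entry with coefficient \(\Vert \varvec{c}_{1:L}\Vert \) and maps a minimiser of the reduced SCQP back to the original variables.

```python
import numpy as np

def collapse_repeated_eigenvalues(s, c, tol=1e-12):
    """Reduce an SCQP  min 0.5 x'diag(s)x + c'x  s.t.  x'x = 1
    with identical leading eigenvalues s_1 = ... = s_L to an equivalent
    problem with a distinct leading eigenvalue, following Lemma 1.

    Returns (s_tilde, c_tilde, expand), where expand maps a minimiser z
    of the reduced problem back to a minimiser x of the original one.
    """
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)
    L = int(np.sum(np.abs(s - s[0]) <= tol))   # size of the repeated block
    if L == 1:
        return s, c, lambda z: np.asarray(z, dtype=float)

    c_head = np.linalg.norm(c[:L])             # ||c_{1:L}||
    s_tilde = np.concatenate(([s[0]], s[L:]))
    c_tilde = np.concatenate(([c_head], c[L:]))

    def expand(z):
        z = np.asarray(z, dtype=float)
        d = abs(z[0])                          # mass assigned to the repeated block
        if c_head > 0:
            x_head = (-d / c_head) * c[:L]     # x_{1:L} = -d c_{1:L} / ||c_{1:L}||
        else:
            x_head = np.zeros(L)
            x_head[0] = d                      # any point of norm d is optimal
        return np.concatenate((x_head, z[1:]))

    return s_tilde, c_tilde, expand
```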

Appendix 2: Proof of Lemma 2

Proof

\(f^{\prime }(\lambda )\) monotonically decreases for \(\lambda < s_1 = 1\), since the second derivative \( f^{''}(\lambda ) = - 2 \, \sum _{k} \frac{c_k^2}{(s_k - \lambda )^3} <0 \) for all \(\lambda < s_1 = 1\).

In addition, since \(s_k \ge 1\) for all k, we have

$$\begin{aligned} f^{\prime }(0) = 1 - \sum _{k = 1}^{K} \frac{c_k^2}{s_k^2} \ge 1 - \sum _{k = 1}^{K} c_k^2 = 0 \end{aligned}$$

and

$$\begin{aligned} f^{\prime }(1-|c_1|)& = {} 1 - \sum _{k = 1}^{K} \frac{c_k^2}{(1-|c_1| - s_k)^2} = 1 - \frac{c_1^2}{c_1^2} - \sum _{k = 2}^{K} \frac{c_k^2}{(1-|c_1| - s_k)^2} \\ &\le {} - \sum _{k = 2}^{K} \frac{c_k^2}{(1-|c_1| - s_k)^2} \le 0 . \end{aligned}$$

This implies that \(f^{\prime }(\lambda )\) has a unique root smaller than 1. Moreover, the root lies in the interval \([0,1-|c_1|)\). □
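Lemma 2 suggests a simple numerical recipe: bracket the unique root of \(f^{\prime }(\lambda )\) in \([0, 1-|c_1|]\) and solve with any safeguarded one-dimensional method. The sketch below is ours (not the paper's implementation); it assumes the normalisation used in the lemma, namely \(s\) sorted in ascending order with \(s_1 = 1\), \(\sum _k c_k^2 = 1\) and nonzero \(c_k\), and recovers the minimiser via the closed form \(\tilde{x}_k = c_k/(\lambda - s_k)\) used later in Appendices 6 and 7.

```python
import numpy as np
from scipy.optimize import brentq

def secular_root(s, c):
    """Unique root of f'(lam) = 1 - sum_k c_k^2 / (s_k - lam)^2
    in [0, 1 - |c_1|]  (Lemma 2).  Assumes s[0] = 1 <= s[k], sum(c**2) = 1."""
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)

    def fprime(lam):
        return 1.0 - np.sum(c**2 / (s - lam)**2)

    lo, hi = 0.0, 1.0 - abs(c[0])
    flo, fhi = fprime(lo), fprime(hi)
    if flo <= 0.0:          # numerical safeguard; Lemma 2 gives f'(0) >= 0
        return lo
    if fhi >= 0.0:          # numerical safeguard; Lemma 2 gives f'(hi) <= 0
        return hi
    # f' is strictly decreasing on (-inf, s_1), so the bracket is valid.
    return brentq(fprime, lo, hi)

def scqp_minimiser(s, c):
    """Minimiser of  0.5 x'diag(s)x + c'x  s.t.  x'x = 1,
    recovered from the smallest root of f' via x_k = c_k / (lam - s_k)."""
    lam = secular_root(s, c)
    return np.asarray(c, dtype=float) / (lam - np.asarray(s, dtype=float))
```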

Appendix 3: Proof of Lemma 3

Proof

Let \(\lambda _1\) be the root of \(f^{\prime }(\lambda )\) that is smaller than \(s_1 = 1\), and let \(\lambda _2\) be any other root of \(f^{\prime }(\lambda ) \). Then, according to Lemma 2, \(\lambda _2> 1 > \lambda _1\), and

$$\begin{aligned} \sum _{k = 1}^{K} \frac{c_k^2}{(\lambda _1 - s_k)^2} = \sum _{k = 1}^{K} \frac{c_k^2}{(\lambda _2 - s_k)^2} = 1. \end{aligned}$$
(40)

It can be shown that

$$\begin{aligned} & f(\lambda _2) - f(\lambda _1) = {} \lambda _2 - \lambda _1 + \sum _{k = 1}^{K} \left( \frac{c_k^2}{\lambda _2 - s_k} - \frac{c_k^2}{\lambda _1 - s_k} \right) \nonumber \\ & \quad = {} (\lambda _2 - \lambda _1) \left( 1 - \sum _{k = 1}^{K} \frac{|c_k|}{s_k-\lambda _1} \frac{|c_k|}{s_k-\lambda _2}\right) \nonumber \\ & \quad \ge {} (\lambda _2 - \lambda _1) \left( 1 - \sqrt{\sum _{k = 1}^{K} \frac{c_k^2}{(\lambda _1 - s_k)^2}} \, \sqrt{ \sum _{k = 1}^{K} \frac{c_k^2}{(\lambda _2 - s_k)^2}} \right) \nonumber \\ &\quad = {} (\lambda _2 - \lambda _1) ( 1 - 1 \times 1) = 0. \end{aligned}$$
(41)

The inequality follows from the Cauchy–Schwarz inequality, and the final equality in (41) follows by substituting the optimality conditions (40). Equality cannot hold because \(s_1 -\lambda _2 < 0\); hence \(f(\lambda _2) > f(\lambda _1)\), and the minimiser \(\lambda ^{\star }\) of \(f(\lambda )\) is the smallest root \(\lambda _1\) of \(f^{\prime }(\lambda )\). □
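As a quick numerical sanity check of Lemma 3 (ours, not part of the paper), one can compare \(f(\lambda ) = \lambda + \sum _k c_k^2/(\lambda - s_k)\), which is consistent with the differences used in (41) up to an additive constant, at the root below \(s_1\) and at the root lying above \(s_K\); the root below \(s_1\) yields the smaller value.

```python
import numpy as np
from scipy.optimize import brentq

def f(lam, s, c):
    # dual-style function of Lemma 3, up to an additive constant
    return lam + np.sum(c**2 / (lam - s))

def fprime(lam, s, c):
    return 1.0 - np.sum(c**2 / (s - lam)**2)

rng = np.random.default_rng(0)
K = 6
s = np.sort(1.0 + 4.0 * rng.random(K))
s[0] = 1.0                                   # s_1 = 1 <= ... <= s_K
c = rng.standard_normal(K)
c /= np.linalg.norm(c)                       # sum c_k^2 = 1

lam1 = brentq(fprime, 0.0, 1.0 - abs(c[0]), args=(s, c))           # root below s_1
lam2 = brentq(fprime, s[-1] + 1e-9, s[-1] + 1.001, args=(s, c))    # a root above s_K

assert f(lam1, s, c) < f(lam2, s, c)   # Lemma 3: the smallest root minimises f
```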

Appendix 4: Proof of Lemma 4

Proof

We first show that the polynomials \(p_i(t)\) have unique roots in \(\displaystyle \left[ {|c_1|},1 \right] \). The second derivative of \(p_i(t)\) is given by

$$\begin{aligned} p_i^{\prime \prime }(t) = 12t^2 +12d_i t + 2(d_i^2-1) \end{aligned}$$

and has two roots \(\bar{t}_{1,2} = \displaystyle \frac{-3d_i \mp \sqrt{3d_i^2 + 6}}{6}\).

If \(d_i > 1\), the roots \(\bar{t}_{1,2}\) are negative. Hence, the first derivative \(p_i^{\prime }(t)\) monotonically increases in \([0, +\infty )\). In addition, since

$$\begin{aligned} p_i^{\prime }(0) = - 2c_1^2 d_i \le 0 \end{aligned}$$

\(p_i^{\prime }(t)\) has only one root in \([0, +\infty )\). Together with the fact that \(p_i(0) = -c_1^2 d_i^2 \le 0\), \(p_i(|c_1|) = c_1^2 (c_1^2 - 1) \, \le \, 0 \) and \(p_i(1) = d_i(d_i+2) (1 - c_1^2) \ge 0\), the polynomial \(p_i(t)\) has a unique root in \(\displaystyle \left[ {|c_1|},1 \right] \).

If \(d_i\le 1\), the second root \(\bar{t}_2\) is nonnegative, \(\bar{t}_2 \ge 0\), so \(p_i^{\prime }(t)\) first decreases and then increases on \([0, +\infty )\). Since \(p_i^{\prime }(0) = - 2c_1^2 d_i \le 0\), the first derivative \(p_i^{\prime }(t)\) again has only one root in \([0, +\infty )\). As in the case \(d_i>1\), the polynomial \(p_i(t)\) therefore has a unique root in \(\displaystyle \left[ {|c_1|},1 \right] \).

By the definition of the root \(t_2\), the derivative \(f^{\prime }(s_1 - t_2)\) does not exceed zero, that is

$$\begin{aligned} f^{\prime }(s_1-t_2)& = {} 1 - \frac{c_1^2}{t_2^2} - \sum _{k = 2}^{K} \frac{c_k^2}{(s_k-s_1 + t_2)^2} \le 1 - \frac{c_1^2}{t_2^2} - \frac{\sum _{k = 2}^{K} c_k^2}{(s_K-s_1 + t_2)^2} \nonumber \\ & = {} 1 - \frac{c_1^2}{t_2^2} - \frac{1 - c_1^2}{(d_2 + t_2)^2} = \frac{p_2(t_2)}{t_2^2 (d_2 + t_2)^2} \nonumber \\ & = {} 0. \end{aligned}$$
(42)

Similarly, we have

$$\begin{aligned} f^{\prime }(s_1-t_1)& = {} 1 - \frac{c_1^2}{t_1^2} - \sum _{k = 2}^{K} \frac{c_k^2}{(s_k-s_1 + t_1)^2} \ge 1 - \frac{c_1^2}{t_1^2} - \frac{\sum _{k = 2}^{K} c_k^2}{(d_1 + t_1)^2} \nonumber \\ & = {} 1 - \frac{c_1^2}{t_1^2} - \frac{1 - c_1^2}{(d_1 + t_1)^2} = \frac{p_1(t_1)}{t_1^2 (d_1 + t_1)^2} \nonumber \\ & = {} 0. \end{aligned}$$
(43)

From (42) and (43), it follows that \(f^{\prime }(\lambda )\) has a root in \([1 - t_1, 1-t_2]\). This root is unique and is also the global minimiser of \(f(\lambda )\) in (5). This completes the proof. □
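The bracket \([s_1 - t_1, s_1 - t_2]\) can be computed directly from the two quartics. The sketch below is our reconstruction: the explicit form \(p_i(t) = t^4 + 2 d_i t^3 + (d_i^2 - 1)t^2 - 2 c_1^2 d_i t - c_1^2 d_i^2\) is recovered from the quantities stated in the proof (it matches \(p_i''(t)\), \(p_i'(0)\), \(p_i(0)\), \(p_i(|c_1|)\) and \(p_i(1)\) above), and we take \(d_2 = s_K - s_1\) from (42) and \(d_1 = s_2 - s_1\) as the tightest value consistent with (43); these choices of \(d_1\), \(d_2\) are our reading of Lemma 4, not quoted from it.

```python
import numpy as np

def root_bracket(s, c):
    """Bracket [s_1 - t_1, s_1 - t_2] containing the minimiser lambda* of f
    (Lemma 4).  Assumes s sorted ascending with s[0] = 1 and sum(c**2) = 1.

    p_i(t) = t^4 + 2 d_i t^3 + (d_i^2 - 1) t^2 - 2 c_1^2 d_i t - c_1^2 d_i^2,
    with d_1 = s_2 - s_1 and d_2 = s_K - s_1 (our reading of (42)-(43));
    t_i is the unique root of p_i in [|c_1|, 1].
    """
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)
    c1sq = c[0]**2

    def unique_root(d):
        coeffs = [1.0, 2.0 * d, d * d - 1.0, -2.0 * c1sq * d, -c1sq * d * d]
        roots = np.roots(coeffs)
        real = roots[np.abs(roots.imag) < 1e-10].real
        # keep the root lying in [|c_1|, 1]  (unique by Lemma 4)
        cand = real[(real >= abs(c[0]) - 1e-12) & (real <= 1.0 + 1e-12)]
        return float(cand[0])

    t1 = unique_root(s[1] - s[0])      # d_1 = s_2 - s_1
    t2 = unique_root(s[-1] - s[0])     # d_2 = s_K - s_1
    return s[0] - t1, s[0] - t2        # lower and upper ends of the bracket
```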

Appendix 5: Proof of Lemma 5

Proof

First, similar to Lemma 2, the roots \(\lambda _{l,L}^{\star }\) and \(\lambda _{u,L}^{\star }\) are unique in the interval \([0, 1-|c_1|]\). Taking into account that \(\sum _{k = 1}^{K} c_k^2 = 1\), and \(s_K \ge s_k\) for all k, we have

$$\begin{aligned} f^{\prime }(\lambda )& = {} 1 - \sum _{l = 1}^{L} \frac{c_l^2}{(s_l - \lambda )^2} - \sum _{k = L+1}^{K} \frac{c_k^2}{(s_k- \lambda )^2} \\&\le {} 1 - \sum _{l = 1}^{L} \frac{c_l^2}{(s_l - \lambda )^2} - \frac{\sum _{k = L+1}^{K} c_k^2}{(s_K- \lambda )^2} = 1 - \sum _{l = 1}^{L} \frac{c_l^2}{(s_l - \lambda )^2} - \frac{\tilde{c}_{L+1}^2}{(s_K- \lambda )^2} \\ & = {} f^{(L)}_{u}(\lambda ) . \end{aligned}$$

Similarly, we can derive \(f^{\prime }(\lambda ) \ge f^{(L)}_{l}(\lambda )\). It follows that the values of \(f^{\prime }(\lambda )\) at \(\lambda _{l,L}^{\star }\) and \(\lambda _{u,L}^{\star }\) are nonnegative and non-positive, respectively,

$$\begin{aligned} f^{\prime }(\lambda _{l,L}^{\star })\ge & {} f^{(L)}_{l}(\lambda _{l,L}^{\star }) = 0 , \;{\text {and}} \;\; f^{\prime }(\lambda _{u,L}^{\star }) \le f^{(L)}_{u}(\lambda _{u,L}^{\star }) = 0, \end{aligned}$$

thus implying that \( \lambda _{l,L}^{\star } \le \lambda ^{\star } \, \le \lambda _{u,L}^{\star }\). The sequence of inequalities in (6) can be proved in a similar way. □
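Lemma 5 turns into progressively tighter brackets for \(\lambda ^{\star }\): keep the first \(L\) secular terms exactly and lump the tail into a single term anchored at \(s_K\) (the upper surrogate \(f^{(L)}_{u}\) written in the proof) or at \(s_{L+1}\) (our reading of the lower surrogate \(f^{(L)}_{l}\), which is the analogous bound in the other direction). A sketch of this construction, ours rather than the authors' code:

```python
import numpy as np
from scipy.optimize import brentq

def bracket(s, c, L):
    """Bounds  lambda_{l,L} <= lambda* <= lambda_{u,L}  in the spirit of Lemma 5.
    Keeps the first L secular terms exactly and lumps the tail into one term,
    anchored at s_{L+1} (lower surrogate) or at s_K (upper surrogate).
    Assumes s ascending with s[0] = 1, sum(c**2) = 1, and 1 <= L < len(s)."""
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)
    assert 1 <= L < len(s)
    tail = np.sum(c[L:]**2)                        # \tilde{c}_{L+1}^2

    def surrogate(anchor):
        def g(lam):
            head = np.sum(c[:L]**2 / (s[:L] - lam)**2)
            return 1.0 - head - tail / (anchor - lam)**2
        # g(0) >= 0 >= g(1 - |c_1|), as in the proof of Lemma 2
        return brentq(g, 0.0, 1.0 - abs(c[0]))

    return surrogate(s[L]), surrogate(s[-1])       # (lambda_{l,L}, lambda_{u,L})
```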

Appendix 6: Proof of Lemma 6

Proof

By contradiction, assume that the variable \(\tilde{x}_n^{\star }\) is nonzero. Since \(c_n\) is the only zero coefficient, it follows from (3) that the multiplier must satisfy \( \lambda ^{\star } = s_n\), and the minimiser \(\tilde{\varvec{x}}^{\star }\) is given by

$$\begin{aligned} \tilde{x}_{k}^{\star }& = {} \frac{c_k}{s_n - s_k} ,\quad k \ne n \end{aligned}$$

while from the unit-length condition of \(\tilde{\varvec{x}}^{\star }\), we have \(\tilde{x}_{n}^{\star 2} = 1 - \sum _{k\ne n} \tilde{x}_{k}^{\star 2}\) which requires an additional assumption \( \sum _{k\ne n} \frac{c_k^2}{(s_n-s_k)^2} < 1\).

The objective function in (2) at \({\tilde{\varvec{x}}}^{\star }\), which coincides with the Lagrangian function at \(({\tilde{\varvec{x}}}^{\star }, \lambda ^{\star } = s_n)\), is given by

$$\begin{aligned} {\mathcal {L}}({\tilde{\varvec{x}}}^{\star }, \lambda ^{\star })& = {} \frac{1}{2} \left( s_n - s_n \, \sum _{k \ne n} \frac{c_k^2 }{(s_n - s_k)^2} + \sum _{k \ne n} \frac{c_k^2 \, s_k}{(s_n - s_k)^2} \right) + \sum _{k \ne n} \frac{c_k^2 }{s_n - s_k} \nonumber \\ & = {} \frac{1}{2}\left( s_n + \sum _{k \ne n} \frac{c_k^2}{s_n - s_k} \right) . \end{aligned}$$
(44)

Now, we consider a vector \(\bar{\varvec{x}}\) whose n-th entry is zero, \(\bar{x}_n = 0\), and whose remaining \((K-1)\) coefficients \(\bar{\varvec{x}}_{n} = [\bar{x}_1, \ldots , \bar{x}_{n-1}, \bar{x}_{n+1},\ldots , \bar{x}_K]\) form the minimiser of the reduced problem

$$\begin{aligned}&\min \,\, \frac{1}{2} \sum _{k \ne n} s_k \, \tilde{x}_k^2 + \sum _{k \ne n} c_k \, \tilde{x}_k,\;\;\; {\text {s.t.}} \quad \sum _{k \ne n} \tilde{x}_k^2 = 1. \end{aligned}$$

According to the results in Sect. 2.2, when the \(c_k\), \(k \ne n\), are nonzero, the Lagrangian function for this reduced problem at the minimiser \(\bar{\varvec{x}}_{n}\) is given by

$$\begin{aligned} {\mathcal {L}}_n(\bar{\varvec{x}}_{n}, \lambda _n^{\star }) = \frac{1}{2}\left( \lambda _n^{\star } + \sum _{k \ne n} \frac{c_k^2}{\lambda _n^{\star } - s_k} \right) , \end{aligned}$$
(45)

where the optimal multiplier \(\lambda _n^{\star } < s_1 = 1\). Comparing (44) and (45) by the same Cauchy–Schwarz argument as in the proof of Lemma 3, using \(\sum _{k \ne n} \frac{c_k^2}{(\lambda _n^{\star } - s_k)^2} = 1\) together with the assumption \(\sum _{k \ne n} \frac{c_k^2}{(s_n - s_k)^2} < 1\), we obtain

$$\begin{aligned} {\mathcal {L}}({\tilde{\varvec{x}}}^{\star }, \lambda ^{\star }) > {\mathcal {L}}_n(\bar{\varvec{x}}_{n}, \lambda _n^{\star }) = {\mathcal {L}}({\bar{\varvec{x}}}, \lambda ^{\star }_n), \end{aligned}$$

which contradicts the assumption that \(\tilde{\varvec{x}}^{\star }\) is the minimiser of problem (2). This implies that the n-th variable of the minimiser must be zero, i.e. \(\tilde{x}_n^{\star } = 0\). □

Appendix 7: Proof of Lemma 7

Proof

When \(c_1 = 0\), from the first optimality condition in (3), we have

$$\begin{aligned} (s_1 - \lambda ) \, \tilde{x}_1 = 0. \end{aligned}$$

Assume that \(\tilde{\varvec{x}}^{\star }\) is a minimiser of the problem in (2) with a nonzero \(\tilde{x}_1^{\star }\); then \(\lambda = s_1 = 1\) and

$$\begin{aligned} \tilde{x}_k^{\star } = \frac{c_k}{\lambda -s_k} = \frac{c_k}{1-s_k} \quad {\text {for}} \; k >1. \end{aligned}$$

From the unit-length constraint, it follows that \( {(\tilde{x}_1^{\star })}^2 = 1- \sum _{k>1} ({\tilde{x}_k^{\star }})^2 = 1 - d\), which requires the condition \( d \le 1\). This implies that, if \( d >1\), \(\tilde{x}_1^{\star }\) must be zero, and the remaining \((K-1)\) variables \([\tilde{x}_2^{\star },\ldots , \tilde{x}_K^{\star }]\) form the minimiser of the reduced problem of (2).

When \(d\le 1\), such an \(\tilde{x}_1^{\star }\) exists, and the objective function at \(\tilde{\varvec{x}}^{\star }\) is given by

$$\begin{aligned} {\mathcal {L}}(\tilde{\varvec{x}}^{\star },s_1) = \frac{1}{2}\left( 1 - \sum _{k>1} \frac{c_k^2}{s_k - 1}\right) . \end{aligned}$$

Now, we consider a vector \(\tilde{\varvec{x}}\) with \(\tilde{x}_1 = 0\), whose remaining entries \(\bar{\varvec{x}} = [\tilde{x}_2, \ldots , \tilde{x}_K]\) form a minimiser of the reduced problem (7). Similarly to the analysis in Sect. 2.2, the objective function of the reduced problem (7) attains its global minimum at the smallest root \(\bar{\lambda }\) of the first derivative of the Lagrangian function

$$\begin{aligned} {\mathcal {L}}_1(\bar{\varvec{x}},\bar{\lambda }) = \frac{1}{2}\left( \bar{\lambda } - \sum _{k>1} \frac{c_k^2}{ s_k - \bar{\lambda }} \right) , \end{aligned}$$

where \(\bar{\lambda }\) is smaller than \(s_2\).

Since the second derivative of \({\mathcal {L}}_1(\bar{\varvec{x}},\lambda )\) with respect to \(\lambda \) is negative for all \(\lambda < s_2\), the function \({\mathcal {L}}_1(\bar{\varvec{x}},\lambda )\) is concave in \((-\infty , s_2)\) and is therefore maximised over this interval at its stationary point \(\bar{\lambda }\). Since \( {\mathcal {L}}(\tilde{\varvec{x}}^{\star },s_1)\) equals \({\mathcal {L}}_1(\bar{\varvec{x}},\lambda )\) evaluated at \(\lambda = s_1 < s_2\), it follows that \( {\mathcal {L}}(\tilde{\varvec{x}}^{\star },s_1) < {\mathcal {L}}_1(\bar{\varvec{x}},\bar{\lambda })\), and \(\tilde{\varvec{x}}^{\star }\) is the global minimiser. Note that \(\tilde{x}_1^{\star }\) can be \(\sqrt{1-d}\) or \(-\sqrt{1-d}\). □
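Lemma 7 yields an explicit recipe for the degenerate case \(c_1 = 0\): compute \(d = \sum _{k>1} c_k^2/(1-s_k)^2\); if \(d \le 1\), the minimiser has \(\tilde{x}_1 = \pm \sqrt{1-d}\) and \(\tilde{x}_k = c_k/(1-s_k)\) for \(k>1\), otherwise \(\tilde{x}_1 = 0\) and the remaining entries solve the reduced problem. A self-contained sketch (ours; the reduced case is handled by applying the bracketed secular solve of Lemma 2 to the reduced problem, and distinct eigenvalues are assumed):

```python
import numpy as np
from scipy.optimize import brentq

def scqp_c1_zero(s, c):
    """Minimiser of  0.5 x'diag(s)x + c'x  s.t.  x'x = 1  when c_1 = 0
    (Lemma 7).  Assumes s strictly increasing with s[0] = 1,
    sum(c**2) = 1, c[0] = 0 and c[1:] nonzero."""
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)
    d = np.sum(c[1:]**2 / (1.0 - s[1:])**2)

    x = np.empty_like(c)
    if d <= 1.0:
        # lambda* = s_1 = 1:  x_1 = +/- sqrt(1 - d),  x_k = c_k / (1 - s_k)
        x[0] = np.sqrt(1.0 - d)            # -sqrt(1 - d) is equally optimal
        x[1:] = c[1:] / (1.0 - s[1:])
        return x

    # d > 1:  x_1 = 0 and the remaining entries solve the reduced SCQP;
    # its multiplier is the unique root of the reduced secular derivative
    # below s_2 (Lemma 2 applied to the reduced problem).
    def g(lam):
        return 1.0 - np.sum(c[1:]**2 / (s[1:] - lam)**2)

    lam = brentq(g, s[1] - 1.0, s[1] - 1e-12)   # g(s_2 - 1) >= 0 > g(s_2^-)
    x[0] = 0.0
    x[1:] = c[1:] / (lam - s[1:])
    return x
```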

Appendix 8: Proof of Lemma 9

Proof

Let \(\varvec{x}^{\star }\) be a minimiser of problem (16)

$$\begin{aligned} \varvec{x}^{\star } = \mathop {\text {arg min}}\limits _{\varvec{x}} \quad \Vert \varvec{x}\Vert ^2 \quad {\text {s.t.}} \quad \Vert \varvec{y}- \mathbf{A}\varvec{x}\Vert \le \delta . \end{aligned}$$

If \(\varvec{x}^{\star }\) contains zero entries, we can omit the corresponding columns of \(\mathbf{A}\), and the regression problem formulated on the remaining sub-matrix of \(\mathbf{A}\) has a minimiser with no zero entries. Hence, without loss of generality, we can assume that all entries of \(\varvec{x}^{\star }\) are nonzero.

Let \(\varvec{z}= \varvec{y}- \sum _{k = 2}^{K} {\varvec{a}}_k x_k^{\star } \); then \(x_1^{\star }\) is the minimiser of the optimisation with respect to \(x_1\) alone, that is

$$\begin{aligned} x_1^{\star } = \mathop {\text {arg min}}\limits _{x_1} \quad x_1^2 \quad {\text {s.t.}} \quad \Vert \varvec{z}- {\varvec{a}}_1 x_1 \Vert \le \delta . \end{aligned}$$
(46)

The constraint function can be written as

$$\begin{aligned} c(x_1) = \Vert \varvec{z}- {\varvec{a}}_1 x_1\Vert ^2 - \delta ^2 = \Vert {\varvec{a}}_1\Vert ^2 \, x_1^2 - 2 ({\varvec{a}}_1^{\mathrm{T}} \varvec{z}) \, x_1 + \Vert \varvec{z}\Vert ^2 - \delta ^2. \end{aligned}$$

Since \(c(x_1^{\star }) \le 0\), the quadratic \(c(x_1)\) must have two real roots \(t_{-}\) and \(t_{+}\). Moreover, it is clear from (46) that \(\Vert \varvec{z}\Vert ^2 > \delta ^2\), since otherwise \(x_1^{\star } = 0\). Hence, the two roots \(t_{-}\) and \(t_{+}\) have the same sign, because \( t_{-} t_{+} = \frac{\Vert \varvec{z}\Vert ^2 - \delta ^2}{\Vert {\varvec{a}}_1\Vert ^2} > 0\). As a result, the minimiser of (46) must be the root of smaller magnitude, \(|x_1^{\star }| = \min (|t_{-}|,|t_{+}|)\), at which the inequality constraint holds with equality. □
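The coordinate-wise argument above translates into a small numerical check (ours, with synthetic data and hypothetical variable names): for the single-coordinate problem (46), the minimiser is the root of \(c(x_1)\) with the smaller magnitude, and the residual constraint is active there.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 20
a1 = rng.standard_normal(m)                        # column a_1
z = 2.0 * a1 + 0.1 * rng.standard_normal(m)        # plays the role of z = y - sum_{k>1} a_k x_k*

x_ls = (a1 @ z) / (a1 @ a1)                        # unconstrained least-squares coefficient
r_min = np.linalg.norm(z - a1 * x_ls)              # smallest attainable residual
delta = 0.5 * (r_min + np.linalg.norm(z))          # feasible, yet delta < ||z||

# c(x1) = ||a1||^2 x1^2 - 2 (a1' z) x1 + ||z||^2 - delta^2
A2, b, q = a1 @ a1, a1 @ z, z @ z - delta**2
disc = b**2 - A2 * q                               # >= 0 since c(x_ls) <= 0
t_minus, t_plus = (b - np.sqrt(disc)) / A2, (b + np.sqrt(disc)) / A2

# both roots share the same sign (t_minus * t_plus = q / ||a1||^2 > 0); the
# minimiser of x1^2 over the feasible interval [t_minus, t_plus] is the root
# of smaller magnitude, and the residual constraint is active there.
x1 = t_minus if abs(t_minus) <= abs(t_plus) else t_plus
assert np.isclose(np.linalg.norm(z - a1 * x1), delta)
```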

About this article

Cite this article

Phan, AH., Yamagishi, M., Mandic, D. et al. Quadratic programming over ellipsoids with applications to constrained linear regression and tensor decomposition. Neural Comput & Applic 32, 7097–7120 (2020). https://doi.org/10.1007/s00521-019-04191-z

