
Quadratic programming over ellipsoids with applications to constrained linear regression and tensor decomposition

  • Original Article
  • Published in Neural Computing and Applications

Abstract

A novel algorithm to solve the quadratic programming (QP) problem over ellipsoids is proposed. This is achieved by splitting the QP problem into two optimisation sub-problems: (1) quadratic programming over a sphere and (2) orthogonal projection. Next, an augmented-Lagrangian algorithm is developed for this multiple-constraint optimisation. Benefitting from the fact that the QP over a single sphere can be solved in closed form by solving a secular equation, we derive a tighter bound on the minimiser of the secular equation. We also propose to generate a new positive semidefinite matrix with a low condition number from the matrices in the quadratic constraint, which is shown to improve the convergence of the proposed augmented-Lagrangian algorithm. Finally, applications of the quadratically constrained QP to bounded linear regression and tensor decomposition paradigms are presented.



Acknowledgements

The work of AHP and AC was supported by the Mega Grant Project (14.756.31.0001).

Author information


Corresponding author

Correspondence to Anh-Huy Phan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Proof of Lemma 1

Proof

We consider the case when some eigenvalues are identical, e.g. \(s_1 = s_2 = \cdots = s_L< s_{L+1}< \cdots < s_K\). If the entries of \(\varvec{c}_{1:L}\) are all zero, the objective function is independent of \(\tilde{\varvec{x}}_{1:L} = [\tilde{x}_1, \tilde{x}_2, \ldots , \tilde{x}_L]\). Hence, \(\tilde{\varvec{x}}_{1:L}\) can be any point on the sphere \(\Vert \tilde{\varvec{x}}_{1:L}\Vert ^2 = d^2 = 1 - \sum _{k = L+1}^{K} \tilde{x}_{k}^2\). Otherwise, with the remaining parameters \(\tilde{x}_{L+1}, \ldots , \tilde{x}_K\) held fixed, \(\tilde{\varvec{x}}_{1:L}\) is the minimiser of the norm-constrained linear programme

$$\begin{aligned} \min \,\, \varvec{c}_{1:L}^{\mathrm{T}} \, \tilde{\varvec{x}}_{1:L} \quad {\text {s.t.}} \quad \Vert \tilde{\varvec{x}}_{1:L}\Vert = d \end{aligned}$$

which yields \( \tilde{\varvec{x}}_{1:L}= \frac{-d}{\Vert \varvec{c}_{1:L}\Vert } \varvec{c}_{1:L}\).

In both cases, we can define \( \varvec{z}= [-d, \tilde{x}_{L+1}, \ldots , \tilde{x}_{K}]\), \(\tilde{\varvec{c}} = [\Vert \varvec{c}_{1:L}\Vert , c_{L+1}, \ldots , c_{K}]\) and \(\tilde{\varvec{s}} = [s_1, s_{L+1}, \ldots , s_K]\), and perform a reparameterisation to estimate \(\varvec{z}\) from a similar SCQP with distinct eigenvalues \(\tilde{\varvec{s}}\), as

$$\begin{aligned}&\min \,\, \frac{1}{2} \, \varvec{z}^{\mathrm{T}} \, {\text {diag}}(\tilde{\varvec{s}} ) \, {\varvec{z}} + \tilde{\varvec{c}}^{\mathrm{T}} {\varvec{z}}\quad {\text {s.t.}} \quad {\varvec{z}}^{\mathrm{T}} {\varvec{z}} = 1. \end{aligned}$$
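The reparameterisation above is straightforward to implement. The following sketch (our illustration, not the authors' code; the function name and interface are ours) collapses a block of identical leading eigenvalues into a single entry with coefficient \(\Vert \varvec{c}_{1:L}\Vert \) and maps a minimiser of the reduced SCQP back to the original variables.

```python
import numpy as np

def collapse_repeated_eigenvalues(s, c, tol=1e-12):
    """Reduce an SCQP  min 0.5 x'diag(s)x + c'x  s.t.  x'x = 1
    with identical leading eigenvalues s_1 = ... = s_L to an equivalent
    problem with a distinct leading eigenvalue, following Lemma 1.

    Returns (s_tilde, c_tilde, expand), where expand maps a minimiser z
    of the reduced problem back to a minimiser x of the original one.
    """
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)
    L = int(np.sum(np.abs(s - s[0]) <= tol))   # size of the repeated block
    if L == 1:
        return s, c, lambda z: np.asarray(z, dtype=float)

    c_head = np.linalg.norm(c[:L])             # ||c_{1:L}||
    s_tilde = np.concatenate(([s[0]], s[L:]))
    c_tilde = np.concatenate(([c_head], c[L:]))

    def expand(z):
        z = np.asarray(z, dtype=float)
        d = abs(z[0])                          # mass assigned to the repeated block
        if c_head > 0:
            x_head = (-d / c_head) * c[:L]     # x_{1:L} = -d c_{1:L} / ||c_{1:L}||
        else:
            x_head = np.zeros(L)
            x_head[0] = d                      # any point of norm d is optimal
        return np.concatenate((x_head, z[1:]))

    return s_tilde, c_tilde, expand
```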

Appendix 2: Proof of Lemma 2

Proof

\(f^{\prime }(\lambda )\) monotonically decreases for \(\lambda < s_1 = 1\), since the second derivative \( f^{''}(\lambda ) = - 2 \, \sum _{k} \frac{c_k^2}{(s_k - \lambda )^3} <0 \) for all \(\lambda < s_1 = 1\).

In addition, since \(s_k \ge 1\) for all k, we have

$$\begin{aligned} f^{\prime }(0) = 1 - \sum _{k = 1}^{K} \frac{c_k^2}{s_k^2} \ge 1 - \sum _{k = 1}^{K} c_k^2 = 0 \end{aligned}$$

and

$$\begin{aligned} f^{\prime }(1-|c_1|)& = {} 1 - \sum _{k = 1}^{K} \frac{c_k^2}{(1-|c_1| - s_k)^2} = 1 - \frac{c_1^2}{c_1^2} - \sum _{k = 2}^{K} \frac{c_k^2}{(1-|c_1| - s_k)^2} \\ &\le {} - \sum _{k = 2}^{K} \frac{c_k^2}{(1-|c_1| - s_k)^2} \le 0 . \end{aligned}$$

This implies that \(f^{\prime }(\lambda )\) has a unique root smaller than 1. Moreover, the root lies in the interval \([0,1-|c_1|)\). □
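Lemma 2 suggests a simple numerical recipe: bracket the unique root of \(f^{\prime }(\lambda )\) in \([0, 1-|c_1|]\) and solve with any safeguarded one-dimensional method. The sketch below is ours (not the paper's implementation); it assumes the normalisation used in the lemma, namely \(s\) sorted in ascending order with \(s_1 = 1\), \(\sum _k c_k^2 = 1\) and nonzero \(c_k\), and recovers the minimiser via the closed form \(\tilde{x}_k = c_k/(\lambda - s_k)\) used later in Appendices 6 and 7.

```python
import numpy as np
from scipy.optimize import brentq

def secular_root(s, c):
    """Unique root of f'(lam) = 1 - sum_k c_k^2 / (s_k - lam)^2
    in [0, 1 - |c_1|]  (Lemma 2).  Assumes s[0] = 1 <= s[k], sum(c**2) = 1."""
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)

    def fprime(lam):
        return 1.0 - np.sum(c**2 / (s - lam)**2)

    lo, hi = 0.0, 1.0 - abs(c[0])
    flo, fhi = fprime(lo), fprime(hi)
    if flo <= 0.0:          # numerical safeguard; Lemma 2 gives f'(0) >= 0
        return lo
    if fhi >= 0.0:          # numerical safeguard; Lemma 2 gives f'(hi) <= 0
        return hi
    # f' is strictly decreasing on (-inf, s_1), so the bracket is valid.
    return brentq(fprime, lo, hi)

def scqp_minimiser(s, c):
    """Minimiser of  0.5 x'diag(s)x + c'x  s.t.  x'x = 1,
    recovered from the smallest root of f' via x_k = c_k / (lam - s_k)."""
    lam = secular_root(s, c)
    return np.asarray(c, dtype=float) / (lam - np.asarray(s, dtype=float))
```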

Appendix 3: Proof of Lemma 3

Proof

Let \(\lambda _1\) be the root of \(f^{\prime }(\lambda )\) that is smaller than \(s_1 = 1\), and let \(\lambda _2\) be any other root of \(f^{\prime }(\lambda ) \). Then, according to Lemma 2, \(\lambda _2> 1 > \lambda _1\), and

$$\begin{aligned} \sum _{k = 1}^{K} \frac{c_k^2}{(\lambda _1 - s_k)^2} = \sum _{k = 1}^{K} \frac{c_k^2}{(\lambda _2 - s_k)^2} = 1. \end{aligned}$$
(40)

It can be shown that

$$\begin{aligned} & f(\lambda _2) - f(\lambda _1) = {} \lambda _2 - \lambda _1 + \sum _{k = 1}^{K} \left( \frac{c_k^2}{\lambda _2 - s_k} - \frac{c_k^2}{\lambda _1 - s_k} \right) \nonumber \\ & \quad = {} (\lambda _2 - \lambda _1) \left( 1 - \sum _{k = 1}^{K} \frac{|c_k|}{s_k-\lambda _1} \frac{|c_k|}{s_k-\lambda _2}\right) \nonumber \\ & \quad \ge {} (\lambda _2 - \lambda _1) \left( 1 - \sqrt{\sum _{k = 1}^{K} \frac{c_k^2}{(\lambda _1 - s_k)^2}} \, \sqrt{ \sum _{k = 1}^{K} \frac{c_k^2}{(\lambda _2 - s_k)^2}} \right) \nonumber \\ &\quad = {} (\lambda _2 - \lambda _1) ( 1 - 1 \times 1) = 0. \end{aligned}$$
(41)

The inequality follows from the Cauchy–Schwarz inequality, and the final equality in (41) follows by substituting the optimality conditions (40). Equality cannot hold because \(s_1 -\lambda _2 < 0\); hence \(f(\lambda _2) > f(\lambda _1)\), and the minimiser \(\lambda ^{\star }\) of \(f(\lambda )\) is the smallest root \(\lambda _1\) of \(f^{\prime }(\lambda )\). □
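As a quick numerical sanity check of Lemma 3 (ours, not part of the paper), one can compare \(f(\lambda ) = \lambda + \sum _k c_k^2/(\lambda - s_k)\), which is consistent with the differences used in (41) up to an additive constant, at the root below \(s_1\) and at the root lying above \(s_K\); the root below \(s_1\) yields the smaller value.

```python
import numpy as np
from scipy.optimize import brentq

def f(lam, s, c):
    # dual-style function of Lemma 3, up to an additive constant
    return lam + np.sum(c**2 / (lam - s))

def fprime(lam, s, c):
    return 1.0 - np.sum(c**2 / (s - lam)**2)

rng = np.random.default_rng(0)
K = 6
s = np.sort(1.0 + 4.0 * rng.random(K))
s[0] = 1.0                                   # s_1 = 1 <= ... <= s_K
c = rng.standard_normal(K)
c /= np.linalg.norm(c)                       # sum c_k^2 = 1

lam1 = brentq(fprime, 0.0, 1.0 - abs(c[0]), args=(s, c))           # root below s_1
lam2 = brentq(fprime, s[-1] + 1e-9, s[-1] + 1.001, args=(s, c))    # a root above s_K

assert f(lam1, s, c) < f(lam2, s, c)   # Lemma 3: the smallest root minimises f
```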

Appendix 4: Proof of Lemma 4

Proof

We first show that the polynomials \(p_i(t)\) have unique roots in \(\displaystyle \left[ {|c_1|},1 \right] \). The second derivative of \(p_i(t)\) is given by

$$\begin{aligned} p_i^{\prime \prime }(t) = 12t^2 +12d_i t + 2(d_i^2-1) \end{aligned}$$

and has two roots \(\bar{t}_{1,2} = \displaystyle \frac{-3d_i \mp \sqrt{3d_i^2 + 6}}{6}\).

If \(d_i > 1\), the roots \(\bar{t}_{1,2}\) are negative. Hence, the first derivative \(p_i^{\prime }(t)\) monotonically increases in \([0, +\infty )\). In addition, since

$$\begin{aligned} p_i^{\prime }(0) = - 2c_1^2 d_i \le 0 \end{aligned}$$

\(p_i^{\prime }(t)\) has only one root in \([0, +\infty )\). Together with the fact that \(p_i(0) = -c_1^2 d_i^2 \le 0\), \(p_i(|c_1|) = c_1^2 (c_1^2 - 1) \, \le \, 0 \) and \(p_i(1) = d_i(d_i+2) (1 - c_1^2) \ge 0\), the polynomial \(p_i(t)\) has a unique root in \(\displaystyle \left[ {|c_1|},1 \right] \).

If \(d_i\le 1\), the second root \(\bar{t}_2\) is nonnegative, \(\bar{t}_2 \ge 0\), so \(p_i^{\prime }(t)\) first decreases and then increases on \([0, +\infty )\). Since \(p_i^{\prime }(0) = - 2c_1^2 d_i \le 0\), the first derivative \(p_i^{\prime }(t)\) again has only one root in \([0, +\infty )\). As in the case \(d_i>1\), the polynomial \(p_i(t)\) therefore has a unique root in \(\displaystyle \left[ {|c_1|},1 \right] \).

By the definition of the root \(t_2\), the derivative \(f^{\prime }(s_1 - t_2)\) does not exceed zero, that is

$$\begin{aligned} f^{\prime }(s_1-t_2)& = {} 1 - \frac{c_1^2}{t_2^2} - \sum _{k = 2}^{K} \frac{c_k^2}{(s_k-s_1 + t_2)^2} \le 1 - \frac{c_1^2}{t_2^2} - \frac{\sum _{k = 2}^{K} c_k^2}{(s_K-s_1 + t_2)^2} \nonumber \\ & = {} 1 - \frac{c_1^2}{t_2^2} - \frac{1 - c_1^2}{(d_2 + t_2)^2} = \frac{p_2(t_2)}{t_2^2 (d_2 + t_2)^2} \nonumber \\ & = {} 0. \end{aligned}$$
(42)

Similarly, we have

$$\begin{aligned} f^{\prime }(s_1-t_1)& = {} 1 - \frac{c_1^2}{t_1^2} - \sum _{k = 2}^{K} \frac{c_k^2}{(s_k-s_1 + t_1)^2} \ge 1 - \frac{c_1^2}{t_1^2} - \frac{\sum _{k = 2}^{K} c_k^2}{(d_1 + t_1)^2} \nonumber \\ & = {} 1 - \frac{c_1^2}{t_1^2} - \frac{1 - c_1^2}{(d_1 + t_1)^2} = \frac{p_1(t_1)}{t_1^2 (d_1 + t_1)^2} \nonumber \\ & = {} 0. \end{aligned}$$
(43)

From (42) and (43), it follows that \(f^{\prime }(\lambda )\) has a root in \([1 - t_1, 1-t_2]\). This root is unique and is also the global minimiser of \(f(\lambda )\) in (5). This completes the proof. □
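The bracket \([s_1 - t_1, s_1 - t_2]\) can be computed directly from the two quartics. The sketch below is our reconstruction: the explicit form \(p_i(t) = t^4 + 2 d_i t^3 + (d_i^2 - 1)t^2 - 2 c_1^2 d_i t - c_1^2 d_i^2\) is recovered from the quantities stated in the proof (it matches \(p_i''(t)\), \(p_i'(0)\), \(p_i(0)\), \(p_i(|c_1|)\) and \(p_i(1)\) above), and we take \(d_2 = s_K - s_1\) from (42) and \(d_1 = s_2 - s_1\) as the tightest value consistent with (43); these choices of \(d_1\), \(d_2\) are our reading of Lemma 4, not quoted from it.

```python
import numpy as np

def root_bracket(s, c):
    """Bracket [s_1 - t_1, s_1 - t_2] containing the minimiser lambda* of f
    (Lemma 4).  Assumes s sorted ascending with s[0] = 1 and sum(c**2) = 1.

    p_i(t) = t^4 + 2 d_i t^3 + (d_i^2 - 1) t^2 - 2 c_1^2 d_i t - c_1^2 d_i^2,
    with d_1 = s_2 - s_1 and d_2 = s_K - s_1 (our reading of (42)-(43));
    t_i is the unique root of p_i in [|c_1|, 1].
    """
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)
    c1sq = c[0]**2

    def unique_root(d):
        coeffs = [1.0, 2.0 * d, d * d - 1.0, -2.0 * c1sq * d, -c1sq * d * d]
        roots = np.roots(coeffs)
        real = roots[np.abs(roots.imag) < 1e-10].real
        # keep the root lying in [|c_1|, 1]  (unique by Lemma 4)
        cand = real[(real >= abs(c[0]) - 1e-12) & (real <= 1.0 + 1e-12)]
        return float(cand[0])

    t1 = unique_root(s[1] - s[0])      # d_1 = s_2 - s_1
    t2 = unique_root(s[-1] - s[0])     # d_2 = s_K - s_1
    return s[0] - t1, s[0] - t2        # lower and upper ends of the bracket
```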

Appendix 5: Proof of Lemma 5

Proof

First, similar to Lemma 2, the roots \(\lambda _{l,L}^{\star }\) and \(\lambda _{u,L}^{\star }\) are unique in the interval \([0, 1-|c_1|]\). Taking into account that \(\sum _{k = 1}^{K} c_k^2 = 1\), and \(s_K \ge s_k\) for all k, we have

$$\begin{aligned} f^{\prime }(\lambda )& = {} 1 - \sum _{l = 1}^{L} \frac{c_l^2}{(s_l - \lambda )^2} - \sum _{k = L+1}^{K} \frac{c_k^2}{(s_k- \lambda )^2} \\&\le {} 1 - \sum _{l = 1}^{L} \frac{c_l^2}{(s_l - \lambda )^2} - \frac{\sum _{k = L+1}^{K} c_k^2}{(s_K- \lambda )^2} = 1 - \sum _{l = 1}^{L} \frac{c_l^2}{(s_l - \lambda )^2} - \frac{\tilde{c}_{L+1}^2}{(s_K- \lambda )^2} \\ & = {} f^{(L)}_{u}(\lambda ) . \end{aligned}$$

Similarly, we can derive \(f^{\prime }(\lambda ) \ge f^{(L)}_{l}(\lambda )\). It follows that the values of \(f^{\prime }(\lambda )\) at \(\lambda _{l,L}^{\star }\) and \(\lambda _{u,L}^{\star }\) are nonnegative and non-positive, respectively,

$$\begin{aligned} f^{\prime }(\lambda _{l,L}^{\star })\ge & {} f^{(L)}_{l}(\lambda _{l,L}^{\star }) = 0 , \;{\text {and}} \;\; f^{\prime }(\lambda _{u,L}^{\star }) \le f^{(L)}_{u}(\lambda _{u,L}^{\star }) = 0, \end{aligned}$$

thus implying that \( \lambda _{l,L}^{\star } \le \lambda ^{\star } \, \le \lambda _{u,L}^{\star }\). The sequence of inequalities in (6) can be proved in a similar way. □
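Lemma 5 turns into progressively tighter brackets for \(\lambda ^{\star }\): keep the first \(L\) secular terms exactly and lump the tail into a single term anchored at \(s_K\) (the upper surrogate \(f^{(L)}_{u}\) written in the proof) or at \(s_{L+1}\) (our reading of the lower surrogate \(f^{(L)}_{l}\), which is the analogous bound in the other direction). A sketch of this construction, ours rather than the authors' code:

```python
import numpy as np
from scipy.optimize import brentq

def bracket(s, c, L):
    """Bounds  lambda_{l,L} <= lambda* <= lambda_{u,L}  in the spirit of Lemma 5.
    Keeps the first L secular terms exactly and lumps the tail into one term,
    anchored at s_{L+1} (lower surrogate) or at s_K (upper surrogate).
    Assumes s ascending with s[0] = 1, sum(c**2) = 1, and 1 <= L < len(s)."""
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)
    assert 1 <= L < len(s)
    tail = np.sum(c[L:]**2)                        # \tilde{c}_{L+1}^2

    def surrogate(anchor):
        def g(lam):
            head = np.sum(c[:L]**2 / (s[:L] - lam)**2)
            return 1.0 - head - tail / (anchor - lam)**2
        # g(0) >= 0 >= g(1 - |c_1|), as in the proof of Lemma 2
        return brentq(g, 0.0, 1.0 - abs(c[0]))

    return surrogate(s[L]), surrogate(s[-1])       # (lambda_{l,L}, lambda_{u,L})
```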

Appendix 6: Proof of Lemma 6

Proof

By contradiction, assume that the variable \(\tilde{x}_n^{\star }\) is nonzero. Since \(c_n\) is the only zero coefficient, it follows from (3) that the multiplier must satisfy \( \lambda ^{\star } = s_n\), and the minimiser \(\tilde{\varvec{x}}^{\star }\) is given by

$$\begin{aligned} \tilde{x}_{k}^{\star }& = {} \frac{c_k}{s_n - s_k} ,\quad k \ne n \end{aligned}$$

while from the unit-length condition of \(\tilde{\varvec{x}}^{\star }\), we have \(\tilde{x}_{n}^{\star 2} = 1 - \sum _{k\ne n} \tilde{x}_{k}^{\star 2}\) which requires an additional assumption \( \sum _{k\ne n} \frac{c_k^2}{(s_n-s_k)^2} < 1\).

The objective function in (2) at \({\tilde{\varvec{x}}}^{\star }\), which coincides with the Lagrangian function at \(({\tilde{\varvec{x}}}^{\star }, \lambda ^{\star } = s_n)\), is given by

$$\begin{aligned} {\mathcal {L}}({\tilde{\varvec{x}}}^{\star }, \lambda ^{\star })& = {} \frac{1}{2} \left( s_n - s_n \, \sum _{k \ne n} \frac{c_k^2 }{(s_n - s_k)^2} + \sum _{k \ne n} \frac{c_k^2 \, s_k}{(s_n - s_k)^2} \right) + \sum _{k \ne n} \frac{c_k^2 }{s_n - s_k} \nonumber \\ & = {} \frac{1}{2}\left( s_n + \sum _{k \ne n} \frac{c_k^2}{s_n - s_k} \right) . \end{aligned}$$
(44)

Now, we consider a vector \(\bar{\varvec{x}}\) whose n-th entry is zero, \(\bar{x}_n = 0\), and whose remaining \((K-1)\) coefficients \(\bar{\varvec{x}}_{n} = [\bar{x}_1, \ldots , \bar{x}_{n-1}, \bar{x}_{n+1},\ldots , \bar{x}_K]\) form the minimiser of the reduced problem

$$\begin{aligned}&\min \,\, \frac{1}{2} \sum _{k \ne n} s_k \, \tilde{x}_k^2 + \sum _{k \ne n} c_k \, \tilde{x}_k,\;\;\; {\text {s.t.}} \quad \sum _{k \ne n} \tilde{x}_k^2 = 1. \end{aligned}$$

According to the results in Sect. 2.2, when the \(c_k\), \(k \ne n\), are nonzero, the Lagrangian function for this reduced problem at the minimiser \(\bar{\varvec{x}}_{n}\) is given by

$$\begin{aligned} {\mathcal {L}}_n(\bar{\varvec{x}}_{n}, \lambda _n^{\star }) = \frac{1}{2}\left( \lambda _n^{\star } + \sum _{k \ne n} \frac{c_k^2}{\lambda _n^{\star } - s_k} \right) , \end{aligned}$$
(45)

where the optimal multiplier \(\lambda _n^{\star } < s_1 = 1\). Comparing (44) and (45) by the same Cauchy–Schwarz argument as in the proof of Lemma 3, using \(\sum _{k \ne n} \frac{c_k^2}{(\lambda _n^{\star } - s_k)^2} = 1\) together with the assumption \(\sum _{k \ne n} \frac{c_k^2}{(s_n - s_k)^2} < 1\), we obtain

$$\begin{aligned} {\mathcal {L}}({\tilde{\varvec{x}}}^{\star }, \lambda ^{\star }) > {\mathcal {L}}_n(\bar{\varvec{x}}_{n}, \lambda _n^{\star }) = {\mathcal {L}}({\bar{\varvec{x}}}, \lambda ^{\star }_n), \end{aligned}$$

which contradicts the assumption that \(\tilde{\varvec{x}}^{\star }\) is the minimiser of problem (2). This implies that the n-th variable of the minimiser must be zero, i.e. \(\tilde{x}_n^{\star } = 0\). □

Appendix 7: Proof of Lemma 7

Proof

When \(c_1 = 0\), from the first optimality condition in (3), we have

$$\begin{aligned} (s_1 - \lambda ) \, \tilde{x}_1 = 0. \end{aligned}$$

Assume that \(\tilde{\varvec{x}}^{\star }\) is a minimiser of the problem in (2) with a nonzero \(\tilde{x}_1^{\star }\); then \(\lambda = s_1 = 1\) and

$$\begin{aligned} \tilde{x}_k^{\star } = \frac{c_k}{\lambda -s_k} = \frac{c_k}{1-s_k} \quad {\text {for}} \; k >1. \end{aligned}$$

From the unit-length constraint, it follows that \( {(\tilde{x}_1^{\star })}^2 = 1- \sum _{k>1} ({\tilde{x}_k^{\star }})^2 = 1 - d\), which requires the condition \( d \le 1\). This implies that, if \( d >1\), \(\tilde{x}_1^{\star }\) must be zero, and the remaining \((K-1)\) variables \([\tilde{x}_2^{\star },\ldots , \tilde{x}_K^{\star }]\) form the minimiser of the reduced problem of (2).

When \(d\le 1\), such an \(\tilde{x}_1^{\star }\) exists, and the objective function at \(\tilde{\varvec{x}}^{\star }\) is given by

$$\begin{aligned} {\mathcal {L}}(\tilde{\varvec{x}}^{\star },s_1) = \frac{1}{2}\left( 1 - \sum _{k>1} \frac{c_k^2}{s_k - 1}\right) . \end{aligned}$$

Now, we consider a vector \(\tilde{\varvec{x}}\) with \(\tilde{x}_1 = 0\), whose remaining entries \(\bar{\varvec{x}} = [\tilde{x}_2, \ldots , \tilde{x}_K]\) form a minimiser of the reduced problem (7). Similarly to the analysis in Sect. 2.2, the objective function of the reduced problem (7) attains its global minimum at the smallest root \(\bar{\lambda }\) of the first derivative of the Lagrangian function

$$\begin{aligned} {\mathcal {L}}_1(\bar{\varvec{x}},\bar{\lambda }) = \frac{1}{2}\left( \bar{\lambda } - \sum _{k>1} \frac{c_k^2}{ s_k - \bar{\lambda }} \right) , \end{aligned}$$

where \(\bar{\lambda }\) is smaller than \(s_2\).

Since the second derivative of \({\mathcal {L}}_1(\bar{\varvec{x}},\lambda )\) with respect to \(\lambda \) is negative for all \(\lambda < s_2\), the function \({\mathcal {L}}_1(\bar{\varvec{x}},\lambda )\) is concave in \((-\infty , s_2)\) and is therefore maximised over this interval at its stationary point \(\bar{\lambda }\). Since \( {\mathcal {L}}(\tilde{\varvec{x}}^{\star },s_1)\) equals \({\mathcal {L}}_1(\bar{\varvec{x}},\lambda )\) evaluated at \(\lambda = s_1 < s_2\), it follows that \( {\mathcal {L}}(\tilde{\varvec{x}}^{\star },s_1) < {\mathcal {L}}_1(\bar{\varvec{x}},\bar{\lambda })\), and \(\tilde{\varvec{x}}^{\star }\) is the global minimiser. Note that \(\tilde{x}_1^{\star }\) can be \(\sqrt{1-d}\) or \(-\sqrt{1-d}\). □
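Lemma 7 yields an explicit recipe for the degenerate case \(c_1 = 0\): compute \(d = \sum _{k>1} c_k^2/(1-s_k)^2\); if \(d \le 1\), the minimiser has \(\tilde{x}_1 = \pm \sqrt{1-d}\) and \(\tilde{x}_k = c_k/(1-s_k)\) for \(k>1\), otherwise \(\tilde{x}_1 = 0\) and the remaining entries solve the reduced problem. A self-contained sketch (ours; the reduced case is handled by applying the bracketed secular solve of Lemma 2 to the reduced problem, and distinct eigenvalues are assumed):

```python
import numpy as np
from scipy.optimize import brentq

def scqp_c1_zero(s, c):
    """Minimiser of  0.5 x'diag(s)x + c'x  s.t.  x'x = 1  when c_1 = 0
    (Lemma 7).  Assumes s strictly increasing with s[0] = 1,
    sum(c**2) = 1, c[0] = 0 and c[1:] nonzero."""
    s = np.asarray(s, dtype=float)
    c = np.asarray(c, dtype=float)
    d = np.sum(c[1:]**2 / (1.0 - s[1:])**2)

    x = np.empty_like(c)
    if d <= 1.0:
        # lambda* = s_1 = 1:  x_1 = +/- sqrt(1 - d),  x_k = c_k / (1 - s_k)
        x[0] = np.sqrt(1.0 - d)            # -sqrt(1 - d) is equally optimal
        x[1:] = c[1:] / (1.0 - s[1:])
        return x

    # d > 1:  x_1 = 0 and the remaining entries solve the reduced SCQP;
    # its multiplier is the unique root of the reduced secular derivative
    # below s_2 (Lemma 2 applied to the reduced problem).
    def g(lam):
        return 1.0 - np.sum(c[1:]**2 / (s[1:] - lam)**2)

    lam = brentq(g, s[1] - 1.0, s[1] - 1e-12)   # g(s_2 - 1) >= 0 > g(s_2^-)
    x[0] = 0.0
    x[1:] = c[1:] / (lam - s[1:])
    return x
```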

Appendix 8: Proof of Lemma 9

Proof

Let \(\varvec{x}^{\star }\) be a minimiser of problem (16)

$$\begin{aligned} \varvec{x}^{\star } = \mathop {\text {arg min}}\limits _{\varvec{x}} \quad \Vert \varvec{x}\Vert ^2 \quad {\text {s.t.}} \quad \Vert \varvec{y}- \mathbf{A}\varvec{x}\Vert \le \delta . \end{aligned}$$

If \(\varvec{x}^{\star }\) contains zero entries, we can omit the corresponding columns of \(\mathbf{A}\), and the regression problem formulated on the remaining sub-matrix of \(\mathbf{A}\) has a minimiser with no zero entries. Hence, without loss of generality, we can assume that all entries of \(\varvec{x}^{\star }\) are nonzero.

Let \(\varvec{z}= \varvec{y}- \sum _{k = 2}^{K} {\varvec{a}}_k x_k^{\star } \); then \(x_1^{\star }\) is the minimiser of the optimisation with respect to \(x_1\) alone, that is

$$\begin{aligned} x_1^{\star } = \mathop {\text {arg min}}\limits _{x_1} \quad x_1^2 \quad {\text {s.t.}} \quad \Vert \varvec{z}- {\varvec{a}}_1 x_1 \Vert \le \delta . \end{aligned}$$
(46)

The constraint function can be written as

$$\begin{aligned} c(x_1) = \Vert \varvec{z}- {\varvec{a}}_1 x_1\Vert ^2 - \delta ^2 = \Vert {\varvec{a}}_1\Vert ^2 \, x_1^2 - 2 ({\varvec{a}}_1^{\mathrm{T}} \varvec{z}) \, x_1 + \Vert \varvec{z}\Vert ^2 - \delta ^2. \end{aligned}$$

Since \(c(x_1^{\star }) \le 0\), the quadratic \(c(x_1)\) must have two real roots \(t_{-}\) and \(t_{+}\). Moreover, it is clear from (46) that \(\Vert \varvec{z}\Vert ^2 > \delta ^2\), since otherwise \(x_1^{\star } = 0\). Hence, the two roots \(t_{-}\) and \(t_{+}\) have the same sign, because \( t_{-} t_{+} = \frac{\Vert \varvec{z}\Vert ^2 - \delta ^2}{\Vert {\varvec{a}}_1\Vert ^2} > 0\). As a result, the minimiser of (46) must be the root of smaller magnitude, \(|x_1^{\star }| = \min (|t_{-}|,|t_{+}|)\), at which the inequality constraint holds with equality. □
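The coordinate-wise argument above translates into a small numerical check (ours, with synthetic data and hypothetical variable names): for the single-coordinate problem (46), the minimiser is the root of \(c(x_1)\) with the smaller magnitude, and the residual constraint is active there.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 20
a1 = rng.standard_normal(m)                        # column a_1
z = 2.0 * a1 + 0.1 * rng.standard_normal(m)        # plays the role of z = y - sum_{k>1} a_k x_k*

x_ls = (a1 @ z) / (a1 @ a1)                        # unconstrained least-squares coefficient
r_min = np.linalg.norm(z - a1 * x_ls)              # smallest attainable residual
delta = 0.5 * (r_min + np.linalg.norm(z))          # feasible, yet delta < ||z||

# c(x1) = ||a1||^2 x1^2 - 2 (a1' z) x1 + ||z||^2 - delta^2
A2, b, q = a1 @ a1, a1 @ z, z @ z - delta**2
disc = b**2 - A2 * q                               # >= 0 since c(x_ls) <= 0
t_minus, t_plus = (b - np.sqrt(disc)) / A2, (b + np.sqrt(disc)) / A2

# both roots share the same sign (t_minus * t_plus = q / ||a1||^2 > 0); the
# minimiser of x1^2 over the feasible interval [t_minus, t_plus] is the root
# of smaller magnitude, and the residual constraint is active there.
x1 = t_minus if abs(t_minus) <= abs(t_plus) else t_plus
assert np.isclose(np.linalg.norm(z - a1 * x1), delta)
```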

About this article

Cite this article

Phan, AH., Yamagishi, M., Mandic, D. et al. Quadratic programming over ellipsoids with applications to constrained linear regression and tensor decomposition. Neural Comput & Applic 32, 7097–7120 (2020). https://doi.org/10.1007/s00521-019-04191-z

