Sharp Recovery Bounds for Convex Demixing, with Applications

Foundations of Computational Mathematics

Abstract

Demixing refers to the challenge of identifying two structured signals given only the sum of the two signals and prior information about their structures. Examples include the problem of separating a signal that is sparse with respect to one basis from a signal that is sparse with respect to a second basis, and the problem of decomposing an observed matrix into a low-rank matrix plus a sparse matrix. This paper describes and analyzes a framework, based on convex optimization, for solving these demixing problems, and many others. This work introduces a randomized signal model that ensures that the two structures are incoherent, i.e., generically oriented. For an observation from this model, this approach identifies a summary statistic that reflects the complexity of a particular signal. The difficulty of separating two structured, incoherent signals depends only on the total complexity of the two structures. Some applications include (1) demixing two signals that are sparse in mutually incoherent bases, (2) decoding spread-spectrum transmissions in the presence of impulsive errors, and (3) removing sparse corruptions from a low-rank matrix. In each case, the theoretical analysis of the convex demixing method closely matches its empirical behavior.

Notes

  1. We prefer the notation \(\mathrm {nnz}(\cdot )\) over \(\Vert \cdot \Vert _{\ell _0}\) because the number of nonzero elements in a vector is not a norm.

  2. Our actual computation uses the simpler, but equivalent, formula given in [5, Prop. 4.9].

  3. Equation (10.12) requires a significant amount of wholly uninteresting algebraic simplification from the formulas of [25]. The key steps in this simplification follow from the equations on p. 638 of the reference. In particular, we write \(y\) explicitly in terms of \(s\) with their Eq. (6.12) and then write \(\xi \) explicitly in terms of \(s\) using this expression for \(y\)—see Eq. (6.13) in the reference. The reference defines \(\gamma = \frac{\tau }{\theta }\) on p. 631, which gives (10.12), modulo trivial simplifications.

  4. This result is a corollary of a Gaussian process inequality due to Gordon [40, 41].

References

  1. A. Adler, V. Emiya, M.G. Jafari, M. Elad, R. Gribonval, and M.D. Plumbley. Audio inpainting. IEEE Trans. Audio, Speech, and Lang. Process., 20(3):922–932, 2012.

  2. F. Affentranger and R. Schneider. Random projections of regular simplices. Discrete Comput. Geom., 7(1):219–226, 1992.

  3. D. Amelunxen. Geometric analysis of the condition of the convex feasibility problem. Dissertation, Universität Paderborn, 2011.

  4. D. Amelunxen and P. Bürgisser. Intrinsic volumes of symmetric cones and applications in convex programming. Math. Program. Ser. A, 2014. doi:10.1007/s10107-013-0740-2.

  5. D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp. Living on the edge: A geometric theory of phase transitions in convex optimization. preprint, March 2013. arXiv:1303.6672.

  6. M. Bayati, M. Lelarge, and A. Montanari. Universality in polytope phase transitions and message passing algorithms. preprint, July 2012. arXiv:1207.7321.

  7. J. Bobin, Y. Moudden, and J.-L. Starck. Morphological diversity and source separation. IEEE Signal Process. Lett., 13(7):409–412, 2006.

  8. J. Bobin, J.-L. Starck, J. Fadili, and Y. Moudden. Sparsity and morphological diversity in blind source separation. IEEE Trans. Image Process., 16(11):2662–2674, Nov. 2007.

  9. K. Böröczky, Jr. and M. Henk. Random projections of regular polytopes. Arch. Math. (Basel), 73(6):465–473, Dec. 1999.

  10. P. Bürgisser and D. Amelunxen. Robust smoothed analysis of a condition number for linear programming. Math. Program., Apr. 2010.

  11. E. J. Candès, Y. C. Eldar, D. Needell, and P. Randall. Compressed sensing with coherent and redundant dictionaries. Appl. Comput. Harmon. Anal., 31(1):59–73, 2011.

  12. E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. Assoc. Comput. Mach., 58(3):1–37, May 2011.

  13. E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.

  14. V. Chandrasekaran, P. A. Parrilo, and A. S. Willsky. Latent variable graphical model selection via convex optimization. In 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1610–1613, Oct. 2010.

  15. V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Found. Comput. Math., 12(6):805–849, 2012.

  16. V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Sparse and low-rank matrix decompositions. In SYSID 2009, Saint-Malo, France, July 2009.

  17. V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim., 21(2):572–596, 2011.

  18. S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1999.

  19. J. F. Claerbout and F. Muir. Robust modeling with erratic data. Geophysics, 38(5):826–844, 1973.

  20. S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado. Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Trans. Signal Process., 53(7):2477–2488, 2005.

  21. R. A. DeVore and V. N. Temlyakov. Some remarks on greedy algorithms. Adv. Comput. Math., 5(2–3):173–187, 1996.

  22. NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov/, Release 1.0.5 of 2012-10-01. Online companion to [59].

  23. D. L. Donoho. Neighborly polytopes and sparse solutions of underdetermined linear equations. Technical report, Stanford University, 2004.

  24. D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.

  25. D. L. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom., 35(4):617–652, Dec. 2006.

  26. D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory, 47(7):2845–2862, Aug. 2001.

  27. D. L. Donoho and P. B. Stark. Uncertainty principles and signal recovery. SIAM J. Appl. Math., 49(3):906–931, June 1989.

  28. D. L. Donoho and J. Tanner. Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA, 102(27):9452–9457, July 2005.

  29. D. L. Donoho and J. Tanner. Counting faces of randomly-projected polytopes when the projection radically lowers dimension. J. Amer. Math. Soc., 22(1):1–53, 2009.

  30. D. L. Donoho and J. Tanner. Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci., 367(1906):4273–4293, 2009.

  31. D. L. Donoho and J. Tanner. Counting the faces of randomly-projected hypercubes and orthants, with applications. Discrete Comput. Geom., 43:522–541, 2010.

  32. D. L. Donoho and J. Tanner. Exponential bounds implying construction of compressed sensing matrices, error-correcting codes, and neighborly polytopes by random sampling. IEEE Trans. Inform. Theory, 56(4):2002–2016, 2010.

  33. D. L. Donoho and J. Tanner. Precise undersampling theorems. Proc. IEEE, 98(6):913–924, June 2010.

  34. M. Elad, P. Milanfar, and R. Rubinstein. Analysis versus synthesis in signal priors. Inverse Problems, 23(3):947–968, 2007.

  35. M. Elad, J.-L. Starck, P. Querre, and D. L. Donoho. Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). Appl. Comput. Harmon. Anal., 19(3):340–358, Nov. 2005.

  36. M. Fazel. Matrix rank minimization with applications. Dissertation, Stanford University, Stanford, CA, 2002.

  37. R. Foygel and L. Mackey. Corrupted sensing: Novel guarantees for separating structured signals. preprint, 2013.

  38. S. Glasauer. Integralgeometrie konvexer Körper im sphärischen Raum. Dissertation, University of Freiburg, 1995.

  39. S. Glasauer. Integral geometry of spherically convex bodies. Diss. Summaries Math., 1:219–226, 1996.

  40. Y. Gordon. Elliptically contoured distributions. Probab. Theory Related Fields, 76(4):429–438, 1987.

  41. Y. Gordon. On Milman’s inequality and random subspaces which escape through a mesh in \(\mathbb{R}^{n}\). In Geometric aspects of functional analysis: Israel seminar (GAFA), 1986–87, pages 84–106. Springer, 1988.

  42. M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95–110. Springer-Verlag Limited, London, 2008.

  43. M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21, October 2010.

  44. B. Grünbaum. Convex polytopes, volume XVI of Pure and Applied Mathematics. Wiley, London, 1967.

  45. B. Grünbaum. Grassmann angles of convex polytopes. Acta Math., 121(1):293–302, Dec. 1968.

  46. C. Hegde and R. G. Baraniuk. Signal recovery on incoherent manifolds. IEEE Trans. Inform. Theory, 58(12):7204–7214, December 2012.

  47. A. Jalali, P. Ravikumar, S. Sanghavi, and C. Ruan. A dirty model for multi-task learning. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 964–972. NIPS, 2010.

  48. A. Khajehnejad, W. Xu, A.S. Avestimehr, and B. Hassibi. Improved sparse recovery thresholds with two-step reweighted \(\ell _1\) minimization. In IEEE Int. Symp. Inform. Theory Proc. (ISIT), pages 1603–1607, Austin, TX, 2010.

  49. A. Khajehnejad, W. Xu, A. S. Avestimehr, and B. Hassibi. Analyzing weighted \( \ell _1\) minimization for sparse recovery with nonuniform sparse models. IEEE Trans. Signal Process., 59(5):1985–2001, 2011.

  50. V. L. Klee, Jr. Separation properties of convex cones. Proc. Amer. Math. Soc., 6(2):313–318, 1955.

  51. O. L. Mangasarian and B. Recht. Probability of unique integer solution to a system of linear equations. European J. Oper. Res., 214(1):27–30, Oct. 2011.

  52. M. B. McCoy and J. A. Tropp. Two proposals for robust PCA using semidefinite programming. Elec. J. Statist., 5:1123–1160, 2011.

  53. M. B. McCoy and J. A. Tropp. The achievable performance of convex demixing. preprint, 2013. Submitted to Math. Program.

  54. P. McMullen. Non-linear angle-sum relations for polyhedral cones and polytopes. Math. Proc. Cambridge Philos. Soc., 78(2):247–261, Oct. 1975.

  55. F. Mezzadri. How to generate random matrices from the classical compact groups. Notices Amer. Math. Soc., 54(5):592–604, 2007.

  56. S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1348–1356. NIPS, 2009.

  57. S. Negahban and M. J. Wainwright. Restricted strong convexity and weighted matrix completion: optimal bounds with noise. J. Mach. Learn. Res., 13:1665–1697, 2012.

  58. N. H. Nguyen and T. D. Tran. Exact recoverability from dense corrupted observations via \(\ell _1\)-minimization. IEEE Trans. Inform. Theory, 59(4):2017–2035, 2013.

  59. F. W. J. Olver, D. W. Lozier, R. F. Boisvert, and C. W. Clark, editors. NIST Handbook of Mathematical Functions. Cambridge University Press, New York, NY, 2010. Print companion to [22].

  60. S. Oymak and B. Hassibi. New null space results and recovery thresholds for matrix rank minimization. preprint, 2010.

  61. S. Oymak and B. Hassibi. Asymptotically exact denoising in relation to compressed sensing. preprint, May 2013.

  62. G. Pataki. The geometry of semidefinite programming. In H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors, The Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, pages 29–65. Kluwer, Boston, 2000.

  63. G. Pope, A. Bracher, and C. Studer. Probabilistic recovery guarantees for sparsely corrupted signals. IEEE Trans. Inform. Theory, 59(5):3104–3116, 2013.

  64. B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev., 52(3):471–501, 2010.

  65. R. T. Rockafellar. Convex Analysis. Princeton Landmarks in Mathematics. Princeton University Press, 1997.

  66. R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 1998.

  67. H. Ruben. On the geometrical moments of skew-regular simplices in hyperspherical space, with some applications in geometry and mathematical statistics. Acta Math., 103:1–23, 1960.

  68. F. Santosa and W. W. Symes. Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Statist. Comput., 7(4):1307–1330, 1986.

  69. R. Schneider and W. Weil. Stochastic and Integral Geometry. Probability and its Applications. Springer Verlag, 2008.

  70. J.-L. Starck, D. L. Donoho, and E. J. Candès. Astronomical image representation by the curvelet transform. Astronom. Astrophys., 398(2):785–800, 2003.

  71. J.-L. Starck, M. Elad, and D. L. Donoho. Image decomposition via the combination of sparse representations and a variational approach. IEEE Trans. Image Process., 14(10):1570–1582, Oct. 2005.

  72. M. Stojnic. Strong thresholds for \(\ell _1/\ell _2\)-optimization in block-sparse compressed sensing. In ICASSP 2009, pages 3025–3028, 2009.

  73. M. Stojnic. Various thresholds for \(\ell _1\)-optimization in compressed sensing. 2009. arXiv:0907.3666.

  74. M. Stojnic. A framework to characterize performance of lasso algorithms. March 2013. arXiv:1303.7291.

  75. C. Studer, P. Kuppinger, G. Pope, and H. Bölcskei. Recovery of sparsely corrupted signals. IEEE Trans. Inform. Theory, 58(5):3115–3130, May 2012.

  76. G. Tang, B. N. Bhaskar, and B. Recht. Near minimax line spectral estimation. 2013. arXiv:1303.4348.

  77. H. L. Taylor, S. C. Banks, and J. F. McCoy. Deconvolution with the \(\ell _1\) norm. Geophysics, 44(1):39–52, 1979.

  78. V. N. Temlyakov. Nonlinear methods of approximation. Found. Comput. Math., 3(1):33–107, 2003.

  79. J. A. Tropp. On the linear independence of spikes and sines. J. Fourier Anal. Appl., 14:838–858, 2008.

  80. A. M. Vershik and P. V. Sporyshev. An asymptotic estimate for the average number of steps in the parametric simplex method. USSR Comput. Math. Math. Phys., 26(3):104–113, 1986.

  81. M. J. Wainwright. Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inform. Theory, 55(12):5728–5741, 2009.

  82. M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell _1\)-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory, 55(5):2183–2202, 2009.

  83. J. Wright, A. Ganesh, K. Min, and Y. Ma. Compressive principal component pursuit. Information and Inference, 2(1):32–68, 2013.

  84. J. Wright and Y. Ma. Dense error correction via \(\ell ^1\)-minimization. IEEE Trans. Inform. Theory, 56(7):3540–3560, 2010.

  85. A. D. Wyner. An analog scrambling scheme which does not expand bandwidth, part I: Discrete time. IEEE Trans. Inform. Theory, 25(3):261–274, May 1979.

  86. A. D. Wyner. An analog scrambling scheme which does not expand bandwidth, part II: Continuous time. IEEE Trans. Inform. Theory, 25(4):415–425, 1979.

  87. H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2496–2504. NIPS, 2010.

  88. H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. IEEE Trans. Inform. Theory, 58(5):1–24, 2012.

  89. W. Xu and B. Hassibi. Precise stability phase transitions for \(\ell _1\) minimization: A unified geometric framework. IEEE Trans. Inform. Theory, 57(10):6894–6919, Oct. 2011.

  90. W. Xu, A. Khajehnejad, A. S. Avestimehr, and B. Hassibi. Breaking through the thresholds: an analysis for iterative reweighted \(\ell _1\) minimization via the Grassmann angle framework. In 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pages 5498–5501, Mar. 2010.

Acknowledgments

This research was supported by ONR Awards N00014-08-1-0883 and N00014-11-1002, AFOSR Award FA9550-09-1-0643, DARPA Award N66001-08-1-2065, and a Sloan Research Fellowship.

Author information

Correspondence to Michael B. McCoy.

Additional information

Communicated by Emmanuel Candès.

Appendices

Appendix 1: Equivalence of Constrained and Penalized Methods

This appendix provides a geometric account of the equivalence between the constrained (1.2) and penalized (1.3) convex demixing methods. The results in this section let us interpret our conditions for the success of the constrained demixing method (1.2) as limits on, and opportunities for, the Lagrange demixing method (1.3).

We begin with the following well-known result; it holds without any technical restrictions. We omit the demonstration, which is an easy exercise in proof by contradiction. (See also [65, Cor. 28.1.1].)

Proposition 8.1

Suppose the Lagrange problem (1.3) succeeds for some value \(\lambda >0\). Then (1.2) succeeds.

Before stating a partial converse to Proposition 8.1, we require a technical definition. We say that a proper convex function \(f\) is typical at \(\varvec{x}\) if \(f\) is subdifferentiable at \(\varvec{x}\) but does not achieve its minimum at \(\varvec{x}\). With this technical condition in place, we have the following complement to Proposition 8.1.

Proposition 8.2

Suppose \(f\) is typical at \(\varvec{x}_0\) and \(g\) is typical at \(\varvec{y}_0\). If the constrained method (1.2) succeeds, then there exists a parameter \(\lambda > 0\) such that \((\varvec{x}_0,\varvec{y}_0)\) is an optimal point for the Lagrange method (1.3).

Note that there is a subtlety here: the Lagrange program may have strictly more optimal points than the corresponding constrained problem even for the best choice of \(\lambda \), so that we cannot guarantee that \((\varvec{x}_0,\varvec{y}_0)\) is the unique optimum. See [65, Sec. 28] for more details.
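
The equivalence is easy to observe numerically. The following sketch, in Python, demixes an observation \(\varvec{z}_0 = \varvec{x}_0 + \varvec{Q}\varvec{y}_0\) of two sparse vectors (so that \(f = g = \left\| {\cdot } \right\| _{\ell _1}\)) with both methods. The cvxpy package, the dimensions, the sparsity level, and the choice \(\lambda = 1\) are our own illustrative assumptions, not prescriptions from the analysis.

```python
# A minimal numerical illustration of Propositions 8.1 and 8.2, assuming
# numpy and cvxpy. We demix z0 = x0 + Q y0, where x0 and y0 are sparse and
# Q is a random orthogonal (hence generically oriented) basis.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
d, k = 100, 5                               # ambient dimension, sparsity

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random basis
x0 = np.zeros(d); x0[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
y0 = np.zeros(d); y0[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
z0 = x0 + Q @ y0

# Constrained method (1.2): minimize ||x||_1 s.t. ||y||_1 <= ||y0||_1, x + Qy = z0.
x, y = cp.Variable(d), cp.Variable(d)
cp.Problem(cp.Minimize(cp.norm1(x)),
           [cp.norm1(y) <= np.abs(y0).sum(), x + Q @ y == z0]).solve()
print("constrained error:", np.linalg.norm(x.value - x0))

# Lagrange method (1.3): minimize ||x||_1 + lam * ||y||_1 s.t. x + Qy = z0.
lam = 1.0                                   # a natural guess here, since f = g
cp.Problem(cp.Minimize(cp.norm1(x) + lam * cp.norm1(y)),
           [x + Q @ y == z0]).solve()
print("Lagrange error:   ", np.linalg.norm(x.value - x0))
```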

Proof of Proposition 8.2

The key idea is the construction of a subgradient that certifies the optimality of the pair \((\varvec{x}_0,\varvec{y}_0)\) for the Lagrange penalized problem (1.3) for an appropriate choice of parameter \(\lambda \). As with many results in convex analysis, a separating hyperplane plays an important role.

By Lemma 2.4, the constrained problem (1.2) will succeed if and only if \(\mathcal {F}(f,\varvec{x}_0) \cap -\varvec{Q} \mathcal {F}(g,\varvec{y}_0) = \{\mathbf {0}\}\). The trivial intersection of the feasible cones implies that there exists a hyperplane that separates these cones. (This fact is a special case of the Hahn–Banach separation theorem for convex cones due to Klee [50].) In other words, there exists some vector \(\varvec{u} \ne \mathbf {0}\) such that

$$\begin{aligned} \langle \varvec{u}, \varvec{x} \rangle \le 0 \text { for all } \varvec{x} \in \mathcal {F}(f,\varvec{x}_0), \end{aligned}$$

and, moreover,

$$\begin{aligned} \langle \varvec{u}, \varvec{y} \rangle \ge 0 \text { for all } \varvec{y} \in -\varvec{Q} \mathcal {F}(g,\varvec{y}_0). \end{aligned}$$

In the language of polar cones, the first separation inequality is simply the statement that \(\varvec{u} \in \mathcal {F}(f,\varvec{x}_0)^\circ \), while the second inequality is equivalent to \(\varvec{Q}^*\varvec{u} \in \mathcal {F}(g,\varvec{y}_0)^\circ \).

We will now show that \(\varvec{u}\) generates a subgradient optimality certificate for the point \((\varvec{x}_0,\varvec{y}_0)\) in problem (1.3) for an appropriate choice of parameter \(\lambda >0\). We denote the subdifferential map by \(\partial \).

At this point, we invoke our technical assumption. Since \(f\) is typical at \(\varvec{x}_0\), the polar to the feasible cone is generated by the subdifferential of \(f\) at \(\varvec{x}_0\) [65, Thm 23.7]. In particular, there exists a number \(\lambda _f\ge 0\) such that \(\varvec{u} \in \lambda _f \partial f(\varvec{x}_0)\). In fact, the stronger inequality \(\lambda _f>0\) holds because \(\varvec{u}\ne \varvec{0}\). For the same reason, there exists a number \(\lambda _g >0\) such that \(\varvec{Q}^*\varvec{u} \in \lambda _g \partial g(\varvec{y}_0)\).

Define \(h(\varvec{x}) {:=}\lambda _f f(\varvec{x}) + \lambda _g g(\varvec{Q}^*(\varvec{z}_0 - \varvec{x}))\). By standard transformation rules for subdifferentials [65, Thms. 23.8, 23.9], we have

$$\begin{aligned} \partial h(\varvec{x}_0) \supset \lambda _f \partial f(\varvec{x}_0) -\lambda _g \varvec{Q} \partial g(\varvec{y}_0), \end{aligned}$$

where \(A - B {:=}A + (-B)\) is the Minkowski sum of the sets \(A\) and \(-B\). Since \(\varvec{u}\in \lambda _f \partial f(\varvec{x}_0)\) and \(\varvec{u} \in \lambda _g \varvec{Q} \partial g(\varvec{y}_0)\), we see \(\mathbf {0} \in \partial h(\varvec{x}_0)\). By the definition of subgradients, \(\varvec{x}_0\) is a global minimizer of \(h\). Introducing the variable \(\varvec{y} = \varvec{Q}^*(\varvec{z}_0-\varvec{x})\), it follows that \((\varvec{x}_0,\varvec{y}_0)\) is a global minimizer of

$$\begin{aligned} \begin{array}{ll} \text {minimize} & f(\varvec{x}) +\frac{\lambda _g}{\lambda _f}\, g(\varvec{y}) \\ \text {subject to} & \varvec{x} + \varvec{Q} \varvec{y} = \varvec{z}_0. \end{array} \end{aligned}$$

This is Lagrange problem (1.3) with the parameter \(\lambda = \lambda _g/\lambda _f>0\), so we have the result. \(\square \)

Appendix 2: Regions of Failure and Uniform Guarantees

We now present the proofs of the results of Sect. 4 concerning regions of failure (Theorem 4.3) and strong demixing guarantees (Theorem 4.11) for the convex demixing method. These demonstrations closely follow the pattern laid down by the proof of Theorem 4.2.

1.1 Regions of Failure: The Proof of Theorem 4.3

We first state an analog of Theorem 4.4. As usual, \(\fancyscript{D}\) is an infinite set of indices.

Theorem 9.1

Let \(\{K^{(d)}\subset \mathbb {R}^d\mathrel {\mathop {:}}d\in \fancyscript{D}\}\) and \(\{\tilde{K}^{(d)}\subset \mathbb {R}^d \mathrel {\mathop {:}}d \in \fancyscript{D}\}\) be two ensembles of closed convex cones with lower decay thresholds \(\kappa _\star \) and \(\tilde{\kappa }_\star \). If \(\kappa _\star + \tilde{\kappa }_\star > 1\), then there exists an \(\varepsilon >0\) such that \(\mathbb {P}\bigl \{ K^{(d)} \cap \varvec{Q} \tilde{K}^{(d)} \ne \{\mathbf {0}\}\bigr \} \ge 1- \mathrm {e}^{-\varepsilon d} \) for all sufficiently large \(d\).

Theorem 4.3 follows from Theorem 9.1 in the same way that Theorem 4.2 follows from Theorem 4.4, with one additional technical point regarding closure conditions.

Proof of Theorem 4.3

(from Theorem 9.1) By the assumptions in Theorem 4.3, the ensembles \(\bigl \{\overline{\mathcal {F}}(f^{(d)}, \varvec{x}_0^{(d)})\bigr \}\) and \(\bigl \{-\overline{ \mathcal {F}}(g^{(d)},\varvec{y}_0^{(d)})\bigr \}\) of closed cones satisfy the hypotheses of Theorem 9.1. Therefore, there is an \(\varepsilon >0\) such that the closures of the feasible cones have a nontrivial intersection with probability at least \(1-\mathrm {e}^{- \varepsilon d}\) for all large enough \(d\).

It follows from Remark 4.5 that the probability of the event \(\overline{\mathcal {F}}(f^{(d)}, \varvec{x}_0^{(d)})\cap -\varvec{Q} \overline{\mathcal {F}}(g^{(d)}, \varvec{y}_0^{(d)}) \ne \{\varvec{0}\}\) is equal to the probability of the event \(\mathcal {F}(f^{(d)}, \varvec{x}_0^{(d)}) \cap - \varvec{Q} \mathcal {F}(g^{(d)}, \varvec{y}_0^{(d)}) \ne \{\varvec{0}\}\). The geometric optimality condition of Lemma 2.4 then immediately implies that (1.2) fails with probability at least \(1-\mathrm {e}^{-\varepsilon d}\). \(\square \)

The proof of Theorem 9.1 requires an additional fact concerning spherical intrinsic volumes.

Fact 9.2

(Spherical Gauss–Bonnet formula [69, P. 258]) For any closed convex cone \(K\subset \mathbb {R}^d\) that is not a subspace, we have

$$\begin{aligned} \sum _{\begin{array}{c} i=-1\\ i \hbox { even} \end{array}}^{d-1} v_i(K) = \sum _{\begin{array}{c} i=-1\\ i \hbox { odd} \end{array}}^{d-1} v_i(K) = \frac{1}{2}. \end{aligned}$$

In the following proof, the Gauss–Bonnet formula is crucial for dealing with the parity term \((1+(-1)^k)\) that arises in the spherical kinematic formula (3.2).
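
As a sanity check, the formula is easy to verify on a concrete cone. The short Python sketch below uses the closed form \(v_i(\mathbb {R}^d_+) = 2^{-d}\left( {\begin{array}{c}d\\ i+1\end{array}}\right) \) for the spherical intrinsic volumes of the nonnegative orthant, a standard fact that we import here without proof, and confirms that the even- and odd-indexed volumes each sum to one half.

```python
# Check the spherical Gauss-Bonnet formula (Fact 9.2) on the nonnegative
# orthant, whose intrinsic volumes follow a binomial profile (assumed here).
from math import comb

d = 10
v = {i: comb(d, i + 1) / 2**d for i in range(-1, d)}   # v_{-1}, ..., v_{d-1}

even = sum(v[i] for i in range(-1, d) if i % 2 == 0)
odd = sum(v[i] for i in range(-1, d) if i % 2 != 0)    # note: i = -1 is odd
print(even, odd)                                       # both equal 0.5
```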

Proof of Theorem 9.1

Since the Gauss–Bonnet formula only applies to cones that are not subspaces, we split the demonstration into three cases: neither the \(\{K^{(d)}\}\) nor the \(\{\tilde{K}^{(d)}\}\) ensemble contains a subspace, one ensemble consists of subspaces, or both ensembles consist of subspaces. We assume without loss that each case holds for every dimension \(d\in \fancyscript{D}\); the proof extends to the general case by considering subsequences where only a single case applies.

We drop the superscript \(d\) for clarity. Assume first that neither \(K\) nor \(\tilde{K}\) is a subspace. Let \(\varpi (k) = (1+(-1)^k)\) be the parity term in the spherical kinematic formula (3.2). Changing the order of summation in the spherical kinematic formula, we find

$$\begin{aligned} P := \mathbb {P}\left\{ K \cap \varvec{Q} \tilde{K} \ne \{\mathbf {0}\} \right\} = \sum _{i=0}^{d-1} v_i(K) \sum _{k=d-i-1}^{d-1} \varpi (k-d+1+i) v_k(\tilde{K}). \end{aligned}$$

Let \(\kappa < \kappa _\star \) and \(\tilde{\kappa }< \tilde{\kappa }_\star \) with \(\kappa + \tilde{\kappa }> 1\); such scalars exist because \(\kappa _\star + \tilde{\kappa }_\star >1\). By the positivity of the spherical intrinsic volumes (Fact 3.5.1), we have

$$\begin{aligned} P \ge \sum _{i= \lceil \kappa d \rceil + 1}^{d-1} v_i(K) \sum _{k=d-i-1}^{d-1} \varpi (k-d+1+i) v_k(\tilde{K}). \end{aligned}$$
(9.1)

We will see that the preceding inner sum is very close to one. Indeed,

$$\begin{aligned} \sum _{k=d-i-1}^{d-1} \varpi (k-d+1+i) v_k(\tilde{K}) = \sum _{k=-1}^{d-1} \varpi (k-d+1+i) v_k(\tilde{K}) - \tilde{\xi }_{i}=1-\tilde{\xi }_i, \end{aligned}$$
(9.2)

where \(\tilde{\xi }_i\) is a discrepancy term [see (9.3) below]. The second equality follows by the spherical Gauss–Bonnet formula (Fact 9.2), and the assumption that \(\tilde{K}\) is not a subspace.

We now bound the discrepancy term \(\tilde{\xi }_i\) uniformly over \(i \ge \lceil \kappa d\rceil +1\). Since \(\kappa +\tilde{\kappa }>1\), for any \(i \ge \lceil \kappa d \rceil +1\) we have \(d-2-i \le \lceil \tilde{\kappa }d\rceil \). By definition of the lower decay threshold, we see that the discrepancy term must be small: for any \(i \ge \lceil \kappa d \rceil +1\),

$$\begin{aligned} \tilde{\xi }_i = \sum _{k=-1}^{d - 2-i} \varpi (k-d+1+i) v_k(\tilde{K}) \le 2 \sum _{k=-1}^{\lceil \tilde{\kappa }d\rceil } v_k(\tilde{K}) \le 2(d-1)\mathrm {e}^{-\varepsilon ' d}, \end{aligned}$$
(9.3)

for some \(\varepsilon ' >0\) and all sufficiently large \(d\). Applying (9.2) and (9.3) to (9.1), we find

$$\begin{aligned} P \ge \sum _{i=\lceil \kappa d \rceil + 1}^{d-1} v_i(K)\bigl ( 1- 2(d-1)\mathrm {e}^{-\varepsilon ' d}\bigr ) \ge \left( \sum _{i=\lceil \kappa d \rceil + 1}^{d-1} v_i(K) \right) - 2(d-1) \mathrm {e}^{-\varepsilon ' d}, \end{aligned}$$
(9.4)

where the second inequality follows from Fact 3.5: the spherical intrinsic volumes are positive and sum to one. We now reindex the sum on the right-hand side of (9.4) over \(i = -1,0,\cdots , d-1\) with only an exponentially small loss:

$$\begin{aligned} \sum _{i=\lceil \kappa d \rceil + 1}^{d-1} v_i(K) = \sum _{i=-1}^{d-1} v_i(K) - \xi , \end{aligned}$$

where the discrepancy \(\xi \) satisfies

$$\begin{aligned} \xi = \sum _{i = -1}^{\lceil \kappa d \rceil } v_i(K) \le (d-1) \mathrm {e}^{-\varepsilon '' d} \end{aligned}$$

for some \(\varepsilon ''>0\) and all sufficiently large \(d\) by definition of the lower decay threshold. Applying these observations to (9.4), we deduce that

$$\begin{aligned} P \ge \sum _{i=-1}^{d-1} v_i(K) - (d-1)\Bigl ( 2\mathrm {e}^{-\varepsilon ' d} + \mathrm {e}^{-\varepsilon ''d}\Bigr )\ge \sum _{i=-1}^{d-1} v_i(K) - \mathrm {e}^{-\varepsilon d} \end{aligned}$$

for some \(\varepsilon >0\) and all sufficiently large \(d\). Since \(\sum _{i=-1}^{d-1} v_i(K)= 1\) by Fact 3.5.2, this shows the result when both \(K\) and \(\tilde{K}\) are not subspaces, completing the first case.

For the second case, suppose that only one of the cones is a subspace. Without loss, we may assume \(\tilde{K}\) is the subspace by the symmetry of the spherical kinematic formula (Remark 3.8). Denote the dimension of the subspace \(\tilde{K}\) by \(\tilde{n}{:=}\mathrm {dim}(\tilde{K})\), and take parameters \(\kappa \) and \(\tilde{\kappa }\) as before.

By Proposition 3.2, the spherical intrinsic volumes of \(\tilde{K}\) are given by \(v_i(\tilde{K}) = \delta _{i,\tilde{n}-1}\). Inserting this Kronecker \(\delta \) into the spherical kinematic formula (3.2) and simplifying the resulting expression, we find the probability of interest is given by

$$\begin{aligned} P {:=}\mathbb {P}\left\{ K \cap \varvec{Q} \tilde{K} \ne \{\mathbf {0}\} \right\} = \sum _{k=d-\tilde{n}}^{d-1} \varpi (k+\tilde{n}-d) v_k(K). \end{aligned}$$

Reindexing the sum over \(k=-1,0,\cdots ,d-1\), we see

$$\begin{aligned} P&= \sum _{k=-1}^{d-1}\varpi (k+\tilde{n}-d) v_k(K)- \sum _{i=-1}^{d-\tilde{n}-1}\varpi (i+\tilde{n}-d) v_i(K)\nonumber \\&= 1- \sum _{i=-1}^{d-\tilde{n}-1}\varpi (i+\tilde{n}-d)v_i(K), \end{aligned}$$
(9.5)

where the second equality holds by the spherical Gauss–Bonnet formula (Fact 9.2).

We now show that \(\tilde{n}\) is relatively large. The definition of the lower decay threshold implies that there exists an \(\varepsilon ' >0\) such that

$$\begin{aligned} v_i(\tilde{K})= \delta _{i,\tilde{n}-1} \le \mathrm {e}^{-\varepsilon ' d} \;\;\;\text {for all}\;\; i \le \lceil \tilde{\kappa }d\rceil \end{aligned}$$

when \(d\) is sufficiently large. This inequality cannot hold if \(\tilde{n} -1\le \lceil \tilde{\kappa }d\rceil \), so we deduce that \(\tilde{n} \ge \tilde{\kappa }d \) for all sufficiently large \(d\).

Since \(\tilde{n}\ge \tilde{\kappa }d\) and \(\kappa +\tilde{\kappa }> 1\), we must have \(d-\tilde{n}-1 < \lceil \kappa d\rceil \) for all sufficiently large \(d\). Applying the definition of the lower decay threshold, we find that the sum on the right-hand side of (9.5) is exponentially small: there exists an \(\varepsilon ''>0\) such that

$$\begin{aligned} \sum _{i=-1}^{d-\tilde{n}-1}\varpi (i+\tilde{n} - d)v_i(K)\le 2(d-1)\mathrm {e}^{-\varepsilon '' d} \end{aligned}$$

for all sufficiently large \(d\). The result for the second case follows immediately.

Finally, we consider the case where both cones are subspaces. Suppose \(K\) has dimension \(n\) while \(\tilde{K}\) has dimension \(\tilde{n}\), and let \(\kappa \), \(\tilde{\kappa }\) be as above. As in the second case, we find that \(n\ge \kappa d\) and \(\tilde{n}\ge \tilde{\kappa }d\) when \(d\) is sufficiently large. Then the inequality \(\kappa _\star + \tilde{\kappa }_\star >1\) implies that \(n+\tilde{n} > d\), that is, the sum of the dimensions of the subspaces is larger than the ambient dimension. A standard fact from linear algebra implies \(K \cap \varvec{Q} \tilde{K} \ne \{\varvec{0}\}\) for any unitary \(\varvec{Q}\)—in other words, for all \(d\) large enough, the probability of nontrivial intersection is one. This completes the third case, and we are done. \(\square \)

1.2 Proof of Strong Guarantees of Theorem 4.11

Proof of Theorem 4.11

For clarity, we drop the superscript \(d\) in this proof. We begin with the union bound: the probability of interest \(P\) is bounded above by

$$\begin{aligned} P {:=}\mathbb {P}\left\{ K\cap \varvec{Q} \tilde{K} \ne \{\varvec{0}\} \,\text { for any }\, K \in \mathcal {K},\; \tilde{K} \in \tilde{\mathcal {K}}\right\} \le |\mathcal {K}| \cdot |\tilde{\mathcal K}| \cdot \max _{K \in \mathcal {K},\, \tilde{K} \in \tilde{\mathcal {K}}} \mathbb {P}\left\{ K \cap \varvec{Q} \tilde{K} \ne \{\varvec{0}\}\right\} . \end{aligned}$$
(9.6)

From here, the proof closely parallels the proof of Theorem 4.4, so we compress the demonstration. We consider two cases, one where at most one cone is a subspace, the other where both cones are subspaces; the result extends to the mixed case by considering subsequences.

Suppose first that at most one cone is a subspace. Let \(\theta >\theta _\star \) and \(\tilde{\theta }> \tilde{\theta }_\star \) with \(\theta +\tilde{\theta }<1\). We bound the probability on the right-hand side of (9.6) by

$$\begin{aligned} \frac{1}{2}\mathbb {P}\left\{ K \cap \varvec{Q} \tilde{K} \ne \{\varvec{0}\}\right\} \le \Sigma _1+\Sigma _2+\Sigma _3+\Sigma _4, \end{aligned}$$

where the \(\Sigma _i\) are given in (4.3). The fact that \(\theta + \tilde{\theta }<1\) implies that \(\Sigma _1 = 0\) for sufficiently large \(d\), as in the proof of Theorem 4.4. Since \(\theta > \theta _\star \), the definition of the upper decay threshold at level \(\psi +\tilde{\psi }\) implies

$$\begin{aligned} \Sigma _2 \le \sum _{i=\lceil \theta d \rceil + 1}^{d-1} v_i(K) \le (d-1) \mathrm {e}^{-d (\psi + \tilde{\psi }+ \varepsilon ')} \end{aligned}$$

for some \(\varepsilon '>0\) and all sufficiently large \(d\). With analogous reasoning, we find similar exponential bounds for \(\Sigma _3\) and \(\Sigma _4\):

$$\begin{aligned} \Sigma _3 \le (d-1) \mathrm {e}^{-d(\psi + \tilde{\psi }+ \varepsilon '')} , \quad \Sigma _4 \le (d-1) \mathrm {e}^{-d( \psi + \tilde{\psi }+\varepsilon ''')} \end{aligned}$$

again for positive \(\varepsilon '',\varepsilon '''\) and all sufficiently large \(d\). Summing these inequalities and taking \(d\) sufficiently large gives

$$\begin{aligned} \mathbb {P}\left\{ K \cap \varvec{Q} \tilde{K} \ne \{\varvec{0}\}\right\} \le \mathrm {e}^{-d(\psi + \tilde{\psi }+ \hat{\varepsilon })} \end{aligned}$$

for some \(\hat{\varepsilon }>0\). The claim then follows from our exponential upper bound on the growth of \(|\mathcal {K}|\) and \(|\tilde{\mathcal {K}}|\) with \(\eta = \hat{\varepsilon }/2\):

$$\begin{aligned} P \le 2\cdot |\mathcal {K}| \cdot |\tilde{\mathcal {K}}|\cdot \mathrm {e}^{- d(\psi +\tilde{\psi }+ \hat{\varepsilon })} \le 2\, \mathrm {e}^{ d (\psi + \tilde{\psi }+ \hat{\varepsilon }/2) - d(\psi +\tilde{\psi }+ \hat{\varepsilon })} = 2\mathrm {e}^{-\frac{\hat{\varepsilon }}{2}d}. \end{aligned}$$

Taking \(\varepsilon = \hat{\varepsilon }/4\) and \(d\) sufficiently large gives the claim in the first case.

Now consider the case where both cones are subspaces, and let \(n{:=}\dim (K) \) and \(\tilde{n}{:=}\dim (\tilde{K})\). Take parameters \(\theta > \theta _\star \) and \(\tilde{\theta }>\tilde{\theta }_\star \) such that \(\theta +\tilde{\theta }<1\). As in the proof of Theorem 4.4, the Kronecker \(\delta \) expression for the intrinsic volumes of the subspaces \(K\) and \(\tilde{K}\) given by Proposition 3.2, combined with the definition of the upper decay threshold, reveals that \(n\le \lceil \theta d\rceil \) and \(\tilde{n} \le \lceil \tilde{\theta }d\rceil \) for all sufficiently large \(d\). The fact that \(\theta +\tilde{\theta }<1\) implies \(n+\tilde{n} < d\) for all sufficiently large \(d\). Since randomly oriented subspaces are almost always in general position, the probability that \(K\cap \varvec{Q} \tilde{K}\ne \{\varvec{0}\}\) is zero. This is the second case, so we are done. \(\square \)

Appendix 3: Decay Thresholds for Feasible Cones of the \(\ell _1 \) Norm

This section describes how we compute decay thresholds for the feasible cone of the \(\ell _1\) norm at sparse vectors. The polytope angle calculations appearing in [25] form an important part of this computation. For convenient comparisons, Table 2 provides a map between our notation and that of the reference.

Table 2 Notation translation between this work and [25]. (Note: in the reference, \(\Psi _{\mathrm {net}}\) is defined for three arguments but only depends on two parameters, namely, \(\nu \) and \(\rho \delta \))

Fix a sparsity parameter \(\tau \in (0,1]\), and let \(\fancyscript{D}\) be an infinite set of indices. For each dimension \(d \in \fancyscript{D}\), we define a vector \(\varvec{x}^{(d)} \in \mathbb {R}^d\) such that \(\mathrm{nnz }(\varvec{x}^{(d)}) = \lceil \tau d\rceil \). The following results describe the behavior of the spherical intrinsic volumes of the feasible cone \(\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}^{(d)})\) in terms of the sparsity \(\tau \) and the normalized index \(\theta = i/d\) when \(d\) is large.

Lemma 10.1

Consider the preceding ensemble. There exists a function \(\Psi _{\mathrm {total}} \) such that, for every \(\varepsilon > 0\) and all sufficiently large \(d\in \fancyscript{D}\), we have

$$\begin{aligned} \frac{1}{d}\log \Bigl (v_{\lceil \theta d\rceil }\bigl (\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}^{(d)})\bigr )\Bigr ) \le \Psi _{\mathrm {total}}(\theta ,\tau ) + \varepsilon \end{aligned}$$
(10.1)

for all \(\theta \in [\tau ,1]\), and

$$\begin{aligned} v_{\lceil \theta d\rceil }\bigl (\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}^{(d)})\bigr ) = 0 \end{aligned}$$
(10.2)

for \(\theta \in [0, \tau )\).

We discuss the definition and computation of the normalized exponent \(\Psi _{\mathrm {total}}\) in Sect. 10.1. This function provides decay thresholds for the ensemble \(\bigl \{\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}^{(d)})\mathrel {\mathop {:}}d \in \fancyscript{D}\bigr \}\) in the same way that the limit (3.5) provides decay thresholds for the ensemble of orthants. See Sect. 10.2 for details.

Proof of Lemma 10.1

We leave the dependence on the dimension implicit for clarity. Define \(k{:=}\lceil \tau d\rceil \).

We first show that (10.1) holds. The proof relies on an expression for spherical intrinsic volumes in terms of polytope angles. For a face \(F\) of a polytope \(P\), we define \(\beta (F,P)\) as the internal angle of \(P\) at \(F\) and \(\gamma (F,P)\) as the external angle of \(P\) at \(F\) (see [44, Chap. 14] for the definitions). What follows is an important alternative characterization of the spherical intrinsic volumes in terms of these angles.

Fact 10.2

([69, Eq. (6.50)]) Let \(K\) be a polyhedral cone, and let \(\mathfrak {F}_i(K)\) be the set of all \(i\)-dimensional faces of \(K\). Then

$$\begin{aligned} v_i(K) = \sum _{F \in \mathfrak {F}_{i+1}(K)} \beta (\varvec{0},F) \gamma (F, K). \end{aligned}$$
(10.3)

We now specialize Fact 10.2 to the case where \(K= \mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x})\). Define the sublevel set \(S {:=}\{\varvec{w} \mathrel {\mathop {:}}\left\| {\varvec{w}} \right\| _{\ell _1}\le \left\| {\varvec{x}} \right\| _{\ell _1}\}\). Recalling that our assumption \(\tau >0\) implies \(\left\| {\varvec{x}} \right\| _{\ell _1}>0\), we have

$$\begin{aligned} \mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}) = \mathrm {cone}(S-\{\varvec{x}\}) = \mathrm {cone}\bigl (C-\{\varvec{x}/\left\| {\varvec{x}} \right\| _{\ell _1}\}\bigr ), \end{aligned}$$
(10.4)

where \(C {:=}\{\varvec{w}\mathrel {\mathop {:}}\left\| {\varvec{w}} \right\| _{\ell _1} \le 1\}\) is the standard cross polytope. The fact that \(\varvec{x}\) is \(k\)-sparse is equivalent to the statement that \(\varvec{x}/\left\| {\varvec{x}} \right\| _{\ell _1}\) lies in the relative interior of a \((k-1)\)-dimensional face of the cross polytope \(C\).

Relationship (10.4) implies that there is a one-to-one correspondence between the \(i\)-dimensional faces of \(\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x})\) and the \(i\)-dimensional faces of the cross polytope \(C\) that contain \(\varvec{x}/\left\| {\varvec{x}} \right\| _{\ell _1}\). Since the internal and external angles only depend on the local structure of a given polytope, we find that for every nonempty face \(F\) of \(\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x})\) the internal and external angles satisfy

$$\begin{aligned} \beta (\varvec{0}, F) =\beta (\varvec{x}, \tilde{F}) \;\;\text {and}\;\; \gamma (F, K) = \gamma (\tilde{F}, C), \end{aligned}$$

where \(\tilde{F}\) is the face of the cross polytope \(C\) naturally corresponding to the face \(F\) of the feasible cone \(\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x})\).

A number of important relationships due to Böröczky and Henk [9] for faces of the cross polytope \(C\) are conveniently collected in [25, Sect. 3.3]. In particular, we will need the following two facts (the sketch after this list checks the first one in small dimensions):

  1. There are \(2^{i-k+2}\left( {\begin{array}{c}d-k\\ i-k+2\end{array}}\right) \) faces of \(C\) of dimension \((i+1) \ge (k-1)\) containing a given \((k-1)\)-dimensional face of \(C\).

  2. The high degree of symmetry of the cross polytope ensures that the internal and external angles at these faces depend only on the dimensional parameters \(k\) and \(i\).
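
The first count is easy to confirm by brute force in small dimensions, as in the following Python sketch; the ambient dimension and the fixed face are arbitrary choices.

```python
# Enumerate the proper faces of the cross polytope C in R^d: an m-dimensional
# face is the convex hull of m+1 signed standard basis vectors with distinct
# supports. We count the (i+1)-faces containing a fixed (k-1)-face and compare
# with the formula 2^(i-k+2) * binom(d-k, i-k+2) from item 1 above.
from itertools import combinations, product
from math import comb

d, k = 7, 3                      # small, illustrative dimensions
fixed = set(range(k))            # fixed (k-1)-face: coordinates 0..k-1, signs +1

for i in range(k - 1, d - 1):
    m = i + 2                    # an (i+1)-dimensional face has m vertices
    count = 0
    for coords in combinations(range(d), m):
        if not fixed <= set(coords):
            continue             # the face must contain the fixed face
        for signs in product((1, -1), repeat=m):
            if all(s == 1 for c, s in zip(coords, signs) if c in fixed):
                count += 1
    assert count == 2 ** (i - k + 2) * comb(d - k, i - k + 2)
print("face counts agree for d = 7, k = 3")
```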

Applying the preceding observations to Eq. (10.3), we find

$$\begin{aligned} v_i\bigl (\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x})\bigr ) = 2^{i-k+2} \left( {\begin{array}{c}d-k\\ i-k+2\end{array}}\right) \beta (T_{k-1}, T_{i+1}) \gamma ( \tilde{F}_{i+1}, C) \end{aligned}$$
(10.5)

for \(i \in \{k-1,\cdots ,d-1\}\). The notation \(\tilde{F}_{i+1}\) denotes an \((i+1)\)-dimensional face of the cross polytope \(C\), and \(T_j\) is the \(j\)-dimensional regular simplex.

The internal and external angles in (10.5) have explicit expressions due to [9] and the work of Ruben [67]. Donoho [25] conducts an asymptotic investigation of these formulas. To distill the essence of the analysis, Donoho gives continuous functions \(\Psi _{\mathrm {int}}(\theta , \tau )\) and \(\Psi _{\mathrm {ext}}(\theta )\) such that, for any \(\varepsilon >0\) and all sufficiently large \(d\), the inequalities

$$\begin{aligned} \frac{1}{d}\log \bigl (\beta (T_{k-1},T_{i+1})\bigr )&\le -\Psi _{\mathrm {int}}\left( \tfrac{i}{d},\tau \right) +\frac{\varepsilon }{3} \end{aligned}$$
(10.6)
$$\begin{aligned} \frac{1}{d}\log \bigl (\gamma (\tilde{F}_{i+1},C)\bigr )&\le -\Psi _{\mathrm {ext}}\left( \tfrac{i}{d}\right) + \frac{\varepsilon }{3} \end{aligned}$$
(10.7)

hold uniformly over \(i \in \{\lceil \tau d \rceil ,\cdots ,d-1\}\). Moreover, it follows from Eq. (3.5) that for sufficiently large \(d\), we have

$$\begin{aligned} \frac{1}{d} \log \left( 2^{i-k+2}\left( {\begin{array}{c}d-k\\ i-k+2\end{array}}\right) \right) \le \Psi _{\mathrm {cont}}\left( \tfrac{i}{d},\tau \right) + \frac{\varepsilon }{3}, \end{aligned}$$

where the exponent \(\Psi _{\mathrm {cont}}\) for the number of containing faces is defined by

$$\begin{aligned} \Psi _{\mathrm {cont}}(\theta ,\tau ) {:=}(\theta -\tau )\log (2) + (1-\tau )H\left( \frac{\theta -\tau }{1-\tau }\right) . \end{aligned}$$
(10.8)

The function \(H(\theta )\) is the entropy defined by (3.6). Equation (10.1) follows by defining

$$\begin{aligned} \Psi _{\mathrm {total}}(\theta ,\tau ) {:=}\Psi _{\mathrm {cont}}(\theta ,\tau ) - \Psi _{\mathrm {int}}(\theta ,\tau ) - \Psi _{\mathrm {ext}}(\theta ) \end{aligned}$$
(10.9)

and taking logarithms in (10.5). This is the first claim.

We now show that, for any \(\theta <\tau \), Eq. (10.2) holds for all sufficiently large \(d\). Since \(\varvec{x}/\left\| {\varvec{x}} \right\| _{\ell _1}\) lies in a \((k-1)\)-dimensional face of the cross polytope \(C\), every face of \( \mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x})\) has dimension at least \((k-1)\). It follows immediately from Definition 3.1 that

$$\begin{aligned} v_i\bigl (\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x})\bigr ) = 0 \end{aligned}$$

for all \(i <k-1\). Since \(k = \lceil \tau d\rceil \), we see that (10.2) holds for all sufficiently large \(d\) as long as \(\theta < \tau \). This is the second claim. \(\square \)

1.1 Computing the Exponents

We now define the functions needed to compute \(\Psi _{\mathrm {total}}\) in (10.9). Recall that \(\Psi _{\mathrm {cont}}\) is explicitly defined in (10.8). The functions \(\Psi _{\mathrm {int}}\) and \(\Psi _{\mathrm {ext}}\) are defined in [25], but we recapitulate their definitions for completeness. Define the implicit parameters \(x = x(\theta )\) and \(s = s(\theta ,\tau )\) as the solutions to the equations

$$\begin{aligned} \frac{2 x G(x)}{G'(x)}&= \frac{1-\theta }{\theta },\end{aligned}$$
(10.10)
$$\begin{aligned} M(s)&= 1- \frac{\tau }{\theta }, \end{aligned}$$
(10.11)

where \(G(x) = \frac{2}{\sqrt{\pi }} \int _0^x \exp (-t^2) \mathrm {d} t\) is the error function \(\mathrm{erf }(x)\) and \(M(s)\) is a variant of the Mills ratio given by

$$\begin{aligned} M(s) = -s \mathrm {e}^{s^2/2} \int _{-\infty }^s \mathrm {e}^{-t^2/2} \mathrm {d}t = -s\sqrt{\frac{\pi }{2}}\; \mathrm{erfcx }\!\left( -\frac{s}{\sqrt{2}}\right) , \end{aligned}$$

where \(\mathrm {erfcx}(s) = \mathrm {e}^{s^2} \mathrm {erfc}(s)\) is the scaled complementary error function. This second form for \(M(s)\) is convenient for numerical computations. It follows from [25] that \(x\) and \(s\) are well defined. Numerical evaluation of \(x(\theta )\) and \(s(\theta , \tau )\) is straightforward using, for example, bisection methods.
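
For concreteness, here is one way to carry out that numerical evaluation in Python. The scipy routines, the use of brentq in place of plain bisection, and the bracket endpoints are our own pragmatic assumptions; any bracketing root finder works.

```python
# Solve the implicit equations (10.10) and (10.11) for x(theta) and
# s(theta, tau); assumes numpy and scipy.
import numpy as np
from scipy.optimize import brentq
from scipy.special import erf, erfcx

def x_of(theta):
    """Solve (10.10): 2 x G(x) / G'(x) = (1 - theta) / theta with G = erf."""
    # G'(x) = (2/sqrt(pi)) exp(-x^2), so the left side equals
    # sqrt(pi) * x * erf(x) * exp(x^2), which increases from 0 on x > 0.
    lhs = lambda x: np.sqrt(np.pi) * x * erf(x) * np.exp(x**2)
    return brentq(lambda x: lhs(x) - (1 - theta) / theta, 1e-12, 6.0)

def M(s):
    """The Mills-ratio variant, via the scaled complementary error function."""
    return -s * np.sqrt(np.pi / 2) * erfcx(-s / np.sqrt(2))

def s_of(theta, tau):
    """Solve (10.11): M(s) = 1 - tau/theta; the root is negative."""
    return brentq(lambda s: M(s) - (1 - tau / theta), -50.0, -1e-12)

print(x_of(0.5), s_of(0.5, 0.1))
```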

With these parameters in hand, the exponent for the internal angle is given by (see Note 3)

$$\begin{aligned} \Psi _{\mathrm {int}}(\theta , \tau ) {:=}(\theta -\tau )\log \left( \sqrt{2\pi } \frac{s \theta }{\tau - \theta }\right) - \frac{\tau s^2}{2}, \end{aligned}$$
(10.12)

where \(s=s(\theta ,\tau )\) satisfies Eq. (10.11). The exponent for the external angle is

$$\begin{aligned} \Psi _{\mathrm {ext}}(\theta ) {:=}-(1-\theta ) \log \bigl (G(x)\bigr ) + \theta x^2, \end{aligned}$$
(10.13)

where \(x = x(\theta )\) is given by (10.10) above.
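
Continuing the preceding sketch (and reusing its functions x_of and s_of), the exponents (10.8), (10.12), and (10.13) assemble into \(\Psi _{\mathrm {total}}\) as follows; we take \(H\) to be the natural-logarithm entropy of (3.6).

```python
# Assemble Psi_total from Eqs. (10.8), (10.9), (10.12), and (10.13);
# x_of and s_of come from the preceding sketch.
import numpy as np
from scipy.special import erf

def H(p):                                   # entropy of (3.6), natural log assumed
    return 0.0 if p in (0.0, 1.0) else -p * np.log(p) - (1 - p) * np.log(1 - p)

def psi_cont(theta, tau):                   # Eq. (10.8)
    return (theta - tau) * np.log(2) + (1 - tau) * H((theta - tau) / (1 - tau))

def psi_int(theta, tau):                    # Eq. (10.12)
    s = s_of(theta, tau)                    # s < 0, so the log argument is positive
    return ((theta - tau) * np.log(np.sqrt(2 * np.pi) * s * theta / (tau - theta))
            - tau * s**2 / 2)

def psi_ext(theta):                         # Eq. (10.13)
    x = x_of(theta)
    return -(1 - theta) * np.log(erf(x)) + theta * x**2

def psi_total(theta, tau):                  # Eq. (10.9)
    return psi_cont(theta, tau) - psi_int(theta, tau) - psi_ext(theta)

print(psi_total(0.7, 0.1))                  # negative beyond the decay threshold
```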

Figure 13 displays \(\Psi _{\mathrm {total}}(\cdot ,\tau )\) for a few values of the parameter \(\tau \). Empirically, it appears that \(\Psi _{\mathrm {total}}(\cdot ,\tau )\) is concave for every value of \(\tau \in [0,1]\) and has a unique maximal value of zero.

Fig. 13 Upper bounds for the exponent of spherical intrinsic volumes. We plot \(\Psi _{\mathrm {total}}(\cdot ,\tau ) \) for several different values of \(\tau \). The best upper decay threshold at level \(\psi \) is given by the rightmost \(\theta \) for which the curve intersects the horizontal line at \(-\psi \). The short dashes show the upper decay threshold \(\theta _{\ell _1}(0.1,0)\), while the long dashes show \(\theta _{\ell _1}(0.1,0.1)\), defined in Eq. (10.14). For each \(\tau \), the upper decay threshold at level zero is numerically equal to the lower decay threshold.

1.2 Defining Decay Thresholds

The exponent \(\Psi _{\mathrm {total}}\) provides decay thresholds for the \(\ell _1\) norm at proportionally sparse vectors. We define

$$\begin{aligned} \theta _{\ell _1}(\tau ,\psi ) {:=}\inf \bigl \{\theta _{\star }\in [0,1]\mathrel {\mathop {:}}\Psi _{\mathrm {total}}(\theta , \tau ) < -\psi \text { for all } \theta \in ( \theta _\star ,1]\bigr \}. \end{aligned}$$
(10.14)

In words, \(\theta _{\ell _1}(\tau , \psi )\) is the rightmost point of intersection of the curve \(\Psi _{\mathrm {total}}(\cdot , \tau )\) with the horizontal line at the level \(-\psi \) (Fig. 13). Further, define

$$\begin{aligned} \kappa _{\ell _1}(\tau ) {:=}\sup \bigl \{\kappa _\star \in [0,1]\mathrel {\mathop {:}}\Psi _{\mathrm {total}} (\theta ,\tau ) < 0 \text { for all } \theta \in [0,\kappa _\star )\bigr \}. \end{aligned}$$
(10.15)

This function \(\kappa _{\ell _1}(\tau )\) is the leftmost point of intersection of \(\Psi _{\mathrm {total}}(\cdot ,\tau )\) with the horizontal line at level zero. Equations (10.14) and (10.15) define decay thresholds for the ensemble of feasible cones for the \(\ell _1\) norm at proportionally sparse vectors.
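
Given \(\Psi _{\mathrm {total}}\), both thresholds reduce to one-dimensional searches. A simple grid scan in the spirit of Fig. 13 might look as follows (reusing psi_total from the sketch above); the grid resolution is an arbitrary accuracy knob, and Eq. (10.15) is the analogous leftmost scan.

```python
# Grid-search sketch of the upper decay threshold (10.14); psi_total comes
# from the preceding sketch.
import numpy as np

def theta_l1(tau, psi, grid=2000):
    """Rightmost theta at which Psi_total(., tau) still reaches level -psi."""
    thetas = np.linspace(tau + 1e-4, 1 - 1e-6, grid)
    vals = np.array([psi_total(t, tau) for t in thetas])
    hit = np.nonzero(vals >= -psi)[0]
    # The maximum of Psi_total is itself (numerically) zero, so at the level
    # psi = 0 a small tolerance keeps the search well conditioned.
    return thetas[hit[-1]] if hit.size else tau

print(theta_l1(0.1, 0.1), theta_l1(0.1, 0.01))
```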

Proposition 10.3

(Decay thresholds for the \(\ell _1\) norm at proportionally sparse vectors) Consider the ensemble of Lemma 10.1. The function \(\theta _{\ell _1}(\tau ,\psi )\) is an upper decay threshold at level \(\psi \) for the ensemble \(\{\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}^{(d)})\mathrel {\mathop {:}}d \in \fancyscript{D}\}\), while \(\kappa _{\ell _1}(\tau ) \) is a lower decay threshold for the same ensemble.

Proof sketch

By definition, \(\Psi _{\mathrm {total}}(\theta ,\tau )<-\psi \) for every \(\theta > \theta _{\ell _1}(\tau ,\psi )\). It then follows immediately from Lemma 10.1 and Definition 4.7 that \(\theta _{\ell _1}(\tau ,\psi )\) is an upper decay threshold at level \(\psi \) for the ensemble \(\{\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}^{(d)})\mathrel {\mathop {:}}d \in \fancyscript{D}\}\). The proof that \(\kappa _{\ell _1}(\tau )\) is a lower decay threshold for \(\bigl \{\mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}^{(d)})\mathrel {\mathop {:}}d \in \fancyscript{D}\bigr \}\) is equally straightforward, so we omit the argument. \(\square \)

We abbreviate the upper decay threshold at level zero by

$$\begin{aligned} \theta _{\ell _1}(\tau ) {:=}\theta _{\ell _1}(\tau ,0) \end{aligned}$$
(10.16)

for consistency with Definition 3.9.

Figure 13 illustrates the definition of \(\theta _{\ell _1}(\tau ,\psi )\). Numerically, it appears that zero is the unique maximal value of \(\Psi _{\mathrm {total}}(\cdot ,\tau )\) for every value \(\tau \in (0,1]\). If this is indeed the case, then we would be able to deduce that \(\theta _{\ell _1}(\tau ) = \kappa _{\ell _1}(\tau )\) for all values of sparsity \(\tau \).

1.3 Reconciliation with [25]

We now discuss the relationship between our spherical intrinsic volume approach and the bounds of [25, 28] for basis pursuit. Numerically, it appears that the two approaches provide equivalent success guarantees, but the expressions for the exponents seem to preclude a direct proof of equivalence. We also describe how our approach gives matching upper bounds for a region of success of basis pursuit, which shows that our results are unimprovable beyond numerical accuracy.

1.3.1 Reconciliation with Weak Threshold

Recall that basis pursuit is the linear inverse problem (5.1) with objective \(f(\cdot ) =\left\| {\cdot } \right\| _{\ell _1}\). By the first part of Lemma 5.1, basis pursuit with a Gaussian measurement matrix \(\varvec{\Omega }\in \mathbb {R}^{\lceil \sigma d\rceil \times d}\) will succeed at recovering a \(k=\lceil \tau d\rceil \)-sparse vector with overwhelming probability in high dimensions, as long as the pair \((\tau ,\sigma )\) satisfies \(\theta _{\ell _1}(\tau ) < \sigma \).

We now describe the analogous result given in [25, Sec. 7.1]. Define the critical sparsity ratio (compare with the critical proportion [25, Def. 2])

$$\begin{aligned} \tau _{W}(\sigma ) = \sup \bigl \{ \hat{\tau }\in [0,\sigma ]\mathrel {\mathop {:}}\Psi _{\mathrm {total}}(\theta ,\hat{\tau })<0 \text { for all } \theta \in [\sigma ,1]\bigr \}. \end{aligned}$$
(10.17)

Then the result [25, Thm. 2] is equivalent to the statement that basis pursuit with a Gaussian matrix \(\varvec{\Omega }\in \mathbb {R}^{\lceil \sigma d \rceil \times d}\) will succeed with overwhelming probability in high dimensions whenever the pair \((\tau ,\sigma )\) satisfies \(\tau < \tau _W(\sigma )\).

These two approaches show strong similarities, and the methods provide the same results to numerical precision. Indeed, under the assumption that both \(\tau _W(\sigma )\) and \(\theta _{\ell _1}(\tau )\) are monotonically increasing functions (this appears to hold empirically), one can show that these approaches are equivalent. Rather than dwell on this fine detail, we present a matching failure region for basis pursuit.

1.3.2 Matching Upper Bound

The following result links the lower decay threshold to regions where basis pursuit fails.

Proposition 10.4

Suppose \(\kappa _{\ell _1}(\tau ) > \sigma \). Then basis pursuit with \(n=\lceil \sigma d\rceil \) Gaussian measurements will fail with overwhelming probability in high dimensions for the \(\tau \)-sparse ensemble of Lemma 10.1.

Since the function \(\Psi _{\mathrm {total}}(\cdot ,\tau )\) has a unique maximal value of zero up to our ability to compute the functions involved (Fig. 13), we have the equality \(\kappa _{\ell _1}(\tau ) = \theta _{\ell _1}(\tau )\) to numerical precision. Coupling Proposition 10.4 with our discussion in Sect. 10.3.1 reveals that basis pursuit with a Gaussian measurement matrix exhibits a phase transition between success and failure at \(\sigma =\theta _{\ell _1}(\tau )\).

Proof of Proposition 10.4

Let \(\{\varvec{x}_0^{(d)} \in \mathbb {R}^d \}\) be an ensemble of \(\tau \)-sparse vectors as in Lemma 10.1. The null space of an \(n\times d\) Gaussian matrix is distributed as \(\varvec{Q} L\), where \(L\) is a linear subspace of dimension \((d-n)\). It then follows from [15, Prop. 2.1] that basis pursuit with \(n=\lceil \sigma d\rceil \) Gaussian measurements will succeed with the same probability that \(\varvec{Q} L \cap \mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}_0) = \{\varvec{0}\}\).

By Proposition 3.11, the value \(\kappa _\star = (1-\sigma )\) is a lower decay threshold for \(L\). Our assumption implies \(\kappa _{\ell _1}(\tau ) + \kappa _\star > 1\), so by Theorem 9.1, we see \(\varvec{Q} L \cap \mathcal {F}(\left\| {\cdot } \right\| _{\ell _1},\varvec{x}_0)\ne \{\varvec{0}\}\) with overwhelming probability in high dimensions. We conclude that basis pursuit will fail with overwhelming probability in high dimensions. \(\square \)

Appendix 4: Proof of Lemma 5.1 and Corollary 5.3

We begin with the proof of the two claims of Lemma 5.1 concerning the relationship between the upper decay threshold and the linear inverse problem (5.1). The first result is a corollary of [15, Prop. 2.1]. We drop the superscript \(d\) for clarity.

Proof of Lemma 5.1

(Part 1) Let \(\varvec{\Omega }\) be an \(n\times d\) Gaussian measurement matrix, where \(n=\lceil \sigma d\rceil \). The null space of \(\varvec{\Omega }\) is distributed as \(\varvec{Q} L\), where \(\varvec{Q}\) is a random basis and \(L\) is any fixed \((d-n)\)-dimensional subspace of \(\mathbb {R}^d\). Therefore, the probability that (5.1) will succeed is equal to the probability that \( \varvec{Q} L \cap \mathcal {F}(f,\varvec{x}_0) = \{\varvec{0}\}\) [15, Prop. 2.1]. By Proposition 3.11, the subspace \(L\) has an upper decay threshold \(\theta _L = 1-\sigma \), so that the assumption \(\theta _\star < \sigma \) implies that \(\theta _\star + \theta _L < 1\). The claim follows from Theorem 4.4. \(\square \)

The second claim of Lemma 5.1 requires additional effort. We require the following technical lemma.

Lemma 11.1

Let \(\fancyscript{D}\) be an infinite set of indices. Let \(\{K^{(d)}\mathrel {\mathop {:}}d \in \fancyscript{D}\}\) be an ensemble of closed convex cones with \(K^{(d)}\subset \mathbb {R}^d\) for each \(d\), and let \(\{L^{(d)}\mathrel {\mathop {:}}d \in \fancyscript{D}\}\) be an ensemble of linear subspaces of \(\mathbb {R}^d\) of dimension \(d-\lceil \sigma d\rceil \). If there exists an \(\varepsilon >0\) such that, for every sufficiently large \(d\),

$$\begin{aligned} \mathbb {P}\bigl \{ K^{(d)} \cap \varvec{Q} L^{(d)} \ne \{\mathbf {0}\}\bigr \} \le \mathrm {e}^{-\varepsilon d}, \end{aligned}$$

then \(\{K^{(d)}\}\) has an upper decay threshold \(\theta _\star = \sigma \).

Again, the spherical kinematic formula (3.2) is at the heart of the proof.

Proof of Lemma 11.1

We split the argument into two cases: first, we consider the case where \(K^{(d)}\) is not a subspace, and then we consider the case where \(K^{(d)}\) is a subspace. The general mixed-cone case follows by applying these arguments to the subsequences consisting of only one type of cone.

We drop the superscript \(d\) for clarity. Let \(n = \lceil \sigma d\rceil \). We first assume that \(K\) is not a subspace. By the spherical kinematic formula, the probability of interest \(P\) is given by

$$\begin{aligned} P {:=}\mathbb {P}\bigl \{K\cap \varvec{Q} L \ne \{\mathbf {0}\}\bigr \} = \sum _{k=0}^{d-1}(1+(-1)^k) \sum _{i=k}^{d-1}v_i(L) v_{d-1-i + k}(K ). \end{aligned}$$

By Proposition 3.2, \(v_i(L) = \delta _{i,d-n-1}\), so only the term \(i=d-n-1\) survives in the inner sum. Reindexing the outer sum by replacing \(k\) with \(k-n\), the foregoing probability reduces to

$$\begin{aligned} P = \sum _{k=n}^{d-1} (1+(-1)^{k-n}) v_k(K). \end{aligned}$$
(11.1)

By assumption, we have \(P\le \mathrm {e}^{-\varepsilon d}\) for all sufficiently large \(d\), so the positivity of spherical intrinsic volumes (Fact 3.5.1) implies

$$\begin{aligned} v_k(K) \le \mathrm {e}^{-\varepsilon d}, \hbox { for any}\;k\ge n \hbox { such that}\;k \equiv n \mod 2. \end{aligned}$$

It requires an additional geometric observation to remove the dependence on parity. Let \(\tilde{L}\) be a \((d-n-1)\)-dimensional subspace contained in \(L\). By containment, it is immediate that

$$\begin{aligned} \tilde{P} {:=}\mathbb {P}\left\{ K\cap \varvec{Q} \tilde{L} \ne \{\mathbf {0}\}\right\} \le \mathbb {P}\left\{ K\cap \varvec{Q} L \ne \{\mathbf {0}\}\right\} \le \mathrm {e}^{-\varepsilon d}, \end{aligned}$$

where the last inequality is by assumption. But the same manipulations as above show

$$\begin{aligned} \tilde{P} = \sum _{k=n+1}^{d-1}(1+(-1)^{k-n-1}) v_k(K), \end{aligned}$$

so we have \(v_k(K)\le \mathrm {e}^{-\varepsilon d} \) for every \(k \ge n+1\) such that \(k \equiv n+1 \mod 2\). In summary, for every sufficiently large \(d\) and every \(k \ge n =\lceil \sigma d\rceil \), we have \(v_k(K) \le \mathrm {e}^{-\varepsilon d}\). By definition, \(\sigma \) is an upper decay threshold for the ensemble of cones \(\{K^{(d)}\}\). This completes the first case.

Now suppose that \(K\) is a subspace, and define \(m {:=}\dim (K)\). Since \(\dim (L) = d- \lceil \sigma d\rceil \), we have

$$\begin{aligned} P{:=}\mathbb {P}\left\{ {K\cap \varvec{Q} L \ne \{\varvec{0}\}} \right\} = \left\{ \begin{array}{ll}0, &{}\quad m\le \lceil \sigma d\rceil \\ 1, &{}\quad \hbox {otherwise}\end{array}\right. \end{aligned}$$

because randomly oriented subspaces are almost surely in general position. The assumption that \(P \le \mathrm {e}^{-\varepsilon d}\) for all sufficiently large \(d\) requires that \(m\le \lceil \sigma d \rceil \) for all sufficiently large \(d\). By Proposition 3.11, the scalar \(\sigma \) is an upper decay threshold for the subspace \(K\). This is the result for the second case, so we are done. \(\square \)
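
The index bookkeeping in the first case can be checked mechanically. The following snippet (a sanity check of ours, not part of the proof) verifies, in a small concrete dimension, that the spherical kinematic formula collapses to (11.1) once \(v_i(L) = \delta _{i,d-n-1}\) is substituted.

```python
# Symbolic sanity check (not part of the proof) that the kinematic sum
# collapses to (11.1) when v_i(L) = delta_{i, d-n-1}. Dimensions arbitrary.
import sympy as sp

d, n = 9, 4
vK = sp.symbols(f"v0:{d}")                           # v_0(K), ..., v_{d-1}(K)
vL = [1 if i == d - n - 1 else 0 for i in range(d)]  # v_i(L) = delta_{i,d-n-1}

# Left: the spherical kinematic formula as displayed in the proof.
P = sum((1 + (-1)**k) * sum(vL[i] * vK[d - 1 - i + k] for i in range(k, d))
        for k in range(d))
# Right: the reduced sum (11.1).
P_reduced = sum((1 + (-1)**(k - n)) * vK[k] for k in range(n, d))

assert sp.expand(P - P_reduced) == 0                 # the sums agree identically
```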

Proof of Lemma 5.1

(Part 2) The results of [15] imply that the linear inverse problem (5.1) with a Gaussian measurement matrix \(\varvec{\Omega }\) succeeds with the same probability that a randomly oriented \((d-n)\)-dimensional subspace \(\varvec{Q} L\) intersects the feasible cone \(\mathcal {F}(f,\varvec{x}_0)\) trivially. The result then follows from Lemma 11.1. \(\square \)

We conclude with the proof of Corollary 5.3, which asserts that \(\theta _\star \) is an upper decay threshold for the ensemble \(\{\mathcal {F}(f^{(d)},\varvec{x}_0^{(d)})\mathrel {\mathop {:}}d\in \fancyscript{D}\}\) whenever the bound (5.3) holds.
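
Before giving the proof, we note that the width appearing in (5.3) is amenable to direct numerical estimation. The sketch below (ours) computes a Monte Carlo upper bound on \(W\) for the \(\ell _1\) descent cone at a \(k\)-sparse vector, using the standard duality bound \(W \le \mathbb {E}\,\mathrm {dist}\bigl (\varvec{g}, \mathrm {cone}(\partial \left\| \varvec{x}_0 \right\| _{\ell _1})\bigr )\) for a standard Gaussian vector \(\varvec{g}\) (cf. [15]); the grid of cone scalings and the sample size are arbitrary.

```python
# Monte Carlo sketch (ours) of the duality upper bound on the width W
# of the l1 descent cone at a k-sparse x0. By Gaussian symmetry we may
# place the support on the first k coordinates with positive signs.
import numpy as np

def l1_width_upper_bound(d, k, samples=500, seed=0):
    rng = np.random.default_rng(seed)
    lams = np.linspace(0.0, 5.0, 101)[:, None]  # candidate subdifferential scalings
    total = 0.0
    for _ in range(samples):
        g = rng.standard_normal(d)
        # Distance from g to lam * (subdifferential of the l1 norm at x0):
        # on the support the subgradient entries are pinned at +1; off the
        # support they range over [-1, 1], so the distance clips at lam.
        on = (g[:k] - lams) ** 2
        off = np.maximum(np.abs(g[k:]) - lams, 0.0) ** 2
        total += np.sqrt((on.sum(axis=1) + off.sum(axis=1)).min())
    return total / samples

print(l1_width_upper_bound(d=200, k=20))  # compare with sqrt(theta_l1(0.1) * 200)
```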

Proof of Corollary 5.3

To shorten notation, we drop the explicit dependence on \(d\), and we define the width \(W:= W\bigl (\mathcal {F}(f,\varvec{x}_0)\cap \mathsf {S}^{d-1}\bigr )\). The result [15, Cor. 3.3(1)] (see Footnote 4) states that for any \(\varepsilon >0\),

$$\begin{aligned} n \ge \bigl (W+\varepsilon \sqrt{d}\bigr )^2 +1 \ \implies (5.2)\;\hbox {fails with probability}\le \mathrm {e}^{-\varepsilon ^2d/2}. \end{aligned}$$
(11.2)

Fix \(\varepsilon \in (0,1)\), and define \(\theta _\varepsilon {:=}\theta _\star + 2\varepsilon (\theta _\star ^{1/2}+1)\). We claim that \(\theta _\varepsilon \) is an upper decay threshold for the ensemble \(\{\mathcal {F}(f,\varvec{x}_0)\}\). To see this, choose the number of measurements \(n= \lceil \theta _\varepsilon d\rceil \) in the linear inverse problem (5.2). Then for \(d\ge \varepsilon ^{-1}\) we have

$$\begin{aligned} n \ge \theta _\varepsilon d \ge \bigl [\theta _\star +\varepsilon (2\theta _\star ^{1/2}+1)\bigr ] d + 1 \end{aligned}$$

by our choice of \(n\). We bound the bracketed expression using the convexity of the map \(\varepsilon \mapsto \left( \theta _\star ^{1/2} + \varepsilon \right) ^2\),

$$\begin{aligned} \theta _\star +\varepsilon (2\theta _\star ^{1/2}+1) = (1-\varepsilon )\theta _\star + \varepsilon \left( \theta _\star ^{1/2}+1\right) ^2 \ge \left( \theta _\star ^{1/2} + \varepsilon \right) ^2, \end{aligned}$$

because \(\varepsilon \in (0,1)\) and a convex function lies below the chord connecting its endpoints. Combining the two preceding displays yields

$$\begin{aligned} n -1 \ge \left( \theta _\star ^{1/2} + \varepsilon \right) ^2 d \ge \bigl (W + \varepsilon \sqrt{d}\bigr )^2, \end{aligned}$$

where the second inequality holds for all sufficiently large \(d\) by assumption (5.3). In view of the implication (11.2), the linear inverse problem (5.2) succeeds with overwhelming probability in high dimensions when \(n = \lceil \theta _\varepsilon d\rceil \). From the second part of Lemma 5.1, we conclude that \(\theta _\varepsilon \) is an upper decay threshold for \(\{\mathcal {F}(f,\varvec{x}_0)\}\), as claimed. The proof is completed by taking \(\varepsilon \rightarrow 0\) and verifying that a limit of decay thresholds is itself a decay threshold. We omit this straightforward, but technical, argument. \(\square \)
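
We remark that the convexity step above also admits a direct verification: expanding the square gives

$$\begin{aligned} \theta _\star +\varepsilon \bigl (2\theta _\star ^{1/2}+1\bigr ) - \bigl (\theta _\star ^{1/2}+\varepsilon \bigr )^2 = \varepsilon (1-\varepsilon ) \ge 0, \end{aligned}$$

since \(\varepsilon \in (0,1)\).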

Cite this article

McCoy, M.B., Tropp, J.A. Sharp Recovery Bounds for Convex Demixing, with Applications. Found Comput Math 14, 503–567 (2014). https://doi.org/10.1007/s10208-014-9191-2
