A Geometric Analysis of Phase Retrieval

Abstract

Can we recover a complex signal from its Fourier magnitudes? More generally, given a set of m measurements, \(y_k = \left| \varvec{a}_k^* \varvec{x} \right| \) for \(k = 1, \ldots , m\), is it possible to recover \(\varvec{x} \in \mathbb C^n\) (i.e., a length-n complex vector)? This generalized phase retrieval (GPR) problem is a fundamental task in various disciplines and has been the subject of much recent investigation. Natural nonconvex heuristics often work remarkably well for GPR in practice, but lack clear theoretical explanations. In this paper, we take a step toward bridging this gap. We prove that when the measurement vectors \(\varvec{a}_k\)’s are generic (i.i.d. complex Gaussian) and numerous enough (\(m \ge C n \log ^3 n\)), with high probability, a natural least-squares formulation for GPR has the following benign geometric structure: (1) There are no spurious local minimizers, and all global minimizers are equal to the target signal \(\varvec{x}\), up to a global phase, and (2) the objective function has a negative directional curvature around each saddle point. This structure allows a number of iterative optimization methods to efficiently find a global minimizer, without special initialization. To corroborate the claim, we describe and analyze a second-order trust-region algorithm.
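
As an illustrative aside (not part of the original analysis), the following minimal Python sketch generates a random GPR instance and evaluates a smooth least-squares objective together with its Wirtinger gradient. The specific objective \(f(\varvec{z}) = \frac{1}{2m} \sum _{k=1}^m \left( y_k^2 - \left| \varvec{a}_k^* \varvec{z} \right| ^2 \right) ^2\) and the gradient formula in the code are assumed forms, chosen to be consistent with the least-squares formulation and the Wirtinger calculus referenced in the notes below; all function names are our own.

import numpy as np

def gpr_instance(n=64, m=None, seed=0):
    # Draw x in C^n and i.i.d. complex Gaussian vectors a_k (rows of A); record y_k = |a_k^* x|.
    rng = np.random.default_rng(seed)
    m = m if m is not None else 6 * n
    x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    return A, np.abs(A @ x), x

def objective(z, A, y):
    # Assumed smooth least-squares objective: f(z) = (1/2m) * sum_k (y_k^2 - |a_k^* z|^2)^2.
    r = np.abs(A @ z) ** 2 - y ** 2
    return np.sum(r ** 2) / (2 * len(y))

def wirtinger_grad(z, A, y):
    # Wirtinger gradient with respect to conj(z): (1/m) * sum_k (|a_k^* z|^2 - y_k^2) * a_k * (a_k^* z).
    Az = A @ z
    r = np.abs(Az) ** 2 - y ** 2
    return A.conj().T @ (r * Az) / len(y)

A, y, x = gpr_instance()
print(objective(x, A, y))                       # ~0 at the target signal
print(np.linalg.norm(wirtinger_grad(x, A, y)))  # ~0: the target is a critical point

Both printed values are (numerically) zero, reflecting that the target signal is a global minimizer and a critical point of this objective, up to the global phase ambiguity noted in the abstract.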


Notes

  1. Another least-squares formulation, \({{\mathrm{minimize}}}_{\varvec{z}}\; \frac{1}{2m} \sum _{k=1}^m (y_k - \left| \varvec{a}_k^* \varvec{z} \right| )^2\), was first studied in the seminal works [41, 46]. An obvious advantage of the \(f(\varvec{z})\) studied here is that it is differentiable in the sense of Wirtinger calculus introduced later.

  2. Strictly speaking, \(f(\varvec{z})\) is not a complex polynomial in \(\varvec{z}\) over the complex field, since complex polynomials are necessarily complex differentiable and \(f\) is not. However, \(f(\varvec{z})\) is a fourth-order real polynomial in the real and imaginary parts of \(\varvec{z}\).

  3. Mathematically, \(f(\varvec{z})\) is not complex differentiable; here, the gradient is defined based on the Wirtinger calculus [65]; see also [31]. This notion of gradient is a natural choice when optimizing real-valued functions of complex variables.

  4. Note that the global sign cannot be recovered.

  5. The probability is with respect to the random draw of the \(\varvec{a}_k\)’s.

  6. Such saddle points are called ridable saddles [97] or strict saddles [44]; see also [6] for computational methods for escaping from higher-order saddles.

  7. Another line of research [5, 14, 16] seeks to co-design the measurements and recovery algorithms based on frame- or graph-theoretic tools. While we were revising this work, new convex relaxations based on second-order cone programming were proposed [13, 48, 50, 51].

  8. In addition, [36] shows that the measurements can be nonadaptive, in the sense that a single, randomly chosen collection of vectors \(\varvec{a}_i\) can simultaneously recover every \(\varvec{x} \in \mathbb C^n\). Results in [31, 76] and this paper pertain only to adaptive measurements that recover any fixed signal \(\varvec{x}\) with high probability.

  9. The same challenge is also faced by [31, 36].

  10. Two complex vectors \(\varvec{w}, \varvec{v}\) are orthogonal in the complex sense if \(\varvec{w}^* \varvec{v} = 0\).

  11. The precise definition is as follows: Write \(\varvec{z} = \varvec{u} + \mathrm {i}\varvec{v}\). Then \(\frac{\partial g}{\partial \varvec{z}} \doteq \tfrac{1}{2} \left( \frac{\partial g}{\partial \varvec{u}} - \mathrm {i}\frac{\partial g}{\partial \varvec{v}} \right) \). Similarly, \(\frac{\partial g}{\partial \bar{\varvec{z}}} \doteq \tfrac{1}{2} \left( \frac{\partial g}{\partial \varvec{u}} + \mathrm {i}\frac{\partial g}{\partial \varvec{v}} \right) \).

  12. This can also be proved, in a relatively straightforward way, using the geometry of the objective f. In the interest of brevity, we do not pursue this here.

  13. It is possible to refine the argument slightly by proving that the sequence does not exit \(\mathscr {R}_3'\) once it enters, in which case the bound can be tightened to \(f(\varvec{z}^{(0)}) /\min (d_1, d_2, d_3)\). We prefer to state the cruder bound to avoid the additional technicality.

  14. The proof ideas are contained in Chapter 6 of [37]; see also [2]. Intuitively, such a result is possible because reasonably good approximate solutions to the TRM subproblem make qualitatively the same progress as the exact solution. Recent work [23, 33] has established worst-case polynomial iteration complexity (under reasonable assumptions on the geometric parameters of the functions, of course) for TRM to converge to a point satisfying the second-order optimality conditions. Their results allow inexact trust-region subproblem solvers, as well as adaptive step sizes. Based on our geometric result, we could have invoked their results directly, at the cost of slightly worse iteration complexity bounds. It is not hard to adapt their proofs, taking advantage of the stronger geometric property we established, to produce tighter results.

  15. Available online: http://www.manopt.org.

  16. ...adjusted in sign to ensure positive correlation with the gradient, provided the gradient does not vanish.

  17. A similar modification is also adopted in the TRM algorithmic framework of the recent work [23] (Algorithm 3).

  18. This prescription should be taken with a grain of salt, as here we have only tested a single fixed n.

  19. Numerics in [36] suggest that under the same measurement model, \(m = 5n\) is sufficient for efficient recovery. Our requirement of controlling the whole function landscape, and hence of obtaining an “initialization-free” algorithm, may account for the additional sample complexity.

  20. The main limitation in this experiment was not the TRM solver, but the need to store the vectors \(\varvec{a}_1, \ldots , \varvec{a}_m\). For other measurement models, such as the coded diffraction model [30], “matrix-free” calculation is possible, and storage is no longer a bottleneck.
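
To make the Wirtinger-calculus conventions of Notes 3 and 11 concrete, here is a small numerical sketch (our own illustration, not part of the paper) that checks \(\partial g/\partial z\) and \(\partial g/\partial \bar{z}\) for \(g(z) = \left| z \right| ^2\) against the definition in Note 11, using central finite differences in the real coordinates \(z = u + \mathrm {i}v\).

# Wirtinger derivatives of g(z) = |z|^2, checked against the definition in Note 11:
# dg/dz = (dg/du - i*dg/dv)/2 and dg/dzbar = (dg/du + i*dg/dv)/2, with z = u + i*v.
def g(u, v):
    return u ** 2 + v ** 2      # |z|^2 written in real coordinates

z0 = 0.7 - 0.3j
u0, v0, h = z0.real, z0.imag, 1e-6
dg_du = (g(u0 + h, v0) - g(u0 - h, v0)) / (2 * h)   # central finite differences
dg_dv = (g(u0, v0 + h) - g(u0, v0 - h)) / (2 * h)

dg_dz = 0.5 * (dg_du - 1j * dg_dv)
dg_dzbar = 0.5 * (dg_du + 1j * dg_dv)
print(dg_dz, z0.conjugate())    # both ~ conj(z0): d|z|^2/dz = zbar
print(dg_dzbar, z0)             # both ~ z0:       d|z|^2/dzbar = z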

References

  1. Pierre-Antoine Absil, Christopher G. Baker, and Kyle A. Gallivan. Trust-region methods on Riemannian manifolds. Foundations of Computational Mathematics, 7(3):303–330, 2007.

  2. Pierre-Antoine Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2009.

  3. Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli, and Rashish Tandon. Learning sparsely used overcomplete dictionaries via alternating minimization. arXiv preprint arXiv:1310.7991, 2013.

  4. Alekh Agarwal, Animashree Anandkumar, and Praneeth Netrapalli. Exact recovery of sparsely used overcomplete dictionaries. arXiv preprint arXiv:1309.1952, 2013.

  5. Boris Alexeev, Afonso S. Bandeira, Matthew Fickus, and Dustin G. Mixon. Phase retrieval with polarization. SIAM Journal on Imaging Sciences, 7(1):35–66, 2014.

  6. Anima Anandkumar and Rong Ge. Efficient approaches for escaping higher order saddle points in non-convex optimization. arXiv preprint arXiv:1602.05908, 2016.

  7. Animashree Anandkumar, Rong Ge, and Majid Janzamin. Analyzing tensor power method dynamics: Applications to learning overcomplete latent variable models. arXiv preprint arXiv:1411.1488, 2014.

  8. Animashree Anandkumar, Rong Ge, and Majid Janzamin. Guaranteed non-orthogonal tensor decomposition via alternating rank-1 updates. arXiv preprint arXiv:1402.5180, 2014.

  9. Animashree Anandkumar, Prateek Jain, Yang Shi, and Uma Naresh Niranjan. Tensor vs matrix methods: Robust tensor decomposition under block sparse perturbations. arXiv preprint arXiv:1510.04747, 2015.

  10. Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. More algorithms for provable dictionary learning. arXiv preprint arXiv:1401.0579, 2014.

  11. Sanjeev Arora, Rong Ge, Tengyu Ma, and Ankur Moitra. Simple, efficient, and neural algorithms for sparse coding. arXiv preprint arXiv:1503.00778, 2015.

  12. Sanjeev Arora, Rong Ge, and Ankur Moitra. New algorithms for learning incoherent and overcomplete dictionaries. arXiv preprint arXiv:1308.6273, 2013.

  13. Sohail Bahmani and Justin Romberg. Phase retrieval meets statistical learning theory: A flexible convex relaxation. arXiv preprint arXiv:1610.04210, 2016.

  14. Radu Balan, Bernhard G. Bodmann, Peter G. Casazza, and Dan Edidin. Painless reconstruction from magnitudes of frame coefficients. Journal of Fourier Analysis and Applications, 15(4):488–501, 2009.

  15. Radu V. Balan. On signal reconstruction from its spectrogram. In Information Sciences and Systems (CISS), 44th Annual Conference on, pp. 1–4. IEEE, 2010.

  16. Radu Balan, Pete Casazza, and Dan Edidin. On signal reconstruction without phase. Applied and Computational Harmonic Analysis, 20(3):345–356, 2006.

  17. Afonso S. Bandeira, Nicolas Boumal, and Vladislav Voroninski. On the low-rank approach for semidefinite programs arising in synchronization and community detection. arXiv preprint arXiv:1602.04426, 2016.

  18. Tamir Bendory and Yonina C. Eldar. Non-convex phase retrieval from STFT measurements. arXiv preprint arXiv:1607.08218, 2016.

  19. Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.

  20. Srinadh Bhojanapalli, Behnam Neyshabur, and Nathan Srebro. Global optimality of local search for low rank matrix recovery. arXiv preprint arXiv:1605.07221, 2016.

  21. Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, Oxford, 2013.

  22. Nicolas Boumal. Nonconvex phase synchronization. arXiv preprint arXiv:1601.06114, 2016.

  23. Nicolas Boumal, P.-A. Absil, and Coralia Cartis. Global rates of convergence for nonconvex optimization on manifolds. arXiv preprint arXiv:1605.08101, 2016.

  24. Nicolas Boumal, Bamdev Mishra, P.-A. Absil, and Rodolphe Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15:1455–1459, 2014.

  25. Nicolas Boumal, Vladislav Voroninski, and Afonso S. Bandeira. The non-convex burer-monteiro approach works on smooth semidefinite programs. arXiv preprint arXiv:1606.04970, 2016.

  26. Oliver Bunk, Ana Diaz, Franz Pfeiffer, Christian David, Bernd Schmitt, Dillip K. Satapathy, and J. Friso van der Veen. Diffractive imaging for periodic samples: retrieving one-dimensional concentration profiles across microfluidic channels. Acta Crystallographica Section A, 63(4):306–314, Jul. 2007.

  27. T. Tony Cai, Xiaodong Li, and Zongming Ma. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. arXiv preprint arXiv:1506.03382, 2015.

  28. Emmanuel J. Candès, Yonina C. Eldar, Thomas Strohmer, and Vladislav Voroninski. Phase retrieval via matrix completion. SIAM Journal on Imaging Sciences, 6(1), 2013.

  29. Emmanuel J. Candès and Xiaodong Li. Solving quadratic equations via phaselift when there are about as many equations as unknowns. Foundations of Computational Mathematics, 14(5):1017–1026, 2014.

  30. Emmanuel J. Candès, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval from coded diffraction patterns. Applied and Computational Harmonic Analysis, 39(2):277–299, 2015.

  31. Emmanuel J. Candès, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. Information Theory, IEEE Transactions on, 61(4):1985–2007, April 2015.

  32. Emmanuel J. Candès, Thomas Strohmer, and Vladislav Voroninski. Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241–1274, 2013.

  33. Coralia Cartis, Nicholas I. M. Gould, and Philippe L. Toint. Complexity bounds for second-order optimality in unconstrained optimization. Journal of Complexity, 28(1):93–108, 2012.

  34. Anwei Chai, Miguel Moscoso, and George Papanicolaou. Array imaging using intensity-only measurements. Inverse Problems, 27(1):015005, 2011.

  35. Yudong Chen and Martin J. Wainwright. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees. arXiv preprint arXiv:1509.03025, 2015.

  36. Yuxin Chen and Emmanuel J. Candès. Solving random quadratic systems of equations is nearly as easy as solving linear systems. arXiv preprint arXiv:1505.05114, 2015.

  37. Andrew R. Conn, Nicholas I.M. Gould, and Philippe L. Toint. Trust region methods, volume 1. SIAM, 2000.

  38. John V. Corbett. The Pauli problem, state reconstruction and quantum-real numbers. Reports on Mathematical Physics, 57(1):53–68, 2006.

  39. Chris Dainty and James R. Fienup. Phase retrieval and image reconstruction for astronomy. Image Recovery: Theory and Application, pages 231–275, 1987.

  40. Armin Eftekhari and Michael B. Wakin. Greed is super: A fast algorithm for super-resolution. arXiv preprint arXiv:1511.03385, 2015.

  41. James R. Fienup. Phase retrieval algorithms: a comparison. Applied Optics, 21(15):2758–2769, Aug 1982.

  42. Charles Fortin and Henry Wolkowicz. The trust region subproblem and semidefinite programming. Optimization methods and software, 19(1):41–67, 2004.

  43. Bing Gao and Zhiqiang Xu. Gauss-Newton method for phase retrieval. arXiv preprint arXiv:1606.08135, 2016.

  44. Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. Escaping from saddle points—online stochastic gradient for tensor decomposition. In Proceedings of The 28th Conference on Learning Theory, pages 797–842, 2015.

  45. Rong Ge, Jason D. Lee, and Tengyu Ma. Matrix completion has no spurious local minimum. arXiv preprint arXiv:1605.07272, 2016.

  46. R. W. Gerchberg and W. Owen Saxton. A practical algorithm for the determination of the phase from image and diffraction plane pictures. Optik, 35:237–246, 1972.

  47. Donald Goldfarb. Curvilinear path steplength algorithms for minimization which use directions of negative curvature. Mathematical programming, 18(1):31–40, 1980.

  48. Tom Goldstein and Christoph Studer. Phasemax: Convex phase retrieval via basis pursuit. arXiv preprint arXiv:1610.07531, 2016.

  49. David Gross, Felix Krahmer, and Richard Kueng. A partial derandomization of phaselift using spherical designs. arXiv preprint arXiv:1310.2267, 2013.

  50. Paul Hand and Vladislav Voroninski. Compressed sensing from phaseless gaussian measurements via linear programming in the natural parameter space. arXiv preprint arXiv:1611.05985, 2016.

  51. Paul Hand and Vladislav Voroninski. An elementary proof of convex phase retrieval in the natural parameter space via the linear program phasemax. arXiv preprint arXiv:1611.03935, 2016.

  52. Moritz Hardt. Understanding alternating minimization for matrix completion. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 651–660. IEEE, 2014.

  53. Moritz Hardt and Mary Wootters. Fast matrix completion without the condition number. In Proceedings of The 27th Conference on Learning Theory, pages 638–678, 2014.

  54. Teiko Heinosaari, Luca Mazzarella, and Michael M. Wolf. Quantum tomography under prior information. Communications in Mathematical Physics, 318(2):355–374, 2013.

  55. Samuel B. Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer. Speeding up sum-of-squares for tensor decomposition and planted sparse vectors. arXiv preprint arXiv:1512.02337, 2015.

  56. Kishore Jaganathan, Yonina C. Eldar, and Babak Hassibi. Phase retrieval: An overview of recent developments. arXiv preprint arXiv:1510.07713, 2015.

  57. Kishore Jaganathan, Samet Oymak, and Babak Hassibi. Sparse phase retrieval: Convex algorithms and limitations. In Proceedings of IEEE International Symposium on Information Theory, pages 1022–1026. IEEE, 2013.

  58. Prateek Jain, Chi Jin, Sham M. Kakade, and Praneeth Netrapalli. Computing matrix squareroot via non convex local search. arXiv preprint arXiv:1507.05854, 2015.

  59. Prateek Jain and Praneeth Netrapalli. Fast exact matrix completion with finite samples. arXiv preprint arXiv:1411.1087, 2014.

  60. Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the forty-fifth annual ACM symposium on Theory of Computing, pages 665–674. ACM, 2013.

  61. Prateek Jain and Sewoong Oh. Provable tensor factorization with missing data. In Advances in Neural Information Processing Systems, pages 1431–1439, 2014.

  62. Kenji Kawaguchi. Deep learning without poor local minima. arXiv preprint arXiv:1605.07110, 2016.

  63. Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. Information Theory, IEEE Transactions on, 56(6):2980–2998, 2010.

  64. Ritesh Kolte and Ayfer Özgür. Phase retrieval via incremental truncated Wirtinger flow. arXiv preprint arXiv:1606.03196, 2016.

  65. Ken Kreutz-Delgado. The complex gradient operator and the \(\mathbb{C}\mathbb{R}\)-calculus. arXiv preprint arXiv:0906.4835, 2009.

  66. Jason D Lee, Max Simchowitz, Michael I Jordan, and Benjamin Recht. Gradient descent converges to minimizers. arXiv preprint arXiv:1602.04915, 2016.

  67. Kiryung Lee and Marius Junge. RIP-like properties in subsampled blind deconvolution. arXiv preprint arXiv:1511.06146, 2015.

  68. Kiryung Lee, Yanjun Li, Marius Junge, and Yoram Bresler. Blind recovery of sparse signals from subsampled convolution. arXiv preprint arXiv:1511.06149, 2015.

  69. Kiryung Lee, Yihong Wu, and Yoram Bresler. Near optimal compressed sensing of sparse rank-one matrices via sparse power factorization. arXiv preprint arXiv:1312.0525, 2013.

  70. Xiaodong Li and Vladislav Voroninski. Sparse signal recovery from quadratic measurements via convex programming. SIAM Journal on Mathematical Analysis, 45(5):3019–3033, 2013.

  71. Jianwei Miao, Tetsuya Ishikawa, Bart Johnson, Erik H. Anderson, Barry Lai, and Keith O. Hodgson. High resolution 3D X-Ray diffraction microscopy. Phys. Rev. Lett., 89(8):088303, Aug 2002.

  72. R. P. Millane. Phase retrieval in crystallography and optics. Journal of the Optical Society of America A, 7(3):394–411, Mar 1990.

  73. Jorge J. Moré and Danny C. Sorensen. Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4(3):553–572, 1983.

  74. Cun Mu, Bo Huang, John Wright, and Donald Goldfarb. Square deal: Lower bounds and improved convex relaxations for tensor recovery. Journal of Machine Learning Research, 1:1–48, 2014.

  75. Yurii Nesterov and Boris T. Polyak. Cubic regularization of newton method and its global performance. Mathematical Programming, 108(1):177–205, 2006.

  76. Praneeth Netrapalli, Prateek Jain, and Sujay Sanghavi. Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems, pages 2796–2804, 2013.

  77. Praneeth Netrapalli, Uma Naresh Niranjan, Sujay Sanghavi, Animashree Anandkumar, and Prateek Jain. Non-convex robust PCA. In Advances in Neural Information Processing Systems, pages 1107–1115, 2014.

  78. Jorge Nocedal and Stephen Wright. Numerical optimization. Springer Science & Business Media, 2006.

  79. Henrik Ohlsson, Allen Y. Yang, Roy Dong, and S. Shankar Sastry. CPRL – An extension of compressive sensing to the phase retrieval problem. In Advances in Neural Information Processing Systems. 2012.

  80. Henrik Ohlsson, Allen Y. Yang, Roy Dong, and S. Shankar Sastry. Compressive phase retrieval from squared output measurements via semidefinite programming. arXiv preprint arXiv:1111.6323, 2013.

  81. Henrik Ohlsson, Allen Y. Yang, Michel Verhaegen, and S. Shankar Sastry. Quadratic basis pursuit. arXiv preprint arXiv:1301.7002, 2013.

  82. Samet Oymak, Amin Jalali, Maryam Fazel, Yonina C. Eldar, and Babak Hassibi. Simultaneously structured models with application to sparse and low-rank matrices. arXiv preprint arXiv:1212.3753, 2012.

  83. Ioannis Panageas and Georgios Piliouras. Gradient descent only converges to minimizers: Non-isolated critical points and invariant regions. arXiv preprint arXiv:1605.00405, 2016.

  84. Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, and Sujay Sanghavi. Non-square matrix sensing without spurious local minima via the burer-monteiro approach. arXiv preprint arXiv:1609.03240, 2016.

  85. Qing Qu, Ju Sun, and John Wright. Finding a sparse vector in a subspace: Linear sparsity using alternating directions. In Advances in Neural Information Processing Systems, pages 3401–3409, 2014.

  86. H. Reichenbach. Philosophic Foundations of Quantum Mechanics. University of California Press, 1965.

  87. Franz Rendl and Henry Wolkowicz. A semidefinite framework for trust region subproblems with applications to large scale minimization. Mathematical Programming, 77(1):273–299, 1997.

  88. Robert W. Harrison. Phase problem in crystallography. Journal of the Optical Society of America A, 10(5):1046–1055, 1993.

  89. Christopher De Sa, Christopher Re, and Kunle Olukotun. Global convergence of stochastic gradient descent for some non-convex matrix problems. In The 32nd International Conference on Machine Learning, volume 37, pages 2332–2341, 2015.

  90. Hanie Sedghi and Animashree Anandkumar. Provable tensor methods for learning mixtures of classifiers. arXiv preprint arXiv:1412.3046, 2014.

  91. Yoav Shechtman, Amir Beck, and Yonina C. Eldar. GESPAR: Efficient phase retrieval of sparse signals. Signal Processing, IEEE Transactions on, 62(4):928–938, Feb 2014.

  92. Yoav Shechtman, Yonina C. Eldar, Oren Cohen, Henry N. Chapman, Jianwei Miao, and Mordechai Segev. Phase retrieval with application to optical imaging: A contemporary overview. Signal Processing Magazine, IEEE, 32(3):87–109, May 2015.

  93. Mahdi Soltanolkotabi. Algorithms and theory for clustering and nonconvex quadratic programming. PhD thesis, Stanford University, 2014.

  94. Daniel Soudry and Yair Carmon. No bad local minima: Data independent training error guarantees for multilayer neural networks. arXiv preprint arXiv:1605.08361, 2016.

  95. Gilbert W. Stewart and Ji-guang Sun. Matrix Perturbation Theory. Academic Press, Cambridge, 1990.

  96. Ju Sun, Qing Qu, and John Wright. Complete dictionary recovery over the sphere. arXiv preprint arXiv:1504.06785, 2015.

  97. Ju Sun, Qing Qu, and John Wright. When are nonconvex problems not scary? arXiv preprint arXiv:1510.06096, 2015.

  98. Ruoyu Sun and Zhi-Quan Luo. Guaranteed matrix completion via non-convex factorization. arXiv preprint arXiv:1411.8003, 2014.

  99. Stephen Tu, Ross Boczar, Mahdi Soltanolkotabi, and Benjamin Recht. Low-rank solutions of linear matrix equations via procrustes flow. arXiv preprint arXiv:1507.03566, 2015.

  100. Stephen A. Vavasis and Richard Zippel. Proving polynomial-time for sphere-constrained quadratic programming. Technical report, Cornell University, 1990.

  101. Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Yonina C. Eldar and Gitta Kutyniok, editors, Compressed Sensing, pages 210–268. Cambridge University Press, 2012.

  102. Vladislav Voroninski and Zhiqiang Xu. A strong restricted isometry property, with an application to phaseless compressed sensing. arXiv preprint arXiv:1404.3811, 2014.

  103. Irène Waldspurger. Phase retrieval with random gaussian sensing vectors by alternating projections. arXiv preprint arXiv:1609.03088, 2016.

  104. Irène Waldspurger, Alexandre d’Aspremont, and Stéphane Mallat. Phase recovery, MaxCut and complex semidefinite programming. Mathematical Programming, 149(1-2):47–81, 2015.

  105. Adriaan Walther. The question of phase retrieval in optics. Journal of Modern Optics, 10(1):41–49, 1963.

  106. Gang Wang, Georgios B. Giannakis, and Yonina C. Eldar. Solving systems of random quadratic equations via truncated amplitude flow. arXiv preprint, 2016.

  107. Ke Wei, Jian-Feng Cai, Tony F. Chan, and Shingyu Leung. Guarantees of Riemannian optimization for low rank matrix recovery. arXiv preprint arXiv:1511.01562, 2015.

  108. Chris D. White, Rachel Ward, and Sujay Sanghavi. The local convexity of solving quadratic equations. arXiv preprint arXiv:1506.07868, 2015.

  109. Yinyu Ye. On affine scaling algorithms for nonconvex quadratic programming. Mathematical Programming, 56(1-3):285–300, 1992.

  110. Xinyang Yi, Constantine Caramanis, and Sujay Sanghavi. Alternating minimization for mixed linear regression. arXiv preprint arXiv:1310.3745, 2013.

  111. Huishuai Zhang, Yuejie Chi, and Yingbin Liang. Provable non-convex phase retrieval with outliers: Median truncated Wirtinger flow. arXiv preprint arXiv:1603.03805, 2016.

  112. Huishuai Zhang and Yingbin Liang. Reshaped Wirtinger flow for solving quadratic systems of equations. arXiv preprint arXiv:1605.07719, 2016.

  113. Qinqing Zheng and John Lafferty. A convergent gradient descent algorithm for rank minimization and semidefinite programming from random linear measurements. arXiv preprint arXiv:1506.06081, 2015.

Acknowledgements

This work was partially supported by funding from the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation, and the Grants ONR N00014-13-1-0492, NSF CCF 1527809 and NSF IIS 1546411. We thank Nicolas Boumal for helpful discussion related to the Manopt package. We thank Mahdi Soltanolkotabi for pointing us to his early result on the local convexity around the target set for GPR in \(\mathbb R^n\). We also thank Yonina Eldar, Kishore Jaganathan and Xiaodong Li for helpful feedback on a prior version of this paper. We also thank the anonymous reviewers for their careful reading of the paper and for constructive comments which have helped us to substantially improve the presentation.

Author information

Corresponding author

Correspondence to Ju Sun.

Additional information

Communicated by Thomas Strohmer.

Appendices

Basic Tools and Results

Lemma 25

(Even moments of complex Gaussian) For \(a \sim \mathscr {CN}(1)\), it holds that

$$\begin{aligned} \mathbb E\left[ \left| a \right| ^{2p} \right] = p! \quad \forall \; p \in \mathbb N. \end{aligned}$$

Proof

Write \(a = x + \mathrm {i}y\); then \(x, y \sim _{i.i.d.} \mathscr {N}(0, 1/2)\). Thus,

$$\begin{aligned} \mathbb E\left[ \left| a \right| ^{2p} \right] = \mathbb E_{x, y} \left[ \left( x^2 + y^2\right) ^p\right] = \frac{1}{2^p}\mathbb E_{z \sim \chi ^2(2)} \left[ z^p\right] = \frac{1}{2^p} 2^p p! = p!, \end{aligned}$$

as claimed. \(\square \)
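
A quick Monte Carlo sanity check of Lemma 25 (our own sketch, using the same normalization as in the proof, i.e., \(a = x + \mathrm {i}y\) with \(x, y \sim _{i.i.d.} \mathscr {N}(0, 1/2)\)):

import numpy as np
from math import factorial

# Empirical even moments of a standard complex Gaussian versus p! (Lemma 25).
rng = np.random.default_rng(0)
a = (rng.standard_normal(2_000_000) + 1j * rng.standard_normal(2_000_000)) / np.sqrt(2)
for p in range(1, 5):
    print(p, np.mean(np.abs(a) ** (2 * p)), factorial(p))   # empirical moment vs p!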

Lemma 26

(Integral form of Taylor’s theorem) Consider any continuous function \(f(\varvec{z}): \mathbb C^n \mapsto \mathbb R\) with continuous first- and second-order Wirtinger derivatives. For any \(\varvec{\delta }\in \mathbb C^n\) and scalar \(t\in \mathbb R\), we have

$$\begin{aligned} f(\varvec{z}+t\varvec{\delta })&= f(\varvec{z} ) + t \int _0^1 \begin{bmatrix} \varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix}^* \nabla f(\varvec{z}+ s t \varvec{\delta }) \; \mathrm{d}s, \\ f(\varvec{z}+t\varvec{\delta })&= f(\varvec{z} ) + t \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix}^* \nabla f(\varvec{z} ) + t^2 \int _0^1 (1-s)\begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix}^* \nabla ^2 f(\varvec{z}+ st \varvec{\delta }) \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix}\; \mathrm{d}s. \end{aligned}$$

Proof

Since f is continuously differentiable, the fundamental theorem of calculus gives

$$\begin{aligned} f(\varvec{z} + t\varvec{\delta }) = f(\varvec{z} ) +\int _0^t \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix}^* \nabla f(\varvec{z} + \tau \varvec{\delta }) \; \text {d}\tau . \end{aligned}$$

Moreover, integrating by parts, we obtain

$$\begin{aligned} f(\varvec{z}+t \varvec{\delta }) =&f(\varvec{z}) + \left. \left[ (\tau -t) \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix} ^* \nabla f(\varvec{z} + \tau \varvec{\delta }) \right] \right| _{0}^t\\&- \int _{0}^t (\tau -t) \; \mathrm{d} \left[ \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix}^* \nabla f(\varvec{z} + \tau \varvec{\delta })\right] \nonumber \\ =&f(\varvec{z}) + t \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix}^* \nabla f(\varvec{z}) +\int _{0}^t (t-\tau ) \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix}^* \nabla ^2 f(\varvec{z} + \tau \varvec{\delta }) \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }}\end{bmatrix} \text {d}\tau . \end{aligned}$$

The change of variable \(\tau = st\) (\(0 \le s \le 1\)) gives the claimed result. \(\square \)

Lemma 27

(Error of quadratic approximation) Consider any continuous function \(f(\varvec{z}): \mathbb C^n \mapsto \mathbb R\) with continuous first- and second-order Wirtinger derivatives. Suppose its Hessian \(\nabla ^2 f(\varvec{z})\) is \(L_h\)-Lipschitz. Then the second-order approximation

$$\begin{aligned} \widehat{f}(\varvec{\delta }; \varvec{z}) = f(\varvec{z}) + \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }} \end{bmatrix}^* \nabla f(\varvec{z}) + \frac{1}{2}\begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }} \end{bmatrix}^* \nabla ^2 f(\varvec{z}) \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }} \end{bmatrix} \end{aligned}$$

around each point \(\varvec{z}\) obeys

$$\begin{aligned} \left| f(\varvec{z}+ \varvec{\delta }) - \widehat{f}(\varvec{\delta }; \varvec{z}) \right| \le \frac{1}{3} L_h \left\| \varvec{\delta } \right\| _{}^3. \end{aligned}$$

Proof

By the integral form of Taylor’s theorem in Lemma 26,

$$\begin{aligned} \left| f(\varvec{z}+ \varvec{\delta }) - \widehat{f}(\varvec{\delta }; \varvec{z} ) \right|&= \;\left| \int _0^1 (1-\tau ) \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }} \end{bmatrix}^* \left[ \nabla ^2 f(\varvec{z}+ \tau \varvec{\delta })- \nabla ^2 f(\varvec{z}) \right] \begin{bmatrix}\varvec{\delta }\\ \overline{\varvec{\delta }} \end{bmatrix} \; \text {d}\tau \right| \\&\le \; 2\left\| \varvec{\delta } \right\| _{}^2 \int _0^1 (1-\tau )\left\| \nabla ^2 f(\varvec{z}+ \tau \varvec{\delta })- \nabla ^2 f(\varvec{z}) \right\| _{} \;\text {d}\tau \\&\le \; 2L_h\left\| \varvec{\delta } \right\| _{}^3 \int _0^1 (1-\tau ) \tau \;\text {d}\tau = \frac{L_h}{3}\left\| \varvec{\delta } \right\| _{}^3, \end{aligned}$$

as desired. \(\square \)

Lemma 28

(Spectrum of complex Gaussian matrices) Let \(\varvec{X}\) be an \(n_1 \times n_2\) (\(n_1 > n_2\)) matrix with i.i.d. \(\mathscr {CN}\) entries. Then,

$$\begin{aligned} \sqrt{n_1} - \sqrt{n_2} \le \mathbb E\left[ \sigma _{\min }(\varvec{X}) \right] \le \mathbb E\left[ \sigma _{\max }(\varvec{X}) \right] \le \sqrt{n_1} + \sqrt{n_2}. \end{aligned}$$

Moreover, for each \(t \ge 0\), it holds with probability at least \(1 - 2\exp \left( -t^2\right) \) that

$$\begin{aligned} \sqrt{n_1} - \sqrt{n_2} - t \le \sigma _{\min }(\varvec{X}) \le \sigma _{\max }(\varvec{X}) \le \sqrt{n_1} + \sqrt{n_2} + t. \end{aligned}$$
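
The bounds in Lemma 28 can likewise be visualized with a short Monte Carlo sketch (our own illustration; we assume the entries are standard complex Gaussian with unit total variance, i.e., real and imaginary parts i.i.d. \(\mathscr {N}(0, 1/2)\)):

import numpy as np

# Extreme singular values of an n1 x n2 complex Gaussian matrix versus the bounds
# sqrt(n1) -/+ sqrt(n2) on their expectations (Lemma 28).
rng = np.random.default_rng(0)
n1, n2, trials = 400, 100, 50
smin, smax = [], []
for _ in range(trials):
    X = (rng.standard_normal((n1, n2)) + 1j * rng.standard_normal((n1, n2))) / np.sqrt(2)
    s = np.linalg.svd(X, compute_uv=False)
    smin.append(s[-1])
    smax.append(s[0])
print(np.sqrt(n1) - np.sqrt(n2), np.mean(smin))   # lower bound vs empirical mean of sigma_min
print(np.mean(smax), np.sqrt(n1) + np.sqrt(n2))   # empirical mean of sigma_max vs upper bound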

Lemma 29

(Hoeffding-type inequality, Proposition 5.10 of [101]) Let \(X_1,\ldots , X_N\) be independent centered sub-Gaussian random variables, and let \(K = \max _i \left\| X_i \right\| _{\psi _2} \), where the sub-Gaussian norm is defined as

$$\begin{aligned} \left\| X_i \right\| _{\psi _2 } \doteq \sup _{p\ge 1} p^{-1/2} \left( \mathbb E\left[ \left| X_i \right| ^p \right] \right) ^{1/p}. \end{aligned}$$
(A.1)

Then for every \(\varvec{b} = \left[ b_1; \cdots ; b_N \right] \in \mathbb C^N\) and every \(t\ge 0\), we have

$$\begin{aligned} \mathbb P\left( \left| \sum _{k=1}^N b_k X_k \right| \ge t \right) \le e \exp \left( - \frac{ct^2}{K^2 \left\| \varvec{b} \right\| _{2}^2 } \right) . \end{aligned}$$
(A.2)

Here c is a universal constant.

Lemma 30

(Bernstein-type inequality, Proposition 5.17 of [101]) Let \(X_1,\ldots , X_N\) be independent centered subexponential random variables, and let \(K = \max _i \left\| X_i \right\| _{\psi _1} \), where the subexponential norm is defined as

$$\begin{aligned} \left\| X_i \right\| _{\psi _1} \doteq \sup _{p\ge 1} p^{-1} \left( \mathbb E\left[ \left| X_i \right| ^p \right] \right) ^{1/p}. \end{aligned}$$
(A.3)

Then for every \(\varvec{b} = \left[ b_1; \ldots ; b_N \right] \in \mathbb C^N\) and every \(t\ge 0\), we have

$$\begin{aligned} \mathbb P\left( \left| \sum _{k=1}^N b_k X_k \right| \ge t \right) \le 2 \exp \left( -c \min \left( \frac{t^2}{K^2\left\| \varvec{b} \right\| _{2}^2}, \frac{t}{K\left\| \varvec{b} \right\| _{\infty }}\right) \right) . \end{aligned}$$
(A.4)

Here c is a universal constant.

Lemma 31

(Sub-Gaussian lower tail for nonnegative RV’s, Problem 2.9 of [21]) Let \(X_1\), \(\ldots \), \(X_N\) be i.i.d. copies of the nonnegative random variable X with finite second moment. Then it holds that

$$\begin{aligned} \mathbb P\left[ \frac{1}{N} \sum _{i=1}^N \left( X_i - \mathbb E\left[ X_i \right] \right) < -t \right] \le \exp \left( -\frac{Nt^2}{2\sigma ^2}\right) \end{aligned}$$

for any \(t > 0\), where \(\sigma ^2 = \mathbb E\left[ X^2 \right] \).

Proof

For any \(\lambda > 0\), we have

$$\begin{aligned} \log \mathbb E\left[ \mathrm {e}^{-\lambda (X - \mathbb E\left[ X \right] )} \right] = \lambda \mathbb E\left[ X \right] +\log \mathbb E\left[ \mathrm {e}^{-\lambda X} \right] \le \lambda \mathbb E\left[ X \right] + \mathbb E\left[ \mathrm {e}^{-\lambda X} \right] - 1, \end{aligned}$$

where the last inequality holds thanks to \(\log u \le u -1\) for all \(u > 0\). Moreover, using the fact \(\mathrm {e}^u \le 1 + u + u^2/2\) for all \(u \le 0\), we obtain

$$\begin{aligned} \log \mathbb E\left[ \mathrm {e}^{-\lambda (X - \mathbb E\left[ X \right] )} \right] \le \frac{1}{2} \lambda ^2 \mathbb E\left[ X^2 \right] \Longleftrightarrow \mathbb E\left[ \mathrm {e}^{-\lambda (X - \mathbb E\left[ X \right] )} \right] \le \exp \left( \frac{1}{2} \lambda ^2 \mathbb E\left[ X^2 \right] \right) . \end{aligned}$$

Thus, by the usual exponential transform trick, we obtain that for any \(t > 0\),

$$\begin{aligned} \mathbb P\left[ \sum _{i=1}^N (X_i - \mathbb E\left[ X_i \right] ) < -t \right] \le \exp \left( -\lambda t + N\lambda ^2 \mathbb E\left[ X^2 \right] /2\right) . \end{aligned}$$

Taking \(\lambda = t/(N\sigma ^2)\) and then replacing \(t\) with \(Nt\) gives the claimed result. \(\square \)

Lemma 32

(Moment-control Bernstein’s inequality for random variables) Let \(X_1, \ldots , X_p\) be i.i.d. copies of a real-valued random variable X. Suppose that there exist positive numbers R and \(\sigma _X^2\) such that

$$\begin{aligned} \mathbb E\left[ X^2 \right] \le \sigma _X^2, \quad \text {and} \quad \mathbb E\left[ \left| X \right| ^m \right] \le \frac{m!}{2} \sigma _X^2 R^{m-2}, \; \; \text {for all integers }m \ge 3. \end{aligned}$$

Let \(S \doteq \frac{1}{p}\sum _{k=1}^p X_k\). Then, for any \(t > 0\), it holds that

$$\begin{aligned} \mathbb P\left[ \left| S - \mathbb E\left[ S \right] \right| \ge t \right] \le 2\exp \left( -\frac{pt^2}{2\sigma _X^2 + 2Rt}\right) . \end{aligned}$$

Lemma 33

(Angles between two subspaces) Consider two linear subspaces \(\mathscr {U}\), \(\mathscr {V}\) of dimension k in \(\mathbb R^n\) (\(k \in [n]\)) spanned by orthonormal bases \(\varvec{U}\) and \(\varvec{V}\), respectively. Suppose \(\pi /2 \ge \theta _1 \ge \theta _2 \ge \dots \ge \theta _k \ge 0\) are the principal angles between \(\mathscr {U}\) and \(\mathscr {V}\). Then it holds that

(i) \(\min _{\varvec{Q} \in O_k} \left\| \varvec{U} - \varvec{V} \varvec{Q} \right\| _{} \le \sqrt{2-2\cos \theta _1}\);

(ii) \(\sin \theta _1 = \left\| \varvec{U}\varvec{U}^* - \varvec{V}\varvec{V}^* \right\| _{}\);

(iii) Let \(\mathscr {U}^\perp \) and \(\mathscr {V}^\perp \) be the orthogonal complement of \(\mathscr {U}\) and \(\mathscr {V}\), respectively. Then \(\theta _1(\mathscr {U}, \mathscr {V}) = \theta _1(\mathscr {U}^\perp , \mathscr {V}^\perp )\).

Proof

The proof of (i) is similar to that of Theorem 4.11 in Chapter II of [95]. For \(2k \le n\), w.l.o.g., we can assume \(\varvec{U}\) and \(\varvec{V}\) are in the canonical form provided by the CS decomposition, i.e., \(\varvec{U} = [\varvec{I}; \varvec{0}; \varvec{0}]\) and \(\varvec{V} = [\varvec{\varGamma }; \varvec{\Sigma }; \varvec{0}]\), where \(\varvec{\varGamma } = \mathrm{diag}(\cos \theta _1, \ldots , \cos \theta _k)\) and \(\varvec{\Sigma } = \mathrm{diag}(\sin \theta _1, \ldots , \sin \theta _k)\). Then

$$\begin{aligned} \min _{\varvec{Q} \in O_k} \left\| \begin{bmatrix} \varvec{I} - \varvec{\varGamma }\varvec{Q} \\ - \varvec{\Sigma }\varvec{Q} \\ \varvec{0} \end{bmatrix} \right\| _{} \le \left\| \begin{bmatrix} \varvec{I} - \varvec{\varGamma }\\ - \varvec{\Sigma }\\ \varvec{0} \end{bmatrix} \right\| _{} \le \left\| \begin{bmatrix} \varvec{I} - \varvec{\varGamma }\\ - \varvec{\Sigma }\end{bmatrix} \right\| _{}. \end{aligned}$$

Now by definition

$$\begin{aligned} \left\| \begin{bmatrix} \varvec{I} - \varvec{\varGamma }\\ - \varvec{\Sigma }\end{bmatrix} \right\| _{}^2&= \max _{\left\| \varvec{x} \right\| _{} = 1} \left\| \begin{bmatrix} \varvec{I} - \varvec{\varGamma }\\ - \varvec{\Sigma }\end{bmatrix} \varvec{x} \right\| _{}^2 = \max _{\left\| \varvec{x} \right\| _{} = 1} \sum _{i=1}^k \left[ (1 - \cos \theta _i)^2 + \sin ^2\theta _i \right] x_i^2 \\&= \max _{\left\| \varvec{x} \right\| _{} = 1} \sum _{i=1}^k (2-2\cos \theta _i) x_i^2 \le 2 - 2\cos \theta _1. \end{aligned}$$

Note that the upper bound is achieved by taking \(\varvec{x} = \varvec{e}_1\). When \(2k > n\), by the CS decomposition (see, e.g., Theorem 5.2 in Chapter I of [95]),

$$\begin{aligned} \min _{\varvec{Q} \in O_k} \left\| \begin{bmatrix} \varvec{I}&\varvec{0} \\ \varvec{0}&\varvec{I} \\ \varvec{0}&\varvec{0} \end{bmatrix} - \begin{bmatrix} \varvec{\varGamma }&\varvec{0} \\ \varvec{0}&\varvec{I} \\ \varvec{\Sigma }&\varvec{0} \end{bmatrix} \varvec{Q} \right\| _{} \le \left\| \begin{bmatrix} \varvec{I} - \varvec{\varGamma }\\ - \varvec{\Sigma }\end{bmatrix} \right\| _{}, \end{aligned}$$

and the same argument then carries through. Statement (ii) is a known fact; see, e.g., Theorem 4.5 and Corollary 4.6 of [95]. To prove (iii), note that by (ii),

$$\begin{aligned} \sin \theta _1 = \left\| \varvec{U} \varvec{U}^* - \varvec{V} \varvec{V}^* \right\| _{} = \left\| (\varvec{I} - \varvec{U} \varvec{U}^*) - (\varvec{I} - \varvec{V} \varvec{V}^*) \right\| _{}, \end{aligned}$$

while \(\varvec{I} - \varvec{U} \varvec{U}^*\) and \(\varvec{I} - \varvec{V} \varvec{V}^*\) are the orthogonal projectors onto \(\mathscr {U}^\perp \) and \(\mathscr {V}^\perp \), respectively, so the right-hand side also equals \(\sin \theta _1(\mathscr {U}^\perp , \mathscr {V}^\perp )\) by the same fact applied to \(\mathscr {U}^\perp \) and \(\mathscr {V}^\perp \). This completes the proof. \(\square \)
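
To see Lemma 33 in action, the following Python sketch (our own illustration; the orthogonal matrix \(\varvec{Q}\) built from the SVD of \(\varvec{V}^\top \varvec{U}\) is the standard orthogonal Procrustes choice, not something prescribed by the lemma) computes the principal angles between two random k-dimensional subspaces of \(\mathbb R^n\) and checks claims (i) and (ii) numerically.

import numpy as np

# Numerical illustration of Lemma 33 for two random k-dimensional subspaces of R^n.
rng = np.random.default_rng(1)
n, k = 10, 3
U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis of the first subspace
V, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis of the second subspace

# Cosines of the principal angles are the singular values of U^T V; theta_1 is the largest angle.
cosines = np.clip(np.linalg.svd(U.T @ V, compute_uv=False), 0.0, 1.0)
theta1 = np.arccos(cosines.min())

# Claim (ii): sin(theta_1) equals the spectral norm of the difference of the two projectors.
print(np.sin(theta1), np.linalg.norm(U @ U.T - V @ V.T, 2))

# Claim (i): with the orthogonal Procrustes choice of Q, ||U - V Q|| matches the
# stated bound sqrt(2 - 2*cos(theta_1)), so the minimum over Q is at most this value.
W, _, Zt = np.linalg.svd(V.T @ U)
Q = W @ Zt
print(np.linalg.norm(U - V @ Q, 2), np.sqrt(2 - 2 * np.cos(theta1)))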

Cite this article

Sun, J., Qu, Q. & Wright, J. A Geometric Analysis of Phase Retrieval. Found Comput Math 18, 1131–1198 (2018). https://doi.org/10.1007/s10208-017-9365-9
