Total Least Squares Phase Retrieval

Abstract-We address the phase retrieval problem with errors in the sensing vectors. A number of recent methods for phase retrieval are based on least squares (LS) formulations which assume errors in the quadratic measurements. We extend this approach to handle errors in the sensing vectors by adopting the total least squares (TLS) framework that is used in linear inverse problems with operator errors. We show how gradient descent and the specific geometry of the phase retrieval problem can be used to obtain a simple and efficient TLS solution. Additionally, we derive the gradients of the TLS and LS solutions with respect to the sensing vectors and measurements which enables us to calculate the solution errors. By analyzing these error expressions we determine conditions under which each method should outperform the other. We run simulations to demonstrate that our method can lead to more accurate solutions. We further demonstrate the effectiveness of our approach by performing phase retrieval experiments on real optical hardware which naturally contains both sensing vector and measurement errors.
Index Terms-Phase retrieval, total least squares, operator error, sensing vector error, quadratic equations.

I. INTRODUCTION
IN the phase retrieval problem we seek to recover the signal x ∈ C^N from complex quadratic measurements

y_m ≈ |⟨a_m, x⟩|², 1 ≤ m ≤ M,   (1)

where y_m ∈ R are observed measurements and a_m ∈ C^N are sensing vectors. This problem appears in a plethora of applied science applications such as x-ray diffraction crystallography or astronomy, where the sensing vectors are Fourier basis vectors [1], and imaging through scattering media, where the sensing vectors may be complex random Gaussian [2]. In a prototypical phase retrieval problem, an object, x, is illuminated and the resulting optical field is measured with a detector. This optical field is complex-valued but common camera sensors only measure intensity, {|⟨a_m, x⟩|²}_{m=1}^M, and thus the measurement phase information is lost. The left and right hand sides in (1) are only approximately equal because in practical settings there can be errors in the measurements and sensing vectors. In this work we focus on gradient-based optimization strategies to solve (1) where M > N. Gradient-based methods have proven successful when imaging through random scattering media [2] or with coded diffraction Fourier patterns [3].

A standard approach is to minimize the least squares (LS) objective

min_x (1/M) Σ_{m=1}^M (y_m − |⟨a_m, x⟩|²)²,   (LS-PR)

which can equivalently be written as a search for the smallest measurement corrections,

min_x min_{r_1,...,r_M} Σ_{m=1}^M r_m²  subject to  y_m + r_m = |⟨a_m, x⟩|², 1 ≤ m ≤ M,   (2)

with r_m ∈ R. Thus, LS seeks the smallest correction to the measurements so that (y_m + r_m) can be obtained from quadratic measurements |⟨a_m, x⟩|² for each m. This is analogous to LS for linear inverse problems, where corrections that bring the measurements into the range space of the linear operator are sought instead. In many practical settings, the sensing vectors, {a_m}_{m=1}^M, are only approximately known via calibration. In this work we show that properly accounting for errors in the sensing vectors may lead to a more accurate estimate of x. Inspired by the total least squares (TLS) framework for linear [4], [5] and nonlinear [6] inverse problems, we extend the LS formulation (2) to find corrections for both the measurements and the sensing vectors.
In TLS phase retrieval, we optimize the objective

min_x min_{r_m, e_m} (1/M) Σ_{m=1}^M (λ_y r_m² + λ_a ‖e_m‖₂²)  subject to  y_m + r_m = |⟨a_m + e_m, x⟩|², 1 ≤ m ≤ M,   (3)

with corrections r_m ∈ R and e_m ∈ C^N for 1 ≤ m ≤ M. Scalars λ_y ∈ R and λ_a ∈ R are nonnegative regularization weights. Now for each m we want to find minimum weighted-norm corrections so that (y_m + r_m) can be obtained from the quadratic measurements |⟨a_m + e_m, x⟩|². Efficiently obtaining the sensing vector corrections {e_m}_{m=1}^M is the major challenge when moving from the LS problem (2) to the TLS problem (3).

A. Related work
Algorithms by Gerchberg and Saxton [7] and Fienup [8] are the most well-known approaches for solving the phase retrieval problem when the sensing vectors are the rows of the Fourier matrix, as in many practical imaging scenarios [9]. These methods iteratively reduce the error between the observed measurements and the measurements generated from the solution at the current iterate. Another class of algorithms, based on message passing, has also been developed [10], [11]. Despite the nonconvexity of the problem, these error reduction and message passing algorithms work well in practice, although they do not directly use gradient descent to obtain a solution.
Recently a series of works have shown that for suitable measurement models, the nonconvex LS objective (LS-PR) can be globally optimized via gradient descent updates. The Wirtinger flow algorithm is one of the most well-known methods and proposes the framework comprising a spectral initialization followed by gradient descent updates [3]. Spectral initialization ensures that the iterates start in a convex basin near a global optimum when there are enough measurements in an error-free setting. This initialization was first proposed as part of the AltMinPhase algorithm [12]. Multiple works have extended this approach by modifying the initialization, gradient updates and objective for phase retrieval [13], [14], and other quadratic problems with sensing matrices rather than sensing vectors [15] like the unassigned distance geometry problem [16]. There are also extensions that incorporate signal priors such as sparsity [17], [18]. None of these gradient descent approaches, however, account for sensing vector or sensing matrix errors.
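For concreteness, the gradient-descent part of this framework can be sketched in a few lines of NumPy. This is our own minimal illustration of LS gradient descent on the quadratic measurements, not the reference Wirtinger flow implementation; the function names and the fixed step-size schedule are ours:

```python
import numpy as np

def ls_gradient(x, A, y):
    """Wirtinger gradient of the LS objective (1/M) sum_m (|<a_m, x>|^2 - y_m)^2.

    Rows of A hold the conjugated sensing vectors, so (A @ x)[m] = <a_m, x>.
    """
    M = A.shape[0]
    z = A @ x
    resid = np.abs(z) ** 2 - y          # quadratic measurement residuals
    return (A.conj().T @ (resid * z)) / M

def wirtinger_flow(A, y, x0, mu=0.1, iters=500):
    """Plain gradient descent on the LS objective from an initial guess x0."""
    x = x0.copy()
    scale = np.linalg.norm(x0) ** 2     # step normalization as in Wirtinger flow
    for _ in range(iters):
        x = x - (mu / scale) * ls_gradient(x, A, y)
    return x
```

With an initial guess close to the truth (for example from a spectral initialization) and noiseless Gaussian measurements, this recovers x up to a global phase.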
Another group of works have developed convex optimization approaches, which are closely related to low-rank matrix recovery techniques, for solving the phase retrieval problem [19]. These methods use the fact that the measurements in (1) can be expressed using the Frobenius matrix inner product, y m ≈ | a m , x | 2 = x * a m a * m x = a m a * m , xx * . With this formulation, phase retrieval amounts to recovering a rank-1 positive semidefinite matrix, X = xx * , from linear matrix inner product measurements, { a m a * m , xx * } M m=1 [20], [21]. In practice, lifting the problem from recovering vectors in C N to matrices in C N ×N poses significant computational and memory challenges for even moderately sized problems. Matrix sketching algorithms [22] and convex methods which do not require lifting [23] have since been developed to address these challenges.
For linear inverse problems, the TLS method is an established approach for handling errors in both the measurements and the operator [4], [5]. For linear problems, TLS can be efficiently solved using the singular value decomposition (SVD). For the quadratic case considered in this paper, such an approach is not apparent because of the magnitude-squared nonlinearity in (1). We therefore also cannot use the SVD to analyze the solution error as is done in the linear case [24]. Linear TLS has been extended to settings with structured operator errors [25], [26], sparse signals [27], and signals with norm constraints [28]. We note that Yagle and Bell use linear TLS to solve a particular subproblem in a phase retrieval algorithm which only addresses errors in the measurements [29].
There also exist algorithms for nonlinear TLS which aim to solve a general optimization problem for inverse problems with arbitrary nonlinearities [6], [30], [31]. The general optimization problem is similar to (3) except for the constraint which requires nonlinear rather than quadratic consistency. However, by using the specific structure of the phase retrieval problem (1) we are able to obtain efficient algorithms and perform error analysis for TLS phase retrieval.
Our gradient descent strategy uses alternating updates to solve the TLS phase retrieval problem. While alternating updates have been successfully utilized to solve the linear TLS problem [27], it is not straightforward to extend this approach to phase retrieval because of its quadratic nature. We show how to use the geometry of the optimization problem (3) to perform alternating updates for TLS phase retrieval.

B. Contributions and paper organization
We propose a TLS framework for solving the phase retrieval problem when there are errors in the sensing vectors. In Section II we explain our gradient descent strategy for the TLS phase retrieval problem, which motivates an alternating updates procedure. The resulting additional computational challenges can be addressed efficiently by exploiting the geometry of the phase retrieval problem. In Section III we derive expressions for the reconstruction errors of the TLS and LS solutions, which gives us insight into when each method should perform well. This derivation requires the use of theorems about differentiation of argmins and various matrix inversion lemmas. Through simulations in Section IV we show that the TLS approach can lead to solutions of greater accuracy when there are sensing vector errors. We further verify the applicability of our framework through experiments on real optical hardware in Section V, where TLS outperforms LS when recovering random signals and real images. We conclude and motivate future work in Section VI.

II. TLS FOR PHASE RETRIEVAL
In this section we show how to solve the TLS phase retrieval problem. Recall (3), which has been normalized by the number of measurements through the scaling 1/M. We can rearrange the constraint and substitute r_m = |⟨a_m + e_m, x⟩|² − y_m into the objective. Further, denoting the mth corrected sensing vector as â_m := (a_m + e_m), we obtain the equivalent formulations

min_x (1/M) Σ_{m=1}^M min_{â_m} [λ_y (|⟨â_m, x⟩|² − y_m)² + λ_a ‖â_m − a_m‖₂²]   (TLS-PR1)

and

min_x min_{â_1,...,â_M} J(x, â_1, . . . , â_M),  where  J := (1/M) Σ_{m=1}^M [λ_y (|⟨â_m, x⟩|² − y_m)² + λ_a ‖â_m − a_m‖₂²].   (TLS-PR2)
As each data consistency term, (|⟨â_m, x⟩|² − y_m)², is proportional to ‖x‖₂⁴ in an error-free setting, we set λ_y = λ†_y/‖x^(0)‖₂⁴ in order to make the scaling of the objective invariant with respect to the norm of x. The vector x^(0) is an initial guess for x and λ†_y is a regularization parameter. Furthermore, to account for the fact that the sensing vector corrections, (â_m − a_m), are N-dimensional while the data consistency terms are scalar, we set λ_a = λ†_a/N, where λ†_a is a regularization parameter.
In line with recent methods such as the Wirtinger flow algorithm [3], our high-level strategy is to obtain x by solving

x† = arg min_x g(x),  g(x) := min_{â_1,...,â_M} J(x, â_1, . . . , â_M),

using gradient descent. To perform gradient descent with respect to x we can use Wirtinger gradient updates [3],

x^(k+1) = x^(k) − (μ/‖x^(0)‖₂²) ∇g(x^(k)),   (7)

where μ is the step size and ‖x^(0)‖₂ is a guess for ‖x‖₂. The gradient is given by

∇g(x) = (2λ_y/M) Σ_{m=1}^M (|⟨â†_m, x⟩|² − y_m) â†_m (â†_m)* x,   (8)

where â†_m is the solution to the following nonconvex optimization problem,

â†_m = arg min_{â_m} λ_y (|⟨â_m, x⟩|² − y_m)² + λ_a ‖â_m − a_m‖₂².   (9)

This motivates the following alternating updates procedure to solve the TLS problem:
1) Obtain an initial guess, x^(0) ∈ C^N, for x.
2) Repeat steps 2a and 2b until convergence:
   a) With x fixed, obtain corrected sensing vectors, {â†_m}_{m=1}^M, by solving (9).
   b) With the sensing vectors fixed, take one gradient descent step to update x using (7).

The main challenge in our approach is obtaining the corrected sensing vectors {â†_m}_{m=1}^M by solving (9) so that we can perform gradient descent updates for x using (7). As (TLS-PR2) is nonconvex, a good initial guess, x^(0), can place us near a global minimum. There are multiple initialization options, such as the spectral initialization for certain measurement models [12].
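The alternating procedure can be sketched as follows. This is our own skeleton, not the paper's Algorithm 1: `correct_vectors` is a hypothetical placeholder for the step 2a solver of (9), so any implementation of the sensing vector correction can be plugged in; with the identity correction the procedure reduces to plain LS gradient descent.

```python
import numpy as np

def tls_phase_retrieval(A, y, x0, correct_vectors, mu=0.1, iters=100, lam_y=1.0):
    """Alternating updates: correct sensing vectors, then one gradient step on x.

    A: (M, N) array whose rows hold conjugated sensing vectors, (A @ x)[m] = <a_m, x>.
    correct_vectors: callable (A, y, x) -> corrected matrix A_hat (row-wise solver of (9)).
    """
    x = x0.copy()
    scale = np.linalg.norm(x0) ** 2
    M = A.shape[0]
    for _ in range(iters):
        A_hat = correct_vectors(A, y, x)            # step 2a: with x fixed
        z = A_hat @ x
        resid = np.abs(z) ** 2 - y                  # data consistency residuals
        grad = 2 * lam_y * (A_hat.conj().T @ (resid * z)) / M
        x = x - (mu / scale) * grad                 # step 2b: with A_hat fixed
    return x
```

Passing `lambda A, y, x: A` as `correct_vectors` (no corrections, i.e. the λ_a → ∞ limit) recovers LS gradient descent.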
In the remainder of this section we will examine the geometry of the optimization problem in (TLS-PR1) and show how it can be leveraged to efficiently solve (9) and obtain corrected sensing vectors. This is summarized by Proposition 1 below. We will then present the complete TLS phase retrieval algorithm. Lastly, we also interpret the regularization parameters, λ a and λ y , by showing that the TLS solution is the maximum likelihood estimator for a quadratic complex-valued error-in-variables (EIV) model.

A. Optimization geometry
Moving from the LS formulation to the TLS formulation introduces significant computational issues. In addition to optimizing over the vector x, we must optimize over M sensing vectors in (TLS-PR1), with typically M > N. We now study the optimization geometry of (TLS-PR1) and show that the M inner minimizations over the N-dimensional vectors, {â_m}_{m=1}^M, can be reduced to minimizations over M scalars, which improves efficiency. For ease of visualization, in this subsection we consider the real-valued problem (all quantities in (1), (LS-PR) and (TLS-PR1) are real) and we set λ_a = λ_y = 1.
For a given vector x we compare the values of the LS and TLS objectives, (LS-PR) and (TLS-PR1). The left column of Fig. 1 visualizes the phase retrieval problem with M = 5 data points, {(a_m, y_m)}_{m=1}^M, when N = 2 and ‖x‖₂ = 1. The middle column shows the same data points from a different viewing angle. In phase retrieval we fit a paraboloid, y(a) = |⟨a, x⟩|², parameterized by x, to the data points {(a_m, y_m)}_{m=1}^M. If there is no sensing vector or measurement error, the data points lie on the paraboloid ((1) holds with equality). The left and middle figures show that the surface y(a) = |⟨a, x⟩|² does not change in the subspace perpendicular to x, denoted x⊥. This can also be verified by considering the values of a for which the inner product ⟨a, x⟩ is zero. Crucially, this means that the shortest paths between the data points and the paraboloid have no component in the x⊥ subspace. As a result, we can view the problem in 2D from a viewpoint that looks into the x⊥ subspace, as shown in the right column of Fig. 1.

This 2D plot shows two options for measuring closeness between the surface and the data points. The LS objective (LS-PR) is the sum of the squared vertical distances between the 2D parabola and each data point, as indicated by the dashed lines. On the other hand, due to the minimization over each â_m, the TLS objective (TLS-PR1) is the sum of the squared Euclidean, or orthogonal, distances between the 2D parabola and each data point, as shown by the solid lines. A similar geometrical interpretation arises in linear TLS [4], [5]. Considering this geometry, to solve the inner minimizations in (TLS-PR1) we find the closest point on the paraboloid to each data point. As the shortest path has no component in the x⊥ subspace, the task of finding the closest point on an (N+1)-dimensional paraboloid to an (N+1)-dimensional data point reduces to the 2D geometry problem of finding the closest point on a parabola to a 2D data point.
Rather than finding the minimizing N-dimensional â†_m for each data point directly, we only need to find the component of â†_m in the x direction. This component is a scalar and is given by the inner product ν_m = ⟨â†_m, x⟩. We can then construct â†_m by adding back the component of a_m perpendicular to x,

â†_m(ν_m) = (I − x̃x̃*) a_m + (ν_m/‖x‖₂) x̃,   (10)

where x̃ is x normalized. If λ_a and λ_y are not one, a perpendicular distance is not minimized. As λ_a/λ_y gets larger, the solid lines in the right column of Fig. 1 become more vertical because there is a relatively larger penalty for correcting the sensing vectors, and the problem moves towards an LS approach. Conversely, the lines become more horizontal as λ_a/λ_y gets smaller. Irrespective of the values of λ_a and λ_y, the shortest paths between the paraboloid and the data points still have no component in the x⊥ subspace and (10) can be used to obtain each â†_m. We further note that this geometry also holds for the complex-valued phase retrieval problem (1).

B. Correcting complex-valued sensing vectors
Our strategy is to set up each inner minimization over â_m in (TLS-PR1) as the minimization of a fourth-degree polynomial in the scalar ν_m = ⟨â_m, x⟩ rather than in the vector â_m. We then directly obtain the minimizer of this polynomial.
The M inner minimization problems in (TLS-PR1) are independent of each other, so we can solve each summand separately for a fixed vector x. Consider the objective function of the mth inner optimization problem,

f_m(â_m) = λ_y (|⟨â_m, x⟩|² − y_m)² + λ_a ‖â_m − a_m‖₂².

Proposition 1 states that arg min_{â_m} f_m(â_m) can be obtained by solving two scalar cubic equations and using (10).
Proposition 1. Let sets R₊ and R₋ be the positive real solutions of

αr³ + βr + |γ| = 0  and  αr³ + βr − |γ| = 0,

respectively, where α = 2λ_y‖x‖₂² ∈ R, β = λ_a − 2λ_y y_m‖x‖₂² ∈ R and γ = −λ_a x*a_m ∈ C. Further, with κ denoting the phase of γ, let

S₊ = {r e^{jκ} : r ∈ R₊}  and  S₋ = {−r e^{jκ} : r ∈ R₋}.

Then,

arg min_{â_m} f_m(â_m) = â†_m(ν†_m),  where  ν†_m = arg min_{ν ∈ S₊∪S₋} f_m(â†_m(ν))

and â†_m(·) is defined in (10).
We can use Wirtinger derivatives to calculate the derivative of the real-valued f_m(â_m) with respect to the complex vector â_m [3],

∂f_m/∂(â_m)* = λ_a(â_m − a_m) + 2λ_y(|⟨â_m, x⟩|² − y_m) xx*â_m.

Setting the derivative to zero and then left-multiplying by the nonzero x* gives

λ_a(ν_m − x*a_m) + 2λ_y(|ν_m|² − y_m)‖x‖₂² ν_m = 0.

The left hand side is now scalar-valued and is a function of the scalar ν_m = ⟨â_m, x⟩ = x*â_m ∈ C instead of a vector. Recalling our analysis of the optimization geometry in Section II-A, we can solve for ν_m and then obtain â†_m = â†_m(ν_m) using (10). Substituting α = 2λ_y‖x‖₂² ∈ R, β = λ_a − 2λ_y y_m‖x‖₂² ∈ R and γ = −λ_a x*a_m ∈ C, we wish to solve the following for ν_m,

α|ν_m|²ν_m + βν_m + γ = 0.

Because the sensing vectors and ground truth signal are complex, this cubic equation is a function of ν_m ∈ C and its conjugate ν̄_m (|ν_m|² = ν_m ν̄_m). We therefore cannot use standard cubic root-finding formulae. Further note that the coefficients α and β are always real while γ may be complex.
To solve, first multiply by ν̄_m,

α|ν_m|⁴ + β|ν_m|² + γν̄_m = 0.

Next, with the complex-exponential representations ν_m = re^{jφ} and γ = |γ|e^{jκ} (recall that γ is known), the equation becomes

αr⁴ + βr² + |γ|r e^{j(κ−φ)} = 0.

The real and imaginary parts of the left hand side should both equate to zero. Using Euler's identity, e^{jθ} = cos(θ) + j sin(θ), we arrive at the following simultaneous equations,

αr⁴ + βr² + |γ|r cos(κ − φ) = 0  and  |γ|r sin(κ − φ) = 0.

For the second equation to hold with r > 0 we need sin(κ − φ) = 0, so that cos(κ − φ) = ±1 and the phase of ν_m has two possible values; φ = κ or φ = (κ − π).
To obtain the magnitude of ν_m we can solve the following two cubic equations for r, obtained by substituting cos(κ − φ) = 1 and cos(κ − φ) = −1 respectively and dividing by r > 0, to get up to six values, three from each,

αr³ + βr + |γ| = 0,   (22)
αr³ + βr − |γ| = 0.   (23)

As the solutions of these two cubic equations are magnitudes of complex numbers, we let sets R₊ and R₋ be the positive real solutions of (22) and (23) respectively. To obtain values for ν_m we combine R₊ and R₋ with their phases to get S₊ and S₋: multiply the elements of R₊ by e^{jκ} and multiply the elements of R₋ by e^{j(κ−π)} = −e^{jκ}. We then construct candidate minimizers of f_m(·) by using the possible values for ν_m, the set S₊ ∪ S₋, as arguments of (10). Finally, the global minimizer is the candidate that gives the minimum value as the argument of f_m(·).
To solve (22) and (23) for r, Cardano's formula for cubic equations or a general cubic root formula derived from Cardano's formula can be used (see Appendix A). Furthermore we note that the procedure to update the sensing vectors is independent of the sensing vector measurement model.
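The whole correction procedure fits in a short routine. The following is our own sketch of Proposition 1, not the authors' code: we use `numpy.roots` in place of Cardano's formula, and names such as `correct_sensing_vector` are ours.

```python
import numpy as np

def correct_sensing_vector(a, y_m, x, lam_y=1.0, lam_a=1.0):
    """Minimize f(a_hat) = lam_y*(|<a_hat,x>|^2 - y_m)^2 + lam_a*||a_hat - a||^2.

    Follows Proposition 1: solve two real cubics for the magnitude r of
    nu = <a_hat, x>, attach the phase of gamma (or its negation), lift each
    candidate nu back to a vector via (10), and keep the best candidate.
    """
    xn2 = np.linalg.norm(x) ** 2
    x_tilde = x / np.sqrt(xn2)                     # normalized x
    alpha = 2 * lam_y * xn2
    beta = lam_a - 2 * lam_y * y_m * xn2
    gamma = -lam_a * np.vdot(x, a)                 # gamma = -lam_a * x^* a
    kappa = np.angle(gamma)

    def f(a_hat):
        return (lam_y * (abs(np.vdot(x, a_hat)) ** 2 - y_m) ** 2
                + lam_a * np.linalg.norm(a_hat - a) ** 2)

    def lift(nu):
        # Eq. (10): keep the component of a orthogonal to x, set the
        # x-component from the scalar nu.
        perp = a - np.vdot(x_tilde, a) * x_tilde
        return perp + (nu / np.sqrt(xn2)) * x_tilde

    candidates = []
    for sign in (+1.0, -1.0):                      # cubics (22) and (23)
        for r in np.roots([alpha, 0.0, beta, sign * abs(gamma)]):
            if abs(r.imag) < 1e-9 and r.real > 0:
                nu = r.real * np.exp(1j * kappa) * (1.0 if sign > 0 else -1.0)
                candidates.append(lift(nu))
    candidates.append(lift(0.0))                   # guard against empty root sets
    return min(candidates, key=f)
```

The returned vector is the global minimizer among the stationary points, since every stationary point of f_m corresponds to a root of one of the two cubics.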

C. TLS phase retrieval algorithm
Now that we have a method for solving the inner minimizations in (TLS-PR1), we present the complete TLS phase retrieval method in Algorithm 1. We say that the algorithm has converged if the change in the value of J(x, â†_1, . . . , â†_M) in (TLS-PR2) between consecutive iterates is less than some threshold. In practice, all sensing vectors can be updated (lines 5–16) in parallel for a given x because the sensing vector updates are independent of each other.

D. ML estimator for EIV models
Proposition 2 below provides an interpretation of the regularization parameters in (TLS-PR1) by connecting them to the error level. It states that under certain assumptions the solution to (TLS-PR1) is the maximum likelihood (ML) estimator for a complex-valued EIV model in which the observed sensing vectors and measurements are noisy versions of underlying error-free quantities. This parallels the ML interpretation of linear TLS [25], [32]. Similarly, this result is a specific instance of what is seen for nonlinear TLS [6].

Proof: The proof follows a standard procedure and is provided in Appendix B.

III. TLS AND LS SOLUTION RECONSTRUCTION ERRORS
In this section we evaluate the reconstruction error of the TLS and LS phase retrieval solutions by deriving their Taylor expansions. Through these expressions we are able to gain insight into the behavior of the TLS solution relative to the LS solution and understand when each method performs well. We also use these expressions to understand how the reconstruction errors depend on the levels of the measurement and sensing vector errors when all the errors are Gaussian. Since this analysis is cumbersome, in this section we consider the real-valued phase retrieval problem, where the ground truth signal, the sensing vectors and the sensing vector errors in (1) are real. Simulations in Section IV show that the reasoning carries over to the complex problem. In our derivations we use theorems about differentiation of argmins and various matrix inversion lemmas.
We denote the ground truth signal as x# and the TLS and LS solutions as x†_TLS and x†_LS. If there are no errors in the sensing vectors or measurements, x# and −x# are both optimum LS and TLS solutions for (LS-PR) and (TLS-PR2) (with the mth corrected sensing vector being a_m). Due to this inherent sign ambiguity it is standard to define the reconstruction errors as

e_TLS = min_{σ∈{−1,1}} ‖x†_TLS − σx#‖₂  and  e_LS = min_{σ∈{−1,1}} ‖x†_LS − σx#‖₂.

Our results are unchanged whether the analysis is done with the optimum solution x# (σ = 1) or with the optimum solution −x# (σ = −1). Consequently, we choose the optimum solution x# with σ = 1 in the following analysis.

A. Reconstruction error analysis
The erroneous sensing vectors and measurements in (1) can be expressed as perturbed versions of error-free sensing vectors and measurements, {ã_m}_{m=1}^M and {ỹ_m}_{m=1}^M. We denote the sensing vector and measurement error perturbations as {δ_m}_{m=1}^M and {η_m}_{m=1}^M. Stacking these into vectors, we define

t̃ = [ã_1ᵀ, . . . , ã_Mᵀ, ỹ_1, . . . , ỹ_M]ᵀ,  γ = [δ_1ᵀ, . . . , δ_Mᵀ, η_1, . . . , η_M]ᵀ,  t = t̃ + γ.

In order to calculate the reconstruction errors we need access to expressions for x†_TLS and x†_LS. We begin by noting that the solutions are functions of the sensing vectors and measurements, x†_TLS(t) and x†_LS(t). If there are no errors in the sensing vectors or measurements, an optimum LS solution to (LS-PR) is x†_LS(t̃) = x#. Similarly, an optimum TLS solution to (TLS-PR2) is x†_TLS(t̃) = x#, with the mth corrected sensing vector being ã_m (no correction needed). If we instead have sensing vector and measurement errors, then t = t̃ + γ with γ ≠ 0. Assuming γ is small, we can study the first-order terms in the Taylor series expansions of x†_TLS(t) and x†_LS(t) to measure the perturbation from x#.
The Taylor series expansion of x†_TLS(t) = x†_TLS(t̃ + γ) around the no-error point, t̃, is

x†_TLS(t̃ + γ) = x# + ∇_t x†_TLS(t̃)ᵀ γ + O(‖γ‖₂²),

where O(‖γ‖₂²) represents terms with norm of order ‖γ‖₂². The Taylor series expansion for x†_LS(t) can be written similarly. Using these expansions, to first order when γ is small, the reconstruction errors for the TLS and LS problems are

e_TLS = ‖∇_t x†_TLS(t̃)ᵀ γ‖₂  and  e_LS = ‖∇_t x†_LS(t̃)ᵀ γ‖₂.

To evaluate e_TLS and e_LS we must calculate the derivatives ∇_t x†_TLS(t̃) and ∇_t x†_LS(t̃), which are the derivatives of the argmins of (TLS-PR2) and (LS-PR). We use the method by Gould et al. to take derivatives of argmin problems [33].
With the substitution e_m = â_m − a_m and multiplicative constants absorbed into λ_a and λ_y, the TLS optimization problem (TLS-PR2) can be rewritten as

x†_TLS(t) = arg min_x min_{â_1,...,â_M} f(q, t),   (32)

where q stacks the corrected sensing vectors and x, and f(q, t) denotes the resulting objective. Denoting the minimizer by g(t), the derivatives of g(t) with respect to the kth sensing vector and measurement can be computed after specific second derivatives of f(q, t) are computed [33],

∇_{a_k} g(t) = −f_{qq}(g(t), t)⁻¹ f_{q a_k}(g(t), t),   (35)
∇_{y_k} g(t) = −f_{qq}(g(t), t)⁻¹ f_{q y_k}(g(t), t).   (36)

We can then obtain ∇_t x†_TLS(t) by stacking the derivatives (35) and (36) over k and taking the last N rows. Appendix C-A contains the derivations for the last N rows of (35) and (36).
The same approach can be used for the LS problem by considering its optimization problem,

x†_LS(t) = arg min_x (1/M) Σ_{m=1}^M (y_m − |⟨a_m, x⟩|²)².   (38)

The corresponding derivative derivations are in Appendix C-B.
Proposition 3 below states the expressions for e_TLS and e_LS. The error-free quantities define a diagonal matrix, D, and a vector, w, which appear in these expressions.

Proposition 3. To first order, the reconstruction errors for the solution x†_TLS to the TLS optimization problem (32) and for the solution x†_LS to the LS optimization problem (38) are expressed in terms of D and w.

Proof: Lemma 1 in Appendix C states the Taylor series expansions around the no-error point, t̃, for the TLS and LS solutions. The result in this proposition follows by considering only the zeroth and first-order terms.
As expected, when γ → 0 the errors E_A and E_Y tend to zero, which makes the vector w zero, and the reconstruction errors are zero. The difference between the TLS and LS reconstruction errors in Proposition 3 is due to the diagonal matrix D. Furthermore, if M = N and A is invertible, we can again have e_TLS = e_LS. However, M = N is not a practical setting for the real-valued phase retrieval problem because the map from x# to [⟨a_1, x#⟩², . . . , ⟨a_M, x#⟩²]ᵀ is not injective, even after accounting for the sign ambiguity [34]. The same holds for the complex-valued phase retrieval problem, even after accounting for the global phase shift [35]. Therefore, we can expect to require more measurements to obtain a unique solution to the phase retrieval problem with the TLS framework.
The reconstruction errors in Proposition 3 can be further interpreted by assuming a distribution for the measurement and sensing vectors errors; in Proposition 4 we assume that the nonzero entries of E Y and E A are iid zero-mean Gaussian (with different variances for E Y and E A ).

Proposition 4.
With the setting of Proposition 3, assume that the diagonal elements of the diagonal matrix E_Y are iid zero-mean Gaussian with variance σ²_η, and that the rows of E_A are independent zero-mean Gaussian random vectors with covariance σ²_δ I. If E_Y and E_A are independent of each other, the expected squared first-order reconstruction errors are given by (45) and (46).

Proof: The expectations are computed in Appendix D.

Just like in Proposition 3, the difference between the TLS and LS expressions in Proposition 4 is due to the diagonal matrix D. Each expression is a sum of two terms: the first term shows how the expectations depend on σ²_δ, and the second term shows how they depend on σ²_η.

B. Reconstruction error numerical experiments
The expressions in Proposition 3 provide a means to understand when each approach should perform well. Furthermore, their expectations in Proposition 4 allow us to verify the optimal maximum likelihood parameters stated in Proposition 2.
a) Impact of varying error strength and number of measurements: We compare TLS and LS by numerically evaluating (45) and (46) with different measurement and sensing vector error levels while varying the number of measurements.
These experiments only consider the first-order error; the actual error is computed in a variety of experiments in Section IV. We use SNR to quantify the measurement and sensing vector error levels. The measurement SNR is −20 log₁₀(‖E_Y‖_F/‖Y‖_F) and, similarly, the sensing vector SNR is −20 log₁₀(‖E_A‖_F/‖A‖_F). Furthermore, we define the relative reconstruction errors rel. e_TLS = e_TLS/‖x#‖₂ and rel. e_LS = e_LS/‖x#‖₂. We plot the relative reconstruction errors as the oversampling ratio M/N is varied with N = 100. Regularization parameters λ_y and λ_a are set to one. For each value of M/N we run 100 trials, and each trial uses new sensing vectors, ground truth signals and errors. The standard deviation over the trials is indicated by error bars in the upcoming plots. The sensing vectors and ground truth signal are iid standard real Gaussian. Furthermore, the measurement and sensing vector errors are iid zero-mean real Gaussian with variances such that the sensing vector SNR is 40 dB. In Fig. 2a the measurement SNR is 65 dB and TLS has lower reconstruction error than LS. When the measurement SNR decreases to 40 dB in Fig. 2b, LS outperforms TLS. Although these experiments use the first-order error, they are consistent with our intuition: the relative performance of TLS is better when most of the error is due to sensing vector error. We also see that the performance of both methods improves as the number of measurements increases. Lastly, from Fig. 2b, TLS may improve relatively faster than LS as the number of measurements increases.

b) Verification of optimal ML parameters: The expression for TLS (47) in Proposition 4 enables us to verify the optimal maximum likelihood parameters λ_y and λ_a from Proposition 2. Although Proposition 2 is stated for the complex-valued phase retrieval problem, the same procedure shows that the optimal parameters are identical for the real-valued phase retrieval problem considered here.
The theoretically optimal parameter ratio λ_y/λ_a follows from Proposition 2. To verify numerically that this agrees with Proposition 4, we vary λ_y/λ_a (which is contained in D) around the optimal ratio and plot the TLS expression (47). We do this multiple times, and in each run use a different iid standard real Gaussian ground truth signal and a different set of iid standard real Gaussian sensing vectors. As in Proposition 4, the errors in each run are iid zero-mean Gaussian and their variances are set to obtain different SNRs. Fig. 3 shows the different runs with the minima marked in red and the theoretically optimal ratio indicated by the dashed black lines. All the minima are at the optimal ratio, which verifies Proposition 2. In the top row, most of the error is due to measurement error and the optimal ratio is low. This further highlights that TLS sensing vector corrections are less important when most of the error is due to measurement error.
We further note that the consistency between Propositions 2 and 4 demonstrates that the first-order expressions can be used to explain the performance of our TLS framework. Section IV shows that the real reconstruction errors follow the same trends as the numerical simulations in this section.

IV. TLS PHASE RETRIEVAL SIMULATIONS
We compare the performance of TLS phase retrieval against LS phase retrieval through simulations. To obtain an LS solution we use the Wirtinger flow method [3].
In this section we set the regularization parameters of (TLS-PR2) to λ_a = 1/N and λ_y = 1/‖x^(0)‖₂⁴. In all experiments we generate M quadratic measurements using M clean sensing vectors. The TLS and LS methods must then recover the signal x# from erroneous measurements and sensing vectors. We use SNR, as defined in Section III, to quantify the measurement and sensing vector error. Also as in Section III, the plots in this section indicate the standard deviation over trials using error bars.

A. Measurement models
In our experiments we consider the complex-valued Gaussian and coded diffraction pattern measurement models; Algorithm 1, however, is not restricted to these models. The Gaussian model arises, for example, in optical computing applications, where random Gaussian scattering media have recently been used to perform rapid high-dimensional randomized linear algebra, kernel classification and dimensionality reduction using laser light [36], [37]. The coded diffraction pattern model modulates the signal with different patterns before taking the Fourier transform. It is inspired by the fact that in coherent x-ray imaging the field at the detector is the Fourier transform of the signal [9].
When using the Gaussian measurement model, the nth entry of the mth sensing vector, a_mn, follows the complex normal distribution for the complex-valued problem, a_mn ∼ N(0, 1) + jN(0, 1), and the standard normal distribution, a_mn ∼ N(0, 1), for the real-valued problem. The sensing vector entries are independent of each other, and the sensing vectors are mutually independent as well. A description of the coded diffraction pattern measurement model is in Appendix G.
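A minimal helper for generating this measurement model and the corresponding quadratic measurements (our naming; the conventions are stated in the docstrings):

```python
import numpy as np

def gaussian_sensing_vectors(M, N, complex_valued=True, rng=None):
    """Draw M iid Gaussian sensing vectors as the rows of an (M, N) array.

    Complex case: entries are N(0,1) + jN(0,1); real case: N(0,1).
    Entries, and hence sensing vectors, are mutually independent.
    """
    rng = np.random.default_rng() if rng is None else rng
    if complex_valued:
        return rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
    return rng.standard_normal((M, N))

def quadratic_measurements(A, x):
    """Error-free measurements y_m = |<a_m, x>|^2 for the rows a_m of A."""
    return np.abs(A.conj() @ x) ** 2
```

For this variance-2-per-entry complex model, E[y_m] = 2‖x‖₂², which is also what the norm estimate in the spectral initialization relies on.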
In this section of the main paper, the complex Gaussian measurement model is used. In Appendix G-A these experiments are repeated for the coded diffraction pattern measurement model and the same behavior is seen.

B. Algorithm initialization
In our experiments we opt to do the initialization of the signal being recovered (line 1 of Algorithm 1) via a spectral initialization. This method, comprising a spectral initialization followed by gradient descent updates, has been proven to lead to globally optimal solutions for the LS phase retrieval problem (LS-PR) in an error-free setting under the Gaussian and coded diffraction pattern models [12], [3].
The spectral initialization is the leading eigenvector of the matrix (1/M) Σ_{m=1}^{M} y_m a_m a_m^* ∈ C^{N×N}, which we efficiently compute using 50 power method iterations. This eigenvector is scaled appropriately by estimating the norm of the signal of interest as ((1/(2M)) Σ_{m=1}^{M} y_m)^{1/2}.
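The spectral initialization can be sketched as follows, applying D = (1/M) Σ_m y_m a_m a_m^* to a vector without ever forming the N × N matrix. This is a sketch under the complex Gaussian model; the paper's implementation may differ in details.

```python
import numpy as np

def spectral_init(A, y, n_iter=50, seed=0):
    """Leading eigenvector of D = (1/M) sum_m y_m a_m a_m^*, via power
    iterations, scaled by the norm estimate (sum(y)/(2M))^(1/2)
    (complex Gaussian model). Rows of A are the sensing vectors."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    v = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # D v = (1/M) sum_m y_m a_m (a_m^* v), computed matrix-free
        v = A.T @ (y * (A.conj() @ v)) / M
        v /= np.linalg.norm(v)
    return v * np.sqrt(y.sum() / (2 * M))
```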

C. Signal recovery
To evaluate performance we compute the distance between the ground truth signal and the recovered signal. As the value of the objective function (TLS-PR1) is the same for x and the phase-shifted e^{jϕ}x, we cannot distinguish between x and its phase-shifted variants. We therefore use a standard definition of distance that is invariant to phase shifts, detailed in Definition 1.

Definition 1. Denote the ground truth as x# ∈ C^N and let x† ∈ C^N be a solution to the phase retrieval problem. The relative distance between x# and x† is defined as rel.dist(x#, x†) = min_{ϕ∈[0,2π)} ‖x# − e^{jϕ}x†‖₂ / ‖x#‖₂, and the reconstruction SNR in dB is defined as −20 log₁₀(rel.dist(x#, x†)).

a) Combinations of sensing vector and measurement error: To understand how performance changes with different amounts of sensing vector and measurement error, we add different amounts of random iid complex Gaussian error to the sensing vectors and random iid real Gaussian error to the measurements. For each combination of sensing vector error and measurement error we perform 100 phase retrieval trials. In each trial we generate a new ground truth signal and M new sensing vectors to produce M new error-free measurements, and then add new random error perturbations to the sensing vectors and measurements. We evaluate performance by subtracting the relative distance of the TLS solution from that of the LS solution, rel.dist(x#, x†_LS) − rel.dist(x#, x†_TLS), and averaging across all 100 trials. If this average is positive, TLS has outperformed LS.
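The phase-invariant relative distance of Definition 1 has a closed form: the minimizing phase is the phase of ⟨x†, x#⟩. A minimal sketch:

```python
import numpy as np

def rel_dist(x_true, x_hat):
    """min over phi of ||x_true - e^{j phi} x_hat||_2 / ||x_true||_2.
    The optimal phi is the phase of <x_hat, x_true>."""
    phi = np.angle(np.vdot(x_hat, x_true))
    return np.linalg.norm(x_true - np.exp(1j * phi) * x_hat) / np.linalg.norm(x_true)
```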
We use a step size of µ = 0.5/λ_a for TLS and µ = 0.02 for LS to perform the gradient update for x in (7). The TLS step size is inversely proportional to λ_a because the relative importance of the data consistency term is inversely proportional to that of the sensing vector consistency term in (TLS-PR2). Fig. 4 shows that the performance of TLS relative to LS improves as the sensing vector SNR decreases for a fixed measurement SNR, because TLS phase retrieval accounts for sensing vector error. However, this trend starts to break for very low sensing vector SNR, as shown at 5 dB when M/N = 16. Increasing the number of measurements overcomes this issue and in general improves TLS performance, as was indicated by the first-order reconstruction errors with Gaussian error in Figs. 2a and 2b.

b) Impact of varying the number of measurements: To clearly see the impact of varying the number of measurements we fix the measurement SNR to 20 dB and the sensing vector SNR to 10 dB and plot the reconstruction relative distance for TLS and LS in Fig. 5a. We do 100 trials for each value of M/N. TLS performance improves relative to LS as the number of measurements is increased. In Fig. 5b we increase the sensing vector SNR to 30 dB. When the balance of the Gaussian error shifts more towards the measurements, LS performs better. This matches what was seen with the first-order reconstruction errors in Figs. 2a and 2b.

c) Accuracy of first-order reconstruction errors: Appendix E describes an experiment where we verify the accuracy of the reconstruction error expressions in Proposition 3 against the real errors. As expected, it shows that the first-order expressions in Proposition 3 increase in accuracy as the sensing vector and measurement errors decrease (this corresponds to the norm of γ in (27) decreasing), but that these expressions may only serve as a rough rule of thumb when errors are large.

d) Sensing vector and measurement error model: The simulations in this section use iid random Gaussian errors.
In Appendix F we design errors that require access to the ground truth signal norm and to the error-free measurements. We show that the improvement of TLS over LS can be larger in this artificial scenario.

D. Corrected sensing vector verification
To characterize the sensing vector corrections performed by our algorithm, we define a metric sensitive to the relative correction error of the sensing vectors. The metric only considers corrections in the direction of the recovered signal because our algorithm only corrects the component of the sensing vectors in the direction of this signal due to the optimization geometry (Section II-A).

Definition 2. Denote the complex-valued ground truth signal and sensing vectors as x# and a# := {a#_m}_{m=1}^{M}. Let their counterparts obtained by solving the TLS phase retrieval problem be x† and a† := {a†_m}_{m=1}^{M}. Then, denoting y(a, x) = [⟨a_1, x⟩, …, ⟨a_M, x⟩], the relative sensing vector correction error between a# and a† is defined as rel.corr({a#, x#}, {a†, x†}) = ‖y(a†, x†) − y(a#, x#)‖₂ / ‖y(a#, x#)‖₂.
To evaluate performance, we denote the ground truth and TLS corrected sensing vectors as {a#_m}_{m=1}^{M} and {a†_m}_{m=1}^{M}. For the sensing vectors in the previous experiments of Fig. 4 we compute rel.corr({a#, x#}, {a†, x†_TLS}) and average across the 100 trials. Fig. 6 shows the relative correction error when M/N ∈ {16, 32}. We see that as the sensing vector SNR increases, the relative correction error decreases. Furthermore, as the measurement SNR decreases, the relative correction error increases. These relative correction error increases are more pronounced when the sensing vector SNR is high, a setting where sensing vector correction is needed less. This observation is consistent with Fig. 4: when sensing vector SNR is high, the TLS sensing vector corrections hinder TLS performance and LS outperforms TLS.
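A sketch of the relative correction error, assuming it is the relative ℓ₂ difference between the projection vectors y(a†, x†) and y(a#, x#) (the function names are illustrative):

```python
import numpy as np

def proj(A, x):
    """y(a, x) = [<a_1, x>, ..., <a_M, x>] with rows of A as the a_m."""
    return A.conj() @ x

def rel_corr(A_true, x_true, A_corr, x_corr):
    """Relative sensing vector correction error in the projection domain."""
    y_true = proj(A_true, x_true)
    y_corr = proj(A_corr, x_corr)
    return np.linalg.norm(y_corr - y_true) / np.linalg.norm(y_true)
```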
Next we investigate how the number of measurements impacts the relative correction error. We do this with the sensing vectors from the previous experiments in Figs. 5a and 5b. Fig. 7 shows the averages over the 100 trials. Here the measurement SNR was fixed to 20 dB and the sensing vector SNR was 10 dB or 30 dB. Consistent with what was seen previously, the performance of TLS improves with increasing number of measurements. Additionally, increasing the number of measurements provides greater gains when the sensing vector SNR is lower.

V. EXPERIMENTS ON REAL OPTICAL HARDWARE
In this section we show that TLS phase retrieval outperforms LS phase retrieval when using real optical hardware. We use an Optical Processing Unit (OPU) which enables rapid random high-dimensional matrix-vector multiplication; a publicly available cloud OPU with a scikit-learn interface can be accessed at https://www.lighton.ai/lighton-cloud/. A known signal x# ∈ R^N is encoded onto coherent laser light using a digital micro-mirror device (DMD) and then shined through a Gaussian multiple scattering medium as shown in Fig. 8. We denote the transmission matrix of the Gaussian medium as A ∈ C^{M×N}. The M rows of the transmission matrix are the sensing vectors, a_m ∈ C^N for 1 ≤ m ≤ M. The intensity of the scattered light in the sensor plane, y_m ≈ |⟨a_m, x#⟩|² for all m, is then measured using a camera. We do phase retrieval using the optical measurements to reconstruct the input x#. The input signals are limited to real-valued binary images due to the DMD. The sensing vectors are obtained via a calibration procedure and therefore contain errors, as do the camera measurements [2]. There may also be other sources of experimental error. Unlike in the computer simulations of Section IV, when using the OPU we do not know the exact error model, nor the levels of the errors in the measurements and sensing vectors.
In the experiments, the TLS and LS step sizes are tuned to 0.4/λ_a and 0.005, respectively. The initialization method and termination criteria are the same as in Section IV. Additionally, we use the fact that the images being reconstructed are real-valued and binary to regularize both the TLS and LS methods. After the initialization (Algorithm 1, Step 1 for TLS) and after each x update step (Algorithm 1, Step 17 for TLS) we take the elementwise absolute value of the signal to set the phase of all elements to zero. We then set the entries of x†_TLS and x†_LS with absolute value larger than one to one.
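The real-valued binary regularization step described above amounts to a simple elementwise projection; a minimal sketch:

```python
import numpy as np

def project_real_binary_range(x):
    """Set all phases to zero via the elementwise absolute value, then
    clip magnitudes above one down to one."""
    return np.minimum(np.abs(x), 1.0)
```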
Appendix H contains details of the sensing vector calibration method used and further OPU experimental details.

a) Random ground truth signals: Our ground truth signals are real-valued random binary images of size N = 16 × 16 = 256. We vary the oversampling ratio M/N and perform 100 trials for each ratio. In each trial a new ground truth image and a new set of calibrated sensing vectors are used. On a held-out set of ten images with M/N = 8 we tune λ_a = 40 and λ_y = ‖x^(0)‖₂^{-4} in (TLS-PR2), where x^(0) is the initialization. Fig. 9 shows that the SNR of the images reconstructed using TLS is higher than when using LS for all numbers of measurements. Additionally, in Fig. 10 we plot the standard deviation of the results in Fig. 9 and show that the TLS method has lower variability. Fig. 11 shows the original images, their reconstructions and their reconstruction SNRs. For a given oversampling ratio, the TLS approach reports better SNR values and reconstructs images of better visual quality than the LS approach.

VI. CONCLUSION
We have developed a TLS framework for solving the phase retrieval problem that accounts for both sensing vector error and measurement error. One of the keys to solving the TLS problem via gradient descent was studying the geometry of the TLS optimization problem to realize that the sensing vectors can be efficiently updated by solving a scalar optimization problem instead of a vector one. By deriving the Taylor series expansions of the TLS and LS solutions we have also obtained approximate expressions for their reconstruction errors. These expressions enabled us to anticipate the accuracy of the TLS solution relative to the LS solution and to understand when each approach will lead to a better solution. We verified the TLS method through a range of computer simulations. Furthermore, in experiments with real optical hardware, TLS outperforms LS.
Presently we correct the sensing vectors based on the one signal that we wish to recover. An interesting future line of work lies in multi-signal TLS for sensing vector denoising so that subsequent signals can be recovered without requiring sensing vector corrections if their measurements use the same set of sensing vectors. There are multiple areas where this is required. An upcoming application is in optical neural network backpropagation where unknown random sensing vectors are the rows of weight matrices of fully-connected layers. Denoised and more accurate rows will enable better machine learning performance.
There exist other applications of phase retrieval with uncertain sensing vectors. Ptychography, which can be modeled analogously to (1), is a prime example [38]. Ptychography has recently been addressed by least squares, comprising spectral initialization followed by gradient descent [39]. It would be interesting to see whether our TLS phase retrieval algorithm brings about improvements. Other ptychography methods use alternating updates to recover the object and the sensing vectors [40]. Our geometric intuition may help reduce the number of unknowns in sensing vector updates and thus improve the overall computational efficiency of these algorithms.

APPENDIX A ROOTS OF CUBIC EQUATIONS
Consider finding the roots of the cubic equation ax³ + bx² + cx + d = 0. Denote Δ₀ = b² − 3ac, Δ₁ = 2b³ − 9abc + 27a²d, and C = ((Δ₁ ± √(Δ₁² − 4Δ₀³)) / 2)^{1/3}. Then for k ∈ {0, 1, 2} the three roots are x_k = −(1/(3a))(b + θ^k C + Δ₀/(θ^k C)), where θ is a cube root of unity, θ = (−1 + √(−3))/2. Note that the cubic equations in this paper have b = 0, which simplifies the above expressions.
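The closed-form roots can be sketched numerically as follows; the sign of the square root is chosen so that C ≠ 0 whenever the cubic is non-degenerate (an implementation detail left implicit above):

```python
import cmath

def cubic_roots(a, b, c, d):
    """Roots of a x^3 + b x^2 + c x + d = 0 via Cardano's formula.
    theta is a primitive cube root of unity."""
    theta = (-1 + cmath.sqrt(-3)) / 2
    d0 = b * b - 3 * a * c
    d1 = 2 * b**3 - 9 * a * b * c + 27 * a * a * d
    s = cmath.sqrt(d1 * d1 - 4 * d0**3)
    C = ((d1 + s) / 2) ** (1 / 3)
    if abs(C) == 0:               # pick the other sign to avoid C = 0
        C = ((d1 - s) / 2) ** (1 / 3)
    if abs(C) == 0:               # d0 = d1 = 0: triple root
        return [-b / (3 * a)] * 3
    return [-(b + theta**k * C + d0 / (theta**k * C)) / (3 * a) for k in range(3)]
```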
Using the assumptions on the error distributions, .
SUPPLEMENTARY MATERIAL

APPENDIX C TAYLOR SERIES EXPANSIONS

Lemma 1 gives the Taylor series expansions around the no-error point t̄ for the TLS and LS solutions. The notation defined in Section III is used.

Lemma 1. The Taylor series expansions for the solution x†_TLS to the TLS optimization problem (32) and the solution x†_LS to the LS optimization problem (38) at the no-error point t = t̄ with perturbation γ.

Proof: To compute the Taylor series expansions we require x†_TLS(t̄) and ∇_t x†_TLS(t)|_{t=t̄} γ for the TLS problem (29), and the corresponding terms for the LS problem, x†_LS(t̄) and ∇_t x†_LS(t)|_{t=t̄} γ. In the no-error setting, when γ = 0, x†_TLS(t̄) = x#. This is because with no error the minimum objective function (32) value of zero is achievable with e_m = 0 for all m and x†_TLS(t̄) = x#. Similarly for the LS problem, when γ = 0, the LS solution x†_LS(t̄) is also x#, as this achieves the minimum LS objective function (38) value of zero.
The full derivations of the derivatives ∇_t x†_TLS(t) and ∇_t x†_LS(t) using (37) are contained in Appendices C-A and C-B. Appendices C-C and C-D then evaluate these derivatives at t̄ and multiply them by γ. We again use the fact that at t̄ the solutions are x# and that the TLS sensing vector corrections e_m are zero for all m.

A. Gradients for TLS problem
For convenience, we restate the optimization problem (32). The first derivative of the objective function with respect to q, the second derivative with respect to q, the second derivative with respect to y_k, and the second derivative with respect to a_k are given in (73)–(76). We require the inverse of the second derivative (74) in our calculations (35) and (36). We can use blockwise matrix inversion to invert the block matrix (74), where Q_S = T − C^T B^{-1} C is the Schur complement of B. Furthermore, because B is a block diagonal matrix, B^{-1} is also block diagonal, with each block being the inverse of its counterpart block in B. Each block in B has the same structure, and due to this structure the Sherman–Morrison formula can be used to invert each block; in addition, B^{-T} = B^{-1}.
As we wish to understand the sensitivity of x†_TLS (34), we only require the final N rows of the inverse of (74); more precisely, we will only require the submatrix (81). To calculate (81) we require Q_CB, a block matrix with M matrices horizontally stacked, whose mth block is (82). To obtain Q_S in (81) we can use (82) and the block matrix structure of C to calculate C^T B^{-1} C = Q_CB C. The Schur complement of B then follows in (84). Using (81) with (84) and (82) we can compute the last N rows of (36), d/dy_k x†_TLS(t) ∈ R^N, arriving at (86). Similarly, using (81) with (84) and (82) we can compute the last N rows of (35), arriving at (88).

B. Gradients for LS problem
We restate the optimization problem (38) x † LS (t) = arg min We denote similar quantities to those in the TLS derivation. The main differences are that there are no e m , λ y and λ a in the LS approach, To derive d dy k x † LS (t) ∈ R N and ∇ a k x † LS (t) ∈ R N ×N for LS, the expressions that were derived for TLS can be used. Set {e m } M m=1 = 0, λ y = λ a = 1 in the TLS expressions (74), (75) and (76). Then take the bottom right N × N block of (74) and the bottom N rows of (75) and (76) to get for LS

C. Derivation of TLS solution Taylor series expansion
To calculate ∇_t x†_TLS(t)|_{t=t̄} we need the last N rows of ∇_{a_k} g(t)|_{t=t̄} and d/dy_k g(t)|_{t=t̄} for 1 ≤ k ≤ M, as in (37). With t = t̄, {ē_m}_{m=1}^{M} = 0 and x = x#, the quantities defined when deriving the gradients in Appendix C-A become ā_m = a_m + ē_m = a_m ∈ R^N (100). Using these quantities with (86) and (88) in Appendix C-A gives the stated expansion.

D. Derivation of LS solution Taylor series expansion
Following the same procedure as in Appendix C-C and using (98) and (99) in Appendix C-B, The result in Proposition 4 then follows by substituting the TLS and LS values for R. We also use the fact that the

APPENDIX E EXPERIMENTAL VERIFICATION OF PROPOSITION 3
We verify the derived reconstruction errors in Proposition 3. As the results are for a first-order approximation, we expect their accuracy to reduce as γ increases. To vary γ we vary the sensing vector SNR and pin the measurement SNR to be twice the sensing vector SNR. For each SNR combination we perform 100 trials. In each trial we generate new real-valued ground truth signals, Gaussian sensing vectors and Gaussian errors for sensing vectors and measurements. The ground truth signals are iid standard real Gaussian with N = 100 and M N = 8.
We plot the average of the absolute difference between the relative distance of the solution of Algorithm 1 and the first-order relative reconstruction error, |rel.dist(x#, x†_LS) − rel.e_LS| for LS and |rel.dist(x#, x†_TLS) − rel.e_TLS| for TLS, in Fig. 12. The step sizes are 1.0/λ_a for TLS and 0.05 for LS. We set λ_a = 1/N and λ_y = 1/‖x^(0)‖₂⁴. As expected, the first-order approximations are accurate for high SNR and decrease in accuracy with decreasing SNR. The high accuracy for moderate to high SNR values also confirms that Algorithm 1 can optimize (TLS-PR2).

APPENDIX F HANDCRAFTED ERRORS
Using the results and notation of Section III, we show that there exist error models which can significantly change the relative performance of TLS and LS. With scalars k a , k y ∈ R + to control the SNR of E A and E Y , we create errors With these errors, the expressions from Proposition 3 become  x # 2 2 y m > 1, and we investigate how this alters performance when the sensing vectors follow the iid standard real Gaussian measurement model. Appendix G-A contains experiments using the coded diffraction pattern model.

A. First-order reconstruction error numerical experiments
In the next set of experiments the first-order reconstruction error is free of the regularization parameters. Matrix E_A and diagonal matrix E_Y are iid zero-mean Gaussian. We repeat the experiment of Fig. 2b using these handcrafted errors in Fig. 13. All other experimental details are unchanged. We see that TLS now outperforms LS with this error model. Next we fix M/N = 8 and the sensing vector SNR to 100 dB so that there is virtually no sensing vector error. We vary the measurement SNR over 100 trials and use the handcrafted errors. The sensing vectors and ground truth signals are generated in the same way as in the numerical experiments of Section III. In Fig. 14 we plot the average first-order relative reconstruction error and see that in this setting TLS outperforms LS. This occurs despite there being only measurement error, a setting where we may expect LS to outperform TLS.
The results in Figs. 13 and 14 show that the type of measurement and sensing vector error can impact performance.

B. Simulations with actual reconstruction error
To investigate the impact of this error model on the actual reconstruction error, we design handcrafted errors in the same manner as above and calculate the actual reconstruction error. Despite the earlier analysis using real-valued errors, we show that the ideas carry through when we use the complex Gaussian measurement model with E A being complex Gaussian.
In the first simulation we use a step size of 0.5/λ_a for TLS and 0.02 for LS and repeat the experiment of Fig. 5b with handcrafted errors instead. Fig. 15 shows that in this setting TLS outperforms LS, the opposite of what is seen with Gaussian errors in Fig. 5b. This is consistent with the first-order reconstruction error numerical experiment of Fig. 13.
Next we use M/N = 8 and a step size of 0.2/λ_a for TLS and 0.02 for LS, following the experiment of Fig. 14. Fig. 17 shows an identical experiment to that of Fig. 4. We see that the relative performance of TLS over LS improves with handcrafted errors compared to Fig. 4, where random Gaussian errors were used.

APPENDIX G CODED DIFFRACTION PATTERN MEASUREMENT MODEL

The coded diffraction pattern measurements are y_{(l−1)N+n} = |Σ_{k=1}^{N} x_k p̄_l(k) e^{−j2π(n−1)(k−1)/N}|², where 1 ≤ n ≤ N and 1 ≤ l ≤ L [41]. The modulation patterns p_l ∈ C^N follow the octanary pattern, meaning their entries are iid copies of p = q₁q₂. The random variable q₁ takes a value in {−1, 1, −j, j} with equal probability, and q₂ = √2/2 with probability 0.8 or q₂ = √3 with probability 0.2. Note that M can only be an integer multiple of N and depends on the number of patterns used.
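A minimal sketch of the octanary coded diffraction pattern measurements, assuming the common convention y = |F(p̄_l ⊙ x)|² (the exact conjugation and FFT scaling conventions here are assumptions):

```python
import numpy as np

def octanary_pattern(N, rng):
    """Octanary pattern p = q1 * q2: q1 uniform on {1, -1, j, -j};
    q2 = sqrt(2)/2 w.p. 0.8 and sqrt(3) w.p. 0.2."""
    q1 = rng.choice([1, -1, 1j, -1j], size=N)
    q2 = rng.choice([np.sqrt(2) / 2, np.sqrt(3)], size=N, p=[0.8, 0.2])
    return q1 * q2

def cdp_measurements(x, L, seed=0):
    """M = L*N intensity measurements: |FFT(conj(p_l) * x)|^2 per pattern."""
    rng = np.random.default_rng(seed)
    ys = []
    for _ in range(L):
        p = octanary_pattern(x.shape[0], rng)
        ys.append(np.abs(np.fft.fft(np.conj(p) * x)) ** 2)
    return np.concatenate(ys)
```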

A. Experiments
In this section we repeat the experiments that were done for the Gaussian measurement model in Section IV for the CDP measurement model. The experimental setup, such as the number of trials, type of ground truth signal, step sizes and iteration stopping criteria, is the same as for the equivalent simulations with the Gaussian model.

APPENDIX H OPU EXPERIMENT DETAILS

A. Sensing vector calibration
Sensing vector calibration is typically time consuming due to the quadratic nature of (1). We use a rapid numerical interferometry calibration procedure that first inputs K calibration signals, Ξ = [ξ₁, …, ξ_K] ∈ R^{N×K}, into the OPU and measures the corresponding intensities Q := |S|² ≈ |AΞ|² ∈ R^{M×K}, from which the phases of the optical field S are recovered. The transmission matrix A ∈ C^{M×N} is then recovered by solving the linear system S = AΞ [2]. If S has errors, the calibrated A may have errors. We implemented this method with 1.5N calibration signals and 20 anchor signals, and we use the same procedure as Gupta et al. to design the calibration signals [2].

B. Experiment details
When doing the experiments with random images on the OPU, we set the camera exposure time to 700 µs to utilize, but not saturate, the full 0–255 8-bit measurement range of the camera. The input display frame time is set to 1200 µs. For the experiments with the real images, the camera exposure is 400 µs and the frame time is 500 µs.
To use a new set of sensing vectors in each trial we calibrate a complex-valued transmission matrix with 2^17 rows. In each trial in Fig. 9 we then do phase retrieval by choosing a new set of M rows, and the optical measurements corresponding to the chosen M rows are used. As previously, the TLS and LS iterations in Algorithm 1 are stopped when the objective function value between successive iterates changes by less than 10^{-6}, and the initialization is done using 50 iterations of the power method. Because the output device exposure time controls the range of the measurements, the entries of the calibrated OPU transmission matrix are iid complex Gaussian and the calibrated Gaussian sensing vectors are scaled versions of the sensing vectors from the complex Gaussian measurement model. This does not impact the sensing vector updates in Section II-B because that procedure does not assume a measurement model. However, the initialization scaling and the signal gradient descent updates (7) for both TLS and LS require minor changes. Instead of altering these steps, we estimate the standard deviation and variance of the calibrated transmission matrix from its entries and divide the calibrated matrix by the estimated standard deviation. Correspondingly, we also divide the measurements by the estimated variance.
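The normalization described above can be sketched as follows, assuming the entrywise variance is estimated directly from the calibrated matrix:

```python
import numpy as np

def normalize_calibration(A_cal, y):
    """Divide the calibrated matrix by its estimated entrywise standard
    deviation and the intensity measurements by the estimated variance,
    so the entries match the unit-variance complex Gaussian model."""
    sigma2 = np.var(A_cal)        # variance of complex entries, E|a - mean|^2
    return A_cal / np.sqrt(sigma2), y / sigma2
```

Since y_m scales with the squared magnitude of the sensing vectors, dividing A by the standard deviation and y by the variance keeps the measurements consistent with the rescaled sensing vectors.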