1 Introduction

We highlight recent theoretical advances that have opened the door to a quantitative convergence analysis of well-known phase retrieval algorithms. As shown in Chap. 6, phase retrieval problems have a natural and easy characterization as feasibility problems, and issues like noise and model misspecification do not effect the abstract regularity of the problem formulation. This was also observed in studies by Bauschke et al. [1] and Marchesini [2] reviewing phase retrieval algorithms in the context of fixed point iterations, though in those works the theory only provided convex heuristics for understanding the most successful algorithms. A slow progression of the theory for nonconvex feasibility culminating in the work by Luke et al. in [3] now provides a firm theoretical basis for understanding most of the standard algorithms for phase retrieval.

The approach is fix-point theoretic and is based on a framework introduced by Luke et al. in [3]. Given some (set-valued) mapping \(T: \mathcal {E}\rightrightarrows \mathcal {E}\), where \(\mathcal {E}\) is a finite-dimensional Euclidean space, the algorithms are studied as mere generators of sequences \((x^k)_{k\in \mathbb {N}}\) through the fixed point iteration \(x^{k+1}\in T x^k\) \((\forall k \in \mathbb {N})\) with \(x^k\rightarrow x^*\) where \(x^*=Tx^*\). We demonstrate the convergence framework of [3] on a few of the more prevalent iterative phase retrieval algorithms introduced in Chap. 6.

The analysis is based on two main properties. The first of these is the regularity of the mapping defining the fixed point iteration; the second property concerns the stability of the fixed points of the mapping. The first property is covered by the notion of pointwise almost averagedness, a generalization of regularity concepts like (firm) nonexpansiveness. Already in the 1960s Opial [4] showed that an iterative sequence defined by an averaged self-mapping with nonempty fixed point set converges to a fixed point. It is no surprise, then, that generalizations of averagedness should play a central role in convergence for more general fixed point mappings. In the setting of feasibility problems, i.e. finding a point in the intersection of a collection of sets, pointwise almost averagedness of the fixed point mapping is inherited from the regularity of the sets.

The other concept that is central to the analysis concerns stability of the fixed points. This is the characterized by the notion of metric subregularity as presented in Dontchev and Rockafellar [5], and Ioffe [6, 7]. Metric subregularity of the mapping at fixed points guarantees quantitative estimates for the rate of convergence of the iterates. This is closely related to the existence of error bounds, and weak-sharp minima, among other equivalent notions that provide a path to a quantitative convergence analysis.

In Sect. 23.2 we remind the reader of the phase retrieval problem. Section 23.3 and its subsections introduce basic notations and concepts. This is followed by a toolkit for convergence in Sect. 23.4 that describes the convergence framework we are working with. The use of this theoretical toolkit is demonstrated on two of the most prevalent algorithms for phase retrieval. We conclude this chapter with some numerical remarks in Sect. 23.8.

2 Phase Retrieval as a Feasibility Problem

The phase retrieval problem reviewed in Chaps. 2 and 6 involves reconstructing a complex valued field in a plane (the object plane) from measurements of its amplitude under a unitary mapping in a plane somewhere downstream from the object plane (the image plane). We use the notation for the phase retrieval problem already introduced in previous chapters. For a detailed description see Sect. 6.1.1. The measurements are represented by the sets

$$\begin{aligned} M_{j} := \left\{ u \in \mathbb {C}^{n}~|~\left\| \left( \mathcal {F}u\right) _{i} \right\| = \sqrt{I}_{j , i}, \; (i = 1 , 2 , \ldots , n) \right\} \quad (j = 1 , 2 , \ldots , m). \end{aligned}$$
(23.1)

The problem of recovering the phase from just the modulus of unitary transformed measurements is impossible to solve uniquely. Usually nonuniqueness is associated with ill-posedness, but for feasibility problems it is rather existence that is the source of difficulty. In real-world problems measurement errors and model misspecification have profound implications for feasibility models, but not for the reasons that one might expect. The geometry of the individual measurement sets does not change in the presence of noise or model misspecification. The issue is that the measurements are not consistent with one another. In other words, there is no solution that satisfies the measurements and other model requirements (like nonnegativity, in the case of real objects). A solution from the provided information is then only an approximation to the actual signal. Mathematically these characteristics translate into an inconsistent feasibility problem. That is, the intersection of the sets in the feasibility model is empty. Inconsistency has been investigated in many works (see for instance [8,9,10,11]) but most of these studies consider convex sets. Unfortunately, the sets involved in the phase retrieval problem are mostly nonconvex and have empty intersecton. In [3] the authors provided a scheme to handle even this case. The following sections are devoted to their work and present the most important concepts.

To avoid ambiguities recovering the phase, one often uses a priori information about the model. Common examples are the knowledge of a support of the signal, real-valuedness, non-negativity, sparsity or the information about an amplitude:

$$\begin{aligned} \begin{array}{rrl} \text {support constraint}&{}\qquad \!{\mathfrak {S}}&{}:=\left\{ y \in {{\mathbb {C}}^{n}}\,|\,y_i =0 \;\; \forall i \notin I\right\} \\ \text {real-valued support constraint}&{}\qquad \! {\mathfrak {S}_r}&{}:=\left\{ y \in {{\mathbb {R}}^{n}}\,|\,y_i =0 \;\; \forall i \notin I\right\} \\ \text {non-negative support constraint}&{}\qquad \! {\mathfrak {S}_+}&{}:=\left\{ y \in \mathbb {R}^n_+\,|\,y_i =0 \;\; \forall i \notin I\right\} \\ \text {amplitude constraint}&{}\qquad \!{A}&{}:=\left\{ y \in {{\mathbb {C}}^{n}}\,|\,|y_k|=a_k,\; 1\le k \le n\right\} \\ \text {sparsity constraint}&{}\qquad \!{\mathcal {A}_s}&{}:=\left\{ y\in {{\mathbb {R}}^{n}}\,|\,\Vert x\Vert _0\le s\right\} \end{array} \end{aligned}$$
(23.2)

for a set of indices \(I\subset \left\{ 1,2,\dots ,n\right\} \), \(a\in \mathbb {R}^n_+\) and \(s\in \left\{ 1,2,\dots ,n\right\} \), where \(\mathbb {R}^n_+=\left\{ x\in {{\mathbb {R}}^{n}}\,|\,x_i\ge 0,~ 1\le i\le n\right\} \). In the following we focus on the (non-negative) support constraint.

3 Notation and Basic Concepts

Our setting throughout this chapter is a finite dimensional real Euclidean space \(\mathcal {E}\) equipped with inner product \(\left\langle \cdot ,\cdot \right\rangle \) and induced norm \(\Vert \cdot \Vert \). The open unit ball is denoted by \({\mathbb {B}}\), whereas \({\mathbb {S}}\) stands for the unit sphere in \(\mathcal {E}\). The open ball with radius \(\delta \) and center x is denoted by \({\mathbb {B}}_\delta (x)\). The iterative algorithms we analyze can be represented by mappings \(T: \mathcal {E}\rightrightarrows \mathcal {E}\), where \(\rightrightarrows \) indicates that T is a point-to-set mapping. \(\mathbb {N}\) denotes the natural numbers. The inverse mapping \(T^{-1}\) at a point y in the range of T is defined as the set of all points x such that \(y\in T(x)\).

3.1 Projectors

We follow in this section the definitions introduced in Chap. 6. As a reminder: the distance of a point x to a set \(\varOmega \subset \mathcal {E}\) is defined by \( \mathrm {dist}\left( x,\varOmega \right) := \inf _{y \in \varOmega }\left\{ \Vert y-x\Vert \right\} . \) The corresponding projector onto the set \(\varOmega \) is given by \(\mathcal {P}_\varOmega : \mathcal {E}\rightrightarrows \mathcal {E},~ x \mapsto \{y \in \varOmega | \) \(\text {dist} (x, \varOmega ) = || y - x || \}\). A single element of \(\mathcal {P}_\varOmega x \) is called a projection. Similarly to the projector, the reflector onto a set \(\varOmega \) is defined by \(\mathcal {R}_\varOmega : \mathcal {E}\rightrightarrows \mathcal {E},~ x \mapsto 2\mathcal {P}_C x -x,\) which is again a set. A single element in \(\mathcal {R}_\varOmega \) is called a reflection.

The regularity of a set influences the properties of the corresponding projector onto the set. The best properties are generated by convex sets. A convex set \(\varOmega \) is defined as a set that contains the line segment between any two points \(x,y\in \varOmega \). The projector onto a convex set is not only single-valued, but can be characterized by a variational inequality (see for instance [12, Theorem 3.14]). As we see in Sect. 23.3.2 the algorithms considered here are all composed of projectors and reflectors. This leads to an analysis of the projectors onto the sets introduced in Sect. 23.2. The projector onto the measurement sets \(\mathcal {M}_j\), defined in (23.1) was already discussed in Sect. 6.1.2. The projectors onto the support constraint sets are even simpler. The following statement is taken from [1, Example 3.14].

Lemma 23.1

(projectors onto support constraints) Let \(y \in {{\mathbb {C}}^{n}}\), \(I\subset \left\{ 1,2,\dots ,n\right\} \). Then the following hold.

$$\begin{aligned} \mathcal {P}_{\mathfrak {S}} y = z\quad \text {where}\;z_j&={\left\{ \begin{array}{ll}y_j &{} \text {if}\;j \in I\\ 0\qquad &{} \text {otherwise}\end{array}\right. }&\text {for }1\le i\le n, \end{aligned}$$
(23.3)
$$\begin{aligned} \mathcal {P}_{\mathfrak {S}_+} y =z\quad \text {where}\;z_j&={\left\{ \begin{array}{ll}\max \left\{ Re(y_j),0\right\} &{} \text {if}\;j \in I\\ 0\quad &{}\text {otherwise}\end{array}\right. }&\text {for }1\le i\le n. \end{aligned}$$
(23.4)

The projectors onto other constraint sets can be found, for instance, in [13] or [14] for a sparsity constraint, or in [1, Example 3.14] for an amplitude constraint or real-valued sparsity constraint. Except for the amplitude and sparsity constraint, all other mentioned constraint sets are closed and convex. The type of regularity of the constraint sets is later discussed in Remark 23.5.1.

Another concept closely related to that of projectors are normal cones.

Definition 23.2

(Normal cones) Let \(\varOmega \subseteq \mathcal {E}\). Define the cone containing \(\varOmega \) by

$$ \mathrm {cone}(\varOmega ):= \mathbb {R}_+\cdot \varOmega :=\left\{ \kappa s\,|\,\kappa \in \mathbb {R}_+, s\in \varOmega \right\} . $$

Let \(\varOmega \subseteq \mathcal {E}\) and let \({x}\in \varOmega \).

  1. (i)

    The proximal normal cone of \(\varOmega \) at \(\bar{x}\) is defined by

    $$ N^{\mathrm {prox}}_{\varOmega }({x}) = \mathrm {cone}\left( \mathcal {P}^{-1}_\varOmega {x}-{x}\right) .$$

    Equivalently, \({x}^*\in N^{\mathrm {prox}}_{\varOmega }({x})\) whenever there exists \(\sigma \ge 0\) such that

    $$\langle {x}^*,y-{x}\rangle \le \sigma \Vert y-{x}\Vert ^2 \quad (\forall y\in \varOmega ).$$
  2. (ii)

    The limiting (proximal) normal cone of \(\varOmega \) at x is defined by

    $$ {N}_{\varOmega }({x}) = \mathop {\mathrm{Lim\,sup}\,}_{z\rightarrow {x}}N^{\mathrm {prox}}_{\varOmega }(z), $$

    where the limit superior is taken in the sense of Painlevé-Kuratowski outer limit (for more details on the outer limit see for instance [15, Chap. 4]).

When \({x}\not \in \varOmega \) all normal cones at x are empty (by definition). If the set \(\varOmega \) is convex, the given definitions of the normal cones coincide (see for instance [16]).

3.2 Algorithms

In the context of feasibility problems, a prominent class of iterative algorithms are projection algorithms. Under these, the most prominent and probably one of the easiest to compute is the method of cyclic projections as introduced in Sect. 6.2.1. Given a finite number of closed sets \(\varOmega _1,\varOmega _2,\dots ,\varOmega _m\subseteq \mathcal {E}\) and a point it generates the next iterate by consecutively projecting onto each of the individual sets. For only two sets the algorithm reduces to the method of alternating projections. In Sect. 6.2.3 the error reduction algorithm was identified with the method of alternating projections applied to a measurement and a support constraint. This connection was first made by Levi and Stark in [17]. Considering again only two sets, Sect. 6.1.2 introduced the well-known Douglas-Rachford algorithm as well as its relaxed version, the relaxed averaged alternating reflection algorithm introduced by Luke in [10]. For one magnitude constraint and a support constraint Douglas-Rachford yields Fienup’s hybrid input output method (HIO) [18]. The connection of HIO and Douglas-Rachford was already observed by Bauschke et al. [1]. These three algorithms are the ones we want to focus on here. Nevertheless, we want to emphasize that the analysis shown below can be applied also to other projection methods.

Our survey is far from complete. Other approaches worthy of mention are several of the algorithms discussed in Chap. 5 and those in Chap. 6. Readers familiar with the physics literature will also miss the Hybrid Projection Reflection algorithm, [19], difference map, [20], solvent flipping algorithm, [21], and Fienup’s Basic Input-Output algorithm (BIO). BIO is, in fact, nothing more than Dykstra’s algorithm, see [1]. Like the BIO algorithm, most of the known approaches to phase retrieval fit into a concise scheme presented in [22].

3.3 Fixed Points and Regularities of Mappings

We refer to \(\mathsf {Fix}\,T\) as the set of fixed points of the mapping T, i.e. \(x\in \mathsf {Fix}\,T\) if and only if \(x \in Tx\). The continuity of set-valued mappings is a well-developed concept and follows the familiar patterns of continuity for single-valued functions. One key property is nonexpansiveness, which nothing more than being Lipschitz continuous with constant 1. That is, given two points, their images under the mapping T are no further away from each other than the initial points. A slightly stronger notion than nonexpansiveness is averagedness. For set-valued mappings, a finer distinction of the types of continuity, whether pointwise, or uniform, for example, is necessary. The following definition captures the crucial types of continuity and regularity of set-valued mappings that lie at the heart of numerical analysis of algorithms for phase retrieval.

Definition 23.3

(almost nonexpansive/averaged mappings)  Let \(D\subseteq \mathcal {E}\) and \(T: D\rightrightarrows \mathcal {E}\).

  1. (i)

    T is said to be pointwise almost nonexpansive on D at \(y \in D\) if there exists a constant \(\epsilon \in [0,1)\) such that

    $$\begin{aligned} \Vert x^+-y^+\Vert \le \sqrt{1+\epsilon }\Vert x-y\Vert \quad (\forall \ y^+\in Ty)(\forall x^+\in Tx) (\forall x \in D). \end{aligned}$$
    (23.5)

    If (23.5) holds with \(\epsilon =0\) then T is called pointwise nonexpansive at y on D.

    If T is pointwise (almost) nonexpansive at every point on a neighborhood of y (with the same violation constant \(\epsilon \)) on D, then T is said to be (almost) nonexpansive at y (with violation \(\epsilon \)) on D.

    If T is pointwise (almost) nonexpansive on D at every point \(y \in D\) (with the same violation constant \(\epsilon \)), then T is said to be pointwise (almost) nonexpansive on D (with violation \(\epsilon \)). If D is open and T is pointwise (almost) nonexpansive on D, then it is (almost) nonexpansive on D.

  2. (ii)

    T is called pointwise almost averaged on D at y if there is an averaging constant \(\alpha \in (0,1)\) and a violation constant \(\epsilon \in [0,1)\) such that the mapping \(\tilde{T}\) defined by \( T=(1-\alpha )\mathrm {Id}+\alpha \tilde{T} \) is pointwise almost nonexpansive at y with violation \(\epsilon /\alpha \) on D.

    Similarly, if \(\tilde{T}\) is (pointwise) (almost) nonexpansive on D (at y) (with violation \(\epsilon \)), then T is said to be (pointwise)(almost) averaged on D (at y) (with averaging constant \(\alpha \) and violation \(\alpha \epsilon \)).

    If the averaging constant \(\alpha =\frac{1}{2}\), then T is said to be (pointwise) (almost) firmly nonexpansive on D (with violation \(\epsilon \)) (at y).

From the above definition it can easily be seen that if a set-valued mapping is nonexpansive at a point, then it is single-valued there. This is a crucial property for our analytical framework, but should not be confused with uniqueness of fixed points: a multi-valued operator can be single-valued at its fixed points without having unique fixed points.

Proposition 23.4

(single-valuedness, Proposition 2.2 of [3]) Let \(T: \mathcal {E}\rightrightarrows \mathcal {E}\) be pointwise almost averaged on \(D\subset \mathcal {E}\) at \({\overline{x}}\in D\) with violation \(\epsilon \ge 0\). Then T is single-valued at \({\overline{x}}\). In particular, if \({\overline{x}}\in \mathsf {Fix}\,T\), then \(T{\overline{x}}=\left\{ {\overline{x}}\right\} \).

Averaged mappings do not enjoy as nice a calculus as nonexpansive mappings, but the next proposition shows that averagedness of some sort is preserved under addition and composition.

Proposition 23.5

(compositions, Proposition 2.4 of [3]) Let \( T_j: \mathcal {E}\rightrightarrows \mathcal {E}\) for \(j=1,2, \dots , m\) be pointwise almost averaged on \(U_j\) at all \(y_j \in S_j \subset \mathcal {E}\) with violation \(\epsilon _j\) and averaging constant \(\alpha _j\in (0,1)\) where \(U_j \supset S_j\) for \(j=1,2,\dots , m\).

  1. (i)

    If \(U := U_1=U_2=\cdots =U_m\) and \(S:= S_1 =S_2=\cdots =S_m\) then the weighted mapping \(T:= \sum _{j=1}^m w_jT_j\) with weights \(w_j\in [0,1], \ \sum _{j=1}^mw_j=1\), is pointwise almost averaged at all \(y \in S\) with violation \(\epsilon =\sum _{j=1}^m w_j\epsilon _j\) with averaging constant \(\alpha =\max _{j=1,2, \dots ,m}\left\{ \alpha _j\right\} \) on U.

  2. (ii)

    If \(T_jU_j\subseteq U_{j-1}\) and \(T_jS_j\subseteq S_{j-1}\) for \(j=2,3, \dots ,m\), then the composite mapping \(T:= T_1\circ T_2\circ \cdots \circ T_m\) is pointwise almost averaged at all \(y \in S_m\) on \(U_m\) with violation at most \( \epsilon =\prod _{j=1}^m\left( 1+\epsilon _j\right) -1. \) and averaging constant at least \( \alpha =m/\left( m-1+\frac{1}{\max _{j=1,2,\dots ,m}\left\{ \alpha _j\right\} }\right) . \)

4 A Toolkit for Convergence

With the characterization of algorithms as simply self mappings with certain regularity properties, we show in this section how those properties come together to guarantee convergence of the algorithm iterations to fixed points. The fixed points need not be solutions to the feasibility problem (indeed, this does not exist for phase retrieval) but will in general be a point that allows one to compute another point that does have some physical significance, such as a local best approximation point.

It turns out that convergence itself is provided by regularity properties introduced in Sect. 23.3.3. The basic convergence idea goes back to Opial [4]. It says that averagedness of a single-valued mapping T and nonemptyness of the fixed point set imply convergence of the iterative sequence \((T^kx^0)_{k\in \mathbb {N}}\) to a point in \(\mathsf {Fix}\,T\) for any \(x^0\in \mathcal {E}\). Henceforth, we will see that averagedness of T and a nonempty fixed point set is enough to get convergence. As one would expect, it can be difficult for a map to satisfy these properties globally. Nevertheless, this is often the case in nonconvex problem instances. Thus, we seek a statement that includes local properties. That is in our case pointwise almost averagedness as introduced in Definition 23.3.

But convergence alone for iterative procedures is not enough: eventually one has to stop the iteration and without knowing the rate of convergence it is impossible to estimate how far a given iterate must be to the solution. A quantitative convergence analysis is achieved with the second essential property: metric (sub-)regularity. This concept has been studied by many authors in the literature (see for instance [5,6,7, 15, 23, 24]). For the definition of metric regularity we need gauge functions. A function \(\mu : [0,\infty )\rightarrow [0,\infty ) \) is a gauge function if it is continuous and strictly increasing with \(\mu (0)=0\) and \(\lim _{t\rightarrow \infty }\mu (t)=\infty \). The following definition is taken from [3, Definition 2.5].

Definition 23.6

(metric regularity on a set) Let \(\varPhi : \mathcal {E}\rightrightarrows \mathbb {Y}\), \(U \subset \mathcal {E}\), \(V \subset \mathbb {Y}\). The mapping \(\varPhi \) is called metrically regular with gauge \(\mu \) on \(U \times V\) relative to \(\varLambda \subset \mathcal {E}\) if

$$\begin{aligned} \mathrm {dist}\left( x, \varPhi ^{-1}(y)\cap \varLambda \right) \le \mu \left( \mathrm {dist}\left( y, \varPhi (x)\right) \right) \end{aligned}$$
(23.6)

holds for all \(x \in U \cap \varLambda \) and \(y \in V\) with \(0<\mu \left( \mathrm {dist}\left( y, \varPhi (x)\right) \right) \). When the set V consists of a single point, \(V=\left\{ \bar{y}\right\} \), then \(\varPhi \) is said to be metrically subregular for \(\bar{y}\) on U with gauge \(\mu \) relative to \(\varLambda \subset \mathcal {E}\).

When \(\mu \) is a linear function (that is, \(\mu (t)=\kappa t, \forall t \in [0,\infty )\)) one says “with constant \(\kappa \)” instead of “with gauge \(\mu (t)=\kappa t\)”. When \(\varLambda =\mathcal {E}\), the quantifier “relative to” is dropped. When \(\mu \) is linear, the smallest constant \(\kappa \) for which (23.6) holds is called modulus of metric regularity.

While this definition might seem abstract there are properties that directly imply metric regularity or reformulations that allow to prove metric regularity. One of these is polyhedrality (see [3, Proposition 2.6]). A mapping \(T: \mathcal {E}\rightrightarrows \mathcal {E}\) is called polyhedral if its graph is the union of finitely many sets that can be expressed as the intersection of finitely many closed half-spaces and/or hyper-planes [5].

Collecting the concepts we have established so far, we present the following convergence result that goes back to Luke et al. in [3, Theorem 2.2] and was later refined in [25] by Luke et al. to convergence to a specific point.

Theorem 23.4.1

(basic convergence template with metric regularity)  Let \(T: \varLambda \rightrightarrows \varLambda \) for \(\varLambda \subset \mathcal {E}\), \(\varPhi := T-\mathrm {Id}\) and let \(S \subset {\text {ri}}\varLambda \) be closed and nonempty with \(TS\subset \mathsf {Fix}\,T \cap S\). Denote \(\left( S+\delta {\mathbb {B}}\right) \cap \varLambda \) by \(S_{\delta }\) for a nonnegative real \(\delta \). Suppose that, for all \(\bar{\delta }>0\) small enough, there are \(\gamma \in (0,1)\), a nonnegative scalar \(\epsilon _i\) and a positive constant \(\alpha \) bounded above by 1, such that,

  1. (i)

    T is pointwise almost averaged at all \(y \in S\) with averaging constant \(\alpha \) and violation \(\epsilon \) on \(S_{\gamma ^i\bar{\delta }}\), and

  2. (ii)

    for \( R_i:= S_{\gamma ^i\bar{\delta }}\setminus \left( \mathsf {Fix}\,T \cap S+\gamma ^{i+1}\bar{\delta }{\mathbb {B}}\right) , \)

    1. (i)
      $$\mathrm {dist}\left( x,S\right) \le \mathrm {dist}\left( x, \varPhi ^{-1}(\bar{y})\cap \varLambda \right) $$

      for all \(x\in R_i\) and \(\bar{y}\in \varPhi \left( \mathcal {P}_S x \right) \setminus \varPhi (x)\),

    2. (ii)

      \(\varPhi \) is metrically regular with gauge \(\mu _i\) relative to \(\varLambda \) on \(R_i\times \varPhi \left( \mathcal {P}_S(R_i)\right) \), where \(\mu _i\) satisfies

      $$\begin{aligned} \sup _{x\in R_i, \bar{y}\in \varPhi \left( \mathcal {P}_S(R_i)\right) , \bar{y}\notin \varPhi (x)}\frac{\mu _i\left( \mathrm {dist}\left( \bar{y},\varPhi (x)\right) \right) }{\mathrm {dist}\left( \bar{y}, \varPhi (x)\right) }\le \kappa _i< \sqrt{\frac{1-\alpha _i}{\epsilon _i\alpha _i}}. \end{aligned}$$
      (23.7)

Then, for any \(x^0 \in \varLambda \) close enough to S, the iterates \(x^{j+1}\in Tx^j\) satisfy

\(\mathrm {dist}\left( x^j, \mathsf {Fix}\,T\cap S\right) \rightarrow 0\) and

$$\begin{aligned} \mathrm {dist}\left( x^{j+1}, \mathsf {Fix}\,T\cap S\right) \le c\mathrm {dist}\left( x^j, S\right) \quad \left( \forall x^j \in R_i\right) , \end{aligned}$$
(23.8)

where \(c_i:= \sqrt{1+\epsilon -\left( \frac{1-\alpha }{\kappa _i^2\alpha }\right) }<1\).

In particular, if \(\kappa _i\le \bar{\kappa }<\sqrt{\frac{1-{\alpha }}{{\alpha }{\epsilon }}}\) for all i large enough, then convergence is eventually at least R-linear with rate at most \(\bar{c}:= \sqrt{1+\bar{\epsilon }-\left( \frac{1-{\alpha }}{\bar{\kappa }^2{\alpha }}\right) }\) to some point in \(\mathsf {Fix}\,T\cap S\). If \(S\cap \varLambda \) is a singleton, then (iii) is redundant and convergence is Q-linear.

In both Opial’s original statement as well as Theorem 23.4.1 averagedness is the essential property for convergence of iterative algorithms. Whereas assumption (ii) of Theorem 23.4.1 serves to quantify the convergence.

5 Regularities of Sets and Their Collection

In this section we connect the regularities of sets to regularities of the projectors on these, which effect the regularity of the mapping T. When dealing with nonconvex sets there are numerous set-regularity definitions available. A recent survey by Kruger et al. [26], sorted the different classes of nonconvex sets to highlight their dependencies and differences. Uniting several concepts of regularity, we propose to use the notion of \(\epsilon \)-set regularity as introduced in [26] and refined in [27].

Definition 23.7

(\(\epsilon \)-set regularity) Let \(\varOmega \subset \mathcal {E}\) be nonempty and let \({\overline{x}}\in \varOmega \). The set \(\varOmega \) is said to be \(\epsilon \)-subregular relative to \(\varLambda \) at \({\overline{x}}\) for \(\left( {\overline{y}},{\overline{v}}\right) \in \mathrm {gph}\left( {N}_{\varOmega }\right) \) if it is locally closed at \({\overline{x}}\) and there exists an \(\epsilon >0\) together with a neighborhood U of \({\overline{x}}\) such that

$$\begin{aligned} \left\langle {\overline{v}}-(x-x^+),x^+-{\overline{y}}\right\rangle \le \epsilon \Vert {\overline{v}}-&(x-x^+)\Vert \Vert x^+-{\overline{y}}\Vert ,\nonumber \\&(\forall x \in \varLambda \cap U))(x^+ \in \mathcal {P}_\varOmega x). \end{aligned}$$
(23.9)

if for every \(\epsilon >0\) there is a neighborhood (depending on \(\epsilon \)) such that (23.9) holds, then \(\varOmega \) is said to be subregular relative to \(\varLambda \) at \({\overline{x}}\) for \(\left( {\overline{y}},{\overline{v}}\right) \in \mathrm {gph}\left( {N}_{\varOmega }\right) \). If \(\varLambda =\left\{ {\overline{x}}\right\} \), then the qualifier “relative to” is dropped.

In the phase retrieval problem one type of nonconvexity, that is also covered by \(\epsilon \)-subregularity, is prox-regularity.

Definition 23.8

(prox-regular sets) A closed set \(\varOmega \) is prox-regular at \({\overline{x}}\in \varOmega \) if for \({\overline{v}}\in {N}_{\varOmega }({\overline{x}})\) there exist \(\epsilon ,\delta >0\) such that

$$\begin{aligned} \frac{\epsilon }{2}\Vert x-c\Vert ^2\ge \left\langle v,x-c\right\rangle \quad (\forall x,c \in \varOmega \cap {\mathbb {B}}_\delta ({\overline{x}})) (\forall v \in {N}_{\varOmega }(c)\cap {\mathbb {B}}_\delta ({\overline{v}})). \end{aligned}$$

This definition dates back to Federer [28] who called the property sets with positive reach. The definition presented here is taken from [29, Proposition 1.2]. The authors in [29] showed that their definition of prox-regularity at \({\overline{x}}\in C\) is equivalent to several statements. One of the most prominent might be local single-valuedness of the projector [29, Theorem 1.3] around \({\overline{x}}\). Kruger et al. showed that prox-regularity implies \(\epsilon \)-subregularity in [26, Proposition 4(vi)]. As the next remark shows all constraint sets involved in the phase retrieval problem are, in fact, prox-regular.

Remark 23.5.1

(phase retrieval constraint sets are prox-regular) Of great importance for the convergence analysis of the introduced algorithms is the \(\epsilon \)-subregularity of the measurement sets defined in (23.1). By [3, Example 3.1.b] circles are subregular at any of their points \({\overline{x}}\) for all \(\left( {\overline{x}}, v\right) \) in the graph of the normal cone of the sets. As mentioned before \(\epsilon \)-subregularity covers a divers range of regularity notions for sets. The measurement sets investigated here are in fact shown to be semi-algebraic [30, Proposition 3.5] and prox-regular by [29, Theorem 1.3] and (6.11).

The other sets that are involved in the phase retrieval problem are the qualitative constraints introduced in (23.2) or mentioned before. Except for the amplitude constraint and the sparsity constraint all of these sets are convex and thus by [3, Proposition 3.1 (vii)] subregular. Fortunately, the amplitude constraint describes coordinatewise circles when the other coordinates are fixed, like the measurement constraint. Hence, the amplitude constraint is \(\epsilon \)-subregular as well (and additionally semi-algebraic and prox-regular). The sparsity constraint \(\mathcal {A}_s\) is prox-regular at all points \({\overline{x}}\) satisfying \(\Vert {\overline{x}}\Vert _0=s\) (similar to the proof in [14, Proposition 4.4]).

By [12, Proposition 4.8] the projector onto a closed convex set is averaged with constant \(\alpha =1/2\). Allowing sets to have a more general regularity, here prox-regularity, yield regularity of the projectors as well.

Proposition 23.9

(projectors and reflectors onto prox-regular sets) Let \(\varOmega \subset \mathcal {E}\) be nonempty closed, and let U be a neighborhood of \({\overline{x}}\in C\). Let \(\varLambda \subset \varOmega \cap U\). If \(\varOmega \) is prox-regular at \({\overline{x}}\) with constant \(\epsilon \) on the neighborhood U, then the following hold.

  1. (i)

    Let \(\epsilon \in [0,1)\). The projector \(\mathcal {P}_\varOmega \) is pointwise almost firmly nonexpansive at each \(y\in \varLambda \) with violation \(\epsilon _2:= 2\epsilon +2\epsilon ^2\) on U. That is, at each \(y\in \varLambda \)

    $$\begin{aligned} \Vert x-y\Vert ^2+\Vert x'-x\Vert \le \left( 1+\epsilon _2\right) \Vert x'-y'\Vert ^2\quad \left( \forall x'\in \right) \left( \forall x \in \mathcal {P}_\varOmega x'\right) . \end{aligned}$$
  2. (ii)

    The reflector \(\mathcal {R}_\varOmega \) is pointwise almost nonexpansive at each \(y \in \varLambda \) with violation \(\epsilon _3:= 4\epsilon +4\epsilon ^2\) on U; that is, for all \(y \in \varLambda \)

    $$\begin{aligned} \Vert x-y\Vert&\le \sqrt{1+\epsilon _3}\Vert x'-y\Vert \quad \left( \forall x' \in U\right) \left( \forall x \in \mathcal {P}_\varOmega x'\right) . \end{aligned}$$

Proof

By [26, Proposition 4(vi)] prox-regularity of \(\varOmega \) at \({\overline{x}}\) implies that the set \(\varOmega \) is \(\epsilon \)-subregular at \({\overline{x}}\) for all \((c,v)\in \mathrm {gph}{N}_{\varOmega }\), where \(c\in U\). The result follows then from [3, Theorem 3.1].

Note that Proposition 23.9 presents a special case of [3, Theorem 3.1], where the authors allowed their sets to be \(\epsilon \)-subregular for certain normal vectors. By Proposition 23.5 compositions and convex combinations of averaged mappings are again averaged. Combining this with Proposition 23.9 implies that compositions of projectors are averaged. Thus, the algorithms presented in Sect. 23.3.2 are pointwise almost averaged as we see in Sect. 23.7.

Whereas the regularity of the individual sets imply almost averagedess of the mapping T, metric regularity relies on the regularity of the whole collection of sets \(\left\{ \varOmega _1,\varOmega _2, \dots , \varOmega _m\right\} \). The idea of regularities of collections of sets traces back to [26, Theorem 3] by Kruger, Luke and Thao, but the analysis there covers only consistent feasibility problems, i.e. the intersection of sets is nonempty. A generalized notion of subtransversality proposed in [3, Definition 3.2] includes inconsistent settings too.

Definition 23.10

(subtransversal collection of sets) Let \(\left\{ \varOmega _1, \dots , \varOmega _m\right\} \) be a collection of nonempty closed subsets of \(\mathcal {E}\) and define \(\varUpsilon : \mathcal {E}^m\rightrightarrows \mathcal {E}^m \) by \(\varUpsilon (x):= \mathcal {P}_\varOmega \left( \varPi x \right) -\varPi x\) where \(\varOmega := \varOmega _1\times \varOmega _2\times \dots \times \varOmega _m\), the projection \(\mathcal {P}_\varOmega \) is with respect to the Euclidean norm on \(\mathcal {E}^m\) and \(\varPi : x=\left( x_1,x_2, \dots , x_m\right) \mapsto \left( x_2, x_3,\dots , x_m,x_1\right) \) is the permutation mapping on the product space \(\mathcal {E}^m\) for \(x_j \in \mathcal {E}\ \left( j=1,2, \dots , m\right) \). Let \({\overline{x}}=\left( {\overline{x}}_1, {\overline{x}}_2, \dots , {\overline{x}}_m\right) \in \mathcal {E}^m\) and \({\overline{y}}\in \varUpsilon ({\overline{x}})\). The collection of sets is said to be subtransversal with gauge \(\mu \) relative to \(\varLambda \subset \mathcal {E}^m\) at \({\overline{x}}\) for \({\overline{y}}\) if \(\varUpsilon \) is metrically subregular at \({\overline{x}}\) for \({\overline{y}}\) on some neighborhood U of \({\overline{x}}\) (metrically regular on \(U \times \left\{ {\overline{y}}\right\} \)) with gauge \(\mu \) relative to \(\varLambda \). As in Definition 23.6, when \(\mu (t)=\kappa t, \ \forall t \in [0,\infty )\), one says “constant \(\kappa \)” instead of “gauge \(\mu (t)=\kappa t\)”. When \(\varLambda =\mathcal {E}\), the quantifier “relative to” is dropped.

In [3, Proposition 3.3] Luke et al. showed that for a consistent feasibility problem subtransversality of the collection of sets is equivalent to what is elsewhere recognized as linear regularity of the collection [31].

6 Analysis of Cyclic Projections

Having introduced the main tools for convergence, this section is devoted to an explicit demonstration of how this framework can be applied. In particular, we present the main steps of the convergence analysis of the cyclic projection mapping as done by Luke et al. in [3].

As introduced in Algorithm 6.2.1 the method of cyclic projections on a finite collection of closed subsets of \(\mathcal {E}\), \(\left\{ \varOmega _1,\varOmega _2,\dots ,\varOmega _m\right\} \) (\(m\ge 2\)), is defined by the mapping

$$\begin{aligned} \mathcal {P}_0:= \mathcal {P}_{\varOmega _1}\mathcal {P}_{\varOmega _2}\cdots \mathcal {P}_{\varOmega _m}\mathcal {P}_{\varOmega _1}, \end{aligned}$$
(23.10)

that we denote for notational simplicity by \(\mathcal {P}_0\). For an initial point \(u^0\) the algorithm generates a sequence \(\left( u^k\right) _{k \in \mathbb {N}}\) by \(u^{k+1}\in \mathcal {P}_0 u^k \). For the analysis of \(\mathcal {P}_0\) it is convenient to introduce some auxiliary sets. We denote by \(\varOmega \) the product of the sets \(\varOmega _j\) on \(\mathcal {E}^m\),

$$\begin{aligned} \varOmega := \varOmega _1\times \varOmega _2\times \cdots \times \varOmega _m. \end{aligned}$$

Let \({\overline{u}}\in \mathsf {Fix}\,\mathcal {P}_0\) and fix \({\bar{\zeta }}\in \mathcal {Z}({\overline{u}})\) where

$$\begin{aligned} \mathcal {Z}(u):= \left\{ \zeta := z-\varPi z\,|\,z\in W_0\subset \mathcal {E}^m, \ z_1=u\right\} \end{aligned}$$
(23.11)

for

$$\begin{aligned} W_0:= \left\{ x\in \mathcal {E}^m\,|\,x_m \in \mathcal {P}_{\varOmega _m}x_1,\ x_j\in \mathcal {P}_{\varOmega _j}x_{j+1},\ j=1,2,\dots ,m-1\right\} . \end{aligned}$$
(23.12)

Note that \(\sum _{j=1}^m{\bar{\zeta }}_j=0\). The elements of \(W_0\) are all cycles of the cyclic projection method, where each coordinate of x corresponds to an inner iterate of \(\mathcal {P}_0\). The first coordinate \(x_1\) of \(x\in W_0\) is, thus, a fixed point of \(\mathcal {P}_0\). The vectors \(\zeta \in \mathcal {Z}(u)\) are called difference vectors. Their coordinate entries provide information about the gaps between the inner iterates of a cycle of the mapping \(\mathcal {P}_0\).

To monitor the inner iterations, we consider the cyclic projection algorithm lifted to the product space \(\mathcal {E}^m\). That is, generate the sequence \((x^k)_{k\in \mathbb {N}}\) by \(x^{k+1}\in T_{\bar{\zeta }}x^k\) with

$$\begin{aligned} T_{\bar{\zeta }}: \mathcal {E}^m\rightrightarrows \mathcal {E}^m x \mapsto \left\{ \left( x_1^+, x_1^+- {\bar{\zeta }}_1, \dots , x_1^+-\sum _{j=1}^{m-1}{\bar{\zeta }}_j\right) \,|\,x_1^+\in \mathcal {P}_0 x_1\right\} \end{aligned}$$
(23.13)

for \({\bar{\zeta }}\in \mathcal {Z}({\overline{u}})\) where \({\overline{u}}\in \mathsf {Fix}\,\mathcal {P}_0\). Thus, the first entry of \(T_{\bar{\zeta }}\) belongs to the cyclic projection mapping \(\mathcal {P}_0\). Whereas the other entries of \(T_{\bar{\zeta }}x \) indicate how close or distant \(x_1^+\) is from a certain cycle specified by \({\bar{\zeta }}\). In order to isolate cycles, we restrict our attention to relevant subsets of \(\mathcal {E}^m\). These are

$$\begin{aligned} W({\bar{\zeta }})&:= \left\{ x\in \mathcal {E}^m\,|\,x-\varPi x={\bar{\zeta }}\right\} ,\end{aligned}$$
(23.14)
$$\begin{aligned} L&:= \text { an affine subspace with } T_{\bar{\zeta }}: L\rightrightarrows L,\end{aligned}$$
(23.15)
$$\begin{aligned} \varLambda&:= L\cap W({\bar{\zeta }}). \end{aligned}$$
(23.16)

The set \(W({\bar{\zeta }})\) contains all points whose entries have a certain distance to each other, namely \({\bar{\zeta }}_i\). In particular, \(W({\bar{\zeta }})\) contains all fixed points of \(T_{\bar{\zeta }}\). The affine subspace L is used to restrict the analysis to an affine subspace that contains the iterates \(x^k\) of \(T_{\bar{\zeta }}\).

To apply the convergence framework, Theorem 23.4.1, there are two major steps we have to take. First, we have to show that the mapping is averaged. Since the cyclic projection mapping is, as its name suggests, a composition of projectors averagedness, this not hard to show by the concepts presented in Sect. 23.5. Second, metric subregularity needs to be proven. For this, we state an auxiliary result that relates metric subregularity to subtransversality of the collection of sets (see [3, Proposition 3.4]).

Proposition 23.11

(metric subregularity of cyclic projections) Let \({\overline{u}}\in \mathsf {Fix}\,\mathcal {P}_0\) and \({\bar{\zeta }}\in \mathcal {Z}({\overline{u}})\) and let \({\overline{x}}=\left( {\overline{x}}_1,{\overline{x}}_2, \dots , {\overline{x}}_m\right) \in W_0\) satisfy \({\bar{\zeta }}={\overline{x}}-\varPi {\overline{x}}\) with \({\overline{x}}_1={\overline{u}}\). For L an affine subspace containing \({\overline{x}}\), let \(T_{\bar{\zeta }}: L\rightrightarrows L\) and define the mappings for \(\varPhi _{{\bar{\zeta }}}:= T_{\bar{\zeta }}-\mathrm {Id}\) and \(\varUpsilon := \left( \mathcal {P}_\varOmega -\mathrm {Id}\right) \circ \varPi \). Suppose the following hold:

  1. (i)

    the collection of sets \(\left\{ \varOmega _1,\varOmega _2, \dots , \varOmega _m\right\} \) is subtransversal at \({\overline{x}}\) for \({\bar{\zeta }}\) relative to \(\varLambda := L \cap W({\bar{\zeta }})\) with constant \(\kappa \) and neighborhood U of \({\overline{x}}\);

  2. (ii)

    there exists a positive constant \(\sigma \) such that

    $$\begin{aligned} \mathrm {dist}\left( {\bar{\zeta }}, \varUpsilon (x)\right) \le \sigma \mathrm {dist}\left( 0,\varPhi _{\bar{\zeta }}(x)\right) , \quad \forall x \in \varLambda \cap U \text { with }x_1\in \varOmega _1. \end{aligned}$$

Then \(\varPhi \) is metrically subregular for 0 on U (metrically regular on \(U \times \left\{ 0\right\} \)) relative to \(\varLambda \) with constant \({\bar{\kappa }}=\kappa \sigma \).

Proposition 23.11 indicates that subtransversality plus the additional assumption (ii) are enough to deduce metric subregularity of \(\varPhi _{\bar{\zeta }}:= T_{\bar{\zeta }}-\mathrm {Id}\) as required in Theorem 23.4.1. Using this connection and the development in Sect. 23.5 about almost averagedness we can state the following convergence result which is an implication of Theorem 23.4.1.

Theorem 23.6.1

(convergence of cyclic projections)  Let \(S_0\subset \mathsf {Fix}\,\mathcal {P}_0\ne \emptyset \) and \(Z:= \cup _{u \in S_0}\mathcal {Z}(u)\). Define

$$\begin{aligned} S_j:= \cup _{\zeta \in Z}\left( S_0-\sum _{i=1}^{j-1}\zeta _i\right) \quad j=1,2,\dots ,m. \end{aligned}$$
(23.17)

Let \(U:= U_1\times U_2\times \cdots \times U_m\) be a neighborhood of \(S:= S_1\times S_2\times \cdots \times S_m\) and suppose that

$$\begin{aligned} \mathcal {P}_{\varOmega _j}\left( u-\sum _{i=1}^j\zeta _i\right)&\subseteq S_0-\sum _{i=1}^{j-1} \quad \forall u \in S_0, \ \forall \zeta \in Z \text { for each } j=1,2,\dots ,m,\end{aligned}$$
(23.18)
$$\begin{aligned} \mathcal {P}_{\varOmega _j}U_{j+1}&\subseteq U_j \text { for each }j=1,2,\dots ,m \quad \left( U_{m+1}:= U_1\right) . \end{aligned}$$
(23.19)

For fixed \({\bar{\zeta }}\in Z\) and \({\overline{x}}\in S\) with \({\bar{\zeta }}={\overline{x}}-\varPi {\overline{x}}\), generate the sequence \((x^k)_{k \in \mathbb {N}}\) by \(x^{k+1}\in T_{\bar{\zeta }}x^k\) for \(T_{\bar{\zeta }}\) defined by (23.13), seeded by a point \(x^0\in W({\bar{\zeta }})\cap U\) for \(W({\bar{\zeta }})\) defined by (23.14) with \(x^0_1\in \varOmega _1\cap U_1\).

Suppose that, for \(\varLambda := L\cap \mathrm {aff}\left( \cup _{\zeta \in Z}W(\zeta )\right) \supset S\) such that \(T_\zeta : \varLambda \rightrightarrows \varLambda \) for all \(\zeta \in Z\) and an affine subspace \(L\supset \mathrm {aff}(x^k)){k \in \mathbb {N}}\), the following hold:

  1. (i)

    \(\varOmega _j\) is prox-regular at all \({\widehat{x}}_j\in S_j\) with constant \(\epsilon _j\in (0,1)\) on the neighborhood \(U_j\) for \(j=1,2,\dots ,m\);

  2. (ii)

    for each \({\widehat{x}}=\left( {\widehat{x}}_1,{\widehat{x}}_2,\dots ,{\widehat{x}}_m\right) \in S\), the collection of sets \(\left\{ \varOmega _1,\varOmega _2, \dots , \varOmega _m\right\} \) is subtransversal at \({\widehat{x}}\) for \({\widehat{\zeta }}:= {\widehat{x}}-\varPi {\widehat{x}}\) relative to \(\varLambda \) with constant \(\kappa \) on the neighborhood U;

  3. (iii)

    for \(\varUpsilon _{{\bar{\zeta }}}:= T_{\bar{\zeta }}-\mathrm {Id}\) and \(\Psi := \left( \mathcal {P}_\varOmega -\mathrm {Id}\right) \circ \varPi \) there exists a positive constant \(\sigma \) such that for all \({\bar{\zeta }}\in Z\)

    $$\begin{aligned} \mathrm {dist}\left( {\widehat{\zeta }}, \varUpsilon (x)\right) \le \sigma \,\mathrm {dist}\left( 0,\varPhi _{\bar{\zeta }}(x)\right) \end{aligned}$$

    holds whenever \(x \in \varLambda \cap U\) with \(x_1\in \varOmega _1\);

  4. (iv)

    \(\mathrm {dist}\left( x,S\right) \le \mathrm {dist}\left( x, \varPhi _{\bar{\zeta }}^{-1}(0)\cap \varLambda \right) \) for all \(x\in U\cap \varLambda \), for all \({\widehat{\zeta }}\in Z\).

Then the sequence \((x^k)_{k\in \mathbb {N}}\) initialized by a point \(x^0\in W({\bar{\zeta }})\cap U\) with \(x^0_1\in \varOmega _1\cap U_1\) satisfies

$$\begin{aligned} \mathrm {dist}\left( x^{k+1},\mathsf {Fix}\,T_{\bar{\zeta }}\cap S\right) \le c\mathrm {dist}\left( x^k,S\right) \end{aligned}$$

whenever \(x^k\in U\) with \( c:= \sqrt{1+{\overline{\epsilon }}-\frac{1-\alpha }{\alpha {\bar{\kappa }}^2}} \) where \( {\overline{\epsilon }}:= \prod _{j=1}^m\left( 1+{\widetilde{\epsilon }}_j\right) -1\), \({\widetilde{\epsilon }}_j:= 4\epsilon _j\frac{1+\epsilon _j}{\left( 1-\epsilon _j\right) ^2}\), \(\alpha := \frac{m}{m+1}\) and \({\bar{\kappa }}:= \kappa \sigma \). If, in addition, \( {\bar{\kappa }}<\sqrt{\frac{1-\alpha }{{\overline{\epsilon }}\alpha }}, \) then \(\mathrm {dist}\left( x^k,\mathsf {Fix}\,T_{\bar{\zeta }}\cap S\right) \rightarrow 0\), and hence \(\mathrm {dist}\left( x^k_1,\mathsf {Fix}\,\mathcal {P}_0\cap S_1\right) \rightarrow 0\) at least linearly with rate \(c<1\).

Proof

This is a special case of [3, Theorem 3.2] when the sets are prox-regular.

Remark 23.6.2

Theorem 23.6.1 is rather long and technical at first sight, though the pieces are easily parsed. Equations (23.17)–(23.19) force the iterations to stay in specific neighborhoods. This is needed to apply Proposition 23.9 with the help of (i) to deduce pointwise almost averagedness of \(\mathcal {P}_0\) and likewise of \(T_{\bar{\zeta }}\). Assumptions (ii) and (iii) then yield metric subregularity of \(\varPhi _{\bar{\zeta }}=T_{\bar{\zeta }}-\mathrm {Id}\) by Proposition 23.11. This is where the construction in the product space comes into play. Working on \(\mathcal {E}^m\), we were able to use subtransversality to show metric subregularity of \(\varPhi _{\bar{\zeta }}\). It is worth mentioning that, until now, we were not able to show metric subregularity for the mapping directly associated to \(\mathcal {P}_0\). Adding assumption (iv) in Theorem 23.6.1 we can finally apply Theorem 23.4.1 and deduce convergence of \(T_{\bar{\zeta }}\) with the given constants. At this point the definition of \(T_{\bar{\zeta }}\) becomes crucial. Since the first iterate of the sequence \(x^k\) generated under the mapping \(T_{\bar{\zeta }}\) is nothing more than applying the method of cyclic projections \(\mathcal {P}_0\), convergence of \(x^k\) implies convergence of \(x^k_1\), that is, the sequence generated by cyclic projections. In [25] Luke et al. discussed the necessity of subtransversality for alternating projections to converge R-linearly.

7 Application to Phase Retrieval Algorithms

In Sect. 23.6 we have seen how to apply Theorem 23.4.1 on the method of cyclic projections. This section is devoted to the analysis of other well known algorithms which we introduced in Sect. 23.3.2. The analysis in Sect. 23.6 focuses on showing how to satisfy the assumptions of Theorem 23.4.1 in the context of set-feasibility. This section aims to provide a broad intuition of the convergence of projection based algorithms used to solve the phase retrieval problem. This explains also why the statements given next are presented in a cartoon-like manner. The statements include only the most important parts that yield local convergence, but not how to construct it nor at which rate. Nevertheless, these are verifiable by following the approach in Sect. 23.6.

Corollary 23.12

(convergence of the error reduction algorithm) Let \(\mathsf {Fix}\,\mathcal {P}_\mathfrak {S}\mathcal {P}_{\mathcal {M}_1}\ne \emptyset \). The error reduction algorithm, that is alternating projections as discussed in Sect. 6.2.3 on the sets \(\mathfrak {S}\) and \(\mathcal {M}_1\), converges locally linearly to a point \(\tilde{x}\in \mathsf {Fix}\,\mathcal {P}_\mathfrak {S}\mathcal {P}_{\mathcal {M}_1}\) whenever the mapping \(\varPhi =\mathcal {P}_\mathfrak {S}\mathcal {P}_{\mathcal {M}_1}-\mathrm {Id}\) is locally metrically subregular at its zeros.

Proof

Following Luke et al. in [32, Sect. 3.2.2], we represent \(\mathbb {C}\) as \(\mathbb {R}^2\) and reformulate the phase retrieval problem as a feasibility problem with entrywise values in \(\mathbb {R}^2\). Then this is an application of Theorem 23.4.1 using Remark 23.5.1.

Remark 23.7.1

In contrast to Theorem 23.6.1 metric subregularity is required directly in Theorem 23.12. Equivalently, we could demand subtransversality of the collection of sets \(\left\{ \mathfrak {S}, \mathcal {M}_1\right\} \) plus the additional assumption (iii) in Theorem 23.6.1. The problem here is, that, until now, it is not clear when and where these two assumptions are satisfied. Illustrative examples and numerical simulations indicate that they hold in many instances. Nevertheless, there are certain situations when at least one of the two assumptions is violated (see for instance [33]). Moreover, allowing metric subregularity under some gauge depicts the reality sometimes better than restricting the analysis to a linear setting. One example is the setting of alternating projections applied to the sphere \({\mathbb {S}}\) and a line tangent to \({\mathbb {S}}\) at \({\overline{x}}=(0,-1)\). In this instance the algorithm does not converge linearly to \({\overline{x}}\), although it converges depending on the initial point (see for instance [3]). This problem is not only interesting for the type of convergence, but also when it comes to the actual numerical implementation of algorithms. Although sets in real-life applications intersect tangentially on a set of measure zero, beyond a certain numerical accuracy the distinction between tangential intersection and linear convergence with a rate constant within 15 digits of 1 is rather academic. Having a relatively large gap between sets for inconsistent feasibility is in fact an advantage for the numerical performance of an algorithm.

Theorem 23.7.2

(convergence of Fienup’s HIO method) Let \(\beta _n=1\) for all n and \(\mathsf {Fix}\,\tfrac{1}{2}\left( \mathcal {R}_\mathfrak {S}\mathcal {R}_{\mathcal {M}_1}+\mathrm {Id}\right) \ne \emptyset \). The HIO algorithm, defined in (6.9) that is Douglas-Rachford as defined in (6.15) on the sets \(\mathfrak {S}\) and \(\mathcal {M}_1\), converges locally linearly to a point \(\tilde{x}\in \mathsf {Fix}\,\tfrac{1}{2}\left( \mathcal {R}_\mathfrak {S}\mathcal {R}_{\mathcal {M}_1}+\mathrm {Id}\right) \) whenever the mapping \(\varPhi =\tfrac{1}{2}\left( \mathcal {R}_\mathfrak {S}\mathcal {R}_{\mathcal {M}_1}+\mathrm {Id}\right) -\mathrm {Id}\) is locally metrically subregular at its zeros.

Proof

Since Fienup’s HIO for \(\beta _n=1\) for all n can be identified with the Douglas-Rachford method the result follows from [3, Theorem 3.4].

Even if one had an infinite detector, noisy measurements make the phase retrieval problem almost always inconsistent. It is easy to prove [8, Theorem 3.13] that, in this case, \(\mathsf {Fix}\,\tfrac{1}{2}\left( \mathcal {R}_\mathfrak {S}\mathcal {R}_{\mathcal {M}_1}+\mathrm {Id}\right) =\emptyset \) and so \(\varPhi \) does not possess zeros. Consequently, Fienup’s HIO algorithm cannot converge. To circumvent this problem, one can use a relaxed version of Douglas-Rachford, the relaxed averaged alternating reflections method (RAAR), that we introduced in Sect. 6.1.2 which is adapted to inconsistent feasibility.

Theorem 23.7.3

(convergence of RAAR) relaxed averaged alternating reflections. Let \({\overline{x}}\in \mathsf {Fix}\,T_{\scriptscriptstyle {RAAR}}\) for \(T_{\scriptscriptstyle {RAAR}}\) defined in (6.22). The relaxed averaged alternating reflections applied to a phase retrieval problem converges locally linearly to a point \(\tilde{x}\in \mathsf {Fix}\,T_{\scriptscriptstyle {RAAR}}\) whenever the mapping \(\varPhi =\frac{\lambda }{2}\left( \mathcal {R}_{\mathfrak {S}}\mathcal {R}_{\mathcal {M}_1}+\mathrm {Id}\right) +(1-\lambda )\mathcal {P}_{\mathcal {M}_1}-\mathrm {Id}\) is locally metrically subregular at its zeros.

A detailed proof of the convergence analysis for the relaxed averaged alternating reflection algorithm can be found in [33] by the authors of this chapter. There we use subtransversality of the collections of sets in general feasibility problems to make the connection to metric subregularity of the algorithm in question. The analysis does not use prox-regularity as the desired type of regularity for sets yielding the almost averaging property, but rather the property of being super-regular at a distance. This extends notions of regularity of sets to their effect on points that are not in the sets. Their definition is in line with \(\epsilon \)-subregularity and is thus connected to the analysis of [3].

Remark 23.7.4

In [33] we not only provided a convergence statement for the relaxed averaged alternating reflections method, but also gave a description of the fixed point set of the underlying mapping. For super-regular sets at a distance, the fixed points, if they exist, are either points in the intersection of both sets or relate to the local gap between these, if the intersection of the sets is empty. This result is in line with [11] where Luke studied the case of one set being convex and the other prox-regular. In contrast to the original Douglas-Rachford algorithm, the main advantage of the relaxed version is that existence of fixed points does not depend on whether the feasibility problem is consistent. Connecting this observation to the convergence analysis presented here, in practice the Douglas-Rachford/HIO is much less stable than the relaxed version.

Following the ideas above, it is not hard to show that most projection methods are pointwise almost averaged mappings when applied to the phase retrieval problem. Nonetheless, the property of metric subregularity is still an open problem in some important cases. Thus, local convergence can be easily verified, but it is hard to quantify.

Fig. 23.1
figure 1

Alternating projections applied to the data set “tasse” for full data (blue) and cropped data (orange). a Change in iterates until iteration 900. b Gap distance until iteration 50

8 Final Remarks

When it comes to computing (see Remark 23.7.1), whether a method converges, let alone determining the rate depends on the numerical precision. But also inconsistency has an impact on the numerical performance. Closely related to this, we want to stress another feature of the analysis surveyed here. That is, sometimes less information can lead to better performance of an algorithm. For a demonstration we analyze a data set recorded by undergraduates at the X-Ray Physics Institute at the University of Göttingen. It is an optical diffraction image with model constraints \(\sqrt{I}_{j , i}, ~j = 1, 2, . . . ,m\), as in (23.1) with \(m = 1\) and n the dimension of the image and additional support constraint. The full data set has dimension \(n=1392\times 1040\), the cropped data set \(n=128^2\). The graphs shown in Figs. 23.1 and 23.2 are produced by applying the alternating projection algorithm, i.e. error reduction, on the data sets individually. As it turns out alternating projections on the full data set (Fig. 23.2) shows a worse convergence behavior than the image with the limited data set (Fig. 23.1). Not only that the algorithm needs more iterations to reach a certain accuracy (\(9.8485\times 10^4\) instead of 666), but also the rate of linear convergence when the iterates reach a suitable neighborhood is worse. Noteworthy is the observed gap in both problem instances. In the full data set version the gap is smaller than in the version with a limited data set. We conjecture that this behavior is closely related to the property of metric subregularity, or in the context of set feasibility, subtransversality. The more, and better, information one has, the closer the constraint sets come to intersect. But this can included cases in which the sets intersect transversally as well. In cases like these the method of alternating projections does not have to converge locally linearly but can show a sublinear convergence behavior (see for instance [3, Remark 3.2]). The take home message in this context is that more information does not have to yield a better image when applying numerical algorithms. This is good news and bad news for these algorithms. The good news is that one can profit from implicit regularization with smaller problem sizes. The bad news is that this indicates a type of dimension dependence of these methods: the higher the dimension, the worse the constants in the linear convergence rates. This is not surprising and points to the need for models that lead to algorithms whose performance (that is, regularity) is dimension independent. While our discussion here focuses on the theoretical analysis rather than the comparison of the presented algorithms we point the reader to a study by Luke et al. [22], where the authors present a thorough review of first-order proximal methods for phase retrieval algorithms.

Fig. 23.2
figure 2

Change in iterates for full data set “tasse”