Denoising Poisson Phaseless Measurements via Orthogonal Dictionary Learning

Phaseless diffraction measurements recorded by a CCD detector are often affected by Poisson noise. In this paper, we propose a dictionary learning model by employing patches based sparsity to denoise Poisson phaseless measurement. The model consists of three terms: (i) A representation term by an orthogonal dictionary, (ii) an $L^0$ pseudo norm of coefficient matrix, and (iii) a Kullback-Leibler divergence to fit phaseless Poisson data. Fast Alternating Minimization Method (AMM) and Proximal Alternating Linearized Minimization (PALM) are adopted to solve the established model with convergence guarantee, and especially global convergence for PALM is derived. The subproblems for two algorithms have fast solvers, and indeed, the solutions for the sparse coding and dictionary updating both have closed forms due to the orthogonality of learned dictionaries. Numerical experiments for phase retrieval using coded diffraction and ptychographic patterns are performed to show the efficiency and robustness of proposed methods, which, by preserving texture features, produce visually and quantitatively improved denoised images compared with other phase retrieval algorithms without regularization and local sparsity promoting algorithms.


I. INTRODUCTION
I N lensless imaging techniques, only the magnitudes of Fourier transformation can be recorded due to physical limitation of detector technologies, and the central computational task, known as "Phase Retrieval", is to recover underling images from phaseless measurements.It is a challenging inverse quadratic problem.For noiseless case, related multivariable quadratic systems should be computed, the complexity of which is possibly NP-complete [1].In practice, the measurements are possibly contaminated by noise, and one shall seek an approximation solution to the reformulated noncovex optimization problem by maximum a posterior (MAP) of noise.
Numerically, researchers have developed various iterative algorithms involving with a nonconvex constraint set of magnitude, such as an alternating projection algorithm [2], [3] for the classical phase retrieval problem 1 .Several variant heuristic algorithms popular in the optics community including [4], [5], [6], [7] were proposed to solve the classical phase retrieval problems, and one can refer to [8], [9] and the reference therein.These methods are based on projections onto a nonconvex set, and it is therefore very difficult to describe and prove the convergence mathematically.Consequently, more recent work has focused on designing fast algorithms with convergence guarantee for such nonconvex minimization problem, e.g.[10], [11], [12], [13], [14], [15], [16], etc, with random-masking or oversampling measurements.In the optics community, in order to increase the imaging resolutions and robustness, a popular approach, known as ptychography [17], uses a sequence of phaseless diffraction measurements recorded while translating the specimen with respect to a constant illumination mask.In order to prevent trapping into a local minimum for solving conventional nonconvex projection or minimizing methods, another scheme is to model phase retrieval by convex methods including PhaseLift [18] and PhaseCut [19] based on semi-definite programming(SDP), as well as PhaseMax [20], [21] operating in the original signal space with much less computational cost compared with SDP based convex methods.
To the best of our knowledge, an early work for phase retrieval by employing sparse prior of underlining images is the SHRINK-WRAP algorithm [22], which was established by iteratively shrinking the size of support set for unknown objects.It has been successfully applied to several ground breaking X-ray Free Electron Lasers based experiments in [23], which also demonstrates that sparse prior should be considered for modeling phase retrieval problem in order to increase the robustness and accuracy.The L 1 norm based phase retrieval method with a compressibility constraint was proposed in [24], [25].SDP-based convex methods combined with L 1 regularization of the lifting matrix were proposed to solve a sparse phase retrieval problem in [26], [27].With prior knowledge of exact sparsity level(number of nonzeros), a sparse Fieup method [3] was proposed in [28].A generic twostep iterative scheme was proposed in [29], which consisted of updating support set, and using damped Gauss-Newton algorithm to solve a nonconvex optimization problem with updated support set.A two-stage solution technique was presented in [30] by first using arbitrary phase retrieval technique to recover sampling measurements and then compressive sensing method to recover the unknown signals.If the transform domain of the unknown objects are corrupted by additive noise, a probabilistic method based on the generalized approximate message passing algorithm was given in [31].Hard thresholding based truncated gradient iterations in [32] were employed in order to refine a sparse orthogonality-promoting initialization.An L 0 regularized variational model was established for sparse phase retrieval problem accompanied by an efficient algorithms based on adaptive-step alternating direction method of multipliers in [33].Under the assumption of objects possessing a sparse representation in the transform domain, shearlet and total variation based regularization methods were considered in [34], [35].Dictionary learning methods were proposed to reconstruct the image in [36], [37] from Gaussian noised measurements with outliers, where Gaussian distributions only provide a limited approximation in most real applications.
Roughly speaking, the intensity is recorded by a detector for a limited time, which is usually affected by Poisson noise.For linear measurements, in order to remove the Poisson noise, one can adopt a variance stabilizing transformation such as the Anscombe root transformation in [38], [39], where a pointwise nonlinear transformation was introduced such that one just needs to deal with Gaussian data.When the peak level decreases, this transformation is less efficient.One should directly tackle it and total variation based regularization based method was proposed in [40], where Kullback-Leibler (KL) divergence was derived based on the Bayesian framework in the case of Poisson noise.In a similar manner, for phaseless nonlinear measurements of phase retrieval, variational methods based on Tikhonov and total variation regularization were proposed to recover images from Poisson measurements in [41], [42].The reconstructed images shown in [41] exhibit sharp edges and clean background for very noisy and limited data.However, as such method was built upon the localsparsity regularization term, there exist visible staircase artifacts, and the smaller repetitive features can not been preserved particularly for measurements generated with deterministic illumination in ptychographic phase retrieval problem.In order to further increase the accuracy and visual quality of recovery results for practical applications, one possible direction is to exploit the redundancy between overlapping image patches to boost the sparsity, such as using dictionary learning based denoising method [43], [44], [45], which pursues a sparse representation by a dictionary learned from image patches.
In this paper, we propose to build up a novel dictionary learning model to reconstruct high quality images from Poisson phaseless measurements, which can deal with both real and complex-valued images.It consists of three terms, i.e. a representation terms by an orthogonal dictionary, an L 0 pseudo norm of coefficient matrix, and a Kullback-Leibler divergence to fit phaseless Poisson data.The pseudo L 0 norm is used as a sparse regularization of coefficient matrix with both isotropic and anisotropic forms.Meanwhile, fast algorithms are designed with global convergence guarantee in order to solve the proposed nonconvex nondifferential optimization model.Since an orthogonal dictionary is employed, subproblems w.r.t. the dictionary and coefficients both have close-form solutions.Numerous experiments demonstrate for coded diffraction and ptychographic patterns are performed to show the efficiency of our proposed methods, which demonstrate our proposed methods can recover images with sharp edges, clean background compared with the phase retrieval methods without any regularization.Moreover, they are capable of producing higher quality results by preserving the texture features compared with total variation based method [41].Additionally, it also shows that the proposed method helps to further increase the quality for reconstructing complex-valued images by unitizing the introduced anisotropic L 0 pseudo norm.
The rest of this paper is organized in the following way.The proposed dictionary learning model is given in section II.We first propose an alternative minimization method in section III by alternately minimizing the coupled variables to compute the proposed model.Then a proximal alternating linearized minimization method is further designed in section IV with theoretical analysis of global convergence.Numerous experiments are performed in section V to demonstrate the efficiency of the proposed method.We conclude the paper in Section VI.

II. PROPOSED MODEL
We consider phase retrieval for a 2-dimensional (2D) image in a discrete setting, i.e., an underlying object u in which we represent a 2D object with resolution n 1 × n 2 in terms of a vector of size n by a lexicographical order.For classical phase retrieval problem, measured data are only the magnitudes of the discrete Fourier transform of u, i.e., |Fu| 2 , where | • | 2 denotes the pointwise square of the absolute value of a vector, F : C n → C n denotes the discrete Fourier transform (DFT).In this paper, we generalize it to a general phase retrieval problem [18], i.e.
where A : C n → C m is a linear operator in the complex Euclidean space and b : Ω Assume that in the best scenario, photon counting procedures to collect the measurements b are contaminated by Poisson noise, i.e. for noisy measurements f ∈ R m + , the intensity values located at each position are independent and identically distributed (i.i.d.) random variables, and the probability can be written as with the ground truth b ∈ R m + as the mean value and variance.We will first give a brief review of dictionary learning based denoising, and regularized methods for denosing noisy phaseless measurements contaminated by Poisson noise in the following subsection.
A. Review of Dictionary Learning and Regularized Poisson Denoising for Phase retrieval 1) Dictionary Learning and Denoising: , each of whose columns represents a 1-D signal, the target of dictionary learning (or sparse coding) [46] is to find a "sparse" linear presentation for the matrix Y by a learned dictionary D = [D(0), D(1), • • • , D(c − 1)] ∈ C l,c , i.e.Y ≈ Dα, such that the coefficient matrix α has more zeros ("sparse") entries.Obviously, the dictionary D and coefficient matrix α ∈ C c,d are coupled and both needed to be determined in order to derive a sparse representation.Roughly speaking, it can be reformulated as the following minimization problems where • denotes the Frobenius norm and L 2 norm for a matrix and vector respectively, • 0 denotes the L 0 pseudo norm for counting the number of nonzero elements, τ is a positive constant to balance the data fitting term (first term) and the regularization term (second term), and the constraints guarantee that the dictionary is normalized.
The well known work named K-SVD denoising [43] is to first produce redundant signals by sliding a small window over an entire image, and then to solve a sparse coding problem by K-SVD in [47].The K-SVD denoising model can be established as follows, where one redundantly selects image patches by the linear selection operator to build totally d atoms of the underlying image u such that it can be sparsely represented by D with the coefficient matrix α, f is the noisy image, and D ∈ C l,c is a normalized dictionary.The variational methods combining K-SVD were widely applied to different image processing tasks, especially for Poisson denoising [45] and deblurring [44].In [44], the data fitting term in the aforementioned model is replaced with Kullback-Leibler divergence by maximum a posterior (MAP) estimate of Poisson noise, and additional total variation regularization term was also employed.
2) Regularized Poisson Denoising for Phase Retrieval: Thibault and Guizar-Sicairos [42] first proposed to solving the following minimization problem in coherent diffractive imaging with Tikhonov regularization min u∈C n λ ∇u 2 + B(|Au| 2 , f ), where the Kullback-Leibler divergence following Bayesian framework by MAP of Poisson noise is denoted as follows, where ∇ denotes the gradient operator and A represents the masked Fourier transformation generated by illumination masks.Consequently, significant gains in reconstructed accuracy and sensitivity to noise were achieved compared with the model without any regularization.In order to further increase the quality (e.g.sharpness of edges) of recovery images, Chang et al. [35] established total variation regularized model for very general phase retrieval tasks with TV as the total variation, which was successfully applied to denoise phaseless measurements for phase retrieval with coded diffraction, holographic and ptychographic patterns for both the real and complex-valued images.
B. The proposed dictionary learning phase retrieval (DicPR) for Poisson denoising In this paper, we consider the noisy phase retrieval problem in which the measurements f = Poisson(|Au| 2 ) ∈ R m in (1) are corrupted by Poisson noise, and establish a minimization problem driven by the patch based sparsity [43], [44], [45] of the underlying images, referred to as "DicPR", which reads, where u is an underlying image that we want to reconstruct from magnitude data f , D ∈ C l,c is a learned orthogonal dictionary as [48] which combines the sparse coefficients α by the L 0 pseudo norm denoted by • 0 to represent the underling image sparsely, τ and η are two parameters to balance the sparsity and data fitting terms and the identical matrix is denoted as I, B(•, •) is defined in (2).We rewrite the model (3) as with and related indicator function is introduced as I K .We introduce two different definitions of α 0 for complex-valued α with isotropic and anisotropic L 0 pseudo norms as follows: Isotropic form: Anisotropic form: where and denote the real and complex parts of some complex-valued matrix.One readily has α iso 0 = α aniso 0 if α ∈ R c,d .For complex-valued matrix, it is quite different, and one can see the advantage by using the anisotropic form in the numerical experiments of this paper.Hereafter, for simplicity α 0 represents the isotropic version pseudo norm if not specified.
Remark II.1.In this paper, we propose to learn an orthogonal dictionary such that the subproblems for solving the proposed model involving with D and α both have closed forms.In [48] it was demonstrated that the denoising results with orthogonal dictionary were comparable with nonorthogonal ones.In our experiments, we set D to a square matrix, i.e. l = c.
Remark II.2.Tillmann, Eldar, and Mairal [36] proposed the following dictionary learning based model to denoising Gaussian noised measurements for real-valued images Qiu and Palomar [37] generalized it to complex-valued images with different data fitting terms for the amplitude i.e.
The existence of the minimizer for the proposed model ( 3) can be readily obtained, and we just list it without proof.
Due the the existence of the data fitting term for phase retrieval and L 0 regularization of the coefficient matrix, the proposed model is a nonconvex and discontinuous optimization problem, which is very challengeable for designing an algorithm with theoretical convergence guarantee.In the following sections, we first give a simple Alternating Minimization Method (AMM), and only local convergence can be obtained.Then a global convergent algorithm called Proximal Alternating Linearized Minimization method (PALM) will be provided.

III. ALGORITHMS I: ALTERNATING MINIMIZATION METHOD (AMM)
Since three variables u, D, α of (3) are coupled, in a natural way we use AMM to solve the problem.If giving the approximation solution (u k , D k , α k ), the overall iterative algorithm consists of three steps w.r.t. the three coupled variables as Readily one can know that there exist at least a solution to each subproblem.In the following subsections, we will present how to solve these subproblems.For simplicity we omit all the superscripts of the notations in (6).

A. Subproblem w.r.t. variable u
We concentrate on the first subproblem w.r.t.variable u in this subsection.By introducing an auxiliary variable z, we have with An alternating direction of multiplier method (ADMM) is adopted to solve the above optimization problem similar to [35].The corresponding augmented Lagrangian reads where (•) denotes the real part of a complex-valued number, r is a positive parameter, • denotes the inner product of two vectors.The ADMM is designed as below, starting from the previous iterative solution (u j , z j , Λ j ).Following the framework in [41], one can readily obtain that the closed form of the first subproblems of (10) as with v j = z j + Λ j /r, and We can simplify the solution of above subproblem if the matrix A involves Fourier measurements with masks {I k } K−1 k=0 as where • denotes the pointwise multiplication, I k is a masked matrix indexed by k, each of which is represented by a vector in C n in a lexicographical order.Therefore, we have which is a real-valued matrix.Finally we derive the solution if the diagonal matrix A * A + W is non-singular.
Remark III.1.One can also use a gradient descent type algorithm with adaptive steps as Wirtinger flow [10], Newton type algorithm [49] to solve it if the number of measurements are sufficient.It seems that ADMM can work pretty well with very few measurements in [41], and has low computation cost compared to Newton type algorithm.

B. Subproblem w.r.t. variables D and α
For the second subproblem of (6) w.r.t.D, we have The minimizer D has a close form [48] as where R(u)α * = U ΣV * , with corresponding eigenvectors U, V and singular values Σ by singular value decomposition (SVD).
For the third subproblem of (6) w.r.t α, we have For the isotropic L 0 norm, it has a close form solution known as "hard thresholding" as below, where the hard thresholding Thresh τ (α) is defined as For the anisotropic case, similarly, one readily obtains the close form solution by separating the real and imaginary parts (21) Therefore, we can ready to list an overall algorithm for our proposed model (3) as follows: Algorithm I: AMM for "DicPR" (3) 1. Initialization: Initialize D 0 by a discrete cosine dictionary, α 0 = 0 and k = 0, and parameter τ, η. 2. Solve u k+1 by Algorithm I-I with Y := D k α k .3. Solve D k+1 by (18) where u := u k+1 and α := α k .4. Solve α k+1 by (19) or (21) where u := u k+1 and D := D k+1 . 5. If some stopping condition is satisfied, stop the iterations and output the iterative solution; else set k = k + 1, and goto Step 2.
Remark III.2.In order to accelerate the computation, in the first few iterations (within three iterations in the numerical experiments), we use a truncated version for (18) where only parts (one half in our numerical examples) of the all columns of U and V are selected corresponding to the larger singular values.
One can readily conclude to the decrease of the functional values of iterative sequences, and we list it as below without proof, which can be directly obtained by (6) following Algorithm I.
Lemma III.1.The objective functional values of the iterative sequences generated by Algorithm I are non-increasing, i.e.
We give an assumption of operator A as follows.
Assumption 1.The sequence {u k } ∞ k=0 is bounded if and only if {|Au|} ∞ k=0 is bounded.Remark III.3.In order to guarantee the uniqueness of the solution for phase retrieval, the oversampling or multiple measurements are required.As a result, it make the above assumption available.For example, for the coded diffraction patter and ptychographic pattern, A * A is a diagonal matrix as (13).If we assume that the diagonal elements are all nonzeros, Assumption 1 holds.In practise, the matrix by the orthogonality of the dictionary, which leads to the boundedness of {α k }.Proof: Since {Z k } is bounded, there exists a subsequence {Z kn } ∞ n=0 ⊂ {Z k } with the limit point Z = (ũ, D, α), s.t.lim n→∞ Z kn = Z.Therefore by Lemma III.1, we conclude to the theorem.
Here we only derive the local convergence, and more future work should be done in order to investigate the convergence of global minimizer.

IV. ALGORITHM II: PROXIMAL ALTERNATING LINEARIZED MINIMIZATION (PALM)
In order to guarantee the global convergence, we propose a proximal alternating linearized minimization method (PALM) to solve the established dictionary learning model (3) following the multi-block splitting algorithm proposed in [50].First we give the derivative of H(u, D, α) as below: The PALM is given as with with with with three positive steps c k , d k and e k .For u−subproblem in (23) for PALM, one can readily give the ADMM following Algorithm I-I.Only the solver for u in Step 2 has slightly difference, and we directly list Algorithm II-I as follows.
For the D−subproblem in (24) for PALM, we need to solve the following problem as One can readily derive the closed form for it as with For the α−subproblem in (25), one can directly get the closed form represented by the hard thresholding as and ) for the isotropic and anisotropic L 0 pseudo norms respectively.
Therefore we can give an overall PALM for our proposed model (3) as below.In the following parts, we will provide convergence analysis for PALM.For simplicity, the analysis is conducted for realvalued images, and it can readily generalized to the complexvalued images, where one only needs to rebuild the related norms and operators of variables on their corresponding real and imaginary parts respectively as in [41].
The existence of the critical point of proposed model is given below.
Lemma IV.1.The critical point of (8) exists with the finite functional value.
Proof: Following Lemma 5 in [50] and the lower semicontinuity of F, I D and • 0 , one can readily prove it, and details are omitted.
We need the assumption of the boundedness of u k as follows.
Based on above assumption, the boundedness of the iterative sequences (u k , D k , α k ) can be proved, and see details in Lemma A.1 in the appendix.The proof for the first item is trivial.By (22) and Assumption 2, one can readily finish the proof for the left parts based on the boundedness of iterative sequences proved by Lemma A.1 in the Appendix.

Lemma IV.2 (Lipschitz-Gradient
For the PALM, we assume the step sizes should be large enough and the following assumption is needed. Assumption 3 (Step Sizes).
Then we can consider the global convergence of PALM.Readily one knows that Υ(u, D, α) is semi-algebra [50], [51] with the data term B(•, •) in ( 2), and finally we can get the final convergence theorem as follows.
Theorem 3. Let Assumption 2, 3 hold and e k > 1/2.The sequence {u k , D k , α k } generated by Algorithm II globally converges to a critical point of the proposed model (4).
Proof: Based on Lemma A.2 and Lemma A.3 in the Appendix, one can finish the proof following Theorem 1 in [50].
Remark IV.1.Although in this paper, we focus on the orthogonal dictionary, the proposed PALM can also be applied to the case with non-orthogonal dictionary, and further speedup of PALM should be investigated as in [51].We leave them as future work.
At the end of this section, we analyze the computational complexity of Algorithm I and Algorithm II, which have similar cost.Assume that the operator A generates Fourier masked measurements which means fast Fourier transformation can be adopted, computation complexity for Algorithm I-I and Algorithm II-I is of O(n log(n)T in + ln) after T in inner iterations.The complexity for Step 3 and 4 is O(l3 + l2 d).
Hence the complexity of Algorithm I and Algorithm II is O (n log(n)T in +ln+l 3 +l 2 d)T out after T out outer iterations.
In our experiments, we set d ≈ n, l n and as a result the complexity is about O n(log(n)T in + l 2 )T out .

V. NUMERICAL EXPERIMENTS
All the tests are performed on a laptop with Intel I7-5600U2.6GHZ,and 16GB RAM, and the codes are implemented in MATLAB.The dictionary is initialized by discrete cosine transform.The model parameters η and τ for Algorithm I and Algorithm II are selected by hand.In the inner iterative algorithm e.g.Algorithm I-I and Algorithm II-I, we set defaulted parameter r = 1 × 10 −3 , and the defaulted inner iteration number to five.For the left parameters for Algorithm II, one can use dynamic schemes to update the parameters as [50], [51].In our experiments, for simplicity, these parameters are set to be fixed.We stop Algorithm I and Algorithm II after a given maximum outer iteration number T to guarantee the convergence, which will be specified in the following subsections.All the other needed parameters will also be addressed in the following subsections if we do not give or use the defaulted values.
Set image patch size to 8 × 8 empirically, and hence l = c = 64.If the patch sizes are too large, the computational cost increases dramatically based on the given complexity analysis.The number of patches d = ( √ n − 7) 2 for square images with n pixels.In this paper we have given a framework for phase retrieval with arbitrary linear operator A. However, it it more practical to consider Fourier type transformation, and we will show the performance on Fourier masked measurements involved with two types of patterns for linear operators A: Coded diffraction pattern (CDP) with random masks and ptychographic phase retrieval with deterministic masks generated by zone plate lens.Especially, ptychographic phase retrieval is a very promising technique to generate high resolution images with large field of view compared with the traditional diffraction imaging and meanwhile it requires less temporal and spatial coherence, while the related phase retrieval problem is more challengeable.
The ground truth images are provided in Fig. 1, where four real-valued images with resolution 512 × 512 are put in Fig. 1(a)-(d), and a complex-valued image with resolution 256 × 256 is put in Fig. 1(e)-(g).Given a ground truth u, the noisy measurement is generated as f (j) = Poisson(|(Au δ )(j)| 2 ) ∀j ∈ Ω, with u δ = δu at peak level 2 δ.We measure the quality of the reconstructed image ũ by signal-to-noise ratio (SNR) 3 with the ground truth image u.In order to measure the sparsity of coefficient matrix α ∈ C c,d or R c,d , we introduce the We compare our proposed methods with ADMM for phase retrieval algorithm without any regularization [9] and the total variation (TV) based Poisson noise removal method [41].Denote ADMM for phase retrieval [9], TV based Poisson denoising method [41], our proposed Algorithm I and Algorithm II with isotropic L 0 norm by "PR", "TVPR", "ALGI" and "ALGII", respectively.We denote Algorithm I with anisotropic L 0 term for complex-valued images by "ALGI aniso ".Since our proposed algorithms solve the nonconvex optimization problem, they are initialized by the results of "PR" in order to increase the robustness and convergence speed.

A. CDP for real-valued and complex-valued images
For the first pattern, the octanary CDP is explored, and specifically each element of I j in ( 12) takes a value randomly among the eight candidates, i.e., Set the number of masks K = 2, 4 as [41] for real-valued and complex-valued images, respectively.
1) Real-valued Images: We will show the performance of our proposed algorithms from noisy measurement on four realvalued images shown in Fig. 1 Reconstructed results with different noise levels are put in Fig. 2-Fig.3, and zoom-in parts of corresponding results are put in Fig. 4-Fig.5. Readily one can see that "PR" generates very noisy results by observing the first rows of Fig. 2-Fig.3 and the second rows in the zoom-in results of Fig. 4-Fig.5, where the edges, background and the repetitive structures are contaminated severely.By TV regularization method and our proposed dictionary learning methods, all the recovery images have sharper edges and cleaner background.However, by observing the zoom-in results in the third rows of Fig. 4-Fig.5, "TVPR" generates images with visible stair case artifacts, and some important texture information can not be kept, while "ALGI" and "ALGII" can both produce high quality images and one does not find any visible staircase artifacts in the zoom-in parts.By observing Fig. 4(m), (q) or Fig. 5(m), (q), "ALGI" and "ALGII" generate cleaner background than "TVPR".Moreover, by Fig. 4(n)-(p), (r)-(t) or Fig. 5(n)-(p), (r)-(t), the textures are preserved pretty well compared with the results by "TVPR".In order to qualify the improvement of our proposed methods, the corresponding SNRs are put in Table I and Table II, where the maximum in the same row is marked in bold font.Inferred from these two tables, SNRs by "TVPR", "ALGI" and "ALGII" are about double of those by "PR", and SNRs by "ALGI" and "ALGII" increase averagely about 1.5dB and 2dB, respectively.The increase or improvement with the proposed methods compared with "PR" and "TVPR" is more obvious on the image with   more texture information e.g."Fingerprint" or "Barbara" than that e.g."Peppers" with piecewise smooth features, which demonstrates that its advantage is to handle images with sophisticated texture features.
By observing the zoom-in parts, the results in the fourth rows of Fig. 4 and Fig. 5 seem more smoothing by "ALGI" than those in the fifth rows by "ALGII".On the other hand, the results by "ALGII" contain a bit more features than by those by "ALGI".In order to investigate the performance differences of "ALGI" and "ALGII", the sparsity of the coefficient matrix α is put in Table III, and one can readily see that the coefficient matrix is sparser by "ALGI" than by "ALGII", which possibly leads to those above behaviours.Anyway, both two algorithms produce comparable results with cleaner background and wellpreserved textures.Hereafter, we only provide the results by "ALGI" since it has fewer parameters and seems more suitable for practical use.
2) Complex-valued Image: More experiments on the complex-valued image "Goldballs" in Fig. 1 for CDP with peak level δ ∈ {0.08, 0.1} are performed, and please see the reconstructed results in Fig. 6.Notice that for complex valued images, we introduce two kinds of definitions for L 0    pseudo norm in (5), and therefore we have two versions of PALM, i.e. "ALGI" and "ALGI aniso ".Set the parameters η = 3.0 × 10 −5 , τ = 1.0 × 10 −3 for "ALGI" and η = 2.5 × 10 −5 , τ = 8.0 × 10 −4 for "ALGI aniso ".Set iteration number in the outer loop as T = 50.By observing Fig. 6, from noisy measurements one can only derive noisy results by "PR".By "TVPR", "ALGI" and "ALGI aniso ", the recovery results are almost noise free and also have very clean background."TVPR" can only recover the large scale structures at the left top corner, but can not keep the smaller repetitive structures.Obviously our proposed methods can produce better results, where both the large and small structures can be preserved well.Related SNRs are put in Table IV increases are gained by "ALGI", and "ALGI aniso " compared with "TVPR".One also notices that "ALG aniso " produces the recovery results with higher accuracy than "ALGI".In order to investigate the improvements by the anisotropic norm, we show the learned dictionaries in Fig. 7, and the sparsity of coefficient matrix α in Table V. Especially in Fig. 7(d) and (h) the imaginary parts of learned dictionary by "ALGI aniso " seem to have more features than those in Fig. 7(b) and (f) by "AlGI".Meanwhile the real parts of learned dictionaries by the anisotropic version algorithm are quite close to the imaginary parts, which is consistent with the similarity of structure of real and complex parts of ground truth in Fig. 1(f) and (g).Therefore, it produces better dictionaries by the anisotropic style in (5).As a result, the sparsity is strengthened, and one can observe that the sparsity levels S( (α)) are much smaller by "ALGI aniso " than those by "ALGI" in Table V.It demonstrates that anisotropic version algorithm can help to improve the image qualities for complex-valued images.B. Ptychographic phase retrieval (PtychoPR) for complexvalued image 1) Fixed Sliding Distance: For PtychoPR, a zone plate lens and responding illumination mask are employed as [13].VI.Inferred from Fig. 8(a) and (e), it seems very blurry in the results of "PR" from noisy measurements, since the phaseless data are generated by structured deterministic illumination, and corrupted low frequency parts are worse than high frequency parts.By "TVPR", at higher noise level δ = 0.2 in the first row of Fig. 8, it can produce results with sharp edges for large scale features and clean background, but can not preserve the smaller features at all.For the case with peak level δ = 0.5, "TVPR" can produce pretty good recovery results for both smaller and larger scale features.The proposed "ALGI" and "ALGI aniso " can recover the smaller features very well especially at peak level δ = 0.2.Inferred from Table VI, SNRs are increased about 3.5dB, 4dB for "ALGI" and "ALGI aniso ", respectively compared with "TVPR" at peak level δ = 0.2; SNRs are increased about 1.5dB, 1.7dB for "ALGI" and "ALGI aniso " respectively compared with "TVPR" at peak level δ = 0.5.Such gains in SNRs implied our proposed algorithms can produce more accuracy results.Similarly to the previous subsection for complex-valued image, the anisotropic version algorithm "ALGI aniso " has better performances than isotropic version algorithm.2) Variable Sliding Distance: In order to further study the robustness of the proposed algorithms, more experiments with different sliding distances (collecting less data by increasing the sliding distances) are done.Set peak level δ = 0.2.Same parameters as the previous tests are used.Performances for PtychoPR with different number of measurements by increasing SlidDist are shown in Fig. 9, where m/n = 12.25, 9, and 7.56 when the sliding distances SlidDist = 18, 20, 22 respectively.On can see that in the first column of Fig. 9, the results of "PR" are not only blurry, but also contain some visible structured artifacts."TVPR" can remove such artifacts, and recover some edges of large scale feature.Our proposed algorithms can further recover most of smaller features in Fig. 9(c) and (d).In an extreme case with SlidDist = 22 shown in the third row of Fig. 9, the proposed algorithms can also recover some parts of smaller features.We also put the corresponding SNRs in Table VII, where one can see the obvious gains at about 1.3dB, 1.6dB for "ALGI" and "ALGI aniso " respectively compared with "TVPR".

C. Convergence Study
To check the convergence of proposed algorithms, we monitor the histories of SNRs and the successive errors of iterative solution u k and D k w.r.t. the iteration number k, which are defined as . We show the histories of errors and SNRs in Fig. 10    which is consistent with the provided theories.It seems that the dictionary converges fast by "ALGII" than by "ALGI".By inferred from Fig. 10 (c) and (f), SNR first increase, and then get stable, which demonstrates that the proposed algorithms are quite robust.Moreover, when the noise level increases, more iterations are needed.It is very interesting and important to give an optimal iteration condition, and we leave it as a future study.

VI. CONCLUSION
We propose a novel orthogonal dictionary learning based model to denoise the phaseless measurements contaminated by Possion noise.Meanwhile, two efficient algorithms are designed, i.e. locally convergent AMM and globally convergent PALM.The experiments demonstrate that both two algorithms are capable of producing higher quality results, especially preserving the texture features compared with the TV denoising method.An incoherent dictionary based method and further speed up of the proposed algorithms should be investigated in the future.

APPENDIX
One can obtain the following lemma to guarantee the boundedness of iterative solutions for Algorithm II immediately.By (31), |1− 1 e + < 1, and finally one obtains the boundedness of {α k }.
One can also readily estimate the low bound of the subgradient of H as follows.
Lemma A.3.Denote three quantities for the subgradient as . and with λ := max k max{c k−1 , d k−1 , e k−1 , λ + D , λ − D , λ + α }.Proof: Eqn.(32) can be readily proved by computing the first order optimal condition of ( 23) - (25).We just estimate the bounded of A k D as a example, and it is similar for the other two.and A k α ≤ (e k−1 + λ + α ) α k−1 − α k .By summing up the above three estimates, pwe conclude to this lemma.

Fig. 7 . 6 .
Fig. 7. Learned dictionaries for CDP corresponding to the results in Fig. 6.First row: δ = 0.08; Second row: δ = 0.1.From left to right: Real and complex parts of learned dictionaries by ALGI in the first two columns, and by ALGI aniso in the third and fourth columns.

TABLE III SPARSITY
S(α) FOR CDP ON REAL-VALUED IMAGES.

TABLE V SPARSITY
FOR CDP ON COMPLEX-VALUED IMAGES WITH DIFFERENT L 0 PSEUDO NORMS.

TABLE IV SNRS
OF CDP FOR COMPLEX-VALUED IMAGE

TABLE VI SNRS
OF PTYCHOPR FOR COMPLEX-VALUED IMAGE CORRESPONDING TO FIG.8

TABLE VII SNRS
OF PTYCHOPR WITH DIFFERENT SLIDING DISTANCES FOR COMPLEX-VALUED IMAGE CORRESPONDING TO FIG.9(a)