Superiorization of the EM Algorithm and Its Application in Single-Photon Emission Computed Tomography (SPECT)

In this paper, we present an efficient algorithm to implement the regularization reconstruction of SPECT. Image reconstruction with a priori assumptions is usually modeled as a constrained optimization problem; however, there is no efficient algorithm to solve it due to the large scale of the problem. In this paper, we use the superiorization of the expectation maximization (EM) iteration to implement the regularization reconstruction of SPECT. We first investigate the convergence conditions of the EM iteration in the presence of perturbations. Secondly, we design the superiorized EM algorithm based on these convergence conditions, and then propose a modified version of it. Furthermore, we give two methods to generate the desired perturbations for two special objective functions. Numerical experiments for SPECT reconstruction were conducted to validate the performance of the proposed algorithms. The experiments show that the superiorized EM algorithms are more stable and robust to noisy projection data and to the choice of initial image than the classic EM algorithm, and outperform the classic EM algorithm in terms of mean square error and visual quality of the reconstructed images.


Introduction
Single-photon emission computed tomography (SPECT) visualizes the physiological information of various organs with the help of radiopharmaceuticals [1,2,3], biochemical molecules labeled with radioactive isotopes. The gamma rays emitted by the injected radioactive material are recorded by a gamma camera rotating around the patient. The goal of SPECT is to reconstruct the radionuclide distribution from the measurements numerically. The variety of existing SPECT reconstruction algorithms can be split into a family of analytical methods [4,5,6,7,8] and a wide class of iterative techniques [9,10,11].
From the analytic point of view, the SPECT reconstruction problem [3,4,7] is to invert the attenuated Radon transform (aRt) of f (the distribution of the radiopharmaceutical),

(R_µ f)(ϕ, s) = ∫_{−∞}^{+∞} f(sθ + tθ⊥) exp( −∫_t^{+∞} µ(sθ + τθ⊥) dτ ) dt,   (1)

where µ is a known function, referred to as the attenuation map of the gamma rays, θ = (cos ϕ, sin ϕ) and θ⊥ = (− sin ϕ, cos ϕ). In practice, f and µ are two functions with compact support Ω. Therefore, the integrand in (1) is zero outside a bounded interval, and the integral is written over (−∞, +∞) for convenience. For the iterative methods, the SPECT reconstruction problem is to solve the linear system [9]

Ax = b,   (2)

where the observed data b = (b_1, b_2, · · · , b_M)^t ∈ R^M, the unknown image x = (x_1, x_2, · · · , x_N)^t ∈ R^N and the system matrix A = (a_ij) ∈ R^{M×N} are all nonnegative.
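As a toy illustration of how the discretized attenuated projection in (1) weights each emission pixel, the following numpy sketch computes attenuated line integrals along image rows for a single axis-aligned view; the detector position (at the right end of each row) and the unit pixel spacing are simplifying assumptions, not the paper's geometry.

```python
import numpy as np

def attenuated_projection_rows(f, mu):
    """Attenuated line integrals along image rows (one axis-aligned view),
    a discrete analogue of the aRt (1): each emission pixel f[t] is weighted
    by exp(-sum of mu between the pixel and the detector)."""
    # tail[t] = sum of mu over pixels strictly between t and the detector
    tail = np.cumsum(mu[:, ::-1], axis=1)[:, ::-1] - mu
    return np.sum(f * np.exp(-tail), axis=1)

# Uniform emitter, uniform attenuation of ln 2 per pixel: each pixel is
# halved once per remaining attenuating pixel on the way to the detector.
proj = attenuated_projection_rows(np.ones((1, 4)), np.log(2) * np.ones((1, 4)))
# proj[0] == 0.125 + 0.25 + 0.5 + 1.0 == 1.875
```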
Here and in the following, the superscript t denotes the transpose of a vector. The aim is to reconstruct the unknown x as an image from the projection data b via stable algorithms. A direct solution with conventional methods is not feasible because of the noisy projection data b, the ill-posedness and the large scale of the problem. For SPECT reconstruction, the system matrix A can model not only the attenuation of the gamma rays, but also realistic factors such as photon scattering and camera blurring. The expectation maximization (EM) algorithm [9,12] and the algebraic reconstruction technique (ART) [13,14] are two widely used techniques in the imaging sciences, due to their simplicity, efficiency and performance. In practice, the EM algorithm is more appropriate for emission tomography, including SPECT. Firstly, the EM algorithm maintains the nonnegativity constraint throughout the iteration. Secondly, the EM algorithm is relatively robust against data inconsistencies introduced by Poisson noise, because it seeks to minimize the Kullback–Leibler (K-L) distance between the measured data b and the projection Ax of the estimated image, which is equivalent to maximizing the Poisson likelihood.
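To make the update rule concrete, here is a minimal numpy sketch of the classic EM (MLEM) iteration for Ax = b; the toy system, its sizes and the iteration count are illustrative assumptions rather than the paper's SPECT setup.

```python
import numpy as np

def em_step(x, A, b, eps=1e-12):
    """One classic EM (MLEM) step: x <- (x / H) * A^t (b / (A x)),
    where H_j = sum_i a_ij. All entries of A, b, x are nonnegative."""
    H = A.sum(axis=0)                 # column sums H_j
    d = A @ x                         # forward projection d_i(x)
    ratio = b / np.maximum(d, eps)    # b_i / d_i(x), guarded against zeros
    return (x / np.maximum(H, eps)) * (A.T @ ratio)

# Toy consistent system (illustrative only).
rng = np.random.default_rng(0)
A = rng.random((8, 5))
x_true = rng.random(5) + 0.1
b = A @ x_true                        # noise-free "projection data"
x = np.ones(5)                        # positive initial image
for _ in range(500):
    x = em_step(x, A, b)              # iterates stay positive throughout
```

Note that the multiplicative form of the step is what preserves nonnegativity automatically, as discussed above.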
For SPECT reconstruction, one of the main issues is to estimate the radionuclide distribution from low-count projection data. This issue occurs quite frequently because of practical constraints, such as imaging hardware and scanning geometry. Furthermore, in order to reduce acquisition time, radiation dose and imaging cost, one has to decrease the counts of the projection data. However, this causes a strong deterioration of the observed data and under-determinacy (M ≪ N) of the linear system (2). In these situations, the images reconstructed by the EM algorithm are usually dominated by various distortions, because the EM algorithm accepts any solution that minimizes the K-L distance and depends on the initial point.
The quality of the reconstructed images can be improved by regularization reconstruction methods. There are two regularization reconstruction models, unconstrained optimization [15,16,17,18,19,20]

min_x D(Ax, b) + λ φ(x),   (3)

and constrained optimization [21,22,23]

min_x φ(x)  subject to  D(Ax, b) ≤ ε,   (4)

where φ is a convex function representing the prior knowledge, and D(·, ·) denotes a distance function, such as the l_2 or K-L distance. The parameters λ and ε are nonnegative and related to the noise level: the lower the noise level, the smaller the parameters. For the first optimization problem (3), there are various algorithms [17,18,20,24]. In this paper, we focus on the second optimization problem (4). To our knowledge, there is no algorithm that solves the constrained optimization problem (4) efficiently at this scale. We therefore resort to an emerging approach called superiorization [25] to implement the regularization reconstruction of SPECT.
The superiorization of iterative methods, first proposed by the authors of [21], is a relaxation technique for constrained optimization. A superiorized algorithm lies between feasibility-seeking algorithms, which seek any feasible point in the constraint set, and optimization algorithms, which seek the minimizer of the objective function over the constraint set. The aim of a superiorization algorithm is to find a point of the constraint set that is superior with respect to the objective function, rather than the optimal point or merely a feasible one. The basic idea of superiorization is to run a feasibility-seeking algorithm with perturbations that steer the iterates toward smaller values of the objective function.
The superiorization of ART algorithms has been studied and applied to the regularization reconstruction of computed tomography (CT) [21,22,25,26,27]. The authors of [21,26] first investigated the convergence of two variants of ART under summable perturbations for the consistent case. For the inconsistent case, the authors of [27] proved the convergence of a symmetric version of ART in the presence of summable perturbations. The superiorization of the EM algorithm was first proposed in [28] and applied to bioluminescence tomography. However, to our knowledge, the convergence of the superiorized EM algorithm is still an open problem.
In this paper, we first discuss the convergence of the EM algorithm in the presence of perturbations in section 2. A so-called bounded perturbation resilience (BPR) property of ART is vital in the proof of convergence for the perturbed version of ART. However, we cannot prove the BPR property of the EM iteration so far, because of the nonlinearity of the EM operator. Therefore, we investigate the convergence of the perturbed EM algorithm under the following assumptions. Firstly, the perturbations should maintain the positivity of the iterates. Secondly, the perturbations should go to zero as the iteration proceeds. Lastly, the perturbed EM iteration should decrease the K-L distance between the observed data and the projection of the estimate at each iteration.
Based on the convergence conditions, we present the superiorized EM algorithm and its modified version in section 3. Furthermore, practicable techniques are given to produce the desired perturbations for two special objective functions, total variation (TV) [29] and the l_1-norm [30], both widely used in the imaging sciences. While the proposed algorithms are applicable to diverse inverse problems, in this paper we restrict ourselves to demonstrating their usefulness for SPECT reconstruction. Numerical results for SPECT reconstruction are given in section 4. As expected, the superiorization algorithms output superior images compared with the classic EM algorithm in terms of MSE and visual quality. Conclusions and discussions are given in section 5.

Perturbations Resilience of EM Iteration
For ease of reference, we first introduce some notation and assumptions. In this paper, an image x is described as a vector of length N with individual elements x_j, j = 1, 2, · · · , N. When it is necessary to refer to pixels in the context of a two-dimensional (2D) image, we use the double subscript form x_{p,q}, where j = (q − 1)W + p, p = 1, 2, · · · , H, q = 1, 2, · · · , W, and the integers W and H are, respectively, the width and height of the 2D image array, which has a total of N = W × H pixels. Denote by R^N_+ the region {x ∈ R^N | x ≥ 0}. For the system matrix A, we set H_j = Σ_{i=1}^M a_ij and d_i(x) = Σ_{j=1}^N a_ij x_j for convenience. Furthermore, for a perturbation direction v^k we introduce the index sets S_k^- = {j | v_j^k < 0} and S_k^+ = {j | v_j^k > 0}. Let P denote the EM operator; the EM iteration x^{k+1} = P(x^k) = (P_1(x^k), · · · , P_N(x^k))^t for the problem (2) is then defined as

x_j^{k+1} = P_j(x^k) = (x_j^k / H_j) Σ_{i=1}^M a_ij b_i / d_i(x^k),

with an initial image x^0 > 0. Similarly, the perturbed version of the EM iteration is defined as

x^{k+1} = P(y^k),  where  y^k = x^k + β_k v^k.   (9)

Here, the vector v^k ∈ R^N and the number β_k ≥ 0 represent the direction and the length of the perturbation at the kth iteration, respectively. Hereafter, we call the EM iteration without perturbation the classic EM iteration to distinguish the two. It has been established that the sequence {x^k} generated by the classic EM iteration converges to a minimizer on R^N_+ of the K-L distance between b and Ax,

I_A^b(x) = I(b, Ax),  with  I(u, w) = Σ_j [u_j ln(u_j / w_j) + w_j − u_j],   (10)

where I(·, ·) denotes the K-L distance of any two nonnegative vectors. An important inequality used in this paper is

ln t ≤ t − 1  for t > 0,   (11)

with equality if and only if t = 1. Using the inequality (11), we have that I(x, y) ≥ 0 for all vectors x, y ≥ 0, with equality if and only if x = y.
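The K-L distance and one perturbed EM step (9) can be sketched in numpy as follows; the toy system, the chosen direction v and the step length are illustrative assumptions. The final EM application never increases I_A^b, which is the property exploited throughout this section.

```python
import numpy as np

def kl(u, w, eps=1e-12):
    """K-L distance I(u, w) = sum_j [u_j ln(u_j / w_j) + w_j - u_j], u, w >= 0."""
    u = np.maximum(u, eps)
    w = np.maximum(w, eps)
    return float(np.sum(u * np.log(u / w) + w - u))

def perturbed_em_step(x, v, beta, A, b, eps=1e-12):
    """One perturbed EM step x^{k+1} = P(y^k), y^k = x^k + beta * v.
    Assumes the columns of A are normalized so that H_j = 1."""
    y = x + beta * v
    assert np.all(y > 0)              # positivity condition of theorem 2.3
    d = A @ y                         # d_i(y)
    return y * (A.T @ (b / np.maximum(d, eps)))

rng = np.random.default_rng(1)
A = rng.random((10, 6))
A /= A.sum(axis=0)                    # enforce H_j = sum_i a_ij = 1
b = A @ (rng.random(6) + 0.5)         # consistent nonnegative data
x = np.ones(6)
v = -0.5 * np.ones(6)                 # a bounded perturbation direction
x1 = perturbed_em_step(x, v, beta=0.1, A=A, b=b)
```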
Obviously, if β_k = 0, the perturbed EM iteration coincides with the classic EM iteration, a case we are not interested in; therefore, in the following we assume β_k > 0. A natural question is under what assumptions the sequence {x^k} generated by the perturbed EM iteration (9) also converges to a minimizer of (10). Before discussing the convergence conditions of the perturbed EM iteration, we first summarize some propositions about the classic EM iteration [9,12,31]. Without loss of generality, we assume that b_i > 0 for all i and, for simplicity, that H_j = Σ_{i=1}^M a_ij = 1 for all j.

Theorem 2.1 For any initial image x^0 > 0, denote by x^k the estimate of the classic EM algorithm after k iterations. Then the following propositions hold:
1. x_j^k > 0 for all j and k.
2. I_A^b(x^{k+1}) ≤ I_A^b(x^k).
3. {x^k} converges to a minimizer x* of I_A^b(x) on R^N_+, and x* is a fixed point of the EM operator.
4. I(x*, x^{k+1}) ≤ I(x*, x^k).
The inequalities in the second and fourth items above hold with equality if and only if x^k is a fixed point of the EM operator. From the propositions above, we have that the sequence {I_A^b(x^k)} monotonically converges to the minimum of I_A^b(·) on R^N_+, and {x^k} approaches the minimizer x* gradually. Next, we investigate the convergence of the perturbed EM iteration, and prove analogous propositions as far as we can.

Theorem 2.3 Given any initial image x^0 > 0, denote by {x^k}_{k∈N} the sequence generated by the perturbed EM iteration (9),
where y_j^k = x_j^k + β_k v_j^k, and {v^k} is a bounded sequence. Suppose that
1. Positivity: y_j^k > 0 for all j and k.
2. Vanishing: β_k → 0 as k → ∞.
3. Decreasing: the perturbed iterates satisfy inequality (13), in which ρ is a sufficiently small positive number.
Then, the following propositions hold:
1. I_A^b(x^{k+1}) ≤ I_A^b(y^k) ≤ I_A^b(x^k).
2. {x^k} has a convergent subsequence {x^{m_k}}, and its limit x̄ is a fixed point of the EM operator.
3. I(x̄, x^{k+1}) ≤ I(x̄, x^k), and hence x^k → x̄.
4. x̄ is a minimizer of I_A^b(x).
Furthermore, the inequalities above hold with equality if and only if x^k is a fixed point of the EM operator and β_k v^k = 0.
Before presenting the proof, we explain the necessity of the conditions of theorem 2.3. The positivity condition on y_j^k is necessary for the nonnegativity constraints. The second condition is required for the convergence of the sequence {x^k}. Intuitively, the last condition guarantees the monotone decrease of the K-L distance I_A^b(x^k). An important concern is the existence of perturbations satisfying the conditions of theorem 2.3, especially the third one. Because the sequence {x^k} generated by the classic EM iteration (β_k = 0) satisfies the conditions in theorem 2.3, the sequence {x^k} generated by the perturbed version also satisfies them when the β_k are small enough at each iteration, by the continuity of the EM operator (9) and of the K-L distance (10). Therefore, perturbations satisfying the assumptions of theorem 2.3 exist.

Proof: We prove the theorem step by step, following the procedure in [31].
1. Proof of proposition 1: By the definitions of I_A^b(·) and d_i(·), we obtain the chain of inequalities (14) and (15), whose first and second inequalities hold because of the inequality (11) and the condition (13), respectively. Therefore, we have proved (14) and (15), i.e. I_A^b(x^{k+1}) ≤ I_A^b(y^k) ≤ I_A^b(x^k).
The sufficiency is obvious by theorem 2.1. The necessity follows from the facts below.
If equality holds, then all the inequalities in the derivation above hold with equality, which implies equality in (14) and (15), respectively, i.e. β_k d_i(v^k) = 0 and v_j^k ≥ 0. In fact, we then have v_j^k = 0 for all j, i.e. y_j^k = x_j^k, and x^k is a fixed point by theorem 2.1. Otherwise, assuming v_{j_0}^k > 0 for some j_0, there must exist i_0 such that a_{i_0 j_0} > 0, so that d_{i_0}(v^k) ≥ a_{i_0 j_0} v_{j_0}^k > 0, a contradiction.

2. Proof of proposition 2: To this end, we define for x ≥ 0 the function

f_j(x) = Σ_{i=1}^M a_ij b_i / d_i(x),

so that the EM iteration reads x_j^{k+1} = y_j^k f_j(y^k). By the second proposition of theorem 2.1 and proposition 1 of theorem 2.3, the limit x̄ of the convergent subsequence {x^{m_k}} is a fixed point of the EM operator, since I(x, y) = 0 ⟺ x = y. Therefore, we have f_j(x̄) = 1 if x̄_j ≠ 0 by the EM iteration formula.
3. Proof of proposition 3: Before going further, we first prove the general inequality (19) for Σ_j x̄_j ln(z_j / x_j) with z = P(x). The inequality and the second equality in its derivation are implied by Jensen's inequality and the fact that f_j(x̄) = Σ_{i=1}^M g_ij(x̄) = 1 for x̄_j ≠ 0, respectively; the last equality follows from the assumption that Σ_{i=1}^M a_ij = 1. Let x = y^k; then z = P(y^k) = x^{k+1}, and I(x̄, y^k) − I(x̄, x^{k+1}) is bounded from below by (19). Furthermore, we obtain (21), where the last inequality holds because I_A^b(x^{k+1}) ≥ I_A^b(x̄). Now we prove I(x̄, x^{k+1}) ≤ I(x̄, x^k). By the definitions of the K-L distance and the perturbed EM iteration, we arrive at (23); again we use the inequality (11) for the first inequality, while the second and the last inequalities hold by (21) and (13), respectively. Thus, I(x̄, x^{k+1}) ≤ I(x̄, x^k) by (23). Next, we prove that I(x̄, x^{k+1}) = I(x̄, x^k) if and only if β_k v_j^k = 0 and x^k is a fixed point of the EM operator. The sufficiency is clear. The necessity follows from the facts below. If I(x̄, x^{k+1}) = I(x̄, x^k), then all the inequalities in the derivation above hold with equality, which implies that β_k v_j^k = 0 and I_A^b(y^k) = I_A^b(x^{k+1}) by the inequalities (22) and (23), respectively. Therefore, y^k = x^k and x^k is a fixed point of the EM operator by theorem 2.1. Finally, we prove that x^k → x̄. Firstly, I(x̄, x^k) → 0, since I(x̄, x^{m_k}) → 0 and the sequence {I(x̄, x^k)} monotonically decreases. Therefore, x^k → x̄ by I(x, y) = 0 ⟺ x = y, and y^k → x̄ as well, by β_k v^k → 0.

4. Proof of proposition 4: In order to prove this proposition, we show that x̄ satisfies the Kuhn-Tucker (K-T) conditions. For the EM algorithm and the K-L function, the K-T conditions are equivalent to (1) f_j(x̄) = 1 if x̄_j > 0, and (2) f_j(x̄) ≤ 1 if x̄_j = 0 [9,31]. Obviously, the first condition holds because x̄ is a fixed point of the EM operator. For the second condition, the nonnegativity of f_j(x̄) is satisfied since x_j^k → x̄_j ≥ 0, b_i ≥ 0 and a_ij ≥ 0. Next, we need to prove that f_j(x̄) ≤ 1 for x̄_j = 0. Suppose, to the contrary, that f_j(x̄) ≥ 1 + ε > 1 and x̄_j = 0 for some j. Since y^k → x̄, there exists a sufficiently large integer L > 0 such that f_j(y^k) > 1 + ε_1 with ε_1 > 0 for all k ≥ L. Since x_j^k → x̄_j = 0, there must be infinitely many iterations with x_j^{k+1} < x_j^k. In the following, we assume that k > L. For each iteration satisfying x_j^{k+1} = y_j^k · f_j(y^k) < x_j^k, we have y_j^k < x_j^{k+1} < x_j^k, since f_j(y^k) > 1, and hence (25). By the assumption that f_j(y^k) > 1 + ε_1, subtracting 1 on both sides of (25) gives β_k v_j^k / x_j^k < −ε_1 / (1 + ε_1). By equation (15), we then obtain (26); the second inequality holds since y_j^k < x_j^k and B_k^+ ≤ Σ_{j∈S_k^+} x_j^{k+1}, and the last inequality follows from the definition of B_k^-. From (26), we see that the decrease of I_A^b(x^k) exceeds a fixed positive number at each iteration satisfying x_j^{k+1} < x_j^k. Since there are infinitely many such iterations, I_A^b(x^k) would decrease without bound, contradicting the nonnegativity of I_A^b(·). Therefore, f_j(x̄) ≤ 1 for x̄_j = 0, and x̄ is a minimizer of I_A^b(x).

Superiorization of the EM Algorithm
The SPECT regularization reconstruction can be modeled as the constrained optimization problem

min_x φ(x)  subject to  x ∈ E,   (27)

where φ is a convex function which assigns to each image x a number indicating the "undesirability" of the image in some sense. The set E is called the feasible set, and looking for a point in E is called the feasibility problem. To our knowledge, there is no efficient algorithm for the constrained optimization problem (27) at the scale of SPECT reconstruction. On the other hand, although the feasibility problem is also constrained, it can be solved efficiently by the classic EM algorithm [9] and its variants [10,11]. Based on these facts, we use the superiorization methodology to implement the SPECT regularization reconstruction.
For an objective function φ, the superiorized EM algorithm based on the conditions of theorem 2.3 is given as algorithm 1. In order to emphasize the objective function φ for which we are superiorizing, we refer to the superiorized algorithm as the φ-superiorization of the EM iteration.
Because the perturbation direction v^k is selected as a decreasing direction of φ at x^k, the size of β_k represents the strength of the regularization in some sense. Because condition 3 of theorem 2.3 is very strict, the numerical experiments show that {β_k} goes to zero very fast, so that the regularization becomes very weak. Therefore, we propose a modified version of algorithm 1, which is shown in algorithm 2.

Algorithm 1 Framework of the φ-superiorization algorithm
Initialization: β_0 > 0, x^0 > 0, k = 0, and 0 < γ < 1.
repeat:
  logic = true
  while logic
    find a decreasing direction v^k of φ at x^k such that y_j^k = x_j^k + β_k v_j^k > 0
    if φ(y^k) ≤ φ(x^k) and inequality (13) holds   ( * )
      logic = false, x^{k+1} = P(y^k), β_{k+1} = β_k, k = k + 1
    end(if)
    β_k = γ β_k
  end(while)

In algorithm 2, we only validate I_A^b(x^{k+1}) < I_A^b(x^k), rather than the inequality (13) of theorem 2.3. Since the inequality (13) implies I_A^b(x^{k+1}) < I_A^b(x^k) by theorem 2.3, algorithm 2 can be seen as a relaxation of algorithm 1. Furthermore, we require a relative decrease of I_A^b(·), to avoid the situation where the decrement I_A^b(x^k) − I_A^b(x^{k+1}) at an iteration is too small; intuitively, this accelerates the convergence of I_A^b(x^k) and x^k.
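A compact Python sketch of this framework follows. It uses the relaxed acceptance test of algorithm 2 (a plain decrease of the K-L distance), since inequality (13) involves the unknown limit x̄; the toy system, the stand-in objective φ(x) = ‖x‖², and all step parameters are illustrative assumptions, not the paper's SPECT configuration.

```python
import numpy as np

def kl(u, w, eps=1e-12):
    """K-L distance I(u, w)."""
    u = np.maximum(u, eps)
    w = np.maximum(w, eps)
    return float(np.sum(u * np.log(u / w) + w - u))

def em_op(y, A, b, eps=1e-12):
    """Classic EM operator P, assuming column sums H_j = 1."""
    d = A @ y
    return y * (A.T @ (b / np.maximum(d, eps)))

def superiorized_em(A, b, phi, direction, x0, beta0=1.0, gamma=0.5,
                    n_iter=30, beta_min=1e-10):
    """Sketch of the phi-superiorization loop: accept the perturbation only
    if phi does not increase and the K-L distance strictly decreases;
    otherwise shrink beta by the factor gamma and retry."""
    x, beta = x0.copy(), beta0
    for _ in range(n_iter):
        fx = kl(b, A @ x)
        while beta > beta_min:
            v = direction(x)                  # decreasing direction of phi at x
            y = x + beta * v
            if np.all(y > 0) and phi(y) <= phi(x):
                x_next = em_op(y, A, b)
                if kl(b, A @ x_next) < fx:    # relaxed decreasing condition
                    x = x_next
                    break
            beta *= gamma
        else:                                 # beta exhausted: plain EM step
            x = em_op(x, A, b)
    return x

# Toy problem; phi and its bounded descent direction stand in for the TV
# and l1 objectives discussed below.
rng = np.random.default_rng(2)
A = rng.random((12, 8))
A /= A.sum(axis=0)                            # enforce H_j = 1
b = A @ (rng.random(8) + 0.5)
x0 = np.ones(8)
phi = lambda z: float(np.sum(z * z))
direction = lambda z: -z / np.max(np.abs(z))  # bounded, |v|_inf = 1
x_rec = superiorized_em(A, b, phi, direction, x0)
```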
Algorithm 2 Modification of the φ-superiorization algorithm 1
Initialization: β_0 > 0, x^0 > 0, k = 0, and 0 < γ < 1.
repeat:
  logic = true
  while logic
    find a decreasing direction v^k of φ at x^k such that y_j^k = x_j^k + β_k v_j^k > 0
    if φ(y^k) ≤ φ(x^k) and I_A^b(P(y^k)) < I_A^b(x^k) with the prescribed relative decrease   ( ** )
      logic = false, x^{k+1} = P(y^k), β_{k+1} = β_k, k = k + 1
    end(if)
    β_k = γ β_k
  end(while)

In order to verify the inequality (13) of theorem 2.3 in algorithm 1, we should compute Σ_{j∈S_k^-} x̄_j and Σ_{j∈S_k^+} x̄_j at each iteration. However, x̄ is unknown during the iterative procedure. In practice, we estimate these values as Σ_{j∈S_k^-} x̄_j ≈ B|S_k^-|/N and Σ_{j∈S_k^+} x̄_j ≈ B|S_k^+|/N, where |S_k^-| and |S_k^+| denote the cardinalities of the sets S_k^- and S_k^+, respectively, and N and B = Σ_{i=1}^M b_i denote the length of x and the total counts. Lastly, the parameter ρ is chosen as 10^{-6} in this paper.
In the following, we discuss how to generate a desirable perturbation β_k v^k, or equivalently y^k, for two concrete objective functions, TV and the l_1-norm, such that the perturbation satisfies the acceptance conditions of the two φ-superiorization algorithms. Since TV regularization allows the reconstructed image to have sharp edges, TV-based models are widely used in the imaging sciences [15,16,26,29]. For an H × W image x with pixel values x_{i,j}, the TV of x is defined as

TV(x) = Σ_{i=1}^{H−1} Σ_{j=1}^{W−1} sqrt( (x_{i+1,j} − x_{i,j})² + (x_{i,j+1} − x_{i,j})² ),

where H and W are the height and width of x. In order to reduce the value of TV at x^k, we choose v^k as

v^k = −s^k / |s^k|_∞,

where s^k ∈ ∂TV(x^k) is a subgradient of TV at the point x^k, and |s^k|_∞ is the maximum absolute value of the components of s^k. Therefore, the sequence {v^k} is bounded. In fact, v^k is the normalization of s^k in the l_∞ norm, rather than in the l_2 norm used in [25,27]. In addition to TV-based models, l_1-norm minimization is another widely used technique in the imaging sciences [20,23,32,33]. Here, the l_1-norm is taken of the wavelet coefficients of x,

|T x|_1 = Σ_j |α_j|,

where the α_j are the coefficients of x under a given wavelet basis {ψ_j}, and the letter T denotes the wavelet decomposition operator. Although we could use the same method as for the TV function to reduce the wavelet l_1-norm, we introduce two more effective methods, the soft and hard thresholding schemes. Let γ_j^k be the thresholding residuals of the wavelet coefficients α_j^k of x^k, i.e. γ_j^k = (α̃_j^k − α_j^k)/β_k, where

α̃_j^k = sgn(α_j^k) max(|α_j^k| − Q_1, 0)   (soft thresholding)

or

α̃_j^k = α_j^k if |α_j^k| > Q_1, and 0 otherwise   (hard thresholding).

The perturbation direction at each iteration is defined as v^k = T^{-1}(γ^k), and y^k = x^k + β_k v^k. In fact, we need not compute v^k explicitly: by the linearity of the wavelet transform, we can obtain the wavelet coefficients of y^k directly as the thresholded coefficients α̃_j^k, and then y^k = T^{-1}({α̃_j^k}). In the numerical experiments, we use biorthogonal 6.8 wavelets with symmetric extension at the boundaries.
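The TV objective and the l∞-normalized descent direction described above can be sketched in numpy as follows; the small smoothing constant eps, which makes the subgradient single-valued, is our assumption for the sketch.

```python
import numpy as np

def tv(x, eps=1e-6):
    """Isotropic total variation of a 2D image; eps smooths the square root."""
    dx = np.diff(x, axis=0)[:, :-1]          # x_{i+1,j} - x_{i,j}
    dy = np.diff(x, axis=1)[:-1, :]          # x_{i,j+1} - x_{i,j}
    return float(np.sum(np.sqrt(dx**2 + dy**2 + eps)))

def tv_direction(x, eps=1e-6):
    """Perturbation direction v = -s / |s|_inf, with s the (smoothed)
    gradient of TV at x, so that {v^k} is bounded with |v|_inf <= 1."""
    dx = np.diff(x, axis=0)[:, :-1]
    dy = np.diff(x, axis=1)[:-1, :]
    mag = np.sqrt(dx**2 + dy**2 + eps)
    s = np.zeros_like(x)
    s[:-1, :-1] += -(dx + dy) / mag          # d TV / d x_{i,j}
    s[1:, :-1] += dx / mag                   # d TV / d x_{i+1,j}
    s[:-1, 1:] += dy / mag                   # d TV / d x_{i,j+1}
    m = np.max(np.abs(s))
    return -s / m if m > 0 else s

rng = np.random.default_rng(3)
img = rng.random((16, 16))
v = tv_direction(img)                        # a small step along v lowers TV
```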
For ease of reference, we use hard-superiorization and soft-superiorization to distinguish the two variants of the l_1-superiorization of the EM algorithm in this paper. Furthermore, in order to avoid y_j^k ≤ 0, in practice y_j^k is set to x_j^k / 2 whenever y_j^k ≤ 0.
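The two thresholding operators and the positivity safeguard above can be sketched as follows; the concrete coefficient values are illustrative, and the wavelet transform itself is omitted (the operators act on any coefficient array). Note that both operators can only shrink coefficients, so they never increase the l_1-norm.

```python
import numpy as np

def soft_threshold(a, q):
    """Soft thresholding: shrink each coefficient toward zero by q."""
    return np.sign(a) * np.maximum(np.abs(a) - q, 0.0)

def hard_threshold(a, q):
    """Hard thresholding: zero out coefficients with magnitude <= q."""
    return np.where(np.abs(a) > q, a, 0.0)

def clip_positive(y, x):
    """Safeguard from the text: replace nonpositive pixels y_j by x_j / 2."""
    return np.where(y > 0, y, 0.5 * x)

coeffs = np.array([0.3, -0.005, 0.02, -0.4, 0.0])
Q1 = 0.01                                   # threshold used in the experiments
soft = soft_threshold(coeffs, Q1)
hard = hard_threshold(coeffs, Q1)
```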

Numerical Results
In this section we investigate the performance of the proposed algorithms in several numerical experiments of SPECT reconstruction. To this end, the projection data were generated based on the following model. As shown in figure 1, the activity phantom consists of an ellipsoidal background (body region) with axes of length 22.5 cm and 30 cm, which contains two smaller ellipsoidal regions (lungs) with axes of length 10 cm and 8.8 cm, and a ring (myocardium) of inner and outer diameters 6 cm and 8 cm, respectively. The activities in the myocardium, background and lungs are in the ratio 3:2:1.
To simulate the attenuation coefficients in the chest, we utilized the phantom used in [10], which imitates a section of the human thorax. Besides the body background and lungs, the attenuation map contains two circular regions (bones) of diameter 2.5 cm (see figure 1). The attenuation coefficients were 0.03 cm^{-1} within the 'lung' regions, 0.17 cm^{-1} within the 'bone' regions, 0.15 cm^{-1} elsewhere within the body ellipse, and 0.00 cm^{-1} outside the body.
In the experiments, the activity and attenuation maps were evenly sampled on [−15, 15] × [−15, 15] with a grid of 128 × 128. A perfect parallel-hole collimator was assumed, and noise-free projection data were created via the attenuated Radon transform formula (1); the data thus include tissue attenuation, but neither scattering nor blurring. In order to simulate the quantum noise in the data, the following procedure was implemented [34].
• The projection data are scaled (multiplied by a constant factor) so that the number of counts is a predefined integer.
• Each value in the data set is then replaced by a random realization of a Poisson variant with a mean equal to that value.
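The two steps above can be sketched in numpy as follows; the stand-in sinogram is an illustrative assumption, not the phantom data.

```python
import numpy as np

def add_poisson_noise(proj, total_counts, seed=0):
    """Simulate quantum noise: scale the data to a prescribed total count
    (step 1), then replace each bin by a Poisson realization whose mean is
    the scaled value (step 2)."""
    proj = np.asarray(proj, dtype=float)
    scale = total_counts / proj.sum()
    rng = np.random.default_rng(seed)
    return rng.poisson(proj * scale)

# Illustrative stand-in for a 30-view, 128-bin sinogram.
clean = np.abs(np.random.default_rng(4).normal(1.0, 0.2, size=(30, 128)))
noisy = add_poisson_noise(clean, total_counts=100_000)
```

The expected total of the noisy data equals the prescribed count level, while each bin fluctuates with Poisson statistics.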
Two data sets, of sixty and thirty projections, were generated evenly over 180° with view angles ϕ_l = (l − 1)π/N_0 (l = 1, · · · , N_0, and N_0 = 60 or 30). The counts were recorded in 128 bins per projection, and the total counts were approximately 500K and 100K for the two projection data sets, respectively.
We present four numerical experiments for the proposed algorithms. The first and second experiments use projection data set 1 (60 projections, 500K counts) and projection data set 2 (30 projections, 100K counts) to validate the performance of the algorithms at different count levels. The third experiment uses a random initial image to test the robustness of the proposed algorithms. In the fourth experiment, we relax the condition of the TV-superiorized EM algorithms 1 and 2 further.
In order to evaluate the quality of the reconstructed images, we computed the mean image x* over 100 noise trials as the standard image for each of the two data sets,

x* = (1/100) Σ_{m=1}^{100} x̂_m,

where x̂_m (m = 1, 2, · · · , 100) is the image reconstructed in the mth trial by the classic EM algorithm, with 30 iterations per trial. As shown in figure 2, the mean images for the two data sets are visually clean, while single-trial images reconstructed by the classic EM algorithm are dominated by noise, especially for data set 2, due to its low count level. Thus, we use the mean square error (MSE) between an estimate x and the mean image x* from the same data set to measure image quality, where

MSE(x) = (1/N) Σ_{j=1}^N (x_j − x*_j)².

The outputs of the different algorithms were taken as the best estimates in terms of MSE. In order to compare the qualities of images reconstructed from different data sets, we introduce the relative MSE (RMSE), defined as

RMSE(x) = ‖x − x*‖_2 / ‖x*‖_2.

In the numerical experiments, a uniform initial image x^0 with constant value c = (1/N) Σ_{i=1}^M b_i was used for all experiments, unless stated otherwise. The parameter β_0 was chosen as c/2 and c/10 for the TV- and l_1-superiorized EM algorithms, respectively. The threshold Q_1 and the parameter γ were chosen as 0.01 and 1/2, respectively. In the following, we use TV-, hard- and soft-alg n (n = 1, 2) as abbreviations of the TV-, hard- and soft-superiorized EM algorithm n (n = 1, 2).

Experiment 1: sixty projections and 500K counts. As expected, the images produced by the superiorized EM algorithms are superior to the one produced by the classic EM algorithm in terms of TV value, l_1-norm and RMSE (see figure 3 and table 1), though the images produced by the superiorized algorithms 1 are visually indistinguishable from the one produced by the classic EM algorithm. Furthermore, the superiorized EM algorithm 2 is superior to the superiorized EM algorithm 1, because the regularization parameter β_k goes to zero very fast for the latter.
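As an aside, the error measures used in these experiments can be sketched as follows; the exact normalizations are our reading of the definitions in the text, and the toy images are illustrative.

```python
import numpy as np

def mse(x, x_star):
    """Mean square error against the standard (mean) image:
    MSE(x) = ||x - x*||^2 / N (our reading of the definition)."""
    return float(np.mean((x - x_star) ** 2))

def rmse(x, x_star):
    """Relative MSE, here taken as ||x - x*|| / ||x*|| (an assumption)."""
    return float(np.linalg.norm(x - x_star) / np.linalg.norm(x_star))

x_star = np.ones(100)       # stand-in for the mean image
x_est = x_star + 0.1        # stand-in for a reconstruction
```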
As shown in figure 4, the evolutions of I_A^b(x^k) and MSE(x^k) for the superiorized EM algorithms 1 validate the conclusions of theorem 2.3. Furthermore, the evolutions of I_A^b(x^k) and MSE(x^k) for the superiorized EM algorithms 2 suggest their convergence, although we cannot prove it theoretically. Because the inequality of condition 3 is very strict, the parameters β_k went to zero very fast for the superiorized algorithms 1. As a result, the reconstructed images and the evolutions of I_A^b(x^k) and MSE(x^k) of the superiorized EM algorithms 1 and the classic EM algorithm are indistinguishable from each other, i.e. the regularization effect of the superiorized EM algorithms 1 is not remarkable.
Although the number of iterations of the superiorized EM algorithms 2 is larger than for the classic EM and the superiorized EM algorithms 1 (table 1), the superiorized algorithms 2 are superior to the superiorized algorithms 1 in terms of RMSE: RMSE(x_{tv,13}) = 0.1668, RMSE(x_{h,13}) = 0.1741 and RMSE(x_{s,13}) = 0.1797, where x_{tv,13}, x_{h,13}, x_{s,13} denote the results of the TV-, hard- and soft-superiorized algorithms 2, respectively.

Experiment 2: thirty projections and 100K counts. From figures 5 and 6 and table 2, we can draw the same conclusions as in experiment 1. Comparing figures 3 and 5, we see that the visual quality of the images reconstructed from data set 2 is inferior to that of the images reconstructed from data set 1, because of the lower count level. Moreover, the advantage of the superiorized EM algorithms 2 over the superiorized EM algorithms 1 is much more remarkable for data set 2.
Comparing the results of the two experiments above, we observe that the TV-superiorized EM algorithm 2 is better in terms of RMSE, while the l_1-superiorized EM algorithms 2 are better in terms of TV and l_1-norm values. In addition, the thresholding operations cause Gibbs oscillations in the images reconstructed by the l_1-superiorized EM algorithms 2. Because the detector rotated from top to bottom through the left side, the gamma rays emitted from pixels in the right part of the image are more likely to be absorbed. Therefore, the right part of the reconstructed images is strongly blurred (see figures 2, 3 and 5).

Experiment 3: initial image x^0 with random values on the interval [1, 2] for data set 1. In this experiment, an initial image x^0 with random values on the interval [1, 2] and the projection data set 1 are used. The images reconstructed by the different algorithms are displayed in figure 7, and the corresponding TV values, l_1-norms and RMSEs are tabulated in table 3. Because the evolutions of I_A^b(x^k) and MSE(x^k) are very similar to those of experiment 1, we only plot the evolution of ln(1 + β_k) in figure 8.
This experiment shows that the superiorized EM algorithms 2 are stable and robust with respect to the initial image. By comparing figures 3 and 7, and tables 1 and 3, we see that the effect of the initial image is very strong for the classic EM algorithm and the superiorized EM algorithms 1, but very weak for the superiorized EM algorithms 2.
A surprising observation, made by comparing tables 1 and 3, is that the random initial image is superior to the uniform initial image for the superiorized EM algorithms 2 in terms of RMSE. This challenges the long-standing opinion about the selection of the initial image for EM-like algorithms, and suggests a new way to improve the quality of the reconstructed image.
Comparing the evolutions of ln(1 + β_k) in figures 4, 6 and 8, we can observe the following facts. Firstly, the parameters β_k for the superiorized EM algorithms 1 go to zero fast because of the strict condition 3 of theorem 2.3. Secondly, at early iterations the images reconstructed by the TV-superiorized algorithms from the uniform initial image are smoother than those from the random initial image; because of the TV-decreasing requirement in the acceptance conditions of algorithms 1 and 2, the parameter β_k therefore decreases at a fast rate in experiment 1 and at a slow rate in experiment 3. Thirdly, the thresholding operations (hard and soft) always reduce the l_1-norm, which implies that the acceptance conditions of the l_1-superiorized algorithms 1 and 2 reduce to a single condition on the decrease of the K-L distance.
The last observation suggests modifying the TV-superiorized EM algorithms further, by discarding the TV-decreasing requirement from the acceptance conditions.

Experiment 4: modification of the TV-superiorized algorithms 1 and 2. Figure 9 displays the images reconstructed by the modified versions of the TV-superiorized EM algorithms 1 and 2, i.e. in the absence of the TV-decreasing condition. The TV values, l_1-norms, RMSEs and iteration counts are tabulated in table 4, and figure 10 plots the evolutions of ln(1 + β_k).
As expected, the evolution of the parameter β_k for the modified version of the TV-superiorized algorithm 2 is similar to that of the l_1-superiorized EM algorithm 2, decreasing at a much slower rate. However, the modification has very little effect on the TV-superiorized algorithm 1.
Remarkably, the modified algorithms also reduce the TV function (see table 4), even though we do not check its decreasing condition, and the reconstructed images are even better than those reconstructed by the TV-superiorized EM algorithm 2. The reasons are twofold. In the superiorized algorithms 1 and 2, there are two conditions controlling the decrease of β_k, which causes β_k to become very small at large iteration numbers; the strength of the regularization is then very weak, and the reconstructed image is not good enough. In the modified superiorization algorithms, there is only one condition controlling the decrease of β_k, so β_k decreases at a lower rate; the modified algorithms can therefore maintain stronger regularization at large iteration numbers, and the reconstructed image is much better. A further study of this algorithm is left for future work.

Remark 4.1
The experiments show that the superiorized EM algorithm 2 converges, though we cannot yet prove this theoretically.
Remark 4.2 The parameter β_k represents the strength of the regularization in a sense, so we can obtain a regularized reconstruction by terminating the algorithms as soon as β_k falls below a predefined threshold. As explained above, since the size of β_k represents the regularization strength, the threshold should be related to the noise level: intuitively, in order to maintain the regularization strength, the higher the noise level, the larger the threshold.
A further discussion of the selection of this threshold is left for future work.

Conclusions and Discussions
In this paper, the convergence of the EM algorithm in the presence of perturbations is discussed, and the superiorized EM algorithm based on the convergence conditions, together with a modified version, is proposed. The numerical experiments validate the correctness of theorem 2.3. The superiorized EM algorithms efficiently reduce the objective functions for which we are superiorizing. Furthermore, the proposed algorithms are more stable and robust than the classic EM algorithm for low-count projection data and random initial images. Although the numerical experiments indicate the convergence of algorithm 2, we cannot prove its convergence theoretically. A more challenging task concerns the striking observation of experiment 4, which suggests modifying the superiorized EM algorithm further. In addition, we could not prove φ(x*) ≤ φ(x̄) theoretically, where x̄ and x* are the solutions produced by the classic iterative algorithm and the φ-superiorized version, respectively [25].