GENERAL CONVERGENT EXPECTATION MAXIMIZATION (EM)-TYPE ALGORITHMS FOR IMAGE RECONSTRUCTION

Abstract. Obtaining high quality images is very important in many areas of applied sciences, such as medical imaging, optical microscopy, and astronomy. Image reconstruction can be considered as solving the ill-posed inverse problem y = Ax + n, where x is the image to be reconstructed and n is the unknown noise. In this paper, we propose general robust expectation maximization (EM)-type algorithms for image reconstruction. Both Poisson noise and Gaussian noise are considered. The EM-type algorithms iterate between an EM step (or SART for weighted Gaussian noise) and a regularization step in the image domain. The convergence of these algorithms is proved in two ways: as EM algorithms with a priori information and as alternating minimization methods. To show the efficiency of EM-type algorithms, computed tomography reconstruction is chosen as the application.


1. Introduction. Obtaining high quality images is very important in many areas of applied science, such as medical imaging, optical microscopy, and astronomy. For some applications, such as positron emission tomography (PET) and computed tomography (CT), analytical methods for image reconstruction are available. For instance, filtered back projection (FBP) is the method most commonly used by manufacturers of commercial CT imaging equipment [41]. However, it is sensitive to noise and suffers from streak artifacts (star artifacts). An alternative to this analytical reconstruction is the iterative reconstruction technique, which is quite different from FBP. The main advantages of the iterative reconstruction technique over FBP are insensitivity to noise and flexibility [28]: the data can be collected over any set of lines, the projections do not have to be distributed uniformly in angle, and the projections can even be incomplete (limited angle). With the help of parallel computing and graphics processing units (GPUs), iterative methods can now be run very fast. Therefore, iterative methods are becoming more and more important, and we will focus on the iterative reconstruction technique only.
The degradation model can be formulated as a linear, ill-posed inverse problem:
(1) y = Ax + b + n.
Here, y is the measured data (a vector in R^M in the discrete case), and A is a compact operator (a matrix in R^{M×N} in the discrete case). For all the applications we will consider, the entries of A are nonnegative and A does not have full column rank.
x is the desired exact image (a vector in R^N in the discrete case), b is the background emission, and n is the noise (both vectors in R^M in the discrete case). We will consider the case without background emission (b = 0) in this paper. Since the matrix A does not have full column rank, computing x directly by inverting A is not reasonable because (1) is ill-posed and n is unknown. Even in the case without noise (n = 0), there are many solutions because A does not have full column rank. When there is noise in the measured data (n ≠ 0), finding x is even more difficult because n is unknown. Therefore regularization techniques are needed to solve these problems efficiently. One powerful technique for applying regularization is the Bayesian model; a general Bayesian model for image reconstruction was proposed by Geman and Geman [15] and Grenander [18]. The idea is to use a priori information about the image x to be reconstructed. In the Bayesian approach, we assume that the measured data y is a realization of a multi-valued random variable, denoted by Y, and the image x is also considered as a realization of another multi-valued random variable, denoted by X. Therefore, Bayes' theorem gives us
(2) p_X(x|y) = p_Y(y|x) p_X(x) / p_Y(y).
This is the conditional probability of having X = x given that y is the measured data. After inserting the detected value y, we obtain a posterior probability distribution of X. Then we can find x* such that p_X(x|y) is maximized; this is maximum a posteriori (MAP) estimation. In general, X is assigned a Gibbs random field, which is a random variable with the probability distribution
(3) p_X(x) ∝ exp(−βJ(x)),
where J(x) is a given energy function (J(x) can be non-convex), and β is a positive parameter. There are many different choices for J(x) depending on the application. Some examples are, for instance, the quadratic penalization J(x) = ‖x‖₂²/2 [12,34], the quadratic gradient J(x) = ‖∇x‖₂²/2 [54], the total variation J(x) = ‖∇x‖₁, and modified total variation [38,40,14,6,49,51]. We also mention Good's roughness penalization J(x) = ‖|∇x|²/x‖₁ [26] for a slightly different regularization. We assume here that ‖·‖₁ and ‖·‖₂ are the ℓ₁ and ℓ₂ norms respectively, and that ∇x denotes the discrete gradient operator of an image x.
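As an illustration of one common choice, the discrete (anisotropic) total variation J(x) = ‖∇x‖₁ can be computed with forward differences. This is a minimal sketch, not code from the paper; the boundary handling (dropping the last difference) is one common convention.

```python
import numpy as np

def total_variation(x):
    """Anisotropic total variation J(x) = ||grad x||_1 of a 2-D image,
    using forward differences inside the domain."""
    dx = np.abs(np.diff(x, axis=1)).sum()  # horizontal differences
    dy = np.abs(np.diff(x, axis=0)).sum()  # vertical differences
    return dx + dy

# A constant image has zero TV; a piecewise-constant step has small TV.
flat = np.ones((8, 8))
step = np.ones((8, 8)); step[:, 4:] = 2.0
print(total_variation(flat))  # 0.0
print(total_variation(step))  # 8.0: one unit jump in each of 8 rows
```

Penalizing this quantity favors piecewise-constant images while allowing sharp edges, which is why TV is popular for the reconstruction problems considered here.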
For the choice of the probability density p_Y(y|x), in the case of additive Gaussian noise we can choose
(4) p_Y(y|x) ∝ exp(−‖Ax − y‖₂²/(2σ²)),
and the minimization of the negative log-likelihood function gives us the famous Tikhonov regularization method [46]
(5) minimize_{x≥0} ½‖Ax − y‖₂² + βJ(x).
If the random variable Y of the detected values y follows a Poisson distribution [31,42] with an expectation value provided by Ax instead of a Gaussian distribution, we have
(6) p_Y(y|x) = Π_i (Ax)_i^{y_i} e^{−(Ax)_i} / y_i!.
By minimizing the negative log-likelihood function, we obtain the following optimization problem
(7) minimize_{x≥0} Σ_i ((Ax)_i − y_i log (Ax)_i) + βJ(x).
In this paper, we will focus on solving (5) and (7). It is easy to see that the objective functions in (5) and (7) are convex if J(x) is a convex function. Additionally, with suitably chosen regularization J(x), the objective functions are strictly convex, and the solutions to these problems are unique. If J(x) = ‖∇x‖₁, i.e., the regularization is the total variation, the well-posedness of the regularization problems is shown in [1] and [6] for Gaussian and Poisson noise respectively. In this paper, we also consider the more general case where J(x) is non-convex.
The paper is organized as follows. In section 2, we give a short introduction to the expectation maximization (EM) iteration, or Richardson-Lucy algorithm, used in image reconstruction without background emission, from the viewpoint of optimization. In section 3, we propose general EM-type algorithms for image reconstruction without background emission when the measured data is corrupted by Poisson noise. This is based on maximum a posteriori likelihood estimation and an EM step. In this section, these EM-type algorithms are shown to be equivalent to EM algorithms with a priori information. In addition, these EM-type algorithms can also be viewed as alternating minimization methods for equivalent optimization problems, and convergence results are obtained for both convex and non-convex J(x). When the noise is weighted Gaussian noise, we have similar EM-type algorithms. The simultaneous algebraic reconstruction technique is shown to be an EM algorithm in section 4, and EM-type algorithms for weighted Gaussian noise are introduced in section 5. In section 5, we also show the convergence analysis of EM-type algorithms for weighted Gaussian noise via EM algorithms with a priori information and alternating minimization methods. Some numerical experiments on image reconstruction are given in section 6 to show the efficiency of the EM-type algorithms. We end this work with a short conclusion section.

2. Expectation maximization (EM) iteration.
A maximum likelihood (ML) method for image reconstruction based on Poisson data was introduced by Shepp and Vardi [42] in 1982 for image reconstruction in emission tomography. In fact, this algorithm was originally proposed by Richardson [37] in 1972 and Lucy [33] in 1974 for image deblurring in astronomy. The ML method solves the special case of problem (7) without the regularization term, i.e., with J(x) constant, which means we do not have any a priori information about the image. From equation (6), for given measured data y, we have a function of x, the likelihood of x, defined by p_Y(y|x). Then an ML estimate of the unknown image is defined as any maximizer x* of p_Y(y|x).
By taking the negative log-likelihood, one obtains, up to an additive constant,
f₀(x) = Σ_i ((Ax)_i − y_i log (Ax)_i),
and the problem is to minimize this function f₀(x) on the nonnegative orthant, because we have the constraint that the image x is nonnegative. In fact, we have
f₀(x) = D_KL(y, Ax) + C,
where D_KL(y, Ax) = Σ_i (y_i log(y_i/(Ax)_i) + (Ax)_i − y_i) is the Kullback-Leibler (KL) divergence of Ax from y, and C is a constant independent of x. The KL divergence is considered as a data-fidelity function for Poisson data just as the standard least-squares ‖Ax − y‖₂² is the data-fidelity function for additive Gaussian noise. D_KL(y, ·) is convex, nonnegative and coercive on the nonnegative orthant, so minimizers exist and are global.
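The KL data-fidelity term is a one-liner in practice. The following sketch (with the convention 0 log 0 = 0 and assuming (Ax)_i > 0) verifies two basic properties stated above: it vanishes at a perfect fit and is nonnegative.

```python
import numpy as np

def kl_divergence(y, ax):
    """D_KL(y, Ax) = sum_i y_i log(y_i/(Ax)_i) + (Ax)_i - y_i,
    the Poisson data-fidelity term, with the convention 0 log 0 = 0."""
    y, ax = np.asarray(y, float), np.asarray(ax, float)
    mask = y > 0
    log_term = np.where(mask, y * np.log(np.where(mask, y, 1.0) / ax), 0.0)
    return np.sum(log_term + ax - y)

y = np.array([2.0, 1.0, 3.0])
ax = np.array([2.5, 1.0, 2.0])
print(kl_divergence(y, y))       # 0.0: zero at a perfect fit
print(kl_divergence(y, ax) > 0)  # True: nonnegative away from y
```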
In order to find a minimizer of f₀(x) with the constraint x_j ≥ 0 for all j, we can solve the Karush-Kuhn-Tucker (KKT) conditions [30,29],
Σ_i A_ij (1 − y_i/(Ax)_i) − s_j = 0, s_j ≥ 0, x_j ≥ 0,
where s_j is the Lagrange multiplier corresponding to the constraint x_j ≥ 0. By the positivity of {x_j}, {s_j} and the complementary slackness condition s^T x = 0, we have s_j x_j = 0 for every j ∈ {1, ···, N}. Multiplying by x_j gives us
x_j Σ_i A_ij (1 − y_i/(Ax)_i) = 0.
Therefore, we have the following iteration scheme
(8) x_j^{k+1} = (x_j^k / Σ_i A_ij) Σ_i A_ij y_i/(Ax^k)_i.
This is the well-known EM iteration, or Richardson-Lucy algorithm, in image reconstruction, and an important property is that it preserves positivity: if x^k is positive, then x^{k+1} is also positive. It can also be shown that at each iteration Σ_i (Ax^k)_i is fixed and equals Σ_i y_i, i.e., the iterates satisfy a weighted ℓ₁ constraint. Shepp and Vardi showed in [42] that this is equivalent to the EM algorithm proposed by Dempster, Laird and Rubin [13]. To make it clear, EM iteration means the special EM method used in image reconstruction, while EM algorithm means the general EM algorithm for solving missing data problems.
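The update (8) can be sketched in a few lines. The small system below is a hypothetical example, not from the paper; the test checks the property just stated, that Σ_i (Ax^{k+1})_i = Σ_i y_i after each step, and that the residual shrinks on noise-free data.

```python
import numpy as np

def em_step(A, x, y):
    """One EM (Richardson-Lucy) iteration (8):
    x_j <- (x_j / sum_i A_ij) * sum_i A_ij * y_i / (Ax)_i."""
    Ax = A @ x
    return (x / A.sum(axis=0)) * (A.T @ (y / Ax))

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(5, 3))   # nonnegative system matrix
y = A @ np.array([1.0, 2.0, 3.0])        # noise-free measurements

x = np.ones(3)                           # any positive initial guess
for _ in range(200):
    x = em_step(A, x, y)

print(np.allclose((A @ x).sum(), y.sum()))  # True: sum_i (Ax)_i = sum_i y_i
print(np.abs(A @ x - y).max() < np.abs(A @ np.ones(3) - y).max())  # True
```

Note that positivity of the iterates is automatic here because A, y, and x⁰ are all positive.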
3. EM-type algorithms for Poisson data. The method shown in the last section is also called maximum-likelihood expectation maximization (ML-EM) reconstruction, because it is a maximum likelihood approach without any Bayesian assumption on the images. If additional a priori information about the image is given, we have the maximum a posteriori probability (MAP) approach [21,32], which is the case with regularization term J(x). Again we assume here that the detected data is corrupted by Poisson noise, and the regularization problem is
(9) minimize_{x≥0} βJ(x) + Σ_i ((Ax)_i − y_i log (Ax)_i).
This is still a convex constrained optimization problem if J is convex, and we can find the optimal solution by solving the KKT conditions:
β ∂J(x)/∂x_j + Σ_i A_ij (1 − y_i/(Ax)_i) − s_j = 0, s_j ≥ 0, x_j ≥ 0.
Here s_j is the Lagrange multiplier corresponding to the constraint x_j ≥ 0. By the positivity of {x_j}, {s_j} and the complementary slackness condition s^T x = 0, we have s_j x_j = 0 for every j ∈ {1, ···, N}. Thus we obtain
β x_j ∂J(x)/∂x_j + (Σ_i A_ij) x_j − x_j Σ_i A_ij y_i/(Ax)_i = 0.
Notice that the last term on the left hand side is, up to the factor Σ_i A_ij, an EM step (8). After plugging the EM step x^{EM} into the equation, we obtain
β x_j ∂J(x)/∂x_j + (Σ_i A_ij)(x_j − x_j^{EM}) = 0,
which is the optimality condition for the following optimization problem
(10) minimize_{x≥0} βJ(x) + Σ_j (Σ_i A_ij)(x_j − x_j^{EM} log x_j).
Therefore we propose the general EM-type algorithms in Algorithm 1. The initial guess x⁰ can be any positive initial image, and ε, chosen for the stopping criterion, is a small constant. Num_Iter is the maximum number of iterations. If J(x) is constant, the second step is just x^k = x^{k−1/2}, and this is exactly the ML-EM from the previous section. When J(x) is not constant, we have to solve an optimization problem at each iteration. In general, the problem cannot be solved analytically, and we have to use iterative methods. However, in practice, we do not have to solve it exactly to steady state: a few iterations are sufficient. We will show that the algorithm converges even without solving it exactly.
Algorithm 1 Proposed EM-type algorithms.
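One iteration of Algorithm 1 can be sketched as follows. As a hedged illustration (not the paper's TV setting), we take the quadratic penalty J(x) = ‖x‖₂²/2 from the introduction, for which the regularization step (10) has a closed form; with TV one would instead run a few denoising iterations. The test matrix and data are hypothetical.

```python
import numpy as np

def em_type_step(A, x, y, beta):
    """One EM-type iteration for Poisson data with J(x) = ||x||^2/2 (sketch).
    Step 1 (EM): x_EM from the Richardson-Lucy update (8).
    Step 2 (regularization): solve (10), i.e. minimize
        beta*J(x) + sum_j a_j*(x_j - x_EM_j*log x_j),  a_j = sum_i A_ij.
    Its optimality condition beta*x_j^2 + a_j*x_j - a_j*x_EM_j = 0
    has the positive root used below."""
    a = A.sum(axis=0)
    x_em = (x / a) * (A.T @ (y / (A @ x)))
    return (-a + np.sqrt(a**2 + 4.0 * beta * a * x_em)) / (2.0 * beta)

def objective(A, x, y, beta):
    """MAP objective (9) with J(x) = ||x||^2/2."""
    Ax = A @ x
    return beta * 0.5 * np.sum(x**2) + np.sum(Ax - y * np.log(Ax))

rng = np.random.default_rng(1)
A = rng.uniform(0.1, 1.0, size=(6, 4))
y = A @ rng.uniform(0.5, 2.0, size=4)
x, beta = np.ones(4), 0.1
vals = [objective(A, x, y, beta)]
for _ in range(50):
    x = em_type_step(A, x, y, beta)
    vals.append(objective(A, x, y, beta))
print(all(v1 >= v2 - 1e-9 for v1, v2 in zip(vals, vals[1:])))  # True
```

The printed check reflects the monotone decrease of the objective that the convergence analysis below establishes.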
There are several other methods for solving this problem using the EM iteration; we mention two of them and point out the differences from our method. One is a modified EM iteration [17,35], which can be considered as solving a modified version of (10). The convergence is given only when β is small [17]. It is easy to notice that when β is large, x_j becomes negative for some j's, and a projection onto the nonnegative cone is needed; in this case, there is no convergence result. The other algorithm is also a two-stage method, the difference being in the second stage [6,8,7]. Instead of solving (10) directly, which is a weighted Poisson denoising problem, a semi-implicit scheme is developed, turning it into a weighted Gaussian denoising problem. However, there is no convergence result for this algorithm, and the convergence provided in [6] is for a damped version. A bound on w at each iteration is needed to show the convergence of the damped algorithm [6], and this bound is difficult to find. In the following, we will show the convergence of our algorithm without any assumptions on β or additional parameters. It will converge to a global minimum for convex J(x), and a subsequence will converge to a local minimum for non-convex J(x).

3.1.
Equivalence to EM algorithms with a priori information. In this subsection, the EM-type algorithms are shown to be equivalent to EM algorithms with a priori information. The EM algorithm is a general approach for maximizing a posterior distribution when some of the data is missing [13]. It is an iterative method that alternates between expectation (E) steps and maximization (M) steps. For image reconstruction, we assume that the missing data are the latent variables {z_ij}, describing the intensity of pixel (or voxel) j observed by detector i. They are introduced as realizations of a multi-valued random variable Z, and for each (i, j) pair, z_ij follows a Poisson distribution with expected value A_ij x_j. Then the observed data is y_i = Σ_j z_ij, because the sum of independent Poisson random variables also follows a Poisson distribution, whose expected value is the sum of the individual expected values.
The original E-step is to find the expectation of the log-likelihood given the present variables x^k:
Q(x|x^k) = E_{z|x^k,y} [log p(y, z|x)] − βJ(x).
Then, the M-step is to choose x^{k+1} to maximize the expected log-likelihood Q(x|x^k) found in the E-step:
(13) x^{k+1} = argmax_x Q(x|x^k).
From (13), what we need before solving it is just {E_{z|x^k,y} z_ij}. Therefore we can compute the expectation of the missing data {z_ij} given the present x^k and the condition y_i = Σ_j z_ij, denoting this as an E-step. Because for fixed i the conditional distribution of {z_ij}_j given y_i is multinomial, we can find the expectation of z_ij with all these conditions by the following E-step
(14) z_ij^{k+1} = A_ij x_j^k y_i / (Ax^k)_i.
After obtaining the expectation for all z_ij, we can solve the M-step (13). We will show that EM-type algorithms are exactly the described EM algorithms with a priori information. Recalling the definition of x^{EM}, we have
Σ_i z_ij^{k+1} = (Σ_i A_ij) x_j^{EM}.
Therefore, the M-step is equivalent to (10). We have shown that EM-type algorithms are EM algorithms with a priori information. The convergence of EM-type algorithms is shown in the next subsection from the equivalence of EM-type algorithms with alternating minimization methods for equivalent problems.
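The E-step (14) distributes each measurement y_i over the pixels in proportion to A_ij x_j^k. A small sketch (hypothetical data) confirms the two identities used above: each row of z sums to y_i, and each column sum recovers (Σ_i A_ij) x_j^{EM}.

```python
import numpy as np

def e_step(A, x, y):
    """E-step (14): z_ij = A_ij * x_j * y_i / (Ax)_i."""
    Ax = A @ x
    return A * x[None, :] * (y / Ax)[:, None]

rng = np.random.default_rng(2)
A = rng.uniform(0.1, 1.0, size=(4, 3))
x = rng.uniform(0.5, 2.0, size=3)
y = rng.uniform(1.0, 3.0, size=4)

z = e_step(A, x, y)
x_em = (x / A.sum(axis=0)) * (A.T @ (y / (A @ x)))  # EM update (8)
print(np.allclose(z.sum(axis=1), y))                     # True: rows sum to y_i
print(np.allclose(z.sum(axis=0), A.sum(axis=0) * x_em))  # True: columns give x_EM
```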
3.2. EM-type algorithms are alternating minimization methods. In this section, we will show that these algorithms can also be derived from alternating minimization methods for other problems with variables x and z. The new optimization problems are
(15) minimize_{x, z∈Z} E_P(x, z) := βJ(x) + Σ_ij (A_ij x_j − z_ij log(A_ij x_j) + z_ij log z_ij − z_ij).
Here E_P is used again to define the new function: E_P(·) means the negative log-likelihood function of x, while E_P(·, ·) means the new function of x and z defined in the new optimization problems. The constraint x ≥ 0 is implicitly included in the formula.
Having initial guesses x⁰, z⁰ of x and z, the iteration for k = 0, 1, ··· is as follows. First, in order to obtain z^{k+1}, we fix x = x^k and easily derive
(16) z_ij^{k+1} = A_ij x_j^k y_i / (Ax^k)_i.
After finding z^{k+1}, we fix z = z^{k+1} and update x, which is the M-step (13) in section 3.1. Thus EM-type algorithms for (9) are alternating minimization methods for problem (15). The equivalence of problems (9) and (15) is provided in the following theorem.

Theorem 3.1. If (x*, z*) is a local minimum of problem (15), then x* is also a local minimum of (9). If x* is a local minimum of (9), then we can find z* from (16) and (x*, z*) is a local minimum of problem (15).
Proof. If (x*, z*) is a local minimum of problem (15), we can find δ > 0 such that for all (x, z) with ‖(x − x*, z − z*)‖ < δ and Σ_j z_ij = y_i, z_ij ≥ 0, the inequality E_P(x, z) ≥ E_P(x*, z*) holds. Assume that x* is not a local minimum of problem (9); then there exists a sequence {x^k} such that lim_{k→∞} x^k = x* and E_P(x^k) < E_P(x*). Let z^{k+1} be chosen from (16) for the corresponding x^k. We have E_P(x^k, z^{k+1}) = E_P(x^k) < E_P(x*) = E_P(x*, z*) and lim_{k→∞} z^k = z*. This contradicts (x*, z*) being a local minimum of (15).
Conversely, if x* is a local minimum of (9), there exists δ > 0 such that for all x with ‖x − x*‖ ≤ δ and x ≥ 0, we have E_P(x) ≥ E_P(x*). For any such x and any z ∈ Z, E_P(x, z) ≥ E_P(x) ≥ E_P(x*) = E_P(x*, z*). Thus (x*, z*) is a local minimum of problem (15).
From the equivalence of EM-type algorithm with alternating minimization methods, we know that the function value is decreasing. Next, we will show that if J(x) is convex, (x k , z k ) will converge to a global minimum of problem (15), and if J(x) is non-convex, a subsequence of {(x k , z k )} will converge to a local minimum of problem (15). Before that, we show two lemmas which are used in the proof of the convergence results.
For s > 0, t ≥ 0, let us define d_P(s, t) = t log(t/s) + s − t, assuming that 0 log 0 = 0 and log 0 = −∞. In addition, Z is defined as Z := {z : z_ij ≥ 0, Σ_j z_ij = y_i}. We will use the following lemmas.

Proof. The last equality comes from Lemma 3.2. Since d_P(s, t) is convex, the result follows because d_{d_P}((s, t), (p, q)) ≥ 0.
In order to show the convergence for general J(x), we will need the following assumption on J(x). Assumption 1. J(x) is lower semicontinuous and bounded below, and for any given z̄ ∈ Z and any local minimum point x̄ of E_P(x, z̄), there exists δ > 0 such that for all x satisfying ‖x − x̄‖ ≤ δ and all z ∈ Z, the following inequality holds:
Remark 1. For convex J(x), we have a stronger statement; the equality in it comes from the optimality condition for x̄.
With Assumption 1, we have the following theorem which will provide an alternative way to determine the local optimality.
Theorem 3.4. If there exists (x̄, z̄) such that z̄ = argmin_z E_P(x̄, z) and x̄ is a local minimum point of E_P(x, z̄), then (x̄, z̄) is a local minimum of problem (15).
Proof. Since x̄ is a local minimum point of E_P(x, z̄), there exists δ > 0 such that for all x satisfying ‖x − x̄‖ ≤ δ we have, from Assumption 1, the stated inequality for all z ∈ Z. The first equality comes from (19), and the second equality holds because of the constraint Σ_j z_ij = y_i for any z ∈ Z. Therefore, we have E_P(x, z) ≥ E_P(x̄, z̄) for all x satisfying ‖x − x̄‖ ≤ δ and z ∈ Z, which means that (x̄, z̄) is a local minimum of problem (15).

Remark 2.
In fact, (x̄, z̄) being a local minimum of problem (15) requires that there exists a constant δ > 0 such that for all x satisfying ‖x − x̄‖ ≤ δ and all z ∈ Z, the corresponding inequality holds. Therefore, it is reasonable to make these assumptions on J(x).
Theorem 3.5. The algorithm will converge to a global minimum of problem (15) for convex J(x), and there exists a subsequence which converges to a local minimum of problem (15) for non-convex J(x).
Proof. From the previous theorem, we have to show that the iterates converge to (x*, z*) with z* = argmin_z E_P(x*, z) and x* a local minimum point of E_P(x, z*).
The last inequality comes from Assumption 1, and it holds only for x close to x^k. Therefore, we have
Σ_ij d_P(z_ij^k, z_ij) − Σ_ij d_P(z_ij^{k+1}, z_ij) ≥ E_P(x^k, z^{k+1}) − E_P(x, z).
Since J(x) is bounded below and d_P(+∞, z) = +∞, {x^k} is bounded and there exists a subsequence, denoted by {x^{o_k}}, converging to x*. Let z* = argmin_z E_P(x*, z). Then we have lim_{k→∞} z^{o_k+1} = z* and E_P(x*, z*) ≤ lim_{k→∞} E_P(x^{o_k}, z^{o_k+1}) ≤ E_P(x^{o_k}, z^{o_k+1}). Since E_P(x^k, z^{k+1}) is decreasing, we have E_P(x*, z*) < E_P(x^k, z^k) and lim_{k→∞} E_P(x^k, z^k) = E_P(x*, z*). From (21), Σ_ij d_P(z_ij^k, z_ij*) is monotone decreasing and {z^{o_k}} is bounded, so there exists a subsequence converging to some z̄, which we still denote by {z^{o_k}} for simplicity. Since E_P(x*, z̄) ≤ lim_{k→∞} E_P(x^{o_k}, z^{o_k}) = E_P(x*, z*), we have z̄ = z*. Next, we show that x* is a local minimum point of E_P(x, z*). From Assumption 1, the required inequality holds at x*, which means that x* is a local minimum of E_P(x, z*) and (x*, z*) is a local minimum.
If J(x) is convex, we can choose (x, z) in (21) to be a global minimum (x̂, ẑ), and we have that Σ_ij d_P(z_ij^k, ẑ_ij) is monotone decreasing. Thus {z^k} is bounded, and for each convergent subsequence {z^{o_k}} with lim_{k→∞} z^{o_k} = z̃, we can find {x^{o_k}} with lim_{k→∞} x^{o_k} = x̃, where x̃ is a minimum point of E_P(x, z̃). We have E_P(x̃, z̃) ≤ lim_{k→∞} E_P(x^{o_k}, z^{o_k}) = E_P(x̂, ẑ). Thus (x̃, z̃) is also a global minimum. Therefore lim_{k→∞} z^k = z*, and letting x* = argmin_x E_P(x, z*), we have lim_{k→∞} x^k = x*. The resulting (x*, z*) is a global minimum of problem (15).
Remark 3. In fact, it is impossible to solve the second step exactly in many cases, and we have to solve it approximately using iterative methods.
Remark 4. The relations between these algorithms are shown in Figure 1. The EM-type algorithms are special EM algorithms with a priori information, and the EM iteration is a special case of the EM-type algorithms without J(x).

Fig. 1. Relations between the algorithms: the EM iteration is a special EM-type algorithm, and EM-type algorithms are special EM algorithms.

4. Simultaneous algebraic reconstruction technique (SART) is EM.
Among all the iterative reconstruction algorithms, there are two important classes. One is EM from statistical assumptions mentioned above, and the other is algebraic reconstruction technique (ART)-type algorithms [16,20]. Simultaneous algebraic reconstruction technique (SART) [3,4], as a refinement of ART, is used widely [5,36,52] and the convergence analysis of SART is well studied by Jiang and Wang [25,24], Wang and Zheng [47], Censor and Elfving [9], and Yan [50]. In this section, we will show that SART is also an EM algorithm, building the connection between these two classes.
From the convergence analysis of SART in [50], SART is also an algorithm for solving a maximum likelihood problem, namely the weighted least-squares problem
(22) minimize_x ½ Σ_i ((Ax)_i − y_i)² / (Σ_j A_ij).
In this section, we consider the special case without a regularization function, i.e., there is no a priori information about the image to be reconstructed. The M-step is to maximize the expected value of the log-likelihood function,
E_{z|x^k,y} [log p(y, z|x)] = −Σ_ij (z_ij − A_ij x_j)²/(2A_ij) + C,
where C is a constant independent of x and z. Therefore, for the E-step we just have to find the expected value of z_ij given x^k and the constraints, which is
z_ij^{k+1} = A_ij x_j^k + (A_ij/Σ_j A_ij)(y_i − (Ax^k)_i).
For the M-step, we find x^{k+1} by maximizing p(y, z^{k+1}|x) with respect to x, which has the analytical solution
(23) x_j^{k+1} = x_j^k + (1/Σ_i A_ij) Σ_i A_ij (y_i − (Ax^k)_i)/(Σ_j A_ij).
This is the original SART algorithm proposed by Andersen [3].
From the convergence analysis of SART in [50], the result of SART depends on the initialization x 0 for both noiseless and noisy cases when A is underdetermined.
Remark 5. SART is just one example of Landweber-like schemes for solving systems of linear equations. Other Landweber-like schemes, such as component averaging [9,10], can be derived from the EM algorithm in the same way by choosing different variances for y_i and z_ij; furthermore, new schemes can be obtained by choosing new variances.
5. EM-type algorithms for Gaussian noise. It was shown in the last section that SART is an EM algorithm based on a weighted Gaussian assumption for the problem without regularization. Without regularization, the original problem is ill-posed, and the result will depend on the initialization x⁰. In this section, we will consider the regularized problem
(24) minimize_x βJ(x) + ½ Σ_i ((Ax)_i − y_i)²/(Σ_j A_ij),
and derive EM-type algorithms with a Gaussian noise assumption for solving it. The E-step is the same as in the case without regularization. However, the M-step is different because we have a priori information on the image x to be reconstructed. The new M-step is to solve the optimization problem
(25) minimize_x βJ(x) + Σ_ij (z_ij^{k+1} − A_ij x_j)²/(2A_ij),
which is equivalent to
(26) minimize_x βJ(x) + ½ Σ_j (Σ_i A_ij)(x_j − Σ_i z_ij^{k+1}/Σ_i A_ij)².
From the SART iteration (23) in the last section, we can define
x_j^{EM} = x_j^k + (1/Σ_i A_ij) Σ_i A_ij (y_i − (Ax^k)_i)/(Σ_j A_ij)
and have
(28) x^{k+1} = argmin_x βJ(x) + ½ Σ_j (Σ_i A_ij)(x_j − x_j^{EM})².
Therefore, the proposed EM-type algorithms for image reconstruction with Gaussian noise are as follows.
Algorithm 2 Proposed EM-type algorithms for Gaussian noise.
Input: x⁰, ε; for k = 0 : Num_Iter do. The initial guess x⁰ can be any initial image, and ε, chosen for the stopping criterion, is very small. Num_Iter is the maximum number of iterations. When J(x) is not constant, we have to solve an optimization problem at each iteration. The convergence analysis of these algorithms can be carried out similarly to the case with Poisson noise, as described in the following subsection.

5.1. EM-type algorithms are alternating minimization methods. As with the algorithms for Poisson data, these algorithms can also be derived from an alternating minimization method for other problems with variables x and z. The new problems are
(29) minimize_{x, z∈Z} E_G(x, z) := βJ(x) + Σ_{ij: A_ij≠0} (z_ij − A_ij x_j)²/(2A_ij).
Here E_G is used again to define the new function: E_G(·) means the negative log-likelihood function of x, while E_G(·, ·) means the new function of x and z defined in the new optimization problems. The iteration is as follows. First, let us fix x = x^k and update z. It is easy to derive
z_ij^{k+1} = A_ij x_j^k + (A_ij/Σ_j A_ij)(y_i − (Ax^k)_i).
Then, by fixing z = z^{k+1} and updating x, we recover the M-step (28). If J(x) is convex, problem (29) is convex, and we can first minimize with respect to z for fixed x, obtaining a function of x which is also convex and equals E_G(x). Therefore the EM-type algorithms will converge to the solution of (24). The convergence can be shown as before with small changes: in place of d_P(s, t), for weighted Gaussian noise we define d_G(s, t) = ½(s − t)². In addition, Z is defined as Z := {z : z_ij = 0 if A_ij = 0, Σ_j z_ij = y_i}. The corresponding lemmas are provided without proof.
Lemma 5.1. For all p, s, t ∈ R, we have Lemma 5.2. For all p, q, s, t ∈ R, we have As in the Poisson noise case, we make the following assumption on J(x). Assumption 2. J(x) is lower semicontinuous and bounded below, and for any given z̄ ∈ Z and any local minimum point x̄ of E_G(x, z̄), there exists δ > 0 such that for all x satisfying ‖x − x̄‖ ≤ δ and all z ∈ Z, the following inequality holds:
Remark 6. For convex J(x), we have a stronger statement; the equality in it comes from the optimality condition for x̄.
With this assumption, we have the following theorem, which provides an alternative way to determine local optimality. The proof is similar to the case of Poisson noise and we omit it here. Theorem 5.3. If there exists (x̄, z̄) such that z̄ = argmin_z E_G(x̄, z) and x̄ is a local minimum point of E_G(x, z̄), then (x̄, z̄) is a local minimum of problem (29).
The convergence result is similar and we put it here without proof.
Theorem 5.4. The algorithm will converge to a global minimum of problem (29) for convex J(x), and there exists a subsequence which converges to a local minimum of problem (29) for non-convex J(x).
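One iteration of Algorithm 2 can be sketched as follows. As in the Poisson sketch, we substitute the quadratic penalty J(x) = ‖x‖₂²/2 (not the paper's TV choice) so that the regularization step (28) has the closed form x_j = a_j x_j^{EM}/(a_j + β) with a_j = Σ_i A_ij; the data are hypothetical.

```python
import numpy as np

def em_type_gaussian_step(A, x, y, beta):
    """One EM-type iteration for weighted Gaussian noise with J(x)=||x||^2/2.
    Step 1 (EM): x_EM is one SART update (23).
    Step 2: solve (28), minimize beta*J(x) + 0.5*sum_j a_j*(x_j - x_EM_j)^2,
    which gives x_j = a_j * x_EM_j / (a_j + beta)."""
    a = A.sum(axis=0)
    x_em = x + (A.T @ ((y - A @ x) / A.sum(axis=1))) / a
    return a * x_em / (a + beta)

def objective(A, x, y, beta):
    """Objective (24) with J(x) = ||x||^2/2."""
    r = A @ x - y
    return beta * 0.5 * np.sum(x**2) + 0.5 * np.sum(r**2 / A.sum(axis=1))

rng = np.random.default_rng(4)
A = rng.uniform(0.1, 1.0, size=(6, 4))
y = A @ rng.uniform(0.5, 2.0, size=4)
x, beta = np.zeros(4), 0.1
obj_vals = [objective(A, x, y, beta)]
for _ in range(50):
    x = em_type_gaussian_step(A, x, y, beta)
    obj_vals.append(objective(A, x, y, beta))
print(all(v1 >= v2 - 1e-9 for v1, v2 in zip(obj_vals, obj_vals[1:])))  # True
```

The printed check mirrors the monotone decrease of (24) guaranteed by the alternating minimization interpretation above.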

5.2.
Relaxation. In practice, some authors use a relaxed version of the SART reconstruction,
x_j^{k+1} = x_j^k + (w/Σ_i A_ij) Σ_i A_ij (y_i − (Ax^k)_i)/(Σ_j A_ij),
with a relaxant coefficient w. The convergence of this relaxation is shown in [25,24,50] for any w ∈ (0, 2). Inspired by this strategy, we have a relaxation of the EM-type algorithms for image reconstruction with Gaussian noise. The EM-step is the relaxed SART above with relaxant coefficient w, producing x^{k+1/2}; the corresponding regularization step is
x^{k+1} = argmin_x βJ(x) + ½ Σ_j (Σ_i A_ij)(x_j − x_j^{k+1/2})².
When w = 1, we have already discussed the convergence in the previous subsections via EM algorithms with a priori information and alternating minimization methods. For w ≠ 1, we will show the convergence of the relaxed EM-type algorithms for w ∈ (0, 1) by alternating minimization methods. We will show that the relaxed EM-type algorithms are equivalent to solving the unconstrained problems (30), where γ = w/(1 − w), by alternating minimization between x and z. First, fixing x = x^k, we can solve the problem in z only, whose analytical solution is given in (31). Then, fixing z = z^{k+1}, we can find x^{k+1} by solving
x^{k+1} = argmin_x βJ(x) + Σ_ij (z_ij^{k+1} − A_ij x_j)²/(2A_ij),
up to a constant independent of x. Having z^{k+1} from (31), we can calculate x^{k+1/2} and carry out the regularization step above. Therefore this relaxed EM-type algorithm is an alternating minimization method. We will show next that the result of this relaxed EM-type algorithm is the solution to (24). Because the objective function E_G^R(x, z) in (30) is convex if J(x) is convex, we can first minimize the function with respect to z with x fixed; the resulting problem in x is then (24). We have shown in this subsection that the relaxed EM-type algorithm will also converge to the solution of the original problem (24) when w ∈ (0, 1].

6. Numerical experiments. As mentioned in the introduction, the main focus of this paper is the convergence results for EM-type algorithms, and some of the numerical experiments in this section have been included in [49,51]. The newly added experiment uses a non-convex J(x). In addition, this algorithm has been implemented on different hardware, and the speedups can be found in [11].
In this section, several numerical experiments are provided to show the efficiency of EM-type algorithms. Though these EM-type algorithms can be used in many applications, we choose computed tomography (CT) image reconstruction as the application in this work. CT is a medical imaging method which utilizes X-ray equipment to produce a two dimensional (or three dimensional) image of the inside of an object from a large series of one dimensional (or two dimensional) X-ray projections taken along a single axis of rotation [20]. In CT reconstruction, the operator A is the discrete Radon transform, and the discrete version of A is constructed by Siddon's algorithm [43,53]. The problem is to reconstruct the image from the measurements, which is equivalent to solving Ax = y. Poisson noise is assumed [45] and two regularizations are used: the total variation (TV), and an approximation to a non-convex Mumford-Shah TV-like version.
6.1. CT reconstruction using EM+TV (2D). First, we illustrate one method (EM+TV) on a simple synthetic object (the two dimensional Shepp-Logan phantom); see Figure 2. The most common method used in commercial CT is filtered back projection (FBP), which can be implemented in a straightforward way and computed rapidly. However, FBP has limitations due to the presence of streak artifacts and noise enhancement, which are inherent in the reconstruction. Furthermore, many views are needed to obtain an accurate image. Algorithms that can perform accurate image reconstruction from fewer views are very important for reducing patient radiation dose and speeding up scans. Optimization-based methods, including EM+TV, can reconstruct an image from fewer views, but require more computing time. However, with the development of graphics processing units (GPUs), the computing time has been greatly reduced, and these techniques are becoming more and more important.
In the following experiment, we compare the reconstruction results obtained by EM+TV with those obtained by filtered back projection. First we obtain the measurements using Siddon's algorithm. We consider both the noise-free and noisy cases. For the FBP method, we present results using 36 views (every 10 degrees; for each view there are 301 measurements), 180 views, and 360 views. In order to show that we can reduce the number of views by using EM+TV, we only use 36 views for the proposed method. The results are shown in Figure 3. We notice the much improved results obtained with EM+TV using only 36 views, in comparison with FBP using 36, 180 or even 360 views. To solve the regularization step, we use a semi-implicit scheme for some iterations [49].

6.2. CT reconstruction using EM+MSTV (2D). Instead of TV regularization, we show results using a modified TV-like regularization, called Mumford-Shah TV (MSTV) [40]. This regularization has two variables x and v, where Ω is the image domain. In the continuous case, it is shown by Alicandro et al. [2] that J(x, v) will Γ-converge to a limit functional in which x⁺ and x⁻ denote the image values on the two sides of the edge set K, H¹ is the one-dimensional Hausdorff measure, and D_c x is the Cantor part of the measure-valued derivative Dx.
The comparisons between EM+TV and EM+MSTV in both the noise-free and noisy cases are shown in Figure 4. From the results, we can see that with MSTV the reconstructed images are better than with TV only, both visually and in terms of the root-mean-squared error (RMSE).

Fig. 4. Comparisons of TV regularization and MSTV regularization in both the noise-free and noisy cases. Top row: reconstructed images by the two methods in both cases. Bottom row: differences (errors) between the reconstructed images and the original phantom image. The RMSEs and difference images show that MSTV can provide better results than TV only.
6.3. CT reconstruction using EM+TV (3D). In this experiment, we show the reconstruction results using EM+TV for a three dimensional phantom image. The image chosen is the 128 × 128 × 128 Shepp-Logan phantom, and the sinogram data is obtained from 36 views. The result is compared with the one obtained using the EM step only in Figure 5.

7. Conclusion. In this paper, we proposed general robust EM-type algorithms for image reconstruction without background emission. Both Poisson noise and Gaussian noise are considered. The EM-type algorithms iterate between an EM step (or SART for weighted Gaussian noise) and a regularization step in the image domain.
The convergence of these algorithms is proved in two ways: as EM algorithms with a priori information and as alternating minimization methods. To show the efficiency of EM-type algorithms, computed tomography (CT) reconstruction is chosen as the application. We compared EM+TV and EM+MSTV for 2D CT reconstruction. Both methods give good results from undersampled data compared to filtered back projection, and results from EM+MSTV have sharper edges than those from EM+TV. EM+TV is also used for 3D CT reconstruction, and its performance is better than using EM only (without the regularization term) for undersampled data.