SOME PROXIMAL METHODS FOR POISSON INTENSITY CBCT AND PET

Abstract. Cone-Beam Computerized Tomography (CBCT) and Positron Emission Tomography (PET) are two complementary medical imaging modalities providing respectively anatomic and metabolic information on a patient. In the context of public health, one must address the problem of reducing the dose of the potentially harmful quantities involved in each exam protocol: X-rays for CBCT and radiotracer for PET. Two demonstrators based on a technological breakthrough (acquisition devices working in photon-counting mode) have been developed. It turns out that in this low-dose context, i.e. for low-intensity signals acquired by photon-counting devices, the noise should no longer be approximated by a Gaussian distribution: it follows a Poisson distribution. We investigate in this paper the two related tomographic reconstruction problems. We formulate the CBCT and PET problems separately in two general frameworks that encompass the physics of the acquisition devices and the specific discretization of the object to reconstruct. We propose various fast numerical schemes based on proximal methods to compute the solution of each problem. In particular, we show that primal-dual approaches are well suited in the PET case when considering non-differentiable regularizations such as Total Variation. Experiments on numerical simulations and real data favor the proposed algorithms when compared with well-established methods.


Introduction. Cone Beam Computerized Tomography (CBCT) and Positron
Emission Tomography (PET) scans are medical imaging devices that provide complementary information on the patient, respectively anatomical and metabolic.
From the modeling point of view, these are key examples of ill-posed inverse problems. Indeed, in both cases the observations are obtained from a given number of projections of the original data corrupted by noise. The projection operator is generally ill-conditioned and non-invertible. The noise has two origins: additive dark noise (Gaussian) coming from the electronics of the measurement device, and Poisson noise coming from the counting process leading to the measurements.
Moreover, both modalities are somewhat invasive. On the one hand, CBCT scans use the intrinsic physical property that biological tissues absorb X-rays. One would like to lower the dose necessary to acquire an image, which could be done by improving the detection efficiency of the scanner. On the other hand, a PET scan is based on the principle of injecting an active radiotracer into the patient, and one would also like to reduce this dose. Dose reduction impacts the level of noise, and thus the difficulty of the reconstruction problem. Notice furthermore that the information provided by a CBCT scan is needed in the reconstruction of data from a PET scan (see explanations in Section 2.2.3). This is why the CPPM in Marseille, France, recently developed a new prototype of small animal scanner, ClearPET/XPAD [36]. It is designed to record images for both modalities simultaneously. Moreover, it makes use of a new generation of detectors which are not affected by dark noise. Thus the noise in the measurements acquired with this prototype is modeled as pure Poisson noise.
The goal of this work is to apply reconstruction algorithms to both the CBCT and PET cases in this new setting, i.e. for pure Poisson noise. Building numerical algorithms to reconstruct the data under this model is challenging. However, these inverse problems in tomography may be solved quite accurately using recent developments in the field of convex optimization, such as proximal methods and in particular primal-dual algorithms [11]. To our knowledge, some of these methods have never been applied to tomography. We test their robustness when increasing the noise and lowering the number of projections used to reconstruct the images.
The contributions of the paper are the following:
• We model the acquisitions of a new PET and a new CBCT scanner that both use the new hybrid pixel technology detailed in Section 2. This leads to new optimization problems we seek to solve.
• Using recent developments in convex optimization, we are able to minimize the proposed functionals without smoothing them (as far as we know, this is new in the case of the PET functional involving a logarithm term).
• We present extensive numerical results, so that the reader may judge the efficiency of the proposed methods.
• Since we have prototypes of the new scanners, we are able to test the modeling and the algorithms introduced in the paper on real data (and not just on synthetic examples). Our analysis thus starts from the modeling of the physics of the acquisition device, and finishes with tests on real data.
The outline of this paper is as follows. In Section 2, we describe the physics and explain the mathematical modeling in both cases. In Section 3, we quickly review the state of the art in inverse problems related to tomography. Section 4 presents the regularizations we use, and Sections 5 to 7 introduce the convex optimization tools and describe in detail the algorithms we test. In Sections 8 and 9, we present numerical results on simulated and real data.
2. Physics of CBCT and PET imaging.
2.1.1. Introduction. Computerized Tomography (CT) is a medical imaging modality that provides anatomical information via contrast images. A CT-scan consists of an X-ray source and an X-ray camera, with the imaged object placed in between. For each position of the source-detector couple relative to the object, an image is acquired that consists of a projection along the line-of-sight (LOS) defined by the source-detector direction. A full set of acquisitions is then obtained by rotating the source-detector block around the object. The acquired data, called a sinogram, are the input of the tomographic reconstruction. They express the value of the projection as a function of the projection angle.
The purpose of a CT-scan is to estimate the absorption coefficient µ of the different tissues that constitute the object. This coefficient is energy-dependent and its values are tabulated. For instance, the main organs of the human body are made of a mixture of tissues and each organ is characterized by a single averaged coefficient µ.
Recent scanners make use of cone-beam rather than fan-beam tomography: a cone-beam CT scan has a cone-shaped beam which images the whole body at each shot while rotating around the subject.
In the following, we will describe the general model of a CT-scan acquisition more precisely.

2.1.2. The PIXSCAN II demonstrator. The PIXSCAN II is a small animal Cone-Beam Computerized Tomography (CBCT) demonstrator based on the prototype X-ray hybrid pixel camera XPAD3 developed at CPPM [37, 9].
Hybrid pixel detectors are used to count X-ray photons above a given energy threshold. The XPAD3 camera is composed of 130 × 130 µm² pixels with a readout every microsecond without dead time. The readout is independent between pixels. There is no charge amplification in the readout process, so that the XPAD3 camera is not subject to dark noise, unlike usual charge-integration detectors. Notice that the dark noise that usually affects any CCD-like charge-integration detector is classically modeled by an additive Gaussian noise.
2.1.3. General and monochromatic acquisition models. Let us model the reconstruction of images from a CBCT scan. The probability p that a photon at a given energy E is not absorbed by the material along a line-of-sight L is expressed thanks to the Beer-Lambert law by:
(1) $p = \exp\left(-\int_L \mu(l; E)\, dl\right).$
This probability only depends on the coefficient µ that reflects anatomical properties of the human body.
In the general case, let us define an object by its absorption coefficient µ = µ(r; E), where r stands for a position in $\mathbb{R}^3$, and denote by $y_L$ the measurement of a CT-scanner.
The measurement is made by counting all the photons that have not been absorbed by the imaged object. In this paper, we will exclusively focus on a monochromatic model, i.e. we assume that the X-ray source delivers a monochromatic beam at a given energy $E_0$. Thus, by integrating the absorption coefficient µ along the LOS L, we obtain
(2) $y_L = z_L(E_0)\, \exp\left(-\int_L \mu(l; E_0)\, dl\right),$
where $z_L(E) = I_L(E) \cdot S(E)$ and
• $S(E)$: the detector efficiency at energy E;
• $I_L(E)$: the number of photons emitted by the source at energy E along the LOS L.

2.1.4. Discretization of the problem. Let us consider the discrete framework imposed by real-world aspects: objects to reconstruct are discretized into voxels and CT cameras are pixellized. First, if a CT camera contains N pixels, a tomographic set of measurements y obtained with Θ angles of projection is such that $y \in \mathbb{R}^{J_1}$ with $J_1 = N\Theta$.
Secondly, the continuous object µ to reconstruct is discretized according to a decomposition on a finite-dimensional basis $\{b_i\}_{i \in I_1}$ such that $\mu \approx \sum_{i \in I_1} \mu_i b_i$. For numerical reasons, the family of voxel functions is most of the time chosen by the users (see [22] for more results). Then, the object to reconstruct is a vector $\mu \in \mathbb{R}^{I_1}$. Equation (2) then becomes:
(3) $y_j = z_j(E_0)\, \exp\left(-\sum_{i \in I_1} a_{ij}\, \mu_i\right), \qquad j = 1, \ldots, J_1,$
where $z_j(E_0)$ represents the number of photons emitted by the source into the solid angle defined by the pixel j seen from the source point. In the following, we will write $z_j(E_0) = z_j$ for simplicity. In this problem, $(a_{ij})_{I_1 \times J_1}$, $(z_j)_{J_1}$, $(y_j)_{J_1}$ and the unknown vector $(\mu_i)_{I_1}$ have nonnegative values. The system matrix $A = (a_{ij})_{I_1 \times J_1} \in \mathbb{R}^{I_1 \times J_1}$ is a linear operator that implements numerically the projection operators that fully describe the geometry of the acquisition system. The coefficient $a_{ij}$ characterizes the probability that a photon traveling along ray j interacts with voxel i. Thus, we have $a_{ij} = 0$ when ray j does not intersect voxel i, and $0 < a_{ij} \leq 1$ when ray j intersects voxel i.

2.1.5. Incorporating noise. Generally, the measurements are modeled as follows: 1) dark noise is modeled by an additive Gaussian noise $r_j$; 2) photon-counting noise is modeled by assuming the measurements to be independent Poisson variables. This leads to the general model for the measurements:
(4) $y_j \sim \mathcal{P}\left(z_j \exp\left(-\sum_{i} a_{ij}\, \mu_i\right)\right) + r_j, \qquad j = 1, \ldots, J_1,$
where $\mathcal{P}(\gamma)$ describes a Poisson distribution with parameter γ.
Since new-generation photon-counting detectors are not affected by dark noise, we have here $r_j = 0$ for all j. Hence the model for the measurements reads:
(5) $y_j \sim \mathcal{P}\left(z_j \exp\left(-\sum_{i} a_{ij}\, \mu_i\right)\right), \qquad j = 1, \ldots, J_1.$
Classically, we compute the negative log-likelihood by taking the logarithm of this distribution and keeping the µ-dependent part. We obtain that for our photon-counting detectors, the negative log-likelihood writes:
(6) $L_{CT}(\mu) = \sum_{j=1}^{J_1} \left[ z_j \exp\left(-\sum_i a_{ij}\,\mu_i\right) + y_j \sum_i a_{ij}\,\mu_i \right].$
This problem is ill-posed, and we add a regularization term J(µ) (described later) to the data fidelity term $L_{CT}$. Thus we consider the following problem:
(7) $\min_{\mu}\; L_{CT}(\mu) + \lambda J(\mu),$
where λ is a regularization parameter that balances the data-fidelity term with the regularization J(µ), the latter being a cost function that introduces a priori information on the object to reconstruct.

2.2. Positron Emission Tomography.
2.2.1. Introduction. Positron Emission Tomography (PET) is a medical imaging modality that provides measurements of the metabolic activity in the imaged body. Before the exam, a radiotracer attached to a molecule is injected into the patient, for instance an analog of glucose such as [18F]-FDG. The level of absorption of the molecule depends on the metabolic activity of the imaged organs. The attached radiotracer thus provides information on this metabolic activity.
To acquire information on the radiotracer, one proceeds as follows. The radioactive decay emits positrons, each of which annihilates with an electron after a very short time of flight in the body (the mean free path is around 1 mm); during this positron-electron annihilation, two gamma rays of 511 keV are emitted in opposite directions, which are detected by the PET detectors located on a ring around the patient.

2.2.2. The ClearPET/XPAD demonstrator. The ClearPET/XPAD small animal PET/CT scanner is based on the combination of the ClearPET scanner [31], built at EPFL (Lausanne) within the Crystal Clear Collaboration, and the PIXSCAN microCT prototype, built at CPPM (Marseille). The ClearPET has two main characteristics: its detectors and its partial-ring geometry. The detectors are constituted of LSO/LuYAP phoswich crystal matrices read out by multi-anode photomultiplier tubes. The partial-ring geometry allows for inserting an X-ray tube and the hybrid pixel camera which together constitute a micro-CT scanner [32, 41]. Therefore, both modalities share the same field of view.
2.2.3. Acquisition model. We wish to reconstruct the concentration of radioactive decay v = v(r), subsequently called concentration of activity. The PET-scan detectors count the number of gamma rays along each line-of-response (LOR) L. These measurements are denoted by w L .
The absorption coefficient µ at 511 keV of the material lying along the line-of-response is taken into account to model the absorption of photons emitted at this energy. For the PET reconstruction problem, we consider that µ is known, and obtain the following acquisition model:
(8) $w_L = \int_L v(l)\, \exp\left(-\mu(l; E_{511})\right)\, dl.$

2.2.4. Discretization of the problem. The discretization of the PET acquisition can be deduced from Equation (8) by assuming that the activity concentration is discretized into a number of voxels equal to $I_2$, so that the vector to reconstruct is $v \in \mathbb{R}^{I_2}$. In the same way as for the CT-scan, we assume that $J_2$ measurements are acquired by Θ cameras uniformly distributed in angle around the imaged object, each composed of N pixels, so that $J_2 = N\Theta$. We finally obtain:
(9) $w_j = \sum_{i \in I_2} c_{ij}\, v_i, \qquad j = 1, \ldots, J_2,$
where the matrix $C = (c_{ij})_{I_2 \times J_2} \in \mathbb{R}^{I_2 \times J_2}$ incorporates the parallel-beam geometry of the acquisition. Depending on the numerical implementation of the projection and backprojection, the coefficient $c_{ij}$ typically describes the probability that an event (a gamma emission) occurring in voxel i is detected on pixel j. Defining the final system matrix B for the PET-scan problem, which incorporates the $\mu(E_{511})$ correction, by $b_{ij} = c_{ij} \exp\left(-c_{ij}\,\mu_{ij}(E_{511})\right)$ for all i, j, we deduce the final discrete acquisition framework:
(10) $w_j = \sum_{i \in I_2} b_{ij}\, v_i, \qquad j = 1, \ldots, J_2.$

2.2.5. Incorporating noise. As the PET-scan measurements result from a counting process, we model the data acquisition of a PET-scan in a similar way as for the CBCT problem by incorporating a pure Poisson noise:
(11) $w_j \sim \mathcal{P}\left(\sum_{i} b_{ij}\, v_i\right), \qquad j = 1, \ldots, J_2,$
where $v \in \mathbb{R}^{I_2}$ denotes the radioactive concentration vector to reconstruct, $w \in \mathbb{R}^{J_2}$ the vector of measurements and B the system matrix which describes the full properties of the PET-scan ($J_2 < I_2$). The negative log-likelihood that we want to minimize then reads:
(12) $L_{PET}(v) = \sum_{j=1}^{J_2} \left[ \sum_i b_{ij}\, v_i - w_j \log\left(\sum_i b_{ij}\, v_i\right) \right].$
Since this problem is ill-posed, we add a regularization term J to the data fidelity term $L_{PET}$ and we consider the PET-reconstruction problem:
(13) $\min_{v}\; L_{PET}(v) + \lambda J(v).$
Some of the algorithms we use here cannot actually minimize the previous formulation because of the presence of the logarithm term. Therefore, we also consider a smoothed version obtained by replacing $\log(x)$ with $\log_\varepsilon(x) = \log(x + \varepsilon)$.
In the following, our goal is to minimize Eq. (13) with the fidelity term:
(14) $L_{PET,\varepsilon}(v) = \sum_{j=1}^{J_2} \left[ \sum_i b_{ij}\, v_i - w_j \log_\varepsilon\left(\sum_i b_{ij}\, v_i\right) \right].$
Bear in mind that, according to the algorithm used, we have either ε = 0 or ε > 0.
3. State of the art. The CT and PET problems (7) and (13) are different but share the same two main difficulties: 1) handling Poisson noise and 2) inverting an X-ray transform (a generalization of the Radon transform). Therefore, the methods developed in the literature for both problems come from the same seminal families. In the following, we first describe commonly used methods in CT and PET. Then we explain recent convex optimization techniques in the more general context of Poisson-noisy data.
The filtered back-projection (FBP) is the classical method to invert an X-ray transform [19]. It amounts to pre-filtering the data and backprojecting the result [24]. The pre-filter amplifies high frequencies in an attempt to recover sharp edges in the reconstruction. However, this also causes noise amplification, which significantly degrades the performance of the method as the level of noise increases.
Among iterative methods, Expectation-Maximization (EM) algorithms [15] are the most widely used to maximize the log-likelihood (see [43] for PET and [30] for both CT and PET). EM can be unstable at high levels of noise (or low counts), but may be stabilized by adding a regularization term.
Iterative methods that incorporate a regularization term have also been developed. Fessler et al. [18] followed the approach of [38], using quadratic surrogates to minimize the full functional in PET with a quadratic regularization. These methods may be accelerated [26], but to guarantee their convergence one must use a relaxed version [1]. Other regularizations have been proposed, such as the Huber regularization [27, 16] or the median root prior [2].
In this paper, we are interested in incorporating more recent priors that aim at preserving sharp edges in the reconstructed data, such as the Total Variation (TV) or, more generally, ℓ1-type regularizations. Such regularizations introduce a non-differentiability in the functionals that is algorithmically challenging.
In the context of CT, Total Variation has been used only with an approximate version of the likelihood [44], thus assuming Gaussian noise instead of Poisson noise. Using quadratic surrogate functions together with TV or ℓ1 regularization, the authors in [23] develop algorithms which are actually very close to the proximal algorithms developed in convex optimization.
The pure Poisson noise problems with ℓ1 regularization such as (7) and (13) can actually be tackled using recent developments in convex analysis [12, 13, 34, 11]. The choice of the parameter λ in the regularization term was studied in [6]. The framework is to consider (7) and (13) as the sum of two not necessarily differentiable functions. The proximal operator is one of the key tools to handle the non-differentiability [28].
Some proximal algorithms were used with a Poisson noise model. The forward-backward splitting algorithm [13] was used in [17] for image deconvolution, after applying the Anscombe transform. Recall that the Anscombe transform [3], like the Haar-Fisz transform [21], is a variance-stabilizing method which converts Poisson noise into Gaussian noise. In [20], the authors study the Alternating Direction Method of Multipliers (ADMM), which is able to optimize totally non-differentiable functionals. The algorithm PPXA, which deals with the sum of possibly more than two (non-differentiable) functions, was applied to dynamical PET [40, 39]. Mathematical proofs ensure convergence, but each iteration requires several steps.
Proximal operators may also be used in primal-dual methods, in which the primal and dual variables are successively updated [12]. Recently, the authors in [7] developed a primal-dual algorithm that minimizes a data-fidelity term based on Poisson noise regularized with TV. They applied it to problems of denoising and deblurring images. The new primal-dual algorithm in [11] is able to minimize the sum of two convex functions without additional assumptions on their regularity, only requiring some on the Legendre-Fenchel transform of one of the functions. It turns out that this setting fits perfectly with the goal of minimizing the negative log-likelihood penalized by non-differentiable regularizations for both the PET and CT problems. Note that theoretical convergence to the unique solution as well as numerical performance are addressed in the previously mentioned articles.
In the following, we present several proximal algorithms that minimize (7) and (13) with different regularizations, and compare their performance to that of well-established algorithms. More precisely, we study two types of regularizations: the total variation and the ℓ1-norm on a wavelet frame, which we describe in Section 4. We apply three different algorithms described in Section 5: the projected gradient, the forward-backward splitting algorithm, and Chambolle and Pock's algorithm. When possible, we use accelerated versions of these algorithms.

4. Regularizations. In this paper, we consider two regularizations that promote sharp edges in the reconstruction: an ℓ1-norm on a (wavelet) frame and the total variation. We also investigate a smoothed version of the TV semi-norm which, being differentiable, allows the use of a simple projected gradient, at the price of a loss in sharpness. Let us first describe the notations.

4.1. Discrete setting. We use the same notations as in [10]. The image is a two-dimensional vector of size N × N. X is the Euclidean space $\mathbb{R}^{N\times N}$ endowed with the inner product $\langle u, v\rangle = \sum_{1\le i,j\le N} u_{i,j} v_{i,j}$ and the norm $\|u\| = \sqrt{\langle u, u\rangle}$. We introduce a discrete version of the gradient operator. If $u \in X$, the gradient $\nabla u$ is a vector in $X \times X$ given by $(\nabla u)_{i,j} = \left((\nabla u)^1_{i,j}, (\nabla u)^2_{i,j}\right)$, with
$(\nabla u)^1_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j} & \text{if } i < N\\ 0 & \text{if } i = N \end{cases} \qquad (\nabla u)^2_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j} & \text{if } j < N\\ 0 & \text{if } j = N. \end{cases}$
The discrete version of the divergence operator is defined by analogy with the continuous setting: $\mathrm{div} = -\nabla^*$, where $\nabla^*$ is the adjoint of $\nabla$, that is, for every $p \in X \times X$ and $u \in X$, $\langle -\mathrm{div}\, p, u\rangle_X = \langle p, \nabla u\rangle_{X\times X}$. It is easy to check that:
$(\mathrm{div}\, p)_{i,j} = \begin{cases} p^1_{i,j} - p^1_{i-1,j} & \text{if } 1 < i < N\\ p^1_{i,j} & \text{if } i = 1\\ -p^1_{i-1,j} & \text{if } i = N \end{cases} \;+\; \begin{cases} p^2_{i,j} - p^2_{i,j-1} & \text{if } 1 < j < N\\ p^2_{i,j} & \text{if } j = 1\\ -p^2_{i,j-1} & \text{if } j = N. \end{cases}$
The discrete version of the Laplacian operator is defined by $\Delta u = \mathrm{div}(\nabla u)$ for $u \in X$.
• The discrete total variation of u is defined by:
$J_{TV}(u) = \sum_{1\le i,j\le N} |(\nabla u)_{i,j}|.$
• A regularized version of the total variation is:
$J^{reg}_{TV}(u) = \sum_{1\le i,j\le N} \sqrt{|(\nabla u)_{i,j}|^2 + \beta^2}, \qquad \text{for a small } \beta > 0.$
• An ℓ1 (or sparsity-inducing) norm on a frame reads:
$J_{\ell_1,\varphi}(u) = \|R_\varphi u\|_1 = \sum_{\gamma\in\Gamma} |\langle u, \varphi_\gamma\rangle|,$
where $\{\varphi_\gamma\}_{\gamma\in\Gamma}$ is a tight frame, i.e. a family of elements of X such that
$\|u\|^2 = \sum_{\gamma\in\Gamma} |\langle u, \varphi_\gamma\rangle|^2 \quad \text{for every } u \in X,$
and $R_\varphi$ is the frame analysis operator:
$R_\varphi u = \left(\langle u, \varphi_\gamma\rangle\right)_{\gamma\in\Gamma}.$
Frames are a generalization of orthonormal bases that include unions of several orthonormal bases and other redundant systems.
Sparsity-inducing norms on frames such as wavelet frames or curvelets make it possible to recover efficiently images that have sharp edges, and are thus widely used in image processing.

5. Algorithms. The non-differentiability of the regularizations we consider is a challenge when optimizing Problems (7) and (13). Therefore, we tackle them using proximal algorithms. After basic notions are recalled in Section 5.1, Sections 5.2 and 5.3 describe the three algorithms we tested, two of which are primal algorithms (the forward-backward splitting algorithm and the projected gradient), the last one, the Chambolle-Pock algorithm, being primal-dual.

5.1. Proximal operators and other definitions.
We recall some mathematical concepts from convex analysis and refer the reader to [42] for a complete introduction to this subject.
X and Y are two finite-dimensional real vector spaces endowed with an inner product $\langle\cdot,\cdot\rangle$ and the associated norm $\|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}$. We introduce a continuous linear operator $K : X \to Y$, with operator norm
$\|K\| = \max\left\{\|Kx\| \,:\, x \in X,\ \|x\| \le 1\right\}.$
Let $F : X \to [0, +\infty)$ and $G : X \to [0, +\infty)$ be two proper, convex, lower semi-continuous (l.s.c.) functions.
We define the Legendre-Fenchel conjugate of F by
$F^*(y) = \sup_{x \in X}\; \langle x, y\rangle - F(x).$
We recall that the subdifferential of F, denoted by ∂F, is defined by
$\partial F(x) = \left\{p \in X \,:\, F(y) \ge F(x) + \langle p, y - x\rangle \ \ \forall y \in X\right\}.$
The proximity operator is defined by (see [28] for computation and details):
$\mathrm{prox}_F(x) = (I + \partial F)^{-1}(x) = \arg\min_{y \in X}\; F(y) + \tfrac{1}{2}\|x - y\|^2.$
We refer to [13] for examples of proximal operator computations. Notice that computing the proximal operator is itself a minimization problem. As we will see later, the proximal operator is sometimes straightforward to compute (see Section 5.2.2) but may in other cases require the use of an iterative algorithm (see Sections 5.2.3 and 5.2.4).
5.2.1. Forward-backward splitting and an accelerated version (FISTA). The forward-backward algorithm [13, 14] was designed to solve the unconstrained minimization problem:
(25) $\min_{x \in X}\; F(x) + G(x),$
where F is a convex $C^{1,1}$ function with ∇F L-Lipschitz, and G is a simple convex l.s.c. function (simple means that the proximity operator of G is easy to compute). The forward-backward algorithm [13, 14] reads:

Algorithm 1 (forward-backward).
• Initialization: choose $x_0 \in X$.
• Iterations (k ≥ 0): update $x_k$ as follows:
$x_{k+1} = (I + h\partial G)^{-1}\left(x_k - h\nabla F(x_k)\right).$

This algorithm is known to converge provided $h \le 1/L$. In terms of objective values, the convergence rate is of order 1/k. It has been shown by Nesterov in [33, 34, 35], and by Beck and Teboulle in [5], that it can be modified so that a convergence rate of order $1/k^2$ is obtained. The following algorithm, proposed by Beck and Teboulle in [5] and called FISTA, converges provided $h \le 1/L$:

Algorithm 2 (FISTA).
• Initialization: choose $x_0 \in X$, set $y_1 = x_0$, $t_1 = 1$.
• Iterations (k ≥ 1): update $x_k$, $t_{k+1}$, $y_{k+1}$ as follows:
$x_k = (I + h\partial G)^{-1}\left(y_k - h\nabla F(y_k)\right),$
$t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2},$
$y_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}}\left(x_k - x_{k-1}\right).$

Notice that both algorithms require the computation of the proximal operator $(I + h\partial G)^{-1}$, which may itself be computed via a forward-backward algorithm or FISTA.

5.2.2. Projected gradient algorithm. A particular case of Problem (25) is the constrained minimization of a differentiable function F, i.e. when G is the indicator function $\chi_C$ of a closed convex non-empty set C:
$\min_{x \in C}\; F(x).$
The projected gradient algorithm is exactly Algorithm 1 with $G = \chi_C$. Indeed, one easily checks that $(I + h\partial\chi_C)^{-1} = P_C$, the orthogonal projection onto C. In this case, Algorithm 2 yields an accelerated projected gradient method that converges provided $h \le 1/L$.
Algorithm 3 (accelerated projected gradient).
• Initialization: choose $x_0 \in C$, set $y_1 = x_0$, $t_1 = 1$.
• Iterations (k ≥ 1): update $x_k$, $t_{k+1}$, $y_{k+1}$ as follows:
$x_k = P_C\left(y_k - h\nabla F(y_k)\right),$
$t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2},$
$y_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}}\left(x_k - x_{k-1}\right).$

In this paper, we are also interested in cases where G contains both the non-differentiable regularization (TV or ℓ1) and the constraint. In these cases, the proximity operators are not known analytically; we use FISTA to compute them.

5.2.3. FISTA and constrained total variation. When $G = J_{TV} + \chi_C$, finding $(I + h\partial G)^{-1}(f)$ amounts to solving the constrained total variation problem
(30) $\min_{u \in C}\; \frac{1}{2}\|u - f\|^2 + \lambda J_{TV}(u),$
which can be done using FISTA. Indeed, in [4], it is shown that the dual problem is a constrained smooth minimization problem. It can be solved with the previous projection algorithm with the FISTA acceleration.
Notice that when C = X (i.e. no constraint), the derivation of the dual problem was first done in [10], and the acceleration using Nesterov's ideas in [46].

5.2.4. FISTA and constrained ℓ1 regularization. We show here how the results of Beck and Teboulle in [4] can be adapted to a general ℓ1 regularization. When $G(u) = \|Ku\|_1 + \chi_C(u)$, finding $(I + h\partial G)^{-1}(f)$ amounts to solving the following problem:
(31) $\min_{u \in C}\; \frac{1}{2}\|u - f\|^2 + \lambda\|Ku\|_1.$
(Notice that when K is the gradient operator, Problem (31) is the constrained total variation regularization Problem (30).)

Proposition 1. Let us set:
(32) $h_K(v) = \|f + \lambda K^* v\|^2 - \|H_C(f + \lambda K^* v)\|^2,$
where $H_C(u) = u - P_C(u)$ and $P_C(u)$ is the orthogonal projection of u onto C. Let us define:
(33) $\bar v \in \arg\min_{\|v\|_\infty \le 1}\; h_K(v).$
Then the solution of Problem (31) is given by:
(34) $\bar u = P_C(f + \lambda K^* \bar v).$

Proof. Using the remark that
(35) $\min_{u \in C}\; \frac{1}{2}\|u - f\|^2 + \lambda\|Ku\|_1 = \min_{u \in C}\ \max_{\|v\|_\infty \le 1}\; \frac{1}{2}\|u - f\|^2 - \lambda\langle Ku, v\rangle,$
and since the minimizer of the right term of (35) is given by $u = P_C(f + \lambda K^* v)$, solving Problem (31) amounts to solving Problem (33).
Hence it suffices to compute the minimizer of (33) to get the solution of Problem (31). Problem (33) is the constrained minimization of a smooth function, and this can be done with FISTA (Algorithm 2). Indeed, it is easy to prove that the function $h_K$ given by (32) is continuously differentiable, and that its gradient is given by
(36) $\nabla h_K(v) = 2\lambda K\left(P_C(f + \lambda K^* v)\right).$
Algorithm 4.
• Initialization: choose $v_0$ with $\|v_0\|_\infty \le 1$, set $y_1 = v_0$, $t_1 = 1$.
• Iterations (k ≥ 1): update $u_k$, $x_k$, $t_{k+1}$, $y_{k+1}$ as follows:
$u_k = P_C\left(f + \lambda K^* y_k\right),$
$x_k = P_{\{\|\cdot\|_\infty \le 1\}}\left(y_k - 2h\lambda K u_k\right),$
$t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2},$
$y_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}}\left(x_k - x_{k-1}\right).$

In the case of total variation regularization, K is the gradient operator and $-K^*$ the divergence operator.

5.3. A primal-dual algorithm. We now consider the more general problem
(37) $\min_{x \in X}\; F(Kx) + G(x),$
or its dual
$\max_{y \in Y}\; -\left(G^*(-K^* y) + F^*(y)\right).$
Both are associated with the saddle-point problem
(38) $\min_{x \in X}\ \max_{y \in Y}\; \langle Kx, y\rangle + G(x) - F^*(y).$
One can thus tackle our Problems (7) and (13) using a primal-dual algorithm that aims at solving (38), such as the one proposed by A. Chambolle and T. Pock [11]:

• Initialization: choose $\tau, \sigma > 0$, $(x_0, y_0) \in X \times Y$, and set $\bar x_0 = x_0$.
• Iterations (k ≥ 0): update $x_k$, $y_k$, $\bar x_k$ as follows:
(39) $\begin{aligned} y_{k+1} &= (I + \sigma\partial F^*)^{-1}\left(y_k + \sigma K \bar x_k\right),\\ x_{k+1} &= (I + \tau\partial G)^{-1}\left(x_k - \tau K^* y_{k+1}\right),\\ \bar x_{k+1} &= 2x_{k+1} - x_k. \end{aligned}$

The convergence of this algorithm is proved in [11]:

Theorem 5.2. Assume Problem (38) has a saddle point. Choose τ and σ such that $\tau\sigma\|K\|^2 < 1$, and let $(x_n, \bar x_n, y_n)$ be defined by (39). Then there exists a saddle point $(x^*, y^*)$ such that $x_n \to x^*$ and $y_n \to y^*$.
Notice that this algorithm can be used even when both F and G are non-smooth. One should also be aware that similar approaches had previously been proposed in [12, 8].
6. Algorithms for the CBCT problem.

6.1. Functional. Let y be the data, A the operator, µ the unknown, u the vector whose coordinates are all 1, and z the number of photons. Notice that we have to take into account the constraint µ ≥ 0. The considered minimization problem then reads:
(40) $\min_{\mu \ge 0}\; L_{CT}(\mu) + \lambda J(\mu),$
where J is a regularization term.

6.2. Gradient descent with regularized total variation (TVreg). We first consider Problem (40) with $J = J^{reg}_{TV}$. In this case, Problem (40) can be solved with the accelerated projected gradient Algorithm 3, with $F(\mu) = \lambda J^{reg}_{TV}(\mu) + L_{CT}(\mu)$ and $C = \{\mu \ge 0\}$. We have
$\nabla F(\mu) = \lambda\nabla J^{reg}_{TV}(\mu) + \nabla L_{CT}(\mu), \qquad \nabla L_{CT}(\mu) = A\left(y - z\, e^{-A^\top\mu}\right),$
where the exponential is taken componentwise.
7. Algorithms for the PET problem.

7.1. Functional. We denote by w the data, B the operator, v the unknown, and u the vector whose coordinates are all 1 (u = (1, . . . , 1)). Notice that we have to take into account the constraint v ≥ 0. The considered minimization problem then reads:
(41) $\min_{v \ge 0}\; L_{PET}(v) + \lambda J(v).$
First approach. This problem can be put in the primal form of Section 5.3 with $K = B^\top$, F the Poisson data-fidelity term, and $G = \lambda J + \chi_{\{v \ge 0\}}$. The Chambolle-Pock algorithm (39) then applies: $(I + \sigma\partial F^*)^{-1}$ is given by (45) (see Lemma 7.1 hereafter). When an ℓ1 regularization is considered, i.e. $J = J_{\ell_1,\varphi}$, then $(I + \tau\partial G)^{-1}$ is computed with the fast Algorithm 4 of Section 5.2.4.
We denote this algorithm by CP-Wav.
In the case of a total variation regularization, i.e. J = J T V , then (I + τ ∂G) −1 is computed with the Algorithm 4 with K the gradient operator, as in [4] (see Section 5.2.3).
We denote by CP-TV-BT this algorithm.

Proof. The function to minimize is separable, so that solving the minimization problem amounts to computing each coordinate $\bar x_i$ separately, by minimizing a one-dimensional function $f_i$ on $\mathbb{R}$. This yields two cases according to the sign; in each case the first-order optimality condition reduces to a quadratic equation (see [28] for details), and we get that $y_i$ is given by (45).

7.5. Second approach. It is possible to avoid the inner loop of the previous approach by writing Problem (41) in another way. This is more elegant but, since it involves a higher-dimensional operator, it is less efficient in terms of computation speed (as we will see in the numerical section of the paper). With $J = J_{TV}$, Problem (41) can be rewritten as:
$\min_{v}\; G\left(B^\top v\right) + F\left(\lambda\nabla v\right) + \chi_C(v),$
where $C = \{v \ge 0\}$, $F = \|\cdot\|_1$ and G is the Poisson data-fidelity term (14). The associated saddle-point problem is:
$\min_{v}\ \max_{y, z}\; \lambda\langle \nabla v, y\rangle + \langle B^\top v, z\rangle + \chi_C(v) - F^*(y) - G^*(z).$
The Chambolle-Pock algorithm thus becomes in this case (gradient descent in v, gradient ascent in y and z):
$\begin{aligned} y_{k+1} &= (I + \beta\partial F^*)^{-1}\left(y_k + \beta\lambda\nabla\bar v_k\right),\\ z_{k+1} &= (I + \gamma\partial G^*)^{-1}\left(z_k + \gamma B^\top\bar v_k\right),\\ v_{k+1} &= (I + \alpha\partial\chi_C)^{-1}\left(v_k + \alpha\lambda\,\mathrm{div}\, y_{k+1} - \alpha B z_{k+1}\right),\\ \bar v_{k+1} &= 2v_{k+1} - v_k. \end{aligned}$
There are three proximity operators to compute. It is straightforward to check that $(I + \alpha\partial\chi_C)^{-1}(v) = P_C(v)$, with $P_C$ the orthogonal projection onto C, and that $(I + \beta\partial F^*)^{-1}(y) = P_{C'}(y)$, with $P_{C'}$ the orthogonal projection onto $C' = \{y \,:\, \|y\|_\infty \le 1\}$. As for $(I + \gamma\partial G^*)^{-1}$, its value is given by Lemma 7.1 (see formula (45)).
We denote by CP-TV this algorithm.
8. Results on simulated data.

8.1. Experimental setup.
In this section, we present the results of our numerical experiments. We analyze the performance of the different algorithms we have presented so far, as well as that of three state-of-the-art algorithms. In the case of the PET problem, we also ran experiments using the very similar SPIRAL algorithm [23]. The experiments are made on simulated observations for both PET and CT. The simulations were done using the open-source Matlab toolbox Image Reconstruction Toolbox (irt) for several levels of noise. The true objects that we seek to reconstruct are displayed in Figure 1 (these are 128 × 128 images for Zubal's phantom, 256 × 256 images otherwise). The three phantoms used here are standard tomography phantoms. Zubal's phantom displays tissues of different absorption and is used here to study the global performance of the algorithms. For this phantom, the performance is evaluated in terms of the standard signal-to-noise ratio (snr), the structural similarity index ssim [45] (see Eqs. (49) and (50)), as well as the computation time. The contrast and resolution phantoms are used to study the precision of the algorithms in terms of contrast or resolution, i.e. to evaluate the lowest contrast and the smallest object size that can be resolved. As is well known, global measures such as the snr are not accurate enough to evaluate fine contrast or resolution. Therefore, we also compute the better-suited contrast-to-noise ratio (cnr) [25] defined in Eq. (51).
When comparing a reconstructed image I to the true object T, the snr, ssim and cnr are defined as:
(49) $\mathrm{snr}(I, T) = 20\log_{10}\left(\frac{\|T\|}{\|I - T\|}\right),$
(50) $\mathrm{ssim}(I, T) = \frac{1}{|W|}\sum_{w \in W} \frac{\left(2\mu_{I_w}\mu_{T_w} + a\right)\left(2\sigma_{I_w T_w} + b\right)}{\left(\mu_{I_w}^2 + \mu_{T_w}^2 + a\right)\left(\sigma_{I_w}^2 + \sigma_{T_w}^2 + b\right)},$
(51) $\mathrm{cnr}(I) = \frac{|\mu_{in} - \mu_{out}|}{\sqrt{\sigma_{in}^2 + \sigma_{out}^2}}.$
For the ssim, w is a window sliding through the image and a and b are constants. For the cnr, the subscript in refers to the inside of the test tubes and out to the outside of the test tubes within the phantom. This measure is straightforward in the case of the resolution phantom since all test tubes have the same intensity. In the case of the contrast phantom, we average the cnr measures over the 6 test tubes since they have different contrasts.
Let us emphasize that the TV and ℓ1 regularizations are well adapted to the characteristics of the Zubal and contrast phantoms (and to a lesser extent to the resolution phantom), since these images are piecewise flat.

8.1.1. CT. For the CT problem, the level of noise in the observations is driven by the source power measured by the photon counts. We started from a standard level (1e5 photons) and studied how the quality degrades at lower levels (1e4 and 1e3 photons).
We compared three algorithms we proposed, namely
• the gradient descent with a regularized version of the tv-norm (referred to as TVreg, see Section 6.2),
• the Forward-Backward algorithm with an ℓ1 penalization in wavelet space using the Haar wavelet with 3 levels of decomposition (referred to as FB-Wav, see Section 6.3),
• the Forward-Backward algorithm with a tv penalization (referred to as FB-TV, see Section 6.4),
to three state-of-the-art algorithms implemented in the irt toolbox:
• the filtered back-projection (referred to as FBP),
• the MLEM algorithm (referred to as MLEM, described in [30]),
• the MLEM algorithm penalized by a Huber function (referred to as MLEM-Huber or MLEM-H, described in [18]).

PET.
For the PET problem, the level of noise in the observations is driven by the efficiency of the detector. We studied several levels of fcount: 5e5, 2e5, 1e5. In addition to the six algorithms studied in the CT problem, we also ran experiments in PET with three versions of the Chambolle-Pock algorithm we described in Section 7.4:
• Chambolle-Pock's algorithm computed using the first approach, with a tv penalization computed using FISTA (referred to as CP-TV-BT, see Section 7.4.1),
• Chambolle-Pock's algorithm computed using the second approach, with a tv penalization (referred to as CP-TV, see Section 7.5),
• Chambolle-Pock's algorithm computed using the second approach, with an ℓ1 penalization in wavelet space using the Haar wavelet with 3 levels of decomposition (referred to as CP-Wav, see Section 7.4.1),
as well as the SPIRAL algorithm [23] (referred to as SPIRAL).
In the next two subsections we give the results of our experiments.

8.2.1. Zubal's phantom. For this phantom, we show in Figure 2 the reconstructions obtained for a photon count of 1e2. Table 1 synthesizes the results. This table gives the mean snr, ssim and computation time over one hundred realizations of the noise. When relevant, we also give the number of iterations (nb. iter.) and the regularization parameter(s) (λ), which are tuned to yield the best snr. We indicate in bold the settings yielding the best snr and ssim. Both snr and ssim indicate that the proximal algorithms yield the best results, and that they are more robust to noise than the state-of-the-art algorithms. The filtered back-projection clearly suffers from many artifacts due to the ill-posedness of the problem. Plain MLEM is able to recover good quality results at high photon counts; however, it is clearly unstable as the noise increases. All other methods are able to stabilize at higher noise levels since they incorporate a regularization.
In terms of snr and ssim, MLEM-Huber yields results comparable to the proximal algorithms we proposed at high photon counts; however, those indicators drop faster for MLEM-Huber as the noise increases. Finally, the visual aspect of the Forward-Backward reconstructions is quite different from the MLEM-Huber ones. The flat parts in Zubal's phantom tend to be reconstructed as flat parts with the Forward-Backward algorithms, while they seem more "cloudy" with MLEM-Huber. This is precisely the advantage of using an ℓ1-type regularization (as in the Forward-Backward algorithms) over a quadratic one (as in MLEM-Huber). Notice that the blocky aspect of the wavelet-based Forward-Backward reconstructions is due to the fact that we use the Haar wavelet.

8.2.2. Effects of noise and number of projections on contrast and resolution.
We now study how the quality degrades with noise and when the number of projection angles decreases, for both the resolution and contrast phantoms. The noise levels considered are again 1e4, 1e3 and 1e2 photon counts, while the number of projection angles is 90, 60 and 30 (sampled uniformly). The performances are evaluated in terms of cnr, ssim and snr, but for each algorithm the hyperparameters are tuned to yield the best cnr. We summarize in Tables 2 and 3 the numerical results obtained with our methods and with the state-of-the-art methods. Images of the reconstructions are displayed in Figures 3 and 5. Furthermore, profiles are extracted from both phantoms and their reconstructions to assess the precision in terms of resolution and contrast recovery for each method. These are shown in Figure 4.
For the proximal approaches and the MLEM approaches, we also noticed that the quality of the reconstructions does not degrade much when the number of projections decays from 90 to 30. On the other hand, the quality does decay with the photon count, as may be seen in Figures 3 and 5. These figures show the reconstructions for the two MLEM approaches and the proximal FB-TV approach, for 60 projections, as the photon count decreases from 1e3 to 1e2. The snr, ssim and cnr of each reconstruction are given below it. For a fixed photon count and number of projections, these three indicators all show that FB-TV performs best, followed by MLEM-Huber and finally MLEM. One may also notice that, as the photon count decreases, the indicators degrade fastest for MLEM, slower for MLEM-Huber and slower still for FB-TV.
As we saw for Zubal's phantom, the visual characteristics of the MLEM-Huber and FB-TV reconstructions are quite different, the former yielding "cloudy" reconstructions, the latter flat-by-part ones. This visual impression at low photon counts is also confirmed at higher counts. To show this in more detail, we plot in Figure 4 a profile cut through the test tubes of the contrast and resolution reconstructions for both algorithms.
As far as the contrast phantom is concerned, there is a clear advantage to using the FB-TV algorithm over MLEM-Huber, since the latter does not reconstruct the two lowest-contrast tubes at photon counts of 1000 and below, while the FB-TV algorithm reconstructs all contrasts at 1000 photons and only fails to reconstruct the faintest one at 100 photons. Table 3 reports the CT resolution phantom reconstruction results. As far as the resolution phantom is concerned, the differences are not as clear to the eye, since one may distinguish objects of the same size for both approaches. However, one may argue that further processing such as automatic segmentation will perform better when fed with the FB-TV reconstructions than with the MLEM-Huber ones. Indeed, these programs rely on edge detectors and thus will work better with the FB-TV reconstructions, which are flat by parts, than with the fuzzier ones obtained with MLEM-Huber.

8.3. PET results.
8.3.1. Zubal's phantom. As in the CT case, for Zubal's phantom we study the performance of the different algorithms in terms of snr and ssim (the hyperparameters are tuned to yield the best snr). Table 4 synthesizes the results (see Section 8.2.1 for the legends). We show in Figure 6 the reconstructions obtained for the noise level fcount = 1e5 (note that, for the algorithms we proposed, the figure displays only the most efficient one: Chambolle-Pock's algorithm).
The conclusions here are very similar to the CT case, the algorithms we proposed consistently yielding better snr and ssim and being more robust to increasing noise levels. MLEM-Huber does yield interesting results, though not of as high quality as the proximal algorithms. The "cloudy" and "flat-by-part" aspects again differentiate the ℓ1 from the ℓ2 regularizations.

8.3.2. Effects of noise on contrast and resolution. As in the CT case, we study how the quality degrades with noise for both the resolution and contrast phantoms. The noise levels considered are again fcount = 5e5, 2e5 and 1e5, while the number of projection angles is 60. The parameter fcount stands for the total number of counts acquired on the sinogram. The performances, evaluated in terms of snr and ssim, are displayed in Table 5.
Compared to the CT problem, the PET problem seems much harder to solve: the quality of the reconstructions for all algorithms is much degraded compared to the CT ones. One notes, however, that as in CT, the filtered back-projection is quickly left behind by the other methods. Again, all proximal algorithms perform similarly, although the tv-based algorithms give slightly better results than the wavelet-based ones. We focus on the Chambolle-Pock algorithm, which deals with the exact fit-to-data term, as opposed to the Forward-Backward ones. Although MLEM-Huber yields better numerical results for low noise (fcount = 5e5), at low fcounts (2e5 and 1e5), i.e. when the dose decreases, all three indicators show that the proximal tv-based algorithm CP-TV-BT performs better than MLEM-Huber, which itself performs better than plain MLEM. Plain MLEM also degrades faster than the regularized algorithms (see Figures 7 and 8). Here, however, one notices that the differences between the indicators are not as marked as for CT.
Concerning the contrast phantom, visual inspection of Figure 7 seems to show that the lower-right tube is more likely to be recovered by MLEM-Huber, while the opposite holds for the top-right tube. Examining the profile in Figure 9, the explanation is that CP-TV-BT slightly underestimates the intensity of the flat part, which is a well-known artifact of ℓ1 estimation.
The results obtained with the resolution phantom are similarly not fully conclusive, since it is not clear whether one method or the other fails at recovering the finest-resolution objects. However, in both cases it remains clear that CP-TV-BT leads to reconstructions that are flat by parts, thus making it more likely to yield high-quality automatic segmentations of the different imaged tissues than the MLEM-Huber approach.

9. Results on real data. In this section, we present the results obtained on real CBCT and real PET acquisitions. The CBCT data have been obtained with the PIXSCAN II demonstrator developed at CPPM, incorporating the XPAD3 hybrid pixel detectors. The PET data have been obtained with the ClearPET demonstrator developed at CPPM.

9.1. Results on CBCT.
9.1.1. Experimental setup. As described in Section 2.1.2, we used a wide-aperture cone-beam, Tungsten-target X-ray tube (90 kV, 60 W) whose spectrum was hardened by a 2.5 mm Aluminium filter. In this configuration, four complete acquisitions were realized, in which 15000, 10000, 1000 and 600 photons were counted on average per pixel under flat irradiation conditions (in the absence of a mouse), after exposure times of 1.5 s, 1 s, 100 ms and 50 ms respectively. For each acquisition, 360 projections of the same healthy mouse were acquired every degree, in about 15 minutes per acquisition.
Image reconstruction was then performed using the different algorithms described in Section 9.1.1. For each of the four acquisitions, we performed the 2D reconstruction of a slice, chosen to be the middle slice lying in the plane containing the optical axis, which joins the source point and the detector and is perpendicular to the detector plane. This slice corresponds to the abdominal region of the mouse, where soft tissues and spine bones are visible. Given the pixel size of 130 × 130 µm² and a geometrical zoom factor of two, a slice of 614 × 614 pixels with an average resolution of 65 × 65 µm² and a thickness of 65 µm was reconstructed.
The slice reconstruction was performed with the Forward-Backward algorithm with a tv penalization (referred to as FB-TV). The results have been compared with a standard filtered back-projection implemented in the irt toolbox. In particular, the system matrix was computed with a strip-integral model incorporating the geometry of the acquisition, obtained as described in [29,36].
We performed three different experiments for each acquisition: the reconstruction was performed with 90, 60 and 36 projections, obtained by extracting data every 4, 6 and 10 degrees respectively from the complete acquisition with 360 projections. These experiments aim at evaluating the robustness of our algorithms to the reduction of the number of projection angles, one of the current hot topics in the field of CBCT reconstruction.

9.1.2. Results and discussion. Figures 10 to 13 display the results obtained with a decimated dataset of 60 projections out of the 360 that were acquired. They respectively correspond to photon counts of 15000, 10000, 1000 and 600 in flat irradiation conditions. Note that between these two extreme experimental conditions, the X-ray dose absorbed by the imaged mouse is reduced by a factor of 25 for a full acquisition of 360 projection angles. In Figure 10, we compare the results obtained with different values of the parameter λ with the results of the Filtered Back-Projection (FBP) reconstructed from the full acquired data (i.e. 360 projections) and from the decimated acquired data (i.e. 60 projections). The FBP reconstruction from the full data can be considered as the reference reconstruction for these tests, although it suffers from artifacts introduced by the erratic behavior of a very small proportion of pixels that cannot be fully corrected by pre-processing.
One can typically notice circular-shaped artifacts in the mouse body in the slices reconstructed with FBP at 15000 and 10000 photon counts, and several rays of light are visible in the reconstructed slices for numbers of angles lower than 360, whatever the photon count. Moreover, the Filtered Back-Projection is known to produce negative values without any physical meaning. For this reason, it does not appear relevant to use snr or ssim criteria with the FBP reconstruction from full data as a reference.
The very good quality of the results obtained for usual flat irradiation (10000 photons per pixel) highlights the robustness of the FB-TV algorithm with respect to the number of projection angles. With a strong enough regularization weight, the reconstruction from only 60 projection angles is not subject to any of the initial artifacts induced by dead or failing pixels: ringing artifacts are eliminated, as well as spurious rays of light. Moreover, the essential structures of the image are preserved, which allows one to identify the main biological features: paws (outside the main body), spine bone (bright part at the center of the body), a very tiny part of a rib bone (small bright part around the bottom of the body), some air or gas areas (black holes at the top and top-left of the body), and the cylindrical plastic support of the mouse (on the top-left of the images, outside the body). The Total Variation prior seems well adapted to the presented images, but these first results on real data have to be completed by additional tests on images of other anatomical parts that can be significantly more complex and less adapted to the TV prior. An optimal choice of the regularization parameter is obviously highly subjective on real data. A value of 25 or 40 for the λ parameter strongly regularizes the images whatever the photon count, whereas a value of 5 or 15 will mainly reduce the noise and preserve the finest details that might be of interest from a medical point of view. The images acquired with 15000 and 10000 photon counts and reconstructed with λ ∈ {25, 40} are accurate and edges are sharp. However, the same values of λ at lower photon counts introduce a blur which, in particular, slightly degrades the accuracy of the paws or the spine of the mouse. A lower value of λ leads to reconstructions that are satisfactory for a diagnosis, but that may be noisy and thus less adapted to be the input of a segmentation or classification algorithm, or of a statistical analysis.
Here, the trade-off has to be made jointly with medical experts, and it strongly depends on the subsequent use of the images. These reconstructions have been obtained with the same value λ = 20. Even if the quality of the images logically decreases when the number of projections is reduced, the overall quality of the reconstruction with only 36 projection angles and a 600 photon count is high enough to visually identify the main features. This is the worst case tested here: it corresponds to a theoretical reduction of the X-ray dose by a factor of 250 when compared to a reconstruction with 360 projection angles and a photon count of 15000. However, these images suffer from noise, due to the low level of the acquired signal, which is consequently strongly corrupted by Poisson noise. This noise can only be completely eliminated by a strong regularization, which may introduce a blur and make a diagnosis more difficult to establish.

9.2. Results on PET.
9.2.1. Experimental setup. A complete acquisition of a resolution phantom (also called a Derenzo phantom) was realized. This phantom consists of six angular sectors, each one containing several balls of the same size; the ball size differs from one angular sector to another. For reasons of computational complexity, we only performed the reconstruction of a 2D slice extracted from the complete volume. For this purpose, we used the data acquired on segment 0.
The sinogram, constructed from the data acquired on segment 0 over 81 pixels, is of size 81 × 80. The reconstructed slice is of size 81 × 81 and has a resolution of 1.15 mm at the center of the volume.
Due to the rather short exposure time, the mean count per pixel in the data is 76 and the maximum count is 160, which corresponds to a total number of counts of 5e5 over the whole sinogram. Data have been normalized using the STIR library, i.e. corrected for the geometric effects and for the detector sensitivity effects.
The slice reconstruction was performed with the Forward-Backward algorithm with a tv penalization (referred to as FB-TV). The results have been compared with a standard filtered back-projection implemented in the irt toolbox. In particular, the system matrix was computed with the same strip-integral model as for the CBCT real data, incorporating the geometry of the ClearPET acquisition.

9.2.2. Results and discussion. Figure 14 displays the results obtained for the reconstruction of the Derenzo phantom from the acquisition made with the ClearPET demonstrator. Given the low level of counts, the results of the 2D reconstruction with the filtered back-projection are of quite poor quality. We can only clearly distinguish balls in three sectors out of six, and the rest of the reconstructed phantom suffers from high-frequency noise whereas it should be homogeneous. The balls appear sharper and slightly more accurate with a TV regularization with λ = 0.5 to λ = 5, and the rest of the phantom is more uniform and better reconstructed. A too large value of the regularization parameter (the example here is the reconstruction obtained for λ = 50) leads to a uniform phantom outside the balls; however, such a TV regularization completely smooths the balls, which can no longer be distinguished in this case. A model better suited than Total Variation to this image may allow us to better recover the structures of interest while preserving the uniformity and the accuracy of the reconstruction of the other structures.