Local Models for Scatter Estimation and Descattering in Polyenergetic X-Ray Tomography

We propose a new modeling approach for scatter estimation and descattering in polyenergetic X-ray computed tomography (CT) based on fitting models to local neighborhoods of a training set. X-ray CT is widely used in medical and industrial applications. X-ray scatter, if not accounted for during reconstruction, creates a loss of contrast in CT reconstructions and introduces severe artifacts including cupping, shading, and streaks. Even when these qualitative artifacts are not apparent, scatter can pose a major obstacle in obtaining quantitatively accurate reconstructions. Our approach to estimating scatter is, first, to generate a training set of 2D radiographs with and without scatter using particle transport simulation software. To estimate scatter for a new radiograph, we adaptively fit a scatter model to a small subset of the training data containing the radiographs most similar to it. We compared local and global (fit on full data sets) versions of several X-ray scatter models, including two from the recent literature, as well as a recent deep learning-based scatter model, in the context of descattering and quantitative density reconstruction of simulated, spherically symmetric, single-material objects comprising shells of various densities. Our results show that, when applied locally, even simple models provide state-of-the-art descattering, reducing the error in density reconstruction due to scatter by more than half.


Introduction
In the event of a possible nuclear threat, the United States government employs an Emergency Response Team to determine the proper course of action. A crucial step in assessing the threat is a determination of the object characteristics, and radiographic examination of the object is an important part of this process. While tomographic reconstruction techniques such as filtered backprojection or model-based iterative reconstruction (MBIR) [1] are available, great difficulty remains in determining the material composition of the object due to scattered radiation, spectral effects, and issues associated with the detector. Similar challenges exist in the context of baggage screening and nondestructive testing.
This work focuses on addressing scattered radiation in service of performing quantitative radiographic reconstructions. In general, scatter creates a loss of contrast, impairing the ability to resolve sharp features, and introduces severe image artifacts, e.g., cupping, shading, and streaks. Consequently, in applications in which a quantitative density reconstruction is desired, scatter must be mitigated in hardware (e.g., by increasing the distance between the object and the image plane, or through use of a Bucky-Potter grid) and/or removed in software during reconstruction.
Scatter is a complex phenomenon with many possible types of individual radiation-matter interactions (as X-ray photons pass through an object), e.g., Compton scatter, Rayleigh scatter, pair production, and the photoelectric effect [2]. The total amount of scatter along a ray is simply proportional to the probability of interaction [2], but the difficulty lies in computing the spatial distribution of scatter reaching the detector. Scattered radiation propagates in a direction determined by the initial scattering interaction. Consequently, the resulting scatter that contributes to the image is a function of the magnitude of the scattered radiation produced, its angular distribution, and the density distribution of the object. This motivates models in which a convolutional kernel is utilized to capture the spatial aspects of scatter, but where the kernel somehow changes in response to the density distribution of the object, e.g., [3].
There exists a vast literature on scatter correction, beginning in at least 1976 [4] and presented in the context of many different scanners and applications, from keV CT for medical imaging to MeV CT for nondestructive testing (where keV and MeV refer to the nominal energy of the X-ray photons). Here, we aim to discuss the main trends in computational approaches to scatter correction; see the pair of reviews [5,6] for more comprehensive coverage.
Most scatter correction techniques rely on scatter estimation, and can therefore be classified by what information they estimate the scatter from, e.g., the total transmission radiograph, a radiograph without scatter (which we refer to as a direct radiograph), or the reconstructed density [5]. Scatter estimation from a reconstructed density volume can be highly accurate, because, in principle, the density fully determines the scatter. In practice, it is slow because it necessitates alternating between approximate reconstruction of the 3D density and scatter estimation and removal. Scatter estimation from the total transmission (i.e., the radiographic measurements corrupted by scatter) has the advantage of providing single-shot descattering: no iteration is required; however, these methods have limited accuracy because some approximation is involved when predicting scatter without access to the underlying density. Scatter estimation from a radiograph without scatter strikes a balance between accuracy and speed, requiring only iteration in the radiograph domain (the scatter is repeatedly estimated and removed based on iteratively estimated direct radiographs) thereby avoiding the need to perform multiple high-dimensional density reconstructions. Which scatter estimation approach is best depends on the application: the amount and dominant mechanism of scatter, the requirements on descattering and reconstruction time, and the desired accuracy of the reconstructions.
In this paper, we propose a locally learned model of scatter, wherein the free parameters of a given model are fit on training data in a local neighborhood around the 2D radiograph to be descattered. We emphasize that the proposed model is local in the sense that the model changes as a function of the input, as distinct from spatially local models, where the model changes as a function of location in the radiograph.
We use the proposed local scatter model as a component of an algorithm to reconstruct the material density field of an object (with known material composition) from radiographs obtained with a polyenergetic X-ray beam. Our focus on quantitative material density reconstruction differs somewhat from typical medical applications, where the material composition of the object being imaged (and therefore the physical properties that would allow mapping linear attenuation to density) is unknown and therefore typically only the linear attenuation map is reconstructed (with some exceptions, e.g., [7]). The use of a polyenergetic beam introduces another complication to our problem as compared to medical imaging, where often (although not always, e.g., [8]) a monoenergetic source is assumed. In our setting, we assume known materials, which gives us the opportunity to reconstruct density without facing an underdetermined problem. Future work will examine objects consisting of multiple material compositions.
The rest of the manuscript is organized as follows. The remainder of this section discusses related work. In Section 2, we discuss the descattering problem, including the forward model for monoenergetic and polyenergetic tomography. In Section 3, we discuss local modeling in general, then describe local models for scatter estimation and our descattering algorithm. In Section 4, we present experiments validating our proposed local modeling approach. We provide additional discussion in Section 5 and conclude in Section 6.

Monoenergetic and polyenergetic tomography with scatter
The overall goal of this work is to recover a discretized, space-varying density, ρ ∈ R^{N1×N2×N3}, from a set of X-ray projection measurements (radiographs), t_1, t_2, ..., t_V ∈ R^{M1×M2}, where v indexes the viewing direction. (If the object being imaged is axially symmetric, as is the case in some important security applications, a single view suffices for reconstruction, since, ignoring noise, all views are identical.) In this section, we describe the forward model that relates ρ to {t_v}_{v=1}^V; much of our presentation is adapted from the standard textbook [9].

Fig. 1. Sketch of the tomography setup. In reality, the object is 3D and the detector is a plane. ρ ∈ R^{N1×N2×N3} denotes the density field of the object being imaged, R_{i,j} is the set of rays entering detector pixel (i, j), and r_{i,j} is the central ray of R_{i,j}.
In a generic radiographic setup (see Figure 1 for a sketch), there is an X-ray source, one or more objects being imaged, and a detector. Every ray between the source and detector, r, intersects a certain amount of each object,

A_m(r) = ∫ ρ_m(x(s), y(s), z(s)) ds,   (1)

called the areal density of the m-th object along r, where A_m(r) has units g/cm^2, x(s), y(s), z(s) are a parameterization of the ray r, and m is the object index: each different material being imaged corresponds to a different m. We note a slight abuse of notation here: a continuous ρ_m in (1) versus a discrete ρ mentioned before; we will put aside the details of discretization here as they are an implementation detail of the reconstruction software discussed in Section 3.5.

Monoenergetic case
In the absence of scatter and with a monoenergetic source, the number density of photons reaching the detector along ray r is

n(r) = n_in(r) exp(−Σ_m μ_m A_m(r)),   (2)

where n(r) has units 1/cm^2, n_in(r) is the number density of the incident beam along ray r, and μ_m ∈ R is the mass attenuation coefficient of the m-th material, with units cm^2/g. In our application, we assume known materials, therefore the mass attenuation coefficients can be looked up, e.g., in the XCOM database [10].
In reality, what we measure is not the number density of photons at infinitesimal points, as given by (2), but a discrete, finite image which represents the number of photons reaching each of a finite grid of detectors. We refer to this scatter-free measurement data as the direct radiograph, d ∈ R^{M1×M2}. It is given by

d[i, j] = ∫_{R_{i,j}} n(r) dr ≈ c n(r_{i,j}),   (3)

where d[i, j] is unitless, R_{i,j} is the set of rays entering detector pixel (i, j), r_{i,j} is the ray through the center of pixel (i, j), and c is a constant related to the area of a detector pixel and angle of incidence; the approximation holds when the detector elements are sufficiently small and regular in size (think pixels). Real measurement systems also have a per-pixel gain, meaning that we actually measure a gain factor times d[i, j]. We ignore this gain factor as it is easily handled via calibration.
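The per-pixel monoenergetic model of (2) and (3) can be sketched in a few lines of NumPy. This is an illustrative implementation under our own naming conventions (the names `n_in`, `areal_densities`, and `mu` are not from the paper's code), with the constant c absorbed into the incident beam:

```python
import numpy as np

def direct_radiograph_mono(n_in, areal_densities, mu):
    """Monoenergetic direct radiograph at the pixel centers, per (2)-(3).

    n_in            : incident photon number density per pixel, shape (M1, M2)
    areal_densities : list of per-material areal-density images A_m, in g/cm^2
    mu              : list of mass attenuation coefficients mu_m, in cm^2/g
    (Illustrative sketch; names and layout are our own assumptions.)
    """
    optical_depth = np.zeros_like(n_in)
    for A_m, mu_m in zip(areal_densities, mu):
        optical_depth += mu_m * A_m          # sum_m mu_m * A_m(r_ij)
    return n_in * np.exp(-optical_depth)     # Beer-Lambert attenuation
```

For a single material with μ A = ln 2, exactly half the incident photons survive, matching the expected half-value-layer behavior.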

Polyenergetic case
In the polyenergetic case, we have

n(r) = ∫ n_in(r, E) exp(−Σ_m μ_m(E) A_m(r)) dE,   (4)

where n_in(r, E) is the energy profile of the number density of the incident beam and μ_m(E) gives the mass attenuation coefficient of the m-th material at energy E. Using the same numerical integration approximation as in the monoenergetic case and an additional quadrature for energy gives

d[i, j] ≈ c' Σ_ℓ n_in(r_{i,j}, E_ℓ) exp(−Σ_m μ_m(E_ℓ) A_m(r_{i,j})),   (5)

where ℓ indexes discrete energy bins and c' is a constant depending on the pixel size and the energy bin width.

In some cases, it is sufficient to stop at the model for the direct radiograph given by (3) or (5). However, in many cases, there is a nontrivial amount of scatter, i.e., photons that reach the detector without following a straight path through the object (Figure 2 illustrates the effect of this scatter on reconstruction). Scatter arises from several different physical processes. A photon may undergo exactly one coherent scattering interaction with the object; undergo exactly one Compton scattering interaction with the object; interact with and scatter from parts of the measurement apparatus, e.g., the detector itself [11] or the room; or undergo any other interaction(s) not included above, e.g., multiple scatters or pair production. These processes are well understood, but they are complicated enough that they need to be modeled at the individual photon level, a computationally expensive process and one that is difficult to invert. For our purposes, we consider the measured radiograph t ∈ R^{M1×M2} as a sum of direct photons, as described in (3) or (5), and scattered photons, s ∈ R^{M1×M2}, i.e., t = d + s. We will also refer to t as the total radiograph, since it contains both the contribution of the scattered and direct photons. Finding a way to efficiently and accurately model s is the main focus of this work.
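The energy quadrature of (5) amounts to a weighted sum of monoenergetic Beer-Lambert terms, one per energy bin. The sketch below is our own illustration; the variable names and array layouts are assumptions, not the paper's implementation:

```python
import numpy as np

def direct_radiograph_poly(n_in_E, areal_densities, mu_E, c=1.0):
    """Polyenergetic direct radiograph per (5): quadrature over energy bins.

    n_in_E          : incident spectrum per pixel, shape (L, M1, M2), L bins
    areal_densities : stacked A_m images, shape (M, M1, M2), M materials
    mu_E            : mass attenuation coefficients mu_m(E_l), shape (L, M)
    c               : constant absorbing pixel area and energy bin width
    (Illustrative sketch; names and layout are our own assumptions.)
    """
    L, M1, M2 = n_in_E.shape
    d = np.zeros((M1, M2))
    for l in range(L):
        # optical depth at energy bin l: sum_m mu_m(E_l) * A_m
        tau = np.tensordot(mu_E[l], areal_densities, axes=(0, 0))
        d += n_in_E[l] * np.exp(-tau)
    return c * d
```

With a single energy bin this reduces to the monoenergetic model above.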

Proposed approach-local modeling for descattering
As stated earlier, the goal of this work is to recover a 3D object, ρ ∈ R^{N1×N2×N3}, from its 2D tomographic measurements (corrupted by scatter), t_1, t_2, ..., t_V ∈ R^{M1×M2}. Our proposed approach consists of two steps: descattering radiographs and tomographic reconstruction; we now discuss the first of these. We discuss related work, introduce our proposed local modeling approach, describe the specific local models we evaluate, and finish with a description of our descattering algorithm.

Related work on scatter estimation
Most methods that estimate scatter from a direct radiograph use some form of the model

s[i, j] = Σ_{u,v} h[i, j, u, v] φ(d[u, v]),   (7)

where s[i, j] and d[i, j] are the scatter (only) and direct (without scatter) radiographs at position (i, j), and φ is a scalar function. The term φ(d[u, v]), sometimes called the scatter potential or scatter amplitude factor, represents how much scatter comes off of a given pixel. The term h[i, j, u, v], called the scatter kernel, represents how much radiation scatters from position (u, v) to position (i, j). In the most general case, it is dependent on the 3D density field of the object being imaged, ρ.
The model (7) is hardly useful as it stands because it is so general. The challenge is simplifying it to a point where it can be parameterized and the parameters estimated. One common approach is to fix the scatter amplitude factor to the parametric form φ(d) = d |ln d|, which can be motivated by a simple physical model [12]. Another is to constrain the scatter kernel to be shift invariant, h[i, j, u, v] = h[i − u, j − v], or a spatially-varying weighted sum of such kernels, where the weights may depend on (estimated) object thickness or distance to the object boundary [3]. There also exists recent work that uses a convolutional neural network to map from a direct radiograph to the corresponding scatter radiograph [13], which we can view as an overparameterized and nonlinear version of (7).
No matter the form of the scatter model, its parameters must be determined in order to perform scatter estimation. One approach is to fit the parameters on simple simulations, e.g., of a pencil beam through a slab [3,14]. Another is to approach fitting as an optimization problem on complex reference simulation(s) [13]. The benefit of the former approach is that it may generalize well and fits well with physically interpretable models. The benefit of the latter is that it may provide more accurate fits, because it explicitly minimizes fit error.

Local modeling formulation
Given a training set of 2D direct radiographs and the corresponding 2D scatter radiographs, {(d_i, s_i)}_{i=1}^N, our goal is to learn a mapping, A : R^{M1×M2} → R^{M1×M2}, from direct to scatter. (We leave aside the question of whether such a mapping exists, i.e., whether two objects can produce the same direct radiograph but different scatter radiographs.) These training radiographs can be formed by generating synthetic objects, ρ_i, and simulating their radiographs via particle transport software, e.g., MCNP [15] or Geant4 [16]. See Section 4.2 for details of the training sets we used in our experiments.
Our proposed approach is to represent A via a large number of simple models, i.e., a collection of A's, each acting in a different region of R^{M1×M2}. To do this, we define the nearest neighbor function, U, to return the set of indices of the k nearest training pairs to d in the training set, i.e., the k indices i minimizing ‖d − d_i‖²_F, where ‖·‖²_F denotes the squared Frobenius norm (sum of the squared elements). (Other metrics may be used to determine these neighborhoods, e.g., the Wasserstein metric, if they are more suitable to the particular application.) Finally, we select a class, H, of models for the local A's; see Section 3.3 for specific examples.
Our local model A is then given by

A(d) = A_{U(d)}(d),   (8)

where A_{U(d)} ∈ H is fit on the training pairs indexed by U(d). Note the use of a subscript: A_{U(d)} is a model (i.e., an image-to-image function) fit on the neighbors of d; A_{U(d)}(d) is that model evaluated at the point d, which is an image. Put simply: to predict s from d, we fit a model on the k nearest neighbors to d in the training set and apply that model to d.
What we have described is distinct from the standard approach to fitting a model on a training set, which we refer to here as global modeling. Using our formulation, global modeling corresponds to local modeling with the number of neighbors, k, set to the entire training set size, N. This simplifies to

A(d) = A_{{1,...,N}}(d),   (9)

i.e., a single model fit once on all of the training data. To see the conceptual advantage of local over global models, consider the case of linear regression: global modeling fits a single hyperplane through all of the data, while local modeling fits different hyperplanes in different regions of the space. Thus, local modeling can accurately describe more complex relationships than global modeling, with the drawback that it may lead to worse overfitting.

Related work on local modeling. Local modeling has a long history in a variety of fields. For example, in statistics, local regression (also called LOESS), which involves fitting local models to form a smooth interpolator, has been studied since at least the 1970s [17]. The novelty of our work is to propose and validate specific, high-dimensional local models for use in the context of X-ray scatter estimation, descattering, and density reconstruction.
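The fit-then-apply recipe above can be sketched in NumPy, with the model class H abstracted into a pair of fit/apply callables. This interface is our own illustration, not the paper's code:

```python
import numpy as np

def nearest_neighbors(d, train_d, k):
    """Indices U(d) of the k training radiographs closest to d in
    squared Frobenius distance. Illustrative sketch."""
    dists = [np.sum((d - di) ** 2) for di in train_d]
    return np.argsort(dists)[:k]

def local_predict(d, train_d, train_s, k, fit, apply_model):
    """Local model: fit on the k nearest (d_i, s_i) pairs, then
    evaluate the fitted model at d."""
    idx = nearest_neighbors(d, train_d, k)
    params = fit([train_d[i] for i in idx], [train_s[i] for i in idx])
    return apply_model(params, d)
```

Global modeling is recovered by setting k equal to the training set size, in which case the neighborhood is always the whole training set and the model is fit once.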

Models
We now describe the four classes of models (i.e., choices of H in (8) and (9)) used in our experiments. The first two (single field and convolutional) are simple models that we propose; the second two (parametric and multikernel) come from the scatter modeling literature [3].

Single field model
The single field model represents the scatter as being a single image, A(d) = ŝ ∈ R^{M1×M2}, regardless of the input d.
Fitting the single field model requires solving the problem

ŝ = arg min_s Σ_{i∈X} ‖s − s_i‖²_F,   (10)

where X may represent either the entire training set (global fitting) or U(d) (local fitting). The solution to (10) is easily obtained by taking the pixelwise average of the {s_i}_{i∈X}. In the case where one neighbor is used, the single field model is the classical nearest neighbor interpolator. This model provides a simple baseline for comparison.
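Assuming the neighbors' scatter radiographs are stored as a list of arrays, the minimizer of (10) is one line of NumPy:

```python
import numpy as np

def fit_single_field(scatter_images):
    """Single field model, per (10): the fitted scatter estimate is the
    pixelwise average of the neighbors' scatter radiographs.
    Illustrative sketch."""
    return np.mean(np.stack(scatter_images), axis=0)
```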

Convolutional model
The convolution model represents the scatter as a convolution, which is a simple and computationally efficient way of modeling the spatial distribution of scatter. The form of the model is

A(d) = h ∗ φ_{1,1}(d), where φ_{α,β}(d) = d^α |ln d|^β is applied pixelwise.   (11)

Here, ∗ denotes the 2D, linear convolution between zero-padded images, where the result is cropped to match the support of d. The size of the kernel h is the smallest such that each point in d can contribute to each point in the result, thus h ∈ R^{(2M1−1)×(2M2−1)}. Each pixel in the kernel is arbitrary, thus the number of parameters for the convolutional model is (2M1−1)(2M2−1). The nonlinear function φ_{α,β} is common in the descattering literature and can be justified by a physics argument when α = β = 1 [12]. (To make sense of the negative sign, note that d is expected to be normalized to lie between zero and one.) The flexibility provided by α and β is used in the next model.
The kernel is determined by solving the optimization problem

ĥ = arg min_h Σ_{i∈X} ‖h ∗ φ_{1,1}(d_i) − s_i‖²_F,   (12)

where X is an appropriate set of training indices (local or global). The problem (12) is a linear least squares problem because the φ_{1,1}(d_i)'s are fixed. For our experiments, we solved it using the conjugate gradient method [18].
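To make the linearity of (12) concrete, the following NumPy sketch assembles the least squares system explicitly, one design-matrix column per kernel pixel. This is our own illustration and is practical only for tiny images; the paper instead solves (12) matrix-free with conjugate gradient:

```python
import numpy as np

def phi(d, alpha=1.0, beta=1.0, eps=1e-12):
    """Scatter amplitude phi_{alpha,beta}(d) = d^alpha |ln d|^beta,
    for d normalized to lie in (0, 1]."""
    d = np.clip(d, eps, 1.0)
    return d ** alpha * np.abs(np.log(d)) ** beta

def conv2d_central(p, h):
    """Zero-padded 2D linear convolution of p (M1 x M2) with h
    ((2M1-1) x (2M2-1)), cropped to the central M1 x M2 region."""
    M1, M2 = p.shape
    K1, K2 = h.shape
    full = np.zeros((M1 + K1 - 1, M2 + K2 - 1))
    for a in range(M1):
        for b in range(M2):
            full[a:a + K1, b:b + K2] += p[a, b] * h
    return full[M1 - 1:M1 - 1 + M1, M2 - 1:M2 - 1 + M2]

def fit_conv_kernel(train_d, train_s):
    """Solve the linear least squares problem (12) for the kernel by
    building a dense design matrix (one column per kernel pixel).
    Illustrative only; infeasible at full image resolution."""
    M1, M2 = train_d[0].shape
    K1, K2 = 2 * M1 - 1, 2 * M2 - 1
    blocks, rhs = [], []
    for d, s in zip(train_d, train_s):
        p = phi(d)
        cols = []
        for u in range(K1):
            for v in range(K2):
                e = np.zeros((K1, K2)); e[u, v] = 1.0
                cols.append(conv2d_central(p, e).ravel())
        blocks.append(np.stack(cols, axis=1))
        rhs.append(s.ravel())
    h, *_ = np.linalg.lstsq(np.vstack(blocks), np.concatenate(rhs), rcond=None)
    return h.reshape(K1, K2)
```

The design choice mirrors the text: because the φ_{1,1}(d_i)'s are fixed, the objective is an ordinary least squares problem in h, so any linear solver (here `lstsq`, in the paper conjugate gradient) applies.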

Parametric kernel model
The parametric kernel model is a version of the convolution model with a parametric kernel and nonlinearity. It appears in the recent literature on descattering, e.g., [3]. The form of the model is

A(d) = h_{A,B,σ1,σ2} ∗ φ_{α,β}(d),

where φ_{α,β} is defined as in (11) and all of the parameters, A, B, σ1, σ2, α, and β, depend on the fitting set X. The kernel is a sum of two centered Gaussian functions,

h_{A,B,σ1,σ2}[u, v] = A exp(−((u − u0)² + (v − v0)²)/(2σ1²)) + B exp(−((u − u0)² + (v − v0)²)/(2σ2²)),   (13)

where (u0, v0) gives the central pixel of the radiograph. Fitting the parametric kernel model involves the following nonconvex optimization with respect to the scalar parameters A, B, σ1, σ2, α, and β,

arg min_{A,B,σ1,σ2,α,β} Σ_{i∈X} ‖h_{A,B,σ1,σ2} ∗ φ_{α,β}(d_i) − s_i‖²_F.   (14)

For our experiments, we used the LBFGS algorithm [19] (using the PyTorch implementation), which we found to be significantly faster than gradient descent. We note that LBFGS is not guaranteed to find the global minimizer of (14) because it is nonconvex.
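The two-Gaussian kernel of (13) is straightforward to construct. The sketch below uses our own parameter names (`A`, `B`, `s1`, `s2`) and places the center at the middle of a (2M1−1) × (2M2−1) grid:

```python
import numpy as np

def gaussian_pair_kernel(M1, M2, A, B, s1, s2):
    """Sum of two centered Gaussians, per (13):
    h = A * G(sigma1) + B * G(sigma2), centered at (u0, v0).
    Illustrative sketch with our own parameter names."""
    u0, v0 = M1 - 1, M2 - 1                      # center of the grid
    u = np.arange(2 * M1 - 1)[:, None] - u0
    v = np.arange(2 * M2 - 1)[None, :] - v0
    r2 = u ** 2 + v ** 2
    return (A * np.exp(-r2 / (2 * s1 ** 2))
            + B * np.exp(-r2 / (2 * s2 ** 2)))
```

In the fitting problem (14), these six scalars replace the (2M1−1)(2M2−1) free pixels of the convolutional model, drastically reducing the parameter count at the price of a nonconvex objective.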

Multikernel model
The multikernel model, proposed in [3], is a generalization of the parametric kernel model wherein different parameters are used in different regions of the image. To fit K kernels, K − 1 thresholds, τ1, τ2, ..., τ_{K−1}, are established and a partition function P is defined according to

P(d[i, j]) = k  if  τ_{k−1} ≤ d[i, j] < τ_k,

with τ0 = −∞ and τ_K = +∞. The model is then given by

A(d) = Σ_{k=1}^{K} h_k ∗ (1{P(d) = k} ⊙ φ_{α_k,β_k}(d)),

where each kernel h_k is given by (13), the indicator function 1{P(d) = k} is one at pixels assigned to region k and zero elsewhere, and ⊙ indicates elementwise multiplication. Put simply, the multikernel model uses a combination of K parametric kernel models, each acting in a different region of the input (specified by the indicator). Fitting the multikernel model, like the parametric kernel, involves solving a nonconvex optimization problem over all the scalar parameters, for which we again used the LBFGS algorithm [19].
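The partition of pixels into intensity regions by thresholds can be implemented with `np.digitize`. The function below is an illustrative sketch (not the paper's code) that returns one indicator mask per region:

```python
import numpy as np

def partition_masks(d, thresholds):
    """Indicator masks for the multikernel partition: pixel (i, j)
    belongs to region k when the (k-1)-th threshold <= d[i, j] < k-th
    threshold, with open-ended outer bins. Illustrative sketch."""
    labels = np.digitize(d, thresholds)        # integer labels 0..K-1
    K = len(thresholds) + 1
    return [(labels == k).astype(d.dtype) for k in range(K)]
```

Each mask is multiplied elementwise with the amplitude image before its region's convolution, so the masks must sum to an all-ones image.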

Descattering algorithm
We have so far discussed scatter estimation from a known direct radiograph, but our ultimate goal is descattering: estimating a direct radiograph from a total radiograph containing scatter. With the scatter models from Section 3.3 in hand, we are ready to descatter by solving t = A(d̂) + d̂ for d̂. (One could readily incorporate regularization on d̂ and terms modeling noise or other errors in this model; we leave these for future work.) Inverting this equation may be challenging because it involves additional underlying optimization problems such as (10), (12), or (14) that can themselves be challenging and nonconvex. Here, we use an iterative fixed-point method.
Fixed-point algorithms are commonly used for descattering, e.g., see [3]. The wrinkle here is that, in the local learning case, the scatter model parameters themselves change as a function of the current estimate of the direct radiograph. The pseudocode for our descattering algorithm is given in Algorithm 1.
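A minimal sketch of such a fixed-point iteration is shown below, assuming `estimate_scatter` wraps a fitted scatter model (for local models, it may refit its parameters on the neighbors of the current estimate at every call). Details of Algorithm 1, such as the stopping rule, may differ:

```python
import numpy as np

def descatter_fixed_point(t, estimate_scatter, n_iter=20):
    """Fixed-point descattering sketch: solve t = d + A(d) by
    iterating d <- t - A(d), starting from d = t.

    t                : total radiograph (direct + scatter)
    estimate_scatter : callable mapping a direct estimate to a scatter
                       estimate (a fitted model from Section 3.3)
    (Illustrative; the paper's Algorithm 1 may differ in details.)
    """
    d = t.copy()
    for _ in range(n_iter):
        d = t - estimate_scatter(d)
    return d
```

When the scatter model is a contraction, the iterates converge to the fixed point; e.g., for the toy model A(d) = 0.5 d, the iteration converges to d = 2t/3.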

Monoenergetic and polyenergetic density reconstruction
After descattering, we have an estimate of the set of direct radiographs, {d̂_v}_{v=1}^V, wherein each is estimated independently. The next task is to reconstruct the density field of the object, ρ, from this set, which amounts to inverting (3) or (5) (and, consequently, (2) or (4) and (1)) with respect to the underlying density. In general, if the inversion is ill-posed (e.g., limited view imaging), additional regularization can be used during reconstruction [1,20]. Here, we consider the case when enough views are acquired to adequately reconstruct the 3D object. In particular, because the objects we are imaging, which are of interest in important nuclear security applications, are spherically symmetric, a single view suffices for reconstruction [21]. This inversion is a well-studied problem and not the main focus of this work; we therefore leave the details of our approach to Sections 7.1 and 7.2.

Experiments and results
To evaluate our algorithms, we performed descattering and reconstruction experiments with simulated data. These experiments elucidate the descattering performance of the methods, their ability to fit to known scatter, and the effect of the number of local neighbors used. In this section, we describe implementation details for each comparison method, our datasets, and our experiments and results.

Preliminaries
Implementation details. In all experiments, scatter models were fit as described in Section 3.3, and all code was written in Python. The area outside the object, i.e., beyond a 5 cm radius, was excluded in all cost functionals because, in practice, there would be no difficulty locating the outer edge of the object. Full-resolution images were 257 × 257, but scatter fitting was performed on 4× downsampled versions (to improve fitting speed and because the scatter is spatially smooth; this is common in the literature, e.g., [3]) and scatter estimates were upsampled to the original size using bilinear interpolation. Data was rescaled to simplify the tuning of fitting routines. Specifically: each training direct radiograph was divided by the Frobenius norm of the entire direct training set (square root of the sum of the squares of all pixels). Each training scatter radiograph was divided by the Frobenius norm of the entire scatter training set. At testing time, model inputs were divided by the previously-computed training direct norm and outputs were multiplied by the previously-computed training scatter norm. For CG (for the convolutional model), the number of iterations was 40. For both the parametric kernel and multikernel models, the number of LBFGS iterations was 20 with step size 1.0 and back-tracking line search was enabled; the initial parameters were A = 1.0, B = 1.0, σ1 = 4, σ2 = 64, α = 1.0, β = 0. For the multikernel model, we used K = 3. While we did not make a careful study of the runtimes of the proposed methods, fitting global models was slower than fitting local ones (minutes versus seconds on a consumer-level laptop) and evaluating the models once fit took under a second for all models. Code for our method is available upon request.
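As an illustration of the rescaling just described (function and variable names are our own), the two normalization constants are Frobenius norms taken over the entire training stacks:

```python
import numpy as np

def dataset_norms(train_direct, train_scatter):
    """Frobenius norms over the whole direct and scatter training
    stacks (square root of the sum of squares of all pixels), used to
    rescale model inputs and outputs. Illustrative sketch."""
    nd = np.sqrt(sum(np.sum(x ** 2) for x in train_direct))
    ns = np.sqrt(sum(np.sum(x ** 2) for x in train_scatter))
    return nd, ns
```

At test time, inputs are divided by the direct norm and model outputs multiplied by the scatter norm, so the stored constants must come from the training set only.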
Inverse Abel transforms were performed with the three-point algorithm of [21] from the PyAbel Python package. Polyenergetic reconstructions were slower than monoenergetic ones due to the lookup table, but both took on the order of seconds on a consumer-level laptop.
Box and whisker plots. Throughout this section, we summarize results using box and whisker plots, e.g., see Figure 3. Unless otherwise noted, each box summarizes the ten testing images. Whiskers indicate minimum and maximum values and boxes indicate first and third quartile. The orange line in the center of each box indicates the median value.

Datasets
Our experiments use two simulation datasets with properties motivated by our nuclear security application: large, dense, spherical objects. We chose to use objects comprising a single material (plus a known collimator) to simplify reconstruction and focus on the effects of descattering.
We generated synthetic radiographs of objects consisting of five concentric shells using Monte Carlo particle transport simulation software (MCNP6 [15]). The outermost shell had a fixed radius of 5 cm and the other four shell radii were randomly varied between 0.25 and 2.5 cm. The material was set to uranium for the entire object, but the density of each shell was randomly selected from a set of five densities, and the mass of the object was conserved through a multiplicative constant. The object center was placed 133 cm from the source and the detector was placed 525 cm from the source. A tantalum collimator (3.5 inches thick, 6.4 inch diameter, 1.4 inch opening, and 2.55 degree half-angle) was placed 98 cm from the source. While the collimator design was not modified for each object, it served to reduce the dynamic range of the image, in general. We simulated radiographs of these objects with both a monoenergetic and a polyenergetic X-ray source: the monoenergetic single-material dataset was generated using a 1.5 MeV X-ray source, and the polyenergetic single-material dataset was generated using a bremsstrahlung spectrum with a 19.4 MeV endpoint energy. We generated the synthetic radiographs using the radiography tally in MCNP, which generates the image formed by X-rays reaching the detector. This tally is a quasideterministic calculation that maps each particle to each pixel, so the final image does not contain stochastic noise. Radiographs of direct and scattered radiation were generated separately; the direct radiation calculations used 50,000 particles while the scattered radiation calculations used 10^6 particles. Additionally, the scattered radiation calculations included the PDS card in MCNP, which estimates all possible reaction contributions for each collision and significantly reduces the appearance of fliers due to rare coherent scatter events. Degradation of the image due to source blur and detector effects such as scintillator efficiency and blur was not modeled in this study.
For the monoenergetic dataset, we generated 99 pairs of direct and scatter radiographs. For the polyenergetic dataset, we generated 100 pairs. These were the largest datasets we could simulate in a reasonable amount of time (on the order of days). In both cases, we partitioned the datasets into a testing set consisting of ten pairs (used for evaluation of algorithms) and a training set consisting of the remaining pairs (used for fitting models). These partitions were chosen to balance the need for computing statistics on the testing set with the desire to have as large a training set as possible.

Descattering and density reconstruction
Our main experiment evaluated the ability of each of the models to enable accurate descattering and density reconstruction. The experimental setup mimicked our actual application. We fit scatter models on a training set of (d, s) pairs and then descattered a set of new total radiographs using the fit models and Algorithm 1. Finally, we reconstructed the underlying density based on the estimated direct radiograph. For local models, we fixed the number of neighbors to three (we explore the effect of the number of neighbors in Section 4.7).
We quantified the reconstruction errors in terms of the median absolute density error (MADE) on the central slice of the 3D reconstruction, in g/cm^3,

MADE = median_{i,j} |ρ̂[i, j, 0] − ρ[i, j, 0]|,   (18)

where ρ̂ is the reconstructed density, ρ is the ground truth, and we let i and j range over the area where ρ[i, j, 0] is nonzero. (Recall that the objects we are imaging are 3D, hence the three indices for ρ.) We use the central slice because it matches one way we typically view the reconstructions, i.e., as a slice; quantifying error on 1D profiles or the whole volume would also be reasonable. We used the median rather than the mean because the mean is dominated by errors at the edge of the simulation and at the interface between materials.

Comparison methods. While our main focus is the comparison between global fitting and the proposed local fitting, we note that the global versions of the parametric model and multikernel model represent baseline comparison methods from the recent literature [3]. As an additional comparison, we implemented the deep scatter estimation (DSE) network [13], which is a CNN-based scatter estimation method (trained globally and incorporated into the fixed point algorithm in the same manner as the other scatter models).
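The MADE metric of (18) is a short computation; the sketch below assumes, as in the text, that the central slice is the one indexed 0 along the third axis:

```python
import numpy as np

def made(rho_hat, rho):
    """Median absolute density error (18) on the central slice, in
    g/cm^3, over pixels where the ground-truth slice is nonzero.
    Illustrative sketch."""
    slice_hat, slice_gt = rho_hat[:, :, 0], rho[:, :, 0]
    mask = slice_gt != 0
    return np.median(np.abs(slice_hat[mask] - slice_gt[mask]))
```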
Results. The results for the descattering experiment are given in Figure 3. The results show that local models uniformly outperform global ones, which justifies the use of the proposed local fitting approach. For the monoenergetic dataset, the best descattering was provided by the local convolutional model (maximum MADE=0.040 g/cm^3), followed by the local parametric model (0.045 g/cm^3) and the DSE (0.050 g/cm^3). None of the methods matched the performance of the reconstruction without scatter (maximum MADE=0.009 g/cm^3). For the polyenergetic dataset, the best descattering was provided by the DSE (maximum MADE=0.253 g/cm^3), followed by the local convolutional model (0.270 g/cm^3). Here, the error when reconstructing from the direct signal (0.243 g/cm^3) is higher than for the monoenergetic case: this is to be expected because the particle transport simulation used to generate the data and the forward model used to reconstruct it are distinct. Slight differences in the physical constants (beam spectrum and mass attenuation coefficients) between the model and simulation cause these reconstruction errors. We believe that this mismatch is a strength of the experiments, because it reflects (at least partly) the uncertainties that exist when reconstructing from experimental data. We show error images for the convolutional method in Figure 4. This shows that errors are highest at material interfaces, and that there exist both density under- and overshoots. We provide reconstruction and error profiles in Figures 7 and 8.
Convergence. We quantified the convergence of the fixed point algorithm by computing the normalized mean squared error (NMSE) at each iteration, ‖d + A(d) − t‖₂² / ‖t‖₂², where d is the current direct estimate, t is the total transmission, and A is the scatter model. (Note that this is not identically zero because the d's on the left- and right-hand sides of Algorithm 1 line 5 represent two successive iterates and are not in general equal.) Although the value of the NMSE and its rate of decrease varied from image to image, it generally either plateaued within a few iterations or dropped to very low values (below 10⁻¹⁰). In the latter case, this suggests that the fixed point algorithm finds a d such that d + A(d) = t; in the former case, we suspect there is no solution to the system and we get d + A(d) ≈ t. This behavior occurred for both local and global models. We provide convergence plots in Figure 9.
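The fixed-point iteration and its NMSE convergence check can be sketched as follows. This is a simplified stand-in for Algorithm 1, assuming only that the scatter model A maps a direct radiograph to a scatter estimate; the function and variable names are ours.

```python
import numpy as np

def fixed_point_descatter(t, A, n_iter=20, tol=1e-10):
    """Fixed-point descattering sketch: iterate d <- t - A(d).

    t: total transmission radiograph (array).
    A: fitted scatter model, a callable mapping direct -> scatter.
    Returns the direct estimate and the NMSE history; a sketch of the
    paper's Algorithm 1, not the authors' implementation.
    """
    d = t.copy()  # initialize the direct estimate with the total transmission
    history = []
    for _ in range(n_iter):
        d_new = t - A(d)
        # NMSE of the fixed-point residual: ||d + A(d) - t||^2 / ||t||^2
        nmse = np.sum((d_new + A(d_new) - t) ** 2) / np.sum(t ** 2)
        history.append(nmse)
        d = d_new
        if nmse < tol:
            break
    return d, history
```

With a linear, contractive scatter model (e.g., A(d) = 0.1 d), the iteration converges geometrically to the unique d with d + A(d) = t, mirroring the rapid NMSE drop described above.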
Noise. In a separate, small experiment (described in Section 9) we explored how descattering accuracy was affected by noise. Results showed that noise passed without amplification through the descattering process, indicating that performing descattering does not necessitate additional noise removal beyond that performed in the reconstruction step (e.g., regularization).

Oracle scatter fitting
To explore the capacity of the models, we evaluated them in an oracle fitting setting, i.e., we evaluated their ability to map from a single direct radiograph, d, to its corresponding scatter, s, when both d and s are known (provided by some oracle that one would not have in practice). A model's performance at this task upper bounds its descattering performance; however, it does not give a complete picture of a model's potential for descattering, since models may overfit the training data and fail to generalize. For each (d, s) in the monoenergetic and polyenergetic testing sets, we fit a single field, parametric kernel, multikernel, and convolutional model. (This is equivalent to performing local fitting with a single neighbor if the training set includes the testing set.) We used each fitted model to make a prediction, ŝ. We formed an estimated direct transmission, d̂ = d + s − ŝ, and reconstructed, yielding ρ̂. We compared these estimated densities to the ground truth in terms of median absolute density error (18).
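To make the oracle fitting concrete for the convolutional model, the sketch below fits a single convolution kernel h to one (d, s) pair by least squares in the Fourier domain. This assumes the model takes the simple form s ≈ h ⊛ d (circular convolution); the paper's convolutional model may differ in detail (e.g., it may include a pointwise nonlinearity), and all names here are illustrative.

```python
import numpy as np

def fit_conv_kernel(d, s, eps=1e-6):
    """Oracle fit of a single convolution kernel h such that s ~ h (*) d.

    Least-squares solution in the Fourier domain, with Tikhonov damping
    eps to avoid division by near-zero spectral values. A simplified
    stand-in for the paper's convolutional model.
    """
    D = np.fft.fft2(d)
    S = np.fft.fft2(s)
    H = (np.conj(D) * S) / (np.abs(D) ** 2 + eps)
    return np.real(np.fft.ifft2(H))

def predict_scatter(h, d):
    """Apply the fitted kernel: circular convolution via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(d)))
```

With both d and s in hand (the oracle setting), this single-pair fit is exactly "local fitting with one neighbor," as noted above.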
The scatter fitting results (Table 1) show that all models have some potential to improve density reconstruction via descattering, giving maximum density errors of under 0.2 g/cm³ and 0.4 g/cm³ in the monoenergetic and polyenergetic cases, respectively, as compared to errors of over 1.25 g/cm³ when scatter is simply ignored. The single field model performs nearly identically to reconstruction without scatter; this is expected, because each d̂ and d are identical except for the fact that, as described in Section 4.1, we downsample by a factor of four during scatter fitting. We can view the parametric model as a generalization of the convolutional model with extra flexibility in the tuning of the parameters of the nonlinearity (11). That the parametric model gives a smaller error than the convolutional model here but not in the main experiment in Section 4.3 suggests that this extra flexibility leads to some degree of overfitting, i.e., fitting the training data at the expense of generalization. The same observation holds for the multikernel model.

Fig. 5. Effect of increasing object density on reconstruction performance using a local convolutional model and fixed-point descattering (Algorithm 1). As the density scales up, the scatter-to-direct ratio increases, increasing the error.

Scatter estimation
We also assessed the methods' scatter estimation performance without descattering involved, i.e., fitting on a training set and predicting the corresponding scatter, s, for a given known direct radiograph, d, from the testing set. We assessed the results in terms of the median absolute density error (18) when reconstructing from t − A(d).
The results show that scatter estimation alone was not significantly easier than joint scatter estimation and descattering (never more than 5% lower error). We interpret this to mean that the limiting factor for descattering performance is accurately modeling scatter, as opposed to solving for d from t. We provide estimated scatter profiles in Figure 10 and example local and global kernels in Figure 11.

Cases with increased scatter
To investigate the performance of local descattering as the amount of scatter increases, we chose a single object from the polyenergetic dataset and repeated the MCNP simulation with its mass scaled by factors of 1.5 and 2 (leaving the collimator unchanged). These scaled objects have a much higher scatter-to-direct ratio than the original set, with maximum ratios (over pixels) of 1.57 and 25.60, respectively, compared to 0.34 in the original dataset. For each of these scaled simulations, we additionally perturbed the density by adding or subtracting 1.0 or 2.0 g/cm³ from the density of each shell, creating four variations (−2.0, −1.0, +1.0, +2.0). We fit a convolutional model to these four variations and used it to descatter the scaled total transmission radiograph. Thus, we can think of the four variations as forming a local training set. We compare the original descattering result to the two high-scatter cases in Figure 5. The first trend we note is that, as the density increases, the reconstruction from the total radiograph (no descattering) deviates more and more from the ground truth, indicating that scatter has become a larger source of error. While descattering is successful in the original simulation and in the 1.5× case, it begins to struggle in the 2× case, reaching density errors of over 10 g/cm³ in some regions. However, the local approach is still better than ignoring scatter altogether.

Effect of number of neighbors on reconstruction performance
To evaluate the effect of the number of neighbors, we repeated the descattering experiment with the local convolutional model, varying the number of neighbors from one to five. The results (Figure 6) show that the reconstruction error is robust with respect to the number of neighbors, with local fitting consistently outperforming global fitting no matter the number of neighbors.
Although we might expect that a small number of neighbors could lead to overfitting and that increasing the number of neighbors may cause underfitting, we do not see clear evidence of this here, except that the global model, which is equivalent on our data to a local model with 90 neighbors, performs worse than the local ones.
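The neighbor selection underlying the local models can be sketched as a simple k-nearest-neighbor lookup. This assumes a Euclidean distance between radiographs; the paper may match on downsampled images or with a different metric, and the function name is ours.

```python
import numpy as np

def local_neighbors(t_new, training_radiographs, k=3):
    """Return indices of the k training radiographs closest to t_new.

    t_new: query radiograph (2D array).
    training_radiographs: sequence of 2D arrays of the same shape.
    Uses Euclidean distance; an illustrative sketch of the local
    matching step, not the authors' exact procedure.
    """
    dists = [np.linalg.norm(t_new - t_i) for t_i in training_radiographs]
    return np.argsort(dists)[:k]
```

A local model is then fit only to the selected neighbors' (direct, scatter) pairs, rather than to the full training set as in global fitting.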

Discussion
Local models vs CNNs. Our experiments showed that, using the proposed local fitting scheme, the performance of simple models can rival that of a CNN (the DSE [13]); we now compare and contrast these methods. One advantage of the proposed method is that it does not require training a CNN, which is slow and requires challenging hyperparameter tuning. Another is its flexibility in terms of training: in a setting where we know approximately what we want to image, we could perform a few scatter simulations, fit a local convolutional model, and then use it to descatter experimental data. This aspect is especially appealing for applications wherein creating a training set with scatter of even tens of objects takes an unacceptable amount of time, due to the complexity of the object or desired simulation accuracy. Fitting a deep CNN on the same few scatter simulations would likely lead to overfitting. It is also typically much easier to simulate the direct radiographs than to simulate scatter. So in practice, we could simulate a dense set of direct radiographs for performing the local matching in our algorithm and simulate scatter only as necessary. The proposed local learning approach affords this flexibility. The local fitting performed adaptively at reconstruction time also means that new examples can be added to the training dataset and be used by the model without retraining. Finally, the proposed model is simple to describe and easy to interpret, in the sense that the learned kernels can be viewed as images and intuitively encode the spatial component of scatter. The main advantage of using a CNN over the proposed approach is that it is faster at testing time (milliseconds vs seconds to estimate a scatter radiograph). This may be attractive in settings where data is collected quickly and scatter estimation needs to be immediate, but may not matter in many practical settings. 
Taking a broader perspective, the local convolutional model is a piecewise linear function from images to images, i.e., from R^(N1×N2) to R^(N1×N2), with the pieces being defined by the neighborhoods in the training data. ReLU CNNs, including the DSE, are also piecewise linear functions [22]. The two models differ only in the parameterization of the linear pieces and their boundaries. These similarities may partly explain the similarity in performance between the two models.
Limitations. Our simulations did not account for scatter within the detector, which, depending on the hardware, may contribute significantly to measured radiographs [11]. Our local fitting method may be able to account for this type of scatter, but it would need to be included in the training data. Our testing datasets, although not small by the standards of other descattering work, are probably not large enough to detect small differences between methods, especially given the large amount of variation in the data. Our experimental datasets consisted only of spherically symmetric objects, and the performance of the scatter estimation may be different for other types of objects. However, we note that spherical objects are themselves of practical importance in nuclear security applications and, further, there is nothing about the proposed scatter model that explicitly relies on this symmetry. See Section 10 for a proof-of-concept experiment on an ellipsoidal object. Finally, in cases where the direct-to-scatter mapping is one-to-many, no technique can deterministically estimate scatter from the direct radiograph perfectly, including the proposed method. However, the proposed technique may still be practically useful; further, performance might be improved by finding neighbors based on total transmission, which could be unique even in a case where the direct signal is not.

Conclusions and future work
Our experiments show that, when fit locally, even simple models can describe scatter well, thus avoiding the algorithmic challenges and computational burden of fitting and using a nonlinear or shift-variant model. Of the four models we compared, we recommend the local convolutional model, given that it is easy to fit and provides excellent descattering performance. The proposed approach offers performance on par with a recent deep learning-based method, with the advantage that it avoids an expensive network training step. We see several possible extensions of this work. In the context of X-ray CT, our method can easily be used for multiview (i.e., not spherically symmetric) tomography problems, including medical imaging. However, these settings may be challenging if the variety in the objects to be imaged is large; this would necessitate gathering a larger training set to ensure good neighbors can be found. Validating the method in additional settings is an important next step. We would also like to study the effect of training set density on reconstruction performance; ideally, we could find a rule of thumb for how many simulations are needed to achieve a certain accuracy for a given application. Another direction may be to study different distance metrics for neighbor determination, or to learn a good metric from the training data [23]. Finally, we believe that the local modeling approach may have applications outside of scatter estimation. Wherever we have a training set of image pairs, we can potentially fit simple, instance-adaptive local models instead of complex, global models.

Fig. 9. Convergence of the fixed point algorithm using the convolutional model on the monoenergetic dataset. Each line represents one image in the test set; not all ten are visible due to overlaps.

Descattering in the presence of noise
In this small experiment, we explored whether the proposed descattering approach magnifies noise. We added various levels of additive white Gaussian noise (AWGN) to the total transmission and performed descattering. Our results (Figure 12) show that noise passes through descattering without significant amplification, i.e., the RMSE of the descattered direct radiograph with respect to the ground truth direct radiograph increases linearly with the standard deviation of the noise. We also verified that, for the levels of noise in the above plot, the nearest neighbors of the total transmission radiographs do not change. This explains the linear behavior with respect to the noise: when the neighbors do not change (relative to the noiseless case), the scatter model is a linear function.
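This linearity argument can be checked empirically with a toy version of the experiment. The sketch below assumes a fixed linear scatter model A (standing in for a local model whose neighbors do not change with noise), so the whole descattering map is linear in the input and the output RMSE should scale linearly with the noise level; names and parameters are illustrative.

```python
import numpy as np

def descatter_rmse_vs_noise(d_true, A, sigmas, seed=0):
    """Toy check that descattering does not amplify AWGN.

    d_true: ground truth direct radiograph; A: fixed linear scatter
    model (callable); sigmas: noise standard deviations to test.
    Runs fixed-point descattering d <- t - A(d) on noisy totals and
    returns the RMSE of each descattered result vs. d_true.
    """
    rng = np.random.default_rng(seed)
    t_clean = d_true + A(d_true)
    rmses = []
    for sigma in sigmas:
        t_noisy = t_clean + sigma * rng.standard_normal(d_true.shape)
        d = t_noisy.copy()
        for _ in range(50):  # fixed-point iterations
            d = t_noisy - A(d)
        rmses.append(np.sqrt(np.mean((d - d_true) ** 2)))
    return rmses
```

Because the map from t to the fixed point is linear when A is linear, doubling sigma roughly doubles the output RMSE, matching the linear trend reported above.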

Nonspherical objects
As a small proof-of-concept, we performed scatter estimation on a simulated ellipsoidal object using a set of four other ellipsoidal objects as training. Results from the proposed local convolutional model are given in Figure 13. While preliminary, these results suggest that the proposed method can work on nonspherical objects.