Joint Reconstruction of X-Ray Fluorescence and Transmission Tomography

X-ray fluorescence tomography is based on the detection of fluorescence x-ray photons produced following x-ray absorption while a specimen is rotated; it provides information on the 3D distribution of selected elementals within a sample. One limitation in the quality of sample recovery is the separation of elemental signals due to the finite energy resolution of the detector. Another limitation is the effect of self-absorption, which can lead to inaccurate results with dense samples. To recover the true elemental map, we combine x-ray fluorescence detection with a second data modality: conventional x-ray transmission tomography using either absorption or phase contrast. By using these combined signals in a nonlinear optimization-based approach, we obtain an improved quantitative reconstruction of the spatial distribution of most elements in the sample. Compared with single-modality inversion based on x-ray fluorescence alone, this joint inversion approach reduces ill-posedness and results in improved elemental quantification and better correction of self-absorption. © 2016 Optical Society of America OCIS codes: (340.7460) X-ray microscopy; (340.7440) X-ray imaging; (110.3010) Image reconstruction techniques; (100.6950) Tomographic image processing. References and links 1. H. Moseley, “Atomic models and x-ray spectra,” Nature 92, 554 (1914). 2. C. J. Sparks, Jr., “X-ray fluorescence microprobe for chemical analysis,” in “Synchrotron Radiation Research,” , H. Winick and S. Doniach, eds. (Plenum Press, New York, 1980), pp. 459–512. 3. J. Kirz, “Specimen damage considerations in biological microprobe analysis,” in “Scanning Electron Microscopy,” , vol. 2 (SEM Inc., Chicago, 1980), vol. 2, pp. 239–249. 4. L. Grodzins, “Intrinsic and effective sensitivities of analysis by x-ray fluorescence induced by protons, electrons, and photons,” Nuclear Instruments and Methods in Physics Research A 218, 203–208 (1983). 5. V. Cosslett and P. Duncumb, “Micro-analysis by a flying-spot x-ray method,” Nature 177, 1172–1173 (1956). 6. P. Horowitz and J. A. Howell, “A scanning x-ray microscope using synchrotron radiation,” Science 178, 608–611 (1972). 7. C. G. Ryan, R. Kirkham, R. M. Hough, G. Moorhead, D. P. Siddons, M. D. de Jonge, D. J. Paterson, G. De Geronimo, D. L. Howard, and J. S. Cleverley, “Elemental x-ray imaging using the Maia detector array: The benefits and challenges of large solid-angle,” Nuclear Instruments and Methods in Physics Research A 619, 37–43 (2010). 8. Y. Sun, S.-C. Gleber, C. Jacobsen, J. Kirz, and S. Vogt, “Optimizing detector geometry for trace element mapping by x-ray fluorescence,” Ultramicroscopy 152, 44–56 (2015). 9. P. Boisseau and L. Grodzins, “Fluorescence tomography using synchrotron radiation at the NSLS,” Hyperfine Interactions 33, 283–292 (1987). 10. R. Cesareo and S. Mascarenhas, “A new tomographic device based on the detection of fluorescent x-rays,” Nuclear Instruments and Methods in Physics Research A 277, 669–672 (1989). 11. M. D. de Jonge and S. Vogt, “Hard x-ray fluorescence tomography an emerging tool for structural visualization,” Current Opinion in Structural Biology 20, 606–614 (2010). 12. S. Vogt, “MAPS: A set of software tools for analysis and visualization of 3D x-ray fluorescence data sets,” Journal de Physique IV (Proceedings) 104, 635–638 (2003). 13. A. Muñoz-Barrutia, C. Pardo-Martin, T. Pengo, and C. Ortiz-de Solorzano, “Sparse algebraic reconstruction for fluorescence mediated tomography,” Proceedings of SPIE 7446, 744604–10 (2009). 14. E. X. Miqueles and A. R. De Pierro, “Iterative reconstruction in x-ray fluorescence tomography based on Radon inversion,” IEEE Transactions on Medical Imaging 30, 438–450 (2011). 15. D. Gürsoy, T. Biçer, A. Lanzirotti, M. G. Newville, and F. De Carlo, “Hyperspectral image reconstruction for x-ray fluorescence tomography,” Optics Express 23, 9014–9023 (2015). 16. G. Schmahl and D. Rudolph, “Proposal for a phase contrast x-ray microscope,” in “X-ray Microscopy: Instrumentation and Biological Applications,” , P. C. Cheng and G. J. Jan, eds. (Springer-Verlag, Berlin, 1987), pp. 231–238. 17. T. J. Davis, D. Gao, T. E. Gureyev, A. W. Stevenson, and S. W. Wilkins, “Phase-contrast imaging of weakly absorbing materials using hard x-rays,” Nature 373, 595–598 (1995). 18. Y. P. Hong, S.-C. Gleber, T. V. O’Halloran, E. L. Que, R. Bleher, S. Vogt, T. K. Woodruff, and C. Jacobsen, “Alignment of low-dose x-ray fluorescence tomography images using differential phase contrast,” Journal of Synchrotron Radiation 21, 229–234 (2014). 19. L.-T. Chang, “A method for attenuation correction in radionuclide computed tomography,” IEEE Transactions on Nuclear Science 25, 638–643 (1978). 20. J. Nuyts, P. Dupont, S. Stroobants, R. Benninck, L. Mortelmans, and P. Suetens, “Simultaneous maximum a posteriori reconstruction of attenuation and activity distributions from emission sinograms,” IEEE Transactions on Medical Imaging 18, 393–403 (1999). 21. H. Zaidi and B. Hasegawa, “Determination of the attenuation map in emission tomography,” Journal of Nuclear Medicine 44, 291–315 (2003). 22. J. P. Hogan, R. A. Gonsalves, and A. S. Krieger, “Fluorescent computer-tomography a model for correction of x-ray absorption,” IEEE Transactions on Nuclear Science 38, 1721–1727 (1991). 23. P. J. La Rivière, “Approximate analytic reconstruction in x-ray fluorescence computed tomography,” Physics in Medicine and Biology 49, 2391–2405 (2004). 24. E. X. Miqueles and A. R. De Pierro, “Exact analytic reconstruction in x-ray fluorescence CT and approximated versions,” Physics in Medicine and Biology 55, 1007–1024 (2010). 25. T. Yuasa, M. Akiba, T. Takeda, M. Kazama, A. Hoshino, Y. Watanabe, K. Hyodo, F. A. Dilmanian, T. Akatsuka, and Y. Itai, “Reconstruction method for fluorescent x-ray computed tomography by least-squares method using singular value decomposition,” IEEE Transactions on Nuclear Science 44, 54–62 (1997). 26. Q. Yang, B. Deng, W. Lv, F. Shen, R. Chen, Y. Wang, G. Du, F. Yan, T. Xiao, and H. Xu, “Fast and accurate xray fluorescence computed tomography imaging with the ordered-subsets expectation maximization algorithm,” Journal of Synchrotron Radiation pp. 210–215 (2012). 27. C. G. Schroer, “Reconstructing x-ray fluorescence microtomograms,” Applied Physics Letters 79, 1912 (2001). 28. P. J. La Rivière, D. Billmire, P. Vargas, M. Rivers, and S. R. Sutton, “Penalized-likelihood image reconstruction for x-ray fluorescence computed tomography,” Optical Engineering 45, 077005 (2006). 29. Q. Yang, B. Deng, G. Du, H. Xie, G. Zhou, T. Xiao, and H. Xu, “X-ray fluorescence computed tomography with absorption correction for biomedical samples,” X-ray Spectrometry 43, 278–285 (2014). 30. B. Golosio, A. Simionovici, A. Somogyi, L. Lemelle, M. Chukalina, and A. Brunetti, “Internal elemental microanalysis combining x-ray fluorescence, Compton and transmission tomography,” Journal of Applied Physics 94, 145–156 (2003). 31. B. Vekemans, L. Vincze, F. E. Brenker, and F. Adams, “Processing of three-dimensional microscopic x-ray fluorescence data,” Journal of Analytical Atomic Spectrometry 19, 1302–1308 (2004). 32. Z. Di, S. Leyffer, and S. M. Wild, “Optimization-based approach for joint x-ray fluorescence and transmission tomographic inversion,” SIAM Journal on Imaging Sciences 9, 1–23 (2016). 33. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging (IEEE Press, 1988). 34. J. Sherman, “The theoretical derivation of fluorescent x-ray intensities from mixtures,” Spectrochimica Acta 7, 283–306 (1955). 35. T. Schoonjans, A. Brunetti, B. Golosio, M. S. del Rio, V. A. Solé, C. Ferrero, and L. Vincze, “The xraylib library for x-ray-matter interactions. Recent developments,” Spectrochimica Acta Part B: Atomic Spectroscopy 66, 776–784 (2011). 36. W. T. Elam, B. D. Ravel, and J. R. Sieber, “A new atomic database for x-ray spectroscopic calculations,” Radiation Physics and Chemistry 63, 121–128 (2002). 37. J. A. Browne and T. J. Holmes, “Developments with maximum-likelihood x-ray computed tomography: initial testing with real data,” Applied Optics 33, 3010–3022 (1994). 38. T. J. Holmes and Y.-H. Liu, “Acceleration of maximum-likelihood image restoration for fluorescence microscopy and other noncoherent imagery,” Journal of the Optical Society of America A 8, 893–907 (1991). 39. S. G. Nash, “A survey of truncated-Newton methods,” Journal of Computational and Applied Mathematics 124, 45–59 (2000). 40. D. Gürsoy, F. De Carlo, X. Xiao, and C. Jacobsen, “TomoPy: a framework for the analysis of synchrotron tomographic data,” Journal of Synchrotron Radiation 21, 1188–1193 (2014). 41. P. C. Hansen and D. P. O’Leary, “The use of the L-curve in the regularization of discrete ill-posed problems,” SIAM Journal on Scientific Computing 14, 1487–1503 (1993).


Introduction
The use of characteristic x-ray emission lines to distinguish between different chemical elements in a specimen goes back to the birth of quantum mechanics [1].X-ray fluorescence can be stimulated by energy transfer from electron or proton beams, but the best combination of sensitivity and minimum radiation damage is provided by using x-ray absorption [2][3][4] for this purpose.This is usually done in a scanning microscope mode, where a small x-ray beam spot is raster-scanned across the specimen while x-ray photons are collected by an energy dispersive detector that provides a measure of the energy of each emitted photon [5].Following early demonstrations [6], x-ray fluorescence microscopy is now commonplace in many laboratories and in particular at a wide range of synchrotron radiation light source facilities worldwide.Because the x-ray beam from synchrotron light sources is usually linearly polarized in the horizontal direction, the energy dispersive detector is usually located at a position 90 • in the horizontal from the incident beam so as to be centered on the direction of minimum elastic scattering as shown in Fig. 1 (other energy dispersive detector positions can be used [7], with various relative merits [8]).Because the depth of focus of the x-ray beam is usually large compared to the specimen size, one can treat the incident x-ray beam as a pencil beam of constant diameter and thus rotate the specimen to obtain a set of 2D projections from each x-ray fluorescence line for tomographic reconstruction [9,10], even for elements present at low concentration such as trace elements in biological specimens [11].
A common approach is to collect the photon counts within predetermined energy windows, or to use per-pixel spectral fitting [12], so as to get immediate elemental concentration maps.These maps are then used in conventional tomographic reconstruction schemes, such as filtered back projection and iterative reconstruction techniques [13,14].A more recent approach has been to use a penalized maximum likelihood estimation method on the per-pixel spectra recorded by the energy dispersive detector for improved quantification and elemental separation [15]; we refer to this as full-spectrum analysis.A separate complication involves the correction of fluorescence self-absorption, where characteristic x-ray fluorescence photons emitted from one voxel in a 3D specimen might suffer absorption in specimen material that lies between this voxel and the energy dispersive detector.There have been interesting approaches to correct for self-absorption as will be discussed below, but these approaches have not been combined with full-spectrum analysis.In addition, phase contrast dominates over absorption contrast in transmission imaging using multi-keV x-rays [16,17] and it can provide a superior imaging signal for the alignment of x-ray fluorescence tomography datasets when rotation stage run-out error is significant [18].For these reasons, we consider here a combined approach that incorporates both full-spectrum fluorescence analysis, and transmission imaging using absorption and phase contrast, as part of an optimization-based approach to fluorescence tomography analysis by using a complete forward model of the x-ray imaging process.

Fluorescence Self-Absorption
We begin with a simple illustration of self-absorption.Consider a specimen that consists of a 200 µm diameter borosilicate glass cylinder with a 10 µm diameter tungsten wire off to one side, and a 10 µm gold wire off to another side (Fig. 2).The borosilicate glass was assumed to consist of 81% SiO 2 , 13% B 2 O 3 , 3.5% Na 2 O, 2% Al 2 O 3 , and 0.5% K 2 O, with a density of 2.23 g/cm 3 .If a chromatic x-ray focusing optic like a Fresnel zone plate is used to produce the x-ray pencil beam, a monochromatic x-ray beam should be used for scanning and its energy might be set to 12.1 keV to be well-separated in energy from the Au L β 1 fluorescence line at 11.4 keV.As the specimen is rotated, one obtains x-ray transmission (XRT) sinograms based on the attenuation of 12.1 keV x rays in Si, W, and Au as shown at right in Fig. 2. The situation with the x-ray fluorescence (XRF) signal is different; when the x-ray beam is at the right edge of the Si cylinder, the Si Kα 1 x-rays with 1.74 keV photon energy will have to traverse nearly 200 µm of Si before they reach the XRF detector located at left (Fig. 1).Since Table 1 shows that the absorption length of 1.74 keV x-rays is 1.66 µm in glass, the fraction of the signal reaching the XRF detector is only exp[−200/1.66]≃ 5 × 10 −53 so that essentially none of the Si XRF signal is detected in this case.In fact, only the Si XRF signal from the side nearest to the XRF detector is registered, so that from the Si XRF signal one cannot distinguish between a solid Si cylinder versus one that is hollowed out as shown in the bottom row of Fig. 2. The Au and W XRF signals can better traverse through the Si cylinder to reach the XRF detector, and moreover the 12.1 keV incident beam is also only partly absorbed so by combining all of these signals one can indeed distinguish between a solid and hollow Si cylinder.
Several methods have been used to correct for the self-absorption effect, including earlier approaches used for radionuclide emission tomography [19][20][21].If one can measure the transmission sinograms of the specimen at the energies of all x-ray fluorescence lines, it is possible to correct for self absorption [22].However, this approach is exceedingly difficult to realize experimentally, since a large number of x-ray fluorescence lines are present in many specimens (see Fig. 11) and one would need to collect a transmission tomography dataset at each of these energies.In the case of uniform absorption and illumination at a single x-ray energy, analyt-  1: X-ray absorption lengths µ −1 for silicon (Si), a borosilicate glass with a composition described in Sec. 2, tungsten (W), and gold (Au) at the energies of selected x-ray fluorescence lines (see also the spectrum in Fig. 11) and also for an incident x-ray energy of 12.1 keV.As can be seen, Si x-ray fluorescence in particular will be strongly self-absorbed according to the Lambert-Beer law of I = I 0 exp[−µt] for a material of thickness t.The x-ray attenuation coefficient µ E e for element e at x-ray energy E is just the inverse of the absorption length.
ical approaches have been developed [23,24] and these have been shown [14] to provide a good starting point iterative methods we now describe.One approach is to use algebraic (rather than filtered back projection) reconstruction methods to better handle limited rotational sampling, and least-squares fitting to better handle quantum noise [25]; other approaches have used ordered-subsets expectation maximization [26].In more recent work, optimization approaches have been introduced where the transmission tomography data at a single x-ray energy was used to estimate the absorption at all x-ray fluorescence energies using the fact that (in the absence of x-ray absorption edges) x-ray absorption scales in a power-law relationship with x-ray energy [27][28][29].One can also add the Compton scattered signal as another measurement of overall specimen electron density, and use the tabulated absorption coefficients µ E e of all elements e at each fluorescence energy E [30].Other approaches classify the specimen as being composed of a finite number of material phases for the calculation of self-absorption [31].The optimization approaches in particular serve as inspiration for our approach, which we believe is unique in combining both full-spectrum analysis and transmission imaging along with fluorescence.
We begin by briefly describing the mathematical "forward models" of XRF and XRT.Next, we detail our joint reconstruction approach and the formulation of the objective function and corresponding optimization algorithm.We then discuss choices of scaling parameters in the numerical implementation of the algorithm and present the performance of our joint inversion compared with existing approaches on real datasets.We start from an earlier approach [32], which we extend considerably here.We use θ ∈ Θ and τ ∈ T to denote, respectively, the index of the x-ray beam angle and discretized beamlet from a collection of |Θ| angles and |T | beamlets.The set V denotes the collection of |V | spatial voxels used to discretize the reconstructed sample.By L = [L θ,τ v ], we denote the intersection lengths (in cm) of beamlet (θ, τ) with the voxel v ∈ V. We use E to denote the collection of |E | possible elements and µ E e to denote the mass attenuation coefficient (in cm 2 g −1 ) of element e at beam incident energy E. Our goal is to recover W W W = [W v,e ], the concentration (in g cm −3 ) of element e in voxel v.

XRT Imaging Model
A conventional way (see, e.g., [33]) to model the XRT projection FT θ,τ (in counts/sec) of a sample from beamlet (θ, τ) is where I 0 is the incident x-ray intensity (in counts/sec) and μ is the linear attenuation absorption coefficient (in cm −1 ) at incident energy E.
To better explore the correlation of XRF and XRT and to link these two data modalities by the common variable W W W, we note that the coefficients μ To obtain a simple proportional relationship, we divide both sides of Eq. 3.2 by I 0 and then take the logarithm to obtain the XRT forward model used in this work: We similarly take the logarithm of the raw XRT sinograms used in this paper.

XRF Imaging Model
Our discrete model follows an elemental approach, in the sense that we model the XRF energy emitted from a single elemental atom by its corresponding elemental unit spectrum.Then, justified by the fact that photon counts are additive, the total XRF spectrum detected from the given sample is estimated as a weighted sum of the unit spectra of the elements being recovered.First, we model the net XRF intensity I e,ℓ, s , which corresponds to the characteristic XRF energy E e emitted from element e, by Sherman's equation [34] up to first order (i.e., neglecting effects such as Rayleigh and Compton scattering): where c e is the total concentration of element e (c e = 1 in the case of our unit spectra), ω e,ℓ is the XRF yield of e for the spectral line ℓ, and r e, s is the probability that a shell s electron (rather than other shell electrons) will be ejected.
In our calculations, the quantity ω e,ℓ 1 − 1 r e, s µ E e is approximated by the XRF cross sections provided from xraylib [35].Next, we convert the intensity to a spectrum by incorporating the practical experimental environment.Given an energy-dispersive XRF detector with energy channels x i ,i = 1, . . ., |I|, we define an indicator function Then we have the ideal, delta-function peak I x e,l, s = I e,l, s 1 x E e .In practice, because of the detector energy resolution [2], discrete x-ray lines get broadened by a Gaussian distribution with a standard deviation σ.The resulting unit spectrum of element e is thus given by M e = ℓ, s M e,ℓ, s , where and where ⊙ denotes pointwise (Hadamard product) multiplication and F (F −1 ) is the (inverse) Fourier transform.To simplify the model, we consider only the K α , K β , L α , L β , and M α lines as tabulated [36].
We then model the total XRF spectrum of a sample with multiple elements by explicitly considering the attenuation of the beam energy and the self-absorption effect of the XRF energy.We represent the attenuation experienced by beamlet (θ, τ) (at incident beam energy E) as it travels toward voxel v by where I X is the indicator (Dirac delta) function for the event X and U θ,τ v is the set of voxels that are intersected by beamlet (θ, τ) before it enters voxel v.
We let P θ,τ v,e (W W W) be the attenuation of XRF energy emitted from element e at voxel v by beamlet (θ, τ).To reduce the complexity of the calculation, instead of tracking all the emitted photons isotropically, we consider only the emission from the region Ω v .This region is the part of the sample discretization that intersects the pyramid determined by the centroid of the voxel v and the XRF detector endpoints; see Fig. 1 for a 2D illustration.In a slight abuse of notation, we let v ′′ ∈ Ω v indicate that the centroid of voxel v ′′ is contained in the region Ω v .Then the self-absorbed XRF energy is approximated by where a(Ω) is the volume of Ω (or area of Ω for a 2D sample) and µ E e e ′ is the linear attenuation coefficient of element e ′ at the XRF energy E e of element e. Accordingly, the fluorescence spectrum F R θ,τ (in counts/sec) of the sample resulting from beamlet (θ, τ) is the |I|-dimensional vector

Optimization-Based Reconstruction Formulations and Algorithms
We take D T θ,τ ∈ R and D R θ,τ ∈ R |I | to denote the experimental data for XRT and XRF, respectively.We now solve reconstruction problems involving the models F T θ,τ (W W W ) and F R θ,τ (W W W). Given that both these data sources are derived from measured photon counts, we follow a maximum likelihood approach that assumes the measurements are subject to independent Poisson noise [37,38].First, we take a logarithm of D T θ,τ and work with D T θ,τ = − ln( D T θ,τ /I 0 ).Maximizing the likelihood (derived in App.A) for our joint inverse problem then can be written as min where the non-negativity constraint W W W ≥ 0 is enforced to respect the physical nature of mass; φR and φT correspond to the XRF and XRT objective terms, respectively; and β 1 ≥ 0, β 2 ≥ 0 are scaling parameters.The scaling parameter β 1 balances the ability of each modality to fit the data, and β 2 detects the relative variability between the data sources D R θ,τ and D T θ,τ .Advances in x-ray sources, optics, and detectors mean that the datasets to be analyzed can be large; thus, having a fast and memory-efficient algorithm to solve (4.8) is highly desirable.Therefore, we apply an alternating direction approach described in Alg. 1.In this approach, instead of directly minimizing Eq. 4.8, we first solve an "inner iteration" subproblem: , and where A(W W W i ) and P P P(W W W i ) are fixed given the current solution W W W i at the "outer" iteration i of Alg. 1.
To approximately solve Eq. 4.9, we adapt a form of the inexact truncated Newton (TN) method in [39].We write TN as a function of the form which applies k iterations of TN to the problem in Eq. 4.9 with initial guess W W W i to obtain W W W i+1 .In particular, we use a bound-constrained preconditioned conjugate gradient method to obtain the search direction, followed by a backtracking line search (see Alg. 2) to obtain the next iterate W W W i+1 .The process is repeated until desired stopping criteria are satisfied; in Alg. 1 we repeat until consecutive iterates are within a user-specified distance ǫ of one another.Since the focus of this work is to show the potential of joint inversion with multimodal data, future work will address convergence analysis of Alg. 1.

Experimental Reconstruction
Algorithm 1 Algorithm for Solving Joint Inversion with Linearized XRF Term.
1: Given tolerance ǫ > 0 and initial W W W 0 ; initialize iteration counter i = 0. 2: repeat 3: 4: Compute the search direction e i = W W W i+1 − W W W i .

6:
Use backtracking line search α = LIN_NAIVE(e i ,W W W i , φ, 1) (see Alg. 2) to obtain 4: We now demonstrate the benefit of the proposed joint inversion approach by using experimental data.We constructed a simple test sample consisting of a borosilica glass rod with a composition as described in Sec. 2, wrapped with a a W wire of 10 µm diameter, and a Au wire of 10 µm diameter.This test specimen was scanned by an incident beam energy of 12.1 keV at beamline 2-ID-E at the Advanced Photon Source at Argonne National Laboratory.Each projection was acquired by raster scanning with a 200 nm scanning step size and 73 scanning angles over an angular range of 360 • .At each scanning step, the transmission signal was acquired by using a segmented photon diode located downstream of the sample, while the fluorescence signals were collected by using an energy-dispersive detector (Vortex ME-4) located at 90 • relative to the incident beam, covering an energy range of 0-20 keV with 2,000 energy channels.The final XRT data contains 73 projections with each slice involving 1, 750 × 51 pixels, leading to a total data set of dimension 73 × 1, 750 × 51 × 2, 000.
As expected from the attenuation lengths given in Table 1 and the simulations shown in Fig. 2, this dataset shows strong self-absorption in the Si fluorescence measurements.We compare our reconstruction result with the output of TomoPy 0.1.15[40], a widely used tomographic dataprocessing and image reconstruction library.TomoPy takes the elemental concentration map decomposed from the raw spectrum by the program MAPS 1.2 [12] for improved photon statistics compared with the raw data.The MAPS program fits the full energy spectrum recorded at each scan to a set of x-ray fluorescence peaks plus background signals, and it returns a 2D dataset corresponding to a certain elemental concentration per scan.Figure 3 shows three elemental XRF sinograms of interest as calculated by this approach.Within TomoPy, we used a maximum likelihood expectation maximization algorithm to reconstruct the three sinograms separately.All numerical experiments are performed on a platform with 1.5 TB DDR3 memory and four Intel E7-4820 Xeon CPUs.
For the purposes of algorithm testing with reduced computational cost, we reconstructed only a 2D middle slice (x, z) of the 3D glass rod dataset (x, y, z).For each angle, we summed together 9 adjacent y slices of both XRF spectra and XRT measurements as the input experiment data.Therefore  Based on the different magnitudes of these two datasets, we choose β 2 = 10 as the scaling parameter to balance the measurement variability of the two data sources, so that both measurements have maxima near 10 in their respective units.As a result, the relative variability of the two detectors between the data sources is mitigated.
β 1 =0.25The curve displays the tradeoff between these two modalities.

Selection of β Values
Next, we explain one way to select values for β 1 and β 2 for use in the objective in Eq. 4.8.
We recall that the effect of β 2 is to balance the magnitudes of the XRF and XRT measurement data, and that its exact value is not critical.Therefore, according to the magnitude difference of two data sources shown in Fig. 4, we chose to use β 2 = 10 to balance this difference so that both measurements have maxima near 10 in their respective units.The selection of β 1 is accomplished by applying the so-called L-curve method [41].In Fig. 5, we plot the L-curve defined by the curve of XRT terms φT versus XRF terms φR obtained from solving Eq. 4.8 with different β 1 values.This curve displays the tradeoffs between these two modalities and provides an aid in choosing an appropriate balancing parameter β 1 .The curvature, defined as the curvature of a circle drawn through three successive points on the L-curve, is calculated and plotted in Fig. 6.As suggested by [41], we choose the point on the L-curve with the maximum curvature; according to the objective values of φR and φT shown in Fig. 5 and the curvature shown in Fig. 6, this is β 1 = 1.Consistently, Fig. 10 shows the reconstructed elemental maps corresponding to different β 1 , and β 1 = 1 returns the results that are closest to the ground truth.

Joint Inversion Results
Given W W W 0 = 0 as the initial guess for the joint inversion, Fig. 7 shows the reconstruction result for each element by using Alg. 1 with ǫ = 10 −6 .In particular, Fig. 8 shows the performance of the inner iteration by TN to reduce both the XRF and XRT objective values.Correspondingly, Fig. 9 shows the reconstructed result of each outer iteration of Alg. 1.The reconstructed elemental maps show the benefits of our joint inversion mainly from two perspectives.First, because of the imperfections of spectral fitting and background rejection, the decomposed elemental concentrations show certain artifacts-which we call the "shadow effect"-where certain elemental sinograms pick up other elements' signals.For example, according to the ground truth, we know that Si exists only in the rod part with a cylinder shape; but its corresponding sinogram from MAPS, shown in Fig. 3, shows that it also exists in the two wires, which is caused by imperfect data fitting.Those two extra waves are actually picked up from Au and W signals Fig. 6: Method for choosing the parameter β 1 that appears in the cost function of Eq. 4.8: The curvature from successive points in Fig. 5; the point with maximum curvature occurs at β 1 = 1.
because certain emission lines of Au and W overlap those of Si.As a result, the reconstruction from TomoPy based on these decomposed sinograms will contain the "shadow points," which are shown as two small dots around Si and a dot in the W map in the left bottom corner of Fig. 7. Comparing the results from XRF single inversion using our forward model with the To-moPy output, we see that our forward model is able to better distinguish the different elemental signals; that is, the "shadow effect" is greatly mitigated.Furthermore, by introducing the XRT modality into the reconstruction, the joint inversion not only suppressed the artifacts from the "shadow effect" and the strikes introduced by the misalignment between projections but also more accurately recovered Si by filling the inside of the cylinder and thereby correcting the self-absorption effect.Also, we provide a quantity evaluation of our reconstruction.We simulate the XRF spectrum based on the reconstructed elemental composition and compare it to the real experimental data as shown in Fig. 11.We can see that, except the background region that our forward model does not include to simulate, the essential peaks corresponding to the main elements we are interested to cover agree very well with the experimental data.This comparison indicates that not only our joint reconstruction improves the solution from a visualization perspective, but also from quantification point of view.Furthermore, it indicates a satisfying accuracy of our XRF forward model.

A. Maximum Likelihood Derivation
We assume that the measurement data are independent and that each measurement D j follows a Poisson distribution with mean F j (W W W ). The likelihood for any D j is then By the assumed independence of the measurements, the joint likelihood is Our approach requires first-order derivatives, which are easily derived in the Poisson noise setting.For a particular (voxel v, element e) pair, the first-order derivative of (A.The calculation of the remaining derivatives ∂ ∂W v, e F j (W W W) is described in our previous paper [32].
The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (Argonne).Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357.The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.http://energy.gov/downloads/doe-public-access-plan.

Fig. 1 :
Fig.1: Top view of the geometry used in x-ray fluorescence microscopy.The x-ray beam is treated as a pencil beam in the z direction that is raster-scanned across the specimen in 1D in the x direction, and the specimen is then rotated before another image is acquired (successive 2D planes are imaged by motion of the 3D specimen in the y direction, into/out of the plane of this top view).The x-ray transmission signal (absorption or phase contrast) is recorded, and the x-ray fluorescence (XRF) signal is recorded over an angular range of Ω v by using an energy dispersive detector located at 90 • to the beam, in the direction of the elastic scattering minimum for a horizontally polarized x-ray beam.The grid overlay on the specimen shows its discretization with a pixel size of L v ; the set of pixels (in 2D; voxels in 3D) through which the XRF signal might undergo self-absorption in the specimen is indicated in orange.

Fig. 2 :
Fig.2: Illustration of the x-ray fluorescence self-absorption effect, and how x-ray transmission can be used to recognize and correct for it.We show here a specimen composed of cylinders, or circles in this top view.The largest is of borosilicate glass (composition described in Sec. 2) with 200 µm diameter, followed by tungsten (W) with 10 µm diameter, and gold (Au) with 10 µm diameter.As 1D scans in beamlet positions τ are collected at successive specimen rotation angles θ, one builds up sinograms or (τ, θ) views of elemental x-ray fluorescence (XRF) signals such as the Si XRF signal shown in the middle, as well as 12.1 keV x-ray transmission (XRT) sinograms as shown at right (based on absorption contrast; phase contrast images can also be used).If there is no self-absorption of the fluorescence signal, one obtains a Si XRF sinogram as shown in the top row, where the incident x-ray beam is partially absorbed in the small W and Au wires as they rotate into positions to intercept the incident beam before it reaches the glass cylinder.However, the 200 µm diameter glass cylinder is large compared to the 1.66 µm absorption length µ −1 of Si Kα 1 x-rays in the glass as shown in Table1, so that a fraction 1 − exp[−200/1.66]≃ 1 − 5 × 10 −53 of the Si XRF signal will be self-absorbed in the rod.As a result, the Si XRF signal will be detected mainly when the incident beam is at the left side of the scan; this leads to the Si XRF sinogram shown in the middle row (the sinogram also shows absorption of the Si XRF signal in the W and Au wires as they rotate through positions where they partly obscure the XRF detector's view of the Si cylinder).In the bottom row we show the case where the glass cylinder is hollow, with a wall thickness of 30 µm that is nevertheless large compared to the 1.66 µm absorption length of Si XRF photons; in this case the Si XRF sinogram is almost unchanged, but the XRT sinogram is clearly different.By using the combined information of the fluorescence (XRF) and transmission (XRT) sinograms, one can in principle obtain a better reconstructed image of the specimen in the case of strong fluorescence self-absorption.

Fig. 3 :
Fig. 3: Relative elemental concentrations obtained from a MAPS-based fit of the raw x-ray fluorescence data for the glass rod sample.Due to the imperfection of fitting and background rejection (which might be able to be corrected with additional expert input), the decomposed elemental concentrations show certain artifacts, where certain elemental sinograms pick up other elements' signals.For example, according to the ground truth, we know that Si exists only in the rod part with a cylinder shape; but its corresponding sinogram shows that it also exists in the two wires, which is caused by imperfect data fitting.Those two extra waves are actually picked up from Au and W signals because certain emission lines of Au and W overlap those of Si.

Fig. 4 :
Fig. 4: Experimental sinograms.Left: mean (across energy channels) value of XRF raw spectrum; Right: XRT sinogram.Based on the different magnitudes of these two datasets, we choose β 2 = 10 as the scaling parameter to balance the measurement variability of the two data sources, so that both measurements have maxima near 10 in their respective units.As a result, the relative variability of the two detectors between the data sources is mitigated.

Fig. 5 :
Fig. 5: Method for choosing the parameter β 1 that appears in the cost function of Eq. 4.8: XRF objective value φR versus XRT objective value φT given different values β 1 , with fixed β 2 = 10.The curve displays the tradeoff between these two modalities.

Fig. 7 :
Fig. 7: Comparison of reconstruction results for MAPS+TomoPy, XRF alone, and joint reconstruction, respectively, given an initial guess of all zeros, β 1 = 1, and β 2 = 10.Every elemental map is rescaled to the range [0, 0.25].It is clear that the joint reconstruction returns the best result from two perspectives: first, the glass rod is filled with Si; and second, the "shadow effect" is dramatically mitigated for the reconstruction of Si and W.

Fig. 8 :
Fig.8: Convergence of TN for each inner iteration j, given a maximum number of inner iterations as 50, β 1 = 1, and β 2 = 10.We can see that along the iterations, TN is reducing the objective function so that the forward model fits better and better to the given data.

Fig. 9 :
Fig. 9: Solution for each outer iteration i, given ( β 1 , β 2 ) = (1, 10).At iteration i = 3, Alg. 1 reaches its stopping criterion in the sense that the solution does not change anymore.It also indicates that our alternating algorithm is very efficient to converge to the solution within only a few iterations.

jf
(D j ; F j (W W W )).The problem of maximizing the log likelihood is thusmax j ; F j (W W W)) = j ln f (D j ; F j (W W W )) = j ln F j (W W W ) D j exp{−F j (W W W ) } D j != j ln(F j (W W W) D j ) + ln(exp{−F j (W W W )}) − ln(D j !) .Since each D j is a scalar (independent of W W W), it is therefore equivalent to solve maxW W W ψ(W W W ) = j D j ln(F j (W W W)) − F j (W W W) .(A.10)

1 ∂
10) with respect to the concentration W v,e is∂ ∂W v, e ψ(W W W ) = j D j F j (W W W ) − ∂W v, e F j (W W W).