Automatic attenuation map estimation from SPECT data only for brain perfusion scans using convolutional neural networks

Yuan Chen; Marlies C Goorden; Freek J Beekman

doi:10.1088/1361-6560/abe557

1. Introduction

In SPECT, attenuation of photons in tissue hampers quantitative analysis of regional tracer uptake and may lead to image artefacts. Attenuation correction (AC) is thus required to improve the diagnostic value and quantitative accuracy of reconstructed images. Besides, quantitative SPECT (i.e. SPECT that provides a precise estimate of the activity level in kBq. ${{\rm{ml}}}^{-1}$ ) is also important for dosimetry planning (Bailey and Willowson 2013).

Correction for photon attenuation is commonly based on a 3D map (attenuation map or $\mu$ map) that quantifies the amount of attenuation in each voxel within the patient at the given photon energy. Today, these $\mu$ maps are often derived from an additional CT or MRI scan. However, such an additional scan may not be available, can add radiation dose in case of CT, and is prone to registration errors (King et al 2004, Goetze et al 2007, Berker and Li 2016). Besides the use of additional CT or MRI scans, simple methods based solely on emission data that delineate the object contour and assume a uniform attenuation within the contour are used. Currently, manual placement of an ellipse approximating the head contour is still the most widely implemented approach for brain studies with commercial SPECT systems (Zaidi and Hasegawa 2003). Such a method however can be highly subjective and suffer from limited accuracy due to the operator dependency, coarse approximation of the skull contour and lack of internal head anatomy.

In clinical SPECT, photon-interactions in biological tissues are dominated by Compton scatter while photoelectric absorption is almost negligible. For example, for photons at 140 keV traveling through brain, only 1% (a linear coefficient of 0.0015/cm) of the total attenuation (0.1461/cm) is due to photoelectric effects. Therefore, one might expect that the detected scattered photons contain essential information about attenuation maps. Conventionally, the use of SPECT images reconstructed from scattered photons or projections in scatter windows for $\mu$ map generation has been investigated in the 90s for brain (Macey et al 1988) and body (cardiac and liver) SPECT (Macey et al 1988, Wallis et al 1995, Pan et al 1996, Younes et al 1998), yet in most cases solely a contour was estimated from a single scatter window while a uniform attenuation coefficient was assigned to the volume within the contour. This approach is seldom applied clinically, possibly due to the increased complexity in light of the limited improvement of accuracy compared to the ellipse method. With the advance of SPECT instruments, list mode acquisition—in which case the estimated interaction position as well as the energy are recorded simultaneously for every detected event—is gaining popularity. In the work of Jha et al (2013), the authors studied the information content in SPECT list mode data and proposed a MLEM approach using scattered photons from the observed list mode data to jointly reconstruct the attenuation and activity map. This, however, remains a theoretical study due to the complexity of the approach. Similarly, jointly reconstructing the attenuation and activity map using both the photopeak and scattered projections was also proposed in a newly published work for PET scans without time of flight information (Brusaferri et al 2020).

Recently, deep learning with convolutional neural networks (CNNs) has been widely investigated in medical image restoration and analysis, e.g. for tissue segmentation, image de-noising, and image transformation (e.g. MRI to CT) (Beekman 1993, Gong et al 2018, Leynes et al 2018, Mok et al 2018, Wang et al 2019, Shi et al 2020, Shiri et al 2020). These networks can capture relevant information inherent in data and establish highly nonlinear mapping from input to output. This enables the extraction of energy-spatial information from the scatter- and primary-window reconstructed SPECT images for $\mu$ map estimation. Successful implementations of this deep learning based strategy have been demonstrated in Shi et al (2020) for ^99mTc-tetrofosmin SPECT scans, and in Liu et al (2018a), Reimold et al (2019) for ¹⁸F-FDG brain PET scans. Specifically in Shi et al (2020), the authors performed image reconstructions from two energy windows (primary and one scatter window) of ^99mTc-tetrofosmin myocardial SPECT; a generative adversarial network was used to transform 4D SPECT image patches (3D SPECT over two energy windows) to 3D attenuation map patches. Results were validated on clinical myocardial scans acquired from a GE dual-head parallel-hole SPECT/CT 850 system.

The aim of the present work is to develop and validate a CNN approach for estimating $\mu$ maps of ^99mTc-HMPAO brain perfusion scanning with a full ring stationary SPECT system (G-SPECT-I, (Beekman et al 2015)). The full emission data including both primary and scattered photons over a broad energy range was utilized via multiple image reconstructions at different energy windows. A patch-based CNN approach was implemented in the present work. Such an approach is often applied in medical image restoration and analysis (Beekman 1993, Guo et al 2018, Leynes et al 2018, Liu et al 2018b) due to the reduced number of parameters and increased amount of training data that is required compared to the full-image to full-image approaches e.g. U-Net (Ronneberger et al 2015). In the present study, a large number of 4D (i.e. XYZE dimensions) SPECT patches (sub-volumes of 21 × 21 × 21 voxels × 5 energies) were used as input for the CNN to estimate the attenuation coefficient of the corresponding central voxel of the patch. Our proposed method was tested on Monte Carlo (MC) simulated ^99mTc-HMPAO SPECT scans based on the G-SPECT-I geometry.

2. Methods

2.1. G-SPECT-I system

The G-SPECT-I is a multi-pinhole system with stationary detectors that demonstrates excellent resolution down to 2.5 mm (resolved rod size for Derenzo hot rod phantom) and a central sensitivity of 415 cps/MBq when using a dedicated brain collimator with 3 mm $\varnothing$ pinholes (Beekman et al 2015). This system (see figure 1) consists of nine large-area NaI detectors, an interchangeable collimator and a precisely controlled xyz-stage used for bed translation. The collimator assumed in this paper was developed for high sensitivity brain and pediatric imaging and has a total of 54 pinholes with a pinhole diameter of 4.5 mm. The G-SPECT-I has a focused geometry design, similar to that of its preclinical predecessors, i.e. various versions of U-SPECT/CT and VECTor/CT scanners that are now in use by many labs worldwide (Beekman et al 2005, van der Have et al 2009, Goorden et al 2013, Ivashchenko et al 2014). Such focused geometries entail that all pinholes simultaneously 'view' a central volume in which a very high sensitivity and complete data (without bed movement) is obtained. This central volume is termed the complete data volume (CDV). This CDV has a transaxial diameter and axial length of 100 mm and 60 mm respectively (see figure 1). For a scan of an object larger than the CDV, the bed is translated to extend the volume in which sufficient sampling is obtained for optimal image reconstruction. In the present paper, for a whole brain perfusion scan, 18 bed translations with overlapping CDVs are used for sufficient sampling of the entire scan volume based on findings in (Chen et al 2020). All pinhole projections from all bed positions together are used simultaneously for image reconstruction using the so called scanning focus method (Vastenhouw and Beekman 2007). Other details concerning G-SPECT-I are described in Chen et al (2018).

**Figure 1.** Illustration of the G-SPECT-I scanner. The G-SPECT-I system has three optical cameras and a user interface for volume of interest (VOI) selection. Based on the selected VOI, G-SPECT-I software automatically adjusts parameters for optimal imaging of the VOI. The CDV is the volume 'seen' by all pinholes; it has a transaxial diameter R_c of 100 mm and an axial length L_c of 60 mm.
Download figure:
Standard image High-resolution image

2.2. SPECT data simulations

The CNN estimation of the $\mu$ maps is based entirely on MC simulated SPECT list-mode data for the G-SPECT-I geometry using head phantoms. As the presence of noise in the MC simulated realistic brain images may hamper visualization of AC effects when evaluating the $\mu$ maps, we additionally performed a voxelized ray tracing (VRT) simulation (Goorden et al 2016, Wang et al 2017) to generate noise-free images, such that the accuracy of AC and thus the quality of the $\mu$ maps could be studied on noisy as well as on noiseless images. These simulations are described in detail below (see also figure 2).

**Figure 2.** Illustration of SPECT data simulation. (a) MC simulations of the head phantoms; for each phantom, an activity and attenuation map were generated and were used for MC simulation. Background counts due to cosmic radiation were added to the MC simulated projection data. Five SPECT reconstructions were performed from different energy windows. (b) VRT simulations to generate noise-free images on which the effects of attenuation correction with different $\mu$ maps were studied. In the figure, 'bkg' stands for background, 'WM' is the white matter and 'GM' is the grey matter.
Download figure:
Standard image High-resolution image

$\mu $ — **Figure 2.** Illustration of SPECT data simulation. (a) MC simulations of the head phantoms; for each phantom, an activity and attenuation map were generated and were used for MC simulation. Background counts due to cosmic radiation were added to the MC simulated projection data. Five SPECT reconstructions were performed from different energy windows. (b) VRT simulations to generate noise-free images on which the effects of attenuation correction with different $\mu$ maps were studied. In the figure, 'bkg' stands for background, 'WM' is the white matter and 'GM' is the grey matter.
Download figure:
Standard image High-resolution image

2.2.1. Digital phantoms for simulation

The publicly available Brainweb database with digital phantoms generated based on normal subjects (Aubert-Broche et al 2006) were used to simulate ^99mTc-HMPAO brain perfusion scans. This type of scan was chosen given the wide application of perfusion SPECT in the diagnosis of cerebrovascular (e.g. stroke), neurological (e.g. epilepsy) and psychiatric disorders (e.g. post-traumatic stress disorder) (Catafau 2001, Juni et al 2009, Amen 2015). A total of six phantoms (the first six phantoms when ordered by subject number) were used in this work, with five used for training and one for testing in a leave-one-out cross validation manner.

For each phantom, the activity map was generated by assigning tracer concentrations to grey matter, white matter and background regions (e.g. skin, skeletal muscle) with a mean ratio of 80:20:5 (Glick and Soares 1997, Stodilka et al 2000, Pato et al 2015) to mimic a realistic blood flow. This ratio was introduced with a random variation (normally distributed) with a standard deviation of 10% for each number to make the activity distribution variable among phantoms. Besides, a total activity of 50 MBq in average was set in the head (resembling an injected dose of 25 mCi as in (Laere et al 2000, Nobili et al 2002, Bowen et al 2011)). This number was induced with a standard deviation of 10% (normally distributed) for each phantom. Simulations assumed a scan time of 30 min.

The attenuation map was obtained by segmenting the phantom into different regions. For MC simulations, the maps with each region assigned with a material (e.g. water) were used (see figure 2(a)), while for VRT simulations, the maps with each region assigned with an attenuation coefficient were used (subject to the different requirements of both simulators). A total of eight different regions were segmented, i.e. skull, skin, blood, muscle, brain, water, fat and air, with a corresponding coefficient of 0.248, 0.155, 0.149, 0.147, 0.146, 0.142, 0.128 and 0 cm⁻¹ respectively. These values were calculated based on the chemical component of each tissue and the mass attenuation coefficient given in NIST (National Institute of Standards & Technology) for photons at 140 keV. The map with given attenuation coefficient is also the ground truth (GT) $\mu$ map that was used in the cross-validation step for training and for evaluation of the network predicted $\mu$ maps. All phantoms for simulations were down-sampled with trilinear interpolation to have a voxel size of 1 × 1 × 1 mm³ from its original voxel size of 0.5 × 0.5 × 0.5 mm³.

2.2.2. Noisy MC simulations

The software used for MC simulations was the Geant4 Application for Tomographic Emission (GATE) (Jan et al 2004), with Geant4 v10 and GATE v8.0 installed on a CentOS 6.6 cluster. The long scanning time (30 min) simulation was divided into multiple simulations with shorter time intervals (randomly seeded) that were executed in parallel on a computer cluster.

The system geometry in GATE was designed to closely represent the actual G-SPECT-I design, namely, the computer-aided design (CAD) drawing of the G-SPECT-I collimator was put into GATE. Additionally, nine NaI-scintillators were created natively by defining a 497 × 410 × 9.5 mm³ box and replicating it with a ring repeater with a collimator center to detector distance of 757 mm (same as in the G-SPECT-I prototype). The full emission spectrum of ^99mTc was used in the simulations. Physics processes modelled were photoelectric effect, Compton and Rayleigh scattering for gamma photons; bremsstrahlung and multiple scattering for electrons. Within the scintillators, the interaction time, total deposited energy, and energy-weighted average interaction location for each gamma photon were recorded. The uncertainty of the scintillation process and light collection is not fully modelled in the MC simulations, but rather accounted for by taking random samples from a Gaussian distribution in both the spatial and energy domain for each photon recorded on the detector. This is to accelerate the simulation process. The full width at half max (FWHM) of the Gaussians were set according to findings in (Nguyen et al 2019): for photons at 140 keV, the respective spatial and energy blurring were 3.5 mm and 10% FWHM (based on measurements of detectors at our institute); for photons at other energies these two values were calculated based on models from literature (Nguyen et al 2019).

Background counts due to cosmic radiation were emulated to make the simulation more realistic. This was done by acquiring a long void scan (10 h scan without radioactivity) with the G-SPECT-I prototype, followed by count scaling (by dividing the count numbers by 360 to emulate a 100 s scan at one bed position) and Poisson statistics generation. This process was performed for all 18 bed positions to emulate a total acquisition time of 30 min. Finally, the background counts at each bed position were added to the corresponding projections obtained with the MC simulation.

2.2.3. Noise-free VRT simulations

The VRT simulator takes the system geometry (i.e. the precise pinhole and detector positions and detector orientations) as input and models the collimator and detector crystal penetration but ignores scattering (Goorden et al 2016, Wang et al 2017). For these noise-free VRT scan simulations, the same scanner geometry and the same acquisition parameters (e.g. the same 18 bed positions) as in the MC simulation were assumed. Cosmic radiation counts were not added in the VRT simulations. Effects of patient attenuation were included in the VRT simulated projection data.

2.3. Image reconstruction

For the MC simulated projection data, five SPECT reconstructions were performed (see figure 2(a)); one used the photons detected in the photopeak window combined with a triple energy window (TEW) scatter correction (28 keV width centered at 140 keV for the photopeak window and 5.6 keV width at each side for scatter correction) and four additional reconstructions were done from different scatter windows (28 keV width centered at 120, 100, 80 and 60 keV respectively). For the VRT simulated projection data, only primary photons were simulated and could thus be reconstructed, resulting in noise- and scatter-free SPECT scans (see figure 2(b)).

All image reconstructions were performed on a 1.5 mm grid, larger than the voxel size of the digital phantoms, to mimic a continuous activity distribution reconstructed on a discrete grid. The system matrix for reconstruction was calculated using the VRT simulator for photons at 140 keV and excludes effects of object scatter. This system matrix was used in image reconstruction for all energy windows. No attenuation correction was performed during reconstruction. Similarity regulated OSEM (Vaissier et al 2016) with 8 subsets and 10 iterations was implemented for image reconstruction.

2.4. Attenuation map ( $\mu$ map) estimation

2.4.1. SPECT image preprocessing

Before being used as input to the CNN to estimate $\mu$ maps, the reconstructed MC simulated SPECT images were preprocessed (see also figure 3). Firstly, the reconstructed noisy MC simulated SPECT images (from all energy windows) were masked to remove artefacts outside the head (figure 3(a)). The mask used was a cylinder with a relatively large diameter of 240 mm to safely preserve the brain/head structures. Secondly, intensity normalization was performed to compensate for variance of reconstructed image intensity among phantoms (figure 3(a)). Here for each phantom, the maximum intensity from the photopeak reconstructed image was firstly calculated. To reduce the effects of noise and/or any strong edge artefacts (which might appear at the edge slices of the reconstructed volume), this maximum value was obtained from slightly filtered SPECT scans (with a 6 mm FWHM Gaussian) and from the central 36-mm-thick slices of the reconstructed volume (the reconstructable length even with one bed position in the axial direction of G-SPECT-I, see figure 1). Subsequently, all five reconstructions from different energy windows were normalized by division by this maximum value such that the images from different phantoms that are used as input to CNN have a similar dynamic range.

**Figure 3.** Illustration of attenuation map estimation. (a) SPECT image preprocessing; (b) architecture of the CNN; $n$ is the number of energy windows. Three networks were tested. The network that takes five, three or one energy window as input is termed CNN5E, CNN3E and CNN1E respectively. 'Conv' and 'BN' are short for convolution and batch normalization respectively. The number of the filters is 32, 64, 128, 256 and 512 as shown in the figure. (c) Leave-one-out cross validation with six phantoms; for testing on each phantom, the other five phantoms are used for training.
Download figure:
Standard image High-resolution image

The MC SPECT images were interpolated (tri-linearly) to a voxel size of 3 × 3 × 3 mm³ from an original voxel size of 1.5 × 1.5 × 1.5 mm³ before being fed into the neural network. This is to speed up the training process with a relatively larger image voxel size.

2.4.2. CNN regression

The neural network takes 4D (XYZ and energy) patches centered at each voxel as input and was trained to predict the attenuation coefficient of the corresponding central voxel in object space (figure 3(b)). The patch size used was 21 × 21 × 21 voxels × 5 energies. The voxel size of the patch images is 3 × 3 × 3 mm³. We used a typical CNN architecture that consists of multiple stages of convolution, batch normalization, ReLU activation and max pooling, followed by a fully connected layer with a sigmoid for voxel regression (see figure 3(b)). The network that takes in total five energy windows as input is termed CNN5E. In addition, we included two networks using less energy windows to investigate the effect of energy range of scattered photons on $\mu$ map estimation. These two networks are termed CNN3E and CNN1E respectively, with three energy windows (one photopeak window and the two high-energy scatter windows, see figure 2(a)) and only the photopeak window involved respectively. Convolution was performed with a kernel size of 3 × 3 × 3. The number of the filters is 32, 64, 128, 256 and 512 as indicated in the figure. Pooling was done with a grid of 2 × 2 × 2.

Leave-one-out cross validation was performed using five phantoms in training and one for testing (see figure 3(c)). Each network was trained with a balanced set containing 5k samples (4D patches) randomly selected from each of the three main tissue classes (0 ≤ μ < 0.07/cm for air voxels, 0.07 ≤ μ < 0.20/cm for water-like voxels and 0.20 ≤ μ < 0.25/cm for bone voxels; thus 15k samples in total). Meanwhile, for each epoch during training, a new selection of 15k samples was made in order to feed the network with as many samples as possible, similar as in Moeskops et al (2016). The network used in this study was trained to minimize the mean square error between the predicted attenuation coefficient $\mu$ and the GT $\mu$ map (see description in section 2.2.1 for definition).

The proposed network was implemented using TensorFlow. The Adam optimizer (Kingma and Jimmy 2014) with default settings (learning rate = 0.001, ${\beta }_{1}$ = 0.9, ${\beta }_{2}$ = 0.999) was used to train the network. A mini-batch of 15 4D patches was used. No validation set was used to determine the optimal epoch due to the limited number of phantoms in this study. The model was trained for 200 epochs for convergence.

2.5. Attenuation correction with estimated maps

As the aim is to use the $\mu$ maps to realize quantitative SPECT, attenuation correction was implemented using the estimated maps. Correction was done using an adapted first-order Chang's method (Chang 1978). In Chang's method with traditional parallel-hole collimation, the transmission along every projection line from a given voxel is calculated, and the transmission fraction ( $TF$ ) for that voxel is defined as the average transmission value among all projection lines (equation (1)).

$\begin{eqnarray}&&TF=\displaystyle \frac{1}{M}\displaystyle \sum _{m=1}^{M}\exp \left(-\displaystyle {\int }_{{L}_{m}}\mu \left(l\right)dl\right).\end{eqnarray} \tag{ 1 }$

Here M is the number of projections involved in data acquisition for a certain voxel, ${L}_{m}$ is the mth projection path of gamma photons and $\mu$ is the attenuation coefficient along the projection line ${L}_{m}.$

Due to the (multi-pinhole multi-bed-position) characteristic of G-SPECT-I, for a given voxel at a given bed position, only photons that travel toward some distinct directions will be captured by one of the pinholes (see figure 4). Thus, here we implemented an adapted multi-pinhole Chang's method. This was done by first checking—at every bed position—if a voxel is seen by a pinhole. If yes, the transmission along this projection line (from that voxel center to the pinhole center) is counted, and weighted by the pinhole's sensitivity for that voxel (see equations (2)–(4)). The parameters used in the adapted multi-pinhole Chang's method (equations (2)–(4)) are summarized in table 1.

$\begin{eqnarray}&&TF=\displaystyle \frac{1}{\displaystyle {\sum }_{n=1}^{N}\displaystyle {\sum }_{p=1}^{P}{S}_{n,p}}\displaystyle \sum _{n=1}^{N}\displaystyle \sum _{p=1}^{P}\left({S}_{n,p}\,\exp \left(-\displaystyle {\int }_{{L}_{n,p}}\mu (l)dl\right)\right)\end{eqnarray} \tag{ 2 }$

$\begin{eqnarray}&&{S}_{np}=\displaystyle \frac{{d}_{e}^{2}}{16\,{{r}_{n,p}}^{2}}\,\cos \,{\vartheta }_{n,p}\end{eqnarray} \tag{ 3 }$

$\begin{eqnarray}&&{d}_{e}=\sqrt{{d}^{2}+\displaystyle \frac{2}{{\mu }_{0}}d\,\tan \left(\displaystyle \frac{\alpha }{2}\right)+\displaystyle \frac{2}{{{\mu }_{0}}^{2}}d\,{\tan }^{2}\left(\displaystyle \frac{\alpha }{2}\right)},\end{eqnarray} \tag{ 4 }$

**Figure 4.** Illustration of the adapted first-order Chang's method used in this study. With G-SPECT-I, for a given voxel at a given bed position, only photons that travel toward some distinct directions will be captured by one of the pinholes, as shown in the figure.
Download figure:
Standard image High-resolution image

Table 1. Parameters used in the adapted multi-pinhole Chang's method.

Symbol	Description
N	Number of bed positions in data acquisition
P	Number of pinholes 'seeing' the voxel
${S}_{n,p}$	Sensitivity of the given voxel corresponding to pinhole p at bed position n (Paix 1967)
$\mu$	Attenuation coefficient along the line ${L}_{n,p}$ from voxel to pinhole p center at bed position n
${r}_{n,p}$	Voxel-to-pinhole distance, given pinhole p and bed position n
${\theta }_{n,p}$	Angle between the line of voxel to pinhole center and the line of collimator center to pinhole center, given pinhole p and bed position n (see figure 4)
${d}_{e}$	Equivalent pinhole diameter
$d$	Actual pinhole diameter (4.5 mm here)
${\mu }_{0}$	Attenuation coefficient (24.3 cm⁻¹, measured experimentally) for pinholes made of hard lead (antimonial lead, containing a mixture of Pb and Sn)
$\alpha$	Pinhole opening angle (27° here)

2.6. Evaluation

2.6.1. Visual inspection

The network estimated $\mu$ maps were compared to the GT $\mu$ maps. Besides, attenuation corrected SPECT images using the three CNN estimated $\mu$ maps (CNN5E-AC, CNN3E-AC and CNN1E-AC) were compared to the ground-truth-AC (GT-AC) images which use the GT $\mu$ maps for correction.

2.6.2. Quantitative analysis

2.6.2.1. Standard uptake value (SUV)

As often used in absolute quantification for PET and SPECT, the standardized uptake values SUVs were calculated in a number of regions of interest (ROIs). The SUV (in units of g ml⁻¹) is the mean concentration in a region ${C}_{ROI}$ (Bq ml⁻¹) normalized by the injected dose per patient weight (Bq g⁻¹), as shown in equation (5). In this work a clinically injected dose of 25 mCi and a body weight of 70 kg were assumed for brain perfusion scans for all phantoms. To define the ROIs, a template phantom based on the 'Colin 27 Average Brain' (Collins et al 1998) provided by the Brainweb database was registered to subject phantoms. ROIs were generated by warping the automated anatomical labeling (AAL v3) template (Tzourio-Mazoyer et al 2002) to subject space using the same transformation. A total of 166 regions were defined in the AAL template. These localized regions have a size in the 0.07–41.15 ml range with a median value of 3.75 ml (mean of 6.40 ml) respectively (across all phantoms).

The capability to achieve quantitative SPECT was assessed by comparing the SUVs calculated from the attenuation corrected SPECT images to those of the digital phantom image. This direct comparison of the SUVs was performed in eight big structures as in Gong et al (2018), Yang et al (2019), including four big lobes (temporal, occipital, parietal and frontal lobe), three subcortical structures (thalamus, putamen and caudate nucleus) and the cerebellum. Here in our work these eight structures were further separated into 16 ROIs by hemispheres (e.g. thalamus in the left hemisphere and in the right).

$\begin{eqnarray}&&SUV({\rm{g}}\,{{\rm{ml}}}^{-1})=\displaystyle \frac{{C}_{ROI}}{Dose/Weight}.\end{eqnarray} \tag{ 5 }$

2.6.2.2. Difference from GT-AC

Differences in units of SUV ( $DIFF$ ) when using the CNN estimated $\mu$ maps for attenuation correction compared to that of the GT $\mu$ maps were evaluated (see equation (6)). The distribution of the differences were presented in Bland–Altman plots as in Yang et al (2019). Differences (in percentage) from the GT-AC images, which is termed deviations ( $DEV$ ) here, were calculated as in equation (7). Statistical significance of the deviations was assessed using paired t tests (p < 0.05 is considered as statistically significant) as in Liu et al (2018a), Yang et al (2019)

$\begin{eqnarray}&&DIFF({\rm{SUV}})=SUV-SU{V}_{GT-AC}\end{eqnarray} \tag{ 6 }$

$\begin{eqnarray}&&DE{V}_{SUV}\left( \% \right)=\left|\displaystyle \frac{SUV-\,SU{V}_{GT-AC}}{SU{V}_{GT-AC}}\right|\times 100 \% .\end{eqnarray} \tag{ 7 }$

For all SPECT images shown in this paper, a 3D Gaussian post filter with 6 mm FWHM was applied. For quantitative analysis, measurements were performed on the unfiltered SPECT images to avoid any bias from filtering.

3. Results

3.1. Visual inspection

Figure 5 shows a comparison of the CNN estimated and the GT $\mu$ maps. Five slices obtained from one of the phantoms (indicated by the solid black lines) are displayed. A full comparison of results for all six phantoms is included in the appendix (figure A1). Figure 5 shows that compared to the GT, $\mu$ maps can be well estimated with CNN5E and CNN3E in which cases scatter windows (besides the photopeak window) were involved as input to the CNN; in these cases, the shape and size of the heads are well retrieved. The improvement by including two additional low-energy scatter windows from CNN3E to CNN5E is small.

$\mu $ — **Figure 5.** Comparison of the ground truth and CNN estimated $\mu$ maps. Five slices (equally distributed with 25.5 mm separation) obtained from one of the phantoms are displayed. Locations of the are indicated by the solid lines. The predicted $\mu$ maps of CNN1E appear incorrectly large.
Download figure:
Standard image High-resolution image

CNN1E has inferior performance compared to CNN5E and CNN3E; the head size can be inaccurately predicted with CNN1E as in the example shown in figure 5 where the predicted $\mu$ maps appear incorrectly large. Besides, an insufficient estimation of the air and bone structures was observed for CNN1E as demonstrated in figure A1 (phantom number 5). This might be due to the fact that input to CNN1E includes only the photopeak window which represents solely the activity distribution. Information to correctly determine the air and bone voxels where there is barely any tracer accumulation is thus insufficient.

Figure 6 shows a comparison of the SPECT images as well as the image profiles when using the GT or CNN $\mu$ maps for attenuation correction. SPECT images presented in this figure are the noiseless scans from VRT simulations to better visualize the effects of attenuation correction. Results of corrections performed on the realistic MC simulated scans are included in the appendix (figure A3). Figure 6 shows that the image and image profile differences from the ground-truth-AC are small for CNN5E-AC and CNN3E-AC images, while CNN1E-AC images deviate more from the ground-truth-AC.

$\mu $ — **Figure 6.** Comparison of the attenuation corrected SPECT images using different $\mu$ maps. (a) Attenuation corrected SPECT images; image profiles through each slice are included and shown in panel (b). These profiles are taken from a line with a width and thickness of 4.5 mm. A zoomed view of some parts of the profiles are displayed in panel (c).
Download figure:
Standard image High-resolution image

3.2. Quantitative analysis

3.2.1. SUV

The absolute quantification results when using the ground-truth-AC and CNN-AC are shown in figure 7. The SUVs in the 16 merged ROIs (8 structures × 2 hemispheres) are plotted. Figure 7 shows that attenuation correction is essential to achieve accurate quantification; without correction, an underestimation of about 70% for the SUVs is observed. Compared to the digital phantom, the ground-truth-AC, CNN5E-AC and CNN3E-AC images suffer from slight underestimation of the SUVs, which could be due to the partial volume effects given the finite system resolution ( $\sim 3.5$ mm) or the imperfect attenuation correction method.

3.2.2. Difference from GT-AC

In the Bland–Altman plot (figure 8), the distribution of the SUV differences is assessed. Figure 8(a) plots the distribution for measurements taken from the merged 16 × 6 regions while those from the localized 166 × 6 regions over the entire brain are given in figure 8(b). In figure 8, differences from the ground-truth-AC are small (close to the zero line) for both the CNN5E-AC and CNN3E-AC. Among these two methods, CNN3E-AC shows a slightly more diverging distribution and thus a larger difference. A deviation from the ground-truth-AC of 10% is highlighted by the semi-transparent grey region. For CNN5E-AC, all measurements are within 10% deviation across all regions.

The mean value of the deviations for measurements among all ROIs is summarized in table 2 for different CNN-AC methods. For CNN5E-AC and CNN3E-AC, the mean deviations across 16 × 6 regions are within 1.60% and are statistically insignificant (p > 0.05), while for assessments across 166 × 6 localized regions, the mean deviations are within 1.82% (p < 0.05).

Table 2. SUV deviation (mean ± standard deviation) from the ground-truth-AC across 16 merged ROIs and 166 small localized ROIs for all six phantoms. P < 0.05 is defined as the significance level.

16 × 6 regions	CNN5E-AC	CNN3E-AC	CNN1E-AC	No-AC
$DE{V}_{SUV}\,$ (%)	1.60 ± 1.53	1.59 ± 1.68	7.35 ± 8.91	67.50 ± 5.30
P	0.368	0.616	0.002	<0.001

166 × 6 regions	CNN5E-AC	CNN3E-AC	CNN1E-AC	No-AC
$DE{V}_{SUV}$ (%)	1.63 ± 1.66	1.82 ± 1.90	7.53 ± 8.80	69.25 ± 4.40
P	<0.001	0.016	<0.001	<0.001

3.3. Counts from each energy window

Table 3 provides the number of counts detected in each energy window. An example of the energy spectrum from one phantom is shown in figure 2(a). Table 3 shows that (i) the detected counts from tracer emission decreases for energy windows going from E1 to E5, and (ii) contribution of cosmic radiation to the total increases (from 5% to 15%) for energy window going from E1 to E5.

Table 3. Comparison of the count number from different energy windows (keV). The mean counts across all phantoms are given. 'M' stands for the unit million. The first row gives the mean counts detected from tracer emission based on Monte Carlo simulations. The second row gives the background counts from cosmic radiation based on a long void scan (10 h scan without radioactivity) with the G-SPECT-I prototype. Note that these two types of counts were added in projections for image reconstruction.

Mean counts	Photopeak E1 (126–154)	E2 (106–134)	E3 (86–114)	E4 (66–94)	E5 (46–74)
From tracer	7.50 M	4.20 M	3.59 M	3.29 M	2.19 M
From cosmic	0.45 M	0.44 M	0.45 M	0.54 M	0.35 M
Cosmic/tracer	6.0%	10.5%	12.5%	16.4%	16.0%

4. Discussion

The current work shows the feasibility of estimating attenuation maps by using SPECT data only without additional (radiation) scans or the need to draw head contours. This could facilitate SPECT imaging with minimal ionizing radiation or user interactions, enabling quantitative brain perfusion SPECT to be independent from data from other scanners, thereby avoiding registration issues between modalities as well.

Figures 5–7 and A1 show that despite of the inadequate $\mu$ map estimation of CNN1E, a reasonable quantification accuracy could still be obtained with this method. This might explain the broad use of an ellipsoidal region for attenuation correction in clinical routine given its benefit of simplicity and decent effectiveness (i.e. the improved accuracy compared to that of not performing a correction at all). While various contour-based methods have been proposed to attain a uniform $\mu$ map, devising one here that is suitable for our work is beyond the scope of this paper. Alternatively, a ground-truth-uniform (GT-uniform) $\mu$ map, which can be regarded as the best achievable $\mu$ map using the contour-based method, was generated by replacing all tissues in the GT $\mu$ map with the attenuation coefficient of brain tissue ( $\mu$ = 0.146/cm). This part of the results is included in the appendix (figure A2 and table A1). GT-uniform-AC suffers from underestimation of the SUVs compared to the GT-AC since all bone voxels are replaced by brain. The Bland–Altman plot of GT-uniform-AC shows a more diverging distribution of the SUV differences compared to those of the CNN5E-AC and CNN3E-AC. The mean deviation from GT-AC is within 3.78%, which is also larger than CNN5E-AC (1.63%) and CNN3E-AC (1.82%). Note that the GT-uniform $\mu$ map is the ideal $\mu$ map one could get using the contour-based method. In reality, obtaining such a contour generally requires image processing steps, which involve a set of parameters (e.g. smoothing and threshold) that are sensitive to tracer distribution and noise in image. On the other hand, with the proposed CNN method, an optimal mapping with a large number of parameters is generated automatically. Here our results show that, even though the tracer distribution and the amount of activity were variable for each phantom, CNN could still accurately estimate the attenuation map when trained on a small set with five phantoms.

Three neural networks (CNN5E, CNN3E and CNN1E) were tested in this work, where we found that compared to the use of only the photopeak window (CNN1E), the involvement of scatter windows as done in CNN3E and CNN5E improves the performance. The use of five energy windows (CNN5E) instead of three (CNN3E) only has a limited effect on results. This might be due to (i) the intrinsic weak correlation between the multi-order scattered photons and the attenuation coefficients and (ii) the relatively low scatter signal compared to the level of noise in the additional two low-energy scatter windows of CNN5E (see table 3). The results also indicate that investigating the utilization of photons from high-energy scatter windows close to the photopeak might be beneficial. In this paper, energy windows used all have an equal width of 28 keV and are already slightly overlapping (see figure 2(a)), thus adding additional high-energy scatter windows may not lead to improvements. Given better energy resolution, the incorporation of more energy windows may be possible by narrowing the width of each window. This could provide the network with photons with more precise energy-spatial information. However, the exact effect on CNN estimation of $\mu$ maps requires additional studies.

In this paper, evaluation of attenuation correction with the CNN estimated $\mu$ maps was performed on the noise-free VRT simulated scans (for better visualization of the AC effects when using different $\mu$ maps). Results of correction performed on the realistic MC simulated scans are included in the appendix (figure A3). These results show that visually the differences of using GT-AC, GT-uniform-AC, CNN5E-AC and CNN3E-AC are small on the MC simulated scans. Besides, comparing the realistic MC simulated SPECT images to the noiseless VRT scans (figure 6(a)), one could see that the effects on image quality due to, e.g. noise and imperfect scatter correction as in the former scans can be larger than effects caused by using different CNN-AC methods (e.g. CNN5E-AC and CNN3E-AC). This is not surprising as noise generally plays a critical role in SPECT. While we aim to estimate the attenuation map using deep learning for automatic and accurate attenuation correction, future development that could achieve SPECT denoising can be beneficial, and using deep learning techniques for this task is also being actively investigated (Ramon et al 2018, Reymann et al 2019, Zhang et al 2019).

As a feasibility study based on Monte Carlo simulated data, limitations of this work include a validation using experimental scans which is required to bring the method closer to clinical applicability. For implementation on experimental data, we expect that the network may be trained on physical scans with CT GT attenuation maps acquired from only a few patients, as it was shown that the proposed CNN needed only a limited number of training samples. Besides, the effect of subtle differences of attenuation correction when using a CNN instead of a CT $\mu$ map needs to be evaluated on real patient data by human readers with a specific clinical task to ultimately prove the merit of the proposed CNN approach. Additionally, we used discretized maps consisting of a limited number of tissue classes (eight classes) as the GT attenuation map. For brain scans with relatively simple attenuation maps, this approximation could be acceptable. For other applications especially those involving fine structures, e.g. myocardium imaging, more complex attenuation maps with more tissue classes or with continuous attenuation coefficients would be preferable. Furthermore, the proposed method was tested on a G-SPECT-I geometry as it is an ultra-high resolution system we are developing at our institute. Given the validity of the proposed approach on G-SPECT-I, it would be worthy to try it for other standalone SPECT devices as well. Moreover, the proposed methodology was validated only for brain perfusions scans with ^99mTc HMPAO in the present work. We expect that the approach may be applicable to other types of scans when the CNN is trained on that specific scan, yet the accuracy and effectivity have to be validated.

5. Conclusion

We have implemented and validated a neural network approach to generate attenuation maps solely from SPECT emission data for ^99mTc-HMPAO brain perfusion scans. This could enable quantitative SPECT imaging with minimal ionizing radiation and make SPECT independent of data from other imaging modalities, while the fully automated approach could reduce the subjectivity due to intra- or inter-observer variability.

Acknowledgments

Financial disclosures of authors: FB is employee and shareholder of MILabs BV. This work is conducted with financial support of the Netherlands Organization for Scientific Research (NWO-I), Physics Valorization Prize 2015 'Ultra-fast, ultra-sensitive and ultra-high resolution SPECT' co-financed by MILabs BV.

Appendix

Comparison of the GT and CNN estimated $\mu$ maps for all six phantoms are given in figure A1. Five slices with an equal separation of 25.5 mm are displayed for each phantom. Figure A2 and table A1 provide the quantitative analysis results of GT-uniform-AC. The results from CNN5E-AC are shown together here as a comparison. Figure A3 displays the MC simulated realistic SPECT scans after attenuation correction with different $\mu$ maps. Quantitative analysis on these MC simulated scans of the deviations from the GT-AC for various CNN-AC methods are not included. These deviations are the same as those given in table 2 where assessments are performed on the noiseless VRT scans as the deviations depend only on the transmission fraction of the $\mu$ maps (see equation (2)).

$\mu $ — **Figure A1.** Comparison of the ground truth GT and the CNN estimated $\mu$ maps for all six phantoms. Five slices from the top to the bottom of the head (equally distributed with 25.5 mm separation) are displayed.
Download figure:
Standard image High-resolution image

**Figure A2.** Bland–Altman plots for SUVs calculated from the noiseless VRT simulated SPECT images when using ground-truth-uniform-AC. The results from CNN5E-AC are also plotted as a comparison. The difference for ground-truth-uniform-AC are negative since all bone voxels are replaced by the attenuation coefficient of brain tissue.
Download figure:
Standard image High-resolution image

Table A1. SUV deviation (mean ± standard deviation) from the ground-truth-AC for ground-truth-uniform-AC when assessed on the VRT simulated SPECT images. The results from CNN5E-AC are also given here as a comparison. P < 0.05 is defined as the significance level.

	16 × 6 regions		166 × 6 regions
	CNN5E-AC	Ground-truth-uniform-AC	CNN5E-AC	Ground-truth-uniform-AC
$DE{V}_{SUV}\,$ (%)	1.60 ± 1.53	3.72 ± 1.58	1.63 ± 1.66	3.78 ± 1.65
p	0.368	<0.001	<0.001	<0.001

**Figure A3.** Comparison of the attenuation corrected images when using different $\mu$ maps for Monte Carlo simulated realistic SPECT scans. Five slices (equally distributed with 25.5 mm separation) from two phantoms are displayed (as in figures 5, 6). Locations of the three slices shown at the left panel are indicated by the solid lines and those at the right panel are highlighted by the dashed lines.
Download figure:
Standard image High-resolution image

$\mu $ — **Figure A3.** Comparison of the attenuation corrected images when using different $\mu$ maps for Monte Carlo simulated realistic SPECT scans. Five slices (equally distributed with 25.5 mm separation) from two phantoms are displayed (as in figures 5, 6). Locations of the three slices shown at the left panel are indicated by the solid lines and those at the right panel are highlighted by the dashed lines.
Download figure:
Standard image High-resolution image

Automatic attenuation map estimation from SPECT data only for brain perfusion scans using convolutional neural networks

Article metrics

Author e-mails

Author affiliations

Dates

Abstract

1. Introduction