Implementation and validation of time-of-flight PET image reconstruction module for listmode and sinogram projection data in the STIR library

In this paper, we describe the implementation of support for time-of-flight (TOF) positron emission tomography (PET) for both listmode and sinogram data in the open source software for tomographic image reconstruction (STIR). We provide validation and performance characterization using simulated data from the open source GATE Monte Carlo toolbox, with TOF configurations spanning from 81.2 to 209.6 ps. The coincidence detector resolution was corrected for the timing resolution deterioration due to the contribution of the crystal length. Comparison between the reconstruction of listmode and sinogram data demonstrated good agreement in both TOF and non-TOF cases in terms of relative absolute error. To reduce the reconstruction time, we assessed the truncation of the TOF kernel along lines-of-response (LOR). Rejection of LOR elements beyond four times the TOF standard deviation provides significant acceleration of without compromising the image quality. Further narrowing of the kernel can provide extra time reduction but with the gradual introduction of error in the reconstructed images. As expected, TOF reconstruction performs better than non-TOF in terms of both contrast-recovery-coefficient (CRC) and signal-to-noise ratio (SNR). CRC achieves convergence faster with TOF, at lower noise levels. SNR with TOF was superior for early iterations, but with quick deterioration. Higher timing resolution further improved reconstruction performance, while TOF bin mashing was shown to have only a small impact on reconstructed images.


Introduction
Until relatively recently, PET detectors on commercially available PET scanners had a timing resolution of a few nanoseconds. Therefore, data acquired by the scanner provided the position of the emitting atom to within a line across the two detectors. Time-of-flight (TOF) capable scanners measure the temporal difference in the γ-photon detection with sufficient accuracy to provide an indirect measurement of the most likely location of the annihilation, thereby increasing the signal-to-noise ratio (SNR) (Nemallapudi et al 2015, Dujardin et al 2018. Although early experimental PET scanners had TOF capabilities by using barium fluoride (BaF) scintillators (Laval et al 1983), these crystals had low stopping-power and the photo-detector technology and speed of the acquisition electronics were not developed sufficiently for stable operation.
In the early 2000s, the introduction of lutetium oxyorthosilicate (LSO) and lutetium-yttrium oxyorthosilicate (LYSO) scintillation crystals with high light output, good stopping power and fast responses revitalized TOF as an area of interest (Melcher and Schweitzer 1992, Moses and Derenzo 1999, Popescu et al 2004. Implementation and validation of time-of-flight PET image reconstruction module for listmode and sinogram projection data in the STIR library 2 N Efthimiou et al The first generation TOF-capable clinical scanners had a coincidence timing resolution of around 600 ps (Moses 2003), with recent systems using silicon photomulipliers (SiPM) detectors achieving 250-350 ps , Grant et al 2016.
Much of the mathematical background for TOF reconstruction was presented by Snyder et al (1981), Tomitani (1981) and Snyder and Politte (1983), who envisaged efficient reconstruction where measurements would be made with an arbitrary timing accuracy and the radioactivity distribution could be estimated by histograms derived with no more than a scaling to account for the speed of the 511 keV γ-photons (Mullani et al 1981, Snyder 1981, Wong et al 1983. The early publications focused on the benefits of using signal-to-noise ratio (SNR) gain as a metric of sensitivity (Mullani et al 1984, Wagner andBrown 1985). Conti et al (2005) presented the benefits of TOF for an LSO-based PET scanner. By comparing noise-contrast curves, they demonstrated that TOF reconstructions converge faster and achieve better contrast recovery (CR) than non-TOF reconstructions. In addition, this study highlighted the issue of coarse time binning, which may contribute to the degradation of Poisson statistics. Thoen et al (2013) used listmode TOF reconstruction of Monte Carlo simulated data to determine the effects of different PET detector timing parameters on the performance of simultaneous PET-MR systems. They studied the effect of TOF resolution, transverse pixel size and depth of interaction correction on image quality, in terms of spatial resolution, contrast recovery and SNR and the study concluded that the image quality can be significantly improved by reducing the transverse pixel size and improving the TOF resolution. Brunner and Schaart (2017), reported that the cost-efficient bismuth germanate (BGO) crystals, used mostly in older generation scanners, may be capable of providing TOF information, too.
Commercial TOF-capable systems are provided with a proprietary closed source toolkit for acquiring and reconstructing data. However, there is a strong need for independent, open source software libraries to support the development of prototype scanners (Moskal et al 2016) or research in medical imaging (Martins et al 2005, Ahn et al 2018, Berg and Cherry 2018. Independent reconstruction software packages that support TOF reconstruction include MOLAR (Johnson et al 2004, Jin et al 2013, QETIR (Thoen et al 2013, Kolstein andChmeissani 2016) and more recently CASToR2 (Merlin et al 2018).
The software for tomographic image reconstruction (STIR) library is the oldest open source software in the field, originating back to the PARAPET project (1997)(1998)(1999), a European Union ESPRIT project on parallel 3D PET reconstruction algorithms. The library's source code was first released in June 2000 and a second release was made publicly available in 2009 (Thielemans et al 2006(Thielemans et al , 2012. The current stable version of the library is v3.0. In this paper we present the introduction of TOF (projection and listmode) reconstruction in the STIR library and the corresponding validation and performance evaluation through GATE Monte Carlo (MC) simulations (Jan et al 2004). It is for the first time, a freely distributed software is being used to investigate of the benefits of the TOF reconstruction. The source code will be made available as an open source in the forthcoming release of the library.

Image reconstruction with time of flight
Tomographic image reconstruction tries to estimate volumetric images of the radiopharmaceutical distribution, using as input data acquired by the scanner. Iterative methods try to fit the data to a statistical model of the scanner which considers the underlying physics (Kinahan et al 2004). Excluding background events, the forward model of emission tomography (ET) can be represented by the linear equation: where ḡ represents a vector of the projection data expected to be measured, f represents the unknown underlying radiotracer distribution image vector and P is the system matrix which represents the scanner's geometry and physical imaging processes. Let each element p ij of the matrix P represent the mean contribution of voxel j to projection element bin i. In the simplest case of solely geometrical contributions, the weights in P can be approximated by the intersection of the volumetric elements (voxels) to a line of response (LOR), specific for each bin or a detector pair. For this calculation, STIR normally uses a variation of Siddon's ray tracing algorithm (Siddon 1985), optionally with multiple rays per detector pair (Jacobson et al 2000), although other system models are equally supported, too. All voxels which do not intersect with the LOR are immediately excluded.
TOF-capable scanners have the capacity to record the difference in arrival times between the two γ-photons. This narrows the uncertainty of the origin of the annihilation along the LOR. When the detection time difference is taken into account the equation (1) can be formulated as: where t is the index of the TOF bin.
If the voxel size is much smaller than the uncertainty in location due to the timing resolution, the system matrix can be computed, in good approximation, as: where K itj is the time-spread-function (TSF) for the tth TOF bin.

Image reconstruction
STIR supports a wide range of algorithms for the determination of the maximum likelihood estimate (MLE), including ordered subsets expectation maximization (OSEM), median root prior (MRP) and quadratic prior (QP) Bayesian one step late methods (Erdogan andFessler 1999, Green et al 2010), and the ordered subsets separable paraboloidal surrogates algorithm (Ahn and Fessler 2003). However, for the scope of this paper, only maximum likelihood expectation maximization (MLEM) was used (Shepp and Vardi 1982) as it is the simplest option, and is guaranteed to converge (even slowly) to a solution. However, due to the modular nature of STIR, all iterative algorithms are now support TOF. Jacobson et al (2000) presented the MLEM version for projection data in the STIR library, however, a newly introduced listmode MLEM (LM-MLEM) (Barrett et al 1997), is included within this major upgrade of the library and is formulated as follows: where λ (n) is the estimated image at the nth iteration, ∈ E are the events in the listmode file, d is the detector pair for event and t is its TOF bin. S j is the backprojection of the detection efficiency for all LORs, which provides the sensitivity image, given as: with D the set of all detector pairs, N d is the normalization, A d the attenuation correction factor. The absence of a sum over the TOF bins in the sensitivity image calculation is a consequence of equation (3).

TOF kernel application
For the TOF kernel model we will assume, as common in the literature, that the PET scanner measures the difference in arrival times with the uncertainty that can be modelled with a normal distribution. As current PET scanners store the data using discrete TOF information, this distribution needs to be integrated over the TOF bins. In addition, this uncertainty in arrival time needs to be converted to spatial information. Figure 1 illustrates the application of the TOF kernel.
For each voxel j in the non-TOF row, let v cj be the projection of the Cartesian coordinates of each voxel's center on the vector connecting the two detectors AB.
For a given projection of the annihilation position along the LOR (v cj ), the position corresponding to the detected timing difference follows a Normal distribution (µ, σ), where µ is v cj and σ is the standard deviation (SD) of the timing resolution of the scanner (Mehranian et al 2016).
The probability of detection in a certain TOF bin can then be computed using its cumulative distribution function (cdf). As a result, the timing dependent probability K itj is given by: where we expressed the distance of the integration boundaries [k t , k t+1 ) of the tth TOF bin, from the v cj point in multiples of the timing kernel's SD. Some PET scanners have the capability to use TOF mashing, where a number of TOF bins are combined by spacing out the integration boundaries. This process leads to smaller data sizes. Figure 2 illustrates the TOF kernel when using TOF mashing, for a single ray. The initial setup had 7 TOF bins with integration boundaries every 62.95 mm. Subsequently, a mashing factor of 3 was applied to reduce the number of TOF bins to 3 with integration size of 149.74 mm. Please note that the sum over all bins in both cases are the same. This confirms that equation (6) is implemented properly.

Scanner geometry and coincidence timing resolution
The geometry of a cylindrical PET scanner was simulated using the GATE simulation toolkit (v.7.2) (Jan et al 2004).
The scanner had 24 rings with 666 detectors per rings. No gaps were considered. The cylindrical PET hierarchy was used with, with 666 Rsectors, each one having one row of 24 crystals. The reason behind this arrangement was to the scanner's geometry as close as possible to a cylinder, without gaps, as STIR does not support models of blocked geometries.
The inner ring radius was 424.5 mm and the total axial length 110.0 mm. The crystals were made of LSO:Ce (Melcher and Schweitzer 1992) with size 4 × 4 × 20 mm 3 . The energy resolution of the system was set to 11.7% and the applied energy window was 435-650 keV. The coincidence time window was set to 3 ns.
The number of TOF bins was set to 2999 with integration size 0.149 mm. In order to have a centered zero index bin, the number of TOF bins was kept odd. Detector dead-time was not considered. Three coincidence timing resolutions were considered: 50.0, 100.0 and 200.0 ps.
Post simulation, the detection time differences were sorted in histograms and fitted to a Gaussian function. After fitting the simulated data, the actual system's timing resolution (FWHM T ) values were found to be 81.2 ps, 118.4 ps and 209.6 ps, correspondingly. The difference between the expected and the actual FWHM T can be then explained by the photon transport spread (PTS), which contributes to the timing resolution deterioration (for more information, see appendix). The values found from the kernel fitting were those used for the TOF reconstruction.
For the sake of simplicity the three timing resolution will be labelled as TOF-80, TOF-120 and TOF-210, throughout the paper.

Simulated phantom
An acquisition of the NEMA IQ phantom was simulated (National Electrical Manufacturers Association 2012). The computational phantom had the exact same dimensions as the physical IQ phantom. Four radioactive, hot, spheres (radii 5, 6.5, 8.5 and 11 mm) and 2 cold spheres (radii 14 and 18.5 mm) were introduced. The spheres were placed at the radial distance of 114.4 mm from the centre of the phantom and were centered axially. At the center of the phantom a cold cylinder made of the material equivalent to lung tissue, was placed.
The activity ratio between the sources and the background was 4 : 1. Random and scattered photon events were excluded post simulation.
The sinogram dimensions were 333 views ×320 tangential positions and 27 segments.

Image reconstruction
The voxel size of the reconstructed images was 2 × 2 × 2.08 mm 3 with 297 × 297 × 47 voxels, in total. No postreconstruction smoothing filters were applied to the images. Attenuation correction factors were produced with an analytical simulation, of the phantom, having the appropriate linear attenuation values for 511 keV γ-photons, as found in NIST (Hubbell and Seltzer 1995).
Normalization factors were calculated using a component-based methodology (Pépin et al 2012, Efthimiou et al 2015. In brief, a cylindrical back-to-back source with diameter 400 mm, without an attenuating component was simulated with Monte Carlo for a long acquisition obtaining approximately 10 × 10 8 events. Fan sums were used, taking into consideration the axial and transaxial geometrical non-uniformity, crystal interleaving, detector efficiencies and block profiles.

Comparison between sinogram and listmode reconstruction
A GATE simulated dataset of approximately 10 × 10 6 counts was used for the comparison between the LM and projection reconstruction. The agreement between the two modes was validated for non-TOF and all TOF configurations.
The TOF data were histogrammed in sinograms with 13 TOF bins using timing mashing factor 215. The same timing mashing factor was applied in LM reconstruction. In figure 3 the central nine positions out of the 13 TOF positions for a single LOR, are illustrated.
The comparison was performed in terms of relative absolute error (E) using averaged images over 10 noise realizations.

Relative absolute error
The default image comparison tool (compare_image) provided in STIR quantifies the largest error and is calculated by the following formula: where Y is the first image and X is the second to be compared.

TOF kernel truncation
The duration of the reconstruction process strongly depends on the size of the projection matrix row. With TOF, the truncation of the LOR at a specific distance from the center of the TOF bin is possible (Daube-Witherspoon et al 2007). An investigation for the potential introduction of error due to truncation on different distances from the center of the TOF bin was performed. The distances 1.4, 2.7, 3.0, 3.5 and 4σ, were considered. Listmode reconstruction, without TOF mashing, for approximately 11 × 10 6 events, was performed for 50 iterations.
For the purpose of this comparison, all reconstructions were performed on an Intel i7 Skylake 6700 K processor with 16 GB RAM and SSD drive, using a single thread. STIR provides the options of OPEN-MP and MPI (Thielemans et al 2015) but were not used in this instance.

ROI analysis
For each sphere the mean, standard deviation (SD) and coefficient of variation (CoV) were calculated using tools included in the STIR library. Circular regions of interest (ROIs) with the same radius as each source, were selected. The ROIs were drawn on the central slice, where the profile of the spheres was the largest. In our case, according to the alignment of the phantom and the scanner that slice was the 24th. Background ROIs of the same size as those used on the spheres were placed in the background area. The selected positions were nonsymmetrical with respect to the center of the image. Twelve such background ROIs per slice for ±2 slices from the central slice, for each source, were considered. All background ROIs were placed so that none was closer than 15 mm to any sphere (National Electrical Manufacturers Association 2012).

Contrast recovery coefficient
The average contrast recovery coefficient (CRC H ) for each hot sphere (H) was calculated over four noise realizations ( p ∈ P) with ≈18 × 10 6 events each, using the formula (National Electrical Manufacturers Association 2012, Westerwoudt et al 2014): where µ H,r denotes the mean value of the of hot sphere's (H r ) ROI, µ B,r denotes the mean background ROI value and α = 4 which is the true activity ratio between the hot spheres and the background. Respectively, in the case of the cold spheres C, r the CRC C,r was calculated by: The background viability was expressed as the coefficient of variation (CoV r ), calculated for each sphere with size r as:

Signal to noise ratio (SNR)
For the assessment of the SNR performance, four datasets with ≈18 × 10 6 events, were used. The SNR was calculated according to the formula (Westerwoudt et al 2014):

Comparison between sinogram and listmode reconstruction
The sinogram and listmode reconstructions are in very good agreement for both TOF and non-TOF cases with E below 0.009% at 10th, 30th and 40th iteration for non-TOF, TOF-80, TOF-120 and TOF-210. The small differences were expected as STIR uses single precision calculations. Further, investigation of the difference images at the 38th iteration do not demonstrate any striking local artifacts between the different regions (figure 4). Figure 5 illustrates the impact of different levels of LOR truncation in terms of (wall-clock) computation time and relative error in the reconstructed images for TOF-80 and TOF-120. As expected, TOF-80 which has the narrowest kernel demonstrated the largest reduction on their wall-clock computation time compared to TOF-120.

TOF kernel truncation
Reconstruction times were improved to ≈45% when truncating 4.0σ with practically no measurable impact in the reconstructed image. The majority of the rejected voxels in each TOF position did not relate to that particular position, their probability was negligible, left over from the initial non-TOF LOR.
The cases 3.5σ and 3σ seemed to be equally appropriate alternative options for the NEMA phantom simulation and the statistics obtained in this simulation. We anticipate that this will vary with noise level and regularization. Further, truncation with 1.4σ offered an acceleration of up to 55% but with a significant impact of E of the images. The largest differences are located near the boundaries of the phantom and at locations of high contrast. Larger smooth areas were not affected to the same extent in most cases, although 1.4σ demonstrated a strong impact in all locations of the phantom. Figure 6 shows that there is a clear localization of the effect of truncation.
In the case of TOF-80, STIR was able to process 6050 events per second without truncation which with 4σ increase to 10 222. Non-TOF has a throughput of 17 037 events per second.

Mean values and SD
In figure 7 the mean value in the region of the two larger hot spheres and the background SD values, are shown. TOF converge much faster than non-TOF reconstruction.
For the largest hot sphere, µ H,4 converged to the approximately same values for all configurations ( figure  7(b)), but the 6.5 mm sphere non-TOF reached a ≈9% lower value.  Worth noting that the mean values of the two sources are approximately the same with TOF reconstruction. The background SD kept increasing throughout the range of iterations, significantly faster in the TOF cases than the non-TOF.

Contrast recovery coefficient
In figure 8 CRC over CoV for 100 iterations for the largest hot and cold spheres, is presented. Note that we did not iterate MLEM long enough to achieve convergence for nonTOF, but enough to demonstrate the benefits of TOF. Contrast recovery around 97% is achieved with high resolution TOF, while low resolution TOF and non-TOF reach a slightly lower value.
Smaller spheres recover to smaller values as 90% for the 8.5 mm sphere. In all cases, high-resolution TOF performs better than low-resolution TOF and non-TOF. For example, for the 11 mm sphere, 95% recovery is reached after four iterations with a 16% noise level using TOF-80, while TOF-210 needs 17 iterations and achieves 24% noise.
The positive effect of TOF reconstruction is particularly evident in the cold spheres. Without TOF the recovery is significantly slower (figure 8(b)). Cold areas are known to be problematic to MLEM reconstruction. Figure 9 shows example reconstructed images. For each configuration the iteration number was selected where the CRC of the largest hot sphere reached 95% of its maximum CRC. For non-TOF, TOF-80, TOF-120 and TOF-210 the 65th, 5th, 7th and 13th iterations were selected, respectively.  For the circular profiles it is evident that the non-TOF reconstruction has poorer noise characteristics. In addition, the cold sources have not reached value 0 in most cases as it converges slower than the hot sphere, which was used to select the iteration number. The smaller spheres, especially the 5 mm, have the lower contrast, but more effects contribute to that, as partial volume error.
CRC sees a minimal impact from TOF mashing. Using only 13 TOF positions to perform the reconstruction resulted in 2%-3% worse recovery, mainly for higher resolutions.

Signal to noise ratio
TOF reconstruction drastically improved the SNR at early iterations, as demonstrated for the two larger hot spheres, in figure 10.
However, at a certain iteration (in most cases before the 10th) the SNR of the TOF images got poorer than the non-TOF case. Finally, non-TOF and TOF reconstructed images reach to approximately similar SNR levels in the range of iterations we studied.
On early iterations, TOF mashing has a small impact on the SNR peak value about 5% or less. But quickly after few iterations the mashed and non-mashed SNRs are aligned.

Discussion
Development of the TOF feature in the STIR reconstruction toolbox is a major update. The majority of modern scanners have TOF capable detectors. In addition, LM reconstruction is favored nowadays, as the size of the stored data is growing due to the timing positions, larger scanner sizes and better angular sampling.
Similarly to other authors (Karp et al 2008, Conti et al 2013, Westerwoudt et al 2014, Suljic et al 2015, Surti 2015, Zhang et al 2018, we demonstrated that when using TOF, CRC converges faster. Comparison between different timing resolutions showed that the convergence speed is further improved when the scanner has better timing resolution. The benefits of TOF reconstruction are more evident in cold sources, which are a known issue in non-TOF MLEM reconstruction. The mean values per ROI, for the larger spheres convergence to approximately the same value (<5% variation) is achieved, which is a good indication of only minimal bias introduction, in the TOF system model (see below for further discussion).
In terms of SNR, TOF performs better at low iteration numbers, as the mean value of the source is significantly higher than in the non-TOF case. Almost when the maximum mean value is reached, the SD becomes the driving factor and SNR decreases faster for TOF compared to non-TOF reconstruction. Similar behavior has been demonstrated by other authors (Conti et al 2013, Westerwoudt et al 2014. In our case, we found that TOF does not actually provide improved SD values, but actually the benefits on the SNR come from the improved source mean value which rises faster. This behaviour is stronger for higher timing resolutions, which makes the need to stop MLEM at the proper iteration more critical.
Monte Carlo simulations in medical imaging research are used for a wide range of purposes, support for the reconstruction of such data is in high demand. Therefore an interface which supports the reconstruction of data produced in the GATE toolbox was developed. Using GATE simulations and fast timing resolution simulations, we observed the importance of the photon detection time spread. Otherwise underestimated TOF kernels will be used in the reconstruction. It is important to fit the TOF timing resolution to the observed timing of the coinci- dences or modify the resolution by a factor l c , where l is the crystal length. To the best of our knowledge this has not been reported by other authors working with GATE simulations (Groiselle and Glick 2004, Geramifar et al 2008, Thoen et al 2013. Results in figure 7 seem to indicate that mean ROI values do not quite match between the different TOF resolutions. Aside from convergence issues, a potential cause is the slight mismatch between the actual the photon detection time spread function and the Gaussian model used in the reconstruction (Efthimiou et al 2018).
The adjustment of TOF locations and mashing, which is essential for sinogram based TOF reconstruction, has a small impact on the reconstructed images, in terms of CRC and SNR. The contrast converged slightly slower, therefore more iterations are needed, which will then influence the SNR properties. However, the impact was not dramatic in the cases of TOF configurations under study. The current TOF implementation is slower than for non-TOF, as the TOF kernel modelling is applied onthe-fly on the non-TOF LOR elements. Therefore, in order to improve this trade-off in execution time, the effect of TOF-LOR truncation was investigated, in order to accelerate the reconstruction (Daube-Witherspoon et al 2007). It was shown that 4σ and 3σ provide most of the benefits with respect to the duration of the reconstruction without any measurable impact on the reconstructed images, in all cases tested.
In the future, we plan to implement caching of the TOF system matrix for systems with sufficient available memory. In addition, the ray-tracing algorithm will be modified to directly support TOF bins. The modification would make the algorithm to start the ray-tracing from the most probable point and move towards the integration boundaries.
Finally, our future plans also include the development of a TOF scatter simulation method (Hemmati et al 2017).

Conclusion
In this work, we presented the validation of the addition of TOF reconstruction of listmode and sinogram data in the STIR library. GATE Monte Carlo simulations were used in order to provide a well-controlled dataset for the validation. The simulated data were reconstructed using a new interface for GATE generated ROOT files. All the components presented in this paper will be distributed as open source in the next version of the STIR library.
length of the crystal and c is the speed of light. Thus the total timing response is given by the convolution of two distributions, and FWHM T = FWHM CDR 2 + FWHM crystal 2 , where FWHM crystal = l c . The model predicted FWHM T of 80.1 ps, 118.2 ps and 209.36 ps, respectively, which are in agreement with the values found by fitting the simulated data.
Note that another main component of PTS is the light transfer time spread (LTTS), which is an additional factor to FWHM T deterioration (Cates and Levin 2018) and is associated to the generation and spread of optical photons. However, this effect was not considered in our MC simulation as it would severely increase the simulation time.