Flexible numerical simulation framework for dynamic PET-MR data

This paper presents a simulation framework for dynamic PET-MR. The main focus of this framework is to provide motion-resolved MR and PET data and ground truth motion information. This can be used in the optimisation and quantitative evaluation of image registration and in assessing the error propagation due to inaccuracies in motion estimation in complex motion-compensated reconstruction algorithms. Contrast and tracer kinetics can also be simulated and are available as ground truth information. To closely emulate medical examination, input and output of the simulation are files in standardised open-source raw data formats. This enables the use of existing raw data as a template input and ensures seamless integration of the output into existing reconstruction pipelines. The proposed framework was validated in PET-MR and image registration applications. It was used to simulate a FDG-PET-MR scan with cardiac and respiratory motion. Ground truth motion information could be utilised to optimise parameters for PET and synergistic PET-MR image registration. In addition, a free-breathing dynamic contrast enhancement (DCE) abdominal scan of a patient with hepatic lesions was simulated. In order to correct for breathing motion, a motion-corrected image reconstruction scheme was used and a Toft’s model was fit to the DCE data to obtain quantitative DCE-MRI parameters. Utilising the ground truth motion information, the dependency of quantitative DCE-MR images on the accuracy of the motion estimation was evaluated. We demonstrated that respiratory motion had to be available with an average accuracy of at least the spatial resolution of the DCE-MR images in order to ensure an improvement in lesions visualisation and quantification compared to no motion correction. The proposed framework provides a valuable tool with a wide range of scientific PET and MR applications and will be available as part of the open-source project Synergistic Image Reconstruction Framework (SIRF).


Introduction
A major challenge for 3D high-resolution magnetic resonance (MR) and especially PET in applications in the abdomen or thorax is physiological motion (Nehmeh and Erdi 2008, Xu et al 2011, Zaitsev et al 2015, Newby 2015, which impacts data quality during the long acquisition time of both modalities in the order of several minutes. Simultaneous PET-MR offers a range of possibilities to obtain physiological motion information and utilise it in order to minimise motion artefacts. Several approaches use high-resolution anatomical MR images or dedicated MR sequences such as tagging for accurate motion correction of PET data (Holman et al 2016, Rank et al 2016, Munoz et al 2016, Cal-Gonzalez et al 2018, Kustner 2017, Munoz et al 2018, Robson et al 2018. Other techniques combine both MR and PET information for image registration and use this to improve both MR and PET image quality (Kolbitsch et al 2018). One of the main challenges of these motion correction schemes is the evaluation of their accuracy. Commonly there is no ground truth physiological motion information available. In addition, acquisition of ground truth image information without motion artefacts (e.g. using gated acquisitions) is often not feasible due to long acquisition times. Evaluation of motion correction schemes is therefore often limited to assessing the improvement in image quality between uncorrected and motion-corrected images. Although this gives an indication of the motion-correction scheme's functionality, it does not necessarily provide information on how accurate it is.
To overcome this challenge numerical simulations are employed. While many existing simulation frameworks are dedicated to realistic simulations of a single modality (Stoecker et al 2016, Layton et al 2017, Liu et al 2017, Fortin et al 2018, Lo et al 2018 Id 2019) they are not primarily aimed at simulating simultaneous PET-MR data acquisition. There are PET-MR simulations which also incorporate physiological motion (Tsoumpas 2011, Polycarpou et al 2012, Polycarpou et al 2017. Nevertheless, these methods are limited to respiratory motion and cannot easily be extended to other types of motion or other dynamic processes such as contrast agent uptake. The design is tailored to certain applications making it challenging to adapt for other purposes. In particular, the frameworks' emphases are put onto the data generation to create more realistic anatomical input rather than utilising available solutions (Segars et al 2010, Wissmann et al 2014, Segars et al 2019. It is not possible to supply custom segmentations, motion models and surrogate signals and their output cannot be easily integrated into different reconstruction pipelines. Furthermore, simultaneous multidimensional motion simulations (i.e. combining both cardiac and respiratory motion) in a PET-MR framework remain unavailable so far.
In this study, a flexible framework to simulate dynamic simultaneous PET-MR raw data is presented. It allows for the combined simulation of a range of different motion types such as breathing or heartbeat. The framework provides ground truth (GT) motion information for the simulated dynamic processes. Thereby, an evaluation of spatial accuracy of image registration and motion correction schemes can be performed. A detailed signal model is used allowing the simulation of image contrast for different MR sequence parameters. Additional dynamic processes such as uptake of MR contrast agents and PET tracers can be flexibly included, enabling a wide range of different PET-MR applications. The input of MR and PET acquisition parameters and output of MR and PET raw data files uses standardised formats (ISMRMRD for MR (Inati 2017) and Interfile for PET (Todd-Pokropek et al 1992)). This makes it compatible with standard reconstruction frameworks (Uecker et al 2015, Thielemans et al 2012, Hansen and Sorensen 2013. The framework is implemented in C++ and incorporated as a submodule of the Synergistic Image Reconstruction Framework (SIRF) SIRF (Ovtchinnikov 2020), an open-source PET-MR image reconstruction framework aimed specifically at synergistic PET-MR applications. Its PET and MR capabilities are based on the powerful tomographic image reconstruction toolboxes Gadgetron and Software for Tomographic Image Reconstruction (STIR) (Hansen andSorensen 2013, Thielemans et al 2012).

Methods
In the following an overview of the design of the framework is given, detailing its functionality and use when simulating dynamic processes, such as cardiac and respiratory motion, contrast agent uptake and PET tracer kinetics. Furthermore, the performance of the framework is demonstrated for three different MR and PET-MR applications: (a) optimisation of image registration parameters for cardiac and respiratory motion estimation in PET and PET-MR, (b) comparison of 5D (cardiac and respiratory) motion-corrected image reconstruction of PET-MR data and (c) assessment of the effect of inaccuracies in the motion estimation on quantitative biophysical parameters obtained from motion-corrected free-breathing abdominal dynamic contrast enhancement (DCE) MR. All input parameters for the simulation were taken from patient scans carried out on a simultaneous PET-MR scanner. The default scan parameters of the simulation, including MR and PET acquisition details, hardware layout, are determined by the header information of the template data. These parameters can be overwritten by the user to further customise the simulation. A voxelised tissue segmentation in combination with an XML descriptor must be supplied as the second input. The XML file details the MR and PET specific tissue parameters. Thirdly, a motion model and/or temporal variation of contrast and tracer concentration can be specified in order to simulate dynamic processes. The simulation applies the signal and dynamic models and simulates data sampling based on the parameters defined in the input file. Finally, the simulated data are written together with the simulation parameters into a raw data file.

Simulation input and output
An overview of the framework design is given in figure 1. Input and output for the simulation is provided in standardised community format (ISMRMRD (Inati 2017) format https://github.com/ismrmrd/ismrmrd for MRI 1 , and Interfile for PET (Todd-Pokropek et al 1992)). The acquisition parameters (e.g. T E , T R , flip angle, sequence type, image resolution, number of receiver coils and k-space trajectory for MR or detector geometry for PET) are automatically set based on the header information of MR and PET raw data files. This enables and reproduction of a phantom or in-vivo scan carried out on a PET-MR scanner using the exact same hardware-and sequence-related parameters in the simulation framework, or the user can choose to overwrite them.
During the simulation, the data part of the input file is replaced by the generated simulation data yielding a raw data output file, which is fully compatible with any image reconstruction framework capable of dealing with ISMRMRD and/or Interfile raw data. In this manner, the simulation can emulate the acquisition of already available in-vivo data while simultaneously providing GT information. Furthermore, this allows using the simulation output in already existing reconstruction workflows without modifications otherwise necessary to read and process the simulation output.
In addition to the raw data file, an anatomical segmentation combined with an XML descriptor detailing the tissue parameters for each voxel in the segmentation must be supplied. Each tissue type is characterised by T 1 , T 2 , spin density, chemical shift, PET tracer radioactive activity concentration and PET attenuation value which describes its MR and PET behaviour. In this study, the XCAT (Segars et al 2010) phantom was employed to generate a tissue segmentation and respiratory and cardiac motion models. However, the input format for the segmentation is vendor-independent and arbitrary voxelised anatomical segmentations and matching motion models can be used. The MR and PET scan parameters are then utilised to simulate realistic MR and PET signal intensities for the different tissue types present in the anatomical segmentation. The MR signal for each voxel is computed based on tissue parameters and sequence type using an effective signal model (Brown et al 2014). Currently, only a low flip angle FLASH contrast is available, but extensions to other sequences with a signal model available can be easily made. The PET signal assigns the activity concentration (defined in the XML descriptor) to each voxel and transforms it into specific activity (Bq) by multiplication with the voxel volume.

Signal encoding
The signal encoding process is performed after the generation of PET and MR signal intensities. The MR raw data is generated by sampling the k-space of the three-dimensional (3D) signal. The default k-space shape and acquisition parameters are defined by the information contained in the header of the input file, but can also be changed by the user to simulate modifications to the sampling procedure. Furthermore, MR signal reception is simulated using multiple receiver coils to allow parallel imaging (Pruessmann et al 2001). Each sampled readout line is subsequently stored in the ISMRMRD data structure and written to file.
In contrast to MR, the parameters of PET raw data are defined by the geometry of the detector. This is provided by a template raw data file in Interfile format describing the desired scanner, which is currently limited to ring-shaped designs with possible future extensions to prototype architectures (Khateri et al 2019). The forward operatorÊ PET maps the activity distribution into sinogram space and includes tissue attenuation. During a simulation including motion, the motion model is applied simultaneously to the activity distribution and the attenuation map to ensure they are aligned for all simulated time points.

Noise simulation
For MR, Gaussian noise is added to the real and imaginary part of the simulated k-space data based on an signal-to-noise ratio (SNR) parameter defined in the image as SNR = S/σ n , where S is the simulated signal and σ n is the standard deviation of a complex Gaussian distribution in image space. The signal S in the MR image depends on sequence but also tissue-specific parameters (T 1 , T 2 , spin density). Therefore the user can set a reference tissue type in which the SNR is defined. This enables to simulate data with a predefined SNR in one region of the image, e.g. SNR = 15 in the liver. The noise is then added to the k-space data in Fourier space as a Gaussian distribution with a scaled width. The SNR is computed assuming data acquisition is carried out with the body coil. Therefore, using the simulation with multiple receiver coils the data will yield data with increased SNR compared to the simulation SNR parameter due to noise averaging. This has to be accounted for by the user.
The sinograms for the PET raw data given by the forward operationÊ PET applied to the activity distribution A(r) describe the decay events along each line of response (LOR) per unit time. Poisson noise is added to them based on the acquisition time T acq defined by the user.

Motion models
The framework is set up to flexibly accept arbitrarily many independent dynamic processes as input, in the following referred to as modes, which are combined into one simultaneous process during the simulation. Hence, the framework allows the inclusion of multiple four-dimensional (4D) motion models, e.g. 4D respiratory and cardiac motion as independent motion modes, that result in a data set containing cardio-respiratory motion.
To fully describe one such motion mode during the simulation, a set of motion vector fields (MVFs) describing the anatomical deformations and a motion control signal (CM) describing its temporal progression are required. The CM can be passed in the most generic format as a set of pairs (t, s) = (time-point, signal amplitude) with s ∈ [0, 1], which are interpolated onto continuous time during the simulation. Deformations must define the displacement of each voxel during the motion relative to the reference motion state given by the segmented image. The motion model must encompass all the affected organs, e.g. the entire abdomen for respiration, and must be self-consistent, such that different organs move correctly relative to each other. Anatomy and motion models available for individual organs must be combined and interpolated onto a grid matching the segmentation prior to using them as an input and must be supplied as a set of dense MVFs: m = {m n }, n ∈ {1, . . . , N} where m 1 = m(s = 0) and m N = m(s = 1) respectively correspond to the borders of the interval on which the CM is defined. However, the number of supplied MVFs does not limit the number of simulated motion states. The simulation performs a continuous simulation of the entire motion range interpolating between the discrete motion states given by {m n }: for example, assume N = 3 with {m 1 = m(s = 0), m 2 = m(s = 0.5), m 3 = m(s = 1)}. Then the state s = 0.75 is generated using m(s = 0.75) = 0.5 · (m 2 + m 3 ).

Dynamic contrast models
While motion deforms the anatomy, dynamic contrast modes can be included to modify the tissue parameters as a function of time, e.g. T 1 or activity concentration, translating into signal variation over time. Independent contrast modes can be included flexibly and are combined during the simulation to simultaneous contrast variations. As for motion, a contrast mode requires a contrast control signal (CC) describing the temporal variation of the tissue parameters. For each tissue type θ, (e.g. θ = liver) the user must define two border tissue parameter states, θ 0 = θ(s = 0) and θ 1 = θ(s = 1) and the normalised CC as pairs (t, s), s ∈ [0, 1]. Every tissue parameter θ s can be described by a linear combination between the two border condition tissue states θ 0 and θ 1 : For example to simulate the contrast changes in the liver upon injection of contrast agent, θ = liver. s = 0 means no contrast agent is in the tissue and s = 1 corresponds to the maximum concentration. For a contrast agent effecting the T 1 relaxation time the liver parameter in the simulation could be set up as liver 0 . T 1 = 900 ms and liver 1 .T 1 = 300 ms. E.g. at s = 0.3 this yields liver 0.3 .T 1 = 0.7 · liver 0 .T 1 + 0.3 · liver 1 . T 1 = 720 ms. A dynamic behaviour is generated since s(t) is a function of time. Aside from the T 1 value, the contrast modes can for example also model the decay of activity over time or changes in the attenuation map linked to respiration in PET. The independent contrast dynamic modes are not limited to one tissue type, but each tissue type (i.e. liver parenchyma, blood, lesion etc) can be assigned its own contrast mode independently without computational overhead. Naturally, the simulation combines the added contrast modes with added motion modes, yielding their combined dynamic behaviour.

Data binning
For MR, the time axis underlying the simulation process is defined by the timestamps of the MR template raw data. During a dynamic simulation it must be guaranteed that each readout line is acquired in the correct dynamic state defined by the simulation control signal at the readout's timestamp. In principle the above approach of interpolating the dynamic models onto a continuous time axis allows simulations with an arbitrary temporal resolution. In practice, however, only a finite number of dynamic states are simulated for computational reasons. This means that multiple time points are grouped together into a bin with a certain temporal width for contrast modes and with a certain motion amplitude for motion states. Readouts at these time points are acquired in the same dynamic state. Eventually, only one k-space is stored, whose individual points are sampled at different contrast and motion states. In most cases, motion leads to periodic changes in the anatomy which is different for contrast modes. Therefore, two alternative binning schemes are used (figure 2): • For contrast modes the time axis is binned directly by grouping together a set of subsequent time points.
The dynamic state is then defined by sampling the CC in the bin centre. This yields a unique partition of the time axis independent of the CC. Therefore contrast variations in many different tissue types can be simulated simultaneously without computational overhead. • For motion modes, instead of the time axis, the CM axis is split into bins and the dynamic state is defined directly at their centre. The time axis binning results from grouping together time points from the same dynamic state interval. Since the same motion state occurs repeatedly throughout the simulation it is much more efficient to only apply the motion model a minimum number of times and sample the necessary kspace points depending on their timestamps. However, for each motion dynamic, an individual time axis partition must be computed. All time points corresponding to one signal bin can be simulated with a single application of the motion model (Modat et al 2010), avoiding redundant motion simulations.
For each independent mode, their individual time axis partition is computed independently and intersected subsequently with the others to generate the simultaneous dynamic behaviour of the simulation.
For PET, however, each dynamic state is stored as an independent sinogram. The time spent in each bin defines how many counts contribute to each sinogram and hence its noise level which therefore differs between dynamic states.
Generally, for each mode the number of simulated motion states should be higher than the number of reconstructed motion states in order to accurately simulate effects such as intra-bin blurring. The suitable number of simulated states depends on the motion amplitude and should be such that the maximum amplitude in voxels divided by the number of simulated phases is below half the voxel size.

Motion ground truth
The MVFs m b a are defined to transform images I between motion states a and b: The framework yields one MVF for each of the N sim simulated motion states. The set of GT MVFs are defined relative to a reference phase I 0 , given by the segmentation input, and point to the simulated N sim motion states: The image reconstruction can yield a different number of dynamic states N reco compared to the simulated ones N sim and can also yield a different reference phase which is used in the image registration. In order to be able to compare the MVFs obtained from the reconstructed images, the GT motion information output GT ′ sim must be post-processed. This is discussed in more detail in the appendix in section A. To evaluate the registration output the simulation framework is able to generate regions of interest (ROIs) based on the underlying tissue segmentation for the different motion states.

Experiments
As mentioned above, we evaluated the framework on three different applications. In the first, we optimised the parameters of a non-rigid image registration algorithm for cardiac-resolved PET data and of a synergistic motion estimation approach to estimate cardiac and respiratory motion using both MR and PET data. In the second application, 5D non-rigid motion fields were utilised in a motion-corrected image reconstruction to compensate for cardiac and respiratory motion. Finally, the effect of inaccuracies in the motion fields on the final quantitative MR parameters obtained from DCE MR was assessed.

PET-MR simulation parameters
All tissue segmentation and motion models were generated by the XCAT software (Segars et al 2010). A simulation of a PET-MR exam on a 3 T Siemens Biograph mMR was performed using raw data files from a patient data examination with the patient's self-navigator and ECG signal as CM input (figure 3). Continuous MR data acquisition during free-breathing was simulated for a triple-echo prototype Dixon-based GRE Golden angle Radial Phase Encoding sequence (TR = 5.9 ms, TE = 1.2/2.7/4.2 ms, FA = 10 • , #phase encoding points = 125 pts angle × 256 angles) (Prieto et al 2010, Kolbitsch et al 2018. The spatial resolution of MR was 1.9×3.2×3.2 mm 3 . The simulated coil sensitivities consisted of a simple setup of four 3D varying Gaussian profiles without phase variations between the different RF receive channels. However, more sophisticated coil map simulations or measurements from an existing scanner could be easily included and passed as a simulation parameter. The PET sinograms were simulated based on the same segmentation input as the MRI data. The forward model encoded the spatial resolution of a Biograph mMR detector using ray tracing. The individual PET images were reconstructed at different resolutions depending on the application. This is described below. Attenuation was included in the simulation but scatter and randoms effects were so-far omitted. The total acquisition time was set to match the MR scan at 3.2 min. These parameters were used for all subsequent PET simulations. The cardiac signal has the shape of a sawtooth function increasing linearly from R-peak to R-peak.

Quantitative evaluation of image registration
The output of the simulation framework was reconstructed into different cardiac and respiratory motion states resulting in a series of 3D images {I s }, s ∈ [0, 1]. To generate a motion model, this image series was registered to a reference phase I s0 evaluated against the GT in an ROI containing the left ventricle myocardium. Details on the post-processing procedure and definition of the registration error can be found in appendix (A and B). Using equation (B), the maximum error over all motion phases, i.e.
is computed to quantify the spatial accuracy of the registration output. ∆ reg will be referred to as the registration error in the following. When evaluating the registration error it was compared to the maximum amplitude of the GT MVFs which is the error made in neglecting motion and reconstructing the average motion. The application of a motion-compensated reconstruction can only expect to improve image quality if the motion model is more accurate than this threshold. This comparison is depicted in figures 5 and 6.

Image registration using cardiac-resolved 4D PET
To quantitatively evaluate a registration algorithm a 4D PET dataset was simulated in 16 cardiac motion states. Eight motion states {I t } were reconstructed at the signal locations t = {0.125, 0.25, . . . 1.0} at a spatial resolution matching the MR reconstructions at 1.92×3.2×3.2 mm 3 . Attenuation correction was omitted to avoid the imprint of a static attenuation map veiling the actual motion content of the images at the cost of an apparently increased uptake in the lungs. The registration algorithm optimised an energy functional: where S is a similarity measure. In the case of this study, normalised mutual information was used as a similarity metric. The transformation was modelled as a free-form deformation (FFD) using B-spline interpolation (Rueckert 1999, Schnabel et al 2003 without further regularisation because these ensure a smooth motion field on the interpolation scale. The reconstructed images were registered for different spline control point distances ∆B to assess the influence of the intrinsic regularisation on the registration quality. The influence of the registration output on the image quality was assessed by summing the deformed image series:

Synergistic registration using 4D PET-MR
Further application of the framework was the evaluation of an advanced synergistic motion estimation algorithm (Kolbitsch et al 2018). To this end two PET-MR datasets were simulated containing 4D cardiac and respiratory motion, respectively. The MR images were reconstructed using an iterative SENSE approach (Pruessmann et al 2001) for respiration and compressed sensing (CS) (Lustig et al 2007) for cardiac motion.
Reconstructions yielded image series at the same motion states and with the same spatial resolution as the 4D with I MR/PET as the PET and MR image series relative to a phase I MR/PET s0 , and the synergy weight λ ∈ [0, 1] balancing the PET and MR contribution to the cost function. The synergy weighting parameter was varied in the range λ ∈ {0, 0.1, . . . , 0.9, 1.0} at ∆B = 12 for cardiac motion and ∆B = 24 for respiration.

Motion compensation of 5D cardiac PET-MR
All simulations were carried out using in-vivo PET-MR data as template input. Cardiac and respiratory motion were included as independent motion modes with N r = N c = 8 states respectively, yielding simultaneous cardio-respiratory motion in 64 combined motion states. This corresponded to a cardiac time interval of 125 ms at 60 bpm. Based on the cardiac and respiratory input signal the simulation output was double-binned into N r × N c = 4 × 4 motion states for both modalities, corresponding to a temporal resolution 250 ms at a motion frequency of 60 bpm. The reconstructions were performed with and without motion compensation (Batchelor et al 2005) using the ground truth motion fields. Subsequently, the resulting images were compared to corresponding reconstructions of the in-vivo patient data to verify that the proposed framework yields realistic output data. The PET images were reconstructed at a resolution of 2×2.1×2.1 mm 3 matching the patient data used in the comparison. The MR simulation data were reconstructed using an iterative SENSE approach (Pruessmann et al 2001). For PET attenuation correction (AC) during reconstruction a static attenuation map used as the simulation input was employed. For a more detailed description of the employed motion-correction scheme, please refer to Kolbitsch (2017).

Data acquisition
An abdominal DCE MR exam during the injection of an MR contrast agent (Primovist, i.e. gadoxeate disodium) was simulated with the input data also acquired on a 3 T Siemens Biograph mMR and using a GRE Golden angle Radial Phase Encoding (Ippoliti et al 2019) sequence (T R / T E = 3.29/1.36 ms, FA = 11 • , #phase encoding points = 120 pts angle × 640 angles) at an isotropic spatial resolution of 1.5×1.5× 1.5 mm 3 . Fat suppression was simulated by setting the spin density of fat to 10% of the water value.

Quantitative parameter estimation
An extended Toft's model for the temporal evolution of the gadoxeate disodium concentration in the tissue of interest (TOI) C toi (t) was fitted based on an arterial input function (AIF) extracted from the hepatic artery in the template data (Ippoliti et al 2019): with C p the blood plasma contrast uptake, v p the volume fraction of blood plasma, v e the extravascular extracellular fractional volume, k trans the volume transfer constant, ∆T the time delay of tissue enhancement with respect to the arterial concentration in plasma and * representing the convolution operator. Every parameter of the right-hand side of equation (9) depends on TOI except C p . The simulated tissues of interest were healthy liver tissue and a hepatic lesion with a necrotic core. The parameters used as simulation input can be found in table 1. The necrotic core was simulated as lesion tissue without any contrast agent kinetics. The resulting Toft's model and simulated T 1 variations are depicted in figure 4. The concentration of contrast agent was converted into a temporal evolution of T 1 using T 1 (t) = 1 R1(t) = 1 R1+r·C toi (t) where r = 6.2 mmol s −1 is the relaxivity of Primovist at 3 T (Rohrer et al 2005). The simulated Toft's model and the calculated change in the T 1 parameter extracted from patient data used as simulation input is depicted in figure 4.
The DCE simulation was performed for N r = 16 respiratory states and N c = 49 contrast states.  table 1 for three different tissue types. Right: Toft's model converted into T1 variations and respiration signal passed to the simulation as CCs based on which tissue contrast and respiratory state were changed. One can see how a contrast agent uptake leads to a drop in T1 which generates contrast between the lesion and healthy tissue in late contrast phases.

Image reconstruction and impact of motion field inaccuracies
From the simulation output a k-t-SENSE reconstruction was performed with and without respiratory motion correction (Batchelor et al 2005). In order to assess, how inaccuracies in the motion estimation would impact the assessment of the quantitative parameters such as k trans , a bias of different strength σ was added to the GT motion fields using: where m t 0 are the GT MVFs for reconstructed motion phase t ∈ {0 . . . N reco − 1}, σ is the bias strength, σ ∈ {0, . . . , 6} and n a spatially varying (but temporally constant) random unit vector field which fixes the direction of the bias for each voxel. This random unit vector was not drawn for each voxel independently, but on a larger grid and subsequently spline-interpolated to the resolution of the motion fields. This procedure took into consideration that the image registration algorithm used here is spline-based, and hence errors in the motion estimation are expected to vary smoothly between neighbouring voxels. The increase of bias with each reconstructed motion phase included potential accumulation of error with increasing registration amplitude in registration. Each of the motion field sets GT σ were then used in the motion-corrected reconstruction and k trans maps were computed from the dynamic contrast-resolved image time series. The between lesion and necrotic core for uncorrected and motion-corrected (with different values for σ) images was compared. CNR was computed as CNR = S lesion −Score σ lesion where S i and σ i are the mean and standard deviation of the k trans maps in the respective tissue regions. These were evaluated in a 2D axial slice through the lesion centre and segmented by hand.

Image registration using cardiac-resolved 4D PET
The results of the 4D PET image registration for different registration parameters are shown in figure 5. Coronal views of the motion-corrected PET image data are displayed, using both suboptimal and well-regularised registration results, as well as motion correction using the GT motion model. The displayed images are the average of all motion phases deformed into end-diastole as in equation (7). The figure also shows the respective transformations which transform end-diastole images into systole. The visualisations of the motion fields show that the motion fields with ∆B = 8 contain many areas with non-physiological motion vectors. Motion fields obtained with ∆B = 18 have a much better correspondence to the GT motion fields. Nevertheless, the motion-corrected PET images have similar image quality and appear to be accurately corrected for cardiac motion for both ∆B = 8 and ∆B = 18. The registration error evaluated in the left ventricle is depicted as a function of the spline point distance ∆B. Note that only amplitudes are plotted but the deviations between registration output and GT are computed vectorially. The effect of motion compensation using the differently regularised registration outputs can be seen in the animation in supplementary figure S1, available online at htttps://stacks.iop.org/PMB/65/145003/mmedia. The proposed Figure 5. Quantitative evaluation of 4D PET image registration. Left: Cardiac motion-corrected PET images (top row) and corresponding motion information used for the correction (bottom row). All images were motion-corrected to end-diastole. The red box indicates the heart region for better comparison of images and motion information. A similar effect on the image quality can be seen for both poorly (∆B = 8) and well (∆B = 18) regularised image registration. Only the motion field transforming the data from systole to diastole is shown here. Right: Maximum error over all motion phases between estimated motion and ground truth in the left myocardium as a function of different spline point distances ∆B of the registration algorithm. The red line indicates the maximum ground truth amplitude. A minimum can be found at ∆B = 18. Figure 6. Quantitative assessment of joint PET-MR 4D motion estimation. Left: MR and PET images used as the input for 4D respiratory and cardiac motion estimation. Myocardial contraction between diastole and systole (red arrows) and translation between in-and exhale (red lines) can be clearly seen. The depicted motion states correspond to the respective maximum amplitude of the GT motion fields for both motion types. Right: quantitative evaluation of the registration. The registration error is plotted in blue, with the maximum amplitude of the GTMVFs in red. Minima can be found in both curves yielding the best choice of registration parameter λ. framework provides the GT motion fields and allows for the calculation of a motion field error in order to quantitatively assess the performance of different registration parameters.

Synergistic registration using 4D PET-MR
The results of the quantitative assessment of synergistic motion estimation is depicted in figure 6. Systolic and diastolic and exhale and inhale images are shown for cardiac and respiratory resolved PET and MR images, respectively. Markers indicate cardiac (top) and respiratory (bottom) motion contained in the image data. PET reconstructions show large uptake in the lungs in the absence of attenuation correction. The right column shows the registration error for different synergistic weighting λ. The red line is the amplitude of the GT motion and hence corresponds to the error when neglecting motion. For both respiration and cardiac motion using both MR and PET image data for the registration (i.e. 0 < λ < 1) improves the registration accuracy compared to using either only MR or only PET image data separately. For respiration one can see that the maximum error is always much lower than the maximum amplitude of the ground truth motion whereas for cardiac motion the λ parameter needs to be fine-tuned to achieve a registration error below the GT amplitude and hence a registration results which leads to an improvement in image quality when using motion correction.

Motion compensation of 5D cardiac PET-MR
Images reconstructed from the simulated PET-MR raw data are compared to the corresponding in-vivo patient data in figure 7. Respiratory and cardiac motion leads to similar motion artefacts in both datasets for PET and MR. In the MR images blurring of anatomical structures such as the heart or the liver can be clearly seen for both in-vivo data and numerical simulations. The PET images show blurring of the myocardium and papillary muscles. Motion correction of respiratory and cardiac motion in the in-vivo data improves the image quality, leading to a better depiction of the anatomy comparable to the motion-free reference image obtained with the simulation framework.

DCE abdominal MRI
The reconstructed DCE images are pictured in figure 8 for both simulation and patient data. For the two datasets both respiration averaged and motion compensated k-t-reconstructions are compared in selected phases. An ROI containing a lesion with a necrotic core is zoomed in on. The slice showing the respiration average has been selected to show the same anatomical region as the respiratory-compensated images. For the selected ROI the k trans parameter of a Toft's model fit to the DCE images is depicted. In simulation and patient data motion compensation improves the visualisation of the necrotic core of the lesion with lower k trans value compared to the lesion itself. In the k trans maps computed from motion-averaged cases the necrotic core is strongly blurred.
This finding is confirmed quantitatively by the results shown in figure 9. The left plot shows the deviation from the true motion for the different σ values of the maximum respiratory amplitude. Only the configuration of σ = 6 has a deviation larger than the GT amplitude marked by the dotted line. Upper right: the CNR in the k trans fits between lesion and core as a function of MVF bias. A non-linear relationship is revealed resulting in a decrease in CNR between lesion and necrotic core in the k trans . A ROI-averaged bias larger than 4 mm yields a reduced CNR, in which the core is not visible any more. Deviations larger than 5 mm yield a CNR of the same magnitude as if no motion compensation had been employed (indicated by the dotted line). The k trans maps for the different bias strengths in which the CNR was evaluated are shown below the quantitative plots. While one can see a degradation in contrast for growing bias, between σ = 0 and σ = 3 the core stays clearly outlined.

Discussion
This paper presents the design and application of a simulation framework used to generate realistic dynamic PET-MR raw data. The framework is specifically targeted at providing dynamic ground truth information for different types of physiological motion. In addition, other effects such as contrast or tracer dynamics and chemical shift between fat and water in MRI can be included. It aims at supporting the development of new-and validation and improvement of currently employed reconstruction methods and, more specifically, PET-MR motion correction approaches.
The success of a motion-corrected reconstruction depends on the accuracy of the estimated motion model from image registration. However, an improvement in image quality might be an insufficient   (11) computed as the mean bias averaged over all motion phases in an ROI including only the lesion and the necrotic core. Error bars correspond to the standard deviation in the ROI. σ = 0 yields the GT MVF with no bias; the dotted line indicates the mean motion amplitude used in the simulation. Only the largest bias σ = 6 exceeds the average motion amplitude. Right: variation of the CNR in lesion and core with different MVF bias. Upper dotted line: CNR without including motion in the simulation. Dotted line: resulting CNR without including motion correction in the image reconstruction. It can be observed that the CNR drops to the level of neglecting motion correction for deviations larger than 4 mm. Bottom: series of ktrans maps resulting from different bias strengths. A degradation in contrast for increasing bias starting from σ = 3 is visible. The CNR evaluation was performed on manual segmentations of these images.
indicator for successfully modelling motion. Figure 5 shows that also motion fields which are not physiologically plausible can lead to high image quality. Non-rigid image registration offers many degrees of freedom ( The proposed approach offers here the possibility to simulate not just dynamic data but also GT motion information which can serve as a reference standard, similar to previously proposed MR-only approaches (Schnabel et al 2003, Rohr et al 2003, Roche et al 2001. This allows for a comprehensive evaluation of image registration techniques without the need for any manual annotations (Castillo et al 2009). The realism of numerical simulations of human anatomy is often limited by the accuracy of the organ and tissue segmentation, assigned tissue parameters and signal and acquisition models. The proposed method is not developed for a specific anatomical model, but can be combined with any pixel-based model of the human anatomy (Segars et al 2010, Tsoumpas 2011, Polycarpou et al 2017, Lo et al 2018. The presented framework builds itself on flexible input possibilities allowing the inclusion of currently existing and future human phantoms in a generic way. The output in standardised format allows direct integration of the simulation results into existing workflows. The computational demand is kept low, by directly computing effective MR and PET signals, and omitting Bloch and Monte-Carlo simulations. However, this comes at the cost of reduced realism, neglecting intravoxel effects and yielding identical contrast over regions of the same tissue. Despite this limitation, the presented framework is able to generate data which is very similar to real patient data as shown in figure 7. Nevertheless, more complex signal models could be included in the framework. Currently randoms and scatter effects, positron range and partial volume effects are not part of the PET simulation but this could also be added in the future. Following this could also yield applications in training data generation for machine learning.
All simulations were performed on a Virtual Machine running on a workstation with access to 58 GB of RAM and 8 CPUs. Using this setup and a 208 × 208 × 208 voxel segmentation the time to simulate 4 motion states was 25 minutes for PET using a Siemens Biograph mMR detector consisting of 11 segments and 4 minutes for MRI simulating a 96 000 phase-encoding points on a 3D GRPE trajectory acquisition. The simulation time scales linearly with the number of dynamic states included. The current implementation only allows equal-sized CC bins and CC is sampled at the centre of each bin. Hence to avoid non-linear intra-bin CC behaviour the number of simulated states must be high. Further improvement could be achieved by computing the average CC in each bin instead and perform automatic bin sizing based on the slope of the CC signal to ensure proper signal sampling while reducing computational costs.
In addition to comparing the image quality of the simulation to the template data, the framework was used to evaluate the accuracy of registration algorithms. With the GT information the optimal registration parameters for 4D motion in PET, shown in figure 5, MR images or weighted combinations thereof, as in figure 6 could be determined. For a strong weighting of the MR image information, image registration is affected by the undersampling artefacts in the MR images and hence yields inaccurate motion fields. The results for the optimal weighting between MR and PET image information obtained from the simulated data was λ = 0.4 and 0.2 for respiratory and cardiac motion. This is in good agreement with the values reported from the in-vivo data of Kolbitsch et al (2019). The framework was also shown to be feasible in standalone MR applications, such as the DCE simulation presented in 4.3. In this case, the number of simulated and reconstructed dynamic phases coincided for contrast as well as motion. The framework was able to produce data with realistic content and the same dynamic behaviour as observed in the patient data acquisitions. Furthermore, it enabled a Toft's model fit and reproduced the k trans -maps found in the template input data. An improvement in CNR between lesion and necrotic core when applying respiratory motion compensation could be visually detected for the simulation results where the motion compensated fit showed a clear delineation of the core. To quantify this effect different degrees of bias were added to the MVFs in a fashion very similar to errors produced by artefacts and noise. Employing the biased MVFs in image reconstruction prior to fitting k trans leads to different CNR between active tumor volume and necrotic core (figure 9). Motion compensated k trans maps yielded a well delineated core for ROI-averaged deviations of the MVFs from the correct motion in the maximum amplitude motion phase up to the limit of approximately twice the voxel size. Deviations larger than 4 mm, however, showed a CNR comparable to completely neglecting motion. The direction of bias was fixed to one random configuration and scaled to increase the deviation from the true motion. While the importance of precise image registration to obtain high-quality k trans could be shown, to determine the exact behaviour of the CNR evaluations with different directions of bias (i.e. different noise realisations) would have to be carried out. In this study, the number of used template datasets was limited to two, one from a PET-MR and from a free-breathing DCE abdominal exam. The focus was set to prove the feasibility of the framework in multiple different tasks. It was not examined how the degree of realism changes with different template data or different segmentation input. Furthermore, for the cardiac PET-MR simulation, no pathology (e.g. fibrotic tissue in the myocardium) was added. Future extensions in order to improve the realism of the simulation framework will include scatter and randoms simulation and the application of a positron range kernel for PET, and more realistic coil map simulations for MR.
The output in standardised raw data format enables reconstruction with open-source reconstruction packages such as Berkley Advanced Reconstruction Toolbox (BART), Gadgetron, STIR and other frameworks able to read ISMRMRD or Interfile format. The framework allowed an assessment of image registration accuracy to be efficiently performed on data simulated with parameters as would be used in-vivo.
In this study, we used a spline-based model for image registration. Also the GT motion fields used as input to the simulation were based on the same model. This could lead to a bias in the evaluation of image registration algorithms and requires further investigation.
The framework is not limited to periodic motion. The motion model can include global rotations and translations, which in combination with a non-continuous CM (e.g. a step function) enable the simulation of sudden patient motion.
Simulations often serve as means to validate newly developed methods, however they should also become a useful tool to allow others to reproduce previous results. Supplying the software as part of SIRF https://www.ccppetmr.ac.uk/ ensures it will be available as part of a well-maintained framework with an active PET-MR community. Implemented in C++, Matlab and Python interfaces will enable easy use and both simulation and output can be shared among the community. Additionally, the simulation and all its dependencies as part of the SIRF software will be supplied in a virtual machine or Docker container.

Conclusion
A novel framework to simulate dynamic PET-MR data with motion and contrast changes was presented. It allows for flexible user input and generates output in standardised raw data format, integrating into existing workflows. Its performance was evaluated in different applications and its output compared to patient data, showing that realistic data was generated. The value of the framework for the evaluation and optimization of registration algorithms was assessed and demonstrated for PET, MR and synergistic PET-MR image registration approaches of cardiac and respiratory motion estimation. Especially for complex reconstruction pipelines (e.g. motion-corrected image reconstruction with subsequent model-fitting to obtain quantitative biophysical parameters) this framework offers the possibility to evaluate how inaccuracies in motion estimation are propagated through the pipeline to the final diagnostic parameters. Figure A1. Schematic overview of relationship between registration and ground truth outputs. N sim = 4 red dots show the simulated motion states, Nreco = 3 black dots show the reconstructed states. Red arrows symbolise the MVFs provided by the simulation, black arrows show the registration output against which the GT must be compared. The arrows' starting and endpoint symbolise in which coordinate system they are defined and to which they point. The post-processing steps discussed above would correspond to the following operations: first the red arrows are weighted to create three arrows which point from s = 0 to each of the black dots. To adjust their starting coordinate system their starting points is shifted by the blue offset vector is added s.t. they each depart from the black dot at s = 0.25 instead of s = 0. Then GT and registration coordinate systems match and their content can be meaningfully compared. Figure S1: Effect of application of different motion fields on the binned image data. (a) Input motion. (b+c) Motion compensation using the registration output regularised with spline distance 8 and 18, respectively. (d) Motion compensation using the ground truth motion. For both spline distances the different motion resolved images are transformed to the reference phase (i.e. end-diastole) and no cardiac motion is visible any more.