A probabilistic deep learning model of inter-fraction anatomical variations in radiotherapy

Abstract Objective. In radiotherapy, the internal movement of organs between treatment sessions causes errors in the final radiation dose delivery. To assess the need for adaptation, motion models can be used to simulate dominant motion patterns and assess anatomical robustness before delivery. Traditionally, such models are based on principal component analysis (PCA) and are either patient-specific (requiring several scans per patient) or population-based, applying the same set of deformations to all patients. We present a hybrid approach which, based on population data, allows to predict patient-specific inter-fraction variations for an individual patient. Approach. We propose a deep learning probabilistic framework that generates deformation vector fields warping a patient's planning computed tomography (CT) into possible patient-specific anatomies. This daily anatomy model (DAM) uses few random variables capturing groups of correlated movements. Given a new planning CT, DAM estimates the joint distribution over the variables, with each sample from the distribution corresponding to a different deformation. We train our model using dataset of 312 CT pairs with prostate, bladder, and rectum delineations from 38 prostate cancer patients. For 2 additional patients (22 CTs), we compute the contour overlap between real and generated images, and compare the sampled and ‘ground truth’ distributions of volume and center of mass changes. Results. With a DICE score of 0.86 ± 0.05 and a distance between prostate contours of 1.09 ± 0.93 mm, DAM matches and improves upon previously published PCA-based models, using as few as 8 latent variables. The overlap between distributions further indicates that DAM’s sampled movements match the range and frequency of clinically observed daily changes on repeat CTs. Significance. Conditioned only on planning CT values and organ contours of a new patient without any pre-processing, DAM can accurately deformations seen during following treatment sessions, enabling anatomically robust treatment planning and robustness evaluation against inter-fraction anatomical changes.


Introduction
Modern radiotherapy techniques such as intensity modulated proton therapy (IMPT) have the potential to deliver highly conformal doses to tumors while maximally sparing organs at risk (OARs). Although offering dosimetric advantages with respect to conventional modalities, such treatments are particularly sensitive to geometrical uncertainties arising from setup errors before delivery or range errors caused by organ movements between or during treatment sessions. In the presence of uncertainties, planned doses are delivered to anatomies different from the 3D computed tomography (CT) scan used during treatment planning, which may translate into shifting high dose regions away from clinical target volumes (CTVs) into critical OARs. Being one of the main sources of error in, e.g. prostate cancer treatments (van Herk et al 2002), the magnitude of the deformations and their effect on the final dose distribution must be quantified to ensure robust delivery. Ideally, treatments could be real-time adapted via image guidance, or alternatively adjusted before each treatment session (Jagt et al 2017(Jagt et al , 2018, but such adaptive workflows are constrained by the speed of the CT acquisition, delineation, dose calculation and treatment re-optimization processes in practice. An efficient alternative currently used in the clinic consists of including setup and range uncertainties during treatment planning optimization to design robust treatment plans that withstand positioning and range errors (van der Voort et al 2016, Unkelbach andPaganetti 2018, Rojo-Santiago et al 2021). Similarly, inter-fractional movement information could be incorporated during treatment planning or treatment evaluation to make treatment plans robust against complex geometrical variations. To account for such anatomical changes, some published works propose computing expected dose distributions using weighted scenarios, where each scenario corresponds to the dose deposited in a patient geometry generated by an anatomy model. Typically, such models extract the main eigenmodes of organ deformation-groups of correlated movements-via principal component analysis (PCA) (Söhn et al 2005, Jeong et al 2010, Budiarto et al 2011, Szeto et al 2017. During the last decades, linear PCA models have been successfully employed to quantify and understand the effect of organ deformations in different treatment sites and modalities (Thörnqvist et al 2013a, Rios et al 2017, Magallon-Baro et al 2019; to extend clinical volumes with extra margins and compensate for anatomical changes (Thörnqvist et al 2013b, Bondar et al 2014; to characterize respiratory deformations (Zhang et al 2007, Badawi et al 2010, Dhou et al 2015, 2015; and to simulate dosimetric outcomes of delivery in the presence of geometrical uncertainties (Nie et al 2012, Söhn et al 2012, Xu et al 2014, Tilly et al 2017. Focusing on conventional photonbased modalities, most of these studies are based only on organ contours without including CT intensity values, and require time-consuming image registrations as pre-processing to find corresponding points across a population of patients before being usable for learning generic deformations. Furthermore, all previously introduced models are either patient-specific (requiring several CTs per patient) or population-based (applying the same set of deformations to all patients), which limits their accuracy and applicability. For widespread adoption of anatomically robust treatment planning, we require accurate probabilistic models quickly generating patient-specific treatment anatomies.
All published PCA models learn correlated organ movements from a dataset of 3D deformation vector fields (DVFs), where each vector indicates the magnitude and direction of displacement for each point in a voxelized volume. Such DVFs can be obtained via image registration algorithms finding a nonlinear correspondence between, e.g. two CT scans (Ashburner 2007, Vásquez Osorio et al 2009, Bruveris and Holm 2015. While traditional not data-driven algorithms require minutes to solve a registration task, recent deep learning based methods reduce computing times down to few seconds and additionally increase registration accuracy . Several architectures generating DVFs and warping pairs of images have been proposed and applied to radiotherapy problems such as automated contour propagation in adaptive workflows (Liang et al 2021) or 4D image registration and generation of moving images due to breathing in lung geometries (Lei et al 2020, Romaguera et al 2020, Chang et al 2021, modeling intra-fraction breathing movements that occur during the delivery of the patient. Our objective, however, is to generate a set of DVFs to warp a single planning CT into different repeat CTs that are likely to be observed during the course of a radiotherapy treatment. Ideally, a suitable model would be able to implicitly capture the relative likelihood of correlated groups of movements depending on the input patient geometry. Probabilistic frameworks based on variational inference ( Extending on these recent architectures, we present a probabilistic deep learning framework that represents common anatomical movements and deformations in a population of patients using few latent variables. The proposed daily anatomy model (DAM) first generates DVFs conditioned on an input planning CT scan and latent variables, where each combination of latent variables corresponds to a different group of movements; and subsequently warps the planning CT with the generated DVFs into a set of artificial repeat scans. We train the model using a dataset containing planning and repeat CTs recorded at different stages of prostate cancer treatments in three different institutions, evaluating whether DAM is able to learn realistic movements with two external patients. Compared to previous methods, DAM does not require any pre-processing registration step and can in principle be applied to quickly simulate patient anatomies for treatment adaptation and robustness evaluation purposes.

Methods and materials
Here we describe the fundamentals of the variational framework used to capture anatomical variations, including the different parametric models and the procedure used to tune their parameters. Subsequently, we describe the model architecture in detail, together with the data and the evaluation metrics used in each experiment.

Proposed framework
During the course of a radiotherapy treatment, the internal structures and organs of the patient change between fractions/days. As a result, the anatomy captured in the planning image As demonstrated in previous studies (Budiarto et al 2011) for treatment sites like prostate, common anatomical variations such as volume and contour changes are observed across an entire population. Based on the existence of such generic movements we assume that, given a planning image x and structures s x , there is an unknown patient-specific generative distribution P * (y|x, s x ) of repeat scans that can be approximated via a probabilistic model with learned parameters. Given a planning image from a new patient, we can sample the resulting model distribution P θ (y|x, s x ) parametrized by θ to generate a set of artificial anatomies observed at future treatment stages.
In this case, θ corresponds to the parameters of the U-net neural network that is used to compute a DVF : 3 3 F    mapping coordinates between images. We model Φ as a diffeomorphic transformation, which is invertible, preserves topology, and in our case practically allows obtaining the forward and inverse transformations in a very simple manner. Such diffeomorphic transformation is represented by a stationary velocity field v: 3 As for the inputs, we discretize the velocity field v Generative model. We use a probabilistic model that conditions the generated DVFs (and thus also the repeat images) on N unobserved latent variables z N Î  capturing the main factors of variation in the data, i.e. the main groups of anatomical deformations. The latent variables distribute following a multivariate Gaussian prior probability distribution that depends on the input planning anatomy where the mean μ θ and diagonal covariance matrix Σ θ are deterministic functions calculated by a neural network referred to as Encoder (figure 1), which corresponds to the down-sampling path of a U-net. The prior dependence on the input results in a different distribution over latent variables per patient, which allows the model to select the groups of movements that are likely to be observed for each specific input image. The Encoder additionally outputs a volume r = g θ (x, s x ), which is the results of several deterministic convolution operations containing features from the input. Since r is a deterministic function of the input, we substitute any conditioning on r with x and s x in the remainder of the paper. The relationship between the input planning image and latent variables and the output warped repeat images is computed in the up-sampling path of the U-net, which takes sampled latent variables and the lowdimensional features r to generate a velocity field v z,θ = f θ (z, x, s x ), where the subscripts denote the deterministic dependence to z and θ. After exponentiating v z,θ to obtain the DVF Φ z,θ , the output repeat image y M Î  is obtained by warping the input as y = Φ z,θ •x.
Different latent variable samples z result in different repeat images given the same input planning scan, and the modeled distribution of repeat images can be recovered as a function of the prior P(z|x, s x ) and a likelihood P θ (y|z, x, s x ) distributions as The choice of the likelihood distribution affects the final loss function. Based on previous work (Krebs et al 2019), we model the likelihood distribution as a function of the cross-correlation (CC) between predicted y and ground-truthŷ images, scaled by a constant λ as The CC has been empirically found to yield better similarity than other metrics such as the mean squared error (Balakrishnan et al 2019), with larger CC values corresponding to more alike images. Let y(p) andˆ( ) p y denote the intensity values for each voxel at position p in the predicted and ground-truth images, respectively. If w(p) andˆ( ) p w are images where each voxel is the local mean of the n 3 neighbouring voxels, e.g. , instead of sampling the likelihood P θ (y|z, x, s x ) each time during inference to generate anatomies, we always use the mode of the distribution Φ z,θ • x.
Learning. With the presented probabilistic formulation, the goal is to maximize equation (2) by learning the parameters θ from a dataset containing planning x and repeat y pairs. However, estimating the integral over the latent space would require sampling a large number of latent variables, being intractable in practice. Instead, we resort to a variational framework and define an approximate posterior distribution Q ψ (z|x, s x , y, s y ), parametrized by an Inference Network with parameters ψ. During training, the Inference Network has access to the real repeat scans and predicts the parameters of Gaussian distribution covering a small region of the latent space containing variables that are likely to explain the deformation between x and y scans. Thus, the predicted Gaussian is x y x y x y m S = y y y with deterministic mappings μ ψ and Σ ψ computed by the Inference neural network. Our formulation allows estimating the model parameters θ and ψ by minimizing the negative evidence lower bound as The lower bound balances two terms: the D KL ( · || · ) term-Kullback-Leibler (KL) divergence-forces the approximated posterior to be close to the prior distribution, while the first term corresponds to maximizing the CC, encouraging similarity between real and generated images. Further details about deriving the lower bound are included in appendix B.
Explicit regularization terms. The current form of the likelihood enforces image similarity regardless of structure overlap or DVF quality. We modify the lower bound and add two regularization terms to enforce realistic predicted anatomies. To encourage smooth and realistic DVFs, we introduce a spatial regularization term that penalizes large unrealistic spatial gradients , , of the DVF Φ z,θ , which is multiplied by a constant κ as Figure 1. Proposed generative framework. The probabilistic models are embedded within a U-net, where the down-sampling path is referred to as Encoder, and the up-sampling path is the Generator. The Encoder takes the planning CT and structures and outputs both a compressed representation of the input r as well as a distribution P(z|x, s x ) over the region of the latent space containing variables corresponding to plausible patient-specific movements. Given r and any sample z from the latent space distribution, the Generator outputs a deformation vector field that is used to warp the planning CT into an artificial repeat CTs.
A segmentation regularization term is added to improve the overlap between propagated and ground-truth structures, using the DICE score (defined between 0 and 1, where 1 denotes perfect overlap). For K structures, let s y k be the voxels in the ground-truth scan with structure number k ä [ With these two terms multiplying the likelihood in the lower bound of equation (9), the final optimization problem becomes x y s s x y s s z z x s y s z p z , 2 with x, y, s x and s y sampled from the real data distribution P D (x, y, s x , s y ). Î´ with categorical labels depending on the organ present in each voxel. Given the stochasticity in the density of the rectum fillings, we adhere to clinical practice and mask all voxels in the rectum and set their intensity to −1000 (vacuum).

Model architecture
As shown in figure 2, the proposed variational framework comprises two different models, parametrized by artificial neural networks: the Inference network and the probabilistic U-net with down-sampling and upsampling paths denoted as Encoder and Generator, respectively. Based on the input planning CT and structures, the Encoder computes (i) a low-dimensional volume of input image features r, and (ii) the parameters μ θ and Σ θ of the prior distribution P θ (y|z, x, s x ) over a region of the latent space containing movements that are likely to be observed for the patient. The prior depends on the input, thus one of the functions of the Encoder is selecting primary groups of movements for each patient based on planning CT anatomy. The Generator takes the features r and sampled latent variables z ∼ P θ (y|z, x, s x ) and produces the velocity field v z,θ that is exponentiated to obtain a diffeomorphic transformation Φ z,θ .
During training, the Inference network takes a pair of planning and repeat CTs and outputs the parameters μ ψ and Σ ψ of the distribution Q ψ (z|x, s x , y, s y ) over a much smaller region of the latent space containing latent variables that explain the deformation between both images. The DVF resulting from such latent variables is used to warp the planning CT into artificial repeat CTs y and structures Φ • s x . The distributions Q ψ (z|x, y) from the Inference network and P θ (z|x) from the Encoder are forced to overlap via the KL divergence in equation (9), while the artificial CT and structures are forced to match the ground-truth repeat CTs via the CC and DICE terms in the likelihood.
For the model with the lowest validation loss, the Encoder and Inference network are identical: three consecutive convolutional blocks, where each block contains a 3D convolutional layer with 32 channels and a 3 × 3 × 3 kernel followed by Group Normalization (Wu and He 2020), a rectified linear (ReLU) activation and a max pooling down-sampling operation. At the lowest level, an additional 3D convolution with 4 channels results in the low-dimensional feature volume r 4 8 8 6

Î´´
, which is mapped to the means and variances of the prior distribution via two different fully-connected layers. Conversely, the Generator first concatenates the latent variables to r as an additional channel, and then applies three up-sampling convolutional blocks with 32 channels. Two additional 3D convolution operations with 16 and 3 channels result in the final velocity field v z,θ . All models are trained for 1000 epochs using a learning rate of 0.001, hyper-parameters κ = 0.1 and λ = 1000, and the Adam optimizer (Kingma and Ba 2017) with default parameters.

Experiments
We assess the model's accuracy in both generating feasible groups of deformations and reconstructing the ground-truth repeat scans. Additional experiments aim at exploring the structure of the latent space and the types of movements triggered by different latent variables.
• Reconstruction accuracy. Given a planning and one of its repeat CTs in the test set, the Inference network can be used to obtain the latent variables corresponding to the deformation between both images, which are in turn used to get the DVF and warp the planning scan. For all 22 test planning/repeat pairs, we compare such generated repeat CTs to the ground truth repeats via computing the CC (equation (4)) and the DICE score (equation (8)). Additionally, we warp points i 3 p Î  on the surface of the planning prostate and calculate their distance to corresponding pointsˆi 3 p Î  on the surface of the repeat prostates via the mean surface error as To allow for a fair comparison with PCA-based methods, we compute the mean and standard deviation across the same L = 5864 randomly chosen points as in previous studies (Budiarto et al 2011). Finally, we evaluate the effect of the latent space dimensionality by comparing all accuracy metrics for different models trained with a varying number of latent variables.
• Generative performance. To finally be applied in clinical settings, the generated movements must match those from the recorded CT scans. Based on a previous study quantifying anatomical changes in prostate patients (Antolak et al 1998), we compute the volume changes and center of mass shifts between planning and repeat scans, and compare their distributions obtained using real and artificial repeat CTs. To be able to compare to the reference values (Antolak et al 1998), we reduce center of mass shifts to a single value by computing the average of absolute differences across coordinates. An additional Inference Network takes a pair of planning and repeat CT and outputs the parameters of a distribution over a smaller region of the latent space that is likely to capture the deformation between the two images. The prior distribution predicted by the Encoder is forced to the distribution produced by the Inference Network via a KL-divergence term in the loss. Additionally, a reconstruction term encourages the resulting artificial CT (obtained after warping the planning scan with the predicted deformation) to be similar to real repeat CT.
• Latent space analysis. By individually varying the values of each latent variable while keeping the other fixed, we numerically and visually assess the volume changes and center of mass shifts triggered by each variable. Finally, to understand the structure of the latent space, we obtain the latent variables from all pairs in the dataset and classify them according to the magnitude of their induced center of mass shifts and volume changes. Ideally, similar latent variables should correspond to similar deformations. which can be verified by plotting a 2D representation of the N latent variables using t-SNE (van der Maaten and Hinton 2008) together with their associated label to determine the presence of clusters.

Results
In this section, we evaluate DAM's performance in generating realistic CTs with anatomical changes that match those of the real recorded repeat CTs. First, the reconstruction accuracy of real CTs is assessed, followed by an analysis of the latent space, and the types of deformations captured by the latent variables.

Reconstruction accuracy
Given a planning-repeat pair of CT scans and structures in the test set, a repeat scan can be reconstructed via the same framework as used during training: sampling latent variables with the Inference network that are used by the Generator to generate a DVF. To verify the similarity between DAM's reconstructions and the real repeat CTs, we compute three metrics assessing CT and structure overlap: the CC, DICE score, and surface error e. All three metrics in figure 3 are computed for different models trained with a varying number of latent variables, from 1 to 32. The values shown for 0 latent variables correspond to using the planning CTs as a prediction, which is equivalent to disregarding any model. First, the cross correlation between the real and reconstructed repeat CT is shown in the left plot of figure 3, indicating that the model significantly improves when adding the first few variables, whereas no substantial is observed beyond 10 variables. As seen in DICE scores for the prostate and rectum from the middle plot in figure 3, DAM can model prostate deformations with high accuracy even with a single latent variable, while representing rectum movements generally requires a slightly larger latent space with ≈8 variables. The relative simplicity in capturing prostate movements is further confirmed from the right plot in figure 3, showing that most surface error (equation (10)) reduction results from adding the first latent variable. On average, DAM matches-and even outperforms in the low-dimensional regime-the accuracy of countourbased PCA models (Budiarto et al 2011). The larger spread in error values is likely caused by the fact that, unlike for the values reported in the PCA study, all surface points are not equidistant but randomly sampled over the surface, increasing the distance between correspondent points in under-sampled areas.

Generative performance
Besides generating realistic CT scans, DAM should produce patient-specific movements whose distribution approximately matches those observed in the clinics, as reported in previous work (Antolak et al 1998). Given the limited dataset size, we train 3 models using different training-test dataset splits, i.e. changing the 2 test patients for each model. For these 2 test patients and each model, figure 4 displays the distribution of the anatomical variations seen in the 11 recorded repeat CTs (blue), compared to the deformations seen in 100 randomly sampled CTs (orange). Except for the large center of mass movements seen for the second patient figures 4(b) and (d), the ranges of values for both volume changes and center of mass shifts in figure 4 are approximately equal. Similarly, figure 5 shows the center of mass shift and volume changes distributions for all training patients with more than 5 repeat CTs. To compress all the information into one plot, we plot the mean and standard deviation, instead of the full histogram. The good overlap between distributions demonstrates that DAM captures the correct frequency and range of movements. As for the test patients, the biggest differences between both distributions occur for the last patient in figure 5(b) with large center of mass shifts, which is aggravated by the fact that this patient has three big outliers of >7 mm shift. Finally, figure 6 displays generated and real anatomies for one of the patients, showing high quality images and contours with similar features and shapes. Figure 7 displays representative DFVs from 3 patients, with overall smooth deformations that match the  movements observed between the planning and repeat CTs. To evaluate the realism of the deformations, we calculate determinant of the Jacobian of the DVF, and specifically the fraction of voxels whose Jacobian determinant is negative, indicating unrealistic 'folding' in the DVF. In our case, the diffeomorphic DVFs generated by the model prevent any folding from occurring, resulting in smooth, invertible deformations with Jacobian determinants that are always positive.

Latent space analysis
To investigate the deformations captured by the latent variables, we compute the center of mass shifts and volume changes triggered by each variable independently, while keeping the rest fixed. Figure 8 displays such changes for 4 randomly picked variables from the model with 8 latent variables, whose value was modified between −1.5 and 1.5 times the standard deviation of the prior distribution. The results show magnitudes and correlations between changes as can be expected: smaller prostate volume changes, and large bladder and rectum variations shifting the center of mass of the prostate and seminal vesicles. To further demonstrate DAM's learned correlated groups of movements, in figure 9 we plot a grid of structures corresponding to simultaneously varying two latent variables. Individual changes in the horizontal and vertical axis mainly control the bladder and rectum volumes, respectively. Correlated deformations arise: the increase of bladder volume above the seminal vesicles, together with the decrease of rectum filling below the prostate, cause a prostate and vesicles shift and rotation. We analyze the structure of the latent space by determining if similar deformations (shifts and volume changes) or anatomical features (organ volume) result in similar latent variables. Figure 10 shows a twodimensional t-SNE representation of the latent variables, where only samples with the smallest and largest movements or volumes are included, i.e. samples whose with center of mass shifts or volumes that fall above the 90% percentile or below the 10% percentile. Most of the latent space information seems to concern center of mass shifts and bladder/rectum volume changes, since their 2D representations can be clearly separated. Ideally, similar latent variables that are clustered together will correspond to different anatomical deformations, and will not carry information about anatomical features of the patient such as absolute organ volume. Instead, the Encoder is in charge to mapping deformations to anatomical traits observed in the planning CT or structures. Prostate and bladder volume seem to have no effect in how the latent space is organized, since similar latent variables correspond to very different sizes. To some extent, the effect of rectum size is also limited, resulting from the possible correlation between rectum fillings and volume changes.

Discussion
In this study, we developed a probabilistic framework to model patient-specific inter-fraction movements based on population data. The presented DAM captures deformation patterns, generating DVFs only based on the planning CT scan and delineations. Based on the metrics obtained in figure 3 for the 22 scans from two  suffices to capture prostate movements, and that both the CC and rectum DICE score keep increasing with larger latent spaces, we can conclude that most of the computational effort is dedicated to modeling rectum deformations. Prostate IMPT treatments typically solely involve lateral beams, for which the impact of error due to rectum movement is small. In some cases, models with as little as 4-8 variables may be accurate enough, while 8-16 variables additionally ensure accurate rectum deformations for plans requiring more precision.  For clinical application, it is critical that the model generates realistic shifts and deformations of the volume to be irradiated/treated (in this case, the prostate). Overall, based on the results in figures 4 and 5, the center of mass shifts and volume changes produced by DAM show good overlap to the deformations and shifts recorded in the clinic, matching previously reported values (Antolak et al 1998). One reason why DAM struggles in simulating the most extreme shifts or slides is the regularization term of the loss, which limits large deformations. Despite this limitation, such large anatomical variations are typically taken care of by adapting the treatment plan to the new anatomy, whereas robust treatment planning and evaluation-the main potential applications of DAM-are in principle oriented to incorporating average, frequent deformations into treatment design and evaluation, and we expect DAM to be useful for such purposes.
Comparison to other methods. All the previously published approaches are either patient-specific or population models based on PCA. Patient-specific methods (Söhn et al 2005, Zhang et (Szeto et al 2017) to capture 90% of the variance in the training data. A large number of components (equivalent to DAM's latent variables) adds more variation, increases the chance of sampling unrealistic deformations and limits their applicability as reduced order models. Most importantly, all previous population-based methods require a time-consuming pre-processing step involving multiple deformable image registration steps between scans and patients to an organ or CT template. The accuracy of such registration calculation degrades the final accuracy and generative performance of the model, with previous studies (Szeto et al 2017, Tilly et al 2017 showing surface errors of around 1.5 ± 1.0 mm introduced in their pre-processing step alone that are comparable the DAM's total errors reported in the right plot of figure 3. Given the lack of uniformity in treatment site and evaluation metrics in previous studies-where most focus on evaluating the variance captured by the PCA model components and the errors on the DVFs caused by truncating the number of eigenmodes-we compare DAM's performance to a PCA model of the prostate (Budiarto et al 2011) in the right plot of figure 3. Even without adding any preprocessing errors, DAM matches the overall performance and is to capture prostate motion with a lower number of modeling parameters. Being trained directly on CT images in an unsupervised manner, DAM bypasses any performance or time losses from any pre-processing step, and can be easily applied to generate new anatomies in few milliseconds, compared to the tens of minutes or hours needed to obtain accurate enough registrations using conventional clinical software. Figure 10. Latent space structure. Each latent variable is reduced to 2D space t-SNE representation and classified, from left to right, according to whether they correspond to small (blue) or large (orange): prostate center of mass shifts, prostate, bladder and rectum volume changes, or prostate, bladder and rectum sizes. 'Small' samples fall below the 10% percentile of all values, while 'large' samples include all values above the 90% percentile.
Like PCA-based models, DAM assigns realistic correlated deformations to different values of the latent variables. Figures 8 and 9 show that variables control shifts, volume changes and rotations similar to those reported in previous studies (Budiarto et al 2011, Magallon-Baro et al 2019). Figure 10 demonstrates that the latent variables almost exclusively carry information about deformations, and not about anatomical traits from the patients. Instead, the Encoder is in charge of independently mapping planning anatomies to a subset of latent variables. Furthermore, unlike all previous approaches not evaluating the generative performance of their proposed models, we demonstrate the DAM also generates the adequate range and frequency of deformations for each patient.
Applicability. DAM's main application in robust treatment planning and robust evaluation against interfraction movements involves sampling patient anatomies and calculating the corresponding dose distributions. With prediction times of few milliseconds per generated anatomy, DAM offers huge speed-up possibilities for plan evaluation when coupled to fast dose calculation algorithms (Perkó et al 2016, Pastor-Serrano and Perkó 2022a, 2022b, Pastor-Serrano et al 2022. Few (3-5) representative scenarios corresponding to points around mean of the posterior distribution can be sampled to be used for scenario based robust optimization, which may translate into a dosimetric advantage or be used for margin reduction. In principle, the same modeling framework could be applied to any other treatment site, with additional applications involving obtaining weighted dose scenarios (Tilly et al 2017) to formulate anatomical robustness margin recipes (van der Voort et al 2016). Straightforward extensions include adapting the DAM framework to other types of movement (e.g. intra-fraction movements) by adding temporal dependence for treatments where patients' anatomies significantly change following a clear pattern during (e.g. breathing) or between the different fractions of the treatment (e.g. modeling tumor shrinkage). Such time-dependent model could be coupled to breathing interplay effect simulation tools (Pastor-Serrano et al 2021) to design plans based on breathing signals (Pastor-Serrano et al 2021) that mitigate the detrimental effect of movement during delivery.
Limitations. Like PCA-based models, DAM will struggle to generate deformations that are not represented in the training data, for which continuously updating the model (e.g. using cone beam CTs) can be a solution. We refer to this as having limited extrapolation capabilities, e.g. generating bladder emptying or filling when no bladder movement is included in the data, which is a problem shared with linear PCA models. Likewise, low resolution images with poor contrast can also affect performance by masking small movements of structures, especially in areas with similar organ tissue densities. DAM's implementation in the clinic thus requires a quality assurance protocol that evaluates robustness in predictions e.g. by training several models using different data, and evaluating result similarity on a same test dataset.
As for many other deep learning algorithms, DAM's generalization capabilities depend on the size and variability of the data in the dataset, as well as on the quality and resolution of the CT images. Due to the rather small size of the dataset in this preliminary study-caused by the scarcity of recorded sets of planning and repeat CTs-and based on the initial positive results, further testing appears warranted. Although DAM seems to learn representative deformations from the rather small dataset used during training, larger datasets will likely improve model performance, as typically observed in deep learning approaches. Once applied in clinical environments, DAM can be re-trained using more data, including the CTs collected in the center and possible new publicly available datasets. Other options to mitigate the (possible) negative impact of small datasets include generating extra data, .e.g. via a much more computationally expensive mechanistic model (such as the ones developed for heart motion (Gerach et al 2021)).
DAM's accuracy in generating reasonable patient-specific movements depends on the extent to which movements can be predicted only from the planning CT and structures. As with other classical and deep learning registration algorithms, DAM would struggle to register rectum structures due to the randomness in their intensity values. Following clinical practice, we opted for masking the rectum voxels with air. As a result, all deformed CTs have air-filled rectum structures, which can affect the accuracy in the dose calculation, especially for beams delivered in the anterior-posterior direction. Possible solutions include adding an additional generative model that generates rectum voxel intensities based on the organ mask shape.

Conclusion
We presented DAM, a deep learning-based DAM to simulate patient-specific deformations that may be observed during the course of a prostate cancer radiotherapy treatment. DAM captures groups of correlated movements via few auxiliary latent variables, where few variables are able to model prostate deformations and shifts with similar accuracy as state-of-the-art models based on PCA. Compared to previous population models, DAM can generate realistic CT images and contours in less than a second without any pre-processing, with volume changes and center of mass shifts that match in frequency and range those reported in the clinics and in previous studies. Given its simplicity and speed to generate CTs based on a single planning scan and delineations, DAM can be tested in treatment planning and evaluation to design treatment plans that are robust against interfraction variations. where p are spatial locations. Typically, T is chosen so that v(p)/2 T is small, with higher T leading to more accurate solutions. In Group theory, the velocity field v is a member of the Lie algebra, which is exponentiated to produce the member of the Lie group ( ) v exp 1 F = , establishing the connection between the exponentiation and the integration of the ordinary differential equation.