1 Introduction

Diffeomorphic image registration has been widely studied in the fields of computational anatomy [3, 8], atlas-based image segmentation [2], and anatomical shape analysis [9, 16], as it provides smooth and invertible smooth spatial correspondences between images. The problem of ‘inexact’ registration is an ill-posed problem since the image data are usually contaminated by unknown noise. Providing efficient measures to quantify the registration uncertainty or error is critical to fair assessment on estimated transformations and subsequent improvement on the accuracy of predictive models. This also forms the basis for model-assisted decision making, for example, image-guided neurosurgery system, where surgeons need a better understanding of registration uncertainty to identify residual tumors [10].

Motivated by probabilistic modeling, several works have proposed to quantify registration uncertainty by having a probability distribution over the space of transformation parameters [4, 11, 13, 14]. These approaches formulate Bayesian image registration as an image matching likelihood term regularized by a prior that encourages smooth deformations. The spread of the posterior of unknown registration parameters is then considered as a measure of the registration uncertainty. Existing methods including stochastic [13] and sampling [4, 5, 10] methods have been investigated to estimate the uncertainty, due to the fact that the posterior does not have a closed form and is computationally problematic to solve. A large computational effort is required to sample over high dimensional parameter spaces. To alleviate the excessive computational demands of sampling methods, a multivariate Gaussian approximation to the mode over the posterior was presented in [14]. In spite of the advantages of constructing Bayesian models, the extremely high computational cost and large memory footprint of the algorithms mentioned above have limited their usage in important applications that require computational efficiency.

In this paper, we are the first to introduce an efficient Bayesian image registration uncertainty quantification model that employs a low dimensional Fourier representation of the tangent space of diffeomorphisms [17]. We develop a novel Laplace approximation of the log posterior distribution that characterizes the deformation uncertainty at the optimal solution in a bandlimited space. More specifically, we assume a complex Gaussian distribution at the mode of the posterior, and approximate its covariance matrix to quantify the registration uncertainty. Our method dramatically reduces the computational complexity of approximating posterior marginals, which makes the uncertainty analysis for diffeomorphic image registration tractable in time. Another major benefit of our algorithm is that the covariance matrix can be easily computed and stored through the inverse Hessian of the log posterior defined by the low dimensional representations of transformation fields. We demonstrate the effectiveness of our model in both synthetic and real brain MRI data. A promising clinical application of our method is to provide key information on brain shifts; hence helping neurosurgeons to identify residual tumor tissue in real time, while lowering the risk of collateral tissue damage.

2 Background: LDDMM with Geodesic Shooting

Consider a source image S and a target image T as square-integrable functions defined on a torus domain \(\varOmega = \mathbb {R}^d / \mathbb {Z}^d\) (\(S(x), T(x) : \varOmega \rightarrow \mathbb {R}\)). The problem of diffeomorphic image registration is to find the shortest path of diffeomorphic transformations \(\psi _t \in \mathrm{Diff}(\varOmega ): \varOmega \rightarrow \varOmega , t \in [0, 1]\), such that the deformed image \(S \circ \psi _1\) at time point \(t=1\) is similar to T. An explicit energy function of LDDMM with geodesic shooting [12, 15] is formulated as an image matching term plus a regularization term that guarantees the smoothness of the transformation fields

$$\begin{aligned} E(v_0) = \frac{\lambda }{2} \, \mathrm{Dist}(S \circ \psi _1, T) + \frac{1}{2}(\mathcal {L}v_0, v_0), \quad s.t. \,\, \textit{ geodesic constraint}, \end{aligned}$$
(1)

where \(\lambda \) is a positive weight parameter and \(\mathrm{Dist}(\cdot , \cdot )\) is a distance function that measures the similarity between images. The deformation \(\psi \) is defined as an integral flow of the time-varying Eulerian velocity field \(v_t\) that lies in the tangent space of diffeomorphisms \(V = T \mathrm{Diff}(\varOmega )\). Here \(\mathcal {L}: V \rightarrow V^*\) is a symmetric, positive-definite differential operator that maps a tangent vector \(v \in V\) into the dual space \(m \in V^*\), with its inverse \(\mathcal {K}: V^* \rightarrow V\). The \((\cdot , \cdot )\) denotes a paring of a momentum vector \(m \in V^*\) with a tangent vector \(v \in V\).

The geodesic at the minimum of (1) is uniquely determined by integrating the geodesic constraint, a.k.a. Euler-Poincaré differential equation (EPDiff) [1, 7], which is computationally expensive in high dimensional image spaces. A recent work demonstrated that the entire optimization of LDDMM with geodesic shooting can be efficiently carried in a low dimensional bandlimited space with dramatic speed improvement [17, 18]. We briefly review the basic concepts below.

Let \(\widetilde{\mathrm{Diff}}(\varOmega )\) and \(\tilde{V}\) denote the space of Fourier representations of diffeomorphisms and velocity fields respectively. Given time-dependent velocity field \(\tilde{v}_t \in \tilde{V}\), the diffeomorphism \(\tilde{\psi }_t \in \widetilde{\mathrm{Diff}}(\varOmega )\) in the finite-dimensional Fourier domain can be computed as

$$\begin{aligned} \tilde{\psi }_t = \tilde{e} + \tilde{u}_t, \quad \frac{d \tilde{u}_t}{dt}&= -\tilde{v}_t - \tilde{\mathcal {D}} \tilde{u}_t *\tilde{v}_t, \end{aligned}$$
(2)

where \(\tilde{e}\) is the frequency of an identity element, \(\tilde{\mathcal {D}}\tilde{u}_t\) is a tensor product \(\tilde{\mathcal {D}} \otimes \tilde{u}_t\), representing the Fourier frequencies of a Jacobian matrix \(\tilde{\mathcal {D}}\) with central difference approximation, and \(*\) is a circular convolution with zero padding to avoid aliasingFootnote 1.

The Fourier representation of the geodesic constraint (EPDiff) is defined as

$$\begin{aligned} \frac{\partial \tilde{v}_t}{\partial t} =\mathrm{ad}^{\dagger }_{\tilde{v}_t} \tilde{v}_t =-\tilde{\mathcal {K}}\left[ (\tilde{\mathcal {D}} \tilde{v}_t)^T \star \tilde{m}_t + \tilde{\nabla } \cdot (\tilde{m}_t \otimes \tilde{v}_t) \right] , \end{aligned}$$
(3)

where \(\star \) is the truncated matrix-vector field auto-correlation and \(\mathrm{ad}^{\dagger }\) is an adjoint operator to the negative Lie bracket of vector fields, \(\mathrm{ad}_{\tilde{v}} \tilde{w} = -[\tilde{v}, \tilde{w}] = \tilde{\mathcal {D}}\tilde{v} *\tilde{w} - \tilde{\mathcal {D}}\tilde{w} *\tilde{v}\). The operator \(\tilde{\nabla } \cdot \) is the discrete divergence of a vector field. Here \(\tilde{\mathcal {K}}\) is a smoothing operator with its inverse \(\tilde{\mathcal {L}}\), which is the Fourier transform of a commonly used Laplacian operator \((-\alpha \varDelta + I)^c\), with a positive weight parameter \(\alpha \) and a smoothness parameter c. The Fourier coefficients of \(\tilde{\mathcal {L}}\) is, i.e., \(\tilde{\mathcal {L}}(\xi _1 , \ldots , \xi _d) = \left( -2 \alpha \sum _{j = 1}^d \left( \cos (2\pi \xi _j) - 1 \right) + 1\right) ^c\), where \((\xi _1 , \ldots , \xi _d)\) is a d-dimensional frequency vector.

3 Low-Dimensional Bayesian Registration Uncertainty

We introduce a Bayesian model of diffeomorphic image registration represented in the bandlimited velocity space \(\tilde{V}\), with registration uncertainty explicitly encoded as latent variables of the model.

Assuming i.i.d. Gaussian noise on image intensities, we obtain the likelihood

$$\begin{aligned} p(T \, | \, S, \sigma ^2) = \frac{1}{(\sqrt{2 \pi } \sigma ^2 )^{M}} \exp { \left( -\frac{1}{2\sigma ^2}|| S \circ \psi _1 -T ||_2^2 \right) }, \end{aligned}$$
(4)

where \( \sigma ^2 \) is the noise variance and M is the number of image voxels. The deformation \(\psi _1\) corresponds to \(\tilde{\psi }_1\) in Fourier space via the Fourier transform \(\mathcal {F}(\psi _1)=\tilde{\mathcal \psi }_1\), or its inverse \(\psi _1=\mathcal {F}^{-1}(\tilde{\mathcal \psi }_1)\).

We define a prior on the initial velocity field \(\tilde{v}_0\) to be a complex multivariate Gaussian distribution that ensures the smoothness of the geodesic path, i.e.,

$$\begin{aligned} p(\tilde{v}_0) = \frac{1}{(2 \pi )^{\frac{Md}{2}} | \tilde{\mathcal {L}}^{-1} |^{\frac{1}{2}}} \exp { \left( -\frac{1}{2}(\tilde{\mathcal {L}} \tilde{v}_0, \tilde{v}_0) \right) }, \end{aligned}$$
(5)

where \(|\cdot |\) is matrix determinant.

Combining the likelihood (4) and prior (5) together, we obtain the negative log posterior distribution on the deformation parameter parameterized by \(\tilde{v}_0\) as

$$\begin{aligned} -\ln \, p(\tilde{v}_0 \, | \, S, T, \sigma ^2) = \frac{1}{2}(\tilde{\mathcal {L}} \tilde{v}_0, \tilde{v}_0) + \frac{\Vert S \circ \psi _1 - T \Vert _2^2}{2\sigma ^2} + M \ln \sigma + \text{ const }. \end{aligned}$$
(6)

In most probabilistic formulations of image-based registration, the likelihood function (4), as a function of the transformation parameters, is highly non-Gaussian because of the complex spatial structure of the images. This brings difficulties in the inference of such a non-Gaussian posterior.

3.1 Laplace Approximation

In this section, we introduce Laplace’s method to approximate the covariance matrix at the mode of the posterior in a low dimensional bandlimited space. We first minimize the negative log posterior in (6) to the optimum, denoted as \(\tilde{v}_0^{opt}\) (details are introduced in the following Sect. 4). We then assume a local complex Gaussian distribution at the optimal solution.

To simplify the notation, we use \(f(\tilde{v}_0) \triangleq -\ln \, p(\tilde{v}_0 \, | \, S, T, \sigma ^2)\). The function \(f(\tilde{v}_0)\) is approximated to quadratic order by using second order Taylor series expansion at the optimal solution \(\tilde{v}_0\) as

$$\begin{aligned} f(\tilde{v}_0) \approx f(\tilde{v}_0^{opt}) + \nabla f^T(\tilde{v}_0^{opt})(\tilde{v}_0 - \tilde{v}_0^{opt}) + \frac{1}{2} (\tilde{v}_0 - \tilde{v}_0^{opt})^T \mathcal {H} f(\tilde{v}_0^{opt}) (\tilde{v}_0 - \tilde{v}_0^{opt}), \end{aligned}$$

where \(\nabla \) denotes the first derivative and \(\mathcal {H}\) is a second order Hessian. Since the first derivative of f vanishes at the optimal solution \(\tilde{v}_0^{opt}\), we have

$$\begin{aligned} f(\tilde{v}_0) \approx f(\tilde{v}_0^{opt}) + \frac{1}{2} (\tilde{v}_0 - \tilde{v}_0^{opt})^T \mathcal {H} f(\tilde{v}_0^{opt}) (\tilde{v}_0 - \tilde{v}_0^{opt}). \end{aligned}$$
(7)

The posterior is approximately a Gaussian \(\mathcal {N}(\tilde{v}_0^{opt}, \mathcal { H}^{-1}f(\tilde{v}_0^{opt}))\). The inverse Hessian corresponds to the covariance matrix of the registration parameters.

4 Inference

Following optimal control theory [12], we first develop a gradient decent algorithm to minimize the negative log posterior distribution (6) w.r.t. the initial velocity \(\tilde{v}_0\) and the image noise variance \(\sigma ^2\). Analogous to [14], we then derive the second variation of (6) to compute the Hessian-vector product via a linearized forward-backward sweep.

Parameter Estimation. We add Lagrange multipliers to constrain the diffeomorphism \(\tilde{\psi }_t\) to be a geodesic path in the frequency domain. This is done by introducing time-dependent adjoint variables, \(\hat{v}_t\) and \(\hat{u}_t\), and writing the augmented energyFootnote 2,

$$\begin{aligned} E(\tilde{v}_0) = -\ln \, p(\tilde{v}_0 \, | \, S, T, \sigma ^2) + \int _0^1 \langle \hat{v}_t, \dot{\tilde{v}}_t + \mathrm{ad}^\dagger _{\tilde{v}_t} \tilde{v}_t \rangle + \langle \hat{u}_t, \dot{u}_t + \tilde{v}_t + \tilde{\mathcal {D}} \tilde{u}_t *\tilde{v}_t \rangle \, dt, \end{aligned}$$
(8)

where the last two terms correspond to Lagrange multipliers enforcing the geodesic constraint (3) and deformation transport Eq. (2) that is proved mathematically equivalent to the evolution equation [12].

The optimality conditions for the adjoints \(\hat{v}_t, \hat{u}_t\) are given by the following time-dependent system of ordinary differential equations, termed the adjoint equations (equivalent to error-back propagation):

$$\begin{aligned} -\dot{\hat{v}}_t + \mathrm{ad}_{\tilde{v}_t} \hat{v}_t - \mathrm{ad}^\dagger _{\hat{v}_t} \tilde{v}_t + \hat{u}_t+(\tilde{\mathcal {D}}\tilde{u}_t)^T \star \hat{u}_t=0, \quad \, -\dot{\hat{u}}_t - \mathrm{div}(\hat{u}_t \otimes \tilde{v}_t) = 0, \end{aligned}$$
(9)

subject to initial conditions \(\hat{v}_1 = 0\) and \(\hat{u}_1 = -\frac{1}{\sigma ^2} \mathcal {F}[\langle \nabla S(1), S(1)- T\rangle ],\) where \(S(1)=S \circ \psi _1\).

After integrating the geodesic equations (state equations) (3) forward in time to \(t=1\) and then backward integrating the adjoint Eqs. (9) in time to \(t=0\), the gradient of E w.r.t. \(\tilde{v}_0\) is \(\nabla _{\tilde{v}_0} E = \tilde{v}_0 - \hat{v}_0.\)

Setting the gradient w.r.t. \(\sigma ^2\) to zero, we have a closed form update \(\sigma ^2 = \frac{1}{M} \Vert S (1) - T \Vert _2^2\).

Covariance Estimation. To estimate the full covariance matrix as an inverse Hessian, we develop a similar forward-backward approach to compute the Hessian-vector products that involves the second variation of the augmented energy function (8). In particular, after deriving the second variation in the direction \(\delta \tilde{v}_0\) as \(\frac{\partial ^2}{\partial \epsilon ^2} E(\tilde{v}_0 + \epsilon \cdot \delta \tilde{v}_0) |_{\epsilon =0} = \langle \delta \tilde{v}_0, \mathcal {H}E \, \delta \tilde{v}_0\rangle \), we read off the Hessian-vector product \(\mathcal {H}E \, \delta \tilde{v}_0\). Given an initial condition \(\delta \tilde{v}_0\), we can compute the Hessian-vector product as \( \mathcal {H}E \, \delta \tilde{v}_0 = \delta \tilde{v}_0 - \delta \hat{v}_0, \) where \(\delta \hat{v}_0\) is the adjoint variable of \(\delta \tilde{v}_0\).

The second variation can be accomplished by forward-sweeping the linearized geodesic constraint around the optimal solution, followed by a backward sweep of the linearized adjoint system. Introducing time-dependent adjoint variables \(\delta \hat{v}_t\) and \(\delta \hat{u}_0\), the forward linearized geodesic equations are

$$\begin{aligned} \delta \dot{\tilde{v}}_t =-\mathrm{ad}^\dagger _{\delta \tilde{v}_t} \tilde{v}_t-\mathrm{ad}^\dagger _{ \tilde{v}_t}\delta \tilde{v}_t, \quad \quad \delta \dot{\tilde{u}}_t = -\tilde{\mathcal {D}}_{\delta \tilde{u}_t}*\tilde{v}_t- \tilde{\mathcal {D}}_{\tilde{u}_t} *\delta \tilde{v}_t-\delta \tilde{v}_t. \end{aligned}$$

The linearized adjoint system for the backward integration is

$$\begin{aligned} \delta \dot{\hat{v}}_t&= \mathrm{sym}^{\dagger }_{\tilde{v}_t} {\delta \hat{v}_t}- \mathrm{sym}^{\dagger }_{\delta \hat{v}_t} { \hat{v}_t} + \delta \hat{ u}_t+(\tilde{\mathcal {D}}\tilde{u}_t)^T \star \delta \hat{u}_t+(\tilde{\mathcal {D}}\delta \tilde{u}_t )^T\star \hat{u}_t, \nonumber \\ \delta \dot{\hat{u}}_t&= -\mathrm{div}(\delta \hat{u}_t \otimes \tilde{v}_t+\hat{u}_t \otimes \delta \tilde{v}_t), \end{aligned}$$

subject to initial conditions \(\delta \hat{u}_1=-\frac{2}{\sigma ^2}\mathcal {F}[\nabla S(1)\cdot \nabla S(1) +(S(1)-T)\cdot \nabla ^2S(1)]\) and \(\delta \hat{v}_1=0\), with \(\mathrm{sym}^{\dagger }_{\tilde{v}_t} {\delta \hat{v}_t}=\mathrm{ad}_{\delta {\tilde{v}_t} }\hat{v}_t -\mathrm{ad}^\dagger _{\hat{v}_t }\delta {\tilde{v}_t } \).

5 Results

To evaluate our model, we first estimate the covariance matrix by using \(\alpha =3, c=6\) for the operator \(\tilde{L}\). We set each dimension of the initial velocity field \(\tilde{v}_0\) as 16, which is similar to the settings used in the pairwise diffeomorphic image registration [17]. The number of time steps for Euler integration in geodesic shooting is set to 10. We then compare our results with an uncertainty quantification method computed in a full dimensional image space [14] on the same dataset. For a fair comparison, we keep all the parameters including regularization and time steps for numerical integration fixed across both algorithms.

Data. We test the proposed approach both on 2D synthetic data and 3D brain MRI scans from the OASIS dataset [6]. We generate a collection of binary image with resolution \(100^2\). The MRIs are of dimension \(128^3\), \(1.25\,\text {mm}^3\) isotropic voxels, and underwent skull-stripping, downsampling, intensity normalization, bias field correction, and co-registration with affine transformations.

Experimental Results. Figure 1 visualizes the uncertainty information estimated from both our method and the baseline algorithm performed in high dimensional space on 2D data. We extract the local covariance matrix of each voxel and visualize it as an ellipse on the source image, with the color representing the matrix determinant. The smaller determinants are closer to the non-isotropic area (e.g., circle boundaries), which indicate more confident registration results.

Fig. 1.
figure 1

Left to right: source image, target image, covariance matrix determinant estimated by baseline algorithm and our method.

Figure 2 visualizes an example of 3D brain registration uncertainty. Note that due to the difficulty of computing a full covariance matrix by inverse Hessian In a high dimensional image space, we need to use an approximated low-rank Hessian with a number of dominant eigenmodes [14]. We choose the first 3000 eigenmodes with non-zero values to represent the most effective uncertainty results estimated from the baseline method (see the left panel of Fig. 3). Both methods show that the high uncertainty (with less confidence) appears in isotropic areas (e.g., inside the ventricle), while the low uncertainty (with high confidence) appears around non-isotropic areas (e.g. ventricle boundaries). The subtle differences hardly affect the uncertainty visualization between these two methods. Figure 3 reports the comparison of time and memory consumption. Our algorithm offers significant improvements in computational efficiency.

Fig. 2.
figure 2

Left to right: source image, target image, and uncertainty (visualized as the trace of covariacne) estimated by baseline algorithm and our method.

Fig. 3.
figure 3

Left: eigenvalues of the Hessian matrix estimated by baseline algorithm; Right: comparison of runtime and memory consumption.

6 Conclusion

We presented a low dimensional Bayesian model for registration uncertainty quantification in the space of diffeomorphic transformations. Our method dramatically reduces the computational cost of the registration posterior approximation effectively in a bandlimited velocity space. This work is the first step toward efficient probabilistic models of registration uncertainty quantification based on high dimensional diffeomorphisms. The next future work will be investigating sampling based methods to assess our developed model uncertainty. While in this paper we focus on the context of LDDMM, our method can be generalized to other transformation parameterizations such as stationary velocity fields.