Intrasubject multimodal groupwise registration with the conditional template entropy

HIGHLIGHTSA novel similarity metric for multimodal groupwise registration is proposed.The proposed method showed equivalent or improved registration accuracy.Pairwise mutual information is outperformed in terms of transitivity. ABSTRACT Image registration is an important task in medical image analysis. Whereas most methods are designed for the registration of two images (pairwise registration), there is an increasing interest in simultaneously aligning more than two images using groupwise registration. Multimodal registration in a groupwise setting remains difficult, due to the lack of generally applicable similarity metrics. In this work, a novel similarity metric for such groupwise registration problems is proposed. The metric calculates the sum of the conditional entropy between each image in the group and a representative template image constructed iteratively using principal component analysis. The proposed metric is validated in extensive experiments on synthetic and intrasubject clinical image data. These experiments showed equivalent or improved registration accuracy compared to other state‐of‐the‐art (dis)similarity metrics and improved transformation consistency compared to pairwise mutual information.


Introduction
Biomedical image registration is the process of spatially aligning medical images, allowing for an accurate and quantitative comparison. An increasing number of image analysis tasks calls for the alignment of multiple (more than two) images. Examples include the joint analysis of tissue properties using multi-parametric MRI Wells et al., 2015 ), spatio-temporal motion estimation from dynamic sequences ( Metz et al., 2011;Vandemeulebroucke et al., 2011 ), atlas construction ( Fletcher et al., 20 09;Joshi et al., 20 04;Wu et al., 2011 ) and population analyses ( Geng et al., 2009 ).
One approach to perform such a registration task would be to take one image in the group as a reference and register all other images to this reference in a pairwise manner. However, such an approach has two distinct shortcomings. First, the choice of the reference image inherently biases the resulting transformations and subsequent data analysis towards the chosen reference.
Secondly, only a fraction of the total information available within the group of images is used in each pairwise registration, possibly leading to sub-optimal results.
An alternative is to perform a groupwise registration in which all transformations are optimized simultaneously. Transformations are expressed with respect to a common reference space, thereby removing the need for choosing a particular reference image, and the bias associated with that choice. Additionally, a global cost function simultaneously takes into account all information in the group of images. In this work we will address such groupwise similarity metrics for multimodal registration problems.
Multimodal intensity-based pairwise registration is commonly solved using mutual information (MI) ( Collignon et al., 1995;Viola and Wells III, 1995;Wells et al., 1996 ), since it assumes a stochastic relationship between the two images to be registered. Extending MI to groupwise registration leads to a high-dimensional joint probability density function with an exponentially increasing number of histogram bins. Sparsity becomes a major concern as the number of images grows larger and limits the application to small groups of images ( Wachinger and Navab, 2013 ). A number of alternatives have been proposed to perform multimodal groupwise registration. Orchard and Mann (2010) proposed to use a Gaussian mixture model instead of histograms to approximate the joint probability density functions and Spiclin et al. (2012) approximated the joint probability density functions with a nonparametric approach based on a hierarchical intensity-space subdivision scheme. However, both approaches remain limited by the sparsity in the joint intensity space and perform poorly for large groups of images.
Alternatively, one could represent the intensities as a graph and relate the length of such a graph to the entropy of the images ( Hero et al., 2002 ). Such an approach requires a computationally expensive optimization for the construction of the graph and is not continuously differentiable, making gradient-based optimization difficult. Zöllei et al. (2005) proposed the use of a voxelwise stack entropy. Herein, the intensities of all separate images in the group at a given sampled coordinate are grouped into a one-dimensional probability density distribution. For each sampled coordinate, the entropy is calculated and summed. However, for a low number of images in the group, the probability density functions are sparse which limits its use to larger groups of images. Wachinger et al. (2007) proposed to accumulate all pairwise estimates of mutual information for all possible pairs of images in the group under consideration. Such an approach leads to a computation time which is proportional to the square of the number of images, making its application to larger groups of images increasingly difficult. Joshi et al. (2004) developed an interesting metric where the mean squared differences is used as a pairwise metric to compare every image in the group to the average image. Herein the average image is updated in each iteration. They applied the method to monomodal brain atlas construction and it has also been applied to thoracic 4D CT data ( Metz et al., 2011 ) and 4D ultrasound of the liver ( Vijayan et al., 2014 ). The approach carries a number of advantages, such as the linear scaling of the computational complexity with respect to the number of images in the group and the possibility to parallelize the algorithm, making it feasible for both small and large groups of images. Bhatia et al. (2007) proposed to use the normalized mutual information ( Studholme et al., 1999 ) as a pairwise similarity metric and the average image as a template image on monomodal intersubject data. The metric was termed the average normalized mutual information and has been used (together with the average mutual information) in subsequent literature as a metric for multimodal groupwise registrations ( Ceranka et al., 2017;Hallack et al., 2014;Huizinga et al., 2016;Polfliet et al., 2016;. However, the use of the average image as the template image might not be appropriate in multimodal data with intensities of varying scales, ranges and contrast. In this work a novel similarity metric, the conditional template entropy (CTE), is introduced for multimodal groupwise registration based on this principle of pairwise similarity with respect to a template image. Following the original formulation by Joshi et al. (2004) , we first design a suitable pairwise metric to be used in the comparison of the template image and every image in the group. Afterwards we investigate the use of a template image based on principal component analysis.
Given the linear scaling of the computational complexity, the metric can be applied to a wide range of intrasubject multimodal groupwise registration problems, for both small and large groups of images, and can be used as a general purpose metric. The proposed metric is validated in extensive experiments on synthetic and intrasubject clinical data, demonstrating equivalent or improved registration accuracy compared to other state-of-the-art methods and improved transformation consistency compared to pairwise MI.

Pairwise registration
In pairwise registration, a target (moving, floating) image I T is registered to a reference (fixed, source) image I R . The transformation T θ , parameterized by θ, needs to be determined that maps coordinates from the reference image domain to the target image domain ( Fig. 1 (a)). The registration can be defined as an optimization problem ˆ θ = arg min θ C ( I R , I T • T θ ) . (1) Here, C is the cost function or objective value of the registration problem, which is often represented as a weighted sum of a dissimilarity metric, D, and a regularization term, R , such that in which λ is the weight for the regularization.

Mutual information
In the pairwise approach, mutual information (MI) ( Collignon et al., 1995;Viola and Wells III, 1995;Wells et al., 1996 ) is defined as a similarity metric ( S = −D) Here, H ( · ) and H ( · , · ) refer to, respectively, the marginal and joint entropy of the marginal and joint intensity distributions, often calculated via normalized histograms. In Eq. (3) , the first term expresses the complexity of the reference image and the second term is the entropy of the target image mapped onto the reference, which favors transformations that map onto complex parts of the target image. The final term expresses the complexity of the shared or common relationship between the reference and target image. It is maximized when the (statistical or stochastic) relationship is stronger and thus less complex ( Wells et al., 1996 ).
Following Maes et al. (1997) , MI can be rewritten in terms of the conditional entropy (CE) The conditional entropy H ( A | B ) describes the amount of information that remains in a random variable A once the random variable B is known. With the entropy of the reference image being independent of the transformation parameters, maximization of the negated conditional entropy and maximization of the mutual information lead to equivalent solutions of the registration problem.

Groupwise registration
In groupwise registration we consider a group of n images for which the transformations to a common reference frame are unknown. We can consider the following optimization problem to determine these transformations: where T μ i is the transformation, parameterized by μ i , that maps the coordinates from the common reference domain to the domain of the i th image ( Fig. 1 (b)). μ is the vector formed by the concatenation of all separate transformation parameters μ i , and I i is the continuous intensity function of the i th image. Joshi et al. (2004) proposed the following formulation for monomodal groupwise registration, in which both the transformation parameters and a template image are optimized ˆ μ, ˆ

Template construction
with J the continuous intensity function of a template image, x the coordinate samples drawn from the image and S the set of these samples. The template image can be interpreted as being the image that is most similar to the other images in the group in terms of the mean squared differences. For a given value of the transform parameters, the optimization with respect to the template image J was solved analytically to be the average image As such, the registration problem in Joshi et al. (2004) is reduced to ˆ μ = arg min

The conditional template entropy
In this work, a novel similarity metric for multimodal groupwise registration is proposed, based on this paradigm in which similarity of the group of images is measured with respect to an iteratively updated template image. Considering the interpretation of the entropy terms given in Section 2.2 , we propose to measure similarity using the negated joint entropy of each image in the group with the template image, favoring transformations for which the template explains the group of images well; and the marginal entropies of each image in the group, encouraging transformations that map onto complex parts of the images in the group. Note that this is equivalent to a formulation based on the conditional entropy: Observing the resulting metric, one can notice the resemblance with a formulation based on mutual information. The difference lies in the absence of the marginal entropy of the template image, H ( J ). As we will demonstrate, this term counteracts the alignment of the group of images. A representative template image is likely to grow sharper when converging towards the optimal registration solution, leading to a reduced complexity of its intensity distribution and a decrease in the marginal entropy, which is opposite of the desired optimization behavior. The proposed method based on conditional entropy as shown in Eq. (9) eliminates this problem.
To find the appropriate template image, we revisit Eq. (6) where the template image could be obtained analytically as the average image. Unfortunately, Eq. (9) cannot be solved analytically with respect to the template image, J , for a given set of transformations if the trivial solution of a constant template image with a single intensity is excluded. Hypothetically, one could set up an optimization scheme where the template image is predefined by a functional relationship and weights corresponding to the images in the group. Herein, the optimization of the transformation parameters could be alternated with the optimization of the weights for the template image. Such nested optimization is error-prone and costly, and undesirable in this context.
Alternatively, instead of maximizing Eq. (9) , we propose a more pragmatic approach which maximizes the variance in the template image. By defining J as the linear combination of the images in the group, principal component analysis (PCA) can be used to find the weights associated to the images. This has previously been shown to reduce the noise due to motion in the template image ( Melbourne et al., 2007 ). Additionally, negatively correlated intensities can be accounted for to increase the contrast in the template image, instead of decreasing the contrast as might be the case for simple intensity averaging.
PCA defines a linear transformation from a given highdimensional space to a low-dimensional subspace whilst retaining as much variance as possible. In this work, PCA is performed with each sampled coordinate as a separate observation and the different images in the group corresponding to different features. The transformation to the 1-dimensional subspace along which the most variance is observed, is given by the eigenvector associated with the largest eigenvalue. As such, the elements of this eigenvector can serve as the weights for the construction of the template image.
Here, v μ is the eigenvector associated with the largest eigenvalue and the subscript μ is added to show its dependence on the transformation parameters. This template image, based on the principal component of the PCA, will hereafter be referred to as the principal component image.
Combining (9) and (10) leads to a novel similarity metric, the conditional template entropy (CTE), where similarity is expressed as the sum of the conditional entropy between every image in the group and the principal component image:

Optimization
The proposed metric was implemented as part of the software package elastix ( Klein et al., 2010 ) and is publicly available. An adaptive stochastic gradient descent was employed to minimize the cost function ( Klein et al., 2009 ). As such, the negated form of Eq. (11) is used, to allow a minimization to take place. The derivative of the proposed metric with respect to μ was determined following the approach of Thévenaz and Unser (20 0 0) in which Bsplines were used as a Parzen windowing function such that the joint probability density functions p i between the template image and the i th image in the group become Here, α is a normalization factor to obtain a density function, is related to the width of the histogram bin and β m is a B-spline function of the order of m. ι and κ are the discretized intensities corresponding to the template image and images in the group, respectively. With B-splines fulfilling the partition of unity constraint ( Thévenaz and Unser, 20 0 0 ), we have where L PCA and L i are the discrete sets of intensities associated with the principal component and the i th image. This leads to With p I i ( κ; μ i ) the probability density function of the i th image. In Appendix A the derivative of the principal component image with respect to the transformation parameters is given.

Transformation degeneracy
Given the degeneracy of estimating n transformations for n images with an arbitrary global transformation, we chose to constrain our transformation following Bhatia et al. (2004) i.e. the sum of all transformations is the identity, effectively registering the group of images to the mean space. With Rosen's Gradient Projection Method ( Luenberger, 1973 ) this is solved by setting and using this projected gradient in the stochastic gradient descent optimization.

Regularization
Following Geng et al. (2009) we used a groupwise regularization term, the groupwise bending energy (GBE) Herein, d is the spatial dimension of the images. Regularization was performed in all clinical experiments with a deformable transformation model.

Data and experiments
A total of six experiments were conducted with two on synthetic data and four on clinical intrasubject data. Herein, the proposed conditional template entropy ( S CT E ) was compared to the average mutual information ( S AMI ) Furthermore, two auxiliary similarity metrics were implemented to investigate complementary advantages of the proposed methodology, respectively the advantage of using the conditional entropy ( S CE ) and the advantage of using the principal component image ( S PC ).
For the clinical data, the four previously discussed groupwise similarity metrics were used in addition to the PCA2 metric proposed in Huizinga et al. (2016) and pairwise MI ( Eq. (3) ) as a baseline for comparison. PCA2 was proposed for the registration of images for which the intensity distribution could be represented into a low-dimensional subspace and is given as Herein, λ i refers to the i th eigenvalue of the correlation matrix of the images in the group. In Huizinga et al. (2016) it was subsequently validated on monomodal and quantitative MRI image data for which such a low-dimensional subspace exists. PCA2 can be thus considered as a specialist metric specifically designed to register such images. To demonstrate the more generic nature of the proposed methodology, CTE was compared to PCA2 for both quantitative MRI and multimodal image data. All registrations were performed in an intrasubject manner and the images were normalized by z-scoring to allow for a fair comparison to the similarity metrics employing the average image. In the pairwise registration of a group of images, one image (the first in the sequence) was chosen as a reference to which all others were mapped. Note that other strategies for choosing the reference image in pairwise registrations for a group exist, such as the pre-contrast image in dynamic contrast enhanced sequences ( Kim et al., 2011 ), the end-expiration in 4D CT ( Saito et al., 2009 ) or the mid-way image in computational anatomy ( Reuter et al., 2010 ).
As the optimization strategy, interpolation algorithm, random sampler and transformation model is equivalent for all (dis)similarity metrics, any difference in results can be solely attributed to the use of a different dissimilarity metric.
The proposed methods were validated with two validation criteria. First, the groupwise target registration error (gTRE) was used as a measure for the accuracy of the registration with ground truth annotations of certain anatomical landmarks in the   (22) r is the index of the reference image, P i the collection of landmarks in the i th image, T i,r the transformation that maps the coordinates from the i th image to the reference image and p i, j the j th landmark from the i th image. In a groupwise setting T i,r was determined through the composition of the forward transformation, that maps the coordinates from the common reference space to the reference image, with the inverse transformation, that maps the coordinates from the i th image to the common reference space: T i,r = T μr • T −1 μ i ( Fig. 2 ) ( Metz et al., 2011 ). To allow for a fair comparison between pairwise and groupwise registrations, all validation measurements were performed in the same reference space, i.e. the same image which was chosen as a reference in the pairwise registrations.
Secondly, we computed the transitivity error ( Christensen et al., 2006;Metz et al., 2011 ) to assess the quality of the transformation The transitivity error measures the transitive property of the transformations in a group of images and can be interpreted as a measure for the consistency of the transformations in a groupwise setting. For pairwise registration the use of different reference images is required to measure the transitivity and the bias associated with the choice will influence the results, whereas in groupwise registration, all transformations are estimated simultaneously and are inherently transitive (when the inverse transformation is available). As the inverse is approximated iteratively and the source for the transitivity error in the groupwise methods, no comparisons are made among the groupwise metrics based on the transitivity error. The maximum transitivity error of the groupwise methods is reported and compared to the transitivity error of the pairwise method.
The cost function hyperparameters (the number of histogram bins and regularization weight) were chosen such that they optimized the mean gTRE per dataset. The different regularization weights are reported in Table 1 . Due to the arbitrary sign of the projection vector for the principal component image, the number of histogram bins (used to calculate the entropy) are at least doubled compared to the number of histogram bins in registrations using the average image. Other optimization hyperparameters such as the spatial samples in the stochastic optimizer and the number of iterations were set to their default value. All registration hyperparameters in pairwise registrations were kept equal to those in the groupwise approach.
Results for the gTRE were compared in a pairwise manner among all similarity metrics (totaling 64 comparisons). The Wilcoxon signed-rank test was used for significance testing at a significance level of 0.05 adjusted by the Bonferroni correction for multiple comparisons.

Black&White
To illustrate the effect the entropy term of the template image has on the optimization, an experiment was performed on synthetic data. Eleven identical black-and-white images were progressively and simultaneously translated along the horizontal axis and the similarity metric values were computed. A mask was used to keep the sampling domain constant. Fig. 3 shows a single blackand-white image and the average image of the group of images when they are at maximal displacement (15 mm).

Multimodal Cubes
To further investigate registration accuracy, 100 registrations were performed on a group of six images (256 × 256 × 256 voxels) each containing two cubes, one surrounding the other. The intensities of the cubes and the backgrounds were set at random intensities to simulate a multimodal setting ( Fig. 4 ). For each group of images a random set of deformable transformations was generated with a grid spacing of 8 × 8 × 8 voxels. The gTRE of the corners of the cubes was used to quantify the registration accuracy.

Thoracic 4D CT
Thoracic 4D CT data ( Fig. 5 ) was taken from the publicly available POPI and DIR-LAB datasets which include, respectively, 6 and 10 sequences of 10 respiratory phases each ( Castillo et al., 2009;Vandemeulebroucke et al., 2011 ). Thoracic 4D CT data is often considered as monomodal data. However, minor intensity changes can occur due to changes in the voxel density in the lungs associated with the inhalation and exhalation of air ( Sarrut et al., 2006 ) leading several authors to employ adapted or multimodal metrics for lung registration ( Murphy et al., 2011 ).  The POPI dataset contains three patients with 100 manually identified landmarks in the lungs for every breathing phase and three patients with 100 landmarks in end-inspiration and endexpiration phases with an inter-rater error of 0.5 ± 0.9 mm. In the DIR-LAB dataset, all patients have 300 landmarks in the lungs for the inspiration and expiration phases and 75 in the four phases in between and an intra-rater error between 0.70 and 1.13 mm. Accuracy of the registration was determined using the gTRE with respect to the inspiration phase, the first image in the dynamic series.
A deformable registration was performed using cubic B-splines with a final grid spacing of 12.0 mm. Lung masks were used and obtained following Vandemeulebroucke et al. (2012) . For each resolution level 20 0 0 iterations were performed, except for the last resolution where 40 0 0 iterations were allowed.

Carotid MR
MR image sequences were acquired of the carotid artery by Coolen et al. (2015) . The acquisitions were performed with a gradient echo MRI sequence for different flip angles and TE preparation times ( Fig. 6 ). Each sequence consisted of five images and was performed for eight patients. The bifurcation of both carotid arteries was identified for each patient and consequently used as a landmark in the validation of the registration.
For this data we performed a deformable registration with cubic B-splines and a final grid spacing of 8.0 mm. van 't Klooster et al. (2013) has shown that a deformable registration is needed in such acquisitions of the carotid arteries. Masks around the carotid arteries were used as region of interest for registration.

Head&Neck
As part of radiotherapy planning, 22 patients underwent a CT, MR-T1 and MR-T2 imaging protocol of the head and neck region Verhaart et al., 2014 ) ( Fig. 7 ). In each acquisition between 15 to 21 landmarks were used to quantify the registration accuracy in terms of gTRE. The intra-rater variability of the landmarks was approximately 1 mm.
Prior to registration, all images were resampled to the smallest voxel spacing present in the group of images. A deformable transformation was used in two resolution levels using cubic B-splines with a final grid spacing of 64.0 mm, as suggested by Fortunati et al. (2014) .

RIRE
The RIRE database ( West et al., 1997 ) includes 18 patients with up to five different imaging modalities of the brain ( Fig. 8 ). All 18 patients had at least three of the following modalities available: CT, PET, MR-T1, MR-T2, MR-PD. Fiducial markers and a stereotactic frame were used to determine the ground truth transformations for CT to MR and PET to MR. Four to ten landmarks were available for each patient as a ground truth for the registrations and their target registration error was computed through the webform of the RIRE project, where rigid displacements between acquisitions were assumed.
To increase the robustness of the optimization, a two-step approach is used. First, a translation is optimized and used as an initialization for a second full rigid transformation with three translational and three rotational degrees of freedom. The registration was performed with five and two resolution levels, respectively. Similar to the Head&Neck dataset, preprocessing was performed by resampling the images in the group to the smallest voxel spacing.   The registration hyperparameters for the different experiments are summarized in Table 2 .

Synthetic data
The behavior of the metric value and its separate components in the Black&White experiment are shown in Fig. 9 as a function of the translation. The Black&White experiment shows that the metric behavior of S AMI and S PC is equal to the behavior of the entropy of the images in the group. The contribution of the entropy of the template image completely cancels out the contribution of the joint entropy in S AMI and S PC as can be seen in Fig. 9 (c) and (d). The resulting optimization is only driven by the complexity of the images in the group and not by their shared relationship.
The results for the Multimodal Cubes experiment are shown in Fig. 10 . When comparing the similarity metrics, S CT E (1.71 ± 0.11 mm) significantly outperformed all other entropy-based groupwise metrics (2.80 ± 0.32 mm, 2.73 ± 0.34 mm and 1.74 ± 0.11 for S AMI , S PC and S CE respectively).

Clinical data
Results for the gTRE in experiments on clinical data are visualized with boxplots in Figs. 11 and 12 . For the experiments on the Thoracic 4D CT and Carotid MR datasets ( Fig. 11 ), no statistically significant differences were observed in terms of gTRE for the investigated information-based metrics.
In the Head&Neck experiment ( Fig. 12 ) the best results are achieved by S CT E with a gTRE of 2.74 ± 1.17 mm performing significantly better compared to S AMI , S PC and D PCA 2 .
Pairwise S MI performed best in the RIRE experiment ( Fig. 12 ) with a gTRE of 2.29 ± 0.72 mm ( S CT E , 2.33 ± 0.57 mm), but no significant differences were found compared to the other entropybased metrics. D PCA 2 performs worst, with the differences being statistically significant. A group of images was found to be misregistered following Tomaževi č et al. (2012) when the gTRE is larger than the largest voxel spacing in the images. No misregistrations were obtained for S CT E , S CE and S MI whereas S AMI and S PC misregistered two patients and D PCA 2 misregistered 14 patients.
In all four experiments on clinical data, pairwise MI performed worst in terms of transitivity, whereas the transitivity error for groupwise metrics reduced to (close to) zero ( Table 3 ).
In Table 4 , the values are given for the average runtime of the experiments performed in this work. The use of the conditional entropy does not induce an extra computational burden, whereas the use of the principal component images does. This discrepancy originates from an additional loop over the sampled coordinates, needed to perform the PCA and determine the weights of the eigenvector. Note that for more complex registrations with a regularizer, the additional computation time is relatively small compared to the total cost.

Discussion
Results on the Thoracic 4D-CT and Carotid MR dataset showed equivalent performance of the proposed methodology compared to other state-of-the-art methods in terms of registration accuracy.
The results for the Multimodal Cubes, Head&Neck and RIRE results were consistent. In all three datasets the accuracy improved for the proposed formulation compared to S AMI , and the improvement was found to be statistically significant in the former two experiments. Throughout these experiments the behavior of the auxiliary metrics S CE and S PC was also consistent. Using the conditional entropy instead of mutual information led to a large improvement, while using the principal component image improved the accuracy modestly. The combination of both contributions led to the best results in all three experiments compared to other groupwise metrics. As expected, the PCA2 metric performed poorly in multimodal registrations where a quantitative model or lowdimensional subspace is not available.
In all experiments based on clinical data, the transitivity of the resulting transformations was compared to S MI for groupwise approaches. These results emphasize the added value of the implicit reference space in multimodal groupwise registration. Whereas a pairwise approach has to perform two separate registrations with different reference images to obtain a concatenated transformation, in a groupwise approach all transformations are evaluated simultaneously and with a substantially lower transitivity error. These results are consistent with previous findings in monomodal data ( Geng et al., 2009;Metz et al., 2011 ).
In summary, for experiments based on images where no or modest changes in intensity distributions are present ('Thoracic 4D-CT' and 'Carotid MR'), CTE showed comparable performance to previously proposed groupwise methods and pairwise MI. In experiments with strongly varying intensity distributions ('Multimodal Cubes', 'Head&Neck' and 'RIRE'), CTE showed superior performance to previously proposed groupwise methods and performed on par to pairwise MI, with little to no transitivity error.
Figs. 5 (f) and 6 (f) highlight the differences in the average and principal component images. Herein, the absolute difference image between the average and principal component image is given in the 'Thoracic 4D CT' and 'Carotid MR' dataset, respectively, for a single patient. Herein, the largest differences occur in regions where the motion is greatest near moving structures or edges. This is consistent with previous work, where the principal component image was used to separate motion present in the images ( Feng et al., 2016;Hamy et al., 2014;Melbourne et al., 2007 ). For multimodal registrations, the benefit of PCA over averaging can be seen by considering cases in which images with an inverted intensity profile are merged into the template image, as shown in Fig. 4 (g) and (h) and Fig. 13 . For the 'Multimodal Cubes' experiment, PCA lead to an increase of the contrast-to-noise ratio from 7.4 to 32.5 compared to simple averaging. Fig. 13 shows the average and principal component image when applied to the ventricles for an arbi-   trary patient in the RIRE dataset. With the T2 modality having an inverted intensity profile, the principal component image is able to retain the contrast in the template image. In the average image the intensities cancel out and the ventricles are poorly visible. Two limitations should be stated with respect to current work. Firstly, only intrasubject data has been employed. Intersubject data is characterized by greater variability of intensity profiles and morphology, and has been reported to considerably increase the complexity of groupwise registration ( Hamm et al., 2009;Tang et al., 2009 ). It remains to be verified how CTE would perform when confronted with such data.
Secondly, in this work a methodology was used where the images are deformed and compared to the template image in the implicit reference system. However, previous work has shown that deforming the template image to the images in the group suits a generative model better ( Allassonnière et al., 2007;Ma et al., 2008 ). In methodologies where the template is deformed to the images in the group, no need exists to constrain the transformations to the average deformation space ( Eq. (16) ). This was shown to be advantageous, as such constraints could exclude some legitimate results ( Aganj et al., 2017 ). We expect the proposed metric to perform equally well in such frameworks as it is independent of the transformations that were used.

Conclusion
In this work we proposed a novel similarity metric for intrasubject multimodal groupwise registration, the conditional template entropy. The proposed metric was evaluated in experiments based on synthetic and clinical intrasubject data and showed equivalent or improved registration accuracy compared to other state-ofthe-art (dis)similarity metrics and improved transformation consistency compared to pairwise mutual information. These improvements were achieved mainly by the use of the conditional entropy, whereas the use of the principal component image contributed modestly in our experiments.