Distance Oriented Particle Swarm Optimizer for Brain Image Registration

In this paper, we describe improvements to the particle swarm optimizer (PSO) made by the inclusion of an unscented Kalman filter to guide particle motion. We show how this method increases the speed of convergence and reduces the likelihood of premature convergence, increasing the overall accuracy of optimization. We demonstrate the effectiveness of the unscented Kalman filter PSO by comparing it with the original PSO algorithm and variants designed to improve its performance. The PSOs were tested firstly on a number of common synthetic benchmark functions, and secondly applied to a practical three-dimensional image registration problem. The proposed methods displayed better performance for 4 out of 8 benchmark functions, and reduced the target registration errors by at least 2 mm when registering down-sampled benchmark brain images. They also demonstrated an ability to align images featuring motion-related artifacts which all the other methods failed to register. These new PSO methods provide a novel, efficient mechanism to integrate prior knowledge into each iteration of the optimization process, which can enhance the accuracy and speed of convergence in the application of medical image registration.


I. INTRODUCTION
Optimization is a key component in many practical scientific computing problems. It is used to search for the optimum value of a pre-defined fitness function of a measure within a problem space [1]. As a typical global optimization method, particle swarm optimization (PSO) has received significant attention during the last few decades, as it is less prone to becoming trapped in local optima. Various improvements to the original PSO algorithm have been suggested to improve its convergence and computation speed.
However, neither the original PSO method nor its general-purpose modifications derive any advantage from available prior knowledge about the problem space, which may play a critical role in specific applications. The goal of many optimization problems is not just to search for an optimal value of the fitness function. One typical example of this issue is presented by image registration, for which the distance to the true global optimum, rather than the value of the measurement function, is more important. This is because small differences in the fitness function values can actually represent large differences between image transformation parameters, which may in turn falsely indicate alignment between images. If prior knowledge about the content of the image is ignored in favor of the result of the value-oriented PSO, the optimization process may tend to converge to local optima that exhibit ''better'' measurement values. These local optima may be at a significant distance from the global optimum, thereby causing the image registration to ''fail''. To deal with this specific application, in this paper we introduce a novel distance-oriented PSO, guided by an unscented Kalman filter (UKF) [1]. This method can encode prior knowledge about the distribution of a fitness function within the problem space and steer the optimizer to converge at a point near the true global optimum.
Image registration algorithms are often based on the premise that the magnitude of the chosen similarity metric is related to the magnitude of the error between the current and the optimal spatial transform between the images [1], [2]. Assuming that the distribution of the similarity metric function is approximately unimodal, we propose a customized UKF-PSO framework derived from the Bayesian perspective of the PSO [3]. The UKF-PSO algorithm iteratively estimates the global optimum using accumulated information about the probability distributions of the similarity measurements. This leads to faster convergence, with improved robustness to local optima over a large search space. Another advantage of this approach is the ease with which multiple similarity metrics can be combined, by extension to a nested UKF-PSO (N-UKF-PSO) that removes the need to apply fixed weights to the different similarity metrics by adaptively adjusting the weighting during the convergence process of the Kalman filter. The proposed methods are compared to several popular PSO methods using common benchmark functions, as well as a publicly available medical image registration dataset. Both the UKF-PSO and the N-UKF-PSO display better robustness to local optima and better accuracy in the image registration experiments.
The remainder of this paper is organized as follows. Important previous work that attempts to solve similar image registration problems using the original or modified versions of the PSO is briefly reviewed in section II. The theory underlying our methods is introduced in section III. Sections IV to VI describe the details of the LDS-KFPSO, SPO-UKFPSO and nested UKF-PSO methods, and section VII describes the particle state evolution. Experiments performed on both benchmark functions and publicly available image registration datasets are described in section VIII, and their results are presented and discussed in the final sections.

II. RELATED WORKS
Both local and global optimization methods have been applied to solve image registration problems. Local optimization suffers from becoming trapped in local optima. The use of multi-resolution image pyramids can partially mitigate this; however, the global optimum may not be represented in the down-sampled problem spaces, in which case the optimizer will still converge to a local optimum [4]. Among the global optimization methods, evolutionary computation plays an important role. For example, inspired by social and cooperative behavior, Kennedy and Eberhart [5] proposed the first PSO algorithm in the mid-1990s [6]. Since then, a number of modified versions of the PSO have been developed and applied to different image registration applications [4], [7]. Research efforts have concentrated on improving the convergence speed and robustness of the PSO when the problem spaces are very large and exhibit multiple local optima. These extensions of the PSO use either alternative neighborhood structures [8] or novel particle evolution strategies [6], [7], [9]. A widely used PSO with alternative particle evolution formulae is the quantum-behaved PSO (QPSO) [9]. The formulae were further redesigned in the revised QPSO (RQPSO), the diversity-controlled RQPSO (DRQPSO) [10] and the chaotic search QPSO [11]. Another popular approach is to hybridize the PSO with other optimization methods, for example genetic algorithms [12] or the simplex method [13]. Comparisons and reviews of the major PSO variants can be found in [3].
Wachowiak's method provides a registration-specific prior-knowledge approach [4], but requires precise initialization. Other methods that exploit prior knowledge include the Bare Bones PSO [14], the Kalman filter PSO [15], and Andras' Gaussian PSO, based on a Bayesian interpretation [3]. These methods provide either a probabilistic perspective of the particle status, or an adaptive mechanism to integrate prior knowledge.

III. THEORY DERIVATION
For a fitness function f(x), an optimization process searches a problem space for the value of x that optimizes f(x). For the problem targeted by this paper, image registration, f(x) is a predefined similarity measure between images, and x ranges over all the possible image transformations permitted by the degrees of freedom. The purpose of optimization is then formulated by

x_o = \arg \operatorname{opt}_x f(x), \quad (1)

where x_o is the optimal solution of x; the purpose of registration is to find the x_o that gives the optimal image transformation parameters, or leads to the highest similarity of the images. However, due to the presence of local optima, x_o is often difficult to find. In this case, the returned x should be as close as possible to x_o.

The PSO simulates the social and cooperative behavior of a ''swarm'' of potential solutions, called particles [6]. Each potential solution corresponds to one position in the problem space. Each particle explores the problem space at an individual random speed that is partially affected by combined knowledge about the up-to-date global and local optima. When searching for the global optimum in a D-dimensional problem space with K particles, at the t-th iteration of the PSO a solution represented by the position of the i-th particle is a D-element vector x_i(t) = (x_{i1}(t), \dots, x_{iD}(t)). In the original PSO method, a widely used formula for updating the speeds of the particles, v_i(t+1), is given by [3], [5], [6]:

v_i(t+1) = \omega v_i(t) + c_p r_p \left(x_i^p - x_i(t)\right) + c_g r_g \left(x_g - x_i(t)\right), \quad (2)

where \omega is the inertia weight, x_i^p is the local best solution found by the i-th particle, and x_g is the best up-to-date global optimum. c_p and c_g are acceleration constants that weight the attraction of local and global optima to each particle, and r_p and r_g are randomly generated numbers drawn from the uniform distribution over the range (0, 1) [6]. The updated particle positions are then given by [3], [8], [9]:

x_i(t+1) = x_i(t) + v_i(t+1). \quad (3)

Equation (2) consists of three components: the previous velocity v_i(t), the cognition component c_p r_p (x_i^p - x_i(t)), and the social component c_g r_g (x_g - x_i(t)). The combination of these components is a compound velocity that moves the particles towards the local and global optima, while preventing any significant deviations from the particles' previous directions. This mechanism makes a stepwise improvement in the algorithm's convergence until all of the particles have moved into a small constrained area, or the global best position has remained unchanged for a certain number of iterations.
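As a concrete illustration of the velocity and position updates in equations (2) and (3), the step can be sketched in a few lines of Python. This is a minimal, dimension-agnostic sketch; the function name and default coefficient values are our own illustrative choices, not taken from the paper.

```python
import random

def pso_step(x, v, x_p, x_g, w=0.7, c_p=2.0, c_g=2.0):
    """One velocity/position update for a single particle (eqs. (2)-(3)).

    x, v, x_p are lists: current position, velocity and personal best;
    x_g is the swarm's global best. w, c_p, c_g default to common
    textbook values.
    """
    r_p, r_g = random.random(), random.random()  # r_p, r_g ~ U(0, 1)
    new_v = [w * v[d]
             + c_p * r_p * (x_p[d] - x[d])   # cognition component
             + c_g * r_g * (x_g[d] - x[d])   # social component
             for d in range(len(x))]
    new_x = [x[d] + new_v[d] for d in range(len(x))]
    return new_x, new_v
```

With zero initial velocity and both attractors on the same side, the particle moves towards them, as expected from the three-component decomposition above.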
Other than the coefficients which appear in the PSO formula, the most common modifiable parameters are the swarm size (i.e. the number of particles), the searching range, and the maximum number of iterations.
If f(x) is complicated and presents multiple local optima, which is common for image registration applications, the PSO still suffers from premature convergence. Integration of prior knowledge of the problem space into the particle evolution formulae can improve the robustness of the PSO. To encode prior knowledge into the particle evolution process, Andras [3] proposed a Gaussian PSO model based on a Bayesian interpretation. In this model, the evaluated fitness value f(x) is the noise-free fitness value plus a noise term, typically zero-mean Gaussian [3]. Following Bayesian theory, the likelihood is given in the form of a probability density function (PDF), P(x), defined over the search range. Given all f(x_i(0)), the PDF may be calculated via Bayes' rule, where the posterior P(x | f(x_i(0))) calculated in one iteration is used as the new prior P(x) in the following iteration. The evolution of the particles can then be formulated in terms of P_t(x), the P(x) calculated in the t-th iteration [3]. The calculation of P_t(x) can be performed under the assumption that the evaluated fitness values of the particles are either co-dependent or independent, leading to two implementations of this Bayesian Gaussian PSO. The fitness function is assumed to be proportional to the probability of a point in the search range being the optimal solution. Thus, in a registration problem, the similarity measure can be considered as a non-normalized probability, or factor. For the Bayesian Gaussian PSO, the probability distribution over the whole search range is interpolated using multiple Gaussian bases. This work provides a framework to integrate prior knowledge into image registration in the form of P(x) [3]. In this paper, we use a simplified definition of P(x), based on prior knowledge specific to image registration. As a result, there is no need to calculate the probability distribution under different assumptions of dependence between particles, as P(x) can be directly fitted using the evaluation values of all particles.
Target registration error (TRE) is often the ground-truth metric of image registration problems. The TRE is zero for two perfectly aligned images. We generalize this, such that x_o is the optimal transformation, represented as a point in the problem search space, that results in a TRE closest to zero. Over the whole search range, the most direct similarity measure f(x_i) of a transformation represented by any particle is the distance ‖x_i − x_o‖. Any other similarity measure can be considered as a monotonic mapping of this distance, K(‖x_i − x_o‖). We simply assume a Gaussian form for K,

K(\|x_i - x_o\|) = \exp\left(-\|x_i - x_o\|^2 / (2\delta^2)\right).

This assumption of prior knowledge indicates that P(x) follows a Gaussian-like distribution with unknown expectation x_o. The advantage of using this Gaussian form is that x_o is the expectation ∫ x P(x) dx over the whole problem space. In each iteration of the PSO, x_o is estimated by the optimum value within the area searched by the particles, x_g. Here, rather than directly selecting the optimum value from among all particles, the estimated global optimum x̂_g(t) is calculated as the average of all x_i(t), weighted by the normalized fitness values f(x_i(t)).
According to the assumptions above and the theory of the Bayesian interpretation of the PSO [3], P(x) is thus modeled as

P(x) = f(x)/\sigma + \epsilon,

where \sigma is a normalization constant and \epsilon is a zero-mean Gaussian noise with unknown standard deviation. Ignoring the noise \epsilon, a reasonable estimation of P(x) is

\hat{P}_t(x_i) = f(x_i(t))/\sigma,

where \hat{P}_t(x_i) is the estimation of P(x) at x_i in the t-th iteration. \hat{P}_t(x) can be obtained by fitting a Gaussian function using all f(x_i(t)); \delta^2 is the variance of this fitted Gaussian. The global optimum can then be estimated by solving

\hat{x}_g(t) = \arg\max_x \hat{P}_t(x). \quad (10)

Although the assumed Gaussian forms of f(x) and \hat{P}_t(x) cannot accurately capture the shape of the similarity measure over a large search range, they give a reasonable estimation of the global optimum, and this estimation improves as the search range contracts. Equation (10) can be solved by fitting the shape of ln \hat{P}_t(x) using a quadratic least-squares method, though this introduces much greater computational complexity [31]. The purpose of fitting the Gaussian function is only to obtain an estimated global optimum x̂_g(t); \delta^2 is not used in further optimization processes. We therefore instead use the weighted mean of all particles obtained in each iteration to estimate the global optimum, i.e.

\hat{x}_g(t) = \frac{\sum_{i=1}^{K} f(x_i(t))\, x_i(t)}{\sum_{i=1}^{K} f(x_i(t))}, \qquad t = 1, \dots, N_{max}, \quad (11)
where N_max is the maximum number of iterations. The estimation of the global optimum should move towards the true global optimum of the similarity measure as the search range contracts during the optimization process. One important assumption of (10) is that f(x) ≥ 0, which is easy to satisfy by normalization. Specific to image registration problems, if the images are aligned by minimizing a difference measure, denoted f_d(x), we can convert it to a similarity measure by f(x) = ε(f_d(x)), where ε(·) is a suitable function of f_d(x) over the search range. In summary, during each iteration of the PSO, a noisy estimation of the global optimum x̂_g(t) can be obtained using (11). x̂_g(t) can then be improved during the evolutionary process of the PSO by combining information from all the particles and the previous iterations.
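The weighted-mean estimate of equation (11) amounts to averaging the particle positions with their non-negative fitness values as weights. A minimal Python sketch, with hypothetical names of our own choosing:

```python
def estimate_global_optimum(positions, fitness):
    """Fitness-weighted mean of particle positions (cf. eq. (11)).

    positions: list of D-element lists (one per particle);
    fitness: matching list of non-negative fitness values
    (normalize a difference measure into a similarity first).
    """
    total = sum(fitness)
    d = len(positions[0])
    return [sum(f * p[k] for f, p in zip(fitness, positions)) / total
            for k in range(d)]
```

Particles with higher fitness pull the estimate towards themselves, so the result lies inside the convex hull of the swarm rather than at any single particle.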

IV. THE LDS-KFPSO METHOD
The x̂_g calculated using (11) can replace x_g in the PSO formulae, as it moves closer to the optimum of f(x). However, with integrated prior knowledge, the estimation of the PDF of x_o in the search range can be improved by accumulating the information obtained in previous iterations. This can be achieved through the dynamic Bayesian network (DBN) presented in Monson and Seppi's Kalman filter PSO [15], which is used to characterize the time-sensitive relationship between observable and hidden states. For image registration problems using swarm optimization, the global and local optima obtained in each iteration can be encoded as the observed state ξ. Based on the theory in [30], the hidden state θ represents the ideal location and speed of a particle that leads to a better fitness, x_g*. With the prior knowledge discussed above, we can define x_g* as the average of x_i weighted by the noise-free fitness function f(x_i), or more directly define it as x_g* = x_o. An estimation θ̂ of the hidden state is given for each iteration.

FIGURE 2. Information available at the t-th iteration of the PSO: the hidden state θ represents an optimal estimation x_g* of the true global optimum; the observed state ξ is defined as the average position of all particles, x̂_g, weighted by the measured fitness function f of each particle. An estimation of the hidden state is produced by fitting f to a Gaussian function in each iteration of the optimization process. For the t-th iteration, the estimate of x_g*(t) can be obtained by combining the estimate from iteration t−1 and x̂_g(t). When solving the optimization problem using a linear Kalman filter, the former is treated as the output of the time-update stage, θ⁻, and the latter refines it in the measurement-update stage, giving θ̂.
However, because the prior knowledge of registration problems is integrated and x̂_g is calculated using equation (11), a much simpler DBN can be adopted here, using the raw information demonstrated in Fig. 2. After t − 1 iterations, the hidden state is the ideal position x_g*(t) that is closer to, or equals, x_o. The observation ξ can be directly defined as x̂_g(t). Each iteration has a current estimation of the hidden state x_g*(t) based on this observation. To obtain this estimation, the relationship between θ and ξ is depicted as an instance of the hidden Markov model (HMM), as shown in Fig. 3 [15]. The hidden state θ evolves over time, based on a state transition model F, and influences the observable state through a known observation model H. The transition model F reflects how an estimated global optimum moves closer to locations of better fitness, and the observation model can then be described as a model of the influence of x_g*(t) upon x̂_g(t). When defining x_g* as the average of x_i weighted by f(x_i), as shown in Fig. 4, F can be specified such that the evolution of x_g* depends on the movements of every particle. This assumes either a highly non-linear state transition process, or we may use x_o as the hidden state, which assumes an identity state transition. In both cases, the observation model is an identity mapping.
This influence of x_g* on x̂_g is inherently noisy, and the noise is used as a subjective uncertainty model of the accuracy of an observation [15]. Based on the prior knowledge being integrated, the current state is modeled by a Gaussian distribution with mean x_g and a variance that models how strongly x_g is likely to reflect x_g*. The goal of the registration process is then to reduce the uncertainty of this likelihood over x_g to its lowest level, and thus give the most accurate prediction. Since this prediction is produced by combining the information from all particles and all previous iterations, it is applicable to different PSO methods with different velocity- and position-updating mechanisms.
For the HMM described above, the Kalman filter [16] and its extensions [15] can be regarded as solutions. When F and H are linear, and the HMM is therefore known as a linear dynamic system (LDS), the Kalman filter provides an efficient way to recursively estimate the state of this process while minimizing the mean square error [18]. The Kalman filter models the HMM as a predictor-corrector cycle, where both the state transition and the observation are noisy processes with additive Gaussian noise. In our registration problem, assuming an LDS, a prediction of x_g*(t) is given in the prediction or time-update stage by

\theta^-(t) = F \hat{\theta}(t-1), \quad (13)
\Sigma^-(t) = F \Sigma(t-1) F^T + \Sigma_\theta, \quad (14)

where F is the matrix representation of the state transition function, \theta^-(t) and \Sigma^-(t) are the mean and variance of the predicted x_g*(t) respectively, and \Sigma_\theta is the covariance of the state-transition noise. Assuming θ(t) = x_g*(t) = x_o, F is an identity matrix. Then, in the correction or measurement-update stage, the estimation of the state is refined using the observation:

K(t) = \Sigma^-(t) H^T \left(H \Sigma^-(t) H^T + \Sigma_\xi\right)^{-1}, \quad (15)
\hat{\theta}(t) = \theta^-(t) + K(t)\left(\xi(t) - H \theta^-(t)\right), \quad (16)
\Sigma(t) = \left(I - K(t) H\right) \Sigma^-(t), \quad (17)

where K(t) is the Kalman gain in the t-th iteration, used to balance the influence of prediction and observation, H is the observation matrix, which is the identity, \Sigma_\xi is the covariance of the observation noise, and \hat{\theta}(t) and \Sigma(t) are the mean and variance of the estimation respectively. The estimate of the global optimum is then drawn from the Gaussian distribution N(\hat{\theta}(t), \Sigma(t)) [18]. This PSO model guided by a Kalman filter (KF) under the LDS assumption is named LDS-KFPSO.
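With F and H both the identity, one predict-correct cycle of equations (13)-(17) reduces, per dimension, to a few scalar operations. A minimal Python sketch (function and argument names are illustrative, not from the paper):

```python
def kf_update(theta, P, obs, Q, R):
    """One scalar predict-correct cycle with identity F and H
    (cf. eqs. (13)-(17), per dimension).

    theta, P: previous state mean and variance;
    obs: new observation (the weighted-mean estimate x_g);
    Q, R: state-transition and observation noise variances.
    """
    # time update (prediction): identity transition, eqs. (13)-(14)
    theta_pred = theta
    P_pred = P + Q
    # measurement update (correction), eqs. (15)-(17)
    K = P_pred / (P_pred + R)                 # Kalman gain
    theta_new = theta_pred + K * (obs - theta_pred)
    P_new = (1.0 - K) * P_pred
    return theta_new, P_new
```

Note how equal prior and observation variances split the difference between prediction and observation, while the posterior variance shrinks, reflecting the accumulated information.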

V. THE SPO-UKFPSO METHOD
When using the non-linear state transition model shown in Fig. 4, the HMM is not an LDS. In this case, the non-linear extensions of the Kalman filter should be applied to deal with the non-linear state transition process x_g*(t) = F(x_g*(t − 1)). The extended Kalman filter (EKF) is the standard method for dealing with non-linear processes. However, it requires the calculation of a Jacobian matrix for F(x) [2], which is difficult for this complicated state transition function. Hence we propose the novel use of an unscented Kalman filter (UKF) [2]. Rather than linearizing the transition function as the EKF does, the UKF approximates a Gaussian probability distribution using standard vector and matrix operations based on a set of weighted sigma points. For the t-th iteration in a D-dimensional problem space, the sample mean and covariance of the set of sigma points are θ̂(t − 1) and Σ(t − 1) [19]. Specifically, the sigma points and their associated weights are selected by

\chi_0 = \hat{\theta}(t-1), \quad W_0 = \kappa / (D + \kappa),
\chi_j = \hat{\theta}(t-1) + \left(\sqrt{(D+\kappa)\,\Sigma(t-1)}\right)_j, \quad W_j = 1 / (2(D+\kappa)), \quad j = 1, \dots, D,
\chi_{j+D} = \hat{\theta}(t-1) - \left(\sqrt{(D+\kappa)\,\Sigma(t-1)}\right)_j, \quad W_{j+D} = 1 / (2(D+\kappa)), \quad j = 1, \dots, D,

where W_j is the weight associated with the j-th sigma point and (·)_j denotes the j-th column of the matrix square root. Details of how to select the weighting parameter κ can be found in [2] and [19]. In this work, we follow Uhlmann's [19] recommendation that κ + D = 3. In the prediction stage, each sigma point is instantiated through the state transition function [19],

\chi_j^-(t) = F(\chi_j),

and the mean of the state prediction is calculated by [19]

\theta^-(t) = \sum_{j=0}^{2D} W_j\, \chi_j^-(t),

with variance given by [19]

\Sigma^-(t) = \sum_{j=0}^{2D} W_j \left(\chi_j^-(t) - \theta^-(t)\right)\left(\chi_j^-(t) - \theta^-(t)\right)^T + \Sigma_\theta.

As the observation model is an identity function, we can still use the linear measurement-update formulae of the original Kalman filter (equations (15)-(17)) in the correction stage to obtain θ̂(t) and Σ(t). Under this non-LDS assumption, since the uncertainty associated with the estimated global optimum is related to the distribution of particles, we can simply use either a sample of the particles, or all of the particles together with the estimated global optimum, as the sigma points of the UKF. This allows the number of sigma points to be greater than 2D + 1, and makes integrating the UKF into the PSO more convenient. In addition to the traditional stopping criteria, Σ(t) may be used as additional evidence of the convergence of the PSO. To sum up, the procedure of the PSO is combined with the predict-correct cycle of the Kalman filter. For both the LDS and non-LDS cases, our new UKF-PSO algorithm can be represented as shown in Fig. 5.
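The core of the unscented step can be sketched for the scalar case D = 1, where the matrix square root reduces to an ordinary one and κ + D = 3 gives κ = 2. This is an illustrative sketch, not the paper's implementation:

```python
import math

def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Scalar unscented transform (D = 1, so kappa + D = 3 as in the text):
    propagate 2D + 1 = 3 sigma points through a (possibly non-linear) f
    and recover the mean/variance of the transformed distribution.
    """
    D = 1
    spread = math.sqrt((D + kappa) * var)
    sigma = [mean, mean + spread, mean - spread]          # sigma points
    w = [kappa / (D + kappa),                             # W_0
         0.5 / (D + kappa), 0.5 / (D + kappa)]            # W_1, W_2
    y = [f(s) for s in sigma]                             # propagate
    y_mean = sum(wj * yj for wj, yj in zip(w, y))         # predicted mean
    y_var = sum(wj * (yj - y_mean) ** 2 for wj, yj in zip(w, y))
    return y_mean, y_var
```

For linear f the transform is exact, which makes it easy to sanity-check; for non-linear f it avoids the Jacobian required by the EKF.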
The estimated global optimum x̂_g will be affected by the relative location of the global optimum in the search range. Fig. 6 shows how this estimation changes when using different search ranges of the same size. The estimation is more accurate when the true global optimum is closer to the center of the search range. A slightly different observation can therefore be used to improve the estimated global optimum: in each iteration, after x̂_g is calculated, all the particles are resampled to positions x̃_i such that the search range is centered on x̂_g. A new average, x̃_g, weighted by the new evaluations f(x̃_i), can then be calculated as the observation. We name this model the ''shift particles observation'' UKFPSO (SPO-UKFPSO). In this case, the HMM differs from the one used in the above UKFPSO method, with different definitions of θ, Σ_θ, Σ, F, and ξ. The workflow of the SPO-UKFPSO method is shown in Fig. 7.

To apply the UKF-guided PSO model to real image registration tasks, the choice of similarity measure also has a profound influence on the results. The chosen similarity measure has to follow the prior knowledge modeled by equation (10), which allows the problem to be solved as shown in Fig. 1. For example, for a multi-modality registration problem, the sum of squared differences (SSD) of intensity is a poor choice. We therefore opt for the widely used mutual information (MI) instead. To register a reference image µ and a floating image ν, MI is calculated from their joint entropy H(µ, ν) and marginal entropies H(µ) and H(ν),

MI(\mu, \nu) = H(\mu) + H(\nu) - H(\mu, \nu),

where MI is the similarity measure, which makes registration a maximization problem.
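For intensity images, the entropies in the MI formula can be estimated from a joint intensity histogram. A rough Python sketch (the binning scheme, intensity range and function names are our own simplifications, not the paper's implementation):

```python
import math
from collections import Counter

def mutual_information(a, b, bins=8, lo=0.0, hi=1.0):
    """MI(a, b) = H(a) + H(b) - H(a, b), estimated from a joint
    intensity histogram of two equal-length flattened image lists
    with intensities in [lo, hi]. Real registration code would bin
    only the overlapping voxels.
    """
    def bin_of(v):
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    joint = Counter((bin_of(x), bin_of(y)) for x, y in zip(a, b))
    n = float(len(a))
    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts if c > 0)
    h_ab = entropy(joint.values())
    h_a = entropy(Counter(k[0] for k in joint.elements()).values())
    h_b = entropy(Counter(k[1] for k in joint.elements()).values())
    return h_a + h_b - h_ab
```

For identical images the joint entropy equals the marginal entropy, so MI reduces to H(a), its maximum, which is why registration becomes a maximization problem.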

VI. THE NESTED UKF-PSO
Image registration can be performed using different types of similarity measures, as well as different features. In order to combine different features and measures, we must assign a suitable weighting to each one and normalize them to comparable scales. A benefit of the proposed model using prior knowledge is that the fitness values of any similarity measure are automatically normalized so as to be samples of a probability distribution, which maps all the measures to a uniform scale.
As shown in Fig. 8, in the case where we have two similarity measures, f_1(x) and f_2(x), the estimate of the global optimum output by a UKF using one measure can be intuitively considered as the prediction θ⁻ of the second UKF, associated with the other measure. The two UKFs share the same population of particles during the optimization process, which means that each particle obtains two fitness values in each iteration. The framework can be extended using multiple nested UKFs to allow any number of features or similarity measures to guide the optimization.
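Conceptually, the nested scheme chains two correction steps, with the posterior from the first measure serving as the prior for the second. A deliberately simplified scalar sketch, assuming identity F and H and known noise variances (all names are illustrative):

```python
def nested_update(theta, P, obs1, obs2, R1, R2):
    """Two chained scalar corrections: the estimate after incorporating
    similarity measure 1 (obs1, noise R1) serves as the prior for the
    update with measure 2 (obs2, noise R2).
    """
    def correct(m, p, z, r):
        k = p / (p + r)                  # Kalman gain
        return m + k * (z - m), (1 - k) * p
    theta, P = correct(theta, P, obs1, R1)   # inner filter's posterior...
    return correct(theta, P, obs2, R2)       # ...is the outer filter's prior
```

Because each correction shrinks the variance, the second measure is weighted relative to the confidence already accumulated from the first; no fixed inter-measure weights are needed.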

VII. PARTICLE STATE EVOLUTION
To sum up, the outputs of the KF or UKF in the three implementations of the PSO above include the estimated hidden state θ̂ and a variance Σ that reflects the estimation error. As discussed in sections III and IV, the accuracy of the estimation of the global optimum given by the weighted average (equation (11)) depends on the size of the search region and the position of the true global optimum. Furthermore, the KF and its extensions generally behave like low-pass filters, which means that high-frequency information may be filtered out along with the noise. In this case, a more reliable update model can be formulated by

v_i(t+1) = \omega v_i(t) + c_p r_p \left(x_i^p - x_i(t)\right) + c_g r_g \left(x_g - x_i(t)\right) + c_\theta r_\theta \left(\hat{\theta} - x_i(t)\right),

where c_θ is the acceleration constant weighting the attraction of the estimated hidden state output by the KF or UKF, and r_θ is a randomly generated number drawn from the uniform distribution over the range (0, 1). The component c_θ r_θ (θ̂ − x_i(t)) controls the influence of the estimated hidden state over the orientation of the particles. The acceleration constants c_p, c_g and c_θ need to be adjusted to balance the influence of the personal optima x_i^p, the measured global optimum x_g, and the filtered optimum θ̂. Many methods initialize c_p and c_g as 2.0. In this work, c_g and c_θ are initialized by letting c_g = c_θ = 1, and during the particle evolution process they are adjusted as functions of the measured global optimum x_g obtained in the t-th iteration. Particle positions are then updated using equation (3).
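The extended velocity update above simply appends a third attraction term to equation (2). A minimal Python sketch (the adaptive rebalancing of c_g and c_θ described in the text is omitted here; the coefficients are fixed inputs, and all names are our own):

```python
import random

def velocity_with_filter_term(x, v, x_p, x_g, theta,
                              w=0.7, c_p=2.0, c_g=1.0, c_t=1.0):
    """Velocity update of eq. (2) extended with the filtered-optimum
    attraction term c_theta * r_theta * (theta - x_i).

    x, v, x_p: particle position, velocity and personal best;
    x_g: measured global optimum; theta: KF/UKF-filtered optimum.
    """
    r_p, r_g, r_t = (random.random() for _ in range(3))
    return [w * v[d]
            + c_p * r_p * (x_p[d] - x[d])   # cognition component
            + c_g * r_g * (x_g[d] - x[d])   # social component
            + c_t * r_t * (theta[d] - x[d]) # filtered-optimum component
            for d in range(len(x))]
```

When x_g and θ̂ disagree, the particle is pulled towards a compromise between the raw measurement and the filtered estimate, which is the intended smoothing effect.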

VIII. EXPERIMENTS
The proposed PSO methods were evaluated on both general optimization and image registration problems.A few representative PSO methods previously used for registration are also chosen for comparison purposes.

A. BENCHMARK FUNCTIONS
The proposed PSO models were compared using common benchmark functions widely used in the PSO literature [20], shown in table 1. Since the optimization methods proposed in this paper are customized for image registration applications with the assumed prior knowledge described in section III, we chose different types of benchmark functions, both unimodal and multimodal, to comprehensively compare the power of the different PSO methods.
As the nested UKFPSO method is specifically designed for image registration applications requiring multiple types of features or different types of similarity measures, it is not included in this benchmark function comparison.
For image registration problems, it is more important to find a position that is close to the true global optimum in the search space than to search for a better value of the fitness function. The performance of the compared algorithms is therefore measured by the norm of the differences between their returned vectors and the ground truths of the benchmark functions. Since for most of the chosen benchmark functions the ground-truth optima are located at the center of the search space, a weak optimization algorithm that tends to converge to the center of the search space might obtain better results than others. To deal with this bias, while keeping the ground truth within the search space, we generated random shifts of the search bounds, limited to within 40% of the problem space.
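This bias-control step can be sketched as rejection sampling of a shifted interval, applied per dimension (an illustrative sketch; the paper's MATLAB implementation may differ):

```python
import random

def shifted_bounds(lo, hi, true_opt, max_shift_frac=0.4):
    """Randomly shift a search interval by up to max_shift_frac of its
    width (40% as in the text), rejecting shifts that would push the
    known optimum outside the interval.
    """
    width = hi - lo
    while True:
        s = random.uniform(-max_shift_frac, max_shift_frac) * width
        if lo + s <= true_opt <= hi + s:
            return lo + s, hi + s
```

The shifted interval keeps its original width, so an algorithm biased towards the center of the search range no longer lands on the optimum by construction.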
Besides the random shift of the search ranges, the algorithms were tested using a random problem dimension chosen between 2 and 30, and each algorithm was run 100 times for each benchmark function. The mean and standard deviation (STD) of each algorithm's results were calculated. The stopping condition was either reaching 300 iterations or the reduction of the variability of the particle positions around the global optimum to less than 10^{-6}. All algorithms were implemented in MATLAB (MathWorks, USA) with vectorized simulation of particle positions. Other than the particle position update mechanism and some method-specific parameters, all implementations shared core code to ensure that the comparison was performed under similar circumstances.
Accuracy, convergence speeds and the run times of each method were measured.Speeds were evaluated using the average number of iterations and function evaluations of each run, as well as the raw convergence time.For a general overview of the performances, the mean accuracy of each method over all benchmark functions was also calculated.

B. REGISTERING BENCHMARK DATASETS
In order to evaluate the performance of the proposed PSO methods in real registration applications, we conducted a rigid registration experiment based on data from the multi-modality brain image datasets of the Retrospective Image Registration Evaluation (RIRE) Project [21]. The comparison included the original PSO, the DRQPSO, the Bare Bones PSO, the Kalman filter PSO, the LDS-KFPSO, the SPO-UKFPSO and the nested UKFPSO methods. All methods used MI as the similarity measure, except for the nested UKFPSO, which used MI as measure f_1(x) and the gradient features proposed by Pluim et al. [38] as f_2(x).
As the purpose of this experiment is to compare the performance of different PSO methods in real image registration applications, rather than to obtain the absolute highest registration accuracy, we integrated the PSO methods into a very simple registration framework. For the sake of simplicity and efficiency, each slice of both the reference and floating volumes was down-sampled to 20% of the original in-plane resolution of the reference image along each dimension. The slice thickness of the floating volume was also interpolated to that of the reference volume, so that the optimization method only dealt with translation and rotation parameters. To allow a further speed-up of the registration, we selected a cubic region of interest (ROI) in each volume by applying Otsu's histogram-based threshold selection method [23] to the normalized data. Registration accuracy was measured using the RIRE project's target registration error (TRE), calculated from multiple volumes of interest (VOIs). The transformation parameters calculated from the resampled data were rescaled for transformation of the original volume. For each patient, 10 attempts at registration were completed, and in each run all methods used the same set of initialized particles, generated by a MATLAB quasi-random number generator.

C. REGISTERING NATURAL DATASETS
To further compare the performance of our methods with the original PSO, we also conducted an experiment using neonatal data collected from a clinical trial performed at the Clinical Research Imaging Centre (CRIC), University of Edinburgh (UoE). This dataset has previously been used to evaluate the performance of a registration framework based on a rearranged histogram specification (RHS) and K-means binning [24]. We used images acquired at 38-44 weeks' postmenstrual age in natural sleep using a 3T Verio system (Siemens Healthcare GmbH, Erlangen, Germany). Because of the neonatal age of the population being imaged, there is likely to be significant motion between acquisitions, which makes this dataset a good test of registration algorithms. Isotropic anatomical data were acquired with a range of contrasts, selected to facilitate the development of volumetric brain segmentation algorithms for the main study.
Data from 10 patients were aligned using a rigid-body transform, calculated within a 51 × 51 × 41 mm³ user-positioned ROI on volumes with an isotropic voxel size of 1.56 mm. Transformation matrices were obtained from data down-sampled to half the original resolution. Performance was evaluated by TREs, calculated from 1908 pairs of corresponding landmarks (18 on each volume), manually placed by a clinical expert. The accuracy of the LDS-KFPSO and the nested-UKFPSO is compared with the results from our earlier work based on the original PSO [24].
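Given a set of corresponding landmark pairs, the TRE statistics reported here reduce to the Euclidean distances between fixed landmarks and moving landmarks mapped through the estimated rigid transform. A minimal sketch, with hypothetical names:

```python
import numpy as np

def tre(landmarks_fixed, landmarks_moving, R, t):
    """Mean and median TRE (same units as the landmarks, e.g. mm).

    landmarks_fixed, landmarks_moving: (N, 3) arrays of corresponding points.
    R: (3, 3) rotation matrix; t: (3,) translation of the estimated transform.
    """
    mapped = landmarks_moving @ R.T + t              # apply the rigid transform
    errors = np.linalg.norm(mapped - landmarks_fixed, axis=1)
    return errors.mean(), np.median(errors)
```

With a perfect transform the errors are zero; any residual misalignment shows up directly in the distance statistics.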

A. BENCHMARK FUNCTIONS
Table 2 shows the average minimization error of the different algorithms for each benchmark function (the standard deviation across runs is shown in parentheses). Table 3 summarizes the overall performances of the different algorithms.
As shown in Table 2, the original PSO gave the best result for the Step function. The Bare Bones PSO performed better for the Griewank, Modulus Sum, and Salomon functions. The proposed LDS-KFPSO method converged to positions closer to the true global optima for the Ackley, Schwefel, and Rosenbrock functions. For the majority of the benchmark functions, the proposed LDS-KFPSO and SPO-UKFPSO returned the best performances, or performances comparable to the Bare Bones PSO. The Step function is a special case among the benchmark functions, as it increases monotonically and its global optimum is located near the upper bound of the search range. In registration applications, this situation may arise when the true global optimum is not included in the search space. As expected, in this case the LDS-KFPSO and SPO-UKFPSO methods gave worse results.
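For reference, two of the benchmark functions discussed above are shown below in their commonly used forms; the exact variants, dimensions, and search ranges used in our experiments may differ:

```python
import numpy as np

def ackley(x):
    """Ackley function: global minimum f(0) = 0, with many shallow local minima."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt((x ** 2).sum() / n))
            - np.exp(np.cos(2.0 * np.pi * x).sum() / n) + 20.0 + np.e)

def rosenbrock(x):
    """Rosenbrock function: a narrow curved valley with minimum at x = (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return (100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2).sum()
```

The Ackley function's many local minima test an optimizer's resistance to premature convergence, while the Rosenbrock valley tests its ability to follow a narrow, curved descent path.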
Based on the results shown in Table 3, the chaotic QPSO had the fastest convergence time, owing to the simplicity of its position update model, but the worst accuracy. In comparison, the LDS-KFPSO and SPO-UKFPSO may take slightly longer to complete each iteration, but both required fewer iterations than the other methods. In particular, the LDS-KFPSO used the fewest function evaluations and had the shortest run time while achieving the best optimization results. The SPO-UKFPSO provided greater accuracy than the LDS-KFPSO and converged more quickly than most of the other methods.

B. RIRE DATA
The TREs for the CT-MR_T2 and PET-MR_PD registrations are shown in Table 4. All three proposed PSO methods returned better results than the other methods in terms of mean and median TRE. Due to the combined features and similarity measures it utilizes, the nested-UKFPSO gave the best results amongst the three proposed PSO models. Because the Bare Bones PSO and Kalman filter PSO feature a more deterministic position update mechanism, they displayed better convergence speed than the original PSO and DRQPSO. However, the original PSO and DRQPSO were highly sensitive to particle initialization and gave the greatest variability between runs of the experiment.
C. NEONATAL DATA
Fig. 9 displays the results of successfully registering the T2-weighted dark fluid and T1-weighted MPRAGE neonatal images using the UKFPSO methods. Registration of this particular dataset was achieved only by the UKFPSO methods; the previously published method had failed to register the example shown. The quantitative evaluation of these registration results is shown in Table 5. The LDS-KFPSO and nested UKFPSO therefore not only gave smaller TREs than the original PSO, but also successfully aligned one particularly problematic dataset that our previous method failed to register [24].

X. CONCLUSION
In this paper, we have described three new UKF-guided, registration-oriented optimization implementations. The new PSO-based methods were evaluated using benchmark functions and by registering two medical image cohorts. Compared to the baseline PSO algorithms, the UKF-guided PSO methods achieved more accurate registration results and displayed better robustness to the presence of local optima. Their convergence speed is comparable to that of the QPSO when minimizing benchmark functions, and to that of the original PSO algorithm when registering medical images.
This new type of UKF-based PSO algorithm provides an efficient mechanism to encode prior knowledge of the search space into the optimization process, without requiring manually assigned weights for each feature included in that prior knowledge. Unlike other PSO methods, the proposed methods update a probabilistic distribution over the whole search space, rather than storing a distribution for each particle. This process iteratively moves the particles closer to the global optimum, especially in the early stages of the search, leading to quicker convergence. Furthermore, the mechanism that updates the knowledge of the search space can also be applied to other swarm-based optimization methods. It therefore has great potential for application in a variety of medical image registration problems.
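To illustrate the general idea of filtering the swarm's estimate of the optimum, rather than the paper's actual UKF formulation, the toy sketch below treats successive global-best positions as noisy observations of the true optimum and uses a per-dimension linear Kalman update to steer the social term. All parameter values and names are assumptions for illustration only:

```python
import numpy as np

def kalman_guided_pso(f, dim, n=30, iters=200, bounds=(-5.0, 5.0), seed=0):
    """Toy PSO whose social attractor is a Kalman-filtered optimum estimate."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n, dim))
    v = np.zeros((n, dim))
    pbest = x.copy()
    pval = np.apply_along_axis(f, 1, x)
    g = pbest[pval.argmin()].copy()                 # current global best
    # Per-dimension Kalman state: optimum estimate and its variance.
    est, var = g.copy(), np.full(dim, (hi - lo) ** 2)
    q, r = 1e-2, 1.0                                # assumed process / observation noise
    for _ in range(iters):
        # Kalman update: fold the latest global best into the optimum estimate.
        var = var + q
        k = var / (var + r)
        est = est + k * (g - est)
        var = (1.0 - k) * var
        # Standard PSO velocity update, with the filtered estimate as the guide.
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (est - x)
        x = np.clip(x + v, lo, hi)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pval
        pbest[improved], pval[improved] = x[improved], fx[improved]
        g = pbest[pval.argmin()].copy()
    return g, pval.min()
```

The key design choice this sketch demonstrates is that the swarm maintains one distribution over the optimum's location (here a per-dimension Gaussian) rather than one per particle, and that distribution tightens as the observed global bests agree.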
If the searching algorithm converges ideally, the Gaussian function becomes a Dirac delta function.

FIGURE 1 .
FIGURE 1. Fitting a Gaussian function to the distribution of mutual information within three different search ranges. The fitted Gaussian function tends to give a more accurate estimation of the distribution within a smaller search range.

FIGURE 3 .
FIGURE 3. Hidden Markov model: θ and ξ are the hidden and observed states, respectively.

FIGURE 4 .
FIGURE 4. The non-linear state transition model F used to evolve the optimal estimation x g * of the true global optimum.

FIGURE 6 .
FIGURE 6. Estimations of the global optimum when the search range is placed at different positions in the problem space.

FIGURE 9 .
FIGURE 9. Registration results obtained using the original particle swarm optimizer (PSO), the linear dynamic system Kalman filter PSO (LDS-KFPSO), and the nested unscented Kalman filter PSO (UKFPSO): (a) before registration; (b) registered using the original PSO; (c) registered using the LDS-KFPSO; (d) registered using the nested UKFPSO. The registration aligns the T2-weighted dark fluid and T1-weighted MPRAGE images of the patient for whom all registration methods tested in our previous work failed. The results are visualized as overlapped red and green color channels.

TABLE 2 .
Evaluation of the PSO methods applied to the chosen benchmark functions.

TABLE 3 .
Performances of the PSO methods applied to the chosen benchmark functions.

TABLE 4 .
Evaluation of the PSO methods applied to RIRE data.

TABLE 5 .
Statistics of target registration errors (TRE).