Medical Image Registration via Neural Fields

Image registration is an essential step in many medical image analysis tasks. Traditional methods for image registration are primarily optimization-driven, finding the optimal deformations that maximize the similarity between two images. Recent learning-based methods, trained to directly predict transformations between two images, run much faster, but suffer from performance deficiencies due to model generalization and the inefficiency in handling individual image specific deformations. Here we present a new neural net based image registration framework, called NIR (Neural Image Registration), which is based on optimization but utilizes deep neural nets to model deformations between image pairs. NIR represents the transformation between two images with a continuous function implemented via neural fields, receiving a 3D coordinate as input and outputting the corresponding deformation vector. NIR provides two ways of generating deformation field: directly output a displacement vector field for general deformable registration, or output a velocity vector field and integrate the velocity field to derive the deformation field for diffeomorphic image registration. The optimal registration is discovered by updating the parameters of the neural field via stochastic gradient descent. We describe several design choices that facilitate model optimization, including coordinate encoding, sinusoidal activation, coordinate sampling, and intensity sampling. Experiments on two 3D MR brain scan datasets demonstrate that NIR yields state-of-the-art performance in terms of both registration accuracy and regularity, while running significantly faster than traditional optimization-based methods.


Introduction
3D image registration plays a pivotal role in many medical applications [30,46], such as merging images from different modalities, motion correction, tracking disease progression, and atlas-based image segmentation. Image registration can be categorized into two groups: rigid and non-rigid. Non-rigid registration (also known as deformable registration), which considers non-affine coordinate transformations between two images, is more widely used. Diffeomorphic image registration, which imposes additional constraints on the transformation such as smoothness, invertibility and topology preservation, is often preferred in certain applications. In this paper, we present a new image registration framework that supports both general deformable and diffeomorphic image registration.
Traditional image registration methods [4,6,9,47,50] treat image registration as an optimization problem: finding the optimal coordinate transformations that maximize the similarity between the transformed source image and the target image. These methods usually require hard modeling assumptions on the types of permissible deformations to ensure registration regularity. For instance, NiftyReg [47] models deformation fields using B-splines with a set of control points. Flow-based methods model the transformations via a series of time-dependent velocity fields [9,69] or stationary velocity fields [4], and impose strong assumptions on the space of permissible velocity vector fields. The strong modeling assumptions produce well-behaved transformations, but sometimes also lead to detrimental registration outcomes. Improving optimization-based registration requires a more flexible framework for modeling the space of permissible transformations. In addition, optimization-based methods are often time-consuming.
arXiv:2206.03111v2 [cs.CV] 22 Aug 2022
Recent advances in deep learning have inspired the development of learning-based image registration methods [8,15,35,36,38]. These methods are trained to directly output transformations between two images. Although training may take time, predictions are usually generated by a feed-forward model and are therefore very fast. However, in terms of registration accuracy, learning-based methods still lag behind optimization-based ones in unsupervised settings, even with the very complex and large-scale network structures used in recent works [11,38,53]. Part of the reason is the discrepancy between model performance on training data and on test data. Benefiting from the high representational capacity of deep neural networks, learning-based methods can generate high-quality transformations between training image pairs, but often generalize poorly to previously unseen image pairs. The limited size and diversity of medical datasets accentuate this generalizability issue. To alleviate it, several recent works [25,27,54,73] have advocated a two-step approach: using learning models to derive an initial registration, followed by traditional optimization methods for fine-tuning.
Naturally, we ask: can optimization-based registration also leverage the expressive power of deep neural nets? To this end, we propose NIR (Neural Image Registration), an optimization-based framework that solves medical image registration via neural fields. Neural fields, also called coordinate-based multilayer perceptrons (MLPs) or implicit neural representations (INRs), are a class of neural networks that map a point in space and time to a continuous quantity. Previously we demonstrated the effectiveness of neural fields in modeling diffeomorphic transformations for anatomic shape analysis [56]. This motivates us to apply neural fields to model deformable and diffeomorphic registrations between images. NIR provides two ways of modeling image deformations: directly modeling the displacement vector field, or modeling the velocity vector field. In both cases, the neural field within NIR takes as input a 3D coordinate of the source image and outputs a 3D vector (either displacement or velocity) at that location. In the second case, the velocity vector field is further integrated through a Neural ODE solver [12] to produce the final deformation field, thereby ensuring that the resulting deformation is diffeomorphic.
Modeling deformation fields as coordinate-based MLPs, supplemented in NIR with additional features such as Fourier position encoding [58] and periodic activation functions [55], offers several advantages. First, the neural deformation model is simple and flexible, yet has great expressive power. It can use a relatively small number of coefficients to encode signals with an exponentially large frequency support [68]. Deformations with high frequencies can be captured by scaling up the number of hidden layers and neurons. Second, unlike neural nets defined on discrete grid coordinates, such as convolutional neural networks (CNNs), coordinate-based MLPs are defined on the continuous coordinate space. Neural fields can be optimized to model fine deformations from sampled data points and do not require dense input. Consequently, optimizing neural fields is memory-efficient. Third, the optimization of neural fields can take full advantage of modern high-performance automatic differentiation toolboxes such as PyTorch [42], TensorFlow [1] and JAX [10].
We use stochastic gradient descent (SGD) to find the optimal parameters of the neural field in NIR and design efficient coordinate sampling strategies to run SGD. Taking registration accuracy, registration regularity, and convergence rate into account, we examine two coordinate samplers: a downsize sampler and a mini-patch sampler. The downsize coordinate sampler offers faster convergence and higher registration accuracy, whereas the mini-patch coordinate sampler is more beneficial to the regularity of deformation fields. To bring together the strengths of these two coordinate samplers, we further propose a hybrid sampler for NIR, comprising two concatenated neural fields optimized separately with a downsize and a mini-patch coordinate sampler. Our experiments show that NIR with a hybrid sampler performs well in both registration accuracy and regularity without significantly compromising optimization efficiency.
The main contributions of our work are summarized as follows:
• We introduce NIR, a novel optimization-based deformable image registration framework that models the displacement field or velocity field via lightweight coordinate-based MLPs with Fourier position encoding and sinusoidal activation functions.
• We further propose a hybrid sampling scheme, composed of two stacked neural fields, separately optimized with two different coordinate samplers, to efficiently solve optimization in NIR via SGD.
• NIR is evaluated on two brain MRI datasets and shows state-of-the-art registration results in multiple metrics, including intensity similarity between fixed and transformed images, regularity of the transformation, and GPU consumption. It runs significantly faster than traditional optimization-based methods, while requiring less memory than learning-based methods (less than 3500MB GPU memory).

Optimization-based Registration Methods
Extensive work has been conducted on 3D deformable image registration over the decades [2,4,6,9,16]. Several studies solve image registration as an optimization problem in the space of displacement vector fields. They optimize the deformable model iteratively under the constraint of a smoothness regularizer, typically Gaussian smoothing. These include elastic-type models [6], free-form deformation with B-splines [47], statistical parametric mapping [3], local affine models [26] and Demons [59]. Diffeomorphic image registration, which preserves topology and guarantees an invertible transformation, has also achieved remarkable progress in various anatomical studies. Popular methods include Large Deformation Diffeomorphic Metric Mapping (LDDMM) [9], DARTEL [2] and standard symmetric normalization (SyN) [4]. In these methods, the deformation is modeled by integrating its velocity over time according to the Lagrange transport equation [14,19] to achieve a global one-to-one, smooth and continuous mapping.

Learning-based Registration Methods
Many learning-based methods [7, 11, 15, 29, 35-38, 49, 53, 70] have been proposed, offering promising registration results with fast inference and high registration accuracy. By learning image representations from a large amount of training data, the neural networks are able to capture the difference between an input pair of images and predict the transformation. VoxelMorph [7] utilizes a UNet-like structure to directly regress the deformation fields by minimizing the dissimilarity between input and target images. SYMNet [35] provides a symmetric registration method that estimates the forward and backward deformations simultaneously within the space of diffeomorphic maps. LapIRN [36] avoids local minima of registration by working in a coarse-to-fine fashion. A recursive cascaded network [71] was proposed to iteratively apply the registration network to the warped moving image and the fixed image. DTN [70] deploys a transformer over the CNN backbone to capture semantic contextual relevance and enhance the features extracted by the backbone. MS-ODENet [65] learns a registration optimizer via a multi-scale neural ODE model and proposes a cross-model similarity metric to alleviate appearance differences across contrast levels.

Neural Fields for Visual Computing
Recently, neural fields have emerged as a popular technique for solving visual computing problems. They use coordinate-based neural networks to parameterize the physical properties of scenes and objects across space and time [64]. Initially, neural fields were designed to solve the shape representation problem [39,41]. Since then, neural fields have been applied to more computer vision tasks. Neural Radiance Fields, introduced in [34], are designed to achieve view-dependent scene representation. Periodic sinusoidal activation functions [55] have been proposed to replace ReLU-based activation functions for better representation of complex natural signals. COIN [18] compresses an image by storing the weights of a neural field overfitted to it.

Deformation Representation
Neural fields can be used to represent continuous transformations with more flexibility. As target geometry and appearance are often modeled with neural fields, it is natural to use a neural field to represent the transformation as well. [39] performs 4D reconstruction via a learned temporally and spatially continuous vector field. Neural Mesh Flow [24] focuses on generating manifold meshes from images or point clouds via conditional continuous diffeomorphic flow. PointFlow [67] incorporates continuous normalizing flows into a principled probabilistic framework to reconstruct 3D point clouds. DiT [72] builds dense correspondences across shapes in one category by decomposing DeepSDF [41] into a deformation network and a single shape representation network.

Medical Imaging Application
Neural fields have been applied in some medical image analysis tasks, such as 3D image reconstruction or representation. [57] augments the quantities measured in the sensor domain and reconstructs images with less measurement noise. [51] predicts the density value at a 3D spatial coordinate, and is supervised by mapping its value back to the sensor domain. [62] views 2D slices as samples from a 3D continuous function and reconstructs 3D images from the observed tissue anatomy. NDF [56] follows the paradigm of DiT and proposes to model the topology-preserving transformation between each organ shape instance and the learned shape template via neural diffeomorphic flow.
Two recent independent works, IDIR [61] and NODEO [63], also proposed optimization-based pair-wise image registration methods utilizing coordinate-based neural networks. IDIR is a direct extension of SIREN [55] that predicts the displacement vectors of randomly sampled query coordinates during optimization. Similar to our work, NODEO also leverages Neural ODE [12] to integrate the velocity fields to obtain the deformation fields. However, NODEO uses a completely different network architecture. Their neural velocity field is based on a UNet-like 3D CNN model with fully connected bottleneck layers, whereas ours is a simple MLP with coordinate encoding and sinusoidal activation functions. During optimization, NODEO receives the whole grid of coordinates as input and predicts the entire deformation field in every iteration. Thus, NODEO requires a large memory footprint. To reduce memory consumption, NODEO must reduce the spatial size or the channel number of the feature maps, making it difficult to represent fine deformations.

Method
Let T ∈ R^{D×H×W} and M ∈ R^{D×H×W} denote the target and moving volumetric images, respectively. Let φ : Ω ⊂ R^3 → Ω be the deformation field between T and M. Unsupervised image registration is commonly formulated as an optimization problem:

$$\hat{\phi} = \arg\min_{\phi} \; \mathcal{L}_{sim}(T, M \circ \phi) + \lambda_{reg}\, \mathcal{L}_{reg}(\phi), \tag{1}$$

where the cost function includes two terms: a) L_sim, measuring image similarity between the target and warped moving volumes, and b) L_reg, a regularization term on the deformation field. M ∘ φ denotes M warped by the deformation field φ, and λ_reg is a hyperparameter controlling the relative weight of the regularization term. The registration field φ is represented either directly via a displacement field u with φ = Id + u, where Id is the identity map [6,8], or indirectly via a velocity vector field v, the integration of which leads to φ. The second approach is preferred if we require the registration field to be diffeomorphic, i.e., invertible and topology preserving [9,35].

Neural Fields
Both displacement fields and velocity fields are modeled by a coordinate-based neural net, referred to as a neural field F_θ : R^3 → R^3, which provides a continuous mapping from a 3D coordinate p to the displacement or velocity vector at that position. θ denotes the parameters of the neural net. Neural fields provide a flexible framework for modeling the registration field, powerful enough to model highly complex deformations, while maintaining analytic differentiability and allowing us to leverage powerful optimization tools in existing deep learning toolboxes [22].
The neural fields used in this work all consist of a coordinate encoding layer γ, followed by a multilayer perceptron (MLP) whose weights, bias and activation function at the ℓ-th layer are denoted as W^(ℓ), b^(ℓ) and ρ^(ℓ), respectively. The activities of the neurons at each layer are computed sequentially as follows,

$$z^{(0)} = \gamma(p), \qquad z^{(\ell+1)} = \rho^{(\ell)}\big(W^{(\ell)} z^{(\ell)} + b^{(\ell)}\big), \quad \ell = 0, \dots, L-1, \qquad F_\theta(p) = z^{(L)}, \tag{3}$$

where p is the input coordinate, L is the number of layers, and F_θ(p) denotes the output displacement vector or velocity vector at p.

Overview of NIR
NIR uses neural fields to represent the transformation between two medical images. It solves the image registration problem in Eq. (1) by optimizing θ. The optimization is solved via stochastic gradient descent by finding a stochastic approximation of the objective function Eq. (2) through sampling, as opposed to batch gradient descent, which requires a complete calculation of Eq. (2) and is therefore both memory-demanding and less efficient.
NIR consists of three main components: Coordinate Sampler (CS), Neural Field (NF), and Intensity Sampler (IS) (Fig. 1). CS samples coordinates from the 3D grid points of T, randomly at each step of the optimization. The sampled points are sent to NF, which maps position p ∈ R^3 in the coordinate space of T to position p′ ∈ R^3 in the coordinate space of M. IS returns image intensities at query locations of the source and target images. Let I^T_p denote the intensity of p on T and I^M_{p′} denote the intensity of p′ on M. The sampled image intensities are then used to calculate the similarity loss L_sim (e.g., local normalized cross-correlation loss) between I^T_p and I^M_{p′}, as well as the smoothness term L_Jdet.
The inference mode of NIR is much simpler: the pretrained neural field takes the whole grid coordinates as input and outputs the deformations at all input coordinates.The warped volume W is then obtained by sampling intensities from the moving volume M given the deformed coordinates.
In Sec. 3.3, we describe the network design of NF. In Sec. 3.4, we go over several optimization components, including CS, IS, and the objective functions. In Sec. 3.5, we present a hybrid coordinate sampling scheme that strikes a balance between registration accuracy and regularity while maintaining optimization efficiency.

Network Design
As illustrated in Fig. 1, NF takes as input a 3D coordinate p ∈ R^3 in T and outputs the corresponding coordinate p′ ∈ R^3 in M. The transformation from p to p′ can be parameterized in two ways: 1) use a neural field to directly predict the displacement vector, or 2) use a neural field to predict the velocity vector, the integral of which leads to the deformation vector. Both the neural displacement field and the neural velocity field can be formulated as in Eq. (3). Next we look into the design of each component in our neural fields.

Coordinate Encoding
The coordinate encoding module maps three-dimensional input coordinates to a higher-dimensional embedding [34,58]. The mapping can be realized by a family of functionals e_i : R^3 → R^2, written as:

$$\gamma(p) = \big[e_1(p), e_2(p), \dots, e_n(p)\big].$$

We follow the suggestion from [58], encoding coordinates via Fourier mapping, such that

$$e_i(p) = \big[\cos(2\pi\, \omega_i^{\top} p),\; \sin(2\pi\, \omega_i^{\top} p)\big],$$

where ω_i ∈ R^3 is randomly sampled i.i.d. from a Gaussian distribution with standard deviation σ. The higher the σ, the more the model is biased towards high-frequency signals. In our experiments, n and σ are set to 128 and 3, regardless of the neural field and coordinate sampler used.
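As a minimal sketch of the Fourier coordinate encoding described above, the following standard-library Python draws n = 128 frequency vectors with σ = 3 and maps one coordinate to a 256-dimensional embedding. The 2π scaling follows the form used in [58] and is an assumption here, as are the helper names `make_frequencies` and `fourier_encode`, the fixed seed, and the convention that coordinates are normalized before encoding.

```python
import math
import random


def make_frequencies(n=128, sigma=3.0, seed=0):
    """Draw n frequency vectors omega_i in R^3, i.i.d. from N(0, sigma^2 I)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in range(n)]


def fourier_encode(p, frequencies):
    """Map a 3D coordinate p to [cos(2*pi*w.p), sin(2*pi*w.p)] for every frequency w."""
    features = []
    for w in frequencies:
        phase = 2.0 * math.pi * sum(w_k * p_k for w_k, p_k in zip(w, p))
        features.extend((math.cos(phase), math.sin(phase)))
    return features


frequencies = make_frequencies()
embedding = fourier_encode([0.1, -0.2, 0.3], frequencies)  # length 2 * n
```

The frequencies are drawn once and then kept fixed, so the encoding itself has no trainable parameters; a larger σ spreads the frequencies more widely and makes the embedding oscillate faster over the coordinate space.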

Sinusoidal representation networks (SIRENs)
On top of the coordinate encoding layer, the main body of our neural field is a SIREN network [55], in which all neurons are activated with sinusoidal functions, i.e., ρ^(ℓ) = sin. Notably, the first layer of SIREN networks can be written as z^(1) = sin(ω_0 W^(0) z^(0) + b^(0)). Thus, similar to Fourier coordinate mapping, SIRENs can also regulate the spectral bias of the network by adjusting the network hyperparameter ω_0, which is set to 30 for all our experiments.
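The sinusoidal layer above can be sketched in a few lines of standard-library Python. The forward pass follows the form sin(ω_0 W z + b) with ω_0 = 30; the uniform initialization bound sqrt(6 / fan_in) / ω_0 is the hidden-layer scheme suggested in the SIREN paper [55] and is an assumption here, since this section does not state how the weights are initialized. The zero biases and the function names are likewise illustrative.

```python
import math
import random


def init_sine_layer(fan_in, fan_out, omega_0=30.0, seed=0):
    """Weights uniform in [-b, b] with b = sqrt(6 / fan_in) / omega_0; biases zero (assumed)."""
    rng = random.Random(seed)
    bound = math.sqrt(6.0 / fan_in) / omega_0
    weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]
    biases = [0.0] * fan_out
    return weights, biases


def sine_layer(z, weights, biases, omega_0=30.0):
    """Compute sin(omega_0 * W z + b) for one fully connected layer."""
    return [
        math.sin(omega_0 * sum(w * x for w, x in zip(row, z)) + b)
        for row, b in zip(weights, biases)
    ]


weights, biases = init_sine_layer(fan_in=256, fan_out=256)
hidden = sine_layer([0.5] * 256, weights, biases)  # 256 activations in [-1, 1]
```

Raising ω_0 stretches the argument of the sine, so the same weights produce faster oscillations; this is the sense in which ω_0 regulates the spectral bias.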
[68] reveals that the expressive power of a coordinate-based MLP with sinusoidal encodings is equivalent to that of a structured signal dictionary, which is restricted to functions that can be expressed as a linear combination of certain harmonics of the coordinate encoding γ(p). A SIREN can be seen as a set of nested sinusoids, and a few coefficients of this network are enough to represent signals with an exponentially large frequency support.

Neural Displacement Field
The neural displacement field F_θ takes as input a 3D location p in T and outputs a displacement vector φ_p = [φ_px, φ_py, φ_pz]^T = F_θ(p). As a result, the deformed position p′ in M is p + φ_p.

Neural Velocity Field
Under this option, our proposed framework can perform diffeomorphic image registration. Let Φ(p, t) : Ω × [0, 1] → Ω denote the position at time t of the point that starts at coordinate p. The trajectory is governed by the ordinary differential equation (ODE)

$$\frac{\partial \Phi(p, t)}{\partial t} = v\big(\Phi(p, t), t\big), \qquad \Phi(p, 0) = p, \tag{6}$$

where v(p, t) : Ω × [0, 1] → Ω indicates the velocity vector of coordinate p at time t. If v is Lipschitz continuous, a solution to Eq. (6) exists and is unique in the interval [0, 1], which ensures that any two deformation trajectories do not cross each other [17]. In this work, we assume that v is stationary and can be modeled via a neural field, written as

$$v(p, t) = F_\theta(p).$$

The initial value problem (IVP) in Eq. (6) can be solved with a differentiable ODE solver (NODE) [12] whose dynamic function is set to be F_θ. Considering the trade-off between speed and accuracy, we choose the fourth-order Runge-Kutta method (rk4) with a step size of 0.25 as the ODE solver for our diffeomorphic registration experiments. In the forward pass, the deformed position p′ of position p can be estimated by integrating F_θ from t = 0 to t = 1 via NODE, formulated as

$$p' = \Phi(p, 1) = p + \int_0^1 F_\theta\big(\Phi(p, t)\big)\, dt.$$

For backpropagation, NODE adopts the adjoint sensitivity method [44], which retrieves the gradient by solving the adjoint ODE backwards in time and allows solving with O(1) memory usage no matter how many steps the ODE solver takes.
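To make the integration step concrete, here is a minimal standard-library sketch of classical fourth-order Runge-Kutta integration of a stationary velocity field from t = 0 to t = 1 with step size 0.25 (four steps). The toy field `rotation_velocity` is a hypothetical stand-in for the neural velocity field F_θ, chosen because its exact flow is known; the function names are illustrative and this is not the NODE solver used in the experiments.

```python
import math


def rk4_integrate(velocity, p, step=0.25, t_end=1.0):
    """Integrate dPhi/dt = velocity(Phi) from t = 0 to t = t_end with classical RK4."""
    n_steps = round(t_end / step)
    x = list(p)
    for _ in range(n_steps):
        k1 = velocity(x)
        k2 = velocity([xi + 0.5 * step * ki for xi, ki in zip(x, k1)])
        k3 = velocity([xi + 0.5 * step * ki for xi, ki in zip(x, k2)])
        k4 = velocity([xi + step * ki for xi, ki in zip(x, k3)])
        x = [
            xi + step / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
    return x


def rotation_velocity(p):
    """Toy stationary field: rigid rotation about the z-axis (stand-in for F_theta)."""
    x, y, z = p
    return [-y, x, 0.0]


def reversed_velocity(p):
    """Negated field; integrating it approximately undoes the forward deformation."""
    return [-v for v in rotation_velocity(p)]


start = [1.0, 0.0, 0.5]
deformed = rk4_integrate(rotation_velocity, start)      # close to (cos 1, sin 1, 0.5)
recovered = rk4_integrate(reversed_velocity, deformed)  # close to the start point
```

Integrating the negated field approximately recovers the starting point, which illustrates why a stationary velocity parameterization yields an invertible deformation.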

Optimization
In this section, we will introduce the intensity sampler, objective functions as well as coordinate sampler used in our NIR.

Intensity Sampler
To use gradient-based optimization, a differentiable intensity sampler is required to estimate the intensities at sub-voxel positions of the source images. As in [8,31,35,36], we apply linear interpolation (other interpolation methods can also be applied) as the intensity sampler, referred to as IS_linear. Given a coordinate c and a scan S, the intensity value at c, referred to as I^S_c, is obtained from the intensities of the eight surrounding voxels,

$$I^S_c = \sum_{c_i \in Z(c)} S(c_i) \prod_{d \in \{x, y, z\}} \big(1 - |c^d - c_i^d|\big),$$

where Z(c) denotes the voxel neighbors of c, d iterates over the dimension index, and S(c_i) indicates the intensity value at voxel c_i on volume S.
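The linear intensity sampler can be sketched as plain trilinear interpolation over the eight surrounding voxels. The code below is a standard-library illustration, not the differentiable sampler used in the experiments; the nested-list volume layout `volume[x][y][z]`, the clamping of out-of-range coordinates, and the requirement that every dimension has at least two voxels are assumptions of this sketch.

```python
import math


def trilinear_sample(volume, c):
    """Interpolate volume[x][y][z] at a continuous coordinate c = (cx, cy, cz)."""
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    # Clamp the query so that all eight neighbours lie inside the volume.
    c = [min(max(ck, 0.0), dk - 1.0) for ck, dk in zip(c, dims)]
    base = [min(int(math.floor(ck)), dk - 2) for ck, dk in zip(c, dims)]
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                corner = (base[0] + dx, base[1] + dy, base[2] + dz)
                # Weight is the product over axes of (1 - |c^d - c_i^d|).
                weight = 1.0
                for ck, vk in zip(c, corner):
                    weight *= 1.0 - abs(ck - vk)
                value += weight * volume[corner[0]][corner[1]][corner[2]]
    return value


# A linear test volume, S(x, y, z) = x + 2y + 3z, which trilinear interpolation reproduces.
volume = [[[x + 2 * y + 3 * z for z in range(3)] for y in range(3)] for x in range(3)]
sample = trilinear_sample(volume, (0.5, 1.25, 1.75))
```

Because each weight is piecewise linear in the query coordinate, the sampled intensity has well-defined gradients with respect to c almost everywhere, which is what gradient-based optimization of the deformation needs.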

Objective Functions
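Local normalized cross-correlation is adopted to measure the intensity similarity. Let Ī^S_c denote the mean intensity of the local region centered at position c on volume S. In our experiments,

$$\bar{I}^S_c = \frac{1}{w^3} \sum_{c_i} I^S_{c_i},$$

where c_i iterates over the local region of size w^3 with w = 9. The local normalized cross-correlation can then be defined as

$$\mathrm{LNCC}(p, p') = \frac{\Big(\sum_{i} \big(I^T_{p_i} - \bar{I}^T_{p}\big)\big(I^M_{p'_i} - \bar{I}^M_{p'}\big)\Big)^2}{\sum_{i} \big(I^T_{p_i} - \bar{I}^T_{p}\big)^2 \; \sum_{i} \big(I^M_{p'_i} - \bar{I}^M_{p'}\big)^2},$$

where p denotes the sampled position in the coordinate space of the target volume T, p′ denotes the deformed position in the coordinate space of the moving volume M, and p_i and p′_i iterate over the corresponding local regions. As for the regularization term, we follow [35] and impose a Jacobian determinant penalty on the predicted deformation field. The Jacobian matrix of the deformation field φ at a position p, denoted J_φ(p), is given by

$$J_\phi(p) = \begin{bmatrix} \dfrac{\partial \phi_{px}}{\partial x} & \dfrac{\partial \phi_{px}}{\partial y} & \dfrac{\partial \phi_{px}}{\partial z} \\[2mm] \dfrac{\partial \phi_{py}}{\partial x} & \dfrac{\partial \phi_{py}}{\partial y} & \dfrac{\partial \phi_{py}}{\partial z} \\[2mm] \dfrac{\partial \phi_{pz}}{\partial x} & \dfrac{\partial \phi_{pz}}{\partial y} & \dfrac{\partial \phi_{pz}}{\partial z} \end{bmatrix},$$

where φ_p = [φ_px, φ_py, φ_pz] ∈ R^3 denotes the deformation vector at position p. If |J_φ(p)| is positive, the deformation field preserves the local orientation near p. Conversely, if |J_φ(p)| is negative, the deformation field reverses the local orientation around p. Thus, the local orientation consistency constraint can be defined as

$$\mathrm{LOCC}(p) = \max\big(0, -|J_\phi(p)|\big),$$

which only penalizes the regions with negative Jacobian determinants. In our experiments, J_φ(p) is approximated by the differences between neighboring deformation vectors. Overall, the objective function L can be expressed as the weighted sum of the intensity similarity L_sim and the Jacobian determinant regularization L_reg,

$$\mathcal{L} = \mathcal{L}_{sim} + \lambda_{reg}\, \mathcal{L}_{reg}, \quad \text{where} \quad \mathcal{L}_{sim} = -\frac{1}{N} \sum_{j=1}^{N} \mathrm{LNCC}(p_j, p'_j), \qquad \mathcal{L}_{reg} = \frac{1}{N} \sum_{j=1}^{N} \mathrm{LOCC}(p_j).$$

Here, N is the total number of sampled locations per optimization iteration and p_j denotes the j-th location sampled in one batch.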
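The Jacobian determinant penalty can be illustrated with forward finite differences on a regular grid, in the spirit of approximating the Jacobian by differences between neighboring deformation vectors. This standard-library sketch assumes that `phi[i][j][k]` stores the full mapped position of grid point (i, j, k), i.e. identity plus displacement, and that the grid spacing is 1; both conventions, and all function names, are assumptions rather than the exact scheme used in the experiments.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def jacobian_penalty(phi, spacing=1.0):
    """Mean of max(0, -det J) over grid points that have a forward neighbour on every axis."""
    nx, ny, nz = len(phi), len(phi[0]), len(phi[0][0])
    total, count = 0.0, 0
    for i in range(nx - 1):
        for j in range(ny - 1):
            for k in range(nz - 1):
                here = phi[i][j][k]
                neighbours = (phi[i + 1][j][k], phi[i][j + 1][k], phi[i][j][k + 1])
                # Row r is the component of phi; column a is the grid axis of the difference.
                jac = [
                    [(neighbours[a][r] - here[r]) / spacing for a in range(3)]
                    for r in range(3)
                ]
                total += max(0.0, -det3(jac))
                count += 1
    return total / count


def make_grid_map(f, n=4):
    """Helper for the example: evaluate a mapping f(i, j, k) on an n^3 grid."""
    return [[[f(i, j, k) for k in range(n)] for j in range(n)] for i in range(n)]


identity_penalty = jacobian_penalty(make_grid_map(lambda i, j, k: (i, j, k)))  # no folding
flipped_penalty = jacobian_penalty(make_grid_map(lambda i, j, k: (-i, j, k)))  # orientation reversed
```

The identity map incurs no penalty, whereas a map that mirrors one axis has determinant −1 everywhere and is penalized at every grid point. The finite differences need regularly spaced samples, which is why the choice of coordinate sampler matters for this term.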

Coordinate Sampler
To optimize the parameters of our neural fields, we apply mini-batch stochastic gradient descent method.In other words, we sample a subset of coordinates of the whole image grid to update the model parameters per iteration in optimization.Next, we will discuss three different coordinate samplers: random sampler, downsize sampler and mini-patch sampler.
Random Sampler (Fig. 3a) is most commonly used in coordinate-based neural networks [13,39,55], because the coordinates sampled via a random sampler are distributed across the whole grid and the unbiased samples allow for more stable optimization. However, the random coordinate sampler is inapplicable in our case. To compute LNCC, we would need to search for the closest coordinates among all sampled coordinates to estimate the local intensity mean and correlation, so the optimization speed would be significantly impeded by the search time. Moreover, randomly sampling coordinates leads to larger memory consumption for calculating LOCC. As mentioned in Sec. 3.4.2, we approximate the Jacobian matrix of the deformation field by discretizing the image coordinate space, which requires the coordinates to be sampled with spatial regularity. If the sampled coordinates are distributed randomly, the Jacobian matrix requires extra memory for the second-order derivatives of the deformation field with respect to the model parameters during optimization. Considering these time and memory inefficiencies, the random coordinate sampler is an impractical choice for NIR.
Downsize Sampler samples coordinates with a specific step size in each dimension, as shown in Fig. 3b. Coordinates sampled by the downsize sampler cover the entire image coordinate space well, but the approximation of the Jacobian matrix is less accurate due to downsizing. As a consequence, neural fields optimized via the downsize coordinate sampler achieve high alignment accuracy but relatively poor local orientation consistency in the deformations. The down-sampling step size used in the downsize coordinate sampler is set to 3 along all dimensions in our experiments.
Mini-Patch Sampler randomly selects multiple small high-resolution coordinate blocks, as shown in Fig. 3c. Compared to the downsize coordinate sampler, it provides a more accurate Jacobian matrix approximation, but the drawback lies in the computation of the local normalized cross-correlation. Specifically, the extensive padding operations along the patch borders result in an inaccurate local normalized cross-correlation. Thus, neural fields optimized via the mini-patch coordinate sampler are good at registration regularity but poor at alignment accuracy. In our experiments, the mini-patch coordinate sampler randomly selects 5 patches of size 32 × 32 × 32 per optimization iteration.
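The two spatially regular samplers can be sketched as follows, using the settings stated above (step size 3 for the downsize sampler; 5 patches of size 32 × 32 × 32 for the mini-patch sampler). The random per-axis offset in `downsize_sampler` is an assumption about how randomness enters that sampler, and all function names are illustrative.

```python
import random


def downsize_sampler(shape, step=3, rng=random):
    """Regular sub-grid with stride `step` and a random offset in [0, step) per axis (assumed)."""
    offset = [rng.randrange(step) for _ in shape]
    return [
        (x, y, z)
        for x in range(offset[0], shape[0], step)
        for y in range(offset[1], shape[1], step)
        for z in range(offset[2], shape[2], step)
    ]


def mini_patch_sampler(shape, n_patches=5, patch=32, rng=random):
    """Corner indices of n_patches randomly placed, fully dense patch^3 blocks."""
    return [
        tuple(rng.randrange(dim - patch + 1) for dim in shape)
        for _ in range(n_patches)
    ]


def patch_coordinates(corner, patch=32):
    """All voxel coordinates inside the patch that starts at `corner`."""
    cx, cy, cz = corner
    return [
        (x, y, z)
        for x in range(cx, cx + patch)
        for y in range(cy, cy + patch)
        for z in range(cz, cz + patch)
    ]


rng = random.Random(0)
coarse = downsize_sampler((160, 192, 144), step=3, rng=rng)  # whole volume, every third voxel
corners = mini_patch_sampler((160, 192, 144), rng=rng)       # five 32^3 blocks
```

With a 160 × 192 × 144 volume, the downsize sampler returns roughly 1.6 × 10^5 coordinates spanning the whole volume, while five 32^3 patches contain 163,840 coordinates at full resolution, so the two samplers have comparable per-iteration cost but very different spatial coverage.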
Fig. 3d ranks the registration performance of the two candidate coordinate samplers with spatial regularity on four criteria: accuracy, regularity, memory consumption, and convergence speed. It is apparent from Fig. 3d that neither coordinate sampling strategy outperforms the other on all criteria. The downsize coordinate sampler does well on every criterion except registration regularity, which happens to be the strength of the mini-patch coordinate sampler. The desired solution should have high registration accuracy, minor distortions in the deformation field, a rapid convergence rate and low memory consumption, as indicated by the red-dot region in Fig. 3d. Please refer to Sec. 4.5 and Sec. 3.4.3 for more details on the evaluation metrics and quantitative comparisons of the downsize sampler and the mini-patch sampler.

Overview
We intend to exploit the complementarity of the downsize and mini-patch coordinate samplers without a substantial increase in memory and time consumption during optimization. To this end, we propose a hybrid coordinate sampler which applies two different coordinate sampling strategies in two phases of optimization. As shown in Fig. 4, NIR with a hybrid coordinate sampler consists of two concatenated neural fields optimized separately. The first neural field (NF_1) is in charge of the rough alignment between the moving and target scans, and the residual transformation is completed by another neural field (NF_2). In inference, NF_1 and NF_2 deform the whole grid in cascade, which means the output of NF_1 is taken as the input of NF_2, and NF_2 then outputs the final deformed grid. As for optimization, the parameters of NF_1 and NF_2 are updated with the downsize sampler CS_1 and the mini-patch sampler CS_2 separately in two phases, as depicted in Fig. 4a.

Optimization
The example in Fig. 5 demonstrates that the downsize sampler can generate more accurate registration at the price of more distortions in the deformation field, while the mini-patch sampler tends to produce over-smooth deformation fields and converges much more slowly. We also noticed that, in the very early stage of optimization, the regularity of the deformation field from NIR optimized with the downsize sampler is well preserved and, at the same time, the registration accuracy is quite decent. But as optimization proceeds, the fraction of positions with a negative Jacobian determinant increases considerably. One possible explanation is that, owing to the spectral bias of neural networks, the fields tend to reconstruct the lower-frequency signal at the beginning of optimization.
The optimization strategy for NIR with a hybrid sampler is motivated by the above observations and is conducted in two phases. In the first phase, NF_1 is optimized with the downsize coordinate sampler CS_1 for a short time. After the first-phase optimization, NF_1 is able to generate a smooth and relatively accurate transformation. The goal of the second-phase optimization is then to let NF_2 complete the transformation left unfinished by NF_1. In the second phase, we prefer a neural registration field that can align the more detailed structures without disrupting the underlying topology. For this reason, the input coordinates p_2 for the second-phase optimization are sampled by the mini-patch sampler CS_2, and the initial deformed positions p′_2 for NF_2 are predicted by NF_1. Notably, taking optimization stability and memory efficiency into account, only NF_2 is optimized in the second-phase optimization while the weights of NF_1 are frozen.
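Similar to NIR, the optimization objective function of NIR with a hybrid sampler is given by

$$\mathcal{L} = w_1 \big(\mathcal{L}^1_{sim} + \lambda_{reg}\, \mathcal{L}^1_{reg}\big) + w_2 \big(\mathcal{L}^2_{sim} + \lambda_{reg}\, \mathcal{L}^2_{reg}\big),$$

where w_1 = 1, w_2 = 0 in the first phase of optimization and w_1 = 0, w_2 = 1 in the second phase. L^1_sim, L^2_sim, L^1_reg and L^2_reg follow the same definitions as introduced in Sec. 3.4.2.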
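The two-phase schedule can be summarized in a short sketch: a phase switch that selects (w_1, w_2), the resulting weighted objective, and the cascaded inference through NF_1 and then NF_2. The default of 200 first-phase iterations matches the setting reported in the implementation details; the function names and the convention that each per-field loss already includes its regularization term are assumptions of this illustration.

```python
def phase_weights(iteration, phase1_iters=200):
    """Return (w1, w2): only NF1 is optimized in phase one, only NF2 in phase two."""
    return (1, 0) if iteration < phase1_iters else (0, 1)


def hybrid_loss(iteration, loss_nf1, loss_nf2, phase1_iters=200):
    """Weighted objective w1 * L1 + w2 * L2; each term already includes its regularizer."""
    w1, w2 = phase_weights(iteration, phase1_iters)
    return w1 * loss_nf1 + w2 * loss_nf2


def cascade(nf1, nf2, p):
    """Inference: NF1 deforms p first, then NF2 refines the result."""
    return nf2(nf1(p))


# Toy stand-ins for the two neural fields (hypothetical, for illustration only).
coarse_field = lambda p: [x + 1.0 for x in p]
residual_field = lambda p: [x + 0.1 for x in p]
final_position = cascade(coarse_field, residual_field, [0.0, 1.0, 2.0])
```

Since w_1 = 0 throughout the second phase, no gradient reaches NF_1 through this objective, which is consistent with keeping its weights frozen.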

Dataset
All our experiments are conducted on two public 3D brain MR datasets: Mindboggle101 and OASIS.
Mindboggle101 [33] consists of 101 T1-weighted MR scans of healthy participants coming from 5 data sources. We select 31 cortical regions, as in [66], for evaluation.
The OASIS dataset contains 416 T1-weighted MR images of subjects aged from 18 to 96, including individuals with early-stage Alzheimer's Disease (AD). [28] annotated 35 anatomical structures associated with the OASIS dataset, and 27 of them are selected for performance evaluation in our experiments.
All MRI images used in our experiments are preprocessed by the same sequence of procedures: skull stripping, resampling to 1mm × 1mm × 1mm spacing, affine alignment to the MNI template of T1-weighted MRI imaging [20,21], and cropping to a size of 160 × 192 × 144. Since all images have already been aligned to the MNI template, we focus on the non-linear deformation between pairs of images in our experiments.

Implementation Details
The neural displacement field in Sec. 3.3.3 contains 4 fully connected layers with a hidden feature size of 256. Due to the computational inefficiency of NODE, we design a shallower SIREN model for the neural velocity field in Sec. 3.3.4, which contains 3 fully connected layers with a hidden feature size of 256. The network parameters are updated using the Adam optimizer [32] with a learning rate of 1e-4. During optimization, λ_reg is set to 1000 for our displacement-based deformable registration methods and 100 for our diffeomorphic registration methods. For NIR, the maximum number of optimization iterations is set to 900. For NIR with a hybrid coordinate sampler, the first phase is optimized for 200 iterations and the second phase is further optimized for 900 iterations. Our framework is implemented with PyTorch [42] and all experiments are run on a machine with an NVIDIA RTX 2080 Ti GPU and an Intel i7-7700K CPU.

Experimental Setup
All medical image registration methods in our experiments aim to transform a moving volume to a target volume. If the structure labels associated with the moving volume are available, we can map the structures onto the target volume via the transformation obtained from the image registration task. The registration results are evaluated on the similarity between target volumes (structures) and warped volumes (structures), and on the local orientation consistency of the deformation fields.
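The label-mapping step can be sketched in a few lines. This is a 2D toy under stated assumptions: the transformation maps each target grid coordinate to a (possibly fractional) coordinate in the moving image, labels are looked up with nearest-neighbour rounding so they stay integer, and `warp_labels` and `shift` are hypothetical names rather than part of the framework.

```python
def warp_labels(moving, transform):
    """Warp a 2D label map (list of rows) onto a target grid of the same size.

    transform maps a target coordinate (row, col) to a coordinate in the
    moving image; positions falling outside the image get background 0.
    """
    rows, cols = len(moving), len(moving[0])
    warped = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            mr, mc = transform((r, c))
            mr, mc = int(round(mr)), int(round(mc))  # nearest-neighbour lookup
            if 0 <= mr < rows and 0 <= mc < cols:
                warped[r][c] = moving[mr][mc]
    return warped


moving = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


def shift(p):
    """Hypothetical transformation: look one column to the left in the moving image."""
    return (p[0], p[1] - 1.0)


warped = warp_labels(moving, shift)
```

Intensity volumes would normally be warped with linear interpolation instead; nearest-neighbour lookup is used here only because label values must not be blended.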
In this paper, we conduct three groups of unsupervised registration experiments where (1) 5 moving scans and 40 target scans both come from the Mindboggle101 dataset; (2) 5 moving scans and 40 target scans both come from the OASIS dataset; (3) 3 moving scans of healthy brains come from the Mindboggle101 dataset and 20 target scans with Alzheimer's disease come from the OASIS dataset. In experiment (1), 45 image scans with structure labels are randomly selected for testing and the remaining data in the Mindboggle101 dataset are used for training the learning-based models in comparison. In experiment (2), we randomly select 45 cases for testing, and 250 of the remaining cases are randomly selected as training data for the learning-based methods in comparison. The purpose of the first two experiments is to assess the applicability of our proposed methods to the brain MRI registration task. In experiment (3), 3 moving volumes are randomly selected from the Mindboggle101 dataset and 20 target volumes are randomly selected from the OASIS cases with a Clinical Dementia Rating (CDR) larger than 0.5, i.e., patients who have been diagnosed with moderate Alzheimer's disease. By comparing with the learning-based methods, this experiment helps reveal the robustness of our proposed optimization-based methods.
To determine the hyperparameters of our proposed methods, such as the learning rate and the regularization weight, 15 image pairs are randomly selected from the Mindboggle101 dataset as the validation set. These hyperparameters are fixed across the above three groups of experiments.

Methods in comparisons
In Sec. 3.1.2, we introduced two types of neural fields, for displacement-based deformable registration and for diffeomorphic registration respectively, both of which can be integrated into the framework of NIR (Sec. 3.3) and NIR with a hybrid coordinate sampler (Sec. 3.5). We provide several options for running NIR, and their names are listed in Tab. 1. Next, we go through the baseline methods and their training or optimization recipes in our experiments.
VoxelMorph is a well-known learning-based method enabling pairwise, deformable 3D medical image registration. In our experiments, we followed the original setting and trained VoxelMorph using LNCC as the similarity metric, with a regularization term weighted by 1 to enforce the smoothness of the predicted displacement field.
SYMNet is the state-of-the-art learning-based method for diffeomorphic deformable image registration. Unlike VoxelMorph, which directly predicts the displacement field, SYMNet is a symmetric image registration method that maximizes the image similarity inside the space of diffeomorphic deformations and estimates the forward and backward transformations simultaneously. We trained SYMNet using LNCC with the parameters suggested by the original paper. Specifically, the weights for penalizing negative Jacobian determinants, enforcing the smoothness of the velocity field, and constraining the bias of the bidirectional velocity field are set to 1000, 3 and 0.1 respectively.
SyN is a popular diffeomorphic registration method, and we used the implementation in DIPY [23]. We take CC as the metric with a sampling radius of 4 and a Gaussian smoothing kernel standard deviation of 2. The results reported in Tab. 2 are achieved with maximum optimization iterations of {100, 50, 25} at each level. We did not use the official implementation in the Advanced Normalization Tools (ANTs) [5] because it takes over one hour to register one pair of images with CC as the metric, even with only {40, 20} optimization iterations at two scales, which is impractical for our experiments.
NiftyReg is a fast free-form deformation algorithm for non-rigid registration. In our experiments, cubic B-spline interpolation is used to deform moving volumes to optimize LNCC image similarity, with the squared log of the Jacobian determinant as a penalty term. The standard deviation of the Gaussian kernel and the weight of the penalty term are set to 40 and 0.01 respectively, as suggested by [52]. In addition, three scales are used in optimization with maximum optimization iterations of {1200, 600, 300} for each scale.
Grid is our implementation of a Demons-based deformable image registration method [43] optimized by gradient descent, similar to the approach of the Autograd Image Registration Laboratory (airlab) [48]. Grid applies the same optimization objective functions, intensity sampler and optimizer as NIR-D. However, instead of modeling the deformation field as a coordinate-based MLP, Grid directly optimizes the displacement vector of each grid coordinate via gradient descent. Also, all grid coordinates are sampled in each optimization iteration. We do not intend to fully reproduce the original Demons in PyTorch; rather, if NIR-D significantly outperforms Grid, this supports the expressive power of neural fields in modeling deformation fields.

Evaluation Metrics
All methods in comparison aim to map the moving volumes to the target volumes. If the moving and target volumes come from the same dataset, Dice's coefficient (DSC) and the ratio of coordinates with non-positive Jacobian determinant ($J_{\le 0}$) are used for evaluation. If the moving and target volumes do not share the same annotations, we apply the Structural Similarity Index (SSIM) and $J_{\le 0}$ for evaluation instead.
Two types of DSC, volumetric DSC for OASIS data and surface DSC for Mindboggle101 data, are used to evaluate the overlap between two regions. Given the target mask $M_T$ and warped mask $M_W$, the volumetric DSC can be written as

$$DSC_v = \frac{2\,|M_T \cap M_W|}{|M_T| + |M_W|}.$$

As pointed out by [45], volumetric measurements may yield the same evaluation score for substantially different regions with complex shapes. In such cases, e.g., cortical structures, boundary-based measures are preferred. To be more specific, we use the surface DSC introduced by [40] to evaluate the alignment accuracy in experiment (1). Rather than the overlap of two volumetric regions, the surface DSC assesses the overlap of two surfaces within a specific tolerance $\tau$, formulated as

$$DSC_s^{\tau} = \frac{|S_T \cap B_W^{\tau}| + |S_W \cap B_T^{\tau}|}{|S_T| + |S_W|},$$

where $S_i$ refers to the surface of mask $M_i$, and $B_i^{\tau}$ denotes the border region of the surface $S_i$ within a tolerance $\tau$, which is 1mm in our experiments. For more details about the surface DSC, please refer to [40]. Both volumetric and surface DSC range from 0 to 1, and a higher score represents better registration accuracy. The final reported scores are the average DSC of all structures over all pairs.
$J_{\le 0}$ measures the regularity of deformation fields as the ratio of coordinates with non-positive Jacobian determinant. The Jacobian matrix contains the derivatives of the deformation and characterizes the local behaviour of the deformation field. Only local regions with a positive Jacobian determinant are transformed in a topology-preserving and invertible way, so a larger $J_{\le 0}$ signifies worse registration regularity. The calculation of the Jacobian matrix of deformations is given in Sec. 3.4.2.
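As a rough illustration of the two overlap scores, the sketch below computes both on masks stored as sets of integer voxel coordinates. It is a brute-force, voxel-count approximation under stated assumptions (1mm isotropic voxels, surfaces taken as mask voxels with a 6-connected neighbour outside the mask); the reference implementation of [40] may differ in detail, and all function names are hypothetical.

```python
def volumetric_dsc(mask_t, mask_w):
    """2|M_T n M_W| / (|M_T| + |M_W|) for masks given as sets of voxels."""
    total = len(mask_t) + len(mask_w)
    return 2.0 * len(mask_t & mask_w) / total if total else 1.0


def surface_voxels(mask):
    """Mask voxels with at least one 6-connected neighbour outside the mask."""
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return {
        (x, y, z)
        for (x, y, z) in mask
        if any((x + dx, y + dy, z + dz) not in mask for dx, dy, dz in steps)
    }


def count_within(points, others, tol):
    """Number of points lying within distance tol of any point in others."""
    tol2 = tol * tol
    return sum(
        1
        for p in points
        if any(sum((a - b) ** 2 for a, b in zip(p, q)) <= tol2 for q in others)
    )


def surface_dsc(mask_t, mask_w, tol=1.0):
    """(|S_T n B_W| + |S_W n B_T|) / (|S_T| + |S_W|), evaluated by brute force."""
    s_t, s_w = surface_voxels(mask_t), surface_voxels(mask_w)
    total = len(s_t) + len(s_w)
    if total == 0:
        return 1.0
    return (count_within(s_t, s_w, tol) + count_within(s_w, s_t, tol)) / total


def cube(x0, size=3):
    """Hypothetical test mask: a size^3 cube whose corner sits at (x0, 0, 0)."""
    return {(x0 + i, j, k) for i in range(size) for j in range(size) for k in range(size)}


target, warped = cube(0), cube(2)  # two cubes overlapping in a single slab
dsc_v = volumetric_dsc(target, warped)
dsc_s = surface_dsc(target, warped, tol=1.0)
```

On this toy pair the surface score is higher than the volumetric one, because surface voxels one step away from the other surface still count as matched within the 1mm tolerance.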
SSIM [60] is a weighted combination of three comparison measurements between two images: luminance, contrast and structure. SSIM ranges between 0 and 1, with larger values representing higher similarity between image pairs. Please refer to the original paper [60] for the calculation details.
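For orientation only, the widely used simplified SSIM expression can be sketched for a single window as below. This assumes equal exponents for the three terms and the customary constants $k_1 = 0.01$ and $k_2 = 0.03$; practical implementations average the score over many local windows, and `global_ssim` is a hypothetical name, not the evaluation code used in the experiments.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally long intensity sequences."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [0.1, 0.4, 0.7, 0.9]
darker = [0.5 * v for v in reference]  # same structure, lower luminance and contrast
same_score = global_ssim(reference, reference)
darker_score = global_ssim(reference, darker)
```

An image compared with itself scores 1, while the darker copy scores lower even though its structure is unchanged, because the luminance and contrast terms are penalized.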

Quantitative Comparisons with Baselines
In this section, we present the registration performance comparison among the baseline methods and our proposed methods. The scores of our proposed methods reported in Tab. 2 and Tab. 3 are achieved with 900 optimization iterations. Tab. 2 shows the performance comparisons among all methods in experiments (1) and (2) over three criteria: registration accuracy, registration regularity, and maximum GPU memory consumption during optimization and inference.
Compared with the learning-based methods (VoxelMorph and SYMNet), our GPU memory cost for optimization is less than half of the memory that they consume for training. Moreover, the GPU memory consumption of our methods for optimization is close to that of the two learning-based methods for inference. In terms of registration accuracy, our NIR-H and NIR-H-Diff both perform better than the learning-based methods in both experiments (1) and (2). Especially in experiment (1), our advantage in alignment accuracy applies to almost all annotated structure groups, as shown in Fig. 6a. As for registration regularity, our NIR-H-Diff also achieves a lower $J_{\le 0}$ than SYMNet, the other learning-based diffeomorphic registration method. It can be observed from Tab. 2 that our performance edge over the learning-based methods gets smaller in experiment (2). A possible explanation is that more data are available for training and all data in the OASIS dataset were collected at the same institution following similar scanning protocols, so the generalization issue of the learning-based methods is not fully exposed in this experiment.
Compared with the optimization-based registration methods (NiftyReg and SyN), our NIR-H-Diff and NIR-H both provide high-accuracy registration, but only NIR-H-Diff achieves top performance in terms of registration regularity. NIR-H is not comparable with the diffeomorphic registration method, i.e., SyN, in the $J_{\le 0}$ metric, but it generates only about 1/10 of the folds in deformation fields compared with NiftyReg. Fig. 6a and Fig. 6b present a closer inspection of the registration accuracy of the methods in comparison, from which we can tell that our proposed method achieves the best performance in 9 out of 12 structure groups in experiment (1) and 9 out of 15 structure groups in experiment (2). Another important criterion for assessing optimization-based methods is the relationship between performance and optimization duration, which is discussed in Sec. 4.9.
In Tab. 2, surface Dice's coefficient within 1mm tolerance ($DSC_s^{1mm}$) and volumetric Dice's coefficient ($DSC_v$) are applied to the Mindboggle101 and OASIS datasets, respectively, for registration accuracy. The ratio of coordinates with non-positive Jacobian determinant ($J_{\le 0}$) is used to evaluate registration regularity. The GPU memory consumption in optimizing the hybrid NIR models varies between the two phases because the numbers of sampled coordinates differ. Specifically, the numbers of coordinates sampled by the downsize sampler and the mini-patch sampler in the two phases are 165888 and 163840, respectively. Thus, the maximum GPU memory consumption for hybrid NIR models comes from the first phase of optimization.
Compared with Grid, our NIR-H is significantly better in terms of registration accuracy (>0.6 greater in DSC) and regularity (≈10x smaller in $J_{\le 0}$), while consuming less GPU memory. Because Grid and NIR-H share a similar optimization process and differ mainly in how they describe the deformation fields, the significant advantage of our proposed method may suggest the effectiveness of neural fields in modeling deformation fields.
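The regularity figures compared here are ratios of grid locations whose Jacobian determinant is non-positive. A minimal 2D sketch of that ratio is given below; it uses forward finite differences on a deformed grid, which is an assumption made for brevity (the Jacobian used in the paper is defined in Sec. 3.4.2), and `fold_ratio_2d` is a hypothetical name.

```python
def fold_ratio_2d(phi):
    """Fraction of grid cells with non-positive Jacobian determinant.

    phi[r][c] is the deformed position (y, x) of grid point (r, c); the
    Jacobian is approximated with forward differences along rows and columns.
    """
    rows, cols = len(phi), len(phi[0])
    folded = 0
    total = (rows - 1) * (cols - 1)
    for r in range(rows - 1):
        for c in range(cols - 1):
            y0, x0 = phi[r][c]
            y_r, x_r = phi[r + 1][c]  # neighbour along the row axis
            y_c, x_c = phi[r][c + 1]  # neighbour along the column axis
            det = (y_r - y0) * (x_c - x0) - (x_r - x0) * (y_c - y0)
            if det <= 0:
                folded += 1
    return folded / total


identity = [[(float(r), float(c)) for c in range(4)] for r in range(4)]
folded_grid = [row[:] for row in identity]
folded_grid[1][1] = (1.0, 2.5)  # push one grid point past its right-hand neighbour

ratio_identity = fold_ratio_2d(identity)
ratio_folded = fold_ratio_2d(folded_grid)
```

The identity grid has no folds, whereas moving a single point past its neighbour flips the orientation of one of the nine cells.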
Furthermore, Tab. 3 compares the performance of NIR-H, NIR-H-Diff and the two learning-based methods in experiment (3). As the moving volumes are healthy scans from the Mindboggle101 dataset and the 20 target scans with Alzheimer's disease come from the OASIS dataset, the results in experiment (3) may indicate the robustness of an algorithm against modest domain shift. It is apparent from Tab. 3 that, compared with our proposed method, learning-based methods trained on one dataset do not generalize as well to image pairs coming from different datasets with different health status. To be specific, our NIR-H-Diff method achieves almost 0.1 higher SSIM and more than 10x fewer folds [52] in the predicted deformation fields, compared with the state-of-the-art learning-based method SYMNet.

Qualitative Comparisons with Baselines
Fig. 7 presents the qualitative comparisons between our proposed methods and the baseline methods. From Fig. 7, we can see that all registration methods in comparison align the subcortical regions well, and the differences lie in the cortical regions within the white dotted boxes. Since the aligned structures in experiment (2) are mostly located in the subcortical regions, the gap between our methods and the baseline methods in experiment (2) should not be as significant as that in experiment (1), which agrees with the results in Tab. 2 and Fig. 6. Parameterized by a neural field with well-behaved derivatives, the velocity flow is continuously differentiable in NIR-H-Diff. Therefore, our registration method does not require explicit smoothing operations over the velocity flow to guarantee a diffeomorphic transformation. This might explain why NIR-D-Diff can generate deformations with more diverse magnitudes and local orientations on the brain surface, while topology preservation is barely affected.

Ablation Study

Influence of Coordinate Samplers
In Sec. 3.4.3, we visually compared the effect of different coordinate samplers on registration performance; here we present a quantitative comparison to support our analysis and design. Tab. 4 shows the differences in diffeomorphic registration performance in experiment (2) that result from the choice of coordinate sampler. In the table, there is a clear trend that the registration accuracy of both NIR-D-Diff and NIR-P-Diff improves over time, while registration regularity deteriorates. Furthermore, the results indicate that NIR-D-Diff converges more quickly to a higher registration accuracy than NIR-P-Diff. On the other hand, NIR-P-Diff is able to maintain a very small $J_{\le 0}$ throughout the optimization process, whereas NIR-D-Diff is not. Based on these observations, we proposed NIR with a hybrid sampling scheme, whose registration performance in Tab. 4 justifies our design. NIR-H-Diff costs barely any extra memory for optimization and outperforms NIR-D-Diff and NIR-P-Diff in registration accuracy and regularity. It should be clarified that in Tab. 4 the iteration number for NIR-H-Diff refers to the second phase of optimization, which means that NIR-H-Diff requires 200 more iterations than the other two methods. Nevertheless, our design still meets our goal of developing a method that converges quickly and efficiently to a registration result with high accuracy and good topology preservation.

Influence of Regularization Weight $\lambda_{reg}$
We then investigate whether simply adjusting the weight of the regularization term can reach performance comparable to what NIR with a hybrid coordinate sampler achieves. All the diffeomorphic registration methods in Tab. 5 are run under the setting of experiment (2). Since we are not satisfied with the registration regularity of NIR-D-Diff, we increase the $\lambda_{reg}$ used for optimizing NIR-D-Diff. Conversely, we decrease the $\lambda_{reg}$ used for optimizing NIR-P-Diff in the hope of improving registration accuracy. It turns out that, for NIR-D-Diff to achieve a $J_{\le 0}$ comparable to NIR-H-Diff, $\lambda_{reg}$ needs to be 100 times larger than the original value, and its DSC is greatly decreased. Surprisingly, decreasing $\lambda_{reg}$ for NIR-P-Diff does not improve the registration accuracy as expected and instead harms the registration regularity. In short, the results in Tab. 5 support that NIR with a hybrid coordinate sampler is a more effective way to balance registration accuracy and regularity than simply adjusting the regularization weight.

Optimization Duration vs. Registration Performance
As for our proposed methods, we evaluate their registration performance at {100, 300, 600, 900} optimization iterations. To finish 100 optimization iterations, the displacement-field-based NIR methods take about 9s and the diffeomorphic NIR methods take about 64s. It needs to be clarified that the optimization iterations of NIR-H and NIR-H-Diff are counted from the start of the second-phase optimization. As for SyN, we evaluate the registration performance when setting the maximum optimization iterations to {8, 4, 2}, {20, 10, 5}, {60, 30, 15} and {100, 50, 25} at each level, and the corresponding average optimization times are about 117s, 261s, 829s and 1273s. For NiftyReg, we evaluate the registration performance with maximum optimization iterations of {120, 60, 30}, {400, 200, 100}, {800, 400, 200} and {1200, 600, 300} at each level, and the average optimization times are approximately 309s, 901s, 1638s and 2521s.
Among all methods in comparison, NIR-D has the fastest convergence speed and the highest $DSC_v$ (0.8435), but it also has the highest $J_{\le 0}$ (1.08e-03). NIR-H mitigates the registration regularity issue of NIR-D at the cost of lower registration accuracy. From Fig. 8, we can see that NIR-H has the potential to improve in both $DSC_v$ and $J_{\le 0}$ if the optimization duration is extended. NiftyReg performs relatively poorly in convergence speed, registration accuracy and registration regularity. Because GPU acceleration has been disabled in the official implementation of NiftyReg, optimizing with LNCC similarity becomes so time-consuming that it is even slower than the methods supporting diffeomorphic transformations.
NIR-D-Diff is capable of reaching a very decent registration accuracy ($DSC_v \ge 0.83$) in a short amount of time (≈ 200s), but the registration regularity degrades as the number of optimization iterations grows. NIR-H-Diff is proposed to achieve a better balance between registration accuracy and regularity. While achieving a $DSC_v$ similar to NIR-D-Diff, NIR-H-Diff obtains considerably more regular deformation fields, i.e., $J_{\le 0}$ stays lower than 5e-06 during optimization. SyN shows very strong performance in experiment (2), especially in terms of registration regularity. Despite this, Fig. 8 demonstrates that our approaches have two main advantages over SyN. First, when optimized for a similar duration, our NIR-D-Diff and NIR-H-Diff achieve higher $DSC_v$ scores than SyN. Second, the registration regularity of SyN gets significantly worse as the number of optimization iterations at the finer scale increases, and SyN ends with a higher $J_{\le 0}$ than our NIR-H-Diff.

Limitations and Future Directions
One major limitation of NIR is its running time. Although significantly faster than traditional optimization-based methods, it is still much slower than learning-based methods. There are a few potential approaches to address this limitation. First, we can design an adaptive coordinate sampler that samples coordinates sparsely in easy-to-align regions, but densely in the regions with large alignment errors. Second, NIR can be used in conjunction with a learning-based method in a two-step approach, using the learning-based method to generate an initial registration, followed by fine-tuning through NIR. Third, neural fields can also be integrated into a learning-based framework [41,56,72], where the coordinate-based MLPs and an embedding layer are learned from the training data. During inference, the parameters of the coordinate-based MLPs are fixed and only a latent code associated with the test data is optimized.
In addition, how to introduce surface registration into our image registration framework is a topic worth exploring. NIR establishes correspondence between image pairs to match voxel intensities. It is agnostic to anatomic structures within the images and thus does not always lead to semantically meaningful registrations. One future direction in this regard is to optimize both intensity and shape similarities between two images. Since shape registration can also be realized via neural fields, as we showed previously [56], neural fields provide a promising approach to unify intensity-based and shape-based registration within the same framework.

Conclusions
We presented a new optimization-based framework, named NIR, for deformable image registration. NIR uses coordinate-based MLPs with Fourier position encoding and sinusoidal activation functions to model deformation vector fields, and leverages the full power of existing deep learning toolboxes to solve the optimization efficiently.
We presented several options of running NIR, depending on the type of registrations (deformable or diffeomorphic) and the speed requirement: a) NIR-D: the fastest displacement-based deformable registration method; b) NIR-H: a rapid displacement-based deformable registration method with a better registration regularity compared to NIR-D; c) NIR-D-Diff: a diffeomorphic registration method with a good registration regularity; and d) NIR-H-Diff: a slightly slower diffeomorphic registration method with the best registration regularity.
We compared our methods with several benchmarks on two brain MRI datasets and showed that our methods achieve state-of-the-art performance in both registration accuracy and regularity. Compared to the traditional optimization-based methods, our methods achieve competitive results with significantly shorter running time. Compared to the learning-based methods, our methods show significantly better generalization ability.
Modeling deformation with neural fields offers two major advantages: it can model complex deformations with the expressive power of deep neural nets, and it can solve the optimization efficiently with existing deep learning toolboxes. We believe it offers an appealing alternative for solving the long-standing image registration problem.

Figure 1. Overview of NIR, an optimization-based pairwise medical image registration framework via neural fields. In each iteration of optimization, every position $p$ is sampled from the coordinates of the target volume $T$ and the deformed position $p'$ is predicted by the neural field (NF). The intensity similarity loss $L_{sim}$ between sampled image intensities is governed by the local normalized cross-correlation, and the regularization term $L_{jdet}$ penalizes the regions where the local deformation orientations are inconsistent, as formulated in Eq. 14. During inference, the neural field takes the whole grid as input and outputs the deformed positions of the whole grid. Then, by sampling the intensities of the moving volume at the deformed grid, we obtain the warped volume. Plot (b) only presents the transformation of moving volumes via NIR, but the structures associated with the moving volumes can also be transformed in the same way.
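To make the similarity term more concrete, the sketch below computes a normalized cross-correlation (NCC) over paired local windows and averages it into a loss. It is a simplified stand-in under stated assumptions: windows are passed in as flat lists of intensities, the loss is the negative mean NCC, and the exact LNCC formulation used by NIR may differ (for example, some implementations square the correlation). The names `ncc` and `lncc_loss` are hypothetical.

```python
import math


def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation between two equally long intensity lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b + eps)  # eps guards against flat windows


def lncc_loss(windows_a, windows_b):
    """Negative mean NCC over paired local windows (lower is better)."""
    scores = [ncc(a, b) for a, b in zip(windows_a, windows_b)]
    return -sum(scores) / len(scores)


aligned = ncc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # same pattern, different scale
inverted = ncc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # opposite pattern
loss = lncc_loss([[1.0, 2.0, 3.0]], [[2.0, 4.0, 6.0]])
```

Because the correlation is normalized per window, a window pair that differs only by a linear intensity rescaling still scores close to 1, which is why such measures are popular for MR images with varying contrast.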

Figure 2. Neural Fields for Coordinate Deformations. In the above figure, blue modules indicate the parameters to be optimized. (a) illustrates the neural deformation field that directly transforms the coordinate $p$ in the target volume to the coordinate $p'$ in the moving volume. (b) illustrates the neural velocity field, which predicts the stationary velocity vector along the deformation trajectory from $p$ to $p'$. The neural velocity field serves as the dynamics function of a NODE solver, and the final deformations are obtained by integrating the predicted velocity vectors.
define a continuous, invertible trajectory from the initial position $p = \Phi(p, 0)$ to the final position $p' = \Phi(p, 1)$, satisfying the following ordinary differential equation (ODE) and initial condition:
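For a stationary velocity field $v$, a flow of this kind is commonly written as $\partial \Phi(p, t) / \partial t = v(\Phi(p, t))$ with $\Phi(p, 0) = p$; the symbol $v$ and this exact form are assumptions here, since the equation itself is not reproduced above. A minimal sketch of integrating such a field from $t = 0$ to $t = 1$ follows. It uses a fixed-step Runge-Kutta scheme in place of the NODE solver and a hand-written rotation field in place of the neural velocity field, both chosen only for illustration.

```python
def integrate(velocity, p, t0=0.0, t1=1.0, steps=32):
    """Integrate dPhi/dt = velocity(Phi) from t0 to t1 with fixed-step RK4."""
    h = (t1 - t0) / steps  # negative when integrating backwards in time
    x = tuple(p)
    for _ in range(steps):
        k1 = velocity(x)
        k2 = velocity(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)))
        k3 = velocity(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)))
        k4 = velocity(tuple(xi + h * ki for xi, ki in zip(x, k3)))
        x = tuple(
            xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        )
    return x


def rotation_velocity(x):
    """Hypothetical smooth stationary field: rotation about the z-axis."""
    return (-x[1], x[0], 0.0)


p = (1.0, 0.0, 0.5)
p_prime = integrate(rotation_velocity, p)                  # Phi(p, 1)
p_back = integrate(rotation_velocity, p_prime, 0.0, -1.0)  # follow the flow backwards
```

Integrating the same field backwards in time returns the point to its start (up to solver error), which illustrates the invertibility that makes velocity-based deformations attractive for diffeomorphic registration.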

Figure 3. Coordinate Samplers and Performance Comparisons. (a), (b) and (c) illustrate sampling 16 coordinates per batch from a total of 64 2D coordinates with three kinds of coordinate samplers. (d) ranks the registration performance of NIR models optimized with two practical coordinate samplers (downsize sampler and mini-patch sampler) in four aspects. A higher ranking in each dimension indicates better performance in that aspect. As shown in (d), while consuming almost the same GPU memory during optimization, NIR optimized with the downsize sampler takes less time to converge to a more accurate registration result than NIR optimized with the mini-patch sampler, but with more violations of topology preservation. The expected solution, as indicated by the red dotted line, should perform well in both registration accuracy and regularity with no or only modest extra computation. For the numerical results supporting the ranking in plot (d), please refer to Tab. 4.
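The 16-out-of-64 setting of this figure can be mimicked with the toy samplers below. Their behaviour is an assumption made for illustration: the downsize sampler is modelled as a strided subgrid with a random offset, and the mini-patch sampler as a few small dense patches at random positions; the actual samplers may differ, and all names are hypothetical.

```python
import random


def downsize_sampler(shape, stride, rng):
    """Strided subgrid of a 2D grid, shifted by a random offset (assumed behaviour)."""
    off_r, off_c = rng.randrange(stride), rng.randrange(stride)
    return [
        (r, c)
        for r in range(off_r, shape[0], stride)
        for c in range(off_c, shape[1], stride)
    ]


def mini_patch_sampler(shape, patch, n_patches, rng):
    """Dense patch x patch blocks at random positions (assumed; blocks may overlap)."""
    coords = []
    for _ in range(n_patches):
        r0 = rng.randrange(shape[0] - patch + 1)
        c0 = rng.randrange(shape[1] - patch + 1)
        coords += [(r0 + dr, c0 + dc) for dr in range(patch) for dc in range(patch)]
    return coords


rng = random.Random(0)
down = downsize_sampler((8, 8), 2, rng)             # 4 x 4 = 16 spread-out coordinates
patches = mini_patch_sampler((8, 8), 2, 4, rng)     # 4 patches x 4 = 16 clustered coordinates
```

The first sampler covers the whole grid coarsely, while the second sees only a few regions but at full resolution. As a side note, taking every third voxel of a 160 × 192 × 144 volume starting from index 0 gives 54 × 64 × 48 = 165888 coordinates, which matches the count reported for the downsize sampler; the stride of 3 is an inference, not something stated in the text.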

Figure 4. Overview of NIR with Hybrid Coordinate Sampling Scheme. The optimization is composed of two phases, in which two neural fields (NF1 and NF2) are optimized separately. In the first phase, NF1 is optimized with the downsize coordinate sampler (CS1) for 200 iterations. In the second phase, with the mini-patch coordinate sampler (CS2), the fixed NF1 provides the initial deformations and only NF2 is optimized. During inference, NIR with a hybrid coordinate sampler requires grid coordinates to pass through the two neural fields in sequence to obtain the deformed coordinates.
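The inference path described in this caption amounts to composing two coordinate maps. A minimal sketch is shown below, with plain Python functions standing in for the two neural fields; the stand-in deformations and the assumption that each field returns a deformed coordinate (rather than a displacement to be added) are illustrative only.

```python
def compose(nf1, nf2):
    """Return a map that passes a coordinate through nf1 first, then nf2."""
    def deform(p):
        return nf2(nf1(p))
    return deform


def nf1(p):
    """Hypothetical first-phase field: a coarse global shift."""
    return tuple(x + 2.0 for x in p)


def nf2(p):
    """Hypothetical second-phase field: a small refinement on top of nf1."""
    return tuple(x * 1.1 for x in p)


hybrid = compose(nf1, nf2)
deformed = hybrid((0.0, 1.0, 2.0))
```

In the second optimization phase only the second map would be updated, so the first one behaves like a fixed initialization that the refinement builds on.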

Figure 5. Comparison of different coordinate samplers. (a), (b) and (c) are the registration results of NIR optimized with the downsize sampler, NIR optimized with the mini-patch sampler, and hybrid NIR, respectively. The image pair shown is 'OASIS_OAS1_0001_MR1' ($T$) and 'OASIS_OAS1_0002_MR1' ($M$) from the OASIS dataset, and we present the registration results over the optimization iterations, generated by the diffeomorphic NIR. DSC and $J_{\le 0}$ are the evaluation metrics for registration accuracy and regularity, respectively. Details about the dataset and evaluation metrics can be found in Sec. 4.

Figure 7. Qualitative Registration Performance Comparison of Different Methods. The models in the qualitative comparison are VoxelMorph, SYMNet, NiftyReg, NIR-H and NIR-H-Diff. In the above plots, we present two volume pairs from experiment (1) and experiment (2) in two views. The warped volumes generated by different methods are overlaid with the warped structures, which are indicated by colors. The key differences in the registration quality of different methods are highlighted by the white dotted boxes. The deformation fields are illustrated by the downsized deformed grid in blue, and the regions with negative Jacobian determinant are colored in red. The last rows in (a) and (b) show the quantitative performance of the different registration methods on that image pair. If $J_{\le 0}$ is less than 1e-06, we report it as ≈ 0.

Fig. 8 presents the relationship between registration performance in experiment (2) and optimization duration of six optimization-based registration methods, four of which are our proposed methods and the other two are SyN and NiftyReg.

Figure 8. Optimization Duration vs. Registration Performance in Experiment (2). The solid and dotted curves respectively illustrate the change of registration accuracy and regularity over optimization duration. In the bottom half of this plot, the higher a solid curve goes, the better the registration accuracy it indicates. In the top half of this plot, the lower a dotted curve goes, the better the registration regularity it reflects. Thus, visually speaking, a method is preferred if its solid and dotted curves get closer over time.

Table 1. Names of Options under NIR Framework

Table 2. Registration Performance Comparison on the Mindboggle101 and OASIS datasets. The GPU memory consumption for the learning-based methods is reported as "training consumption - inference consumption", whereas for our proposed methods it is the maximum memory consumption during optimization. Surface Dice's coefficient within 1mm tolerance is denoted $DSC_s^{1mm}$.

Table 3. Registration Performance Comparison on Experiment (3). 3 moving volumes from the Mindboggle101 dataset and 20 target volumes from the OASIS dataset are randomly selected to conduct the cross-dataset image registration experiments. The learning-based methods are trained with the training set of the Mindboggle101 dataset.

Table 4. Registration Performance Differences Resulting from Coordinate Samplers. The table compares diffeomorphic NIR frameworks optimized via the downsize sampler and the mini-patch sampler, as well as hybrid diffeomorphic NIR. The comparisons are based on registration accuracy ($DSC_v$), registration regularity ($J_{\le 0}$) and convergence speed (Iteration). The iteration number of NIR-H-Diff is that of the second phase of optimization. This table supports the qualitative comparisons of different coordinate samplers shown in Fig. 3d.

Table 5. Influence of Regularity Weight $\lambda_{jdet}$. The effect of $\lambda_{jdet}$ on the balance between $DSC_v$ and $J_{\le 0}$ can be observed on NIR-D-Diff and NIR-P-Diff. However, by merely adjusting the scale of $\lambda_{jdet}$, neither method can outperform NIR-H-Diff in terms of registration accuracy and regularity.