Adaptive Magnetic Hamiltonian Monte Carlo

Magnetic Hamiltonian Monte Carlo (MHMC) is a Markov Chain Monte Carlo method that expands on Hamiltonian Monte Carlo (HMC) by adding a magnetic field to the Hamiltonian dynamics. This magnetic field offers a great deal of flexibility over HMC and encourages more efficient exploration of the target posterior, resulting in faster convergence and lower autocorrelations in the generated samples compared to HMC. However, as with HMC, MHMC is sensitive to the user-specified trajectory length and step size, and automatically setting these parameters is yet to be considered in the literature. In this work, we present the Adaptive MHMC (A-MHMC) algorithm, which extends MHMC by automatically setting its parameters and thus eliminates the need for the user to manually set a trajectory length and step size. The trajectory length adaptation is based on an extension of the No-U-Turn Sampler (NUTS) methodology that incorporates the magnetic field present in MHMC, while the step size is set via dual averaging during the burn-in period. Empirical results based on experiments performed on jump diffusion processes calibrated to real-world financial market data, a simulation study using multivariate Gaussian distributions and real-world benchmark datasets modelled using Bayesian logistic regression show that A-MHMC outperforms MHMC and NUTS on an effective sample size basis. In addition, A-MHMC provides significant relative speed up (up to 40 times) over MHMC and produces similar time-normalised effective sample sizes relative to NUTS.


I. INTRODUCTION
Markov Chain Monte Carlo (MCMC) methods are a key tool for inference in probabilistic machine learning models [1,2]. One of the simplest MCMC methods is the random walk Metropolis-Hastings [3] algorithm, which suffers from random walk behaviour and consequently generates highly correlated samples [4]. This method has been extended using gradient-free MCMC techniques such as multiple-try Metropolis, delayed rejection techniques and methods that adapt the proposal distribution [5,6,4]. It has been further enhanced through methods that employ gradient information of the target posterior, such as the Metropolis Adjusted Langevin Algorithm (MALA) and Hamiltonian Monte Carlo (HMC) [7,8].
HMC has been successfully employed in the inference of probabilistic machine learning models and has been applied in various fields including renewable energy, health and cosmology [8,2,9,7,10,11,12,13,14,15,16,17]. This algorithm is the preferred MCMC method in practice due to its ability to incorporate first-order gradient information about the target posterior distribution. This leads to better exploration of the posterior and lower autocorrelations in the generated samples when compared to the Metropolis-Hastings algorithm, Gibbs sampling and MALA [8,2,7,9]. However, HMC still produces samples with relatively high autocorrelations [11,7].
Various extensions to HMC have been introduced to improve its efficiency. These include the Riemannian Manifold Hamiltonian Monte Carlo (RMHMC) method [7], which takes into account the local geometry of the target posterior; the No-U-Turn Sampler (NUTS) algorithm [18], which automatically tunes the HMC parameters; thermostat-assisted continuously-tempered Hamiltonian Monte Carlo [19,20], which better explores multi-modal target densities and also provides the Bayesian evidence metric that can be used for model selection; as well as Magnetic Hamiltonian Monte Carlo (MHMC) [11,14], which uses non-canonical Hamiltonian dynamics to better explore the posterior. Mongwe et al. [14] extend MHMC by utilising partial momentum refreshment, which they show enhances the performance of MHMC. These authors also show that using a random mass for the auxiliary momentum variable in MHMC, to mimic the behaviour of quantum particles, results in improved performance over MHMC and Quantum-Inspired Hamiltonian Monte Carlo [21,10].
MHMC improves the exploration of HMC by introducing a magnetic field in addition to the force field already in HMC.
The magnetic field provides an extra degree of freedom over HMC and results in efficient exploration of the target posterior and consequently faster convergence and lower autocorrelations in the generated samples when compared to HMC [11,22,23]. When the magnetic component of MHMC is set to zero, MHMC has exactly the same dynamics as HMC [11,22,23]. We leverage this key characteristic of MHMC in developing our proposed method.
The main impediment to HMC and MHMC being broadly used in practice is that they are very sensitive to the trajectory length and step size parameters [18,12]. If these parameters are not set correctly, sampling of the posterior distribution can become very inefficient. A trajectory length that is too short leads to random walk behaviour akin to the Metropolis-Hastings method [18], while a trajectory length that is too long results in a trajectory that traces back on itself [18,9]. On the other hand, small step sizes are computationally inefficient, leading to correlated samples and poor mixing, while large step sizes compound discretisation errors, leading to low acceptance rates [18,9,15,12]. Tuning these two parameters requires the practitioner to conduct multiple time consuming pilot runs [18,9,12].
The No-U-Turn Sampler (NUTS) of Hoffman and Gelman [18] automates the tuning of the step size and trajectory length parameters for HMC. The step size parameter is tuned during the burn-in phase using the primal dual averaging methodology by targeting a user-specified acceptance rate in the generated samples [18,12]. The trajectory length is set by iteratively doubling the trajectory length until specific criteria are met [18,12,24]. Empirical results show that NUTS performs at least as efficiently as, and sometimes more efficiently than, a well tuned standard HMC method, without requiring user intervention or costly tuning runs [18]. The tuning of the parameters of MHMC is yet to be explored in the literature. This work aims to fill this gap.
MHMC is closely related to HMC and differs from it only through the magnetic term. In this work, we expand on the NUTS methodology of Hoffman and Gelman [18] by extending it to incorporate the magnetic field present in MHMC. The resulting algorithm automatically tunes the trajectory length as in NUTS, with the step size tuned using dual averaging during the burn-in phase, while leaving the target distribution invariant. We refer to this new method as Adaptive MHMC (A-MHMC).
Empirical results on the sampling of jump diffusion processes calibrated to real world financial market data, modelling of real world benchmark classification datasets using Bayesian logistic regression and simulation studies utilising multivariate Gaussian distributions show that A-MHMC outperforms MHMC and NUTS on an effective sample size basis without the user having to manually set the trajectory length and step size. The proposed method is more computationally expensive compared to NUTS due to the extra matrix multiplications required to incorporate the magnetic field. This leads to the new method producing similar effective sample sizes normalised by execution time when compared to NUTS.
Contributions: Our contributions in this work can be summarised as follows:
• We introduce the adaptive Magnetic Hamiltonian Monte Carlo method, which automatically tunes the trajectory length and step size parameters of Magnetic Hamiltonian Monte Carlo while leaving the target density invariant.
• We provide numerical examples on various targets demonstrating significant improvements over Magnetic Hamiltonian Monte Carlo and higher effective sample sizes than the No-U-Turn Sampler.
The remainder of this paper is structured as follows: Section II discusses the Markov Chain Monte Carlo methods that form the basis of the new method, Section III presents the proposed method, Section IV outlines the experiments conducted, Section V presents and discusses the results of the experiments and Section VI concludes.

II. THE NO-U-TURN SAMPLER
The Hamiltonian Monte Carlo (HMC) algorithm better explores the target posterior distribution due to its use of Hamiltonian dynamics [8,1]. Hamiltonian dynamics incorporate the first-order gradient information about the target posterior distribution. HMC adds an auxiliary momentum variable p to the parameter space w, and the resultant Hamiltonian H(w, p) is given as:

H(w, p) = U(w) + K(p),

where U(w) is the negative log-likelihood of the target posterior distribution and K(p) is the kinetic energy defined by the kernel of a Gaussian as

K(p) = (1/2) p^T M^{-1} p,

where M is a positive definite mass matrix. The system of equations governing the path of the chain is defined by Hamilton's equations at a fictitious time t as [25]:

dw/dt = ∂H/∂p = M^{-1} p,    dp/dt = -∂H/∂w = -∂U/∂w.

The leapfrog integration scheme is utilised to explore the space. The update equations for the leapfrog integration scheme are [25]:

p(t + ε/2) = p(t) - (ε/2) ∂U/∂w(w(t)),
w(t + ε) = w(t) + ε M^{-1} p(t + ε/2),
p(t + ε) = p(t + ε/2) - (ε/2) ∂U/∂w(w(t + ε)),

where ε is the discretisation step size. Due to the discretisation errors arising from the leapfrog integration, a Metropolis-Hastings acceptance step is then performed in order to accept or reject the proposed sample [25,9]. The leapfrog steps are repeated until the maximum trajectory length L is reached. For the HMC algorithm, one needs to manually tune the step size ε and the trajectory length L. Tuning these parameters typically requires multiple time consuming pilot runs [18].
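To make the update equations above concrete, the following is a minimal sketch of a single leapfrog trajectory (illustrative only; the function name and NumPy setting are our own choices, not the paper's implementation):

```python
import numpy as np

def leapfrog(grad_U, w, p, eps, L, M_inv):
    """Simulate L leapfrog steps of size eps for the Hamiltonian
    H(w, p) = U(w) + 0.5 * p^T M^{-1} p."""
    w, p = w.copy(), p.copy()
    p = p - 0.5 * eps * grad_U(w)        # initial momentum half-step
    for step in range(L):
        w = w + eps * M_inv @ p          # full position step
        if step < L - 1:
            p = p - eps * grad_U(w)      # interior full momentum step
    p = p - 0.5 * eps * grad_U(w)        # final momentum half-step
    return w, p
```

Negating the final momentum and integrating again recovers the starting point; this time-reversibility is what the Metropolis-Hastings correction relies on.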
The No-U-Turn Sampler (NUTS) of Hoffman and Gelman [18] automates the tuning of the HMC parameters. The step size is tuned through primal dual averaging during an initial burn-in phase by targeting a user-specified level of sample acceptance. The trajectory length parameter is automatically tuned by iteratively doubling the trajectory length until the Hamiltonian becomes infinite or the chain starts to trace back [15,12]. That is, the doubling stops when the last proposed position state w* starts becoming closer to the initial position w, which happens if:

(w* − w) · p < 0.

This approach, however, violates the detailed balance condition. To overcome this problem in NUTS, a path of states B is generated such that its size is determined by the termination criterion [12,18]. The set B has the following property, which is required for detailed balance [12,18]:

P(B | z) = P(B | z') for all z' ∈ B,

where z = (w, p) is the current state. That is, the probability of generating B is the same starting from any of its members [12,18]. Given B, the current state z and a slice variable u drawn uniformly from the interval [0, π_Z(z)], a set of chosen states C [12,18]:

C = {z' ∈ B : π_Z(z') ≥ u}

is constructed, from which the next state is drawn uniformly. Algorithm 1 shows the pseudo-code for NUTS with dual averaging, showing how the trajectory length L and the step size ε are set automatically.
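The U-turn condition above can be sketched as a simple check on the two ends of the trajectory (a sketch only; the full algorithm applies this check recursively to the sub-trees built during doubling):

```python
import numpy as np

def u_turn(w_minus, p_minus, w_plus, p_plus):
    """True when the trajectory endpoints start moving towards each
    other, i.e. (w+ - w-) . p < 0 at either endpoint."""
    dw = w_plus - w_minus
    return bool((dw @ p_minus) < 0 or (dw @ p_plus) < 0)
```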

III. ADAPTIVE MAGNETIC HAMILTONIAN MONTE CARLO
The recently introduced Magnetic Hamiltonian Monte Carlo (MHMC) algorithm enhances HMC by equipping it with a magnetic field [11]. This magnetic field encourages better exploration of the posterior [11,26]. This results in faster convergence and lower autocorrelations in the generated samples when compared to HMC [11,14].
MHMC uses the same Hamiltonian as HMC in equation (1), but exploits non-canonical Hamiltonian dynamics in which the canonical matrix now has a non-zero antisymmetric block G in its bottom-right entry. The MHMC dynamics are [27]:

dw/dt = ∂H/∂p = M^{-1} p,    dp/dt = -∂H/∂w + G ∂H/∂p = -∂U/∂w + G M^{-1} p,

where G is the term that represents the magnetic field. The update equations for the non-canonical MHMC integration scheme are given as [11]:

p ← p − (ε/2) ∂U/∂w(w),
w ← w + G^{-1}(exp(εG) − I) M^{-1} p,
p ← exp(εG) p,
p ← p − (ε/2) ∂U/∂w(w).

This means that MHMC only differs from HMC dynamics by G being non-zero [11]. When G = 0, MHMC and HMC have the same dynamics. These MHMC dynamics cannot be integrated exactly and we resort to a numerical integration scheme with a Metropolis-Hastings acceptance step to ensure detailed balance [11]. The integrator for MHMC is not exactly the leapfrog integrator, but is very similar. This leapfrog-like integration scheme is shown as the Integrator function in Algorithm 2. As with HMC, the MHMC algorithm has trajectory length and step size parameters that need to be tuned. Tuning these parameters often requires expert judgement, which makes the method difficult to use in practice for non-specialists. We now introduce the adaptive Magnetic Hamiltonian Monte Carlo (A-MHMC) algorithm, which automatically tunes the parameters of MHMC. The proposed method addresses a key limitation of MHMC, namely how to tune its step size and trajectory length parameters; without a systematic tuning algorithm, the utility of MHMC is severely limited. The tuning of the parameters of MHMC is yet to be explored in the literature. The proposed algorithm addresses this gap and makes MHMC accessible to non-experts.
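The leapfrog-like update can be sketched as follows, taking M = I for brevity. Note that G^{-1}(exp(εG) − I) is evaluated through its power series, which remains well defined even when G is singular; the series-based helpers below are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def _expm(A, terms=25):
    """Truncated power series for the matrix exponential e^A."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def _phi(A, terms=25):
    """phi(A) = sum_k A^k / (k+1)!, so that G^{-1}(e^{eps G} - I) = eps * phi(eps G)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / (k + 1)
        out = out + term
    return out

def mhmc_step(grad_U, w, p, eps, G):
    """One step of the explicit leapfrog-like MHMC integrator (M = I)."""
    p = p - 0.5 * eps * grad_U(w)     # momentum half-step
    w = w + eps * _phi(eps * G) @ p   # position step: G^{-1}(e^{eps G} - I) p
    p = _expm(eps * G) @ p            # rotation induced by the magnetic field
    p = p - 0.5 * eps * grad_U(w)     # momentum half-step
    return w, p
```

With G = 0 both helpers reduce to identity terms and the step collapses to the ordinary leapfrog update, matching the observation that MHMC reduces to HMC.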
This new algorithm is based on incorporating the magnetic field of MHMC into the NUTS methodology in Algorithm 1. In particular, the step size is tuned via primal dual averaging during the burn-in phase, while the trajectory length is set by a recursive doubling procedure until the particle traces back. The A-MHMC method differs from NUTS in that a different integrator is used: instead of the leapfrog integrator associated with HMC, the leapfrog-like integrator (which is explicit and symplectic) corresponding to MHMC is used. Thus the A-MHMC algorithm combines the non-canonical dynamics of MHMC with the automation benefits of NUTS to create a new sampler.
The algorithmic difference of A-MHMC from NUTS is highlighted in Algorithm 2. The stopping criteria in A-MHMC are the same as in the original NUTS algorithm outlined in Section II. It is crucial to note that when G = 0, this new algorithm is equivalent to NUTS. This results in the proposed method inheriting the properties of NUTS such as detailed balance and time-reversibility.
Theorem III.1. A-MHMC satisfies detailed balance and leaves the target distribution invariant.
Proof. To show that A-MHMC satisfies detailed balance, one needs to show that a NUTS algorithm using the leapfrog-like numerical integrator is both symplectic and reversible, and hence satisfies detailed balance. This amounts to showing that the leapfrog-like integrator satisfies detailed balance, as this is what is required to show that NUTS satisfies detailed balance [18,12].
We now argue that the leapfrog-like integrator used in A-MHMC satisfies detailed balance. The map defined by integrating the non-canonical Hamiltonian system [11]

d/dt (w, p) = A ∇H(w, p)

with initial conditions (w, p), where A is any square, invertible, antisymmetric matrix, induces a flow on the coordinates (w, p) that is still energy conserving with respect to A, which also implies volume-preservation of the flow [11]. In addition, setting

A = [[0, I], [−I, G]]

leads to the MHMC dynamics and ensures that the non-canonical dynamics have a (pseudo) time-reversibility symmetry, as shown in [11]. These properties of being energy conserving, volume preserving and time-reversible guarantee that the leapfrog-like scheme satisfies detailed balance as required. Thus A-MHMC satisfies detailed balance and hence leaves the target invariant.
It should be noted that the approach undertaken in this paper is not the only method that could have been used to tune the parameters of MHMC. Wang et al. [28,29] introduce a Bayesian optimisation framework for tuning the parameters of Hamiltonian and Riemann Manifold Hamiltonian Monte Carlo samplers. The authors show that their approach is ergodic and in some instances precludes the need for more complex samplers. Buchholz et al. [30] adapt Hamiltonian Monte Carlo kernels using sequential Monte Carlo. They show that their approach improves as the dimensionality of the problem increases. Hoffman et al. [31] propose an adaptive MCMC scheme for tuning the trajectory length parameter in HMC and show that this new technique typically yields higher effective sample sizes normalised by the number of gradient evaluations when compared to NUTS. We plan to consider these and other approaches for tuning the parameters of MHMC in future work.
The key advantage of the proposed A-MHMC algorithm over MHMC is that it removes the need to manually tune the MHMC parameters. This makes MHMC more accessible, as it removes the need for expert judgement in tuning the algorithm parameters. The leapfrog-like Integrator function of Algorithm 2 performs the updates:

function Integrator(p, w, ε, G):
    p ← p − (ε/2) ∂U/∂w(w)
    w ← w + G^{-1}(exp(εG) − I) p
    p ← exp(εG) p
    p ← p − (ε/2) ∂U/∂w(w)
    return w, p

Note, however, that the magnetic component G would still need to be specified by the user.
It is important to note that the number of likelihood function evaluations for NUTS is not deterministic, as the trajectory length used to generate each sample varies. As our proposed method is based on the NUTS methodology, its number of function evaluations is also not deterministic and depends on the target posterior under consideration. The computational cost of the proposed method is in line with that of NUTS, as MHMC has similar, albeit slightly higher, computational cost to HMC. Thus on targets where NUTS has high execution times, we would expect A-MHMC to also have high execution times.
The magnetic field G in MHMC and A-MHMC provides an extra degree of freedom for both these algorithms [11]. It is not immediately clear how one should set or tune this matrix, but this should ideally be conducted in an automated manner [11,27,14]. Tuning the magnetic component is still an open area of research. In this work, we follow the directions of the authors of [11,27,14,21] and select only a few dimensions to be influenced by the magnetic field. In particular, G was set such that G_1i = G_5i = g and G_i1 = G_i5 = −g, with zeros elsewhere, for g = 0.2. Note that this particular form of G was chosen because G has to be antisymmetric [11]. In addition, we chose the first and fifth dimensions because the target posteriors used in this paper include jump diffusion processes, which have only five parameters; we did not consider higher dimensions as we wanted to use the same setting of G for all the target posterior distributions considered in this paper. This means that the choice of G is not necessarily optimal for all the target distributions considered, but it was sufficient for our purposes as this basic setting still leads to good performance of the algorithm. Tuning G for each target posterior should result in improved performance compared to the results presented in this manuscript. An alternative approach to the selection of G would have been to follow [23] and select G to be a random antisymmetric matrix. We plan to explore this approach in future work.
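One way to realise this pattern in code is sketched below. Antisymmetry is enforced by construction via G = A − A^T, which zeroes the diagonal and the overlapping (1,5) entry; this handling of the overlapping entries is our assumption, as the exact convention is not spelled out above:

```python
import numpy as np

def build_G(D, g=0.2):
    """Antisymmetric magnetic matrix with rows/columns 1 and 5
    (0-indexed: 0 and 4) carrying the field strength g."""
    A = np.zeros((D, D))
    A[0, :] = g        # first dimension
    A[4, :] = g        # fifth dimension
    return A - A.T     # enforce G = -G^T
```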

IV. EXPERIMENTS
We consider three test problems to demonstrate the performance of A-MHMC over MHMC and NUTS. We first study a jump diffusion process [32], whose transition density is an infinite mixture of Gaussian distributions with the mixing weights being probabilities from a Poisson distribution. We then perform a simulation study using multivariate Gaussian distributions with increasing dimensionality as in [33,14]. We proceed to consider performance on Bayesian Logistic Regression over various benchmark datasets outlined in [7].
Performance of the methods is measured via multivariate effective sample size (ESS) and ESS per second. We also assess the convergence of the methods using the R̂ metric of Gelman and Rubin [34,35]. For all the algorithms, the step size was set by targeting an acceptance rate of 80% during the burn-in period. A total of 10 independent chains were run for each method on each target distribution. All experiments were conducted on a 64-bit CPU using PyTorch. For MHMC, we considered trajectory lengths in the set {10, 50, 100}. This allowed us to assess the sensitivity of the results to the trajectory length.
The R̂ diagnostic of Gelman and Rubin [34] is a popular method for establishing the convergence of MCMC chains [35]. This diagnostic relies on running multiple chains {X_i0, X_i1, ..., X_i(n−1)} for i ∈ {1, 2, 3, ..., m}, starting at various initial states, with m being the number of chains and n the sample size. Using these parallel chains, two estimators of the variance can be constructed: the between-chain variance estimate B and the within-chain variance estimate W. When the chains have converged, the ratio of these two estimators should be one. The R̂ metric, formally known as the potential scale reduction factor, is defined as:

R̂ = sqrt(V̂ / W),

where W is the within-chain variance estimate and V̂ = ((n−1)/n) W + B/n is the pooled variance estimate, which incorporates the between-chain B and within-chain W variance estimates, with X̄_i. and X̄.. being the i-th chain mean and the overall mean respectively for i ∈ {1, 2, 3, ..., m}. Values of the R̂ metric larger than the convergence threshold of 1.05 indicate divergence of the chain [23,34]. The ESS calculation used in this work is the multivariate ESS calculation outlined in [36,37]. This metric takes the correlations between the parameters into account, unlike the minimum univariate ESS measure that is typically used to analyse MCMC results [7,36,27]. The minimum univariate ESS calculation results in the estimate of the ESS being dominated by the parameter dimensions that mix the slowest, ignoring all other dimensions [36]. The multivariate ESS is calculated as:

mESS = n (|Λ| / |Σ|)^{1/D},

where n is the number of generated samples, D is the number of parameters, |Λ| is the determinant of the sample covariance matrix and |Σ| is the determinant of the estimate of the Markov chain asymptotic covariance matrix. When D = 1, mESS is equivalent to the univariate ESS measure [36]. Note that when there are no correlations in the chain, we have that |Λ| = |Σ| and mESS = n.
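As an illustration, the R̂ computation for a single parameter across m chains can be sketched as follows (a sketch under the definitions above, not the exact implementation used in the experiments):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for an (m, n) array of
    m chains, each of length n, for one parameter."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    B = n * chain_means.var(ddof=1)          # between-chain variance
    V = (n - 1) / n * W + B / n              # pooled variance estimate
    return float(np.sqrt(V / W))
```

Chains sampling the same stationary distribution drive R̂ towards one, while chains stuck in different regions inflate B and push R̂ above the 1.05 threshold.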

A. JUMP DIFFUSION PROCESS
The modelling of asset returns has typically been undertaken by models that explicitly or implicitly assume that returns are normally distributed [38,39,16]. Jump diffusion processes are appealing for modelling asset returns, instead of traditional diffusion models such as geometric Brownian motion, because they produce returns which are leptokurtic [38,39,37,16]. The jump diffusion model analysed in this work is the one-dimensional Markov process {S_t, t ≥ 0} of [32], which is characterised by the following stochastic differential equation (SDE):

dS_t = μ S_{t−} dt + σ S_{t−} dB_t + S_{t−} d( Σ_{i=1}^{N_t} (exp(Y_i) − 1) ),

where μ is the drift coefficient, σ is the diffusion coefficient, {B_t, t ≥ 0} is a standard Brownian motion process, Y_i is the random size of the i-th jump and {N_t, t ≥ 0} is a Poisson process with intensity λ. Furthermore, the jump sizes Y_i are assumed to be independent and identically distributed random variables, and independent of both {N_t, t ≥ 0} and {B_t, t ≥ 0}. In addition, it is assumed that Y_i ∼ N(μ_jump, σ²_jump). The SDE in equation (14) implies the following transition density:

p(w | x) = Σ_{j=0}^{∞} [e^{−λτ} (λτ)^j / j!] · (1 / sqrt(σ²τ + j σ²_jump)) φ( (w − x − (μ − σ²/2)τ − j μ_jump) / sqrt(σ²τ + j σ²_jump) ),

where w and x are real numbers, τ is the time difference between S(t + τ) and S(t) and φ is the probability density function of a standard normal random variable. The transition density in equation (15) is thus an infinite mixture of normal distributions; in this work we truncate the infinite summation in equation (15) to the first 10 terms. The prior distribution over the parameters was the standard normal distribution.
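A sketch of the truncated transition log-density is given below (illustrative only, assuming the log-return parameterisation above; the function name `jump_diffusion_logpdf` and the use of plain Python are our own choices):

```python
import math

def jump_diffusion_logpdf(r, mu, sigma, lam, mu_jump, sigma_jump, tau=1.0, n_terms=10):
    """Log-density of the log return r over horizon tau, with the
    infinite Poisson mixture truncated to its first n_terms terms."""
    dens = 0.0
    for j in range(n_terms):
        # Poisson weight for exactly j jumps over the horizon
        weight = math.exp(-lam * tau) * (lam * tau) ** j / math.factorial(j)
        mean = (mu - 0.5 * sigma ** 2) * tau + j * mu_jump
        var = sigma ** 2 * tau + j * sigma_jump ** 2
        dens += weight * math.exp(-(r - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return math.log(dens)
```

Setting the jump intensity λ to zero collapses the mixture to a single Gaussian, recovering the geometric Brownian motion transition density.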
We calibrate jump diffusion processes to real-world financial datasets. The datasets consist of daily prices obtained from Google Finance. The specifics of the datasets are shown in Table 1 and are as follows:
• Bitcoin dataset: Daily data for the crypto-currency index from 1 Jan 2017 to 31 Dec 2020. The prices were converted into log returns.
• USDZAR dataset: Daily data for the currency pair from 1 Jan 2017 to 31 Dec 2020. The prices were converted into log returns.
The formula used to calculate the log returns is given as:

r_i = ln(S_i / S_{i−1}),

where r_i is the log return on day i and S_i is the stock or currency level on day i. The descriptive statistics of the datasets are presented in Table 2. This table shows that the USDZAR dataset has a very low kurtosis, suggesting that it has very few, if any, jumps when compared to the Bitcoin dataset.
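The log-return transformation can be sketched as follows (assuming a simple chronological price array; the function name is our own):

```python
import numpy as np

def log_returns(prices):
    """Daily log returns r_i = ln(S_i / S_{i-1}) from a price series."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))
```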
A 90-10 train-test split was used for both datasets. A total of 2 000 samples were generated after a burn-in period of 500 samples.

B. MULTIVARIATE GAUSSIAN DISTRIBUTIONS AND BAYESIAN LOGISTIC REGRESSION
We first undertake a simulation study using multivariate Gaussian distributions. In this simulation study, the task is to sample from D-dimensional Gaussian distributions. The covariance matrix is set to be diagonal, with the standard deviations simulated from a log-normal distribution with mean zero and unit standard deviation. For the simulation study, we consider the number of dimensions D to be in the set {10, 20, 50, 100}. A total of 10 000 samples were generated for each value of D, with the first 5 000 being discarded as the burn-in.
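The simulation-study targets can be constructed as follows (a sketch; the seed and function name are our own assumptions):

```python
import numpy as np

def gaussian_target(D, seed=0):
    """D-dimensional Gaussian target with a diagonal covariance whose
    standard deviations are log-normal(0, 1) draws."""
    rng = np.random.default_rng(seed)
    stds = rng.lognormal(mean=0.0, sigma=1.0, size=D)
    return np.zeros(D), np.diag(stds ** 2)
```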
The details of the real-world datasets modelled using logistic regression are displayed in Table 1. These datasets are binary classification tasks and are thus modelled using Bayesian logistic regression. The specifics of the datasets are:
• Pima Indian Diabetes dataset: This dataset has 7 features and a total of 532 observations. The aim is to predict whether a patient has diabetes based on diagnostic measurements made on the patient [40].
• Heart dataset: This dataset has 13 features and 270 data points. The purpose is to predict the presence of heart disease based on medical tests performed on a patient [40].
• Australian credit dataset: This dataset has 14 features and 690 data points. The objective is to assess applications for credit cards [40].
• German credit dataset: This dataset has 25 features and 1 000 data points. The aim is to classify a customer as having either good or bad credit [40].
A 90-10 train-test split was used and the input features were normalised for all the datasets. The prior distribution over the parameters for the logistic regression datasets was Gaussian with standard deviation equal to 10. The above settings are in line with those in [7]. For the logistic regression benchmark datasets, we generated 10 000 samples with a burn-in period of 5 000 samples.

V. RESULTS AND DISCUSSION
The experiments were implemented in PyTorch and were carried out on a 64-bit CPU. The performance, in terms of mESS, of the algorithms on the simulation study and logistic regression datasets is shown in Figure 1. The detailed results for the real-world jump diffusion and logistic regression datasets across different metrics are shown in Tables 3 and 4. Note that the execution time t in Tables 3 and 4 is in seconds.
The first row in Figure 1 shows the mESS for the multivariate Gaussian distributions in the simulation study with increasing dimensionality D. The second row in Figure 1 shows the mESS for the logistic regression datasets. The results in Tables 3 and 4 are the mean results over the 10 runs for each algorithm. Note that we use the mean values over the 10 runs to form our conclusions about the performance of the algorithms.
The first row of Figure 1 shows that, for all values of D, A-MHMC performs comparably to or better than the MHMC method with fixed trajectory lengths. The results also show that the optimal trajectory length for MHMC differs for each D with no obvious pattern, illustrating how difficult it is to tune the trajectory length parameter. A-MHMC tunes this trajectory length in an automated fashion, thus removing the manual process that would otherwise be required of the practitioner to select the optimal trajectory length for each D. The results also show that for all values of D, A-MHMC outperforms NUTS.
The second row in Figure 1 shows that across all the real-world logistic regression datasets, A-MHMC outperforms all the methods considered, with NUTS being second. This is also confirmed by the mean results, over 10 runs of the algorithms, shown in Tables 3 and 4. A-MHMC produces significantly higher effective sample sizes, even after taking into account the execution time, than MHMC with the different trajectory lengths, except on the Bitcoin dataset where the adaptive algorithms (NUTS and A-MHMC) have very large execution times which cause poor time-normalised effective sample size performance. The A-MHMC method can provide up to 40 times relative speed up when compared to MHMC. This shows the benefit of A-MHMC's ability to automatically tune the trajectory length without user intervention. A-MHMC and NUTS produce similar effective sample sizes normalised by execution time, and similar relative speed up, across all the datasets. With the exception of the Bitcoin dataset, NUTS has the lowest execution time t on all the datasets, with A-MHMC a close second. This ordering was to be expected, as A-MHMC performs additional matrix operations, to take the magnetic term into account, when compared to NUTS. Tables 3 and 4 show that the algorithms have similar predictive performance, with the MHMC algorithm with trajectory length equal to 100 marginally outperforming the other methods on all the logistic regression datasets. A-MHMC outperforms on the jump diffusion datasets. It is also worth noting that A-MHMC and NUTS have large execution times on the Bitcoin dataset and low execution times on the USDZAR dataset. The high execution time on the Bitcoin dataset is due to the high kurtosis of this dataset. Figure 2 shows that all the MCMC methods have converged on all the target posteriors. This is further confirmed in Tables 3 and 4, where R̂ is close to one for all the algorithms, with A-MHMC outperforming on the R̂ metric on the majority of the datasets.

VI. CONCLUSIONS
We present the adaptive Magnetic Hamiltonian Monte Carlo algorithm, which automatically tunes the trajectory length and step size parameters of Magnetic Hamiltonian Monte Carlo without the user having to manually specify these parameters. We assess the performance of this new method against the No-U-Turn Sampler and Magnetic Hamiltonian Monte Carlo with trajectory lengths of 10, 50 and 100. The methods are compared on the calibration of jump diffusion processes, Bayesian logistic regression and multivariate Gaussian targets. The A-MHMC algorithm significantly outperforms MHMC on both an effective sample size basis and a time-normalised effective sample size basis on the majority of the datasets. The empirical results show that A-MHMC outperforms NUTS on an effective sample size basis and produces similar normalised effective sample sizes when compared to NUTS.
This work can be improved by assessing the performance of the proposed method on deep neural networks and larger datasets such as MNIST. Comparing the performance of the method with Riemannian manifold based Markov Chain Monte Carlo methods would be an area of interest. Incorporating the adaptation of the magnetic field term in the proposed algorithm could also serve as an improvement on the method. We also plan to assess other methods for tuning the parameters of MHMC, such as those based on utilising the variance of the change in the shadow Hamiltonian corresponding to the leapfrog-like integrator used in MHMC.