Sampling Errors in Nested Sampling Parameter Estimation

Sampling errors in nested sampling parameter estimation differ from those in Bayesian evidence calculation, but have been little studied in the literature. This paper provides the first explanation of the two main sources of sampling errors in nested sampling parameter estimation, and presents a new diagrammatic representation for the process. We find no current method can accurately measure the parameter estimation errors of a single nested sampling run, and propose a method for doing so using a new algorithm for dividing nested sampling runs. We empirically verify our conclusions and the accuracy of our new method.


Introduction
Nested sampling (Skilling, 2006) is a Monte Carlo method for Bayesian analysis which simultaneously calculates both Bayesian evidences and posterior samples. The early development of the algorithm was focused on evidence calculation, which is computationally expensive using variants of standard Markov chain Monte Carlo (MCMC) sampling based on the Metropolis-Hastings algorithm (MacKay, 2003).
Contemporary implementations such as MultiNest (Feroz and Hobson, 2008;Feroz et al., 2009Feroz et al., , 2013 and PolyChord (Handley et al., 2015a,b) are now also extensively used for parameter estimation from posterior samples (see for example Planck Collaboration, 2016). Nested sampling compares favourably to MCMC-based parameter estimation for degenerate, multi-modal likelihoods as it has no "thermal" transition probability and exponentially compresses the prior distribution to the posterior. However, despite its increasing popularity, the sampling errors in nested sampling parameter estimation are poorly understood.
Correctly quantifying uncertainty is vital for identifying spurious results -in particular we find sampling errors often significantly affect estimates of credible intervals on parameters. Conversely, finding such errors are very small may imply an unnecessarily large amount of computational resource is being used for the calculation. This paper has two goals: to provide an explanation of the sources of these errors and an empirical technique for estimating them. One obvious method is to repeat the analysis some n repeats times, although this increases the computational cost by a corresponding factor. Interestingly, we find no current method can accurately estimate these errors on parameter estimates from a single analysis, and so we present a new method for doing this. Our approach uses a new algorithm for dividing a single nested sampling run into multiple valid nested sampling runs; these can then be recombined in different combinations using resampling techniques such as the bootstrap. We test our results and new method empirically.
The paper begins with background on sampling errors in parameter estimation from posterior samples, then describes the nested sampling algorithm and how it is currently used for parameter estimation in Section 2. We explain the two main sources of sampling errors in nested sampling parameter estimation in Section 3, and present a new diagrammatic representation of the process (illustrated in Figures 3(a) to 3(e)). Section 4 describes our new method for measuring sampling errors from a single nested sampling run, using our new algorithm for division of such runs.
We empirically test our method's accuracy in Section 5 with the help of analytical cases in the manner described by Keeton (2011). Here one can obtain uncorrelated samples from the prior space within some likelihood contour using standard techniques, and we term the resulting procedure perfect nested sampling. Our approach accurately quantifies uncertainties on parameter estimates from the stochasticity of the nested sampling algorithm, but software used for practical problems may produce additional errors from correlated samples within likelihood contours that are specific to a given implementation. We discuss implementation-specific errors in Section 6, including testing sampling error estimates from our method for PolyChord calculations. Our method gives superior performance to the current approach and can be easily included in nested sampling software; we are currently working on incorporating it into future versions of PolyChord.

Background: sampling errors in parameter estimation
Sampling can be used to represent a posterior distribution P(θ) via a set of weighted samples where each θ s is drawn from the posterior distribution with probability proportional to p s × P(θ s ), and s∈S p s = 1. Likelihoods L are often computationally expensive functions, so the goal of parameter estimation is to sample the posterior distribution P(θ) numerically with a limited number of likelihood calls.
Samples S may be used to compute numerical results. For example, the posterior expectation of a function of the parameters f (θ) can be estimated as (2) In this case the sampling error is the difference between s∈S p s f (θ s ) and the exact value of E[f (θ)]. Often the posterior distributions of parameters θ are of interest, and are estimated numerically from the samples by dividing the parameter space into cells or via kernel density estimation.
There have been many works on approximating MCMC sampling errors, including investigation of quantiles and the amount of computation required to reach some level of accuracy -see for example Doss et al. (2015); Flegal et al. (2008); Liu et al. (2016). In particular Sequential Monte Carlo samplers (Del Moral et al., 2006) have similarities with nested sampling, and their sampling errors are better understood. For some related methods such as the Tootsie Pop algorithm (Huber and Schott, 2014) and accelerated simulated annealing (Bezáková et al., 2008) the error distribution is known exactly, although these techniques are less widely used. This paper introduces empirically tested techniques for quantifying sampling errors from the nested sampling algorithm.

The nested sampling algorithm
Nested sampling (Skilling, 2006) is a numerical method computing Bayesian evidences and samples from the posterior distribution given some likelihood L(θ) and prior π(θ).
Initially n points, termed live points, are sampled randomly from the prior. At each iteration i, the live point with the lowest likelihood L i is removed and replaced by a new live point sampled from the prior subject to the constraint that it has a likelihood higher than L i . Iterating until some termination condition is met generates a list of discarded samples known as dead points, which are used to estimate the evidence and make posterior inferences 1 . We refer to the completed nested sampling process as a run.
To compute the evidence, the many-dimensional integral (3) is reduced to a onedimensional integral in terms of the fractional prior volume within an iso-likelihood contour. We define the fraction of the prior θ with likelihood L(θ) greater than some value L * as X(L * ), where and X ∈ [0, 1]. Provided the inverse L(X) ≡ X −1 (L) exists 2 , the evidence (3) can be expressed as Figure 1: A schematic representation of nested sampling with a constant number of live points n. The curve L(X)X shows the relative posterior mass, the bulk of which is contained in some small fraction exp(−H) of the prior and is only visible on a log scale in X. The algorithm iterates inwards in X exponentially with stochastic shrinkage ratios distributed according to (7).
Given a set of dead points with likelihoods L i , the corresponding prior volumes X i are unknown but are modelled statistically as X i = t i X i−1 , where X 0 = 1 and each shrinkage ratio t i is independently distributed as the largest of n random variables from the interval [0, 1] (Skilling, 2006). Hence: and the algorithm samples within an exponentially shrinking part of the prior. This exponential shrinkage is shown schematically in Figure 1.

Evidence estimation
Nested sampling therefore allows one to approximate the evidence (6) via a quadrature sum over the dead points where t = {t 1 , t 2 , . . . , t n dead } are the unknown set of shrinkage ratios for the n dead iterations of the nested sampling process, and each t i is an independent random variable drawn from distribution (7). The shrinkage ratios define the prior volumes via X i (t) = i k=0 t k , and the w i are appropriately chosen quadrature weights roughly corresponding to the volume of the "prior shell" to which a given dead point belongs. For example, using the trapezium rule: w i (t) = 1 2 (X i−1 (t) − X i+1 (t)). Given that the shrinkage ratios t are a priori unknown, we may quantify our knowledge of Z by simulating sets of t according to (7), and working with the distribution of the resulting set of evidences {Z} t from (8) (Skilling, 2006). Typically one then computes and reports a mean value and error for log Z from this distribution.
Several alternative methods for calculating evidence inferences are reported in the literature. Skilling (2006) also proposes an error calculation based on relative entropy, which demonstrates that the uncertainty of log Z is dominated by the Poisson variability in the number of steps required to reach the bulk of the posterior mass. Keeton (2011) uses distribution moments and running totals which are updated with each nested sampling step. This method has been extended by Handley et al. (2015b) to allow the splitting of multi-modal likelihoods into different clusters and the treatment of variable numbers of live points. For a more detailed discussion of the convergence properties of nested sampling evidences, see Chopin and Robert (2010).
Thus, the dominant sampling error in the evidence estimate (8) from perfect nested sampling is from statistical variation in the unknown volumes of the prior "shells" w i (t) that each point represents. The error from approximating the integral for Z with a sum can be safely neglected unless n is very small 3 (Skilling, 2006). There is also some error from terminating the algorithm and truncating the sum, but this is can be made negligible with appropriate termination conditions.

Parameter estimation
One may also perform posterior inference from nested sampling by using the dead points to construct a set of posterior samples with weights proportional to their share of the posterior mass (Skilling, 2006): As before, t is the set of prior shrinkage ratios and in the trapezium rule case w i (t) = 1 2 (X i−1 (t) − X i+1 (t)). The weights defined by (9) present a departure from traditional sampling approaches in that the w i (t) are random variables, with their stochasticity determined by (7). When computing expectations (2) there is now an additional error associated with our lack of knowledge of the precise values p i (t). Nested sampling software packages such as MultiNest and PolyChord produce posterior files containing only the expected values To account for the stochasticity in the weights p i , Skilling (2006) suggests simulating the prior volume shrinkage ratios t in the same manner as for evidence estimation, and using these simulations to calculate a set of values for estimators such as (2). The sampling error should then be estimated from the variation within this sample; we term this the simulated weights method. We believe this procedure is the only estimate of sampling errors in parameter estimation from a single nested sampling run proposed in the literature. However it is in general an underestimate, as can be seen in the numerical tests in Section 5. Section B of the supplementary material (Higson et al., 2017) discusses this underestimation of errors in detail.
We now describe why the simulated weights method does not capture all sources of sampling errors, and in Section 4 we propose a new method for correctly computing these errors.

Sources of sampling errors in nested sampling parameter estimation
In order to understand why the simulated weights method underestimates sampling errors, we require a result from Chopin and Robert (2010). They show that the expectation integral (2) may be re-phrased in terms of the prior volume X via: wheref (X) is the prior expectation of f (θ) on some iso-likelihood contour L(θ) = L(X), The simulated weights approach amounts to discretising the integral (11) as and, most importantly, further requiring that we may use f (θ i ) as a proxy forf (X i ) at each point X i . In some special cases f (θ i ) =f (X i ) for all θ and this approach is valid, for example when f (θ i ) =f (X i ) ∝ − log L i (entropy computation), but in general it is not. This can cause significant inaccuracies as iso-likelihood contours often span wide ranges of different parameter values, as illustrated in Figure 2.
To summarise, the dominant sampling errors in estimating some parameter or function of parameters from perfect nested sampling typically come from two sources: (i) approximating the unknown prior volumes w i (t) with their expectation E[w i (t)] using (7); (ii) approximating the mean value of a function of parameters over an entire isolikelihood contourf (X i ) with its value at a single point f (θ i ).
Errors from (i) are also present in evidence calculation; in the parameter estimation case they are typically smaller as results depend only on the relative weights of the samples. In contrast (ii) is only present in parameter estimation, where it is typically a significant or dominant source of sampling errors. The relative contributions of (i) and (ii) are empirically tested in Section A of the supplementary material, where they are Figure 2: Nested sampling dead points and iso-likelihood contours for a two-dimensional multi-modal likelihood L(θ); darker shading shows higher likelihoods. Iso-likelihood contours can pass through a wide range of different parameter values. calculated for analytical cases by using exact values for weights w i (t) and by replacing f (θ i ) withf (X i ). The simulated weights method underestimates sampling errors in parameter estimation as it ignores errors from (ii).
We now introduce a new diagrammatic representation of nested sampling parameter estimation to illustrate the two different sources of sampling errors.

Diagrammatic representation
Nested sampling transforms evidence calculations of any dimension into a 1-dimensional problem 4 in L(X) which can be entirely represented on a diagram like Figure 1. An analogous diagram for parameter estimation must also illustrate sampling a single point We propose a generalisation of Figure 1 for visualising parameter estimation problems, and present it in Figures 3(a) to 3(e). The top panel in each figure is similar to Figure 1 and shows the relative posterior mass L(X)X at each value of log X. The lower central panel shows the probability distribution P (f (θ)|X) and its meanf (X). The posterior distribution is shown on the left -this is equal to the distributions P (f (θ)|X) (the lower central panel) marginalised over X in proportion to the posterior weight at each X (the top panel).
For these example plots we use d-dimensional spherical unit Gaussian likelihoods and d-dimensional spherical unit Cauchy likelihoods  with d-dimensional co-centred spherical Gaussian priors We denote the first component of the θ vector as θ1, although by symmetry the results will be the same for any component. θ1 and θ 2 1 are the first and second moments of the posterior distribution of θ1.
The form of the distribution P (f (θ)|X) as X varies depends on the likelihood only through the shape of the iso-likelihood contours L(θ) = L(X). Therefore the lower central panel of the diagrams for some f (θ) is the same for any likelihoods with the same contours -this can be seen in These diagrams can be constructed for any nested sampling calculation using the posterior samples and kernel density estimation; this could provide insight into the nature of the calculation and the relative contributions from the different sources of sampling errors.

Transforming a parameter estimation problem into 2 dimensions
As illustrated by our diagrams, nested sampling parameter estimation is fundamentally a 2-dimensional problem in L(X) and P (f (θ)|X). In fact a parameter estimation calculation for some f (θ) given L(θ) is equivalent to a 2-dimensional problem for f * (θ * ) given L * (θ * ) when for all X. Any transformation satisfying (17) and (18) will leave our proposed diagram for the calculation unchanged. Parameter estimation can also be represented as a 1dimensional problem in L * (θ * ) = L(X) combined with a univariate stochastic process for each dead point i with the distribution P (f (θ)|X i ).
One way to express a general nested sampling calculation in 2 dimensions is to map it onto the unit square θ * = (X, Y ) with uniform priors X, Y ∈ [0, 1] and a likelihood L * (θ * ) = L(X) which is independent of Y and satisfies (17). In this case X is as before the remaining fractional prior volume and Y parameterises each iso-likelihood contour. Using inverse transform sampling, for a general f (θ) a corresponding f * (θ * ) satisfying (18) is where As an example let us consider d-dimensional spherically symmetric likelihoods such as (14) or (15) with co-centred spherically symmetric priors such as (16). Then X(θ) is a function only of the radial distance from the centre |θ|, and the iso-likelihood contours L(θ) = L(X) are hyperspherical shells of some radius |θ X |. The probability distribution of a single parameter θ1 (a single component of θ) on such an iso-likelihood contour is then θ1 can be sampled directly or used to calculate the inverse cumulative distribution which together with knowledge of the function L(X) allows the parameter estimation of a d-dimensional Gaussian to be transformed into a 2-dimensional problem on the unit square.
Samples from (21) can be generated efficiently using the symmetry around θ1 = 0 and the change of variables Θ = θ 2 1 /|θ X | 2 to give a Beta distribution This technique is used for the numerical tests in Section 5, and allows the efficient sampling of high dimensional spherically symmetric distributions where only a few parameters are of interest without generating all the remaining uninteresting parameters.

Estimating sampling errors in nested sampling parameter estimation
Following the discussion of sources of sampling errors in Section 3, we seek a method for correctly calculating parameter estimation sampling errors from a single nested sampling run. As no additional samples θ i are available, a natural starting point is to utilise resampling techniques such as the jackknife (Tukey, 1958), bootstrap (Efron, 1979) and Bayesian bootstrap (Rubin, 1981), which estimate the uncertainty on inferences from a set of samples by calculating the variation when samples are re-weighted.
However, as described in Section 2.2, the uncertainty in nested sampling weights w i (t) produces additional sampling errors which are unique to the nested sampling process. These are not accounted for by naïvely applying jackknives and bootstraps to posterior samples produced by nested sampling, and these approaches fail when tested numerically. We instead require a method for dividing runs in a manner that preserves the statistical properties of nested sampling. No such method exists in the literature, so we present one in the remainder of this section. Skilling (2006) describes how several nested sampling runs r = 1, 2, . . . with n (r) live points may be combined simply by merging the dead points and sorting by likelihood value. The combined sequence of dead points is equivalent to a single nested sampling run with n = r n (r) live points.

Dividing runs into threads
In fact, as we show now, the reverse procedure is also possible. A nested sampling run with n points can be unwoven into a set of n valid nested sampling runs, each with n (r) = 1. We term these single live point runs threads. During nested sampling, each dead point i is replaced by a new point sampled uniformly within its iso-likelihood contour L(θ) = L i . Starting from each initial live point that is generated, one may follow this sequence of replacements down the set of dead points. This sub-sequence of dead points is in fact a nested sampling run with n = 1. More formally: Algorithm 1: Splitting a nested sampling run into threads.
Result: n threads. Data: Dead points and the iterations at which they were sampled for a nested sampling run with n live points. Rank dead points by likelihood in ascending order; while i ∈ n do make a new stack i; select one of the initial points sampled at the start of the run; move the point out to the stack i; while iteration < final iteration do select point sampled at the iteration where previous point was replaced ("died"); move the point to the stack i; end end A few points are worthy of note: 1. splitting a run by randomly selecting some fraction of the dead points will not produce threads (i.e. single point nested sampling runs); 2. one may split a given nested sampling run into separate runs with n (r) = 1 by first separating into threads, and then recombining threads as desired; 3. the algorithm can be easily adapted for varying numbers of live points by permitting it to select multiple points on contours where n increases. This can result in constituent threads stopping or dividing into multiple threads part of the way through the run; 4. typically there is only one point which was sampled uniformly from the prior volume within each dead point i's iso-likelihood contour L(θ) = L i -the point which replaced i. A sufficient condition for a nested sampling run to only have one unique division into threads is that L(X) is an injective function; 5. in order for the threads to be true nested sampling runs, care must be taken with the termination conditions used. See Section D of the supplementary material for a full discussion.
Given that threads represent independent nested sampling runs, one may apply standard resampling techniques to the set of threads and approximate the entire sampling error distribution without making assumptions about its form. This works as the log X i values of the dead points i from some run with n live points form a Poisson process with rate n, meaning the log X j values of the dead points j of a single thread are a Poisson process of rate 1. For typical problems with computationally expensive likelihoods the computational cost of even a large number of resampling replications is negligible.
Having introduced a framework for applying resampling to nested sampling parameter estimation we now present an example method using bootstrap resampling.

Bootstrap estimate of sampling errors
Given n observations x = (x 1 , . . . , x n ), the bootstrap (Efron, 1979) creates new data sets x * b by drawing n samples from x with replacement. This corresponds to approximating the probability distribution of a single data point x as where δ(x) is the Dirac delta function (Ivezić et al., 2014).
As the form of the distribution of sampling errors for a general nested sampling parameter estimation problem is not known, we use the non-parametric bootstrap. In this case the uncertainty on a quantity T (x) calculated from the data can be estimated by calculating T (x * b ) for a number of resampled data sets b = 1, . . . , B. For example the bootstrap estimate of the standard error on T (x) is There are many methods for calculating approximate credible intervals on T (x) from bootstrap replications {T (x * b )} -see Efron and Tibshirani (1986) for a detailed discussion. A simple approach from Johnson (2001) is to estimate the boundaries of the 100α% and 100(1 − α)% credible regions 5 as where G −1 (x) is the inverse cumulative distribution of the bootstrap samples {T (x * b )}. B = 50 is typically sufficient for an estimate of the standard deviation of a parameter estimate due to sampling errors, but depending on the method used credible intervals on parameter estimates may require 1,000 bootstrap replications or more (Efron and Tibshirani, 1986).
When the bootstrap is applied to nested sampling each observation x i is a thread, and the number of observations is n. Calculating the quantity T (x) involves first combining the set of threads x into a single run using Skilling (2006)'s method (described in Section 4.1), then performing a standard nested sampling calculation including estimating the weight of each point w i (t) statistically. Including the same thread multiple times does not cause problems -repeated dead points θ i = θ i+1 are simply assigned the weights w i (t) and w i+1 (t) respectively.
The following algorithm provides a set of bootstrap replications and an estimate of the standard deviation of sampling errors.

Result: Sampling errors and bootstrap replications for the nested sampling
calculation T (dead points, weights). Data: List of dead points and the steps they were sampled at. Divide dead points into a list of threads x using Algorithm 1; We find that bootstrap resampling gives better results than jackknife resampling, which fails to calculate sampling errors on credible intervals of posterior distributions of parameters such as C.I. 84% (θ1). The Bayesian bootstrap was not used as it gives each observation a non-integer weight, which requires modifying nested sampling's use of dead points to statistically estimate prior volume shrinkages.
Resampling techniques such as the bootstrap can generate many simulated runs with the same number of live points n as the original run. In comparison sampling error estimates from simply splitting a run into many smaller runs and assessing their variation perform poorly, as shown in Section C of the supplementary material.

Numerical tests
Following Keeton (2011) we first test our new method using analytic cases where uncorrelated samples can be easily obtained from the prior within an iso-likelihood contour, allowing us to perform perfect nested sampling. This ensures our results are not affected by imperfect implementation of the nested sampling algorithm by a specific software.
As discussed in Section 3 perfect nested sampling parameter estimation problems depend on the likelihood L(θ) and prior π(θ) only through the distribution of posterior mass L(X) and the distribution of parameters on iso-likelihood contours P (f (θ)|X), both of which are functions of both L(θ) and π(θ). We therefore empirically test our method using a wide range of distributions of posterior mass, and examine several functions of parameters f (θ) in each case. We construct such tests using Gaussian likelihoods (14) and Cauchy likelihoods (15) of a variety of dimensions d, each with a Gaussian prior (16). The different distributions of posterior mass for different d are illustrated in Figure 4; the Cauchy distributions have extremely fat tails with significant sample weights throughout the range of log X values explored. We use the termination conditions described by Handley et al. (2015b, Section 3.4), stopping when the estimated evidence contained in the live points is less than 10 −4 times the evidence contained in dead points (see Section D of the supplementary material for a discussion of termination conditions for nested sampling parameter estimation). Numerical calculations for high-dimensional cases are performed in two dimensions using the technique described in Section 3.2.
As in Section 3 we denote the first component of the θ vector as θ1, although by symmetry the results will be the same for any component. θ1 and θ 2 1 are the first and second moments of the posterior distribution of θ1, and the one-tailed Y % upper credible interval C.I. Y % (θ1) is the value θ * 1 for which P (θ1 < θ * 1 |L, π) = Y/100.

3-dimensional Gaussian example
We first test our bootstrap approach to estimating sampling errors on a 3-dimensional Gaussian likelihood (14) - Figure 5 illustrates sampling errors on posterior distributions of parameters θ in this case. Unlike the simulated weights method, the mean estimates of sampling errors from our method are very close to measurements of sampling errors from repeated calculations -this shown in the second row of Table 1. Furthermore the fractional variation of estimates from single runs around the mean estimate is similar to that from the simulated weights method, as shown in the fourth and fifth rows, indicating our method will give a reasonable estimate of sampling errors when only a single nested sampling run is available.
The final two rows of Table 1 show the empirical coverage rates for bootstrap credible intervals are very close to their nominal values. Figure 6 shows estimates of the full sampling error distribution from a single run nested sampling run using the bootstrap and simulated weights methods; the bootstrap results are much closer to the sampling errors observed in repeated calculations, and give accurate estimates of the 1σ and 2σ credible intervals.  (14) and a uniform prior. The shading and black lines show the analytic posterior distribution and the 68% and 95% credible intervals. The red lines show the calculated posterior credible intervals for a nested sampling run with n = 100, and differ from the analytic answer due to sampling errors.
Section E of the supplementary material shows similar numerical tests for a 3dimensional Cauchy likelihood (15). Even for this challenging, fat-tailed distribution our method performs similarly to the Gaussian case, giving accurate mean error estimates and estimates of credible intervals with measured coverage similar to their nominal coverage.

Sampling errors in different dimensions
We now verify the bootstrap method's accuracy for Gaussian (14) and Cauchy (15) likelihoods of between 2 and 50 dimensions. Figure 7 shows bootstrap sampling error estimates accurately match the errors measured from repeated calculations, even for the challenging fat-tailed Cauchy distribution. In contrast the simulated weights method consistently underestimates the sampling errors in parameter estimation, although as expected it is accurate for errors on the evidence log Z. See Section B of the supplementary material for a detailed discussion of the simulated weights method.
As the dimension d increases, Figure 7 shows parameter estimation errors decreasing and the evidence errors increasing (with a constant number of live points n). This effect is due to the posterior being contained in a smaller fraction of the prior volume in higher dimensions. In the spherically symmetric cases considered, the range of log X  (14), a Gaussian prior (16)    to be explored increases approximately linearly with the dimension d, as can be seen in Figure 4. With a constant number of live points, the number of samples is therefore also approximately proportional to d.
In parameter estimation from posterior samples only points' relative weights matter, so the increased number of samples in higher dimension problems typically increases accuracy as can be seen in Figures 7(a) and 7(b). However for high dimensional Cauchy likelihoods (15) the posterior mass is spread over a wide range of log X values, so errors in the relative weights of points become large in high dimensions 6 .
For log Z the dominant error is in the absolute value of point weights, which is approximately proportional to the square root of the number of steps required to reach the posterior (Skilling, 2006). log Z errors are therefore approximately proportional to √ d when n is constant, as can be seen in Figure 7(c).

Application to existing nested sampling software
Nested sampling software such as MultiNest and PolyChord can be easily modified to output information about the step at which dead points were sampled and give sampling error estimates using bootstrap resampling of threads. We are currently working on incorporating this into future releases of PolyChord.
Sampling error estimates from our approach will be accurate provided the software is performing nested sampling approximately correctly. However such software can only approximately sample randomly from the prior within iso-likelihood contours -this may result in additional errors which are specific to a given implementation and which may not be captured by general methods such as resampling threads. Tuning parameters such as "num repeats" (the number of slice samples taken between dead points) in PolyChord allow reduced correlation between samples at a higher computational cost. Testing sampling error estimates from our method against those from repeated calculations can be used to detect implementation-specific errors and to select appropriate values for tuning parameters.
In addition our algorithm for dividing nested sampling runs could be used to detect implementation-specific errors by testing the difference in correlation between threads from the same run and from different runs. In principle this could allow estimates of sampling errors to be corrected for the effects of correlations between threads.
We now demonstrate our method's application to nested sampling results produced with PolyChord.

Sampling errors on data fitting with PolyChord
We fit a set of points D = {x i , y i } with normally distributed errors σ y on the y values using a sinusoid y(x) = A sin(ωx + φ).
(28) The likelihood is then  where θ = (A, ω, φ) and we use a uniform prior for A ∈ (0, 1), ω ∈ (0, 10) and φ ∈ (−π/2, π/2). Numerical tests use 40 data points sampled from y(x) = 1 2 sin(2πx) with Gaussian noise of size σ y = 0.2 added to the y values; y(x) and the data points are shown in Figure 8 and the posterior distribution of y(x) given the data is shown in Figure 9. Posterior distributions on A, ω and φ can be calculated with nested sampling -these are illustrated in Figure 10 along with example sampling errors. Table 2 shows sampling errors from PolyChord with num repeats = 15 -the default value for a 3-dimensional problem. As in perfect nested sampling, our bootstrap estimates of the standard error agree with the variation in results observed, and the observed coverage of credible intervals is close to their nominal coverage. This implies that num repeats = 15 is sufficient for PolyChord to perform parameter estimation accurately in this case.

Conclusion
Sampling errors in nested sampling parameter estimation arise principally from two sources: uncertain sample weights w i (t), and approximating the average of a function Figure 10: Posterior distributions and sampling errors from fitting a sinusoid to data (29) using PolyChord. The shading and black lines show an accurate calculation of posterior distribution and the 68% and 95% credible intervals from combining 1,000 nested sampling runs with n = 100. The red lines show the calculated posterior credible intervals for a single nested sampling run with n = 100, and differ from the true posterior due to sampling errors. of parameters on each iso-likelihood contourf (X i ) with a single sample f (θ i ). The latter error is not present in evidence calculation and has been previously ignored. The added stochasticity from sampling each iso-likelihood contour makes nested sampling parameter estimation a 2-dimensional problem, with a dependence on both the distribution of posterior mass L(X) and the distribution of parameter values P (f (θ)|X) on each iso-likelihood contour. We proposed a new diagram for representing both aspects of the calculation, and presented it in Figures 3(a) to 3(e).
Estimating sampling errors is vital for interpreting the results of a nested sampling calculation, as well as for allocating computational resources -for example by choosing an appropriate number of live points. However the current approach (the simulated weights method) underestimates sampling errors as it does not account for approximatingf (X i ) with a single sample f (θ i ). We proposed a new method for estimating sampling errors using our new algorithm (Algorithm 1) for dividing a nested sampling run into single live point runs ("threads"), which can then be resampled with techniques such as the bootstrap. This works as the log X i values of the dead points i from some nested sampling run with n live points form a Poisson process with rate n, meaning the log X j values of the dead points j of a single thread are a Poisson process of rate 1.   (29) using PolyChord with n = 100. The first row shows the standard deviation of 1,000 nested sampling calculations. The second and third rows show the mean of 1,000 error estimates from the bootstrap and simulated weights methods respectively as a ratio to the error observed from repeated calculations; 200 weight simulations and 200 bootstrap replications were used for each run. The fourth and fifth rows show the standard deviations of sampling error estimates for both methods as a percentage of the mean estimate. The sixth row shows the mean of 100 bootstrap estimates of the one-tailed 95% credible interval on the calculation result, each using 1,000 bootstrap replications. The final two rows show the empirical coverage of the bootstrap standard error and 95% credible interval from the 1,000 repeated calculations. Numbers in brackets show the error on the final digit.
Our method shows accurate and robust estimation of sampling errors in parameter estimation in empirical tests, and compares favourably to the other methods discussed. The new method can be easily incorporated into existing nested sampling software, and will be reliable provided the implementation is performing the nested sampling algorithm accurately. We are currently working on including nested sampling run division and sampling error estimates from our method in future versions of PolyChord.