1 Introduction

One of the greatest challenges in theoretical high-energy physics is to meet the demand for increasingly precise predictions. Even leaving aside conceptual issues, reducing theoretical uncertainties typically requires ever more complex calculations, which incur steeply rising computing costs. Already now, Monte Carlo event generation for the LHC accounts for a notable fraction of the experimental computing budgets. Even with this large investment of computing power, event sample sizes have to be limited to a point where the resulting statistical uncertainty can be non-negligible [1]. What is more, event generation is only the first step in the full simulation chain, and the already substantial computing costs of this step are often dwarfed by the subsequent simulation of the detector response. All of these problems are expected to become even more severe in the future, in particular with the advent of the HL-LHC. This development is obviously at odds with general sustainability goals. Improving the efficiency of event simulation is therefore one of the foremost tasks in particle physics phenomenology.

For a high-accuracy simulation, a Monte Carlo generator needs to combine real and virtual corrections, match resummations in different limits, and combine processes of different multiplicities while avoiding double counting. Many different prescriptions exist for each of these combination steps [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]. While they differ substantially in the details, a common theme is the introduction of a varying number of auxiliary events that subtract from the accumulated cross section instead of adding to it. If the number of such negative-weight events becomes large, they can severely impair the statistical convergence, since large numbers of events are required to achieve a precision sufficient for an accurate cancellation of the contributions with opposite signs. In fact, for a fractional negative-weight contribution \(r_-\) the number \(N(r_-)\) of unweighted Monte Carlo events required to reach a given statistics goal is (see e.g. [25])

$$\begin{aligned} N(r_-) = \frac{N(0)}{(1 - 2r_-)^2}. \end{aligned}$$
(1)

Requiring a larger number of events increases the computational cost not only in the generation stage, but especially in the subsequent detector simulation. A further problem is the increased disk space usage, which induces both short- and long-term storage costs. It is therefore highly desirable to keep the negative-weight contribution small, \(r_- \ll \frac{1}{2}\).
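For illustration, evaluating Eq. (1) for a few values of \(r_-\) shows how quickly the required sample size grows:

$$\begin{aligned} N(0.1) \approx 1.6\, N(0), \qquad N(0.25) = 4\, N(0), \qquad N(0.4) = 25\, N(0). \end{aligned}$$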

One avenue in this direction is to reduce the number of negative-weight events during event generation, see e.g. [25,26,27,28,29]. A second approach is to eliminate negative weights in the generated sample, before detector simulation [30,31,32,33]. In the following, we focus on the cell resampling approach proposed in [33], which in turn was inspired by positive resampling [30]. Cell resampling is independent of both the scattering process under consideration and any later stages of the event simulation chain. It only redistributes event weights, exactly preserving event kinematics. The effective range over which event weights are smeared decreases with increasing event density. This implies that the smearing uncertainty decreases systematically with increasing statistics, without any need to change the method. As demonstrated for the example of the production of a W boson with two jets at next-to-leading order (NLO), a large fraction of negative weights can be eliminated without discernible distortion of any predicted observable. A limitation of the original formulation is the computational cost, which rises steeply with both the event sample size and the number of final-state particles.

In Sect. 2, we briefly review the method and describe a number of algorithmic improvements which allow us to overcome the original limitations through a speed-up by several orders of magnitude. In Sect. 3, we then apply cell resampling to high-multiplicity samples with up to several billion events for the production of a W or Z boson in association with up to five jets at NLO. We conclude in Sect. 4.

2 Algorithmic improvements

In the following, we briefly review cell resampling and describe the main improvements that allow us to apply the method also to large high-multiplicity Monte Carlo samples. For a detailed motivation and description of the original algorithm see [33].

2.1 Cell resampling

At its heart, cell resampling consists of repeatedly selecting a subset of events – referred to as a cell – and redistributing event weights within the selected set. The steps are as follows.

1. Select an event with negative weight as the first event (the “seed”) of a new cell \({{\mathcal {C}}}\).

2. Out of all events outside the cell, add the one with the smallest distance from the seed to the cell. Repeat for as long as the accumulated weight of all events within the cell is negative.

3. Redistribute the weights of the events inside the cell such that the accumulated weight is preserved and none of the final weights is negative (see footnote 1).

4. Start anew with the next cell, i.e. with step 1.

Note that an event can be part of several cells, but will only be chosen as a cell seed at most once. In practice, we usually want to limit the maximum cell size and abort step 2 once the distance between the cell seed and its nearest neighbour outside the cell becomes too large. We denote the maximum cell radius, i.e. the maximum allowed distance between the cell seed and any other event within the cell, by \(d_\text {max}\). It is a parameter of the cell resampling algorithm. If we limit the cell size in this way, we can only achieve a partial cancellation between negative and positive event weights in the following step 3. For practical applications this is often sufficient, since the small remaining contribution from negative weights has a much reduced impact on the statistical convergence, cf. Eq. (1).
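To make the procedure concrete, the following is a minimal sketch of the cell construction loop in Rust, the language of our implementation. The `Event` type, the brute-force neighbour search, and the weight redistribution by simple averaging are illustrative placeholders: the actual code uses the vantage-point tree search of Sect. 2.2, and when the radius limit prevents full cancellation it still redistributes weights to reduce the remaining negative contribution.

```rust
// Minimal sketch of cell resampling (steps 1-4); `Event` and the weight
// redistribution prescription are illustrative placeholders.
struct Event {
    weight: f64,
    // ... kinematic information used by the distance function
}

/// Resample `events` in place, limiting cells to the radius `d_max`.
fn cell_resample(events: &mut [Event], d_max: f64, dist: impl Fn(&Event, &Event) -> f64) {
    // Step 1: every negative-weight event seeds a cell (at most once).
    let seeds: Vec<usize> = (0..events.len())
        .filter(|&i| events[i].weight < 0.0)
        .collect();

    for seed in seeds {
        if events[seed].weight >= 0.0 {
            continue; // already compensated by an earlier cell
        }
        let mut cell = vec![seed];
        let mut acc = events[seed].weight;

        // Step 2: grow the cell by nearest neighbours (here a brute-force
        // search; in practice the vantage-point tree of Sect. 2.2) until the
        // accumulated weight is non-negative or the radius limit is reached.
        while acc < 0.0 {
            let candidate = (0..events.len())
                .filter(|i| !cell.contains(i))
                .min_by(|&a, &b| {
                    dist(&events[seed], &events[a])
                        .partial_cmp(&dist(&events[seed], &events[b]))
                        .unwrap()
                });
            match candidate {
                Some(i) if dist(&events[seed], &events[i]) <= d_max => {
                    acc += events[i].weight;
                    cell.push(i);
                }
                _ => break, // no further neighbour within d_max
            }
        }

        // Step 3: redistribute weights, preserving their sum. Averaging is
        // one possible prescription when full cancellation is achieved; the
        // partial-cancellation case is omitted for brevity.
        if acc >= 0.0 {
            let avg = acc / cell.len() as f64;
            for &i in &cell {
                events[i].weight = avg;
            }
        }
        // Step 4: continue with the next seed.
    }
}
```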

The computational cost of cell resampling tends to be completely dominated by the nearest-neighbour search in step 2. In a naive approach, one has to calculate the distances between the cell seed and each other event in the sample. Since the number of cells is proportional to the sample size N, the total computational complexity is \({{\mathcal {O}}}(N^2)\). This renders the naive approach infeasible for samples with more than a few million events. For this reason, an alternative approximate nearest-neighbour search based on locality-sensitive hashing (LSH) [34, 35] was considered in [33]. While this led to an improved scaling behaviour, the quality of the approximate search was also found to deteriorate with increasing sample size. An improved version of this algorithm, discussed in Appendix A, still appears to suffer from the same problem. In Sect. 2.2, we introduce an exact search algorithm that is orders of magnitude faster than the naive search.

The problem of costly distance calculations is further exacerbated by the fact that a direct implementation of the originally proposed distance function suffers from poor scaling at high multiplicities. To compute the distance between two events e and \(e'\), we first cluster the outgoing particles into infrared-safe physics objects, e.g. jets. We collect objects of the same type t into sets \(s_t\) for e and \(s_t'\) for \(e'\). The distance between the two events is then

$$\begin{aligned} d(e,e') = \sum _t d(s_t, s_t'), \end{aligned}$$
(2)

where \(d(s_t, s_t')\) is the distance between the two sets \(s_t, s_t'\). It is given by

$$\begin{aligned} d(s_t,s_t') = \min _{\sigma \in S_P} \sum _{i=1}^P d(p_i, q_{\sigma (i)}), \end{aligned}$$
(3)

where \(p_1,\dots , p_P\) are the momenta of the objects in \(s_t\) and \(q_1,\dots , q_P\) the momenta (see footnote 2) in \(s_t'\). A naive minimisation procedure considers all permutations \(\sigma \) in the symmetric group \(S_P\), i.e. P! possibilities. For large multiplicities P a direct calculation quickly becomes prohibitively expensive. In [33], it was therefore suggested to use an approximate scheme in this case. In Sect. 2.3 we discuss how the set-to-set distance can be calculated both exactly and efficiently.

2.2 Nearest-neighbour search

Our improved nearest-neighbour search is based on vantage-point trees [36, 37]. To construct a vantage-point tree, we choose a single event as the first vantage point. We then compute the distance to the vantage point for each event. The closer half of the events lie within a hypersphere with radius given by the median distance to the vantage point. We call the populated part of this hypersphere the inside region and its complement the outside region. We then recursively construct vantage-point trees inside each of the two regions. The construction terminates in regions that only contain a single point.
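The recursive construction can be summarised in a few lines. The sketch below, written in Rust like our implementation, operates on event indices and assumes a generic distance function; for brevity it takes an arbitrary event of each slice as the vantage point, whereas the boundary-oriented choice we actually use is described below.

```rust
// Sketch of vantage-point tree construction over event indices; the choice of
// vantage point is simplified compared to the strategy described in the text.
enum VpTree {
    Empty,
    Leaf(usize),
    Node {
        vantage: usize,    // index of the vantage-point event
        radius: f64,       // median distance separating inside and outside
        inside: Box<VpTree>,
        outside: Box<VpTree>,
    },
}

fn build(points: &mut Vec<usize>, dist: &impl Fn(usize, usize) -> f64) -> VpTree {
    match points.len() {
        0 => VpTree::Empty,
        1 => VpTree::Leaf(points[0]),
        _ => {
            // Simplified vantage-point choice: take an arbitrary event.
            let vantage = points.swap_remove(0);
            // Sort the remaining events by distance to the vantage point and
            // split at the median distance.
            points.sort_by(|&a, &b| {
                dist(vantage, a).partial_cmp(&dist(vantage, b)).unwrap()
            });
            let mid = points.len() / 2;
            let mut outside = points.split_off(mid);
            let radius = dist(vantage, outside[0]);
            VpTree::Node {
                vantage,
                radius,
                inside: Box::new(build(points, dist)),
                outside: Box::new(build(&mut outside, dist)),
            }
        }
    }
}
```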

To find the nearest neighbour of an event e, we start at the root of the tree, namely the first chosen vantage point. We calculate the distance D between this vantage point and e. If D is less than the radius R of the hypersphere defining the inside region, we first continue the search in the inside subtree, otherwise we choose the outside subtree first. Let us first consider the case that the inside region is the preferred one. It will contain a nearest-neighbour candidate at some distance d from the event e. By the triangle inequality, the actual nearest neighbour can have a distance of at most \(D+d\) to the current vantage point. Therefore, if \(D+d < R\), the actual nearest neighbour cannot be in the outside region. Conversely, if we started our search in the outside region and found a nearest-neighbour candidate at a distance d with \(D-d > R\), then the actual nearest neighbour cannot lie in the inside region. In summary, if \(d < |R - D|\) only the preferred region has to be considered.
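A corresponding nearest-neighbour search on the `VpTree` from the previous sketch, again illustrative rather than our actual code, can then be written as follows. The pruning condition implements the triangle-inequality argument above, and the maximum cell radius \(d_\text{max}\) is already used to discard candidates and regions that are irrelevant for cell construction (cf. the optimisation discussed below).

```rust
// Nearest-neighbour search on the `VpTree` from the previous sketch. The
// pruning mirrors the triangle-inequality argument in the text, and `d_max`
// discards regions that cannot contain a neighbour within the cell radius.
fn nearest(
    tree: &VpTree,
    query: usize,
    d_max: f64,
    dist: &impl Fn(usize, usize) -> f64,
    best: &mut Option<(usize, f64)>, // (event index, distance)
) {
    // Accept a candidate if it is within d_max and beats the current best.
    fn improves(best: &Option<(usize, f64)>, d: f64, d_max: f64) -> bool {
        let beats_best = match best {
            Some((_, bd)) => d < *bd,
            None => true,
        };
        d <= d_max && beats_best
    }

    match tree {
        VpTree::Empty => {}
        VpTree::Leaf(i) => {
            let d = dist(query, *i);
            if improves(best, d, d_max) {
                *best = Some((*i, d));
            }
        }
        VpTree::Node { vantage, radius, inside, outside } => {
            let d_vp = dist(query, *vantage);
            if improves(best, d_vp, d_max) {
                *best = Some((*vantage, d_vp));
            }
            // Search the preferred region first ...
            let (first, second) = if d_vp < *radius {
                (inside, outside)
            } else {
                (outside, inside)
            };
            nearest(first, query, d_max, dist, best);
            // ... and visit the other region only if neither the current best
            // candidate nor d_max rules it out, i.e. unless d < |R - D|.
            let bound = match best {
                Some((_, bd)) => (*bd).min(d_max),
                None => d_max,
            };
            if bound >= (*radius - d_vp).abs() {
                nearest(second, query, d_max, dist, best);
            }
        }
    }
}
```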

Vantage-point tree search is indeed very well suited for cell resampling. The construction is completely agnostic to the chosen distance function. In particular, unlike the LSH-based methods considered in [33] and Appendix A, it does not require a Euclidean metric. For an event sample of size N, the tree construction requires \({{\mathcal {O}}}(N \log N)\) steps and can be easily parallelised. In the ideal case where only the preferred regions are probed, each nearest-neighbour search requires \(\log _2 N\) comparisons, which again results in an overall asymptotic complexity of \({{\mathcal {O}}}(N \log N)\). While this means that for sufficiently large event samples cell resampling will eventually require more computing time than the \({{\mathcal {O}}}(N)\) event generation, we find that this is not the case for samples with up to several billion events. Timings for a practical application are given in Sect. 3.3.

We further optimise the nearest-neighbour search in several aspects. Most importantly, if we limit the maximum cell size to \(d_{\text {max}}\), we can dramatically increase the probability that only the preferred regions have to be considered. In fact, if \(|R - D| > d_{\text {max}}\) then any suitable nearest neighbours have to lie inside the preferred region. We can further enhance the probability through a judicious choice of the vantage points. Since input events near the boundary between inside and outside regions require checking both regions for nearest neighbours, the general goal is to minimise this surface. To this end, we choose our first vantage point at the boundary of the populated phase space. We select a random event, calculate the distance to all other events, and choose the event with the largest distance as the vantage point. Then, when constructing the subtrees for the inside and outside regions, we choose as vantage points those events that have the largest distance to the parent vantage point.

When constructing a cell, we have to find nearest neighbours until either the accumulated weight becomes non-negative or the distance exceeds the maximum cell radius. This corresponds to a so-called k nearest neighbour search, where in this case the required number k of nearest neighbours is a priori unknown. To speed up successive searches, we cache the results of distance calculations, i.e. all values of D for a given input event.
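A simple way to realise such caching is a memoisation wrapper around the distance function, as in the following sketch; the type and its interface are illustrative, and the actual data structure in our code may differ.

```rust
use std::collections::HashMap;

// Illustrative memoisation of event-to-event distances from a fixed seed, so
// that successive nearest-neighbour queries during cell growth reuse earlier
// evaluations. `dist` is any (expensive) event-to-event distance function.
struct CachedDist<'a, F: Fn(usize, usize) -> f64> {
    dist: &'a F,
    cache: HashMap<(usize, usize), f64>,
}

impl<'a, F: Fn(usize, usize) -> f64> CachedDist<'a, F> {
    fn new(dist: &'a F) -> Self {
        Self { dist, cache: HashMap::new() }
    }

    /// Distance between `seed` and `other`, computed at most once.
    fn get(&mut self, seed: usize, other: usize) -> f64 {
        let dist = self.dist;
        *self.cache.entry((seed, other)).or_insert_with(|| dist(seed, other))
    }
}
```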

Finally, we note that the vantage-point tree can also be employed for approximate nearest-neighbour search if one only searches the preferred region in each step. We exploit this property by first partitioning the input events into the inside and outside regions of a shallow vantage point tree, aborting the construction already after the first few steps. We then apply cell resampling to each partition independently. This approach allows efficient parallelisation, while yielding much better results than the independent cell resampling of randomly chosen partial samples. The price to pay is that the quality of the nearest-neighbour search and therefore also of the overall resampling deteriorates to some degree. In practice this effect appears to be minor, see also Sect. 3.2.

2.3 Set-to-set distance at high multiplicities

The distance between two events as defined in [33] is the sum of distances between sets of infrared-safe physics objects, see Eq. (2). To define the distance between two such sets \(s_t, s_t'\), we aim to find the optimal pairing between the momenta \(p_1,\dots , p_P\) of the objects in \(s_t\) and the momenta \(q_1,\dots , q_P\) of the objects in \(s_t'\). The naive approach of considering all possible pairings, cf. Eq. (3), scales very poorly with the number of objects. However, the task of finding an optimal pairing is an instance of the well-studied assignment problem.

Table 1: NLO event samples used for cell resampling

Let us introduce the matrix \({{\mathbb {D}}}\) of distances with

$$\begin{aligned} {{\mathbb {D}}}_{ij} \equiv d(q_i, p_j). \end{aligned}$$
(4)

An efficient method for minimising \(\sum _{i=1}^{P} {{\mathbb {D}}}_{i\sigma (i)}\) was first found by Jacobi [38, 39]. It was later rediscovered independently and popularised under the name “Hungarian method” [40,41,42,43,44]. The algorithm modifies the entries of \({{\mathbb {D}}}\) in such a way that the optimal pairing is preserved in each step. After each step, one marks a minimum number of rows and columns such that each vanishing entry is part of a marked row or column. The algorithm terminates as soon as all rows (or columns) have to be marked. The steps modifying the matrix are as follows:

1. Replace each \({{\mathbb {D}}}_{ij}\) by \({{\mathbb {D}}}_{ij} - \min _{k} {{\mathbb {D}}}_{ik}\).

2. Replace each \({{\mathbb {D}}}_{ij}\) by \({{\mathbb {D}}}_{ij} - \min _{k} {{\mathbb {D}}}_{kj}\).

3. Find the smallest non-vanishing entry. Subtract it from all unmarked rows and add it to all marked columns.

Step 3 is repeated until the termination criterion is fulfilled. In our code, we use the implementation in the pathfinding [45] package. Like the remainder of our implementation of cell resampling it is written in the Rust programming language.
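As an illustration, the minimisation in Eq. (3) can be delegated to the kuhn_munkres_min routine of the pathfinding crate as sketched below. The snippet assumes distances discretised to integer values, since the routine requires an ordered cost type; the momentum_distance helper is a mere placeholder, and the treatment of floating-point distances in our actual code may differ.

```rust
// Sketch of evaluating the set-to-set distance of Eq. (3) with the Hungarian
// method as provided by the `pathfinding` crate. Distances are discretised to
// integers because `kuhn_munkres_min` requires an ordered cost type;
// `momentum_distance` is a placeholder for the distance of Eq. (5).
use pathfinding::matrix::Matrix;
use pathfinding::prelude::kuhn_munkres_min;

fn momentum_distance(p: &[f64; 4], q: &[f64; 4]) -> i64 {
    // Placeholder: Euclidean distance of the spatial components,
    // discretised from GeV to integer MeV.
    let d2: f64 = (1..4).map(|i| (p[i] - q[i]).powi(2)).sum();
    (d2.sqrt() * 1000.0).round() as i64
}

/// Minimal total distance over all pairings of the momenta in the two sets.
fn set_distance(ps: &[[f64; 4]], qs: &[[f64; 4]]) -> i64 {
    assert_eq!(ps.len(), qs.len());
    // Distance matrix D_ij = d(q_i, p_j), cf. Eq. (4).
    let rows = qs
        .iter()
        .map(|q| ps.iter().map(|p| momentum_distance(p, q)).collect::<Vec<_>>());
    let costs = Matrix::from_rows(rows).expect("rectangular distance matrix");
    // Minimal total cost over all pairings, together with the assignment.
    let (total, _assignment) = kuhn_munkres_min(&costs);
    total
}
```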

Using the Hungarian algorithm instead of a brute-force search improves the scaling behaviour for sets with P momenta from \({{\mathcal {O}}}(P!)\) to \({{\mathcal {O}}}(P^3)\). In practice, we find it superior for \(P > 3\). The FlowAssign algorithm proposed by Ramshaw and Tarjan [46] would scale even better, with a time complexity of \({{\mathcal {O}}}\big (P^{5 / 2}\log (D P)\big )\). The caveat is that the scaling also depends logarithmically on the range D of distances encountered. Since the maximum multiplicity reached in current NLO computations is limited, an auction [47] (or equivalently push-relabel [48, 49]) algorithm may still perform better in practice despite formally inferior scaling behaviour. We leave a detailed comparison to future work.

3 Negative weight elimination in vector boson plus jets production at NLO

We are now in a position to apply cell resampling to large high-multiplicity event samples. We consider the production of a vector boson in association with jets at NLO, using ROOT [50] ntuple [51, 52] event files generated with BlackHat [53] and Sherpa 2.1 [54]. Jets are defined according to the anti-\(k_t\) algorithm [55] with \(R = 0.4\) and a minimum transverse momentum of \(25\,\)GeV. More details on the event generation are given in [52, 56]. The various samples with their most salient properties are listed in Table 1.

We apply cell resampling to each of the samples, defining infrared-safe physics objects according to the above jet definition. We use the distance function defined in [30], which follows from Eqs. (2), (3), and the momentum distance

$$\begin{aligned} d(p, q) = \sqrt{|\textbf{p} - \textbf{q}\,|^2 + \tau ^2 (p_\perp - q_\perp )^2}. \end{aligned}$$
(5)

Here, we set \(\tau = 0\) and limit the maximum cell radius to 10 GeV for samples Z1, Z2, and W5, and to 2 GeV for sample Z3. To examine the impact of these choices for the maximum radius we additionally compare to a resampling run with a maximum cell size of 100 GeV for sample W5.
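For reference, a direct transcription of Eq. (5) reads as follows, assuming four-momenta stored as \([E, p_x, p_y, p_z]\); the function name and signature are illustrative.

```rust
// Direct transcription of the momentum distance in Eq. (5), assuming
// four-momenta stored as [E, px, py, pz]; name and signature are illustrative.
fn momentum_distance(p: &[f64; 4], q: &[f64; 4], tau: f64) -> f64 {
    // |p - q|^2 for the spatial components
    let spatial2: f64 = (1..4).map(|i| (p[i] - q[i]).powi(2)).sum();
    // transverse momenta of the two objects
    let pt = |v: &[f64; 4]| (v[1] * v[1] + v[2] * v[2]).sqrt();
    (spatial2 + tau * tau * (pt(p) - pt(q)).powi(2)).sqrt()
}
```

With \(\tau = 0\), as used here, the distance reduces to the Euclidean distance between the spatial momentum components.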

For better parallelisation and general performance we pre-partition each input sample into several regions according to one of the upper levels of a vantage-point tree, as explained in Sect. 2.2. Here, we use the seventh level, corresponding to 128 regions.

To interpret our results we use standard Rivet [57] analyses. We verify that the event count and total cross section of each sample are preserved using the MC_XS analysis. Furthermore, we employ this analysis to assess the degree to which negative weights are eliminated. For the sample W5 we additionally use the MC_WINC and MC_WJETS analyses, and their counterparts MC_ZINC and MC_ZJETS for the remaining samples involving a Z boson. We investigate the impact of additional cuts applied after cell resampling using the ATLAS analysis ATLAS_2017_I1514251 [58] for inclusive Z boson production.

3.1 Comparison of predictions

We first verify that predictions remain equivalent by comparing a number of distributions before and after cell resampling. Figure 1 shows a variety of distributions for sample W5. In Fig. 2 we show selected distributions for the Z1, Z2, and Z3 samples. In all cases, we find that the differences between the original and resampled predictions are comparable to or even smaller than the statistical bin-to-bin fluctuations in the original sample. A more indirect way to estimate the bias introduced by cell resampling is to consider the characteristic cell radii and the spread of measured observables within the cells. This is discussed in Appendix 1.

Fig. 1: Comparison of distributions before and after cell resampling for sample W5 in Table 1. The blue lines indicate cell resampling with a maximum cell radius of 10 GeV, the green lines result from a radius limit of 100 GeV. Distributions are normalised according to the total cross section for sample W5

Fig. 2: Comparison of distributions before and after cell resampling for samples Z1, Z2, and Z3 in Table 1. Panels (a)–(d) show jet transverse momentum and rapidity distributions taken from the ATLAS_2017_I1514251 Rivet analysis. Panel (e) shows a jet rapidity distribution taken from MC_ZJETS

3.2 Improvement in sample quality

In order to assess the improvement achieved through cell resampling, we first consider the reduction in the negative weight contribution. To this end, we determine how much larger the original and the resampled event samples have to be to reach the same statistical power as an event sample without negative weights. In other words, we compute \(N(r_-)\) as defined in Eq. (1), where the fractional negative-weight contribution \(r_- = \sigma _-/(\sigma _+ + \sigma _-)\) is obtained from the contribution \(\sigma _+\) of positive-weight events to the total cross section \(\sigma \) and the absolute value of the negative-weight cross section contribution \(\sigma _- = \sigma _+ - \sigma \).
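Combining these definitions, the ratio shown in Fig. 3 can be expressed directly in terms of the cross section contributions,

$$\begin{aligned} \frac{N(r_-)}{N(0)} = \frac{1}{(1-2r_-)^2} = \left(\frac{\sigma _+ + \sigma _-}{\sigma _+ - \sigma _-}\right)^2 = \left(\frac{\sigma _+ + \sigma _-}{\sigma }\right)^2. \end{aligned}$$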

As demonstrated in Fig. 3, cell resampling leads to a drastic improvement by roughly two to three orders of magnitude. Increasing the maximum cell radius leads to an even stronger reduction, at the cost of increased computing time and potentially larger systematic errors introduced by the procedure. To assess the impact of pre-partitioning the event samples, we alternatively resample Z1 without prior partitioning. This leads to a slight reduction of \(N(r_-)/N(0)\) from 18.4 with pre-partitioning to 17.1 without pre-partitioning.

Fig. 3: Required number of events relative to an ideal event sample without negative weights, before and after resampling. Event samples are labelled as listed in Table 1

Cell resampling not only reduces the amount of negative weights, but as a by-product also results in a narrower weight distribution, which enhances the unweighting efficiency. Indeed, after standard unweighting we retain only 320 out of the \(8.21\times 10^8\) events in the Z1 sample. If we apply cell resampling beforehand, unweighting yields 11,574 events. The resulting unweighted sample is not only larger, but also contains a lower fraction of negative-weight events. We illustrate the gain in statistical power by selecting a subset of 320 randomly chosen events and comparing it to the unweighted sample based on the original events. Selected distributions are shown in Fig. 4.

Fig. 4: Comparison between unweighted samples before and after cell resampling. Lines labelled “original” show the reference prediction from the original weighted event sample Z1. After standard unweighting, the lines with the label “unweighted” are obtained. Applying cell resampling followed by standard unweighting to the sample Z1 yields a sample represented by the “resampled + unweighted” lines. Out of this sample, we randomly select a subset matching the size of the original “unweighted” sample. This leads to the “resampled + unweighted (small sample)” lines. Data points are taken from [58]

3.3 Runtime requirements

Cell resampling with the improvements presented in Sect. 2 and a maximum cell size of 10 GeV typically takes a few hours of wall-clock time for samples with about a billion events. As an example, let us consider the resampling for the W5 sample listed in Table 1. The combined size of the original compressed event files is approximately 150 GB. The resampling program requires about 450 GB of memory and the total runtime is about 9 h on a machine with 24 Intel Xeon E5-2643 processors. The memory usage could of course be reduced significantly at the cost of computing time by not keeping all events in memory, but we have not explored this option in our current implementation. Reading in the events and converting them to a space-efficient internal format that only retains the information needed for resampling takes about 2 h. This is followed by approximately 30 min spent for the pre-partitioning of the event sample and less than 3 h for resampling itself, including the construction of the search trees, cf. Sect. 2.2. Since the event information in the internal format is incomplete, we finally read in the original events again and write them to disk after updating their weights. This final step takes roughly 4 h. While input and output do not benefit from parallelisation, the pre-partitioning and the resampling are performed in parallel and the total CPU time spent is 55 h.

One important optimisation discussed in Sect. 2.2 is trimming the nearest-neighbour search according to the maximum cell radius. In fact, when increasing the allowed radius from 10 to 100 GeV the wall-clock time needed for resampling rises to several weeks, with a corresponding increase in total CPU time. Extrapolating from smaller sample sizes, the expected total required CPU time without any of the new optimisations would be of the order of 1600 years for the much simpler process of W boson production with two jets considered in [33].

4 Conclusions

We have demonstrated that the fraction of negative event weights in existing large high-multiplicity samples can be reduced by more than an order of magnitude, whilst preserving predictions for observables within statistical uncertainties. Concretely, we have employed the cell resampling method proposed in [33] with NLO event samples for Z boson production with up to three jets and W boson production with five jets produced with Sherpa and BlackHat.

For the first time, cell resampling has been applied to samples with up to several billion events. This was made possible by algorithmic improvements leading to a speed-up by several orders of magnitude. Our updated implementation can be retrieved from https://cres.hepforge.org/.

The advances in the development of the cell resampling method presented in this work pave the way for future applications to processes with high multiplicities, in particular including parton-showered predictions. It will be necessary to quantify the uncertainty introduced by the weight smearing. Variations in the maximum cell size parameter and different prescriptions for the weight redistribution within a cell can serve as handles to assess this uncertainty. Another promising avenue for further exploration is the analysis of the information on the weight distribution within phase space collected during cell resampling. Regions with insufficient Monte Carlo statistics could be identified by their accumulated negative weight, thereby guiding the event generation. We leave the investigation of these questions to future work.