Competing Sudakov Veto Algorithms

We present a way to analyze the distribution produced by a Monte Carlo algorithm. We perform these analyses on several versions of the Sudakov veto algorithm, adding a cutoff, a second variable and competition between emission channels. The analysis allows us to prove that multiple, seemingly different competition algorithms, including those that are currently implemented in most parton showers, lead to the same result. Finally, we test their performance and show that there are significantly faster alternatives to the commonly used algorithms.


Introduction
Parton showers form an integral part of the event generators that are commonly used to compare data from collider experiments with theory [1][2][3]. The Sudakov veto algorithm is used in the procedure of generating the subsequent emissions that make up the shower. It facilitates the resummation of logarithmic contributions to all orders in the coupling constant in a Monte Carlo framework, thereby producing realistic final states. A positive ordering variable (scale) t is typically evolved down from an initial scale u, generating ordered branchings of partons. The scale of the next branching is selected according to a probability distribution of the form where a e-mail: R.Kleiss@science.ru.nl b e-mail: rverheyen@science.ru.nl The function p(t) is the branching kernel. The function (t, u) is known as the Sudakov form factor. It represents the probability of no emission occurring between two scales. In a Monte Carlo setting, scales must be sampled from Eq. (1). To do that, the inverse of the Sudakov form factor must be computed. Unfortunately, p(t) is typically not simple enough for this inverse to be analytically calculable. Therefore, the Sudakov veto algorithm is used. In this paper, we will present a thorough analysis of this algorithm. In a practical setting, Eq. (1) has to be extended in several ways, one of which is the competition between branching channels. We will analyze the veto algorithm for these extensions, and we will in particular provide multiple algorithms to handle competition. Among these algorithms are those used currently by event generators, and some alternatives which, although seemingly different, will be shown to be equivalent. By implementing them in an antenna parton shower much like [4][5][6], we test their performance and show that the alternative algorithms are much faster. This paper is organized as follows. In Sect. 2, we will first set up a formalism to analyze Monte Carlo algorithms in general. This formalism is then used to show the validity of the Sudakov veto algorithm in Sect. 3. Next, in Sect. 4, the algorithm is extended to include a cutoff scale, a second variable and competition between branching channels. We will then prove the equivalence of several different algorithms for competition. In Sect. 5, the performance of these algorithms is tested by implementing them in an actual parton shower.

The unitary algorithm formalism
A useful approach to the analysis of algorithms can be formulated in terms of integration results. This can be denoted the formalism of unitary algorithms. The idea is that these integration results can be translated, at the one hand, into positive statements and, on the other hand, into readily implementable pseudocode. Let g(x) be a probability density. Then, the formula on the one hand reads 'we have an algorithm to generate random numbers according to the distribution g(x) , and, on the other hand, the pseudocode statement which says that the number x be obtained from the algorithm delivering the distribution g. As a simple example, the statement implies that we have available an algorithm that delivers random numbers x, uniformly distributed with the density θ(0 < x < 1), where we have defined the logical step function And indeed, this just says 'we generate a random number uniformly distributed between 0 and 1', using the pseudorandom number generator of choice. 1 In fact, to shorten notation later on, we will denote any random number generated according to Eq. (5) by ρ. A second ingredient of the formalism is the assignment operation which is equivalent to the pseudocode statement We shall of course use the standard result where the sum runs over the roots x j of h(x) = y (all assumed to be single). It is to be noted here that the integral over y runs over all real values, but if the range of h is restricted to h 0 ≤ h(x) ≤ h 1 , then we automatically have the corresponding bounds on y.
As a simple example, let us imagine the inverse of the primitive function P(t) of p(t) from Eq. (1) is available. 1 A possible source of conflict is that the formalism uses the realnumber model of computation, while of course the actual code uses finite-wordsize numbers. On the other hand, any algorithm that is sensitive to the difference between the two models of computation is tainted and should be shunned.
The pseudocode to generate values of t according to Eq. (1) is: where ρ here and in the following comes from an (idealized) source of iid 2 random numbers uniform in (0, 1]. We analyze Eq. (10): So that 'we have an algorithm to generate t according to Eq. (1) , where the algorithm is of course given by Eq. (10).
A variant of the formalism is encountered in the rejection algorithm, which is already very close to the Sudakov veto algorithm. Let g(x) be a probability density that we can generate, f (x) a non-negative function, and c a number such that c g(x) ≥ f (x) over the support of f (x). The rejection algorithm then reads.

Algorithm 1 The rejection algorithm
Let K (x) be the resulting density. We can then write δ(x − y) 2 Independent, identically distributed.
from which we see that K (x) is the normalized probability density proportional to f (x): Note how the loop is embodied by the reappearance of K (x) on the right-hand side in the first line of Eq. (12). With these few basic ingredients the result of any algorithm (provided it terminates with unit probability) can be reduced to the elimination of Dirac delta functions, and we shall employ these ideas in what follows.

Analyzing the Sudakov veto algorithm
We now present the Sudakov veto algorithm and analyze it using the techniques of the previous section. We first establish that Eq. (1) is normalized if P(t), the primitive function of The Sudakov veto algorithm relies on the existence of an overestimate function q(t) ≥ p(t) which does have an invertible Sudakov factor. The algorithm is given below in pseudocode.

Algorithm 2 The Sudakov veto algorithm
It was shown in the previous section that the first step in the loop generates values of t distributed according to Eq.
(1) where the kernel is q(t) instead of p(t), and the scale u is set to the previous value of t. Thus, the value of t is evolved downward at every step of the loop, which is the crucial difference with algorithm Eq. (12). There, subsequent values for t would be generated in the same way every time. The ifstatement represents the veto step. A scale is accepted with probability p(t)/q(t), at which point the algorithm terminates. We now convert the algorithm to unitary language as we did before in Eq. (12) for the rejection algorithm.
After generating a trial scale τ , the random number ρ and the step functions guide the algorithm to either accept the generated scale, or to start over using τ as the new starting point. Next, the integral over ρ is worked out.
Taking the derivative with respect to u, we find the following differential equation: It is solved by which is Eq. (1). It is, however, not the most general solution to Eq. (17). We will consider this issue more carefully in the next section.

Extending the algorithm
Next, we consider the Sudakov veto algorithm in a more practical setting. The algorithm needs to be extended in several ways to be applicable in a real parton shower. They are: • An infrared cutoff μ has to be introduced. This cutoff is required in QCD to avoid the nonperturbative regime. In event generators, the parton shower is evolved to this cutoff scale, after which the results are fed to a hadronization model. The consequence is that the Sudakov factor will not equal zero at the lower boundary of the scale integral. Therefore Eq. (1) is no longer normalized to one and is thus not a probability distribution. • The scale variable t is not enough to parameterize the entire branching phase space. An additional variable z has to be introduced. 3 In traditional parton showers, this parameter is the energy fraction carried by a newly created parton. However, in the more modern dipole or antenna showers, it is just a variable that parameterizes the factorized phase space. The boundaries of the branching phase space translate to scale-dependent boundaries on z. • The algorithm has to account for emissions from multiple channels. These channels can originate from either the presence of multiple partons or dipoles, or from multiple branching modes.
We now include these issues separately before incorporating them into a single algorithm.

Introducing a cutoff
In a realistic parton shower, the values of the scale t are not allowed to reach zero. In the case of QCD, a cutoff value μ is set at a value of about 1 GeV, below which a perturbative approach is no longer valid. Equation (1) now no longer represents a probability distribution. This same problem would occur if the primitive of the branching kernel P(t) would not diverge for vanishing t, as is for instance the case for kernels of massive particles. The following algorithm, due to [7], allows for the introduction of a cutoff and deals with nondiverging P(t) simultaneously. The algorithm below first shows how to generate trial values for t.
We analyze this algorithm to find what probability distribution it represents.
where in the last step we used the fact that q(t) is a positive function, and therefore Q(t) is monotonically increasing. Compared with Eq. (11), Eq. (19) has an additional term that compensates the contribution of the lower bound on the original probability distribution. The veto algorithm should reproduce this distribution for the branching kernel p(t).

Algorithm 4
The Sudakov veto algorithm in the presence of a cutoff μ Writing it down in unitary language: Going through the same steps as before, we find After taking the derivative with respect to u, the first term drops out and the μ-dependence disappears from the second. Therefore, Eq. (17) is recovered. However, Eq. (18) is not the only solution to this differential equation. A more general solution is: for some scale σ < u. To fix sigma, we require that E(t; u) reduces to a delta function distribution when u → μ, which leads to σ = μ.

Introducing a second variable
The targeted distribution is now: where which is normalized as We now need to produce pairs (t, z) distributed according to E(t, z; u). A difficulty lies in the dependence of the range of z on the scale. In order to generate a value for t, the ζ integral in the Sudakov factor is required, which depends on t. On the other hand, z cannot be generated first, since its boundaries depend on t.
To deal with this problem, an additional veto condition is introduced. We introduce a constant overestimate of the z-range as z − ≤ z − (t) and z + ≥ z + (t). Additionally we require the overestimate function to be factorized as q(t, z) = r (t)s(z) where still q(t, z) ≥ p(t, z). Then, we define The algorithm is given below.

z)/q(t, z) and z − (t) < z < z + (t) then return t end if end loop
We first analyze the step of this algorithm that generates z.
Thus, z is distributed according to s(z). Introducing the notation we now analyze Algorithm 5.
Evaluating the integrals and taking the derivative with respect to u leads to: which is solved by Eq. (23).

Competing channels
Let us assume there are n branching channels, each characterized by a branching kernel p i (t). The density E(t; u) now contains a Sudakov factor representing the no-branching probability for all channels, which is just the product of the individual Sudakov factors. The probability of branching at some scale is the sum of the kernels. Introducing the notation for any set of n functions, this leads to the probability distribution This distribution can be produced by generating multiple scales and selecting the highest. This can be shown using the following result: where we used the notation which is a step function selecting the highest of all τ . The functions f i can be either p i or q i . In the first case, the veto algorithm for a single channel can be used to produce the densities that appear in the first line of Eq. (34). In the second case, the highest of the trial scales is selected and subsequently the veto step is applied using the kernel of the selected channel. Both procedures result in Eq. (32). Next, we present a very different algorithm that also produces this density.

Algorithm 6 A different competition Sudakov veto algorithm
Select i between 1 and n with probability

t) then return t end if end loop
We analyze this algorithm to show that it also produces Eq. (32): where q 0 (t) ≡ 0. We go through the usual steps, noting that after doing the ρ 1 integral, the new sum over step functions yields terms q i (τ )/ q(τ ) representing the probabilities to select the corresponding channels. The differential equation becomes: which is solved by Eq. (32). Algorithm 6 requires the generation of trial scales using q(t) as overestimated branching kernel. In practice, this is often not much harder than generating trial scales for individual channels, since the kernels q i (t) can usually be chosen to have the same t-dependence. In such a case, the channel selection step in Algorithm 6 does not even require the evaluation of the kernels at the trial scale anymore. We note that Algorithm 6 can still be used in more complicated situations by using the procedure outlined in Eq. (34) to split q(t) up into groups of similar channels. In the next chapter, we incorporate the extensions discussed here into a full, practical veto algorithm. Since it was found there are multiple ways to handle competition, these algorithms are tested for their computing times.

Testing the algorithms
We now combine all the pieces discussed in the previous section into a single algorithm. Here, we give a description of the full algorithms that all handle competition differently. A concrete statement of the algorithms can be found in the appendix. Additionally, the expression of every algorithm in unitary language is included. These equation can all be shown to be satisfied by: • Veto-Max: This algorithm handles competition using Eq. (34), where f i (t, z) = p i (t, z). That is, the veto algorithm is applied to every channel individually, then the highest of the generated scales is selected. This is the most common way of handling competition. It is usually cited in the literature as the competition algorithm [7,8], and is used in most parton showers.
• Max-Veto: This algorithm also uses Eq. (34), but with f i (t, z) = q i (t, z). That is, trial pairs (t, z). The highest of these scales is selected, to which the veto step is applied using the branching kernel of the selected channel. This algorithm is used in the Vincia parton shower [4,5]. • Generate-Select: This is the new algorithm described in Sect. 4.3. It generates trial scales τ using the sum of the overestimate functions q(t, z). The overestimate functions are required to have the same z-dependence. That way, a corresponding ζ can be generated using boundaries that are overestimates for all channels. Next, a channel i is selected with probability q i (τ )/ q(τ ). Then, the veto step is applied to this channel. • Select-Generate: Under certain circumstances, a slight variation of the Generate-Select algorithm is possible.
If we require all overestimate functions q i (t, z) to have the same scale dependence, this dependence drops out of the selection probabilities. In that case, a channel can be selected before a scale is generated. As a consequence, the overestimate functions can have different dependence on z, and universal overestimates are no longer required.
We test these algorithms by implementing them in a relatively simple antenna shower very close to what is described in [4,5]. This shower handles QCD radiation using an antenna scheme to include collinear and soft enhancements. It features exact 2 → 3 kinematics for massive particles, but does not include any matching scheme and concerns only final state radiation. It is very basic compared with the parton showers of [1][2][3] or recent versions of the Vincia shower [6], including only the absolute necessities for a functional parton shower.
The running coupling is taken into account by an overestimatê where a and b are chosen such that, at the starting scale and the cutoff scale,α s (t) matches the real one-loop running α s (t), which includes the proper flavor thresholds. This overestimate is corrected by usingα s (t) for the overestimate kernels and α s (t) for the branching kernels. The possible branchings for a QCD shower can be divided into two categories: emissions, where a quark or gluon sends out a new gluon, and splittings, where a gluon splits into a quark-antiquark pair. We use p ⊥ -ordering for both for easy application of the Generate-Select and Select-Generate algorithms. The following overestimate kernels were used: where λ is the Källén function, m 1 and m 2 are the masses of the particles in the antenna and s 12 is its invariant mass. Note that a factor n F is included in the overestimate of the splitting kernel. It is there because Vincia uses a mix of the Max-Veto and the Generate-Select algorithms. If a gluon splitting is selected through the Max-Veto algorithm, a quark flavor is chosen at random as is done by the Generate-Select algorithm. We use the antennae functions given in given in [5] for the splitting kernels. The code can be found in [9]. We compare the performance of the algorithms described above on this shower. In the Veto-Max algorithm we have implemented the following shortcut. While running the single-channel veto algorithm on all available channels, the algorithm keeps track of the highest scale generated thus far. Then, if a scale lower than this highest scale is ever reached, the veto algorithm on the current channel can immediately be aborted. This trick is not available for the Max-Veto algorithm, because it performs the veto step after selecting the highest trial scale between all channels.
For the Select-Generate algorithm, the bottleneck is the channel selection step. It is complicated by the fact that the Källén function and the z integral in the overestimates are different for every antenna. We use stochastic roulettewheel selection [10] for the selection step, which achieves O(1) complexity. 4 The Generate-Select algorithm assigns the same boundaries for the z integral for all channels, but retains differences in the Källén function. We move this difference to the veto step by using the lowest Källén function of all antennae for all channels, increasing the overestimation of the branching kernels. Then, for n F = 6 and the standard values C A = 3 and T R = 1/2, all overestimate functions are the same, and the channel selection step is trivial. In this sense, the difference between the Generate-Select and the Select-Generate algorithms is a trade-off between easier selection of a channel and lower veto rates.
A remark is in order here. In the splitting g → qq the original colour structure is separated into two pieces which can be evolved independently. Since our interest here is in the speed of the various algorithms rather than the development of a fully realistic parton shower, we have not implemented this effect.
We produce 8 million events per algorithm. The initial scale is (7 TeV) 2 and the cutoff scale is (1 GeV) 2 . These settings produce events with parton multiplicities of O(100), which are typical at the LHC. To check the equivalence of the veto algorithms, we compute the average amounts of quarks and gluons generated per event. These numbers are very sensitive to small differences in distribution. Table 1 shows these averages for every algorithm. Figure 1 shows the average amount of CPU time the shower requires to produce events, plotted as a function of the amount of available branching channels as the shower terminates. This measure gives us a good idea of the performance of the algorithms in a practical context. The shape of the curves of the Veto-Max and the Max-Veto algorithms should not be heavily influenced by the specifics of the shower, since factors like branching kernel evaluation times and veto probabilities should be similar for different implementations. However, the relative performance of the Generate-Select and the Select-Generate algorithms does depend on the specific implementation. In this case, the algorithms perform similarly, but this may not be the case for other branching kernels. Either way, the Generate-Select and the Select-Generate algorithms perform much better than the Veto-Max and the Max-Veto algorithms.

Conclusion
The Sudakov veto algorithm forms an integral part of all modern parton shower programs. We describe a formalism that can be used to analyze the distributions that are produced by different versions of this algorithm. Using this method, we discuss various ways of handling competition. While seemingly different, our formal analysis shows that they produce the same distributions. The algorithms were tested using a simple antenna shower, which showed that the new algorithms are faster than the traditional algorithms used in most parton shower programs currently, which may be of considerable importance for higher energy events or for the inclusion of more types of radiation.