Shower it again, Pythia

The parton-shower algorithm implemented in the Pythia generator is applied multiple times to the same parton-level configuration to estimate the systematic uncertainty on large-radius jet substructure variables that is associated with the stochastic nature of the algorithm. Results are presented for boosted $h\rightarrow b\bar{b}$ and $t\rightarrow bqq$ decays. The code is publicly available in the repository https://github.com/rdisipio/ReShower.git


Introduction
Among the different sources of uncertainty, there are known knowns, things we know we know. There are also known unknowns, things we know may exist but do not know much about. Notoriously, there are unknown unknowns, the ones we don't know we don't know. In this paper I want to discuss an example of unknown knowns, that is to say things we do not single out on purpose, but in fact know a lot about. The usual argument for ignoring them is that they are small compared to other uncertainties or cannot be easily factored out. In this work I will focus on one of them: the intrinsic stochastic uncertainty of the parton-shower algorithm. What happens if the very same hard-scattering event is showered time and again? Shouldn't we take into account the fact that, for each generated event, we observe only a single realization of the process? Such an uncertainty is likely to be covered when a very large sample of events is simulated: one can assume that among millions of events, a number of them lie so close to each other in phase space that they are virtually identical. There may, however, be outliers for which this assumption is too strong. Moreover, this procedure can be used to assign an uncertainty to a single event, which may turn out to be useful for training a classifier or other deep-learning systems. Finally, it can be viewed as an example of likelihood-free inference, in which the probability of an event being showered following a certain history is conditioned on the parton-level four-momentum of the originating particle.
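The idea of showering the same input time and again can be illustrated with a toy model. The sketch below is not Pythia: `toy_shower` is a hypothetical stand-in that randomly removes a fraction of the parent momentum from the jet, and the exponential loss model, the 5% mean loss and the seed are arbitrary assumptions. What it shows is the bookkeeping: a fixed input, many independent stochastic trials, and from them a per-event mean, a spread, and the fraction of trials passing a selection with its binomial uncertainty.

```python
import random
import statistics


def toy_shower(parent_pt, rng):
    """Hypothetical stand-in for one shower + hadronization + clustering pass.

    Returns a 'jet pT' obtained by losing a random fraction of the parent pT
    outside the jet. This is a toy noise model, NOT a physics model.
    """
    lost_fraction = rng.expovariate(1.0 / 0.05)  # mean 5% loss (arbitrary toy choice)
    return parent_pt * max(0.0, 1.0 - lost_fraction)


def reshower(parent_pt, n_trials, passes, seed=12345):
    """Apply the same stochastic step n_trials times to one fixed input."""
    rng = random.Random(seed)  # seeded so the whole study is reproducible
    values = [toy_shower(parent_pt, rng) for _ in range(n_trials)]
    mean = statistics.fmean(values)
    spread = statistics.stdev(values)  # per-event stochastic spread
    n_pass = sum(1 for v in values if passes(v))
    eff = n_pass / n_trials
    eff_err = (eff * (1.0 - eff) / n_trials) ** 0.5  # binomial uncertainty
    return mean, spread, eff, eff_err


mean, spread, eff, eff_err = reshower(400.0, n_trials=1000, passes=lambda pt: pt > 380.0)
print(f"mean = {mean:.1f} GeV, spread = {spread:.1f} GeV, eff = {eff:.3f} +/- {eff_err:.3f}")
```

In a real study the body of `toy_shower` would be replaced by a full shower, hadronization and jet-reconstruction step applied to the same parton-level configuration.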
To restrict the scope of the study to a few concrete examples, a Higgs boson or a top quark is always generated with the same four-momentum, and then decayed by the Pythia8 Shower Monte Carlo generator [1] in the $b\bar{b}$ or fully hadronic channel, respectively. Afterwards, large-radius jets are reconstructed at hadron level to see how many times the particle can be identified using jet-tagging techniques, as a function of kinematic observables such as the jet transverse momentum ($p_T$) and pseudo-rapidity ($\eta$). One can expect the decay products of a very high-$p_T$ particle to remain collimated in the vast majority of cases, but this picture may change dramatically as the transverse momentum of the original particle approaches the Lorentz boost threshold given by:
$$p_T \approx \frac{2m}{R}$$
where $m$ is the mass of the particle and $R$ is the distance parameter of the anti-$k_T$ algorithm used for the jet clustering. In the kinematic region close to the threshold, the emission of QCD radiation can alter the final-state configuration so that a single large-radius jet may be unable to capture all the decay products and hence be ineffective at identifying such boosted particles. One of the aims of this paper is to quantify how often this happens in the cases under consideration, and how the resulting event kinematics are affected by the intrinsic stochastic nature of the decay and parton-shower processes.
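As a quick numerical illustration of the boost threshold, the sketch below evaluates the rule of thumb $p_T \approx 2m/R$ and its inverse, the approximate angular separation of the decay products $\Delta R \approx 2m/p_T$. The masses (125 GeV for the Higgs boson, 172.5 GeV for the top quark) are assumed approximate values, and the relation is only an order-of-magnitude estimate, especially for the three-body top-quark decay.

```python
def boost_threshold(mass_gev, radius):
    """Rule-of-thumb pT (GeV) above which the decay products fit in a jet of radius R."""
    return 2.0 * mass_gev / radius


def opening_angle(mass_gev, pt_gev):
    """Approximate angular separation of the decay products, DeltaR ~ 2m/pT."""
    return 2.0 * mass_gev / pt_gev


# Assumed approximate masses in GeV; R = 1.0 as in the jet clustering used here.
for name, mass in [("Higgs", 125.0), ("top", 172.5)]:
    print(f"{name}: pT threshold ~ {boost_threshold(mass, 1.0):.0f} GeV")
# prints:
# Higgs: pT threshold ~ 250 GeV
# top: pT threshold ~ 345 GeV
```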

Pythia Setup
Higgs bosons and top quarks are individually generated in the particle-gun setup available in Pythia 8.240 [1], i.e. a single particle is added to the event record with no underlying $pp$ scattering process. Only the hadronic decays $h\rightarrow b\bar{b}$ and $t\rightarrow bqq$ are allowed. Jets are reconstructed using the anti-$k_T$ algorithm [2] with distance parameter $R = 1.0$ as implemented in FastJet 3.3.2 [3]. The N-subjettiness [4] ratios $\tau_{21} = \tau_2/\tau_1$ and $\tau_{32} = \tau_3/\tau_2$ are calculated with the FastJet plugin distributed with the FastJet-contrib 1.041 package. The N-subjettiness is defined as
$$\tau_N = \frac{1}{d_0}\sum_k p_{T,k}\,\min\{\Delta R_{1,k}, \Delta R_{2,k}, \dots, \Delta R_{N,k}\}$$
where the index $k$ runs over the constituent particles in a given jet, $p_{T,k}$ are their transverse momenta, $\Delta R_{J,k} = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ is the distance in the rapidity-azimuth plane between a candidate subjet $J$ and a constituent particle $k$, and $d_0 = \sum_k p_{T,k} R$ is a normalization constant, where $R$ is the distance parameter used in the clustering algorithm (in this case $R = 1.0$).
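The N-subjettiness definition above can be transcribed directly into code. The sketch below is an illustration only, not the FastJet-contrib implementation: constituents are plain `(pt, eta, phi)` tuples, and the subjet axes are simply supplied by the caller (FastJet-contrib instead determines them with its own axis-finding algorithms). The azimuthal difference is wrapped into $[-\pi, \pi)$ before computing $\Delta R$.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    deta = eta1 - eta2
    return math.hypot(deta, dphi)


def tau_n(constituents, axes, radius=1.0):
    """N-subjettiness for caller-supplied axes.

    constituents: list of (pt, eta, phi) tuples
    axes: list of N (eta, phi) subjet axes (chosen by the caller in this sketch)
    """
    d0 = sum(pt for pt, _, _ in constituents) * radius  # normalization d_0
    total = 0.0
    for pt, eta, phi in constituents:
        # each constituent contributes pT times its distance to the closest axis
        total += pt * min(delta_r(eta, phi, a_eta, a_phi) for a_eta, a_phi in axes)
    return total / d0


# Toy two-prong jet: two equal-pT constituents separated by DeltaR = 0.4.
jet = [(100.0, 0.0, 0.0), (100.0, 0.4, 0.0)]
tau1 = tau_n(jet, [(0.2, 0.0)])                # one axis between the prongs
tau2 = tau_n(jet, [(0.0, 0.0), (0.4, 0.0)])    # one axis on each prong
print("tau21 =", tau2 / tau1)  # prints: tau21 = 0.0
```

A genuinely two-prong jet such as this one gives $\tau_{21}$ close to zero, which is why a small $\tau_{21}$ is used to select two-body decays like $h\rightarrow b\bar{b}$.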

Results
The effect of the repeated application of the parton-shower algorithm is presented in Figures 1 and 2.
• $|m_{\mathrm{jet}} - m_{h,\mathrm{PDG}}| < 30$ GeV and $\tau_{21} < 0.5$ for the Higgs boson.
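The Higgs-boson selection listed above amounts to a simple cut-based predicate, and the tagging efficiency over a set of reshowered jets is the fraction of them that pass it. In the sketch below, the value $m_{h,\mathrm{PDG}} = 125$ GeV is an assumed approximation, and the `(m_jet, tau21)` input format and the function names are hypothetical choices for illustration.

```python
M_H_PDG = 125.0  # GeV, assumed approximate Higgs-boson mass


def is_higgs_tagged(m_jet, tau21, mass_window=30.0, tau21_cut=0.5):
    """Cut-based Higgs tagger: jet mass within the window and tau21 below the cut."""
    return abs(m_jet - M_H_PDG) < mass_window and tau21 < tau21_cut


def tagging_efficiency(jets):
    """Fraction of (m_jet, tau21) pairs passing the tagger; 0.0 for an empty list."""
    if not jets:
        return 0.0
    return sum(is_higgs_tagged(m, t) for m, t in jets) / len(jets)


# Four hypothetical reshowers of the same Higgs boson: (jet mass in GeV, tau21).
example = [(124.0, 0.3), (90.0, 0.3), (130.0, 0.7), (150.0, 0.2)]
print(tagging_efficiency(example))  # prints: 0.5
```

Filling such efficiencies in bins of the jet $p_T$ gives the efficiency curves on which a plateau at high transverse momentum can be observed.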
Neither tagger is fully efficient, but both reach a plateau in the mid-to-high-$p_T$ region. Finally, Figures 5 and 6 show the correlation between the jet mass and the substructure variable used for tagging, for the Higgs boson and the top quark respectively. The tagger selects the region where the decay products are contained in the large-R jet, but in both cases, at low transverse momentum, about half of the events fall outside that kinematic region, resulting in a jet with lower mass and a high N-subjettiness ratio. In the case of the top quark, a jet mass around 80 GeV indicates that such a jet corresponds to the boosted W boson.
Figure caption: Red lines indicate the region selected by the simple tagger.
Acknowledgments
I would like to thank Pekka Sinervo, Kyle Cormier and Francesco Spanò for the very useful discussions about this topic. We'll always have Monte