Treating jet correlations in high pile-up at hadron colliders

Experiments in the high-luminosity runs at the Large Hadron Collider face the challenges of very large pile-up. Primary techniques to deal with this are based on precise vertex and track reconstruction. Outside tracker acceptances, however, lie regions of interest for many aspects of the LHC physics program. We explore complementary approaches to pile-up treatment and propose a data-driven jet-mixing method which can be used outside tracker acceptances without depending on Monte Carlo generators. The method can be applied to treat correlation observables and take into account, besides the jet transverse momentum pedestal, effects of hard jets from pile-up.


Introduction
Experiments at hadron colliders operating with very high luminosity face the challenge of pile-up, namely, a very large number of overlaid hadron-hadron collisions per bunch crossing. At the Large Hadron Collider (LHC), in Run I data the pile-up is about 20 pp collisions on average, while it reaches the level of over 50 at Run II, and increases for higher-luminosity runs [1][2][3][4][5][6][7][8][9][10][11]. In regions covered by tracking detectors, advanced vertexing techniques have been developed to deal with environments characterized by high pile-up. More generally, experiments rely on Monte Carlo simulations which include pile-up for comparisons with data. However, this introduces a significant model dependence, especially in regions where no detailed and precise measurements are available to constrain Monte Carlo modeling.
In this paper we propose a different approach to treating high pile-up, with a view to employing data-driven methods rather than Monte Carlo methods. Our main focus is to deal with potentially large probabilities that jets with high transverse momenta are produced from pile-up events independent of the primary interaction vertex, in a region where tracking devices are not available to identify pile-up jets. A typical application would be Higgs production by vector boson fusion, where the associated jets may be produced outside the tracking detector acceptances. The issue we address is thus quite different from the issues that most of the existing methods for pile-up treatment are designed to deal with, which are the jet transverse momentum pedestal, due to the bias in the jet transverse momentum from added pile-up particles in the jet cone, and the clustering into jets of overlapping soft particles from pile-up.
* Corresponding author. E-mail address: francesco.hautmann@desy.de (F. Hautmann).
In what follows we will therefore use standard existing methods to treat soft particles and the jet pedestal, and devise new approaches to tackle the issue of misidentification which arises, in addition, in cases where precise tracking and vertexing are not feasible. The aim is to look for methods which treat pile-up without spoiling the physics of the signal process and which can be used outside the tracking detector acceptances without depending on Monte Carlo modeling. To this end, we suggest using minimum bias (or jet) samples recorded from data in high pile-up runs and applying event-mixing techniques to relate, via these data samples, the "true" signal to the signal measured in high pile-up.
The approach does not address the question of a full detector simulation including pile-up. Rather, it focuses on how to extract physics signals with the least dependence on pile-up simulation, and how to use real data, rather than Monte Carlo events, at physics object level.
The proposed method applies to the regime of high pile-up which is relevant for the LHC as well as for future high-luminosity colliders. It is designed to treat not only inclusive variables but also correlations. One of the features of the method is that it does not require data-taking in dedicated runs at low pile-up. Rather, the data required for event mixing are recorded at the same time as the signal events in high pile-up runs, so that there is no loss in luminosity. We will illustrate the approach using Drell-Yan lepton pair production associated with jets as a case study. We discuss two main physical consequences of pile-up collisions: the bias in the jet transverse momentum due to pile-up particles in the jet cone, and the misidentification of high transverse momentum jets from independent pile-up events. The method is general and can straightforwardly be extended to a large variety of processes affected by pile-up.

Drell-Yan plus jets at high pile-up as a case study
Let us consider the associated production of a Drell-Yan lepton pair via Z-boson exchange and a jet. We take the jet transverse momentum and rapidity to be $p_T^{\rm (jet)} > 30$ GeV, $|\eta^{\rm (jet)}| < 4.5$, and the boson invariant mass and rapidity to be 60 GeV $< m^{\rm (boson)} <$ 120 GeV, $|\eta^{\rm (boson)}| < 2$. Event samples are generated by Pythia 8 [12] with the 4C tune [13] for the different scenarios of zero pile-up and $N_{\rm PU}$ additional pp collisions at [...]. We reconstruct jets with the anti-$k_T$ algorithm [14] with distance parameter R = 0.5. Results for the spectrum in the transverse momentum $p_T$ of the Z-boson, for Z + jet events, are shown in Fig. 1. [...] $p_T > 30$ GeV the Z-boson $p_T$ distribution in boson + jet events will tend to approach the inclusive Drell-Yan spectrum, given by the solid green curve.
More precisely, we can identify two main implications of pile-up collisions: a large bias in the jet transverse momentum due to added pile-up particles in the jet cone, leading to a jet pedestal, and a large probability that jets with high transverse momentum come from independent pile-up events.
Several methods exist to deal with the jet $p_T$ pedestal. These include techniques based on the jet vertex fraction [3] and charged hadron subtraction [5,15], the Puppi method [16], and the SoftKiller method [17]. These methods correct for transverse momenta of individual particles, but not for any mistagging. The same holds for approaches inspired by jet substructure studies, such as jet cleansing [18]. In Fig. 2 we apply SoftKiller [17], a new event-wide particle-level pile-up removal method, which can also be used with calorimeter information only. We present results at zero pile-up ($N_{\rm PU} = 0$), at pile-up $N_{\rm PU} = 50$, and at pile-up $N_{\rm PU} = 50$ with SoftKiller subtraction ($N_{\rm PU} = 50$ SK). Fig. 2 illustrates different physical effects of pile-up in the leading jet spectrum and in the Z-boson spectrum. In Fig. 2a we compute the leading jet $p_T$ spectrum, and verify that SoftKiller efficiently removes the jet pedestal from pile-up: the zero pile-up jet spectrum (solid black curve) is shifted toward larger $p_T$ by pile-up collisions (dot-dashed black curve for $N_{\rm PU} = 50$), but the application of SoftKiller (dashed blue curve, $N_{\rm PU} = 50$ SK) corrects for this and restores the original signal to a very good approximation. In Fig. 2b, on the other hand, we compute the Z-boson $p_T$ spectrum. The solid black curve is the zero pile-up result, the dot-dashed black curve is the $N_{\rm PU} = 50$ result, and the dashed blue curve is the result of applying SoftKiller. In the higher-$p_T$ part of the spectrum we observe that there is no need for any correction. In contrast, in the lower-$p_T$ part significant contributions are present from misidentified pile-up jets. These are not corrected for, and need to be properly treated, particularly in regions outside tracker acceptances where vertexing techniques cannot be relied on to identify pile-up jets [6]. We address this point next.

Uncorrelated event samples and jet mixing
To treat effects beyond soft particles and the jet $p_T$ pedestal, we employ event mixing techniques [19][20][21][22] using uncorrelated samples. The main idea is that the signal in the pile-up scenario is obtained via mixing from the signal without pile-up and a minimum bias sample of data at high pile-up. Thus, to identify the contribution of the high-$p_T$ jets coming from independent pile-up events, we construct a signal plus pile-up scenario in a data-driven manner. We do this by adding physics objects from pile-up background to event samples before selection criteria are applied. The approach is designed to treat the region of high number $N_{\rm PU}$ of pile-up events, where $(N_{\rm PU} + 1)/N_{\rm PU} \approx 1$.
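The object-level mixing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results in this paper: jets are represented as hypothetical `(pt, eta)` tuples, the function names `mix_event` and `passes_jet_selection` are invented for the example, and minimum bias events are drawn uniformly at random from a pool. The jet cuts are the ones quoted in the case study above.

```python
import random


def mix_event(signal_jets, minbias_pool, n_pu, rng):
    """Overlay jets from n_pu randomly drawn minimum bias events on one signal event.

    signal_jets  : list of (pt, eta) tuples for the zero pile-up signal event
    minbias_pool : list of events, each a list of (pt, eta) tuples (hypothetical format)
    n_pu         : number of minimum bias events to overlay
    rng          : random.Random instance, passed in for reproducibility
    """
    mixed = list(signal_jets)  # copy, so the signal event is left untouched
    for _ in range(n_pu):
        mixed.extend(rng.choice(minbias_pool))
    return mixed


def passes_jet_selection(jets, pt_min=30.0, eta_max=4.5):
    """Return True if at least one jet has pt > pt_min (GeV) and |eta| < eta_max."""
    return any(pt > pt_min and abs(eta) < eta_max for pt, eta in jets)


if __name__ == "__main__":
    rng = random.Random(1)
    signal = [(12.0, 0.3)]                      # no jet above threshold on its own
    pool = [[(45.0, 3.1)], [(38.0, -2.7)], []]  # toy minimum bias events
    mixed = mix_event(signal, pool, n_pu=50, rng=rng)
    print(passes_jet_selection(signal), passes_jet_selection(mixed))
```

The selection is applied only after the overlay, which is what allows a hard jet from an independent pile-up event to promote a signal event into the selected sample.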
We illustrate the method by taking a sample containing $N_{\rm PU}$ minimum bias events (which could be recorded data, but which we take here, for illustration, to be Monte Carlo events), mixing this with the signal at zero pile-up, and then requiring a jet with $p_T$ [...] Fig. 2b, this is far from the solid black curve in the lower-$p_T$ part of the spectrum. We regard the dashed blue curve as pseudodata in high pile-up. The long-dashed red curve is the jet mixing curve, obtained as described above by mixing the signal with the minimum bias sample. The result of the mixing method is then given as the solid red curve by a simple "unfolding", defined by multiplying the signal by the ratio of the pile-up (dashed blue) curve to the mixing (long-dashed red) curve. We see that without appealing to any Monte Carlo method the true signal is extracted nearly perfectly from the mixed sample. In addition to the closure test carried out above, we have checked the model dependence by applying the mixing procedure to different starting distributions, and verified that in this case as well the unfolding returns the true signal.
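The "unfolding" step described above amounts to a bin-by-bin rescaling, which can be sketched as follows. This is a minimal sketch assuming that all three distributions are histograms with identical binning, stored as plain lists of bin contents; the function name `unfold` and the treatment of empty bins (set to zero) are assumptions made for the example, not part of the method as stated in the text.

```python
def unfold(signal, pileup_data, mixed):
    """Bin-by-bin correction: signal * (pile-up curve / mixing curve).

    signal      : bin contents of the starting signal distribution
    pileup_data : bin contents measured in high pile-up (pseudodata)
    mixed       : bin contents obtained by mixing the signal with minimum bias
    Bins where the mixing curve is empty are set to 0.0 (an assumption).
    """
    if not (len(signal) == len(pileup_data) == len(mixed)):
        raise ValueError("all histograms must share the same binning")
    corrected = []
    for s, d, m in zip(signal, pileup_data, mixed):
        corrected.append(s * d / m if m > 0 else 0.0)
    return corrected


if __name__ == "__main__":
    signal = [10.0, 20.0, 5.0]
    pseudodata = [40.0, 30.0, 5.0]
    # Closure: if the mixing curve reproduces the pseudodata exactly,
    # the starting signal is returned unchanged.
    print(unfold(signal, pseudodata, pseudodata))
```

When the mixing curve reproduces the pile-up curve, the ratio is one and the starting signal is returned, which is the content of the closure test.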
In Fig. 4 we plot the maximal value of the relative deviation between the pile-up corrected signal and the true signal, (corrected - true)/true, for the SoftKiller case without jet mixing (black dots) and for the case with the jet mixing method applied (open circles), as a function of the number of pile-up collisions $N_{\rm PU}$. We see that, while in the SoftKiller case, in which the jet pedestal is removed, the deviation from the true signal becomes larger as $N_{\rm PU}$ increases, the deviation does not increase with $N_{\rm PU}$ once the jet mixing method is applied to take account of the hard jets from pile-up.
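A figure of merit of this kind can be computed as in the short sketch below. Two choices here are assumptions rather than statements from the text: the maximum is taken over the absolute value of the relative deviation, and bins with no true signal are skipped. The function name `max_relative_deviation` is hypothetical.

```python
def max_relative_deviation(corrected, true):
    """Largest |corrected - true| / true over all bins with true > 0.

    corrected, true : bin contents of histograms with identical binning
    """
    if len(corrected) != len(true):
        raise ValueError("histograms must share the same binning")
    deviations = [abs(c - t) / t for c, t in zip(corrected, true) if t > 0]
    if not deviations:
        raise ValueError("no bins with non-zero true signal")
    return max(deviations)


if __name__ == "__main__":
    true = [10.0, 20.0, 5.0]
    corrected = [11.0, 19.0, 5.0]
    print(max_relative_deviation(corrected, true))
```

Evaluating this quantity once per value of the number of pile-up collisions gives one point per curve in a plot of the type described above.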
The main advantages of this approach are that it can be used with data recorded in high pile-up, and it does not depend on Monte Carlo algorithms for pile-up correction. In addition, it is interesting to perform control checks by examining results for the jet resolution which we obtain from the jet mixing method. These are shown in Fig. 5. Fig. 5a reports the parton-jet $p_T$ correlation, and Fig. 5b the distribution in $\Delta R = \sqrt{\Delta\phi^2 + \Delta\eta^2}$, where $\Delta\phi$ and $\Delta\eta$ are respectively the separation in azimuth and rapidity. We see that the features of the "true" signal are well reproduced.
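For reference, the separation in the azimuth-rapidity plane used in this control check can be evaluated as in the sketch below. Wrapping the azimuthal difference into the interval (-pi, pi] is the usual convention and is assumed here; the function names are hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Distance sqrt(dphi^2 + deta^2) between two objects, e.g. a parton and a jet."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


if __name__ == "__main__":
    # Two objects on either side of the phi = 0 boundary are close in azimuth.
    print(delta_r(1.0, 0.1, 1.0, 2.0 * math.pi - 0.1))
```

Without the wrapping, two objects on opposite sides of the azimuthal boundary would be assigned a spuriously large separation.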

Conclusions
Current methods to deal with pile-up at the LHC employ precise vertex and track reconstruction, in regions where these are available, and in general rely on Monte Carlo simulations to model pile-up for data comparisons. The use of Monte Carlo event generators brings in a significant model dependence particularly in regions where these are not well constrained by measurements.
In this paper we have discussed a different, data-driven approach to the treatment of pile-up, which makes use of minimum bias or jet samples recorded from data taken in high pile-up runs, constructs mixing methods to extract the signal process, and thus circumvents the model dependence implied by the use of Monte Carlo generators.
The methodology is general, and can be applied in measurements to restore correlations between final-state particles. In such measurements two important kinds of pile-up effects are present, exemplified in the case of Z-boson plus jets which we have used for illustration: the jet $p_T$ pedestal and the misidentification of high-$p_T$ jets from independent pile-up events. While several methods exist to correct for the first effect (as well as for the related effect of the clustering into jets of overlapping soft particles), the second effect is not treated at present. We have proposed a jet-mixing method which treats this, and we have shown that it allows one to successfully extract the signal process from the mixed sample to within a few percent.
The methods discussed in this paper can be applied to the high pile-up regime and do not require special runs at low pile-up. The data samples needed for jet mixing are recorded at the same time as the signal events. There is therefore no loss in luminosity. The advantages are that one can access the proper pile-up distribution and there is no need for pile-up reweighting.
The use of these methods thus implies good prospects both for precision Standard Model studies at moderate scales affected by pile-up, e.g. in Drell-Yan and Higgs production [23][24][25], and for searches for rare processes beyond the Standard Model in high pile-up regimes.