Measurement of jet activity produced in top-quark events with an electron, a muon and two b-tagged jets in the final state in pp

Measurements of jet activity in top-quark pair events produced in proton–proton collisions are presented, using 3.2 fb−1 of pp collision data at a centre-of-mass energy of 13 TeV collected by the ATLAS experiment at the Large Hadron Collider. Events are chosen by requiring an opposite-charge eμ pair and two b-tagged jets in the final state. The normalised differential cross-sections of top-quark pair production are presented as functions of additional-jet multiplicity and transverse momentum, pT. The fraction of signal events that do not contain additional jet activity in a given rapidity region, the gap fraction, is measured as a function of the pT threshold for additional jets, and is also presented for different invariant mass regions of the eμbb̄ system. All measurements are corrected for detector effects and presented as particle-level distributions compared to predictions with different theoretical approaches for QCD radiation. While the kinematics of the jets from top-quark decays are described well, the generators show differing levels of agreement with the measurements of observables that depend on the production of additional jets.


Introduction
Top-quark pair production final states in proton–proton (pp) collisions at the Large Hadron Collider (LHC) often include additional jets not directly produced in the top-quark decays. The uncertainties associated with these processes are significant in precision measurements, such as the measurement of the top-quark mass [1] and the inclusive tt̄ production cross-section [2].
These additional jets arise mainly from hard gluon emissions from the hard-scattering interaction beyond tt̄ production and are described by quantum chromodynamics (QCD). The higher centre-of-mass energy of the pp scattering process in LHC Run 2 opens a large kinematic phase space for QCD radiation. Several theoretical approaches are available to model the production of these jets in tt̄ processes, including next-to-leading-order (NLO) QCD calculations, parton-shower models, and methods matching fixed-order QCD with the parton shower. The aim of this analysis is to test the predictions of extra jet production in these approaches and to provide data to adjust free parameters of the models to optimise their predictions.
The jet activity is measured in events with at least two b-tagged jets, i.e. jets tagged as containing b-hadrons, and exactly one electron and exactly one muon of opposite electrical charge in the final state. Additional jets are defined as jets produced in addition to the two b-tagged jets required for the event selection, without requiring any matching of jets to partons. In order to probe the pT dependence of the hard-gluon emission, this analysis measures the normalised differential tt̄ cross-sections as a function of the jet multiplicity for different transverse momentum (pT) thresholds of the additional jets. The pT of the leading additional jet is measured, as well as the pT of the leading and sub-leading jets initiated by b-quarks ("b-jets"), which are top-quark decay products in most of the events.
Furthermore, the gap fraction, defined as the fraction of events with no jet activity in addition to the two b-tagged jets above a given pT threshold in a rapidity region in the detector, is measured as a function of the additional jets' minimum pT threshold as defined in Refs. [3,4]. The results are presented in a fiducial phase space in which all selected final-state objects are produced within the detector acceptance following the definitions in Ref. [5].
This paper provides a measurement of additional jets in tt̄ events in the dilepton channel for the new centre-of-mass energy of 13 TeV. Measurements similar to those presented in this paper were performed by ATLAS at 7 TeV [3,5] and have been used to tune parameters in Monte Carlo (MC) generators for LHC Run 2 [6-8]. These earlier measurements were performed in the lepton+jets channel where the inclusive jet multiplicity was measured, since it is difficult to distinguish jets originating in W decays from additional jets produced by QCD radiation. Recent measurements of jet multiplicity were performed in the single lepton channel by CMS at 13 TeV [9] and in the dilepton channel, including also the gap fractions, by ATLAS and CMS at 8 TeV [4,10].
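The gap-fraction definition above can be illustrated with a minimal sketch. It assumes that the additional jets of each event have already been restricted to the rapidity region of interest; the function name `gap_fraction` and the example pT values are hypothetical and are not taken from the analysis.

```python
def gap_fraction(additional_jet_pts_per_event, q0):
    """Fraction of events with no additional jet with pT above q0 (GeV).

    `additional_jet_pts_per_event` holds one list of additional-jet pT
    values per event; jets are assumed to be pre-selected in rapidity.
    """
    n_total = len(additional_jet_pts_per_event)
    if n_total == 0:
        raise ValueError("need at least one event")
    # An event lies in the "gap" if none of its additional jets exceeds q0.
    n_gap = sum(
        1
        for jet_pts in additional_jet_pts_per_event
        if all(pt <= q0 for pt in jet_pts)
    )
    return n_gap / n_total


# Illustrative (made-up) events: additional-jet pT values in GeV.
events = [[], [30.0], [50.0, 26.0], [90.0]]
fractions = {q0: gap_fraction(events, q0) for q0 in (25.0, 40.0, 100.0)}
```

By construction the gap fraction can only rise as the threshold is raised, since fewer jets count as activity.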

ATLAS detector
The ATLAS detector [11] at the LHC covers nearly the entire solid angle¹ around the interaction point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroid magnets. The inner-detector system is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range |η| < 2.5.
The high-granularity silicon pixel detector covers the interaction region and provides four measurements per track. The closest layer, known as the Insertable B-Layer (IBL) [12], was added in 2014 and provides high-resolution hits at small radius to improve the tracking performance. The pixel detector is followed by the silicon microstrip tracker, which provides four three-dimensional measurement points per track. These silicon detectors are complemented by the transition radiation tracker, which enables radially extended track reconstruction up to |η| = 2.0. The transition radiation tracker also provides electron identification information based on the fraction of hits (typically 30 in total) passing a higher charge threshold indicative of transition radiation.
¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Angular distance is measured in units of ΔR = √((Δη)² + (Δφ)²).
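The coordinate definitions in the footnote translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and wrapping Δφ into [−π, π) is a standard convention assumed here rather than stated in the text.

```python
import math


def pseudorapidity(theta):
    """eta = -ln tan(theta / 2), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi) (assumed convention)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance Delta R = sqrt(Delta eta^2 + Delta phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))
```

A particle emitted perpendicular to the beam (θ = π/2) has η = 0, and two objects at φ = 3.0 and φ = −3.0 are close in azimuth once the wrap-around is taken into account.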
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) electromagnetic calorimeters, with an additional thin LAr presampler covering |η| < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadronic endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules, which are optimised for electromagnetic and hadronic measurements, respectively.
The muon spectrometer comprises separate trigger and high-precision tracking chambers, measuring the deflection of muons in a magnetic field generated by superconducting air-core toroids. The precision chamber system surrounds the region |η| < 2.7 with three layers of monitored drift tubes, complemented by cathode strip chambers in the forward region, where the background is highest. The muon trigger system covers the range |η| < 2.4 with resistive plate chambers in the barrel, and thin-gap chambers in the endcap regions.
A two-level trigger system is used to select interesting events [13,14]. The Level-1 trigger is implemented in hardware and uses a subset of detector information to reduce the event rate to a design value of at most 100 kHz. This is followed by the high-level software-based trigger (HLT), which reduces the event rate to 1 kHz.

Data and simulation samples
The proton–proton (pp) collision data used in this analysis were collected during 2015 by the ATLAS detector and correspond to an integrated luminosity of 3.2 fb−1 at √s = 13 TeV. The data considered in this analysis were collected under stable beam conditions, requiring that all detectors were operational. Each selected event includes interactions from an average of 14 inelastic pp collisions in the same proton bunch crossing, as well as residual signals from previous bunch crossings with a 25 ns bunch spacing. These two effects are collectively referred to as "pile-up". Events are required to pass a single-lepton trigger, either electron or muon. Multiple triggers are used to select events: either triggers with low lepton pT thresholds of 24 GeV which utilise isolation requirements to reduce the trigger rate, or triggers with higher pT thresholds but looser isolation requirements to increase event acceptance. The higher pT thresholds were 50 GeV for muons and 60 GeV or 120 GeV for electrons.
MC simulations are used to model background processes and to correct the data for detector acceptance and resolution effects. The nominal tt̄ sample is simulated using the NLO Powheg-Box v2 matrix-element (ME) generator [15][16][17], referred to as Powheg in the following, and Pythia6 [18] (v6.427) for the parton shower (PS), hadronisation and underlying event. Powheg is interfaced to the CT10 [19] NLO parton distribution function (PDF) set, while Pythia6 uses the CTEQ6L1 PDF set [20]. Pythia simulates the underlying event and parton shower using the Perugia 2012 (P2012) set of tuned parameters (tune) [21]. The "hdamp" parameter, which controls the pT of the first additional emission beyond the Born configuration, is set to the mass of the top quark (mt). The main effect of this is to regulate the high-pT emission against which the tt̄ system recoils. The choice of this hdamp value has been found to improve the modelling of the tt̄ system kinematics with respect to data in previous analyses [6]. In order to investigate the effects of initial- and final-state radiation, alternative Powheg+Pythia6 samples are generated with the renormalisation and factorisation scales varied by a factor of 2 (0.5) and using low (high) radiation variations of the Perugia 2012 tune and an hdamp value of mt (2mt), corresponding to less (more) parton-shower radiation [6]. These samples are called RadLo and RadHi, respectively, in the following. These variations are selected to cover the uncertainties in the measurements of differential distributions in 7 TeV data [22]. Alternative samples are generated using Powheg and MadGraph5_aMC@NLO [23] (v2.2.1) with CKKW-L, referred to as MG5_aMC@NLO hereafter, both interfaced to Herwig++ [24] (v2.7.1), in order to estimate the effects of the choice of matrix-element generator. These tt̄ samples are described in Ref. [6].
Additional tt̄ samples are generated for comparisons with unfolded data as follows. The predictions of the ME generators Powheg and MG5_aMC@NLO are interfaced to Herwig7 [24,25] and Pythia8. In all Powheg and MG5_aMC@NLO samples mentioned above, the first emission is calculated from the leading-order real emission term, and further additional jets are simulated from parton showering, which is affected by significant theoretical uncertainties. Improved precision is expected from using Sherpa v2.2 [26], which models the inclusive and the one-additional-jet process using an NLO matrix element and up to four additional jets at leading-order (LO) accuracy using the ME + PS@NLO prescription [27]. The sample used to compare to particle-level results presented here is generated with the central scale set to $\mu^2 = m_t^2 + 0.5 \times (p_{\mathrm{T},t}^2 + p_{\mathrm{T},\bar{t}}^2)$, where $p_{\mathrm{T},t}$ and $p_{\mathrm{T},\bar{t}}$ refer to the pT of the top and antitop quark, respectively, and with the matching scale set to 30 GeV. Furthermore, the NNPDF 3.0 PDF [28] at next-to-next-to-leading order (NNLO) is used.
All tt̄ samples are normalised to the cross-section calculated with the Top++2.0 program to NNLO in perturbative QCD, including soft-gluon resummation to NNLL [29], assuming a top-quark mass of 172.5 GeV.
Background processes are simulated using a variety of MC generators, as described below. Details of the background estimation are described in Sect. 5. Single top-quark production in association with a W boson (Wt) is simulated using Powheg-Box v1+Pythia6 with the same parameters and PDF sets as those used for the nominal tt̄ sample and is normalised to the approximate NNLO cross-section (71.7 ± 3.8 pb) described in Ref. [30]. At NLO, part of the final state of Wt production is identical to the final state of tt̄ production. The "diagram removal" (DR) generation scheme [31] is used to remove this part of the phase space from the background calculation. A sample generated using an alternative "diagram subtraction" (DS) method [31] is used to evaluate systematic uncertainties. Both samples are normalised to the generator cross-section.
The majority of backgrounds with at least one misidentified lepton in the selected sample arise from tt̄ production in which only one of the top quarks decays semileptonically, which is simulated in the same way as the tt̄ production in which both top quarks decay leptonically.
Sherpa v2.1, interfaced to the CT10 PDF set, is used to model Drell-Yan production, specifically Z/γ* → τ+τ−. For this process, Sherpa calculates matrix elements at NLO for up to two partons and at LO for up to four partons using the OpenLoops [32] and Comix [33] matrix-element generators. The matrix elements are merged with the Sherpa PS [34] using the ME + PS@NLO prescription [35]. The total cross-section is normalised to NNLO predictions calculated using the FEWZ program [36] with the MSTW2008NNLO PDF [37]. Sherpa v2.1 with the CT10 PDF set is also used to simulate electroweak diboson production [38] (WW, WZ, ZZ), where both bosons decay leptonically. For diboson production, Sherpa v2.1 calculates matrix elements at NLO for zero additional partons, at LO for one to three additional partons (with the exception of ZZ production, for which the one additional parton is also NLO), and using PS for all parton multiplicities of four or more.
The ATLAS detector response is simulated [39] using Geant4 [40]. A "fast simulation" [41], utilising parameterised showers in the calorimeter, is used in the samples chosen to estimate tt̄ modelling uncertainties. Additional pp interactions are generated using Pythia8.186 [42] with tune A2 and overlaid with signal and background processes in order to simulate the effect of pile-up. The MC simulations are reweighted to match the distribution of the average number of interactions per bunch crossing observed in data, a procedure referred to as "pile-up reweighting". Corrections are applied to the MC simulation in order to improve agreement with data for the efficiencies of reconstructed objects. The same reconstruction algorithms and analysis procedures are then applied to both data and MC simulation.
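The idea behind pile-up reweighting can be sketched as a per-bin ratio of normalised data and MC distributions of the average number of interactions per bunch crossing. This is a generic illustration under assumed inputs (binned counts with identical binning), not the ATLAS tool; the function name and the example counts are hypothetical.

```python
def pileup_weights(data_counts, mc_counts):
    """Per-bin event weights that map the MC pile-up profile onto data.

    Both inputs are histograms of the average number of interactions per
    bunch crossing, with identical (assumed) binning.
    """
    if len(data_counts) != len(mc_counts):
        raise ValueError("histograms must share the same binning")
    n_data = float(sum(data_counts))
    n_mc = float(sum(mc_counts))
    weights = []
    for n_d, n_m in zip(data_counts, mc_counts):
        if n_m == 0:
            # No MC events in this bin, so there is nothing to reweight.
            weights.append(0.0)
        else:
            weights.append((n_d / n_data) / (n_m / n_mc))
    return weights


# Illustrative (made-up) profiles with three bins.
data_profile = [1, 2, 1]
mc_profile = [2, 1, 1]
weights = pileup_weights(data_profile, mc_profile)
```

Applying these weights event by event leaves the total MC yield unchanged while reshaping its pile-up profile to follow the data.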

Object reconstruction
This analysis selects reconstructed electrons, muons and jets. Electron candidates are identified by matching an inner-detector track to an isolated energy deposit in the electromagnetic calorimeter, within the fiducial region of transverse momentum pT > 25 GeV and pseudorapidity |η| < 2.47. Electron candidates are excluded if the energy cluster is within the transition region between the barrel and the endcap of the electromagnetic calorimeter, 1.37 < |η| < 1.52, and if they are also reconstructed as photons. Electrons are selected using a multivariate algorithm and are required to satisfy a likelihood-based quality criterion, in order to provide high efficiency and good rejection of fake and non-prompt electrons [43,44]. Electron candidates must have tracks that pass the requirements of transverse impact parameter significance² $|d_0^{\mathrm{sig}}| < 5$ and longitudinal impact parameter |z0 sin θ| < 0.5 mm. Electrons must also pass isolation requirements based on inner-detector tracks and topological energy clusters varying as a function of η and pT. The track isolation cone size is given by the smaller of ΔR = 10 GeV/pT and ΔR = 0.2, i.e. a cone which increases in size at lower pT values, up to a maximum of 0.2. These requirements result in a 95% efficiency of the isolation cuts for electrons from Z → e+e− decays with pT of 25 GeV and 99% for electrons with pT above 60 GeV; when estimated in simulated tt̄ events, this efficiency is smaller by a few percent, due to the increased jet activity. Electrons that share a track with a muon are discarded. Double counting of electron energy deposits as jets is prevented by removing the closest jet with an angular distance ΔR < 0.2 from a reconstructed electron. Following this, the electron is discarded if a jet exists within ΔR < 0.4 of the electron, to ensure sufficient separation from nearby jet activity.
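The pT-dependent track isolation cone described above is simply the smaller of 10 GeV/pT and 0.2. A minimal sketch, with a hypothetical function name, makes the behaviour explicit:

```python
def track_isolation_cone(pt_gev, scale_gev=10.0, max_cone=0.2):
    """Track isolation cone size: the smaller of scale/pT and max_cone."""
    if pt_gev <= 0.0:
        raise ValueError("pT must be positive")
    return min(scale_gev / pt_gev, max_cone)


cone_at_25 = track_isolation_cone(25.0)    # 10/25 = 0.4, capped at 0.2
cone_at_50 = track_isolation_cone(50.0)    # 10/50 = 0.2, exactly at the cap
cone_at_100 = track_isolation_cone(100.0)  # 10/100 = 0.1, below the cap
```

For leptons passing the pT > 25 GeV requirement, the cone therefore stays at its maximum of 0.2 up to 50 GeV and shrinks only above that value.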
Muon candidates are identified from a track in the inner detector matching a track in the muon spectrometer; the combined track is required to have pT > 25 GeV and |η| < 2.5 [45]. The tracks of muon candidates are required to have a transverse impact parameter significance $|d_0^{\mathrm{sig}}| < 3$ and a longitudinal impact parameter below 0.5 mm. Muons are required to meet quality criteria and the same isolation requirement as applied to electrons, to obtain the same isolation efficiency performance as for electrons. These requirements reduce the contributions from fake and non-prompt muons. Muons may leave energy deposits in the calorimeter that could be misidentified as a jet, so jets with fewer than three associated tracks are removed if they are within ΔR < 0.4 of a muon. Muons are discarded if they are separated from the nearest jet by ΔR < 0.4, to reduce the background from muons originating in heavy-flavour decays inside jets.
² The transverse impact parameter significance is defined as $d_0^{\mathrm{sig}} = d_0/\sigma_{d_0}$, where $\sigma_{d_0}$ is the uncertainty in the transverse impact parameter $d_0$.
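The lepton-jet overlap removal described for electrons and muons can be sketched as three passes over the object lists. This is an illustration only: the dictionary layout, the function names and the relative ordering of the electron and muon steps are assumptions, not the analysis code.

```python
import math


def delta_r(a, b):
    """Angular distance between two objects with 'eta' and 'phi' keys."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)


def remove_overlaps(electrons, muons, jets):
    """Return (electrons, muons, jets) after overlap removal."""
    jets = list(jets)
    # Step 1: drop the closest jet within dR < 0.2 of each electron.
    for el in electrons:
        close = [j for j in jets if delta_r(el, j) < 0.2]
        if close:
            closest = min(close, key=lambda j: delta_r(el, j))
            jets = [j for j in jets if j is not closest]
    # Step 2: drop jets with fewer than three tracks within dR < 0.4 of a muon.
    jets = [
        j for j in jets
        if not (j["n_tracks"] < 3 and any(delta_r(mu, j) < 0.4 for mu in muons))
    ]
    # Step 3: discard leptons within dR < 0.4 of any surviving jet.
    electrons = [e for e in electrons if all(delta_r(e, j) >= 0.4 for j in jets)]
    muons = [m for m in muons if all(delta_r(m, j) >= 0.4 for j in jets)]
    return electrons, muons, jets


# Illustrative (made-up) objects.
el = {"eta": 0.0, "phi": 0.0}
jet_a = {"eta": 0.1, "phi": 0.0, "n_tracks": 5}    # electron deposit seen as a jet
jet_b = {"eta": 1.0, "phi": 1.0, "n_tracks": 8}    # genuine jet containing a muon
jet_c = {"eta": -1.0, "phi": -1.0, "n_tracks": 1}  # muon deposit seen as a jet
mu_in_jet = {"eta": 1.1, "phi": 1.0}
mu_isolated = {"eta": -1.1, "phi": -1.0}
kept_el, kept_mu, kept_jets = remove_overlaps(
    [el], [mu_in_jet, mu_isolated], [jet_a, jet_b, jet_c]
)
```

In this example the electron and the isolated muon survive, the two fake jets are removed, and the muon inside the genuine jet is discarded.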
Jets are reconstructed with the anti-kt algorithm [46,47], using a radius parameter of R = 0.4, from topological clusters of energy deposits in the calorimeters. Jets are accepted within the range pT > 25 GeV and |η| < 2.5, and are calibrated using simulation with corrections derived from data [48]. Jets likely to originate from pile-up are suppressed using a multivariate jet-vertex-tagger (JVT) [49] for candidates with pT < 60 GeV and |η| < 2.4. Jets containing b-hadrons are b-tagged using a multivariate discriminant [50], which uses track impact parameters, track invariant mass, track multiplicity and secondary vertex information to discriminate those jets from light-quark or gluon jets ("light jets"). The average b-tagging efficiency is 77% for b-jets in simulated dileptonic tt̄ events with a purity of 95%. The tagging algorithm gives a rejection factor of about 130 against light jets and about 4.5 against jets originating from charm quarks ("charm jets").

Event selection and background estimates
Signal events are selected by requiring exactly one electron and one muon of opposite electric charge ("opposite sign"), and at least two b-tagged jets. With this selection, almost all of the selected events are tt̄ events. The other processes that pass the signal selection are events with single top quarks (Wt), tt̄ events in the single-lepton decay channel with a misidentified (fake) lepton, Z/γ* → τ+τ− (→ eμ) and diboson events. Other backgrounds, including processes with two misidentified leptons, are negligible for the event selections used in this analysis.
Additional jets are defined as those produced in addition to the two highest-pT b-tagged jets. They are identified as jets above pT thresholds of 25, 40, 60 and 80 GeV, independent of the jet flavour. In very rare cases, b-jets may also be produced in addition to the top-quark pair, for example through splitting of a very high momentum gluon, or through the decay of a Higgs boson into a bottom-antibottom pair, leading to events with more than two b-tagged jets. In this case, the two selected b-tagged jets with the highest pT are assumed to originate from tt̄ decay, and the others are considered as additional jets. This procedure ignores that occasionally a b-jet which is not the decay product of a top quark might have higher pT than those from the top-quark decays. This is a negligible effect within the uncertainties of this measurement.
The single-top background is estimated from simulation, as described in Sect. 3. The background from tt̄ events in the lepton+jets channel with a fake lepton is estimated from a combination of data and simulation, as in Ref. [2]. This method uses the observation that samples with a same-sign eμ pair and two b-tagged jets are dominated by events with a misidentified lepton, with a rate comparable to those in the opposite-sign sample. The contribution of events with misidentified leptons is therefore estimated from the same-sign event count in data, after subtraction of the predicted prompt same-sign contributions, multiplied by the ratio of opposite-sign to same-sign fake leptons predicted by the nominal tt̄ sample.
The backgrounds from Z/γ* → τ+τ− and from diboson events are estimated from simulation and are below 1%. The normalisation for the Z/γ* → τ+τ− contribution is estimated from events with Z/γ* → e+e− or μ+μ− and two b-jets within the acceptance of this analysis. The Monte Carlo prediction is scaled by 1.37 ± 0.30 to fit the observed rate.
After the event selection, only about 4.5% of the events are background, as listed in Table 1. The background is dominated by single top production (3.1%) and fake leptons (1.6%). The event yields and the relative background contributions vary with jet multiplicity and jet pT as shown in Figs. 1 and 2, respectively. The single-top background dominates across all jet pT values and at low additional jet multiplicities. At high jet multiplicities (≥3 additional jets) the fake-lepton background exceeds the number of single-top events. While the number of events observed in the 0-jet bin agrees with the prediction within the uncertainties, the data exceed the predictions increasingly with jet multiplicity, reaching a 25% deviation for events with at least four additional jets above 25 GeV.
The table and figures also list the contribution of tt̄ events with at least one additional jet identified as originating from pile-up (pile-up jets). These are signal events, but a few pile-up jets are still in the sample after object and event selection, as the background suppression of the JVT cut is very high but not 100%. Due to the presence of at least one jet that does not originate from the hard interaction, these events may appear in the wrong jet multiplicity bin. In the jet pT spectra, pile-up jets contribute at low additional-jet pT as the pile-up jets are generally softer than the jets in tt̄ events. For the same reason, pile-up jets only contribute significantly to the jet multiplicity distributions with the 25 GeV threshold. In most of the events with remaining pile-up jets, only one of the additional jets is caused by pile-up. Any remaining pile-up jets can be identified in the simulation, but not in data. Therefore the data are corrected for pile-up jets in the unfolding procedure, as described later.
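The same-sign method for the fake-lepton background amounts to a one-line formula: the prompt-subtracted same-sign data yield is scaled by the simulated opposite-sign to same-sign ratio for fake leptons. The sketch below uses hypothetical names and made-up yields purely for illustration.

```python
def fake_lepton_estimate(n_ss_data, n_ss_prompt_mc, n_os_fake_mc, n_ss_fake_mc):
    """Estimate the opposite-sign fake-lepton yield from same-sign data.

    n_ss_data      -- same-sign events observed in data
    n_ss_prompt_mc -- predicted prompt same-sign events (subtracted)
    n_os_fake_mc   -- simulated opposite-sign events with a fake lepton
    n_ss_fake_mc   -- simulated same-sign events with a fake lepton
    """
    if n_ss_fake_mc <= 0:
        raise ValueError("same-sign fake yield in simulation must be positive")
    os_to_ss_ratio = n_os_fake_mc / n_ss_fake_mc
    return os_to_ss_ratio * (n_ss_data - n_ss_prompt_mc)


# Made-up yields: 100 same-sign data events, 20 of them expected to be prompt,
# and an OS/SS fake ratio of 30/20 = 1.5 in simulation.
estimate = fake_lepton_estimate(100.0, 20.0, 30.0, 20.0)
```

The estimate is driven by data through the same-sign count, while simulation enters only through the prompt subtraction and the charge-correlation ratio.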

Sources of systematic uncertainty
Systematic uncertainties in the reconstructed objects, in the signal modelling and in the background estimates are evaluated as described in the following. The jet energy scale (JES) uncertainty is evaluated by varying 19 uncertainty parameters derived from in situ analyses at √s = 8 TeV and extrapolated to data at √s = 13 TeV [48]. The JES uncertainty is 5.5% for jets with pT of 25 GeV and quickly decreases with increasing jet pT, falling to below 2% for jets above 80 GeV. The uncertainty in the jet energy resolution (JER) is calculated by extrapolating the uncertainties derived at √s = 8 TeV to √s = 13 TeV [48]. The uncertainty in JER is at most 3.5% at pT of 25 GeV, quickly decreasing with increasing jet pT to below 2% for jets above 50 GeV.
Uncertainties on the efficiency for tagging b-jets were determined using the methods described in Ref. [51] applied to dileptonic tt̄ events in √s = 13 TeV data. The uncertainties on mistagging of charm and light jets were determined using √s = 8 TeV data as described in Refs. [52,53]. Additional uncertainties are assigned to take into account the presence of the new IBL detector and the extrapolation to √s = 13 TeV [50]. The lepton-related uncertainties are assessed mostly using Z → μ+μ− and Z → e+e− decays measured in √s = 13 TeV data. The differences between the topologies of Z and tt̄ pair production events are expected not to be significant for the estimation of uncertainties.
The uncertainty associated with the amount of QCD initial- and final-state radiation is evaluated as the difference between the baseline MC sample and the corresponding RadHi and RadLo samples described in Sect. 3. The uncertainty due to the choice of parton-shower and hadronisation algorithms in the signal modelling is assessed by comparing the baseline MC sample (Powheg+Pythia6) with Powheg+Herwig++. The uncertainty due to the use of a specific NLO MC sample with its particular matching algorithm is derived from the comparison of Powheg+Herwig++ to the MG5_aMC@NLO+Herwig++ sample.
The uncertainty due to the particular PDF used for the signal model prediction is evaluated by taking the standard deviation of variations from 100 eigenvectors of the recommended Run-2 PDF4LHC [54] set and adding them in quadrature with the difference between the central predictions from CT10 and CT14 [55]. The uncertainty in the single top-quark background is evaluated based on the 5.3% error in the approximate NNLO cross-section prediction and by comparing samples with diagram removal and diagram subtraction schemes, as described in Sect. 3. The uncertainty in the background from fake leptons is estimated to be 100% from the statistical uncertainty of the same-sign event counts in data and an interpolation error using the envelope of the differences of individual subcomponents (such as photon conversions and heavy-flavour decay leptons) of the misidentified-lepton background between the same-sign and the opposite-sign sample.
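The PDF uncertainty prescription described above combines two terms in quadrature: the spread of the PDF4LHC variations and the CT10 versus CT14 difference. The sketch below is an illustration only; the function name is hypothetical, the inputs are made up, and the use of the population standard deviation is an assumption, since the text does not specify the estimator.

```python
import math
import statistics


def pdf_uncertainty(variation_predictions, ct10_central, ct14_central):
    """Combine the spread of PDF variations with the CT10-CT14 difference.

    `variation_predictions` holds the prediction of one observable (e.g. a
    bin content) for each PDF variation.
    """
    # Assumption: population standard deviation over the variations.
    spread = statistics.pstdev(variation_predictions)
    central_difference = ct10_central - ct14_central
    # Sum in quadrature of the two contributions.
    return math.hypot(spread, central_difference)


# Made-up numbers chosen so that the result is easy to verify by hand:
# spread = 3, central difference = 4, combined = 5.
combined = pdf_uncertainty([7.0, 13.0], 14.0, 10.0)
```

In the analysis this combination would be evaluated separately for every bin of each measured distribution, with 100 variations rather than the two used here.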
For Z+jets backgrounds, the scale factor derived in the e+e− and μ+μ− channels and used to reweight the signal-region distribution is varied by 22%, corresponding to the difference in the scale factors derived in subsamples with and without an additional jet. This value covers the variations of the correction factor derived from subsets of events with different jet multiplicities. No theoretical uncertainty is applied to the Z+jets background normalisation as this is scaled to data.
The uncertainty in the amount of pile-up is estimated by changing the nominal MC reweighting factors to vary the number of interactions per bunch crossing in data up and down by 10%. Two methods were used to estimate the number of interactions per bunch crossing. The first method calculated the number of interactions using the instantaneous luminosity and the inelastic proton–proton cross-section [56,57]. The results of the calculation were compared to results from a data-driven method based on the number of reconstructed vertices. The difference between the correlation of the two methods in data and MC is taken as the uncertainty.
The uncertainty due to the 2-3% loss of hard-scatter jets due to the JVT cut is estimated using Z+jet events. The uncertainty in the efficiency of the JVT cut to reduce pile-up jets is estimated by using a sideband method. The JVT cut is inverted in simulation to estimate the number of pile-up jets and derive a scale factor to describe the number of pile-up jets in data. This factor is then used to scale the predicted number of pile-up jets in the signal region (with the JVT cut applied). Scale factors are also derived using the samples with increased and decreased pile-up mentioned above, and the larger of the two variations is taken as the systematic uncertainty.

Definition of the fiducial phase space
For the measurement of the jet multiplicity, the jet pT spectra and the gap fractions, the data are corrected to particle level by comparing to events from MC generators in the fiducial volume described below. The fiducial volume, i.e., the object definitions and the kinematic phase space at particle level, is designed to match the reconstruction level as closely as possible and follow closely the definitions in Refs. [4,5]. Leptons and jets are defined using particles with a mean lifetime greater than 0.3 × 10⁻¹⁰ s, directly produced in pp interactions or from subsequent decays of particles with a shorter lifetime. Leptons from W boson decays (e, μ, νe, νμ, ντ) are identified as such by requiring that they are not hadron decay products. Electron and muon four-momenta are calculated after the addition of photon four-momenta within a cone of ΔR = 0.1 around their original directions. Jets are defined using the anti-kt algorithm with a radius parameter of 0.4. All particles are considered for jet clustering, except for leptons from W decays as defined above (i.e., neutrinos from hadron decays are included in jets) and any photons associated with the selected electrons or muons. Jets initiated by b-quarks are identified as b-jets if a hadron with pT > 5 GeV containing a b-quark is associated with the jet through a ghost-matching technique as described in Ref. [58].
The cross-section is defined using events with exactly one electron and one muon of opposite sign originating directly from W boson decays, i.e. excluding electrons and muons from τ-lepton decays. In addition, at least two b-jets each with pT > 25 GeV and |η| < 2.5 are required. Following the reconstructed object selection, events with jet-electron pairs or jet-muon pairs with ΔR < 0.4 are excluded. Additional jets are considered within |η| < 2.5 for pT thresholds of 25 GeV or higher, independently of their flavour.
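The particle-level lepton definition, in which photon four-momenta within a cone of 0.1 around the original lepton direction are added to the lepton, can be sketched as follows. Four-momenta are represented as (E, px, py, pz) tuples in GeV; this layout and the function names are assumptions made for illustration.

```python
import math


def eta_phi(p4):
    """Pseudorapidity and azimuth of a four-momentum (E, px, py, pz)."""
    _, px, py, pz = p4
    pt = math.hypot(px, py)  # assumed non-zero for the particles considered
    return math.asinh(pz / pt), math.atan2(py, px)


def dress_lepton(lepton, photons, cone=0.1):
    """Add photons within `cone` of the original lepton direction."""
    lep_eta, lep_phi = eta_phi(lepton)  # direction fixed before any addition
    dressed = list(lepton)
    for photon in photons:
        ph_eta, ph_phi = eta_phi(photon)
        dphi = (ph_phi - lep_phi + math.pi) % (2.0 * math.pi) - math.pi
        if math.hypot(ph_eta - lep_eta, dphi) < cone:
            dressed = [a + b for a, b in zip(dressed, photon)]
    return tuple(dressed)


# Made-up example: one collinear photon (added) and one at 90 degrees (ignored).
bare_lepton = (30.0, 30.0, 0.0, 0.0)
photons = [(2.0, 2.0, 0.0, 0.0), (5.0, 0.0, 5.0, 0.0)]
dressed_lepton = dress_lepton(bare_lepton, photons)
```

Photons absorbed in this way would then be excluded from jet clustering, in line with the definition given in the text.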

Measurement of jet multiplicities and pT spectra
The multiplicities of additional reconstructed jets with different pT thresholds are corrected to particle level within the fiducial volume as defined above. Even though the kinematic range of the measurement is chosen to be the same for particle-level and reconstruction-level objects, corrections are necessary due to the efficiencies and detector resolutions that cause differences between reconstruction-level and particle-level jet distributions. Examples include events in which one or more particle-level jets do not pass the pT threshold for reconstruction-level jets and when the selection efficiency for inclusive tt̄ events changes as a function of jet multiplicity. Furthermore, additional reconstructed jets without a corresponding particle-level jet may appear due to pile-up, or if a jet migrates into the fiducial volume due to an upward fluctuation caused by the pT resolution, or if a single particle-level jet is reconstructed as two separate jets. These effects lead to migrations between bins and are taken into account within an iterative Bayesian unfolding [59].
The reconstructed jet multiplicity measurements are corrected separately for each additional-jet pT threshold according to

$$N^{i}_{\mathrm{unfold}} = \frac{1}{f^{i}_{\mathrm{eff}}} \sum_{j} \left(M^{-1}\right)^{\mathrm{part},i}_{\mathrm{reco},j} \, f^{j}_{\mathrm{accept}} \left(N^{j}_{\mathrm{data}} - N^{j}_{\mathrm{bg}}\right), \qquad (1)$$

where $N^{i}_{\mathrm{unfold}}$ is the total number of fully corrected particle-level events with particle-level jet multiplicity i. The term $f^{i}_{\mathrm{eff}}$ represents the efficiency to reconstruct an event with i additional jets, defined as the ratio of events with i particle-level jets that fulfil both the fiducial volume selection at particle level and the reconstruction-level selection, $N^{i}_{\mathrm{reco}\wedge\mathrm{part}}$, to the number of events that fulfil the particle-level selection, $N^{i}_{\mathrm{part}}$:

$$f^{i}_{\mathrm{eff}} = \frac{N^{i}_{\mathrm{reco}\wedge\mathrm{part}}}{N^{i}_{\mathrm{part}}}. \qquad (2)$$

The resulting ratio $f^{i}_{\mathrm{eff}}$ is approximately 0.33 and has very small dependence on the jet multiplicity. The analysis of different tt̄ MC samples results in values of $f^{i}_{\mathrm{eff}}$ which vary by up to 10%. The variations of $f^{i}_{\mathrm{eff}}$ between different pT thresholds are less than 2%. The function $f^{j}_{\mathrm{accept}}$ is the probability of an event fulfilling the reconstruction-level selection and with j reconstructed jets, $N^{j}_{\mathrm{reco}}$, to also be within the particle-level acceptance defined in Sect. 7:

$$f^{j}_{\mathrm{accept}} = \frac{N^{j}_{\mathrm{reco}\wedge\mathrm{part}}}{N^{j}_{\mathrm{reco}}}. \qquad (3)$$

The variable $N^{j}_{\mathrm{data}}$ is the number of events in data with j reconstructed jets and $N^{j}_{\mathrm{bg}}$ is the number of background events, as evaluated in Sect. 5. The resulting $f^{j}_{\mathrm{accept}}$ decreases from around 0.85 for events without additional jets to about 0.76 for the highest jet multiplicities. The MC predictions of $f^{j}_{\mathrm{accept}}$ agree within 1% for events without any additional jets and within 5% at high jet multiplicities. Only MG5_aMC@NLO+Herwig++ predicts a smaller change as a function of the number of jets.
The response matrix $M^{\mathrm{part},i}_{\mathrm{reco},j}$ represents the probability $P(N^{j}_{\mathrm{reco}} | N^{i}_{\mathrm{part}})$ of finding an event with true particle-level jet multiplicity i with a reconstructed jet multiplicity j. As shown in Fig. 3, at the higher jet pT thresholds, at least 77% of the events have the same jet multiplicity at particle level and at reconstruction level. At the 25 GeV threshold, the agreement still exceeds 64%. The poorer agreement at this threshold can be explained in part by the presence of pile-up jets, which leads to events with more reconstructed than particle-level jets. There are almost no events with a difference of more than one jet between particle-level and reconstruction-level multiplicity.
As part of the Bayesian unfolding using Eq. (1), $M^{\mathrm{part},i}_{\mathrm{reco},j}$ is calculated iteratively, i.e., the result of the first iteration is used as the reconstruction-level jet multiplicity for the following one. The corrected spectra are found to converge after four iterations of the Bayesian unfolding algorithm. The unfolded additional-jet multiplicity distributions are normalised after the last iteration according to

$$\frac{1}{\sigma}\frac{\mathrm{d}\sigma}{\mathrm{d}N^{i}} = \frac{N^{i}_{\mathrm{unfold}}}{\sum_{k} N^{k}_{\mathrm{unfold}}},$$

where $N^{i}_{\mathrm{unfold}}$, as defined in Eq. (1), corresponds to the number of events with $i$ jets after full unfolding and $\sigma$ is the measured $t\bar{t}$ production cross-section in the fiducial volume.
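The correction chain of Eq. (1) and the subsequent normalisation can be illustrated with a minimal Python sketch. It assumes a standard D'Agostini-style iterative Bayesian update with a flat starting prior, which may differ in detail from the implementation used in the analysis; the function names `bayes_unfold` and `normalise` are hypothetical.

```python
def bayes_unfold(data, background, response, f_accept, f_eff, n_iter=4):
    """Iterative Bayesian unfolding sketch (D'Agostini-style, an assumption).

    response[i][j] approximates P(reco bin j | particle bin i) for events
    passing both selections, so each row is expected to sum to one.
    """
    n_part = len(response)
    n_reco = len(data)
    # Background subtraction followed by the acceptance correction f_accept.
    reco = [f_accept[j] * (data[j] - background[j]) for j in range(n_reco)]
    # Flat starting prior (assumed; the MC particle-level spectrum is another option).
    prior = [1.0 / n_part] * n_part
    unfolded = [0.0] * n_part
    for _ in range(n_iter):
        unfolded = [0.0] * n_part
        for j in range(n_reco):
            # Probability of observing reco bin j under the current prior.
            norm = sum(response[k][j] * prior[k] for k in range(n_part))
            if norm == 0.0:
                continue
            for i in range(n_part):
                # Bayes' theorem: P(part i | reco j) times the observed yield.
                unfolded[i] += response[i][j] * prior[i] / norm * reco[j]
        total = sum(unfolded)
        # The unfolded spectrum becomes the prior of the next iteration.
        prior = [u / total for u in unfolded]
    # Efficiency correction 1 / f_eff, applied once after the last iteration.
    return [unfolded[i] / f_eff[i] for i in range(n_part)]


def normalise(spectrum):
    """Divide each bin by the sum over all bins."""
    total = sum(spectrum)
    return [x / total for x in spectrum]
```

With a diagonal response matrix the procedure reduces to the background subtraction and the two scalar corrections, which gives a quick sanity check of an implementation before realistic migrations are introduced.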
A potential bias of the unfolded results due to data statistics and the unfolding procedure is investigated using pseudo-experiments by performing Gaussian sampling of the reconstruction-level distributions with statistical power equivalent to that present in data. The size of the bias, defined as the relative difference between the unfolded and predicted particle-level distributions, is found to be within the statistical uncertainty of the data. To check the size of a potential bias of the unfolding due to the relation between reconstructed and particle level distributions, the particle-level distributions are reweighted to alternative MC samples. Pseudo-experiments are performed based on the resulting alternative spectrum at reconstruction level. The pseudo-experiments are unfolded using the original correction procedure. The relative difference between the unfolded particle-level distribution and the predicted particle-level distribution from the alternative MC sample is found to be well within the modelling uncertainty. In addition, it is ensured that differences between the nominal and alternative particle-level distributions are at least as large as the difference between data and the predicted reconstruction-level distributions.

Fig. 3 Unfolding response matrices to match distributions (jet multiplicity, jet $p_{\mathrm{T}}$) at reconstruction level to particle-level distributions in the fiducial phase space. Only events that fulfil the reconstruction-(particle-) level selection are included. Matrices to unfold a jet multiplicity for additional jets with $p_{\mathrm{T}}$ > 25 GeV, b jet multiplicity for additional jets with $p_{\mathrm{T}}$ > 40 GeV, c jet $p_{\mathrm{T}}$ of the leading additional jet, and d jet $p_{\mathrm{T}}$ of the leading b-jet
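The pseudo-experiment bias test described above can be sketched as follows. This is an illustration under stated assumptions: each reconstruction-level bin is sampled from a Gaussian of width $\sqrt{N}$, and the `unfold` argument stands for whatever correction procedure is under test. The function name `pseudo_experiment_bias` and its parameters are hypothetical.

```python
import random


def pseudo_experiment_bias(reco_expected, truth_expected, unfold, n_toys=1000, seed=1):
    """Mean relative difference between unfolded toys and the expected truth.

    reco_expected:  expected reconstruction-level yields per bin
    truth_expected: predicted particle-level yields per bin
    unfold:         callable mapping a reco-level spectrum to particle level
    """
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    n_bins = len(truth_expected)
    sums = [0.0] * n_bins
    for _ in range(n_toys):
        # Gaussian sampling with sigma = sqrt(N) (assumed) approximates the
        # statistical power of the data; negative yields are clipped to zero.
        toy = [max(0.0, rng.gauss(n, n ** 0.5)) for n in reco_expected]
        unfolded = unfold(toy)
        for i in range(n_bins):
            sums[i] += unfolded[i]
    mean = [s / n_toys for s in sums]
    # Bias: relative difference between the mean unfolded result and the prediction.
    return [(mean[i] - truth_expected[i]) / truth_expected[i] for i in range(n_bins)]
```

For example, passing `lambda toy: [x / 0.3 for x in toy]` as `unfold` mimics a pure efficiency correction with no migrations, for which the bias is expected to be compatible with zero.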
The effect of the uncertainties listed in Sect. 6 on the unfolded multiplicity and jet spectra is evaluated as follows. The uncertainties due to detector-related effects (such as JES, JER and b-tagging) and data statistics are propagated through the unfolding by varying the reconstructed objects for each uncertainty component by ±1σ. The modified spectrum is then used as $N^{j}_{\mathrm{data}}$ in Eq. (1) for the iterative unfolding, and the resulting difference in the particle-level distribution is taken as the systematic uncertainty.
The uncertainties due to the MC modelling of the QCD initial- and final-state radiation (ISR/FSR) and the parton-shower uncertainty are evaluated by replacing the data with the corresponding alternative MC sample and using the response matrix and the correction factors from the baseline $t\bar{t}$ MC sample for unfolding. The result is compared to the particle-level distribution of the alternative MC sample and the difference is taken as a systematic uncertainty. The uncertainties due to the MC modelling of the NLO matrix element and the matching algorithm are estimated in a similar way by replacing the data with the MG5_aMC@NLO+Herwig++ sample but using the response matrix and correction factors from Powheg+Herwig++. The resulting uncertainties are symmetrised for each component.
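The propagation and symmetrisation of uncertainty components can be illustrated with a short Python sketch. The text does not state the symmetrisation convention, so the sketch assumes the mean of the absolute up and down shifts, and it assumes that independent components are combined in quadrature. All function names are hypothetical.

```python
import math


def relative_shift(nominal, varied):
    """Per-bin relative change of an unfolded spectrum under one variation."""
    return [(v - n) / n for n, v in zip(nominal, varied)]


def symmetrise(up_shift, down_shift):
    """Symmetric per-bin uncertainty: mean of |up| and |down| (assumed convention).

    For a one-sided comparison (e.g. an alternative MC sample), pass the same
    shift twice to obtain +-|shift|.
    """
    return [(abs(u) + abs(d)) / 2.0 for u, d in zip(up_shift, down_shift)]


def total_uncertainty(components):
    """Quadrature sum per bin, assuming the components are uncorrelated."""
    n_bins = len(components[0])
    return [math.sqrt(sum(c[i] ** 2 for c in components)) for i in range(n_bins)]
```

Here `nominal` and `varied` would be the unfolded normalised spectra obtained with the nominal and the shifted inputs, respectively.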
To unfold the leading and sub-leading b-jet $p_{\mathrm{T}}$ and the leading additional-jet $p_{\mathrm{T}}$, the same ansatz is used as for the jet multiplicity measurement, but with the jet $p_{\mathrm{T}}$ instead of the jet multiplicity in the matrix, the acceptance and the efficiency formula. The binning is chosen to limit the migration, such that most events have reconstruction-level jet $p_{\mathrm{T}}$ in the same region as the particle-level jet $p_{\mathrm{T}}$, and to limit the uncertainty due to data statistics. The efficiency correction $f^{i}_{\mathrm{eff}}$ for the b-jets has a significant $p_{\mathrm{T}}$ dependence: it is around 0.2 for the lowest $p_{\mathrm{T}}$ bin and reaches approximately 0.35 at $p_{\mathrm{T}}$ of 80 GeV. The efficiency for the additional jet varies only slightly between 0.28 and 0.31. The acceptance correction is between 0.8 and 0.9 for all jets and almost independent of $p_{\mathrm{T}}$, except at very low $p_{\mathrm{T}}$, at which it decreases significantly, to 0.56 for the leading additional jet. The unfolding response matrix presented in Fig. 3 shows that more than 60% of the jets are in the same $p_{\mathrm{T}}$ bin at particle and reconstruction level.
The spectra are normalised after the last iteration similarly to those in the jet multiplicity measurement, using $N^{i}_{p_{\mathrm{T}},\mathrm{unfold}}$, which is defined as in Eq. (1) and corresponds to the number of events with the jet $p_{\mathrm{T}}$ in bin $i$ after full unfolding.
The measurement of the jet p T spectra is as stable as the jet multiplicity measurements and the biases are small.
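The statements about the fraction of events that stay in the same bin at particle and reconstruction level correspond to the diagonal of the row-normalised response matrix. A minimal sketch, assuming a table of migration counts from simulation (the function names and the input layout are hypothetical):

```python
def response_from_counts(counts):
    """Row-normalise migration counts into P(reco bin j | particle bin i).

    counts[i][j] is the number of simulated events with particle-level bin i
    and reconstruction-level bin j that pass both selections.
    """
    response = []
    for row in counts:
        total = sum(row)
        # Empty particle-level bins get a row of zeros instead of a division by zero.
        response.append([c / total if total else 0.0 for c in row])
    return response


def diagonal_fractions(response):
    """Fraction of events reconstructed in the same bin as at particle level."""
    return [response[i][i] for i in range(len(response))]
```

Inspecting these fractions for candidate binnings is one way to check that migrations stay limited, as required by the binning choice described above.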

Jet multiplicity results
The unfolded normalised cross-sections are shown in Fig. 4 and are compared to different MC predictions. Events with up to three additional jets with p T above 25 GeV are measured exclusively (four jets inclusively) and up to two additional  , which deviates significantly from the data for all p T thresholds. The MG5_aMC@NLO predictions agree within 5-10% regardless of which parton shower is used (except Herwig7), and the Powheg predictions vary slightly more. The variations are larger when using different matrix elements but the same parton shower.
The unfolded data are compared with different MC predictions using $\chi^2$ tests. Full covariance matrices are produced from the unfolding taking into account statistical and all systematic uncertainties. The correlation of the measurement bins is similar for all jet $p_{\mathrm{T}}$ thresholds: strong anti-correlations exist between events with no additional jet and events with any number of additional jets. Positive correlations exist between the bins with one and two additional jets. The $\chi^2$ is determined using:

$$\chi^2 = S^{\mathrm{T}}_{n-1} \cdot \mathrm{Cov}^{-1}_{n-1} \cdot S_{n-1},$$

where $S_{n-1}$ is a column vector representing the difference between the unfolded data and the MC generator predictions of the normalised cross-section for one less than the total number of bins in the distribution, and $\mathrm{Cov}_{n-1}$ is a matrix with $n-1$ rows and the respective $n-1$ columns of the full covariance matrix. The full covariance matrix is singular and non-invertible, as it is evaluated using normalised distributions. The p-values are determined using the $\chi^2$ and $n-1$ degrees of freedom. Table 6 shows the $\chi^2$ and p-values.

A statistical comparison taking into account the bin correlations indicates that the agreement with data is slightly better for MG5_aMC@NLO+Herwig++, as shown in Table 6. The ratio of the data to predictions of Powheg+Pythia6 with different levels of QCD radiation both in the matrix-element calculation and in the parton shower is also shown. Powheg+Pythia6 (RadLo) does not describe the data well. The central prediction of Powheg+Pythia6 yields fewer jets than in data; however, the predictions are still within the experimental uncertainties. Powheg+Pythia6 (RadHi) describes the data most consistently, which is also confirmed by high p-values for all $p_{\mathrm{T}}$ thresholds. The Powheg+Pythia6 (RadLo) sample has p-values around 0.5 and the central sample mostly between 0.8 and 0.9.
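The $\chi^2$ comparison for normalised distributions can be sketched in plain Python. The sketch drops the last bin to obtain an invertible $(n-1)\times(n-1)$ covariance sub-matrix; which bin is removed in the analysis is not stated, so this choice is an assumption, and the function names are hypothetical.

```python
import math


def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting.

    Assumes the matrix is square and non-singular.
    """
    n = len(vector)
    a = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def chi2_reduced(data, prediction, covariance):
    """chi2 = S^T Cov^-1 S over the first n-1 bins (last bin dropped, assumed)."""
    m = len(data) - 1
    s = [data[i] - prediction[i] for i in range(m)]
    cov = [row[:m] for row in covariance[:m]]
    y = solve(cov, s)  # y = Cov^-1 S without forming the inverse explicitly
    return sum(s[i] * y[i] for i in range(m))


def chi2_p_value(chi2, ndf):
    """Upper-tail probability of a chi2 distribution with integer ndf >= 1."""
    x = chi2 / 2.0
    if ndf % 2 == 0:
        # Closed form for even ndf: exp(-x) * sum_{i < ndf/2} x^i / i!
        term, total = 1.0, 1.0
        for i in range(1, ndf // 2):
            term *= x / i
            total += term
        return math.exp(-x) * total
    # Odd ndf: erfc(sqrt(x)) plus exp(-x) * sum_i x^(i+1/2) / Gamma(i+3/2)
    total = math.erfc(math.sqrt(x))
    term = math.sqrt(x) / math.gamma(1.5)
    for i in range(ndf // 2):
        total += math.exp(-x) * term
        term *= x / (i + 1.5)
    return total
```

For a distribution with $n$ bins, the p-value would be obtained as `chi2_p_value(chi2_reduced(data, prediction, covariance), n - 1)`.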

Jet p T spectra results
The particle-level normalised cross-sections differential in jet $p_{\mathrm{T}}$ are shown in Fig. 6 and are compared to different MC predictions. The total uncertainty in the $p_{\mathrm{T}}$ measurements is 5-11%, although higher at some edges of the phase space. The predictions agree with data for all jet $p_{\mathrm{T}}$ distributions as shown in Figs. 6 and 7, although the predictions of Powheg+Herwig++ and MG5_aMC@NLO+Pythia8 do not give a good description of the leading additional-jet $p_{\mathrm{T}}$ distribution, which is consistent with the jet multiplicity results. This is reflected by the statistical comparison as well (Table 10).

Table 11 Sources of uncertainty in the gap fraction measurement as a function of $Q_0$ for the full central region |y| < 2.1, for a selection of $Q_0$ thresholds. "Signal modelling" sources of systematic uncertainty includes the hadronisation, parton shower and NLO modelling uncertainties. "Other" sources of systematic uncertainty refer to lepton and jet selection efficiencies, background (including pile-up jets) estimations, and the PDF

Fig. 9 The measured gap fraction $f^{\mathrm{part}}_{\mathrm{gap}}(Q_0)$ as a function of $Q_0$ in different rapidity veto regions, a |y| < 0.8, b 0.8 < |y| < 1.5, c 1.5 < |y| < 2.1 and d |y| < 2.1. The data are shown by the points with error bars indicating the total uncertainty, and compared to the predictions from various $t\bar{t}$ simulation samples shown as smooth curves. The lower plots show the ratio of predictions to data, with the data uncertainty indicated by the shaded band, and the $Q_0$ thresholds corresponding to the left edges of the histogram bins, except for the first bin

Gap fraction measurements
The jet activity is also studied by measuring the gap fraction $f_{\mathrm{gap}}$, defined as the fraction of events with no jet activity in addition to the two b-tagged jets above a given $p_{\mathrm{T}}$ threshold in a "veto region" defined as a rapidity region in the detector. The transverse momentum threshold is defined in two ways, giving two types of gap fraction. First, the gap fraction is measured as the fraction of events without any additional jet in that rapidity region above a given $p_{\mathrm{T}}$ threshold $Q_0$:

$$f_{\mathrm{gap}}(Q_0) = \frac{n(Q_0)}{N_{t\bar{t}}},$$

where $N_{t\bar{t}}$ is the total number of selected events, $Q_0$ is the $p_{\mathrm{T}}$ threshold for any additional jet in the veto region of these events, and $n(Q_0)$ represents the subset of events with no additional jet with $p_{\mathrm{T}} > Q_0$. The second type of gap fraction is defined as the fraction of events in which the scalar $p_{\mathrm{T}}$ sum of all additional jets in the given veto region does not exceed a given threshold $Q_{\mathrm{sum}}$:

$$f_{\mathrm{gap}}(Q_{\mathrm{sum}}) = \frac{n(Q_{\mathrm{sum}})}{N_{t\bar{t}}}.$$

Here, $n(Q_{\mathrm{sum}})$ represents the subset of events in which the scalar $p_{\mathrm{T}}$ sum of all additional jets in the veto region is less than $Q_{\mathrm{sum}}$. The gap fraction defined using $Q_0$ is mainly sensitive to the leading $p_{\mathrm{T}}$ emission accompanying the $t\bar{t}$ system, whereas the gap fraction defined using $Q_{\mathrm{sum}}$ is sensitive to all hard emissions accompanying the $t\bar{t}$ system. In the following descriptions of the gap fraction measurement process, the same procedure is followed for $Q_{\mathrm{sum}}$ as for $Q_0$. Both types of gap fraction are measured in four veto regions: |y| < 0.8, 0.8 < |y| < 1.5, 1.5 < |y| < 2.1 and the full central region |y| < 2.1, where y is calculated as

$$y = \frac{1}{2}\ln\left(\frac{E + p_z}{E - p_z}\right),$$

with $E$ the energy of the jet and $p_z$ its momentum component along the beam axis. Furthermore, the gap fraction is measured considering jet activity in the full central region (|y| < 2.1) for four different subsamples specified by the mass of the eμ + 2 b-tagged jets system, $m_{e\mu bb}$. Both the rapidity region and the $m_{e\mu bb}$ subsamples are chosen to correspond to those used in earlier publications at lower energies [3,4].
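The two gap fraction definitions and the rapidity formula can be illustrated with a minimal Python sketch. The event representation (a list of `(pt, y)` pairs for the additional jets of each event) and the function names are hypothetical, and the treatment of events exactly at a threshold is an assumption.

```python
import math


def rapidity(energy, pz):
    """y = 0.5 * ln((E + pz) / (E - pz))."""
    return 0.5 * math.log((energy + pz) / (energy - pz))


def _veto_region_pts(jets, y_min, y_max):
    # Transverse momenta of the additional jets inside y_min <= |y| < y_max.
    return [pt for pt, y in jets if y_min <= abs(y) < y_max]


def gap_fraction_q0(events, q0, y_min=0.0, y_max=2.1):
    """Fraction of events with no additional jet with pT > q0 in the veto region."""
    n_gap = sum(
        1 for jets in events
        if not any(pt > q0 for pt in _veto_region_pts(jets, y_min, y_max))
    )
    return n_gap / len(events)


def gap_fraction_qsum(events, q_sum, y_min=0.0, y_max=2.1):
    """Fraction of events whose scalar pT sum in the veto region does not exceed q_sum."""
    n_gap = sum(
        1 for jets in events
        if sum(_veto_region_pts(jets, y_min, y_max)) <= q_sum
    )
    return n_gap / len(events)
```

Each event here is a list such as `[(60.0, 1.0), (40.0, -1.8)]`, with an empty list for events without additional jets; by construction the $Q_{\mathrm{sum}}$ gap fraction never exceeds the $Q_0$ gap fraction at the same threshold value.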
The gap fraction $f^{\mathrm{part}}_{\mathrm{gap}}(Q_0)$ (and analogously for $f^{\mathrm{part}}_{\mathrm{gap}}(Q_{\mathrm{sum}})$ in the following) is measured as defined in Eq. (10) by counting the number of selected data events $N_{\mathrm{data}}$ and the number $n_{\mathrm{data}}(Q_0)$ of those that had no additional jets with $p_{\mathrm{T}} > Q_0$ within the veto region; the procedure for $f^{\mathrm{part}}_{\mathrm{gap}}(Q_{\mathrm{sum}})$ is similar. The sets of $Q_0$ and $Q_{\mathrm{sum}}$ threshold values correspond approximately to one standard deviation of the jet energy resolution and are the same as in the earlier publications [3,4]. The measured gap fraction $f^{\mathrm{data}}(Q_0)$ is then corrected for detector effects to particle level by multiplying it by a correction factor $C(Q_0)$ to obtain $f^{\mathrm{part}}_{\mathrm{gap}}(Q_0)$. The correction factor $C(Q_0)$ is determined from the baseline Powheg+Pythia6 $t\bar{t}$ sample using the simulated gap fraction values at reconstruction level, $f^{\mathrm{reco}}(Q_0)$, and at particle level, $f^{\mathrm{part}}(Q_0)$:

$$C(Q_0) = \frac{f^{\mathrm{part}}(Q_0)}{f^{\mathrm{reco}}(Q_0)}.$$

The values of the correction factors $C(Q_0)$ and $C(Q_{\mathrm{sum}})$ deviate by less than 4% from unity at low $Q_0$ and $Q_{\mathrm{sum}}$ values in the rapidity regions (less than 8% in the $m_{e\mu bb}$ subsamples), and approach unity at higher threshold values. The small corrections reflect the high selection efficiency and high purity of the event samples.

Fig. 11 The measured gap fraction $f^{\mathrm{part}}_{\mathrm{gap}}(Q_{\mathrm{sum}})$ as a function of $Q_{\mathrm{sum}}$ in different rapidity veto regions, a |y| < 0.8 and b |y| < 2.1, followed by ratios of prediction to data of the measured gap fraction $f^{\mathrm{part}}_{\mathrm{gap}}(Q_{\mathrm{sum}})$ as a function of $Q_{\mathrm{sum}}$ in the same two rapidity regions. The data in a and b are shown by the points with error bars indicating the total uncertainty, and compared to the predictions from various $t\bar{t}$ simulation samples shown as smooth curves. The lower plots in a and b and the set of ratio plots in c and d show the ratio of predictions to data, with the data uncertainty indicated by the shaded band, and the $Q_{\mathrm{sum}}$ thresholds corresponding to the left edges of the histogram bins, except for the first bin
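The bin-by-bin correction of the measured gap fraction can be sketched in a few lines of Python. The function names are hypothetical and the numbers in the usage note are illustrative only, not values from the analysis.

```python
def correction_factor(f_part_mc, f_reco_mc):
    """C(Q0) = f_part(Q0) / f_reco(Q0), evaluated per threshold from simulation."""
    return [p / r for p, r in zip(f_part_mc, f_reco_mc)]


def correct_gap_fraction(f_data, f_part_mc, f_reco_mc):
    """Particle-level gap fraction: measured value times C(Q0), per threshold."""
    factors = correction_factor(f_part_mc, f_reco_mc)
    return [f * c for f, c in zip(f_data, factors)]
```

For instance, simulated gap fractions of 0.60 at reconstruction level and 0.618 at particle level give a correction factor of 1.03, so a measured value of 0.58 would be corrected to about 0.597.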
At each threshold $Q_0$, the baseline simulation predicts that around 80% of the selected reconstructed events that do not have a jet with $p_{\mathrm{T}} > Q_0$ also have no particle-level jet with $p_{\mathrm{T}} > Q_0$. Therefore, a simple bin-by-bin correction method is considered adequate, rather than a full unfolding as used in Sect. 8. Systematic uncertainties arise in this procedure from the uncertainties in $C(Q_0)$ and the subtracted backgrounds. The uncertainties, as described in Sect. 6, are used to recalculate $f^{\mathrm{data}}(Q_0)$ and $C(Q_0)$ to obtain the gap fraction $f^{\mathrm{part}}_{\mathrm{gap}}(Q_0)$. The corresponding quantities for $Q_{\mathrm{sum}}$ are calculated accordingly. Figure 8 and Table 11

Table 12 Values of $\chi^2$ for the comparison of the measured gap fraction distributions with the predictions from various $t\bar{t}$ generator configurations, for the four rapidity regions as a function of $Q_0$. The $\chi^2$ and p-values correspond to 18 degrees of freedom

The $p_{\mathrm{T}}$ distribution of the first additional jet shown in Fig. 6 contains only events with at least one additional jet and differs in this respect from the gap fraction distribution, which includes events with no additional jet. However, the results are consistent, as Powheg+Pythia8 predicts a slightly softer $p_{\mathrm{T}}$ spectrum for the additional jet, which leads to the observed effect that fewer jets above the 25 GeV threshold are observed.
The matrix of statistical and systematic correlations is shown in Fig. 12 for the gap fraction measurement at different values of Q 0 for the full central |y| < 2.1 rapidity region. Nearby points in Q 0 are highly correlated, while well-separated Q 0 points are less correlated. The full covariance matrix, including correlations, is used to calculate a χ 2 value for the compatibility of each of the NLO generator predictions with the data in each veto region. The results are given in Tables 12 and 13. An analysis of the p-values confirms that Powheg+Herwig++, MG5_aMC@NLO+Herwig7, MG5_aMC@NLO+Pythia8 and Powheg+Pythia6 (RadLo) are not consistent with the data. Powheg+Pythia6 (RadHi) has the best p-values among the QCD shower variations of Powheg+Pythia6.

Gap fraction results in m eμbb subsamples
The gap fraction is also measured over the full central veto region |y| < 2.1 after dividing the data sample into four regions of $m_{e\mu bb}$. The distribution of reconstructed $m_{e\mu bb}$ in the selected eμ + 2 b-tagged jets events is reasonably well reproduced by the nominal $t\bar{t}$ simulation sample, as shown in Fig. 13. The distribution is divided into four regions at both reconstruction and particle level: $m_{e\mu bb}$ < 300 GeV, 300 GeV < $m_{e\mu bb}$ < 425 GeV, 425 GeV < $m_{e\mu bb}$ < 600 GeV and $m_{e\mu bb}$ > 600 GeV. These boundaries are chosen to minimise migration between the regions. In the baseline simulation, around 85% of the reconstructed events in each $m_{e\mu bb}$ region belong to the corresponding region at particle level. The corresponding correction factors $C_m(Q_0)$ which translate the measured gap fraction in the reconstruction-level $m_{e\mu bb}$ region to the corresponding particle-level gap fractions $f_m(Q_0)$ are of similar

Figures 14 and 15 show the measured gap fractions as a function of $Q_0$ in the four $m_{e\mu bb}$ regions in data, compared to the same set of predictions as shown in Figs. 9 and 10. Tables 14 and 15 give the $\chi^2$ and p-values taking into account bin-by-bin correlations of the gap fractions compared to the predictions from the different generators. Figure 16 gives an alternative presentation of the gap fraction $f_m(Q_0)$ as a function of $m_{e\mu bb}$ for four different $Q_0$ values. The level of agreement between the data and the various predictions is consistent with the results of the gap fraction in rapidity bins. Only in the lowest mass region does the Powheg+Pythia8 prediction agree very well, while MG5_aMC@NLO+Herwig++ and Sherpa are at the lower edge of the uncertainties.
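The assignment of an event to one of the four $m_{e\mu bb}$ regions can be sketched as follows. The four-vector layout `(E, px, py, pz)` in GeV and the function names are hypothetical; the region boundaries are those quoted above, and assigning an event exactly on a boundary to the higher region is an assumption.

```python
import bisect
import math

# Region boundaries in GeV, as quoted in the text.
REGION_EDGES = [300.0, 425.0, 600.0]


def invariant_mass(four_vectors):
    """Invariant mass of a system of (E, px, py, pz) four-vectors."""
    e = sum(v[0] for v in four_vectors)
    px = sum(v[1] for v in four_vectors)
    py = sum(v[2] for v in four_vectors)
    pz = sum(v[3] for v in four_vectors)
    m2 = e * e - px * px - py * py - pz * pz
    # Guard against small negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))


def mass_region(m_emubb):
    """Index 0..3 of the mass region containing m_emubb."""
    return bisect.bisect_right(REGION_EDGES, m_emubb)
```

The four input vectors would be those of the electron, the muon and the two b-tagged jets; the gap fraction is then evaluated separately for the events in each region.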

Conclusions
Studies of additional jet activity, using differential cross-section and gap fraction measurements, are presented for dileptonic $t\bar{t}$ events identified by the presence of an opposite-sign eμ pair and at least two b-tagged jets. These measurements are performed using 3.2 fb$^{-1}$ of $\sqrt{s}$ = 13 TeV pp collision data collected by the ATLAS detector in 2015 at the LHC. The measurements are corrected back to the particle level using full unfolding or correction factors, for well-defined fiducial regions and various $p_{\mathrm{T}}$ thresholds for the additional jets.
The different measurements are compared to various Monte Carlo predictions and give consistent results. Even though many predictions are within the uncertainty band of the measurements, the proper evaluation of the compatibility of the models, taking into account the bin-by-bin correlations within each measurement, revealed that Powheg+Pythia6 (RadHi), MG5_aMC@NLO+Herwig++ and Sherpa describe the data best for all observables. Powheg+Pythia6 (RadLo), MG5_aMC@NLO+Pythia8 and all predictions involving Herwig7 do not describe the data well.
The studied combinations of the matrix element generators MG5_aMC@NLO and Powheg with the shower generators Herwig++, Pythia6 and Pythia8 show no systematic trend indicating that one of the matrix element generators describes the data better for all parton shower generators. There is also no indication that one of the parton shower generators describes the data systematically better for both matrix element generators. This observation suggests that the matching between the parton shower and matrix element calculation plays an important role, and motivates further study in this area. The predictions of Sherpa, which use NLO matrix elements consistently matched with up to four additional jets at LO, show similarly good agreement with data as the best of the MG5_aMC@NLO and Powheg predictions.