Measurements of tt cross sections in association with b jets and inclusive jets and their ratio using dilepton final states in pp collisions at √s=13 TeV

The cross sections for the production of ttbb and ttjj events and their ratio σ ttbb /σ ttjj are measured using data corresponding to an integrated luminosity of 2.3 fb −1 collected in pp collisions at √ s = 13 TeV with the CMS detector at the LHC. Events with two leptons (e or μ) and at least four reconstructed jets, including at least two identified as b quark jets, in the final state are selected. In the full phase space, the measured ratio is 0.022 ± 0.003 (stat) ± 0.006 (syst), the cross section σ ttbb is 4.0 ± 0.6 (stat) ± 1.3 (syst) pb and σ ttjj is 184 ± 6 (stat) ± 33 (syst) pb. The measurements are compared with the standard model expectations obtained from a powheg simulation at next-to-leading-order interfaced with pythia.


Introduction
Since the discovery of the Higgs boson [1-3], its properties have been measured and compared to the standard model (SM) prediction [4][5][6][7][8][9]. However, the coupling of the top quark to the Higgs boson remains to be determined. Although it appears indirectly through loops in the gluon-gluon fusion production process and in the H → γ γ decay channel, a direct measurement has yet to be completed. One of the most promising channels for a direct measurement of the top quark Yukawa coupling in the SM is the production of the Higgs boson in association with a tt pair (ttH), where the Higgs boson decays to bb, thus leading to a ttbb final state. This final state, which has not been observed yet [10], has an irreducible nonresonant background from the production of a top quark pair in association with a b quark pair produced via gluon splitting (g → bb).
Calculations of the inclusive production cross section for tt events with additional jets have been performed to next-to-leading-order (NLO) precision for proton-proton centre-of-mass energies of 7, 8, and 13 TeV [11]. The dominant uncertainties in these calculations are from the choice of the factorization (μ F ) and renormalization (μ R ) scales [12,13], and are complicated by the presence of two very different scales in this process: the top quark mass and the jet transverse momentum (p T ). Therefore, experimental measurements of production cross sections pp → ttjj (σ ttjj ) and pp → ttbb (σ ttbb ) can provide an important test of NLO quantum chromodynamics (QCD) theory calculations and important input for describing the main background in the search for the ttH process. Previous cross section and ratio measurements at √ s = 7 and 8 TeV have been reported by the CMS [14,15] and ATLAS Collaborations [16].
In this Letter, the measurements of the cross sections σ ttbb and σ ttjj and their ratio are presented using a data sample of pp collisions collected at a centre-of-mass energy of 13 TeV at the CERN LHC by the CMS experiment, and corresponding to an integrated luminosity of 2.3 fb −1 [17]. Events are selected with the final state consisting of two leptons (e or μ) and at least four reconstructed jets, of which at least two are identified as b quark jets. The cross section ratio is measured with a smaller systematic uncertainty than the individual cross sections, owing to the partial cancellation of uncertainties common to both processes.

The CMS detector and event simulation
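The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity coverage provided by the barrel and endcap detectors. Muons are reconstructed in gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [18].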
The Monte Carlo (MC) simulated samples for the tt signal are generated by the powheg (v2) event generator [19][20][21] at NLO, interfaced with pythia (v8.205) [22,23] using the tune CUETP8M1 [24] to provide the showering of the partons and to match soft radiation with the contributions from the matrix elements (MEs). The NNPDF3.0 [25] set of the parton distribution functions (PDFs) is used. The MadGraph (v5.1.5.11) event generator [26] with MEs at leading order (LO), allowing up to three additional partons, including b quarks, and the MadGraph5_amc@nlo (v2.2.2) event generator [27] are both used for cross-checks and studies of systematic uncertainties. The tt samples are normalized to the next-to-next-to-leading-order (NNLO) cross section calculation [28]. The W+jets and Z/γ * +jets processes are simulated in MadGraph5_amc@nlo and are normalized to their NNLO cross sections [29]. The single top quark associated production with a W boson (pp → tW and pp → tW) is simulated in the five-flavour scheme in powheg (v1) at NLO and normalized to an approximate NNLO cross section calculation [30], while the t-channel single top quark events are simulated in the four-flavour scheme in MadGraph5_amc@nlo. The multijet production is modelled in pythia with LO MEs. The CMS detector response is simulated using Geant4 (v9.4) [31]. The events in simulation include the effects of additional interactions in the same or nearby bunch crossings (pileup) and are weighted according to the vertex distribution observed in data. The number of pileup interactions in data is estimated from the measured bunch-to-bunch instantaneous luminosity and the total inelastic cross section [32].

Definition of signal events
Measurements are reported for two different regions of the phase space: the visible and the full phase space. The result in the visible phase space is measured at the particle level, using the stable particles after hadronization, to reduce the possible theoretical and modelling uncertainties, while the result in the full phase space is provided to facilitate comparisons with NLO calculations or measurements in other decay modes.
To define the visible phase space, all ttbb final-state particles except the neutrinos, i.e. the charged leptons and jets originating from the decays of the top quarks, as well as the two additional b quark jets ("b jets"), are required to be within the experimentally accessible kinematic region. The leptons must have p T > 20 GeV, and |η| < 2.4. Electrons or muons originating from the leptonic decays of τ leptons produced in W → τ ν decays are included. The particle-level jets are obtained by combining all final-state particles, excluding neutrinos, at the generator level with an anti-k T clustering algorithm [33] with a distance parameter of 0.4 and are required to satisfy |η| < 2.5 and p T > 20 GeV. This threshold is lower than the minimum p T required of reconstructed jets so that, given the jet resolution, events passing the reconstructed jet p T requirement are contained in the visible phase space. Jets that are within ΔR = √[(Δφ)² + (Δη)²] = 0.5 units of an identified electron or muon are removed, where Δφ and Δη are the differences in azimuthal angle and pseudorapidity between the directions of the jet and the lepton. To identify the b and c quark jets ("c jets") unambiguously, the b and c hadron momenta are scaled down to a negligible value and included in the jet clustering (so-called "ghost matching") [34]. The b and c jets are then identified by the presence of the corresponding "ghost" hadrons among the jet constituents.
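The jet-lepton separation requirement can be illustrated with a minimal Python sketch. The dictionary layout of the jets and leptons and the function names are assumptions made for this illustration only and are not taken from the analysis software. The azimuthal difference is wrapped into (−π, π] before it enters the angular distance.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance: sqrt(delta_eta^2 + delta_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def remove_jets_near_leptons(jets, leptons, min_dr=0.5):
    """Keep only jets that are at least min_dr away from every lepton.

    jets and leptons are lists of dicts with 'eta' and 'phi' keys
    (a hypothetical layout chosen for this sketch).
    """
    return [
        jet for jet in jets
        if all(
            delta_r(jet["eta"], jet["phi"], lep["eta"], lep["phi"]) >= min_dr
            for lep in leptons
        )
    ]
```

The wrapping step matters for objects on either side of φ = ±π, which would otherwise appear to be far apart.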
Simulated events are categorized as coming from the ttjj process if they contain at least four particle-level jets, including at least two jets originating from b quarks, and two leptons (ttjj → bW⁺ bW⁻ jj → bℓ⁺ν bℓ⁻ν jj). The ttjj sample contains four components according to the number of b and c jets in addition to the two b jets required from the top quark decays. The four components are the ttbb final state with two b jets, the ttbj final state with one b jet and one lighter-flavour jet, the ttcc final state with two c jets, and the ttLF final state with two light-flavour jets (from a gluon or u, d, or s quark) or one light-flavour jet and one c jet. The ttbj final state mainly originates from the merging of two b jets or the loss of one of the b jets caused by the acceptance requirements.
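The four-way categorization can be summarized in a short Python sketch. The function name and the use of "two or more" for the additional-jet counts are assumptions for illustration. The text specifies only the cases with exactly two additional jets, so the precedence of b jets over c jets shown here is one plausible reading and not a statement of the analysis code.

```python
def classify_ttjj(n_additional_b, n_additional_c):
    """Assign a ttjj event to one of the four components.

    The counts refer to particle-level b and c jets in addition to the
    two b jets from the top quark decays. Treating ">= 2" like "2" and
    checking b jets before c jets are assumptions of this sketch.
    """
    if n_additional_b >= 2:
        return "ttbb"
    if n_additional_b == 1:
        return "ttbj"
    if n_additional_c >= 2:
        return "ttcc"
    # Two light-flavour jets, or one light-flavour jet and one c jet.
    return "ttLF"
```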

Event selection
The events are recorded at √ s = 13 TeV using a dilepton trigger [35] that requires the presence of two isolated leptons (e or μ) both with p T larger than 17 GeV. The particle-flow (PF) event algorithm [36,37] reconstructs and identifies each individual particle with an optimized combination of information from the various elements of the CMS detector. The energy of photons is directly obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The energy of muons is obtained from the curvature of the corresponding track reconstructed by combining information from the silicon tracker and the muon system [38]. The energy of charged hadrons is determined from a combination of their momenta measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for zero-suppression effects and for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energy.
The leptons and all charged hadrons that are associated with jets are required to originate from the primary vertex, defined as the vertex with the highest summed p T² of its associated tracks. Muon candidates are further required to have a high-quality fit including a minimum number of hits in both systems. Requirements on electron identification variables based on shower shape and track-cluster matching are further applied to the reconstructed electron candidates [39][40][41]. Muons and electrons must have p T > 20 GeV and |η| < 2.4.
To reduce the background contributions of muons or electrons from semileptonic heavy-flavour decays, relative isolation criteria are applied. The relative isolation parameter, I rel , is defined as the ratio of the summed p T of all objects in a cone of ΔR = 0.3 (ΔR = 0.4) units around the electron (muon) direction to the lepton p T . Different cone sizes for electron and muon are used to maximize the sensitivity. The objects considered are the charged hadrons associated with the primary vertex as well as the neutral hadrons and photons, whose energies are corrected to take into account pileup effects. Thus,
$$ I_{\mathrm{rel}} = \frac{\sum p_{\mathrm{T}}^{\text{charged hadron}} + \sum p_{\mathrm{T}}^{\text{neutral hadron}} + \sum p_{\mathrm{T}}^{\text{photon}}}{p_{\mathrm{T}}^{\text{lepton}}}. \qquad (1) $$
The muon candidates are required to have I rel < 0.15. For the electron candidates, different I rel thresholds (0.077 or 0.068) are applied depending on the pseudorapidity of the candidate (|η| < 1.48 or 1.48 ≤ |η| < 2.40). These thresholds are obtained from a multivariate analysis technique and result from the considerable differences in both the ECAL and the tracker in the two pseudorapidity regions. The efficiencies for the above lepton identification requirements are measured using Z boson candidates in data with a dilepton invariant mass between 70 and 130 GeV, and are compared with the values from the simulation. The differences between the two evaluations are applied as a correction to the simulation.
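The isolation requirement described above can be sketched in Python as follows. The function names and argument layout are hypothetical. The inputs are assumed to be the p T values of objects already selected inside the isolation cone and already corrected for pileup, since the details of that correction are not given here.

```python
def relative_isolation(lepton_pt, charged_pts, neutral_pts, photon_pts):
    """Summed pT of the objects in the isolation cone divided by the lepton pT."""
    if lepton_pt <= 0.0:
        raise ValueError("lepton pT must be positive")
    return (sum(charged_pts) + sum(neutral_pts) + sum(photon_pts)) / lepton_pt


def passes_isolation(flavour, abs_eta, i_rel):
    """Apply the thresholds quoted in the text.

    flavour is 'mu' or 'e' (labels chosen for this sketch).
    """
    if flavour == "mu":
        return i_rel < 0.15
    if flavour == "e":
        if abs_eta < 1.48:
            return i_rel < 0.077
        if abs_eta < 2.40:
            return i_rel < 0.068
        return False  # outside the electron acceptance
    raise ValueError("unknown lepton flavour: " + repr(flavour))
```

For example, a 40 GeV lepton with 2 GeV of summed activity in its cone has I rel = 0.05, which passes the muon requirement and both electron thresholds.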
The missing transverse momentum vector is defined as the projection on the plane perpendicular to the beams of the negative vector sum of the momenta of all reconstructed PF candidates in the event. Its magnitude is referred to as p miss T . In the same-flavour channels, remaining backgrounds from Z+jets processes are suppressed by demanding p miss T > 30 GeV. For the e ± μ ∓ channel, no p miss T requirement is applied.
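As an illustration of this definition, the following Python sketch computes the magnitude and direction of the negative transverse vector sum from a list of (p T, φ) pairs and applies the channel-dependent requirement. The input format and the channel labels are assumptions made for this example.

```python
import math


def missing_pt(candidates):
    """Return (magnitude, phi) of the negative transverse vector sum.

    candidates is an iterable of (pt, phi) pairs, one per reconstructed
    particle (a hypothetical input format).
    """
    candidates = list(candidates)
    px = -sum(pt * math.cos(phi) for pt, phi in candidates)
    py = -sum(pt * math.sin(phi) for pt, phi in candidates)
    return math.hypot(px, py), math.atan2(py, px)


def passes_met_requirement(channel, met, threshold=30.0):
    """Require met > threshold in same-flavour channels only.

    channel is one of 'ee', 'mumu', 'emu' (labels chosen for this sketch).
    """
    if channel == "emu":
        return True
    return met > threshold
```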
Jets are reconstructed using the same anti-k T clustering algorithm as particle-level jets in the simulations, with the PF candidates as input particles. The jet momentum is determined as the vectorial sum of all PF candidate momenta in the jet and is found from simulation to be within 5 to 10% of the true momentum over the whole p T spectrum and detector acceptance. An offset correction is applied to jet energies to take into account the contribution from pileup interactions. Jet energy corrections are derived from simulation and confirmed with in situ measurements of the energy balance in dijet and photon+jet events [42]. Additional selection criteria are applied to each event to remove spurious jet-like features originating from isolated noise patterns in certain HCAL regions. The event must contain at least four reconstructed jets with p T > 30 GeV and |η| < 2.4, of which at least two jets must be identified as b jets, using the combined secondary vertex (CSV) algorithm (v2), which combines secondary vertex information with lifetime information of single tracks to produce a b tagging discriminator [43]. A b tagging requirement on this discriminator is applied, which has an efficiency of about 60-70% for b jets and a misidentification probability of 1% for light-flavour jets and 15-20% for c-flavour jets [44].
Differences in the b tagging efficiencies between data and simulation [43] are accounted for by reweighting the shape of the CSV b tagging discriminator distribution in the simulation to match that in the data. Data/simulation p T - and η-dependent correction factors are derived separately for light- and heavy-flavour jets from the control samples described in Section 6.
The diboson, W+jets and multijet contributions are found to be negligible after the full event selection. The Z+jets background is estimated from data using control samples enriched in Z boson events. Table 1 gives the predicted number of events for each physics process and for each lepton category, as well as a comparison of the total number of events expected from the simulation and observed in data. Since the full event selection requires at least two b-tagged jets, a condition which is usually satisfied by tt events, only 5% of the events are from non-tt processes. The ttbj final state is predominantly composed of ttbb events where there is one lost b jet due to acceptance requirements (73% of ttbj events). The background contribution from tt events that fail the visible phase space requirements is labelled "tt others". The number of observed events with four or more reconstructed jets is lower than the prediction from the simulation, a condition that is also observed in the lepton+jets decay mode [45].
Table 1. Predicted number of events for each physics process and for each dilepton category, their total, and the observed number of events. Results are shown after the final event selection. The Z+jets normalization and uncertainty are calculated from data, while all other predictions and statistical uncertainties come from the simulated data samples. The tt sample for event categorization is from the powheg (v2) event generator interfaced with pythia (v8.205).

Cross section measurements
The first and the second jets in decreasing order of the b tagging discriminator usually (in 85% of ttjj events) correspond to the b jets from the decays of top quarks, and hence these jets provide no discriminating power between ttbb and ttjj events. The third and the fourth jets from ttjj events are mostly light-flavour jets, while these are heavy-flavour jets for ttbb events. The normalized 2D distributions of the discriminators from simulation for the third and the fourth jets are shown in Fig. 1. These 2D distributions are used to separate ttbb events from other processes. To extract the ratio of the number of ttbb events to ttjj events, a binned maximum-likelihood fit is performed on the 2D distribution of the CSV b tagging discriminators of the third and the fourth jets, where the three event categories e ± e ∓ , e ± μ ∓ , and μ ± μ ∓ are merged.
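The choice of fit observables can be illustrated with a small Python sketch that orders the jets of an event by decreasing b tagging discriminator and returns the values for the third and fourth jets. The function name and the plain list input are assumptions made for this example.

```python
def third_fourth_discriminators(discriminators):
    """Return the third- and fourth-highest b tagging discriminator values.

    discriminators holds one value per selected jet in the event;
    at least four jets are required, as in the event selection.
    """
    if len(discriminators) < 4:
        raise ValueError("at least four jets are required")
    ordered = sorted(discriminators, reverse=True)
    return ordered[2], ordered[3]
```

Each selected event then contributes one entry to the 2D distribution used in the fit.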
The number of ttjj events and the ratio of the numbers of ttbb events to ttjj events are free parameters in the fit. The ttcc and ttLF processes have similar 2D distributions so their contributions are combined based on the MC simulation.
The likelihood function is constructed as the product over all bins of a Poisson probability whose mean in each bin is the sum of the expected tt and background contributions. In this mean, F norm ttbb , F norm ttbj , and F norm ttLF+ttcc are the normalized expectations for each bin of ttbb, ttbj, and the combination of ttLF and ttcc, respectively. The parameter N ttjj denotes the number of the ttjj events from the fit. The quantity f tt others reflects the fraction of other tt processes in the ttjj sample as calculated in simulation (tt others divided by the sum of the ttjj components in Table 1).
The other backgrounds, such as ttV (V = W or Z) and single top quark processes, are fixed to the simulation expectations, while the Z+jets background is fixed to its estimation from control samples in data. This remaining background not from the tt process is labelled N bkg . The parameter R is the ratio of the number of ttbb events to the number of ttjj events, and R′ is the fraction of ttbj events at the reconstruction level, which is constrained through the ratio of the number of ttbj events to the number of ttbb events. This ratio is fixed to 2.43 as calculated from the MC simulation (powheg interfaced with pythia). The effect of this assumption is estimated as a systematic uncertainty in Section 6. Values for N ttjj of 950 ± 30 events and R of 0.056 ± 0.008 are obtained from the fit. The correlation coefficient between the two parameters is 0.002.
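A binned Poisson likelihood of this kind can be sketched in Python. The explicit form of the bin mean used below is an assumption: it is one parameterization consistent with the definitions in the text, with the ttbj fraction (written R′ in the sketch) tied to R through the fixed factor 2.43, and with separate normalized templates for "tt others" and for the non-tt background, whose names are hypothetical. It should not be read as the exact expression used in the analysis.

```python
import math


def expected_bin_means(n_ttjj, r, templates, f_tt_others, n_bkg, bj_over_bb=2.43):
    """Per-bin expectation under the assumed parameterization.

    templates maps a (hypothetical) process label to a list of normalized
    bin contents; all lists must have the same length.
    """
    r_prime = bj_over_bb * r  # ttbj fraction tied to the ttbb fraction
    means = []
    for bb, bj, lfcc, others, bkg in zip(
        templates["ttbb"],
        templates["ttbj"],
        templates["ttLF+ttcc"],
        templates["tt others"],
        templates["bkg"],
    ):
        ttjj_shape = r * bb + r_prime * bj + (1.0 - r - r_prime) * lfcc
        means.append(n_ttjj * (ttjj_shape + f_tt_others * others) + n_bkg * bkg)
    return means


def poisson_nll(observed, means):
    """Negative log-likelihood for independent Poisson-distributed bins."""
    nll = 0.0
    for n, mu in zip(observed, means):
        nll += mu - n * math.log(mu) + math.lgamma(n + 1.0)
    return nll
```

In a fit, N ttjj and R would be varied to minimize the value returned by poisson_nll for the observed 2D histogram, flattened into a list of bins.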
The result obtained for R is corrected to account for the different selection efficiencies for the two processes. The event selection efficiencies, defined as the number of ttbb and ttjj events after the full event selection divided by the number of events in the corresponding visible phase space, are 27% and 12%, respectively. For the ttbb process, there are at least four b jets in each event, so the requirement of at least two b-tagged jets is more easily fulfilled than for the ttjj process. Fig. 2 shows the comparisons of the b tagging discriminator distributions of the third and the fourth jets in the events from data and simulation, where the simulated histograms have been scaled to the fit result.
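The efficiency correction can be checked with simple arithmetic, sketched here in Python. Since each event count in the visible phase space is the reconstructed count divided by the corresponding efficiency, the corrected ratio is the fitted ratio multiplied by the ratio of efficiencies. The inputs are the rounded values quoted in the text, so only approximate agreement with the published visible-phase-space ratio should be expected.

```python
def efficiency_corrected_ratio(r_reco, eff_ttbb, eff_ttjj):
    """Correct the reconstruction-level ratio N(ttbb)/N(ttjj) for efficiencies.

    N_vis = N_reco / efficiency for each process, so the ratio of visible
    yields is r_reco * eff_ttjj / eff_ttbb.
    """
    return r_reco * eff_ttjj / eff_ttbb


# Rounded inputs from the text: R = 0.056, efficiencies of 27% and 12%.
r_visible = efficiency_corrected_ratio(0.056, 0.27, 0.12)
print(round(r_visible, 4))  # about 0.025 with these rounded inputs
```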
The b-tagged jet multiplicity distribution in Fig. 3 shows the comparison between data and the simulation after the requirement of at least four jets, together with the ratio of the number of data events to the expectation in the lower panel, where the simulated histograms have been scaled to the fit result.
The ttbb and ttjj cross sections in the visible phase space are calculated using the relationship σ visible = N/(εL), where L is the integrated luminosity, N is the number of events from the fit result, and ε is the efficiency for each process. For the purpose of comparing with the theoretical prediction and the measurements in the other decay modes, the cross sections in the full phase space are extrapolated from the cross sections in the visible phase space using the relation σ full = σ visible /A, where A is the acceptance, defined as the number of events in the corresponding visible phase space divided by the number of events in the full phase space. The acceptances are calculated based on the powheg simulation and are 2.2% and 2.0% for ttbb and ttjj, respectively, including the leptonic branching fraction of both W bosons [46].
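The two relations can be applied to the rounded numbers quoted in the text, as in the following Python sketch. The ttbb yield is taken as R times N ttjj, following the definition of R. Because the efficiencies and acceptances are quoted to two significant figures, the outputs are expected to land near, but not exactly on, the published full-phase-space values of 4.0 pb and 184 pb.

```python
LUMI_PB = 2.3 * 1000.0  # 2.3 fb^-1 expressed in pb^-1


def sigma_visible(n_events, efficiency, lumi_pb=LUMI_PB):
    """Visible cross section in pb: N / (efficiency * luminosity)."""
    return n_events / (efficiency * lumi_pb)


def sigma_full(sigma_vis, acceptance):
    """Full phase space cross section in pb: sigma_visible / acceptance."""
    return sigma_vis / acceptance


# Rounded inputs quoted in the text.
n_ttjj = 950.0
n_ttbb = 0.056 * n_ttjj  # from the definition of R

sigma_ttjj_vis = sigma_visible(n_ttjj, 0.12)
sigma_ttbb_vis = sigma_visible(n_ttbb, 0.27)
sigma_ttjj_full = sigma_full(sigma_ttjj_vis, 0.020)
sigma_ttbb_full = sigma_full(sigma_ttbb_vis, 0.022)

print(round(sigma_ttbb_full, 1), round(sigma_ttjj_full))
```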

Estimation of systematic uncertainties
The systematic uncertainties are determined separately for the ttbb and ttjj cross sections, and their ratio. In the ratio, many systematic effects cancel, specifically normalization uncertainties, such as the ones related to the measurement of the integrated luminosity and the lepton identification, including trigger efficiencies, since they are common to both processes. The various systematic uncertainties in the measured values are shown in Table 2 for the visible phase space.
The systematic uncertainties associated with the b tagging efficiency for heavy- and light-flavour jets are studied separately, varying their values within the corresponding uncertainties. The b-flavour correction factors are obtained using tt enriched events by tagging one b jet and probing the other b jet. Their dominant uncertainty comes from the contamination when one of the b jets is not reconstructed [47] (indicated as "b quark flavour" in Table 2). The light-flavour jet correction factors are determined from Z+jets enriched events with at least two jets (indicated as "light flavour" in Table 2). The uncertainty arises because in this control sample of Z+jets, the contamination from the Z+bb process is not well modelled. The correction factor for c jets is not measured, owing to the limited amount of data, and is assumed to be unity with an uncertainty twice as large as for b jets [43] (indicated as "c quark flavour" in Table 2). In the correction factor evaluation, the statistical uncertainty, which can arise owing to low event yields in certain regions, e.g. at values of the b tagging discriminator near one, is also taken into account.
The b tagging discriminator can also be affected by the jet energy scale (JES) variations [42] since the efficiency correction changes through its p T dependence. The corresponding systematic uncertainty is obtained by varying the JES correction within its uncertainty and repeating the whole analysis. The uncertainty induced by the jet energy resolution (JER) is assessed by smearing the jet energy resolution in simulation by an additional uncertainty dependent on η of about 10% [42].
The ratio of ttbb events with respect to ttbj events is based on the powheg MC simulation. The uncertainty arising from this rate is evaluated by comparing the reference value (powheg) with that of a MadGraph5_amc@nlo sample, and powheg samples with different μ F and μ R scales in the ME and parton shower (PS) calculations.
The contributions from Z+jets and single top quark processes are small, and the 2D b tagging discriminator distributions from these backgrounds are similar to those of the ttLF component. Therefore, these backgrounds do not affect the measurement significantly. The uncertainty caused by mismodelling of these backgrounds is assessed by varying the contribution to cover the uncertainty in the single top quark production cross section (indicated as "Background modelling" in Table 2). An uncertainty to account for the modelling of the ttcc fraction by simulations is also assigned by varying the contribution by 50% in the fit. This is derived from the theoretical uncertainty on the ttjj cross section. For the efficiency of ttjj events, the uncertainty owing to the heavy-flavour fraction is negligible because of their small fraction. The systematic uncertainty in the lepton identification is calculated by varying the correction factor for the efficiency within its uncertainty, as derived from Z boson candidates as a function of lepton η and p T , and also taking into account the different phase space between Z boson and tt events.
Table 2 row: Total uncertainty 34 19 28
The systematic uncertainty in the number of pileup events is estimated by varying the total inelastic cross section by 5% to cover all of the uncertainties in the modelling of the pileup [32].
The dependence of the correction factor at the particle level on the assumptions made in the MC simulation is another source of systematic uncertainty: the generators powheg and MadGraph5_amc@nlo are compared and the difference in the efficiency is taken as systematic uncertainty. The uncertainties from the μ F and μ R scales at the ME level are estimated by making use of a weighting scheme implemented in powheg to vary the scales by a factor of two up and down with respect to their reference values μ F = μ R = √(m t² + p T,t²), where m t is 172.5 GeV, with p T,t being the top quark transverse momentum. The uncertainties from the μ F and μ R scales at the PS level are assessed by using additional simulations where the scales are changed by a factor of two up and down relative to their reference values. In simulation, event weights are calculated that represent the usage of the uncertainty eigenvector sets of the PDF. The uncertainties in the PDFs are accounted for by using these various event weights. The uncertainty from the modelling of jet multiplicity, in particular, the mismodelling for events with more than five jets, is also taken into account. It is estimated to be 5% by comparing the rates of high-multiplicity events in data and simulation.
Because the size of the MC sample used for the ttbb simulation is limited, the uncertainty from the statistical fluctuations in the simulated event samples is assessed by repeating the fit with the method described in Ref. [48]. The difference of 1.5% in the result is accounted for in the systematic uncertainty.
In addition to the theoretical and modelling uncertainties described above, the uncertainty coming from the modelling of the top quark p T distribution in the ME calculations is taken into account. The uncertainty is calculated by taking the difference in shape between the parton-level p T spectrum from the ME generator and the unfolded p T spectrum from the data [49]. The uncertainty due to the top quark p T modelling is negligible in this analysis, as shown in Table 2.
Adding all these contributions in quadrature gives a total systematic uncertainty of 28% in the cross section ratio, with the dominant contributions coming from the b tagging efficiency and the misidentification of light- and c-flavoured partons, followed by the matching scale systematic uncertainties.
The uncertainty in σ ttjj is significantly smaller than that in σ ttbb since the measurement of the latter requires the identification of multiple b jets. The uncertainty in σ ttbb is larger than that for the cross section ratio, since the uncertainties that are common between ttbb and ttjj, such as the JES uncertainty, partially or completely cancel in the ratio.
When extrapolating the measurements from the visible phase space to the full phase space, the systematic uncertainty in the acceptance is included. The effect of the MC modelling of the acceptance is estimated by comparing the results between MadGraph5_amc@nlo and powheg. This uncertainty amounts to 4% for each of the cross section measurements and 1% for the cross section ratio.
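The effect of including the acceptance uncertainty can be illustrated with a short Python sketch. Two assumptions are made here: that the acceptance uncertainty is combined in quadrature with the visible-phase-space total, and that the Table 2 totals of 34, 19, and 28 (read as percentages) correspond to σ ttbb , σ ttjj , and their ratio, in that order. The 28% for the ratio is stated in the text; the assignment of the other two values follows from the remark that the uncertainty in σ ttjj is the smaller one.

```python
import math


def add_in_quadrature(*components):
    """Combine independent relative uncertainties in quadrature."""
    return math.sqrt(sum(c * c for c in components))


# Relative uncertainties in percent: (assumed visible total, acceptance).
inputs = {
    "sigma_ttbb": (34.0, 4.0),
    "sigma_ttjj": (19.0, 4.0),
    "ratio": (28.0, 1.0),
}
full_phase_space = {
    name: add_in_quadrature(visible, acceptance)
    for name, (visible, acceptance) in inputs.items()
}
```

With these inputs the acceptance term changes the totals only slightly, because it is small compared with the visible-phase-space uncertainties.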

Results
After accounting for all corrections and systematic effects, the cross section ratio σ ttbb /σ ttjj is measured in the visible phase space from a fit to the measured CSV b tagging discriminator distributions. The measured cross section ratio in the visible phase space for events with particle-level jets is
$$ (\sigma_{\mathrm{ttbb}}/\sigma_{\mathrm{ttjj}})^{\mathrm{vis}} = 0.024 \pm 0.003\,\text{(stat)} \pm 0.007\,\text{(syst)}. \qquad (3) $$
The result is obtained in the visible phase space, defined as events having two leptons with p T > 20 GeV and |η| < 2.4, plus at least four jets, including at least two b jets with p T > 20 GeV and |η| < 2.5. The cross section ratio in the full phase space that uses the acceptance correction described in Section 5 is
$$ \sigma_{\mathrm{ttbb}}/\sigma_{\mathrm{ttjj}} = 0.022 \pm 0.003\,\text{(stat)} \pm 0.006\,\text{(syst)}. \qquad (4) $$
The predicted values from powheg are 0.014 ± 0.001 and 0.012 ± 0.001 for the visible and full phase space, respectively, where the uncertainty in the simulation is the sum in quadrature of the statistical and the μ F /μ R scale systematic uncertainties. The prediction obtained from powheg simulation (interfaced with pythia) underpredicts the measured cross section ratio by a factor of 1.8, but it is compatible with the observation within two standard deviations. The measured cross sections in the visible and the full phase space are presented in Table 3.
Table 3. The measured cross sections σ ttbb and σ ttjj and their ratio for the visible and the full phase space, corrected for acceptance and branching fractions. The uncertainties on the measurements show separately the statistical and systematic components, while those are combined for the powheg predictions.
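The level of agreement between measurement and prediction can be estimated with a short Python sketch. It assumes that the statistical, systematic, and prediction uncertainties are uncorrelated and Gaussian, so that they may be added in quadrature; this is a simplification made for illustration and not necessarily the procedure behind the statement in the text.

```python
import math


def pull(measured, stat, syst, predicted, predicted_unc):
    """Difference between measurement and prediction in units of the
    combined uncertainty (quadrature sum, assuming no correlations)."""
    total = math.sqrt(stat ** 2 + syst ** 2 + predicted_unc ** 2)
    return (measured - predicted) / total


pull_visible = pull(0.024, 0.003, 0.007, 0.014, 0.001)
pull_full = pull(0.022, 0.003, 0.006, 0.012, 0.001)
factor_full = 0.022 / 0.012

print(round(pull_visible, 1), round(pull_full, 1), round(factor_full, 1))
```

Under these assumptions both differences come out below two standard deviations, and the full-phase-space measurement exceeds the prediction by a factor of about 1.8.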

Summary
Measurements of the cross sections σ ttbb and σ ttjj and their ratio σ ttbb /σ ttjj are presented using a data sample recorded in pp collisions at √ s = 13 TeV, corresponding to an integrated luminosity of 2.3 fb −1 . The cross section ratio has been measured in a visible phase space region using the dilepton decay mode of tt events and corrected to the particle level, corresponding to the detector acceptance. The measured cross section ratios in the visible and the full phase space are σ ttbb /σ ttjj = 0.024 ± 0.003 (stat) ± 0.007 (syst) and σ ttbb /σ ttjj = 0.022 ± 0.003 (stat) ± 0.006 (syst), respectively, where a minimum transverse momentum for the particle-level jets of 20 GeV is required. The ttH contribution, being negligible, is not removed from data. Theoretical ratios predicted from the powheg simulation (interfaced with pythia) are 0.014 ± 0.001 for the visible and 0.012 ± 0.001 for the full phase space, which are lower than the measured values but consistent within two standard deviations. The individual cross sections σ ttbb = 4.0 ± 0.6 (stat) ± 1.3 (syst) pb and σ ttjj = 184 ± 6 (stat) ± 33 (syst) pb have also been measured. These results, in particular the ratio of the cross sections, provide important information for the ttH search, permitting the reduction of a dominant systematic uncertainty that derives from the uncertainty in the ttbb background. They can also be used as a figure of merit for testing the validity of next-to-leading-order QCD calculations at √ s = 13 TeV.

Acknowledgements
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMWFW and FWF (Aus