Measurements of the Cross-Section for the $t\bar{t}$ + Heavy-Flavor Production at the LHC

At the LHC, the production of a top quark pair in association with a Higgs boson that decays into bottom or charm quarks, $t\bar{t}H$, allows for an empirical exploration of the Yukawa couplings of the heavy-flavor quarks to the Higgs boson. Accordingly, the cross-sections for $t\bar{t}$ + heavy-flavor production without a Higgs boson have been measured at the LHC in various phase spaces using data samples collected in pp collisions at $\sqrt{s}$ = 7, 8 and 13 TeV with the ATLAS and CMS experiments. Ratios of the $t\bar{t}$ + heavy-flavor cross-sections to the $t\bar{t}$ + additional jets cross-section have also been measured. In this paper, the measured cross-sections and ratios are reviewed and the prospects with more data are presented.


Introduction
Decades of theoretical and experimental exploration of the most elementary particles and their properties have yielded a detailed description of the fundamental interactions, captured in a quantum field theory known as the Standard Model of particle physics. The success of this model in describing observations over many orders of magnitude in interaction energy cannot be overstated. However, despite this profound understanding, the field faces several open problems and mysteries. Some are related to cosmological observations, such as dark matter and the predominance of matter over antimatter in the universe; others concern the mathematical consistency of the model itself with respect to even the smallest variations in its parameters. Several puzzling features are related to the flavor structure of the Standard Model, not least in the heavy-flavor sector. Through accurate measurements, we search for cracks in the model where theoretical predictions may not match experimental observations. Such discoveries may open new avenues to address these open problems, either within the realm of quantum field theory or by questioning the basic principles underlying this mathematical framework.
After the discovery of the Higgs (H) boson in 2012, testing its consistency with the standard model became one of the highest priorities at the Large Hadron Collider (LHC), especially in the heavy-flavor sector. Analyses of LHC proton collision data have established the couplings of the top and bottom quarks (third-generation quarks) to the H boson in different processes [1,2]. However, confirming that both couplings are simultaneously consistent with the predictions is possible at the LHC only by measuring the unique process of H boson production in association with a top quark pair (ttH), where the H boson decays to a pair of bottom (b) quarks, leading to a ttbb final state. This decay channel has the largest branching fraction of the H boson, yet ttH production in this channel alone has yet to be observed in data. Understanding the ttbb process in proton-proton collisions without an H boson is a prerequisite for such an observation. In addition, charm (c) jets in the ttcc process can be misidentified as b jets, making ttcc a background analogous to the ttbb process. Measurements of the tt + heavy-flavor (tt + HF) cross-sections at the LHC are therefore essential, yet challenging, objectives.
Calculations of the inclusive production cross-section for top quark pairs with additional jets, matching matrix element generators to parton showers, have been performed at next-to-leading-order (NLO) precision in quantum chromodynamics (QCD) [3][4][5][6][7]. Theoretical QCD calculations of the ttbb process are available at NLO [8][9][10][11][12][13][14][15], but they suffer from large factorization- and renormalization-scale uncertainties due to the presence of two very different scales in this process. Precise measurements can therefore also provide a stringent test of NLO QCD itself. Full NLO QCD corrections to off-shell ttbb production are available in Refs. [16,17]. Calculations of ttbb with massive b quarks use parton distribution functions (PDFs) of the proton in the four-flavor scheme (4FS), where b quarks are not part of the proton PDF. These matrix-element-level predictions of ttbb with massive b quarks have been matched to parton showers [18][19][20]. In addition, predictions for ttbb production in association with one additional jet are available [21].
The cross-sections for tt + HF production have been measured in various phase spaces using data samples collected in pp collisions by the ATLAS [22] and CMS [23] experiments at $\sqrt{s}$ = 7, 8 and 13 TeV [24][25][26][27][28][29][30][31][32][33]. To obtain measurable cross-sections, kinematic thresholds must be applied to the additional heavy-flavor jets. The interplay between the b jets from the top quark decays and the additional heavy-flavor jets is not trivial; accordingly, defining the signal is challenging. The definitions differ between measurements and between experiments, and are discussed in more detail in Section 2. To achieve higher precision, ratios of the tt + HF cross-sections to the tt + additional jets cross-section are also measured. The ratio measurement was originally motivated by the expectation that many kinematic distributions are similar for ttbb, ttcc and ttjj, so that the systematic uncertainties are reduced in the ratio.
Most measurements focus on the ttbb cross-section. The ttcc process has been explored less because the experimental signature of a c jet lies between that of b jets and that of light-quark and gluon jets. With the recent development of charm jet taggers, the ttbb and ttcc processes can be distinguished more efficiently, and the ttcc cross-section has now been measured by CMS [32].
In this experimental review, we summarize the results for the inclusive and differential cross-section measurements of tt + HF production at the LHC submitted to journals or available to the public before May 2023.

Definition of the tt + Heavy-Flavor Signal
The ttbb and ttcc cross-sections are measured both in the visible and in the full phase space. Particle-level cross-sections in the visible phase space have reduced theoretical and modeling uncertainties, whereas measurements in the full phase space facilitate comparisons with theoretical calculations and with measurements obtained in other decay modes. Examples of Feynman diagrams for the ttbb and ttcc processes are shown in Figure 1. Final-state particles are defined in Section 2.1 and the processes in Section 2.2.
Figure 1. An example of the Feynman diagram of the ttbb and ttcc processes at the LHC in the lepton + jets channel.

Particle-Level Object Definition
In the definition of the visible phase space, all generated objects such as leptons and jets are required to be within the experimentally accessible kinematic region. In ATLAS, the objects are defined at the particle level, based on the stable particles after hadronization, to reduce the dependence on generator-level information. Electrons and muons that do not emerge from hadron decays are considered. Furthermore, to complete the particle-level definition, the four-momenta of any final-state photons radiated within a cone of $\Delta R = 0.1$ around a charged lepton are added to the four-momentum of that lepton. In CMS, electrons and muons are required to originate from a W boson at the generator level; electrons or muons from the leptonic decays of τ leptons produced in W → τν decays are included. CMS does not add final-state photon radiation to the leptons, except in the latest result in the lepton + jets channel [33], where the final-state photons are added to the lepton at the particle level. Particle-level jets are defined by clustering all stable particles except neutrinos with the anti-$k_t$ algorithm, with a distance parameter of 0.4 at a center-of-mass energy of $\sqrt{s}$ = 13 TeV and 0.5 at $\sqrt{s}$ = 7 and 8 TeV. To identify particle-level b and c jets, a so-called ghost matching is performed: b and c hadrons are included in the jet clustering after their momenta are scaled to negligible values while their directions are preserved. The b and c jets are then identified by the presence of the corresponding "ghost" hadrons among the jet constituents. The particle-level jet definition is the same in ATLAS and CMS. However, the definitions of the tt + HF processes differ subtly between channels and experiments, as discussed in the following section.
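The ghost-matching step can be illustrated with a minimal pure-Python sketch, assuming four-momenta given as (E, px, py, pz) tuples in GeV. It runs a naive anti-$k_t$ clustering with four-momentum (E-scheme) recombination, adds b and c hadrons as ghosts whose momenta are scaled by a tiny factor, and labels each jet by the ghosts it contains, preferring b over c. The scale factor of 1e-18, the 20 GeV jet threshold, the function names and the toy particles are all hypothetical; production code would typically use FastJet instead of this O(n³) loop.

```python
import math

GHOST_SCALE = 1e-18  # hypothetical factor that makes ghost momenta negligible


def from_pt_y_phi(pt, y, phi, m=0.0):
    """Build (E, px, py, pz) from transverse momentum, rapidity, azimuth and mass."""
    mt = math.hypot(pt, m)
    return (mt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), mt * math.sinh(y))


def pt_y_phi(p):
    """Return (pt, rapidity, phi) of a four-momentum (E, px, py, pz)."""
    e, px, py, pz = p
    return math.hypot(px, py), 0.5 * math.log((e + pz) / (e - pz)), math.atan2(py, px)


def delta_r2(a, b):
    _, ya, pa = pt_y_phi(a)
    _, yb, pb = pt_y_phi(b)
    dphi = (pa - pb + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
    return (ya - yb) ** 2 + dphi ** 2


def anti_kt(inputs, radius):
    """Naive anti-kt clustering; inputs are (four_momentum, labels) pairs."""
    objs = [(p, list(labels)) for p, labels in inputs]
    jets = []
    while objs:
        best = None  # (distance, i, j); j is None for the beam distance d_iB
        for i, (pi, _) in enumerate(objs):
            kti2 = pt_y_phi(pi)[0] ** -2
            if best is None or kti2 < best[0]:
                best = (kti2, i, None)
            for j in range(i + 1, len(objs)):
                ktj2 = pt_y_phi(objs[j][0])[0] ** -2
                dij = min(kti2, ktj2) * delta_r2(pi, objs[j][0]) / radius ** 2
                if dij < best[0]:
                    best = (dij, i, j)
        _, i, j = best
        if j is None:
            jets.append(objs.pop(i))  # no closer partner: promote to a final jet
        else:
            pj, lj = objs.pop(j)  # pop the larger index first so i stays valid
            pi, li = objs.pop(i)
            objs.append((tuple(a + b for a, b in zip(pi, pj)), li + lj))  # E-scheme
    return jets


def ghost_labelled_jets(stable, hadrons, radius=0.4, pt_min=20.0):
    """stable: four-momenta of stable non-neutrino particles;
    hadrons: (four_momentum, 'b' or 'c') pairs that are added as ghosts."""
    inputs = [(p, []) for p in stable]
    inputs += [(tuple(GHOST_SCALE * c for c in p), [flav]) for p, flav in hadrons]
    result = []
    for p, labels in anti_kt(inputs, radius):
        if pt_y_phi(p)[0] < pt_min:
            continue  # also removes "jets" made only of ghosts
        flav = "b" if "b" in labels else "c" if "c" in labels else "light"
        result.append((p, flav))
    return result


stable = [from_pt_y_phi(50.0, 0.0, 0.0), from_pt_y_phi(30.0, 0.1, 0.1),
          from_pt_y_phi(40.0, 1.0, 2.5), from_pt_y_phi(25.0, -1.5, -2.0)]
hadrons = [(from_pt_y_phi(20.0, 0.05, 0.02, 5.3), "b"),  # B hadron near the first particle
           (from_pt_y_phi(15.0, 1.05, 2.45, 1.9), "c")]  # D hadron near the third particle
for p, flav in ghost_labelled_jets(stable, hadrons):
    print(f"pT = {pt_y_phi(p)[0]:.1f} GeV, flavour = {flav}")
```

Because the ghosts carry only their direction into the clustering, the jet four-momenta are unchanged to numerical precision, while the ghost labels record which jet each hadron ended up in.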

Process Definition
In ATLAS, ttb (ttbb) is defined by the presence of at least three (four) particle-level b jets. Events with only three b jets can arise when one of the b jets is outside the acceptance or when two b jets are merged. For ttc, in the dilepton channel, events must contain fewer than three particle-level b jets and at least one c jet, while in the lepton + jets channel they must contain at least two c jets, since events with exactly one c jet can come from $W \to c\bar{s}\,(\bar{c}s)$ decays. Events with additional jets that meet neither of these criteria are classified as ttl. In ATLAS, only measurements in the visible phase space are available and the origin of the heavy-flavor jets is not identified; instead, the two b jets with the smallest $\Delta R$ separation or with the highest transverse momenta are selected. In CMS, for the visible phase space, an event belongs to the ttjj process if it contains at least four particle-level jets, including two b jets, and the same number of leptons as required at the reconstruction level. The ttbb process is defined by the presence of at least four b jets regardless of their origin in the dilepton, lepton + jets and hadronic channels; this is called the "parton-independent" definition. Additionally, in the hadronic channel, a "parton-based" definition is introduced, requiring two b jets originating from the top quarks and two additional b jets. For the ttbl process, the event must contain exactly one additional b jet and at least one additional light-flavor or c jet. The ttll process corresponds to events with no additional b or c jets but at least two additional light-flavor jets within the acceptance. In the ttcc measurements, the ttcc process is defined by the presence of at least two b jets and at least two c jets.
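The ATLAS categorization described above can be summarized as a small decision function. This is a hypothetical sketch that works on particle-level b and c jet counts after acceptance cuts; giving the b jet requirement precedence over the c jet requirement in both channels is an assumption consistent with the dilepton definition, and the function names and event counts are invented for illustration.

```python
from collections import Counter


def atlas_category(n_b, n_c, channel):
    """Classify a tt event from its particle-level b and c jet counts.

    ttb requires >= 3 b jets (ttbb, the subset with >= 4 b jets, is tested
    separately by is_ttbb). Otherwise the c jet requirement depends on the
    channel: >= 1 c jet in the dilepton channel and >= 2 in lepton + jets,
    because a single c jet there may come from a W -> cs decay.
    """
    if n_b >= 3:
        return "ttb"
    min_c = {"dilepton": 1, "lepton+jets": 2}[channel]
    return "ttc" if n_c >= min_c else "ttl"


def is_ttbb(n_b):
    return n_b >= 4


events = [(4, 0), (3, 1), (2, 1), (2, 2), (2, 0)]  # hypothetical (n_b, n_c) per event
for channel in ("dilepton", "lepton+jets"):
    print(channel, Counter(atlas_category(b, c, channel) for b, c in events))
```

The same event with two b jets and one c jet is a ttc event in the dilepton channel but a ttl event in the lepton + jets channel, which is exactly the channel dependence described in the text.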
The cross-sections are measured in the visible phase space to reduce the systematic uncertainties arising from the theory dependence of the acceptance. For the full phase space measurements performed by CMS, the additional b jets are required at the generator level not to originate from the weak decay of the tt system, with no further requirement on the decay products of the top quarks. The measurements can therefore be compared across different channels as well as with theoretical predictions. The full phase space cross-sections are obtained by correcting for the acceptance, which can only be calculated from simulation and thus introduces an additional systematic uncertainty. The definitions of the signal phase space in ATLAS and CMS are summarized in Table 1. Because of the different phase space definitions, the measured cross-sections cannot be compared directly between ATLAS and CMS. The results from the two experiments are reviewed in the following sections.

Monte Carlo Simulation
The tt + HF signal events were simulated in ATLAS and CMS using various Monte Carlo (MC) samples, which also serve as theoretical predictions. These samples are summarized in this section.
The nominal tt sample at $\sqrt{s}$ = 13 TeV was generated at NLO using the POWHEG generator [34][35][36], with the parton shower, fragmentation and underlying event simulated using PYTHIA 8.210 [37]. This sample is referred to as POWHEG+PYTHIA 8 in the following. At $\sqrt{s}$ = 8 TeV, the events generated with POWHEG were interfaced with PYTHIA 6 [38]. In CMS, the MADGRAPH [39] generator was also used for the nominal tt sample at $\sqrt{s}$ = 8 TeV. Alternative tt samples were generated to assess the uncertainty due to the choice of the QCD MC model and to compare with the unfolded data. Two samples were generated using POWHEG+PYTHIA 8 with different renormalization and factorization scales. To estimate the effect of the choice of parton shower and hadronization algorithms, a tt sample was generated by interfacing POWHEG with HERWIG 7 [40,41] (referred to as POWHEG+HERWIG 7 or POWHEG+HERWIG++ in this paper). The tt events were also generated with the SHERPA 2.2.1 generator [42], which models the processes with zero and one additional parton at NLO accuracy and with up to four additional partons at LO accuracy. In addition, a tt sample was generated using MADGRAPH5_aMC@NLO [5] interfaced to PYTHIA 8. In CMS, MADGRAPH5_aMC@NLO was also matched to HERWIG 6 and PYTHIA 6 at $\sqrt{s}$ = 8 TeV. All tt samples are normalized to a cross-section calculated at next-to-next-to-leading order (NNLO) [43,44].
A dedicated sample of ttbb events was generated using SHERPA+OPENLOOPS [18]. The ttbb matrix elements were calculated at NLO with massive b quarks, using the COMIX [45] and OPENLOOPS [46] matrix element generators, and merged with the SHERPA parton shower tuned by the authors [47]. This sample is referred to as SHERPA 2.2 ttbb (4FS). A sample of ttbb events was generated using the POWHEL generator [15], with matrix elements calculated at NLO with massless b quarks and matched to PYTHIA 8; this sample is referred to as POWHEL+PYTHIA 8 ttbb (5FS). The corresponding POWHEL sample with massive b quarks matched to PYTHIA 8 is referred to as POWHEL+PYTHIA 8 ttbb (4FS). Another sample of ttbb events was generated using the POWHEG generator, with ttbb matrix elements calculated at NLO with massive b quarks and matched to PYTHIA 8. This sample is referred to as POWHEG+PYTHIA 8 ttbb (4FS) to distinguish it from the nominal tt sample described above.

Portfolio of Cross-Section Measurements
In ATLAS, the ttb and ttbb inclusive and differential cross-sections were measured using proton-proton collision data corresponding to integrated luminosities of 4.7 fb$^{-1}$ at a center-of-mass energy of 7 TeV [24] and 20.3 fb$^{-1}$ at $\sqrt{s}$ = 8 TeV [25]. At $\sqrt{s}$ = 13 TeV, the cross-sections were measured in the eµ and lepton + jets channels using data corresponding to an integrated luminosity of 36.1 fb$^{-1}$ [26].
In CMS, the ttbb inclusive and differential cross-sections were measured using data collected at $\sqrt{s}$ = 8 TeV [28]. The inclusive ttbb cross-section was also measured in the dilepton channel using early data corresponding to an integrated luminosity of 2.3 fb$^{-1}$ at $\sqrt{s}$ = 13 TeV [29]. This inclusive analysis was updated in the dilepton channel and extended to the lepton + jets channel with data corresponding to an integrated luminosity of 35.9 fb$^{-1}$ [30]. The ttbb process was measured in the hadronic channel using data corresponding to an integrated luminosity of 35.9 fb$^{-1}$ at $\sqrt{s}$ = 13 TeV [31]. Measurements of ttcc production are also available at $\sqrt{s}$ = 13 TeV [32]. Recently, the inclusive and differential cross-section measurements in the lepton + jets channel were updated with the full Run 2 data set, corresponding to an integrated luminosity of 138 fb$^{-1}$ [33]. It is worth noting that, except for this latest lepton + jets measurement, only a small fraction of the available data has been used for the $\sigma_{ttbb}$ and $\sigma_{ttcc}$ measurements.

Inclusive Cross-Section Measurement
In ATLAS, at $\sqrt{s}$ = 13 TeV, the cross-sections were measured in the eµ channel in the visible phase space with at least three b jets and in the lepton + jets channel in the visible phase space with at least four b jets. In both channels, the number of tt + HF events is extracted with a binned maximum likelihood fit to observables that discriminate between signal and background. A combined template is created from the sum of all backgrounds. Three templates of ttb, ttc and ttl events are built from the tt, tt in association with a vector boson (ttV) and ttH simulations, since all of these samples contain the signal process. In the eµ channel, the ttc and ttl templates are merged and the fit is performed to the distribution of the third-highest b-tagging discriminant among the reconstructed jets in the event. The scale factors obtained from the fit are 1.33 ± 0.06 for the number of ttb events and 1.05 ± 0.04 for the number of combined ttc + ttl events. In the lepton + jets channel, all three templates are fitted to the two-dimensional distribution of the third- and fourth-highest b-tagging discriminants. The best-fit values are 1.11 ± 0.2 for the number of ttb events, 1.59 ± 0.06 for ttc events and 0.962 ± 0.003 for ttl events. The measured ttb cross-sections in the two channels are compatible with each other.
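The template fit can be sketched as a binned Poisson maximum-likelihood fit in which only the normalization scale factors of the signal templates float. The sketch assumes a fixed background template and no systematic nuisance parameters (which the real analyses include), and maximizes the likelihood with the standard expectation-maximization (EM) multiplicative update rather than a general-purpose minimizer. All template contents are invented; the pseudo-data are built with the eµ scale factors quoted above so that the fit has a known answer.

```python
def fit_scale_factors(data, background, templates, n_iter=5000):
    """Maximise prod_i Pois(n_i | B_i + sum_k mu_k * T_k[i]) over mu_k >= 0.

    data, background: per-bin counts; templates: one list of per-bin
    predictions per floating component. Every bin must have a positive
    total prediction. Uses the EM update
        mu_k <- mu_k * sum_i (n_i * T_k[i] / lambda_i) / sum_i T_k[i],
    which never decreases the likelihood.
    """
    mu = [1.0] * len(templates)
    for _ in range(n_iter):
        lam = [b + sum(m * t[i] for m, t in zip(mu, templates))
               for i, b in enumerate(background)]
        mu = [m * sum(n * t[i] / lam[i] for i, n in enumerate(data)) / sum(t)
              for m, t in zip(mu, templates)]
    return mu


# Hypothetical templates binned in the third-highest b-tagging discriminant.
ttb = [1.0, 2.0, 5.0, 10.0]    # ttb events peak at high discriminant values
ttcl = [10.0, 6.0, 3.0, 1.0]   # merged ttc + ttl events peak at low values
bkg = [2.0, 2.0, 2.0, 2.0]
# Pseudo-data without fluctuations, built with scale factors 1.33 and 1.05.
data = [b + 1.33 * x + 1.05 * y for b, x, y in zip(bkg, ttb, ttcl)]
print([round(m, 3) for m in fit_scale_factors(data, bkg, [ttb, ttcl])])
```

Because the two templates have different shapes in the discriminant, the fit can separate them even though both contain tt events; nearly identical shapes would make the scale factors strongly anticorrelated.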
To facilitate comparison with the theoretical ttbb cross-section, the ttH and ttV contributions are subtracted from the measured cross-section. The measured inclusive cross-sections are shown in Figure 2. All inclusive cross-sections measured at $\sqrt{s}$ = 13 TeV in the visible phase space by the ATLAS experiment are summarized in Table 2. The measurement in the phase space with at least three b jets in the eµ channel, with an uncertainty of 13%, is the most precise. The uncertainties are dominated by systematic effects, mainly from the tt modeling and b tagging, as well as the jet energy scale.
Table 2. Inclusive cross-sections measured by ATLAS at $\sqrt{s}$ = 13 TeV in the visible phase space, compared with predictions [26].
Channel          Phase space    Measured (fb, ± stat ± syst)    Predicted (fb)
eµ               ≥3b            177 ± 5 ± 24                    103 ± 30
eµ               ≥4b            25 ± 3 ± 7                      17.3 ± 4.2
lepton + jets    ≥5j, ≥3b       2370 ± 40 ± 690                 1600 ± 530
lepton + jets    ≥6j, ≥4b       331 ± 11 ± 61                   270 ± 70
The ratio of the ttbb to ttjj cross-sections has also been measured using data collected at $\sqrt{s}$ = 8 TeV [25]. The ratio measurement is intended to reduce the systematic uncertainties, and the result is compared with predictions in Figure 3.
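The benefit of the ratio can be seen with a linearized propagation of systematic uncertainties. In this sketch, each source is assumed to be fully correlated between the ttbb and ttjj cross-sections and to shift them by given relative amounts; the source names, shift sizes and cross-section values are hypothetical. A source that moves both cross-sections by the same fraction, such as the integrated luminosity, cancels exactly in the ratio.

```python
import math


def relative_uncertainty(shifts):
    """Quadrature sum of independent relative shifts."""
    return math.sqrt(sum(s * s for s in shifts))


def ratio_with_uncertainty(num, den, sources):
    """sources maps a name to (relative shift of num, relative shift of den)
    for a source fully correlated between numerator and denominator.
    To first order, the ratio shifts by (d_num - d_den) for each source."""
    ratio = num / den
    rel = relative_uncertainty([d_num - d_den for d_num, d_den in sources.values()])
    return ratio, ratio * rel


sources = {  # hypothetical relative shifts (ttbb, ttjj)
    "luminosity": (0.025, 0.025),
    "jet energy scale": (0.08, 0.06),
    "b tagging": (0.10, 0.03),
}
rel_ttbb = relative_uncertainty([s[0] for s in sources.values()])
r, dr = ratio_with_uncertainty(0.2, 10.0, sources)  # hypothetical cross-sections
print(f"ttbb: {100 * rel_ttbb:.1f}%  ratio: {100 * dr / r:.1f}%")
```

Sources that affect the additional b jets differently from the inclusive jets, like b tagging in this toy, do not cancel, which is why the ratio reduces but does not eliminate the systematic uncertainty.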
In CMS, the inclusive ttbb cross-sections are measured in the different phase spaces of the dilepton, lepton + jets and hadronic channels using data collected at $\sqrt{s}$ = 13 TeV; in the dilepton channel, measurements at $\sqrt{s}$ = 8 TeV are also available. In the dilepton channel, the final state consists of two reconstructed leptons and at least four reconstructed b jets. The dominant Z + jets background in this dilepton final state is estimated from data using control samples enriched in Z boson events. Among the four or more b jets, the two jets with the highest b tagging discriminator values tend to be the b jets from the top quark decays. Therefore, the jets with the third- and fourth-highest b tagging discriminator values are taken as the additional b jets, and the number of ttbb events is extracted from the two-dimensional distribution of their discriminator values. Together with the ratio $\sigma_{ttbb}/\sigma_{ttjj}$, the cross-sections $\sigma_{ttbb}$ and $\sigma_{ttjj}$ are measured in the visible phase space. To compare the measurements with theoretical predictions and with measurements in other decay modes, the full phase space cross-sections are obtained by correcting for the acceptance, $\sigma_\mathrm{full} = \sigma_\mathrm{visible}/A$, where the acceptance $A$ is the number of events in the corresponding visible phase space divided by the number of events in the full phase space. The results for the full phase space are shown in Figure 4 (upper).
Figure 3. Measurement of the ratio between the ttbb and ttjj visible cross-sections at $\sqrt{s}$ = 8 TeV by ATLAS [25].
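The acceptance correction used by CMS for the full phase space can be written out explicitly. The sketch below computes the acceptance from simulated event counts and converts a visible cross-section into a full phase space one. Taking the relative difference between the acceptances of a nominal and an alternative generator as the acceptance-modeling uncertainty is an assumption made for illustration, and all numbers are hypothetical.

```python
import math


def acceptance(n_visible, n_full):
    """A = simulated events in the visible phase space / events in the full phase space."""
    return n_visible / n_full


def to_full_phase_space(sigma_vis, rel_unc_vis, acc_nominal, acc_alternative):
    """sigma_full = sigma_visible / A, adding in quadrature an acceptance-modeling
    uncertainty taken from the spread between two generators."""
    sigma_full = sigma_vis / acc_nominal
    rel_unc_acc = abs(acc_alternative - acc_nominal) / acc_nominal
    return sigma_full, sigma_full * math.hypot(rel_unc_vis, rel_unc_acc)


a_nom = acceptance(2500, 100000)  # hypothetical nominal generator counts
a_alt = acceptance(2300, 100000)  # hypothetical alternative generator counts
sigma, unc = to_full_phase_space(0.5, 0.20, a_nom, a_alt)  # 0.5 pb visible (hypothetical)
print(f"A = {a_nom:.4f}, sigma_full = {sigma:.1f} +- {unc:.1f} pb")
```

The extrapolation multiplies the visible cross-section by a large factor (1/A), so any generator dependence of A enters the full phase space result directly; this is the additional systematic uncertainty mentioned in Section 2.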

In the lepton + jets channel, CMS performed the measurement with data corresponding to an integrated luminosity of 35.9 fb$^{-1}$ at $\sqrt{s}$ = 13 TeV. In this channel, identifying the origin of the jets is challenging because the final state with at least six jets, including four b jets, leads to ambiguities in the jet assignment. Moreover, a heavy-flavor jet can also originate from the W boson decay. To address this, a kinematic reconstruction is used to identify the additional b jets. The algorithm assigns to each jet permutation a $\chi^2$ value that quantifies how well the permutation satisfies a set of kinematic constraints, and the permutation with the lowest $\chi^2$ is selected. Once a jet topology is selected, the remaining additional jets in the event are ordered by decreasing b tagging discriminant value. Then, as in the dilepton channel, only the two additional jets with the highest b tagging discriminant values are used to extract the ttbb cross-section. The results for the ratio $\sigma_{ttbb}/\sigma_{ttjj}$, $\sigma_{ttbb}$ and $\sigma_{ttjj}$ are presented for both the visible and the full phase space (see Figure 4). Recently, the measurement in the lepton + jets channel was updated with the full Run 2 data set, corresponding to an integrated luminosity of 138 fb$^{-1}$ [33]. In this analysis, the cross-sections are measured in four different visible phase spaces, whose final states are listed in Table 1. The phase spaces requiring three additional light jets are motivated by the study of additional QCD radiation in ttb or ttbb events, which has been shown to be sensitive to the modeling of ttbb production. The measured cross-sections in all phase spaces are larger than the predictions from POWHEG+PYTHIA 8; the values predicted by other generators in each phase space are available in Ref. [33].
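A $\chi^2$-based permutation scan of this kind can be sketched as follows. The $\chi^2$ here uses only three mass constraints (hadronic W, hadronic top, leptonic top), the reference masses and resolutions are assumed values, and the neutrino four-momentum is taken as known, which is a simplification: in practice it must be reconstructed from the missing transverse momentum. The toy event and all function names are hypothetical.

```python
import math
from itertools import combinations, permutations

M_W, M_TOP = 80.4, 172.5          # GeV, assumed reference masses
SIGMA_W, SIGMA_TOP = 10.0, 15.0   # GeV, hypothetical mass resolutions


def inv_mass(*vectors):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    e, px, py, pz = (sum(c) for c in zip(*vectors))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def chi2(w_jets, b_had, b_lep, lepton, neutrino):
    """Mass-constraint chi2 for one jet-to-parton assignment."""
    return (((inv_mass(*w_jets) - M_W) / SIGMA_W) ** 2
            + ((inv_mass(*w_jets, b_had) - M_TOP) / SIGMA_TOP) ** 2
            + ((inv_mass(lepton, neutrino, b_lep) - M_TOP) / SIGMA_TOP) ** 2)


def best_assignment(jets, lepton, neutrino):
    """Scan all assignments; return (chi2, (w1, w2), b_had, b_lep) with minimal chi2."""
    best = None
    for w1, w2 in combinations(range(len(jets)), 2):  # the two W jets are unordered
        rest = [i for i in range(len(jets)) if i not in (w1, w2)]
        for b_had, b_lep in permutations(rest, 2):
            c = chi2((jets[w1], jets[w2]), jets[b_had], jets[b_lep], lepton, neutrino)
            if best is None or c < best[0]:
                best = (c, (w1, w2), b_had, b_lep)
    return best


def additional_jets(best, btag):
    """Jets not used by the tt hypothesis, ordered by decreasing b-tag value; keep two."""
    used = {*best[1], best[2], best[3]}
    rest = [i for i in range(len(btag)) if i not in used]
    return sorted(rest, key=lambda i: btag[i], reverse=True)[:2]


def toy_event():
    """Hypothetical event with exact W and top masses for the true assignment."""
    a = M_W / math.sqrt(2)
    lepton, neutrino = (a, 0.0, 0.0, a), (a, a, 0.0, 0.0)  # m(lepton, neutrino) = M_W
    e_bh = (M_TOP ** 2 - M_W ** 2) / (2 * M_W)
    e_bl = (M_TOP ** 2 - M_W ** 2) / (6 * a)
    jets = [(M_W / 2, M_W / 2, 0.0, 0.0), (M_W / 2, -M_W / 2, 0.0, 0.0),  # W -> jj at rest
            (e_bh, 0.0, 0.0, e_bh), (e_bl, 0.0, 0.0, -e_bl),           # b_had, b_lep
            (50.0, 30.0, 30.0, math.sqrt(700.0)),                        # extra jet
            (40.0, -24.0, 0.0, 32.0)]                                    # extra jet
    return jets, lepton, neutrino


jets, lepton, neutrino = toy_event()
best = best_assignment(jets, lepton, neutrino)
btag = [0.1, 0.2, 0.9, 0.95, 0.3, 0.8]  # hypothetical b-tagging discriminants
print(best, additional_jets(best, btag))
```

The number of permutations grows quickly with the jet multiplicity, which is one reason the assignment becomes ambiguous in events with six or more jets.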
In the hadronic channel, multijet production is the main background. To suppress multijet events, a quark-gluon discriminant is used, and an unsupervised learning algorithm is further used to enhance the ttbb contribution. The measured cross-sections follow two definitions of ttbb events in the fiducial phase space. One is based exclusively on stable generated particles after hadronization (parton-independent); this definition facilitates comparisons with predictions from event generators. The other uses parton-level information after radiation emission (parton-based); this definition is closer to the approach taken by searches for ttH production to define the contribution from the ttbb process. A boosted decision tree (BDT) is used to address the large combinatorial ambiguity in identifying the additional jets in the events. The cross-section is also reported for the total phase space by correcting the parton-based fiducial cross-section for the experimental acceptance. The results are presented in Figure 5.
The cross-section for top quark pair production with an additional pair of c jets has been measured for the first time by CMS. This measurement is challenging because the experimental signature of a c jet is very similar to that of a b jet. The two additional jets are selected using a deep neural network classifier. To separate the ttcc, ttbb and ttll events, another neural network (NN) is trained using the charm jet tagging information of the first and second additional jets, kinematic variables such as the angular separation $\Delta R$ between the two additional jets, and the NN score for the best jet permutation. This NN predicts the probabilities for five output classes: ttcc, ttcl, ttbb, ttbl and ttll. Two discriminators are derived from these output probabilities.
The ttcc, ttbb and ttll cross-sections are extracted from a fit to the two-dimensional distribution of these discriminators. The ratios $R_b$ and $R_c$ of the measured $\sigma_{ttbb}$ and $\sigma_{ttcc}$ cross-sections, respectively, to the inclusive tt + two jets cross-section were also measured. The results are compared with theoretical predictions from the POWHEG or MADGRAPH5_aMC@NLO generators in Figure 6.
Figure 6. Results of the ttbb versus ttcc cross-sections measured by CMS in the fiducial phase space, and their ratios to the inclusive tt + two jets cross-section [32].
All inclusive cross-sections measured in the visible phase space by the CMS experiment are summarized in Tables 3 and 4, and those for the full phase space in Tables 5 and 6. Figure 7 compares the values measured by CMS in the full phase space with various theoretical predictions.
Table 3. Measured and predicted cross-sections in the visible phase space at $\sqrt{s}$ = 13 TeV by CMS. In the hadronic channel, the parton-based cross-section is shown. The predictions of POWHEG+PYTHIA 8 are shown.

Differential Cross-Section Measurements
In addition to the inclusive cross-section measurements, differential measurements of the tt + HF production cross-sections provide information on perturbative QCD (pQCD) and can be used in searches for new physics. The ttbb differential cross-sections have been measured at $\sqrt{s}$ = 7, 8 and 13 TeV by ATLAS and at $\sqrt{s}$ = 8 and 13 TeV by CMS.
To measure the differential cross-sections, the distributions measured at the detector level must be unfolded to the generator level, removing detector effects, so that the resulting cross-sections can be compared with theoretical predictions and with results from other experiments. At the generator level, defining the additional b jets in the ttbb process is not trivial, since b jets also arise from the top quark decays. Moreover, a b jet could also emerge from the W boson decay. The additional b jets are expected to come from gluon splitting, but can also come from the decay of an H boson or another boson.
ATLAS and CMS differ clearly in how the b jets are identified. ATLAS makes no attempt to identify the origin of the particle-level b jets using simulation information. Instead, at the particle level, the two b jets with the highest $p_\mathrm{T}$ or with the smallest $\Delta R$ are selected for the differential cross-section measurement. The highest-$p_\mathrm{T}$ jets are taken to represent the b jets from the top quark decays, whereas the b jets with the smallest $\Delta R$ are taken to represent the additional b jets, exploiting the fact that b jets from gluon splitting tend to be collinear. In CMS, by contrast, the origin of the b jets is explicitly identified using simulation information: the b hadron is traced back through its ancestors in the simulation chain, and a b jet is identified as one of the two additional b jets only if it does not originate from a top quark.
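A simplified version of such an ancestry check can be sketched as follows. It assumes a hypothetical event record in which every particle has a single mother index (real generator records contain multiple mothers and particle copies) and treats a b hadron as coming from the tt system if its ancestry contains a b quark produced directly by a top quark, as in t → Wb. The record contents, jet names and function names are invented; only the PDG particle codes (6 for the top quark, 5 for the b quark, 21 for the gluon, 511/521 for B mesons) are standard.

```python
def from_top_decay(index, record):
    """record: list of (pdg_id, mother_index or None), a simplified
    single-mother event record. Returns True if the ancestry of particle
    `index` contains a b quark whose mother is a top quark (t -> W b)."""
    seen = set()
    i = index
    while i is not None and i not in seen:  # `seen` guards against malformed loops
        seen.add(i)
        pdg, mother = record[i]
        if abs(pdg) == 5 and mother is not None and abs(record[mother][0]) == 6:
            return True
        i = mother
    return False


def additional_b_jets(jet_hadrons, record):
    """jet_hadrons maps a jet name to the record indices of its ghost b hadrons.
    A jet counts as additional if it holds a b hadron not from a top decay."""
    return [jet for jet, hadrons in jet_hadrons.items()
            if any(not from_top_decay(h, record) for h in hadrons)]


record = [
    (6, None), (-6, None),   # 0, 1: top quarks
    (5, 0), (-5, 1),         # 2, 3: b quarks from the t -> W b decays
    (21, None),              # 4: initial-state gluon
    (5, 4), (-5, 4),         # 5, 6: g -> b bbar splitting
    (521, 2), (-511, 3),     # 7, 8: B hadrons from the top-decay b quarks
    (511, 5), (-521, 6),     # 9, 10: B hadrons from the gluon splitting
]
jets = {"j1": [7], "j2": [8], "j3": [9], "j4": [10]}
print(additional_b_jets(jets, record))
```

Requiring the direct t → b link, rather than any top quark ancestor, keeps a gluon radiated by a top quark that later splits into a $b\bar{b}$ pair classified as additional; this choice is part of the stated assumption.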
For the ATLAS measurements, the unfolded results are presented as normalized differential cross-sections in the visible phase space as a function of the b jet multiplicity, global event properties and various kinematic variables. The measurements are performed in the eµ channel with at least three reconstructed b jets and in the lepton + jets channel with at least four b jets. The lepton + jets sample with at least four b jets has a high signal purity, so that the measurement depends less on the simulation, while the eµ channel benefits from a sample of events with at least three b jets that is an order of magnitude larger.
Once the reconstructed-level distributions of tt + HF events are extracted, they are unfolded to the particle level. Detector resolution effects and inefficiencies are corrected using the migration matrix, whose binning is optimized to keep the matrix as close to diagonal as possible, with an iterative Bayesian unfolding technique [49] implemented in the ROOUNFOLD software package [50]. Detector efficiencies and acceptance are then corrected using a bin-by-bin method. Figure 8 shows the normalized cross-section as a function of the b jet multiplicity compared with predictions from various generator set-ups. The first three panels show the ratios of various predictions to data. The last panel shows the ratio of the normalized differential cross-section predictions from MADGRAPH5_aMC@NLO+PYTHIA 8 with and without the ttH and ttV contributions. All predictions that rely on the parton shower to generate jets at high multiplicities underestimate the measurements, which suggests that b jet production by the parton shower is not optimally modeled in these processes. The predictions from the various generators are compared with the measurements after subtracting the simulation-estimated ttV and ttH contributions from the data. The impact of including these processes in the prediction increases with the b jet multiplicity, reaching about 10% relative to the QCD tt prediction alone in the inclusive four-b-jet bin. The measurement in the eµ channel with at least three b jets tends to be more precise than that in the lepton + jets channel with at least four b jets.
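The iterative Bayesian (D'Agostini) unfolding can be sketched in a few lines of pure Python. This is an illustrative re-implementation, not the ROOUNFOLD code, and the two-bin migration matrix and event counts are hypothetical. Starting from a prior, each iteration folds the current estimate with the response matrix, compares it with the data and updates the estimate with Bayes' theorem; dividing by the column sums of the response matrix corrects for inefficiency.

```python
def iterative_bayes_unfold(data, response, prior=None, n_iter=4):
    """Iterative Bayesian unfolding.

    response[j][i] = P(reconstructed in bin j | true bin i); the column sum
    over j is the efficiency of true bin i. Stopping after a few iterations
    acts as regularization; many iterations approach the unregularized result.
    """
    n_reco, n_true = len(response), len(response[0])
    eff = [sum(response[j][i] for j in range(n_reco)) for i in range(n_true)]
    truth = list(prior) if prior is not None else [sum(data) / n_true] * n_true
    for _ in range(n_iter):
        folded = [sum(response[j][i] * truth[i] for i in range(n_true))
                  for j in range(n_reco)]
        truth = [truth[i] / eff[i] * sum(response[j][i] * data[j] / folded[j]
                                         for j in range(n_reco) if folded[j] > 0)
                 for i in range(n_true)]
    return truth


response = [[0.8, 0.1],  # hypothetical 2-bin migration matrix, columns sum to 1
            [0.2, 0.9]]
data = [250.0, 150.0]    # equals the response applied to a true spectrum [300, 100]
print([round(x, 1) for x in iterative_bayes_unfold(data, response, n_iter=4)])
print([round(x, 1) for x in iterative_bayes_unfold(data, response, n_iter=500)])
```

With few iterations the result stays closer to the flat prior, which damps statistical fluctuations; with many iterations it converges to the spectrum that exactly reproduces the data, at the cost of amplifying fluctuations in real data.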
It is also important to measure the $p_\mathrm{T}$, the invariant mass and the angular distance $\Delta R$ of the $b_1b_2$ system, which is built either from the two highest-$p_\mathrm{T}$ b jets or from the two b jets closest in $\Delta R$. The measured distributions of these three variables in the lepton + jets channel are shown in Figures 9-11. The differential cross-section as a function of the $p_\mathrm{T}$ of the $b_1b_2$ system is measured with a precision of 10-15% over the full range in the eµ channel and of 20-25% in the lepton + jets channel. In general, the differential distributions are well described by the different theoretical predictions, whose spread is significantly smaller than the experimental uncertainty. All other distributions, such as $H_\mathrm{T}$ or the $p_\mathrm{T}$ of the additional b jets, are available in Ref. [26].
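The two ATLAS $b_1b_2$ definitions and the three observables can be made concrete with a small sketch. The jets are hypothetical massless four-vectors (E, px, py, pz) in GeV, and rapidity is used in $\Delta R$, which coincides with pseudorapidity for massless objects; the function names are illustrative.

```python
import math
from itertools import combinations


def from_pt_eta_phi(pt, eta, phi):
    """Massless (E, px, py, pz) four-vector."""
    return (pt * math.cosh(eta), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))


def pt(v):
    return math.hypot(v[1], v[2])


def mass(v):
    e, px, py, pz = v
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def delta_r(a, b):
    """Angular distance using rapidity (equal to pseudorapidity for massless jets)."""
    ya = 0.5 * math.log((a[0] + a[3]) / (a[0] - a[3]))
    yb = 0.5 * math.log((b[0] + b[3]) / (b[0] - b[3]))
    dphi = (math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]) + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(ya - yb, dphi)


def b1b2_pairs(b_jets):
    """Index pairs for the two highest-pT b jets and for the two b jets closest in dR."""
    by_pt = sorted(range(len(b_jets)), key=lambda i: pt(b_jets[i]), reverse=True)
    leading = tuple(sorted(by_pt[:2]))
    closest = min(combinations(range(len(b_jets)), 2),
                  key=lambda ij: delta_r(b_jets[ij[0]], b_jets[ij[1]]))
    return leading, closest


def b1b2_observables(a, b):
    """pT and invariant mass of the b1b2 system and the dR between the two jets."""
    system = tuple(x + y for x, y in zip(a, b))
    return {"pt": pt(system), "mass": mass(system), "dR": delta_r(a, b)}


b_jets = [from_pt_eta_phi(120, 0.0, 0.0), from_pt_eta_phi(90, 0.5, 2.8),
          from_pt_eta_phi(40, 1.0, -1.0), from_pt_eta_phi(35, 1.2, -0.8)]  # hypothetical
leading, closest = b1b2_pairs(b_jets)
for name, (i, j) in (("leading pT", leading), ("closest dR", closest)):
    print(name, (i, j), b1b2_observables(b_jets[i], b_jets[j]))
```

In this toy event the two definitions select disjoint pairs: the leading pair is nearly back-to-back with a large mass, while the closest pair is collinear with a small mass, which is the pattern expected from gluon splitting.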
In CMS, the differential cross-sections are measured in the visible phase space as a function of various kinematic properties, such as the $p_\mathrm{T}$ and $\eta$ of the leading and subleading additional b jets, the angular distance $\Delta R$ between them and the invariant mass $m_{bb}$ of the two additional b jets. The differential cross-sections as a function of $m_{bb}$ and $\Delta R$ are of particular interest, because two additional b jets from a gluon tend to be produced collinearly, whereas those from an H boson decay produce a resonant peak at 125 GeV.

Figure 8. The relative differential cross-section as a function of the b jet multiplicity in events with at least two b jets in the eµ channel compared with various generators. The ttH and ttV contributions are subtracted from data. Uncertainty bands represent the statistical and total systematic uncertainties [26].

Figure 9. Relative differential cross-sections as a function of the $p_\mathrm{T}$ of the two highest-$p_\mathrm{T}$ b jets (left) and of the two closest b jets in $\Delta R$ (right) in events with at least four b jets in the lepton + jets channel, compared with various generators, from the ATLAS measurements. The contributions from ttH and ttV are subtracted from data [26].

Figure 10. Relative differential cross-sections as a function of $m_{b_1b_2}$ of the two highest-$p_\mathrm{T}$ b jets (left) and of the two closest b jets in $\Delta R$ (right) in events with at least four b jets in the lepton + jets channel, compared with various generators, from the ATLAS measurements. The contributions from ttH and ttV are subtracted from data [26].
Figure 11. Relative differential cross-sections as a function of $\Delta R_{b_1,b_2}$ of the two highest-$p_\mathrm{T}$ b jets (left) and of the two closest b jets in $\Delta R$ (right) in events with at least four b jets in the lepton + jets channel, compared with various generators, from the ATLAS measurements. The contributions from ttH and ttV are subtracted from data [26].
At the reconstruction level, identifying the two additional b jets is very challenging because the four b jets arise both from the top quark decays and from a gluon splitting. To select the additional b jets, a BDT is used to maximize the rate of correct assignment. The BDT input variables combine information from the two final-state leptons, the jets and $E_\mathrm{T}^\mathrm{miss}$. Twelve variables are used in total, for example: the sum and difference of the invariant masses of the $b\ell^+$ and $\bar{b}\ell^-$ systems, $m_{b\ell^+} \pm m_{\bar{b}\ell^-}$; the absolute difference in azimuthal angle between these systems, $|\Delta\phi_{b\ell^+,\bar{b}\ell^-}|$; the transverse momenta of the two systems, $p_\mathrm{T}^{b\ell^+}$ and $p_\mathrm{T}^{\bar{b}\ell^-}$; and the difference between the invariant mass of the two b jets and two leptons and the invariant mass of the $b\bar{b}$ pair, $m_{b\bar{b}\ell^+\ell^-} - m_{b\bar{b}}$. Variables insensitive to additional radiation are chosen to avoid any dependence on the kinematics of the additional jets. The jets from the tt system are identified as the pair with the highest BDT discriminant, and, from the remaining jets, the b-tagged jets with the highest $p_\mathrm{T}$ are selected as the leading additional b jets. With this method, the additional b jets in ttbb events are correctly assigned in about 40% of cases.
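To illustrate how such an assignment works, the sketch below computes several of the listed input variables for each ordered $(b, \bar{b})$ jet hypothesis and keeps the best-scoring one, then takes the highest-$p_\mathrm{T}$ remaining b-tagged jets as the additional b jets. The function `toy_score` is a hypothetical placeholder for the trained BDT, which is not reproduced here; inputs involving $E_\mathrm{T}^\mathrm{miss}$ are omitted, and the event is invented.

```python
import math
from itertools import permutations


def from_pt_eta_phi(pt, eta, phi):
    """Massless (E, px, py, pz) four-vector."""
    return (pt * math.cosh(eta), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))


def add(*vs):
    return tuple(sum(c) for c in zip(*vs))


def mass(v):
    return math.sqrt(max(v[0] ** 2 - v[1] ** 2 - v[2] ** 2 - v[3] ** 2, 0.0))


def pt(v):
    return math.hypot(v[1], v[2])


def features(b, bbar, lep_plus, lep_minus):
    """Several of the listed inputs for one (b, bbar) hypothesis."""
    bl_p, bl_m = add(b, lep_plus), add(bbar, lep_minus)
    dphi = abs((math.atan2(bl_p[2], bl_p[1]) - math.atan2(bl_m[2], bl_m[1]) + math.pi)
               % (2 * math.pi) - math.pi)
    return {
        "m_sum": mass(bl_p) + mass(bl_m),
        "m_diff": mass(bl_p) - mass(bl_m),
        "dphi": dphi,
        "pt_blp": pt(bl_p),
        "pt_blm": pt(bl_m),
        "m_bbll_minus_m_bb": mass(add(b, bbar, lep_plus, lep_minus)) - mass(add(b, bbar)),
    }


def assign(jets, btagged, lep_plus, lep_minus, score):
    """Pick the (b, bbar) pair with the highest score, then the two highest-pT
    remaining b-tagged jets as the additional b jets."""
    top_pair = max(permutations(range(len(jets)), 2),
                   key=lambda ij: score(features(jets[ij[0]], jets[ij[1]], lep_plus, lep_minus)))
    rest = [i for i in range(len(jets)) if i not in top_pair and btagged[i]]
    return top_pair, sorted(rest, key=lambda i: pt(jets[i]), reverse=True)[:2]


def toy_score(f):
    """Hypothetical stand-in for the trained BDT: prefers small m(b, lepton)."""
    return -f["m_sum"]


lep_plus, lep_minus = from_pt_eta_phi(50, 0.0, 0.0), from_pt_eta_phi(40, 0.3, 3.0)
jets = [from_pt_eta_phi(60, 0.1, 0.2), from_pt_eta_phi(55, 0.4, 2.8),
        from_pt_eta_phi(45, -1.5, 1.5), from_pt_eta_phi(30, -1.3, -1.6)]
tagged = [True, True, True, True]
print(assign(jets, tagged, lep_plus, lep_minus, toy_score))
```

Ordering matters in the scan because the $b\ell^+$ and $\bar{b}\ell^-$ pairings are not symmetric, so each unordered jet pair is tested in both charge assignments.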
A template fit to the b-tagged jet multiplicity distribution is performed to improve the agreement between data and simulation. For the differential cross-section measurements, detector efficiency and resolution effects are corrected by a regularized inversion of the response matrix, which is obtained from simulated $t\bar{t}$ events. Figure 12 shows the CMS differential cross-sections as a function of the leading and subleading additional b jet $p_T$, and of the $\Delta R$ and invariant mass of the two additional b jets. The measured cross-sections are compared with various theoretical predictions. The shapes of the $p_T$ distributions are well described by the predictions. However, the CMS measurement has larger uncertainties than the ATLAS measurement because it uses a smaller data sample.

CMS has also measured the differential cross-sections with the full Run 2 data set in the lepton + jets channel. This analysis uses two approaches to identify the additional b jets from gluon splitting. In the first, the two b jets with the smallest angular separation are selected, which reduces the systematic uncertainty from the dependence on theoretical modeling. In the second, a deep neural network (DNN), trained using MC truth information, identifies the additional b jets that do not originate from top quarks.
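The idea of a regularized inversion of the response matrix can be illustrated with a minimal Tikhonov-regularized unfolding. This is a sketch under simplifying assumptions, not the procedure of either experiment. The input is assumed to be already background-subtracted, bin uncertainties and correlations are ignored, and the regularization strength `tau` is a free parameter here. Production analyses typically use dedicated tools such as TUnfold with full covariance matrices. The function names are hypothetical.

```python
import numpy as np


def unfold_tikhonov(response: np.ndarray, measured: np.ndarray, tau: float) -> np.ndarray:
    """Regularized inversion of measured = response @ truth.

    response[i, j] is the probability for an event in particle-level bin j to
    be reconstructed in detector-level bin i; columns summing to less than one
    encode the selection inefficiency. Solves
        min_x ||R x - y||^2 + tau^2 ||C x||^2,
    where C is the discrete second-derivative (curvature) operator.
    """
    n = response.shape[1]
    curv = np.zeros((max(n - 2, 0), n))
    for k in range(n - 2):
        curv[k, k:k + 3] = (1.0, -2.0, 1.0)
    # Normal equations of the penalized least-squares problem.
    lhs = response.T @ response + tau**2 * (curv.T @ curv)
    return np.linalg.solve(lhs, response.T @ measured)


def relative_xsec(truth: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """(1/sigma) dsigma/dX per bin: normalize to the total and divide by bin widths."""
    return truth / truth.sum() / np.diff(edges)
```

With `tau = 0`, this reduces to a plain least-squares inversion, which amplifies statistical fluctuations when bin migrations are large. Increasing `tau` suppresses bin-to-bin oscillations at the cost of a bias toward smooth spectra.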
To find the pair of b jets that do not originate from top quarks, only the four highest-$p_T$ b jets are used as candidates, which gives six possible pairings. The DNN uses two sets of input variables, one for jet-specific information and one for global event information. The jet-specific inputs are the jet $p_T$ and $\eta$, a flag indicating whether the jet passes the tight b-tagging working point, and the angular separation ($\Delta R$) and invariant mass of the jet with the charged lepton. These inputs are processed by five convolutional neural network (CNN) layers [51] followed by a long short-term memory (LSTM) cell [52]. The global event inputs are the scalar $p_T$ sum of the four candidate b jets and the $p_T$, $\eta$ and $\phi$ of the charged lepton. They also include the $\Delta\phi$, $\Delta\eta$ and invariant mass of each dijet combination, the $\Delta R$ between each dijet combination and the charged lepton, and the jet and b-tagged jet multiplicities. These inputs are fed to three dense layers with 50 nodes each. The outputs of the two branches are concatenated into a single dense layer with 10 nodes. This layer connects to an output layer with six nodes, one for each candidate pairing. In each event, the pair of b-tagged jets with the highest DNN output is taken as the additional b jet pair and used in the differential cross-section measurement.
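The candidate enumeration and the two selection strategies (smallest $\Delta R$ and highest DNN output) can be sketched as follows. This is a minimal illustration, not the CMS code. Jets are hypothetical `(pT, eta, phi)` tuples, and the six DNN outputs are taken as given rather than computed by the network described above. Mapping output nodes to pairs in $p_T$-rank order is an assumed convention.

```python
import itertools
import math
from typing import Sequence

Jet = tuple[float, float, float]  # (pT, eta, phi) of a b-tagged jet


def delta_r(a: Jet, b: Jet) -> float:
    """Angular separation, with the phi difference wrapped into [-pi, pi)."""
    dphi = (a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a[1] - b[1], dphi)


def candidate_pairs(bjets: Sequence[Jet]) -> list[tuple[int, int]]:
    """The six unordered pairs among the four highest-pT b jets (original indices)."""
    lead = sorted(range(len(bjets)), key=lambda k: bjets[k][0], reverse=True)[:4]
    return list(itertools.combinations(lead, 2))


def closest_dr_pair(bjets: Sequence[Jet]) -> tuple[int, int]:
    """Baseline strategy: the candidate pair with the smallest Delta R."""
    return min(candidate_pairs(bjets), key=lambda p: delta_r(bjets[p[0]], bjets[p[1]]))


def dnn_pair(bjets: Sequence[Jet], scores: Sequence[float]) -> tuple[int, int]:
    """DNN strategy: the candidate pair whose output node has the highest value."""
    pairs = candidate_pairs(bjets)
    if len(scores) != len(pairs):
        raise ValueError("need one score per candidate pair")
    return pairs[max(range(len(pairs)), key=lambda k: scores[k])]
```

Restricting the candidates to four jets fixes the output layer at six nodes, which is what allows a classifier with a fixed-size output to be used.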
The DNN assigns the additional b jets correctly in about 49% of events, a significant improvement over selecting the two b jets closest in $\Delta R$, which yields about 41%. Figure 13 shows the measured differential cross-sections as a function of the leading and subleading additional b jet $p_T$, and of the $\Delta R$ and invariant mass of the two additional b jets selected by the DNN. The distributions are not well described by POWHEG + HERWIG 7 (labelled POWHEG + H7 in Figure 13). More differential variables are available in Ref. [

Discussion
The inclusive $t\bar{t}$ + HF jets cross-sections measured by ATLAS and CMS with Run 1 and Run 2 data are higher than the theoretical predictions (see Tables 2-6).
It will be interesting to see whether these differences become significant with additional data. Differential cross-sections have also been measured, with the aim of identifying variables in which the differences become larger. The measured differential cross-sections are in general consistent with the theory predictions within their large statistical uncertainties. However, the $\Delta R$ distribution shows a discrepancy in the first bin. In particular, the HERWIG prediction tends to place the two additional b jets at smaller angular separation than observed in data. The other generators interfaced with PYTHIA also predict larger separations than HERWIG. In the realm of $t\bar{t}$ + HF, a large fraction of the Run 2 data is yet to be analyzed, and about twice as much data is expected from Run 3 at the LHC in the coming years. We can envisage reducing not only the statistical uncertainty but also the systematic uncertainties, since more data may enable more data-driven techniques. More data will also allow finer binning, which may give hints about the origin of the discrepancies seen in the inclusive measurements. More advanced heavy-flavor tagging may separate c jets from b jets more effectively. Novel flavor-tagging developments can indeed deepen our understanding of perturbative QCD (pQCD) and increase the potential to discover new physics. More harmonized definitions between the two experiments are also needed to compare or combine the ATLAS and CMS results.
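A simple projection makes the interplay between statistical and systematic uncertainties concrete. The sketch below rests on several simplifying assumptions. The relative statistical uncertainty scales as $1/\sqrt{L}$ with integrated luminosity $L$, and the two components add in quadrature. The systematic component is either unchanged or rescaled by `syst_scale`, a hypothetical parameter standing in for improvements such as data-driven methods.

```python
import math


def projected_rel_unc(stat: float, syst: float, lumi_ratio: float,
                      syst_scale: float = 1.0) -> float:
    """Total relative uncertainty after scaling the data set by lumi_ratio.

    Assumes stat ~ 1/sqrt(luminosity), a systematic part multiplied by
    syst_scale (1.0 = unchanged), and quadrature addition of the two.
    """
    if lumi_ratio <= 0:
        raise ValueError("lumi_ratio must be positive")
    return math.hypot(stat / math.sqrt(lumi_ratio), syst * syst_scale)
```

For illustration only, with hypothetical numbers not taken from any measurement: a 10% statistical and 20% systematic uncertainty gives about 22% in total. Tripling the data set leaves about 21%. This is why reducing the systematic uncertainties matters as much as collecting more data.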
Since the measured $t\bar{t}$ + HF cross-sections are systematically higher than the predictions, more data from Run 3 and eventually from the High-Luminosity LHC (HL-LHC) may provide interesting opportunities to find cracks in our theoretical understanding. The discrepancy could arise because the signal samples are modeled only at next-to-leading order (NLO) in QCD. The effective field theory (EFT) approach should also be used to search for possible new physics. For interpreting experimental measurements in terms of physics beyond the Standard Model, EFT is of interest as a model-independent approach [53]. Differential measurements may be crucial here, because operators of the Standard Model EFT (SMEFT) can modify the kinematics of Standard Model processes.
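In the SMEFT, dimension-six operators $O_i$ enter the Lagrangian as $\sum_i (c_i/\Lambda^2)\, O_i$. A cross-section, bin by bin, is therefore a quadratic function of the Wilson coefficients: an SM term, a linear interference term and a quadratic term. The minimal single-operator sketch below uses per-bin inputs `sigma_sm`, `sigma_lin` and `sigma_quad`, which are hypothetical and would in practice come from simulation. It also shows why shape information matters. If the interference term is proportional to the SM prediction in every bin, the normalized distribution does not change, and only the total rate is sensitive.

```python
import numpy as np


def smeft_bins(sigma_sm, sigma_lin, sigma_quad, c: float, lam_tev: float = 1.0) -> np.ndarray:
    """Per-bin cross-sections for a single dimension-six Wilson coefficient c.

    sigma(c) = sigma_SM + k * sigma_lin + k**2 * sigma_quad, with k = c / Lambda^2
    in TeV^-2; sigma_lin and sigma_quad are the coefficients per unit k.
    """
    k = c / lam_tev**2
    return (np.asarray(sigma_sm, dtype=float)
            + k * np.asarray(sigma_lin, dtype=float)
            + k**2 * np.asarray(sigma_quad, dtype=float))


def normalized_shape(bins: np.ndarray) -> np.ndarray:
    """Distribution normalized to unit sum (a relative differential shape)."""
    return bins / bins.sum()
```

Comparing `normalized_shape` with and without a nonzero `c` indicates whether a given operator could be probed by a relative differential measurement or only by the inclusive rate.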

Conclusions
The inclusive and differential cross-sections for $t\bar{t}$ + HF jet production have been measured extensively by ATLAS and CMS in various phase spaces, using data samples collected in pp collisions at $\sqrt{s}$ = 7, 8 and 13 TeV. The ratio of the $t\bar{t}$ + HF jets cross-section to the $t\bar{t}$ + additional jets cross-section has also been measured, with the aim of reducing uncertainties, since many kinematic distributions are expected to be similar for the two processes. The measured cross-sections tend to be systematically higher than the predictions. The measurements are dominated by systematic uncertainties. These could be reduced with data-driven techniques that better control the impact of backgrounds and of reconstruction-related effects. More data in the coming years, together with a better understanding of the detectors and more sophisticated reconstruction techniques, will bring us into the precision era, in which possible new physics could finally be revealed.

Conflicts of Interest:
The authors declare no conflict of interest.