Search for standard model production of four top quarks in proton-proton collisions at sqrt(s) = 13 TeV

A search for events containing four top quarks (ttbar-ttbar) is reported from proton-proton collisions recorded by the CMS experiment at sqrt(s) = 13 TeV and corresponding to an integrated luminosity of 2.6 inverse femtobarns. The analysis considers the single-lepton (e or mu)+jets and the opposite-sign dilepton (mu+mu-, mu+/- e-/+, or e+e-)+jets channels. It uses boosted decision trees to combine information on the global event and jet properties to distinguish between ttbar-ttbar and ttbar production. The number of events observed after all selection requirements is consistent with expectations from background and standard model signal predictions, and an upper limit is set on the cross section for ttbar-ttbar production in the standard model of 94 fb at 95% confidence level (10.2 times the prediction), with an expected limit of 118 fb. This is combined with the results from the published CMS search in the same-sign dilepton channel, resulting in an improved limit of 69 fb at 95% confidence level (7.4 times the prediction), with an expected limit of 71 fb. These are the strongest constraints on the rate of ttbar-ttbar production to date.

Published in Physics Letters B as doi:10.1016/j.physletb.2017.06.064.c 2022 CERN for the benefit of the CMS Collaboration.CC-BY-3.0 license

Introduction
In proton-proton collisions at the CERN LHC most top quarks are produced in tt pairs, with a small contribution from single top quark production.It is also possible, however, to produce four top quarks (tttt) in the standard model (SM) via higher-order diagrams in quantum chromodynamics (QCD), mainly via gluon fusion, as shown in Fig. 1.
The SM cross section for tttt production is approximately five orders of magnitude smaller than that for tt, and tttt production has not yet been observed.Observing tttt production consistent with predictions would provide a valuable test of higher-order perturbative QCD calculations.In addition, many models of physics beyond the SM predict an increase in the tttt cross section owing either to the presence of hypothetical particles that decay into top quarks or to modified couplings.These include models with massive colored bosons, Higgs boson and top quark compositeness, or extra dimensions, models with extended scalar sector such as 2HDM models [1], and supersymmetric extensions of the SM [2][3][4][5][6][7][8].Some of these models predict enhancements in the observed tttt cross section, with the associated kinematic distributions remaining similar to those from SM production.This is particularly the case when no new particles beyond those in the SM are produced on-shell.

At
√ s = 8 TeV, where the SM predicts a cross section (σ SM tttt ) of 1.3 fb [9], CMS set a 95% confidence level (CL) upper limit of 32 fb on the production cross section, and comparable results, 23 fb, were obtained by ATLAS [10,11].At √ s = 13 TeV, the SM prediction increases to 9.2 fb [9,12], where CMS set a 95% CL upper limit of 119 fb using a same-sign dilepton analysis [13].The studies presented in this paper are performed in two separate decay modes that are complementary to and statistically independent from the same-sign dilepton analysis.The first analysis examines the final state where only one of the four W bosons from the top quark decays in tttt production decays to a muon or electron.This single-lepton final state has the largest branching fraction in tttt production.The second analysis focuses on the opposite-sign dilepton channel with exactly two of any combination of electrons or muons.Both use the 13 TeV data recorded by the CMS experiment in 2015 (corresponding to an integrated luminosity of 2.6 fb −1 ), and apply multivariate techniques to discriminate between the tttt and tt processes.In order to enhance the sensitivity, the search is performed in multiple jet and b jet multiplicity categories.
The dominant background process is tt production, which is simulated at NLO using POWHEG v2 [17][18][19][20].The events coming from Drell-Yan (qq → Z/γ * → + − , with = e or µ)+jets and W boson+jets production are modeled using MADGRAPH [12], with the MLM matching scheme [21], and up to three jets.Single top quark production and the production of tt pairs in conjunction with a Higgs boson give small background contributions, and these are simulated using POWHEG v1.Lastly, the production of a tt pair in association with a W or Z boson is modeled using the MG5 aMC@NLO generator.In all of the simulations, the initial-and finalstate radiation (ISR and FSR), and the fragmentation and hadronization of quarks are modeled using PYTHIA 8.212 [22,23] with the underlying event tune CUETP8M1 [24].This tune uses a value of 0.137 for the strong coupling α S (M Z ) in the parton shower simulation, which leads to a mismodeling of the jet multiplicity (N j ) spectrum.All of the simulations are corrected for this effect using factors that depend on the number of observed jets and which are equal to the ratio of the particle-level cross sections calculated assuming α S (M Z ) = 0.137 and α S (M Z ) = 0.113.The latter value was derived from a comparison of the predictions of the tt simulation and a CMS measurement on an independent tt dataset at 8 TeV [25].
Detailed studies show that the contributions of tt+H, Z boson, or W boson do not significantly affect the sensitivity of the analysis.This combined background is included in the overwhelming tt background for the rest of this paper, unless mentioned otherwise.Backgrounds containing Z and W bosons but no top quarks are further referred to as electroweak (EW) backgrounds.Among single top production modes, the only relevant one is the contribution from tW production.All of these backgrounds are included in the analysis but are orders of magnitude smaller than the background originating from tt+jets production.
The NNPDF 3.0 and NNPDF nlo as 0118 [26] PDFs are used to generate all events, the latter being used for samples created with NLO generators.The simulated samples already include an estimate of the additional pp interactions per bunch crossing (pileup), and further corrections are applied to make the simulation of the number of additional interactions representative of that observed in the data.All of the simulated events are propagated through a simulation of the CMS detector, which is based on GEANT4 (v.9.4) [27].The tt background process is normalized to the next-to-next-to-leading-order cross section [28].In all other cases, the NLO cross sections are used [29][30][31].

Event selection
The final states considered in this analysis are the single-lepton channel with exactly one muon or electron, and the opposite-sign dilepton channel with exactly µ + µ − , µ ± e ∓ , or e + e − .These leptons originate from the W bosons from top quark decays and tend to be isolated from jets, unlike leptons produced in the decay of B or other hadrons within jets.Single-lepton events were recorded using a trigger that required at least one isolated muon with p T > 18 GeV or one isolated electron with p T > 23 GeV.Dilepton events were recorded using a trigger that required an electron or muon with p T > 17 GeV, in combination with a second lepton where the requirement is p T > 8 GeV for a muon and 12 GeV for an electron.
Each event is required to have at least one reconstructed vertex.The primary vertex is chosen as the one with the largest value of ∑ p 2 T of the tracks associated with it.Single-lepton events are required to contain exactly one isolated muon or electron with p T > 26 GeV or 30 GeV, respectively, and pseudorapidity |η| < 2.1.The isolation is ensured by demanding the variable I rel to be below the predefined threshold.The relative isolation, I rel , is defined as the scalar p T sum of the additional particles emanating from the same vertex as the lepton, within a cone of angular radius ∆R = √ (∆η) 2 + (∆φ) 2 = 0.4 around the lepton, divided by the p T of the lepton, where ∆η and ∆φ are the differences in pseudorapidity and azimuthal angle (in radians), respectively, between the directions of the lepton and the additional particle.The sum does not include the p T of the muon and is corrected for the neutral particle contribution from pileup on an event-by-event basis.Electron candidates are required to satisfy restrictive identification criteria, including isolation, which are described in Ref. [32].Muons are required to satisfy the criteria described in Ref. [33] and have a relative isolation variable, I rel , smaller than 0.15.
In the dilepton channel, events are required to contain two isolated leptons of opposite charge with p T > 20 GeV or 25 GeV for muons or electrons, respectively, and |η| < 2.4.Electron candidates are required to satisfy the same identification criteria as in the single-lepton channel.Because of the lower background, the muon isolation requirement is relaxed to I rel < 0.25.In the µµ and ee channels, the lepton pair is also required to have an invariant mass greater than 20 GeV and outside of a 30 GeV window centered on the Z boson mass, to exclude leptons from the decays of low-mass resonances and Z bosons.Events containing additional isolated charged leptons are vetoed.
Jets are reconstructed using a particle-flow algorithm in which the particles are clustered using the anti-k T algorithm [34,35] with a distance parameter of 0.4.The jet momentum is determined from the vectorial sum of all particle momenta in the jet, and is reconstructed to within 5-10% of the true momentum over the full range of p T within the detector acceptance, as determined from simulations.An offset correction is applied to jet energies to take into account the contribution from pileup.Jet energy corrections are derived for simulation, and verified using in situ measurements of the energy balance in dijet and photon+jet events.These corrections are applied as a function of the jet p T and η to both data and simulated events [36].
A minimum of six or four jets are each required to have p T > 30 GeV and |η| < 2.5 in the single-lepton or dilepton channel, respectively, with two or more required to be tagged as originating from the hadronization of b quarks (b jets) using the combined secondary vertex (CSV) algorithm [37].A working point ("medium") of the algorithm is chosen to give a misidentification rate of approximately 1% for light-quark and gluon jets, with b tagging efficiencies of 40-75% depending on the kinematic properties of the jet.In the dilepton channel, events are also required to have H T > 500 GeV, where H T is defined as the scalar sum of the p T of all jets.This selection removes a large amount of the tt background, while not significantly reducing the expected number of signal events.
The efficiency of the lepton selection is measured using tag-and-probe techniques [38,39].The simulation is corrected using p T -and η-dependent scale factors of order unity to provide consistency with the data.The distribution of the CSV discriminant in simulation is corrected to match that observed in data [40,41].The relative rates of tt+bb and tt+light-quark events in the simulation are corrected to make them consistent with previous measurements [42].After implementing the complete event selection criteria and all corrections, good consistency in kinematic distributions is found between the data and simulation in both channels.The total number of events selected in the single-electron (muon) channel is 3740 (5600) and 332 events are selected in the dilepton channel.All are consistent with the expected background.Selected events are predominantly (over 97% in the single-lepton channel and 95% in the dilepton channel) expected to be tt+jets, with the remaining fraction from single top, Drell-Yan+jets, and W+jets production.

Multivariate analysis
Two boosted decision trees (BDTs), implemented using the TMVA library [43][44][45], are used to improve the discrimination between signal and background.The principal differences between tttt and tt production are in the jet multiplicity and the number of b jets.These and the associated kinematic variables feature strongly in the choice of BDT input parameters.This is based on the strategy developed for the CMS √ s = 8 TeV analysis, and the training uses simulated backgrounds from tt+jets and tttt production [10].
The first BDT is used to identify combinations of three jets (trijet) consistent with being the product of all-hadronic top quark decays, rather than of other sources such as ISR or FSR.The BDT uses the invariant dijet and trijet masses, b tagging information, and the angles between the three jets as input variables.All possible trijet permutations are ranked according to their BDT discriminant value, from highest to lowest.In the dilepton channel, the tt background contains no hadronic top decays, so the BDT output for the first-ranked trijet (T trijet1 ) is used as the discriminant.In the single-lepton channel, each background event contains a genuine hadronic top quark decay, so the jets included in T trijet1 are removed and the highest-ranked BDT discriminant using the remaining jets (T trijet2 ) is used.
The second BDT, with discriminants D lj tttt for the single-lepton channel and D dil tttt for the dilepton channel, takes the discriminant from the trijet associations as one of its input parameters.Additional inputs are then optimized separately for the two channels and are based on the characteristics of the lepton and jet activity in the events.Not all inputs are used by both analyses.These are grouped into three categories: event activity, event topology, and b quark multiplicity.Although many of these are correlated, each one contributes some additional discrimination between the tt background and the tttt signal.
Comparison of simulated tt and tttt events leads to the selection of the following event activity variables: 1.The number of jets present in the event, N j .
2. Weighted jet multiplicity (N w j ), based on both the jet multiplicity and the p T distribution of the jets.This quantity is sensitive to the differences between the p T spectra of the jets from top quark decays and those originating from gluon radiation, having higher values in events with many high-p T jets than in events where only a few jets have high p T and the rest are close to the selection threshold.It is defined as where the limits of integration are in GeV and N j p T > p th T is the number of jets with p T above a threshold p th T , while p 0 T = 30 GeV, is the p T of the ith jet and p N j +1 T = 125 GeV.The lower limit of 30 GeV is driven by the minimum p T requirements on the jets, while the upper limit of 125 GeV is chosen since above this value, there are few events and increasing the upper limit on p th T would not significantly affect the sensitivity of N w j .
3. The variable H b T , defined as the scalar sum of the p T of all jets that are identified as b jets by the CSV algorithm, applied at its medium working point.

T
) of the H T of the four highest-p T jets in the event in the single-lepton, or the two highest-p T jets in the dilepton channel, to the H T of the other jets in the event.

The quantity H 2m
T , defined as the H T in the event minus the scalar sum of the p T of the two highest-p T b jets.7. The reduced event mass (M h red ), defined as the invariant mass of the system comprising all the jets in the reduced event, where the reduced event is constructed by subtracting the jets contained in T trijet1 in single-lepton events.In tt events, the reduced event will typically only contain the b jet from the semileptonic top quark decay and jets arising from ISR and FSR.Conversely, a reduced tttt event can contain up to two hadronic top quarks and, as a result, numerous energetic jets.

The reduced event H T (H x T )
. This is defined as the H T of all jets in the single-lepton event selection excluding those contained in T trijet1 .
The event topology is characterized by the variables: 1. Event sphericity (S) [46], calculated from all of the jets in the event in terms of the tensor , where α and β refer to the three-components of the momentum of the ith jet.The sphericity is then S = (3/2)(λ 2 + λ 3 ), where λ 2 and λ 3 are the two smallest eigenvalues of S αβ .The sphericity in tttt events should differ from that in background tt events of the same energy, since the jets in tt events will be less isotropically distributed because of their recoil from sources such as ISR.
2. Hadronic centrality (C), defined as the value of H T divided by the sum of the energies of all jets in the event.
Since all the previous variables rely only on the hadronic information in the event, sensitivity to the lepton information is provided through the p T and η of the highest-p T lepton (or only lepton for the single-lepton channel) p 1 T , η 1 and the angular difference (∆R ) between the leptons in dilepton events.Finally, the b jet multiplicity is characterized in terms of the number of b jets tagged by the CSV algorithm operating at its loose [40] (N l tags ) and medium (N m tags ) operating points, and the angular separation ∆R bb between the b-tagged jets with the highest CSV discriminants, and the third-and fourth-highest discriminant values.
The event-level discriminants D lj tttt and D dil tttt are optimized separately, resulting in the choice of different sets of variables.For the single-lepton channel, the optimal variable set, in order of sensitivity, is found to be N j , T trijet2 , H b T , H ratio T , p 1 T , N w j , M h red , H x T , and the third-and fourth-highest CSV discriminants.In the dilepton channel, the optimal variable set, in order of sensitivity, is N j , N w j , S, T trijet1 , N l tags , N m tags , ∆R bb , H b T , p  To further improve the sensitivity of the analyses, the data are split into exclusive jet multiplicity categories.The single-lepton analysis uses categories of N j = 6, 7, 8, and ≥9.The dilepton channel, with fewer events, uses only the 4-5, 6-7, and ≥8 jet categories.A further division into exclusive b jet multiplicities is possible only for the single-lepton analysis, where the N j  categories are subdivided into categories with N m tags = 2, 3, and ≥4. Figure 2 shows D lj tttt in the µ+jets and e+jets channels for two of the most sensitive categories, and Fig. 3 shows the D dil tttt distributions for the dilepton channel.
The distributions of the discriminants D lj tttt and D dil tttt are fitted simultaneously for each N j and N m tags bin.For the single-lepton channel the fit is also performed separately for the µ+jets and e+jets events.In the three dilepton channels (µ + µ − , µ ± e ∓ , e + e − ), the D dil tttt distributions are found to be consistent, and they are combined to improve the statistical precision.In all cases good agreement is observed between the data and the simulated background, and the results from each of the channels are combined to obtain an upper limit on the tttt production cross section.

Sources of systematic uncertainty
The systematic uncertainties affecting the analyses are grouped into normalization and shape categories, depending on their effect on the event-level BDT discriminant distribution.While all normalization uncertainties apply to both the signal and all the background simulations, the shape uncertainties are only considered for the tt background and the tttt signal.These include effects related to the change in normalization owing to changes in the shape.The normalization uncertainties are: 1.An uncertainty of 2.3% [47] in the integrated luminosity.
2. The uncertainty in the theoretical tt cross section dominates the uncertainty in the predicted event yields, since the tt process dominates the selected data samples.This cross section is taken from Ref. [48], and includes uncertainties of +2.5% −3.4% (renormalization and factorization scale) and +6.2% −6.4% (PDF).The effect of uncertainties in the cross sections for the other backgrounds were checked and found to be negligible.
3. The uncertainties from trigger, lepton identification, and lepton isolation corrections, which are included as nuisance parameters in determining the upper limit.Combined, these give an uncertainty of 1.2% in the single-muon channel, 3.7% in the single-electron channel, 4.3% in the µµ channel, 4.6% in the µe channel, and 4.8% in the ee channel.
The shape uncertainties are: 1.The uncertainty from the choice of the factorization and renormalization scales in the calculation of the matrix element of the hard-scattering process, which is estimated by the maximum variation in the D dil tttt or D lj tttt distribution obtained when each scale is changed separately by a factor of 1/2 and 2, excluding unphysical anticorrelated combinations.This procedure is performed separately for the tttt signal and the tt background.In addition, alternative tt samples are used to estimate the impact of a change in the scale at the parton-shower level, taking into account the uncertainty in α S for the hadronization [25].The differences in the distributions with respect to the nominal ones are taken as the uncertainty.The uncertainty in the matrix-element scale is the dominant systematic uncertainty in the analysis.
2. Differences in the simulation of tt from the choice of the matrix-element generator, which is estimated by comparing the nominal tt simulation using POWHEG+PYTHIA 8 to samples generated using MADGRAPH+PYTHIA 8 with MLM matching [16].The difference relative to the nominal simulation is used to estimate the uncertainty from this source.
3. The uncertainty in the fraction of ttbb events in the tt background, which is estimated using the uncertainty in the measured cross section ratio σ ttbb : σ ttjj [42] that was used to correct the ttbb content of the tt simulation.An anticorrelated uncertainty in the measured cross section ratio of (σ ttjj − σ ttbb ) : σ ttjj is applied simultaneously to the light-quark fraction to maintain the total tt cross section.
4. The uncertainties in the jet energy scale and the jet energy resolution [49], which are estimated by varying these within their uncertainties by ±1 standard deviation.A similar method is used to estimate the uncertainty from the inelastic proton-proton cross section and the procedure used in the pileup reweighting.These uncertainties have very little influence on the final limit.
5. The uncertainty in the corrections to the values of the b tagging CSV discriminator, where three categories of systematic uncertainty are applied for each jet flavor: the jet energy scale, purity of the data sample used to derive the corrections, and the statistical uncertainties derived from the fits used in the method.The uncertainty in the b tagging correction caused by the jet energy scale is treated as fully correlated with the jet energy scale uncertainty described above.Typical magnitudes of each of these individual uncertainties on the corrections to the b tagging CSV discriminator are 10-50% before the fit, depending on the number of jets and b jets in the event.A full description of these corrections can be found in Ref. [41].
Each systematic source was attributed a nuisance parameter in the limit determination.

Results
No deviation from the background-only simulation, which includes tt production and negligible single top, tt+H/Z/W boson, Drell-Yan+jets, and W+jets backgrounds, is observed in the D dil tttt or D lj tttt distributions.An upper limit is derived for the tttt production cross section using the asymptotic approximation of the CL s method provided in Refs.[50][51][52][53][54].The signal and background distributions are fitted using a simultaneous maximum-likelihood method.The normalization uncertainties are included using log-normal functions and the shape uncertainties are included as Gaussian-distributed nuisance parameters.The expected and observed 95% CL upper limits from the two analyses and their combination are listed in Table 1.For the combination of the single-lepton and opposite-sign dilepton results, the systematic uncertainties attributed to the integrated luminosity, jet energy scale, and modeling of the pileup contribution are assumed to be fully correlated.All other systematic uncertainties are assumed to be uncorrelated, but taking them as fully correlated does not modify the expected limit.The likelihood function for the single and opposite-sign dilepton limit has 24 nuisance parameters corresponding to the sources of systematic uncertainties that are described above.The combination with the like-sign dilepton analysis, which is described below, adds 19 additional nuisance parameters specific to Ref. [13].The data are able to significantly constrain the parameters corresponding to the ME generator choice and the parton shower scale.All of the post-fit nuisance parameter values were found to be consistent with their initial values to well within their quoted uncertainties.
The combined observed 95% CL upper limits on the cross section for four-top-quark production measured in the single-lepton, dilepton, and combined results are shown in Table 1.To crosscheck the increase in sensitivity of the multivariate approach, the analysis is performed in the single-lepton channel using the same event categorization, but only the H T distributions in place of D lj tttt .The expected limit increases by approximately 20%, thus justifying the use of the more complicated BDT analyses.
CMS has also produced an upper limit on the tttt cross section from the analysis of the likesign dilepton channel [13].The limit from the analysis is also shown in Table 1.To improve sensitivity, the results from this search are combined with the results from that analysis.For this combination, in addition to the assumed correlations described above, the uncertainty in the modeling of the response of the CMS trigger system to dilepton events is treated as correlated between the opposite-sign and like-sign dilepton analyses.A combined upper limit for the SM tttt cross section is listed in Table 1.

Summary
In summary, a search has been performed for events containing four top quarks using data recorded by the CMS experiment in proton-proton collisions at √ s = 13 TeV corresponding to an integrated luminosity of 2.6 fb −1 .The final states considered in the analysis are the single-Table 1: Expected and observed 95% CL upper limits on SM tttt production as a multiple of σ SM tttt and in fb.The results for the two analyses from this paper are shown separately and combined.The result from a previous CMS measurement [13] is also given, along with the overall limits when the three measurements are combined.The values quoted for the uncertainties on the expected limits are the one standard deviation values and include all statistical and systematic uncertainties.

Channel
Expected limit Observed limit Expected limit Observed limit lepton channel with exactly one electron or muon, and the opposite-sign dilepton channel with exactly two of any combination of electrons or muons.A boosted decision tree is used to discriminate between the tttt signal and the tt background, and no signal is observed.This leads to an upper limit on the SM production cross section for tttt of 94 fb (10.2 σ SM tttt ), with an expected limit of 118 + 76 − 41 fb at the 95% confidence level.This result is combined with a previous search [13] with similar sensitivity in the same-sign dilepton channel to obtain an improved limit of 69 fb, with an expected limit of 71 + 38 − 24 fb.This is the most stringent limit on tttt production at √ s = 13 TeV published to date.

Figure 1 :
Figure 1: A representative Feynman diagram for tttt production in the SM at lowest order in QCD.

6 .
The transverse momenta of the jets with the third-and fourth-largest p T in the event (

Figure 2 :
Figure 2: Distribution of the event-level BDT discriminants D lj tttt for the µ+jets (left) and e+jets (right) final states from data and the estimated background contributions from simulation, in the N j ≥ 9 and 3 N m tags (upper panels) and the N j ≥ 9 and ≥4 N m tags categories (lower panels).The vertical bars show the statistical uncertainties in the data.The predicted background distributions from simulation are shown by the shaded histograms The hatched area shows the size of the dominant systematic uncertainty in the simulation, which comes from the matrixelement (ME) factorization and renormalization scales used in the simulation.The expected SM tttt signal contribution is shown by open histogram, multiplied by a factor of 20.

Figure 3 :
Figure 3: Distribution of the event-level BDT discriminants D dil tttt for the combined dilepton (µ + µ − + µ ± e ∓ + e + e − ) event sample for 4-5 jets (upper left), 6-7 jets (upper right), and ≥8 jets (bottom).The vertical bars show the statistical uncertainty in the data.The predicted background distributions from simulation are shown by the shaded histograms.The hatched area shows the size of the dominant systematic uncertainty in the simulation, which comes from the choice of the matrix-element (ME) factorization and renormalization scales used in the simulation.The electroweak (EW) histogram is the sum of the Drell-Yan and W boson+jets backgrounds.The expected SM tttt signal contribution is shown by the open histogram, multiplied by a factor of 20.