1 Introduction

The top quark is the most massive known elementary particle. Given that its Yukawa coupling to the Higgs boson is close to unity, it may play a special role in electroweak symmetry breaking [1, 2]. Studies of top-quark production and decay are major research goals at the LHC, providing both a precise probe of the Standard Model (SM) [3] and a window on physics beyond the Standard Model (BSM) [4]. The LHC supplies a large number of top-quark events to its major experiments, offering an excellent environment for such studies.

In proton–proton collisions, the dominant top-quark production process is pair production via the strong interaction. The measurement of the production cross-section provides a stringent test of QCD calculations with heavy quarks [5], allows a determination of the top-quark mass in a well-defined renormalisation scheme [6, 7], and can be sensitive to potential new physics such as top-quark partners degenerate in mass with the SM top quark [8].

The predicted inclusive \(t\bar{t}\) cross-section at a centre-of-mass energy of \(\sqrt{s} = 8~\hbox {TeV}\), assuming a top-quark mass \(m_{\text {top}} = 172.5~\hbox {GeV}\), is

$$\begin{aligned} \sigma (pp \rightarrow t\bar{t}) \ = \ 253^{+13}_{-15}\ \ \ {{\mathrm {pb}}}. \end{aligned}$$
(1)

It is calculated at next-to-next-to-leading order (NNLO) in QCD including resummation of next-to-next-to-leading logarithmic (NNLL) soft-gluon terms with Top++ (v2.0) [5, 9, 10, 11, 12, 13]. The QCD scale uncertainties are determined as the maximum deviation in the predicted cross-section for the six probed variations following a prescription referred to as independent restricted scale variations. Here the renormalisation scale (\(\mu _{\text {r}}\)) and the factorisation scale (\(\mu _{\text {f}}\)) are varied independently to half the default scale and twice the default scale, omitting the combinations (\(0.5 \mu _{\text {r}}^{\text {def}}\), \(2.0 \mu _{\text {f}}^{\text {def}}\)) and (\(2.0 \mu _{\text {r}}^{\text {def}}\), \(0.5 \mu _{\text {f}}^{\text {def}}\)). The uncertainties due to the parton distribution functions (PDFs) and \(\alpha _{\text {S}} \) are calculated using the PDF4LHC prescription [14], where the uncertainties of the MSTW2008 68% CL NNLO [15, 16], CT10 NNLO [17, 18] and NNPDF 2.3 [19] PDF sets are added in quadrature to the \(\alpha _{\text {S}} \) uncertainty. Comparable results are obtained using a different resummation technique as reported in Refs. [20, 21]. The predicted cross-section’s total scale, PDF and \(\alpha _{\text {S}} \) uncertainty of about 6% sets the current goal for the experimental precision.
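The independent restricted scale variations can be illustrated with a short sketch. It enumerates the six (\(\mu _{\text {r}}\), \(\mu _{\text {f}}\)) multiplier pairs and takes the envelope of a set of varied cross-sections. The function names and the numbers in the usage note are hypothetical and serve only as an illustration; the sketch does not reproduce the Top++ calculation.

```python
from itertools import product


def restricted_scale_variations(factors=(0.5, 1.0, 2.0)):
    """Return the (k_r, k_f) multipliers of the default scales that are probed.

    The nominal point (1, 1) is skipped, as are the two combinations in which
    the scales are varied in opposite directions, (0.5, 2.0) and (2.0, 0.5).
    """
    pairs = []
    for k_r, k_f in product(factors, repeat=2):
        if k_r == 1.0 and k_f == 1.0:
            continue  # nominal choice, not a variation
        if k_r / k_f in (0.25, 4.0):
            continue  # opposite-direction variations are omitted
        pairs.append((k_r, k_f))
    return pairs


def scale_envelope(nominal, varied):
    """Return the maximum upward and downward deviations from the nominal value."""
    up = max(max(v - nominal for v in varied), 0.0)
    down = max(max(nominal - v for v in varied), 0.0)
    return up, down
```

For instance, `scale_envelope(100.0, [90.0, 105.0, 102.0])` gives an upward deviation of 5.0 and a downward deviation of 10.0 for these made-up inputs.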

Measurements of the \(t\bar{t}\) cross-section have been published for several centre-of-mass energies between 1.96 and 13 TeV in \(p\bar{p}\) and pp collisions. At the Tevatron, the uncertainty in the \(t\bar{t}\) cross-section measured by the D0 and CDF collaborations at a centre-of-mass energy of 1.96 TeV is 5.4% [22]. The most precise measurement for a centre-of-mass energy of 8 TeV, with a total uncertainty of 3.2%, was performed by the ATLAS Collaboration in the dilepton channel, where both top quarks decay via \(t \rightarrow \ell \nu b\) [23]. The final-state charged lepton \(\ell \) is either an electron or a muon.\(^{1}\) Further measurements at 7, 8 and 13 TeV in the same final state were published by the ATLAS and CMS collaborations [24, 25, 26]. Additionally, a measurement of the \(t\bar{t}\) production cross-section in the forward region at 8 TeV was published by the LHCb collaboration [27].

The measurement reported in this paper is performed in the lepton+jets final state, where one W boson decays leptonically and the other W boson decays hadronically, i.e.

$$\begin{aligned} t\bar{t}\ \rightarrow W^+ W^- b\bar{b} \rightarrow \ell \nu q\bar{q}' b\bar{b}. \end{aligned}$$

Results are reported for both the full phase space and for a fiducial phase space close to the selected dataset.

Since experimental uncertainties may affect each decay mode differently, it is important to determine whether \(t\bar{t}\) cross-sections measured in different decay modes are consistent with each other. Furthermore, new physics processes can contribute in different ways to the different decay modes.

The analysis is based on data collected at a pp centre-of-mass energy of \(\sqrt{s} = 8~\hbox {TeV}\). The most precise cross-section previously measured in this channel at \(\sqrt{s} = 8~\hbox {TeV}\) was published by the CMS Collaboration and reached an uncertainty of 6.8% [28]. This analysis supersedes the previous measurement from the ATLAS Collaboration, which achieved a total uncertainty of 9.4% using the same dataset [29]. This analysis improves on the previous result by splitting the overall sample of \(t\bar{t}\) candidates into three signal regions and by constraining important sources of systematic uncertainty.

2 ATLAS detector

The ATLAS detector [30] is a multi-purpose particle detector with forward-backward symmetry and a cylindrical geometry.\(^{2}\) The inner detector (ID) tracking system is surrounded by a thin superconducting solenoid magnet, electromagnetic and hadronic calorimeters, and a muon spectrometer (MS) in a magnetic field generated by three superconducting toroidal magnets of eight coils each. The inner detector, in combination with the 2 T magnetic field from the solenoid, provides precision momentum measurements for charged particles within the pseudorapidity range \(\vert \eta \vert < 2.5\). From the interaction point outwards, it consists of a silicon pixel detector and a silicon microstrip detector (together allowing a precise and efficient identification of secondary vertices), complemented with a straw-tube tracker contributing transition radiation measurements to electron identification. The calorimeter system covers the pseudorapidity range \(\vert \eta \vert < 4.9\). A high-granularity liquid-argon (LAr) sampling calorimeter with lead absorbers provides the measurement of electromagnetic showers within \(\vert \eta \vert < 3.2\). In the ID acceptance region, \(\vert \eta \vert < 2.5\), the innermost layer has a fine segmentation in \(\eta \) to allow separation of electrons and photons from \(\pi ^0\) decays and to improve the resolution of the shower position and direction measurements. Hadronic showers are measured by a steel/plastic-scintillator tile calorimeter in the central region, \(\vert \eta \vert < 1.7\), and by a LAr calorimeter in the endcap region, \(1.5< \vert \eta \vert < 3.2\). In the forward region, measurements of electromagnetic and hadronic showers are provided by a LAr calorimeter covering the pseudorapidity range \(3.1< \vert \eta \vert < 4.9\). The MS combines trigger and high-precision tracking detectors, and allows measurements of charged-particle trajectories within \(\vert \eta \vert < 2.7\). The combination of all ATLAS detector subsystems provides charged-particle tracking, along with identification for charged leptons and photons, in the pseudorapidity range \(\vert \eta \vert < 2.5\).

A three-level trigger system is used to select interesting events [31]. A hardware-based first-level trigger uses a subset of detector information to bring the event rate below 75 kHz. Two additional software-based trigger levels together reduce the event rate to about 400 Hz on average, depending on the data-taking conditions.

3 Data and simulated events

This analysis is performed using pp collision data recorded at a centre-of-mass energy of \(\sqrt{s}=8~\hbox {TeV}\), corresponding to the full 2012 dataset. The data-taking periods in which all the subdetectors were operational are considered, resulting in a data sample with an integrated luminosity of \(\mathcal {L}_{\mathrm {int}} = \hbox {20.2 fb}^{-1}\).

Detector and trigger simulations were performed within the GEANT4 framework [32, 33]. The same offline reconstruction methods used on data are applied to the simulated events. Minimum-bias events generated with Pythia 8 [34] were used to simulate multiple pp interactions (pile-up). The distribution of the number of pile-up interactions in the simulation is reweighted according to the instantaneous luminosity spectrum in the data.

Signal \(t\bar{t}\) events were simulated using the Powheg-Box event generator (r3026) [35, 36] with the CT10 PDF set [17]. The renormalisation and factorisation scales in the matrix element calculation were set to the value \(\mu =\sqrt{m_{\text {top}} ^2 + p_{\text {T}} ^2(t)}\) where \(p_{\text {T}} (t)\) is the top-quark transverse momentum, evaluated for the underlying Born configuration, i.e. before radiation. The \(h_{\text {damp}}\) parameter, which controls the transverse momentum, \(p_{\text {T}}\), of the first emission beyond the Born configuration, was set to \(m_{\text {top}}\). The main effect of this is to regulate the high-\(p_{\text {T}}\) gluon emission against which the \(t\bar{t}\) system recoils. Parton shower (PS), hadronisation and the underlying event were simulated with Pythia (v6.428) [37] and the Perugia2011C set of tuned parameters [38].

For systematic studies of the \(t\bar{t}\) process, alternative event generators and variations of the tuned parameter values in Pythia are used. The Powheg-Box event generator, using the same configuration as the nominal sample, interfaced to Herwig (v6.5.20) [39] is used for hadronisation-modelling studies, while MC@NLO (v4.09) [40, 41] interfaced to Herwig is used to study the dependence on the matching method between the next-to-leading-order (NLO) matrix element (ME) generation and the PS evolution. In the case of events showered by Herwig, the Jimmy (v4.31) [42] model with the ATLAS AUET2 [43] set of tuned parameters was used to simulate the underlying event. Variations of the amount of additional radiation are studied using events generated with the Powheg-Box + Pythia event generators after changing the scales in the ME and the scales in the parton shower simultaneously. In these samples, a variation of the factorisation and renormalisation scales by a factor of 2 was combined with the Perugia2012radLo parameters and a variation of both scale parameters by a factor of 0.5 was combined with the Perugia2012radHi parameters [38]. In the second case, the \(h_{\text {damp}} \) parameter was also changed and set to twice the top-quark mass [44].

The associated production of an on-shell W boson and a top quark (Wt), and single top-quark production in the s- and t-channel, were simulated by the Powheg-Box (r2819, r2556) event generator [45, 46] with the CT10 PDF set interfaced to Pythia using the Perugia2011C set of tuned parameters. The Wt process has a predicted production cross-section of 22.3 pb [47], calculated to approximate NNLO accuracy with an uncertainty of 7.6% including scale and PDF uncertainties. The cross-sections for single top-quark production in the s- and t-channel are calculated with the Hathor v2.1 tool [48] to NLO precision, based on work documented in Ref. [49]. Uncertainties from variations of scales used in the ME and PDFs are estimated using the same methodology as for \(t\bar{t}\) production. For t-channel production, this leads to a cross-section of 84.6 pb with a total uncertainty of 4.6%, while for s-channel production a cross-section of 5.2 pb with a total uncertainty of 4.2% is predicted.

All top-quark processes were simulated with a top-quark mass of 172.5 GeV and a width of 1.32 GeV modelled using a Breit–Wigner distribution. The top quark is assumed to decay via \(t \rightarrow Wb\) 100% of the time.

Vector-boson production in association with jets (W/Z+jets) was simulated with Alpgen (v2.14) [50], using the CTEQ6L1 set of PDFs [51]. The partonic events were showered with Pythia using the Perugia2011C set of tuned parameters. Simulated \(W \text {+\,jets}\), \(W+b\bar{b}, W+c\bar{c}, W+c\) and Z+jets, \(Z+b\bar{b}, Z+c\bar{c}\) events with up to five additional partons were produced, and the overlap between the ME and the PS was removed with the “MLM” matching scheme [52]. The double-counting between the inclusive \(W+n\) parton samples and dedicated samples with at least one heavy quark (c- or b-quark) in the ME was removed by vetoing events based on a \(\Delta R\) matching. The cross-sections for inclusive W- and Z-boson production are calculated with NNLO precision using the FEWZ program [53, 54] and are estimated to be 12.1 and 1.13 nb, respectively. The uncertainty is 4%, including the contributions from the PDF and scale variations.

Samples of diboson (VV, \(V=W\) or Z) events were produced using the Sherpa (v1.4.1) event generator [55] with the CT10 PDF set, up to three additional partons in the ME, and a dedicated parton-shower tune developed by the Sherpa authors. The CKKW method [56] was used to remove overlap between partonic configurations generated by the ME or the PS. All three processes are normalised using the inclusive NLO cross-sections provided by MCFM [57], which are 56.8 pb for WW, 7.36 pb for ZZ, and 21.5 pb for WZ production. The total uncertainty for each of the three processes, including scale variations and uncertainties in the PDF, is estimated to be 5%.

4 Event reconstruction

In this analysis, \(t\bar{t}\) candidate events are identified by means of isolated electrons and muons, jets, some of which are possibly b-tagged as likely to contain b-hadrons, and sizeable missing transverse momentum. The definitions of these reconstructed objects, called detector-level objects, together with the corresponding objects reconstructed using only MC event generator information, called particle-level objects, are discussed in this section. The particle-level objects are used to define a fiducial volume.

4.1 Detector-level object reconstruction

Electrons. Electron candidates are reconstructed by matching tracks in the ID to energy deposits (clusters) in the electromagnetic calorimeter [58]. Selected electrons are required to satisfy strict quality requirements in terms of shower shape, track properties and matching quality. Electron candidates are required to be within \(|\eta | < 2.47\), and candidates in the calorimeter barrel–endcap overlap region, \(1.37< |\eta | < 1.52\), are excluded. Electrons from heavy-flavour decays, hadronic jets misidentified as electrons, and photon conversions are the major backgrounds for high-\(p_{\text {T}}\) electrons associated with a W-boson decay. The suppression of these backgrounds is possible via isolation criteria that require little calorimeter activity and a small sum of track \(p_{\text {T}}\) in an \(\eta \)–\(\phi \) cone around the electron. The electromagnetic (EM) calorimeter isolation variable is defined as the scalar sum of the transverse momenta of calorimeter energy deposits within the cone, corrected by subtracting the estimated contributions from the electron candidate and from the underlying event and pile-up contributions. The track isolation variable is defined as the scalar sum of all track transverse momenta within the cone, excluding the track belonging to the electron candidate [59]. Thresholds are imposed on the EM calorimeter isolation variable in a cone of size \(\Delta R = 0.2\) around the electron and on the track isolation in a cone of size \(\Delta R = 0.3\). The isolation requirements imposed on the electron candidates are tuned to achieve a uniform selection efficiency of 90% across electron transverse energy \(E_{\text {T}}\) and pseudorapidity \(\eta \). The electron pseudorapidity is taken from the associated track.

Muons. Muon candidates are reconstructed by combining MS tracks with tracks in the ID, where tracks in the MS and ID are reconstructed independently [60]. The final candidates are required to be in the pseudorapidity region of \(|\eta |<2.5\). A set of requirements on the number of hits in the ID must also be satisfied by muon candidates. An isolation requirement [61] is applied to reduce the contribution of muons from heavy-flavour decays. The isolation variable is defined as the scalar sum of the transverse momenta of all tracks originating from the primary vertex with \(p_{\text {T}}\) above 1 GeV, excluding the one matched to the muon, within a cone of size \(\Delta R_{\text {iso}}=10~\hbox {GeV}/p_{\text {T}} (\mu )\), where \(p_{\text {T}} (\mu )\) is the transverse momentum of the muon. Muon candidates are accepted when the value of the isolation variable divided by \(p_{\text {T}} (\mu )\) is smaller than 0.05.
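The muon isolation requirement with its \(p_{\text {T}}\)-dependent cone can be sketched as follows. This is a minimal illustration and not the ATLAS reconstruction code: the dictionary layout, the function names and the assumption that the track list already contains only primary-vertex tracks other than the muon track are all hypothetical. Momenta are in GeV.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane, with phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def muon_is_isolated(muon, tracks, max_ratio=0.05, min_track_pt=1.0):
    """Apply the relative track-isolation requirement to one muon.

    `tracks` is assumed to hold primary-vertex tracks only, with the
    track matched to the muon already removed.
    """
    cone_size = 10.0 / muon["pt"]  # Delta R_iso = 10 GeV / pT(mu)
    iso = sum(
        t["pt"]
        for t in tracks
        if t["pt"] > min_track_pt
        and delta_r(muon["eta"], muon["phi"], t["eta"], t["phi"]) < cone_size
    )
    return iso / muon["pt"] < max_ratio
```

The cone shrinks with increasing muon \(p_{\text {T}}\): a 50 GeV muon uses a cone of size 0.2, while a 100 GeV muon uses 0.1.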

Jets. Jets are reconstructed using the anti-\(k_{t}\) algorithm [62] implemented in the FastJet package [63] with a radius parameter of 0.4, using topological clusters calibrated with the local cell weighting (LCW) method [64] as inputs to the jet finding algorithm. The energies of the reconstructed jets are calibrated using \(p_{\text {T}}\)- and \(\eta \)-dependent factors that are derived from MC simulation with a residual in situ calibration based on data [65]. In addition, a pile-up correction is applied to both the data and Monte Carlo (MC) events to further calibrate the jet before selection [66]. To reject jets likely to have originated from pile-up, a variable called the jet vertex fraction (\({\mathrm {JVF}}\)) is defined as the ratio of \(\sum {p_{\mathrm {T},i\in \text {PV}}}\) of all tracks in the jet which originate from the primary vertex to the \(\sum {p_{\mathrm {T},i}}\) of all tracks in the jet. Only tracks with \(p_{\text {T}} > 1~\hbox {GeV}\) are considered in the \({\mathrm {JVF}}\) calculation. Jets with \(|\eta |<2.4\) and \(p_{\text {T}} < 50~\hbox {GeV}\) are required to have \(|{\mathrm {JVF}}|> 0.5\).
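A minimal sketch of the \({\mathrm {JVF}}\) definition and the associated jet requirement is given below. The function names and input layout are hypothetical. Returning \(-1\) for jets without any qualifying track is an assumption made here, chosen because it would be consistent with the requirement being placed on \(|{\mathrm {JVF}}|\); the text above does not specify this case.

```python
def jet_vertex_fraction(track_pts, from_primary_vertex, min_track_pt=1.0):
    """Fraction of the summed track pT in a jet that comes from the primary vertex.

    track_pts: transverse momenta (GeV) of the tracks associated with the jet.
    from_primary_vertex: booleans, True if the track originates from the PV.
    """
    total = 0.0
    from_pv = 0.0
    for pt, is_pv in zip(track_pts, from_primary_vertex):
        if pt <= min_track_pt:
            continue  # only tracks above 1 GeV enter the calculation
        total += pt
        if is_pv:
            from_pv += pt
    if total == 0.0:
        return -1.0  # assumption: convention for jets without usable tracks
    return from_pv / total


def passes_jvf(jet_pt, jet_eta, jvf):
    """The requirement applies only to jets with |eta| < 2.4 and pT < 50 GeV."""
    if abs(jet_eta) < 2.4 and jet_pt < 50.0:
        return abs(jvf) > 0.5
    return True
```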

Identification of b-quark jets. One of the most important selection criteria for the analysis of events containing top quarks is the one that identifies jets likely to contain b-hadrons, called b-tagging. Identification of b-jets is based on the long lifetime of b-hadrons, which results in a significant flight path length and leads to reconstructable secondary vertices and tracks with large impact parameters relative to the primary vertex. In this analysis, a neural-network-based algorithm is used at a working point corresponding to a b-tagging efficiency in the simulated \(t\bar{t}\) events of 70%, a c-jet rejection factor of 5 and light-flavour jet rejection factor of 140 [67].

Missing transverse momentum. The missing transverse momentum is a measure of the momentum of the escaping neutrinos. It also includes energy losses due to detector inefficiencies, leading to the mismeasurement of the true transverse energy \(E_{\text {T}} \) of the detected final-state objects. The missing transverse momentum vector, \(\vec {E}_{\text {T}}^{\text {miss}}\), is calculated as the negative vector sum of the transverse momenta of reconstructed and calibrated physics objects, i.e. electrons, muons and jets as well as energy deposits in the calorimeter which are not associated with physics objects [68]. The magnitude of the missing transverse momentum vector is defined as \(E_{\text {T}}^{\text {miss}} = | \vec {E}_{\text {T}}^{\text {miss}} |\).

A procedure to remove overlaps between physics objects is applied, where jets overlapping with identified electron candidates within a cone of size \(\Delta R = 0.2\) are removed from the list of jets, as the jet and the electron are very likely to correspond to the same physics object. In order to ensure the selection of isolated charged leptons, further overlap removals are applied. If electrons are still present with distance \(\Delta R < 0.4\) to a jet, they are removed from the event. Muons overlapping with a jet within \(\Delta R < 0.4\) are discarded from the event.
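The order of the overlap-removal steps matters, which the following sketch makes explicit. It is an illustration under stated assumptions: objects are represented as dictionaries with `eta` and `phi` keys, and the function names are hypothetical.

```python
import math


def delta_r(a, b):
    """Angular distance between two objects carrying 'eta' and 'phi' keys."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)


def remove_overlaps(electrons, muons, jets):
    """Apply the three overlap-removal steps in sequence.

    1. Jets within Delta R < 0.2 of an electron are dropped (same object).
    2. Electrons within Delta R < 0.4 of a surviving jet are dropped.
    3. Muons within Delta R < 0.4 of a surviving jet are dropped.
    """
    jets = [j for j in jets if all(delta_r(j, e) >= 0.2 for e in electrons)]
    electrons = [e for e in electrons if all(delta_r(e, j) >= 0.4 for j in jets)]
    muons = [m for m in muons if all(delta_r(m, j) >= 0.4 for j in jets)]
    return electrons, muons, jets
```

Because the jet list is cleaned first, an electron is only tested against jets that were not already identified as duplicates of an electron.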

4.2 Particle-level object reconstruction

Particle-level objects are defined using stable particles with a mean lifetime greater than \(0.3\cdot 10^{-10}~\hbox {s}\). Selected leptons are defined as electrons, muons or neutrinos originating from the W-boson decay, including those that originate from a subsequent \(\tau \)-lepton decay. Leptons from hadron decays either directly or via a hadronic \(\tau \) decay are excluded. The selected charged lepton is combined with photons within \(\Delta R < 0.1\), which implies that the final four-momentum vector is the vector sum of the associated photons and the original lepton four-vector. Finally the charged lepton is required to have \(p_{\text {T}} > 25~\hbox {GeV}\) and \(|\eta | < 2.5\).

Particle-level jets are reconstructed using the anti-\(k_t\) algorithm with a radius parameter of \(R=0.4\). All stable particles are used for jet clustering, except the selected leptons (electrons, muons or neutrinos) and the photons associated with the charged leptons. This implies that the energy of the particle level b-jet is close to that of the b-quark before hadronisation and fragmentation. Each jet is required to have \(p_{\text {T}} > 25~\hbox {GeV}\) and \(|\eta | < 2.5\).

Events are rejected if a selected lepton is at a distance \(\Delta R < 0.4\) to a selected jet.

The fiducial volume is defined by selecting exactly one electron or muon and at least three particle-level jets. Setting the minimum number of particle-level jets to three minimises the extrapolation uncertainty going from the detector-level volume to the particle-level fiducial volume. In this case the fraction of events which are selected in the detector-level volume and not selected in the particle-level fiducial volume is of the order of 10%.

5 Event selection and classification

This section describes the selection of \(t\bar{t}\) candidate events. The datasets used in this analysis are obtained from single-electron or single-muon triggers. For the electron channel, a calorimeter energy cluster needs to be matched to a track, and the trigger electron candidate is required to have \(E_{\text {T}} > 60~\hbox {GeV}\) or \(E_{\text {T}} > 24~\hbox {GeV}\) with additional isolation requirements [31]. The single-muon trigger [69] requires either an isolated muon with \(p_{\text {T}} >24~\hbox {GeV}\) or a muon with \(p_{\text {T}} >36~\hbox {GeV}\).

Each event is required to have at least one vertex reconstructed from at least five tracks, where the \(p_{\text {T}}\) of each track is above 400 MeV. The vertex with the largest sum of \(p_{\text {T}}^{2}\) of the associated tracks is chosen as the primary vertex. Events containing any jets with \(p_{\text {T}} > 20~\hbox {GeV}\) failing to satisfy quality criteria defined in Ref. [70] are rejected, in order to suppress background from beam–gas and beam-halo interactions, cosmic rays and calorimeter noise.

In order to select \(t\bar{t}\) events in the lepton+jets channel, exactly one electron or muon with \(p_{\text {T}} > 25~\hbox {GeV}\) is required. In addition to the requirements explained in Sect. 4, the \(\Delta R\) value between the reconstructed lepton and the trigger-lepton has to be smaller than 0.15. Events containing an electron candidate and a muon candidate sharing an ID track are discarded.

Furthermore, events must have at least four jets with \(p_{\text {T}} > 25~\hbox {GeV}\) and \(|\eta |<2.5\). At least one of the jets has to be b-tagged. To enhance the fraction of events with a leptonically decaying W boson, events are required to have \(E_{\text {T}}^{\text {miss}} > 25~\hbox {GeV}\) and the transverse mass \(m_{\text {T}}(W)\) of the lepton–\(E_{\text {T}}^{\text {miss}} \) pair is required to be

$$\begin{aligned} m_{\text {T}}(W)= & {} \sqrt{2 p_{\text {T}} (\ell ) \cdot E_{\text {T}}^{\text {miss}} \left[ 1-\cos \left( \Delta \phi \left( \vec {\ell }, \vec {E}_{\text {T}}^{\text {miss}} \right) \right) \right] }\\> & {} 30~\hbox {GeV}, \end{aligned}$$

where \(p_{\text {T}} (\ell )\) is the transverse momentum of the charged lepton and \(\Delta \phi \) is the angle in the transverse plane between the charged lepton and the \(\vec {E}_{\text {T}}^{\text {miss}}\) vector.
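The transverse-mass definition and the two requirements on the leptonic W-boson candidate translate directly into a few lines of code. The sketch below assumes momenta in GeV and azimuthal angles in radians; the function names are hypothetical.

```python
import math


def transverse_mass(lepton_pt, lepton_phi, met, met_phi):
    """m_T(W) of the lepton and missing-transverse-momentum pair."""
    dphi = lepton_phi - met_phi
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(dphi)))


def passes_w_requirements(lepton_pt, lepton_phi, met, met_phi):
    """Require E_T^miss > 25 GeV and m_T(W) > 30 GeV."""
    if met <= 25.0:
        return False
    return transverse_mass(lepton_pt, lepton_phi, met, met_phi) > 30.0
```

A lepton and a missing-momentum vector of 40 GeV each that are back to back in azimuth give \(m_{\text {T}}(W) = 80~\hbox {GeV}\), while a collinear pair gives zero.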

The measurement of the \(t\bar{t}\) cross-section is performed by splitting the selected sample into three disjoint signal regions. These have different sensitivities to the various backgrounds, to the production of additional radiation, and to detector effects.

  • SR1: \(\ge 4\) jets, 1 b-tag

    In this region, events with at least four jets of which exactly one is b-tagged are selected. This region has the highest background fraction of all three signal regions, with \(W \text {+\,jets}\) being the dominant background. This signal region has the highest number of selected events.

  • SR2: 4 jets, 2 b-tags

    In this region, events with exactly four jets of which exactly two are b-tagged are selected. The background is expected to be small in this region and this allows an unambiguous matching of the reconstructed objects to the top-quark decay products. In particular, the two untagged jets are likely to originate from the hadronically decaying W boson. The reconstructed W-boson mass is sensitive to the jet energy scale and to additional radiation.

  • SR3: \(\ge 4\) jets, \(\ge 2\) b-tags (excluding events from SR2)

    In the third region, events are required to have at least four jets with at least two b-tagged jets. Events with exactly four jets and two b-tags are assigned to SR2. This region includes \(t\bar{t}\) events with extra gluon radiation, including \(t\bar{t}\) + heavy-flavour production, and is sensitive to the efficiency of misidentifying c-jets, originating mainly from the \(W\rightarrow cs\) decay, as b-jets. The expected background is the smallest among the signal regions.
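The assignment of a selected event to one of the three disjoint signal regions can be summarised in a small function. The function name and the use of `None` for events outside all regions are choices made for this illustration only.

```python
def signal_region(n_jets, n_btags):
    """Return 'SR1', 'SR2' or 'SR3', or None if the event is not selected."""
    if n_jets < 4 or n_btags < 1:
        return None  # fails the common >= 4 jets, >= 1 b-tag selection
    if n_btags == 1:
        return "SR1"  # >= 4 jets, exactly one b-tag
    if n_jets == 4 and n_btags == 2:
        return "SR2"  # exactly four jets, exactly two b-tags
    return "SR3"  # >= 4 jets, >= 2 b-tags, excluding SR2
```

Every event passing the common selection falls into exactly one region, which is what allows the three distributions to be fitted simultaneously without double counting.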

For the determination of the \(t\bar{t}\) cross-section a discriminating variable in each signal region is defined, as explained in Sect. 7. The number of \(t\bar{t}\) events is extracted using a simultaneous fit of three discriminating-variable distributions, one from each signal region, to data. In order to reduce systematic uncertainties due to the jet energy scale and b-tagging efficiency, their effects on the signal and background distributions are parameterised with nuisance parameters, which are included in the fit.

6 Background modelling and estimation

The dominant background to \(t\bar{t}\) production in the lepton+jets final state is \(W \text {+\,jets}\) production. This analysis uses a sample defined from data to model the shapes of the discriminating-variable distributions for this background, while the normalisation in each signal region is determined in the final fit. The multijet background process, which is difficult to model in the simulation, is also modelled using data and normalised using control regions. All remaining backgrounds are determined using simulated events and theoretical predictions.

The method used to model the \(W \text {+\,jets}\) background shape from data is based on the similarity of the production and decay of the Z boson to those of the W boson.

First, an almost background-free \(Z \text {+\,jets}\) sample is selected in the following way:

  • Events are required to contain exactly two oppositely charged leptons of the same flavour, i.e. \(e^{+} e^{-}\) or \(\mu ^{+} \mu ^{-}\), and

  • the dilepton invariant mass \(m(\ell \ell )\) has to be consistent with the Z-boson mass (\(80< m(\ell \ell ) < 102~\hbox {GeV}\)).

These events are then ‘converted’ into \(W \text {+\,jets}\) events. This is achieved by boosting the leptons from the Z-boson decay into the Z boson’s rest frame, scaling their momenta by the ratio of the boson masses so that they correspond to leptons from a W-boson decay, and boosting the leptons back into the laboratory frame. The scaled lepton momenta are given by

$$\begin{aligned} \vec {p'^{*}}_{\ell _i}=\frac{m_W}{m_Z} \vec {p^{*}}_{\ell _i}, \end{aligned}$$

where \(\vec {p^{*}}_{\ell _i}\) is the momentum vector of lepton i in the Z-boson’s rest frame, \(m_W\) and \(m_Z\) are the masses of the W- and Z-bosons respectively, and \(\vec {p'^{*}}_{\ell _i}\) is the scaled momentum vector of lepton i in the Z-boson’s rest frame.
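The momentum rescaling can be sketched numerically as follows. This is an illustration of the boost, scale and boost-back sequence only, not the analysis code. Four-vectors are stored as (E, px, py, pz) in GeV, the boson mass values are approximate, and recomputing the lepton energy in the rest frame under a massless-lepton assumption is a choice made here that the text does not specify. All function names are hypothetical.

```python
import numpy as np

M_W = 80.4  # GeV, approximate W-boson mass (assumed value)
M_Z = 91.2  # GeV, approximate Z-boson mass (assumed value)


def boost(p4, beta):
    """Lorentz-boost (E, px, py, pz) into a frame moving with velocity beta."""
    p4 = np.asarray(p4, dtype=float)
    beta = np.asarray(beta, dtype=float)
    b2 = float(beta @ beta)
    if b2 == 0.0:
        return p4.copy()
    gamma = 1.0 / np.sqrt(1.0 - b2)
    bp = float(beta @ p4[1:])
    energy = gamma * (p4[0] - bp)
    momentum = p4[1:] + ((gamma - 1.0) * bp / b2 - gamma * p4[0]) * beta
    return np.concatenate(([energy], momentum))


def z_to_w_leptons(lep1, lep2, ratio=M_W / M_Z):
    """Rescale both lepton momenta in the dilepton rest frame and boost back."""
    lep1 = np.asarray(lep1, dtype=float)
    lep2 = np.asarray(lep2, dtype=float)
    z = lep1 + lep2
    beta = z[1:] / z[0]  # velocity of the dilepton system in the lab frame
    converted = []
    for lep in (lep1, lep2):
        rest = boost(lep, beta)  # into the Z-boson rest frame
        rest[1:] *= ratio  # p'* = (m_W / m_Z) p*
        rest[0] = np.linalg.norm(rest[1:])  # assumption: massless lepton
        converted.append(boost(rest, -beta))  # back to the lab frame
    return converted


def invariant_mass(p4a, p4b):
    """Invariant mass of the sum of two four-vectors."""
    s = np.asarray(p4a, dtype=float) + np.asarray(p4b, dtype=float)
    return float(np.sqrt(max(s[0] ** 2 - s[1:] @ s[1:], 0.0)))
```

With this construction the dilepton invariant mass is reduced by the factor \(m_W/m_Z\), while the velocity of the boson in the laboratory frame is left unchanged.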

After this conversion, one of the leptons is randomly chosen to be removed, and the \(\vec {E}_{\text {T}}^{\text {miss}}\) vector is recalculated. Finally, the event selection requirements discussed in Sect. 5 are applied, except for the b-tagging requirement. In the following, this sample is referred to as the \(Z\) to \(W\) sample.

Detailed studies in simulation and in validation regions are performed. As an example, two important variables, discriminating between \(W \text {+\,jets}\) and \(t\bar{t}\) events, are compared between simulated \(W \text {+\,jets}\) events with at least four jets and at least one b-tag and \(Z\) to \(W\) events derived from a simulated \(Z \text {+\,jets}\) sample with at least four jets and no b-tagging requirement. Distributions of these variables are shown in Fig. 1: the aplanarity event-shape variable and the mass of the hadronically decaying top-quark candidate. Details about the top-quark reconstruction are given in Sect. 7. The aplanarity is defined as

$$\begin{aligned} A= & {} \frac{3}{2}\lambda _3, \end{aligned}$$
(2)

where \(\lambda _3\) is the smallest eigenvalue of the sphericity tensor, defined by

$$\begin{aligned} S^{\alpha \beta }= & {} \frac{\sum _i p_i^{\alpha } p_i^{\beta } }{\sum _i\mid p_i \mid ^2 }. \end{aligned}$$

Here, \(\alpha \) and \(\beta \) correspond to the x, y and z momentum components of final-state object i in the event, i.e. the jets, the charged lepton and the reconstructed neutrino (see Sect. 7).
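As an illustration of the definition, the aplanarity can be computed from the three-momenta of the final-state objects with a few lines of NumPy. The function name and input layout are hypothetical.

```python
import numpy as np


def aplanarity(momenta):
    """Aplanarity A = 3/2 * lambda_3 from a list of (px, py, pz) vectors."""
    p = np.asarray(momenta, dtype=float)  # shape (n_objects, 3)
    sphericity = p.T @ p / np.sum(p * p)  # 3x3 tensor S^{alpha beta}
    eigenvalues = np.linalg.eigvalsh(sphericity)  # returned in ascending order
    return 1.5 * eigenvalues[0]  # smallest eigenvalue is lambda_3
```

Since the eigenvalues of the normalised tensor sum to one, the aplanarity ranges from 0 for a planar event to 0.5 for a fully isotropic one.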

Fig. 1 Probability densities of (a) the aplanarity and (b) the mass of the hadronically decaying top-quark candidates for simulated \(W \text {+\,jets}\) events with at least four jets and at least one b-tag and \(Z\) to \(W\) events derived from a simulated \(Z \text {+\,jets}\) sample with at least four jets and no b-tagging requirement. The lower histogram shows the relative difference between the numbers of \(Z\) to \(W\) and \(W \text {+\,jets}\) events in each bin with respect to the number of \(W \text {+\,jets}\) events. The grey error band represents the Monte Carlo statistical uncertainty of the \(W \text {+\,jets}\) sample. Events beyond the x-axis range are included in the last bin.

Residual differences between the shapes of the \(W \text {+\,jets}\) and \(Z\) to \(W\) templates are accounted for as a systematic uncertainty in the analysis. Since the method only provides shape information, the expected number of events for the \(W \text {+\,jets}\) process in the signal regions is obtained from the acceptance of simulated samples using Alpgen + Pythia and normalised using the inclusive NNLO \(W \text {+\,jets}\) cross-section as described in Sect. 3. These numbers are used to define the nominal background yield prior to the fit and used as initial values for the fit in the final statistical analysis.

Multijet events may be selected if a jet is misidentified as an isolated lepton or if the event has a non-prompt lepton that appears to be isolated (these two sources of background are referred to as fake leptons). The normalisation of the multijet background is obtained from a fit to the observed \(E_{\text {T}}^{\text {miss}}\) distribution in the electron channel or to the \(m_{\text {T}}(W)\) distribution in the muon channel in the signal regions. In order to construct a sample of multijet background events, different methods are adopted for the electron and muon channels.

The ‘jet-lepton’ method [71] is used to model the background due to fake electrons using a dijet sample simulated with the Pythia 8 event generator [34]. A jet that resembles the electron has to have \(E_{\text {T}} > 25~\hbox {GeV}\) and be located in the same \(\eta \) region as the signal electrons. The jet energy must have an electromagnetic fraction of between 0.8 and 0.95. The event is accepted if exactly one such jet is found, and if the event passes all other selection requirements as described above, except the one on \(E_{\text {T}}^{\text {miss}}\). The yield of the multijet background in the electron-triggered data sample is then estimated using a binned maximum-likelihood fit to the \(E_{\text {T}}^{\text {miss}}\) distribution using the template determined from the selected events in the dijet simulated sample. In order to improve the modelling of the \(\eta (\ell )\) distribution of the ‘jet-lepton’ model in SR1, the fit is done separately in the barrel region (\(|\eta | \le 1.37\)) and in the endcap region (\(|\eta | > 1.52\)). The fits for SR2 and SR3 are performed inclusively in \(|\eta |\) due to the lower number of selected events.

The ‘anti-muon’ method [71] uses a dedicated selection on data to enrich a sample in events that contain fake muons in order to build a multijet model for muon-triggered events. Some of the muon identification requirements differ from those for signal muon candidates. The calorimeter isolation requirement is inverted, while keeping the total energy loss of the muon in the calorimeters below \(6~\hbox {GeV}\), and the requirement on the impact parameter is omitted. The additional application of all other event selection requirements mentioned in Sect. 5 results in a sample that is highly enriched in fake muons from multijet events, but contains only a small fraction of prompt muons from Z- and W-boson decays. The yield of the multijet background in the muon-triggered data sample is estimated from a maximum-likelihood fit to the \(m_{\text {T}}(W)\) distribution using the template determined from the selected multijet events in the data sample. A different fit observable (\(m_{\text {T}}(W)\)) in the muon channel is used, since it provides a better modelling of the multijet background than the \(E_{\text {T}}^{\text {miss}}\) observable used in the electron channel.

In both methods to obtain the multijet background normalisation, the multijet template is fitted together with templates derived from MC simulation for the \(t\bar{t}\) and \(W \text {+\,jets}\) processes. The \(t\bar{t}\) and \(W \text {+\,jets}\) rate uncertainties, obtained from theoretical cross-section uncertainties, are accounted for in the fitting process in the form of constrained normalisation factors. The rates for \(Z\)+ jets, single-top-quark processes, and VV processes are fixed to the predictions as described in Sect. 3. For the fits in SR2 and SR3, the \(W \text {+\,jets}\) process is fixed as well, since the predicted yield is very small in these signal regions. The resulting fitted rate of \(t\bar{t}\) events is in agreement within the statistical uncertainty with the result of the final estimation of the \(t\bar{t}\) cross-section and therefore does not bias the result. Distributions of the fitted observable, normalised to the fit results, are shown in Fig. 2.

Fig. 2 Observed and simulated (left) \(E_{\text {T}}^{\text {miss}}\) distributions in the electron channel and (right) \(m_{\text {T}}(W)\) distributions in the muon channel, normalised to the result of the binned maximum-likelihood fit, (a) for the barrel region in SR1, (b) in SR1, (c, d) in SR2, and (e, f) in SR3. The hatched error bands represent the uncertainty due to the sample size and the normalisation of the backgrounds. The ratio of observed to predicted (Pred.) number of events in each bin is shown in the lower histogram. Events beyond the x-axis range are included in the last bin

The ‘matrix’ method [71] is used as an alternative method to estimate systematic uncertainties in the multijet background estimate. It provides template distributions and estimates of the number of multijet events in SR1. Differences between the fitting method and the ‘matrix’ method are taken into account as systematic uncertainties, yielding a normalisation uncertainty of 67%. Due to the small number of events for the ‘matrix’ method in SR2 and SR3, an uncertainty of 50% is assigned in these regions, based on comparisons of the rates obtained using alternative methods described in previous analyses [71].

As a result of the above procedure, the fraction of the total background estimated to originate from multijet events for \(E_{\text {T}}^{\text {miss}} > 25~\hbox {GeV}\) and \(m_{\text {T}}(W) > 30~\hbox {GeV}\) is \((5.4\pm 3.0)\%\) in SR1, \((2.6\pm 1.3)\%\) in SR2 and \((1.5\pm 0.8)\%\) in SR3. All other processes, namely \(t\bar{t}\) and single top-quark production, \(Z\)+ jets and VV production, are modelled using simulation samples as described in Sect. 3.

Table 1 summarises the event yields in the three signal regions for the \(t\bar{t}\) signal process and each of the background processes. The yields, apart from the multijet background, are calculated using the acceptance from MC samples normalised using their respective theoretical cross-sections as discussed in Sect. 3. The quoted uncertainties correspond to the statistical uncertainties in the Monte Carlo samples, except in the case of the multijet background where they correspond to the uncertainties in the background estimate.

Table 1 Event yields for the three signal regions. The multijet background and its uncertainty are estimated from the \(E_{\text {T}}^{\text {miss}}\) or \(m_{\text {T}}(W)\) fit to data. All the other expectations are derived using theoretical cross-sections, and their uncertainties are the statistical uncertainties of the corresponding Monte Carlo samples

7 Discriminating observables

In order to further separate the signal events from background events in SR1 and SR3, the output distribution of an artificial neural network (NN) [72, 73] is used. A large number of potential NN input variables are studied for their discriminating power between \(W \text {+\,jets}\) and \(t\bar{t}\) and the compatibility of their distributions between simulated \(W \text {+\,jets}\) events with at least one b-tag and \(Z\) to \(W\) events with no b-tagging requirement. The observables investigated are based on invariant masses of jets and leptons, event shape observables and properties of the reconstructed top quarks.

In SR1 and SR3, the semileptonically decaying top quark is reconstructed. First, the leptonically decaying W boson’s four-momentum is reconstructed from the identified charged lepton’s four-momentum and the \(E_{\text {T}}^{\text {miss}}\) vector, the latter representing the transverse momentum of the neutrino. The unmeasured z-component of the neutrino momentum \(p_z(\nu )\) is inferred by imposing a W-boson mass constraint on the lepton–neutrino system, leading to a two-fold ambiguity. In the case of two real solutions, the one with the lower \(|p_z(\nu )|\) is chosen. In the case of complex solutions, which can occur due to the finite \(E_{\text {T}}^{\text {miss}}\) resolution, a fit is performed that rescales the neutrino \(p_x\) and \(p_y\) such that the imaginary radical vanishes, at the same time keeping the transverse components of the neutrino’s momentum as close as possible to the x- and y-components of \(E_{\text {T}}^{\text {miss}}\). To reconstruct the semileptonically decaying top quark, the four jets with the highest \(p_{\text {T}}\) are selected and the one with the smallest \(\Delta R\) to the charged lepton is chosen to be the b-jet. The semileptonically decaying top quark is then reconstructed by adding the four-momentum of the W boson and the chosen b-jet. The hadronically decaying top quark is reconstructed by adding the four-momenta of the remaining three highest-\(p_{\text {T}}\) jets.
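
The W-boson mass constraint described above reduces to a quadratic equation in \(p_z(\nu )\). The following Python sketch illustrates the real-solution case only. It assumes a massless charged lepton and an illustrative W-boson mass of 80.4 GeV, and the function names are hypothetical rather than taken from the analysis software. The complex case, which the analysis treats with a fit that rescales the neutrino \(p_x\) and \(p_y\), is only flagged here by returning no solution.

```python
import math

M_W = 80.4  # GeV; illustrative W-boson mass (assumed value)


def neutrino_pz_solutions(lepton, met_x, met_y, m_w=M_W):
    """Solve the W-mass constraint for the neutrino p_z.

    lepton: (px, py, pz) of the charged lepton in GeV, treated as massless.
    met_x, met_y: components of the missing transverse momentum in GeV.
    Returns the two real solutions, or an empty list if they are complex.
    """
    px, py, pz = lepton
    e_lep = math.sqrt(px * px + py * py + pz * pz)
    pt2_lep = px * px + py * py
    pt2_nu = met_x * met_x + met_y * met_y
    mu = 0.5 * m_w * m_w + px * met_x + py * met_y
    radicand = mu * mu - pt2_lep * pt2_nu
    if radicand < 0.0:
        # Complex solutions: the analysis rescales the neutrino px and py
        # in a fit; that step is not reproduced in this sketch.
        return []
    root = e_lep * math.sqrt(radicand)
    return [(mu * pz - root) / pt2_lep, (mu * pz + root) / pt2_lep]


def choose_pz(solutions):
    """Pick the solution with the smaller |p_z|, as stated in the text."""
    return min(solutions, key=abs) if solutions else None
```

For a lepton and neutrino generated from a common parent of mass `m_w`, the true neutrino \(p_z\) is one of the two returned values, and `choose_pz` selects the one with the smaller magnitude.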

Table 2 List of the seven input variables of the NN, ordered by their discriminating power

Seven observables are finally chosen as input variables to the NN (see Table 2). The choice was made by studying the correlations between the potential input variables and choosing the ones with small correlations that still provide a good separation between the signal and the background events. The NN infrastructure consists of one input node for each input variable plus one bias node, eight nodes in the hidden layer, and one output node, which gives a continuous output \(o_{\mathrm {NN}}\) in the interval [0, 1]. For the training of the NN, an equal number of simulated \(t\bar{t}\) events and \(Z\) to \(W\) events are used. The training is performed in an inclusive phase space with \(\ge 4\) jets and \(\ge 1\) b-tag to cover the whole phase space and achieve an optimal separation power in both signal regions.
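
The network topology can be illustrated with a minimal forward pass in Python. The sigmoid activation, the random placeholder weights and all names are assumptions made for illustration. The text specifies only the layer sizes, the input bias node and the output range.

```python
import math
import random

N_INPUTS = 7  # input variables (see Table 2)
N_HIDDEN = 8  # nodes in the hidden layer


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_weights(seed=0):
    """Random placeholder weights; a trained network would supply these."""
    rng = random.Random(seed)
    # Each hidden node is connected to the seven inputs plus the bias node.
    hidden = [[rng.uniform(-1.0, 1.0) for _ in range(N_INPUTS + 1)]
              for _ in range(N_HIDDEN)]
    output = [rng.uniform(-1.0, 1.0) for _ in range(N_HIDDEN)]
    return hidden, output


def forward(inputs, hidden, output):
    """Return the discriminant o_NN for one event."""
    if len(inputs) != N_INPUTS:
        raise ValueError("expected %d input variables" % N_INPUTS)
    x = list(inputs) + [1.0]  # append the bias node
    h = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in hidden]
    return sigmoid(sum(w * v for w, v in zip(output, h)))
```

With suitably normalised inputs, `forward` returns a single number between 0 and 1 per event, mirroring the continuous output \(o_{\mathrm {NN}}\).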

The discriminating power of the NN between \(Z\) to \(W\) and \(t\bar{t}\) events can be seen in Fig. 3 for SR1 and SR3.

Fig. 3 Probability densities of the neural-network discriminant \(o_{\mathrm {NN}}\) for the simulated \(t\bar{t}\) signal process and the \(W \text {+\,jets}\) background process derived from data using converted \(Z\)+ jets events (a) for SR1 and (b) for SR3

In SR2, a different distribution is used as discriminant in the final fit. Inspired by measurements of the top-quark mass, where the invariant mass of the two untagged jets m(jj) is frequently utilised to reduce the impact of the jet energy scale (JES) uncertainty [74,75,76,77], this approach is also followed here. The dependency of m(jj) on the JES is shown in Fig. 4a using simulated \(t\bar{t}\) events with modified global JES correction factors. Here the energy of the jets is scaled by \(\pm 4\%\). Additionally, the mean of the m(jj) distribution is sensitive to the amount of additional radiation. A comparison of the mean value of a Gaussian distribution fitted to the m(jj) distribution in the range of \(60~\hbox {GeV}< m(jj) < 100~\hbox {GeV}\) for different generator set-ups is presented in Fig. 4b. It can be seen that the mean value is compatible for different generator set-ups, but varies for different settings of the parameters controlling the initial- and final-state radiation. For these reasons, the m(jj) observable is used as the discriminant in SR2.

Fig. 4 (a) Probability densities of the m(jj) distribution from the \(t\bar{t}\) signal process for three different values of the JES, where events beyond the x-axis range are not shown and the range is restricted to show the peak. (b) Mean value of the fit to the m(jj) distribution using a Gaussian distribution for different signal generator set-ups. The uncertainties shown are statistical only

Finally, the ratio of single to double b-tagged events, i.e. the ratio of events in SR1 to the sum of events in SR2 and SR3, is sensitive to the b-tagging efficiency. The effect of the b-tagging efficiency is parameterised with a nuisance parameter in the final fit. Since only two b-jets are present in \(t\bar{t}\) events, any additional b-tagged jets originate either from heavy-flavour production in the parton shower or from mistagged c-hadrons. The inclusion of events with more than two b-tags in SR3 gives a small sensitivity to heavy-flavour production in the parton shower.
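
This sensitivity can be illustrated with an idealised estimate, which is an assumption made here and not part of the analysis. If both b-jets of a \(t\bar{t}\) event are within the acceptance and are tagged independently with efficiency \(\varepsilon _b\), and mistags are neglected, the probabilities of finding one and two tags are \(2\varepsilon _b(1-\varepsilon _b)\) and \(\varepsilon _b^2\), so that

$$\begin{aligned} \frac{N_{1\text {-tag}}}{N_{2\text {-tag}}} \approx \frac{2\,(1-\varepsilon _b)}{\varepsilon _b}. \end{aligned}$$

In this simplified picture the ratio decreases monotonically with increasing \(\varepsilon _b\), so the observed ratio constrains the efficiency.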

8 Sources and estimation of systematic uncertainties

Several sources of systematic uncertainty affect the \(t\bar{t}\) cross-section measurement. In addition to the luminosity determination, they are related to the modelling of the physics objects, the modelling of \(t\bar{t}\) production and the understanding of the background processes. All of them affect the yields and kinematic distributions (shape of the distributions) in the three signal regions.

8.1 Physics objects modelling

Systematic uncertainties associated with reconstructed jets, electrons and muons, due to residual differences between data and MC simulations after calibration, and uncertainties in corrective scale factors are propagated through the entire analysis.

Uncertainties due to the lepton trigger, reconstruction and selection efficiencies in simulation are estimated from measurements of the efficiency in data using \(Z\rightarrow \ell \ell \) decays. The same processes are used to estimate uncertainties in the lepton momentum scale and resolution, and correction factors and their associated uncertainties are derived to match the simulated distributions to the observed distributions [58,59,60].

The JES uncertainties are derived using information from test-beam data, collision data and simulation. The uncertainty is parameterised in terms of jet \(p_{\text {T}}\) and \(\eta \) [65]. The JES uncertainty is broken down into various components originating from the calibration method, the calorimeter response, the detector simulation, and the set of parameters used in the MC event generator. Furthermore, contributions from the modelling of pile-up effects and from differences between jets induced by b-quarks and those from gluons or light quarks are included. A large uncertainty in the JES originates from the a-priori unknown relative fractions of quark-induced and gluon-induced jets in a generic sample, which are normally assumed to be \((50\pm 50)\%\). In this analysis, the actual fraction of gluon-induced jets is estimated in simulated events, which leads to a reduction in the uncertainty of these components by half. The fraction of gluon-induced jets is obtained considering all selected jets apart from b-jets, and it is between 15% and 30%, depending on the \(p_{\text {T}}\) and \(\eta \) of the jet. The uncertainty in this fraction is estimated by comparing different \(t\bar{t}\) samples, namely Powheg-Box + Pythia, Powheg-Box + Herwig, and MC@NLO + Herwig as well as samples with varied scale settings in the Powheg-Box + Pythia set-up. To estimate the systematic uncertainty of the JES, a parameterisation with 25 uncorrelated components is used, as described in Ref. [65]. For the purpose of the extraction of the \(t\bar{t}\) cross-section, a single correction factor for the JES is included in the fit as a nuisance parameter (see Sect. 9). In this procedure, the dependence of the acceptance and the shape of the m(jj) template distribution on the JES is parameterised using the global JES uncertainty correction factor corresponding to the total JES uncertainty. Figure 5 shows the effect of a \(\pm 1 \sigma \) change in the global JES correction factor on the m(jj) distribution. When estimating the systematic uncertainty in the \(t\bar{t}\) cross-section due to the JES in the statistical procedure, all 25 components are considered and evaluated as described in Sect. 9.

Fig. 5 Probability density of the m(jj) distribution from simulated \(t\bar{t}\) events in SR2 for the nominal JES and the \(\pm 1 \sigma \) variation. The lower histogram shows the relative difference between the numbers of \(t\bar{t}\) events for the \(\pm 1 \sigma \) JES and the nominal JES in each bin with respect to the nominal JES. The grey error band represents the statistical uncertainty of the sample. Events beyond the x-axis range are included in the last bin

Smaller uncertainties originate from modelling of the jet energy resolution [78, 79] and missing transverse momentum [68] to account for contributions from pile-up, soft jets, and calorimeter cells not matched to any jets. Uncertainties from the scale and resolution corrections for leptons and jets are propagated into the calculation of the missing transverse momentum as well. The effect of uncertainties associated with the \({\mathrm {JVF}}\) is also considered for each jet.

Since the analysis makes use of b-tagging, the uncertainties in the b-tagging efficiencies and the c-jet and light-jet mistag probabilities are taken into account [80, 81]. Correction factors applied to simulated events compensate for differences between data and simulation in the tagging efficiency for b-jets, c-jets and light-flavour jets. The correction for b-jets is derived from \(t\bar{t}\) events in the dilepton channel and dijet events, and is found to be consistent with unity with uncertainties at the level of a few percent over most of the jet \(p_{\text {T}}\) range [81]. Similar to the JES, the uncertainty in the correction factor of the b-tagging efficiency is included as a nuisance parameter in the fit for the extraction of the \(t\bar{t}\) cross-section. The parameterisation of the correction factor is obtained from the total uncertainty in the b-tagging efficiency.

8.2 Signal Monte Carlo modelling and parton distribution functions

Systematic effects from MC modelling are estimated by comparing different event generators and varying parameters for the event generation of the signal process.

The uncertainty from renormalisation and factorisation scale variations and from the amount of additional radiation in the parton shower is estimated using the Powheg-Box event generator interfaced to Pythia by varying these scales and using alternative sets of tuned parameters for the parton shower as described in Sect. 3. Systematic effects due to the matching of the NLO matrix-element calculation and the parton shower for \(t\bar{t}\) are estimated by comparing MC@NLO with Powheg-Box, both interfaced to the Herwig parton shower. An uncertainty related to the modelling of the parton shower, hadronisation effects and the underlying event is estimated by comparing samples produced with Powheg-Box + Herwig and Powheg-Box + Pythia. More details about these samples are given in Sect. 3.

Systematic uncertainties related to the PDF set are taken into account for the signal process. The uncertainty is calculated following the PDF4LHC recommendation [82] using the PDF4LHC15_NLO PDF set. In addition, the acceptance difference between PDF4LHC15_NLO and CT10 is considered, since the latter PDF set is not covered by the uncertainty obtained with PDF4LHC15_NLO and it is used in the simulation of \(t\bar{t}\) events. This uncertainty is used in the final results, since it is larger than the uncertainty obtained with PDF4LHC15_NLO.

Finally, the statistical uncertainty of the MC samples as well as the \(Z\) to \(W\) data sample is included.

8.3 Background normalisation for non-fitted backgrounds

Uncertainties in the normalisation of the non-fitted backgrounds, i.e. single-top-quark, VV, and \(Z\)+ jets events, are estimated using the uncertainties in the theoretical cross-section predictions. In the case of \(Z\)+ jets, an uncertainty of 24% per additional jet is added to the uncertainty of the inclusive cross-section in quadrature, leading to a total uncertainty of 48% for events with four jets. The uncertainty in the multijet background is obtained in SR1 from the comparison between the fitting method and the ‘matrix’ method as detailed in Sect. 6. For the other two regions, an uncertainty of 50% is used.
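
As a cross-check of the quoted \(Z\)+ jets number, and assuming that the per-jet term dominates over the uncertainty of the inclusive cross-section (whose value is not restated here), adding 24% in quadrature for each of four jets gives

$$\begin{aligned} \sqrt{4 \cdot (24\%)^2} = 2 \cdot 24\% = 48\%. \end{aligned}$$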

8.4 Background modelling

Uncertainties in the shape of the \(W \text {+\,jets}\) and multijet backgrounds are taken into account for the discriminating observables used in the analysis. For the \(W \text {+\,jets}\) background, shape uncertainties are extracted from the differences between Z-boson and W-boson production. Although their production modes are very similar, differences exist in the details of the production and decay. There are differences in heavy-flavour production and in the helicity structures of the decay vertices. Shape variations are built from a comparison of the NN discriminant and the m(jj) distribution between simulated \(W \text {+\,jets}\) events, described in Sect. 3, and \(Z\) to \(W\) events derived from a simulated \(Z\)+ jets sample. The uncertainty in the multijet background kinematics is estimated from the differences between the predictions from the ‘jet-lepton’ or ‘anti-muon’ method and the ‘matrix’ method in SR1.

8.5 Luminosity

The absolute luminosity scale is derived from beam-separation scans performed in November 2012. The uncertainty in the integrated luminosity is 1.9% [83].

8.6 Beam energy

The beam energy of the LHC was determined to be 4 TeV from the LHC magnetic model together with measurements of the revolution frequency difference of proton and lead-ion beams, with an uncertainty of 0.1% [84]. The impact of the uncertainty of the beam energy on the measured cross-section is negligible.

9 Extraction of the \(t\bar{t}\) cross-section

The measured inclusive cross-section is given by

$$\begin{aligned} \sigma _{\mathrm {inc}} = \frac{\hat{\nu }}{\epsilon \cdot \mathcal {L}_{\mathrm {int}}} = \frac{\hat{\beta } \cdot \nu }{\epsilon \cdot \mathcal {L}_{\mathrm {int}}} \quad \text {with} \ \epsilon = \frac{N_{\mathrm {sel}}}{N_{\mathrm {total}}}, \end{aligned}$$
(3)

where \(\hat{\nu }\) is the observed number of signal events. The quantity \(\epsilon \) is the total event-selection efficiency, \(N_{\mathrm {total}}\) is the number of events obtained from a simulated signal sample before applying any requirement and \(N_{\mathrm {sel}}\) is the number of events obtained from the same simulated signal sample after applying all selection requirements. In practice, \(\hat{\nu }\) is given by \(\hat{\beta } \cdot \nu \), where \(\hat{\beta }\) is an estimated scale factor obtained from a binned maximum-likelihood fit and \(\nu = \epsilon \cdot \sigma _{\mathrm {theo}} \cdot \mathcal {L}_{\mathrm {int}}\) is the expected number of events for the signal process. The reference cross-section \(\sigma _{\mathrm {theo}}\) is defined by the central value of the theoretical prediction given in Eq. (1). By combining Eq. (3) with the expected number of events, one obtains:

$$\begin{aligned} \sigma _{\mathrm {inc}} = \hat{\beta } \cdot \sigma _{\mathrm {theo}}. \end{aligned}$$

The fiducial cross-section is given by

$$\begin{aligned} \sigma _{\mathrm {fid}} = A_{\mathrm {fid}}\cdot \sigma _{\mathrm {inc}} \quad \text {with} \ A_{\mathrm {fid}} = \frac{N_{\mathrm {fid}}}{N_{\mathrm {total}}}, \end{aligned}$$

with \(N_{\mathrm {fid}}\) being the number of events obtained from a simulated signal sample after applying the particle-level selection. Here \(A_{\mathrm {fid}}\) is defined for an inclusive \(t\bar{t}\) sample, including all decay modes of the W bosons. Using Eq. (3), the fiducial cross-section can be written as:

$$\begin{aligned} \sigma _{\mathrm {fid}} = \frac{\hat{\nu }}{\epsilon ^{\prime } \cdot \mathcal {L}_{\mathrm {int}}} \quad \text {with}\ \epsilon ^{\prime } = \frac{N_{\mathrm {sel}}}{N_{\mathrm {fid}}}. \end{aligned}$$
(4)

From Eq. (4) it is apparent that signal modelling uncertainties that affect \(N_{\mathrm {sel}}\) and \(N_{\mathrm {fid}}\) in a similar way give a reduced uncertainty in \(\sigma _{\mathrm {fid}}\) compared to that in \(\sigma _{\mathrm {inc}}\).

The binned maximum-likelihood fit is performed simultaneously in the three signal regions defined in Sect. 5. For SR1 and SR3 the distribution used in the fit is the NN discriminant, while the invariant-mass distribution m(jj) of the two untagged jets is used in SR2. Electron- and muon-triggered events are combined in the templates used in this fit.

Scale factors \(\beta ^{t\bar{t}}\) and \(\beta ^{W_j}\) for the signal and \(W \text {+\,jets}\) background, respectively, and two nuisance parameters \(\delta _i\), namely the b-tagging efficiency correction factor \(\delta _{b\text {-tag}}\) and the JES correction factor \(\delta _{\mathrm {JES}}\), are fitted in all three signal regions simultaneously. The \(\delta _i\) are defined such that 0 corresponds to the nominal value and \(\pm 1\) to a deviation of \(\pm 1 \sigma \) of the corresponding systematic uncertainty.

In order to account for differences in the flavour composition of the \(W \text {+\,jets}\) background, two uncorrelated scale factors are used: one in SR1 (\(\beta ^{W_1}\)) and one in the two other signal regions (\(\beta ^{W_{2,3}}\)). The event yields of the other backgrounds are not allowed to vary in the fit, but instead are fixed to their predictions. The likelihood function is given by the product of the Poisson likelihoods in the M individual bins of the histograms. A Gaussian prior is incorporated into the likelihood function to constrain \(\delta _{b\text {-tag}}\) within the associated uncertainty:

$$\begin{aligned}&L(\beta ^{t\bar{t}}, \beta ^{W_1}, \beta ^{W_{2,3}} ,\delta _{b\text {-tag}},\delta _{\mathrm {JES}})\\&\quad =\prod _{k=1}^M \frac{\mathrm {e}^{-\mu _k}\cdot \mu _k^{n_k}}{n_k!} \;\cdot \; G(\delta _{b\text {-tag}}; 0, 1) \end{aligned}$$

with

$$\begin{aligned} \mu _k= & {} \beta ^s \cdot \nu _{s} \cdot \alpha ^s_k + \sum _{j=1}^2 \beta ^{W_j} \cdot \nu _{W_j} \cdot \alpha ^{W_j}_k + \sum _{b=1}^4 \nu _{b} \cdot \alpha ^{b}_k, \\ \beta ^s= & {} \beta ^{t\bar{t}} \left\{ 1 + \sum _{i=1}^2 |\delta _i| \cdot \left( H(\delta _i)\cdot \epsilon _{i+} + H(-\delta _i)\cdot \epsilon _{i-} \right) \right\} , \\ \alpha ^s_k= & {} \alpha ^{t\bar{t}}_k + \sum _{i=1}^2 |\delta _i| \cdot \left\{ (\alpha _{ki}^+ - \alpha _{k})\cdot H(\delta _i) + (\alpha _{ki}^- - \alpha _{k})\cdot H(-\delta _i) \right\} . \end{aligned}$$

The expected number of events \(\mu _k\) in bin k is the sum of the expected number of events for the signal and the background processes. These are given by the product of the predicted number of events \(\nu _p\) of each process and the fraction of events \(\alpha ^{p}_k\) in bin k of the normalised distribution. Here p denotes the signal s and background processes \(W_j\) and b, where b represents the background processes which are not varied in the fit. The number of events observed in bin k is denoted by \(n_k\). For the \(t\bar{t}\) signal, the scale factor \(\beta ^s\) contains the acceptance uncertainties for positive \(\epsilon _{i+}\) and negative \(\epsilon _{i-}\) variations of the two profiled systematic uncertainties, multiplied by their nuisance parameter \(\delta _i\). The symbol H denotes the Heaviside function. The signal template shape for each profiled systematic variation is calculated by interpolating in each bin k between the standard template \(\alpha _{k}\) and the systematically altered histograms \(\alpha _{ki}^{\pm }\) using the nuisance parameter \(\delta _i\) as a weight. Linearity and closure tests are done to validate the statistical procedure.
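
The ingredients of the likelihood can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code: all function and argument names are hypothetical, and the signal template is assumed to be the nominal template plus the \(|\delta _i|\)-weighted shifts, so that it reduces to the nominal shape at \(\delta _i = 0\), as the interpolation described above implies.

```python
import math


def heaviside(x):
    """Step function H(x): 1 for x > 0, otherwise 0."""
    return 1.0 if x > 0 else 0.0


def signal_scale(beta_tt, deltas, eps_plus, eps_minus):
    """beta^s: signal scale factor including the acceptance shifts."""
    shift = sum(abs(d) * (heaviside(d) * ep + heaviside(-d) * em)
                for d, ep, em in zip(deltas, eps_plus, eps_minus))
    return beta_tt * (1.0 + shift)


def signal_shape(alpha_nom, deltas, alpha_plus, alpha_minus):
    """alpha^s_k: nominal template plus interpolated shifts (assumed additive)."""
    shape = []
    for k, a in enumerate(alpha_nom):
        shift = sum(abs(d) * ((alpha_plus[i][k] - a) * heaviside(d)
                              + (alpha_minus[i][k] - a) * heaviside(-d))
                    for i, d in enumerate(deltas))
        shape.append(a + shift)
    return shape


def expected_counts(beta_s, nu_s, alpha_s, beta_w, nu_w, alpha_w, nu_b, alpha_b):
    """mu_k: signal, scaled W+jets and fixed backgrounds in each bin."""
    mu = []
    for k in range(len(alpha_s)):
        m = beta_s * nu_s * alpha_s[k]
        m += sum(b * n * a[k] for b, n, a in zip(beta_w, nu_w, alpha_w))
        m += sum(n * a[k] for n, a in zip(nu_b, alpha_b))
        mu.append(m)
    return mu


def negative_log_likelihood(n_obs, mu, delta_btag):
    """-ln L: Poisson terms plus the unit-Gaussian constraint on delta_btag."""
    poisson = sum(m - n * math.log(m) + math.lgamma(n + 1.0)
                  for n, m in zip(n_obs, mu))
    gauss = 0.5 * delta_btag ** 2 + 0.5 * math.log(2.0 * math.pi)
    return poisson + gauss
```

A minimiser of choice would then vary the scale factors and the two \(\delta _i\) to find the minimum of `negative_log_likelihood`. Only \(\delta _{b\text {-tag}}\) enters the Gaussian term, in line with the likelihood function given above.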

The fit found the minimum of the negative log-likelihood function for the parameter values shown in Table 3. The estimators for the nuisance parameters, which parameterise their optimal shift relative to the default value 0 in terms of their uncertainty, are found to be \(\hat{\delta } = 0.62\pm 0.09\) for the b-tagging efficiency correction factor and \(\hat{\delta } = 0.68\pm 0.07\) for the JES correction factor. This deviation of the b-tagging efficiency correction factor from the nominal value of the simulated sample corresponds to a shift of the acceptance of 1% in SR1 and of 2.6% in SR2 and SR3. The deviation for the JES correction factor corresponds to a shift of the acceptance of 2.9% in SR1, of 1.4% in SR2, and of 4.4% in SR3. The deviation of the JES correction factor also potentially accounts for differences in the modelling of additional radiation between the different MC event generator set-ups. Finally, the fitted scale factor of the \(W \text {+\,jets}\) process in SR2 and SR3 yields a value significantly higher than the one predicted by MC simulation, consistent with previous measurements indicating an underestimate of heavy-flavour production in the simulation [85].

Table 3 Result of the maximum-likelihood fit to data. Estimators of the parameters of the likelihood function, the scale factor \(\hat{\beta }\) for the \(t\bar{t}\) and the two \(W \text {+\,jets}\) channels and the derived contributions of the various processes to the three signal regions are listed. Only the statistical uncertainties obtained from the maximum-likelihood fit are shown for \(t\bar{t}\) and \(W \text {+\,jets}\), while the normalisation uncertainties are quoted for the other processes

The signal and background templates scaled and morphed to the fitted values of the fit parameters are compared to the observed distributions of the NN discriminant in SR1 and SR3 and the m(jj) distribution in SR2, shown in Fig. 6. Comparisons of the data and the fit results are shown for the three most discriminating input variables of the NN in Fig. 7 for SR1 and for SR3.

Fig. 6 Neural network discriminant \(o_{\mathrm {NN}}\) or the m(jj) distribution normalised to the result of the maximum-likelihood fit for (a) SR1, (b) SR2, and (c) SR3. The hatched error bands represent the post-fit uncertainty. The ratio of observed to predicted (Pred.) number of events in each bin is shown in the lower histogram. Events beyond the x-axis range are included in the last bin

Fig. 7 Distributions of the three most discriminating NN input variables for (left) SR1 and (right) SR3. The signal and backgrounds are normalised to the result of the maximum-likelihood fit: (a, b) smallest invariant mass between jet pairs, (c, d) cosine of the angle between the hadronic-top-quark momentum and the beam direction in the \(t\bar{t}\) rest frame, and (e, f) mass of the reconstructed semileptonically decaying top quark. The hatched error bands represent the post-fit uncertainty. The ratio of observed to predicted (Pred.) number of events in each bin is shown in the lower histogram. Events beyond the x-axis range are included in the last bin

The systematic uncertainties in the cross-section measurements are estimated using pseudo-experiments. In each of these experiments, the detector effects, background contributions and model uncertainties are varied within their systematic uncertainties. They impact the yields of the processes and shapes of the template distributions used to create the pseudo-datasets in the three signal regions. Correlations between rate and shape uncertainties for a given component are taken into account. The entire set of pseudo-experiments can thereby be interpreted as a replication of the sample space of all systematic variations. By measuring the \(t\bar{t}\) cross-section, an estimator of the probability density of all possible outcomes of the measurement is obtained. The RMS of this estimator distribution is itself an estimator of the observed uncertainties. Using the measured \(t\bar{t}\) cross-section and the estimated nuisance parameters, the uncertainty of the actual measurement is estimated.
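
The idea of reading the uncertainty off the spread of pseudo-experiments can be illustrated with a toy in Python. The linear response of the result to each source, the function name and the numbers are assumptions made for illustration only. In the analysis itself, each pseudo-dataset is built from varied yields and template shapes and is then fitted.

```python
import math
import random


def pseudo_experiment_rms(nominal, impacts, n_experiments=20000, seed=1):
    """RMS spread of a toy measurement under Gaussian systematic shifts.

    impacts: hypothetical +1 sigma shift of the result for each source.
    A linear response stands in for the full template fit.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_experiments):
        # Draw one unit-Gaussian shift per systematic source.
        shifts = [rng.gauss(0.0, 1.0) for _ in impacts]
        results.append(nominal + sum(s * i for s, i in zip(shifts, impacts)))
    mean = sum(results) / len(results)
    return math.sqrt(sum((r - mean) ** 2 for r in results) / len(results))
```

For uncorrelated sources with a linear response, the RMS approaches the quadrature sum of the individual impacts. For example, impacts of 3 pb and 4 pb give a spread close to 5 pb.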

The total uncertainty in both the inclusive and the fiducial \(t\bar{t}\) cross-section is presented in Table 4 and is estimated to be \(5.7\%\) for the inclusive measurement and \(4.5\%\) for the fiducial measurement. The breakdown of the contributions from individual systematic uncertainties, or categories of them, is also listed. In this case, only the considered source or group of sources is varied in the generation of the pseudo-experiments. The largest uncertainty in the inclusive measurement is due to the uncertainty in the PDF sets and the MC modelling of the signal process. The effects of uncertainties in the JES and the b-tagging efficiency have been significantly reduced by including them as nuisance parameters together with the chosen signal regions and discriminant distributions. Furthermore, the uncertainty due to additional radiation is reduced by a factor of three thanks to the inclusion of the m(jj) distribution in the analysis. For the fiducial cross-section measurement, the uncertainties in the MC modelling and PDF sets are reduced. The uncertainty in the \(t\bar{t}\) cross-section due to the PDF sets is largest for events which are produced in the forward direction, i.e. one initial gluon has a high Bjorken-x value. The PDFs for high-x gluons have large uncertainties in all current PDF sets. Selecting events in the fiducial volume reduces the fraction of such events significantly and therefore the uncertainty is reduced significantly as well. In the case of the MC modelling, the uncertainty due to additional radiation is reduced more than the parton-shower and NLO-matching uncertainties, since varying the amount of radiation leads to similar changes in the selection efficiencies of the fiducial and reconstructed volumes and therefore to smaller uncertainties in the \(t\bar{t}\) cross-section.

Table 4 Breakdown of relative uncertainties in the measured inclusive and fiducial \(t\bar{t}\) cross-sections. The total uncertainties contain all considered uncertainties

10 Results

After performing a binned maximum-likelihood fit to the NN discriminant distributions and the m(jj) distribution, and estimating the total uncertainty, the inclusive \(t\bar{t}\) cross-section is measured to be:

$$\begin{aligned} \sigma _{\text {inc}}(t\bar{t}) = 248.3 \pm 0.7 \, ({\mathrm {stat.}}) \pm 13.4 \, ({\mathrm {syst.}}) \pm 4.7 \, ({\mathrm {lumi.}}) \ {\mathrm {pb}} \end{aligned}$$

assuming a top-quark mass of \(m_{\text {top}} = 172.5~\hbox {GeV}\).

The fiducial cross-section measured in the fiducial volume defined in Sect. 4.2 with acceptance \(A_{\text {fid}}=19.6\%\) is:

$$\begin{aligned} \sigma _{\text {fid}}(t\bar{t}) = 48.8 \pm 0.1 \, ({\mathrm {stat.}}) \pm 2.0 \, ({\mathrm {syst.}}) \pm 0.9 \, ({\mathrm {lumi.}})~\text {pb}. \end{aligned}$$
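
As a simple consistency check, the three quoted components can be combined in quadrature and compared with the total relative uncertainties of 5.7% and 4.5% given in Sect. 9. The short Python sketch below assumes that the statistical, systematic and luminosity components are uncorrelated. The function name is illustrative.

```python
import math


def relative_total_uncertainty(value, stat, syst, lumi):
    """Quadrature sum of the three components, relative to the central value."""
    return math.sqrt(stat ** 2 + syst ** 2 + lumi ** 2) / value


# Central values and uncertainties in pb, taken from the two results above.
inclusive = relative_total_uncertainty(248.3, 0.7, 13.4, 4.7)
fiducial = relative_total_uncertainty(48.8, 0.1, 2.0, 0.9)
print(f"inclusive: {100 * inclusive:.1f}%  fiducial: {100 * fiducial:.1f}%")
```

Under this assumption the combination reproduces the quoted totals of 5.7% and 4.5% after rounding.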

The dependence of the inclusive \(t\bar{t}\) cross-section measurement on the assumed value of \(m_{\text {top}} \) is mainly due to acceptance effects and can be expressed by the function:

$$\begin{aligned} \sigma _{t\bar{t}}(m_{\text {top}}) = \sigma _{t\bar{t}}(172.5~\hbox {GeV}) + p_1 \cdot \Delta m_{\text {top}} + p_2 \cdot \Delta m_{\text {top}} ^{2}, \end{aligned}$$

with \(\Delta m_{\text {top}} = m_{\text {top}}- 172.5~\hbox {GeV}\). The parameters \(p_1=-2.07 \pm 0.07~\hbox {pb}/\hbox {GeV}\) and \(p_2=0.07 \pm 0.02~\hbox {pb}/\hbox {GeV}^2\) are determined using dedicated signal samples with different \(m_{\text {top}} \) values, where signal template distributions are obtained from the alternative samples and the fit to data is repeated.
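The parameterisation can be evaluated directly to see how the measured inclusive cross-section would shift for a different assumed top-quark mass. The sketch below uses only the central values of \(p_1\) and \(p_2\) and the measured inclusive cross-section at \(172.5~\hbox {GeV}\); the uncertainties in the parameters are ignored, and the function name is illustrative.

```python
M_REF = 172.5      # GeV, reference top-quark mass
SIGMA_REF = 248.3  # pb, measured inclusive cross-section at M_REF
P1 = -2.07         # pb/GeV, central value of the linear coefficient
P2 = 0.07          # pb/GeV^2, central value of the quadratic coefficient

def sigma_ttbar(m_top):
    """Inclusive cross-section in pb for an assumed top-quark mass in GeV."""
    delta = m_top - M_REF
    return SIGMA_REF + P1 * delta + P2 * delta ** 2

for m_top in (171.5, 172.5, 173.5):
    print(f"m_top = {m_top:.1f} GeV -> sigma = {sigma_ttbar(m_top):.2f} pb")
```

With these central values, raising the assumed mass by \(1~\hbox {GeV}\) lowers the measured cross-section by about 2.0 pb, i.e. by roughly 0.8%.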

A combination of the cross-section in this channel with the more precise result in the dilepton channel [86] was tested. The central values of the two results are consistent within 0.2%, but due to the higher precision of the dilepton result, the combination yielded only a marginal improvement.

11 Conclusions

A measurement of both the inclusive and fiducial \(t\bar{t}\) cross-sections in pp collisions at \(\sqrt{s} = 8~\hbox {TeV}\) in the lepton+jets channel is presented using data collected in 2012 with the ATLAS detector at the LHC, corresponding to an integrated luminosity of \(20.2~\hbox {fb}^{-1}\).

To reduce the major uncertainties arising from the jet energy scale and the b-tagging efficiency, the analysis splits the selected data sample into three disjoint signal regions with different numbers of b-tagged jets and different jet multiplicities. Using an artificial neural network, the separation between the signal and background processes is improved compared to using single observables. Additionally, the analysis uses a data-driven approach to model the dominant \(W \text {+\,jets}\) background, which is derived from collision data by converting \(Z \text {+\,jets}\) candidate events into a \(W \text {+\,jets}\) sample.

The \(t\bar{t}\) cross-section is determined using a binned maximum-likelihood fit to the three signal regions, constraining correction factors for the jet energy scale and the b-tagging efficiency. The inclusive \(t\bar{t}\) cross-section is measured with a precision of 5.7% to be:

$$\begin{aligned} \sigma _{\text {inc}}(t\bar{t}) = 248.3 \pm 0.7 \, ({\mathrm {stat.}}) \pm 13.4 \, ({\mathrm {syst.}}) \pm 4.7 \, ({\mathrm {lumi.}}) \ {\mathrm {pb}} \end{aligned}$$

assuming a top-quark mass of \(m_{\text {top}} = 172.5~\hbox {GeV}\).

The fiducial cross-section is measured with a precision of 4.5% to be:

$$\begin{aligned} \sigma _{\text {fid}}(t\bar{t}) = 48.8 \pm 0.1 \, ({\mathrm {stat.}}) \pm 2.0 \, ({\mathrm {syst.}}) \pm 0.9 \, ({\mathrm {lumi.}})~\text {pb}. \end{aligned}$$

This result is a significant improvement on the previous ATLAS measurement at \(\sqrt{s} = 8~\hbox {TeV}\) in the lepton+jets channel and is in agreement with measurements of the inclusive \(t\bar{t}\) cross-section in other decay modes and with the theoretical prediction.
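The agreement with the theoretical prediction of Eq. (1) can be illustrated with a simple pull, i.e. the difference between measurement and prediction divided by their combined uncertainty. This is a rough sketch and not a statistical procedure taken from the analysis. It assumes that the measurement components are uncorrelated and that the asymmetric theory uncertainty can be approximated by the side facing the measurement.

```python
import math

# Measured inclusive cross-section in pb, with stat, syst and lumi components.
measured = 248.3
measured_unc = math.sqrt(0.7**2 + 13.4**2 + 4.7**2)

# NNLO+NNLL prediction from Eq. (1) in pb, with asymmetric uncertainties.
predicted = 253.0
pred_up, pred_down = 13.0, 15.0

difference = measured - predicted
# Assumption: use the theory uncertainty on the side facing the measurement.
pred_unc = pred_down if difference < 0 else pred_up

combined = math.hypot(measured_unc, pred_unc)
pull = difference / combined
print(f"difference = {difference:.1f} pb, pull = {pull:.2f}")
```

Under these assumptions the measurement lies about 4.7 pb below the prediction, well within one combined standard deviation.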