1 Introduction

The lifetime of the top quark is shorter than the timescale for hadronisation (\({\sim }10^{-23}\) s) and is much shorter than the spin decorrelation time (\({\sim }10^{-21}\) s) [1]. As a result, the spin information of the top quark is transferred directly to its decay products. Top quark pair production (\(t\bar{t}\)) in QCD is parity invariant and hence the top quarks are not expected to be polarised in the Standard Model (SM); however, the spins of the top and the anti-top quarks are predicted to be correlated. This correlation has been observed experimentally by the ATLAS and CMS collaborations in proton–proton collision data at the Large Hadron Collider (LHC) at centre-of-mass energies of \(\sqrt{s}=7\) TeV [2,3,4,5] and \(\sqrt{s}=8\) TeV [6,7,8,9]. It has been also studied in proton–antiproton collisions at the Tevatron collider [10,11,12,13,14]. This paper presents measurements of spin correlation at a centre-of-mass energy of \(\sqrt{s}=13\) TeV in proton–proton collisions using the ATLAS detector and data collected in 2015 and 2016.

Due to the unstable nature of top quarks, their spin information is accessed through their decay products. However, not all decay particles carry the spin information to the same degree, with charged leptons arising from leptonically decaying W bosons carrying almost the full spin information of the parent top quark [15,16,17,18]. This feature, along with the fact that charged leptons are readily identified and reconstructed by collider experiments, means that observables to study spin correlation in \(t\bar{t}\) events are often based on the angular distributions of the charged leptons in events where both W bosons decay leptonically (referred to as the dilepton channel). The simplest observable is the absolute azimuthal opening angle between the two charged leptons [19], measured in the laboratory frame in the plane transverse to the beam line. This opening angle is denoted by \(\Delta \phi \). Non-vanishing spin correlation was observed by the ATLAS experiment using the \(\Delta \phi \) observable and \(\sqrt{s}=7\) TeV data [2]. Since that time, spin correlation in \(t\bar{t}\) pairs has been extensively studied by both ATLAS and CMS using many observables and techniques. Spin correlation measurements have also been used to search for physics beyond the Standard Model (BSM) either directly, by searching for decreases in the expected SM spin correlation induced by scalar supersymmetric top squarks (stops) [6], or indirectly by setting limits on effective field theory operators, such as the chromo-magnetic and chromo-electric dipole operators [8]. Previous measurements by ATLAS [2, 3, 6] and CMS [5, 8] using \(\Delta \phi \) show slightly stronger spin correlation than expected in the SM, but with experimental uncertainties large enough that the results are still consistent with the SM expectation. In this paper, improved Monte Carlo (MC) generators are employed relative to previous spin correlation results from ATLAS to better control the systematic uncertainties. The spin correlation is measured as a function of the invariant mass of the \(t\bar{t}\) system, as well as inclusively.

Charged-lepton observables can be used to search for the production of supersymmetric top squarks with masses close to that of the SM top quark. Such a scenario is difficult to constrain with conventional searches; however, observables such as \(\Delta \phi \) and the absolute difference between the pseudorapidities of the two charged leptons, \(\Delta \eta \), are highly sensitive in this regard. The \(\Delta \phi \) distribution was previously used in such a search by ATLAS [6] and this new paper also includes \(\Delta \eta \) for this purpose. Although this observable is only mildly sensitive to the SM spin correlation, it is sensitive to different supersymmetry (SUSY) hypotheses; the two observables are therefore used together in this paper to set limits on SUSY top squark production.

This paper is organised as follows. The ATLAS detector is described in Sect. 2. Section 3 describes the data and Monte Carlo (MC) used in the analysis and Sect. 4 describes the object definitions and event selection requirements. The unfolding procedure is described in Sect. 5 and the systematic uncertainties that are considered are described in Sect. 6. The differential cross-section results are presented in Sect. 7, the spin correlation extraction is described in Sect. 9, and the SUSY limits are presented in Sect. 9. Finally, the conclusions of the paper are summarised in Sect. 10.

2 ATLAS detector

The ATLAS detector [20] at the LHC covers nearly the entire solid angleFootnote 1 around the interaction point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroidal magnet systems. The inner-detector system is immersed in a 2T axial magnetic field and provides charged-particle tracking in the range \(|\eta | < 2.5\).

The high-granularity silicon pixel detector surrounds the collision region and provides four measurements per track. The innermost layer, known as the insertable B-Layer [21, 22], was added in 2014 and provides high-resolution hits at small radius to improve the tracking performance. The pixel detector is followed by the silicon microstrip tracker, which provides four three-dimensional measurement points per track. These silicon detectors are complemented by the transition radiation tracker, which enables radially extended track reconstruction up to \(|\eta | = 2.0\). The transition radiation tracker also provides electron identification information based on the number of hits (typically 30 in total) passing a higher charge threshold indicative of transition radiation.

The calorimeter system covers the pseudorapidity range \(|\eta | < 4.9\). Within the region \(|\eta |< 3.2\), electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) sampling calorimeters, with an additional thin LAr presampler covering \(|\eta | < 1.8\) to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within \(|\eta | < 1.7\), and two copper/LAr hadronic endcap calorimeters that cover \(1.5< |\eta | < 3.2\). The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic measurements respectively, in the region \(3.1< |\eta | < 4.9\).

The muon spectrometer comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by superconducting air-core toroids. The precision chamber system covers the region \(|\eta | < 2.7\) with three layers of monitored drift tubes, complemented by cathode strip chambers in the forward region, where the background is highest. The muon trigger system covers the range \(|\eta | < 2.4\) with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.

A two-level trigger system is used to select interesting events [23]. The level-1 trigger is hardware-based and uses a subset of detector information to reduce the event rate to a design value of at most 100 kHz. This is followed by the software-based high-level trigger, which reduces the event rate to around 1 kHz.

3 Data and Monte Carlo simulation

The pp collision data used in this analysis were collected during 2015 and 2016 by the ATLAS experiment at a centre-of-mass energy of \(\sqrt{s}=13\) TeV and correspond to an integrated luminosity of 36.1 \(\hbox {fb}^{-1}\). The data considered in this analysis were recorded under stable beam conditions and required all sub-detectors to be operational. Each selected event included additional interactions from, on average, 24 inelastic pp collisions in the same proton bunch crossing, as well as residual detector signals from previous and subsequent bunch crossings, collectively referred to as “pile-up”. Events were required to pass either a single-electron or single-muon trigger. Multiple triggers were used to select events: the lowest-threshold triggers utilised isolation requirements to reduce the trigger rate, and had transverse momentum (\(p_{\mathrm{T}}\)) thresholds of 24 GeV for electrons and 20 GeV for muons in 2015 data, or 26 GeV for both lepton types in 2016 data. These triggers were complemented by others with higher \(p_{\mathrm{T}}\) thresholds and no isolation requirements to increase event acceptance.

MC simulations were used to model background processes and to correct the data for detector acceptance and resolution effects. The ATLAS detector was simulated [24] using Geant 4 [25]. A faster detector simulation [24], utilising parameterised showers in the calorimeter, but with full simulation of the inner detector and muon spectrometer, was used in the samples generated to estimate certain \(t\bar{t}\) modelling uncertainties. Additional pp interactions were generated with Pythia 8 (v8.186) [26] and overlaid onto signal and background processes in order to simulate the effect of pile-up. The simulated events were weighted to match the distribution of the average number of interactions per bunch crossing that are observed in data. The same reconstruction algorithms and analysis procedures were applied to both data and MC events. Corrections derived from dedicated data samples were applied to the MC simulation to improve agreement with data.

The primary \(t\bar{t}\) sample used in this result (hereafter referred to as nominal) was simulated using the next-to-leading order (NLO) Powheg-Box (v2) matrix-element (ME) event generator [27,28,29] interfaced to Pythia 8 (v8.210) for the parton shower (PS) and fragmentation. The NNPDF3.0 NLO parton distribution function (PDF) set [30] was used in the matrix element (ME) generation and the NNPDF2.3 PDF set was used in the PS. Non-perturbative QCD effects were modelled using a set of tuned parameters called the A14 tune [31]. The “\(h_{\text {damp}}\)” parameter, which controls the \(p_{\mathrm{T}}\) of the first additional gluon emission beyond the Born configuration, was set to 1.5 times the mass of the top quark (\(m_t\)) of 172.5 GeV. The main effect of this was to regulate the high-\(p_{\mathrm{T}}\) emission against which the \(t\bar{t}\) system recoils. The choice of this \(h_{\text {damp}}\) value was found to improve the modelling of the \(t\bar{t}\) system kinematics in previous analyses [32]. The renormalisation and factorisation scales were set to \(\mu _{\text {F}} = \mu _{\text {R}} = \sqrt{(m_t^2 + p_{\text {T}}(t)^2)}\), where the \(p_{\text {T}}\) of the top quark is evaluated before radiation. The \(t\bar{t}\) contribution was normalised using the predicted cross-section, \(\sigma _{t{\bar{t}}} = 832^{+20}_{-29}\left( \mathrm {scale}\right) \pm 35\left( \mathrm {PDF}\right) ^{+23}_{-22}\left( \mathrm {mass}\right) \) pb as calculated with the Top++2.0 program at next-to-next-to-leading (NNLO) order in perturbative QCD, including soft-gluon resummation to next-to-next-to-leading-log order [33] and assuming a top quark mass of \(172.5 \pm 1.0\) GeV. The top quark mass was set to 172.5 GeV in all simulated top quark samples. An alternative \(t\bar{t}\) sample was simulated with the same settings but with the top quarks decayed using MadSpin [34] and with spin correlations between the t and \({\bar{t}}\) disabled. This sample was used, along with the nominal sample, as a template in the extraction of spin correlation, described in Sect. 8. A further Powheg +Pythia 8 sample was generated with the spin correlations enabled in MadSpin, to allow a comparison of the simulation of Powheg +Pythia 8 with and without the use of MadSpin. In order to facilitate comparisons to predictions from fixed-order calculations or from other MC generators, the primary spin correlation coefficients as measured in the nominal Powheg-Box sample, using the formalism described in Ref. [35], are: \(C(k,k)= 0.314 \pm 0.002\), C\((n,n) = 0.320 \pm 0.002\), C\((r,r) = 0.050 \pm 0.002\), under the assumption that the spin-analysing power of the leptons is equal to unity. The uncertainties quoted are purely statistical.

In order to investigate the effects of initial- and final-state radiation, an alternative Powheg-Box + Pythia 8 sample was generated with the renormalisation and factorisation scales varied by a factor of 2, using the low radiation variation of the A14 tune and an \(h_{\text {damp}}\) value of \(1.5\times m_t\), corresponding to reduced parton-shower radiation [32]. The A14 Var3c [31] tune variation corresponded to varying \(\alpha _\text {s}\), which impacts the initial-state radiation in the A14 tune, and covered the size of the other available A14 variations. In order to estimate the effect of the choice of ME event generator, a sample was generated with MadGraph5_aMC@NLO  (v2.2.1) [36], interfaced to Pythia 8. The choice of PS algorithm is evaluated using a sample generated using Powheg-Box interfaced to Herwig 7 [37]. An additional Sherpa  (v2.2.1) [38] sample was used in which events were generated with up to one additional parton simulated at NLO and two, three and four partons at LO with the CT10 [39] PDF set for comparison purposes.

Background processes were simulated using a variety of MC event generators. Single top quark production in association with a W boson (tW) was simulated at NLO using the Powheg-Box (v1) [27] ME event generator with CT10 as the PDF. It was interfaced to Pythia 6 (v6.428) [40] for the PS, fragmentation and underlying event with the CTEQ6L1 [39] NLO PDF set, and a set of tuned parameters called the Perugia 2012 tune [41]. The sample was normalised to the theoretical cross-section \(\sigma _{tW} = 71.7 \pm 1.8\left( \mathrm {scale}\right) \pm 3.4\left( \mathrm {PDF}\right) \) pb [42]. The higher-order overlap with \(t\bar{t}\) production was addressed according to the “diagram removal” (DR) generation scheme [43]. A sample generated with an alternative “diagram subtraction” (DS) method was used to evaluate systematic uncertainties [43].

Sherpa (v2.2.1) with the NNPDF3.0 PDF set was used to model Drell–Yan production. For the \(Z/\gamma ^*\rightarrow \tau ^+\tau ^-\) process, Sherpa calculated matrix elements at NLO for up to two partons and at LO for up to two additional partons using the OpenLoops [44] and Comix [45] ME event generators. The MEs were merged with the Sherpa PS [46] using the ME + PS@NLO prescription [38]. The simulation was normalised using the total cross-section from NNLO predictions [47].

Electroweak diboson production [48], with both bosons decaying leptonically, was simulated with the same Sherpa version and PDF settings as Drell–Yan production. Sherpa calculated the MEs for diboson samples at NLO for zero or one additional partons and at LO for two to three additional partons. The Sherpa PS was used for all parton multiplicities of four or more. The number of simulated events was normalised using the cross-section computed by the event generator. Electroweak and loop-induced diboson processes were simulated using Sherpa (v2.1.1) [38, 49] with the CT10 PDF set.

Events with \(t\bar{t}\) production in association with a vector boson or a Higgs boson were simulated using MadGraph5_aMC@NLO + Pythia 8 [50], using the NNPDF2.3 PDF set and the A14 tune, as described in Ref. [51]. The t-channel production of a single top quark in association with a Z boson (tZ) was generated using MadGraph5_aMC@NLO interfaced with Pythia 6 [40] with the CTEQ6L1 PDF [52] set and the Perugia 2012 tune [41]. The tW channel production of a single top quark together with a Z boson (tWZ) was generated with MadGraph5_aMC@NLO and showered with Pythia 8, using the PDF set NNPDF3.0NLO and the A14 tune. The production of \(t\bar{t}\)WW and \(t\bar{t}t\bar{t}\) were simulated at LO using MadGraph5_aMC@NLO + Pythia 8, using the NNPDF2.3 PDF set and the A14 tune.

EvtGen (v1.2.0) [53] was used for the heavy-flavour hadron decays in all samples, with the exception of Sherpa, which performed these decays internally.

Backgrounds also arise from events containing one prompt lepton from the decay of a W or Z boson and either a non-prompt lepton or a particle misidentified as a lepton. These “fake leptons” can arise from heavy-flavour hadron decays, photon conversions, jet misidentification or light-meson decays, and were estimated using MC simulations. The history of the stable particles in the generator-level record was used to identify fake leptons from these processes. The majority (\({\sim }\)90%) of events containing a fake lepton originated from the single-lepton \(t\bar{t}\) process, with smaller contributions arising from W boson production in association with jets, t-channel single top quark production, and \(t\bar{t}\) production in association with a vector boson. Sherpa (v2.2.1) with the NNPDF3.0 PDF set was used to simulate W boson production in association with jets. The t-channel single-top quark process was generated using Powheg-Box v1 + Pythia 6 with the same parameters and PDF sets as those used for the tW sample. Other possible processes with fake leptons, such as multi-jet and Drell–Yan production, were negligible for the event selection used in this analysis. The fake-lepton contribution derived from MC simulation was verified using a same-charge lepton control region in the data; the MC distributions were scaled up by a small amount as a consequence.

Fully simulated samples involving the SUSY decays \({\tilde{t}}\rightarrow t {\tilde{\chi }}^0_1\) with left-handed top squarks were generated using MadGraph5_aMC@NLO + Pythia 8 interfaced to EvtGen and MadSpin, with the A14 tune and the LO PDF set NNPDF2.3. The samples contained dilepton \(e\mu \) final states only, and covered a range of \(170.0< m({{\tilde{t}}}) < 300.0\) GeV and \(0.5< m({{\tilde{\chi }}^0_1}) < 142.5\) GeV. The top quark mass was set to 172.5 GeV but was allowed to be off-shell by  \(2\cdot \Gamma _{t}\) and therefore decays of top squarks to top quarks with a mass of 170 GeV were permitted.

4 Event selection and reconstruction

4.1 Object and event selection

This analysis utilises reconstructed electrons, muons, jets, and missing transverse momentum. Jets are reconstructed with the anti-\(k_t\) algorithm [54, 55], using a radius parameter of \(R = 0.4\), from topological clusters of energy deposits in the calorimeters [56]. Jets are accepted within the range \(p_{\mathrm{T}} > 25\) GeV and \(|\eta |<2.5\) and are calibrated using simulation with corrections derived from data [57]. Jets likely to originate from pile-up are suppressed using a multivariate jet-vertex-tagger (JVT) [58] for candidates with \(p_{\mathrm{T}} < 60\) GeV and \(|\eta | < 2.4\). Additionally, pile-up effects on all jets are corrected using a jet area method [57, 59]. Jets are identified as containing b-hadrons using a multivariate discriminant [60], which uses track impact parameters, track invariant mass, track multiplicity, and secondary vertex information to discriminate b-jets from light-quark or gluon jets (light jets). The average b-tagging efficiency is 77%, with a purity of 95% for b-tagged jets in simulated dileptonic \(t\bar{t}\) events with the selection used in this analysis.

Electron candidates are identified by matching an inner-detector track to an isolated energy deposit in the electromagnetic calorimeter, within the fiducial region of transverse momentum \(p_{\mathrm{T}} > 25\) GeV and \(|\eta | < 2.47\). Electron candidates are excluded if the pseudorapidity of the calorimeter cluster is within the transition region between the barrel and the endcap of the electromagnetic calorimeter, \(1.37< |\eta | < 1.52\). Electrons are selected using a multivariate algorithm and are required to satisfy a Tight likelihood-based quality criterion in order to provide high efficiency and good rejection of fake electrons [61]. Electron candidates must have tracks that pass the requirements of transverse impact parameter significance with respect to the primary vertexFootnote 2 \(|d_0^\text {sig}|<5\) and longitudinal impact parameter \(|z_0 \sin \theta | < 0.5\) mm. Electrons must pass \(p_{\mathrm{T}}\)- and \(\eta \)-dependent isolation requirements based on inner-detector tracks and topological clusters in the calorimeter. These requirements have an efficiency of 95% for an electron \(p_{\mathrm{T}}\) of 25 GeV and 99% for an electron \(p_{\mathrm{T}}\) above 60 GeV, when determined in simulated \(Z\rightarrow e^{+}e^{-}\) events.

Electrons that share a track with a muon are discarded. Double counting of electron energy deposits as jets is prevented by removing the closest jet within \(\Delta R=0.2\) of a reconstructed electron. Following this, the electron is discarded if a jet exists within \(\Delta R = 0.4\) of the electron to ensure sufficient separation from nearby jet activity, where in this case \(\Delta R\) was calculated using the rapidity of the jets.

Muon candidates are identified from muon-spectrometer tracks that match tracks in the inner detector, with \(p_{\mathrm{T}} > 25\) GeV and \(|\eta | < 2.5\) [62]. The tracks of muon candidates are required to have a transverse impact parameter significance \(|d_0^\text {sig}|<3\) and a longitudinal impact parameter \(|z_0 \sin \theta | < 0.5\) mm. Muons must satisfy quality criteria and isolation requirements based on inner-detector tracks and topological clusters in the calorimeter which depend on \(\eta \) and \(p_{\mathrm{T}}\). These requirements reduce the contributions from fake muons and provide the same efficiency as for electrons. The criteria used for the muons in this analysis is the Medium working point. Muons may leave energy deposits in the calorimeter that could be misidentified as a jet, so jets with fewer than three associated tracks are removed if they are within \(\Delta R = 0.4\) of a muon. Muons are discarded if they are separated from the nearest jet by \(\Delta R < 0.4\) to reduce the background from muons from heavy-flavour hadron decays inside jets.

The missing transverse momentum (with magnitude \(E_{\mathrm{T}}^{\mathrm{miss}}\)) is defined as the negative vector sum of the transverse momenta of reconstructed, calibrated objects in the event. It is computed using calibrated electrons, muons, and jets [63] and includes contributions from soft tracks associated with the primary vertex but not forming the lepton or jet candidates. The primary vertex of an event is defined as the vertex for which the associated tracks have the highest sum of \(p_{\mathrm{T}}^2\), where each track has \(p_{\mathrm{T}} > 400\) MeV.

Two types of signal events are considered, depending on whether a full reconstruction of the \(t{\bar{t}}\) system is performed, denoted here as inclusive and reconstructed selections. The inclusive selection is used for the \(\Delta \phi \) and \(\Delta \eta \) differential cross-sections. It is defined by requiring exactly one electron and one muon of opposite electric charge, where at least one of them has \(p_{\mathrm{T}} > 27\) GeV, and at least two jets, at least one of which must be b-tagged. The reconstructed selection is used for the measurement of \(\Delta \phi \) as a function of the \(t\bar{t}\) invariant mass. It has a more stringent b-tagging requirement of at least two b-tagged jets and also requires that at least one solution was found for the reconstruction of the \(t\bar{t}\) system (described in detail later in this section). The tighter b-tagging requirement is imposed in the reconstructed selection to improve the performance of the \(t{\bar{t}}\) reconstruction by removing light jets that are erroneously assigned to the top-quark or top-antiquark decay. A less strict b-tagging selection requirement of only one or more b-tagged jets is used in the inclusive selection in order to increase the event selection efficiency. Only events with exactly one electron and one muon are considered as this decay mode provides the highest signal purity as well as more than sufficient data statistics. The dielectron and dimuon decay modes are not considered due to their enhanced Drell–Yan and heavy flavour backgrounds, while the increase in statistical power would not improve the overall uncertainty on the results.

Using the inclusive selection, 93% of selected events are expected to be \(t\bar{t}\) events. The other processes that pass the signal selection are Drell–Yan (\(Z/\gamma ^{*}\rightarrow \tau ^{+}\tau ^{-}\)), diboson, single top quark (tW) production, boson production in association with a \(t{\bar{t}}\) pair (\(t{\bar{t}}V\) and others), and fake-lepton events. The reconstructed selection gives a subset of these events, in which 96% of selected events are expected to be \(t\bar{t}\) events. This is higher than the inclusive selection because of the tighter b-tagging requirement and because the \(t\bar{t}\) reconstruction procedure tends to succeed more often for \(t\bar{t}\) events than for background processes.

The event yields after both selections are listed in Table 1. The expected yields are in agreement with the observed number of events in both cases. Distributions of the lepton and jet \(p_{\mathrm{T}}\) and \(E_{\mathrm{T}}^{\mathrm{miss}}\) are shown in Fig. 1 for the inclusive selection. The data and prediction agree within the total uncertainty for all of these kinematic observables. The trends observed in the lepton and jet \(p_{\mathrm{T}}\) arise from the well-documented limitations of the modelling of the top quark’s \(p_{\mathrm{T}}\) spectrum at NLO [64,65,66]. The systematic uncertainties included in both the table and the figures are described in Sect. 6. The azimuthal opening angle of the electron and muon, \(\Delta \phi \), and the absolute value of the separation of the leptons in pseudorapidity, \(\Delta \eta \), are shown in Fig. 2 for the inclusive selection. The observed distribution is compared to the sum of signal and background using three different signal models: Powheg +Pythia 8, Powheg +Herwig 7, and MadGraph5_aMC@NLO +Pythia 8, and the ratio panel compares the combined signal plus background to data for the three models.

Table 1 Event yields in the inclusive and reconstructed selections for the observed data, expected signal and expected background. The uncertainties quoted include contributions from leptons, jets, missing transverse momentum, luminosity, background modelling, and pile-up modelling. They do not include uncertainties from PDF or signal \(t\bar{t}\) modelling. The “\(t{\bar{t}}V\) and others” entries contain events from \(t{\bar{t}}Z\), \(t{\bar{t}}W\), \(t{\bar{t}}WW\), \(t{\bar{t}}H\), and the \(t{\bar{t}}t{\bar{t}}\) processes
Fig. 1
figure 1

Kinematic distributions for the a electron \(p_{\mathrm{T}}\), b muon \(p_{\mathrm{T}}\), c leading b-jet \(p_{\mathrm{T}}\), and d \(E_{\mathrm{T}}^{\mathrm{miss}}\) for the \(e^{\pm }\mu ^{\mp }\) inclusive selection. In all figures, the rightmost bin also contains events that are above the x-axis range. The dark uncertainty bands in the ratio plots represent the statistical uncertainties while the light uncertainty bands represent the statistical and systematic uncertainties added in quadrature. The systematic uncertainties include contributions from leptons, jets, missing transverse momentum, background modelling, pile-up modelling and luminosity, but not PDF or signal \(t\bar{t}\) modelling uncertainties. The observed distribution is compared to the sum of signal and background using three different \(t\bar{t}\) signal models: Powheg +Pythia 8, Powheg +Herwig 7 and MadGraph5_aMC@NLO +Pythia 8, and the ratio panel compares the summed prediction to data for the three models

Fig. 2
figure 2

Distribution of a the \(\Delta \phi \) and b \(\Delta \eta \) observables for the \(e\mu \) selection after the requirement of at least one b-tagged jet (inclusive selection). The highest bin for \(\Delta \eta \) also contains events that are above the x-axis range. The dark uncertainty bands in the ratio plots represent the statistical uncertainties while the light uncertainty bands represent the statistical and systematic uncertainties added in quadrature. The systematic uncertainties include contributions from leptons, jets, missing transverse momentum, background modelling, pile-up modelling and luminosity, but not PDF or signal \(t\bar{t}\) modelling uncertainties. The observed distribution is compared to the sum of signal and background using three different \(t\bar{t}\) signal models: Powheg +Pythia 8, Powheg +Herwig 7 and MadGraph5_aMC@NLO +Pythia 8, and the ratio panel compares the summed prediction to data for the three models

4.2 Reconstruction of the \(t\bar{t}\) system

In order to measure spin correlations as a function of the \(t\bar{t}\) invariant mass at detector level, the kinematic properties of the event must be reconstructed from the identified leptons, jets, and missing transverse momentum. The top quark, top antiquark, and reconstructed \(t\bar{t}\) system are built using the Neutrino Weighting (NW) method [67]. While the individual four-momenta of the two neutrinos in the final state are not directly measured in the detector, the sum of their transverse momenta is measured as \(E_{\mathrm{T}}^{\mathrm{miss}}\). The absence of the measured four-momenta of the two neutrinos leads to an under-constrained system that cannot be solved analytically. The following invariant mass constraints were applied to each event:

$$\begin{aligned} \begin{aligned} (\ell _{1,2} + \nu _{1,2})^2&= m^2_W = (80.4~\text {GeV})^2\text {,} \\ (\ell _{1,2} + \nu _{1,2} + b_{1,2})^2&= m_t^2 = (172.5~\text {GeV})^2\text {,}\\ \end{aligned} \end{aligned}$$
(1)

where \(\ell _{1,2}\), \(\nu _{1,2}\) and \(b_{1,2}\) represent the four-momenta of the charged leptons, neutrinos and b-quarks, respectively. Since the neutrino pseudorapidities (\(\eta (\nu )\) and \(\eta ({\bar{\nu }}\))) required for \(\nu _{1,2}\) are unknown, their values are scanned, in steps of 0.2, between \(-5\) and 5.

With the assumptions about \(m_t\), \(m_W\) and values for \(\eta (\nu )\) and \(\eta ({\bar{\nu }})\), Eq. (1) can now be solved, leading to two possible solutions for each assumption of \(\eta (\nu )\) and \(\eta ({\bar{\nu }})\). Only real solutions without an imaginary component are considered. An “inferred” \(E_{\mathrm{T}}^{\mathrm{miss}}\) value, resulting from the neutrinos for each solution, is compared to the \(E_{\mathrm{T}}^{\mathrm{miss}}\) observed in the event. A weight is introduced in order to quantify this agreement:

$$\begin{aligned} w = \exp \Bigg (\frac{-\Delta E^2_x}{2\sigma _x^2}\Bigg ) \cdot \exp \Bigg ( \frac{-\Delta E^2_y}{2\sigma ^2_y} \Bigg ) \text {,} \end{aligned}$$

where \(\Delta E_{x,y}\) is the difference between the (x,y) component of the missing transverse momentum computed from the neutrino four momenta in Eq. (1) and the observed missing transverse momentum, and \(\sigma _{x,y}\) is a fixed scale related to the resolution of the observed \(E_{\mathrm{T}}^{\mathrm{miss}}\) in the detector in (xy), based on studies in Z boson events [63]. The assumption for \(\eta (\nu )\) and \(\eta ({\bar{\nu }})\) that gives the highest weight is used to reconstruct the t and \({\bar{t}}\) quarks for that event.

In each event, there may be more than two b-tagged jets (on average there are 2.04 b-tagged jets per event) and therefore several possible combinations of jets to use in the kinematic reconstruction. In addition, there is an ambiguity in assigning a jet to the t or \({\bar{t}}\) quark candidate. To reduce this ambiguity, the two b-tagged jets with the highest weight from the b-tagging algorithm are used to reconstruct the t and \({\bar{t}}\) quarks and the assignment which produces the solution with highest weight in the NW is taken as the correct assignment.

Equation (1) cannot always be solved for a particular assumption of \(\eta (\nu )\) and \(\eta ({\bar{\nu }})\). This can be caused by mis-assignment of the input objects or through mis-measurement of the input object four-momenta. It is also possible that the assumed \(m_t\) is sufficiently different from the true value to prevent a valid solution for a particular event, or the event is from a background process, and therefore cannot be solved. To mitigate these effects, the assumed value of \(m_t\) is scanned between the values of 171 and 174 GeV, in steps of 0.5 GeV, and the \(p_{\mathrm{T}}\) of the measured jets are smeared using a Gaussian function with a \(p_{\mathrm{T}}\)-dependent width between 14% and 8% of their measured \(p_{\mathrm{T}}\). This smearing is repeated 5 times.

This procedure allows the NW algorithm to shift the four-momenta of the two jets and the \(m_t\) hypothesis to see if a solution can be found. The solution which produces the highest w gives the kinematics of the reconstructed event. Solutions which provide an invariant mass of the \(t\bar{t}\) system below 300 GeV, or which provide t or \({\bar{t}}\) quarks with negative energies, are rejected. For around 5% of events, no solution can be found, even after smearing. Only events with at least one solution with a weight above 0.4 are considered, where this criterion was chosen to optimise the angular resolution in the top quark reconstruction. The efficiency for \(t\bar{t}\) reconstruction is \({\sim }\)80%. Due to the implicit assumptions about \(m_t\) and \(m_W\), the reconstruction efficiency found in simulated background samples is much lower (\({\sim }\)60% for tW and Drell–Yan processes) and leads to a suppression of background events. Table 1 shows the event yields before and after reconstruction in the signal region. The different effects of the systematic uncertainties on each type of selection are discussed in greater detail in Sect. 7.

Figure 3 shows the distributions of \(\Delta \phi \) and \(m_{t\bar{t}}\) after reconstruction and with a requirement of at least two b-tagged jets (reconstructed selection). The four plots in Fig. 4 show the \(\Delta \phi \) distribution split into four mass regions: \(m_{t\bar{t}}<450\) GeV; \(450\le m_{t\bar{t}}<550\) GeV; \(550\le m_{t\bar{t}}<800\) GeV; and \(m_{t\bar{t}}\ge 800\) GeV. These bins in \(m_{t\bar{t}}\) were determined to have the finest possible granularity whilst maintaining an unbiased and stable unfolding procedure for the \(\Delta \phi \) observable (described further in Sect. 5).

Fig. 3
figure 3

Kinematic distributions for a \(\Delta \phi \) and b \(m_{t\bar{t}}\) after the requirement of at least two b-tagged jets and Neutrino Weighting (reconstructed selection). The highest bin in Fig. 3b also contains events that are above the x-axis range. The dark uncertainty bands in the ratio plots represent the statistical uncertainties while the light uncertainty bands represent the statistical and systematic uncertainties added in quadrature. The systematic uncertainties include contributions from leptons, jets, missing transverse momentum, background modelling, pile-up modelling and luminosity, but not PDF or signal \(t\bar{t}\) modelling uncertainties. The observed distribution is compared to the sum of signal and background using three different \(t\bar{t}\) signal models: Powheg +Pythia 8, Powheg +Herwig 7, and MadGraph5_aMC@NLO +Pythia 8, and the ratio panel compares the summed prediction to data for the three models

Fig. 4
figure 4

Kinematic distributions after the requirement of at least two b-tagged jets and Neutrino Weighting (reconstructed selection). The plots display \(\Delta \phi /\pi \) in individual mass ranges: a \(m_{t\bar{t}}<450\) GeV, b \(450\le m_{t\bar{t}}<550\) GeV, c \(550\le m_{t\bar{t}}<800\) GeV, and d \(m_{t\bar{t}}\ge 800\) GeV. The dark uncertainty bands in the ratio plots represent the statistical uncertainties while the light uncertainty bands represent the statistical and systematic uncertainties added in quadrature. The systematic uncertainties include contributions from leptons, jets, missing transverse momentum, background modelling, pile-up modelling and luminosity, but not PDF or signal \(t\bar{t}\) modelling uncertainties. The observed distribution is compared to the sum of signal and background using three different \(t\bar{t}\) signal models: Powheg +Pythia 8, Powheg +Herwig 7 and MadGraph5_aMC@NLO +Pythia 8, and the ratio panel compares the summed prediction to data for the three models

4.3 Definitions of partons and particles

In the measurements presented in this paper, events are corrected for detector effects using two definitions of particles in the generator-level record of the simulation: parton level and particle level. Parton-level objects are taken from the MC simulation history. Top quarks are taken after radiation but before decay (this is the last top quark in a decay chain) whereas leptons are taken before radiation (i.e. Born level leptons). The measurement corrected to parton level is extrapolated to the full phase-space, where all generated dilepton events are considered. However, events with leptons originating from an intermediate \(\tau \)-lepton in the \(t \rightarrow bW \rightarrow b \ell \nu \) decay chain are not considered as their subsequent decays do not carry the full spin information of their parent top quark and hence, dilute the spin correlation information. Fiducial requirements are not made on the partonic objects so that the results at parton level can be more easily compared to fixed-order predictions.

Particle-level objects are constructed using a procedure intended to correspond as closely as possible to the detector-level object and event selection. Only objects in the MC simulation considered stable (with lifetimes longer than \(3 \times 10^{-11}\) s) in the generator-level information are used. Particle-level leptons are identified as those originating from a W boson decay. The four-momentum of each electron or muon is summed with the four-momenta of all radiated photons within a cone of size \(\Delta R = 0.1\) about its direction, excluding photons from hadron decays. The resulting leptons are required to have \(p_{\mathrm{T}} > 25\) GeV and \(|\eta |<2.5\). Particle-level jets are constructed using stable particles, with the exception of selected particle-level electrons and muons, photons that are summed into the electrons or muons, and particle-level neutrinos originating from W boson decays. The jets are constructed using the anti-\(k_t\) algorithm with a radius parameter of \(R=0.4\), and selected if they pass the requirements of \(p_{\mathrm{T}}>25\) GeV and \(|\eta |<2.5\). Intermediate b-hadrons in the MC decay chain history are clustered in the stable-particle jets with their energies set to zero. If, after clustering, a particle-level jet contains one or more of these “ghost” b-hadrons, the jet is said to have originated from a b-quark. This technique is referred to as “ghost matching” [59]. Particle-level \(E_{\mathrm{T}}^{\mathrm{miss}}\) is calculated using the vector transverse-momentum sum of all neutrinos in the event, excluding those originating from hadron decays, either directly or via a \(\tau \)-lepton.

Events are selected at the particle level in a fiducial phase-space region with similar requirements to the phase-space region in the detector. They must contain exactly one particle-level electron and one particle-level muon of opposite electric charge, at least one of which must have \(p_{\mathrm{T}} > 27\) GeV, and at least two particle-level jets. The particle-level requirement on the number of jets that must be ghost-matched to a b-hadron mimics the inclusive and reconstructed selections at detector-level: for the inclusive selection, at least one particle-level jet must be ghost-matched, while for the reconstructed case, the particle-level selection requires exactly two ghost-matched jets. In addition, the reconstructed selection excludes particle-level leptons originating from an intermediate \(\tau \)-lepton in the \(t \rightarrow bW \rightarrow b \ell \nu \) decay chain. The particle-level \(t\bar{t}\) object is constructed using the sum of the particle-level electron and muon, the two ghost-matched jets, and the two neutrinos that originate from the same W boson decays as the selected particle-level leptons.

5 Unfolding procedure

The data are corrected for detector resolution and acceptance effects using an iterative Bayesian unfolding procedure [68] in order to create distributions at particle (parton) level in a fiducial (full) phase-space. The unfolding itself is performed using the RooUnfold package [69].

In the unfolding procedure, background-subtracted data are corrected for detector acceptance and resolution effects as well as for the efficiency to pass the event selection requirements in order to obtain the absolute differential cross-sections:

$$\begin{aligned} \frac{\text {d}\sigma _{t\bar{t}}}{\text {d}X^i} = \frac{1}{{\mathcal {L}} \cdot \Delta X^i \cdot \epsilon ^i_{\text {eff}}} \cdot \sum _{j} R^{-1}_{ij} \cdot f^j_{\text {acc}} \cdot (N^j_{\text {obs}}-N^j_{\text {bkg}}) \text {,} \end{aligned}$$

where j is the index for bins of observable X at detector level and i labels the bins at particle or parton level. \(\Delta X^i\) is the width of bin i, \(N^j_{\text {obs}}\) is the number of observed events in data in bin j, \({\mathcal {L}}\) is the integrated luminosity, \(N^j_{\text {bkg}}\) is the estimated number of background events in bin j, R is the response matrix and \(R^{-1}_{ij}\) symbolises the effective inversion of R in the Bayesian unfolding. The acceptance correction \(f^j_{\text {acc}}\) accounts for events that are outside the fiducial phase-space but pass the detector-level selection. The efficiency correction \(\epsilon ^i_{\text {eff}}\) corrects for events that are in the fiducial phase-space but are not reconstructed in the detector.

The fiducial differential cross-sections are divided by the measured total cross-section, obtained by integrating over all bins in the differential distribution, in order to obtain the normalised differential cross-sections. The response matrix, R, describes the detector response and is determined by mapping the bin-to-bin migration of events from particle or parton level to detector level in the nominal \(t\bar{t}\) MC simulation. Figures 5a, b illustrate the response matrices that are used for the single-differential \(\Delta \phi \) and \(\Delta \eta \) observables at parton level. Each response matrix is normalised such that the sum of entries in each row is equal to one. The values represent the fraction of events at either particle or parton level in bin i that are reconstructed in bin j at detector level. Figure 5c shows the response matrix for the double-differential distribution of \(\Delta \phi \) as a function of \(m_{t\bar{t}}\) at parton level. The \(\Delta \phi \) distributions for each \(m_{t\bar{t}}\) region are concatenated into a single one-dimensional distribution, such that the response matrix takes into account the migrations between different \(m_{t\bar{t}}\) regions. As can be observed in the figure, the \(\Delta \phi \) observable is diagonal in each region, with the majority of the off-diagonal smearing occurring due to the resolution of the \(m_{t\bar{t}}\) observable.

The binning for each observable is chosen in order to minimise the effect of statistical fluctuations in the data as well as in the alternative \(t\bar{t}\) samples which are used in the systematic prescription (and are a dominant source of systematic uncertainty), as well as to account for the experimental resolution. The size of the chosen bins is usually much larger than the detector resolution on the \(\Delta \phi \) observable, which is illustrated by the highly diagonal response matrices in the inclusive selection. In contrast, the resolution of the reconstructed \(m_{t\bar{t}}\) observable is significantly larger and so the binning here is chosen to be the smallest possible binning that reproduces the underlying truth-level distribution without bias, when measured using MC pseudo-experiments.

The stability of the unfolding procedure is determined by constructing pseudo-data sets by randomly sampling events from the nominal \(t\bar{t}\) MC sample with approximately the same statistical power as the expected data. Pull tests are performed as part of the binning optimisation and are therefore always successful for the chosen observable bins. In addition, the unfolding procedure is tested to see how it responds to various stresses introduced into the pseudo-data. Three such stresses are investigated: introducing linear slopes in the observables, the difference between the spin correlated and uncorrelated MC samples, and the observed difference between data and the expectation at detector level. In all cases, the unfolding procedure is able to correct the pseudo-data back to their underlying truth spectra and so a systematic uncertainty for the unfolding procedure is not included.

The number of iterations used in the iterative Bayesian unfolding is also optimised using pseudo-experiments. Iterations are performed until the \(\chi ^2\) per degree-of-freedom, calculated by comparing the unfolded pseudo-data to the corresponding generator-level distribution for that pseudo-data set, is less than or equal to unity. For the inclusive observables (\(\Delta \phi \) and \(\Delta \eta \)), the optimal number of iterations is determined to be two, whereas for the reconstructed observable (\(\Delta \phi \) in bins of \(m_{t{\bar{t}}}\)), the optimal number of iterations is determined to be four. All distributions are unfolded to the particle level and to the parton level.

Fig. 5
figure 5

Parton-level response matrices, normalised by row and shown as percentages, for: a \(\Delta \phi \), b \(\Delta \eta \), and c \(\Delta \phi \) as a function of \(m_{t{\bar{t}}}\), after Neutrino Weighting. For (c), the binning on the horizontal and vertical axes is identical, with each invariant mass region subdivided into \(\Delta \phi \) bins. The dotted lines separate different invariant mass regions, while the tick marks indicate the \(\Delta \phi \) bins

6 Systematic uncertainties

The measured differential cross-sections are affected by systematic uncertainties arising from detector response, signal modelling, and background modelling. The contributions from various sources of uncertainty are described in this section. These individual systematic uncertainties are summed in quadrature to obtain the total systematic uncertainty, and the overall uncertainty is calculated by summing the systematic and statistical uncertainties in quadrature.

6.1 Signal modelling uncertainties

The following four systematic uncertainties related to the modelling of the \(t\bar{t}\) system in the MC generators are considered: the choice of matrix-element generator, the hadronisation and parton-shower model, the amount of initial- and final-state radiation, and the choice of PDF set. In each case (except for the PDF uncertainty), alternative MC samples are unfolded with the nominal \(t\bar{t}\) MC response and the difference to their generator-level spectra is taken as the systematic uncertainty. A fast detector simulation (described in Sect. 3) is used for each of the alternative models and for the response matrix, rather than the full detector simulation used in the nominal unfolding procedure. In most cases, the resulting systematic shift is used to define a symmetric uncertainty, where deviations from the generator-level spectra are also considered to be mirrored in the opposite direction, resulting in equal and opposite symmetric uncertainties (called symmetrising).

The choice of NLO ME generator affects the invariant mass of the simulated \(t\bar{t}\) events, the observables themselves, and the reconstruction efficiencies. To estimate this uncertainty, MadGraph5_aMC@NLO  (with Pythia 8 for the parton-shower simulation) is used, applying the nominal unfolding procedure based on the Powheg-Box+Pythia 8 \(t\bar{t}\) sample. The resulting uncertainty is symmetrised.

To evaluate the uncertainty arising from the choice of parton-shower algorithm and the hadronisation model, the alternative sample generated with Powheg -Box + Herwig 7 is unfolded with the nominal \(t\bar{t}\) MC response. The resulting uncertainty is symmetrised.

The uncertainty arising from initial- and final-state radiation is evaluated using the reduced radiation sample of Powheg-Box + Pythia 8, and is again symmetrised. An enhanced radiation sample was also investigated as this has been used in previous similar analyses. However, it was found to markedly disagree with the data and is therefore not used here.

The uncertainty due to the choice of PDF set is evaluated using the PDF4LHC15 prescription [70], utilising 30 eigenvector shifts derived from fits to multiple NLO PDF sets. Each shift is evaluated for each bin added in quadrature and the resulting uncertainty in each bin is symmetrised.

6.2 Background modelling uncertainties

The uncertainties in the background processes are assessed by repeating the full analysis using pseudo-data sets and by varying the background predictions by one standard deviation of their nominal values. The difference between the nominal pseudo-data set result and the shifted result is taken as the systematic uncertainty, then the separate background uncertainties are combined in quadrature.

Each background prediction has an uncertainty associated with its theoretical cross-section. The cross-section for the tW process is varied by ±5.3% [42], the diboson cross-section is varied by ±6%, and the Drell–Yan \(Z/\gamma ^*\rightarrow \tau ^+\tau ^-\) background cross-section is varied by ±5% based on studies of different MC generators. Uncertainties on the remaining SM backgrounds are taken to be \(13\%\) for \(t\bar{t} V\) [36, 71], \(^{+6.8}_{-9.9}\%\) for \(t\bar{t} H\) [72], \(^{+10}_{-28}\%\) for tWZ and \(\pm 50\%\) for tZ, \(t\bar{t} WW\) and \(t\bar{t}t\bar{t}\) [73].

An additional scaling factor and uncertainty of \(1.07\pm 0.12\) is assigned to the \(Z/\gamma ^*\) background, based on a comparison of data and MC simulation in a region enriched in \(Z\rightarrow \ell ^{+}\ell ^{-}\) decays in association with b-jets.

A 40% uncertainty is assigned to the normalisation of the fake-lepton background based on comparisons between data and MC simulation in a fake-dominated control region, which is selected in the same way as the \(t\bar{t}\) signal region but the leptons are required to have same-sign electric charges. An additional uncertainty is included, to account for slight differences in shapes between the data-driven and MC estimates in \(\Delta \phi (\ell ^+,\ell ^-)\) and \(\Delta \eta (\ell ^+,\ell ^-)\).

An additional uncertainty is evaluated for the tW process by replacing the nominal DR sample with a DS sample, as discussed in Sect. 3, and taking the difference between the two as the systematic uncertainty. Other background process uncertainties are found to be insignificant and are not discussed further.

6.3 Detector modelling uncertainties

Systematic uncertainties due to the modelling of the detector response affect the signal reconstruction efficiency, the unfolding procedure, and the background estimation. In order to evaluate their impact, the full analysis is repeated with variations of the detector modelling and the difference between the nominal and the shifted results is taken as the systematic uncertainty.

The uncertainties due to lepton isolation, trigger, identification, and reconstruction requirements are evaluated in data using a tag-and-probe method in events with a leptonically decaying Z boson [61, 62].

The jet energy scale uncertainty is assessed in data [57], using simulation-based corrections and in situ techniques based on jets, photons and Z bosons. A 21-component breakdown of the uncertainty is used, with contributions from pile-up, jet flavour composition, single-particle response, and punch-through. The jet energy resolution uncertainty is parametrised as a function of jet \(p _{\mathrm{T}}\) and rapidity [74].

Uncertainties related to the b-jet tagging procedure, summarised under “b-tagging,” are determined separately for b-jets, c-jets and light-jets using a 27-component breakdown (6 for b-jets, 3 for c-jets, 16 for light-jets, and two extrapolation uncertainties) [60, 75, 76]. These uncertainties account for differences between data and simulation.

The systematic uncertainty due to the track-based terms (i.e. those tracks not associated with other reconstructed objects such as leptons and jets) used in the calculation of \(E_{\mathrm{T}}^{\mathrm{miss}}\) is evaluated by comparing the \(E_{\mathrm{T}}^{\mathrm{miss}}\) in \(Z\rightarrow \mu \mu \) events, which do not contain prompt neutrinos from the hard process, using different generators. Uncertainties associated with energy scales and resolutions of leptons and jets are propagated to the \(E_{\mathrm{T}}^{\mathrm{miss}}\) calculation [63].

The uncertainty in the combined 2015+2016 integrated luminosity is 2.1%. It is derived, following a methodology similar to that detailed in Ref. [77], and using the LUCID-2 detector for the baseline luminosity measurements [78], from calibration of the luminosity scale using \(x-y\) beam–separation scans. The uncertainty in the reweighting of the MC pile-up distribution to match the data is evaluated according to the uncertainty on the average number of interactions per bunch crossing.

7 Differential cross-section results

The absolute and normalised parton-level cross-sections for \(\Delta \phi \) and \(\Delta \eta \) are presented in Table 2 and Table 3. These results are compared to several NLO MC generators interfaced to parton showers (described in Sect. 3) in Fig. 6 and the breakdown of the contributions to the systematic uncertainties are shown in Fig. 7. In each case, the total generator cross-section was normalised to the NNLO values described in Sect. 3.

All uncertainties that are normalisation effects but which do not cause large changes in the shape of the observable (luminosity, for example) cancel when performing the normalised cross-sections. Jet and pile-up effects are also significant, but only in the absolute cross-sections. Overall, reasonable agreement is observed in the inclusive cross-section between the data and MC predictions but significant shape effects are apparent, particularly in the normalised observables where the uncertainties are small. Ignoring the differences in the absolute fiducial cross-sections between different MC generators, the shapes predicted by different generators are fairly consistent, except perhaps at very high \(\Delta \eta \). In the \(\Delta \phi \) observable, an obvious trend is observed, with the data tending to be higher than the expectation at low \(\Delta \phi \) and lower than the expectation at high \(\Delta \phi \). For \(\Delta \eta \), the data and expectation agree well at low values, even in the normalised cross-sections, but there is a slight tension at higher values.

Table 2 Summary of the parton-level absolute and normalised differential cross-sections as a function of \(\Delta \phi (\ell ^{+},\ell ^{-})\), with statistical and systematic uncertainties in each bin
Table 3 Summary of the parton-level absolute and normalised differential cross-sections as a function of \(\Delta \eta (\ell ^{+},\ell ^{-})\), with statistical and systematic uncertainties in each bin
Fig. 6
figure 6

The parton-level differential cross-sections compared to predictions from Powheg, MadGraph5_aMC@NLO and Sherpa: (top), absolute a \(\Delta \phi \) and b \(\Delta \eta \) and (bottom), normalised c \(\Delta \phi \) and d \(\Delta \eta \), using the inclusive selection

Fig. 7
figure 7

Systematic uncertainties for the parton-level differential cross-sections: (top), absolute a \(\Delta \phi \) and b \(\Delta \eta \) and (bottom), normalised c \(\Delta \phi \) and d \(\Delta \eta \). The \(t\bar{t}\) modelling uncertainties refer to the contributions from the NLO matrix-element generator (“Generator”), the PS algorithm (“Shower”) and the variation of initial- and final-state radiation (“Radiation”)

The unfolded, normalised parton-level cross-sections for \(\Delta \phi \) in four \(t\bar{t}\) invariant mass bins are shown in Table 4. They are compared with different NLO ME generators and parton showers in Fig. 8 and the systematic uncertainties are illustrated in Fig. 9. Each differential cross-section is normalised within its \(m_{t\bar{t}}\) range. In all regions of invariant mass, the systematic uncertainties arising from the modelling of the \(t\bar{t}\) and jets are dominant, with statistical uncertainties on the data becoming more important at higher values of invariant mass. In the lowest region of invariant mass, the various NLO predictions differ from each other and from the data. In the other regions of \(m_{t\bar{t}}\) the differences are less pronounced and agree within the uncertainties.

Table 4 Summary of the parton-level absolute and normalised differential cross-sections as a function of \(\Delta \phi (\ell ^{+},\ell ^{-})\) in four regions of \(m_{t{\bar{t}}}\), with statistical and systematic uncertainties in each bin
Fig. 8
figure 8

The normalised parton-level differential cross-sections in four \(t\bar{t}\) mass bins compared to predictions from Powheg, MadGraph5_aMC@NLO and Sherpa, using the reconstructed selection: a \(m_{t\bar{t}}<450\) GeV, b \(450\le m_{t\bar{t}}<550\) GeV, c \(550\le m_{t\bar{t}}<800\) GeV, and d \(m_{t\bar{t}}\ge 800\) GeV. Each differential distribution is normalised to the integrated cross-section within the individual \(m_{t\bar{t}}\) region

Fig. 9
figure 9

Systematic uncertainties for the normalised parton-level differential cross-sections in four \(t\bar{t}\) mass bins: a \(m_{t\bar{t}}<450\) GeV, b \(450\le m_{t\bar{t}}<550\) GeV, c \(550\le m_{t\bar{t}}<800\) GeV, and d \(m_{t\bar{t}}\ge 800\) GeV. The \(t\bar{t}\) modelling uncertainties refer to the contributions from the NLO matrix-element generator (“Generator”), the PS algorithm (“Shower”) and the variation of initial- and final-state radiation (“Radiation”)

The unfolded absolute and normalised particle-level cross-sections for \(\Delta \phi \) and \(\Delta \eta \) are presented in Fig. 10 and the overall data–MC agreement is very close to that observed at parton level. As with the parton-level results, the normalised uncertainties are significantly smaller than the absolute uncertainties, and signal modelling uncertainties are dominant. The size of the overall uncertainties are similar between fiducial particle and full phase-space parton level for the normalised cross-sections, indicating that the extrapolation to the full phase-space that is modelled by the NLO generators used in the parton-level results is not detrimental.

Fig. 10
figure 10

The fiducial particle-level differential cross-sections compared to predictions from Powheg, MadGraph5_aMC@NLO and Sherpa: (top), absolute a \(\Delta \phi \) and b \(\Delta \eta \) and (bottom), normalised c \(\Delta \phi \) and d \(\Delta \eta \) , using the inclusive selection

8 Spin correlation results

The level of spin correlation observed in data is (traditionally) assessed by quantifying it in relation to the amount of correlation expected in the SM [2,3,4,5,6,7,8,9]. This fraction of SM-like spin correlation (\(f_{\text {SM}}\)) is extracted using hypothesis templates that are fitted to the parton-level, unfolded normalised cross-sections from data. Two hypotheses are used: dileptonic \(t\bar{t}\) events with SM spin correlation (the nominal \(t\bar{t}\) sample) and dileptonic events where the effect of spin correlation has been removed (the nominal \(t\bar{t}\) sample where the top quarks are decayed using MadSpin with spin correlations disabled), as described in Sect. 3. In each observable, a binned maximum-likelihood fit is performed using MINUIT [79]. The predicted normalised cross-section in bin i, \(x_{i}\), is determined as a function of \(f_{\text {SM}}\) using the expression:

$$\begin{aligned} x_{i} = f_{\text {SM}} \cdot x_{\text {spin, }i} + (1 - f_{\text {SM}}) \cdot x_{\text {nospin, }i}~\text {,} \end{aligned}$$

where \(x_{\text {spin}}\) and \(x_{\text {nospin}}\) are the expected normalised cross-sections under the SM spin hypothesis and the uncorrelated hypothesis, respectively. The negative logarithm of a likelihood function is minimised in order to determine \(f_{\text {SM}}\). The extraction of \(f_{\text {SM}}\) is performed in five observables: the inclusive \(\Delta \phi \); and \(\Delta \phi \) in each of the four regions of \(m_{t\bar{t}}\). The total number of bins used in the extraction depends upon the region of \(m_{t\bar{t}}\).

The statistical uncertainty on \(f_{\text {SM}}\) is determined using ensemble tests. Ten thousand pseudo-data sets are constructed by Poisson-smearing the observed number of events in each bin of the detector-level distribution. Each of these data samples are unfolded in the usual manner, and fitted to extract \(f_{\text {SM}}\). The RMS of the resulting distribution of \(f_{\text {SM}}\) values gives the statistical uncertainty on this quantity.

Systematic uncertainties on \(f_{\text {SM}}\) are determined using the same procedure as for the unfolded differential cross-sections, considering the same sources as those described in Sect. 6. Monte Carlo samples with different sources of systematic uncertainty are unfolded, as described in Sect. 5, and the unfolded spectra are used as pseudo-data. The templates are fitted to this pseudo-data and the difference between the systematic \(f_{\text {SM}}\) and the nominal (i.e. \(f_{\text {SM}}=1\)) is taken as the systematic uncertainty on \(f_{\text {SM}}\) due to that source. The dominant uncertainties are summarised in Table 5; the largest sources of systematic uncertainty arise due to the modelling of the \(t\bar{t}\) process.

Table 5 Summary table of the effect of experimental systematic uncertainties on the \(f_{\text {SM}}\) extraction. Uncertainties which are smaller than the precision shown are included in the totals and the \(f_{\text {SM}}\) significance calculations

The hypothesis templates for each observable, the unfolded data, and the resulting fit are presented in Figs. 11 and 12. The \(f_{\text {SM}}\) extracted from each observable and the significance with respect to the SM hypothesis are presented in Table 6. Two cases are considered: first, only the uncertainties on the unfolded measurement are taken into account, and second, theoretical uncertainties on the hypothesis templates are included. These theoretical contributions include factorisation and renormalisation scale shifts as well as PDF uncertaintiesFootnote 3 and are distinct from the radiation uncertainties (which also include scale variations) that are already included in the unfolded differential cross-section uncertainties. An additional template uncertainty is considered, which takes into account the difference between the nominal Powheg +Pythia 8 \(t\bar{t}\) sample and the alternative Powheg +Pythia 8 sample in which MadSpin handles the decays, as described in Sect. 3.

For the inclusive result, the spin correlation extracted from the unfolded data is somewhat higher than the SM expectation, with a significance of 2.2 standard deviations when including theoretical uncertainties on the hypothesis templates, and 3.8 standard deviations without those uncertainties. Previous measurements from ATLAS and CMS have also observed \(f_{\text {SM}} > 1\) but the uncertainties were such that the results were consistent with the prediction, even without template uncertainties included [2,3,4,5,6,7,8,9]. The central \(f_{\text {SM}}\) value as a function of \(m_{t\bar{t}}\) is found to increase as a function of \(m_{t\bar{t}}\); however, the uncertainties on \(f_{\text {SM}}\) are much larger than in the inclusive case and none of the results deviate substantially from the SM expectation.

Table 6 Summary of extracted \(f_{\text {SM}}\) values for each explored region with total uncertainties as well as the significance of the result with respect to the SM hypothesis. The significance with respect to the SM hypothesis is calculated using the statistical and systematic uncertainties on the data under a Gaussian assumption, together with the theory uncertainties on the hypothesis templates. The latter include the effect of scale variations and PDF uncertainties, as well as the uncertainty arising from including top quark decays via MadSpin in the nominal \(t\bar{t}\) MC. The values in brackets exclude the effect of theoretical uncertainties on the hypothesis templates and only include the statistical and systematic uncertainties on the data

A number of cross-checks were performed to attempt to understand the results in terms of either the limitations of the modelling of the \(t\bar{t}\) system or by experimental effects not covered by the systematic uncertainty prescription described above. The NLO generators used in this analysis model \(t\bar{t}\) production at NLO in QCD (hereafter simply referred to as NLO) but do not fully include NLO effects in the decays of the top quarks, nor do they directly consider the effects of interference between the initial and final states. The production and decay of the top quarks are factorised using the narrow-width approximation (NWA). The MCFM generator [80] can provide fixed-order predictions for \(t\bar{t}\) production and decay at full NLO in the dilepton channel under the NWA. The effect of the spin analysing power of the lepton itself also changes from unity at LO to 0.998 at NLO [16] and this is also not considered in the nominal hypothesis templates. Alternative hypothesis templates were generated using MCFM (using the same scale and PDF settings as the nominal Powheg + Pythia 8 sample), illustrated in Fig. 13, and the prediction is remarkably close to the prediction from Powheg + Pythia 8. It is concluded that the LO decays of the top quarks used in the nominal hypothesis templates have little effect on the measurement and do not explain the observed differences between data and predictions.

The effect of removing the NWA can not be directly tested in the phase-space of this measurement. Without the NWA and with both NLO in production and in decay, it becomes unphysical to separate the \(t\bar{t}\) and the contribution of the tW processes. In this analysis the tW process is directly subtracted as a background from the data, preventing a direct comparison to calculations that do not include the NWA and simulate the full \(t\bar{t}\) + tW process. However, the effect on the \(\Delta \phi \) observable was investigated in an inclusive \(t\bar{t}\) + tW (\(b\ell ^{-}\nu _{\ell }{\bar{b}}\ell ^{+}\bar{\nu _{\ell }}\)) phase-space using the Powheg-Box-Res bb4l process [81] and compared to the nominal \(t\bar{t}\) + tW set-up and no significant differences were observed. It is therefore assumed that, in the \(t\bar{t}\) phase-space of this measurement, the NWA in the templates is not a limiting factor and does not explain the observed differences.

Alternative templates for \(f_{\text {SM}}\) extraction may be constructed from samples used to evaluate systematic uncertainties, such as the radiation variation of Powheg + Pythia 8, or from alternative generator set-ups, such as Powheg + Herwig 7 and MadGraph5_aMC@NLO + Pythia 8, or by changing the scales and PDF settings. In each case, the no-spin template is derived by scaling the prediction of the alternative model (with spin included) by the ratio of the no-spin and spin templates in the Powheg + Pythia 8 setup. The results of using different hypothesis templates are presented in Table 7. With the exception of the highest \(m_{t\bar{t}}\) bin, which has large statistical and systematic uncertainties, the \(f_{\text {SM}}\) values remain above 1 for all alternative templates.

The effect of higher orders in the production (NNLO) was investigated by reweighting the \(p_{\text {T}}(t)\) spectra in Powheg + Pythia 8 to NNLO fixed-order predictions and to observed detector-corrected data spectra [65]. The reweighting reduced the observed deviation somewhat but was consistent with the scale uncertainties that are already considered in the uncertainties on the hypothesis templates. Fixed-order NNLO predictions recently became available for the observables in this paper [82]. The results of these predictions are illustrated in Fig. 13 for the default renormalisation (\(\mu _{\text {R}}\)) and factorisation (\(\mu _{\text {F}}\)) scale choice of \(\mu _{\text {R}}=\mu _{\text {F}}=H_T/4\), where \(H_T = \sqrt{m_t^2+p_{T,t}^2}+\sqrt{m_t^2+p_{T,\bar{t}}^2}\). They are closer to the data than the NLO predictions, but still do not agree fully. The effect of higher orders in \(t{\bar{t}}\) production is therefore assumed to be well-covered by the theoretical uncertainties on the templates and also does not fully explain the observed value of \(f_{\text {SM}}\). Fiducial predictions are also available with the same NNLO calculation; however, the definition of the particles used to construct the fiducial region (specifically the b-jets) are not identical. This results in a somewhat different fiducial region compared to the measurements presented in Sect. 7, and therefore a direct comparison is not made.

Finally, an alternative differential prediction, made specifically for these observables [35, 83, 84], is used as a template. It is calculated at NLO in the strong and weak gauge couplings, using an expansion of the normalised differential distribution in powers of the couplings, with fixed renormalisation and factorisation scales equal to the top mass \(\mu _{\text {R}}=\mu _{\text {F}}=m_t\). This prediction also has a dedicated no-spin template. The prediction agrees better with the data but has significant scale uncertainties, leading to an \(f_{\text {SM}}=1.03\pm 0.07\text {(stat)}^{+0.10}_{-0.14} \text {(scale)}\), and is consistent both with the result from using the Powheg + Pythia 8 templates and with the SM expectation of \(f_{\text {SM}}=1\). The value of \(f_{\text {SM}}\) is consistent with that extracted from a measurement of \(\Delta \phi \) by CMS [85], using the same calculation. The predictions of Ref. [82] have also been calculated using the expansion technique of Refs. [35, 83, 84]; the NLO expansion with \(\mu _{\text {R}}=\mu _{\text {F}}=m_t\) leads to comparable results, again with significant scale uncertainties. When the calculation is extended to NNLO in the same framework, it lies further from the data and is consistent with the NNLO prediction without expansion of the normalised cross-section [82].

The comparison between data and the various SM predictions is illustrated in Figs. 13 and 14 . The disagreement between the data and the NLO predictions from MCFM and Powheg + Pythia 8 can be clearly observed in Fig. 14a. The NNLO fixed-order prediction agrees better with the data but still differs significantly. Finally, the expanded NLO predictions agree with the data within their large scale uncertainties, as shown in Fig. 14b, but the NNLO prediction using the same expansion does not.

Fig. 11
figure 11

Results of the fit of hypothesis templates to the unfolded data showing the \(\Delta \phi \) distribution for the inclusive selection. The hypothesis templates are described in Sect. 3

Fig. 12
figure 12

Results of the fit of hypothesis templates to the unfolded data showing the \(\Delta \phi \) distributions in \(m_{t\bar{t}}\) regions for the reconstructed selection: a \(m_{t\bar{t}}<450\) GeV, b \(450\le m_{t\bar{t}}<550\) GeV, c \(550\le m_{t\bar{t}}<800\) GeV, and d \(m_{t\bar{t}}\ge 800\) GeV. The hypothesis templates are described in Sect. 3

Fig. 13
figure 13

Comparison of the unfolded \(\Delta \phi \) distribution with theoretical predictions for the normalised cross-section with the inclusive selection. Each prediction is discussed in the text

Fig. 14
figure 14

Comparison of the unfolded \(\Delta \phi \) distribution with theoretical predictions for the inclusive selection: ratio as compared with Powheg + Pythia 8 for a NLO generators and NNLO fixed-order predictions, b NLO and NNLO theoretical predictions expanded in the normalised cross-section ratio. Each prediction is discussed in the text

Table 7 Summary of the extracted spin correlation values in the inclusive \(\Delta \phi \) observable using different hypothesis templates

9 SUSY interpretation

The detector-level \(\Delta \phi \) and \(\Delta \eta \) observables using the inclusive selection described in Sect. 4 are used to search for supersymmetric top squark pair production (\(\tilde{t}_{1}\bar{\tilde{t}}_{1}\)) with \({\tilde{t}}_{1} \rightarrow t{\tilde{\chi }}^{0}_{1}\) decays. Naturalness arguments suggest that SUSY models with light top squarks may provide a solution to the hierarchy problem; however, a light top squark with a mass nearly degenerate with that of the top quark (so called “stealth stops”) are challenging to detect using direct searches. It has been shown that leptonic spectra, such as \(\Delta \eta \), can differentiate between \(\tilde{t}_{1}\bar{\tilde{t}}_{1}\) and SM \(t\bar{t}\) production [86] and in previous searches ATLAS exploited differences between the expected spin correlations in the \(\Delta \phi \) observable to set limits on \(\tilde{t}_{1}\bar{\tilde{t}}_{1}\) production at a centre-of-mass energy of 8 TeV [6]. In this analysis the sensitivity of both of these observables is exploited simultaneously to maximise the sensitivity to stealth stop scenarios. Double-differential distributions of \(\Delta \phi \) in ranges of \(\Delta \eta \) of \(|\Delta \eta | < 1.5\), \(1.5< |\Delta \eta | < 2.5\), and \(2.5< |\Delta \eta | < 4.5\) are constructed as this was found to provide the optimal sensitivity to a wide variety of \(\tilde{t}_{1}\bar{\tilde{t}}_{1}\) scenarios, where the majority of the expected sensitivity comes from the \(|\Delta \eta |\) component. The MC samples used to simulate left-handed \(\tilde{t}_{1}\bar{\tilde{t}}_{1}\) production are described in Sect. 3, and are generated at LO in perturbative QCD. Figure 15 shows the effect on the expected \(\Delta \phi \) and \(\Delta \eta \) distributions individually, and on the double-differential distributions, from the inclusion of \(\tilde{t}_{1}\bar{\tilde{t}}_{1}\) signal with \(m_{{\tilde{\chi }}^{0}_{1}} = 0.5\) GeV and \(m_{{\tilde{t}}_{1}} = 170\) GeV or \(m_{{\tilde{t}}_{1}} = 210\) GeV compared to the data and SM \(t\bar{t}\) background. In the \(\Delta \phi \) distribution alone, \(f_{\text {SM}}\) varies from approximately 0.85 to 0.99 between \(m_{{\tilde{t}}_{1}} = 170\) GeV and \(m_{{\tilde{t}}_{1}} = 240\) GeV.

Observed and expected limits are set on the \(\tilde{t}_{1}\bar{\tilde{t}}_{1}\) production cross-section by simultaneously fitting the SM prediction to the observed data in the three double-differential distributions and varying the supersymmetric signal strength parameter \(\mu \). Limits are determined using a profile likelihood ratio in the asymptotic limit, using nuisance parameters to account for sources of systematic uncertainties. The limits are extracted at the \(95\%\) confidence level (CL) using the \(\hbox {CL}_\text {s}\) prescription [87]. The absolute cross-sections are used in the limits and the \(t{\bar{t}}\) cross-section is set to its SM value (as described in Sect. 3) but allowed to vary as a nuisance parameter within the theoretical uncertainties. All experimental and modelling systematic uncertainties that are considered for the differential cross-section results and for the spin correlation measurement are also considered here as nuisance parameters in the fit. In addition, these sources of experimental uncertainty are also considered on the \(\tilde{t}_{1}\bar{\tilde{t}}_{1}\) samples. In contrast to the \(t{\bar{t}}\) prediction, SUSY modelling uncertainties such as the choice of factorisation and renormalisation scales, the merging and matching scale, and the Pythia tune are found to produce a negligible contribution on the shape of the distributions. The limits are found to be insensitive to these systematic uncertainties, which are therefore neglected. Finally, an additional uncertainty is included to account for the spread of predictions observed in Fig. 13. The difference between the NLO background \(t{\bar{t}}\) predictions from Powheg + Pythia 8 and the NLO QCD + Weak expanded predictions is taken, symmetrised, as a theoretical model uncertainty in order to account for the differences observed in the data compared to the prediction as discussed in Sect. 8. This encompasses the range of current NLO predictions of the \(t{\bar{t}}\) background process, particularly for the \(\Delta \phi \) observable. The fit was found to be very sensitive to correlations between the additional radiation and MC generator uncertainties, therefore these uncertainties were split into individual nuisance parameters per bin to break the correlations. The largest sources of systematic uncertainty on the limits are the \(t\bar{t}\) cross-section uncertainty, the systematic uncertainties due to the choice of radiation settings and MC generator in the \(t\bar{t}\) simulation, and the theoretical model uncertainty on the \(t{\bar{t}}\) background. Without this final systematic uncertainty, the observed limits would become stronger due to the poorer description of the background compared to the data and the difference between the expected and observed limits would also become stronger. The limits are also found to be stable when the \(t{\bar{t}}\) prediction is reweighted to match the NLO QCD + Weak expanded prediction.

SUSY production for a given \(m_{{\tilde{t}}_{1}},m_{{\tilde{\chi }}^{0}_{1}}\) is considered to be excluded when the observed limit is below the expected SUSY cross-section, the theoretical uncertainty on which is 15% (from PDF and scale uncertainties [88]). For a neutralino mass \(m_{{\tilde{\chi }}^{0}_{1}} = 0.5\) GeV, Fig. 16 shows the observed (expected) limit, where top squarks with a mass between 170 (170) GeV and 230 (213) GeV are excluded with respect to the background generator prediction. Figure 17 shows the observed (expected) limit as functions of both \(m_{{\tilde{\chi }}^{0}_{1}}\) and \(m_{{\tilde{t}}_{1}}\) assuming the expected SUSY cross-sections. Observed (expected) limits are set on top squarks with masses between 170 (170) GeV and 230 (217) GeV for different values of \(m_{{\tilde{\chi }}^{0}_{1}}\), and stop production with neutralinos with masses below 62 (42) GeV is excluded for different values of \(m_{{\tilde{t}}_{1}}\). Figure 16 also shows the expected limits derived using only the single-differential distributions of \(\Delta \phi \) and \(\Delta \eta \). The limits on top squark production are dominated by the \(\Delta \eta \) observable, and are relatively insensitive to the contribution of the \(\Delta \phi \) distribution and its modelling at NLO.

If the \(t{\bar{t}}\) cross-section normalisation were a free parameter of the fit and not constrained within its theory prediction uncertainty, the expected cross-section limit would increase, giving a lower mass limit of \(m_{{\tilde{t}}_{1}} = 205\) GeV. If, on the other hand, the shape information of \(\Delta \phi \) and \(\Delta \eta \) were not used in the fit, the expected cross-section limit would increase, giving a lower mass limit of just below \(m_{{\tilde{t}}_{1}} = 170\) GeV. This shows that the shape information drives the limit, in contrast with the same study performed at 8 TeV [6].

The presence of low-mass top squarks could cause a bias in the determination of the top quark mass; this would cause an underestimation of the top quark mass in direct measurements. In the presence of two-body top squark decays, the top production cross-section used in the fit would be too high, resulting in more stringent limits being set on top squarks near and above the top quark mass. The limits would be most affected for \(m_{{\tilde{t}}_{1}} = m_{t}\) and the effect would disappear with increasing top squark mass. Equivalently, three-body top squark decays would affect limits near and below the top quark mass; however, as this analysis does not consider masses below \(m_{{\tilde{t}}_{1}} = 170\) GeV this effect is small. The magnitude of the expected bias on the top quark mass and the top quark pair production cross-section due to two-body top squark decays is estimated in Refs. [89, 90]. For \(m_{{\tilde{\chi }}^{0}_{1}} = 1\) GeV, the maximum bias is approximately \(\Delta m_{t} = 2.0~(0.4)\) GeV for \(m_{{\tilde{t}}_{1}} = 170~(200)\) GeV, resulting in a cross-section shift of 5.3% (1.1%). In addition, for \(m_{{\tilde{\chi }}^{0}_{1}} = 20\) GeV, the maximum bias is approximately \(\Delta m_{t} = 1.5~(0.5)\) GeV, resulting in a cross-section shift of 3.9% (1.3%). The fits are performed again assuming these biases, interpolated for the region of \(170 \le m_{{\tilde{t}}_{1}} \le 200\) GeV and \(m_{{\tilde{\chi }}^{0}_{1}} \le 20\) GeV. This results in an increase in the observed top squark cross-section limit at \(m_{{\tilde{t}}_{1}} = 170\) GeV of 8 pb, and an extension of the \(1\sigma \) error band in Fig. 17 by 10 GeV towards lower values of \(m_{{\tilde{t}}_{1}}\). However, neither the expected nor the observed upper limits set by the analysis are altered.

Some of the excluded phase-space has already been excluded by existing direct measurements, however, these results are more stringent than the existing limits [91] in a kinematically challenging region (where \(m_{{\tilde{t}}} \sim m_{t}\)) not currently excluded by direct searches [92, 93]. The entire phase-space excluded by this analysis is shown for completeness.

Fig. 15
figure 15

The inclusive a \(\Delta \phi \) and b \(\Delta \eta \) distributions compared to the sum of the SM and SUSY predictions, for \(m_{{\tilde{t}}_{1}} = 170\) GeV and 210 GeV, and \(m_{{\tilde{\chi }}^{0}_{1}} = 0.5\) GeV as well as the \(\Delta \phi \) in regions of \(\Delta \eta \): c \(|\Delta \eta | <1.5\), d \(1.5< |\Delta \eta | <2.5\), and e \(2.5<| \Delta \eta | <4.5 \). The dark uncertainty bands in the ratio plots represent the statistical uncertainties while the light uncertainty bands represent the statistical and systematic uncertainties added in quadrature. The systematic uncertainties include contributions from leptons, jets, missing transverse momentum, background modelling, pile-up modelling and luminosity, but not PDF or \(t\bar{t}\) modelling uncertainties

Fig. 16
figure 16

Expected and observed limits at \(95\%\) CL on the top squark pair production cross-section as a function of \(m_{{\tilde{t}}_{1}}\) assuming a \(100\%\) branching ratio for \({\tilde{t}}_{1} \rightarrow t{\tilde{\chi }}^{0}_{1}\) decays with \(m_{{\tilde{\chi }}^{0}_{1}} = 0.5\) GeV. The dashed line shows the expected limit with \(\pm 1\) and \(\pm 2\) standard deviation bands. The dashed line shows the theoretical cross-section with uncertainties. The solid line gives the observed limit. Also shown are the expected limits using the \(\Delta \phi \) and \(\Delta \eta \) distributions separately

Fig. 17
figure 17

Expected and observed limits at \(95\%\) CL on the top squark pair production cross-section as a function of \(m_{{\tilde{\chi }}^{0}_{1}}\) and \(m_{{\tilde{t}}_{1}}\) assuming a \(100\%\) branching ratio for \({\tilde{t}}_{1} \rightarrow t{\tilde{\chi }}^{0}_{1}\) decays. The dashed line shows the expected limit with \(\pm 1\) standard deviation band. The solid line shows the observed limit with the \(\pm 1\sigma \) (dotted) SUSY cross-section theoretical uncertainties

10 Conclusion

Absolute and normalised differential cross-sections have been measured as a function of the azimuthal angle difference, \(\Delta \phi \), and the pseudorapidity difference, \(\Delta \eta \), between the two charged leptons in the \(e\mu \) decay channel of top quark pairs using 36.1 \(\hbox {fb}^{-1}\) of data recorded by the ATLAS detector in proton–proton collisions at \(\sqrt{s} = 13\) TeV during 2015 and 2016 at the LHC. The \(\Delta \phi \) differential cross-section is also measured as a function of the \(t\bar{t}\) invariant mass. None of the studied generators are able to reproduce the normalised \(\Delta \phi \) distribution within the experimental errors. A comparison was made with fixed-order predictions at NNLO in QCD and with the expansion at NLO in QCD and Weak couplings using a fixed scale choice. The former slightly improves the description of the data, while the latter describes the data but with large scale uncertainties.

An extraction of spin correlation was performed using the normalised parton-level \(\Delta \phi \) observable. The spin correlation was found to be higher than that predicted by the SM as implemented in NLO MC generators with a significance of 2.2 standard deviations. The measured value of spin correlation agrees well with the prediction by the expansion at NLO in QCD and Weak couplings, but is less consistent with NNLO predictions, with or without expansion in the normalised cross-section. The spin correlation was also found to increase slightly as a function of the invariant mass of the \(t\bar{t}\) system but no individual bin indicates a discrepancy above 1.2 standard deviations, due to the larger statistical and systematic uncertainties in these regions.

A search for \(\tilde{t}_{1}\bar{\tilde{t}}_{1}\) production was also performed using double-differential distributions of \(\Delta \phi \) in ranges of \(\Delta \eta \). In the absence of a SUSY signal in data, limits were set on top squark and neutralino production, taking into account the current limitations of the signal and background modelling. Top squarks with masses between 170 and 230 GeV are excluded for most kinematically allowed values of the neutralino mass, compared to expected limits of 170 and 217 GeV.