1 Introduction

Neutrino physics presents one of the major open puzzles of modern particle physics. The extremely small values of the neutrino masses compared to the masses of the other fermions appear unnatural in the Standard Model (SM) [1]. The seesaw mechanism [2,3,4,5,6] provides an elegant way to give a very small mass \(m_\nu \) to each of the SM neutrinos by introducing a heavy Majorana neutrino with mass \(M\). After spontaneous electroweak (EW) symmetry breaking, the light-neutrino mass generated through the Yukawa coupling to the Higgs field is suppressed by the heavy scale, \(m_\nu \sim y^2 v^2 / M\), where \(y\) is the Yukawa coupling and \(v\) is the Higgs vacuum expectation value. Three types of seesaw mechanisms have been proposed, and their phenomenology can be tested at collider experiments. The type-III seesaw [7] introduces at least one extra fermionic \(\hbox {SU}(2)_{\textrm{L}}\) triplet field coupled to the EW gauge bosons. The resulting heavy charged and neutral leptons can in principle be produced by EW processes at the Large Hadron Collider (LHC).

Type-III seesaw heavy-lepton searches have already been performed in various decay channels by both the ATLAS and CMS collaborations. In Run 1, ATLAS excluded heavy leptons with masses below 335 GeV [8] using final states containing two light leptons (electrons or muons) and two jets. This mass limit was then improved to 470 GeV, still using Run 1 data, by adding the three-lepton channel [9] as suggested in Ref. [10]. Using the full Run 2 data sample of proton–proton collisions at \(\sqrt{s} = {13}~{\hbox {TeV}}\), the CMS Collaboration has excluded heavy-lepton masses up to 880 GeV [11] by analysing three- and four-lepton final states, while ATLAS has excluded heavy-lepton masses up to 790 GeV [12] by using only the two-lepton-plus-jets final state. The analysis presented in this paper searches for a type-III seesaw heavy lepton in three- and four-lepton final states. For the first time, a combination with the two-lepton-plus-jets final state is performed, giving a significant improvement in the sensitivity of the analysis.

Fig. 1  Examples of Feynman diagrams for the considered type-III seesaw model [13] producing three- and four-lepton final states

The type-III seesaw model targeted in this search is described in Ref. [13]. It assumes that the pair production of the neutral Majorana (\(N^{0} \)) and charged (\(L^{\pm } \)) heavy leptons proceeds via the s-channel production of virtual EW gauge bosons. \(N^{0}\) pairs are not produced because \(N^{0}\) has \(T_3 = Y = 0\) and thus does not couple to the \(Z \) [10, 14]. The production cross-section depends only on the masses of the \(N^{0} \) and \(L^{\pm } \), which are assumed to be degenerate, as the mass splitting induced by electroweak radiative corrections is expected to be smaller than approximately 200 MeV [15]. The decays allowed in this model are \(L^{\pm } \rightarrow H\ell ^\pm ,Z\ell ^\pm ,W^\pm \nu \) and \(N^{0} \rightarrow Z\nu , H\nu , W^\pm \ell ^{\mp }\), where the SM leptons can be of any flavour, i.e. \(\ell = e,\mu ,\tau \). The branching ratios \({\mathcal {B}}_\ell \) for the heavy-lepton decays into \(\ell \) plus one SM boson are determined by the parameters \(V_\ell \), which govern the mixing between the new heavy leptons and the SM leptons. Current bounds on \(V_\ell \) can be found in Ref. [16]. This analysis assumes the democratic scenario, in which the three mixing parameters are equal, so that \({\mathcal {B}}_e = {\mathcal {B}}_\mu = {\mathcal {B}}_\tau = 1/3 \). The branching ratios \({\mathcal {B}}_{Z}\), \({\mathcal {B}}_{W}\) and \({\mathcal {B}}_{H}\) for heavy-lepton decays into any SM lepton plus a \(Z\), \(W\) or \(H\) boson are independent of the mixing parameters. For \(N^{0}\) masses larger than a few times the \(H\) mass, as considered in this analysis, these branching ratios become independent of the heavy-lepton mass, and \(2 {\mathcal {B}}_H \simeq 2 {\mathcal {B}}_Z \simeq {\mathcal {B}}_W \simeq 1/2 \). Examples of Feynman diagrams for three- and four-lepton final states are shown in Fig. 1. These events are characterised by the production of two SM bosons (VV, VH or HH, where \(V= W, Z \)) and two charged leptons or neutrinos in the final state. The Majorana nature of \(N^{0}\) allows final states with four leptons and a non-zero total lepton electric charge, as shown in Fig. 1b. This analysis focuses on events with high light-lepton multiplicity, including light leptons from \(\tau \)-lepton decays.
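For illustration, the branching fractions quoted above can be organised as in the following minimal sketch (not analysis code; the numerical values follow directly from the text, and the factorisation of the boson and flavour factors is an assumption of the sketch):

```python
# Minimal sketch (not analysis code): branching fractions assumed in the text for
# the democratic scenario, with 2*B_H ~ 2*B_Z ~ B_W ~ 1/2 in the heavy-mass limit.
B_lepton = {"e": 1 / 3, "mu": 1 / 3, "tau": 1 / 3}   # B_e = B_mu = B_tau = 1/3
B_boson = {"W": 1 / 2, "Z": 1 / 4, "H": 1 / 4}       # sums to unity

# Example: assuming the boson and flavour factors factorise,
# B(L+- -> Z e+-) ~ B_Z * B_e.
print(B_boson["Z"] * B_lepton["e"])   # ~0.083
```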

This paper is structured as follows. The ATLAS detector is described in Sect. 2, the data and simulated events used in the analysis are outlined in Sect. 3, and the event reconstruction procedure is detailed in Sect. 4. The analysis strategy and background estimation are presented in Sects. 5 and 6, respectively. The systematic uncertainties are described in Sect. 7. Finally, results and their statistical interpretation are presented in Sect. 8, followed by the conclusions in Sect. 9.

2 ATLAS detector

The ATLAS detector [17] at the LHC is a multipurpose particle detector with a near-\(4\pi \) coverage in solid angle around the collision point and a cylindrical geometry (see Footnote 1) coaxial with the beam axis. It consists of an inner tracking detector surrounded by a thin superconducting solenoid providing a 2 T magnetic field, electromagnetic and hadronic calorimeters, and a muon spectrometer with superconducting toroidal magnets.

The inner detector (ID) provides charged-particle tracking in the range \(|\eta | < 2.5\) and, going outwards from the beam pipe, is composed of a high-granularity silicon pixel detector that typically provides four measurements per track, the first hit normally being in the insertable B-layer installed before Run 2 [18, 19], a silicon microstrip tracker, and a transition radiation tracker that covers the region up to \(|\eta | = 2.0\).

The calorimeter system covers the pseudorapidity range \(|\eta | < 4.9\). Within the region \(|\eta |< 3.2\), electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering \(|\eta | < 1.8\) to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within \(|\eta | < 1.7\), and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements respectively.

The muon spectrometer (MS) instruments the outer part of the detector and is composed of high-precision tracking chambers up to \( |\eta | = 2.7 \) and fast detectors for triggering up to \( |\eta | = 2.4 \). The MS is immersed in a magnetic field produced by three large superconducting air-core toroidal magnets with eight coils each.

A two-level trigger system is used to select events that are of interest for the ATLAS physics programme [20]. The first-level trigger is implemented in hardware and reduces the event rate to below 100 kHz. A software-based trigger further reduces this to a recorded event rate of approximately 1 kHz.

An extensive software suite [21] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

3 Data and simulated events

This analysis uses data collected in proton–proton collisions at \(\sqrt{s} = {13}\) TeV with proton bunches colliding every 25 ns. After requiring that all ATLAS subdetectors collected high-quality data and were operating normally [22], the total integrated luminosity amounts to \({139}~{\hbox {fb}^{-1}}\). The uncertainty in the combined 2015–2018 integrated luminosity is 1.7% [23], obtained using the LUCID-2 detector [24] for the primary luminosity measurements. Events were collected using dilepton triggers selecting pairs of electrons [25] or muons [26]. The transverse momentum (\(p_{\text {T}}\)) threshold of the unprescaled dilepton trigger was raised during the data taking, due to the increasing luminosity of the colliding beams, but was never higher than 24 GeV for the leading electrons and 22 GeV for the leading muons.

Signal and background events were modelled using different Monte Carlo (MC) generators, as listed in Table 1. The response of the ATLAS detector was simulated [27] using the Geant4 toolkit [28], and simulated events were reconstructed with the same algorithms as those applied to data [21]. The type-III seesaw signal model was implemented in the MadGraph5_aMC@NLO [29] generator at leading order (LO) using FeynRules [30] and the NNPDF3.0lo [31] parton distribution function (PDF) set. All decays of the \(L^{\pm }\) and \(N^{0}\) into the different leptonic flavours, and the subsequent decays of the \(W\), \(Z\) and \(H\) bosons, are considered. Matrix element (ME) events were interfaced to Pythia 8 [32] for parton showering with the A14 set of tuned parameters [33] and the NNPDF2.3lo PDF set [34]. The signal cross-section and its uncertainty at next-to-leading-order (NLO) plus next-to-leading-logarithm (NLL) accuracy were calculated from \( \text {SU}(2)\) triplet production in an electroweak chargino–neutralino model [35, 36]. The calculated cross-sections are compatible within uncertainties with the type-III seesaw NLO implementation [37, 38]. The production cross-sections for 600, 800 and 1000 \(\text {GeV}\) type-III seesaw heavy leptons are \(29.6\pm 3.0\), \(7.0\pm 0.8\) and \(1.97 \pm 0.25\) fb, respectively.
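As a rough illustration of the signal event counts implied by these cross-sections and the 139 fb\(^{-1}\) data set, a minimal sketch (before any acceptance or selection efficiency is applied):

```python
# Minimal sketch (illustrative only): expected number of produced signal events,
# N = sigma x integrated luminosity, using the NLO+NLL cross-sections quoted above,
# before any acceptance or selection efficiency is applied.
lumi_fb_inv = 139.0
xsec_fb = {600: 29.6, 800: 7.0, 1000: 1.97}   # heavy-lepton mass [GeV] -> sigma [fb]

for mass_gev, sigma_fb in xsec_fb.items():
    n_produced = sigma_fb * lumi_fb_inv
    print(f"m = {mass_gev} GeV: ~{n_produced:.0f} produced signal events")
```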

The simulated SM background samples include diboson processes, which are the dominant background, followed by processes labelled 'rare top quark', which include multi-top-quark production and top-quark production in association with EW bosons (\( t\bar{t} V, t\bar{t} H, t W Z \)). Other simulated SM samples are triboson (VVV), \(t\bar{t}\), single-top-quark, and Drell–Yan (\( q\bar{q} \rightarrow Z/\gamma ^{*} \rightarrow \ell ^{+} \ell ^{-} \, (\ell =e,\mu ,\tau ) \)) production processes. They are mainly used for the estimation of reducible backgrounds, as described in Sect. 6. The MEPS@NLO prescription [39] was used in the generation of Drell–Yan processes to match the ME to the parton shower. The generators used in the MC sample production and the cross-section calculations used for MC sample normalisations are listed in Table 1. The normalisations of the dominant backgrounds, the diboson and rare top-quark processes, are extracted from the final likelihood fit, as described in Sect. 8.

Table 1 Configurations used for event generation of signal and most-relevant background processes. For the cross-section, the order in the strong coupling constant is shown for the perturbative calculation. If only one parton distribution function is shown, the same one is used for both the ME and parton shower generators; if two are shown, the first is used for the ME calculation and the second for the parton shower. Tune refers to the set of tuned underlying-event parameters used by the parton shower generator. The masses of the top quark and SM Higgs boson were set to 172.5 \(\text {GeV}\) and 125 \(\text {GeV}\), respectively. The samples with negligible impact are mentioned in the table but not discussed in the text

Diboson (VV) and triboson (VVV) events were simulated with the Sherpa generator [44]. Off-shell effects and Higgs boson contributions were included. ME calculations were matched and merged with the parton shower based on the Catani–Seymour dipole factorisation [45, 46] using the MEPS@NLO prescription [39, 47,48,49]. The OpenLoops library [50, 51] provided the virtual QCD corrections. Diboson samples were produced with different lepton multiplicities using MEs at NLO accuracy in QCD for up to one additional parton and at LO accuracy for up to three additional parton emissions. Loop-induced \( gg \rightarrow VV \) processes were generated with LO MEs for emission of up to one additional parton for both the fully leptonic and semileptonic final states. Electroweak production of a diboson pair in association with two jets (VVjj) was also simulated at LO. The PDFs used for the nominal samples were CT14 [52] and MMHT2014 [53].

Samples of events from the \( t\bar{t} + V\) (where V stands for \(\gamma ^*, W,Z \, {\textrm{and}}\, H \)) and \(t W Z \) processes were produced at NLO accuracy using MadGraph5_aMC@NLO [29] with the NNPDF3.0nlo [31] PDFs and were interfaced to the parton shower. The \(h_\textrm{damp}\) parameter, which controls the matching between the ME and the parton shower, was set to \(1.5\, m_{\textrm{top}}\) [54], using a top-quark mass of \( m_{\textrm{top}} = {172.5}\) GeV.

All simulated events were overlaid with a simulation of multiple \(pp\) interactions occurring in the same or neighbouring bunch crossings. These inelastic \(pp\) events were generated using the NNPDF2.3lo set of PDFs and the A3 set of tuned parameters [55]. Their effects are referred to as pile-up. The simulated events were reweighted such that the distribution of the average number of interactions per bunch crossing is compatible with that observed in the data.

4 Event reconstruction

Events considered in this analysis are required to have at least one collision vertex reconstructed with at least two tracks with transverse momentum greater than 500 MeV. The primary vertex of the hard-scattering event is the one with the largest sum of the associated tracks’ squared transverse momenta. Events have to satisfy the quality criteria listed in Ref. [22], including rejection of events with a large amount of calorimeter noise or non-collision background.

Electrons are reconstructed by matching a charged-particle track in the ID with an energy deposit in the electromagnetic calorimeter. Electron candidates are required to satisfy a Loose likelihood-based identification selection [56] and to be in the fiducial volume of the inner detector, \( |\eta | < 2.47\). The transition region between the barrel and endcap calorimeters (\( 1.37< |\eta | < 1.52 \)) is excluded because it is partially non-instrumented due to services infrastructure. The transverse impact parameter \( d_{0} \) of the track associated with the candidate electron must have a significance satisfying \( |d_{0} |/\sigma (d_{0}) < 5\). This is required in order to reduce the number of electrons originating from secondary decays. Similarly, the track’s longitudinal impact parameter \(z_0\) relative to the primary vertex must satisfy \(|z_0\sin (\theta )|<{0.5}\) mm, where \(\theta \) is the track’s polar angle. The electron’s transverse energy \(E_{\text {T}}\) must exceed 10 GeV. After this preselection, to refine the electron quality, a Tight likelihood-based identification selection and a set of Loose isolation criteria based on both calorimetric and tracking information are applied to primarily select electrons coming from the decays of the heavy leptons or the EW bosons.
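The preselection requirements listed above can be summarised schematically as in the following sketch (illustrative field names and structure only, not the ATLAS reconstruction software; the likelihood-based identification is represented by a simple boolean flag):

```python
# Minimal sketch (illustrative only) of the electron preselection described above.
def passes_electron_preselection(el):
    eta = abs(el["eta"])
    in_fiducial = eta < 2.47 and not (1.37 < eta < 1.52)   # exclude the crack region
    return (in_fiducial
            and el["et_gev"] > 10.0                 # transverse energy threshold
            and abs(el["d0_significance"]) < 5.0    # |d0| / sigma(d0)
            and abs(el["z0_sin_theta_mm"]) < 0.5    # |z0 sin(theta)| in mm
            and el["passes_loose_llh_id"])          # Loose likelihood-based ID

example = {"eta": 0.8, "et_gev": 35.0, "d0_significance": 1.2,
           "z0_sin_theta_mm": 0.1, "passes_loose_llh_id": True}
print(passes_electron_preselection(example))   # True
```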

Track segments in the MS are matched with ID tracks to reconstruct muons if they are within the \(\eta \) coverage of the ID. Muon candidates with \(p_{\text {T}}\) lower than 300 GeV are required to satisfy the Medium muon identification requirements, while for high-\(p_{\text {T}}\) muons, a specific identification working point is applied [57]. Muon candidates are required to have \(|\eta | < 2.5\), a transverse impact parameter significance of \(|d_{0} |/\sigma (d_{0}) < 3 \) and a longitudinal impact parameter value of \( |z_{0}\sin (\theta )| < {0.5}\) mm. The minimum muon \(p_{\text {T}}\) is 10 GeV. After this preselection, an isolation requirement based only on tracking information is applied.

In this analysis, particle-flow objects [58] are formed from energy-deposit clusters in the calorimeters and from tracks measured in the ID that are not matched to identified leptons. Particle-flow objects are then clustered into jets using the anti-\(k_{t}\) algorithm [59] with a radius parameter \(R = 0.4\). The measured jet \(p_{\text {T}}\) is corrected for detector effects so that it corresponds to the particle-level energy before interactions with the detector material [60]. Energy within jets that is due to pile-up is estimated and removed by subtracting an amount equal to the mean pile-up energy deposition density multiplied by the \( \eta \)–\( \phi \) area of the jet. Pile-up can also produce additional jets; these are identified and rejected by the jet-vertex-tagger (JVT) algorithm [61], which distinguishes them from jets originating from the hard-scattering primary vertex. Only jets with transverse energy greater than 20 GeV and \(|\eta | < 2.4\) are considered. Jets originating from heavy-flavour quarks are identified with the MV2c10 multivariate b-tagging algorithm at the 77% efficiency working point [62,63,64], with measured rejection factors of approximately 134, 6 and 22 for light-quark and gluon jets, c-jets, and hadronically decaying \(\tau \)-leptons, respectively.
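The area-based pile-up correction described above amounts to subtracting \(\rho \times A\) from the raw jet \(p_{\text {T}}\); a minimal sketch with illustrative numbers:

```python
# Minimal sketch of the area-based pile-up subtraction described above: the
# expected pile-up contribution, i.e. the event pile-up pT density rho multiplied
# by the jet's eta-phi catchment area A, is subtracted from the raw jet pT.
def pileup_subtracted_pt(pt_raw_gev, rho_gev, jet_area):
    return pt_raw_gev - rho_gev * jet_area

print(pileup_subtracted_pt(pt_raw_gev=60.0, rho_gev=20.0, jet_area=0.5))  # 50.0 GeV
```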

The missing transverse momentum \(\vec {p}_{{\textrm{T}}}^{\textrm{miss}}\) (with magnitude \(E_{\text {T}}^{\text {miss}}\)) is calculated as the negative vectorial sum of the \(p_{\text {T}}\) of reconstructed jets and leptons in the event. A ‘soft term’ taking into account tracks associated with the primary vertex but not with any hard object is then added to guarantee the best performance in a high pile-up environment [65]. The \(E_{\text {T}}^{\text {miss}}\) significance \({\mathcal {S}}(E_{\text {T}}^{\text {miss}})\), calculated with a maximum-likelihood ratio method, is used in Sect. 5 to define the various analysis regions, taking into account the direction of the \(\vec {p}_{{\textrm{T}}}^{\textrm{miss}}\) and calibrated objects as well as their respective resolutions.
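A minimal sketch of the \(E_{\text {T}}^{\text {miss}}\) construction described above, with illustrative inputs (the soft term is represented here simply as an additional two-vector):

```python
# Minimal sketch of the E_T^miss construction described above: the negative
# vectorial sum of the transverse momenta of the calibrated hard objects plus a
# track-based soft term. Inputs are illustrative (px, py) pairs in GeV.
import math

def missing_et(hard_objects, soft_term=(0.0, 0.0)):
    px = -(sum(px_i for px_i, _ in hard_objects) + soft_term[0])
    py = -(sum(py_i for _, py_i in hard_objects) + soft_term[1])
    return math.hypot(px, py)   # magnitude of the missing transverse momentum

print(missing_et([(40.0, 10.0), (-25.0, 30.0)], soft_term=(3.0, -2.0)))
```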

Different objects reconstructed close together in the \(\eta \)\(\phi \) plane could in principle have originated from the same primary object. Possible overlaps are resolved by an algorithm that appropriately removes one of the two closely spaced objects to avoid double-counting. If a muon candidate is found to share an ID track with an electron candidate, the electron candidate is rejected. If two electron candidates share an ID track, the one with the lower \(p_{\text {T}}\) is rejected. Jets are rejected if they are within \(\Delta R = 0.2\) of a lepton candidate, except if the candidate is a muon and three or more collinear tracks are found. Finally, lepton candidates that are within \( \Delta R = 0.4\) of any remaining jet are removed.
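The \(\Delta R\)-based steps of this procedure can be sketched as follows (illustrative only; the shared-track electron–muon and electron–electron steps and the muon collinear-track exception are omitted):

```python
# Minimal sketch of the DeltaR-based steps of the overlap removal described above.
# Objects are simple dicts with eta and phi; names and structure are illustrative.
import math

def delta_r(a, b):
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)

def resolve_overlaps(leptons, jets):
    # Reject jets within DeltaR = 0.2 of a lepton candidate ...
    jets = [j for j in jets if all(delta_r(j, l) > 0.2 for l in leptons)]
    # ... then reject lepton candidates within DeltaR = 0.4 of any remaining jet.
    leptons = [l for l in leptons if all(delta_r(l, j) > 0.4 for j in jets)]
    return leptons, jets

leps = [{"eta": 0.5, "phi": 1.0}]
jts = [{"eta": 0.55, "phi": 1.05}, {"eta": 2.0, "phi": -1.0}]
print(resolve_overlaps(leps, jts))   # the close-by jet is removed, the lepton kept
```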

5 Analysis strategy

Once events have been classified according to the presence of exactly three or exactly four light leptons (see Footnote 2), the two lepton-multiplicity categories are refined with dedicated selections. Events with larger lepton multiplicities can migrate into the lower-multiplicity categories if one or more leptons escape detection. Signal regions (SRs) are defined so as to maximise the significance of the signal event count predicted by the targeted model relative to the expected number of SM background events. The SM backgrounds are normalised by performing a simultaneous fit in the SRs and in dedicated control regions (CRs). The CRs are defined so as to be enriched in the relevant background processes and depleted in events from signal processes. The fit uses a kinematic variable chosen to optimise the sensitivity to the small cross-sections expected for the signal processes. Validation regions (VRs), also depleted in signal events, are used to validate the extrapolation of the SM background expectations obtained from the background-only fit to independent regions kinematically close to the SRs. All CRs and VRs have a signal contamination below 2%.

Backgrounds are assigned to two broad categories: reducible and irreducible backgrounds. Reducible backgrounds include leptons from misreconstructed objects such as jets, or from light- or heavy-quark decays or, in the electron case, photon conversions. These are called fake or non-prompt (FNP) leptons, and events containing at least one such lepton are referred to as the FNP background. Its contribution is calculated using a data-driven method. Irreducible backgrounds are produced by SM processes with three or four prompt leptons in the final state. Prompt leptons are leptons produced in the decays of \(W\) bosons, \(Z\) bosons, and \(\tau \)-leptons, as well as direct decays of the heavy leptons considered as signal in this analysis. The most important sources are diboson and rare top-quark processes, with the latter being primarily \(t\bar{t}\) pairs produced in association with an EW or Higgs boson. For these processes, kinematic distributions are obtained from MC simulation and their normalisation is extracted from the fit. Because low-mass heavy leptons have been excluded by previous searches, this search focuses on higher masses, where signal events are characterised by objects having high momenta.

Details about the three- and four-lepton analysis regions are provided below, while details of the two-lepton analysis regions are given in Ref. [12]. The analysis regions are all orthogonal to one another.

Table 2 Summary of the selection criteria used to define relevant regions in the three-lepton analysis. No selection is applied when a dash is present in the corresponding cell

5.1 Three-lepton channel

Table 2 summarises the selection criteria used to define the three-lepton SRs, CRs and VRs. The ZL SR is characterised by a leptonically decaying \(Z\) boson, and thus an opposite-sign, same-flavour (OSSF) lepton pair compatible with the \(Z\) boson mass is required. The SM boson from the decay of the other heavy lepton produced in the event decays hadronically. Signal events are expected to have a large three-lepton invariant mass (\(m_{\ell \ell \ell }\)), and the transverse masses of the two highest-\(p_{\text {T}}\) leptons, \(m_{\textrm{T}}(\ell _{1})\) and \(m_{\textrm{T}}(\ell _{2})\), are also expected to be large (see Footnote 3). An additional requirement is placed on the angular distance between the leading and subleading leptons to further increase the signal-to-background ratio in the SRs.
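The footnote defining the transverse mass is not reproduced here; the sketch below assumes the standard definition \(m_{\textrm{T}} = \sqrt{2\, p_{\text {T}}\, E_{\text {T}}^{\text {miss}}\, (1-\cos \Delta \phi )}\):

```python
# Minimal sketch assuming the standard transverse-mass definition
# m_T = sqrt(2 * pT(lep) * ETmiss * (1 - cos(dphi))), where dphi is the azimuthal
# angle between the lepton and the missing transverse momentum.
import math

def transverse_mass(lep_pt_gev, lep_phi, met_gev, met_phi):
    dphi = lep_phi - met_phi
    return math.sqrt(2.0 * lep_pt_gev * met_gev * (1.0 - math.cos(dphi)))

print(transverse_mass(120.0, 0.3, 80.0, 2.5))   # illustrative values, result in GeV
```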

A complementary ZLVeto SR, targeting signals involving leptonic decays of \(W\) bosons and hadronic decays of \(Z\) bosons (including those from \(H\) bosons), is defined by vetoing events containing an OSSF lepton pair compatible with the leptonic decay of an on-shell \(Z\) boson; this is achieved by requiring the invariant mass of any OSSF pair to be larger than 115 \(\text {GeV}\). The \(H_{\text {T}}\) variable is defined as the scalar sum of the \(p_{\text {T}}\) of all selected objects in the event; when the sum is restricted to a subset of the objects, this subset is specified. Signal events are characterised by large \(H_{\text {T}}\) and \(E_{\text {T}}^{\text {miss}}\) values and by a large value of the scalar sum of the momenta of the same-sign (SS) leptons, denoted by \(H_{\text {T}} ({\textrm{SS}})\). Since the presence of SS leptons in this region is mainly due to rare top-quark events and FNP leptons, the \(H_{\text {T}} \) of this pair is used as a discriminating variable, requiring \(H_{\text {T}} ({\textrm{SS}}) \ge 300\) \(\text {GeV}\). To account for possible hadronic decays of the electroweak bosons in diboson background events, an upper limit is placed on the invariant mass of the two leading jets, \(m_{jj}\).

Table 3 Summary of the selection criteria used to define relevant regions in the four-lepton analysis. \(N_Z \) is the number of leptonically reconstructed \(Z\) bosons, using opposite-sign same-flavour leptons. No selection is applied when a dash is present in the corresponding cell

Finally, the JNLow SR targets events where the electroweak bosons decay leptonically, and therefore events with low jet multiplicity, as in Fig. 1a, are selected. A lower bound is imposed on the invariant mass of the OSSF lepton pair, \(m_{\ell \ell }({\textrm{OSSF}})\) (see Footnote 4), and a large value of the scalar sum of the \(p_{\text {T}}\) of the three leptons, \(H_{\text {T}} (\ell \ell \ell )\), is required. The FNP-lepton background is further reduced by requiring \(m_{{\textrm{T}}}(\ell _{1})\) and \(m_{{\textrm{T}}}(\ell _{2})\) to exceed a minimum value. The angular separation \(\Delta R(\ell _1,\ell _2)\) between the two leading leptons is also required to exceed a minimum value, which further suppresses the FNP contribution. The overall selection efficiencies for the production of an 800 \(\text {GeV}\) type-III seesaw heavy lepton are 0.29%, 0.57% and 0.41% for the ZL, ZLVeto and JNLow SRs, respectively.

The SM background in the three-lepton SRs is dominated by diboson events, which contribute \({\sim }60\%\), \({\sim }80\%\) and \({\sim }40\%\) of the total in the ZL, JNLow and ZLVeto regions, respectively. Another important background in the ZL and ZLVeto regions originates from rare top-quark processes, which contribute \({\sim }40\%\) and \({\sim }50\%\) of the background in those regions, respectively. A CR targeting the normalisation of the diboson background is defined by requiring at least two jets and a low transverse mass for the subleading lepton, \(m_{\textrm{T}}(\ell _2)\le {200}\) GeV.

Two VRs are defined to validate the background estimates for events containing a \(Z\) boson decaying into leptons; both are obtained by inverting the \(\Delta R (\ell _1,\ell _2)\) selection of the ZL SR and applying additional requirements. The DB-VR also requires a b-tag veto, while the RT-VR requires the presence of at least one b-tagged jet. These VRs validate the predictions and normalisation of the diboson and rare top-quark processes, respectively. An additional JNLow-VR is obtained from the JNLow SR by inverting the transverse-mass requirement on the leading lepton, \(m_{\textrm{T}}(\ell _1)\le {240}\) GeV. Finally, a Fake-VR is defined by inverting the \({\mathcal {S}}(E_{\text {T}}^{\text {miss}})\) selection common to all the other regions, without applying any additional requirement except those on the lepton \(p_{\text {T}}\). This region is enriched in contributions from FNP backgrounds and is therefore used to validate them.

In the three-lepton channel SRs, the kinematic variable used as the final discriminant in the fit to the data is the transverse mass of the three-lepton system, \(m_{{\textrm{T}},3\ell }\).

5.2 Four-lepton channel

In the four-lepton channel, the momentum requirements on the three leading leptons are the same as in the three-lepton channel; the momentum of the fourth lepton is required to be larger than 10 GeV. Events are classified using the sum of the charges of the four leptons in the final state, \(\sum q_\ell \). The conditions \(\sum q_\ell = 0\) and \(|\sum q_\ell |= 2\) define the zero-charge (Q0) and double-charge (Q2) regions, respectively. A summary of the selection criteria defining the four-lepton regions is shown in Table 3. Signal events are characterised by large \(H_{\text {T}}\) and a large invariant mass \(m_{\ell \ell \ell \ell }\) of the four-lepton system. The presence of possible neutrinos in the final state is taken into account using the \(H_{\text {T}} + E_{\text {T}}^{\text {miss}} \) variable; therefore \(m_{\ell \ell \ell \ell }\ge {300}\) GeV and \(H_{\text {T}} + E_{\text {T}}^{\text {miss}} \ge {300}\) GeV are required in both the Q0 and Q2 signal regions. Apart from the common lepton \(p_{\text {T}}\) requirements, these are the only kinematic selections applied in the Q2 SR, which has less background than the Q0 SR since it is very rare for a SM process to produce a doubly charged final state. To reduce the \(\hbox {ZZ}^*\) contribution in the Q0 SR, no more than one OSSF lepton pair in an event is allowed to be compatible with a leptonic \(Z\) decay, defined by the invariant-mass window 80–100 \(\text {GeV}\). The background in the Q0 SR is further reduced by requiring \({\mathcal {S}}(E_{\text {T}}^{\text {miss}}) \ge 5\). The overall selection efficiencies for the production of an 800 \(\text {GeV}\) type-III seesaw heavy lepton are 0.14% and 0.11% for the Q0 and Q2 SRs, respectively.
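The charge-based classification described above can be sketched as follows (illustrative only):

```python
# Minimal sketch of the four-lepton charge classification described above: events
# are assigned to the Q0 or Q2 category according to the sum of the lepton charges.
def charge_category(lepton_charges):
    q_sum = sum(lepton_charges)
    if q_sum == 0:
        return "Q0"
    if abs(q_sum) == 2:
        return "Q2"
    return None   # other charge sums are not used in the analysis

print(charge_category([+1, +1, -1, -1]))   # 'Q0'
print(charge_category([+1, +1, +1, -1]))   # 'Q2'
```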

Two CRs and two VRs are defined in the zero-charge Q0 kinematic space. A DB-CR targeting the diboson background is built by requiring a b-jet veto and an invariant-mass window for the four-lepton system, \({170}~\text {GeV}\le m_{\ell \ell \ell \ell } < {300}~\text {GeV}\). An RT-CR targeting the rare top-quark background is obtained by requiring at least two b-tagged jets and \(m_{\ell \ell \ell \ell } < {500}\) GeV. To ensure orthogonality with the CRs, the VRs require events to have exactly one b-jet. To increase the contributions of the diboson and rare top-quark backgrounds in the VRs, \(m_{\ell \ell \ell \ell }\) must satisfy \({170}~\text {GeV}\le m_{\ell \ell \ell \ell } < {300}~\text {GeV}\) in the DB-VR and \({300}~\text {GeV}\le m_{\ell \ell \ell \ell } < {500}~\text {GeV}\) in the RT-VR. The RT-VR also requires \(H_{\text {T}} + E_{\text {T}}^{\text {miss}} \ge {400}\) GeV and \({\mathcal {S}}(E_{\text {T}}^{\text {miss}}) \ge 5\).

The main sources of background in the Q2 signal region are diboson or rare top-quark events where the electric charge of one of the electrons is mismeasured. As mentioned above, the only additional kinematic selections used to define the Q2 SR are that both \(H_{\text {T}} + E_{\text {T}}^{\text {miss}} \) and the four-lepton invariant mass must exceed 300 GeV. A dedicated Q2 VR is obtained by requiring \(m_{\ell \ell \ell \ell } < {200}\) GeV or \(H_{\text {T}} + E_{\text {T}}^{\text {miss}} < {300}\) GeV in order to validate both the diboson and FNP background estimates.

In the four-lepton channel SRs, the kinematic variable \(H_{\text {T}} +E_{\text {T}}^{\text {miss}} \) is used as the final discriminant in the fit to the data.

6 Background composition and estimation

The background estimation techniques used in the analysis, combining simulations and data-driven methods common to all channels, are discussed in this section.

Irreducible-background predictions are obtained directly from simulation, but the normalisations of the diboson and rare top-quark processes are obtained from the fit. To avoid double-counting between background estimates derived from MC simulation and the data-driven reducible-background predictions, a specific check is performed: events from irreducible-background MC samples are considered only if their generator-level prompt leptons can be matched to reconstructed counterparts.

There are two sources of reducible background: events in which the charge of at least one lepton is misidentified, and events containing FNP leptons. The former source is relevant only in the Q2 four-lepton signal region, where an event in the Q0 category can migrate to the Q2 category if the charge of one of the leptons is mismeasured.

Charge misidentification for muons is well described by the simulation and occurs only for high-momentum muons, where detector misalignments degrade the muon momentum resolution. However, electrons are more susceptible to charge misidentification, owing to a combination of bremsstrahlung and photon-conversion effects that might not be adequately described by the detector simulation. Correction factors (scale factors) accounting for charge misreconstruction are therefore applied to the simulated background events. They are derived by comparing the charge misidentification probability measured in data with the one in simulation and are parameterised as functions of \(p_{\text {T}}\) and \(\eta \). The charge misidentification probability is extracted by performing a likelihood fit in a dedicated \(Z \rightarrow ee\) data sample, as described in Ref. [56]. It increases from \({\sim }10^{-4}\) to \({\sim }10^{-1}\) with increasing \(p_{\text {T}}\) and \(|\eta |\).
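For illustration, assuming the standard treatment in which per-electron misidentification probabilities are combined (this is an assumption of the sketch, not necessarily the exact implementation of Ref. [56]), the probability for an opposite-sign electron pair to be reconstructed as same-sign is:

```python
# Minimal sketch (assumed standard treatment): probability for an opposite-sign
# electron pair to be reconstructed as same-sign, given per-electron
# charge-misidentification probabilities p1 and p2.
def same_sign_probability(p1, p2):
    # exactly one of the two electron charges is flipped
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)

print(same_sign_probability(1e-4, 5e-2))   # illustrative low- and high-|eta| values
```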

FNP leptons are produced by secondary decays of light- or heavy-flavour mesons into light leptons embedded within jets. Although the b-jet veto and the lepton isolation requirements significantly reduce the number of FNP leptons, a fraction of them still satisfies the selection requirements. Significant components of the FNP electron background arise from photon conversions and from jets that are wrongly reconstructed as electrons. MC samples are not used to estimate these background sources because the simulation of jet production and hadronisation has large intrinsic uncertainties.

Instead, the FNP background is estimated with a data-driven approach, known as the fake-factor (FF) method, described in Ref. [66]. The FF is measured in dedicated FNP-enriched regions, where the events satisfy single-lepton triggers without isolation requirements and have low \(E_{\text {T}}^{\text {miss}}\), no b-jets, and only one reconstructed lepton that satisfies the lepton identification preselection described in Sect. 5. In these regions, two kinds of leptons are defined: \({\mathcal {L}}\) leptons, which satisfy looser object selection criteria than those used for the leptons considered in the analysis regions, and \({\mathcal {T}}\) leptons, which satisfy the full analysis selection. Electron and muon FFs are then defined as the ratio of the number of \({\mathcal {T}}\) leptons to the number of \({\mathcal {L}}\) leptons, and are parameterised as functions of \(p_{\text {T}}\) and \(\eta \). The FNP background in the SRs is then estimated by applying the FF as an event weight in a template region defined with the same selection criteria as the corresponding SR, except that at least one of the leptons must be an \({\mathcal {L}}\) lepton but not a \({\mathcal {T}}\) one.
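A minimal sketch of the method (illustrative only; the prompt-lepton subtraction described below is omitted):

```python
# Minimal sketch of the fake-factor method described above (illustrative only).
# The fake factor is the ratio of T (analysis-quality) to L (looser) leptons
# measured in the FNP-enriched region, and is applied as a per-event weight in the
# template region, which contains at least one L-but-not-T lepton.
def fake_factor(n_tight, n_loose):
    return n_tight / n_loose

def fnp_estimate(template_leptons, ff_lookup):
    # template_leptons: one (pt, eta) per template event, for its L-but-not-T lepton
    return sum(ff_lookup(pt, eta) for pt, eta in template_leptons)

# Illustrative usage with a flat fake factor of 0.15:
print(fnp_estimate([(25.0, 0.4), (18.0, -1.1)], lambda pt, eta: 0.15))   # 0.3
```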

The prompt-lepton contribution is subtracted from the template region by using the irreducible-background MC samples to estimate the prompt-lepton contamination in the adjacent regions [56]. The Fake-VR, as defined in Table 2, is used to validate the data-driven FNP-lepton estimate.

7 Systematic uncertainties

Uncertainties affect several aspects of this analysis. Experimental uncertainties related to the trigger selection, lepton reconstruction, identification, momentum measurement and isolation selection affect both the global selection efficiency and the shape of the kinematic distributions used in the fit. The main contributions come from the electron selection efficiency and the sagitta resolution of the muon spectrometer. Uncertainties are estimated mainly from \(Z \rightarrow \ell \ell \) and \(J/\psi \rightarrow \ell \ell \) processes [56, 57]. Uncertainties in the jet energy scale and resolution are evaluated from MC simulations and data, using multi-jet, \(Z \text {\,+\,jets}\) and \(\gamma \) + jets events [67]; they are estimated to be less than 2% in the range of jet transverse momentum of interest and affect both the selection efficiency measurement and the kinematic distributions used in the fit. Uncertainties in the \(b\text {-tagging}\) efficiencies are evaluated from data using dileptonic \(t\bar{t}\) events [68]. They are estimated to range from 8% at low momentum to 1% at high momentum. These uncertainties affect the analysis region selection efficiencies. The \(E_{\text {T}}^{\text {miss}}\) measurement uncertainties are estimated from data-to-MC comparison in \(Z \rightarrow \mu \mu \) events without jets as described in Ref. [69]. The \(E_{\text {T}}^{\text {miss}}\) uncertainties affect both the selection efficiencies and the kinematic distributions used in the fit. The charge misidentification uncertainty affects only the Q2 analysis regions. The uncertainty in the charge-misidentification scale factor is estimated to be less than 10% from a comparison between same-sign dielectron data and MC events, with the same electron selection as used in this analysis, and with \(|m_{ee}-m_Z |<{10}\) GeV as described in Ref. [56]. The uncertainty in the pile-up simulation, derived from a comparison of data with simulation, is also taken into account [70]. The limited size of the MC samples is taken into account as an additional uncertainty.

Fig. 2  Observed and expected event yields in the CRs, VRs and SRs for the three- and four-lepton channels after the fit procedure described in the text. Diboson indicates background from diboson processes. Rare top indicates background from \( t\bar{t} + V \) and \(t W Z \) processes. FNP includes the background from fake or non-prompt leptons. Other indicates all the other considered backgrounds that contribute less than 2%. The hatched bands include systematic uncertainties with the correlations between various background sources taken into account. The lower panel shows the ratio of the observed data to the predicted SM background after the likelihood fit

The FNP background uncertainty arises from the modelling and normalisation of the prompt leptons subtracted in the FF estimation. In addition, the composition of the FNP background is varied by measuring the FF in slightly modified FNP-enriched regions, obtained for example by varying the jet-multiplicity requirement and the \(E_{\text {T}}^{\text {miss}}\) selection. The resulting uncertainty in the FF depends on the lepton momentum and pseudorapidity and ranges from 5% to 40% for electrons and from 10% to 30% for muons.

Theoretical uncertainties affect both the signal and background predictions. For both, the uncertainties from missing higher orders are evaluated by independently varying the QCD factorisation and renormalisation scales in the matrix element by up to a factor of two [71]. The PDF uncertainties are evaluated using the LHAPDF toolkit [72] and the PDF4LHC prescription [73]. An additional uncertainty of 10% is assigned to the diboson cross-section to take into account variations in the level of data-to-MC agreement for \(VV\) processes in different jet-multiplicity regions. For the rare top-quark backgrounds, the uncertainty in the \(t\bar{t} W \) cross-section is evaluated to be \(\pm 50\%\), while for the \(t\bar{t} H \) cross-section the uncertainty from varying the QCD factorisation and renormalisation scales is \(^{+5.8}_{-9.2}\%\), with another \(\pm 3.6\%\) from PDF+\(\alpha _{\text {s}}\) variations. Since the yields of the rare top-quark and diboson backgrounds are derived from the likelihood fit to the data in the CRs, these systematic variations have little impact on the final background predictions in the CRs and SRs.

Table 4 Observed data and background yields in the three- and four-lepton signal regions after the background-only fit in the combined three- and four-lepton regions; the combined statistical and systematic uncertainties are reported
Fig. 3  Distributions of \(m_{{\textrm{T}},3\ell }\) in the three-lepton signal regions after the combined fit: a the ZL signal region, b the ZLVeto signal region and c the JNLow signal region. The coloured lines correspond to signal samples with the \(N^{0}\) and \(L^{\pm }\) mass values stated in the legend. The hatched bands include all statistical and systematic post-fit uncertainties with the correlations between various background sources taken into account. The lower panel shows the ratio of the observed data to the predicted SM background. The last bin in the distributions contains the overflows

Fig. 4  Distributions of \(H_{\text {T}} + E_{\text {T}}^{\text {miss}} \) in the four-lepton signal regions after the combined fit: a the Q0 signal region where the sum of lepton charges is zero and b the Q2 signal region where the sum of lepton charges is \(\pm 2\). The coloured lines correspond to signal samples with the \(N^{0}\) and \(L^{\pm }\) mass values stated in the legend. The hatched bands include all statistical and systematic post-fit uncertainties with the correlations between various background sources taken into account. The lower panel shows the ratio of the observed data to the predicted SM background. The last bin in the distributions contains the overflows

Fig. 5  Distributions of \(m_{{\textrm{T}},3\ell }\) in the three-lepton control and validation regions a ZL-CR and b fake-VR, and of \(H_{\text {T}} +E_{\text {T}}^{\text {miss}} \) in the four-lepton control regions c Q0 DB-CR and d Q0 RT-CR after the combined fit. The simulated signal contribution was found to be below 2% and is not shown in the figure. The hatched bands include all statistical and systematic post-fit uncertainties with the correlations between various background sources taken into account. The lower panel shows the ratio of the observed data to the predicted SM background. The last bin in the distributions contains the overflow

8 Statistical analysis and results

The HistFitter [74] statistical package is used to fit the predictions to the data in the CRs and SRs. The fit considers the \(m_{{\textrm{T}},3\ell }\) and \(H_{\text {T}} +E_{\text {T}}^{\text {miss}} \) distributions for the three- and four-lepton channels, respectively, calculating lower limits on the heavy-lepton mass with a test statistic based on the binned profile likelihood in the asymptotic approximation [75], whose validity is tested using a pseudo-experiment approach. The lower limits on the mass are calculated at 95% confidence level (CL) and the binning is chosen to optimise the sensitivity to signal. The various components of the background predictions are validated in the corresponding VRs. Background and signal contributions are modelled by a product of independent Poisson probability density functions representing the likelihood of the fit. Systematic uncertainties are modelled by Gaussian probability density functions centred on the pre-fit prediction of the nuisance parameters, with widths that correspond to the magnitudes of these uncertainties. Four different fitting procedures are performed: the three-lepton channel on its own, the four-lepton channel on its own, the three- and four-lepton channels combined, and finally the two-, three- and four-lepton channels combined, where results for the two-lepton channel are taken from Ref. [12]. All the contributions from the experimental uncertainties in the lepton, jet and \(E_{\text {T}}^{\text {miss}}\) selections and reconstruction, pile-up simulation, background simulation, theoretical calculations and irreducible-background estimates are considered correlated among the different multiplicity channels in multi-channel fits.
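Schematically, and assuming the standard form of such binned likelihoods, the fit model described above can be written as

\[ \mathcal{L}(\mu ,\boldsymbol{\theta }) = \prod _{i\,\in \,\mathrm {bins}} \mathrm {Pois}\left( n_i \,\big |\, \mu \, s_i(\boldsymbol{\theta }) + b_i(\boldsymbol{\theta })\right) \times \prod _{j} \mathrm {Gauss}\left( \theta _j \,\big |\, \theta _j^{0}, \sigma _j\right) , \]

where \(n_i\) is the observed yield in bin \(i\) of the \(m_{{\textrm{T}},3\ell }\) or \(H_{\text {T}} +E_{\text {T}}^{\text {miss}} \) distribution, \(s_i\) and \(b_i\) are the signal and background predictions, \(\mu \) is the signal-strength parameter of interest, and the Gaussian terms constrain the nuisance parameters \(\theta _j\) to their pre-fit values \(\theta _j^{0}\) with widths \(\sigma _j\) given by the corresponding systematic uncertainties.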

After a background-only likelihood fit in the CRs, the three- and four-lepton channel diboson normalisation factors are found to be \(0.80\pm 0.09\) and \(1.08\pm 0.03\), respectively. The normalisation and shape of the \(m_{{\textrm{T}},3\ell }\) and \(H_{\text {T}} +E_{\text {T}}^{\text {miss}} \) distributions are validated in the ZL DB-VR and Q0 DB-VR, respectively, by comparing data and SM expectations after the fit. The rare top-quark contribution normalisation is estimated to be \(1.3\pm 0.2\) in the four-lepton channel Q0 RT-CR and is then extrapolated to all the SRs. The background modelling is validated in the ZL RT-VR and Q0 RT-VR for the three- and four-lepton channels, respectively.

Event yields after the likelihood fit for the analysis regions in the three- and four-lepton channels are shown in Fig. 2. Good agreement within statistical and systematic uncertainties between data and SM predictions is observed in all regions, demonstrating the validity of the background estimation procedure as shown in Table 4.

The post-fit distributions of the \(m_{{\textrm{T}},3\ell }\) and \(H_{\text {T}} +E_{\text {T}}^{\text {miss}} \) variables used in the likelihood fit in the three- and four-lepton channels are shown in Figs. 3 and 4, respectively, for the signal regions, with the binning used in the fit. After the fit, the compatibility of the data with the expected background is assessed, and good agreement is observed. The p-values (see Footnote 5), evaluated using the distributions in Figs. 3 and 4, are 0.38, 0.090 and 0.25 for the three-lepton, four-lepton and combined three- and four-lepton channels, respectively. Figure 5 shows the distributions of these discriminating variables in some of the control and validation regions.

The relative uncertainties in the background yield estimates are shown in Fig. 6 for all analysis regions in the three- and four-lepton channels. The dominant uncertainty in the SRs, and in most of the other regions, is the statistical uncertainty of the data, which varies from 20% to 37% depending on the signal region. The MC statistical uncertainty varies from 2% to 7%. In the Q2 SR, an uncertainty contribution comparable to the data statistical uncertainty comes from the charge-misidentification background, which is included in the Experimental category.

Fig. 6  Relative contributions from different sources of statistical and systematic uncertainty to the total background yield estimates after the fit. Experimental uncertainties are related to the lepton, jet and \(E_{\text {T}}^{\text {miss}}\) selection and reconstruction, and also to lepton charge misidentification. FNP includes the fake or non-prompt leptons contribution. Luminosity is related to the luminosity uncertainty that affects the background simulation yields. Theory includes theoretical uncertainties associated with the PDF, \(\alpha _{\text {s}}\), and renormalisation and factorisation scales. Normalisation is related to the diboson and rare top-quark normalisation factors extracted by the likelihood fit. Systematic uncertainties are calculated by changing each nuisance parameter from its fit value by one standard deviation, keeping all the other parameters at their central values, and comparing the resulting event yield with the nominal yield. Individual uncertainties can be correlated within each region, and do not necessarily add in quadrature to the total background uncertainty, which is shown as Total MC uncertainty (correlated). Data Stat. uncertainty refers to the statistical uncertainty of the collected data

In the absence of a significant deviation from SM expectations, 95% CL upper limits on the signal production cross-section are derived using the \( {\textrm{CL}}_{\textrm{s}} \) method [76]. The upper limits on the production cross-sections of the \( pp \rightarrow W^{*} \rightarrow N^{0} L^{\pm } \) and \( pp \rightarrow Z^{*} \rightarrow L^{\pm } L^{\mp } \) processes are evaluated as a function of the heavy-lepton mass, using the three- and four-lepton channels with the democratic \({\mathcal {B}}_\ell \) scenario. By comparing the upper limits on the cross-section with the theoretical cross-section calculation as a function of the heavy-lepton mass, a lower limit on the mass of the type-III seesaw heavy leptons \(N^{0}\) and \(L^{\pm }\) is derived. The observed (expected) lower limit on the heavy-lepton mass is 870 GeV (\(900^{+80}_{-80}\) GeV).

The signal hypothesis is also tested in a combined fit of the three- and four-lepton channels with the analogous type-III seesaw search regions in the two-lepton channel [12]. All the CRs, VRs and SRs in the various lepton-multiplicity regions are statistically independent. The reconstruction algorithms and working points are the same in all cases, and the FNP and lepton charge misidentification backgrounds are estimated using the same methods. The parameter of interest, namely the number of signal events, and the common systematic uncertainties are treated as correlated. The normalisations of the diboson, \(t\bar{t}\) (for the two-lepton multiplicity region) and rare top-quark (for the three- and four-lepton multiplicity regions) backgrounds are treated as uncorrelated since they account for different physics processes and different acceptances in each final state. The three-lepton channel dominates the sensitivity in the high heavy-lepton mass region, while the two-lepton channel dominates at lower masses. The combined observed (expected) exclusion limits on the total cross-section are shown in Fig. 7, excluding heavy-lepton masses lower than 910 GeV (\(960^{+90}_{-80}\) GeV) at 95% CL. The corresponding observed (expected) exclusion limits obtained using only the three-lepton and only the four-lepton channels are shown in Figs. 8 and 9, respectively.

Fig. 7  Expected and observed exclusion limits in the two-lepton channel (from Ref. [12]), the three- and four-lepton channels, and the two-, three- and four-lepton channels for the type-III seesaw process with the corresponding one- and two-standard-deviation uncertainty bands, showing the 95% CL upper limit on the cross-section. The theoretical signal cross-section prediction, given by the NLO calculation [35, 36], with its corresponding uncertainty band is also shown

Fig. 8  Expected and observed 95% \({\textrm{CL}}_{\textrm{s}} \) exclusion limits in the three-lepton channel for the type-III seesaw process with the corresponding one- and two-standard-deviation bands, showing the 95% CL upper limit on the cross-section. The theoretical signal cross-section prediction, given by the NLO calculation [35, 36], is shown with the corresponding uncertainty bands for the expected limit

Fig. 9  Expected and observed 95% \({\textrm{CL}}_{\textrm{s}} \) exclusion limits in the four-lepton channel for the type-III seesaw process with the corresponding one- and two-standard-deviation bands, showing the 95% CL upper limit on the cross-section. The theoretical signal cross-section prediction, given by the NLO calculation [35, 36], is shown with the corresponding uncertainty bands for the expected limit

9 Conclusion

ATLAS has searched for pair-produced heavy leptons predicted by the type-III seesaw model in \({139}~{\hbox {fb}^{-1}}\) of data from proton–proton collisions at \( \sqrt{s} = {13}~{\text {TeV}} \), recorded during the 2015–2018 data-taking period. Final states with three or four light leptons are analysed, and no significant deviation from SM expectations is observed. A lower limit on the mass of the type-III seesaw heavy leptons \(N^{0}\) and \(L^{\pm }\) is derived: the observed (expected) exclusion limit on the heavy-lepton mass is 870 GeV (\(900^{+80}_{-80}\) GeV) at the 95% CL. This result is combined with the result of the two-lepton analysis, which used very similar experimental methodologies and statistical treatment. In the full combination, heavy leptons with masses below 910 GeV are excluded at the 95% CL, while the expected lower limit on the mass is \({960^{+90}_{-80}}~{\hbox {GeV}}\). This is the most stringent limit to date on the type-III seesaw model from events with light leptons at the LHC.