1 Introduction

Since the resurrection of jet-substructure methods as probes for new particles at the LHC [1, 2], boosted topologies in which multiple decay products from heavy intermediate states fall into a single large-radius (large-R) jet have seen wide application in searches for new physics [3,4,5,6,7,8]. While not considered in the early days of the LHC, these jet-substructure techniques are now widely used to extend the sensitivity of searches for new physics. This is particularly the case as the currently null results of those searches indicate that any relevant physics beyond the Standard Model (BSM) is most likely located at a large mass scale, featuring heavy particles whose production and decay would naturally yield highly boosted lighter Standard Model (SM) objects.

Many collider signatures can benefit from the usage of jet substructure methods, as they can be generally applied to tag many SM and BSM particles when they are produced with a high Lorentz boost. Among these, the top quark is an important target, for two reasons. First, the top quark is the highest-mass fermion in the SM, featuring a Yukawa coupling value close to 1. This makes it a natural candidate to provide an explanation for the hierarchy problem, and to play the role of a mediator that couples to new-physics sectors (e.g. through the Higgs field). Second, boosted methods can provide better background-rejection power than a classic ‘resolved’ reconstruction of the top quark kinematics. As a high-mass, colour-charged and non-hadronising particle, the top quark is the most complex SM resonance to reconstruct from fully resolved decay components. This not only requires highly performant b-tagging, but also suffers from either a complicated lepton and missing-momentum reconstruction or the resolution difficulties inherent to reconstructing a fully hadronic \(b\bar{q}q'\) final state.

Jet-substructure methods offer a way to bypass many of the difficulties related to the reconstruction and identification of hadronically-decaying top quarks by relying on one single large-radius jet in place of three small-radius ones. In addition, such an option generally exploits the presence of two heavy SM particles’ decay hierarchies within the large-R jet (the top quark itself and the W boson originating from its decay), together with information on the internal momentum and angular structure of all jet constituents (with or without b-tagging requirements) to disambiguate boosted top quarks from jets originating from pure QCD background processes. A prominent tool in such studies is the HEPTopTagger method [9, 10], which pioneered this approach and has since gone through several rounds of enhancement such as the use of variable-radius jet clustering. In the meantime, more sophisticated and efficient top-tagging methods have been developed. Typical examples are based on a classification of jets making use of the radiation pattern within a jet (also known as shower deconstruction) [11], on advanced machine-learning techniques (we refer to Ref. [12] for an overview) relying on observables like the jet transverse momentum and mass, the dispersion of its constituents estimated through N-subjettiness variables [13, 14], splitting scales [15], energy correlation functions [16, 17], as well as on jet image analysis by means of neural networks [18,19,20] and image or language recognition techniques [21,22,23,24]. More recently, a series of machine-learning methods embedding Lorentz invariance [25, 26] have additionally been proposed and explored. The HEPTopTagger method, however, remains an important benchmark in the top-tagging landscape, especially in the context of its use by the LHC experiments [27, 28].
On the other hand, the related code has historically been unavailable for use in analysis prototyping and preservation within the two public analysis frameworks MadAnalysis 5 [29,30,31] and Rivet [32, 33], that are widely used across the high-energy physics community. The goal of the present work is to fill this gap, and to document, through a few examples, its addition to both frameworks. It also therefore serves as a prototype interface for integration of C++ versions of machine-learning taggers into these public analysis toolkits.

While most applications of boosted top quark reconstruction have been aimed at direct searches for new physics, the lack of tangible evidence for new high-mass resonances urges complementary studies of indirect routes through which BSM physics can manifest. A leading approach in this is that of the Standard Model Effective Field Theory (SMEFT), in which the explicit microscopic physics of a particular BSM model is replaced by an infinite set of higher-dimensional operators involving the SM fields and compatible with the SM symmetries [34,35,36]. The SMEFT is then an expansion in an energy scale \(\Lambda \) above which the effective theory breaks down and real new physics resides, so that new fields with masses comparable with \(\Lambda \) must be explicitly added to the model’s Lagrangian. Details about the UV theory are encoded in the Wilson coefficients multiplying each operator, and the relevance of a specific new interaction is dictated (a priori) by the dimension of the corresponding operator (that is thus suppressed by some power of the effective scale). At dimension six, 84 (3045) parameters encode the leading BSM effects, assuming a flavour-blind (flavour-general) setup [37, 38]. Constraints are typically made primarily in the model-independent space of the corresponding Wilson coefficients by investigating the possibility of small and often subtle deviations from the SM expectation. Among all operators, about twenty of them impact top physics under the simplifying assumption that new physics couples dominantly to bosons and to the left-handed doublet and right-handed up-type singlet of third generation quarks [39].

Global SMEFT interpretations of measurements at the LHC in the top sector have recently been achieved by several groups [40,41,42,43,44,45]. These studies demonstrated in particular that dozens of SMEFT operators could be constrained (and therefore determined) simultaneously, sometimes by correlating information originating from different sectors. It is nevertheless well known that signatures of processes involving boosted top quarks could be crucially relevant [46]. These indeed involve large momentum transfers, so that they are expected to exhibit the largest sensitivity to new-physics effects in the SMEFT. It is therefore natural to focus on high-momentum collider event categories involving the production of boosted top quarks, and to consider them as a promising avenue to statistically constrain the viable space of Wilson coefficients associated with top quark operators.

In the present study, we make use of the HEPTopTagger functionalities that we implemented in the Rivet and MadAnalysis 5 frameworks (together with the possibility of computing emulated reconstruction-level observables) to study the sensitivity of the LHC to top-related SMEFT operators, focusing on the production of a pair of boosted top quarks. However, the HEPTopTagger algorithm is designed to exploit as much as possible the kinematics of the SM decay of a boosted top quark. This raises the open question of how new-physics effects arising from the introduction of non-zero top quark SMEFT operators could modify these kinematics, and hence impact the performance of HEPTopTagger and, by inference, of any similar reconstruction method based on the topology arising from an SM top quark decay. As a straightforward application and keeping this in mind, we highlight important resulting issues for BSM interpretations.

In Sect. 2, we detail our technical developments in Rivet (Sect. 2.2) and MadAnalysis 5 (Sect. 2.3), and briefly explain how to use the codes for physics studies. In Sect. 3, we exemplify the usage of these developments to estimate the impact of new physics via effective SMEFT operators on HEPTopTagger performance, and how this affects the sensitivity of the present and future runs of the LHC (assuming an integrated luminosity of 300 fb\(^{-1}\) and 3000 fb\(^{-1}\) and varied levels of systematic errors) to these operators. We summarise our work in Sect. 4.

2 HEPTopTagger implementation in the Rivet and MadAnalysis 5 frameworks

2.1 Generalities

In its initial proposal [9], the HEPTopTagger algorithm is a purely deterministic top-tagging method in which boosted top reconstruction is solely achieved from the geometrical structure and properties of the constituents of a fat jet. It first defines a fat jet collection from an event final state by using the Cambridge-Aachen jet algorithm [47,48,49]. A procedure is next applied to all jets included in this collection, in order to decide whether they should be top-tagged.

In practice, each reconstructed fat jet is decomposed into several subjets by applying a mass drop criterion [50]. More precisely, jet clustering is iteratively undone so that each fat jet is split in two subjets, and the subjet with the smallest invariant mass is kept only if its invariant mass is large enough. Each resulting subjet is further decomposed in the same manner provided that its invariant mass is larger than some threshold. All possible triplets of jets belonging to the subjet collection obtained in this way are then filtered, and the five hardest filtered subjets are selected for boosted top quark reconstruction.
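The decomposition step can be illustrated with a minimal, self-contained sketch. It is not the HEPTopTagger code: the `ClusterNode` type and its builder functions are hypothetical stand-ins for a jet clustering history, and the criterion used is the common mass-drop formulation (the lighter parent is kept only when the heavier parent carries less than a given fraction of the mass of the merged jet). The threshold values `massDropFraction` and `minSubjetMass` are assumptions chosen for illustration; the text above does not quote them, and Refs. [9, 10] should be consulted for the actual settings.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Node of a binary clustering history (hypothetical type for this sketch):
// a leaf is a jet constituent, an inner node is the pseudo-jet obtained by
// merging its two parents.
struct ClusterNode {
    double mass = 0.0;                     // invariant mass in GeV
    std::unique_ptr<ClusterNode> parent1;  // null for a leaf
    std::unique_ptr<ClusterNode> parent2;  // null for a leaf
};

// Convenience builders, used only to set up examples.
std::unique_ptr<ClusterNode> makeLeaf(double mass) {
    auto node = std::make_unique<ClusterNode>();
    node->mass = mass;
    return node;
}

std::unique_ptr<ClusterNode> makeMerged(double mass,
                                        std::unique_ptr<ClusterNode> a,
                                        std::unique_ptr<ClusterNode> b) {
    auto node = std::make_unique<ClusterNode>();
    node->mass = mass;
    node->parent1 = std::move(a);
    node->parent2 = std::move(b);
    return node;
}

// Recursively undo the clustering of `jet` and append the surviving subjets
// to `subjets`. Both default thresholds are assumptions, not values quoted
// in the text.
void massDropDecompose(const ClusterNode& jet,
                       std::vector<const ClusterNode*>& subjets,
                       double massDropFraction = 0.8,
                       double minSubjetMass = 30.0) {
    const bool isLeaf = !jet.parent1 || !jet.parent2;
    if (isLeaf || jet.mass < minSubjetMass) {
        subjets.push_back(&jet);  // not decomposed any further
        return;
    }
    const ClusterNode* heavy = jet.parent1.get();
    const ClusterNode* light = jet.parent2.get();
    if (heavy->mass < light->mass) std::swap(heavy, light);

    // The heavier parent is always followed.
    massDropDecompose(*heavy, subjets, massDropFraction, minSubjetMass);
    // The lighter parent is kept only if the splitting shows a mass drop.
    if (heavy->mass < massDropFraction * jet.mass) {
        massDropDecompose(*light, subjets, massDropFraction, minSubjetMass);
    }
}
```

Calling `massDropDecompose` on the root of a fat jet's clustering tree fills `subjets` with pointers to the retained substructure objects, which would then be passed on to the filtering and triplet-building stages.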

These five subjets are reclustered into three subjets, which are thus assumed to originate from a top quark decay. Events are at this stage rejected if they do not include any resulting triplet with an invariant mass that is compatible with the top mass. Top tagging stems from several requirements that are imposed on the invariant masses of the different dijet pairs that could be formed from the three subjets of any boosted top quark candidate, in particular in order to ensure the compatibility with the presence of an intermediate W boson. Moreover, the transverse momentum of the top candidate is required to be at least 200 GeV.

We refer to the original documentation [9, 10] for a more comprehensive and quantitative presentation of the HEPTopTagger algorithm.
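To make the logic of the triplet selection concrete, the sketch below tests whether three subjets form an acceptable top candidate. It is a simplified stand-in and not the selection of Refs. [9, 10]: only the 200 GeV transverse-momentum requirement is taken from the text, whereas the top-mass window, the reference mass ratio and the tolerance on the dijet masses are assumptions introduced for illustration. The `FourMomentum` and `TopCandidateCuts` types are hypothetical.

```cpp
#include <cmath>

struct FourMomentum {
    double px, py, pz, e;  // GeV
};

FourMomentum operator+(const FourMomentum& a, const FourMomentum& b) {
    return {a.px + b.px, a.py + b.py, a.pz + b.pz, a.e + b.e};
}

double invariantMass(const FourMomentum& p) {
    const double m2 = p.e * p.e - p.px * p.px - p.py * p.py - p.pz * p.pz;
    return m2 > 0.0 ? std::sqrt(m2) : 0.0;  // guard against rounding
}

double transverseMomentum(const FourMomentum& p) {
    return std::hypot(p.px, p.py);
}

struct TopCandidateCuts {
    double topMassMin = 150.0;        // GeV, assumption
    double topMassMax = 200.0;        // GeV, assumption
    double minPt = 200.0;             // GeV, quoted in the text
    double wOverTop = 80.4 / 172.5;   // m_W / m_t, assumed mass values
    double tolerance = 0.15;          // relative width, assumption
};

// True if the triplet has a top-like mass, enough transverse momentum, and
// at least one dijet pair whose mass ratio to the triplet mass is W-like.
bool isTopCandidate(const FourMomentum& j1, const FourMomentum& j2,
                    const FourMomentum& j3,
                    const TopCandidateCuts& cuts = TopCandidateCuts{}) {
    const FourMomentum top = j1 + j2 + j3;
    const double m123 = invariantMass(top);
    if (m123 < cuts.topMassMin || m123 > cuts.topMassMax) return false;
    if (transverseMomentum(top) < cuts.minPt) return false;

    const double ratios[3] = {invariantMass(j1 + j2) / m123,
                              invariantMass(j1 + j3) / m123,
                              invariantMass(j2 + j3) / m123};
    for (double r : ratios) {
        if (std::abs(r - cuts.wOverTop) < cuts.tolerance * cuts.wOverTop) {
            return true;
        }
    }
    return false;
}
```

Keeping all thresholds in a single structure mirrors the way tagger parameters are usually exposed to users, so that a tuning study only has to vary one object.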

The performance of any top quark tagger can be improved by using an increased set of input variables (as in most multi-variate methods), for which the explicit choices are made through a tuning process relative to a given reference. To this end, the HEPTopTagger method has been updated and now includes a variety of features that enhance the tagging efficiency and reduce the associated mistagging rates: it uses substructure mass-drop conditions [50], jet trimming [51] and pruning [4, 52] algorithms, and filtering steps [2], in addition to the core requirement that the large-radius jet demonstrates the three-pronged structure characteristic of a boosted top-quark’s hadronic decay. In the current version of the HEPTopTagger package, all these methods are used together in a multi-variate classification [53, 54] which maximises the expected tagging performance.

Access to this tool and all its embedded features within public frameworks like MadAnalysis 5 or Rivet is thus crucial for prototyping and reproducing collider-event data analyses, a key activity in collider phenomenology. In the rest of this section, we discuss technical details about the embedding of HEPTopTagger in these two software tools, and describe how they can be used in practice. We rely on the latest public version of HEPTopTagger (i.e. its version 2 available from the webpage https://www.thphys.uni-heidelberg.de/~plehn/index.php?show=heptoptagger). Moreover, we have validated our implementations by comparing the results of a few test calculations obtained with the two interfaced versions of HEPTopTagger with those returned by HEPTopTagger when used in a standalone mode.

2.2 Jet substructure tools in Rivet

The implementation of HEPTopTagger within Rivet has been designed on top of its existing jet-analysis toolkit, using the ‘smearing projection’ machinery that simulates kinematic and particle-identification misreconstruction through transfer functions, while preserving links between particle-level and reconstruction-level physics objects. When jet substructure methods are involved, dedicated smearing methods are required, as many observables (e.g. N-subjettinesses) are sensitive to angular correlations between the jet constituents. It is therefore necessary to model the detector’s finite angular resolution to get a realistic detector response, including the inefficiencies related to the hadronic calorimeter. This is achieved, as detailed in Ref. [55], through the directional smearing of the pseudo-rapidity \(\eta \) and azimuthal angle \(\varphi \) variables defining the direction of every jet constituent. As angular deflections are more significant for constituents with a low transverse momentum \(p_\textrm{T} \), this smearing is made \(p_\textrm{T} \)-dependent with greater angular stability at higher momentum. The specific form used, known to describe jet-substructure effects well on the public data, is angular smearing by a Gaussian with a mean of zero and a standard deviation given by

$$\begin{aligned} \sigma _\textrm{ang} = \frac{\alpha }{1+e^{\beta (p^i_T - \gamma )}} \,. \end{aligned}$$
(1)

Here \(\alpha ,\ \beta \) and \(\gamma \) are free parameters, set to \( 0.045,\ 0.013 \) and 31.15, respectively, from the fit detailed in Ref. [55]. Additionally, energy-resolution smearing was performed, using relative scaling by a Gaussian with mean of 1, and width \(\sigma _E \sim 10\%\).
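As an illustration of Eq. (1), the following sketch evaluates the angular resolution and applies the two smearing steps to a single jet constituent. It is not the Rivet implementation. The `Constituent` and `SmearingParams` types are hypothetical, and several choices are assumptions: \(p_\textrm{T}\) and \(\gamma \) are taken to be in GeV, \(\eta \) and \(\varphi \) are smeared with independent Gaussian draws, and the energy scale factor is applied to both the energy and the transverse momentum.

```cpp
#include <cmath>
#include <random>

// Fit parameters of Eq. (1) and the relative energy resolution.
struct SmearingParams {
    double alpha = 0.045;
    double beta = 0.013;
    double gamma = 31.15;          // assumed to be in GeV
    double relEnergyWidth = 0.10;  // sigma_E of about 10%
};

// Angular resolution of Eq. (1) for a constituent of transverse momentum pt.
double angularSigma(double pt, const SmearingParams& p = SmearingParams{}) {
    return p.alpha / (1.0 + std::exp(p.beta * (pt - p.gamma)));
}

struct Constituent {
    double pt, eta, phi, energy;
};

// Returns a smeared copy of `in`; `rng` is any standard random engine.
template <class Rng>
Constituent smear(const Constituent& in, Rng& rng,
                  const SmearingParams& p = SmearingParams{}) {
    const double twoPi = 2.0 * std::acos(-1.0);
    std::normal_distribution<double> angle(0.0, angularSigma(in.pt, p));
    std::normal_distribution<double> scale(1.0, p.relEnergyWidth);

    Constituent out = in;
    out.eta += angle(rng);
    // Wrap the azimuthal angle back into [-pi, pi].
    out.phi = std::remainder(in.phi + angle(rng), twoPi);
    // Common relative scaling of energy and pt (assumption of this sketch).
    const double s = scale(rng);
    out.pt *= s;
    out.energy *= s;
    return out;
}
```

With these parameter values the resolution equals \(\alpha /2\) at \(p_\textrm{T} = \gamma \) and decreases monotonically with \(p_\textrm{T}\), which is the behaviour described above.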

Table 1 Accessors equipping the HEPTopTagger wrapper implemented in Rivet

Our implementation of the HEPTopTagger method in Rivet relies on an object of the class, normally to be declared as a member variable of an analysis (or projection) class,

figure ac

The class is defined in the header file . All available parameters for this wrapper are initialised through an object, that can be used for any further modification relevant for the needs of the user. A simple example is

figure ag

The list of available parameters can be found in the definition of the C++ structure in the file . All parameters that are not explicitly initialised by the user keep their default values which have been chosen according to Refs. [9, 10]. During the execution of a Rivet analysis, a reclustered jet, instantiated as a object, can be processed by the methods of the object, for example in

figure al

Here, refers to the leading (i.e. highest-\(p_\textrm{T} \)) jet included in a vector of clustered jets called (the object being thus of type ). The computation makes available a set of accessors that return a variety of information as native Rivet objects. The list of accessors is shown in Table 1.

For a practical example, we refer to the illustrative analysis that can be found in Rivet ’s file.

2.3 Jet substructure tools in MadAnalysis 5

Since 2021, MadAnalysis 5 and its SFS framework for fast simulation of detector effects [56] have been equipped with jet substructure tools and methods (Footnote 1). In particular, the smearing functionality implemented in the SFS framework allows for modifications of the properties of the jets’ constituents, so that the SFS is suitable for the embedding of HEPTopTagger in a way similar to what was achieved for Rivet in Sect. 2.2. As the substructure branch is so far largely undocumented, we take the opportunity of the present work to provide some details on its functioning and on how to make use of the code to embed top tagging in a generic analysis.

When a jet reconstruction algorithm is turned on in MadAnalysis 5, a so-called ‘primary’ jet collection is built from a hadronised event. This primary jet collection is equivalent to the sole jet collection that used to be built in versions 1.X.Y of the code, which was documented in [31, 56]. In practice, the code makes use of its interface with FastJet [57], that can be turned on from the MadAnalysis 5 command line interface by typing

figure ar

A specific jet algorithm is then activated through the commands

figure as

The list of supported algorithms, together with the available properties, is provided in [56]. By default, the anti-\(k_T\) jet algorithm [58] is considered, with a radius parameter \(R=0.4\) () and a minimum \(p_\textrm{T}\) value of \(5 \,\,\textrm{GeV}\) (). The primary jet collection is identified by its jet identifier (or ), that is fixed to by default. This identifier can be further modified through the command

figure ax

Additional jet collections can be instantiated through

figure ay

where refers to the identifier of the collection, to the associated clustering algorithm, and where any algorithm-specific parameter can be optionally fixed through comma-separated or space-separated equalities (otherwise default values are used). For instance, typing

figure bb

defines a jet collection coined , in which jets are reconstructed by means of the Cambridge-Aachen jet algorithm [47,48,49], with a radius parameter set to 0.8 and a minimum \(p_\textrm{T}\) value of \(200 \,\,\textrm{GeV}\). Parameters can also be altered through specific commands, like for instance in

figure bd

Once multiple jet collections are defined, constituent-based smearing is always applied to the properties of all final-state hadrons before the different reconstructions are performed. This contrasts with the setup in which a single collection is defined, where users can decide to smear reconstructed objects instead of their constituents. Reconstruction efficiencies can also be provided from the command line interface (see [56]), but they will only be applied to the primary jet collection. This limitation can however be bypassed by employing the expert mode of the code, in which users implement their analysis directly in C++ (and thus retain full control over the treatment of every collection). We therefore focus only on this expert mode from now on (Footnote 2).

Table 2 Accessors equipping the HEPTopTagger wrapper implemented in MadAnalysis 5

At the level of the C++ code generated by MadAnalysis 5 (or implemented from scratch by expert users), the primary jet collection can be accessed through the standard accessor (as described in [30, 31]), and all jet collections (including the primary one) can be accessed through the accessor (with being the identifier referring to the collection). These accessors return a vector of pointers to constant objects (or objects for short), the entire vector being also of the shorthand type .

In version 2.0.X of MadAnalysis 5, a namespace has been implemented that includes wrappers to a large set of FastJet and FastJet Contrib functionalities. This substructure module allows for three standard infrared and collinear safe jet-clustering algorithms, that can be initialised, for instance, through

figure ce

This initialises a object named in which jet reconstruction relies on the anti-\(k_T\) algorithm with parameter \(R=0.4\), and that selects reconstructed jets featuring \(p_\textrm{T} > 20 \,\,\textrm{GeV}\). In order to make use of the Cambridge-Aachen or the generalised \(k_T\) [57] algorithm, the first argument of the method needs to be set to and respectively. The next arguments are related to the two options available for the three supported algorithms (namely the radius parameter R and the minimum \(p_\textrm{T}\) requirement applied to the reconstructed jets), and the last optional argument () indicates whether leptons and photons originating from hadron decays have to be included in their respective collections in addition to being considered as jet constituents (), or not (). Next, clustering is executed through the command

figure cn

where is the identifier of the jet collection to use to store the output of the clustering, and is an object pointing to the whole event. Smearing and reconstruction efficiencies are automatically included, if provided by the user (see Ref. [56]).

Clustered jets can be further manipulated, either one by one or all together. For instance, the first of the following commands defines a new collection as a sub-selection of all reconstructed (primary) jets satisfying \(p_\textrm{T} > 20 \,\,\textrm{GeV}\) and \(|\eta | < 2.5\). The next two lines are dedicated to the initialisation of a new clustering method (the Cambridge-Aachen algorithm with a radius parameter \(R=0.5\), that is the sole parameter that can be specified here), with which those jets will be reclustered,

figure cs

Here, we assume that the primary jets have been clustered through some (unspecified) algorithm. Next, we make use of the object, a first time on the whole jet collection, and a second time specifically on the leading jet,

figure cu

As another example, we now discuss jet reconstruction in which the radius parameter R is variable [59] (Footnote 3). Such a method can be used from the wrapper as follows,

figure cw

The clustering type must be (Cambridge-Aachen), (\(k_T\) algorithm) or (anti-\(k_T\) algorithm), the parameters and stand for the minimum and maximum radius values allowed, and the internal clustering strategy to be used by FastJet has to be among , , , or . We refer to Ref. [59] for more information. Reclustering then proceeds as above,

figure dh

In order to enable the usage of HEPTopTagger within MadAnalysis 5, the package must first be downloaded and linked to the code. This is achieved by typing in the MadAnalysis 5 command line interface

figure di

once FastJet and FastJet Contrib are installed and available (which is achieved by typing in the interpreter the command ). When implementing an analysis in C++, the execution of HEPTopTagger is controlled from a dedicated structure called . The latter is defined in the file “”, together with all associated parameters and methods, and it is documented in the file “”. Taking the example introduced in Sect. 2.2, a simple example of initialisation would read

figure dn

HEPTopTagger is then executed as

figure do

As for the embedding into Rivet, this method leads to the generation of a variety of accessors that allow for the exploration of the properties of the would-be top jet. Their list is given in Table 2. For more detailed practical examples on the usage of jet substructure techniques and HEPTopTagger within MadAnalysis 5, we refer to the tutorial available from https://github.com/MadAnalysis/tutorial_osu.

3 Exploring new physics effects with boosted top quarks in the SMEFT

In this section, we demonstrate the use of HEPTopTagger (version 2) within the Rivet and MadAnalysis 5 frameworks, and we study the potential impact of SMEFT operators on boosted top quark decays. The set of relevant operators that we consider is introduced in Sect. 3.1. In Sect. 3.2, we focus on the production of a semi-leptonically decaying \({{t\bar{t}}}\) pair to investigate how SMEFT deviations in the properties of boosted top quarks affect the performance of top taggers (through deviations from the taggers’ expectations of SM-like top quark decay properties). Next, we make use of our findings to derive in Sect. 3.3 the sensitivity of a typical analysis of boosted top-pair production and decay to various SMEFT operators poorly constrained by other means.

3.1 Theoretical framework

In the absence of any explicit evidence for new fields and interactions beyond the SM, effective field theories provide a natural path to scrutinising the impact of hypothetical BSM physics at the electroweak scale \(\Lambda _\textrm{EW}\). In this context, the SMEFT paradigm offers a very promising framework allowing for the exploration of heavy new physics. The SMEFT is an effective field theory expansion in an energy scale \(\Lambda \) that is assumed to satisfy \(\Lambda \gg \Lambda _\textrm{EW}\). The model Lagrangian is defined via a set \(\{ \mathcal{O}_1, \mathcal{O}_2,... \}\) of higher-dimensional (i.e. non-renormalisable) operators in the SM fields. Assuming that the leading new-physics effects arise at dimension six, this Lagrangian reads

$$\begin{aligned} \mathcal {L}_\textrm{SMEFT} = \mathcal{L}_\textrm{SM} + \sum _j \frac{C_j}{\Lambda ^2}\mathcal {O}_j + \mathcal{O}\big (1/\Lambda ^3\big ), \end{aligned}$$
(2)

where \(\mathcal {L}_\textrm{SM}\) is the SM Lagrangian, and the Wilson coefficients \(C_j\) encode the BSM details of the theory. Among the 3045 free parameters in this general SMEFT Lagrangian of Eq. (2) [37, 38], only a few are relevant for top quark physics.

We consider a scenario in which CP is conserved, and we next assume that new physics only couples to the weak doublet of left-handed top and bottom quarks (Q) and the right-handed weak singlets (t and b) of third-generation quarks (as well as to SM bosons). Moreover, bosonic operators leading to flavour-universal effects are discarded, we approximate the CKM matrix by the identity matrix, and all Yukawa couplings but those of the top and bottom quarks are neglected. In order to further reduce the number of free parameters, we consider a \(U(2)_q\times U(2)_u \times U(2)_d\) flavour symmetry among the quarks of the first and second generations, in agreement with the principle of minimal flavour violation [60,61,62]. Differences between the first and second-generation quarks are thus ignored, and we subsequently introduce the generic notation q for a left-handed weak doublet of first-generation or second-generation quark fields, and u and d for the corresponding right-handed weak singlets of up-type and down-type quark fields.

In our analysis, we aim to leverage the detector-simulation capabilities of the MadAnalysis 5 and Rivet frameworks (including our implementation of HEPTopTagger) to realistically explore the effects of effective operators on the reconstruction performance of boosted top quarks. Among the full set of potentially impactful SMEFT operators [39], only eight of them are not too strongly constrained by other means [40,41,42,43,44,45], so that an investigation of pair-production and decay of boosted top quarks could offer new handles on them. They read, in the notation of Ref. [42],

$$\begin{aligned}{} & {} \mathcal {O}_{Qq}^{1,8} = (\bar{Q} \gamma _\mu T^A Q)\ \ (\bar{q} \gamma ^\mu T^A q), \nonumber \\{} & {} \mathcal {O}_{Qq}^{3,8} = (\bar{Q} \gamma _\mu T^A\sigma ^I Q)\ \ (\bar{q} \gamma ^\mu T^A \sigma ^I q),\nonumber \\{} & {} \mathcal {O}_{Qq}^{3,1} = (\bar{Q} \gamma _\mu \sigma ^I Q)\ \ (\bar{q} \gamma ^\mu \sigma ^I q), \nonumber \\{} & {} \mathcal {O}_{tu}^8 = (\bar{t}\gamma _\mu T^A t)\ \ (\bar{u}\gamma ^\mu T^A u),\nonumber \\{} & {} \mathcal {O}_{td}^8 = (\bar{t}\gamma _\mu T^A t)\ \ (\bar{d}\gamma ^\mu T^A d), \nonumber \\{} & {} \mathcal {O}_{Qu}^8 = (\bar{Q}\gamma _\mu T^A Q)\ \ (\bar{u}\gamma ^\mu T^A u) ,\nonumber \\{} & {} \mathcal {O}_{Qd}^8 = (\bar{Q}\gamma _\mu T^A Q)\ \ (\bar{d}\gamma ^\mu T^A d), \nonumber \\{} & {} \mathcal {O}_{tq}^8 = (\bar{q}\gamma _\mu T^A q)\ \ (\bar{t}\gamma ^\mu T^A t), \end{aligned}$$
(3)

where the matrices \(T^A\) stand for the generators of \(SU(3)_c\) in the fundamental representation, and the matrices \(\sigma ^I\) are the usual Pauli matrices.

3.2 Top tagging performance in the presence of non-vanishing SMEFT operators

In order to assess how non-zero values for the Wilson coefficients associated with the SMEFT operators of Eq. (3) affect top quark tagging performance, we make use of MadGraph5_aMC@NLO version 3.0.3 [63] to generate parton-level events describing top-antitop production and their semi-leptonic decay at the LHC (operating at a centre-of-mass energy of 13 TeV). We rely on leading-order matrix elements convolved with the leading-order set of NNPDF3.0 parton distribution functions [64] provided through the Lhapdf6 library [65]. For efficiency reasons, the Monte Carlo event generation was kinematically biased to high scales, and we required that the invariant mass of the produced \({{t\bar{t}}}\) system satisfies \(m_{t\bar{t}}^\textrm{truth} > 950 \,\,\textrm{GeV}\). These fixed-order events are matched with parton showering and hadronisation as modelled by Pythia version 8.2 [66]. Background events are generated with the same toolchain, but by considering the production of a leptonically-decaying W boson in association with a pair of b jets (and two additional jets), \(pp\rightarrow W b \bar{b} + \text {jets}\).

Our canonical analysis was implemented in Rivet version 3 [33] (Footnote 4). It employs FastJet version 3.3.3 [57] for event reconstruction, and HEPTopTagger version 2 [10] in its default configuration. We recall that the latter has been tuned on boosted top quarks with properties as expected from their SM production and decay, and may thus not be optimal for scenarios in which SMEFT effects change the properties of the produced tops. In our usage of HEPTopTagger, we turn on the ‘optimal R’ option. This allows the tagging algorithm to determine the minimum choice for the fat jet reconstruction radius that ensures that the reconstructed top jet includes a three-prong structure (as expected from standard top quark decays).

Our event reconstruction is achieved by first defining a collection of ‘small jets’ through the clustering of all visible hadron-level final-state objects with a pseudo-rapidity \(|\eta |<4.5\), muons excepted. We use the anti-\(k_T\) jet algorithm [58] with radius parameter \(R=0.4\), and then impose a minimum transverse-momentum requirement of \(p_\textrm{T} > 30 \,\,\textrm{GeV}\) on the reconstructed small jets. Next, we define a collection of ‘fat jets’ from the same hadron-level objects. This collection is constructed by using the Cambridge-Aachen algorithm [47,48,49] with a radius parameter \(R=1.5\). We impose a minimum transverse momentum requirement of \(p_\textrm{T} > 200 \,\,\textrm{GeV}\) on the reconstructed fat jets.

Lepton candidates (i.e. electrons and muons) are required to satisfy basic momentum and pseudo-rapidity criteria, \(p_\textrm{T} > 10 \,\,\textrm{GeV}\) and \(|\eta |<2.5\). At this stage, \(\Delta R\)-based isolation is enforced in order to remove the overlap between the lepton collection and the two jet collections. We remove from the small-jet collection any small jet j lying within an angular distance \(\Delta R(\ell ,j) < 0.1\) of a lepton \(\ell \), and we then discard any lepton lying at a distance \(\Delta R(\ell ,j) < 0.4\) of any of the remaining small jets. Moreover, we define b jets as small jets with \(p_\textrm{T} > 30 \,\,\textrm{GeV}\) and with a ghost-associated b-hadron with \(p_\textrm{T} > 5\,\,\textrm{GeV}\) [67, 68].
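The two-step overlap removal between leptons and small jets can be sketched as follows. The `Object` type and the function names are hypothetical and independent of the Rivet and MadAnalysis 5 data formats; only the two \(\Delta R\) thresholds (0.1 and 0.4) and the order of the two steps are taken from the text.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Object {
    double pt, eta, phi;
};

// Angular distance in the (eta, phi) plane, with the azimuthal difference
// wrapped into [-pi, pi].
double deltaR(const Object& a, const Object& b) {
    const double twoPi = 2.0 * std::acos(-1.0);
    const double dEta = a.eta - b.eta;
    const double dPhi = std::remainder(a.phi - b.phi, twoPi);
    return std::hypot(dEta, dPhi);
}

// Step 1: drop jets closer than `jetVeto` to any lepton.
// Step 2: drop leptons closer than `leptonVeto` to any remaining jet.
void removeOverlaps(std::vector<Object>& leptons, std::vector<Object>& jets,
                    double jetVeto = 0.1, double leptonVeto = 0.4) {
    jets.erase(std::remove_if(jets.begin(), jets.end(),
                              [&](const Object& j) {
                                  return std::any_of(
                                      leptons.begin(), leptons.end(),
                                      [&](const Object& l) {
                                          return deltaR(l, j) < jetVeto;
                                      });
                              }),
               jets.end());

    leptons.erase(std::remove_if(leptons.begin(), leptons.end(),
                                 [&](const Object& l) {
                                     return std::any_of(
                                         jets.begin(), jets.end(),
                                         [&](const Object& j) {
                                             return deltaR(l, j) < leptonVeto;
                                         });
                                 }),
                  leptons.end());
}
```

The order matters: a jet that essentially coincides with a lepton is removed first, so that the lepton is not subsequently discarded because of its own calorimetric footprint.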

After reconstruction, we select events whose topology is compatible with that expected from the production of a pair of boosted top quarks that decays semi-leptonically. We require that each selected event features one lepton with at least \(50\,\,\textrm{GeV}\) of transverse momentum, a minimum missing transverse energy , as well as at least two small b jets and two small light jets. Next, we reconstruct the leptonically-decaying W boson that we consider on-shell. This assumption implies that the invariant mass of the system comprising the lepton and the missing momentum is equal to the mass \(m_W\) of the W boson, which allows us to determine the longitudinal component of the missing momentum,

(4)

In the above expression, \(\textbf{p}_\ell = (p_{\ell , x}, p_{\ell , y}, p_{\ell , z})\) denotes the three-momentum of the lepton, is the missing three-momentum, and \(E_\ell \) stands for the energy of the lepton. From the solution to Eq. (4), we can define the four-momentum of the leptonically-decaying W boson \(W_\textrm{L}^\text {rec}\). In the case where this equation has two solutions, we arbitrarily choose the smallest value for . Moreover, when it has no solution, we set the associated discriminant to 0 and use the resulting solution.
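For orientation, the on-shell condition can be solved explicitly. Writing \(a = (m_W^2 - m_\ell ^2)/2 + p_{\ell ,x}\, p^\textrm{miss}_x + p_{\ell ,y}\, p^\textrm{miss}_y\) and treating the neutrino as massless, the constraint \((p_\ell + p_\nu )^2 = m_W^2\) becomes a quadratic equation in the longitudinal missing momentum, with solutions \(p^\textrm{miss}_z = \big [a\, p_{\ell ,z} \pm E_\ell \sqrt{a^2 - (E_\ell ^2 - p_{\ell ,z}^2)\, |\textbf{p}^\textrm{miss}_\textrm{T}|^2}\,\big ]/(E_\ell ^2 - p_{\ell ,z}^2)\). This is the standard textbook form and may differ in notation from Eq. (4). The sketch below implements it, including the treatment of a negative discriminant described above. The names are hypothetical, the default W mass is an assumed value, and the choice between two real solutions is interpreted here as the one of smallest magnitude, which is an assumption.

```cpp
#include <cmath>
#include <utility>

struct Lepton {
    double px, py, pz, e;  // GeV
};

// Both solutions for the longitudinal neutrino momentum under the W-mass
// constraint. A negative discriminant is set to zero, in which case the two
// returned values coincide.
std::pair<double, double> neutrinoPzSolutions(const Lepton& lep, double metX,
                                              double metY,
                                              double mW = 80.4) {
    const double mLep2 = lep.e * lep.e - lep.px * lep.px - lep.py * lep.py -
                         lep.pz * lep.pz;
    const double a = 0.5 * (mW * mW - mLep2) + lep.px * metX + lep.py * metY;
    const double denom = lep.e * lep.e - lep.pz * lep.pz;  // > 0 if pT > 0
    const double met2 = metX * metX + metY * metY;

    double disc = a * a - denom * met2;
    if (disc < 0.0) disc = 0.0;  // no real solution: use the vertex
    const double root = lep.e * std::sqrt(disc);

    return {(a * lep.pz - root) / denom, (a * lep.pz + root) / denom};
}

// Picks the solution of smallest magnitude (assumed interpretation).
double chooseNeutrinoPz(const std::pair<double, double>& solutions) {
    return std::abs(solutions.first) <= std::abs(solutions.second)
               ? solutions.first
               : solutions.second;
}
```

The returned value, combined with the measured missing transverse momentum, fixes the neutrino four-momentum and hence that of the reconstructed W boson.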

In order to reconstruct the leptonically-decaying top quark, we match this reconstructed W boson with one of the b jets by minimising the difference between the top mass \(m_t\) and the invariant mass \(m[W_\textrm{L}^\text {rec} \oplus b]\) of the system formed by the reconstructed W boson \(W_\textrm{L}^\text {rec}\) and the b jet. This is achieved through a \(\Delta \chi ^2\) minimisation,

$$\begin{aligned} \Delta \chi ^2 = \frac{|m_t - m[W_\textrm{L}^\text {rec} \oplus b]|^2}{\sigma ^2} \equiv \frac{|m_t - m(t_\textrm{L}^\text {rec})|^2}{\sigma ^2}, \end{aligned}$$
(5)

with a mass-resolution parameter \(\sigma = 40 \,\,\textrm{GeV}\). The b-jet matched in this leptonic-top reconstruction is denoted by \(b_\textrm{L} \) in the following text.
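The matching of Eq. (5) amounts to a one-dimensional minimisation over the b-jet candidates, which can be sketched as below. The four-momentum format `(E, px, py, pz)`, the function names and the reference top mass of 172.5 GeV are assumptions made for illustration; only the value \(\sigma = 40\) GeV comes from the text.

```python
import math

M_TOP = 172.5  # GeV; reference top mass assumed for this sketch
SIGMA = 40.0   # GeV; mass-resolution parameter quoted in the text

def invariant_mass(p):
    """Invariant mass of a four-momentum (E, px, py, pz), clipped at zero."""
    e, px, py, pz = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def match_b_jet(w_rec, b_jets, m_top=M_TOP, sigma=SIGMA):
    """Return (index, delta_chi2) of the b jet minimising Eq. (5)."""
    def delta_chi2(b):
        top = tuple(wc + bc for wc, bc in zip(w_rec, b))
        return (m_top - invariant_mass(top)) ** 2 / sigma ** 2

    best = min(range(len(b_jets)), key=lambda i: delta_chi2(b_jets[i]))
    return best, delta_chi2(b_jets[best])
```

Since \(\sigma\) is a common factor for all candidates, it does not affect which b jet is selected; it only sets the scale of the returned \(\Delta \chi ^2\) value.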

Fig. 1

Invariant mass spectra relevant to the reconstruction of the leptonically decaying top quark. We display the invariant mass \(m(W_\textrm{L}^\text {rec})\) of the reconstructed W boson (upper panel), as well as that (\(m(t_\textrm{L}^\text {rec})\)) of the reconstructed top quark (lower panel). Predictions are shown for both the \({{t\bar{t}}}\) signal (red) and the associated background (blue)

Figure 1 illustrates the features of the reconstruction of the leptonic branch of the process. It shows the distribution in the invariant mass \(m(W_\textrm{L}^\text {rec})\) of the reconstructed W boson (upper panel) and that in the invariant mass \(m(t_\textrm{L}^\text {rec})\) of the reconstructed top quark (lower panel). Predictions are displayed both for the \({{t\bar{t}}}\) signal (red) and the associated background (blue). These results demonstrate that most signal events exhibit an on-shell leptonically-decaying W-boson and an on-shell associated top quark. However, the tails of the distributions extend quite significantly away from the peak values for the two spectra. This originates from the inefficiencies inherent to the kinematic fit performed in Eq. (4), which could lead to zero, one, or two solutions for the longitudinal component of the missing momentum. Consequently, the reconstructed mass of the \(W_\textrm{L}^\text {rec}\) boson (upper panel of Fig. 1) exhibits a plateau at values lower than the true W mass. This impacted our choice for the numerical value of the resolution parameter used in the \(\chi ^2\) fit of Eq. (5), which then leads to a quite broad peak around the true top mass for the distribution in the reconstructed top mass (lower panel of Fig. 1).

In the next step of our analysis, we study to what extent a hadronically-decaying top quark can be reconstructed from the event’s final state. We start from the fat-jet collection and discard any fat jet J that lies at an angular distance \(\Delta R (J, t_\textrm{L}^\text {rec}) \le 1.5\) of the reconstructed leptonically-decaying top quark \(t_\textrm{L}^\text {rec} \). Next, we discard all fat jets found near the \(b_\textrm{L} \) jet, i.e. lying within an angular distance \(\Delta R (J, b_\textrm{L}) \le 1.5\). Finally, we reject events that do not comprise at least one fat jet that includes a (small) b-jet. This condition is implemented by requiring that there is a fat jet J such that a b-jet different from the \(b_\textrm{L} \) jet lies at a distance \(\Delta R(J,b) < 1.5\) from it. We then test whether the hardest of the remaining fat jets is top-tagged by HEPTopTagger.
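The fat-jet selection described above can be sketched as follows. The object format (dictionaries with `pt`, `eta` and `phi` keys) and the function names are hypothetical, the hardest candidate is taken among all fat jets surviving the two \(\Delta R\) vetoes (one possible reading of the text), and the call to HEPTopTagger itself is deliberately left out.

```python
import math

def delta_r(a, b):
    """Angular distance between two objects carrying "eta" and "phi" keys."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)

def hadronic_top_candidate(fat_jets, t_lep, b_lep, b_jets, radius=1.5):
    """Return the fat jet to be passed to the tagger, or None if the event is rejected."""
    # Discard fat jets close to the reconstructed leptonic top or to the b_L jet.
    survivors = [
        j for j in fat_jets
        if delta_r(j, t_lep) > radius and delta_r(j, b_lep) > radius
    ]
    # Require at least one surviving fat jet containing a b jet other than b_L.
    other_b = [b for b in b_jets if b is not b_lep]
    if not any(delta_r(j, b) < radius for j in survivors for b in other_b):
        return None
    # The hardest remaining fat jet is the top-tagging candidate.
    return max(survivors, key=lambda j: j["pt"])
```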

Fig. 2

Efficiency associated with the reconstruction of one leptonic and one hadronic top quark, estimated relative to the number of events containing two on-shell top quarks. Results are shown after the analysis baseline cuts (red), an additional \(m^\text {truth}_{{t\bar{t}}}> 1\, \textrm{TeV}\) cut (blue), and an extra \(m^\text {truth}_{{t\bar{t}}}> 1.5 \,\textrm{TeV}\) cut (green). We consider the case of the SM (first column), as well as when eight different SMEFT operators are turned on (next columns)

We now introduce a few useful quantities in order to assess the performance of HEPTopTagger. First, we classify a truth-level top quark as “on-shell” when its invariant mass is in the range \([m_t-15\,\,\textrm{GeV}, m_t+15 \,\,\textrm{GeV}]\), and define the quantity \(T_{{{t\bar{t}}}}\) as the number of \({{t\bar{t}}}\) events featuring two such on-shell top quarks. Next, we denote by \(C_{t_H}\) the number of events for which the reconstructed hadronic top quark lies within an angular distance \(\Delta R <1.2\) from the corresponding truth-level object when the latter is on-shell. Similarly, \(C_{t_\textrm{L}^\text {rec}}\) stands for the number of events for which the reconstructed leptonically-decaying top quark lies at a distance \(\Delta R<1.2\) of its truth-level counterpart when it is on-shell. The quantities \(C_{t_H}\) and \(C_{t_\textrm{L}^\text {rec}}\) hence refer to the number of events for which the reconstructed top quarks are matched with the corresponding truth-level objects so that reconstruction is deemed correct.

With the first set of three coloured columns displayed on the left of Fig. 2, we show the resulting reconstruction efficiency defined as the ratio of the number of events featuring correctly reconstructed hadronic and leptonic top quarks to the number of events including two truth-level on-shell top quarks, i.e. the self-explanatory quantity

$$\begin{aligned} \varepsilon = \frac{C_{t_H}~\wedge ~C_{t_\textrm{L}^\text {rec}}}{T_{{{t\bar{t}}}}} \equiv \frac{\left( C_{t_H}~\textsc {and}~C_{t_\textrm{L}^\text {rec}}\right) }{T_{{{t\bar{t}}}}}. \end{aligned}$$
(6)

This efficiency is given when the baseline cuts described above are imposed (red), when an additional selection of \(m^\text {truth}_{{t\bar{t}}}> 1\, \textrm{TeV}\) is enforced (blue), and finally, when we require \(m_{{{t\bar{t}}}}^\text {truth} > 1.5\, \textrm{TeV}\) (green). The error bars represent the related Monte Carlo statistical uncertainty. We observe that about 50% of the SM events with on-shell \({{t\bar{t}}}\) production are correctly reconstructed, this number slightly increasing when we focus more deeply on the boosted regime (i.e. with a larger \(m^\text {truth}_{{t\bar{t}}}\) cut).

Fig. 3

Same as in Fig. 2 but for the efficiency associated with the reconstruction of one leptonic top quark, estimated relative to the number of events containing two on-shell top quarks

Fig. 4

Dalitz plots depicting the invariant mass ratios \(m_{13}/m_{123}\) and \(m_{23}/m_{123}\) where the indices refer to a specific jet among those comprising the reconstructed hadronically-decaying top quark. We show predictions when the on-shellness of the top quark is enforced (left column) and when there is no restriction on the invariant mass of the top quarks at parton level (right column). We consider the case of the SM (top row) and that of scenarios with one SMEFT operator turned on, namely \(\mathcal{O}_{Qq}^{3,1}\) (middle row) and \(\mathcal{O}_{Qq}^{3,8}\) (bottom row)

The efficiency, however, increases once one of the SMEFT operators of Eq. (3) is turned on, as shown in the rest of Fig. 2 (the dashed lines being guidelines for the comparison with the case of the SM). Here, the signal is simulated by implementing the Lagrangian and operators of Eqs. (2) and (3) in FeynRules as specified in Refs. [69, 70]. This is then used to generate a UFO [71] model to be used within MadGraph5_aMC@NLO so that events can be generated through the same toolchain as that described at the beginning of this section. Whereas we include the interference of dimension-six contributions with SM diagrams, squared SMEFT contributions (thus formally of dimension-eight) are truncated away. The increase in efficiency observed in Fig. 2 can be traced back not only to a slight increase in the signal cross section, but also to a change in the event topology enhancing HEPTopTagger’s ability to correctly tag the boosted, hadronically-decaying top quark. To prove this statement, we display in Fig. 3 the efficiency \(\varepsilon '\) of correctly tagging the leptonic top \(t_\textrm{L}^\text {rec} \) regardless of the hadronic branch of the events,

$$\begin{aligned} \varepsilon ' = \frac{C_{t_\textrm{L}^\text {rec}}}{T_{{{t\bar{t}}}}}. \end{aligned}$$
(7)

As can be seen in this figure, the efficiency \(\varepsilon '\) is almost 100% for all considered scenarios (both in terms of new-physics setup and the parton-level \(m^\text {truth}_{{t\bar{t}}}\) cut). This confirms that the global suppression of the efficiency \(\varepsilon \) shown in Fig. 2 (relative to \(\varepsilon '\)) originates solely from the tagging of the hadronic top quark, and is therefore related to the performance of HEPTopTagger. The latter can thus directly be assessed from the quantity \(\varepsilon \), and it is different between SM \({{t\bar{t}}}\) events and those including the interference of top-related SMEFT operators with the SM.
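For concreteness, the two efficiencies of Eqs. (6) and (7) can be evaluated from per-event truth-matching flags as sketched below. The event format (a list of boolean triplets) and the function names are hypothetical, and the binomial error formula is a simple assumption valid for unweighted Monte Carlo events; the text does not specify how the statistical uncertainties shown in the figures were computed.

```python
import math

def binomial_efficiency(k, n):
    """Efficiency k/n with its binomial statistical uncertainty."""
    if n == 0:
        raise ValueError("no reference events")
    eff = k / n
    return eff, math.sqrt(eff * (1.0 - eff) / n)

def reconstruction_efficiencies(events):
    """`events` is a list of (both_tops_on_shell, hadronic_matched, leptonic_matched) booleans.

    Returns (eps, eps_prime), each as a (value, uncertainty) pair.
    """
    # Denominator T_tt: events with two on-shell truth-level top quarks.
    t_tt = sum(1 for on_shell, _, _ in events if on_shell)
    # Numerator of Eq. (6): both tops correctly reconstructed.
    n_both = sum(1 for on_shell, had, lep in events if on_shell and had and lep)
    # Numerator of Eq. (7): leptonic top correctly reconstructed.
    n_lep = sum(1 for on_shell, _, lep in events if on_shell and lep)
    return binomial_efficiency(n_both, t_tt), binomial_efficiency(n_lep, t_tt)
```

With this bookkeeping, the ratio of the two efficiencies isolates the contribution of the hadronic branch, which is the quantity attributed to the tagger in the discussion above.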

Our results demonstrate that the performance of HEPTopTagger could be strongly impacted by the physics model that is used as a reference during its tuning. Including effective operators such as those in Eq. (3) favours the production of a boosted top-antitop pair more than in the SM, as expected from operators sensitive to the event’s energy scale. While in this case the presence of operators not included in the HEPTopTagger tuning enhances the reconstruction efficiency, this is not generally true, and a tuning based on potential EFT contributions could find different optimal tagging parameters.

Importantly, analyses assuming SM-like HEPTopTagger reconstruction efficiencies would underestimate the reconstruction and tagging efficiency for any data \({{t\bar{t}}}\) events involving these operators, and would hence systematically overestimate the magnitude of the corresponding Wilson coefficient. This observation reinforces the importance of using operator-dependent reconstruction efficiencies in SMEFT fits to boosted top quark data.

The presented efficiencies are, however, normalised to the number of events featuring an on-shell \({{t\bar{t}}}\) pair. The obtained increase in the tagging efficiency \(\varepsilon \) in the presence of SMEFT operators may, therefore, also be related to a different probability of getting at least one off-shell top quark in the events. This problem is addressed by the Dalitz-plot heat-maps shown in Fig. 4, which depict the on-shellness of the produced hadronic top quark. In these figures, we display the correlations between two ratios of invariant masses, \(m_{13}/m_{123}\) and \(m_{23}/m_{123}\). The three integers 1, 2 and 3 denote the three (\(p_\textrm{T} \)-ordered) subjets comprised in the hadronically-decaying boosted top quark, so that \(m_{123}\) stands for the invariant mass of the three-subjet system, \(m_{13}\) for the invariant mass of the system made of the leading and third subjets, and \(m_{23}\) for that of the system made of the second and third subjets. We present results both for events restricted to those featuring on-shell top quarks (left column) and for the entire generated samples (right column). Moreover, we explore the difference between the SM (top row), a scenario in which the \(\mathcal{O}_{Qq}^{3,1}\) operator of Eq. (3) is turned on (middle row), and a scenario in which the \(\mathcal{O}_{Qq}^{3,8}\) operator of Eq. (3) is turned on (bottom row).
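The two Dalitz variables can be computed from the three subjet four-momenta as in the following sketch. The four-momentum format `(E, px, py, pz)` and the function names are assumptions made for illustration.

```python
import math

def invariant_mass(*momenta):
    """Invariant mass of the sum of several four-momenta (E, px, py, pz)."""
    e, px, py, pz = (sum(component) for component in zip(*momenta))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def dalitz_ratios(subjets):
    """Return (m13/m123, m23/m123) for three subjets, ordered by pT internally."""
    j1, j2, j3 = sorted(subjets, key=lambda p: math.hypot(p[1], p[2]), reverse=True)
    m123 = invariant_mass(j1, j2, j3)
    return invariant_mass(j1, j3) / m123, invariant_mass(j2, j3) / m123
```

When the second and third subjets originate from the hadronic W-boson decay of an on-shell top quark, the second ratio is expected to be close to \(m_W/m_t\), i.e. roughly 0.47 for nominal mass values.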

As can be seen, the jet combinatorics are correctly resolved in most events in the case of the SM. The leading jet is most often that originating from the two-body \(t\rightarrow W b\) decay (with the b-tagging information being ignored), and the next two jets are those stemming from the hadronic W-boson decay. The distribution of the \(m_{23}/m_{123}\) ratio is indeed concentrated around \(m_W/m_t\) for the two subfigures of the upper row of Fig. 4. The spread around this value is more pronounced when no restriction is enforced on the invariant mass of the top quarks at parton level, as observed from a comparison of the predictions shown in the top-left and top-right figures. This can be easily explained by the inefficiency of HEPTopTagger to correctly tag off-shell top jets, as, by default, the algorithm has been tuned on events featuring on-shell top quarks.

This situation changes slightly when EFT operators are enabled (middle and lower rows of Fig. 4). First, although the associated amplitude does not feature any intermediate W boson (as the decay of the top quark proceeds via a single four-fermion operator), the interference with the SM diagrams (our predictions being truncated at dimension-six) is sufficient to keep the properties that the leading jet is the b-jet, and that the next two jets can be paired to reconstruct a hadronically-decaying W boson. It is additionally noticeable that the effective operators considered affect the reconstructed top quark so that the latter is naturally more often on-shell (and more boosted due to the energy growth inherent to the effective-theory paradigm). Consequently, we can expect better performance of HEPTopTagger, which confirms what was already found in Fig. 2.

3.3 Boosted tops as a probe to new physics in the SMEFT

Table 3 Number of \({{t\bar{t}}}\) and \(Wb\bar{b}\)+jets SM events surviving each step of our analysis, presented together with their respective selection efficiency \(\varepsilon \). The results are normalised to an integrated luminosity of 300 \(\hbox {fb}^{-1}\). In the last row of the table, we provide two alternative means to assess the analysis significance, namely the S/B and \(S/\sqrt{B}\) ratios where S and B are the number of \({{t\bar{t}}}\) and background events passing all cuts

In this section, we explore how the findings of Sect. 3.2 affect the sensitivity of the LHC to SMEFT effects originating from the operators of Eq. (3). We begin by providing, in Table 3, the numbers of events surviving each of the selection cuts introduced in the previous section, both for the \({{t\bar{t}}}\) signal and the \(W b \bar{b}\) + jets background. Our results are normalised to an integrated luminosity of \(300 \,\textrm{fb}^{-1}\), and we additionally estimate the efficiencies associated with each cut, which we define as the ratio of the number of events surviving a given cut to the number of events surviving the previous cut. Whereas the last cut on the invariant mass of the reconstructed \({{t\bar{t}}}\) system (i.e. the ninth one in the table, \(m_{{{t\bar{t}}}}^\text {rec}> 950 \,\,\textrm{GeV}\)) is not necessary for physics-analysis purposes, it is required to match the Monte Carlo signal-generation cut implemented in Sect. 3.2 (to enable a more efficient event-generation process in the boosted regime).

As already noticeable from the results introduced earlier in this manuscript, for instance from the invariant-mass spectra displayed in Fig. 1, the events surviving the entire selection are primarily dominated by signal events, which hence have large expected event-counts. This is further reflected in the S/B and \(S/\sqrt{B}\) ratios provided as significance estimators in the lower rows of Table 3, these two metrics being evaluated in terms of the number of signal events S and background events B passing all the analysis cuts. The background is thus fully under control in our study, so a shape analysis can be implemented to study how kinematic distributions can be best used to constrain the SMEFT-operators’ Wilson coefficients.
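The bookkeeping behind such a cut-flow table is simple enough to be sketched in a few lines. The function names are hypothetical, and any yields fed to them in an example are placeholders rather than values from Table 3.

```python
import math

def cutflow_efficiencies(yields):
    """Per-cut efficiencies: yield after a cut divided by the yield after the previous cut."""
    return [after / before for before, after in zip(yields, yields[1:])]

def significance_estimators(n_signal, n_background):
    """Return the two estimators quoted in the text, S/B and S/sqrt(B)."""
    return n_signal / n_background, n_signal / math.sqrt(n_background)
```

Both estimators ignore systematic uncertainties on the background, which is acceptable only when the background is small compared with the signal, as is argued to be the case here.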

To do this, we first increase the final selection cut to maximise sensitivity by probing more deeply boosted top-antitop production. In the following, we hence consider either \(m^\text {rec}_{{{t\bar{t}}}} > 1\, \textrm{TeV}\) or \(m^\text {rec}_{{{t\bar{t}}}}> 1.5\, \textrm{TeV}\). The sensitivity of the LHC to a given SMEFT operator is derived through the evaluation of a \(\chi ^2\) test-statistic in an asymptotic scheme that involves deviations of SMEFT predictions relative to the associated SM predictions for a given set of observables. Our analysis explores simultaneously the distributions of the following observables:

  • the invariant mass \(m^\text {rec}_{{t\bar{t}}}\) of the di-top system;

  • the transverse momentum \(p_\textrm{T} (j^{R=1.5})\) of the leading fat-jet;

  • the transverse momenta \(p_\textrm{T} (j^{R=0.4}_1)\), \(p_\textrm{T} (j^{R=0.4}_2)\) and \(p_\textrm{T} (j^{R=0.4}_3)\) of the three leading small-R jets;

  • the transverse-momentum spectrum \(p_\textrm{T} (t_\textrm{H})\) of the reconstructed hadronic top quark;

  • the transverse-momentum spectrum \(p_\textrm{T} (t_\textrm{L}^\text {rec})\) of the reconstructed leptonic top quark;

  • the rapidity difference \(\Delta y (t_\textrm{L}^\text {rec},t_\textrm{H})\) between the two reconstructed top quarks;

  • and the azimuthal-angle difference \(\Delta \varphi (t_\textrm{L}^\text {rec},t_\textrm{H})\) between the two reconstructed top quarks.

In order to estimate the \(\chi ^2\) value associated with a specific SMEFT scenario, each of the nine histograms considered was divided into 25 bins (20 and 16 for the \(\Delta y (t_\textrm{L}^\text {rec},t_\textrm{H})\) and \(\Delta \varphi (t_\textrm{L}^\text {rec},t_\textrm{H})\) distributions respectively), and we calculated the quantity

$$\begin{aligned} \chi ^2 = \sum _{i} \frac{\left( N^\textrm{exp}_i -N^\textrm{obs}_i\right) ^2}{N^\textrm{obs}_i + \left( \Delta _\textrm{sys} N^\textrm{obs}_i \right) ^2}, \end{aligned}$$
(8)

in which we sum over all bins and all histograms. The SM predictions are taken as the null hypothesis: \(N^\textrm{exp}_i\) denotes the expected number of events in the SM for a given observable and bin i, \(N^\textrm{obs}_i\) the corresponding SMEFT prediction, and \(\Delta _\textrm{sys} N^\textrm{obs}_i\) the systematic error on the SMEFT prediction. In other words, we enforce that the pseudo-data corresponding to the SM scenario (i.e. the origin of the Wilson coefficient parameter space) corresponds to the background expectation with suppressed statistical and systematic fluctuations, and therefore constitutes an Asimov dataset. The above \(\chi ^2\) test is thus asymptotically equivalent to a profile likelihood ratio \(\Delta {\chi ^2} = \chi ^2_\textrm{SMEFT} - \chi ^2_\text {best}\) for a given SMEFT scenario with an implicit best-fit reference model evaluated in the case of the SM (therefore with \(\chi ^2_\text {best} = 0\)). Without explicitly performing any profiling, we thus estimate the sensitivity of a profile-likelihood fit by comparing the obtained \(\chi ^2\) values with that expected from a \(\chi ^2\) distribution with one degree of freedom. In practice, however, profiled constraints could be slightly weaker due to a less perfect fit of observed data to the background model.
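A minimal sketch of this test is given below. It rests on an explicit assumption: the denominator of each term is taken to be the per-bin variance, i.e. the sum of the Poisson term \(N^\textrm{obs}_i\) and of the squared systematic error \((\Delta _\textrm{sys} N^\textrm{obs}_i)^2\). The function names are hypothetical, empty bins are skipped as a choice of this sketch, and the cumulative distribution of a \(\chi ^2\) variable with one degree of freedom is evaluated through the standard identity \(F(x) = \textrm{erf}(\sqrt{x/2})\).

```python
import math

def chi2_statistic(n_exp, n_obs, delta_sys=0.0):
    """Sum over bins of (N_exp - N_obs)^2 / (N_obs + (delta_sys * N_obs)^2).

    `n_exp` holds the SM expectations and `n_obs` the SMEFT predictions,
    concatenated over all bins of all histograms. Empty bins are skipped.
    """
    return sum(
        (e - o) ** 2 / (o + (delta_sys * o) ** 2)
        for e, o in zip(n_exp, n_obs)
        if o > 0.0
    )

def is_excluded(chi2_value, confidence_level=0.68):
    """Compare with a chi-square distribution with one degree of freedom."""
    return math.erf(math.sqrt(chi2_value / 2.0)) > confidence_level
```

In this form, a 10% systematic uncertainty dominates the denominator as soon as a bin contains more than 100 events, which is why the gain from a larger luminosity is limited when \(\Delta _\textrm{sys} = 10\%\).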

Table 4 Observable driving the sensitivity of the LHC (at 68% confidence level) to a given SMEFT operator from Eq. (3) (first column). We consider both a perfect situation without systematics (\(\Delta _\mathrm {\,sys} = 0\), second and fourth columns), and one with 10% of systematics (\(\Delta _\mathrm {\,sys}=10\%\), third and fifth columns). Moreover, we present results for 300 fb\(^{-1}\) and 3 ab\(^{-1}\), and for an invariant mass cut of \(m^\text {rec}_{t{\bar{t}}} > 1\, {\textrm{TeV}}\) (upper panel) and \(m^\text {rec}_{t{\bar{t}}} > 1.5\, {\textrm{TeV}}\) (lower panel)

In Table 4, we provide information on the observable found to provide the strongest sensitivity to each SMEFT operator. The results are shown for the two cuts on the invariant mass considered, \(m^\text {rec}_{t\bar{t}} > 1 \,\textrm{TeV}\) (upper panel of Table 4) and \(m^\text {rec}_{t\bar{t}} > 1.5 \,\textrm{TeV}\) (lower panel of Table 4). Moreover, we consider LHC luminosities of 300 fb\(^{-1}\) and 3000 fb\(^{-1}\), and two different options for the amount of systematics \(\Delta _\mathrm {\,sys}\) used in Eq. (8). We take as a reference the ideal situation in which there are no systematic uncertainties (\(\Delta _\mathrm {\,sys} =0\)), as well as a more realistic situation in which we set \(\Delta _\mathrm {\,sys} = 10\%\). In our procedure to extract this information, we define the sensitivity on the basis of a 68% confidence level. When we consider a moderate definition of the boosted regime with \(m^\text {rec}_{t\bar{t}} > 1 \,\textrm{TeV}\), the sensitivity is always driven by the distribution in the transverse momentum of either the leptonically-decaying top quark (\(p_\textrm{T} (t_\textrm{L}^\text {rec})\)) or of the lepton originating from the decay of this top quark (\(p_\textrm{T} (\ell _1)\)). The information brought by the hadronic branch of the event is found to be sub-leading for all SMEFT operators and systematic-uncertainty assumptions. However, the situation changes when the boosted regime is probed more deeply through the tighter cut \(m^\text {rec}_{t\bar{t}} > 1.5 \,\textrm{TeV}\). Here, both top quarks are reconstructed and tagged more accurately (in particular through the better performance of HEPTopTagger in a SMEFT scenario, see Sect. 3.2). This leads to an increased discovery potential through use of a larger set of contributing observables. 
This statement is illustrated in the lower panel of the table, which displays a greater variability in the leading observable driving the sensitivity of the LHC to a given SMEFT operator, with the \(\mathcal {O}_{Qq}^{1,8}\), \(\mathcal {O}_{Qq}^{3,8}\), \(\mathcal {O}_{Qq}^{3,1}\), and \(\mathcal {O}_{Qd}^{8}\) operators now most sensitive to either hadronic-top or \({{t\bar{t}}}\)-system observables.

Fig. 5

Sensitivity of the LHC to the various SMEFT operators of Eq. (3). We present predictions for 300 fb\(^{-1}\) (blue) and 3000 fb\(^{-1}\) (red), \(\Delta _\mathrm {\,sys} = 0\) (shaded bars) and 10% (solid bars), and we distinguish an analysis of the full \({{t\bar{t}}}\) event sample generated (right column) and after enforcing on-shell top-antitop production (left column). Two analysis cuts on the invariant mass of the reconstructed top pair are imposed, \(m^\text {rec}_{{t\bar{t}}}> 1\,\textrm{TeV}\) (upper panel) and \(m^\text {rec}_{{t\bar{t}}}> 1.5\, \textrm{TeV}\) (lower panel)

Our final projections of SMEFT Wilson-coefficient expected limits, assuming the SM, are shown in Fig. 5. We derive the sensitivity of the LHC to each of the operators considered, making use of the procedure described above. We present bounds on the associated Wilson coefficients, both for an integrated luminosity of 300 fb\(^{-1}\) (blue) and 3000 fb\(^{-1}\) (red), and for the two options explored for the level of systematics, namely \(\Delta _\mathrm {\,sys} = 0\) (shaded bars) and 10% (solid bars). In addition, we distinguish the case in which we pre-select at parton-level on-shell \({{t\bar{t}}}\) events (left subfigures) and that in which we analyse the full event sample generated (right subfigures). As in the previous discussion, we implement a relatively inclusive requirement of 1 TeV on the invariant mass of the reconstructed \({{t\bar{t}}}\) system (upper row) as well as a more stringent \(m^\text {rec}_{{t\bar{t}}}> 1.5 \,\textrm{TeV}\) cut (bottom row).

We find limits on \(|C/\Lambda |\) that lie in the 0.1–1 \({\textrm{TeV}^{-1}}\) range. This means that for Wilson coefficients satisfying \(C\sim 1\), effective scales in the 1–5 TeV range can be probed. Conversely, for TeV-scale new physics, couplings of \(\mathcal {O}(0.1)\) can be reached. The bounds are found to be mildly more constraining with the increase in luminosity as well as with a harder cut on \(m^{\text {rec}}_{t\bar{t}}\), as expected, and the impact of off-shell top-antitop production is additionally found to be sub-leading. Such a sensitivity is of comparable size with that estimated on the basis of global fits (see e.g. predictions from Ref. [42]), which demonstrates the potential of including dedicated analyses of boosted top quark pair production and decay in SMEFT global fits. Global fits of LHC Run 2 data indeed indicate that \(|C/\Lambda |\) has to be smaller than about 0.1–1 \( {\textrm{TeV}^{-1}}\) too. Our results should however additionally be compared with individual limits extracted from fits of a large set of observables when one SMEFT operator is considered at a time (for a fairer comparison). Such fits lead to bounds on \(|C/\Lambda |\) of \(\mathcal {O}(0.1) \,\textrm{TeV}^{-1}\) [44], which are thus comparable with the findings of Fig. 5. Whereas exploiting boosted top quark production is already known to have a strong constraining power on individual operators (for instance in the context of top dipole moments, where it has been shown to significantly improve the bounds by a factor of a few [72]), a detailed quantitative analysis of its impact lies beyond the scope of this paper. Here, we have only investigated how using a specific boosted-top quark channel could lead to a better assessment of the sensitivity of the LHC to top quark-related SMEFT operators, thanks to a joint usage of a variety of potentially relevant observables and improved top-tagging capabilities in the SMEFT.

4 Conclusion and outlook

Jet substructure methods are known to be among the key players in the search for new phenomena beyond the Standard Model of particle physics. Among these, a set of dedicated techniques are related to the identification of jets originating from the hadronic decay of a boosted top quark. In this paper, we have reported the development of an interface between the HEPTopTagger package and two software tools widely used in the high-energy physics community, namely the MadAnalysis 5 and Rivet frameworks. Thanks to this development, the many users of these platforms now have the possibility to exploit boosted hadronically-decaying top quarks and their properties in analyses of high-energy physics events for the Large Hadron Collider and beyond.

We have briefly described these two implementations and how to use them. Our developments equip the Rivet toolkit from version 3.1.7, which is available from HepForge (see https://rivet.hepforge.org/), as well as the MadAnalysis 5 framework from version 2.0.4, available from GitHub (see https://github.com/MadAnalysis/madanalysis5/releases). Moreover, detailed tutorials exploiting all the possibilities can be found in the analysis file shipped with Rivet, as well as in the MadAnalysis 5 tutorial available from https://github.com/MadAnalysis/tutorial_osu.

To illustrate the power of these developments, we have considered the SMEFT framework in which new physics manifests through non-renormalisable operators in the Standard Model fields. We have focused on eight dimension-six, four-fermion operators relevant to the top quark sector, chosen as they are not stringently constrained by current SMEFT global fits. The analysis of the production of pairs of boosted top quarks could therefore provide new handles on associated heavy BSM physics. We have explored this option by first investigating the performance of the HEPTopTagger algorithm in the presence of non-vanishing SMEFT operators. Whereas the algorithm is tuned on SM top-pair production and decay, we have observed that its performance improves further in the presence of the considered additional SMEFT operators in the model’s Lagrangian. The energy dependence of the SMEFT operators considered indeed favours the production of very energetic boosted top quarks, with properties enhancing their tagging possibility by the HEPTopTagger method. This observation highlights the importance of considering new-physics effects upon reconstruction performance when attempting SMEFT parameter fits.

Secondly, we have investigated differential observables in boosted top-antitop production following HEPTopTagger tagging, to study how deviations from the Standard Model can best be used to isolate SMEFT effects emerging from the new operators. We have shown that a simple analysis based on HEPTopTagger could lead to bounds comparable with those stemming from other means to constrain SMEFT operators. We hope that this demonstrates the potential of the developments presented in this work and that they will serve the community well in the future.