Giving top quark effective operators a boost

We investigate the prospects to systematically improve generic effective field theory-based searches for new physics in the top sector during LHC run 2 as well as the high-luminosity phase. In particular, we assess the benefits of high momentum transfer final states for top-quark EFT fits as a function of systematic uncertainties, in comparison with the sensitivity expected from fully-resolved analyses focusing on $t\bar t$ production. We find that constraints are typically driven by fully-resolved selections, while boosted top quarks can serve to break degeneracies in the global fit. This demystifies and clarifies the importance of high momentum transfer final states for global fits to new interactions in the top sector from direct measurements.


I. INTRODUCTION
Final states associated with top quarks are produced in abundance at the Large Hadron Collider (LHC). Top quark pair production in particular, with a cross section of around 900 pb [1][2][3][4], will enable us to perform precise spectroscopy of the top sector at the LHC. This plays an important role in paving the way to a better understanding of particle physics beyond the Standard Model (SM). Since the top quark is the heaviest particle in the SM, it typically assumes a central role in concrete beyond-the-SM (BSM) scenarios, ranging from composite Higgs to supersymmetric theories. Most of these theories are characterised by additional propagating degrees of freedom, some of which fall inside the kinematic coverage of the LHC. There is a significant effort to search for these exotic states; unfortunately, none of these searches has provided a conclusive hint of physics beyond the SM so far.
If there is a mass gap between the electroweak scale $v$ and the scale of the new physics, one can view the Standard Model as the leading term in an effective Lagrangian. In this Lagrangian the heavy new degrees of freedom are integrated out, and their couplings to SM particles are left encoded in higher-dimensional operators, i.e. operators of dimension $D > 4$:

$$\mathcal{L} = \mathcal{L}_{\rm SM} + \sum_i \frac{C_i^{(6)}}{\Lambda^2}\,\mathcal{O}_i^{(6)} + \dots \qquad \text{(I.1)}$$

As is convention, we normalise the operators such that $\Lambda$, which can be viewed as a generic scale for the supposed heavy degrees of freedom, is displayed explicitly. The effective new physics couplings $C_i^{(6)}$ (Wilson coefficients) are therefore dimensionless. The ellipsis denotes operators of dimension $D > 6$. It is typically assumed that the gap between $v$ and $\Lambda$ is large enough for this expansion to be truncated at $D = 6$, although this need not be the case.

* Electronic address: christoph.englert@glasgow.ac.uk † Electronic address: l.moore.1@research.gla.ac.uk ‡ Electronic address: k.nordstrom.1@research.gla.ac.uk § Electronic address: m.russell.2@research.gla.ac.uk
The structure of the operators $\mathcal{O}_i^{(6)}$ is dictated entirely by the field content and symmetries of the Standard Model, and several equivalent bases for the $\mathcal{O}_i^{(6)}$ already exist in the literature. Once a basis has been chosen, all that remains is to compute the effects of the $\mathcal{O}_i^{(6)}$ on a given observable and obtain the allowed region of the corresponding coefficient $C_i^{(6)}$. Any measurement of a non-zero coefficient is thus evidence of physics beyond the Standard Model.
The top quark sector is one of the pillars of the analysis programme pursued by the ATLAS and CMS experiments at the LHC. The diverse range of measurements published during run 1 has been used to constrain new top couplings within the SMEFT framework [18,19,27]. No deviations have been found so far (in particular in resonant $t\bar t$ searches, e.g. [29][30][31]), but the constraints available from the small integrated luminosity of run 1 are rather weak. This is a setback for the EFT approach, because weak constraints on $C_i/\Lambda^2$ can correspond to mass scales $\Lambda$ that are resolved by the measurements used in the fit. Such constraints invalidate the perturbative expansion* of Eq. (I.1) in the first place [19,32,33].

With the increase in luminosity forecast for run 2, as well as the high-luminosity phase of the LHC (or even a hypothetical 100 TeV collider), there will undoubtedly be a significant improvement of the currently rather loose constraints. This improvement will depend on the relative importance of particular top channels and on the impact of their associated systematic uncertainties on the limit setting. We expect deviations from the SM to be most pronounced at large momentum transfers. However, in these regions both the theoretical modelling and the experimental measurements tend to be less reliable than at low momentum transfers. Efficient top reconstruction techniques in these particular phase space regions can mitigate the small cross sections at large transverse momenta. These reconstruction approaches are subject to qualitatively different systematics compared to fully resolved top final state analyses. This holds in particular when new physics is present.
In this paper we show how the combination of these effects influences the potential improvement of a global top quark fit, using the example of top quark pair production. We also discuss how improvements in boosted and fully resolved analyses can affect the global sensitivity to new physics in top final states. This work is organised as follows. We first introduce the effective theory and the fitting components relevant for our analysis in Sec. II. In Sec. III we discuss the improvements on the limits of the coefficients obtained from the boosted analysis, highlighting the interplay between various sources of experimental uncertainty, and trace the impact of theory uncertainties on our fit. We conclude in Sec. IV with a discussion of how our EFT results may then be interpreted within the parameter space of UV completions, and discuss future directions.

A. Model Summary
We implemented the complete 'Warsaw' basis [34] of the SMEFT Lagrangian as a general FeynRules [35] model. In this model we adopted the conventions of Ref. [34] and allowed the Wilson coefficients $C$ to carry flavour indices wherever applicable. We included the minimal set of global parameter and field redefinitions necessary to restore canonical normalisation and mass-diagonal states to the Lagrangian in the broken electroweak phase, up to terms of $\mathcal{O}(\Lambda^{-4})$. In the case of strong top pair production this amounts to redefinitions of, e.g., the strong gauge coupling (see e.g. [13] for a detailed discussion), which have no physical consequences for our analysis.

* An identical interpretation is that weakly coupled UV completions are left unconstrained after matching.
The resulting model file is interfaced via the UFO [36] format to MadEvent [37]. At leading order in the SMEFT, the operators relevant for top quark pair production at hadron colliders are summarised in Tab. I.
The lower six operators in Tab. I contribute via the partonic subprocess $q\bar q \to t\bar t$. At the interference level, however, they contribute only through four linear combinations, which we denote $C_{u,d}^{1,2}$ (see [19] for details).

B. Fitting
The events generated with MadEvent, which sample the Wilson coefficient space, are subsequently showered with Herwig++ [38,39]. Herwig++ takes into account initial- and final-state radiation as well as hadronisation and the underlying event. At this stage, all our predictions are at leading order in the Standard Model EFT. Considerable progress has recently been made in extending the effective Standard Model description of top quark physics to next-to-leading order [6,40]. Even so, the full description of top quark pair production at this order is still incomplete. We take higher-order QCD corrections into account by reweighting the Standard Model piece of our distributions to the NLO QCD prediction with K-factors obtained from MCFM [41] and cross-checked with MC@NLO [37]. Full NNLO results for top quark pair production have recently become available [3,4,42]; we comment on their potential for improving our results in Sec. III.
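The reweighting step can be sketched in a few lines. This is a minimal illustration that rests on our own assumption: the bin-wise K-factor multiplies only the LO SM piece, while the EFT-induced shift is kept at LO. The function name `reweight_sm_piece` and its inputs are hypothetical and are not part of any tool mentioned above.

```python
import numpy as np

def reweight_sm_piece(sigma_lo_eft, sigma_lo_sm, sigma_nlo_sm):
    """Rescale only the SM part of each bin to NLO QCD (hypothetical helper).

    sigma_lo_eft : LO prediction per bin, including EFT contributions
    sigma_lo_sm  : LO SM-only prediction per bin
    sigma_nlo_sm : NLO QCD SM prediction per bin
    """
    sigma_lo_eft = np.asarray(sigma_lo_eft, dtype=float)
    sigma_lo_sm = np.asarray(sigma_lo_sm, dtype=float)
    sigma_nlo_sm = np.asarray(sigma_nlo_sm, dtype=float)
    k_factor = sigma_nlo_sm / sigma_lo_sm      # differential K-factor per bin
    eft_shift = sigma_lo_eft - sigma_lo_sm     # EFT-induced change, kept at LO
    return k_factor * sigma_lo_sm + eft_shift  # equals sigma_nlo_sm + eft_shift
```

Consider two toy bins with LO EFT predictions (12, 5), LO SM predictions (10, 4) and NLO SM predictions (15, 5). The K-factors are then 1.5 and 1.25, and the reweighted predictions are (17, 6).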
We estimate scale uncertainties in the usual way. For the central value of the distributions we choose renormalisation and factorisation scales equal to the top quark mass, $\mu_R = \mu_F = m_t$. We then vary the scales independently over the range $m_t/2 \le \mu_{R,F} \le 2m_t$. PDF uncertainties are estimated by generating theory predictions with the CT14 [43], MMHT14 [44] and NNPDF3.0 [45] sets, as per the recommendations of the PDF4LHC working group for LHC run 2 [46]. We take the full scale+PDF envelope as our theory band. This defines an uncertainty on the differential K-factor, which we propagate into each observable. We treat theory uncertainties as uncorrelated with experimental systematics and keep them fixed as a function of luminosity unless stated otherwise.
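A sketch of the envelope construction follows. The text only gives the range of the independent scale variation, so we assume it is discretised on the common 3x3 grid of factors {1/2, 1, 2} for $\mu_R$ and $\mu_F$. We also assume that the central prediction uses the first PDF set listed. `predict` is a hypothetical user-supplied function returning per-bin predictions, and the default top mass is an assumed value.

```python
import itertools
import numpy as np

SCALE_FACTORS = (0.5, 1.0, 2.0)  # assumed discretisation of m_t/2 <= mu <= 2 m_t

def scale_pdf_envelope(predict, m_top=172.5, pdf_sets=("CT14", "MMHT14", "NNPDF3.0")):
    """Per-bin (central, lower, upper) from the combined scale + PDF envelope.

    predict(mu_r, mu_f, pdf) is a hypothetical user-supplied function that
    returns the per-bin predictions for one scale choice and PDF set.
    """
    variations = []
    for pdf in pdf_sets:
        # Vary mu_R and mu_F independently on the assumed 3x3 grid
        for f_r, f_f in itertools.product(SCALE_FACTORS, SCALE_FACTORS):
            variations.append(np.asarray(predict(f_r * m_top, f_f * m_top, pdf), dtype=float))
    variations = np.vstack(variations)
    # Central value: mu_R = mu_F = m_t with the first PDF set (an assumption)
    central = np.asarray(predict(m_top, m_top, pdf_sets[0]), dtype=float)
    return central, variations.min(axis=0), variations.max(axis=0)
```

Taking the minimum and maximum over all scale and PDF combinations reproduces a "full envelope". A quadrature combination of scale and PDF errors would be a different, less conservative choice.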
To build the parameter space for the Wilson coefficients $C_i$, we use the interpolation-based method detailed in [47]. This avoids evaluating predictions on a multidimensional grid, whose cost scales exponentially with the number of operators:
• We construct a logarithmically spaced, randomly sampled six-dimensional parameter space in the operators of Tab. I. The logarithmic spacing reflects that we want our sampling to be most accurate near the SM point $\{C_i\} = 0$.
• We generate our theory predictions and uncertainties, as detailed above, at each point in this space.
• Once the parameter space has been constructed, we use a polynomial to interpolate between the randomly chosen values of $\{C_i\}$. This builds up a smooth functional form for the change in the prediction for the observables considered with respect to $\{C_i\}$.
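The text does not specify the exact sampling distribution. The sketch below therefore assumes that each $|C_i|$ is drawn log-uniformly between illustrative bounds `c_min` and `c_max`, with a random sign. This concentrates sample points near $\{C_i\} = 0$. The function name and bounds are hypothetical.

```python
import numpy as np

def log_random_points(n_points, n_ops=6, c_min=1e-3, c_max=10.0, seed=0):
    """Random points in Wilson-coefficient space, dense near the SM point.

    |C_i| is log-uniform in [c_min, c_max) with a random sign; the bounds are
    illustrative assumptions, not values used in the analysis.
    """
    rng = np.random.default_rng(seed)
    log_mag = rng.uniform(np.log10(c_min), np.log10(c_max), size=(n_points, n_ops))
    signs = rng.choice([-1.0, 1.0], size=(n_points, n_ops))
    return signs * 10.0 ** log_mag
```

With these bounds, half of the sampled $|C_i|$ lie below 0.1, which is the geometric midpoint of the range.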
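The cross section depends on the Wilson coefficients through

$$\sigma \sim |\mathcal{M}_{\rm SM}|^2 + 2\,{\rm Re}\{\mathcal{M}_{\rm SM}\mathcal{M}_{D6}^*\} + |\mathcal{M}_{D6}|^2, \qquad \text{(II.3)}$$

where the dimension-six amplitude $\mathcal{M}_{D6}$ is linear in the $C_i$. Motivated by this functional form, we choose a polynomial dependence on $\{C_i\}$ as our response function for a single bin $b$:

$$f_b(\{C_i\}) = \beta_0^b + \sum_i \beta_i^b C_i + \sum_{i \le j} \beta_{ij}^b C_i C_j + \dots \qquad \text{(II.4)}$$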
This way, operators with vanishing interference with the SM amplitude can be treated separately, and we gain complete analytical control over the fit. The ellipsis in Eq. (II.4) denotes higher-order terms in $\{C_i\}$. Comparing Eqs. (II.3) and (II.4), one would expect a quadratic polynomial to capture the full dependence on $\{C_i\}$. However, this simple relation no longer holds for observables such as asymmetries, or distributions normalised to the total cross section. A ratio of quadratic functions of $\{C_i\}$ is not itself a polynomial. To capture the dependence on the coefficients as accurately as possible, we therefore use a fourth-order polynomial for $f_b$.†
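Since $f_b$ is linear in its expansion coefficients, the polynomial response function can be obtained by an ordinary linear least-squares fit. The sketch below is an illustration, not the implementation of [47]. It builds all monomials in the $C_i$ up to total degree four and solves for their coefficients with NumPy's `lstsq`. `monomials` and `fit_response` are hypothetical names.

```python
import itertools
import numpy as np

def monomials(c, order):
    """Design matrix of all monomials in the C_i up to total degree `order`."""
    c = np.atleast_2d(np.asarray(c, dtype=float))
    n_ops = c.shape[1]
    cols = [np.ones(c.shape[0])]  # constant term (SM value of the bin)
    for deg in range(1, order + 1):
        for idx in itertools.combinations_with_replacement(range(n_ops), deg):
            cols.append(np.prod(c[:, list(idx)], axis=1))
    return np.column_stack(cols)

def fit_response(c_points, bin_values, order=4):
    """Least-squares polynomial f_b({C_i}) for one bin from sampled predictions."""
    design = monomials(c_points, order)
    beta, *_ = np.linalg.lstsq(design, np.asarray(bin_values, dtype=float), rcond=None)

    def f_b(c):
        return monomials(c, order) @ beta

    return f_b
```

For six operators at fourth order there are 210 monomials, so each bin needs at least that many sampled points, and in practice comfortably more.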
Once $f_b$ is constructed for each bin in the distribution, all that remains is to define a goodness-of-fit function between theory and data. Minimising this function yields exclusion contours for $\{C_i\}$.

† We have checked that our fit is numerically stable with respect to higher-order terms in the response function; the fourth-order polynomial offers the best balance between fit coverage and computational efficiency.
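The goodness-of-fit function is not spelled out at this point. We assume a $\chi^2$ with uncorrelated bins, in which experimental and theory uncertainties are added in quadrature, consistent with treating them as uncorrelated. Under that assumption, a one-at-a-time 95% interval can be read off a scan as the region where $\Delta\chi^2 < 3.84$ (one degree of freedom). The function names are hypothetical.

```python
import numpy as np

def chi2(pred, data, sigma_exp, sigma_th):
    """Uncorrelated chi^2 with experimental and theory errors added in quadrature."""
    pred, data = np.asarray(pred, float), np.asarray(data, float)
    var = np.asarray(sigma_exp, float) ** 2 + np.asarray(sigma_th, float) ** 2
    return float(np.sum((pred - data) ** 2 / var))

def one_at_a_time_interval(response, data, sigma_exp, sigma_th, grid, delta=3.84):
    """95% CL interval for a single coefficient (all others fixed at zero).

    response(c) returns the per-bin predictions for coefficient value c.
    """
    grid = np.asarray(grid, dtype=float)
    values = np.array([chi2(response(c), data, sigma_exp, sigma_th) for c in grid])
    allowed = grid[values - values.min() < delta]
    return float(allowed.min()), float(allowed.max())
```

As an example, take a toy response (10 + 2c, 5 + c) with data (10, 5) and unit experimental errors. This gives $\chi^2 = 5c^2$ and hence $|c| < \sqrt{3.84/5} \approx 0.876$.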
FIG. 2: Individual 95% bounds on the operators considered here from the boosted (fat jet) analysis and the resolved analysis, and the combined constraint from both, assuming 20% systematics and 30 fb$^{-1}$ of data. We also show existing constraints from unfolded 8 TeV $p_T$ distributions published in [48] and [49], showing the sizeable improvement even for a modest luminosity gain.

III. IMPROVING THE TOP EFT FIT AT THE LHC
A. The impact of high $p_T$ top final states

As noted in the introduction, the bounds obtained on top quark operators from early LHC data are rather weak. In principle, differential distributions provide much more sensitivity to higher-dimensional operators than inclusive rates, because they can isolate the regions of phase space where the operators' effects are largest. Typically, however, the differential measurements used in the fit have been based on standard top reconstruction techniques. These provide good coverage of the low-$p_T$ 'threshold' region, but they suffer from large statistical and systematic uncertainties in the tails of distributions, which is precisely the region of phase space we aim to isolate.
Moreover, the measurements used were typically unfolded. That is, the final-state objects were corrected for detector effects, and the measured 'fiducial' cross section was extrapolated to the full phase space, without cuts. This includes the treatment of reducible as well as irreducible backgrounds, which in the following we implicitly count as part of the experimental systematic uncertainties. Unfolded distributions substantially ease the workflow of our fit. We can compare them directly to parton-level quantities without showering, hadronisation and detector simulation at each point in the parameter space. However, the extrapolation from the fiducial to the full phase space relies on Monte Carlo simulations and therefore necessarily biases the unfolded distributions towards SM-like shapes. It also introduces additional correlations between neighbouring bins, broadening the $\chi^2$.
Top pair production is a $2 \to 2$ process, so the relevant observables spanning the partonic phase space are the scattering angle and the partonic centre-of-mass energy. All other observables are functions of these parameters. Among them, the top quark transverse momentum is the most crucial in determining the quality and efficiency of the boosted top tagging approach [50][51][52][53][54][55][56] that we employ in the following. The advantage of selecting high-$p_T$ objects is thus twofold [57]. Firstly, sophisticated reconstruction techniques for boosted objects move us to the region of phase space where the effects of heavy new degrees of freedom are most pronounced, as illustrated in Fig. 1. Secondly, jet substructure techniques require, by definition, a hadron-level analysis. We thus avoid the model dependence that affects fits of parton-level distributions to unfolded measurements.
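The $2 \to 2$ statement can be made concrete. In the partonic centre-of-mass frame each top carries momentum $\frac{\sqrt{\hat s}}{2}\beta$ with $\beta = \sqrt{1 - 4m_t^2/\hat s}$, so $p_T = \frac{\sqrt{\hat s}}{2}\beta\sin\theta$. The sketch below encodes this relation and its inverse at $\theta = 90^\circ$. The default top mass is an assumed value, and the function names are illustrative.

```python
import math

def top_pt(sqrt_shat, cos_theta, m_top=172.5):
    """Top pT (GeV) in 2->2 ttbar production from sqrt(s-hat) (GeV) and the CM angle."""
    if sqrt_shat < 2.0 * m_top:
        raise ValueError("below ttbar threshold")
    beta = math.sqrt(1.0 - 4.0 * m_top**2 / sqrt_shat**2)  # top velocity in the CM frame
    sin_theta = math.sqrt(max(1.0 - cos_theta**2, 0.0))
    return 0.5 * sqrt_shat * beta * sin_theta

def min_sqrt_shat_for_pt(pt_min, m_top=172.5):
    """Smallest sqrt(s-hat) that can yield pT >= pt_min (reached at theta = 90 degrees)."""
    return 2.0 * math.sqrt(pt_min**2 + m_top**2)
```

For example, `min_sqrt_shat_for_pt(200.0)` gives about 528 GeV. A $p_T \ge 200$ GeV selection therefore only receives events with $m_{t\bar t} \gtrsim 530$ GeV, assuming $m_t = 172.5$ GeV.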
FIG. 3: Fractional improvement on the 95% confidence intervals for the operators considered here, with various combinations of luminosity and experimental systematics considered. We take the width of the 95% confidence limit obtained from 20% systematic uncertainty and 30 fb$^{-1}$ of data as a baseline (green bar) and normalise to this, i.e. we express constraints as a fractional improvement on this benchmark. The purple and blue bars represent, respectively, 300 fb$^{-1}$ and 3 ab$^{-1}$ of data, also at 20% systematics, while the yellow, orange and red bars are the analogous data sample sizes for 10% systematics.

The sting in the tail for analyses selecting high-$p_T$ objects is, of course, low rates. At 13 TeV, for instance, we find that 90% of the cross section comes from the resolved region $p_T^t < 200$ GeV. We thus aim to quantify at what stage in the LHC programme, if at all, the increased sensitivity in this region can compensate for the relatively poor statistics. Our analysis setup, as implemented in Rivet [58], is as follows (summarised in Tab. II). We restrict ourselves to the semileptonic top pair decay channel. We first require a single charged lepton with $p_T > 30$ GeV and reconstruct the missing transverse momentum vector $E_T^{\rm miss}$, whose magnitude we require to exceed 30 GeV. The leptonic $W$ boson is reconstructed from these by assuming it was produced on-shell. Jets are then clustered with the anti-$k_T$ algorithm [59] using FastJet [60] in two separate groups with $R = 0.4$ and $R = 1.2$, respectively, and jets which overlap with the charged lepton are removed. The $R = 1.2$ fat jets are required to be within $|\eta| < 2$. The $R = 0.4$ small jets are b-tagged within the same $\eta$ range with an efficiency of 70% and a fake rate of 1% [61]. The boosted selection applies if there is at least one fat jet and at least one b-tagged small jet that does not overlap with the leading fat jet. In that case we perform a boosted top tag of the leading fat jet using the HEPTopTagger [50,51,62]. We then reconstruct the leptonic top candidate using the leading, non-overlapping b-tagged small jet and the reconstructed leptonic $W$.
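The on-shell $W$ condition fixes the unmeasured longitudinal neutrino momentum up to a two-fold ambiguity. The sketch below solves $m_W^2 = (p_\ell + p_\nu)^2$ for a massless lepton and neutrino, with the neutrino transverse momentum set equal to $E_T^{\rm miss}$. The text does not say how the ambiguity or complex solutions are resolved. Choosing the smaller $|p_z|$ and taking the real part are common conventions that we assume here. The $W$ mass value is also an assumption.

```python
import math

M_W = 80.4  # GeV; assumed W mass

def neutrino_pz(lep, met_x, met_y, m_w=M_W):
    """Longitudinal neutrino momentum from the on-shell W constraint.

    lep = (E, px, py, pz) of a massless charged lepton; (met_x, met_y) is the
    neutrino transverse momentum. Returns the solution with the smaller |pz|;
    for a negative discriminant the real part is used (assumed convention).
    """
    e_l, px_l, py_l, pz_l = lep
    pt_l2 = px_l**2 + py_l**2
    mu = 0.5 * m_w**2 + px_l * met_x + py_l * met_y
    # Quadratic in pz: pt_l2 * pz^2 - 2 mu pz_l pz + (E_l^2 MET^2 - mu^2) = 0
    disc = mu**2 * pz_l**2 - pt_l2 * (e_l**2 * (met_x**2 + met_y**2) - mu**2)
    if disc < 0:
        return mu * pz_l / pt_l2
    root = math.sqrt(disc)
    solutions = ((mu * pz_l + root) / pt_l2, (mu * pz_l - root) / pt_l2)
    return min(solutions, key=abs)
```

Adding the resulting neutrino four-vector to the lepton reproduces the assumed $W$ mass whenever the discriminant is non-negative.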
If no fat jet fulfilling all the criteria exists, we instead require at least two b-tagged small jets and two light small jets. If these exist, we perform a resolved analysis. We reconstruct the hadronic $W$ boson from the light small-jet pair that best reconstructs the $W$ mass. We then reconstruct the top candidates by similarly finding the pairings of reconstructed $W$ boson and b-tagged small jet that best reconstruct the top mass.
Finally, regardless of the approach used, we require both top candidates to satisfy $|m_{\rm cand} - m_{\rm top}| < 40$ GeV. Events fulfilling this requirement pass the analysis.
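The resolved reconstruction and the final mass window can be sketched as below, with four-vectors as (E, px, py, pz) tuples. The text does not specify the figure of merit used to pair the two $W$ candidates with b-jets. Minimising the summed deviation from the top mass is our assumption, as are the reference masses and the function names.

```python
import itertools
import math

M_W, M_TOP = 80.4, 172.5  # GeV; assumed reference masses

def add4(*vecs):
    """Sum four-vectors given as (E, px, py, pz) tuples."""
    return tuple(sum(v[i] for v in vecs) for i in range(4))

def mass(v):
    """Invariant mass of a four-vector (clipped at zero)."""
    e, px, py, pz = v
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def resolved_tops(light_jets, b_jets, w_lep, window=40.0):
    """Return (hadronic top, leptonic top), or None if the event fails the mass window."""
    # Hadronic W: the light-jet pair whose invariant mass is closest to M_W
    j1, j2 = min(itertools.combinations(light_jets, 2),
                 key=lambda pair: abs(mass(add4(*pair)) - M_W))
    w_had = add4(j1, j2)

    # Assign two b-jets to the two W candidates (assumed figure of merit)
    def cost(bs):
        return (abs(mass(add4(w_had, bs[0])) - M_TOP)
                + abs(mass(add4(w_lep, bs[1])) - M_TOP))

    b_had, b_lep = min(itertools.permutations(b_jets, 2), key=cost)
    tops = (add4(w_had, b_had), add4(w_lep, b_lep))
    # Requirement from the text: |m_cand - m_top| < 40 GeV for both candidates
    if all(abs(mass(t) - M_TOP) < window for t in tops):
        return tops
    return None
```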

Impact of experimental precision
Our first benchmark uses a sample size of 30 fb$^{-1}$ with a flat 20% systematic uncertainty on both selections, motivated by typical estimates from existing experimental analyses by ATLAS [63] and CMS [64]. The resulting one-dimensional 95% confidence intervals on the operators considered here are presented in Fig. 2. All the bounds presented here are 'one-at-a-time', i.e. we do not marginalise over the full operator set. Our purpose is to highlight the relative contributions to the allowed confidence intervals, rather than to present a global operator analysis.
As a general rule, the increased sensitivity to the Wilson coefficients offered by the boosted selection is overpowered by the large experimental systematic uncertainties in this region. The combined limits are therefore dominated by the resolved top quarks. The exception to this rule is the coefficient $C_G$ of the triple-gluon operator

$$\mathcal{O}_G = f_{ABC}\, G_\mu^{A\nu} G_\nu^{B\lambda} G_\lambda^{C\mu}.$$

Expanding out the field strength tensors leads to vertices with up to six powers of momentum in the numerator, more than enough to overcome the naïve $1/\hat s^2$ unitarity suppression. Large momentum transfer final states thus give stronger bounds on this coefficient, even with comparatively fewer events.
With these constraints as a baseline, it is natural to ask by how much they can be improved when the experimental precision is refined. The constraints are presented in Fig. 3 for different combinations of systematic and statistical uncertainties. We take the width of the 95% confidence interval in Fig. 2 as our normalisation (the green bars). For each operator, we express the fractional improvement on the limits that can be achieved relative to this baseline. The right bars (green, purple, blue) represent 20% systematic uncertainties with, respectively, 30 fb$^{-1}$, 300 fb$^{-1}$ and 3 ab$^{-1}$ of data. The left bars (yellow, orange, red) represent the same respective data sample sizes, but with 10% systematic uncertainties.
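The normalisation used in Fig. 3 amounts to a simple ratio of interval widths. The helper below is a hypothetical illustration. It takes 95% CL intervals as (low, high) pairs and returns the fraction by which an interval is narrower than the 20%-systematics, 30 fb$^{-1}$ baseline.

```python
def fractional_improvement(interval, baseline):
    """1 - width/baseline_width for 95% CL intervals given as (low, high)."""
    width = interval[1] - interval[0]
    base_width = baseline[1] - baseline[0]
    return 1.0 - width / base_width
```

An interval of (-0.3, 0.3) compared with a baseline of (-0.5, 0.5) therefore counts as a 40% improvement.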
FIG. 4: Left: 68%, 95% and 99% confidence intervals for $C_G$ and $C_{uG}^{33}$. The lines are obtained using experimental uncertainties (20% systematics and 30 fb$^{-1}$ of data) along with theoretical uncertainties, the filled contours using only experimental uncertainties. Right: the same plot, but using 10% systematics and 3 ab$^{-1}$ of data, showing the much stronger impact of theory uncertainties in this region.

Beginning with the resolved selection, we find that the limits on the coefficient $C_G$ can be improved by 40% by going from 30 fb$^{-1}$ to 300 fb$^{-1}$, and by a further 20% when the full projected LHC data sample is collected. Systematic uncertainties have a more modest effect on this operator. At 3 ab$^{-1}$, reducing the systematic uncertainty from 20% to 10% improves the limit on $C_G$ only marginally. This reflects that $C_G$ mostly impacts the high-$p_T$ tail. Its limit can therefore only be improved in the threshold region by collecting enough data to overcome the lack of sensitivity there. 8 TeV measurements already constrain the relevant phase space region efficiently, and the expected improvement at 13 TeV is only mild (see below).
For the chromomagnetic dipole operator $\mathcal{O}_{uG}^{33}$, improving the experimental systematics plays a much larger role. Consider reducing systematics from 20% to 10% together with increasing the dataset from 30 fb$^{-1}$ to 300 fb$^{-1}$. This gives stronger limits than keeping current systematics and collecting the full 3 ab$^{-1}$ of data. Similar conclusions apply, to varying degrees, to the four-quark operators: reducing systematic uncertainties can provide improvements comparable to collecting much larger data samples.
For the boosted selection, the situation is quite different. At 30 fb$^{-1}$, reducing systematic uncertainties from 20% to 10% has virtually no effect on the limits for any of the operators we consider. This simply indicates that statistical uncertainties dominate the boosted region at 30 fb$^{-1}$. For $C_G$, some improvement can be made at 300 fb$^{-1}$ if systematics are reduced. Beyond that point, however, systematic uncertainties saturate the sensitivity to $C_G$, i.e. collecting more data brings no further improvement. For $C_{uG}^{33}$, a modest improvement can be made both by reducing systematics to 10% and by increasing the dataset to 300 fb$^{-1}$. Going beyond this, however, the improvement is minute. The four-quark operators again follow this trend, although $C_u^2$ shows a much larger improvement when going from 300 fb$^{-1}$ to 3 ab$^{-1}$.
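The stat-versus-syst behaviour described above follows from how the per-bin relative uncertainty scales with the expected event count $N = \sigma \cdot \epsilon \cdot L$. The toy below is not the paper's error model. It assumes Poisson statistics and a flat fractional systematic combined in quadrature, and all cross sections and efficiencies are hypothetical.

```python
import math

def relative_uncertainty(xsec_fb, lumi_fb, efficiency, syst):
    """Toy per-bin relative error: Poisson 1/sqrt(N) and a flat systematic in quadrature."""
    n_events = xsec_fb * lumi_fb * efficiency
    return math.sqrt(1.0 / n_events + syst**2)
```

Take a hypothetical boosted tail bin with $\sigma\epsilon = 0.1$ fb. At 30 fb$^{-1}$ only 3 events are expected, giving a relative error of about 61% regardless of whether the systematic is 20% or 10%. At 3 ab$^{-1}$ the error drops to about 21% (20% systematics) or 12% (10% systematics), so only at that stage does systematic control matter.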

The role of theoretical uncertainties
The other key factor in the strength of our constraints is the uncertainty arising from theoretical modelling. The scale and PDF variation procedure outlined in Sec. II typically leads to uncertainties in the 10-15% range. Fully differential K-factors for top pair production at NNLO QCD (i.e. to order $\mathcal{O}(\alpha_s^4)$) have become available and substantially reduce the scale uncertainties. The numbers quoted in Refs. [4,65] are for the Tevatron and the 8 TeV LHC. They are available only for the low to intermediate $p_T^t$ range ($p_T^t < 400$ GeV). Updated results for 13 TeV have become available only recently [66]. It is worthwhile to ask what impact such an improvement could have on the constraints.
We put this question on a firm footing in Fig. 4, which shows the 2D exclusion contours for the coefficients $C_G$ and $C_{uG}^{33}$ obtained from combining the boosted and resolved limits. The luminosity and experimental systematics are held fixed. The contours are computed once including our NLO theory uncertainty and once with no theory uncertainty at all. For 30 fb$^{-1}$ the improvement is limited. This indicates that, at this stage in the LHC programme, the main goal should be to first improve the experimental reconstruction of the top quark pair final state. At 3 ab$^{-1}$, however, the improvement is substantial. Improving the theoretical modelling of this process will therefore also become necessary if the LHC is to augment its kinematic reach for non-resonant new physics.
In addition to SM theoretical uncertainties, there are uncertainties related to missing higher-order terms in the EFT expansion. Uncertainties due to loop corrections and renormalisation-group flow of the operators $\mathcal{O}_i^{(6)}$ are important for measurements at LEP-level precision [67,68], where electroweak effects are also resolved. At the LHC, however, we find them to be numerically insignificant compared to the sources of uncertainty that we study in detail here. There is also the possibility of large effects due to dimension-8 operators, particularly owing to additional derivatives in the EFT expansion of Eq. (I.1). The interference effects of omitted dimension-8 operators are formally of the same order as the retained quadratic terms in the dimension-6 operators. We therefore emphasise that the numerical constraints presented here should be treated with caution. The only way to be certain that omitting these terms is justified is to compute the interference of the relevant dimension-8 operators for a given process and demonstrate it to be small. This has been shown for the $gg \to t\bar t$ subprocess [69,70]. However, owing to the large number of dimension-8 operators contributing to $q\bar q \to t\bar t$, it has not been studied for that process. We leave a full computation of these effects as a future direction of study.
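Schematically, add dimension-eight terms $\sum_k C_k^{(8)}\mathcal{O}_k^{(8)}/\Lambda^4$ to Eq. (I.1). Expanding a cross section in powers of $1/\Lambda$ then makes the power counting behind the caution above explicit. The symbols $\sigma_i^{(6)}$, $\sigma_{ij}^{(6\times 6)}$ and $\sigma_k^{(8)}$ are introduced here purely for illustration:

$$\sigma = \sigma_{\rm SM} + \sum_i \frac{C_i^{(6)}}{\Lambda^2}\,\sigma_i^{(6)} + \frac{1}{\Lambda^4}\Big[\sum_{i,j} C_i^{(6)} C_j^{(6)}\,\sigma_{ij}^{(6\times 6)} + \sum_k C_k^{(8)}\,\sigma_k^{(8)}\Big] + \mathcal{O}(\Lambda^{-6})$$

Here $\sigma_i^{(6)}$ and $\sigma_k^{(8)}$ are interference terms with the SM amplitude. $\sigma_{ij}^{(6\times 6)}$ collects all contributions quadratic in the dimension-six coefficients, namely squared amplitudes and double insertions. Both bracketed terms enter at $\mathcal{O}(\Lambda^{-4})$, so keeping the first while dropping the second is not parametrically justified.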

C. Interpreting the results
The whole purpose of the EFT approach is to serve as a bridge between the Standard Model and heavy degrees of freedom residing at some unknown mass scale $M_*$. Connecting the EFT to this scale, however, necessarily involves making assumptions about the couplings of this new physics. We can nevertheless relate the constraints presented here to such a scale by making general assumptions, such as perturbativity of the underlying new physics.
Consider, for example, the simple case where the perturbative UV physics is characterised entirely by a single coupling $g_*$ and a unique mass scale $M_*$. Such a scenario could arise from integrating out a heavy, narrow resonance. In this case we have the simple tree-level matching condition

$$\frac{C_i}{\Lambda^2} \simeq \frac{g_*^2}{M_*^2}. \qquad \text{(III.5)}$$

Constraints on $C_i$ then map onto allowed regions in the $g_*$-$M_*$ plane. In Fig. 5 we sketch these regions for illustrative values of $C_i$. For the EFT description of a given mass region to be valid, our measurement must not resolve it. We therefore impose a hard cut at $\sqrt{\hat s} = 2$ TeV, the maximum $t\bar t$ invariant mass probed in our SM pseudodata. We also impose a generic perturbativity restriction $g_* \lesssim 4\pi$. This ensures that our EFT expansion is well behaved and that higher-dimensional operators do not affect the power counting.
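The mapping onto the $g_*$-$M_*$ plane can be sketched by assuming the single-coupling matching $C_i/\Lambda^2 = g_*^2/M_*^2$ and solving for $M_*$. The normalisation $\Lambda = 1$ TeV is our assumption. The 2 TeV cut and the $4\pi$ bound follow the text, and the function names are hypothetical.

```python
import math

def m_star(c_i, g_star, lam_tev=1.0):
    """Mass scale M_* (TeV) implied by C_i / Lambda^2 = g_*^2 / M_*^2 (Lambda assumed 1 TeV)."""
    return g_star * lam_tev / math.sqrt(c_i)

def in_valid_region(c_i, g_star, lam_tev=1.0, m_cut_tev=2.0):
    """EFT consistency: M_* above the 2 TeV resolution cut and g_* below 4 pi."""
    return m_star(c_i, g_star, lam_tev) > m_cut_tev and g_star < 4.0 * math.pi
```

With this normalisation, $C_i = 0.01$ and $g_* = 1$ correspond to $M_* = 10$ TeV. By contrast, $C_i = 0.5$ with $g_* = 1$ gives $M_* \approx 1.4$ TeV, which falls inside the resolved region.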
We see that for large Wilson coefficients, $C_i \gtrsim 0.5$, only a very small window of parameter space can be constrained. Moreover, the weak limits push the underlying coupling to such large values that loop corrections are likely to invalidate the simple relation of Eq. (III.5), making these limits hard to trust. At 3 ab$^{-1}$, however, the projected constraints are typically $C_i \lesssim 0.01$. Even for moderate values of the coupling $g_*$, our constraints can therefore indirectly probe mass scales much higher than the kinematic reach of the LHC.

IV. SUMMARY AND CONCLUSIONS
The special role of the top quark in BSM scenarios highlights the importance of searches for new interactions in the top sector. Taking the lack of evidence for resonant new physics in the top sector at face value [29][30][31], we can assume that new interactions are suppressed by either weak couplings or large new physics scales. In both cases we can analyse the presence of new physics using effective field theory techniques. A crucial question that remains after the LHC run 1 results is to what extent a global fit from direct search results will improve with higher statistics and larger kinematic coverage. We address this question by focusing on the most abundant top physics-related channel, $pp \to t\bar t$, which probes a relevant subset of top quark effective interactions. In particular, we compare fully-resolved and boosted (jet-substructure) techniques, which are affected by different experimental systematic uncertainties. At low $p_T$, sensitivity to new physics is a trade-off between small statistical uncertainty and systematic control, with small new physics-induced deviations from the SM expectation; this regime is tackled in fully-resolved analyses. At large $p_T$ the situation is qualitatively opposite. For the typical parameter choices where top tagging becomes relevant, and including the relevant efficiencies, we can draw the following conclusions:

• Boosted top kinematics provide a sensitive probe of new interactions in $t\bar t$ production mediated by modified trilinear gluon couplings. In particular, this observation shows how differential distributions help in breaking degenerate directions in a global fit by capturing sensitivity in phenomenologically complementary phase space regions.
• The sensitivity to all other operators detailed in Tab. I is quantitatively identical for boosted and fully-resolved analyses for our choice of $p_T^{\rm boost} \ge 200$ GeV. Raising the boosted selection threshold to higher $p_T$, where top tagging becomes more efficient, will quickly move sensitivity to new physics effects to the fully resolved part of the selection. For the typical run 2 luminosity expectation, the boosted selection is saturated by large statistical uncertainties. This makes systematic improvements of the boosted selection less important than those of the fully resolved selection. Improved experimental systematics in the fully resolved selection provide the avenue to set the most stringent constraints. Similar observations have been made for boosted Higgs final states [71]. They are also supported by the fact that the overflow bins in run 1 analyses provide little statistical pull [19].
• Theoretical uncertainties inherent to our approach are not the limiting factor of the described analysis in the foreseeable future, but will become relevant when statistical uncertainties become negligible at very large integrated luminosity.
Boosted analyses are highly efficient tools in searches for resonant new physics [29][30][31][72]. Our results show that similar conclusions do not hold for non-resonant new physics effects when the degrees of freedom in question no longer fall inside the kinematic coverage of the boosted selection. Under these circumstances, medium-$p_T$ configurations are the driving force in setting limits on operators whose effects are dominated by interference with the SM amplitude in the top sector. These configurations maximise the new physics deviation relative to statistical, experimental and theoretical uncertainties. This also implies that giving up the boosted analysis in favour of a fully resolved analysis extending beyond $p_T^t \ge 200$ GeV will not improve our results significantly. The relevant phase space region can be accessed with fully resolved techniques, with a large potential for improvement from the experimental systematics point of view.