Top quark mass studies with $t\bar{t}j$ at the LHC

A precise measurement of the top quark mass, a fundamental parameter of the Standard Model, is among the most important goals of top quark studies at the Large Hadron Collider. Apart from the standard methods, numerous new observables and reconstruction techniques are employed to improve the overall precision and to provide different sensitivities to various systematic uncertainties. Recently, the normalised inverse invariant mass distribution of the $t\bar{t}$ system and the leading extra jet not coming from the top quark decays, denoted as ${\cal R}(m_t^{pole},\rho_s)$, has been proposed for the $pp \to t\bar{t}j$ production process. In this paper, a thorough study of different theoretical predictions for this observable is carried out, this time with top quark decays included. We focus on fixed order NLO QCD calculations for the di-lepton top quark decay channel at the LHC with $\sqrt{s}=13$ TeV. First, the impact on the extraction of $m_t$ is investigated and afterwards the associated uncertainties are quantified. In one approach we include all interferences, off-shell effects and non-resonant backgrounds. This is contrasted with a different approach with top quark decays in the narrow width approximation. In the latter case, two variants are employed: NLO QCD corrections to the $pp\to t\bar{t}j$ production process with leading order decays, and the more sophisticated case with QCD corrections and jet radiation present also in top quark decays. The top quark mass sensitivity of ${\cal R}(m_t^{pole},\rho_s)$ is investigated and compared to that of other observables: the invariant mass of the top anti-top pair, the minimal invariant mass of the $b$-jet and a charged lepton, as well as the total transverse momentum of the $t\bar{t}j$ system.

quark decays and which allow for a consistent treatment of top quark resonances have been introduced in Ref. [5][6][7]. Apart from parton shower effects, non-perturbative physics must also be incorporated into $m_t$ measurements. Here, choices must be made, for example, for the proton parton distribution functions (PDFs), the hadronisation model, the underlying event, the modelling of colour reconnection and the description of additional interactions accompanying the hard scattering process, the so-called pile-up. Even though the definition and implementation of the top quark mass in NLO+PS MC tools is based on the on-shell renormalisation scheme of $m_t$ at one loop, and is identical to what is used in parton-level calculations, the above-mentioned effects play an important role as they enter the relation between $m_t$ and physical observables. The top quark mass can also be extracted indirectly from the inclusive total cross section for top quark pair production. However, even the total cross section, $\sigma_{t\bar{t}}$, is not free from uncertainties due to the above-mentioned non-perturbative effects. Because the fiducial cross section is extrapolated to the full phase space, the measured $\sigma_{t\bar{t}}$ depends on hadronisation effects, as the extrapolation relies on the MC modelling of these phenomena. The dependence on non-perturbative effects is smaller than for exclusive observables, but, unfortunately, top quark mass determinations based on the mass dependence of the inclusive $t\bar{t}$ production cross section are less precise.
Since the discovery of the top quark, direct measurements of $t\bar{t}$ production have been made at five different centre-of-mass energies, two at the Tevatron and three at the LHC. The top quark mass has been measured in various decay channels, i.e. the $\ell$+jets, the di-lepton and the all-jets channel, by all four experiments: CDF, D0, ATLAS and CMS. A combination of Tevatron and LHC measurements was performed in 2014. The world's best measurements by the ATLAS and CMS collaborations, with total uncertainties of 0.70 GeV (ATLAS) and 0.49 GeV (CMS), are in good agreement with the 2014 world average. These results can be further compared to $m_t$ extracted from the inclusive top quark pair production cross section $\sigma(t\bar{t})$ at $\sqrt{s} = 7$, 8 and 13 TeV. Using the expected dependence of the cross section on the top quark mass and comparing it to theoretical predictions at next-to-next-to-leading order including the resummation of next-to-next-to-leading logarithmic soft gluon effects (NNLO+NNLL) [11], values of $m_t$ have been determined. Predictions for $t\bar{t}$ production at NNLO+NNLL also employ the on-shell scheme for mass renormalisation, since this scheme is commonly used for perturbative higher-order calculations in top quark physics. However, the top quark pole mass, $m_t^{pole}$, has an intrinsic uncertainty of its own, of the order of $\mathcal{O}(\Lambda_{QCD})$. For example, the intrinsic uncertainty on the $m_t^{pole}$ definition due to renormalons has recently been estimated to be of the order of $\mathcal{O}(100)$ MeV [15,16]. On the experimental side, the main systematic uncertainties in top quark mass measurements typically originate from the understanding of the jet energy scale for light-quark and $b$-quark initiated jets and from the modelling of the performance of the $b$-tagging algorithms.
Thus, various alternative methods to extract $m_t$ have been proposed to give further insight by providing different sensitivities to various systematic uncertainties. Such methods, which can also help to improve the overall precision, comprise either new observables or new reconstruction techniques, see e.g. Ref. [17][18][19][20][21][22][23] for $pp \to t\bar{t}$ production. Among others, a novel method to determine $m_t$ in the $pp \to t\bar{t}j$ production process has been proposed in Ref. [24,25] for on-shell top quarks. It uses the normalised differential cross section as a function of the inverse invariant mass of the $t\bar{t}$ system and the leading extra jet not coming from the top quark decays. To be more precise, it is defined according to

$$ {\cal R}(m_t^{pole},\rho_s) = \frac{1}{\sigma_{t\bar{t}j}} \frac{d\sigma_{t\bar{t}j}}{d\rho_s}(m_t^{pole},\rho_s), \qquad \rho_s = \frac{2 m_0}{M_{t\bar{t}j}}, $$

where $m_0$ is a parameter of the order of the top quark mass and $M_{t\bar{t}j}$ is the invariant mass of the $t\bar{t}j$ system. In Ref. [24,25] NLO QCD corrections to on-shell $t\bar{t}j$ production are matched with parton shower programs that are responsible for top quark decays, shower effects and non-perturbative physics. Since additional radiation depends on the mass of the top quark, the $\rho_s$ distribution is expected to constrain $m_t$ differently than, for example, the invariant mass of the top anti-top pair alone. As a consequence, it should be studied in the context of a precise determination of the top quark mass. Indeed, the method has already been applied by the ATLAS and CMS collaborations [26,27]. The measured differential cross sections have been compared to the predicted cross sections for each bin of the $\rho_s$ observable in the full phase space using different top quark masses. In the end, the most probable top quark mass has been extracted, yielding

$$ m_t = 173.70^{+2.28}_{-2.11}~\text{GeV (ATLAS at 7 TeV)}, \qquad m_t = 169.90^{+4.52}_{-3.66}~\text{GeV (CMS at 8 TeV)}. \tag{1.5} $$

In this paper we investigate the sensitivity of the $\rho_s$ observable even further by including NLO QCD corrections also in top quark decays.
The main goal is to study the impact of top quark decay modelling on the extraction of the top quark mass in the di-lepton top quark decay channel. To this end we concentrate on fixed order NLO QCD predictions, which allow us to define the top quark pole mass rigorously as the input parameter. We compare three distinct theoretical predictions for the $pp \to t\bar{t}j$ process at NLO in QCD. First, a complete description of the $e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j$ final state, as explained in Ref. [28,29], is employed. It takes into account all possible contributions, i.e. double (top quark), single (top quark) and non (top quark) resonant contributions together with their interferences and off-shell effects. Off-shell effects and non-resonant contributions due to the $W$ gauge boson are also properly taken into account. From the quantum field theory point of view this is the most comprehensive description of the $t\bar{t}j$ production process at NLO QCD, because all effects that are perturbatively calculable at $\mathcal{O}(\alpha_s^4\alpha^4)$ are accounted for. We dub this approach the Full approach. As a second case, we consider the narrow width approximation (NWA) for top quarks and $W$ gauge bosons [30], with the following decay chains: $pp \to t\bar{t}j \to W^+W^-b\bar{b}j \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j$ and $pp \to t\bar{t} \to W^+W^-b\bar{b}j \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j$. Thus, NLO QCD corrections to top quark pair production with a hard jet are incorporated together with QCD radiative corrections to top quark decays, including also the possibility that the hard jet is emitted in the decay stage. Even though $t$ and $W$ decays are treated in the NWA, NLO spin correlations are retained throughout the entire decay chain. This approach is dubbed NWA. Finally, mostly for comparison, we employ calculations from Ref. [31], where NLO QCD corrections to on-shell $t\bar{t}j$ production are provided, but top quark decays are included only at leading order (LO) in perturbative QCD.
Thus, the following decay chain is investigated: $pp \to t\bar{t}j \to W^+W^-b\bar{b}j \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j$, hence spin correlations are only retained at LO. This third approach is dubbed NWA$_{Prod.}$. At the end of the paper we compare the normalised $\rho_s$ differential distribution to other observables that are commonly used in top quark mass measurements, namely the invariant mass of the $t\bar{t}$ system, $M_{t\bar{t}}$, and the (minimal) invariant mass of the charged lepton and the $b$-jet, $M_{b\ell}$. We also present results for the total transverse momentum of the $t\bar{t}j$ system, $H_T$, owing to its sensitivity to $m_t$, which is similar to that observed for $\rho_s$. The paper is organised as follows. The general setup of our analysis is described in Section 2. In Section 3 we describe the main observable and discuss the details of the methods used in the top quark mass extraction. In Section 4 we present our results on the $m_t$ extraction and assess the theoretical uncertainties stemming from the scale dependence and from the various assumptions that enter into the parameterisation of the PDFs. For the latter we follow the PDF4LHC recommendations for LHC Run II [32] by employing the CT14, MMHT14 and NNPDF3 PDF sets. Results for a slightly modified version of $\rho_s$ are discussed in Section 5. In Section 6 a comparison between $\rho_s$ and other observables that are also sensitive to $m_t$ is performed. Following our conclusions in Section 7, we include an appendix that presents a comparison between Full, NWA and NWA$_{Prod.}$, obtained using a fixed scale choice, for several observables.

Setup of the Analysis
Numerical results with complete top quark and $W$ gauge boson off-shell effects and non-resonant backgrounds included, which are the basis for our top quark mass extraction studies, are obtained with the help of the Helac-Nlo Monte Carlo program [33], which comprises Helac-Dipoles [34,35] and Helac-1Loop [36]. Theoretical aspects related to the complex mass scheme used in our calculations are explained in detail in Ref. [37]. A comprehensive description of NLO calculations in the NWA for top quarks is given in Ref. [38]. We therefore do not repeat these details here, but rather refer interested readers to our earlier publications. We consider the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ process at $\mathcal{O}(\alpha_s^4\alpha^4)$ for the LHC Run II energy of $\sqrt{s} = 13$ TeV. For the $W$ and $Z$ gauge bosons, the NLO QCD corrections to $W \to f_1\bar{f}_2$ and $Z \to f\bar{f}$ have been included in the total decay rates. Further electroweak parameters, such as the electroweak coupling and the weak mixing angle, are computed in the so-called $G_\mu$ scheme with the Fermi constant $G_\mu = 1.16637 \cdot 10^{-5}$ GeV$^{-2}$ through the following formulae:

$$ \alpha_{G_\mu} = \frac{\sqrt{2}}{\pi}\, G_\mu\, m_W^2 \sin^2\theta_W, \qquad \sin^2\theta_W = 1 - \frac{m_W^2}{m_Z^2}. $$

For the top quark, the default mass is $m_t = 173.2$ GeV, and $\Gamma_t^{NLO}$ refers to the top quark width with $W$ gauge boson off-shell effects included, while $\Gamma_{tW}^{NLO}$ refers to the top quark width with an on-shell $W$ gauge boson, as used in the NWA [39,40]. Both values are derived for massless $b$ quarks, since all leptons and $u, d, c, s, b$ partons are considered to be massless. The normalised $\rho_s$ differential distribution and other observables are also evaluated with different top quark masses to be used in the fitting procedure. Generally we use the following five values of $m_t$: 168.2 GeV, 170.7 GeV, 173.2 GeV (the default value), 175.7 GeV and 178.2 GeV. This corresponds to the spread $m_t = 173.2 \pm 5$ GeV in steps of 2.5 GeV. For completeness, the corresponding top quark decay widths are shown in Table 1.
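The $G_\mu$-scheme relations and the template mass grid can be evaluated in a few lines of Python. This is a minimal sketch that assumes the standard on-shell relation $\sin^2\theta_W = 1 - m_W^2/m_Z^2$; the $W$ and $Z$ masses below are typical values chosen for illustration and are not taken from this paper.

```python
import math

G_MU = 1.16637e-5  # Fermi constant in GeV^-2, as quoted in the text


def gmu_scheme(m_w, m_z):
    """Return (alpha, sin^2 theta_W) in the G_mu scheme from on-shell W/Z masses."""
    sin2_w = 1.0 - (m_w / m_z) ** 2  # on-shell weak mixing angle
    alpha = math.sqrt(2.0) / math.pi * G_MU * m_w**2 * sin2_w
    return alpha, sin2_w


# Illustrative inputs in GeV (assumed values, not necessarily those of the paper)
alpha, sin2_w = gmu_scheme(80.385, 91.1876)
print(f"1/alpha = {1.0 / alpha:.2f}, sin^2(theta_W) = {sin2_w:.4f}")

# Template masses: 173.2 +/- 5 GeV in steps of 2.5 GeV
mt_grid = [round(173.2 + 2.5 * k, 1) for k in range(-2, 3)]
print(mt_grid)
```

With these assumed inputs $1/\alpha_{G_\mu}$ comes out at roughly 132, noticeably larger coupling than the Thomson-limit value, which is the usual motivation for the $G_\mu$ scheme at collider energies.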
The value of $\alpha_s(m_t)$ needed for $\Gamma_t^{NLO}$ and $\Gamma_{tW}^{NLO}$ is obtained from $\alpha_s(m_Z) = 0.118$ via LHAPDF [41]. In general, the running of the strong coupling constant $\alpha_s$ with two-loop accuracy is provided by the LHAPDF library, and the number of active flavours is set to $N_F = 5$. We employ the CT14nlo [42], NNPDF30-nlo-as-0118 [43] and MMHT2014nlo68clas118 [44] PDF sets, which we refer to as CT14, NNPDF3 and MMHT14. Suppressed contributions from bottom quarks in the PDFs are not included. All final-state partons with pseudo-rapidity $|\eta| < 5$ are recombined into jets with a separation parameter $R$ in the rapidity-azimuthal-angle plane via the IR-safe anti-$k_T$ jet algorithm [45]. The jet radius is set to $R = 0.5$. When merging particles during the clustering procedure one must specify how to combine the momenta. We use the simplest procedure, currently used by the LHC experiments, and add the four-vectors of the combined partons (the so-called E-scheme). Finally, we require exactly two $b$-jets, at least one light jet, two charged leptons and missing transverse momentum, $p_T^{miss}$. These final states have to fulfil a set of kinematic selection criteria, in which $\ell$ stands for $\mu^-$ and $e^+$, whereas $j$ corresponds to light- and $b$-jets. For the renormalisation and factorisation scales, $\mu_R$ and $\mu_F$, three cases are considered. Specifically, we use a fixed scale $\mu_R = \mu_F = \mu_0 = m_t$ and two dynamical ones, $\mu_R = \mu_F = \mu_0 = E_T/2$ and $\mu_R = \mu_F = \mu_0 = H_T/2$, where $E_T$ denotes the transverse energy of the $t\bar{t}$ system and $H_T$ the total transverse momentum of the $t\bar{t}j$ system. The dynamical scales are evaluated using the momenta after the application of the jet algorithm. Thus, $j_{b_1}$ and $j_{b_2}$ are the $b$-jets and $j_1$ is the light (hard) jet. In the case of two resolved light jets, the jet with the highest transverse momentum is chosen. Additionally, the momenta of $t$ and $\bar{t}$ are reconstructed from their decay products, i.e.
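The jet definition above can be illustrated with a naive implementation of the anti-$k_T$ algorithm with E-scheme recombination. This is a minimal $\mathcal{O}(N^3)$ sketch for illustration only (a dedicated library such as FastJet would be used in practice); the four-vector layout `(E, px, py, pz)` and all function names are assumptions.

```python
import math

# Four-vectors are tuples (E, px, py, pz) in GeV (an assumed convention).


def pt(p):
    return math.hypot(p[1], p[2])


def rapidity(p):
    E, _, _, pz = p
    return 0.5 * math.log((E + pz) / (E - pz))


def pseudorapidity(p):
    return math.asinh(p[3] / pt(p))


def delta_r2(p, q):
    """Squared distance in the rapidity-azimuth plane."""
    dphi = abs(math.atan2(p[2], p[1]) - math.atan2(q[2], q[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (rapidity(p) - rapidity(q)) ** 2 + dphi**2


def antikt(partons, R=0.5, eta_max=5.0):
    """Naive anti-kT clustering with E-scheme (four-vector) recombination."""
    objs = [tuple(p) for p in partons if pt(p) > 0.0 and abs(pseudorapidity(p)) < eta_max]
    jets = []
    while objs:
        best = None  # (distance, i, j); j is None for a beam distance
        for i, p in enumerate(objs):
            d_ib = pt(p) ** -2
            if best is None or d_ib < best[0]:
                best = (d_ib, i, None)
            for j in range(i + 1, len(objs)):
                q = objs[j]
                d_ij = min(pt(p) ** -2, pt(q) ** -2) * delta_r2(p, q) / R**2
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            jets.append(objs.pop(i))  # closest to the beam: promote to a final jet
        else:
            q = objs.pop(j)  # j > i, so pop j first to keep index i valid
            p = objs.pop(i)
            objs.append(tuple(a + b for a, b in zip(p, q)))  # E-scheme: add four-vectors
    return sorted(jets, key=pt, reverse=True)


def massless(pt_, y, phi):
    """Massless four-vector from (pT, rapidity, azimuth)."""
    return (pt_ * math.cosh(y), pt_ * math.cos(phi), pt_ * math.sin(phi), pt_ * math.sinh(y))


# Two nearby partons (Delta R ~ 0.14 < R) and one back-to-back parton
partons = [massless(50.0, 0.0, 0.0), massless(30.0, 0.1, 0.1), massless(40.0, 0.0, math.pi)]
jets = antikt(partons, R=0.5)
print([round(pt(j), 1) for j in jets])
```

The anti-$k_T$ distance $d_{ij} = \min(p_{T,i}^{-2}, p_{T,j}^{-2})\,\Delta R_{ij}^2/R^2$ makes soft partons cluster with hard ones first, which is why the two collinear partons merge into one jet while the back-to-back parton stays separate.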
$$ p(t) = p(j_{b_1}) + p(e^+) + p(\nu_e), \qquad p(\bar{t}) = p(j_{b_2}) + p(\mu^-) + p(\bar{\nu}_\mu), $$

where $j_{b_1}$ comes from the $b$ quark and $j_{b_2}$ from the anti-$b$ quark.
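The reconstruction of $p(t)$ and $p(\bar{t})$ and the two dynamical scales can be sketched as follows. The explicit scale definitions are assumptions for illustration: $H_T$ is taken as the scalar sum of the transverse momenta of the two $b$-jets, the two charged leptons, the hardest light jet and the missing transverse momentum, and $E_T = \sqrt{m_t^2 + p_T^2(t)} + \sqrt{m_t^2 + p_T^2(\bar{t})}$ is built from the reconstructed top momenta. Function names and the example event are hypothetical.

```python
import math

MT = 173.2  # default top quark mass in GeV


def add(*vecs):
    """Component-wise sum of four-vectors (E, px, py, pz)."""
    return tuple(sum(c) for c in zip(*vecs))


def pt(p):
    return math.hypot(p[1], p[2])


def dynamical_scales(jb1, jb2, e_plus, mu_minus, nu_e, nubar_mu, j1, mt=MT):
    """Return (H_T/2, E_T/2) under the assumed definitions stated above."""
    p_top = add(jb1, e_plus, nu_e)         # p(t)    = p(j_b1) + p(e+) + p(nu_e)
    p_tbar = add(jb2, mu_minus, nubar_mu)  # p(tbar) = p(j_b2) + p(mu-) + p(nubar_mu)
    pt_miss = pt(add(nu_e, nubar_mu))      # missing pT from the two neutrinos
    h_t = pt(jb1) + pt(jb2) + pt(e_plus) + pt(mu_minus) + pt(j1) + pt_miss
    e_t = math.sqrt(mt**2 + pt(p_top) ** 2) + math.sqrt(mt**2 + pt(p_tbar) ** 2)
    return h_t / 2.0, e_t / 2.0


# Hypothetical event (GeV), chosen only to exercise the formulae
event = dict(
    jb1=(50.0, 30.0, 40.0, 0.0), jb2=(60.0, -36.0, -48.0, 0.0),
    e_plus=(30.0, 0.0, 30.0, 0.0), mu_minus=(20.0, 20.0, 0.0, 0.0),
    nu_e=(40.0, 0.0, -40.0, 0.0), nubar_mu=(15.0, -15.0, 0.0, 0.0),
    j1=(100.0, 0.0, 60.0, 80.0),
)
mu_ht, mu_et = dynamical_scales(**event)
print(f"H_T/2 = {mu_ht:.2f} GeV, E_T/2 = {mu_et:.2f} GeV")
```

At parton level the neutrino momenta are known exactly, which is what makes this "exact $W$ reconstruction" possible in the theoretical study; experimentally only their vector sum in the transverse plane is measured.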

Description of the Observable and the Methods Used
We start with the ${\cal R}(m_t^{pole},\rho_s)$ observable, defined as the normalised differential distribution of the $t\bar{t}$ + 1 jet cross section with respect to the inverse invariant mass of the final state, $M_{t\bar{t}j}$, which can be written in the following form:

$$ {\cal R}(m_t^{pole},\rho_s) = \frac{1}{\sigma_{t\bar{t}j}} \frac{d\sigma_{t\bar{t}j}}{d\rho_s}(m_t^{pole},\rho_s), \qquad \rho_s = \frac{2 m_0}{M_{t\bar{t}j}}, $$

where $m_0 = 170$ GeV is a scale of the order of $m_t$. We note here that top quarks are reconstructed from their decay products assuming exact $W$ gauge boson reconstruction and taking $j$ as the leading light jet irrespective of its origin (production or decay). This corresponds to the invariant mass of the $W^+W^-b\bar{b}j$ system, $M_{W^+W^-b\bar{b}j}$, which for brevity we denote $M_{t\bar{t}j}$. In Figure 1, we present the NLO predictions for ${\cal R}(m_t^{pole},\rho_s)$ for the following three cases: Full (red solid line), NWA (blue dashed line) and NWA$_{Prod.}$ (green dot-dashed line), for $\mu_R = \mu_F = \mu_0 = m_t$, a scale choice commonly used for the $pp \to t\bar{t}j$ production process at the LHC, and with the CT14 PDF set. Also shown are the relative NLO QCD corrections $\sigma^{LO}/\sigma^{NLO} - 1$ (upper right panel) and the relative deviation of the NWA results from the full calculation (lower right panel). Both are given in percent.
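The definition of $\rho_s$ and of the normalised distribution ${\cal R}$ translates directly into code. The sketch below, with hypothetical function names, computes $\rho_s = 2m_0/M_{t\bar{t}j}$ from reconstructed four-vectors and fills a normalised histogram, $\sigma_i/(\sigma_{t\bar{t}j}\,\Delta\rho_i)$, from weighted events; it assumes that $\sigma_{t\bar{t}j}$ is the sum of all event weights.

```python
import math

M0 = 170.0  # GeV, the scale m_0 used in the text


def inv_mass(p):
    E, px, py, pz = p
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))


def rho_s(p_top, p_tbar, p_jet, m0=M0):
    """rho_s = 2 m_0 / M_ttj, with M_ttj the invariant mass of t + tbar + leading light jet."""
    total = tuple(a + b + c for a, b, c in zip(p_top, p_tbar, p_jet))
    return 2.0 * m0 / inv_mass(total)


def normalised_distribution(values, weights, edges):
    """R_i = (1/sigma) * sigma_i / (width of bin i); sigma is the sum of all weights."""
    sigma_bins = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                sigma_bins[i] += w
                break
    sigma = sum(weights)
    return [s / sigma / (edges[i + 1] - edges[i]) for i, s in enumerate(sigma_bins)]


# Hypothetical momenta (GeV) and toy events, for illustration only
rho = rho_s((200.0, 0.0, 0.0, 100.0), (200.0, 0.0, 0.0, -100.0), (50.0, 50.0, 0.0, 0.0))
R = normalised_distribution([0.1, 0.3, 0.3, 0.7], [1.0, 1.0, 1.0, 1.0], [0.0, 0.5, 1.0])
print(round(rho, 4), R)
```

For on-shell top quarks $M_{t\bar{t}j} > 2m_t$, so $\rho_s < m_0/m_t \approx 0.98$; this upper edge is the $t\bar{t}j$ threshold whose position makes the large-$\rho_s$ region sensitive to $m_t$.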
To be more precise, both panels show shape differences, since we consider normalised differential cross sections. For completeness, integrated NLO cross sections are provided in Table 2. Combined finite top quark and $W$ gauge boson width effects change the NLO cross section by 2%, which is consistent with the expected uncertainty of the NWA, of the order of $\mathcal{O}(\Gamma_t/m_t)$. In Figure 2, we show ${\cal R}(m_t^{pole},\rho_s)$ as given by the best theoretical prediction (Full) with $\mu_R = \mu_F = \mu_0 = H_T/2$ for five different top quark masses, $m_t \in \{168.2, 170.7, 173.2, 175.7, 178.2\}$ GeV. We also plot the ratio to the result with the default value of the top quark mass, i.e. $m_t = 173.2$ GeV. To obtain these results the CT14 PDF set has been used; however, any of the PDF sets recommended by the PDF4LHC group could be employed here. A significant mass dependence can be observed for the regions $\rho_s < 0.4$ and $\rho_s > 0.6$. The region most sensitive to the top quark mass is $\rho_s > 0.7$. This is a consequence of the fact that the tail of the $\rho_s$ distribution is very sensitive to the position of the $t\bar{t}j$ threshold, which in turn is sensitive to $m_t$. On the other hand, the crossing of the various curves around $\rho_s \approx 0.55$ marks a point where the normalised $\rho_s$ distribution is essentially insensitive to the top quark mass. We can observe from Figure 1 that in the most sensitive region the deviations of NWA from the Full case are below 15%. By contrast, substantial differences of 55% − 85%, and even up to 100%, are visible for NWA$_{Prod.}$ in that region. These differences should have a considerable impact on the extraction of $m_t$. The comparison of NWA and NWA$_{Prod.}$ shows that in the region $\rho_s \approx 0.8$ (close to the production threshold) more than 50% of the events originate from the radiative treatment of top quark decays.
Conversely, in the region around $\rho_s \approx 0.2$ this treatment leads to negative corrections of 10% with respect to the approximation where top quarks do not radiate hard jets in the decay stage. The correct perturbative description of this observable, therefore, requires hard jet emission in production and decay (and their mixed contributions). Note that in Ref. [24,25] jet radiation in top quark decays is not included. Additionally, in the most sensitive region, sizeable NLO QCD corrections (shape differences) of the order of 50% are obtained for the Full and NWA theoretical predictions. In the case of NWA$_{Prod.}$ they are around 20%. The dominant source of the large $K$-factor is final state radiation. In every case, however, the differential $K$-factors are far from constant. Thus, LO calculations together with a suitably chosen global $K$-factor cannot be used to obtain results that approximate the full NLO QCD calculation well. As a consequence, great caution is required when merging LO samples with parton shower programs to obtain realistic hadronic events that can be compared directly with experimental data. Instead, predictions with NLO QCD corrections included should be used in $m_t$ studies where the shape of the $\rho_s$ observable is important.
In the next step, the ${\cal R}(m_t^{pole},\rho_s)$ differential distributions are used to obtain the top quark mass. To this end, a set of pseudo-data is compared to ${\cal R}(m_t^{pole},\rho_s)$ as generated with five different top quark masses and with three different theoretical descriptions of the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ production process. The pseudo-data set is generated randomly according to the best theoretical prediction at hand, i.e. the Full prediction at NLO in QCD generated with $m_t = 173.2$ GeV and $\mu_R = \mu_F = \mu_0 = H_T/2$. Unless explicitly mentioned otherwise, this particular setup with the CT14 PDF set is always employed for the generation of the pseudo-data sets for all considered observables.

Table 2. The CT14 PDF set and $m_t = 173.2$ GeV are used. In the last two columns the combined relative size of off-shell effects of $t$ and $W$ is also given.

For completeness, in Figure 3 the normalised $\rho_s$ observable is plotted again, this time with the Full case (red solid line) shown for $\mu_R = \mu_F = \mu_0 = H_T/2$. When comparing the differential $K$-factor for the Full case in Figure 1 and Figure 3, we find that the large corrections in the region $\rho_s \le 0.3$ are removed. The effect can be attributed to the scale choice. The kinematic tail of $M_{t\bar{t}j}$ only shows perturbative convergence when dynamical scales are employed, and for the $\rho_s$ observable the high energy kinematic tail corresponds to low values of $\rho_s$. Thus, in addition to the differences for large values of $\rho_s$ present in Figure 1, there are now only differences of up to −15% at low values of $\rho_s$. Since this region is sensitive to the top quark mass, we expect to see an impact on $m_t$. Moreover, even though we have only simulated decays of the weak bosons to different lepton generations, i.e. $W^+W^- \to e^+\nu_e\mu^-\bar{\nu}_\mu$, omitting same-generation lepton interference effects as occurring in $W^+W^- \to e^+\nu_e e^-\bar{\nu}_e$, we adjust the counting factor to correspond to the production of all combinations of charged leptons of the first two generations. The interference effects can be safely neglected because they are at the per-mille level for our inclusive selection cuts, as has been checked directly using LO results. The complete cross section with $\ell = e^\pm, \mu^\pm$ is thus obtained by multiplying the result for $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ by a lepton flavour factor of 4. In this way, the two integrated luminosities that we consider in the following, 2.5 fb$^{-1}$ and 25 fb$^{-1}$, correspond, assuming perfect detector efficiency, to approximately 5400 and 54000 events, respectively. Errors on the pseudo-data are calculated according to the Bernoulli distribution. Notice that the theoretical predictions are calculated with such high statistics that the Monte Carlo errors in each bin are negligible compared to the errors of the pseudo-data samples at the chosen luminosities. Examples of the pseudo-data sets for both cases, 2.5 fb$^{-1}$ and 25 fb$^{-1}$, are shown in Figure 4.

Table 3. Various binnings used in the $m_t$ extraction from the normalised $\rho_s$ distribution. The three cases are shown: 5 equal size intervals as well as ATLAS [26] and CMS [27] intervals. The latter two have been optimised for the $\ell$+jets and di-lepton channels at $\sqrt{s} = 7$ TeV and $\sqrt{s} = 8$ TeV, respectively.

We consider various choices for the number of bins and the bin size for the $\rho_s$ observable to check whether there is any effect on $m_t$. More precisely, we consider 31 and 5 bins of equal size as well as the bin intervals proposed by the ATLAS [26] and CMS [27] collaborations in their studies at the LHC with $\sqrt{s} = 7$ and 8 TeV using the $\ell$+jets and the di-lepton top quark decay channels, respectively. The latter three cases are summarised in Table 3. In Figure 5, templates for the Full case for five different top quark masses with the different bin sizes are given, assuming $\mu_0 = H_T/2$ and the CT14 PDF set. To emphasise the regions with the largest sensitivity to the top quark mass, the ratio to the result with the default value of the top quark mass, $m_t = 173.2$ GeV, is also shown. The top quark mass is determined by a comparison of the pseudo-data with the theoretical predictions for different values of $m_t$ in individual bins of the normalised $\rho_s$ distribution. The most probable value of the top quark mass is extracted by means of the $\chi^2_i$ distribution for each bin $i$. To be more precise, for each bin the predicted theoretical values (cross sections) for different $m_t$ are fitted with a second order polynomial function $f_i(m_t)$ in order to obtain a continuous distribution as a function of the top quark mass. Examples of such functions for the ATLAS and CMS intervals are shown in Figure 6 and Figure 7, respectively.

Figure 6. Bin-by-bin fit of template distributions for the normalised $\rho_s$ observable as given by the full theory at NLO in QCD for the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ production process at the LHC with $\sqrt{s} = 13$ TeV. The CT14 PDF set and $\mu_R = \mu_F = \mu_0 = H_T/2$ are used. The ATLAS binning is assumed. The error bars denote statistical uncertainties.
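The event counts and the pseudo-data generation described above can be sketched as follows. The quoted event numbers imply, via the lepton flavour factor of 4, a cross section of about 540 fb for the $e^+\mu^-$ channel alone (5400 events / 2.5 fb$^{-1}$ / 4). For the pseudo-data we assume that each event falls into bin $i$ with the template probability $p_i$ (multinomial sampling), so that the per-bin error follows from the binomial variance $N\hat{p}_i(1-\hat{p}_i)$; this is our reading of "errors according to the Bernoulli distribution", and the bin probabilities below are hypothetical.

```python
import math
import random


def events_from_lumi(sigma_emu_fb, lumi_fb, flavour_factor=4):
    """Expected event count for all e/mu combinations, assuming perfect efficiency."""
    return sigma_emu_fb * flavour_factor * lumi_fb


def pseudo_data(bin_probs, bin_widths, n_events, rng):
    """Sample n_events into bins; return counts and the normalised distribution with errors."""
    counts = [0] * len(bin_probs)
    for i in rng.choices(range(len(bin_probs)), weights=bin_probs, k=n_events):
        counts[i] += 1
    # Binomial error (sum of Bernoulli trials) on each bin count
    errors = [math.sqrt(n * (1.0 - n / n_events)) for n in counts]
    # Normalise to (1/N) dN/drho so that the pseudo-data are comparable with R(m_t, rho_s)
    values = [n / n_events / w for n, w in zip(counts, bin_widths)]
    sigmas = [e / n_events / w for e, w in zip(errors, bin_widths)]
    return counts, values, sigmas


sigma_emu = 5400 / 2.5 / 4  # fb, implied by the quoted event count
rng = random.Random(2017)
probs = [0.05, 0.20, 0.35, 0.30, 0.10]  # hypothetical template fractions, 5 equal bins
widths = [0.2] * 5
for lumi in (2.5, 25.0):
    n = round(events_from_lumi(sigma_emu, lumi))
    counts, values, sigmas = pseudo_data(probs, widths, n, rng)
    print(lumi, n, counts)
```

Because the relative error per bin scales as $1/\sqrt{N}$, going from 2.5 fb$^{-1}$ to 25 fb$^{-1}$ shrinks the pseudo-data errors by a factor of about $\sqrt{10} \approx 3.2$.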
Afterwards, the $\chi^2_i$ distribution is constructed according to the following formula:

$$ \chi^2_i(m_t) = \frac{\left(f_i(m_t) - N_i^{pseudo-data}\right)^2}{\left(\delta N_i^{pseudo-data}\right)^2}, $$

where $f_i(m_t)$ represents the fit to the given theoretical predictions in bin $i$, $N_i^{pseudo-data}$ is the number of selected pseudo-data events in that bin and $\delta N_i^{pseudo-data}$ stands for the statistical uncertainty of the pseudo-data in bin $i$. The $\chi^2_i$ distribution does not take into account the theoretical uncertainties stemming from the scale variation and from the PDF uncertainties, which are treated as external variations as described below.

Figure 7. Bin-by-bin fit of template distributions for the normalised $\rho_s$ observable as given by the full theory at NLO in QCD for the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ production process at the LHC with $\sqrt{s} = 13$ TeV. The CT14 PDF set and $\mu_R = \mu_F = \mu_0 = H_T/2$ are used. The CMS binning is assumed. The error bars denote statistical uncertainties.

The global $\chi^2$ is calculated by simply summing over all bins, since individual bins are not correlated:

$$ \chi^2(m_t) = \sum_{i=1}^{N} \chi^2_i(m_t), \tag{3.5} $$

where $N$ is the number of bins. The number of degrees of freedom is reduced to $N - 1$, since one degree of freedom is used by the normalisation of the theoretical distributions. As usual, we expect $|f_i(m_t) - N_i^{pseudo-data}|$ to be of the order of $\delta N_i^{pseudo-data}$, so that each term in the sum is of the order of unity. Hence a sample value of $\bar{\chi}^2 \equiv \chi^2/\text{d.o.f.}$ should be approximately equal to 1. If this is the case, we conclude that our pseudo-data are well described by the $f_i(m_t)$ functions we have chosen. If our sample value of $\bar{\chi}^2$ turns out to be much larger than 1, we may conclude the opposite. The resulting representative $\bar{\chi}^2$ distributions with the binning proposed by the ATLAS and CMS collaborations for both integrated luminosities, i.e. 2.5 fb$^{-1}$ and 25 fb$^{-1}$, are shown in Figure 8. The position of the minimum of the $\chi^2$ distribution is taken as the extracted top quark mass value, $m_t^{out}$. The statistical uncertainty on the top quark mass, $\delta m_t^{out}$, is calculated in the standard way, i.e. as the $\pm 1\sigma$ deviation from the minimum obtained from the $\chi^2_{min} + 1$ variation. The sensitivity to the theoretical assumptions and their uncertainties is assessed using one thousand pseudo-data sets. Afterwards, the averaged $\chi^2$ and $m_t^{out}$ are inferred, and $\delta m_t^{out}$ is taken as the $\pm 1\sigma$ deviation from the averaged $m_t^{out}$, obtained by applying a 68.3% C.L. to the (sorted) spread $m_t^{out} - m_t^{i}$, where $i = 1, \dots, 1000$ counts the pseudo-experiments. Distributions of the minimum $\bar{\chi}^2$ and of the corresponding top quark mass from the 1000 pseudo-experiments are presented in Figure 9 and Figure 10 for $L = 2.5$ fb$^{-1}$ and $L = 25$ fb$^{-1}$, respectively, for the Full theory with $\mu_R = \mu_F = \mu_0 = H_T/2$ and the CT14 PDF set.
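The bin-by-bin quadratic fit, the global $\chi^2$ scan with its $\chi^2_{min}+1$ interval, and the ensemble summary can be sketched in plain Python. This is an illustration, not the authors' code: the least-squares quadratic fit, the 0.01 GeV scan step, the assumption of a single minimum, and the reading of the 68.3% C.L. spread as a quantile of $|m_t^{i} - \langle m_t^{out}\rangle|$ are our assumptions, and the templates in the usage example are synthetic.

```python
import math


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quad_fit(xs, ys):
    """Least-squares fit y = a + b x + c x^2 via the normal equations (Cramer's rule)."""
    s = [sum(x**k for x in xs) for k in range(5)]
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(a)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return coeffs


def fit_mass(masses, templates, data, errors, step=0.01):
    """templates[k][i] is the prediction in bin i for masses[k].

    Returns (m_out, chi2_min, dof, m_lo, m_hi); [m_lo, m_hi] is the region with
    chi2 <= chi2_min + 1 on the scan grid (assumes a single minimum inside the grid)."""
    m_ref = masses[len(masses) // 2]  # expand around the central mass for stability
    xs = [m - m_ref for m in masses]
    fits = [quad_fit(xs, [tpl[i] for tpl in templates]) for i in range(len(data))]

    def chi2(m):
        x = m - m_ref
        return sum((a + b * x + c * x * x - d) ** 2 / e**2
                   for (a, b, c), d, e in zip(fits, data, errors))

    n_steps = int(round((masses[-1] - masses[0]) / step))
    grid = [masses[0] + step * k for k in range(n_steps + 1)]
    values = [chi2(m) for m in grid]
    k_min = min(range(len(grid)), key=values.__getitem__)
    inside = [m for m, v in zip(grid, values) if v <= values[k_min] + 1.0]
    dof = len(data) - 1  # one degree of freedom absorbed by the normalisation
    return grid[k_min], values[k_min], dof, min(inside), max(inside)


def ensemble_summary(m_outs, cl=0.683):
    """Mean of the fitted masses and the cl-quantile of |m_i - mean|."""
    mean = sum(m_outs) / len(m_outs)
    dev = sorted(abs(m - mean) for m in m_outs)
    return mean, dev[math.ceil(cl * len(dev)) - 1]


# Synthetic templates, linear in m_t, for 5 bins (hypothetical numbers)
masses = [168.2, 170.7, 173.2, 175.7, 178.2]
base = [0.5, 1.0, 1.5, 1.0, 0.5]
slope = [-0.02, -0.01, 0.0, 0.01, 0.02]
templates = [[b + s * (m - 173.2) for b, s in zip(base, slope)] for m in masses]
data = [b + s * (174.0 - 173.2) for b, s in zip(base, slope)]  # noiseless "pseudo-data" at 174.0 GeV
m_out, chi2_min, dof, m_lo, m_hi = fit_mass(masses, templates, data, [0.05] * 5)
print(f"m_out = {m_out:.2f} GeV, -{m_out - m_lo:.2f} +{m_hi - m_out:.2f} GeV, dof = {dof}")
```

In a full study `fit_mass` would be called once per pseudo-data set, and the 1000 resulting values of `m_out` passed to `ensemble_summary`. If the $\chi^2_{min}+1$ interval reaches the edge of the template grid, the quoted error is truncated, which is one reason to keep the template range wide compared to the expected resolution.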
For a given luminosity, a higher number of bins corresponds to a better top quark mass resolution and to a substantial decrease of the spread of the $\bar{\chi}^2_{min}$ values. Once the luminosity is increased (see Figure 10), the top quark mass resolution also improves, as anticipated. The improved resolution can be used to make a more accurate determination of the top quark mass.

Figure 11. Difference between the fitted top quark mass and the top quark mass assumed in the theoretical prediction used for the generation of the pseudo-data set. The ATLAS binning is assumed. A luminosity of $L = 2.5$ fb$^{-1}$ is considered and the CT14 PDF set is employed. The grey (dashed) line corresponds to "Fitted $m_t$" = "True $m_t$".

Theoretical uncertainties stemming from the scale variation and from the various PDF parameterisations are included in the following manner. For each source of uncertainty, normalised $\rho_s$ differential distributions with the various top quark masses are prepared, replacing the template distributions of the default setup, i.e. $\mu_R = \mu_F = \mu_0$ and the CT14 PDF set. Thus, for each top quark mass value considered we generate the following additional normalised $\rho_s$ distributions:

$$ \rho_s(\mu_0/2,\, \text{CT14}), \qquad \rho_s(2\mu_0,\, \text{CT14}), \qquad \rho_s(\mu_0,\, \text{MMHT14}), \qquad \rho_s(\mu_0,\, \text{NNPDF3}). $$

For each case, the $\chi^2$ distribution is calculated and the corresponding top quark mass is inferred. The difference between the central value of each newly extracted top quark mass and $m_t^{out}$ as obtained from the default case, $\rho_s(\mu_0, \text{CT14})$, is taken as the systematic uncertainty. To be more precise, for each source of theoretical uncertainty (scale and PDF) two mass shifts are obtained. To be conservative, the larger of the two is chosen and symmetrisation is not utilised. Let us note that the simultaneous variation of the renormalisation and factorisation scales up and down by a factor of 2 around the central value $\mu_0$ is motivated by our previous findings. In Ref. [29] we have shown that the scale variation for the process under consideration is fully driven by changes in $\mu_R$, independently of the scale choice.
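The rule for the theoretical uncertainties, taking for each source the larger of the two mass shifts, can be written as a small helper. A minimal sketch; the helper name is hypothetical and the input masses are made-up numbers, not results of this paper.

```python
def theory_uncertainty(m_default, m_variations):
    """Largest absolute shift of the extracted mass over the given variations."""
    return max(abs(m - m_default) for m in m_variations)


# Hypothetical extracted masses in GeV, for illustration only
m_out = 173.4
m_scale = {"mu0/2, CT14": 172.9, "2mu0, CT14": 173.7}
m_pdf = {"mu0, MMHT14": 173.5, "mu0, NNPDF3": 173.2}

delta_scale = theory_uncertainty(m_out, m_scale.values())
delta_pdf = theory_uncertainty(m_out, m_pdf.values())
print(f"delta_scale = {delta_scale:.2f} GeV, delta_pdf = {delta_pdf:.2f} GeV")
```

Returning a single magnitude mirrors the "larger of the two" rule: here the downward scale variation (0.5 GeV) dominates over the upward one (0.3 GeV).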
We also note that scale variations are applied to the numerator and denominator of the normalised distributions in a correlated way. Finally, possible biases in the $m_t$ extraction have also been examined by employing all theoretical descriptions to obtain the pseudo-data sets. Thus, the Full case for the three scale choices $\mu_0 = m_t$, $\mu_0 = H_T/2$ and $\mu_0 = E_T/2$, as well as NWA and NWA$_{Prod.}$ for $\mu_0 = m_t$, have been used not only as templates but also for the pseudo-data generation. In the end, the value of the top quark mass obtained from the global $\chi^2$ distribution has been compared to the top quark mass of the theoretical sample used as input. The final results, assuming the ATLAS binning, are presented in Figure 11 and Figure 12 for $L = 2.5$ fb$^{-1}$ and $L = 25$ fb$^{-1}$, respectively. Good agreement within the corresponding statistical errors has been found for each top quark mass, for all theoretical predictions and for both luminosities. Neither a favoured value of the top quark mass nor a bias towards a higher or lower $m_t$ has been observed. Thus, we conclude that the method used is indeed unbiased.

Numerical Results for $m_t$ Based on the Normalised $\rho_s$ Distribution
Our findings for the top quark mass, as determined from the normalised $\rho_s$ distribution using the methods described in the previous section, are summarised in Table 4 and Table 5. They are obtained for integrated luminosities of ${\cal L} = 2.5$ fb$^{-1}$ and ${\cal L} = 25$ fb$^{-1}$ at the LHC with $\sqrt{s} = 13$ TeV. We show the mean value of the top quark mass as collected from the 1000 pseudo-experiments, $m_t^{out}$, the 68% C.L. ($1\sigma$) statistical error on the top quark mass, $\delta m_t^{out}$, and the averaged minimal $\chi^2/$d.o.f. The significance of a discrepancy between the pseudo-data and what one expects under the assumption of a particular theoretical description is quantified by the probability value, the p-value. The latter is defined as the probability of finding $\chi^2$ in the region of equal or lesser compatibility with the theory in question than the level of compatibility observed with the pseudo-data. Thus, in Table 4 and Table 5 the p-value is also provided, together with the corresponding number of standard deviations, which is shown in parentheses. Let us note at this point that the smaller the p-value, the larger the significance, because a small p-value tells us that the theoretical description under consideration might not adequately describe the pseudo-data. We would normally start to question the theoretical description employed only if the p-value were smaller than 0.0455 (more than $2\sigma$). If the p-value is larger than 0.0455 (less than $2\sigma$), we assume that the pseudo-data are consistent with the theoretical approach used to model the process under consideration. Results with a p-value smaller than 0.0027 (more than $3\sigma$) can be considered to be disfavoured by the pseudo-data. Finally, the last column of each table gives the top quark mass shift, defined as $m_t^{in} - m_t^{out}$ with $m_t^{in} = 173.2$ GeV.

Table 4. Mean value of the top quark mass, $m_t^{out}$, from 1000 pseudo-experiments as obtained from the normalised $\rho_s$ differential distribution for the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ production process at the LHC with $\sqrt{s} = 13$ TeV. Also shown is the 68% C.L. ($1\sigma$) statistical error of the top quark mass, $\delta m_t^{out}$, together with the averaged minimal $\chi^2/$d.o.f and the p-value. The number of standard deviations corresponding to each p-value is presented in parentheses. In the last column the top quark mass shift, defined as $m_t^{in} - m_t^{out}$ with $m_t^{in} = 173.2$ GeV, is also given. Luminosity of 2.5 fb$^{-1}$ is assumed.

Table 5. Mean value of the top quark mass, $m_t^{out}$, from 1000 pseudo-experiments as obtained from the normalised $\rho_s$ differential distribution for the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ production process at the LHC with $\sqrt{s} = 13$ TeV. Also shown is the 68% C.L. ($1\sigma$) statistical error of the top quark mass, $\delta m_t^{out}$, together with the averaged minimal $\chi^2/$d.o.f and the p-value. The number of standard deviations corresponding to each p-value is presented in parentheses. In the last column the top quark mass shift, defined as $m_t^{in} - m_t^{out}$ with $m_t^{in} = 173.2$ GeV, is also given. Luminosity of 25 fb$^{-1}$ is assumed.
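The p-value thresholds quoted above (0.0455 for $2\sigma$, 0.0027 for $3\sigma$) follow the two-sided Gaussian convention. The minimal Python sketch below reproduces that correspondence and, assuming the p-value of a fit is read off the upper tail of a $\chi^2$ distribution with the given number of degrees of freedom, also evaluates that tail. The exact statistical procedure of this study is defined in the previous section; the function names here are illustrative only.

```python
import math


def p_value_from_sigma(n_sigma: float) -> float:
    """Two-sided Gaussian tail probability for an n_sigma deviation."""
    return math.erfc(n_sigma / math.sqrt(2.0))


def sigma_from_p_value(p: float) -> float:
    """Invert p_value_from_sigma by bisection; p is expected in (0, 1]."""
    lo, hi = 0.0, 20.0  # significances above 20 sigma are not needed here
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p_value_from_sigma(mid) > p:
            lo = mid  # tail probability still too large: move to larger sigma
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_sf(chi2: float, ndf: int) -> float:
    """P(X >= chi2) for a chi-square variable with ndf degrees of freedom.

    Computed as the regularised upper incomplete gamma function Q(ndf/2, chi2/2),
    using a power series for small arguments and a continued fraction otherwise.
    """
    a, x = 0.5 * ndf, 0.5 * chi2
    if x <= 0.0:
        return 1.0
    prefactor = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1.0:
        term = total = 1.0 / a  # series for the lower function P(a, x)
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-16:
                break
        return 1.0 - total * prefactor
    tiny = 1e-300  # modified Lentz evaluation of the continued fraction for Q
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return prefactor * h
```

For example, `p_value_from_sigma(2.0)` gives about 0.0455, and `chi2_sf(4.0, 1)` returns the same number, since $\chi^2 = 4$ with one degree of freedom is a $2\sigma$ effect.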
We start with the results for ${\cal L} = 2.5$ fb$^{-1}$, which are collected in Table 4. The first thing we notice is an overall agreement, within $0.8\sigma - 1.3\sigma$, between the various theoretical descriptions and the pseudo-data. Moreover, for all considered cases the averaged minimal $\chi^2/$d.o.f is of the order of 1. However, depending on the theory at hand, various mass shifts are observed. For the Full case, independently of the bin size and the scale choice, a difference from $m_t^{in}$ of up to 1 GeV can be identified. On the other hand, a shift of $2.0 - 2.5$ GeV is visible for NWA. Were we to use the Full case with the fixed scale $\mu_0 = m_t$ to generate the pseudo-data instead, a shift of $1.2 - 2.0$ GeV would be seen for NWA. However, the statistical uncertainty $\delta m_t^{out}$ is still quite high in this case, of the order of 1 GeV. For the higher luminosity case, which we present in the next step, the mass shifts persist despite the diminished quality of the $\chi^2$ fit. They are again up to 2.5 GeV (2.0 GeV) for NWA with $\mu_0 = m_t$ when the pseudo-data are generated from the Full case with $\mu_0 = H_T/2$ ($\mu_0 = m_t$), while $\delta m_t^{out}$ is then only of the order of $0.3 - 0.4$ GeV. Thus, the off-shell effects and non-resonant contributions of the top quark and the $W$ gauge boson are not negligible for the top quark mass extraction from the ${\cal R}(m_t^{pole},\rho_s)$ observable. For the last case considered, NWA$_{\rm Prod.}$, a substantial deviation from $m_t^{in}$ of the order of $3.2 - 3.8$ GeV is observed, which can be explained by substantial shape differences of the normalised $\rho_s$ distribution in the regions sensitive to $m_t$. The large mass shifts for the two NWA cases suggest that for ${\cal R}(m_t^{pole},\rho_s)$ the full theoretical description of the $pp \to \ell^+\nu_\ell\ell^-\bar{\nu}_\ell b\bar{b}j$ production process is indeed required.
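How a shape difference between the theory that generated the pseudo-data and the theory used for the templates turns into a mass shift can be illustrated with a toy. The sketch below is not the procedure or the templates of this study: the Gaussian-like $\rho$ shape, its slope of 0.004 per GeV, the 10 bins, the simple $\chi^2$ (which ignores the bin-to-bin correlation induced by normalisation) and the use of the spread of fitted masses as $\delta m_t^{out}$ are all assumptions made for illustration.

```python
import math
import random
import statistics

# Hypothetical toy shapes, NOT the NLO templates of this study: a Gaussian-like
# rho distribution whose peak moves down by 0.004 per GeV of m_t, mimicking the
# fact that rho_s ~ 1/M(ttj) decreases when m_t increases.
N_BINS = 10
M_IN = 173.2  # GeV, mass used to generate the pseudo-data


def bin_fractions(m_t, peak_offset=0.0, width=0.2):
    """Normalised fractions in N_BINS equal-width bins of rho in [0, 1]."""
    peak = 0.45 - 0.004 * (m_t - M_IN) + peak_offset
    edges = [i / N_BINS for i in range(N_BINS + 1)]
    cdf = [0.5 * (1.0 + math.erf((e - peak) / (width * math.sqrt(2.0))))
           for e in edges]
    raw = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    total = sum(raw)
    return [r / total for r in raw]


def pseudo_data(fractions, n_events, rng):
    """Sample n_events into the bins (multinomial) and normalise."""
    counts = [0] * len(fractions)
    for i in rng.choices(range(len(fractions)), weights=fractions, k=n_events):
        counts[i] += 1
    return [c / n_events for c in counts]


def chi2(data, template, n_events):
    """chi^2 using the expected statistical error of each normalised bin."""
    return sum((d - t) ** 2 * n_events / t for d, t in zip(data, template))


def run_pseudo_experiments(n_events, n_pe=1000, truth_offset=0.0, seed=1):
    """Return mean and spread of the fitted masses over n_pe pseudo-experiments.

    truth_offset shifts the peak of the pseudo-data shape only, mimicking a
    different theoretical description of the pseudo-data.
    """
    rng = random.Random(seed)
    truth = bin_fractions(M_IN, peak_offset=truth_offset)
    grid = [160.0 + 0.02 * k for k in range(1301)]  # template masses in GeV
    templates = [bin_fractions(m) for m in grid]
    fitted = []
    for _ in range(n_pe):
        data = pseudo_data(truth, n_events, rng)
        k_best = min(range(len(grid)),
                     key=lambda k: chi2(data, templates[k], n_events))
        fitted.append(grid[k_best])
    return statistics.mean(fitted), statistics.stdev(fitted)
```

With about 5400 events, `run_pseudo_experiments(5400, n_pe=200)` should return a mean close to 173.2 GeV, whereas `truth_offset=0.002` should produce a shift $m_t^{in} - m_t^{out}$ near 0.5 GeV, because in this toy a peak offset of 0.002 is equivalent to 0.5 GeV. The statistical spread is unchanged, so a good $\chi^2$ does not protect against such a bias.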
Additionally, when examining the Full and NWA cases more closely for the same scale choice, for example $\mu_0 = m_t$, we see that the statistical uncertainty of $m_t^{out}$ is always higher in the former case. This suggests that $\delta m_t^{out}$ is underestimated by about $20\% - 45\%$ in the case of NWA. Let us recall here that the NWA$_{\rm Prod.}$ case is far from a complete theory, since only higher-order corrections to on-shell top quark pair production with one hard jet are incorporated. Thus, we mostly show this case for comparison and to underline the importance of QCD corrections and jet radiation in top quark decays. Moreover, let us stress that in the ATLAS and CMS experimental analyses [26,27] the on-shell $t\bar{t}j$ production process calculated at NLO in QCD is combined with a parton shower. Top quark decays are treated in the parton shower approximation, omitting $t\bar{t}$ spin correlations. However, the shower programs include higher-order corrections to the hard subprocess in an approximate way, by including the leading-logarithmic contributions to all orders. These dominant contributions are associated with collinear parton splittings or soft gluon emissions. Additionally, the parton shower approximation takes into account not only the collinear-enhanced real parton emissions at each order in perturbation theory but also, by unitarity, the virtual effects of the same order. Such effects are included in the probability of not splitting during the evolution from one scale to another, which is encoded in the Sudakov form factor. Finally, top quark decays in standard shower programs are not based on a strict NWA, but rather obey a Breit-Wigner distribution that should account for the dominant off-shell effects. Therefore, NLO plus parton shower results are better approximations of NWA than of NWA$_{\rm Prod.}$. Nevertheless, in Refs. [26,27] such predictions are first tuned to data and afterwards unfolded back to the parton level to obtain the on-shell top quarks that are used to construct ${\cal R}(m_t^{out},\rho_s)$.
In the next step, we concentrate on the results obtained for the increased integrated luminosity of ${\cal L} = 25$ fb$^{-1}$, which are summarised in Table 5. First, as expected, the statistical uncertainty $\delta m_t^{out}$ decreases as the inverse square root of the luminosity. Secondly, our conclusions about the top quark mass shift derived for ${\cal L} = 2.5$ fb$^{-1}$ are not altered. Thirdly, the underestimation of the statistical uncertainty on $m_t$ in the NWA case can still be observed; here the effect amounts to $15\% - 35\%$. However, unlike in the low luminosity case, for ${\cal L} = 25$ fb$^{-1}$ the sensitivity to the various theoretical predictions is clearly visible. This is best observed in the changes of $\chi^2/$d.o.f and of the p-value. The pseudo-data are properly described only by Full, either with $\mu_0 = H_T/2$ or with $\mu_0 = E_T/2$ ($0\sigma - 1.3\sigma$). The good agreement in the former case is rather trivial: Full with $\mu_0 = H_T/2$ will always work, since it is used to obtain our pseudo-data sets. Less trivial is the fact that Full with $\mu_0 = E_T/2$ also performs very well, at least with the integrated luminosity at hand. This is because these two scales provide very similar results. On the other hand, independently of the bin size, the NWA case is disfavoured at the $4\sigma - 5\sigma$ level and NWA$_{\rm Prod.}$ at the $2\sigma - 5\sigma$ level. Even for the Full case with $\mu_0 = m_t$, discrepancies at the level of $3\sigma - 4\sigma$ are observed. The latter finding underlines that, when differential cross sections are employed, not only the full off-shell effects and non-resonant background contributions of the top quark and $W$ gauge boson but also the scale choice play an important role. We note here that a higher number of bins, which corresponds to an increased sensitivity to $m_t$, helps to clearly distinguish between the case where the theory (still) agrees with the pseudo-data and the one where the theory is disfavoured by such pseudo-data.
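The luminosity bookkeeping behind these statements is simple arithmetic. Going from 2.5 fb$^{-1}$ to 25 fb$^{-1}$ is a factor of 10 in luminosity, so a purely statistical error should shrink by $\sqrt{10} \approx 3.16$; for instance, $\delta m_t^{out} \approx 1$ GeV becomes about 0.32 GeV. The sketch below also converts luminosity into event counts via $N = \sigma \times {\cal L}$, assuming perfect detector efficiency and taking the fiducial cross section implied by the roughly 5400 events at 2.5 fb$^{-1}$ quoted in the summary; the function names and the 1 GeV reference error are illustrative.

```python
import math


def scale_stat_error(dm_ref, lumi_ref, lumi_new):
    """A purely statistical error scales like 1/sqrt(integrated luminosity)."""
    return dm_ref * math.sqrt(lumi_ref / lumi_new)


def expected_events(sigma_fb, lumi_fb_inv, efficiency=1.0):
    """N = sigma x L x efficiency, with sigma in fb and L in fb^-1."""
    return sigma_fb * lumi_fb_inv * efficiency


# ~5400 events at 2.5 fb^-1 with perfect efficiency imply sigma ~ 2160 fb.
SIGMA_FID_FB = 5400 / 2.5

for lumi in (2.5, 25.0, 50.0):
    n = expected_events(SIGMA_FID_FB, lumi)
    dm = scale_stat_error(1.0, 2.5, lumi)  # assumes dm = 1 GeV at 2.5 fb^-1
    print(f"L = {lumi:5.1f} fb^-1: N ~ {n:8.0f}, dm ~ {dm:.2f} GeV")
```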
In the following, the systematic uncertainties on $m_t^{out}$ are examined. They are estimated based on the full theory, because ultimately only this description should be used for the normalised $\rho_s$ distribution. Our findings are luminosity independent. However, as expected, the uncertainties due to the scale dependence depend on the scale choice, and they also vary with the bin size used. For $\mu_0 = H_T/2$ and $\mu_0 = E_T/2$ the theoretical uncertainties stemming from the scale variation have been estimated to be of the order of $0.6 - 1.2$ GeV, whereas for $\mu_0 = m_t$ they are larger, of the order of $2.1 - 2.8$ GeV. The smallest values are obtained for the largest number of bins of equal size. As mentioned before, the theoretical uncertainties obtained from the scale dependence of the templates are not the only source of systematic uncertainties. Another source comes from the various PDF parameterisations. Here, quite uniform uncertainties in the range of $0.4 - 0.7$ GeV have been obtained. Thus, the PDF uncertainties on $m_t$ for the process under scrutiny are well below the theoretical uncertainties due to the scale dependence, which remain the dominant source of theoretical systematics in the top quark mass extraction.
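How the scale and PDF variations are propagated to $m_t$ is defined in the previous section. Assuming the common prescription of refitting the pseudo-data with templates built from varied scales (for instance $\mu_0/2$ and $2\mu_0$) or from alternative PDF sets, and quoting the largest deviation from the central result, the bookkeeping reduces to an envelope. The function name and the numbers in this sketch are hypothetical, not results of this study.

```python
def envelope(m_central, m_variations):
    """Asymmetric envelope (up, down) of fitted masses around the central fit.

    m_variations holds the masses extracted when the central templates are
    replaced by varied ones (e.g. scale-varied or alternative-PDF templates).
    """
    up = max(0.0, max(m_variations) - m_central)
    down = max(0.0, m_central - min(m_variations))
    return up, down


# Hypothetical numbers for illustration only (not results of this study).
up, down = envelope(173.2, [172.6, 174.1])
print(f"+{up:.1f} / -{down:.1f} GeV")  # +0.9 / -0.6 GeV
```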

Comparison to the $\rho_s'$ Observable
We also examine a slightly modified version of the normalised $\rho_s$ distribution, dubbed $\rho_s'$. The difference between $\rho_s$ and $\rho_s'$ comes from the second hard jet: if a second (leading-order) jet is resolved, it is added to the invariant mass of the $t\bar{t}j$ system. The upper part of Figure 13 presents the ${\cal R}(m_t^{pole},\rho_s')$ differential distribution for the three considered cases, Full, NWA and NWA$_{\rm Prod.}$. The renormalisation and factorisation scales are set to the common value $\mu_0$, where $\mu_0 = m_t$ for both NWA cases and $\mu_0 = H_T/2$ for the Full case. Also shown are the relative NLO QCD corrections, $\sigma_{LO}/\sigma_{NLO} - 1$, and the relative deviation of the NWA results from the full calculation. The lower part of Figure 13 shows the full prediction, Full, with $\mu_R = \mu_F = \mu_0 = H_T/2$ for five different top quark masses. We can see that the beginning of the $\rho_s'$ spectrum is mostly affected, because the additional hard jet essentially modifies the tail of $M_{t\bar{t}j}$. Moreover, the peak of the distribution is shifted towards smaller values of $\rho_s'$. The magnitude and sign of the higher-order corrections for NWA and NWA$_{\rm Prod.}$ have also changed for $\rho_s' < 0.3$. For the Full case we have obtained shape differences from $-40\%$ to $+20\%$, which once again underline the importance of including higher-order corrections. Nevertheless, a large impact on the top quark mass extraction is not expected, since the highest sensitivity falls in the range of high values of $\rho_s'$, as can be deduced from the lower part of Figure 13, where a dependence on $m_t$ similar to the $\rho_s$ case is visible. For low values of $\rho_s'$ we observe differences of up to $+20\%$ for NWA and $+15\%$ for NWA$_{\rm Prod.}$; however, they are present only around $\rho_s' \approx 0.1$, where the dependence on $m_t$ is diminished. Our findings on $m_t^{out}$ can be summarised as follows. For the low luminosity case all extracted top quark masses are at most $1\sigma$ away from the corresponding values obtained with $\rho_s$. As a consequence, top quark mass shifts of similar size are noted. We should stress here that the quality of the $\chi^2/$d.o.f. is worse for all but the Full case with $\mu_0 = H_T/2$ and $\mu_0 = E_T/2$. For ${\cal L} = 25$ fb$^{-1}$ only these two cases should be employed, since the other theoretical approaches are disfavoured beyond the $5\sigma$ level. The main reason why the previously defined variable $\rho_s$ should be used instead of $\rho_s'$, however, is the size of the theoretical uncertainties. Once $\rho_s'$ is used, the theoretical uncertainties due to the scale dependence are driven by the leading-order scale dependence of the second hard jet, and a significant increase is observed: they amount to $3 - 4$ GeV for the Full case with $\mu_0 = H_T/2$ or $\mu_0 = E_T/2$. On the other hand, the magnitude of the PDF uncertainties is the same.
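The difference between $\rho_s$ and $\rho_s'$ can be stated compactly in code. The sketch below assumes the conventional normalisation $\rho_s = 2m_0/M_{t\bar{t}j}$ with $m_0 = 170$ GeV from the original proposal of the observable (this constant is not stated in this section) and takes the leading jet to be the non-top jet with the highest transverse momentum; the class and function names are illustrative only.

```python
import math
from dataclasses import dataclass

M0 = 170.0  # GeV; assumed conventional normalisation constant of rho_s


@dataclass(frozen=True)
class FourVector:
    e: float
    px: float
    py: float
    pz: float

    def __add__(self, other):
        return FourVector(self.e + other.e, self.px + other.px,
                          self.py + other.py, self.pz + other.pz)

    def mass(self):
        m2 = self.e ** 2 - self.px ** 2 - self.py ** 2 - self.pz ** 2
        return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding

    def pt(self):
        return math.hypot(self.px, self.py)


def rho_s(top, antitop, jets, include_second_jet=False):
    """rho_s = 2*M0 / M(t tbar j1) for at least one resolved non-top jet.

    With include_second_jet=True this returns rho_s', where the second
    resolved jet (if present) is added to the invariant mass.
    """
    ordered = sorted(jets, key=FourVector.pt, reverse=True)
    system = top + antitop + ordered[0]
    if include_second_jet and len(ordered) > 1:
        system = system + ordered[1]
    return 2.0 * M0 / system.mass()
```

Because adding a second jet can only increase the invariant mass of the system, $\rho_s' \le \rho_s$ event by event, consistent with the peak of the $\rho_s'$ distribution moving to smaller values.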

Comparison to Other Observables
In the following, we examine the sensitivity of the normalised $\rho_s$ distribution by comparing it to other observables sensitive to $m_t$ in the $pp \to t\bar{t}j$ production process. Since $\rho_s$ is defined via the (inverse) invariant mass of the $t\bar{t}$ plus additional hard jet system, the natural observable to start with is the invariant mass of the top anti-top pair alone. In this way we can assess the impact of the additional hard jet on the $m_t$ extraction. The normalised $M_{t\bar{t}}$ differential distribution is presented in Figure 14 together with its dependence on $m_t$. For the top quark mass study only the range up to 1 TeV has been used, the reason being the minimised difference between Full and NWA in this region. Furthermore, high energy regions are not only sensitive to electroweak corrections but could also be diluted by new, not yet discovered, heavy resonances decaying to $t\bar{t}$ final states. Our findings are summarised in Table 6. For the same luminosity, the $M_{t\bar{t}}$ differential distribution seems to perform better than $\rho_s$, yielding statistical uncertainties smaller by a factor of $2 - 2.4$. More importantly, the top quark mass shift, $m_t^{in} - m_t^{out}$, is greatly reduced. For Full and NWA it is below or of the order of 0.5 GeV, and for the NWA$_{\rm Prod.}$ case it is equal to 1.8 GeV. For a measurement, not only the sensitivity but also the reliability of the observable is important. For example, in the case of $M_{t\bar{t}}$ a good sensitivity is achieved mostly due to the first few bins. In this extreme threshold regime, however, theoretical predictions would require going beyond fixed-order perturbation theory by resumming threshold effects and soft gluon emissions. Such studies have been carried out for the invariant mass distribution of on-shell top quarks in the $t\bar{t}$ production process at the LHC with $\sqrt{s} = 14$ TeV [46].
From that study one can conclude that in the threshold region the enhancement of the cross section amounts to roughly a factor of 3; additionally, a significant shift of the threshold is observed. Compared to the inclusive fixed-order NLO total cross section for $t\bar{t}$ production with $\mu_0 = m_t$, however, the increase is relatively small, of the order of 1% only. In principle, the shape of the differential distribution $d\sigma/dM_{t\bar{t}}$ could be distorted in the threshold region, which in turn could affect the mean value of $m_t^{out}$ and shift it towards smaller values. In practice, however, the region where these effects can be visible is only of the order of 1 GeV wide. Therefore, a very fine resolution would be required to become sensitive to threshold effects. In our studies such effects are contained in a single bin of 30 GeV. We have also checked a larger bin size of 60 GeV (a smaller number of bins) and confirmed that our findings on the $m_t^{out}$ extraction are unchanged, as can be seen from Table 6. Therefore, we conclude that we are below any sensitivity to such threshold effects, since they are completely washed out by our $M_{t\bar{t}}$ resolution. Consequently, $M_{t\bar{t}}$ can be safely used. We are not aware of similar studies for on-shell $t\bar{t}j$ production at the LHC. Thus, it is still not clear to what extent the normalised $\rho_s$ distribution can be affected by initial state radiation as well as by bound state corrections for $\rho_s \approx 1$. The theoretical uncertainties of $m_t^{out}$ based on $M_{t\bar{t}}$ stemming from the scale variation are estimated to be of the order of 1.3 GeV, thus slightly larger than in the case of $\rho_s$. On the other hand, the PDF uncertainties are by comparison negligible, at the level of 0.1 GeV. Once again, we observe that for ${\cal L} = 25$ fb$^{-1}$ only the Full theory adequately describes the pseudo-data.
We would like to stress here that the $M_{t\bar{t}}$ observable, similarly to the normalised $\rho_s$ distribution, allows one to unfold the (real) data to the perturbative partonic level, uniquely linking the $m_t^{out}$ value to the top quark mass in the SM Lagrangian.
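The binning argument can be made concrete: a threshold region only about 1 GeV wide above $2m_t = 346.4$ GeV always falls into a single 30 GeV (or 60 GeV) bin. In the sketch below the lower histogram edge of 340 GeV is a hypothetical choice, and normalising only the entries inside the fit range (up to 1 TeV) is an assumption about how the restricted range is handled.

```python
M_T = 173.2  # GeV


def normalised_histogram(values, lo, hi, width):
    """Fractions per bin in [lo, hi); entries outside the range are dropped
    before normalising, mirroring a fit restricted to M_tt < 1 TeV."""
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) // width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def bin_index(value, lo, width):
    """Index of the bin containing value for equal-width bins starting at lo."""
    return int((value - lo) // width)


# The ~1 GeV threshold region [2 m_t, 2 m_t + 1 GeV] lies inside one bin
# for both bin sizes (hypothetical lower edge of 340 GeV).
for width in (30.0, 60.0):
    print(width, bin_index(2 * M_T, 340.0, width), bin_index(2 * M_T + 1.0, 340.0, width))
```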
Performance similar to the case of $M_{t\bar{t}}$ can be obtained with the more exclusive observable $M_{b\ell}$, defined as the invariant mass of a $b$-jet and a charged lepton. This observable is frequently used for top quark mass measurements by both the ATLAS and CMS collaborations in the di-lepton top quark decay channel, see e.g. Refs. [9,47]. We employ the invariant mass of the positron and a $b$-jet, keeping in mind that experimentally one cannot uniquely determine which $b$-jet should be used to build the observable. If the $be^+$ pair that returns the smallest invariant mass is chosen, however, the probability that both final states come from the decay cascade initiated by the same top quark increases [48]. Thus, we define $M_{be^+}$ as
$$M_{be^+} = \min\left\{ M(b_1 e^+),\, M(b_2 e^+) \right\}. \qquad (6.1)$$
The $M_{be^+}$ observable possesses a kinematic endpoint that can be derived from the on-shell top quark decay $t \to W^+ b \to e^+\nu_e b$. Since $m_t^2 = p_t^2 = m_W^2 + 2p_b\cdot p_{e^+} + 2p_b\cdot p_{\nu_e}$, the invariant mass of the positron and the bottom quark is given by $M_{be^+} = \sqrt{2p_b\cdot p_{e^+}}$, and in the massless case it should be smaller than or equal to $\sqrt{m_t^2 - m_W^2}$. When both the $t$ quark and the $W$ gauge boson are treated as on-shell particles at the lowest order, this strict kinematic limit amounts to $M^{max}_{be^+} = 153.4$ GeV. Additional radiation, for example from parton showers or the real emission part of the higher-order corrections, as well as off-shell effects and non-resonant contributions of the top quark and the $W$ gauge boson, smear $M^{max}_{be^+}$. In Figure 15, $M_{be^+}$ is shown together with its top quark mass dependence. A sharp fall of the cross section around 153 GeV is clearly observed. In the range below the kinematic cut-off the size of the off-shell effects is negligible. Above $M^{max}_{be^+}$, the whole range of $M_{b\ell}$ is taken into account, the reason being a drop of the cross section by one or even two orders of magnitude for $M_{be^+} > M^{max}_{be^+}$.
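The minimal pairing of Eq. (6.1) and the leading-order endpoint $\sqrt{m_t^2 - m_W^2}$ can be checked numerically. This sketch represents four-vectors as `(E, px, py, pz)` tuples and assumes $m_W = 80.385$ GeV, a value not stated in this section; the helper names are illustrative.

```python
import math

M_T, M_W = 173.2, 80.385  # GeV; the W mass value is an assumption of this sketch


def inv_mass(p, q):
    """Invariant mass of two four-vectors given as (E, px, py, pz)."""
    e = p[0] + q[0]
    px, py, pz = p[1] + q[1], p[2] + q[2], p[3] + q[3]
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def m_be_min(b_jets, positron):
    """Eq. (6.1): smallest invariant mass over all b-jet/positron pairings."""
    return min(inv_mass(b, positron) for b in b_jets)


def m_be_endpoint(m_t=M_T, m_w=M_W):
    """LO endpoint sqrt(m_t^2 - m_W^2) for on-shell t and W, massless b and e+."""
    return math.sqrt(m_t ** 2 - m_w ** 2)


print(round(m_be_endpoint(), 1))  # 153.4
```

The endpoint is reached when $p_b \cdot p_{\nu_e} = 0$, i.e. when the massless $b$ quark and neutrino are collinear, so that the whole $m_t^2 - m_W^2$ is carried by $2p_b\cdot p_{e^+}$.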
Our findings on $m_t^{out}$ are recapitulated in Table 7. We first observe that, for the same integrated luminosity, statistical uncertainties of similar size as for $M_{t\bar{t}}$ are obtained. Next, for the Full case with ${\cal L} = 2.5$ fb$^{-1}$, the top quark mass shift is much smaller, of the order of 0.2 GeV only. When the luminosity is increased it is reduced even further, down to 0.03 GeV. At the […] shift in the top quark mass extraction of $-0.70$ GeV. We note that this result is compatible with the shift of $-0.83$ GeV observed for the process $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}$ in Ref. [21], which was obtained for a similar setup. Also, for ${\cal L} = 2.5$ fb$^{-1}$ all theoretical descriptions can be employed, whereas for ${\cal L} = 25$ fb$^{-1}$ only the Full approach provides a p-value larger than 0.0027 (below $3\sigma$). Lastly, the theoretical uncertainties are very small, i.e. of the order of 0.05 GeV for the Full case with a dynamical scale choice and 1 GeV for the Full case with a fixed scale. The PDF uncertainties are independent of the scale choice and amount to $0.02 - 0.03$ GeV. Overall, considering all aspects, i.e. the statistical uncertainties on $m_t^{out}$, the top quark mass shift, the quality of the $\chi^2$ fit and the theoretical uncertainties, the $M_{b\ell}$ observable provides the best sensitivity to the top quark mass when the Full case is employed.

Table 6. Mean value of the top quark mass, $m_t^{out}$, from 1000 pseudo-experiments as obtained from the normalised $M_{t\bar{t}}$ distribution for the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ production process at the LHC with $\sqrt{s} = 13$ TeV. Also shown is the 68% C.L. […]

Table 7. Mean value of the top quark mass, $m_t^{out}$, from 1000 pseudo-experiments as obtained from the normalised $M_{b\ell}$ differential distribution for the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ production process at the LHC with $\sqrt{s} = 13$ TeV. Also shown is the 68% C.L. ($1\sigma$) statistical error of the top quark mass, $\delta m_t^{out}$ […]
Our last (exclusive) observable is the total transverse momentum of the top anti-top plus one hard jet system, $H_T$, defined as
$$H_T = p_T(b_1) + p_T(b_2) + p_T(e^+) + p_T(\mu^-) + p_T^{miss} + p_T(j_1).$$
Let us recall that in the case of two resolved jets the one with the highest transverse momentum is chosen. In Figure 16 this observable is presented, again together with its dependence on $m_t$. For the normalised distributions a shape difference between Full and NWA$_{\rm Prod.}$ is noticeable, which will certainly be reflected in the mean value of $m_t^{out}$. By the same arguments as for $M_{t\bar{t}}$, only the range up to 1 TeV is used in the top quark mass studies. Our results on $m_t^{out}$ are provided in Table 8. In terms of statistical uncertainties, this observable performs similarly to the normalised $\rho_s$ distribution. The top quark mass shift for the Full case is also comparable, i.e. $0.1 - 0.7$ GeV independently of the luminosity considered, whereas it is reduced for NWA and NWA$_{\rm Prod.}$, where it amounts to 1.4 GeV and 2 GeV, respectively. What seems to be different, however, is the good quality of the $\chi^2$ fit, independently of the theory applied and the luminosity examined.
To be more precise, for ${\cal L} = 2.5$ fb$^{-1}$ all theoretical descriptions agree with the pseudo-data within $1\sigma$, whereas for ${\cal L} = 25$ fb$^{-1}$ the same applies to the Full case with $\mu_0 = H_T/2$ and $\mu_0 = E_T/2$. Nevertheless, for Full, NWA and NWA$_{\rm Prod.}$ with $\mu_0 = m_t$ we still have agreement with the pseudo-data within $2.6\sigma$. This suggests that a larger integrated luminosity is required to clearly differentiate among the various theoretical approaches used in the calculation of higher-order QCD corrections to the $pp \to t\bar{t}j$ production process in the di-lepton channel at the LHC. Indeed, already for ${\cal L} = 50$ fb$^{-1}$, which corresponds approximately to 108000 events, again only the Full case with a dynamical scale choice, either $\mu_0 = H_T/2$ or $\mu_0 = E_T/2$, reproduces the pseudo-data adequately, as can be seen from Table 8. The remaining cases, Full and NWA with $\mu_0 = m_t$, are disfavoured beyond the $4\sigma$ level. In the former case the $4\sigma$ difference can simply be attributed to the fixed scale choice used for the description of the $H_T$ differential distribution, which does not describe the tails of the distribution sufficiently well. As to the theoretical uncertainties, the contribution related to unknown higher-order corrections is estimated to be of the order of $0.5 - 1.8$ GeV for a dynamical scale choice and 2 GeV for a fixed scale. We have also analysed the theoretical error arising from different parametrisations of PDFs and quantified it at the level of 0.4 GeV, thus well below the uncertainty associated with the scale dependence.

Table 8. Mean value of the top quark mass, $m_t^{out}$, from 1000 pseudo-experiments as obtained from the normalised $H_T$ differential distribution for the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j + X$ production process at the LHC with $\sqrt{s} = 13$ TeV. Also shown is the 68% C.L. ($1\sigma$) statistical error of the top quark mass, $\delta m_t^{out}$, together with the averaged minimal $\chi^2/$d.o.f and the p-value. The number of standard deviations corresponding to each p-value is presented in parentheses. In the last column the top quark mass shift, defined as $m_t^{in} - m_t^{out}$ with $m_t^{in} = 173.2$ GeV, is also given. Luminosities of 2.5 fb$^{-1}$, 25 fb$^{-1}$ and 50 fb$^{-1}$ are assumed.
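A parton-level sketch of the $H_T$ observable is given below. It assumes that $H_T$ sums the transverse momenta of the two $b$-jets, the two charged leptons, the hardest light jet and the missing transverse momentum, and approximates the latter by the vector sum of the neutrino momenta. Both choices, as well as the `(E, px, py, pz)` tuple format and the function names, are assumptions made for illustration.

```python
import math


def pt(p):
    """Transverse momentum of a four-vector given as (E, px, py, pz)."""
    return math.hypot(p[1], p[2])


def h_t(b1, b2, positron, muon, neutrinos, jets):
    """Sum of pT(b1), pT(b2), pT(e+), pT(mu-), missing pT and pT(j1).

    j1 is the light jet with the highest pT; the missing pT is taken as the
    magnitude of the vector sum of the neutrino transverse momenta.
    """
    j1 = max(jets, key=pt)
    miss_x = sum(n[1] for n in neutrinos)
    miss_y = sum(n[2] for n in neutrinos)
    return (pt(b1) + pt(b2) + pt(positron) + pt(muon)
            + math.hypot(miss_x, miss_y) + pt(j1))
```

Only the hardest light jet enters, so a second resolved jet changes $H_T$ only if it is harder than the first one.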

Summary and Conclusions
In this paper we have studied the normalised $\rho_s$ differential distribution including the leptonic top quark decays. We focused on fixed-order NLO QCD calculations at the LHC with $\sqrt{s} = 13$ TeV. Three different theoretical descriptions of the top quark decay chain have been investigated. In the first approach we included all interferences, off-shell effects and non-resonant backgrounds. In the second approach top quark decays were treated in the narrow width approximation. To be more precise, two cases have been employed: NLO QCD corrections to the $pp \to t\bar{t}j$ production process with leading-order decays, and the more sophisticated case with QCD corrections and jet radiation present also in top quark decays. We have used these theoretical prescriptions to investigate their impact on the extraction of the top quark mass. We have compared them to pseudo-data sets generated from the best theoretical description, i.e. the Full prediction at NLO in QCD with $m_t = 173.2$ GeV and $\mu_R = \mu_F = \mu_0 = H_T/2$. Moreover, we have quantified the associated theoretical uncertainties. For the low integrated luminosity case, ${\cal L} = 2.5$ fb$^{-1}$, which corresponded in our case to approximately 5400 events assuming perfect detector efficiency and to a statistical uncertainty on the top quark mass of the order of $\delta m_t^{out} = 1 - 1.5$ GeV, all theoretical prescriptions seemed to agree with the pseudo-data sets. The largest discrepancy amounted to $1.3\sigma$ only. Additionally, the averaged minimal $\chi^2/$d.o.f was always around 1. However, substantial mass shifts, up to 2.5 GeV and 3.8 GeV, have been observed in the case of NWA and NWA$_{\rm Prod.}$, respectively. We have checked that generating the pseudo-data sets with the Full case but for $\mu_R = \mu_F = \mu_0 = m_t$ does not change the situation: mass shifts of up to 2 GeV for NWA and 3.8 GeV for NWA$_{\rm Prod.}$ are still obtained.
Thus, they cannot be ascribed only to the scale choice used in the generation of the pseudo-data sets. For the higher luminosity case, which corresponded to 54000 events and $\delta m_t^{out} = 0.3 - 0.5$ GeV, these mass shifts remained unchanged despite the diminished quality of the $\chi^2$ fit. Taking into account the size of the statistical uncertainty on the top quark mass, and the negligible statistical errors of the theoretical predictions compared to the pseudo-data errors, we conclude that, independently of the integrated luminosity, only the Full prediction with either $\mu_R = \mu_F = \mu_0 = H_T/2$ or $\mu_R = \mu_F = \mu_0 = E_T/2$ should be used to extract the top quark mass from the normalised $\rho_s$ differential distribution once top quark decays are included. Using the best theoretical description at hand, we have established that the theoretical uncertainties stemming from the scale variation are luminosity independent and of the order of $0.6 - 1.2$ GeV. The smallest value has been obtained for the normalised $\rho_s$ observable with the largest number of bins. Once a fixed scale is used instead, they increase to $2.1 - 2.8$ GeV. Thus, the importance of a proper scale choice for the description of differential cross sections has also been shown here. Another source of theoretical uncertainties on the top quark mass extraction, coming from the various PDF parameterisations, has been estimated to be in the range of $0.4 - 0.7$ GeV.
In the next step we examined a slightly modified version of the normalised $\rho_s$ differential distribution, $\rho_s'$, in which the second resolved jet, if present, is included in the definition of the observable. We have found a performance similar to the $\rho_s$ case in all aspects but the theoretical uncertainties. The theoretical errors from the scale dependence increased to $3 - 4$ GeV for the Full case with either $\mu_0 = H_T/2$ or $\mu_0 = E_T/2$. This increase is driven by the leading-order nature of the second resolved jet.
Finally, to check the sensitivity of the $\rho_s$ observable, we have made a comparison to the invariant mass of the $t\bar{t}$ system and to two other, more exclusive observables, namely the minimal invariant mass of the charged lepton and $b$-jet, and the total transverse momentum of the $e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j$ system. In terms of the statistical errors on the extraction of $m_t$ and the mass shift, the normalised invariant mass of the top anti-top pair has performed better than $\rho_s$. For the same integrated luminosity, $\delta m_t^{out}$ was lower by a factor of $2 - 2.4$. The quality of the $\chi^2$ fit seemed similar; however, the $m_t^{in} - m_t^{out}$ shift was below 0.1 GeV for the Full case with the dynamical scale choice, 0.3 GeV for the fixed scale and 0.5 GeV for the NWA case. In the case of NWA$_{\rm Prod.}$ a somewhat higher shift, of around 2 GeV, has been obtained. Thus, in the chosen range, i.e. up to 1 TeV, and for the low integrated luminosity case, the off-shell effects and non-resonant contributions of the top quark and $W$ gauge boson were not crucial. It turned out that the inclusion of higher-order corrections to the top quark decays was more important. Both the Full and NWA cases could be employed for the $m_t$ extraction. Generally speaking, the low integrated luminosity case has shown a lack of sensitivity to the details of the top quark decays. Once an increased luminosity was considered, however, the NWA case was disfavoured at the $4\sigma - 5\sigma$ level, considering only statistical uncertainties. The performance of the normalised $M_{t\bar{t}}$ observable was similar to that of the more exclusive and very well known observable used in alternative $m_t$ measurements, i.e. the (normalised) minimal invariant mass of the bottom jet and the charged lepton, $M_{b\ell}$, which has also been examined. The last observable that we have studied was the normalised $H_T$ differential cross section.
This exclusive observable proved to be similar to ${\cal R}(m_t^{pole},\rho_s)$ in terms of $\delta m_t^{out}$ and the quality of the $\chi^2$ fit; however, the observed mass shifts were smaller, of the order of 0.7 GeV for Full, 1.4 GeV for NWA and 2 GeV for NWA$_{\rm Prod.}$. In addition, in order to disfavour the NWA approach beyond the $3\sigma - 4\sigma$ level, the integrated luminosity had to be increased 20 times, whereas for all other observables the smaller change from 2.5 fb$^{-1}$ to 25 fb$^{-1}$ was sufficient to reach the $5\sigma$ level. Overall, among all studied normalised differential cross sections, $\rho_s$ has shown the highest sensitivity to the top quark and $W$ gauge boson off-shell effects and non-resonant background contributions.
Let us note here that this is a theoretical study and additional systematic uncertainties need to be addressed. Among others, the impact of the parton shower on the shape of the $\rho_s$, $M_{t\bar{t}}$, $M_{b\ell}$ and $H_T$ observables should be carefully examined, and non-perturbative effects together with the $b$-tagging and neutrino reconstruction efficiencies should be estimated. These uncertainties are, however, beyond the scope of this paper; we plan to study them in a separate publication. Even though we cannot quantify the size of the systematic uncertainties on the experimental side, we can make the following general statement. If, for an observable scrutinised here that does not exhibit large mass shifts, the systematic uncertainties are larger than or of the same order as our statistical uncertainty $\delta m_t^{out}$ for ${\cal L} = 2.5$ fb$^{-1}$, then the various theoretical descriptions at NLO in QCD investigated in this paper can be employed to simulate the $pp \to t\bar{t}j$ production process in the di-lepton top quark decay channel. This is possible because we do not have sufficient sensitivity to see differences between the various descriptions of the top quark decays. In the case of observables with a large mass shift, e.g. $\rho_s$ or $H_T$, all these theoretical descriptions may still be used, but one would have to compensate for the shift. If, however, the size of the systematic uncertainties is rather similar to $\delta m_t^{out}$ for ${\cal L} = 25$ fb$^{-1}$, or in the case of $H_T$ to $\delta m_t^{out}$ for ${\cal L} = 50$ fb$^{-1}$, only the Full theoretical description with a dynamical scale choice, either $\mu_0 = H_T/2$ or $\mu_0 = E_T/2$, should be used to simulate the $pp \to e^+\nu_e\mu^-\bar{\nu}_\mu b\bar{b}j$ production process at the LHC to extract the top quark mass.
A few additional comments are in order. The ${\cal R}(m_t^{pole},\rho_s)$ differential observable has already been employed by the ATLAS and CMS collaborations at the LHC to determine the top quark mass. In both studies, on-shell top quarks have been used to build the normalised $\rho_s$ observable. In practice, various Monte Carlo programs have been used in which at most on-shell $t\bar{t}$ or $t\bar{t}j$ samples at NLO in QCD have been matched with parton shower programs like PYTHIA or HERWIG. Such theoretical predictions have first been tuned to data to account for missing perturbative and non-perturbative contributions. In the next step they are unfolded back to the so-called parton level to obtain on-shell top quarks. These calibrations come with additional uncertainties that the experimental collaborations need to consider. Finally, such predictions are contrasted with the same data to extract the top quark mass. NLO QCD calculations with complete top quark and $W$ gauge boson off-shell effects and non-resonant contributions included instead allow top quarks to be defined using kinematics and selection cuts, which brings them much closer to the experimental data. Thus, for example, the top quark mass can be measured using the fiducial differential cross section as a function of $\rho_s$ or $M_{t\bar{t}}$. To summarise, the aim of such precise theoretical predictions can be twofold. First, they can be used for a direct comparison with the LHC data at the parton level, which would lead to a much simplified calibration procedure and a substantial reduction of the systematic uncertainties. Secondly, they can be utilised by the experimental collaborations at the intermediate level to test the quality of the tuning and unfolding procedures. Close collaboration on these issues with experimental colleagues from ATLAS and CMS is already planned.

[…] Here, $\mu_0$ is the common value chosen for the renormalisation and factorisation scales, $\mu_R = \mu_F = \mu_0$. These comparisons were relevant for our top quark mass extraction studies performed in Section 6. In this Appendix, we show the very same comparison, albeit with a common scale choice, $\mu_0 = m_t$, used for all three theoretical predictions. We believe such a comparison better reflects the finite-top-width and finite-$W$-width effects in Full compared to NWA. In Figures 17, 18 and 19 we show the Full, NWA and NWA$_{\rm Prod.}$ predictions, all with $\mu_0 = m_t$, for the normalised $M_{t\bar{t}}$, $M_{b\ell}$ and $H_T$ observables.