Fragmentation Uncertainties in Hadronic Observables for Top-quark Mass Measurements

We study the Monte Carlo uncertainties due to modeling of hadronization and showering in the extraction of the top-quark mass from observables that use exclusive hadronic final states in top decays, such as $t \rightarrow \text{anything+J/}\psi$ or $t\rightarrow \text{anything}+(B\rightarrow \text{charged tracks})$, where $B$ is a $B$-hadron. To this end, we investigate the sensitivity of the top-quark mass, determined by means of a few observables already proposed in the literature as well as some new proposals, to the relevant parameters of event generators, such as HERWIG 6 and PYTHIA 8. We find that constraining those parameters at $\mathcal{O}(1\%-10\%)$ is required to avoid a Monte Carlo uncertainty on $m_t$ greater than 500 MeV. For the sake of achieving the needed accuracy on such parameters, we examine the sensitivity of the top-quark mass measured from spectral features, such as peaks, endpoints and distributions of $E_{B}$, $m_{B\ell}$, and some $m_{T2}$-like variables. We find that restricting oneself to regions sufficiently close to the endpoints enables one to substantially decrease the dependence on the Monte Carlo parameters, but at the price of inflating significantly the statistical uncertainties. To ameliorate this situation we study how well the data on top-quark production and decay at the LHC can be utilized to constrain the showering and hadronization variables. We find that a global exploration of several calibration observables, sensitive to the Monte Carlo parameters but very mildly to $m_{t}$, can offer useful constraints on the parameters, as long as such quantities are measured with a 1% precision.


Introduction
The top quark mass (m t ) is a fundamental parameter of the Standard Model which, together with the W mass, constrained the Higgs mass even before its discovery. It plays a crucial role in the determination of the Standard Model vacuum life-time, which has been recently found to be at the border between stability and metastability [1,2]. It is therefore of paramount importance to measure m t with the highest possible accuracy and provide a reliable determination of the error on m t . Furthermore, much work has been lately carried out in order to estimate the uncertainty on m t , once it is interpreted in terms of well-defined theoretical quantities, such as the pole mass [3,4,5].
In fact, standard measurements, based on the template, matrix-element and ideogram methods (see, e.g., the analyses in [6,7,8]), rely on parton shower generators such as HERWIG [9] or PYTHIA [10]. They simulate the hard scattering at leading order (LO) and multiple radiation in the soft or collinear approximation, with phenomenological models for hadronization and underlying events. By contrast, so-called alternative methods use other observables, e.g., total cross sections or kinematic endpoints, which, besides Monte Carlo simulations, can be compared directly with fixed-order and possibly resummed QCD calculations, thus allowing a straightforward theoretical interpretation of the extracted mass (see, for example, Ref. [11] on the pole mass extraction from the total tt production cross section).
More recent NLO+shower programs such as aMC@NLO [12] and POWHEG [13] implement NLO hard-scattering amplitudes, but still rely on HERWIG and PYTHIA for parton cascades and non-perturbative phenomena. NLO corrections to top quark production have been available in both aMC@NLO and POWHEG for some time, while much effort has been later devoted to improve the treatment of top quark decays. In the aMC@NLO code, NLO top decays are implemented for single-top events [14]; in tt production, the decays are still on shell, but spin correlations and part of the off-shell contributions are included via MadSpin [15]. In the POWHEG framework, NLO corrections to top decays were implemented first in Ref. [16], with an approximate treatment of top-width effects, and more recently, in Ref. [17], the interference between top production and decay, as well as non-resonant contributions have been included. Furthermore, the SHERPA code [18] implements tt production in conjunction with up to three jets at NLO, merged to parton showers along the lines of Ref. [19]. Top decays in SHERPA include spin correlations and are accounted for in the LO approximation [20].
At present, since most standard m t determinations rely on the reconstruction of b-flavored jets in top decays (t → bW ), one of the main uncertainties is from the b-jet energy scale (bJES), which amounts to about 250 MeV of the overall 700 MeV in the world average determination [21]. Therefore, several attempts have been made so far to overcome the difficulties of such standard methods, in particular to calibrate the jet energies to very high accuracy. Some strategies use the W -boson mass as a constraint to calibrate in-situ the jet reconstruction [7]: this method, however, cannot account for the differences between b-jets and light-flavored jets, as the W decays mostly into light or charm quarks. In other attempts the jet energy scale has been effectively constrained by exploiting the anti-correlation of the b-jet energy and angular variables [8] imposed by the V − A matrix element that in the Standard Model describes top-quark decay.
Despite all these efforts, the jet energy scale keeps being a bottleneck for the improvement of standard mass measurements; therefore, new methods have been conceived to go around this uncertainty. Among the earliest attempts, we recall those based on the exclusive fragmentation of the b-quark to J/ψ [6,22] and studies of the decay length of B hadrons [23,24]. In the J/ψ method, leptonic decays of the J/ψ are used to identify the resonance. The distribution in the invariant mass formed by the two leptons from the J/ψ and the lepton from W decay, m ¯ , is the key observable which is then compared with a set of Monte Carlo simulation samples generated with different input top quark masses. The overall strategy does not directly involve any jet-energy measurement in the determination of the quantity sensitive to the top quark mass, and therefore it is free from jet-energy calibration issues. 1 Similarly, the determination of the B hadron 1 The relevant selection criteria are usually imposed on events accompanying jets as well as the leptonic J/ψ. Nevertheless, the effect of jet calibration is not direct because the mis-measurement of energy affects only the phase space of the events, not the observable itself. The Monte Carlo simulations, which are used to extract mt from the observed distribution, implement the nominal phase space which is not identical to decay length does not involve directly any jet-energy measurement, as it is based on the identification of the secondary vertex in the event through tracking: therefore, this measurement is clean from jet-calibration issues.
For all methods involving the tagging of exclusive hadronic final states, e.g., the di-lepton J/ψ final state mentioned above, we expect the following number of recorded events at 14 TeV LHC: where L, tagging , cuts , and BR(tt) denote the integrated luminosity in ab −1 , the tagging efficiency for hadronic objects of interest, the fraction of events satisfying the associated selection cuts, and the branching ratio of tt into a given final state, respectively. For the J/ψ clean di-lepton tagging, tagging will be O(10 −5 ), just taking into account branching fractions. Adding up all decay modes of B hadrons that can be reconstructed from just charged tracks (e.g., B + → K + π + π − ), we may increase the number of events by a factor O(1-10). Therefore we adopt tagging ∼ 10 −4 as baseline in eq. (1). Clearly these methods face the challenge of collecting the sufficient number of events in order to fulfill precision mass measurements; for cuts · BR(tt) 1 and an integrated luminosity of several ab −1 , the desired statistics may nevertheless be reached [25]. Although not being affected by uncertainties from jet measurements, in the J/ψ analysis a crucial role is nevertheless played by the hadronization of b quarks into B-hadrons and the associated Monte Carlo uncertainty. Reference [25] estimates that the contribution due to bottom-quark fragmentation is about 300 MeV, for a total theoretical error of 900 MeV In case of less exclusive requirements on B-hadron decays, it is possible to obtain larger data samples. This would be the case, for instance, of an analysis based on B → + X, which may allow to extract m t from the m distribution obtained by pairing up the soft lepton from B decay and the lepton from the W boson in the same top-quark decay. This method has been employed by CDF [26] and might be used in LHC data as well.
that for the events recorded in the experiment. This difference in practice induces very marginal effects on mt, as one can check them by making a slight change for the phase space in the event simulation.
Exactly as it is the case for mass measurement from the J/ψ method, in a mass determination that uses soft leptons from B hadrons, we should expect a sensitivity to the showering and hadronization description of the event and corresponding systematics. A similar conclusion holds for methods that use the decay length of B hadrons [23,24] and those that exploit the mass of the secondary vertex from charged particles [27]. It is the purpose of this work to study the hadronization uncertainties that affect all these methods.
Understanding the transition of b quarks into B hadrons is a crucial step to correctly predict the observables involved in the above-described mass measurements. This aspect of the physics of quarks and hadrons is ruled by non-perturbative QCD and it is clearly a challenge to describe it accurately starting from first principles. One possibility to fill in this gap is to advocate a factorization of long-distance effects, namely the hadronization mechanism, from the hard-scattering events. In this way it is possible to measure from data a non-perturbative fragmentation function D H,q (z), accounting for the transition of a quark q into a hadron H, carrying an energy fraction z. Most precise measurements of these fragmentation functions have been obtained at LEP [28,29] and SLD [30] experiments from which it is possible to obtain D H,q (z) at an energy scale Q 2 of the order of the inelasticity of the reaction. For processes e + e − → bb at the LEP, it is Q 2 m 2 Z : once known at some Q 2 , the fragmentation functions can be evolved to different Q 2 by means of perturbative evolution equations, analogous to those used for parton distribution functions.
Bottom-quark fragmentation is usually described using the perturbativefragmentation formalism [31]: up to power corrections O(m 2 b /Q 2 ), the heavy-quark spectrum can be factorized as a massless coefficient function and a process-independent perturbative fragmentation function, associated with the transition of a massless parton into a heavy quark. Such an approach allows a straightforward resummation of the large mass ln(m 2 b /Q 2 ) and threshold contributions, ∼ [1/(1 − z)] + and ∼ [ln(1 − z)/(1 − z)] + , which become large for z → 1, corresponding to soft-or collinear-gluon radiation. In the perturbative-fragmentation approach, the state of the art is next-to-leading-order (NLO) accuracy in α S , with next-to-leading logarithmic (NLL) resummation: Refs. [32,33,34] apply this formalism for bottom fragmentation to e + e − − annihilation, top and Higgs decays, respectively. Heavy-hadron spectra are then obtained by convoluting the heavy-quark energy distribution with non-perturbative fragmentation functions, such as the Kartvelishvili [35] or Peterson [36] models, containing few parameters which must be tuned to experimental data.
More recently, the approach based on Soft Collinear Effective Theory (SCET) and Heavy Quark Effective Theory (HQET) was used in the NNLO+NNNLL approximation in order to determine the b-quark fragmentation function from e + e − annihilation at the Z pole [37]. The authors found a general good agreement with the data, although some discrepancy is still present for large values of the energy fraction, where one is mostly sensitive to non-perturbative effects.
Although having lately become very accurate, calculations based on the fragmentation-function formalism are nonetheless too inclusive for a complete description of the final states and in general they need to be performed independently for each observable and hard-scattering reaction. Moreover, this approach is based on factorization, so it is not obvious whether the non-perturbative fragmentation function measured in a color-neutral environment, such as e + e − machines, could be straightforwardly extended to colored parton scatterings at hadron colliders.
As an alternative to the fragmentation function approach, there have been attempts to describe the formation of hadrons with phenomenological models inspired by some features that can be derived from first principles, albeit depending on a few parameters; namely the string model [38,39] or the cluster model [40]. This sort of approach is most common in shower Monte Carlo event generators, which are usually shipped with a "tuning" of the phenomenological parameters in such a way that they are tailored to reproduce a large set of previously available data like e + e − → hadrons. This approach is based on the universality of the hadronization transition, which, in principle would allow one to use models tuned at e + e − machines even at hadron colliders. However, as discussed above, it is not guaranteed that they will still be reliable in a colored environment, wherein even initialstate radiation and underlying event will play a role. Furthermore, the number of parameters involved is usually quite large, especially in the string model, and there is no clear factorization between perturbative and non-perturbative physics: every time one tries to tune a specific sample of experimental data, one should be warned that the possible agreement with other data may be spoiled.
Moreover, when using fragmentation functions and hadronization models, it is hard to make a reliable estimate of theory uncertainties. In general, there are pitfalls in carrying the properties of e + e − measurements to hadronic machines; although it is assumed that they will be accurate enough to capture correctly effects of order Λ QCD , the evaluation of the systematics due to this assumption is difficult to assess. For this reason it would be desirable to determine the hadronization parameters directly in hadronic collisions, so to remove all the issues tied with the use of e + e − data.
Extracting accurate fragmentation functions at hadronic colliders might bring about a formidable task. In fact, one needs to overcome the difficulty to obtain precise data and match them with sufficiently accurate theory calculations. For the purpose of top-quark phenomenology, this will likely need an NNLO accuracy in order not to spoil a sub-GeV accuracy on the top-quark mass determination. 2 Furthermore, the complexity of reconstructing the top-quark rest frame impedes the direct measurement of energy-fraction variables such as the x B = 2 · E B / √ s measured at LEP and SLD experiments, where the Z rest frame coincides with the laboratory frame. The use of hadronization models for precision physics at hadron colliders is challenging because simple models have had hard time to reproduce accurate experimental results [43,44,45] and elaborate models with many parameters are needed to describe e + e − → hadrons data. The large number of phenomenological parameters needed to describe present data makes it difficult to find a unique favorite set that best reproduces them. Even if such "best fit" point is found (or hand-picked), it is often the case that the determination of each parameter is tied to that of the others contained in the model. Such a non-trivial correlation makes it hard to evaluate the impact on physical observables of the change of each parameter, hence to define a theory uncertainty due to the modeling of hadronization.
A further complication arises from the fact that any hadronization model is strictly related to the the parton cascade, hence even the nonperturbative parameters are closely entangled with those characterizing the shower. All of this shows that it is necessary to carry out a coordinated study to constrain both hadronization and shower parameters from data, including possible pp → tt samples, in order to feel confident with the fits and their possible use in other environments. This was the purpose of the tunes of the PYTHIA 8 generator, such as the Monash tune [46], which we will take as baseline for our study, and of the efforts in the ATLAS collaboration [47] towards adding tt data in the fits.
Color reconnection is another source of non-perturbative uncertainty in m t , accounting for about 300 MeV in the estimate of the error in the worldaverage analysis [21]. Although color reconnection can even take place, e.g., in e + e − → W W → 4 jets processes at e + e − colliders, events where bottom quarks in top quark decays are color-connected to initial-state antiquarks do not have their counterpart in e + e − → bb annihilation. Therefore, color reconnection in tt events should deserve further investigation and may even need Monte Carlo retuning at hadron colliders. Studies on the impact of color reconnection on m t were undertaken in Refs. [48,49], in the frameworks of PYTHIA and HERWIG, respectively. Ref. [48] investigated the description of color reconnection within the PYTHIA string model, finding that, even restricting to the most sophisticated models, it can have an impact up to 500 MeV on the top mass. On the other hand, Ref. [49] addressed this issue by simulating fictitious top-flavored hadrons in HERWIG and comparing final-state distributions, such as the BW invariant mass, with standard tt events. In fact, assuming that top-flavored hadrons decay according to the spectator model, the b quark is forced to connect with the spectator or with antiquarks in its own shower, so that color reconnection with the initial state or witht-decay products is suppressed.
In this work we study the impact of the hadronization model parameters on precision measurements of the top-quark mass, for which a target accuracy around 0.5% is set. The aimed accuracy is about the uncertainty where current most precise measurements of m t tend to accumulate, mainly because of the limited knowledge of the jet energy. As the study of B-hadrons is motivated by going beyond the present limitations, this is clearly the precision level that hadronic observables need to reach in order to be useful. This accuracy corresponds to an absolute error δm t Λ QCD , which for hadronic observables implies the necessity to carefully model hadronization effects, whose impact on the single B-hadron kinematics is naturally of the same order of magnitude as Λ QCD .
In the following, we study several observables that have been discussed in the literature to measure the top mass, and assess the error on m t that would stem from a variation of hadronization and related shower parameters. As some of these observables have been proposed in studies of b-jets, we will define analogous variables for single B-hadrons: for example, a measurement of the invariant mass m bl of the b-tagged jet and a charged lepton has been modified to the invariant mass formed by the B-hadron and a charged lepton. For each observable, we derive the sensitivity of the top quark mass to the parameters of hadronization and showering. We then collect results in the tables of the following sections, expressing them in terms of logarithmic derivatives like: where O is a generic observable in tt events, the bar inŌ andθ indicates an average value in a given range, and θ denotes a generic Monte Carlo hadronization or showering parameter. While studying these sensitivities, we shall compute the precision in the knowledge of the value of these parameters that is necessary to achieve in order to warrant a sub-GeV theoretical error on the m t determination from hadronic observables. Furthermore, we carry out similar studies to quantify the sensitivity to showering and hadronization parameters of observables that are not sensitive to the top-quark mass. 3 We henceforth name these quantities "calibration observables" which are useful to gauge the suitability of the tunes obtained elsewhere to describe hadronization in the top-quark specific context. As the calibration observables do not (or very mildly) depend on m t , the measurements of these observables can be used to constrain the showering and hadronization parameters that have the greatest impact on the observables sensitive to m t , thus resulting in an improvement of the accuracy and the robustness of the mass measurement. A combined determination from data of both m t and Monte Carlo variables could be called an "in situ" calibration of the nuisance theoretical parameters that affect the top-quark mass determination with hadronic observables. In a way this could be thought as a strategy similar to the in-situ calibration of the jet-energy correction obtained from the reconstruction of the hadronic W boson peak in top quark events of Ref. [51]. Here, of course, we would be calibrating parameters of the phenomenological hadronization models and of the parton shower. The use of N + 1 observables, where N are mostly sensitive to Monte Carlo parameters and one is mostly sensitive to m t , could be considered as an ultimate strategy for the top quark mass determination or for the cross-check of theoretical uncertainties in the global picture of top quark mass determinations at the Large Hadron Collider. In the following, combining the results on calibration observables and mass sensitive ones, we will outline the ingredients and the inherent difficulties of such analysis.
To deliver our ideas efficiently, the paper is organized as follows. In Section 2, we begin with identifying relevant hadronization and shower parameters in PYTHIA 8 and HERWIG 6, together with their variation ranges to compute associated sensitivities. We then discuss kinematic observables to be used throughout this paper and the reconstruction schemes for the variables carrying some subtlety in Section 3. Section 4 is devoted to reporting our results on the calibration observables, while we present our analysis on observables sensitive to m t in Section 5. Our conclusions and outlook will be presented in Section 6. Finally, two appendices are reserved for discussing some details of our computation.

Parameters and calculations
Hereafter we shall study the process using PYTHIA 8.2 [10] or HERWIG 6 [9], namely LO hard scattering, parton showers and hadronization according to the string [38,39] or cluster [40] models, respectively. In the simulation of tt events, standard HERWIG and PYTHIA factorize production and decay phases and the top-quark width is neglected. Such effects are included in the new bb4 generator [17], implemented in the framework of the POWHEG-BOX [13], which simulates the full pp → bb + − ν ν at NLO, including the interference between top production and decay, as well as non-resonant contributions. Investigations of width effects and full NLO corrections to top decays are nevertheless beyond the scopes of the present paper and we defer it to future explorations. We point out, however, that both HERWIG and PYTHIA implement matrix-element corrections to parton showers in top decays, along the lines of [52] and [53], respectively, and therefore hard-and large-angle gluon radiation in t → bW processes will be taken into account in the results which we will present hereafter.

Variation of PYTHIA parameters
In order to reproduce the data, a number of hadronization parameters can be tuned; some of them are specific to heavy flavors and the formation of B hadrons, while others are meant to describe hadronization of light flavors. In our study we concentrate on heavy-flavor parameters and study variations of the Lund string model a and b, by considering deviations from the best-fit values which are used to describe light-flavored hadrons. Furthermore, we will also vary the r B parameter, the Bowler modification of the fragmentation function for massive quarks, which reads [39]: Here the z variable implies the fraction of E + p z taken by the B hadron under the assumption that a b quark of energy E moves along the z-axis, while m T symbolizes the associated B-hadron transverse mass. As discussed in the introduction, hadronization models are specific to the showering to which they are attached, and therefore it is important to study the sensitivity of top-mass observables to variations of the showering quantities. Parameters for which a sensitivity is expected include p T,min which is the scale at which the parton shower is stopped and hadronization models take over, the value of the strong coupling constant at the Z-boson mass that is used in the final-state parton shower, and the value of the b quark mass. In addition, we also check the effect induced by discrete choices in using the b-quark or the W boson as a recoiler to impose momentum conservation when we have splittings from the b quark in the shower. 4 It is important to remark that the chosen range of variation of these parameters has little importance for the calculation of the sensitivity defined in eq. (2). In fact, the derivative of the observables with respect to the Monte Carlo parameters is typically stable across large ranges, and as will be shown later on, even the sensitivity ∆ (O) θ is largely independent of it. What mainly leads to the choice of given ranges is to have sufficiently large differences between spectra for different parametrizations, so that the derivatives can be computed accurately, without being overwhelmed by the statistical errors due to the finite size of the Monte Carlo sample.
At the same time, we avoid taking values too far from the default, since this might generate unforeseen changes in the Monte Carlo predictions. For these reasons, we shall vary parameters up and down by at most 20% of their central values and evaluate numerically the derivatives in all available data points. For a summary of the PYTHIA parameters and the ranges of variation, we refer to Table 1. In principle, other parameters could be varied and tuned as they affect the kinematics of top quark events. For instance the initial state α s parameter of Pythia8, SpaceShower:alphaSvalue, could be considered. We choose to not investigate them, as they can be fixed from specific measurements such as the jet multiplicity used in [54].

Variation of HERWIG parameters
For the sake of comparison, we also investigate the impact of the HERWIG shower and hadronization parameters on the top-quark mass measurement. In fact, HERWIG and PYTHIA generators differ in several aspects: for example, the ordering variables of the showers are not the same, matrixelement corrections are implemented according to different strategies and, above all, models for hadronization and underlying events are different. As far as HERWIG is concerned, hadronization occurs according to the cluster model, which is strictly related to the angular ordering of the parton shower, yielding color pre-confinement even before the hadronization transition. In the following, we shall use the HERWIG 6 event generator, written in Fortran language. In fact, the object-oriented code HERWIG 7 [55] presents a number of improvements, especially when using the new dipole shower model with the modified kinematics for massive quarks [56]. However, although the latest version HERWIG 7.1 exhibits remarkable improvements for the purpose of bottom-quark fragmentation, the comparison with the e + e − → bb data is still not optimal.
As for HERWIG 6, in Ref. [57] the authors tried to tune it to LEP and SLD data, finding that the comparison could be much improved with respect to the default parametrization, although some discrepancy still persists and the prediction is only marginally consistent with the data. As a whole, we decided to follow [57] and stick to using HERWIG 6 in the present paper. As done in the case of PYTHIA, we identify the relevant shower   and hadronization parameters and vary them by at most 20% around their respective default values.
The relevant parameters of the cluster model are the following: CLMSR(2) which controls the Gaussian smearing of a B-hadron with respect to the b-quark direction, PSPLT(2) which governs the mass distribution of the decays of b-flavored clusters, and CLMAX and CLPOW which determine the highest allowed cluster masses. Their default values and variation ranges are summarized in Table 2. Furthermore, unlike Ref. [57] which just accounted for cluster-hadronization parameters, we shall also explore the dependence of top-quark mass observables on the following parameters: RMASS(5) and RMASS (13), the bottom and gluon effective masses, and the virtuality cutoffs, VQCUT for quarks and VGCUT for gluons, which are added to the parton masses in the shower (see Table 2). We also investigate the impact of changing QCDLAM as it plays the role of an effective Λ QCD in the so-called Catani-Marchesini-Webber (CMW) definition of the strong coupling constant α S in the parton shower (see Ref. [58] for the discussion on its relation with respect to the standard Λ QCD in the MS scheme). As well as for other paramters, we vary QCDLAM around its default value, the range of variation is tabulated in Table 2.

Observables
In this section, we identify the observables that we employ to study their sensitivities to parameter variations. We first discuss the observables relevant to top quark mass measurements, followed by our proposed variables for the tuning or the calibration of Monte Carlo parameters. For the formulation of how to compute these observables, we will follow the naming scheme for tt final states that we illustrate in Figure 1

m t -determination observables
We first list up the observables that we consider for the top-quark mass measurement.
• E B : energy of each tagged B-hadron; • p T,B : B-hadron transverse momentum; • E B +EB and p T,B +p T,B : sum of the energies and transverse momenta of B andB (in events where both B hadrons are tagged); • m B : invariant mass of the B-hadron and one lepton from W decay (the prescription for combinatorial ambiguity arising in forming m B will be discussed shortly); • m T 2 [59] and m T 2,⊥ [60] of the B and the B subsystems defined below; • m BB , the total mass of the system constructed by the two leptons and the two tagged B-hadrons in the event.
We choose these observables because some of them have already been discussed in the literature to carry out measurements of m t from hadronic exclusive final states: for instance, m B [41], m T 2 and its variants [61,62] and E B [42]. Furthermore, we consider observables that have been discussed for b-jets, and can naturally extend them by replacing b-jets with B hadrons. For instance, the sum of B-hadron energies E B + EB is inspired j bB Ā t t Figure 1: Schematic view of the event, including the transformation of ab quark into hadrons, namely into aB and other light-flavored hadrons denoted collectively asĀ, which together form a b-tagged jet jb. Similarly, a b quark coming from the t gives rise to jets and hadrons denoted by the same symbols without a bar.
by the analogous b-jet quantity E j b + E jb defined in [41]. To the best of our knowledge, the variable m BB has not been considered before. It is meant to be sensitive to m t as it probes the the total mass of the tt system. Therefore, we expect the bulk of its distribution to be sensitive to the top mass in a way similar to the bulk of the m B distribution. In the lower tail, the variable m BB is anticipated to have sensitivity to the top quark mass because of a threshold effect similar to the one in the proposal of Ref. [63]. It is noteworthy that the observables E B + EB, p T,B + p T,B , m BB , and all the m T 2 variables require two tagged B-hadrons in each event, and therefore they may be hardly measurable with enough statistics at the LHC. However, it is equally interesting to study them to see how much information on the top quark mass could be present in the kinematics of the whole event, rather than in a single top-quark decay, probed by other observables such as m B , p T,B , and E B . While it may be challenging to secure a sufficient number of events with fully-reconstructed B-hadrons, we also remark that for the measurements which do not actually demand tagged B-hadron(s), e.g., those based on generic tracks from the secondary vertex, it can be easier to achieve enough statistics. Therefore, our study of these observables with B hadrons is useful to see what can be gained using both sides of the tt events in these measurements.

Reconstruction schemes
For observables involving particle pairings, we need to specify a prescription: for example, in order to compute m B , we need to choose which lepton should be paired with the tagged B in the event.
Speaking of m B first, we define two variants. In the first one, we pick the lepton that gives the smaller m B for each lepton charge, that is to say, for each event we obtain two values, min(m B + , m B − ) and min(mB + , mB − ), and put both in relevant histograms. This selection scheme enables us to avoid leptons causing an endpoint violation of the true distribution arising from the B and in the same top decay. We call such a computed invariant mass m B ,min . We remark that, in the situation in which only one B-hadron is identified per event, this simply corresponds to just picking min(m B + , m B − ). In the second variant of m B ,true we use the event records to pair the B-hadron with the lepton that comes from the same top quark as the B.
For more complex observables, such as m T 2 , we have to specify further how we deal with constituent particles in the event. The b-jet j b is made of the B hadron and some other hadrons that we call A, as shown in Figure 1.
In a computation of m T 2 from b-jets, we need to calculate the transverse mass in combination with j b and some vector in the transverse plane that stands for one source of missing transverse energy (mET), associated with the neutrino in W decay. When dealing with the m T 2 of the B-hadron, we are naturally led to replace j b by B, but this leaves an ambiguity as of what to do with the rest of the jet. The hadrons A in j b are detectable particles, hence not invisible, but at the same time they are extraneous to the B-related transverse mass. For this reason, we are allowed to treat them as either extra radiation or invisible particles. In the latter case, they would be added to the mET of the event, as if they were not measured; in the former case, they would be regarded as extra radiation, as some kind of upstream momentum in top-quark decay. We shall call these two variants m the two options would give rise to mET and ISR computed as in Table 3. In our study, we shall also consider the m T 2,⊥ variable [60], which was considered at the jet level by CMS in [61]. The variable is defined in a similar fashion to the standard m T 2 , but in the plane perpendicular to the total upstream transverse momentum, so that it is insensitive to ISR, which affects the dynamical details of the m T 2 spectra. With the m T 2,⊥ variables, we therefore study the relevant sensitivities in the context of decay kinematics of top quarks, excluding any top quark production-related dynamics. 5

Definition of the sub-system observables
For the computation of the m T 2 variables it is possible to consider different subsets of top-decay products as visible particles [64]. Barring the light 5 Of course, this statement assumes precise measurement of the upstream momentum. In this sense, production-related information may enter the variables indirectly.
x for x not coming form either t ort hadrons A, each top quark gives rise to a B-hadron and a charged lepton , with two possible B-hadron-accompanying options for their treatment.
We can opt to form a compound system of these two and compute the transverse masses using p B + p as visible momentum or ignore and calculate the transverse mass using the B-hadrons and the mET, including the charged leptons as if they were invisible. These two options are labeled: • "B -subsystem" m T 2,B : the transverse mass is computed from p B +p for some pairing of B and with only the neutrino considered as invisible and the relevant test invisible mass is set to the neutrino mass; • "B-subsystem" m T 2,B : the transverse mass is computed from p B with the W treated as invisible and the relevant test invisible mass is set to the true W mass.
Both options for m T 2 yield distributions with a kinematic endpoint at m t , hence they can be used to measure the top mass. The "B-subsystem" mass m T 2,B has the virtue that it does not require any hypothesis on which lepton should be paired with each B. The "B -subsystem" m T 2,B , on the contrary, faces the issue of pairing leptons and hadrons, but generally it has the greatest sensitivity to m t , especially in the bulk of the distribution, where m T 2,B instead exhibits a very mild dependence on m t . Overall, the two options offer different pros and cons, making a comparison of their performances rather interesting.

Calibration observables
Even in presence of a perfect fit to precise e + e − data, there is no guarantee that the Monte Carlo tuning will be a good enough description of LHC data (fragmentation properties of light flavors have proven this to be the case [65]). It is therefore crucial to undertake checks of these tunings specific for light-and heavy-quark physics. With this motivation in mind, we study "calibration observables", i.e. quantities that we expect to have no or little sensitivity to m t but still depend on showering and hadronization in tt events. Exploiting this feature, one can use these observables to check the accuracy of the tunings or to constrain the Monte Carlo parameters "in-situ" for the mass measurement. In the latter case, the top-quark sample would be used to constrain simultaneously a number of showering and hadronization quantities from the data, as well as m t , in the same manner as the jet energy scale in-situ calibration carried out in [51]. We highlight the fact that in this in-situ calibration procedure one may end up with a point of Monte Carlo parameter space that may or may not be close to the point used of the standard tunings, e.g., the Monash one. The best-fit calibration point, in fact, is very specific to top quark events and in principle even to the selection cuts used to isolate top quarks from background. Therefore, it has a rather different meaning than usual tunings, and we do not expect it to be a Monte Carlo set-up usable outside the domain of the top quark mass measurement.
Furthermore, we point out that our calibration is intended only for the purpose of reducing the impact of fragmentation uncertainties on the m t measurements. For this reason, one might even think of studying the response of the observables to changes in the hadronization model in dijet events with two b-tags and better constraining the Monte Carlo parameters: however, this would go in the direction of producing a tuning, rather than an in-situ calibration. In the following, we shall not discuss the use of dijet and other heavy-flavor production processes, but we just leave them as a possible validation of the calibration procedure. Clearly, if a stark deviation from the process-universality of the Monte Carlo parameters had to be observed, this should raise questions on the appropriateness of the theory description employed to model tt events and other heavy-flavor production.
Studies to identify observables useful for a calibration specific to top quark have been carried out by ATLAS in the tuning work [47], as well as in Ref. [66]. These investigations consider the observables: • p T,B /p T,j b : the ratio of the B-hadron transverse momentum magnitude over that of the b-jet; : the radial jetenergy density, defined and measured as in [67].
We remark that these observables are sensitive to the presence of the heavy hadron and to the energy distribution in the jet. Hence, they are suitable to probe the dynamics of the conversion of a single parton into a hadron, but feel only indirectly the effects of other partons (hadrons) in the event. However, for the precision we are aiming at, it is important to test the possible cross-talk among partons in the event: in fact, in the hadronization transition, each parton is necessarily connected with the others, due to the need to form color-singlet hadrons. To probe these global effects in the formation of hadrons, we study the following variables: These variables are sensitive (in different manners) to the existence of a bb system, hence they probe hadronization in a more global way than the single b-jet-associated observables examined in [66] and [47]. All the options for X B tend to √ s in e + e − collisions or any other fixed partonic center-of-mass energy. In this context, this property is useful as it allows a more direct comparison to e + e − data, if necessary. The options for X B are sensitive to different aspects of the pp → bW +b W − + X kinematics, such as the relation between the b quarks with the initial state, which, being colored, can influence the hadronization. For example, in Figure 2 we display two configurations (first and second event sketches) that can have similar total hardness, e.g., measured by |p T,j b | + p T,jb , but might have considerably different m j b jb . For small m j b jb , the two b quarks will tend to be more collinear, hence they have a greater chance to interfere with each other when hadronizing. On the other hand, the first and third event sketches might have similar |p T,j b | + p T,jb and m j b jb , but the hadronization may still differ because of the different center-of-mass energy, which is larger in the third type of events. The notion of center-of-mass energy of the initial partons in presence of invisible particles can be captured by the variable √ s min proposed in [68], which would be a discriminant between the third and the first type of event.
In addition, if one wishes to probe the whole kinematic phase-space of the bb system and its hadrons, one can think of further observables. For a probe of emissions from the b quarks we can use the relative azimuthal angle ∆φ(j b jb), that is affected by b → bg splittings, hence it can be meaningful to constrain parton shower parameters. A similar angular observable for B-hadrons ∆φ(BB) can also be employed thanks to its sensitivity to the angular features of the transition of quark into hadrons. Attempting to isolate the fragmentation effect of these angular observables, in the following we will also consider the difference between hadron and jet angular separation, that is |∆φ(j b jb) − ∆φ(BB)|. Besides employing the azimuth φ, we use the complementary information of polar angles in the distribu-tions of ∆R(j b jb), ∆R(BB), and |∆R(j b jb) − ∆R(BB)|. 6 Moreover, we consider, as possible candidate calibration observables, the mass of the bjet, m j b , which might help to constrain the shower parameters, and the ratio of the invariant mass of two B-hadrons over the mass of the two jets, namely m BB /m j b jb .
A caution must be taken: some of the observables might suffer from experimental effects that hinder the use of their precise measurement for a calibration. For instance, the b-jet mass or the X B variables used to normalize χ B are, in principle, sensitive to jet energy scale uncertainties, which may spoil the accuracy on the calibration observables. We shall therefore consider two scenarios for the achievable precision on the calibration observables: an optimistic scenario in which the jet energy scale uncertainties do not inflate the total uncertainty, and another one where the accuracy is limited by these uncertainties at around 1%.
As a further exploration, we also study variables where jets are avoided altogether. One of them is the jet radial energy density ρ(r) defined above, which needs jets only to identify the hadrons that enter the calculation. In addition we also consider other observables such as the ratios: in which we use the lepton energy to make a dimensionless quantity of the energy out of a B-hadron. For these variables there is no use of jets so that a measurement with systematic uncertainties far below the percent level is achievable.

Variants of the calibration observables
Due to the complexity of the tt decay final state, it is possible to conceive variants of the above defined χ B . For instance, the inclusion of leptons in the "visible" system used for the computation of √ s min is somewhat arbitrary. Furthermore, if the calibration observables were to be studied in a high-p T sample of pp → bb events, there would be no leptons at all so that a comparison of resulting outputs to tt results would not be straightforward. Therefore, in addition to the standard definition of √ s min in Ref. [68], which is written in terms of the "visible" transverse momentum where M inv are hypothetical masses of invisible particles, we also employ a modified version in which the corresponding quantity is computed as if the charged leptons were invisible, hence part of the mET. We denote this definition as √ s min,bb , where E vis and P z,vis are expressed in terms of the bb system only. Although M inv can carry whatever values, in the case of √ s min , the masses of the invisible particles are simply set to be the same as the neutrino masses for LHC purposes. By contrast, in the case that the whole W boson is regarded as invisible, one may conceive to use the physical W -boson mass for M inv in eq. (5). We have explicitly checked the effect of not taking the physical mass of the W boson on our results and observed little dependence on this arbitrariness; therefore, throughout our analysis, we shall put M inv = 0 for √ s min,bb .

Results on the calibration observables
For a simple but effective characterization of the impact of the Monte Carlo parameters, we start by computing the average values of the calibrationobservable spectra, also referred in the following as the first Mellin moment, i.e., M 1 . The general expression of the k-th Mellin moment of a distribution with counts O j and bins centered in b j reads: Note that k is not necessarily an integer. Since we will deal with only the first moment in our study, we name it with a simpler notation M O . The spectra of the calibration observables and all our results in the following sections are obtained with samples obeying the following cuts on final-state jets and leptons: In order to avoid sensitivities to extreme kinematic configurations and the unknown unknowns about their theory description, we compute the average values in a range around the peak of the spectra. This range usually corresponds to the full width at half-maximum (FWHM) range, but in some cases we shall employ slightly different choices. For each observable we fix this range from the distribution that we obtain for one value of m t , and we use it throughout the analysis. Our choices for the observables are reported in Table 4. Example spectra in observables that are potentially interesting for the calibration of the Monte Carlo parameters are shown in Figure 3, in which the thick part of the histograms corresponds to the FWHM range.

Dependence on the top-quark mass
A preliminary step for the Monte Carlo tuning is the assessment of the dependence of the calibration observables on the top-quark mass. Ideally, a calibration observable should have no dependence on m t , so that it could be used to constrain the Monte Carlo parameters without any concern about a potential bias in the m t measurement. In practice, all quantities have some sensitivity to m t , so the only viable approach is using observables with minimal sensitivities to the top-quark mass. In this section we study the dependence on m t of the calibration observables, so that one can restrict the analysis to a specific set of quantities, clearly aimed at calibration. We determine the dependence on m t of the first Mellin moments, obtained from 21 m t values in-between 163 GeV and 183 GeV, by fitting a straight line. We then compute the sensitivities defined in eq. (2) and repeat this procedure for the several Monte Carlo settings needed to explore the dependence on the parameters in Table 1. The Mellin moments and the straight-line fits for some illustrative parametrizations are presented in Figure 4. Once we have collected the values of the sensitivity to m t for all the variations of the Monte Carlo parameters, we calculate the mean value and standard deviation for each observable. We highlight the fact that  the standard deviation, arising from the discrepancies in the straight-line fits to data with different parametrizations, is to be read as a measure of the sensitivity of our result to the Monte Carlo setting employed in the computation.
The results are reported in Table 4; a few comments are in order. First of all we remark that for our purposes it is sufficient to present the broad picture of the sensitivity of the observables to showering and hadronization parameters, and therefore we can content ourself with one or two digits of accuracy in the determination of sensitivity parameters. This accuracy will be more than sufficient to draw informative conclusions. Concerning the actual values of the sensitivity, we see that they tend to be ∆ (M O ) mt 1 for most observables, with few notable exceptions, the most apparent being the jet mass, which has ∆ (M O ) mt 0.23. This is the only observable not constructed to be dimensionless, hence it is not surprising that it exhibits the largest sensitivity to the dimensional m t . The smallest dependence on m t is found for the purely angular distributions ∆φ(BB) and ∆φ(j b jb), as well as their ∆R equivalents. In addition, we remark that the sample of variables χ B tends to have slightly larger sensitivity to m t . The choice of different denominators in χ B can lead to a milder dependence on m t in the numerator, as can be seen by comparing the results for χ B |p T,j b | + p T,j b with respect to χ B E j b + Ej b .

Constraining power of the calibration observables
In order to quantify the dependence of the first Mellin moments on the Monte Carlo tuning, we evaluate the difference between the moments obtained for two values of each parameter at fixed m t and compute the sensitivity according to eq. (2) for the m t value at hand. Considering several m t values between 163 GeV and 183 GeV, we calculate the average value and standard deviation for the sensitivity ∆  Table 4. The obtained estimate for the sensitivity essentially represents the distance between the lines shown in Figure 4 for different values of the same Monte Carlo parameter.
From Table 4 hence, observables with large ∆ , are the best diagnostics of the accuracy of the tuning or, equivalently, the best observables for an in-situ calibration in the tt environment.
From a quick glance to the table, we find that a few interesting patterns emerge: the radial energy distribution ρ(r) is among the most sensitive observables to both hadronization and shower parameters and tends to have opposite-sign sensitivity with respect to the other quantities. This tendency can be easily justified: for instance, when increasing α S,F SR , it is natural to expect more radiation, even at larger angles, so that ρ(r) grows, while p T,B or any possible denominator of χ B decreases; a similar but reversed argument holds for p T,min . In spite of this apparently different dependence of ρ(r), we warn the readers that the actual usefulness of the combination of several observables to obtain constraints from data on the parameters has to be evaluated globally, by looking at the dependence of all observables on all parameters. We defer to Section 4.3 a full commentary on the information that we can obtain on the Monte Carlo parameters combining several calibration observables.
Angular variables like ∆φ(BB) and ∆R, as well as their analogs at b-jet level, tend to exhibit very small dependence on all parameters. However, the difference between hadron-and jet-level observables exhibits significant dependence on most parameters, as well as some similarity with the results for ρ(r).
The hadron-to-jet ratios p T,B /p T,j b , E B /E j b , and m BB /m j b jb exhibit substantial sensitivity to the Monte Carlo parameters and are expected to probe similar aspects of the PYTHIA description of the events. Due to the little dependence of ∆φ and ∆R on both m t and shower/hadronization parameters, we can expect the sensitivity of the mass ratio to be closely related to that of the transverse-momentum ratio.
The ratios that involve lepton energies usually show very little sensitivity to the Monte Carlo parameters and m t , and therefore they have little use for the calibration.
The variables χ B show some of the largest sensitivity to the Monte Carlo parameters. We remark that for the denominator of χ B , the variants most sensitive to Monte Carlo parameters are those mostly dependent on m t as well. Usually, the increase of the dependence on the Monte Carlo parameters is smaller than the one on m t . Nevertheless, care must be taken when one uses the definitions of χ B more sensitive to m t in the calibration and in the m t determination.
The jet mass tends to have small dependence on the parameters, at most comparable to that exhibited by other observables. Putting this together with its large sensitivity to m t , it is clearly a non-optimal candidate for a tuning or a measurement of m t with an in-situ Monte Carlo calibration.

Combined constraining power
In order to go beyond the qualitative analysis of the preceding paragraphs, we have devised a procedure, based on the ∆ values, to quantify the constraints on the Monte Carlo parameters which can be obtained from the combined use of several calibration observables. This procedure returns an estimate of the relative error on the determination of the parameters from the first Mellin moments, but can be straightforwardly applicable as well to other input observable quantities such as the higher Mellin moments or other properties of the spectrum of the calibration observables. The procedure goes as follows. We denote our parameters generically as a vector θ, and collectively, we adopt the following notation: θ = {α s,F SR , m b , p T,min , a, b, r B , recoil} .
We put the observables as well in a vector that we denote as O = {O i } for the O i listed in the first column of Table 4. In this notation the sensitivity of each observable to the several parameters can be expressed in matrix notation as: where Table 4 gives the entries ∆ . This matrix contains all the necessary information to estimate how strong a constraint on the parameters θ can be obtained from measurements of the observables and their Mellin moments. In order to derive the constraining power of our set of calibration observables, we would like to invert the matrix ∆ , as to express the sensitivity of the parameters as functions of the observables, which would take the relation of with∆ Being aware that ∆ is, in general, a rectangular matrix, we may not be able to invert it to find∆ . However, we can utilize the pseudoinverse [69,70] procedure to define∆ The strength of the constraints that can be obtained on the parameters θ is given by the general transformation of the covariance matrix. From eq. (10) it follows that Assuming for simplicity that the measurements of the Mellin moments are all uncorrelated and that an overall precision of 1% in their measurement is achievable, we find that the constraining power implied by Table 4 would be in general very modest. In fact, the standard deviations read off the diagonal elements of cov dθ θ are all around or above 100% relative uncertainty. The origin of this fact can easily be tracked back to the fact that the matrix ∆  Table 4 the singular values read 1.7, 0.26, 0.048, 0.0075, 0.005, 0.0033, 0.0014. This means that the information contained in the FWHM Mellin moments of the several observables which we consider allows us to constrain at most one or maybe two parameters (that would be α s,F SR and m b ). In fact a singular value of order one means that from a measurement of the observables with precision 1% we can extract one parameter (or combination of Monte Carlo parameters) with the same accuracy, that is 1%. Similarly a singular value of order 10 −3 implies that even in presence of a measurement of the observables with 10 −4 precision we would obtain a constrain on the relevant (combination of) Monte Carlo parameters at a mere 10 −1 level, as suggested by eq. (8). Given the singular values obtained, we remark that, even imagining measurements of the calibration observables with precision better than 1%, it is clear that an analysis of very inclusive quantities such as the Mellin moments is not capable of yielding any useful constraint on the Monte Carlo parameters.
A more graphical way to picture why the Mellin moments do not contain enough information to constrain all the relevant Monte Carlo parameters is to look at the angles between the directions pointed by the rows of the matrix ∆  Table 4 all the rows point in a very similar direction. In Figure 5 we show the values of 1 − | cos α ij |, where α ij is the angle between the direction of the row i and the row j. From the figure it is apparent that several Mellin moments point essentially in the same direction (e.g., m BB /m j b jb , p T,B /p T,j b , and E B /E j b are well visible in the center part of the plot); the two furthest apart directions are separated by 1 − | cos α| 0.25. Overall we can define three main directions which might be (somewhat arbitrarily) labeled by ρ(r), for the observables in the upper left corner of Figure 5, p T,B /p T,j b for the observables in the middle part, and |∆R(BB) − ∆R(j b jb)| for the observables in the lower-right part.   Table 4.
Adding several higher Mellin moments to the analysis, we find very lit-tle change in the results of the analysis. We have tested values of k in the definition of the k-th Mellin moment eq. (6) from 0.25 to 4 to explore the sensitivity of Mellin moments to the Monte Carlo parameters. We find that, although they have different magnitudes, all Mellin moments are sensitive to the same combinations of Monte Carlo parameters as in the first moment. We ascribe this similarity to the fact that we are considering FWHM ranges, which limits the possibility to be sensitive to different physical regimes, hence different Monte Carlo parameters. In the following section, seeking for an improvement of the prospected constraints, we explore the constraints that can be obtained if one considers the full range of the calibration observables.

Differential constraining power
More stringent constraints on the Monte Carlo parameters can be obtained exploring more in detail the full shape of the distribution of the calibration observables. A common practice in this case is to study several Mellin moments of the full distribution as a mean to probe the full shape of the observable at hand. However, in most cases one cannot extract the highest Mellin moments from the data very accurately, since they would be sensitive even to small errors in the distributions. 7 As an alternative, and probably more transparent, manner to study the shapes of the several observables, we use directly the bin counts of a subset of the calibration observables in suitably chosen ranges. 8 The sensitivity of the bin counts to changes of the Monte Carlo parameters will be examined comparing bin-by-bin the spectra obtained from different Monte Carlo settings.
In order to keep the problem at a minimum of complexity, we start by identifying the most interesting physical quantities and studying the 7 More generally, the use of the moments in the analyses is often motivated by theory arguments, namely the fact that convolution integrals, such as those entering in the DGLAP equations, are turned into ordinary products in N -space. In our study, however, these properties are not compelling, so Mellin moments are really just a way to describe the shape of the spectra of our observables. dependence of each bin count on the Monte Carlo parameters. Results for a few representative observables are presented in Figure 6, where the vertical dashed lines limit the FWHM ranges used in the previous section to compute the first Mellin moments of the distributions. These lines also serve as guidance to get a feeling of which parts of the distribution belong to the tails and which to the bulk. Clearly in some tails there is a good deal of sensitivity to the Monte Carlo parameters, but it is likely that data will be scarce in those regions. From these results, we learn that sensitivities to α S,F SR and m b exceeding 2 can be achieved, e.g., in p T,B /p T,j b , whereas it is overall quite hard to find regions of the spectra with a similar dependence on p T,min . The dependence on a, b, r B and recoil is in general much milder.
Learning from the experience of the previous section, we can generalize the analysis by considering each bin as an independent observable and attempting to use its theoretical dependence on the Monte Carlo setting to constrain these. Similarly to the previous section, we can compute a matrix of sensitivity ∆ The observables that we use in this analysis are ρ(r), χ B (E j b + Ej b ), E B /E , ∆R(BB) − ∆R(j b jb), p T,B /p T,j b , and m BB /m j bjb . They are the observables that in the previous section have shown the greatest sensitivity in absolute sense and the most distinct dependence on linearly independent combinations of Monte Carlo parameters. In place of the χ B (E j b + Ej b ) variable we could have used other options for the denominator that appears in χ B , but we find it convenient employing χ B (E j b + Ej b ), because it is bound to be in the range [0,2], facilitating the identification of an optimal bin-size for the calibration. This fact is quite important, because the study of distributions rather than integrated quantities inflates the expected statistical errors on the measurements. Using an observable that is limited to a range by definition (such as χ B (E j b + Ej b ), p T,B /p T,j b , and m BB /m j bjb ), we can more easily define bins with sufficiently large expected number of events. In order not to deal with too small expected bin counts, we have divided each spectrum in about 10 bins, the details of the binning are reported in Table 5 for completeness.
We remark that by using a single observable and studying the whole spectrum the best results are obtained from the distribution of p T,B /p T,j b . In this case the sensitivity matrix has singular values 7.0, 1.8, 0.28, 0.11, 0.11, 0.037, 0.018 from which we see a substantial improvement with respect to the case of Mellin moments in the FWHM range of the previous section. The largest angular distance in this case is 1 − | cos α ij | 0.45, which clearly supports the observed sensitivity to more independent combinations of Monte Carlo parameters than in the inclusive analysis of Mellin moment. In spite of this, there are still several small singular values, which indicates that it is not possible to look only at this distribution to constrain all the parameters we set out to constrain. For completeness we report the sensitivity matrix for the p T,B /p T,j b distribution in Table 10 in Appendix A.
In order to obtain meaningful constraints on all Monte Carlo parameters of interest, it seems necessary to gather more information using a larger set of input observables. Performing such a global analysis on the six observables mentioned above, we find that the singular values of the sensitivity matrix are 15.0, 4.2, 0.75, 0.42, 0.27, 0.16, 0.13. Given the lowest singular value of order 10 −1 , we expect that it is possible to put meaningful bounds on all the parameters. For instance input observables measured at 10 −2 precision should give O(10 −1 ) even on the most loosely constrained parameter.
We use eq. (11) to determine the covariance matrix expected for the determination of the Monte Carlo parameters from the study of the full spectrum of the six observables. If we assume that each bin of these distributions is an independent observable, uncorrelated with all the rest, and measured with 1% precision the expected relative uncertainty ranges from 70% for the recoil to about 4% for α s,F SR . Such uncertainties for all the parameters as well as the correlation matrix are given in Table 6. The reported uncertainties are for 1% precision on the input observables, treated as independent and uncorrelated. Had we considered a different precision for the input observables, for instance a 0.1% precision (reachable at the HL-LHC if one considers purely statistical uncertainties), we would have obtained linearly rescaled uncertainties (that is 10 times smaller than what are reported in Table 6), as eq. (11) dictates. These results clearly indicate that a global analysis of several spectra of calibration observables has a potential to constrain all the PYTHIA parameters explored here.
The improvement of the achievable constraint can be tracked to the angular distance between the rows of the sensitivity matrix that is larger than in the case of single observable analysis: for the six observables at hand we get 1 − | cos α ij | 0.55. Still, most of the information is extracted from bins in the distributions that depend on very similar combinations of the Monte Carlo parameters. This can be seen in Figure 7, where we display the direction of the gradient of the 66 bin counts used in our global analysis in various subspaces of our 7-dimensional parameter space. The shower parameters tend to appear in overall different combinations, although p T,min has usually small impact on the calibration observables, hence the gradient vectors in the top left panel have small components in the p T,min direction. The hadronization parameters, presented in the top right panel, tend to appear in combination b − a − r B except for a few observables, which generates the high degree of correlation in the matrix in Table 6. Finally in the bottom panel we find the dependence on the recoil switch parameter which shows significant correlation with α s,F SR .
We stress the fact that we have not gone through an intense optimization of the choice of the observables and ranges of the spectra, but nevertheless we expect these results to be illustrative of the achievable constraints on PYTHIA 8.

Results on observables sensitive to m t
We are now in position to report our results on observables sensitive to m t , following a similar path to the preceding section on calibration observables. We quantify the sensitivity of a mass-sensitive observable O to a given Monte Carlo parameter θ or to m t , using a logarithmic derivative defined in eq. (2). The numerical derivatives necessary to compute ∆ (mt) θ are obtained from straight-line fits and comparison with Monte Carlo predictions varying the parameters in the same ranges as discussed for the purpose of the calibration variables. Examples of spectra for different Monte Carlo parameter settings are shown in Figure 8.
For mass-sensitive observables we would like to have a pattern of sensitivity that is opposite to the one wished for calibrations observables, that is to say large sensitivity to m t and small dependence on θ. In fact, in order to reach a given accuracy δm/m on m t , we need to have under control each θ better than a relative precision  Therefore, the observables with smaller ∆ (mt) θ can be useful even when shower and hadronization parameters θ are not known well. As a consequence, for the observables to be used for the m t measurement we wish to have small ∆ (mt) θ . Due to possible different sensitivity to m t which different ranges of the distributions might exhibit, we will carry out two analyses. One is more inclusive and is based on the dependence of the Mellin moment of each observable on m t and the Monte Carlo parameters; the other one is based on the shape analysis of certain spectral features. These shape analyses are motivated by known properties of kinematic endpoints of distributions in e.g., m B and m T 2 , and of peaks of distributions in e.g., E B , since the corresponding quantities constructed with a b-jet instead of a B-hadron are less affected by dynamical details. Hereafter, we shall first discuss the Mellin moments and then the shape investigation.

Mellin-moment observables
Similarly to the analysis carried out for the calibration observables in Section 4.2, we look at the first Mellin moment of the distributions of masssensitive observables, in order to characterize the dependence on m t and on the parameters θ. We obtain each Mellin moment, restricting the calculation to a range that roughly corresponds to the Full-Width Half-Maximum area. The exact choices that we adopt in our analyses are reported in Table 7, together with the sensitivity values to each PYTHIA variables and to the top-quark mass. In the first rows of Table 7, we present results for observables that can be reconstructed with just one hadron, followed by those using also leptons. Then we present observables that would require the tagging of two B-hadrons and finally quantities that employ only leptons.
A further comment is in order concerning the numbers in Table 7. In the previous sections we have highlighted how a number of variations of some observables can be devised if we change strategy i) to pair together leptons and B-hadrons ii) to deal with the hadrons other than the B in each jet iii) to deal with ISR, in the computation of m B and m T 2 . It turns out that the several variants of these observables tend to give rather similar results when it comes to both sensitivity to m t and to Monte Carlo parameters. For this reason we show only representative results for each type of observables. A similar table for HERWIG 6 parameters is given in Table 8, but it is limited to the most representative observables among those of the bigger set analyzed for PYTHIA.
Several interesting observations can be made on these tables. As for the PYTHIA study we find that the uncertainty on α S,F SR has the largest impact on the m t measurement: a relative error on α s,F SR around 1% seems necessary to achieve a 0.5%-level top quark mass determination. This level of precision in the knowledge of α s,F SR is clearly signaling the overall necessity to describe radiation very accurately to attempt a precision measurement of the top quark mass. One could argue that the intrinsic error of a calculation based on a leading order matrix element attached to a leading-logarithmic parton shower, such as PYTHIA, makes it meaningless to talk about fixing one of its main parameters to such a precision. In fact, one can interpret our results as a starting point to conclude that i) a calculation with matrix elements computed beyond leading order matched to a parton shower might be used to (hopefully) get a smaller sensitivity to the α s,F SR parameter and ii) a more accurate description of the parton shower beyond leading logarithms would probably be necessary as well to warrant a top-quark mass measurement from Monte Carlo simulations with sub-percent precision. In any case, for the scope of this paper, we will treat the findings on α s,F SR at face value, hence we conclude that this parameter should be known with a precision from 1% to few %, for the sake of employing any observables which does not rely only on lepton momenta.
Concerning the other Monte Carlo variables, we observe that the m tdependent observables are much less sensitive to variations of the other parameters. The second most relevant one is the bottom-quark mass m b , which, in order not to spoil the m t measurement, should be known with precision between few % and 10% from observables not made of just lepton momenta. Even smaller sensitivity values are obtained for a, b, r B , recoil, and p T,min : determining them to a 10% accuracy should be enough to warrant a 0.5% precision on m t .
Purely hadronic quantities such as p T,B , E B , and E B + EB yield quite similar values of ∆     Table 7, but in terms of the HERWIG 6 shower and hadronization parameters. Ranges are those reported in Table 7 for all values of m t . and showering for a mass measurement based on any of these observables are expected to be essentially the same. Variables like m B are in general half as sensitive as purely hadronic observables, hence they can tolerate a worse knowledge of the Monte Carlo parameters, even by a factor of 2.
To better put these results in context, we can compare them to the sensitivity ∆ (mt) θ of other observables defined in terms of the charged leptons, such as the single-lepton energy E and the invariant mass of the two leptons m . In general, we observe a reduction by a factor of 10 or greater in ∆ (mt) θ , in such a way that their parameter dependence in most cases becomes statistically compatible with zero. We can therefore argue that the mild sensitivity of observables like m B is a genuine consequence of the small dependence of the lepton variables on the details of shower and hadronization.
As for HERWIG 6, our investigation is limited to the B "true" invariant mass, and the energy and transverse momentum of B-hadrons, and the energy of charged leptons in W decay. For all observables and parameters, the ∆ values run between values statistically compatible with zero and O(10 −2 ), the highest dependence being exhibited by m B , E B , and p T,B on b-quark and gluon effective masses, RMASS(5) and RMASS (13), and on the cluster-spectrum variable PSPLT (2). This implies that, aiming at a precision of 0.5% on m t , it would be roughly sufficient to be aware of showering and hadronization parameters with about 10% accuracy. Before concluding this subsection, some comments on the tuning of the strong coupling constant are in order. Unlike PYTHIA, where different couplings for initial-and final-state radiation are employed and the value of the coupling at the Z mass is possibly fitted, in HERWIG the user-defined quantity QCDLAM is a modified version of the MS parameter Λ QCD , according to the CMW prescription [58]. It is easy to prove, using the formulas in [58] that a relative uncertainty of 10% on QCDLAM implies a precision about 1-2% on α S (m Z ) in the CMW scheme.

Shape Analyses
The Mellin-moment analysis above can be applied to all observables, since they all have a rise-and-fall shape and it appears reasonable to correlate it with m t . However, for some observables, kinematic features such as peaks and endpoints are likely to yield more precise and more robust extractions of m t than the Mellin moments. These features might possibly protect one against uncertainties on showering and hadronization parameters. For example, at leading order and in the narrow-width approximation, the m B endpoint arises from on-shell conditions of W bosons and top quarks. For perfectly on-shell kinematics, regardless of the parton shower, such an endpoint should not be surpassed in any event, because it is completely dictated by the phase space of the top quark decay, and not by the underlying dynamics. Barring the soft colored quanta exchanged to neutralize the color of the top quark with the one of the rest of the event, the endpoint should be respected by all events, regardless of how the hadrons were formed. Motivated by this possible reduction of sensitivity to the Monte Carlo parameters, we extend our previous analysis to the kinematic endpoints of the B invariant mass and of the transverse mass m T 2 distributions, and to the peak position of the B-hadron energy spectrum. Endpoints and peaks will be determined by fitting their spectra with suitable functions.

Definition of the dedicated shape analyses
In order to extract the kinematic endpoints and energy-peak position, we analyze the part of the spectrum around them, employing fitting ansatze which capture the necessary features. For the kinematic endpoints, we take a simple second-order polynomial where x m is identified as the corresponding endpoint and c 1,2 are other fit parameters, irrelevant to our study. For determining the peak position in a B-hadron spectrum, we adopt the following ansatz, inspired by the measurement of the b-jet energy peak in Ref. [71]: where x * is identified as the peak position and p is another fitting parameter.
As for the endpoint of m B , we use the data in the range above m B = 127 GeV, which corresponds to a fraction tail 0.05 of the total crosssection for m B ,true and m t = 173 GeV. In terms of the number of events N ev expected in each experiment at the LHC we have where BR(tt) is the branching ratio of tt into a given decay mode, and tagging is the tagging efficiency for an exclusive final state with a B-hadron. This estimate implies that the foreseen LHC dataset will yield statistical uncertainties ( 1.5%) somewhat above the level of the systematics at which we are aiming (∼ 0.5%). There are several ways to get around the potential loss of overall precision due to the statistical uncertainty. First of all, if a tagging efficiency better than 10 −4 could be achieved, for instance by not requiring exclusive B-hadron decays for tags, then the resulting statistical errors might become comparable to the systematic ones. Secondly, other experimental considerations may motivate to employ a broader range than the very end of the tail: for instance, in the CMS analysis [61] the m b region used in the fit corresponds to tail ∼ 0.1. One possible issue that one can raise with this trial is that the m B -endpoint analysis might indirectly reintroduce sensitivities to Monte Carlo parameters, through including more bulk of the distribution than endpoint region for the fit, resulting in an overall sensitivity to the Monte Carlo parameters similar to the Mellin moment analysis of the above Section 5.1. We have explicitly checked the sensitivities with smaller statistics, e.g., tail ∼ 0.02, and found that there are no relevant effects. Nevertheless, we leave to the experimental collaborations the task of looking for an optimal balance; similar considerations, including the fit-range choice, apply to the other kinematic endpoints (see also Table 9). For the study of the E B -peak position it is much easier to secure enough statistics. Of course, the exact range of energies in which we analyze data affects the sensitivity to shower and hadronization parameters. In this case as well, we leave the optimal choice of range to the experimental collaborations. In our analysis, we simply take [35,85] GeV for illustration, which corresponds roughly to the full width at three quarters of the maximum.
Finally, we make a further comment on notational conventions. Given the several options of reconstruction, subsystem and pairings, hereafter, for the sake of a more effective notation, the variants on which we focus will be labeled with roman letters in the math (min, true, ISR, mET). To avoid notational clutter with unnecessary indexes, we shall denote the endpoint of a mass distribution by a breve accent, i.e.,m, which should be evocative of our parabola fit near the endpoint.

Results of the dedicated shape analyses
The results on the sensitivity of our observables on the top-quark mass and of m t on the Monte Carlo parameters are reported in Table 9.
First of all we would like to comment on the form used to present these results: in fact, whenever the sensitivity of an observable to a given parameter is very small, it becomes numerically challenging to determine it. In the following we will be interested in identifying parameters that have remarkable effects on the observables; therefore, in presence of a small sensitivity, we will content ourselves with stating that it is below the few per-mil level, ∆ (mt) θ < 0.003. In fact, for an observable to be used to measure m t , this level of sensitivity is clearly safe, as it implies that knowing just the order of magnitude of a given θ will be enough to warrant a 10 −3 precision on m t .
For the energy-peak fit, we find results rather close to those from the Mellin-moment analysis: regarding hadronization and showering uncertainties, determinations of m t from a moment or from energy peaks are on the same footing. In fact, top-quark decays t → W + B + hadrons are genuinely multi-body processes, hence the B-energy peak does not fully enjoy the process-independent "invariance" property of the peak in b-jet energy spectra [71,72], which, instead is typical for a two-body process such as t → j b W + with j b = B + hadrons.
For the B invariant mass, the endpoint analysis shows a very significant reduction of shower and hadronization sensitivity: in fact, the upper-bound on m B is roughly the same as for an ideal two-step two-body on-shell decay, i.e., B → b limit where no shower or hadronization effect comes into play. However, we remind that in practice the endpoint is extracted by fitting a certain amount of bulk region with a suitable template, e.g.,  parabola in Eq. (12). As discussed before, the endpoint measurement is indirectly affected by the Monte Carlo parameters via fit parameters c 1 and c 2 in Eq. (12) that are responsible for the detailed shape near the endpoint.
In the chosen range, the dependence of m B on hadronization and showering parameters is milder by an order of magnitude with respect to the Mellin moment analysis. In some cases, the sensitivity is even below the statistical uncertainty: for example, the binary choice of the conservation of momentum in the shower, which has a ∆ recoil ∼ 0.02 in the m B Mellin analysis, becomes completely negligible for the endpoint strategy. As for p T,min , a, b, and r B , the corresponding ∆ (mt) θ are O(10 −3 ), hence they are borderline to be irrelevant; α S,F SR and m b carry small impacts, although not completely negligible. It should be remarked that these results rely on using only portion of the spectrum in the parabola fit.
As for m T 2 , all variants perform in a very similar way, with sensitivity being (mostly) as reasonably small as ∆ ∼ O(10 −3 ). Very little difference appears between the "min" and "true" options, and therefore, it seems that the pairing or combinatorial issue does not make a worrisome impact on the mass measurement. The mET and ISR prescriptions show a comparable dependence on the Monte Carlo parameters, for both Band the B -subsystems (see also Appendix B for a complete list). The mET ones exhibit the smaller sensitivity to some parameters, while the ISR ones are less dependent on other variables. Therefore, both prescriptions should be considered equally good to determine m t . Again, these results hold for the analysis with relatively narrow fitting ranges towards the tails, yielding reliable parabola fits of the endpoints, even with no reference to the details of the reconstruction. As for m T 2 in the B-subsystem, given the little difference between the "min" and "true" cases in the B -subsystem, the use of the B-subsystem options loses some of its initial motivations. However, it exhibits a dependence on the Monte Carlo parameters very close to the lowest of m T 2 , hence it can be a useful validation of the B results.
One notable feature has to do with the "perp" correction applied to the variables denoted by the ⊥ subscript. In this subsystem, the ⊥-correction leads to a slight reduction of the sensitivity to the Monte Carlo parameters, in particular, to α S,F SR and m B for the mET prescription. By performing the ⊥-correction, any process-dependent effect in association with initial-state radiation, which is somehow convoluted with the Monte Carlo parameters of interest, is detached, so that the resulting sensitivity gets reduced. By contrast, such a reduction does not arise for the ISR-reconstruction scheme, since the ⊥-correction itself carries over its own sensitivity to the parameters through A +Ā. The ⊥-variants in the B subsystem are somewhat more involved: Here the inclusion of leptons in constructing the variables already mitigates the sensitivity, even before applying the ⊥-correction. It turns out that the net effect resulting from the ⊥-correction is not friendly in the B subsystem. Nevertheless, this class of variables are worth investigating, in the sense of probing sensitivities to the Monte Carlo parameters with any production-dependent effects decoupled.
Overall, we find a conclusion similar to the Mellin analysis: ISR in tt production, pairing leptons and B-hadrons, as well as the treatment of lightflavored hadrons in the b-jet, are all largely orthogonal from the viewpoint of the sensitivity to the Monte Carlo hadronization parameters.

Conclusions
The accurate measurement of the top-quark mass is one of the major goals for the physics program of the Large Hadron Collider. The level of precision that is demanded for m t to be useful in the context of physics of the Standard Model and in tests of new physics is around 500 MeV, well below the percent level.
Carrying out a measurement at such a level of accuracy is challenging and has generated a number of proposals. All strategies rely to some extent on our ability to predict some observables by means of a QCD calculation or a Monte Carlo simulation, e.g., a total or fiducial rate of top quark production, or the shapes of some differential distributions: all such methods exhibit an associated theoretical error. Presently, most precise determinations of m t use quantities that are believed to be less affected by theoretical errors, such as the peak of invariant mass obtained with kinematic (template) methods, which essentially tries to capture the peak in b-jet+W invariant mass in top-quark decay. In this case the most important systematic uncertainty is the jet energy scale, namely the correction necessary to measure the true jet energy from the measured detector response. These corrections have been the subject of intense studies and could be improved in future, but probably only modestly. Therefore, it seems that it may not be possible to improve significantly the measurement of the top-quark mass from these "standard" methods.
Motivated by the above drawbacks, a number of "alternative" strategies have been proposed and will acquire even more importance in the future. In fact, many of the alternative techniques pursue different strategies than the "standard" methods, and hence they will allow to cross-check the available and existing results, eventually giving rise to a global determination of m t , accounting for measurements performed by using inherently different strategies. Of course, the most useful measurements in this combination will be those that are not affected significantly by the jet energy correction, which presently dominates the uncertainty on m t . Among the proposals, there are measurements that utilize only leptons from semi-leptonic and dileptonic final states of tt decays and others that employ exclusive and semiexclusive hadronic final states. In both cases there is no reference to jets in the formulation of the observables that are used to extract m t from the comparison of data and calculations, and therefore the jet energy correction has a very minor effect in these measurements. These observables are, however, not free from uncertainties: in fact, when dealing with hadrons, one clearly encounters the difficulty to carry out theoretical predictions for such final states. For leptonic final states as well, there are residual theory errors because the distributions that one uses to extract m t are usually less sensitive to m t , but they still carry some non-negligible sensitivity to the theoretical modeling, e.g., to missing orders in perturbation theory or to the description of the production mechanism of the top quarks at the LHC.
The goal of our work was to quantify the uncertainty on the top-quark mass extraction due to our limited ability to compute observables for exclusive hadronic final states. We have considered a number of possible observables that can be defined on exclusive hadronic final states, both drawing from the literature and introducing some new ones, potentially useful to explore the achievable precision on m t from the study of hadrons in tt samples. We have concentrated our efforts on Monte Carlo event generators and the dependence of the predictions on their tunable parameters. As discussed throughout the paper, Monte Carlo codes are often used to derive predictions on exclusive hadronic final states at colliders and are often preferred over theoretical approaches, such as resummed calculations in the fragmentation function formalism, which, as discussed in the Introduction, tend have a difficulty i) to give predictions for generic observables, that can instead be obtained from Monte Carlo programs; ii) to give precise-enough predictions at least by using the currently available perturbative computations, which are typically NLO+NLL, and iii) to yield results at hadron level without relying on phenomenological non-perturbative models, which are often too simple to describe the data and eventually meet the LHC precision goals. Therefore we believe that a study of exclusive hadronic final states is best carried out employing Monte Carlo event generators.
We have simulated tt events by means of the PYTHIA 8 and HERWIG 6 programs, which differ considerably in the description of both emissions of soft and collinear partons in the so-called "parton shower" and of the formation of hadrons. Looking at both approaches, we are able to evaluate the peculiarities of the codes and identify the common issues which are to be addressed in the top-quark mass measurement strategies.
For both codes, we have evaluated the dependence of several observables on m t and on the most relevant Monte Carlo parameters. To simplify the understanding of the results, we have introduced a sensitivity measure ∆, defined in eq.(2), which allows to look in a uniform way to the vast number of results that we have collected in our tables. The sensitivity has been computed for the mean value of each quantity, calculated in a specific range, which corresponds approximately to the bulk of the distribution of the observable.
The observables belong roughly to the following classes: purely hadronic observables, e.g. the p T of the B-hadron, mixed leptonic-hadronic observables, e.g. the invariant mass m B , and purely leptonic observables, e.g. the lepton energy E . As could have been anticipated, the description of the hadronic final states is very important for purely hadronic observables and slightly less important for the mixed leptonic-hadronic ones. Leptonic quantities usually have so small dependence on the Monte Carlo parameters, that we observe a dependence compatible with zero within our statistical uncertainties arising from limited simulated data sample.
More quantitatively, we find that, in order to warrant a 0.5 GeV precision on the m t extraction from hadronic and mixed leptonic-hadronic observables, most parameters must be known with a precision around 10% in both PYTHIA and HERWIG: it is clear that such an accuracy cannot be attained without inputs from data. We will discuss shortly how the interplay between data, parameters and the measurement of the top quark mass might unfold. Before doing that, we highlight that for both PYTHIA and HERWIG it seems necessary to have an even more precise knowledge of the strong coupling constant used in the parton shower. For PYTHIA which uses a parameter named α s,F SR , corresponding to α s (m Z ) with a one-loop evolution, we find that hadronic and mixed leptonic-hadronic observables demand the knowledge of α s,F SR with a precision from 1% up to a few %. The necessity to fix α s so precisely clearly signals the need for improved theoretical predictions, which could be ultimately achieved by using beyond-LO hard matrix elements as well as a more accurate description of the parton shower, e.g., in the NLL approximation. In any case, even within our study with the present parton shower accuracy, the message is clear: the description of radiation is of utmost importance for the measurement of m t if one uses any observable that involves hadrons.
The striking precision with which it is necessary to know the Monte Carlo parameters motivated two further explorations in our analysis: i) the study of more robust observables, possibly less sensitive to the Monte Carlo parameters that necessarily act as nuisances in the m t determination; ii) the investigation of a strategy to derive directly from data the value of the Monte Carlo parameters to be used for the mass extraction.
For the former, we have studied the use of kinematic endpoints of certain masses that are known to carry information on the top quark mass. To this end, we investigated the endpoints of m B and of several variants of the m T 2 variable which can be constructed from the four observable final state particles (i.e., two B hadrons and two leptons) of the fully leptonic tt decay. A crucial observation drawn from our scrutiny is that studying the regions very close to the endpoint enables us to make predictions sufficiently independent from Monte Carlo parameters. The price to pay for this is to reduce the number of events to be analyzed, hence inflating the statistical uncertainties to a level jeopardizing the entire strategy. In other words, the resulting statistical uncertainty can be even beyond the level of theoretical uncertainty that we aim to achieve. Furthermore, it should be remarked that, when one concentrates on the region of the spectrum so close to the endpoint, the current limit in the Monte Carlo description, e.g., "off-shell" production of top quarks, not-calculated perturbative higher orders, which were assumed sub-dominant and hence neglected in our study, might start to play an important role. To alleviate this issue one would be forced to consider a larger portion of the spectrum, hence reintroducing a larger dependence on the Monte Carlo parameters. Seeking a balance between statistical and Monte Carlo uncertainties parameters will hence be a delicate task for the experimental collaborations.
Clearly, if one had a good knowledge of the Monte Carlo parameters, all the above-mentioned conclusions on the uncertainties in the top quark mass determination could be improved. For this reason we have studied what kind of constraints can be obtained from data if one tries to use a number of measured distributions in tt events in order to adjust the parameters in the Monte Carlo event generators. We have restricted a majority of our study to PYTHIA, expecting that the general conclusions and observations obtained with PYTHIA will be valid for HERWIG as well.
The effort of reproducing measured distributions varying the parameters might be considered as a "tuning" similar to the general-purpose tunings which exist for Monte Carlo generators. However, we remark that for the mass measurements outlined above it is necessary to "tune" α s to about 1% precision and other parameters around 10% precision. Therefore, one should take this tuning in part as an attempt to capture physics effects that go beyond the approximation inherent in the use of the parton shower approach. In other words, in the fitting procedure, one incorporates in the parameters all the effects due to the incomplete description of perturbative and non-perturbative physics in the event generator. Therefore, our tuning is quite likely to be specific to the top-quark events, while in general it may not be valid for other processes. Therefore, even if PYTHIA is eventually tuned to top-quark data, it will be very unlikely for such fits to be "universal", i.e., they are not expected to be reliable outside the very same data sample accounted for in the parameter adjustment. For this reason we would like better to call this procedure a "calibration" of the Monte Carlo variables, specific to tt processes. Such a calibration is ideally performed on the same data as what will be used to measure the top-quark mass, hence we might call this strategy a m t determination, in conjunction with an "in-situ" Monte Carlo calibration.
As for PYTHIA, we have studied seven parameters which affect the showering and hadronization modeling. We find that, in order to constrain them from tt data, one has to examine the detailed shape of several observables, as the study of more inclusive quantities (e.g., averages in a range) may not be enough. We further observe that it is necessary to carry out a global analysis with several quantities, as no single-observable exploration can constrain all the necessary parameters.
In summary, we find that using simple features, e.g., the mean of the spectra, to extract the top-quark mass requires good knowledge on the Monte Carlo parameters. Such parameters can be obtained from the very same data sample used to determine m t , but looking at observables which have sufficiently large dependence on the parameters and very little sensitivity to m t . This "in-situ" calibration of the Monte Carlo event generator might warrant a sufficiently small theoretical error to obtain m t from exclusive hadronic final states, but it may be necessary to carry out a detailed shape analysis of several observables at once. Despite correlations among the calibration quantities, it seems that the required precision can ultimately be achieved if one is able to measure the observable shapes with about 1% accuracy, which may be challenging. Furthermore, our results might be altered by experimental systematics and correlations among measurements that we have not evaluated (e.g., a correlation in the calibration observables due to common experimental systematics). For this reason, one might look at endpoints of some distributions which are known to be sensitive to the top quark mass, e.g., the endpoint of m B . For such observables, we have demonstrated that the effect of the Monte Carlo parameters can be suppressed at the cost of larger statistical uncertainties. It remains as a future work to pursue an optimal balance between endpoint analyses versus studies of the bulk of the distributions, in view of the performances that can be achieved through the "in-situ" Monte Carlo calibration.
The present work can also be extended along several lines. Although in the introduction we discussed the role that it plays in the uncertainty in the top-mass determination, color reconnection was not explored in our investigation. Nevertheless, studying the impact of color-reconnection models and parameters, along the lines of [48], on Mellin moments, peaks and endpoints of B-hadron observables is certainly mandatory for the sake of completeness. Furthermore, even if tt-production processes are mostly high transverse-momentum events, investigating the effect on the top mass of the modeling of the underlying event and of the implementation of multiple scatterings is definitely compelling.
Regarding Monte Carlo generators, the HERWIG 6 code, which we used through our analysis, is outdated and not supported anymore. Therefore, it will be much useful updating our investigation by employing the latest object-oriented HERWIG 7.1 code. In fact, as debated in the introduction, its description of bottom-quark fragmentation is not optimal, but, thanks to the implementation of the new dipole-shower model and a number of novel features for the purpose of the treatment of heavy quarks, the comparison with B-hadron data at e + e − machines has much improved with respect to previous versions [56]. Analyzing the dependence of calibration and m t -dependent observables on the parameters of HERWIG 7.1 will hence be very instructive.
Finally, an obvious extension of our study consists of using NLO+shower codes, such as the latest version of aMC@NLO [12], SHERPA [18] and especially POWHEG [13], which accounts for NLO corrections to top decays, width and non-resonant effects. We plan to explore our B-hadron observables by using aMC@NLO, POWHEG and SHERPA, in order to check their sensitivity to non-perturbative parameters and whether it is different from the case of standard LO event generators. In fact, on the one hand, one should expect that, thanks to the improved description of the perturbative part of the tt event simulation, the dependence on non-perturbative parameters should become milder. On the other hand, however, a recent analysis on the determination of the top mass [73] using NLO+shower codes has displayed substantial discrepancies according to whether the latest POWHEG version is interfaced to HERWIG 7 or PYTHIA 8 for parton cascades, hadronization and underlying event. Exploring the parameter dependence of calibration and top-mass observables yielded by POWHEG interfaced to HERWIG 7 or PYTHIA 8 should possibly help to shed light on the discrepancy in the mass extraction pointed out in [73]. This is in progress as well.

A Determination of the sensitivity matrix
In the regime in which the observables are linear functions of the parameters where D is a N O × N θ matrix. The matrix can be said to contain the derivatives of the observables w.r.t. the parameters Since dÕ = dO we will use them interchangeably at our convenience.
If we want to derive the parameters that correspond to a certain measurement of the observables, we should invert D, which may not be rigorously possible when D is not a square matrix. In place of the inverse, we use its pseudo-inverse [69,70], that we denote as D −1 , and which can be computed by a singular-value decomposition of D. The singular-value decomposition allows to write D as the product of three matrices D = Ω −1D P which can be understood as a change of basis in the vector parameter space operated by P , a change of basis in the observable space operated by Ω, and a matrix D which gives the linear dependence of the observables on the parameters in the new basis. The matrixD is diagonal, hence the matrix made of its reciprocal diagonal values can be used to define the pseudo-inverse in such a way thatDD = 1. With this definition we can write and ultimately fÕ = Λ O DΛ −1 θ f θ i .
The last line involves dimensionless fudge-factors f and the sensitivity matrix ∆ whose elements are Once we know the linear mapping between the dimensionless fudge factors, we can use eq. (11) to find out the covariance matrix of one as a function of the covariance of the others.
As an example, Table 10 presents the sensitivity matrix for the p T,B /p T,j b ratio in terms of the PYTHIA parameters, in each bin.

B Sensitivities and special features of m T 2
For completeness, we report the sensitivities of m t to shower and hadronization parameters, once it is extracted from a shape analysis of special features of the m T 2 variable, obtained according to other reconstruction schemes with respect to those presented in Table 9. The corresponding numbers for ∆  Table 11.    for an analysis extracting m t by using special features of the m t spectra, as discussed in the text. The range extremal values reported in the second column are all in GeV.