Reconstruction of the Higgs mass in events with Higgs bosons decaying into a pair of tau leptons using matrix element techniques

We present an algorithm for the reconstruction of the Higgs mass in events with Higgs bosons decaying into a pair of tau leptons. The algorithm is based on matrix element (ME) techniques and achieves a relative resolution on the Higgs boson mass of typically 15-20%. A previous version of the algorithm has been used in analyses of Higgs boson production performed by the CMS collaboration during LHC Run 1. The algorithm is described in detail and its performance on simulated events is assessed. The development of techniques to handle tau decays in the ME formalism represents an important result of this paper.


Introduction
A new boson of mass 125 GeV has been observed by the ATLAS and CMS collaborations [1,2]. The properties of the new particle are compatible with the predictions for the Standard Model (SM) Higgs (H) boson [3,4,5,6,7,8] within the present experimental uncertainties [9,10,11,12]. The observation of its decay into a pair of τ leptons, at a rate that is compatible with the SM expectation, has been reported recently [13,14,15]. The decay into a pair of τ leptons allows for the most precise measurement of the direct coupling of the SM H boson to fermions. Decays of heavy resonances into τ lepton pairs furthermore provide high sensitivity in searches for models with an extended Higgs sector, constituting an important experimental signature at the LHC.

The sensitivity of the SM H → τ τ analysis critically depends on the capability to distinguish the signal from a large irreducible background, arising from Z/γ* → τ τ Drell-Yan (DY) production. An important handle to separate the signal from the background is the mass of the τ lepton pair, which we denote by m τ τ . The signal is expected to show up as a small bump on the high mass tail of the m τ τ distribution of the background (see e.g. Figs. 8, 9, and 11 of Ref. [13]).
The separation of the signal from the background improves if the mass distribution for the signal is narrow. The SM predicts the total width of the H boson to be ≈ 4 MeV. Present experimental upper limits on the total width amount to ≈ 10 times the SM value [16,17]. These limits have been obtained by comparing the rates for off-shell versus on-shell H boson production and depend on certain assumptions. Direct, model independent, upper limits on the total H width, obtained by analyzing the mass spectra in H → ZZ → 4ℓ (ℓ = e, µ) and H → γγ events, are ≈ 1 GeV. In contrast, the width of the m τ τ distribution reconstructed in SM H → τ τ events typically amounts to ≈ 20 GeV and is entirely dominated by the experimental resolution.

Different methods for the reconstruction of m τ τ have been discussed in the literature [18,19,20,21,22]. The SVfit algorithm [23] has been used to reconstruct the H boson mass in the SM H → τ τ analysis, as well as in searches for further H bosons predicted by models beyond the SM, performed by the CMS collaboration during LHC Run 1 [24,25,13,26,27,28,29,30,31]. Compared to alternative mass variables, the usage of the SVfit algorithm has improved the sensitivity of the SM H → τ τ analysis for measuring the signal rate by ≈ 40% [13]. The improvement in sensitivity corresponds to a gain of about a factor of two in integrated luminosity of the analysed dataset.
In this paper we report on the development of an improved version of the SVfit algorithm. Two variants of the improved algorithm have been implemented. The first variant reconstructs the mass m τ τ of the τ lepton pair. It has been developed within the paradigm of the matrix element (ME) method [32,33]. Whereas the algorithm described in Ref. [23] uses a likelihood function of arbitrary normalization, the improved algorithm is based on a proper normalization within the formalism of the ME method. The second variant of the improved algorithm uses a likelihood function of arbitrary normalization. It allows for the reconstruction not only of the mass m τ τ of the τ lepton pair, but of any kinematic function of the two τ leptons, including the p T , η, φ, and transverse mass of the τ lepton pair. A further improvement concerns the extension of the algorithm to account for the experimental resolution on the reconstruction of hadrons that are produced in the τ decays.
The development of the formalism to handle τ decays in the ME method constitutes an important result of this paper. The paper is organized as follows: Section 2 describes the first variant of the improved algorithm, based on the ME method, and our treatment of the experimental resolution on the reconstruction of hadrons that are produced in the τ decays. The second variant of the algorithm is presented in Section 3. The performance of both variants of the improved algorithm in terms of achieved m τ τ resolution is compared to the previous version of the SVfit algorithm, used during LHC Run 1, and to selected alternative mass observables in Section 4. The results are discussed in Section 5. The paper concludes with a summary in Section 6.

The matrix element method
In the ME method, an estimate for the unknown model parameter Θ, in our case the mass m H of the H boson, is obtained by maximizing the probability density:

P(y|m_H) = (1/σ(m_H)) ∫ dx_a dx_b dΦ_n (f(x_a) f(x_b))/(2 x_a x_b s) (2π)^4 δ(x_a E_a + x_b E_b − Σ_i E_i) δ^3(x_a p_a + x_b p_b − Σ_i p_i) |M(p, m_H)|^2 W(y|p)   (1)

with respect to the parameter. The symbols E_a, E_b and p_a, p_b denote, respectively, the energies and momenta of the two colliding protons, √s represents the centre-of-mass energy, and x_a and x_b denote the Bjorken scaling variables of the two colliding partons; f(x) denotes the parton distribution functions (PDF) of the proton, and the transfer function (TF) W(y|p) quantifies the probability to observe the measured quantities y given the true momenta p. The production of H bosons at the LHC typically proceeds in association with jets. The treatment of events with hadronic activity, in which the H boson has non-zero p T , is detailed in Section 2.2. The TFs are described in Section 2.3 and the integration over the n-particle phase space is described in Section 2.4. The computation of the cross section σ(m H ) that is needed for a proper normalization of the probability density P(y|m H ) in Eq. (1) is described in Section 2.5. The numerical maximization of the probability density P(y|m H ) with respect to the mass m H of the H boson is described in Section 2.6. We denote this version of the SVfit algorithm by SVfitMEM.
We conclude this section with a description of an artificial term that we choose to add to the probability density P(y|m H ), in order to reduce tails in the m τ τ distribution reconstructed by the algorithm. The structure of this "regularization" term is described in Section 2.7. Distributions of m τ τ in simulated events, reconstructed with and without this term, are presented in Section 4.

As a consequence of using a LO ME to model the H boson production process pp → H, the efficiency ε(p, m H ) and the acceptance A(m H ) cannot be determined reliably, because they depend on the H boson p T spectrum. In particular for H bosons of low mass m H , the probability for the particles produced in the τ lepton decays to pass selection criteria on p T and η, which are necessitated by trigger requirements at the LHC, may vary significantly as a function of the H boson p T . For this reason, we will assume that m τ τ is reconstructed before any event selection is applied.

Matrix element
We decompose the squared modulus of the ME, |M(p, m H )|^2, for the process pp → H → τ τ with subsequent decay of the τ leptons into electrons, muons, or hadrons into five parts:

|M(p, m_H)|^2 = |M_pp→H→ττ(p, m_H)|^2 · |BW_τ^(1)|^2 · |M_τ→⋯^(1)(p)|^2 · |BW_τ^(2)|^2 · |M_τ→⋯^(2)(p)|^2 ,   (2)

where we use the superscripts (1) and (2) to refer to the τ lepton of positive and negative charge, respectively. The first term, |M_pp→H→ττ(p, m_H)|^2, represents the squared modulus of the ME for H boson production with subsequent decay of the H boson into a pair of τ leptons. This term can be computed by using automated tools such as CompHEP or MadGraph, or it can be taken from the literature. We take |M_pp→H→ττ(p, m_H)|^2 from the literature and decompose it into a product of three factors:

|M_pp→H→ττ|^2 = |M_gg→H|^2 · |BW_H|^2 · |M_H→ττ|^2 .
We model the H boson production using the LO ME for the gluon fusion process gg → H, which accounts for about 90% of the total H boson production rate at the LHC. The corresponding Feynman diagram is shown in Fig. 1. In the limit of a heavy top quark, the spin- and colour-averaged squared modulus of the ME reads [41]:

|M_gg→H|^2 = α_s^2 q_H^4 / (576 π^2 v^2) ,

where v denotes the vacuum expectation value of the Higgs field. The squared modulus of the Breit-Wigner propagator |BW_H|^2, given by:

|BW_H|^2 = 1 / ((q_H^2 − m_H^2)^2 + m_H^2 Γ_H^2) ,   (7)

associates H boson production and decay. In Eq. (7), the symbol q_H^2 = (E_τ(1) + E_τ(2))^2 − (p_τ(1) + p_τ(2))^2 denotes the squared mass of the τ lepton pair. The squared modulus of the ME for the decay of the H boson into a τ lepton pair, summed over the spin states of the two τ leptons, is given by:

|M_H→ττ|^2 = (2 m_τ^2 / v^2) (q_H^2 − 4 m_τ^2)   (8)

and is related to the branching ratio B(H → τ τ ) by [42]:

B(H → τ τ ) = β / (16 π m_H Γ_H) · |M_H→ττ|^2 ,  with β = √(1 − 4 m_τ^2 / m_H^2) .   (9)

For a SM H boson, the branching ratio B(H → τ τ ) becomes small and the total width Γ_H becomes large once the decay into a pair of W bosons is kinematically possible, i.e. for m_H ≳ 2 m_W. We remark that in theories beyond the SM, which motivate the search for additional heavy scalars, the branching ratio and total width may be very different from the SM values. In this paper, we assume B(H → τ τ ) = 100% and Γ_H = 10^−2 · m_H, and we compute |M_H→ττ|^2 according to Eq. (9). The branching ratio for the decay H → τ τ and the total width of the H boson actually have no effect on the value of m τ τ reconstructed by the algorithm, provided that B(H → τ τ ) and Γ_H are treated consistently when computing the squared modulus of the ME |M(p, m H )|^2 and the normalization factor 1/σ(m H ) in Eq. (1).
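As a minimal numerical illustration of the Breit-Wigner factor referenced in Eq. (7) and of the width assumption Γ_H = 10^−2 · m_H, the following Python sketch (not part of the SVfitMEM implementation; all values are illustrative) evaluates |BW_H|^2 and checks that it peaks at q_H^2 = m_H^2:

```python
def bw_higgs_sq(q2, m_h, gamma_h):
    """Squared modulus of the Breit-Wigner propagator,
    1 / ((q^2 - m_H^2)^2 + m_H^2 * Gamma_H^2), in GeV units."""
    return 1.0 / ((q2 - m_h ** 2) ** 2 + (m_h * gamma_h) ** 2)

m_h = 125.0
gamma_h = 1e-2 * m_h    # width assumption used in the paper
peak = bw_higgs_sq(m_h ** 2, m_h, gamma_h)       # maximum at q^2 = m_H^2
off_peak = bw_higgs_sq((m_h + 5.0) ** 2, m_h, gamma_h)
```

At the peak the propagator equals 1/(m_H Γ_H)^2, which makes the narrowness of the assumed width directly visible.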

Concerning the decay of the τ leptons, we use the narrow-width approximation (NWA) and for each τ lepton we take a separate and independent average over its spin states, effectively ignoring the correlation in spin orientations between the two τ leptons. In the NWA, the squared modulus of the Breit-Wigner propagator |BW_τ|^2 that associates τ lepton production and decay yields a δ-function:

|BW_τ^(i)|^2 = π / (m_τ Γ_τ) · δ(q_τ(i)^2 − m_τ^2) ,  with Γ_τ = ħ/∆t ,

where ∆t = 290 × 10^−15 s denotes the lifetime of the τ lepton [43]. The factor |M_τ→⋯^(i)(p)|^2 represents the decay of the i-th τ lepton. For the decays τ → e ν_e ν_τ and τ → µ ν_µ ν_τ, which we refer to as "leptonic" τ decays, we take the ME from the literature. Taking the average over the spin states of the τ lepton, the squared modulus of the ME is given by [44]:

|M_τ→ℓνν|^2 = 64 G_F^2 (p_τ · p_ν̄ℓ) (p_ℓ · p_ντ) .

The modelling of the decays τ → hadrons + ν τ by matrix elements is more difficult, due to the fact that τ leptons decay to a variety of hadronic final states, and because some of the decays proceed via intermediate meson resonances [43]. We refer to these decays as "hadronic" τ decays. The ME for the dominant hadronic τ decay modes are discussed in the literature [45,46]. In this paper, we use a simplified formalism and treat τ → hadrons + ν τ decays as two-body decays into a hadronic system τ h of momentum p vis and mass m vis plus a ν τ . The squared modulus of the ME for the decay is taken to be constant and denoted by |M^eff_τ→τhντ|^2. The value of |M^eff_τ→τhντ|^2 is chosen such that it reproduces the measured branching fraction for hadronic τ decays. The following relation holds for the considered case of a two-body decay:

B(τ → hadrons + ν_τ) = Γ(τ → τ_h ν_τ) / Γ_τ = |M^eff_τ→τhντ|^2 (m_τ^2 − m_vis^2) / (16 π m_τ^3 Γ_τ) ,

from which it follows that:

|M^eff_τ→τhντ|^2 = 16 π m_τ^3 Γ_τ B(τ → hadrons + ν_τ) / (m_τ^2 − m_vis^2) ,

with B(τ → hadrons + ν τ ) = 0.648 [43]. We have verified that the sum of all hadronic final states produced in τ lepton decays is well reproduced by our simplified model. Fig. 2 shows the fraction of τ lepton energy, in the laboratory frame, carried by the "visible" τ decay products:

z = E_vis / E_τ .   (14)

We use the term "visible" τ decay products to refer to the sum of all hadrons produced in decays of the type τ → hadrons + ν τ as well as to the electron or muon produced in the decays τ → e ν_e ν_τ and τ → µ ν_µ ν_τ, respectively.
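The two-body treatment of hadronic τ decays fixes the kinematic range of the visible energy fraction z. The following Python sketch (illustrative only; it assumes an isotropic decay in the τ rest frame and a ρ-like visible mass of 0.77 GeV) confirms that z stays within the expected bounds [m_vis^2/m_τ^2, 1]:

```python
import math, random

M_TAU = 1.777  # GeV, tau-lepton mass

def z_two_body(e_tau, m_vis, cos_theta_star):
    """Visible energy fraction z = E_vis/E_tau for the two-body decay
    tau -> tau_h + nu_tau; theta* is the decay angle in the tau rest
    frame w.r.t. the boost direction (neutrino treated as massless)."""
    e_star = (M_TAU ** 2 + m_vis ** 2) / (2.0 * M_TAU)   # E*_vis
    p_star = (M_TAU ** 2 - m_vis ** 2) / (2.0 * M_TAU)   # |p*|
    gamma = e_tau / M_TAU
    beta = math.sqrt(1.0 - 1.0 / gamma ** 2)
    # Boosted visible energy divided by the tau energy.
    return (e_star + beta * p_star * cos_theta_star) / M_TAU

random.seed(1)
m_vis = 0.77  # illustrative rho-like visible mass, GeV
zs = [z_two_body(50.0, m_vis, random.uniform(-1.0, 1.0))
      for _ in range(10000)]
```

For relativistic τ leptons the distribution of z fills the interval between m_vis^2/m_τ^2 and 1, which is the range visible in Fig. 2.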

Treatment of hadronic activity
The phase-space for QCD radiation is large at the centre-of-mass energies of the LHC and, as a consequence, particles with masses up to a few hundred GeV are typically produced in association with a sizeable hadronic activity [47]. We refer to the vectorial sum of all particles in the event that do not originate from the H boson decay as the "hadronic recoil" and denote the corresponding momentum by p rec . CMS, as well as ATLAS, have developed sophisticated techniques to improve the reconstruction of the hadronic recoil [48,49].

Figure 2: Fraction z of the τ lepton energy, in the laboratory frame, carried by visible τ decay products in simulated SM H → τ τ events of m H = 125 GeV. The case of τ → µ ν µ ν τ decays is shown on the left and the case of τ → hadrons + ν τ decays on the right. Our simplified model, which treats hadronic τ decays as two-body decays into a hadronic system τ h and a ν τ , reproduces the distribution in z obtained with a detailed Monte Carlo simulation of the τ decays, based on TAUOLA [50].

The effect of the hadronic recoil is to modify the kinematics of the H boson decay by boosting the τ leptons and their decay products, in the transverse plane and in longitudinal direction. The longitudinal component of p rec can be accounted for by a small adjustment of the Bjorken scaling variables x a and x b and does not cause a problem for modeling the H boson production by LO ME. However, the components of p rec in the transverse plane need to be accounted for. The LO ME that we use to model the production of the H boson is adequate for events in which the H boson has zero p T , implying the transverse components of p rec to be zero. In order to use the LO ME in analyses at the LHC, the momenta of the visible τ decay products and of the neutrinos produced in the τ decays need to be transformed into a frame in which the H boson has zero p T . The transformation is achieved by a Lorentz boost in the transverse plane, using (Ê^rec = p̂^rec_T, p̂^rec_x, p̂^rec_y, p̂^rec_z = 0) as boost vector. We denote the true values of energies and momenta by symbols with a hat if they are given in the laboratory frame, and by symbols with a tilde in case they are given in the zero transverse momentum frame of the H boson. Symbols with neither hat nor tilde denote values measured in the laboratory frame.
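The transverse boost described above can be sketched as a generic pure Lorentz boost in the transverse plane; the four-vector values below are illustrative, and the helper name is our own:

```python
import math

def boost_transverse(p4, bx, by):
    """Pure Lorentz boost of the four-vector p4 = (E, px, py, pz) with
    velocity (bx, by, 0); pz is unchanged by a transverse boost."""
    e, px, py, pz = p4
    b2 = bx * bx + by * by
    if b2 == 0.0:
        return p4
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = bx * px + by * py
    coef = (gamma - 1.0) * bp / b2 - gamma * e
    return (gamma * (e - bp), px + coef * bx, py + coef * by, pz)

# Illustrative Higgs-like four-vector (GeV) with non-zero pT; boosting
# with beta = pT/E along its own transverse direction zeroes its pT.
h = (160.0, 30.0, 40.0, 60.0)
m2_before = h[0] ** 2 - h[1] ** 2 - h[2] ** 2 - h[3] ** 2
hb = boost_transverse(h, h[1] / h[0], h[2] / h[0])
m2_after = hb[0] ** 2 - hb[1] ** 2 - hb[2] ** 2 - hb[3] ** 2
```

The boost leaves the longitudinal momentum and the invariant mass unchanged, which is why only the Bjorken scaling variables need a separate adjustment.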

The hadronic recoil is reconstructed with a typical resolution of 10-20 GeV at the LHC [48,49]. It is demonstrated in Ref. [47] that the imperfect reconstruction of the hadronic recoil in the detector may cause a significant bias on the H boson mass reconstructed by the ME method in case one ignores the experimental resolution on the hadronic recoil. We account for the experimental resolution by introduction of a TF W_rec(p^rec_x, p^rec_y | p̃^rec_x, p̃^rec_y). We then marginalize Eq. (1) with respect to p̃^rec_x and p̃^rec_y:

P(y; p^rec_x, p^rec_y | m_H) = (1/σ′(m_H)) ∫ dx_a dx_b dΦ_n dp̃^rec_x dp̃^rec_y (f(x_a) f(x_b))/(2 x_a x_b s) (2π)^4 δ(x_a Ê_a + x_b Ê_b − (Ê^rec + Ê^τ(1) + Ê^τ(2))) δ^3(x_a p̂_a + x_b p̂_b − (p̂^rec + p̂^τ(1) + p̂^τ(2))) |M(p̂, m_H)|^2 W(y|p̂) W_rec(p^rec_x, p^rec_y | p̃^rec_x, p̃^rec_y) .   (15)

The cross section σ(m_H) needs to be replaced by:

σ′(m_H) = σ(m_H) ∫ dp^rec_x dp^rec_y   (16)

in order for P(y; p^rec_x, p^rec_y|m_H) to have the correct dimensions. The symbol y refers to the measured momenta of the visible τ decay products, p^vis(1) and p^vis(2). Using the relations (Ê_a, p̂_a) = (√s/2)(1, 0, 0, 1) and (Ê_b, p̂_b) = (√s/2)(1, 0, 0, −1) for the energies and momenta of the two colliding protons, the integral over the δ-function that ensures the conservation of energy and momentum can be expressed by:

∫ dx_a dx_b dp̃^rec_x dp̃^rec_y δ(x_a Ê_a + x_b Ê_b − (Ê^rec + Ê^τ(1) + Ê^τ(2))) · δ^3(x_a p̂_a + x_b p̂_b − (p̂^rec + p̂^τ(1) + p̂^τ(2))) = (2/s) ∫ dp̃^rec_x dp̃^rec_y δ(p̃^rec_x + p̃^τ(1)_x + p̃^τ(2)_x) δ(p̃^rec_y + p̃^τ(1)_y + p̃^τ(2)_y)   (17)

with:

x_a = (Ê^rec + Ê^τ(1) + Ê^τ(2) + p̂^rec_z + p̂^τ(1)_z + p̂^τ(2)_z)/√s  and  x_b = (Ê^rec + Ê^τ(1) + Ê^τ(2) − p̂^rec_z − p̂^τ(1)_z − p̂^τ(2)_z)/√s .   (18)

The integration over dp̃^rec_x and dp̃^rec_y removes the δ-functions δ(p̃^rec_x + p̃^τ(1)_x + p̃^τ(2)_x) and δ(p̃^rec_y + p̃^τ(1)_y + p̃^τ(2)_y) in Eq. (17). Substituting Eq. (2) into Eq. (15), we obtain:

P(p^vis(1), p^vis(2); p^rec_x, p^rec_y | m_H) = (1/σ′(m_H)) (2π)^4 (2/s) ∫ dΦ_n (f(x_a) f(x_b))/(2 x_a x_b s) |M_pp→H→ττ(p̃, m_H)|^2 |BW_τ^(1)|^2 |M_τ→⋯^(1)(p̃)|^2 |BW_τ^(2)|^2 |M_τ→⋯^(2)(p̃)|^2 W(p^vis(1)|p̂^vis(1)) W(p^vis(2)|p̂^vis(2)) W_rec(p^rec_x, p^rec_y | p̃^rec_x, p̃^rec_y)   (19)

with x_a and x_b given by Eq. (18) and:

p̃^rec_x = −(p̃^τ(1)_x + p̃^τ(2)_x)  and  p̃^rec_y = −(p̃^τ(1)_y + p̃^τ(2)_y) .   (20)

The form of the TF for the hadronic recoil, W_rec(p^rec_x, p^rec_y | p̃^rec_x, p̃^rec_y), is discussed in Section 2.3.3.
In practice, the experimental resolution on the momenta of electrons, muons and also τ h is typically negligible compared to the resolution on the hadronic recoil. To good approximation, the resolution on the hadronic recoil is equivalent to the resolution on the vectorial sum of the momenta, in the transverse plane, of all particles reconstructed in the event. The latter is referred to as "missing transverse momentum" and denoted by p miss T . The following relations hold for the components of the p miss T vector:

p^miss_x = −(p^rec_x + p^vis(1)_x + p^vis(2)_x)  and  p^miss_y = −(p^rec_y + p^vis(1)_y + p^vis(2)_y) .   (21)

These relations are valid on reconstruction level as well as for the true values of the momenta, i.e. Eq. (21) is valid also in case all p_x and p_y are replaced by p̂_x and p̂_y. Approximating the resolution on the hadronic recoil by the resolution on p miss T has the advantage that the resolutions on p miss T have been studied in detail and published by the ATLAS as well as CMS collaborations [48,49].
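Eq. (21) amounts to a component-wise sum; a minimal sketch, with illustrative transverse momenta in GeV:

```python
def missing_pt(p_rec, p_vis_list):
    """p_miss components from the hadronic recoil and the visible tau
    decay products: p_miss = -(p_rec + sum of visible), per component."""
    px = -(p_rec[0] + sum(p[0] for p in p_vis_list))
    py = -(p_rec[1] + sum(p[1] for p in p_vis_list))
    return (px, py)

# Illustrative values: the recoil balances most of the tau pair in the
# transverse plane, and p_miss picks up the neutrino momenta.
p_miss = missing_pt((-55.0, 10.0), [(30.0, -5.0), (20.0, -8.0)])
```

Because only the neutrinos escape detection, p_miss equals the summed transverse momenta of the neutrinos up to resolution effects.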

Transfer functions
The treatment of the experimental resolution on the momenta of electrons, muons and τ h produced in the τ decays, as well as of the hadronic recoil, is detailed in this section. The resolution on the momentum p vis(i) of the visible τ decay products is parametrized as a function of the true momentum p̂ vis(i) and modeled by the TF W(p vis(i) | p̂ vis(i) ). The cases of hadronic and leptonic τ decays are described in Sections 2.3.1 and 2.3.2, respectively. The neutrinos produced in the τ decays are not included in the TF, but are handled separately, by the formalism detailed in Sections 7.1 and 7.2 of the appendix. The TF W_rec(p^rec_x, p^rec_y | p̃^rec_x, p̃^rec_y) that we use to model the experimental resolution on the hadronic recoil is discussed in Section 2.3.3.

Hadronic τ decays
The energy, or equivalently the p T , of the τ h is reconstructed with a resolution of 5-25% at the LHC [51,52]. The resolution typically varies as a function of p T and η and may additionally depend on the multiplicity of the charged and neutral hadrons produced in the τ decay.

The resolution on the p T of the τ h is similar in magnitude to the resolution on m τ τ that we aim to achieve and needs to be taken into account by a suitable TF, denoted by W_h(p^vis_T | p̂^vis_T), when evaluating the integral in Eq. (19).
ATLAS as well as CMS report that the energy response for hadronic τ decays may be asymmetric [53,54]. For the purpose of this paper, we assume the TF W_h(p^vis_T | p̂^vis_T) to follow a Gaussian distribution within the core region x = p^vis_T / p̂^vis_T ≈ 1 and to feature non-Gaussian tails, which follow power-law functions, on both sides of the Gaussian core. More specifically, we use a double-sided Crystal-Ball-type form:

W_h(p^vis_T | p̂^vis_T) = N · f(t) ,  with t = (x − x_0)/σ and
f(t) = exp(−t^2/2)        for −α_1 ≤ t ≤ α_2 ,
f(t) = A_1 (B_1 − t)^−n_1  for t < −α_1 ,
f(t) = A_2 (B_2 + t)^−n_2  for t > α_2 ,   (22)

where the constants A_1, B_1, A_2, B_2 are fixed by requiring continuity of the function and of its first derivative at the transition points t = −α_1 and t = α_2. The factor N is determined by the requirement that the function W_h(p^vis_T | p̂^vis_T) satisfies the normalization condition ∫ dp^vis_T W_h(p^vis_T | p̂^vis_T) = 1 for any given value of p̂^vis_T. The TF given by Eq. (22) is visualized in Fig. 3. The resolution on the direction of the τ h is on the level of a few milliradians and is negligible in practice. We hence model the TF for the momentum of the τ h by the product of Eq. (22) and two δ-functions:

W_h(p^vis | p̂^vis) = sin^2 θ^vis/(p^vis_T)^2 · W_h(p^vis_T | p̂^vis_T) · δ(θ^vis − θ̂^vis) · δ(φ^vis − φ̂^vis) .   (23)

The factor sin^2 θ^vis/(p^vis_T)^2 is needed to ensure the correct normalization of the TF:

∫ d^3p^vis W_h(p^vis | p̂^vis) = ∫ dp^vis_x dp^vis_y dp^vis_z W_h(p^vis | p̂^vis) = ∫ dp^vis_T dθ^vis dφ^vis (p^vis_T)^2/sin^2 θ^vis · W_h(p^vis | p̂^vis) = 1 ,

where the factor (p^vis_T)^2 / sin^2 θ^vis corresponds to the Jacobian of the variable transformation from (p^vis_x = p^vis_T cos φ^vis, p^vis_y = p^vis_T sin φ^vis, p^vis_z = p^vis_T / tan θ^vis) to (p^vis_T, θ^vis, φ^vis).

A few words of explanation are in order concerning the treatment of the mass m vis of the hadronic system τ h produced in τ → hadrons + ν τ decays. Within the scope of this paper, we will assume that m vis can be reconstructed with negligible experimental resolution. The τ h reconstruction algorithm of the CMS experiment [52] allows one to reconstruct the mass of the τ h . The experimental resolution on m vis can be taken into account by adding, to Eq. (19), a suitable TF W_h(m vis | m̂ vis ) and a marginalization with respect to the true mass m̂ vis of the τ h . In case of the CMS experiment, the effect of the resolution on m vis on the reconstruction of m τ τ is found to be small.
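The shape described above, a Gaussian core with power-law tails, can be sketched as a double-sided Crystal-Ball-type function. In this sketch the parameter values (core width, transition points, tail indices) are illustrative and not the tuned ones, the tails are matched to the core in value only, and a simple Riemann sum stands in for the normalization factor N:

```python
import math

def w_h(x, sigma=0.10, a_lo=1.0, n_lo=4.0, a_hi=1.5, n_hi=6.0):
    """Gaussian core around x = pT_vis/pT_vis_true ~ 1 with power-law
    tails on both sides; tails matched to the core in value at the
    transition points t = -a_lo and t = +a_hi."""
    t = (x - 1.0) / sigma
    if t < -a_lo:
        amp = math.exp(-0.5 * a_lo ** 2)
        return amp * ((n_lo / a_lo) / (n_lo / a_lo - a_lo - t)) ** n_lo
    if t > a_hi:
        amp = math.exp(-0.5 * a_hi ** 2)
        return amp * ((n_hi / a_hi) / (n_hi / a_hi - a_hi + t)) ** n_hi
    return math.exp(-0.5 * t * t)

# Numerical normalization over x in [0, 3] (stands in for N).
step = 0.001
xs = [i * step for i in range(3001)]
norm = sum(w_h(x) for x in xs) * step
```

In the algorithm the normalization is imposed for each value of the true p T separately; here the one-dimensional sum is enough to illustrate the idea.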
The τ h reconstruction algorithm used by the ATLAS experiment during LHC Run 1 [51] does not provide a reconstruction of the mass of the τ h . In case m vis cannot be reconstructed, one needs to add, to Eq. (19), a marginalization with respect to m vis .

Leptonic τ decays
We assume that the p vis T , η vis , and φ vis of the electrons and muons produced in τ decays are measured with negligible experimental resolution. Correspondingly, we model the TF by a product of three δ-functions:

W_ℓ(p^vis | p̂^vis) = sin^2 θ^vis/(p^vis_T)^2 · δ(p^vis_T − p̂^vis_T) · δ(θ^vis − θ̂^vis) · δ(φ^vis − φ̂^vis) ,

where the factor sin^2 θ^vis/(p^vis_T)^2 again ensures the normalization ∫ d^3p^vis W_ℓ(p^vis | p̂^vis) = 1.

Hadronic recoil
We use a two-dimensional normal distribution:

W_rec(p^rec_x, p^rec_y | p̃^rec_x, p̃^rec_y) = 1/(2π √|V|) · exp(−(1/2) (∆p^rec_x, ∆p^rec_y) V^−1 (∆p^rec_x, ∆p^rec_y)^T)

to model the experimental resolution on the momentum, in the transverse plane, of the hadronic recoil. The symbols:

∆p^rec_x = p^rec_x − p̃^rec_x  and  ∆p^rec_y = p^rec_y − p̃^rec_y   (27)

refer to the differences between reconstructed and true values of the momentum in x- and y-direction, and the covariance matrix:

V = ( σ_x^2       ρ σ_x σ_y
      ρ σ_x σ_y   σ_y^2 )   (28)

accounts for the fact that the resolutions σ_x and σ_y in x- and y-direction may be correlated, with the correlation coefficient denoted by ρ. The symbol |V| denotes the determinant of V.
Two-dimensional normal distributions have been demonstrated to model well the resolution on p miss T in case of the CMS experiment [48,55]. As explained in Section 2.2, the resolution on p rec x and p rec y is very similar to the resolution on p miss x and p miss y , motivating the use of a TF of the same type for the modelling of the experimental resolution on the hadronic recoil.
For the purpose of this paper, we assume that the components p rec x and p rec y are reconstructed with a resolution of σ x = σ y = 10 GeV each and that the differences ∆p rec x and ∆p rec y between reconstructed and true values of both components are uncorrelated, i.e. ρ = 0. In the real experiment, the covariance matrix V is computed on an event-by-event basis, based on the reconstructed hadronic activity in a given event, using resolution functions obtained from the Monte Carlo simulation [48,55].
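The two-dimensional normal TF described above is straightforward to evaluate; this sketch uses the σ_x = σ_y = 10 GeV, ρ = 0 assumption stated in the text, while also accepting a non-zero correlation coefficient:

```python
import math

def w_rec(dx, dy, sx=10.0, sy=10.0, rho=0.0):
    """Two-dimensional normal density for the recoil residuals
    dx = p_rec_x - p_rec_x_true, dy = p_rec_y - p_rec_y_true (GeV),
    with resolutions sx, sy and correlation coefficient rho."""
    det = (sx * sy) ** 2 * (1.0 - rho ** 2)        # determinant |V|
    chi2 = ((dx / sx) ** 2
            - 2.0 * rho * (dx / sx) * (dy / sy)
            + (dy / sy) ** 2) / (1.0 - rho ** 2)
    return math.exp(-0.5 * chi2) / (2.0 * math.pi * math.sqrt(det))

peak = w_rec(0.0, 0.0)   # density at zero residual: 1/(2 pi sx sy)
```

With ρ = 0 the density factorizes into two one-dimensional Gaussians, which is the configuration assumed throughout this paper.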

Computation of phase space integral
The integration over the differential n-particle phase space element dΦ n in Eq. (19) needs to be done differently for events in which both τ leptons decay hadronically ("hadronic" τ pair decays), events in which one τ lepton decays hadronically and one τ lepton decays leptonically ("semi-leptonic" τ pair decays), and events in which both τ leptons decay leptonically ("leptonic" τ pair decays). With the approximation that we treat hadronic τ decays as two-body decays, the integral over dΦ n is of dimension 12 in case of hadronic τ pair decays, 15 in case of semi-leptonic τ pair decays, and 18 in case of leptonic τ pair decays. The differential phase space element reads:

dΦ_n = ∏_{i=1}^{n} d^3 p̂_i / ((2π)^3 2 Ê_i) ,

where:

Ê_i = √(|p̂_i|^2 + m_i^2)

and the product extends over all particles in the final state. The integrand in Eq. (19) depends on the four-momenta of the two τ leptons via the product f(x_a) f(x_b) of the PDF, the factor 1/(2 x_a x_b s), referred to as "flux factor" in the literature, and via the squared modulus |M_pp→H→ττ(p, m_H)|^2 of the ME for H boson production and subsequent decay into a τ lepton pair. In addition, it depends on the four-momenta of the visible τ decay products via the squared moduli |M_τ→⋯^(1)(p)|^2 and |M_τ→⋯^(2)(p)|^2 of the ME for the τ lepton decays and via the transfer functions W(p^vis(1) | p̂^vis(1)) and W(p^vis(2) | p̂^vis(2)). The τ lepton energies and momenta need to be computed as a function of the integration variables.
The dimension of the integration over the phase-space elements d^3 p_ν(i) (hadronic τ decays) and d^3 p_ν̄(i) d^3 p_ν(i) (leptonic τ decays) can be reduced by means of analytic transformations. Two variables are sufficient to fully parametrize the kinematics of hadronic τ decays. In case of leptonic τ decays, three variables are sufficient. We choose to parametrize hadronic τ decays by the variables z and φ inv . The variable z represents the fraction of τ lepton energy, in the laboratory frame, that is carried by the visible τ decay products (cf. Eq. (14)). We denote the energy and momentum of the τ neutrino produced in hadronic τ decays, as well as of the neutrino pair produced in leptonic τ decays, by the symbols E inv and p inv . The energy component E inv is related to the variable z via:

E_inv = (1 − z)/z · E_vis .   (31)

The angle θ inv between the p inv vector and the p vis vector is related to the variable z as well, and is given by Eq. (48) in case of hadronic τ decays and by Eq. (67) in case of leptonic τ decays. The variable φ inv specifies the orientation of the p inv vector relative to the direction of the visible τ decay products. In case of hadronic τ decays, the magnitude of the p inv vector is equal to E inv . We choose the mass m inv of the neutrino pair as third variable to parametrize the kinematics of leptonic τ decays, so that the magnitude of the p inv vector is given by |p_inv| = √(E_inv^2 − m_inv^2). With the convention that m inv = 0 for hadronic τ decays, Eqs. (48) and (67) can be expressed by a common form that is valid for hadronic as well as for leptonic τ decays:

cos θ_inv = (2 E_vis E_inv − (m_τ^2 − m_vis^2 − m_inv^2)) / (2 |p_vis| |p_inv|) .   (32)

The τ lepton momentum vector is given by the sum of the p vis and p inv vectors. The angles θ inv and φ inv are illustrated in Fig. 4. The p inv vector is located on the surface of a cone, the axis of which is given by the p vis vector and the opening angle of which is given by Eq. (32). The variable φ inv represents the angle of rotation, in counter-clockwise direction, around the axis of the cone.
The value φ inv = 0 is chosen to correspond to the case that the p inv vector is within the plane spanned by the p vis vector and the beam direction.

Figure 4: Illustration of the variables θ inv and φ inv that specify the orientation of the p inv vector relative to the momentum vector p vis of the visible τ decay products.
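The parametrization above can be made concrete: given the visible energy E_vis, the fraction z fixes E_inv = E_vis (1 − z)/z, and the τ mass constraint m_τ^2 = (p_vis + p_inv)^2 fixes the opening angle of the cone. A Python sketch with illustrative values (m_inv = 0, as for hadronic decays; the helper name is our own):

```python
import math

M_TAU = 1.777  # GeV, tau-lepton mass

def inv_kinematics(e_vis, p_vis, m_vis, z, m_inv=0.0):
    """Energy, momentum, and polar angle (w.r.t. the visible system) of
    the invisible system, from the energy fraction z and the constraint
    m_tau^2 = (p_vis + p_inv)^2. m_inv = 0 for hadronic decays."""
    e_inv = e_vis * (1.0 - z) / z                  # since E_tau = E_vis/z
    p_inv = math.sqrt(e_inv ** 2 - m_inv ** 2)
    cos_theta = ((2.0 * e_vis * e_inv
                  - (M_TAU ** 2 - m_vis ** 2 - m_inv ** 2))
                 / (2.0 * p_vis * p_inv))
    return e_inv, p_inv, cos_theta

# Illustrative hadronic decay: check that the visible plus invisible
# system combines to the tau-lepton mass.
e_vis, m_vis, z = 20.0, 0.77, 0.6
p_vis = math.sqrt(e_vis ** 2 - m_vis ** 2)
e_inv, p_inv, ct = inv_kinematics(e_vis, p_vis, m_vis, z)
st = math.sqrt(1.0 - ct ** 2)
m_tau_rec = math.sqrt((e_vis + e_inv) ** 2
                      - (p_vis + p_inv * ct) ** 2 - (p_inv * st) ** 2)
```

The closure on m_τ holds for every choice of φ inv, which is why the azimuthal angle remains a free integration variable.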
The parametrization of the τ decay kinematics by p vis and the variables z and φ inv , respectively by z, φ inv , and m inv , allows one to simplify the evaluation of the integral in Eq. (19) considerably. Expressions for the products of the differential phase space elements dΦ^(i) with the squared moduli of the ME, |BW_τ^(i)|^2 · |M_τ→⋯^(i)(p)|^2, are derived in the appendix; they can be combined into the common form:

|BW_τ^(1)|^2 |M_τ→⋯^(1)(p)|^2 dΦ^(1) · |BW_τ^(2)|^2 |M_τ→⋯^(2)(p)|^2 dΦ^(2) = ∏_{i=1,2} f^(i) dz^(i) dφ_inv^(i) (dm_inv^(i)) ,   (33)

where f^(i) = f_h in case the i-th τ lepton decays hadronically and f^(i) = f_ℓ in case it decays leptonically, and the differential dm_inv^(i) is present for leptonic decays only. The functions f_h and f_ℓ are given by Eqs. (52) and (68). Eq. (33) represents the quintessence of what is needed in order to extend the ME generated by automated tools such as CompHEP or MadGraph by the capability to handle the τ decays. Instead of performing an integration over d^3 p_τ(1) d^3 p_τ(2), which treats the τ leptons as stable particles, one needs to perform the integration over the τ decay phase space according to Eq. (33). The momenta of both τ leptons need to be computed as a function of the integration variables z^(1), φ_inv^(1), z^(2), φ_inv^(2), and m_inv^(i), using Eqs. (31) and (32), where m_inv^(i) is equal to zero in case the i-th τ lepton decays hadronically.

The τ lepton momenta can then be used to evaluate the product of the PDF, the flux factor, and the squared modulus |M pp→H→τ τ (p, m H )| 2 of the ME for H boson production and subsequent decay into a τ lepton pair in Eq. (19).
In order to improve the accuracy of the numerical integration, we perform a further variable transformation, replacing z^(2) by the variable t_H , defined below. The transformation is executed in two steps. First, we replace z^(2) by:

z^(2) = q_vis^2 / (z^(1) q_H^2) ,   (34)

with q vis denoting the mass of the visible decay products of both τ leptons. Following Eq. (8) in Ref. [47], we then parametrize q_H^2 by:

q_H^2 = m_H^2 + m_H Γ_H tan(t_H) .   (35)

The form of the variable transformation in Eqs. (34) and (35) is chosen such that the Jacobi factor of the transformation from z^(2) to t_H is proportional to the inverse of |BW_H|^2, the squared modulus of the Breit-Wigner propagator of the H boson in Eq. (7). The Jacobi factor is given by:

dq_H^2/dt_H = m_H Γ_H (1 + tan^2(t_H)) = ((q_H^2 − m_H^2)^2 + m_H^2 Γ_H^2) / (m_H Γ_H) .   (36)

Compared to Ref. [47], we differ by a factor 1/π in the derivative dq_H^2/dt_H.
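The purpose of this transformation is that the Jacobi factor cancels the Breit-Wigner peak, so the integrand becomes flat in t_H. A short sketch of the standard substitution q^2 = m_H^2 + m_H Γ_H tan(t), with an illustrative mass and width:

```python
import math

def bw_sq(q2, m, g):
    """|BW|^2 = 1 / ((q^2 - m^2)^2 + m^2 g^2)."""
    return 1.0 / ((q2 - m * m) ** 2 + (m * g) ** 2)

m, g = 125.0, 1.25  # illustrative mass and width (GeV)
vals = []
for i in range(1, 10):
    t = -1.4 + 2.8 * i / 10.0            # t inside (-pi/2, pi/2)
    q2 = m * m + m * g * math.tan(t)     # substitution q^2(t)
    jac = m * g * (1.0 + math.tan(t) ** 2)   # Jacobi factor dq^2/dt
    vals.append(bw_sq(q2, m, g) * jac)
# |BW|^2 * dq^2/dt is exactly flat in t, equal to 1/(m*g).
```

A flat integrand is easy for any numerical integrator to sample, which is the rationale for choosing this mapping.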

Computation of σ (m H )
According to the paradigm of the ME method, the normalization factor 1/σ′(m H ) in Eq. (19) is to be computed by evaluating the integral:

σ′(m_H) = ∫ dp^vis(1) dp^vis(2) dp^rec_x dp^rec_y ∫ dΦ_n (f(x_a) f(x_b))/(2 x_a x_b s) (2π)^4 (2/s) |M_pp→H→ττ(p, m_H)|^2 ∏_{i=1,2} |BW_τ^(i)|^2 |M_τ→⋯^(i)(p)|^2 · W(p^vis(1) | p̂^vis(1)) W(p^vis(2) | p̂^vis(2)) W_rec(p^rec_x, p^rec_y | p̃^rec_x, p̃^rec_y) .   (37)

Of the factors in the integrand of Eq. (37), only the TF depend on the measured momenta p vis(1) and p vis(2) of the visible decay products of the two τ leptons and on the measured momentum components p rec x and p rec y of the hadronic recoil. All other factors depend solely on the true values of the momenta. According to the TF normalization conditions,

∫ dp^vis(i) W(p^vis(i) | p̂^vis(i)) = 1  and  ∫ dp^rec_x dp^rec_y W_rec(p^rec_x, p^rec_y | p̃^rec_x, p̃^rec_y) = 1 ,

the integration over the measured quantities can be carried out, so that Eq. (37) becomes:

σ′(m_H) = ∫ dΦ_n (f(x_a) f(x_b))/(2 x_a x_b s) (2π)^4 (2/s) |M_pp→H→ττ(p, m_H)|^2 ∏_{i=1,2} |BW_τ^(i)|^2 |M_τ→⋯^(i)(p)|^2 .   (39)

The integral in Eq. (39) is computed numerically, for H boson masses m H ranging from 50 to 5000 GeV in steps of 1 GeV. The numerical integration is performed using the VAMP algorithm [56], an improved implementation of the VEGAS algorithm [57]. The result is used for the purpose of normalizing the probability density P(p vis(1) , p vis(2) ; p rec x , p rec y |m H ) in Eq. (19).
The cross sections σ′(m H ) computed in this way cannot be directly compared to literature values, because we apply the LO ME to events in which the H boson has non-zero p T .
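The role that VAMP/VEGAS plays for the phase-space integral of Eq. (39) can be illustrated with a plain (non-adaptive) Monte Carlo estimate over the unit hypercube; the toy integrand and sample size below are illustrative only:

```python
import random

def mc_integrate(f, dim, n, rng):
    """Plain Monte Carlo estimate of the integral of f over the unit
    hypercube [0, 1]^dim. VAMP/VEGAS additionally adapts the sampling
    density to the integrand; this sketch omits that refinement."""
    total = 0.0
    for _ in range(n):
        total += f([rng.random() for _ in range(dim)])
    return total / n

rng = random.Random(7)
# Toy integrand with unit integral over the unit square.
f = lambda x: 36.0 * x[0] * (1.0 - x[0]) * x[1] * (1.0 - x[1])
estimate = mc_integrate(f, 2, 20000, rng)
```

For a strongly peaked integrand, such as the Breit-Wigner factors above, the adaptive importance sampling of VEGAS-type algorithms reduces the variance of this estimate dramatically, which is why they are used here.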

Determination of m τ τ
The best estimate, m τ τ , for the mass of the τ lepton pair in a given event is obtained by computing the probability density P(p vis(1) , p vis(2) ; p rec x , p rec y |m test(i) H ), given by Eq. (19), for a series of mass hypotheses m test(i) H . For each mass hypothesis, the integrand is evaluated 20 000 times. The series of mass hypotheses is defined by a recursive relation:

m_H^test(i+1) = (1 + δ) · m_H^test(i) ,  with m_H^test(0) = m_vis ,

where m vis denotes the mass of the visible τ decay products. The step size δ is chosen such that it is small compared to the expected resolution on m H , which typically amounts to 15-20% relative to the true mass of the τ lepton pair (cf. Section 4). In order to reduce the computing time, the series is computed in two passes. The purpose of the first pass, which uses a step size δ = 0.10, is to find an approximate value of m τ τ that maximizes the probability density P. The series of mass hypotheses that is used in the second pass proceeds with a finer step size in the region around this approximate value. The probability density P is fitted by a second order polynomial in the region around the maximum, and m τ τ is taken to be the point at which the polynomial reaches its maximum.
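The two-pass scan and the final polynomial fit can be sketched as follows; the toy probability density, mass range, and the second-pass step size are illustrative, while the multiplicative grid and the coarse step δ = 0.10 mirror the description above:

```python
def scan_maximum(logp, m_start, m_stop, delta):
    """Evaluate logp on the multiplicative grid m_{i+1} = (1+delta)*m_i
    and return the grid point with the largest value."""
    m, best_m, best_v = m_start, m_start, float("-inf")
    while m < m_stop:
        v = logp(m)
        if v > best_v:
            best_m, best_v = m, v
        m *= 1.0 + delta
    return best_m

def parabola_vertex(f, m, h):
    """Fit a second-order polynomial through (m-h, m, m+h) and return
    the position of its extremum."""
    y0, y1, y2 = f(m - h), f(m), f(m + h)
    return m - 0.5 * h * (y2 - y0) / (y0 - 2.0 * y1 + y2)

# Toy log probability density peaked at m = 125 GeV (illustrative).
logp = lambda m: -(((m - 125.0) / 20.0) ** 2)
coarse = scan_maximum(logp, 50.0, 600.0, 0.10)               # first pass
fine = scan_maximum(logp, coarse / 1.1, coarse * 1.1, 0.01)  # second pass
m_best = parabola_vertex(logp, fine, 0.01 * fine)
```

The multiplicative grid keeps the relative step size constant, matching the roughly constant relative resolution on m H over the scanned mass range.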

Graphs of the probability density P(p vis(1) , p vis(2) ; p rec x , p rec y |m test(i) H ) are shown in Fig. 6 for two exemplary events. The events are drawn from a simulated sample of Z/γ* → τ τ background events, produced as described in Section 4. In the first event ("0-jet") the Z boson has little p T , while in the second event ("1-jet boosted") the Z boson recoils against a high p T jet. The scale of the ordinate is adjusted such that the probability density is equal to one for the value of m test(i) H that maximises P. The width of the graph reflects the experimental resolution on m τ τ . As will be explained in more detail in Section 4, the resolution on m τ τ improves considerably in case the Z boson has high p T . The graphs of the probability density are superimposed for two cases: for the case that the artificial regularization term described in Section 2.7 is used and for the case that it is not used. In the Z/γ* → τ τ background event in which the Z boson has little p T , the probability density decreases slowly as a function of the mass hypothesis if no artificial regularization term is used. As a consequence, such events have a sizeable probability for the reconstructed m τ τ to populate, due to resolution effects, the region in which the Higgs boson signal is searched for.

Artificial regularization term
The SM H → τ τ signal is produced at a rate about three orders of magnitude smaller than the irreducible Z/γ* → τ τ background. As illustrated in Fig. 5, the production rate for hypothetical heavy resonances, such as heavy pseudoscalar Higgs bosons or heavy spin 1 resonances, typically decreases steeply with mass. Physics analyses at the LHC will soon start to probe signal cross sections of order 1 fb. In order to maintain high sensitivity for the SM H → τ τ signal as well as for hypothetical heavy resonances, it is imperative to reduce, as much as possible, the high mass tails in the m τ τ distribution reconstructed in Z/γ* → τ τ background events, since even tails at or below the per mille level may compete with, or even dwarf, potential signals.
As exemplified in Fig. 6, high mass tails in the m τ τ distribution for the Z/γ* → τ τ background predominantly arise from events in which the Z boson has little p T and the probability density P decreases slowly as a function of m test(i) H .

Penalized maximum likelihood (ML) estimation is an established method for circumventing problems that arise in the stability of parameter estimates in case the likelihood function is relatively flat [58]. Instead of maximising the log-likelihood function log L(Θ|y), the penalized ML method finds the best estimate Θ̂ for the unknown model parameter Θ by maximising the sum log L(Θ|y) + r(Θ). The penalty function r(Θ) is added to the log-likelihood function in order to pull the best estimate Θ̂ for the parameter Θ towards a value that constitutes a plausible guess for Θ, based on information outside of the likelihood function. The penalized ML approach can be viewed as a method for introducing some tolerable degree of bias in exchange for a reduction in the sampling variability of parameter estimates [59].
From a Bayesian perspective, the penalty function r(Θ) can be interpreted as a prior distribution that describes the information that one has on the parameter Θ outside of any information conveyed by the measured observables y.

In the context of the SVfit algorithm, we choose a penalty function r(m_H^test(i)) which gives preference to low values of m_ττ in case the probability density P is relatively flat. Recall that the cross section for producing resonances decaying into τ lepton pairs decreases steeply with the mass of the resonance (cf. Fig. 5). The information that large mass values are less likely is not contained within the probability density defined by Eq. (19) and motivates the use of a penalized ML approach with a penalty function r(m_H^test(i)). In previous applications of the SVfit algorithm for data analyses performed by the CMS collaboration, regularization functions of the following type were considered:

    r(m_H^test(i)) = −κ · log(m_H^test(i) · GeV^−1) .

We will focus on this type of regularization function in this paper, but remark that from a Bayesian perspective a well motivated alternative choice would be to use as penalty function the logarithm of the cross section as a function of mass.

Our choice of the parameter κ aims at an optimal compromise between reducing the high mass tail in the m_ττ distribution reconstructed for Z/γ* → ττ background events on the one hand and causing no or at most a small bias on the m_ττ distribution reconstructed in signal events on the other hand. We find that the optimal value of κ depends on the experimental resolution on the p_T of τ_h and on p_T^miss and hence needs to be adjusted to the experimental conditions. Better (worse) experimental resolution favors a small (large) value of κ. The optimal value of κ may furthermore differ for events in which both τ leptons decay hadronically, events in which one τ lepton decays into hadrons and the other into an electron or muon, and events in which both τ leptons decay into electrons or muons, with larger (smaller) values of κ being favored in case both (none) of the τ leptons decay hadronically.
The effect of adding a regularization term of the kind r(m_H^test(i)) = −κ · log(m_H^test(i) · GeV^−1) to the probability density P given by Eq. (19) on the distribution in m_ττ reconstructed in Z/γ* → ττ background events is visualized in Fig. 7. The effect on events in which the Z boson recoils against a high-p_T jet, for which the probability density typically exhibits a narrow maximum, is small, while the addition of the regularization term effectively reduces the high mass tail in the m_ττ distribution reconstructed in events in which the Z boson has little p_T, for which the probability density decreases more slowly as a function of m_H^test(i) (cf. Fig. 6).
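The penalized mass scan described in this section can be sketched as follows. The probability density P is replaced here by a hypothetical toy likelihood, and the penalty r(m) = −κ·log(m·GeV⁻¹) is added before maximisation; all names and numbers are illustrative, not the actual SVfit implementation:

```python
import math

def penalized_mass_scan(log_p, mass_hypotheses, kappa):
    """Return the mass hypothesis maximising log P(y|m) + r(m),
    with the regularization term r(m) = -kappa * log(m / GeV).
    `log_p` is a stand-in for the log of the probability density P
    obtained from the numeric integration."""
    best_mass, best_value = None, -math.inf
    for m in mass_hypotheses:
        # The penalty pulls flat likelihoods towards low mass values.
        value = log_p(m) - kappa * math.log(m)
        if value > best_value:
            best_mass, best_value = m, value
    return best_mass

# Toy likelihood mimicking a Z -> tau tau event, peaking at 90 GeV:
def toy_log_p(m):
    return -0.5 * ((m - 90.0) / 15.0) ** 2

masses = [50.0 + 5.0 * i for i in range(61)]  # 50 ... 350 GeV grid
m_unregularized = penalized_mass_scan(toy_log_p, masses, kappa=0.0)
m_regularized = penalized_mass_scan(toy_log_p, masses, kappa=4.0)
```

With κ = 0 the scan returns the position of the likelihood maximum; with κ = 4 the best estimate is pulled towards lower mass, as intended for flat likelihoods.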

A variant of the SVfit algorithm without proper normalization of the likelihood function is maintained for analyses of LHC Run 2 data. The algorithm differs from the SVfitMEM algorithm, described in Section 2, in two respects, altering the integrand in Eq. (1) as well as the procedure for computing the integral. The indicator function Ω(y), the PDF factor f(x_a) f(x_b), the flux factor 1/(2 x_a x_b s), and the efficiency factor ε(p, θ) of Eq. (1) are omitted, as is the normalization to inclusive cross section times acceptance, given by the factor 1/(σ(θ) · A(θ)). Instead of the complete ME |M(p, m_H)|² given by Eq. (2), only the terms |BW_τ→···(p)|² for the decay of the τ leptons are retained. The equivalent to Eq. (19) reads:

    P(y | m_H^test(i)) = ∫ dΦ |BW_τ→···(p)|² · W(p_vis(1) | p̂_vis(1)) W(p_vis(2) | p̂_vis(2)) W_rec(p_x^rec, p_y^rec | p̂_x^rec, p̂_y^rec) .

(Figure caption: Correlation between the values of m_ττ reconstructed in case the artificial regularization term of Eq. (41) is used (ordinate) respectively not used (abscissa). The correlation is shown separately for events in which the Z boson has little p_T (left) and for events in which the Z boson recoils against a high-p_T jet (right).)
Instead of computing probability densities P(p_vis(1), p_vis(2); p_x^rec, p_y^rec | m_H) for a series of mass hypotheses m_H^test(i) and determining the value of m_H that maximizes the probability density, the following integral is computed:

    L(y) = ∫ dΦ |BW_τ→···(p)|² · W(p_vis(1) | p̂_vis(1)) W(p_vis(2) | p̂_vis(2)) W_rec(p_x^rec, p_y^rec | p̂_x^rec, p̂_y^rec) · F(p) .    (43)
The function F(p) in the integrand may be an arbitrary function of the momenta p^(1) and p^(2) of the two τ leptons. The integral is evaluated numerically, using a custom implementation of the Markov chain Monte Carlo integration method with the Metropolis-Hastings algorithm [60]. The actual value L(y) of the integral is irrelevant. The reconstruction of the mass m_ττ of the τ lepton pair is based on recording the values of F(p) for each evaluation of the integrand in Eq. (43) by the Markov chain and taking the median of the series of F(p) values as the best estimate m̂_ττ for the mass of the τ lepton pair in a given event. The total number of evaluations of the integrand, referred to as Markov chain "states", amounts to 100 000 per event. The first 10 000 evaluations of the integrand are used as "burn-in" period and are excluded from the computation of the median. The transitions between subsequent states of the Markov chain are computed for the case that F(p) ≡ 1. This choice has the advantage that the sequence of Markov chain states does not depend on F(p), which allows F(p) to consist of multiple components that can be evaluated by the same Markov chain. Each component may be an arbitrary function of the momenta p^(1) and p^(2) of the two τ leptons. In the default implementation of the algorithm, the p_T, η, φ and transverse mass, m_T^ττ = √((E_T^(1) + E_T^(2))² − (p_x^(1) + p_x^(2))² − (p_y^(1) + p_y^(2))²), of the τ lepton pair are reconstructed in addition to the mass m_ττ. We refer to this version of the SVfit algorithm as the "classic" SVfit (cSVfit) algorithm.
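The integration strategy described above — transitions computed from the density alone (F(p) ≡ 1), observables recorded at every state, medians taken after discarding the burn-in — can be sketched in a one-dimensional toy. The density and all parameters below are illustrative stand-ins, not the actual SVfit integrand:

```python
import math
import random
import statistics

def markov_chain_medians(log_density, observables, start, n_states,
                         n_burnin, step_size, seed=1):
    """Metropolis-Hastings sketch: sample states from `log_density` and
    record each observable F at every post-burn-in state; the best
    estimate of each observable is the median of its recorded values."""
    rng = random.Random(seed)
    x = start
    log_d = log_density(x)
    recorded = {name: [] for name in observables}
    for i in range(n_states):
        # Symmetric Gaussian proposal; accept/reject from the density alone.
        proposal = x + rng.gauss(0.0, step_size)
        log_d_prop = log_density(proposal)
        if rng.random() < math.exp(min(0.0, log_d_prop - log_d)):
            x, log_d = proposal, log_d_prop
        if i >= n_burnin:  # discard the burn-in states
            for name, func in observables.items():
                recorded[name].append(func(x))
    return {name: statistics.median(vals) for name, vals in recorded.items()}

# Toy density: Gaussian in a single "mass" variable, centred at 125 GeV.
# Several observables are evaluated on the same chain.
estimates = markov_chain_medians(
    log_density=lambda m: -0.5 * ((m - 125.0) / 20.0) ** 2,
    observables={"mass": lambda m: m, "mass_squared": lambda m: m * m},
    start=100.0, n_states=20_000, n_burnin=2_000, step_size=5.0)
```

The key design point mirrors the text: because the accept/reject step never consults the observables, one chain serves arbitrarily many components of F(p).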
The cSVfit algorithm covers two use cases: data analyses use it either because of its capability to reconstruct kinematic observables of the τ lepton pair other than the mass m_ττ or because of its significantly reduced requirement on computing resources compared to the SVfitMEM algorithm. The p_T of the H boson was used for the purpose of categorizing events in the SM H → ττ analysis performed by the CMS collaboration during LHC Run 1 [13]. The p_T was reconstructed by computing the vectorial sum of the momenta p_vis of the visible τ decay products and of the missing transverse momentum p_T^miss. We will demonstrate in Section 4 that reconstructing the p_T of the H boson candidate by the cSVfit algorithm improves the resolution compared to taking the sum of the p_vis and p_T^miss vectors. The transverse mass m_T^ττ of the H boson candidate has been used as observable to discriminate signal from background in the CMS search for heavy H bosons in the first LHC Run 2 data [61], performed in the context of the minimal supersymmetric extension of the Standard Model (MSSM) [62,63].
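The baseline p_T estimate from the vectorial sum of the visible decay products and p_T^miss, and the transverse mass of the τ lepton pair, can be written out as follows. This is a sketch: the tuple layouts are illustrative, and the transverse mass is taken in its standard form:

```python
import math

def pt_from_vector_sum(visible_momenta, met_x, met_y):
    """Baseline H boson pT estimate: vectorial sum of the visible tau decay
    product transverse momenta ((px, py) pairs) and the missing transverse
    momentum components."""
    px = sum(p[0] for p in visible_momenta) + met_x
    py = sum(p[1] for p in visible_momenta) + met_y
    return math.hypot(px, py)

def transverse_mass(leg1, leg2):
    """Transverse mass of the tau lepton pair; each leg is a tuple (ET, px, py).
    m_T = sqrt((ET1+ET2)^2 - (px1+px2)^2 - (py1+py2)^2)."""
    et = leg1[0] + leg2[0]
    px = leg1[1] + leg2[1]
    py = leg1[2] + leg2[2]
    return math.sqrt(max(0.0, et ** 2 - px ** 2 - py ** 2))

# Two back-to-back massless legs of ET = 50 GeV each: m_T = 100 GeV.
mt = transverse_mass((50.0, 50.0, 0.0), (50.0, -50.0, 0.0))
pt = pt_from_vector_sum([(30.0, 0.0), (20.0, 0.0)], -10.0, 0.0)
```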
Compared to the cSVfit algorithm, the version of the SVfit algorithm used for analyses of data recorded by the CMS experiment during LHC Run 1 used an incomplete expression for the product of the differential phase space element and the ME modelling the τ lepton decays. In particular, the missing factor 1/(z^(1))² · 1/(z^(2))² is equivalent to adding an artificial regularization term of the type described in Section 2.7 with κ = 4 to the likelihood function. This can be seen in the limit that the angles θ_inv between the p_inv and p_vis vectors are zero for both τ leptons. In this limit m_ττ = m_vis/√(z^(1) z^(2)), with m_vis denoting the measured mass of the visible τ decay products, and

    log(P · GeV⁸) + 2 · log(z^(1) z^(2)) = log(P · GeV⁸) − 4 · log(m_H^test(i) · GeV^−1) + log(m_vis⁴ · GeV^−4)
                                         = log(P · GeV⁸) − 4 · log(m_H^test(i) · GeV^−1) .

The term log(m_vis⁴ · GeV^−4) has been omitted from the sum in the last step. As the term log(m_vis⁴ · GeV^−4) does not depend on m_H^test(i), it has no effect on the reconstruction of m_ττ.

The performance of the m_ττ reconstruction is studied in simulated events. Samples of SM H → ττ signal events are generated with the next-to-leading-order (NLO) program POWHEG v2 [64,65,66] for a H boson of mass m_H = 125 GeV and for the gluon fusion (gg → H) and vector boson fusion (qq → H) production processes. We also study the m_ττ reconstruction in events containing heavy pseudoscalar Higgs bosons A of mass m_A = 200, 300, 500, 800, 1200, 1800, and 2600 GeV, produced via gluon fusion, and in events containing heavy spin-1 resonances, of mass 2500 GeV, that decay into τ pairs. We denote the latter by the symbol Z′. The A → ττ and Z′ → ττ signal samples are generated with the LO generator PYTHIA 8.2 [67]. The Z/γ* → ττ background sample is generated with the LO MadGraph program, in the version MadGraph_aMCatNLO 2.2.2 [68]. The sample contains τ lepton pairs of true mass m_ττ^true > 50 GeV. Most of the τ lepton pairs have a mass near the Z peak at m_Z = 91.2 GeV [43], but the sample also contains events of significantly higher mass. Drell-Yan events of mass m_ττ^true < 50 GeV are not relevant for this study, because they do not pass the selection criteria on p_T and η that are applied at analysis level. All events are generated for proton-proton collisions at √s = 13 TeV centre-of-mass energy. The samples produced by MadGraph and POWHEG are generated with the NNPDF3.0 set of parton distribution functions, while the samples produced by PYTHIA use the NNPDF2.3LO set [69,70,71]. Parton shower and hadronization processes are modelled using the generator PYTHIA with the tune CUETP8M1 [72]. The latter is based on the Monash tune [73]. The decays of τ leptons, including polarization effects, are modelled by PYTHIA.
The samples are normalized according to their cross sections for the purpose of comparing different mass reconstruction algorithms in terms of signal-to-background separation. The cross section for the irreducible Z/γ* → ττ background is computed at NNLO accuracy and amounts to 1.92 × 10³ pb [74]. The cross sections for the SM H → ττ signal have been computed as detailed in Ref. [75], with the updates described in Ref. [76] included.

The product of cross section times branching fraction for the decay into a τ lepton pair amounts to 2.77 pb for SM H bosons produced via gluon fusion and to 2.37 × 10⁻¹ pb for SM H bosons produced via vector boson fusion. The A → ττ and Z′ → ττ samples are scaled to a product of cross section times branching fraction, for the decay into a pair of τ leptons, of 1 pb.
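Normalizing a sample to its cross section amounts to assigning each simulated event the weight σ·L/N_generated. A minimal sketch, in which the generated event counts and the integrated luminosity are made up for illustration (only the cross sections are taken from the text):

```python
def event_weight(cross_section_pb, luminosity_pb, n_generated):
    """Per-event weight that normalizes a sample of n_generated simulated
    events to cross_section_pb at an integrated luminosity of
    luminosity_pb (in units of 1/pb)."""
    return cross_section_pb * luminosity_pb / n_generated

# Hypothetical event counts; luminosity of 1000/pb chosen for illustration.
w_dy   = event_weight(1.92e3,  1000.0, 10_000_000)  # Z/gamma* -> tau tau (NNLO)
w_ggh  = event_weight(2.77,    1000.0, 1_000_000)   # gg -> H, H -> tau tau
w_vbfh = event_weight(2.37e-1, 1000.0, 1_000_000)   # qq -> H, H -> tau tau
```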

The events are studied at generator level. The p_T, η, and φ of the electrons and muons produced in the τ lepton decays are assumed to be reconstructed perfectly. The τ_h are built by adding the visible decay products of a given τ lepton, excluding the neutrinos. The experimental resolution on the p_T of the τ_h is simulated by sampling from the TF described in Section 2.3.1, while the η, φ, and m_vis of the τ_h are assumed to be reconstructed perfectly. The resolution on the hadronic recoil is simulated by sampling from the TF described in Section 2.3.3.
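The smearing procedure can be emulated as below. A Gaussian of fixed relative width stands in for the actual TF shapes of Sections 2.3.1 and 2.3.3, which are not reproduced here; the width value is illustrative:

```python
import random

def smear_pt(pt_true, relative_width, rng):
    """Sample a 'reconstructed' pT from a transfer function centred on the
    true pT. A Gaussian with fixed relative width is used as a stand-in
    for the actual TF; negative samples are clipped to zero."""
    return max(0.0, rng.gauss(pt_true, relative_width * pt_true))

rng = random.Random(42)
# Emulate the tau_h pT resolution for a 50 GeV tau_h, assuming a 10% width:
smeared = [smear_pt(50.0, 0.10, rng) for _ in range(10_000)]
mean_pt = sum(smeared) / len(smeared)
```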
Distributions in m_ττ are computed separately for events in which both τ leptons decay hadronically (τ_h τ_h), events in which one τ lepton decays hadronically and the other into a muon (µτ_h), and events in which one τ lepton decays into a muon and the other into an electron (eµ). The visible τ decay products are required to pass selection criteria on p_T and η, which are motivated by the SM H → ττ analysis performed by the CMS collaboration during LHC Run 1 [13]. Events in the τ_h τ_h decay channel are required to contain two τ_h of p_T > 45 GeV and |η| < 2.1. Events in the µτ_h channel are required to contain a muon of p_T > 20 GeV and |η| < 2.1 plus a τ_h of p_T > 30 GeV and |η| < 2.3. Events selected in the eµ channel are required to contain a muon with |η| < 2.1 and an electron with |η| < 2.4. The lepton of higher p_T (either the electron or the muon) is required to satisfy the condition p_T > 20 GeV, while the lepton of lower p_T is required to satisfy p_T > 10 GeV. Similar selection criteria on p_T and η of the visible τ decay products were applied in the H → ττ analyses performed by the ATLAS collaboration during LHC Run 1 [78,79].
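The selection criteria listed above translate directly into code; a sketch, in which the channel labels and the (pT, η) tuple layout are illustrative:

```python
def passes_selection(channel, leg1, leg2):
    """Apply the per-channel pT and |eta| requirements on the visible tau
    decay products. For 'mu_tauh', leg1 is the muon and leg2 the tau_h;
    for 'e_mu', leg1 is the electron and leg2 the muon.
    Each leg is a (pt, eta) tuple with pt in GeV."""
    if channel == "tauh_tauh":
        return all(pt > 45.0 and abs(eta) < 2.1 for pt, eta in (leg1, leg2))
    if channel == "mu_tauh":
        (pt_mu, eta_mu), (pt_th, eta_th) = leg1, leg2
        return (pt_mu > 20.0 and abs(eta_mu) < 2.1
                and pt_th > 30.0 and abs(eta_th) < 2.3)
    if channel == "e_mu":
        (pt_e, eta_e), (pt_mu, eta_mu) = leg1, leg2
        # The higher-pT lepton must exceed 20 GeV, the lower-pT one 10 GeV.
        lead, trail = max(pt_e, pt_mu), min(pt_e, pt_mu)
        return (abs(eta_e) < 2.4 and abs(eta_mu) < 2.1
                and lead > 20.0 and trail > 10.0)
    raise ValueError(f"unknown channel: {channel}")
```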

The m_ττ reconstruction in SM H → ττ signal events is studied in event categories motivated by the SM H → ττ analysis performed by the CMS collaboration during LHC Run 1 [13]:
• 0-jet: Events containing no jets of p_T > 30 GeV and |η| < 4.5.
• 1-jet non-boosted: Events containing one or more jets of p_T > 30 GeV and |η| < 4.5 in which the H (respectively Z) boson satisfies p_T < 100 GeV. Events in the 1-jet non-boosted category are required not to be selected in the 2-jet VBF category.
• 1-jet boosted: Events containing one or more jets of p_T > 30 GeV and |η| < 4.5 in which the H (respectively Z) boson satisfies p_T > 100 GeV. Events in the 1-jet boosted category are required not to be selected in the 2-jet VBF category.

• 2-jet VBF: Events containing two or more jets of p_T > 30 GeV and |η| < 4.5, with at least one pair of jets satisfying m_jj > 500 GeV and ∆η_jj > 3.5.
The event categorization is based on generator level quantities. We do not expect that migrations between the event categories due to resolution effects significantly affect the conclusions of the m τ τ resolution studies.
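The categorization rules above can be sketched as follows; the jet pair quantities (m_jj, ∆η_jj) are assumed to be precomputed from the jet four-momenta, and the function and label names are illustrative:

```python
def event_category(jets, boson_pt, jet_pairs):
    """Assign the event category from the jets (list of (pt, eta) tuples),
    the H or Z boson pT, and the (m_jj, delta_eta_jj) values of all pairs
    of selected jets. The 2-jet VBF category is checked first, so the
    1-jet categories automatically exclude VBF events."""
    selected = [(pt, eta) for pt, eta in jets if pt > 30.0 and abs(eta) < 4.5]
    if len(selected) >= 2 and any(m_jj > 500.0 and d_eta > 3.5
                                  for m_jj, d_eta in jet_pairs):
        return "2-jet VBF"
    if selected:
        return "1-jet boosted" if boson_pt > 100.0 else "1-jet non-boosted"
    return "0-jet"
```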

The m_ττ distributions reconstructed by the SVfitMEM and cSVfit algorithms in these event categories in the τ_h τ_h, µτ_h, and eµ decay channels are shown in Figs. 8 to 10. They are compared to the distributions in m_ττ reconstructed by the "collinear-approximation" (CA) method [18] and by the previous version of the SVfit algorithm, described in Ref. [23], which we refer to as the SVfit "standalone" (SVfitSA) algorithm. The SVfitMEM and cSVfit algorithms are utilized with and without the artificial regularization term described in Section 2.7. The value of κ used for each channel is chosen such that the optimal resolution on m_ττ, quantified in terms of the ratio of the root-mean-square (RMS) to the median M of the distribution, is attained. We find that the choice of κ = 5 for the τ_h τ_h channel, κ = 4 for the µτ_h channel, and κ = 3 for the eµ channel performs well for the SVfitMEM as well as for the cSVfit algorithm.
The distributions in m_ττ reconstructed by the cSVfit and SVfitMEM algorithms with artificial regularization term and by the SVfitSA algorithm are very similar. The SVfitSA algorithm performs well without adding an artificial regularization term of the type described in Section 2.7 to its likelihood function. This is because the effect of the missing factor 1/z² in the likelihood function used by the SVfitSA algorithm (cf. Section 3) is equivalent to using an artificial regularization term with κ = 4. Regardless of the choice of κ, the peaks of the m_ττ distributions reconstructed by the different versions of the SVfit algorithm are close to the true value of the mass of the τ lepton pair in the 1-jet boosted and 2-jet VBF categories. In the 0-jet and 1-jet non-boosted categories, the distributions in m_ττ reconstructed by the cSVfit and SVfitMEM algorithms with κ = 0 exhibit a tendency to overestimate the mass of the τ lepton pair, resulting in pronounced high mass tails. The choice of a small positive κ reduces this tendency, makes the distributions in m_ττ peak close to the true mass of the τ lepton pair, and significantly reduces the high mass tails.
The effect of adding the artificial regularization term is largest for events reconstructed in the 0-jet category, which are the most difficult to reconstruct. This is because in events in which the H or Z boson has low p_T the τ leptons are typically "back-to-back" in the transverse plane (∆φ_ττ ≈ π), with the effect that the neutrinos produced in the τ lepton decays are emitted in opposite hemispheres and their contributions to p_T^miss cancel. The cancellation of neutrino momenta causes mass hypotheses of low m_H^test(i) ≈ m_vis and of high m_H^test(i) ≫ m_vis to be degenerate in terms of the probability density P, computed according to Eq. (19). The best estimate for the mass of the τ lepton pair in a given event may then fluctuate, due to mismeasurements within the experimental resolution of the components p_x^rec and p_y^rec of the hadronic recoil or, to a lesser extent, of the p_T of τ_h, degrading the resolution on m_ττ. The resolution on m_ττ is significantly better in events selected in the 1-jet and 2-jet VBF categories, in which the τ lepton pair typically recoils against high-p_T jets. As the p_T of the H respectively Z boson increases, the angle ∆φ_ττ between the τ leptons decreases, due to the Lorentz boost in the direction of the H or Z boson. The momenta of the neutrinos produced in the τ lepton decays add constructively in this case, with the effect that the mass of the τ lepton pair is constrained by the measured value of p_T^miss. The correlation between the p_T of the τ lepton pair and the angle ∆φ_ττ is visualized in Fig. 11 for SM H → ττ signal and for Z/γ* → ττ background events. The events are selected in the µτ_h decay channel. The distributions for events selected in the τ_h τ_h and eµ decay channels are similar.
The variation of the resolution on m_ττ across event categories is most pronounced in the eµ and least pronounced in the τ_h τ_h decay channel. This is because the fraction of the τ lepton energy carried by the visible τ decay products is typically high for hadronic τ decays and typically low for leptonic τ decays, cf. Fig. 2. The consequence is that for events in the τ_h τ_h decay channel, regardless of the angle ∆φ_ττ between the τ leptons, the best estimate for the mass of the τ lepton pair is typically not much higher than m_vis, as the energies of the neutrinos produced in hadronic τ decays are known to be most likely small. For events in the eµ decay channel, on the other hand, τ lepton decays with highly energetic neutrinos are known to be likely and, provided they are compatible with the measured value of p_T^miss, mass hypotheses of low m_H^test(i) ≈ m_vis, corresponding to the case z^(1) ≈ 1 and z^(2) ≈ 1, and of high m_H^test(i) ≫ m_vis, corresponding to the case z^(1) ≪ 1 and z^(2) ≪ 1, are often degenerate in terms of the probability density P.

Numerical values for the resolution in m_ττ achieved by the different algorithms are given in Tables 1 to 3. The mass of the visible τ decay products, m_vis, is given in the tables for comparison. The resolution is quantified separately for the low mass and for the high mass tail of the mass distribution reconstructed by a given algorithm. The symbol σ_l/M (the symbol σ_h/M) denotes the ratio of the difference between the median and the 16% quantile (between the 84% quantile and the median) to the median M of the distribution. While the distributions in m_ττ reconstructed by the different versions of the SVfit algorithm, as well as by the CA method, peak close to m_ττ^true, the distributions in m_vis exhibit significant shifts towards lower mass. The shift of the m_vis distribution is in general highest for events in the eµ decay channel and lowest for events in the τ_h τ_h decay channel, reflecting the fact that the visible τ decay products typically carry a lower fraction of the τ lepton energy in leptonic compared to hadronic τ decays. The shift is more pronounced in events selected in the 1-jet and 2-jet VBF categories and less pronounced in the 0-jet category. In particular in the τ_h τ_h channel, the p_T cuts that are applied on the τ_h remove most of the Z/γ* → ττ events in the 0-jet category, except for a few events with large m_vis. The events selected in the 1-jet and 2-jet VBF categories typically have smaller m_vis, as m_vis decreases with the angle between the τ leptons:

    m_vis ≈ √( 2 · p_T^vis(1) cosh η^vis(1) · p_T^vis(2) cosh η^vis(2) · (1 − cos ∠(τ^(1), τ^(2))) ) .

All algorithms can be trivially calibrated such that the median of each mass distribution coincides with the true mass of the τ lepton pair, by scaling the output of the algorithm by a suitably chosen constant. The advantage of quantifying the resolution in terms of the ratios σ_l/M and σ_h/M is that these ratios are invariant under such scaling.
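The quantile-based resolution measures σ_l/M and σ_h/M used in the tables can be computed as below; this is a sketch using the standard library, and the quantile interpolation method is an implementation choice:

```python
import statistics

def resolution_metrics(masses):
    """Return (sigma_l/M, sigma_h/M): the distances from the median M to
    the 16% and 84% quantiles of the reconstructed mass distribution,
    divided by M. Both ratios are invariant under an overall rescaling
    of the masses, i.e. under a trivial calibration of the algorithm."""
    cuts = statistics.quantiles(masses, n=100, method="inclusive")
    q16, q84 = cuts[15], cuts[83]  # 16% and 84% quantiles
    median = statistics.median(masses)
    return (median - q16) / median, (q84 - median) / median

# A symmetric toy distribution yields equal low- and high-side resolutions:
sigma_l, sigma_h = resolution_metrics(list(range(1, 101)))
```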
The ratio S/(S + B) of the number of SM H → ττ signal events to the sum of signal plus Z/γ* → ττ background events in general increases with jet multiplicity and with the p_T of the τ lepton pair (p_T^H respectively p_T^Z). The categories with the highest signal-to-background ratio, the 1-jet boosted and the 2-jet VBF category, in fact provide most of the sensitivity of the SM H → ττ analysis. The improvement in mass resolution provided by the SVfit algorithm is largest in these categories, enhancing the separation of the SM H → ττ signal from the Z/γ* → ττ background where it matters most. In the SM H → ττ analysis performed by the CMS collaboration during LHC Run 1, the use of m_ττ reconstructed by the SVfit algorithm increased the expected significance for observing a signal by ≈ 40% compared to m_vis [13], corresponding to a gain of a factor of two in luminosity.
The reconstruction of m_ττ by the CA method performs suboptimally compared to SVfit. In particular, the pronounced high mass tails in the m_ττ distribution, arising from resolution effects, significantly reduce the sensitivity for observing a signal, as they cause a sizeable fraction of Z/γ* → ττ background events to be reconstructed near the signal region m_ττ ≈ 125 GeV. A further disadvantage of the CA method is that it fails to yield a physical solution for approximately half of the events, whereas the SVfit algorithm provides a physical solution for every event.
The performance of the cSVfit algorithm in reconstructing the p_T, η, and φ of the τ lepton pair is studied in SM H → ττ signal events produced via the gluon fusion process, separately for the decay channels τ_h τ_h, µτ_h, and eµ. Distributions of the difference between reconstructed and true p_T, η, and φ of the H boson are shown in Fig. 12. Compared to the case that the H boson p_T and φ are reconstructed by taking the vectorial sum of the momenta of the visible τ decay products and of p_T^miss, the cSVfit algorithm improves the resolution by 10-20%. The pseudo-rapidity η of the H boson can only be reconstructed with the cSVfit algorithm. Typical resolutions on the p_T, η, and φ of the H boson, reconstructed by the cSVfit algorithm, amount to 10 GeV, 0.4, and 0.8 rad, respectively. The SVfit algorithm significantly improves the separation of the heavy A and Z′ boson signals from the irreducible Z/γ* → ττ background in all three decay channels. Numerical values for the median M as well as for the resolutions σ_l/M and σ_h/M are given in Tables 4 to 6. We interpret the fact that the performance of the SVfitMEM algorithm is similar for an A → ττ signal of mass 2600 GeV and for a Z′ → ττ signal of mass 2500 GeV as evidence that the usage of a LO ME for the gluon fusion process gg → H in the SVfitMEM algorithm represents no limitation for using the SVfitMEM algorithm in data analyses of τ lepton pair production other than studies of Higgs boson production.

The improvement in signal-to-background separation provided by the SVfit algorithm in searches for high mass resonances decaying to τ lepton pairs is illustrated in Fig. 16.
We conclude the discussion of the performance of the SVfit algorithm with a comparison of the computing time requirements of the SVfitMEM, cSVfit, and SVfitSA algorithms. The computing time is dominated by the evaluation of the integrand during the numeric integration of Eqs. (19) and (43), and of Eq. (2) of Ref. [23], respectively. The computing time requirement is expected to be highest for the SVfitMEM algorithm, due to the time needed for the evaluation of the PDF and of the TF in Eq. (19).
The number of evaluations of the integrand is chosen such that the resolution on m_ττ is within 1% of the resolution obtained in case an infinite (very large) number of evaluations of the integrand is performed. The number of evaluations is higher in case of the SVfitMEM algorithm, accounting for the fact that the additional PDF and TF factors increase the variation of the integrand, such that a higher number of evaluations is necessary in order to reach a given precision on the value of the integral.
The number of mass hypotheses tested for an event varies depending on the conditions described in Section 2.6. More (fewer) mass hypotheses need to be tested in case the first mass hypothesis of the series m_H^test(i) is farther away (less far away) from m_ττ^true, at which the value of P is expected to reach its maximum P_max. A larger number of mass hypotheses may be required for background events, in particular for backgrounds containing neutrinos that do not originate from τ lepton decays, such as W boson production and the production of top quark pairs, as the values of P may be almost degenerate over a large range in m_H^test(i). The numeric integration is performed by the VAMP algorithm in case of the SVfitMEM algorithm (cf. Section 2.6) and by the VEGAS algorithm in case of the SVfitSA algorithm. In case of the cSVfit algorithm, the numeric integration is performed by a custom implementation of the Markov chain Monte Carlo method, as described in Section 3. In the latter case, the integrand is evaluated 100 000 times per event and there is no iteration over a sequence of mass hypotheses, cf. Eq. (43).

The computing time is measured in CPU time, using a single core of a 2.30 GHz Intel® Xeon® E5-2695 v3 processor. The distribution of the CPU time, in units of seconds per event, spent by the SVfitMEM, cSVfit, and SVfitSA algorithms to compute m_ττ in SM H → ττ signal events and in Z/γ* → ττ background events is shown in Fig. 17. Numerical values of the mean and RMS of the distributions are given in Table 7. The cSVfit algorithm requires about 0.5 s of CPU time per event to reconstruct the mass m_ττ of the τ lepton pair, as well as its p_T, η, φ, and transverse mass m_T^ττ. The CPU time is reduced by a factor of 5 to 30 compared to the SVfitSA algorithm, owing to the more efficient Markov chain Monte Carlo integration method compared to computing the probability density P for a series of mass hypotheses. The computing time requirement of the SVfitMEM algorithm is higher by a factor of 3 to 4 compared to the SVfitSA algorithm.

Discussion
Considering that the resolution on m_ττ, quantified by σ_l/M and σ_h/M, achieved by the SVfitMEM and cSVfit algorithms is almost identical, we find that the cSVfit algorithm represents the best compromise between physics performance and computing time requirements in practical applications of the SVfit algorithm.
The optimal resolution is achieved in case an artificial regularization term of the type described in Section 2.7, with small positive κ, is added to the likelihood function. We expect the optimal choice of κ to depend on the experimental resolution as well as on the rates of signal and background processes, and we recommend reoptimizing κ in practical applications of our algorithm.
The merit of the SVfitMEM algorithm is that the formalism to treat τ lepton decays in the ME method, developed for the SVfitMEM algorithm, can be used in future applications of the ME method to data analyses with τ leptons in the final state. An example of such an application is the analysis of SM H boson production in association with a pair of top quarks (ttH) in final states with a τ lepton [80], in which the presence of neutrinos from W → ℓν decays precludes the reconstruction of the H boson mass by the SVfit algorithm.

Summary
An algorithm for the reconstruction of the H boson mass in events in which the H boson decays into a pair of τ leptons has been presented. The relative resolution on the H boson mass amounts to typically 15-20%. The algorithm has been used in data analyses performed by the CMS collaboration during LHC Run 1. It improves the sensitivity of the SM H → ττ analysis by ≈ 40%, corresponding to a gain in luminosity by about a factor of two.
An improved version of the algorithm has been developed in preparation for LHC Run 2. Two variants of the improved algorithm have been implemented. The first variant, SVfitMEM, has been rigorously developed within the formalism of the ME method. It is based on proper normalization of the probability density P, given by Eq. (19). The second variant of the improved algorithm uses a likelihood function of arbitrary normalization. It allows one to compute, on an event-by-event basis, any kinematic function of the two τ leptons and provides a substantial reduction in computing time requirements. A further improvement concerns the modelling of the experimental resolution on the p_T of τ_h via TF, described in Section 2.3.1.

Table 7: CPU time, in seconds per event, needed to reconstruct m_ττ by the SVfitMEM, cSVfit, and SVfitSA algorithms in simulated Z/γ* → ττ background and SM H → ττ signal events in the decay channels τ_h τ_h, µτ_h, and eµ.
The performance of the algorithm has been studied in simulated SM H → ττ signal and Z/γ* → ττ background events, as well as in simulated samples of heavy pseudoscalar Higgs bosons and heavy spin-1 resonances. The SVfit algorithm is found to perform well in all event categories and over the full range in true mass of the τ lepton pair that we studied.
The development of the formalism to handle τ lepton decays in the ME method constitutes an important result of this paper. The formalism allows one to extend the matrix elements generated by automated tools such as CompHEP or MadGraph with the capability to handle hadronic as well as leptonic τ decays. We expect that the formalism will be very useful for future applications of the ME method to data analyses with τ leptons in the final state.

Appendix
We derive here the relations for the product of the squared modulus of the ME and the phase space elements dΦ^(i)_{τ_h ν_τ} and dΦ^(i)_{ℓ ν̄ ν_τ} for, respectively, the decays τ → hadrons + ν_τ and τ → ℓ ν̄ ν_τ, given by Eq. (33). We start with the simpler case of hadronic τ decays in Section 7.1 and turn to the more complex case of leptonic τ decays in Section 7.2. For clarity of notation, we omit the hat symbol in this section and use the convention that all symbols refer to true values in the laboratory frame, unless indicated explicitly otherwise.

7.1 The decay τ → hadrons + ν_τ
We treat hadronic τ decays as a two-body decay into a hadronic system τ_h plus a ν_τ, as explained in Section 2.1, and take the squared modulus of the ME to be a constant, which we denote by |M_eff^{τ→τ_h ν_τ}|². We further denote the momentum of the neutrino produced in the τ decay by p_inv and its energy by E_inv (cf. Section 2.4). For reasons that will become clear later, we allow the neutrino to have non-zero mass m_inv.
We define z = E_vis/E_τ according to Eq. (14) and replace the integration over dE_inv by an integration over z. Since E_inv = E_τ − E_vis = E_vis (1 − z)/z, the Jacobi factor related to this transformation is

    |dE_inv/dz| = E_vis/z² .

We then perform the integration over d cos θ_inv. Following the convention that we introduced in Section 2.4, we choose the coordinate system such that θ_inv is equal to the angle between the p_vis and p_inv vectors. The δ-function δ(E_τ(p_vis, p_inv) − E_vis − E_inv) depends on cos θ_inv via:

    E_τ(p_vis, p_inv) = √(|p_τ|² + m_τ²)
                      = √(|p_vis + p_inv|² + m_τ²)
                      = √(|p_vis|² + |p_inv|² + 2 p_vis · p_inv + m_τ²)
                      = √(|p_vis|² + |p_inv|² + 2 |p_vis| |p_inv| cos θ_inv + m_τ²) .
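Solving the δ-function constraint for cos θ_inv, one obtains the value of the angle at which the integrand contributes, a step left implicit in the chain of equalities above:

```latex
E_\mathrm{vis} + E_\mathrm{inv}
  = \sqrt{|\mathbf{p}_\mathrm{vis}|^{2} + |\mathbf{p}_\mathrm{inv}|^{2}
          + 2\,|\mathbf{p}_\mathrm{vis}|\,|\mathbf{p}_\mathrm{inv}|\cos\theta_\mathrm{inv}
          + m_\tau^{2}}
\;\Longrightarrow\;
\cos\theta_\mathrm{inv}
  = \frac{\left(E_\mathrm{vis} + E_\mathrm{inv}\right)^{2}
          - |\mathbf{p}_\mathrm{vis}|^{2} - |\mathbf{p}_\mathrm{inv}|^{2} - m_\tau^{2}}
         {2\,|\mathbf{p}_\mathrm{vis}|\,|\mathbf{p}_\mathrm{inv}|}\,.
```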

7.2 The decays τ → e ν_e ν_τ and τ → µ ν_µ ν_τ

We treat leptonic τ decays as three-body decays and take the ME into account. Assuming the τ leptons to be unpolarized, the squared modulus of the ME is given by [44]: where G_F denotes the Fermi constant, given by Eq. (6). The product of the squared modulus of the ME and the phase space element dΦ_{ℓνν_τ} reads: in this frame. Performing the integration over v_0, we obtain: where we have chosen the polar axis such that it is parallel to p_vis. The ME given by Eq. (54) evaluates to: in the rest frame of the neutrino pair. The energies of the τ lepton and of the electron or muon are related by: from which it follows that: Substituting Eq. (60) into Eq. (59) yields: with E_τ and E_vis given by Eq. (62). Note that I_inv depends solely on a single kinematic variable, m_inv, as m_τ and m_vis are constants.

Before substituting Eq. (63) into Eq. (57), we perform a variable transformation from (u_0, u_1, u_2, u_3) to (m²_inv, u_1, u_2, u_3) = (2 (u_0² − u_1² − u_2² − u_3²), u_1, u_2, u_3). The magnitude of the determinant of the Jacobi matrix for this transformation is |J| = 4 u_0, from which it follows that: Substituting Eqs. (63) and (64) into Eq. (57), we then obtain: with u_0 ≡ E_inv and u ≡ p_inv. The expression in Eq. (65) is very similar in structure to the third line of Eq. (44), if we identify the integration over the momentum of the neutrino pair, given by the phase space element d³u in Eq. (65), with the integration over the neutrino momentum d³p_inv in Eq. (44). The differences between the formulae for leptonic and hadronic τ decays are the additional integration over dm²_inv and the factor I_inv/2 in Eq. (65), which replaces the factor |M_{τ→τ_h ν_τ}|² in Eq. (44). Note that Eq. (65) as well as Eq. (44) refer to the laboratory frame. The rest frame of the neutrino pair was used only for the purpose of evaluating the Lorentz invariant integral I_inv.
Since, according to Eq. (63), I_inv depends solely on the integration variable m_inv, I_inv can be treated as a constant when performing the integration over d³p_τ and d³u. We can hence use Eq. (51) of Section 7.1 to express Eq. (65) by: where the angle φ_inv specifies the orientation of the neutrino pair momentum vector p_inv with respect to the momentum vector p_vis of the electron or muon.
The opening angle between the vector p_inv and the direction of the electron respectively muon is given in analogy to Eq. (48). Defining the function f(p_vis, m_vis, p_inv) in terms of I_inv, we obtain the result that we quote in Eq. (33).