Reconstructing the invisible with matrix elements

We propose a fully flexible method to perform a hypothesis test between signal and background, based on the Matrix Element Method, in the presence of multiple invisible particles. The proposed method maps the measured final state onto its minimal hypersurface of degrees of freedom for a given process and then maximises the matrix element on this hypersurface separately for signal and background. To demonstrate the method's power in separating signal from background, we apply it to the prominent partly invisible decay of a Higgs boson into a muon-antimuon pair and two muon-neutrinos via two W bosons.


INTRODUCTION
The extraction of a few interesting signal events from a large number of Standard Model background events is one of the biggest challenges at the Large Hadron Collider (LHC). Depending on the nature and kinematic topology of the signal, different techniques and strategies have been devised to perform this task.
In general, in a first step, observables that are characteristic of the signal have to be constructed. These can be simple observables, like the transverse momentum of reconstructed objects, e.g. leptons, photons or jets, or the total amount of missing energy, or more sophisticated observables, like jet substructure observables. If the signal is a heavy resonance that decays into electroweak gauge bosons or top quarks, which in turn have a large branching ratio into jets, studying the substructure of jets is a popular way to separate them from large QCD backgrounds [1,2]. Either kind of observable can then be further processed using increasingly popular multivariate analysis (MVA) techniques, e.g. neural nets or boosted decision trees, to perform a hypothesis test between signal and background.
An alternative way of performing such a discrimination is to use the measured particles' momenta as direct input to the evaluation of the matrix element of the assumed underlying process, thereby evaluating whether the final state was more likely to be produced by the signal or the background hypothesis. This approach is called the Matrix Element Method (MEM) [3][4][5]. As it is based on an analytic/numerical calculation of the process, this method can be applied directly to data and does not require training on Monte-Carlo-generated pseudo-data.
However, while the MEM has been used very successfully in a wide range of applications and measurements [6][7][8][9][10][11], and recent developments have extended it to the substructure of jets [12,13], to next-to-leading-order accuracy [14][15][16][17][18], and even to an arbitrary number of reconstructed final-state objects [19,20], as an all-information approach it has always had shortcomings when multiple invisible objects are present in the final state.
Here, we propose a fully flexible method to discriminate between signal and background based on the Matrix Element Method in the presence of multiple invisible particles. The method maps the measured final state onto a manifold parametrised by the minimal degrees of freedom for a given process. This manifold, which we will refer to as a "minimal hypersurface", contains the set of all final states that can be produced for a minimal set of degrees of freedom. This parametrisation can then be sampled in a unique way to produce the set of all final states compatible with the observed event. With this reparametrisation, one can then maximise the matrix element separately for signal and background. On the one hand, this allows one to make an educated guess of the four-momenta of the invisible particles in the process; on the other hand, it allows one to construct a variable χ as the ratio of the matrix elements, which can be used to separate signal from backgrounds. We would like to emphasize that while this method is based on matrix elements, it is not identical to the so-called Matrix Element Method [3][4][5], as it makes approximations in order to calculate the final discriminator faster.
Final states with multiple missing-energy particles have become an increasingly important signature in searches for a plethora of new physics scenarios at the LHC, e.g. searches for dark matter [21,22], R-parity conserving supersymmetry [23,24], large extra dimensions [25], or even anomalous couplings of the Higgs boson [26,27]. Thus, a flexible method not relying on Monte-Carlo-generated pseudo-data can be readily applied to ongoing searches and measurements at the LHC's multipurpose experiments and increase their discovery potential.
We emphasize that there are several flavours of the matrix element method, such as those described in [14], [16], [17] or [19]. The method we propose is based on the matrix element method but is not identical to it: it aims to make the procedure simpler and faster in the presence of missing-energy particles, while maintaining the ability to separate signal from background.

DESCRIPTION OF THE METHOD
The matrix element method assigns probabilities to signal and background for each event of a sample. The most attractive feature of this method is that it makes maximal use of both the experimental information and the theoretical model. It associates a weight to each event based on the value of the matrix element (i.e., the scattering amplitude) for that specific final-state configuration under each of the hypotheses. The weight associated with an experimental event x, given a hypothesis α, is

P_α(x) = (1/σ_α) ∫ dΦ(y) |M_α(y)|² W(x, y) ,    (1)

where |M_α(y)|² is the squared leading-order matrix element, dΦ(y) is the phase-space measure (including the parton distribution functions), and W(x, y) is the transfer function, which describes the evolution of the parton-level final-state configuration y into a reconstructed event x in the detector. It is defined as the conditional probability to observe an experimental event x when the parton-level truth event is y. This function summarises many different physics effects, including parton shower, hadronization, detector resolution and so on [28]. The normalization by σ_α in Eq. (1) (dubbed the visible cross-section) ensures that P_α(x) is a probability density, ∫ P_α(x) dx = 1, if the transfer function is normalized to one. As is evident from the definition in Eq. (1), the calculation of each weight involves a non-trivial multi-dimensional integration of complicated functions over the phase space. Even if the problem of computing the weights for arbitrary models and processes can be automated, e.g. as implemented in MadWeight [5], such calculations remain extremely CPU intensive and are subject to numerical inaccuracies.
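As a rough illustration of the integral in Eq. (1), the following one-dimensional sketch estimates a weight by Monte-Carlo integration of a toy squared matrix element (a Breit-Wigner with an artificially inflated width, for numerical stability) against a Gaussian transfer function. All function names and numerical values are illustrative stand-ins, not the actual process or implementation of this letter.

```python
import math
import random

def matrix_element_sq(y, m=125.0, width=2.0):
    """Toy |M_alpha(y)|^2: a squared Breit-Wigner propagator (width inflated)."""
    s = y * y
    return 1.0 / ((s - m * m) ** 2 + (m * width) ** 2)

def transfer(x, y, sigma=10.0):
    """Gaussian transfer function W(x, y) with resolution sigma on the measured mass."""
    return math.exp(-0.5 * ((x - y) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mem_weight(x, n=20000, lo=50.0, hi=200.0, seed=1):
    """Monte-Carlo estimate of integral dy |M(y)|^2 W(x, y) over a 1-D 'phase space'."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = rng.uniform(lo, hi)  # sample a parton-level configuration
        total += matrix_element_sq(y) * transfer(x, y)
    return (hi - lo) * total / n  # volume times mean of the integrand
```

Even in one dimension the estimator needs many samples because the integrand is sharply peaked; in a realistic multi-dimensional phase space this is the CPU cost referred to above.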
We instead propose to replace the convolution of the matrix element with the transfer function by a maximisation procedure over the phase-space volume Φ,

w_α(x) = max_{y ∈ Φ} |M_α(y)|² W(x, y) .    (2)

(Note that the factor |M_α|² W(x, y) is not dimensionless; however, one can easily renormalise this quantity by a proportionality constant that cancels the dimensionful component of the squared matrix element. Such a normalisation factor would cancel out between signal and background for events with the same number of unmeasured final-state particles.) In order to use the maximisation algorithms efficiently over a high-dimensional space, it is important to parametrise the phase space in an optimal way. In particular, the invariant mass of every propagator that can be on-shell needs to be used as a degree of freedom of the phase space, as well as all the angles of the visible particles (due to the high detector resolution on those quantities). Such a parametrisation reduces the variance of the function by smoothing its peaks and helps to find the maximum more efficiently. To find such a parametrisation we rely on MadWeight, which provides a large set of changes of variables that can be combined to reach the optimal parametrisation of the phase space. For both Eq. (2) and Eq. (1), the amount of CPU time needed per event is related to the presence or absence of local maxima in the function, which are typically created by tension between the region of phase space favoured by the matrix element and the one favoured by the transfer function. Because of this origin, we do not expect a large number of false maxima, and it is quite simple to identify all of them and find the global maximum. This problem is much more complex in the case of the full phase-space integration, where the contribution of each of those phase-space regions needs to be evaluated correctly to obtain a good estimator of the weight. Obviously, using only the maximum also reduces the information encoded in the final weight and will reduce the sensitivity of the method (as should be evident from the Neyman-Pearson lemma).
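Under the same toy assumptions as before, the maximisation of Eq. (2) replaces the integral by a search for the largest value of |M|² W(x, y). Here a simple dense grid over the mass-like degree of freedom stands in for the optimised propagator-mass parametrisation described above; the numbers are illustrative only.

```python
import math

def integrand(y, x, m=125.0, width=2.0, sigma=10.0):
    """|M(y)|^2 * W(x, y) for a 1-D toy phase space (toy mass, width, resolution)."""
    bw = 1.0 / ((y * y - m * m) ** 2 + (m * width) ** 2)
    w = math.exp(-0.5 * ((x - y) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return bw * w

def mem_weight_max(x, lo=50.0, hi=200.0, n_grid=3001):
    """Approximate w(x) = max_y |M(y)|^2 W(x, y) by a dense grid scan over the
    mass-like degree of freedom, returning the best point and its value."""
    best_y, best_val = lo, 0.0
    for i in range(n_grid):
        y = lo + (hi - lo) * i / (n_grid - 1)
        val = integrand(y, x)
        if val > best_val:
            best_y, best_val = y, val
    return best_y, best_val
```

The best point best_y also provides the "educated guess" for the unmeasured degree of freedom mentioned in the introduction.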
After finding the most likely final-state configuration, given a limited amount of information, we construct an observable χ, which classifies each event according to whether it appears more signal- or background-like:

χ = w_S(x) / w_B(x) .

A selection requirement can be applied on χ to reject background, improving the analysis sensitivity. The significant gain in speed and the high performance of the classifier allow one to extend it to complex final states with many objects. The method can be integrated straightforwardly into the EventDeconstruction approach [19], thereby extending EventDeconstruction, which was already designed to handle an arbitrary number of visible final-state objects, to final states with invisible particles. Thus, even if this letter focuses on a single example, the method is entirely generic and can be applied to a large class of analyses. We will release a generic code [29], which allows one to apply the above method efficiently for any process and set of transfer functions, hence providing the same flexibility as MadWeight.
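A minimal sketch of the resulting classifier, assuming each event carries its two maximised weights (the dictionary keys and the threshold value are illustrative):

```python
import math

def chi(w_sig, w_bkg):
    """chi = w_S / w_B; events failing one hypothesis get 0 (cannot look like
    signal) or infinity (cannot look like background)."""
    if w_sig == 0.0:
        return 0.0
    if w_bkg == 0.0:
        return math.inf
    return w_sig / w_bkg

def select(events, threshold=10.0):
    """Keep events passing the selection chi >= threshold."""
    return [e for e in events if chi(e["w_S"], e["w_B"]) >= threshold]
```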
We note additionally that the introduction of transfer functions is necessary when the measured objects have a much worse momentum resolution than required to map out the fast variation of the matrix element. For example, in the process pp → HZ with subsequent decay H → bb̄, the matrix element is proportional to the Breit-Wigner propagator 1/[(p_b,1 + p_b,2)² − m_H² + i m_H Γ_H], and its square is thus maximised when (p_b,1 + p_b,2)² = m_H². However, because the width of the Higgs boson is only ∼ 4 MeV and (p_b,1 + p_b,2)² can experimentally only be reconstructed with a precision of O(GeV), measurement uncertainties dominate the value of the matrix element. Thus, one introduces transfer functions over which one integrates to ensure that the maximum contribution from the matrix element (i.e. when (p_b,1 + p_b,2)² = m_H²) is included in the classification between signal and background. Transfer functions are therefore particularly useful in the decay of resonances, where the matrix element changes quickly. However, when the measurement uncertainties on the momenta of particles are small compared to the change of the matrix element over the corresponding range of momenta, they can be neglected.
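The effect described above can be checked numerically. With a toy 125 GeV resonance of 4 MeV width and an O(10 GeV) mass resolution, the raw squared propagator collapses by many orders of magnitude for a 20 GeV mis-measurement, while its Gaussian-smeared counterpart only picks up the mild resolution factor (all numbers illustrative):

```python
import math

M_H, GAMMA_H, RES = 125.0, 0.004, 10.0  # mass and width in GeV, resolution 10 GeV

def bw_sq(s):
    """Squared Breit-Wigner propagator |1 / (s - m^2 + i m Gamma)|^2."""
    return 1.0 / ((s - M_H * M_H) ** 2 + (M_H * GAMMA_H) ** 2)

def smeared_bw(m_meas, n=20001):
    """|M|^2 convolved with a Gaussian resolution RES, integrated across the peak."""
    lo, hi = M_H - 5.0 * GAMMA_H, M_H + 5.0 * GAMMA_H
    total = 0.0
    for i in range(n):
        y = lo + (hi - lo) * i / (n - 1)
        total += bw_sq(y * y) * math.exp(-0.5 * ((m_meas - y) / RES) ** 2)
    return total * (hi - lo) / n

# Raw matrix element at a mis-measured mass vs. at the true mass,
# compared with the transfer-function-smeared ratio:
raw_ratio = bw_sq(105.0 ** 2) / bw_sq(125.0 ** 2)
smeared_ratio = smeared_bw(105.0) / smeared_bw(125.0)
```

The raw ratio is below 10⁻⁶ while the smeared ratio stays at the ∼ exp(−2) level, which is why the convolution (or, where applicable, the maximisation) is needed.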
This motivates the maximisation procedure we propose here. Invisible particles arise in the Standard Model, and in most envisioned extensions, in decays of electroweak (or heavier) resonances. Thus, we expect the matrix element to be maximised when m_W² = (p_ν + p_ℓ)² and, for the signal, when m_H² = (p_W,1 + p_W,2)². When the matrix element is peaked in such phase-space regions (which is the case for most processes including invisible particles), our approach is a good approximation to integrating over all degrees of freedom for all possible momenta, as the final weight is dominated by the maximum of the matrix element.
Before we apply the matrix element method, we select candidate events with the following event-selection cuts, which render all but one irreducible background process insignificant. We select muons with a minimum transverse momentum of p_T,µ > 10 GeV and require the absolute value of the pseudo-rapidity to satisfy |η_µ| < 2.5, to ensure that the muons are within the range of the detector's tracker system. The experimental resolution on the momenta of the muons is precise enough for us to assume their experimental uncertainty to be negligible. Thus, we define W(x, y) of Eq. (2) as a product of Dirac delta functions in the measured muon momenta. To reduce the muon mis-identification rate, the sum I of the transverse momenta of charged particles within ∆R(µ, particle) < min(0.3, 10 GeV/p_T,µ) is required to satisfy I/p_T,µ < 0.06. With these basic selection cuts, for √s = 13 TeV, an integrated luminosity of L = 30 fb⁻¹ and no detector simulation, we obtain a signal-to-background ratio of S/B ≈ 0.03 and a statistical sensitivity of S/√B ≈ 3.06, as shown in Table I. For the generated signal pseudo-data, we can now use the method discussed above to evaluate the weight for a signal event to look like signal, w_S(S), or to look like background, w_B(S). The impact of initial-state radiation is dampened by boosting back the reconstructed momenta of the leptons, as suggested in [6]. The four free parameters defining the phase space Φ over which we maximise the matrix elements are ŝ, s_13 and s_24, as shown in Fig. 1, and the rapidity y_all of the full system, where "all" comprises the two muons and the missing transverse energy. In the example at hand, we can impose further boundary conditions, i.e. √ŝ ≈ m_h, √s_13 ≈ m_W and √s_24 < m_W for the signal, and 2m_W < √ŝ < 3m_W, √s_13 ≈ m_W and √s_24 ≈ m_W for the background.³
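The muon selection above can be summarised in a short sketch; the event-record format (dictionary keys) is hypothetical:

```python
def cone_size(pt):
    """Isolation cone: Delta R < min(0.3, 10 GeV / pT)."""
    return min(0.3, 10.0 / pt)

def passes_selection(muon):
    """Apply pT > 10 GeV, |eta| < 2.5 and the isolation requirement I / pT < 0.06,
    where muon["iso"] is the summed pT of charged particles in the cone."""
    if muon["pt"] <= 10.0:
        return False
    if abs(muon["eta"]) >= 2.5:
        return False
    return muon["iso"] / muon["pt"] < 0.06
```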
Despite limiting the four-dimensional parameter space, the matrix-element-weighted hypersurface is complicated enough to give rise to multiple maxima, or to fail to give a physical solution for the matrix element entirely. As we probe the parameter space spanned by the unmeasured degrees of freedom, subject to the restrictions mentioned previously, it may happen that no probed point satisfies these restrictions. This is a consequence of using a leading-order matrix element and simplified transfer functions. A detailed study can relate (most of) those events without any physical solution to the presence of radiation not correctly handled by the transfer function. Thus, to find the global maximum, we rerun the maximisation procedure with randomly modified initial conditions n_r = 500 times for signal and background each.⁴
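A random-restart strategy of this kind can be sketched as follows, assuming a simple hill-climbing local search (not the actual optimiser used for this letter); the objective and bounds in the test are illustrative:

```python
import random

def global_max(f, bounds, n_restarts=500, n_steps=200, seed=7):
    """Random-restart hill climbing: rerun a local search from n_restarts
    randomly chosen starting points and keep the best maximum found."""
    rng = random.Random(seed)
    best_x, best_f = None, float("-inf")
    for _ in range(n_restarts):
        # Randomly modified initial conditions for each restart.
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        fx = f(x)
        step = [(hi - lo) * 0.1 for lo, hi in bounds]
        for _ in range(n_steps):
            cand = [min(max(xi + rng.gauss(0.0, s), lo), hi)
                    for xi, s, (lo, hi) in zip(x, step, bounds)]
            fc = f(cand)
            if fc > fx:            # accept only improvements
                x, fx = cand, fc
            else:                  # shrink the step to refine around the local maximum
                step = [0.98 * s for s in step]
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```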
Thus, we can calculate the weights for the signal and background hypotheses, w_S and w_B, for both signal and background events. We show all four distributions in Figs. 2(a) and 2(b). Event kinematics which do not result in a physical configuration for the signal or background hypothesis give w_S = 0 or w_B = 0, respectively. We do not show such events in Figs. 2(a) and 2(b), but their fraction can be inferred from Table I. A fairly large number of background events fail the kinematic requirements to look like signal, i.e. they result in w_S(B) = 0. This behaviour is beneficial for the significance of the analysis, as such background events have zero probability to mimic the signal.
After vetoing all events where w_{S,B}(S, B) = 0 we find S/B ≈ 0.08 and S/√B ≈ 4.91, and show the distribution of weights in Fig. 2(c). While a comparison of the absolute weights for the signal and background hypotheses does not allow for a strong separation on an event-by-event basis (see the distributions on the horizontal and vertical axes of Fig. 2(c)), taking the ratio for each event results in a strong discrimination between signal and background, see Fig. 3. For example, by requiring χ ≥ 10 we reject 81% of the background while still accepting 80% of the signal, resulting in S/B ≈ 0.12 and S/√B ≈ 5.59. In order to estimate the impact of a selection requirement on the observable log(χ), we scan all possible threshold values.

³ We tested larger windows for √ŝ but did not find them to change the background weights significantly. The asymmetric phase-space cuts on √s_13 and √s_24 are flipped half of the time when maximising over the phase space.
⁴ We have varied n_r between 0 and 500 and find that for n_r > 150 the change of w_S and w_B is insignificant.

While the momenta of the charged leptons can be measured very precisely, the total amount of missing transverse energy is subject to experimental uncertainties. Such uncertainties can potentially affect the ROC curve and the overall significance negatively. To estimate the impact of this uncertainty on our method, we include a 10% resolution effect by smearing the missing transverse energy with a Gaussian distribution.
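A minimal sketch of this smearing model, assuming each missing-energy component is smeared with a Gaussian of width equal to 10% of the MET magnitude (a simplification of the real detector response):

```python
import math
import random

def smear_met(met_x, met_y, resolution=0.10, rng=None):
    """Smear the missing-transverse-energy components with a Gaussian whose
    width is resolution * |MET| (simplified 10% resolution model)."""
    rng = rng or random.Random()
    sigma = resolution * math.hypot(met_x, met_y)
    return met_x + rng.gauss(0.0, sigma), met_y + rng.gauss(0.0, sigma)
```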
Both in the ROC curve and in Table I, we show the impact of the missing-transverse-energy resolution on this method. We find, however, that this effect reduces S/√B only slightly, from 5.59 to 5.46.
Instead of a cut-and-count procedure, one can use the full shape of the χ distribution of Fig. 3 to set a CLs [37,38] limit on the Higgs-W coupling. Note that, in this procedure, the χ variable is never used for the statistical interpretation directly, since it is biased. Instead, we histogram the variable χ for the signal (under several g_H,WW conditions), add it to the histogram of the background, and use the bin contents (under a chosen binning) to define a likelihood function given by the product of Poisson distributions with means equal to the resulting histogram bin contents. The same procedure is carried out for the background-only hypothesis to define a background-only likelihood. One can then define a likelihood-ratio function, as required for the CLs method. Including the 10% resolution effect on the missing-transverse-energy reconstruction, one can set a 95% CL limit on the Higgs-W-boson coupling of g_H,WW ∈ [0.65, 1.25] × g_H,WW,SM. While a direct comparison is difficult due to the different collider energies, the limit we obtain is already better than the one from the full combined 7 and 8 TeV data set for the gluon-fusion Higgs production process with subsequent decay into W bosons [39].
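The binned Poisson likelihood ratio underlying such a CLs-style test can be sketched as follows; the bin contents in the test are made-up numbers, not the letter's histograms:

```python
import math

def poisson_nll(observed, expected):
    """Negative log-likelihood of a product of Poisson terms, one per chi-histogram bin."""
    return sum(mu - n * math.log(mu) + math.lgamma(n + 1.0)
               for n, mu in zip(observed, expected))

def q_llr(observed, sig_plus_bkg, bkg_only):
    """Test statistic q = -2 ln [ L(s+b) / L(b) ]; negative values favour s+b."""
    return 2.0 * (poisson_nll(observed, sig_plus_bkg) - poisson_nll(observed, bkg_only))
```

Distributions of q under the two hypotheses then yield the CLs value; that final step is omitted here.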

SUMMARY AND CONCLUSION
We have proposed a matrix-element-based method designed to perform a hypothesis test in the presence of multiple invisible particles in the final state. Without integrating over the phase space, the most likely kinematic configuration is calculated separately for signal and background. We make full use of the information available on the particles involved in the process.
We applied this method to separate the process pp → H → W W* → µ⁺µ⁻ ν_µ ν̄_µ from the irreducible background pp → W W → µ⁺µ⁻ ν_µ ν̄_µ. Using only objects that are experimentally well under control, i.e. the momenta of the muons, from which we calculate the missing transverse energy, we are able to set a strong limit on the Higgs coupling to W bosons, assuming an integrated luminosity of 30 fb⁻¹ at √s = 13 TeV.
Other methods to reconstruct partly invisible final states have been devised before, e.g. m_T2 [11,40] or boosted kinematics [41]. However, this matrix element method does not rely on a specific kinematic structure of the decays, e.g. the presence of particles with the same mass, or on the number of invisible final-state particles. We will release a generic Monte-Carlo implementation of this method in a future publication [29], thereby demonstrating its flexibility and applicability to a wide range of beyond-the-Standard-Model scenarios.