Discoveries far from the Lamppost with Matrix Elements and Ranking

The prevalence of null results in searches for new physics at the LHC motivates the effort to make these searches as model-independent as possible. We describe procedures for adapting the Matrix Element Method for situations where the signal hypothesis is not known a priori. We also present general and intuitive approaches for performing analyses and presenting results, which involve the flattening of background distributions using likelihood information. The first flattening method involves ranking events by background matrix element, the second involves quantile binning with respect to likelihood (and other) variables, and the third method involves reweighting histograms by the inverse of the background distribution.

The CERN Large Hadron Collider (LHC) will soon resume operation, probing energies never before accessed with colliders. Ideas about what sort of new physics it will discover often focus on models that resolve the hierarchy problem [1] or provide relic dark matter candidates [2]. There is a great variety of ideas in each category. However, the lack of convincing evidence for new physics at the LHC to date suggests that we may be looking in the wrong places. We therefore consider methods that will allow the discovery of any departure from known physics.
The Matrix Element Method (MEM) [3] and similar multivariate analyses [4] have been used successfully in LHC experiments. A particularly dramatic example is the use of the MEM in the four-lepton channel for the discovery of the Higgs boson [5] and the measurement of its properties [6]. Such analyses used variables, such as MELA KD [7] or MEKD [8], that involve the ratio of signal and background matrix elements. These variables are therefore optimized for specific signal and background hypotheses.
A natural question is whether we can preserve, to some degree, the sensitivity of the MEM without assuming a specific signal hypothesis. We argue that this is possible. Specifically, one can use the logarithm of the background likelihood of the observed events as a test statistic, in a way that allows a model-independent discovery of new physics. We then demonstrate how the optimal MEM-based (as well as other) techniques can be used to create flat distributions for the background with respect to kinematic variables of interest, including, most importantly, variables derived from the matrix element.
The Matrix Element Method. According to the Neyman-Pearson lemma [9], the optimal test statistic for comparing hypotheses H_0 and H_1 is the likelihood ratio

$$ \mathcal{R} = \frac{\Lambda_{H_1}(\{E_i\})}{\Lambda_{H_0}(\{E_i\})}, \qquad (1) $$

where Λ_{H_0} (Λ_{H_1}) is the likelihood for hypothesis H_0 (H_1) as a function of the data, which we assume consists of N events, E_i. In the MEM, the likelihood for a given event is calculated using the expression

$$ P(\{p^{\rm vis}\}\,|\,H_i) = \frac{1}{\sigma(H_i)} \int dx_1\, dx_2 \sum_{k,l} f_k(x_1)\, f_l(x_2)\, \big|\mathcal{M}_{H_i,kl}(\{p^{\rm vis}\})\big|^2 \, d\Phi, \qquad (2) $$

where M_{H_i,kl} is the theoretical matrix element for hypothesis H_i, f_k and f_l are parton distribution functions (pdfs) as functions of the momentum fractions x_1 and x_2, dΦ is the phase-space measure, and σ(H_i) is the total cross section after acceptances, efficiencies, etc. The likelihood for a set of N events {{p^vis_i}_j} (where j ranges from 1 to N) is simply the product of the likelihoods for each event:

$$ \Lambda_{H_i}\big(\{\{p^{\rm vis}_i\}_j\}\big) = \prod_{j=1}^{N} P\big(\{p^{\rm vis}_i\}_j\,|\,H_i\big). \qquad (3) $$

Thus the likelihood ratio (1) is a product of ratios of the event-by-event likelihoods described in Eq. (2). Often the two hypotheses (H_0 and H_1) will involve the same final state, so the phase-space factors in Eq. (2) cancel in the likelihood ratio. We are then left with a ratio of squared matrix elements, possibly weighted by pdfs (when the hypotheses involve different initial-state partons). These squared matrix elements contain a great deal of information about the process, including its pole structure, spin correlations, etc. While implementing an analysis with the likelihood ratio (1) as a test statistic may sometimes be challenging in practice, conceptually the implementation is straightforward and the sensitivity is, at least in principle, optimal [7,8].
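Because the sample likelihood in Eq. (3) is a product over events, the likelihood ratio (1) is in practice evaluated as a sum of per-event log-likelihood ratios, which avoids numerical underflow for large N. The sketch below illustrates only that bookkeeping. The one-dimensional densities `p_background` (flat) and `p_signal` (Gaussian at 125 GeV) are hypothetical toy stand-ins for the full matrix-element likelihoods of Eq. (2), not the MEKD calculation used in this letter.

```python
import math
from typing import Callable, Sequence


def log_likelihood_ratio(events: Sequence[float],
                         p_h1: Callable[[float], float],
                         p_h0: Callable[[float], float]) -> float:
    """Return ln(Lambda_H1 / Lambda_H0) for independent events.

    The sample likelihood is a product of per-event likelihoods (Eq. 3),
    so its logarithm is a sum of per-event log-likelihood ratios.
    """
    return sum(math.log(p_h1(e)) - math.log(p_h0(e)) for e in events)


# Hypothetical toy densities in a single "mass" variable m (GeV) on [100, 150].
M_LO, M_HI = 100.0, 150.0


def p_background(m: float) -> float:
    # Toy assumption: flat background over the mass window.
    return 1.0 / (M_HI - M_LO)


def p_signal(m: float, mass: float = 125.0, sigma: float = 2.0) -> float:
    # Toy assumption: Gaussian resonance (truncation to the window is ignored).
    z = (m - mass) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


events = [124.1, 125.6, 131.0, 118.4]
print(log_likelihood_ratio(events, p_signal, p_background))
```

Large positive values favor H_1 and large negative values favor H_0; the point the text makes is that this requires an explicit p_h1, which is exactly what is unavailable when the signal is unknown.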
arXiv:1405.5879v1 [hep-ph] 22 May 2014

Discovery from Background Likelihood Distributions. The limitation of the MEM is that we must know the signal process in order to calculate the appropriate likelihood. If we do not know which signal model we are looking for, we can no longer form the likelihood ratio, since we know only one hypothesis: the background. It is still useful, however, to exploit the information about the background that is encoded in its matrix element. We therefore propose using the background likelihood, and closely related expressions, as test statistics:

$$ \Lambda_B = \sum_{i=1}^{N} \ln P(E_i\,|\,\mathrm{bg}). \qquad (4) $$

Here N is the number of events and P(E_i | bg) is either defined following Eq. (2) for the background hypothesis or is a similar variable. Thus, in this letter, our test statistic will be the sum of the logarithms of the pdf-weighted squared background matrix elements, as functions of the visible momenta in each event in our data sample, calculated using MEKD [8] (a package for MEM calculations for the four-lepton final state based on MadGraph [10]). The event-by-event value of this quantity will be labeled |M|^2 in subsequent figures, while the sum of this quantity over the events in a pseudo-experiment will be labeled Λ_B.
As an example, Fig. 1 shows the distribution of Λ_B for pseudo-experiments of 20 events each, generated under three hypotheses: the irreducible qq̄ → 2e2µ background (red solid curve), gluon-fusion production of a 125 GeV Higgs boson decaying to 2e2µ (green dashed curve), and the irreducible qq̄ → 2e2µ background with the Z boson width scaled down by a factor of 5 (blue dot-dashed curve).
In this figure, we also demonstrate the procedure for obtaining a p-value, which quantifies the extent to which the actual data are consistent with the background hypothesis. (This is a somewhat "brute force", but conceptually straightforward, approach to the problem of evaluating the goodness of fit of a likelihood [11].) Specifically, one takes the value of Λ_B measured in the data, labeled "Data Value" and calculated following Eqs. (2) and (4), and evaluates the fraction of background pseudo-experiments whose values of Λ_B are of equal or lesser likelihood than "Data Value". This fraction corresponds to the shaded region in the figure. The particular "Data Value" used in this figure is consistent with the hypothesis of a 125 GeV Higgs boson; more generally, many signal hypotheses produce events with lower values of Λ_B than would be expected from background events. However, it is also possible for a signal hypothesis to be "more background-looking" than the background itself, such as background events with a reduced Z boson width, as can be seen from the blue dot-dashed curve.
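The pseudo-experiment procedure just described can be sketched in a few lines. This is a minimal illustration, not the analysis code behind Fig. 1: the exponential background density `log_p_bg` in a single variable x is a hypothetical stand-in for the MEKD background likelihood, and the "data" sample (15 background events plus 5 events clustered near x ≈ 3) is invented for illustration. The 20-event pseudo-experiment size matches the text; the number of pseudo-experiments is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=1)


def lambda_b(events, log_p_bg):
    """Test statistic of Eq. (4): sum of ln P(E_i | bg) over the events."""
    return float(np.sum(log_p_bg(events)))


def p_value(lambda_data, lambda_bg_pseudo):
    """Fraction of background pseudo-experiments whose Lambda_B is less than
    or equal to the observed value (i.e. of equal or lesser likelihood)."""
    lambda_bg_pseudo = np.asarray(lambda_bg_pseudo, dtype=float)
    return float(np.mean(lambda_bg_pseudo <= lambda_data))


# Hypothetical toy background: exponential density in one variable x >= 0.
TAU = 1.0


def log_p_bg(x):
    return -np.log(TAU) - np.asarray(x, dtype=float) / TAU


n_events, n_pseudo = 20, 10_000
bg_pseudo = [lambda_b(rng.exponential(TAU, n_events), log_p_bg)
             for _ in range(n_pseudo)]

# Hypothetical "data": 15 background events plus 5 signal-like events near x = 3.
data = np.concatenate([rng.exponential(TAU, 15), rng.normal(3.0, 0.1, 5)])
print(p_value(lambda_b(data, log_p_bg), bg_pseudo))
```

This p-value is one-sided: it only flags samples that are less background-like than expected. A "more background-looking" signal, such as the reduced-width example above, would populate the opposite tail, so a two-sided variant would be needed to catch it.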
How to Flatten Background Distributions: Examples. The main point of this letter is that the procedure above allows one to exclude the background hypothesis in the presence of an unknown signal. In other words, one can confidently look for new physics models "away from the lamppost", i.e., models which no theorist has yet thought of. While, in principle, any variable could be used to construct such a test statistic, a variable based on the background likelihood should also enhance the sensitivity of such searches.

FIG. 1: The test statistic Λ_B of Eq. (4) as evaluated for 20-event pseudo-experiments consisting of background qq̄ → 4ℓ events (red solid curve), gg → H → 4ℓ signal events for a 125 GeV Higgs boson (green dashed curve), and qq̄ → 4ℓ events for which the Z boson width has been reduced by a factor of 5 (blue dot-dashed curve). If a particular value of the test statistic, indicated by "Data Value", is observed, the corresponding p-value is given by the gray area. In this specific case, p ≈ 0.13.
We now present some related methods, which allow the "non-backgroundness" of some potential signal to be shown in a clear and intuitive way. These methods also have the benefit that they generalize to any possible channel, so results and sensitivity in various channels can easily be compared.
1. Flattening with Ranking. In this approach, one takes the normalized background distribution, dN/dξ, for some kinematic variable ξ, and defines a "ranking" variable,

$$ r(\xi) = \int_{-\infty}^{\xi} \frac{dN}{d\xi'}\, d\xi'. \qquad (5) $$

We note that r(ξ) is the cumulative distribution function of the background with respect to the variable ξ. We can now evaluate the ranking r_ξ of any given event E by defining

$$ r_\xi(E) \equiv r\big(\xi(E)\big); $$

that is, the value of the ranking variable for a given event E is the value found from Eq. (5) for the value of the kinematic variable ξ obtained for that event. The connection between our ranking variable r_ξ(E) and the background ξ distribution is shown pictorially in Fig. 2(a). The figure also illustrates the physical meaning of r_ξ(E): it is the fraction of background events E′ for which ξ(E′) < ξ(E).

FIG. 2: Panel (a) shows how the distribution of background MEKD for background events is used to create a "ranking variable". In panel (b) we show that the background distribution with respect to this ranking variable is flat, while for other processes the distribution of the background ranking variable is not flat.
If we then consider the normalized distribution of the background with respect to r_ξ, we find

$$ \frac{dN}{dr_\xi} = \frac{dN}{d\xi}\,\frac{d\xi}{dr_\xi} = \frac{dN/d\xi}{dN/d\xi} = 1, \qquad 0 \le r_\xi \le 1, $$

since dr/dξ = dN/dξ by Eq. (5). Hence the distribution of this variable for background events is flat, as shown in Fig. 2(b). This procedure works for any kinematic variable ξ, but we especially recommend using it with the sensitive matrix-element-based variables advocated above. One then obtains a sensitive variable for which the background distribution is flat, while the distribution of signal events is characterized by departures from flatness, as is also shown in Fig. 2(b). We note in passing that calculating r(ξ) from Monte Carlo (MC) events is straightforward. One simply calculates the value of ξ for each of the N events in the MC sample, obtaining a list of values {ξ_i}. The value of r(ξ) is then well approximated by the fraction of the {ξ_i} that are less than ξ, i.e.,

$$ r(\xi) \approx \frac{1}{N} \sum_{i=1}^{N} \Theta(\xi - \xi_i), $$

where Θ is the Heaviside step function. This procedure should facilitate the experimental implementation of this technique.
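The MC estimate of the ranking variable in Eq. (5) amounts to counting, for each event, the fraction of background MC values below its ξ. A sketch with NumPy follows; the exponential "background" and the Gaussian "signal" in ξ are hypothetical toy choices, and `make_ranking` is an illustrative helper name, not part of any existing package.

```python
import numpy as np


def make_ranking(mc_values):
    """Return r(xi): the fraction of background MC values strictly below xi,
    an empirical estimate of the background CDF of Eq. (5)."""
    sorted_mc = np.sort(np.asarray(mc_values, dtype=float))
    n = sorted_mc.size

    def r(xi):
        # side="left" returns the number of sorted entries strictly less than xi
        return np.searchsorted(sorted_mc, xi, side="left") / n

    return r


rng = np.random.default_rng(seed=2)
# Hypothetical background MC sample for some kinematic variable xi.
background_mc = rng.exponential(scale=1.0, size=100_000)
r = make_ranking(background_mc)

# Independent background events: their ranks should be flat on [0, 1].
ranks_bg = r(rng.exponential(scale=1.0, size=20_000))
counts_bg, _ = np.histogram(ranks_bg, bins=10, range=(0.0, 1.0))
print(counts_bg)

# A hypothetical signal concentrated at large xi piles up near r = 1.
ranks_sig = r(rng.normal(4.0, 0.3, size=2_000))
print(ranks_sig.mean())
```

Sorting once and using binary search keeps the cost at O(log N) per event, which matters when the MC sample is large; the same ranking can be applied unchanged to background MC, signal MC, and data.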

2. Flattening with Quantile Bins. An alternative approach is to use quantile bins.¹ If we are only considering one variable, ξ, this approach consists of finding n + 1 values η_1, η_2, ..., η_{n+1} such that

$$ \int_{\eta_i}^{\eta_{i+1}} \frac{dN}{d\xi}\, d\xi = \frac{1}{n}, \qquad i = 1, \dots, n, $$

i.e., the integral of the distribution is equal in each bin. This procedure can be extended to the case of several variables ξ_i, where again we demand that the integral of the distribution be the same in each bin. For example, in two dimensions, we must choose values of ξ_1: η_{1,1}, η_{1,2}, ..., η_{1,n+1} and values of ξ_2: η_{2,1}, η_{2,2}, ..., η_{2,n+1}, such that

$$ \int_{\eta_{1,i}}^{\eta_{1,i+1}} d\xi_1 \int_{\eta_{2,j}}^{\eta_{2,j+1}} d\xi_2\, \frac{d^2N}{d\xi_1\, d\xi_2} = \frac{1}{n^2}, \qquad i, j = 1, \dots, n. $$

When ξ_1 and ξ_2 are correlated, the ξ_2 boundaries generally have to be chosen separately within each ξ_1 bin, i.e., η_{2,j} then depends on i.

This procedure allows us to consider other kinematic variables in addition to a likelihood-based variable. Examples are shown in Figs. 3 and 4, in which we consider the distribution of four-lepton events at the 8 TeV LHC in terms of the four-lepton invariant mass, m_{4ℓ}, and the background MEKD value. In Fig. 3 we show the results of an example experiment in which we have formed quantile bins in m_{4ℓ} and |M|^2, assuming the background hypothesis. We then plot the number of events in each quantile bin either for 150 background qq̄ → 2e2µ events (panels in the top row) or for 75 signal events from a 125 GeV Higgs boson plus 75 background events (panels in the bottom row). The panels in the left column show a single 150-event pseudo-experiment, while the panels in the right column show the average over 400 such pseudo-experiments. Fig. 4 illustrates the same concept using scatter plots. Here the signal-to-background ratio has been changed from 1:1 (which is realistic for the 125 GeV H → 4ℓ signal and the qq̄ → 4ℓ background) to the much more challenging 1:3. Nevertheless, the presence of a new signal can still be inferred from the anomalous clustering of points. Note that departures from uniform density are easier to interpret in the scatter plot in panel (b), which uses ranking variables.

¹ Quantile bins have previously been employed in studies of the LHC inverse problem [12].
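Quantile bins can be fitted on background MC with `np.quantile`. The sketch below assumes one particular two-dimensional construction: equal-population edges in ξ_1, then separate equal-population edges in ξ_2 inside each ξ_1 bin, so that every cell holds about 1/n² of the background even when the variables are correlated. The correlated Gaussian pair in `toy_background` is a hypothetical stand-in for (m_{4ℓ}, |M|^2), and all helper names are illustrative.

```python
import numpy as np


def quantile_edges(values, n_bins):
    """Edges eta_1 < ... < eta_{n+1} with 1/n of the sample in each bin."""
    return np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))


def bin_index(edges, values):
    """Bin index for each value; values outside the MC range go to edge bins."""
    n_bins = len(edges) - 1
    idx = np.searchsorted(edges, values, side="right") - 1
    return np.clip(idx, 0, n_bins - 1)


def fit_quantile_grid(x1, x2, n_bins):
    """Quantile edges in x1, then separate quantile edges in x2 per x1 bin."""
    edges1 = quantile_edges(x1, n_bins)
    i1 = bin_index(edges1, x1)
    edges2 = [quantile_edges(x2[i1 == i], n_bins) for i in range(n_bins)]
    return edges1, edges2


def occupancy(grid, x1, x2):
    """Count events in each (x1-bin, x2-bin) cell of a fitted grid."""
    edges1, edges2 = grid
    n_bins = len(edges1) - 1
    counts = np.zeros((n_bins, n_bins), dtype=int)
    i1 = bin_index(edges1, x1)
    for i in range(n_bins):
        sel = i1 == i
        i2 = bin_index(edges2[i], x2[sel])
        counts[i] += np.bincount(i2, minlength=n_bins)
    return counts


rng = np.random.default_rng(seed=4)


def toy_background(n):
    # Hypothetical correlated pair of variables.
    x1 = rng.normal(0.0, 1.0, n)
    x2 = x1 + rng.normal(0.0, 0.5, n)
    return x1, x2


grid = fit_quantile_grid(*toy_background(100_000), n_bins=5)
counts = occupancy(grid, *toy_background(50_000))
print(counts)  # each of the 25 cells should hold roughly 2000 background events
```

A signal would appear as cells whose counts deviate from the uniform expectation, which is the clustering effect described for Figs. 3 and 4.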
3. Flattening with Respect to All the Variables. An extreme case of flattening the background distribution with respect to kinematic variables occurs when we consider a complete set of kinematic variables for some process. We could, of course, construct quantile bins in all of these variables using Monte Carlo. However, in the limit where we have a good analytical, or at least numerical, understanding of the background, we can instead perform the flattening directly with the background distribution.
Specifically, if the background (after detector simulation, etc.) is described by the differential distribution d^nN/dξ^n, where ξ = (ξ_1, ..., ξ_n), then weighting each background event by 1/(d^nN/dξ^n) yields a distribution that is flat in the full n-dimensional space of values (the expected weighted content of any bin is simply its volume). If we weight data events in the same way, a signal will show up as a deviation from flatness. This procedure is demonstrated in Fig. 5. The image in the top left represents our background PDF. If we generate "events" (i.e., pixels) according to this PDF, but weight the corresponding 2D histogram by the reciprocal of the PDF, we obtain an essentially flat distribution, shown in the top right. In the bottom left image, some "signal" (American footballs and flying saucers) has been added to the background. If events are generated according to this PDF, but weighted by the reciprocal of the background PDF, we obtain the bottom right image, in which background features have been flattened but signal features remain distinct.
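The reweighting idea can be checked on a toy two-dimensional example. In this sketch the background PDF (1 + x + y)/2 on the unit square and the Gaussian "signal" blob at (0.75, 0.25) are hypothetical choices made for illustration (they are not the images of Fig. 5); the PDF is kept bounded away from zero so that the weights 1/PDF stay finite. Each event is weighted by 1/(N · PDF), so a pure-background histogram should have every bin close to its area.

```python
import numpy as np

rng = np.random.default_rng(seed=3)


def bg_pdf(x, y):
    """Hypothetical normalized background PDF on the unit square."""
    return 0.5 * (1.0 + x + y)


def sample_bg(n):
    """Rejection sampling from bg_pdf, whose maximum on the square is 1.5."""
    chunks, total = [], 0
    while total < n:
        x, y = rng.random(n), rng.random(n)
        keep = rng.random(n) * 1.5 < bg_pdf(x, y)
        chunks.append(np.column_stack([x[keep], y[keep]]))
        total += int(keep.sum())
    return np.concatenate(chunks)[:n]


def flattened_hist(events, n_bins=10):
    """2D histogram with each event weighted by 1 / (N * bg_pdf)."""
    x, y = events[:, 0], events[:, 1]
    w = 1.0 / (len(events) * bg_pdf(x, y))
    h, _, _ = np.histogram2d(x, y, bins=n_bins, range=[[0, 1], [0, 1]], weights=w)
    return h


# Pure background: every bin should be close to its area, 0.01.
h_bg = flattened_hist(sample_bg(200_000))

# Background plus a hypothetical narrow "signal" blob at (0.75, 0.25).
sig = rng.normal([0.75, 0.25], 0.03, size=(20_000, 2))
sig = sig[(sig >= 0).all(axis=1) & (sig <= 1).all(axis=1)]
h_mix = flattened_hist(np.concatenate([sample_bg(180_000), sig]))
print(np.unravel_index(np.argmax(h_mix), h_mix.shape))
```

The background shape is removed entirely, so the only structure left in `h_mix` is the signal excess. In a real analysis the weights are only as good as the background model, and regions where the background PDF is very small produce large, noisy weights.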
Conclusions. We have presented methods that use variables based on the squared matrix element to search for new physics signals at the LHC in a model-independent way. These approaches allow model-independent exclusion of the Standard Model in the presence of arbitrary, unspecified new physics. We look forward to the use of such methods in the upcoming Run 2 of the LHC.