New Physics Agnostic Selections For New Physics Searches

We discuss a model-independent strategy for boosting new physics searches with the help of an unsupervised anomaly detection algorithm. Prior to a search, each input event is preprocessed by the algorithm, a variational autoencoder (VAE). Based on the loss assigned to each event, the input data can be split into a background control sample and a signal-enriched sample. Following this strategy, one can enhance the sensitivity to new physics without assumptions on the underlying new physics signature. Our results show that a typical BSM search performed on the signal-enriched sample is more sensitive than an equivalent search performed on the original dataset.


Introduction
The search for physics beyond the Standard Model (BSM) of particle physics is one of the most important aspects of the Large Hadron Collider (LHC) physics program. Several model-independent strategies to search for new physics have been studied [1][2][3][4][5][6][7][8][9]. Inspired by these studies, we propose a strategy to boost BSM searches by applying a machine learning (ML) method to preprocess data in an unsupervised fashion. The input to a BSM analysis usually contains a large share of background events and a small share of potential signal events. In our procedure, events are filtered before the analysis such that a signal event is more likely to pass the selection than a background event. This enhances the signal-over-background ratio, where the filter threshold defines how "anomalous" an event must look. In our study, the definition of anomaly is learned by training an unsupervised algorithm on a data sideband. This limits the amount of assumptions on the underlying new physics. The anomaly detection is based on an autoencoder (AE), which processes an event in three steps: first, one compresses the input into a latent representation. Second, one reconstructs the input from that latent representation (Figure 1 left). Third, one computes the distance (in some metric) between input and reconstruction. If the distance for a sample i is small, the reconstruction worked well, i.e. sample i was "easy" to reconstruct. It is then considered a typical example of the kind of events processed by the AE at training time. If instead the distance is large, the sample was hard to reconstruct. This is interpreted as an indication that the example could be anomalous. No assumption is made on why a distance is large, i.e. on why the sample is anomalous. On the one hand, this guarantees a model-independent selection and could allow one to retain sensitivity to untested scenarios. On the other hand, one retains little control over the physics behind the event selection. This procedure is best suited to test unanticipated scenarios and weakly defined BSM frameworks.
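The three reconstruction-based steps above can be sketched as follows (a minimal NumPy illustration; the toy arrays and the `anomaly_score` helper are hypothetical, standing in for a trained autoencoder):

```python
import numpy as np

def anomaly_score(x, x_reco):
    # Distance between input and reconstruction; here the metric is
    # the per-event mean squared error.
    return np.mean((x - x_reco) ** 2, axis=-1)

# Toy inputs and their (pretend) autoencoder reconstructions: the first
# event is reconstructed well, the second poorly, so the second event
# receives the larger anomaly score.
x      = np.array([[0.0, 1.0], [0.0, 1.0]])
x_reco = np.array([[0.1, 0.9], [2.0, -1.0]])
scores = anomaly_score(x, x_reco)  # the second score is larger
```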

Variational Autoencoder (VAE)
In our work we consider a variational autoencoder [10,11] (Figure 1 right). Its encoder and decoder are stochastic, meaning that their outputs are probability distributions rather than pointwise values. The encoder maps an input x to a distribution q_Φ(z|x) in the latent space. A pointwise latent value is then sampled from this distribution, z ∼ q_Φ(z|x). The obtained z is fed to the decoder, which maps it back to a distribution p_Θ(x|z) in the input space. A pointwise prediction for the reconstruction can be obtained by sampling from that distribution, x ∼ p_Θ(x|z). A prior probability distribution is imposed on the latent space, which makes it possible to penalize deviations from the prior in the loss function. The architecture of the VAE is illustrated in Figure 2.
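The stochastic encode-sample-decode chain can be sketched as follows (toy NumPy stand-ins for the trained encoder and decoder networks; all function names and shapes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    # Stand-in for the encoder network: it returns the mean and
    # log-variance parametrizing the Gaussian q_Phi(z|x).
    mu = x.mean(axis=-1, keepdims=True)
    logvar = np.zeros_like(mu)
    return mu, logvar

def sample(mu, logvar):
    # Draw a pointwise latent value z ~ q_Phi(z|x)
    # ("reparameterization": z = mu + sigma * eps with eps ~ N(0, I)).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, n_out):
    # Stand-in for the decoder network: it returns the mean mu(z) of
    # p_Theta(x|z); a reconstruction can be sampled around it.
    return np.repeat(z, n_out, axis=-1)

x = np.ones((4, 8))           # 4 toy "events" with 8 features each
z = sample(*encode(x))        # one pointwise latent value per event
x_reco = decode(z, x.shape[1])
```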

Optimization
The loss function of our VAE model consists of two terms (Eq. 1): a reconstruction loss and a loss on the latent space. The latent-space penalty is the Kullback-Leibler (KL) divergence [10] between the imposed prior and the encoder output, scaled by a factor β [12] (in our model β = 5·10⁻⁴):

L = L_reco + β · D_KL( q_Φ(z|x) || p(z) ) .    (1)

The reconstruction loss is defined assuming a Gaussian distribution of the values x reconstructed from the latent space z, p_Θ(x|z) = N(μ(z), I), giving the minimization problem in Eq. 2:

min_{Θ,Φ} ||x − μ(z)||² .    (2)

The KL loss assumes a Gaussian prior N(0, I) for the variables in the latent space and computes the relative entropy between the prior and the approximated posterior (Eq. 3):

D_KL = (1/2) Σ_j ( μ_j² + σ_j² − 1 − ln σ_j² ) .    (3)
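Under these definitions, the two loss terms can be sketched in NumPy (the `vae_loss` helper and the toy tensors are hypothetical, for illustration only):

```python
import numpy as np

BETA = 5e-4  # weight of the KL term, as quoted in the text

def vae_loss(x, x_reco, mu, logvar):
    # Reconstruction term: squared error, following the Gaussian
    # p_Theta(x|z) = N(mu(z), I) assumption.
    reco = np.mean(np.sum((x - x_reco) ** 2, axis=-1))
    # KL term: closed form for a diagonal Gaussian posterior against
    # a N(0, I) prior on the latent space.
    kl = np.mean(0.5 * np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar, axis=-1))
    return reco + BETA * kl

# With a posterior exactly matching the prior (mu = 0, logvar = 0),
# the KL term vanishes and only the reconstruction error remains.
x = np.array([[1.0, 2.0]])
loss = vae_loss(x, x + 0.1, np.zeros((1, 3)), np.zeros((1, 3)))
```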

Training
The training dataset consists of a sample of multijet events, generated with PYTHIA8 [13] and processed with DELPHES [14] to emulate detector resolution effects and reconstruction efficiencies. The CMS Phase II [15] detector performance is assumed. The event reconstruction is performed running the DELPHES Particle Flow implementation. A Poisson profile is assumed for the pileup distribution, with the average number of collisions set to 40. Jets are clustered with the FASTJET [16] implementation of the anti-kT jet algorithm [17], with jet-size parameter R = 0.8. We train the VAE on a set of SM multijet events. The goal is to teach it what a SM event looks like, such that it returns a large loss when processing BSM events. In our study we focus on dijet events. However, the VAE is trained on each jet independently, so the procedure could be generalized to other event topologies with jets, with no need to retrain the model. We consider events with at least two jets having pT > 30 GeV and |η| < 2.4. Of those, we consider the highest-pT (jet 1) and second-highest-pT (jet 2) jet in the event. The training set is built mixing jet 1 and jet 2 candidates from events belonging to the |∆η| sideband, defined by requiring |∆η| > 1.4 between the two jets. The information that both jets belong to the same event is discarded for training (and inference) and is needed only when their losses are recombined, as described in Section 2.3.
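As a sketch, the sideband selection described above might look like this (the `sideband_selection` helper is hypothetical; per-event, pT-ordered jet arrays are assumed):

```python
import numpy as np

def sideband_selection(pt, eta):
    # `pt`, `eta`: per-event arrays of jet pT [GeV] and eta, pT-ordered.
    # Keep jets with pT > 30 GeV and |eta| < 2.4; require at least two,
    # and |d_eta| > 1.4 between the two leading selected jets.
    keep = (pt > 30.0) & (np.abs(eta) < 2.4)
    if keep.sum() < 2:
        return False
    eta_sel = eta[keep]
    return abs(eta_sel[0] - eta_sel[1]) > 1.4
```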

Input format
From each jet in the input set, we take the 100 constituents with highest pT. We construct a jet image, binned in ∆η and ∆φ, where each bin contains the summed pT of the corresponding particles, as illustrated in Figure 3. Each image consists of 32×32 bins in a 0.8×0.8 window.
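The image construction can be sketched as a weighted 2D histogram (the `jet_image` helper is hypothetical; the jet axis is assumed at the window center, and the 100-constituent truncation is left out for brevity):

```python
import numpy as np

def jet_image(d_eta, d_phi, pt, n_bins=32, half_width=0.4):
    # Sum constituent pT into an (n_bins x n_bins) grid of
    # (d_eta, d_phi) relative to the jet axis, covering a
    # 0.8 x 0.8 window.
    edges = np.linspace(-half_width, half_width, n_bins + 1)
    img, _, _ = np.histogram2d(d_eta, d_phi, bins=[edges, edges], weights=pt)
    return img

# Three toy constituents; the first two fall into the same bin,
# so their pT values are summed there.
img = jet_image(np.array([0.01, 0.012, 0.3]),
                np.array([0.01, 0.012, -0.2]),
                np.array([50.0, 30.0, 20.0]))
```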

From Inference to Selection
Once trained, the VAE is applied to jet 1 and jet 2 of the events in the signal region (|∆η| < 1.4). We then collect the loss L1 for the first jet and the loss L2 for the second jet. The two losses can be combined into a total loss L in different ways. To decide whether the event is normal or anomalous, its total loss is compared to a threshold L_T. The loss combination strategies are illustrated in Figure 4.
Given multiple strategies, we pick the one with the best ROC curve for a set of BSM benchmarks. In our study we evaluated several Randall-Sundrum graviton [18] processes: G_RS → WW with m_jj = {1.5, 2.5, 3.5, 4.5} TeV and G_RS → tt, broad and narrow resonance with m_jj = 13 TeV. ROC curves for some of those benchmarks with different loss combination strategies are given in Figure 5. The results show that the loss strategy requiring both jet losses to be above L_T (red line) yields the best outcome for signal efficiencies above 10⁻⁴.
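The "both losses above threshold" strategy, together with two alternative combinations one might compare against via ROC curves, can be sketched as follows (the function names are hypothetical, for illustration):

```python
import numpy as np

def both_above(L1, L2, LT):
    # Best-performing strategy in our tests: the event is anomalous
    # only if BOTH jet losses exceed the threshold L_T.
    return (L1 > LT) & (L2 > LT)

# Alternative ways of combining the two jet losses into a total loss L:
def max_loss(L1, L2):
    return np.maximum(L1, L2)

def sum_loss(L1, L2):
    return L1 + L2

L1 = np.array([1.0, 3.0, 3.0])
L2 = np.array([3.0, 1.0, 3.0])
flags = both_above(L1, L2, LT=2.0)  # only the third event passes
```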
This procedure marks an event as normal or anomalous without the need for a prior definition of anomaly.

From Selection and Filtering to Analysis
Given an input dataset processed by the VAE, we split it into a background-enriched and a signal-enriched group: events whose total loss is below the threshold L_T are considered to form the group enriched in standard physics; events whose total loss is above L_T are considered to form the group enriched in anomalous events. The loss threshold L_T can be a fixed value or a function of some application-specific discriminating quantity, such as the dijet mass m_jj in the case we consider here. We then perform a classical statistical analysis on the signal-enriched group, taking advantage of the improved signal-to-background ratio. In the results presented in the following sections, the input dataset for training the VAE is a QCD simulation in a control region defined by |∆η| > 1.4. For analysis and inference, the signal region defined by |∆η| < 1.4 is used, as described in Section 2. The variable of interest is the dijet mass m_jj.

VAE Boosted Supervised Searches on Tails
In this approach we use a fixed loss cut L > L_T to preprocess events. We define L_T such that a predefined fraction of events passes the filter (1% in the examples below).
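Choosing L_T from a target pass fraction amounts to a quantile computation (a sketch; `loss_threshold` is a hypothetical helper and the exponential losses are a toy stand-in):

```python
import numpy as np

def loss_threshold(losses, pass_fraction=0.01):
    # Fixed threshold L_T such that a fraction `pass_fraction`
    # of the events satisfies L > L_T.
    return np.quantile(losses, 1.0 - pass_fraction)

rng = np.random.default_rng(1)
losses = rng.exponential(size=100_000)  # toy stand-in for VAE losses
LT = loss_threshold(losses)
pass_fraction = np.mean(losses > LT)    # close to 0.01 by construction
```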
The advantage of this approach is that the signal-to-background ratio is indeed enhanced. However, the shape and position of the background bulk might make the search for a small signal statistically challenging in that mass range. We therefore apply this kind of resonance search only when the signal excess is not located in the bulk region of the background. We tested the approach described above on a data mixture of a broad G_RS → tt with m_G = 3.5 TeV and QCD events at different cross sections. At each cross-section value, we performed two analyses: a traditional resonance search on the untreated input data mixture and a resonance search on the data mixture filtered by the VAE loss cut. The results are shown in Figure 7: in this scenario, a 3σ excess could be turned into a 5σ discovery. Note that the plots show a simple template fit assuming that the background shape is known. Our comparison is thus meaningful, but the absolute significance is overestimated.

VAE Boosted Supervised Searches on Bulks
In this approach we select events with a loss cut that depends on the variable of interest, in our case the dijet mass m_jj. For each dijet-mass value, a loss threshold L_T(m_jj) is identified such that a given constant fraction of events is accepted as anomalous by requiring L > L_T(m_jj). We identify those thresholds by means of a quantile regression, as shown in Figure 8.
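A simple stand-in for the quantile regression is a binned empirical quantile of the loss in slices of m_jj (illustrative only; the function name and binning are hypothetical, and a real quantile regression would yield a smooth L_T(m_jj)):

```python
import numpy as np

def binned_loss_threshold(mjj, losses, quantile=0.99, n_bins=20):
    # Equal-population m_jj bins, then the per-bin loss quantile:
    # in each bin a constant fraction (1 - quantile) of events has
    # L > L_T(m_jj) and is accepted as anomalous.
    edges = np.quantile(mjj, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, mjj, side="right") - 1, 0, n_bins - 1)
    per_bin = np.array([np.quantile(losses[idx == b], quantile)
                        for b in range(n_bins)])
    return per_bin[idx]  # per-event threshold L_T(m_jj)

rng = np.random.default_rng(2)
mjj = rng.uniform(1000.0, 5000.0, size=50_000)   # toy dijet masses [GeV]
losses = rng.exponential(size=50_000)            # toy VAE losses
LT = binned_loss_threshold(mjj, losses)
pass_fraction = np.mean(losses > LT)             # ~1% in every mass slice
```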
The advantage of this approach is that it keeps the background shape unbiased. However, it reshapes the signal in an unfavorable way by penalizing its tail. We therefore apply this kind of resonance search when the excess is located in the bulk region of the distribution. We tested the approach described above on a data mixture of G_RS → WW with m_G = 1.5 TeV and QCD events at different cross sections. Again, we performed two analyses: a resonance search on the original dataset and one on the VAE-prefiltered dataset. The results are shown in Figure 10: in this scenario, a 3σ excess could be turned into a 4σ excess. As in Section 3.2, the resonance-search fits are performed assuming that the background shape is known.

Conclusion
In this work we have discussed the use of unsupervised techniques in a typical BSM search at the LHC. We have shown that unsupervised machine learning can be used to select anomalous dijet BSM events from an input dataset mainly consisting of SM processes. We employed a variational autoencoder (VAE) that learns to identify previously unseen events and to distinguish them from the processes used in the training procedure. We performed several statistical inference tests for observing processes induced by the decay of a hypothetical graviton in the presence of a dominating QCD background. Our studies show that the observation significance increases when the search is performed on a dataset previously filtered by a cut on the VAE loss. We achieve good sensitivity on the tail of the dijet mass distribution, which could benefit new physics studies focused on resonance searches. In the distribution bulk we observe a shape bias when applying our method; however, the sensitivity can be recovered with a quantile-based selection that keeps the background shape unbiased.