Using a nested anomaly detection machine learning algorithm to study the neutral triple gauge couplings at an \texorpdfstring{$e^+e^-$}{e+e-} collider

Anomaly detection algorithms have been proved to be useful in the search of new physics beyond the Standard Model. However, a prerequisite for using an anomaly detection algorithm is that the signal to be sought is indeed anomalous. This does not always hold true, for example when interference between new physics and the Standard Model becomes important. In this case, the search of new physics is no longer an anomaly detection. To overcome this difficulty, we propose a nested anomaly detection algorithm, which appears to be useful in the study of neutral triple gauge couplings at the CEPC, the ILC and the FCC-ee. Our approach inherits the advantages of the anomaly detection algorithm been nested, while at the same time, it is no longer an anomaly detection algorithm. As a complement to anomaly detection algorithms, it can achieve better results on problems that are no longer anomaly detection.


Introduction
The search for signals from new physics (NP) beyond the Standard Model (SM) is one of the most frontier topics in the field of high energy physics (HEP) [1].In order to avoid dealing with the huge number of various NP models, a model-independent approach known as the SM effective field theory (SMEFT) has become popular in the phenomenological studies [2].Since the SM is very successful with only few exceptions [3], it can be expected that the NP signal that has not yet been observed must be very small.Moreover, from the SMEFT point of view, the NP signal comes from new interactions, so it can be expected that the kinematic features of NP signals are different.As a consequence, looking for NP signals is anomaly detection (AD), which is well suited to machine learning (ML) algorithms.
ML algorithms have been widely used in HEP [4], for example in the studies that implement the search for NP as AD [5,6].One of the advantages is that AD is not only model-independent, but also operator-independent. From the point of view of the operator geometric space, it is not sufficient to study only dimension-6 operators [7], and phenomenological study of dimension-8 operators has been gaining attention [8,9].If one has to consider dimension-8 or even dimension-10 operators, the number of operators is very large, e.g., 993 for dimension-8 and 15456 for dimension -10 [8].Using an AD algorithm, one can avoid studying the kinematics for specific operators.Besides, AD algorithms such as the isolation forest (IF) algorithm [10] are efficient and easy to implement, which are suitable for the rapidly growing volume of data collected at colliders.The IF algorithm has been shown to be useful in the study of anomalous quartic gauge couplings (aQGCs) in vector boson scattering processes [6].
However, NP signals are not always very different.For example we find that IF algorithm is not applicable to study the neutral triple gauge couplings (nTGCs) at the CEPC, the ILC and the FCC-ee.nTGCs have received a lot of attentions recently because they do not present in the SM and do not receive contributions from dimension-6 operators [11][12][13][14].It has been shown that e + e − colliders are suitable for studying nTGCs [11,13].However, when the energy scale is not large, the kinematics of NP signal is not sufficiently distinct from the SM, and the interference becomes important.In this case, the search for NP signals is no longer AD, and improved algorithms are needed.
In this paper, we propose a nested IF (NIF) event selection strategy (ESS), to use the variation of anomaly score to discriminate signal events.It inherits the advantages of IF, with a transparent mechanism and almost no tunable parameters, it is an unsupervised ML algorithm that does not need to tag the source of events, and it is model-independent and does not depend on the operator to be studied.We shall emphasize that NIF is no longer an anomaly detection algorithm, therefore it can be expected that NIF can be useful in some problems that cannot be solved by anomaly detection algorithms.Moreover, not only IF, but in principle any algorithm that gives quantitative measurements of the degree of anomaly for each sample can be nested in our approach.We find that NIF is useful in the study of nTGCs at the CEPC, the ILC and the FCC-ee.
The remainder of this letter is organized as follows.In Sec. 2, we briefly discuss the problem of IF algorithm in the study of nTGCs, the NIF algorithm with numerical results are presented in Sec. 3, and finally Sec. 4 is a summary.

The problem of isolation forest with nTGCs
The nTGCs receive no tree level contribution neither from the SM nor from the dimension 6 operators (the contribution at 1-loop level was studied in Ref. [14]).There are 4 CP-conserving dimension-8 operators contributing to nTGCs, the Lagrangian is [11,13] with the operators where H denotes the SM Higgs doublet, Bµν ≡ ϵ µναβ B αβ , Wµν ≡ ϵ µναβ W αβ with W µν ≡ W a µν σ a /2 where σ a are Pauli matrices, c X are dimensionless coefficients.The Λ X are related with the cutoff scale as Λ X = Λ/|c X | 1/4 .For simplicity, we define f X ≡ sign(c X )/Λ 4 X .At tree level, the processes e + e − → Zγ can be affected by those operators via ZV γ couplings where V is a Z boson or a photon.It is only necessary to consider the O BW operator, because O W W and O BB do not contribute when Z boson and γ are on-shell, and O B W has the same contribution as the O BW in this case [13].Therefore, in the paper, we only consider the existence of O BW .The numerical results are studied by using Monte-Carlo (MC) simulation with the MadGraph5_aMC@NLO toolkit [15].A fast detector simulation is applied by using Delphes [16] with the CEPC detector card [17].The events are generated at √ s = 250 GeV which is the energy scale of 'Higgs factory' and is close to or covered by the CEPC [18], the ILC [19] and FCC-ee [20].We use the basic cuts same as the default settings except for ∆R ℓℓ which is defined as ∆η 2 ℓℓ + ∆ϕ 2 ℓℓ where ∆η ℓℓ and ∆ϕ ℓℓ are the differences between pseudorapidities and azimuth angles of the charged leptons, respectively.As explained in Ref. [11], the ∆R ℓℓ is small for an energetic Z boson, so we use ∆R ℓℓ > 0.2 [11,21] to avoid losing too much signal events.

Isolation forest
To highlight the signal significance, an ESS based on an IF algorithm has been proposed in Ref. [6] and was found useful in the study of aQGCs.The IF algorithm makes use of the fact that the anomalies are 'few and different', and can be applied for multidimensional data efficiently.In the IF algorithm, a dimensionless anomaly score (denoted as a) can be calculated for each event where 0 < a < 1.An IF is made of many isolation trees (ITs).Consider a dataset in which each piece of data is a multidimensional vector (denoted as x i ), an IT can be constructed using the following procedure.
1. Put all points into a node (denoted as the root node).2. Randomly select a node which has not been partitioned yet and contains more than one point (denoted as n 0 ). 3. Randomly select a dimension (denoted as D), randomly set a partition value min(x i D ) < x < max(x i D ) where i runs over all points in this node, x i D denotes the D-th component of x i .4. For n 0 , generate two children nodes (denoted n L and n R ), move the points in n 0 with x i D < x into n L , and the others into n R . 5. Repeat ( 2) to ( 4) until every node is either partitioned or contains only one point.
Since random numbers are used to construct an IT, to obtain a stable result, many ITs are constructed to form an IF.The anomaly score of a point can be obtained as a = 2 − L/c(N ) , where L is the average of the depths of the nodes contain the point over all ITs, N is the number of points in the dataset, c(N ) = 2H(N − 1) − 2(N − 1)/N is a normalization used to regularize N independently and is the average depth of all nodes in an IT, where H(N ) is the harmonic number.Details on how the anomaly score is calculated can be found in Refs.[6,10].
To study the signal of nTGCs, 15 data sets are generated with L SM + L nTGCs and with f BW = −700 + 100k TeV −4 where 0 ≤ k ≤ 14 are integers.Each data set consists of N = 500000 events.These data sets are in fact those used in Ref. [11], so that we can compare our result with Ref. [11].We require that there are at least two charged leptons, and the two hardest leptons have the same flavor and different charges.Besides, we also require that there is at least one photon in the final state.These requirements are the same as in Ref. [11].After these requirements there are about 330000 events left in each data set.The one piece of data corresponding to each event is a 12-dimensional vector consisting of components of the 4-momenta of the two hardest charged leptons and the photon.
Note that, both IF algorithm and the NIF algorithm to be introduced are model independent.Namely, the datasets with f BW ̸ = 0 can be replaced with any dataset without knowing what kind of NP is contained in the dataset under study.For example, the dataset can be replaced with the dataset obtained in experiment to search for the signal of any possible NP models.We use the dataset with f BW ̸ = 0 merely to study the performances of IF and NIF on the nTGCs.
The data preparation is performed by MLAnalysis [22].Once data sets are determined, the only parameter in IF algorithm is the number of isolation trees (denoted as n) which controls the statistical error of the anomaly scores.In principle, n should be as large as possible as far as the computational power allows.The standard error of anomaly score (denoted as ϵ a ) can be calculated for each event.With n = 2000, the normalized distributions of ϵ a for the events in the data sets with k = 0, 7, 14 are shown in Fig. 3.It can be seen that, generally ϵ a < 0.004 and 2000 trees are sufficient.

The problem of IF
One can use an ESS to select the event with a greater than a threshold score (denoted as a th ) to highlight the signal events.However, in the case of nTGCs at √ s = 250 GeV, the outcomes are not satisfactory.The cross-sections with different a th are shown in Fig. 4, no obvious pattern can be seen.The problem is that when the energy scale is not  large, the kinematic features of signal events are not sufficiently distinct from the SM.In this case the search of the NP signal is no longer a AD problem.
The problem encountered can be depicted in Fig. 5.To illustrate this, it is assumed that the data in the dataset are 2-dimensional vectors and that interference terms are positive or negligible, i.e., the signal events are new points added to the background events.In addition, assuming that the background events are Gaussian distributed in the phase space, then the rarity of the events can be quantified simply as the distance of the points from the centroid of all events.In the case of Fig. 5 (a), the signal events are anomalous events, and the problem is an AD problem.In the case of Fig. 5 (b), the distribution of the signal events to be searched for overlaps with the background events, and then this is no longer an AD problem.For the latter, it can be seen that the density of the distribution is increased.Since anomaly scores are higher in regions with smaller densities, it can be expected that the anomaly scores of some points will decrease when the signal presents.

Nested isolation forest algorithm
Since the anomaly score calculated by the IF algorithm can be regarded as a representation of the density of events in the phase space, it can be conjectured that anomaly score can also measure the density variation.In order to achieve this without increasing a large amount of computation, one needs to first build an anomaly score distribution by the SM as a reference.This can be done by building an IF using the data set from the SM.Then the NIF ESS can be summarized as follows: 1. Using a data set from MC simulation with the SM as a training data set (denoted as S ref ), and build an isolation forest, the anomaly score of each event is obtained as a ref .
2. For the data set to be investigated (from MC simulations or experiments, denoted as S inv ), build another isolation forest, the anomaly score of each event is obtained as a inv .
3. For each event in S inv , find the nearest event in S ref , the variation of the anomaly scores for event i ∈ S inv can be calculated as ∆a i = a i inv − a r ref , where r ∈ S ref represents the nearest event to i.
For this algorithm to work, the distance between the events need to be defined, so that 'the nearest event' can be defined.A systematic way to define the distance known as 'earth (or energy) mover's distance' was introduced in Ref. [23].Since calculating the distance between events is the most computationally resource-intensive step in this algorithm, we chose a relatively inexpensive way of defining distance.In this paper, the distance is simply defined as the distance in the phase space as d = ij (p i j − q i j ) 2 , where p i j and q i j are the i-th component of the 4-momentum of the particle j in the final state, j runs over the two hardest charged leptons and the photon.p are 4-momentum of particles in S inv while q are from S ref , respectively.
Note that, in this approach, the AD algorithm to be nested is not limited to the IF algorithm.Any AD algorithm quantifies the degree of anomaly can be nested in this way.Compared with the AD algorithm to be nested, the additional requirements of this approach are simply the creation of a reference of anomaly scores for S ref , and the need to compute distance between events.Therefore, the nested AD algorithm can inherent the advantages of the algorithms to be nested.Once the variations of the anomaly scores ∆a are obtained, they can be used to detect the existence of NP models as a complement in the cases such as the nTGCs.
As discussed, when the nTGCs-induced events overlap with those of the SM in the phase space, it mainly leads to a decrease in anomaly score.Therefore, we select events where ∆a is less than a certain threshold (denoted as ∆a th ).This simple strategy is the only one we use in this letter, and is denoted as 'NIF cut'.To build the reference anomaly score distribution, another data set with N = 500000 are generated with the SM Lagrangian.
The ESS based on kinematic analysis are discussed in Refs.[11,13].We compare the results of NIF cut with the results in Ref. [11] which uses an ESS as  where M ℓℓ is the invariant mass of two hardest charged leptons, θ γ is the zenith angle of the photon, θ ℓ is the zenith angle of ℓ − when the ℓ − is boosted into the c.m. frame of two hardest charged leptons whose z-axis lays along the direction of ⃗ p ℓ + + ⃗ p ℓ − .The cross-sections after traditional ESS (which is the same as Ref. [11]) and after NIF cuts with ∆a th = −0.01,−0.02, −0.03 are shown in Fig. 6.We find that the cross-section after NIF cut can still be approximated by a bilinear function of f BW within the range of coefficients in use.Among the results after NIF cut, ∆a = −0.02 is the most consistent with the bilinear function.Other than that, for the case of ∆a = −0.03,too many signal events are dropped, leading to a smaller signal significance compared with the case of ∆a = −0.02.For ∆a = −0.01,too many background events are kept.This can also be seen from the comparison with the traditional ESS that, the cross-section after the NIF cut with ∆a = −0.02 is close to the cross-section after the traditional ESS.
Using the signal significance defined as S stat = N s / N s + N bg , where N bg = Lσ(f BW = 0) is the number of events for the SM, N s = L (σ(f BW ̸ = 0) − σ(f BW = 0)) is the number of events beyond the SM with L the luminosity.For a dataset from the experiment, N bg = Lσ SM and N s = L (σ exp − σ SM ) where σ exp is the cross-section after NIF cut for the experiment, σ SM is the cross-section after NIF cut for the SM by MC simulation.The expected constraints on f BW at luminosity L = 2 ab −1 are presented in Table 1.The results of Ref. [11] are also listed.It can be seen that, the NIF cut, which is independent of the operator to be studied, still shows better discriminative ability for f BW < 0 than the traditional ESS.
Finally, we would like to emphasize that although the amount of data in the dataset has been irrelevant when calculating the anomaly scores, the larger the amount of data in the SM dataset when building the reference, the more plausible the reference is.Therefore, it can be expected that the larger the amount of data, the better the NIF will perform.However, the amount of data is not an tunable parameter because the data is obtained from experiments.That is, as much data as possible should be collected from the experiments when using the NIF ESS.

Summary
Although ML algorithms for AD have been widely used in the study of NP, there are cases that the search for NP signals is no longer AD.This happens when the NP signals are no longer very different, for example when the energy scale is not very large, or when the interference term is important.In this paper, a nested AD algorithm is proposed for such cases.
The proposed nested anomaly detection algorithm focuses on the variation of the anomaly scores.Therefore, any anomaly detection algorithm quantifies the degree of anomaly should be able to be nested in our approach.In this paper, the NIF algorithm is studied.Such a method inherits the advantages of IF algorithm.The mechanism behind NIF algorithm is also transparent, there are almost no parameters to be tuned.The whole procedure is still model independent and operator independent, which works as an automatic ESS.Moreover, NIF is also an unsupervised machine learning algorithm in the sense that it is not necessary to tag the events.This is important because it is in principle impossible to specify the origin of an event when there is a negative interference term presented.Moreover, this unsupervised feature ensures the model-independent, the dataset to be investigated can be a dataset from experiment with the purpose of finding any possible NP model signal that we know or even do not know.
As a complement to the AD algorithms, the problems that a nested AD algorithm targets are those that are no longer AD.As a case study, the nTGCs at √ s = 250 GeV are investigated in this paper.It can be shown that while the IF algorithm is not satisfactory, the NIF is useful in this case.As an automatic ESS, the NIF can achieve compatible discriminative ability compared to the traditional ESS without knowing at all what operator is under study.For f BW < 0, a stronger constraint can be set with the help of an NIF cut.On the other hand, in the region that the negative interference is dominant, i.e. σ(f BW ̸ = 0) < σ(f BW = 0), the traditional ESS is better.In this region, the pattern how the anomaly score changes is complicated and needs more exploration.In this case, the NIF cut is still competitive because it achieves a discrimination capacity close to that of traditional ESS without kinematic analysis.

Figure 1 :Figure 2 :
Figure 1: Typical Feynman diagrams for signal events.The dominant contribution from operators in Eq. (2) to the process e + e − → ℓ + ℓ − γ where ℓ is e or µ is through process e + e − → Zγ.Therefore, one can use the process e + e − → ℓ + ℓ − γ at e + e − colliders to study O BW .The signal of O BW is induced by Feynman diagrams shown in Fig.1.In this letter, the events of the processes e + e − → µ + µ − γ and e + e − → e + e − γ are combined.The background is the process e + e − → ℓ + ℓ − γ in the SM.The typical Feynman diagrams are shown in Fig.2.

Figure 4 :
Figure4: The cross-sections as functions of f BW after the ESS that selecting events with a > a th .

Figure 5 :
Figure 5: Two cases of NP signals.The signal points are colored in red and the background points are colored in blue.(a) The signal points are the anomalous points, this is an AD problem.(b) The distribution of signal points overlaps with the background, and this is no longer an AD problem.

Figure 6 :
Figure 6: The cross-sections as functions of f BW after the traditional ESS and NIF cut.