Machine-enhanced CP-asymmetries in the Higgs sector

Improving the sensitivity to CP-violation in the Higgs sector is one of the pillars of the precision Higgs programme at the Large Hadron Collider. We present a simple method that allows CP-sensitive observables to be directly constructed from the output of neural networks. We show that these observables have improved sensitivity to CP-violating effects in the production and decay of the Higgs boson, when compared to the use of traditional angular observables alone. The kinematic correlations identified by the neural networks can be used to design new analyses based on angular observables, with a similar improvement in sensitivity.


I. INTRODUCTION
The Sakharov criteria [1] provide the theoretical backdrop for one of the biggest phenomenological shortfalls of the Standard Model (SM) of Particle Physics: an insufficient amount of charge-conjugation (C) and parity (P) violation. In the SM, the only source of CP violation is the complex phase in the Cabibbo-Kobayashi-Maskawa (CKM) matrix [2,3]. As the flavour and CP structure of SM interactions is intricately related to the Yukawa sector, extending the Higgs sector with additional CP-violating effects is typically considered a motivated avenue to reconcile the SM with the Sakharov criteria.
Such extensions of the SM typically lead to new exotic states [4], which so far have not been discovered at the Large Hadron Collider (LHC). This suggests that there is a significant gap between the mass scale of weak interactions and the mass scale of beyond-the-SM (BSM) physics. This line of thought has led to a resurgence of effective field theory applications to the interpretation of LHC data [5][6][7][8][9][10][11][12][13][14][15][16]. The extension of the SM by dimension-six interactions provides the first step in this programme, capturing the deformations of correlations in particle physics data under the assumption that there is a hierarchy between the scale of measurement and new physics, Q² ≪ Λ². Of particular interest are the operators, O_i, that introduce new sources of CP violation in the Lagrangian,
$$ \mathcal{L} = \mathcal{L}_{\rm SM} + \sum_i \frac{c_i}{\Lambda^2}\, O_i , \tag{1} $$
where L_SM is the SM Lagrangian and the c_i/Λ² are Wilson coefficients that specify the strength of the new interactions.

* Electronic address: akanksha.bhardwaj@glasgow.ac.uk
† Electronic address: christoph.englert@glasgow.ac.uk
‡ Electronic address: robert.hankache@manchester.ac.uk
§ Electronic address: andrew.pilkington@manchester.ac.uk

The operators that affect the electroweak interactions of the Higgs boson are (see also [17])
$$ O_{\Phi B} = \Phi^\dagger \Phi \, B^{\mu\nu} \tilde{B}_{\mu\nu} , \qquad O_{\Phi W} = \Phi^\dagger \Phi \, W^{i\,\mu\nu} \tilde{W}^{i}_{\mu\nu} , \qquad O_{\Phi W B} = \Phi^\dagger \sigma^i \Phi \, \tilde{W}^{i\,\mu\nu} B_{\mu\nu} , \tag{2} $$
where Φ is the Higgs field, and the W_µ and B_µ are the fields in the SU(2) ⊗ U(1) gauge-field eigenbasis. The dual field strength tensors are defined as $\tilde{X}^{\mu\nu} = \epsilon^{\mu\nu\rho\delta} X_{\rho\delta}/2$. The contributions of these operators to Higgs boson production and decay are given by the squared amplitude, i.e.
$$ |M|^2 = |M_{\rm SM}|^2 + 2 \sum_i \frac{c_i}{\Lambda^2}\, {\rm Re}\left( M_{\rm SM}^{*} M_{d6,i} \right) + \sum_{i,j} \frac{c_i c_j}{\Lambda^4}\, {\rm Re}\left( M_{d6,i}^{*} M_{d6,j} \right) , \tag{3} $$
where M_SM and M_d6,i are the SM and dimension-six amplitudes, respectively. For the CP-odd operators of interest, the interference between the SM amplitude and the dimension-six amplitude is also CP-odd. Interference effects therefore cancel entirely for CP-even observables, such as inclusive cross sections and transverse-momentum spectra, but can be observed as asymmetries in appropriately constructed CP-odd observables. The inclusion of the pure dimension-six contributions to the amplitude-squared in Eq. (3) raises two potential problems. First, these contributions are CP-even, making it difficult to disentangle the effects of a CP-even operator from those of a CP-odd operator. Second, the contributions arise at O(1/Λ⁴) and the power counting of the new physics scenario becomes important in this instance, i.e. it is a model-dependent question whether the leading O(1/Λ²) terms dominate over the O(1/Λ⁴) terms in an actual matching calculation [4,40]. For these reasons, the ATLAS and CMS experiments have an extensive programme of searches and measurements that utilise CP-odd observables, including angular observables [41][42][43][44] as well as observables that are constructed from matrix-element information [45][46][47][48]. The latter approach exploits the full kinematic information in leading-order matrix elements to discriminate between different CP hypotheses, and has been shown to improve the analysis sensitivity over the use of angular observables alone. The use of matrix elements in an analysis is, however, more technically challenging and time-consuming than using the simpler angular observables. For this reason, only a few experimental analyses have adopted these more sophisticated analysis techniques to date. In this article, we show that CP-odd observables can be directly constructed from the output of a neural network.
Given that the O(1/Λ²) interference effects cancel entirely for CP-even observables, and induce asymmetries in CP-odd observables, we can directly construct a CP-odd observable by training a neural network (NN) to distinguish between positive and negative interference contributions. With the ability to learn kinematic correlations, the NN can be used to (i) construct a near-optimal CP-odd observable for each dimension-six operator, or (ii) design new analyses based on the correlation between the angular observables and other kinematic quantities. The method can then be extended to multi-class models, with the pure-SM prediction included in the training of the network, to allow the NN to learn the phase-space regions for which the SM is suppressed relative to the interference contribution.
As a concrete example, we explore the potential application of neural networks in two of the main search channels for CP violation in the Higgs sector: the h → 4ℓ decay channel and the vector-boson fusion production channel (VBF h + 2 jets). As well as addressing the phenomenological difference between Higgs production and Higgs decay, the comparison of h → 4ℓ and h + 2 jets also highlights the difference between single-scale and multi-scale processes when viewed through a NN lens. We note that the technique should be applicable to a wide variety of production and decay channels at the LHC (see also the recent [50][51][52]).
We organise the work as follows. In Section II, we introduce the Monte Carlo event generators that we use to construct the SM and dimension-six theoretical predictions for Higgs boson production at the LHC. In Section III, we recap the angular observables that are typically used for CP-violation searches in the h → 4ℓ and h + 2 jets final states. We also introduce the method to construct CP-odd observables using neural networks. In Section IV, we apply this method to simulated h → 4ℓ and h + 2 jets events and compare the sensitivity of the machine-learned CP-odd observables to the sensitivity obtained using angular observables. We also investigate the origin of any improvement in sensitivity. Finally, we conclude in Section V.

II. THEORETICAL PREDICTIONS
Events are generated for h → 4ℓ and VBF h + 2 jets production in proton-proton collisions at √s = 13 TeV using MadGraph5_aMC@NLO [53]. The events are accurate to leading order in perturbative QCD and are passed to Pythia8 [54] to simulate the effects of parton showering, hadronisation and underlying event activity. The NNPDF30nlo (NNPDF23lo) parton distribution function [55] is used in the cross-section calculation for the h → 4ℓ (h + 2 jets) samples. The A14 set of tuned parameters [56] is used to model the underlying event. Events are generated separately for the Standard Model and for the interference between the SM and dimension-six amplitudes, with the interactions induced by the dimension-six operators provided by the SMEFTSim package [14]. In the h + 2 jets sample, the Higgs boson is not decayed, as we focus on production-related kinematics in this channel.
For the h → 4ℓ analysis, we require the generated events to pass the selection criteria of the ATLAS pp → 4ℓ measurement [57], in the Higgs mass fiducial region (120 GeV < m_4ℓ < 130 GeV). For the analysis of VBF Higgs production, we require the events to pass the selection criteria of the ATLAS VBF h → τ⁺τ⁻ analysis [58], in the VBF_1 fiducial region.

III. CP-SENSITIVE OBSERVABLES
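A. Angular observables
CP-violating effects in the h → ZZ* → 4ℓ decay channel can be probed using the Φ_4ℓ variable [59,60] defined by
$$ \Phi_{4\ell} = \frac{\mathbf{q}_1 \cdot (\hat{\mathbf{n}}_1 \times \hat{\mathbf{n}}_2)}{|\mathbf{q}_1 \cdot (\hat{\mathbf{n}}_1 \times \hat{\mathbf{n}}_2)|} \, \cos^{-1}\left( -\hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2 \right) , \tag{4} $$
where the normal vectors to the planes are defined as
$$ \hat{\mathbf{n}}_1 = \frac{\mathbf{q}_{11} \times \mathbf{q}_{12}}{|\mathbf{q}_{11} \times \mathbf{q}_{12}|} , \qquad \hat{\mathbf{n}}_2 = \frac{\mathbf{q}_{21} \times \mathbf{q}_{22}}{|\mathbf{q}_{21} \times \mathbf{q}_{22}|} . \tag{5} $$
Each q_αβ labels the three-momentum of the lepton/antilepton β that arises from the decay Z_α → ℓℓ̄, and q_α = q_α1 + q_α2 is the three-momentum of the Z_α. All three-momenta are calculated in the Higgs-boson centre-of-mass frame. It is worth noting that Φ_4ℓ coincides with the angular difference of the polar angles of the leptons with identical charge in their respective Z boson rest frame (for aligned reference axes).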
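The decay-plane angle described above can be sketched numerically as follows. This is a minimal illustration rather than the analysis code: the function names are hypothetical, and the sign convention (the sign of q_1 · (n_1 × n_2) multiplying arccos(−n_1 · n_2)) is an assumption that should be checked against Refs. [59,60].

```python
import numpy as np

def unit_normal(a, b):
    """Unit vector normal to the plane spanned by three-vectors a and b."""
    n = np.cross(a, b)
    return n / np.linalg.norm(n)

def phi_4l(q11, q12, q21, q22):
    """Signed angle between the two Z decay planes.

    Inputs are lepton three-momenta (Higgs rest frame assumed):
    q11, q12 come from Z_1 and q21, q22 from Z_2.
    ASSUMED convention: sign(q1 . (n1 x n2)) * arccos(-n1 . n2).
    """
    q11, q12, q21, q22 = (np.asarray(v, dtype=float) for v in (q11, q12, q21, q22))
    q1 = q11 + q12                      # three-momentum of Z_1
    n1 = unit_normal(q11, q12)          # normal to the Z_1 decay plane
    n2 = unit_normal(q21, q22)          # normal to the Z_2 decay plane
    sign = np.sign(np.dot(q1, np.cross(n1, n2)))
    cos_arg = np.clip(-np.dot(n1, n2), -1.0, 1.0)  # guard against rounding
    return sign * np.arccos(cos_arg)
```

Under a parity flip of all three-momenta the plane normals are unchanged while q_1 changes sign, so the returned angle changes sign, which is the P-odd behaviour the observable relies on.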
CP-violating effects in the VBF h + 2 jets production channel can be probed using the signed azimuthal angle between the two jets, i.e.
$$ \Delta\phi_{jj} = \phi(j_1) - \phi(j_2) , \quad \text{with } y(j_1) > y(j_2) , \tag{6} $$
where φ(j_1) and φ(j_2) are the azimuthal angles of the two highest transverse momentum jets in the event, ordered in rapidity y. The interference effects and associated asymmetries can be traced to the vertex structure that is induced by the operators of Eq. (2). The Levi-Civita tensor determines a P-odd behaviour of the decay amplitude
$$ M_{d6} \sim \epsilon_{\mu\nu\rho\delta} \, j_1^{\mu} j_2^{\nu} q_1^{\rho} q_2^{\delta} , \tag{7} $$
where q_i are the two four-momenta of the effective fermion currents j_i coupling to the Higgs boson. C and P transformations induce sign changes of the currents j_i^µ and momenta q_i^µ. Together with the odd property of the Levi-Civita tensor under parity transformations, this leads to an asymmetry of the interference effects as a function of Δφ_jj (see also [26]). Note also that M_d6 is only non-vanishing for linearly independent momenta and effective currents, thus removing longitudinal effective polarisations from the BSM amplitude. In the case of VBF production, the currents in Eq. (7) are related to interactions between the tagging jet and its associated initial state parton. For VBF kinematics, p_T,j ≪ E_j, this leads to a p_T-enhanced CP-sensitivity (the aforementioned optimal observable), which is reflected in the VBF results below.
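As an illustration of Eq. (6), the signed azimuthal angle can be computed as in the sketch below. The function name is hypothetical, the caller is assumed to pass the two highest transverse momentum jets, and the wrapping of the result into [−π, π) is an added assumption that Eq. (6) does not spell out.

```python
import math

def signed_delta_phi_jj(phi1, y1, phi2, y2):
    """Signed azimuthal angle between two jets, ordered in rapidity.

    The jet with the larger rapidity plays the role of j_1 in Eq. (6).
    """
    if y1 > y2:
        dphi = phi1 - phi2
    else:
        dphi = phi2 - phi1
    # Wrap into [-pi, pi) (assumed convention).
    return (dphi + math.pi) % (2.0 * math.pi) - math.pi
```

Because the ordering is fixed by rapidity rather than by transverse momentum, the result does not depend on the order in which the two jets are passed, and it changes sign under the reflection φ → −φ.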

B. ML-constructed CP-odd observables
We use TensorFlow 2.3.0 [61] to train the neural networks. The input data are the MC samples discussed in Sec. II, with the events in the interference sample separated according to whether the event weight was positive or negative. Two types of neural network architectures are investigated. Binary (two-class) models are trained using only the interference sample and define the probability that a given event is a positively-weighted interference event (P₊) or a negatively-weighted interference event (P₋). In these models, P₊ + P₋ = 1. Multi-class models are trained using both the interference sample and the pure-SM prediction, and therefore also define the probability that a given event is a SM event (P_SM). In these models, P₊ + P₋ + P_SM = 1. The machine-learned CP-odd observable is then defined by
$$ O_{NN} = P_{+} - P_{-} . \tag{8} $$
The ability of a neural network to construct the CP-odd observable is, in principle, dependent on the input information, with the simplest input being only the four-vectors of the leptons and jets. More complex inputs would include variables that can be derived from those four-vectors, such as Φ_4ℓ in the h → 4ℓ decay channel. In general, we find that the neural networks perform equally well when including derived variables or just using lepton and jet four-vectors. However, the inclusion of derived variables can help with understanding the physical origin of any improvement in sensitivity. Unless otherwise stated, the results presented in this article use neural networks trained with both lepton/jet four-vectors and derived variables.
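A minimal sketch of how such an observable could be formed from network outputs is given below, assuming that it is the difference P₊ − P₋ of the class probabilities. The helper names and the column ordering [P₊, P₋] (binary) or [P₊, P₋, P_SM] (multi-class) are illustrative assumptions; NumPy is used in place of TensorFlow to keep the example self-contained.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax turning raw network outputs into class probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)  # improves numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def o_nn(probs):
    """CP-odd observable from class probabilities.

    ASSUMED column order: [P_plus, P_minus] or [P_plus, P_minus, P_SM].
    """
    probs = np.asarray(probs, dtype=float)
    return probs[:, 0] - probs[:, 1]
```

With this definition the observable lies in [−1, 1]. Events that a multi-class model assigns mostly to the SM class are pulled towards zero, since P₊ and P₋ are then both small.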
The optimal choice of hyperparameters for each network is obtained using Keras-Tuner 1.0.2 [62,63]. The optimisation included the number of layers, the number of units, the activation function, the L2 regularisation, the learning rate and the batch size. To avoid the networks exploiting statistical fluctuations, we adopt a data augmentation procedure whereby each event is used twice in the training, once with the default input variables and once with a CP-operator applied to all the input variables. For CP-flipped events in the interference sample, the event weight is multiplied by -1. After the initial training, in order to smooth the model, we train for more epochs using the full batch. We also apply a learning rate decay, beginning with the initial rate and halving it every 100 epochs until reaching a factor of 1/8.
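The augmentation and learning-rate decay described above might be sketched as follows. The feature layout and the `cp_odd_mask` argument (marking which inputs change sign under CP) are hypothetical simplifications: for four-vector inputs the CP operation also involves exchanging particle and antiparticle slots, which is not shown here.

```python
import numpy as np

def cp_augment(features, weights, cp_odd_mask, flip_weights):
    """Return the original events followed by their CP-flipped copies.

    cp_odd_mask  : boolean per input feature, True if it flips sign under CP
                   (hypothetical layout, for illustration only).
    flip_weights : True for the interference sample, where the weight of
                   each CP-flipped event is multiplied by -1.
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    signs = np.where(np.asarray(cp_odd_mask, dtype=bool), -1.0, 1.0)
    flipped = features * signs
    flipped_w = -weights if flip_weights else weights
    return np.vstack([features, flipped]), np.concatenate([weights, flipped_w])

def learning_rate(epoch, initial_rate):
    """Halve the rate every 100 epochs until it reaches 1/8 of the initial value."""
    n_halvings = min(epoch // 100, 3)
    return initial_rate * 0.5 ** n_halvings
```

By construction, the augmented interference sample has zero net weight and an exactly antisymmetric distribution in every CP-odd input, so the network cannot learn a statistical fluctuation as a spurious asymmetry.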

IV. RESULTS
A. h → 4ℓ
The construction of Φ_4ℓ and O_NN requires each lepton and antilepton to be associated with the decay of an intermediate Z boson. For the h → e⁺e⁻µ⁺µ⁻ decay channel, this is trivial because the Z boson always decays to a same-flavour opposite-charge pair. However, an ambiguity arises in the h → e⁺e⁻e⁺e⁻ and h → µ⁺µ⁻µ⁺µ⁻ decay channels, due to the multiple possible pairings of the leptons and antileptons. For this reason, we initially restrict our discussion to the h → e⁺e⁻µ⁺µ⁻ decay channel and comment later on the performance of the other h → 4ℓ decay channels.
The differential cross section for h → e⁺e⁻µ⁺µ⁻ as a function of Φ_4ℓ is presented in Fig. 1. The SM prediction is shown in addition to the interference contributions, with Wilson coefficients set to c/Λ² = 1 TeV⁻². As expected, the CP-even SM prediction is symmetric around Φ_4ℓ = 0, whereas the CP-odd interference contributions are all asymmetric with an integral of zero. The largest interference effects arise from the O_ΦB operator. The Φ_4ℓ distribution is much less sensitive to the O_ΦWB and O_ΦW operators, and much larger values of the Wilson coefficients would be needed to produce a noticeable effect on the combined SM+EFT cross section.
The differential cross section as a function of the CP-odd observable produced by a binary NN is shown in Fig. 2, where the NN has been trained to distinguish between the positive- and negative-interference effects produced by the O_ΦWB operator. The interference contributions are peaked at O_NN = ±1.
The improved sensitivity obtained using the neural network can be understood using feature importance techniques. Specifically, the importance of each input variable is determined for the trained network, by evaluating the increase in the loss (or decrease in the accuracy) that occurs when the value of the input variable for a given event is replaced by a randomly chosen value taken from the ensemble of events. Unsurprisingly, the most important variable is found to be Φ_4ℓ. However, the invariant mass (m_12) of the lepton-antilepton pair that is closest in mass to the Z boson is also found to be very important, despite being a CP-even quantity. This is explored in more detail in Fig. 3, which shows the double-differential cross section for the interference contribution induced by the O_ΦWB operator as a function of Φ_4ℓ and m_12. The importance of m_12 is clear: at a given value of Φ_4ℓ, the interference effects for events with m_12 ∼ m_Z are opposite in sign to the interference effects for events with m_12 ≪ m_Z, and the two partially cancel when Φ_4ℓ is measured inclusively. The neural network has learned this feature and utilised it to produce an improved CP-odd observable.
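The feature-importance procedure described above can be sketched as follows. This is an illustrative implementation, not the one used for the results: `predict` stands for any trained model that returns the probability of the positive class, and the binary cross-entropy loss and the number of repeats are assumptions.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped for stability."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def feature_importance(predict, X, y, rng, n_repeats=5):
    """Average increase in the loss when one input is randomised.

    For each input variable, every event's value is replaced by a value
    drawn at random from the ensemble of events, and the loss is re-evaluated.
    """
    X = np.asarray(X, dtype=float)
    baseline = binary_cross_entropy(y, predict(X))
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_mod = X.copy()
            X_mod[:, j] = rng.choice(X[:, j], size=X.shape[0])
            increases.append(binary_cross_entropy(y, predict(X_mod)) - baseline)
        importance[j] = np.mean(increases)
    return importance
```

An input that the model ignores gives an importance of exactly zero, while an input that the model relies on gives a positive value. Note that a CP-even input such as m_12 can still rank highly if the model uses it in combination with a CP-odd one.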
The sign flip in the interference contribution induced by the O_ΦWB operator is driven by the anomalous hZZ, hγγ and hZγ interactions, which are related to one another via gauge symmetry and are given by
$$ \begin{aligned} g_{hZZ} &\propto c_w^2 \, c_{\Phi W} + s_w^2 \, c_{\Phi B} + s_w c_w \, c_{\Phi W B} , \\ g_{h\gamma\gamma} &\propto s_w^2 \, c_{\Phi W} + c_w^2 \, c_{\Phi B} - s_w c_w \, c_{\Phi W B} , \\ g_{hZ\gamma} &\propto 2 s_w c_w \left( c_{\Phi W} - c_{\Phi B} \right) + \left( s_w^2 - c_w^2 \right) c_{\Phi W B} , \end{aligned} \tag{9} $$
where s_w and c_w are the sine and cosine of the Weinberg angle, respectively. The impact of the O_ΦWB operator is anticorrelated for the hZZ and hZγ (hγγ) anomalous interactions, which leads to anticorrelated interference contributions. The sign flip therefore occurs due to different contributions from the h → ZZ, h → γγ and h → Zγ dimension-six amplitudes in the on-shell and off-shell regions.
The sensitivity of O_NN can be further improved by using multi-class neural networks, which have the ability to learn the kinematic features of the SM prediction. Figure 4 shows the differential cross section as a function of the CP-odd observable constructed from a multi-class network. The neural network has been trained to distinguish between the SM contribution as well as the positive- and negative-interference effects produced by the O_ΦWB operator. The interference contributions are still peaked at O_NN = ±1, but with a broader peak than what was obtained with a binary neural network. However, the SM prediction is shifted much closer to (and peaks at) zero. Overall, the SM contribution in the interference-enhanced regions at O_NN = ±1 is further reduced when compared to the binary network, implying a further increase in sensitivity.
To quantify the sensitivity of an experimental analysis, we construct the expected 95% confidence intervals for each CP-odd observable. A likelihood function is defined as
$$ L = \prod_i \frac{\lambda_i^{n_i} \, e^{-\lambda_i}}{n_i!} , \tag{10} $$
where n_i is the expected number of events in bin i assuming the SM-only hypothesis, and λ_i is the predicted number of events (SM+EFT) at a given value of a Wilson coefficient. The expected numbers of events are obtained from the event generator samples after applying a normalisation factor that is defined such that the SM prediction reproduces the number of events observed experimentally in the Higgs mass fiducial region of Ref. [57], corresponding to an integrated luminosity of 139 fb⁻¹. The confidence level is then calculated using the profile-likelihood test statistic [64], which is assumed to be distributed according to a χ² distribution with one degree of freedom, following from Wilks' theorem [65]; this allows the 95% confidence intervals to be constructed. (This assumption is validated by constructing pseudo-experiments to determine the distribution of the profile-likelihood test statistic. The resulting distribution is well modelled by a χ² distribution with one degree of freedom.) Although the likelihood function does not account for systematic uncertainties, the effect of any systematic variation should be symmetric for all CP-odd observables. As the constraints are driven by asymmetries in the distribution, the impact of systematic uncertainties should be very small. This was tested by injecting small symmetric shifts into the predicted number of events, to simulate a systematic bias. The resulting 95% confidence intervals were almost unchanged.
The constraints obtained for each Wilson coefficient are shown in Tab. I, when performing (i) a fit to the angular observable Φ_4ℓ, (ii) a two-dimensional fit to Φ_4ℓ and m_12, and (iii) fits to the NN-constructed O_NN observables for both binary and multi-class networks.
The O_NN observables were obtained with networks trained on the interference predictions obtained with the O_ΦWB operator. The O_NN observables both provide much better sensitivity than Φ_4ℓ alone, with 95% confidence intervals reduced by a factor of 2-10, depending on the Wilson coefficient. Some of this improvement is recovered using a two-dimensional fit to Φ_4ℓ and m_12, although the constraints obtained using the O_NN observables remain 20-30% more sensitive. The constraints obtained from the CP-odd observable constructed from the multi-class network are 5-10% better than the constraints obtained from using binary networks. It is also found that the constraints on c_ΦB and c_ΦW can be further improved by 5% and 10%, respectively, if the neural network is specifically trained on the interference predicted by the associated operators.
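The expected-interval construction can be illustrated with the sketch below, which assumes a binned Poisson likelihood, an SM-only (Asimov) dataset, a prediction that is linear in a single Wilson coefficient (interference only), and a simple grid scan. The yields in the usage note are invented toy numbers, not results from this analysis.

```python
import numpy as np

CHI2_95_1DOF = 3.841458820694124  # 95% quantile of a chi-squared with 1 d.o.f.

def nll(n, lam):
    """Negative log of the Poisson likelihood, dropping the c-independent n! term."""
    return np.sum(lam - n * np.log(lam))

def expected_interval(sm, interference, c_grid):
    """Expected 95% confidence interval on a Wilson coefficient c.

    sm           : SM yields per bin (also used as the expected data n_i)
    interference : interference yields per bin for unit c
    c_grid       : values of c to scan; sm + c * interference must stay positive
    """
    sm = np.asarray(sm, dtype=float)
    interference = np.asarray(interference, dtype=float)
    # With SM-only expected data the best fit is c = 0, so the test statistic
    # is twice the difference to the SM-point negative log-likelihood.
    q = np.array([2.0 * (nll(sm, sm + c * interference) - nll(sm, sm))
                  for c in c_grid])
    allowed = c_grid[q <= CHI2_95_1DOF]
    return allowed.min(), allowed.max()
```

For example, with toy yields `sm = [50, 50]` and `interference = [5, -5]`, scanning `np.linspace(-5, 5, 10001)` gives a symmetric interval of roughly ±1.94. The interference integrates to zero, so the whole constraint comes from the bin-to-bin asymmetry.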
Finally, we discuss the analysis of h → e⁺e⁻e⁺e⁻ and h → µ⁺µ⁻µ⁺µ⁻. In these decay channels, there are two possible combinations of ℓ⁺ℓ⁻ pairs. We adopt the strategy taken in the ATLAS 4ℓ analysis, whereby all possible same-flavour lepton-antilepton pairs are considered and the pair with invariant mass closest to the mass of the Z boson is defined as the 'first' pair (with mass m_12). The second pair is then constructed from the remaining lepton and antilepton. Figure 5 shows the differential cross section as a function of the CP-odd observable produced by a binary NN, where the NN has been trained to distinguish between the positive- and negative-interference effects produced by the O_ΦWB operator in the h → e⁺e⁻µ⁺µ⁻ decay channel. The model retains the capability to distinguish between the different interference contributions for the h → e⁺e⁻e⁺e⁻ and h → µ⁺µ⁻µ⁺µ⁻ decay channels, but there are two key differences with respect to the h → e⁺e⁻µ⁺µ⁻ decay channel. The first is a sign flip in the differential cross section contribution at O_NN = ±1; this arises because the larger number of kinematic combinations allowed for the h → e⁺e⁻e⁺e⁻ and h → µ⁺µ⁻µ⁺µ⁻ amplitudes leads to an inversion of the correlation of Fig. 3. The second is that the contribution at O_NN ∼ ±1 is smaller and broader, implying a poorer separation of the interference contributions. The change in sign means that the observable will need to be measured independently for each decay channel to avoid an unwanted cancellation in the asymmetry. The constraints obtained on Wilson coefficients when including the information from all three decay channels and using the CP-odd observable constructed from a binary network are found to improve by 10-20% when compared to the constraints obtained from the h → e⁺e⁻µ⁺µ⁻ decay channel alone.
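The pairing strategy described above can be sketched as follows. The event representation (a list of four `(flavour, charge, four-momentum)` tuples with four-momenta given as `(E, px, py, pz)` in GeV) and the function names are assumptions made for illustration, and the Z boson mass is set to 91.1876 GeV.

```python
import math
from itertools import combinations

M_Z = 91.1876  # Z boson mass in GeV

def inv_mass(p, q):
    """Invariant mass of two four-momenta given as (E, px, py, pz)."""
    e, px, py, pz = (p[k] + q[k] for k in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def pair_leptons(leptons):
    """Assign four leptons to two same-flavour opposite-charge pairs.

    leptons: list of four (flavour, charge, p4) tuples (hypothetical layout).
    Returns (first, second) index pairs; 'first' is the pair whose
    invariant mass is closest to M_Z, 'second' is the remaining pair.
    """
    candidates = [
        (i, j) for i, j in combinations(range(4), 2)
        if leptons[i][0] == leptons[j][0] and leptons[i][1] == -leptons[j][1]
    ]
    first = min(
        candidates,
        key=lambda ij: abs(inv_mass(leptons[ij[0]][2], leptons[ij[1]][2]) - M_Z),
    )
    second = tuple(k for k in range(4) if k not in first)
    return first, second
```

In the e⁺e⁻µ⁺µ⁻ channel there are only two candidate pairs and the assignment is unambiguous. In the same-flavour channels there are four candidates, and the choice of the first pair fixes the second.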

B. h + 2 jets
We now turn to VBF as another avenue to constrain the CP structure of Higgs boson interactions. The phenomenology of vector boson fusion is very different to h → 4ℓ because VBF is a multi-scale process, whereas the Higgs mass sets the scale for h → 4ℓ.
In Fig. 6, we show the signed-Δφ_jj distribution of Eq. (6) for the interference contribution induced by the O_ΦW operator with c_ΦW/Λ² = 1 TeV⁻². This is the most important operator that affects Higgs boson production via vector boson fusion, with the remaining electroweak operators of Eq. (2) playing a subdominant role (see, e.g., Refs. [25,66,67]). The lack of sensitivity to the interference contributions induced by the O_ΦB and O_ΦWB operators arises due to the hypercharge coupling structure of the Z boson interactions and the off-shellness of the t-channel momentum transfers, which leave a small Zγ-interference contribution related to a small set of partonic subprocesses.
The signed Δφ_jj is found to dominate all other correlations when we perform a binary classification. Concretely, the network learns to distinguish between positive and negative interference contributions simply by projecting out the total asymmetry. This is shown in Fig. 7, where the interference contribution and the SM contribution both populate the same bins at high |O_NN|. Any other kinematic dependence that is characteristic of a given operator is irrelevant when we only try to discriminate between positive and negative interference contributions. To better exploit the underlying kinematics, we can turn to the multi-class networks, which learn the kinematic information of the SM beyond the symmetry of Δφ_jj. This is shown in Fig. 7, where the additional information is used to discriminate between the SM contribution and the interference contributions. The SM contribution is located closer to O_NN ∼ 0 than the interference contributions, implying that the multi-class network has exploited some differences in kinematics between the SM prediction and the interference prediction.
To quantify the sensitivity of each observable constructed for VBF h + 2 jets, the constraints on Wilson coefficients are estimated using the same likelihood setup as described in Sec. IV A. The expected numbers of events are obtained from the event generator samples after applying a normalisation factor that is defined such that the SM prediction for VBF Higgs production reproduces the number of events observed experimentally in the VBF_1 fiducial region of Ref. [58] (corresponding to an integrated luminosity of 139 fb⁻¹). The SM-only event yields are then further increased to account for background contributions from non-Higgs processes.
The constraints on the Wilson coefficients are given in Tab. II. For all operators considered in this work, the multi-class neural network improves the constraints when compared to the use of Δφ_jj alone. It is also clear that the kinematic information accessed via the multi-class approach is crucial for constraints on O_ΦW: the binary classification does not access bin-to-bin sensitivity, which leads to a slightly decreased sensitivity compared to Δφ_jj. The bulk of the discrimination happens at low p_T, where the biggest share of the cross section is localised. Only the O_ΦW operator can be constrained significantly with the LHC Run-II dataset (139 fb⁻¹), as the constraints on O_ΦWB and O_ΦB remain too loose to be directly physically relevant. This is in line with previous findings [25,66,67]. However, the gain in sensitivity to these operators that can be achieved by using neural networks will be important with larger datasets in the future, e.g. in LHC Run-III, at the High-Luminosity (HL) LHC, or at a Future Circular Collider.

V. SUMMARY AND CONCLUSIONS
In this article, we have outlined a method to directly construct CP-odd observables using the output of neural networks. The method exploits the fact that CP asymmetries arise from the interference between the SM and BSM scattering amplitudes. The neural network is then able to optimise the separation of positive- and negative-interference contributions, using the full kinematic information that is available for a given production or decay process.
We demonstrated the performance of this method by constructing CP-odd observables for the h → 4ℓ decay channel and the VBF Higgs production mechanism. Although CP-odd observables can be exploited in either channel to constrain CP-violating interactions in the Higgs sector, we have shown that the use of neural networks can lead to large improvements in sensitivity to CP-violating effects in the Higgs sector. Specifically, we demonstrated this using dimension-six effective field theory predictions for the interference contributions. Improving the sensitivity to CP-violating effects in h → 4ℓ and VBF Higgs production is particularly important for the self-consistency of the dimension-six approach [68,69].
[Table II caption: Results are presented for a one-dimensional fit to the Δφ_jj distribution, and fits to the O_NN variable constructed from the neural-net outputs of the binary and multi-class models. The O_NN variable is constructed from neural networks trained on the interference predicted by each operator separately.]
In the h → 4ℓ decay channel, we have shown that both binary networks and multi-class networks improve the sensitivity to CP-violating effects in the Higgs boson interactions with weak bosons, when compared to the use of traditional angular variables alone. Using the kinematic features identified by the network, we showed that the improved sensitivity derives from a sign flip in the interference contributions when the highest-mass lepton-antilepton pair corresponds to an on-shell Z boson or an off-shell Z*/γ* boson. The sign flip in the interference term arises due to different contributions of h → Zγ and h → ZZ amplitudes in each region. Operators that modify the electroweak Higgs boson gauge interactions can therefore be constrained with much higher sensitivity than by focusing solely on the CP-odd angular observable (Φ_4ℓ), either by using the observable constructed from the neural network output or by performing a double-differential analysis of the event yield as a function of Φ_4ℓ and m_12. Specifically, we found that sizeable O(10) improvements can be achieved using our method when compared to the use of the angular observable alone. It will be important to eventually compare the sensitivity of the CP-odd observables presented in this paper to those constructed directly from matrix-element methods [49].
In VBF Higgs production, the sensitivity to CP-violating effects is predominantly limited to O_ΦW. We found that the angular observable Δφ_jj drives this sensitivity. However, the multi-class network outperforms the angular observable, as it accesses the full kinematic information of each class and tensions the interference contribution against the SM contribution. We note that the application of multi-class machine learning improves the sensitivity to the phenomenologically less significant EFT operators by at least a factor of two, which is equivalent to quadrupling the integrated luminosity of the dataset.
Our neural-net-based method will therefore allow these operators to be scrutinised in detail at the HL-LHC.
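The equivalence between a factor-of-two gain in sensitivity and a fourfold increase in integrated luminosity can be made explicit with a short estimate. The following is a sketch under stated assumptions: the constraint is statistically limited, only the linear interference term contributes, and the symbols $b_i$ (SM cross section in bin $i$), $s_i$ (interference cross section in bin $i$ for unit Wilson coefficient $c$) and $\mathcal{L}_{\rm int}$ (integrated luminosity) are introduced here for illustration.

```latex
% Expected SM-only data n_i and SM+EFT prediction lambda_i in bin i:
%   n_i = \mathcal{L}_{\rm int}\, b_i , \qquad
%   \lambda_i = \mathcal{L}_{\rm int}\,( b_i + c\, s_i ) .
% Expanding the Poisson likelihood ratio to second order in c:
\begin{align}
  q(c) &= 2 \sum_i \left[ \lambda_i - n_i - n_i \ln\frac{\lambda_i}{n_i} \right]
        \simeq \sum_i \frac{(\lambda_i - n_i)^2}{n_i}
        = c^2 \, \mathcal{L}_{\rm int} \sum_i \frac{s_i^2}{b_i} , \\
  |c|_{95\%} &\simeq \sqrt{ \frac{3.84}{ \mathcal{L}_{\rm int} \sum_i s_i^2 / b_i } }
        \;\propto\; \frac{1}{\sqrt{\mathcal{L}_{\rm int}}} .
\end{align}
% Halving |c|_{95%} at fixed observable therefore requires
% \mathcal{L}_{\rm int} \to 4\, \mathcal{L}_{\rm int}.
```

In this approximation, an improved observable acts by increasing the sum over $s_i^2/b_i$, which has the same effect on the interval as collecting more data.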
We note that the construction of CP-odd observables using neural networks can be generalised to many other processes that probe CP violation at the LHC, including other Higgs boson production and decay channels, as well as searches for CP-violating effects in the weak-boson self-interactions. Although we have focused on a SMEFT analysis, the techniques presented in this work directly generalise to light propagating BSM degrees of freedom that could induce CP violation (this could be captured by retaining the full mass dependence of the Wilson coefficients on the BSM particles). In a similar spirit, absorptive parts of SM amplitudes [70] could be analysed via the introduced classification, thus providing a novel angle on validating QCD predictions. While more traditional approaches (i.e. measuring angular observables in h → 4ℓ and pp → h + 2 jets) remain important tools for the clarification of the CP structure of the Higgs sector, their generalisation to more comprehensive BSM classifiers that also take into account additional correlations will enhance the sensitivity of analyses during LHC Run-III, at the HL-LHC, and at a potential Future Circular Collider.