Unveiling CP property of top-Higgs coupling with graph neural networks at the LHC

The top-Higgs coupling plays an important role in particle physics and cosmology. Precision measurements of this coupling can provide insight into new physics beyond the Standard Model. In this paper, we propose to use a Message Passing Neural Network (MPNN) to reveal the CP nature of the top-Higgs interaction through the semi-leptonic channel $pp \to t(\to b\ell^-\nu_\ell)\bar{t}(\to \bar{b}jj)h(\to b\bar{b})$. Using a test statistic constructed from the event classification probabilities given by the MPNN, we find that the pure CP-even and CP-odd components can be well distinguished at the LHC with at most 300 fb$^{-1}$ of experimental data.


I. INTRODUCTION
The discovery of the Higgs boson [1,2] is a great step in the long quest for the origin of mass. The precision measurement of Higgs couplings is one of the main goals of future LHC experiments, which will further reveal the electroweak symmetry breaking mechanism and shed light on new physics beyond the Standard Model (SM). Among these couplings, the top-Higgs interaction is particularly interesting. On one hand, due to its large size, this coupling gives the dominant contribution to the renormalization group evolution of the Higgs potential and thus plays a unique role in determining the scale of new physics. On the other hand, the direct extraction of this coupling from the QCD process pp → ttH is very challenging, since it has a small production rate and rather complicated final states. Very recently, the ATLAS and CMS collaborations have reported the observation of ttH production in several Higgs decay channels, such as γγ and W+W−, at the 13 TeV LHC [3,4]. The LHC Run-3, with a higher luminosity, will have great potential to decipher the structure of the top-Higgs coupling.
The general top-Higgs interaction can be parameterized as
$$\mathcal{L}_{t\bar{t}h} = -\frac{y_t}{\sqrt{2}}\,\bar{t}\left(\cos\xi + i\gamma_5 \sin\xi\right)t\,h, \qquad (1)$$
where $y_t = \sqrt{2}\,m_t/v$ and ξ = 0 in the SM [5], with v = 174 GeV being the vacuum expectation value of the Higgs field. The presence of the sin ξ term leads to CP violation in the top-Higgs coupling. It affects the Higgs production and decay channels, the electric dipole moments and the flavor physics observables, which are measured to agree well with the SM and thus set strong constraints on the CP-violating top-Higgs coupling [6,7]. However, these indirect bounds are model-dependent and can be evaded in some extensions of the SM. Therefore, the most robust test of the top-Higgs coupling in Eq. (1) is the direct measurement of ttH production at colliders. Several observables, such as the tt spin correlation and the charge asymmetry in ttH production, have been constructed to probe the CP-violating top-Higgs coupling at the LHC [8-17].
On the other hand, by applying machine learning (ML) techniques, exceptional performance can be achieved in separating signal from background using object-level kinematic variables from jets, leptons, and photons [18-24]. A successful example in this respect is the use of boosted decision trees in the LHC experiments, which contributed to the discovery of the Higgs boson [25]. Since a collision event can be seen as a geometrical pattern formed by a number of final-state objects, a graph is a natural way to represent events in mathematical language, which can then be efficiently analyzed by an appropriate ML approach. Among the ML algorithms for graphs, Message Passing Neural Networks (MPNNs) [26] are particularly suited for graph classification and are as flexible as the original Graph Neural Networks (GNNs) [27,28], acting as nonlinear end-to-end models that relate the target output to the input graphs. An MPNN consists of a number of learnable functions acting on the graph nodes, and can be efficiently trained using supervised learning techniques. So far, MPNNs have been successfully applied to supersymmetry searches [29], jet physics [30] and other fields [26].
In this paper, we use an MPNN to investigate the CP nature of the top-Higgs coupling in Eq. (1). We design and train a specific MPNN to classify collider events, and then perform a hypothesis test based on a variable constructed from the output of the MPNN. As a proof of concept, we focus on the semi-leptonic top decay channel of the process pp → tth(→ bb) at the LHC.

II. METHODS
For convenience, we denote the Higgs boson with a CP-even tth coupling (ξ = 0) as h, and the one with a CP-odd tth coupling (ξ = π/2) as A. At the LHC, the tth and ttA signals share the same background, dominantly coming from the process pp → ttbb.
We choose event graphs [29] as the representation of collider events, and design an MPNN specific to their classification, whose outputs are the probabilities of the input event graph being a tth, ttA or ttbb event, respectively. We then construct a variable from the output of the MPNN and perform a hypothesis test.

A. Event graph
As the input of the MPNN, we represent each collider event as an event graph. As an illustration, FIG. 1 shows the event graph of a specific simulated tth event.
For a given collider event, the nodes of the graph represent the final-state objects, including the reconstructed photons, leptons, jets and missing transverse momentum (MET). Each node carries a compact seven-dimensional feature vector x_i = (I_1, I_2, I_3, I_4, p_T, E, m) describing the major properties of the corresponding final state. Besides the transverse momentum p_T, energy E and mass m, the first four features indicate the type of the final state: (1) it is a photon (I_1 = 1) or not (I_1 = 0); (2) it is a lepton (I_2 = charge) or not (I_2 = 0); (3) it is a b-jet (I_3 = 1), a light jet (I_3 = −1) or not a jet (I_3 = 0); (4) it is the MET (I_4 = 1) or not (I_4 = 0).
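As a minimal sketch, the feature vector above can be assembled as follows (the dict-based object format is our illustrative assumption, not the paper's data format):

```python
def node_features(obj):
    """Build the 7-dim feature vector (I1, I2, I3, I4, pT, E, m) for a
    reconstructed final-state object, following the encoding in the text.
    `obj` is an illustrative dict with keys 'type', 'charge', 'pt', 'e', 'm'."""
    i1 = 1.0 if obj["type"] == "photon" else 0.0
    # leptons carry their electric charge as the indicator value
    i2 = float(obj.get("charge", 0)) if obj["type"] == "lepton" else 0.0
    if obj["type"] == "bjet":
        i3 = 1.0
    elif obj["type"] == "jet":
        i3 = -1.0
    else:
        i3 = 0.0
    i4 = 1.0 if obj["type"] == "met" else 0.0
    return [i1, i2, i3, i4, obj["pt"], obj["e"], obj["m"]]

# Example: a b-jet with pT = 60 GeV, E = 80 GeV, m = 5 GeV
features = node_features({"type": "bjet", "pt": 60.0, "e": 80.0, "m": 5.0})
```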
Each pair of nodes is linked by an edge, which is weighted by the geometrical distance between the corresponding pair of final states. We use $d_{ij} = \sqrt{\Delta(\eta_i, \eta_j)^2 + \Delta(\phi_i, \phi_j)^2}$ to measure the geometrical distance between two final states i and j, where η and φ are the pseudo-rapidity and azimuthal angle, respectively.

FIG. 2. The architecture of the MPNN designed for classifying tth, ttA and ttbb events. It has one node embedding layer, two message passing and node update layers, and one output layer. The small circles denote vector concatenation, the arrows denote the application of non-linear functions, and the summation and average run over all nodes.

Note that the differential cross section of a collider event is invariant under rotations of the whole event about the beam axis. To respect this important geometrical symmetry, we exclude the azimuthal angle from the node features and only encode differences of azimuthal angles in the edge weights. With this design, the event representation and classification are stable under rotations of the event about the beam. Note also that (1) the number of nodes in an event graph depends on the specific collider event, (2) there is no ordering of the nodes, and (3) the data in event graphs are exact. These are the main differences between the event graph representation and other collider event representations used as input for ML models.
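The edge weight can be computed as follows; wrapping the azimuthal difference into (−π, π] keeps the weight invariant under rotations of the event about the beam axis (a minimal sketch, not the paper's code):

```python
import math

def delta_r(eta_i, phi_i, eta_j, phi_j):
    """Edge weight d_ij = sqrt(deta^2 + dphi^2), with the azimuthal
    difference wrapped into (-pi, pi] so the weight depends only on the
    relative angle between the two final states."""
    deta = eta_i - eta_j
    dphi = (phi_i - phi_j + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(deta, dphi)
```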

B. Network architecture
The architecture of our MPNN is shown in FIG. 2. It has one node embedding layer, two message passing and node update layers and one output layer.
The node embedding layer embeds each node feature vector x_i into a higher-dimensional node state vector s_i^0 by applying a linear transformation followed by the rectified linear unit (ReLU) activation function,
$$s_i^0 = \mathrm{ReLU}(W_e x_i + b_e),$$
where W_e and b_e are learnable parameters. The state vector s_i^0 encodes only the node features x_i, without any information about the geometrical pattern of the graph.
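A plain-Python sketch of this embedding (the hidden dimension of 16 and the random initialization are arbitrary illustrative choices, not from the paper):

```python
import random

def relu(v):
    """Element-wise rectified linear unit."""
    return [x if x > 0 else 0.0 for x in v]

def linear(W, b, x):
    """Affine map W x + b with W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def embed(x, W_e, b_e):
    """s_i^0 = ReLU(W_e x_i + b_e): lift the 7-dim feature vector to a
    higher-dimensional node state vector."""
    return relu(linear(W_e, b_e, x))

random.seed(0)
dim = 16  # hidden size chosen for illustration only
W_e = [[random.gauss(0, 0.1) for _ in range(7)] for _ in range(dim)]
b_e = [0.0] * dim
s0 = embed([0.0, 0.0, 1.0, 0.0, 60.0, 80.0, 5.0], W_e, b_e)
```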
In the following two message passing and node update layers, the nodes exchange the information contained in their state vectors by passing messages. At layer t, each node i collects the messages sent from every other node j and then updates its state vector,
$$s_i^{t+1} = \mathrm{ReLU}\left(W_s\left[s_i^t,\ \sum_{j\neq i}\left[s_j^t,\ \tilde{d}_{ij}\right]\right] + b_s\right),$$
where the brackets denote vector concatenation, and W_s and b_s are learnable parameters in each layer. Note that, to make the edge weight d_ij more suitable for linear transformations, we expand it onto 21 Gaussian bases to form a weight vector d̃_ij, whose components are
$$\tilde{d}_{ij}^{(k)} = \exp\!\left(-\frac{(d_{ij}-\mu_k)^2}{2\sigma^2}\right), \qquad k = 1, \dots, 21,$$
where the μ_k are linearly distributed in the range [0, 5] and σ = 0.25. The message passing mechanism is the key to automatically extracting features of the input event graph: it efficiently disseminates information among all the nodes while taking the connections between them into account. After the two message passing and node update layers, each node state vector can be viewed as an encoding of the whole event graph.
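The Gaussian basis expansion takes only a few lines; in this sketch only the number of bases, the range [0, 5] and σ = 0.25 follow the text:

```python
import math

def gaussian_basis(d, n=21, d_max=5.0, sigma=0.25):
    """Expand a scalar edge weight d onto n Gaussian bases with centers
    mu_k linearly spaced in [0, d_max]: exp(-(d - mu_k)^2 / (2 sigma^2))."""
    mus = [d_max * k / (n - 1) for k in range(n)]
    return [math.exp(-(d - mu) ** 2 / (2 * sigma ** 2)) for mu in mus]

# A distance of 1.25 activates the basis centered exactly on it most strongly
vec = gaussian_basis(1.25)
```

This smooth one-hot-like encoding lets the subsequent linear layer learn different responses for nearby and far-apart final states, rather than acting on a single raw scalar.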
In the output layer, each node i produces three probabilities p_i by applying a linear transformation and the softmax activation function to its state vector s_i,
$$p_i = \mathrm{softmax}(W_o s_i + b_o).$$
To stabilize the classification performance, we average the output over all the nodes as the final output of the MPNN,
$$p = \frac{1}{N}\sum_{i=1}^{N} p_i,$$
where N is the number of nodes in the input event graph. The three components of p are the probabilities of the input event graph e being a tth, ttA or ttbb event, respectively, denoted as p(h|e), p(A|e) and p(b|e).
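The per-node softmax and the node average can be sketched as follows (pure Python; `W_o` and `b_o` are illustrative names for the output-layer parameters):

```python
import math

def softmax(v):
    """Numerically stable softmax over a list of logits."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def graph_output(states, W_o, b_o):
    """Apply a linear map + softmax to every node state vector, then
    average the per-node probability triplets over the N graph nodes."""
    per_node = [softmax([sum(w * s for w, s in zip(row, st)) + b
                         for row, b in zip(W_o, b_o)])
                for st in states]
    n = len(per_node)
    return [sum(p[c] for p in per_node) / n for c in range(3)]
```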
It is worth noting that the MPNN is a dynamic neural network, which can be viewed as a stack of learnable transformations acting on each single node or pair of graph nodes. Therefore, the MPNN intrinsically scales with the size of the input event graph.

C. Training
The MPNN can be efficiently trained using supervised learning. We choose the cross entropy as the loss function. The gradients of the loss with respect to the learnable parameters are evaluated on mini-batches of 500 examples, and the parameters are optimized using the ADAM optimizer [31] with a fixed learning rate of 0.001. The training is run for up to 300 epochs, and we keep the MPNN parameters that give the best generalization performance (minimum loss) on the validation set. All of the above is implemented with the open-source deep learning framework PyTorch [32] with extensive GPU acceleration.
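For concreteness, the per-event cross-entropy loss reduces to minus the log-probability assigned to the true class; a minimal pure-Python illustration (the actual training uses PyTorch's built-in loss and optimizer):

```python
import math

def cross_entropy(probs, label):
    """Per-event loss: -log p(true class), with classes
    0 = tth, 1 = ttA, 2 = ttbb."""
    return -math.log(probs[label])

def batch_loss(batch):
    """Average cross-entropy over a mini-batch of (probs, label) pairs;
    its gradients drive one optimizer step."""
    return sum(cross_entropy(p, y) for p, y in batch) / len(batch)
```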

D. Hypothesis test
If the top-Higgs coupling is CP-even, the event sample collected in experiments will come from the tth process plus the dominant ttbb background process. Otherwise, it consists of a mixture of ttA and ttbb events. Therefore, we define variables that can discriminate between event samples of the two scenarios. From the single-event probabilities output by the MPNN, we construct two likelihoods,
$$L_h(D) = \prod_{e\in D} p(h|e), \qquad L_A(D) = \prod_{e\in D} p(A|e),$$
to measure the consistency of a given event sample D with each of the two scenarios. In the CP-even scenario, L_h(D) ≫ L_A(D); in the CP-odd scenario, L_h(D) ≪ L_A(D). It is worth noting that the products run only over events with both p(h|e) and p(A|e) larger than p(b|e). That is, we exclude background-like events from the evaluation, which effectively reduces the contamination from the background.
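In log space, the likelihood construction with the background-like event veto can be sketched as follows (each event is represented by its MPNN output triplet; an illustration, not the paper's code):

```python
import math

def log_likelihoods(sample):
    """Return (ln L_h, ln L_A) for an event sample, where each event is
    a triplet (p_h, p_A, p_b) of MPNN output probabilities.  Events whose
    background probability exceeds either signal probability are vetoed,
    and the products are accumulated in log space for stability."""
    ln_lh = ln_la = 0.0
    for p_h, p_a, p_b in sample:
        if p_h > p_b and p_a > p_b:
            ln_lh += math.log(p_h)
            ln_la += math.log(p_a)
    return ln_lh, ln_la
```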
To perform a hypothesis test, we choose the log-likelihood ratio
$$\ln Q = \ln\frac{L_h(D)}{L_A(D)}$$
as the test statistic. The distributions of ln Q in the two scenarios, denoted f_h and f_A respectively, can be obtained numerically by evaluating a large number of randomly simulated event samples, i.e., by performing pseudo experiments. Because actually generating a huge number of simulated events would be extremely time-consuming, we adopt the bootstrap technique. First, for each process X, we generate a simulated event dataset in which the number of events is large enough and the events have equal weights. Then, we construct event samples of process X by randomly sampling the corresponding dataset with replacement. The number of events n in an event sample obeys a Poisson distribution P(n|λ) with mean λ = ε_X σ_X L, where σ_X is the production cross section of process X, L is the integrated luminosity and ε_X is the event selection efficiency for process X.
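The bootstrap sampling step can be sketched as follows (pure Python; the Poisson sampler and the stand-in event pool are illustrative, and the cross section is taken post-selection so the efficiency factor is set to one):

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw via exponential inter-arrival times
    (adequate for the event counts used here)."""
    t, n = 0.0, 0
    while True:
        t += -math.log(1.0 - rng.random())
        if t > lam:
            return n
        n += 1

def pseudo_sample(pool, eff, xsec_fb, lumi_invfb, rng):
    """One bootstrap event sample for a process: draw
    n ~ Poisson(eps * sigma * L) events from the simulated
    pool with replacement."""
    n = poisson(eff * xsec_fb * lumi_invfb, rng)
    return [rng.choice(pool) for _ in range(n)]

rng = random.Random(42)
pool = list(range(10000))  # stand-in for a large simulated event dataset
# tth-like sample: sigma = 3.78 fb after selection, L = 300 fb^-1
sample = pseudo_sample(pool, eff=1.0, xsec_fb=3.78, lumi_invfb=300.0, rng=rng)
```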
In the CP-even (odd) scenario, a pseudo-experimental event sample is the union of a tth (ttA) sample and a ttbb sample.
Given the distributions of ln Q in the two scenarios and the value ln Q* calculated from the observed experimental data D*, as shown in FIG. 3, we can evaluate the p-values for rejecting the CP-even scenario and the CP-odd scenario by the integrals
$$p_h = \int_{-\infty}^{\ln Q^*} f_h(x)\,dx, \qquad p_A = \int_{\ln Q^*}^{+\infty} f_A(x)\,dx,$$
respectively.
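With pseudo-experiment ensembles in hand, the two p-values reduce to tail fractions of the empirical ln Q distributions (a sketch under the sign convention ln Q = ln L_h − ln L_A, so CP-even samples sit at large ln Q):

```python
def p_values(lnq_h, lnq_a, lnq_obs):
    """Empirical p-values from pseudo-experiment ensembles of ln Q.
    The CP-even hypothesis is rejected by the lower tail of f_h and
    the CP-odd hypothesis by the upper tail of f_A."""
    p_h = sum(q <= lnq_obs for q in lnq_h) / len(lnq_h)
    p_a = sum(q >= lnq_obs for q in lnq_a) / len(lnq_a)
    return p_h, p_a
```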

III. RESULTS
Simulated events for the tth, ttA and ttbb processes are generated separately using MadGraph5 [33] at the 13 TeV LHC. Showering and hadronization are performed with Pythia8 [34], and detector simulation with Delphes [35] using the ATLAS configuration. CheckMATE2 [36] is used to perform the event selection. Leptons are required to have p_T > 20 GeV and |η| < 2.5, and jets p_T > 25 GeV and |η| < 2.5. B-tagging is performed with a 60% nominal efficiency. We focus on the semi-leptonic channel, requiring events to have exactly one lepton, four b-jets and at least two light jets in the final state. After event selection, the detection cross sections for the tth, ttA and ttbb processes are 3.78, 1.82 and 27.5 fb, respectively. We collect 900,000 examples with balanced numbers of tth, ttA and ttbb events as the training set for optimizing the parameters of our MPNN, and another 300,000 examples as the validation set for performance evaluation.
We show in FIG. 4 the distributions of the output of our trained MPNN evaluated on the validation set. It is clear that the MPNN has successfully learned discriminative event features for the different processes: the background events tend to have higher p(b|e), while the tth and ttA events obtain higher p(h|e) and p(A|e), respectively.
We perform millions of pseudo-experiments for each of the two scenarios, with the pseudo-experiment events taken from the validation set. In FIG. 5, we show the probability distributions of the log-likelihood ratio ln Q from the pseudo-experiments (left panel) and the receiver operating characteristic (ROC) curves of the hypothesis test (right panel) for different values of the integrated luminosity. As the luminosity increases, the overlap between the two distributions shrinks significantly and the ROC curves move closer to the corner. When the integrated luminosity reaches 300 fb−1, the two distributions are almost completely separated. This indicates that the CP-even and CP-odd components can be well distinguished at the LHC with at most 300 fb−1 of experimental data.