Spectral analysis of jet substructure with neural networks: boosted Higgs case

Jets from boosted heavy particles have a typical angular scale which can be used to distinguish them from QCD jets. We introduce a machine learning strategy for jet substructure analysis using a spectral function on the angular scale. The angular spectrum allows us to scan energy deposits over the angle between a pair of particles in a highly visual way. We set up an artificial neural network (ANN) to find characteristic shapes of the spectra of jets from heavy particle decays. Taking Higgs jets and QCD jets as examples, we show that an ANN with the angular spectrum as input has similar performance to existing taggers. In addition, some improvement is seen when additional radiation occurs. Notably, the new algorithm automatically combines the information of the multipoint correlations in the jet.


Introduction
At multi-TeV $pp$ colliders such as the LHC, boosted heavy particles can be produced and form a single collimated cluster of particles. Such a localized cluster is distinguished from QCD jets initiated by quarks or gluons by the substructure of the cluster [1]. For this purpose, consistent definitions of jet substructure have been studied extensively. There are various methods for identifying jet substructure, such as strategies based on cluster decomposition [1][2][3][4][5][6][7][8] and shape variables [9][10][11][12][13]. These methods focus on different features of jet substructure to maximize the discrimination power. For Higgs, W, and Z bosons decaying hadronically into two quarks, the critical feature is a two-prong substructure inside the jet. Because the key features depend on the nature of the parent particle of a jet, there are several frameworks that can be applied to jets in general [14][15][16][17][18].
In this paper, we propose a new framework to identify jet substructure using a spectral function similar to the angular structure function [14,19,20]. Spectral analysis is a widely used technique for exploring the quantum world. A notable example is proton nuclear magnetic resonance ($^1$H-NMR) spectroscopy of organic molecules. Organic molecules consist mostly of a carbon skeleton and hydrogen atoms. The shift and splitting of the resonant frequency of a hydrogen atom depend on the interactions between that hydrogen and the rest of the molecular substructure. Hence, a $^1$H-NMR spectrum can be used to determine molecular structures, as illustrated in Figure 1. We develop a suitable spectrum of jet substructure, so that spectral analysis techniques can be applied to identify the nature of a given jet.

An artificial neural network (ANN) is a useful technique for analyzing the spectrum. Jet substructure analyses based on ANNs have recently gained attention and have been studied in various contexts. The analyses fall mainly into two groups with different inputs. One group utilizes special-purpose observables and uses an ANN to identify correlations between substructures [15,17,21], similar to analyses with a boosted decision tree [22,23]. The other group uses the jet constituents directly and uses an ANN to find particular substructures in a jet. This group is again divided into two subgroups depending on how the jet constituents are interpreted. One can interpret jet constituents as an image [24] and use image recognition techniques [25][26][27][28][29]. The other strategy is to interpret a jet as a sequence of data, such as the clustering sequence of the jet algorithms [30][31][32][33][34][35], and to utilize ANNs for sequential data analysis [36,37]. Our approach differs from these: our network is asked to analyze an event-by-event spectrum of a jet. This approach reduces the inputs of the ANN significantly, but the network can still learn characteristic non-local correlations in a jet from a heavy particle, like the non-local neural networks for video classification [38]. We will show that our approach improves the separation between Higgs jets and QCD jets in a natural manner.
This paper is organized as follows. In Sec. 2, we define the spectral function $S_2(R)$ and describe its nature. We also explain the setup of the Monte Carlo simulations used to show the performance. In Sec. 3, we introduce the spectral analysis with an ANN and present a comparison of performance between an ANN with $S_2(R)$ and an ANN with the previously known quantity $D_2$ [12] for two-prong substructures. Sec. 4 is devoted to a summary and discussion.

A Spectral Function of Jet Substructure
Inspired by $^1$H-NMR, we introduce a spectral function of jet substructure. The perturbative description of jets is analogous to organic molecules. Partons from a heavy particle decay determine the primary geometry of a jet, corresponding to the carbon skeleton which fixes the structure of an organic molecule. The final objects in the jet are hadronized particles from the partons, and we consider them analogous to the hydrogen atoms surrounding the carbon skeleton. We then use correlations between two jet constituents to define a spectrum from the transverse momenta of all particle pairs $i$ and $j$, $p_{T,i}$ and $p_{T,j}$, and their angular distance $R_{ij} = \sqrt{\eta_{ij}^2 + \phi_{ij}^2}$, where $\eta_{ij}$ and $\phi_{ij}$ are the pseudorapidity and azimuthal angle differences between $i$ and $j$. The spectrum should preferably be infrared and collinear (IRC) safe, namely invariant under soft and collinear radiation. If IRC safety is not satisfied, the Kinoshita-Lee-Nauenberg theorem [39,40] is not applicable, and such a spectral density is hard to estimate from perturbative QCD calculations.
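We define a binned spectral function for the two-point correlation,

$$S_2(R; \Delta R) = \frac{1}{\Delta R} \sum_{\substack{i,j \in \mathrm{jet} \\ R_{ij} \in [R,\, R+\Delta R)}} p_{T,i}\, p_{T,j}, \tag{2.1}$$

where $\Delta R$ is the bin width and the sum runs over all pairs of constituents, including the autocorrelation terms $i = j$. In the continuum limit $\Delta R \to 0$, this spectral function turns into

$$S_2(R) = \int d\vec{R}_1\, d\vec{R}_2\, P_T(\vec{R}_1)\, P_T(\vec{R}_2)\, \delta(R - R_{12}), \tag{2.2}$$

where $d\vec{R}\, P_T(\vec{R})$ is the $p_T$ sum of constituents in a neighborhood $d\vec{R}$ of $\vec{R}$, $R_{12}$ is the angular distance between $\vec{R}_1$ and $\vec{R}_2$, and $\delta(x)$ is the Dirac $\delta$ function. The spectrum may be easily generalized to three-point or multi-point spectral functions, analogous to energy correlation functions [11] and energy flow polynomials [18].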
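The binned definition can be illustrated with a minimal pure-Python sketch. It assumes that constituents are given as `(pt, eta, phi)` tuples, that every ordered pair including $i = j$ contributes, and that the spectrum carries a $1/\Delta R$ normalization; the function names and the toy jet are hypothetical and are not taken from the analysis code.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def s2_spectrum(constituents, bin_width=0.1, r_max=2.0):
    """Binned two-point spectrum S2(R; dR) for a list of (pt, eta, phi) tuples.

    Every ordered pair (i, j), including i == j, adds pt_i * pt_j / dR to the
    bin [R, R + dR) that contains R_ij. Pairs with R_ij >= r_max are dropped.
    """
    n_bins = int(round(r_max / bin_width))
    spectrum = [0.0] * n_bins
    for pt_i, eta_i, phi_i in constituents:
        for pt_j, eta_j, phi_j in constituents:
            r_ij = math.hypot(eta_i - eta_j, delta_phi(phi_i, phi_j))
            k = int(r_ij / bin_width)
            if k < n_bins:
                spectrum[k] += pt_i * pt_j / bin_width
    return spectrum

# Hypothetical two-prong toy jet: two constituents separated by R = 0.85.
toy_jet = [(200.0, 0.0, 0.0), (100.0, 0.85, 0.0)]
toy_s2 = s2_spectrum(toy_jet)
```

With these choices the toy jet populates only the bin at $R = 0$ (autocorrelation) and the bin containing $R = 0.85$ (cross-correlation), which is the two-peak pattern expected for a two-prong jet.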
The $S_2(R)$ spectrum is IRC safe. A radiation $a(p_{T,a}) \to b(p_{T,b}) + c(p_{T,c})$ does not change $S_2(R)$ if $b$ or $c$ is soft, because $p_{T,b}\, p_{T,c} \to 0$. A collinear splitting also does not change $S_2(R)$, because the two momenta stay at the original coordinate $\vec{R} = (\eta, \phi)$. Note that summing the autocorrelation term $p_T^2$ in Eq. 2.1 is necessary to achieve IRC safety at $R = 0$, because the cross term $p_{T,b}\, p_{T,c}$ after the splitting originates from the autocorrelation term $p_{T,a}^2$. The IRC safety of $S_2(R)$ can also be understood in the context of C-correlators [18,41]. $S_2(R)$ is a special case of a C-correlator with an unbounded, non-smooth angular weighting function $f_2(\hat{p}_i, \hat{p}_j) = \delta(R - R_{ij})$. If we replace the Dirac $\delta$ function with a bounded smooth function, for example $\delta(x) \to (|a|\sqrt{\pi})^{-1} e^{-(x/a)^2}$, then the Taylor expansion of the smoothed function transforms $S_2(R)$ into a series of IRC safe energy flow polynomials [18] with two vertices. The series converges to $S_2(R)$ in the limit $a \to 0$, and the IRC safety of $S_2(R)$ is understood asymptotically.
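The IRC-safety argument can be checked numerically on a toy jet. The sketch below assumes the same ordered-pair, $1/\Delta R$-normalized binning as a working convention; the constituent values are hypothetical. A collinear splitting at the same $(\eta, \phi)$ should leave the binned spectrum unchanged, and a very soft emission should change it only by an amount proportional to the soft $p_T$.

```python
import math

def s2_spectrum(constituents, bin_width=0.1, n_bins=20):
    """Binned S2 from (pt, eta, phi) tuples; all ordered pairs, i == j included."""
    spectrum = [0.0] * n_bins
    for pt_i, eta_i, phi_i in constituents:
        for pt_j, eta_j, phi_j in constituents:
            # No phi wrapping here: the toy jet stays close to phi = 0.
            r_ij = math.hypot(eta_i - eta_j, phi_i - phi_j)
            k = int(r_ij / bin_width)
            if k < n_bins:
                spectrum[k] += pt_i * pt_j / bin_width
    return spectrum

def max_abs_diff(a, b):
    """Largest bin-by-bin absolute difference between two spectra."""
    return max(abs(x - y) for x, y in zip(a, b))

original = [(200.0, 0.0, 0.0), (100.0, 0.85, 0.0)]

# Collinear splitting: the 200 GeV constituent becomes 150 + 50 at the same point.
collinear = [(150.0, 0.0, 0.0), (50.0, 0.0, 0.0), (100.0, 0.85, 0.0)]

# Soft emission: an extra 1e-6 GeV constituent at a wide angle.
soft = original + [(1.0e-6, 0.3, 0.4)]

diff_collinear = max_abs_diff(s2_spectrum(original), s2_spectrum(collinear))
diff_soft = max_abs_diff(s2_spectrum(original), s2_spectrum(soft))
```

The collinear case works only because the $i = j$ terms are kept: $150^2 + 50^2 + 2 \cdot 150 \cdot 50 = 200^2$ reproduces the original autocorrelation term in the $R = 0$ bin.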
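The $S_2(R)$ spectrum is constrained by kinematics. The integrals of $S_2(R)$ are one-point and two-point energy correlation functions [11], which are approximately given by the transverse momentum $p_{T,\mathrm{jet}}$ and the mass $m_{\mathrm{jet}}$ of the jet,

$$\int_0^\infty dR\, S_2(R) = \sum_{i,j \in \mathrm{jet}} p_{T,i}\, p_{T,j} \approx p_{T,\mathrm{jet}}^2, \tag{2.3}$$

$$\int_0^\infty dR\, R^2 S_2(R) = \sum_{i,j \in \mathrm{jet}} p_{T,i}\, p_{T,j}\, R_{ij}^2 \approx 2\, m_{\mathrm{jet}}^2. \tag{2.4}$$

These integrals imply the physical interpretation of $S_2(R)$: $S_2(R)$ measures the contribution to $p_{T,\mathrm{jet}}^2$ from pairs of jet constituents at a particular relative angle $R$, and $R^2 S_2(R)$ measures the contribution to $m_{\mathrm{jet}}^2$ from the pairs at the angle $R$.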
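The two kinematic constraints can be verified on a toy jet without any binning, since the integrals reduce to plain pair sums. The sketch below assumes massless constituents given as hypothetical `(pt, eta, phi)` tuples and a sum over ordered pairs, so each unordered pair is counted twice; the relation to the jet mass is a small-angle approximation, not an identity.

```python
import math

def pair_sums(constituents):
    """Return (sum_ij pt_i pt_j, sum_ij pt_i pt_j R_ij^2) over all ordered pairs."""
    zeroth, second = 0.0, 0.0
    for pt_i, eta_i, phi_i in constituents:
        for pt_j, eta_j, phi_j in constituents:
            r2 = (eta_i - eta_j) ** 2 + (phi_i - phi_j) ** 2
            zeroth += pt_i * pt_j
            second += pt_i * pt_j * r2
    return zeroth, second

def jet_mass(constituents):
    """Invariant mass of the summed four-momenta of massless constituents."""
    e = px = py = pz = 0.0
    for pt, eta, phi in constituents:
        e += pt * math.cosh(eta)
        px += pt * math.cos(phi)
        py += pt * math.sin(phi)
        pz += pt * math.sinh(eta)
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

toy_jet = [(200.0, -0.1, 0.0), (100.0, 0.2, 0.1), (50.0, 0.0, -0.2)]
zeroth, second = pair_sums(toy_jet)
scalar_pt = sum(pt for pt, _, _ in toy_jet)

# Factor 2: each unordered pair appears twice in the ordered-pair sum.
mass_from_pairs = math.sqrt(second / 2.0)
mass_exact = jet_mass(toy_jet)
```

For this narrow toy jet the zeroth moment equals the squared scalar $p_T$ sum exactly, and the mass estimated from the second moment agrees with the four-vector mass at the sub-percent level.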
We perform a Monte Carlo study of classifying Higgs jets and QCD jets using the $S_2(R)$ spectrum. We generate $pp \to Zj$ events and $pp \to Zh$ events followed by $h \to b\bar{b}$, and use the leading jet of the events as training samples of one-prong and two-prong jets, respectively. Each sample is generated at leading order in QCD using MadGraph5_aMC@NLO 2.6.1 [42] with the parton distribution function (PDF) set NNPDF 2.3 LO at $\alpha_S(m_Z) = 0.130$ [43]. Z bosons are forced to decay into neutrinos so that they are invisible to the detector. The events are showered and hadronized by Pythia 8.226 [44] with the Monash tune [45]. We include effects of the underlying event, such as multi-parton interactions and beam remnant treatment, but we do not take pile-up into account.
Finally, we simulate the detector response using Delphes 3.3.3 [46] with its default ATLAS setup. Jets are reconstructed from calorimeter towers using the anti-$k_T$ algorithm [35] with a jet radius parameter $R_{\mathrm{jet}} = 1.0$, as implemented in fastjet 3.3.0 [47,48]. We study substructures of leading-$p_T$ jets having $p_{T,\mathrm{jet}} \in [300, 400]$ GeV and $m_{\mathrm{jet}} \in [100, 150]$ GeV. The characteristic angle between the two $b$ quarks from the boosted Higgs boson is then $R_{b\bar{b}} \sim 2 m_h / p_{T,\mathrm{jet}} \approx 0.83$ at the lower edge of the $p_T$ window, $p_{T,\mathrm{jet}} = 300$ GeV, and smaller at higher $p_T$. Hence, the chosen jet radius is large enough to capture them efficiently. For $Zh$ events, we additionally require that at least one $b$ parton momentum at the matrix element level is located within $R \le 1$ of the jet center, to avoid contamination from hard initial state radiation. After these preselections, we have 256691 $Zh$ and $Zj$ events for training the ANN. When we test the ANN, we use a testing sample generated independently of the training sample.
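The quoted angular scale follows from simple arithmetic, sketched below. The helper name is hypothetical, and $m_h = 125$ GeV is taken from the later discussion of the training sample.

```python
def two_prong_angle(mass, pt):
    """Small-angle estimate R ~ 2 m / pT for a symmetric two-body decay."""
    return 2.0 * mass / pt

M_H = 125.0  # GeV, Higgs mass used for the signal sample

# Scan the pT window of the selection, [300, 400] GeV.
angles = {pt: two_prong_angle(M_H, pt) for pt in (300.0, 350.0, 400.0)}
for pt, angle in angles.items():
    print(f"pT = {pt:.0f} GeV -> R_bb ~ {angle:.3f}")
```

Across the whole window the estimate stays below the jet radius parameter of 1.0, which is the sense in which the jet radius is sufficient.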
We show typical pixelated jet images and $S_2(R; 0.1)$ spectra of a Higgs boson jet and a quark jet in Figure 2. For the boosted Higgs boson, the $S_2(R; 0.1)$ distribution has two prominent peaks at $R = 0$ and $R = R_{b\bar{b}}$, which correspond to the autocorrelation and the cross-correlation of the two $b$ partons,

$$S_2(R) \approx \left(p_{T,b}^2 + p_{T,\bar{b}}^2\right) \delta(R) + 2\, p_{T,b}\, p_{T,\bar{b}}\, \delta(R - R_{b\bar{b}}). \tag{2.5}$$

After the initial parton production, the parton shower starts. Each splitting of a parton is characterized by the angle between the daughter partons and their momenta, and $S_2(R)$ sums up those individual splittings. The peaks in $R$ are smeared by the parton shower and hadronization, but these effects do not easily modify the initial radiation pattern. Figure 2 (top center) is qualitatively similar to the $^1$H-NMR spectrum in Figure 1. In Figure 2, we also show the quark jet spectrum. It does not have distinctive peaks; instead, the spectrum decreases gradually from $R = 0$ to high $R$.
Note that the $S_2(R)$ spectrum is not completely independent of the angular structure function $\Delta G(R)$ in [14]: the function $\Delta G(R)$ can be expressed in terms of $S_2(R)$. This $\Delta G(R)$ is a Higuchi fractal dimension [49] of $G(R)$, which measures the irregularity of $G(R)$ over $R$. QCD jets have a uniform $\Delta G(R)$ distribution on average because of the approximate scale invariance of QCD [14,19,20]. On the other hand, $\Delta G(R)$ of a multi-prong jet shows sharp peaks at some angular scales [14]. In [14], the number of peaks and the peak heights are used to classify jets. In this paper, we construct a classifier using an ANN from the $S_2(R)$ spectrum to utilize the global structure of the jet.
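One way to make the relation explicit is sketched below. It assumes the angular structure function as commonly defined in the literature cited as [14], and an $S_2(R)$ that sums $p_{T,i}\, p_{T,j}\, \delta(R - R_{ij})$ over all ordered pairs; both the normalization and the unsmoothed derivative are assumptions of this sketch and may differ from the display in the original text.

```latex
\begin{align}
G(R) &= \frac{\sum_{i \neq j} p_{T,i}\, p_{T,j}\, R_{ij}^2\, \Theta(R - R_{ij})}
             {\sum_{i \neq j} p_{T,i}\, p_{T,j}\, R_{ij}^2}
      = \frac{\int_0^{R} dR'\, R'^2\, S_2(R')}{\int_0^{\infty} dR'\, R'^2\, S_2(R')}, \\
\Delta G(R) &= \frac{d \log G(R)}{d \log R}
      = \frac{R^3\, S_2(R)}{\int_0^{R} dR'\, R'^2\, S_2(R')}.
\end{align}
```

The autocorrelation terms $i = j$ drop out of $G(R)$ because they are weighted by $R_{ij}^2 = 0$, so including them in $S_2(R)$ does not affect this relation.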

Spectral Analysis with Artificial Neural Network
We now feed these event-by-event $S_2(R; \Delta R)$ spectra to our ANN to build a classifier between boosted Higgs jets and QCD jets. First, we prepare an equal number of Higgs jets and QCD jets to avoid overfitting caused by unbalanced data. We use TFLearn [50] with a TensorFlow [51] backend for the ANN analysis. The input set consists of $p_{T,\mathrm{jet}}$, $m_{\mathrm{jet}}$, and $S_2(R; 0.1)$ in bins up to the angular scale $R < 2$ (Eq. 3.1), and dropout layers are inserted between the hidden layers. The network is trained by the Adam optimizer [52] with learning rate 0.001, $\beta_1 = 0.99$, and $\beta_2 = 0.999$, minimizing a categorical cross-entropy. We call this network $N_{S_2}$. In the trained network, Higgs jets have scores near 1, while QCD jets have scores near 0. We validate $N_{S_2}$ using the testing samples.

We compare the performance of $N_{S_2}$ with the performance of a network trained with special-purpose observables. First, we choose $D_2$, which is sensitive to the two-prong substructure of jets [12]. $D_2$ is defined by a ratio of two-point and three-point energy correlation functions $e_2^{(\beta)}$ and $e_3^{(\beta)}$ as follows.
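The output stage and the training objective described above can be sketched in a few lines of plain Python. This is only an illustration of a two-node softmax output and a categorical cross-entropy for a single event; the ordering of the output nodes as (QCD, Higgs) and the example logits are assumptions, and the sketch does not reproduce the TFLearn configuration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, one_hot, eps=1e-12):
    """Cross-entropy -sum_c t_c log(p_c) for one event."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, one_hot))

# Two output nodes, assumed to be ordered as (QCD, Higgs).
probs = softmax([0.0, 2.0])
higgs_score = probs[1]  # Higgs-like score: near 1 for Higgs jets, near 0 for QCD jets

loss_if_higgs = categorical_cross_entropy(probs, [0.0, 1.0])
loss_if_qcd = categorical_cross_entropy(probs, [1.0, 0.0])
```

An event with a high Higgs-like score is penalized weakly if its true label is Higgs and strongly if it is QCD, which is what pushes the two classes toward scores of 1 and 0 during training.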
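$$D_2^{(\beta)} = \frac{e_3^{(\beta)}}{\left(e_2^{(\beta)}\right)^3}, \qquad e_2^{(\beta)} = \frac{1}{p_{T,\mathrm{jet}}^2} \sum_{i<j} p_{T,i}\, p_{T,j}\, R_{ij}^{\beta}, \qquad e_3^{(\beta)} = \frac{1}{p_{T,\mathrm{jet}}^3} \sum_{i<j<k} p_{T,i}\, p_{T,j}\, p_{T,k}\, R_{ij}^{\beta} R_{ik}^{\beta} R_{jk}^{\beta},$$

where the summations in $e_2^{(\beta)}$ and $e_3^{(\beta)}$ run over all jet constituents. We use $\beta = 2$ for further discussion. For Higgs jets, $D_2$ tends to have a small value because $e_3^{(\beta)}$ is suppressed for collinear and soft radiation, while $e_2^{(\beta)}$ is large because the pairs of jet constituents with $R_{ij} \sim R_{b\bar{b}}$ dominate. To optimize the performance of $D_2$, we prepare another neural network which maps the inputs $\{x_i\}_{D_2} = \{p_{T,\mathrm{jet}},\ m_{\mathrm{jet}},\ D_2\}$ to the Higgs-like score $y_{D_2}$. Again, the input data are normalized to $[0, 1]$, i.e., $x_i \to (x_i - \min x_i)/(\max x_i - \min x_i)$, where $\max x_i$ and $\min x_i$ are the maximum and the minimum of $x_i$ in the training sample, respectively. We use two smaller hidden layers with (100, 100) ReLU nodes because the number of inputs is smaller. The other ANN settings are identical to those of the $S_2(R)$ analysis. We call this network $N_{D_2}$.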
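As an illustration of why $D_2$ separates two-prong jets from jets with additional hard structure, the sketch below evaluates it from the standard energy correlation function definitions with $\beta = 2$. The constituents are hypothetical `(pt, eta, phi)` tuples and the function name is not taken from any library.

```python
import math
from itertools import combinations

def energy_correlation_d2(constituents, beta=2.0):
    """D2 = e3 / e2^3 computed from (pt, eta, phi) tuples."""
    pt_jet = sum(pt for pt, _, _ in constituents)

    def angle(a, b):
        # No phi wrapping: the toy jets stay close to phi = 0.
        return math.hypot(a[1] - b[1], a[2] - b[2])

    e2 = sum(a[0] * b[0] * angle(a, b) ** beta
             for a, b in combinations(constituents, 2)) / pt_jet ** 2
    e3 = sum(a[0] * b[0] * c[0]
             * (angle(a, b) * angle(a, c) * angle(b, c)) ** beta
             for a, b, c in combinations(constituents, 3)) / pt_jet ** 3
    return e3 / e2 ** 3

# Two hard prongs plus one soft particle versus three hard prongs.
two_prong = [(200.0, 0.0, 0.0), (100.0, 0.85, 0.0), (1.0, 0.40, 0.05)]
three_prong = [(200.0, 0.0, 0.0), (100.0, 0.85, 0.0), (80.0, 0.40, 0.60)]

d2_two = energy_correlation_d2(two_prong)
d2_three = energy_correlation_d2(three_prong)
```

In the first toy jet the third particle is soft, so $e_3^{(\beta)}$ and hence $D_2$ are small; promoting it to a hard wide-angle emission raises $D_2$ by orders of magnitude.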
To make the ANN learn the hierarchy between soft and hard radiation, variables after jet trimming [53] are useful. To obtain trimmed quantities, we first reconstruct $k_T$ subjets [31,32] with $R_{\mathrm{sub}} = 0.2$ from the constituents of the jet and remove subjets having transverse momentum $p_{T,\mathrm{subjet}} < f_{\mathrm{cut}} \cdot p_{T,\mathrm{jet}}$, where $f_{\mathrm{cut}} = 0.05$. In the right panel of Figure 2, we show the $S_{2,\mathrm{tr}}(R)$ spectrum, which is the $S_2(R)$ spectrum of the trimmed jet constituents, for a Higgs jet and a QCD jet. The $S_2(R)$ spectrum before trimming is shown in the central panel. The two-prong substructure of a Higgs jet is hard, and the double-peak structure appears in both $S_2(R)$ and $S_{2,\mathrm{tr}}(R)$. On the other hand, the spectrum of a QCD jet is significantly modified, which means that soft activity dominates its $S_2(R)$ spectrum. This shows that the difference between $S_2(R)$ and $S_{2,\mathrm{tr}}(R)$ contains useful information for the classification.
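The selection step of trimming can be sketched as a simple threshold on subjet transverse momenta. The sketch assumes that the $k_T$ subjets with $R_{\mathrm{sub}} = 0.2$ have already been reconstructed by a jet clustering library, which is not reproduced here; the subjet $p_T$ values are hypothetical.

```python
def trim_subjets(subjet_pts, jet_pt, f_cut=0.05):
    """Keep subjets with pT >= f_cut * pT_jet; softer subjets are removed."""
    threshold = f_cut * jet_pt
    return [pt for pt in subjet_pts if pt >= threshold]

# Hypothetical subjet pT values (GeV) of a jet with pT = 350 GeV.
jet_pt = 350.0
subjet_pts = [180.0, 120.0, 30.0, 12.0, 8.0]

kept = trim_subjets(subjet_pts, jet_pt)
removed_pt = sum(subjet_pts) - sum(kept)
```

The trimmed spectrum would then be computed from the constituents of the surviving subjets only, so that the comparison with the untrimmed spectrum isolates the soft component.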
We then prepare two networks, $N_{D_2+\mathrm{tr}}$ and $N_{S_2+\mathrm{tr}}$, taking inputs $\{x_i\}_{D_2+\mathrm{tr}}$ and $\{x_i\}_{S_2+\mathrm{tr}}$, which add the trimmed observables to the inputs. We show receiver operating characteristic (ROC) curves of our ANN analyses $N_{D_2}$, $N_{D_2+\mathrm{tr}}$, $N_{S_2}$, and $N_{S_2+\mathrm{tr}}$ in Figure 3. The ROC curves show that the ANN with $S_2(R; \Delta R)$ rejects more QCD jets for a fixed Higgs jet efficiency. At a Higgs tagging efficiency of 0.4 (0.2), the QCD jet mistag rate of $N_{S_2+\mathrm{tr}}$ is reduced by 21.6% (26.7%) compared to that of $N_{D_2+\mathrm{tr}}$. This is expected because $N_{S_2+\mathrm{tr}}$ uses the two-point energy correlation from $S_2(R; \Delta R)$ and infers the three-point energy correlation from correlations between different angular scales. For example, $S_2(R; \Delta R)$ of a three-prong jet having angular scales $R_1$, $R_2$, and $R_3$ has three peaks away from $R = 0$, and the intensities of these peaks give a three-point energy correlation function $e_3^{(\beta)}$. This results in better discrimination power. Also, adding trimmed observables allows the ANN to learn hard and soft substructures separately. Hence, the ANN resolves the degeneracy in the variables before trimming and rejects QCD jets better.
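The quoted comparison is a mistag rate at fixed signal efficiency. A minimal sketch of how such a working point can be read off from classifier scores is given below; the score lists are hypothetical toy values, and the function names are not taken from the analysis code.

```python
def mistag_rate_at_efficiency(higgs_scores, qcd_scores, efficiency):
    """Fraction of QCD jets passing the score cut that keeps `efficiency` of Higgs jets."""
    ranked = sorted(higgs_scores, reverse=True)
    n_keep = max(1, int(round(efficiency * len(ranked))))
    cut = ranked[n_keep - 1]  # lowest Higgs score that is still accepted
    passed = sum(1 for s in qcd_scores if s >= cut)
    return passed / len(qcd_scores)

def relative_reduction(mistag_new, mistag_ref):
    """Relative reduction of the mistag rate with respect to a reference tagger."""
    return 1.0 - mistag_new / mistag_ref

# Hypothetical toy scores for ten Higgs jets and ten QCD jets.
higgs = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
qcd = [0.85, 0.65, 0.45, 0.35, 0.25, 0.15, 0.12, 0.08, 0.05, 0.02]

mistag_40 = mistag_rate_at_efficiency(higgs, qcd, 0.4)
mistag_20 = mistag_rate_at_efficiency(higgs, qcd, 0.2)
```

A statement such as "reduced by 21.6%" then corresponds to `relative_reduction(mistag_new, mistag_ref)` evaluated at the same signal efficiency for the two networks.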
The network $N_{D_2+\mathrm{tr}}$ uses the minimal and simplest inputs; therefore, it is easy to check the response of the ANN to the input parameters $m_{\mathrm{jet}}$, $p_{T,\mathrm{jet}}$, and $D_2$. We show distributions of the events in the plane of the Higgs-like probability $p_h(Y_{D_2+\mathrm{tr}})$ and one of the inputs $\{x_i\}_{D_2+\mathrm{tr}}$ in Figure 4. The Higgs-like probability of $N_X$, $p_h(Y_X)$, is defined as the probability to get a score $y_X$ less than $Y_X$ for the Higgs jet samples,

$$p_h(Y_X) = \Pr(y_X < Y_X \mid \text{Higgs jet}), \tag{3.9}$$

where $\Pr(C|H)$ represents the conditional probability of $C$ given $H$. A large $p_h(Y_X)$ means that the given jet is more Higgs-like in $N_X$. Figure 4 shows that $N_{D_2+\mathrm{tr}}$ tries to select jets having $m_{\mathrm{jet}}$ around 125 GeV up to energy loss, and small $D_2$ for capturing two-prong jets. Note that $N_{D_2+\mathrm{tr}}$ is trained with QCD jets and jets from a boosted Higgs boson with $m_h = 125$ GeV. Higgs jets with $m_{\mathrm{jet}}$ (trimmed $m_{\mathrm{jet}}$) larger (smaller) than $m_h = 125$ GeV are likely to be categorized as QCD jets. These shifts of the mass from the input Higgs mass are due to contamination from other activity or to large-angle radiation from the $b$ jets. The distributions of QCD jets in Figure 4 show the definition of Higgs-like events more clearly. QCD jets with high $p_h(Y_{D_2+\mathrm{tr}})$ have small $D_{2,\mathrm{tr}}$, $m_{\mathrm{jet,tr}} \sim 115$ GeV, and $m_{\mathrm{jet}} \sim 125$ GeV. The events with $m_{\mathrm{jet}} \sim 130$ GeV and $p_h(Y_{D_2+\mathrm{tr}}) < 0.2$, and the events with $m_{\mathrm{jet,tr}} > 100$ GeV and $p_h(Y_{D_2+\mathrm{tr}}) < 0.3$, are clearly reduced and are instead distributed in the region of higher $p_h(Y_{D_2+\mathrm{tr}})$. Here $N_{D_2+\mathrm{tr}}$ rejects QCD jets in a fashion similar to a cut-based approach.
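The Higgs-like probability is an empirical cumulative distribution of the network score over the Higgs sample, which can be sketched as follows. The reference scores are hypothetical, and the function name is introduced here only for illustration.

```python
from bisect import bisect_left

def higgs_like_probability(score, higgs_scores_sorted):
    """Fraction of reference Higgs jets whose score is strictly below `score`."""
    return bisect_left(higgs_scores_sorted, score) / len(higgs_scores_sorted)

# Hypothetical network scores of a small Higgs reference sample, sorted ascending.
reference = sorted([0.2, 0.5, 0.7, 0.9])

p_mid = higgs_like_probability(0.8, reference)
p_low = higgs_like_probability(0.1, reference)
p_high = higgs_like_probability(0.95, reference)
```

Because the mapping is monotonic in the score, it puts different networks on a common scale: by construction, Higgs jets are distributed uniformly in $p_h$ for every network.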
We compare the Higgs-like probabilities in $N_{D_2+\mathrm{tr}}$ and $N_{S_2+\mathrm{tr}}$ in Figure 5 to see the origin of the improvement. The events are widely spread around the line $p_h(Y_{D_2+\mathrm{tr}}) = p_h(Y_{S_2+\mathrm{tr}})$, which means that the two analyses have different selection criteria. To quantify the residual anticorrelation of $N_{D_2+\mathrm{tr}}$ and $N_{S_2+\mathrm{tr}}$, we show the fractions of the events in the upper triangular region $p_h(Y_{D_2+\mathrm{tr}}) > p_h(Y_{S_2+\mathrm{tr}})$ and in the lower triangular region $p_h(Y_{D_2+\mathrm{tr}}) < p_h(Y_{S_2+\mathrm{tr}})$. For Higgs jets, the lower triangular region contains more events than the upper triangular region, 51.3% of the total events. For QCD jets, the lower triangular region contains fewer events, 43.1%. Hence, $N_{S_2+\mathrm{tr}}$ improves the signal-to-background ratio $S/B$ relative to $N_{D_2+\mathrm{tr}}$.
To understand how $N_{S_2+\mathrm{tr}}$ accepts more Higgs jets while rejecting more QCD jets compared to $N_{D_2+\mathrm{tr}}$, we show three examples of events located in the off-diagonal regions of Figure 5. Figure 6 shows a Higgs jet which is Higgs-like in $N_{S_2+\mathrm{tr}}$ but regarded as a QCD jet in $N_{D_2+\mathrm{tr}}$, with $p_h(Y_{D_2+\mathrm{tr}}) = 19.6\%$ and $p_h(Y_{S_2+\mathrm{tr}}) = 80.7\%$. This jet has a moderate wide-angle radiation on top of the two-prong substructure, which increases $D_2$ significantly. Recall that a Higgs jet originates from a color singlet particle, while a QCD jet originates from a colored parton. Such wide-angle radiation is more easily generated from a colored parton than from a color singlet particle. $N_{D_2+\mathrm{tr}}$ is distracted by the large $D_2$ and classifies this jet as a QCD jet even though the jet has a small trimmed $D_2$. $N_{S_2+\mathrm{tr}}$ must have determined the jet to be Higgs-like from the information on microscopic radiation patterns in $S_2(R)$, which shows a clear double-peak structure. Figure 7 shows the $D_2$ and $D_{2,\mathrm{tr}}$ distributions in events having $p_h(Y_{D_2+\mathrm{tr}}) < 30\%$ and $p_h(Y_{S_2+\mathrm{tr}}) > 70\%$. We can see that some, but not all, events with large $D_2$ but small $D_{2,\mathrm{tr}}$ fall into this region.
The second example, in Figure 8, is an event classified as a Higgs jet in $N_{D_2+\mathrm{tr}}$ but categorized as a QCD jet in $N_{S_2+\mathrm{tr}}$. This jet has an evident two-prong substructure, and hence $N_{D_2+\mathrm{tr}}$ classifies it as a Higgs jet. However, the two subjets are asymmetric in $p_T$. Such events appear frequently in QCD jet samples. We did not give the subjet momenta to $N_{D_2+\mathrm{tr}}$, and so the ANN classifies the jet as a Higgs jet. In contrast, $S_2(R)$ is sensitive to the $p_T$ asymmetry through the comparison of the peak intensities; see Eq. 2.5. Hence, $N_{S_2+\mathrm{tr}}$ avoids these $p_T$-asymmetric events, which are populated by QCD jets; it finds the cut on the subjet $p_T$ from the training samples. In the mass-drop tagger [1], events with $p_T$-asymmetric subjets are removed by hand.
The third example, in Figure 9, is a case where only $N_{S_2+\mathrm{tr}}$, which takes the trimmed $S_2(R)$ into account, classifies the jet as a Higgs jet. This jet has a two-prong substructure, but it is deeply buried in radiation compared to Figure 8. As a result, $D_2$ is large, and $S_2(R)$ is similar to that of a QCD jet, with a falling structure toward high angular scale $R$. Trimming helps $N_{S_2+\mathrm{tr}}$ this time because $N_{S_2+\mathrm{tr}}$ recognizes hard and soft substructures separately by comparing $S_2(R)$ and $S_{2,\mathrm{tr}}(R)$. Trimming does not alter the tail of the $S_2(R)$ distribution, which means that the structure at large $R$ is hard. This should be compared with $S_2(R)$ and $S_{2,\mathrm{tr}}(R)$ of typical QCD jets in Figure 2.

Discussion and Conclusion
In this paper, we have introduced a spectral analysis of jet substructure with an artificial neural network (ANN). Unlike other ANN approaches, our algorithm uses the spectral function $S_2(R)$ constructed from $p_T$ and $R$ of the pairs of particles in the jet. The $S_2(R)$ spectrum is useful for describing substructures with large angular separation using a relatively small number of inputs. An ANN can learn non-local correlations in jets from the spectrum. To show this, we have constructed an ANN from $S_2(R)$, $N_{S_2}$, and compared it with an ANN from $D_2$, $N_{D_2}$. We have shown that $N_{S_2}$ discriminates between boosted Higgs jets and QCD jets with better performance than $N_{D_2}$. Introducing trimming to $S_2(R)$ further helps to separate hard and soft substructures, and the ANN with trimmed observables outperforms the ANN without trimming. The improvement comes from the better handling of cases with radiation from a $b$ parton or with contamination from other hadronic activity.
The improvement we observe is not large, because $D_2$ captures the two-prong substructure of the Higgs jet efficiently, but the ANN analysis with $S_2(R)$ has a much wider range of application. One of the merits of $N_{S_2}$ and $N_{S_2+\mathrm{tr}}$ is that the analyses automatically take care of radiation from the $b$ jet. Note that the existence of radiation has to be taken care of even in the original mass-drop tagger of [1]. $S_2(R)$ carries information on three-point and higher correlations simultaneously, and additional selections are not required. Consequently, $S_2(R)$ can be used for a cascade decay of a heavy particle, especially the top quark. We also note that $S_2(R)$ is sensitive to the color of the boosted heavy particle. We will show in a separate publication that a modified $N_{S_2}$ discriminates between a color octet resonance and a color singlet resonance efficiently [54].

Figure 1. Molecular structure of ethanol and its $^1$H-NMR spectrum. The intensity, location, and splitting of the peaks allow us to identify the original molecular structure. The chemical shift is the resonant frequency of hydrogen relative to a reference frequency.

Figure 2. The pixelated jet image (left), two-point spectrum $S_2(R; 0.1)$ (center), and the trimmed spectrum $S_{2,\mathrm{tr}}(R; 0.1)$ (right) of a typical Higgs jet (top) and a typical quark jet (bottom). The sum of the bins of each spectrum is normalized to 1. The red reversed triangles, blue triangles, and green plus symbols in the pixelated jet images indicate the directions of light quarks, $b$ quarks, and gluons obtained from the matrix element including order $\alpha_S$ radiation. We show the Higgs-like probability $p_h(Y_X)$ in an ANN model $N_X$ defined by Eq. 3.9.

Figure 4. The distributions of Higgs jets and QCD jets in the Higgs-like probability $p_h(Y_{D_2+\mathrm{tr}})$ and the inputs in $\{x_i\}_{D_2+\mathrm{tr}}$. For each pair of figures, the left is the distribution of Higgs jets and the right is the distribution of QCD jets.

Figure 6. The pixelated jet image (left) and two-point spectrum $S_2(R; 0.1)$ (right) of a Higgs jet. The event is Higgs-like in the neural networks with $S_2(R)$, $N_{S_2}$ and $N_{S_2+\mathrm{tr}}$, but it is not in those with $D_2$, $N_{D_2}$ and $N_{D_2+\mathrm{tr}}$. The sum of the bins of the spectrum is normalized to 1. The blue triangles and green plus symbols in the pixelated jet images indicate the directions of $b$ quarks and gluons obtained from the matrix element including order $\alpha_S$ radiation. We show the Higgs-like probability $p_h(Y_X)$ in an ANN model $N_X$ defined by Eq. 3.9.

Figure 8. The pixelated jet image (left) and two-point spectrum $S_2(R; 0.1)$ (right) of a Higgs jet. The event is Higgs-like in the neural networks with $D_2$, $N_{D_2}$ and $N_{D_2+\mathrm{tr}}$, but it is not in those with $S_2(R)$, $N_{S_2}$ and $N_{S_2+\mathrm{tr}}$. The sum of the bins of the spectrum is normalized to 1. The red triangles and green plus symbols in the pixelated jet images indicate the directions of light quarks and gluons obtained from the matrix element including order $\alpha_S$ radiation. We show the Higgs-like probability $p_h(Y_X)$ in an ANN model $N_X$ defined by Eq. 3.9.

Figure 9. The pixelated jet image (left), two-point spectrum $S_2(R; 0.1)$ (center), and trimmed spectrum $S_{2,\mathrm{tr}}(R; 0.1)$ (right) of a Higgs jet which is Higgs-like only in $N_{S_2+\mathrm{tr}}$. The sum of the bins of each spectrum is normalized to 1. The blue triangles and green plus symbols in the pixelated jet images indicate the directions of $b$ quarks and gluons obtained from the matrix element including order $\alpha_S$ radiation. We show the Higgs-like probability $p_h(Y_X)$ in an ANN model $N_X$ defined by Eq. 3.9.
$$\{x_i\}_{S_2} = \{p_{T,\mathrm{jet}},\ m_{\mathrm{jet}},\ S_2(0; 0.1),\ \cdots,\ S_2(1.9; 0.1)\}. \tag{3.1}$$

Note that $R = 2$ is the diameter of our jet definition. All the input data $\{x_i\}$ are standardized, i.e., $x_i \to (x_i - \bar{x}_i)/\sigma(x_i)$, where $\bar{x}_i$ and $\sigma(x_i)$ are the mean and the standard deviation of $x_i$ over the whole training sample, including both Higgs jets and QCD jets. The network is configured with four hidden layers having (400, 300, 200, 100) ReLU nodes and an output layer with two softmax nodes giving a Higgs-like score $y_{S_2}$. To avoid overtraining, we insert dropout layers with rate 20% between each hidden layer.
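The standardization of the inputs can be sketched in plain Python as below. The sketch assumes that each row holds the inputs of one jet and that the mean and the (population) standard deviation are computed over the full training sample; the numbers are hypothetical, and in practice the same means and standard deviations would be reused for the testing sample.

```python
import math

def standardize_columns(rows):
    """Map each column x_i -> (x_i - mean_i) / std_i using the statistics of `rows`."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(n_cols)]
    stds = [math.sqrt(sum((r[c] - means[c]) ** 2 for r in rows) / n)
            for c in range(n_cols)]
    scaled = [[(r[c] - means[c]) / stds[c] if stds[c] > 0.0 else 0.0
               for c in range(n_cols)] for r in rows]
    return scaled, means, stds

# Hypothetical training rows: (pT_jet in GeV, m_jet in GeV) for three jets.
training = [[300.0, 110.0], [350.0, 125.0], [400.0, 140.0]]
scaled, means, stds = standardize_columns(training)
```

After this transformation every input column has zero mean and unit variance over the training sample, so the jet kinematics and the spectrum bins enter the network on a comparable scale.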