Unbinned Deep Learning Jet Substructure Measurement in High $Q^2$ ep collisions at HERA

The radiation pattern within high energy quark- and gluon-initiated jets (jet substructure) is used extensively as a precision probe of the strong force as well as an environment for optimizing event generators, with numerous applications in high energy particle and nuclear physics. Electron-proton collisions are of particular interest, as many of the complications present at hadron colliders are absent. A detailed study of modern jet substructure observables, the jet angularities, in electron-proton collisions is presented using data recorded with the H1 detector at HERA. The measurement is unbinned and multi-dimensional, using machine learning to correct for detector effects. All of the available reconstructed object information of the respective jets is interpreted by a graph neural network, achieving superior precision on a selected set of jet angularities. Training these networks was enabled by the use of a large number of GPUs in the Perlmutter supercomputer at Berkeley Lab. The particle jets are reconstructed in the laboratory frame, using the $k_{\mathrm{T}}$ jet clustering algorithm. Results are reported at high squared momentum transfer $Q^2>150$ GeV${}^2$ and inelasticity $0.2<y<0.7$. The analysis is also performed in sub-regions of $Q^2$, thus probing scale dependencies of the substructure variables. The data are compared with a variety of predictions and point towards possible improvements of such models.

Interactions between quarks and gluons (partons) are described by the theory of Quantum Chromodynamics (QCD) [1]. At high energy particle colliders, outgoing partons produce collimated sprays of particles known as jets. The radiation pattern inside jets (jet substructure) provides insight into the emergent properties of QCD at high energies. Electron-proton collisions are ideal for studies of strong interaction processes: they are induced by an intermediate electroweak gauge boson and provide clean initial conditions. This is in contrast to hadron colliders, where the subprocesses must be studied in the complex environment of strong interactions, including underlying-event effects. The HERA accelerator facility was operated in the years 1992 to 2007, colliding electrons and positrons with protons. Jet properties were extensively studied at HERA during data taking [2][3][4][5], but all of these studies pre-date modern jet substructure techniques [6][7][8][9][10]. With new theoretical and methodological advances, novel insight can be extracted from the preserved HERA data [11,12] for precision QCD studies as well as for event generator improvements.
A canonical set of observables used to explore different aspects of the jet radiation pattern are the generalized angularities [13]:

$$\lambda^\kappa_\beta = \sum_{i \in \mathrm{jet}} z_i^\kappa \left( \frac{R_i}{R_0} \right)^\beta, \qquad (1)$$

with $z_i = p_{\mathrm{T},i}/p_\mathrm{T}^\mathrm{jet}$ for a particle with momentum $p_{\mathrm{T},i}$ transverse to the incoming beams, clustered inside a jet with distance parameter $R_0$ and transverse momentum $p_\mathrm{T}^\mathrm{jet}$. The variable $R_i$ is the Euclidean distance between particle $i$ and the jet axis in the pseudorapidity-azimuthal angle plane. These observables can be further augmented by multiplying the summand in Eq. (1) with the constituent electric charge $q_i$ to form the charge-weighted angularities $\tilde{\lambda}^\kappa_\beta$. In this work, normalized multi-differential cross sections are measured as a function of six jet angularities. The jets are selected at high transverse momenta in neutral current deep inelastic scattering (DIS) at high photon virtualities $Q^2$, resulting in single-jet events for the majority of the collisions.
The angularities include three infrared and collinear (IRC) safe observables: the jet broadening ($\lambda^1_1$) [14][15][16], an intermediate observable $\lambda^1_{1.5}$, and the jet thrust ($\lambda^1_2$) [17]. Three IRC-unsafe variables are also studied: the momentum dispersion [18][19][20], the jet charge $Q_1 = \tilde{\lambda}^1_0$ [21,22], and the number of charged constituents $N_c = \lambda^0_0$. All observables measured in this work are summarized in Table 1. These observables are chosen because they have been extensively studied in $e^+e^-$ and $pp$ collisions and they cover a diverse set of physics effects. The momentum dispersion is known to be one of the best probes for separating jets originating from quarks versus jets originating from gluons [23]. The charged particle multiplicity is also an excellent quark-versus-gluon jet discriminant [24,25], and its evolution with energy scale has been measured at nearly all colliders that have studied jets, including the SPS [26][27][28], PETRA [29,30], PEP [31][32][33][34], and others [35-56]. The charged constituent multiplicity is also an important input to event generator tuning [57]. The jet charge can be used to disentangle quark jets of different flavors [58,59]. Even though it is not collinear-safe, its energy scale dependence can be predicted in perturbation theory [22,60] and can be used to search for scaling violations [58]. The IRC safe observables can be computed in perturbation theory both at a fixed energy scale and as a function of energy scale. The thrust is formally equivalent to the jet mass, which is the most precisely known observable in $e^+e^-$ and $pp$ collisions [61][62][63][64][65][66] and may be used to extract the strong coupling constant [67]. All IRC safe observables are presented here after a logarithmic transformation, highlighting the region dominated by non-perturbative effects. Results are also reported as a function of the energy scale set by the DIS photon virtuality, thus probing the evolution of jet substructure.
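As an illustration, the angularities of Eq. (1), including the charge-weighted variant, can be evaluated directly from the constituent kinematics. The following Python sketch is ours (function name and input conventions are illustrative, not taken from the analysis code):

```python
import numpy as np

def angularity(pt, deta, dphi, kappa, beta, q=None, R0=1.0):
    """Generalized angularity lambda^kappa_beta of Eq. (1).

    pt, deta, dphi: per-constituent p_T and distances to the jet axis;
    q: optional electric charges for the charge-weighted variant.
    """
    z = pt / pt.sum()                    # momentum fractions z_i
    R = np.hypot(deta, dphi)             # Euclidean distance in (eta, phi)
    summand = z**kappa * (R / R0)**beta
    if q is not None:                    # charge-weighted angularity
        summand = q * summand
    return summand.sum()
```

With $\kappa=\beta=0$ the sum simply counts the constituents ($N_c$), while $\kappa=\beta=1$ gives the jet broadening, and supplying charges with $\kappa=1$, $\beta=0$ gives the jet charge.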
New machine learning methods are used to simultaneously correct (unfold) all observables for detector effects; Graph Neural Networks (GNNs) are used for the first time to process all of the reconstructed particles inside jets. Exploiting the unbinned nature of the unfolded data, both the mean and the standard deviation of the measured distributions are provided in multiple $Q^2$ intervals. They are reported after unfolding and are free of binning effects.
These first measurements, derived from the preserved HERA data, lay the foundation for future studies at the Electron-Ion Collider or other possible electron-proton collider experiments [68][69][70][71], all of which will operate in kinematic regimes complementary to HERA.

Experimental method
A full description of the H1 detector can be found elsewhere [72][73][74][75][76]. The detector components most relevant for this measurement are described below. Results are reported using a right-handed coordinate system with the positive $z$ direction pointing towards the outgoing proton beam and the positive $x$-axis pointing to the center of the HERA ring. The nominal interaction point is located at $z = 0$. The polar angle $\theta$ is defined with respect to the $z$ axis; the azimuthal angle $\phi$ is measured in the $xy$ plane. The pseudorapidity is defined as $\eta_{\mathrm{lab}} = -\ln \tan(\theta/2)$. The main sub-detectors used in this analysis are the inner tracking detectors and the Liquid Argon (LAr) calorimeter, both immersed in a magnetic field of 1.16 T provided by a superconducting solenoid. The central tracking system, which covers $15^\circ < \theta < 165^\circ$ and the full azimuthal angle, consists of drift and proportional chambers, complemented with a silicon vertex detector in the range $30^\circ < \theta < 150^\circ$ [77]. It yields a transverse momentum resolution for charged particles of $\sigma_{p_T}/p_T = 0.2\%\, p_T/\mathrm{GeV} \oplus 1.5\%$. The LAr calorimeter, which covers $4^\circ < \theta < 154^\circ$ and the full azimuthal angle, consists of an electromagnetic section made of lead absorbers and a hadronic section with steel absorbers; both are highly segmented in the transverse and longitudinal directions. Its energy resolution is $\sigma_E/E = 11\%/\sqrt{E/\mathrm{GeV}} \oplus 1\%$ for leptons [78] and $\sigma_E/E \approx 50\%/\sqrt{E/\mathrm{GeV}} \oplus 3\%$ for charged pions [79]. In the backward region ($153^\circ < \theta < 177.5^\circ$), energies are measured with a lead-scintillating fibre calorimeter [76]. Results are reported using data recorded by the H1 detector in the years 2006 and 2007, when protons at 920 GeV were collided with electrons or positrons (henceforth referred to as 'electrons') at 27.6 GeV. The total integrated luminosity of this data sample corresponds to 228 pb$^{-1}$ [80].
Events are triggered by requiring a high energy cluster in the electromagnetic part of the LAr calorimeter. The scattered electron is identified as the highest transverse momentum LAr cluster matched to a track passing an isolation criterion [81]. Events containing scattered electrons with energy $E_{e'} > 11$ GeV are kept for further analysis, resulting in a trigger efficiency higher than 99.5% [82,83]. Backgrounds from other processes, such as cosmic rays, beam-gas interactions, photoproduction, charged-current DIS and QED Compton processes, are rejected by dedicated selection criteria [83,84], resulting in negligible background contamination.
The inelasticity and photon virtuality are reconstructed using the $\Sigma$ method [85], defined as

$$y_\Sigma = \frac{\Sigma_{\mathrm{had}}}{\Sigma_{\mathrm{had}} + \Sigma_{e'}}, \qquad Q^2_\Sigma = \frac{(E_{e'} \sin\theta_{e'})^2}{1 - y_\Sigma},$$

where $\Sigma_{e'} = E_{e'}(1 - \cos\theta_{e'})$, with $\theta_{e'}$ the polar angle of the scattered electron, and $\Sigma_{\mathrm{had}} = \sum_i (E_i - p_{z,i})$ is the sum of the difference between energy and longitudinal momentum over the entire hadronic final state (HFS). HFS objects are reconstructed using an energy flow algorithm [86][87][88] after removing energy clusters and tracks associated with the electron. Additionally, events are required to have $45 < \Sigma_{\mathrm{had}} + \Sigma_{e'} < 65$ GeV to suppress initial-state QED radiation and contributions from photoproduction. Jets are defined in the laboratory frame by clustering HFS objects satisfying $-1.5 < \eta_{\mathrm{lab}} < 2.75$. The FastJet 3.3.2 package [89,90] is used with the longitudinally invariant $k_\mathrm{T}$ clustering algorithm [91,92], using the default E-scheme and distance parameter $R_0 = 1$. All jets in the event with $p_\mathrm{T} > 5$ GeV are kept for further analysis.
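For concreteness, the $\Sigma$-method reconstruction above can be written in a few lines of Python. This is a hedged sketch; variable names are illustrative and not taken from the H1 software:

```python
import numpy as np

def sigma_method(E_e, theta_e, E_had, pz_had):
    """Reconstruct inelasticity y and virtuality Q^2 with the Sigma method.

    E_e, theta_e: energy and polar angle of the scattered electron;
    E_had, pz_had: arrays of HFS object energies and longitudinal momenta.
    """
    sigma_e = E_e * (1.0 - np.cos(theta_e))   # Sigma_e'
    sigma_had = np.sum(E_had - pz_had)        # Sigma_had
    y = sigma_had / (sigma_had + sigma_e)
    Q2 = (E_e * np.sin(theta_e))**2 / (1.0 - y)
    return y, Q2
```

Using $\Sigma_{\mathrm{had}}$ rather than the electron alone makes the reconstruction less sensitive to initial-state QED radiation, which is why the event-level $45 < \Sigma_{\mathrm{had}} + \Sigma_{e'} < 65$ GeV requirement complements it naturally.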
The unfolding targets ('particle level') are calculated in the simulation using final-state particles with proper lifetime $c\tau > 10$ mm, excluding the scattered lepton and photons radiated from one of the lepton lines. Reconstructed and generator-level jets are matched by requiring the distance $\Delta R = \sqrt{(\phi^{\mathrm{jet}}_{\mathrm{gen}} - \phi^{\mathrm{jet}}_{\mathrm{reco}})^2 + (\eta^{\mathrm{jet}}_{\mathrm{gen}} - \eta^{\mathrm{jet}}_{\mathrm{reco}})^2} < 0.9$.
Results are presented after unfolding the data to particle level for events in the kinematic region defined by $Q^2 > 150$ GeV$^2$, $0.2 < y < 0.7$, $p_\mathrm{T}^\mathrm{jet} > 10$ GeV, and $-1.0 < \eta^\mathrm{jet} < 2.5$.

Monte Carlo simulations
Monte Carlo (MC) simulations are used to correct the data for detector acceptance and resolution effects as well as to compare theoretical predictions with unfolded results.
Detector acceptance and resolution effects are estimated using large samples of simulated events, generated with the Djangoh 1.4 [93] and Rapgap 3.1 [94] event generators. Both implement Born-level matrix elements and are interfaced with Heracles [95][96][97] for QED radiation. The CTEQ6L parton distribution function (PDF) set [98] and the Lund hadronization model [99], with parameters determined by the ALEPH Collaboration [100], are used for the non-perturbative components.
Djangoh uses the Colour Dipole Model as implemented in Ariadne [101] for higher order emissions, and Rapgap uses parton showers in the leading logarithmic approximation. Each of these generators is combined with a detailed simulation of the H1 detector response based on the Geant3 simulation program [102] and reconstructed in the same way as data.
Additional predictions, albeit without a detailed simulation of the H1 detector, are made using a set of state-of-the-art generators developed mostly for pp collisions. Predictions from Pythia 8.3 [103,104] are used for comparison with the default parton shower implementation and two alternatives: Vincia [105,106] and Dire [107]. Vincia uses a $p_\mathrm{T}$-ordered model for QCD + QED showers based on the antenna formalism, while Dire implements a $p_\mathrm{T}$-ordered dipole shower similar to Ariadne. The NNPDF3.1 PDF set [108] is used for the default and Vincia implementations, and the MMHT14nlo68cl PDF set [109] for the Dire implementation. Predictions from Herwig 7.2 [110,111] are calculated using the cluster hadronisation model [112] with default parameters and with alternative MatchBox [113] matching and merging [112] schemes. Predictions from a pre-release version of Sherpa 3.0 [114] are provided by the Sherpa authors, featuring a new cluster hadronisation model [115] and matrix element calculations at next-to-leading order (NLO) obtained from OpenLoops with the Sherpa dipole shower [116], based on the truncated shower method [117,118].

Unfolding methodology
The unfolding procedure is carried out by simultaneously unfolding the six jet angularities, jet momentum (p T , η, ϕ), and photon virtuality Q 2 using the OmniFold method [119,120]. OmniFold corrects the data for detector effects using machine learning in an iterative reweighting process that is unbinned and naturally incorporates high dimensional inputs. This method generalizes the Lucy-Richardson deconvolution technique [121][122][123] designed for binned spectra.
At each iteration, OmniFold employs a set of classifiers trained using neural networks to estimate two reweighting functions used during the unfolding procedure. Each iteration has two steps, with one classifier each. In the first step, a classifier is trained to distinguish data from simulated events, resulting in event weights based on the classifier output. The second step updates the simulation by training a classifier to distinguish between the original simulation and the simulation weighted after the first step. This second step only uses the particle-level information of simulated events as inputs, producing a weight map that is a proper function of the particle level inputs. The updated simulation is then used as the starting point for the following OmniFold iteration. The process is repeated a total of 13 times; six iterations are chosen for the final results as that choice yields the lowest overall uncertainty.
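A minimal one-dimensional sketch of this two-step iteration is given below. For readability, the neural-network classifiers are replaced by a binned density-ratio estimate, which plays the same role of approximating a likelihood ratio; function names and the toy setup are ours, and the actual analysis operates unbinned on the full feature set.

```python
import numpy as np

def density_ratio(num, den, w_num, w_den, bins):
    """Binned estimate of p_num/p_den, evaluated at the points 'den'.
    Stands in for the classifier-based likelihood ratio."""
    h_num, _ = np.histogram(num, bins=bins, weights=w_num, density=True)
    h_den, _ = np.histogram(den, bins=bins, weights=w_den, density=True)
    r = np.divide(h_num, h_den, out=np.ones_like(h_num), where=h_den > 0)
    idx = np.clip(np.digitize(den, bins) - 1, 0, len(r) - 1)
    return r[idx]

def omnifold(data_reco, sim_reco, sim_gen, n_iter=4,
             bins=np.linspace(-4.0, 4.5, 60)):
    """Iterative two-step OmniFold-style reweighting (1D toy)."""
    w = np.ones_like(sim_gen)
    for _ in range(n_iter):
        # Step 1: reweight the simulation to match data at detector level.
        w1 = w * density_ratio(data_reco, sim_reco,
                               np.ones_like(data_reco), w, bins)
        # Step 2: express the new weights as a function of gen-level only.
        w = w * density_ratio(sim_gen, sim_gen, w1, w, bins)
    return w
```

In a toy with Gaussian truth and Gaussian smearing, the returned weights pull the simulated particle-level distribution toward the (unknown) truth underlying the data, illustrating the convergence behavior that the iteration count controls.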
The first step uses up to 30 HFS objects clustered inside jets as inputs to a GNN, where HFS objects are represented as nodes of the graph. This novel hybrid approach has been found to reduce uncertainties by accounting for all possible covariates of the detector response. For each HFS object with momentum $(p_\mathrm{T}, \eta, \phi)$ and electric charge $q$, the kinematic information used for the first-step classifier is the set $(z, \Delta\eta, \Delta\phi, q)$, with $z = p_\mathrm{T}/p_\mathrm{T}^\mathrm{jet}$, $\Delta\eta = \eta - \eta^\mathrm{jet}$, and $\Delta\phi = \phi - \phi^\mathrm{jet}$. The jet momentum $(p_\mathrm{T}^\mathrm{jet}, \eta^\mathrm{jet}, \phi^\mathrm{jet})$ and $Q^2$ information are also included in the aggregation function of the graph implementation. In total, $30 \times 4 + 4 = 124$ features are considered during this first step, requiring the network implementation to learn the differences between data and simulation in a high-dimensional setting. Leveraging the success of GNNs for jet classification [124-135] based on particle sets as inputs, the Point Cloud Transformer (PCT) [132,136] GNN architecture is chosen. This architecture is optimized to learn the relationship between nearby particles through the use of attention layers, resulting in state-of-the-art classification performance for different types of jets [132]. The local neighborhood in PCT is defined by selecting the five closest neighbors of each particle, with distances calculated in the $\eta$-$\phi$ plane. The impact of adding more than 30 particles was found to be negligible.
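The per-constituent inputs $(z, \Delta\eta, \Delta\phi, q)$ can be assembled as follows; the azimuthal difference needs wrapping into $[-\pi, \pi)$. This is an illustrative sketch with names of our own choosing, not code from the analysis:

```python
import numpy as np

def constituent_features(pt, eta, phi, q, jet_pt, jet_eta, jet_phi):
    """Build the (z, d_eta, d_phi, q) node features for one jet."""
    z = pt / jet_pt
    deta = eta - jet_eta
    # wrap the azimuthal difference into [-pi, pi)
    dphi = (phi - jet_phi + np.pi) % (2.0 * np.pi) - np.pi
    return np.stack([z, deta, dphi, q], axis=-1)
```

The resulting array of shape (n_constituents, 4) forms the node features of the graph, with the jet four-momentum and $Q^2$ supplied separately to the aggregation function as described above.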
In the second step, a simplified classifier architecture is used. The inputs of the classifier are the same jet kinematic information and $Q^2$ used in the previous step, but the HFS objects are replaced by the jet observables that are reported in the final results. This choice decreases the number of inputs by an order of magnitude and reduces the training time of the classifier by a factor of five. Since the first step is tasked with incorporating as much information as possible from the data, the simplified classifier adopted for the second step does not degrade the unfolding results while greatly reducing the computational cost of the method. This simplified classifier is implemented using a fully-connected neural network with three hidden layers of 64, 128, and 64 nodes. Different hyperparameter choices for the neural networks were also tested, but no significant difference in the final results was observed. The robustness of the classifiers is improved by training an ensemble of 10 network models for each OmniFold step. The average response of the ensemble is then used to derive the reweighting functions.
The output of each classifier is passed through a sigmoid activation function and trained using the binary cross-entropy loss function. The training proceeds until the validation loss, estimated with a statistically independent data set, does not improve for 10 consecutive epochs. All machine learning methods are implemented using the Keras [137] and TensorFlow [138] libraries with the Adam [139] optimization algorithm. In total, around $50 \times 10^6$ jets, split between data and simulation, are used during the training of the method. To derive the unfolded results and uncertainties, 2800 neural networks are trained independently using 128 graphics processing units (GPUs) simultaneously on the Perlmutter supercomputer [140], with the Horovod [141] library used for parallel distributed training.
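The event weights derived from a classifier output are commonly obtained through the likelihood-ratio trick, $w(x) = f(x)/(1 - f(x))$ for a sigmoid output $f(x)$. The paper does not spell this formula out, so the sketch below is our assumption of the standard OmniFold prescription, including the averaging of the ensemble responses before the weights are formed:

```python
import numpy as np

def weights_from_classifier(p, eps=1e-6):
    """Likelihood-ratio trick: sigmoid output p -> event weight p/(1-p)."""
    p = np.clip(p, eps, 1.0 - eps)   # guard against divergent weights
    return p / (1.0 - p)

def ensemble_weights(responses, eps=1e-6):
    """Average the responses of an ensemble of models, then form weights."""
    return weights_from_classifier(np.mean(responses, axis=0), eps)
```

A perfectly confused classifier ($p = 0.5$) yields a weight of one, i.e. no correction, while deviations from 0.5 encode the measured data/simulation differences.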

Uncertainties
Systematic uncertainties on the reconstruction of the data observables are estimated by varying the relevant parameters in the simulation and carrying out the full unfolding procedure for each simulation variant. Uncertainties on HFS objects include the energy scale from two different contributions: HFS objects contained in high-$p_\mathrm{T}$ jets and all other HFS objects. In both cases the energy-scale uncertainty is ±1%, and the two sources are estimated separately [83,142] by varying the corresponding HFS energy by ±1%. An uncertainty of ±20 mrad is assigned to the azimuthal angle determination of HFS objects. Lepton uncertainties are considered in the $Q^2$ determination: the uncertainty on the lepton energy scale ranges from ±0.5% to ±1% [83,143], and the uncertainty on the azimuthal angle of the scattered lepton is estimated to be ±1 mrad [82]. In Appendix A, distributions at reconstruction level and the associated uncertainties are compared with predictions from the Djangoh and Rapgap generators.
Additional uncertainties from the unfolding procedure are estimated to cover a possible bias from the choice of generator used to perform the unfolding. A model uncertainty is estimated by unfolding the data with Djangoh instead of the default Rapgap. In addition, a non-closure uncertainty is estimated by unfolding Djangoh as pseudo-data using the Rapgap simulation and comparing to Djangoh at particle level. The combined effects of the unfolding uncertainties are typically below 10% and increase only in the more extreme regions of the jet angularities. QED corrections accounting for virtual and real higher-order QED effects, as implemented in Heracles, are small, and their full effect is taken as an uncertainty, estimated by comparing predictions with and without first-order QED radiative effects at the leptonic vertex. The statistical uncertainty is estimated using the bootstrap technique [144]: the unfolding procedure is repeated on 100 pseudo-datasets, each defined by resampling the original dataset with replacement. Normalization uncertainties, such as the luminosity scale and trigger efficiencies, are not considered since they cancel in the normalized differential cross section results.
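The bootstrap estimate of a statistical uncertainty can be sketched as follows. This is a generic illustration of the resampling idea; in the analysis the resampling is applied to the full unfolding chain, not to a single statistic:

```python
import numpy as np

def bootstrap_uncertainty(sample, statistic, n_boot=100, seed=0):
    """Statistical uncertainty of 'statistic' via resampling with replacement."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    replicas = [statistic(sample[rng.integers(0, n, size=n)])
                for _ in range(n_boot)]
    # spread of the statistic across bootstrap replicas
    return np.std(replicas, ddof=1)
```

For the mean of $n$ independent values with spread $\sigma$, the bootstrap spread reproduces the familiar $\sigma/\sqrt{n}$, which is a convenient sanity check of the implementation.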

Results
The results of this analysis are unbinned normalized cross sections as a function of $Q^2$ and the generalized jet angularities. Unfolded results for the normalized differential particle-level cross sections of the jet substructure observables are presented in Fig. 1 in the kinematic region described in Sec. 1. In order to visualize the unfolded distributions, bins have been introduced based on the respective detector resolution. All distributions are observed to accumulate around a distinct peak. The mean and standard deviation are calculated from the unbinned, unfolded event sample and are shown for the six jet angularities in four $Q^2$ ranges in Figs. 2 and 3, respectively, with numerical results reported in Tables 2 and 3. Distributions of the angularities in the four $Q^2$ ranges are shown in Appendix B.
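Because the unfolded sample is unbinned, the quoted moments follow directly from the per-jet unfolding weights. A minimal sketch (our own helper names, with an illustrative $Q^2$ selection):

```python
import numpy as np

def weighted_moments(x, w):
    """Weighted mean and standard deviation of an unbinned sample."""
    mean = np.average(x, weights=w)
    var = np.average((x - mean)**2, weights=w)
    return mean, np.sqrt(var)

def moments_in_q2_range(obs, q2, w, q2_lo, q2_hi):
    """Moments of an observable restricted to one Q^2 interval."""
    mask = (q2 >= q2_lo) & (q2 < q2_hi)
    return weighted_moments(obs[mask], w[mask])
```

Computing the moments this way, rather than from histograms, is what makes the reported means and standard deviations free of binning effects.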
Predictions by the DIS MC generators Djangoh and Rapgap, and by the general purpose generators Sherpa, Herwig, and Pythia, show good agreement with data for most observables. For Herwig, the largest differences to data are observed in the charged hadron multiplicity, where it predicts a higher number of particles than observed. In contrast, Pythia underpredicts the charged particle multiplicity. Sherpa shows good agreement in all IRC-safe observables and is able to describe the data at low observable values, where contributions from soft radiation are dominant and challenging to describe, as evidenced by the worse agreement of Pythia and Herwig in this region.
A larger selection of predictions is used to study the means and standard deviations as a function of $Q^2$ in Figs. 2 and 3. Rapgap shows good agreement with data in all distributions and $Q^2$ intervals, while Djangoh shows worse agreement with data at high $Q^2$ values. Of the three Pythia variants, which differ in the parton shower model, the Dire parton shower provides the best overall description of the data. The Herwig variants, which probe NLO matrix elements and details of the matching procedure, are all in overall good agreement with data; large deviations are observed only in the charged hadron multiplicity, which has a high sensitivity to the modeling of the hadronization process. Predictions from Sherpa show better agreement with data at lower $Q^2$ values and larger discrepancies in the highest $Q^2$ interval. Sherpa is also able to provide a good description of the standard deviation of the IRC-safe jet angularities, together with the dedicated DIS generators and the Pythia implementation with the Dire parton shower. In contrast, all other implementations predict a larger standard deviation than observed.

Conclusion
A first measurement of jet angularities in neutral current DIS events with $Q^2 > 150$ GeV$^2$ and $0.2 < y < 0.7$, as well as in four $Q^2$ intervals, is presented. The distributions, as well as the mean and standard deviation of these jet substructure observables, are sensitive to perturbative QCD effects and to the hadronization models used to describe jet formation.
The correction for detector effects makes use of novel machine learning methods. All measured distributions are simultaneously unfolded using the OmniFold approach. Kinematic information from each reconstructed particle clustered inside a jet is input to a dedicated Graph Neural Network implementation that learns the correlations between these particles. At particle level, however, the jet substructure is represented by the generalized angularities alone. The success of this approach demonstrates that different sets of unbinned observables at detector level and particle level can be used to determine the unfolded distributions. The training of over two thousand neural networks for the unfolding procedure is carried out on the Perlmutter supercomputer using 128 GPUs simultaneously.
While the unfolding procedure is unbinned, results are provided as histograms for both single- and multi-differential cross sections to ease the comparison with different theory predictions. The mean and standard deviation of all observables are calculated after unfolding in multiple $Q^2$ ranges. Theory predictions from dedicated DIS simulators provide an overall good description of all measured quantities. In particular, good agreement between data and the Rapgap simulation is observed for all distributions and $Q^2$ intervals. General purpose simulators are also able to describe the data well. Pythia interfaced with the Dire parton shower, Herwig, and Sherpa all show good agreement with most of the jet angularities studied. The largest differences to data are observed for the charged hadron multiplicity, which is expected to be most sensitive to non-perturbative effects. The H1 data can be used to further improve the precision of event generators and analytic calculations. The $Q^2$ dependence is measured and can be compared to scale evolution predictions. The measurements can serve as a complementary guide for jet substructure studies at the future Electron-Ion Collider, which is going to record much larger data samples in a similar kinematic region. The results presented in this work make use of all detector objects during the unfolding procedure, learning the relationship between reconstructed objects with neural networks. This points to a possible more general algorithm for data unfolding, in which all particles or jets in a collision could be made available after unfolding. This would greatly simplify future comparisons to predictions, in particular for specialized theories valid only in small phase-space regions, or for new observables not known to date.
Acknowledgements

We are grateful to the engineers and technicians for their work in constructing and maintaining the H1 detector, our funding agencies for financial support, the DESY technical staff for continual assistance and the DESY directorate for support and for the hospitality which they extend to the non-DESY members of the collaboration.
We express our thanks to all those involved in securing not only the H1 data but also the software and working environment for long term use, allowing the unique H1 data set to continue to be explored. The transfer from experiment specific to central resources with long term support, including both storage and batch systems, has also been crucial to this enterprise. We therefore also acknowledge the role played by DESY-IT and all people involved during this transition and their future role in the years to come.

Appendix A. Data and simulation comparison at reconstruction level
Prior to unfolding, a comparison between the data and simulation is performed using the selection criteria described in Sec. 1 and the angularities calculated from the reconstructed detector objects. Both Rapgap and Djangoh simulations are compared to data in Fig. A.4. The Rapgap simulation with unfolding weights applied is also shown.

Appendix B. Binned distribution of unfolded observables at different energy scales
In this section, the normalized differential cross sections of the jet substructure observables after unfolding, $\frac{1}{N}\frac{\Delta N}{\Delta X}$, are presented in Figs. B.8, B.9, and B.10, with $N$ being the total number of unfolded jets in the considered $Q^2$ range and $\Delta N$ the number of unfolded jets in the respective bin of the observable. The bin width is denoted $\Delta X$.

Appendix C. Data tables for all presented histograms
Measured values of the normalized differential cross sections and contributions from each uncertainty source to the unfolded values are listed in Tables C.4, C.5, C.6, C.7, and C.8.

[Common figure caption] Data are shown as solid dots; horizontal bars indicate the bin ranges. Predictions from multiple simulations are shown for comparison and are offset horizontally for visual clarity. The relative differences between data and predictions are shown in the bottom panels, split between dedicated DIS simulators (middle) and general purpose simulators (bottom). Gray bands represent the total data uncertainties.