Is infrared-collinear safe information all you need for jet classification?

Machine learning-based jet classifiers are able to achieve impressive tagging performance in a variety of applications in high-energy and nuclear physics. However, it remains unclear in many cases which aspects of jets give rise to this discriminating power, and whether jet observables that are tractable in perturbative QCD such as those obeying infrared-collinear (IRC) safety serve as sufficient inputs. In this article, we introduce a new classifier, Jet Flow Networks (JFNs), in an effort to address the question of whether IRC unsafe information provides additional discriminating power in jet classification. JFNs are permutation-invariant neural networks (deep sets) that take as input the kinematic information of reconstructed subjets. The subjet radius and a cut on the subjet’s transverse momenta serve as tunable hyperparameters enabling a controllable sensitivity to soft emissions and nonperturbative effects. We demonstrate the performance of JFNs for quark vs. gluon and Z vs. QCD jet tagging. For small subjet radii and transverse momentum cuts, the performance of JFNs is equivalent to the IRC-unsafe Particle Flow Networks (PFNs), demonstrating that infrared-collinear unsafe information is not necessary to achieve strong discrimination for both cases. As the subjet radius is increased, the performance of the JFNs remains essentially unchanged until physical thresholds that we identify are crossed. For relatively large subjet radii, we show that the JFNs may offer an increased model independence with a modest tradeoff in performance compared to classifiers that use the full particle information of the jet. These results shed new light on how machines learn patterns in high-energy physics data.


Introduction
Jets are highly energetic and collimated groups of particles observed in the detectors of high-energy scattering experiments such as the Large Hadron Collider (LHC) [1][2][3]. Jets arise from the fragmentation of highly energetic quarks and gluons, which themselves can arise from the decay of unstable particles such as the Higgs boson. Classifying the origins of jets, such as quark vs. gluon initiated jets, QCD vs. boosted Z/W jets, and QCD vs. boosted top jets, is crucial to disentangle the various processes occurring at collider experiments and to perform searches for physics beyond the Standard Model.
Jet classification algorithms have been developed based on multivariate combinations of jet substructure observables as well as using machine learning methods. Machine learning based jet classifiers significantly outperform traditional multivariate jet taggers that utilize a limited number of observables, since they are able to leverage the full information in the jet. However, machine learning based classifiers often have the drawback that they are not calculable by analytical methods. Efforts to address this have been an active area of research, for example by enforcing Infrared-Collinear (IRC) safety [25] in the network architecture [26][27][28][29][30] or by finding optimal ways to reduce the amount of information provided as input to neural networks [31][32][33][34][35]. IRC safety is a frequently used guiding principle in high-energy physics for constructing suitable observables. It is often stated as: "An observable is IRC safe if it is insensitive to infinitesimally soft or exactly collinear emissions" [36]. It is a defining feature of observables that allows for calculations in QCD where divergences cancel at each order in perturbation theory [1,37]. We note that one has to distinguish between the IRC safety of an observable, i.e. the calculability or tractability of an observable within perturbative QCD, and the question about the relevance of nonperturbative effects. While both questions are important, we will primarily focus on IRC safety throughout this work.
In order to increase the interpretability of machine learning based classifiers, a complete IRC-safe basis of jet substructure observables was introduced in Refs. [10,38,39] based on N-subjettiness observables [40,41]. These observables capture the momentum and relative angles of emissions inside the jet. The set of N-subjettiness observables is then used as input to a machine learning algorithm for jet classification. While the complete basis of IRC-safe observables is large (3M − 4 for M particles in the jet), it was found that the performance of classifiers saturates quickly with a relatively small number of observables. Another set of observables, Energy Flow Polynomials (EFPs), was developed as a linear and IRC-safe basis of jet substructure observables in Ref. [26].
Interestingly, it was found that while the performance of classifiers based on complete sets of observables saturates, in most cases there remains a performance gap between classifiers with IRC-safe inputs (Sudakov safe classifiers) and IRC-unsafe classifiers that make use of the full information content of the particles inside the jet. Examples of such IRC-unsafe classifiers include architectures based on deep sets [11], point clouds [12,42] and transformers [43]. This performance gap has been observed for a variety of jet classification tasks, including QCD vs. W/Z and H jets [35], and pp vs. AA jets [31]. For quark vs. gluon tagging, it was found that the IRC-safe EFPs can match the performance of PFNs when only momentum information of particles in the jet is considered [11]. Several efforts have been made to quantify the gap, with the aim to gain new insights into fundamental QCD dynamics. There are several possible explanations for the observed performance gap:
• IRC-unsafe classifiers may be able to make use of the very soft information content of jets, which is difficult to access with IRC-safe observables.
• IRC-unsafe classifiers such as PFNs take as input the exact position information of the particles inside the jet, whereas IRC-safe observables can only capture the information of relative distances. It is possible that existing machine learning algorithms can make more efficient use of position information.
• The specific form of the IRC-safe observables may not be optimal for classification tasks and there may be other sets of observables that could perform better.
With this question in mind, we introduce in this work a new machine learning-based jet classifier, Jet Flow Networks (JFNs), which take as input the energy and position of reclustered subjets instead of individual particles. JFNs allow for soft and collinear emissions to be clustered into subjets, making the input IRC-safe and the resulting classifier generally Sudakov safe [44,45]. Sudakov safe observables are not IRC-safe, and as such are not defined at any order in the strong coupling α_s and yet have finite cross sections when all-orders effects are included. This happens because the Sudakov form factor regulates the infrared divergences. An example of such an observable is the ratio formed from two different angularities [46] measured on the same jet. However, different than the N-subjettiness or the EFP basis of observables, position information is used instead of having (indirectly) access only to relative distances between emissions (or subjets) inside the jet. We note that we do not consider quark flavor tagging in this work, which requires IRC-unsafe information, see e.g. Refs. [47][48][49]. JFNs are closely related to Particle Flow Networks (PFNs) [11] and Energy Flow Networks (EFNs) [11], which will be elaborated on in section 3. In the limit of a vanishing subjet radius, where every subjet contains only a single hadron, and in the limit of a vanishingly small cut on the transverse momenta of the subjets, JFNs are identical to PFNs. The radius of the reclustered subjets in JFNs, as well as the cut on the transverse momenta, can be used to dial in nonperturbative information, allowing for a smooth transition to IRC-unsafe classifiers. As such, JFNs complement the existing family of permutation-invariant networks in particle physics. As for PFNs and EFNs, we will utilize machine learning algorithms for JFNs based on a permutation invariant deep set architecture [11,50,51,52].
In this paper, we will explicitly study classification tasks of quark vs. gluon jets and jets from QCD vs. jets from boosted hadronic decays of Z bosons. The particular IRC (un)safety of the likelihood ratios for these tasks has been studied previously. It is expected that the likelihood ratio for quark vs. gluon jet discrimination is IRC safe [29], which has been validated in previous machine learning studies [27,28,53]. The argument for IRC safety is that the N-body phase space can be spanned by additive, IRC safe observables [54] and so probability distributions for quarks and gluons will in general take the form of a Sudakov factor near the infrared regions of phase space. The rate of suppression of emissions is controlled by the corresponding color Casimirs, and, because gluons carry more color than quarks, the likelihood ratio itself vanishes at all infrared boundaries. Therefore, all fixed-order divergences are mapped to the same value of the likelihood ratio, namely 0, and so the classifier is IRC safe. This explains why EFPs, which are an IRC-safe classifier, can match the performance of the IRC-unsafe PFNs. By contrast, the likelihood ratio for QCD jets vs. Z jets is expected to only be Sudakov safe, as optimal observables for general one- vs. two-prong discrimination take the form of the ratio of IRC safe observables [40,55,56,57]. Ratios of IRC safe observables are in general themselves not IRC safe [58].
To separately study soft and collinear safety, we consider both a finite subjet radius (collinear safety) and a low momentum cut p_T^soft on the transverse momentum of the subjets (soft safety). The main result of our work will be to show that the JFNs based on IRC-safe input achieve the same classification performance as PFNs. The JFNs represent the first example of a classifier based on IRC-safe inputs that achieves equivalent performance on several classification tasks as its IRC-unsafe counterpart. In addition, the machine learning architecture is equivalent to PFNs, which allows for one-to-one comparisons. The exact value of the subjet radius where the PFN performance is matched depends on the classification task at hand. Therefore, different than the classifiers based on complete IRC-safe sets of observables, JFNs constitute a "gapless" classifier, indicating that the very soft aspects of jets are in fact not relevant for typical classification tasks at collider experiments. This answers in part the question about the features that are relevant for the performance of classifiers in high-energy physics. Throughout this work, PFNs are taken as a reference, but other permutation invariant classifiers such as GNNs, transformers (equivalent to a fully connected graph), and point clouds could equally well be trained on particles or reclustered subjets.
Figure 1: We use the inclusive anti-k_T algorithm to identify the initial jet and the subjets. Particles are represented by small filled circles with radii proportional to the particle's transverse momentum in the ∆y vs. ∆φ plane, where ∆φ = φ_particle|subjet − φ_jet is the azimuthal angle with respect to the jet axis and ∆y = y_particle|subjet − y_jet is the rapidity distance to the jet axis. Subjets are shown with larger colored areas where red marks the leading subjet, green marks the second leading subjet, blue marks the third leading subjet, and shades of gray represent subjets with lower longitudinal momentum fraction z = p_T^subjet/p_T with intensity proportional to z.
In addition to shedding light on the role of IRC-safe information, JFNs allow for new insights into the physics of jet tagging and may lead to various future applications at high-energy collider experiments. By studying the performance of the JFNs as a function of the subjet radius and the jet transverse momentum, we are able to identify the relevant physical scales of different classification tasks. For example, for Z vs. QCD-jet tagging, we find that the subjet scale p_T r is sensitive to the opening angle between the boosted hadronic decay products of the Z boson. Second, we explore the generalization capability of JFNs to unseen data, which is crucial when deploying a classifier trained on simulations to experimental data. Due to the clustering of collinear and soft emissions into subjets, the resulting JFNs are relatively insensitive to the detailed modeling of the infrared (IR) physics that is often poorly understood. This raises the possibility to use JFNs to trade performance for generalizability by adjusting the number of reconstructed subjets. Lastly, we expect that subjets can be measured well in heavy-ion collisions despite the large fluctuating background. See Ref. [59] for recent measurements of the energy spectrum of inclusive and leading subjets by the ALICE Collaboration. See also Ref. [60].
The remainder of this paper is organized as follows. In section 2, we introduce the subjet basis and discuss differences between inclusive and exclusive subjet reconstruction algorithms. In section 3, we introduce the permutation invariant machine learning algorithms that take the kinematic information of the reconstructed subjets as input, and in section 4, we briefly discuss the data sets used for the different classification tasks in this work. In section 5, we present numerical results for the classification performance of JFNs for quark vs. gluon and Z vs. QCD jets. In particular, we show that JFNs match the PFN performance for a finite subjet radius and a finite cut on the transverse momenta of the subjets in both test cases. Based on these results, we describe in section 6 that the machine learning algorithm is sensitive to different physical scales, which it can effectively learn. In section 7, we investigate the tradeoff between performance and generalizability of the JFNs. In section 8, we draw conclusions and present an outlook.

The subjet basis
In this section, we describe the reconstruction of subjets that will serve as the input to the machine learning classifier. The initial jet is identified using the anti-k_T algorithm [61] and jet radius parameter R. In order to utilize the substructure of jets, we then recluster the jet constituents into subjets. We consider two approaches for the subjet reconstruction: inclusive anti-k_T subjets and exclusive k_T subjets [62,63]. Note that we do not consider exclusive anti-k_T jets as this clustering procedure does not lead to a "physical" clustering tree. In both cases, soft and collinear emissions are first clustered into subjets. For a finite subjet radius and an additional cut on the subjet's transverse momentum p_T^soft, the input to the classifier is IRC safe. In this sense, subjets serve as a useful tool for throttling or controlling the input data to the machine in a way that is theoretically interpretable in perturbative QCD.
First, we consider inclusive subjets reconstructed with the anti-k_T algorithm using the E-recombination scheme [64] and a fixed subjet radius r < R. This approach fixes the maximally allowed size of the reconstructed subjets but the number of subjets varies for each jet. We illustrate the distribution of subjets in the η-ϕ plane for three different subjet radii in Fig. 1. As r is increased, the central subjet contains a large fraction of particles. Second, we consider subjets reconstructed with the exclusive k_T algorithm and the E-recombination scheme. Particles are clustered with the k_T algorithm until a fixed number of subjets N is obtained. Different than in the case of inclusive subjets, the number of identified subjets is fixed but their size varies jet-by-jet. The N subjets span the full information content of the N most resolved emissions inside the jet, analogous to the N-subjettiness basis developed in Refs. [10,38,39]. An alternative approach to the exclusive k_T algorithm is to identify subjets with the XCone algorithm [65]. We leave the exploration of this algorithm for future work. By taking the small-r (inclusive subjets) or large-N limit (exclusive subjets), as well as p_T^soft → 0, we can study the transition to the nonperturbative regime where eventually, every subjet only contains a single hadron.
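As an illustration of the inclusive reclustering step, the following is a minimal sketch of generalized-k_T sequential recombination with E-scheme (four-vector sum) merging. It is a naive pure-Python toy and not the implementation used for the results in this work; the function names (`massless`, `inclusive_subjets`) and the `pt_soft` argument are hypothetical, and it assumes massless inputs with nonzero transverse momentum.

```python
import math

def massless(pt, y, phi):
    """Build a massless four-vector (px, py, pz, E) from (pT, rapidity, azimuth)."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))

def pt(p):
    return math.hypot(p[0], p[1])

def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

def delta_r2(a, b):
    """Squared distance in the rapidity-azimuth plane, with the azimuth wrapped."""
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2

def inclusive_subjets(particles, r, power=-1, pt_soft=0.0):
    """Inclusive generalized-k_T clustering with radius r.

    power=-1 corresponds to anti-k_T, power=+1 to k_T. Subjets with
    pT <= pt_soft are dropped; the rest are returned sorted by pT.
    """
    pseudo = [tuple(p) for p in particles]
    subjets = []
    while pseudo:
        weight = [pt(p) ** (2 * power) for p in pseudo]
        # Start from the smallest "beam" distance d_iB = pT_i^(2*power).
        i_best = min(range(len(pseudo)), key=lambda i: weight[i])
        d_min, pair = weight[i_best], (i_best, None)
        # Compare with all pairwise distances d_ij = min(w_i, w_j) * dR_ij^2 / r^2.
        for i in range(len(pseudo)):
            for j in range(i + 1, len(pseudo)):
                d = min(weight[i], weight[j]) * delta_r2(pseudo[i], pseudo[j]) / r ** 2
                if d < d_min:
                    d_min, pair = d, (i, j)
        i, j = pair
        if j is None:
            # Beam distance is smallest: promote the pseudojet to a final subjet.
            subjets.append(pseudo.pop(i))
        else:
            # E-scheme recombination: add the four-vectors component-wise.
            merged = tuple(a + b for a, b in zip(pseudo[i], pseudo[j]))
            pseudo.pop(j)  # j > i, so remove j first
            pseudo.pop(i)
            pseudo.append(merged)
    return sorted((s for s in subjets if pt(s) > pt_soft), key=pt, reverse=True)
```

For a toy jet with two nearly collinear particles and one wide-angle particle, increasing `r` in this sketch reduces the number of subjets from three to one, mirroring the behavior described above. The nested loops scale as O(n^3); this is only meant to make the distance measure explicit.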
To illustrate the qualitative differences between the two reconstruction methods discussed above, we show as an example the longitudinal momentum distributions of subjets z = p_T^subjet/p_T in Fig. 2 separately for quark and gluon jets. The details on the data generation are presented in section 4. Here p_T denotes the initial jet transverse momentum and p_T^subjet the subjet transverse momentum using either the inclusive or exclusive reconstruction method. As an example, we choose N = 30 for the exclusive reconstruction of subjets and r = 0.02 for inclusive subjets, which yields a comparable average number of 30 subjets. We observe that the two methods lead to qualitatively different spectra. The inclusive subjet spectrum exhibits a peak (quarks) or plateau (gluons) for intermediate to large values of z. In contrast, the spectrum for exclusive subjets only peaks at small values of z and falls off steeply for z → 1. This is due to the fact that for exclusive clustering the k_T algorithm is used, where soft hadrons are clustered first. Only at the end are hard emissions combined, making it unlikely to find a subjet with z → 1 for a fixed value of N. We note that for inclusive subjets, the z-distributions are qualitatively the same for both the anti-k_T and k_T algorithms. The longitudinal momentum spectrum for inclusive subjets was calculated within perturbative QCD up to next-to-leading logarithmic (NLL) accuracy, see Refs. [67][68][69][70]. This close connection to first-principles calculations may allow for an increased understanding of machine learning algorithms in QCD.
Figure 2: Longitudinal momentum fraction z of subjets for quark and gluon jets. The jets have p_T = [500, 550] GeV with an average particle multiplicity of 43. We show the distributions for inclusive anti-k_T subjet clustering with r = 0.02 (left), corresponding to an average of approximately 30 subjets, and for exclusive k_T clustering with a fixed number of N = 30 subjets (right). See also section 4 for more details.
From the identified subjets, the kinematic information (z_i, η_i, ϕ_i) of each subjet is used as input to the classifiers discussed below. In the limit that r → 0 (inclusive subjets) or N → ∞ (exclusive subjets), the subjet basis becomes equivalent to the set of particle four-vectors of the jet, and the classifier can make use of the full information content of the jet. The subjet basis therefore provides a means to limit the information supplied to the classifier, by using r > 0 or N < ∞.
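The mapping from reconstructed subjets to the per-subjet inputs can be sketched as follows. This is a hedged illustration only: the helper name `subjet_features` is hypothetical, subjets and the jet are assumed to be given as (p_T, η, ϕ) tuples, and the momentum fraction is taken as z_i = p_T^subjet/p_T of the jet, with angles measured relative to the jet axis and subjets at or below p_T^soft removed.

```python
import math

def subjet_features(subjets, jet, pt_soft=0.0):
    """Return a list of (z, d_eta, d_phi) for subjets above the soft cut.

    subjets: iterable of (pT, eta, phi); jet: (pT, eta, phi).
    """
    jet_pt, jet_eta, jet_phi = jet
    features = []
    for pt, eta, phi in subjets:
        if pt <= pt_soft:
            continue  # soft cut on the subjet transverse momentum
        # Wrap the azimuthal difference into [-pi, pi).
        d_phi = (phi - jet_phi + math.pi) % (2.0 * math.pi) - math.pi
        features.append((pt / jet_pt, eta - jet_eta, d_phi))
    return features

# Example: a jet close to the phi = pi boundary.
jet = (100.0, 0.5, 3.1)
subjets = [(60.0, 0.5, 3.1), (39.0, 0.6, -3.1), (1.0, 0.4, 3.0)]
features = subjet_features(subjets, jet, pt_soft=2.0)
```

Wrapping the azimuthal difference matters for jets near ϕ = ±π, where a naive subtraction would place a nearby subjet almost 2π away from the jet axis.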

Jet Flow Networks (JFNs): Deep sets of subjets
In this section, we describe the permutation invariant neural networks that use the kinematic information of subjets as input to perform binary classification tasks. As introduced above, we refer to the machine learning architecture and the pre-processing step of clustering particles into subjets as JFNs.
             PFN [11]             JFN                 EFN [11]
Input        particle 3-momenta   subjet 3-momenta    particle 3-momenta
Classifier   IRC unsafe           Sudakov safe        IRC safe
Table 1: Overview of different classifiers based on permutation invariant neural networks.
The reconstructed subjets discussed in the previous section do not have an inherent ordering. Therefore, permutation-invariant neural networks are a natural choice to perform classification tasks that take as input the kinematic information of subjets. In Refs. [50][51][52] deep sets were introduced as a permutation invariant neural network. In the context of particle physics, deep sets were first discussed in Ref. [11] as Particle Flow Networks (PFNs) that take as input the information of individual particles. A permutation invariant classifier f, which takes as input the subjet four-momenta p_i, satisfies
$$ f(p_1, \dots, p_N) = f(p_{\pi(1)}, \dots, p_{\pi(N)}) . $$
Here π denotes the permutation operator. Following Ref. [50], we can write the classifier f as
$$ f(p_1, \dots, p_N) = F\left( \sum_{i=1}^{N} \Phi(p_i) \right) , \qquad (3.1) $$
where F, Φ are neural networks and, as an intermediate step, we sum over all N reconstructed subjets. The first neural network Φ : R^4 → R^l takes as input the individual subjet four-momenta and maps them to an l-dimensional latent space. For massless subjets, we can write the individual four-vectors in terms of (z_i, η_i, ϕ_i). Here z_i is the subjet's longitudinal momentum fraction, see Fig. 2, and (η_i, ϕ_i) denote its coordinates in the rapidity-azimuth plane. We note that further information can be included in the per-subjet mapping such as the jet mass or the jet charge [71,72], analogous to e.g. particle identification (PID) for PFNs. We leave quantitative studies of the impact of these additional features for future work. The summation in Eq. (3.1) ensures that the classifier f is invariant under permutations of the input variables. The second neural network F : R^l → R is a map from the latent space where the summation operation is performed to the final classification score. Note that the classifier architecture in Eq. (3.1) can accommodate both a fixed number N of subjets (exclusive subjets) and input with variable length (inclusive subjets).
We refer to the deep set classifier based on subjets in Eq. (3.1) as JFNs. The JFNs are a family of classifiers due to the dependence on the continuous parameter r in the case of inclusive clustering or on N in the case of exclusive clustering, in which case the clustering is performed until N subjets remain. Since the JFN takes subjet information as input, the resulting classifier is generally Sudakov safe [45]. We summarize the different aspects of permutation invariant network architectures based on deep sets in table 1. Since the JFNs are Sudakov safe, they constitute an intermediate point between IRC-unsafe PFNs and IRC-safe EFNs. In the limit of r → 0 (inclusive subjets) or the large-N limit (exclusive subjets) and p_T^soft → 0, we recover the PFN classifier.
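To make the deep set structure f = F(Σ_i Φ(p_i)) concrete, here is a minimal pure-Python sketch with one dense layer each for Φ and F and random, untrained weights. It only illustrates permutation invariance and variable-length input; the helper names (`dense`, `init_layer`, `make_deep_set`) and the latent dimension are hypothetical choices and do not correspond to the trained networks used in this work.

```python
import random

def dense(weights, biases, x, relu=True):
    """Apply one fully connected layer, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out

def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and biases for illustration only."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases

def make_deep_set(latent_dim=8, seed=0):
    rng = random.Random(seed)
    phi_layer = init_layer(3, latent_dim, rng)   # Phi: (z, eta, phi) -> latent space
    f_layer = init_layer(latent_dim, 1, rng)     # F: latent space -> score

    def classifier(subjets):
        latent = [0.0] * latent_dim
        for s in subjets:
            # Per-subjet map Phi, then sum over subjets (permutation invariant).
            for k, v in enumerate(dense(*phi_layer, s)):
                latent[k] += v
        return dense(*f_layer, latent, relu=False)[0]

    return classifier

f = make_deep_set()
subjets = [(0.6, 0.00, 0.01), (0.3, 0.10, -0.05), (0.1, -0.20, 0.15)]
score = f(subjets)
score_permuted = f(list(reversed(subjets)))  # equal up to floating-point rounding
score_shorter = f(subjets[:2])               # variable-length input is accepted
```

Because the only operation that mixes subjets is the sum in the latent space, the same function accepts a fixed number of subjets (exclusive clustering) or a jet-dependent number (inclusive clustering).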
We note that one can think of the input to the classifier as a multi-differential cross section or multi-variate probability distribution. To avoid, for example, regions of negative cross sections, the corresponding perturbative calculation requires the all-order resummation of sufficiently many logarithmic corrections. The resummed calculation then needs to be matched to fixed-order calculations. The classifier performs a highly nontrivial marginalization over the multi-variate probability distribution. Since all-order resummations are required, the final result of the classifier is Sudakov safe but not necessarily IRC safe.

Data sets
In this work, we will consider JFNs for two exemplary binary classification tasks in high-energy physics. First, we consider quark vs. gluon jet classification and, second, Z vs. QCD jet classification. For the quark vs. gluon case, we make use of the data set in Ref. [73], which consists of 2M jets with transverse momentum p_T = [500, 550] GeV, rapidity |η| < 1.7, jet radius parameter R = 0.4, and center-of-mass energy √s = 14 TeV. We will make use of both the data set generated with Pythia [66] and Herwig [74]. In order to explore the dependence on the jet transverse momentum, we also generate two additional data sets consisting of 500k jets each with transverse momentum p_T = [300, 350] GeV and [1000, 1050] GeV, respectively. The underlying processes are: q q → Z(→ ν ν̄) + g and q q → Z(→ ν ν̄) + (uds), analogous to Ref. [73]. For the Z vs. QCD-jet case, we generate 500k jets for three different bins of jet transverse momentum, [300, 350] GeV, [500, 550] GeV and [1000, 1100] GeV, with a jet mass m_j = [45, 135] GeV. The radius parameter is R = 0.8, the rapidity cut is |η| < 1.7 and the samples are generated using Pythia at √s = 14 TeV. Jets arising from Z bosons are identified by requiring that the leading Z boson is in the catchment area of the jet as extracted from the kinematics of the events at the particle level before hadronization, with a Z-jet distance from the jet axis less than R/2. A similar tagging procedure is performed to differentiate between quark and gluon jets in the QCD sample. The tag is based on the leading parton within the catchment area of the jet. However, to strengthen the parton-jet association, we use parton-level kinematics injected into the hadron-level event using so-called ghost particles (p_T = 10^−5 GeV) that do not affect the jet reconstruction but allow for efficient tagging after the jet finding step.
The substructure of QCD jets is generally single-pronged, whereas the decay products of a Z cause the corresponding jets to have two prongs. The ratio of N-subjettiness observables [40,41,75,76] is sensitive to the number of prongs inside a jet. In order to define the N-subjettiness, a given number of N axes are identified inside the jet using the exclusive k_T algorithm. The N-subjettiness variables τ_N^(β) measure the radiation along these axes and are defined as
$$ \tau_N^{(\beta)} = \frac{1}{p_T} \sum_{i \in \text{jet}} p_{Ti} \, \min_{j = 1, \dots, N} R_{ji}^{\beta} . $$
Here the p_Ti of each particle i is weighted by its distance R_ji to the closest axis j raised to the power β > 0, which is a tunable parameter. For jets that are more like a single-prong jet, the variable τ_2 takes similar values compared to τ_1, whereas for two-prong like jets the variable τ_2 will peak at smaller values compared to τ_1 (by construction, τ_{n+1} ≤ τ_n). To illustrate the qualitative differences between QCD jets and Z jets, we plot in Fig. 3 the result for the ratio τ_2/τ_1, which shows the expected separation of the two jet samples (left panel), and the jet mass m_j distribution for QCD and Z jets (right panel). For all classification tasks, the training/validation/test split is 80%/10%/10%.
Figure 3: The N-subjettiness ratio τ_2/τ_1 (left) and the jet mass distribution (right) for QCD and Z jets with p_T = [500, 550] GeV. The N-subjettiness axes were identified using the one pass k_T clustering algorithm.
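As a worked illustration of the N-subjettiness ratio, the sketch below evaluates τ_N^(β) for a given set of axes, assuming the common convention in which the sum over p_Ti times the distance to the nearest axis (raised to the power β) is normalized by the jet p_T. The function name `n_subjettiness`, the (p_T, y, ϕ) input format, and the hand-picked toy axes are assumptions for illustration; in practice the axes would come from exclusive k_T clustering.

```python
import math

def n_subjettiness(particles, axes, jet_pt, beta=1.0):
    """tau_N for N = len(axes); particles and axes are (pT, y, phi) tuples."""
    tau = 0.0
    for pt, y, phi in particles:
        distances = []
        for _, y_axis, phi_axis in axes:
            d_phi = (phi - phi_axis + math.pi) % (2.0 * math.pi) - math.pi
            distances.append(math.hypot(y - y_axis, d_phi))
        # Each particle is assigned to its closest axis.
        tau += pt * min(distances) ** beta
    return tau / jet_pt

# Toy two-prong configuration: two hard directions separated by about 0.4.
particles = [(40.0, 0.0, 0.20), (10.0, 0.0, 0.25), (50.0, 0.0, -0.20)]
one_axis = [(100.0, 0.0, 0.0)]
two_axes = [(50.0, 0.0, 0.21), (50.0, 0.0, -0.20)]

tau1 = n_subjettiness(particles, one_axis, jet_pt=100.0)
tau2 = n_subjettiness(particles, two_axes, jet_pt=100.0)
ratio = tau2 / tau1
```

In this toy example the two axes absorb most of the radiation, so τ_2 is much smaller than τ_1 and the ratio τ_2/τ_1 is small, as expected for a two-prong jet.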

JFN performance: gapless jet classification
In this section, we will present results for the performance of the JFN classifier. In the two subsections, we separately study the JFN dependence on the subjet radius r and the transverse momentum cutoff p_T^soft, which are associated with collinear and soft safety, respectively. We will observe that the maximum performance is obtained for finite values of r and p_T^soft.

Collinear safety
We start by studying JFNs for different values of the subjet radius r. In the limit r → 0, every subjet contains only a single hadron, and the PFN performance is recovered. Throughout this section, we consider p_T^soft = 0 GeV. In the next subsection, we consider the JFN performance for finite values of the transverse momentum cutoff. We consider two exemplary binary classification tasks in high-energy physics: quark vs. gluon and Z vs. QCD jet identification.
In order to implement the permutation-invariant neural networks, we parametrize the functions Φ and F in Eq. (3.1) in terms of DNNs, using the EnergyFlow package [11] with Keras [77]/TensorFlow [78]. For Φ we use two hidden layers with 100 nodes each and a latent space dimension of d = 256. For F we include three layers with 100 nodes each. For each dense layer, we use the ReLU activation function [79] and we use the softmax activation function for the final output layer of the classifier. We train the neural networks using the Adam optimizer [80] with learning rates ranging from 10^−3 to 10^−4. We use the binary cross entropy loss function [81] and train for 60 epochs with a batch size of 256 and a patience parameter of 8 for early stopping. We find no significant changes in performance when changing the size or number of the layers, latent space dimension, learning rate, and batch size by factors of 2-5. Following Ref. [11], we perform a preprocessing step to simplify the training process: we use the rescaled momentum fractions z_i and center the rapidity and azimuthal angles η_i, ϕ_i of the particles in the jet with respect to the jet direction.
We quantify the performance of the different classifiers in terms of the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) of the ROC curve. The ROC curve shows the true positive rate vs. the false positive rate of a classifier as the decision threshold is varied. The AUC takes values between 0.5 and 1, where 0.5 (1) corresponds to a random (perfect) binary classifier. We estimate the statistical uncertainty of the AUC by training the deep sets four times for each choice of the subjet radius r and using the standard deviation as the uncertainty. Fig. 4 (left) shows the JFN results for quark vs. gluon jet classification: the top panel shows the AUC performance as we change the inclusive subjet radius r, and the bottom panel shows the ROC curves for several r. For comparison, we also show the PFN result. In the case of the AUC plot, we display the PFN classifier as the leftmost point on the r axis. As expected, for r = R we obtain a random classifier since all particles are clustered into a single subjet that is identical to the original jet. Similarly, we find that for the smallest subjet radii r the performance of the PFN is recovered. Strikingly, however, the performance of the JFN does not diminish as r is increased for values of the subjet radius r ≲ 0.01. At this critical r value, we have on average n_subjets/n_hadrons ≈ 0.75. This observation is corroborated by the ROC curve in the lower panel, which shows that there is no performance loss in the JFN (r = 0.01) as compared to the PFN. This demonstrates that there is little-to-no information encoded in the very collinear emissions relevant for discriminating q vs. g jets, and suggests that collinear safe inputs are sufficient for the purpose of q vs. g classification. In section 6, we will further discuss the physical interpretation of this critical r value.
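The AUC and its run-to-run uncertainty can be illustrated with a short sketch. It computes the AUC through the equivalent rank statistic (the probability that a signal jet receives a higher score than a background jet, with ties counted as one half) and then takes the mean and standard deviation over repeated trainings. The function names are hypothetical, the pairwise loop is only suitable for small samples, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev

def roc_auc(scores, labels):
    """AUC as P(score_signal > score_background), ties counted as 1/2.

    labels: 1 for signal, 0 for background. Quadratic in sample size.
    """
    signal = [s for s, l in zip(scores, labels) if l == 1]
    background = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((s > b) + 0.5 * (s == b) for s in signal for b in background)
    return wins / (len(signal) * len(background))

def auc_with_uncertainty(runs):
    """runs: list of (scores, labels), one entry per independent training."""
    aucs = [roc_auc(scores, labels) for scores, labels in runs]
    return mean(aucs), stdev(aucs)

# Toy example with four "trainings" evaluated on the same labels.
labels = [1, 1, 0, 0]
runs = [
    ([0.9, 0.8, 0.3, 0.1], labels),
    ([0.9, 0.2, 0.8, 0.1], labels),
    ([0.7, 0.6, 0.5, 0.4], labels),
    ([0.9, 0.4, 0.5, 0.1], labels),
]
auc_mean, auc_std = auc_with_uncertainty(runs)
```

A perfectly separating classifier gives an AUC of 1 in this sketch, while scores that carry no information about the label give 0.5 on average.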
Fig. 4 (right) shows the analogous JFN results for Z vs. QCD jet classification using inclusive subjet clustering and 500k jets for training, validation and testing. We observe again that the JFNs smoothly converge to the result of the PFN. Different than for quark vs. gluon jet tagging, we can now choose a significantly larger subjet radius r ≲ 0.1 without compromising the performance of the classifier. This is related to the fact that in this case, the boosted Z-boson decay products generally lead to a two-pronged jet substructure, whereas QCD (quark and gluon) jets exhibit a single-pronged jet substructure (see Fig. 3 as well as Refs. [1,3] for different observables and perturbative calculations that characterize the radiation patterns of QCD and boosted Z jets). In general, machine-learned classifiers can make use of more information than the one- vs. two-pronged structure inside these jets; for r ∼ 0.1, a significant fraction of the hadrons inside the jets are clustered into subjets, n_subjets/n_hadrons ≈ 0.4. However, due to the observed saturation up to r ∼ 0.1, we conclude that the information contained in collinear emissions is significantly less relevant for this classification task compared to quark vs. gluon jet tagging. This is due to the physical scales that are relevant for the different jet classification tasks, which we will explore in more detail in section 6. For completeness, we list the numerical values for the AUC including uncertainties in table 2.

Soft safety
Next, we study the sensitivity of the classifier to soft emissions by analyzing the JFN performance as a function of the cut on the subjet's transverse momenta p_T^soft. The results for the AUC are shown in Fig. 5 for both quark vs. gluon and Z vs. QCD jets. We studied the effects of the soft cut for two ranges of transverse momenta for the jet, using 500k jets as the dataset for each classification task. Again, we observe that the PFN performance is achieved for finite values of p_T^soft that are O(1 GeV). We observe only small differences for the two classification tasks. The critical value for p_T^soft where the performance reaches a plateau is independent of the total jet transverse momentum p_T within the displayed errors. This indicates that there is a soft scale below which no further information is added that can improve the classification performance. We conclude that IRC-safe information is indeed sufficient to carry out the two jet classification tasks considered in this work, which is consistent with theoretical considerations in the literature [29,82]. However, the relatively low values of p_T^soft that are needed to achieve the maximum performance indicate that nonperturbative effects are generally relevant. These observations may inform the design of suitable observables, which we leave for future work.
For both jet classification tasks, we have shown that for a range of subjet radii $r$ and $p_T^{\mathrm{soft}}$ cuts, the JFN exhibits no significant difference in performance compared to the PFN. That is, the JFN classifier is "gapless" in the sense that we smoothly approximate the PFN performance (for finite values of $r$ and $p_T^{\mathrm{soft}}$). The clustering of soft and collinear emissions into subjets does not affect the performance as long as $r$ and $p_T^{\mathrm{soft}}$ are sufficiently small. This is in contrast to previous studies based on observables such as $N$-subjettiness variables, which exhibit a small but persistent performance gap to PFNs [7,10,11]. The JFN provides the first example of a classifier with IRC-safe inputs that achieves equivalent performance to the IRC-unsafe PFNs for several classification tasks. Our results are consistent with the intuitive expectation that very low-energy particles are essentially uncorrelated with the hard process and therefore do not provide relevant information for typical classification tasks in high-energy physics. These observations answer the main question of our paper: at least for the two classification tasks considered here, IRC-safe information is sufficient to close the gap to IRC-unsafe classifiers. This was achieved by using the same machine learning architecture and input type (momentum, position information) for both cases and by including subjet reclustering as a preprocessing step in the IRC-safe case. Our conclusions come with the following caveat. While we relate the performance of the classifiers to relevant physical scales, it is possible that future advances in machine learning lead to more powerful algorithms that may require us to reduce the subjet radius $r$ and the cutoff $p_T^{\mathrm{soft}}$ to match the performance of IRC-unsafe classifiers.

Learning physical scales
As discussed in the previous section, the performance of the JFNs based on IRC-safe input matches that of the IRC-unsafe PFNs for finite values of the subjet radius $r$ where a significant fraction of hadrons is clustered into subjets. In this section, we quantify in more detail the onset of the drop in performance when the subjet radius crosses certain physical scales. Here we focus on inclusive instead of exclusive subjet reconstruction since we are primarily interested in the physical scale associated with a fixed value of the subjet radius $r$. In order to identify the physical scale associated with the classification tasks and study its scaling behavior, we analyze the AUC for different bins of the jet transverse momentum $p_T$. Throughout this section, we only vary the subjet radius $r$ and choose $p_T^{\mathrm{soft}} = 0$ GeV since we already identified the relevant soft scale in the previous section. In the upper left panel of Fig. 6, we show the AUC for q vs. g discrimination for three different bins of the jet transverse momentum. For comparison, analogous to Fig. 4, we show the PFN result as the left-most point ($r = 0.001$). For all three classification tasks we used 500k jets for training, validation and testing. First, we notice that at sufficiently large $r$ all three AUC curves merge and the classification performance is (approximately) independent of the jet transverse momentum. This can be traced back to the approximate scale invariance of the QCD parton shower cascade. Second, we observe that in all three cases, the AUC reaches a plateau for finite values of the subjet radius as $r$ is decreased. In the lower left panel of Fig. 6, we show the results in the transition region in more detail. The onset of the plateau shifts to the left as the jet $p_T$ is increased. For higher jet $p_T$, the jet constituents are more collimated, leading to a smaller critical value of $r$ where the JFNs match the PFN performance. We can identify the following approximate scale where the AUC reaches a plateau and agrees with the PFN results:

$$p_T\, r \sim 5~\mathrm{GeV}.$$

Figure 6: Left: AUC for quark vs. gluon jet classification for three different jet $p_T$ intervals as a function of the subjet radius $r$. Upper right: Average subjet multiplicity $n_{q,g}$ for quark (solid) and gluon (dashed) jets. Lower right: Ratio of the average subjet multiplicities $n_g/n_q$ for the three jet $p_T$ intervals.
Since there is no additional physical scale in the quark vs. gluon jet classification task besides the jet $p_T$, the agreement with the PFN result is achieved for relatively low energy scales. However, we would like to stress that the identified scale is still in the perturbative regime. This suggests that, because we only use particle momenta as input to the classifier, non-perturbative symmetries such as isospin prevent hadron-level information from being useful for discrimination. This symmetry would be broken, and the discrimination could improve, if flavor-sensitive information were measured, as has been established in recent studies of the jet charge [48,49].
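As a rough numerical illustration of this scaling, the sketch below assumes that the plateau onset follows $p_T\, r \approx 5$ GeV, the threshold value quoted in the conclusions, and evaluates the corresponding subjet radius. The function name `critical_subjet_radius` and the sample $p_T$ values are hypothetical choices for illustration, not the bins used in Fig. 6.

```python
def critical_subjet_radius(jet_pt_gev, scale_gev=5.0):
    """Subjet radius at which pT * r equals the assumed scale (default 5 GeV)."""
    return scale_gev / jet_pt_gev

# Illustrative jet pT values in GeV (assumed, not the paper's binning).
radii = {pt: critical_subjet_radius(pt) for pt in (300.0, 500.0, 1000.0)}
```

Under this assumption, a 500 GeV jet corresponds to a critical radius of about 0.01, and doubling the jet $p_T$ halves the radius, in line with the plateau onset shifting to the left for higher $p_T$.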
Since the (IRC-unsafe) particle multiplicity is known to be a powerful discriminant for quark vs. gluon jet tagging [29,83,84], we study the relation of our results to the average subjet multiplicities $n_{q,g}$ as a function of the subjet radius $r$, which is shown in the upper right panel of Fig. 6. As expected, the subjet multiplicities for both quark and gluon jets increase as the subjet radius $r$ is decreased. In the limit $r \to 0$, the subjet multiplicity smoothly asymptotes to the particle multiplicity. The expected value for the ratio of the particle multiplicities at leading order is $n_g/n_q = C_A/C_F = 9/4$ [85,86]. In agreement with the discussion in the previous section, we notice that the subjet radius where the quark vs. gluon jet AUC reaches a plateau is larger by an order of magnitude than the $r$ value at which the subjet multiplicity reaches the particle multiplicity. This confirms that matching the PFN performance with JFNs is a non-trivial result. As shown in the lower right panel of Fig. 6, the ratio $n_g/n_q$ peaks at intermediate values of $r$, which is in the region of the better modeled perturbative physics [84,87-89]. Interestingly, the location of the peaks is approximately the same as where the AUC for quark vs. gluon jet tagging reaches the plateau and agrees with the PFN result. This correlation indicates that we can increase the subjet radius $r$ without affecting the classification performance until the subjet multiplicity ratio $n_g/n_q$ starts to decrease.

Next, we investigate the dependence of our results on model parameters. In particular, we focus on the cutoff scale of the shower in Pythia at which the perturbative evolution ends and partons are converted to hadrons using the Lund string fragmentation model. The default value of this parameter is $p_{T,\mathrm{shower}}^{\min} = 0.5$ GeV. To assess the dependence on this parameter, we compare the default choice to $p_{T,\mathrm{shower}}^{\min} = 1$ GeV, which changes the transition from perturbative to non-perturbative splittings. The AUC results for the resulting classifiers are shown in Fig. 7 for quark vs. gluon jets. We find that the asymptotic value of the AUC is slightly different when a higher transition scale is used. This is expected since the hadronization module used in Pythia masks differences between quark and gluon jets. More importantly, we observe that the plateau is reached for the same finite value of $r \sim 0.01$, i.e. the scale we identified is independent of the choice of this parameter in Pythia. This illustrates that, while the detailed modeling of soft physics is needed to describe the asymptotic AUC of the classifier, its peak performance is determined by a finite value of the subjet radius.
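For reference, the leading-order multiplicity ratio $C_A/C_F = 9/4$ quoted earlier in this section follows directly from the standard SU(3) color factors with $N_c = 3$:

```latex
\[
C_A = N_c = 3, \qquad
C_F = \frac{N_c^2 - 1}{2 N_c} = \frac{4}{3}, \qquad
\left.\frac{n_g}{n_q}\right|_{\mathrm{LO}} = \frac{C_A}{C_F} = \frac{9}{4} = 2.25 .
\]
```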
Next, we consider QCD vs. Z jet classification. The AUC for three jet transverse momentum intervals is shown in Fig. 8 as a function of the subjet radius $r$. In all three cases we used 500k jets for training, validation and testing. As the jet $p_T$ is increased, the value of the subjet radius where the JFNs match the PFN performance, $r \sim 0.1$ to $0.2$, is shifted to the left. This observation is generally consistent with quark vs. gluon jet tagging discussed above. We note that for the AUC curve with $p_T \in [300, 350]$ GeV the choice of the jet radius $R$ might start to play a role. The value of $r$ where the performance reaches the plateau is roughly a factor of 10 higher compared to quark vs. gluon jet tagging. As already hinted at above, this is due to the presence of different physical scales. QCD jets do not have any additional intrinsic scales except for the hadronization scale. In contrast, jets that contain the decay products of the boosted Z boson are sensitive to the Z-boson mass $M_Z$.
In order to gain further insights into the underlying physics, we study the distribution of the opening angle $\theta_{12}$ of the two leading subjets. This variable is closely related to the two-pronged structure of Z jets and serves as a useful discriminant since at leading order the Z boson decays into a quark and an anti-quark, which correspond to the two leading subjets at this order. The boosted Z decay products have an opening angle $\theta_Z$, which is determined by $M_Z$ and the jet transverse momentum $p_T$ as

$$\theta_Z \simeq \frac{2 M_Z}{p_T}.$$

For higher $p_T$, the decay products are more boosted and $\theta_Z$ is smaller. See Fig. 9 for an illustration of the boosted Z-boson decay products clustered into subjets. If the subjet radius parameter is sufficiently small, $r < \theta_Z/2$, the Z-boson decay products are clustered into separate subjets. In contrast, for $r > \theta_Z$, they are merged into a single subjet. In the intermediate region, $r < \theta_Z < 2r$, they are identified as two separate subjets but the subjet catchment areas overlap.

In Fig. 10, we show the distributions of the opening angle $\theta_{12}$ between the two leading subjets for both QCD and Z jets for different values of the subjet radius $r$. Here, $\theta_{12}$ corresponds to the geometric distance in the $\eta$-$\phi$ plane, i.e. without rescaling by the jet radius $R$. The left column of Fig. 10 shows the opening angle $\theta_{12}$ between the two leading hadrons, which corresponds to the subjet radius $r = 0$. The middle column shows the $\theta_{12}$ distributions for $r$ values in the plateau region where the AUC of the JFNs matches the PFN result. The right column corresponds to higher $r$ values where the AUC has dropped significantly, see Fig. 8 above. The top and bottom rows correspond to two different jet $p_T$ intervals as indicated in the figure. We observe that for both QCD and Z jets, the distributions are bounded from below by the chosen subjet radius, $\theta_{12} > r$, and vanish when the angle between the leading subjets reaches the jet radius, $\theta_{12} \lesssim R = 0.8$. Due to collinear QCD emissions, the distribution of the angle between the two leading hadrons in QCD jets peaks at $\theta_{12} \sim 0$. As the subjet radius is increased, it peaks close to the lower bound $\theta_{12} \sim r$. Eventually, the $\theta_{12}$ distribution becomes broader for large values of $r$.

In contrast, both at hadron level and for $r$ values in the plateau region of the AUC, the $\theta_{12}$ distribution of Z jets has a two-peak structure. The left peak is due to QCD emissions and occurs at the same $\theta_{12}$ value as the single peak of QCD jets. The second peak occurs around the opening angle of the Z decay products, $\theta_{12} \sim \theta_Z$. The width of the peaks scales as $\sim 1/p_T$. When $r$ is chosen in the plateau region of the AUC in Fig. 8, the two-peak structure of the Z-jet $\theta_{12}$ distribution can be clearly identified. The two-prong structure of Z jets is the most prominent feature that distinguishes the two jet samples and it is clearly resolved as long as $r$ is sufficiently small. In this region, the JFN performance is the same as the IRC-unsafe PFN result.

While the location of the $\theta_{12} \sim \theta_Z$ peak is fixed, the peak due to QCD emissions moves to larger $\theta_{12}$ as $r$ is increased. Eventually, the two peaks start to merge. This is illustrated in the right-most column of Fig. 10, which shows the $\theta_{12}$ distributions for $r$ values where the JFN performance is significantly below its maximal value. The two peaks of the Z jets have merged into one peak and the distribution is very similar to that of QCD jets. In this case, the Z decay products cannot be clearly resolved and the performance of the classifier deteriorates. At this scale, the classifier no longer has access to the UV physics, and as such the performance for the $p_T \in [1000, 1100]$ GeV jets matches the performance for the $p_T \in [500, 550]$ GeV jets. By comparing the upper and lower rows of Fig. 10, we observe that the location of the peaks is shifted to lower values for higher jet $p_T$. In addition, the peaks are narrower, with a width that scales as $\sim 1/p_T$. This agrees with the observation that for higher jet $p_T$, the end of the AUC plateau in Fig. 8 is reached for smaller $r$ values.
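The three regimes of the subjet radius relative to the Z decay opening angle can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard boosted two-body estimate $\theta_Z \approx 2 M_Z/p_T$; the function names, the regime labels, and the sample values of $r$ and $p_T$ are hypothetical.

```python
M_Z_GEV = 91.1876  # Z-boson mass in GeV

def z_opening_angle(jet_pt_gev):
    """Approximate opening angle of the boosted Z decay products (assumed 2*M_Z/pT)."""
    return 2.0 * M_Z_GEV / jet_pt_gev

def prong_regime(r, jet_pt_gev):
    """Classify a subjet radius r relative to theta_Z for a given jet pT."""
    theta_z = z_opening_angle(jet_pt_gev)
    if r < theta_z / 2.0:
        return "resolved"      # decay products end up in separate subjets
    if r < theta_z:
        return "overlapping"   # two subjets whose catchment areas overlap
    return "merged"            # decay products are clustered into one subjet

# Illustrative values: the same r can be resolved at 500 GeV but not at 1000 GeV.
regime_low_pt = prong_regime(0.1, 500.0)
regime_high_pt = prong_regime(0.1, 1000.0)
```

For a fixed subjet radius, increasing the jet $p_T$ shrinks $\theta_Z$ and moves the jet from the resolved regime toward the merged one, which mirrors the plateau ending at smaller $r$ for higher $p_T$.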
Another way of illustrating the importance of resolving the Z-boson decay products is by training deep sets using only the information of the first few leading subjets. In Fig. 11, we show the AUC for QCD vs. Z jets as a function of the subjet radius $r$ for three classifiers. We compare the JFNs to deep sets trained on the kinematic information of only the first two or three leading subjets. As an example, we use the jet transverse momentum interval $p_T \in [500, 550]$ GeV. We observe that for large and intermediate values of the subjet radius $r$, the JFN performance is close to that of the deep sets trained on only two or three leading subjets. For small values of $r$, the leading two or three subjets do not contain enough information to match the JFN result. In particular, using the information of three leading subjets closely approximates the JFN performance down to a subjet radius of $r \sim 0.2$. The relevance of the third leading subjet corroborates the results of Ref. [38], where the leading emission off the color dipole was identified as an important component for Z vs. QCD jet classification.
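The restriction to the leading subjets, and the permutation invariance of the deep-set architecture, can be illustrated with a small NumPy sketch. It uses a single-layer per-subjet map, sum pooling, and random untrained weights; the function names, the input layout ($z$, $\Delta y$, $\Delta\phi$), the latent dimension, and the toy values are assumptions for illustration and do not reproduce the trained networks.

```python
import numpy as np

def leading_subjets(subjets, k):
    """Return the k subjets with the largest momentum fraction z (column 0)."""
    subjets = np.asarray(subjets, dtype=float)
    order = np.argsort(-subjets[:, 0], kind="stable")
    return subjets[order[:k]]

def deep_set_forward(subjets, w_phi, w_f):
    """Deep-set score F(sum_i Phi(x_i)) with one ReLU layer for Phi."""
    latent = np.maximum(subjets @ w_phi, 0.0)  # per-subjet map Phi
    pooled = latent.sum(axis=0)                # permutation-invariant sum
    logit = pooled @ w_f                       # linear readout F
    return 1.0 / (1.0 + np.exp(-logit))        # sigmoid score in (0, 1)

rng = np.random.default_rng(0)
w_phi = rng.normal(size=(3, 8))  # random, untrained weights
w_f = rng.normal(size=8)

# Toy subjets with made-up (z, delta_y, delta_phi) values.
toy = np.array([
    [0.10, 0.20, -0.10],
    [0.55, 0.01, 0.02],
    [0.05, -0.30, 0.25],
    [0.30, -0.15, 0.10],
])
score_all = deep_set_forward(toy, w_phi, w_f)
score_shuffled = deep_set_forward(toy[::-1], w_phi, w_f)
top2 = leading_subjets(toy, 2)
score_top2 = deep_set_forward(top2, w_phi, w_f)
```

Because the per-subjet outputs are summed before the readout, reordering the inputs leaves the score unchanged, while truncating to the two leading subjets generally changes it.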
Analogous classification tasks where physical scales can likely be identified are light QCD vs. c- or b-jets [90-92], QCD vs. Higgs [93] or QCD vs. top quark jets [14]. We leave the exploration of these topics for future work.

Performance vs. generalizability
Machine learning-based classifiers are often deployed in experimental analyses to tag jet topologies. A typical method is to train the classifier using fully supervised learning on precise theoretical simulations and apply it to experimental data [92,94,95]. However, this approach introduces model dependence as simulations do not perfectly match the actual data. In this section, we explore some of the systematic uncertainties associated with this method. Other options that have been proposed include semi- or weakly-supervised techniques [96,97], as well as data-driven methods [98]. When using fully supervised learning to develop classifiers, it is crucial to ensure that the model can generalize well to the unseen experimental data. For JFNs, soft and collinear particles are clustered into subjets, making them less sensitive to the modeling of IR physics. Since it is generally challenging to model the very soft physics (both perturbative and nonperturbative) of collider events in universal Monte Carlo event generators, JFNs may have an advantage compared to PFNs in terms of generalizability. On the other hand, if too many particles are clustered into few subjets, the overall performance deteriorates. In order to assess whether a classifier performs well on unseen data, we train PFNs and JFNs with different parameters on Pythia [66] (training + validation data set) and test on Herwig [74] simulations. Here, Herwig can be considered as a surrogate for experimental data. It has often been observed that Pythia and Herwig envelop many jet substructure observables, sometimes referred to as the "Pythia-Herwig sandwich"; see e.g. Ref. [99]. We note that while the final results of both event generators are quite similar, the underlying physics of both the perturbative parton shower and the hadronization model can differ significantly. One generally expects that quark jets are quite similar in Pythia and Herwig but the results for gluon jets tend to differ more significantly [100-102]. See also Ref. [6], where Pythia and Herwig studies were presented using Convolutional Neural Networks (CNNs). Moreover, in Ref. [103] mixed Herwig/Pythia samples were used together with a Bayesian Network in order to increase model robustness.
We consider quark vs. gluon jet tagging for $p_T \in [500, 550]$ GeV using exclusive $k_T$ clustering of the subjets that are taken as input to the machine learning algorithm. Fig. 12 shows the AUC as a function of the number of subjets $N$. Analogous to the previous figures, we show the PFN result as the left-most point ($N = 140$). The upper panel shows the result for JFNs as a function of $N$ trained on Pythia and tested on Pythia (blue) or Herwig (orange). In both cases, we observe a plateau in classifier performance as the number of (exclusive) subjets is increased. Within the shown errors, the AUC in both cases reaches its maximum value for $N \sim 30$. As expected, there is a performance gap when testing the Pythia-trained classifier on quark vs. gluon jets generated with Herwig compared to testing it on Pythia simulations. This observation is consistent with the results of Refs. [6,103]. However, we observe that the performance gap decreases as $N$ decreases. To better visualize this aspect, we show in the lower panel the difference between the two AUC curves shown in the upper panel. The difference becomes smaller as $N$ is decreased, indicating improved generalizability of the model. Our findings suggest that clustering particles into subjets can reduce the overall performance, but it also masks modeling uncertainties of the IR physics, leading to more robust classifiers. Interestingly, we find that the difference between Pythia and Herwig does not decrease for small $N$ for Z vs. QCD jet classification (not shown). We also do not observe an increased robustness using inclusive subjet clustering. We expect that the generalizability or robustness of machine learning-based classifiers will be useful for certain experimental applications where the trade-off between performance and generalizability needs to be considered. To illustrate this aspect, we introduce the objective function $f(a, N)$ of Eq. (7.1), where $N$ is the number of exclusive subjets. Here the performance and generalizability are combined additively and a weighting factor $a > 0$ is introduced that allows us to increase or decrease the relative weight of the two metrics. An optimization problem to find the optimal balance between performance (first term in Eq. (7.1)) and generalizability (second term $\sim a$ in Eq. (7.1)) can now be formulated as follows: for a given choice of the weighting factor $a$, find the maximal value of the objective function $f(a, N)$. The optimal number of exclusive subjets is then given by $N_{\mathrm{opt}} = \arg\max_N f(a, N)$. We plot $f(a, N)$ for two different values of $a$ in Fig. 13 (middle and lower panels). We observe that as $a$ is increased (the generalizability is weighted higher), the objective function peaks at an intermediate value of $N$. For example, for $a = 4$ we find $N_{\mathrm{opt}} = 3$. While our objective function is constructed for illustration purposes, this result indicates that for certain experimental analyses that employ machine learning-based classifiers, it can be advantageous to use JFNs with a finite number of subjets to achieve the desired goals.
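The optimization over $N$ can be sketched as follows. Since the explicit form of Eq. (7.1) is not reproduced in this text, the sketch assumes one plausible additive form, $f(a, N) = \mathrm{AUC}_{\mathrm{Pythia}}(N) - a\,|\mathrm{AUC}_{\mathrm{Pythia}}(N) - \mathrm{AUC}_{\mathrm{Herwig}}(N)|$, in which the first term measures performance and the second penalizes the generator gap. The functional form, the function names, and all AUC values below are assumptions made up for illustration; they are not the numbers behind Fig. 13.

```python
import numpy as np

def objective(a, auc_pythia, auc_herwig):
    """Assumed objective: performance minus a times the generator gap."""
    auc_pythia = np.asarray(auc_pythia, dtype=float)
    auc_herwig = np.asarray(auc_herwig, dtype=float)
    return auc_pythia - a * np.abs(auc_pythia - auc_herwig)

def optimal_n(a, n_values, auc_pythia, auc_herwig):
    """Return the N that maximizes the assumed objective (first maximum on ties)."""
    f = objective(a, auc_pythia, auc_herwig)
    return int(np.asarray(n_values)[np.argmax(f)])

# Made-up toy curves: the AUC saturates with N while the gap keeps growing.
n_values = [1, 3, 10, 30, 100]
toy_auc_pythia = [0.80, 0.84, 0.86, 0.87, 0.87]
toy_auc_herwig = [0.79, 0.825, 0.83, 0.83, 0.82]

n_opt_performance_only = optimal_n(0.0, n_values, toy_auc_pythia, toy_auc_herwig)
n_opt_weighted = optimal_n(4.0, n_values, toy_auc_pythia, toy_auc_herwig)
```

With these toy inputs, switching off the penalty selects the smallest $N$ on the performance plateau, whereas a larger weighting factor moves the optimum to a smaller number of subjets, which is the qualitative behavior described above.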

Conclusions
The classification of jets at collider experiments is relevant for a wide range of tasks in high-energy particle and nuclear physics. Over the past years, machine learning-based classifiers have been developed that can achieve impressive tagging performance. While machine learning generally outperforms traditional methods by efficiently making use of the full information content, it is often unclear where the performance difference is coming from.
In particular, it had been unclear if classifiers based on infrared-collinear (IRC) safe information can match the performance of IRC-unsafe classifiers. IRC safety is primarily motivated by theoretical considerations ensuring that observables are tractable in perturbative QCD.
In addition, it is expected that the very soft physics is uncorrelated with the hard partonic process, making it unlikely to be the reason for the performance gap that has been observed between IRC-unsafe machine learning results and traditional IRC-safe observables.
In order to address these questions, we introduced in this work a new family of classifiers, the Jet Flow Networks (JFNs). Here, particles inside a jet are first clustered into subjets and their position and momentum are taken as input to a permutation-invariant neural network (deep set). The clustering of subjets with a certain radius and transverse momentum cut allows us to control the sensitivity to soft and collinear emissions, making the input to the classifier IRC safe. As the subjet radius (collinear safety) and momentum cut (soft safety) vanish, we recover the IRC-unsafe Particle Flow Networks (PFNs). We investigated both inclusive and exclusive subjet clustering, which can lead to important differences depending on the application. As representative examples, we considered two classification tasks: quark vs. gluon and Z vs. QCD jet tagging. Interestingly, we observed that the JFN performance matches the IRC-unsafe PFN result for finite values of the subjet radius and the soft transverse momentum cut. This makes JFNs the first classifier based on IRC-safe input without a performance gap to their IRC-unsafe counterpart for several jet classification tasks. This observation answers the main question we aimed to address in this work: IRC-safe information is indeed sufficient for the jet classification tasks considered here. As the subjet radius is increased, the performance of the JFNs remains unchanged (and in agreement with the PFNs) until physical thresholds are crossed. For example, for quark vs. gluon jets this threshold is around 5 GeV, whereas for Z vs. QCD jets it is determined by the kinematics of the hadronic boosted decay products of the Z boson. The analogous threshold for the soft momentum cut is $\mathcal{O}(1~\mathrm{GeV})$. This indicates that while the jet classification tasks are IRC safe, nonperturbative physics is generally relevant for jet classification. In addition, we found that JFNs may offer a decreased model dependence for certain classification tasks with only a modest tradeoff in performance. This was illustrated in section 7 using a toy example, and applications may depend on the type of experimental analysis that is carried out. This observation may lead to interesting applications of JFNs in collider phenomenology.
Our results shed new light on the information that machines learn in high-energy physics applications. As more powerful algorithms are developed, it will be interesting to revisit the question of the potential gap between classifiers based on IRC-safe and IRC-unsafe information. While more work is needed in this direction, our work represents an important step toward increasing the interpretability of machine learning methods in high-energy physics. In addition, we anticipate various applications of JFNs in heavy-ion collisions and at the future Electron-Ion Collider [104].

Figure 1: Illustration of a QCD jet with $p_T = 100$ GeV and radius parameter $R = 0.4$ reclustered into subjets for subjet radii $r = 0.1$ (left), $r = 0.2$ (middle), and $r = 0.3$ (right). We use the inclusive anti-$k_T$ algorithm to identify the initial jet and the subjets. Particles are represented by small filled circles with radii proportional to the particle's transverse momentum in the $\Delta y$ vs. $\Delta\phi$ plane, where $\Delta\phi = \phi_{\mathrm{particle|subjet}} - \phi_{\mathrm{jet}}$ is the azimuthal angle with respect to the jet axis and $\Delta y = y_{\mathrm{particle|subjet}} - y_{\mathrm{jet}}$ is the rapidity distance to the jet axis. Subjets are shown with larger colored areas where red marks the leading subjet, green marks the second leading subjet, blue marks the third leading subjet, and shades of gray represent subjets with lower longitudinal momentum fraction $z = p_T^{\mathrm{subjet}}/p_T$.

Figure 2: The longitudinal momentum distribution of inclusive subjets, $z = p_T^{\mathrm{subjet}}/p_T$, originating from either a quark (blue) or a gluon (orange) jet simulated with Pythia [66]. The jets have $p_T \in [500, 550]$ GeV with an average particle multiplicity of 43. We show the distributions for inclusive anti-$k_T$ subjet clustering with $r = 0.02$ (left), corresponding to an average of approximately 30 subjets, and for exclusive $k_T$ clustering with a fixed number of $N = 30$ subjets (right). See also section 4 for more details.

Figure 4: Top panel: AUC for quark vs. gluon (left) and Z vs. QCD (right) jet tagging using JFNs with different values of the (inclusive) subjet radius $r$. The PFN classifier is shown for reference at the leftmost value of $r$. Bottom panel: ROC curves for quark vs. gluon (left) and Z vs. QCD (right) jet tagging using JFNs and the PFN with different values of the (inclusive) subjet radius $r$ for the same datasets as the upper panel.

Figure 5: AUC of the JFN as a function of the cut on the subjet transverse momentum $p_T^{\mathrm{soft}}$ for quark vs. gluon (left) and Z vs. QCD (right) jet tagging.

Figure 7: AUC for quark vs. gluon jet classification for two different choices of the perturbative shower cutoff parameter $p_{T,\mathrm{shower}}^{\min}$ in Pythia as a function of $r$. The value of $r$ where the maximum classification performance is achieved is independent of the shower cutoff within the displayed errors.

Figure 8: AUC for Z vs. QCD jets for three different jet $p_T$ intervals as a function of the subjet radius $r$.

Figure 9: Reclustering of the Z-boson decay products into subjets with different radii.

Figure 10: Distributions of the opening angle $\theta_{12}$ between the two leading subjets for both QCD and Z jets. We show the results for the two leading hadrons ($r = 0$, left column) and two representative $r$ values (middle and right columns). The upper and lower rows correspond to two intervals of the jet transverse momentum $p_T$.

Figure 11: AUC of JFNs for Z vs. QCD jets trained on the full information (inclusive subjets) compared to deep sets trained only on the two or three leading subjets.

Figure 12: Classification performance for quark vs. gluon jets using JFNs and exclusive $k_T$ clustered subjets plotted as a function of the number of subjets $N$. Upper panel: JFNs trained and tested on Pythia [66] (blue), JFNs trained on Pythia and tested on Herwig [74] (orange). Lower panel: The difference in the performance of the two results.

Figure 13: Upper panel: AUC of the JFNs trained on Pythia and tested on either Pythia or Herwig plotted as a function of the number of (exclusive) subjets $N$, see also Fig. 12. Middle and lower panels: The objective function $f(a, N)$ defined in Eq. (7.1), where $a$ is a weighting factor between optimal performance and generalizability.