End-to-End Jet Classification of Quarks and Gluons with the CMS Open Data

We describe the construction of end-to-end jet image classifiers that use simulated low-level detector data from the high-fidelity CMS Open Data to discriminate quark- vs. gluon-initiated jets. We highlight the importance of precise spatial information and demonstrate performance competitive with existing state-of-the-art jet classifiers. We further generalize the end-to-end approach to event-level classification of quark vs. gluon dijet QCD events. We compare the fully end-to-end approach to one using hand-engineered features and demonstrate that the end-to-end algorithm is robust against the effects of the underlying event and pile-up.


I. INTRODUCTION
One important aspect of searches for new physics at the Large Hadron Collider (LHC) is the classification of hadronic jets in collision events. The Compact Muon Solenoid (CMS) Collaboration uses a particle-flow reconstruction approach that converts raw detector data into progressively more physically motivated quantities [1], ultimately arriving at particle-level data. These higher-level quantities are then used as inputs to a jet classifier, which estimates the probability that a jet was initiated by a particular flavor of quark or by a gluon [2,3]. Following a similar reconstruction strategy, a number of novel jet classification algorithms based on deep neural networks have been introduced, achieving state-of-the-art performance in several simulated classification tasks that use simplified detector models [4].
In this paper, we build upon our previous work combining low-level detector data with modern deep neural networks for fully end-to-end (E2E) particle and event classification [5] and extend it to the task of jet identification. As a first application, we use the E2E approach to discriminate light-quark-initiated from gluon-initiated jets. We then tackle event classification in the context of multijet production.
While image-based jet representations have been studied extensively [6,7], especially for quark vs. gluon identification [8,9], these approaches have struggled to compete with the particle-level algorithms described above [2,4]. As we show in this paper, the use of high-fidelity detector images together with convolutional neural networks (CNNs) is vital to realizing the full potential of image-based algorithms.
This paper is arranged as follows: in Section II we introduce our data sample and event selection. In Section III we briefly describe the CMS geometry and jet image construction. In Section IV, we outline our network architecture and training strategy. The results for jet and event identification are presented in Sections V and VI, respectively. We summarize our conclusions in Section VII.

II. OPEN DATA SIMULATED SAMPLES
For the end-to-end study, we use CMS Open Data [10], which provides high-quality simulated CMS datasets well suited to E2E studies. These datasets use Geant4 [11] for detector simulation together with the most detailed geometry models of the CMS detector.
Both quark and gluon samples are taken from the same QCD dijet dataset with p_T = 90–170 GeV [12], which uses Pythia6 [13] to simulate parton hadronization. Because gluons carry both a QCD color and an anti-color, whereas quarks carry a single color, gluon-initiated jets have a higher branching probability and therefore a broader radiation pattern, which Pythia6 simulates to leading order [13]. These samples also account for multiparton interactions from the underlying event and include run-dependent pile-up (PU), with peak average PU ranging from 18 to 21 [14].
Events containing two outgoing gluons (g) from the Pythia hard scatter are assigned to the gluon event class, while events containing any two outgoing light quarks q_l, where l = u, d, s, are assigned to the quark event class. Events are required to have two reconstructed jets with transverse momentum p_T > 70 GeV and pseudorapidity |η| < 1.8, each matched to one of the partons within a cone of ∆R = 0.04, where ∆R = sqrt(∆η² + ∆φ²) is the angular separation in the pseudorapidity-azimuth (η–φ) plane. For jet identification studies, only the leading-p_T jet is used, so that each event provides a single sample for jet identification.
arXiv:1902.08276v1 [hep-ex] 21 Feb 2019
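The event selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: the helper names are hypothetical, jets are assumed to arrive as (p_T, η, φ) tuples and partons as (η, φ) pairs, and "two reconstructed jets" is interpreted as at least two jets that each pass the kinematic and matching cuts. The key detail is wrapping ∆φ into [−π, π) so that matching works across the φ = ±π boundary.

```python
import numpy as np

PT_MIN = 70.0    # GeV, jet transverse-momentum threshold
ETA_MAX = 1.8    # jet pseudorapidity acceptance
DR_MATCH = 0.04  # jet-parton matching cone, the value quoted above


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (np.asarray(phi1) - np.asarray(phi2) + np.pi) % (2 * np.pi) - np.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation sqrt(d_eta^2 + d_phi^2) in the eta-phi plane."""
    return np.hypot(np.asarray(eta1) - np.asarray(eta2), delta_phi(phi1, phi2))


def jet_passes(jet, partons):
    """jet: (pt, eta, phi); partons: array of shape (n, 2) holding (eta, phi)."""
    pt, eta, phi = jet
    if pt <= PT_MIN or abs(eta) >= ETA_MAX:
        return False
    dr = delta_r(eta, phi, partons[:, 0], partons[:, 1])
    return bool(np.any(dr < DR_MATCH))


def select_event(jets, partons):
    """Hypothetical helper: index of the leading-pT passing jet if at least
    two jets pass the selection, otherwise None."""
    jets = np.asarray(jets, dtype=float)
    partons = np.asarray(partons, dtype=float)
    by_pt = np.argsort(-jets[:, 0])  # highest pT first
    passing = [int(i) for i in by_pt if jet_passes(jets[i], partons)]
    return passing[0] if len(passing) >= 2 else None


partons = [(0.5, 1.0), (-0.3, 3.13)]
jets = [(120.0, 0.51, 1.01), (95.0, -0.31, -3.14), (80.0, 1.9, 0.0)]
print(select_event(jets, partons))  # -> 0; the second jet matches across phi = ±pi
```

The third toy jet fails the |η| cut, while the second matches its parton only because ∆φ is wrapped; without the wrap its ∆φ would be about 6.27 rad.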
For convenience, we use only a subset of the QCD dijet data. In addition, we ensure a balanced number of samples per class and a balanced PU distribution between classes. The resulting samples are broken down by run era for each class in Table I. This procedure yields a total of 793,900 samples for training and validation and 139,306 samples for the final test set.
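The text does not specify how the class and PU balancing is performed. One simple possibility, shown here purely as an assumption, is to bin the PU values and randomly undersample each class so that every PU bin holds equal numbers of quark and gluon samples. The function name and bin edges below are hypothetical.

```python
import numpy as np


def balance_by_pu(pu_quark, pu_gluon, bin_edges, seed=0):
    """Undersample both classes so every PU bin holds the same number of
    quark and gluon samples. Returns sorted indices to keep per class."""
    rng = np.random.default_rng(seed)
    bq = np.digitize(np.asarray(pu_quark), bin_edges)
    bg = np.digitize(np.asarray(pu_gluon), bin_edges)
    keep_q, keep_g = [], []
    for b in np.union1d(bq, bg):
        iq = np.flatnonzero(bq == b)
        ig = np.flatnonzero(bg == b)
        n = min(len(iq), len(ig))  # limited by the rarer class in this bin
        if n == 0:
            continue
        keep_q.append(rng.choice(iq, size=n, replace=False))
        keep_g.append(rng.choice(ig, size=n, replace=False))
    if not keep_q:
        empty = np.array([], dtype=int)
        return empty, empty
    return np.sort(np.concatenate(keep_q)), np.sort(np.concatenate(keep_g))


pu_q = np.array([10, 12, 14, 22])
pu_g = np.array([11, 20, 21, 25])
kq, kg = balance_by_pu(pu_q, pu_g, bin_edges=[15])
print(len(kq), len(kg))  # 2 2: one sample per class in each of the two PU bins
```

Undersampling discards data but keeps the two classes statistically identical in PU, so the classifier cannot use PU as a proxy for the class label.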

III. CMS DETECTOR & IMAGES
The CMS detector is arranged as a series of concentric cylindrical layers, divided into a barrel section and two circular endcap sections. The innermost layers form the inner tracking system, which identifies charged-particle tracks. This is enclosed by the electromagnetic calorimeter (ECAL), which measures energy deposits from electromagnetic particles such as electrons and photons, followed by the hadronic calorimeter (HCAL), which measures energy deposits from hadrons. Finally, the calorimeters are enclosed by the outer tracking system used to identify muons.
The CMS Open Data contain the reconstructed hits of the ECAL and HCAL at the crystal and tower level, respectively. This makes it possible to construct calorimeter images whose pixels correspond exactly to physical crystals or towers. The track information is approximated as p_T-weighted hits located at the fitted track's (η, φ) position, evaluated at the surface of closest approach to the beamline.
The images are constructed with ECAL-like granularity, with HCAL hits up-sampled to match. The difference in segmentation between the ECAL endcaps (EE), indexed by (iX, iY), and the HCAL endcaps (HE), indexed by (iη, iφ), constrains the construction of multi-channel detector images. As explained in [5], we therefore devise two image geometry strategies: one in which the EE segmentation is preserved and the HE hits are projected onto an (iX, iY) grid (ECAL-centric), and another in which the HE segmentation is preserved and the ECAL hits are projected onto an (iη, iφ) grid (HCAL-centric). In either case, the track hits follow the corresponding segmentation. This gives a full detector image of ∆iη × ∆iφ = 280 × 360 pixels with ECAL-barrel-like granularity, as illustrated in Figure 1a. Throughout this paper, we use only the HCAL-centric geometry for simplicity.
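The up-sampling of HCAL hits to ECAL-like granularity can be sketched as a block repetition. This is a minimal sketch assuming that each HCAL tower spans 5 × 5 ECAL-like pixels (so a 56 × 72 tower map fills the 280 × 360 image) and that a tower's energy is either shared evenly among its pixels or copied to each of them; the text does not say which convention is used, so both are exposed through the hypothetical `conserve_energy` flag.

```python
import numpy as np

PIXELS_PER_TOWER = 5  # assumed: one HCAL tower spans 5 x 5 ECAL-like pixels


def upsample_hcal(towers, factor=PIXELS_PER_TOWER, conserve_energy=True):
    """Repeat each HCAL tower over a factor x factor block of pixels.
    If conserve_energy is True, the tower energy is shared evenly among its
    pixels (an assumption); otherwise each pixel carries the full value."""
    block = np.ones((factor, factor))
    if conserve_energy:
        block /= factor * factor
    return np.kron(np.asarray(towers, dtype=float), block)


def stack_channels(*channels):
    """Stack same-shaped 2D sub-detector maps into a (C, H, W) image."""
    return np.stack(channels, axis=0)


rng = np.random.default_rng(0)
hcal_towers = rng.exponential(1.0, size=(56, 72))  # toy (i_eta, i_phi) tower map
hcal = upsample_hcal(hcal_towers)
tracks = np.zeros_like(hcal)  # placeholders for the other sub-detector channels
ecal = np.zeros_like(hcal)
image = stack_channels(tracks, ecal, hcal)
print(image.shape)  # (3, 280, 360)
```

`np.kron` with a constant block is a compact way to express nearest-neighbour up-sampling; every tower becomes one of the 5 × 5 pixel blocks visible in the HCAL channel.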
A jet image is created from the full detector image as follows. As a first approximation of the jet image center, we take the centroid position of the reconstructed jet passing the event selection. We then find the highest-energy HCAL tower within a 9 × 9 window of HCAL towers around this position; the coordinates of this tower define the center of the jet image. We then crop a 125 × 125 window (in ECAL granularity) from the full multi-channel HCAL-centric detector image, as illustrated in Figure 1b. In terms of coverage, this corresponds to about 25 × 25 HCAL towers, or ∆η × ∆φ = 2.175 × 2.175. The combination of jet image window size and jet image center chosen for this study imposes an effective pseudorapidity cut of about |η| < 1.57 on the jet image center.
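The two-step centering and cropping procedure can be sketched as below. This is an illustrative sketch, not the authors' implementation: it assumes the jet centroid has already been converted to HCAL tower indices (ieta0, iphi0), that each tower spans 5 × 5 pixels, that the image is centered on the middle pixel of the seed tower, that iφ wraps around, and that windows crossing the η edge of the image are rejected. The function names are hypothetical.

```python
import numpy as np

PIXELS_PER_TOWER = 5  # assumed ECAL-like pixels per HCAL tower along each axis
SEARCH = 9            # search window, in HCAL towers
CROP = 125            # jet image size, in ECAL-like pixels


def find_seed_tower(towers, ieta0, iphi0, search=SEARCH):
    """Most energetic HCAL tower within a search x search window around
    tower (ieta0, iphi0); i_phi wraps around, i_eta is clipped at the edges."""
    half = search // 2
    n_eta, n_phi = towers.shape
    etas = np.arange(max(0, ieta0 - half), min(n_eta, ieta0 + half + 1))
    phis = np.arange(iphi0 - half, iphi0 + half + 1) % n_phi
    window = towers[np.ix_(etas, phis)]
    i, j = np.unravel_index(np.argmax(window), window.shape)
    return int(etas[i]), int(phis[j])


def crop_jet_image(image, seed_ieta, seed_iphi, crop=CROP, k=PIXELS_PER_TOWER):
    """Crop a crop x crop window from a (C, eta, phi) image centred on the
    middle pixel of the seed tower; None if it crosses the eta edge."""
    _, n_eta, n_phi = image.shape
    c_eta = seed_ieta * k + k // 2
    c_phi = seed_iphi * k + k // 2
    half = crop // 2
    lo, hi = c_eta - half, c_eta + half + 1
    if lo < 0 or hi > n_eta:
        return None  # window would leave the image: an effective |eta| cut
    phis = np.arange(c_phi - half, c_phi + half + 1) % n_phi
    return image[:, lo:hi, phis]


towers = np.zeros((56, 72))
towers[28, 70] = 50.0  # toy hot tower near the phi wrap-around
seed = find_seed_tower(towers, ieta0=27, iphi0=1)
image = np.zeros((3, 280, 360))
image[1, 142, 352] = 7.0  # centre pixel of tower (28, 70)
jet_image = crop_jet_image(image, *seed)
print(seed, jet_image.shape)  # (28, 70) (3, 125, 125)
```

Because 125 = 25 × 5, the crop covers exactly 25 × 25 towers, and the rejection at the η edge is what produces an effective pseudorapidity cut on the image center.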
We present several jet image visualizations to illustrate the image construction. Figure 2 shows the various sub-detector image overlays averaged over the full test set, which contains about 70k jet images per class, while Figure 3 shows sub-detector images for a single jet. There are two main differences compared to previous jet imaging techniques [8,9]. First, the E2E images appear notably more "raw," in that they contain more noise and stray hits. This is intentional: we expect the classifier to learn to discern signal hits from noise. Second, E2E images are rendered at the finer ECAL-like granularity rather than the coarser HCAL-like granularity. Although all E2E images have the same 125 × 125 resolution, the effective feature scale differs greatly between sub-detector images: particles appear as individual, isolated pixels in the Tracks image, as showers of roughly 3 × 3 pixels in the ECAL image, and as 5 × 5 pixel blocks in the HCAL image. This classification task therefore poses a non-trivial feature extraction challenge for the CNN.
To estimate the maximum expected performance given the above image construction, we construct a generator-level image of the event that includes the underlying event but neglects pile-up. We take all stable particles from the Pythia particle table and construct a multi-channel image with hits at the (η, φ) positions of the stable particles, weighted by their p_T. We place all electrons and photons in one image channel and all remaining hadrons in another. We then form HCAL-centric full detector and jet-level images as before.
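The generator-level image amounts to two p_T-weighted 2D histograms, one per particle category. In this sketch, several details are assumptions not stated in the text: uniform η and φ binning (the true HCAL-centric grid is not uniform in η in the endcaps), an η half-width of 56 × 0.087 / 2, and identification of hadrons as particles with |PDG ID| > 100, which drops muons and neutrinos. The function name is hypothetical.

```python
import numpy as np

N_ETA, N_PHI = 280, 360    # full-image size in pixels
ETA_EDGE = 56 * 0.087 / 2  # assumed image half-width in eta (uniform binning)


def generator_image(eta, phi, pt, pdg_id, n_eta=N_ETA, n_phi=N_PHI, eta_edge=ETA_EDGE):
    """Two-channel pT-weighted image of stable generator-level particles:
    channel 0 = electrons and photons, channel 1 = hadrons (|PDG ID| > 100)."""
    eta, phi, pt = (np.asarray(a, dtype=float) for a in (eta, phi, pt))
    pid = np.abs(np.asarray(pdg_id))
    masks = (np.isin(pid, (11, 22)), pid > 100)
    bounds = [[-eta_edge, eta_edge], [-np.pi, np.pi]]
    channels = [
        np.histogram2d(eta[m], phi[m], bins=[n_eta, n_phi], range=bounds, weights=pt[m])[0]
        for m in masks
    ]
    return np.stack(channels, axis=0)


# Toy particle list: photon, electron, pi+, neutron, muon, neutrino.
eta = [0.1, -0.4, 0.2, 0.25, 1.0, -1.0]
phi = [0.2, 1.5, -0.3, -0.35, 2.0, 0.5]
pt = [10.0, 5.0, 20.0, 3.0, 7.0, 4.0]
pdg = [22, -11, 211, 2112, 13, 12]
gen = generator_image(eta, phi, pt, pdg)
print(gen.shape, gen.sum(axis=(1, 2)))  # EM channel sums to 15, hadron channel to 23
```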

IV. NETWORK & TRAINING
For jet classification, we focus on discriminating quark jets from gluon jets, while for event classification we focus on discriminating di-quark from di-gluon QCD events. In both cases, we use the same training strategy. In particular, we use the ResNet-15 CNN architecture described in [5] for the identification of H → γγ events. The ADAM adaptive learning rate optimizer [15] is used to minimize the binary cross-entropy loss with mini-batches of 32 samples. We use an initial learning rate of 5 × 10^-4 and halve it every 10 epochs, for a total of 30 training epochs. We reserve about 26k out of the 768k samples for our validation set (see Table II). We found no significant gain from tuning the original set of hyper-parameters used in [5] and therefore use them throughout this analysis. All training was done using the PyTorch [16] software library on a single NVIDIA Titan X GPU.
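The learning-rate schedule and loss described above can be written out explicitly. The sketch below is framework-free so that the numbers are easy to check; in PyTorch the same setup would typically use `torch.optim.Adam(model.parameters(), lr=5e-4)` with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)` stepped once per epoch and `torch.nn.BCELoss` (or `BCEWithLogitsLoss` on raw logits), with batches of 32 set in the data loader. The text does not say which scheduler class was actually used, so that mapping is an assumption.

```python
import math

INITIAL_LR = 5e-4  # initial learning rate
STEP_EPOCHS = 10   # halve the learning rate every 10 epochs
GAMMA = 0.5        # multiplicative decay per step
N_EPOCHS = 30      # total training epochs


def learning_rate(epoch):
    """Step decay: 5e-4 for epochs 0-9, 2.5e-4 for 10-19, 1.25e-4 for 20-29."""
    return INITIAL_LR * GAMMA ** (epoch // STEP_EPOCHS)


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy of predicted probabilities vs 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


schedule = [learning_rate(e) for e in range(N_EPOCHS)]
print(schedule[0], schedule[10], schedule[20])
print(binary_cross_entropy([0.9, 0.2], [1, 0]))
```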
For event-level classification, we construct algorithms corresponding to different ways of building classifiers from jet-level inputs. We can construct an image for each jet (see Figure 1b), apply a ResNet-15 network to each image, and feed the concatenation of the two network outputs into a fully connected neural network (FCN) that serves as an event-level classifier (algorithm A). To account for event-level kinematics, we can augment these inputs with the jet 4-momenta before passing them to the FCN (algorithm B). Empirically, the choice of using either the reconstructed jet centroids or the actual jet image centers does not affect the final results. For both algorithms A and B, the final result is not sensitive to the size or depth of the FCN, which we therefore fix at 2 hidden layers of 128 nodes each. Finally, we use a fully end-to-end approach [5], taking the full detector image (see Figure 1a) as input to a single ResNet-15 (algorithm C). The different network strategies are outlined in Table III and illustrated in Figure 1.
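The event-level head of algorithms A and B can be sketched as a small multilayer perceptron acting on concatenated per-jet features. This is a NumPy forward pass only, under stated assumptions: the size of each ResNet-15 output (16 here), the 4-component kinematic vector per jet, the ReLU activations, He initialization, and the sigmoid output are all illustrative choices. Only the two 128-node hidden layers come from the text, and the helper names are hypothetical.

```python
import numpy as np

HIDDEN = (128, 128)  # two hidden layers of 128 nodes, as in the text


def init_fcn(n_in, hidden=HIDDEN, seed=0):
    """He-initialised weights for an MLP with a single output node."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, 1]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def fcn_forward(params, x):
    """ReLU hidden layers followed by a sigmoid event-level score."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


def event_inputs(out1, out2, p4_1=None, p4_2=None):
    """Algorithm A: concatenate the two per-jet network outputs.
    Algorithm B: additionally append the two jets' kinematic vectors."""
    parts = [out1, out2] if p4_1 is None else [out1, out2, p4_1, p4_2]
    return np.concatenate(parts, axis=-1)


batch, d = 4, 16  # hypothetical batch size and per-jet output size
rng = np.random.default_rng(1)
out1, out2 = rng.normal(size=(batch, d)), rng.normal(size=(batch, d))
p4_1, p4_2 = rng.normal(size=(batch, 4)), rng.normal(size=(batch, 4))

x_a = event_inputs(out1, out2)              # algorithm A inputs, shape (4, 32)
x_b = event_inputs(out1, out2, p4_1, p4_2)  # algorithm B inputs, shape (4, 40)
score_a = fcn_forward(init_fcn(x_a.shape[1]), x_a)
score_b = fcn_forward(init_fcn(x_b.shape[1]), x_b)
print(score_a.shape, score_b.shape)  # (4, 1) (4, 1)
```

The only difference between A and B is the width of the FCN input, which is why the two algorithms can share the same head architecture.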

V. JET ID RESULTS
The E2E jet identification results for different combinations of input detector images are presented in Table IV. We use the area under the receiver operating characteristic (ROC) curve (AUC) to evaluate the performance of different algorithms. The ROC curve lends itself well to an interpretation in terms of signal efficiency (true positive rate) vs. background rejection (true negative rate), as is common in high-energy physics. In addition, we present the inverse of the false positive rate (FPR) at a fixed true positive rate (TPR) of 70%. The AUC on the validation set is used to select the best algorithm. For an unbiased estimate of performance, all final performance metrics presented here are determined from the test set, which is statistically independent of the validation set.
We first compare the performance of the single sub-detector images, shown in the bottom three rows of Table IV. The best single sub-detector performance is provided by the Tracks image, followed by ECAL, then HCAL. This suggests that precise spatial measurement of the jet constituents carries the strongest discrimination power for quark vs. gluon discrimination. This is also expected, given the differences in the shower patterns of quarks and gluons (see Section II) and the strong performance of 4-momentum-based jet classifiers [4]. Remarkably, the end-to-end approach is able to extract information from the highly sparse Tracks images (see Figure 3a), which contain isolated (p_T-weighted) pixels.
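Both metrics can be computed directly from classifier scores. The sketch below assumes label 1 denotes the signal class and reads 1/FPR at the first threshold where the TPR reaches 70%, without interpolation; the text does not specify the exact working-point convention. Equivalent results can be obtained with scikit-learn's `roc_curve` and `roc_auc_score`. The function names here are illustrative.

```python
import numpy as np


def roc_curve(labels, scores):
    """FPR and TPR at every distinct score threshold (signal = label 1)."""
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest score first
    y, s = labels[order], scores[order]
    last = np.r_[np.diff(s) != 0, True]  # last index of each tied score
    tps = np.cumsum(y)[last]
    fps = np.cumsum(1.0 - y)[last]
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def auc(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def inverse_fpr_at_tpr(fpr, tpr, target=0.7):
    """1/FPR at the first threshold reaching TPR >= target (no interpolation)."""
    idx = np.searchsorted(tpr, target, side="left")
    f = fpr[idx]
    return np.inf if f == 0 else 1.0 / f


fpr, tpr = roc_curve([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(auc(fpr, tpr), inverse_fpr_at_tpr(fpr, tpr))  # 0.75 2.0
```

The AUC summarizes the whole curve, while 1/FPR at fixed TPR probes a single working point; a classifier can improve one without the other, which is why both are reported.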
We next consider the effect of combining two sub-detector images into a single multi-channel image, as presented in the middle two rows of Table IV. We combine the Tracks and ECAL images (Tracks+ECAL) to incorporate information about photons, which are absent from the Tracks image. Alternatively, we can replace the Tracks image with the HCAL image (ECAL+HCAL), yielding a purely calorimetric image in which the charged-hadron information comes from the coarser HCAL. The former combination achieves the best discrimination so far, while the latter incurs a performance penalty from sacrificing the precise spatial information of the tracks. Even though the ECAL+HCAL image has the advantage in terms of neutral-hadron information, the Tracks-only image performs as well as the full calorimeter image.
Finally, the best overall performance is obtained when all three images are combined (Tracks+ECAL+HCAL), as shown in the second row of Table IV. Although these three images have identical resolution, their effective feature scales differ drastically (see Section III). The fact that the CNN can extract meaningful features at these different scales and deliver robust performance from the underlying information content is a testament to the power and versatility of this algorithm.
The higher performance of the generator-level images (Generated EM+Had), shown in the top row of Table IV, suggests that, while detector resolution effects may limit classification performance, better pile-up mitigation strategies could further improve it.
To put these results into context, we compare the end-to-end classifier with the current state-of-the-art jet classifier, the QCD-aware recursive neural network (RecNN) [4,17]. We use the default architecture, hyper-parameters, and training strategy implemented in [18], but with the training split and evaluation frequency modified for consistency with the E2E jet ID training. We try the different re-clustering pre-processing options available in [18]; the results are presented in Table V and plotted in Figure 4. The top scores for each algorithm represent the mean and standard deviation over 5 trials with randomized shuffling of the training set.
We find that ascending-p_T pre-processing gives the best RecNN results, although the other options are not far behind. The E2E jet image algorithm is highly competitive with the best-performing RecNN, even after accounting for the uncertainties due to the random number seeds. Previous studies [2,4] have shown image-based approaches to under-perform relative to algorithms based on 4-momenta and high-level features. Our results suggest that these differences can be attributed to limitations in image construction rather than to the use of image-based algorithms themselves. All forms of jet images have hitherto relied on HCAL granularity at the expense of spatial resolution, which, as discussed above, leads to lower performance. Further progress with jet images may therefore come from even higher-fidelity detector representations, and in particular from improving the way tracking information is presented.

VI. EVENT ID RESULTS
Next, we generalize the quark vs. gluon jet identification algorithm to the challenge of identifying full collision events that contain jets. As a proof-of-concept application, we focus on QCD dijet production, which can originate from quarks or gluons. We construct the event classification algorithms described in Section IV. The results are summarized in Table VI, with the corresponding ROC curves in Figure 5. For reference, we also include E2E results based on generator-level particle information (algorithm C-Gen).
These results suggest that event classification performance is dominated by jet-level differences (algorithm A), with negligible gain from including jet 4-momentum information (algorithm B). The results of algorithm B were not sensitive to whether the reconstructed jet positions or the actual jet image centers were used. Since both dijet classes have non-resonant kinematics, this is consistent with expectation [19]. The fully E2E approach (algorithm C) carries a slight advantage, which is likely due to efficient feature extraction from a unified detector image. For more complicated event topologies with variable jet multiplicity and possible overlaps among jets, we expect the fully E2E approach to be even more competitive. Lastly, we note that the performance with detector-reconstructed vs. generated inputs is closer than it was at the jet level, suggesting that a view of the complete event aids in pile-up mitigation.
To better understand the effect of the underlying event and pile-up in the fully E2E approach, we additionally train the classifier on full detector images with pixel intensities outside the jet windows zeroed out (algorithm C-Zero). We also perform a transfer learning study, evaluating the classifier trained on the original images (algorithm C) on the zeroed-out images, and vice versa. The results of this study are presented in Table VII.
As these results indicate, the loss in performance from either training starting point is minimal, showing that the end-to-end algorithm is insensitive to the underlying event and pile-up outside the jet region of interest.
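The zeroing-out used for algorithm C-Zero can be sketched as a boolean mask over the full detector image. This sketch assumes that the jet windows have the same 125 × 125 size as the jet images, that jet centers are given in pixel coordinates, and that φ wraps around while η is clipped at the image edges. The function name is hypothetical.

```python
import numpy as np

CROP = 125  # assumed jet window size in pixels, matching the jet images


def zero_outside_jets(image, centers, crop=CROP):
    """Keep pixels inside a crop x crop window around each jet centre of a
    (C, eta, phi) image and zero everything else; phi wraps around."""
    _, n_eta, n_phi = image.shape
    half = crop // 2
    mask = np.zeros((n_eta, n_phi), dtype=bool)
    for c_eta, c_phi in centers:
        etas = np.arange(max(0, c_eta - half), min(n_eta, c_eta + half + 1))
        phis = np.arange(c_phi - half, c_phi + half + 1) % n_phi
        mask[np.ix_(etas, phis)] = True
    return image * mask  # the 2D mask broadcasts over the channel axis


image = np.ones((3, 280, 360))
zeroed = zero_outside_jets(image, centers=[(100, 10), (200, 200)])
print(int(zeroed[0].sum()))  # 31250 = 2 * 125 * 125 (the windows do not overlap)
```

Comparing a classifier's score on `image` and on `zeroed` is the kind of transfer test described above: if the scores barely change, the network is not relying on activity outside the jets.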

VII. CONCLUSIONS
In this paper, we demonstrated the application of the end-to-end deep learning technique to quark vs. gluon jet classification and extended it to event-level classification. We applied the end-to-end technique to isolated jets by constructing high-fidelity, granular multi-channel detector images from the Geant4-based simulated CMS 2012 Open Data. Using a ResNet-15 convolutional neural network, we demonstrated the ability of the end-to-end algorithm to effectively extract features across different detector scales, obtaining classification performance highly competitive with current state-of-the-art quark vs. gluon jet classifiers. We found that precise spatial information is of paramount importance, highlighting the central role of track information in jet identification. For full event classification of di-quark vs. di-gluon events, we found that performance is largely dominated by individual jet-level differences. Finally, we showed that fully end-to-end algorithms are robust against the underlying event and pile-up, making them a compelling option for event topologies that are difficult to model by hand. Given the importance of spatial resolution, we aim to investigate more comprehensive representations of particle tracking information in future work.