Signal-background separation and energy reconstruction of gamma rays using pattern spectra and convolutional neural networks for the Small-Sized Telescopes of the Cherenkov Telescope Array

Imaging Atmospheric Cherenkov Telescopes (IACTs) detect very-high-energy gamma rays from ground level by capturing the Cherenkov light of the induced particle showers. Convolutional neural networks (CNNs) can be trained on IACT camera images of such events to differentiate the signal from the background and to reconstruct the energy of the initial gamma ray. Pattern spectra provide a 2-dimensional histogram of the sizes and shapes of features comprising an image, and they can be used as an input for a CNN to significantly reduce the computational power required to train it. In this work, we generate pattern spectra from simulated gamma-ray and proton images to train a CNN for signal-background separation and energy reconstruction for the Small-Sized Telescopes (SSTs) of the Cherenkov Telescope Array (CTA). A comparison of our results with a CNN directly trained on CTA images shows that the pattern spectra-based analysis is about a factor of three less computationally expensive but is not able to compete with the performance of the CTA image-based analysis. Thus, we conclude that the CTA images must contain additional information that is not represented by the pattern spectra.


Introduction
When a gamma ray reaches the Earth's atmosphere, it induces a cascade of secondary particles known as an air shower. The secondary particles can reach velocities higher than the speed of light in air, inducing a flash of Cherenkov light [1]. The Cherenkov light can be captured by Imaging Atmospheric Cherenkov Telescopes (IACTs) from the ground to reconstruct specific properties of the initial particle, such as its species, energy and direction (see [2][3][4] for an overview of ground-based gamma-ray astronomy). The Cherenkov Telescope Array (CTA) [5] is the next-generation ground-based observatory for gamma-ray astronomy at very high energies, offering 5-10 times better flux sensitivity than current-generation gamma-ray telescopes [6], such as H.E.S.S. [7], MAGIC [8] and VERITAS [9]. It will cover a wide energy range from 20 GeV to 300 TeV, benefiting from three different telescope types: Large-Sized Telescopes (LSTs), Medium-Sized Telescopes (MSTs) and Small-Sized Telescopes (SSTs). The CTA Observatory will be distributed over two arrays, one in the northern hemisphere on La Palma (Spain) and one in the southern hemisphere near Paranal (Chile). CTA will outperform the energy and angular resolution of current instruments, providing an energy resolution of ∼5 % around 1 TeV and an angular resolution of 1′ at its upper energy range. With its short-timescale capabilities and large field of view of 4.5°-8.5°, it will enable the observation of a wide range of astronomical sources, including transient, high-variability or extended gamma-ray sources. Several analysis methods for IACT data have been developed to classify the initial particle and reconstruct its energy and direction. Hillas parameters [10] are one of the first reconstruction techniques, proposed by A. M.
Hillas in 1985. They describe features of the Cherenkov emission within the camera images and are widely used as input to machine learning algorithms like Random Forests [11] or Boosted Decision Trees [12][13][14] to perform full event reconstruction of gamma rays. Another approach is the ImPACT algorithm [15], which performs event reconstruction using expected image templates generated from Monte Carlo simulations. Other methods such as the model analysis [16] and the 3D model analysis [17], which are based on a semi-analytical shower model and a Gaussian photosphere shower model respectively, managed to be more sensitive to certain properties of the shower [18]. Recently, convolutional neural networks (CNNs) [19][20][21] have been proposed and applied to IACT data [22][23][24][25][26][27][28][29][30][31][32][33][34]. CNNs are machine learning algorithms that are specialised for image data and are currently one of the most successful tools for image classification and regression tasks [35]. They rely on convolutional layers, which consist of image filters that are able to extract relevant features within an image. Among many others, models such as AlexNet [36], GoogLeNet [37] and ResNet [38] established many new techniques, such as the Rectified Linear Unit (ReLU) [39] activation function and deeper architectures, which set the milestones for many subsequent architectures. ResNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015 by introducing shortcut connections into the architecture, achieving a top-5 classification error of only 3.6 % [40]. CNNs that contain these shortcut connections often achieve higher performance and are referred to as residual neural networks (ResNets). The first event classifications with a CNN trained on IACT images were presented in [22] and [23], which demonstrated the signal-background separation capabilities of CNNs. Later work has shown the energy and direction reconstruction capabilities of CNNs for gamma rays
[24][25][26][27], their ability to run in stereo telescope mode [28,29] and to be applied to real data [31][32][33]. In particular, the ResNet architecture has been shown to perform well for full event reconstruction with CTA data [30]. However, one of the main drawbacks of this method is that the training of CNNs is computationally very expensive [41]. It typically requires access to computing clusters with powerful graphics processing units (GPUs) and large amounts of random-access memory (RAM). The larger the dimension of the input image, the larger the computational power and time needed for the CNN training. A significant reduction of the dimension of the input image without any performance losses would therefore result in substantial savings in hardware and human resources, increase the efficiency of related scientific work and lower the environmental impact of CNNs [42]. One approach to this problem is pattern spectra [43], which are commonly used tools for image classification [44][45][46] and can significantly reduce the computational power needed to train CNNs. They provide a 2-dimensional distribution of the sizes and shapes of features within an image and can be constructed using a technique known as granulometries [47]. The features within the image are extracted with connected operators [48], which merge regions within an image with the same grey-scale value. Compared to other feature extraction techniques, this approach has the advantage of not introducing any distortions into the image. In this work, we generate pattern spectra from simulated CTA images and use them to train a ResNet for signal-background separation and energy reconstruction of gamma rays. The application of a ResNet to pattern spectra takes advantage of their 2D nature by selecting relevant combinations of features within the CTA images. Our pattern spectra algorithm is based on the work presented in [44], which provides two main advantages compared to other existing pattern spectra algorithms: (i) the
computing time for creating the pattern spectra is independent of their dimensions and (ii) it is significantly less sensitive to noise. These properties merit the investigation of a pattern spectra-based analysis for IACTs. Direction reconstruction of gamma rays is not considered here since pattern spectra are rotation invariant, meaning that the same CTA image rotated by an arbitrary angle would result in the same pattern spectrum. By generating pattern spectra from simulated CTA images, we aim to obtain a competitive algorithm that is significantly faster and less computationally intensive while achieving a performance comparable to a CNN trained on CTA images in terms of signal-background separation and energy reconstruction of gamma rays.
The structure of this article is as follows: In Section 2, the CTA dataset used in this analysis is described. Section 3 is devoted to our analysis methods, including the pattern spectra algorithm, the ResNet architecture and the performance evaluation methods for our algorithms. The results are shown in Section 4 and discussed in detail in Section 5. Finally, we state our conclusions in Section 6. The source code of this project is publicly available at [49].

Dataset
The dataset consists of simulated gamma-ray and proton events detected by the southern CTA array (Prod5 DL1 (ctapipe v0.10.5 [52]), zenith angle of 20°, North pointing [53,54]). Due to the hexagonal pixels of the LST and MST cameras, which cannot be processed by the current version of the pattern spectra algorithm, only the 37 SSTs with rectangular pixels are considered in this analysis. The SST images containing the charge information, i.e. the integrated photodetector pulse, will be referred to as CTA images in the following. CTA images generated by gamma rays with an energy between 500 GeV and 100 TeV and by protons with an energy between 1.5 TeV and 100 TeV have been considered for this study to match the operating energy range of the SSTs.
For the energy reconstruction, ∼3 × 10^6 gamma-ray events generated with a 0.4° offset from the telescope pointing position, referred to as pointlike gamma rays in the following, are used. For the signal-background separation, ∼2 × 10^6 diffuse gamma rays and ∼2 × 10^6 diffuse protons are used, where the term diffuse describes events generated in a view cone of 10°. The pointlike and diffuse events are considered in the analysis to represent real observation conditions. When observing a source, background events reach the telescopes not only from the direction of the source but potentially from a much larger view cone. However, using pointlike gamma-ray and diffuse proton events for signal-background separation would introduce a bias into the learning process of the CNN. Therefore, we consider diffuse events for the signal-background separation task and pointlike events for the energy reconstruction task.
In particular at high energies, the dataset often includes single events that were captured by multiple SSTs, resulting in several CTA images for a single event. Since constructing and training a CNN that is able to handle a varying number of input images is very challenging, we constructed a single CTA image for each event as a first step towards the implementation of pattern spectra for the analysis of CTA images. In order to obtain a single CTA image per event, all CTA images of the same event are combined into a single image by adding up the individual pixel values of each image. We are aware that this reduces the performance of the array, but we adopt this strategy to simplify our proof-of-concept work. Furthermore, we tested our analysis in mono mode, i.e. using a single CTA image per event, and found that the image stacking does not have any adverse effect on the performance of our pattern spectra analysis. However, we do not promote the idea of image stacking for CNN analyses with CTA data when trying to maximise the performance of the CNN.

Figure 4: Top: Architecture of the thin residual neural network (TRN) [51]. For each convolutional layer, the filter size and number of filters are specified. Bottom: Building block with a linear shortcut connection (left) and non-linear shortcut connection (right) (adapted from [38]).
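The event-wise image stacking described above can be sketched as follows. This is a minimal NumPy illustration; the array shapes and the function name are our own and not taken from the published analysis code:

```python
import numpy as np

def stack_event_images(images):
    """Combine all SST images of one event into a single image.

    images: array-like of shape (n_telescopes, height, width) holding the
    charge (integrated photodetector pulse) seen by each triggered SST.
    The stacked image is simply the pixel-wise sum over all telescopes.
    """
    images = np.asarray(images, dtype=float)
    return images.sum(axis=0)

# Toy example: two telescopes seeing the same shower at different positions
img_a = np.zeros((8, 8)); img_a[2, 2] = 5.0
img_b = np.zeros((8, 8)); img_b[5, 5] = 3.0
stacked = stack_event_images([img_a, img_b])
```

Summing preserves the total recorded charge of the event but, as noted above, discards the per-telescope information a stereo analysis would exploit.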

Pattern spectra
The algorithm used to extract pattern spectra from the CTA images is based on the work presented in [44] and is briefly summarised in the following. Let f be a grey-scale image with grey levels h. In the case of CTA images, the grey levels h correspond to the set of unique pixel values within the image. Consider an image domain E ⊆ R^2 and let the set X ⊆ E denote a binary image with domain E. A grain of a binary image X is defined as a connected component C of X. Therefore, grains are distinct regions that represent various structures and elements within the image X.
The peak components P_h^k(f) of an image f are defined as the k-th grain of the threshold set T_h(f), which is defined as

T_h(f) = {x ∈ E | f(x) ≥ h}.

Starting with the lowest grey level h_0 of the image f, the threshold set T_{h_0}(f) always consists of a single peak component P_{h_0}^0(f), independent of the image f. Increasing the grey level h to the next larger value h_1, the threshold set T_{h_1}(f) consists of k peak components P_{h_1}^k(f), which are the grains of the binary image T_{h_1}(f). The grey levels are subsequently increased until the highest grey level h_n is reached. Figure 1 (a) shows an example of a 2D grey-scale image and (b) the corresponding peak components P_h^k(f). In this particular example, the image consists of four grey levels h = {0, 1, 2, 3}. For the grey levels h = {0, 1, 3}, the threshold set T_h(f) consists of a single peak component P_h^0(f). For grey level h = 2, two peak components, P_2^0(f) and P_2^1(f), are present. This is due to the fact that two distinct regions (grains) with grey level h ≥ 2 are present within the image. In addition to the threshold set T_h(f), consider the set

Q_h(f) = {x ∈ E | f(x) = h},

i.e. the set of pixels with grey level exactly h. The nodes N_h^k(f) of an image f are defined as the connected components C of T_h(f) such that C ∩ Q_h(f) ≠ ∅. A way to hierarchically represent the nodes N_h^k(f) is the so-called Max-tree. The Max-tree of the previous example image is shown in Figure 1. Its root node represents the set of pixels with the lowest grey level h_0, i.e. the set of pixels belonging to the background. The children of the root node represent the set of pixels with the next larger grey level h_1, and so on. The leaf nodes of the Max-tree represent the set of pixels with the highest grey level h_n, i.e. the set of pixels belonging to the foreground. For each image f, a Max-tree is computed according to the algorithm described in [44]. The pattern spectra are based on the size and shape attributes of the peak components P_h^k(f). The size attribute corresponds to the area A(P_h^k(f)), which is computed as the number of pixels belonging to the detected feature. The shape attribute corresponds to I/A^2, with the moment of inertia I describing the sum of squared distances of the feature's pixels to its centre of gravity. The size and shape attributes are binned into N = 20 size classes r and N = 20 shape classes s, which results in a good compromise between the performance of the pattern spectra and the computational power needed to train the ResNet. The 2D pattern spectrum Φ[r, s] is computed from the Max-tree as follows [44]:

1. Allocate a 2D array Φ[r, s] with N × N elements.
2. Set all elements of Φ[r, s] to zero.
3. For each node N_h^k(f) of the Max-tree, compute the size class r from the area A(P_h^k(f)), the shape class s from I/A^2, and the grey-level difference δh between the current node and its parent.
4. Add the product of δh and A(P_h^k(f)) to Φ[r, s].
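This accumulation can be sketched without the Max-tree by iterating over the threshold sets directly, which yields the same result for each node but is much slower than the algorithm of [44]. The binning scheme below is a simple placeholder, not the one used in this work, and the contribution of the lowest grey level (the root) is ignored:

```python
import numpy as np

def connected_components(mask):
    """4-connected components of a boolean mask, as lists of (row, col)."""
    visited = np.zeros(mask.shape, dtype=bool)
    comps = []
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and not visited[i, j]:
                stack, comp = [(i, j)], []
                visited[i, j] = True
                while stack:  # iterative flood fill
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                comps.append(comp)
    return comps

def pattern_spectrum(image, n_bins=20):
    """Accumulate delta_h * A into a (size, shape) histogram Phi[r, s].

    For each grey level h above the lowest, every peak component of the
    threshold set T_h = {x : f(x) >= h} contributes the grey-level gap
    delta_h times its area A to the bin given by its size class (from A)
    and its shape class (from I/A^2, with I the moment of inertia about
    the component's centroid).
    """
    phi = np.zeros((n_bins, n_bins))
    levels = np.unique(image)
    for h_prev, h in zip(levels[:-1], levels[1:]):
        delta_h = h - h_prev
        for comp in connected_components(image >= h):
            pix = np.array(comp, dtype=float)
            area = len(pix)
            inertia = ((pix - pix.mean(axis=0)) ** 2).sum()
            shape = inertia / area**2
            r = min(int(np.log2(area)), n_bins - 1)  # size class (placeholder binning)
            s = min(int(4 * shape), n_bins - 1)      # shape class (placeholder binning)
            phi[r, s] += delta_h * area
    return phi
```

For an image containing two isolated bright pixels, both features are single pixels (A = 1, I = 0) and end up in the smallest size/shape bin, mirroring the noise-like features discussed for Figure 2 (c) and (d).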
The current version of the algorithm is designed to work on images with square pixels, but it could be adapted to work with hexagonal pixels in the future. An example of a pattern spectrum extracted from a CTA image is shown in Figure 2. The image in (a) shows a CTA image of a 1.9 TeV gamma-ray event that was captured by eight SSTs. The bright features in the centre of the image correspond to the Cherenkov emission induced by the particle shower. Due to the different locations of the SSTs, the Cherenkov light is captured with different intensities and at different positions on the SST cameras. The pattern spectrum generated from the CTA image is shown in Figure 2 (b). Each pattern spectrum pixel represents a set of detected features. An example of the detected features is shown in Figure 2 (c) & (d). The image in (c) shows a set of detected features within the CTA image highlighted in red. The image in (d) shows the pattern spectrum with the red pixel representing these features. This specific example shows features with a small A and a small I/A^2, referring to features with a small size and a circular-like shape. They correspond to individual pixels in the CTA image and represent mostly noise. Another example is shown in (e) and (f) of Figure 2. Compared to the previous example, the red-marked pattern spectrum pixels correspond to larger A and I/A^2 values. Thus, the highlighted objects (red/orange) in the CTA image correspond to features with a larger size and a more elliptical-like shape. The detected features in this example are of particular interest since they represent the Cherenkov photons induced by the particle shower, which contain information about the type and energy of the initial particle. The pattern spectrum of a 1.9 TeV proton event is shown in Figure 3. The features within the proton image differ significantly from the features present in the gamma-ray image shown in Figure 2.
Whereas the gamma-ray event results in mainly elliptical features, the features from the proton event vary notably more in shape and size. At first glance, it is difficult to identify major differences between the pattern spectrum extracted from the gamma-ray event and that from the proton event. However, on closer inspection one can see that the gamma-ray pattern spectrum contains more features at larger I/A^2 values and the proton pattern spectrum more features at smaller I/A^2 values. This meets our expectation because (i) gamma-ray events result in more elliptical-like features and (ii) the proton image contains more circular-like features. This does not mean, however, that proton events do not contain any elliptical-like features. A classifier can therefore be trained on these differences to distinguish between gamma-ray and proton events.

Residual neural network architecture
For the signal-background separation and energy reconstruction of gamma-ray events, two individual but almost identical ResNet architectures are constructed and trained with either CTA images or pattern spectra. The architectures of our ResNets are identical to the ResNets presented in [51] and are based on the work presented in [30,38,55]. The ResNet is illustrated in Figure 4. Due to the rather shallow architecture compared to the ResNet presented in [38], we refer to our architectures as thin residual neural networks (TRNs) in the following. They are constructed using TensorFlow 2.3.1 [56] and Keras 2.4.3 [57] and consist of 13 convolutional layers with the Rectified Linear Unit (ReLU) [39] activation function, a global average pooling layer and two fully connected (dense) layers with 64 and 32 neurons, respectively. The output layer consists of a single neuron for the energy reconstruction and of two neurons with a softmax [58] activation function for the signal-background separation. Shortcut connections [38] at every third convolutional layer were implemented in order to improve the stability and performance of the algorithm. The solid arrows in Figure 4 represent linear shortcut connections, in which the input x of a building block is added to the output F(x) of the last layer of the building block. If the input and output of a building block have different dimensions, the input x is passed through an additional convolutional layer with the same number of filters as the last layer of the building block. The output G(x) of this residual operation is added to the output F(x) of the last layer of the building block. A filter size of 1 × 1 is used for all shortcut connections with a convolutional operation. In total, the two TRNs have about 150 000 trainable parameters.
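The shortcut logic described above can be illustrated with a minimal NumPy forward pass. Here 1 × 1 convolutions stand in for the actual 3 × 3 layers of the TRN, and all names and shapes are illustrative rather than taken from the TRN implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w):
    # A 1 x 1 convolution is a per-pixel linear map across channels:
    # x has shape (H, W, C_in), w has shape (C_in, C_out).
    return x @ w

def residual_block(x, w1, w2, w_proj=None):
    """One building block: F(x) plus a shortcut connection.

    If the channel count is unchanged, the shortcut is the identity
    (linear shortcut); otherwise x is projected with a 1 x 1 convolution
    G(x) so that the shapes of F(x) and the shortcut match.
    """
    f = conv1x1(relu(conv1x1(x, w1)), w2)                   # F(x)
    shortcut = x if w_proj is None else conv1x1(x, w_proj)  # x or G(x)
    return relu(f + shortcut)

# With zero weights, F(x) = 0 and the block reduces to ReLU of the shortcut,
# which is what makes residual blocks easy to optimise initially.
x = np.random.default_rng(0).normal(size=(4, 4, 8))
out = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

The design choice mirrors [38]: the block only has to learn the residual F(x) on top of the shortcut, rather than the full mapping.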

Network training and performance metrics
The TRNs described in the previous section are trained and evaluated 10 times each on the datasets for both signal-background separation and energy reconstruction in order to perform a statistical analysis of the training process. Similar to the work presented in [30], a multiplicity cut of four or more triggered telescopes is applied to both the gamma-ray and proton events. The dataset is split into 90 % training data, of which 10 % is used as validation data, and 10 % test data. The weights of the TRN are initialised using the Glorot uniform initialiser [59], and the training, validation and test data are randomised for each run. The adaptive moment (ADAM) optimizer [60] with a learning rate of 0.001 and a batch size of 32 is used for the TRN training. The training is stopped if there is no improvement on the validation dataset for 20 epochs, and the model with the lowest validation loss is saved. The categorical cross-entropy and the mean squared error [61] are applied as loss functions for the signal-background separation and energy reconstruction, respectively. The results shown in Section 4 are obtained by evaluating the performance of each TRN on the test data.
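The split described above (90 % training, of which 10 % serves as validation, plus 10 % test, reshuffled for every run) can be sketched as follows; the function name and seed handling are our own assumptions:

```python
import numpy as np

def split_indices(n_events, seed):
    """Randomly split event indices into train / validation / test sets.

    10 % of all events form the test set; of the remaining 90 %,
    10 % are used for validation and the rest for training.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_events)
    n_test = n_events // 10
    test, trainval = idx[:n_test], idx[n_test:]
    n_val = len(trainval) // 10
    val, train = trainval[:n_val], trainval[n_val:]
    return train, val, test

train, val, test = split_indices(1000, seed=0)
```

Using a different seed per run reproduces the per-run randomisation of the training, validation and test data mentioned above.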

Signal-background separation
Each event is labelled by its gammaness Γ, where Γ = 1 corresponds to a gamma ray (photon) and Γ = 0 corresponds to a proton. The output of the TRN is a Γ-value between 0 and 1, which describes a pseudo-probability of the event being a photon according to the TRN. For a fixed Γ-threshold α_Γ, the photon efficiency η_γ is defined as η_γ = TP/P, where TP is the number of true positives, i.e. photon events with Γ ≥ α_Γ (correctly classified photons), and P is the total number of positives (photons) that pass the selection criteria described in Section 2. Similarly, the proton efficiency η_p is defined as η_p = FP/N, where FP is the number of false positives, i.e. proton events

Figure 7: Left: mean effective area A_eff as a function of the true energy E_true obtained from 10 independent TRNs. Right: mean proton efficiency η_p as a function of the true energy E_true obtained from 10 independent TRNs. The proton efficiency was calculated by fixing the photon efficiency η_γ to 90 % for each energy bin. Note that all simulated events are used to calculate the effective area A_eff, but only the events that pass the selection criteria are used to calculate the proton efficiency η_p (see text for more details).
with Γ ≥ α_Γ (misclassified protons), and N is the total number of negatives (protons) that pass the selection criteria. A good classifier results in a high photon efficiency η_γ and a low proton efficiency η_p for a given Γ-threshold. In order to evaluate the performance of our TRNs, the efficiencies as a function of the Γ-threshold and the effective area A_eff as a function of the true energy E_true are calculated. The effective area is determined as A_eff = η̃_γ · A_geom, where A_geom is the geometrical area of the instrument, i.e. A_geom = π r_max^2 with r_max being the maximum simulated impact radius, and η̃_γ = TP/P̃ with P̃ being the total number of simulated photons, including the events that did not pass the selection criteria in Section 2. Similarly, we define η̃_p = FP/Ñ with Ñ being the total number of simulated protons. The energy range is split into seven logarithmic bins, and each event is assigned to an energy bin based on its true energy E_true. The effective area is then calculated for each energy bin by increasing the Γ-threshold until η̃_p = 10^-3 is reached and extracting the corresponding η̃_γ. The value η̃_p = 10^-3 is motivated by the photon flux of the Crab Nebula being about three orders of magnitude lower than the isotropic flux of cosmic rays (CRs) within an angle of 1 deg around the direction of the source: Φ^Crab_γ ≈ 10^-3 · Φ_CR [2]. Similarly, we determine the proton efficiency η_p as a function of the true energy E_true by fixing the photon efficiency η_γ to 90 % for each energy bin. Lastly, the receiver operating characteristic (ROC) curve [62] is determined. The ROC curve describes the photon efficiency η_γ versus the proton efficiency η_p. The area under the ROC curve (AUC) is calculated and used as a measure of the performance of each TRN. For part of our calculations we make use of pyirf v0.7.0 [63], which is a python library for the generation of Instrument Response Functions (IRFs) and sensitivities for CTA. From the 10 TRNs, the mean
efficiencies, effective area, ROC curve and the AUC-value are calculated for both the CTA images and pattern spectra-based analyses.
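These metrics can be sketched as follows: efficiencies for a fixed threshold, the effective area from the simulated-photon efficiency, and the AUC via trapezoidal integration of the η_γ-versus-η_p curve over a threshold scan. Variable names and the threshold grid are ours:

```python
import numpy as np

def efficiencies(gammaness, is_photon, alpha):
    """Photon and proton efficiencies for a gammaness threshold alpha."""
    passed = gammaness >= alpha
    eta_gamma = passed[is_photon].mean()   # TP / P
    eta_p = passed[~is_photon].mean()      # FP / N
    return eta_gamma, eta_p

def effective_area(n_true_positives, n_simulated_photons, r_max):
    """A_eff = (TP / total simulated photons) * pi * r_max^2."""
    return n_true_positives / n_simulated_photons * np.pi * r_max**2

def roc_auc(gammaness, is_photon):
    """Area under the eta_gamma vs eta_p curve, scanned over thresholds."""
    thresholds = np.linspace(1.0, 0.0, 201)
    pairs = [efficiencies(gammaness, is_photon, a) for a in thresholds]
    eta_gamma, eta_p = map(np.array, zip(*pairs))
    # trapezoidal rule over the (eta_p, eta_gamma) curve
    return float(np.sum(np.diff(eta_p) * (eta_gamma[1:] + eta_gamma[:-1]) / 2))

# Toy example: a perfect classifier gives AUC = 1
g = np.concatenate([np.ones(50), np.zeros(50)])
photon = np.array([True] * 50 + [False] * 50)
```

A perfect separation pushes the ROC curve into the top-left corner (η_γ = 1 at η_p = 0), so the AUC approaches 1, consistent with the AUC-values quoted in Section 4.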

Energy reconstruction
The gamma-ray events are labelled by their true energy E_true, which the TRN learns to predict based on the training input. The performance of the TRN on the test data is evaluated by comparing the reconstructed energy E_rec of the TRN with the true energy E_true of the initial gamma ray. To this end, the relative energy error ∆E/E_true = (E_rec − E_true)/E_true is calculated for each event. The whole energy range between 500 GeV and 100 TeV is split into seven logarithmic bins, and each event is assigned to an energy bin based on its true energy E_true. For each of these energy bins, the distribution of the relative energy error ∆E/E_true is determined and its median calculated. The median of ∆E/E_true is referred to as the energy bias in the following. An energy bias close to zero indicates a good energy accuracy of the algorithm. The distributions of the relative energy error ∆E/E_true are then bias-corrected by subtracting the median, i.e. (∆E/E_true)_corr = ∆E/E_true − median(∆E/E_true). The energy resolution is defined as the 68th percentile of the distribution |(∆E/E_true)_corr|. From the 10 TRNs, the mean energy bias and energy resolution with their standard deviations are calculated for each energy bin for both the CTA images and pattern spectra-based analyses.
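The bias and resolution definitions above translate directly into code (a minimal sketch; the function name is ours):

```python
import numpy as np

def energy_bias_resolution(e_rec, e_true):
    """Energy bias and resolution from reconstructed and true energies.

    Bias: median of the relative error (E_rec - E_true) / E_true.
    Resolution: 68th percentile of |relative error - bias|.
    """
    rel_err = (e_rec - e_true) / e_true
    bias = np.median(rel_err)
    resolution = np.percentile(np.abs(rel_err - bias), 68)
    return bias, resolution

# Toy example: a reconstruction that is systematically 10 % too high
e_true = np.full(1000, 2.0)   # TeV
e_rec = 1.1 * e_true
bias, resolution = energy_bias_resolution(e_rec, e_true)
```

In this toy case the bias is 10 % and the resolution is zero, since the bias correction removes the (purely systematic) error completely; in practice this would be computed per logarithmic energy bin.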

Signal-background separation
Two examples of the gammaness distributions obtained from a single TRN trained with the CTA images and with the pattern spectra are shown in Figure 5. Figure 5 (left) shows a distinct separation between photon and proton events for the TRN trained with CTA images. The majority of photon events are classified with Γ = 1 and the majority of proton events with Γ = 0. The number of proton (photon) events continuously decreases for larger (smaller) Γ-values, which indicates a good separation capability of the TRN.

Figure 5 (right) shows the performance of the TRN trained with the pattern spectra, which results in a lower signal-background separation capability compared to the TRN trained with CTA images. Once again, the majority of photon events are classified with Γ = 1 and the majority of proton events with Γ = 0. However, the distributions decrease less rapidly compared to the CTA images-based analysis. The mean photon efficiency η_γ and proton efficiency η_p as a function of the Γ-threshold α_Γ are shown in Figure 6. The shaded regions in this figure and the following ones depict the standard deviation across the 10 TRNs. Both the photon efficiency and the proton efficiency decrease steadily with increasing α_Γ. Up to α_Γ ∼ 0.1, the pattern spectra-based analysis results in a very similar photon efficiency but in a much higher proton efficiency compared to the CTA images-based analysis. The proton efficiency of the pattern spectra approaches a value similar to that of the CTA images at α_Γ ∼ 0.9, at which point, however, the CTA images outperform the pattern spectra in the photon efficiency. Therefore, the CTA images result overall in better photon and proton efficiencies independent of the Γ-threshold α_Γ. The mean ROC curve and the corresponding AUC-value are shown in Figure 6 (right). As expected from the gammaness distributions discussed above, the ROC curve obtained from the CTA images is significantly steeper than the ROC curve obtained from the pattern spectra. The mean AUC-value of 0.987 for the CTA images is larger by a factor of 1.06 than the value of 0.929 obtained from the pattern spectra. Figure 7 (left) shows the mean effective area A_eff as a function of the true energy E_true. The CTA images result in a higher effective area than the pattern spectra for all energies. The difference between the two analyses increases with increasing energy. The CTA images result in a maximum effective area of ∼12.8 × 10^5 m^2 at ∼80 TeV, whereas the pattern
spectra result in a maximum effective area of ∼7.0 × 10^5 m^2 at ∼80 TeV, which corresponds to a factor of 1.8 between the two analyses.
The mean proton efficiency η_p as a function of the true energy E_true is shown in Figure 7 (right). The CTA images result in a lower proton efficiency, i.e. fewer misclassified protons, than the pattern spectra for all energies. For a fixed photon efficiency of 90 %, both analyses achieve their lowest proton efficiency at ∼7 TeV, with η_p ≈ 2 % for the CTA images and η_p ≈ 14 % for the pattern spectra. Percentage-wise, the difference is notably smaller at higher energies. In the highest energy bin at ∼80 TeV, the CTA images result in a proton efficiency of ∼20 % and the pattern spectra of ∼50 %.
Overall, the TRN trained with CTA images shows a higher signal-background separation capability than the pattern spectra-based analysis. We discuss potential causes and implications of these results in Section 5.

Energy reconstruction
Figure 8 shows two examples of the energy migration matrices, i.e. the 2D histograms of E_rec against E_true, obtained from a single TRN trained with the CTA images and with the pattern spectra. Most of the events are distributed around the E_rec = E_true line for both the CTA images and the pattern spectra-based analysis. However, the distribution obtained from the pattern spectra is more spread out compared to the CTA images-based analysis.
The mean energy accuracy obtained from 10 independent TRNs is shown in Figure 9 (left). The energy biases obtained from the CTA images-based analysis are closely distributed around 0, with the largest energy bias of ∼5 % in the lowest energy bin. The energy bias obtained from the pattern spectra-based analysis reaches up to ∼20 %, with the largest energy biases in the lowest and highest energy bins. The absolute value of the energy bias obtained from the pattern spectra-based analysis is larger than the values obtained from the CTA images for all energies. The mean energy resolution obtained from 10 independent TRNs is shown in Figure 9 (right). The energy resolution of the CTA images-based analysis ranges from 0.08 to 0.12 with a minimum at ∼7.5 TeV. While we simplified our analysis by stacking the CTA images for each event, the energy resolution still meets the CTA requirements [64] for all energy bins except the lowest one. The pattern spectra result in an energy resolution between 0.22 and 0.25 with a minimum in the highest energy bin and do not meet the CTA requirements. Thus, the CTA images-based analysis outperforms the pattern spectra for all energies, with a maximum factor of 2.9 between the two curves at ∼7.5 TeV.

Discussion
A comparison of the computational performance of the two analyses is shown in Figure 10. The TRN training with pattern spectra is about a factor of 2.5 faster and requires a factor of 2.5 less RAM compared to the TRN training with CTA images. The pattern spectra are capable of detecting and classifying relevant features in the CTA images, which is illustrated by the gammaness distributions shown in Figure 5 (right) and the energy migration matrix shown in Figure 8 (right). However, the pattern spectra-based analysis is outperformed by the CTA images with respect to its signal-background separation and energy reconstruction capabilities. For a given Γ-threshold α_Γ, the pattern spectra result in poorer photon and proton efficiencies compared to the CTA images (see Figure 6), which is a main drawback of the analysis since both efficiencies are important quantities for the analysis of real gamma-ray data. Moreover, we infer from the effective area and proton efficiency versus energy plots shown in Figure 7 that the signal-background separation capabilities of the pattern spectra-based analysis are below the capabilities of the CTA images-based analysis independent of the energy of the initial particle. Note that the different shapes of the effective area and proton efficiency curves are due to the fact that the effective area is determined using η̃_γ and η̃_p, i.e. considering the total number of simulated photons and protons, whereas the proton efficiency η_p in Figure 7 (right) considers only those events that passed the selection criteria. The lowest proton efficiency η_p is reached at an energy of ∼5-10 TeV, which corresponds to the energy range for which the SSTs are expected to have the highest flux sensitivity [65]. For energies larger than ∼10 TeV, we suppose that the increasing proton efficiency η_p is caused by the increasing leakage, i.e.
the fraction of the image intensity contained in the outermost pixels of the camera, in both the proton and gamma-ray images.The AUC-value obtained from the CTA images is a factor 1.06 larger than the pattern spectra AUC-value and illustrates once again the overall lower signalbackground capabilities of the pattern spectra-based analysis.
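The efficiency and AUC quantities discussed above can be sketched as follows. This is a minimal illustration under our own assumptions (toy gammaness distributions; AUC computed as the Mann-Whitney probability that a gamma event outscores a proton event), not the analysis code used in this work:

```python
import random

def efficiencies(gamma_scores, proton_scores, threshold):
    """Photon and proton efficiency for a gammaness cut: an event is
    kept if its gammaness exceeds the threshold, and each efficiency
    is the kept fraction of the corresponding event class."""
    eff_gamma = sum(g > threshold for g in gamma_scores) / len(gamma_scores)
    eff_proton = sum(p > threshold for p in proton_scores) / len(proton_scores)
    return eff_gamma, eff_proton

def auc(gamma_scores, proton_scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    random gamma event scores higher than a random proton event."""
    wins = 0.0
    for g in gamma_scores:
        for p in proton_scores:
            wins += 1.0 if g > p else (0.5 if g == p else 0.0)
    return wins / (len(gamma_scores) * len(proton_scores))

random.seed(1)
# toy gammaness distributions: signal peaks near 1, background near 0
gammas = [min(1.0, max(0.0, random.gauss(0.8, 0.15))) for _ in range(300)]
protons = [min(1.0, max(0.0, random.gauss(0.2, 0.15))) for _ in range(300)]
eff_g, eff_p = efficiencies(gammas, protons, 0.5)
```

Sweeping the threshold from 0 to 1 traces out the efficiency curves in Figure 6 (left), and the resulting (ηp, ηγ) pairs form the ROC curve whose integral is the AUC.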
The CTA images result in a better energy resolution and a lower energy bias than the pattern spectra at all energies. Although our choice of attributes, i.e. the size and shape attributes, is well motivated, these two attributes do not seem to be sufficient to fully describe all relevant features within the CTA images. Potentially, the pattern spectra might not be able to detect, e.g., the electromagnetic substructure in proton showers. Other feature attributes, e.g. the perimeter, the sum of grey levels and the compactness (perimeter/A²), were tested for both signal-background separation and energy reconstruction but did not result in a significantly better performance. Furthermore, we applied pattern spectra to other algorithms, including classification and regression trees (CART) [66], Learning Vector Quantization (LVQ) and Generalized Matrix Learning Vector Quantization (GMLVQ) [67]. None of these algorithms achieved a better performance than the TRN. We therefore conclude that the TRN relies on features within the CTA images that are not detected by the pattern spectra algorithm. We suppose that the features within the CTA images are too complex to be sufficiently described by two attributes: for any given feature, one can always find a different feature with the same size and shape attribute values. This ambiguity arguably makes it harder for any classifier to distinguish between gamma-ray and proton events or to reconstruct the energy of a gamma ray. Adding more than two attributes to the pattern spectrum, resulting in an n-dimensional pattern spectrum, might improve the performance of the algorithm. However, the computational power required to train a classifier with such an n-dimensional pattern spectrum would increase significantly and would likely exceed that required to train a classifier with CTA images. Given the results of the 2D pattern spectra presented in this work, we doubt that an n-dimensional pattern spectrum would outperform the CTA images. Therefore, we decided not to pursue this idea further. The performances stated in this work do not represent the expected performance of the CTA Observatory at the end of its construction phase.
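As an illustration of how a 2-dimensional pattern spectrum is organized, the sketch below builds a histogram over a size attribute (area A) and a shape attribute (I/A²), with each feature weighted by its intensity. The attribute definitions, binning, ranges and weighting here are our own assumptions for illustration, not the implementation used in this work:

```python
import math
import random

def pattern_spectrum(features, size_bins, shape_bins):
    """Minimal 2-D pattern-spectrum sketch: a histogram over the
    feature area A and the shape attribute I/A^2, where each feature
    contributes its summed intensity I to one bin."""
    spectrum = [[0.0] * shape_bins for _ in range(size_bins)]
    for area, intensity in features:
        shape = intensity / area**2
        # logarithmic binning, clamped to the histogram range
        i = min(size_bins - 1, max(0, int(math.log10(area) / 4 * size_bins)))
        j = min(shape_bins - 1, max(0, int((math.log10(shape) + 4) / 4 * shape_bins)))
        spectrum[i][j] += intensity
    return spectrum

random.seed(2)
# toy features: (area in pixels, summed intensity)
feats = [(random.randint(1, 5000), random.uniform(1.0, 1e3)) for _ in range(200)]
spec = pattern_spectrum(feats, size_bins=20, shape_bins=20)
```

Adding a third attribute would turn `spectrum` into a 3-D array, which conveys why the size of an n-dimensional pattern spectrum, and hence the training cost, grows multiplicatively with each added attribute.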

Conclusions
For the first time, signal-background separation and energy reconstruction of gamma rays were performed using pattern spectra. We have shown that the pattern spectra algorithm has the capability to detect and classify relevant features in IACT images. The detected features are capable of differentiating between gamma-ray and proton events and of reconstructing the energy of gamma-ray events. Training the TRN with pattern spectra requires a factor of 2.5 less RAM and is about a factor of 2.5 faster than training it with CTA images, which agrees with our expectation given the smaller size of the pattern spectra compared to the CTA images. This reduction in computational power was one of the main motivations for testing the performance of pattern spectra on IACT data. However, the pattern spectra-based analysis is not competitive with the CTA images-based analysis in signal-background separation or energy reconstruction. The AUC-value, which is a measure of the signal-background separation capability of an algorithm, obtained from the CTA images is a factor of 1.06 larger than the value obtained from the pattern spectra. The CTA images result in a better energy accuracy and energy resolution at all energies, with a maximum factor of 2.9 in energy resolution at ∼7.5 TeV compared to the pattern spectra. We therefore conclude that the relevant features within the CTA images are not sufficiently detected or described by our choice of size and shape attributes. Other sets of attributes were tested but resulted in no major improvements. Thus, the TRN trained on CTA images must rely on additional features not captured by the pattern spectra. In other applications, especially when the input images are larger or vary in size, the results may differ.

Figure 2: (a) CTA image of a 1.9 TeV gamma-ray event captured by eight SSTs. (b) Pattern spectrum extracted from the CTA image. (c) CTA image with a set of detected features highlighted in red. (d) Pattern spectrum with the pixel corresponding to the detected features (small A and I/A²) highlighted in red. (e) CTA images with a different set of detected features (sub-features) highlighted in red (orange). (f) Pattern spectrum with the pixels corresponding to the detected features (intermediate A and I/A²) highlighted in red.

Figure 5: Example of the gammaness distributions obtained from a single TRN trained with CTA images (left) and pattern spectra (right).

Figure 6: Left: Mean photon efficiency ηγ and proton efficiency ηp as a function of the Γ-threshold αΓ obtained from 10 independent TRNs. Right: Mean ROC curve and mean AUC-value obtained from 10 independent TRNs. The solid black line corresponds to the ROC curve expected from a random classifier. The performances stated here do not represent the expected performance of the CTA Observatory at the end of its construction phase.

Figure 8: Example of the energy migration matrix obtained from a single TRN trained with CTA images (left) and pattern spectra (right).

Figure 10: Mean time (left) and RAM (right) required to train the TRN for signal-background separation and energy reconstruction obtained from 10 independent TRNs for each analysis.The training was performed on a Nvidia A100 GPU.