Using Deep Learning to Enhance Event Geometry Reconstruction for the Telescope Array Surface Detector

The extremely low flux of ultra-high-energy cosmic rays (UHECR) makes their direct observation by orbital experiments practically impossible. For this reason, all current and planned UHECR experiments detect cosmic rays indirectly, by observing the extensive air showers (EAS) initiated by cosmic-ray particles in the atmosphere. The world's largest statistics of ultra-high-energy EAS events are recorded by networks of surface stations. In this paper we consider a novel approach to the reconstruction of the arrival direction of the primary particle, based on a deep convolutional neural network. The latter uses the raw time-resolved signals of the set of adjacent triggered stations as input. The Telescope Array Surface Detector is an array of 507 stations, each containing two layers of plastic scintillator with an area of $3$ m$^2$. The model is trained on a Monte Carlo dataset. It is shown that, within the Monte Carlo simulations, the new approach yields better resolution than the traditional reconstruction method based on the fitting of the EAS front. The details of the network architecture and its optimisation for this particular task are discussed.


Introduction
The identification of ultra-high energy cosmic ray (UHECR) sources is one of the most difficult problems of modern astroparticle physics. UHECRs are believed to have an extragalactic origin, since the galactic magnetic field is not capable of confining them inside the Milky Way. The well-known Hillas criterion [1] strictly limits the set of potential UHECR source classes, based on the requirement that a particle's Larmor radius does not exceed the accelerator size R; otherwise the particle escapes the accelerator and cannot gain energy any further. This translates into a condition for the maximum acceleration energy E_max as a function of the particle electric charge q and the typical magnetic field strength B in the source environment: E_max ≤ qBR. Therefore heavier nuclei can be accelerated to higher energies, although radiation-loss restrictions may also apply [2]. Powerful active galaxies, starburst galaxies and shocks in galaxy clusters are often considered among the possible UHECR sources.
Due to the extremely low UHECR flux, all present UHECR experiments detect cosmic rays indirectly, by observing the extensive air showers (EAS) initiated by cosmic-ray particles in the atmosphere. The two largest UHECR experiments, the Pierre Auger Observatory (Auger) [3] and the Telescope Array (TA) [4], operate in the Southern and Northern Hemispheres, respectively. Both experiments combine two shower observation techniques. First, they observe the lateral shower density profiles at ground level using a network of surface detector (SD) stations. Second, the shower longitudinal profiles are observed by fluorescence detector (FD) telescope stations via the fluorescence light from excited nitrogen molecules. Note that since the fluorescence light observation is only possible during moonless nights, with the additional requirement of clear weather, the FD duty cycle is approximately 10% of that of the SD. The TA observatory, the largest experiment in the Northern Hemisphere, covers an area of over 700 km$^2$ in Utah, USA, with over 500 ground scintillation stations placed on a rectangular grid with 1.2 km spacing, and with 3 fluorescence telescope stations placed at the corners of the SD array.
The shower reconstruction procedure involves modelling particle interactions in the atmosphere, which requires the extrapolation of the nucleon-nucleus interaction cross section up to energies of hundreds of TeV in the center-of-mass system. This introduces an unavoidable systematic uncertainty into the estimates of all primary particle properties. The UHECR arrival direction, estimated using the shower front timing measurements (see below), is the least model-dependent quantity. However, its interpretation strongly depends on the assumption about the primary particle nature. The latter is very difficult to infer from the observable EAS properties, because showers initiated by protons and heavy nuclei are very similar. So far, only the average UHECR composition can be estimated, based on the observation of a large number of showers [5,6]. Moreover, upper limits on the possible admixture of γ-rays [7,8] and neutrinos [9,10] in the UHECR flux have been obtained. Recent studies by Auger [11] indicate that the UHECR mass composition changes from mostly light nuclei at around several EeV to heavier ones above 10 EeV, which means that the highest-energy cosmic-ray arrival directions may not necessarily point back to their sources, due to substantial (several tens of degrees) deflections of cosmic rays in the Galactic magnetic field. This fact makes the source identification task extremely difficult.
The search for UHECR sources has been addressed with multiple approaches. One group of approaches is based on the study of large- and intermediate-scale anisotropies in the distribution of UHECR arrival directions. The results of these methods include the indication of a hotspot by TA [12] and the measurement of a dipole by Auger [13,14]. Another manifestation of the large-scale UHECR anisotropy is the difference between the TA and Auger energy spectra [15]; the latter is not significant in the region of the sky observed by both experiments. Though weakly dependent on the UHECR source model and composition, the large- and intermediate-scale anisotropy approach can give only a general picture of what the real sources could be [16][17][18]. These results are more demanding of the accuracy of the UHECR energy reconstruction than of the accuracy of the arrival direction reconstruction.
Another group of approaches focuses on small-scale anisotropies and assumes the testing of some predefined UHECR source model. There exists, however, a freedom in the choice of the source parameters in the construction of the predefined search catalogue. The blind scanning over these parameters leads to a statistical penalty, which should be taken into account. The results of these methods may possess a certain level of ambiguity, since physically different source models can lead to similar UHECR distributions on the sky. Despite these challenges, this method is potentially able to discover particular UHECR sources. Among the results of the application of these methods are the HiRes evidence for a small fraction of UHECRs from BL Lacs [19,20], as well as a recent Auger result on starburst galaxies [21]. Yet another approach is based on the study of the auto-correlation of events at small angles and the search for spatial doublets or triplets of events [22]. Although the present auto-correlation results are consistent with an isotropic distribution of events, lower limits on the density of UHECR sources have been established [23,24]. The methods of this group can benefit significantly from an improvement of the experiments' angular resolution, especially in searches that assume small deflections of the particles from their sources.
There are also "hybrid" approaches to the anisotropy search that simultaneously use information about the UHECR arrival directions and energies. Possible applications of these methods include searches for specific energy-angle correlation patterns in the UHECR distribution [25][26][27][28]. An improvement in the experiment's angular resolution may boost analyses of this kind, while a simultaneous improvement in the energy resolution may also be required.
In this paper we consider the reconstruction procedure used in the Telescope Array experiment's surface detector. We propose an event reconstruction enhancement method based on machine learning algorithms and illustrate its capacity to recover the arrival direction of the primary particle. The paper is organised as follows: in Section 2 we briefly review the standard reconstruction procedure of UHECR properties used in the TA SD and then describe a new method to enhance this reconstruction by means of a convolutional neural network (CNN). In Section 3 we compare the results of the CNN-enhanced reconstruction of UHECR arrival directions with the respective results of the standard TA SD reconstruction. We summarize in Section 4.

Method
In this work we focus on the event reconstruction of the surface detector array. Modern experiments record the full time-resolved signal of each SD station (in the case of the Telescope Array, in each of the two layers of the scintillator). The traditional analysis methods are based mostly on the values that could be measured by the previous generation of detectors: the arrival time of the first particle and the integral signal of each detector. The chemical composition analysis performed by the Auger and TA collaborations uses a number of empirically established integral characteristics of the measured signal. In both cases, not all of the data available for the analysis are used.
Machine learning methods, in particular deep convolutional neural networks, have been very successful in image recognition and many related tasks, including challenges in astroparticle and particle physics. Due to these advancements it is now possible to perform the analysis utilizing all the available experimental data. The method we propose uses the existing standard SD event reconstruction as a first approximation. We describe it below.

Standard Reconstruction
In the standard SD event reconstruction procedure [29] the event geometry is reconstructed using the arrival times of the shower front measured by the triggered stations. The shower front is approximated by the empirical functions first proposed by J. Linsley and L. Scarsi [30], then modified by the AGASA experiment [31] and fine-tuned to fit the TA data in a self-consistent manner. The integral signals of the individual stations are used to estimate the normalization S_800 of the shower lateral distribution profile [32].
The fit of the shower front and the lateral distribution function is performed with 7 free parameters [33]: the position of the shower core (x_core, y_core), the arrival direction of the shower in horizontal coordinates (θ, φ), the signal amplitude S_800 at a distance of 800 meters from the shower core, the time offset t_0 and the Linsley front curvature parameter a [30]. Empirical functions t(r) and f(r) are employed for the shower front and the lateral distribution, respectively. Here r is the distance from the shower axis, R_core is the location of the shower core, R_i are the locations of the stations of an event, obtained in the pre-defined coordinate system of the array centered at the Central Laser Facility (CLF) [34], t_plane is the arrival time of the shower plane at the distance r, n is the unit vector towards the arrival direction of the primary particle, and c is the speed of light. After the fit is performed, the primary particle energy is estimated as a function of the density S_800 at a distance of 800 m from the shower core and of the zenith angle θ, using a lookup table obtained with the Monte Carlo simulation [29].
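The plane-front arrival time t_plane entering the fit follows directly from the quantities defined above. A minimal numpy sketch, where the sign convention and the km/microsecond units are assumptions for illustration (not the TA analysis code):

```python
import numpy as np

C = 0.2998  # speed of light, km per microsecond (approximate)

def plane_front_time(t0, n, r_core, r_station, c=C):
    """Arrival time of a plane shower front at a station.

    t0        : time offset of the core crossing (microseconds)
    n         : unit vector towards the arrival direction of the primary
    r_core    : shower core position (km)
    r_station : station position (km)

    Sign convention (an assumption here): the front travels along -n,
    so stations displaced opposite to n are reached later.
    """
    d = np.asarray(r_station, float) - np.asarray(r_core, float)
    return t0 - np.dot(np.asarray(n, float), d) / c

# A vertical shower (n points to the zenith) reaches all stations
# on the ground plane simultaneously:
n_vert = np.array([0.0, 0.0, 1.0])
t = plane_front_time(0.0, n_vert, [0.0, 0.0, 0.0], [1.2, 0.0, 0.0])
```

For an inclined shower the same function gives station-to-station delays of the order of microseconds for the 1.2 km grid spacing, which is what the shower-front fit exploits.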
In this work we use the full Monte Carlo simulation of TA SD events induced by protons and nuclei for both the standard and the machine learning-based reconstructions. We sample p, He, N and Fe nuclei in equal fractions. Primary energies are distributed assuming the spectrum of the HiRes experiment [35]. The events are generated by CORSIKA [36] with the EGS4 [37] model for the electromagnetic interactions, and QGSJET II-03 [38] and FLUKA [39] for the high- and low-energy hadronic interactions, respectively. Both proton and nuclei event sets are sampled assuming an isotropic primary flux with zenith angles θ < 60°. We apply the standard anisotropy quality cuts [40] to the simulated events in the same way they are applied to the data. The standard cuts include the constraint on the reconstructed zenith angle θ_rec < 55°.

Deep Learning-based Event Reconstruction
In essence, the method uses machine learning to construct the inverse function for the full detector Monte Carlo simulation. The latter allows one to calculate the raw observables as a function of the primary particle properties. In the case of the Telescope Array surface detector, the raw observables are the time-resolved signals of the set of adjacent triggered detectors. The application of this approach to an ultra-high-energy cosmic-ray experiment was first demonstrated in Ref. [41] on a toy Monte Carlo simulation roughly following the geometry of the Auger observatory. In this work we use the full detector Monte Carlo simulation [29] of the TA observatory to produce the time-resolved signals from the two layers of the SD stations. The inverse function is constructed by means of a multi-layer feed-forward artificial neural network (NN). This class of parameterized algorithms is known to be able to approximate any continuous function with any finite accuracy [42]. The NN free parameters, the weights, are tuned in the so-called training procedure, which is in essence an error minimization on a set of training examples.
In practice, the achievable accuracy is often limited by the stochastic nature of the problem, which introduces an unavoidable bias. In the case of EAS development, the main source of stochastic uncertainty is perhaps the first particle interaction, which occurs at a random depth. In this work we try to enhance the existing event reconstruction procedure. To rate the relative performance of the enhanced reconstruction with respect to the baseline one, the explained variance score can be used:

EV = 1 − Var(y − ŷ) / Var(y).    (2.1)

Here y is the true value of the quantity being predicted, which can also be interpreted as the error of the baseline model (zero approximation), and ŷ is our estimate of y, i.e. the correction to the zero approximation calculated in the enhanced model. The ideal reconstruction would have EV = 1, while the presence of an unavoidable bias means that EV is limited from above by some value EV_max < 1. The systematic uncertainty, e.g. that of the hadronic interaction model at ultra-high energies, does not nominally constrain the NN accuracy; however, it should be taken into account separately.

For our purpose we utilize a convolutional NN, a widely used special kind of feed-forward NN optimized for image and sequence processing [43]. In Fig. 1 we show an example of the NN architecture used in this work. A typical UHECR event triggers from 5 to 10 neighbouring stations. The readouts from 4 × 4 or 6 × 6 stations around the event core are used as input, with each pixel corresponding to a particular station. There are two time-resolved signals, one per layer, for each detector. The typical length of the signal recorded with 20 ns time resolution does not exceed 256 points. The overall dimensionality of the raw input data is therefore 4 × 4 × 256 × 2 or 6 × 6 × 256 × 2. The approximate position of the event core is estimated using the standard reconstruction.
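A minimal sketch of the explained variance score, assuming the standard definition EV = 1 − Var(y − ŷ)/Var(y) (the same quantity computed by scikit-learn's explained_variance_score):

```python
import numpy as np

def explained_variance(y, y_hat):
    """EV = 1 - Var(y - y_hat) / Var(y).

    y     : baseline-model errors (the quantity the NN should predict)
    y_hat : NN-predicted corrections to the baseline
    EV = 1 means the correction removes the baseline error entirely;
    EV = 0 means no improvement over the baseline reconstruction.
    """
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.var(y - y_hat) / np.var(y)

rng = np.random.default_rng(0)
y = rng.normal(size=1000)                          # baseline errors
perfect = explained_variance(y, y)                 # ideal correction
useless = explained_variance(y, np.zeros_like(y))  # no correction at all
```

An irreducible stochastic component in y (e.g. the random depth of the first interaction) would cap EV below 1 no matter how good the predictor is, which is the EV_max < 1 effect described above.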
The full time-resolved signal from the two scintillator layers of each surface detector is first converted to a 28-dimensional feature vector using a waveform encoder consisting of six convolutional layers followed by max-pooling. Scaled exponential linear units are used as the activations to achieve the self-normalizing property [44].
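The shape arithmetic of such an encoder can be illustrated with a toy numpy sketch. The kernels below are random stand-ins for trained weights, ReLU replaces SELU for brevity, and the filter count of 7 is an assumption chosen only so that a 256-sample, two-layer waveform collapses into a 28-dimensional feature vector after six conv + pool stages:

```python
import numpy as np

def conv1d(x, kernels):
    """'Same'-padded 1D convolution: x is (L, C_in), kernels (K, C_in, C_out)."""
    K, C_in, C_out = kernels.shape
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], C_out))
    for i in range(x.shape[0]):
        # sum over the kernel window and input channels
        out[i] = np.einsum('kc,kco->o', xp[i:i + K], kernels)
    return np.maximum(out, 0.0)  # ReLU as a stand-in for SELU

def maxpool1d(x, size=2):
    L = (x.shape[0] // size) * size
    return x[:L].reshape(-1, size, x.shape[1]).max(axis=1)

rng = np.random.default_rng(1)
x = rng.random((256, 2))                 # 256 time bins, two scintillator layers
for _ in range(6):                       # six conv + max-pool stages
    k = rng.normal(size=(3, x.shape[1], 7)) * 0.1
    x = maxpool1d(conv1d(x, k))
features = x.reshape(-1)                 # 4 remaining positions x 7 filters = 28
```

Each pooling halves the 256-bin waveform, so six stages leave 4 positions; flattening with 7 filters yields the 28 features per detector that feed the 2D model.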
The extracted features of the waveforms are treated as a multichannel image by a sequence of 2D convolution layers. We also add some extra detector (or detector signal) properties S to the set of the extracted detector signal features before feeding them into the 2D convolution model. These properties include the exact detector position (more precisely, the altitude and the offset from the rectangular grid), the detector state (on/off/saturated) and standard reconstruction parameters: the integral signal and the timing relative to the plane front.
In order to properly take the missing or switched-off detectors into account, we have introduced a special Normalize layer, somewhat similar to Dropout [45]. Namely, it drops the pixels corresponding to the missing detectors and multiplies the activations of the present detectors by a factor of N_total/(N_total − N_missing). This trick enhances the explained variance (2.1) by a few percent.
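A minimal numpy sketch of such a Normalize layer; the (H, W, C) pixel layout is an assumption for illustration:

```python
import numpy as np

def normalize_missing(activations, present_mask):
    """Zero out the pixels of missing/off detectors and rescale the rest.

    activations  : (H, W, C) feature map, one pixel per station
    present_mask : (H, W) boolean, True where the detector was working
    Surviving activations are scaled by N_total / (N_total - N_missing),
    in the spirit of inverted dropout.
    """
    mask = present_mask.astype(float)[..., None]
    n_total = present_mask.size
    n_missing = n_total - present_mask.sum()
    scale = n_total / (n_total - n_missing)
    return activations * mask * scale

acts = np.ones((4, 4, 3))
mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False                  # one missing detector in the 4 x 4 grid
out = normalize_missing(acts, mask)
```

The rescaling keeps the total activation flowing into the next layer roughly independent of how many detectors happen to be missing, which is exactly what dropout's inverse scaling achieves at training time.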
The 2D-convolutional part of the model consists of two blocks built from two convolutional layers with 3 × 3 and 2 × 2 kernels followed by max-pooling. The input signal of each convolutional block is concatenated with its output before the pooling operation, which facilitates the reuse of the extracted features at different scales. We tried to replace the classic convolution blocks with separable convolutions, as suggested in [41], and found no positive effect on the accuracy. The 2D convolutional model outputs a feature vector which is then processed by two fully connected layers. Before the feature vector is passed to the first dense layer, it is concatenated with N extra event properties, such as the season, the time of day and (optionally) the standard reconstruction data (e.g. S_800). At this step we found it useful to add a set of 14 composition-sensitive synthetic observables calculated in the standard reconstruction [46]. The output may contain one or several observables. In this work we use the three components of the arrival direction unit vector.
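The concatenate-then-pool structure of a block can be sketched in numpy; here a fixed 1 × 1 channel-mixing map stands in for the actual trained 3 × 3 and 2 × 2 convolutions, and all shapes are illustrative:

```python
import numpy as np

def maxpool2d(x, size=2):
    H = (x.shape[0] // size) * size
    W = (x.shape[1] // size) * size
    x = x[:H, :W]
    return x.reshape(H // size, size, W // size, size, -1).max(axis=(1, 3))

def conv_block(x, conv):
    """One block: convolve, concatenate with the block input, then pool."""
    y = conv(x)                          # stand-in for the 3x3 + 2x2 conv pair
    y = np.concatenate([x, y], axis=-1)  # reuse input features at this scale
    return maxpool2d(y)

rng = np.random.default_rng(2)
mix = rng.normal(size=(8, 16))                 # toy 1x1 "convolution" weights
conv = lambda x: np.maximum(x @ mix, 0.0)      # ReLU channel mixing

x = rng.random((4, 4, 8))                      # 4 x 4 station grid, 8 channels
out = conv_block(x, conv)                      # halved spatially, 8 + 16 channels
```

The concatenation means later layers see both the raw block input and its convolved version, which is the feature-reuse effect described above.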
The mean square error is used as the loss function. For small errors it is proportional to the average square of the angular distance between the true and predicted directions. The weights are optimized using the adaptive learning rate method Adadelta [47]. The optimization is performed for at most 400 epochs with a conditional early stop, i.e. it runs until the loss function on the validation data has not improved for more than 5 epochs. We tried to apply several regularization techniques, such as L2-regularisation, noise admixture and α-dropout [44], and eventually found the early-stop procedure to be sufficient to avoid overfitting.
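The claim about the loss can be checked directly: for unit vectors u and v separated by an angle θ, the squared chord length is |u − v|² = 2(1 − cos θ) ≈ θ² for small θ, so minimizing the MSE over the three direction components minimizes the mean squared angular distance:

```python
import numpy as np

def unit(theta_deg, phi_deg=0.0):
    """Unit vector from zenith angle theta and azimuth phi (degrees)."""
    t, p = np.radians(theta_deg), np.radians(phi_deg)
    return np.array([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)])

u = unit(30.0)
v = unit(31.0)                        # exactly 1 degree away

mse = np.sum((u - v) ** 2)            # squared chord = 2 * (1 - cos(angle))
angle = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))  # radians
ratio = mse / angle ** 2              # -> 1 as the angle -> 0
```

At 1° the chord and arc already agree to a few parts in 10⁵, so the MSE on the direction components is an excellent proxy for the squared angular error at the resolutions discussed in this paper.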
The raw waveform data are converted to the log scale before being passed as input to the model. The rest of the input and output data are normalized to ensure zero mean and unit variance. Finally, for the arrival direction reconstruction we found it useful to predict the correction to the standard reconstruction instead of the absolute value.
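A sketch of this preprocessing; the input shapes and the small eps offset protecting against empty (zero) time bins are assumptions for illustration:

```python
import numpy as np

def preprocess(waveforms, scalars, eps=1e-3):
    """Input preprocessing sketch.

    waveforms : non-negative raw counts      -> log scale
    scalars   : remaining inputs/outputs     -> zero mean, unit variance
    (eps avoids log(0) in empty time bins; its value is an assumption)
    """
    wf = np.log(np.asarray(waveforms, float) + eps)
    s = np.asarray(scalars, float)
    s = (s - s.mean(axis=0)) / s.std(axis=0)
    return wf, s

rng = np.random.default_rng(3)
raw_wf = rng.random((16, 256))               # 16 traces of 256 time bins
raw_s = rng.normal(5.0, 2.0, (1000, 4))      # 4 scalar features, 1000 events
wf, s = preprocess(raw_wf, raw_s)
```

The log scale compresses the large dynamic range of the scintillator signals, while the standardization keeps all scalar features on a comparable footing for the optimizer.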
The NN model is implemented in Python using the Keras [48] library with the TensorFlow backend. We also use the hyperopt package [49] to optimize the model hyperparameters, such as the dimensionality of the waveform encoder output, the shapes of the convolution kernels and the dense layer sizes.

Results
As a first application of the event geometry reconstruction, we evaluate the event zenith angle θ. In Fig. 2 the accuracy of the estimation of this observable is compared between the standard and CNN reconstructions. One may see that both the bias and the width of the distribution are smaller for the CNN reconstruction than for the standard method. A parameter that is less technical and more interesting for physical applications is the angular resolution of the reconstruction. We calculate the angular resolution as the 68th percentile of the angular distance between the true and reconstructed cosmic ray arrival directions. The comparison of the angular distance distributions of the standard and NN reconstructions for the proton Monte Carlo event set, for energies larger than 10 EeV and 57 EeV, is shown in Fig. 3. In Fig. 4 we plot the angular resolutions for the proton and iron Monte Carlo event sets as functions of the reconstructed energy, using either the QGSJETII-03 or the QGSJETII-04 hadronic interaction model. The training data were composed of Monte Carlo events initiated by an H, He, N and Fe nuclei mixture in equal proportions, calculated with the QGSJETII-03 hadronic interaction model. The angular resolution at a given energy is better on average for heavier nuclei, since the EAS produced by heavy nuclei are typically wider and trigger more detectors. This seems to be the main reason for the resolution difference, as is clear from Fig. 5, where the dependence of the resolution on the number of triggered detectors is shown for protons and iron nuclei.
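The 68th-percentile definition of the angular resolution can be sketched as follows; the Gaussian smearing below is synthetic test data (not TA events), used only to exercise the estimator:

```python
import numpy as np

def angular_resolution(n_true, n_reco):
    """68th percentile (degrees) of the angular distance between true and
    reconstructed arrival directions, given as (N, 3) unit vectors."""
    cos = np.clip(np.sum(n_true * n_reco, axis=1), -1.0, 1.0)
    return np.percentile(np.degrees(np.arccos(cos)), 68)

# smear directions around the zenith by a ~1 degree Gaussian tilt
rng = np.random.default_rng(4)
N = 20000
n_true = np.tile([0.0, 0.0, 1.0], (N, 1))
theta = np.radians(rng.normal(0.0, 1.0, N))
phi = rng.uniform(0.0, 2.0 * np.pi, N)
n_reco = np.stack([np.sin(theta) * np.cos(phi),
                   np.sin(theta) * np.sin(phi),
                   np.cos(theta)], axis=1)
res = angular_resolution(n_true, n_reco)
```

For this 1° Gaussian tilt the 68th percentile of the angular distance comes out close to 1°, as expected for a half-normal distribution.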
The results presented above were obtained with the NN taking the 4 × 4 detector raw-signal grid as input. The 6 × 6 architectures show similar performance but take much longer to train. This is expected, since most of the events have just 5-6 neighbouring detectors triggered. The larger grid may be useful for the reconstruction of higher-energy events, where the number of triggered stations is larger.

Conclusion
[Fig. 4: Angular resolution for the standard (red curves) and CNN-enhanced (blue curves) reconstructions of the proton (solid lines) and iron (dashed lines) Monte Carlo event sets simulated using the QGSJETII-03 (left plot) or QGSJETII-04 (right plot) hadronic interaction models.]

We have shown that deep learning-based methods allow one to substantially enhance the accuracy of the TA SD event geometry reconstruction. The angular resolution for proton-induced showers is improved from 1.35° to 1.07° at the primary energy of 1 EeV. The systematic uncertainties related to the choice of the hadronic interaction model, which are known to limit the applicability of the method for the determination of the primary particle mass and energy, appear to be almost irrelevant for the arrival direction reconstruction. We plan to apply the same approach to the reconstruction of photon candidate events, which is important for improving the directional photon flux limits. We also continue the work on an improved method operating with relaxed quality cuts, which helps to increase the exposure.

We acknowledge help in configuring the high-performance computing system based on graphics cards. The work is supported by the Russian Science Foundation grant 17-72-20291.