DeepRICH: Learning Deeply Cherenkov Detectors

Imaging Cherenkov detectors are widely used for particle identification (PID) in nuclear and particle physics experiments, where fast reconstruction algorithms are becoming of paramount importance to allow for near real-time calibration and data quality control, as well as to speed up the offline analysis of large amounts of data. In this paper we present DeepRICH, a novel deep learning algorithm for fast reconstruction which can be applied to different imaging Cherenkov detectors. The core of our architecture is a generative model which leverages a custom Variational Auto-encoder (VAE) combined with Maximum Mean Discrepancy (MMD), with a Convolutional Neural Network (CNN) extracting features from the space of the latent variables for classification. A thorough comparison with the simulation/reconstruction package FastDIRC is discussed in the text. DeepRICH has the advantage of bypassing the low-level details needed to build a likelihood, allowing for a significant improvement in computation time at potentially the same reconstruction performance as other established reconstruction algorithms. In the conclusions, we address the implications and potential of this work, discussing possible future extensions and generalization.


I. INTRODUCTION
Imaging Cherenkov detectors [1] measure the velocity of charged particles and, combined with independent measurements of their momentum, are widely used for PID in modern particle physics experiments. The pattern recognition of the rings is typically likelihood-based and requires computationally expensive simulations; hence different strategies (among them pre-computed look-up tables) have been developed to find a trade-off between computing time and reconstruction performance. A particular class of Cherenkov detectors is based on the detection of internally reflected Cherenkov (DIRC) light (see, e.g., [2]): light is contained by total internal reflection inside a solid radiator, preserving its angular information until it reaches spatially segmented photon sensors, where typically rather complex hit patterns are observed.
* cfanelli@mit.edu

Machine learning (ML) algorithms are already the state of the art in event and particle identification in high energy physics [3], but ML-based solutions for this kind of detector have only just started to be explored [4]. The first DIRC detector was developed by the BaBar experiment at SLAC [5], and inspired other experiments (see, e.g., [6][7][8]) to utilize similar detectors, also in view of future experiments like the Electron Ion Collider [9]. In the following we consider as an example the case of the GlueX experiment [2,10] at the Jefferson Laboratory, where the DIRC has recently been installed, utilizing components of the decommissioned BaBar DIRC, to enhance the PID capabilities of the experiment. Our choice is motivated by FastDIRC [11], an open source simulation and reconstruction package for DIRC detectors implementing the GlueX DIRC geometry. This geometry consists of four bar boxes and two photon cameras, where each bar box contains 12 fused silica radiators (1.725 × 3.5 × 490 cm^3). Each photon camera is attached to two bar boxes and is equipped with an array of Multianode Photomultiplier Tubes (MaPMTs) allowing a three-dimensional (x,y,t) readout with a time resolution of approximately 200 ps. Patterns take up significant fractions of the PMT plane in x,y and are read out over 50-100 ns due to the propagation time in the bars. The reader can find in Fig. 1 (left) a schematic of the detector with one of the two photon cameras and in Fig. 1 (right) an example of the hit pattern generated with FastDIRC, as expected in the PMT plane (x,y) as a function of the propagation time. In particular, the GlueX experiment is designed to search for gluonic excitations in the meson spectrum produced through photoproduction reactions at a tagged photon beam facility.
FIG. 1: (right) Example of hit pattern detected in the PMT plane (spatial coordinates are dubbed x,y, while the time is indicated as t) simulated with FastDIRC. The two colors correspond to the hit pattern of a kaon and of a pion as reported in the legend, under the same kinematic conditions (i.e. particle momentum, incident angle on the bar, azimuthal angle with respect to the bar, location on the bar and which bar has been hit).

For this physics program, the DIRC is expected to provide a separation power between pions and kaons of at least 3σ up to 4 GeV/c in momentum (a plot of the kaon efficiency as a function of the kaon momentum for different pion mis-identification probabilities is shown in Fig. 2), which allows systematic studies of kaon final states that are essential for inferring the quark flavor content of both hybrid and conventional mesons [12,13]. For all these reasons, developing an efficient and fast reconstruction algorithm is of crucial importance. Notice that in the case of ring imaging Cherenkov (RICH) detectors, the time variable is typically not used in the reconstruction methods. This feature could be part of future reconstruction algorithms if better time resolutions are achieved. In the DIRC case, instead, the larger propagation times help to distinguish the type of particle producing the Cherenkov light. Depending on the type of detector, the DeepRICH reconstruction can be based on spatial features only or on combined space and time components.
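As a worked illustration of why the π/K separation becomes harder at larger momentum, the sketch below evaluates the standard relation cos θ_C = 1/(nβ) for pions and kaons. The refractive index used for fused silica is an assumed, representative value that does not come from the text, and the function names are purely illustrative.

```python
import math

# Assumed, representative refractive index of fused silica (not from the text).
N_FUSED_SILICA = 1.473
# Charged pion and kaon masses in GeV/c^2 (rounded).
MASS_GEV = {"pi": 0.13957, "K": 0.49368}


def cherenkov_angle_mrad(p_gev, mass_gev, n=N_FUSED_SILICA):
    """Cherenkov angle in mrad from cos(theta_C) = 1 / (n * beta)."""
    beta = p_gev / math.hypot(p_gev, mass_gev)  # beta = p / E
    cos_theta = 1.0 / (n * beta)
    if cos_theta > 1.0:
        raise ValueError("particle is below the Cherenkov threshold")
    return 1e3 * math.acos(cos_theta)


def pi_k_angle_difference_mrad(p_gev):
    """Difference between the pion and kaon Cherenkov angles at momentum p."""
    return (cherenkov_angle_mrad(p_gev, MASS_GEV["pi"])
            - cherenkov_angle_mrad(p_gev, MASS_GEV["K"]))


for p in (2.0, 3.0, 4.0, 5.0):
    diff = pi_k_angle_difference_mrad(p)
    print(f"p = {p:.0f} GeV/c: theta_C(pi) - theta_C(K) = {diff:.1f} mrad")
```

The difference shrinks to a few mrad in the 4-5 GeV/c range, which is the regime studied later in the paper.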
The outline of this paper is as follows: existing reconstruction methods are discussed in Sec. II; the DeepRICH architecture is presented in Sec. III; application to the DIRC case, discussion of the results and comparison with FastDIRC are described in Sec. IV; summary and conclusions are reported in Sec. V.

II. ESTABLISHED METHODS AND NOVEL APPROACHES
Cherenkov detectors are relatively slow to simulate with full simulations like Geant [14] (e.g., in the DIRC case each Cherenkov photon reflects on average O(10^2) times within a bar, which makes the simulation CPU intensive); new approaches are therefore being developed to achieve a faster reconstruction of the detected light [4,11,15,16]. In this section we briefly describe the state of the art of established computational methods and provide an overview of novel paradigms based on machine learning.

A. The Geometrical Reconstruction Method
The geometrical reconstruction method is based on the BaBar DIRC algorithm [17]. This approach involves generating in advance a large number of photons at different angles exiting each bar, and then tracking them to the PMT plane. In this way a look-up table is created, where each pixel on the photo-detection plane is associated with a set of photon directions at the exit from the bar that can lead to a photon detected in that pixel. The Cherenkov angle $\theta_C$ of each photon is then reconstructed by combining the particle direction provided by the tracking system with the photon direction taken from the look-up table. The look-up table is stored as a ROOT tree with a size of about 300 MB [10]. The resulting cumulative distribution of the reconstructed Cherenkov angles is typically characterized by peaks at the expected values of $\theta_C$ for pions and kaons and by a combinatorial background beneath them. The width of these peaks reflects the single-photon Cherenkov angle resolution, which is characteristic of the detector performance.
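The look-up step described above can be sketched as follows. This is a toy illustration only: the table contents, pixel indices and function names are hypothetical, and the actual algorithm must also handle the reflection ambiguities inside the bar, which are ignored here.

```python
import math


def unit(v):
    """Normalize a 3-vector."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def angle_between(u, v):
    """Angle (rad) between two directions."""
    dot = sum(a * b for a, b in zip(unit(u), unit(v)))
    return math.acos(max(-1.0, min(1.0, dot)))


# Hypothetical look-up table: pixel id -> photon directions at the bar exit
# that can reach that pixel (toy numbers, not detector values).
lookup_table = {
    17: [(0.0, 0.68, 0.73), (0.0, -0.68, 0.73)],
    42: [(0.5, 0.5, 0.70)],
}


def candidate_cherenkov_angles(track_direction, hit_pixels, table):
    """Combine the track direction with every tabulated photon direction."""
    angles = []
    for pixel in hit_pixels:
        for photon_direction in table.get(pixel, []):
            angles.append(angle_between(track_direction, photon_direction))
    return angles


angles = candidate_cherenkov_angles((0.0, 0.0, 1.0), [17, 42], lookup_table)
print([round(a, 3) for a in angles])
```

Histogramming such candidate angles over many photons gives the peaks plus combinatorial background mentioned in the text.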

B. Time-based Image Reconstruction
Another approach is the so-called time-based imaging reconstruction, which is derived from a method used by the Belle II TOP [18]. For every particle hypothesis, the expected arrival time of the Cherenkov photons is calculated analytically based on the charged particle direction and hit location and is compared to the measured time, yielding likelihoods. This method is rather compute-intensive, as one should in principle simulate all the configurations of the charged particles as a function of the mass, energy, direction and location in the DIRC bars.

FIG. 2: Kaon efficiency vs momentum for π mis-identification probabilities of 0.1, 1, and 10% (i.e. probability for a charged π track to be incorrectly identified as a charged K). The dashed curves show the conservative performance, while the solid curves show the improved performance achieved in simulation for the current GlueX DIRC design. Image taken from [2], where the reader can find more details.

C. FastDIRC
The main characteristic of the FastDIRC algorithm [11] is that it traces the photons analytically through the optical system. This approach is O(10^4) times faster than the full Geant simulation. The reconstruction is based on a kernel density estimation (KDE) [19] of the probability distribution function (PDF) for each assumed particle type. The expected distributions on the detection plane for each charged particle hypothesis are compared to the actually observed hit patterns to build likelihoods. FastDIRC allows for parameterization, a feature that makes it suitable for detector design optimization and for offline calibration of real data. It has been shown [11] that the resolution of the reconstructed Cherenkov angle is about 30% better than that of the geometrical reconstruction method. However, the FastDIRC method is O(10^2 - 10^3) times slower than the look-up-table-based reconstruction.
In this paper, FastDIRC is used as a source of reliable simulated events that are fed as input to the DeepRICH architecture.

D. Generative Adversarial Network
A first attempt to apply deep learning to simulate the response of a Cherenkov detector appeared recently in [4], where a generative adversarial network (GAN) [20] was proposed to bypass low-level details at the photon generation stage. This work is based on events simulated with FastDIRC assuming the design of the GlueX DIRC. The GAN architecture is trained to reproduce high-level features (the likelihood results from FastDIRC) based on input observables of the incident charged particles, allowing for an improvement in simulation speed. The authors of [4] claim a good precision and very fast performance (the batch generation on GPU produces up to 1 million track predictions per second) from their studies. Recently, in another paper [21], generative models have been used for fast simulation of RICH detectors at LHCb.
In the following section we are going to present a new deep architecture called DeepRICH, providing a thorough description of the code, data preparation, training/testing phases and performance.

III. THE DEEPRICH NETWORK
Unlike the GAN-based method, which directly maps the injected input to the reconstructed output, our generative model explicitly reconstructs the injected hit patterns expected for each kinematics, and internally creates latent variables that allow the particles to be classified.

A. Architecture
DeepRICH is based on a custom Variational Autoencoder [22]. VAEs are generative models that try to simulate how the data are generated. In order to characterize the causal relations underlying the observed data, VAEs provide a posterior function approximated by an autoencoder architecture, which is made of an encoder and a decoder, the latter being symmetric to the former in terms of layer structure.
In what follows we describe each detected hit by a three-dimensional vector, (x,y,t), corresponding to the spatial and temporal components. We use the notation $x \in \mathbb{R}^{m \times 3}$ to indicate the m hits associated with an individual charged particle. The kinematic parameters of each particle are represented by the vector h, and they embody information on the particle momentum, angle and location where the particle crossed each bar (more details on this can be found in Sec. III B, where we discuss the preparation of the data).
Our novel architecture consists of three main parts:
• An Encoder, which takes as input the concatenation of (i) the m hits produced by a particle, $x \in \mathbb{R}^{m \times 3}$, and (ii) the associated vector h of kinematic parameters, and produces a d-dimensional vector of latent variables for each input hit, i.e. $l \in \mathbb{R}^{m \times d}$. These vectors contain all the information that the network is capable of extracting from the hits x.
• A Decoder, which takes as input the vectors of latent variables l concatenated with h and provides as output a set of hits $\hat{x} \in \mathbb{R}^{m \times 3}$, corresponding to the reconstruction of the input x.
• A Particle Classifier, which consists of convolutional and linear layers; the network takes as input the vectors of latent variables l to classify the particle. The challenging aspect here is to use the information extracted by the Encoder to do PID, that is, to understand whether the particle that generated the hits $x \in \mathbb{R}^{m \times 3}$ is a pion (π) or a kaon (K). [23]

A flowchart of the DeepRICH network is represented in Fig. 3. The model is trained by minimizing the total loss function

$$L = \lambda_r L_r + \lambda_c L_c + \lambda_v L_v, \tag{1}$$

where the λ multipliers are used to weigh the contribution of the corresponding loss terms, described in the following:
(i) The term $L_r$ is the average reconstruction loss between the real particle x and the output of the Decoder $\hat{x}$, calculated using the smooth L1 loss (also called Huber loss; see, e.g., [24]): [25]

$$L_r(x, \hat{x}) = \frac{1}{3} \sum_i z_i, \tag{2}$$

where $z_i$ is given by

$$z_i = \begin{cases} 0.5\,(x_i - \hat{x}_i)^2 & \text{if } |x_i - \hat{x}_i| < 1, \\ |x_i - \hat{x}_i| - 0.5 & \text{otherwise,} \end{cases} \tag{3}$$

and the index i indicates the spatial or time components of each hit. Such a loss is less sensitive to outliers than the Mean Squared Error (MSE). In fact, in the case of an unbounded output, MSE requires careful tuning of the learning rate and of the loss in order to prevent exploding gradients.
(ii) The term $L_c$ is the classification loss, calculated using the Cross Entropy between the target y, i.e. the ground-truth particle type (0 for kaons and 1 for pions), and the output of the classification layer $\tilde{y}$.
(iii) The loss $L_v$ is a term calculated using the Maximum Mean Discrepancy (MMD) [26], as explained in the following; notice that the idea of combining VAE and MMD was used for the first time in [27], where the authors showed that infoVAE (VAE using MMD) is fast to train, stable and leads to a better learning of the features compared to the traditional evidence lower bound (ELBO) [28] criterion used in VAEs. The basic idea of MMD is that two distributions are identical if and only if all their moments are the same. Given two distributions p(z) and q(z), one can measure the divergence between them as

$$\mathrm{MMD}(p \,\|\, q) = \mathbb{E}_{p(z),p(z')}[\kappa(z,z')] + \mathbb{E}_{q(z),q(z')}[\kappa(z,z')] - 2\,\mathbb{E}_{p(z),q(z')}[\kappa(z,z')], \tag{4}$$

where κ(·, ·) can be any positive definite kernel, which can be seen as a function that measures the similarity between two samples. To this end, we use a Gaussian kernel [29]. In our case the distribution p(z) is related to the vector of latent variables, and q(z) is a normal distribution N(0, σ); the best value of σ is determined using the Bayesian optimization described in Sec. III D. A naive intuition of MMD is that the latent vectors should follow the same distribution as q(z). The architecture described in this section is also summarized in the form of pseudo-code in Alg. 1.
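The loss terms described in (i) and (iii), and their weighted combination, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the smooth L1 threshold of 1 is the common convention, the MMD is estimated with a simple (biased) sample average, and the kernel bandwidth and unit multipliers are placeholder values.

```python
import math
import random


def smooth_l1(x, x_hat):
    """Smooth L1 (Huber, threshold 1) loss averaged over the components of one hit."""
    total = 0.0
    for xi, xhi in zip(x, x_hat):
        diff = abs(xi - xhi)
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total / len(x)


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel; the bandwidth is an assumed free parameter."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys):
    """Average kernel value over all pairs drawn from the two samples."""
    return sum(gaussian_kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))


def mmd(latent, prior_samples):
    """Biased sample estimate of the MMD between latent vectors and prior samples."""
    return (mean_kernel(latent, latent)
            + mean_kernel(prior_samples, prior_samples)
            - 2.0 * mean_kernel(latent, prior_samples))


def total_loss(l_r, l_c, l_v, lam_r=1.0, lam_c=1.0, lam_v=1.0):
    """Weighted sum of reconstruction, classification and MMD terms."""
    return lam_r * l_r + lam_c * l_c + lam_v * l_v


rng = random.Random(0)
latent = [(rng.gauss(0.5, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
prior = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
print(total_loss(smooth_l1((0.0, 0.0, 0.0), (0.5, 2.0, 0.0)), 0.3, mmd(latent, prior)))
```

In practice the multipliers, the latent dimension and the prior width would be the quantities tuned by the optimization discussed in Sec. III D.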
In addition we use a dropout layer after each layer in the Encoder/Decoder, with drop probability equal to 10%; we also apply dropout to the latent variables before feeding them into the CNN, with a probability equal to 50%. We fix the number of layers in the encoder/decoder to 2, while the number of hidden units is set to [512, 256]. The CNN has 3 layers with, respectively, [64, 64, 128] kernels with stride 1 and size 3, whereas the classifier has 4 layers with [100, 50, 25, 2] neurons, where the dimension of the last layer corresponds to the number of classes (π and K). The activation function used after each layer is the Rectified Linear Unit (ReLU). The reader can find more technical details summarized in Table I.

FIG. 3: A flowchart of DeepRICH: the inputs are concatenated (n.b., the ⊕ represents the concatenation between vectors) and fed into the encoder, which generates a set of vectors of latent variables, which are then used both for the classification of the particle and for the reconstruction of the hits.
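To make the data flow of Fig. 3 concrete, the following sketch only tracks the concatenation and the tensor shapes through the fully connected Encoder and Decoder. It is a shape-bookkeeping illustration under assumptions: the toy values, the six-component kinematic vector, the mirrored decoder widths and the choice of treating the latent projection as an extra output layer are not taken from the text, while the latent dimension of 16 is the value quoted in Sec. IV.

```python
def concat_rows(rows, kinematics):
    """Append the same kinematic vector h to every row (the concatenation of Fig. 3)."""
    return [tuple(row) + tuple(kinematics) for row in rows]


def layer_shapes(n_hits, n_inputs, hidden_units, n_outputs):
    """Shapes of the activations through a stack of fully connected layers."""
    shapes = [(n_hits, n_inputs)]
    for width in list(hidden_units) + [n_outputs]:
        shapes.append((n_hits, width))
    return shapes


hits = [(0.1, -0.3, 12.5), (0.4, 0.2, 14.1)]   # m = 2 toy hits (x, y, t)
h = (4.0, 3.0, 40.0, 0.0, 0.0, 31)             # toy (p, theta, phi, X, Y, bar index)
latent_dim = 16                                # latent dimension quoted in Sec. IV

encoder_in = concat_rows(hits, h)
encoder_shapes = layer_shapes(len(hits), len(encoder_in[0]), [512, 256], latent_dim)
# Assumed mirrored decoder: latent variables concatenated with h, then 256 -> 512 -> 3.
decoder_shapes = layer_shapes(len(hits), latent_dim + len(h), [256, 512], 3)

print(encoder_shapes)
print(decoder_shapes)
```

The per-hit latent vectors of shape (m, d) are what the convolutional classifier would then consume.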

B. Data Preparation
The data generation is based on FastDIRC [11]. FastDIRC generates the hit pattern observed in the PMT detection plane for a given kinematics of the charged particle traversing the radiator. The kinematics is characterized by the following parameters: the momentum of the particle p [GeV/c], the polar angle θ relative to the normal to the surface of the bars, the azimuthal angle φ, the location X, Y on the surface of the bar, and an integer index identifying which fused silica bar has been hit. [30] FastDIRC uses kernel density estimation to produce an estimate of the probability distribution function on the PMT plane. It generates about 10^5 provisional points for each kinematics, which are used to identify an actual charged particle passing through the bars and generating a sparse hit pattern of about 20-50 "real" hits. FastDIRC therefore generates both the sparse hit pattern associated with one particle and the whole probability density function (PDF) associated with a particular kinematics, which is used to identify that particle. The training set for DeepRICH has been generated with FastDIRC combining more than one kinematics for a single bar. A particular region of the phase-space can be divided into a fine grid of points. For example, the largest dataset we generate corresponds to a hyper- [...]

Algorithm 1: [...] update θ by minimizing the total loss $L(x, \hat{x}, y, \tilde{y}, l)$; end for; end for; end procedure.
• $x \in \mathbb{R}^{m \times 3}$ is a set of hits produced by a charged particle; y is the ground truth of the particle (i.e., a π or a K); h is the vector containing the kinematic parameters associated with the particle; θ are the weights (parameters) of the networks; l is the vector of latent variables associated with x and produced by the Encoder.
• For each hit in the particle, the Encoder produces a vector of latent variables l, taking as input the encoded kinematic parameters concatenated with the hit itself.
• The vectors of latent variables associated with the hits of a particle are used to classify the particle itself.
• The Decoder reconstructs the input hits using the latent variables and the kinematic parameters.

For each point of the grid we generate one PDF with FastDIRC and then randomly sample the observed "real" hits. This is done by taking into account the expected yield of the photons: we implemented a yield generation inspired by the FastDIRC simulation of the observed hits, which takes into account the photon yield reduction due to several effects, e.g., if the total internal reflection condition is not met or a photon misses a mirror. We also checked that keeping the yield constant (fixing it to 40 photons) does not change the performance significantly.
As expected, a denser grid of points combined with a larger number of sampled particles at each kinematic point generally improves the PID performance of DeepRICH (this can be quantified with the Area Under Curve described in Sec. IV A). Taking into account that the intrinsic limit on the achieved performance depends on the kinematic conditions (e.g., the larger the momentum, the lower the π/K distinguishing power), a trade-off on the above numbers (i.e. how dense the grid should be and how many particles should be chosen for training) can be found based on the sought classification accuracy and the available computing resources.

C. Model Training and Testing
At each kinematic point (p, θ, φ, X, Y) we use FastDIRC to produce a large number of expected hits for both πs and Ks. Then we sample N particles of a given type (π or K), where by construction each particle consists of a random set of m hits. This prevents the network from learning to classify particles based on patterns internal to the FastDIRC generation algorithm. At the same time, with this choice we can build a virtually unlimited dataset of particles from the PDFs of FastDIRC.
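The sampling of particles from the generated points can be sketched as follows. This is a simplified stand-in, not the yield model described in Sec. III B: the number of hits per particle is drawn uniformly in an assumed 20-50 range, hits are drawn without replacement, and the "PDF points" are random toy values rather than FastDIRC output.

```python
import random


def sample_particle(pdf_points, n_hits, rng):
    """Draw one 'particle': n_hits hits picked at random from the generated points."""
    return rng.sample(pdf_points, n_hits)


def sample_particles(pdf_points, n_particles, rng, min_hits=20, max_hits=50):
    """Build n_particles sparse hit patterns from the same pool of generated points."""
    particles = []
    for _ in range(n_particles):
        n_hits = rng.randint(min_hits, max_hits)  # assumed uniform yield
        particles.append(sample_particle(pdf_points, n_hits, rng))
    return particles


rng = random.Random(0)
# Toy stand-in for the (x, y, t) points generated for one kinematics and one hypothesis.
pdf_points = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.uniform(0, 100))
              for _ in range(10_000)]
pions = sample_particles(pdf_points, n_particles=5, rng=rng)
print([len(p) for p in pions])
```

Because each particle is an independent random draw, arbitrarily many training particles can be produced from a single generated PDF.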
The generated samples have then been divided into two subsets, training and test: (i) the training set contains particles at certain kinematics which are used during the training phase (ensuring that all the vertices of the hypercube are included), while (ii) the test subset is used only for testing the network performance after the training procedure, to see whether it can achieve good results on unknown kinematics. Furthermore, the particles from the training set are divided into "training particles" and "development particles" (the split is 80%/20%); the training particles are used to update the parameters of the network by minimizing the total loss (see Eq. (1)), while the development particles are used to calculate an accuracy score, to evaluate the quality of the classification during training and to check whether the network is learning properly how to classify hits from known kinematics. Early stopping is used to interrupt the training procedure if the development score does not improve after a certain number of epochs. The classification score on the development particles is also used to tune the hyperparameters of the network with a Bayesian optimization (the procedure is explained in detail in Sec. III D).
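The 80%/20% split and the early-stopping logic can be illustrated with the sketch below. The patience value and the `run_epoch` callback are hypothetical placeholders, since the text only states that training stops when the development score does not improve after a certain number of epochs.

```python
import random


def split_train_dev(particles, dev_fraction=0.2, seed=0):
    """Shuffle and split the training set into training and development particles."""
    shuffled = particles[:]
    random.Random(seed).shuffle(shuffled)
    n_dev = int(round(dev_fraction * len(shuffled)))
    return shuffled[n_dev:], shuffled[:n_dev]


def train_with_early_stopping(run_epoch, max_epochs=50, patience=5):
    """run_epoch(epoch) returns the development score; stop when it stalls."""
    best_score, best_epoch = float("-inf"), -1
    for epoch in range(max_epochs):
        score = run_epoch(epoch)
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_score, best_epoch


train, dev = split_train_dev(list(range(100)))
toy_scores = [0.60, 0.70, 0.75, 0.74, 0.74, 0.73, 0.72, 0.71]
best = train_with_early_stopping(lambda e: toy_scores[e],
                                 max_epochs=len(toy_scores), patience=3)
print(len(train), len(dev), best)
```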
We then optimize the parameters of the network with Adam [31] using the tuned learning rate. The dataset has been standardized (each feature is rescaled to zero mean and standard deviation (Std) equal to 1), and this is done separately for the hits and for the kinematic parameters, in order to avoid the overshadowing of features with smaller values and to further improve the training procedure; notice that the development and test hits have been standardized using the mean and the Std calculated on the training hits, to avoid a potential injection of bias that could artificially improve the classification performance.
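The standardization procedure, with statistics computed on the training hits only and then reused for the development and test hits, can be sketched as follows. The function names and the toy numbers are illustrative assumptions; the same two functions would be applied separately to the kinematic parameters.

```python
import math


def fit_standardizer(rows):
    """Per-feature mean and standard deviation computed on the training rows only."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(dims)]
    stds = [math.sqrt(sum((r[k] - means[k]) ** 2 for r in rows) / n)
            for k in range(dims)]
    return means, stds


def standardize(rows, means, stds):
    """Rescale rows with previously fitted statistics (constant features are left unscaled)."""
    return [tuple((v - m) / (s if s > 0 else 1.0) for v, m, s in zip(r, means, stds))
            for r in rows]


train_hits = [(0.0, 10.0, 100.0), (2.0, 20.0, 300.0)]   # toy (x, y, t) hits
test_hits = [(3.0, 15.0, 200.0)]

means, stds = fit_standardizer(train_hits)
train_std = standardize(train_hits, means, stds)
test_std = standardize(test_hits, means, stds)   # reuses the training statistics
print(train_std, test_std)
```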
We train the network in different experiments, each consisting of at most 50 epochs, and evaluate the performance on the development subset during the training phase. The development accuracy is calculated by applying softmax(·) to the output of the classification layer. When the training is over, the model is evaluated on the test particles extracted from unknown kinematics.

D. Network Optimization
Bayesian Optimizers (BOs) [32,33] are among the most efficient tools for optimizing the hyperparameters of a deep architecture [34]. BOs search for the global optimum $x^*$ of a black-box function f(x) over a bounded domain χ. In particular, f can be noisy, non-differentiable and expensive to evaluate.
Typically Gaussian processes [35] are used to build a surrogate model of f, but other regression methods such as decision trees can also be used. Once the probabilistic model is determined, a cheap utility function (also called acquisition function) is considered to guide the process of sampling the next point to evaluate. The DeepRICH network consists of N hyperparameters listed in Table II. In particular, the multipliers of the loss functions defined in Eqs. (2), (3), (4), the dimension of the latent variables, the MMD variance and the learning rate play an important role in the performance of the network. These hyperparameters are tuned with a BO provided by the sklearn [36] package. As previously discussed, other hyperparameters, e.g., the number of layers in the architecture, are not tuned and their values are reported in Table I. We choose as objective function f the development score obtained during the training phase. Each call of the BO is based on 50 epochs. Results of the optimization are summarized in Table II.

IV. RESULTS
The following results are based on charged π, K candidates with momentum between 4 and 5 GeV/c, the latter corresponding to a challenging kinematics given the sizeable overlap between the expected hit patterns. The capability of distinguishing πs from Ks and effectively doing PID depends on the features and the causal relations learnt in the space of the latent variables. A 3D visualization in the space of the latent variables is shown in Fig. 4, where t-SNE [37] is used for dimensionality reduction. A clearer separation is achieved in the reduced space of the latent variables at 4 GeV/c compared to 5 GeV/c.
An alternative representation of the same data is shown in Fig. 5. Here the distinguishing power is quantified as the average absolute difference between π and K in each latent variable versus the Y-position on the quartz bar. This is shown at 5 and 4 GeV/c in momentum (top and middle of Fig. 5, respectively). Notice that the number of bins (16 on the x-axis) corresponds to the dimension of the vector of latent variables. Intuitively, the larger the absolute difference the more πs are separated from Ks. The relative difference (bottom of Fig. 5) is characterized by negative values only, pointing to the obvious interpretation that the distinguishing power is larger at 4 GeV/c. Notice also that in good approximation the separation between the two particle types does not depend on the Y-location on the quartz bar and we verify as a sanity check the presence of vertical bars in the patterns of Fig. 5 along the y-axis.
As described in Sec. III B, the event generation is based on FastDIRC, which is also used in this section as a reconstruction algorithm to provide a benchmark against which to evaluate the performance of the DeepRICH architecture.

A. Comparison with FastDIRC
The PID strategy in FastDIRC is likelihood-based: $N_d$ photons for each candidate particle are detected in the PMT plane, and $N_g$ photons are generated to produce the expected PDFs of the two candidates (π, K). The $N_d$ detected photons are then used to compute the log-likelihood from each candidate PDF as follows:

$$\log L = \sum_{i=1}^{N_d} \log \left( \sum_{j=1}^{N_g} \mathcal{K}\!\left( \frac{x_i - x_j}{\lambda} \right) \right), \tag{5}$$

where $\mathcal{K}$ is the kernel of the density estimate, λ is a bandwidth and x is a vector whose components are the spatial and time coordinates of each hit (either detected or generated). [38]

FIG. 4: [...] These features are then used to classify the particle. The plot shows a better separation between π and K at 4 GeV/c, which means that the network has good distinguishing power. As expected, the points become less separated at larger momentum. The 3D visualization is obtained with t-SNE [37].
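The likelihood-based strategy of FastDIRC described in this subsection (detected hits scored against a kernel density estimate built from generated hits) can be sketched as below. This is a minimal illustration rather than the FastDIRC code: the Gaussian kernel, the single bandwidth shared by the space and time coordinates, the normalization by the number of generated hits and the toy hit coordinates are all assumptions.

```python
import math


def kde_log_likelihood(detected_hits, generated_hits, bandwidth):
    """Sum over detected hits of the log of the KDE evaluated at each hit."""
    log_l = 0.0
    for hit in detected_hits:
        density = 0.0
        for gen in generated_hits:
            sq = sum(((a - b) / bandwidth) ** 2 for a, b in zip(hit, gen))
            density += math.exp(-0.5 * sq)  # assumed Gaussian kernel
        # Small floor avoids log(0) when a hit is far from every generated point.
        log_l += math.log(density / len(generated_hits) + 1e-300)
    return log_l


def delta_log_likelihood(detected_hits, pion_support, kaon_support, bandwidth=1.0):
    """Difference between the log-likelihoods of the pion and kaon hypotheses."""
    return (kde_log_likelihood(detected_hits, pion_support, bandwidth)
            - kde_log_likelihood(detected_hits, kaon_support, bandwidth))


# Toy generated (x, y, t) points for the two hypotheses and two detected hits.
pion_support = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (-0.2, 0.1, 0.0)]
kaon_support = [(5.0, 5.0, 5.0), (5.2, 5.0, 5.0)]
detected = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.1)]
print(delta_log_likelihood(detected, pion_support, kaon_support))
```

The double loop makes the cost scale with the product of detected and generated hits, which is why the PDF generation dominates the FastDIRC reconstruction time quoted later.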
The operational definition of likelihood in DeepRICH is different from Eq. (5), in that different quantities are provided by the network: as explained in Sec. III, the output of the classifier is a two-dimensional vector $\tilde{y} \in \mathbb{R}^2$, and we use these values as likelihoods for π and K.
At this point we can consider the ∆ log L, the difference between the two log-likelihoods (under the null hypotheses of π and K, respectively). Histograms of ∆ log L are obtained for both FastDIRC and DeepRICH and shown in Fig. 6 at 4 GeV/c (left column) and 5 GeV/c (right column), respectively. Two different colors are used in the legend to highlight the ground truth of each particle (which is either a real π or K).
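For DeepRICH, a ∆ log L can be formed from the two classifier outputs as sketched below. The sketch assumes that the outputs are normalized with a softmax (as done for the development accuracy) before being interpreted as likelihoods, and it uses the label convention of Sec. III A (0 for kaons, 1 for pions); the function names are illustrative.

```python
import math


def log_softmax(logits):
    """Numerically stable log of the softmax of a list of scores."""
    shift = max(logits)
    log_norm = math.log(sum(math.exp(v - shift) for v in logits))
    return [v - shift - log_norm for v in logits]


def delta_log_l(logits, pion_index=1, kaon_index=0):
    """Difference between the log-likelihoods assigned to the pion and kaon classes."""
    log_probs = log_softmax(logits)
    return log_probs[pion_index] - log_probs[kaon_index]


print(delta_log_l([0.0, 2.0]))    # pion-like output: positive value
print(delta_log_l([1.5, -0.5]))   # kaon-like output: negative value
```

With a softmax normalization the difference reduces to the difference of the two raw outputs, so the sign of ∆ log L directly reflects which class the network prefers.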
In the same figures, to quantify the performance of the two algorithms, a Receiver Operating Characteristic (ROC) curve is obtained by varying the threshold applied to ∆ log L.
The ROC curves have been produced generating 350 particles observed for each kinematics and the Area Under Curve (AUC) is used as a metric to compare the performance of the two algorithms.
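The construction of the ROC curve by scanning a threshold on ∆ log L, and the corresponding AUC, can be sketched as follows. Treating pions as the positive class and integrating with the trapezoidal rule are choices made for this illustration, and the scores are toy values.

```python
def roc_curve(scores, labels):
    """Scan thresholds on the score; labels are 1 (pion, positive) or 0 (kaon)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))  # (false positive rate, true positive rate)
    return points


def auc(points):
    """Area under the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (x1 - x0) * (y0 + y1)
    return area


delta_log_l_values = [2.1, 0.8, 0.3, -0.2, -1.0, -1.7]   # toy scores
truth = [1, 1, 0, 1, 0, 0]                                # 1 = pion, 0 = kaon
print(auc(roc_curve(delta_log_l_values, truth)))
```

The ratio of two such AUC values, one per algorithm, is the quantity compared as a function of the kinematic variables.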
A detailed comparison between the FastDIRC and DeepRICH reconstructions is reported in Fig. 7 (top), where the DeepRICH AUC divided by the corresponding AUC of FastDIRC is drawn as a function of a single kinematic variable, after integrating the performance over all the other kinematic parameters to show the partial dependence on that particular variable.
The plots show that the two algorithms are very close in reconstruction performance, namely the ratio AUC(DeepRICH)/AUC(FastDIRC) is at the level of 0.99 in a large region of the kinematic parameters where the reconstruction efficiency of DeepRICH is approximately uniform, while a slight dependence is observed as a function of the momentum. Fig. 7 (bottom) summarizes these results in the form of radar plots: each axis corresponds to a kinematic parameter, and the distance from the center along each direction corresponds to the correlation of the AUC with that specific parameter. As expected, the largest dependence of the AUC is on the momentum, with the π/K distinguishing power becoming lower at larger values of the momentum.

B. Test on Unknown Kinematics
One major concern about this method is its predictive power for kinematics not explicitly injected in the training phase. In this section we show results that demonstrate the stability of the network reconstruction for every kinematic point belonging to the hypercube ∆p × ∆θ × ∆φ × ∆x × ∆y, which was approximated in Sec. III B by a discrete grid of training datasets. This approximation tacitly assumes that the hit pattern has no discontinuities when the parameters are varied within the hypercube.
In Fig. 8 we show the quality of the DeepRICH reconstruction for unknown kinematics in terms of the test score. We performed different tests and did not notice any significant changes in the test score and in the AUC, the two figures of merit we have used to assess the quality of the reconstruction.

C. DeepRICH Performance
In this section we summarize the performance of the network both in terms of reconstruction efficiency and computing time.
The quality of the reconstruction is high, as shown in Table III: as already mentioned, the AUC values are close to those of FastDIRC, given a certain sub-region of the kinematic space for the training process. Notice that these results can be further improved by considering the major points addressed in Secs. III B-III D for the training phase. Table IV reports the memory and computing time performance: the inference time is the actual time DeepRICH needs to do PID after the training phase and is on average O(ms) per batch of particles using a Titan V GPU. Fig. 9 shows the inference time as a function of the batch size: the inference time is approximately constant up to 10^4 particles, which is the maximum batch size that could be handled in our configuration. For completeness we also report a comparison with the reconstruction time of other methods: for look-up-table-based algorithms, not fully optimized estimates give of the order of a few ms per track on a single standard CPU [40]; for FastDIRC it is about 300 ms per track on a Macbook Air 2.2 GHz i7 and is dominated by the generation of the PDF, though it is worth recalling that it can be massively parallelized; the GAN method [4] is the closest to our order of magnitude (but it concerns the generation of ∆ log L values) and the authors claim 1M particles generated per second.

FIG. 9: After training, the inference time is almost constant as a function of the batch size, meaning that the effective inference time (i.e., the reconstruction time per particle) can be as small as a few µs. Notice that the corresponding memory size in the inference phase is approximately constant and equal to the value reported in Table IV.
Another potential advantage of DeepRICH is the limited network size evaluated throughout the training phase, which never exceeded 4 GB for different network configurations. It is worth recalling that the network size depends mainly on the weights of the network and the gradients, rather than on the subspace of the kinematic parameters used in the training phase. This is a feature to keep in mind when comparing to the overall size of a look-up table obtained, for example, with the geometrical reconstruction method.

V. SUMMARY AND CONCLUSIONS
The DeepRICH architecture developed in this paper shows very promising results. As a case study we consider the DIRC detector. Notice that DeepRICH is agnostic to the shape of the photon patterns, and in principle it can be trained to do PID for other imaging Cherenkov detectors.
The training set is generated with FastDIRC by dividing the phase-space into a fine grid of points. We have made different tests changing the number of kinematic points in p, θ, φ, X, Y for one specific bar of the DIRC (we refer the reader to Sec. III B for more details on the preparation of the data). We demonstrate the high quality and stability of the reconstruction within the kinematic subspace. We then increased the space while keeping the same dimensions of the neural network architecture, and this does not seem to affect the quality of the reconstruction. Notice that the generation of the hypercube and the resulting density can be further optimized in the future. Increasing the kinematic space, and consequently the size of the dataset, obviously results in a larger training time, which is ideally limited only by the computing resources and the time available to train the network. It is worth recalling that the size of the network is related to the weights and the dimensions of the architecture. The measured inference time is approximately equal to 1 ms per batch and we find it is roughly constant up to 10^4 particles. Notice that further parallelization of the network can be explored during both the training and inference phases.
Our conclusion is that DeepRICH, within the conditions described throughout the text, can reach the reconstruction efficiency of established algorithms and potentially outperform them in the reconstruction time.
The O(ms) time performance per batch of particles makes this algorithm suitable for near real-time applications (e.g. calibration). The high quality of reconstruction and the fast computing time are two compelling features of the DeepRICH algorithm, which come at the cost of a relatively long training time, as expected. If the latter aspect cannot be further optimized in the future, one can always use DeepRICH to characterize critical sub-regions of the phase-space, e.g., it can be applied to each bar separately.
DeepRICH has been designed to be easily generalized to classify other categories of particles, and the extension of the network is left for future development. An important feature is related to the nature of the VAE, which suggests the tempting scenario of generalizing DeepRICH to the fast generation of events once the behavior in the latent space is learnt. Finally, another suggestive application could be training DeepRICH using pure samples of identified particles from real data, which would allow the network to deeply learn the response of the Cherenkov detector.