The artificial retina for track reconstruction at the LHC crossing rate

We present the results of an R&D study for a specialized processor capable of precisely reconstructing events with hundreds of charged-particle tracks in pixel and silicon strip detectors at $40\,\rm MHz$, thus suitable for processing LHC events at the full crossing frequency. For this purpose we design and test a massively parallel pattern-recognition algorithm, inspired by the current understanding of the mechanisms adopted by the primary visual cortex of mammals in the early stages of visual-information processing. The detailed geometry and charged-particle activity of a large tracking detector are simulated and used to assess the performance of the artificial retina algorithm. We find that high-quality tracking in large detectors is possible with sub-microsecond latencies when the algorithm is implemented in modern, high-speed, high-bandwidth FPGA devices.


Introduction
Higher LHC energy and luminosity increase the challenge of data acquisition and event reconstruction in the LHC experiments. The large number of interactions per bunch crossing (pile-up) greatly reduces the discriminating power of the usual signatures, such as the high transverse momentum of leptons or high missing transverse energy. Real-time track reconstruction could therefore prove crucial to quickly select potentially interesting events for higher levels of processing. Performing such a task at the LHC crossing rate is a major challenge because of the large combinatorics and the size of the associated information flow, and it requires unprecedented massively parallel pattern-recognition algorithms. For this purpose we design and test a neurobiology-inspired pattern-recognition algorithm well suited to the task: the artificial retina algorithm.

An artificial retina algorithm
The original idea of an artificial retina tracking algorithm was inspired by the mechanism of visual receptive fields in the mammalian eye [1]. Experimental studies have shown that neurons are tuned to recognize a specific shape in a specific region of the retina (the "receptive field"). The strength of the response of each neuron to a stimulus is proportional to how close the shape of the stimulus is to the shape to which the neuron is tuned. All neurons react to a stimulus, each with a different strength, and the brain obtains precise information about the received stimulus by performing some sort of interpolation between the responses of the neurons.
The retina concept can be adapted to track reconstruction. In a generic tracking detector, the 3D trajectory of a charged particle is described by five parameters. The space of track parameters is discretized into cells, which mimic the receptive fields of the retina. The center of each cell identifies a track in the detector space, which intersects the detector layers at spatial points that we call receptors. For each incoming hit, the algorithm computes the excitation intensity, i.e. the response of the receptive field, of each cell as follows:
$$R = \sum_{k,r} \exp\left(-\frac{s_{kr}^{2}}{2\sigma^{2}}\right),$$
where $s_{kr}$ is the distance, on layer $k$, between the hit and the receptor $r$, and $\sigma$ is a parameter of the retina algorithm that can be adjusted to optimize the sharpness of the response of the receptors. After all hits are processed, tracks are identified as local maxima above a threshold in the space of track parameters. Averaging over the cells near an identified maximum provides track parameters with a significantly better resolution than the available cell granularity.
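The steps described above (excitation of each cell, search for local maxima above a threshold, interpolation over neighboring cells) can be illustrated with a minimal one-dimensional toy. This is only a sketch under stated assumptions, not the implementation used in the study: the response is assumed to be Gaussian in the hit-receptor distance with width $\sigma$, tracks are straight lines through the origin described by a single slope parameter, and all names, layer positions and numerical values are hypothetical.

```python
import math

def excitation(cell_slopes, layer_z, hits, sigma):
    """Response of every cell: sum over hits of a Gaussian weight in the
    distance between the hit and the cell's receptor on the same layer.
    Toy assumption: a cell is a slope m, its receptor on layer k is m * z_k."""
    response = []
    for m in cell_slopes:
        receptors = [m * z for z in layer_z]
        r = sum(
            math.exp(-((x - receptors[k]) ** 2) / (2.0 * sigma ** 2))
            for k, x in hits
        )
        response.append(r)
    return response

def find_tracks(cell_slopes, response, threshold):
    """Local maxima above threshold, refined by a response-weighted average
    (center of mass) over the maximum and its two neighbors."""
    tracks = []
    for i in range(1, len(response) - 1):
        r = response[i]
        if r > threshold and r >= response[i - 1] and r > response[i + 1]:
            weights = response[i - 1:i + 2]
            slopes = cell_slopes[i - 1:i + 2]
            tracks.append(sum(w * m for w, m in zip(weights, slopes)) / sum(weights))
    return tracks

# Hypothetical toy setup: four layers, 101 cells with a pitch of 0.01 in slope.
layer_z = [1.0, 2.0, 3.0, 4.0]
cell_slopes = [i * 0.01 for i in range(-50, 51)]
true_slope = 0.123  # deliberately not on a cell center
hits = [(k, true_slope * z) for k, z in enumerate(layer_z)]  # (layer index, position)

response = excitation(cell_slopes, layer_z, hits, sigma=0.02)
tracks = find_tracks(cell_slopes, response, threshold=3.0)
print(tracks)
```

In this toy configuration the interpolated slope should land closer to the true value than the nearest cell center, which is the qualitative effect the text attributes to averaging over nearby cells.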

Retina algorithm in a real HEP experiment
To evaluate the performance and robustness of the algorithm in a real HEP detector, we focus on the upgraded LHCb detector. The upgraded LHCb detector [2], a single-arm spectrometer covering the pseudorapidity range $2 < \eta < 5$, is a major upgrade of the current LHCb experiment; it will run at an instantaneous luminosity of $3 \times 10^{33}\,\rm cm^{-2}\,s^{-1}$, with a beam energy of 7 TeV. All the sub-detectors will be read out at 40 MHz, allowing a complete event reconstruction at the LHC crossing rate. To benchmark the retina algorithm, we perform the first stage of the upgraded LHCb tracking reconstruction [3], using the information from only two sub-detectors placed upstream of the magnet: the vertex locator (VELO), a silicon-pixel detector [4], and the upstream tracker (UT) [5], a silicon microstrip detector. We use the last eight forward pixel layers of the VELO and the two axial layers of the UT. We arbitrarily choose to parametrize tracks with the parameters $(u, v, d, z_0, k)$. Here $(u, v)$ are the spatial coordinates of the intersection point of the track with a "virtual plane" perpendicular to the $z$-axis, placed at a distance $z_{vp}$ from the origin of the coordinate system; $d$ is the signed transverse impact parameter; $z_0$ is the $z$-coordinate of the point of closest approach to the $z$-axis; and $k$ is the signed curvature in the bending plane ($\vec{B} = B\hat{y}$).
The detector geometry and the magnetic field (negligible in the VELO and about 0.05 T in the UT) allow us to use only the $(u, v)$ parameters to perform the pattern recognition, since the 5D track-parameter space can be factorized into $(u, v) \otimes (d, z_0, k)$. Thus $(u, v)$ are the "main" parameters in which pattern recognition is performed, whereas $(d, z_0, k)$ are treated as "perturbations" of the main parameters $(u, v)$ [6,7].
To evaluate the performance of the algorithm, we developed a detailed C++ simulation of the retina algorithm [8], able to process simulated events and interfaced with the default LHCb simulation. We discretize the main $(u, v)$-subspace into 22 500 cells, a granularity O(100) larger than the maximum expected number of tracks in a typical upgraded LHCb event. Generic collision samples from the default LHCb simulation are used to assess the performance of the retina algorithm. The generic collisions are generated with a beam energy of 7 TeV and luminosities up to $L = 3 \times 10^{33}\,\rm cm^{-2}\,s^{-1}$. A typical response of the retina algorithm is shown in figure 1, where several clusters are clearly identifiable, most of which are reconstructed as tracks.
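The discretization of the $(u, v)$-subspace and the mapping of each cell to its receptors can be sketched as follows. This is a minimal illustration under explicit assumptions, not the C++ simulation of [8]: the 22 500 cells are assumed to form a uniform 150 × 150 grid (the text gives only the total), the perturbation parameters $(d, z_0, k)$ are set to zero so that each cell corresponds to a straight line from the origin through $(u, v, z_{vp})$, and the function names, ranges and layer positions are hypothetical.

```python
def cell_centers(n_u, n_v, u_range, v_range):
    """Centers of a uniform n_u x n_v grid over the (u, v) plane."""
    du = (u_range[1] - u_range[0]) / n_u
    dv = (v_range[1] - v_range[0]) / n_v
    return [
        (u_range[0] + (i + 0.5) * du, v_range[0] + (j + 0.5) * dv)
        for i in range(n_u)
        for j in range(n_v)
    ]

def receptors(u, v, z_vp, layer_z):
    """Receptors of the cell centered at (u, v): intersections of the straight
    line from the origin through (u, v, z_vp) with layers at positions z.
    Assumes d = z0 = k = 0, i.e. only the 'main' parameters are used."""
    return [(u * z / z_vp, v * z / z_vp) for z in layer_z]

# Hypothetical numbers, chosen only to exercise the functions.
cells = cell_centers(150, 150, (-1.0, 1.0), (-1.0, 1.0))
example = receptors(10.0, -20.0, z_vp=1000.0, layer_z=[500.0, 2000.0])
print(len(cells), example)
```

In such a scheme the receptor positions depend only on the geometry, so they can be computed once and stored per cell; the treatment of $(d, z_0, k)$ as perturbations is not covered by this sketch.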
All hits in events from the default LHCb simulation are sent to and processed by the retina. To evaluate the tracking performance, we consider only tracks in a region of the $(u, v)$-plane where they have full acceptance in the chosen layer configuration. In addition, cuts close to those used to calculate the offline efficiency [3] are applied. For instance, we require at least three hits on VELO layers and two hits on UT layers, as well as a momentum $p > 3$ GeV/c and a transverse momentum $p_T > 200$ MeV/c. Tracks satisfying all these requirements are defined as reconstructable, and the tracking efficiency is defined as the number of reconstructed tracks divided by the number of reconstructable tracks. The efficiency of the retina is reported in figure 2 as a function of $p_T$ and $d$. We also report the efficiency of the offline LHCb track reconstruction algorithm performing the same task as the retina [6]. The retina algorithm shows very high efficiency in reconstructing tracks, about 95% for generic tracks, which is comparable to that of the offline tracking algorithm. The fake-track rate is 8% at $L = 2 \times 10^{33}\,\rm cm^{-2}\,s^{-1}$ and 12% at $L = 3 \times 10^{33}\,\rm cm^{-2}\,s^{-1}$, slightly higher than the fake rate of the offline algorithm. We also estimate the efficiency of the retina algorithm in reconstructing signal tracks from some benchmark decay modes, such as $B^0_s \to \phi\phi$, $D^{*\pm} \to D^0\pi^\pm$ and $B^0 \to K^{*}\mu\mu$, at $L = 2 \times 10^{33}\,\rm cm^{-2}\,s^{-1}$. The efficiency for these channels is about 97-98%. The resolutions on the track parameters determined by the retina are comparable with those of the offline reconstruction.
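The definition of reconstructable tracks and of the tracking efficiency given above can be written compactly as below. This is a hedged sketch: the class and function names are hypothetical, the acceptance requirement in the $(u, v)$-plane is not modeled, and the five toy tracks are made-up inputs used only to exercise the code, not simulation results.

```python
from dataclasses import dataclass

@dataclass
class TruthTrack:
    n_velo_hits: int      # hits on the VELO layers used
    n_ut_hits: int        # hits on the UT layers used
    p: float              # momentum in GeV/c
    pt: float             # transverse momentum in GeV/c
    reconstructed: bool   # matched to a track found by the algorithm

def is_reconstructable(t):
    """Cuts quoted in the text: >= 3 VELO hits, >= 2 UT hits,
    p > 3 GeV/c and pT > 200 MeV/c. Acceptance cut not modeled here."""
    return t.n_velo_hits >= 3 and t.n_ut_hits >= 2 and t.p > 3.0 and t.pt > 0.2

def tracking_efficiency(tracks):
    """Reconstructed tracks divided by reconstructable tracks."""
    reconstructable = [t for t in tracks if is_reconstructable(t)]
    if not reconstructable:
        raise ValueError("no reconstructable tracks in the sample")
    return sum(t.reconstructed for t in reconstructable) / len(reconstructable)

# Made-up toy sample (not simulation output).
sample = [
    TruthTrack(5, 2, 10.0, 1.00, True),
    TruthTrack(4, 2, 5.0, 0.50, False),
    TruthTrack(3, 2, 3.5, 0.25, True),
    TruthTrack(2, 2, 20.0, 2.00, True),   # too few VELO hits
    TruthTrack(6, 2, 2.0, 0.30, False),   # momentum below threshold
]
efficiency = tracking_efficiency(sample)
print(efficiency)
```

Tracks failing the cuts are excluded from both numerator and denominator, so a found track that is not reconstructable does not raise the efficiency.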

Hardware implementation
To fully exploit the high degree of parallelism of the algorithm, we implemented the retina algorithm in FPGA chips [9]. The logic is written in VHDL; detailed logic-gate placement and simulation on the high-bandwidth Altera Stratix V device, model 5SGXEA7N2F45C2ES, are performed using Altera's proprietary software. Figure 3 shows an overview of the architecture of the retina track processor.

One challenge is delivering the hits from the detector to the processing engines that calculate the excitations. The need for a 40 MHz throughput with a flow of several Tbit/s of input data makes this a nontrivial task. The other challenge is performing pattern recognition quickly enough to remain within the tight latency constraints. Any solution to either issue necessarily depends on the actual geometry of the tracking layout. The peculiar LHCb geometry, with straight-line tracks traversing the vertex detector before being curved by the magnetic field in the downstream tracking stations, allows an efficient solution to both challenges. Tracking performance sufficient for triggering can be achieved by performing pattern recognition in a volume where the magnetic field is weak. In this regime, contiguous detector hits correspond to contiguous regions in track-parameter space, which significantly simplifies the switching task. A mapping between detector hits and the parameters of the tracks produced in the collisions is obtained from simulation. The result is used to associate a "zip-code" with each possible detector hit, which the nodes of the switching network use to route the hit properly. The LHCb geometry allows factorizing pattern recognition into two steps. First, tracks are assumed to be straight lines originating from a common nominal interaction point, and track finding is performed in a two-dimensional primary plane transverse to the beam, whose intersection with each track identifies the track's two parameters. Then, the determination of the momentum and of the actual spatial origin of the charged particle is treated as a small perturbation of the primary two-dimensional track.

Switching network
To distribute the hit information from the detector layers efficiently to the cells of the track-parameter space, we design an intelligent and economical information-delivery system that routes each hit in parallel to all and only those engines for which the hit is likely to contribute a significant weight. The switching network completes its processing in 30 clock cycles. Each cell in the track-parameter space is implemented as a logic module, the engine. The engine is a clocked pipeline that calculates the excitations; this takes 17 clock cycles. Finally, the logic that identifies the center of mass in the space of track parameters takes 11 clock cycles, plus another 10 cycles for fan-out. With a clock frequency of 350 MHz, the latency for reconstructing online tracks is less than 0.5 µs. Each Stratix V can host up to 900 engines, leaving approximately 25% of the logic available for other uses, including about 15% for switching and the logic for the center-of-mass calculation [6].
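As a cross-check of the numbers quoted for the hardware implementation, the latency budget can be added up explicitly. The sketch below assumes that the four stages (switching network, engine, center-of-mass logic, fan-out) run strictly one after the other, so that their cycle counts simply add; it also assumes one engine per cell when estimating how many devices the 22 500 cells would occupy. Both are simplifying assumptions for illustration, and the variable names are hypothetical.

```python
import math

CLOCK_HZ = 350e6       # FPGA clock frequency quoted in the text
CROSSING_HZ = 40e6     # LHC bunch-crossing rate

# Clock cycles per stage, as quoted in the text.
STAGE_CYCLES = {
    "switching network": 30,
    "engine": 17,
    "center-of-mass logic": 11,
    "fan-out": 10,
}

# Assumption: stages are sequential, so cycle counts add.
total_cycles = sum(STAGE_CYCLES.values())
latency_ns = total_cycles / CLOCK_HZ * 1e9

# Number of bunch crossings elapsing during one latency period.
crossing_period_ns = 1e9 / CROSSING_HZ
crossings_in_flight = latency_ns / crossing_period_ns

# Assumption: one engine per cell, with the per-device capacity taken at face value.
N_CELLS = 22500
ENGINES_PER_DEVICE = 900
n_devices = math.ceil(N_CELLS / ENGINES_PER_DEVICE)

print(total_cycles, round(latency_ns, 1), round(crossings_in_flight, 1), n_devices)
```

Under these assumptions the total comes to 68 clock cycles, roughly 194 ns, consistent with the quoted bound of 0.5 µs. Because this is longer than the 25 ns spacing between crossings, several events must be in the pipeline at once to sustain the 40 MHz throughput.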

Conclusions
We showed that high-quality tracking in large LHC detectors is possible at a 40 MHz event rate with sub-µs latencies when appropriate parallel algorithms are used in conjunction with current high-end FPGA devices. This opens the interesting possibility of designing high-rate experiments in which track reconstruction happens transparently as part of the detector readout.

Figure 1: Left: response of the retina algorithm (only the $(u, v)$-plane, where the pattern recognition is performed) to a generic collision from the default LHCb simulation, with an instantaneous luminosity of $L = 2 \times 10^{33}\,\rm cm^{-2}\,s^{-1}$. The hole at the center of the figure is due to the physical hole in the VELO layers. Right: a zoom of the retina response.

Figure 2: Tracking reconstruction efficiency of the retina algorithm (in red) and of the offline VELO+UT algorithm (in blue), as a function of (a) $p_T$ and (b) $d$. The distribution of the parameter considered is also shown in black. The luminosity is $L = 3 \times 10^{33}\,\rm cm^{-2}\,s^{-1}$.

Figure 3: Illustration of the device architecture.