Comparison of two hardware-based hit filtering methods for trackers in high-pileup environments

As experiments in high energy physics aim to measure increasingly rare processes, they continually strive to increase the expected signal yields. In the case of the High Luminosity upgrade of the LHC, the luminosity is raised by increasing the number of simultaneous proton-proton interactions, so-called pile-up. This increases the expected yields of signal and background processes alike. The signal is embedded in a large background of processes that mimic signal events. It is therefore imperative for the experiments to develop new triggering methods to effectively distinguish the interesting events from the background. We present a comparison of two methods for filtering detector hits to be used for triggering on particle tracks: one based on a pattern matching technique using Associative Memory (AM) chips and the other based on the Hough transform. Their efficiency and hit rejection are evaluated for proton-proton collisions with varying amounts of pile-up using a simulation of a generic silicon tracking detector. It is found that, while both methods are feasible options for an efficient track trigger, the AM-based pattern matching produces fewer hit combinations than the Hough transform whilst keeping more of the true signal hits. We also present the effect on the two methods of increasing the amount of support material in the detector and of introducing inefficiencies by deactivating detector modules. The increased support material has negligible effects on the efficiency for both methods, while dropping 5% (10%) of the available modules decreases the efficiency to about 95% (87%) for both methods, irrespective of the amount of pile-up.


Introduction
At the Large Hadron Collider (LHC), experiments such as ATLAS [1] and CMS [2] analyze proton-proton interactions to study the nature of matter. The rare phenomena that are searched for are hidden in an enormous background from well-known physics interactions. To increase sensitivity to interesting phenomena, the instantaneous luminosity in the LHC will be increased [3] to $5 \times 10^{34}$ cm$^{-2}$ s$^{-1}$, which is 5 times higher than the design value of the current LHC. The drawback of the higher luminosity is that it will also increase the number of interactions per bunch crossing, referred to as pile-up, by the same factor. This presents new technical challenges to the experiments.
The sheer amount of data generated by the collisions at the High Luminosity upgrade of the LHC (HL-LHC) makes it impossible to read out and store all events. That is why experiments such as ATLAS and CMS use a trigger system to select the most interesting events. It is particularly important to be able to trigger on high transverse momentum ($p_T$) particles, since they provide a clean signature for a variety of interesting processes in and beyond the Standard Model. It is vital to keep the trigger $p_T$ thresholds as low as possible since the acceptance fraction of interesting processes, such as Higgs bosons decaying to tau leptons, decreases rapidly with increasing $p_T$ thresholds.
The trigger is usually organized in at least two levels: a hardware trigger based on analogue electronics and logic in Field Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) followed by a software trigger running on a computer farm. The hardware trigger in the two LHC experiments reduces the event rate from the bunch-crossing rate of 40 MHz to the order of hundreds of kHz. It only has a few microseconds to make a decision and must be located on or close to the detector. The software trigger uses sophisticated reconstruction algorithms and makes a trigger decision on the timescale of seconds. After the software trigger the event rate to offline storage is a few hundred Hz.
In order to maintain low trigger thresholds, high efficiency and low rates, the two LHC experiments are developing methods to use tracking information in the hardware trigger. This has until now not been possible because of insufficient bandwidth to read out the tracker and resources to process the tracker data fast enough for the trigger decision.
In this paper we compare two pattern recognition methods that can be used in the hardware trigger for fast hit filtering of tracking data. In section 2 we present the motivation and challenge of using tracking in the trigger. In section 3 we present the simulation framework. In sections 4 and 5 we present the two methods for fast hit filtering: pattern matching using Associative Memory (AM) chips and the Hough transform. Section 6 presents the results from simulation and compares the two methods. A summary is given in section 7.

Triggering with tracks
The tracker is the most granular detector system in a collider experiment. It consists of 10-20 layers, located at increasing distance from the interaction point, equipped with silicon detector modules. The layers closest to the interaction point are the most granular and are equipped with pixellated sensors giving two-dimensional information on where high-energy particles cross the sensor. Sensors with strips are used at larger distances from the interaction point. These are less granular than the pixel sensors and give only one-dimensional information, but require less electronics to be read out. Two-dimensional information can be obtained from the strip layers by using two layers of silicon strip sensors in close proximity that are rotated by an angle with respect to each other.
Reading out data from the tracker requires high bandwidth. In current trackers the services needed to meet the required bandwidth exceed the available space and power budget. Hence, methods to reduce the quantity of data to be read out are needed. This is done by introducing data reduction on-detector, as developed by CMS, or by using readout in Regions of Interest (RoIs) seeded by the muon and calorimeter triggers, as developed by ATLAS. (In ATLAS, an RoI is defined as a region in the track parameter phase-space that spans a maximum of 10 % of the tracker volume.) The processing of tracker data is done in two main steps. The first is to identify hits from high-$p_T$ particles originating from primary interactions. This is done with coarse resolution in order to save time and computing resources. The filtered hits are then forwarded with full resolution to a fitting step where the track parameters are determined. The performance of the hit filtering step is essential to keep the number of fits at a low level. The hit filtering step must also have a high efficiency so as not to lose potentially interesting events.
The two methods studied in this paper are the Hough transform and pattern matching with AM, both of which can be implemented in hardware. The study is influenced by the development in ATLAS, but the studies performed are general and valid for similar problems. We have decided to limit the study to an eight-layer system to make it easier to compare the two methods. An eight-layer system is a reasonable choice for a hardware-based system, since bandwidth constraints and limited computing resources still do not allow all potential layers to be used.

Simulation
A generic tracking detector, similar in layout to those of the ATLAS and CMS experiments, is modeled in Geant4 [4]. The detector has a barrel section, with sensors arranged in cylindrical shells around the beam axis, and endcaps with disks of sensors in the transverse plane. A schematic view of the layout is shown in figure 2, and scans of the radiation length are shown in figure 3. The sensors are modeled as rectangular boxes of silicon, 320 µm thick in the radial (beam axis) direction in the barrel (endcaps). The hit positions in the sensors are recorded modulo the readout segmentation, i.e. the local (x, y) coordinates are turned into discrete hit positions of (row, column). Figure 1 shows the layout and local coordinates of the sensors on the support material. The sensors with one-dimensional segmentation, so-called strip detectors, divide the local x-coordinate in steps of 80 µm. Sensors with two-dimensional segmentation, so-called pixel detectors, divide the local x-coordinate in steps of 25 µm and the local y-coordinate in steps of 250 µm. The barrel region contains five layers with pixel sensors and five double layers of strip sensors, with details given in table 1. In addition to the sensors, all layers contain material in the form of 3 mm thick carbon supports. The double layers have strip sensors on each side of the stave, rotated with a stereo angle of 40 mrad. The stereo angle provides fine resolution for offline data, but at the hardware track trigger stage this information is not available.
Single muons were generated directly in Geant4 using a particle gun, while minimum bias events were generated with Pythia 8 [5] using the SoftQCD:inelastic setting for proton beams with a 14 TeV center of mass energy. The propagation of the particles through the detector was simulated in Geant4 with the FTFP_BERT physics list. The muons were generated with impact parameters sampled from uniform distributions of $|z_0| < 150$ mm and $|d_0| < 2$ mm. The transverse momentum ranged from 4 to 400 GeV and was sampled uniformly in $1/p_T$. In addition, the muon $\eta$ and $\phi$ were restricted to one RoI, $0.1 \le \eta_0 \le 0.3$ and $0.3 \le \phi_0 \le 0.5$.
The Geant4 simulation is followed by a digitization step. In this step a single muon event is overlaid with 0 to 260 minimum bias events to emulate various degrees of pile-up. Muons with primary hits outside the RoI are discarded. The hits are merged by adding together the total energy deposited in each pixel or strip and requiring that the total energy deposited corresponds to more than 1 fC of collected electric charge. It is these merged hits that are referred to as hits in the rest of the paper. Hits coming directly from Geant4-generated muons are labeled as signal and everything else as background.
The number of connected strips or pixels in $\phi$ that have a hit, the so-called cluster width, is also calculated for each hit. The cluster width is later used in the Hough transform and the AM pattern matching to ignore hits with a cluster width larger than 3. The physical motivation for this is that high-$p_T$ tracks are straighter and produce smaller clusters than the more strongly bent low-$p_T$ tracks from minimum bias events.
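The muon kinematics described above can be illustrated with a small sampling sketch. It is not the particle gun used in the study: the function names are hypothetical, and the flat distributions in η_0 and φ_0 and the equal mix of charges are assumptions, since the text only states the allowed ranges. The only point it is meant to show is what sampling uniformly in 1/p_T implies for the p_T spectrum.

```python
import random

PT_MIN_GEV, PT_MAX_GEV = 4.0, 400.0   # range quoted in the text

def sample_pt(rng):
    """Draw p_T in GeV such that 1/p_T is uniformly distributed."""
    inv_pt = rng.uniform(1.0 / PT_MAX_GEV, 1.0 / PT_MIN_GEV)
    return 1.0 / inv_pt

def sample_muon(rng):
    """Hypothetical particle-gun parameters for one muon inside the RoI."""
    return {
        "pt": sample_pt(rng),
        "charge": rng.choice((-1, +1)),     # assumption: both charges, equal odds
        "z0": rng.uniform(-150.0, 150.0),   # mm
        "d0": rng.uniform(-2.0, 2.0),       # mm
        "eta0": rng.uniform(0.1, 0.3),      # assumption: flat inside the RoI
        "phi0": rng.uniform(0.3, 0.5),      # assumption: flat inside the RoI
    }

rng = random.Random(12345)
muons = [sample_muon(rng) for _ in range(10000)]
low = sum(m["pt"] < 10.0 for m in muons) / len(muons)
print(f"fraction of muons with p_T < 10 GeV: {low:.2f}")
```

Because the sampling is flat in 1/p_T, the sample is dominated by low-p_T muons: the expected fraction below 10 GeV is (1/4 − 1/10)/(1/4 − 1/400) ≈ 0.61.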
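As a minimal sketch of the cluster-width cut, assume that the fired channels of one sensor are given as integer strip indices along φ (a simplification; the representation and the function names are hypothetical). Each hit is assigned the length of the contiguous run it belongs to, and hits in runs wider than 3 are dropped.

```python
def cluster_widths(fired_strips):
    """Map each fired strip index to the width of its contiguous cluster."""
    widths = {}
    run = []
    for strip in sorted(set(fired_strips)):
        if run and strip != run[-1] + 1:   # gap found: close the current run
            for s in run:
                widths[s] = len(run)
            run = []
        run.append(strip)
    for s in run:                          # close the last run
        widths[s] = len(run)
    return widths

def keep_narrow(fired_strips, max_width=3):
    """Return the strips whose cluster width does not exceed max_width."""
    widths = cluster_widths(fired_strips)
    return [s for s in sorted(widths) if widths[s] <= max_width]

kept = keep_narrow([10, 11, 20, 21, 22, 23, 30])
```

Here the four-strip cluster 20-23 is rejected, while the two-strip cluster and the isolated strip are kept.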

Hough transform
The Hough transform was originally developed for particle physics to detect particle tracks in photographic plates from bubble chambers in the late 1950s [6]. Since then the method has been generalized and is commonly used to detect features that can be represented by a few parameters, such as lines and circles, in a wide range of imaging applications [7]. The Hough transform is actively used in offline tracking, e.g. in ALICE [8], and is gaining interest with recent advancements in parallel computing; see for example [9]. To be used in triggering, the Hough transform must be implemented in hardware, for instance using FPGAs. In this paper, the Hough transform is simulated using C++, but a very similar implementation has been written in OpenCL and run on an FPGA.
The Hough transform calculates all combinations of parameters consistent with each data point and casts votes in a histogram-like object called an accumulator, which has one dimension for each parameter in the feature representation. The coordinates of the points in the accumulator with the most votes correspond to candidate features.
The track of a charged particle in the transverse plane of a uniform magnetic field, e.g. in a tracking detector, is described by a circular arc. If the interaction vertex is constrained to the origin of the coordinate system and the RoI is small enough in $\phi_0$ (such that $\sin(x) \approx x$), the track through a point with polar coordinates $(r, \varphi)$ can be described by
$$ \frac{A q B}{p_T} = \frac{\sin(\phi_0 - \varphi)}{r} \approx \frac{\phi_0 - \varphi}{r} \tag{4.1} $$
where $A \approx 1.5 \times 10^{-4}$ GeV $c^{-1}$ mm$^{-1}$ T$^{-1}$ is the unit conversion constant if $r$ is measured in millimeters, $B$ is the magnetic field in T, $p_T$ is the transverse momentum of the track in GeV $c^{-1}$, and $q$ is the sign of the electric charge. The accumulator is implemented as a two-dimensional histogram with $A q B / p_T$ on one axis and $\phi_0$ on the other. The accumulator is filled by entering the hit coordinates into equation (4.1) and sweeping $\phi_0$ over the range defined by the RoI to get the $A q B / p_T$ coordinate. However, each bin does not simply contain the number of hits. Instead, the bin consists of an 8-bit number that keeps track of which of the eight layers have been hit in the small parameter space that it spans. It also has a list of all the hits in the bin. After the accumulator has been filled, bins with fewer than 6 layers hit are discarded. The surviving bins are track candidates with rough track parameters $\phi_0$ and $p_T$ given by the location of the bin in the accumulator, together with a list of hits that can be sent to the track fitter.
The proton-proton collisions in ATLAS at the HL-LHC are expected to be spread out uniformly over 300 mm along the beam line, i.e. the z-axis, making the RoI very wide in $z_0$ [10]. Since the Hough transform, as implemented here, looks at a projection in the transverse plane, it is difficult to separate tracks with similar $\phi_0$ and $p_T$ that originate from different points along the beam line. The two conventional approaches to separating tracks in $z_0$ using the Hough transform are either to parameterize the track in $z_0$ as well as $\phi_0$, or to make a second pass of the Hough transform to find lines in the r-z plane after finding track candidates in the transverse plane. The problem with the first approach is that it requires sweeping both $\phi_0$ and $z_0$, which adds another dimension to the accumulator and increases the number of computations needed quadratically. The problem with the second approach is that the Hough transform in the r-z plane would have to be performed after completing the Hough transform in the transverse plane and, since the hits are read out serially, would increase the execution time in proportion to the number of hits passing the first stage, approximately 30 % for single muons embedded in 200 minimum bias events. Also, the track finding efficiency is found to be poor due to the low resolution of the strip layers in z.
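The accumulator filling can be sketched as below. This is an illustration under stated assumptions, not the C++ or OpenCL implementation used in the study: it assumes the small-angle relation A q B / p_T ≈ (φ_0 − φ)/r for a hit at (r, φ), a magnetic field of 2 T (not given in the text), hypothetical layer radii, and a deliberately small number of bins. The names `fill_accumulator` and `track_candidates` are hypothetical.

```python
import math

A = 1.5e-4          # GeV/c per (mm T), value quoted in the text
B_FIELD = 2.0       # T, assumption: the field strength is not given in the text
PT_MIN = 4.0        # GeV, lowest muon p_T in the simulated sample
N_LAYERS = 8

def fill_accumulator(hits, phi0_range=(0.3, 0.5), n_phi=50, n_curv=30):
    """hits: iterable of (layer, r_mm, phi). Returns {(i_curv, i_phi): [layer_mask, hit_list]}."""
    phi_lo, phi_hi = phi0_range
    d_phi = (phi_hi - phi_lo) / n_phi
    curv_max = A * B_FIELD / PT_MIN            # largest |A q B / p_T| considered
    d_curv = 2.0 * curv_max / n_curv
    acc = {}
    for hit in hits:
        layer, r, phi = hit
        for i_phi in range(n_phi):             # sweep phi_0 over the RoI
            phi0 = phi_lo + (i_phi + 0.5) * d_phi
            curv = (phi0 - phi) / r            # assumed small-angle relation
            i_curv = math.floor((curv + curv_max) / d_curv)
            if 0 <= i_curv < n_curv:
                cell = acc.setdefault((i_curv, i_phi), [0, []])
                cell[0] |= 1 << layer          # 8-bit mask of layers that were hit
                cell[1].append(hit)            # keep the hit for the track fitter
    return acc

def track_candidates(acc, min_layers=6):
    """Keep the bins whose layer mask has at least min_layers bits set."""
    return {k: v for k, v in acc.items() if bin(v[0]).count("1") >= min_layers}

# Toy track: phi_0 = 0.402, A q B / p_T = 1.75e-5 per mm, one hit per layer.
radii = [50.0 * (i + 1) for i in range(N_LAYERS)]   # hypothetical layer radii in mm
hits = [(layer, r, 0.402 - 1.75e-5 * r) for layer, r in enumerate(radii)]
cands = track_candidates(fill_accumulator(hits))
```

Storing a layer bitmask rather than a vote count is what lets the selection require hits in distinct layers: several hits in the same layer set the same bit only once.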
Our approach to help separate tracks in $z_0$ is to slice the RoI into several parts in $z_0$, as illustrated in figure 4, and to fill a separate accumulator for each slice by sorting the hits according to their z-coordinate. The RoI slices overlap slightly in z because of the $\eta_0$ range and the finite detector resolution in z, especially for the strips. The $z_0$-slicing approach is found to reduce the number of hits passing the Hough transform by 70 % without affecting the track finding efficiency for single muons embedded in 200 minimum bias events.
The number of bins in $A q B / p_T$ and $\phi_0$, as well as the number of slices in $z_0$, need to be tuned to find a configuration that provides a high enough track finding efficiency while suppressing as much of the low-$p_T$ background as possible. The optimum number of bins in the accumulator depends strongly on the detector geometry, the detector resolution, and the layer configuration used. The optimal number of $z_0$-slices is mostly affected by the pile-up conditions.
The Hough transform can be implemented in hardware using an FPGA. The FPGA receives the hits serially and fills the whole $\phi_0$ range and all $z_0$-slices in parallel. There are $N_z$ accumulators, one for each $z_0$-slice, each with $N_{p_T}$ bins in $A q B / p_T$ and $N_\phi$ bins in $\phi_0$, where every bin consists of 8 bits, one for each layer. The total memory requirement for the accumulators is $N_z \times N_{p_T} \times N_\phi \times 8$ bit. When all hits have been processed, bins with hits in at least 6 unique layers are considered track candidates with rough track parameters $\phi_0$, $p_T$, and $z_0$. At this point full-resolution hits need to be associated with the track candidates and sent to the track fitter. This is expected to be done in a separate FPGA.
The available hardware constrains the size of the accumulator used. For the particular geometry and layer configuration used in this study (described in section 3), the accumulator typically has $N_z = 6$, $N_{p_T} = 120$, and $N_\phi = 300$, amounting to a memory requirement of 216 kB. The low latency requirement of a hardware track trigger prevents the use of external memory; hence, the accumulator must be stored internally in the FPGA.
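One way to picture the z_0 slicing is the sketch below, which decides which slices a hit at radius r and position z is compatible with. It is only a plausible reading of the description: it assumes straight lines in the r-z plane, z = z_0 + r sinh(η_0), a positive η_0 range as in the RoI used here, equal-width slices, and an optional resolution margin. The function name, the slice edges and the margin are hypothetical.

```python
import math

def slices_for_hit(r, z, z0_edges, eta_range=(0.1, 0.3), margin=0.0):
    """Return the indices of the z_0 slices that a hit at (r, z) in mm can belong to.

    Assumes eta_range is positive and r > 0, so that z grows with both z_0 and eta.
    """
    eta_lo, eta_hi = eta_range
    compatible = []
    for i in range(len(z0_edges) - 1):
        z_min = z0_edges[i] + r * math.sinh(eta_lo) - margin
        z_max = z0_edges[i + 1] + r * math.sinh(eta_hi) + margin
        if z_min <= z <= z_max:
            compatible.append(i)
    return compatible

# Six hypothetical slices covering |z_0| < 150 mm.
edges = [-150.0, -100.0, -50.0, 0.0, 50.0, 100.0, 150.0]
# Hit left at r = 300 mm by a track with z_0 = 10 mm and eta_0 = 0.2.
z_hit = 10.0 + 300.0 * math.sinh(0.2)
assigned = slices_for_hit(300.0, z_hit, edges)
```

The hit is assigned to the slice containing the true z_0 and to one neighbor, which illustrates why the slices overlap in z: the η_0 range alone spreads each slice by r (sinh η_hi − sinh η_lo) at radius r.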
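As a quick check of the quoted memory requirement, the accumulator dimensions can be multiplied out, taking 1 kB = 1000 B:

```latex
\begin{align*}
N_z \times N_{p_T} \times N_\phi \times 8\ \text{bit}
  &= 6 \times 120 \times 300 \times 8\ \text{bit} \\
  &= 216\,000\ \text{bins} \times 1\ \text{B} \\
  &= 216\ \text{kB}
\end{align*}
```

This counts only the 8-bit layer masks; the storage needed for the per-bin hit lists is not included in the figure.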
The vertex constraint imposed when deriving equation (4.1) is necessary to reduce the computational complexity and the time needed to perform the Hough transform. Without the vertex constraint each hit has to be paired with every other hit and the amount of computation grows as the square of the number of hits in the event. However, imposing a vertex constraint has an important drawback: it limits the ability to find tracks with large impact parameters, i.e. those which originate far from the beam line, especially if using detector layers located close to the interaction point.

Pattern matching
Pattern matching is a commonly employed technique in particle track finding; a review of its use can be found in [11]. At the LHC, pattern matching is used by the ATLAS and CMS experiments both in track triggers and offline reconstruction to reduce the computational load on track fitting algorithms. A pattern usually consists of a set of hit positions in the detector, which correspond to the track of a high energy particle. The actual hits in the detector can be compared to these patterns, and only the hits that match a pattern are propagated to the track fitting step. The pattern matching step can be made very fast by using a subset of the available detector hit positions, a coarser resolution, or both. ATLAS is currently commissioning a hardware track trigger, the Fast TracKer (FTK) [12], which performs the pattern matching in AM chips [13]. This technology has been used in a wide variety of applications outside of particle physics, such as network switches and artificial neural networks. The AM chips can compare a detector hit to the stored patterns in parallel, making for a very fast system. This method is considered for a fast hardware track trigger in the HL-LHC era by both ATLAS and CMS. A constraint on the design of the trigger comes from the number of patterns that can be stored on a single AM chip, which also limits the number of layers that can be used. The current estimate for the AM chips under development for the ATLAS upgrade is that they can hold about half a million patterns using eight layers. Here we assume that two such chips can be used to cover a phase space of 0.2 × 0.2 in $\Delta\eta_0 \times \Delta\phi_0$, based on the cost and power requirements of these chips.
In this study, the patterns consist of a set of hit positions with coarse resolution in a subset of eight silicon detector layers. The coarse resolution hit positions, so-called superstrips, are formed by groups of contiguous silicon strip or pixel elements. A unique superstrip index (SSID) is created from the module and superstrip number, and stored in the pattern. The patterns are generated by simulating single muons and recording the hit positions in the selected layers. About 30 million muons are used to build a bank of a few million patterns. A large sample is needed to get good efficiency, but many events produce the same pattern, which is why the bank typically contains fewer than five million patterns. The patterns are ordered by their use count, i.e. how many events produce that same pattern. To respect the hardware constraints, only the first one million patterns are used to match hits in a separate sample of muons overlaid with minimum bias events. During the matching all hits in the detector are converted to their superstrip index and compared to each entry in the pattern bank. Each matched pattern can have multiple detector hits within its superstrips, all of which are propagated to the track fit.
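The bank construction described above can be sketched as follows. The SSID encoding, the superstrip width and the number of superstrips per module are hypothetical choices made for illustration, and the toy tracks use two layers instead of eight to keep the example short.

```python
from collections import Counter

SUPERSTRIP_WIDTH = 16         # strips per superstrip, hypothetical
SUPERSTRIPS_PER_MODULE = 64   # hypothetical; keeps SSIDs unique across modules

def to_ssid(module_id, strip):
    """Combine module and superstrip number into one superstrip index."""
    return module_id * SUPERSTRIPS_PER_MODULE + strip // SUPERSTRIP_WIDTH

def build_bank(training_tracks, max_patterns):
    """training_tracks: one tuple of (module_id, strip) per layer for each muon.

    Returns the max_patterns most frequently produced patterns, ordered by use count.
    """
    use_count = Counter()
    for track in training_tracks:
        pattern = tuple(to_ssid(module_id, strip) for module_id, strip in track)
        use_count[pattern] += 1
    return [pattern for pattern, _ in use_count.most_common(max_patterns)]

tracks = [
    ((0, 5), (1, 40)),
    ((0, 7), (1, 45)),     # same superstrips as the first track
    ((0, 100), (1, 200)),
    ((0, 5), (1, 33)),     # same superstrips again
]
full_bank = build_bank(tracks, max_patterns=10)
truncated_bank = build_bank(tracks, max_patterns=1)
```

Truncating by use count keeps the patterns that cover the largest share of training muons, which is why the efficiency loss from the hardware limit is smaller than the fraction of patterns dropped.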
The patterns use eight separate layers and a match requires seven of these layers to have a superstrip hit. Since a realistic detector can have inefficiencies due to the lack of coverage, e.g. dead space between sensors, the efficiency of the pattern matching is increased by using patterns with wild cards. If a muon lacks a hit in any of the layers during the pattern generation, the layer can be marked as a wild card for this pattern. In the pattern matching, a pattern with a wild card always considers that layer to have been hit. With a maximum of two wild cards per pattern it is possible to have a matching pattern with only five real hits in separate layers, which increases the rate of fake matches from combinations of background hits and degrades the track fit performance. For these reasons, patterns with two wild cards are required to have all eight layers hit to be matched, i.e. six real hits.
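The matching rule with wild cards can be written down compactly. In this sketch a wild-card layer is represented by `None` and the event is given as one set of fired SSIDs per layer; both are hypothetical representations chosen for illustration, not the AM chip logic itself.

```python
N_LAYERS = 8
WILD = None   # marker for a wild-card layer (hypothetical representation)

def required_layers(n_wild):
    """Layers that must count as hit: all eight for two wild cards, otherwise seven."""
    return N_LAYERS if n_wild >= 2 else N_LAYERS - 1

def pattern_matches(pattern, fired):
    """pattern: 8 entries, each an SSID or WILD; fired: list of 8 sets of SSIDs."""
    n_wild = sum(1 for ssid in pattern if ssid is WILD)
    n_hit = sum(1 for layer, ssid in enumerate(pattern)
                if ssid is WILD or ssid in fired[layer])   # wild card always counts
    return n_hit >= required_layers(n_wild)

def real_hits_needed(n_wild):
    """Minimum number of real hits for a pattern with n_wild wild cards."""
    return required_layers(n_wild) - n_wild

pattern = tuple(range(100, 108))
fired_7 = [{100 + i} if i < 7 else set() for i in range(N_LAYERS)]
fired_6 = [{100 + i} if i < 6 else set() for i in range(N_LAYERS)]
two_wild = (WILD, WILD) + tuple(range(102, 108))
```

With this rule a match always rests on at least six real hits: seven for a pattern without wild cards, and six for patterns with one or two.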
The number of patterns in a bank is mainly driven by the size of the superstrips, as the number of possible patterns grows as the superstrips get smaller. Using larger superstrips is not an ideal option, since it increases the number of fake matches. A compromise can be found by using don't care (DC) bits. A DC bit on the SSID in the pattern bank effectively doubles the size of the superstrip, as it disregards the value of the least significant bit in the SSID. Many patterns share the same SSID in several layers and/or have neighboring SSIDs. These patterns can then be combined into one pattern with one or more DC bits. Since a DC bit can be set individually for each pattern and layer, this creates a bank of patterns with variable resolution. This reduces the total number of patterns in the bank but only marginally increases the number of fake matches.
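The effect of a DC bit can be illustrated with a short sketch. Each pattern layer is represented here as a pair (SSID, number of DC bits), and two patterns are merged when they differ in exactly one layer by the least significant SSID bit only. This representation and the function names are hypothetical, and the sketch handles a single DC bit per merge.

```python
def layer_matches(entry, hit_ssid):
    """entry: (ssid, n_dc). The n_dc least significant bits are ignored."""
    ssid, n_dc = entry
    return (ssid >> n_dc) == (hit_ssid >> n_dc)

def try_merge(p, q):
    """Merge two patterns that differ in one layer by the least significant bit.

    Returns the merged pattern with one DC bit set in that layer, or None.
    """
    differing = [i for i, (a, b) in enumerate(zip(p, q)) if a != b]
    if len(differing) != 1:
        return None
    i = differing[0]
    (ssid_a, dc_a), (ssid_b, dc_b) = p[i], q[i]
    if dc_a == 0 and dc_b == 0 and (ssid_a >> 1) == (ssid_b >> 1):
        merged = list(p)
        merged[i] = (ssid_a & ~1, 1)   # clear the ignored bit, mark one DC bit
        return tuple(merged)
    return None

merged = try_merge(((4, 0), (10, 0)), ((5, 0), (10, 0)))
not_mergeable = try_merge(((5, 0), (10, 0)), ((6, 0), (10, 0)))
```

Note that SSIDs 5 and 6 are neighbors but cannot share a single DC bit, because they differ in more than the least significant bit; only aligned pairs such as 4 and 5 can be merged this way.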

Comparison of the hit filtering
The track fitter that follows the hit filtering will have to perform one fit for each hit combination. Therefore, the number of combinations of hits after the pattern recognition stage is an important measure of performance. Both the Hough transform and the AM approach group hits together. In the Hough transform the hits are grouped into bins in the accumulator, while in the AM pattern matching, each matched pattern is associated with a group of hits. The total number of fits $N_\mathrm{fits}$ needed is calculated by taking the product of the number of hits in each layer and then summing over the groups:
$$ N_\mathrm{fits} = \sum_{g} \prod_{l} n_{g,l} $$
where $n_{g,l}$ is the number of hits in layer $l$ of group $g$.
The other important measure of performance is the track finding efficiency. A muon track is considered to be successfully found if at least 6 out of 8 hits from the primary muon are found in unique layers. This definition is motivated by the fact that the subsequent track fit, e.g. the one planned for the ATLAS Phase-II hardware track trigger, is foreseen to fit tracks with 8, 7, and 6 out of 8 hits. The efficiency is defined as the number of events with a muon found divided by the total number of events. One thing to note is that the track finding efficiency of the AM pattern matching is limited by the number of patterns that can fit in the hardware. The Hough transform, on the other hand, can be configured to provide almost any efficiency within the same hardware. However, a higher efficiency comes at the cost of a larger number of possible hit combinations. For the results presented here, the Hough transform was tuned to provide an efficiency similar to that of the AM method.
In section 6.1 we present and compare the performance of the hit filtering for AM pattern matching and the Hough transform in terms of the number of hit combinations after the hit selection, the efficiency of finding muon tracks, and how many hits originating from the true muon pass the hit filters. In section 6.2 we present the effect of adding 50 % more material, and in section 6.3 we present the effect of randomly turning off 5 % or 10 % of the modules.

Study of nominal performance
Figure 5 shows the number of hit combinations as a function of the number of hits in the RoI when both methods are providing roughly the same track finding efficiency. The number of hit combinations required for the Hough transform is higher than that required for the AM method. Both show the same exponential increase as a function of the number of hits. The distributions of the number of hit combinations are not symmetrical and have tails that extend to very high numbers of combinations, as seen in figure 6.
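The counting of fits, a product of per-layer hit counts within each group summed over all groups, can be sketched as below. Skipping layers without hits in the product is an assumption made here, since a group can be accepted with fewer than eight layers hit; the function name is hypothetical.

```python
from math import prod

def n_fits(groups):
    """groups: one list of per-layer hit counts for each accumulator bin or matched pattern.

    Layers without hits are skipped in the product (assumption, see lead-in).
    """
    return sum(prod(n for n in layer_counts if n > 0) for layer_counts in groups)

groups = [
    [1, 1, 2, 1, 1, 3, 1, 1],   # 2 * 3 = 6 combinations
    [2, 2, 1, 1, 1, 1, 0, 1],   # 2 * 2 = 4 combinations, one empty layer
]
total = n_fits(groups)
```

Because the count is a product over layers, a few extra pile-up hits in several layers of the same group multiply rather than add, which is what drives the long tails of the distribution.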
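The efficiency definition can be expressed as a short sketch. It assumes that the six muon hits must be found within a single group (one accumulator bin or one matched pattern) and that hits carry an identifier that can be compared with the truth record; both the data layout and the function names are hypothetical.

```python
def muon_found(groups, muon_hit_ids, min_layers=6):
    """groups: list of groups, each a list of (layer, hit_id).

    True if some group holds muon hits in at least min_layers distinct layers.
    """
    for group in groups:
        layers = {layer for layer, hit_id in group if hit_id in muon_hit_ids}
        if len(layers) >= min_layers:
            return True
    return False

def efficiency(events):
    """events: list of (groups, muon_hit_ids), one entry per simulated event."""
    n_found = sum(1 for groups, muon_hit_ids in events
                  if muon_found(groups, muon_hit_ids))
    return n_found / len(events)

truth = {f"mu{layer}" for layer in range(8)}
good_group = [(layer, f"mu{layer}") for layer in range(7)] + [(7, "bkg0")]
poor_group = [(layer, f"mu{layer}") for layer in range(5)] + [(5, "bkg1"), (6, "bkg2")]
events = [([good_group], truth), ([poor_group], truth)]
eff = efficiency(events)
```

In the toy sample the first event has muon hits in seven layers of one group and counts as found, while the second has only five and does not.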
The efficiency of finding tracks from primary muons is plotted against the number of hits in figure 7 and against the true muon $p_T$ in figure 8. Both methods have a flat efficiency as a function of the number of hits in the RoI. Having high efficiency for leptons with $p_T$ in the 4-20 GeV range is important to maintain a low trigger $p_T$ threshold and to perform track-based isolation. As shown in figure 8, the difference between the track finding efficiency at 4-8 GeV and 64-400 GeV is only a fraction of a percent for the AM method. The Hough transform shows a larger difference, approximately 1.5 %.
It is important to verify that the hit filtering methods are not biased towards any particular layer. The inner layers are expected to have more hits since tracks from low-$p_T$ charged particles are bent off by the magnetic field. As expected, the number of hits in each layer decreases with larger radius, as shown in figure 9. After the AM pattern matching or Hough transform all layers have roughly the same number of hits, as seen in figure 10. This is expected for tracks of high-$p_T$ particles and verifies that neither method is biased towards any particular layer.
The number of hits from primary muons sent to the fitter will have an effect on the resolution of the track parameters, with the best performance obtained when 8 hits can be used. Figure 11 shows the fraction of events where the AM pattern matching and the Hough transform find the same or a different number of muon hits, and figure 12 shows the distributions of the number of muon hits found per event.

The effect of detector material
The effect on the track finding efficiency of adding 50 % more service material is shown as a function of the number of hits in the RoI in figure 13. The AM method shows a small overall increase in efficiency while the Hough transform shows an overall decrease.
Looking at the $p_T$ dependence, shown in figure 14, the AM method has a small increase in efficiency, although the effect is compatible with a statistical fluctuation. The Hough transform shows a decrease in efficiency of approximately 1 % at low and high $p_T$, although these effects are also within statistical uncertainties. Note also that the Hough transform should be able to regain efficiency by re-tuning the accumulator configuration, at the cost of more hits surviving the selection.

The effect of dropping detector modules
Based on experience with the current trackers at the LHC, a failing module is a more likely cause for loss of efficiency than a general reduction in quantum efficiency of the silicon sensors. In this study, inefficiencies are introduced by randomly removing single detector modules from the simulation. Figure 15 shows the muon track finding efficiency as a function of the number of hits per RoI. Removing 5 % of the modules reduces the efficiency to approximately 95 % for both the Hough transform and the AM pattern matching, with a slightly lower efficiency for the Hough transform. Removing 10 % of the modules reduces the efficiency to approximately 87 % for both methods.
The muon track finding efficiency as a function of the true muon $p_T$ is shown in figure 16. It exhibits some interesting features: for instance, the Hough transform shows a drop in efficiency at $p_T$ between 16 and 64 GeV but a high efficiency at higher $p_T$. The $p_T$ dependence of the AM efficiency is not as strong, but there is a decrease in efficiency between 32 and 64 GeV. Note, however, that the efficiency has high statistical uncertainties and no firm conclusions can be drawn.

Summary
Both methods are able to find at least six of the muon hits in each event with an efficiency at the level of 98-99 %. This efficiency is not directly dependent on the amount of pile-up. Both systems can be tuned to have nearly full efficiency, e.g. by using very large superstrips or accumulator bins. The optimization of the hit filtering must be done so that the efficiency is kept high while respecting the hardware constraints and rejecting enough hits to keep the latency of the fitter low.
The number of track fits, as shown in figure 5, grows rapidly and non-linearly with more pile-up for both methods. With modern FPGAs capable of performing a track fit every nanosecond, both methods yield a mean number of hit combinations that could be fitted within a latency of a few microseconds. However, the tails of the distributions are long and both methods will likely produce some events with several thousand hit combinations, for which the fitting might be truncated by the latency limit. The AM pattern matching shows a consistently lower mean and 75th percentile over the range of pile-up hits tested here. Thus, this option seems to give the best safety margin when it comes to fitting latency.
Another important factor for a track trigger is the quality of the tracks, e.g. the final trigger decision could be based on the $\chi^2$ and the $p_T$ measurement. Since in general the track fit quality will increase with the number of hits used, it is important for the hit filter to send as many of the true track hits as possible to the fitter within the same pattern or group. As shown in figures 11 and 12, the AM pattern matching finds more muon hits than the Hough transform in about 40 % of the events and has a larger fraction of events with 8 out of 8 hits found.
Given these results, the AM pattern matching looks like the best option for a hardware track trigger. However, there are some drawbacks to this method. AM chips with specifications that meet the task are not commercially available and have to be custom-made, which is expensive. The Hough transform, on the other hand, can be implemented in commercially available FPGAs. Both methods could be made resilient to severe detector failure, e.g. a whole detector layer ceasing to work, by keeping backup pattern banks and uploading them to the AM chips, or by simply switching the layer parameters in the Hough transform. If the beam parameters change or the pile-up conditions differ from the simulations, the Hough transform offers greater flexibility as it can run with an arbitrary number of layers, which can be increased if need be, while the AM chips must be designed for a fixed number.