Performance and operation of the calorimetric trigger processor of the NA62 experiment at CERN SPS

The NA62 experiment at the CERN SPS aims at measuring the branching ratio of the very rare kaon decay K+ → π+ ν ν̄ (expected to be of the order of 10⁻¹⁰) with a 10% background. Since a high-intensity kaon beam is required to collect enough statistics, the Level-0 trigger plays a fundamental role in both background rejection and particle identification. The calorimetric trigger collects data from various calorimeters and is able to identify clusters of energy deposit and determine their position, fine-time and energy. This paper describes the trigger system setup during the 2016 physics data taking. A newly implemented cluster counting algorithm is also presented.


The NA62 experiment [1] is a fixed-target experiment located in the CERN North Area. The 400 GeV/c high-intensity SPS proton beam impinges on a beryllium target, producing a 750 MHz secondary hadron beam of which 6% are kaons. The kaons are selected with a momentum of 75 GeV/c and decay in flight along a 65 m fiducial decay region (figure 1). To achieve the desired signal-to-background ratio of about 10 in the K+ → π+ ν ν̄ measurement, the experiment has to identify and veto the kaon decays, such as K+ → π+ π0 and K+ → µ+ ν, that have branching ratios up to 10¹⁰ times larger than the expected signal [1].
The Level-0 calorimetric trigger has the role of vetoing photons and selecting a π+ in the final state. Its capabilities were extended during the 2016 physics run by the implementation of a cluster counting algorithm, in addition to the total energy criterion.

The Calorimeters
A hermetic photon veto for the experiment is provided by various detectors, each covering a different angular region. From the inner to the outer region: the forward Small Angle Calorimeter (SAC), the Intermediate Ring Calorimeter (IRC) up to 1 mrad, the Liquid Krypton Calorimeter (LKr) up to 8.5 mrad and the Large Angle Photon Veto (LAV) up to 50 mrad (see figure 1).
Both IRC and SAC are made of alternating layers of lead and scintillator (Shashlik). Downstream of the LKr calorimeter there are two hadronic calorimeters called Muon-Veto 1 and 2 (MUV1 and MUV2), built as iron-scintillator sandwiches. They are all read out via PMTs, with a total of 176 channels for MUV1, 88 for MUV2, and 4 for IRC and SAC.
The LKr is a high-performance electromagnetic calorimeter, about 27 radiation lengths deep, with 13248 channels consisting of 2 × 2 cm² cells of thin copper-beryllium ribbons, kept at high voltage and immersed in a 10 m³ liquid krypton bath at 120 K acting as the active medium. For photons of more than 10 GeV energy, a detection inefficiency of 10⁻⁵, a time resolution of 350 ps and an energy resolution better than 1% allow its use as an efficient veto and for particle identification.
The back-end electronics is provided by 432 Calorimeter REAdout Modules (CREAMs) for the LKr and 10 more CREAMs for MUV1, MUV2, IRC and SAC [2]. They are VME modules installed in 29 crates (28 for the LKr alone). Each module digitizes, after proper shaping, up to 32 calorimeter channels with 40 MS/s FADCs with 14-bit dynamic range. It then buffers up to 8 GB of data (on a DDR3 SODIMM module) during the SPS spill and provides 2 lower-granularity Trigger Sum Links (TSL), each the sum of 16 (4 × 4) calorimeter cells, to the calorimetric Level-0 Trigger. The data, optionally zero-suppressed, are read out when there is a Level-1 trigger. A scheme of the calorimeter trigger and readout system for the LKr is shown in figure 2a.

The NA62 Trigger System
The calorimeter trigger is part of the larger experimental trigger. At full beam intensity, an average decay rate of 10 MHz hits the downstream detectors. In order to extract the few interesting decays from such an intense flux, a complex three-level trigger and data acquisition system was designed [3].
The Level-0 (L0) trigger algorithm is based on different sub-detectors (in addition to the calorimetric trigger, the charged hodoscope, the muon detector, the large-angle vetoes, the RICH detector) and it is performed by dedicated custom hardware modules, with a maximum output rate of 1 MHz and a maximum latency of 1 ms.
The data from each sub-detector (except those of the LKr calorimeter) are sent to a farm of PCs where the Level-1 (L1) and Level-2 (L2) software triggers are performed. L1 algorithms run on the data of individual detectors. A positive L1 decision triggers the readout of the calorimeter data (which are kept in memory until then) and, subsequently, L2 algorithms are executed on the complete event. The L1 trigger has a maximum output rate of 100 kHz with a non-fixed total latency of about 1 s, while the L2 trigger has an output rate of the order of 15 kHz with a maximum total latency equal to the basic data-taking time unit, the period of the SPS beam-delivery cycle of about 15 s.

The Level-0 Calorimetric Trigger
The trigger recognizes electromagnetic/hadronic clusters in the calorimeters along with their position, fine-time and energy [4,5]. A schematic view is provided in figure 2b. The inputs are the Trigger Sum Links (TSL), sums of ADC values, sampled at 40 MHz, that are continuously sent by the CREAM modules. There are 864 TSL for the LKr, 1 for IRC, 1 for SAC, 12 for MUV1 and 6 for MUV2.
The system is composed of 37 TEL62 boards [6,7]. They are 9U general-purpose data acquisition boards, based on the LHCb TELL1 [8] and common to many sub-detectors of the experiment, and they are equipped with custom dedicated I/O mezzanines (see figure 3a). Each board mounts five Altera Stratix III FPGAs (EP3SL200 [9]): four, called Pre-Processing (PP), receive and process data from the input mezzanines, and one, called Sync-Link (SL), collects and processes data from the PPs and sends them to the output mezzanine.
The calorimetric trigger is structured as a 3-layer system where each layer has a different number of TEL62 boards, with different I/O mezzanines, and plays a different role in the cluster search, which is performed through a 1D (vertical) + 1D (horizontal) algorithm. The trigger of the largest and most complex calorimeter, the LKr, is structured as follows:
• In a first front-end layer, composed of 28 boards (one board is shown in figure 3a, the crates in figure 4a), peaks are independently identified in 28 vertical slices of the calorimeter. Each slice is segmented vertically into 32 super-cells (4 × 4 calorimeter cells), where each super-cell corresponds to an input TSL.
• In a second layer composed of 7 merger boards (mezzanines shown in figure 3b and 3c, the crate in figure 4b), different peaks are horizontally merged when they are close in time and space, and therefore each cluster can be fully reconstructed.
• A concentrator board collects all the information and transmits, through the Gbit Ethernet mezzanine, a trigger primitive to the central L0TP for the trigger decision.
In the 2016 physics run the trigger decision was based on the total energy deposit of each calorimeter and on the number of clusters. The role of the merger boards was limited to collecting data from the front-end boards and delivering them to the final concentrator board, where the trigger logic is implemented.
The trigger for the MUV1, MUV2, SAC and IRC is realized with one front-end board directly connected to the concentrator board.

Main Firmware Features
The entire system firmware has been designed from scratch, and static timing analysis has been performed according to hardware specifications on all I/O paths. In this section the main features common to all system firmware are described. Each layer additionally implements part of the trigger algorithm as described in section 3. The latency from the TSL input to the generation of the trigger primitive is about 50 µs (see footnote 1).
Footnote 1. The total latency of the NA62 L0 trigger is fixed to 100 µs with a delay added by the Level-0 Trigger Processor.
Figure 3a. One of the 29 TEL62 boards of the front-end layer. It mounts two TELDES input mezzanines (on the left), each with 16 DS92LV16 deserializers for 16 input channels, and one TX board (on the right) that serializes two output channels.
Clock distribution. The experiment distributes the 40.08 MHz experiment clock via the Timing, Trigger and Control (TTC) system [10]. Each TEL62 board receives one optical fiber with the TTCrx timing receiver ASIC [11] (see figure 3a), which also synchronously distributes triggers (both the physics triggers and special ones such as the SPS Start of Burst (SOB) and End of Burst (EOB)). The received clock is jitter-cleaned by a QPLL chip [12] (see figure 3a) and then distributed on the board to the SL and to the four PP FPGAs, which use it as the input clock of the main PLL. Data are sent from the PP FPGAs to the SL with a derived 160 MHz compensated clock (12 Gbps bandwidth per PP).
The various I/O mezzanines use different technologies. The DS92LV16 deserializers on board the TELDES [13] mezzanines receive a 16-bit serialized input from the CREAMs and recover the 40 MHz clock embedded in the data stream. This data stream is interfaced to the PP FPGAs with dual-clock FIFOs. The output mezzanines, the TX board or the Gbit board, receive source-synchronous data and clock at 120 MHz from the SL FPGA. The TX board serializes 48-bit data over 8 LVDS links with a 70 MHz clock, provided by an on-board oscillator, that is also transmitted and used to latch data on the receiver side. A Stratix II FPGA on the TX board buffers the 120 MHz data stream received from the SL and provides a 70 MHz Double Data Rate input to two DS90CR485 serializers.
The ECS local bus. Each TEL62 has a local 32-bit bus, called Experiment Control System (ECS), that connects the five FPGAs, the output mezzanine (16 LSB of the bus) and the on-board Credit Card PC (CCPC, see figure 3a). The CCPC is an i486 disk-less PC with an Ethernet interface and 64 MB SRAM that runs Linux. A dedicated glue-card (PLX 9030) interfaces the PCI memory space of the CCPC to the local bus. This allows software to address and read/write registers on the FPGAs. The ECS is clocked at 20 MHz, derived from the QPLL 40 MHz output clock. The CCPC behaves as a master on the bus while, on the firmware side, each FPGA has a bus bridge that selects the addressed register, FIFO or memory cell, which acts as a slave. This system allows on-line control and monitoring of the trigger. Because of the complexity and the high number of boards, an object-oriented Python software infrastructure has been developed to abstract operations at the highest possible level. A VHDL memory map of the address space is used when writing the firmware and is also parsed by the software, which simplifies software development. The glue-card also allows JTAG access for reprogramming the board.
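As an illustration of how a single VHDL memory map can serve both firmware and software, the following minimal Python sketch extracts register names and addresses from VHDL constant declarations. The declaration format, the register names and the function `parse_memory_map` are assumptions made for this example; the actual memory map layout of the TEL62 firmware is not described in the text.

```python
import re

# Assumed (hypothetical) declaration style:
#   constant REG_NAME : std_logic_vector(31 downto 0) := x"00001000";
_CONSTANT = re.compile(
    r'constant\s+(\w+)\s*:\s*[^:;]*:=\s*x"([0-9a-f_]+)"\s*;',
    re.IGNORECASE,  # VHDL keywords are case-insensitive
)


def parse_memory_map(vhdl_text):
    """Return a dict mapping constant names to integer addresses."""
    registers = {}
    for line in vhdl_text.splitlines():
        code = line.split("--", 1)[0]  # drop VHDL line comments
        for name, hex_value in _CONSTANT.findall(code):
            # Underscores are legal separators in VHDL bit-string literals.
            registers[name] = int(hex_value.replace("_", ""), 16)
    return registers
```

With such a dictionary, control software can refer to registers by name, so that an address change in the VHDL source propagates to the software without manual edits.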
Data transmission. The whole system uses common generic logic for sending and receiving data between different FPGAs or between different boards. The data can optionally be sent together with a Hamming code that is verified by the receiver to check data integrity. The code has one extra parity bit to allow single error correction and double error detection. The system has not shown transmission errors during operation unless a hardware problem was present: a faulty mezzanine, cable or connector, or simply a cable not correctly plugged in. This logic can also be configured to send pseudo-random binary sequences over the whole bus, which are checked at the receiver. This has two purposes: 1) to perform Bit Error Rate Tests (BERT) over an extended time period; 2) at power-up of the deserializers on the RX boards, to allow and then check the data-clock deskew. Extended BERTs have been run for days without detecting errors.
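The single error correction and double error detection (SECDED) scheme mentioned above can be illustrated with a small Python sketch. The text does not specify the word width or bit layout used in the firmware, so the example assumes an extended Hamming (8,4) code on 4-bit words purely for illustration; the function names are hypothetical.

```python
def secded_encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the LSB
    code = [0] * 8
    # Data bits sit at positions 3, 5, 6, 7; parity bits at 1, 2, 4.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # extra parity bit over the whole word
    return sum(bit << pos for pos, bit in enumerate(code))


def secded_decode(word):
    """Return (nibble, status): status is 'ok', 'corrected' or 'double_error'."""
    code = [(word >> pos) & 1 for pos in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2  # 0 when the total parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: a single flip; syndrome 0 means the extra bit flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a non-zero syndrome: two flips, not correctable.
        return None, "double_error"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

The extra parity bit is what distinguishes the two failure modes: a plain Hamming code would silently miscorrect a double error, whereas here it is flagged so that the receiver can discard the word.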
Single Event Upset detection. Because of the high-radiation environment, the CRC error detection feature of the Altera Stratix III FPGAs has been enabled. It can detect single or double bit flips, caused by soft errors, in any of the configuration RAM (CRAM) bits of the Stratix III devices. During the 2016 run it was used as a monitoring feature that allows operators to intervene in case of error by reloading the configuration and reinitializing the system (about one intervention per week at high beam intensities). In the future an automatic reloading and reconfiguration procedure may be implemented.

The Trigger Algorithm
In the 2016 physics run, the trigger decision was based on the total energy deposit in each calorimeter and on the number of clusters (limited to the "one cluster only" or "more than one cluster" conditions). More than one trigger condition can be implemented, and the result of each (true or false) is encoded in a specified bit of the primitive ID word sent to the central trigger processor of the experiment. The algorithm is performed in several steps on different boards and firmware.
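The encoding of several condition results into one word can be sketched as follows. This is a minimal Python illustration: the condition names and bit positions are hypothetical, since the actual layout of the primitive ID word is not given in the text.

```python
def pack_primitive_id(results, bit_of):
    """Set one bit per satisfied condition.

    results: dict mapping condition name -> bool (condition outcome)
    bit_of:  dict mapping condition name -> bit position (assumed layout)
    """
    word = 0
    for name, passed in results.items():
        if passed:
            word |= 1 << bit_of[name]
    return word


def condition_is_set(word, bit):
    """Check a single condition bit on the receiving side."""
    return bool((word >> bit) & 1)
```

For example, with the assumed layout `{"energy_cut": 0, "one_cluster": 1, "multi_cluster": 2}`, an event passing the energy cut with more than one cluster would yield the word `0b101`.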
The front-end boards look, on each TSL input channel (ADC sums of 4 × 4 calorimeter cells), for relevant physics signals. This is done by requiring a peak in time that is above a configurable threshold. With d[0..3] denoting four consecutive input samples of one channel (40 MHz sampling frequency), the requirements are expressed as conditions on these samples (equation 3.1).
Different criteria and thresholds are selectable on-line during trigger operations and have been optimized to maximize the peak recognition of pions and photons while being blind to muons at low energy. A parabolic fit is performed on the three samples around the maximum sample (see figure 5). The maximum of the fit is used as an energy estimate: this recovers any time-walk effect due to the phase between the physics signal and the 40 MHz sampling clock. Recursive bisections between the samples are then performed for the fine-time estimation of the peak onset. The peak time is therefore determined as the 32-bit experiment clock counter (25 ns period) of the lowest sample of the first bisection plus a fraction of that period expressed as an 8-bit fine-time number (LSB = 25 ns/256 ≈ 98 ps).
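The parabolic interpolation and the fine-time unit can be made concrete with a short Python sketch. It fits a parabola through three equally spaced samples and returns the maximum of the fit; a helper converts a fraction of the clock period into 8-bit fine-time units. The function names are hypothetical, floating-point arithmetic is used for readability (the firmware presumably works with integers), and the recursive bisection used for the onset time is not reproduced here.

```python
CLOCK_PERIOD_NS = 25.0                      # 40 MHz sampling clock
FINE_TIME_LSB_NS = CLOCK_PERIOD_NS / 256    # about 0.098 ns


def parabolic_peak(y_prev, y_max, y_next):
    """Fit y = a*t^2 + b*t + c through samples at t = -1, 0, +1.

    Returns (peak_value, t_peak), with t_peak in units of the sampling
    period relative to the central sample.
    """
    a = 0.5 * (y_prev + y_next) - y_max
    b = 0.5 * (y_next - y_prev)
    if a >= 0:
        # The three samples do not describe a maximum; fall back to y_max.
        return float(y_max), 0.0
    t_peak = -b / (2.0 * a)
    peak_value = y_max - b * b / (4.0 * a)
    return peak_value, t_peak


def to_fine_time(fraction):
    """Convert a fraction of the clock period (0 <= fraction < 1) to 8 bits."""
    return int(fraction * 256) & 0xFF
```

For samples 37.5, 97.5 and 77.5 the fit gives a maximum of 100 located a quarter of a period after the central sample, which shows how the interpolation recovers amplitude that the raw maximum sample would miss.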
Data for each peak (energy estimate, timestamp, fine-time) are then transmitted from the 29 front-end boards up to the SL FPGA of the final concentrator board, where the trigger decision is taken. This logic is clocked at 160 MHz and is sketched in figure 6. There are five identical logic blocks, one per data source: MUV1, LKr, SAC, IRC, MUV2. Incoming data carry a source identifier in the data packet (they are tagged while traveling through the trigger system) and can be routed to the corresponding logic block. The core is a dual-port RAM used as a circular buffer that represents an energy histogram binned in time: each memory cell corresponds to a time interval, and the stored value is the total energy in that interval. The bin size can be tuned on-line, with a lower limit of 6.25 ns due to the 160 MHz clock rate. Each memory has a depth of 4096 and can therefore store a minimum of 6.25 ns · 4096 = 25.6 µs of data.
The timestamp and the fine-time of the data are used to address the RAM: the addressed memory cell is first read and then written back with the energy of the incoming data added to the previously stored value. In addition, in each memory cell a bit (hereafter ncls) encodes the number of clusters as one cluster only (ncls = 1) or more than one cluster (ncls = 0). The energy and the 8-bit fine-time (98 ps resolution) of the peak with maximum energy are also stored; these are used to provide fine-granularity timing for the output trigger information.
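The time-binned energy histogram can be modelled in a few lines of Python. This is a behavioural sketch, not the firmware: the class and method names are hypothetical, the address is assumed to be formed from the timestamp and the upper bits of the fine-time, and the ncls bit and the clearing of cells after readout are left out.

```python
DEPTH = 4096        # memory depth quoted in the text
TICK_NS = 25.0      # period of the 40 MHz experiment clock
FINE_STEPS = 256    # 8-bit fine-time: 256 steps per clock period


class EnergyHistogram:
    """Circular buffer of time bins; one instance per detector."""

    def __init__(self, bins_per_tick=4):
        # 4 bins per 25 ns tick gives the minimum 6.25 ns bin width.
        self.bins_per_tick = bins_per_tick
        self.energy = [0] * DEPTH        # summed energy per time bin
        self.max_energy = [0] * DEPTH    # largest single peak in the bin
        self.max_finetime = [0] * DEPTH  # fine-time of that largest peak

    def span_us(self):
        """Time window covered by the whole buffer, in microseconds."""
        return DEPTH * (TICK_NS / self.bins_per_tick) / 1000.0

    def address(self, timestamp, finetime):
        """Map (timestamp, fine-time) to a cell, wrapping around."""
        sub_bin = finetime * self.bins_per_tick // FINE_STEPS
        return (timestamp * self.bins_per_tick + sub_bin) % DEPTH

    def add_peak(self, timestamp, finetime, energy):
        """Read-modify-write: accumulate energy and track the largest peak."""
        addr = self.address(timestamp, finetime)
        self.energy[addr] += energy
        if energy > self.max_energy[addr]:
            self.max_energy[addr] = energy
            self.max_finetime[addr] = finetime
        return addr
```

Two peaks with the same timestamp and fine-times 0 and 10 land in the same 6.25 ns bin and their energies are summed, while a peak with fine-time 128 (half a clock period later) falls two bins further on.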
These memory buffers, one for each detector, are read simultaneously at a rate corresponding to real time (that is, at 160 MHz if the bin width is 6.25 ns, at 80 MHz if the bin width is 12.5 ns, etc.). The depth of the memory, corresponding to a time window of about 25 µs in the worst case, allows any time skew between different incoming data to be absorbed. The values read from each RAM are the total energy and the number of clusters in the detector for that time bin. For calorimeters used as a veto, the veto window is enlarged by summing the energy over three time bins centered on the time of the positive trigger; this avoids underestimating the total energy because of the time binning. Boolean conditions with cuts on the energy and the number of clusters for each detector are applied, resulting in the trigger decision. The time of the trigger corresponds to the fine-time of the most energetic peak in the time bin.
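The effect of the enlarged veto window can be shown with a minimal Python sketch. The function names, the example bin contents and the threshold are assumptions chosen for illustration; only the three-bin sum itself comes from the text.

```python
def windowed_energy(bins, center, half_width=1):
    """Sum the energy over the bins centered on `center`.

    The buffer is circular, so indices wrap around at both ends.
    half_width=1 gives the three-bin window described in the text.
    """
    n = len(bins)
    return sum(bins[(center + k) % n] for k in range(-half_width, half_width + 1))


def passes_veto(bins, center, veto_threshold):
    """True when the windowed energy does not exceed the veto threshold."""
    return windowed_energy(bins, center) <= veto_threshold
```

With bin contents `[0, 5, 12, 9, 0]` and an assumed threshold of 20, the central bin alone (12) would stay below the cut, whereas the three-bin sum (26) triggers the veto: energy split across neighbouring bins is no longer underestimated.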
Figure 7 shows a flowchart of how the clustering algorithm is implemented. While the energy in the bin is the total sum of the energy of the incoming peaks, the position always refers to the peak with the maximum energy release. This position is considered as a candidate seed position for a cluster: if all other incoming peaks fall within a configurable distance (8 cm is the value used in the 2016 run), then there is one cluster only (and the total energy is the cluster energy). As a cross-check, a comparison between the single-cluster position computed by the trigger and the position computed by the offline analysis software is shown in figure 8, for a set of events selected with the offline software as having only one cluster in the LKr. The zero-centered box shape shows that the trigger algorithm correctly identifies the cluster position.
An estimate of the trigger veto efficiency on π+ π0 is given in figure 9, showing about 98% efficiency (see footnote 2). The trigger was required to veto above 20 GeV of total energy in the LKr calorimeter. The efficiency sample is obtained without any request on the LKr, and the trigger response is checked in a ±20 ns time window around a control trigger.
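The single-cluster criterion described above can be written as a short Python sketch. It processes all peaks of one time bin at once, whereas the firmware handles them as they arrive; the function name, the use of a Euclidean distance and the coordinate convention are assumptions, since the text only quotes the 8 cm value.

```python
import math


def classify_bin(peaks, max_distance_cm=8.0):
    """Classify the peaks of one time bin.

    peaks: non-empty list of (energy, x_cm, y_cm) tuples.
    Returns (total_energy, seed_position, ncls), where ncls = 1 means
    one cluster only and ncls = 0 means more than one cluster.
    """
    seed = max(peaks, key=lambda p: p[0])      # peak with maximum energy
    total_energy = sum(p[0] for p in peaks)    # energy is always summed
    single = all(
        math.hypot(p[1] - seed[1], p[2] - seed[2]) <= max_distance_cm
        for p in peaks
    )
    return total_energy, (seed[1], seed[2]), 1 if single else 0
```

Peaks at 4 cm and 6 cm from the seed are absorbed into one cluster, while an additional peak 20 cm away flips the result to the "more than one cluster" condition without changing the seed position.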

Conclusion
The Level-0 calorimetric trigger of NA62 has been fully commissioned and operated for the first time during the first physics run from May to November 2016. It has been tested up to the nominal beam intensity of 30 · 10¹¹ protons per SPS spill.
After the initial commissioning phase, the system has proven to be stable and no hardware faults have been detected.
Footnote 2. The efficiency sample is selected by requiring a single track, corresponding to the π+, in the decay region. Events with photons outside the LKr acceptance or with muons in the LKr acceptance are excluded. Calorimeters are not used in the selection.

JINST 12 C04020
A significant amount of data has been acquired with various trigger conditions that show clear suppression of the main background contributions.
While for the 2015 run the trigger conditions were based on the total energy deposit in each of the calorimeters (MUV1, LKr, IRC, SAC, MUV2), in the 2016 run the energy clusters were reconstructed on the basis of spatial and time information. For the 2017 run a data readout at Level-0 is also foreseen through additional mezzanines, with Gbit Ethernet links, that will be plugged on the TX board.