The Level-0 calorimetric trigger of the NA62 experiment

The NA62 experiment at the CERN SPS aims to measure the branching ratio of the very rare kaon decay K+ → π+ ν ν̄ (expected to be about 10^-10) with a 10% background. Since a high-intensity kaon beam is required to collect enough statistics, the Level-0 trigger plays a fundamental role in both background rejection and particle identification. The calorimetric trigger collects data from various calorimeters and identifies clusters of energy deposit, determining their position, fine-time and energy. This paper describes the complete hardware commissioning and the setup of the trigger for the 2015 physics data taking.

The back-end electronics consists of 432 Calorimeter REadout Modules (CREAMs) [5] for the LKr and 10 further CREAMs for MUV1, MUV2, IRC and SAC. They are VME modules installed in 29 crates (28 for the LKr alone). Each module digitizes, after proper shaping, up to 32 calorimeter channels with 40 MS/s FADCs with a 14-bit dynamic range. It then buffers up to 8 GB of data (on a DDR3 SODIMM module) during the SPS spill and provides 2 lower-granularity Trigger Level Sums (TLS), each of 16 (4x4) calorimeter cells, to the calorimetric Level-0 trigger. The data, optionally zero-suppressed, are read out when there is a Level-1 trigger. A scheme of the calorimeter trigger and readout system for the LKr is shown in figure 2.

The NA62 trigger system
The calorimeter trigger is part of the larger experimental trigger. At full beam intensity, an average decay rate of 10 MHz hits the downstream detectors. In order to extract the few interesting decays from such an intense flux, a complex three-level trigger and data acquisition system was designed [3].
The Level-0 (L0) trigger algorithm is based on different sub-detectors (in addition to the calorimetric trigger, the charged hodoscope, the muon detector, the large-angle vetoes, the RICH detector) and it is performed by dedicated custom hardware modules, with a maximum output rate of 1 MHz and a maximum latency of 1 ms.
The data from each sub-detector, except the LKr calorimeter, are sent to a farm of PCs where the Level-1 (L1) and Level-2 (L2) software triggers are performed. L1 algorithms run on the data of individual detectors. A positive L1 decision triggers the readout of the calorimeter data (which are kept in memory until then) and, subsequently, L2 algorithms are executed on the complete event. The L1 trigger has a maximum output rate of 100 kHz with a non-fixed total latency of about 1 s, while the L2 trigger has an output rate of the order of 15 kHz with a maximum total latency equal to the basic data-taking time unit, the period of the SPS beam-delivery cycle.

The Level-0 calorimetric trigger
The trigger recognizes electromagnetic/hadronic clusters in the calorimeters along with their position, fine-time and energy [9,10]. The inputs are the Trigger Sum Links (TSL), sums of ADC values, sampled at 40 MHz, that are continuously sent by the CREAM modules. There are 864 TSL for the LKr, 1 for IRC, 1 for SAC, 12 for MUV1 and 6 for MUV2.
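The link counts above can be cross-checked against the CREAM counts quoted earlier. The short Python sketch below only uses numbers from the text; the assumption (not stated explicitly in the paper) is that every CREAM, including the 10 serving the smaller detectors, contributes exactly 2 trigger sums.

```python
# Channel bookkeeping using only numbers quoted in the text.
TSL_PER_CREAM = 2        # each CREAM provides 2 trigger sums
CELLS_PER_SUM = 16       # each sum covers 4x4 calorimeter cells
CHANNELS_PER_CREAM = 32  # each CREAM digitizes up to 32 channels

# "other" groups the CREAMs serving MUV1, MUV2, IRC and SAC.
creams = {"LKr": 432, "other": 10}
tsl = {"LKr": 864, "IRC": 1, "SAC": 1, "MUV1": 12, "MUV2": 6}

# Assumption: every CREAM drives both of its trigger-sum outputs.
lkr_links = creams["LKr"] * TSL_PER_CREAM
other_links = creams["other"] * TSL_PER_CREAM
small_detector_links = sum(n for name, n in tsl.items() if name != "LKr")

print("LKr links:", lkr_links, "quoted:", tsl["LKr"])
print("other links:", other_links, "quoted:", small_detector_links)
print("sums cover all channels:",
      TSL_PER_CREAM * CELLS_PER_SUM == CHANNELS_PER_CREAM)
```

Under that assumption the quoted figures are mutually consistent: 432 CREAMs account for the 864 LKr links, and the 10 remaining CREAMs account for the 20 links of IRC, SAC, MUV1 and MUV2.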
The system is composed of 37 TEL62 boards [6,7]. They are 9U general-purpose data acquisition boards, based on the LHCb TELL1 [13] and common to many sub-detectors of the experiment, and they are equipped with custom dedicated I/O mezzanines (see figure 4a). Each board mounts five Altera Stratix III FPGAs (EP3SL200): four, called Pre-Processing (PP) FPGAs, receive and process data from the input mezzanines, and one, called Sync-Link (SL), receives and processes data from the PPs and sends them to the output mezzanine.
The calorimetric trigger is structured as a 3-layer system where each layer has a different number of TEL62 boards, with different I/O mezzanines, and plays a different role in the cluster search, which is performed through a 1D (vertical) + 1D (horizontal) algorithm. In the case of the largest and most complex calorimeter, the LKr:
• In a first front-end layer, composed of 28 boards (one board is shown in figure 4a, the crates in figure 3a), peaks are independently identified in 28 vertical slices of the calorimeter. Each slice is segmented vertically in 32 supercells (4x4 calorimeter cells), where each supercell corresponds to an input TLS.
• In a second layer composed of 7 concentrator boards (mezzanines shown in figure 4b and 4c, the crate in figure 3b), different peaks are horizontally merged when they are close in time and space, therefore each cluster is fully reconstructed.
• A final concentrator board collects all the information and transmits, through the Gbit Ethernet mezzanine, a trigger primitive to the central L0TP for the trigger decision.
In the 2015 physics run the trigger decision is limited to the total energy deposit of each calorimeter. Therefore the concentrators have only been used to perform I/O from the front-ends to the final concentrator, where the trigger logic is implemented.
The trigger for the MUV1, MUV2, SAC and IRC is realized with one front-end board directly connected to the final concentrator.

Main firmware features
The entire system firmware has been designed from scratch, and static timing analysis has been performed according to hardware specifications on all I/O paths. This section describes the main features common to all the firmware in the system. Each layer additionally implements part of the trigger algorithm as described in section 3. The latency from the TSL input to the generation of the trigger primitive is about 50 µs. (The total latency of the NA62 L0 trigger is fixed at 100 µs, with a delay added by the Level-0 Trigger Processor.)
Clock distribution. The experiment distributes the 40.08 MHz experiment clock via the Timing, Trigger and Control (TTC) system [12]. Each TEL62 board receives the clock over one optical fiber through the TTCrx timing receiver ASIC, which also synchronously distributes triggers (both the physics triggers and special ones such as the SPS Start of Burst (SOB) and End of Burst (EOB)). The received clock is jitter-cleaned by a QPLL chip and then distributed on the board to the SL and to the four PP FPGAs, which use it as the input clock of their main PLL. Data are sent from the PP FPGAs to the SL with a derived 160 MHz compensated clock (12 Gbps bandwidth per PP).
Figure 4(a): One of the 29 TEL62 boards of the front-end layer. It mounts two TELDES input mezzanines (on the left), each with 16 DS92LV16 deserializers for 16 input channels, and one TX board (on the right) that serializes two output channels.
The various I/O mezzanines use different technologies. The DS92LV16 deserializers on board the TELDES boards receive a 16-bit serialized input from the CREAMs and recover the 40 MHz clock embedded in the data stream; this input is therefore interfaced to the PP FPGAs through dual-clock FIFOs. The output mezzanines, the TX board or the Gbit board, receive source-synchronous data and clock at 120 MHz from the SL FPGA. The TX board serializes 48-bit data over 8 LVDS links with a 70 MHz clock, provided by an on-board oscillator, that is also transmitted and used to latch data on the receiver side. A Stratix II FPGA on the TX board buffers the 120 MHz data stream received from the SL and provides a 70 MHz Double Data Rate input to the two DS90CR485 serializers.

The ECS local bus. Each TEL62 has a local 32-bit bus, called Experiment Control System (ECS), that connects the five FPGAs, the output mezzanine (16 LSB of the bus) and the on-board Credit Card PC (CCPC). The CCPC is a disk-less i486 PC with an Ethernet interface and 64 MB of SRAM that runs Linux. A dedicated glue-card (PLX 9030) interfaces the PCI memory space of the CCPC to the local bus. This allows software to address and read/write registers on the FPGAs. The ECS is clocked with a 20 MHz clock derived from the QPLL 40 MHz output clock. The CCPC behaves as a master on the bus while, on the firmware side, each FPGA has a bus bridge that selects the addressed register, FIFO or memory cell, which acts as a slave. This system allows on-line control and monitoring of the trigger. Because of the complexity and the high number of boards, an object-oriented Python software infrastructure has been developed to abstract operations at the highest possible level. A VHDL memory map of the address space is used when writing the firmware and is also parsed by the software, which simplifies software development. The glue-card also allows JTAG access for reprogramming the board.
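The idea of sharing one VHDL memory map between firmware and software can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the VHDL constant names and addresses, the `FakeBus` stand-in for the ECS bus, and the `Fpga` wrapper are invented for illustration and do not reproduce the actual NA62 software.

```python
import re

# Hypothetical VHDL memory-map fragment; names and addresses are invented.
VHDL_MAP = '''
constant ADDR_CTRL      : std_logic_vector(31 downto 0) := x"00000000";
constant ADDR_THRESHOLD : std_logic_vector(31 downto 0) := x"00000004";
constant ADDR_STATUS    : std_logic_vector(31 downto 0) := x"00000008";
'''

# Matches: constant NAME : std_logic_vector(...) := x"HEX";
CONST_RE = re.compile(
    r'constant\s+(\w+)\s*:\s*std_logic_vector\([^)]*\)\s*:=\s*'
    r'x"([0-9A-Fa-f_]+)"\s*;',
    re.IGNORECASE,
)


def parse_memory_map(text):
    """Return {register name: integer address} from VHDL constants."""
    return {name: int(value.replace("_", ""), 16)
            for name, value in CONST_RE.findall(text)}


class FakeBus:
    """In-memory stand-in for the 32-bit local bus (assumption)."""

    def __init__(self):
        self.mem = {}

    def read(self, addr):
        return self.mem.get(addr, 0)

    def write(self, addr, value):
        self.mem[addr] = value & 0xFFFFFFFF  # keep 32 bits


class Fpga:
    """Addresses registers by name instead of by raw address."""

    def __init__(self, bus, memory_map):
        self._bus = bus
        self._map = memory_map

    def read(self, name):
        return self._bus.read(self._map[name])

    def write(self, name, value):
        self._bus.write(self._map[name], value)


memory_map = parse_memory_map(VHDL_MAP)
fpga = Fpga(FakeBus(), memory_map)
fpga.write("ADDR_THRESHOLD", 0x1234)
print(memory_map, hex(fpga.read("ADDR_THRESHOLD")))
```

With this kind of layout, a register added to the VHDL map becomes available to the control software by name without any address being copied by hand.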
Data transmission. The whole system uses common generic logic for sending and receiving data between different FPGAs or between different boards. The data can optionally be sent together with a Hamming code that the receiver uses to verify data integrity. The code has one extra parity bit to allow single error correction and double error detection. The system has shown no transmission errors during operations unless a hardware problem was present: a faulty mezzanine, cable or connector, or simply a cable not correctly plugged in. This logic can also be configured to send Pseudo Random Binary Sequences over the whole bus, which are checked by the receiver. This has two purposes: 1) to perform Bit Error Rate Tests (BERT) for an extended time period; 2) at power-up of the deserializers on the RX boards, to allow and then check data-clock deskew. Extended BERTs have been run for days without detecting errors.
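A Hamming code with one extra overall parity bit can be sketched in Python on the smallest useful case, 4 data bits in an 8-bit codeword. This is an illustration of the single-error-correction, double-error-detection principle only; the word width and bit layout used in the firmware are not given in the text, so the (8,4) layout below is an assumption.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Returns a list of bits; index 0 is the overall parity bit and
    indices 1..7 follow the classic Hamming(7,4) position numbering.
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the LSB
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d
    word[1] = word[3] ^ word[5] ^ word[7]      # covers positions 1,3,5,7
    word[2] = word[3] ^ word[6] ^ word[7]      # covers positions 2,3,6,7
    word[4] = word[5] ^ word[6] ^ word[7]      # covers positions 4,5,6,7
    word[0] = sum(word[1:]) & 1                # makes total parity even
    return word


def decode(word):
    """Return (nibble, status); status is 'ok', 'corrected' or 'double'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos                    # XOR of set-bit positions
    overall = sum(word) & 1
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: one bit flipped, located at index `syndrome`
        # (0 means the overall parity bit itself).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a non-zero syndrome: two bits flipped.
        return None, "double"
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return nibble, status


sent = encode(0b1011)
one_flip = list(sent)
one_flip[6] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1
print(decode(sent), decode(one_flip), decode(two_flips))
```

The overall parity bit is what separates the two failure modes: an odd number of flipped bits changes the total parity and can be corrected, while two flipped bits leave the parity even but produce a non-zero syndrome.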
Single Event Upset detection. Because of the high-radiation environment, the error-detection CRC feature of the Altera Stratix III FPGAs has been enabled. It detects single or double bit flips, caused by a soft error, in any of the configuration CRAM bits of the Stratix III devices. During the 2015 run it was used as a monitoring feature that allows operators to intervene in case of error by reloading the configuration and reinitializing the system. An automatic reloading and reconfiguration procedure may be implemented in the future.

The trigger algorithm
In the 2015 physics run, the trigger decision is based on the total energy deposit in each calorimeter. It is performed in different steps on different boards and firmware. The first phase, on the PP FPGAs of the 29 front-end boards, is fully pipelined.
The front-end boards look, on each input channel (ADC sums of 4x4 calorimeter cells), for relevant physics signals. This is done by requiring a peak in time that is above a configurable threshold. If d[0 . . . 3] are four input samples of one channel, taken 25 ns apart, the peak and threshold requirements are expressed as conditions on these samples. Different criteria and thresholds are selectable on-line during trigger operations and have been optimized to maximize peak recognition of pions and photons while being blind to muons at low energy.
A parabolic fit is performed on the three samples around the maximum sample (see figure 5). The maximum of the fit is used as an energy estimate: this recovers any time-walk effect due to the phase between the physics signal and the 40 MHz sampling clock. Recursive bisections between the samples are then performed for the fine-time estimation of the peak onset. The peak time is therefore determined as the 32-bit experiment clock counter (25 ns period) of the lowest sample of the first bisection plus a fraction of that period expressed as an 8-bit fine-time number (LSB = 25 ns/256 ≈ 98 ps).
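The energy estimate and the time encoding can be illustrated with a small Python sketch. It computes the vertex of the parabola through three samples in closed form and packs a time into a coarse counter plus an 8-bit fine time. Note the simplification: the firmware obtains the fine time of the peak onset by recursive bisection, whereas this sketch simply uses the vertex position, and the function names are invented for illustration.

```python
CLOCK_NS = 25.0   # sampling period quoted in the text
FINE_BITS = 8     # fine time is an 8-bit fraction of the clock period


def parabolic_peak(y_prev, y_max, y_next):
    """Vertex of the parabola through (-1, y_prev), (0, y_max), (1, y_next).

    Returns (offset, height): offset is in units of the sampling period,
    relative to the central sample; height is the energy estimate.
    """
    curvature = y_prev - 2.0 * y_max + y_next
    if curvature >= 0:
        raise ValueError("central sample is not a local maximum")
    offset = 0.5 * (y_prev - y_next) / curvature
    height = y_max - 0.25 * (y_prev - y_next) * offset
    return offset, height


def encode_time(coarse_counter, offset):
    """Split a time into (32-bit clock counter, 8-bit fine time)."""
    total = round((coarse_counter + offset) * (1 << FINE_BITS))
    coarse = (total >> FINE_BITS) & 0xFFFFFFFF
    fine = total & ((1 << FINE_BITS) - 1)
    return coarse, fine


# Samples of y = 100 - 4 * (x - 0.25)**2 at x = -1, 0, 1.
offset, energy = parabolic_peak(93.75, 99.75, 97.75)
coarse, fine = encode_time(1000, offset)
lsb_ps = CLOCK_NS / (1 << FINE_BITS) * 1000.0
print(offset, energy, coarse, fine, lsb_ps)
```

In the example the true maximum lies a quarter of a period after the central sample, so the raw maximum sample (99.75) underestimates the energy while the fitted vertex recovers it; this is the correction for the sampling phase described above.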
Data for each peak (energy estimate, timestamp, fine-time) are then transmitted from the 29 front-end boards up to the SL FPGA of the final concentrator board, where the trigger decision is taken. This logic is clocked at 160 MHz and is sketched in figure 6. There are five identical logic blocks, one per data source: MUV1, LKr, SAC, IRC, MUV2. Incoming data carry a source identifier in the data packet (they are tagged while traveling through the trigger system) and can be routed to the corresponding logic block. The core of each block is a dual-port RAM used as a circular buffer that represents an energy histogram binned in time: each memory cell corresponds to a time interval and the stored value is the total energy in that time interval. The bin size can be tuned on-line, with a lower limit of 6.25 ns set by the 160 MHz clock rate. Each memory has a depth of 16384 and can therefore store a minimum of 6.25 ns · 16384 = 102.4 µs of data.
The timestamp and the finetime of the data are used to address the RAM and the addressed memory cell is first read and then written summing up the energy of incoming data to the previously stored energy value. Because this takes two clock cycles, a FIFO is needed to buffer incoming data.
The first data packet of the SPS burst is written in the middle of the corresponding memory, and this sets the reference time for each of the memory locations in all five buffers. The buffers are read simultaneously, starting from the beginning of the memories, at a rate corresponding to real time (that is, at 160 MHz if the bin size is 6.25 ns, at 80 MHz if the bin size is 12.5 ns, and so on). This delay of half the memory (hence > 51 µs) absorbs any time skew between different incoming data. The value read from each RAM is the total energy of the detector in that time bin. For calorimeters used as a veto, the veto window is enlarged by summing the energy over three time bins, always centered around the time of the positive trigger; this avoids resolution effects at the edge of the veto window. Boolean conditions with energy cuts for each detector are applied, resulting in the trigger decision. The time of the trigger corresponds to the reference time of the memory cell currently being read.
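The time-binned energy histogram can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions, not the firmware: the class and function names are invented, the bin size is fixed at 6.25 ns, the readout pointer and cell clearing are omitted, and the energy cuts in `decide` are hypothetical values chosen only to exercise the logic.

```python
FINE_PER_CLOCK = 256   # 8-bit fine time: 256 steps per 25 ns period
BIN_FINE_UNITS = 64    # 6.25 ns bin = 64 fine-time steps
DEPTH = 16384          # memory depth quoted in the text


def absolute_bin(timestamp, finetime):
    """Index of the 6.25 ns bin containing (timestamp, finetime)."""
    return (timestamp * FINE_PER_CLOCK + finetime) // BIN_FINE_UNITS


class EnergyHistogram:
    """Circular buffer of energy sums, one cell per time bin."""

    def __init__(self, first_timestamp, first_finetime, depth=DEPTH):
        self.depth = depth
        self.cells = [0] * depth
        # The first packet of the burst lands in the middle of the memory.
        self.origin = absolute_bin(first_timestamp, first_finetime) - depth // 2

    def address(self, timestamp, finetime):
        return (absolute_bin(timestamp, finetime) - self.origin) % self.depth

    def add(self, timestamp, finetime, energy):
        addr = self.address(timestamp, finetime)
        self.cells[addr] += energy   # read-modify-write of one cell
        return addr

    def window_sum(self, addr):
        """Energy over three bins centred on addr (veto window)."""
        return sum(self.cells[(addr + k) % self.depth] for k in (-1, 0, 1))


def decide(lkr, veto, addr, lkr_min, veto_max):
    """Hypothetical condition: enough LKr energy and a quiet veto."""
    return lkr.cells[addr] >= lkr_min and veto.window_sum(addr) <= veto_max


first = (1000, 0)                   # first packet sets the shared reference
lkr = EnergyHistogram(*first)
veto = EnergyHistogram(*first)
addr = lkr.add(1000, 0, 300)
lkr.add(1000, 10, 200)              # falls in the same 6.25 ns bin
veto.add(1000, 70, 50)              # falls in the neighbouring bin
print(addr, lkr.cells[addr], veto.window_sum(addr))
print(decide(lkr, veto, addr, 400, 20), decide(lkr, veto, addr, 400, 100))
```

In this example the veto deposit sits one bin away from the LKr energy, so it would be missed by a single-bin comparison but is caught by the three-bin sum, which is the edge effect the enlarged veto window is meant to avoid.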

Conclusion
The Level-0 calorimetric trigger of NA62 has been fully commissioned and operated for the first time during the first physics run from July to November 2015. It has been tested up to the nominal beam intensity of 33 · 10^11 protons per SPS spill.
After the initial commissioning phase, the system has proven to be stable and no hardware faults have been detected.
A significant amount of data has been acquired with various trigger conditions that show clear suppression of the main background contributions.
While for the 2015 run the trigger conditions were based on the total energy deposit in each of the calorimeters (MUV1, LKr, IRC, SAC, MUV2), the next run in 2016 will have energy clusters reconstructed on the basis of the spatial and time information. Data readout at Level-0 is also foreseen through additional mezzanines, with Gbit Ethernet links, that will be plugged on the TX board.