First operation of the level-0 trigger of the NA62 liquid krypton calorimeter

The NA62 experiment at the CERN Super Proton Synchrotron aims at studying ultra-rare decays of charged kaons for precise tests of the Standard Model. The complete experimental setup is being commissioned for the first physics data taking in the autumn of 2014. This paper presents the final design and implementation of the Level-0 trigger system of the LKr calorimeter, which acts as the hermetic photon veto of the experiment in the 1-8.5 mrad region. The first on-field performance tests are presented.


Introduction
The NA62 experiment [1,4], located in the CERN North Area, focuses on a precise measurement of the branching ratio of the very rare kaon decay K⁺ → π⁺νν̄ [2,3]. The 400 GeV/c high-intensity SPS proton beam impinges on a beryllium target, and secondary kaons, with a selected momentum of 75 GeV/c, decay in flight along the 65 m fiducial decay region (figure 1). Because the processes under study have a weak signature, various detectors, distributed along a 170 m long region, are devoted to particle identification and to the high-precision measurement of the various physical quantities. A hermetic photon veto system is needed to reject photons from the various K⁺ decay modes, in particular K⁺ → π⁺π⁰, and the Liquid Krypton (LKr) calorimeter is used for this purpose in the 1-8.5 mrad forward region.

Trigger and data acquisition system
In the 2015 run the CERN SPS will provide 3 × 10¹² protons per spill (4.8 s burst duration with a period of 16.8 s). At the time of writing, October 2014, the experiment is undergoing a first physics test run with a lower intensity beam. At full intensity, the selected 75 GeV/c secondary hadron beam will result in an instantaneous kaon rate of about 50 MHz and an average 10 MHz decay rate hitting the downstream detectors. In order to extract the few interesting decays from such an intense flux, a complex, high-performance three-level trigger and data acquisition system was designed [5].
The Level-0 (L0) trigger algorithm is based on a few sub-detectors (the charged hodoscope, the muon detector, the liquid krypton electromagnetic calorimeter and the large-angle vetoes) and is performed by dedicated custom hardware modules, with a maximum output rate of 1 MHz and a maximum latency of 1 ms.
The data from each sub-detector, except the LKr calorimeter, are sent to a farm of PCs where the Level 1 (L1) and Level 2 (L2) software triggers are performed. L1 algorithms run on the data of individual detectors. A positive L1 decision triggers the readout of the calorimeter data (which are kept in memories up to then) and, subsequently, L2 algorithms are executed on the complete event. The L1 trigger has a maximum output rate of 100 kHz with a non-fixed total latency of about 1 s, while the L2 trigger has an output rate of the order of 15 kHz with a maximum total latency equal to the basic data taking time unit, the period of the SPS beam-delivery cycle.

The LKr calorimeter and the CREAM readout modules
The NA48/62 calorimeter [6] is a quasi-homogeneous ionization device, about 27 radiation lengths deep, segmented transversally to the beam (there is no segmentation in depth) into 13248 cells of (2×2) cm², made of thin copper-beryllium ribbons kept at high voltage and immersed in a 10 m³ liquid krypton bath at 120 K acting as the active medium. For photons of more than 10 GeV energy, a detection inefficiency of 10⁻⁵, a time resolution of 350 ps and an energy resolution of less than 5% allow its use as an efficient veto and for particle identification.
The back-end electronics has been updated from NA48 because of the higher trigger rate. New Calorimeter REadout Modules (CREAMs) [7] read out the calorimeter channels, then buffer and optionally zero-suppress the data during the SPS spill. They also provide the Level-0 trigger system (the subject of this paper) with 864 lower-granularity Trigger Level Sums (TLS): digital sums of 16 (4×4) calorimeter cells. A scheme of the calorimeter trigger and readout system is shown in figure 2.
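As an illustration of what a Trigger Level Sum is, the following minimal sketch adds up 4×4 blocks of cell energies on an idealized rectangular grid. The function name, the rectangular geometry and the toy input are assumptions made for this example only; they do not describe the CREAM firmware.

```python
def trigger_level_sums(cells, tile=4):
    """Sum tile x tile blocks of a 2D grid of cell energies.

    `cells` is a list of equal-length rows. Both dimensions are assumed
    to be multiples of `tile`, which idealizes the real geometry.
    """
    n_rows, n_cols = len(cells), len(cells[0])
    if n_rows % tile or n_cols % tile:
        raise ValueError("grid dimensions must be multiples of the tile size")
    sums = []
    for r0 in range(0, n_rows, tile):
        row = []
        for c0 in range(0, n_cols, tile):
            # Digital sum of the 16 cells belonging to this block.
            row.append(sum(cells[r][c]
                           for r in range(r0, r0 + tile)
                           for c in range(c0, c0 + tile)))
        sums.append(row)
    return sums


# Toy 8x8 grid with one deposit shared by two cells of the same block.
grid = [[0] * 8 for _ in range(8)]
grid[1][2] = 6
grid[2][3] = 4
tls = trigger_level_sums(grid)
print(tls)
```

Each entry of `tls` covers an (8×8) cm² area when the cells are (2×2) cm², which is the granularity seen by the trigger.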

The LKr Level-0 trigger
The Level-0 trigger of the LKr calorimeter (figure 4) is based on the recognition of electromagnetic clusters and provides their position, fine-time and energy for a veto decision [10,11]. Because the input of the trigger consists of the 864 TLS from the CREAMs, which are digital sums of 4 × 4 calorimeter cells, the system reconstructs the position with the same bi-dimensional segmentation as the calorimeter but with a lower granularity of (8×8) cm².
The design parameters of the system are the maximum expected instantaneous hit rate of 30 MHz, the required single cluster time resolution of 1.5 ns and the maximum allowed latency of 100 µs from the detector hit generation to the trigger primitives output to the L0 Trigger Processor (L0TP).
The entire system will be composed of 36 TEL62 boards [8,9], the 9U general purpose data acquisition board common to most sub-detectors of the experiment, equipped with custom dedicated I/O mezzanines. Each TEL62 provides four Pre-Processing FPGAs, receiving data from different kinds of custom receiver mezzanines, and one Sync-Link FPGA (all Altera Stratix III [12]) that collects the data and transmits them through a custom transmission mezzanine and eventually, via Gigabit Ethernet, to the L0TP.
The cluster search is performed through a 1D+1D algorithm on a 3-layer system, as shown in figure 3:
• A front-end layer, composed of 28 boards (figure 4(a)), receives the TLS data from the CREAMs: each board reads up to 32 channels (blue squares in figure 3) that are located along a vertical slice of the calorimeter. Peaks in the energy deposit are independently identified in each vertical slice (see section 4.2).
• A first concentrator layer, composed of 7 boards (figure 4(b)), receives data from the front-end layer. Peaks on different vertical slices are horizontally merged when they are close in time and space, so that each cluster is fully reconstructed. Each of the 7 boards has a region of interest of 4 vertical slices plus two slices on the left and two on the right (red area in figure 3), overlapping with the neighboring boards, to avoid double counting.
• A final second concentrator layer (figure 4(c)), with one board, collects all the information and transmits, via Gigabit Ethernet, a trigger primitive to the central L0TP for the trigger decision.
In the 2014 physics test run, a reduced system has been installed with six boards in the first layer, receiving data from the six central vertical slices of the calorimeter, and one concentrator board that collects the peaks from the front-end boards, merges them into clusters and sends the result to the L0TP.
In the following sections hardware and firmware of each layer are briefly described.
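The horizontal merging step of the 1D+1D search can be sketched as follows. This is only an illustration of the idea of grouping peaks that are close in time and space: the `Peak` fields, the matching windows (`max_dt_ns`, `max_dy`), the greedy grouping and the choice of taking position and time from the most energetic peak are all assumptions of the example, not the concentrator firmware.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    slice_x: int      # index of the vertical slice (front-end board)
    channel_y: int    # channel index along the slice
    time_ns: float    # fine-time of the peak
    energy: float     # peak maximum


def merge_peaks(peaks, max_dt_ns=5.0, max_dy=1):
    """Greedily group peaks on adjacent slices that match in time and space.

    The windows are placeholders. A peak joins the first matching
    cluster, so two clusters bridged by one peak are not fused here.
    """
    clusters = []
    for peak in sorted(peaks, key=lambda p: p.slice_x):
        for cluster in clusters:
            if any(abs(peak.slice_x - q.slice_x) == 1
                   and abs(peak.channel_y - q.channel_y) <= max_dy
                   and abs(peak.time_ns - q.time_ns) <= max_dt_ns
                   for q in cluster):
                cluster.append(peak)
                break
        else:
            clusters.append([peak])
    return clusters


def summarize(cluster):
    """Sum the energies; take position and time from the largest peak."""
    seed = max(cluster, key=lambda p: p.energy)
    return {"x": seed.slice_x, "y": seed.channel_y,
            "time_ns": seed.time_ns,
            "energy": sum(p.energy for p in cluster)}


peaks = [
    Peak(3, 10, 100.0, 20.0),
    Peak(4, 10, 100.5, 35.0),
    Peak(5, 11, 101.0, 5.0),
    Peak(12, 2, 100.2, 8.0),   # far away in space
    Peak(4, 10, 250.0, 9.0),   # far away in time
]
clusters = [summarize(c) for c in merge_peaks(peaks)]
for c in clusters:
    print(c)
```

In this toy input the first three peaks form one cluster, while the peak that is distant in space and the one that is distant in time each remain separate.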

TELDES input board
Each of the front-end TEL62 boards of the trigger system mounts two TEL62 DESerializer input mezzanines (TELDES, shown in figure 6), making a total of 56 TELDES in the whole system. Each TELDES receives 16 16-bit serial streams from the CREAMs. Each stream is serialized by the SerDes circuitry built into the FPGA on board each CREAM with the experiment master clock (TTC) at 40 MHz and deserialized by DS92LV16 SerDes chips on board the TELDES. The stream has embedded clock information, which avoids any need for clock-data deskew. The physical links are 100 Ω differential pairs over standard Ethernet cables (Cat 6 SFTP, 15 m length) with an effective data rate per pair of 640 Mbps. A DS15BA101 cable driver together with the DS15EA101 equalizer is used to ensure signal integrity between the transmitting CREAM and the receiving TELDES.

Peak recognition firmware
A peak at time t in the channel k is defined as an energy deposit $E_k(t)$ that is a maximum in space and in time and is larger than a programmable threshold $E_{th}$:

Peak in space: $E_k(t) > E_{k-1}(t)$ and $E_k(t) > E_{k+1}(t)$ (4.1)
Peak in time: $E_k(t) > E_k(t-1)$ and $E_k(t) > E_k(t+1)$ (4.2)
Peak over threshold: $E_k(t) > E_{th}$ (4.3)

where $E_{k-1}(t)$ and $E_{k+1}(t)$ are the energies at time t in the channels located above and below the channel k having the peak energy $E_k(t)$ (each channel is a blue square in figure 3), and t − 1 and t + 1 denote the preceding and following 25 ns samples. The expected maximum instantaneous hit rate on the calorimeter is 30 MHz and, considering the extension of each cluster and the topology of the trigger system, a safe (factor 3 worse) estimation of the maximum peak rate to be sustained by each front-end FPGA (processing 8 channels) is 4.2 MHz.

Figure 4. (a) One of the 28 front-end boards, each processing one vertical slice of the calorimeter. 32 TLS are received by two TELDES input mezzanines on serial links over standard Ethernet cables. The information on recognized peaks is redundantly sent, through the TX mezzanine, to two different concentrator boards.

Figure 7. Scheme of the PP firmware: 8 16-bit data streams from the TELDES input mezzanine enter the system and a peak finder algorithm is applied as described by eq. (4.1)-(4.3). Data corresponding to a peak are then processed by a logic described in VHDL or by software running on 8 NIOS cores. The peak maximum and its fine-time are calculated (see figure 8) and transferred to the rest of the system for cluster merging and transmission to the Level 0 Trigger Processor.
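The peak definition above (a maximum in space and in time that exceeds a programmable threshold) can be illustrated with a short software sketch. It is not the pipelined firmware: the function name, the use of strict inequalities and the skipping of the first and last channel and sample are assumptions of this example.

```python
def find_peaks(energy, threshold):
    """Return (k, t) pairs that satisfy the three peak conditions.

    `energy[k][t]` is the TLS value of channel k at sample t. Edge
    channels and edge samples are skipped because they lack a neighbor.
    """
    peaks = []
    n_channels = len(energy)
    n_samples = len(energy[0])
    for k in range(1, n_channels - 1):
        for t in range(1, n_samples - 1):
            e = energy[k][t]
            over_threshold = e > threshold
            peak_in_space = e > energy[k - 1][t] and e > energy[k + 1][t]
            peak_in_time = e > energy[k][t - 1] and e > energy[k][t + 1]
            if over_threshold and peak_in_space and peak_in_time:
                peaks.append((k, t))
    return peaks


# Four channels, five consecutive 25 ns samples each.
samples = [
    [0, 0, 1, 0, 0],
    [0, 2, 9, 3, 0],
    [0, 1, 4, 1, 0],
    [0, 0, 0, 0, 0],
]
found = find_peaks(samples, threshold=5)
print(found)
```

The value 4 in channel 2 is a local maximum in time but fails both the space and the threshold conditions, so only the deposit in channel 1 is reported.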
The firmware performing the peak recognition in the FPGAs of the front-end boards is sketched in figure 7. Each of the four Pre-Processing FPGAs of the front-end boards receives 8 16-bit input channels at 40 MHz (CREAM TLS, see also sections 3 and 4) and looks for peaks in a pipelined and synchronized logic. On the data corresponding to peaks, a parabolic fit is performed to determine the peak maximum and to obtain a fine-time estimation of the peak arrival (see figure 8). This can be done with two alternative approaches. The first is a VHDL logic, synchronous with the 40 MHz inputs, with multipliers and dividers for the parabolic fit and recursive bisections between the samples for the fine-time estimation; being synchronous, it is intrinsically immune to any change in the peak rate. The second, asynchronous, approach is software that performs both the fit and the estimation of the peak fine-time, running on Altera NIOS II embedded processors clocked at 320 MHz; real-time measurements have shown that the maximum processing latency is 1.4 µs, therefore 8 cores in a round-robin schedule are able to sustain the maximum estimated peak rate of 4.2 MHz.

Figure 8. The plot shows the input and the output data of the Peak-Logic block of figure 7, which estimates the peak maximum and the fine-time of the peak arrival. A peak is identified over five 25 ns samples. A parabolic fit is performed around the maximum sample to find the true maximum (blue dot). The fine-time is a high-resolution timing of the peak arrival estimated with a constant fraction discriminator technique. If max(y_fit) is the value of the peak maximum found by the parabolic fit, the fine-time is defined as the time t_0 such that y(t_0) = constant · max(y_fit). The hardware implementation assumes that the signal shape is linear around t_0 and performs an iterative bisection starting from the closest samples (reaching in 8 steps a resolution of 25 ns/2⁸ ≈ 98 ps).
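The two calculations performed on each peak, the parabolic fit for the maximum and the constant-fraction fine-time obtained by bisection, can be sketched as below. This is a floating-point illustration under stated assumptions, not the VHDL or NIOS code: the constant fraction of 0.5, the three-point fit and the function names are chosen for the example only.

```python
SAMPLE_NS = 25.0  # sampling period of the 40 MHz TLS stream


def parabolic_maximum(y_prev, y_max, y_next):
    """Vertex of the parabola through three equally spaced samples.

    Returns (offset, value), with the offset in sample units relative
    to the central sample.
    """
    denom = y_prev - 2.0 * y_max + y_next
    if denom == 0:
        return 0.0, float(y_max)  # flat top: keep the central sample
    offset = 0.5 * (y_prev - y_next) / denom
    value = y_max - 0.25 * (y_prev - y_next) * offset
    return offset, value


def fine_time(samples, fraction=0.5, steps=8):
    """Constant-fraction time on the rising edge, in ns from sample 0.

    The signal is taken as linear between the two samples that bracket
    the target level, and the crossing is located by bisection.
    """
    i_max = max(range(1, len(samples) - 1), key=lambda i: samples[i])
    _, peak = parabolic_maximum(samples[i_max - 1], samples[i_max],
                                samples[i_max + 1])
    target = fraction * peak
    # Walk back from the maximum to the last sample below the target.
    i = i_max
    while i > 0 and samples[i] >= target:
        i -= 1
    if samples[i] >= target:
        raise ValueError("rising edge not contained in the samples")
    y_lo, y_hi = samples[i], samples[i + 1]
    lo, hi = 0.0, 1.0
    for _ in range(steps):  # each step halves the time interval
        mid = 0.5 * (lo + hi)
        if y_lo + mid * (y_hi - y_lo) < target:
            lo = mid
        else:
            hi = mid
    return (i + lo) * SAMPLE_NS


pulse = [0, 8, 18, 20, 18]
t0 = fine_time(pulse)
print(parabolic_maximum(18, 20, 18), t0)
```

With eight bisection steps the returned time lies within 25 ns/2⁸ of the exact linear-interpolation crossing, which for this toy pulse is at 30 ns.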

Fine-time and energy resolution of the readout system
In order to obtain a preliminary determination of the resolution of the readout system, a signal shaped as a calibration pulse has been split and sent to two different CREAM inputs. The L0 trigger system has been used to simultaneously estimate the maximum and the fine-time of the ideally identical peaks on the two channels. The distribution of the differences between the fine-time values of the simultaneous peaks on the two channels is shown in figure 9(a) and has a standard deviation of 166 ps. The same kind of distribution for the difference in the peak maximum is shown in figure 9(b) and has a standard deviation of 0.6% of the peak maximum. Both are obtained with the Peak Logic described in figure 8. Because of the systematic (worsening) effects of the measurement setup, they are both upper limits of the real resolutions. Both values are smaller than the calorimeter resolutions (350 ps in time and 5% in energy) for photons with energy larger than 10 GeV.

The internal transmitter and receiver mezzanines
The communication between the three layers of the Level-0 trigger system is implemented with transmitter and receiver mezzanines based on the DS90CR485/6 SerDes chips.
The transmitter board, shown in figure 10(a), mounts an Altera Stratix II FPGA that receives and buffers data from the Sync-Link FPGA on board the TEL62, and transmits them on two links through two DS90CR485 serializers. Each of them serializes 24 LVTTL double edge inputs (48 bits [...]).

Figure 9. (a) Distribution of the difference between the reconstructed fine-time values of the peak onset of two channels receiving the same input. (b) Distribution of the difference between the reconstructed maximum peak value (related to the peak energy deposit) of two channels receiving the same input.

The receiver board, shown in figure 10(b), manages four links with four DS90CR486 chips. Each Pre-Processing FPGA will receive data from two links, each providing 48-bit data and the received clock. The physical link is a custom, halogen-free, 2 m cable with ten shielded twisted copper pairs and Mini D Ribbon type connectors.

Firmware interface and test results
The SerDes chipset is designed for operation with an input clock between 66 and 133 MHz (up to 6.384 Gbps). Data-clock deskew is performed by the deserializer at power-up or whenever it is commanded. Bit Error Rate (BER) tests have shown that the quality of the deskew procedure is not constant, and this can be critical at higher clock rates or with cables and connectors of lower quality. Therefore the firmware implements features (footnote 2) to check the link quality and repeat the deskew when needed.
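The link initialization described in footnote 2 (a PRBS sent by the transmitter, with the receiver repeating the deskew until a pre-defined link quality is achieved) can be sketched in software. Several details are assumptions of this example: the generator taps (x²³ + x¹⁸ + 1, a common PRBS23 choice, whereas the text only specifies a 23-bit generator), the packing of the sequence into 48-bit words, the alignment of the received words with the start of the sequence, and the hypothetical `deskew_and_receive` callback that stands in for the hardware.

```python
def prbs23(seed=0x7FFFFF):
    """Yield the bits of a 23-bit LFSR sequence (taps are an assumption)."""
    state = seed & 0x7FFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        bit = ((state >> 22) ^ (state >> 17)) & 1
        state = ((state << 1) | bit) & 0x7FFFFF
        yield bit


def words(bits, width=48):
    """Pack a bit stream into `width`-bit words."""
    while True:
        word = 0
        for _ in range(width):
            word = (word << 1) | next(bits)
        yield word


def count_bit_errors(received, expected):
    """Number of differing bits between two word sequences."""
    return sum(bin(r ^ e).count("1") for r, e in zip(received, expected))


def train_link(deskew_and_receive, n_words=1000, max_errors=0,
               max_attempts=16):
    """Repeat the deskew until the error count is acceptable.

    `deskew_and_receive(n)` is a hypothetical callback: it commands a
    deskew and returns the next n received words.
    """
    reference = words(prbs23())
    expected = [next(reference) for _ in range(n_words)]
    for attempt in range(1, max_attempts + 1):
        received = deskew_and_receive(n_words)
        if count_bit_errors(received, expected) <= max_errors:
            return attempt
    raise RuntimeError("link quality not reached")


def make_fake_link(bad_attempts=2):
    """Simulated link whose first deskew attempts leave one bit error."""
    state = {"calls": 0}

    def deskew_and_receive(n):
        state["calls"] += 1
        source = words(prbs23())
        data = [next(source) for _ in range(n)]
        if state["calls"] <= bad_attempts:
            data[0] ^= 1  # inject a single bit error
        return data

    return deskew_and_receive


attempts = train_link(make_fake_link())
print(attempts)
```

The simulated link only exercises the retry loop; on real hardware the callback would drive the deserializer and the acceptance threshold would be set by the required link quality.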

Discussion and conclusion
In this paper we have presented the design and the implementation of the Level-0 Trigger system of the Liquid Krypton Calorimeter of the NA62 experiment. In particular we discussed the tests performed during its commissioning for the 2014 physics test run of the experiment. Although a limited number of inputs is being used compared to the full system, all the hardware components are being tested.
In section 4.2 we have described the peak recognition firmware in the front-end layer of the system and the processing limits of its architecture. The identification of the peaks and their processing (fit for maximum and fine-time estimation) can be completely pipelined with the input. Alternatively, the system can use NIOS II embedded processors, which do not limit the processing capability as long as a sufficient number of them work in parallel; calculations show that 8 of them, easily fitting in the Stratix III being used, are enough to sustain the foreseen maximum peak rate of 4.2 MHz per front-end FPGA (footnote 3). The maximum system throughput is therefore dictated by the bandwidth of the transmission link between the different layers of the system. As described in section 4.3, the transmission link has an effective data transfer rate of 2.870 Gbps, which can be reliably increased up to 3.485 Gbps without any transmission error being detected. Considering a data payload of 82 bits per peak, this translates into a limit of up to 10 MHz peak rate processed by each front-end FPGA (40 MHz per calorimeter vertical slice), which is higher than the worst case estimation of 4.2 MHz.
The 2014 test run will allow all the components of the system to be tested, in preparation for data taking at full luminosity in the 2015 run.

Footnote 2. Before normal operations each link is initialized with the transmitter sending a Pseudo Random Binary Sequence (PRBS, 23-bit generator) over the 48 bits of the link. The receiver repeatedly commands a deskew of the deserializer and checks the result (counting the errors in the received sequence) until a pre-defined link quality is achieved.

Footnote 3. In the actual implementation there is no functional difference between the VHDL logic and the NIOS software-based logic, because the software is written to perform exactly the same task as the VHDL logic. The difference lies in the added flexibility in implementing changes to the algorithm in software.
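The throughput figures quoted above follow from simple arithmetic, reproduced in the sketch below. The bandwidths, the 82-bit payload, the 1.4 µs latency and the number of cores are taken from the text; the assumption that the four Pre-Processing FPGAs of a board share one outgoing link equally is made for this example.

```python
PAYLOAD_BITS = 82    # bits per peak, from the text
FPGAS_PER_LINK = 4   # assumption: four PP FPGAs share one link equally


def link_peak_rate(bandwidth_bps, payload_bits=PAYLOAD_BITS):
    """Peaks per second that fit in a link of the given bandwidth."""
    return bandwidth_bps / payload_bits


def round_robin_capacity(n_cores, latency_s):
    """Peaks per second sustained by n_cores, each busy latency_s per peak."""
    return n_cores / latency_s


nominal = link_peak_rate(2.870e9)
boosted = link_peak_rate(3.485e9)
per_fpga = boosted / FPGAS_PER_LINK
nios = round_robin_capacity(8, 1.4e-6)

print(f"link at 2.870 Gbps: {nominal / 1e6:.1f} MHz")
print(f"link at 3.485 Gbps: {boosted / 1e6:.1f} MHz")
print(f"per FPGA at 3.485 Gbps: {per_fpga / 1e6:.1f} MHz")
print(f"8 NIOS cores: {nios / 1e6:.1f} MHz")
```

Under these assumptions the link carries 35 and 42.5 million peaks per second at the two bandwidths, about 10.6 MHz per FPGA at the higher one, and eight cores sustain about 5.7 MHz; both limits exceed the 4.2 MHz worst-case estimate.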