The FPGA based Trigger and Data Acquisition system for the CERN NA62 experiment

The main goal of the NA62 experiment at CERN is to measure the branching ratio of the ultra-rare K+→π+νν̄ decay, collecting about 100 events to test the Standard Model of Particle Physics. Readout uniformity of sub-detectors, scalability, efficient online selection and lossless high-rate readout are key issues. The TDCB and TEL62 boards are the common building blocks of the NA62 TDAQ system. TDCBs measure hit times from sub-detectors; TEL62s process and store them in a buffer, extracting only those requested by the trigger system, whose decision follows the matching of trigger primitives produced inside the TEL62s themselves. During the NA62 Technical Run at the end of 2012 the TALK board was used as a prototype of the L0 Trigger Processor.


1 Introduction
The NA62 experiment at the CERN SPS aims at measuring the branching ratio of the ultra-rare FCNC kaon decay K+→π+νν̄ with a precision of 10%. This rare decay is an excellent process for studying flavour physics because it is very clean from a theoretical point of view.
The strong suppression of the SM contributions and the remarkable theoretical precision of the SM rate make this decay a powerful probe for possible new degrees of freedom, complementary to direct searches at the LHC and potentially sensitive to much higher energy scales.
NA62 will collect about 100 events in two years of data taking. Assuming a signal acceptance of about 10% and a branching ratio of the order of 10⁻¹⁰, 10¹³ kaon decays are required. To achieve a signal-to-background ratio of 10:1, a rejection factor of 10¹² is needed. This rejection comes from particle vetoing, particle identification and kinematics, together with the possibility of measuring efficiencies and background suppression factors directly from data.
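The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch: the function names are illustrative, and the rejection estimate assumes that the dominant backgrounds have branching ratios of order one, which the text does not state explicitly.

```python
def required_decays(n_signal, acceptance, branching_ratio):
    """Kaon decays needed to observe n_signal signal events."""
    return n_signal / (acceptance * branching_ratio)


def required_rejection(signal_to_background, acceptance, branching_ratio):
    """Background rejection needed to reach a given S/B.

    Assumption (not from the text): the dominant backgrounds have
    branching ratios of order one, so the surviving background fraction
    must be S/B times smaller than the signal yield per kaon decay,
    which is acceptance * branching_ratio.
    """
    return signal_to_background / (acceptance * branching_ratio)


# Values quoted in the text: 100 events, 10% acceptance, BR ~ 1e-10, S/B = 10.
n_decays = required_decays(100, 0.10, 1e-10)
rejection = required_rejection(10, 0.10, 1e-10)

print(f"kaon decays needed: {n_decays:.0e}")
print(f"rejection factor:   {rejection:.0e}")
```

Under these assumptions both results reproduce the orders of magnitude quoted in the text.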

1.1 NA62 detector layout
The NA62 experiment will exploit the in-flight decay technique. The secondary kaon beam is produced by 400 GeV/c protons from the SPS impinging on a beryllium target; the beam optics system selects charged kaons with a momentum of 75 GeV/c (±1%). This momentum improves background rejection and sets the longitudinal scale of the experiment. The detectors are positioned along 170 m, starting about 100 m after the beryllium target; the fiducial region for the selection of useful decays is 65 m long. The total beam rate is 750 MHz, of which charged kaons are about 6%, resulting in an integrated rate over the downstream detectors of about 10 MHz.
A schematic layout of the experiment is shown in figure 1.
Reaching the final sensitivity of the experiment requires tracking devices for beam particles at rates up to 750 MHz, tracking devices for charged decay products, calorimeters for π⁰ rejection and electron identification, muon vetoes, and pion-muon separation in the fiducial kinematical region (15-35 GeV/c). An overall event time resolution of about 100 ps is required to avoid mismatching a pion with a beam particle, which would result in a wrong missing mass.
The following detectors are present (see figure 1):
• Čerenkov threshold detector (CEDAR), tagging a kaon of the nominal momentum and its time window.
• Beam spectrometer (GTK), measuring coordinates and momentum of beam particles before they enter the decay region.
• Ring counters (CHANTI) surrounding the last GTK station, to veto charged particles upstream of the decay region.
• Decay products spectrometer, consisting of 4 straw tube chambers (STRAW) with a dipole magnet.
• Ring-imaging Čerenkov (RICH), for pion-muon separation and precise timing of the pion.
• Photon veto system at small angle (IRC and SAC), medium angle (LKr) and large angle (LAV). The LKr EM calorimeter will also provide high resolution energy measurements.

2 The Trigger and Data Acquisition system
The intense beam flux of the experiment requires a high-performance trigger and data acquisition (TDAQ) system, which must minimize dead time and random veto, and maximize the efficiency of data collection at high rates. NA62 has a unified TDAQ system: the trigger is integrated inside the DAQ system and the low-level selection is done digitally using the data itself, with just a single digitization.
The timing of the experiment is provided by the Timing, Trigger and Control (TTC) system used in LHC experiments [3]. The time scale is defined by a 32-bit timestamp with 25 ns LSB, covering the duration of an entire burst, plus 8 bits of fine time with 100 ps LSB.
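As an illustration of this time format, the sketch below converts a (timestamp, fine time) pair into seconds and computes the span covered by the 32-bit counter. It assumes that the 8 fine-time bits subdivide one 25 ns period into 256 bins (about 97.7 ps each, consistent with the quoted 100 ps LSB); the names are illustrative and not taken from the NA62 firmware.

```python
COARSE_LSB_S = 25e-9                         # 25 ns per timestamp count (from the text)
FINE_BITS = 8
# Assumption: the fine time subdivides one 25 ns period into 2**8 bins.
FINE_LSB_S = COARSE_LSB_S / 2**FINE_BITS     # about 97.7 ps, quoted as 100 ps


def hit_time_s(timestamp, fine_time):
    """Time in seconds since the start of the burst."""
    if not 0 <= timestamp < 2**32:
        raise ValueError("timestamp must fit in 32 bits")
    if not 0 <= fine_time < 2**FINE_BITS:
        raise ValueError("fine time must fit in 8 bits")
    return timestamp * COARSE_LSB_S + fine_time * FINE_LSB_S


# Longest interval the 32-bit timestamp can represent before wrapping.
max_span_s = 2**32 * COARSE_LSB_S

print(f"fine-time bin: {FINE_LSB_S * 1e12:.1f} ps")
print(f"timestamp span: {max_span_s:.1f} s")
```

The span of roughly 107 s is what allows a single counter to cover an entire burst without wrapping.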
A schematic view of the NA62 trigger hierarchy can be seen in figure 2. The trigger logic is made of three levels. The rates and the number of channels of the experiment (12 sub-detectors, about 80000 channels, 25 GB/s of raw data) have driven NA62 to choose a hardware lowest-level trigger (L0); in response to a L0 request, sub-detectors will transfer data to dedicated PCs where the L1 and L2 high-level trigger algorithms will be applied.
The rate on the main detectors will be about 10 MHz. The L0 will be implemented in the common TEL62 board (see section 3.1) and will be used to reduce the total rate to under 1 MHz using information coming from fast detectors: the charged hodoscope (CHOD) and the RICH as positive elements, the muon veto (MUV) and the photon vetoes (LKr, LAV) as negative ones. Given the 1 MHz limit, it will be possible to implement secondary triggers for control samples and different physics goals, such as the search for other rare or forbidden decays of the K+ and the π⁰. Data from all sub-detectors will be stored in buffers during the L0 trigger evaluation, for a time up to the defined maximum L0 trigger latency of 1 ms.
After sub-detector data are sent to the PC farm following a L0 trigger, the L1 algorithms will check data quality conditions and require simple correlations between conditions computed by single sub-detectors, reducing the rate to under 100 kHz. In case of a positive L1, a complete event reconstruction can be performed at L2. Both these trigger levels are implemented inside the same PCs, to avoid useless data transfer. All data satisfying the L2 trigger condition are finally logged to tape at a rate of the order of 10 kHz.
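The rate budget of the three trigger levels can be summarized numerically. The sketch below uses the nominal figures quoted in the text (which are upper limits or orders of magnitude, not measured values) and illustrative names; it also estimates how many events must be held in the front-end buffers while the L0 decision is pending.

```python
# Nominal rates in Hz, as quoted in the text (upper limits / orders of magnitude).
TRIGGER_CHAIN = [
    ("detectors", 10e6),
    ("L0", 1e6),
    ("L1", 100e3),
    ("L2", 10e3),
]
L0_MAX_LATENCY_S = 1e-3   # maximum L0 trigger latency


def reduction_factors(chain):
    """Rate reduction achieved by each level relative to the previous one."""
    return {
        name: prev_rate / rate
        for (_, prev_rate), (name, rate) in zip(chain, chain[1:])
    }


factors = reduction_factors(TRIGGER_CHAIN)
# Events arriving at the detectors during one full L0 latency period.
events_in_flight = TRIGGER_CHAIN[0][1] * L0_MAX_LATENCY_S

for level, factor in factors.items():
    print(f"{level}: rate reduced by a factor of about {factor:.0f}")
print(f"events buffered during L0 latency: about {events_in_flight:.0f}")
```

With these nominal numbers each level contributes roughly one order of magnitude of rate reduction.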
3 The FPGA based common system architecture

3.1 TEL62
The TEL62 board (see figure 3) is the common FPGA-based motherboard for trigger generation and data acquisition of the NA62 experiment. It has been developed in Pisa, and represents a major upgrade of the TELL1 board designed by EPFL Lausanne for the LHCb experiment at CERN [4].
The overall architecture of the TEL62 is similar to the TELL1's, but the board is based on much more powerful and modern devices, resulting in 8 times the computing power and about 20 times the buffer memory.
The board size complies with the 9U Eurocard standard. The printed circuit is made of 16 layers, with all lines controlled in impedance (50 ohm). Special care has been used for the clock tree, to avoid signal jitter.

JINST 9 C01055
The board is composed of the following parts:
• 4 Altera Stratix III FPGAs EP3SL200F1152 (called Pre-Processing or PP-FPGAs), each one handling the data from one TDCB mezzanine daughter-card (see section 3.2) providing digitized data; the mezzanine is connected through a 200-pin connector (4 × 32 bit data buses at 40 MHz). Each PP-FPGA is also connected to a 2 GByte DDR2 memory buffer (64 bit at 640 MHz);
• 1 central Altera Stratix III FPGA EP3SL200F1152 (called Sync-Link or SL-FPGA), connected to all the PPs through independent data and trigger flow buses (32 bit each at 160 MHz), and also connected to the output GbE mezzanine; a 1 Mb QDR RAM is used as intermediate buffer;
• the output mezzanine card is a Quad GbE (same board as in the TELL1 design), equipped with four 1 Gbit Ethernet channels, used to connect the TEL62 to the L0 Trigger Processor (L0TP), to other TEL62s in daisy chain, or to send the main data flow to the PC farm;
• the slow control of the board is handled by two other mezzanine cards, also identical to those on the TELL1 board: a commercial Credit-Card PC (CCPC) and a custom card named Glue Card, connected to the CCPC through a PCI bus. Three different communication protocols are implemented in the Glue Card and distributed to all devices and connectors: JTAG, I2C (mainly used for slow communication with the TDCB and TALK daughter-cards, see section 3.3), and ECS (a parallel bus to access FPGA internal registers);
• clock and L0 trigger are distributed through a standard optical TTC link, a CERN-developed optical time-multiplexed connection which distributes the main 40 MHz clock with trigger and timing signals encoded in it; a TTCrx chip is used to decode this information [5];
• an auxiliary connection for inter-board communication is also present (2 independent 16 bit buses).
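The bus widths and clock rates listed above translate into peak raw throughputs that can be compared directly. The sketch below assumes one transfer per quoted clock cycle and ignores protocol overhead, so the numbers are upper bounds rather than measured bandwidths; the function and dictionary names are illustrative.

```python
def bus_gbit_per_s(width_bits, clock_mhz, n_buses=1):
    """Peak raw throughput in Gbit/s, assuming one transfer per clock cycle."""
    return width_bits * clock_mhz * n_buses / 1000


LINKS_GBIT_PER_S = {
    "TDCB -> PP (4 x 32 bit @ 40 MHz)": bus_gbit_per_s(32, 40, n_buses=4),
    "PP <-> DDR2 (64 bit @ 640 MHz)": bus_gbit_per_s(64, 640),
    "PP -> SL (32 bit @ 160 MHz)": bus_gbit_per_s(32, 160),
    "Quad GbE output (4 x 1 Gbit/s)": 4.0,
}

for name, rate in LINKS_GBIT_PER_S.items():
    print(f"{name:34s} {rate:6.2f} Gbit/s")
```

Under these assumptions the four input links together could carry more raw data than the four Gbit Ethernet outputs, which is consistent with the need to buffer data and forward only what the trigger requests.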
The firmware of the FPGAs has been developed within a common framework using the software tools HDL Designer and ModelSim (from Mentor Graphics) and Quartus II (from Altera). A brief description of the firmware architecture is given in the following. Each PP collects data from its own daughter-card and merges the information from the 4 TDCs into one single buffer for each 6.4 µs long data frame. The merged data are copied into three streams feeding three parts of the firmware: one for monitoring purposes, one for trigger primitive generation, and one for data storage inside the DDR2 buffer. Each PP receives the timestamped L0 trigger requests from the SL, extracts from the DDR2 the requested data corresponding to a programmable number of 25 ns time slots around the trigger time, and transfers them to the SL. It also continuously sends the generated sub-detector trigger primitives to the SL.
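The trigger-matched extraction described above can be modelled in software. The following is a simplified sketch and not the actual PP firmware: hits are stored by 6.4 µs frame (256 slots of 25 ns), and a L0 request returns the hits within a programmable window of slots around the trigger timestamp. The class, method and parameter names are hypothetical.

```python
from collections import defaultdict

SLOTS_PER_FRAME = 256   # 6.4 us frame / 25 ns slot


class FrameBuffer:
    """Toy model of a frame-organized hit buffer (hypothetical, for illustration)."""

    def __init__(self):
        # frame index -> list of (timestamp in 25 ns units, payload)
        self._frames = defaultdict(list)

    def store(self, timestamp, payload):
        self._frames[timestamp // SLOTS_PER_FRAME].append((timestamp, payload))

    def extract(self, trigger_ts, slots_before, slots_after):
        """Return hits with trigger_ts - slots_before <= ts <= trigger_ts + slots_after."""
        lo = max(trigger_ts - slots_before, 0)
        hi = trigger_ts + slots_after
        hits = []
        # The window may straddle a frame boundary, so visit every frame it touches.
        for frame in range(lo // SLOTS_PER_FRAME, hi // SLOTS_PER_FRAME + 1):
            for ts, payload in self._frames.get(frame, ()):
                if lo <= ts <= hi:
                    hits.append((ts, payload))
        return sorted(hits, key=lambda hit: hit[0])


buffer = FrameBuffer()
for ts in (250, 255, 257, 300):
    buffer.store(ts, f"hit@{ts}")

# Trigger at slot 256, window of 2 slots before and 1 slot after.
matched = buffer.extract(trigger_ts=256, slots_before=2, slots_after=1)
print(matched)
```

Indexing by frame keeps the lookup local: only the frames overlapping the requested window are scanned, regardless of how much data the buffer holds.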
The SL firmware merges the TDC data coming from PPs for each L0 request to build an event fragment and stores it into the temporary QDR buffer; from it several event fragments are then read and assembled into a multiple-event packet in UDP format, to better exploit the data link bandwidth. The TTC interface decodes the trigger information arriving from the optical fibre and dispatches this to all PPs. This information contains the sequential trigger number, the trigger type

3.2 TDCB
A daughter-card for the TEL62 motherboard, called TDC board (TDCB, see figure 4), has been developed in Pisa for the high-resolution time measurements needed in NA62. The printed circuit is made of 10 layers. Up to 4 TDCBs can be housed in a TEL62, each one connected through the 200-pin connector, thus allowing the readout of 512 LVDS input channels per motherboard.
The TDCB includes:
• 4 CERN-developed High-Performance TDC (HPTDC) chips [6], each one giving 19 bit leading time and time over threshold measurements with 100 ps LSB for 32 LVDS input channels;
• 4 VHDCI 68-pin connectors for standard 34-pair SCSI cables, providing the input to the TDCs;
• 1 Altera Cyclone III FPGA EP3C120F780 (called TDC controller or TDCC-FPGA);
• 2 × 1 MB SRAM, for first data monitoring or pre-processing;
• 1 QPLL for clock jitter reduction (under 40 ps), to avoid compromising the time resolution, as time measurements are performed by the TDCs with respect to clock edges;
• the slow control of the board is driven by the motherboard through the I2C protocol; TDC and FPGA configuration is done through JTAG.
The TDCC firmware implements the following operations. It handles the communication of the board with the CCPC, acting as an I2C slave, for commands and configuration instructions. As a JTAG master it configures the registers of the 4 TDCs. At runtime it controls the TDCs, recording possible errors; then it reads out data words and associates coarse timestamps and monitoring counters to data packets. In particular cases data are pre-processed for the correct primitive computation and buffering inside the TEL62's DDR2. At the end of the flow the firmware provides data on 4 parallel buses to the PP (4 × 32 bit at 40 MHz). The TDCs do not receive the L0 trigger, as in NA62 it is produced using the digitized data themselves. Instead, data are continuously read out by sending periodic triggers from the TDCC to the TDCs, which provide the data belonging to a precise time window: each TDC has a buffer of 256 32-bit words for each of its four 8-channel subgroups, read out every 6.4 µs. The TDC data are then transferred to the TEL62.
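The buffer depth and readout period quoted above set an upper bound on the hit rate that can be sustained without overflowing a TDC group buffer. The sketch below estimates this bound; the number of 32-bit words produced per hit is not specified in the text and is left as a parameter (an assumption), and all names are illustrative.

```python
TDCB_PER_TEL62 = 4
TDC_PER_TDCB = 4
CHANNELS_PER_TDC = 32


def max_hit_rate_per_channel_hz(buffer_words=256, read_period_s=6.4e-6,
                                channels_per_group=8, words_per_hit=1):
    """Average per-channel hit rate that exactly fills one group buffer per readout.

    words_per_hit is an assumption: the text does not say how many 32-bit
    words a single hit occupies in the buffer.
    """
    words_per_second = buffer_words / read_period_s
    return words_per_second / channels_per_group / words_per_hit


channels_per_tel62 = TDCB_PER_TEL62 * TDC_PER_TDCB * CHANNELS_PER_TDC
rate_one_word = max_hit_rate_per_channel_hz(words_per_hit=1)
rate_two_words = max_hit_rate_per_channel_hz(words_per_hit=2)

print(f"channels per TEL62: {channels_per_tel62}")
print(f"max average rate, 1 word/hit: {rate_one_word / 1e6:.2f} MHz per channel")
print(f"max average rate, 2 words/hit: {rate_two_words / 1e6:.2f} MHz per channel")
```

These are average rates over a readout period with hits spread evenly across the eight channels of a group; a single hot channel could in principle use the whole group buffer.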
A powerful TDC data emulator was also implemented: it is used as a test bench for TEL62's firmware testing and debugging.
The TDCC can drive a spare output LVDS pair of the TDC connector, which allows the front-end boards to be triggered for sub-detector calibration. Additionally, one channel per TDC chip can receive a NIM input from a LEMO connector mounted on the daughter-card rather than from the LVDS connector input.
The TDCB will be used for several sub-detectors: CEDAR, CHANTI, LAV, RICH, CHOD, MUV; test beams in the past years showed that the time resolution is not affected by this system and is compatible with the expected time resolution of each single sub-detector.

3.3 TALK
The TALK (Trigger Adapter for Liquid Krypton calorimeter, see figure 5) is a multi-purpose TEL62 daughter-board developed at CERN. The printed circuit is made of 10 layers, with clock lines controlled in impedance. It is connected to the motherboard through the same 200-pin connectors used by the TDCB, and up to 2 TALKs can be mounted on a TEL62. A 6U VME frame has also been developed for standalone use.
This board was successfully used during the NA62 Technical Run at the end of 2012 as the interface between the old LKr readout (the new one was not yet deployed in 2012) and the TTC system dispatching triggers, thus allowing data collection from the LKr. The TALK board is currently used as L0TP prototype. The firmware implements the following: collection of primitives from TEL62s through Ethernet and their merging; old-style NIM triggering using LEMO connectors; trigger decision sent to the LTU using the dedicated connector; management of choke and error signals using an RJ11 connector as input.
The main future uses of the board will be as a test bench for new calorimeter readout modules and for the implementation of the LKr calibration logic. The former will have a firmware for emulated L0 (through LEMO or via LTU) and L1 (through Ethernet) triggers. The latter will operate in standalone mode, and Ethernet commands will program the LKr calibration, varying rates, pulse types and the starting-point reference in the SPS cycle.