"The Read-Out Driver" ROD card for the Insertable B-layer (IBL) detector of the ATLAS experiment: commissioning and upgrade studies for the Pixel Layers 1 and 2

The upgrade of the ATLAS experiment at the LHC foresees the insertion of an innermost silicon layer, called the Insertable B-layer (IBL). The IBL read-out system will be equipped with new electronics. The Read-Out Driver card (ROD) is a VME board devoted to data processing, configuration and control. A pre-production batch has been delivered for testing with instrumented slices of the overall acquisition chain, with the aim of finalizing strategies for system commissioning. This paper describes the system setups and results, as well as preliminary studies on the changes needed to adopt the ROD for the ATLAS Pixel Layers 1 and 2.

The ATLAS experiment [1] is a general-purpose detector for the LHC at CERN. It consists of several sub-detectors designed with different technologies. The Pixel Detector [2] is the innermost one, installed just around the beam interaction point and devoted to tracking and vertex reconstruction. The system is arranged in three concentric cylinders (the barrel) and three disks at each barrel extremity (the end-caps). The mean radius of the barrel layer closest to the beam pipe (B-layer) is 5 cm. The basic building block of the active part of the Pixel Detector is a module composed of silicon sensors and front-end electronics. Each module is equipped with 16 front-end ASICs, called FE-I3, which are read out by a Module Control Chip (MCC). Each MCC is connected to the off-detector electronics via two kinds of link: the down-link is used to transmit clock, trigger, commands and configuration data to the front-end, while one or two links, named up-links, are used for event readout. Each up-link can be configured to transmit at a data rate of either 40 Mb/s or 80 Mb/s. The basic off-detector readout unit is composed of two 9U-VME cards: the Back-of-Crate (BOC) and the Read-Out Driver (ROD), which implement the optical I/O interface and the data processing respectively. They are paired back-to-back in the same VME slot. Electrical-to-optical conversion of the link transmission occurs in custom opto-boards on the detector side, and in the optical-receiver (RX) and optical-transmitter (TX) plug-ins in the BOC. Readout data is then transmitted via the backplane to the paired ROD, which processes the data and performs event building. Each BOC-ROD pair can sustain an aggregate output bandwidth of up to 160 MB/s, which corresponds to 8 up-links transmitting at the maximum data rate. The ROD output is sent via one S-Link to the ATLAS data acquisition system.
Due to the different bandwidth requirements of the layers, a diversified read-out scheme is adopted: one up-link at 40 Mb/s is used for the Layer-2 modules and one up-link at 80 Mb/s for the Layer-1 modules, while the B-layer uses two up-links to increase the aggregate bandwidth to 160 Mb/s. The matching between the different readout schemes and the single BOC-ROD architecture is mainly obtained by varying the number and the type of plug-ins installed in the BOC.
During the current LHC shutdown (planned to continue through the end of 2014) the ATLAS experiment will upgrade the Pixel Detector with the insertion of an innermost silicon layer, called the Insertable B-layer (IBL) [3], which will be interposed between a new beam pipe and the B-layer. The IBL has been designed to strengthen the tracking capability by increasing both redundancy and precision. Moreover, it will preserve the detector performance against the effects of the increased luminosity expected after the LHC upgrades (greater pile-up and radiation doses).
The new IBL detector will consist of 14 local supports (staves) located at a mean geometric radius of 33.4 mm, each loaded with silicon sensors bump-bonded to the newly developed front-end integrated circuit FE-I4 [4]. A module uses the FE-I4 IC and either planar or 3D sensors. The planar sensors will be bump-bonded to 2 ICs, while the 3D sensors will be bump-bonded to a single FE-I4. Each stave will be instrumented with 32 FE-I4s.

Overview of the IBL readout electronics
New off-detector electronics [5] have been designed as well; new BOC and ROD versions will be produced. Each card pair has to process data received from 32 FE-I4 data links for a total I/O bandwidth of 5.12 Gb/s, four times that of the existing ROD-BOC pair. The schematic layout of both cards is shown in figure 1. The BOC card provides the I/O interface to the FE-I4 and to the ATLAS readout via 4 S-Links. A detailed description of the BOC hardware, as well as its firmware functionalities, can be found in [6].
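The bandwidth figures quoted above can be cross-checked with a few lines of arithmetic. The following is only a consistency sketch: the per-link rate is not stated explicitly in the text and is derived here by dividing the quoted total by the number of links, while the 160 MB/s figure for the existing pair is the aggregate output bandwidth quoted in the introduction. All variable names are illustrative.

```python
# Consistency check of the bandwidth figures quoted in the text.
N_LINKS = 32                 # FE-I4 data links per BOC-ROD pair (from the text)
TOTAL_IO_GBPS = 5.12         # total I/O bandwidth per IBL pair (from the text)
OLD_PAIR_MBYTES_PER_S = 160  # existing pair, aggregate output (from the text)

# Per-link rate implied by the two IBL numbers (derived, not quoted).
per_link_mbps = TOTAL_IO_GBPS * 1000 / N_LINKS

# Existing pair expressed in Gb/s, assuming 8 bits per byte.
old_pair_gbps = OLD_PAIR_MBYTES_PER_S * 8 / 1000

# Ratio between the new and the existing pair.
ratio = TOTAL_IO_GBPS / old_pair_gbps

print(f"implied rate per FE-I4 link: {per_link_mbps:.0f} Mb/s")
print(f"existing pair: {old_pair_gbps:.2f} Gb/s, ratio new/old: {ratio:.1f}")
```

Under these assumptions the implied link rate is 160 Mb/s and the ratio is 4, in agreement with the "four times" statement.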
Since each BOC-ROD pair is designed to serve the number of FE-I4s contained in one stave, 14 pairs in total are required. There is one additional BOC-ROD pair for the diamond beam monitor detector. These have to be installed and hosted in one full VME crate.

The Read-Out Driver (ROD) card
The ROD card has to accomplish several major tasks: detector control (by configuring and sending slow commands to the read-out electronics), data taking (by forwarding triggers and building events), detector calibration (by steering dedicated acquisition runs, collecting the data and building cumulative histograms) and detector monitoring.
The design is based on modern Xilinx FPGA devices: one Virtex-5 for control and two Spartan-6 dedicated to data processing. A complete description of the ROD hardware can be found in [8]; even though the card shown in that document is revision B, no significant changes have been introduced in the latest production (revision C, shown in figure 2).

The ROD production and commissioning

Prototyping, pre-production and commissioning strategies
The first prototype batch (revision A) included 3 samples produced in September 2011 and was mainly devoted to design and layout validation. The second prototype batch of 5 cards (revision B) was delivered in February 2012 and included major bug fixes with respect to the previous version. Some samples of revision B were also distributed to collaborators to enlarge the community of firmware developers, as well as to set up several test stands where different integration tests could be performed. In February 2013, 5 cards (revision C) were produced with minor changes with respect to the previous iteration. This batch is considered a pre-production sample because it will be used to validate the overall board performance and design before the final boards are manufactured. These cards will also be used as spares for the final system. Tests were performed in parallel in different laboratories with different set-ups. Six test stands are available: one at CERN, two in Italy (Bologna and Genova) and three in Germany (Göttingen, Mannheim and Wuppertal).

ROD testing
As part of testing the prototype boards, we have developed a comprehensive set of tests for board bring-up and deployment. The features that must be verified are the following:
• VME interface;
• FPGA firmware download, both stand-alone and from VME;
• FPGA-embedded processors, with software download from VME;
• TIM interface (the VME module that connects to the ATLAS level-1 trigger system);
• connectivity between FPGAs and memory components (DDR2, FLASH-based, Synchronous Static RAM);
• three Gigabit Ethernet links;
• ROD bus (asynchronous bus to configure chips on both ROD and BOC);
• buses to and from the BOC.
Three specific tests, which are particularly interesting, are discussed in the following. The first shows how testing prototypes was decisive in finalizing some layout options; the second highlights a typical case where complex approaches and custom solutions had to be developed to tackle a difficult problem. The last example shows how integration tests gave us reasonable confidence that the ROD meets its required performance.

Test example #1: BOC to ROD transmission
The data received by the BOC from the FE-I4 is decoded and then sent to the ROD via a 96-bit wide bus with a signal rate of 80 Mb/s per line. The bus width and the signal rate have been dimensioned as a trade-off between the BOC-to-ROD bandwidth requirement (5.12 Gb/s) and the design modularity (the front-end data is 12 bits wide). The bus connects the Spartan-6 FPGAs hosted on the two cards through the VME backplane connectors. The communication has been implemented with the SSTL 3 standard [9]. Transmission is double data rate, synchronized with the 40 MHz clock common to both cards. The bus has been designed to have minimal skew and to optimize the signal integrity in the PCB layout. The decision on whether to include bus terminators external to the FPGA was postponed to a later stage: terminations provide better signal integrity, but the increased routing congestion and the reduced space on the board have to be considered. Extensive tests have been performed with ad-hoc firmware, sending data continuously and measuring the bit error rate while varying the phase of the receiver clock. The goal of this test was to measure the width of the good sampling phase window. A typical result for cards of revisions A and B is shown in table 1. Because these results were unsatisfactory, we added compact packaged resistors to the whole bus in revision C, taking care to minimize changes to the existing signal routing. The result for one card is shown in table 2. The performance improved, and we have seen good uniformity between cards.
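The analysis step of such a phase scan can be sketched as follows. This is a minimal illustration, not the firmware or software actually used: it assumes the scan yields one error count per receiver clock phase step, and it treats the phase axis as circular because the clock is periodic. The function name and the sample data are hypothetical.

```python
def widest_good_window(errors_per_phase):
    """Return (start_index, length) of the longest circular run of
    phase steps in which no bit errors were observed."""
    n = len(errors_per_phase)
    good = [count == 0 for count in errors_per_phase]
    if all(good):
        return 0, n
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    # Walk twice around the phase circle so that a run wrapping
    # from the last step back to the first one is also found.
    for i in range(2 * n):
        if good[i % n]:
            if run_len == 0:
                run_start = i % n
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


# Hypothetical scan with 8 phase steps: error counts per step.
scan = [0, 0, 7, 9, 0, 0, 0, 0]
start, length = widest_good_window(scan)
print(f"good window starts at step {start} and spans {length} steps")
```

In this made-up scan the good window starts at step 4 and wraps around to cover 6 of the 8 steps; the receiver clock would then typically be set near the centre of that window.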

Test example #2: SRAM controller for calibration histograms
As mentioned previously (section 1.3), one of the tasks of the ROD card is to create histograms of the data for calibrating the detector. Preliminary estimates show that scans, which required about 10 minutes with the existing electronics, can be performed faster with the IBL ROD. Since the histograms are read out via Gigabit Ethernet links, acquisition runs with a comparable amount of data can be accomplished in about 10 seconds. The most important speed limitation then becomes the maximum data rate of the Synchronous Static RAM (SSRAM) components. Due to the card design, in order to manage histograms for all FE-I4s in parallel, the selected SSRAM must be accessed with a clock frequency of about 200 MHz. Implementing such a 36-bit SSRAM controller running at 200 MHz in a Spartan-6 FPGA required careful design techniques. First, the clock is forwarded to the SSRAM with matched PCB feedback paths. Second, the Spartan-6 PLLs are configured in "Zero delay buffer" mode, synchronizing the FPGA and SSRAM clocks. Last, distinct clock domains are implemented for the transmitter and receiver logic.
Other issues have to be taken into account. The slew rate and the current drive capability of the I/O pin drivers must be chosen carefully to obtain the fastest pad propagation delay without exceeding the maximum number of simultaneously switching outputs allowed by the FPGAs.
Finally, effects due to bus contention have been evaluated. Bus contention happens when two devices drive a bi-directional line simultaneously, and it is exacerbated by clock skew between the transmitter and receiver elements. It increases the power dissipation of the components. Taking into account both the switching times of the devices (from the datasheets) and the relative clock phases (from simulation), a temperature rise of less than 3 °C was expected. Even though this is a small effect, it can be reduced further by grouping together several cycles with the same access type (read or write). Tests have been performed at the maximum achieved clock speed (200 MHz), measuring the temperature increase with a thermal camera and showing that the effect was successfully limited.
The interface with the SSRAM has been successfully tested at 200 MHz in stand-alone operation, while a version running at 140 MHz has been proven to work when integrated in the latest official release of the ROD firmware.
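The benefit of grouping cycles with the same access type can be illustrated by counting how often the bi-directional bus changes direction, since contention can only occur at those turnarounds. The sketch below is a toy model and not the controller logic: it assumes each histogram update is one read followed by one write, and that updates batched together address different locations (or have been merged beforehand). The function names and the burst length are hypothetical.

```python
def count_turnarounds(accesses):
    """Count read/write direction changes in a sequence of 'R'/'W' cycles."""
    return sum(1 for prev, cur in zip(accesses, accesses[1:]) if prev != cur)


def interleaved(n_updates):
    """One read immediately followed by one write for every update."""
    return ["R", "W"] * n_updates


def batched(n_updates, burst):
    """Reads grouped in bursts of up to `burst` cycles, followed by
    the corresponding writes grouped in the same way."""
    sequence = []
    for first in range(0, n_updates, burst):
        k = min(burst, n_updates - first)
        sequence += ["R"] * k + ["W"] * k
    return sequence


n = 16
print("interleaved:", count_turnarounds(interleaved(n)), "turnarounds")
print("batched x4: ", count_turnarounds(batched(n, 4)), "turnarounds")
```

For 16 updates the interleaved pattern changes direction 31 times, while bursts of 4 reduce this to 7, which is why grouping lowers the time spent in potential contention.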

Test example #3: integration with the FE-I4
Extensive tests have been performed connecting the FE-I4 to a BOC-ROD pair. Their major aim is to validate the resources implemented on both cards (e.g. the amount of memory available in the chosen components), showing that hardware specifications have been met. Moreover, integration tests have been performed to drive the firmware design and development with real use cases.
The official version of the firmware is shown in figure 3. The design is highly modular. Four groups of 8 FE-I4s (4 modules) are processed independently on the BOC and then transmitted to a collector block in the ROD (called the "gatherer"). An event builder (EFB) then multiplexes the data both to an S-Link interface and to a dedicated component (called the histogrammer) that accumulates events and fills summary histograms (occupancy, time over threshold (ToT) per pixel, ToT² per pixel). The histogrammer stores temporary data in the SSRAMs. At the end of the acquisition, the histograms are read by an embedded microcontroller (a Xilinx MicroBlaze IP core) via DMA and loaded into the DDR2. They are then sent to an external PC farm through a Gigabit Ethernet link. The PowerPC microcontroller, embedded in the ROD Virtex-5, controls all the subsystems and acts as an interface to the software application that executes this test. At present, the test stand is composed of two FE-I4s connected to the same datapath (see green lines in figure 3).
The results are satisfactory: the basic functionalities implemented so far are adequate to perform data acquisition and calibration scans successfully, with an FPGA resource utilization of less than 20%. This allows us to be confident that a more realistic version (with busy logic, timeouts, complete error handling, etc.) will fit on the chip. Moreover, the whole mechanism of histogram filling and acquisition has been evaluated, showing that all requirements will be met even when it operates with the full set of FE-I4s connected. Thus, the choice of the specific components installed in the ROD has been confirmed, and the overall board design validated.
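The three summary histograms named above are sufficient to recover the mean and the spread of the ToT for each pixel. The following is a minimal software sketch of that accumulation, not the firmware histogrammer: the class name, the array dimensions and the hit format (column, row, ToT) are assumptions made for illustration.

```python
import math


class PixelHistogram:
    """Per-pixel accumulators: occupancy, sum of ToT and sum of ToT^2."""

    def __init__(self, n_cols, n_rows):
        n_pixels = n_cols * n_rows
        self.n_cols = n_cols
        self.occupancy = [0] * n_pixels
        self.tot_sum = [0] * n_pixels
        self.tot2_sum = [0] * n_pixels

    def fill(self, col, row, tot):
        """Add one hit to the three accumulators of its pixel."""
        i = row * self.n_cols + col
        self.occupancy[i] += 1
        self.tot_sum[i] += tot
        self.tot2_sum[i] += tot * tot

    def mean_and_sigma(self, col, row):
        """Return (mean ToT, standard deviation), or None if the pixel has no hits."""
        i = row * self.n_cols + col
        n = self.occupancy[i]
        if n == 0:
            return None
        mean = self.tot_sum[i] / n
        # Clamp at zero to protect against tiny negative rounding errors.
        variance = max(self.tot2_sum[i] / n - mean * mean, 0.0)
        return mean, math.sqrt(variance)


# Hypothetical 4x4 pixel array with three hits on the same pixel.
hist = PixelHistogram(n_cols=4, n_rows=4)
for tot in (4, 6, 8):
    hist.fill(col=3, row=2, tot=tot)
mean, sigma = hist.mean_and_sigma(3, 2)
print(f"mean ToT = {mean:.2f}, sigma = {sigma:.3f}")
```

Storing only sums keeps the memory per pixel fixed regardless of the number of triggers, which is the usual reason for accumulating ToT and ToT² rather than individual hits.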

ROD production: plans and validation procedures
At present, the ROD design has been successfully validated and is ready for production for the IBL. The same firm that delivered all the prototype batches will carry out the PCB manufacturing and assembly. After production, preliminary tests (similar to those performed for revisions B and C) will be delegated to the same company: electrical examination after component supply and X-ray inspection for large BGA-packaged devices.
We have also defined a list of minimal procedures to validate the ROD cards after production:
• firmware and software upload from VME, JTAG and Gigabit Ethernet ports;
• ROD to BOC (and BOC to ROD) dataflow over all I/O lines;
• R/W tests for the Virtex-5 and Spartan-6 external memory modules;
• dataflow tests on the 3 Ethernet ports;
• TIM card connectivity test.
Figure 4. Synchronization errors in runs acquired last year. The y-axis is the maximum number of modules that showed synchronization errors at a given event. The ROD verifies the synchronization of the data link by checking both the encoded data format and the event counter. During data taking, the synchronization is automatically restored by issuing an event counter reset every few seconds. As an example, a "Synchronization Error/Event" of 50 means that an inefficiency of 50 out of 1700 (the number of modules in Layer 2) occurred for a few seconds. The excess shown by the Layer 2 links with respect to the others is probably due to bandwidth limitation.
These tests will be performed before the boards are installed in the experiment. The firm has been hired to produce 15 cards, identical to revision C, at the end of August 2013, fulfilling the production requirements for the IBL. Delivery is expected by mid-October.

Review on upgrade for read-out electronics of Layer 1 and Layer 2
With the restart of the LHC (2014) we expect a higher luminosity, which will increase even more in subsequent years. As a consequence, we foresee that the Pixel readout links of the current detector will suffer from bandwidth limitations. The required link bandwidth is proportional to the product of the occupancy and the trigger rate of the front-end devices. The former is a function of the luminosity and can be extrapolated using the experience gained from previous years of data acquisition. Preliminary measurements show that, even with an average number of expected pile-up events µ ≈ 50 (a reasonable estimate for the luminosity, energy and bunch spacing expected by 2015) and a trigger rate of 100 kHz, the estimated average link occupancy for Layer 2 is about 90%.
With greater pile-up (when the luminosity exceeds 2 × 10³⁴ cm⁻² s⁻¹) even Layer 1 will run into trouble. Moreover, in the last run, Layer 2 already showed several synchronization errors, probably due to bandwidth limitation, even though the average link occupancy was about 50% (figure 4).
The actions that could be taken are different for Layer 1 and Layer 2. Layer 2 is read out with one link per module at 40 Mb/s. Increasing the bandwidth to 80 Mb/s is a viable solution; it would require producing more BOC opto-electrical plug-ins as well as more BOC-ROD pairs (to support double the bandwidth) and rearranging the link cabling. Since the IBL off-detector electronics already manage BOC-ROD transmissions at 80 Mb/s (section 4), it would be straightforward to adopt the IBL cards. In that case only two minor modifications would be needed: new firmware and new custom RX opto-electrical BOC plug-ins. No change in connectivity and fiber routing is foreseen.
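The effect of doubling the Layer 2 link rate can be estimated from the occupancy figure quoted above. The sketch below is a back-of-the-envelope model under stated assumptions: it treats the link occupancy as the average number of bits per triggered event multiplied by the trigger rate and divided by the link rate, and it derives the bits per event from the quoted 90% rather than taking it from a measurement. Protocol overhead and event-size fluctuations are ignored, and the function names are illustrative.

```python
def link_occupancy(bits_per_event, trigger_rate_hz, link_rate_bps):
    """Average fraction of the link bandwidth in use."""
    return bits_per_event * trigger_rate_hz / link_rate_bps


def implied_bits_per_event(occupancy, trigger_rate_hz, link_rate_bps):
    """Average event size that produces the given occupancy."""
    return occupancy * link_rate_bps / trigger_rate_hz


TRIGGER_RATE_HZ = 100e3   # 100 kHz (from the text)
LAYER2_LINK_BPS = 40e6    # current Layer 2 link rate (from the text)
UPGRADED_LINK_BPS = 80e6  # proposed Layer 2 link rate (from the text)

# Event size implied by the quoted 90% occupancy (derived, not measured).
bits = implied_bits_per_event(0.90, TRIGGER_RATE_HZ, LAYER2_LINK_BPS)

# Occupancy of the same traffic on the faster link.
upgraded = link_occupancy(bits, TRIGGER_RATE_HZ, UPGRADED_LINK_BPS)

print(f"implied average event size: {bits:.0f} bits per module")
print(f"occupancy at 80 Mb/s: {upgraded:.0%}")
```

With these inputs the implied event size is 360 bits per module and the occupancy at 80 Mb/s drops to 45%, below the roughly 50% level at which synchronization errors were already observed, so the margin remains modest.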
Layer 1 is already able to manage read-out at 80 Mb/s; therefore, installing a second link per module can increase the bandwidth. As in the previous case, the adoption of IBL cards would require both new firmware and BOC plug-ins, with minor modifications of the connections.
The adoption of IBL ROD-BOC cards would also bring further benefits: because of the standardized design, it would provide a uniform board for most of the Pixel read-out system, as well as common spares for all sub-detectors. No modifications will be needed to the ROD design, so production can start immediately and the boards will be ready by the LHC restart. 26 RODs will be needed for Layer 2 and 38 for Layer 1. No official decisions on board deployment have been taken yet, but discussions are ongoing.