A potent approach for the development of an FPGA-based DAQ system for HEP experiments

With ever-increasing particle beam energies and interaction rates at present and future accelerator facilities, modern High Energy Physics (HEP) experiments demand robust Data Acquisition (DAQ) schemes that perform in harsh radiation environments and handle high data volumes. The scheme must be flexible enough to adapt to the demands of future detector and electronics upgrades while keeping cost in mind. To address these challenges, the present work discusses an efficient DAQ scheme for error-resilient, high-speed data communication on commercially available state-of-the-art FPGAs with optical links. The scheme utilises the GigaBit Transceiver (GBT) protocol to establish a radiation-tolerant communication link between the on-detector front-end electronics, situated in the harsh radiation environment, and the back-end Data Processing Unit (DPU) placed in a low-radiation zone. The acquired data are reconstructed in the DPU, which reduces the data volume significantly, and then transmitted to the computing farms through high-speed optical links using 10 Gigabit Ethernet (10GbE). In this study, we focus on the implementation and testing of the GBT protocol and 10GbE links on an Intel FPGA. Results of the measurements of resource utilisation, critical path delays, signal integrity, eye diagrams and Bit Error Rate (BER), which are indicators of efficient system performance, are presented.


Introduction
Nuclear and particle physics experiments at high energies, often referred to as High Energy Physics (HEP) experiments, study the constituents of matter and their fundamental interactions. The evolution of our knowledge of the fundamental particles, their interactions and their connections to the early Universe has gone hand in hand with the evolution of the beam energies available at particle accelerators. The increase in collision energy and beam interaction rates demands sophisticated, high-tech detectors, electronics and data acquisition (DAQ) systems. In addition, the radiation levels in the proximity of the detectors have also been growing, which calls for radiation-tolerant systems. The readout electronics in the high-radiation area are highly prone to damage from total dose, single event upsets and non-ionizing energy loss [1], depending on the type of radiation. Traditional DAQ systems [2,3] of the last century could handle only low data rates and offered little protection against multi-bit upsets in radiation zones. An efficient DAQ system today must be able to cope with the increase in data volume by acquiring data at a high rate and must recover from data errors caused by multi-bit upsets in radiation environments.
Modern DAQ for HEP and nuclear physics experiments is the result of a continuous evolution [4]. An overview of this journey and the different methodologies adopted in the field of high-speed DAQ is summarized in table 1. In the two decades from 1960 to 1980, DAQ needs were addressed by custom-designed readouts tailored to the characteristics of individual experiments. The introduction of the Nuclear Instrumentation Module (NIM) standard and of a modular computer-controlled bus named Computer Automated Measurement and Control (CAMAC) brought standardisation to the front end and back end respectively; however, this approach lacked parallelism and was limited in data rate and channel count. In the decades from 1980 to 2000, the FASTBUS standard supported parallelism, but the advent of microprocessors led to the Versa Module Europa (VME) standard and the VME Inter-Crate bus specification (VIC). In the current century, point-to-point high-speed links have evolved, and the specifications for Ethernet and the Peripheral Component Interconnect Express (PCIe) protocol are continuously being expanded. With the advent of highly dense, state-of-the-art commercial Field Programmable Gate Arrays (FPGAs), we are heading towards data rates of Terabytes per second with high channel counts and on-board local processing. One example of the demand for modern DAQ systems comes from the experiments at CERN's Large Hadron Collider (LHC), the world's largest and most powerful particle accelerator, operating since the year

JINST 12 T10010
2009. The first phase of data taking is over and the second phase will be completed in 2018. The LHC and the HEP accelerator experiments aim for stepwise upgrades to fully extract their scientific potential and extend their physics reach [5]. With the proposed upgrades in the coming years, the LHC beam energies will increase and the beam luminosity will progressively ramp up to six times its current design value of 1 × 10^34 cm^−2 s^−1. This will increase the interaction rates, leading to a dramatic increase in channel occupancy, data rate and data volume [6].
In this manuscript, we address the challenges of DAQ for large HEP experiments. The key issues in the design of DAQ for HEP experiments are high-data-rate communication with error resiliency in the harsh radiation environment, quick upgrades, easy reconfigurability and portability to other platforms. A Field Programmable Gate Array (FPGA) based readout scheme is proposed that offers a high-speed, fault-tolerant readout architecture able to perform in a harsh radiation environment, yet flexible enough to keep up with upgrades and to be reconfigured instantly. The uniqueness of the proposed scheme is that the readout can be implemented using commercially available, non-radiation-hardened [7] state-of-the-art FPGAs, which offer high processing power in comparison to radiation-hard FPGAs. The approach is based on the radiation-tolerant 4.8 Gbps GBT [8-10] optical link and the 10GbE [11] link, combined in a modular fashion. Functional testing of the links is presented. FPGAs are chosen for rapid prototyping as they are field-reconfigurable with large resource pools; they allow a high degree of parallelism and pipelining, thereby increasing the logic computation speed and minimizing the latency involved.
The paper is organised as follows. The details of the readout scheme, its internal architecture, features, constituents and advantages over the conventional approach are explored in section II. The architecture of the readout links with their different interfaces is discussed in section III. Section IV illustrates the test setup for the implementation of the optical interfaces on the FPGA. Performance evaluation and functional tests of the different protocols are described in section V, along with the analysis results and discussions. The paper concludes with a summary in section VI.

DAQ architecture
The most demanding requirements for DAQ in HEP experiments involve error resilience, high-data-rate handling, compact hardware with reusable modules for portability and quick upgrades, and efficient data aggregation [12,13]. Motivated by these requirements, a simplified hierarchical readout chain for HEP experiments is proposed, as shown in figure 1. In this scheme, the readout system is broadly divided into two parts: the Front End Electronics (FEE) and the Data Processing Unit (DPU).
The FEE is located in the radiation zone in close proximity to the detector, requiring custom-built radiation-hard electronics. At the first stage, the FEE receives the data from the detectors. It amplifies, integrates and shapes the weak sensor signals over a given period, and provides robust signals to be transmitted from the detector. The FEE consists of a Front End Module (FEM) and an Optical Conversion Module (OCM) [14,15]. The design of each FEM is unique to the requirements of the individual detector [16]. In general, a FEM consists of a charge-sensitive preamplifier, buffer, sequencer and Analog to Digital Converter (ADC), along with other detector-specific components. The FEM operates directly on the analog (charge) signals produced by the beam interactions in the particle detectors. An event is qualified by the beam interactions and, in certain instances, by the interaction trigger. When an event occurs, the FEM sends the digitized detector-specific data to the Optical Conversion Module

(OCM) through differential Electrical Links (E-links) [17]. The OCM converts the digitized data to an optical signal wrapped in the link format [8] and sends it to the DPU over an optical fibre. Optical conversion is needed to communicate high-speed data over long distances with less channel noise and lower power consumption.
The DPU aggregates the data from a large number of high-bandwidth detector-side optical links onto even higher-bandwidth data links on the server side. This increases the throughput and also optimizes the system-level cost. The DPU hardware is identical for every detector. However, the detector-specific functionalities, such as the number of optical links to be handled and the firmware, require the DPU to be implemented as custom-designed electronics boards with programmable functionalities based on FPGA technology. The digitized data sent to the DPU are multiplexed, processed and formatted according to the detector specifications before being forwarded to the back-end computing nodes.
The physical location of the DPU in the readout chain is one of the major factors affecting the selection and design of the hardware. The DPU can be located either near the detectors, as in the conventional approach, or far from the experimental site in a controlled radiation zone. The counting room, far from the radiation zone, is chosen as the preferred location for the DPU. The advantages of the proposed approach over the conventional one, evaluated against the critical design parameters and their implications for DPU technology, the available ecosystem and ease of maintenance, are listed in table 2. Considering all of these, the layout for the readout scheme is adopted as shown in figure 1.
Data transfer from the detectors to the DAQ with high reliability and fixed latency [18] is a crucial requirement for HEP experiments. The interfaces from the FEE to the DPU and from the DPU to the back-end computing node are marked as Link-1 and Link-2 respectively in figure 1. The various commercially available protocols cannot be used for Link-1, as they lack the robust error-correcting codes needed to protect against data upsets caused by the harsh radiation in the cavern. Hence the 4.8 Gbps radiation-tolerant GBT protocol developed at CERN, a bidirectional, error-resilient optical link with fixed-latency support, is used as Link-1. The 4.8 Gigabit/sec rate of the GBT protocol is composed of 40 MHz × 120 bits: the 40 MHz frequency is derived from the LHC bunch spacing of 25 ns [18], and the 120-bit frame width is a technology parameter [19]. If the FPGA transceiver reference clock is fixed at 40 MHz, then the minimum input data width of the transceiver is 120 bits, so the GBT protocol with a link rate of 4.8 Gbps is chosen. Details of the GBT protocol, its implementation and its testing on FPGA are discussed in sections 3, 4.1 and 5.1 respectively. Link-2, from the DPU to the back-end processor, requires a standard interface with large bandwidth and multi-channel support that guarantees transmission capability. A custom-designed or commercially available link could be used.
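The link-rate arithmetic above can be checked with a few lines of Python; this is only a sanity-check sketch, with all constants taken directly from the text.

```python
# GBT link-rate arithmetic: the 40 MHz LHC bunch-crossing clock times
# the 120-bit GBT frame width gives the 4.8 Gbps line rate.
frame_clock_hz = 40_000_000   # derived from the 25 ns LHC bunch spacing
frame_bits = 120              # GBT frame width (technology parameter)

line_rate_bps = frame_clock_hz * frame_bits
print(line_rate_bps)  # 4800000000, i.e. 4.8 Gbps

bunch_spacing_ns = 1e9 / frame_clock_hz
print(bunch_spacing_ns)  # 25.0 ns
```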
Since the DPU will be connected to a commercial DAQ server, a custom-defined protocol for the high-speed link would be difficult to maintain, with poor future support. Hence, different commercial off-the-shelf options with a large ecosystem, ample solutions and reasonable cost were explored. The two latest promising technology options for the high-speed Link-2 interface are the PCIe protocol [20] and the 10GbE protocol [21]. However, each experiment comes with its own distinct set of requirements, so it is hard to adopt a ready-made commercial solution. The best option is therefore to take a standard high-speed protocol and adapt it to the requirements of HEP. The most tangible options mentioned above are compared against design requirements such as form factor, legacy support, ease of upgrade, flexibility and cost in table 3. Although some designers have migrated to PCIe [20], 10GbE offers future-proof solutions; hence the 10GbE standard interface is chosen for the high-speed link to the computing system, as summarized in table 3. 10GbE can be optimized for detector-specific data, and Quality-of-Service is provided in the higher layers of the Open Systems Interconnection (OSI) model [22].
The selection of the FPGA family for the DPU hardware and DAQ firmware development is constrained by the availability of logic resources and High Speed Serial Interface (HSSI) Serializer-Deserializer (SerDes) blocks on the FPGA. Different non-radiation-hardened FPGA families are compared against crucial parameters such as the available logic resources, transceivers, Phase-Locked Loops (PLLs) and market availability in table 4. The Intel Stratix-V GX FPGA [23] has been chosen as it provides sufficient resources for the prototype tests.

Readout links
The architectures of Link-1 and Link-2 are discussed in this section. The radiation-tolerant GBT optical link is used as Link-1; it is the front-end interface for digital data transfer from the on-detector electronics to the DPU. Data are multiplexed, processed and reformatted in the DPU according to the detector specifications and sent to the processor over the back-end interface, Link-2, using the 10GbE protocol.

Front end interface -link-1: GBT
The GBT link is the interface between the FEE and the DPU. It carries data, timing and control distribution merged onto a single data channel. The GBT transmission is an asynchronous serial communication composed of a GBT transmitter, a Multi-Gigabit Transceiver (MGT) [19] and a GBT receiver, as shown in figure 2. Pattern generators and checkers are used for testing purposes only; they are replaced with First In First Out (FIFO) buffers for the buffered FEE data. In the transmitter, the scrambler maintains DC balance for accurate clock recovery without additional overhead by reducing the occurrence of long sequences of consecutive 1s or 0s.
The Reed-Solomon (RS) encoder, as shown in figure 3, utilizes two interleaved RS(15, 11) encoded words, each capable of correcting a double symbol error. The interleaving operation increases the error correction capability up to 4 symbols, each symbol being 4 bits wide. The whole process increases the code correction capability without any additional overhead. The Gear Box, shown in figure 2, translates the frequency by modifying the data bus width for clock domain crossing. It consists of a dual-port RAM and breaks the 120-bit frame down into three words of 40 bits each. In the transmit chain, the input of the Gear Box is 120 bits at 40 MHz and the output is 40 bits at 120 MHz, keeping the data rate fixed at 4.8 Gbps. The data frame is sent from the GBT transmitter to a high-speed Multi-Gigabit Transceiver block. The GBT receiver performs descrambling, decoding and deinterleaving. The Frame Aligner block [8] performs header detection and locking for frame synchronization, using an efficient pattern search algorithm to maintain synchronization between the transmitted and the received data. A custom-developed radiation-hardened GBT chipset [15], consisting of the GBT ASIC and other components, is used as the OCM in this scheme. Data from the detector are framed into the GBT standard using the GBT chipset and transmitted to the DPU via the serial optical link. Radiation hardness is required near the detectors but not for the DPU located away from the radiation zone [24]; this is exploited to realise the GBT functionality on non-radiation-hardened FPGAs. The GBT-FPGA logic core firmware [8] is implemented on the FPGA-based DPU. It mimics the behaviour of the GBT ASIC on the FPGA, enabling the DPU to receive the GBT datagrams and to transmit the control and timing signals from the control room to the detectors. The details of the GBT protocol standard are summarized in figure 4a.
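The width conversion performed by the Gear Box can be modelled in a few lines of Python. This is an illustrative sketch only: the word ordering (most significant 40 bits first) is an assumption for illustration, not taken from the GBT specification.

```python
def gearbox_tx(frame: int) -> list[int]:
    """Split a 120-bit GBT frame into three 40-bit words, modelling the
    transmit Gear Box's 120 bit @ 40 MHz -> 40 bit @ 120 MHz conversion.
    Word order (MSB first) is an assumption for illustration."""
    assert 0 <= frame < (1 << 120)
    mask = (1 << 40) - 1
    return [(frame >> shift) & mask for shift in (80, 40, 0)]

def gearbox_rx(words: list[int]) -> int:
    """Reassemble the 120-bit frame on the receive side."""
    frame = 0
    for w in words:
        frame = (frame << 40) | w
    return frame

# Round trip: the data rate is unchanged (120 * 40e6 == 40 * 120e6),
# only the bus width and clock change.
frame = 0x0123456789ABCDEF0123456789  # arbitrary test pattern
assert gearbox_rx(gearbox_tx(frame)) == frame
```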

Back end interface -link-2: 10GbE
The 10GbE link acts as the interface between the DPU and the back-end servers. The 10GbE protocol stack consists of a Physical layer and a Data link layer according to the OSI model. The Data link layer is formed of the Media Access Control (MAC) and Logical Link Control (LLC) sublayers, as shown in figure 4b.
The detector-specific data payload is submitted to the MAC layer [11]. The MAC layer initializes, controls and manages the peer-to-peer connection to prevent transmission failures due to data collisions. It acts as a bridge between the Physical layer and the Data link layer. The interconnection between the MAC layer and the Physical Layer (PHY) is a 72-bit-wide 10-Gigabit Media Independent Interface (XGMII) [11]. The parallel data path of the XGMII and the serial data stream of the MAC are mapped onto each other by the Reconciliation Sublayer (RCS). The 10GbE MAC IP core and the 10G-Base-R Physical layer (PHY) IP core [25] from Intel are used in this scheme. The internal block diagrams of the two IP cores are shown in figure 5. Both the Physical Coding Sublayer (PCS) and the Physical Medium Attachment (PMA) are included in the PHY IP core.

Test setup for the interfaces
The development of the scheme shown in figure 1 involves the integration of the constituent components. The aim of the setup is to test the individual modules; this is important, as the performance of the scheme depends on the interactions between the different constituents. The setup focuses on testing the interface links on the FPGA. It verifies the compatibility of the components, i.e. their ability to transfer valid data at the correct instant across the interfaces. The transceiver test forms an important part, as the transceiver is the hardware interface that receives the data from the OCM into the DPU and transmits them to the back-end processor.
The two interface links used in the readout scheme, the GBT link and the 10GbE link, are implemented and functionally tested on the Intel FPGA. The GBT-FPGA logic core was functionally simulated to understand the behaviour of each functional block and then implemented in the FPGA [26]. The 10GbE link was implemented using Intel's system integration tool Qsys [27] to adopt a modular approach. The entire test setup is segregated into multiple test models to allow easy step-by-step debugging and rapid fault finding of the constituent modules in case of faulty behaviour or non-functioning of the scheme.

Implementation of link-1: GBT on FPGA
The Intel Stratix-V GX FPGA development board [23] is used for the implementation and testing of the GBT-FPGA logic core firmware. The FPGA fabric clock of 156.25 MHz is driven by an on-board oscillator, while the 120 MHz clock driving the MGT is fed by an external jitter-cleaned clock source, a Texas Instruments CDCE62005EVM [28]. The GBT link can be operated in Standard (Std) mode or Latency-optimized (Latopt) mode [29,30]. In the Std mode of GBT operation, an elastic buffer is used between the PCS and PMA blocks to compensate for the phase difference of the clocks driving the two blocks, as shown in figure 6. The elastic buffer adds uncertainty to the latency. To alleviate this effect, the elastic buffer is bypassed and an external phase aligner block is used to align the phases of the clocks between PCS and PMA. This achieves the consistent delay of the Latopt mode of GBT operation, which is needed for time-critical data transfer. A detailed study of the latency, in terms of the clock cycles utilized in the GBT protocol for data processing and transmission, is carried out for all possible combinations of transmit-side and receive-side modes of operation [31] and discussed in section 5.1. The latency is measured by concurrently subtracting the transmitted and received values of the counter from the pattern generator in the loopback condition, as shown in figure 7. The utilization of FPGA logic resources is an important parameter for the choice of FPGA on the DPU, where a large number of links needs to be handled; it is estimated using the implementation report of the Intel Quartus-II tool (the FPGA design software by Intel-Altera).
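The counter-subtraction latency measurement of figure 7 amounts to simple arithmetic, sketched below. The counter values are illustrative; the 25 ns clock period and the 14-cycle (350 ns) figure for the Tx Latopt / Rx Std mode are the numbers quoted later in the text.

```python
CLOCK_PERIOD_NS = 25  # 1 clock cycle = 25 ns (40 MHz frame clock)

def loopback_latency(tx_count: int, rx_count: int) -> tuple[int, int]:
    """Latency from the difference between the counter value currently
    being transmitted and the value just received in loopback."""
    cycles = tx_count - rx_count
    return cycles, cycles * CLOCK_PERIOD_NS

# Illustrative counter values reproducing the 14-cycle (350 ns) result
# reported for the Tx Latopt / Rx Std mode of GBT operation:
print(loopback_latency(tx_count=1014, rx_count=1000))  # (14, 350)
```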

An eye diagram [32,33] is used to indicate the quality of signals in high-speed digital transmissions. The channel performance of the transceiver link was studied by interpreting the eye diagram pattern on a LeCroy oscilloscope. The BER analysis is done using the pattern generator and checker. The outcomes of the measurements are discussed in section 5.1.

Implementation of link-2: 10GbE on FPGA
A test setup is developed for the 10GbE implementation on FPGA utilizing Intel IP cores, along with the associated firmware and embedded software. It is implemented in Quartus-II using the Qsys system integration tool for the quick generation of the interconnect logic, and the functionality is verified using Intel's ModelSim simulation software. The implementation includes two models. Model-1 focuses on an efficient method of high-speed data transmission with minimum processor overload; it performs the loopback tests. Model-2 focuses on optical link testing; it presents an effective approach to the challenges associated with the testing, performance monitoring and parameter tuning of optical interconnects in FPGA-based systems. The architecture of the assembled system instantiated in the FPGA and the interconnections between the different sub-blocks are shown in figure 8. The model consists of the 10GbE Ethernet MAC IP core, a NIOS-II processor IP, Scatter-Gather Direct Memory Access (SGDMA) IPs, a JTAG UART [25], an on-chip memory, two on-chip dual-clock FIFOs and a standard XGMII interface on the network side, and is configured to include the 10G-Base-R PHY layer IP for optical communication. The implementation methodology is based on the resource-intensive soft-core NIOS-II processor [34]. Its soft-core nature allows the designer to specify and generate custom software for the NIOS-II core. The NIOS-II acts as the control unit in the loopback test: it coordinates the design and provides overall system control. In this design, the SGDMA controller core is used for high-speed data transfer with minimal processor hold-up [35]. It chains transfers to non-contiguous memory using a table of descriptors held in memory. The SGDMA improves the overall system performance compared to plain DMA cores. The on-chip memory stores the executable program and data, as well as the descriptors for the SGDMA controllers.
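The Model-1 loopback flow can be summarized with a small host-side model. Everything here is a simplification for illustration: the buffer size, function names, and the pass-through standing in for the TX-SGDMA → MAC → PHY (loopback) → RX-SGDMA chain are assumptions, not the actual NIOS-II firmware.

```python
BUFFER_LEN = 1024

def sgdma_loopback(src: list[int]) -> list[int]:
    """Stand-in for the TX-SGDMA -> MAC -> PHY (internal loopback)
    -> RX-SGDMA data path; a perfect loopback returns the data intact."""
    return list(src)

# Incrementing test pattern, as used by the NIOS-II application program.
tx_buffer = [i & 0xFF for i in range(BUFFER_LEN)]
rx_buffer = sgdma_loopback(tx_buffer)

# Validate the received data against the transmitted data.
errors = sum(a != b for a, b in zip(tx_buffer, rx_buffer))
print(errors)  # 0 for an error-free loopback
```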
Dual-clock FIFO buffers are used between the SGDMAs and the 10GbE MAC IP core for clock domain crossing. The Avalon Memory-Mapped (Avalon-MM), Avalon Streaming (Avalon-ST) and Avalon conduit buses [36] are used as interface buses. A brief snapshot of the bus signalling is shown in figure 9. Avalon-MM interfaces implement the address-based read and write interfaces for the source and sink SGDMAs. The Avalon-ST interface on the client side is used to configure the 10GbE MAC IP. Avalon-ST supports a unidirectional flow of data for components that need low-latency, high-throughput point-to-point data transfer, with data bursting and interleaving options. All read/write signals and data transfers are synchronized to an associated clock interface. The control lines are implemented using Avalon-MM bus lines, and the data stream lines are implemented using the Avalon-ST bus. The loopback test uses the transceiver channels available on the Intel development board; the 10G-Base-R PHY IP is operated in the internal loopback mode. A software-based loopback test setup is developed using the NIOS-II Software Builder Tool (SBT) [37]. The NIOS-II processor runs the application program that handles the data transmission. It coordinates the design by allocating the memory to store the transmit and receive data buffers and the descriptor pairs. The test data are incremented in the transmit buffer; the program populates the descriptor pairs, writes the first descriptor pair to the SGDMAs, thereby starting the transfer, and waits until both SGDMAs complete the transfer of all the data buffers. It also validates the received data against the transmitted data. The results are discussed in section 5.2.1.
Figure 11. Simplified digital communication optical link.

Model-2: Hardware platform and transceiver test
The optical link architecture for digital communication in the FPGA is illustrated in figure 11. The data path consists of the FPGA transceiver, comprising PCS and PMA, an optical transmitter (laser diode circuitry) and receiver (PIN diode circuitry), and a multimode optical fibre [38]. The FPGA is connected to the transmission channel through the PMA block, which generates the required clocks and performs serialization/deserialization. The digital processing between the PMA and the FPGA core is performed by the PCS block. The PCS performs byte serialization/deserialization, byte ordering, rate matching and 64B/66B encoding/decoding for a reliable digital data channel. However, we restrict the scope of the present work to the performance measurements of physical layer parameters, leaving aside the issues of the PCS sublayer. Tuning of the transceiver parameters is required for channel conditioning, which affects the signal integrity, to achieve the lowest possible bit error rate. The major challenge lies in the fact that the various components of the link have different parameter settings spanning a wide optimization space, and high statistics are required to establish a low BER probability at a given confidence level [39].
The Transceiver Toolkit (TTK) [32] from Intel is used to validate the signal integrity of the transceiver link and to access and tune the transceiver settings in real time. The Auto-Sweep test is performed to identify the best PMA parameter settings [40]. Transceiver parameters such as Voltage Output Differential (VOD), pre-emphasis pre-tap, 2nd pre-tap, 1st post-tap, 2nd post-tap, equalization, DC gain and Variable Gain Amplifier (VGA) [41] are scanned and tuned for optimal performance by the Auto-Sweep test in the TTK. It also reports the signal quality of the received data in the form of an eye diagram, which helps in understanding the signal degradation mechanisms. The eye diagram serves as an indicator of the link performance and is used as the target parameter for the link optimization. The test setup for the BER measurements and for tuning the transceiver parameters of the high-speed optical link is shown in figure 12. It consists of an integrated FPGA system with embedded transceivers, a Small Form-factor Pluggable (SFP+) optical transceiver module and a multi-mode fibre with connectors. First, the light output from the optical transmitter is coupled to the fibre and looped back without any optical attenuator. With this setup the transceiver parameters are tuned using the TTK Auto-Sweep test, which yields the optimum values of the transceiver PMA parameters, known as the solution space [40], at the targeted BER for the maximum height/width of the eye diagram. With these optimized solution-space PMA settings, a manually controlled in-line Variable Optical Attenuator (VOA) is introduced into the fibre loopback to induce optical power degradation. The optical power after attenuation is measured using a handheld optical power meter with an insertion loss of < 0.3 dB in the 850 nm range of operation. The output from the attenuator is looped back as shown in figure 12.
A Pseudo-Random Binary Sequence (PRBS) is transmitted across the transceiver link, and the BER is evaluated with the pattern checker. The BER at different attenuation levels was measured. This test characterizes the sensitivity of the receiver, i.e. the minimum optical power required to achieve a specified BER in the system. Details are discussed in section 5.2.2.
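A PRBS source of the kind used in such tests is just a linear-feedback shift register. The sketch below uses the short PRBS-7 polynomial (x^7 + x^6 + 1) for brevity; hardware link tests typically use longer sequences such as PRBS-31, and the injected error is purely illustrative.

```python
def prbs7(seed: int = 0x7F, nbits: int = 127):
    """Yield one period of a PRBS-7 sequence from a 7-bit LFSR
    (polynomial x^7 + x^6 + 1, maximal length 2^7 - 1 = 127)."""
    state = seed & 0x7F
    for _ in range(nbits):
        newbit = ((state >> 6) ^ (state >> 5)) & 1  # taps at bits 7 and 6
        state = ((state << 1) | newbit) & 0x7F
        yield newbit

tx_bits = list(prbs7())

# The pattern checker compares received bits against the expected
# sequence; here one error is injected by hand to show the BER count.
rx_bits = tx_bits.copy()
rx_bits[10] ^= 1
ber = sum(a != b for a, b in zip(tx_bits, rx_bits)) / len(tx_bits)
print(ber)  # 1/127, about 7.9e-3
```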

Performance evaluation
Test results and the performance analysis for the implementation of the two interfacing links on FPGA are discussed in this section.

Link-1: GBT protocol on FPGA
The GBT-FPGA logic core firmware reference design is implemented on the FPGA. Resource estimation is necessary to approximate the number of links that could be packed onto an FPGA and to get an idea of the hardware resource utilization. It is also important that the modules consume the least possible power, so that the power consumption of the computation processes remains within the power rating margin and overheating is prevented. The resource utilization is shown in table 5, and the power consumption, obtained using the Intel internal power monitoring tool, is summarized in table 6. The latency of the GBT protocol is a crucial parameter: data transmission from the detector to the DPU has to be time-synchronized, and a fixed latency is required for applications in the trigger and timing system. Latency occurs in both the transmitting and receiving directions, depending on the media and path involved. The total path L1 in the loopback mode, shown in figure 7, is given by equation (5.1). It consists of the GBT transmitter (GBT Tx), the Multi-Gigabit transmitter (MGT Tx), the Multi-Gigabit receiver (MGT Rx) and the GBT receiver (GBT Rx).
The numbers of clock cycles utilized in the GBT transmit and receive sections and in the MGT transceiver section are estimated separately. The MGT transceiver is removed from the GBT protocol, and the GBT transmitter is coupled directly to the receiver section at the firmware stage. The two paths L2 and L3 are given by equations (5.2) and (5.3).
The clock cycles utilized are observed using the simulation models. The latency in the MGT section is calculated as the difference of the L1 and L2 path delays. Measurements within the FPGA always depend on the data rate, hence the delay is measured in clock cycles (1 clock cycle = 25 ns). The transmission latency is measured for all possible combinations of GBT modes of operation and tabulated in table 7. This information is useful for designers optimizing the data acquisition firmware. The signal quality of the GBT protocol operating at 4.8 Gbps is measured using a LeCroy serial data analyzer. The eye diagram and the details of the jitter measurements are shown in figure 13. The eye width/height is 176.8 ps/373 mV at a BER of 5.525 × 10^−12, and the measured total jitter is only 51.148 ps. These results are acceptable and form a useful baseline for further studies.
The BER measurements for the GBT protocol with the two encoding schemes shown in figure 3 are plotted in figure 14. An exponential fit to the data is performed. The receiver sensitivity margin of 2.1 dBm, as given in equation (5.4), is in close agreement with the measurement conducted for the GBT protocol implemented on a Xilinx FPGA [42], which is around 2.5 dBm.

Link-2: 10GbE protocol on FPGA
This subsection covers the two models. Model-1 presents the implementation results of 10GbE on the FPGA and the analysis in terms of resource utilization, the stages of frequency translation, the data transmission format and the latency involved. Model-2 presents the transceiver tests: tuning to achieve the solution space, the spider chart, the eye diagram and the BER measurement as a function of optical power.

Model-1 results
The test setup shown in figure 8 is implemented on the FPGA and the logic resources utilized are summarized. The design follows the IEEE 802.3ae Ethernet standard [11] when transmitting data frames on the XGMII interface. The 10GbE MAC transmitter performs an endian conversion [43]: the frames received on the Avalon-ST interface from the user follow the big-endian format, while transmission on the XGMII interface follows the little-endian format, with frames transmitted from the least significant byte first, as shown in figure 16. In the receive data path, the 10GbE MAC receiver decodes the data lanes coming through the XGMII. For all valid frames, the 10GbE MAC receiver removes the START, preamble, SFD and EFD bytes and ensures byte-wise frame alignment. The data transfer latency in terms of clock cycles is calculated for the user logic shown in figure 10 and summarized in table 9. Table 9. Latency estimation for data transfer (clock frequency = 156.25 MHz).
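Within one 64-bit word, the endian conversion described above amounts to a byte reversal: the same bytes leave the XGMII least-significant-byte first. The byte values below are arbitrary examples, not a real Ethernet frame.

```python
# One 64-bit word of frame data as received big-endian on the
# Avalon-ST user interface (illustrative byte values).
avalon_st_word = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])

# On the XGMII the MAC transmits starting from the least significant byte,
# so within the word the byte order is reversed.
xgmii_word = avalon_st_word[::-1]
print(xgmii_word.hex())  # 8877665544332211

# The receive path applies the inverse conversion.
assert xgmii_word[::-1] == avalon_st_word
```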

Model-2 results
Transceiver testing is done as discussed in section 4. A PRBS test pattern is used for optimizing the parameters, as it provides the most stressful boundary conditions with which to establish confidence in the operating margins of the design, as shown in figure 17. The Auto-Sweep test has been performed to scan for the best-performing case in terms of eye width/height at the targeted BER of 10^−12.
As indicated by the Auto-Sweep test, the solution space is plotted in the form of a spider chart, as shown in figure 18. The parameters are fixed and the eye diagram is captured at the best PMA settings; it is shown in figure 19 with an Eye Width (horizontal phase steps)/Eye Height (vertical steps) of 45/26. The BER at different attenuation levels of the optical transmitted power [44] is measured, as shown in figure 20. An optical transmitted power of around -11 dBm is required to achieve a BER of 10^−12 for the transceiver under test. An exponential fit through the data points yields an equation of the form BER(dB) = a × e^(b×Power(dBm)), where the coefficients a and b are -144.33 and 0.22887 respectively. The exponential fit is used because the BER is approximated by the complementary error function 'erfc' and the system noise is Gaussian in nature; on a logarithmic scale, this is approximated by an exponential. The goodness-of-fit statistics, namely the Sum of Squares due to Error (SSE), R-square, Adjusted R-square and Root Mean Squared Error (RMSE), are 11.71, 0.9698, 0.9686 and 0.484 respectively.
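The exponential fit described above can be reproduced with standard tools; a sketch using SciPy's `curve_fit`, where the data points are synthetic stand-ins generated from the reported coefficients rather than the measured values:

```python
import numpy as np
from scipy.optimize import curve_fit

def ber_model(power_dbm, a, b):
    # BER (in dB) modelled as a * exp(b * optical power in dBm)
    return a * np.exp(b * power_dbm)

# Synthetic stand-in points generated from the reported coefficients
power = np.array([-15.0, -14.0, -13.0, -12.0, -11.0])
ber_db = ber_model(power, -144.33, 0.22887)

(a_fit, b_fit), _ = curve_fit(ber_model, power, ber_db, p0=(-100.0, 0.2))

# Goodness-of-fit statistics analogous to those quoted in the text
residuals = ber_db - ber_model(power, a_fit, b_fit)
sse = float(np.sum(residuals ** 2))
r_square = 1.0 - sse / float(np.sum((ber_db - ber_db.mean()) ** 2))
rmse = float(np.sqrt(sse / len(power)))
```

On measured data the residuals would be non-zero, and the SSE, R-square and RMSE computed this way quantify the deviation of the points from the fitted curve.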

Discussion
The links of the scheme are implemented on an Intel FPGA, and the resource utilization and power consumption are measured. The latency calculation gives a measure of the clock cycles consumed by the data processing in the logic path and by the buffers (elastic or external phase aligner) in the transmission path; this information is a useful reference for designers optimizing data acquisition firmware. The latency for the Tx Latopt and Rx Std mode of GBT operation, the most utilized mode for fixed-latency operation, is found to be 14 clock cycles (350 ns). The Tx Latopt mode is necessary to send the timing information in a deterministic way, whereas on the receiver side the data arrive padded with a time stamp; the timing constraint is therefore relaxed on the receiver side, which allows the use of the Rx Std mode. The signal quality of the GBT protocol is measured using the eye diagram, with a BER of the order of 1 bit in 10^12 and jitter in the range of picoseconds only. The margin of receiver sensitivity is found to be 2.1 dBm for the two encoding schemes of GBT at the targeted BER of ~10^−12. The measurement of BER for the GBT protocol with respect to the optical power, shown in figure 14, cannot be pursued below a receiver sensitivity of -17 dBm due to the loss of the recovered clock; however, the plot can be extrapolated based on the standard complementary-error-function nature of the curve, assuming Gaussian noise.
The 10GbE link is implemented using the Qsys approach, and the three levels of frequency translation from the fabric frequency to the optical transmission are discussed. The endian conversion during data packet transmission for the protocol is shown in figure 16. The number of clock cycles required to transmit the data buffer through the system interconnect fabric is calculated.
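The latency figures quoted above can be cross-checked by a simple cycles-to-time conversion; a minimal sketch, assuming the 14-cycle GBT path is clocked at 40 MHz (25 ns per cycle, consistent with the quoted 350 ns) and the 10GbE user logic at 156.25 MHz:

```python
def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a latency expressed in clock cycles to nanoseconds."""
    return cycles * 1e3 / clock_mhz

gbt_latency_ns = cycles_to_ns(14, 40.0)   # Tx Latopt + Rx Std GBT path
xge_cycle_ns = cycles_to_ns(1, 156.25)    # one 10GbE user-clock cycle
```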
The transceiver is tuned for the high-speed link using the signal conditioning circuitry, as it forms the important hardware interface for data transmission from the OCM to the DPU at 4.8 Gbps and from the DPU to the DAQ server at 10 Gbps. The Autosweep test is performed using the Intel Transceiver Toolkit (TTK), and the multivariate data for the best case are displayed on the 2D spider chart shown in figure 18. The variation of BER at 10 Gbps as a function of the optical power down to -15.5 dBm, below which the receiver sensitivity is lost, is plotted in figure 20. The deviation of the data set from the exponential fit is due to various factors, for instance the opto-electronic conversion factor, gain, optical couplings, insertion losses and the accuracy of the instruments used.