Metrics and methods for TTC-PON system characterization

A new-generation FPGA-based Timing, Trigger and Control (TTC) system based on emerging Passive Optical Network (PON) technology is being investigated to potentially replace the existing off-detector TTC system used by the LHC experiments. The new system must deliver triggers and data with low, deterministic latency as well as a recovered bunch clock with picosecond-level jitter. This new topology offers major improvements over its predecessor: bi-directionality as well as higher capacity. This paper focuses on the figures of merit used to characterize the TTC-PON system both downstream and upstream, on the techniques used to extract them, and on the impact of these first results on optimizing the architecture.


Introduction
The Timing, Trigger and Control (TTC) system is a crucial system dedicated to the synchronization of experiment electronics to the LHC beam. Currently, it is a unidirectional network extensively deployed in all major detectors, distributing the LHC bunch clock and the level-1 trigger accept decision (L1A) as well as individually addressed or broadcast commands to the various detector partitions [1]. To match the needs for increased payload capacity and to provide bi-directionality, a feature absent from the legacy TTC, a new-generation TTC system is being investigated for off-detector use, based on PON technology (figure 1b). A PON is a bidirectional (but single-fibre) point-to-multipoint network architecture in which optical splitters are used to enable a master node or Optical Line Terminal (OLT) to communicate with typically 64 slave nodes or Optical Network Units (ONUs). It is based on mature devices, as the PON is nowadays the most successful solution worldwide for deploying FTTx networks [2]. A first TTC-PON demonstrator was built in 2010 during early investigations made at CERN, using commercial FPGAs and 1-Gigabit Ethernet PON transceivers. These first, very promising results motivated the work to explore the emerging XG-PON technology in order to better fit the user requirements in terms of latency and payload. With the aim of proposing first prototypes for 2015, the present phase of the TTC-PON project consists of exploring several types of PON technologies and architectures being developed

JINST 9 C01015
for commercial access networks. One or several potential solutions will then be identified and adjusted to experiment-specific TTC requirements such as bandwidth, clock recovery, upstream and downstream latency, as well as system feasibility, price, and compatibility with the legacy TTC. To fully evaluate and compare the performance of the existing and future TTC-PON system prototypes, a detailed set of characterization methods and criteria has been developed. The complete set consists of two groups: those that assess the functionality of the system, and those that relate to the various aspects of the enabling PON technology. The former cover latency, payload size, recovered clock phase/jitter, and error detection/correction capability. Those related to the technology include split ratio, dynamic range, upstream burst-mode sensitivity penalty, and CDR phase-acquisition time. These parameters are evaluated by measuring the Bit Error Rate (BER) and the Packet Loss Ratio (PLR, in the upstream direction only) while varying network parameters such as the split ratio, the relative power and phase of consecutive upstream packets, the duration of the silent period between packets, and the training-pattern duration.
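The two headline metrics can be written down compactly. The following is a minimal sketch (with hypothetical counter names, not the actual test-bench code) of how BER and PLR are accumulated as simple ratios over a test run:

```python
# Sketch of the two figures of merit used throughout this paper.
# Counter names are illustrative; they stand for values accumulated
# by the BER/PLR test engine during a measurement sweep.

def bit_error_rate(errored_bits: int, compared_bits: int) -> float:
    """BER: errored bits over total bits compared (locked packets only)."""
    return errored_bits / compared_bits if compared_bits else float("nan")

def packet_loss_ratio(lost_packets: int, sent_packets: int) -> float:
    """PLR: packets on which the checker never locked, over packets sent."""
    return lost_packets / sent_packets if sent_packets else float("nan")

# Example: 3 bit errors over 1e9 compared bits, 2 lost packets out of 1e6.
print(bit_error_rate(3, 10**9))      # 3e-09
print(packet_loss_ratio(2, 10**6))   # 2e-06
```

Lost packets contribute no compared bits to the BER, which is why the two quantities must be tracked separately.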
After a brief description of the architecture of the TTC system in section 2, section 3 presents the method used to characterize our PON-based TTC topology as well as the results obtained, with an emphasis on the upstream direction, where time-division multiplexing (TDM) is used to allow each ONU to transmit in turn. As a conclusion to this study, section 4 presents the impact of these results on further TTC-PON developments, especially in terms of protocol and architecture.

The current TTC system
The existing TTC architecture at the LHC is shown in figure 1a. It consists of a TTCex module which communicates with a number of TTCrx receiver application-specific integrated circuits (ASICs) via a passive optical tree. TTCex receives information from two channels activated by the trigger control system: channel A contains the level-1 trigger accept (L1A) decision and channel B carries general-purpose commands for the synchronization and calibration of the detector partitions. TTCex time-multiplexes the two channels, encodes them and uses the data to drive a bank of up to 10 Fabry-Perot lasers. Data are transmitted through an optical fibre about 100 m long and are distributed to a maximum of 32 TTCrxs via an optical fan-out. The downstream data rate is 40 Mb/s and the data are encoded in bi-phase mark format. TTCrx acts as an interface between the TTC system and the detector partitions. Its function is to recover the LHC clock with deterministic phase and distribute it to the front-end detector electronics. The clock is de-skewed to compensate for variable particle times of flight and cleaned before distribution. TTCrx also demultiplexes channels A and B and delivers the synchronization commands, the L1A trigger-accept decisions (with very low and fixed latency) and their associated bunch and event identification numbers to the front-end electronics. Although very efficient, this system suffers from bandwidth limitations. It also lacks bidirectionality, preventing users from sending feedback to the TCS (Trigger Control System). To tackle this problem, a separate electrical "busy/throttle" link delivers feedback on the status of the front-end readout buffers and of the data acquisition system to the trigger control system.
In fact, the 'busy' signals from all detector partitions are merged in the Fast Merging Modules (FMM) in the CMS case, or in the ROD Busy modules in the ATLAS case, so that only one signal per detector partition finally reaches the TCS. If a front-end buffer is about to overflow, a "warning" signal is issued and the TCS inhibits the L1A trigger accept until the occupancy of the buffers falls below a predefined threshold and a "ready" signal is issued.

TTC-PON setup
The bidirectional and point-to-multipoint architecture of passive optical networks theoretically allows building a TTC system without the drawbacks presented above. To prove it, a complete TTC-PON prototype has been developed using a Virtex-5 evaluation kit (figure 2) [3]. An equivalent TTCex instance has been realized by combining the Virtex-5 FPGA with a commercially available 1G-EPON OLT transceiver (OBL4333F by OESolutions), while two equivalent TTCrx instances (out of 8 available) have been realized using the same FPGA and 1G-EPON ONU transceivers (OBN3433F by OESolutions). As both the OLT and ONU entities use the same FPGA, a careful split of the clock domains was required: each entity has its own reference clock provided by a separate generator and is connected to its own 1G-EPON optical transceiver. The downstream direction (from OLT to ONUs) is based on an optical serial link running at 1.6 Gbps in the 1490 nm wavelength band, carrying triggers, trigger types, clocks and other data broadcast synchronously to all the branches of the network. In the upstream direction (from ONUs to OLT), however, the use of oversampling clock and data recovery (CDR) lowers the bit rate of the optical serial link to 0.8 Gbps. Moreover, the transmission medium is shared among the ONUs using time-division multiple access (TDMA) in the 1310 nm window, a much more challenging task requiring burst-mode transmitters in the ONUs and a burst-mode receiver in the OLT. Most of the tests presented below were automated using GPIB remote control and Tcl/Tk scripting. According to the specifications provided by the OLT and ONU manufacturer, the power budget allowed by such a PON system is about 28 dB. A bit error rate (BER) test was run to verify this value using a single ONU directly connected to the OLT. It was confirmed that this power budget was comfortably met by our system, as shown in figure 3.
The safe split ratio offered by this system was thus calculated to be 1:64, keeping a margin of 6 dB for various unexpected power losses and ageing of optical components.
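The 1:64 figure can be cross-checked from the power budget. The sketch below assumes an ideal 10·log10(N) splitter loss plus a typical ~0.5 dB excess loss per 1:2 splitting stage (an assumed value, not a measurement); the 28 dB budget and 6 dB margin are taken from the text:

```python
import math

# Rough split-ratio estimate from the optical power budget.
BUDGET_DB = 28.0         # OLT/ONU power budget from the transceiver specs
MARGIN_DB = 6.0          # reserve for ageing and unexpected power losses
EXCESS_PER_STAGE = 0.5   # assumed excess loss per 1:2 splitting stage, dB

def splitter_loss_db(n_ways: int) -> float:
    """Ideal splitting loss plus per-stage excess loss for a 1:n splitter."""
    stages = math.log2(n_ways)
    return 10 * math.log10(n_ways) + stages * EXCESS_PER_STAGE

usable = BUDGET_DB - MARGIN_DB   # 22 dB left for the splitter

# Largest power-of-two split that still fits in the usable budget.
n = 2
while splitter_loss_db(n * 2) <= usable:
    n *= 2
print(n)  # -> 64 with these assumptions, matching the 1:64 figure
```

A 1:64 splitter costs about 18.1 dB ideal plus ~3 dB excess, i.e. ~21 dB, which fits within the 22 dB left after the margin; doubling to 1:128 would not.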

Downstream payload
A first custom protocol was built in order to meet the needs of the LHC experiments and to serve as many ONUs as possible (64). With a bit rate of 1.6 Gbps and an 8b/10b encoding scheme (arbitrarily chosen), the available payload is 1.28 Gbps, largely sufficient to broadcast triggers, synchronous commands and configuration frames. Figure 4 shows a graphical representation of the downstream protocol for 64 ONUs. The transmitted frame contains the following encoded bytes [3]: "K", frame alignment and synchronization; "T" and "F", trigger/trigger type, i.e. the L1A trigger-accept decision, transmitted in real time; "D1" and "D2", commands (broadcast or individually addressed, depending on the first bit of D1); and "R", the address of the next ONU allowed to transmit upstream.
Such a protocol allows broadcasting 16 bits at 40 MHz for bunch-synchronous information (trigger, Event Counter Reset, Bunch Counter Reset, etc.) using the T and F fields, plus 15 other bits dedicated to broadcast or individually addressed data (fields D1 and D2). This results in a fixed bandwidth of (16 bits × 64)/1.625 µs ≈ 630 Mbps for bunch-synchronous data and of (15 bits × 64)/1.625 µs ≈ 590.8 Mbps to be shared between broadcast and unicast commands. Each ONU can benefit from a maximum of 9.23 Mbps of individually addressed data. Of course, these values are strictly related to the protocol and are subject to modification in the future.
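These bandwidth figures follow directly from the frame timing and can be reproduced in a few lines; the 1.625 µs cycle and the per-slot bit counts are taken from the protocol description above:

```python
# Downstream bandwidth arithmetic for the 64-ONU custom protocol.
FRAME_US = 1.625   # one full 64-slot downstream cycle, in microseconds
N_ONUS = 64

sync_mbps = 16 * N_ONUS / FRAME_US    # T+F fields, bunch-synchronous data
cmd_mbps = 15 * N_ONUS / FRAME_US     # D1+D2 fields, broadcast/unicast
per_onu_mbps = cmd_mbps / N_ONUS      # if every unicast slot targets one ONU

print(round(sync_mbps, 1))     # 630.2
print(round(cmd_mbps, 1))      # 590.8
print(round(per_onu_mbps, 2))  # 9.23
```

Note that bits/µs equals Mbps directly, which is why no extra unit conversion appears.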

Latency (downstream and upstream)
The latency measurements were performed using the setup described in figure 6. The attenuator, PON power meter and the 1:2 splitter shown in figure 2 were removed. A set of two flags was implemented for the downstream direction to emulate a trigger broadcast: flag 1 on the OLT side (trigger pulse generation right before the GTX-Tx) and flag 2 on the ONU side (output of the GTX-Rx after frame decoding). The downstream latency between flags 1 and 2 has been designed to be deterministic and was measured to be 10 BC (bunch-clock periods), including ∼1 BC due to the optical link. For the upstream direction, another set of two flags was implemented to emulate the transmission of the ONU's address: flag 3 on the ONU side and flag 4 on the OLT side. It is important to note that the upstream latency

Upstream path characterization
The TDMA technique used to arbitrate the multipoint-to-point communication from the ONUs to the OLT makes the characterization of the upstream line much more complex than that of a simple point-to-point transmission line: each ONU transmits in turn a burst of data, which is then multiplexed in time with the bursts of the other ONUs. To fully understand the quality of the transmission channel for each ONU independently, a burst-mode bit error rate (BBER) test was implemented. In this technique, parallel BBER tests run for each ONU, using a dedicated protocol in which each burst is made of a preamble used to train the OLT (p1 for ONU1 and p2 for ONU2), an 8b/10b comma and address field (K1 and K2, different for each ONU), and a burst of PRBS data bounded by a start of frame (SOF) and an end of frame (EOF), as can be seen in figure 7. As the BBER engine continuously locks and unlocks with each packet, it was also useful to gather statistics on the number of packets for which the BBER engine could not lock at all: the packet loss ratio (PLR) [4]. These packets do not contribute to the BBER itself and can hide a channel defect in certain particular cases. Besides requiring specific statistical tools for analysis, the various characteristics of each ONU (optical power, length of the line) force the optical receiver and the CDR of the OLT to cope with fluctuating power thresholds, gaps in the data frame and phase steps between bursts of data.
To understand the impact of all these factors on the BBER tests described above, we added them to the data frame presented in figure 7 as configurable parameters. One of them is the CID field (consecutive identical digits [4], basically the consecutive '0's constituting the silent period between two bursts), to which a phase step is added (∆φ in degrees, 360° = 1 bit clock period) modelling the phase change between two bursts. The output power of ONU2 was modulated by an optical attenuator (JDSU, MAP-200 mVOA-A2) to introduce a configurable dynamic range (α). Finally, the lengths of the preamble and of the data fields were made adjustable to analyse the impact of the protocol itself on the quality of the transmission. The following sections present the impact of the above-described parameters on the BBER and PLR performance of the TTC-PON system equipped with two ONUs. As the type of CDR used in the FPGA of the OLT also has an impact on the ability of the system to cope with certain parameters (essentially the phase step and the lengths of the CID and the preamble), some parameters were analyzed for two CDR architectures: one based on a conventional phase-tracking CDR [4] (referred to as N-CDR) and one based on an oversampling CDR without phase tracking [4,5] (referred to as O-CDR), with the objective of quantifying their immunity to the system parameters.
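For illustration, one upstream test burst with the default field lengths used in the measurements below can be sketched as follows; the field contents (training pattern, delimiters) and the PRBS-7 generator are placeholders, not the actual implementation:

```python
# Sketch of one upstream burst: preamble, 8b/10b comma + address, SOF,
# PRBS payload, EOF, then the CID gap (consecutive '0's) before the
# next ONU's burst. Field contents are illustrative placeholders.

def prbs7(n_bits: int, seed: int = 0x7F) -> list:
    """PRBS-7 (x^7 + x^6 + 1) bit sequence, a common BER test pattern."""
    state, out = seed, []
    for _ in range(n_bits):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        out.append(bit)
    return out

def build_burst(preamble_bits=1120, data_bits=2032, cid_bits=3200):
    preamble = [1, 0] * (preamble_bits // 2)  # clock-like training pattern
    comma = [0] * 8 + [1] * 8                 # placeholder for K + address
    sof, eof = [1] * 4, [0] * 4               # placeholder delimiters
    payload = prbs7(data_bits)                # PRBS data checked by BBER
    cid = [0] * cid_bits                      # silent gap between bursts
    return preamble + comma + sof + payload + eof + cid

burst = build_burst()
print(len(burst))  # 1120 + 16 + 4 + 2032 + 4 + 3200 = 6376 bits
```

Making the preamble, data and CID lengths function parameters mirrors how those fields were made adjustable in the tests.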

BBER and PLR for a relaxed system
The first measurements were conducted to evaluate the performance of our system in a steady and relaxed configuration. We thus chose a large preamble length of 1120 bits and a comfortable silent period between bursts (CID) of 3200 bits with ∆φ = 0°, and we measured a simple BBER and PLR versus the global optical power at the input of the OLT. The comma length (16 bits) and data length (2032 bits) were kept constant throughout all tests. For this initial test, the O-CDR architecture was implemented. Figures 9(a) and (b) show a power budget of ∼36 dB (taking into account the average optical output power of the ONUs, ∼2 dBm), with slight differences between the two ONUs.

Effect of CID length
In order to compare the performance of the two types of CDR and the impact of the CID, consider the plots shown in figure 9(c) for the O-CDR and 9(d) for the N-CDR. The penalty of a short CID was evaluated by performing a BBER test versus optical power for a discrete set of CID values, for both ONUs and both CDR types. All the other parameters were kept constant (preamble length = 1120 bits, ∆φ = 0°). The plots show that changing the CID length introduces a penalty of between 0.5 dB and 1 dB in the BBER test. The PLR versus optical power plot (not shown in this document) follows the same trend. As far as the CDR type is concerned, the results are comparable (except for CID = 0 bits), probably because ∆φ = 0°. We also noticed that the performance of both CDRs improved as the CID lengthened. Therefore, assuming that the O-CDR is used, the CID value can be kept within the range of 0-400 bits.

Effect of dynamic range
Measuring

Effect of preamble length
The penalty of the preamble length was evaluated for both CDR types by performing a BBER test versus optical power on both ONUs for a discrete set of values of p1 and p2. All the other parameters were kept constant (CID length = 3200 bits, ∆φ = 0°). The plots (figure 10(a): O-CDR; figure 10(b): N-CDR) show that decreasing the preamble length introduces a penalty of between 0.5 dB at low attenuation values (<34 dB) and 1 dB at higher attenuation values (>35 dB) in the BBER test. The results also showed that fewer than 64 preamble bits result in a PLR ≈ 1 for both CDR types (the PLR versus optical power plot is not shown in this document). When the number of preamble bits was increased above 80, error-free operation was observed at low attenuation values (<34 dB). Although the two types of CDR perform similarly for most preamble lengths at low attenuation values, the O-CDR type appears more robust and immune to preamble-length changes at higher attenuation values. Consequently, whichever CDR type is used, the preamble length can be kept as low as 80 bits.

Effect of phase step
The phase penalty was measured by carrying out BBER and PLR tests versus optical power for 10 discrete phase values (a step of ∆φ = 36° represents 1/10th of a bit period of the 0.8 Gbps upstream link), for both CDR types. This required synchronizing the reference clocks of both ONUs, achieved by externally linking together each ONU's clock generator. All the other parameters were kept constant (preamble length = 1120 bits, CID length = 3200 bits). The plots in figures 10(c) and (d) show the BBER and PLR (respectively) versus optical power for the O-CDR; the N-CDR's performance is shown in figures 10(e) and (f). Comparing the BBER and PLR tests, it is easily noticed that the O-CDR performs significantly better than the N-CDR for all phase steps. This is expected, since the O-CDR is more likely to sample at the middle of each data bit [5].
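To put the phase steps in time units: at the 0.8 Gbps upstream rate one bit period is 1.25 ns, so each 36° step corresponds to 125 ps. A one-line conversion:

```python
# Converting the phase-step units used above into picoseconds.
BIT_RATE_GBPS = 0.8                     # upstream line rate
bit_period_ps = 1e3 / BIT_RATE_GBPS     # 1250 ps per upstream bit

def phase_step_ps(delta_phi_deg: float) -> float:
    """Phase step in picoseconds, where 360 degrees = 1 bit period."""
    return bit_period_ps * delta_phi_deg / 360.0

print(phase_step_ps(36))   # 125.0 ps, the sweep granularity
print(phase_step_ps(180))  # 625.0 ps, the worst case of half a bit
```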

Summary
The objectives of the study presented here were to observe the TTC-PON system behavior under various stress conditions in order to optimize it for a TTC application, both at the physical and at the protocol layers. In particular, we had to understand the effect of burst-mode transmission upstream and analyze the sensitivity of a PON topology to the various constitutive parameters of the system: physical parameters (attenuation, split ratio, dynamic range, phase between ONUs) and protocol-related parameters (interframe gap, training pattern, data payload per burst, CDR type, encoding scheme). A test setup and a reproducible methodology were implemented to ensure the analysis of our current system and to prepare that of the next generation. Thanks to the results presented in this