Effect of repetitive reset, temperature variation and frequency offset on the performance of PLL for the LHC experiments

The Large Hadron Collider (LHC) uses the timing, trigger and control (TTC) system as a backbone to distribute the bunch clock and other critical timing signals to all the participating experiments. The clock signal is derived directly from the radio frequency (RF) driving the beams in the accelerator. The whole range of electronic systems, from ADCs to high-speed transmission links, operates synchronously with the clock signal and is sensitive to jitter. Throughout the clock distribution chain, high-frequency components increase the jitter. Multiple Phase-Locked Loops (PLLs) are used along the chain to keep the jitter at a minimum level. The Si5344 PLL by Silicon Labs is one of the candidate PLLs for jitter cleaning of the embedded clock in LHC gigabit serial link transmission. This article highlights the qualification tests conducted to characterize the PLL component. A laboratory test setup is built to emulate the thermal variation of the LHC under run-time conditions. The study investigates the influence of temperature variation on PLL jitter-cleaning performance when operated in the different configuration modes needed in the TTC distribution chain. The stability of the PLL circuitry in locking correctly under repetitive reset assertion, power on/off cycles and marginal frequency swing about the mean LHC bunch clock frequency is also studied.


Introduction
The experiments at the CERN LHC are composed of multiple detector systems, each with a large number of sub-detectors. Each of the sub-detectors receives the Bunch Clock (BC) signal derived from the RF frequency as a synchronization reference. The Timing, Trigger and Control (TTC) system [1] is dedicated to synchronizing the electronics of the experiments to the LHC beam. The upgraded TTC system uses Passive Optical Network (PON) technology [2] and CERN GigaBit Transceiver (GBT) [3] optical solutions to broadcast TTC information. FPGA-based readout boards such as the CRU are used in the back-end system of the LHC experiments for TTC communication, which leads designers to opt for a programmable PLL solution that can be mounted on the readout card itself. The design parameters are hard to change once the device is mounted on the PCB. Hence, proper identification of the LHC timing-related requirements is necessary before choosing the PLL. Timing and synchronization is a major challenge for the LHC experiments. Synchronization of data is a necessary condition for successful operation of the trigger and readout systems. Any loss of link synchronization triggers a string of cascading failures in the data readout chain and consequently stalls or corrupts the readout data of that particular communication chain. Varela [4] has provided a broad overview of the synchronization types that need to be achieved and monitored at different levels and in different contexts for the operation of the system. Among the various synchronization types, Sampling Synchronization and Serial Link Synchronization need PLL applications. In sampling synchronization, the detector signals are matched with the clock phase, while in serial link synchronization the recovered parallel data word is aligned with the incoming serial bit stream.
Both processes rely on the PLL to recover and regenerate the clock with its phase aligned to the reference clock signal while maintaining low phase noise. In the past, analogue PLLs were used in the timing system of the LHC to recover clocks and filter out jitter introduced by the transmission media. Their design parameters are difficult to change once the board has been assembled. In recent years, the extensive use of FPGAs in LHC experiments has allowed the application of fully programmable digital PLLs. Even precise skew monitoring with sub-nanosecond resolution between the generated digital clocks is possible within the FPGA itself, without any additional hardware [5]. The work of Sanchez et al. [6] describes two variants of digital PLL in use in the general timing system for the LHC: the all-digital PLL that uses an NCO (Numerically Controlled Oscillator) and the hybrid PLL that uses a VCXO (Voltage Controlled Crystal Oscillator) with a DAC (Digital to Analogue Converter). Ideally, a VCXO is preferred in designs where lower phase noise with higher frequency stability is needed. In the ASIC world, an all-digital PLL normally refers to a PLL with a TDC (Time to Digital Converter) and a DCO (Digitally Controlled Oscillator) [7]. The PLL we tested (Si5344) has a special dual-loop architecture (DSPLL): the inner loop is a wide-band analog PLL that acts as a DCO (by modulating the divider value in the feedback loop) and the outer loop is a highly configurable digital PLL (which uses a PD+ADC instead of a TDC) [8].
Apart from the proper selection of the PLL technology type, major significance is attached to the jitter cleaning ability of the PLL. LHC timing reference signals distributed to the electronic systems in the detector need to maintain sub-nanosecond jitter and constant phase. It has been shown in the early works of Taylor [9] that it is possible to multiplex and encode complex trigger and timing signals over an optical communication system, provided the timing signal integrity is preserved. The acceptable RMS jitter of the LHC clock for remote recovery of the timing output lies below 10-20 ps. The contribution from fiber dispersion and other interfering channel noise can push the RMS jitter above 200 ps. Hence, PLLs are placed at intervals along the chain to clean the timing signal and meet the timing requirement. Kolotouros et al. [10], while characterizing PLLs for the TTC-PON optical link standard, have shown that even for the same input-to-output frequency configuration, PLLs from different vendors exhibit significant deviation in the jitter cleaning effect. This is due to differences in PLL architectures and configurations.
In this article the effect of reset, temperature variation and frequency offset on the Silicon Labs Si5344 PLL is studied. The article is organized into six sections. Section 2 discusses the types of timing signals in the LHC experiments and their short-term frequency instability during the energy boosting process of the accelerator. Section 3 elaborates on the PLL feature requirements for the LHC experiments. Section 4 covers the details of the measurement setup. Section 5 discusses the measurement results of clock skew variation and jitter for repetitive reset assertions at room temperature and the effect of temperature ramping. The effect of marginal frequency offset on the PLL's ability to remain locked is also discussed. Finally, in section 6, we conclude our discussion and suggest a direction for future research.

JINST 14 P02001
Frequency swing of timing signals in the LHC experiments
The LHC RF operates at a frequency of 400.78 MHz. The restriction in the choice of frequency is imposed by a design constraint at CERN that forces the RF frequency to be a simple multiple of the Super Proton Synchrotron (SPS) frequency of 200.4 MHz. This approach was economical for CERN as it puts to use the existing setup of the SPS in building the LHC accelerator [11]. In the LHC, the accelerated particle beam is not a continuous stream; instead it consists of chunks of squeezed particles, each contained in an RF bucket, referred to as bunches. The RF frequency plays a crucial role in determining the timing of the arrival of those bunches in a particular RF bucket. Before the onset of a physics run, three major activities are involved: synchronization of the SPS to the LHC; generation of beam synchronous signals (the 40 MHz bunch clock (BC) frequency, the revolution frequency or Orbit, and injection kicker pulses) for experiment users; and fine re-phasing of the two LHC rings, which is needed only during the Pb-p runs since otherwise both beams are always synchronous [12].
The master clock in the LHC is the RF frequency, from which the timing signals are derived. Because these signals are synchronous to the circulating beams, their frequencies swing during beam acceleration. The signals of interest here are the BC and the Orbit. The BC timing signal is a square wave ($f_{BC} = 40.078\,\mathrm{MHz} \pm \Delta f_{BC}$) derived from the RF by a frequency division of 10. The second timing signal, the Orbit, is a 5 ns long pulse at the particle revolution frequency ($f_{rev} = 11.245\,\mathrm{kHz} \pm \Delta f_{rev}$), derived by dividing the RF frequency (400.78 MHz) by 35640, and is used for marking the position of the first RF bucket. Bunch Clock (BC1, BC2) and Orbit (Orb1, Orb2) signals are assigned to each ring. BC1 and BC2 are synchronized to the same master RF at all times (except for local variations due to the Low Level Feedback loop). The responsibility for transmitting the timing signals from the LHC machine to the experiments lies with the TTC (Timing, Trigger and Control) systems [13].
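The division ratios quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the nominal values stated in the text; the function and constant names are illustrative and do not come from any LHC software.

```python
# Nominal values taken from the text; names are illustrative only.
RF_HZ = 400.78e6        # LHC RF frequency
BC_DIVIDER = 10         # RF -> bunch clock (BC)
ORBIT_DIVIDER = 35640   # RF -> revolution frequency (Orbit)


def derived_timing(rf_hz):
    """Return (bunch clock, orbit) frequencies in Hz for a given RF frequency."""
    return rf_hz / BC_DIVIDER, rf_hz / ORBIT_DIVIDER


bc_hz, orbit_hz = derived_timing(RF_HZ)
print(f"BC    = {bc_hz / 1e6:.3f} MHz")
print(f"Orbit = {orbit_hz / 1e3:.3f} kHz")
print(f"BC periods per orbit = {bc_hz / orbit_hz:.0f}")
```

Because both signals are divided down from the same RF, any swing of the RF during the energy ramp scales into the BC and the Orbit by the same relative amount.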
The timing signals in the LHC experiments show short-term instability during the energy-boosting process of the accelerator [14]. An operational LHC beam mode involves the injection phase, the ramping phase, the beam squeeze phase and the stable beam phase [12]. Figure 1 illustrates the transition of beam energy with each mode. The LHC machine is prepared for the energy ramp soon after the injection process terminates, and the ramp continues until the machine reaches its highest energy state before settling into a stable beam. During this period of beam ramping, or in the absence of stable beam, the frequency of the LHC timing signals swings around its mean value as a function of the beam energy. The fluctuation of the BC timing signal with beam energy is plotted in the same figure 1 to correlate the two parameters, where the x-axis represents time. The absolute value of the frequency swing (∆f) of the timing signals for the proton beam and the ion beam is tabulated in table 1. Our study also covers the effect of frequency offset on the PLL at room temperature, to anticipate the behaviour of the PLL under run-time conditions with the frequency swing.

Understanding the PLL feature requirements based on LHC experiment needs
PLLs are widely used throughout the clock distribution tree in the LHC to keep the jitter low and to compensate for the slow variations of the clock signal. Four essential requirements need to be met for application in the LHC experiment:
2. Frequency drift occurs in the timing signals during LHC beam ramping (beam dump or filling operation), and at this time it is crucial to maintain the lock to complete the synchronization process. As noted in table 1, the LHC BC frequency varies over a period of ≈ 22 min with a spread of 87 Hz and 548 Hz for the proton and ion beam energy ramps respectively. Hence, preventing loss of lock in the PLL during the frequency ramp is the second requirement [14].
3. The electronic components in the LHC experiments are subjected to power cycles or hard resets during the synchronization phase. The third requirement is therefore PLL phase stability: no sudden jumps in the phase of the recovered clock may occur between power cycles or hard resets of the electronic equipment [1].
4. The PLLs are mounted on the same board as the FPGA or the processing hardware for ease of programmability. The experimental setup is maintained in a controlled environment where the temperature is strictly regulated. However, the stress on hardware resources during a sudden upsurge in data processing might cause the onboard temperature to rise significantly above the ambient temperature. Hence, the fourth criterion is to ascertain that the drift in PLL jitter stays within the tolerable margin during temperature fluctuations, so that the system does not go out of the locking range [1].
Apart from the aforementioned requirements, a few other features are desirable. Hardware design engineers like to use high-fanout PLLs to lower the number of timing components in the BOM (Bill of Materials) and to make a more compact, high-density PCB layout. Switching between two source-synchronous timing signals is needed in certain cases; however, it causes a phase perturbation and carries the risk that downstream low-bandwidth PLLs lose lock because of the clock rearrangement. A PLL that supports hitless switching can compensate for the phase difference when switching between two source-synchronous input clocks and allows a smooth takeover of the synchronization process [16]. Consider the case where the input synchronization clock goes absent. In that scenario a holdover mode is needed as a fall-back solution to keep the clock chain stable until the synchronization clock comes back. Moreover, for systematic investigation of an unanticipated loss of lock in the PLL, a handy debugging checklist is available from Intel Corporation [17].
A wide range of PLL technologies is available in the commercial market. A detailed characterization of different PLLs is needed to identify the jitter-attenuating PLL best suited to the LHC timing application. In our study the Si5344 PLL from Silicon Labs is chosen as the device under test. The Si534x PLL [18] monolithic chip combines fourth-generation DSPLL and MultiSynth™ technologies. The DSPLL architecture permits frequency translation to any output clock frequency from 1 kHz to 800 MHz, while the MultiSynth fractional divider enables the PLL to achieve 0 ppm frequency synthesis error along with less than 100 fs RMS phase jitter (integrated phase noise over 12 kHz to 20 MHz) [19]. Other key features include synchronous/freerun/holdover modes and automatic/manual hitless switching. The Si5345/44/42 product family supports simultaneous fan-in of four clocks and fan-out of 10/4/2 clocks respectively, depending on the chosen device part number.
In the experimental setup the Revision B Si5344 evaluation board is used for characterizing the PLL. A fully automated test setup is built to characterize the PLL, as shown in figure 2. Automation enabled us to acquire over 1000k data points in different settings with increased efficiency and reliability over a short period of two weeks of continuous run-time. The test setup consists of seven elements: the CTS climatic test chamber, to maintain the controlled variation of temperature on the Si5344 PLL; the Revision B Si5344 PLL evaluation board, as the device under test; the PT100 RTD (Resistance Temperature Detector), to monitor the surface temperature of the Si5344 PLL chip; the data acquisition switch unit, to register the readings from the PT100 RTD; the clock generator model CG635, to generate the source clock; the digital phosphor oscilloscope DPO-5104B, to monitor the clock signals; and the processing unit, to act as a nodal point for data acquisition and analysis.
In figure 2 the clock line connections are marked in red while the control/data acquisition lines are marked in blue. The processing unit is responsible for the central readout of the data acquisition systems and the control of individual devices. Each of the components is interfaced with the processing unit: the CTS over an Ethernet link, the Si5344 PLL over USB, the DAQ switch over GPIB, the oscilloscope over an Ethernet link and the clock generator over GPIB. Additionally, the processing unit is connected to the Internet, which allows remote supervisory control. The phase of the output clocks is tracked with respect to the clean reference clock provided by the CG635 clock generator. The entire test is conducted at a controlled room temperature that varies between 21 and 23 °C. Only the Si5344 evaluation board is placed inside the CTS chamber and subjected to the controlled temperature changes. Communication between the PLL and the PC is established using an I2C interface over a USB link, and the PLL is programmed using the manufacturer-provided ClockBuilder Pro software. The PLL jitter cleaning ability is at its best for a 200 Hz loop bandwidth setting [15]. Hence, for the entire test the PLL loop bandwidth is kept fixed at 200 Hz with the output driver set to LVCMOS mode. The output signals are applied to an oscilloscope and jitter measurements are recorded using the DPOJET software tools [20].
A basic feedback control mechanism is built into the program to minimize human intervention over the entire measurement duration, which extends over a few days. The control logic consists of two blocks: the monitoring logic and the decision-making logic. The monitoring logic oversees the activity of the individual components, while the decision-making logic checks for any anomaly in their behaviour. If a specific abnormality is found, the logic first attempts to rectify it by asserting reset on all components and discarding the data sets collected for that particular run, to prepare for a fresh start. If the issue persists after that, it forces a power-off sequence and informs the supervisor with an alert message for immediate attention.
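The two-stage recovery policy described above can be sketched as follows. This is an illustration only, not the authors' program: all five callables are hypothetical placeholders for whatever instrument-control routines the test bench provides.

```python
def handle_anomaly(anomaly_present, reset_all, discard_run_data, power_off, alert):
    """Sketch of the decision-making logic (all callables are hypothetical).

    Returns "ok", "recovered" or "powered_off".
    """
    if not anomaly_present():
        return "ok"
    # First attempt: reset every component and drop the data of the current run.
    reset_all()
    discard_run_data()
    if not anomaly_present():
        return "recovered"
    # The issue persists: power everything off and notify the supervisor.
    power_off()
    alert("Anomaly persists after reset; setup powered off.")
    return "powered_off"
```

Passing the actions in as callables keeps the policy separate from the instrument drivers, so the same logic can be exercised with stand-in functions before it is connected to real hardware.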
Rapid acquisition of 10k measured values for each of eight different operating conditions over a short period requires the ramping rate of the CTS temperature to be increased. As the PLL is not given enough time to reach thermal stability, the measurement error due to thermal hysteresis becomes dominant. The effect is more pronounced when the rate of change of temperature is high. The measurement without compensation is plotted in figure 3 (a), where the ramping rate of the CTS is 1.9 °C/min. To remove the hysteresis error, the surface temperature needs to be monitored closely. For this purpose a PT100 RTD (Resistance Temperature Detector) is used to precisely monitor the surface temperature of the Si5344 PLL chip. The sensor is mounted on the surface using an epoxy adhesive. Figure 3 (b) shows the measurement when the temperature is recorded using the RTD. In the measurement setup, the PT100 is operated in 4-wire connection mode, which compensates for asymmetries in the connection length and removes the need for calibration.
... figure 2. In this mode, the PLL device ensures that the internal delay introduced by the circuitry is cancelled out and maintains a consistent minimum delay between the selected input and the outputs [18]. The data points for the two modes, namely 'No ZDM' and 'ZDM', are shown in the same plot and marked in red and blue respectively.
... [18]. However, revision B, which is used for this test, has a reported erratum from Silicon Labs that states "When entering holdover mode without valid holdover history data, holdover frequency may not be accurate" [21]. Moreover, during repeated power cycles or reset cycles the PLL temporarily goes into holdover, and when it comes out of holdover it performs an internal phase adjustment that shifts the output phase. To obtain a fixed input-output delay, HOLDOVER is turned off by setting the appropriate configuration registers in the settings file. Hence, during the test, the PLL is operated in 'HOLDOVER OFF' mode. A possible side effect is that, if the input signal is lost, the output clock would drift to an unknown frequency.

Oscilloscope parameter settings are crucial for understanding the frequency offset coverage of the measured jitter values. The four settings of primary importance to our experiment are tabulated in tables 2-4: oscilloscope sampling rate, oscilloscope record length, population of measurement observations and number of readings taken.
The oscilloscope sampling rate determines the resolution of the waveform. For our experiment it is 5 giga-samples per second and remained fixed throughout. The oscilloscope record length gives the number of points recorded in the complete waveform memory. The capture time is obtained by dividing the oscilloscope record length by the oscilloscope sampling rate [22]. For example, with a 5 GS/s sample rate (200 ps sample interval) and an acquired waveform record of 2 million points, the capture time is 400 µs, which corresponds to a horizontal scale of 40 µs per division (for a scope with ten horizontal divisions). The population of measurement observations is determined by the 'population limit' parameter in the DPOJET tool, which sets the number of observations accumulated to obtain statistically averaged values for the measurement [20]. The larger the population of observations, the more accurate the estimate of the jitter, especially for the TIE jitter measurement. The number of readings taken is the number of measurement data points acquired using the oscilloscope.
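The capture-time arithmetic in the example above can be written out as a short sketch. The function names are illustrative and are not part of DPOJET or of any oscilloscope API; the ten-division default is the assumption stated in the text.

```python
def capture_time_s(record_length_points, sample_rate_sps):
    """Capture time = record length / sampling rate."""
    return record_length_points / sample_rate_sps


def horizontal_scale_s_per_div(capture_time, divisions=10):
    """Time per division, assuming the capture spans all horizontal divisions."""
    return capture_time / divisions


t_capture = capture_time_s(2_000_000, 5e9)   # 2 M points at 5 GS/s
t_div = horizontal_scale_s_per_div(t_capture)
print(f"sample interval  = {1 / 5e9 * 1e12:.0f} ps")
print(f"capture time     = {t_capture * 1e6:.0f} us")
print(f"horizontal scale = {t_div * 1e6:.0f} us/div")
```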

Measurement parameter definitions
In the test system, skew jitter and Time Interval Error (TIE) jitter are measured, as these two parameters play a dominant role in LHC timing signal quality measurement [1]. The parameters are defined in the context of the experiment in the following paragraphs.
Skew jitter. The phase difference between the input reference clock and the jitter-cleaned output clock due to the PLL internal circuitry delay is denoted by ∆θ = (θ_REF_CLK − θ_CLK0). This phase offset is known as the input-to-output clock skew and is deterministic. The temporal variation of the skew is referred to as skew jitter (specification not yet finalized in JEDEC) [23] and is random. Multiple skew measurements are made with all other configuration settings kept constant to form a statistical data set (or population). The spread of the distribution of the data set gives the skew jitter value. It is denoted by J_skew.
TIE jitter. The Time Interval Error (TIE) jitter, or accumulated jitter, is the measurement that includes jitter at all modulation frequencies, including relatively slow and cumulative variations. TIE jitter is measured in the time domain and is expressed in picoseconds or femtoseconds. The larger the number of acquired signal samples, the more accurate the estimate of the TIE jitter, but the slower the processing, so there is a trade-off. For our experiment, the digital oscilloscope is operated in one-shot trigger mode to acquire a statistical population in the range of 10000-15000 for the TIE jitter measurement. It is denoted by J_TIE.

Experiment I: effect of repetitive reset assertion on skew jitter at room temperature
Repetitive reset is asserted on the PLL at an ambient temperature of 23 °C, using the Silicon Labs ClockBuilder Pro software driver. Two types of reset are available in this PLL: hard reset and soft reset. A hard reset is functionally similar to power-up: it erases the volatile memory and downloads the configuration from the non-volatile memory. A soft reset bypasses the non-volatile memory download and only initiates the reconfiguration of the registers. To maintain statistical integrity, 10000 data points are gathered for each PLL configuration. Each data point is acquired after a hard reset cycle is performed. No phase inconsistency was detected during the 10000 repetitive reset cycles conducted for each combination. Figure 5 shows the box plots and distributions of the input-to-output clock skew data points for the different PLL configurations. The spread of the skew distribution gives the skew jitter, which remains fixed for a specified voltage and temperature. The distribution of the input-to-output clock skew is Gaussian, as depicted in figure 6.
The skew jitter values for the eight PLL configuration modes are tabulated in table 2. The table reports the skew jitter for two types of spread in the skew distribution: the spread over ±1σ (standard deviation), which covers 68.2689% of the population, and the spread over ±5σ, which covers more than 99.99% of the population.
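The relation between a ±kσ spread and the fraction of a Gaussian population it covers can be sketched as below. The skew readings are hypothetical numbers chosen for illustration, not measured data, and taking the ±kσ spread as the full width 2kσ is an assumption about how the spread is reported.

```python
import math
import statistics


def gaussian_coverage(k):
    """Fraction of a Gaussian population lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


def skew_spread_ps(skews_ps, k=1):
    """Full width of the +/- k sigma interval (assumed definition of the spread)."""
    return 2 * k * statistics.stdev(skews_ps)


# Hypothetical skew readings in ps, one per hard-reset cycle (illustration only).
example_skews_ps = [100.0, 102.0, 98.0, 101.0, 99.0]

print(f"coverage within 1 sigma: {gaussian_coverage(1):.6f}")
print(f"coverage within 5 sigma: {gaussian_coverage(5):.8f}")
print(f"+/-1 sigma spread: {skew_spread_ps(example_skews_ps, 1):.2f} ps")
print(f"+/-5 sigma spread: {skew_spread_ps(example_skews_ps, 5):.2f} ps")
```

For an ideal Gaussian the ±5σ interval covers roughly 99.99994% of the population, so a 10000-point data set is expected to fall almost entirely inside it.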

Experiment II: effect of temperature variation on input-to-output clock skew and TIE jitter
A temperature range of 10 °C to 60 °C is chosen as the two extremes of the thermal cycle. The ramping rate used in the experiment is 1.9 °C/min for the input-to-output clock skew measurement and 0.01 °C/min for the TIE jitter measurement. About 12-14 thermal cycles are performed to ensure the statistical accuracy of the measured data points, as shown in figure 7 for the input-to-output clock skew measurement. For the TIE jitter measurement one thermal cycle is sufficient, as the population of observations is ∼25 times greater than for the input-to-output clock skew measurement.

The effect of temperature variation on the input-to-output clock skew for different frequency configurations of the PLL is shown in figure 8. The average skew readings at room temperature (23 °C) in the heating and cooling temperature cycles are represented by $J_{skew}^{23^{\circ}C+}$ and $J_{skew}^{23^{\circ}C-}$ respectively. The mean value marked in figure 8 is calculated by taking the average of the two readings, $J_{skew}^{23^{\circ}C} = (J_{skew}^{23^{\circ}C+} + J_{skew}^{23^{\circ}C-})/2$. Table 3 summarizes the skew measurement for all eight PLL configurations. The typical deviation of the skew lies within 86-102 ps for No ZDM mode and 37-51 ps for ZDM mode. A strong dependence of the input-to-output clock skew on temperature variation is observed. The spread of the skew at a particular temperature gives the skew jitter for that temperature.
The TIE jitter, which gives the net accumulated jitter, is plotted for different frequency configurations of the PLL in figure 9. The average TIE jitter readings at room temperature (23 °C) in the heating and cooling temperature cycles are represented by $J_{TIE}^{23^{\circ}C+}$ and $J_{TIE}^{23^{\circ}C-}$ respectively. The mean value marked in figure 9 is calculated by taking the average of the two readings, $J_{TIE}^{23^{\circ}C} = (J_{TIE}^{23^{\circ}C+} + J_{TIE}^{23^{\circ}C-})/2$. A larger population of observations gives a more accurate statistical estimate. However, specifying a larger population has the disadvantage of requiring a longer measurement period, as the results need to be computed and the instrument has to sequence several times before enough statistics are accumulated to provide the results [20]. Table 4 summarizes the TIE jitter measurement for all eight PLL configurations. The typical deviation of the TIE jitter lies within 0.60-0.69 ps for No ZDM mode and 0.59-0.64 ps for ZDM mode. A weak dependence of the TIE jitter on temperature variation is observed.
For LHC applications, jitter analysis in the high-frequency range is done using phase noise or TIE jitter measurement. TIE jitter carries the same information in the time domain as phase noise does in the frequency domain. For accurate characterization of jitter in the time domain, the TIE measurement is used, for which the acquisition window width of the TIE filter and the offset frequency range are set. If the measurement is made in the frequency domain, phase noise analysis is used, for which the integration bandwidth and the jitter filter are set to obtain accurate frequency response characteristics. This article deals with the evaluation of the consistency of the Si534x PLL performance. To cover the entire frequency range, our measurement uses the unfiltered TIE jitter value.

Experiment III: effect of frequency offset on PLL loss of lock at room temperature
A frequency drift of at most ±500 Hz centred on 40 MHz is applied, as shown in figure 10. The frequency is drifted at a rate of 1 Hz per 10 s, and the PLL is checked for loss of lock before each iteration. The frequency drift test is repeated ten times for statistical assurance. From the test result, it can be inferred that the configured loop bandwidth of 200 Hz provides a suitable locking range to withstand the run-time LHC frequency drift.
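A drift test of this kind can be sketched as a simple stepping loop. This is an illustration under stated assumptions, not the authors' test program: the triangular profile (0 → +500 Hz → −500 Hz → 0) is assumed, since the exact shape is given only in figure 10, and `set_frequency` and `is_locked` are hypothetical callables standing in for the clock generator and PLL status interfaces.

```python
import time


def sweep_offsets(max_offset_hz=500, step_hz=1):
    """Offsets for an assumed triangular profile: 0 -> +max -> -max -> 0."""
    up = list(range(0, max_offset_hz + 1, step_hz))
    down = list(range(max_offset_hz - step_hz, -max_offset_hz - 1, -step_hz))
    back = list(range(-max_offset_hz + step_hz, 1, step_hz))
    return up + down + back


def run_drift_test(set_frequency, is_locked, centre_hz=40e6,
                   dwell_s=10, sleep=time.sleep):
    """Step the source frequency; return the offset at which lock was found
    lost, or None if the PLL stayed locked for the whole sweep."""
    for offset in sweep_offsets():
        if not is_locked():            # lock is checked before each iteration
            return offset
        set_frequency(centre_hz + offset)
        sleep(dwell_s)                 # 1 Hz step every 10 s
    return None


steps = len(sweep_offsets()) - 1
print(f"{steps} steps, about {steps * 10 / 3600:.1f} h per sweep at 1 Hz per 10 s")
```

Injecting `sleep` as a parameter lets the loop be dry-run instantly with a no-op before committing several hours of instrument time to a real sweep.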

Conclusion
The ability of a PLL to improve the clock quality has a significant role in the timing and synchronization of the LHC experiments. The main objective of the tests is to validate that the selected PLL meets the stated LHC requirements. A substantial number of measurements were conducted to confirm that the chosen Si5344/45 PLL satisfies the essential LHC TTC quality standards. As can be inferred from the results, the TIE jitter shows a weak dependence on temperature ramping while the input-to-output clock skew exhibits a strong dependence on temperature ramping. The preferred solution is to operate the PLL in ZDM mode, to minimize the input-to-output clock skew variation with temperature changes. Si5344 revision D is currently available, in which the manufacturer claims to have fixed various temperature-dependent issues (e.g. variations in the input-to-output delay, consistency of the MultiSynth dividers) [21]. Revision B is not recommended for use by the manufacturer. However, during the testing phase only revision B was available, with a known issue in the holdover mode. Holdover is an optional feature and is required only when the reference clock is missing.

The discussion in this article is limited to the measurement of the input-to-output clock skew and jitter variation with temperature changes. The detailed characteristics of the jitter values are not included in this article. The jitter cleaning performance of the PLL is reported in a separate article [15]. The presented results are specific to revision B of the PLL and have been cross-checked with other revision B members of the same PLL family. The characterization conditions are planned to be replicated with revision D of the PLL family [24] to compare the performance variations. The Si5344/45 PLL chip has found applications in the LHC experiments ALICE and LHCb, in the first production version of their readout cards for the RUN3 upgrade. In the future, the effect of cascaded PLLs on jitter cleaning can be a focus of study, as an LHC timing distribution chain involves multiple PLLs with variations in type and bandwidth settings.