Methods for clock signal characterization using FPGA resources

Reliable measurement of clock signal parameters is an important tool for calibration and validation of circuits used in precise-timing applications. Such parameters include frequency, phase, duty cycle and channel-to-channel skew. Especially in applications in which testing time for multiple channels is a significant factor, efficient parallelization of measurements is crucial, but often comes at significant extra cost. We present techniques for characterization of clock signal parameters using off-the-shelf FPGA evaluation hardware. Two methods are presented, one for static measurements (steady-state behaviour) and one for dynamic measurements. For each method, a case study is presented and its performance is discussed.


Introduction
A recurring requirement for clock synthesis and timing distribution ASICs is to provide multiple phase-locked clock signals, whose frequency is programmable to an integer multiple of a common reference clock frequency. These clock signals need to be provided with deterministic delays and low jitter.
With such ASICs increasingly being used for synthesis and distribution of high-precision timing reference clocks in radiation environments, the associated requirements for testing and validation of clock performance are becoming more stringent. Recent ASIC designs feature tens of such clock outputs [1], whose correct function and performance needs to be tested during production in as little time as possible. Therefore, a need exists for scalable measurement systems which can be integrated into FPGA-based test setups commonly used during the evaluation testing phase as well as for production testing.
Another important aspect of performance validation in the context of radiation hardened electronics is characterization of Single Event Effect (SEE) sensitivity. With more stringent requirements for clock latency and jitter, there is a need to detect phase and frequency transients in real time as well as glitches originating in Phase Locked Loop (PLL) circuits.
Two distinct methods are presented that were developed for static and dynamic characterization of clock signals, fulfilling the requirements outlined above. Their implementation relies solely on common integrated resources found in modern FPGAs and does not require external hardware components. They can be exploited to easily implement multiple measurement channels, which allows characterization of multiple signals simultaneously. It also facilitates integration into a larger FPGA Data Acquisition (DAQ) system with reduced design effort.

Method 1: measurement method for static signal parameters
The first measurement method presented is focused on the determination of signal parameters under the assumption of a non-perturbed, static clock signal. Such parameters include clock frequency, clock phase (skew), duty cycle and jitter.
The measurement is based on the concept of equivalent time sampling [2]. The waveform of the clock under test is sampled using a delay line by an internally generated sampling clock, and the phase of this sampling clock is subsequently advanced in increments much smaller than the sampling clock period in order to recreate the waveform of the clock signal under test with high temporal resolution. From the reconstructed time series obtained this way, the signal parameters mentioned above can be estimated using post-processing. The process is comparable to the equivalent time sampling methodology used in digital oscilloscopes for periodic waveforms.
The main difference to a classical equivalent time measurement is the input sampling flip flop, which can be considered a limiting amplifier. Accumulation of multiple measurements for the same sampling phase therefore results in a measurement of the probability of the clock signal being either high or low instead of information on the signal amplitude (as acquired using an Analog-to-Digital Converter (ADC)). This fact can be used in the estimation of signal jitter, as will be shown below.

Circuit architecture
A possible implementation of the measurement method described above is shown in figure 1. The reference clock depicted in the figure is used for both the Device Under Test (DUT) and the FPGA in order to realize a synchronous measurement. An internal FPGA PLL synthesizes the measurement sampling clock, whose frequency is an integer multiple K of the reference clock and has programmable phase shift.
A high-speed input buffer and register are used to sample the clock signal under test. Following the initial sampling register is a synchronous delay line with a length sufficient to cover a full period of the reference clock. The contents of this delay line, clocked by the high speed measurement clock, are sampled once per reference clock cycle. Attached to each tap of the delay line is an accumulator, which is used to integrate the values sampled on every measurement cycle. In this way, the probability of the clock being observed as 0 or 1 at each of the delay line taps is digitized. The accumulators and the PLL used to achieve variable phase shift are controlled by a common finite state machine.

Measurement operation
A complete set of measurement data is obtained by performing the following sequence of operations. First, the sampling clock phase is aligned with the reference clock. The delay line accumulators are reset before an integration period is started. For M reference clock cycles, the accumulators attached to each delay line tap integrate the state of the sampled clock waveform. After M clock cycles, the integration process is stopped and the accumulator contents are transferred to a memory.  To obtain the next measurement, the accumulators are reset and the measurement clock phase is advanced by one increment. This procedure is then repeated until a set of measurements has been acquired for all available phase settings of the sampling clock in relation to the reference clock.
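The acquisition sequence can be illustrated with a small software model. This is a minimal sketch and not the FPGA implementation: the function names, the ideal jitter-free test clock and the assumption that the tap index increases with sampling time are all illustrative choices.

```python
def acquire(sample_clock, t_ref, k, n_steps, m_cycles):
    """Model the acquisition sequence: one row of accumulators per phase step."""
    dt_tap = t_ref / k            # spacing of the delay line taps
    dt_phase = dt_tap / n_steps   # phase increment of the sampling clock
    memory = []
    for step in range(n_steps):                # advance the phase one increment at a time
        accumulators = [0] * k                 # reset before each integration period
        for _cycle in range(m_cycles):         # integrate for M reference clock cycles
            for tap in range(k):
                t = tap * dt_tap + step * dt_phase
                accumulators[tap] += sample_clock(t)
        memory.append(accumulators)            # transfer the accumulator contents
    return memory


def ideal_clock(t, period=12.5e-9, duty=0.5):
    """Hypothetical jitter-free clock under test: 1 while high, 0 while low."""
    return 1 if (t % period) < duty * period else 0


rows = acquire(ideal_clock, t_ref=25e-9, k=8, n_steps=112, m_cycles=256)
```

Because the model clock has no jitter, every accumulator ends at either 0 or M. With jitter present, accumulators close to an edge would take intermediate values.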

Measurement post-processing and signal parameter estimation
The data set acquired by the procedure outlined above is now rearranged by interleaving the measurements according to figure 2. While the delay line taps produce samples equally spaced at $\Delta t_{\mathrm{Tap}} = T_{\mathrm{Ref}}/K$, a phase shift of the sampling clock by $\Delta\phi$ rad shifts these measurements by $\Delta t_{\mathrm{Phase}} = \Delta\phi \cdot T_{\mathrm{Ref}}/(2\pi K)$. Therefore, an equivalent-time series of measurements can be constructed after acquiring $N = \Delta t_{\mathrm{Tap}}/\Delta t_{\mathrm{Phase}}$ sets of measurements and interleaving them as shown. For each time offset with respect to the reference clock, the accumulator value stored represents the probability of the clock signal being sampled as logic low or high.
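The interleaving step can be sketched as follows. The index mapping is an assumption: the sample of tap k at phase step n is taken to lie at time offset k·Δt_Tap + n·Δt_Phase, which places it at index k·N + n. A phase shifter that moves the sampling instant in the opposite direction would require the reverse order within each tap.

```python
def interleave(rows, m_cycles):
    """Build the equivalent-time series from per-phase-step accumulator rows.

    rows[n][k] is the accumulator value of tap k at phase step n.
    """
    n_steps = len(rows)
    n_taps = len(rows[0])
    series = [0.0] * (n_steps * n_taps)
    for n, row in enumerate(rows):
        for k, count in enumerate(row):
            # Normalise the count to the probability of sampling a logic high
            series[k * n_steps + n] = count / m_cycles
    return series


# Tiny example: 2 taps, 3 phase steps, M = 4 accumulated cycles
rows = [[4, 0], [4, 0], [2, 0]]
series = interleave(rows, m_cycles=4)
print(series)  # [1.0, 1.0, 0.5, 0.0, 0.0, 0.0]
```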
To determine the clock signal frequency, techniques such as zero-crossing detection or Fourier transforms can be used on the resulting time series. Given that the clock signal under test is  constrained to be an integer multiple of the reference clock frequency, there are only a limited number of possible frequencies available, which can easily be resolved using such simple methods. This process can also serve as a check for the presence of a clock signal (which may be absent in case of a faulty DUT).
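As one possible realization of the zero-crossing approach, the number of rising edges within one reference period can be counted on the thresholded time series. The sketch below is illustrative only: the function names and the 0.5 threshold are assumptions, and no hysteresis is applied, so noisy samples near an edge could be counted more than once.

```python
def count_rising_edges(series, threshold=0.5):
    """Count low-to-high threshold crossings over one reference period.

    The series is treated as cyclic, since it covers exactly one period
    of the reference clock.
    """
    bits = [1 if p >= threshold else 0 for p in series]
    return sum(1 for i in range(len(bits)) if bits[i - 1] == 0 and bits[i] == 1)


def estimate_frequency(series, f_ref):
    """Frequency of a clock assumed to be an integer multiple of f_ref."""
    return count_rising_edges(series) * f_ref


# Example: four clock periods within 896 equivalent-time samples
series = [1.0 if (i % 224) < 112 else 0.0 for i in range(896)]
f = estimate_frequency(series, f_ref=40e6)
print(f)  # 160000000.0
```

An all-low or all-high series yields zero edges, which corresponds to the absent-clock check mentioned above.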
In order to determine the remaining signal parameters with high accuracy, a joint estimation process for skew, duty cycle and jitter can be used. Under the assumption that the clock jitter distribution is adequately modelled using a normal distribution with a standard deviation $\sigma$, each clock edge observed in the time series can be modelled as the modified error function formulated in equation (2.1), containing the a-priori unknown parameters $t_{\mathrm{edge}}$ and $\sigma_{\mathrm{edge}}$. A visual representation of this model and its fit to an example time series can be seen in figure 3.
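For a rising edge, a form of this model that is consistent with the normal-distribution assumption is the Gaussian cumulative distribution function. The expression below is an assumed form and is not quoted from equation (2.1); the symbol $P_{\mathrm{high}}$ is introduced here for illustration.

```latex
P_{\mathrm{high}}(t) = \frac{1}{2}\left[ 1 + \operatorname{erf}\!\left( \frac{t - t_{\mathrm{edge}}}{\sqrt{2}\,\sigma_{\mathrm{edge}}} \right) \right]
```

Under the same assumption, a falling edge would be described by $1 - P_{\mathrm{high}}(t)$.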
When performing the measurement as described above, this probability function is sampled at discrete time steps spaced by $\Delta t_{\mathrm{Phase}}$ and quantized into M discrete levels. By performing a curve fit of the modified error function template to each edge in the measurement time series using the parameters mentioned above, the mean and standard deviation of the underlying normal distribution can be estimated, which correspond to the edge position and rms jitter. With precise information on the timing of each edge, signal skew and duty cycle can then be calculated. While it has to be noted that the estimated jitter contains a contribution from the FPGA PLL and its reference clock, these contributions can be considered small compared to the quantization noise floor arising from the finite phase time step available.
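The estimation of edge position and rms jitter can be illustrated with a simplified estimator. The sketch below does not perform the least-squares curve fit described above. Instead it uses a moment-based estimate on the differenced samples, which needs only the standard library. It assumes a rising edge, uniform time steps and noise-free samples; the function names are hypothetical.

```python
import math


def edge_cdf(t, t_edge, sigma):
    """Gaussian CDF: probability of sampling a 1 at time t for a rising edge."""
    return 0.5 * (1.0 + math.erf((t - t_edge) / (math.sqrt(2.0) * sigma)))


def estimate_edge(times, probs):
    """Estimate (t_edge, sigma_edge) from a sampled rising-edge CDF.

    Differences of adjacent samples give the probability mass per time bin.
    The mean and standard deviation of that mass estimate the edge position
    and the rms jitter. Sheppard's correction removes the step**2 / 12
    variance added by the finite bin width.
    """
    step = times[1] - times[0]
    mids = [0.5 * (a + b) for a, b in zip(times, times[1:])]
    mass = [b - a for a, b in zip(probs, probs[1:])]
    total = sum(mass)
    mean = sum(m * t for m, t in zip(mass, mids)) / total
    var = sum(m * (t - mean) ** 2 for m, t in zip(mass, mids)) / total
    var = max(var - step ** 2 / 12.0, 0.0)
    return mean, math.sqrt(var)


# Synthetic edge at 1000 ps with 50 ps rms jitter, sampled every 27.9 ps
times = [i * 27.9 for i in range(73)]
probs = [edge_cdf(t, 1000.0, 50.0) for t in times]
t_est, sigma_est = estimate_edge(times, probs)
```

A least-squares fit of the error function template is expected to be more robust on real data, where the samples are quantized and noisy and the differenced values can become negative.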

Implementation details
An important implementation parameter is the choice of the multiplier parameter K, dictating the delay line clock frequency. In case the phase shift time step size scales with the chosen sampling frequency, using a higher frequency allows improving temporal resolution of the measurement. However, it also increases the length of the delay line and makes timing closure of the design more difficult. If the phase shift time step size is independent of the sampling frequency, the choice of K still dictates the number of phase steps that need to be performed for a full measurement. Reducing the sampling frequency in this case results in a larger number of points to be measured, increasing the total measurement duration. The required measurement time as a function of possible design parameters is given by eq. (2.2). While the reference clock period is usually fixed, the measurement clock frequency (which determines $\Delta t_{\mathrm{Tap}}$) can be chosen for best compromise between FPGA design timing closure and measurement time. The choice of $\Delta t_{\mathrm{Phase}}$ is typically limited to certain values based on the chosen Voltage-Controlled Oscillator (VCO) frequency, but may also be adapted to accuracy and measurement execution time requirements.
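A form of the measurement time that is consistent with the described procedure (N phase steps, each integrating over M reference clock cycles) is given below. It is an assumed reconstruction rather than a quotation of eq. (2.2), and $T_{\mathrm{meas}}$ is a symbol introduced here. Time spent on phase shifting and data transfer is not included.

```latex
T_{\mathrm{meas}} = N \cdot M \cdot T_{\mathrm{Ref}}
                  = \frac{\Delta t_{\mathrm{Tap}}}{\Delta t_{\mathrm{Phase}}} \, M \, T_{\mathrm{Ref}}
```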
Careful selection of the highest-performance input buffer and sampling flip flop (see figure 1) is crucial to achieving the best input pulse width resolution $t_{\mathrm{pw,min}}$. Experiments have shown that use of the sampling flip flops integrated in IO tiles of the FPGA, preferably ones co-located with a DDR receiver, allows much shorter pulses to be sampled than what is possible when using a generic fabric register. Such a pulse width limitation constrains the minimum duty cycle $d_{\mathrm{min}}$ that can be measured for a given input frequency $f_{\mathrm{in}}$ (eq. (2.3)) or equivalently the maximum input frequency $f_{\mathrm{max}}$ for a 50 % duty cycle signal (eq. (2.4)).
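Assuming that the shortest pulse of the input signal must be at least $t_{\mathrm{pw,min}}$ long, these two limits would take the following form. Both expressions are assumed reconstructions and are not quoted from eqs. (2.3) and (2.4).

```latex
d_{\mathrm{min}} = t_{\mathrm{pw,min}} \cdot f_{\mathrm{in}}
\qquad
f_{\mathrm{max}} = \frac{1}{2\, t_{\mathrm{pw,min}}}
```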
An additional advantage of this implementation is that the only register directly contributing to measurement linearity errors is located in the IO tile of the FPGA and its clock is provided via a fixed global route from the PLL. In contrast to the first register, all following registers are placed automatically in the fabric and are fully synchronous and constrained for static timing analysis. This makes the design insensitive to different placement and routing of these cells with changes in the RTL design.
As the measurement accuracy is directly determined by the time step size achieved by phase shifting the measurement clock, attention must be paid to the linearity of this programmable phase shift. It was found that the FPGA integrated PLLs with phase shifting capability may show significant non-linearity over a full VCO period. Although the phase shift is monotonic, this non-linearity causes the measurement time steps to be non-equidistant. Depending on the level of accuracy required from the measurement, these imperfections may be ignored (leading to skew-dependent estimation errors for edge times and jitter standard deviation) or characterized and accounted for during post-processing.

Performance measurements
To demonstrate the measurement concept, an example was implemented and characterized experimentally. The system was designed to synchronously measure the parameters of clock signals at an integer multiple of 40 MHz. It was implemented on a Xilinx Virtex 7 Series FPGA, offering generic PLLs with programmable phase shifting capabilities. Specifically, it offers a resolution of $\Delta t_{\mathrm{Phase}} = T_{\mathrm{VCO}}/56$ [4]. As a compromise between measurement time and required fabric frequency, a sampling clock frequency of 320 MHz has been selected (K = 8). The PLL VCO operates at 640 MHz, resulting in a time resolution of 27.9 ps. In order to sample a full 40 MHz clock period, an eight-tap delay line is required. For the accumulators, a width of 8 bit was chosen, resulting in 256 quantization levels of the sampled Cumulative Distribution Function (CDF). Given the phase step size and the measurement clock period, a total of 112 measurements have to be taken, and for each step a total of 256 reference clock cycles need to be accumulated. According to equation (2.2), this setup results in a minimum measurement time of 717 µs. Some additional time is required for performing the PLL phase shifting as well as for the transfer of data. It must however be highlighted that the measurement duration itself is independent of the number of channels implemented, as the same logic can be instantiated multiple times and operate in a parallel fashion from the same PLL clock.
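The figures quoted above can be cross-checked with a few lines of arithmetic. The variable names are illustrative, and the measurement time is assumed to be the number of phase steps times the number of accumulated cycles times the reference period, excluding phase shifting and data transfer.

```python
f_ref = 40e6       # reference clock frequency in Hz
k = 8              # sampling clock multiplier (320 MHz sampling clock)
f_vco = 640e6      # PLL VCO frequency in Hz
m_cycles = 256     # reference clock cycles accumulated per phase step

t_ref = 1.0 / f_ref                   # 25 ns reference period
dt_tap = t_ref / k                    # 3.125 ns delay line tap spacing
dt_phase = (1.0 / f_vco) / 56         # phase step, about 27.9 ps
n_steps = round(dt_tap / dt_phase)    # 112 phase steps per tap interval
t_meas = n_steps * m_cycles * t_ref   # about 717 us

print(f"dt_phase = {dt_phase * 1e12:.1f} ps")
print(f"n_steps  = {n_steps}")
print(f"t_meas   = {t_meas * 1e6:.1f} us")
```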
With section 2.2 having established that the measurement performance relies on the estimation of the two model parameters $t_{\mathrm{edge}}$ and $\sigma_{\mathrm{edge}}$, the quality of their estimation was characterized in detail. The test setup used for this characterization is shown in figure 4. The measurement system is synchronized to a low jitter clock synthesizer, which provides a 40 MHz reference clock. A pulse generator is used to generate diverse input waveforms with variable skew and jitter. Measurements of the same signal obtained from a high-bandwidth oscilloscope are used as a golden reference measurement to exclude imperfections from the pulse generator. Figure 5 shows the measurement results for the time estimation error of $t_{\mathrm{edge}}$ as described in section 2.2, which is the underlying observable for the calculation of signal skew and duty cycle. The measurement covers a single full reference clock period (25 ns) with a 1 ps step size. It can be clearly seen that the estimation error follows a repeating pattern with a period of $T_{\mathrm{VCO}}$, which can be attributed to the non-linearity of the PLL phase shifting capability. The worst skew estimation error was found to be 61.6 ps, while the rms error equals 13.7 ps.
To evaluate the jitter estimation precision, the clock signal generator was delay modulated with white noise having different amplitudes. The proposed measurement system was used to estimate the applied input jitter. Figure 6 shows the resulting estimation error. Repeated series of measurements were conducted at different signal skews, shown in grey. A mean over all measurements was calculated, highlighted in black. Two main observations can be made. First, for input rms jitter values below 20 ps, the measurement error is dominated by quantization noise, which produces a floor of $\Delta t_{\mathrm{Phase}}/\sqrt{12} \approx 8$ ps. Second, the jitter estimation tends to consistently over- or underestimate the jitter depending on the input signal skew by up to 20 %. Again, this effect can be attributed to non-linearity of the programmable PLL phase shift. This can be concluded from the fact that the mean value of the rms jitter estimations, collected for different skews, is in very good agreement with the input rms jitter. A calibration of the measurement setup can be used to remove this skew-dependent bias off-line. Even without this calibration, the setup can be used to reliably detect excess jitter on clock signals.
The minimum pulse width supported by the measurement system was also evaluated. Using a dedicated IDDR input sampling register, the minimum pulse width was found to be 250 ps, while for a generic fabric register only 500 ps was achievable. This minimum pulse width results in a maximum signal frequency of 2 GHz at 50 % duty cycle or a minimum duty cycle of 25.6 % at a frequency of 1280 MHz, which was the highest frequency tested in the experiment.

Method 1 summary
A measurement concept for the characterization of clock signals produced by integer-N frequency synthesizers and clock distribution circuits has been implemented using built-in FPGA resources. It can be employed for reliably measuring signal parameters such as frequency, duty cycle, skew and jitter. The system allows for a wide range of trade-offs to optimize measurement performance or measurement duration. An implementation example has been characterized and the main contribution to measurement errors has been identified. The method is able to provide measurements accurate to tens of picoseconds without calibration. Calibration and offline data processing can be used to further improve the measurement performance. The system was successfully integrated into the production testing setup of the Low-Power Gigabit Transceiver (lpGBT) ASIC, characterizing 32 clock signals simultaneously [6].

Method 2: measurement method for dynamic signal parameters
A recurring requirement in the testing of radiation-hardened clock synthesis and distribution circuits is the detection of sudden phase or frequency jumps of a clock signal. With more stringent requirements on phase stability of these systems, the requirements for circuit characterization in radiation environments increase as well.
Direct, dead-time free real time detection and acquisition of small signal phase excursions is a task not currently covered by off-the-shelf high-end measurement equipment. FPGA based systems implemented for this purpose up to this point offered only limited resolution [3]. Measurement systems based on oscilloscopes can acquire very precise phase information, but fail to trigger on phase deviations of the signal and can therefore not be used to reliably detect SEEs that manifest as a change in clock phase.
The architecture presented here mitigates both of these shortcomings by realizing a dead-time free measurement system capable of directly triggering on signal phase with time-domain resolution of less than 100 ps. This is achieved by re-purposing the receivers intended for high-speed communication links, which can nowadays be considered a common peripheral of FPGAs.

Circuit architecture
The circuit implementation is shown in figure 7. The GTX transceivers available in Virtex 7 FPGAs are configured to operate in 'RX CDR Lock to Reference' mode, in which the CDR circuitry is effectively bypassed and instead the deserializer clock is locked to an externally provided reference clock [5]. When this reference clock is derived from the same clock that is used as a reference for the DUT, any of its output clock signals can be sampled with a very high oversampling ratio and fixed phase relationship. If used in this way, the GTX deserializer is effectively re-purposed as a Time-to-Digital Converter (TDC). Because the GTX deserializer can be operated at data rates up to 10.51 Gb/s, this mode of operation allows achieving time resolutions better than 100 ps. Multiple parallel measurement channels can be implemented by using more than one GTX receiver in this configuration. Virtex 7 FPGAs typically allow multiple receivers to be clocked from the same reference clock, so that synchronous multichannel measurement systems can be implemented.

Measurement operation
In the described mode of operation, the GTX core produces deserialized, multi-bit binary data words at a word clock frequency much lower than the deserializer bit rate. Several of these data words can be concatenated to capture a full reference clock period, and the resulting waveform can be transferred into the reference clock domain. In order to extract dynamic signal parameters from it, the logic required for determination of frequency, phase and duty cycle is implemented in generic FPGA fabric. This logic only needs to operate at the reference clock frequency, which makes timing closure straightforward to accomplish.
Frequency detection can be realized by counting the number of rising edges in the deserialized data word. Information about the position of the clock signal edges (and therefore information about clock phase and duty cycle) can be found directly from evaluating the location of edges in the deserialized data word. Many other measurements (such as the presence of glitches in the input clock) can be implemented through simple logic functions similar to the ones described. An example of how a phase determination might be implemented is outlined in figure 8. The addition of triggering capabilities for any of these described observables can be easily implemented and used to store transients of arbitrary length to a memory, which can then be read out for analysis or to trigger an oscilloscope. Other applications such as event counting or time-over-threshold measurements can be implemented this way as well.
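The edge-based evaluation of a deserialized word can be modelled in software as follows. This is a sketch of the logic functions described above, not the fabric implementation. The bit order (index 0 is the earliest sample), the cyclic treatment of the word boundary and the function name are assumptions.

```python
def analyse_word(bits):
    """Extract frequency multiple, phase and duty cycle from one captured word.

    bits holds the samples of one full reference clock period, earliest first.
    The word is treated as cyclic, so an edge at the word boundary is found
    by comparing bit 0 with the last bit.
    """
    width = len(bits)
    rising = [i for i in range(width) if bits[i - 1] == 0 and bits[i] == 1]
    return {
        "multiple": len(rising),                 # rising edges per reference period
        "phase": rising[0] if rising else None,  # position of the first rising edge
        "duty": sum(bits) / width,               # fraction of samples that are high
    }


# Example: 256-bit word, one clock period, rising edge at bit 10, 50 % duty cycle
word = [1 if 10 <= i < 138 else 0 for i in range(256)]
result = analyse_word(word)
print(result)  # {'multiple': 1, 'phase': 10, 'duty': 0.5}
```

A trigger condition or glitch detector can be derived from the same edge lists, for example by checking for an unexpected number of edges in a word.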

Performance measurements
In order to characterize the measurement performance, the proposed architecture was implemented on a Xilinx Virtex 7 FPGA evaluation platform and one of the integrated GTX receivers was used. The performance of this measurement is determined by the temporal linearity of the deserializer which is used as a differential receiver and TDC. Similarly to the test setup in section 2.5, a DUT reference clock and output clock of 40 MHz were used, from which a GTX reference clock frequency of 320 MHz is synthesized using a PLL integrated in the FPGA. This reference clock is then further used to generate a bit clock of 10.24 GHz for the deserializer. Choosing 10.24 GHz as the deserializer clock frequency has the advantage of being an integer, power-of-two multiple of the chosen reference frequency. This configuration produces 256 deserialized bits per reference clock period (25 ns), which corresponds to a phase resolution of 97.65 ps.
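The relation between bit clock, word length and phase resolution is plain arithmetic and can be checked as follows (variable names are illustrative).

```python
f_ref = 40e6      # DUT reference and output clock frequency in Hz
f_bit = 10.24e9   # deserializer bit clock frequency in Hz

bits_per_period = round(f_bit / f_ref)  # samples per 25 ns reference period
lsb_ps = 1e12 / f_bit                   # duration of one sample in ps

print(bits_per_period)                  # 256
print(f"{lsb_ps:.3f} ps")               # 97.656 ps
```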
The phase measurement used for characterization of the system was implemented simply by determining the position of a rising edge in the deserialized data word. A triggering system was implemented using a window comparison with configurable window width. A configurable number of phase measurements before and after a trigger event are stored in a dual-port memory for retrieval by the DAQ system.
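The window comparison with pre- and post-trigger storage can be modelled as shown below. This software sketch makes several simplifying assumptions: phase values are plain integers in LSB, phase wrap-around at the period boundary is ignored, and trigger events occurring during an ongoing capture are merged into that capture. The function and parameter names are hypothetical.

```python
from collections import deque


def capture_transients(phases, nominal, window, pre, post):
    """Return the phase samples around each trigger event.

    A trigger fires when a phase value leaves the window nominal +/- window.
    Each capture holds up to `pre` samples before the trigger, the trigger
    sample itself and `post` samples after it.
    """
    history = deque(maxlen=pre)   # ring buffer with the most recent samples
    captures = []
    active = None
    remaining = 0
    for p in phases:
        if active is None and abs(p - nominal) > window:
            active = list(history) + [p]   # pre-trigger samples plus trigger sample
            remaining = post
        elif active is not None:
            active.append(p)
            remaining -= 1
        if active is not None and remaining == 0:
            captures.append(active)
            active = None
        history.append(p)
    return captures


phases = [10, 10, 10, 10, 14, 13, 11, 10, 10, 10]
events = capture_transients(phases, nominal=10, window=2, pre=2, post=3)
print(events)  # [[10, 10, 14, 13, 11, 10]]
```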
Essentially, the DUT clock phase is quantized by means of sampling it with the deserializer bit clock. Similar to quantization processes in the amplitude domain, this time domain quantization is non-ideal and warrants a closer look to better understand the performance of the implemented measurement. Deterministic jitter components correlated with the reference clock present on the sampling clock may introduce Integral Non-Linearity (INL) and Differential Non-Linearity (DNL) into the measurement. Such non-linearities can compromise the quality of measurements derived from the digital phase information. Figure 9 shows measurements of INL and DNL for the proposed implementation for a complete reference period. The maximum INL was found to be 0.53 LSB, while the maximum DNL is 0.15 LSB. An example transient captured using the implemented method is shown in figure 10.
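DNL and INL can be derived from the measured code transition times of the TDC. The sketch below uses a common endpoint-based definition, in which the LSB is the average bin width. It is an assumption that this matches the evaluation behind figure 9, and the function name and example data are illustrative.

```python
def dnl_inl(transitions):
    """Compute DNL and INL in LSB from measured code transition times.

    transitions[i] is the input delay at which the output code changes
    from i to i + 1. The LSB is taken as the average bin width.
    """
    widths = [b - a for a, b in zip(transitions, transitions[1:])]
    lsb = (transitions[-1] - transitions[0]) / len(widths)
    dnl = [w / lsb - 1.0 for w in widths]   # deviation of each bin width from 1 LSB
    inl = []
    acc = 0.0
    for d in dnl:                           # INL is the running sum of the DNL
        acc += d
        inl.append(acc)
    return dnl, inl


# Example with four bins of 100, 90, 110 and 100 ps
dnl, inl = dnl_inl([0.0, 100.0, 190.0, 300.0, 400.0])
```

With this definition the INL returns to zero at the end of the characterized range by construction.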

Method 2 summary
A real-time clock signal measurement system based on the reuse of high-speed data link receivers available in Xilinx Virtex 7 FPGAs has been presented, implemented and characterized. It allows for determination of, and direct triggering on, signal parameters such as frequency, phase and duty cycle as well as the detection of signal glitches in transient measurement applications, i.e. for each individual clock cycle. Parameter determination, trigger processing and data handling are implemented using generic FPGA fabric, allowing for the design of sophisticated measurement tasks and simple integration into larger DAQ systems. An example implementation demonstrated single-shot phase measurement resolution better than 100 ps and dead-time free trigger capability on phase transients of this magnitude, which was previously not possible using only off-the-shelf measurement equipment. Multiple channels can be implemented by using multiple high-speed receivers available in the selected FPGA. With receiver performance improving in recent FPGA generations, which support data rates beyond the 10 Gb/s presented in this work, a further improvement in measurement resolution is achievable while utilizing the same measurement concept. As a potential example for further study, the Xilinx Ultrascale+ GTY transceivers supporting Non-return-to-zero (NRZ) line rates of 32.75 Gb/s can be mentioned.

Conclusions
In this paper, two methods that rely exclusively on built-in FPGA resources to characterize clock signals were presented. These techniques can be used in applications such as production testing, since they are easily parallelized, as well as for test campaigns for which integration into an existing data acquisition system is desired and transient measurement and triggering ability is required to characterize single-event radiation effects. Both methods were successfully used in the characterization process of the lpGBT ASIC. The first measurement method was used to monitor the evolution of clock skew and duty cycle during prolonged exposure to ionizing radiation and it is also employed for production testing of the ASIC. The second method was successfully used during multiple radiation test campaigns focused on detection of single event effects in the lpGBT PLL [7]. Both presented methods already allow characterizing the parameters of clock signals to better than 100 ps and their performance is predicted to improve when being implemented on future FPGA generations.