A massively scalable Time-to-Digital Converter with a PLL-free calibration system in a commercial 130 nm process

A Time-to-Digital Converter (TDC) with a 33.6 ps LSB was designed in a 130 nm BiCMOS technology. The core of the converter is a differential 9-stage ring oscillator based on a multi-path architecture. A novel version of this design is proposed, along with an analytical model of its linearity. The model allowed us to understand the source of the superior linearity of our design and to predict further improvements. The oscillator is integrated in an event-by-event self-calibration system that avoids any PLL-based synchronization. For this reason, and thanks to the compactness and simplicity of the architecture, the proposed TDC is suitable for applications that require a large number of converters and massive parallelization, such as High-Energy Physics and medical imaging detector systems. A test chip for the TDC has been fabricated and tested. The TDC shows a DNL ≤ 1.3 LSB, an INL ≤ 2 LSB and a single-shot precision of 19.5 ps (0.58 LSB). The chip dissipates 5.4 mW overall.


Introduction
Time-to-digital converters (TDCs) have a significant impact on the performance of timing detectors whenever high time resolution is sought. In medical imaging and High-Energy Physics (HEP) applications [1] [2], a single chip often needs to integrate a large number of TDCs with a time resolution better than 100 ps to improve the quality of image reconstruction. For this reason, a simple, compact, easily scalable, low-power design is crucial for this kind of application. The TDC architecture proposed in this paper was designed to combine all the specifications that high-time-resolution pixel detectors require. The converter is based on a free-running ring oscillator (RO) that performs an event-by-event measurement of its own oscillation frequency, which compensates for potential (or unavoidable) drifts. This architecture therefore allows a simple and compact implementation that avoids any PLL-based synchronization system. This approach was first investigated during the development of various chips for timing detectors, such as those produced for a full-silicon Positron Emission Tomography (PET) scanner at the University of Geneva [3] [4] and for the proposal of a new preshower system for the FASER experiment at CERN.

As anticipated, detectors for HEP and medical imaging applications can guarantee better performance if the system features a large number of TDCs with a time resolution in the order of tens of picoseconds [5]. Detectors with a more precise time measurement system can perform a better reconstruction of the particles they need to sense. For instance, in many PET scanners, the Time-of-Flight information is fundamental to reduce the positional uncertainty of the annihilation points of the positrons produced in the body under examination [6]. In a generic pixel detector with timing capabilities, connecting each pixel to its own TDC channel would be the ideal solution for efficiency purposes. In this case, every portion of the matrix is independent of the others, and the system can store the timing information even when all the pixels are hit at the same time. However, especially for monolithic pixel detectors, this solution is difficult to implement for various reasons, including area, routing complexity and power consumption. Hence, different design strategies are needed, such as the one illustrated in Figure 1. The matrix of the detector chip can be divided into sub-matrices. In the example of the figure, they are composed of 2 x 2 pixels, and each pixel of a sub-matrix is connected to a different TDC channel through the fast-OR blocks, together with the corresponding pixels of the other sub-matrices. In this way, simultaneous hits on pixels of different channels (indicated with numbers from 1 to 4 in Figure 1) can be correctly detected. Connecting the pixels of a sub-matrix to separate converters avoids problems related to large cluster sizes because, in many detectors, the particles to be sensed can generate signals in groups of adjacent pixels [7]. The number of TDCs is chosen on the basis of the cluster size and the event rate, taking into account, as mentioned before, the power consumption and the area of the converter. If multiple hits occur on the same channel within a time window shorter than the dead time of the TDC, the converter disables the fast-OR block after the first hit to prevent other hits from interfering with the measurement. A possible improvement of this architecture is a design that, in the multiple-hit scenario, stores the matrix position of all the pixels hit after the first one, without timing information.

For all these reasons, the goal of the present work was to design a simple, compact and low-power TDC. Moreover, as will be shown in Section 2, the proposed converter has a PLL-less architecture, which further reduces power consumption, complexity and area, so that more TDC channels can be integrated in a single chip. The integration of the presented TDC inside a timing detector system requires a calibration process. Indeed, the differences among the delays of the ring oscillator and of the counters used for the coarse component of the measurement can degrade the accuracy of the converter. To compensate for this effect, a possible calibration approach is to send a periodic known event (synchronous with the reference clock) to the TDC. A set of offset parameters is then applied to the outputs of the system (given by Eqs. 2.12-2.14, as explained in Section 2) in order to minimize the standard deviation of the measured values.
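The sub-matrix and fast-OR scheme described above can be sketched as a small behavioral model. This is a minimal illustration built on several assumptions: a 2 x 2 sub-matrix as in the Figure 1 example, channels numbered 1 to 4, and a fixed per-channel dead time. The names `channel_of`, `Channel`, `process` and the 50 ns dead time are hypothetical and do not come from the paper. The list of untimed hits models the possible improvement mentioned above, which stores the position of later hits without timing information.

```python
from dataclasses import dataclass, field

SUB = 2  # side of the pixel sub-matrix (2 x 2 in the Figure 1 example)


def channel_of(row: int, col: int) -> int:
    """Hypothetical mapping: pixels at the same position inside every
    sub-matrix share one fast-OR, hence one TDC channel (1 .. SUB*SUB)."""
    return (row % SUB) * SUB + (col % SUB) + 1


@dataclass
class Channel:
    """Behavioral TDC channel with a fixed (hypothetical) dead time."""
    dead_time: float
    busy_until: float = float("-inf")
    timestamps: list = field(default_factory=list)    # (time, pixel) of timed hits
    untimed_hits: list = field(default_factory=list)  # pixels hit while busy

    def hit(self, t: float, pixel: tuple) -> None:
        if t >= self.busy_until:
            # First hit: the TDC measures it and the fast-OR is disabled.
            self.timestamps.append((t, pixel))
            self.busy_until = t + self.dead_time
        else:
            # Later hits within the dead time: position only, no timing.
            self.untimed_hits.append(pixel)


def process(hits, dead_time=50.0):
    """Route (time, (row, col)) hits to their channels in time order."""
    channels = {c: Channel(dead_time) for c in range(1, SUB * SUB + 1)}
    for t, (row, col) in sorted(hits):
        channels[channel_of(row, col)].hit(t, (row, col))
    return channels


if __name__ == "__main__":
    # A 2 x 2 cluster lands on four different channels, so every hit is timed.
    cluster = [(10.0, (4, 4)), (10.0, (4, 5)), (10.0, (5, 4)), (10.0, (5, 5))]
    chans = process(cluster)
    print({c: len(ch.timestamps) for c, ch in chans.items()})  # {1: 1, 2: 1, 3: 1, 4: 1}
```

With this mapping, adjacent pixels always fall on different channels. A cluster therefore spreads across converters instead of saturating one.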

TDC basics and common architectures
As introduced before, the development of a timing detector with (tens of) picosecond-level resolution requires a TDC that can measure time with a precision of the same order of magnitude. Indeed, as explained in [8], the quantization error of an ideal TDC (assuming a uniform distribution) has a standard deviation $\sigma_q$ proportional to the time of the Least Significant Bit (LSB), $T_{LSB}$:

$$\sigma_q = \frac{T_{LSB}}{\sqrt{12}} \qquad (1.1)$$

The LSB is often referred to as the resolution of the converter [2]. One of the traditional and most common approaches to designing a TDC is based on Ring Oscillators (ROs) [9] [10] [11]. A time interval $T$ can be measured by counting the number of oscillator cycles $n$ within the interval and sampling the RO state at the edges of $T$, leading to

$$T = n\,T_{RO} + T_{fine} + \varepsilon_q$$

where $T_{RO}$ is the period of the RO, $T_{fine}$ is the result of sampling the oscillator state, which produces the fine component of the measurement, and $\varepsilon_q$ is the quantization error. More recently, other architectures have been proposed. For example, [12] presents an interpolative voltage-controlled oscillator (VCO). In this solution, the outputs of all the nodes of the structure are used to precharge further nodes of the oscillator, which increases the oscillation frequency. This implementation features an r.m.s. jitter of 1.25 ps and a maximum frequency of 4.6 GHz in 180 nm CMOS technology, and it may be used to design both time digitizers and Phase-Locked Loops (PLLs). A similar design approach has been adopted for the time conversion system integrated in the Blumino SiPM developed at EPFL [13]. The architecture proposed in the present paper features a similar mechanism to increase the oscillation frequency.
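The two relations above can be checked with a short Monte Carlo sketch. The first function measures an interval with a coarse cycle count plus a truncated fine code. The second estimates the spread of the residual error, which for uniformly distributed input times should approach $T_{LSB}/\sqrt{12}$. This is an illustration under assumptions: an ideal RO with $T_{RO} = 2N\tau$ (the ring relation discussed in Section 2), $N = 9$, a truncating fine code, and the 33.6 ps value used only as an example LSB.

```python
import math
import random


def ro_measure(t: float, tau: float, n_stages: int) -> float:
    """Coarse + fine measurement of an interval t with an ideal RO.

    Assumes T_RO = 2 * n_stages * tau and a truncating fine code, so the
    residual (quantization) error t - estimate lies in [0, tau).
    """
    t_ro = 2 * n_stages * tau
    n = math.floor(t / t_ro)                  # coarse: counted RO cycles
    fine = math.floor((t - n * t_ro) / tau)   # fine: sampled RO state
    return n * t_ro + fine * tau


def quantization_sigma(lsb: float, samples: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo standard deviation of the quantization error."""
    rng = random.Random(seed)
    errors = []
    for _ in range(samples):
        t = rng.uniform(0.0, 1000.0 * lsb)
        errors.append(t - ro_measure(t, lsb, n_stages=9))
    mean = sum(errors) / samples
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / samples)


if __name__ == "__main__":
    lsb = 33.6  # ps, example value
    print(f"Monte Carlo: {quantization_sigma(lsb):.2f} ps, "
          f"Eq. 1.1: {lsb / math.sqrt(12):.2f} ps")
```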
Another solution, which exploits the cyclic interpolation of a switched-frequency RO, can measure time intervals up to 375 µs with a precision of 4.2 ps [14].
In conventional RO-based architectures, the resolution of the converter is limited by the delay $\tau$ of a single cell of the oscillator [8]. To overcome this limitation, Vernier delay lines have often been used [15]. These solutions usually feature two delay lines with different stage delays $\tau_1$ and $\tau_2$, and the converter has an LSB equal to $\Delta\tau = \tau_2 - \tau_1$. However, the main limitation of this solution is the measurement range of the converter, $T_{range} = N\,\Delta\tau$, where $N$ is the number of stages of the delay lines. For a given $\Delta\tau$, a wider range requires a larger $N$, which increases the power consumption. Various architectures can overcome this trade-off. Examples are cyclic Vernier lines that extend the maximum measurable time range, such as the one presented in [16], and 2-D Vernier lines [17], an efficient solution that obtains $N$ quantization levels using only $\sqrt{N}$ stages. However, the complexity of these structures makes them unsuitable for the goals stated above.
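The range/stage trade-off of a linear Vernier line can be quantified in a few lines. This is a minimal sketch with hypothetical numbers: delays of 40 ps and 45 ps (a 5 ps LSB) and a 10 ns range. The 2-D estimate simply applies the $\sqrt{N}$ scaling quoted above and ignores any implementation overhead.

```python
import math


def vernier_stages(t_range: float, tau1: float, tau2: float) -> int:
    """Stages N of a linear Vernier line covering t_range with LSB = tau2 - tau1."""
    lsb = tau2 - tau1
    if lsb <= 0:
        raise ValueError("tau2 must be larger than tau1")
    return math.ceil(t_range / lsb)


def vernier_2d_stages(levels: int) -> int:
    """Stage count of order sqrt(levels), as quoted for 2-D Vernier lines."""
    return math.ceil(math.sqrt(levels))


if __name__ == "__main__":
    # Hypothetical example: 5 ps LSB over a 10 ns (10 000 ps) range.
    n = vernier_stages(10_000.0, 40.0, 45.0)
    print(n, vernier_2d_stages(n))  # 2000 45
```

Doubling the range doubles the stage count of the linear line. An RO-based TDC avoids this cost because the range is extended by the coarse counter.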

Architecture
The design process of the present TDC focused not only on a simple and compact architecture but also on the optimization of other fundamental parameters, such as time resolution and linearity, which play a crucial role in the performance of timing detectors. This analysis was supported by analytical modeling and validated by extensive simulations. The proposed converter has been designed in a 130 nm BiCMOS technology, which our group has used to design various pixel detectors and their front-end systems. However, no bipolar transistors were used in the TDC, so the analysis can be extended to a pure CMOS technology node.

Design
The presented TDC is composed of an RO with 9 pseudo-differential pseudo-NMOS delay cells, depicted in Figure 2a. Each output pair of these cells is connected to a pseudo-NMOS Differential Cascode Voltage-Switch-Logic (DCVSL) buffer [18], shown in Figure 2b. The pseudo-NMOS architecture was chosen to increase the oscillator frequency, because the load connected to each cell then does not include the gate capacitances of PMOS transistors. In a conventional RO, the oscillation frequency is the inverse of twice the time a signal needs to propagate through the chain of delay cells:

$$f_{RO} = \frac{1}{2N\tau} \qquad (2.1)$$

where $N$ is the number of stages of the oscillator and $\tau$ is the delay of a single stage, which limits the time resolution of a TDC based on a conventional RO. However, a feedforward (also called multi-path) design has been applied to increase the speed of the system. This reduces $\tau$ and therefore improves the resolution (the LSB is given by $\tau$, as explained in Section 1). Each delay cell of Figure 2a has two differential inputs. One is connected to the output of the previous cell, and the other to the outputs of the buffer of the cell placed four stages earlier in the RO. In this way, each buffer anticipates the charge or discharge of the input of a later cell, which increases the oscillation frequency, as shown in Figure 3. As a result, simulations show that the nominal $f_{RO}$ increases by almost 45% with respect to the case in which the multi-path architecture is not adopted. Moreover, the inputs of one of the delay cells must be inverted, as displayed in Figure 4, so that the loop contains an odd number of inverting stages and the circuit oscillates properly. Because of the way the stages are connected (Figure 3), each output propagates along the chain without being inverted, as depicted in Figure 4. For this reason, the connection drawn in blue in Figure 4 is essential to satisfy the Barkhausen oscillation criterion [19][20][21]. A single inversion was chosen to obtain better layout symmetry. The role of the buffers is to decouple the output nodes of the RO from the loads of the circuit, i.e. the latch stages used to sample the state of the oscillator. However, in our design, these blocks are also placed in the feedforward paths. This increases the linearity of the converter and reduces the effect of mismatch among the buffers by exploiting the feedback loops of the oscillator. To clarify this point, consider the simple 5-stage multi-path RO depicted in Figure 5 (the result of the following analysis is general and also applies to structures with more stages). The dashed line represents the conventional multi-path architecture, in which the feedforward is provided directly by the outputs of the delay cells. In the proposed RO, the buffers provide the input to later delay cells through the dotted connections of Figure 5. The following analysis evaluates the effect of an output-buffer mismatch on the linearity of the architecture in both scenarios depicted in Figure 5.
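Eq. 2.1 links the cell delay to the oscillation frequency. Since the LSB equals $\tau$, the measured LSBs can be converted into the corresponding RO frequencies. This is a minimal sketch assuming $N = 9$ and LSB $= \tau$ as stated in the text. The helper `loop_oscillates` is a hypothetical, simplified version of the Barkhausen phase condition (odd number of inversions around the loop). It does not model the actual pseudo-differential cells.

```python
N_STAGES = 9  # delay cells in the proposed RO


def ro_frequency(tau_s: float, n_stages: int = N_STAGES) -> float:
    """f_RO = 1 / (2 N tau): an edge travels around the ring twice per period."""
    return 1.0 / (2 * n_stages * tau_s)


def loop_oscillates(inverting_links) -> bool:
    """Simplified phase condition: a ring of static stages can only
    oscillate if the number of inverting links around the loop is odd."""
    return sum(inverting_links) % 2 == 1


if __name__ == "__main__":
    for vdd, lsb_ps in [(1.4, 38.7), (1.6, 33.6)]:
        f = ro_frequency(lsb_ps * 1e-12)
        print(f"VDD = {vdd} V: LSB = {lsb_ps} ps -> f_RO = {f / 1e9:.3f} GHz")
    # Each output propagates without inversion, so one crossed link is needed.
    print(loop_oscillates([0] * N_STAGES),
          loop_oscillates([0] * (N_STAGES - 1) + [1]))
```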
The parameters $\tau_i$, with $i = 0, 1, \dots, 4$, are the delays of the inverters of the oscillator, while the (non-inverting) buffers have a nominal delay $\Delta$. To analyze the linearity of the system, we use the Differential Non-Linearity (DNL), defined as

$$\mathrm{DNL}(k) = \frac{\tau_k - \tau_{LSB}}{\tau_{LSB}} \qquad (2.2)$$

where $k$ is the code of the converter, $\tau_k$ is the delay (bin width) associated with code $k$, and $\tau_{LSB}$ is the ideal delay which, as stated before, corresponds to the ideal LSB. Consider the first case (dashed connection) with ideal delays $\tau_i = \tau\ \forall i$, and assume that, because of mismatches, the delay of the first buffer is $\Delta_0 \neq \Delta$. The DNL is then

$$\mathrm{DNL}(0) = \frac{\Delta_0 - \Delta}{\tau} \qquad (2.3a)$$

$$\mathrm{DNL}(k) = 0, \quad k = 1, \dots, 4 \qquad (2.3b)$$

since $\Delta_0$ only affects the DNL value related to the first cell. More precisely, the mismatch $\Delta_0 \neq \Delta$ may generate a bubble in the output code (see Section 3). In this example, the DNL of the RO can be evaluated with Eq. 2.3 only by assuming that an efficient bubble correction algorithm has been implemented. The same assumption is used for the rest of the section.

Characterizing the behavior of the RO requires a parameter that links the effect of the feedforward connections to the speed of the system. The delay $\tau_i$ is a function of the difference $\Delta t_i$ between the arrival times of the two inputs of cell $i$. Expanding $\tau_i = \tau_i(\Delta t_i)$ in a Taylor series and neglecting all the terms beyond the linear one, we obtain

$$\tau_i(\Delta t_i) \simeq \tau_i(0) + \left.\frac{d\tau_i}{d\Delta t_i}\right|_{0} \Delta t_i \qquad (2.4)$$

Figure 5 shows that, in the dashed-line case, $\Delta t_i = -2\tau$. Replacing this relation in Eq. 2.4 leads to

$$\tau = \tau_{max} - 2\alpha\tau \;\Rightarrow\; \tau = \frac{\tau_{max}}{1 + 2\alpha} \qquad (2.5)$$

where $\tau_{max} = \tau_i(0)$ is the maximum value of $\tau_i$ (obtained when no multi-path architecture is implemented), and $\alpha = d\tau_i(0)/d\Delta t_i$ is the feedforward parameter introduced above. Simulations of the cell in Figure 2a justify the approximations of Eqs. 2.4 and 2.5, with values of $\alpha \approx 0.25$. However, the analysis reported in this paper is general and can easily be extended to situations in which the non-linear terms are not negligible. The star-marked curves of Figure 6 show the maximum and the Root Mean Square (RMS) value of the DNL as a function of $\alpha$, with $\tau_{max} = \Delta = 50$ ps and $\Delta_0 = 70$ ps.

For the proposed solution (dotted line in Figure 5), the non-linearities in the case $\Delta_0 \neq \Delta$ can be evaluated by analyzing the distribution of the edge times $t_i$ at each node of the oscillator. Following the same approach as for Eqs. 2.4 and 2.5, and considering the delay buffers in the feedforward paths, these times can be expressed through a recursive relation (Eq. 2.6). A numerical approach was used to compute the values of $t_i$ over enough oscillator cycles for all the cell delays $\tau_i$ to reach their convergence values. The DNL can then be calculated with Eq. 2.2, replacing $\tau_{LSB}$ with the average cell delay $\bar{\tau}$ and accounting for $\Delta_0 \neq \Delta$ as done for Eq. 2.3a. The plots in Figure 6 show that, for the proposed solution (dashed-line curves), the RMS and the maximum absolute value of the DNL are smaller than those of the usual feedforward architecture (star-marked curves). The same parameters can also be compared as a function of the cell delay (LSB). Figure 7 shows that the non-linearity of the proposed solution remains smaller even when $\tau$ and $\bar{\tau}$ are comparable. The use of $\bar{\tau}$ instead of $\tau$ is justified in Subsection 2.2. There, the TDC features an event-by-event calibration system that compensates for potential variations of the oscillation period by measuring the RO frequency against an external reference signal.

A simplified approach can also be used to analyze the behavior of the proposed solution. This approach neglects the variation of each $\tau_i$ caused by variations of the other cell delays and considers only the impact of $\Delta$. As shown later, this simplification gives results similar to those of the more detailed approach, because this analysis only evaluates the effect of the buffer mismatch. Following the same considerations that lead to Eq. 2.5, and noting that the feedforward input now arrives through a buffer ($\Delta t_i = \Delta - 2\tau$), the cell delays become

$$\tau = \frac{\tau_{max} + \alpha\Delta}{1 + 2\alpha} \qquad (2.7)$$

However, the mismatch of the first buffer also affects the delay of the cell it drives, $\tau_3 \neq \tau$, which can be expressed as

$$\tau_3 = \tau_{max} + \alpha(\Delta_0 - 2\tau) = \tau + \alpha(\Delta_0 - \Delta) \qquad (2.8)$$

The new value of $\tau_3$ also changes the oscillation period of the RO:

$$T_{RO} = 2\left(4\tau + \tau_3\right) \qquad (2.9)$$

From Eq. 2.9, the equivalent LSB of the system (i.e., the average elementary delay of the cells) is

$$\bar{\tau} = \frac{T_{RO}}{10} = \frac{4\tau + \tau_3}{5} \qquad (2.10)$$

Figures 6 and 7 (caption excerpt): DNL calculated with Eq. 2.3 for the usual connection case, with Eq. 2.11 for the proposed solution, and from the edge-time distribution of Eq. 2.6 for the more detailed model.
Thus, the DNL of the architecture is given by

$$\mathrm{DNL}(0) = \frac{\tau + \Delta_0 - \Delta - \bar{\tau}}{\bar{\tau}}, \qquad \mathrm{DNL}(3) = \frac{\tau_3 - \bar{\tau}}{\bar{\tau}}, \qquad \mathrm{DNL}(k) = \frac{\tau - \bar{\tau}}{\bar{\tau}},\ k = 1, 2, 4 \qquad (2.11)$$

In an $N$-stage RO-based TDC, the system can output $2N$ different codes in total. Hence, $\mathrm{DNL}(k)$ should be defined for $k = 0, 1, \dots, 2N - 1$. However, in this simplified analysis, the rise and fall times of the cells are assumed to be perfectly equal. The mismatches then affect $\mathrm{DNL}(k)$ for $k = i$ and $k = i + N$, with $i = 0, 1, \dots, N - 1$, in the same way. For this reason, only half of the DNL values need to be considered, as done for Eqs. 2.3 and 2.11. In Figures 6 and 7, the solid lines show the non-linearities of the architecture obtained with this simplified approach. The approximations of this analysis are negligible for low values of $\alpha$ because the feedforward then has a reduced impact. Even for larger $\alpha$, however, the proposed solution shows lower non-linearities. Finally, the choice of a differential architecture, despite the increase in power consumption, is also motivated by the improved linearity of the system. Simulations show that the DNL of a single-ended solution is almost 14% higher than that of an equivalent differential structure.
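The two linearity models can be reproduced numerically for the 5-stage example. The sketch below is a hedged illustration, not the authors' code. `simplified_delays` implements Eqs. 2.7 and 2.8, and `dnl` uses the average cell delay as the reference. The edge-time recursion in `numerical_delays` is an assumed form of Eq. 2.6: the cell at node $k$ is driven by edge $k-1$ directly and by edge $k-3$ through the feedforward delay, with the linearized cell delay of Eq. 2.4. Treating the buffer mismatch as a stretch of bin 0 by $\Delta_0 - \Delta$ is also an assumption about how $\Delta_0$ enters Eqs. 2.3a and 2.11. The parameter values are those quoted for Figure 6.

```python
N = 5  # stages of the example RO of Figure 5


def simplified_delays(tau_max, alpha, delta, delta0):
    """Cell delays of the proposed RO from Eqs. 2.7-2.8 (simplified model)."""
    tau = (tau_max + alpha * delta) / (1 + 2 * alpha)   # Eq. 2.7
    tau3 = tau + alpha * (delta0 - delta)                # Eq. 2.8
    return [tau, tau, tau, tau3, tau]


def numerical_delays(tau_max, alpha, ff_delays, edges=500):
    """Steady-state cell delays from an assumed edge-time recursion.

    Edge k occurs at node k % N. Its cell is driven directly by edge k-1 and,
    through a feedforward path of delay ff_delays[(k-3) % N], by edge k-3.
    """
    t = [0.0, tau_max, 2 * tau_max]                        # arbitrary start-up edges
    for k in range(3, edges):
        dt = t[k - 3] + ff_delays[(k - 3) % N] - t[k - 1]  # input arrival difference
        t.append(t[k - 1] + tau_max + alpha * dt)          # linearized delay (Eq. 2.4)
    delays = [0.0] * N
    for k in range(edges - N, edges):                      # last lap, after convergence
        delays[k % N] = t[k] - t[k - 1]
    return delays


def dnl(delays, delta, delta0):
    """DNL per Eq. 2.2 with tau_LSB replaced by the average cell delay.
    The latch sees node 0 through buffer 0, stretching bin 0 by delta0 - delta."""
    tau_avg = sum(delays) / len(delays)
    bins = list(delays)
    bins[0] += delta0 - delta
    return [(b - tau_avg) / tau_avg for b in bins]


def max_abs(values):
    return max(abs(v) for v in values)


if __name__ == "__main__":
    tau_max, alpha, delta, delta0 = 50.0, 0.25, 50.0, 70.0  # ps, Figure 6 values
    mismatch = [delta0] + [delta] * (N - 1)
    # Conventional multi-path: feedforward taken directly from the cells (no buffer).
    conventional = dnl(numerical_delays(tau_max, alpha, [0.0] * N), delta, delta0)
    simplified = dnl(simplified_delays(tau_max, alpha, delta, delta0), delta, delta0)
    detailed = dnl(numerical_delays(tau_max, alpha, mismatch), delta, delta0)
    for name, values in [("conventional", conventional),
                         ("proposed, simplified", simplified),
                         ("proposed, numerical", detailed)]:
        print(f"{name:22s} max|DNL| = {max_abs(values):.3f} LSB")
```

With these assumptions, both versions of the proposed connection give a smaller maximum DNL than the conventional one, and the simplified and numerical results are close to each other. This is consistent with the trend described above, though the exact values depend on the assumed recursion.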
Figure 8: Block diagram of the system for the event-by-event calibration.

Event-by-event Measurement System
Figure 8 shows the synchronization system to which the TDC is connected, based on the one presented in [22]. Each node of the RO is connected to the corresponding input of each of the four latch stages. The gating signal $G_0$ is connected to the EVENT line, which has a falling edge every time an event occurs. Control logic then generates the remaining gating signals $G_{1,2,3}$. For image reconstruction applications, these can be associated with the Time-of-Arrival (ToA), the Time-Over-Threshold (TOT) and the period of a reference clock (CAL), respectively (a different number of latch stages can be adopted depending on the application of the TDC). The counters measure the number of oscillator cycles $n_x$ in these time intervals, distributed as in Figure 9, producing coarse measurements $T_{coarse} = n_x T_{RO}$. The difference between the states of the TDC at the beginning and at the end of the ToA, TOT and CAL intervals defines the fine contributions of the measurements, $T_{fine} = (S_j - S_i)\tau$. Here $S_i$ and $S_j$ are the outputs of two of the latch stages, and $\tau$ is the resolution of the TDC (as stated before, it corresponds to the delay of the cells of the RO). Combining the fine and coarse contributions of Figure 9 and expressing the RO period as $T_{RO} = 2N\tau$ (with $N = 9$ in this case), the ToA, TOT and CAL intervals can be written as

$$T_{ToA} = \left(2N\,n_{ToA} + S_{end}^{ToA} - S_{start}^{ToA}\right)\tau \qquad (2.12)$$

$$T_{TOT} = \left(2N\,n_{TOT} + S_{end}^{TOT} - S_{start}^{TOT}\right)\tau \qquad (2.13)$$

$$T_{CAL} = \left(2N\,n_{CAL} + S_{end}^{CAL} - S_{start}^{CAL}\right)\tau \qquad (2.14)$$

where $S_{start}^{x}$ and $S_{end}^{x}$ are the latched RO states at the beginning and at the end of interval $x$. The measurement of $T_{CAL}$ is fundamental to compensate for parasitics, device mismatches, supply voltage drops, temperature gradients and, in general, all the factors that may change $\tau$ and consequently degrade the accuracy of the converter. Indeed, $T_{CAL}$ is nominally equal to the period of an external reference clock. For this reason, Eq. 2.14 can be used to calculate $\tau$ as a function of the clock period every time an event occurs. Hence, this approach avoids any PLL-based synchronization system, reducing the complexity, power consumption and noise of the whole architecture. The value of the LSB, i.e. $\tau$, can vary in time due to the temperature effects mentioned above. This system, however, recalculates this value within a time window that depends on the period of the reference signal ($T_{CAL}$), so that the TDC output remains coherent with the time to be measured. Moreover, in a chip with many ROs and only one PLL, all the frequencies would be synchronized to the slowest one. The approach shown above avoids this situation, since all the ROs oscillate at their own natural frequency. The schematic of the latches chosen for this architecture is depicted in Figure 10. Here too, the pseudo-NMOS architecture was chosen to reduce the propagation time of these blocks, so that they can follow the RO outputs (the RO node signals in Figure 10) when the latches are in transparent mode.
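The event-by-event calibration described by Eqs. 2.12-2.14 can be sketched as follows. The LSB is first recovered from the CAL interval, whose true duration is the known clock period, and is then used to convert the other intervals. This is a minimal illustration: the `Interval` class, the 25 ns (40 MHz) reference period and the raw counter and latch values are hypothetical. The sketch also assumes that counters and latches use consistent conventions, so that a wrap-around of the RO state is counted exactly once.

```python
from dataclasses import dataclass

N_STAGES = 9
CODES_PER_PERIOD = 2 * N_STAGES  # T_RO = 2 N tau


@dataclass
class Interval:
    """Raw data for one interval: counter value and latched RO codes."""
    cycles: int   # coarse counter value n_x
    s_start: int  # RO code latched at the start of the interval (0 .. 2N-1)
    s_end: int    # RO code latched at the end of the interval

    def in_lsb(self) -> int:
        # Eqs. 2.12-2.14 divided by tau. Assumes counter and latches agree on
        # where an RO period starts, so a wrap-around is counted exactly once.
        return CODES_PER_PERIOD * self.cycles + (self.s_end - self.s_start)


def calibrate_tau(cal: Interval, t_clk: float) -> float:
    """Event-by-event LSB from Eq. 2.14, with T_CAL equal to the clock period."""
    return t_clk / cal.in_lsb()


def to_time(x: Interval, tau: float) -> float:
    """Convert a raw interval to time with the freshly calibrated LSB."""
    return x.in_lsb() * tau


if __name__ == "__main__":
    t_clk = 25_000.0  # ps, hypothetical 40 MHz reference
    tau = calibrate_tau(Interval(cycles=41, s_start=2, s_end=8), t_clk)
    toa = to_time(Interval(cycles=3, s_start=5, s_end=12), tau)
    print(f"tau = {tau:.3f} ps, ToA = {toa:.1f} ps")
```

Because $\tau$ is recomputed for every event, a slow drift of the RO frequency only affects the result through the change within one CAL window. No loop has to lock the oscillator.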
A test chip of the TDC featuring one channel (i.e. 4 latch stages) was submitted, and its measurements are presented in Section 3. A simulation analysis showed that the RO can be connected to more than one channel. Its oscillation frequency is reduced by 5.5% if 2 channels are connected and by 23% in a 4-channel configuration. In applications where such a drop is not acceptable, more ROs can be added and/or more pixels can be multiplexed onto the same TDC channel. Integrating multiple ROs is usually problematic in terms of area and power consumption. However, as shown in Sections 2.3, 3.2 and 3.3, the area and the power dissipation of the proposed architecture are smaller than or comparable to those of many state-of-the-art TDCs. The jitter of the CLK signal of Figure 9 directly affects the precision of the measurement. Since the proposed solution achieves an LSB of about 30 ps, it requires a clock jitter of a few ps. Distributing a clock with picosecond-level jitter in a large ASIC is challenging in terms of area and power consumption. Fortunately, the reference signal is needed only when a calibration is necessary. The clock can therefore be gated most of the time and sent only when an event is detected, or at a fixed rate that depends on the expected frequency drift of the clock source.
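Two quick estimates help put these figures in context. The first computes how much the LSB grows when the RO frequency drops, since the LSB scales as $1/f_{RO}$ per Eq. 2.1. The second adds the clock jitter to the TDC precision in quadrature. This sketch is an extrapolation under assumptions. It applies the simulated drops (5.5% and 23%) to the measured 33.6 ps LSB, treats TDC noise and clock jitter as independent, and uses illustrative jitter values. The 19.5 ps input is the measured single-shot precision reported later.

```python
import math


def lsb_with_drop(lsb_ps: float, freq_drop: float) -> float:
    """The LSB scales as 1/f_RO, so a relative drop d stretches it by 1/(1 - d)."""
    return lsb_ps / (1.0 - freq_drop)


def total_precision(sigma_tdc_ps: float, sigma_clk_ps: float) -> float:
    """Quadrature sum, assuming TDC noise and clock jitter are independent."""
    return math.hypot(sigma_tdc_ps, sigma_clk_ps)


if __name__ == "__main__":
    for channels, drop in [(1, 0.0), (2, 0.055), (4, 0.23)]:
        print(f"{channels} channel(s): LSB ~ {lsb_with_drop(33.6, drop):.1f} ps")
    for jitter in (1.0, 3.0, 10.0):  # ps, illustrative values
        print(f"clock jitter {jitter} ps -> {total_precision(19.5, jitter):.1f} ps")
```

Under the independence assumption, a jitter of a few ps barely changes the overall precision, while a jitter comparable to the LSB would dominate it.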

Layout
A picture of a test chip for the proposed TDC is shown in Figure 11a, while Figure 11b shows the layout of the RO. The positions of the delay cells and buffers were chosen to maximize the symmetry of the connections. As the figure shows, with this placement the feedforward paths are always one cell long, while the direct paths are two cells long. The area of the RO core is 30.1 µm × 20.9 µm, and 30.1 µm × 87.5 µm including the rest of the system. Moreover, the outputs of the latches connected to the RO are routed on different metal layers (with the pattern 5-1-3-1-3-5 for the three inner stages) to reduce capacitive coupling and its effect on the oscillation frequency.

Simulations and Measurements
This section presents the simulations of the TDC and the measurements of a test chip.
As stated before, the converter was designed in a 130 nm BiCMOS technology (using only CMOS devices). The simulation framework was set up to analyze and optimize the performance of the circuit in terms of scalability, linearity and time resolution.

Post-layout Simulations
The free-running frequency of the oscillator, $f_{RO}$, is highly dependent on the parasitics of the system: simulations showed an average drop of 61% in $f_{RO}$ when passing from the schematic to the post-layout netlist. The circuit has been analyzed for various supply voltages $V_{DD}$, with a focus on 1.4 V and 1.6 V. These values are reported in the plot of Figure 12.
A preliminary analysis was performed during the design process to evaluate the linearity of the system. The sampling of the RO was simulated by sweeping the sampling time $t_s$ over an interval longer than $T_{RO}$, to make sure that the system goes through all of its $2N$ states. The time step for $t_s$ was 1 ps. For each step, several Monte Carlo (MC) simulations were performed, using the same set of seeds for every value of $t_s$ so that the outputs are coherent. The DNL and the Integral Non-Linearity (INL) can then be calculated to evaluate the distributions of their maximum and RMS values. The INL can be defined as

$$\mathrm{INL}(k) = \sum_{j=0}^{k} \mathrm{DNL}(j)$$

The distributions of the DNL and INL obtained through this analysis for $V_{DD} = 1.6$ V are reported in Figure 13. Table 1 shows the frequency, nominal resolution and power consumption, together with the average maximum and RMS values of the DNL and INL distributions. The table also reports the simulated conversion time $T_{conv}$. This parameter (approximately 0.69 ns and 0.51 ns for $V_{DD}$ = 1.4 V and 1.6 V, respectively) only accounts for the time needed to sample the state of the RO and the delay of the counter registers included in the converter. Thus, it represents the minimum ideal conversion time of the system. The measurement setup of the TDC, described in the next subsection, did not allow a correct estimation of the conversion time, since the system was limited by the readout logic. Hence, the values of Table 1 only give an indication of the potential speed of the proposed TDC. Moreover, the conversion times of the converters presented in the cited works (whose performance is later discussed and compared with ours) were extracted from the output data rates reported in the respective papers. Therefore, they only represent upper limits of the real conversion times.
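The sweep described above is essentially a code-density test. With a fixed 1 ps step, the number of sweep points that produce code $k$ is proportional to the width of bin $k$, and the DNL and INL follow directly. This is a minimal sketch assuming one histogram per Monte Carlo run and the mean bin width as the ideal LSB. Applying it to each run and collecting `max_and_rms` would give distributions like those summarized in Figure 13. The function names and the example counts are illustrative.

```python
from itertools import accumulate
from math import sqrt


def dnl_inl(counts, step_ps=1.0):
    """Code-density DNL/INL from a uniform sweep of the sampling time.

    With a fixed sweep step, W_k ~ counts[k] * step_ps, and the ideal LSB
    is taken as the mean bin width over one RO period.
    """
    widths = [c * step_ps for c in counts]
    lsb = sum(widths) / len(widths)
    dnl = [w / lsb - 1.0 for w in widths]
    inl = list(accumulate(dnl))  # INL(k) = DNL(0) + ... + DNL(k)
    return dnl, inl


def max_and_rms(values):
    """Maximum absolute value and RMS of a DNL or INL vector."""
    return max(abs(v) for v in values), sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    counts = [5, 15, 10, 10]  # toy histogram, sweep points per code
    dnl, inl = dnl_inl(counts)
    print(dnl, inl, max_and_rms(dnl))
```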

Table 1: Multi-path simulation and measurement results. A comparison with other works is also reported.
For the proposed solution, it does not take into account the counters.

Test Chip Measurements
The measurements of the test chip were performed using the UNIGE USB3 GPIO board, developed by the engineers of the Department of Nuclear Physics (DPNC) at the University of Geneva and based on the architecture of the readout scheme of the Baby-MIND experiment detectors at CERN [23].
Firmware loaded on the FPGA of the board handles the communication with the chip and sends the sampling signals used to analyze the linearity of the TDC.

Linearity Measurements and Bubble Correction
Figure 14 shows the distribution of the outputs read from all the latch stages connected to the RO after bubble correction, for $V_{DD}$ = 1.6 V. Bubble correction refers to the algorithms that can be applied when a TDC provides a forbidden output. A TDC such as the one presented in this paper has $N$-bit outputs, but the RO has only $2N$ correct states [8]. However, because of mismatches and latch metastability, the sampled word may not be one of the $2N$ correct states and may contain a group of more than two consecutive equal bits, called a bubble [8]. For the presented TDC, a simulation analysis showed that the most probable bubbles are those in which the output word has four consecutive zeros or ones. These can easily be corrected as explained in Figure 15. Applying this algorithm to the outputs obtained during the measurements leaves only 0.03% of them uncorrected. In Figure 14, the output codes are reported along the x-axis using numbers from 0 to 17 ($2N - 1$), while -1 indicates the number of forbidden-state outputs after correction (see the plot for latch 01). Table 1 reports the results of the measurements, compared with those obtained from post-layout simulations. The test chip oscillates at a lower frequency, which results in a coarser time resolution (larger LSB). This is due to substrate capacitances that were not extracted and that reduced the speed of the system. The measured LSB is 38.7 ps for $V_{DD}$ = 1.4 V and 33.6 ps for $V_{DD}$ = 1.6 V. However, the linearity of the circuit is in line with the simulation results.

Figure 15: Simple bubble correction algorithm implemented for the presented TDC. If four consecutive bits are 0 (word on top: 100001010), and assuming that the other bits are correct, the RO can only be in 5 possible states (bottom: 1 = 110101010, 2 = 100101010, 3 = 101101010, 4 = 101001010, 5 = 101011010). The numbers represent the associated (arbitrary) code and are ordered in the sequence the TDC goes through these states (e.g. 2 follows 1). The implemented correction inverts the two middle bits of the incorrect portion of the word (full rectangle), because this reduces the maximum potential error and also gives the most probable value (as shown by a simulation analysis).

An output distribution such as the one in Figure 14 allows the standard deviation $\sigma_q$ of the quantization error to be calculated. This parameter cannot be calculated with Eq. 1.1 because the bins of the system have an irregular, non-ideal distribution. The probability density function $f_\varepsilon(e)$ of the error can be obtained using the law of total probability as

$$f_\varepsilon(e) = \sum_{k} f_\varepsilon(e \mid C = k)\, P(C = k)$$

where $P(C = k) = n_k / n_{tot}$ is the probability that the output code $C$ is equal to $k$, $n_k$ being the number of outputs with code $k$ and $n_{tot}$ the total number of outputs. Figure 16 reports the pdf for all the latch stages for $V_{DD}$ = 1.6 V. The average quantization-error standard deviation $\sigma_q$ is 21.1 ps (0.54 LSB) for $V_{DD}$ = 1.4 V and 17.1 ps (0.51 LSB) for $V_{DD}$ = 1.6 V.
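Two steps of this analysis lend themselves to a short sketch: the bubble correction of Figure 15 and the estimate of $\sigma_q$ from the output histogram. The code below is a minimal illustration under stated assumptions, not the firmware or analysis code used for the measurements. Valid words are taken to be those with exactly one pair of adjacent equal bits (compared circularly). For 9-bit words this gives the $2N = 18$ states and matches the five candidate states of Figure 15. For $\sigma_q$, each bin width is estimated as $P(C=k)\,T_{RO}$, the error is assumed uniform inside each bin, and code $k$ is assumed to be assigned the ideal value $(k + 1/2)\,T_{RO}/2N$. The function names are hypothetical.

```python
import math


def equal_pairs(word):
    """Indices i where bit i equals bit i+1 (circularly)."""
    n = len(word)
    return [i for i in range(n) if word[i] == word[(i + 1) % n]]


def is_valid(word):
    """Assumed validity rule: exactly one pair of adjacent equal bits."""
    return len(equal_pairs(word)) == 1


def correct_bubble(word):
    """Fix a run of exactly four equal bits by inverting its two middle bits.
    Returns the corrected word, or None if the word remains forbidden."""
    if is_valid(word):
        return list(word)
    n = len(word)
    for start in range(n):
        run = [word[(start + j) % n] for j in range(4)]
        bounded = (word[(start - 1) % n] != run[0]
                   and word[(start + 4) % n] != run[0])
        if len(set(run)) == 1 and bounded:
            fixed = list(word)
            for j in (1, 2):
                fixed[(start + j) % n] ^= 1
            if is_valid(fixed):
                return fixed
    return None


def quantization_sigma(counts, t_ro):
    """sigma_q via the law of total probability over the output codes."""
    n_tot = sum(counts)
    lsb = t_ro / len(counts)
    edge = mean = second = 0.0
    for k, c in enumerate(counts):
        p = c / n_tot                                # P(C = k) = n_k / n_tot
        w = p * t_ro                                 # estimated bin width
        offset = edge + w / 2 - (k + 0.5) * lsb      # bin centre vs assigned value
        mean += p * offset
        second += p * (w * w / 12 + offset * offset)  # E[e^2 | C = k]
        edge += w
    return math.sqrt(max(0.0, second - mean * mean))


if __name__ == "__main__":
    print(correct_bubble([1, 0, 0, 0, 0, 1, 0, 1, 0]))  # [1, 0, 1, 1, 0, 1, 0, 1, 0]
    t_ro = 18 * 33.6  # ps
    print(quantization_sigma([100] * 18, t_ro), 33.6 / math.sqrt(12))
```

For a perfectly uniform histogram the estimate reduces to Eq. 1.1. Uneven bins or misplaced bins increase it, which is consistent with the measured values being above 0.29 LSB.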

SSP and PN
The so-called Single-Shot Precision (SSP), i.e. the jitter of repeated measurements of the same time interval, was measured with the setup shown in the block diagram of Figure 17. A discriminated version of the divided RO output (Discriminated-DIV in the figure) is sent, through a NIM crate, to the GPIO board, which then turns off the gating signals that sample the oscillator. Ideally, the TDC should always provide the same value, so the standard deviation of the distribution of these outputs represents the SSP. The output distribution for a supply voltage $V_{DD}$ = 1.4 V is reported in Figure 18. The average standard deviations are 15.8 ps (0.41 LSB) and 19.5 ps (0.58 LSB) for $V_{DD}$ = 1.4 V and $V_{DD}$ = 1.6 V, respectively. Analyzing output distributions like those of Figure 18 also gives the accuracy of the converter, defined as the equivalent offset affecting the time measurement system. For the presented TDC, the accuracy was evaluated as the maximum difference among the average values of the distributions obtained for the SSP calculation. The measured accuracy is 40.9 ps (1.05 LSB) for $V_{DD}$ = 1.4 V and 31.0 ps (0.92 LSB) for $V_{DD}$ = 1.6 V. However, a simple calibration based on the same procedure used to evaluate the accuracy can compensate for this offset.
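The SSP and accuracy definitions above translate into a few lines of statistics. This is a minimal sketch under assumptions. Each key of `runs` is a hypothetical label for one distribution of repeated measurements (for instance one latch pair or one acquisition). The SSP is the average of the per-distribution sample standard deviations, and the accuracy is the largest difference between distribution means. The returned offsets illustrate one possible form of the simple offset calibration mentioned in the text, relative to the mean of all distributions.

```python
import statistics


def ssp_and_accuracy(runs):
    """runs: label -> repeated measurements (ps) of the same time interval.

    SSP: average of the per-distribution standard deviations.
    Accuracy: largest difference between the distribution means.
    Offsets: per-distribution correction for a simple offset calibration.
    """
    sigmas = [statistics.stdev(v) for v in runs.values()]
    means = {label: statistics.fmean(v) for label, v in runs.items()}
    ssp = statistics.fmean(sigmas)
    accuracy = max(means.values()) - min(means.values())
    reference = statistics.fmean(means.values())
    offsets = {label: m - reference for label, m in means.items()}
    return ssp, accuracy, offsets


if __name__ == "__main__":
    # Hypothetical repeated measurements (ps) for two distributions.
    runs = {"run A": [100.0, 102.0, 98.0, 100.0], "run B": [110.0, 111.0, 109.0, 110.0]}
    ssp, accuracy, offsets = ssp_and_accuracy(runs)
    print(f"SSP = {ssp:.2f} ps, accuracy = {accuracy:.1f} ps, offsets = {offsets}")
```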

Ring-Oscillator
The output of the divider was also used to analyze the power spectrum of the RO and evaluate its Phase Noise (PN). Figure 19 shows a zoom of the power spectrum of this signal around its fundamental component for $V_{DD}$ = 1.6 V. The measured PN at 100 kHz from this component is -99.02 dBc/Hz for a 1.6 V supply and -97.7 dBc/Hz for 1.4 V. The SSP and PN values are reported in Table 1.
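A phase-noise value in dBc/Hz is commonly obtained from a spectrum-analyzer trace by referring the power at the chosen offset to the carrier power and normalizing it to the resolution bandwidth. This is an assumption-laden sketch. The readings, the RBW and the division ratio are hypothetical. Amplitude noise and the analyzer's noise-bandwidth correction are ignored. The text does not state whether the reported values refer to the divider output or to the RO itself. The second helper shows the ideal $20\log_{10}(N_{div})$ conversion between the two, valid only for a noiseless divider.

```python
import math


def phase_noise_dbc_hz(p_offset_dbm: float, p_carrier_dbm: float, rbw_hz: float) -> float:
    """Single-sideband phase noise from spectrum-analyzer readings:
    offset power relative to the carrier, normalized to 1 Hz bandwidth."""
    return p_offset_dbm - p_carrier_dbm - 10 * math.log10(rbw_hz)


def refer_to_oscillator(pn_divided_dbc_hz: float, division_ratio: int) -> float:
    """An ideal, noiseless divide-by-N lowers phase noise by 20*log10(N);
    adding it back refers the divider-output value to the oscillator."""
    return pn_divided_dbc_hz + 20 * math.log10(division_ratio)


if __name__ == "__main__":
    # Hypothetical readings: carrier at -10 dBm, -80 dBm at 100 kHz offset, 1 kHz RBW.
    pn_div = phase_noise_dbc_hz(-80.0, -10.0, 1e3)
    print(f"divider output: {pn_div:.1f} dBc/Hz, "
          f"referred to RO (N = 8): {refer_to_oscillator(pn_div, 8):.1f} dBc/Hz")
```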

State-of-the-Art Comparison
Table 1 offers a comparison between the TDC described in this paper and other works. As highlighted before, the main strengths of the presented TDC are the compactness and simplicity of its PLL-less architecture, which make it the solution with the smallest area among all the cited works (the area of [13] is not reported). Solutions [11], [15], [16] and [17] are characterized by a smaller power consumption and LSB, but they were developed in more advanced technology nodes and, as explained in Section 1, their complexity and/or limited maximum measurable time interval make them more difficult to integrate in large pixel detector chips. The non-linearities of the presented architecture are comparable with those of the other works: only solutions [2] and [9] have significantly better DNL and INL, but their power consumption is one or two orders of magnitude higher than that of the PLL-less TDC. Figure 20 compares the performance of the proposed converter with some of the works reported in Table 1 and with other TDCs from the literature. This plot also highlights the compactness of our architecture compared to others with similar performance in terms of resolution and power consumption.

Conclusion
An RO-based TDC was developed to be integrated in pixel detectors for HEP and medical imaging applications. Simulations and measurements show an LSB of 33.6 ps (38.7 ps with the lower supply voltage) and a DNL ≤ 1.3 LSB. Two models were developed to analyze the proposed architecture and to demonstrate that integrating the buffers into the feedforward paths reduces the impact of their mismatch on the linearity of the system. This solution does not add any complexity to a standard multi-path architecture, since it only requires the buffers to drive the inputs of other delay cells in addition to the external loads. For this reason, this simple architectural modification can be applied to any multi-path RO-based TDC, in various technologies.
The PLL-less event-by-event calibration system, the low power consumption and the compact area allow an easier integration of a large number of converters in pixel detector chips, a crucial characteristic for the above-mentioned applications.

Figure 20: Comparison between the TDC proposed in this paper, the works in Table 1 and the ones reported in [24]-[36]. The size of the dots on the plot is proportional to the power consumption of the analyzed TDCs (logarithmic scale).

Figure 1: Possible configuration of a 4 × 4 pixel matrix connected to 4 different TDC channels through fast-OR blocks. In this case, the active area refers to the sensitive region of the detecting system.

Figure 5: An example of a 5-stage multi-path RO with two types of feedforward connections (dotted line: proposed solution). $\Delta_0 \neq \Delta$ indicates the propagation time of the buffer that shows a mismatch with respect to the others.

Figure 6: RMS (top) and maximum absolute value (bottom) of the DNL as a function of  for both of the solutions depicted in Figure 5 (calculated with Eq. 2.3 for the usual connection case, with Eq. 2.11 for the proposed solution, and with the edge time distribution of Eq. 2.6 for the more detailed model).

Figure 7: RMS (top) and maximum absolute value (bottom) of the DNL as a function of the cell delay (calculated with Eq. 2.3 for the usual connection case, with Eq. 2.11 for the proposed solution, and with the edge time distribution of Eq. 2.6 for the more detailed model).

Figure 10: Schematic of the latches used to sample the state of the RO.

Figure 11: (a) Picture of the test chip of the proposed TDC (total area: 0.9 × 0.9 mm²) and (b) layout of the RO.

Figure 12: LSB and power consumption of the TDC for the typical, Fast/Fast (F/F), Fast/Slow (F/S), Slow/Fast (S/F) and Slow/Slow (S/S) corners, for $V_{DD}$ equal to 1.4 V and 1.6 V.

Figure 13: Maximum values and RMS distributions of DNL and INL calculated over various Monte Carlo simulations, for $V_{DD}$ = 1.6 V.

Figure 14: Measured output distribution (after correction) of the TDC for $V_{DD}$ = 1.6 V, for all the latch stages connected to the RO.

Figure 16: Probability density of the quantization error for each latch stage ($V_{DD}$ = 1.6 V).

Figure 17: Block diagram of the measurement system used to evaluate the SSP of the converter.

Figure 18: Output distribution of the data obtained with the measurement system depicted in Figure 17, for $V_{DD}$ = 1.4 V.

Figure 19: Zoom of the power spectrum of the divider output around the fundamental component of the signal, for $V_{DD}$ = 1.6 V.
4 V and 1.6 V. Post-layout simulations show that the RO oscillates at a frequency $f_{osc}$ equal to 2.05 GHz and 2.34 GHz for $V_{DD}$ = 1.4 V and $V_{DD}$ = 1.6 V respectively. Considering Eq. 2.1 with $N = 9$, the system is characterized by a nominal resolution of 27.1 ps and 23.7 ps for the above-mentioned cases. Multi-corner simulations show a variation of the LSB of less than 30 % with respect to the typical case. More specifically, the minimum values of the LSB are obtained in the Fast/Fast corner (22.45 ps and 20.02 ps for $V_{DD}$ = 1.4 V and $V_{DD}$ = 1.6 V respectively) and the maxima in the Slow/Slow corner (30.38 ps and 35.37 ps for $V_{DD}$ = 1.4 V and $V_{DD}$ = 1.6 V respectively).
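The nominal resolutions quoted above can be checked numerically. The sketch assumes that Eq. 2.1 has the form $\mathrm{LSB} = 1/(2 N f_{osc})$, which reproduces the 27.1 ps and 23.7 ps values for $N = 9$; the helper names are hypothetical.

```python
def nominal_lsb(f_osc_hz, n_stages):
    """Nominal LSB of a differential N-stage RO-based TDC.

    Assumes Eq. 2.1 reads LSB = 1 / (2 * N * f_osc): each of the N stages
    toggles twice per period, so the 2N edges split the period into 2N bins.
    """
    return 1.0 / (2 * n_stages * f_osc_hz)


def relative_variation(value, reference):
    """Fractional deviation of a corner value from the typical value."""
    return (value - reference) / reference


# Post-layout oscillation frequencies for the two supply voltages
for vdd, f_osc in ((1.4, 2.05e9), (1.6, 2.34e9)):
    print(f"VDD = {vdd} V: LSB = {nominal_lsb(f_osc, 9) * 1e12:.1f} ps")
```

The loop prints 27.1 ps and 23.7 ps for the two supplies, and `relative_variation(22.45, 27.1)` gives about -17 % for the Fast/Fast corner at 1.4 V.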
. A Ready signal, connected to the gating of the latches, activates an 8-bit divider. The rising edge of the output of this block