A CMOS 0.18 μm 600 MHz clock multiplier PLL and a pseudo-LVDS driver for the high speed data transmission for the ALICE Inner Tracking System front-end chip

This work presents the 600 MHz clock multiplier PLL and the pseudo-LVDS driver, two essential components of the Data Transmission Unit (DTU), a fast serial link for the 1.2 Gb/s data transmission of the ALICE inner detector front-end chip (ALPIDE). The PLL multiplies the 40 MHz input clock to obtain the 600 MHz and 200 MHz clocks for a fast serializer which works in Double Data Rate mode. The outputs of the serializer feed the inputs of the pseudo-LVDS driver, which transmits the data from the pixel chip to the patch panel over a limited number of signal lines. The driver drives a 5.3 m to 6.5 m long differential transmission line by steering a maximum of 5 mA of current at the target speed. To overcome the bandwidth limitations of the long cables, pre-emphasis can be applied to the output. The currents of the main and pre-emphasis drivers can be adjusted individually using on-chip digital-to-analog converters. The circuits will be integrated in the pixel chip; they are designed in the same 0.18 μm CMOS technology and will operate from the same 1.8 V supply. Design and test results of both circuits are presented.


Introduction
The ALICE (A Large Ion Collider Experiment) experiment at the LHC at CERN is designed to study the properties of the Quark Gluon Plasma, the state of matter which characterized the earliest moments of the Universe, by means of Pb-Pb, p-p and p-Pb collisions. The ALICE detector consists of several sub-detectors of various types and allows for a comprehensive study of different hadrons and jets over a wide range of transverse momenta.
To reach the physics objectives of the ALICE experiment, an upgrade program is ongoing to enable the detector to cope with the increase of the LHC instantaneous luminosity up to $L = 6\times10^{27}$ cm$^{-2}$ s$^{-1}$ for Pb-Pb collisions after the Long Shutdown 2, foreseen for 2018-2019. ALICE will accumulate 10 nb$^{-1}$ of Pb-Pb collisions and will deal with an interaction rate of 50 kHz, which results in a gain of a factor 100 in statistics, giving access to rare probes at low and intermediate $p_T$.
Two crucial limitations of the current apparatus are the overall material budget and the readout rate capability. The former degrades the impact parameter resolution, whilst the latter restricts the detector to using only a small fraction of the Pb-Pb collisions. For this reason, ALICE plans to fully replace its Inner Tracking System (ITS), its innermost detector, with a new one which will have a low material budget and a high spatial resolution, in order to reconstruct the particle tracks efficiently, together with a faster event read-out. The new ITS will be equipped with 7 layers of pixel chips based on Monolithic Active Pixel Sensors (MAPS), which enhance the impact parameter resolution through an increase of the detector granularity. MAPS integrate the sensor and the read-out electronics on the same piece of silicon, thus reducing the material budget of the overall detector. For the ITS upgrade the 0.18 µm CMOS Image Sensor technology from TowerJazz has been selected. An important aspect of this technology is that it is expected to be more robust against radiation damage because of its feature size and doping characteristics. The 0.18 µm technology node is beneficial for the tolerance to the total ionizing dose (TID) since it features a gate oxide thickness below 4 nm, thus reducing the probability of charge carrier trapping. Since the MAPS from TowerJazz can be reverse biased, the sensitive volume can be partially depleted. In this way the charge is collected by drift, thus reducing the trapping and recombination probability. Test measurements have assessed the resistance of this technology to the TID and to the Non Ionizing Energy Loss (NIEL), and the results are presented in [1].
The sensor chip will measure 3 cm by 1.5 cm and contain about 500 000 pixels of about 28 µm by 28 µm. The seven ITS layers are divided into two groups. The Inner Barrel (IB) consists of the innermost three layers, where each module consists of nine chips in a single row, each of which drives its own high speed output. The Outer Barrel (OB) consists of the outermost four layers, where particle densities will be smaller, allowing the data to be grouped on one chip out of seven. Physics simulations show that it is necessary to send data out of the chip at the target speed of 1.2 Gb/s (IB) and 400 Mb/s (OB) to read the pixel matrix efficiently. For this reason, a full custom Data Transmission Unit (DTU) has been designed for the chip periphery. It is a fast serial link made of a 600 MHz clock multiplier PLL, a serializer working in Double Data Rate mode and a pseudo-LVDS driver. This link works with a power supply of 1.8 V and is entirely designed in the TowerJazz 0.18 µm CMOS Image Sensor technology. The PLL multiplies the 40 MHz LHC clock by 15, providing the 600 MHz clock for the IB and the 200 MHz (40 MHz × 5) clock for the OB. Both the PLL and the serializer are protected against Single Event Upsets (SEU) by applying the Triple Modular Redundancy (TMR) technique to all flip-flops. The pseudo-LVDS driver is the only active device which links the output of the pixel chip with an external FPGA in the patch panel.
In the following, the main characteristics of both circuits are described in sections 2 and 3, whilst the test measurements are presented in section 4.

The 600 MHz clock multiplier PLL
The 600 MHz clock multiplier PLL shown in figure 1 is a full custom circuit consisting of a Voltage Controlled Oscillator (VCO), a ×15 Frequency Divider, a Phase-Frequency Detector (PFD), a Charge Pump (CP), an RC filter and a DAC which is used to set the current in the CP. The multiplication factor of 15 follows from the use of 8b/10b data encoding. In the DTU, the number of bits that have to be transmitted during one period of the 40 MHz clock is 30, i.e. three bytes once the 8b/10b encoding has been applied. This corresponds to a transmission rate of 1.2 Gb/s and to a 600 MHz clock for the DDR serializer. The multiplication factor is then the ratio between the 600 MHz output clock and the 40 MHz LHC clock.
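The arithmetic behind the factor of 15 can be checked with a short sketch. The function and variable names are illustrative and not taken from the design; the payload figure assumes that the 30 bits are three 8b/10b symbols.

```python
from fractions import Fraction

REF_CLOCK_HZ = 40_000_000      # LHC reference clock (from the text)
BITS_PER_REF_PERIOD = 30       # encoded bits per 40 MHz period (from the text)
EDGES_PER_CLOCK = 2            # DDR serializer: one bit on each clock edge


def link_parameters(ref_clock_hz=REF_CLOCK_HZ, bits_per_period=BITS_PER_REF_PERIOD):
    """Return (line rate in b/s, serializer clock in Hz, PLL multiplication factor)."""
    line_rate = ref_clock_hz * bits_per_period
    serializer_clock = line_rate // EDGES_PER_CLOCK
    multiplication = Fraction(serializer_clock, ref_clock_hz)
    return line_rate, serializer_clock, multiplication


def payload_bits(bits_per_period=BITS_PER_REF_PERIOD):
    """Payload bits per period, assuming every 10 line bits carry 8 data bits."""
    return bits_per_period * 8 // 10


if __name__ == "__main__":
    rate, clock, factor = link_parameters()
    print(f"line rate        : {rate / 1e9} Gb/s")
    print(f"serializer clock : {clock / 1e6} MHz")
    print(f"PLL factor       : {factor}")
    print(f"payload bits     : {payload_bits()} per 40 MHz period")
```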
The characteristics of each of those circuits are briefly described below whilst the test results on the PLL test chip are reported in section 4.

The Voltage Controlled Oscillator
The VCO is implemented using four stages of a differential ring oscillator like the one shown in figure 2 (a), which is based on [3] and [4]. Simulations across process corners show a tuning range between 500 MHz and 700 MHz. The top and bottom current sources provide the same current to the circuit, and the current value can be adjusted by varying the gate voltages Vcp and Vcn. The ring oscillator input is a PMOS differential pair loaded with diode connected transistors in parallel with two transistors with an external gate voltage. This architecture has two advantages with respect to a single ended current limited inverter: it is less sensitive to noise, and it allows an even number of stages in the VCO and hence a division of the phase by an even factor. The diode connected transistors are used to load the input differential pair instead of fixed resistors. Together with the two parallel transistors with external gate voltage, they make it possible to adjust the output resistance of the VCO simply by changing the current in the circuit. In this way it is possible to control the overall circuit delay, which depends on the value of the VCO output resistance and on the capacitance of its load.
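As a rough guide to the numbers involved, the textbook relation between stage delay and oscillation frequency of a ring oscillator can be written down for four differential stages. The symbols $N_s$, $t_d$, $R_{out}$ and $C_{load}$ are introduced here for illustration, and the first-order delay model is an assumption, not a result from the design.

```latex
% Assumed first-order model: N_s = 4 differential stages, each with delay t_d
% set by the stage output resistance R_out and its load capacitance C_load.
\begin{align}
  f_{osc} &= \frac{1}{2 N_s t_d}, \qquad t_d \propto R_{out}\, C_{load}, \\
  t_d\big|_{600\ \mathrm{MHz}} &= \frac{1}{2 \cdot 4 \cdot 600\ \mathrm{MHz}} \approx 208\ \mathrm{ps}, \\
  \Delta\varphi &= \frac{360^\circ}{2 N_s} = 45^\circ .
\end{align}
```

Under this model, lowering the output resistance through a larger bias current shortens $t_d$ and raises the frequency, and the eight available output phases are evenly spaced by 45°.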

Frequency divider
The output of the VCO feeds the fully digital Divider circuit. It consists of a divider by 3, which provides the 200 MHz clock, followed by a divider by 5. The output of this last divider is compared with the 40 MHz input clock by the phase frequency detector (PFD). The entire circuit is SEU protected, to avoid bit flips and the propagation of bit errors, using triplication followed by majority voting for each division stage, as shown in figure 2 (b). Four voters are implemented after the divider by 3 in order to guarantee the same load for each of the voters, including the one which generates the 200 MHz clock.
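The effect of triplication with majority voting can be illustrated with a small behavioural model. This is only a sketch under simplifying assumptions: the class and function names are hypothetical, the counters are modelled as integers rather than flip-flops, and the structure does not reproduce the gate-level circuit of figure 2 (b).

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


class TmrDivider:
    """Divide-by-N counter whose state is held in three redundant copies."""

    def __init__(self, modulus):
        self.modulus = modulus
        self.copies = [0, 0, 0]

    def upset(self, copy_index, bit):
        """Emulate a single event upset by flipping one bit of one copy."""
        self.copies[copy_index] ^= 1 << bit

    def tick(self):
        """Advance by one input clock cycle; return True when the counter wraps."""
        voted = majority(*self.copies)
        nxt = (voted + 1) % self.modulus
        self.copies = [nxt, nxt, nxt]  # each copy reloads from the voted value
        return nxt == 0


def count_output_pulses(vco_cycles, upset_at=None):
    """Count the pulses after the /3 and after the /5 stage for a number of VCO cycles."""
    div3, div5 = TmrDivider(3), TmrDivider(5)
    pulses_div3 = pulses_div15 = 0
    for cycle in range(vco_cycles):
        if upset_at is not None and cycle == upset_at:
            div3.upset(copy_index=1, bit=0)  # corrupt one of the three copies
        if div3.tick():
            pulses_div3 += 1
            if div5.tick():
                pulses_div15 += 1
    return pulses_div3, pulses_div15


if __name__ == "__main__":
    print(count_output_pulses(600))                # 600 VCO cycles, no upset
    print(count_output_pulses(600, upset_at=100))  # same, with one injected upset
```

In this model a single corrupted copy is outvoted before the next count, so both runs give the same number of pulses after each stage.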

The charge pump circuit
The up and down outputs of the PFD feed the Charge Pump circuit shown in figure 2 (c), which converts the phase difference into a voltage. The four switches are opened and closed alternately, so that both current sources provide the same current and the current can flow in the clockwise or counterclockwise direction. A careful design of the CP current sources is needed to reduce the static phase error, and for this reason they have been designed with high output impedance transistors. Furthermore, the system has a unity gain amplifier feedback for voltage equalization. A mismatch between the two currents would be compensated by the loop through a static phase difference.

RC filter
A first order RC filter is added to guarantee the stability of the entire system. Referring to figure 1, the voltage at the middle point between the CP current sources is controlled by two switches. In this way it is possible to add or subtract an amount of charge at that point to change the voltage across $C_L$. It can be shown that this system is not stable without the load resistor $R_L$. Indeed, with only $C_L$ the transfer function of the circuit has two poles at the origin, the second one coming from the VCO. In order to stabilize the system it is thus necessary to introduce a zero by means of the load resistance $R_L$. To better control fast variations of the voltage at the middle point, a second order filter is sometimes used, in which a capacitor is added in parallel to the previous RC filter. Although this technique decreases the voltage spikes, it slows down the circuit and degrades the stability. Detailed simulations showed that for this circuit the voltage spikes are not significant, and the first order filter was preferred.
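The stability argument can be made explicit with the usual linearized model of a charge pump PLL. The symbols $I_{CP}$, $K_{VCO}$ and $N$ are assumptions introduced here, as is the series connection of $R_L$ and $C_L$; the expressions are the textbook open loop gain, not values extracted from the design.

```latex
% Assumed symbols (not in the original text):
%   I_{CP}   charge pump current
%   K_{VCO}  VCO gain in rad/(s V)
%   N = 15   division ratio
%   R_L, C_L filter components of figure 1, assumed to be in series
\begin{align}
  G(s)\big|_{R_L = 0} &= \frac{I_{CP}}{2\pi}\,\frac{1}{s C_L}\,\frac{K_{VCO}}{s}\,\frac{1}{N}
                       = \frac{I_{CP} K_{VCO}}{2\pi N C_L}\,\frac{1}{s^{2}}, \\
  G(s) &= \frac{I_{CP}}{2\pi}\left(R_L + \frac{1}{s C_L}\right)\frac{K_{VCO}}{s}\,\frac{1}{N}
        = \frac{I_{CP} K_{VCO}}{2\pi N C_L}\,\frac{1 + s R_L C_L}{s^{2}}, \\
  \omega_z &= \frac{1}{R_L C_L}.
\end{align}
```

With $R_L = 0$ the open loop phase is $-180^\circ$ at all frequencies, so the phase margin is zero; the zero at $\omega_z$ adds phase lead around the crossover frequency.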

The pseudo-LVDS driver
The High Speed Output (HSO) driver is designed to drive data from the pixel chip to an FPGA in the patch panel at the target speeds of 1.2 Gb/s for the IB and 400 Mb/s for the OB. A pseudo-LVDS driver was designed in 0.18 µm CMOS technology; it is the final part of the DTU, just after the DDR serializer. It is based on the LVDS transmission protocol TIA/EIA 644 [5], which allows very high transmission rates while keeping the power consumption low. The adjective "pseudo" reflects the fact that some characteristics of the LVDS protocol have been slightly modified in order to enable the circuit to work with the low power supply used for the entire pixel chip. The power supply is 1.8 V, in contrast with the typical value of 2.5 V or higher that is often used for this kind of circuit [8], and the output common mode value $V_{OCM}$ is lowered from the 1.2 V foreseen by the standard protocol to 1.1 V.
This pseudo-LVDS driver block is made up of a main driver (MD) and an ancillary pre-emphasis driver (PED). A 4-bit DAC is implemented for each of the two circuits to set a suitable current value for a high transmission quality at the end of the long transmission line. The HSO will have to drive a full 5.3 m or 6.5 m long differential transmission line, as shown in figures 3 and 4, and therefore a pre-emphasis technique is mandatory in order to overcome the bandwidth limitations of the cable.
To properly reconstruct the signal at the end of the cable, the driver strength has to be high enough that the energy loss in the lines can be neglected. When very long cables are used for high speed data transmission, the Pre-Emphasis (PE) technique is often adopted to overcome the bandwidth limitation coming from this component, which cannot otherwise be improved. These limitations have a major impact when a fast transition between two logic levels occurs, since the output of the driver does not have enough time to settle at the correct voltage value. In this case the PE technique increases the signal amplitude for a very short time when the bit transition takes place, in order to speed up the transmission while keeping a high signal integrity [6]. To drive the PE it is mandatory to have a delayed copy DATA_D of the transmitted data stream DATA, so that it is possible to know whether two subsequent bits are different. The timing diagram in figure 5 illustrates the PE principle. When the data stream keeps the same logical value in two subsequent bits, only the current steered by the MD drives the signal line. When the data stream switches from 0 to 1, the PE circuit adds the current $I_{PED}$, whilst for the inverse transition from 1 to 0 it subtracts the same amount of current. For this reason the PED implements a fast XOR logic which adds or subtracts current depending on whether there is a bit transition. While the value of the pre-emphasis current is selectable, the pre-emphasis pulse has a fixed duration of half a clock cycle at 600 MHz.
The model of the full transmission line was simulated in the design phase. The line is made of an Al (IB) or Cu (OB) Flex Printed Circuit (FPC) connected to a 5 m long special coaxial cable for differential signals (Twinax). In the simulations, the physical parameters of the FPC were used together with the S-parameters of the Twinax.
[Figure caption fragment: It is also linked to a 5 m long Twinax coaxial cable for a total length of 6.5 m to cover the distance to the patch panel. The data rate in this case is 400 Mb/s.]
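The pre-emphasis rule described above (main current always, extra current only on a transition detected by XOR with the delayed data) can be sketched at the bit level. The function name, the signed-current convention and the example current values are illustrative assumptions; since half a 600 MHz clock cycle equals one bit period at 1.2 Gb/s, the boost is modelled as lasting exactly one bit.

```python
def pre_emphasis_currents(bits, i_md_ma, i_ped_ma, initial_bit=0):
    """Return the signed line current (mA) for each transmitted bit.

    A positive current stands for a logic 1 and a negative current for a logic 0
    (a convention assumed for this sketch).
    """
    currents = []
    previous = initial_bit                 # plays the role of DATA_D
    for bit in bits:
        main = i_md_ma if bit else -i_md_ma
        if bit ^ previous:                 # XOR: the bit differs from the previous one
            boost = i_ped_ma if bit else -i_ped_ma
        else:
            boost = 0.0                    # no transition: main driver only
        currents.append(main + boost)
        previous = bit
    return currents


if __name__ == "__main__":
    data = [0, 1, 1, 0, 0, 1]
    # Illustrative setting: 4 mA main current, pre-emphasis at 50% of it.
    print(pre_emphasis_currents(data, i_md_ma=4.0, i_ped_ma=2.0))
```

Repeated bits are driven by the main current alone, while the first bit after each transition receives the additional pre-emphasis current with the sign of the new level.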

The Main Driver
The Main Driver (MD) circuit is shown in figure 6 and is based on [7]. It consists of eight transistors, which are paired on the two diagonals, and a common mode feedback circuit which fixes the driver output common mode and provides half of the total MD current. The MD works in current steering mode, providing a current which ranges between 2 mA and 5 mA to a 100 Ω load resistor. This resistance value is chosen in order to avoid signal reflections due to impedance mismatches between the transmission line and the receiver at its far end. The inputs DATA+ and DATA- […]
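For orientation, the differential swing that corresponds to the quoted current range follows from Ohm's law. The sketch assumes an ideal current steering stage in which the whole MD current flows through the 100 Ω load and the outputs sit symmetrically around the 1.1 V common mode; cable losses are ignored and the function name is hypothetical.

```python
LOAD_OHM = 100      # termination resistor (from the text)
V_OCM_MV = 1100     # output common mode voltage, 1.1 V (from the text)


def output_levels_mv(i_md_ma, load_ohm=LOAD_OHM, v_ocm_mv=V_OCM_MV):
    """Return (differential swing, low level, high level) in mV at the driver output."""
    swing = i_md_ma * load_ohm          # mA x ohm = mV
    return swing, v_ocm_mv - swing / 2, v_ocm_mv + swing / 2


if __name__ == "__main__":
    for current_ma in (2, 5):           # limits of the MD current range
        swing, low, high = output_levels_mv(current_ma)
        print(f"{current_ma} mA: swing {swing} mV, levels {low} mV and {high} mV")
```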

The Pre-Emphasis circuit
The PE driver is shown in figure 6 and it is based on [9]. It has a structure similar to the MD but it implements a fast XOR logic in order to add or subtract current depending on the data stream.

Measurements
[…] inputs with DATA± and DATA_D±. Conversely, a custom delay line was integrated between the PLL and the driver inside the test chip, to test the driver by using a clock waveform. Furthermore, a full transmission line consisting of a 30 cm Al FPC and a 5 m Twinax cable was set up in order to test the signal integrity at the far end of the cables. Some examples of the test results obtained using a 600 MHz clock from the PLL and an external 1.2 Gb/s PRBS are shown in figure 8, for PE settings of 0% and 50%. With 0% PE the MD steers about 4 mA of DC current, but when it has to drive the full transmission line the eye diagrams are closed and the Total Jitter (TJ) ranges between 0.35 UI and 0.57 UI, outside the commercial standard of 0.3 UI. The transmission quality improves when PE is activated with a current amplitude of 50% of that of the main driver. In this case the TJ ranges between 0.3 UI and 0.4 UI, but it has to be noted that the clock buffer together with the coaxial cable, without the driver, introduces a jitter of 0.4 UI.
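To relate the jitter figures to absolute time, unit intervals can be converted to picoseconds at the 1.2 Gb/s line rate. This is a plain unit conversion; the function names are illustrative.

```python
def unit_interval_ps(bit_rate_bps):
    """Duration of one unit interval (one bit period) in picoseconds."""
    return 1e12 / bit_rate_bps


def jitter_ps(jitter_ui, bit_rate_bps):
    """Convert a jitter value expressed in UI into picoseconds."""
    return jitter_ui * unit_interval_ps(bit_rate_bps)


if __name__ == "__main__":
    rate = 1.2e9  # IB line rate in b/s (from the text)
    print(f"1 UI = {unit_interval_ps(rate):.1f} ps")
    for ui in (0.3, 0.35, 0.4, 0.57):   # values quoted in the measurements
        print(f"{ui} UI = {jitter_ps(ui, rate):.0f} ps")
```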

Conclusion and future work
A 600 MHz clock multiplier PLL and a pseudo-LVDS driver have been designed for the novel pixel chip ALPIDE for the ITS upgrade. These two blocks, together with the serializer, form the Data Transmission Unit, the block which is responsible for sending data out of the chip at the speeds of 1.2 Gb/s and 400 Mb/s. The test measurements show that the PLL works properly, although some improvement in the duty cycle performance is advisable. An improved differential to single ended converter is under design in order to improve the symmetry of the output clock waveform.

JINST 11 C01066
To reduce the total jitter which affects the pseudo-LVDS driver, modifications have been made to eliminate the slew rate control, which is not required given the highly resistive lines. Furthermore, two multiplexers, one for the MD and the other for the PE, will be integrated to select data from the serializer with a suitable delay, and hence to test the full DTU chain including the serializer.