A prototype hybrid pixel detector ASIC for the CLIC experiment

A prototype hybrid pixel detector ASIC specifically designed to the requirements of the vertex detector for CLIC is described and first electrical measurements are presented. The chip has been designed using a commercial 65 nm CMOS technology and comprises a matrix of 64 × 64 square pixels with 25 μm pitch. The main features include simultaneous 4-bit measurement of Time-over-Threshold (ToT) and Time-of-Arrival (ToA) with 10 ns accuracy, on-chip data compression and power pulsing capability.


Introduction
CLIC (Compact LInear Collider) is a concept for an electron-positron (e+e−) linear accelerator currently under study [1], with a centre-of-mass energy from a few hundred GeV to 3 TeV. It will allow for precision Higgs and top measurements and the testing of models for new physics such as supersymmetry [2].
The work described in this paper focuses on the vertex detector of CLIC. Requirements for the detector include:
• low average power consumption (less than 50 mW/cm²) to allow for air cooling
• low material budget (0.2% of a radiation length) to reduce particle scattering
• small pixel pitch (25 µm) to achieve the required spatial resolution
• simultaneous Time-over-Threshold (ToT) and Time-of-Arrival (ToA) measurements to correctly identify tracks.
CLICpix is a hybrid pixel detector for the CLIC experiments. A prototype of the readout ASIC has been designed using a commercial 65 nm CMOS technology. Details on the implementation and characterization results are presented in the following sections.

Description of the chip architecture
The CLICpix prototype includes a 64 × 64 pixel matrix, working in a single event detection mode. Each pixel measures 25 × 25 µm², resulting in a sensitive area of 1.6 × 1.6 mm² for the entire matrix. The chip was designed in a 65 nm CMOS technology previously characterized for use in High Energy Physics applications [3]. Figure 1 shows a block diagram of the chip. The pixel matrix is divided into 32 double columns (each grouping an array of 64 × 2 pixels). Pixels in a double column have their digital part merged together to optimize the area of the digital circuitry. A "periphery block" is included to manage I/O and to provide biasing and global configuration.

Pixel and superpixel architecture
Each pixel implements an analog and digital front-end, shown in figure 2. Current pulses coming from the sensor or from a test capacitor are amplified and shaped by the preamplifier and feedback network and compared to a global threshold. This threshold is locally adjusted with a 4-bit Digital-to-Analog Converter (DAC) to compensate for pixel-to-pixel threshold mismatch. The result of the comparison is used in the pixel logic as an enable signal for the counting clocks of both the ToT and ToA counters. Local state machines are implemented in order to decide when to stop counting: for the ToA counter, it is tied to a global shutter signal, which is synchronously distributed to the whole pixel matrix and used as a timing reference. For the ToT measurement, the counting will stop as soon as the discriminator signal goes below the threshold. The ToT counting clock frequency can also be divided by a programmable value, to adjust the dynamic range to the measurement. The content of the 4-bit ToA and 4-bit ToT counters forms the 8-bit data acquired by each pixel with a valid hit. Pixels without input pulses that exceed the threshold will have empty counters.
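The counting scheme described above can be illustrated with a minimal behavioural sketch. It is not the pixel logic itself: it assumes ideal binary counters that saturate at full scale, a 10 ns counting period (the 100 MHz acquisition clock), and it ignores the phase between the pulse and the clock. The function name and its parameters are hypothetical.

```python
def pixel_counts(t_rise_ns, t_fall_ns, shutter_close_ns,
                 clock_period_ns=10.0, tot_divider=1, n_bits=4):
    """Return (toa, tot) counts for a single discriminator pulse.

    ToA counts clock periods from the rising edge of the discriminator
    until the shutter closes. ToT counts (optionally divided) clock
    periods while the discriminator output is high.
    Assumption: both counters saturate at 2**n_bits - 1.
    """
    max_count = 2 ** n_bits - 1
    if t_rise_ns >= shutter_close_ns:
        return 0, 0  # pulse arrived after the shutter closed: no hit
    toa = int((shutter_close_ns - t_rise_ns) // clock_period_ns)
    tot = int((t_fall_ns - t_rise_ns) // (clock_period_ns * tot_divider))
    return min(toa, max_count), min(tot, max_count)
```

For example, a pulse that crosses the threshold 100 ns before the shutter closes and stays above it for 35 ns gives 10 ToA counts and 3 ToT counts in this model. Dividing the ToT clock by 2 reduces the ToT count to 1 while doubling the measurable pulse width.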
The pixel array also implements a data compression scheme. This is done by adding a flag for every pixel, that is set to 1 if the pixel registered some hits during the shutter time and is left at 0 if no hit was recorded on that pixel. A multiplexer controlled by this flag causes the pixels with no information to be skipped during the readout phase. This means that each pixel has one additional bit that needs to be read out (9 bits instead of 8) but pixels without a valid hit only contain a single bit of data.
Pixels are grouped together in 2 by 8 pixel clusters (or "superpixels") and 8 superpixels form a double column. Some blocks are shared between pixels in the same superpixel. One such block is the clock distribution tree. Another block implemented at superpixel level is an additional hit flag flip-flop to allow for superpixel readout skipping. It works in the same way as the hit flag in the pixel, by having its value stored in a latch at the beginning of the readout. This latch controls a multiplexer which allows the superpixel to be skipped during the double column readout. To determine whether any pixel in the superpixel was hit, the superpixel hit flag is computed as the logical OR of the 16 hit flags of the individual pixels. An additional flag bit is added in every double column to cause its readout to be skipped completely if no associated pixel is hit during an acquisition.
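The effect of the three levels of hit flags on the data volume can be estimated with a short sketch. The exact bit layout of the readout stream is not specified here, so the model below is an assumption: one flag bit per double column, one flag bit per superpixel in a hit double column, and one flag bit per pixel in a hit superpixel, followed by 8 data bits for each hit pixel. The function and constant names are hypothetical.

```python
PIXELS_PER_SUPERPIXEL = 16          # 2 x 8 pixels
SUPERPIXELS_PER_DOUBLE_COLUMN = 8
DATA_BITS = 8                       # 4-bit ToA + 4-bit ToT


def double_column_bits(hits):
    """Number of bits read out from one double column.

    hits: sequence of 128 booleans, ordered superpixel by superpixel.
    Assumed layout: skipped blocks still contribute their own flag bit.
    """
    expected = PIXELS_PER_SUPERPIXEL * SUPERPIXELS_PER_DOUBLE_COLUMN
    if len(hits) != expected:
        raise ValueError(f"expected {expected} hit flags, got {len(hits)}")

    bits = 1  # double-column flag
    if not any(hits):
        return bits  # whole double column skipped

    for s in range(SUPERPIXELS_PER_DOUBLE_COLUMN):
        start = s * PIXELS_PER_SUPERPIXEL
        group = hits[start:start + PIXELS_PER_SUPERPIXEL]
        bits += 1  # superpixel flag (logical OR of its 16 pixel flags)
        if not any(group):
            continue  # superpixel skipped
        for hit in group:
            bits += 1 + (DATA_BITS if hit else 0)  # pixel flag + data
    return bits
```

Under these assumptions an empty double column costs 1 bit and a double column with a single hit costs 33 bits, compared with 1024 bits for an uncompressed readout of 128 pixels. A fully occupied double column costs 1161 bits, so the scheme only pays off at low occupancy.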

Periphery and chip operation
During readout, the two counters in each pixel are connected together as an 8-bit shift register. Pixels in every double column are daisy-chained, from the top to the bottom. Data is then shifted out one double column at a time, one bit per clock cycle, starting with the leftmost double column. Each pixel shifts the data to the next one, making the counters work as a long shift register. When all the data from one double column have been shifted out, the readout continues with the next double column, until all double columns have been read out.
The chip uses two different clock signals: a 320 MHz clock for the readout and I/O and a 100 MHz clock for the acquisition. The clock is sent to the pixels from the bottom to the top of each double column through a series of buffers (one per superpixel), such that the clock arrives at each pixel with a slightly different delay. This delayed (i.e. not in phase) clock distribution has multiple effects on the pixel array. First, it helps to minimize the total area, as it reduces the number of clock buffers that need to be implemented. Also, by not making all pixel logic gates switch at the same time, it reduces the instantaneous power consumption. The impact of receiving the clock with a different phase is negligible during data acquisition, as input pulses are not in phase with the clock and there is no communication between pixels. The ToA measurement is not affected, as the reference shutter signal is distributed synchronously. During data readout, the clock delay modifies the timing of the circuitry, but simulations showed that its impact on the timing was negligible.
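As a rough order-of-magnitude check, the serial readout time follows directly from the number of bits and the 320 MHz readout clock. The sketch below assumes a single serial stream at one bit per clock cycle and ignores flag bits and any protocol overhead; the function name is hypothetical.

```python
READOUT_CLOCK_HZ = 320e6  # readout and I/O clock


def readout_time_us(n_bits, clock_hz=READOUT_CLOCK_HZ):
    """Time in microseconds to shift out n_bits at one bit per clock cycle."""
    return n_bits / clock_hz * 1e6


# Uncompressed full matrix: 64 x 64 pixels, 8 bits each.
full_matrix_bits = 64 * 64 * 8
full_matrix_time_us = readout_time_us(full_matrix_bits)  # about 102.4 us
```

With zero suppression the number of bits, and therefore the readout time, scales with the occupancy rather than with the matrix size.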
The periphery also includes thirteen 8-bit DACs to provide the biasing voltages to the analog blocks and one 12-bit DAC for the threshold voltage. DACs providing bias voltages to pixels have their output buffered to cope with gate leakage, since their output is connected to every pixel.
Due to cooling constraints, the average power consumption of the detector must be lower than 50 mW/cm², which motivates the use of a power pulsing technique. The power pulsing is implemented by having two DACs in the periphery to program each state of the most power-consuming pixel analog blocks (the preamplifier and the discriminator). One DAC is used to control the nominal biasing during data acquisition. The other one is added to provide a stand-by biasing current within a range of values several times lower than the first one. A multiplexer for each double column in the periphery can switch the biasing of the pixels from the nominal value to this "low-power" state, decreasing the power consumption of the analog part by more than one order of magnitude. In this state, the analog circuits are not fully functional, but it is possible to wake up the pixel and start an acquisition in about 15 µs. A summary of the main CLICpix specifications can be found in table 1.
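The benefit of power pulsing on the average consumption can be expressed as a duty-cycle weighted mean of the two bias states. The sketch below is generic: the function name is hypothetical, and no power figures or duty cycles for CLICpix are implied beyond the 50 mW/cm² budget quoted above.

```python
POWER_BUDGET_MW_PER_CM2 = 50.0  # average power requirement


def average_power(p_active, p_standby, active_s, period_s):
    """Duty-cycle weighted average power (same unit as the inputs).

    active_s should include the wake-up time (about 15 us) in addition
    to the acquisition window, since the nominal bias is applied during both.
    """
    if not 0.0 <= active_s <= period_s:
        raise ValueError("active_s must lie between 0 and period_s")
    duty = active_s / period_s
    return p_active * duty + p_standby * (1.0 - duty)


def within_budget(p_avg_mw_per_cm2, budget=POWER_BUDGET_MW_PER_CM2):
    """True if the average power density is below the cooling budget."""
    return p_avg_mw_per_cm2 < budget
```

For a small duty cycle the average is dominated by the stand-by term, which is why reducing the stand-by bias by more than one order of magnitude translates almost directly into the average consumption.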

Implementation details

Analog front-end
The preamplifier uses the Krummenacher feedback architecture [4]. The circuit is made of a cascoded single-ended NMOS amplifier and a feedback network; the schematic is shown in figure 3. Feedback and test capacitors are implemented using vertical natural capacitors (using the parasitic lateral metal-to-metal capacitance) in order to have good accuracy at the expense of a larger area. The leakage compensation capacitor (C1 in figure 3) is implemented with a PMOS gate capacitance because of its high nominal value (∼200 fF). NMOS transistors in the preamplifier are implemented as Deep N-Well (DNW) transistors to isolate the front-end from the substrate.
The discriminator is a two-stage open-loop amplifier, with an additional digital inverter connected to its output to increase its total gain. The discriminator output has a dynamic range of 0.2 V to 1 V for a 0.5 mV input swing. The input-to-output delay is less than 5 ns. The two stages were sized to minimize the effective threshold dispersion due to transistor mismatch, while being compatible with the available area. The threshold standard deviation due to the discriminator is 5.2 mV, which, combined with the dispersion due to the preamplifier, gives a total standard deviation of 7.2 mV, corresponding to 160 e− (according to simulations).
The threshold calibration DAC is a 4-bit binary-weighted current DAC. A dynamic range of four bits is sufficient to compensate for ±3 standard deviations of the threshold value due to pixel-to-pixel mismatch. This leads to a total dynamic range of about 43 mV, which for a 4-bit DAC corresponds to a Least Significant Bit (LSB) of 60 e−, comparable with the equivalent input noise.
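The figures quoted for the calibration DAC can be cross-checked with a few lines of arithmetic. The sketch below uses only the simulated values given above; the preamplifier contribution is derived under the assumption that the two mismatch sources are uncorrelated and add in quadrature, which is not stated explicitly in the text.

```python
import math

sigma_disc_mv = 5.2     # threshold spread due to the discriminator
sigma_total_mv = 7.2    # total threshold spread
sigma_total_e = 160.0   # same total spread expressed in electrons
dac_bits = 4

# Assumption: uncorrelated contributions adding in quadrature.
sigma_preamp_mv = math.sqrt(sigma_total_mv**2 - sigma_disc_mv**2)  # about 5.0 mV

# The DAC has to cover +/- 3 standard deviations of the total spread.
range_mv = 6 * sigma_total_mv       # about 43 mV
range_e = 6 * sigma_total_e         # 960 e-
lsb_e = range_e / 2**dac_bits       # 60 e- per DAC step
```

The 43 mV range and the 60 e− LSB both follow from the 7.2 mV (160 e−) total spread.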

Digital structures
Each pixel is designed to measure ToT and ToA simultaneously. The output of the discriminator is used as an input to two state machines, which generate counting signals for two 4-bit counters. The counters are implemented as 4-bit linear feedback shift registers (LFSR) used to count ToT and ToA. Each counter can work in two modes. During acquisition, it receives a counting clock for the ToT or ToA measurements. During readout, a multiplexer is used to connect both counters together as an 8-bit shift register, in order to shift the data down the column and off-chip.
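Since an LFSR steps through its states in a pseudo-random order, the raw counter value has to be decoded off-chip with a look-up table. The sketch below shows the principle for a 4-bit Fibonacci LFSR. The feedback polynomial (x^4 + x^3 + 1) and the seed are assumptions chosen for illustration, as the actual taps and reset state of the CLICpix counters are not given here. Note that a maximal-length 4-bit LFSR cycles through 15 states rather than 16.

```python
def lfsr_step(state):
    """Advance a 4-bit Fibonacci LFSR by one clock (taps on bits 3 and 2)."""
    new_bit = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | new_bit) & 0xF


def build_decode_table(seed=0b0001):
    """Map each LFSR state to the number of clocks elapsed since the seed."""
    table = {}
    state = seed
    count = 0
    while state not in table:
        table[state] = count
        state = lfsr_step(state)
        count += 1
    return table


DECODE = build_decode_table()


def decode_pixel_word(word):
    """Split an 8-bit pixel word into two decoded counts.

    Assumption: the upper nibble holds one counter and the lower nibble
    the other; the actual bit ordering is not specified in the text.
    """
    return DECODE[(word >> 4) & 0xF], DECODE[word & 0xF]
```

The table is built once and reused for every pixel word, so decoding adds a negligible cost to the readout software.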
Each pixel includes two state machines that control the generation of the counting signals for the ToT and ToA counters. These state machines are implemented as Asynchronous State Machines (ASM), which means they do not use latches or flip-flops to store the states. The current state is stored exploiting the delay of the logic gates and it is updated as soon as an input changes.
Latches for storing the local pixel configuration are included: 4 bits for the threshold adjustment DAC, one for pixel masking, one to allow for the injection of test pulses and one to configure the pixel for event counting mode. The mask bit causes the pixel to be skipped during readout. The inputs of these latches are connected to the counter flip-flops. In order to program them, configuration data are shifted into the counters from the periphery and a global signal is sent to latch their values.

Chip characterization
A custom test set-up was developed to test and characterize the CLICpix prototype, consisting of a mezzanine board holding the chip, an FPGA development board and a command line interface program running on a Linux PC. The characterization was performed using the internal test capacitor in a bare chip, without using a bonded sensor.
Basic functionality of the chip was successfully validated. Tests of the ToT counting were performed by injecting a controlled amount of charge into the preamplifier. A plot showing the dependency of the ToT on the value of the feedback current (which determines how fast the pulse returns to baseline) is shown in figure 4. The ToT gain (the slope of the ToT characteristic) was measured for the entire matrix, showing a 4.2% r.m.s. spread. The corresponding pixel map can be found in figure 5. The map shows a higher gain in the central part of the matrix and a lower one in the corners, although the variation is within expected values due to pixel-to-pixel mismatch. The nonlinearity of the ToA measurement was also tested by sending test pulses with different delays with respect to the shutter signal; it was found to be lower than 0.5 LSB across the full dynamic range of the ToA counter (0 to 160 ns).
The power pulsing feature was validated and it reduces the power consumption of the analog structures by more than one order of magnitude while in an idle state. The time necessary to switch back to an "acquisition" state was found to be 15 µs.
Tests of the analog performance of the pixel front-end were performed using the pixel counting mode (summing the results of multiple measurements, in order to work around the 4-bit saturation of the counter). This measurement was used to find the baseline voltage of each pixel front-end. This allowed the development of a calibration algorithm to equalize the baseline of pixels across the entire matrix. The threshold value corresponding to the lowest and highest calibration DAC code in every pixel was measured, so that the DAC characteristic for every pixel could be interpolated. From this measurement, an individual DAC code for each pixel was chosen in order to equalize the effective threshold. The results of the measurement, before and after calibration, can be found in figure 6. The r.m.s. of the threshold distribution before equalization was found to be 128 e−. The spread was reduced to 22 e− after equalization.
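The equalization procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the calibration code used for the measurements: the DAC characteristic of each pixel is taken to be linear between the two measured end points, and the common target is chosen as the mean of the pixel mid-range thresholds. All function names are hypothetical.

```python
def equalization_code(thr_at_min_code, thr_at_max_code, target, n_bits=4):
    """DAC code whose interpolated threshold is closest to the target.

    Assumption: linear DAC characteristic between code 0 and the
    maximum code. The result is clamped to the valid code range.
    """
    max_code = 2 ** n_bits - 1
    if thr_at_max_code == thr_at_min_code:
        return 0  # DAC has no effect on this pixel
    step = (thr_at_max_code - thr_at_min_code) / max_code
    code = round((target - thr_at_min_code) / step)
    return max(0, min(max_code, code))


def equalize(thr_at_min_code, thr_at_max_code, n_bits=4):
    """Return one DAC code per pixel for a common target threshold.

    Inputs are per-pixel lists of thresholds measured at the lowest and
    highest DAC code. Assumption: the target is the mean mid-range value.
    """
    mids = [(lo + hi) / 2 for lo, hi in zip(thr_at_min_code, thr_at_max_code)]
    target = sum(mids) / len(mids)
    return [equalization_code(lo, hi, target, n_bits)
            for lo, hi in zip(thr_at_min_code, thr_at_max_code)]
```

Since the selected code is the nearest integer, the residual spread after equalization is bounded by about half a DAC step per pixel as long as the target lies inside the range of every pixel.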
The same measurement was also used to calculate the noise level in each pixel, by extracting the r.m.s. of the Gaussian curves. A noise map of the full matrix is shown in figure 7. A vertical stripe pattern is visible in the plot, with pixels in odd columns having a higher noise than pixels in even columns. This is due to the slightly different layout of pixels on the right and on the left of a double column. Results are, however, close to simulations in both cases.
A summary of the characterization results compared to simulated values can be found in table 2.

Summary and future work
The CLICpix prototype chip has been designed using a commercial 65 nm CMOS technology, featuring 25 µm by 25 µm pixels, simultaneous 4-bit ToT and ToA measurements, power pulsing and on-chip data compression. The chip has been characterized using test pulses. Measurements match simulations closely, with an equivalent noise < 60 e− and a pixel-to-pixel threshold spread of 22 e− r.m.s. after calibration. Future work includes completing the characterization and investigating the possibility of bump-bonding a sensor to the prototype chip.