Timepix4, a large area pixel detector readout chip which can be tiled on 4 sides providing sub-200 ps timestamp binning

Timepix4 is a 24.7 × 30.0 mm² hybrid pixel detector readout ASIC which has been designed to permit detector tiling on 4 sides. It consists of 448 × 512 pixels which can be bump bonded to a sensor with square pixels at a pitch of 55 µm. Like its predecessor, Timepix3, it can operate in data driven mode, sending out information (Time of Arrival, ToA, and Time over Threshold, ToT) only when a pixel has a hit above a pre-defined and programmable threshold. In this mode hits can be tagged to a time bin of <200 ps and Timepix4 can record hits correctly at incoming rates of ∼3.6 Mhits/mm²/s. In photon counting (or frame-based) mode it can count incoming hits at rates of up to 5 Ghits/mm²/s. In both modes data is output via between 2 and 16 serializers, each running at a programmable data bandwidth of between 40 Mbps and 10 Gbps. The specifications, architecture and circuit implementation are described along with first electrical measurements and measurements with radioactive sources. In photon counting mode X-ray images have been taken at a threshold of 650 e− (with <10 masked pixels). In data driven mode images were taken of ToA/ToT data using a ⁹⁰Sr source at a threshold of 800 e− (with ∼120 masked pixels).


Introduction
From their origins in the tracking detectors for high energy physics [1], hybrid pixel detectors have provided innovative and sometimes disruptive solutions in a number of different fields. Single photon counting hybrid pixel detectors have become ubiquitous at synchrotron light sources [2]. Recent advances in spectroscopic X-ray detection promise a small revolution in medical imaging [3,4]. In hybrid pixel detectors the detection threshold is typically set well above the noise floor, providing noise-hit-free images. This feature, combined with the possibility to detect the energy and arrival time of a particle interaction, has led to numerous new applications and enabled the use of triggerless readout for the first time in a high rate high energy physics tracking detector [5] as well as a novel solution for real-time, non-destructive proton beam monitoring at the CERN PS machine [6]. The compact nature of the ASIC/sensor assembly combined with a miniaturised USB readout system [7] has led to, among others, applications in schools, helping students to 'see' particles interacting with matter [8], and to radiation background monitoring and dosimetry in various space applications [9], including at the ISS [10].
In many applications the detection area of a single ASIC (typically 2 cm²) is sufficient. However, where large areas are to be covered tiling becomes necessary. As the ASICs are typically tile-able on 3 sides a single large area sensor with 2 × n chips can be fabricated with multiple ASICs attached to the same sensor [11]. However, to cover larger areas seamlessly rather complicated rooftile-like structures have to be created whereby the peripheral electronics and associated wire bonds must be hidden beneath the neighbouring tile. A much more elegant solution would be provided by using an ASIC which can be tiled on all 4 sides [12]. The Timepix4 ASIC described here is, to the best of our knowledge, the first large area, high resolution pixel detector readout ASIC which can be tiled on all 4 sides. Moreover, in data driven mode it is capable of allocating a hit to within a 195 ps time bin, registering Time of Arrival information (ToA) while also recording the amplitude of the detected charge using Time-over-Threshold (ToT). Using the full output bandwidth available on 16 × 10 Gbps links the ASIC can detect up to ∼3.6 Mhits/mm²/s.
Section 2 will start by outlining the main specifications for the Timepix4 ASIC. The floorplan is described detailing the strategy used to permit tiling on all four sides. The analog and digital pixel cell and the super-pixel architecture are then explained. The section concludes with a description of the two main readout modes of the ASIC, data driven and frame-based. Section 3 focusses on the results of electrical characterisation and section 4 covers first measurements with radioactive sources of an ASIC which has been bump bonded to a full-sized 300 μm thick p⁺-in-n silicon sensor. We will conclude in section 5 with a summary and an outline of future plans.
2022 JINST 17 C01044

Specifications, floorplan and circuit implementation
The Timepix4 chip is specified to work in 2 primary operating modes: photon counting (or frame-based) and data driven.
In the photon counting mode Timepix4 should be able to count up to a maximum of ∼5 Ghits/mm²/s (@2.2 ke− or 8 keV in Si). There is a single threshold and 2 counters per pixel providing continuous read/write operation. The counter depths can be either 8-bit or 16-bit with maximum frame rates determined by the number of bits used for counting and the number of high-speed data links used for readout (up to 16 links each running at up to 10 Gbps). In 8-bit mode the maximum frame rate is ∼90 kfps and in 16-bit mode it is ∼45 kfps when using the full readout bandwidth of 160 Gbps.
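The quoted frame rates can be cross-checked with a back-of-envelope calculation. The sketch below assumes that the frame rate is limited only by the raw link bandwidth; packet headers, frame start packets and any line-encoding overhead are ignored, so it is an estimate and not a model of the actual readout. The function name is illustrative.

```python
# Back-of-envelope frame rate estimate.
# Assumption: the raw link bandwidth is the only limit; protocol and
# encoding overheads are ignored.
N_COLS, N_ROWS = 448, 512          # pixel matrix quoted in the text


def max_frame_rate(counter_bits, n_links=16, link_rate_bps=10e9):
    """Return the estimated frame rate in frames per second."""
    bits_per_frame = N_COLS * N_ROWS * counter_bits
    return n_links * link_rate_bps / bits_per_frame


rate_8bit = max_frame_rate(8)      # roughly 87 kfps
rate_16bit = max_frame_rate(16)    # roughly 44 kfps
```

Under these assumptions both values land within a few percent of the quoted ∼90 kfps and ∼45 kfps.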
In data driven mode the particle arrival time is recorded within a time bin of 195 ps and time-over-threshold is recorded with a resolution of ∼700 e− FWHM (∼1.5 keV FWHM in Si). Each hit generates a 64-bit word and at full speed the ASIC correctly detects ∼3.6 Mhits/mm²/s.
The ASIC is composed of 448 × 512 pixels. It is designed to be connected to a sensor which is composed of 448 × 512 square pixels at a pitch of 55 μm. The top surface of the ASIC comprises a uniform matrix of 448 × 512 bump bonding pads (with octagonal bump openings of 12 μm across). The readout pixels are composed of 2 matrices of 448 × 256 pixels with dimensions of 55 μm × 51.4 μm each. A fan-in, which uses the 2 top layers of metal, connects the uniformly distributed bump bonding pads to the pixel inputs, adding ∼45 fF to the input capacitance seen by the front-end electronics. The layout of the fan-in is made such that almost all pixels in the matrix have the same input capacitance. This approach leaves ∼460 μm available under the bump bonding pads at the top and bottom of the ASIC and ∼920 μm between the two pixel matrices. This is where the peripheral circuitry, I/O and Through Silicon Via (TSV) structures are implemented, see figure 1(a). In order to probe the ASIC on wafer and to wire bond single chips without requiring TSV processing, wire bond extenders are added to the top and bottom of the ASIC. As indicated in figure 1(b) these can be used for conventional wire bonding or diced off when TSVs are to be used for I/O. Figure 1(c) is a die photograph of the ASIC.
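The floorplan numbers and the quoted hit rate can be reproduced from the figures given above. The following sketch is only an arithmetic cross-check: it assumes that the space freed for the periphery is the difference between the height of the bump pad matrix and the height of the two readout matrices, and that all of the link bandwidth carries 64-bit hit words with no overhead. Variable names are illustrative.

```python
# Geometry and bandwidth cross-check using only numbers quoted in the text.
PITCH_UM = 55.0            # bump pad (sensor pixel) pitch
CELL_HEIGHT_UM = 51.4      # height of a readout pixel cell
N_COLS, N_ROWS = 448, 512  # full pixel matrix
ROWS_PER_MATRIX = 256      # rows in each of the two readout matrices

pad_height_um = N_ROWS * PITCH_UM                        # 28160 um
cells_height_um = 2 * ROWS_PER_MATRIX * CELL_HEIGHT_UM   # about 26317 um
freed_height_um = pad_height_um - cells_height_um        # about 1843 um
quoted_periphery_um = 460 + 920 + 460                    # top + middle + bottom

# Hit rate if every bit of link bandwidth carried 64-bit hit words
# (assumption: no protocol or encoding overhead).
active_area_mm2 = (N_COLS * PITCH_UM / 1000) * (N_ROWS * PITCH_UM / 1000)
hits_per_s = 16 * 10e9 / 64
hit_rate_per_mm2 = hits_per_s / active_area_mm2          # about 3.6e6
```

The freed height agrees with the sum of the quoted peripheral regions to within a few micrometres, and the bandwidth-limited rate matches the quoted ∼3.6 Mhits/mm²/s.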
The analog front-end of the Timepix4 chip consists of a charge sensitive amplifier with a programmable feedback capacitance, see figure 2. The pixel gain can be adjusted according to the detection requirements (see table 1). In the high gain mode, a 3 fF metal-to-metal capacitance is used. An additional capacitance of 3 fF can be added in parallel to the main capacitance to decrease the gain of the amplifier by enabling the low gain mode bit. A MOS gate capacitance can also be enabled in the signal path to program the chip in the adaptive gain mode [13]. This last mode works only for signals of positive polarity and extends the dynamic range of the ToT up to ∼800 ke−.

Figure 2.
A block diagram of the Timepix4 front-end. There are 3 gain modes. In high gain mode the feedback capacitor is 3 fF and in low gain mode it is ∼6 fF. The dynamic range of the front-end can be further extended when detecting positive polarity signals by adding a CMOS gate capacitance in log gain mode. The polarity of the front-end is programmable and the front-end can be disabled (powered down), if required. There are 5 bits for tuning the threshold in each pixel.
The feedback loop that resets the front-end and compensates for leakage currents is based on the Krummenacher circuit [14]. It is optimized for negative polarity input sources and compensates for up to 10 nA per pixel at nominal settings.
In the photon counting (or frame-based) mode the ASIC is intended for use in environments (such as synchrotrons) where high count rates are required. Under these conditions a high counting speed is of the utmost importance. We aim therefore for a deadtime not exceeding 50 ns (@2.2 ke− or 8 keV in Si) permitting a count rate of up to ∼5 Ghits/mm²/s. The risetime of the preamp is 10 ns at nominal settings.
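To put the deadtime target in context, the maximum flux can be converted to a per-pixel rate. The sketch below assumes uniform illumination and no charge sharing between pixels. The two counting-loss formulas are the standard textbook non-paralyzable and paralyzable models, used here purely for illustration; they are not a description of the measured response of the ASIC.

```python
import math

PITCH_MM = 0.055            # sensor pixel pitch
FLUX_PER_MM2 = 5e9          # quoted maximum flux, hits/mm^2/s
DEAD_TIME_S = 50e-9         # quoted deadtime target

pixel_rate = FLUX_PER_MM2 * PITCH_MM**2    # about 15.1e6 hits/pixel/s
occupancy = pixel_rate * DEAD_TIME_S       # about 0.76 hits per deadtime

# Textbook counting-loss models (assumptions, not measured behaviour).
non_paralyzable = pixel_rate / (1 + occupancy)    # counted rate
paralyzable = pixel_rate * math.exp(-occupancy)   # counted rate
```

At the maximum flux the mean interval between hits on a pixel (about 66 ns) is comparable to the deadtime, so in either model a significant fraction of hits is lost and a count rate correction would be needed.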
A discriminator compares the preamplifier output voltage pulse with a single energy threshold. A 5-bit threshold adjustment DAC corrects for pixel-to-pixel threshold mismatch. As we aim to bin hits to within 195 ps in data driven mode the front-end jitter should be <60 ps rms for input charges >10 ke−.
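One possible way to read the jitter target is to compare it with the error introduced by the time binning itself. The sketch below assumes that hits are uniformly distributed within a bin, which gives a quantisation error of bin/√12 rms, and that binning error and front-end jitter are independent and add in quadrature. This interpretation is an assumption made for illustration and is not stated in the specification.

```python
import math

BIN_PS = 195.0                                 # ToA bin width quoted in the text
quantisation_rms_ps = BIN_PS / math.sqrt(12)   # about 56.3 ps for a uniform bin


def combined_rms_ps(jitter_rms_ps):
    """Quadrature sum of binning error and jitter (assumed independent)."""
    return math.hypot(quantisation_rms_ps, jitter_rms_ps)


at_jitter_limit_ps = combined_rms_ps(60.0)     # about 82 ps rms
```

With these assumptions a jitter of 60 ps rms is of the same order as the binning error, so neither contribution dominates the combined resolution.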
The power consumption per unit area has been set at a maximum of 1 W/cm², a level at which only modest cooling is required. The power supply is 1.2 V. At the nominal settings, 3 μA is consumed by the preamp, 4 μA by the discriminator and 0.5 μA by the threshold adjustment circuit.
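The nominal currents can be turned into a power density for comparison with the 1 W/cm² budget. The sketch below counts only the three blocks listed above and assumes one front-end per 55 μm × 55 μm of sensor area; any other analog circuitry is not included, so the result should be read as a lower bound on the analog consumption.

```python
SUPPLY_V = 1.2
CURRENTS_UA = {"preamp": 3.0, "discriminator": 4.0, "threshold_adjust": 0.5}
PITCH_CM = 55e-4    # 55 um expressed in cm

power_per_pixel_w = SUPPLY_V * sum(CURRENTS_UA.values()) * 1e-6   # 9 uW
pixels_per_cm2 = 1.0 / PITCH_CM**2                                # about 33058
front_end_w_per_cm2 = power_per_pixel_w * pixels_per_cm2          # about 0.30
```

The three listed blocks therefore account for roughly 0.3 W/cm² of the budget.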
In the following we will describe the behaviour of the pixel in data driven mode and photon counting mode.

Data driven mode
Each pixel matrix is composed of 224 double columns, each 2 pixels wide and 256 pixels high. The pixels in each double column are grouped into super pixels containing 2 × 4 pixels, see figure 3. Each super pixel has a voltage controlled oscillator (VCO) which oscillates at 640 MHz when active. In data driven mode the rising edge of the first discriminator to fire in a super pixel is used to start up the VCO as shown in figure 4. The VCO itself contains 4 inverter stages with 4-bit delay calibration. At the rising edge of the discriminator the state of the inverters is recorded as the 4-bit ultra-fast Time of Arrival (ufToA-start). This covers the case where the VCO has already been started by another pixel in the super pixel. The output of the VCO is used to increment the 5-bit fast Time of Arrival (fToA) counter. The VCO is stopped on the next rising edge of the 40 MHz master clock. At that moment the content of the global 16-bit ToA is latched along with the value of the fToA counter (stored as fToA-rise) and the internal state of the delay inverters of the VCO (ufToA-stop). The ToT counter records the number of positive transitions of the 40 MHz master clock while the discriminator is above threshold. When the discriminator falls below threshold the VCO is started again and increments the fToA-fall counter until the next rising edge of the master clock. At this stage the data packet is ready for readout and the 9-bit pixel y-coordinate is added to form a 56-bit packet. At the end of column the 8-bit double column address is added, providing 64 bits in total per hit.
One of the main challenges associated with this design is the precise distribution of the timestamp clock across the pixel matrix. The solution adopted was already explained in [15] but the operating principle is described here for convenience. As we have already seen, the tagging of the particle arrival time is achieved using a local VCO in each super pixel and also recording the state of the internal inverters.
However, for the time precision to be maintained across the pixel matrix the reference clock used at each super pixel must be accurately reproduced. Each double column is divided into 16 blocks containing 4 super pixels (i.e. a super pixel group, SPG), see figure 5. Each block contains 2 Adjustable Delay Buffers (ADBs), one of which is used to propagate the master clock up the double column while the other is used for downwards propagation of the master clock. A controller at the end of each double column ensures that the delay between the input master clock and output master clock is maintained at exactly 25 ns. The controller adds or subtracts delays to the ADBs via an 8-bit bus. The ADBs contain 14 coarse delay elements and 15 fine delay elements. The 4 MSBs are used to derive a 14-bit thermometer code which adds or skips a coarse delay element while the lower 4 bits are used to derive a 15-bit thermometer code used to add up to 15 small capacitors capable of inducing a fine delay step of only 4 ps. It is also possible to completely skip ADBs to find and eliminate outlier ADBs which may otherwise prevent a full double column from operating correctly.
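The decoding of the 8-bit ADB control word into two thermometer codes can be sketched as follows. This is an illustration under stated assumptions: the bit polarity (whether a set bit adds or skips a coarse element), the saturation of out-of-range coarse values at 14 and the function names are hypothetical and are not taken from the design.

```python
N_COARSE = 14       # coarse delay elements per ADB
N_FINE = 15         # fine delay elements (small capacitors) per ADB
FINE_STEP_PS = 4    # quoted fine delay step


def thermometer(n, width):
    """Return a thermometer code with the lowest n of width bits set."""
    n = max(0, min(n, width))   # assumption: out-of-range values saturate
    return (1 << n) - 1


def decode_adb_word(word):
    """Split an 8-bit control word into (coarse, fine) thermometer codes."""
    if not 0 <= word <= 0xFF:
        raise ValueError("control word must fit in 8 bits")
    coarse = thermometer(word >> 4, N_COARSE)   # 4 MSBs
    fine = thermometer(word & 0xF, N_FINE)      # 4 LSBs
    return coarse, fine


def fine_delay_ps(word):
    """Delay contributed by the fine elements alone, in picoseconds."""
    return (word & 0xF) * FINE_STEP_PS
```

For example, `decode_adb_word(0x35)` enables 3 coarse and 5 fine elements, and the fine elements span at most 15 × 4 ps = 60 ps.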
Each time a pixel is hit it requests permission to send its hit information down the double column. A priority encoder is used to determine the order in which pixels in a given double column are read out to the end of column. The ends of column at the top and bottom of the ASIC are split into 2 groups, left and right, called data fabrics (see figure 6). Every fourth double column is connected to a link forming segments. The data from each segment is directed via its link to a packet processor in the middle of its (top or bottom) periphery. The packet processor (one for the top matrix and one for the bottom matrix) aggregates the packets from 224 double columns. There is then a router which multiplexes the data onto the available fast serial links (GWTs) which are an evolution of those used in a previous design [16]. The chip can be programmed to use between 1 and 8 serial links on each periphery and the readout speed of the links can be programmed to between 40 Mbps and 10 Gbps.
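The role of the priority encoder can be illustrated with a small software model. The sketch below represents the pending requests of one double column as a bitmask and grants them one at a time. The priority direction (lowest bit first) and the flat, non-hierarchical arbitration are assumptions made for illustration only; the actual on-chip arbitration scheme is not detailed here.

```python
def grant_next(requests):
    """Return (index, remaining) for the highest-priority pending request.

    requests is an integer bitmask with one bit per pixel.
    Assumption: the lowest set bit has the highest priority.
    """
    if requests == 0:
        return None, 0
    lowest = requests & -requests        # isolate the lowest set bit
    index = lowest.bit_length() - 1
    return index, requests & ~lowest


def readout_order(requests):
    """List the order in which pending pixels would be read out."""
    order = []
    while requests:
        index, requests = grant_next(requests)
        order.append(index)
    return order
```

With pixels 0, 3 and 5 requesting readout, `readout_order(0b101001)` returns `[0, 3, 5]`.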
The analog power consumption depends on the exact biassing conditions used. In the default configuration it is estimated to be ∼400 mW/cm². The digital power consumption depends on the clock frequency used and in data driven mode increases depending on the incoming hit rate. Below ∼3 Mhits/mm²/s and at full clock speed it is below 200 mW/cm².

Frame based (photon counting) mode
In frame-based or photon counting mode the logic available in each pixel is used differently. The local VCO in the super pixels and the 16-bit ToA distribution are turned off. For each incoming hit above threshold a pixel counter is incremented. Each pixel contains 2 counters which have a programmable depth of either 8 or 16 bits. As there are 2 counters per pixel continuous read/write operation is used: one counter is being read out while the other is counting. During readout 64-bit data packets from each super pixel contain the raw count data. In the case of 8-bit mode, the count data from all 8 pixels is contained within one packet. In 16-bit counting mode each super pixel generates two 64-bit packets of data. A frame start packet is sent off chip and this is followed by the packets from each data segment in a sequence as defined by the number of serializers being used (between 1 and 8 per periphery).
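The packet arithmetic above (8 pixels × 8 bits in one 64-bit packet, or 8 pixels × 16 bits in two packets) can be sketched as follows. The byte ordering, the assignment of pixels to packets and the function names are assumptions chosen for illustration; the actual packet layout is not specified here.

```python
def pack_counts_8bit(counts):
    """Pack eight 8-bit pixel counts into one 64-bit word.

    Assumption: pixel 0 occupies the least significant byte.
    """
    if len(counts) != 8 or any(not 0 <= c <= 0xFF for c in counts):
        raise ValueError("expected eight values in the range 0..255")
    word = 0
    for i, c in enumerate(counts):
        word |= c << (8 * i)
    return word


def unpack_counts_8bit(word):
    """Inverse of pack_counts_8bit."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def pack_counts_16bit(counts):
    """Pack eight 16-bit pixel counts into two 64-bit words (four pixels each)."""
    if len(counts) != 8 or any(not 0 <= c <= 0xFFFF for c in counts):
        raise ValueError("expected eight values in the range 0..65535")
    words = []
    for start in (0, 4):
        word = 0
        for i, c in enumerate(counts[start:start + 4]):
            word |= c << (16 * i)
        words.append(word)
    return words
```

In this model a super pixel in 16-bit mode always produces exactly two words, which is why the frame rate halves relative to 8-bit mode for the same link bandwidth.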

Electrical measurements
A single naked ASIC with wire bonding extenders was mounted on a daughter card designed at Nikhef. Readout was performed using the Spidr4 hardware and in-house software. The results quoted here are for the Timepix4v1 version of the ASIC. In that version an incorrect PMOS device model led to the on-pixel VCO oscillating faster than expected (at ∼920 MHz instead of 640 MHz) and it was not possible to lock it to the 40 MHz master clock in the pixel matrix. The VCOs in the matrix were therefore running unlocked with a ToA LSB of ∼135 ps instead of 195 ps. The same VCO was used for the PLL of the fast output links. In this case it was possible to lock the VCO to the 40 MHz master clock by lowering the power supply of the output block. With the lower power supply it was possible to read out at 2.56 Gbps (instead of the design value of 10.24 Gbps). At the time of writing the v2 version of the chip (with a corrected VCO) is being diced prior to testing.
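The relation between the VCO frequency and the ToA LSB quoted above can be checked numerically. The sketch below assumes that each VCO period is subdivided into 8 phase steps, which is what the quoted pair of values (640 MHz and 195 ps) implies; the function names are illustrative.

```python
MASTER_CLOCK_HZ = 40e6
PHASE_STEPS = 8   # assumption: phase steps per VCO period implied by the text


def toa_bin_ps(vco_hz):
    """Width of the finest time bin in picoseconds for a given VCO frequency."""
    return 1e12 / (vco_hz * PHASE_STEPS)


def vco_periods_per_clock(vco_hz):
    """Number of VCO periods in one master clock period."""
    return vco_hz / MASTER_CLOCK_HZ


design_bin_ps = toa_bin_ps(640e6)     # about 195.3 ps
measured_bin_ps = toa_bin_ps(920e6)   # about 135.9 ps
```

The second value is in line with the ∼135 ps LSB reported for the free-running VCO. Under the same assumption the number of VCO periods per 25 ns master clock period rises from 16 to 23, which still fits within a 5-bit fToA counter.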
The ASIC was programmed with default settings and the noise and threshold variation measured in both frame-based (photon counting) mode and data driven mode. The noise was measured by scanning the threshold DAC through the noise floor and fitting the count data from each pixel with a Gaussian distribution. The design value of 13.5 e− per DAC step was assumed, a value later confirmed with radioactive source measurements. Figure 7 shows the distribution of the threshold adjustment bits and the measured electronic noise in frame-based mode. There are no systematic variations in the threshold tuning bits and only some (expected) increase in noise above the peripheral circuitry. Shielding had been added to the input nodes of the pixels whose pads lie on top of the peripheral circuitry, adding a further ∼30 fF to the input capacitance of those pixels and leading to an increase in noise of ∼10 e−. 137 pixels out of 229k have a noise of >80 e− rms. In data driven mode there is some residual coupling between the propagating ToA and some of the pixels whose inputs are on top of the peripheral regions. At a threshold of 1000 e− 100 pixels had to be masked whereas at a threshold of 650 e− 1250 pixels had to be masked. The time resolution was measured by injecting a test pulse on both the analog and digital inputs of a single pixel. The measurements are shown in figure 8. For the digital injection all hits were recorded with less than 100 ps rms precision. Timewalk on the analog front-end dominates at lower input charges but above ∼7 ke− all hits are recorded with a precision of <125 ps rms. A full summary of the electrical measurements is provided in table 1.
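The noise extraction from a threshold scan can be illustrated with a simplified estimator. Instead of the least-squares Gaussian fit used for the measurements, the sketch below uses count-weighted moments, which give the same centre and width for a clean Gaussian profile but are more sensitive to tails and saturation. The scan data is synthetic and the function name is illustrative.

```python
import math

E_PER_DAC = 13.5   # design value quoted in the text, electrons per DAC step


def noise_from_scan(dac_values, counts):
    """Estimate baseline position and noise from a threshold scan of one pixel.

    Uses count-weighted moments instead of a least-squares Gaussian fit.
    Returns (mean in DAC steps, sigma in electrons rms).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("scan contains no counts")
    mean = sum(d * c for d, c in zip(dac_values, counts)) / total
    var = sum(c * (d - mean) ** 2 for d, c in zip(dac_values, counts)) / total
    return mean, math.sqrt(var) * E_PER_DAC


# Synthetic scan (made-up data): Gaussian of 5 DAC steps rms centred on DAC 100.
dacs = list(range(60, 141))
scan = [1000 * math.exp(-0.5 * ((d - 100) / 5.0) ** 2) for d in dacs]
mean_dac, noise_e = noise_from_scan(dacs, scan)   # close to 100 and 67.5 e- rms
```

A width of 5 DAC steps corresponds to 5 × 13.5 = 67.5 e− rms with the design conversion factor.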

Measurements with radioactive sources
One Timepix4v1 wafer was processed with bump bonds by Advacam in Finland after coring the wafer from 300 mm to 200 mm. ASICs were selected randomly from the wafer and flip chip bonded to 300 μm thick p⁺-in-n silicon sensors. A photo of one assembly mounted on the Nikhef chip card is shown in figure 9. Figure 10 shows images obtained using the chip in photon counting mode with a threshold of ∼650 e− using an X-ray tube with Cu target at a bias voltage of 30 kV. There are only 6 missing pixels in the image, demonstrating the quality of the ASIC, sensor and bump bonding. The left image is composed of raw data while the right image is corrected with 20 flat field images to remove count variations due to the non-uniformity of the X-ray source.
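A flat field correction of this kind can be sketched as follows. The procedure shown, dividing the raw image by the average of the flat field images and rescaling by the mean flat value, is a common approach and is given here as an assumption; the exact correction applied to the images is not described beyond the use of 20 flat field images.

```python
def flat_field_correct(raw, flats):
    """Correct a raw image with the average of several flat field images.

    raw is a list of rows; flats is a list of images with the same shape.
    Pixels whose averaged flat value is zero are set to 0 (assumption:
    such pixels are treated as dead or masked).
    """
    n_rows, n_cols = len(raw), len(raw[0])
    mean_flat = [[sum(f[r][c] for f in flats) / len(flats)
                  for c in range(n_cols)] for r in range(n_rows)]
    live = [v for row in mean_flat for v in row if v > 0]
    norm = sum(live) / len(live) if live else 0.0   # keeps the overall count scale
    return [[raw[r][c] * norm / mean_flat[r][c] if mean_flat[r][c] > 0 else 0.0
             for c in range(n_cols)] for r in range(n_rows)]
```

Averaging several flat field images reduces the statistical noise that the correction itself adds to the final image.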
In order to illustrate the performance of the ASIC in data driven mode a ⁹⁰Sr source was placed above the sensor and data was taken over a 10 s interval. A 5 ms slice of the arrival time data is shown in figure 11, illustrating how the individual tracks are correctly grouped. A closer examination of the data showed the expected uniform response of the ToA data. Analysis of the statistics of the fToA-rise, fToA-fall, ufToA-start and ufToA-stop data showed distributions which are consistent with the free running behaviour of the VCO (figure 11).

Figure 10.
X-ray images of a dried fish taken using Timepix4v1 in frame-based mode. The threshold was set at 650 e−. The source was an X-ray tube with Cu target and a voltage of 30 kV. On the left raw data and on the right the flat field corrected image.

Figure 11.
Images of measurements taken with a ⁹⁰Sr source in data driven mode. The image on the left shows the total number of counts per pixel over the 10 s acquisition time. On the right is the ToA information covering a 5 ms time slice. The operating threshold was 800 e− and ∼120 pixels were masked.

Summary and outlook
We have presented the Timepix4 readout ASIC, describing its architecture and details of its implementation along with first electrical and radioactive source measurements. The performance of the front-end is consistent with simulations and it is possible to operate the chip at a threshold of 650 e− in photon counting or frame-based mode (with <10 masked pixels) and in data driven mode at a threshold of 800 e− (with ∼120 masked pixels). First electrical measurements show that with a fast detector depositing >7000 e− per pixel all hits can be correctly allocated to a 195 ps bin. An error in PMOS device modelling led to the on-super-pixel VCO oscillating at a much higher frequency than foreseen. A new version of the ASIC is expected soon where this issue and other minor bugs are fixed. We will publish detailed measurements of the new version when it becomes available.
Looking ahead we can appreciate that ASICs such as Timepix4 in data driven mode, while providing excellent and versatile tools in many domains, produce much more data than may be needed in any given application. Moreover, while the data driven architecture strongly reduces the amount of useless data produced by frame-based readout schemes, the data for any given cluster of hits is likely to leave the chip in an unforeseen order when the event rate is high. The quantity of data is therefore an issue, as is the somewhat disordered nature of its off-chip transmission. We foresee addressing such issues in future in two ways. Firstly, if we adopt a more scaled CMOS technology (say 28 nm) it should be possible to create local intelligence at the pixel level whereby pixels belonging to a given cluster are aware of their surroundings and the hit data is only sent to the chip periphery as a train of clustered hits. Secondly, it should be possible using modern processes to integrate an on-chip processor at the chip periphery which can be trained (in the case the processor is based on a neural network) or programmed (in the case of a more conventional processor) to identify clusters with signatures consistent with a given particle species of interest. An ASIC programmed to work in an electron microscope may, for example, only send out impact point coordinates for each impinging electron, while a chip programmed for a High Energy Physics experiment may only send out track information consistent with selected high momentum particles. The benefit of such an approach is that a single ASIC could be trained (or programmed) for very different purposes, always aiming to reduce the off-chip bandwidth to a minimum.