Pixel chip architecture optimization based on a simplified statistical and analytical model

The technical challenges related to the increased collision rates of the LHC will significantly affect the design of detector electronics. Efficient hit processing is achieved in pixel detectors by grouping pixels in regions that share buffering logic. We present an approach to determine an optimized sharing strategy between pixels, depending on the shape of clustered hits in the detector. Simple statistical models of such shapes have been developed as a function of the position in the detector. The buffering performance of different pixel region configurations has been compared, showing a significant improvement over architectures that do not feature pixel grouping.


Introduction
New hybrid pixel detectors with improved resolution, capable of dealing with hit rates up to 2 GHz/cm², will be required for future High Energy Physics experiments at the Large Hadron Collider (LHC) at CERN [1]. In this scenario, complex on-chip digital signal processing and buffering at the lowest possible power consumption will be needed, requiring special design techniques to obtain a system that works reliably.
Particle hits in the pixel detector are naturally clustered, i.e. signal is produced in more than one pixel unit cell per particle track. A cluster is composed of a given number of hit pixels and its shape depends on a number of factors, e.g. the position in the detector where it is produced, pixel size, sensor thickness, magnetic field component, radiation damage and discriminator threshold [2]. We describe an approach to determine an optimized sharing strategy of digital logic in pixel chip arrays, depending on the shapes of clustered hits in a detector. In order to process clustered hits correctly, the hit information of every hit pixel should be associated with the same cluster without errors. This can be accomplished efficiently by grouping pixels in so-called pixel regions (PR), where buffering logic is shared between them. Such a strategy has already been adopted in current readout chip architectures such as FE-I4 for the ATLAS experiment [3], Timepix3 and Velopix [4,5].
The main goal of the study presented in this work is to determine a range of the most convenient PR shapes and sizes to adopt in the design of next generation pixel chips, without taking into account implementation-related issues (e.g. routing, area occupation, power consumption). Assumptions on the shapes of clustered hits in a detector have been made and two simplified analytical/statistical models of cluster shapes have been developed. The performance of different PR configurations, both square and rectangular (figure 1), has been evaluated by calculating the number of buffer locations and the total number of memory bits. The cluster models used have also been compared to real physics data.
The paper is organized as follows: section 2 describes the analytical models created for the cluster shapes and the statistical assumptions on the number of hit pixels inside such models; section 3 illustrates the results of the study for a set of PR configurations with each cluster model, describing in detail the adopted technique for comparing buffering requirements; section 4 explains the comparison of the study with real physics data. Finally, section 5 draws the conclusions and presents further developments.

Cluster models
With reference to the cylindrical coordinate space inside the detector, we have focused on how the cluster shape varies depending on the position in the pixel detector. In the center of the barrel a cluster is usually composed of a central hit pixel with additional signals generated in the periphery. At the edges of the barrel, on the other hand, the shape is elongated because of the track angle from the collision point. The number of hit pixels inside such a shape is approximately the same as its length in number of pixels. Two analytical models consistent with these assumptions have been developed:
• Clusters related to the center of the barrel have been abstracted with a fixed square envelope of size 3 × 3 pixels (referred to from now on as the symmetrical cluster model). Inside such an envelope it is possible to model a cluster composed of 1 to 9 hit pixels, one of which is assumed to be in the center (figure 2(a)).
• Clusters related to the edges of the barrel have been modeled using rectangular shapes with φ = 1 and z = $n_p$, where $n_p \in [1, 16]$ is the number of hit pixels in the cluster (elongated cluster model, shown in figure 2(b)).
Each cluster model also relies on statistical information on $n_p$. Four typical statistical distributions of $n_p$ have been constructed for each model in order to represent different [...]
A statistical cluster model with an envelope of 3 × 4 has also been developed but it is not presented here, as it gives very similar results to the fully symmetrical 3 × 3 model.

Average number of PRs occupied by a cluster
A quantity that is needed for calculating the data buffering during the trigger latency is the average number of pixel regions occupied by a cluster ($\overline{PR}$). It depends on the size and shape of the PR and it is related to the probability that the cluster occupies a given number of PRs. This probability needs to be calculated: i) for each possible position of the cluster in the PR and ii) for each value of $n_p$. A dedicated procedure has been developed for each cluster model.
For the symmetrical cluster model, it is possible to identify in a PR different categories of pixel cells that produce the same results when the 3 × 3 envelope is placed with its center on them (i.e. corner cells, edge cells, middle cells, as shown in figure 4): it is therefore sufficient to compute the probability mentioned above for each cell category. Averaging the results with respect to the number of PRs and $n_p$, using weights from the statistical distribution of $n_p$, produces the cell type contribution to $\overline{PR}$. The value of $\overline{PR}$ for a given PR configuration is finally obtained by combining the number of cells of a given type in the region with their own contribution:

$$\overline{PR} = \frac{\sum_j N_j \, \overline{PR}_j}{N_{PR}}$$

where $N_j$ is the number of cells of type $j$ in the PR, $\overline{PR}_j$ is the contribution of cells of type $j$ to $\overline{PR}$ and $N_{PR}$ is the total number of pixel cells in the PR.
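The procedure for the symmetrical model can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure used for the published plots: the center pixel is always hit, the remaining $n_p - 1$ hits are drawn with equal probability from the 8 surrounding pixels (this uniformity is an assumption of the sketch), and every cell is visited individually instead of being grouped into corner, edge and middle categories, which gives the same weighted average. All function names are hypothetical.

```python
from itertools import combinations

# Offsets of the 8 pixels surrounding the center of the 3 x 3 envelope.
NEIGHBORS = [(dz, dphi) for dz in (-1, 0, 1) for dphi in (-1, 0, 1)
             if (dz, dphi) != (0, 0)]


def regions_touched(pixels, pr_w, pr_h):
    """Count the distinct pixel regions covered by a list of (z, phi) pixels."""
    # Floor division also maps negative coordinates to the neighboring region.
    return len({(z // pr_w, phi // pr_h) for z, phi in pixels})


def cell_contribution(cz, cphi, pr_w, pr_h, dist):
    """Average number of PRs for clusters centered on cell (cz, cphi).

    dist maps n_p (1..9) to its probability. Assumption of this sketch:
    all placements of the n_p - 1 outer hits are equally likely.
    """
    contribution = 0.0
    for n_p, weight in dist.items():
        shapes = list(combinations(NEIGHBORS, n_p - 1))
        touched = sum(
            regions_touched(
                [(cz, cphi)] + [(cz + dz, cphi + dphi) for dz, dphi in shape],
                pr_w, pr_h)
            for shape in shapes)
        contribution += weight * touched / len(shapes)
    return contribution


def mean_prs_symmetrical(pr_w, pr_h, dist):
    """Average the per-cell contributions over all cells of the PR."""
    cells = [(cz, cphi) for cz in range(pr_w) for cphi in range(pr_h)]
    total = sum(cell_contribution(cz, cphi, pr_w, pr_h, dist)
                for cz, cphi in cells)
    return total / len(cells)
```

As a sanity check, a single-pixel cluster always occupies exactly one region, and a fully populated 3 × 3 cluster on 2 × 2 regions always touches four of them.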
For the elongated cluster model, on the other hand, the average number of PRs involved does not depend on the PR height because the φ component is unitary. The probability of occupying a given number of PRs has thus been calculated for all the possible positions of the cluster along the z direction. The results have shown a relation between $\overline{PR}$, $n_p$ and the PR width $w$, which can be described by an empirical formula in which $D(n_p)$, the chosen statistical distribution of $n_p$, provides the weights.
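The position scan along z is easy to reproduce. The sketch below enumerates every start offset of a cluster of length $n_p$ inside a PR of width $w$ and averages the number of regions it spans. Under the assumption that all offsets are equally likely, the enumeration agrees with the closed form $1 + (n_p - 1)/w$; this expression comes out of the sketch itself and is not necessarily the empirical formula of the original text. Function names are hypothetical.

```python
def mean_prs_elongated(n_p, w):
    """Average number of PRs spanned by a 1 x n_p cluster, PR width w.

    Assumption of this sketch: the cluster start is uniformly distributed
    over the w possible offsets inside a region.
    """
    total = 0
    for start in range(w):
        first_region = start // w                # always 0 for 0 <= start < w
        last_region = (start + n_p - 1) // w
        total += last_region - first_region + 1
    return total / w


def mean_prs_for_distribution(dist, w):
    """Weight the per-length averages with a distribution D(n_p)."""
    return sum(prob * mean_prs_elongated(n_p, w) for n_p, prob in dist.items())
```

For example, `mean_prs_for_distribution({1: 0.5, 3: 0.5}, 2)` averages a single-pixel cluster (always one region) with a three-pixel cluster (always two regions for `w = 2`).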

Algorithm for determining the required number of PR buffer locations
The performance of square and rectangular PR configurations can be evaluated using different quantities. First, the number of buffer locations needed to store data during the trigger latency was calculated. This quantity is defined as the number of memory locations required to keep the overflow probability below a given value; for HEP applications an acceptable value of this probability is typically $10^{-2}$ at the maximum hit rate. In order to determine the overflow probability, the PR buffer has been modeled as an ideal array of finite memory locations where hit packets arrive at a rate that depends on the track rate in the detector ($R_T$, equal to the cluster rate) and are serviced by a trigger accept/reject after a known latency time. If the number of memory locations is $h$, an overflow is assumed to happen when the number of hit packets that arrive during the latency is greater than $h$. Under the acceptable hypothesis of uncorrelated interactions taking place at each bunch crossing, the probability of the arrival of more than $h$ hit packets during the trigger latency can be calculated by applying the binomial distribution formula:

$$P(N > h) = 1 - \sum_{k=0}^{h} \binom{L/BX}{k} \, p^{k} \, (1 - p)^{L/BX - k}$$

where $N$ is the number of hit packets arriving during the latency, $L$ is the latency time, $BX$ is the bunch crossing period and $p$ is the probability of arrival of a hit packet during a bunch crossing period.

2014 JINST 9 C03011

The probability $p$ is determined by first calculating the pixel region rate $R_{PR}$, which is the product of the track rate in the detector and the average number of PRs occupied by a cluster:

$$R_{PR} = R_T \cdot \overline{PR}$$

Then $R_{PR}$ is multiplied by the pixel region area and the bunch crossing period:

$$p = R_{PR} \cdot A_p \cdot N_{PR} \cdot BX$$

where $A_p$ is the area of a single pixel cell and $N_{PR}$ is the number of pixel cells in the pixel region.
The overflow probability has thus been plotted with respect to $h$ in order to determine the required number of buffer locations for a PR ($h_{req}$). We have taken into account a case study related to next generation pixel chips with a track rate of 500 MHz/cm² in the detector, a trigger latency of 20 µs and a bunch crossing clock period of 25 ns [1]. Figure 5 shows the number of memory locations normalized per pixel (so that results can be compared across configurations) for square and rectangular PR configurations and for both the symmetrical and elongated cluster models. The same results are obtained with both models for square PRs (figure 5(a)), which is linked to the fact that the "Average 4.22" statistical distribution of $n_p$ has been used for both of them; moreover, fewer memory locations per pixel are needed as the PR size increases. On the other hand, the adoption of rectangular PRs (figure 5(b)) does not bring any benefit for clusters in the center of the barrel. For the edges of the barrel, the decrease in buffer locations for PR shapes elongated in the same direction as the cluster (z) is smaller (14.3%) than the increase (35.7%) for those elongated in the other direction.
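The buffer sizing described in this section can be sketched in a few lines of Python. The sketch evaluates the binomial tail with a running recurrence for the probability mass and then searches for the smallest number of locations that keeps the overflow probability below the target. The track rate, latency and bunch crossing period are the case-study values quoted in this section; the 50 µm × 50 µm pixel size, the 2 × 2 region and the average of 1.5 PRs per cluster are placeholders chosen for illustration only, and all function names are hypothetical.

```python
def packet_probability(track_rate, mean_prs, pixel_area, n_pixels, bx_period):
    """p = R_T * mean PRs per cluster * A_p * N_PR * BX (rates per cm^2 per s)."""
    return track_rate * mean_prs * pixel_area * n_pixels * bx_period


def overflow_probability(h, n_bx, p):
    """P(more than h hit packets in n_bx bunch crossings), binomial model."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if h < n_bx else 0.0
    pmf = (1.0 - p) ** n_bx          # P(N = 0)
    cdf = 0.0
    for k in range(min(h, n_bx) + 1):
        cdf += pmf
        # Recurrence: P(N = k + 1) = P(N = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n_bx - k) / (k + 1) * p / (1.0 - p)
    return max(0.0, 1.0 - cdf)


def required_locations(n_bx, p, target):
    """Smallest h whose overflow probability is below the target."""
    h = 0
    while overflow_probability(h, n_bx, p) >= target:
        h += 1
    return h


# Case-study rates and timing; pixel size, PR size and mean_prs are placeholders.
n_bx = round(20e-6 / 25e-9)                      # latency / bunch crossing period
p = packet_probability(track_rate=500e6,         # 500 MHz/cm^2
                       mean_prs=1.5,             # hypothetical value
                       pixel_area=(50e-4) ** 2,  # 50 um x 50 um in cm^2 (assumed)
                       n_pixels=4,               # 2 x 2 pixel region
                       bx_period=25e-9)
h_req = required_locations(n_bx, p, target=1e-2)
```

Dividing `h_req` by the number of pixels in the region gives the per-pixel normalization used to compare configurations.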

Total number of memory bits
The most appropriate quantity to study is the total number of memory bits in the shared PR buffer ($b_{tot}$), as it reflects both the number of buffer locations and the memory organization. Different kinds of memory organization are in fact possible in a PR, depending on the format of the information that needs to be retrieved. For this study we have assumed a packet containing the bunch crossing ID (B-ID) (10 bits) and the hit amplitude (4 bits) in each pixel channel of the [...]
The plots in figure 6(a) show the total number of required memory bits for the symmetrical and elongated cluster models with square PRs. This quantity has an apparent optimum at 2 × 2, with a modest increase for 3 × 3 and 4 × 4 pixel regions; these are the most suitable PR configurations. The total number of memory bits has also been plotted for the case of keeping the overflow probability below $10^{-3}$, which requires up to 30% more memory. The plots in figure 6(b) relate to rectangular PRs and confirm the results found in the previous subsection. We can thus deduce that the best solution is to adopt only square pixel region architectures. Implementing one configuration for pixel chips in the central area of the barrel and another one for those at the edges is not feasible, as it makes the design significantly more complicated.
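A rough way to see how buffer depth and packet format combine into a bit count is sketched below. The memory organization is an assumption of this sketch, since the sentence describing it is incomplete in the text above: each buffer location is taken to hold one shared 10-bit B-ID plus a 4-bit amplitude for every pixel of the region. Function names are hypothetical.

```python
def bits_per_location(n_pixels, bid_bits=10, amplitude_bits=4):
    """Width of one buffer location (assumed layout: shared B-ID + per-pixel amplitude)."""
    return bid_bits + amplitude_bits * n_pixels


def total_memory_bits(h_req, n_pixels, bid_bits=10, amplitude_bits=4):
    """Total bits in the shared PR buffer for h_req locations."""
    return h_req * bits_per_location(n_pixels, bid_bits, amplitude_bits)


def memory_bits_per_pixel(h_req, n_pixels):
    """Normalize by region size so that different PR shapes can be compared."""
    return total_memory_bits(h_req, n_pixels) / n_pixels
```

With this layout a larger region spreads the B-ID over more pixels, while the number of locations it needs grows with its hit rate, which is the trade-off that the bit count captures.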
The final memory optimization will depend strongly on the type of memory used for a pixel chip implementation. If dedicated SRAM blocks are used for data buffering it is in general much more efficient, from a layout/area point of view, to have a few "large" memory blocks than many "small" memory blocks. This can make it more advantageous to have a relatively large pixel region (e.g. 4 × 4) instead of the indicated optimum of a 2 × 2 PR. The very hostile radiation environment for very high rate pixel detectors (e.g. ∼ 1 Grad for ATLAS/CMS phase 2 pixels [1]) will dictate what kind of memory structure can be used.

Comparison of used cluster model to real physics data
In order to have an experimental confirmation of the assumptions underlying the cluster models, results have also been obtained using realistic statistical distributions of $n_p$ and compared to those obtained with the typical ones.
We have used cluster footprint distributions constructed with data from CMS LHC Run 200091 (figure 7(a) shows the distribution for the center of the barrel, while figure 7(b) shows that for the edges of the barrel). These distributions include all the detected clusters, i.e. both those associated with tracks and the untracked ones, which are caused by loopers and other effects. Around 40% of the clusters are on tracks; nevertheless, we have chosen to use all the clusters rather than only the tracked ones, because the pixel detector should be able to work with all kinds of hits without problems, including the random ones, which are potentially dangerous.
The value of the footprint distributions at point (z, φ) corresponds to the probability that the smallest rectangular envelope encapsulating a clustered hit has a size of z × φ. Different strategies, described in the following subsections, have been adopted for each cluster model in order to derive distributions of $n_p$ from them.
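The footprint definition can be made concrete with a small sketch. It assumes that each cluster is available as a list of (z, φ) pixel indices, which is a hypothetical input format rather than the format of the CMS data, and it builds the normalized histogram of smallest-envelope sizes.

```python
from collections import Counter


def footprint(pixels):
    """Size (z, phi) of the smallest rectangular envelope around a cluster."""
    zs = [z for z, _ in pixels]
    phis = [phi for _, phi in pixels]
    return (max(zs) - min(zs) + 1, max(phis) - min(phis) + 1)


def footprint_distribution(clusters):
    """Fraction of clusters observed for each envelope size."""
    counts = Counter(footprint(cluster) for cluster in clusters)
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()}


# Toy clusters for illustration only, not detector data.
toy_clusters = [
    [(5, 7)],                    # single pixel, 1 x 1 envelope
    [(5, 7), (6, 7), (6, 8)],    # L-shaped cluster, 2 x 2 envelope
    [(0, 3), (1, 3), (2, 3)],    # elongated along z, 3 x 1 envelope
    [(9, 9)],
]
toy_distribution = footprint_distribution(toy_clusters)
```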

Comparison for symmetrical cluster model
The symmetrical model describes all the possible cluster shapes inside a 3 × 3 envelope. Such footprints are described by the distribution in figure 7(a) in the sub-domain z ∈ [1, 3] and φ ∈ [1, 3]. This sub-domain contains 84.2% of the total volume of the distribution, so restricting the cluster model to it is an acceptable simplification.
The complete statistical/analytical study has been repeated for the symmetrical model using realistic statistical distributions of $n_p$ that have been derived from the depicted distribution with the following relationship:

$$P(n_p = k) = \sum_{z=1}^{3} \sum_{\phi=1}^{3} f(z, \phi) \, P_{z\phi}(n_p = k)$$

where $P_{z\phi}(n_p = k)$ is the contribution of the single point (z, φ) to $P(n_p = k)$ and $f(z, \phi)$ is the value of the distribution in figure 7(a) at point (z, φ). Based on two opposite hypotheses on $P_{z\phi}(n_p = k)$, two different distributions have been obtained. The first one takes into account, with the same probability, all the possible clusters that can be encapsulated in an envelope of smallest size z × φ (e.g. if (z, φ) = (2, 2) the corresponding $P_{22}(n_p = k)$ is the ratio between the number of occurrences of clusters made of $n_p = k$ pixels, where k = 2, 3 or 4, and the number of occurrences of all the possible clusters); this distribution is plotted in figure 8(a). The second one, on the other hand, considers only the cluster with the maximum value of $n_p$ inside the z × φ envelope (e.g. if (z, φ) = (2, 2), $P_{22}(n_p = k)$ is zero for k = 2, 3 and is 1 for k = 4); it is shown in figure 8(b).
The results of the study for both realistic distributions, plotted in figure 9(a), are consistent with those obtained using the typical "Average 4.22" distribution. The maximum difference between the plots for the two hypotheses is 8.3% (not taking into account the non-relevant 1 × 1 case). Any other assumption on cluster probability produces results that lie between those two plots.
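The two hypotheses can be illustrated by brute-force enumeration. The sketch below counts every set of hit pixels whose smallest envelope is exactly z × φ and groups the sets by their number of pixels. It ignores connectivity and the center-hit requirement of the symmetrical model, which is a simplification of this sketch; for the 2 × 2 envelope it yields two shapes with two pixels, four with three and one with four. The footprint values `f` passed to `np_distribution` are supplied by the caller and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def shapes_by_size(z, phi):
    """Number of pixel sets with smallest envelope exactly z x phi, keyed by n_p."""
    cells = [(i, j) for i in range(z) for j in range(phi)]
    counts = Counter()
    for k in range(1, len(cells) + 1):
        for subset in combinations(cells, k):
            rows = {i for i, _ in subset}
            cols = {j for _, j in subset}
            # The envelope is exactly z x phi only if the extreme rows and
            # columns are all occupied.
            if {0, z - 1} <= rows and {0, phi - 1} <= cols:
                counts[k] += 1
    return counts


def p_equal_shapes(z, phi):
    """First hypothesis: every admissible shape is equally likely."""
    counts = shapes_by_size(z, phi)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def p_max_only(z, phi):
    """Second hypothesis: the envelope is always completely filled."""
    return {z * phi: 1.0}


def np_distribution(f, p_zphi):
    """P(n_p = k) as the sum over (z, phi) of f(z, phi) * P_zphi(n_p = k)."""
    result = Counter()
    for (z, phi), weight in f.items():
        for k, prob in p_zphi(z, phi).items():
            result[k] += weight * prob
    return dict(result)
```

Calling `np_distribution(f, p_equal_shapes)` and `np_distribution(f, p_max_only)` with the same footprint values gives the two bracketing distributions.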

Comparison for elongated cluster model
Because $n_p$ at the edges of the barrel is approximately equal to the z extent of the cluster, the distribution in figure 7(b) has been intersected with planes parallel to the z axis in order to keep φ = 1 for the elongated model. Since the sub-domain φ ∈ [1, 3] and z ∈ [1, 16] contains 84% of the total volume of the distribution, the intersections with the planes φ = 1, φ = 2 and φ = 3 have been taken into account. The realistic statistical distribution of $n_p$ has then been derived by calculating the average of the three intersections, weighted by the area subtended by each of them (figure 8(c)).
The complete statistical/analytical study has been repeated for the model using this distribution and the results are compared with the initial ones (typical "Average 4.22" distribution) in figure 9(b). Consistency between the plots validates the adopted cluster model.

Conclusion
A study of shared buffering performance has been conducted for a set of square and rectangular pixel regions. The study is based on two analytical and statistical cluster shape models related to different positions in a detector; the goal is to identify the most suitable configurations to consider for the design of next generation pixel chips, which will operate at very high hit rates in the upgrades of the HL-LHC experiments.
The pixel region performance has been evaluated in terms of memory bits in regional shared buffers and the results show that square regions with size from 2 × 2 to 4 × 4 pixels seem the most suitable ones. The most appropriate configuration will be chosen in the indicated range and this will strongly depend on the physical chip implementation.
The statistics underlying the study have been validated using a set of real physics data and the results show consistency with the initial assumptions. A suitable simulation and verification framework is currently being developed using the hardware description and verification language SystemVerilog, with the goal of modeling the candidate pixel chip architectures in more detail. It is also planned to use the cluster models for the generation of input hits. This verification environment will then be used for further architecture optimizations and finally for extensive design verification.