Low power, high resolution MAPS for particle tracking and imaging

We describe here the first monolithic pixel detector prototype embedding the OrthoPix architecture, specifically designed for imaging applications where the fraction of pixels hit per frame (occupancy) is small (on the order of 1% or less), as in High Energy Physics, Medical Imaging and other fields. The current state of the art embeds complex circuitry in the pixel cell to discriminate relevant signals, leading to an extremely effective, non-destructive compression at the price of large power consumption and pixel area limitations. The OrthoPix architecture instead implements a passive projective compression scheme, leading to low power, small pixel cells and large area devices.


Introduction
The pixel sensor prototype we present has been produced to test and demonstrate the feasibility of the OrthoPix architecture, specifically designed to compress digital raster images containing just a few pixels of interest in a full frame [1]. Such sparsely populated images represent the typical raw dataset in many scientific and industrial applications, i.e. the irreducible data generated by the primary detected radiation (photons, charged particles) in single-particle tracking, in High Energy Physics, Medical Imaging, and many other fields. The detectors of choice for such applications are usually pixel sensors, where the particle position is measured by analysing the clustered charge signal that each passing or absorbed particle releases into the pixel matrix [2]. A stringent requirement for such a method to be effective is that, in a single frame, the pixel clusters are distinguishable. In most practical applications the occupancy, i.e. the number of pixels carrying useful signals relative to the total pixel count, is on the order of a few per cent or less [2].
Many imaging applications that integrate the particle flux to extract the information they need (for example x-ray imaging, traditional microscopy or electron microscopy) can be addressed by superimposing a set of sparsely populated images, as long as the particle flux has a time structure and a fast enough detector is available. The performance gain achievable with such techniques has been demonstrated in recent years by the growing number of groups working with so-called super-resolution microscopy techniques [3], which also led to the Nobel prize in chemistry in 2014 [4].
The drawback of traditional pixel detectors used to capture sparsely populated images is the need to read the whole matrix to extract the few useful pixels. This translates into a limited frame rate, even in the latest scientific-grade commercial CMOS detectors, especially if a large pixel count is required for large area or high density acquisitions. Therefore, whenever the need to acquire sparsely populated images arose, custom-made solutions using the hybrid-pixel approach have been successfully employed [5,6]. While hybrid-pixel detectors offer unmatched performance for many applications, they also have intrinsic technological constraints (briefly illustrated in the next section) which set a minimum practical pixel pitch (around 25 µm), a high power consumption (compared to standard CCD and CMOS detectors) and, last but not least, a high production complexity and cost.
The architecture we implemented instead approaches the problem of sparsification (compressing a sparsely populated image) by means of topological, passive data reduction instead of the active, pixel-driven system adopted by traditional hybrid-pixel detectors. The main advantage of this architecture is that the pixel becomes an almost passive component, drastically reducing the power consumption and allowing its size to shrink, if needed, down to a few microns or less, depending on the implementation technology. The principal drawback is that it is a static, lossy compression scheme, which starts disrupting the original information when the occupancy rises above a given level.

Overview
We provide here only a brief overview of the major features of the OrthoPix architecture, which has been thoroughly described in [1], together with a short review of present state-of-the-art solutions for comparison. The idea at the basis of the OrthoPix architecture is an evolution of the traditional projection scheme, which uses the x and y projections of a data matrix and is widely employed due to its effectiveness and implementation simplicity [7]. The well-known drawback of the traditional projection scheme is that at most one particle per frame can be handled if its position is to be reconstructed correctly. Poisson statistics therefore dictate that the average particle rate should be about 1/10 of the frame rate to ensure full reconstruction efficiency [1]. Managing more events per frame has a much greater impact on the sustainable particle rate: if we assume for example that our system can distinguish up to eight particles per frame, then to achieve 99% reconstruction efficiency the average particle rate can be as high as four times the frame rate (P ≈ 4R), a factor 40 increase with respect to the case of at most one particle per frame (P ≈ 0.1R) [1]. A projective system capable of handling multiple events per frame would therefore boost speed performance.
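The rate argument above can be checked with a few lines of Python. This is only a rough sketch under a stated assumption: efficiency is taken here as the fraction of frames holding no more than the resolvable number of particles k, which may differ from the exact definition used in [1], so the printed values are indicative only. The function name `poisson_cdf` is introduced for illustration.

```python
import math

def poisson_cdf(k: int, mean: float) -> float:
    """Probability that a Poisson variable with the given mean is <= k."""
    return sum(math.exp(-mean) * mean**n / math.factorial(n) for n in range(k + 1))

# Assumed efficiency definition: fraction of frames with at most k particles.
single = poisson_cdf(1, 0.1)  # at most one particle per frame, P = 0.1 R
multi = poisson_cdf(8, 4.0)   # up to eight particles per frame, P = 4 R

print(f"k=1, mean 0.1 particles/frame: {single:.4f}")
print(f"k=8, mean 4.0 particles/frame: {multi:.4f}")
print(f"sustainable rate gain: {4.0 / 0.1:.0f}x")
```

Under this simplified definition both operating points keep the large majority of frames resolvable, while the sustainable particle rate differs by the factor 40 quoted in the text.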

State of the art
A way to handle multiple events with a pixel detector is to implement some intelligence in the pixel itself, as is done in hybrid-pixel detectors [2]. The limits of hybrid-pixel detectors are the inherent production complexity, mostly due to the difficulty of connecting the two layers, and the consequent high production and assembly cost. The bump bonding process also limits the single die size, making the coverage of large uniform areas without supporting structures a major issue. Finally, the per-pixel circuitry requires high power consumption and limits how far the pixel cell can be shrunk (pixel pitches of 20 µm or smaller are problematic).
Figure 1. The additional information provided by adding a third projection, along a diagonal in this example, makes it possible to correctly reconstruct the position of more than one single element in a frame.
Using more than just two projections helps in disentangling multiple-hit situations. The idea has been explored both in gaseous [8] and solid state [9] detectors: one of the natural ways to implement this approach, and an effective way to illustrate the concept (figure 1), is to add a diagonal projection to the basic row and column projection scheme. The appeal of such a projective compression method is apparent for any application where a lossy, yet highly efficient compression system is acceptable. Such a topological compression does not require any active element (transistors), leading to extremely low power and low footprint implementations, ideal whenever the lowest possible power consumption and/or the smallest pixel pitch are paramount.
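As an illustration of how a third projection removes ambiguities, the following minimal Python sketch projects two hits onto rows, columns and a diagonal, then lists the pixels compatible with all fired lines. The matrix side `N = 8`, the hit positions and the diagonal definition `(x + y) mod N` are assumptions chosen for the example, not the connection scheme of any specific detector.

```python
def project(hits, n, use_diagonal):
    """Return the sets of fired column, row and (optionally) diagonal lines."""
    xs = {x for x, _ in hits}
    ys = {y for _, y in hits}
    ds = {(x + y) % n for x, y in hits} if use_diagonal else None
    return xs, ys, ds

def candidates(xs, ys, ds, n):
    """Pixels compatible with every fired projection line."""
    return {
        (x, y)
        for x in xs
        for y in ys
        if ds is None or (x + y) % n in ds
    }

N = 8                    # hypothetical matrix side
hits = {(1, 2), (4, 6)}  # two true hits in the same frame

two_proj = candidates(*project(hits, N, use_diagonal=False), N)
three_proj = candidates(*project(hits, N, use_diagonal=True), N)

print("x/y only:     ", sorted(two_proj))
print("with diagonal:", sorted(three_proj))
```

With rows and columns alone the two hits produce four candidate pixels (two real, two ghosts); in this example the diagonal projection rejects both ghosts.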

The OrthoPix solution
The question is whether the straightforward approach of figure 1 is the best one, whether it can be further improved and, more generally, whether a model exists indicating an optimal way to improve multiple-hit reconstruction by adding an arbitrary number n of projections. For example, it can be easily demonstrated that the solution sketched in figure 1 is not the most efficient one for the case where three projections (n = 3) are used [1]. An accurate analytical treatment of the projective compression approach led to the formalization of the OrthoPix architecture. It can be shown that, given an arbitrary set of N² pixels and a set of n projections, there is a unique way, or a set of equivalent ways, to connect each pixel to each projection which maximizes the system efficiency in reconstructing multiple hits [1,7]. This generalized approach is illustrated in figure 2.
The OrthoPix architecture uses this "maximal-efficiency" connection scheme to link each pixel to a projection. The formula shown in (2.1) approximates the efficiency of the system in reconstructing multiple hits in a frame, assuming one hit is equivalent to one pixel with a signal level exceeding a given threshold. In equation (2.1), N² is the number of pixels composing the matrix, H the average number of hits per frame, and n the number of projections implemented. Figure 3 shows the reconstruction efficiency as predicted by (2.1) (triangles) and as obtained from Monte Carlo runs (squares). While the approximate formula clearly departs from the Monte Carlo for high hit densities, it provides excellent agreement in the most interesting region, where the reconstruction efficiency is close to 100%. A complete, exact efficiency formula is described in [1]. For the sake of simplicity, we have so far considered a hit equivalent to one single pixel hit. Unlike this one-hit, one-pixel scenario, in real applications each hit will lead to a cluster of pixels. If the cluster seed signal dominates, a threshold mechanism can effectively reduce this condition to the one just considered; otherwise, more than one pixel will have to be considered for each hit.

Monte Carlo simulations
To effectively model the reconstruction efficiency in the case of clusters, specific Monte Carlo simulations have been implemented. The performance of the OrthoPix lossy compression scheme strongly depends on the average cluster size and on the background noise, so a realistic check of its effectiveness requires simulations capable of rendering the relatively noisy (at room temperature), artifact-prone output of a MAPS sensor. To overcome the problem, a database of particle clusters, background noise, empty frames and artifacts has been built from a large dataset gathered during a test beam conducted at the CERN north hall [10]. This dataset has been chosen because it was gathered with a detector whose pixel size closely matched that of the OrthoPix prototype [11].
The Monte Carlo engine used all database elements to generate the most realistic possible frames, including background noise, artifacts and noise pick-up effects, and blended into them the clusters from the database after applying further random transformations (rotation, mirroring, translation, etc.) to maximize the cluster diversity. The rendered frame was then passed to the algorithm simulating the OrthoPix compression scheme, producing a bit-stream that was subsequently reconstructed into cluster positions and approximate sizes. As the number of "real" clusters in each frame was known a priori, this made it possible to check the OrthoPix performance, as well as to verify which effects (average background noise, artifacts, etc.) were the most detrimental to the system efficiency.
As an example, figure 4 illustrates Monte Carlo results for a simulated 2048-pixel detector, assuming a pixel pitch of 20 µm and using a cluster database where the average cluster multiplicity was 4 pixels, with a Landau distribution. To estimate the hit (cluster) flux, a frame rate of 50 MHz has been assumed. It is relevant to note that larger matrices (more pixels) handle higher event rates per frame, but lower event densities (hit rates per square centimeter), at higher compression levels.
The pixel size of 20 µm has been chosen considering a target resolution better than 10 µm, for either an analogue or a digital readout implementation. The cluster size and shape statistics are extremely important, as the number of clusters per frame that the OrthoPix architecture can effectively handle strongly depends on the average number of pixels composing the clusters [7]. It is worth noting that the figures in the plots refer to hits that have been uniquely reconstructed (mathematically guaranteed to be real). Unconfirmed real hits are not lost, and can be used to increase the efficiency when other means to distinguish them are available from outside the system. A typical example is a multi-layer tracking system, where unconfirmed real hits in one layer can be confirmed by using the other layers and carrying out track reconstruction.
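The generate, compress, reconstruct and compare loop described above can be sketched in a heavily simplified form. The toy Python example below is not the simulation used for the results in this paper: it uses single-pixel hits on a noiseless matrix, no cluster database, and four assumed projections (row, column and two modular diagonals) as stand-ins for the actual OrthoPix connection scheme. All function names and parameter values are hypothetical.

```python
import random

def projections(x, y, n):
    """Four assumed projection indices for pixel (x, y); a stand-in mapping."""
    return (x, y, (x + y) % n, (x - y) % n)

def frame_is_unambiguous(hits, n):
    """True if the fired lines are compatible with the true hits only."""
    fired = [set() for _ in range(4)]
    for x, y in hits:
        for line_set, index in zip(fired, projections(x, y, n)):
            line_set.add(index)
    # Candidate pixels: column and row fired, and both diagonals fired too.
    found = {
        (x, y)
        for x in fired[0]
        for y in fired[1]
        if (x + y) % n in fired[2] and (x - y) % n in fired[3]
    }
    return found == set(hits)

def efficiency(n, hits_per_frame, frames, seed=0):
    """Fraction of random frames reconstructed without any ghost pixel."""
    rng = random.Random(seed)
    pixels = [(x, y) for x in range(n) for y in range(n)]
    good = sum(
        frame_is_unambiguous(rng.sample(pixels, hits_per_frame), n)
        for _ in range(frames)
    )
    return good / frames

for h in (1, 4, 8, 16):
    print(h, "hits/frame ->", efficiency(n=31, hits_per_frame=h, frames=500))
```

Because the true hit positions are known in each generated frame, any extra candidate pixel is by construction a ghost. Note that this sketch scores whole frames, whereas the plots discussed in the text count individual uniquely reconstructed hits.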

Design overview
The prototype described here is a 255 × 255 pixel array (10 µm pitch) realized in the Tower-Jazz 0.18 µm quadruple-well CMOS process on an 18 µm thick, high-resistivity (1 kΩ cm) epitaxial layer. The epitaxial layer can therefore be reverse biased at low bias voltages (< 10 V) and used as the sensitive volume. The same design has been implemented on epitaxial layers of various resistivities and thicknesses, as well as in a special version employing BJT transistors, not described here. The design has been part of an MPW run submitted by the ALICE collaboration for the development of novel monolithic pixel detectors [11].
In the prototype each pixel has four static connections to the periphery: the first two connect the pixel by row and column, realizing x and y projections as in a traditional projection scheme, while the last two roughly resemble diagonal connections, although they differ in that they balance the number of pixels connected to each projection (figure 5). At the matrix periphery an array of 1020 comparators is used to discriminate the four signals coming from each pixel. Each comparator is connected to a 1-bit memory cell, which is set when the comparator input signal is above a given threshold.
The mapping illustrated in figure 5 is described by the relations (3.1), which were used in the design phase and which, inverted, also provide the basic relationships necessary to reconstruct the original pixel position from the data stream. In (3.1), x and y are the Cartesian coordinates of the pixel within the matrix, and N is the side size of the matrix, equal to 255 in the prototype. The mod operator indicates the modulo-N operation. The actual positioning in the memory of the 255 bit cells of each of the four projections $P_n$ is interleaved due to layout optimization, so the 1020-element memory is arranged as follows: $P_{1,0}, P_{3,0}, P_{4,0}, P_{2,0}, P_{1,1}, P_{3,1}, P_{4,1}, P_{2,1}, \ldots, P_{1,254}, P_{3,254}, P_{4,254}, P_{2,254}$.
Figure 6. The priority encoder readout scheme (left, [11]) and the prototype layout (right).
While algebraically straightforward, the relations expressed in (3.1) ensure that, for a uniform hit distribution over the detector surface, the chance of distinguishing multiple hits is maximized. These relations are not unique, as they belong to a set of homeomorphic transformations which all provide the same (maximum) level of efficiency in disentangling multiple hits on the pixel matrix. The power saving and pixel compactness come at a price: the projective compression is lossy, so the data reconstruction efficiency degrades at higher occupancies.
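The interleaved memory arrangement listed above can be decoded with a short helper. The Python sketch below assumes that memory address 0 corresponds to the first element of the list and that the projection order 1, 3, 4, 2 repeats every four cells; the names `decode` and `encode` are introduced here for illustration only.

```python
N = 255               # cells per projection in the prototype
ORDER = (1, 3, 4, 2)  # assumed projection order within each group of 4 cells

def decode(address: int) -> tuple[int, int]:
    """Map a memory address (0..1019) to (projection number, line index)."""
    if not 0 <= address < 4 * N:
        raise ValueError("address outside the 1020-cell memory")
    return ORDER[address % 4], address // 4

def encode(projection: int, index: int) -> int:
    """Inverse of decode: memory address of line `index` of a projection."""
    return 4 * index + ORDER.index(projection)

print(decode(0), decode(1), decode(2), decode(3))  # first group of four cells
print(decode(1019))                                # last cell of the memory
```

Recovering the pixel coordinates from the decoded projection lines additionally requires inverting the relations (3.1), which this sketch does not attempt.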

Readout control
Two readout systems have been implemented to read the 1020-bit memory that contains the frame data. The first is based on a circular shift register with a serial output, which simply shifts the memory content out sequentially. It is mostly intended for testing and debugging purposes, and accepts a clock frequency of up to 160 MHz. Operating it at about 100 MHz, a maximum frame rate of about 100 kHz can be obtained (1020 bits per frame).
The second circuit is based on an asynchronous priority encoding sparsification scheme borrowed from the ALPIDE pixel detector development [11], which performs further data reduction with very low power consumption. The priority encoder has a tree structure (figure 6) in order to reduce the number of steps necessary to scan through all the memory elements from order n to order log n. Each asynchronous read cycle takes less than 10 ns to walk through the tree and output the address of the topmost (highest priority) memory cell containing a hit. The total readout time for a frame is therefore proportional to the total number of (projected) hit pixels. For our prototype frame size (255 × 255 pixels), equation (2.1) and simulations predict a detection efficiency > 99% up to about 12 clusters per frame. Simplifying to a worst-case scenario, we can assume three projected pixels per cluster for each coordinate, resulting in a total of 12 × 3 × 4 = 144 cells containing a signal in the periphery memory.
Reading out 144 cells at 100 MHz (a rate compatible with the asynchronous address readout time of each valid memory bit) gives an average frame rate close to 700 kHz, equivalent to a detectable hit frequency at full efficiency of more than 8 MHz (12 hits per frame times the frame rate) over the detector area. The priority encoder scheme therefore further reduces the data throughput by about a factor 7 for a sensor operating at the maximum particle flux possible at full efficiency, and by even more at lower cluster densities.
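The behaviour of a tree-structured priority encoder can be illustrated with a small software model. The Python sketch below is only a behavioural illustration under stated assumptions: the fan-out of 4, the convention that the lowest address has the highest priority, and the function names are hypothetical and are not taken from the ALPIDE or OrthoPix designs.

```python
def first_hit(memory, lo, hi, fanout):
    """Walk down the tree to the lowest set address in [lo, hi).

    Returns (address, tree levels walked), or (None, 0) if no bit is set.
    """
    if not any(memory[lo:hi]):
        return None, 0
    steps = 0
    while hi - lo > 1:
        width = -(-(hi - lo) // fanout)  # ceiling division: child block size
        for start in range(lo, hi, width):
            end = min(start + width, hi)
            if any(memory[start:end]):   # models the OR of a child branch
                lo, hi = start, end
                break
        steps += 1
    return lo, steps

def read_frame(memory, fanout=4):
    """Read out set addresses in priority order, one read cycle per hit."""
    memory = list(memory)                # work on a copy
    addresses = []
    while True:
        address, _ = first_hit(memory, 0, len(memory), fanout)
        if address is None:
            return addresses
        addresses.append(address)
        memory[address] = 0              # reset the cell once it is read

memory = [0] * 1020
for cell in (3, 500, 1019):
    memory[cell] = 1

print(read_frame(memory))                # one address per set cell
print(first_hit(memory, 0, 1020, 4))     # address and tree levels walked
```

In this model the number of read cycles equals the number of set cells, while each cycle walks only a handful of tree levels rather than scanning all 1020 cells.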
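The readout figures quoted above follow from simple arithmetic, reproduced in the Python sketch below. It assumes one address (or one bit, for the shift register) is read per 10 ns cycle; the variable names are chosen for the example.

```python
clusters_per_frame = 12     # clusters per frame at > 99% efficiency
pixels_per_cluster = 3      # worst-case projected pixels per cluster
n_projections = 4
memory_cells = 4 * 255      # periphery memory size
read_rate_hz = 100e6        # assumed: one read per 10 ns cycle

hit_cells = clusters_per_frame * pixels_per_cluster * n_projections
frame_rate_hz = read_rate_hz / hit_cells             # priority encoder
hit_rate_hz = clusters_per_frame * frame_rate_hz
reduction = memory_cells / hit_cells                 # versus a full readout
shift_register_rate_hz = read_rate_hz / memory_cells # serial readout

print(f"cells with signal:     {hit_cells}")
print(f"encoder frame rate:    {frame_rate_hz / 1e3:.0f} kHz")
print(f"detectable hit rate:   {hit_rate_hz / 1e6:.1f} MHz")
print(f"throughput reduction:  {reduction:.1f}x")
print(f"shift register frames: {shift_register_rate_hz / 1e3:.0f} kHz")
```

The same arithmetic shows why the gain grows at lower cluster densities: fewer set cells per frame mean fewer read cycles, while a full serial readout always takes 1020 cycles.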

First evaluation and future testing
We performed basic electrical integrity tests as soon as dies were available, with successful results, as the chip responded as expected. A known design issue with the collection node bias current in the pixel cell requires providing a constant leakage current through soft illumination to correctly operate the matrix. While annoying, the issue did not prevent testing the architecture behavior, which was the goal of the submission.
By micro-focused laser scanning and ⁵⁵Fe gamma source data recording, it has been possible to verify that the OrthoPix architecture works as expected, and that it is actually possible to reconstruct the hit pixel position from the compressed data stream.
The first step has been to verify the pixel mapping scheme by checking how many and which comparators fired at each ⁵⁵Fe hit, and whether they were correlated as they should be. While the observed behavior agrees with expectations, we noticed that for many single-pixel hits not all of the four comparators connected to the pixel fired. The threshold setting was found to be a critical parameter of the design, and very low thresholds are necessary to have all four comparators firing for nearly all the pixels. As already stated, the need to provide a constant bias current through illumination to make the front-end work makes it difficult to quantitatively evaluate this threshold non-uniformity (by providing an s-curve for the detector), which is clearly an important issue to address for the effectiveness of the device. While we are working to address the problem, it is likely that a second submission not affected by the described bias current issue will be necessary for more accurate measurements.
A second set of tests has been conducted to reconstruct a laser spot position and to follow it all over the matrix surface (figure 7). For this laser scanning, a 910 nm laser focused into a 30 µm spot has been used. The relatively "large" spot diameter, when compared to the pixel pitch (10 µm), ensures that more than one pixel fires for a given position, thereby approximating a cluster from an ionizing particle. It was therefore possible to reconstruct the laser spot position and size all over the matrix. The average measured spot size was 3.5 pixels, but the statistics gathered are not sufficient to determine the eta-function of the pixel. It is evident from figure 7 that lowering the thresholds to have all comparators connected to a pixel firing leads to picking up some noisy pixels, which clearly appear completely outside the laser track.
While these results are still preliminary and a more in-depth investigation is necessary to verify critical details such as the response to large signal clusters and the actual efficiency in operating conditions, the architecture proved viable in this first implementation. A test beam characterization has been planned to measure the reconstruction efficiency in a realistic scenario for various fluxes of minimum ionizing particles. Considering the importance of the cluster multiplicity and how biasing the epitaxial layer affects this parameter, measurements at different bias voltages are foreseen.

Conclusions
We report on the realization of the first MAPS prototype using the OrthoPix architecture to read out the pixel matrix, which performs real-time data compression at greatly reduced power consumption and with minimal layout footprint. Although not yet fully characterized, the prototype demonstrates how this architecture could offer an effective solution for applications, such as tracking and very low occupancy imaging, where small pixel pitch, extremely fast frame rate and low material budget are paramount.