Modeling the impact of coincidence loss on count rate statistics and noise performance in counting detectors for imaging applications

Coincidence loss can have detrimental effects on the image quality provided by pixelated counting detectors, especially in dose-sensitive applications like cryoEM, where the information extracted from the recorded signal needs to be maximized. In this work, we investigate the impact of coincidence loss phenomena on the recorded statistics in counting detectors producing sparse binary images. First, we derive exact analytical expressions for the mean and the variance of the recorded counts as a function of the incoming event rate. Second, we address the problem of the mean and variance of the recorded events (i.e., pixel clusters identified as individual incoming events), also as a function of the incoming event rate. In this frame, we review previous studies from different disciplines on approximated two-dimensional models, critically reinterpret them in our context, and evaluate the suitability of their adoption in the present case. Knowledge of the first two moments of the recorded statistics allows the signal-to-noise ratio (SNR) and the detective quantum efficiency at zero frequency (DQE$_0$) to be inferred. Analytical results are validated through comparison with numerical data obtained with a custom-made Monte Carlo code. We chose a realistic case study for cryoEM applications consisting of a 25-µm-thick MAPS detector featuring a pixel size of 10 µm and illuminated with electrons of 300 keV energy over a wide range of incoming rates.


Introduction
Coincidence loss in counting detectors is the phenomenon whereby an incoming event fails to be recorded by the system due to its proximity to, or overlap with, at least one other incoming event. Proximity or overlap can, in general, occur in the time domain, in the spatial domain, or in both. In the time domain, coincidence loss problems belong to the category known as dead time problems [1]. In brief, due to the physics of the particle interaction with the sensor material and to the specific signal processing in the associated front-end electronics, the detection of an event on a certain channel can make it insensitive to subsequent events within a certain time interval, called dead time, leading not only to count losses but also to distortions of the counting statistics. Depending on the case-specific signal processing, a multitude of different counting behaviors can be generated. They typically cluster into two main categories, namely, the paralyzable and non-paralyzable counting modes, which are described in their classical meaning¹, e.g., in [2]. In the spatial domain, analogously, coincidence loss may occur when an incoming event would trigger one or more detection channels within a certain distance of another channel that has already recorded an event, such that their spatial traces merge, thus impairing the possibility of distinguishing the contributions of the individual events. This distance is typically called the coincidence length or coincidence area, for the one-dimensional and two-dimensional case, respectively. In this instance too, the case-specific detection physics, signal processing, and the actual definition of event can lead to different counting behaviors and different degrees of loss of information.
The effects of coincidence loss are therefore particularly detrimental in dose-sensitive applications where the need to maximize the information content of the detected signal is preeminent. This is the case, e.g., for electron cryomicroscopy (cryoEM), an electron microscopy technique for life science applications that involves the illumination of a highly radiation-sensitive, vitrified biological sample with a high-energy electron beam (typically in the range 100-300 keV) and the detection of the transmitted signal with a two-dimensional pixelated detector array [3,4]. The development of tailored detection systems has been driven primarily (but not exclusively) by the optimization of the detective quantum efficiency (DQE), a figure of merit that quantifies the worsening of the signal-to-noise ratio (SNR) of the information in its passage through the detector and that is commonly adopted in imaging applications in general. Efforts in this direction led not only to direct-detection devices based primarily on monolithic active pixel sensors (MAPSs) [5,6] with single-electron counting capability [7,8] but also to hybrid-pixel counting (HPC) devices like the ones mentioned, e.g., in [9,10], which are intrinsically designed with single-electron detection capability. The typical data-taking workflow consists of the acquisition of a series of images, or frames, at a sufficiently low electron beam intensity that the individual recorded events, which leave traces potentially extending over multiple pixels, are sparse enough to be individually identified and processed, thus enhancing the DQE of the image resulting from the sum of all the individual frames [11][12][13]. If the pixel electronics works in the charge integrating mode, as in typical MAPS detectors, it is virtually impossible to distinguish between one and multiple events occurring in the same pixel during a single exposure or time frame. The use of the information given by the energy deposition per pixel to try to disentangle individual events is also discouraged. Indeed, in realistic sensor geometries, the random nature of the electron track in the sensor volume tends to reduce the correlation between energy deposition per pixel and real impinging positions to a level that is practically of no use [11,12]. The first image processing stage in this case can only be to assess, in a binary fashion, whether pixels have recorded any event. If no further image processing is applied, the case is equivalent to that of a standard counting detector with an in-pixel counter with a depth of 1 bit. If the image processing continues with single-event analysis, events (ideally corresponding to individual incoming electrons) first need to be identified. During the identification process, if two or more recorded events by chance overlap (i.e., their traces hit the same pixel) or merely touch each other (i.e., their traces hit neighboring pixels), it is impossible to distinguish them, and the merged event is counted as one. This leads not only to an underestimation of the number of incoming electrons but also to a greater uncertainty on the inferred impinging location, all factors that ultimately contribute to a worsening of the DQE.
To shed some light on the impact of coincidence loss on the recorded signal statistics and on the noise figures in counting detectors producing sparse binary images, we proceed as follows. First, we investigate the statistics of the recorded counts as a function of the incoming event rate, where counts are intended in their classical meaning of the (binary) value stored in the pixel counter. In particular, we derive from basic statistical arguments, and without the need for free parameters, analytical expressions for the mean and the variance of the recorded counts. Second, we investigate the statistics of the recorded events as a function of the incoming event rate, where events are intended as isolated clusters identifiable in a sparse image. In particular, we recognize that in the past decades, analogous mathematical problems have already been addressed in other fields of science, especially statistics, biology, and chemistry (chromatography). We give a basic historical overview of such studies with the double intent of bringing them back to the attention of detector scientists and microscopists and of testing their suitability for our purposes. In this framework, we identify the two-dimensional analytical model that best approximates the behavior of the mean number of recorded events (no exact analytical solution exists), highlighting similarities and differences between the original case and ours regarding the assumptions underlying the model derivation. A two-dimensional model for the variance of the recorded events does not exist either. However, driven by analogy, we propose an adaptation of the analytical solution valid for the one-dimensional case. Whenever possible, we highlight the analogies with the corresponding existing cases in the time domain.
In both counting cases, knowledge of the first two moments of the recorded signal statistics allows us to derive the corresponding analytical expressions for the SNR and for the DQE at zero spatial frequency (DQE$_0$) as a function of the incoming event rate. Additional figures of merit, namely, the area occupancy (AO) and the coincidence loss fraction (CLF), are also presented.
Analytical results are validated through comparison with numerical data obtained with a custom-made Monte Carlo simulation suite. As a realistic case study, we simulated a MAPS detector for cryoEM applications featuring a thickness of 25 µm and a pixel size of 10 µm, illuminated with 300-keV electrons up to a flux intensity of 200 el/s/pix (electrons per second per pixel), working at a frame rate of 1 kfps and with a counting threshold of 1 keV. Given the nature of the case study, electrons coincide with events, and the two terms are used interchangeably.

Background
Let us assume a two-dimensional pixelated array with infinitely large area and with a total number of pixels $N_{pix}$. Incoming events constitute a homogeneous random sequence following Poisson statistics, both in time and in space, with constant average rate per pixel and per unit time $n_0$. Let $N_0(t)$ be the random variable denoting the total number of incoming events in a pixel element over the time interval from 0 to $t$. During a time frame of duration $\Delta t$, the number of incoming events in a pixel element is therefore $N_0(\Delta t)$, and the cumulative value over a number $F_N$ of time frames is $N_0(F_N \Delta t)$. Due to the physics of the interaction between the impinging particle and the sensor material, a single incoming physical event can trigger simultaneous detection in multiple, typically neighboring, pixels. To describe the number of pixels firing due to a single incoming event, we define the random variable event multiplicity $Mul$. Denoting with $E[\cdot]$ the mean value operator, the average effective incoming event rate is

$$n = n_0\,E[Mul]. \qquad (2)$$

In the following sections, quantities are described over the time interval of a frame $\Delta t$. The generalization to the sum of a series of $F_N$ time frames is straightforward, given the statistical independence between consecutive frames and the homogeneity of the involved processes. For any random variable $X$, indeed, it holds

$$E[X(F_N \Delta t)] = F_N\,E[X(\Delta t)], \qquad (3)$$

$$V[X(F_N \Delta t)] = F_N\,V[X(\Delta t)], \qquad (4)$$

where $V[\cdot]$ indicates the variance operator.

Mean and variance of recorded counts
Let us define the random variable $M(\Delta t)$ as the total number of recorded counts per pixel in a time frame and the average recorded count rate as

$$m = \frac{E[M(\Delta t)]}{\Delta t}. \qquad (5)$$

Under the working hypothesis of binary outcome (a pixel can register at most one count per time frame), $M(\Delta t)$ follows a Bernoulli distribution (single-trial binomial distribution). The probability of a successful outcome, i.e., that a pixel registers a count, is the probability that at least one effective incoming event hits the pixel:

$$\Pr(M(\Delta t) = 1) = \sum_{i=1}^{\infty} \Pr(k = i) = 1 - \Pr(k = 0),$$

where $\Pr(k = i)$ is the Poisson probability of having $i$ incoming events in a pixel in a time frame. According to our notation, it takes the following form:

$$\Pr(k = i) = \frac{(n_0 E[Mul] \Delta t)^i}{i!}\,\exp(-n_0 E[Mul] \Delta t),$$

and therefore,

$$E[M(\Delta t)] = \Pr(M(\Delta t) = 1) = 1 - \exp(-n_0 E[Mul] \Delta t).$$

The expression for the recorded count rate $m$ in Eq. 5 can then be rewritten as

$$m = \frac{1 - \exp(-n_0 E[Mul] \Delta t)}{\Delta t}.$$

The derivation of the count variance for an individual pixel is straightforward from the binomial distribution and yields

$$V[M(\Delta t)] = \exp(-n_0 E[Mul] \Delta t)\,\bigl(1 - \exp(-n_0 E[Mul] \Delta t)\bigr).$$

The analysis of the behavior of individual pixels can be extended to the collective behavior of the pixel ensemble by introducing an additional random variable representing the spatially averaged total number of recorded counts. Given the linearity of the mean value operator, the mean number of spatially averaged counts simply equals the single-pixel mean $E[M(\Delta t)]$, while for the variance the derivation is more involved, since pixels may be correlated. In a previous work [14], we derived an expression for the variance of a correlated ensemble of pixels in counting detectors, based solely on the variance of the single pixel, the first and second moments of the event multiplicity distribution, and the counting efficiency $\eta_c = m/n$. We would like to emphasize that the expression obtained by expanding it in our notation is fully predictive, since it contains no free parameters to be determined, e.g., by fitting. The only empirical term is the multiplicity distribution, which can be retrieved experimentally quite easily².
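The single-pixel statistics can be checked numerically with a few lines of code. The following is a minimal sketch, assuming that the Bernoulli success probability takes the form 1 - exp(-n0 E[Mul] Δt) implied by the Poisson hypothesis; the function names and the value E[Mul] = 2.9 used in the usage note are illustrative assumptions, not quantities taken from this work.

```python
import math
import random


def count_statistics(n0, mean_mul, dt):
    """Single-pixel Bernoulli model: return (p, count rate, count variance).

    n0: incoming event rate per pixel per second
    mean_mul: mean event multiplicity E[Mul] (assumed known)
    dt: frame time in seconds
    """
    mu = n0 * mean_mul * dt          # mean effective events per pixel per frame
    p = -math.expm1(-mu)             # 1 - exp(-mu): probability that the pixel fires
    return p, p / dt, p * (1.0 - p)  # Bernoulli mean/dt and variance p(1-p)


def poisson_sample(rng, mu):
    """Draw one Poisson variate with Knuth's product method (suitable for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_firing_fraction(n0, mean_mul, dt, frames, seed=1):
    """Fraction of frames in which a pixel receives at least one effective event."""
    rng = random.Random(seed)
    mu = n0 * mean_mul * dt
    fired = sum(1 for _ in range(frames) if poisson_sample(rng, mu) >= 1)
    return fired / frames
```

For instance, `count_statistics(30.0, 2.9, 1e-3)` returns the firing probability, the recorded count rate, and the single-pixel variance for 30 el/s/pix at 1 kfps, and `simulate_firing_fraction` with the same arguments should approach the first of these values as the number of frames grows. Pixel-to-pixel correlations, which enter the ensemble variance, are deliberately not modeled here.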

Mean and variance of recorded events
The analysis of the statistical properties of events that occur randomly, independently, and with uniform probability in a generic mathematical space belongs to the category of Poisson point processes, and the gathering of such events into clusters is known as Poisson clumping or burst. Problems in this field often require heuristic approaches to reach an explicit conclusion [15,16]. During the 1980s, a problem analogous to ours emerged in the field of chromatography, a chemical analysis technique involving the separation of a mixture into its components, their spatial drift through a system (with a velocity depending on the nature of the component), the reaching of a final stationary state, and the identification of the several, possibly overlapping and assumed Poisson-distributed, signal peaks. The study of the statistical properties of the recorded peaks in the one-dimensional case led Davis and Giddings to formulate the statistical model of overlap (SMO) or statistical overlap theory (STO) [17]. Unsurprisingly, this first exact analytical result coincides, with space exchanged for time, with the dead time model for the classical paralyzable counting detectors reported in Eq. 1.
The STO was then extended (with approximations) to the two-dimensional case by Davis in [18], until the work of Roach [15], developed in the 1940s-1960s during studies in the biological and hygienic sciences on phenomena closely related to the overlap of spots in two-dimensional beds, was "re-discovered" in [19]. The same work also demonstrates that this model provides the best approximation to numerical data among a series of other models, and we therefore took it as the reference for our work. Incidentally, it is worth mentioning Davis's extension of the "Roach model" to the generic n-dimensional case [20].
To ease the reading, we outline the basic reasoning behind the Roach model, as described in [19]. The theory starts by assuming circular "zones" with radius $r_0$, whose centers are randomly distributed on a continuous, two-dimensional space. If the centers of two zones are closer than $2r_0$, they are considered overlapping. A series of overlapping zones is called a spot. The recorded number of spots is determined with the following scheme. Having arbitrarily selected an initial zone A and its nearest neighbor B, if the distance between the two centers is greater than $2r_0$, zone A is a singlet spot. Otherwise, the overlapping pair forms either a doublet spot or a higher-order multiplet (e.g., triplet, quartet, quintet, etc.). In this case, a third zone C is identified such that its center is the closest to either A or B. The shortest distance between C and A or C and B is again compared to $2r_0$. If greater, the pair A and B is a doublet spot. If smaller, A, B, and C form at least a triplet spot. The neighbor-searching procedure is then repeated until the distance of all remaining zones to any of the $n$ zones of the spot is greater than $2r_0$. As a result, a spot consisting of $n$ overlapping zones, or an n-tet, has been isolated. The process is iterated for another arbitrary zone until all zones have been assigned. Figure 1 helps visualize an example of Roach's zone selection procedure.
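The grouping of zones into spots can be illustrated with a short sketch. The code below does not reproduce the sequential nearest-neighbor search described above; it uses a union-find structure to compute the connected components of the "centers closer than 2r0" relation, which yields the same spots. The function name and the coordinates in the usage note are hypothetical and chosen only to mimic a quintet plus an isolated zone.

```python
import math
from collections import Counter


def find_spots(centers, r0):
    """Return the sizes of the spots (largest first) formed by overlapping zones.

    centers: list of (x, y) zone centers
    r0: zone radius; two zones overlap if their centers are closer than 2*r0
    """
    parent = list(range(len(centers)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) < 2.0 * r0:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two spots

    sizes = Counter(find(i) for i in range(len(centers)))
    return sorted(sizes.values(), reverse=True)
```

With `centers = [(0, 0), (1.5, 0), (3, 0), (4.5, 0), (6, 0), (10, 0)]` and `r0 = 1.0`, the first five zones chain into a quintet and the last one remains a singlet, so `find_spots` returns `[5, 1]`. The pairwise loop scales quadratically with the number of zones, which is adequate for illustration but not for full-size frames.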
FIGURE 1
Illustration of Roach's zone selection procedure. Zones (A-E) form a quintet, as the distance between any zone center and at least one of the other zone centers in the spot is less than $2r_0$. The distances between all zone centers in the spot and all zone centers not in the spot, beginning with (F), are greater than $2r_0$. Figure adapted from [19].

The outcome of the neighbor overlap test can be modeled with the binomial distribution. For instance, the probability $p_1$ that the distance between a zone center and the center of the nearest neighboring zone is greater than $2r_0$ (i.e., the first zone is a singlet) corresponds to the probability that no event occurs within a circular area of radius $2r_0$ around the center of the first zone; let us call this coincidence area $4A_0$, where $A_0$ is the area of an individual zone. Recall that the spatial distribution of the events is assumed to follow Poisson statistics. Introducing now our notation and defining $k$ as the number of events falling within the coincidence area in a time frame $\Delta t$, the Poisson probability distribution for the number of events can be written as

$$\Pr(k) = \frac{(QE\,n_0 \Delta t\,4A_0)^k}{k!}\,\exp(-QE\,n_0 \Delta t\,4A_0),$$

where QE is the quantum efficiency, defined as the ratio between the number of detected events and the total number of incoming events assuming no coincidence loss [21], which can also be expressed as

$$QE = 1 - \Pr(Mul = 0). \qquad (9)$$

The introduction of QE arises from the fact that events that pass completely undetected do not contribute to the overall statistics (this statement is reviewed in Section 2.6). From these premises, it follows that $p_1$ can be written as

$$p_1 = \Pr(k = 0) = \exp(-QE\,n_0 \Delta t\,4A_0).$$

On the other hand, the probability that the distance between a zone center and the center of the nearest neighboring zone is less than $2r_0$ is the complementary probability $1 - p_1$. The chain of events that leads to an n-tet spot therefore consists of $n - 1$ sequences in which the nearest neighbor distances are less than $2r_0$ and one sequence in which the nearest neighbor distance is greater, breaking the spot connection to the remaining zones. If the interzone distances are independent of each other (a condition that, as recognized by Roach, is approximately true and necessary to reduce the two-dimensional overlapping problem to a tractable form), the probability $p_n$ for a zone to be part of an n-tet spot is the product of the individual probabilities:

$$p_n = p_1\,(1 - p_1)^{n-1}.$$

Since each of the $QE\,n_0 \Delta t$ incoming events per unit area (corresponding to a pixel, in our case) per time frame has this probability of forming an n-tet spot, the number of zones contributing to the formation of n-tet spots is $QE\,n_0 \Delta t\,p_n$. However, because $n$ zones are required to form each spot, this value must be weighted by $1/n$ in order to obtain the expected number $P_n$ of n-tet spots per unit area per time frame:

$$P_n = \frac{QE\,n_0 \Delta t}{n}\,p_1\,(1 - p_1)^{n-1}. \qquad (10)$$
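The n-tet bookkeeping described above lends itself to a compact numerical sketch. The code assumes that the singlet probability is p1 = exp(-QE n0 Δt 4A0), that p_n = p1 (1 - p1)^(n-1), and that the expected number of n-tet spots is the zone count weighted by 1/n; the function name is hypothetical, and the numerical values in the usage note (QE = 0.998, 4A0 = 20.51 pixels, 1 ms frames) are the ones quoted for the case study.

```python
import math


def ntet_spots(n0, dt, qe, area4, n):
    """Expected number of n-tet spots per pixel per frame under the Roach assumptions.

    n0: incoming event rate per pixel per second
    dt: frame time in seconds
    qe: quantum efficiency
    area4: coincidence area 4*A0, in pixels
    n: spot order (1 = singlet, 2 = doublet, ...)
    """
    n_eff = qe * n0 * dt                  # detected events per pixel per frame
    p1 = math.exp(-n_eff * area4)         # probability of an empty coincidence area
    p_n = p1 * (1.0 - p1) ** (n - 1)      # n-1 overlaps followed by one break
    return n_eff * p_n / n                # n zones are needed to build one spot
```

A simple consistency check is that every zone belongs to exactly one spot, so the sum of `n * ntet_spots(..., n)` over all orders should return `qe * n0 * dt`. At 30 el/s/pix with the values above, the first few orders already account for most zones, in line with the dominance of low-order spots.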
The total number of spots $P$ is the sum of all the n-tet spots, which converges analytically to

$$P = \sum_{n=1}^{\infty} P_n = \frac{(QE\,n_0 \Delta t)^2\,4A_0}{\exp(QE\,n_0 \Delta t\,4A_0) - 1}.$$

Defining now the random variable $E(\Delta t)$ as the total number of recorded events per pixel in a time frame, its mean value coincides with $P$:

$$E[E(\Delta t)] = P. \qquad (11)$$

The recorded event rate $e = E[E(\Delta t)]/\Delta t$ can therefore be written as

$$e = \frac{(QE\,n_0)^2\,\Delta t\,4A_0}{\exp(QE\,n_0 \Delta t\,4A_0) - 1}. \qquad (12)$$

At this point, it is worth drawing the reader's attention to the fact that the Roach model was developed in a continuous two-dimensional domain and for the "simple" case of circular zones³, whereas our case involves a discretized domain (pixels) and random event shapes. Nevertheless, the results shown in Section 3 suggest that the Roach model still provides an accurate description of the phenomenon, at the price of interpreting $A_0$ as an effective quantity typically to be found via curve fitting, in line with what is done in [22] for the effective dead time in counting detectors with pulse shapes different from the ideal rectangular one. A rough a priori guess can nevertheless be attempted by imagining events as squares rather than circles (a consequence of the pixelization of the space) with effective area $E[Mul]$ and side $L = \sqrt{E[Mul]}$; the correlation area implied by this picture is given in Eq. 13.

Let us now focus on the variance of the number of recorded events. Due to the complexity of the task in two dimensions, no model could be derived by the author or found in the consulted literature. However, an analytical solution for the one-dimensional case exists and deserves some attention. In the context of chromatography, an exact formulation based on statistical arguments was first achieved by Rowe and Davis [23] (see also [24] for a comprehensive summary of one-dimensional models). In that formulation (Eq. 14), $P$ corresponds to the number of recorded events in the one-dimensional space interval $\Delta x$, $\lambda$ to the incoming event rate per unit space, and $x_0$ to the coincidence interval. A few years later, in the context of dead time models for paralyzable counting detectors, Yu and Fessler independently obtained an equivalent formulation by analytical methods [25], with $\Delta x$ replaced by the time interval $t$ and $x_0$ by the system dead time $\tau$. In their derivation, the term equivalent to $\lambda \exp(-\lambda x_0)$ in Eq. 14 is explicitly identified with the recorded event rate $p$:

$$p = \lambda\,\exp(-\lambda x_0), \qquad (15)$$

so that Eq. 14 can be rewritten in terms of $p$ (Eq. 16). By way of analogy, we propose to adapt this expression to the two-dimensional case by identifying $p$ with the recorded event rate $e$ of Eq. 12 integrated over the time frame $\Delta t$ (in the one-dimensional context, indeed, the rate is defined per unit spatial dimension), $\Delta x$ with the number of pixels $N_{pix}$, and $x_0$ with half the correlation area, $4A_0/2$. This last relation is obtained by equating the expressions for the recorded event rate in one and two dimensions for incoming event rates tending to zero, as reported in Appendix 4. Incidentally, it is worth noting that the one-dimensional expression is equivalent to the one for the singlets $P_1$ in two dimensions (see Eq. 10 with $n = 1$), which was used alone, e.g., in [21] to model the recorded event rate in two dimensions. From this proposition, the variance of the recorded events per pixel and per time frame follows (Eq. 17).
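The low-rate matching between the one- and two-dimensional descriptions can be verified numerically. The sketch below assumes the closed form N x / (exp(x) - 1), with N = QE n0 Δt and x = N 4A0, for the Roach mean number of recorded events per pixel per frame (obtained by summing the n-tet series), and the paralyzable-type form N exp(-N x0) for the one-dimensional model; the function names are illustrative.

```python
import math


def roach_events(n_eff, area4):
    """Mean recorded events per pixel per frame, Roach model (requires n_eff > 0).

    n_eff: detected events per pixel per frame, QE * n0 * dt
    area4: coincidence area 4*A0, in pixels
    """
    x = n_eff * area4
    return n_eff * x / math.expm1(x)      # N * x / (exp(x) - 1)


def one_dim_events(n_eff, x0):
    """Mean recorded events per pixel per frame, one-dimensional (paralyzable-like) model."""
    return n_eff * math.exp(-n_eff * x0)
```

Expanding both expressions to first order gives N (1 - N 4A0 / 2) and N (1 - N x0), which coincide for x0 = 4A0 / 2. Evaluating the two functions at a very small `n_eff` with `x0 = area4 / 2` confirms the agreement, whereas setting `x0 = area4` already differs at first order; at higher rates, the one-dimensional curve lies above the Roach one.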

SNR and DQE$_0$
Knowledge of the first two moments of the recorded statistics allows us to compute the corresponding SNR over a time frame, defined for a generic recorded quantity $X$ as

$$SNR(\Delta t) = \frac{E[X(\Delta t)]}{\sqrt{V[X(\Delta t)]}},$$

which, thanks to the properties highlighted in Eqs 3, 4, can be easily generalized to the sum of a series of time frames:

$$SNR(F_N \Delta t) = \sqrt{F_N}\,SNR(\Delta t).$$

More interesting, however, due to its importance and widespread use in the imaging community (see [26] and references therein for an interesting historical overview), is the DQE, defined as

$$DQE = \frac{SNR_{OUT}^2}{SNR_{IN}^2},$$

where $SNR_{OUT}$ is the signal-to-noise ratio of the recorded signal statistics and $SNR_{IN}$ is that of the incoming signal statistics, which, under the assumption of Poisson statistics, equals

$$SNR_{IN} = \sqrt{n_0 F_N \Delta t},$$

which makes the DQE independent of the number of acquired frames.
In its modern meaning⁴, the DQE is conceived in the two-dimensional domain of the spatial frequencies by applying the Fourier transform to the output signal and noise. The use of frequency analysis requires the system to exhibit the properties of shift invariance, linearity, and wide-sense stationary statistics. In general, a system rarely satisfies all the requirements in a rigorous manner. For example, a pixelated detector is not strictly shift-invariant, unless shifts are by an integer number of pixels. In addition, in a counting system, the noise is stationary only in conditions of uniform illumination, as it depends on the incoming signal itself. Therefore, care must be taken when interpreting the results, as they might be approximations of the true properties of the system [27,28]. In our case, the requirement of linearity is clearly not satisfied. Since a straightforward extension of the theory is not well established (see [29] for a compendium on possible generalizations of the concept of DQE to non-linear systems), we limit our analysis to the degenerate zero-frequency case, denoted DQE$_0$⁵. The DQE$_0$ of the recorded counts as a function of the incoming event rate can therefore be retrieved using Eqs 6, 8, 18, 19, and that of the recorded events using Eqs 11, 17-19.
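As an illustration of how the zero-frequency figures follow from the first two moments, the sketch below evaluates the SNR as mean over standard deviation and the DQE at zero frequency as the squared output SNR divided by the squared input SNR, the latter being equal to the mean number of incoming events under the Poisson assumption. The function names are hypothetical, and the multiplicity distribution used in the usage note is invented purely for demonstration.

```python
import math


def snr(mean, variance):
    """Signal-to-noise ratio of a recorded quantity: mean over standard deviation."""
    return mean / math.sqrt(variance)


def dqe0(mean_out, var_out, mean_in):
    """Zero-frequency DQE: SNR_out^2 / SNR_in^2, with SNR_in^2 = mean_in for Poisson input."""
    return snr(mean_out, var_out) ** 2 / mean_in


def multiplicity_moments(pmf):
    """First and second moments of a multiplicity distribution given as {size: probability}."""
    first = sum(k * p for k, p in pmf.items())
    second = sum(k * k * p for k, p in pmf.items())
    return first, second
```

In the limit of no coincidence loss, the ideal recorded counts have mean n0 E[Mul] Δt and variance n0 E[Mul²] Δt, so `dqe0` reduces to E[Mul]² / E[Mul²]. For the made-up distribution `{1: 0.3, 2: 0.4, 3: 0.2, 4: 0.1}`, the moments are 2.1 and 5.3. Multiplying the output mean, the output variance, and the input mean by the same number of frames leaves the result unchanged, which reflects the independence of the DQE from the number of acquired frames.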

Derived quantities
Additional derived quantities of particular interest are the area occupancy (AO) and the coincidence loss fraction (CLF). The AO is defined as the ratio between the average number of counting pixels per frame and the total number of pixels, and it coincides with the average number of recorded counts per pixel in a time frame:

$$AO = E[M(\Delta t)] = 1 - \exp(-n_0 E[Mul] \Delta t).$$

Information on the AO can have implications for the design and optimization of the detector readout mechanism, as well as for the choice of suitable data compression algorithms. The coincidence loss fraction is defined as the ratio between the number of "lost" (i.e., not recorded) events and the total number of incoming events,

$$CLF = \frac{QE\,n_0 - e}{QE\,n_0} = 1 - \frac{e}{QE\,n_0}, \qquad (20)$$

and reflects the counting efficiency of the system. In Eq. 20, the physical incoming event rate $n_0$ in the denominator is scaled by QE, as $QE\,n_0$ is the actual ideal event rate recorded in the absence of coincidence loss.
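Both derived quantities are straightforward to evaluate once the recorded statistics are known. The sketch below assumes that the AO equals the single-pixel firing probability 1 - exp(-n0 E[Mul] Δt) and that the CLF compares the recorded event rate with the ideal rate QE n0 in both numerator and denominator; the function names and the value E[Mul] = 2.9 in the usage note are illustrative assumptions.

```python
import math


def area_occupancy(n0, mean_mul, dt):
    """Average fraction of pixels that record a count in one frame."""
    return -math.expm1(-n0 * mean_mul * dt)   # 1 - exp(-n0 * E[Mul] * dt)


def coincidence_loss_fraction(recorded_rate, n0, qe):
    """Fraction of detectable events that are lost to coincidence.

    recorded_rate: recorded event rate e (events per pixel per second)
    n0: physical incoming event rate (events per pixel per second)
    qe: quantum efficiency; qe * n0 is the ideal rate without coincidence loss
    """
    ideal_rate = qe * n0
    return (ideal_rate - recorded_rate) / ideal_rate
```

For example, `area_occupancy(200.0, 2.9, 1e-3)` gives the occupancy at the highest rate of the case study for the assumed multiplicity, and `coincidence_loss_fraction` returns 0 when the recorded rate equals the ideal rate and 1 when nothing is recorded.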

General remarks
i. The study of the counting statistics of pixels featuring binary counts presented here complements a picture already including at least the classical paralyzable and non-paralyzable counting modes [25] and a particular case of the non-paralyzable mode [14], where the paralysis is avoided thanks to a circuit technique called "instant retrigger."
ii. The results on the recorded event statistics are of general validity, whether the hit digitization occurs directly in the pixel (with binary or non-binary outcome), in the readout electronics, or at the image processing stage. Differences can arise in the value of the coincidence area, according to the specific algorithm used for the event recognition.
iii. A detector with in-pixel counting electronics might have some advantage over one working in the charge integrating mode in terms of a slightly smaller coincidence area. It can indeed happen that a pixel receives, in the same time frame, signals originating from more than one event. If the front-end electronics works in the counting mode, the contribution of every event is processed individually (provided they do not undergo pile-up). If the front-end electronics works in the charge integrating mode, it is the sum of the contributions that is processed, making the recording of a hit, and the consequent formation of a "bridge" between neighboring events, more probable.
iv. The presented models were derived for systems with frame-based readout, but their validity can be extended to systems with event-based readout as well. In a system with event-based readout, the detection of an event triggers its own readout, but the time stamp associated with it has finite resolution, and therefore, neighboring events within this time interval cannot be distinguished. The time-stamp discretization corresponds to our frame time⁶.
⁴ Originally, the DQE was conceived as a "large area" property, which is nowadays called zero-frequency DQE [26].
⁵ The notation DQE$_0$ is preferred over DQE(0) so as not to suggest that the DQE is an actual function of frequency.
⁶ Modern event-based readout chips can feature time-stamp discretization down to ns or sub-ns levels, allowing in principle for count rate capabilities orders of magnitude better than frame-based ones. A practical limit nevertheless arises from the fact that transmitting spatial coordinates and timing information to the readout electronics potentially generates a huge amount of data, saturating the system bandwidth. Although much depends on the specific implementation details, the advantage of one readout mode over the other is not obvious.

Monte Carlo simulation framework
To validate the analytical models, we used the results of numerical simulations carried out with an improved version of the Monte Carlo tool used in [13,30,31]. The first step of the workflow was the creation of a statistically relevant pool of electron tracks in the semiconductor sensor. A total of 40 million tracks were generated using FLUKA⁷, a Monte Carlo particle transport and interaction suite [32,33], storing, for each of them, the three-dimensional spatial coordinates with an accuracy of 1 µm and the amount of energy released therein.
We then processed each individual electron track with a custom-developed numerical code mimicking the physics of the charge collection and signal formation at the pixelated electrode. The generated charge distribution of each track segment was propagated through the remaining sensor thickness to the pixels, and a Gaussian blurring was added to reproduce the effect of thermal diffusion, with a total width depending on an initial intrinsic contribution and, under the assumption of a constant electric field, on a contribution depending on the total travel length. The charge collected by each pixel was converted into energy, and a counting threshold was applied: if the energy is higher than the threshold, the pixel counts 1; otherwise, it counts 0. A random fluctuation representing the electronic noise, normally distributed and assumed uncorrelated among the pixels, was added to the signal. The response of both the sensor and the readout electronics was assumed uniform in space. At this point, it was possible to extract the statistical distribution of the event multiplicity. Then, for each value of a series of incoming electron rates, a set of 2000 independent images (frames) was generated. The total number of impinging electrons for each individual frame was chosen randomly according to the appropriate Poisson statistics, and the electrons were then distributed uniformly and randomly across the sensor surface, which covered an area of 2048 × 2048 pixels. A pixel cluster recognition algorithm was then applied to each frame to isolate single events. In order to be considered isolated, two clusters need to be separated by at least one empty pixel.
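The last two steps of the workflow, frame generation and cluster recognition, can be sketched in a strongly simplified form. The code below is not the simulation suite used in this work: it replaces the tracked electron footprints with single-pixel events, uses a small sensor, and implements the "separated by at least one empty pixel" criterion as 8-connected component counting. All function names are hypothetical.

```python
import math
import random


def poisson_sample(rng, mu):
    """Draw one Poisson variate with Knuth's product method (valid while exp(-mu) > 0)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def make_frame(rng, size, events_per_pixel):
    """Binary frame with a Poisson number of single-pixel events at uniform positions."""
    frame = [[0] * size for _ in range(size)]
    for _ in range(poisson_sample(rng, events_per_pixel * size * size)):
        frame[rng.randrange(size)][rng.randrange(size)] = 1
    return frame


def count_clusters(frame):
    """Count clusters of fired pixels; pixels touching by side or corner belong together."""
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = 0
    for r in range(rows):
        for c in range(cols):
            if not frame[r][c] or seen[r][c]:
                continue
            clusters += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:                      # iterative flood fill over 8 neighbors
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and frame[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return clusters
```

Averaging `count_clusters(make_frame(rng, 64, n0 * dt))` over many frames and dividing by the number of pixels gives an estimate of the mean number of recorded events per pixel per frame, which can then be compared with an analytical model. With realistic multi-pixel footprints, the merging of neighboring events would set in at lower rates than in this single-pixel simplification.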

Case study
We chose a realistic case study for cryoEM applications consisting of a 25-µm-thick MAPS detector, covered with 5 µm of Al accounting for the metal layers and featuring square pixels of size 10 µm. The counting threshold energy was 1 keV, the electronic noise was 200 eV rms, and the thermal diffusion was between 1 and 3 µm rms. The frame rate was assumed to be 1 kfps. The energy of the impinging electrons was 300 keV, with values of incoming rates in the range 1-200 el/s/pix.
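For bookkeeping purposes, the case-study parameters can be collected in a small configuration object. The following is only an illustrative sketch: the class and field names are hypothetical, the values are those quoted in the text, and the sensor side of 2048 pixels is taken from the simulated sensor area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseStudy:
    """Case-study parameters as quoted in the text (field names are illustrative)."""
    sensor_thickness_um: float = 25.0
    al_cover_um: float = 5.0
    pixel_size_um: float = 10.0
    threshold_kev: float = 1.0
    noise_ev_rms: float = 200.0
    diffusion_um_rms: tuple = (1.0, 3.0)
    frame_rate_fps: float = 1000.0
    electron_energy_kev: float = 300.0
    rate_range_el_s_pix: tuple = (1.0, 200.0)
    sensor_side_pix: int = 2048

    @property
    def frame_time_s(self):
        """Frame duration in seconds."""
        return 1.0 / self.frame_rate_fps

    def electrons_per_pixel_per_frame(self, rate):
        """Mean number of incoming electrons per pixel per frame at a given rate (el/s/pix)."""
        return rate * self.frame_time_s

    def electrons_per_frame(self, rate):
        """Mean number of incoming electrons per frame over the whole sensor."""
        return self.electrons_per_pixel_per_frame(rate) * self.sensor_side_pix ** 2
```

At the highest probed rate of 200 el/s/pix, `CaseStudy().electrons_per_pixel_per_frame(200.0)` corresponds to 0.2 electrons per pixel per frame.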

Results and discussion
To give a visual impression of the event distribution recorded on the pixel matrix, two frame sub-regions obtained with an incoming electron rate of 5 el/s/pix and 30 el/s/pix, respectively, are shown in Figure 2. The first can be considered an example of a low incoming rate condition and the second an example of a medium incoming rate condition, with event clumping clearly noticeable.
The single-event multiplicity probability distribution, computed in a condition of no coincidence loss, is shown in Figure 3. The large majority of clusters consist of 1-4 pixels, with a peak of probability at 2. Cluster sizes greater than 4 occur with lower probability. In addition, the case of no detection, i.e., cluster size 0, is very unlikely, with probability 0.002. This allows us to infer the quantum efficiency through Eq. 9, yielding QE = 0.998. The first two statistical moments of the

FIGURE 2
Example of frame sub-regions obtained with an incoming electron flux of 5 el/s/pix (left) and 30 el/s/pix (right). Pixels with overlapping events have been highlighted in red for visualization purposes.

FIGURE 3
Single-event multiplicity probability distribution. Lines are to guide the eye.

Figure 4 (top) shows the comparison between the recorded count rate curves obtained with numerical simulations and predicted by the analytical model of Eq. 7, as a function of the incoming electron rate. The ideal count rate curve $n_0 E[Mul]$ is also shown as reference. Figure 4 (bottom) shows the corresponding error in percentage, which is on the order of a few per mille across the whole range of probed incoming electron rates. In addition, as a quantitative measure of the goodness of fit, we computed the $L_2$ relative error norm ($L_2$ REN) over the NoP simulated points. In this case, it amounts to 0.13%. Figure 5 (top) shows the comparison between the recorded count variances in one time frame obtained with the numerical simulations and predicted by the analytical model of Eq. 8, as a function of the incoming electron rate. The Poisson variance values of the incoming electron statistics, $n_0 \Delta t$, and of the ideal recorded counts, $n_0 E[Mul^2] \Delta t$ [14], are also shown as reference. Figure 5 (bottom) shows the corresponding error in percentage. The $L_2$ REN amounts to 3.97%.
The level of agreement between simulated and predicted data, for both the mean and the variance, supports the validity of the proposed model.
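The kind of comparison between simulation and analytical prediction described above can be illustrated with a deliberately simplified toy case. The sketch below is not the custom Monte Carlo code nor the model of Eq. 7: it assumes single-pixel events (multiplicity 1) and a fixed number of events per frame thrown uniformly on a binary pixel matrix, and all names and parameter values are hypothetical. In this case, the expected fraction of fired pixels is known exactly and can be checked against the simulation.

```python
import random


def occupied_pixels(n_pixels, n_events, rng):
    """Throw n_events single-pixel events on a binary frame and count fired pixels."""
    fired = [False] * n_pixels
    for _ in range(n_events):
        fired[rng.randrange(n_pixels)] = True  # a repeated hit adds no new count
    return sum(fired)


def expected_occupied_fraction(n_pixels, n_events):
    """Exact expectation for independent, uniformly distributed single-pixel events."""
    return 1.0 - (1.0 - 1.0 / n_pixels) ** n_events


rng = random.Random(1234)  # fixed seed for reproducibility
n_pixels, n_events, n_frames = 10_000, 3_000, 50

mean_recorded = sum(
    occupied_pixels(n_pixels, n_events, rng) for _ in range(n_frames)
) / n_frames

simulated_fraction = mean_recorded / n_pixels
predicted_fraction = expected_occupied_fraction(n_pixels, n_events)
count_loss = 1.0 - mean_recorded / n_events  # fraction of events lost to overlap

print(f"simulated occupancy: {simulated_fraction:.4f}")
print(f"predicted occupancy: {predicted_fraction:.4f}")
print(f"count loss fraction: {count_loss:.4f}")
```

Even in this minimal setting, the recorded counts fall below the number of incoming events because two events landing on the same binary pixel are recorded once, which is the basic mechanism behind the saturation of the recorded count rate.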
Figure 6 (top) shows the comparison between the simulated recorded electron rate curves and their fits with the analytical models, as a function of the incoming electron rate. We first observe that counting paralysis occurs here much earlier than for the bare counts in Figure 4, due to the inflating effect of the event multiplicity. The curve labeled "ana.2D" corresponds to the result obtained with the Roach model of Eq. 12, whose fit yields a value of 4A₀ = 20.51 pixels. For accuracy reasons, the fit was restricted to incoming electron rates of 0-80 el/s/pix (the upper bound corresponds to the location of the maximum recorded electron rate), as above this limit the model tends to overestimate the number of recorded electrons. Within this range, the L2REN of the Roach model amounts to 0.06%. Incidentally, we note that using the Roach model of Eq. 13 to estimate the correlation area 4A₀ a priori, we obtain a value of 19.26, which is not far from the correct value obtained through the fit. The curves labeled "ana.Pₙ" show the breakdown of the Roach model into the first five n-tet spot components, namely, the singlet, doublet, triplet, quartet, and quintet spots, computed with Eq. 10. It is interesting to note that the contribution of lower-order n-tet spots always dominates over that of higher-order spots across the whole range of incoming electron rates. To some extent, this explains why the one-dimensional model of Eq. 15 (mathematically equivalent to the number of singlets in the two-dimensional model) with parameter x₀ = 4A₀/2 = 10.255, labeled in Figure 6 as "ana.1D," also provides an acceptable approximation of the simulated data, in particular at low incoming electron rates. This statement is also supported by Figure 6 (bottom), which shows the corresponding fitting errors in percentage.
Figure 7 (top) shows the variance of the recorded electrons in one time frame, as a function of the incoming electron rate. The curve labeled "ana.1D" corresponds to the one-dimensional analytical solution in Eq. 16, while the curve labeled "proposed ana.2D" corresponds to the proposed extension of the one-dimensional model to the two-dimensional case, as in Eq. 17. The Poisson variance of the incoming electron statistics n₀Δt is also shown as reference. We observe that up to an incoming electron rate of ~30 el/s/pix (corresponding to a coincidence loss of 27.6%, see Figure 9), both models describe the behavior of the recorded variance equally well, with an L2REN of 6.8%. For higher incoming electron rates, both models underestimate the data, down to a factor of 1/2, as shown in Figure 7 (bottom). The proposed model, however, follows the peculiar "bulged" shape of the simulation better than the original one-dimensional one. For this reason, and because it also leads to a simpler expression of the DQE₀E, we endorse the adoption of the proposed model for the two-dimensional case.
The behavior of the DQE₀ as a function of the incoming electron rate is reported in Figure 8, for both the recorded counts and the recorded electrons. The knowledge of the first two statistical moments of the multiplicity distribution allows us to compute the limiting value of the DQE₀M for low incoming electron rates⁸: [...], which, in our case, amounts to 0.64.

FIGURE 6
(Top) Recorded electron rate curves obtained with the numerical simulations (error bars) and their fitting with the analytical models (solid lines). "ana.2D" corresponds to the fitting with the Roach model of Eq. 12, while "ana.Pₙ" corresponds to the contributions of the n-tet spots up to n = 5 using Eq. 10. "ana.1D" corresponds to the fitting with the one-dimensional model of Eq. 15. Please note that the error bars on the simulated values are smaller than the graphical symbol. The ideal recorded electron rate curve n₀ is also shown as reference. (Bottom) Deviation between the numerical simulations and their fitting with the analytical models, expressed in percentage.
FIGURE 7
(Top) Recorded count variance obtained with the numerical simulations (error bars) and predicted with the analytical models (solid lines). "ana.1D" corresponds to the result obtained with the one-dimensional model of Eq. 16, while "proposed ana.2D" corresponds to the result obtained with our proposed two-dimensional extension of Eq. 17. As reference, the Poisson variance of the incoming electrons is also shown. (Bottom) Deviation between the two curves expressed in percentage.

FIGURE 8
Simulated (symbols) and analytically modeled (solid lines) DQE₀ for the recorded counts and for the recorded electrons.
[...] grows indefinitely. We can attempt to explain the increasing trend of both DQE₀M and DQE₀E (for the latter, before the unavoidable collapse at high incoming rates due to counting paralysis) in a way analogous to what was already observed in [14] in the time domain. Essentially, though perhaps less intuitively, event overlapping has a sort of "regularizing" effect on the recorded statistics. For instance, a recorded event resulting from the union of several overlapping events is recorded regardless (to some extent) of the variation in number and position of the single events constituting the union, making it less sensitive to the natural statistical fluctuations of the incoming signal, in a sort of noise-filtering effect. However, one should not be tempted to think that operating a detector in this regime would be beneficial in absolute terms. Indeed, the gain in DQE₀ is offset by a loss in spatial resolving capability due to the increasing size of the merged events.
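The low-rate limit of 0.64 quoted above for the DQE₀M can be cross-checked numerically. The expression is not reproduced in this text, so the sketch below assumes the standard compound-Poisson result, in which the zero-frequency DQE of a signal built from independent events with random multiplicity tends to E[Mul]²/E[Mul²]. The function name is hypothetical; the two moments are the values reported for the multiplicity distribution.

```python
def dqe0_low_rate_limit(mean_mul, mean_mul_sq):
    """Assumed low-rate limit of DQE_0 for the recorded counts: E[Mul]^2 / E[Mul^2]."""
    if mean_mul_sq <= 0:
        raise ValueError("the second moment must be positive")
    return mean_mul ** 2 / mean_mul_sq


# Moments of the multiplicity distribution reported in the text.
mean_mul = 2.87      # E[Mul], in pix
mean_mul_sq = 12.93  # E[Mul^2], in pix^2

limit = dqe0_low_rate_limit(mean_mul, mean_mul_sq)
print(f"low-rate DQE_0M limit: {limit:.2f}")
```

Under this assumption, the reported moments reproduce the quoted value of 0.64 to two decimal places. Since the multiplicity distribution includes the case of cluster size 0, the quantum efficiency is already accounted for in the two moments.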
Finally, Figure 9 shows the area occupancy (AO) and the coincidence loss fraction as a function of the incoming electron rate. In the context of cryoEM applications, only tiny deviations from linearity are acceptable, on the order of a few percent. Assuming a relaxed upper limit of 10% for the coincidence loss fraction [21], this is reached in our case study at an incoming electron rate of 10.10 el/s/pix, with a corresponding area occupancy of 2.86%. In this regime, the use of both the Roach model for the recorded electron rate and the proposed two-dimensional model for the recorded electron variance is therefore perfectly justified. To give a more practical understanding of the impact of distortions of the counting statistics, it is useful to translate the aforementioned quantities into values integrated over the detector area (2048 × 2048 pixels) and frame time (1 ms). A coincidence loss fraction of 10% would then occur at an incoming electron count of 41'943 el, for a corresponding AO of 119'957 pix.
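The conversion from per-pixel quantities to values integrated over the detector can be sketched as a simple product of rate, number of pixels, and frame time. This is an assumption about how the integrated values were obtained, and the function and variable names are hypothetical.

```python
def integrated_over_frame(rate_per_pixel_per_s, n_pixels, frame_time_s):
    """Events collected by the whole detector during one frame."""
    return rate_per_pixel_per_s * n_pixels * frame_time_s


n_pixels = 2048 * 2048  # detector area in pixels
frame_time_s = 1e-3     # frame time of 1 ms

# Incoming electrons per frame at two nearby rates (el/s/pix).
electrons_at_10 = integrated_over_frame(10.0, n_pixels, frame_time_s)
electrons_at_10_10 = integrated_over_frame(10.10, n_pixels, frame_time_s)

# Occupied pixels per frame for an area occupancy of 2.86%.
occupied = 0.0286 * n_pixels

print(f"electrons per frame at 10.00 el/s/pix: {electrons_at_10:.0f}")
print(f"electrons per frame at 10.10 el/s/pix: {electrons_at_10_10:.0f}")
print(f"occupied pixels at 2.86% AO: {occupied:.0f}")
```

Under this assumption, an area occupancy of 2.86% corresponds to about 119,957 pixels, in agreement with the quoted AO. The quoted 41'943 el is reproduced with a rate of 10 el/s/pix, whereas 10.10 el/s/pix gives about 42,362 el; which rounding was applied is not stated in the text.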

Conclusion
We investigated the impact of coincidence loss on the recorded count statistics and on the noise performance in counting detectors featuring sparse binary images. First, we derived exact analytical expressions for the mean and the variance of the recorded counts. Second, we addressed the problem of the mean and variance of the recorded events (i.e., pixel clusters identified as a single incoming event). We reviewed, reinterpreted, and evaluated the suitability of approximated models previously obtained in several different disciplines, as no exact solutions exist in two dimensions, adopting the "Roach model" for the mean and proposing an extension of the one-dimensional exact solution for the variance to the two-dimensional case. For both cases, we derived expressions for the SNR and the DQE₀. Model predictions were validated against numerical simulations carried out with a custom-developed Monte Carlo code, for the cryoEM-realistic case study of a 25-µm-thick MAPS detector featuring a pixel size of 10 µm, a frame rate of 1 kfps, and working in binary counting mode. The incoming beam consisted of electrons with an energy of 300 keV and flux intensities up to 200 el/s/pix, where coincidence loss phenomena bring the system well into paralysis. The agreement between simulated data and analytical predictions is very close for the mean and variance of the recorded counts (L2REN of 0.13% and 3.97%, respectively). For the mean recorded electrons, the Roach model fits the simulated data excellently up to an incoming electron rate of ~80 el/s/pix, corresponding to the location of the maximum of the recorded curve. At higher incoming rates, the model tends to slightly overestimate the number of recorded events. For the variance of the recorded events, both the existing one-dimensional and the proposed two-dimensional analytical solutions match the simulated data well up to an incoming electron rate of ~30 el/s/pix. At higher incoming rates, both models severely underestimate the data, but the proposed two-dimensional extension follows the functional behavior better, supporting its adoption. The resulting DQE₀ increases as a function of the incoming rate for the recorded counts, while it shows a non-monotonic behavior for the recorded events. The increase above the low incoming rate limit is attributed to a reduced sensitivity of the recorded signals resulting from the union of multiple events to the statistical fluctuations of the individual incoming events, in a sort of noise-filtering effect. Only in the latter case does it ultimately decrease to zero, due to the overwhelming system paralysis. Generalization and limitations to the validity of the models were also discussed.

FIGURE 2
⁷ v. 4-2.1. The physics was set to multiple Coulomb scattering with a cutoff energy of 1 keV for electrons and 100 eV for photons. Fluorescence was enabled, and no biasing was used.

Frontiers in Physics, frontiersin.org. Zambon, 10.3389/fphy.2024.1408430

distribution, fundamental for the continuation, are E[Mul] = 2.87 pix and E[Mul²] = 12.93 pix².

FIGURE 4
(Top) Comparison between recorded count rate curves obtained with numerical simulations (error bars) and predicted by the analytical model of Eq. 7 (solid line). Please note that the error bars on the simulated values are smaller than the graphical symbol. The ideal count rate curve is also shown as reference. (Bottom) Deviation expressed in percentage.

FIGURE 5
(Top) Comparison between recorded count variances in one time frame obtained with the numerical simulations (error bars) and predicted by the analytical model of Eq. 8 (solid line). The Poisson variance of the incoming electron statistics and of the ideal recorded counts is also shown as reference. (Bottom) Deviation between the two curves expressed in percentage.