Information theoretic approach for assessing image fidelity in photon-counting arrays

The method of photon-counting integral imaging has been introduced recently for three-dimensional object sensing, visualization, recognition and classification of scenes under photon-starved conditions. This paper presents an information-theoretic model for the photon-counting imaging (PCI) method, thereby providing a rigorous foundation for the merits of PCI in terms of image fidelity. This, in turn, can facilitate our understanding of the demonstrated success of photon-counting integral imaging in compressive imaging and classification. The mutual information between the source and photon-counted images is derived in a Markov random field setting and normalized by the source-image’s entropy, yielding a fidelity metric that is between zero and unity, which respectively corresponds to complete loss of information and full preservation of information. Calculations suggest that the PCI fidelity metric increases with spatial correlation in source image, from which we infer that the PCI method is particularly effective for source images with high spatial correlation; the metric also increases with the reduction in photon-number uncertainty. As an application to the theory, an image-classification problem is considered showing a congruous relationship between the fidelity metric and classifier’s performance.

The motivation for this paper comes from a special class of integral imaging, called photon-counting integral imaging, where images are formed by means of a photon-counting array. Applications for photon-counting integral imaging [21] include, low light level imaging [22,23] single photon emission tomography [24] and astronomical imaging [25]. Photon counting for 3D object recognition, classification, and visualization with integral imaging have been reported in [26][27][28].
It is intriguing that both 2D and 3D photon counting imagery capture significant portions of the information in an image even under photon-starved conditions, as demonstrated in [26][27][28]. This feature has become particularly evident and useful in the context of image classification, where very good performance is observed even when very few photons per pixel are available [26][27][28].
If we consider the physical process associated with a photon-counting imaging (PCI) system, we can identify three main components in the system (see Fig. 1). The first component is the true intensity image of the object. The second component is the transformation rule that governs the conversion of an intensity image into a stochastic stream of photons in space and time. The latter is characterized by a deterministic quantity called the photon-flux density [29]. The photon-flux density, ϕ, is the average number of photons per unit time and per unit area. It is well known [29] that for a coherent light source, the actual number of photons in the stochastic photon stream, present in an infinitesimal time-area rectangle dtdA, is a Poisson random variable with mean value ϕdtdA. The stochastic photon streams considered in PCI systems typically undergo severe randomdeletion (or thinning) of photons due to absorption and scattering through the transmission medium. The third and last component of the PCI systems is the photon-counting array, which counts the photons impinging on each detector element. The totality of the photon counts in the array represents a degraded version of the source image, as seen by comparing the output and source images in Fig. 1. A key question is how to quantify the loss in image content starting from the object and ending with the degraded image generated by the photon-counting array.
In light of the above three components of the PCI systems, we can identify the "source," "channel," and "output" of a communication system. As such, we can view the PCI problem shown in Fig. 1 as an informationtheoretic problem. More precisely, the "source" is the ensemble of all possible intensity images of interest, which are selected according to a prescribed probability distribution function. The "output" is the ensemble of all digital photon-count images where the pixel values are non-negative integers representing the photon counts. The "channel" is simply the conditional probability that an output image is generated given that a specific source image was used.
With the above information-theoretic description, we observe that the mutual information between the source and the output of the aforementioned communication system is a measure of the change in the information present in an image prior to transmission and that present at the destination [30,31]. Therefore, we can use the mutual information as a measure for the "similarity" between the source and output imagery. In particular, the mutual information can be utilized to quantify the preservation in image content in photon-counting imagers. More specifically, as a relative metric for image degradation we can consider the mutual information between the source and output normalized by the source's entropy, which yields a number between zero and unity, corresponding to complete loss of information and full preservation of information, respectively. For example, if the "degraded" image is statistically independent of the true image, then the metric returns a value of zero; on the other hand, if the "degraded" image is a deterministic and invertible transformation of the true image, then the metric will return a value of one. In the PCI problem considered in this paper, the transformation from the true image to the (degraded) elemental image is stochastic since intensity values are transformed into stochastic photon numbers, as dictated by the quantum nature of light. Moreover, the transformation that maps intensity to a photon-steam is generally non-invertible.
In this paper, we will use the normalized mutual information metric applied to 2D elemental images in the PCI process to investigate the role of spatial correlation and photon statistics in how well image-content is preserved. Nevertheless, this model and formulation can be applied to the 3D images once we have modeled a 3D image scene in terms of the corresponding elemental images. In such a model, we need to consider the depth in addition to the intensities at a particular pixel location. Fig. 1. Schematic of the PCI system considered in this paper. The figure shows a source image, the transformation rule and the output image. These components of the PCI system can be identified as source, channel and output of a communication system. In most scenarios of interest, the output image is a sparse, binary array since the photon stream is very weak.

Probabilistic model for PCI under Poisson photon statistics
Consider a stochastic column vector X whose entries, Xi, i = 1, …, n, are discrete random variables in the interval [0,1] representing the reflectance of some unknown object or an unknown digital image. (For example, the vector X can be thought of as a lexicographic representation of a digital image.) Throughout this paper, we will refer to X as the source image. In undertaking a digital image as the source image, we have implicitly assumed that the scene is imaged using an imaging system with a finite spatial resolution; as such, it is sufficient to consider a digital image sampled from the continuous-space acquired image with the sampling rate satisfying the Nyquist criterion consistent with the spatial resolution of the imaging system. (The quantization of levels is merely for simplicity.) A probing beam carries an average photon flux of λε photons per second per pixel, where λ is the photon flux of the un-attenuated light and ε ∊ [0,1] is an attenuation factor that we will use as a control parameter in this study. Clearly, the average photon flux is reduced to Xiλε upon reflection of the ith pixel of the image since each photon is reflected with a probability equal to the reflectance, Xi, associated with the ith pixel. The reflected light is detected using a detector array, operating in the photon counting mode, with quantum efficiency η; i.e., each photon is detected with probability η. Background stray light, dark-current noise, read-out noise, and other forms of noise (other than quantum noise) are ignored in this analysis.
The ith element of the detector array gives a measurement of the number of photons, Yi, detected during integration time τ. According to the laws of photon optics for coherent light [29], conditional on a particular realization of Xi, say Xi = xi, Yi is a Poisson random variable with mean value ηxiλετ ≡ xiεNp, where Np = ηλτ is the mean number of photons per pixel and per unit integration time. Therefore, the conditional probability mass function of Yi given that Xi = xi can be written as (Notation: We refer to P{Yi = yi|Xi = xi}, the probability that Yi = yi given that Xi = xi, as PYi|Xi(yi|xi); similarly, we refer to P{X = x} as P x(x), etc.) We now define Y as a stochastic array whose entries are the integer-valued random variables Yi (i = 1, …,n); this stochastic array represents the photon-count array. To find the probability mass function of the random-count array Y, we observe that conditional on the gray levels X = x of the totality of pixels, where x is an array with entries x 1, …, xn, the photoncount random variables Yi corresponding to different pixels are statistically independent. Hence, it follows that the conditional probability mass function of the photon-count array Y given X = x is where y ≜ [y 1, …,yn]′ and for i = 1, …,n, yi is a nonnegative integer. Using the law of total probability, we can write the probability mass function P Y(y) of the output image Y as where P X(x) is the probability mass function of the source image X, and the summation is over all realizations x of X.
Approximation under photon-starved conditions: In situations when the attenuation factor e is very small, the photon counts per pixel are either 0 or 1 with high probability [27]. Thus, the probability mass function of the photon count in each detector element can be approximated by a Bernoulli (binary) law. The Bernoulli law is also applicable to the scenario when the photon count is hard-limited (or thresholded) to 0 or 1. Under the Bernoulli assumption, the probability mass function of the photon count Yi, conditional on the pixel's gray level Xi = xi, takes the simpler form of (4a)

Probabilistic model for PCI under alternative photon statistics
In order to understand the role of the Poisson nature of photon numbers on the normalized mutual information, we considered two alternative models for the photon-counting channel: (1) the binomialdistribution model, and (2) the geometric-distribution model. In the former case, the probability mass function of Yi conditional on Xi is binomial; in the latter case, the probability mass function is geometric (exponential). For consistency, we have assumed the same mean value for all three models. The binomial model is suitable in modeling photon statistics of non-classical light-when the probing light is maximally amplitude squeezed [32]. An amplitude-squeezed state can result in narrowing the distribution of the photon number, viz., reducing the photon-number uncertainty and hence reducing quantum noise below the classical shot-noise limit (associated with light with Poisson photon statistics). The geometric distribution, also termed the Boltzmann distribution, is suitable for modeling the photon statistics of thermal light [29].
Binomial distribution model: With the application of the binomial model to the photon-count variable, Yi, conditional on Xi = xi (i = 1, …,n), Yi is distributed according to a binomial probability mass function with mean Npxiε. Denoting the parameters of the binomial distribution as n and p, we must therefore set np = Npxiε. This yields where p = Npxiε/n is a parameter between 0 and 1. The case p = 1 corresponds to maximal amplitude squeezing for which the photon number is deterministic and equal to Npxiε. Note also that as p approaches 0 (or n → ∞), this binomial distribution converges to a Poisson probability mass function with mean Npxiε. Hence, the Poisson model represents the limiting case of the binomial model.
For the case when the photon count is hard-limited to 0 or 1, we obtain (6a) Geometric distribution model: With the application of th geometric model to the photon-count variable, Yi, conditional on Xi = xi (i = 1, …,n), Yi is distributed according to a geometric probability mass function with mean Npxiε. Denoting the parameter of the geometric distribution as p,we must therefore set (1 -p)/p = Npxiε, yielding p = 1/(Npxiε+1) and For the case when the photon count is hard-limited to 0 or 1, we obtain (8a) The probability mass functions from the three models are shown in Fig. 2.
Next, we use the PCI image-degradation probabilistic model described in this section to define a metric for the quality of image transmission in the PCI system based upon the mutual information between the input X and output Y.

Normalized mutual-information metric
The entropy H(X) measures the amount of information conveyed in units of "bits," on average, by a random vector X [30]. The mutual information, I(Y;X), is a measure of amount of information that one random vector, Y, contains about another random vector, X. The normalized mutual information, denoted by ρ, is defined as (9) = ( ; ) ( ) .
It can be shown that 0 ≤ ρ ≤ 1 [30]. Moreover, ρ = 1 whenever Y is a deterministic, one-to-one function of X. In particular, when Y can always be unambiguously transformed back to X, the parameter ρ is at its maximum. The other extreme case is when X and Y are statistically independent, in which case ρ = 0 [30]. In this paper use the terms normalized mutual information and fidelity metric interchangeably.

A classification example
Now we consider a classification example to show the application of the normalized mutual information metric defined in the previous section 2.3. We considered a 128 × 128 "line image", X, as the input. The slope of the line, s, takes values from the set {-4, -3, -2, -1, 1, 2, 3, 4} uniformly and the gray level of the pixels representing the line is assumed to be 255. We then added independent and identically distributed uniform noise, K, taking values in the interval [0,k], k ∊ [0,255], to the input image, X. For increasing values of k, the variance of the noise that is added to the input increases and we generated different input images with varying spatial correlation. Conditional on the slope, s, the input image pixels are independent of each other, We then simulated the channel using Eq. (4) to generate the photon-counted binary image Y. With the help of Eq. (2), we note that (11) The classification problem is then to find out whether the slope, s, of the line in the input "line image", X, is positive or negative based on our observation Y. The classification is based upon the best leastsquare-error fit of a line to the noisy observation Y. For a particular noise variance determined by k, the classifier is run for 10,000 times and we estimate the classification error and we repeat the simulation for different values of k. As we can see in Table 1 and from Fig. 4, the classification error increases as we increase the noise variance, which is expected. In addition, we find that ρ decreases when the classification error increases. In other words, the fidelity metric decreases as the spatial correlation in the input image decreases. Hence, we find that there is a congruous relation between the fidelity metric and the classifier's performance. This observation indeed echoes the role of Shannon's information (and channel capacity) in detection error in communication systems.

Results
In all of our calculations that follow we have used Markov models for the source image X and further adopting the hard-limited model for the photon counts at the output.

One-dimensional case: Markov-chain model
To evaluate ρ, we need a model for the source probability distribution that can capture the correlation between the pixels. In this subsection we consider a one-dimensional image, represented by X 1,…,Xn, which obeys a Markov-chain model. More precisely, the conditional probability For simplicity we will assume that each pixel Xi takes values from the set {0,1, …,I}. The model described in Eq. (12) results in correlation between neighboring pixels, and the degree of correlation depends upon the specification of the transition probabilities P{X i+1 = x i+1 |Xi = xi}. In this paper, we will assume the following form for the transition probabilities: (13) where m is the spatial correlation index and cxi is selected so that P{X i+1 = t|Xi = xi} is a probability mass function (as a function of t) for every xi; more precisely, (14) = 1 + 1 � 2 − + 2 ( + 1) + 1�. Figure 5 depicts representative examples of the transition probabilities P{X i+1 = x i+1 |Xi = xi} (for xi = 4); these probability mass functions signify the correlation present between X i+1 and its neighbor Xi. As the correlation index increases from one curve to another, the spatial correlation increases between neighboring pixels. As such by changing the correlation index m, we generate a range of correlation degrees among pixels, extending from independence (m = 0) to perfectly correlated (m = ∞).
Finally, the Markov model allows us to write the probability mass function, P X(x), for the image in terms of the transition probabilities and the probability mass function of X 1, which can be chosen arbitrarily, (15)

Discussion of the normalized mutual information metric
We now evaluate the normalized mutual information metric, ρ, assuming the above described Markovchain model for different values of the correlation index, m, and also for different channel probability distributions. In our calculations we have assumed that the first pixel, X 1, is uniformly distributed. Figure 6 shows ρ as a function of the transmission probability, ε, for different channel conditional distributions for a particular spatial correlation index, m = 0.01. We find that the case of Poisson photon statistics yields a higher normalized mutual information than that for the Boltzmann (geometric) photon statistics. We also observe that, as expected, the normalized mutual information from the binomial photon-statistics case converges to that for the Poisson statistics as the parameter p of the binomial distribution tends to zero (while keeping its mean fixed). Figure 7 shows ρ as a function of the transmission probability, ε, parameterized by different spatial correlation indices, m, for both the Poisson and geometric photon statistics. Here too we find that the Poisson-statistic model offers higher fidelity metric, ρ, than that offered by the geometric photonstatistics model. Notably, it is seen that fidelity metric increases with spatial correlation in the source image. In particular, Fig. 7 suggests that the highly correlated case, corresponding to m = 0.5, yields higher fidelity metric than that offered by the nearly independent-pixel case (m = 0.0005).
In summary, we draw the following observations from the results. First, the performance metric ρ increases as the variance of the photon number decreases (from that corresponding to a Boltzmann distribution, to a Poisson distribution, and finally to a binomial distribution). This is expected since a reduced quantum noise implies that the photon count resembles the image more closely and, in turn, implying higher correlation between the source image and the photon-counted output image. Second, a more important observation is that ρ increases as the spatial correlation in the image becomes stronger. This suggests that the ability of the PCI approach to retain spatial information improves with the spatial correlation in the source image. Namely, the PCI approach seems to be inherently geared toward "images" rather than individual pixels. These observations are confirmed in the 2D simulations considered next. In a Markov-random field (MRF), the value of each pixel depends on a neighborhood of pixels. Various neighborhood systems can be defined for the lattice corresponding to the sites of pixels in an image.
Some typical examples of neighborhoods are shown in Fig. 8(a). From the Hammersley-Clifford theorem [33], we have the so-called Gibbs representation for the probability distribution of a MRF. A Gibbs random field (GRF) is defined by Gibbs distribution: (16) where x is an image and U is the energy function given by (17) where the above summation is over the set C of all cliques c and Vc is a clique potential (or interaction function). A clique, c, is a subset of sites in an image with the property that every pair of distinct sites are neighbors. (By definition, every set containing only one site is also a clique.) If the clique potential, Vc(x), is independent of the position of the clique in the lattice, then we call such an MRF a homogeneous MRF. Figure 8(b) shows all the cliques that are present in a second-order neighborhood.
In a first-order neighborhood, we have two categories of cliques based on the number of points they contain, as shown by the first three shapes of cliques (from top, left corner) in Fig. 8(b). In the secondorder neighborhood, we have four categories and a total of ten shapes of cliques.
In this paper, we have considered a commonly used special case of a homogeneous MRF termed the Ising model, [33]. A generalized Ising model is characterized by its clique potential function, which has the following form: (18) ( ) = � if the pixel values of at the site in are same − otherwise. Here, βc is a real number and it depends on the type of the clique; this parameter controls the spatial correlation in the image. A more detailed description of MRFs is given in Appendix 5.1.
Finally, it is known that for an image modeled by a homogeneous MRF, it is sufficient to consider H(X) and I(Y;X) for a neighborhood of a pixel [31]. A detailed description of the method of estimating these quantities from an image is given in Appendix 5.2.

Discussion of the normalized mutual information metric in the 2D setting
We have generated MRF images according to the generalized Ising model described in (18) with the following specification of the parameter βc. For any one-site clique c, βc = 1; for any two-site clique c, βc = β, a constant. Finally, for all three-site and four-site cliques, βc = 0. We followed the Metropolis sampling algorithm [33] to generate 3-bit images of size 128×128 with varying spatial-correlation parameter, β. (The temperature parameter, T, as discussed in Appendix 5.1, is set to 3.) The algorithm is run for 1000 iterations for each image generated; examples are shown in Fig. 9 showing the change in spatial correlation in the image as β is varied.
To quantify the spatial correlation present in the source images, we estimated the auto-covariance of the image. More precisely, for each selection of the parameter β we generated 50 realizations of the image. We then formed the covariance matrix corresponding to the 50 sample images and generated the L 1 norm of it, which is a measure of spatial correlation. Next, we normalized the spatial correlation in the image by that for a constant (perfectly correlated image), which resulted in a number between 0 and 1. We took this number, termed the correlation metric as a measure of the amount of spatial correlation present in a the MRF image.  From Table 2, we can see that as the parameter β decreases, the correlation metric increases. In particular, from Fig. 10, we observe that, when β increases from -1 to -0.9, the correlation metric drops rapidly. This result can also be seen in from Figure 9: as β increases, the image begins to loose its spatial structure and begins to resemble white noise. It is also evident that as the the correlation metric increases (by decreasing the parameter β), the fidelity metric, ρ, also increases.
Next, as in the case of the 1D Markov-chain model, we plot ρ as a function of transmission probability, ε, parameterized by β, as shown in Fig. 11(a). It is seen that ρ increases monotonically with ε. Moreover, for a fixed ε, ρ increases as β is decreased (i.e., as correlation metric is increased). To see the dependence of ρ on spatial correlation of the image more clearly, we plotted ρ as a function of the correlation metric, as shown in Fig 11(b). It is seen that there is a nearly linear relationship between the correlation metric and ρ.

Conclusions
In this paper we have viewed the PCI method as a communication system over a stochastic channel; this allowed us to model and analyze the performance of the PCI method within a rigorous information-theoretic framework. In our model, we regard the source as the ensemble of all possible intensity images of interest with a known probability distribution function and the output as the ensemble of all digital photon-count images. The channel is governed by laws of statistical optics and photo-detection, which together allow us to characterize the conditional probability that an output image is generated given that a specific source image has been used. Normalized mutual information between the source and the output images was proposed as a fidelity metric, measuring the loss of information content in the output image relative to the source image. The analysis specifically captures the role of spatial correlation present in the source image by means of Markov image models. Calculations suggest that the effectiveness of the PCI method is enhanced with the presence of spatial correlation in the source image. In addition, we examined the performance of the PCI method under Poisson and alternative photon statistics (Boltzmann and photon-number squeezed distributions). Calculations suggest that the performance improves as the variance of the photon number is reduced, which is consistent with our understanding of the role of quantum noise in imaging.  While this paper considers only the analysis of elemental images in the PCI method, our model can be extended in a straightforward manner to sequences of elemental images and 3D images by expanding the sizes of the vectors. However, we believe that the insight brought about by our analysis of elemental images can justify the tenet of the photon-counting integral imaging approach as a compressive sensing tool. There are many possible directions that can be pursued based upon the foundational work provided in this paper. For example, since image compression is an inherent property of the PCI method, it will be interesting to look into the fundamental trade-off between compression and preserving 3D image fidelity in the PCI method as well as compressive 3D imaging and visualization. It is also possible to employ recognition-specific metrics such as the Mahalanobis and Bhattacharya distances to further characterize the use of PCI-generated imagery in object classification.

Background to MRF
A random field is a collection of random variables arranged on a lattice, (19)