On spatial resolution, signal-to-noise and information capacity of linear imaging systems

A simple model for image formation in linear shift-invariant systems is considered, in which both the detected signal and the noise variance are varying slowly compared to the point-spread function of the system. It is shown that within the constraints of this model, the square of the signal-to-noise ratio is always proportional to the "volume" of the spatial resolution unit. In the case of Poisson statistics, the ratio of these two quantities divided by the incident density of the imaging particles (e.g. photons) represents a dimensionless invariant of the imaging system, which was previously termed the intrinsic imaging quality. The relationship of this invariant to the notion of information capacity of communication and imaging systems, which was previously considered by Shannon, Gabor and others, is investigated. The results are then applied to a simple generic model of quantitative imaging of weakly scattering objects, leading to an estimate of the upper limit for the amount of information about the sample that can be obtained in such experiments. It is shown that this limit depends only on the total number of imaging particles incident on the sample, the average scattering coefficient, the size of the sample and the number of spatial resolution units.


Introduction
Among the most important characteristics of many imaging, communication and measurement systems are their capacity to encode or transmit large quantities of information, resolve sufficiently fine spatial details and provide output with high signal-to-noise ratio (SNR) at lowest possible radiation doses or input energy levels [1,2]. For mainly historical reasons (abundance of photons in typical visible light imaging applications), imaging performance characteristics related to the incident particle fluence have not been extensively studied in classical optics.
However, such characteristics, and the interplay between them, have attained additional relevance in recent years in the context of biomedical imaging, where the samples are often sensitive to radiation dose [3], in certain astronomical methods where the detectable photon flux can be extremely low [4], as well as in some problems related to the foundations of optics and quantum physics [5,6]. In X-ray medical imaging, in particular, it is critically important to minimize the radiation dose delivered to the patient, while still being able to obtain 2D or 3D images with sufficient spatial resolution and SNR in order to detect the features of interest, such as small tumours [7,8]. In this context, an imaging system (e.g. a CT scanner) must be able to maximize the amount of relevant information that can be extracted from the collected images, while keeping the number of X-ray photons impinging on the patient sufficiently low. The present paper addresses some mathematical properties of generic imaging systems that are likely to be important in the context of designing medical imaging instruments, and may also have relevance to some fundamental aspects of quantum physics and information theory.
We have recently introduced a notion of intrinsic imaging quality, which incorporates both the spatial resolution and the noise propagation properties of an imaging system [6-10]. Compared to previous publications on this and related topics, Section 2 of this paper extends the notion of intrinsic imaging quality from "flat-field" (featureless) images to the situation where the signal and noise levels can vary across the images, but the rate of variation is slow compared to that of the point-spread function (PSF) of the imaging system. In Section 3, we consider the notion of information capacity of imaging systems, leading to results which are consistent with the ones previously reported on the basis of the classical approach originated by C. Shannon in the context of communication systems [11-13]. In Section 4, we apply the obtained results for the intrinsic imaging quality and information capacity of linear shift-invariant (LSI) systems to the analysis of a simple model of quantitative imaging of weakly scattering samples. In particular, we derive an estimate for the maximum information about the sample that can be extracted in the relevant scattering experiments. We show that such a limit for the extractable information depends only on the total number of imaging particles used (i.e. on the radiation dose), the number of spatial resolution units and the "scattering power" of the sample, which is defined as the product of the average scattering coefficient and the linear size of the sample. The main results of the present paper are summarized in Section 5.

Noise and spatial resolution
Consider a simple model for the formation of images in an LSI system which involves a stochastic flux of imaging particles incident on a position-sensitive detector, and a linear filter function describing the deterministic post-detection action of the imaging system. Here the fluence of particles (i.e. the particle density) in an image registered by the detector will typically result from the interaction of particles emitted by a source with a scattering object (sample) that is being imaged, but we leave the analysis of the effects of such interaction until Section 4 below.
Different realizations (instances) of images registered by the detector under the same conditions correspond to different sample functions of the stochastic detected particle fluence distribution Ŝ_in(x), x ∈ ℝⁿ, where n is the dimensionality of the image. Note that here we generalize the concept of a "detection system" to any positive integer dimension n, which can be convenient e.g. for the description of 3D imaging. We will call the expectation value (ensemble average) S_in(x) = ⟨Ŝ_in(x)⟩ the "detected signal" (which serves as an "input" for the post-detection signal processing system). The difference Ŝ_in(x) − S_in(x) is the corresponding "noise process".
The second-order correlation properties of the particle fluence registered by the detector are described by the autocovariance function Γ_in(x, y) = ⟨[Ŝ_in(x) − S_in(x)][Ŝ_in(y) − S_in(y)]⟩. We assume that the autocovariance function has the following form:

Γ_in(x, y) = σ_in²(x) H(x − y) / H(0),   (1)

where the ratio H(x − y)/H(0) has the meaning of the degree of correlation, and σ_in²(x) = Γ_in(x, x) is the noise variance. Note that this model is similar in form to that of the quasi-homogeneous source [2], but while the latter model represents the second-order correlation function of complex amplitudes, the present model, eq.(1), is of the fourth order with respect to the complex amplitudes. The key assumption that we make here is that the function H can be expressed as H(x − y) = ∫ P(x − z) P(y − z) dz. The function P(x) represents the "native" point-spread function (PSF) of the imaging system, which can reflect both the correlation properties of the incident particle fluence (i.e. the properties of the source and the imaging set-up) and the PSF of the detector. We assume here that P(x) is non-negative. A simple example is P(x) = χ_2σ(x), where χ_2σ(x) is the function that is equal to one in the cubic area around the origin of coordinates with side length equal to 2σ, and is equal to zero outside this area. An autocovariance function of the type described above is obtained, for example, in the case of a filtered Poisson point process [1] when the width of the filter is small compared to the typical variation length of the mean photon fluence.
We want to find out how the output of the imaging system changes after the action of an LSI transformation with a non-negative PSF (filter function) T(x):

Ŝ_out(x) = ∫ T(x − y) Ŝ_in(y) dy,   (2)

with the corresponding output signal-to-noise ratio defined as

SNR_out(x) ≡ S_out(x) / σ_out(x),   (3)

where S_out(x) = ⟨Ŝ_out(x)⟩ and σ_out²(x) is the variance of the filtered process. Let us calculate the average signal (i.e. the expectation) and the (noise) variance of the filtered process in eq.(2) under the assumption that T(x) is varying slowly compared to H(x) (i.e. T(x) is almost constant over distances comparable with the correlation length of the input noise), but the width of T(x) is much smaller than the characteristic length of variation of the detected signal S_in(x) and the noise variance σ_in²(x). Under these assumptions, we have:

S_out(x) ≅ S_in(x) ∫ T(y) dy,   (4)

σ_out²(x) ≅ σ_in²(x) [∫ H(y) dy / H(0)] ∫ T²(y) dy.   (5)
Therefore, the following expression can be given for the output SNR:

SNR_out²(x) ≅ SNR_in²(x) [H(0) / ∫ H(y) dy] (∫ T(y) dy)² / ∫ T²(y) dy,

where SNR_in(x) = S_in(x)/σ_in(x). Having introduced the notion of SNR in the LSI system defined by eq.(3), we now turn to the notion of spatial resolution. The intrinsic spatial resolution Δx of a system described by eq.(3) can be defined via the variance of its PSF as follows:

(Δx)² = (4π/n) ∫ |x|² T(x) dx / ∫ T(x) dx,   (6)

where the factor 4π/n is introduced for normalization purposes (see Table 1), and we have assumed for simplicity that T(x) ≥ 0 is properly centred with respect to x = 0, i.e. its first integral moment is equal to zero.
Following the ideas discussed in [2,14] and elsewhere, it is also possible to define the spatial resolution of the LSI system by an alternative expression:

(Δx₂)ⁿ = (∫ T(x) dx)² / ∫ T²(x) dx.   (7)

The results of the calculation of Δx₂ for some popular PSFs shown in Table 1 demonstrate that the spatial resolution defined by eq.(7) produces similar, and in some cases more natural, results compared to the more conventional definition given by eq.(6). Note in particular that in the case of the rectangular model PSF χ_2σ(x) and a Gaussian PSF with standard deviation σ, eq.(7) gives 2σ and 2σ√π respectively, which correlates well with the usual understanding of the "width" of these two functions.
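The two resolution definitions can be compared numerically. The following is a minimal 1D sketch (function names, the grid and the two model PSFs are our illustrative choices, not from the paper); for a centred Gaussian the two definitions coincide, which motivates the 4π/n factor in eq.(6):

```python
import numpy as np

def resolution_moment(T, x, n=1):
    """Spatial resolution via eq.(6): normalized second moment of the PSF,
    with the 4*pi/n factor (the PSF is assumed centred at the origin)."""
    T = T / np.trapz(T, x)
    return np.sqrt(4.0 * np.pi / n * np.trapz(x**2 * T, x))

def resolution_energy(T, x):
    """Spatial resolution via eq.(7) for n = 1: (int T)^2 / int T^2."""
    return np.trapz(T, x)**2 / np.trapz(T**2, x)

sigma = 1.0
x = np.linspace(-20, 20, 400001)
gauss = np.exp(-x**2 / (2 * sigma**2))
rect = (np.abs(x) <= sigma).astype(float)   # rectangle of full width 2*sigma

# For the Gaussian both definitions give 2*sqrt(pi)*sigma ~ 3.545*sigma:
print(resolution_moment(gauss, x), resolution_energy(gauss, x))
# For the rectangle eq.(7) gives exactly 2*sigma, eq.(6) gives ~2.047*sigma:
print(resolution_moment(rect, x), resolution_energy(rect, x))
```

The agreement of the two definitions for the Gaussian, and their mild disagreement for the rectangle, matches the behaviour summarized in Table 1.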
Combining eqs. (5) and (7), we can obtain:

SNR_out²(x) / (Δx₂)ⁿ = SNR_in²(x) / (Δx_P)ⁿ,   (8)

where (Δx_P)ⁿ = (∫ P(x) dx)² / ∫ P²(x) dx = ∫ H(y) dy / H(0) is the spatial resolution associated with the native PSF P(x). Equation (8) means that the ratio of the square of the SNR to the volume of the spatial resolution unit is invariant with respect to the actions of LSI systems under the assumed conditions. In other words, an exact duality exists between the signal-to-noise and the spatial resolution of the imaging system. It can be verified that any additional convolution with another filter V(x), which is much wider than T(x) but much narrower than the typical variation length of the detected signal, will result in the same equation for the noise variance as eq.(5), with T(x) replaced by V(x). This can be proved using the fact that the two subsequent convolutions are equivalent to the convolution of the original image with the PSF (T ∗ V)(x), and under the stated assumptions the width of (T ∗ V)(x) is close to that of V(x). If the detector "area" Aⁿ is fixed, the spatial resolution "volume" (Δx₂)ⁿ and the total number M₂ of resolvable effective "pixels" (resolution units) are inversely proportional to each other: M₂ = Aⁿ/(Δx₂)ⁿ. Then eq.(8) implies that the product of the number of resolvable spatial units (pixels) and the square of the SNR is also invariant under the action of LSI systems with suitable PSFs:

M₂ SNR²(x) = invariant.   (9)

Equation (9) shows that, at least under the stated assumptions, the invariant M₂ SNR²(x) does not change after post-processing operations that can be described in the form of a convolution or deconvolution equation.
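The duality of eq.(8) can be illustrated with a small Monte Carlo sketch: a flat Poisson image is filtered with Gaussian PSFs of two different widths, and the ratio SNR²/(Δx₂) stays near the underlying fluence in both cases (a 1D illustration under the quasi-stationarity assumptions of this section; the fluence value and filter widths are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
s_in = 50.0                                   # flat detected signal, counts per point
raw = rng.poisson(s_in, size=1_000_000).astype(float)

def snr2_per_resolution(width):
    """Filter the Poisson image with a Gaussian T of the given width and
    return SNR^2 / (Delta-x)_2, the quantity eq.(8) predicts is invariant."""
    xk = np.arange(-6 * width, 6 * width + 1, dtype=float)
    T = np.exp(-xk**2 / (2 * width**2))
    out = np.convolve(raw, T, mode='valid')
    snr2 = out.mean()**2 / out.var()
    dx2 = T.sum()**2 / (T**2).sum()           # eq.(7) resolution of the filter, n = 1
    return snr2 / dx2

# Both values should be close to s_in = 50, independent of the filter width:
print(snr2_per_resolution(5.0), snr2_per_resolution(20.0))
```

Widening the filter increases SNR² and (Δx₂) by the same factor, so their ratio is unchanged, in line with eqs.(8) and (9).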
Note that, while the invariant defined by eq.(9) is dimensionless, the invariant defined by eq.(8) has the dimensionality of inverse length taken to the power of n. One way to "naturally" normalize this quantity and make it dimensionless is to divide it by the detected signal S_in(x). Then, using eq.(7), we obtain in the case of Poisson statistics:

SNR²(x) / [S_in(x) (Δx₂)ⁿ] = 1.   (10)

More generally, the quantity defined in eq.(8) can be expressed as SNR_in²(x)/(Δx_P)ⁿ = S_in²(x) / [σ_in²(x) ∫ H(y) dy / H(0)], and hence the quantity σ_in²(x) ∫ H(y) dy / H(0) gives the "normalization factor" that correctly accounts both for the strength of the detected signal and for its correlation length scale. For Poisson processes one has σ_in²(x) ∫ H(y) dy / H(0) = S_in(x), which coincides with the normalization factor chosen above for this case.
We have previously introduced and studied [6-10] the following dimensionless quantity, which is close in form to eq.(10) and incorporates both the noise propagation and the spatial resolution properties of an LSI system:

Q_S² ≡ SNR²(x) / [S_in(x) (Δx)ⁿ].   (11)

This quantity was termed the "intrinsic imaging quality" characteristic (per single particle) of the imaging system. Note that although both SNR and S_in can vary across the image, they have been assumed to be "quasi-stationary", i.e. approximately constant within the width of the filter function of the LSI system. It turns out that under these conditions, the imaging quality Q_S is constant across the image. Indeed, substituting (10) into (11) and then taking into account the definitions of the spatial resolution from eqs. (6) and (7), we obtain

Q_S² = (Δx₂)ⁿ / (Δx)ⁿ = (∫ T(x) dx)² / [∫ T²(x) dx (Δx)ⁿ],   (12)

i.e. the intrinsic imaging quality characteristic can be expressed purely in terms of the filter PSF alone. It is easy to verify that the functional in the right-hand side of eq.(12) is bi-invariant with respect to multiplication of the PSF or its argument by any positive constant. Therefore, Q_S is independent of the height or width of the system's PSF, and depends only on its functional form (see Table 1). It was proven in [9] that this functional is always bounded from above, i.e. the inequality Q_S²(T) ≤ 1/C_n holds and is exact for LSI systems with point-spread functions T(x) having finite mathematical expectation, variance and energy, with the supremum attained on the "parabolic" PSF T(x) = (1 − |x|²)₊ [15,9], where the subscript "+" indicates that the function is set to zero at points where the expression in brackets is negative. Note that C₁ = 6(π/125)^(1/2) ≅ 0.95 and C_n → 0 monotonically when n → ∞. Although the definition of the intrinsic quality was originally introduced for LSI systems [7], later we extended it to some non-linear systems. One such example, studied in [6], corresponded to the case of the ideal observer SNR [1] in the famous Young double-slit diffraction experiment.
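Eq.(12) and the bound Q_S² ≤ 1/C₁ can be checked numerically for a few 1D PSFs; the parabolic PSF should attain the bound (the grid and the set of PSFs are our illustrative choices):

```python
import numpy as np

def q2(T, x):
    """Eq.(12) in 1D: Q_S^2 = (Delta-x)_2 / (Delta-x), the ratio of the
    eq.(7) resolution to the eq.(6) resolution of the filter PSF."""
    Tn = T / np.trapz(T, x)
    dx_moment = np.sqrt(4.0 * np.pi * np.trapz(x**2 * Tn, x))   # eq.(6), n = 1
    dx_energy = 1.0 / np.trapz(Tn**2, x)                        # eq.(7), int Tn = 1
    return dx_energy / dx_moment

x = np.linspace(-10, 10, 200001)
psfs = {
    'gaussian':  np.exp(-x**2 / 2),
    'rectangle': (np.abs(x) <= 1).astype(float),
    'parabolic': np.clip(1 - x**2, 0, None),    # T(x) = (1 - x^2)_+
}
C1 = 6 * np.sqrt(np.pi / 125)                   # ~0.95, so the bound is 1/C1 ~ 1.051
for name, T in psfs.items():
    print(name, q2(T, x))                       # the parabolic PSF attains 1/C1
```

Since eq.(12) is bi-invariant under rescaling of the PSF height and width, none of the results change if the PSFs above are stretched or renormalized.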
We will also need an estimate for the average value of SNR²(x) over all spatial resolution units in the image. In the case of incident fluences with Poisson statistics, we obtain from eq.(10): SNR²(x) = S_in(x)(Δx₂)ⁿ = n̄(x), where n̄(x) can be identified with the average number of imaging particles registered in the "pixel" (spatial resolution unit) with the center at x. Averaging over all resolution units then gives

⟨SNR²⟩ = N / M₂,   (13)

where N is the average total number of incident particles per registered image and the angular brackets denote the average over all resolution units in the sample.
A useful expression for the intrinsic imaging quality characteristic can be obtained by first noting that, according to eq.(11), Q_S² ≅ ⟨SNR²⟩ / [⟨S_in⟩ (Δx)ⁿ] and ⟨S_in⟩(Δx)ⁿ = N/M, where M = Aⁿ/(Δx)ⁿ is the number of resolution units defined in terms of the spatial resolution from eq.(6). Hence

Q_S² = (M/N) ⟨SNR²⟩.   (14)

As discussed in the next section, eq.(14) is somewhat similar in form to the expression for the information capacity of imaging systems.

Information capacity of imaging systems
Consider a simple "imaging" experiment that involves N incident particles ("imaging quanta", e.g. photons) and a 2D detector with M pixels (where both N and M are non-negative integer numbers). We would like to calculate first how many distinct "images" can be produced in this situation, in other words, how many distinct sequences n₁, n₂, ..., n_M of non-negative integers with n₁ + n₂ + ... + n_M = N exist, where n_m is the number of particles registered in the m-th pixel. Viewing this as a distribution of N identical balls over M distinct bins, and sorting the distributions according to the number of balls placed in the last bin, we can conclude that the quantity X(M, N) of such distinct images satisfies the recursive relationship

X(M, N) = Σ_{N′=0}^{N} X(M − 1, N′).   (15)

Using the binomial summation identity

Σ_{N′=0}^{N} C(N′ + K − 1, N′) = C(N + K, N),   (16)

where C(a, b) denotes the binomial coefficient, it is easy to verify that the solution of eq.(15) is

X(M, N) = C(N + M − 1, N) = (N + M − 1)! / [N!(M − 1)!].   (17)

The correctness of the "normalization" in eq.(17) is justified by considering the case of M = 1 bins: obviously, in this case the number of possible different placements of balls is equal to 1 (they can only all go into the one available bin). This agrees with the fact that, according to eq.(17), X(1, N) = 1 for any N. Now let us consider the possibility that some of the particles incident on the object can be absorbed or strongly deflected by the object, etc., and as a result will not be registered by the detector. This means that some of the resultant "images" in this situation can be created with a different number N′ of registered quanta, where N′ can be any integer satisfying 0 ≤ N′ ≤ N. Let us calculate the number, Y(M, N), of all distinct images that can be formed this way. Obviously,

Y(M, N) = Σ_{N′=0}^{N} X(M, N′).   (18)

Knowing the explicit form of X(M, N) given by eq.(17), we can use eq.(16) again to evaluate the sum on the right-hand side of eq.(18), with the result:

Y(M, N) = C(N + M, N).   (19)

Note, in particular, that in the simplest case of one bin, M = 1, eq.(19) gives Y(1, N) = N + 1, corresponding to the N + 1 distinguishable images containing 0, 1, ..., N particles. Let us introduce a notion of "information capacity" by analogy with the original information capacity definition introduced by Shannon [11] for communication systems and later extended to imaging systems [12,13]:

I(M, N) = log₂ Y(M, N).   (20)

We will also consider the information capacity per single particle:

D_S(M, N) = I(M, N) / N.   (21)

Usually in practice the number of imaging particles (e.g. photons) in an image is much larger than the number of detector pixels, which corresponds to the case N >> M in the above equations.
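The "balls into bins" counts above can be verified by brute-force enumeration for small M and N; a short sketch:

```python
from itertools import product
from math import comb

def brute_X(M, N):
    """Direct count of sequences (n_1, ..., n_M) of non-negative integers
    whose sum is exactly N (the number of distinct N-particle images)."""
    return sum(1 for seq in product(range(N + 1), repeat=M) if sum(seq) == N)

def brute_Y(M, N):
    """Direct count of sequences whose sum is any N' with 0 <= N' <= N."""
    return sum(brute_X(M, Np) for Np in range(N + 1))

M, N = 4, 6
print(brute_X(M, N), comb(N + M - 1, N))   # both 84, the eq.(17) formula
print(brute_Y(M, N), comb(N + M, N))       # both 210, the eq.(19) formula
```

The enumeration is exponential in M and is only meant as a check of the closed forms; the binomial expressions themselves are cheap for any M and N.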
In that case eqs.(20) and (21) lead to the following expressions:

I(M, N) ≅ M log₂(eN/M),   (22)

D_S(M, N) ≅ (M/N) log₂(eN/M).   (23)

Let us now postulate that two images are distinguishable if and only if the difference between their signal values in at least one pixel exceeds the sum of the standard deviations of noise in that pixel in the two images (this can be easily modified to include a distinguishability factor SNR_min, which can e.g. be equal to 5 if the Rose criterion is applied). In the "noise-free" case considered above, distinguishability was equivalent to a difference in the detected particle numbers of at least one particle in at least one pixel of the two images, which formally corresponds e.g. to the standard deviation of noise being equal to 1/2 or less in all pixels of both images.
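The quality of the N >> M approximation can be gauged by comparing the exact capacity log₂ Y(M, N) of eq.(20) with the Stirling-type estimate M log₂(eN/M) (the latter is a standard asymptotic form of the binomial coefficient, used here as an assumption; M = 16 is an arbitrary choice):

```python
from math import comb, log2, e

def capacity_exact(M, N):
    """Information capacity of eq.(20): log2 of the number of distinct images."""
    return log2(comb(N + M, N))

def capacity_asymptotic(M, N):
    """Stirling-type estimate log2 C(N+M, N) ~ M*log2(e*N/M) for N >> M."""
    return M * log2(e * N / M)

M = 16
for N in (10**3, 10**5, 10**7):
    print(N, capacity_exact(M, N), capacity_asymptotic(M, N))
```

The relative discrepancy shrinks as N/M grows, consistent with the estimate being an N >> M asymptotic.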
Consider the case where the detector is not perfect and contains a fixed amount of additive noise in each pixel. Assuming that the level of noise is σ, two signals, corresponding to n₁ and n₂ registered particles in a pixel, respectively, can be considered distinguishable e.g. if |n₁ − n₂| ≥ 2σ. In other words, the number of distinguishable images can now be calculated in terms of "bunches" of 2σ particles, instead of the individual particles (note that the fractions of the bunches that correspond to signal increments below the level of distinguishability may lead to a reduction of the total number of full bunches). Therefore, the results obtained above in eqs.(22)-(23) can be extended to this case by replacing the number of particles N with the number of distinguishable "bunches" N/(2σ). Although the above considerations regarding the extension of eqs.(22)-(23) to the case of noisy images are not rigorous, it turns out that they lead to correct formulae for the information capacity in the case of Poisson (shot) and Gaussian (additive) noise. A rigorous derivation for these cases is given in the Appendix, where more accurate formulae are derived from the results in [18-20]. Specifically, an estimate for the information capacity for the case of Poisson statistics, when N/M is large, is given by

I(M, N) ≅ (M/2) log₂(N/M) [1 + o(1)],   (24)

from which it follows that

D_S(M, N) ≅ (M/2N) log₂(N/M) [1 + o(1)],   (25)

where the term o(1) tends to zero in the limit as N/M → ∞.
Identifying the individual "pixels" with the isotropic spatial resolution units having the side length given by eq.(7), so that M = M₂, and substituting eq.(13) into eq.(24), we obtain that in the case of Poisson noise:

I(M₂, N) ≅ (M₂/2) log₂(N/M₂) [1 + o(1)],   (26)

I(M₂, N) ≅ (M₂/2) log₂⟨SNR²⟩ [1 + o(1)].   (27)

Previously, expressions for the information capacity of imaging systems with additive noise in a form similar to eq.(27) have been obtained in [12,13].

Effect of the imaged sample
The above analysis has been carried out without a reference to an imaged sample. In the present section we consider one simple generic model which takes into account the interaction of the incident particle fluence with a sample and its consequences for the information about the sample that can be obtained in an imaging experiment.
Consider an experiment involving the interaction of a flux of incident imaging particles with a weakly scattering sample, after which the scattered particles are registered by a position-sensitive detector. We assume for simplicity that the particle fluence over the "incident surface" of the sample can be described by its spatially uniform expectation value S₀. Note that the latter quantity is expressed in the number of particles per unit "area", i.e. it has the dimensionality of inverse length to the power of n − 1, while the detected signal, S_in(x), introduced in Section 2 was expressed in the number of particles per unit "volume", i.e. it has the dimensionality of inverse length to the power of n. We assume here that the detected scattered fluence Ŝ_in(x) has a Poisson distribution with the mean S_in(x) = S₀ γ(x). This corresponds to the case of so-called "dark-field imaging", when the primary unscattered beam does not reach the detector. Modification to the case of bright-field imaging is quite straightforward and the corresponding result is given below. Here γ(x) is a deterministic non-negative function ("scattering coefficient") that describes the interaction of the incident fluence with the sample. This interaction is assumed to be linear with respect to the interaction length, i.e. γ(x) has the dimensionality of inverse length. As a model for γ(x) one may consider, for example, the linear attenuation, μ(x; λ) = (4π/λ)β(x; λ), or refraction, (4π/λ)δ(x; λ), coefficients in X-ray imaging, where n = 1 − δ + iβ is the complex refractive index of the sample and λ is the X-ray wavelength (we omit the argument λ in the notation for γ(x) for brevity). It can sometimes be helpful to express γ(x) = σρ(x), where σ is the scattering cross-section and ρ(x) is the number density of scattering centers (usually, atoms) in the sample. The interaction is also assumed to be weak, so that γ(x)A << 1, where A is the linear size of the sample.
In particular, this allows us to ignore the weakening of the incident flux in the course of its propagation through the sample. Finally, the interaction is assumed to have a "single-scattering" nature, i.e. each incident particle interacts with at most a single scattering center in the sample, as is usually assumed in the first Born approximation or in the so-called "kinematical approximation" in X-ray and electron diffraction [21]. This model is somewhat similar to the one used in diffraction tomography [22], where the first Born approximation is invoked for the description of the interaction of the radiation with the sample. However, unlike diffraction tomography, our simple model ignores the effect of propagation of the scattered radiation to the detector, assuming only that there is a one-to-one correspondence between the resolution volumes in the sample and the effective "detector pixels". The totality of detector pixels may possibly be aggregated over multiple 2D exposures of the sample. For example, in conventional or diffraction CT imaging, such a one-to-one correspondence may be achieved after a reconstruction procedure, e.g. after the application of the inverse Radon transform to the data collected in real detector pixels at different rotational positions of the sample. In the case of weakly absorbing or weakly scattering samples, such a reconstruction procedure is usually linear and can be expressed by a convolution with the corresponding filter function [23], and hence these cases fit well into the LSI approach considered above.
Consider now a post-detection filtering of the Poisson process with a narrow filter function T(x), as in Section 2 above (i.e. the filter function is required to be narrow with respect to the characteristic variation length of the scattering properties across the sample). We then obtain from eq.(10):

SNR²(x) ≅ S₀ γ(x) (Δx₂)ⁿ.   (28)

We can now use the results from the previous sections to characterize the information capacity of an LSI imaging system with a weakly scattering object. In the present context, this can be understood as a capacity of the system to distinguish between different imaged objects.
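The dark-field model of this section can be checked by direct simulation: Poisson counts with mean S₀γΔx are filtered with a rectangular T, and the resulting SNR² is compared with S₀γ(Δx₂) (a 1D sketch; the values of S₀, γ and the filter width are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
S0 = 2.0e4        # illustrative incident fluence (particles per unit length, n = 1)
gamma = 1.0e-3    # illustrative uniform scattering coefficient (inverse length)
dx = 1.0          # sampling step

# Dark-field model: detected counts are Poisson with mean S0 * gamma * dx:
counts = rng.poisson(S0 * gamma * dx, size=1_000_000).astype(float)

# Post-detection filtering with a rectangular T(x) of half-width w samples:
w = 25
T = np.ones(2 * w + 1)
out = np.convolve(counts, T, mode='valid')
snr2 = out.mean()**2 / out.var()
dx2 = T.sum()**2 / (T**2).sum() * dx        # eq.(7) resolution of the filter
print(snr2, S0 * gamma * dx2)               # the two values should nearly coincide
```

In this flat-sample case the filtered SNR² simply equals the mean number of scattered particles collected per resolution unit, as expected for Poisson statistics.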
Using the expression for SNR²(x) from eq.(28) and arguing similarly to the derivation of eq.(13) above, i.e. averaging over all spatial resolution units into which the sample is split, we can obtain

⟨SNR²⟩ = S₀ ⟨γ⟩ Aⁿ / M₂ = N ⟨γ⟩ A / M₂,   (29)

where N = S₀ A^(n−1) is the average total number of particles incident on the sample. An analogous calculation for the bright-field model, in which the useful signal is the small decrement of the transmitted beam, gives

⟨SNR²⟩ ≅ N ⟨(γA)²⟩ / M₂^((n+1)/n).   (30)

Let us consider for comparison a more specific example of conventional computed tomography (CT) imaging. It was shown in [24] that in the case of filtered back-projection (FBP) reconstruction with nearest-neighbour interpolation one has

⟨SNR²⟩ ≅ const × N / M₂²,   (31)

where we have also used the relationship M₂ = A²/(Δx₂)² [25] (which leads to isotropic spatial resolution in the reconstructed slices) for the derivation of the final result in eq.(31). Note that equation (31) correctly reflects the well-known fact [3] that in order to keep the average SNR in the reconstructed CT images constant, the number N of incident particles (and hence the radiation dose to the sample) has to change as the inverse 4th power of the spatial resolution (note that M₂² ∝ (Δx₂)⁻⁴). The cause of the difference between this and the corresponding result for the bright-field scattering model, eq.(30), which has a different power of M₂ in the denominator, is in the mathematical properties of the inverse Radon transform. The latter is known to amplify the noise in the input signal via the application of the ramp filter in the course of FBP CT reconstruction (this fact is also reflected in the growth of the eigenvalues of the inverse Radon transform with the radial index) [25].
Using eq.(29) in combination with eq.(26) for the information capacity, we obtain:

I ≅ (M₂/2) log₂(N ⟨γ⟩ A / M₂) [1 + o(1)].   (32)

When the spatial resolution is fixed, the dependence of this information capacity on the number of particles N is straightforward: I grows logarithmically with N. Note also that, substituting the Poisson statistics with the mean S_in(x) = S₀ γ(x) for the incident fluence into the definition of Q_S² in eq.(11), we find that the imaging quality here has exactly the same form as in the case with no sample considered in Section 2. In particular, eq.(12) and the estimate Q_S² ≤ 1/C_n remain unchanged. This happens because, according to eq.(12), the imaging quality can be expressed in terms of the two definitions of the system's spatial resolution alone, and these resolutions do not depend on the scattering power of the sample under the conditions of the simple model considered here.
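To get a feeling for the magnitudes involved, a capacity estimate of the form I ≅ (M₂/2) log₂(N⟨γ⟩A/M₂) can be evaluated for hypothetical, purely illustrative experimental parameters (none of these numbers come from the paper):

```python
from math import log2

# Purely illustrative parameters: N incident photons, average scattering
# coefficient gamma_bar (1/cm), sample size A (cm), M2 resolution units.
N, gamma_bar, A, M2 = 1e10, 1.0, 1.0, 1e6

# The sample enters the estimate only through its "scattering power" gamma_bar * A:
I = (M2 / 2) * log2(N * gamma_bar * A / M2)
print(I, I / N)    # total bits, and bits per incident particle
```

Note how small the per-particle figure is: with these numbers only a tiny fraction of a bit of information about the sample is extracted per incident photon, because most photons pass through the weakly scattering sample without interacting.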

Conclusions
We have considered the relationship between the spatial resolution, signal-to-noise ratio and information capacity of linear shift-invariant imaging systems. Convenient imaging performance characteristics of such systems include the imaging quality Q_S and the information capacity per single imaging particle (e.g. photon), D_S. From the results of this paper it is clear that, apart from relatively insignificant factors, the information capacity per single particle and the intrinsic imaging quality have similar mathematical expressions, namely D_S ≅ (M₂/2N) log₂⟨SNR²⟩ and Q_S² = (M/N)⟨SNR²⟩. However, the behavior of these two quantities under linear transformations of the imaging system is different due to the presence of the logarithm in the first expression. It is usually possible to increase the signal-to-noise ratio (SNR) at the expense of spatial resolution (or, equivalently, the total number of spatial resolution units, M) and vice versa, using operations like image filtering, denoising or deconvolution. While the intrinsic imaging quality remains essentially invariant under such (linear) transformations, the information capacity can change significantly: for example, a system with a single spatial resolution unit has quite low information capacity per particle, ~(log₂N)/N. On the other hand, when the number of particles is fixed, the information capacity per particle can grow without limit when the number of resolution units grows: consider the fact that an unlimited number of new images can be created, in principle, by directing a photon or a fixed number of photons into new pixels (one pixel at a time).
In imaging problems that involve quantitative analysis of samples on the basis of transmitted or, more generally, scattered radiation, the notion of "useful signal" is modified. We have studied one example of this type of problem in Section 4 of the paper. We have found that the information capacity is proportional to the logarithm of the product of the total number of imaging particles and the scattering power of the sample, the latter depending only on the average scattering coefficient and the size of the sample (or, alternatively, on the total number of scattering centers, e.g. atoms, in the sample, and the average relative scattering cross-section of each such center). On the other hand, the imaging quality per single particle is not affected by the scattering properties of the sample, and remains the same as in the sample-free case. It means that, in accordance with its name and intended role, the intrinsic imaging quality characteristic depends only on the properties of the imaging system itself, and does not depend on the imaged sample.


Appendix
For the case of Poisson statistics, the results in [18-20] lead to the estimate I(M, N) = (M/2) log₂(N/M) [1 + o(1)], where the o(1) term becomes zero in the limit as N/M → ∞. This is consistent with the results in Section 3.
The channel capacity of a communications channel with independent additive Gaussian noise is well known [11]. In this case, we have Y = Λ + Z, where Λ is a random variable representing the signal and Z is an independent zero-mean Gaussian random variable that models the noise. The channel capacity is given by C = (1/2) log₂(1 + P_Λ/σ_Z²), where P_Λ is the average power (variance) of the signal and σ_Z² is the variance of the noise.