Three dimensional object recognition with photon counting imagery in the presence of noise

Three-dimensional (3D) imaging systems have been recently suggested for passive sensing and recognition of objects in photon-starved environments where only a few photons are emitted or reflected from the object. In this paradigm, it is important to make optimal use of the limited information carried by photons. We present a statistical framework for 3D passive object recognition in the presence of noise. Since detector dark noise is present in the quantum-limited regime, our approach takes into account the effect of noise on information-bearing photons. The model is tested when background noise and dark noise sources are present for identifying a target in a 3D scene. It is shown that reliable object recognition is possible in the photon-counting domain. The results suggest that with proper translation of the physical characteristics of the imaging system into the information processing algorithms, photon-counting imagery can be used for object classification.

© 2010 Optical Society of America

OCIS codes: (100.6890) Three-dimensional image processing; (030.5260) Photon counting; (110.6880) Three-dimensional image acquisition; (999.9999) Quantum-limited imaging.

Received 16 Aug 2010; revised 5 Nov 2010; accepted 17 Nov 2010; published 2 Dec 2010. (C) 2010 OSA, 6 December 2010 / Vol. 18, No. 25 / OPTICS EXPRESS 26450.

References and links
1. J. W. Goodman, Statistical Optics (Wiley-Interscience, 1985), Wiley classics ed.
2. J. R. Janesick, Scientific Charge-Coupled Devices (SPIE Press Monograph Vol. PM83) (SPIE Publications, 2001), 1st ed.
3. F. Dubois, “Automatic spatial frequency selection algorithm for pattern recognition by correlation,” Appl. Opt. 32, 4365–4371 (1993).
4. F. Sadjadi, ed., Selected Papers on Automatic Target Recognition (SPIE-CDROM, 1999).
5. V. Page, F. Goudail, and P. Refregier, “Improved robustness of target location in nonhomogeneous backgrounds by use of the maximum-likelihood ratio test location algorithm,” Opt. Lett. 24, 1383–1385 (1999).
6. A. Mahalanobis, R. R. Muise, and S. R. Stanfill, “Quadratic correlation filter design methodology for target detection and surveillance applications,” Appl. Opt. 43, 5198–5205 (2004).
7. H. Kwon and N. M. Nasrabadi, “Kernel matched subspace detectors for hyperspectral target detection,” IEEE Trans. Pattern Anal. Mach. Intell. 28, 178–194 (2006).
8. B. Javidi, R. Ponce-Díaz, and S.-H. Hong, “Three-dimensional recognition of occluded objects by using computational integral imaging,” Opt. Lett. 31, 1106–1108 (2006).
9. O. Matoba, E. Tajahuerce, and B. Javidi, “Real-time three-dimensional object recognition with multiple perspectives imaging,” Appl. Opt. 40, 3318–3325 (2001).
10. G. M. Morris, “Scene matching using photon-limited images,” J. Opt. Soc. Am. A 1, 482–488 (1984).
11. E. A. Watson and G. M. Morris, “Imaging thermal objects with photon-counting detectors,” Appl. Opt. 31, 4751–4757 (1992).
12. S. Yeom, B. Javidi, and E. Watson, “Three-dimensional distortion-tolerant object recognition using photon-counting integral imaging,” Opt. Express 15, 1513–1533 (2007).
13. S. Yeom, B. Javidi, and E. Watson, “Photon counting passive 3d image sensing for automatic target recognition,” Opt. Express 13, 9310–9330 (2005).
14. I. Moon and B. Javidi, “Three dimensional imaging and recognition using truncated photon counting model and parametric maximum likelihood estimator,” Opt. Express 17, 15709–15715 (2009).
15. S. R. Narravula, M. M. Hayat, and B. Javidi, “Information theoretic approach for assessing image fidelity in photon-counting arrays,” Opt. Express 18, 2449–2466 (2010).
16. S. Yeom, B. Javidi, C. wook Lee, and E. Watson, “Photon-counting passive 3d image sensing for reconstruction and recognition of partially occluded objects,” Opt. Express 15, 16189–16195 (2007).
17. M. G. Lippmann, “La photographie intégrale,” Comptes-rendus de l’Académie des Sciences 146, 446–451 (1908).
18. C. B. Burckhardt, “Optimum parameters and resolution limitation of integral photography,” J. Opt. Soc. Am. 58, 71–76 (1968).
19. T. Okoshi, “Three-dimensional displays,” Proceedings of the IEEE 68, 548–564 (1980).
20. M. C. Forman, N. Davies, and M. McCormick, “Continuous parallax in discrete pixelated integral three-dimensional displays,” J. Opt. Soc. Am. A 20, 411–420 (2003).
21. A. Stern and B. Javidi, “Three-dimensional image sensing, visualization, and processing using integral imaging,” Proceedings of the IEEE 94, 591–607 (2006).
22. F. Okano, J. Arai, K. Mitani, and M. Okui, “Real-time integral imaging based on extremely high resolution video system,” Proceedings of the IEEE 94, 490–501 (2006).
23. B. Javidi, F. Okano, and J.-Y. Son, eds., Three-Dimensional Imaging, Visualization, and Display (Signals and Communication Technology) (Springer, 2008), 1st ed.
24. R. Martinez-Cuenca, G. Saavedra, M. Martinez-Corral, and B. Javidi, “Progress in 3-d multiperspective display by integral imaging,” Proceedings of the IEEE 97, 1067–1077 (2009).
25. J.-S. Jang and B. Javidi, “Three-dimensional synthetic aperture integral imaging,” Opt. Lett. 27, 1144–1146 (2002).
26. S.-H. Hong, J.-S. Jang, and B. Javidi, “Three-dimensional volumetric object reconstruction using computational integral imaging,” Opt. Express 12, 483–491 (2004).
27. B. Javidi, P. Refregier, and P. Willett, “Optimum receiver design for pattern recognition with nonoverlapping target and scene noise,” Opt. Lett. 18, 1660 (1993).
28. E. A. Richards, “Limitations in optical imaging devices at low light levels,” Appl. Opt. 8, 1999–2005 (1969).
29. P. Réfrégier, Noise Theory and Application to Physics (Springer, 2004), 1st ed.
30. A. Papoulis and S. Pillai, Probability, Random Variables and Stochastic Processes (McGraw Hill Higher Education, 2002), 4th ed.
31. R. J. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches (Wiley, 1991), 1st ed.


Introduction
Photons are considered the basic carriers of optical information in the context of imaging systems. However, a photon's behavior is governed by the principles of quantum physics [1]. This makes it difficult to rely on an individual photon, or even a small number of them, for reliable information transfer. A further stochastic process occurs at the detection stage, where the photons are converted into electrons and counted using electronic circuitry [2]. Fortunately, photons are abundant in most scenarios, so sensors, imaging systems and image processing algorithms have been designed around the statistical properties of information-bearing photons. However, there are a number of benefits to systems that can perform high-level tasks such as visualization, object recognition and classification with limited photons.
Many classical object recognition algorithms operate on images that are formed using a tremendous number of photons [3–7]. These algorithms have also been explicitly adapted for three-dimensional imaging systems [8,9]. The optimality of such algorithms, however, may not carry over if these methods are extended directly to the photon-counting regime, due to the quantum-limited nature of the imagery. Thus, a new class of automatic object recognition problems arises within the context of photon-counting image sensing [10,11]. In fact, three-dimensional, multi-perspective imaging systems along with conventional linear and nonlinear matched filters have been applied to photon-counting object recognition [12,13]. The methods of statistical sampling theory have also been investigated for such problems [14]. Photon-counting image fidelity has also been studied from an information theory point of view [15]. To the best of our knowledge, there has been no study on the effects of background and dark noise on object recognition performance in the photon-counting domain.
In this paper, maximum likelihood decision theory is used for object recognition in 3D photon-counting imagery where the ratio of object photons to dark counts is less than one. Unlike conventional intensity images, photon-counting images with very few (50 or fewer) photons contain many pixels that register no counts at all. A typical matched filter [16], for example, would not consider such pixels as information bearing. Nevertheless, the absence of photon counts, by itself, conveys information about the object, which is exploited in the present framework. This is the key difference between the proposed method and prior art, and it makes the method more robust to background and dark noise sources.
The rest of the paper is organized as follows. In Section 2, a brief review of multi-view image sensing and reconstruction is presented. In Section 2.1, the disjoint object and background model is combined with quantum-limited photo-detection principles to model realistic photon-counting imagery including dark noise. Section 3 presents the maximum likelihood based pattern recognition algorithm, while Section 4 contains experimental results and a performance evaluation of the proposed method. The paper concludes in Section 5.

Multi-View Photon-Counting 3D Sensing
Three-dimensional (3D) passive imaging and display systems using multiple sensors have been extensively studied [17–24]. The image registered by each sensor is commonly referred to as an elemental image. Multiple image sensors can be used in a grid, or a single sensor can sequentially scan and collect the images while moving on a platform (also known as synthetic aperture integral imaging [25]) [see Fig. 1]. In either method, the angular information of the rays is encoded in the relative lateral shift of the ray-sensor intercept between multiple sensors. Having both direction and intensity information of the rays emanating from the object, one can computationally reconstruct the scene at a desired distance from the sensor array using back-projection [26]. Back-projection based reconstruction can be written as Eq. (1), in which $\tilde{p} = pf/(\mu z)$ is the pickup grid pitch ($p$) normalized by the sensor pixel pitch ($\mu$) and the magnification ($z/f$) [see Fig. 1].
A point in the object space at distance $z$ can be reconstructed by integrating its associated image pixels on all sensors; that is, for the i-th object space point at distance $z$ one obtains the reconstructed value $R^i$. The collection of all points on the plane $z = z_0$ represents the light field distribution in the object space at that particular plane, i.e. $R_{z_0} = \{R^i : i = 1 \ldots M\}$. Similarly, the light field distribution in 3D object space can be reconstructed at $Q$ intermittent planes, such that:

$$R = \{R_{z_q} : q = 1 \ldots Q\} \qquad (2)$$
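The reconstruction described above can be illustrated with a minimal shift-and-sum sketch in one dimension. This is not the authors' implementation: the function names, the integer shift (a rounded stand-in for the normalized pitch at a given depth), the sign of the shift and the absence of any normalization are all assumptions made for illustration.

```python
def back_project(elemental, shift_per_sensor):
    """Shift-and-sum reconstruction of one depth plane (1D sketch).

    elemental: list of K rows, each a list of M pixel values.
    shift_per_sensor: integer pixel shift between adjacent sensors,
        an assumed, rounded stand-in for the normalized pitch at this depth.
    """
    num_pixels = len(elemental[0])
    plane = [0] * num_pixels
    for k, row in enumerate(elemental):
        for i in range(num_pixels):
            src = i - shift_per_sensor * k  # assumed sign convention
            if 0 <= src < num_pixels:       # skip pixels that fall off the sensor
                plane[i] += row[src]
    return plane


def reconstruct_volume(elemental, shifts):
    """Return one reconstructed plane per candidate shift (one per depth)."""
    return [back_project(elemental, s) for s in shifts]
```

With three elemental rows in which a single bright pixel moves by one pixel per sensor, the plane reconstructed with the matching shift concentrates all three contributions in one pixel, while a mismatched shift spreads them out.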

Photon Counting Imagery Model
When an object is present in non-obstructing clutter, the clutter can be considered as spatially disjoint background noise. Such models appear frequently in image-based pattern recognition problems when an object is to be detected or recognized in the presence of spatially disjoint background noise [27]. The advantage of this model for recognition purposes is that it allows the object and background pixels to be treated independently, based on the a priori knowledge available for each. We extend this model to 3D imaging systems and apply it in particular to quantum-limited (photon-counting) imaging scenarios. In addition to the disjoint background noise, in quantum-limited imaging conditions the number of thermally excited (dark) electrons in detector arrays can be comparable to, or exceed, the number of photo-electrons [28]. In such cases, it is essential to model dark noise and take it into account in pattern recognition problems using quantum-limited imagery.
Dark electrons are predominantly generated by thermal excitation within defective regions in the silicon crystal [2]. We assume a uniform defect distribution among the sensors' pixels [29]. For easier presentation, we associate an equivalent irradiance, $n_d$, with each pixel such that the statistics of dark counts are preserved. Therefore, the resulting irradiance ($r$) incident on the i-th pixel of the k-th sensor in a multi-view imaging system can be modeled as a combination of object ($s$), background ($n_B$) and dark-count equivalent ($n_d$) irradiances as follows:

$$r_k^i = \alpha\, w_k^i\, s_k^i + (1 - w_k^i)\, n_{B,k}^i + n_d \qquad (3)$$

in which one-dimensional scripted notation is used for brevity. Also, $w_k^i$ denotes a binary window function that defines the support of the object in the k-th elemental image, so that $w_k^i$ is unity within the object boundaries and zero elsewhere. Note that $w_k$ is different for each elemental image and is known a priori as part of the reference object information. Likewise, background noise can be different in each elemental image due to varying sensor viewpoints. In addition, $\alpha$ accounts for the potential difference between the unknown object and reference irradiances.
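As a small illustration of the disjoint model described above, the sketch below composes the irradiance on each pixel from its three sources. It assumes the object term is scaled by α and gated by the binary support w, that the background fills the complement (1 − w), and that the dark-count equivalent irradiance is added uniformly; the function names are hypothetical.

```python
def pixel_irradiance(s, w, n_b, n_d, alpha=1.0):
    """Irradiance on one pixel under the disjoint object/background model.

    s: reference object irradiance, w: binary support (1 inside the object),
    n_b: background irradiance, n_d: dark-count equivalent irradiance,
    alpha: scale between unknown and reference object irradiance.
    """
    return alpha * w * s + (1 - w) * n_b + n_d


def image_irradiance(s_img, w_img, nb_img, n_d, alpha=1.0):
    """Apply the pixel model to flattened images of equal length."""
    return [pixel_irradiance(s, w, nb, n_d, alpha)
            for s, w, nb in zip(s_img, w_img, nb_img)]
```

Inside the support only the scaled object and the dark term contribute; outside it only the background and the dark term do.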
Throughout this paper, the pre-superscript, post-superscript and post-subscript of each symbol denote object class, pixel index and signal source, respectively. For example, ${}^{j}w_k^i$ denotes the i-th pixel of the support function of a j-th class object as seen from the k-th elemental image.
In general, inherent stochastic fluctuations of irradiance can influence the statistical properties of photo-counts, which can result in non-Poissonian photo-count distributions. However, it can be shown that for polarized thermal radiation, when the count degeneracy parameter approaches zero, the probability distribution of photo-counts approaches the Poisson distribution [1]. In this case, the number of detected photons in the i-th pixel, written here as $\hat{r}^i$ to distinguish it from the irradiance, is a discrete random variable whose mean is related to the irradiance, $r^i$, of the light impinging on that pixel and follows the Poisson distribution [30]:

$$P(\hat{r}^i \mid r^i) = \frac{(r^i)^{\hat{r}^i}\, e^{-r^i}}{\hat{r}^i!}, \qquad \hat{r}^i = 0, 1, 2, \ldots \qquad (4)$$

In fact, each term in Eq. (3) represents the rate of a Poisson process. Intensity images can be used to simulate photon-counting imagery [13], since the recorded intensity on a pixel is related to the mean number of photons impinging on that pixel. In simulating photon-counting imagery, one can control the total number of photo-counts in all elemental images combined, $N_{ph}$, by using the normalized irradiance [13]:

$$r^i = N_{ph}\, \frac{I^i}{\sum_{i} I^i} \qquad (5)$$

where $I^i$ is the recorded intensity at the i-th image pixel and the sum runs over all pixels of all elemental images.
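The simulation procedure described above can be sketched as follows, assuming the normalized irradiance is the recorded intensity scaled so that all pixels sum to the desired photon budget, and that each pixel count is an independent Poisson draw. The sampler uses Knuth's multiplication method, which is a choice made here for a dependency-free example and is only suitable for small rates; all names are illustrative.

```python
import math
import random


def normalized_irradiance(intensities, n_photons):
    """Scale intensities so that their sum equals the photon budget."""
    total = sum(intensities)
    return [n_photons * value / total for value in intensities]


def poisson_sample(rate, rng):
    """Draw one Poisson variate (Knuth's method; fine for small rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def photon_counting_image(intensities, n_photons, dark_rate=0.0, seed=0):
    """Simulate counts for a flattened image; dark_rate is added per pixel."""
    rng = random.Random(seed)
    rates = normalized_irradiance(intensities, n_photons)
    return [poisson_sample(rate + dark_rate, rng) for rate in rates]
```

Because the photon rate and the dark rate are both Poisson rates, adding them before sampling is statistically equivalent to sampling them separately and summing the counts.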

Pattern Recognition with 3D Photon-Counting Imagery
In the realm of statistical decision theory [31], each possible state (or class) is represented by a hypothesis $H_j$. Given the photon-counting imagery, the maximum-likelihood (ML) decision criterion chooses the hypothesis for which an objective function (e.g. the probability of error) is minimized. In a binary classification problem, $H_1$ is selected if object 1 is more likely to have produced the observed photon-counting data. For mathematical brevity, we assume that all object classes are equally likely and that the cost of error is the same for all misclassifications. Given a photon-counting dataset, $R$, a convenient way to use ML decision theory is to calculate the likelihood ratio, $\ell(\cdot)$, between the two classes and make decisions based on the outcome [30]:

$$\ell(R) = \frac{L(R \mid H_1)}{L(R \mid H_2)} \;\underset{H_2}{\overset{H_1}{\gtrless}}\; 1 \qquad (6)$$

where $L(\cdot)$ denotes the likelihood function.
As described in Section 2, the multi-view photon-counting imagery can be used to reconstruct the object space in 3D. The same methodology can be extended to quantum-limited imaging conditions. The likelihood function of the reconstruction space under hypothesis $H_j$ can be expanded as:

$$L(R \mid H_j) = \prod_{q=1}^{Q} \prod_{k=1}^{K} \prod_{i=1}^{M} P(\hat{r}_k^i \mid H_j) \qquad (7)$$

where $\hat{r}_k^i$ is the discrete count random variable associated with pixel $i$ of the k-th elemental image. The innermost product in Eq. (7) is over the $M$ pixels of each elemental image, the second product is over the set of $K$ elemental images and the outermost product is over the $Q$ reconstruction planes.
Note that the disjoint object and background model in Eq. (3) can be used to rewrite the probability density of each point in space as:

$$\prod_{i=1}^{M} P(\hat{r}_k^i \mid H_j) = \prod_{i:\, {}^{j}w_k^i = 1} P(\hat{r}_k^i \mid H_j) \prod_{i:\, {}^{j}w_k^i = 0} P(\hat{r}_k^i \mid H_j) \qquad (8)$$

in which the first product series is over counts within the object support and the second product series is that of the background. The latter is irrelevant to the recognition problem and, since it is bounded, we substitute it with 1 and treat it as a benign term.
From Eqs. (3) and (4), it is evident that within the object support of the k-th elemental image, the conditional density of the number of counts for the i-th pixel, $\hat{r}_k^i$, follows a Poisson distribution with the rate $r_k^i = \alpha\, {}^{j}s_k^i + n_d$, with $j$ denoting the class hypothesis, i.e.

$$\hat{r}_k^i \mid H_j \sim \mathcal{P}\!\left(\alpha\, {}^{j}s_k^i + n_d\right) \qquad (9)$$

where $\mathcal{P}(\cdot)$ is the Poisson transformation following the form of Eq. (4). Combining Eqs. (8) and (9), substituting in Eq. (7) and taking the logarithm yields the log likelihood of Eq. (10), in which $\tilde{i} = i - \tilde{p}k$ is used for conciseness. The log likelihood in Eq. (10) can be calculated based on the a priori normalized irradiance of the reference object ($s$ and $w$), the sensor's characteristic dark count rate ($n_d$) and the total counts registered at each pixel of the sensor ($\hat{r}$). Note that, unlike correlation based techniques, the likelihood in Eq. (10) penalizes a high energy background where the object is not present or expected in the scene, i.e. where $s^i \ll 1$ but $\hat{r}^i > 0$. This reduces the false positive rate and improves the recognition performance.
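The log likelihood discussed above can be sketched for flattened pixel lists, assuming each pixel inside the object support is an independent Poisson variable with rate α·s + n_d and that background pixels contribute a constant factor of 1. The index shift between elemental images and reconstruction planes is left out, so this is an illustration of the per-pixel terms only, not the paper's exact expression; the function name and argument layout are hypothetical.

```python
import math


def log_likelihood(counts, s_ref, support, n_d, alpha=1.0):
    """Poisson log likelihood of observed counts under one class hypothesis.

    counts: observed counts per pixel, s_ref: reference object irradiance,
    support: binary object support, n_d: dark-count equivalent irradiance.
    """
    total = 0.0
    for r, s, w in zip(counts, s_ref, support):
        if not w:
            continue  # background pixels are treated as a constant factor
        rate = alpha * s + n_d
        if rate <= 0.0:
            if r > 0:
                return float("-inf")  # counts where none are possible
            continue
        # log of rate**r * exp(-rate) / r!
        total += r * math.log(rate) - rate - math.lgamma(r + 1)
    return total
```

Pixels with zero counts still contribute the term −(α·s + n_d), which is how the absence of counts carries information: a hypothesis that expects a bright pixel where none was observed is penalized.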
Since the object irradiance is only known to be a scalar multiple of the reference object irradiance, we set to zero the partial derivative of Eq. (10) with respect to $\alpha$ to find its estimate $\hat{\alpha}$, which gives Eq. (11). From Eq. (11), the solution for $\hat{\alpha}$ is that of a high-order polynomial, which does not yield a closed form expression. However, for small enough dark noise, $n_d \ll s$, $\hat{\alpha}$ can be found in closed form [Eq. (12)], calculated separately and substituted for $\alpha$ in Eq. (10). If $n_d \geq s$, one can calculate $\hat{\alpha}$ by applying numerical non-linear solvers, such as Newton's method, to Eq. (11). Note that only pixels with nonzero counts need to be taken into account to find a solution for $\hat{\alpha}$ in Eq. (11).
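Both routes to the scale estimate can be sketched under the assumption that each object-support pixel is Poisson with rate α·s + n_d. Differentiating the resulting log likelihood gives the score Σ s·(r/(α·s + n_d) − 1); with negligible dark noise its root is Σr/Σs, and otherwise Newton's method can be applied. Starting the iteration from α = 0 and clamping the estimate at zero are choices made for this sketch, and the function names are hypothetical.

```python
def score(alpha, counts, s_ref, n_d):
    """Derivative of the Poisson log likelihood with respect to alpha."""
    total = -sum(s_ref)
    for r, s in zip(counts, s_ref):
        if r:  # pixels with zero counts only contribute through -sum(s_ref)
            total += r * s / (alpha * s + n_d)
    return total


def alpha_closed_form(counts, s_ref):
    """Estimate valid when dark noise is negligible (n_d much smaller than s)."""
    return sum(counts) / sum(s_ref)


def alpha_newton(counts, s_ref, n_d, iters=200, tol=1e-12):
    """Newton's method on the score; falls back to closed form if n_d <= 0."""
    if n_d <= 0:
        return alpha_closed_form(counts, s_ref)
    alpha = 0.0
    if score(alpha, counts, s_ref, n_d) <= 0:
        return 0.0  # the likelihood is maximized at the boundary alpha = 0
    for _ in range(iters):
        g = score(alpha, counts, s_ref, n_d)
        dg = -sum(r * s * s / (alpha * s + n_d) ** 2
                  for r, s in zip(counts, s_ref) if r)
        step = g / dg
        alpha -= step
        if abs(step) < tol:
            break
    return alpha
```

The score is convex and decreasing in α, so starting from α = 0 with a positive score makes the Newton iterates approach the root from below without overshooting into negative rates.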
In the photon-counting domain, only a small number of pixels are expected to report counts, which in turn simplifies the calculation of $\hat{\alpha}$. Given a set of photon-counting elemental images which include both photon counts and dark counts, one can calculate the log likelihood in Eq. (10) for all object class hypotheses $j = 1, 2, \ldots, J$. In the case of binary classification between two distinct objects, the labeling strategy based on the likelihood ratio in Eq. (6) can be rewritten with log likelihood values. The resulting decision rule is:

$$\log L(R \mid H_1) - \log L(R \mid H_2) \;\underset{H_2}{\overset{H_1}{\gtrless}}\; 0 \qquad (13)$$

The computational complexity of calculating Eq. (10) is $O(n)$, where $n$ represents the total number of pixels that belong to the object in all images. Note that, in contrast to conventional intensity images that have a large number of 8 or 12 bit pixels, photon-counting images are the outcome of a Poisson process [Eq. (4)] with a very low number of incident photons. Such images typically require only 1 or 2 bit pixel elements for detection. This results in a substantially reduced number of non-zero pixels in the image and directly translates into reduced storage and computational requirements.
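The decision step can be sketched in a few lines, assuming equal priors and equal misclassification costs as stated in the text, so that the threshold on the log likelihood difference is zero. How ties are broken is not specified in the text; assigning them to the second class here is an arbitrary choice, and the function names are hypothetical.

```python
def binary_decision(log_l1, log_l2):
    """Return 1 if hypothesis H1 is more likely than H2, otherwise 2."""
    return 1 if log_l1 - log_l2 > 0 else 2


def ml_class(log_likelihoods):
    """Multi-class extension: index of the largest log likelihood."""
    return max(range(len(log_likelihoods)), key=lambda j: log_likelihoods[j])
```

For example, `binary_decision(-12.3, -15.0)` selects the first hypothesis because its log likelihood is larger, and `ml_class` applies the same idea to any number of hypotheses.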

Experimental Results
In order to evaluate the performance of the proposed method, a multi-view 3D imaging system is used to capture toy models. The resulting images are normalized [see Eq. (5)] and transformed to quantum-limited imagery through the Poisson transformation [see Eq. (4)]. Dark noise is simulated and added to the photon-counting images per Section 4.1. The algorithms presented in Section 3 are applied to determine the classification performance through Monte Carlo simulations. The results are reported in terms of the Fisher ratio.

Multi-View 3D Imaging
Two similar toy truck models are chosen as reference objects (see Fig. 2). Both objects fit in a rectangular box of approximately 3″ × 1″ × 1″ and have similar features and shape. The blue truck in Fig. 2(a) is taken to be the true class, while the white truck in Fig. 2(b) is assigned to be the false class object. Using a multi-view imaging system, as shown in Fig. 1, a single sensor scans the pickup plane in an 11 × 11 grid, and 121 elemental images of both reference objects are recorded.
The horizontal and vertical sensor pitches are $p_x$ = 16 mm and $p_y$ = 10 mm respectively, and the focal plane size, i.e. the sensor size, is 24 × 36 mm with a pixel size of µ = 10 µm. The imaging optics has a fixed focal length of 24 mm with f/# = 5.4. For each elemental image, the object support $w_k$ is extracted by thresholding. The reference objects are imaged under controlled illumination against a dark background. Note that the unknown input scenes need not be imaged under the same illumination conditions and can include background noise of arbitrary pattern and brightness [see Eq. (11)]. As unknown input objects, the same two objects are presented to the imaging system in a different pose (compared to the reference objects) with an additional pine tree foliage background. Figure 3 illustrates 16 (out of 121) views of one of the objects.
Intensity images of the unknown scene (Fig. 3) are used to generate photon-counting elemental images according to the Poisson detection model described in Section 2.1. Dark counts are also simulated and added to the photon counts according to Eqs. (3) and (4). Figure 4 illustrates how a single view of the scene is generated from its corresponding intensity image. This process is repeated for all elemental images to create the multi-view photon-counting image set.
The photon-counting elemental images can be used to reconstruct the object space, $R$, as described in Section 2. The volumetric reconstructions of both the photon-counting imagery and the reference objects' 3D images are generated using Eq. (2), based on which Eq. (10) is used to find the log likelihood with respect to both object hypotheses.

Recognition Performance
The performance of the proposed photon-counting 3D object recognition is tested under two different scenarios. In both, background noise is present in the scene containing the unknown object. In the first scenario, dark noise is disregarded, i.e. $n_d = 0$, and the reconstructed photon-counting 3D image of the object contains only the photo-counts. In the second scenario, the sensors are assumed to have a fixed dark count rate, as a result of which the total dark counts ($N_{dc}$) increase in proportion to the total photon counts ($N_{ph}$). In both cases, the illumination conditions are similar, thus $\alpha$ is set to 1.
To quantify the recognition performance in ideal conditions, let us consider the case where background noise is present but no dark counts are generated at the detector. Photon-counting images of both true and false class objects in background noise are generated 500 times through Monte Carlo simulation based on experimentally captured elemental images. At each step, the likelihood of the photon-limited 3D reconstruction is computed according to Eq. (10) with respect to both the true, $H_1$, and false, $H_2$, class reference objects.
The log likelihood ratio in Eq. (13) is then calculated, and the difference between the log likelihoods with respect to the known reference objects, i.e. $\log L(R \mid H_1) - \log L(R \mid H_2)$, is used for classification. This quantity, along with its standard deviation, is plotted in Fig. 5(a) for various values of the total photon count, $N_{ph}$.
In the second scenario, the total number of dark counts increases with the number of photons detected from the scene. The dark count rate, $n_d$, is chosen such that the expected number of dark counts combined for all elemental images, $N_{dc}$, is always 27 times the number of photon counts, i.e. $N_{dc} = 27 N_{ph}$. This results in a constant ratio of object photons to dark counts equal to 1/27 ≈ 0.037, which is preserved in all experiments. The resulting log likelihood difference and its standard deviation are plotted in Fig. 5(b).
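The dark-count budget in this scenario is simple arithmetic, sketched below. The total dark count and the photon-to-dark ratio follow directly from the factor of 27 in the text. The per-pixel rate additionally assumes that dark counts are spread uniformly over every pixel of all 121 elemental images and that each image has 2400 × 3600 pixels (a 24 × 36 mm sensor with 10 µm pixels); both are assumptions of this example, as are the function and parameter names.

```python
def dark_count_budget(n_photons, dark_to_photon=27,
                      n_images=121, pixels_per_image=2400 * 3600):
    """Return (total dark counts, photon-to-dark ratio, dark rate per pixel).

    n_photons must be positive. pixels_per_image is an assumed value
    derived from the stated sensor and pixel sizes.
    """
    n_dark = dark_to_photon * n_photons
    ratio = n_photons / n_dark
    per_pixel_rate = n_dark / (n_images * pixels_per_image)
    return n_dark, ratio, per_pixel_rate
```

For a budget of 50 photons this gives 1350 expected dark counts and a ratio of 1/27, which rounds to the 0.037 quoted in the text.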
The Fisher ratio is used as the performance metric. Table 1 and Fig. 6 show the associated Fisher ratio for each $N_{ph}$ in both scenarios. Although sparse, the information captured from an object using a 3D photon-counting imaging system provides a means for object recognition. The likelihood ratio formulation can be used to process the photon-counting information in a binary classification problem.
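The text does not spell out how the Fisher ratio is computed. A minimal sketch, assuming the common two-class definition (squared difference of the class means divided by the sum of the class variances) applied to the Monte Carlo log likelihood differences, could look as follows; the function name and the use of population variances are choices made for this example.

```python
from statistics import fmean, pvariance


def fisher_ratio(true_scores, false_scores):
    """Two-class Fisher ratio: (m1 - m2)^2 / (v1 + v2).

    true_scores / false_scores: decision statistics (for example, log
    likelihood differences) collected over repeated trials of each class.
    """
    m1, m2 = fmean(true_scores), fmean(false_scores)
    v1, v2 = pvariance(true_scores), pvariance(false_scores)
    return (m1 - m2) ** 2 / (v1 + v2)
```

A larger value means the two score distributions are better separated relative to their spread, which is consistent with the ratio growing as more photons are detected.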
As expected, more object photons result in better discrimination, i.e. a higher Fisher ratio. In our experiments, in the absence of dark counts, an acceptable Fisher ratio of 7.1 can be achieved even at 10 photons per scene, while with more than 20 photons the binary classification is virtually perfect. In the realistic case of quantum-limited imaging where dark noise is present, the required number of photons increases to about 50, assuming that the spurious dark counts are 27 times more numerous than the photon counts, i.e. an object photon to dark count ratio of 0.037.

Conclusion
In this paper, maximum likelihood decision theory is presented for object recognition in photon-counting imagery containing sparse, quantum-limited information about the object. Background and dark noise sources present in realistic scenes are also considered. The imaging system used for capturing both the reference object and the photon-counting imagery is a multi-view 3D imaging system which can capture the 3D structure of the object. Experimental results were demonstrated for binary object recognition at a ratio of 0.037 between object photons and dark counts. The proposed method makes use of the fact that pixels with zero counts also convey object information when it comes to deciding between multiple object hypotheses. This method can be extended to multiple-class recognition problems.

Fig. 2. Reference objects used in the experiment. (a) True class (blue truck), and (b) false class (white truck). The objects share similar shape and features.

Fig. 4. Simulation of photon-counting imagery. Photons are shown in green, dark counts in red. (a) Full intensity image of the real unknown object, (b) 191 photons detected from the intensity distribution of (a) according to Eq. (9), (c) dark frame generated with approx. 5400 counts, and (d) addition of the photon image (b) and the dark frame (c).

Fig. 5. Log likelihood ratio for the blue and white trucks in a scene with background noise (a) without dark counts, and (b) with dark counts varying linearly with photon counts. The true class is the blue truck.

Fig. 6. The Fisher ratio increases with the number of detected photons. The slope decreases with increasing dark noise.

Table 1. Discrimination performance between two classes with and without the presence of dark noise. FR is the Fisher ratio.