Spatio-temporal Hotelling observer for signal detection from image sequences

Detection of signals in noisy images is necessary in many applications, including astronomy and medical imaging. The optimal linear observer for performing a detection task, called the Hotelling observer in the medical literature, can be regarded as a generalization of the familiar prewhitening matched filter. Performance on the detection task is limited by randomness in the image data, which stems from randomness in the object, randomness in the imaging system, and randomness in the detector outputs due to photon and readout noise, and the Hotelling observer accounts for all of these effects in an optimal way. If multiple temporal frames of images are acquired, the resulting data set is a spatio-temporal random process, and the Hotelling observer becomes a spatio-temporal linear operator. This paper discusses the theory of the spatio-temporal Hotelling observer and estimation of the required spatio-temporal covariance matrices. It also presents a parallel implementation of the observer on a cluster of Sony PLAYSTATION 3 gaming consoles. As an example, we consider the use of the spatio-temporal Hotelling observer for exoplanet detection.


Introduction
Signal detection [1,2] is a fundamental task in image science and a common problem in many fields, from medicine (for example, the detection of tumors [1,3,4]) to astronomy (detection of extrasolar planets [5,6] or near-earth objects). As a consequence, an extensive theory and numerous algorithms have been developed to address the problem of signal detection from images [1,2,[7][8][9][10][11][12][13]. Some systems, such as adaptive optics systems [14,15], can deliver sequences of spatially correlated images. Spatio-temporally correlated data also appear in medical applications, such as [16,17]. In many cases, the large amount of spatiotemporal data makes it hard to examine them and apply the detection algorithm directly to the data. Reduction in the temporal dimension is usually performed by adding together some or all of the frames. In adaptive optics, for example, images of the same object are taken over time and summed together either on the readout chip or in an external computer. The resulting single-frame image is used to perform the task of interest, but the information loss that results from the summation can reduce the detection performance. In medical applications, by contrast, it is usually not valid to assume that the object is constant over a long exposure time, and indeed the temporal dynamics of the object may be what defines the signal to be detected. Short-exposure images are inherently noisy, but long exposures wash out the signal. Full spatio-temporal processing is required for optimal signal detection.
The performance of a detection algorithm is mathematically quantified using receiver operating characteristic (ROC) analysis [18][19][20] and the area under the ROC curve (AUC) [19,21]. With respect to that measure, the likelihood ratio is the optimal detector [21]. However, the likelihood ratio requires knowledge of the probability density functions under the hypotheses signal present and signal absent. Such probability density functions are often unknown or hard to estimate in practical cases. A more viable solution is the Hotelling observer [22], which requires only the knowledge of the data mean vector and covariance matrix. The Hotelling observer is linear, and it is optimal with respect to the class of linear observers [21] and a certain detectability measure to be defined below.
In this paper, the optimal-linear Hotelling observer is applied to spatio-temporal imagery. For this reason, we talk about the spatio-temporal Hotelling observer. By construction, such an observer is able to use both the spatial and temporal correlations between pixels in an optimal way, with respect to all linear observers. Methods for the estimation of the mean data vector and covariance matrix are described as well. Computational methods are described, and a parallel algorithm is implemented on a cluster of Sony PLAYSTATION 3 game consoles.

The spatio-temporal Hotelling observer
Let {g (1) , … , g (J) } be a collection of J images of the same object f (assumed to be independent of time) taken over time. Each g (j) is, in turn, a collection of M pixel intensities . The data set {g (1) , … , g (J) } represents the intensities of a total of MJ pixels, which we raster-scan and represent in compact form as the MJ × 1 vector G.
The task of interest is binary discrimination: given a noisy data set G, an observer must decide whether the object f that produced G belongs to either the "signal-absent" class Γ 0 or to the "signal-present" class Γ 1 . The observer can also be said to decide between hypothesis H 0 where the signal is absent, or hypothesis H 1 where the signal is present. In any case, the observer evaluates a real-valued non-random function t on the random data G and compares t(G) to a threshold τ. If t(G) > τ, hypothesis H 1 is assumed. Otherwise, hypothesis H 0 is concluded. All of the possible outcomes and associated terminology are summarized in Table 1.
Recall that G is a random vector, so t(G) is a random variable. For any fixed value of τ, the decision taken depends on t(G), so it is a random variable as well. We can consider its probability density functions pr(t|H 0 ) and pr(t|H 1 ) given, respectively, hypothesis H 0 or H 1 . These two densities allow us to formally define the true positive fraction (TPF) and the false positive fraction (FPF) as (1) in which it is made clear that TPF and FPF are functions of τ.
If we change the value of τ, different values of TPF(τ) and FPF(τ) are obtained; a plot of TPF(τ) versus FPF(τ) as τ is varied over the real line is called a receiver operating characteristic (ROC) curve [18][19][20], and the area under the ROC curve (AUC) [19,21] is a meaningful figure of merit for a binary classification task. The AUC is defined as [21] Another figure of merit for the same task is the signal-to-noise ratio (SNR) on the test statistic t(G): (2) in which the notation 〈t(G)〉 G|H i denotes the statistical expectation of the random variable t(G) conditioned to the knowledge that hypothesis H i is true. Similarly, Var{t(G)|H i } is the variance of t(G) under the hypothesis H i . In (2), hypotheses H 0 and H 1 are assumed equiprobable. The AUC t(G) is a known, monotonic function of SNR t(G) if t(G) is normally distributed [21].
Many classical monographs discuss statistical decision theory; some of them are [2] and [9]. These texts show that, for binary classifications and for any ROC-related figure of merit (including the AUC), the optimal observer is the likelihood ratio [1,2,9,21]: or, equivalently, its logarithm λ (G) = ln Λ (G). However, we note that Λ (G) requires knowledge of the multivariate densities pr(G|H i ), which are usually unknown or difficult to estimate. A more viable alternative can be found by restricting attention to linear observers, i.e., observers of the form t(G) = W T G, for an appropriate template vector W of the same size of G. Here, the symbol T denotes the transpose of a vector or matrix, so W T G is a scalar product. The optimal template vector can be derived by substituting t(G) = W T G in (2) and maximizing with respect to W. The resulting template vector, which we call the Hotelling template vector [21,22], is of the form (3) in which K G is the covariance matrix of the data vector G and is the image of the signal. The triple overbar represents an average over object randomness, system randomness, and measurement noise as discussed below. The linear observer that uses W Hot is the Hotelling observer, defined as . The Hotelling observer is also called a prewhitening matched filter [21], and the prewhitening operation is both correcting for the correlation in a single image and also undoing the frame-to-frame correlation. Note that the Hotelling observer t Hot (G) defined above is applied to spatio-temporal data, as opposed to the classical use of the Hotelling observer in the case of purely spatial data.

Analysis of the data covariance matrix
In a general case, we have three sources of randomness: detector noise, point spread function variability, and object variability. This leads to the decomposition [23] (4) with which the Hotelling template vector is written as (5) The expression in (4) can be formally derived from the definition of covariance matrix for G: (6) in which P represents the sequence {p (1) , … , p (J) } of point spread functions (PSFs). We will allow the PSFs to be random, as in the adaptive optics problem. The PSFs will be assumed known statistically (PSF-known-statistically or PKS [21]), and their contribution to the data covariance matrix will be estimated by means of simulated data. It is important to note that full knowledge of the statistical properties of the PSFs is not needed. Instead, the Hotelling observer requires the knowledge of only the mean signal S and the data covariance matrix K G . As long as these two quantities can be estimated with sufficient accuracy, the Hotelling observer will deliver high performance. No moments higher than the second are needed.
Expression (6) contains three averaging (or statistical expectation) steps. The innermost expectation is on the noise and for given P and f. The resulting quantity is averaged over P given f, and, finally, we average over f. From (6) and adding and subtracting terms appropriately [23], (4) is derived, provided that: in which G with a variable number of bars denotes average with respect to noise, PSFs, and object. By construction, random vectors G, G ̅ , and G ̿ are uncorrelated [23].
For simplicity, we will assume that the signal we want to detect is known in brightness and location. In this case, we refer to a signal-known-exactly (SKE) problem [21]. The more realistic problem of unknown signal location can be handled by scanning the observer template [24][25][26]. The SKE hypothesis provides valuable information that the observer can use to improve detection performance (quantified by an increase in AUC) with respect to the detection and localization problem of [25]. We also assume that the object background is known exactly (background-known-exactly or BKE in the terminology of [21]).
The noise covariance matrix is usually very easy to study. Indeed, if we assume that only photon (Poisson) and readout (Gaussian) noises are present in the sequence G, then the noise in distinct elements of the detector are usually uncorrelated and so we can write (7) where is the readout noise variance for the m-th pixel of the detector, and is the variance of the photon noise for the same pixel when is the expected number of photons (from objects in the field of view and background) collected for the m-th pixel of the j-th image. The appropriateness of the Poisson model for the photon noise is justified by invoking the so-called Poisson postulates [21]. The readout noise variance at each detector pixel is usually known, as provided by the detector's manufacturer, or it can be measured. Matrix above is diagonal with no zero terms on the diagonal, which guarantees the invertibility of K G .

Estimation of the Hotelling template vector
In this section, we describe how quantities discussed in the previous section can be estimated. We will rely on simulation code and consider L 1 realization of the PSF sequence P and L 2 realization of a random signal. Consider L 1 L 2 simulated noiseless data sets , ℓ 1 = 1, … , L 1 , ℓ 2 = 1, … , L 2 for the hypothesis H 1 and, similarly, L 1 noiseless data sets , for ℓ 1 = 1, … , L 1 for the hypothesis H 0 .
The noise covariance matrix is estimated as: where we denoted estimated quantities using the hat symbol and we set The PSF covariance matrix is estimated from simulated data as well: (8) where (9) Expressions (8) and (9) show that is the sample estimate of , estimated from the noiseless simulated data.
Finally, if we consider randomness in the signal to be detected or the background on which it is superimposed, the object term in the covariance matrix expression can be estimated as: where It might be tempting to try to estimate the whole data covariance matrix K G in (6) from noisy simulated data, without using the decomposition (4). A necessary (but not sufficient) condition for such estimate K̂G to be nonsingular is that the number L = L 1 L 2 of simulated noisy sequences must be greater than MJ, the order of K G itself. If, for example, each image g (j) is of size 64 × 64 and there are 25 of them in each sequence G, then MJ = 64 2 · 25 ≈ 10 5 . Simulating such a huge number of image sequences is clearly prohibitive. Instead, the decomposition (4), along with (7), guarantees the invertibility of K̂G. This can be proved by noting that K̂G is symmetric and (strictly) positive definite. Indeed: for any vector x ≠ 0.
Finally, the average contribution of the signal to the image data is estimated as: Armed with an estimate of the signal to be detected and estimates of the covariance matrices that appear on the right-hand side of (4), we can formally write an expression for the Hotelling template vector estimate: in analogy with (5), we define (10) For the SKE/BKE case, (10) reduces to: (11) Although (10) and (11) make sense because [K̂G] −1 exists, the matrix we need to invert is huge, so computing the inverse by means of standard algorithms (such as Gaussian elimination) is computationally prohibitive [27]. Even if we could invert K̂G, storing it will require an incredible amount of disk space. However, we recognize the particular structure of the PSF covariance matrix [see (8)] and consider an algorithm that takes advantage of it. Indeed, if we introduce the matrix R whose elements are then (8) is rewritten as The MJ × L matrix R contains in the ℓ-th column the MJ × 1 vector obtained by rasterscanning the pixels in . The Woodbury matrix-inversion lemma [28][29][30][31] allows us to rewrite the inverse in (10) as a computationally tractable expression. In abstract form, the matrix-inversion lemma can be stated as follows: in which I N is the identity matrix of order N. If then (12) where The matrix is diagonal, so computing its inverse does not pose computational problems. The matrix Q is of size L × L and invertible. Note that L is usually much smaller than MJ, which implies that Q −1 can be calculated in much shorter time than [K̂G] −1 . Standard Gaussian elimination with pivoting is a fast and numerically stable way to invert Q. Overall, using (12) is a more tractable and stable problem than computing (11) directly.

Implementation
For this research, we took advantage of the computational capabilities available at the Center for Gamma-Ray Imaging, University of Arizona. In particular, we used a Sony PLAYSTATION 3 cluster consisting of 30 units. Each unit was equipped with a Cell Broadband Engine (Cell BE) processor, 256 MB of RAM memory, 60 GB of disk space, and was running Linux Fedora 7, kernel version 2.6.23. The IBM's Cell BE Software Development Kit 3.0 was installed on all of the units and used to generate code suitable for the Cell BE microprocessor. All machines in the cluster were connected by means of a 1-Gbit/s local area network (LAN). All algorithms were coded using the C programming language, and source files were compiled using IBM XL C/C++ compiler, version 9.0. Communication between different nodes of the computer cluster was achieved using the Message Passing Interface (MPI) standard. The Cell BE architecture [32][33][34] has recently received enormous attention from the computer science community. The possibility of using up to nine processing cores and the fact that SIMD instructions are supported make the Cell BE processor particularly appealing as a low-cost solution for high-performance scientific computing [35][36][37][38][39].
An algorithm for the computation of the template vector Ŵ Hot according to (10) and (12) was implemented and run on the PLAYSTATION 3 cluster. The algorithm is composed of two different programs: one to be run as a master process and one to be run as a slave process. Pseudocode for the master process is shown below: In the implementation, we took advantage of the Cell BE processor and its SIMD capabilities. Floating-point values were stored in double precision, and SIMD instructions were used for the computation of the blocks of [K̂G] −1 as they were needed. Splitting the work load among all of the processing cores available on each PLAYSTATION 3 allowed more than a 15-fold reduction in the computation time with respect to an implementation that uses only one core and ignores their SIMD capabilities. Had we used single-precision values, the speed-up would have been even larger.

Simulation results
As an example, we developed the spatio-temporal Hotelling observer for adaptive optics (AO) images [14,15]. The simulation code we used is AOTools (webpage: http://www.tosc.com/software/software.html). The atmospheric thickness was set to 24km and, in order to compute the aberrated wavefront, we assumed the frozen flow hypothesis of atmospheric turbulence. The turbulence was computed according to the modified Hufnagel-Valley C n 2 profile, given by [40]: in which the altitude h is expressed in meters and the units of C n 2 are meters to the −2/3 power. We assumed C n 2 (h) trascurable when h ≥ 24km. The Fried parameter r 0 [41] for the phase screens was 0.30m at the wavelength λ = 500nm. Our simulation took into account effects such as scintillation and anisoplanetism [15] as well. The wind speed was 1.25m/s. The telescope we simulated had a circular aperture of diameter 5m, and the diameter of the central obscuration due to the secondary mirror was 0.50m. The secondary mirror was supported by three arms. For the estimation of the template vector Ŵ Hot in (10), we simulated L = 512 sequences containing J = 25 images each of size 64 × 64 (pixel size 5.96 µm). The wavefront sensor apparatus of many AO systems includes a lenslet array. In our simulation, we simulated a 32 × 32 array of side length 0.02m. The total power entering the telescope was equally split between the wavefront sensor and the science camera by a 50/50 beamsplitter. The system was assumed idea, with an efficiency of 1 (i.e., no losses) and the AO loop was running at a speed of 1kHz. The quality of the AO correction can be quantified with an average Strehl ratio of about 0.67.
We applied the spatio-temporal Hotelling observer to the detector of dim planets orbiting a star (assuming the star and the planet in the same isoplanetic patch), and we simulated data sets for both hypotheses H 0 and H 1 . The apparent magnitude of the star was m = 6 and the difference in apparent magnitude between the planet and the star was Δm = 16.74. More specifically, we simulated L = 512 sequences for each hypothesis and, for each sequence, we used J = 25 frames containing M = 64 2 pixels each. The simulation was set up in order to mimic an astronomical observation. For example, we used typical values for the readout noise as found in [42]. The exposure time was 0.1ms. The data used to estimate the template vector Ŵ Hot was noiseless and no noise was considered in the wavefront sensor. On the other hand, data on which the detection task was applied were noisy and were obtained with different phase screens.
We considered the SKE/BKE/PKS case and we compared the performance of the spatiotemporal Hotelling observer to that of the purely spatial Hotelling observer t Hot (g) and the purely spatial matched filter [43,44] observer t mat (g) = s T g, where s is the signal to be detected. In an effort to mimic current practice in astronomical detection, both t Hot (g) and t mat (g) were applied to long-exposure data g obtained by on-chip integration of many shortexposure frames. The observers were run on simulated noisy data. We generated test noise-free image sequences for both the planet-absent and planet-present hypotheses and degraded them with Poisson photon noise and Gaussian readout noise to generate n = 10,000 noisy sequences of short-exposure images for the planet-absent hypothesis and n noisy sequences short-exposure images for the planet-present hypothesis. The corresponding long-exposure images were generated as well. These test data were supplied to the observers considered in this comparison, and the corresponding values of the test statistics and were collected. Binning these values would provide approximated plots of the densities pr(t|H 0 ) and pr(t|H 1 ) for a particular observer t. For each observer, we estimated the values of the TPF and FPF [see (1)] as: in which the notation |S| stands for the number of elements of the set S, and we varied the value of τ to obtain ROC curves. The ROC curves are reported in Fig. 1, and the corresponding values of the AUC, standard deviation σ on the AUC, and SNR are reported in Table 2. The SNR for the three observers was computed according to (2), in which conditional means 〈 … 〉 and variances Var{ … } were replaced by the sample means and sample variances of the and . The values of σ were computed as described in [45].
The results reported in Fig. 1 and Table 2 confirm the superiority of the spatio-temporal Hotelling observer with respect to the spatial Hotelling observer and the matched filter observer. The results of Fig. 1 complete the ones reported in [25]. Indeed, in [25], we compared the spatial Hotelling observer with current techniques used in astronomy for point-source detection, and we noted that the spatial Hotelling observer outperforms popular detection algorithms, such as [46]. In this paper, we showed that short-exposure images retain temporal information, which increases detection performance. We see that the spatiotemporal Hotelling observer outperforms current long-exposure detection algorithms as well.

Conclusions
In this paper, statistical decision theory was rigorously applied to the problem of signal detection for spatio-temporal data. Three sources of randomness were initially considered: randomness in the object, randomness in the residual point spread function, and randomness due to measurement noise in the detector array. However, for simplicity, we assumed the signal in the object to be nonrandom and at a known location, and the background constant and known. We noted that, in many applications of interest, a complete description of the statistics of the data is not available. Only the first and second moments of the data are required to compute the Hotelling observer. We remarked the importance of the temporal correlations between pixels in the temporal data. Indeed, the Hotelling observer was applied to the whole sequence of temporally-related images, rather than to their average.
In some cases, such as adaptive optics systems, a complete analytical study of the first two moments of the data might be complicated. Therefore, we proposed to estimate means and covariance matrices using simulated data. We implemented one algorithm for the computation of the spatio-temporal Hotelling observer, and we ran the algorithm on a Sony PLAYSTATION 3 cluster. Our implementation took advantage of the computational capabilities of the Cell Broad-band Engine Architecture (Cell BE) processor, which equips all of the Sony PLAYSTATION 3 units of our cluster. Thanks to the matrix-inversion lemma, the problem of computing the product between the inverse of a large covariance matrix and the signal was recast to the problem of computing matrix multiplications involving the inverse of a much smaller matrix.
Research concerning an analytical expression for the PSF covariance matrix is currently underway for the case of an ideal thin lens with a weak Gaussian phase perturbation in the pupil. This model might be appropriate for high-performance adaptive optics systems, for which the residual perturbation can be shown to be Gaussian and weak. This study would be of great importance, as it would eliminate the need for simulation in order to estimate the PSF covariance matrix. If an analytical expression for the data covariance matrix is available, other methods-based, for example, on the Landweber algorithm or on the Neumann series expansion-for the computation of the Hotelling template vector could be investigated as well [21]. In addition, it would be interesting to find an expression for the data's probability density function. With such density, the likelihood ratio could be computed and compared with the Hotelling observer. We expect these two methods to deliver similar detection performance, because, for large mean values, Poisson random variables are very well approximated with Gaussian random variables.
In this study, we relied on simulation code to estimate the mean and covariance of the data. Differences between the actual mean and covariance present in real data and the mean and covariance used in the detection task can arise from two sources: errors in the simulation code or sampling errors because the mean and covariance are estimated from a finite number of simulated sample images. In this work and related previous studies [25,47], we have investigated the latter point in great detail. The former point, effect of model errors on detection performance, has not been investigated for the spatiotemporal Hotelling observer (implemented for the first time in this paper), but it has been studied in the medical literature for purely spatial Hotelling observers [21,48]. The general conclusion is that even crude modeling of the covariance model affords a demonstrable improvement in detection performance, though of course more accurate models are always preferable. This issue will be discussed in a separate paper. Plots of the ROC for the observers t Hot (G), t Hot (g), and t mat (g).