Bayesian bacterial detection using irregularly sampled optical endomicroscopy images

Pneumonia is a major cause of morbidity and mortality of patients in intensive care. Rapid determina- tion of the presence and gram status of the pathogenic bacteria in the distal lung may enable a more tailored treatment regime. Optical Endomicroscopy (OEM) is an emerging medical imaging platform with preclinical and clinical utility. Pulmonary OEM via multi-core ﬁbre bundles has the potential to provide in vivo, in situ , ﬂuorescent molecular signatures of the causes of infection and inﬂammation. This paper presents a Bayesian approach for bacterial detection in OEM images. The model considered assumes that the observed pixel ﬂuorescence is a linear combination of the actual intensity value associated with tis- sues or background, corrupted by additive Gaussian noise and potentially by an additional sparse outlier term modelling anomalies (bacteria). The bacteria detection problem is formulated in a Bayesian frame- work and prior distributions are assigned to the unknown model parameters. A Markov chain Monte Carlo algorithm based on a partially collapsed Gibbs sampler is used to sample the posterior distribution of the unknown parameters. The proposed algorithm is ﬁrst validated by simulations conducted using synthetic datasets for which good performance is obtained. Analysis is then conducted using two ex vivo lung datasets in which ﬂuorescently labelled bacteria are present in the distal lung. A good correlation between bacteria counts identiﬁed by a trained clinician and those of the proposed method, which de- tects most of the manually annotated regions, is observed.


Introduction
Pneumonia is characterised by a host inflammatory response to a pathogenic infectious burden in the distal lung and is usually caused by bacterial infection . However, accurate quantitative diagnosis and monitoring of suspected pneumonia is a challenging task with currently available imag-scanning proximal light source, linked to a flexible multi-core fibre. This fibre is passed through the working channel of endoscopes enabling microscopic imaging at its distal end. The lateral diameter of the fibre used in lung imaging applications is 1.4 mm. This miniaturisation enables the exploration of the distal pulmonary tract as well as the assessment of the respiratory bronchioles and alveolar gas exchanging units of the distal lung ( Thiberville et al., 2007 ). OEM has been used clinically in the lung for the detection of lung cancer ( Fuchs et al., 2013;Thiberville et al., 2009b ) and has been used to image the distal lung ( Thiberville et al.,20 09c; 20 09a ) including the imaging of parenchymal lung diseases ( Newton et al., 2012 ).
In pulmonary OEM, elastin and collagen generate autofluorescence when excited with a 488 nm laser source. While bacteria can potentially auto-fluoresce when exposed to light at the appropriate wavelength(s), the emitted fluorescent signal is usually too weak to be imaged effectively and to provide any diagnostic information. To address this limitation, bacteria can be specifically labelled with targeted fluorescently labelled ligands (SmartProbes) , which bind to the bacteria to generate fluorescence in response to a light excitation at pre-determined wavelengths (e.g., 488 nm laser excitation) . Stained bacteria then appear as bright dots in the images, whereas elastin and collagen are naturally auto-florescent and present a mesh-like structure in the distal lung, making the detection of bacteria a more challenging task.
Recently, there has been a strong research interest in the development and deployment of custom optical molecular SmartProbes to image in vivo targets in clinic. Several candidate SmartProbes have been designed to target bacteria  and fibrogenesis ( Aslam et al., 2015 ). In this work, we assess different combinations of bacteria and SmartProbes which yield different fluorescence levels.
The commercially available OEM platform (Cellvizio, Mauna Kea Technologies, Paris-France) acquires images at 8-12 frames per second and clinical/preclinical OEM procedures often last minutes, generating thousands of frames, hence making their manual analysis a very labour-intensive process. There has therefore been an increased interest in the automated analysis of OEM images. OEM tailored approaches predominantly concentrate on (i) the reconstruction of fibre bundle images ( Eldaly et al., 2018a;2018b;Perperidis et al., 2017b;Le Goualher et al., 2004;Savoire et al., 2012;Parker et al., 2019 ), (ii) the retrieval and classification of frames sequences for a more efficient quantitative analysis ( Ishijima et al., 2015;Perperidis et al., 2017a;Koujan et al., 2018;André et al., 2011;Désir et al., 2012 ), and (iii) the automated mosaicking of temporally adjacent frames to create an extended and continuous field of view ( Vercauteren et al.,20 06;20 05 ). Bacterial detection and quantification approaches developed for confocal microscopy ( Lenseigne et al., 2007;Veropoulos et al., 1998 ) are not directly applicable to OEM data due to major differences in the image acquisition approaches employed by the two technologies. In particular, confocal microscopy employs a regular rectangular grid to acquire images of high lateral and axial optical resolutions which are 0.18 μ m and 0.5 μ m respectively ( Fouquet et al., 2015 ). In contrast, OEM employs a sparse and irregular pattern (due to the organisation of the fibers in the fiber bundle) to acquire images of moderate lateral and axial optical resolutions of 0.5 μ m and 3 μ m respectively, within a constantly moving scene (breathing, heart beat and fibre motion) ( Jabbour et al., 2012 ). The real datasets considered in this work are acquired by the Cellvizio, Mauna Kea Technologies (Paris, France), confocal endomicroscopy system , which includes a set of preprocessing steps, which then makes the observation noise difficult to model analytically. Thus, we assume here that the noise is additive, white and Gaussian ( McCool et al., 2016 ).
The object detection problem can be addressed in a variety of different annotation scenarios among which two widely used ones are the dot-annotation and the count-annotation ( Arteta et al., 2014;von Borstel et al., 2016 ). While dot-annotation provides the location of objects that appear in the image, count-annotation only provides the number of objects in an image without explicitly revealing their locations. Dot-annotations are more informative given the small scale of the objects to be quantified. In this work, we assess the performance of our method by considering the number of detections (count-annotation) but also their estimated locations (dot-annotation).
In the literature, the object detection problem has been mainly addressed by either supervised approaches such as in ( Arteta et al., 2016;Lenseigne et al., 2007;Veropoulos et al., 1998;Bonheur et al., 2019 ), or by unsupervised approaches such as in ( Ding et al., 2011;Altmann et al., 2015;McCool et al., 2016;Lindeberg, 1998;He et al., 2010;Kimori et al., 2010;Bright and Steel, 1987;Eldaly et al., 2019 ). In supervised approaches, the dataset is usually divided into training and testing sets. During the training phase, a model is trained by pairing inputs with their expected outputs which are referred to as ground truth. This trained model is then used to estimate the output of the test dataset. While these algorithms are usually simple and fast, the trained model and its detection performance usually depends on the quality of the ground truth considered when training the model/classifier. Since bacteria annotation might be different from one clinician to another, considering such methods for cases where the ground truth can be highly subjective, can result in a biased trained model. Moreover, some of these models use convolutional neural networks ( Arteta et al., 2016 ), which are problematic here due to lack of sufficient training images and bacteria annotations. On the other hand, several studies have considered unsupervised approaches to solve outlier detection problems. In these approaches, the objects to be detected are learned from the data by fitting them with suitable distributions without using explicitly-provided labels. Example of such approaches are hierarchical Bayesian models ( Ding et al., 2011;Altmann et al., 2015;McCool et al., 2016 ) which offer a flexible and consistent methodology to deal with uncertainty in inference when limited amount of data or prior information is available. Moreover, other unknown parameters can be jointly estimated within the algorithm such as noise variance and regularization parameters. As such, they represent an attractive way to tackle ill-posed inverse problems. These methods rely on selecting an appropriate prior distribution for the unknown image and remaining unknown parameters. The full posterior distribution can then be derived using the Bayes' rule, and then exploited by optimization or simulation-based (Markov chain Monte Carlo) methods.
In this paper, motivated by the models considered in ( Altmann et al., 2015;Ding et al., 2011;McCool et al., 2016 ), we develop an algorithm to detect bacteria in distal lung images, taken using an OEM system. We consider that each fibre core's spectrum is a linear combination of an actual intensity value corrupted by additive Gaussian noise, and possibly an additional term modelling outliers (which are considered to be candidate bacteria).
The main contributions of this work are threefold: 1. We develop an algorithm that can help to automatically detect labelled bacteria in optical endomicroscopy images. To the best of our knowledge, it is the first time this problem is addressed in a statistical framework by using a hierarchical Bayesian model. The statistical model does not rely on a particular spatial organisation of the fiber bundle and the algorithm is fully automatic in the sense that it allows the automated estimation of the model hyperparameters. Thus, it does not require the user to tune crucial parameters. 2. We investigate different combinations of SmartProbes and bacterial concentrations including control cases for which no bacteria are present. Different SmartProbes that cause weak and stronger fluorescence are tested and the developed algorithm does not rely on strong assumptions about the Smart-Probe used. We also show that the algorithm can differentiate controls from bacterial loads of different concentrations. 3. We compare the results of the proposed model with bacteria annotations performed by a trained clinician and four existing spot-detection algorithms, using both dot-annotation and count-annotation.
The remaining sections of the paper are organised as follows. Section 2 formulates the problem of bacteria detection in OEM images, followed by Section 3 which summarizes the likelihood and the prior distributions assigned to the unknown parameters of the model. The resulting joint posterior distribution and the partially collapsed Gibbs sampler used to sample that distribution are discussed in Section 4 . Simulations conducted using synthetic and real datasets are presented in Sections 5 and 6 respectively. Limitations, conclusions and future work are finally reported in Sections 7 and 8 respectively.

Problem formulation
Fig. 1 depicts a background OEM image i.e., an image from a sample presenting constant intensity, and a zoomed-in region of this image. Bright and dark areas represent fiber cores and their cladding, respectively. This fibre bundle contains cores that are transmitting light and collecting fluorescence light simultaneously. Fibre cores contain information about the object being imaged, while the cladding (the space between the cores) does not. The underlying fluorescence profile is usually recovered by interpolating the fibre core intensities ( Le Goualher et al., 2004;Eldaly et al., 2018a ). Fig. 2 shows a typical sequence of OEM frames which result from interpolated fibre core intensities, with labelled bacteria annotated by a trained clinician. We can observe that labelled bacteria appear as bright dots overlaid across a network of very bright elastin strands in these images.
Since only fibre cores hold information about the object being imaged, each image can be sparsely represented with only N measurements, associated with N fibers, where each fibre core is represented by a single intensity value. The size of OEM images in our case is 274 × 384 pixels and each of these images consists of N = 12105 fiber cores. This corresponds to approximately 8.69% of the original image pixels. The interpolated images contain the same amount of acquired information as the sets of core intensities, and these images are mostly used for visualization purposes. Moreover, processing only central core intensities, whose coordinates are known from factory pre-calibration, reduces the data vol-ume to be processed and is expected to result in faster algorithms than when considering the whole interpolated images.
We consider a set of N observed sub-sampled intensities y = [ y 1 , . . . , y N ] T . In a similar manner to ( Altmann et al., 2015;Newstadt et al., 2014;Ding et al., 2011 ), each of these samples is assumed to result from a linear combination of an actual intensity value and additive Gaussian noise, potentially corrupted by additive outliers. The observation model can thus be expressed as where x n is the actual intensity value of the n th sample, r n represents a potential outlier (bacteria) and w n represents the additive noise, which is assumed to be independently and identically distributed over the N fiber cores. The noise is assumed to be Gaussian distributed with covariance matrix σ 2 I N , denoted as w ∼ N (w ; 0 N , σ 2 I N ) , where ∼ reads "is distributed according to".
The problem investigated in this paper is to estimate the outlier vector r in Eq. (1) , but the intensity values x and the noise variance σ 2 are also unknown. Thus we estimate jointly r, x and σ 2 from the observation vector y . To solve this problem, we propose a hierarchical Bayesian model and a sampling method to estimate the unknown parameters.

Hierarchical Bayesian model
This section introduces the hierarchical Bayesian model proposed to estimate the unknown parameter x, r and σ 2 . This model is based on the likelihood function of the observations and on prior distributions assigned to the unknown parameters.

Intensity field x
In OEM images, the intensity values of the scene to be recovered are expected to be spatially correlated. A classical and convenient way to model spatially correlated intensities is to consider Markov random fields (MRF) to build a prior model for x ( Rue and Held, 2005;Mardia, 1988;Eldaly et al., 2018a ). MRFs assume that the distribution of a given intensity x n , conditioned on the other intensity values of the image, reduces to its distribution conditioned on the values of its spatial neighbours, i.e., where V n is the set of indices of the neighbours of x n , x \ x n denotes the vector x whose element x n has been removed, and x V n is the subset of x composed of the elements whose indexes belong to V n . In this work, a Delaunay triangulation scheme ( Preparata and Shamos, 2012 ) is used on the N samples to define the neighbourhood structure. Given the structure of the fiber bundle considered, each fibre core has between 2 and 7 neighbouring cores with mean distance between neighbours of 4.1 pixels. We specify and where ∝ reads "is proportional to", d n,n denotes the Euclidean distance between the spatial locations n and n , and the hyperparameter γ 2 controls the global spatial correlation between intensities. Eq. (3) promotes smooth intensity variations between neigh- bours while ensuring that the prior dependence between neighbours decreases as d n,n increases. The resulting joint prior f ( x | γ 2 ) can be expressed as where Note that each core has more than 2 neighbouring cores, and thus

Noise variance σ 2
A Jeffreys' prior distribution ( Jeffreys, 1946 ) is chosen for the noise variance σ 2 , i.e., where 1 R + ( ·) denotes the indicator function defined on R + , which reflects the lack of knowledge about this parameter. This noninformative prior can be easily replaced by conjugate inverse-Gamma prior to include knowledge available about the noise level.

Outliers r
The concentration of bacteria in in vivo human lungs is expected to be such that only a small fraction of the fibers will be associated with bacteria detections. Hence the outliers are assumed to be sparse, i.e., for most of the spatial locations, the outliers are expected to be exactly equal to zero. To model the outlier sparsity, we factorise the outlier vector as where the corresponding outlier amplitude vector, and denotes the Hadamard (term-wise) product. This decomposition allows one to decouple the location of the sparse components from their values. Precisely, z n = 1 if a bacterium is present in the n th observed location with value equal to r n = t n , and r n = 0 if there is no bacterium ( z n = 0 ). A conjugate Gaussian prior model is used for t , i.e., where μ t and s 2 control the prior mean and variance of the outliers, respectively. In this work, we do not assume a particular spatial structure for bacteria positions. However, they have a priori the same probability of being present in any region of the scene.
To model this prior belief, we assign each label z n the following Bernoulli prior distribution where δ( · ) denotes the Dirac delta function. Moreover, it is assumed that the probability of bacteria presence ω is also unknown and we include this parameter within the inference process.

Hyperparameters
To reflect the lack of prior knowledge about the outlier variance in Eq. (8) and regularisation parameter γ 2 in Eq. (4) , the following weakly informative inverse-gamma priors are assigned to s 2 and γ 2 where ( η, ν) are fixed to (η, ν) = (10 −3 , 10 −3 ) . Note that we did not observe significant change in the results when changing these hyperparameters. Similarly, we assign μ the following conjugate truncated Gaussian prior where ( μ, ξ 2 ) are fixed and user-defined parameters (which depend on the dynamics of the image to be recovered). A truncated Gaussian prior on the positive set is considered as we expect the outlier mean to be positive. In this work, we fixed ( μ, ξ 2 ) = (0 , 10 6 ) . Finally, we assign the bacteria presence percentage ω a conjugate beta prior distribution where ( α, β) is fixed to (α, β ) = (0 . 1 , 1) as we expect the proportion of bacteria to be relatively small (the prior mean of ω is α/ (α + β ) ).

Joint posterior distribution
Assuming the parameters x, z, t and σ 2 are a priori mutually independent, the joint posterior of the parameter vector = x , z , t , σ 2 and hyperparameters = { μ t , s 2 , γ 2 , ω} can be expressed as and The next paragraph presents a sampling strategy to estimate the unknown parameter vector and the hyperparameters .

Bayesian inference
To overcome the challenging derivation of Bayesian estimators associated with f ( , | y ), we propose to use an efficient Markov Chain Monte Carlo (MCMC) method to generate samples asymptotically distributed according to Eq. (13) . In practice, strong correlations appear between x and t , and between z and t . Moreover, as z is sparse, sampling f (t , μ t , s 2 | y , \ t , \ (μ t ,s 2 ) ) , where H \ u denotes the parameter vector H whose parameter u is omitted, using a traditional Gibbs sampler results in very slow convergence. Hence, we propose a partially collapsed Gibbs sampler (PCGS) which yields better mixing and convergence properties of the generated Markov chain. It has been shown in the literature that the PCGS is a better Bayesian inference tool for Bernoulli-Gaussian models than standard Gibbs samplers as it can explore the state space more efficiently ( Kail et al., 2012;Lin et al., 2010;Kail et al., 2010;Ge et al., 2008;Robert and Casella, 2013 ). The PCGS used here samples groups of variables (e.g., ( x, t )) from their joint posterior distribution, in a similar fashion to block Gibbs samplers, which yields better mixing and convergence properties than sampling the variables (e.g., x and t ) sequentially from their conditional distributions. Sampling the joint distribution is achieved by first marginalising some variables which are then sampled from their full conditional distribution ( Liu, 1994;Van Dyk and Park, 2008 ). Precisely, we propose to sample sequentially the elements of and using moves that are summarised in Algorithm 1 .

Algorithm 1 Partially Collapsed Gibbs Sampling Algorithm For
Bacteria Detection -Version I. 1: Fixed input parameters : Number of burn-in iterations N bi , total number of iterations N MC 2: (k ) 4: Set k = k + 1 .
In Algorithm 1 , t 0 denotes the elements of t whose corresponding labels in z are null. Similarly, t 1 denotes the elements of t whose labels are equal to 1. We now detail each sampling step of Algorithm 1 as follow: Sampling (s 2 , t 0 ) | (y , \ t 0 , \ s 2 ) : As mentioned earlier, due to the sparsity of the outlier label vector z , we propose to sample simultaneously (s 2 , t 0 ) | (y , \ t 0 , \ s 2 ) , which improves the mixing and convergence properties of the Markov chains, rather than considering a Gibbs sampler to sample from s 2 | (y , , \ s 2 ) and t 0 | (y , \ t 0 , ) . This is done by using In other words, sampling from (s 2 , t 0 ) | (y , \ t 0 , \ s 2 ) can be achieved by sampling sequentially s 2 from f s 2 | y , \ t 0 , \ s 2 , and then t 0 from f t 0 | y , \ t 0 , . In order to compute f s 2 | y , , \ s 2 in Eq. (16) , we keep only the parameters that depend on s 2 in the joint posterior distribution in Eq. (13) , this leads to Due to the conjugacy of the distributions in Eq. (17) , it is easy to show that sampling from f s 2 | y , , \ s 2 can be achieved by sampling from the following inverse-Gamma distribution ( Gelman et al., 2013 ) However, marginalizing t 0 out the conditional distribution in Eq. (18) where N 1 = card (t 1 ) and I 1 = { n | z n = 1 } .
On the other hand, when z n = 0 , t n does not appear in Eq. (1) .
Thus sampling t 0 | (y , \ t 0 , ) reduces to sampling its elements independently from their Gaussian prior distribution defined in Eq. (8) .
In a similar fashion to the outlier variance, we consider a PCGS to sample simultaneously (μ t , t 0 ) | (y , \ t 0 , \ μ t ) . This is done by using It is easy to show that sampling μ t | (y , \ t 0 , \ μ t ) reduces to sampling from the following truncated Gaussian distribution where M = μs 2 + ξ 2 n ∈ I 1 t n Sampling from Eq. (21) can be achieved efficiently by using the method proposed in ( Mazet et al., 2005 ) while sampling t 0 | (y , \ t 0 , ) reduces to sampling from Gaussian distributions, as discussed above.
Sampling γ 2 | (y , , \ γ 2 ) : By cancelling out the terms that do not depend on γ 2 from the posterior in Eq. (13) , its conditional distribution reduces to the following inverse-gamma distribution which is easy to sample from.
Sampling ω|( y, , ࢨω ): Due to the conjugacy of the hierarchical prior model f ( z | ω ) f ( ω ), the full conditional distribution of ω reduces to the following beta distribution Sampling ( x, t )|( y, \ ( x,t ) , ): As mentioned earlier, we propose to sample simultaneously ( x, t )|( y, \ ( x,t ) , ), rather than considering a Gibbs sampler to sample from x |( y, \ x , ) and t |( y, \ t , ). This is done by using It is easy to show that f ( x | y, \ ( x,t ) ) is the following multivariate Gaussian distribution and Z = diag (z ) . The full conditional distribution of t , i.e., f ( t |( y, \ t , )) reduces to the following multivariate Gaussian distribution Note that is a diagonal covariance matrix, which is easy to construct.
Sampling ( z, t )|( y, \ ( z,t ) , ): Updating simultaneously ( z, t ) is achieved using It can be seen from Eq. (13) that where m ∈ {0, 1} and Consequently, the label z n can be drawn from its conditional distribution (where t n has been marginalised) by drawing randomly from {0, 1} with probabilities given by Moreover, the elements of z can be updated in a parallel manner . Sampling from f ( t | y, \ t , ) is then achieved using Eq. (28) .
Sampling σ 2 | (y , \ σ 2 , ) : In a similar fashion to the regularisation parameter γ 2 , the noise variance σ 2 can be sampled from the following inverse-gamma distribution Algorithm 2 is a detailed version of Algorithm 1 following the sampling steps explained above. Although Algorithm 2 seems more complex (more sequential steps) than Algorithm 1 , several steps can be omitted since some generated variables are not actually used. For instance, the variables generated in steps (a2) and (b2) are not used during the following steps and thus can be omitted. Similarly, sampling t |( y, \ t , ) in (e2) is omitted as this step is not required when sampling z |( y, \ ( z,t ) , ) in (f1). These simplifications result in the final algorithm shown in Algorithm 3 .
Algorithm 2 Partially Collapsed Gibbs Sampling Algorithm to Bacteria Detection for OEM -Version II.
(g) Sample σ 2 (k ) | y , x (k ) , t (k ) , z (k ) , (k ) from Eq. (33). 4: Set k = k + 1 . The algorithm is stopped after N MC iterations, including N bi burn-in iterations which correspond to the transient period of the sampler (determined visually from preliminary runs). The first N bi samples are discarded and the remaining samples are used to approximate the following estimators. The label vector z is estimated using marginal maximum a posteriori (MAP) estimation. This estimator is then used to compute the minimum mean square error (MMSE) of r conditioned on z = ˆ z MAP , i.e., ˆ r = ˆ r MMSE | ˆ z MAP ˆ z MAP . Finally, the remaining parameters are estimated using the empirical averages of the generated samples (MMSE estimates), for instance, the MMSE of the actual intensity vector x , denoted as ˆ x is given by

Data creation
In order to assess the performance of the proposed approach for outlier detection, two standard test images are used, namely the 'Elastin' (570 × 570 pixels) and the 'Coins' (492 × 600 pixels). Subsampled versions of these images are obtained by considering the sampling pattern of an actual OEM system ( Krstaji ć et al., 2016 ) as shown in Fig. 4 , which yield 4436 randomly subsampled pixels from the Elastin image (corresponding to 1.37%) and 4029 randomly subsampled pixels of the Coins image(corresponding to 1.36%). Fig. 5 (a) and (e) show the original test images, (b) and (f) show natural neighbour interpolation ( Sibson, 1981 ) of the randomly subsampled pixels of the images in (a) and (e) respectively, and (c) and (g) show examples of system output after applying the model in Eq. (1) . In this case, the noise variance is σ 2 = 10 , and 5% of outliers with mean (μ t = 128) , and variance (s 2 t = 10) are used. Note that the outliers and noise are added to the intensity field vector x before interpolation.

Evaluation criterion
To assess the outlier detection performance, considering the fibre cores that are neither known as outliers nor detected by the proposed algorithm as true negatives (TN) would result in a heavily unbalanced two-class problem. Hence, precision (also equivalent to positive predictive value) and recall (also equivalent to sensitivity or true positive rate) measures are considered, which are computed as follows where TP, FN, and FP refer to the number of true positives, false negatives, and false positives respectively. Given the locations of cores where an outlier has been detected and the reference outlier locations, we consider • any detection that matches reference outlier location as TP.
• any detection that does not match any of the reference outlier locations as FP. • any reference outlier location that does not match with any of the algorithm detections as FN.
Since each bacterium corresponds to a set of connected cores, each group of connected detections is counted as a single detection. These connected detections arise due to the spread of some bacteria fluorescence light over neighbouring cores, which decreases as distance from the central illuminated core increases. Once these connected detections are identified, they are replaced by a single detection at the mean of their locations, which gives the total number of detected outliers in the test image.

Performance analysis
In order to assess the performance of the proposed method, weak and strong outliers and noise are tested and the precision and recall measures are computed accordingly. We evaluated the algorithm by selecting σ 2 ∈ {0, 1, 10}, μ t ∈ {0, 128, 255}, s 2 t ∈ { 10 , 100 , 10 0 0 0 } and outlier percentage of 1%, 5% and 10%. We show the results for σ 2 = 10 , and outlier percentage of 5%. Results for outlier percentages of 1% and 10% are not shown since the trends in behaviour are the same as with 5%. The precision and recall measures of the tested scenarios are represented by the empirical precision-recall graph shown in Fig. 6 for the standard Elastin image. We can observe a satisfactory detection accuracy for all of the tests except for the cases where outlier mean and/or variance is near to noise variance, in which the outliers are of low amplitude, and hence could not be detected. This should not be a problem as in most OEM images, including the one investigated in this work, outlier mean and variance are positive and much larger than that of the noise. Similar trends are observed for the standard Coins image and hence they are not presented here. Fig. 5

(d) and
(h) show examples of detections when σ 2 = 10 , outlier presence probability of 5% and outlier mean and variance of μ t = 128 and s 2 t = 10 respectively, for the Elastin and Coins image respectively.

Comparison with existing approaches
In this subsection, we compare the proposed approach with four existing spot-detection methods, namely the sparse coding approach proposed in , the Laplacian of Gaussian (LoG), its approximation; the difference of Gaussians (DoG) filters ( Lindeberg, 1998;He et al., 2010 ), and the grey scale opening top-hat filter (GSOTH) ( Kimori et al., 2010;Bright and Steel, 1987 ) which are all unsupervised detection methods. The sparse coding approach was used for bacterial detection to the same datasets processed in this work . This approach splits each image into a set of overlapping patches (each of size 27 × 27 with 50% overlap in this case) and assumes that observed intensities are linear combinations of the actual intensity values associated with background image structures, corrupted by additive Gaussian noise and potentially by a sparse outlier term modelling bacteria. The actual intensity term representing background structures is modelled as a linear combination of a few atoms drawn from a dictionary which is learned from bacteria-free data and then fixed while analyzing new images. The bacteria detection task is formulated as a minimization problem and an alternating direction method of multipliers (ADMM) is used to estimate the unknown parameters. Different number of dictionary atoms including 50, 100 and 150, and different outlier regularization parameter values ranging between 0.01 and 15 are tested. Thus, we provide here the best results obtained. On the other hand, the LoG filter is implemented by employing a 5 × 5 kernel of standard deviation of 0.8 to each frame. Similarly, the DoG filter is implemented by considering the difference of two 5 × 5 Gaussian kernels of standard deviations of 0.5 and 0.8 respectively. The GSOTH is employed by first smoothing the input image by a Gaussian kernel to reduce the noise, then by computing the morphological opening of the input image by employing a 3 × 3 flat disc, which achieves the best detection results and then subtracts the result from the original image. Each of these four methods provides an outlier amplitude image that needs to be thresholded in order to determine bacterial locations. Different cut-off thresholds are tested and the same post processing steps described earlier (pixel grouping and computation of the barycenters) are also employed. The precision-recall curves are then constructed accordingly and the area under curves (AUC) are computed. The comparison with the proposed approach in this work is conducted in terms of precision-recall measures, as well as in terms of the computation time.

Bacteria detection performance
Due to the absence of AUC measures for the proposed approach, since it can provide instant detections that do not require to be thresholded to get outlier locations, we first provide a comparison between the four existing methods described above in terms of AUC measures. We then compare the proposed approach with the existing method that provides the highest AUC. Table 1 presents AUC measures of the four existing methods using different outlier means and variances. It is clear that the sparse-coding approach outperforms the rest of the three existing methods and that the LoG method provides the second best. Fig. 7 illustrates an empirical plot of the precision-recall measures of the proposed approach and the precision-recall curves of the sparse coding approach which provides the highest AUC measures. It is clear that the Bayesian approach outperforms the rest of the methods. Moreover, it is fully automatic as the user is not required to tune any Table 1 Area under curve measures of the resulting precision-recall curves of the four existing methods. Bold (resp. underlined) represent best (resp. second best) results.  crucial hyperparameters that would greatly affect the detection performance. Table 2 provides the average computation time of the five methods. For the sparse coding approach, the resulting number of test patches is 1521 yielding Y ∈ R 729 ×1521 , and the dictionary tested is D ∈ R 729 ×150 . The experiments were conducted on ACER core-i3-2.0 GHz processor laptop with 8 GB RAM. As the proposed approach can provide results from just a single run, the rest of the methods require thresholding the outlier amplitude image, which is difficult to convert into a precise computation time. However, the computation time of the sparse coding approach corresponds to the duration of five runs of different outlier regularization parameter values, which are used to select the best regularization parameter among the five values in terms of detection performance. Although the Bayesian approach proposed provides the highest computation time, it crucially brings the benefit of providing higher detection performance with respect to the other four methods, and being fully automatic, as there is no need to either set any regularization parameters nor threshold the resulting outlier amplitude image in order to identify outlier locations.

Datasets
The proposed algorithm was assessed using two datasets of ex vivo ventilated whole ovine lungs instilled with bacteria. Dataset I contains seven videos assessing a combination of fluorescent dyes and bacterial types, including control segments. It contains (i) three videos of ovine lungs instilled with Methicillin-sensitive Staphylococcus aureus ( MSSA ) stained with a commercially available laboratory dye (PKH67, Sigma-Aldrich), a highly fluorescent cell membrane dye, (ii) two videos of ovine lungs instilled with bacteria (gram-positive MSSA and gram-negative Pseudomonas PA3284) stained in situ with an in-house bacterial detection SmartProbe , and (iii) two videos of ovine lungs without the presence of any bacteria. Videos 1-5 were instilled with a single concentration of bacteria, equivalent to Optical Density (OD595nm) of 2.
Dataset II contains four videos, each with an increasing bacterial concentration (OD595 nm 0.004,0.04,0.4,4) all labelled with an in-house bacterial detection SmartProbe. Tables 3 and 4 summarise the details of Datasets I and II respectively.
The Cellvizio OEM imaging platform along with a 1.4mm, 12,105 core fibre bundle (Mauna Kea Technologies, Paris, France) was used to acquire all data in this study. Images sequences of size 274 × 384 pixels (306 μm × 429 μm) were captured at 12 frames per second. Furthermore, the fibre core locations were known from factory pre-calibration to sub-pixel accuracy in this image grid. Finally, prior to any frame selections, manual annotations and automatic processing, the uninformative frames were automatically removed ( Perperidis et al., 2017a ) from each video, ensuring only clinically relevant frames ( ∼ 500 in each video) were considered for this work.
Random frames that are representative of each of the entire video sequences are chosen from each of the eight videos of Dataset I and the four videos of Dataset II by a trained clinician. Those comprises 133 image frames for Dataset I, and 58 frames for Dataset II as described in Tables 3 and 4 . In each frame, a trained clinician marked the co-ordinates of phenomena that are thought to be bacteria with high confidence. Ambiguous points are ignored. Due to the size of the imaged bacteria ( < 3 μm), a bright fluorescent 'dot' is identified as bacterium if one or two cores present higher intensities than the surrounding area. The location of the annotation is rounded to the closest core location. The fibre core locations are provided by the imaging system with sub-pixel accuracy, so the fibre core centre coordinates were also rounded to the nearest pixel.
The increasing bacteria concentration dataset is considered to compare the annotations performed by the clinician and the results of the algorithm.

Algorithm evaluation
The algorithm was run for each of the annotated frames, yielding sets of detected bacteria, i.e potential labelled bacteria. The same post processing steps described earlier (connected detections grouping and barycentre computation) are applied, which gives the total number of detected bacteria in each frame. Due to the subjectivity of the clinician's annotations, it cannot be considered as ground truth, therefore, in order to evaluate the performance of the proposed approach, we consider only count-annotation criterion by performing statistical comparison of bacterial counts performed by the trained clinician and the algorithm output.

Count-annotation effect
For Dataset I, the algorithm counts are compared with the clinician counts in each frame as shown in Fig. 8 . We can observe an Table 2 Average computation time (in seconds) for the proposed method and four existing spot-detection methods. In order to maintain a fair comparison between the five algorithms, the computation time of the sparse coding approach corresponds to the duration of five runs (used to select the best regularization parameter among the five values). Moreover, all of these methods apart from the Bayesian model requires manual thresholding of the outlier amplitude image in order to identify bacteria locations, but this can not be easily converted into precise computation time. Bold and underlined values are fastest and second fastest computation times respectively.

Hierarchical Bayesian model
Sparse-coding approach LoG DoG GSOTH 10.9 5 x 1.98 = 9.9 0.6 0.13 0.31 almost linear relationship between the clinician counts and algorithm counts. Indeed the empirical correlation between the manually and automatically detected anomalies is 0.912. Moreover, for videos 1, 2 and 3 in which a highly fluorescent SmartProbe is used, and videos 4 and 5 in which an in-house SmartProbe which produces weaker fluorescence signal is used, a similar trend is observed between the clinician and the algorithm. This also applies to the type of bacteria the samples are labelled with. Videos 6 and 7 which are controls, demonstrate minimal counts using both the clinician and the algorithm, which reflects the ability of the algorithm to differentiate bacterial loads from control. For Dataset II, Fig. 9 shows a plot of the clinician's bacteria counts versus those of the algorithm, we observe that each concentration occupies a broad count range in terms of both counts from the opinion of the clinician and algorithm output. Moreover, Fig. 10 depicts plots of the total bacteria counts performed by the clinician and that by the algorithm versus different bacterial concentrations. We can observe that both of the clinician and the algorithm counts increase with the bacteria concentration which reflects the agreement between the approach considered and the clinician's annotations.
In the two datasets, we can observe that the algorithm counts are higher than that of the clinician, as we expect the algorithm to be able to identify dots that are hard to be seen by the naked eye. Moreover, the clinician did not annotate ambiguous dots in a number of cases. This, along with false positives is most likely the main reasons why the algorithm counts are higher than the clinician counts. Fig. 11 shows examples of detections in two frames from Dataset I using the proposed approach. Fig. 11 (a) and (c) show the original frames and (b) and (d) show the algorithm detections and the clinician's annotations superimposed. We observe the accuracy of the algorithm as almost all of the cores annotated by the clinician are covered by the algorithm's detections.

Comparison with existing approaches
Since the sparse coding approach described earlier provides the best results in comparison with three existing methods , in this section, we compare its performance with the proposed approach in terms of count-annotation effect. Fig. 12 depicts the proposed algorithm's counts versus the clinician's counts in (a) and that of the sparse coding approach at cut-off thresholds of 0.07 and 0.05 in (b) and (c) respectively, all for Dataset I. The empirical correlation coefficients of the three cases are 0.91, 0.82 and 0.65 respectively. It can be seen that the proposed approach provides the highest correlation between algorithm's counts and clinician's counts. Moreover, the user would need to carefully tune the cut-off threshold of the sparse coding approach to obtain reasonable correlation. It can also be observed that the proposed approach provides minimal counts for the control cases (videos 6 and 7). Although increasing the cut-off threshold will result in fewer counts, these counts are not necessarily true positives (this can be observed in Fig. 7 , which indicates that having high precision might result in having small recalls).
In a similar fashion, Fig. 13 shows the mean number of detections per selected frames in videos 1-4 of Dataset II and the corresponding standard deviation for counts derived from the clinician's opinion in (a), proposed method in (b) and sparse coding approach with different cut-off thresholds in (c). The two methods are in agreement with the increasing concentration behaviour. Although the sparse coding approach provides smaller counts compared to the proposed approach, this comes to the cost of successful tuning of cut-off threshold to provide reasonable precision-recall measures.

Limitations
Although the benefits of the proposed approach in detecting outliers are clear, a few limitations remain. The first is related to the noise model considered in this work. Due to the lack of background images from the system that the data was collected with, an exact noise model could not be inferred, hence a Gaussian noise model was assumed due to its simplicity. Therefore, validation of this noise model is still required. The second is related to the lack of ground truth in the real datasets investigated in this work due to the difficulty in annotating the bacteria. Although these datasets are annotated, the annotations cannot be considered as ground truth. While annotating, it is likely that the annotator, i.e. the clinician, will makes errors: they will either falsely annotate a bacterium when it is noise, or simply miss-annotate a bacterium due to their overwhelming numbers in each frame, since our target objects are 'dots' with similar structure. These types of error are common in any annotation process. We note that in clinical context no ground truth will exist and it is a topic of further work to explore how the annotations from an automatic method would be used to assist a clinician.

Conclusions
In this paper, we illustrated the performance of a Bayesian algorithm for bacterial detection in OEM images of distal lung tissue. Prior distributions were chosen for the unknown model parameters and their corresponding hyperparameters. An MCMC method based on a partially collapsed Gibbs sampler was proposed to sample from the resulting posterior distribution in order to estimate the unknown model parameters. The algorithm was validated using synthetic dataset for which it showed good detection performance. Subsequent analysis conducted using two ovine lung datasets, including a combination of fluorescent dyes, bacterial species and bacterial loads, demonstrated that the estimated bacterial count correlates with the bacterial counts performed by a clinician. The proposed algorithm is fully automatic in the sense that it does not require the user to set sensitive/crucial parameters as all model parameters are estimated within the MCMC algorithm. Comparisons with existing approaches showed superior performance of the proposed approach. Although the proposed approach is simulationbased for which a sophisticated MCMC method has been considered, it provided a competitive computation time compared to fast optimization methods, with the benefit of being fully automatic. Future work includes bacterial tracking by taking advantage of the temporal information in the datasets. Multiple concurrent imaging wavelengths can also be tested to extend to detection in multispectral datasets by using appropriate spectral unmixing algorithms.

Declaration of Competing Interest
This manuscript is an original work and therefore it has never been published previously and it is not under consideration for publication in another journal. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property. We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office). He is responsible for communicating with the other authors about progress, submissions of revisions and final approval of proofs. We confirm that we have provided a current, correct email address which is accessible by the Corresponding Author and which has been configured to accept email from (AK577@hw.ac.uk).