Automated Detection of Uninformative Frames in Pulmonary Optical Endomicroscopy

Significance: Optical endomicroscopy (OEM) is a novel real-time imaging technology that provides endoscopic images at a microscopic level. The nature of OEM data, as acquired in clinical use, gives rise to uninformative frames (i.e., pure-noise and motion-artefact frames). Uninformative frames can comprise a considerable proportion (more than 25%) of a dataset, increasing the resources required for analyzing the data (both manually and automatically), as well as diluting the results of any automated quantification analysis. Objective: There is, therefore, a need to automatically detect and remove as many of these uninformative frames as possible while keeping frames with structural information intact. Methods: This paper employs Gray Level Cooccurrence Matrix texture measures and detection theory to identify and remove such frames. The detection of pure-noise and motion-artefact frames is treated as two independent problems. Results: Pulmonary OEM frame sequences of the distal lung are employed for the development and assessment of the approach. The proposed approach identifies and removes uninformative frames with a sensitivity of 93% and a specificity of 92.6%. Conclusion: The detection algorithm is accurate and robust in pulmonary OEM frame sequences. Subject to appropriate model refinement, the algorithms can become applicable in other organs.

as probe-based confocal laser endomicroscopy, is the most widely used platform and the only fiber-based endomicroscopic methodology approved for clinical use. The technology employs a proximal laser scanning unit linked to an interface with a flexible multicore fiber. This fiber is passed through the working channel of endoscopes, enabling microscopic imaging at the distal end of the fiber. In pulmonary OEM, the abundance of elastin and collagen enables structural imaging through the generation of autofluorescence with a 488-nm laser excitation. The lateral diameter of the fiber used in lung applications is 1.4 mm. This miniaturization enables the exploration of the distal pulmonary tract [2] as well as the assessment of the respiratory bronchioles and alveolar gas exchanging units of the distal lung [3]. OEM has been used clinically in the lung for the detection of lung cancer [4], [5] and has been used to assess the distal lung [6], [7], including the imaging of parenchymal lung diseases [8]. Furthermore, OEM has been used in other organs such as the urological tract [9]. The largest OEM application remains the imaging of possible cancerous lesions in the gastrointestinal tract [10], [11]. The commercially available FCFM platform images at 12 frames/s, and clinical and preclinical OEM procedures often last minutes, generating thousands of frames; hence, their manual (postvivo) analysis is a very labor-intensive process.
The nature of OEM data acquisition results in image sequences that form a long continuous scene. Within these sequences, there are frames that contain only pure noise [see Fig. 1(a)], mostly due to the lack of contact of the fiber with a fluorescent target or due to biofouling of the tip of the fiber. Similarly, there are frame sequences where the spatial movement is very large when compared to the temporal rate of acquisition. This results in motion artefacts, expressed as either deformed anatomical structures [see Fig. 2(a)-(c)] or spatial discontinuity between temporally adjacent frames [see Fig. 2(d)-(f)]. Such frames contain little information of value and are, therefore, referred to as "uninformative frames." Uninformative frames comprise a substantial proportion of the dataset, depending on the motion of the imaging target as well as on the operator manipulating the fiber. In pulmonary OEM, significant movement artefacts occur due to the movement of the fiber in the distal lung, caused both by the respiratory effort of the patient and by the fiber traversing bronchopulmonary segments of the lung. In our experience with lung OEM data, uninformative frames may comprise in excess of 25% of the acquired frames. The presence of uninformative frames: 1) prolongs the offline manual assessment of the data, 2) increases the computational resources required, and 3) dilutes the results of any postprocessing algorithm intended to analyze and quantify the images. There is, therefore, a need for an automated approach to accurately and robustly detect and remove such uninformative frames as the first step of any automated or manual image analysis.
There has been considerable research in the suppression of noise [12] as well as in the detection of motion artefacts for a range of imaging modalities, including, but not limited to, aerial images [13], microscopy [14], medical images [15], as well as other digital photography images [16]-[18]. However, most such studies focus on the detection of motion-blurred regions within a frame with the intention to compensate for them through some image enhancement algorithm. Such techniques, while potentially very effective for their specific application, cannot be easily employed to detect uninformative frames in OEM data. Analyzing the spatiotemporal characteristics of the sequences is required. A large number of studies performing such analysis of the spatiotemporal characteristics of video sequences concentrate on detecting shot transitions and grouping frames into scenes [19]-[22]. Once again, this is not applicable to OEM data, where continuous acquisition results in a continuous imaging sequence with uninformative frames embedded within it. Other endoscopic imaging techniques can generate frame sequences analogous to OEM when navigating along the bronchus or the gastrointestinal tract. There is, therefore, considerable interest in the spatiotemporal analysis of endoscopic data, including, but not limited to, laparoscopy [23], colonoscopy [24], wireless capsule endoscopy [25]-[28], and larynx endoscopy [29]. The main focus of all these studies was the identification of one or more key frames within the main frame sequence to aid the diagnostic process or some further postprocessing technique. A recent study [30] has developed a fully automated approach for the selection of a representative frame from a short endomicroscopy frame sequence, enabling real-time quantitative image analysis at the point of care. The approach generated very promising results for short oral and esophageal image sequences. However, none of the aforementioned studies address the problem of identifying and removing uninformative frames from OEM frame sequences.
This paper presents a novel approach for detecting and "removing" uninformative frames from OEM frame sequences. The algorithm was developed and assessed on frame sequences from the distal lung of patients with suspected lung cancer. However, with the appropriate adjustments, the algorithm can potentially be effective in removing uninformative frames from sequences acquired on 1) other organ systems, such as the gastrointestinal tract and the urinary tract, as well as 2) any other fiber-based imaging platform. The rest of this paper is organized as follows. Section II describes the material (data) utilized in this study. Section III describes the detection algorithms for pure-noise and motion-artefact frames independently. Section IV describes the data analysis used to train and test the detection algorithm, and Section V displays the relevant results. Finally, the proposed methods and corresponding results are discussed in Section VI.

II. DATA
Eighty-three OEM image sequences of the distal lung were used during the development and testing of the proposed algorithm. All data were obtained as part of a database (of 126 subjects) during the routine care of patients undergoing investigation for an indeterminate pulmonary nodule (< 30 mm) at the Columbus Lung Institute, Indiana, USA. The study was approved by the Western Institutional Review Board. All procedures were undertaken by a single expert operator using standard bronchoscopy, with the aid of a superDimension Navigation System (Covidien Inc., MN, USA) and imaging with 488-nm Cellvizio using a 1.4-mm lateral diameter Alveoflex fiber (Mauna Kea Technologies, Paris, France). All image sequences were stored in the proprietary .mkt format and read as 16-bit binary files for processing in MATLAB (MathWorks, Inc., MA, USA). Some subjects (n = 43) were rejected due to 1) short duration of sequences (i.e., video < 10 frames), 2) corrupted data (i.e., file not readable, misaligned fiber, or out-of-focus images), or 3) lack of distal lung images (i.e., solely imaging the bronchus). No other subjective criteria (such as image quality) that could potentially bias the proposed algorithm were used during the video selection process.

III. METHODOLOGY
This section describes the methodology used to detect pure-noise and motion-artefact frames. These were handled as two independent problems, both utilizing image-derived texture metrics.
Let I(x, y, t) be a gray scale image sequence, with x ∈ [1, N], y ∈ [1, M], and t ∈ [1, K] indicating the pixel location (x-column and y-row) and the frame number, respectively. The Gray-Level Cooccurrence Matrix (GLCM) [31] $G_t$ for frame $I_t = I(x, y, t)_{(x,y) \in [1,N] \times [1,M]}$ was defined as
$$G_t(i, j) = \sum_{p=1}^{N} \sum_{q=1}^{M} \begin{cases} 1, & \text{if } I_t(p, q) = i \text{ and } I_t(p + \Delta x, q + \Delta y) = j \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
an $L \times L$ matrix, where L was the number of gray levels within the image (16 bit), i and j were intensity levels, p and q were the spatial positions in the image $I_t$, and Δx and Δy were the spatial offsets (in number of pixels) utilized to estimate the GLCM $G_t$. In order to achieve rotational invariance of the relevant texture measures, $G_t$ was estimated as the mean GLCM for four different offset pairs {(1, 0), (1, 1), (0, 1), (−1, 1)}, corresponding to a single pixel offset at directions (0°, 45°, 90°, 135°), and normalized as
$$G_t^{norm} = \frac{G_t}{n} \tag{2}$$
where n was the sum of all the elements of the matrix $G_t$. Related texture metrics $V_1(t)$ to $V_5(t)$ were derived from $G_t^{norm}$ as in [31], the last of which was the Maximum Probability
$$V_5(t) = 1 - \max_{i,j}(p_{ij}) \tag{7}$$
where t was the frame number and $p_{ij} = G_t^{norm}(i, j)$. In addition to the aforementioned GLCM properties, global image characteristics, such as frame intensity mean ($V_6(t)$) and standard deviation ($V_7(t)$), were also employed. All texture metrics were estimated in a way such that frames containing noise (or very faint features) demonstrated low (nearly zero) values [see Fig. 1(a) and (b)], while more pronounced features, such as elastin strands and blood vessels [see Fig. 1(c)-(f)] within the alveolar space, demonstrated higher (closer to 1) values. Since the GLCMs need to be estimated in rectangular regions only, the largest square region within the circular field of view (FOV) of the OEM frame sequences was used as I(x, y, t) throughout this study. The remaining four segments (each 9% of the overall circular FOV) were not included in I(x, y, t), and, consequently, in the GLCM estimation and the subsequent frame detection. This decision was based on the assumption that, if the central square region of a frame was identified as pure noise or a motion artefact, a small structure in any of the four excluded segments is not enough to reinstate the frame as an informative frame.
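The GLCM construction described above can be illustrated with a minimal sketch, assuming a toy image with only three gray levels (the study used 16-bit frames and MATLAB). The function names and the two small test images are hypothetical and serve only to show the mean over the four offsets, the normalization by the sum of the elements, and the Maximum Probability metric of (7).

```python
from itertools import product

# (dx, dy) offsets for the 0, 45, 90 and 135 degree directions, as in the text.
OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1)]


def glcm(image, levels, dx, dy):
    """Count co-occurrences of gray levels (i, j) at spatial offset (dx, dy)."""
    rows, cols = len(image), len(image[0])
    g = [[0] * levels for _ in range(levels)]
    for q, p in product(range(rows), range(cols)):  # q: row (y), p: column (x)
        p2, q2 = p + dx, q + dy
        if 0 <= p2 < cols and 0 <= q2 < rows:
            g[image[q][p]][image[q2][p2]] += 1
    return g


def mean_normalised_glcm(image, levels):
    """Average the GLCM over the four offsets, then scale it to sum to 1."""
    mats = [glcm(image, levels, dx, dy) for dx, dy in OFFSETS]
    mean = [[sum(m[i][j] for m in mats) / len(mats) for j in range(levels)]
            for i in range(levels)]
    n = sum(map(sum, mean))
    return [[v / n for v in row] for row in mean]


def maximum_probability_metric(p):
    """V5 as given in (7): 1 - max over (i, j) of p_ij."""
    return 1.0 - max(max(row) for row in p)


# Hypothetical toy inputs: a featureless patch and a patch with structure.
flat = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
textured = [[0, 1, 2], [2, 0, 1], [1, 2, 0]]
v5_flat = maximum_probability_metric(mean_normalised_glcm(flat, 3))
v5_textured = maximum_probability_metric(mean_normalised_glcm(textured, 3))
```

In this toy case the featureless patch concentrates all co-occurrences in a single GLCM cell, so its metric is zero, while the structured patch yields a larger value.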
Each texture vector $V_i(t)$, where i ∈ [1, 7] indexed the texture metric and t ∈ [1, K] represented the frame number, was normalized to the [0, 1] range. A 7-D feature space was, therefore, defined as the K × 7 matrix $X = (V_1, V_2, \ldots, V_7)$.

A. Detection of Pure-Noise Frames
1) Reducing Dimensionality: Principal component analysis (PCA) was employed to reduce the dimensionality of the 7-D feature space. The K × 7 matrix Y = (PC1, PC2, . . ., PC7) was defined as
$$Y^{\mathsf{T}} = A\,(X^{\mathsf{T}} - m_X \lambda) \tag{10}$$
incorporating the projection of the seven feature vectors (X) in the relevant principal component space. The 7 × 1 vector $m_X$ contained the mean value of each of the seven parameters and λ was a 1 × K vector of ones, while the rows of A were the eigenvectors (i.e., the directions of the principal components, normalized to unit length) of $C_X$, the 7 × 7 sample covariance matrix of X. Although the whole matrix Y could be used for the detection of pure-noise frames, the first principal component (PC1) was found to contain sufficient information for this purpose. As a consequence, only PC1 (i.e., a single parameter per frame) has been considered for pure-noise frame detection.
2) Gaussian Mixture (GM) Model: An experienced investigator performed a thorough visual inspection on a subset of the available OEM data, aiming to identify any image texture subgroups that can fairly represent the underlying anatomical information. The inspection of the OEM data highlighted four different texture categories (see Fig. 1): 1) pure-noise frames [see Fig. 1(a)], mostly containing no anatomical information, 2) subtle feature frames [see Fig. 1(b)], mostly containing linear bronchus strands or very low contrast elastin strands, 3) normal frames [see Fig. 1(c)-(d)], containing both pathological [see Fig. 1(c)] and healthy [see Fig. 1(d)] elastin strands, and 4) vibrant frames [see Fig. 1(e) and (f)], containing very well defined features, such as larger elastin strands and blood vessels. The boundaries of these four categories were not distinct.
Fig. 3(a) provides a representative histogram example H1 derived from the PC1 of a lung OEM image sequence. A GM model was employed to represent the underlying texture information contained in PC1. More precisely, following the four texture categories identified through the aforementioned manual visual inspection of the OEM data, the following GM model composed of 4 Gaussian distributions was considered
$$GM(u) = \sum_{i=1}^{4} P_i\, N_i(u; \mu_i, \sigma_i) \tag{12}$$
where the parameters $P_i$ provided the weight (also referred to as proportion) of each Gaussian distribution $N_i$ with mean $\mu_i$ ($\mu_1 < \mu_2 < \mu_3 < \mu_4$) and standard deviation $\sigma_i$.
The GM model likelihood (log-likelihood) was optimized using the iterative expectation-maximization algorithm [32], as performed by MATLAB's fitgmdist command. Fig. 3(a) overlays the mixture of 4 Gaussian distributions on the underlying histogram, with $N_1$ corresponding to pure noise and $N_2$ to $N_4$ corresponding to frames including anatomical features, from subtle to vibrant.

3) Model Simplification Using Metropolis-Hastings (MH) Method: It is difficult to derive the distribution of classical test statistics (and thus predict the detection performance) in the general case of mixtures of more than two distributions. This section presents a statistical method to split a set of random variables, independent and identically distributed (i.i.d.) according to a known mixture of Gaussians ($N_1$ to $N_4$), into two subsets, each containing variables distributed according to a mixture of a subset of the original Gaussians (e.g., $N_1$ and $N_2$). Such a split reduces the detection problem to a classical binary hypothesis test to decide between $N_1$ and $N_2$ (as will be shown in Section III-A4). The proposed approach can be seen as an MH algorithm [33], which is a Markov chain Monte-Carlo method typically used to generate random variables according to an arbitrary target distribution, i.e., distributions not handled by classical random number generators. The MH algorithm consists of generating random candidates according to a "proposal distribution" and accepting each candidate with a particular probability (the rejected candidates are either discarded or set apart). In our case, this accept/reject process ensured that the accepted samples were distributed according to the "target distribution" (14), defined as a mixture of $N_1$ and $N_2$, as the intention was to discriminate ($N_1$, $N_2$) from ($N_3$, $N_4$). Let u ∈ PC1 be the projection of an image feature vector (X) onto the first principal component, distributed according to (12). By considering (12) as the proposal distribution, the variables in PC1 as independent candidates, and (14) as the target distribution, the probability of accepting u was estimated by the ratio (15), whose terms were defined in (16). Note that if the variables u ∈ PC1 were actually i.i.d. variables following (12), the selected variables in $PC1_{sub}$ would be distributed according to (14). However, since the GM (12) was an approximation of the actual distribution of u ∈ PC1, the distribution (14) was, therefore, also an approximation of the distribution of u ∈ $PC1_{sub}$. Nevertheless, as suggested by the results in Section V, in practice, this approximation was accurate enough, leading to satisfactory results in terms of uninformative frame detection. Fig. 3 depicts a representative example of histograms of the variables in PC1, before and after the model simplification, along with the associated mixtures of 4 and 2 Gaussian distributions.
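The accept/reject split can be sketched as follows. The acceptance probability used here, the density of the $N_1$ and $N_2$ components divided by the density of the full 4-Gaussian mixture, is an assumption made for illustration and is not taken from the paper's equations (15) and (16). The mixture parameters and function names are likewise hypothetical.

```python
import random
from statistics import NormalDist


def mixture_pdf(u, components):
    """Density of a Gaussian mixture given (weight, mean, std) triples."""
    return sum(w * NormalDist(mu, sd).pdf(u) for w, mu, sd in components)


def acceptance_probability(u, components, keep=2):
    """Assumed ratio: first `keep` components over the full mixture."""
    return mixture_pdf(u, components[:keep]) / mixture_pdf(u, components)


def simplify(samples, components, keep=2, rng=None):
    """Accept each sample with the probability above; return (accepted, rejected)."""
    rng = rng or random.Random(0)
    accepted, rejected = [], []
    for u in samples:
        if rng.random() < acceptance_probability(u, components, keep):
            accepted.append(u)
        else:
            rejected.append(u)
    return accepted, rejected


# Hypothetical 4-Gaussian model (weight, mean, std), ordered by increasing mean.
GM4 = [(0.2, 0.10, 0.03), (0.3, 0.30, 0.06), (0.3, 0.55, 0.08), (0.2, 0.85, 0.05)]
rng = random.Random(1)
samples = [rng.gauss(mu, sd) for w, mu, sd in GM4 for _ in range(int(1000 * w))]
pc1_sub, rest = simplify(samples, GM4, rng=rng)
```

Under this assumed ratio, samples near the means of $N_1$ and $N_2$ are almost always retained in `pc1_sub`, whereas samples near the means of $N_3$ and $N_4$ are almost always set apart.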

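4) Detection: The null and alternative hypotheses were defined as
$$H_0 : \nu \sim N_2(\mu_2, \sigma_2), \qquad H_1 : \nu \sim N_1(\mu_1, \sigma_1) \tag{17}$$
with $N_1$ corresponding to the pure-noise frames and $\mu_1 < \mu_2$.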
The receiver operating characteristic (ROC) curve of the two-Gaussian model was estimated as the false positive rate (FPR) against the true positive rate (TPR)
$$FPR(\nu) = \Phi\!\left(\frac{\nu - \mu_2}{\sigma_2}\right), \qquad TPR(\nu) = \Phi\!\left(\frac{\nu - \mu_1}{\sigma_1}\right) \tag{18}$$
with ν ∈ PC1 and Φ the standard normal cumulative distribution function. A weighted version of Youden's index [34], [35] J was employed to derive the cut point on the ROC that provides the optimal tradeoff between TPR and FPR. Youden's index is often used in conjunction with ROC analysis as a measure of overall diagnostic effectiveness. Youden's index represents the point along the ROC curve with maximum vertical distance from the first bisector [34]. Unlike the area under the curve (AUC), Youden's index can be used as an optimal cut-off point (threshold), being the point in the ROC curve furthest away from the chance line. In order to avoid threshold bias toward the largest population (negative frames in this case), a weighted Youden's index J was defined as [35]:
$$J = \max_{\nu}\,\{TPR(\nu) + r \cdot TNR(\nu) - 1\} \tag{19}$$
with true negative rate (specificity) TNR = 1 − FPR and weighting factor $r = (1 - \pi)/(\alpha\pi)$. Moreover, α denoted the relative loss (cost) of a false negative classification, while π represented the proportion of positive (pure-noise) frames within the frame sequence. For the proposed application, since no critical decision was being made by the proposed detection algorithm, the relative cost α was set to 1.
The optimal cut-point $\nu_J$ was then employed to derive the desired (optimal) false positive rate $FPR_J = FPR(\nu_J)$. Finally, the quantile function $\Phi^{-1}(p)$ was used to estimate the threshold $T_f = \mu_2 - \Phi^{-1}(p)\,\sigma_2$ differentiating noise from normal frames. More precisely
$$\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1) \tag{20}$$
where $\operatorname{erf}^{-1}$ was the inverse error function and p represented $TNR_J = 1 - FPR_J$. Hence, the set of pure-noise frames was
$$\{\,t \in [1, K] : PC1(t) < T_f\,\}.$$
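A minimal sketch of the threshold selection follows, assuming the two-Gaussian model and a weighted Youden's index of the form TPR + r·TNR − 1 with r = (1 − π)/(απ). The grid search, the function name, and the model parameters are hypothetical choices made for illustration.

```python
from statistics import NormalDist


def optimal_threshold(mu1, sd1, mu2, sd2, prevalence, alpha=1.0, steps=2000):
    """Scan cut points and keep the one maximising TPR + r*TNR - 1."""
    noise, informative = NormalDist(mu1, sd1), NormalDist(mu2, sd2)
    r = (1.0 - prevalence) / (alpha * prevalence)
    lo, hi = mu1 - 4 * sd1, mu2 + 4 * sd2
    best_j, best_cut = float("-inf"), lo
    for k in range(steps + 1):
        cut = lo + (hi - lo) * k / steps
        tpr = noise.cdf(cut)               # noise frames falling below the cut
        tnr = 1.0 - informative.cdf(cut)   # informative frames above the cut
        j = tpr + r * tnr - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    tnr_j = 1.0 - informative.cdf(best_cut)
    # T_f = mu2 - Phi^{-1}(TNR_J) * sigma2, as stated in the text.
    t_f = mu2 - NormalDist().inv_cdf(tnr_j) * sd2
    return t_f, noise.cdf(t_f), tnr_j


# Hypothetical model: N1 (noise) and N2 (subtle features), 20% noise frames.
t_f, tpr, tnr = optimal_threshold(0.10, 0.03, 0.30, 0.06, prevalence=0.2)
```

Frames whose PC1 value falls below `t_f` would then be labeled as pure noise. Because negative frames dominate in this hypothetical setting, the weight r is greater than 1 and the cut point favors specificity.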

B. Detection of Motion Artefacts
Instead of the direct texture values, the frame-by-frame texture variability, $X'(t) = X(t) - X(t - 1)$, with t ∈ [2, K] indicating the frame number, was used to detect motion artefacts. PCA, as described in (10), was then employed to reduce the dimensionality of the feature space. The first two principal components of $X'$, PC1 and PC2, were found to contain the information relevant for the distinction of motion artefacts.
Visual inspection of the data highlighted four different types of frame-by-frame motion (see Fig. 2), namely 1) motion artefact frames, where a large movement resulted in tissue deformation and spatially discontinuous frame sequences [see Fig. 2(a)-(c) and (d)-(f)], 2) large movement frames, in which the movements, while large, still resulted in spatially continuous frame sequences [see Fig. 2(g)-(i)], 3) normal frames with moderate movement [see Fig. 2(j)-(l)], and 4) nearly static frames, with negligible movements. As in the case of the pure-noise frames, the boundaries of these cases were not well defined. To represent this underlying texture-difference information contained in PC1 and PC2, and taking into consideration the four frame-by-frame motions identified through the aforementioned manual visual inspection of the OEM data, two 4-GM models ($GM = \sum_{i=1}^{4} P_i N_i(\mu_i, \sigma_i)$) were employed. In both cases (PC1 and PC2), the Gaussian distributions demonstrated zero mean and decreasing standard deviation ($\sigma_1 > \sigma_2 > \sigma_3 > \sigma_4$). Fig. 4 provides representative histogram examples derived from the PC1 and PC2 of a lung OEM image sequence along with the corresponding 4-Gaussian models, with $N_1$ corresponding to motion artefacts and $N_2$ to $N_4$ corresponding to frames with large to negligible movements. In a similar fashion to the noise case, the detection problem was simplified (see Fig. 4) by removing the two distributions with the smallest standard deviations (normal and nearly static frames) as described in (14) to (16).
1) Detection: The null and alternative hypotheses were defined as
$$H_0 : x \sim N_2(0, \sigma_2), \qquad H_1 : x \sim N_1(0, \sigma_1)$$
with $N_1$ corresponding to the motion artefact frames and $\sigma_1 > \sigma_2$. According to the Neyman-Pearson Lemma [36], the likelihood ratio test rejecting $H_0$ in favor of $H_1$ when
$$\Lambda(x) = \frac{p(x \mid H_1)}{p(x \mid H_0)} \geq k$$
where $P(\Lambda(x) \geq k \mid H_0) = \alpha$, provides the most powerful test at significance level α for a threshold k. By employing Bayes' theorem and taking the logarithm of the likelihood ratio, the test was simplified to a comparison of $x^2$ against a threshold, since for zero-mean Gaussians with $\sigma_1 > \sigma_2$ the ratio $\Lambda(x)$ increases monotonically with $x^2$. Under $H_0$, $x^2/\sigma_2^2$ follows the chi-squared distribution $\chi^2$ with one degree of freedom. As a result, for a given FPR, the threshold on $x^2$ was obtained through F, the chi-squared cumulative distribution function, with β = 1 degrees of freedom.
The upper and lower thresholds denoting motion artefacts were, therefore, symmetric about zero, and the set of motion-blur frames comprised the frames whose coefficient lay outside them. Similar to the noise case, the optimal false positive (alarm) rate ($FPR_J$) was estimated using the ROC curve and the relevant Youden's index, as described in (18) to (20).
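The thresholding step can be sketched as below, assuming zero-mean Gaussians and no prior weighting, so that the test reduces to comparing $x^2/\sigma_2^2$ against a chi-squared quantile with one degree of freedom. That quantile is computed here from the standard normal quantile, using the fact that the square of a standard normal variable is chi-squared with one degree of freedom. Function names and numerical values are hypothetical.

```python
from statistics import NormalDist


def chi2_1dof_quantile(p):
    """Inverse CDF of the chi-squared distribution with one degree of freedom."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2


def motion_thresholds(sigma2, fpr):
    """Symmetric thresholds (-T, +T) such that P(|x| > T | H0) = fpr."""
    t = sigma2 * chi2_1dof_quantile(1.0 - fpr) ** 0.5
    return -t, t


def motion_frames(coefficients, sigma2, fpr):
    """Indices (0-based) of frames whose coefficient lies outside the thresholds."""
    lower, upper = motion_thresholds(sigma2, fpr)
    return [t for t, x in enumerate(coefficients) if x < lower or x > upper]


# Hypothetical values: sigma_2 = 0.1 for the "large movement" class, FPR = 5%.
lower, upper = motion_thresholds(0.1, 0.05)
flagged = motion_frames([0.0, 0.05, -0.3, 0.25, 0.1], 0.1, 0.05)
```

With these illustrative numbers the thresholds are approximately ±0.196, so only the two frames with coefficients −0.3 and 0.25 are flagged as motion artefacts.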

IV. DATA ANALYSIS
Of the available 83 OEM frame sequences, 11 datasets were selected as a testing set. Selection criteria included type of diagnosis, video duration, and quality of acquired images (i.e., noise, contrast, and artefact levels). The remaining datasets were used as the training set. In order to minimize a potential selection bias, it was ensured that representative frame sequences were included in both training and testing sets. Tables I and II summarize the key characteristics of the training and testing sets. The training set was employed 1) to create a statistical model (i.e., GM model) that describes well the underlying texture information, and 2) to extract a detection threshold that achieves an optimal tradeoff between TPR and FPR (employing Youden's index). The relevant noise and motion artefact thresholds were, therefore, estimated using the training set, employing no prior knowledge about the testing set. The testing set was then projected into the training set's principal component space and the threshold was applied to the relevant projection. If the assumptions used to build the statistical model are correct and the resulting GM model is representative of the underlying data, the threshold, when applied to the previously unseen testing set, will produce results (sensitivity and specificity) that match the expected theoretical values (TPR and FPR derived from the training set).

A. Manual Data Analysis
One investigator, with substantial prior experience in OEM image sequences of the distal lung, annotated each individual frame in the testing set as normal or pure noise. Furthermore, due to the more subjective nature of what is considered a motion artefact, two investigators independently annotated each individual frame in the testing set as normal or motion artefact. The annotation instructions stated that a frame was considered a noise frame if no anatomical information was present within the frame. A frame was considered a motion artefact if there was 1) spatial deformation of the imaged structures due to the high motion levels compared to the acquisition speed, and/or 2) no spatial continuity between temporally adjacent frames. Characteristic examples of normal, noise, and motion-artefact frames are provided in Figs. 1 and 2.
Table III lists the number of frames annotated by each operator as motion artefacts, the Union and Intersection of the two sets, as well as the corresponding Jaccard index [37]. The Jaccard index provides a statistic for comparing the agreement between two finite sample sets, and is defined as the size of the intersection divided by the size of the union of the compared sets
$$J(A_1, A_2) = \frac{|A_1 \cap A_2|}{|A_1 \cup A_2|}.$$
In order to reduce the interobserver variability (bias of the manual data annotation), a frame was assigned the uninformative label only if both investigators had annotated it as such. Otherwise, if one of the investigators considered that there was valuable information within the frame in question and labeled it as normal, the frame was considered normal. The resulting binary annotations (summarized in Table IV) were utilized as the gold standard for the subsequent evaluation of the proposed detection algorithms.
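The agreement measure and the consensus rule can be illustrated with a short sketch. The frame indices below are hypothetical and do not correspond to the annotations summarized in Table III.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consensus(a, b):
    """Frames labeled uninformative by both annotators (the gold standard rule)."""
    return sorted(set(a) & set(b))


# Hypothetical motion-artefact annotations (frame indices) from two operators.
annotator_1 = [3, 4, 5, 6, 10, 11]
annotator_2 = [4, 5, 6, 7, 11]
agreement = jaccard_index(annotator_1, annotator_2)
gold_standard = consensus(annotator_1, annotator_2)
```

In this toy example the operators disagree mainly at the start and end of a run of frames, which lowers the index to 4/7 even though the core of each run is shared.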

B. Assessing Proposed Model Fit
A Kolmogorov-Smirnov (KS) test [38], [39] was employed to assess the goodness-of-fit of the actual data to the proposed GM model. More precisely, the KS statistic was estimated as
$$D_{K,K'} = \sup_{\psi}\,\left|F_{1,K}(\psi) - F_{2,K'}(\psi)\right|$$
where $F_{1,K}(\psi)$ and $F_{2,K'}(\psi)$ were the empirical distribution functions (EDFs) of the actual data and mixture model, respectively (i.e., $F_{1,K}(\psi)$ was the proportion of actual data ≤ ψ and $F_{2,K'}(\psi)$ was the proportion of the mixture model ≤ ψ). Furthermore, K and K′ were their respective sizes (in number of frames). Under the null hypothesis, both the actual data and the relevant mixture model came from the same distribution. For a given significance level α = 0.05, the null hypothesis was rejected if
$$D_{K,K'} > c(\alpha)\sqrt{\frac{K + K'}{K K'}}$$
where c(α) = 1.36 for significance level α = 0.05, as provided in the relevant critical value table in [39].
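The two-sample KS test described above can be sketched with the standard library alone. The function names and the two synthetic samples are hypothetical; in practice the second sample would be drawn from the fitted mixture model.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(a, b):
    """Largest absolute difference between the two empirical distribution functions."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_reject(a, b, c_alpha=1.36):
    """Reject the null hypothesis at alpha = 0.05 (c_alpha = 1.36)."""
    k1, k2 = len(a), len(b)
    return ks_statistic(a, b) > c_alpha * sqrt((k1 + k2) / (k1 * k2))


# Hypothetical samples: identical data versus completely separated data.
data = list(range(100))
shifted = [x + 100 for x in data]
same_d = ks_statistic(data, data)
shifted_d = ks_statistic(data, shifted)
```

Evaluating the EDF difference only at the observed values is sufficient, because both EDFs are step functions that change only at those points.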

C. Training-Set Size Selection
The overall training set S consisted of 72 datasets and >48 000 frames containing a representative selection of frames. An optimal training set size would provide robust detection thresholds for uninformative frames, while keeping computational requirements (relative to the size) to a minimum. A line plot of set size against threshold robustness was employed to identify such a sufficient training set size. More precisely
$$\text{set size}(\delta) = \delta \times \text{step}$$
where $\delta \in \{1, 2, \ldots, 8\}$ and step = 6000, testing set sizes of up to 48 000 frames. Furthermore, the threshold robustness was quantified as $\mathrm{rsd}(A_\delta)$, the relative standard deviation of the set $A_\delta$, where $A_\delta = \{Thr_1(S(J_\delta)), \ldots, Thr_{10}(S(J_\delta))\}$ was a set of ten replicated estimates of the required threshold ($Thr_i$) for a given subset $S(J_\delta)$ of the training set S, and $J_\delta$ provided the uniformly distributed random indices of the subset of S (length of S = L).
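The robustness estimate can be sketched as follows. The threshold estimator used here (a simple lower decile) is a hypothetical stand-in for the GM fit and Youden's index described in Section III, and the synthetic data are illustrative only.

```python
import random
from statistics import mean, stdev


def relative_std(values):
    """Relative standard deviation, in percent."""
    return 100.0 * stdev(values) / abs(mean(values))


def threshold_robustness(data, estimate_threshold, set_size, replicates=10, seed=0):
    """RSD of thresholds estimated on `replicates` random subsets of `data`."""
    rng = random.Random(seed)
    thresholds = [
        estimate_threshold(rng.sample(data, set_size)) for _ in range(replicates)
    ]
    return relative_std(thresholds)


def lower_decile(sample):
    """Hypothetical stand-in for the model-based threshold estimation."""
    return sorted(sample)[len(sample) // 10]


rng = random.Random(42)
data = [rng.gauss(0.5, 0.1) for _ in range(6000)]
rsd_small = threshold_robustness(data, lower_decile, set_size=50)
rsd_large = threshold_robustness(data, lower_decile, set_size=5000)
```

Plotting the returned RSD against the subset size gives the kind of line plot described above, with larger subsets yielding more stable threshold estimates.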

D. Assessing the Performance of Detection
The effectiveness of the proposed approaches in detecting uninformative frames was assessed quantitatively by estimating their relevant sensitivity and specificity against the manual detection results (gold standard). The sensitivity and specificity levels were also compared against the relevant model-based ROC curves, assessing how representative the employed model and the associated assumptions were in detecting pure-noise and motion-artefact frames within previously unseen OEM frame sequences.
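For reference, sensitivity and specificity against a gold standard can be computed from two per-frame binary masks as sketched below. The masks are hypothetical, with True denoting an uninformative frame.

```python
def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) for two equally long boolean masks."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical masks: automatic detection versus manual annotation.
automatic = [True, True, False, False, True, False]
manual = [True, False, False, False, True, True]
sensitivity, specificity = sensitivity_specificity(automatic, manual)
```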

V. RESULTS

A. Assessing Proposed Model Fit
Numerous mixture models with an increasing number of Gaussian distributions were fitted to the original EDFs in order to verify that the proposed model provided an optimal representation of the underlying data. Table V summarizes the corresponding KS goodness-of-fit results. Table VI also compares the KS goodness-of-fit of the selected 4-Gaussian model to the corresponding 2-Gaussian model refinement (as described in Section III-A3 and Section III-B), while Fig. 5 illustrates the closeness of these models to the original EDFs (for both pure-noise and motion-artefact detection).

B. Training-Set Size Selection
Line plots were derived (as described in Section IV-C) illustrating the effect of increasing the size of the training set on the robustness (expressed as RSD) of the relevant threshold estimation. The process was repeated for PC1 in the detection of pure-noise frames, as well as PC1 and PC2 in the detection of motion artefacts. Fig. 6 contains the relevant plots.

C. Sensitivity Versus Specificity
ROC curves were derived from the proposed GM models for pure-noise and motion-artefact detection. Fig. 7 illustrates the relevant plots with their corresponding AUC provided in the title. If the models provided an accurate representation of the underlying data, the estimated specificity and sensitivity results from the previously unseen testing set should match the corresponding values at the optimal ROC cut-off point as calculated using Youden's index (Section III-A4). Table VII lists the sensitivity and specificity in pure-noise detection for each individual dataset as well as for the testing set as a whole.
Similarly, Table VIII lists the sensitivity and specificity in motion artefact detection using PC1 and PC2 individually. The model-based sensitivity and specificity estimates are provided in the relevant table titles. Due to the independent modeling and analysis of PC1 and PC2, no model-based estimates of sensitivity and specificity are provided for PC1 ∪ PC2. Finally, Table IX summarizes the sensitivity and specificity of the detection of uninformative frames (both pure-noise and motion-artefact frames) collectively. To emulate the decision process of a manual detection, sporadic (single, isolated) good frames amongst a sequence of uninformative frames were also removed.
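The combination of the two detections and the removal of sporadic good frames can be sketched as follows. Treating a good frame as sporadic when both of its temporal neighbors are uninformative is an interpretation adopted here for illustration. The masks and function names are hypothetical, with True denoting an uninformative frame.

```python
def combine(mask_a, mask_b):
    """Union of two per-frame detections (e.g., PC1 and PC2)."""
    return [a or b for a, b in zip(mask_a, mask_b)]


def absorb_isolated_good_frames(uninformative):
    """Relabel a single good frame flanked by uninformative frames on both sides."""
    out = list(uninformative)
    for t in range(1, len(out) - 1):
        if not uninformative[t] and uninformative[t - 1] and uninformative[t + 1]:
            out[t] = True
    return out


# Hypothetical detections for eight consecutive frames.
pc1_mask = [True, True, False, False, True, False, False, False]
pc2_mask = [False, False, False, True, False, False, False, True]
combined = combine(pc1_mask, pc2_mask)
cleaned = absorb_isolated_good_frames(combined)
```

In this example the single good frame at index 2 is absorbed into the surrounding uninformative run, while the two consecutive good frames at indices 5 and 6 are kept.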

VI. DISCUSSION
Thorough visual inspection of the available OEM data by an experienced investigator highlighted four different texture categories (see Figs. 1 and 3) and an equal number of frame-by-frame movement types (see Figs. 2 and 4) to be used for the detection of pure-noise and motion-artefact frames, respectively. As illustrated by Figs. 3-5 and verified by the corresponding KS goodness-of-fit results in Table V, in both cases, the 4-Gaussian models provide an optimal representation of the underlying information. Reducing the number of Gaussian distributions in the proposed model has a direct and substantial detrimental effect on the corresponding goodness-of-fit to the underlying data. On the other hand, increasing the number of Gaussians in the model to 5 (or more) does not necessarily improve the relevant goodness-of-fit. Further visual inspection of the available data indicates that, in the case of pure-noise frames, the challenge lies in the accurate and robust distinction between pure-noise [see Fig. 1(a)] and subtle feature frames [see Fig. 1(b)]. Similarly, in the case of motion artefacts, the challenge lies in the distinction between them [see Fig. 2(d)-(f)] and large (but continuous) movements [see Fig. 2(g)-(i)]. The relevant distribution overlaps in Figs. 3 and 4 verify this observation (largest overlaps between $N_1$ and $N_2$). By refining the GM model as described in Section III-A3, the detection problem is reduced to a classical binary hypothesis test deciding between $N_1$ and $N_2$. The close proximity of the refined model to the corresponding histograms (see Fig. 5 and Table VI), along with the subsequent promising detection results, suggests that the refined models provided a fair approximation of the distribution of the relevant PCA coefficients.
A large and diverse set of OEM images was employed to train the proposed algorithms for the detection of uninformative frames. As illustrated by Fig. 6, a training set of >30 000 frames is sufficiently large for a robust threshold estimation (RSD < 6%, with only a small drop for larger training sets) in both the pure-noise and motion-artefact cases. Section III-A4 employed a simple approach based on the model-based ROC curve and the corresponding weighted Youden's index to detect pure-noise frames. The ROC curve in Fig. 7(a), along with the corresponding AUC and predicted detection sensitivity of 98.8% and specificity of 97.7%, supports the decision to employ such a simple model. The decision is further backed by the encouraging detection results on the previously unseen testing set, yielding an overall sensitivity of 93% and specificity of 98.8%. Excluding the outlying dataset "benign 1" raises the overall sensitivity to 96.5%, with a specificity of 98.6%. The very promising detection results, along with their close agreement with the results predicted by the proposed GM model, highlight the reliability of the proposed detection approach and the limited scope for a more mathematically advanced solution.
The detection of motion artefacts was a more challenging and subjective task; hence the decision to have two operators manually annotate the relevant frames. The very modest agreement (Jaccard index: 0.58; see Table III) was mostly due to interobserver variation in the start and end frames of an uninformative frame sequence. There was rarely a disagreement over a full motion-blur artefact. Nevertheless, the limited agreement between the two manual annotations confirms the more challenging and subjective nature of the problem. The observation is further supported by the corresponding ROC curves [see Fig. 7(b) and (c)], with the optimal cut-off points (Youden's indices) yielding a sensitivity of less than 76%. Due to the more challenging and subjective nature of the problem, the Neyman-Pearson lemma was employed for the estimation of the detection threshold, providing the most powerful test at significance level α for a threshold k. PC1 yields better detection results, achieving a sensitivity of 74.9% and specificity of 94.3%, as opposed to PC2's sensitivity of 65.8% and specificity of 93.8%. As illustrated in Table VIII, the detection results for both PC1 and PC2 were in close agreement with the ones estimated by the proposed GM models. Combining the binary masks derived from each principal component can substantially increase the detection sensitivity to a promising 83.3% (from 74.9%), with a minimal effect on the corresponding detection specificity (dropping from 94.3% to 91.8%). When combined with the pure-noise detection, the proposed approaches reliably detect uninformative frames with a sensitivity of 93.0% and specificity of 92.6% (see Table IX). Part of the disagreement (good frames identified as uninformative) between manual and automatic detection can potentially be attributed to the restricted region used in the estimation of the GLCMs and the assumption that no additional information, enough to affect the decision process, is imaged in the excluded regions.
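The agreement and mask-combination steps discussed above can be sketched on toy data as follows. The per-frame labels, the choice of one operator's annotation as the reference, and the helper names `jaccard` and `sensitivity_specificity` are assumptions made for illustration; they do not reproduce the annotations or results of this study. The combination is taken as the union of the PC1 and PC2 masks, consistent with the PC1 ∪ PC2 notation of Table IX.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two binary per-frame masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def sensitivity_specificity(pred, truth):
    """Sensitivity and specificity of a binary mask against a reference."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Toy annotations (1 = motion artefact). The operators disagree mainly on
# where each uninformative sequence starts and ends.
operator_a = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
operator_b = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
agreement = jaccard(operator_a, operator_b)

# Toy detector outputs from each principal component.
pc1_mask = [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
pc2_mask = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Union of the two masks: a frame is flagged if either component flags it.
combined = [int(p or q) for p, q in zip(pc1_mask, pc2_mask)]

reference = operator_a  # hypothetical choice of reference annotation
se1, sp1 = sensitivity_specificity(pc1_mask, reference)
se2, sp2 = sensitivity_specificity(pc2_mask, reference)
se_u, sp_u = sensitivity_specificity(combined, reference)

print(f"Inter-operator Jaccard index: {agreement:.2f}")
print(f"PC1:       Se={se1:.2f}, Sp={sp1:.2f}")
print(f"PC2:       Se={se2:.2f}, Sp={sp2:.2f}")
print(f"PC1 ∪ PC2: Se={se_u:.2f}, Sp={sp_u:.2f}")
```

Because the union can only add flagged frames, sensitivity cannot decrease and specificity cannot increase relative to either mask alone, which is the trade-off reported for the combined detector.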
Having developed a reliable method for detecting and removing uninformative frames from OEM imaging sequences of the distal lung, the next step is to further classify the remaining, useful frames into subcategories based on the underlying image textures. This further classification would differentiate between frames imaging the bronchus and ones imaging the alveolar space. Subsequently, alveolar space frames can potentially be further classified among healthy elastin, pathological elastin, and cell-flooded frames. Such classification would enable pulmonologists to target analysis to regions of interest, reducing the subjectivity and time effort of the analysis. With the advent and development of optical molecular imaging and exogenous contrast agents [1], [3], such frame classification will be an essential requirement to expedite quantifiable optical data analysis.

VII. CONCLUSION
Uninformative frames comprise a considerable proportion (up to >25%) of clinical pulmonary OEM frame sequences. Texture descriptors derived from the GLCM, such as contrast, energy, homogeneity, etc., provide valuable information for the detection of frames containing either pure-noise or motion-artefacts. PCA (as a means of dimensionality reduction) combined with the proposed GM models provides a fair representation of the underlying texture information, enabling an accurate (sensitivity: 93.0%) and robust (specificity: 92.6%) detection of uninformative frames in human lung OEM frame sequences. A similar approach can be employed to further classify any informative frames based on their underlying texture, assisting any manual and automatic postanalysis. Finally, conditional on appropriate model refinement, the proposed algorithms can become widely applicable in OEM frame sequences acquired on 1) other organ systems (e.g., the gastrointestinal tract), and 2) other OEM imaging platforms.

Fig. 3. (a) Histogram corresponding to the PC1 from 72 frame sequences concatenated as a single dataset along with the corresponding 4-GM model. (b) Refined histogram along with the corresponding 2-GM model. The P-values of the relevant KS goodness-of-fit tests were: 0.88 for 4-Gaussian and 0.83 for 2-Gaussian.

Fig. 4. Original and refined histogram along with their corresponding 4- and 2-GM models for (a) PC1 and (b) PC2 of the motion artefact data. A zoomed-in version of the original histogram is also provided to best illustrate the mixing of the 4-Gaussians and the effect of removing the 2-Gaussians from the overall distribution. The P-values of the relevant KS goodness-of-fit tests were, for PC1: 0.77 for 4-Gaussian and 0.92 for 2-Gaussian, and for PC2: 1.0 for 4-Gaussian and 1.0 for 2-Gaussian.

Fig. 6. Line plots of detection threshold variability (robustness expressed as relative standard deviation) for increasing size of training set for (a) PC1 in pure-noise detection and (b), (c) PC1 and PC2 in motion artefact detection. In all cases, a set size of 30 000 enables a robust (<6% RSD) threshold estimation.

TABLE I
DATASETS AND RELATIVE DIAGNOSIS FOR TRAINING AND TESTING SETS

TABLE II
DURATION RANGE (IN NUMBER OF FRAMES) AND TOTAL DURATION FOR TRAINING AND TESTING SETS

TABLE III
TOTAL NUMBER OF FRAMES ANNOTATED AS MOTION ARTEFACTS BY EACH OPERATOR INDEPENDENTLY, THE UNION AND INTERSECTION OF THE TWO SETS AS WELL AS THE CORRESPONDING JACCARD INDEX (AGREEMENT BETWEEN TWO OPERATORS)

TABLE V
RESULTS OF KS TEST ASSESSING THE GOODNESS-OF-FIT BETWEEN ORIGINAL AND MODEL EDFS

TABLE VI
EFFECT OF SIMPLIFYING THE MODEL FROM 4- TO 2-GAUSSIANS (BY REMOVING CORRESPONDING FRAMES) ON THE KS GOODNESS-OF-FIT

TABLE VII
SENSITIVITY AND SPECIFICITY FOR THE AUTOMATIC DETECTION OF PURE-NOISE FRAMES

TABLE VIII
SENSITIVITY AND SPECIFICITY OF THE AUTOMATIC DETECTION OF MOTION FRAMES FOR ALL OF THE TESTING DATASETS COMBINED
The first two principal components are treated separately. Model estimates for (i) PC1: 75.9% sensitivity and 96.4% specificity and (ii) PC2: 69.3% sensitivity and 96.1% specificity.

TABLE IX
OVERALL RANGE OF SENSITIVITY AND SPECIFICITY OF THE AUTOMATIC UNINFORMATIVE FRAME DETECTION, COMBINING PURE-NOISE AND MOTION-ARTEFACT (PC1 ∪ PC2) FRAMES