
1 Introduction

This paper introduces an automatic system that assesses the visibility and location of the retinal nerve fibre layer (RNFL) in fundus camera (FC) images from image-level labels. The optic nerve transmits visual information from the retina to the brain. It connects to the retina at the optic disc, and its expansion forms the RNFL, the innermost retinal layer. The RNFL has recently been considered a potential biomarker for dementia [1], assessed via its thickness in optical coherence tomography (OCT) images. However, screening large numbers of patients would be enabled if the RNFL could be assessed with FC, still much more common than OCT for retinal inspection, and increasingly part of routine optometry checks.

Very little work exists on RNFL analysis in FC images, e.g. on associations with dementia [2]. This is in contrast with RNFL analysis via OCT, supported by a rich literature [1]. The RNFL is not always visible in FC images, and its visibility itself has been posited as a biomarker for neurodegenerative conditions. This motivates our work, part of a larger project on multi-modal retina-brain biomarkers for dementia (footnote 1).

We report an automatic system to identify FC images with visible RNFL regions and simultaneously localize the visible regions. A crucial challenge is obtaining ground-truth annotations of visible RNFL regions from clinicians, notoriously a difficult and time-consuming process. We therefore take a Multiple Instance Learning (MIL) approach, requiring only image-level labels (RNFL visible/invisible), which can be generated much more efficiently. In MIL, images are regarded as bags, and image regions as instances.

Fig. 1. RNFL visibility in the green channel: (a) an image with visible RNFL (the marked region indicates its visibility), (b) an image with invisible RNFL, (c) examples of RNFL-visible regions, (d) examples of RNFL-invisible regions, (e) a synthetic image showing the RNFL (pink) and blood vessels (blue).

Visible RNFL regions have significant intra-class variations, and can be difficult to distinguish from RNFL-invisible regions. To address this, we embed the instances in a discriminative subspace defined by the outputs of a set of subcategory classifiers. An instance-level (IL) classifier is then learned in that subspace by maximizing the margin between positive and negative bags. A margin-based loss is proposed to learn the IL and the subcategory classifiers jointly.

Our two main contributions are the following.

  1. To our best knowledge, we address a new problem with significant impact potential for biomarker discovery, i.e. classifying FC images as RNFL-visible/invisible, including region localization.

  2. We improve experimental performance compared to state-of-the-art MIL systems by proposing a novel MIL approach with a novel margin-based loss (instead of the cross-entropy loss commonly used in comparable MIL systems).

The differences between our and recent, comparable work are captured in Sect. 2 after a concise discussion of related work.

We evaluated our approach on a local dataset (“RNFL”) of 576 FC images and on three public datasets (MESSIDOR [3] and DR [4] for diabetic retinopathy, UCSB [5] for breast cancer). Table 1 summarizes the datasets and the experimental settings used. The images (green channel) in our RNFL dataset were annotated (image-level annotations) independently by two experienced ophthalmologists (A1 and A2, A1 the more experienced). Overall they agreed \(\simeq 83\,\%\) of the time (\(\mathcal {P}\simeq 83\,\%\)) with a kappa value of \(\mathcal {K}\simeq 0.62\). Our experiments suggest that our system agrees more with A1 than with A2 (agreement with A1: \(\mathcal {P}\simeq 84\,\%\), \(\mathcal {K}\simeq 0.65\); with A2: \(\mathcal {P}\simeq 82\,\%\), \(\mathcal {K}\simeq 0.58\)). Our approach also improves on state-of-the-art results on the public datasets (see Table 2).

2 Related Work

MIL approaches can be divided into two broad classes: (1) instance-level (IL) and (2) bag-level (BL). In both cases a classifier is trained to separate positive from negative bags using a loss function defined at the bag level. IL approaches: the classifier is trained to classify instances, yielding IL predictions; BL predictions are usually obtained by aggregating IL decisions, e.g. MI-SVM [6], MCIL [7]. BL approaches: a classifier is trained to classify bags. Usually a feature representation is computed for each bag from its instances and then used to learn a supervised classifier. As the classifier is trained at the BL, IL predictions cannot be obtained directly; e.g. JC\(^2\)MIL [8] and RMC-MIL [9].

The original feature space may not be discriminative. Hence embedding-based (EB) approaches try to embed the instances in a discriminative space [8–10]. The bag representation computed in this space is used to learn a BL classifier.

MIL approaches have also been explored within the recent, successful Convolutional Neural Networks (CNN) paradigm for visual recognition [11]. Here, a MIL pooling layer is introduced at the end of the deep network architecture to aggregate (pool) IL predictions and compute the BL ones.

Our approach is an EB approach; but it learns an IL classifier instead of the BL one learned in [8, 9]. Therefore it can provide both IL and BL predictions. CNN+MIL [11], EB approaches [8, 9, 12] as well as other approaches [7] minimize cross-entropy loss. However, recent results suggest that margin-based loss is better than the cross-entropy loss for classification problems [13]. Considering this, we propose a novel soft-margin loss where the bags which violate the margin are penalized, and show improved performance over the cross-entropy loss.

3 Method

3.1 Motivation and Overview of the Method

Most MIL approaches do not make explicit assumptions about the inter- or intra-class variations of the positive and negative bags (e.g. [6, 14]). However, with high intra-class variation and low inter-class distinction these approaches may not perform well. This is the case for our RNFL dataset: visible RNFL regions show high intra-class variation, and they are difficult to distinguish from RNFL-invisible regions (Fig. 1). To overcome this, we assume there exists a set of discriminative sub-categories, and learn a set of classifiers for them. These sub-categories may, for instance, capture different variations (or visual appearances) of the RNFL. Each classifier in this pool is learned specifically to separate a particular sub-category from the others. Each instance is thus transformed from its original feature space to a discriminative subspace defined by the outputs of these classifiers. An IL classifier is then learned in this space by maximizing the margin between the positive and the negative bags. For each bag, the BL prediction is obtained by aggregating (pooling) the decisions of its instances. An overview of the proposed approach is illustrated in Fig. 2.

3.2 Sub-category Classifiers for MIL

Let the training dataset contain \(\{(B_i, y_i)\}_{i = 1}^N\), where \(B_i\) is the \(i^\text {th}\) bag (image), \(y_i\in \{-1,1\}\) is its label, and N is the number of bags. Each bag \(B_i\) consists of \(N_i\) instances (image regions), so that \(B_i = \{\mathbf {x}_{ij}\}_{j=1}^{N_i}\), where \(\mathbf {x}_{ij}\in \mathbb {R}^d\) is the feature representation of the \(j^\text {th}\) instance of the \(i^\text {th}\) bag.

Let \(\mathcal {M}=\left[ \varvec{\mu }_1, \dots , \varvec{\mu }_K\right] \in \mathbb {R}^{d\times K}\) be a set of sub-category classifiers, where each classifier is learned to separate a particular sub-category from the others. The probability of an instance \(\mathbf {x}_{ij}\) belonging to the \(k^\text {th}\) sub-category vs. the rest is given by \(q_{ijk} = \sigma (\varvec{\mu }_k^T\mathbf {x}_{ij})\), where \(\sigma (a) = 1/(1 + \exp (-a))\). The new instance representation \(\mathbf {z}_{ij}\) in the discriminative subspace is defined by the outputs of these sub-category classifiers, i.e. \(\mathbf {z}_{ij} = \left[ q_{ij1}, \dots , q_{ijK}\right] \). Let \(\mathbf {w}\in \mathbb {R}^K\) define the IL classifier learned in this discriminative subspace, and \(p_{ij} = \sigma (\mathbf {w}^T\mathbf {z}_{ij})\) the probability of the instance \(\mathbf {x}_{ij}\) belonging to the positive class.
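To make the embedding concrete, the following is a minimal NumPy sketch (not the authors' code; names are illustrative) of how the instances of one bag are mapped into the sub-category subspace and scored by the IL classifier.

import numpy as np

def sigmoid(a):
    # Logistic function sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def embed_instances(X_bag, M):
    # X_bag: (N_i, d) instance features x_ij of one bag.
    # M:     (d, K) matrix whose columns are the sub-category classifiers mu_k.
    # Returns Z: (N_i, K) with z_ij = [q_ij1, ..., q_ijK], q_ijk = sigma(mu_k^T x_ij).
    return sigmoid(X_bag @ M)

def instance_probabilities(X_bag, M, w):
    # p_ij = sigma(w^T z_ij): probability that each instance is positive (RNFL-visible).
    Z = embed_instances(X_bag, M)
    return sigmoid(Z @ w)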

Fig. 2. Overview of the proposed approach.

The BL probability, \(P_i\), of a bag \(B_i\) can be obtained by aggregating (pooling) the probabilities of the instances inside the bag. In this work, we use the generalized-mean operator (\(\mathcal {G}\)) for aggregation: \(P_i = \left( \frac{1}{N_i}\sum _{j=1}^{N_i} p_{ij}^r\right) ^{1/r}\), where r is a pooling parameter. When \(r=1\), \(\mathcal {G}\) becomes average-pooling, and large r values (\(r\rightarrow \infty \)) approximate max-pooling.
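Continuing the sketch above, the generalized-mean pooling of instance probabilities into a bag probability could be implemented as follows (assuming probabilities in (0, 1]).

def bag_probability(p_instances, r):
    # Generalized-mean pooling: r = 1 gives average pooling,
    # large r (r -> infinity) approaches max pooling.
    return float(np.mean(p_instances ** r) ** (1.0 / r))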

The sub-category classifiers (\(\mathcal {M}\)), the pooling parameter (r), and the IL classifier (\(\mathbf {w}\)) can be learned using a cross-entropy loss function (Eq. (1)).

$$\begin{aligned} \mathop {{{\mathrm{arg\,min}}}}\limits _{r, \mathcal {M}, \mathbf {w}}\; \frac{\lambda }{2}\Vert \mathbf {w}\Vert _2^2 - \frac{1}{N_+}\sum _{i:y_i=1}\log (P_i) - \frac{1}{N_-}\sum _{i:y_i=-1}\log (1 - P_i) \end{aligned}$$
(1)

where \(P_i= P_i(y_i=1|B_i, r, \mathcal {M})\), \(\lambda \) is a regularization parameter, and \(N_+\), \(N_-\) are the total numbers of positive and negative bags in the training set, respectively. Note that this loss is widely used by existing MIL approaches [8, 9, 11, 12].
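For reference, the cross-entropy objective of Eq. (1) could be evaluated as below, continuing the earlier sketches (the small epsilon guarding the logarithms is our addition).

def cross_entropy_objective(bags, labels, M, w, r, lam, eps=1e-12):
    # bags: list of (N_i, d) arrays; labels: array of +1 / -1 bag labels.
    P = np.array([bag_probability(instance_probabilities(B, M, w), r) for B in bags])
    pos, neg = labels == 1, labels == -1
    reg = 0.5 * lam * np.dot(w, w)
    return reg - np.mean(np.log(P[pos] + eps)) - np.mean(np.log(1.0 - P[neg] + eps))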

Instead, we propose a margin-based loss (Eq. (2)) which penalizes the bags violating the margin, as margin-based losses have two main advantages over the cross-entropy loss [13]. (1) They try to improve the classification accuracy on the training data (by focusing on the wrongly classified images), instead of making correct predictions more confident (as the cross-entropy loss does). (2) They improve training speed, as model updates are driven only by the wrongly classified images; correctly classified ones do not contribute to the model updates and can be skipped in the derivative calculations.

$$\begin{aligned} \mathop {{{\mathrm{arg\,min}}}}\limits _{r, \mathcal {M}, \mathbf {w}}\; \frac{\lambda }{2}\Vert \mathbf {w}\Vert _2^2&+ \frac{1}{N_+}\sum _{i:y_i=1}\mathcal {L}_i(y_i,B_i, r, \mathcal {M}) + \frac{1}{N_-}\sum _{i:y_i=-1}\mathcal {L}_i(y_i,B_i, r, \mathcal {M})\\ \nonumber \text {where}\quad&\mathcal {L}_i(y_i, B_i, r,\mathcal {M}) = \max \left[ 0, \;\gamma + y_i(0.5-P_i)\right] ^2. \end{aligned}$$
(2)

\(\gamma \in (0,0.5]\) is a margin parameter. In our experiments we set \(\gamma = 0.1\), \(\lambda =10^2\).
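The margin-based objective of Eq. (2) then differs from Eq. (1) only in the per-bag term; a sketch under the same assumptions, with the paper's \(\gamma = 0.1\) and \(\lambda = 10^2\) as defaults:

def margin_objective(bags, labels, M, w, r, gamma=0.1, lam=1e2):
    # L_i = max(0, gamma + y_i * (0.5 - P_i))^2: a positive bag is penalized
    # only when P_i < 0.5 + gamma, a negative bag only when P_i > 0.5 - gamma.
    P = np.array([bag_probability(instance_probabilities(B, M, w), r) for B in bags])
    hinge = np.maximum(0.0, gamma + labels * (0.5 - P)) ** 2
    pos, neg = labels == 1, labels == -1
    return 0.5 * lam * np.dot(w, w) + hinge[pos].mean() + hinge[neg].mean()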

Initialization and optimization: We use gradient descent to optimize Eq. (2), alternating between optimizing \(\mathcal {M}\), \(\mathbf {w}\) and r until convergence. To initialize \(\mathcal {M}\), first the instances from the training set are clustered using k-means (dictionary size \(=K\)), then a set of one-vs-rest linear SVM classifiers are learned to separate each cluster from the rest. These classifiers give the initial values to \(\mathcal {M}\).
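The initialization of \(\mathcal {M}\) could look as follows, e.g. using scikit-learn's KMeans and LinearSVC (the library choice and the SVM's C value are our assumptions, not stated in the paper).

from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def init_subcategory_classifiers(bags, K):
    # Stack all training instances, cluster them into K sub-categories,
    # and fit one one-vs-rest linear SVM per cluster; the SVM weight
    # vectors form the columns of the initial M (d x K).
    X = np.vstack(bags)
    cluster_ids = KMeans(n_clusters=K, n_init=10).fit_predict(X)
    M0 = np.zeros((X.shape[1], K))
    for k in range(K):
        svm = LinearSVC(C=1.0).fit(X, (cluster_ids == k).astype(int))
        M0[:, k] = svm.coef_.ravel()
    return M0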

4 Experiments

4.1 Datasets and Experimental Settings

The experimental settings for different datasets are summarized in Table 1.

(1) Messidor [3]: A public diabetic retinopathy screening dataset containing 1200 eye fundus images, well studied in [3] for BL classification. Each image was rescaled to \(700\times 700\) pixels and split into \(135\times 135\) pixel regions. Each region is represented by a set of features including intensity histograms and texture.

(2) The diabetic retinopathy (DR) screening dataset [4]: 425 FC images, constructed from 4 publicly available datasets (DiabRetDB0, DiabRetDB1, STARE and Messidor). Each image is represented by a set of 48 instances.

(3) UCSB breast cancer [5]: 58 TMA H&E-stained breast images (26 malignant, 32 benign), used in [5, 8, 12] to compare different MIL approaches. Each image was divided into 49 instances, each represented by a 708-dimensional feature vector including SIFT and local binary patterns.

(4) RNFL retinal fundus image dataset: The green channel was used for processing. Images were resized, preserving their aspect ratio, so that their maximum dimension (row or column) becomes 700 pixels; each image is then histogram-equalized. Instances (square image regions) of \(128\times 128\) pixels with an overlap of 64 pixels are extracted, giving \(\sim 90\) instances per image. Inside each instance, SIFT features (patch size \(24\times 24\) pixels, overlap 16 pixels) are computed and encoded using Sparse Coding with a dictionary of size 200. Average pooling was applied to obtain a feature representation for each instance.
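An illustrative version of this preprocessing and region extraction is sketched below (OpenCV is our tooling choice; the SIFT + Sparse Coding encoding of each region is omitted).

import cv2
import numpy as np

def extract_regions(image_path, max_dim=700, size=128, overlap=64):
    green = cv2.imread(image_path)[:, :, 1]              # green channel (BGR order)
    scale = max_dim / max(green.shape[:2])                # preserve aspect ratio
    green = cv2.resize(green, None, fx=scale, fy=scale)
    green = cv2.equalizeHist(green)                       # histogram equalization
    step = size - overlap                                 # 64-pixel stride
    regions = []
    for y in range(0, green.shape[0] - size + 1, step):
        for x in range(0, green.shape[1] - size + 1, step):
            regions.append(green[y:y + size, x:x + size])
    return regions                                        # ~90 regions per image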

Table 1. Datasets and experimental settings (FCV-fold cross validation).
Table 2. Results on the public datasets. All results except ours and mi-Graph were copied from [3–5]. Some references are omitted due to space. Different evaluation measures are used, as reported in [3–5].

4.2 Experiments with Public Medical Image Datasets

Table 2 reports the comparative results on the public datasets. For a fair comparison we directly use the features made publicly available (footnote 2), and follow the same experimental set-up used by the existing approaches.

With Messidor, our approach gives a competitive accuracy of \(73.1\,\%\) (with a standard error of \(\pm 0.12\)) compared to the accuracy obtained by mi-Graph, which, however, cannot provide IL predictions as a BL approach. With DR, our approach improves the state-of-the-art accuracy by \(\sim 4\,\%\). With UCSB, our approach achieves an AUC of 0.965 with a standard error of 0.001. Our Equal Error Rate was \(0.07\pm 0.002\), much smaller than the one reported in [12] (\(0.16\pm 0.03\)).

[Figure: number of sub-categories K (x-axis) vs. classification accuracy (y-axis) on the DR dataset]

The figure on the right shows K (x-axis) vs. accuracy values (y-axis) for the DR dataset. As expected, increasing K improves the accuracy, saturating for \(K>150\). This figure also shows that the margin-based loss (Eq. (2)) outperforms the cross-entropy loss (Eq. (1)). The advantages of the margin-based loss are discussed in Sect. 3.

4.3 RNFL Visibility Classification

We used the public code from [3] for MILBoost and mi-SVM, taking care to select the parameters guaranteeing the fairest possible comparison. As a further baseline, we implemented BL-SVM, a supervised linear SVM classifier trained on image-level feature representations obtained by average-pooling the dictionary-encoded (size 200) SIFT features. The training images with consensus labels from the annotators were used for training in each cross-validation fold.
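A minimal sketch of the BL-SVM baseline, continuing the earlier snippets (averaging the per-region encodings into one image-level vector and training a linear SVM; the C value is an assumption):

from sklearn.svm import LinearSVC

def train_bl_svm(bags, labels):
    # bags: list of (N_i, 200) sparse-coded region features per image.
    X = np.array([B.mean(axis=0) for B in bags])   # average pooling per image
    return LinearSVC(C=1.0).fit(X, labels)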

Table 3 reports the results. Our approach gives better agreement with the annotators than other approaches. Over the entire RNFL dataset we found that the inter-annotator agreement is \(\mathcal {P}=82.99\,\%\) with a kappa value of \(\mathcal {K} = 0.6190\). Our approach gives higher agreement with the experienced annotator (A1) than the less-experienced one (A2). Notice that, although BL-SVM gives a competitive performance compared to our approach, it cannot give region-level predictions as a BL method. Figure 3 shows some region-level predictions by our approach.

Fig. 3. Example region-level predictions for test images. Top row: images with rough annotations of visible RNFL regions; in the last two images the RNFL is invisible. Second row: region-level probabilities obtained by the proposed approach, where high values (red) indicate probable RNFL-visible regions.

Table 3. Approaches and their agreements (\(\mathcal {P}\) and \(\mathcal {K}\) ± standard error) with different annotators (A1 and A2) for RNFL visibility classification. Note that the agreement between the two annotators is \(\mathcal {P}=82.99\,\%\) and \(\mathcal {K}=0.6190\).

5 Conclusions

The RNFL thickness and its visibility have been posited as biomarkers for neurodegenerative conditions. We have proposed a novel MIL method to assess the visibility (visible/invisible) of the RNFL in fundus camera images, which would enable screening of much larger patient volumes than OCT. In addition, our approach localizes visible RNFL regions from image-level training labels.

Experiments suggest that our margin-based loss performs better than the cross-entropy loss used by existing EB MIL approaches [8, 9, 12]. Experiments with a local RNFL dataset and three public medical image datasets show considerable improvements compared to the state of the art. Future work will address the associations of RNFL visibility with brain features and patient outcomes.