Robust image retrieval with hidden classes

https://doi.org/10.1016/j.cviu.2013.02.008

Highlights

  • We address a new robustness problem, named hidden classes, in content-based image retrieval (CBIR).

  • We propose a new robust CBIR scheme using multi-image queries to tackle this difficult problem.

  • We develop a novel query detection technique to separate queries as common or novel according to their underlying classes.

  • We apply a self-adaptive retrieval strategy to handle different types of queries automatically.

  • We design and carry out a number of experiments to demonstrate the effectiveness of our scheme.

Abstract

In content-based image retrieval (CBIR), image classification helps to improve both the accuracy and the speed of the retrieval process. However, CBIR systems that employ image classification suffer from the problem of hidden classes. Queries associated with hidden classes cannot be accurately answered by a traditional CBIR system. To address this problem, a robust CBIR scheme is proposed that incorporates a novel query detection technique and a self-adaptive retrieval strategy. A number of experiments carried out on two popular image datasets demonstrate the effectiveness of the proposed scheme.

Introduction

Content-based image retrieval (CBIR) is an active research area. The aim of CBIR systems is to search for images by analyzing their content. Images are normally described by low-level features such as color, texture and shape [1], [2]. A significant amount of research on CBIR has been reported in the literature [3], [4]. However, the robustness of CBIR systems has not been sufficiently investigated, even though robustness has been explored extensively in traditional information retrieval [5]. We have previously identified and addressed unclean queries as a robustness problem [6]. In this paper, we study the hidden class problem of CBIR systems that employ image classification as a preprocessing step.

Applying image classification techniques in a CBIR system means that a user’s queries are answered with images from predefined classes, which helps to improve retrieval accuracy and speed. However, in a large-scale image collection, some image classes may be unseen [4]. We call these hidden classes, as opposed to predefined classes. The existence of hidden classes severely affects the retrieval accuracy of image classification based CBIR systems. Two approaches can address this robustness problem. The first is to detect hidden classes at the preprocessing stage, so that the problem is avoided when answering a query. The second is to take hidden classes into account when answering a query, since different retrieval strategies can be adopted for different queries. We chose the second approach because it is too difficult to detect hidden classes during preprocessing without extra information.

Under the query-by-example (QBE) paradigm, three problems arise due to hidden classes. When hidden classes are considered, a user’s queries fall into two categories: common queries and novel queries. Fig. 1 illustrates a hidden class, a common query and a novel query. A common query can be answered using a predefined image class because the images relevant to it have been gathered in that class. A novel query is associated with a hidden class and cannot be answered using any predefined image class. The first problem is how to identify whether a query is common or novel, since this determination influences the retrieval strategy. The second problem is how to predict the relevant predefined image class for a common query. The third problem is how to perform image retrieval for a novel query, which is not associated with any predefined image class. Solving these problems yields a new retrieval scheme that can manage the problem of hidden classes.
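The failure mode behind these problems can be illustrated with a toy closed-set classifier. The sketch below is not the paper's method: it uses a nearest-centroid rule on made-up 2-D feature vectors (the class names and values are hypothetical) only to show that a classifier trained on predefined classes must assign every query, including a novel one, to some predefined class.

```python
import math

# Hypothetical centroids of two predefined classes in a toy 2-D feature space.
CENTROIDS = {"beach": (0.0, 0.0), "forest": (10.0, 0.0)}

def nearest_class(query):
    """Closed-set prediction: always returns one of the predefined classes."""
    return min(CENTROIDS, key=lambda name: math.dist(query, CENTROIDS[name]))

common_query = (0.5, 0.2)   # lies close to the "beach" centroid
novel_query = (4.0, 30.0)   # drawn from a hidden class, far from both centroids

common_label = nearest_class(common_query)
novel_label = nearest_class(novel_query)

# The novel query still receives a predefined label, although it is far away
# from the centroid of the class it was assigned to.
novel_distance = math.dist(novel_query, CENTROIDS[novel_label])
```

The large distance of the novel query from its assigned class hints at why a separate decision about the query type is needed before a class-based ranking is trusted.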

In this paper, we aim to address the critical problem of hidden classes in CBIR systems. Our major contributions are summarized as follows.

  • We propose a robust CBIR scheme that can incorporate multi-image queries and a support vector machine (SVM) to effectively deal with hidden classes.

  • We develop a novel query detection technique to determine whether a user’s query is a common or novel query, therefore making it feasible to consider hidden classes in the retrieval process.

  • We develop a self-adaptive retrieval strategy. For a common query, a relevant predefined image class is predicted and the images within it are ranked. For a novel query, a new method is proposed to filter out irrelevant images before image ranking.
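The control flow implied by these contributions might look roughly like the sketch below. It rests on several assumptions that do not come from the paper: a nearest-centroid rule stands in for the SVM, a multi-image query is summarized by the mean of its feature vectors, novelty is decided by a hypothetical distance threshold, and the filter for novel queries simply drops images that lie close to any predefined class. All names and data are illustrative.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def retrieve(query_images, database, centroids, threshold, top_k=2):
    """Answer a multi-image query with a strategy chosen by novelty detection.

    database:  list of (image_id, feature_vector, predicted_class) triples,
               where predicted_class comes from offline classification.
    centroids: dict mapping each predefined class to its centroid.
    threshold: hypothetical distance above which a query is treated as novel.
    """
    q = centroid(query_images)
    best_class = min(centroids, key=lambda c: math.dist(q, centroids[c]))
    is_novel = math.dist(q, centroids[best_class]) > threshold

    if is_novel:
        # Assumed filter: drop images that sit close to any predefined class.
        candidates = [
            (img, feat) for img, feat, _ in database
            if min(math.dist(feat, c) for c in centroids.values()) > threshold
        ]
    else:
        # Common query: keep only images assigned to the predicted class.
        candidates = [(img, feat) for img, feat, cls in database if cls == best_class]

    # Rank the remaining candidates by distance to the summarized query.
    ranked = sorted(candidates, key=lambda item: math.dist(q, item[1]))
    return ("novel" if is_novel else "common"), [img for img, _ in ranked[:top_k]]

centroids = {"beach": (0.0, 0.0), "forest": (10.0, 0.0)}
database = [
    ("b1", (0.2, 0.1), "beach"),
    ("b2", (1.0, 0.5), "beach"),
    ("f1", (9.5, 0.3), "forest"),
    ("h1", (4.0, 29.0), "beach"),   # hidden-class image forced into "beach"
    ("h2", (5.0, 32.0), "forest"),  # hidden-class image forced into "forest"
]

common_result = retrieve([(0.4, 0.0), (0.6, 0.4)], database, centroids, threshold=5.0)
novel_result = retrieve([(4.0, 30.0), (5.0, 30.0)], database, centroids, threshold=5.0)
```

In this toy setup the first query is handled as common and ranked within "beach", while the second is flagged as novel and ranked against the images left after filtering. The threshold value of 5.0 is arbitrary and would need tuning in any real system.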

Finally, a number of experiments carried out on a Corel image dataset and the NUS-WIDE-LITE dataset [23] demonstrate the effectiveness of the proposed scheme. In particular, the improvement in precision depends on the number of hidden classes, with gains of over 10% achieved.
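Precision here can be read as the fraction of returned images that are relevant to the query. The text above does not state the cutoff or averaging used in the experiments, so the sketch below assumes a plain precision-at-k measure with hypothetical image identifiers.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k returned images that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    hits = sum(1 for image_id in top if image_id in relevant_ids)
    return hits / k

ranked = ["img7", "img2", "img9", "img4", "img1"]  # hypothetical ranking
relevant = {"img2", "img4", "img8"}                # hypothetical ground truth
score = precision_at_k(ranked, relevant, k=4)      # 2 of the top 4 are relevant
```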

The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 presents the novel CBIR scheme and discussion is provided in Section 4. In Section 5, the experimental evaluation and results are reported and the conclusion to this paper is presented in Section 6.

Section snippets

Related work

Image classification improves the accuracy and speed of a content-based image retrieval (CBIR) system [4]. Images in a collection can be categorized by supervised image classification using predefined image classes. For a given query, the retrieval results of a CBIR system are generated by first locating the most relevant image class and then ranking the images within that class [7], [4]. It should be noted that image classification is not necessary for all CBIR systems. A CBIR system can be

A robust CBIR scheme

This section describes the proposed CBIR scheme, as illustrated in Fig. 2. Because of hidden classes, queries are of two types, common and novel, which require different retrieval strategies. In this work, novel query detection is proposed to determine whether a query is common or novel. Different types of queries can then be answered using different image ranking methods. To support the different ranking strategies, a new preprocessing step is developed.

Let us consider an

Discussions of implementation

This section discusses the implementation of the proposed scheme including utilization of a multi-image query, on-line computation time, and setting of the threshold.

Experimental evaluation

A number of experiments were carried out on two image datasets, Corel [37] and NUS-WIDE [23], to evaluate the proposed scheme.

To simulate the problem of hidden classes, we assumed that certain image classes were predefined classes and other image classes were hidden classes in the experiments. For each predefined class, 30% of images in the class were randomly selected and used as the training samples. For each hidden class, no training samples were available. This means the hidden classes were
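The simulation protocol can be sketched as a small sampling routine. The function name, the toy class names, and the fixed seed below are illustrative assumptions. The text does not say how the random selection was implemented or which classes were treated as hidden.

```python
import random

def sample_training_sets(images_by_class, hidden, train_fraction=0.3, seed=0):
    """Pick training samples for predefined classes only.

    images_by_class: dict mapping class name to a list of image ids.
    hidden:          set of class names treated as hidden (no training samples).
    """
    rng = random.Random(seed)
    training = {}
    for name, images in images_by_class.items():
        if name in hidden:
            continue  # hidden classes contribute no training samples
        n_train = round(train_fraction * len(images))
        training[name] = rng.sample(images, n_train)
    return training

# Toy collection with three classes of 100 images each; "tiger" is hidden.
collection = {
    "beach": [f"beach_{i}" for i in range(100)],
    "forest": [f"forest_{i}" for i in range(100)],
    "tiger": [f"tiger_{i}" for i in range(100)],
}
training = sample_training_sets(collection, hidden={"tiger"})
```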

Conclusions

In this paper, we identified and addressed a new robustness problem, hidden classes, which severely affects the performance of content-based image retrieval (CBIR) systems employing image classification. We observed that, because of hidden classes, queries can be separated into two categories: common queries and novel queries. In the proposed scheme, novel query detection was developed to determine whether a query was a novel query or a common query. A self-adaptive strategy was

Acknowledgments

The authors thank Dr. Jinhui Tang for providing the NUS-WIDE dataset [23]. The authors would also like to thank the anonymous reviewers for their thoughtful and insightful comments that helped to improve the quality of this paper.

References (38)

  • M. Varma et al. Learning the discriminative power-invariance trade-off. IEEE Int. Conf. Comput. Vision (2007)

  • C.H. Lampert et al. Beyond sliding windows: object localization by efficient subwindow search. IEEE Conf. Comput. Vision Pattern Recogn. (2008)

  • H. Zhang et al. SVM-KNN: discriminative nearest neighbor classification for visual category recognition. IEEE Int. Conf. Comput. Vision Pattern Recogn. (2006)

  • R. Raina et al. Self-taught learning: transfer learning from unlabeled data. Int. Conf. Mach. Learn. (2007)

  • O. Boiman et al. In defense of nearest-neighbor based image classification. IEEE Conf. Comput. Vision Pattern Recogn. (2008)

  • N. Rasiwasia et al. Bridging the gap: query by semantic example. IEEE Trans. Multimedia (2007)

  • C.G.M. Snoek et al. Adding semantics to detectors for video retrieval. IEEE Trans. Multimedia (2007)

  • J. Tang, S. Yan, R. Hong, G.-J. Qi, T.-S. Chua, Inferring semantic concepts from community-contributed images and noisy...

  • J. Tang et al. Correlative linear neighborhood propagation for video annotation. IEEE Trans. Syst., Man, Cybern. B (2009)
Cited by (5)

    • A novel scale and rotation invariant texture image retrieval method using fuzzy logic classifier

      2014, Computers and Electrical Engineering
      Citation Excerpt :

      Gradient Field Histogram of Gradients (GF-HOG) is a adaptive form of the HOG descriptor suitable for Sketch Based Image Retrieval (SBIR). Zhang et al. [17] presented a scheme for CBIR where classification is used to improve the retrieval accuracy to overcome hidden classes problem. In Non-parametric classifiers category, Artificial Neural Network (ANN) and fuzzy logic methods are important.

    • Sequency-ordered generalized Walsh-Fourier Transform based shape description and retrieval

      2017, ICALIP 2016 - 2016 International Conference on Audio, Language and Image Processing - Proceedings
    • Image retrieval in cloud computing environment with the help of fuzzy semantic relevance matrix

      2016, Proceedings of the 10th INDIACom; 2016 3rd International Conference on Computing for Sustainable Global Development, INDIACom 2016
    • ANN based semantic Content Based Image Retrieval with distributed processing

      2014, Proceedings of 2014 International Conference on Contemporary Computing and Informatics, IC3I 2014
    • Semantic content based image retrieval technique using cloud computing

      2013, 2013 IEEE International Conference on Computational Intelligence and Computing Research, IEEE ICCIC 2013

    This paper has been recommended for acceptance by Chung-Sheng Li.
