Collective Deep Quantization for Efficient Cross-Modal Retrieval

Authors

  • Yue Cao, Tsinghua University
  • Mingsheng Long, Tsinghua University
  • Jianmin Wang, Tsinghua University
  • Shichen Liu, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v31i1.11218

Keywords:

Deep quantization, Collective quantization

Abstract

Cross-modal similarity retrieval concerns designing a retrieval system that supports querying across content modalities, e.g., using an image to retrieve texts. This paper presents a compact coding solution for efficient cross-modal retrieval, with a focus on the quantization approach, which has already shown superior performance over hashing solutions in single-modal similarity retrieval. We propose a collective deep quantization (CDQ) approach, which is the first attempt to introduce quantization into an end-to-end deep architecture for cross-modal retrieval. The major contribution lies in jointly learning deep representations and the quantizers for both modalities using carefully-crafted hybrid networks and well-specified loss functions. In addition, our approach simultaneously learns a common quantizer codebook for both modalities, through which the cross-modal correlation can be substantially enhanced. CDQ enables efficient and effective cross-modal retrieval using inner product distance computed on the common codebook with fast distance table lookup. Extensive experiments show that CDQ yields state-of-the-art cross-modal retrieval results on standard benchmarks.
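To make the distance-table-lookup step concrete, the following is a minimal sketch, not the authors' released code, assuming an additive (compositional) quantization scheme with hypothetical names and shapes: database items are stored only as codeword indices over a shared codebook, and a query's continuous deep feature is scored against them by precomputing its inner products with every codeword once, then summing table entries per item.

```python
import numpy as np

# Hypothetical setting: M codebooks, each with K codewords of dimension D.
# Database items (e.g., texts) are stored only as M codeword indices,
# mirroring the compact-code setting described in the abstract.
M, K, D = 4, 256, 64
rng = np.random.default_rng(0)
codebooks = rng.standard_normal((M, K, D))       # shared (common) codebooks
db_codes = rng.integers(0, K, size=(10000, M))   # quantized database items

def search(query_feature, codebooks, db_codes, topk=10):
    """Rank database items by asymmetric inner-product similarity.

    The query stays as a continuous deep feature (e.g., from the image
    network); each database item is implicitly reconstructed as a sum of
    codewords. Precomputing the query-codeword inner products yields an
    M x K lookup table, so scoring each item costs only M table lookups.
    """
    # Lookup table: inner product of the query with every codeword.
    table = np.einsum('d,mkd->mk', query_feature, codebooks)  # (M, K)
    # Score of item i = sum over m of table[m, db_codes[i, m]].
    scores = table[np.arange(M), db_codes].sum(axis=1)        # (N,)
    return np.argsort(-scores)[:topk]

# Usage: retrieve the top-10 database items for one query feature.
query = rng.standard_normal(D)
print(search(query, codebooks, db_codes))
```

The key efficiency point illustrated here is that the per-query cost of building the table is O(M·K·D), after which each of the N database items is scored with M lookups and additions rather than a full D-dimensional inner product.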

Published

2017-02-12

How to Cite

Cao, Y., Long, M., Wang, J., & Liu, S. (2017). Collective Deep Quantization for Efficient Cross-Modal Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11218