Collective Deep Quantization for Efficient Cross-Modal Retrieval

Yue Cao; Mingsheng Long; Jianmin Wang; Shichen Liu

doi:10.1609/aaai.v31i1.11218

Authors

Yue Cao, Tsinghua University
Mingsheng Long, Tsinghua University
Jianmin Wang, Tsinghua University
Shichen Liu, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v31i1.11218

Keywords:

Deep quantization, Collective quantization

Abstract

Cross-modal similarity retrieval is the problem of designing a retrieval system that supports querying across content modalities, e.g., using an image to retrieve texts. This paper presents a compact coding solution for efficient cross-modal retrieval, with a focus on the quantization approach, which has already shown superior performance over hashing solutions in single-modal similarity retrieval. We propose a collective deep quantization (CDQ) approach, which is the first attempt to introduce quantization into an end-to-end deep architecture for cross-modal retrieval. The major contribution lies in jointly learning deep representations and the quantizers for both modalities using carefully crafted hybrid networks and well-specified loss functions. In addition, our approach simultaneously learns a common quantizer codebook for both modalities, through which the cross-modal correlation can be substantially enhanced. CDQ enables efficient and effective cross-modal retrieval using an inner product distance computed from the common codebook with fast distance table lookup. Extensive experiments show that CDQ yields state-of-the-art cross-modal retrieval results on standard benchmarks.
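
To make the idea of inner-product retrieval by table lookup concrete, the following is a minimal sketch and not the authors' implementation. It assumes an additive scheme in which each database item is approximated by the sum of one codeword from each of M codebooks shared by both modalities; the abstract does not spell out this structure. The names `build_lookup_table` and `approx_inner_products`, the sizes, and the random data are hypothetical and chosen only for illustration.

```python
import numpy as np

def build_lookup_table(query, codebooks):
    """Precompute <query, codeword> for every codeword.

    query:     (D,) query embedding (e.g. from the image network)
    codebooks: (M, K, D) shared codebooks, K codewords in each of M codebooks
    returns:   (M, K) table with table[m, k] = <query, codebooks[m, k]>
    """
    return codebooks @ query

def approx_inner_products(table, codes):
    """Score database items using only table lookups and additions.

    codes:   (N, M) integer codes; item n is approximated by
             sum over m of codebooks[m, codes[n, m]] (assumed additive form)
    returns: (N,) approximate inner products with the query
    """
    m_idx = np.arange(table.shape[0])
    return table[m_idx, codes].sum(axis=1)

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
M, K, D, N = 4, 16, 8, 100
codebooks = rng.normal(size=(M, K, D))
codes = rng.integers(0, K, size=(N, M))
query = rng.normal(size=D)

table = build_lookup_table(query, codebooks)
scores = approx_inner_products(table, codes)

# Reference: reconstruct each item explicitly, then take the inner product.
reconstructed = codebooks[np.arange(M), codes].sum(axis=1)  # (N, D)
exact = reconstructed @ query
ranking = np.argsort(-scores)  # most similar database items first
```

Under these assumptions, the table costs M x K inner products per query, after which each database item is scored with M lookups and additions instead of a D-dimensional inner product. Because the inner product is linear, the lookup scores agree with the inner products against the reconstructed items up to floating-point error.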
