Abstract
Recognizing the location of a query image by matching it to an image database is an important problem in computer vision, and one for which the representation of the database is a key issue. We explore new ways for exploiting the structure of an image database by representing it as a graph, and show how the rich information embedded in such a graph can improve bag-of-words-based location recognition methods. In particular, starting from a graph based on visual connectivity, we propose a method for selecting a set of overlapping subgraphs and learning a local distance function for each subgraph using discriminative techniques. For a query image, each database image is ranked according to these local distance functions in order to place the image in the right part of the graph. In addition, we propose a probabilistic method for increasing the diversity of these ranked database images, again based on the structure of the image graph. We demonstrate that our methods improve performance over standard bag-of-words methods on several existing location recognition datasets.
Notes
Note that feature detectors such as SIFT often detect unstable features in an image that are not matched to features in any other image in the collection.
Note that the tf-idf weighting of the bag-of-words histograms could be seen as redundant for the purposes of learning an SVM since the SVM is itself learning a per-word weight. However, we found such “pre-weighting”—i.e., using tf-idf weighted histograms as opposed to raw histograms as feature vectors—to be useful in practice, perhaps as a form of regularization.
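As a concrete illustration of this pre-weighting, the following sketch (our own hypothetical formulation, using the standard tf-idf scheme with L2 normalization, not necessarily the exact variant used in the paper) turns raw bag-of-words histograms into the weighted feature vectors that would serve as SVM inputs:

```python
import numpy as np

def tfidf_weight(histograms):
    """Apply tf-idf weighting to raw bag-of-words histograms.

    histograms : (n_images, n_words) array of raw visual-word counts.
    Returns L2-normalized tf-idf vectors, i.e., the "pre-weighted"
    histograms described in the note (standard formulation assumed).
    """
    # Term frequency: normalize each histogram by its total word count.
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1)
    # Inverse document frequency: penalize visual words that occur in
    # many database images (df = number of images containing the word).
    df = np.count_nonzero(histograms, axis=0)
    idf = np.log(histograms.shape[0] / np.maximum(df, 1))
    w = tf * idf
    # L2-normalize each weighted histogram.
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    return w / np.maximum(norms, 1e-12)
```

These vectors can then be passed directly to a linear SVM package such as LIBLINEAR (Fan et al. 2008) in place of the raw histograms.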
References
Agarwal, S., Snavely, N., Simon, I., Seitz, S., & Szeliski, R. (2009). Building Rome in a day. In ICCV.
Arandjelovic, R., & Zisserman, A. (2012). Three things everyone should know to improve object retrieval. In CVPR.
Cao, S., & Snavely, N. (2012). Learning to match images in large-scale collections. In ECCV Workshop on Web-scale Vision and Social Media.
Chen, H., & Karger, D. R. (2006). Less is more: Probabilistic models for retrieving fewer relevant documents. In ACM SIGIR.
Chum, O., & Matas, J. (2010). Large-scale discovery of spatially related images. PAMI.
Chum, O., & Matas, J. (2010). Unsupervised discovery of co-occurrence in sparse high dimensional data. In CVPR.
Doersch, C., Singh, S., Gupta, A., Sivic, J., & Efros, A. A. (2012). What makes Paris look like Paris? In SIGGRAPH.
Fan, R., Chang, K., Hsieh, C., Wang, X., & Lin, C. (2008). Liblinear: A library for large linear classification. The Journal of Machine Learning Research, 9, 1871–1874.
Frahm, J. M., et al. (2010). Building Rome on a cloudless day. In ECCV.
Frome, A., Singer, Y., Sha, F., & Malik, J. (2007). Learning globally-consistent local distance functions for shape-based image retrieval and classification. In ICCV.
Guha, S., & Khuller, S. (1998). Approximation algorithms for connected dominating sets. Algorithmica, 20, 374–387.
Havlena, M., Torii, A., & Pajdla, T. (2010). Efficient structure from motion by graph optimization. In ECCV.
Hays, J., & Efros, A. (2008). Im2gps: Estimating geographic information from a single image. In CVPR.
Irschara, A., Zach, C., Frahm, J., & Bischof, H. (2009). From structure-from-motion point clouds to fast location recognition. In CVPR.
Johns, E., & Yang, G. (2011). From images to scenes: Compressing an image cluster into a single scene model for place recognition. In ICCV.
Knopp, J., Sivic, J., & Pajdla, T. (2010). Avoiding confusing features in place recognition. In ECCV.
Li, X., Wu, C., Zach, C., Lazebnik, S., & Frahm, J. (2008). Modeling and recognition of landmark image collections using iconic scene graphs. In ECCV.
Li, Y., Crandall, D., & Huttenlocher, D. (2009). Landmark classification in large-scale image collections. In ICCV.
Li, Y., Snavely, N., & Huttenlocher, D. (2010). Location recognition using prioritized feature matching. In ECCV.
Li, Y., Snavely, N., Huttenlocher, D., & Fua, P. (2012). Worldwide pose estimation using 3D point clouds. In ECCV.
Malisiewicz, T., & Efros, A. (2009). Beyond categories: The visual memex model for reasoning about object relationships. In NIPS.
Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Ensemble of exemplar-SVMs for object detection and beyond. In ICCV.
Mikulik, A., Perdoch, M., Chum, O., & Matas, J. (2010). Learning a fine vocabulary. In ECCV.
Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2007). Object retrieval with large vocabularies and fast spatial matching. In CVPR.
Platt, J. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In A. Smola, P. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in Large Margin Classifiers (pp. 61–74). Cambridge, MA: MIT Press.
Robertson, S. E. (1997). Readings in information retrieval. The probability ranking principle in IR (pp. 281–286). San Francisco: Kaufmann Publishers Inc.
Sattler, T., Leibe, B., & Kobbelt, L. (2011). Fast image-based localization using direct 2D-to-3D matching. In ICCV.
Sattler, T., Leibe, B., & Kobbelt, L. (2012). Improving image-based localization by active correspondence search. In ECCV.
Sattler, T., Weyand, T., Leibe, B., & Kobbelt, L. (2012). Image retrieval for image-based localization revisited. In BMVC.
Schindler, G., Brown, M., & Szeliski, R. (2007). City-scale location recognition. In CVPR.
Shrivastava, A., Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Data-driven visual similarity for cross-domain image matching. SIGGRAPH ASIA.
Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV.
Torii, A., Sivic, J., & Pajdla, T. (2011). Visual localization by linear combination of image descriptors. In ICCV Workshops.
Turcot, P., & Lowe, D. (2009). Better matching with fewer features: The selection of useful features in large database recognition problems. In Workshop on Emergent Issues in Large Amounts of Visual Data, ICCV.
Zheng, Y. T., et al. (2009). Tour the world: Building a web-scale landmark recognition engine. In CVPR.
Acknowledgments
This work was supported in part by the National Science Foundation (Grants IIS-1149393 and IIS-0964027) and Intel Corporation. We also thank Flickr users for use of their photos.
Communicated by Derek Hoiem, James Hays, Jianxiong Xiao, and Aditya Khosla.
Appendix: Derivation for the iterative usage of the update factor
Here we provide the full derivation of how our diversity reranking method updates the probability scores for each image each time we select a new image for the shortlist. Let \(n\) denote the number of images that have already been selected for the ranked shortlist \(RL = \{i_1, i_2, ..., i_n\}\). For any image \(u\) in the as-yet-unselected set of images \(\mathcal {I} \setminus RL\), we wish to compute
which is the conditional probability of image \(u\) matching the query image given that all the \(n\) images in the current ranking list do not match the query. To simplify the analysis, we separately consider the cases \(n=1\), \(n=2\), and the general case \(n=k\) (note that in Section 5 we denote \(P_u^{(1)}\) as \(P'_u\)).
- \(n = 1\): From Eq. (2), Section 5 derives the update factor in Eq. (3) used to compute \(P_u^{(1)}\):
$$\begin{aligned} P_u^{(1)} = P_u \frac{1 - \frac{P_{u,i_1}}{P_u}P_{i_1}}{1-P_{i_1}} \end{aligned}$$where \(P_u\) is the original probability estimate from our GBP approach.
- \(n = 2\): From the definition of conditional probability, we have that
$$\begin{aligned} P_u^{(2)}&= \Pr (X_u = 1 | X_{i_2} = 0, X_{i_1} = 0) \nonumber \\&= \frac{\Pr (X_u = 1, X_{i_2} = 0, X_{i_1} = 0)}{\Pr (X_{i_2} = 0, X_{i_1} = 0)} \nonumber \\&= \frac{\Pr (X_u = 1, X_{i_2} = 0 | X_{i_1} = 0)\Pr (X_{i_1} = 0)}{\Pr (X_{i_2} = 0, X_{i_1} = 0)} \nonumber \\ \end{aligned}$$(7)Recall from Section 5 that we make two simplifying assumptions:
1. Independence of \((X_{i_1} = 0)\) and \((X_{i_2} = 0)\) for distant \(i_1\) and \(i_2\):
$$\begin{aligned} \Pr (X_{i_2} = 0, X_{i_1} = 0) = \Pr (X_{i_2} = 0) \Pr (X_{i_1} = 0) \end{aligned}$$
2. Independence of \((X_u = 1, X_{i_2} = 1)\) and \((X_{i_1} = 0)\):
$$\begin{aligned} \Pr (X_u = 1, X_{i_2} = 1 | X_{i_1} = 0) = \Pr (X_u = 1, X_{i_2} = 1) \end{aligned}$$
Given these assumptions, we can simplify Eq. (7) as follows:
$$\begin{aligned}&P_u^{(2)} = \Pr (X_u = 1 | X_{i_2} = 0, X_{i_1} = 0) \nonumber \\&= \frac{\Pr (X_u = 1, X_{i_2} = 0 | X_{i_1} = 0) \Pr (X_{i_1} = 0)}{\Pr (X_{i_2} = 0) \Pr (X_{i_1} = 0)} \nonumber \\&= \frac{\Pr (X_u = 1, X_{i_2} = 0 | X_{i_1} = 0)}{\Pr (X_{i_2} = 0)} \nonumber \\&= \frac{\Pr (X_u = 1 | X_{i_1} = 0) - \Pr (X_u = 1, X_{i_2} = 1 | X_{i_1} = 0)}{1-\Pr (X_{i_2}=1)} \nonumber \\&= \frac{\Pr (X_u = 1 | X_{i_1} = 0) - \Pr (X_u = 1, X_{i_2} = 1)}{1-\Pr (X_{i_2}=1)} \nonumber \\&= \frac{P_u^{(1)} - \Pr (X_u = 1 | X_{i_2} = 1) \Pr (X_{i_2} = 1)}{1-\Pr (X_{i_2}=1)} \nonumber \\&= \frac{P_u^{(1)} - P_{u,i_2} P_{i_2}}{1-P_{i_2}} = P_u^{(1)} \left( \frac{1 - \frac{P_{u,i_2}}{P_u^{(1)}} P_{i_2}}{1-P_{i_2}} \right) \end{aligned}$$(8)where \(P_u^{(1)}\) is the updated conditional probability of \(X_{u} = 1\) using the evidence \(X_{i_1} = 0\). Note that the update factor at the end of Eq. (8) is of the same form as Eq. (3).
- \(n = k\): More generally, we can similarly derive that
$$\begin{aligned} P_u^{(k)}&= P_u^{(k-1)} \left( \frac{1 - \frac{P_{u,i_k}}{P_u^{(k-1)}} P_{i_k}}{1-P_{i_k}} \right) \end{aligned}$$(9)Hence, through Eq. (9), we can update each as-yet-unchosen image \(u\)'s conditional probability \(P_u^{(k)}\) in an iterative fashion each time we add a new image to the ranking list \(RL\).
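The iterative update lends itself to a simple greedy reranking loop. The following is a minimal sketch (our own illustration, not the authors' implementation): `P` holds the initial match probabilities \(P_u\), `P_pair[u, i]` the conditional probabilities \(P_{u,i} = \Pr(X_u = 1 \mid X_i = 1)\); at each step the highest-scoring image joins the shortlist and every remaining score is updated via Eq. (9). The clamping to \([0, 1]\) is an added safeguard, since the independence approximations can push the estimates out of range.

```python
import numpy as np

def diversity_rerank(P, P_pair, shortlist_len):
    """Greedily build a diverse shortlist using the Eq. (9) update.

    P       : (n,) array of initial match probabilities P_u
    P_pair  : (n, n) array with P_pair[u, i] = Pr(X_u = 1 | X_i = 1)
    Returns the indices i_1, ..., i_k of the selected shortlist.
    """
    P_cur = np.asarray(P, dtype=float).copy()
    candidates = set(range(len(P_cur)))
    shortlist = []
    for _ in range(shortlist_len):
        # Pick the image with the highest current conditional probability.
        i_k = max(candidates, key=lambda u: P_cur[u])
        shortlist.append(i_k)
        candidates.remove(i_k)
        # Condition the remaining images on the event X_{i_k} = 0, i.e.,
        # P_u^(k) = (P_u^(k-1) - P_{u,i_k} P_{i_k}) / (1 - P_{i_k}).
        for u in candidates:
            P_cur[u] = (P_cur[u] - P_pair[u, i_k] * P[i_k]) / (1.0 - P[i_k])
            P_cur[u] = min(max(P_cur[u], 0.0), 1.0)  # clamp numerical drift
    return shortlist
```

Intuitively, once \(i_k\) is selected, images with large \(P_{u,i_k}\) (likely to match whenever \(i_k\) matches) are demoted, so the next selection favors a different part of the image graph.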
Cao, S., Snavely, N. Graph-Based Discriminative Learning for Location Recognition. Int J Comput Vis 112, 239–254 (2015). https://doi.org/10.1007/s11263-014-0774-9