
Graph-Based Discriminative Learning for Location Recognition

International Journal of Computer Vision

Abstract

Recognizing the location of a query image by matching it to an image database is an important problem in computer vision, and one for which the representation of the database is a key issue. We explore new ways of exploiting the structure of an image database by representing it as a graph, and show how the rich information embedded in such a graph can improve bag-of-words-based location recognition methods. In particular, starting from a graph based on visual connectivity, we propose a method for selecting a set of overlapping subgraphs and learning a local distance function for each subgraph using discriminative techniques. For a query image, each database image is ranked according to these local distance functions in order to place the query in the right part of the graph. In addition, we propose a probabilistic method for increasing the diversity of these ranked database images, again based on the structure of the image graph. We demonstrate that our methods improve performance over standard bag-of-words methods on several existing location recognition datasets.


Figs. 1–8 (figures not included in this version)

Notes

  1. Note that feature detectors such as SIFT often detect unstable features in an image that are not matched to features in any other image in the collection.

  2. Note that the tf-idf weighting of the bag-of-words histograms could be seen as redundant for the purposes of learning an SVM since the SVM is itself learning a per-word weight. However, we found such “pre-weighting”—i.e., using tf-idf weighted histograms as opposed to raw histograms as feature vectors—to be useful in practice, perhaps as a form of regularization.
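As a concrete illustration of the "pre-weighting" described in the note above, the following is a minimal NumPy sketch of tf-idf weighting applied to raw bag-of-words histograms before they are used as SVM feature vectors. This is not the authors' implementation; the particular tf/idf variant and the per-image L2 normalization are our assumptions, and the function name `tfidf_weight` is ours.

```python
import numpy as np

def tfidf_weight(histograms):
    """Apply tf-idf weighting to raw bag-of-words histograms.

    histograms: (n_images, vocab_size) array of raw visual-word counts.
    Returns tf-idf weighted histograms, L2-normalized per image.
    """
    n_images = histograms.shape[0]
    # Document frequency: number of images containing each visual word.
    df = np.count_nonzero(histograms, axis=0)
    # Inverse document frequency; words appearing nowhere get weight 0.
    idf = np.where(df > 0, np.log(n_images / np.maximum(df, 1)), 0.0)
    # Term frequency: counts normalized by each image's total word count.
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1)
    weighted = tf * idf
    # L2-normalize each histogram so SVM features live on a common scale.
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    return weighted / np.maximum(norms, 1e-12)
```

Note that a word occurring in every database image gets idf = 0 and is suppressed entirely, which is one intuition for why pre-weighting can act as a form of regularization: the SVM no longer needs to spend capacity down-weighting ubiquitous, uninformative visual words.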


Acknowledgments

This work was supported in part by the National Science Foundation (Grants IIS-1149393 and IIS-0964027) and Intel Corporation. We also thank Flickr users for use of their photos.


Corresponding author

Correspondence to Song Cao.

Additional information

Communicated by Derek Hoiem, James Hays, Jianxiong Xiao, and Aditya Khosla.

Appendix: Derivation for the iterative usage of the update factor

Here we provide the full derivation of how our diversity reranking method updates the probability scores for each image each time we select a new image for the shortlist. Let \(n\) denote the number of images that have already been selected for the ranked shortlist \(RL = \{i_1, i_2, ..., i_n\}\). For any image \(u\) in the as-yet-unselected set of images \(\mathcal {I} \setminus RL\), we wish to compute

$$\begin{aligned} P_u^{(n)} = \Pr (X_u = 1 | X_{i_1} = 0, ..., X_{i_n} = 0), \end{aligned}$$

which is the conditional probability of image \(u\) matching the query image given that all the \(n\) images in the current ranking list do not match the query. To simplify the analysis, we separately consider the cases \(n=1\), \(n=2\), and the general case \(n=k\) (note that in Section 5 we denote \(P_u^{(1)}\) as \(P'_u\)).

  • \(n = 1\): From Eq. (2), Section 5 derives the update factor in Eq. (3) used to compute \(P_u^{(1)}\):

    $$\begin{aligned} P_u^{(1)} = P_u \frac{1 - \frac{P_{u,i_1}}{P_u}P_{i_1}}{1-P_{i_1}} \end{aligned}$$

    where \(P_u\) is the original probability estimate from our GBP approach.

  • \(n = 2\): From the definition of conditional probability, we have that

    $$\begin{aligned} P_u^{(2)}&= \Pr (X_u = 1 | X_{i_2} = 0, X_{i_1} = 0) \nonumber \\&= \frac{\Pr (X_u = 1, X_{i_2} = 0, X_{i_1} = 0)}{\Pr (X_{i_2} = 0, X_{i_1} = 0)} \nonumber \\&= \frac{\Pr (X_u = 1, X_{i_2} = 0 | X_{i_1} = 0)\Pr (X_{i_1} = 0)}{\Pr (X_{i_2} = 0, X_{i_1} = 0)} \nonumber \\ \end{aligned}$$
    (7)

    Recall from Section 5 that we make two simplifying assumptions:

    1. Independence of \((X_{i_1} = 0)\) and \((X_{i_2} = 0)\) for distant \(i_1\) and \(i_2\):

      $$\begin{aligned} \Pr (X_{i_2} = 0, X_{i_1} = 0) = \Pr (X_{i_2} = 0) \Pr (X_{i_1} = 0) \end{aligned}$$
    2. Independence of \((X_u = 1, X_{i_2} = 1)\) and \((X_{i_1} = 0)\):

      $$\begin{aligned} \Pr (X_u = 1, X_{i_2} = 1 | X_{i_1} = 0) = \Pr (X_u = 1, X_{i_2} = 1) \end{aligned}$$

    Given these assumptions, we can simplify Eq. (7) as follows:

    $$\begin{aligned}&P_u^{(2)} = \Pr (X_u = 1 | X_{i_2} = 0, X_{i_1} = 0) \nonumber \\&= \frac{\Pr (X_u = 1, X_{i_2} = 0 | X_{i_1} = 0) \Pr (X_{i_1} = 0)}{\Pr (X_{i_2} = 0) \Pr (X_{i_1} = 0)} \nonumber \\&= \frac{\Pr (X_u = 1, X_{i_2} = 0 | X_{i_1} = 0)}{\Pr (X_{i_2} = 0)} \nonumber \\&= \frac{\Pr (X_u = 1 | X_{i_1} = 0) - \Pr (X_u = 1, X_{i_2} = 1 | X_{i_1} \!=\! 0)}{1-\Pr (X_{i_2}\!=\!1)} \nonumber \\&= \frac{\Pr (X_u = 1 | X_{i_1} = 0) - \Pr (X_u = 1, X_{i_2} = 1)}{1-\Pr (X_{i_2}=1)} \nonumber \\&= \frac{P_u^{(1)} - \Pr (X_u = 1 | X_{i_2} = 1) \Pr (X_{i_2} = 1)}{1-\Pr (X_{i_2}=1)} \nonumber \\&= \frac{P_u^{(1)} - P_{u,i_2} P_{i_2}}{1-P_{i_2}} = P_u^{(1)} \left( \frac{1 - \frac{P_{u,i_2}}{P_u^{(1)}} P_{i_2}}{1-P_{i_2}} \right) \end{aligned}$$
    (8)

    where \(P_u^{(1)}\) is the updated conditional probability of \(X_u = 1\) given the evidence \(X_{i_1} = 0\). Note that the update factor at the end of Eq. (8) has the same form as Eq. (3).

  • \(n = k\): More generally, we can similarly derive that

    $$\begin{aligned} P_u^{(k)}&= P_u^{(k-1)} \left( \frac{1 - \frac{P_{u,i_k}}{P_u^{(k-1)}} P_{i_k}}{1-P_{i_k}} \right) \end{aligned}$$
    (9)

    Hence, through Eq. (9), we can update each as-yet-unchosen image \(u\)'s conditional probability \(P_u^{(k)}\) in an iterative fashion each time we add a new image to the ranked shortlist \(RL\).
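To make the iterative usage concrete, the update of Eq. (9) can be wrapped in a greedy selection loop, as sketched below. This is our own illustration, not the paper's code: the names `diversity_rerank` and `P_joint` are ours, we fix the (unspecified here) selection rule to "pick the image with the highest current updated score", and we take \(P_{i_k}\) to be the original unconditional probability as written in Eq. (9) (assumption 1 above makes the conditional and unconditional values coincide for distant images anyway).

```python
import numpy as np

def diversity_rerank(P0, P_joint, shortlist_size):
    """Greedy diversity reranking via the iterative update of Eq. (9).

    P0: initial match probabilities Pr(X_u = 1) for each database image.
    P_joint[u, i]: P_{u,i} = Pr(X_u = 1 | X_i = 1), i.e. the probability
        that image u matches the query given that image i matches.
    Returns the shortlist indices in selection order.
    """
    P0 = np.asarray(P0, dtype=float)
    P = P0.copy()                      # P[u] holds the current P_u^(k)
    selected, remaining = [], set(range(len(P0)))
    for _ in range(min(shortlist_size, len(P0))):
        i_k = max(remaining, key=lambda u: P[u])  # best current score
        selected.append(i_k)
        remaining.remove(i_k)
        p_ik = P0[i_k]                 # unconditional Pr(X_{i_k} = 1)
        # Condition every unselected u on the evidence X_{i_k} = 0:
        #   P_u^(k) = (P_u^(k-1) - P_{u,i_k} * P_{i_k}) / (1 - P_{i_k})
        for u in remaining:
            P[u] = (P[u] - P_joint[u, i_k] * p_ik) / (1.0 - p_ik)
            P[u] = min(max(P[u], 0.0), 1.0)  # clamp numerical drift
    return selected
```

The effect is exactly the diversity behavior motivated in the text: if a high-scoring image \(u\) is visually redundant with an already-selected \(i_k\) (large \(P_{u,i_k}\)), conditioning on \(X_{i_k} = 0\) sharply discounts \(u\), while images in distant parts of the graph keep their scores.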


Cite this article

Cao, S., Snavely, N. Graph-Based Discriminative Learning for Location Recognition. Int J Comput Vis 112, 239–254 (2015). https://doi.org/10.1007/s11263-014-0774-9
