Structure from motion for complex image sets

https://doi.org/10.1016/j.isprsjprs.2020.05.020

Abstract

This paper presents an approach for Structure from Motion (SfM) for unorganized complex image sets. To achieve high accuracy and robustness, image triplets are employed and an (approximate) internal camera calibration is assumed to be known. The complexity of an image set is determined by the camera configurations which may include wide as well as weak baselines.

Wide baselines occur, for instance, when terrestrial images and images from small Unmanned Aerial Systems (UAS) are combined. The resulting large geometric and radiometric distortions between images make image matching difficult, possibly leading to an incomplete result. Weak baselines mean an insufficient distance between cameras compared to the distance to the observed scene and give rise to critical camera configurations. Inappropriate handling of such configurations may lead to various problems in triangulation-based SfM, up to total failure.

The focus of our approach is on a complete linking of images even in case of wide or weak baselines. We do not rely on any additional information such as camera configurations, Global Positioning System (GPS) or Inertial Navigation System (INS) data. As a basis for generating suitable triplets to link the images, an iterative graph-based method is employed which formulates image linking as the search for a terminal Steiner minimum tree in the line graph. SIFT (Lowe, 2004) descriptors are embedded into Hamming space for fast image similarity ranking. This ranking is used to limit the number of pairs that have to be geometrically verified by a computationally more complex wide baseline matching method (Mayer et al., 2012). Critical camera configurations which are not suitable for geometric verification are detected by means of classification (Michelini and Mayer, 2019). Additionally, we propose a graph-based approach for the optimization of the hierarchical merging of triplets to efficiently generate larger image subsets.

By this means, a complete 3D reconstruction of the scene is obtained. Experiments demonstrate that the approach is able to produce reliable orientation for large image sets comprising wide as well as weak baseline configurations.

Introduction

Recent developments in Structure from Motion (SfM) or sparse 3D reconstruction techniques from unorganized image sets focus on large (Internet) photo collections (Heinly et al., 2015, Crandall et al., 2011, Frahm et al., 2010, Havlena et al., 2010, Agarwal et al., 2009, Snavely et al., 2008). These collections can contain millions of images, yet they often exhibit very high redundancy and moderate baselines. In contrast to these large photo collections, we focus on image sets with up to a few thousand images, but containing complex camera configurations comprising wide as well as weak baselines between images.

In our case, wide baselines often arise when terrestrial images and imagery taken from small Unmanned Aerial Systems (UAS) are combined. Failure to handle wide baselines can lead to incomplete SfM resulting in multiple disconnected reconstructions. On the other hand, weak baselines occur if the translation between image acquisitions is insufficient in relation to the distance to the observed scene. Such camera configurations are termed critical because they lead to a poor intersection geometry, which becomes undefined in case of zero baseline (i.e., pure rotation). Thus, an inappropriate handling of weak baselines can result in inaccurate or even failing orientation estimation and sparse 3D reconstruction.
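
The effect of a weak baseline on the intersection geometry can be illustrated with a small calculation. The following is a minimal sketch and not part of the presented approach: it assumes a simplified symmetric configuration in which a scene point lies at distance `depth` in front of the midpoint of the baseline, and the function name `intersection_angle_deg` is purely illustrative.

```python
import math


def intersection_angle_deg(baseline: float, depth: float) -> float:
    """Angle in degrees between the two viewing rays to a scene point.

    Assumption (illustrative only): the point lies at distance `depth`
    straight in front of the midpoint of a baseline of length `baseline`.
    """
    if baseline < 0:
        raise ValueError("baseline must be non-negative")
    if depth <= 0:
        raise ValueError("depth must be positive")
    # Each ray deviates by atan((baseline / 2) / depth) from the central axis.
    return math.degrees(2.0 * math.atan2(baseline / 2.0, depth))


if __name__ == "__main__":
    # The angle shrinks as the baseline becomes small relative to the depth
    # and vanishes for a zero baseline (pure rotation).
    for baseline in (10.0, 1.0, 0.1, 0.0):
        angle = intersection_angle_deg(baseline, depth=50.0)
        print(f"baseline={baseline:5.1f}  depth=50.0  angle={angle:.4f} deg")
```

As the baseline-to-depth ratio approaches zero, the rays become nearly parallel, so small errors in the image measurements translate into large errors in the triangulated depth.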

A crucial step during SfM is the merging of image pairs or triplets with consistent geometry into larger image subsets with a common reference frame. The result is usually optimized by means of bundle adjustment (Triggs et al., 2000), which is a computationally intensive non-linear optimization method. Hence, hierarchical merging techniques (Toldo et al., 2015, Mayer, 2014, Gherardi et al., 2010, Farenzena et al., 2009, Fitzgibbon and Zisserman, 1998) are applied to improve the efficiency. This way, disjoint image subsets can be merged independently and, thus, in parallel, improving the runtime on systems with multiple parallel processing units.
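
The idea behind hierarchical merging can be sketched in a few lines. The example below is only an illustration under strong assumptions: image subsets are represented as plain sets of image identifiers, the set union stands in for the actual geometric merging and bundle adjustment, and neighbouring subsets are simply paired in list order. The function name `merge_rounds` is hypothetical and does not correspond to any of the cited implementations.

```python
from typing import FrozenSet, Iterable, List, Tuple


def merge_rounds(subsets: Iterable[Iterable[int]]) -> Tuple[FrozenSet[int], List[int]]:
    """Merge subsets pairwise, round by round, until a single subset remains.

    Returns the final subset and, for every round, the number of merges.
    The merges within one round involve different subsets and could
    therefore be executed in parallel.
    """
    current = [frozenset(s) for s in subsets]
    if not current:
        raise ValueError("at least one subset is required")
    merges_per_round: List[int] = []
    while len(current) > 1:
        # Pair neighbouring subsets; an unpaired last subset is carried over.
        pairs = [(current[i], current[i + 1]) for i in range(0, len(current) - 1, 2)]
        merged = [a | b for a, b in pairs]  # stand-in for the geometric merge
        if len(current) % 2 == 1:
            merged.append(current[-1])
        merges_per_round.append(len(pairs))
        current = merged
    return current[0], merges_per_round


if __name__ == "__main__":
    # Eight overlapping triplets along a hypothetical image sequence 0..16.
    triplets = [{2 * i, 2 * i + 1, 2 * i + 2} for i in range(8)]
    final_subset, schedule = merge_rounds(triplets)
    print(sorted(final_subset))
    print(schedule)
```

With eight triplets, the seven merges are spread over three rounds instead of being performed strictly one after another, which is where the benefit on systems with multiple processing units comes from.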

The aim of our approach is an efficient, complete and reliable linking of the entire image set even for complex configurations comprising wide as well as weak baselines. No additional information like camera configuration, Global Positioning System (GPS) or Inertial Navigation System (INS) is used. An overview of the processing stages is given in Fig. 1.

The input is an (unorganized) image set with an (approximate) internal camera calibration. Based on this, we start with image preprocessing, where multi-resolution image pyramids are generated and SIFT features (Lowe, 2004) are extracted using GPU acceleration (Wu, 2012). In the next stage, a graph-based method (Michelini and Mayer, 2016) establishes the links between the images (Section 3), employing a classification-based approach (Michelini and Mayer, 2019) for the detection of critical camera configurations. The wide baseline method (Mayer et al., 2012) is used for geometric verification, providing robustness even in case of large radiometric or geometric image distortions. Based on the image linkage, a novel graph-based optimization strategy improves the efficiency of the subsequent hierarchical merging of image subsets (Section 4). In the last stage, relative camera orientations as well as the sparse 3D structure are determined using hierarchical merging (Mayer, 2014). Data exchange between processing stages is accomplished by means of a database.

Results which demonstrate the potential of our SfM approach on real-world datasets as well as in comparison to other state-of-the-art frameworks are presented in Section 5. Finally, conclusions are given in Section 6.

Section snippets

Related work

Fitzgibbon and Zisserman (1998) as well as Koch et al. (1998) presented pioneering works dealing with SfM in image sequences. Later, Schaffalitzky and Zisserman (2002) showed that automatic SfM is achievable for general camera configurations without additional information (e.g., about the sequence). Snavely et al. (2006) introduced the framework Photo Tourism, which can deal with larger image sets and produce high-quality results. However, it has a high runtime due to the employed

Image linking

Image linking describes the relations between images and, thus, implies knowledge about their overlap (i.e., the projection of the same parts of a scene). The latter is usually derived from feature correspondences, i.e., by means of image matching (Hartmann et al., 2016). Yet, because of the high combinatorial complexity, exhaustive image matching is not practical even for small image sets.
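
The combinatorial growth mentioned above is easy to quantify. The following minimal sketch counts the candidates an exhaustive strategy would have to consider; the function name `exhaustive_candidates` is illustrative only, and the count says nothing about the cost of a single matching operation.

```python
from math import comb
from typing import Tuple


def exhaustive_candidates(n_images: int) -> Tuple[int, int]:
    """Number of unordered image pairs and image triplets among n_images."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    return comb(n_images, 2), comb(n_images, 3)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        pairs, triplets = exhaustive_candidates(n)
        print(f"{n:5d} images: {pairs:10d} pairs, {triplets:12d} triplets")
```

The number of pairs grows quadratically and the number of triplets cubically with the number of images, which is why candidates have to be selected before any expensive geometric verification is applied.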

In addition, knowledge about geometric relations between images is in our case purely based on image

Hierarchical merging of image subsets

Starting from known links in the form of triplets, images are merged to larger image subsets transforming the camera orientations into a common reference frame. We employ the hierarchical merging of Mayer (2014), which allows image subsets to grow independently from each other. This offers the opportunity to utilize parallel architectures by performing merging in parallel. However, while the reduction of 3D points with the aim of reducing the merging runtime was the focus in Mayer (2014), here

Results

The capability of the proposed approach is demonstrated on image sets whose properties are specified in Table 2 and a system whose specification is listed in Table 1. Camera orientations of the image sets, estimated using the proposed approach, are shown in Fig. 6, Fig. 7, Fig. 8, Fig. 9, where colors represent different camera types. The pyramids correspond to camera orientations with the apex of the pyramid giving the camera position and the rotation of the pyramid the camera direction.

Images

Conclusion

In this paper, an automatic SfM approach for (unordered) image sets comprising complex configurations has been presented. Apart from (approximate) internal camera calibration, no other information like GPS or INS data is required.

We proposed a graph-based method allowing for an efficient and unsupervised search for image links even in case of strong image distortions as well as critical camera configurations. In addition, an optimization technique is presented which improves the load balancing

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (64)

  • W. Hartmann et al.

    Recent developments in large-scale tie-point matching

    ISPRS J. Photogramm. Remote Sens.

    (2016)
  • G. Lin et al.

    On the terminal Steiner problem

    Inform. Process. Lett.

    (2002)
  • R. Toldo et al.

    Hierarchical structure-and-motion recovery from uncalibrated images

    Comput. Vis. Image Underst.

    (2015)
  • S. Agarwal et al.

    Building Rome in a Day

  • C. Beder et al.

    Determining an initial image pair for fixing the scale of a 3D reconstruction from an image sequence

  • P. Brucker

    Scheduling Algorithms

    (2007)
  • H. Cai et al.

    Learning linear discriminant projections for dimensionality reduction of image descriptors

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • Y.H. Chen

    An improved approximation algorithm for the terminal Steiner tree problem

    Comput. Sci. Appl. – ICCSA

    (2011)
  • J. Cheng et al.

    Fast and accurate image matching with cascade hashing for 3D reconstruction

  • D. Crandall et al.

    Discrete-continuous optimization for large-scale structure from motion

  • R.I. Davis et al.

    A survey of hard real-time scheduling for multiprocessor systems

    ACM Comput. Surv.

    (2011)
  • M. Farenzena et al.

    Structure-and-motion pipeline on a hierarchical cluster tree

  • A.W. Fitzgibbon et al.

    Automatic camera recovery for closed or open image sequences

  • J.-M. Frahm et al.

    Building Rome on a cloudless day

  • R. Gherardi et al.

    Improving the efficiency of hierarchical structure-and-motion

  • M. Havlena et al.

    Optimal reduction of large image databases for location recognition

  • M. Havlena et al.

    VocMatch: efficient multiview correspondence for structure from motion

  • M. Havlena et al.

    Efficient structure from motion by graph optimization

  • K. Heath et al.

    Image webs: computing and exploiting connectivity in image collections

  • J. Heinly et al.

    Reconstructing the World in six days

  • A. Irschara et al.

    Efficient structure from motion with weak position and orientation priors

  • P. Jaccard

    The distribution of the flora in the alpine zone

    New Phytol.

    (1912)
  • H. Jegou et al.

    Hamming embedding and weak geometric consistency for large scale image search

  • N. Jiang et al.

    A global linear method for camera pose registration

  • Ke, Y., Sukthankar, R., 2004. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. In: Conference...
  • K.I. Kim et al.

    Match graph construction for large image databases

  • Klopschitz, M., Irschara, A., Reitmayr, G., Schmalstieg, D., 2010. Robust Incremental Structure from Motion. In:...
  • Koch, R., Pollefeys, M., Gool, L.V., 1998. Automatic 3D Model Acquisition from Uncalibrated Image Sequences. In:...
  • M. Korch et al.

    A comparison of task pools for dynamic load balancing of irregular algorithms

    Concurr. Comput.: Pract. Exp.

    (2004)
  • X. Li et al.

    Modeling and recognition of landmark image collections using iconic scene graphs

  • T. Lindeberg

    Feature detection with automatic scale selection

    Int. J. Comput. Vision

    (1998)
  • D.G. Lowe

    Distinctive image features from scale-invariant keypoints

    Int. J. Comput. Vision

    (2004)