Structure from motion for complex image sets
Introduction
Recent developments in Structure from Motion (SfM), i.e., sparse 3D reconstruction from unorganized image sets, focus on large (Internet) photo collections (Heinly et al., 2015, Crandall et al., 2011, Frahm et al., 2010, Havlena et al., 2010, Agarwal et al., 2009, Snavely et al., 2008). Such collections can contain millions of images, yet they often exhibit very high redundancy and moderate baselines. In contrast, we focus on image sets with up to a few thousand images that contain complex camera configurations comprising wide as well as weak baselines between images.
In our case, wide baselines often arise when terrestrial images and imagery taken from small Unmanned Aerial Systems (UAS) are combined. Failure to handle wide baselines can lead to incomplete SfM resulting in multiple disconnected reconstructions. On the other hand, weak baselines occur if the translation between image acquisitions is insufficient in relation to the distance to the observed scene. Such camera configurations are termed critical because they lead to a poor intersection geometry, which becomes undefined in case of zero baseline (i.e., pure rotation). Thus, an inappropriate handling of weak baselines can result in inaccurate or even failing orientation estimation and sparse 3D reconstruction.
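The relation between baseline and scene distance can be illustrated with a minimal sketch. It assumes a symmetric camera pair observing a point straight ahead of the midpoint of the baseline; the function names and the 2 degree threshold are hypothetical and serve only as an illustration. They are not the classification-based detection used in this work.

```python
import math


def intersection_angle_deg(baseline: float, distance: float) -> float:
    """Angle (degrees) between the two viewing rays at a scene point.

    Assumes a symmetric camera pair: the point lies at `distance` in front
    of the midpoint of the baseline (illustrative geometry only).
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    if baseline < 0:
        raise ValueError("baseline must be non-negative")
    return math.degrees(2.0 * math.atan2(baseline / 2.0, distance))


def is_weak_baseline(baseline: float, distance: float,
                     min_angle_deg: float = 2.0) -> bool:
    """Flag a pair as weak if the ray intersection angle is too small.

    `min_angle_deg` is a hypothetical threshold chosen for this sketch.
    """
    return intersection_angle_deg(baseline, distance) < min_angle_deg


if __name__ == "__main__":
    for b, d in [(0.0, 10.0), (0.1, 50.0), (5.0, 20.0)]:
        angle = intersection_angle_deg(b, d)
        print(f"baseline={b} distance={d} angle={angle:.3f} deg "
              f"weak={is_weak_baseline(b, d)}")
```

With a zero baseline the angle is exactly zero, which corresponds to the pure-rotation case in which the intersection geometry is undefined.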
A crucial step during SfM is the merging of image pairs or triplets with consistent geometry into larger image subsets with a common reference frame. The result is usually optimized by means of bundle adjustment (Triggs et al., 2000), which is a computationally intensive non-linear optimization method. Hence, hierarchical merging techniques (Toldo et al., 2015, Mayer, 2014, Gherardi et al., 2010, Farenzena et al., 2009, Fitzgibbon and Zisserman, 1998) are applied to improve the efficiency. This way, disjoint image subsets can be merged independently and, thus, in parallel, improving the runtime on systems with multiple parallel processing units.
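The idea that merges of disjoint subsets are independent and can run in parallel may be sketched as follows. This is a simplified illustration, not the merging procedure of the cited works: `merge_pair` is a hypothetical placeholder that only forms the union of image identifiers, whereas a real merge would transform camera orientations into a common reference frame and run bundle adjustment. The pairing order is also arbitrary here, while an actual system would follow the image links.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_pair(a: frozenset, b: frozenset) -> frozenset:
    """Hypothetical stand-in for the expensive merge of two image subsets."""
    return a | b


def hierarchical_merge(subsets, max_workers: int = 4):
    """Merge subsets pairwise in rounds; return (merged set, number of rounds).

    All merges within one round operate on disjoint subsets, so they are
    independent and can be dispatched to parallel workers.
    """
    level = [frozenset(s) for s in subsets]
    if not level:
        raise ValueError("at least one subset is required")
    rounds = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(level) > 1:
            pairs = [(level[i], level[i + 1])
                     for i in range(0, len(level) - 1, 2)]
            merged = list(pool.map(lambda p: merge_pair(*p), pairs))
            if len(level) % 2 == 1:
                # An unpaired subset is carried over to the next round.
                merged.append(level[-1])
            level = merged
            rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    triplets = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}, {13, 14, 15}]
    result, rounds = hierarchical_merge(triplets)
    print(sorted(result), rounds)
```

In this balanced scheme the number of rounds grows logarithmically with the number of initial subsets, which is what makes the hierarchical strategy attractive on systems with several processing units.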
The aim of our approach is an efficient, complete and reliable linking of the entire image set even for complex configurations comprising wide as well as weak baselines. No additional information like camera configuration, Global Positioning System (GPS) or Inertial Navigation System (INS) is used. An overview of the processing stages is given in Fig. 1.
Input is an (unorganized) image set with an (approximate) internal camera calibration. Based on this, we start with image preprocessing where multi-resolution image pyramids are generated and SIFT features (Lowe, 2004) are extracted using GPU-acceleration (Wu, 2012). In the next stage, a graph-based method (Michelini and Mayer, 2016) establishes the links between the images (Section 3) employing a classification-based approach (Michelini and Mayer, 2019) for the detection of critical camera configurations. The wide baseline method (Mayer et al., 2012) is used for geometric verifications providing robustness even in case of large radiometric or geometric image distortions. Based on image linkage, a novel graph-based optimization strategy improves the efficiency of the subsequent hierarchical merging of image subsets (Section 4). In the last stage, relative camera orientations as well as the sparse 3D structure are determined using hierarchical merging (Mayer, 2014). Data exchange between processing stages is accomplished by means of a database.
Section 5 presents results that demonstrate the potential of our SfM approach on real-world datasets and compare it with other state-of-the-art frameworks. Finally, conclusions are given in Section 6.
Section snippets
Related work
Fitzgibbon and Zisserman (1998) as well as Koch et al. (1998) presented pioneering work on SfM in image sequences. Later, Schaffalitzky and Zisserman (2002) showed that automatic SfM is achievable for general camera configurations without additional information (e.g., about the sequence). Snavely et al. (2006) introduced the framework Photo Tourism, which can deal with larger image sets and produce high-quality results. However, it has a high runtime due to the employed
Image linking
Image linking describes the relations between images and, thus, implies knowledge about their overlap (i.e., the projection of the same parts of a scene). The latter is usually derived from feature correspondences, i.e., by means of image matching (Hartmann et al., 2016). Yet, because of the high combinatorial complexity, exhaustive image matching is not practical even for small image sets.
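The combinatorial growth of exhaustive matching can be made concrete with a short calculation: an image set of n images has n(n-1)/2 unordered pairs. The sketch below only counts pairs. The function name is an assumption made for illustration, and the cost of matching a single pair is not modeled.

```python
import math


def exhaustive_pair_count(num_images: int) -> int:
    """Number of unordered image pairs, i.e. n * (n - 1) / 2."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return math.comb(num_images, 2)


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(f"{n} images -> {exhaustive_pair_count(n)} candidate pairs")
```

For 1000 images this already amounts to 499,500 pairs, each of which would require feature matching and geometric verification.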
In addition, knowledge about geometric relations between images is in our case purely based on image
Hierarchical merging of image subsets
Starting from known links in the form of triplets, images are merged to larger image subsets transforming the camera orientations into a common reference frame. We employ the hierarchical merging of Mayer (2014), which allows image subsets to grow independently from each other. This offers the opportunity to utilize parallel architectures by performing merging in parallel. However, while the reduction of 3D points with the aim of reducing the merging runtime was the focus in Mayer (2014), here
Results
The capability of the proposed approach is demonstrated on image sets whose properties are specified in Table 2 and a system whose specification is listed in Table 1. Camera orientations of the image sets, estimated using the proposed approach, are shown in Fig. 6, Fig. 7, Fig. 8, Fig. 9, where colors represent different camera types. The pyramids correspond to camera orientations with the apex of the pyramid giving the camera position and the rotation of the pyramid the camera direction.
Conclusion
In this paper, an automatic SfM approach for (unordered) image sets comprising complex configurations has been presented. Apart from (approximate) internal camera calibration, no other information like GPS or INS data is required.
We proposed a graph-based method allowing for an efficient and unsupervised search for image links even in case of strong image distortions as well as critical camera configurations. In addition, an optimization technique is presented which improves the load balancing
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (64)
- et al. Recent developments in large-scale tie-point matching. ISPRS J. Photogramm. Remote Sens. (2016)
- et al. On the terminal Steiner problem. Inform. Process. Lett. (2002)
- et al. Hierarchical structure-and-motion recovery from uncalibrated images. Comput. Vis. Image Underst. (2015)
- et al. Building Rome in a Day
- et al. Determining an initial image pair for fixing the scale of a 3D reconstruction from an image sequence
- Scheduling Algorithms (2007)
- et al. Learning linear discriminant projections for dimensionality reduction of image descriptors. IEEE Trans. Pattern Anal. Mach. Intell. (2011)
- An improved approximation algorithm for the terminal Steiner tree problem. Comput. Sci. Appl. – ICCSA (2011)
- et al. Fast and accurate image matching with cascade hashing for 3D reconstruction
- et al. Discrete-continuous optimization for large-scale structure from motion
- A survey of hard real-time scheduling for multiprocessor systems. ACM Comput. Surv.
- Structure-and-motion pipeline on a hierarchical cluster tree
- Automatic camera recovery for closed or open image sequences
- Building Rome on a cloudless day
- Improving the efficiency of hierarchical structure-and-motion
- Optimal reduction of large image databases for location recognition
- VocMatch: efficient multiview correspondence for structure from motion
- Efficient structure from motion by graph optimization
- Image webs: computing and exploiting connectivity in image collections
- Reconstructing the World in six days
- Efficient structure from motion with weak position and orientation priors
- The distribution of the flora in the alpine zone. New Phytol.
- Hamming embedding and weak geometric consistency for large scale image search
- A global linear method for camera pose registration
- Match graph construction for large image databases
- A comparison of task pools for dynamic load balancing of irregular algorithms. Concurr. Comput.: Pract. Exp.
- Modeling and recognition of landmark image collections using iconic scene graphs
- Feature detection with automatic scale selection. Int. J. Comput. Vision
- Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision
Cited by (9)
A cluster-based disambiguation method using pose consistency verification for structure from motion
2024, ISPRS Journal of Photogrammetry and Remote Sensing
Robust hierarchical structure from motion for large-scale unstructured image sets
2021, ISPRS Journal of Photogrammetry and Remote Sensing
Citation Excerpt: Therefore, it is challenging to use distributed parallel computing nodes to process large-scale image sets. To solve these issues, a hierarchical and parallelable scheme has been introduced to large-scale SfM, namely HSfM (Chen et al., 2020; Farenzena et al., 2009; Gherardi et al., 2010; Michelini and Mayer, 2020; Toldo et al., 2015). In general, HSfM first divides the images into several reconstruction units containing images covering certain parts of the whole scene, then constructs a partial model with ISfM for each unit, and finally merges partial models into a complete 3D model.
A hybrid global structure from motion method for synchronously estimating global rotations and global translations
2021, ISPRS Journal of Photogrammetry and Remote Sensing
Citation Excerpt: Fig. 19 shows their reconstruction results and the red ellipses illustrate the corresponding visual artefacts. As for Michelini and Mayer (2020), the most attractive virtue of our approach is that nearly 150,000 tie points are additionally reconstructed with a lower mean reprojection error, but Michelini and Mayer (2020) is 3.4 times faster than us, which is mainly caused by two factors: first, less tie points reduce the runtime in the process of bundle adjustment; and second, the parallel technique is utilized on a more powerful machine (2 × Intel® Xeon® E5-2643 v3 (6 cores, 3.40 GHz)) by them. In order to further figure out how far our approach can move forward and explore the limitation, we tested one more dataset, namely campus (provided by Cui and Tan, 2015).
View-graph key-subset extraction for efficient and robust structure from motion
2023, Photogrammetric Record
Geospatial Information Research: State of the Art, Case Studies and Future Perspectives
2022, PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
A review of developments in the theory and technology of three-dimensional reconstruction in digital aerial photogrammetry
2022, Cehui Xuebao/Acta Geodaetica et Cartographica Sinica