Unmanned aerial image dataset: Ready for 3D reconstruction

Unmanned aerial vehicles (UAVs) have become popular platforms for collecting many types of geospatial data for mapping, monitoring and modelling applications. With the advancement of imaging and computing technologies, a wide variety of photogrammetric, computer-vision and, nowadays, end-to-end learning workflows have been introduced to produce three-dimensional (3D) information in the form of digital surface and terrain models, textured meshes, rectified mosaics, CAD models, etc. These 3D products may be used in applications where accuracy and precision play a vital role, e.g. structural health monitoring. Therefore, extensive tests against data with relevant characteristics and reliable ground truth are required to assess and ensure the performance of 3D modelling workflows. This article describes images collected by a customized unmanned aerial vehicle (UAV) system over an open-pit gravel mine, accompanied by additional data that allow implementing and evaluating any structure-from-motion or photogrammetric approach for sparse or dense 3D reconstruction. The dataset includes a total of 158 high-quality images captured with more than 80% endlap and a spatial resolution finer than 1.5 cm; the 3D coordinates of 109 ground control points and checkpoints; the 2D coordinates of more than 40K corresponding points among the images; and a subset of 25 multi-view stereo images selected from an area of approximately 30 m × 40 m within the scene, accompanied by a dense point cloud measured by a terrestrial laser scanner.



Data
This paper describes the images collected by a customized unmanned aerial vehicle (UAV) system from an open-pit gravel mine at an approximate altitude of 80 m over an area of 150 m × 200 m (Fig. 1). These images were filtered and analyzed to produce additional data such as corresponding feature points (Table 1), which allow implementing and evaluating any structure-from-motion or photogrammetric approach meant to perform sparse three-dimensional (3D) reconstruction. Accurate 3D control points and checkpoints (Table 2) are provided to allow testing indirect geo-referencing techniques and validating the accuracy of 3D modelling. A dense point cloud captured by terrestrial lidar (Fig. 2) is provided to allow evaluating dense stereo or multi-view reconstruction techniques.

Value of the data
Researchers in the fields of photogrammetry engineering, computer vision, and geography can use this dataset for:
- Evaluating robust sparse matching algorithms
- Evaluating bundle adjustment approaches
- Evaluating multi-view stereo dense reconstruction algorithms
- Evaluating incremental structure-from-motion techniques

Acquiring data
The imagery was acquired from an electric-powered helicopter (Responder, ING Robotic Aviation Inc., Ottawa, ON, Canada). A 16-megapixel camera (GE4900C, Prosilica, Exton, PA, USA) was used to capture the images. It had an approximate sensor size of 36 × 24 mm and a 35 mm F-adjustable lens. The camera was synchronized with the navigation sensors, a GPS-aided inertial navigation system (INS) (MIDGII, Microbotics Inc, Hampton, VA, USA). Control software was developed to store the data, to set the acquisition parameters of the camera and the INS (e.g., the exposure time, F-number, and triggering interval), and to geo-tag the images. This software solution ran on an ultra-small, single-board system (CoreModule 920, ADLINK, San Jose, CA, USA) on board the UAV and was remotely controlled from the ground station. The data were acquired over several stockpiles in a gravel pit at Sherbrooke, QC, Canada. A simple software solution was developed and used to plan both the flight mission and the field survey. Several factors, such as the geometric configuration of the photogrammetric network and the distribution of control data, were taken into account to plan the data collection. A large number of targets were installed and measured using a high-precision, dual-frequency real-time kinematic (RTK) GNSS system (R8 GNSS System, Trimble, Sunnyvale, CA, USA) as well as a robotic total station (VX, Trimble, Sunnyvale, CA, USA). They serve as ground control points (GCPs).
Over one zone, ten dense 3D point clouds were collected by a terrestrial laser scanner (Focus 3D, FARO Technologies). To accurately co-register the individual scans, approximately 40 signalized targets with checkerboard patterns and reference spheres were installed and accurately measured. The readers are referred to [2] for further details about the data-acquisition experiments.

Data formatting
This dataset includes:
i) A total of 158 high-quality images captured via an industrial-grade camera equipped with a sensor of size 36 mm × 24 mm at a pixel size of 7.4 μm and a 35 mm F-adjustable lens. The spatial resolution of this imagery is approximately 1.5 cm on the flat zones of the scene;
ii) Approximate exterior orientation parameters of the images measured by the navigation sensors and filtered for gross errors;
iii) The 3D coordinates of 109 ground control points collected by a high-precision, dual-frequency RTK GNSS system and a robotic total station, as well as their 2D coordinates observed in the images;
iv) The 2D coordinates of more than 40K corresponding points among the images, ensuring the feasibility of accurate multi-view stereo analysis;
v) A dense point cloud acquired by a terrestrial laser scanner over an area of approximately 30 m × 40 m within the scene, which can be used as reference data for developing and testing high-density stereo matching techniques.
The following paragraphs provide the formatting details of each component of this dataset.

Images, direct geo-referencing data, time data
Uncompressed images are in TIFF format and are located in the folder "RawImages". Labels of the images start at 1 and continue incrementally to 158. Due to their large volume, the images are available through an online storage space. The approximate exterior orientation parameters (EOPs) of the images were recorded via a low-cost GNSS/IMU system. These observations were refined and are provided in the text file "DirectEOPs.txt". This file consists of seven columns. The first column represents the image label. The second, third and fourth columns represent the 3D coordinates of the perspective center of the camera (3-vector C in Equation (1)). The coordinates are expressed in meters in the projected coordinate reference system, NAD 83/MTM Zone 7. The elevations are ellipsoidal heights. The fifth, sixth and seventh columns represent the rotation angles that align the object coordinate system with the principal image coordinate system (Tait-Bryan Euler angles ω, φ, κ in Equation (2)). The unit of these angles is decimal degrees. Since the images were captured in a high-overlap sequence, one can also use them in deterministic simultaneous localization and mapping algorithms or in sequential structure-from-motion. To this end, one may need the timestamp of each image, which corresponds to the exposure time of the image. The timestamps are provided in the file "TimeStamps.txt" and are expressed in seconds up to microsecond precision.
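As a minimal sketch of reading the direct geo-referencing data, the file can be parsed as whitespace-delimited rows of seven values (image label, C, and the three angles); the sample row below is made-up for illustration and is not taken from the dataset.

```python
# Minimal sketch: parse "DirectEOPs.txt" into per-image records.
# Assumed layout (from the article): label X Y Z omega phi kappa,
# whitespace-delimited; coordinates in metres (NAD 83 / MTM Zone 7),
# angles in decimal degrees.

def parse_direct_eops(lines):
    """Return {image_label: (C, angles)} with C = (X, Y, Z) in metres
    and angles = (omega, phi, kappa) in decimal degrees."""
    eops = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 7:
            continue  # skip headers or malformed rows
        label = int(parts[0])
        x, y, z, omega, phi, kappa = map(float, parts[1:])
        eops[label] = ((x, y, z), (omega, phi, kappa))
    return eops

# Illustrative (fabricated) row, not real dataset values:
eops = parse_direct_eops(["1 305412.731 5027310.552 312.884 1.25 -0.80 95.40"])
```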
u ≃ K · R · [ I₃ | −C ] · X        (1)

R = R_z(κ) · R_y(φ) · R_x(ω)        (2)

where:
u: homogeneous coordinates of the image point.
X: Euclideanly normalized homogeneous coordinates of the object point.
K: camera intrinsic calibration matrix.
C: coordinates of the perspective center of the camera in the world coordinate system.
R: rotation matrix describing the rotation of the image coordinate system w.r.t. the world coordinate system, defined as in Equation (2).
ω, φ, κ: rotation angles about the x-, y- and z-axes of the world coordinate system.
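A hedged sketch of this projection model follows: it builds R from the Tait-Bryan angles and projects a world point through K R (X − C). The rotation order R_z(κ)·R_y(φ)·R_x(ω) is an assumption and should be verified against the dataset before relying on it.

```python
import math

# Sketch of the collinearity mapping u ~ K R [I | -C] X, assuming the
# rotation order R = R_z(kappa) R_y(phi) R_x(omega) (angles in degrees).

def rotation_matrix(omega, phi, kappa):
    o, p, k = (math.radians(a) for a in (omega, phi, kappa))
    rx = [[1, 0, 0], [0, math.cos(o), -math.sin(o)], [0, math.sin(o), math.cos(o)]]
    ry = [[math.cos(p), 0, math.sin(p)], [0, 1, 0], [-math.sin(p), 0, math.cos(p)]]
    rz = [[math.cos(k), -math.sin(k), 0], [math.sin(k), math.cos(k), 0], [0, 0, 1]]
    def matmul(a, b):
        return [[sum(a[i][t] * b[t][j] for t in range(3)) for j in range(3)]
                for i in range(3)]
    return matmul(rz, matmul(ry, rx))

def project(K, R, C, X):
    """Project world point X (3-tuple) to dehomogenized image coordinates."""
    d = [X[i] - C[i] for i in range(3)]                               # X - C
    cam = [sum(R[i][j] * d[j] for j in range(3)) for i in range(3)]   # R (X - C)
    u = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]   # K R (X - C)
    return (u[0] / u[2], u[1] / u[2])
```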

Corresponding points
Coordinates of corresponding points are provided in the text file "TiePoints.txt". This text file includes four columns. The first column is the index of the image in which the point is observed (an integer between 1 and 158, inclusive). The second column is the index of the point. The third and fourth columns are the 2D coordinates of the point in the image, expressed in a coordinate system with its x-direction aligned with the image width, its y-direction aligned with the image height, and its origin at the top-left corner of the image. The approximate precision of these image observations is 0.3 pixels in both the x- and y-directions. Equation (3) relates these coordinates (u) to the coordinates in the image principal coordinate system, a.k.a. the camera coordinate system (x), where the x-direction is aligned with the image width, the y-direction is aligned with the image height pointing towards the image top, and the origin is at the center of the image:

x = [ s  0  −s·w/2 − x_p ;  0  −s  s·h/2 − y_p ;  0  0  1 ] · u        (3)

where:
x: homogeneous coordinates of the point expressed in the image principal coordinate system.
s: pixel size in mm.
f: principal distance in mm.
w, h: image width and height in pixels.
x_p, y_p: offsets of the principal point in mm.
a_x, a_y: lens and sensor distortions as defined in Equation (4):

a_x = x̄·(k₁r² + k₂r⁴ + k₃r⁶) + p₁·(r² + 2x̄²) + 2p₂·x̄·ȳ + b₁·x̄ + b₂·ȳ
a_y = ȳ·(k₁r² + k₂r⁴ + k₃r⁶) + p₂·(r² + 2ȳ²) + 2p₁·x̄·ȳ        (4)

with x̄ = x − x_p and ȳ = y − y_p, where:
r² = (x − x_p)² + (y − y_p)²: squared radial distance from the image center.
k₁, k₂, k₃: radial lens distortion coefficients.
p₁, p₂: decentering lens distortion coefficients.
b₁, b₂: affine sensor distortion coefficients.
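A hedged sketch of both conversions follows: the pixel-to-camera mapping of Equation (3) and the distortion terms of Equation (4). The distortion form used here is the standard Brown-Conrady radial/decentering model with affine terms, an assumption consistent with the listed coefficients rather than a formula taken verbatim from the article; the numeric arguments in the usage test are illustrative only.

```python
# Sketch of Equations (3) and (4): pixel observation (u, v) from
# "TiePoints.txt" (origin at top-left, y down) -> image principal
# (camera) coordinate system (origin at center, y up, in mm), plus the
# distortion corrections (assumed Brown-Conrady + affine model).

def pixel_to_camera(u, v, s, w, h, xp, yp):
    """Equation (3): pixel coordinates (u, v) -> camera coordinates (mm)."""
    x = s * u - s * w / 2.0 - xp
    y = -s * v + s * h / 2.0 - yp
    return (x, y)

def distortion(x, y, xp, yp, k1, k2, k3, p1, p2, b1, b2):
    """Equation (4): distortion corrections (a_x, a_y) at camera point (x, y)."""
    xb, yb = x - xp, y - yp
    r2 = xb * xb + yb * yb                           # squared radial distance
    radial = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3   # radial polynomial
    ax = xb * radial + p1 * (r2 + 2 * xb * xb) + 2 * p2 * xb * yb + b1 * xb + b2 * yb
    ay = yb * radial + p2 * (r2 + 2 * yb * yb) + 2 * p1 * xb * yb
    return (ax, ay)
```

For example, the image-center pixel maps to (−x_p, −y_p), and with all coefficients zero the distortion vanishes.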

Accurately estimated 3D coordinates of the tie points, the EOPs, and the intrinsic calibration parameters
The 3D coordinates of the tie points and the EOPs of the images were estimated and refined through a robust, sparse, self-calibrating bundle adjustment approach. The 3D coordinates of the tie points are provided in the file "ObjectPoints.txt", which includes four columns. The first column is the point index, and the last three columns are the coordinates of the point, expressed in meters in the projected coordinate reference system, NAD 83/MTM Zone 7. The elevations are ellipsoidal heights. The estimated EOPs are included in the file "RefinedEOPs.txt", which has the same format as the file "DirectEOPs.txt". The intrinsic calibration parameters are included in the file "RefinedIOPs.txt". The parameters are described in the text file itself and correspond to Equations (3) and (4).

Ground control points
The ground-truth coordinates of 109 tie points were measured via surveying tools and are provided in the file "GCPs.txt". This file has the same format as the file "ObjectPoints.txt". Considering the surveying method, an approximate precision of 2–5 mm can be assumed for these observations. The GCPs are labeled with negative integer numbers so that they can be easily identified in the tie-point observation file "TiePoints.txt".
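The negative-label convention makes it easy to separate GCP observations from ordinary tie points; a minimal sketch, with made-up sample rows, follows.

```python
# Sketch: split rows of "TiePoints.txt" into GCP observations (negative
# point labels) and regular tie points. Sample rows are fabricated for
# illustration; real rows come from the dataset file.

def split_gcp_observations(rows):
    """rows: iterable of (image_idx, point_idx, x, y) tuples."""
    gcps, ties = [], []
    for row in rows:
        (gcps if row[1] < 0 else ties).append(row)
    return gcps, ties

rows = [(1, -3, 1204.5, 877.2), (1, 152, 2310.0, 1544.8), (2, -3, 998.1, 640.3)]
gcps, ties = split_gcp_observations(rows)
```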

Image pairs and ground-truth dense point cloud
If one is interested in testing a dense multi-view stereo reconstruction approach on the images of this dataset, we recommend using the subset of images whose indices are provided in the text file "DenseMatching.txt". These images cover a stockpile that was accurately and densely surveyed with a 3D terrestrial laser scanner (Fig. 2). Therefore, after dense matching, one can evaluate the generated depth maps against this point cloud. The point-cloud data are provided in the file "LaserScan.xyz", which includes three columns corresponding to the 3D coordinates of the scanned points in NAD 83/MTM Zone 7. The elevations are ellipsoidal heights.
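One simple evaluation metric is the distance from each reconstructed point to its nearest reference point in the laser scan. The sketch below uses brute force with fabricated coordinates; for the full 8.3-million-point reference cloud a spatial index (k-d tree or octree) would be needed in practice.

```python
import math

# Sketch: per-point nearest-neighbour distance between a reconstructed
# point set and the reference scan (e.g. loaded from "LaserScan.xyz").
# Brute force is for illustration only; use a spatial index at scale.

def nn_distances(reconstructed, reference):
    """For each reconstructed point, distance to its nearest reference point."""
    return [min(math.dist(p, q) for q in reference) for p in reconstructed]

# Fabricated example coordinates:
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rec = [(0.0, 0.0, 0.1), (0.9, 0.0, 0.0)]
dists = nn_distances(rec, ref)
```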

Pre-processing
View-clustering is performed, and image pairs with sufficient overlap and proper geometric configuration are identified. Initial image matching is performed using a modified SIFT algorithm. A robust method based on an evolutionary algorithm is used to estimate the two-view epipolar geometry of image pairs and identify the inlier corresponding points [3]. Then, a sequential approach is applied to convert the two-view epipolar-geometry parameters between image pairs first to relative orientation parameters (ROPs) and then to exterior orientation parameters. The initial 3D coordinates of the tie points are reconstructed as well. The initial estimates are optimized using a robust, sparse, self-calibrating bundle adjustment, as a result of which the remaining large errors among the observations of the corresponding points are removed [4]. After this process, the average frequency of the tie points (i.e., the number of images a tie point is observed in) is 11 images, and the average density of the images (i.e., the number of tie points observed in an image) is 261 points. The total number of observations is 41,346 points.
For a subset of 25 images taken over an area of 30 m × 40 m (one stockpile), the estimated intrinsic calibration parameters of the camera and the EOPs of the images can be used to rectify corresponding image pairs to normal stereo images [1]. The ground-truth point cloud provided in this dataset was generated by co-registering ten scans. Registration was performed using a total of 43 accurately surveyed signalized targets (spheres and checkerboard patterns). The point cloud comprises over 8,300,000 points.