Dataset of thermal and visible aerial images for multi-modal and multi-spectral image registration and fusion

This article presents a dataset of thermal and visible aerial images of the same flat scene at the Melendez campus of Universidad del Valle, Cali, Colombia. The images were acquired using a UAV equipped with either a thermal or a visible camera. The dataset is useful for testing techniques for the improvement, registration and fusion of multi-modal and multi-spectral images. The dataset consists of 30 visible images and their metadata, 80 thermal images and their metadata, and a visible georeferenced orthoimage. The metadata of every image contains the WGS84 coordinates for geolocating the image. The homography matrices between every image and the orthoimage are also included in the dataset. The images and homographies are compatible with the well-known assessment protocol for detection and description proposed by Mikolajczyk and Schmid [1].


Data
The dataset consists of 30 visible and 80 thermal images of a planar scene covering an area of 1.54 ha at Universidad del Valle, Colombia (−76.536 E, 3.378 N). The images are compressed in JPG format, and their WGS84 position is included in each file header. The dataset also includes one visible georeferenced orthoimage and the homography matrices between the orthoimage and every thermal and visible image. Table 1 presents the file and folder organization of the dataset. Table 2 and Table 3 present the main specifications of the equipment used to capture the images. Table 4 presents the approximate weather conditions while capturing the images. Fig. 1 shows two thermal and two visible images of the dataset. Fig. 2 shows the photogrammetric flights that were performed for capturing the images. Fig. 3 shows the visible orthoimage. Code 1 shows the Matlab function used to write the homographies between the images.

Specifications Table
Subject: Computer Vision and Pattern Recognition
Specific subject area: Visible and thermal image registration and fusion
Type of data: Images, text files, metadata and one orthoimage
How data were acquired: Thermal camera Zenmuse XT

Value of the Data
This dataset presents thermal and visible aerial images of a flat scene with their respective geospatial data, a visible georeferenced orthoimage, and the homography matrices between the images and the orthoimage, which are useful for other researchers to assess and develop new techniques for the improvement, registration and fusion of thermal and visible images.
Using this dataset, researchers in the fields of computer vision, remote sensing and pattern recognition can develop, improve and test matching methods between multi-modal and multi-spectral images. The homography matrices included in this dataset can be used to assess new registration processes focused on images of different wavelengths. These homographies can also be used as an initial approximation to generate either visible or thermal orthoimages, allowing fusion methodologies to be assessed.

Experimental design, materials, and methods
For capturing the images of the dataset, the following materials and equipment were used: a Zenmuse XT thermal camera (specifications in Table 2), a Zenmuse X3 RGB camera (specifications in Table 2), a Matrice 100 UAV (specifications in Table 3), and the DJI Go application software for planning the flights.
First, the visible images were acquired in a photogrammetric flight with an approximate overlap ratio of 80% (~28 m between consecutive images) in both the longitudinal and transverse directions, using the Zenmuse X3 camera on board the Matrice 100 UAV flying at an approximate altitude of 80 m and a speed of 6.4 m/s, on Jan-17-2019 at 16:16 hours (GMT-5). Then, the thermal images were acquired in a photogrammetric flight with an approximate overlap ratio of 90% (~12 m between consecutive images) in both the longitudinal and transverse directions, using the Zenmuse XT camera on board the same UAV flying at an approximate altitude of 100 m and a speed of 6.4 m/s, on Jan-17-2019 at 16:40 hours (GMT-5). Fig. 1 shows two thermal images and two visible images captured in their respective flights. The photogrammetric flights for the visible and thermal acquisitions (see Fig. 2) were configured using the DJI Go app. For the visible images, an approximate area of 1.57 ha was covered by 6 flight lines oriented north-south and separated by ~28 m; the total flight length for capturing the visible images was 857 m. For the thermal images, an approximate area of 0.86 ha was covered by 10 flight lines oriented north-south and separated by ~12 m; the total flight length for capturing the thermal images was 1009 m. The weather conditions when acquiring the images are listed in Table 4.
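As a rough consistency check of the acquisition geometry, the spacing between consecutive images follows from the camera's ground footprint and the desired overlap: spacing = footprint × (1 − overlap). The sketch below illustrates this relation; the field-of-view values are illustrative assumptions, not taken from the camera specifications in Tables 2 and 3:

```python
import math

def line_spacing(altitude_m, fov_deg, overlap):
    """Ground footprint of one image (flat terrain, nadir view) and the
    image spacing that yields the requested overlap ratio."""
    footprint = 2 * altitude_m * math.tan(math.radians(fov_deg) / 2)
    return footprint, footprint * (1 - overlap)

# Visible flight: 80 m altitude, 80% overlap (FOV assumed for illustration).
fp_vis, sp_vis = line_spacing(80, 80.0, 0.80)

# Thermal flight: 100 m altitude, 90% overlap (FOV assumed for illustration).
fp_th, sp_th = line_spacing(100, 45.0, 0.90)

print(f"visible: footprint {fp_vis:.1f} m, spacing {sp_vis:.1f} m")
print(f"thermal: footprint {fp_th:.1f} m, spacing {sp_th:.1f} m")
```

With the real FOV of each camera, this reproduces the ~28 m (visible) and ~12 m (thermal) spacings quoted above.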
After capturing the visible images and their respective metadata, the Agisoft Metashape software was used to generate a georeferenced orthoimage (see Fig. 3), a distortion-free representation with uniform scale over the complete scene. Agisoft uses SIFT to match keypoints across a set of grayscale images and optimization algorithms to compute the relative camera locations and a point cloud, which allow the images to be reprojected to generate the orthoimage, georeferenced in WGS84 [Semyonov, 2011].
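Because the orthoimage is georeferenced, each pixel maps to WGS84 coordinates through a six-parameter affine geotransform, as used by common geospatial tools. A minimal sketch of that mapping follows; the origin and pixel-size values are illustrative placeholders near the campus coordinates, not the actual metadata of the orthoimage:

```python
def pixel_to_wgs84(col, row, geotransform):
    """Map a pixel (col, row) to (lon, lat) using a GDAL-style
    six-parameter affine geotransform
    (x origin, x pixel size, row rotation, y origin, column rotation, y pixel size)."""
    x0, dx, rx, y0, ry, dy = geotransform
    lon = x0 + col * dx + row * rx
    lat = y0 + col * ry + row * dy
    return lon, lat

# Illustrative geotransform: upper-left corner near the scene
# (-76.536 E, 3.378 N) with an assumed ground sampling distance.
gt = (-76.5370, 4.5e-7, 0.0, 3.3790, 0.0, -4.5e-7)
lon, lat = pixel_to_wgs84(1000, 2000, gt)
print(lon, lat)
```

North-up orthoimages have zero rotation terms and a negative y pixel size, since row indices grow southward.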
The scene is considered to be mostly planar because only ~9% of the visible orthoimage area was covered by 3D objects higher than 1 m, while the flight altitudes were ~80.9 m for the visible images and ~100.4 m for the thermal ones. Therefore, the images can be related by homography matrices. The homography matrices between the orthoimage and every thermal and visible image are approximated as affine transformations. The homographies were computed using Code 1, which allows the user to manually select at least 12 point correspondences between the orthoimage (reference image) and each thermal or visible image (target image). The selected points are then refined using cross-correlation. Finally, an affine transformation is fitted to the refined points and saved as the homography matrix. The obtained homographies are compatible with the well-known assessment protocol for detection and description that was proposed by Mikolajczyk and Schmid [1] and has been widely used to evaluate the performance of local descriptors with images in the same spectrum [2–4].
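The affine-fitting step described above can be reproduced from the point correspondences alone. The sketch below is a minimal NumPy re-implementation (not the Matlab Code 1 shipped with the dataset): it fits the six affine parameters to matched point pairs by least squares, as one would after the manual selection and cross-correlation refinement steps:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src points to dst points.
    src, dst: (N, 2) arrays of matched points, N >= 3.
    Returns a 3x3 homography-style matrix with last row [0, 0, 1]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    # Design matrix: alternating rows for the u and v equations,
    # each of the form a*x + b*y + c.
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src
    A[1::2, 5] = 1.0
    b = dst.reshape(-1)          # interleaved [u0, v0, u1, v1, ...]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.vstack([params.reshape(2, 3), [0.0, 0.0, 1.0]])

# Synthetic check: recover a known rotation + scale + translation
# from 12 noiseless correspondences.
theta = np.deg2rad(10)
H_true = np.array([[1.2 * np.cos(theta), -1.2 * np.sin(theta), 30.0],
                   [1.2 * np.sin(theta),  1.2 * np.cos(theta), -15.0],
                   [0.0, 0.0, 1.0]])
pts = np.random.default_rng(0).uniform(0, 640, size=(12, 2))
pts_h = np.hstack([pts, np.ones((12, 1))])
mapped = (H_true @ pts_h.T).T[:, :2]
H_est = fit_affine(pts, mapped)
print(np.allclose(H_est, H_true, atol=1e-9))
```

Using 12 correspondences instead of the minimum 3 overdetermines the system, so the least-squares fit averages out small localization errors in the selected points.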