Automated georeferencing of Diwata-2 multispectral imagery using feature matching

Diwata-2 is a 56 kg optical microsatellite that carries several optical payloads, including the medium-resolution Spaceborne Multispectral Imager (SMI) and the high-resolution High Precision Telescope (HPT). This research aims to develop an automated georeferencing workflow for coregistered multiband HPT and SMI imagery, which have ground sample distances (GSD) of 5 meters and 127 meters, respectively. Georeferencing uses a combination of the FAST and SIFT feature matching algorithms: HPT imagery is matched using SIFT detectors and descriptors, while SMI imagery uses FAST detectors with SIFT descriptors. The research used freely available Landsat-8 imagery, processed into cloud-free mosaics, as reference data. Accuracy assessment is automated using the AROSICS software, which performs geometric correction and calculates the RMSE of local shifts based on image tie points that are automatically generated using the Diwata image as the secondary image and Landsat-8 as the reference image. The feature matching algorithms achieved high accuracy, with average RMSE values of 2.55 meters for HPT imagery and 65.96 meters for SMI imagery. Both RMSE values are close to half the GSD of the respective sensors, indicating sub-pixel precision. Future work on this method includes additional keypoint filtering to further increase the accuracy of the feature matching.


Introduction
As part of the roadmap to boost the country's capacity in space science and technology and to leverage the use of space data for environmental monitoring, Diwata-2, the Philippines' second 56 kg earth-observing optical microsatellite, was launched and deployed into orbit on October 29, 2018. Diwata-2 carries multiple imaging payloads: the Spaceborne Multispectral Imager (SMI), High Precision Telescope (HPT), Middle-Field Camera (MFC), Wide-Field Camera (WFC), and Enhanced Resolution Camera (ERC). Among these payloads, the SMI and HPT are mainly used for scientific earth observation. The image characteristics of these two payloads can be seen in table 1. To fully utilize the data from these relatively new sensors for earth observation, georeferencing, or the assignment of geographical coordinates to each image, is imperative. Georeferencing of microsatellite images can be done using rational polynomial coefficients or Ground Control Points (GCPs). For accurate assignment of geographical coordinates to images, manual selection of GCP pairs between a reference image and a secondary image is commonly done. However, this is a slow and tedious process that can cause a bottleneck when generating remote sensing products. Furthermore, the manual process of designating multiple GCPs may be prone to human errors, which may have detrimental implications for higher-level data processing such as image classification and data fusion. Previous research in this field has shown that the use of feature matching to automatically generate GCPs between two images, using a mixture of feature matching descriptors and detectors, can greatly improve the keypoint count and accuracy. A mixture of the FAST detector [7] and SIFT descriptor [5] has shown promising results for Diwata-2 SMI imagery, but the success rate and half-pixel requirement were not met for it to be operationalized.

Figure 1. Proposed georeferencing workflow for Diwata-2 imagery.
Work done by Olvera et al. [6] has shown that incorporating image preprocessing methods such as Contrast Limited Adaptive Histogram Equalization (CLAHE) resulted in up to a 2566.2% increase in the number of keypoints matched between the reference and secondary image.
In this study, an automated georeferencing method using free and open-source feature matching algorithms will be implemented, with Diwata-2 imagery as the secondary image and Landsat-8 bands as the reference image. Landsat-8 provides multispectral bands with a ground sampling distance (GSD) of 30 meters as well as a panchromatic band with a GSD of 15 meters. Landsat-8 was chosen as the reference data to maximize Diwata-2's compatibility with other satellite data: Landsat has a long-running history of satellite imagery and is often used as reference data by many satellite providers. Two revisions of the method by Tupas et al. [9] will be made. First, CLAHE will be applied during image preprocessing, which can increase the number of keypoints between the images; and instead of performing feature matching on every band, the images will be converted to grayscale and only one instance of feature matching will be performed between the grayscale reference and Diwata-2 images. Second, the quality threshold will be revised from 50% of the sensor GSD to 60% of the GSD. This accounts for the increase in sampling points and the increased variability during automated sampling. The workflow of the proposed method is shown in figure 1.

Figure 2. (a) HPT images may have minimal or no overlap, which means they are treated as individual images. (b) SMI imagery has large overlaps, which allows it to be processed into a single image.

Materials and Methodology
Test data includes a total of 22 images with varying land features, such as desert, urban, and forested areas, from around the world. It should be noted that the HPT and SMI sensors utilize different methods for image acquisition. While the HPT sensor captures imagery for all bands simultaneously, the SMI sensor captures only one visible band and one infrared band at a time before switching to the next set of bands. This leads to a staggered acquisition with large spatial overlaps, which allows the images to be coregistered and mosaicked into a single image, as seen in figure 2b. HPT images often have minimal or no overlap between acquisitions, as seen in figure 2a. This means that scenes in HPT missions will be treated as separate images, while scenes in SMI missions will be treated as a single scene, resulting in a larger sample size for HPT imagery compared to SMI imagery. These scenes are identified by their unique timestamps. Table 2 and table 3 show the location and mission details of the test data used in this paper.

Landsat-8 Reference Data
Cloud-free reference data is loaded from a global-scale Landsat-8 grid based on the Worldwide Reference System used by Landsat. There are two separate sets of grid data: a global grid with a 15 meter GSD using band 8, and a global grid with a 30 meter GSD using bands 2, 3, and 4. These two sets account for the high-resolution HPT sensor, which is matched to band 8, and the low-resolution SMI sensor, which is matched to a grayscale image of bands 2, 3, and 4. A cloud-free scene is generated by creating a median composite image from 10-20 Landsat scenes using per-band median values [4]. Images used for the composite should represent the seasonal characteristics that occur most frequently in the scene.
Otherwise, median composite scenes should be prepared for the different seasons that the local scene may encounter, including conditions such as seasonal snow cover. The final cloud-free tile is added to the global Landsat-8 grid; this is done for both sets of grids. A vector shapefile representing the Diwata-2 scene extent is used to select intersecting tiles from the grid, and the polygon is then used to mask out the rest of the Landsat-8 scene to maximize feature matching performance and minimize processing time.
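The per-band median compositing step described above can be sketched in a few lines of numpy. This is an illustrative implementation, not the authors' actual processing chain: it assumes cloud and nodata pixels have already been masked to NaN, and the function name and array layout are ours.

```python
import numpy as np

def median_composite(scenes):
    """Per-pixel, per-band median over a stack of coregistered scenes.

    scenes: list of arrays shaped (bands, rows, cols), with cloud/nodata
    pixels set to NaN so they are ignored by the median.
    """
    stack = np.stack(scenes)          # (n_scenes, bands, rows, cols)
    return np.nanmedian(stack, axis=0)
```

With 10-20 input scenes, the median suppresses transient bright (cloud) and dark (shadow) values at each pixel, yielding a largely cloud-free tile.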

Image Preprocessing
In this study, test images from SMI and HPT are already co-registered and stacked using a combination of phase cross correlation for global image alignment with sub-pixel precision and the AROSICS software to correct local geometric distortions that may occur due to sensor movement [3]. The reference and Diwata-2 data are converted to grayscale using the luminance method [2], which uses the BT.601 coefficients as seen in equation 1:

Y = 0.299R + 0.587G + 0.114B, (1)

where Y is the luminance of the image, R is the red band, G is the green band, and B is the blue band. A 3x3 median filter is then applied to reduce noise and smooth outlier pixels. Subsequently, a CLAHE histogram modification is applied to each reference and secondary pair to boost the number of keypoints. Image values are then scaled to an 8-bit range due to feature matching algorithm requirements. If the Diwata-2 image was taken during a descending pass, it is suggested to rotate the image to north-up to maximize the potential keypoints generated between the reference and Diwata-2 image.
To preserve the original DN values in the image, a separate copy of the image will be used where all the preprocessing will be done specifically to enhance the image for feature matching purposes. The geotransform data calculated from feature matching will then be applied to the original unmodified Diwata-2 imagery.
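The preprocessing steps above (equation 1, the 3x3 median filter, and the 8-bit rescale) can be sketched in pure numpy. This is a minimal illustration rather than the study's exact code: the function names are ours, and the CLAHE step (in OpenCV, `cv2.createCLAHE`) and the descending-pass rotation are omitted.

```python
import numpy as np

def bt601_luminance(r, g, b):
    # Equation 1 (ITU-R BT.601): Y = 0.299 R + 0.587 G + 0.114 B
    return 0.299 * r + 0.587 * g + 0.114 * b

def median3x3(img):
    # 3x3 median filter to smooth outlier pixels (edges handled by padding)
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    windows = [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0)

def to_uint8(img):
    # Linear rescale to the 8-bit range required by the feature matchers
    lo, hi = float(img.min()), float(img.max())
    return np.round((img - lo) / max(hi - lo, 1e-9) * 255.0).astype(np.uint8)
```

These run on the disposable copy of the image only; the original DN values are left untouched, as described above.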

Feature Matching
For the HPT images, the reference imagery will use the higher resolution panchromatic band of Landsat-8 and the HPT imagery will be converted to grayscale based on the blue, green, and red bands. Meanwhile, the SMI images will use the 30 m blue, green, and red bands of Landsat-8 as reference. Both images are converted to grayscale during the SMI feature matching process. The grayscale images are ingested into the feature matching algorithm to generate keypoints between the image pairs that act as GCPs.
Feature matching is done using the OpenCV library in Python. A combination of the FAST detector and SIFT descriptor is used for SMI imagery, while the SIFT detector and SIFT descriptor are used for HPT imagery. Keypoint filtering is done using Lowe's ratio test [5], with a value between 0.6 and 0.7 depending on the scene. An edge buffer is also used, removing keypoints close to the edge of the image; this eliminates false keypoints generated along the boundary between the no-data area and the image itself. The last keypoint filtering method calculates the length and angle of each keypoint pair and filters out outliers. This is based on the assumption that most keypoints are valid, so if the x and y coordinates of the keypoint pairs are plotted, each pair should have similar angles and distances. Any keypoint pair that deviates from the average values is filtered out. After filtering, only high-quality keypoints are left, as seen in figure 3.
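The length-and-angle outlier filter can be illustrated as follows. This is a sketch of the idea rather than the study's exact code: the function name, the z-score rule, and the threshold of two standard deviations are our assumptions.

```python
import numpy as np

def filter_pairs(ref_pts, sec_pts, z=2.0):
    """Drop keypoint pairs whose displacement length or angle deviates
    from the mean by more than z standard deviations.

    ref_pts, sec_pts: (n, 2) arrays of (x, y) pixel coordinates of the
    matched keypoints in the reference and secondary images.
    """
    d = sec_pts - ref_pts
    length = np.hypot(d[:, 0], d[:, 1])   # displacement magnitude
    angle = np.arctan2(d[:, 1], d[:, 0])  # displacement direction
    keep = np.ones(len(d), dtype=bool)
    for v in (length, angle):
        mu, sd = v.mean(), v.std()
        if sd > 0:
            keep &= np.abs(v - mu) <= z * sd
    return ref_pts[keep], sec_pts[keep]
```

Because most matches are assumed valid, the mean displacement is dominated by correct pairs, and mismatches stand out as outliers in either length or direction.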
The column and row coordinates in the image are transformed into coordinate data using the reference data's geotransform information using Rasterio's transform function. The geotransform that is computed from these keypoints is then applied to all individual bands in the Diwata-2 stack. A new multiband stack is created from the images with the new geotransform data. Applying the same geotransform to all the bands ensures the overall alignment of the Diwata-2 coregistered stack is not affected.
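The pixel-to-map conversion performed by Rasterio's transform function reduces to a 2D affine mapping; a minimal sketch follows, with a helper name of our own choosing.

```python
def pixel_to_map(col, row, gt):
    """Map (col, row) pixel coordinates to map coordinates using an
    affine geotransform gt = (a, b, c, d, e, f), in the coefficient
    order used by rasterio's Affine:

        x = a*col + b*row + c
        y = d*col + e*row + f

    For a north-up image, a is the pixel width, e the negative pixel
    height, and (c, f) the upper-left corner coordinates."""
    a, b, c, d, e, f = gt
    return a * col + b * row + c, d * col + e * row + f
```

Applying the single transform computed from the keypoints to every band, as described above, keeps the coregistered stack internally aligned.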

Geometric Correction
Geometric correction is done to fix any small misalignment that may have occurred during the feature matching process. This is done using the AROSICS software [8], which generates a grid on the image and calculates local shifts for each point on the grid. The Landsat-8 image is used as the reference image, and a moving window is used to calculate local image misalignment. The parameters used in this process include a window size of 120 pixels, a grid resolution of 20, and a minimum tie point reliability of 50 during tie point filtering.

Accuracy Assessment
Automated accuracy assessment on the final product is done using the AROSICS software. It performs the same process as geometric correction, as seen in Section 2.3, wherein it calculates local shifts; however, the shift data is not applied. The root mean square error (RMSE) of local shifts, in meters, is calculated to assess overall accuracy. The software uses a grid of filtered high-quality image tie points at which the local shifts are calculated. The automated RMSE value should be at most 60% of the sensor GSD; anything above this value should be manually verified for anomalous shifts. The parameters used for accuracy assessment include a grid resolution of 30, a minimum reliability of 30, and a window size of 120 pixels.
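The RMSE and GSD-ratio check described above can be expressed compactly, assuming the tie-point shifts are available as pixel offsets (the helper is illustrative, not the AROSICS API).

```python
import numpy as np

def shift_rmse(dx_px, dy_px, gsd_m):
    """RMSE in meters of local shift vectors given in pixel units,
    and its ratio to the sensor GSD (the 60% quality threshold)."""
    d = np.hypot(np.asarray(dx_px, float), np.asarray(dy_px, float)) * gsd_m
    rmse = float(np.sqrt(np.mean(d ** 2)))
    return rmse, rmse / gsd_m
```

A scene passes the automated check when the returned ratio is at most 0.6; larger values trigger manual verification.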

Results and Discussion
Results from the accuracy assessment of georeferenced SMI images are shown in table 4, and HPT results are shown in tables 5 and 6. Total sampling points indicate the number of image tie points remaining after the filtering process, which are then used for local image correction. The RMSE of all local shifts is reported in the RMSE column. Lastly, the GSD ratio column indicates the ratio of the RMSE value to the GSD of the sensor, which should ideally not exceed 60%. Results of the proposed method show an overall accurate alignment for both the SMI and HPT sensors, as indicated by the low RMSE values and a manual visual inspection.
The SMI missions have an average RMSE of 83.72 meters and an average GSD ratio of 65.92%, as seen in table 4. This elevated RMSE value and GSD ratio are attributed to an erroneous result for SMI mission ID 3. Despite a very high RMSE value of 148.12 meters, a visual inspection showed that the image was aligned. A possible reason for this error is that the sampling points showing bad misalignment were over an area where the coastline was less defined than in other areas, due to differences in water quality and higher water turbidity. Other areas in the image had low RMSE values. Excluding this error, the average RMSE value is 67.69 meters and the GSD ratio is 53.30%. Another issue, observed in the SMI mission ID 5 data, was the presence of fewer sampling points, possibly due to the amount of cloud and water pixels in the scene. However, despite only having 11 sampling points, these points are well distributed throughout the image and resulted in an RMSE value of 72.3 meters. Furthermore, manual inspection of the scene suggested that the image has relatively good alignment.
The HPT mission over Mexico had good results, with an average RMSE value of 2.39 meters and an average GSD ratio of 47.76%, as seen in table 5. The distribution of sampling points on the HPT scenes over Mexico was observed to be heavily affected by shadows. In particular, the HPT scene with timestamp T201645010 contained several land features with shadow, which may have resulted in higher RMSE values. The Mexico mission largely features hills, agriculture, rivers, and some urban features. Meanwhile, the HPT mission over Dubai had an average RMSE value of 2.72 meters and an average GSD ratio of 54.34%, as seen in table 6. RMSE values varied widely, reaching up to 5.11 meters, or 102.20% of the sensor GSD, though a manual inspection showed that these images were nonetheless aligned. The high-resolution HPT sensor is able to capture individual shadows for each building, and certain sampling points may have been affected by differences in shadows between the scenes, which can result in erroneous shift data. The scenes in Dubai largely consisted of urban and water features, which may have led to more accurate feature matching compared to the HPT Mexico scenes. While the proposed method has a high success rate in terms of achieving sub-pixel accuracy, the current study is subject to several limitations stemming from the use of Landsat-8 data as reference imagery. First, at each point in the grid, a local area is extracted based on the window size parameter of AROSICS. This assumes that the local scene in the Landsat-8 data and the Diwata-2 image features are similar. However, several factors may affect the luminosity of a local area. Diwata-2 is an afternoon satellite while Landsat-8 is a morning satellite. This difference in image acquisition time largely affects the feature matching of Landsat-8 and HPT imagery, which has a relatively higher spatial resolution and smaller coverage, as seen in figure 4.
Most often, not enough sampling points are generated in mountainous areas due to differences in the size and position of shadows. Moreover, the time difference in image acquisition may result in varying cloud coverage, which reduces the number of valid sampling points and consequently degrades the accuracy of the feature matching. These issues can lead to an irregular distribution of sampling points, which may in turn lead to inaccurate assessment results [10]. Due to this limitation, the current study utilizes test images with nearly cloud-free scenes.
For this same reason, some images in the HPT Mission ID 2 were excluded from the test data. In future improvements, this can be addressed with the use of an accurate cloud and cloud shadow mask.