A High-Performance Cloud-based Remote Sensing Data Reprojection Method
IOP Publishing, doi: 10.1088/1755-1315/1004/1/012005

Remote sensing (RS) data are the cornerstone of the digital earth. At present, products from Earth observation satellites are updated frequently and with refined data quality. However, traditional processing methods are implemented mostly on a single machine: when processing large-scale data in batches, they face limitations in computing power and storage, and expansion is relatively cumbersome. This demands a more efficient and scalable computation platform, such as cloud computing. In particular, the reprojection procedure, unlike other data processing procedures, is both computationally intensive and I/O-intensive. This paper proposes a high-performance cloud-based RS data reprojection method called OCRM (optimised cloud-based reprojection method). First, the data processing flow of reprojection was optimised by improving I/O efficiency and computational efficiency. Second, with scheduling and controllers under cloud computation, high-performance massive RS data reprojection was achieved. The overall performance of the RS data reprojection module with the optimised algorithm on a single machine considerably outperformed that of commercial software, and the scheduling module can maximise the use of cloud resources to achieve high-performance reprojection calculations. The results indicate the stable performance of the proposed method, and we are working to extend the structure of OCRM to other computationally intensive remote sensing processes, such as data preprocessing and data mining with deep learning.


Introduction
Currently, RS data are widely used in the fields of earth science, disaster early warning, emergency dispatch, land management, etc. [1] [2]. RS data are one of the most important data sources of the digital earth [3]. The era of remote sensing big data and digital earth big data has arrived, and thus the demand for massive RS data computation has grown rapidly [4] [5]. Following this trend, specific regimes for RS big data preprocessing and cleansing [6] are expected to maintain the structural unity of RS data. Preprocessing includes geometric correction, radiometric correction, spatial and temporal reconstruction, quality evaluation, etc. [7]. Among these, the procedure that transfers multisource RS data of different projections into the same projection coordinate system is called reprojection. Reprojection is a typical computationally intensive and I/O-intensive procedure that is widely used in various preprocessing steps, and its implementation efficiency deeply influences RS data preprocessing efficiency. However, different data providers use different projected coordinate systems, and different fields need RS data in different projection coordinate systems due to factors such as technology stacks, demands, policies and equipment. This mismatch between providers and consumers requires services that can adapt to highly dynamic data delivery demands and rapidly expand service capabilities. Traditional commercial software (such as ArcGIS) supports only frame-level reprojection and runs only on a single machine; it cannot cope with dynamic reprojection demands, process superlarge RS data, or rapidly expand to support large-scale data processing. To address these issues, this paper proposes a high-performance cloud-based RS data reprojection method called OCRM.
To solve the problem caused by computational intensity, this research first optimised the calculation for basic efficiency and highly dynamic demand, which improved the computational efficiency of reprojection computing. To address storage intensity, this research expanded capacity through task scheduling and management, which optimised the migration ability and the efficiency of reprojecting superlarge data. This work is based on our previous work on ScienceEarth [8][9]. Our technology architecture included three layers: the user layer, the compute layer, and the storage layer. The user layer received the user's reprojection demands and sent them to the compute layer. The compute layer used the subdivision method to divide the user's requests into task groups; the resource manager then scheduled those task groups and sent each task to the compute nodes. The compute node was responsible for the reprojection computing procedure. The whole compute layer was based on a storage layer structured by Hadoop [10]. This architectural design distributes both computation and storage. OCRM was implemented in conjunction with the technical improvements described below.

Data read/write optimisation
To combat the I/O inefficiency of random reads during reprojection, we exploited the spatial locality of the projection algorithm (see Appendix A), which improved the aggregation of data reading. We performed a single-point reprojection operation on the boundary required by the user to determine the original data range. This range was then used to read the required image data into memory. All calculations and copies of the data then occurred in memory, and the reprojected data were written to disk after processing. This optimisation changed random access into sequential reads and writes, which decreased the stress on the hard drive and greatly increased I/O speed. To reduce the data size and better adapt to cloud computation, the COG (cloud-optimised GeoTIFF) format was also implemented in this study.
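As a minimal sketch of this idea, the boundary points of the user's request can be reprojected once to derive a single contiguous read window. Here `inv_project` is a hypothetical stand-in for the inverse projection, and a plain array stands in for the source raster:

```python
import numpy as np

def source_window(corners_target, inv_project):
    """Reproject the four target-space corner points into source space
    and return the bounding pixel window (row0, row1, col0, col1)."""
    src_pts = [inv_project(x, y) for x, y in corners_target]
    cols = [int(c) for c, r in src_pts]
    rows = [int(r) for c, r in src_pts]
    return min(rows), max(rows) + 1, min(cols), max(cols) + 1

# Hypothetical identity "inverse projection", for illustration only.
identity = lambda x, y: (x, y)

image = np.arange(100).reshape(10, 10)   # stand-in for a source raster
r0, r1, c0, c1 = source_window([(2, 3), (7, 3), (2, 8), (7, 8)], identity)
window = image[r0:r1, c0:c1]             # one sequential block read
```

All subsequent interpolation then works on `window` in memory, and the result is written back to disk in a single pass.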

Memory addressing optimisation
Since reprojection calculations involve a large number of image band operations, such as data filling and cell interpolation, converting data from AOB (array of band) structures to AOC (array of colour) structures can effectively reduce memory addressing time (all examples in this section use 3 bands). Figure 2 shows a typical AOB structure, which is very friendly for I/O reading and writing and can use the speed of hardware sequential reads and writes to improve I/O efficiency. However, for calculations such as reprojection, which frequently read and write data across bands, AOB structures can cause memory addresses to differ greatly during calculations. As shown in Figure 3, this large difference caused addresses to exceed the range of a single memory page, so the system frequently switched memory pages, incurring additional time overhead.

As shown in Figure 5, after rearranging the data according to the AOC structure, each data address lies near its neighbours. This greatly reduced the likelihood of memory page switches and improved addressing efficiency.
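The AOB-to-AOC rearrangement can be sketched with NumPy; the small 3-band array below is illustrative only:

```python
import numpy as np

bands, height, width = 3, 4, 5           # 3-band example, as in this section
aob = np.arange(bands * height * width).reshape(bands, height, width)

# AOB (band-sequential) -> AOC (pixel-interleaved): all band values of one
# pixel become contiguous in memory, so per-pixel operations such as
# interpolation stay within one memory page.
aoc = np.ascontiguousarray(aob.transpose(1, 2, 0))

pixel = aoc[2, 3]                        # the 3 band values of pixel (row 2, col 3)
```

In the AOB layout the three values of one pixel are a full band plane apart; after the rearrangement they are adjacent bytes.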

Pre-building the environment
Building the environment during computation consumes time. However, the environment needed is stable for the same algorithm, and each computation only needs to change the input data. This feature led us to prebuild the computation environment to compress the reprojection computation time. To ensure the correctness and general applicability of the components, we chose GDAL [11] as the technical base.
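A minimal sketch of the prebuilding idea, using a hypothetical stand-in class rather than GDAL's actual API: the expensive environment is constructed once per projection pair and then reused for every subsequent image.

```python
import functools

class ReprojectionEnv:
    """Stand-in for an expensive, reusable computation environment
    (e.g., a coordinate transformation built from two CRS definitions)."""
    builds = 0
    def __init__(self, src_crs, dst_crs):
        ReprojectionEnv.builds += 1          # count expensive setups
        self.src_crs, self.dst_crs = src_crs, dst_crs

@functools.lru_cache(maxsize=None)
def get_env(src_crs, dst_crs):
    # Built once per (src, dst) pair, then shared by all tasks.
    return ReprojectionEnv(src_crs, dst_crs)

def reproject(image_id, src_crs="EPSG:4513", dst_crs="EPSG:3857"):
    env = get_env(src_crs, dst_crs)          # no rebuild on repeat calls
    return (image_id, env.src_crs, env.dst_crs)

results = [reproject(i) for i in range(100)]
```

Since the environment depends only on the algorithm and projection pair, not on the input image, the per-image setup cost drops to a cache lookup.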

Cloud-based tasks management design
For large-scale RS data, single-machine performance may be constrained; this can be improved by computing in parallel via scheduling. Since the range of user requirements does not often fall within an individual RS image in the underlying data, converting the user's needs into several RS data reprojection subtasks is often needed. We proposed a division method for the reprojection of individual RS data. The HDFS file system and a task scheduling system, designed around the characteristics of reprojection tasks, achieved dynamic expansion of resources and completed oversized reprojections. After the cluster was built according to this study, we proposed the following cloud-based task management methods to ensure the high availability, stability and overall controllability of the system. The overall structure is shown in Figure 6.
Figure 6. The logical structure of the system. The system is divided into two layers: the computing layer and the storage layer. The overall system is driven by the managing node.

Resource use function
To meet the controller's resource management needs, the manager needed to estimate the memory used by a task before running it. This also prevented instability when the physical machine ran out of memory. We found that the memory use of task execution follows these mathematical relationships. The input data are:

Starting coordinates under the target projection: Proj_x, Proj_y
Amount of change per unit under the target projection: Proj_Δx, Proj_Δy
Width and height of the target image: Proj_width, Proj_height
Reprojection function: p(Proj_x1, Proj_y1) = (Ori_x1, Ori_y1)

The bounding box for the target projection can be calculated as follows:

Proj_xEnd = Proj_x + Proj_Δx * Proj_width (1)
Proj_yEnd = Proj_y + Proj_Δy * Proj_height (2)
Proj_minx = min(Proj_x, Proj_xEnd) (3)
Proj_miny = min(Proj_y, Proj_yEnd) (4)
Proj_maxx = max(Proj_x, Proj_xEnd) (5)
Proj_maxy = max(Proj_y, Proj_yEnd) (6)

As shown in Formula (7), the bounding box of the original data is obtained by applying the reprojection function p to the boundary (corner) points of the target projection and taking the minima and maxima of the resulting coordinates, giving Ori_minx, Ori_miny, Ori_maxx and Ori_maxy. Using the original unit change read from the original data, Ori_Δx and Ori_Δy, the required width and height of the original data are:

Ori_width = (Ori_maxx − Ori_minx) / Ori_Δx
Ori_height = (Ori_maxy − Ori_miny) / Ori_Δy

The use of memory resources can then be calculated with the following formula:

memory = (Ori_width * Ori_height + Proj_width * Proj_height) * B * U + C (14)

where B is the number of bands, U is the number of bytes occupied by the data type, and C is the constant memory overhead of the environment (in our environment, C = 2 MB).
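Formula (14) translates directly into code. This is a hedged illustration: the example reuses the 4518 * 4732 image size from the single-machine experiment, with assumed band and data-type values, and assumes a source window of the same size as the target.

```python
def estimate_memory(ori_w, ori_h, proj_w, proj_h, bands, bytes_per_px,
                    env_const=2 * 1024 * 1024):
    """Formula (14): bytes needed to hold the source window plus the
    target image; env_const is C (~2 MB in the paper's environment)."""
    return (ori_w * ori_h + proj_w * proj_h) * bands * bytes_per_px + env_const

# Example: 4518x4732 source, same-size target, 3 bands of uint8 (1 byte each).
need = estimate_memory(4518, 4732, 4518, 4732, bands=3, bytes_per_px=1)
```

The manager compares this estimate against the node's free memory (and the 750 MB subdivision threshold below) before dispatching the task.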

Subdivision method
A task is divided if its estimated memory consumption exceeds the threshold (we set it to 750 MB). A subtask is divided again if it still exceeds the threshold. For the subdivision method, we divide the image according to the configuration, as shown in Figure 7 (the y-axis evenly divided into two equal parts).
By applying this method, the read/write conflict area of the image task is confined to the segmentation boundaries, and the parallel executability of the overall task can be improved in cooperation with the scheduling strategy. After the user's task goes through the division process, if no division occurred, the task is put directly into the computing node pool to wait for the allocation of a computing node. If the split task group has only one layer, all subtasks in this layer are placed into the computing node pool sequentially. If multilayer division occurs, the tasks are divided into n layers of batches, and each batch is then scheduled in the single-layer manner.
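The threshold-driven, layer-by-layer division described above can be sketched as follows. This is a simplified illustration: tasks are modelled as hypothetical (x, y, width, height) windows, and the memory estimator is a stand-in for formula (14).

```python
def subdivide(task, estimate, threshold=750 * 1024 * 1024):
    """Repeatedly halve oversized tasks along the y-axis until every
    subtask's estimated memory use fits under the threshold. Returns a
    list of layers; layer n holds the tasks after the n-th division round."""
    layers, current = [], [task]
    while any(estimate(t) > threshold for t in current):
        nxt = []
        for t in current:
            if estimate(t) > threshold:
                x, y, w, h = t
                nxt += [(x, y, w, h // 2), (x, y + h // 2, w, h - h // 2)]
            else:
                nxt.append(t)
        layers.append(nxt)
        current = nxt
    return layers or [current]

# Hypothetical estimator: 3 bands, 1 byte/pixel, source window == target window.
est = lambda t: 2 * t[2] * t[3] * 3 + 2 * 1024 * 1024
layers = subdivide((0, 0, 20000, 20000), est)
```

A 20000 * 20000 window here needs two division rounds before each subtask fits under the threshold, so the scheduler would dispatch the resulting layers batch by batch.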

Implementation environment introduction
This research has two implementation environments: Table 1 for a single machine and Table 2 for cloud computation.

Single machine efficiency comparison
The single-machine efficiency experiment uses an RS image whose original projection is EPSG:4513 and whose size is 4518 * 4732 pixels, and compares two common usage scenarios. The source code for the efficiency comparison can be found at https://gitee.com/ddyy_1213/tile-extract. The experiment can be reproduced in a Linux environment with Gdal installed.
In Mission 1, our implementation was better than that of ArcGIS, but it was slower than that of Gdal, which will be discussed in Section 4.2.
In Mission 2, our implementation outperformed Gdal and GdalPY by an order of magnitude in time consumption and was almost the same as the C version of Gdal in memory consumption. As can be seen from Table 4, even after the above optimisations, the reprojection operation still accounts for the majority of the overall execution time.

Cloud computation experiment
During the study, we received a request to process the latest satellite remote sensing data from a certain place into usable RS products. We divided the satellite data into three batches, each containing one hundred images. The data volume per image was approximately 200 GB (uint8, 5 bands), and the data characteristics were almost the same. We tested the traditional reprojection method on three images from the first batch; processing took approximately 19 minutes per image on average. Traditional reprojection has difficulty achieving parallel processing, and the processing time for one batch is estimated to be far more than 60 hours when the single hard disk capacity limitation and data transfer time are considered. This makes it difficult to complete the task on time, and the large amount of manual participation required is extremely error-prone. After applying OCRM, we successfully processed the data and provided the required RS products.
In this practice, we used 3 computing nodes, each configured with 7.5 GB of memory, for the reprojection calculations. In the end, it took 57.56 hours to complete the reprojection of all three batches of data.

Optimisation effects
As shown in Figure 9, after the I/O improvement, the execution efficiency on Mission 2 steadily surpassed that of Gdal, but the execution time was still generally more than 50 ms. With the implementation of the prebuilt optimisation, the execution time was reduced to less than 50 ms, but performance was still not stable enough. Finally, after the introduction of the AOC optimisation, the execution time stabilised in the 30-50 ms interval.
Figure 9. Detailed comparison of the optimisation items of the reprojection module. The y-axis is the execution time and the x-axis is the round of executions.
As shown in Table 4, Gdal achieved its overall efficiency by reducing the number of reprojection calculations. In Mission 2, the performance of OCRM was far ahead of Gdal. Therefore, we believe that, purely from the perspective of the reprojection process, the optimisation scheme of this study already provides a substantial improvement.

Cloud computation practice
In the cloud computation practice in Section 3.3, since the total data volume exceeded 120 TB, only two of our storage nodes could safely store the project data. Due to this storage limitation, too many computing nodes would have stalled on I/O speed. After weighing the calculation speed against the I/O speed, we used only 3 computing nodes to process the data. Without the storage limitation, the task execution time could have been shortened by expanding the computing and storage nodes.

Conclusion
This paper proposes a high-performance cloud-based RS data reprojection method called OCRM. First, we optimised the reprojection processing flow for RS data: with the I/O, addressing, prebuilding and other optimisations, the reprojection calculation time is effectively reduced. To ensure the accuracy and versatility of the results, as well as ease of use in engineering, we used the reprojection algorithm provided by Gdal. Second, cloud-based computing is used to optimise the computing efficiency for large amounts of RS data, avoiding the difficulty of expanding computing and storage resources. The experiments show that the overall performance of the optimised algorithm when running the RS data reprojection module on a single machine is considerably better than that of current commercial software, and the dispatch system can maximise the use of cloud resources to achieve high-performance reprojection calculations. This research supports single-machine or multi-node deployment, fully considering the availability issues in different computing environments and the scalability of the overall system. Possible follow-up optimisation directions include: 1. researching the implementation of reprojection algorithms and proposing more efficient ones; 2. implementing the reprojection algorithm with more efficient software and hardware; 3. exploring the method of taking points at intervals to reduce the number of reprojection calculations.

Appendix
Appendix A: Spatial locality of reprojection
Spatial locality is the property that when a program accesses a memory address, it is likely to access neighbouring addresses soon afterwards. The reprojection operation is executed sequentially in the original projection space, which exhibits spatial locality. When performing interpolation operations in the postprojection space, it also visits the 8 neighbouring points of each projected point, which likewise exhibits spatial locality.

Appendix B: System migration ability
The different modules are separated and interact with each other by protocol, which enhances the expansion capability of the overall system. When facing other kinds of data sources, a mismatched management node or computing node can be replaced in a modular manner to adapt to the diversity of the data. In addition, our reprojection method is only one module of this architecture for achieving cluster reprojection; other reprojection methods, or other calculation methods, can also be incorporated into the system as cloud-based tasks once the control protocol has been extended.