CUDA-based parallelization of time-weighted dynamic time warping algorithm for time series analysis of remote sensing data
Introduction
In recent years, human activities have intensified changes to the global land surface (Battude et al., 2016). Time series analysis of remote sensing images can identify these changes accurately and efficiently, and it has become an important technical means of obtaining land use information (Gomez et al., 2016). The successive launches of the Sentinel series satellites of the European Space Agency (ESA) and the SPOT series satellites of the French National Centre for Space Studies (CNES) have provided researchers with satellite image time series of high temporal and spatial resolution (Pelletier et al., 2016). The rapid development of remote sensing Earth observation technology supplies massive data for time series analysis, but it also brings large data volumes and long processing times, which makes it difficult to meet the growing real-time demands of GIS spatial analysis (Zhu et al., 2018). Improving the computational efficiency and reducing the run time of time series analysis has therefore become a prominent research topic (Coelho et al., 2017).
Among the algorithms for time series analysis of remote sensing images, the time-weighted dynamic time warping (TWDTW) algorithm is used for time series pattern matching (Maus et al., 2016). It has achieved remarkable results in crop classification (Gella et al., 2021), seasonal information extraction (Narin et al., 2021), vegetation type identification (Cheng and Wang, 2019), and land use and land cover mapping (Viana et al., 2019). TWDTW is derived from the dynamic time warping (DTW) algorithm, which measures the similarity between known event patterns and an unknown time series (Sakoe and Chiba, 1990). Because DTW fills a full cost matrix, its time and space complexity are quadratic in the series length, O(n²), which limits its application to large datasets (Oliveira et al., 2018).
Many researchers have therefore worked on high-performance parallel optimization of the algorithm. With the maturity of general-purpose programming tools such as the Compute Unified Device Architecture (CUDA), much work has addressed the parallelization of the algorithm on multicore computing platforms, with remarkable results (Aldinucci et al., 2021). Zhu et al. (2018) proposed an algorithm that finds the possible starting positions of similar local subsequences. This method checks locally optimal combinations for matching, but remote sensing data are regularly arranged and cannot benefit from a search for irregular subsequences. Xiao et al. (2014) proposed a parallel DTW algorithm based on prefix computation. The algorithm uses a specific data dependency transformation to remove the limit on instance size, but it has a high query cost and scales poorly on low-dimensional data. Zhang et al. (2012) introduced KNN to estimate the lower bound and assigned the path search and the lower bound estimation to two cores. Each calculation requires transferring the cost matrix, which becomes a performance bottleneck. These studies show how the DTW algorithm can be optimized through GPU parallel computing, but they are not fully suited to remote sensing image processing.
Drawing on the first law of geography, Oliveira et al. (2018) proposed a spatial parallel TWDTW (SP-TWDTW), which determines the feature types in the study area by distributing the time series analysis across the available cores and combining it with spatial autocorrelation. Their experimental results show that the response speed increases by a factor of 11. This scheme adopts a fine-grained parallel strategy, with the parallel design carried out at the level of individual cost matrix operations: each thread handles one cell of a diagonal, and the cells of a diagonal have no dependencies on one another. However, the scheme does not make full use of the advantages of the GPU programming model (Wu et al., 2017). Computing the cumulative cost matrix requires frequent reads and writes to video memory, and each access takes hundreds of clock cycles, which greatly increases the memory access overhead. Because the diagonals differ in length, tasks are allocated unevenly among threads, so some threads sit idle and GPU resources are wasted. An important characteristic of remote sensing data is the long revisit period and wide spatial coverage of satellites, which leaves remote sensing time series datasets with a short time axis and a very large spatial axis. This differs markedly from the time series applications for which the original algorithm was designed. The resulting irregular memory layout limits memory throughput and instruction throughput, which seriously reduces resource utilization.
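As a minimal sketch of the diagonal (wavefront) strategy discussed above, the Python code below fills a plain DTW cumulative cost matrix one anti-diagonal at a time and reports how many independent cells each diagonal holds. It is illustrative only and is not the SP-TWDTW implementation: the function names, the absolute-difference local cost, and the full start-to-end alignment are assumptions, and the time weight is omitted.

```python
def diagonal_lengths(n, m):
    """Number of cells on each anti-diagonal of an n x m cost matrix."""
    return [min(d, n - 1, m - 1, n + m - 2 - d) + 1 for d in range(n + m - 1)]


def dtw_row_major(a, b):
    """Reference DTW: fill the cumulative cost matrix row by row."""
    n, m = len(a), len(b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def dtw_wavefront(a, b):
    """Same recurrence, visited one anti-diagonal at a time.

    Cells on diagonal d depend only on diagonals d-1 and d-2, so the inner
    loop has no internal dependencies; on a GPU each iteration could be
    handled by a separate thread (here it simply runs sequentially).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for d in range(n + m - 1):
        i_lo = max(0, d - (m - 1))
        i_hi = min(n - 1, d)
        for i in range(i_lo, i_hi + 1):
            j = d - i
            cost = abs(a[i] - b[j])
            acc[i + 1][j + 1] = cost + min(acc[i][j + 1], acc[i + 1][j], acc[i][j])
    return acc[n][m]
```

For a 3 × 5 matrix, `diagonal_lengths(3, 5)` gives 1, 2, 3, 3, 3, 2 and 1 cells, so a kernel that assigns one thread per diagonal cell keeps at most three threads busy and fewer at both ends. With a short time axis, as in remote sensing series, every diagonal is short, which is the load imbalance noted above.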
Few parallel methods currently exist for analyzing time series based on the characteristics of remote sensing images. This paper therefore proposes a CUDA-based parallel time-weighted dynamic time warping algorithm. First, using the Geospatial Data Abstraction Library (GDAL), the data structure is reorganized during the data-writing stage according to memory access principles, and a multithreaded architecture and memory access model are designed. Second, the cumulative cost matrix is built by multiple threads in parallel and is separated from the pattern matching stage to achieve partial fine-grained parallelism. Third, the multilevel cache of the GPU is used to compare each subsequence with different patterns on different cores, and the final cumulative cost array is returned to the CPU, which is better suited to logic control and is responsible for classification and label output. Finally, the computational efficiency of the parallel algorithm under different conditions is compared and analyzed, and the effects of thread organization, time series length, and number of patterns on its performance are discussed to provide a reference for time series analysis and optimization on heterogeneous platforms.
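The division of work outlined above can be sketched in Python as follows. This is a sequential sketch under stated assumptions, not the authors' CUDA code: the function names, the flat pixel-by-pattern distance array, the full start-to-end alignment, and the parameter values `alpha=0.1` and `beta=50.0` are all hypothetical, and the time weight assumes the logistic form described for TWDTW by Maus et al. (2016).

```python
import math


def logistic_time_weight(dt, alpha=0.1, beta=50.0):
    """Logistic penalty that grows with the time gap dt (illustrative parameters)."""
    return 1.0 / (1.0 + math.exp(-alpha * (dt - beta)))


def twdtw_distance(series, series_days, pattern, pattern_days, alpha=0.1, beta=50.0):
    """Time-weighted DTW cost between one pixel series and one pattern.

    Only two rows of the cumulative cost matrix are kept, so the working
    memory per task is proportional to the pattern length.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(pattern)
    for x, tx in zip(series, series_days):
        curr = [inf] * (len(pattern) + 1)
        for j, (y, ty) in enumerate(zip(pattern, pattern_days), start=1):
            cost = abs(x - y) + logistic_time_weight(abs(tx - ty), alpha, beta)
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]


def classify_pixels(pixels, days, patterns):
    """patterns maps a label to a (values, days) pair.

    Stage 1 stands in for the GPU: every (pixel, pattern) pair is an
    independent task that writes one entry of a flat distance array.
    Stage 2 stands in for the CPU: pick the closest pattern per pixel.
    """
    labels = list(patterns)
    k = len(labels)
    dist = [
        twdtw_distance(px, days, patterns[lab][0], patterns[lab][1])
        for px in pixels
        for lab in labels
    ]
    return [
        labels[min(range(k), key=lambda c: dist[p * k + c])]
        for p in range(len(pixels))
    ]
```

Because each pixel's series is compared with each pattern independently, the tasks need no synchronization with one another. This is what allows the distance stage to be spread across GPU threads while the label selection stays on the CPU.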
Section snippets
Time series analysis of remote sensing images
Time series analysis is a method to extract important statistical information and features from time series data (Fu and Weng, 2016). A time series represents the set of ordered values obtained by n samples at fixed time intervals. In the field of remote sensing, remote sensing satellites revisit a certain area according to a fixed period, map its data into a three-dimensional array in space and time (as shown in Fig. 1), take the coordinates of pixels as the spatial axis, and take the time
Methods
An analysis of the serial TWDTW algorithm shows that computing the cumulative cost matrix is the most time-consuming and costly step of the entire algorithm (Xiao et al., 2014). The row-wise method for calculating the cumulative cost matrix is shown in Fig. 5. The construction of the cost matrix depends on the calculation of the NDVI time series curve. The data dependence of the algorithm leads to low data reuse and low parallelism. The existing fine-grained parallel
Experimental environment
The experimental platform consists of an Intel(R) Core(TM) i7-8700 CPU with a base frequency of 3.2 GHz and 8 GB of system memory, together with an NVIDIA GTX 1050 Ti GPU. The GPU uses the Pascal architecture and has 4 GB of video memory, 768 stream processors, and a single-precision floating-point performance of 2.1 TFLOPS. The software environment comprises Windows 10, Visual Studio 2017, and CUDA 10.0. The hardware parameters are listed in Table 1.
Experimental results
The experimental data
Conclusion
To address the low computational efficiency of the TWDTW algorithm in remote sensing image time series analysis, this paper proposed a CUDA-based parallel TWDTW algorithm. The cumulative cost matrix is built through a multithreaded architecture in which each thread independently calculates the warping path length, which removes the data dependence and improves the parallelism of the algorithm. At the same time, the memory usage model is optimized to reduce memory access
Code availability section
Name of the code/library: Time weighted dynamic time warping parallel algorithm based on CUDA.
CRediT authorship contribution statement
Hengliang Guo: Conceptualization, Methodology, Supervision. Bowen Xu: Data curation, Methodology, Writing – original draft, preparation. Hong Yang: Writing – review & editing. Bingyang Li: Methodology, Supervision. Yuanyuan Yue: Methodology, Supervision. Shan Zhao: Writing – review & editing, Funding acquisition.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
Funding: This work was supported by the 2020 Science and Technology Project of Innovation Ecosystem Construction, National Supercomputing Zhengzhou Center: Research on Key Technologies of Intelligent Fine Prediction Based on Big Data Analysis (No. 201400210800).
References (24)
- et al. (2021). Practical parallelization of scientific applications with OpenMP, OpenACC and MPI. J. Parallel Distr. Comput.
- et al. (2016). Estimating maize biomass and yield over large areas using high spatial and temporal resolution Sentinel-2 like remote sensing data. Remote Sens. Environ.
- et al. (2017). A GPU deep learning metaheuristic based model for time series forecasting. Appl. Energy
- et al. (2016). A time series analysis of urbanization induced land use and land cover change and its impact on land surface temperature with Landsat imagery. Remote Sens. Environ.
- et al. (2021). Mapping crop types in complex farming areas using SAR imagery with dynamic time warping. ISPRS J. Photogrammetry Remote Sens.
- et al. (2016). Optical remotely sensed time series data for land cover classification: a review. ISPRS J. Photogrammetry Remote Sens.
- et al. (2016). Assessing the robustness of Random Forests to map land cover with high resolution satellite image time series over large areas. Remote Sens. Environ.
- et al. (1990). Dynamic programming algorithm optimization for spoken word recognition. Reading Speech Recogn.
- et al. (2016). KBLAS: an optimized library for dense matrix-vector multiplication on GPU accelerators. ACM Trans. Math Software
- et al. (2021). Efficient time series clustering by minimizing dynamic time warping utilization. IEEE Access
- Forest-type classification using time-weighted dynamic time warping analysis in mountain areas: a case study in Southern China. Forests
- Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sens. Environ.: Interdiscipl. J.