Qualitative Precipitation Estimation from Satellite Data Based on Distributed Domain-Specific Architecture

This paper presents qualitative precipitation estimation (QPE) based on data from the Himawari satellite and a distributed domain-specific architecture (DDSA). The QPE process consists of receiving and managing the raw data from the satellite every 10 minutes and calculating the rain-temperature relationship. The aim of this research is to decrease the QPE processing time by using a DDSA in which 9 small computing boards are connected to a gigabit switch. As an alternative to a high-performance PC, this distributed embedded system is suitable for processing the data received from the satellite at 10-minute intervals. The experimental results show that the proposed fast satellite data processing algorithm is well suited to QPE processing on the DDSA platform, requiring a processing time of 115.53 seconds with low power consumption.


Introduction
The impacts of tropical cyclones include heavy rainfall and strong winds, which can cause massive loss of life and damage to infrastructure. Many efforts are being made to understand the meteorological parameters in physical terms. To this end, the characteristics of tropical cyclones determined from satellite data are of interest to researchers.
Qualitative analysis via quantitative precipitation estimation (QPE) is a method used to estimate the amount of precipitation, such as the rainfall across a region, from specific satellite data. The maps used for precipitation estimation over a certain area and time span are compiled using the different data types collected by satellites. There have been many studies on QPE using satellite-based parameters. The regional rainfall event (RRE), defined in [1], was presented with the concept of QPE from satellite products compared with rain gauge observations from a stationary network of hydrometeorological stations. The results showed that the estimates from satellite data more reliably modelled the variation of rainfall in terms of QPE within the RRE perspective. Similarly, Panda et al. [2] investigated how to determine the characteristic features of the tropical cyclone PHET in 2010 using satellite-based meteorological parameters, including QPE, sea surface temperature, and relative humidity in the upper troposphere, compared with numerical simulations using the Weather Research and Forecasting (WRF) model. These satellite-derived parameters have been shown to be suitable for determining the meteorological conditions of tropical cyclones. Adler et al. [3] confirmed that satellite-based precipitation observations over the last 30 years have provided more accurate estimates of rainfall for understanding regional and global climates. As stated in [4], decades of precipitation estimates over the sea surface from satellite archives are significant for learning the characteristics of ocean-atmosphere interaction.
Regarding remote sensing data, an object detector that relies on anchor boxes for remote sensing imagery introduces a large number of hyperparameters and increases the computational redundancy of the detection model. To avoid the large number of hyperparameters, an anchor-free single-stage detector based on multiscale dense path aggregation has been proposed [5]. Adaptive anchor networks for multiscale object detection have been presented with a feature pyramid [6]. Based on an intelligent annotation approach, automatic aggregation via hierarchical similarity diffusion has been presented [7]. A dynamic monitoring technology using a spatial coordinates correction approach [8] has been presented, which uses low-altitude unmanned aerial vehicle (UAV) visible light and hyperspectral images to monitor forest dynamics. This research on remote sensing applications is summarized in Table 1.
Earth observation imagery taken by imaging satellites is used in satellite data processing to supply environmental information about the surface temperature and weather changes around the world. An advantage of distributed domain-specific architecture (DDSA) is its ability to solve specific problems, such as tasks under limited resources using an FPGA-based DDSA [9], high-performance computing tasks using GPUs [10], and low-power tasks using embedded systems [11].
In Thailand, the Thai Meteorological Department (TMD) [12] is the government organization that provides weather forecasts and warnings to mitigate natural disasters. DDSA [9] switches processing from uniprocessors to multiprocessors. While computers with standard processors run conventional programs through the operating system, DDSA can perform many small tasks extremely well. It could be much more heterogeneous than recent homogeneous multicore chips [10]. The TMD and the Ministry of Digital Economy and Society have collaborated with Hewlett Packard Enterprise (HPE) to provide high-performance computing (HPC) solutions such as the HPE APOLLO 2000 GEN10 system [13] in order to enhance the performance of weather forecasting and the prediction of natural disasters with high accuracy [14].
In this paper, we present the deployment and evaluation of a distributed domain-specific architecture for qualitative precipitation estimation using satellite data. Our main objective is to present rainfall estimation using visualizations of tropical cyclone movement and rain over Thailand for public warning. Cyclone motion visualizations are a convenient way to explain weather phenomena to those who are not meteorologists.

Background and Related Works
The works related to this paper can be separated into two parts: qualitative precipitation estimation from satellite data and embedded heterogeneous system (EHS) architectures. Precipitation is a meteorological phenomenon that affects human lives and can cause economic loss. Quantitative predictions of precipitation have applied rain gauges, satellite data, and numerical models, for example using the advanced meteorological imagers onboard geostationary satellites for quantitative precipitation nowcasts [15]. Global precipitation variations have been analyzed and reviewed using satellite and surface gauge information by the Global Precipitation Climatology Project [13]. Improvements in global precipitation estimation have been observed in the relationship between high-frequency brightness temperature depression and surface precipitation using satellite passive microwave retrievals, as detailed in [16].
Real-time parallel applications on heterogeneous distributed embedded systems were studied in [17] with an energy-efficient scheduling algorithm. The systems were composed of CPUs, memory (RAM) and nonvolatile memory, and a network interface card, all connected by a controller area network (CAN) bus. A task executed on one processor sends messages to all its successor tasks, which may be located on different processors. The implementation used a cluster consisting of six homogeneous dual-core processors (ARM Cortex-A20), 1 GB memory, and the Debian 4.7 operating system. The exploitation of approximate feature extraction via genetic programming (GP) for hardware acceleration in a heterogeneous microprocessor was presented in [18]. The architecture included an MSP430 CPU for top-level software control, a direct memory access (DMA) module for automated data movement in applications, a four-core feature extraction accelerator (FEA) for GP-model computation, and a support vector machine accelerator (SVMA) for configurable SVM classification, with the accelerators and other modules memory-mapped and interfaced to the CPU via a peripheral bus. An evaluation of single-board computer clusters for cyber operations was presented in [19], which studied a 128-core Beowulf cluster and compared the Parallella and Raspberry Pi architectures. A scalable object detection framework based on an embedded manycore cluster [20] was implemented on a Parallella board. For big-data-related applications, an in-memory computing architecture for heterogeneous CPU-GPU clusters, called GFlink, was presented in [21]. A performance evaluation of a system consisting of a single machine and a cluster was conducted in comparison with an Intel Core i5-4590 CPU containing four cores running at 3.30 GHz with 16 GB memory. Several kinds of GPUs were utilized, including the NVIDIA GeForce GTX 750, NVIDIA Tesla C2050, NVIDIA Tesla K20, and NVIDIA Tesla P100. The research works mentioned above on heterogeneous distributed embedded systems are summarized in Table 2.

Table 1: Research work on remote sensing applications.
Study area | Approach | Technique
Anchor-free single stage detector [5] | Multiscale dense path aggregation feature pyramid network | High-level semantic information and low-level location information
Multiscale object detection [6] | Region-based convolutional neural networks (RCNNs) and feature pyramid network | Adaptive anchor networks
Automatic aggregation [7] | Hierarchical similarity diffusion | To measure the similarity matrix of image in the data set cluster by cluster
Monitoring forest dynamics [8] | Spatial coordinates correction approach | Registering low-altitude UAV visible light and hyperspectral images

Modelling and Simulation in Engineering
In our research, we aimed to minimize the processing time of the QPE process using parallelism on the DDSA platform. The contributions of this study are summarized as follows:
(i) We propose the fast satellite data processing (Fast-SDP) algorithm to minimize the overall processing time in comparison with previous research.
(ii) We propose the master-slave DDSA architecture, comprising one master and multiple slaves of heterogeneous embedded system boards, called Parallella, connected to a gigabit switch and using the OpenMP API for parallelism.
The rest of this paper is organized as follows. Section 3 presents the proposed distributed domain-specific architecture. Section 4 presents the fast satellite data processing algorithm. Section 5 details the experimental results, and Section 6 concludes this research.

Proposed Distributed Domain-Specific Architecture
Domain-specific architecture (DSA) is used together with general-purpose cores in computer architecture to improve efficiency. Domain-specific algorithms are almost always for small, computationally intensive kernels of larger systems and should focus on a specific program.
3.1. Distributed Domain-Specific Architecture. The proposed DDSA design is based on [9] for its multiple merits: parallelism, flexibility, and scalability. Each DSA consists of a host processor with local memory (L.MEM), a coprocessor with L.MEM, a shared memory (SH.MEM), and a Gigabit Ethernet interface (GB.Ethernet). Similar to [11], the DDSA comprises a master DSA (mDSA) and multiple slave DSAs (sDSAs), connected through a switch in a star network, as depicted in Figure 1.

DDSA System Design.
The proposed system design consists of four layers, arranged in the layered style of the OSI (Open Systems Interconnection) model, as shown in Figure 2: the hardware layer, the operating system (OS) layer, the application programming interface (API) layer, and the application layer.

Hardware Layer.
We used the Parallella Microserver board [22] as the DSA. Each board contains a ZYNQ SoC processor and a 16-core Epiphany RISC coprocessor, together with 1 GB of DDR3 memory, 32 GB of microSD storage, and Gigabit Ethernet.

OS Layer.
Parabuntu 2019.1 [23] was used as the operating system. Parabuntu is based on Ubuntu 18.04 with a Linux 4.14 kernel and was installed on the microSD card of each Parallella board.

API Layer.
Four APIs were used for the DDSA: COPRTHR-2, Epiphany SDK, MPICH, and the Network File System (NFS). The COPRTHR-2 (CO-PRocessing THReads) SDK version 2 [24] provides libraries and tools, including syscore, coprcc, coprcc-info, coprcc-db, libcoprthr-mpi, and libcoprthr. The Epiphany SDK [22] provides the Epiphany-C/C++ library for implementation on the host processor and the coprocessor. MPICH is a widely used and portable implementation of the MPI (Message Passing Interface) standard [25], used for the DDSA with dynamic process management, parallel input/output, one-sided operations, and other extensions. In addition, the OpenMP API is a scalable model with a simple and flexible interface for developing parallel applications [26]. The NFS utilities comprise the user-space server and client tools for the kernel's NFS capabilities [27]. NFS is a protocol that allows file systems to be shared over the DDSA.

Application Layer.
The performance of the proposed DDSA with the proposed fast satellite data processing algorithm is evaluated in the next section.

The Fast Satellite Data Processing Algorithm
In this section, we introduce the fast satellite data processing (Fast-SDP) algorithm, which uses satellite data from the Thai Meteorological Department (TMD) [12]. Quantitative precipitation estimations of the rainfall over Thailand from the Himawari-8 satellite [28] are made every 10 minutes. The rainfall volume is the quantity of rainfall measured at the base station, while the rainfall rate is obtained by translating the temperature data from the satellite. It can be used to estimate the amount of precipitation that has fallen across the region from the specific satellite data.
The rainfall rate ($RR$), in units of mm/hr, is a function of the rainfall volume and temperature, given in [29] as Equation (1), where ϕ, ψ, and ζ denote coefficients that depend on the climate and location. The temperature from the satellite data, $\mathrm{temp}_k$, is expressed in kelvins.
The normalized temperature ($N\mathrm{temp}_c$) is expressed by Equation (2), where θ = 273 and ε is a parameter set using the color bar monitoring. Equation (2) estimates the amount of precipitation in the form of temperature from the satellite data across the region. To convert to the rainfall rate, the results from Equation (2) are used in Equation (1) together with the coefficients specific to the climate and location.
The proposed fast satellite data processing (Fast-SDP) algorithm is introduced in Algorithm 1. In lines 1-2, the initial satellite data, in 9 types of network common data form (NetCDF) files called ".nc files", are input to the host processor and converted to the common data form language, called ".cdl files", in preparation for the next step. The .cdl files are then distributed to the 8 sDSAs through the mDSA using the OpenMP API command. Lines 3 and 4 show the parallelism of OpenMP, searching for $\mathrm{temp}_k$ in the .cdl file at the coprocessor. While the algorithm is running, the OpenMP API uses several threads to search for $\mathrm{temp}_k$ in the nine .cdl files in parallel. In lines 5-7, the decision on the rain rate is made, and the results are returned to the host processor when finished. Line 8 shows the conversion of the .cdl files back to .nc files. In line 9, the processed satellite data of all .nc files are collected at the host processor.
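The data flow described for Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' Epiphany-C/OpenMP implementation: a thread pool stands in for the OpenMP threads, the CDL text is held in memory rather than produced from a .nc file, and the variable name `temp_k`, the helper names, and the threshold rule in `decide_rain` are hypothetical placeholders, because the actual decision rule of lines 5-7 is not reproduced in the text.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical CDL fragment; a real .cdl file converted from a .nc file has
# the same overall layout (dimensions, variables, data).
SAMPLE_CDL = """netcdf sample {
dimensions:
    n = 4 ;
variables:
    float temp_k(n) ;
data:
 temp_k = 210.5, 233.0, 251.25, 290.0 ;
}"""


def extract_temps(cdl_text, var="temp_k"):
    """Search the CDL text for `var = v1, v2, ... ;` and return the values."""
    pattern = rf"^\s*{re.escape(var)}\s*=\s*([^;]*);"
    match = re.search(pattern, cdl_text, re.MULTILINE)
    if match is None:
        return []
    return [float(tok) for tok in match.group(1).split(",") if tok.strip()]


def decide_rain(temps_k, threshold_k=235.0):
    """Placeholder decision: flag values colder than an assumed threshold."""
    return [t < threshold_k for t in temps_k]


def process_file(cdl_text):
    """Work done for one file: search for temperatures, then decide on rain."""
    return decide_rain(extract_temps(cdl_text))


def fast_sdp_sketch(cdl_texts, workers=8):
    """Fan the files out to a pool of workers and collect results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_file, cdl_texts))


if __name__ == "__main__":
    nine_files = [SAMPLE_CDL] * 9  # stand-in for the nine .cdl files
    results = fast_sdp_sketch(nine_files)
    print(len(results), results[0])
```

The pool size of 8 mirrors the 8 sDSAs only loosely; on the real platform each file is handled by a separate board and its coprocessor rather than by a thread in one process.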

The Experimental Results
Experiments on the proposed DDSA concerning QPE from Himawari-8 satellite data were performed based on the Fast-SDP algorithm. We implemented the proposed Fast-SDP algorithm in Epiphany-C/C++, compiled by E-GCC/G++ on our DDSA. The Fast-SDP algorithm is used to optimize the QPE process, which includes receiving and managing the raw data on the movement of tropical storms Podul and Kajigi over Thailand during 27-31 August 2019 and 31 August to 4 September 2019, respectively, from the Himawari-8 satellite, available every 10 minutes, in order to estimate the rainfall from the rain-temperature relationship. The initial parameters for the rain-temperature calculation followed the climates and locations in Thailand [29,30]: ϕ = 1.1183, ψ = −0.036382, ζ = 0.5, and ε = −25. Figures 3 and 4 show the satellite imagery generated by the Fast-SDP algorithm for Thailand during 27-31 August 2019 for the tropical storm Podul and during 31 August to 4 September 2019 for the tropical storm Kajigi, where high temperature is shown in red and low temperature in blue. The shades of color were translated using Equation (1) and then passed to the GrADS (Grid Analysis and Display System) software for graphical display in the default rainbow color code.
The performance of the proposed Fast-SDP algorithm implemented on different platforms is compared in Table 3. There are two main types of programming in this evaluation: (1) distributed parallel programming on the DDSA hardware platform consisting of one master and 8 slaves, as proposed in Section 3.1, and (2) parallel programming using the coprocessors in a single Parallella DSA board or in a PC (i.e., iMac and MacBook).
As can be seen, the proposed Fast-SDP runs faster than the SDP in [30,31] on the DDSA hardware platform. The processing time is reduced from 1477 s [31] and 925 s [30] to about 115 s, that is, roughly 12 and 8 times faster, respectively. On a platform with a single CPU and multiple coprocessors, the proposed algorithm can run in parallel using the local coprocessors. Although running on desktops or notebooks is faster than running on the embedded DSA board, both the power consumption and the price are higher.
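The speedup figures follow from dividing the earlier processing times by the Fast-SDP time. The short Python check below uses only values quoted in this paper (1477 s, 925 s, 115.53 s, and the 10-minute arrival interval); the function and variable names are illustrative.

```python
def speedup(previous_s, proposed_s):
    """Ratio of the earlier processing time to the proposed one."""
    return previous_s / proposed_s


FAST_SDP_S = 115.53    # Fast-SDP processing time on the DDSA, in seconds
INTERVAL_S = 10 * 60   # a new satellite data set arrives every 10 minutes

if __name__ == "__main__":
    for label, previous in (("SDP [31]", 1477.0), ("SDP [30]", 925.0)):
        print(f"{label}: {speedup(previous, FAST_SDP_S):.2f}x faster")
    # Fraction of the 10-minute window consumed by one QPE run
    print(f"window used: {FAST_SDP_S / INTERVAL_S:.1%}")
```

Under these figures, one QPE run takes roughly a fifth of the 10-minute window, which leaves headroom before the next data set arrives.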
The estimated rainfall rate using the Fast-SDP algorithm shown in Figures 3(b) and 4(b) is verified by comparing with the actual observation data from the Thai Meteorological Department (TMD) provided in [33]. The correlation coefficient and Euclidean distance techniques are employed for this comparison.
The correlation coefficient [34,35] is used to compare the histograms of the result images in Figures 3(b) and 4(b) with the existing measured data from [33]. The correlation coefficient $D(H_1, H_2)$ for matching two histograms $H_1$ and $H_2$ can be expressed as
$$D(H_1, H_2) = \frac{\sum_{I} \left(H_1(I) - \bar{H}_1\right)\left(H_2(I) - \bar{H}_2\right)}{\sqrt{\sum_{I} \left(H_1(I) - \bar{H}_1\right)^2 \sum_{I} \left(H_2(I) - \bar{H}_2\right)^2}},$$
where
$$\bar{H}_k = \frac{1}{N} \sum_{J} H_k(J),$$
and $N$ is the total number of histogram bins, $H_1$ is from the result images in Figures 3(b) and 4(b), and $H_2$ is from the actual observation data from [33]. The correlation coefficient lies in the range $-1 \le D(H_1, H_2) \le 1$. A value close to 1 means that the two histograms $H_1$ and $H_2$ are strongly correlated.
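As an illustration of the histogram comparison described above, the following Python sketch computes the correlation between two histograms, assuming the standard Pearson form over bin counts. The function name and the example histograms are hypothetical; the histograms used in the paper come from the result images and the observation data.

```python
import math


def histogram_correlation(h1, h2):
    """Pearson correlation between two histograms with the same bin count."""
    if len(h1) != len(h2) or not h1:
        raise ValueError("histograms must be non-empty and of equal length")
    n = len(h1)
    mean1 = sum(h1) / n
    mean2 = sum(h2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(h1, h2))
    var1 = sum((a - mean1) ** 2 for a in h1)
    var2 = sum((b - mean2) ** 2 for b in h2)
    denom = math.sqrt(var1 * var2)
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant histogram")
    return cov / denom


if __name__ == "__main__":
    result_hist = [1, 2, 3, 4]       # made-up bin counts
    observed_hist = [2, 4, 6, 8]     # same shape, scaled by two
    print(histogram_correlation(result_hist, observed_hist))
```

Because the measure subtracts the mean and normalizes by the spread, two histograms with the same shape but different overall counts still score close to 1.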
The Euclidean distance method [36] calculates the distance in the Euclidean plane. Let the point $P$ have Cartesian coordinates $(P_\eta, P_\zeta)$ and the point $Q$ have coordinates $(Q_\eta, Q_\zeta)$. The pairwise distance $D(P, Q)$ between the points is
$$D(P, Q) = \sqrt{(P_\eta - Q_\eta)^2 + (P_\zeta - Q_\zeta)^2}.$$
If the distance is close to zero, the two points can be regarded as similar. Table 4 shows the correlation coefficient and the Euclidean distance between two sets of data: (a) the Himawari-8 satellite data before QPE and the Fast-SDP results after QPE and (b) the Fast-SDP results and the real observation data from [33]. The average correlation coefficients of the Fast-SDP results after QPE compared with the real observation data for the storms Podul and Kajigi are about 0.9396 and 0.9722, respectively. The corresponding average Euclidean distances are about 0.0590 and 0.0507, respectively. This confirms the similarity between the results of the proposed algorithm and the real observation data.
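The pairwise distance used in this comparison can be written as a short Python function. This is a sketch of the planar Euclidean distance only; how the points are derived from the images is not stated in the text, so the example coordinates are made up.

```python
import math


def euclidean_distance(p, q):
    """Distance between two points given as (eta, zeta) coordinate pairs."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


if __name__ == "__main__":
    p = (0.0, 0.0)   # hypothetical point P
    q = (3.0, 4.0)   # hypothetical point Q
    print(euclidean_distance(p, q))
```

A distance of zero means the two points coincide, which is why values near zero are read as high similarity.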

Conclusion
In this paper, the fast satellite data processing (Fast-SDP) algorithm was proposed for QPE using Himawari satellite data, based on a distributed domain-specific architecture (DDSA). The proposed master-slave architecture consists of 1 mDSA and 8 sDSAs connected by a gigabit switch. The Fast-SDP algorithm receives and manages the raw data from the satellite every 10 minutes and calculates the rain-temperature relationship. The resulting satellite imagery can be used to visualize tropical storm movement in relation to the rainfall over Thailand. According to the performance evaluation, the overall QPE processing time using the proposed Fast-SDP is 115.53 seconds, about 8 times faster than the previous work [30]. Additionally, the correctness of the Fast-SDP results after QPE is confirmed against the real observation data from the Thai Meteorological Department (TMD) using the correlation coefficient and the Euclidean distance.
The novelty of this study is the proposed Fast-SDP for solving QPE on the distributed domain-specific architecture using data from the Himawari satellite. This work provides advantages for the TMD in terms of network flexibility, hardware scalability, low power consumption, and computational performance suited to the 10-minute window of incoming satellite data.

Data Availability
The real observation data were provided by the Thai Meteorological Department, Thailand.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.