Multimodal Semisupervised Deep Graph Learning for Automatic Precipitation Nowcasting

Precipitation nowcasting plays a key role in land security and emergency management of natural calamities. Amajority of existing deep learning-based techniques realize precipitation nowcasting by learning a deep nonlinear function from a single information source, e.g., weather radar. In this study, we propose a novel multimodal semisupervised deep graph learning framework for precipitation nowcasting. Unlike existing studies, different modalities of observation data (including both meteorological and nonmeteorological data) are modeled jointly, thereby benefiting each other. All information is converted into image structures, next, precipitation nowcasting is deemed as a computer vision task to be optimized. To handle areas with unavailable precipitation, we convert all observation information into a graph structure and introduce a semisupervised graph convolutional network with a sequence connect architecture to learn the features of all local areas. With the learned features, precipitation is predicted through a multilayer fully connected regression network. Experiments on real datasets confirm the effectiveness of the proposed method.


Introduction
e goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period. is method plays an important role in land security and emergency management of natural calamities. ough it is very crucial, precipitation nowcasting still depends on the real-time manual analysis of meteorological observation data by forecasters.
To address this issue, researchers are increasingly trying to replace forecasters with computers, e.g., [1]. In essence, precipitation nowcasting is a spatiotemporal sequence forecasting problem with a sequence of past meteorological observation data as input and a sequence of fixed numbers (usually larger than 1) of future precipitation as output. However, traditional optical flow-based methods have unsatisfactory performance.
Precipitation nowcasting has three long-standing problems. e first challenge is spatiotemporal correlations. Precipitation at any location is influenced by the weather condition of its neighboring region over a relatively short period. e second challenge is diversified observation data. With the progress of technology, we can obtain various meteorological or nonmeteorological observation data of a certain region, e.g., radar echo maps, satellite images, topographic map, temperature, and air humidity. However, handling these data in a unified framework is an open problem. e third challenge is semisupervision. It is generally known that the distribution of meteorological stations for observing precipitation is uneven, implying that precipitation at many locations is unknown.
Since the recent advances in deep neural networks [2,3], their high capability in various classification and regression tasks has been successfully demonstrated, e.g., image classification by [4], object detection by [5], video representation by [6], and speech recognition by [7]. ere are also some studies on precipitation nowcasting using deep learning techniques. Xingjian et al. [8] and Shi et al. [9] first introduced a convolutional long short-term memory (ConvLSTM) network to capture spatiotemporal correlations, which has been shown to outperform traditional optical flow-based methods for precipitation nowcasting, indicating that deep learning models have a huge potential for solving this problem. Subsequently, Singh et al. [10] and Karevan and Suykens [11] introduced a more complicated LSTM structure, and Shi et al. [12] used convolutional neural networks (CNNs) to extract more efficient representations.
Although the methods above can achieve some progress, their inputs are merely weather radar echo maps, and they only focus on addressing the problem of spatiotemporal correlations, neglecting the latter two problems.
Inspired by the success of Deep Learning Vision for Nonvision Tasks by [13], we propose a novel multimodal semisupervised deep graph learning framework for precipitation nowcasting in this study. In contrast to previous studies, different modalities of observation data (including both meteorological and nonmeteorological data) are modeled jointly, and thus, benefiting each other. All information is converted to an image structure. en, precipitation nowcasting is deemed as a computer vision task to be optimized. We convert all observation information into a graph structure and introduce a semisupervised graph convolutional network (GCN) with a sequence connect architecture to learn the features of all local areas for handling areas without available precipitation. With the learned features, precipitation is predicted through a multilayer fully connected regression network.
We summarize the contributions of this work as follows: (1) We introduce a novel short-term precipitation nowcasting solution by leveraging multimodal observation data (including meteorological and nonmeteorological data) (2) A unified multimodal semisupervised deep graph learning is proposed, aiming at simultaneously addressing three long-standing problems in precipitation nowcasting, i.e., spatiotemporal correlations, diversified observation data, and semisupervision (3) To the best of our knowledge, this is the first attempt to address the precipitation nowcasting problem using a GCN (4) e experimental results on the collected large-scale real dataset validate the effectiveness of the proposed solution e rest of the paper is organized as follows. In Section 2, related work is reviewed. Section 3 presents the problem formulation and introduces the proposed semisupervised deep regression framework in detail. In Section 4, we report and analyze the experimental results. Finally, we conclude the paper and discuss future work in Section 5.

Related Work
In this section, we briefly review the closely related methods in two folds: deep learning-based precipitation nowcasting methods and graph-based semisupervised learning techniques.

Deep Learning-Based Precipitation
Nowcasting. Xingjian et al. [8] formulated precipitation nowcasting as a spatiotemporal sequence forecasting problem and proposed a ConvLSTM model, which extends the LSTM by [14], by adding convolutional structures into both input-to-state and state-to-state transitions to solve the problem. Using radar echo sequences for model training, the authors showed that ConvLSTM was better at capturing spatiotemporal correlations than fully connected LSTM and provided more accurate predictions than the real-time optical flow via variational methods for echoes of radar algorithm by [15]. However, the convolutional recurrence structure in ConvLSTM-based models is location invariant, whereas natural motion and transformation (e.g., rotation) are location variant in general. To address this, Shi et al. [9] proposed a trajectory-gated recurrent unit model that can actively learn a location-variant structure for recurrent connections. To obtain a more robust representation, Karevan and Suykens [11] proposed a two-layer spatiotemporal stacked LSTM model. Unlike LSTM-based methods which were used widely on sequence learning and time series prediction, Shi et al. [12] used recurrent dynamic CNNs to handle the spatiotemporal information of radar data.
Unfortunately, all techniques mentioned above only consider radar echo maps as the input data. In addition to radar echo maps, other types of meteorological observed data, such as satellite images, temperature, and air humidity, and nonmeteorological observation data, such as "topographic maps" are available, and these data were incorporated in this study. Notably, data from different sources are complementary. In this work, the proposed model is capable of not only handling spatiotemporal information from different observation sources simultaneously but also unifying labeled and unlabeled data.

Graph-Based Semisupervised
Learning. Existing semisupervised learning methods using graph representations can be roughly be divided into two categories: methods that use some form of explicit graph Laplacian regularization and graph embedding-based approaches.
Typical graph Laplacian regularization techniques include label propagation by [16], manifold regularization by [17], and deep semisupervised embedding by [18]. Typical graph embedding-based methods include DeepWalk by [19], LINE by [20], and node2vec by [21]. Perozzi et al. [19] learned embedding via the prediction of the local neighborhood of nodes, sampled from random walks on a graph. Tang et al. and Grover and Leskovec [20,21] extend DeepWalk with more sophisticated random walk or breadth-first search schemes.
Although the methods mentioned above have achieved significant progress, a multistep pipeline is included in these frameworks, which means that they cannot be trained in an end-to-end way. Recently, Kipf and Welling [22] proposed a more simplified GCN for semisupervised learning by employing a first-order approximation of spectral filters. e GCN and subsequent variants have achieved state-of-the-art results in various application areas, including multilabel classification by [23], zero-shot learning by [24], social networks by [25], and natural language processing by [26].
In this paper, graph-based semisupervised learning is utilized for precipitation nowcasting. ere are two main differences between the proposed method and that by [22] (including its variants). First, as observation data are obtained from different sources, graphs in our framework are multimodal. In contrast, most graphs in conventional graphbased semisupervised learning methods are built by a single modality (image, text, etc.). Second, precipitation nowcasting is a regression problem in our study, whereas previous techniques focus on classification tasks.

Method
In this section, we first define problems of precipitation nowcasting processed in this paper. en, we introduce the proposed multimodal semisupervised deep graph learning in details.

Problem Definition.
Our goal is forecasting precipitation at a place for the next several hours based on the weather conditions of its neighbors over the past several hours. Precipitation is related to many factors. As illustrated in Figure 1, according to the existing observation conditions, herein, we assume that future precipitation can be predicted via modeling the following factors: radar echo maps, air humidity images, satellite images, temperature images, and available precipitation data over the past several hours with the corresponding topographic map. In addition, as precipitation prediction of every longitude and latitude is impossible, we assume that if we can divide a certain region to many cells (that is, each cell represents a local area) then precipitation at different locations in the same cell can be deemed as almost the same.
As the factors belong to different modalities, precipitation nowcasting can be deemed as a multimodal analysis task. Moreover, from Figure 1, we can observe that precipitation in some cells may be unknown when the target area is divided into many grid cells because precipitation stations are not placed everywhere.
us, precipitation nowcasting is also a semisupervised learning problem in this paper. erefore, we present a multimodal semisupervised learning method to solve the precipitation nowcasting problems herein. e problem formulations are detailed below. Let are the temperature and air humidity image sequences from a remote sensing satellite, respectively. m is the sample index before time t. If the target area is divided into N × N grid cells, the precipitation of all cells is denoted as It should be noted that p u jk will be unknown if no precipitation stations are placed in the corresponding cell. If a cell has several precipitation stations, the cell precipitation values will be the average observation values from the stations. In addition, the precipitation of n time points in the future is denoted as where erefore, the problem can be formulated as follows: To address it, we propose a multimodal semisupervised deep graph learning framework herein.

Graph Representation.
Multimodal and heterogeneous data are represented by a graph structure. e target region of precipitation nowcasting is divided into many grid cells, and each cell represents a local area (as shown in Figure 2, the rectangular area represents the target region of the precipitation nowcasting and the blue points represent the precipitation stations). Let G(V, E) be the graph structure of the observation data of the target region, where nodes V represent a certain local area and edges E represent the relationship of two local areas. Here, red points mean the precipitation is available while the green means the precipitation is unknown. erefore, the first problem is vector representation of G(V, E). To obtain the node and edge representations of G(V, E), we utilize a convolutional autoencoder to learn the feature representation of a radar echo map, an air humidity image, a satellite image, a temperature image, and a topographic map. e convolutional autoencoder (as shown in Figure 3) architecture in our method consists of five parts: an input image, an encoder, a feature representation layer, a decoder, and a reconstructed image. It should be noted that the five convolutional autoencoder architectures have the same structure but are trained separately. Moreover, to ensure that the input images are as similar as possible to the reconstructed images, we utilize mean structural similarity (MSSIM) as our objective function in the convolutional autoencoder training step. If the input and reconstructed images are marked as I and I′, respectively, then the MSSIM of I and I ′ is formulated as follows: where K represents the image similarity computed in K patches and I j and I j ′ represent the jth image patch. SSMI is Mathematical Problems in Engineering 3 the structural similarity measurement of two image patches formulated as where μ and σ are the mean and variance of the image patch and C is a small constant which avoids zero errors. Once all convolutional autoencoders are trained, the node representations in graph G (V, E) can be denoted as where f v i is the representation of node v i , and , and f ah v i are the representations obtained from the outputs of the decoders in Figure 3. en, the edges in graph G(V, E) are denoted as We divide a certain region to many cells and the precipitation of different locations in the same cell can be deemed as almost the same.
t -1 t t + 2 t + n Figure 1: Motivation and the problem definition of precipitation nowcasting in this paper. In this paper, we assume that the future precipitation can be predicted via modeling the following factors: radar echo maps, air humidity images, satellite images, temperature images, and available precipitation data over the past several hours with the corresponding topographic map. In addition, we assume that if we can divide a certain region to many cells, then precipitation at different locations in the same cell can be deemed as almost the same.
where s i,j is the similarity of two nodes v i and v j and ‖ * ‖ 2 is the Euclidean distance of the two node representations. erefore, let G(H, A) be the graph representation of graph G (V, E), which can be formulated as A � s 1,1 s 1,2 · · · s 1,N s 2,1 s 2,2 · · · s 2,N · · · · · · · · · · · · s N,1 s N,2 · · · s N,N

Precipitation Nowcasting Model
Learning. Based on the graph representation mentioned above, the observation data of different sampled times can be deemed as a graph sequence, as shown in Figure 4. erefore, the precipitation nowcasting task is converted to a sequence graph modeling problem, which aims to obtain the precipitation of each node according to a series of graphs.
To address it, as illustrated in Figure 5, a multimodal semisupervised deep graph learning framework is proposed as a solution. A GCN and LSTM are used to model spatial and temporal information, respectively. GCN contains one input layer, several propagation (hidden) layers, and one final perceptron layer by [22].
Given an input H and an adjacency matrix A, the GCN conducts the following layerwise propagation in the hidden layer as where k � 0, 1, ..., p is a layer-specific weight matrix needing to be trained. σ denotes a nonlinear activation function, and H k+1 ∈ R n×d k+1 denotes the activation output in the k + 1th layer.
As each row in H represents a local area, the precipitation can be predicted as follows: where h is a multilayer fully connected neural network. It should be noted that equation (8) is a regression formulation, which is different from the commonly used semisupervised GCN model. In the conventional semisupervised node classification, each row in H represents a node, and GCN defines the final perceptron layer as softmax(D − (1/2) AD − (1/2) H k θ k ) to classify all nodes.
Considering that H 0 is crucial for computing equation (7), we design a fusion unit to integrate the five features ) into one single hidden feature representation. Here, we apply a fully connected layer using a hyperbolic tangent nonlinearity activation function. e detailed transformations are as follows:

Mathematical Problems in Engineering
where W f is the transformation parameter, that is, the weights of the fully connected layer, which are learned automatically in the training. is fusion unit is capable of learning the weights for different types of the features. It is well known that precipitation is related to not only the current weather but also meteorological conditions over the past period. To handle this temporal influence, LSTM is utilized to establish relationships between the node at time t and the nodes before time t, which is formulated as where h is a LSTM-based sequence model. e optimal weight parameters are trained by minimizing the following loss function over all labeled nodes L, i.e.,  Figure 5: e framework of the proposed precipitation nowcasting model. e red nodes mean the precipitation of the corresponding local area is available while the green means the precipitation is unknown.

Topographic map representation
Input data … Predict data Figure 4: Illustration of the precipitation nowcasting which is deemed as a sequence graph modeling problem in this paper. 6 Mathematical Problems in Engineering where L indicates a set of the labeled nodes and j denotes the index of all observation graphs.
After the model training is completed, the future precipitation is easy to compute because y t+n � P t+n ′ .

Experiments
In this section, we first evaluate the proposed method on a large-scale real meteorological dataset. en, we demonstrate extensive experimental results and analysis.

Dataset.
e dataset used in the benchmark contains radar echo images, satellite images from 2016 to 2018, and topographic maps. e radar reflectivity images at 0.5 0 elevation with a resolution of 416 × 416 pixels were collected from local new-generation weather radar stations (Hefei, Bengbu, and Huangshan) and cover a local area centered in each radar station. e topographic maps were collected from Baidu maps. Infrared and vapor channel images of FengYun-2 meteorological satellite images were chosen to increase the accuracy of precipitation nowcasting. e topographic maps and satellite images have the same spatial resolution and cover the same areas. us, a large-scale real meteorological dataset was obtained (partial raw data can be queried from http://www.amo.org.cn/tcsj/zh/index.jsp). All data were checked by five volunteers. Any sample data from consecutive 27 h can constitute a whole training sample (this paper aims to forecast precipitation at a place for the next 3 h based on the weather conditions of it neighbors over the past 24 h. Moreover, to match the sampling time, the precipitation herein is accumulated over the past 20 min). For each local radar station, we randomly selected 5000 samples as the training set, 500 samples as the validation set, and 2000 samples as the test set.

Evaluation Metrics.
We utilize mean-square error (MSE) to evaluate model's effectiveness. e formulation is as follows: where Q is the number of test samples and smaller MSE means better performance. Furthermore, to quantitative evaluate our model's effectiveness influenced by the time variations, we compute MSE in different sampling time periods in the test phase: where i is the index of the ith precipitation map.  Table 1 presents the mean-square errors of different time nodes of the three radar stations. We can observe that the proposed framework predicts the precipitation in the next hour well. It also clearly shows that the error increases with time. Moreover, we can see that our model has the worst performance for Huangshan. We believe that the reason is that the causes of precipitation in mountainous areas are more complicated than elsewhere.

Implementation
To the best of our knowledge, this is the first attempt to model various types of the observed data in a unified network. We describe the performed ablation study below to explore the performance influenced by different factors. e inputs have five forms: re: inputs that only include radar echo maps re + si: inputs that include radar echo maps and satellite images re + si + ah: inputs that include radar echo maps, satellite images, and air humidity images re + si + ah + ti: inputs that include radar echo maps, satellite images, air humidity images, and temperature images re + si + ah + ti + tm: inputs that include radar echo maps, satellite images, topographic map, temperature images, and air humidity images From Table 2, we can see that all observed data used in this paper contribute to short-term precipitation nowcasting. Table 2 also shows that the topographic map usage improves the accuracy slightly, as compared to the other four factors. We think that the reason is that all other observed data are influenced by the topographic map. erefore, the features extracted from the other factors represent the topographic map implicitly.

Comparison with Existing
Methods. We note that some deep learning-based precipitation nowcasting methods have recently been proposed, e.g., [28][29][30]. However, they are all focused on supervised tasks. As we attempt to address the semisupervised automatic precipitation nowcasting problem, we compare our method with the two widely used semisupervised node representation methods, e.g., DeepWalk by [19] and ICA by [27]. Specifically, the GCN utilized in the proposed framework is replaced by these two methods. e results are reported in Table 2. It can be observed that the proposed method achieves the best performance. We think that the major reason is that the proposed framework can handle the spatial (local region) relationship more effectively.

Conclusion and Future Work
In this paper, we propose a multimodal semisupervised deep graph learning framework for precipitation nowcasting. Particularly, multimodal factors, i.e., the radar echo maps, air humidity images, satellite images, temperature images, and available precipitation data over the past several hours with the corresponding topographic map are handled in a unified framework. In addition, we handle areas without precipitation stations, and thus, with unknown precipitation. A GCN with sequence information is presented herein this paper to model temporal and spatial information with semisupervised labels simultaneously. Using the methods above, we successfully implemented efficient precipitation nowcasting via a computer vision technique. To verify our method, we built large-scale real precipitation nowcasting datasets. Extensive experimentation demonstrated that our approach achieves superior performance to the baselines. In the future, we will explore how to embed attention mechanisms in our framework, which may improve the accuracy further.

Data Availability
e data used to support the findings of this study are available at http://data.cma.cn/site/index.html.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.