Physics-informed generative neural network: an application to troposphere temperature prediction

The troposphere is one of the atmospheric layers where most weather phenomena occur. Temperature variations in the troposphere, especially at 500 hPa, a typical level of the middle troposphere, are significant indicators of future weather changes. Numerical weather prediction is effective for temperature prediction, but its computational complexity hinders a timely response. This paper proposes a novel temperature prediction approach in framework ofphysics-informed deep learning. The new model, called PGnet, builds upon a generative neural network with a mask matrix. The mask is designed to distinguish the low-quality predicted regions generated by the first physical stage. The generative neural network takes the mask as prior for the second-stage refined predictions. A mask-loss and a jump pattern strategy are developed to train the generative neural network without accumulating errors during making time-series predictions. Experiments on ERA5 demonstrate that PGnet can generate more refined temperature predictions than the state-of-the-art.


Introduction
Weather forecasting plays a significant role in various fields [1,2], including civil aviation, society-level emergency, and agriculture activities. And, with the development of the economy and society, more accurate and timely weather forecasting is becoming an urgent demand. The mainstream weather forecast methods are physical process-based methods like numerical weather prediction (NWP) [3,4]. Since, from the physical process perspective, weather forecasting is a sustained phenomenon marked by gradual changes through a series of states occurring in the physical world [3]. Thus, it can be solved by physical dynamics [5,6]. NWP leverages the integrated data of different weather services on coarse-grained resolution grids covering wide geographical areas and describes scientific knowledge of the convection-diffusion equation to improve its performance. PDE-Net [35,36] discretizes a broad class of PDEs by approximating partial derivatives with convolution neural network. Vincent [37] proposed PhyDNet, a two-branch deep architecture, which explicitly disentangles PDE dynamics from unknown complementary information. Although these works leverage physical knowledge, they either ignored the unknown physics laws or ignored mitigate conflicts in the generated results [38]. What is worse, that the conflict phenomenon frequently happens in image generatIn this paper, we propose a two-stage model called PGnet that benefits from the both advantages: 1). the excellent numerical fitting ability of deep learning approaches that is commonly with the auxiliary of high-performance computing units. 2). the higher accuracy of physical process. To our best knowledge, no previous work has utilized physical processes and generative networks simultaneously. This phenomenon will be discussed in detail in section 3.4.
In this paper, we propose a two-stage model called PGnet that benefits from the both advantages: 1). the excellent numerical fitting ability of deep learning approaches that is commonly with the auxiliary of high-performance computing units. 2). the higher accuracy of physical process. The first-stage is the physic-informed method that generates propagated images, which could involve some conflict pixels and blurry regions. The second-stage is the generative neural model that can calibrate and improve the first-stage outputs. To our best knowledge, none of any works has utilized physical processes and generative networks simultaneously. Furthermore, a MASK matrix is designed to distinguish these conflict contents, then the it will be feed into the second-stage. We also devise a mask-loss to adjust the contributions of these two stages dynamically, and put froward an evolution method with a jump pattern strategy for time-series multi-frame prediction.
Our contributions are summarized below: (i) We proposed a two-stage model called PGnet that incorporates a physic-informed method and an image synthesis neural network for troposphere temperature prediction.
(ii) We put forward an evolution method with a jump pattern strategy for time-series multi-frame prediction.
(iii) We evaluate PGnet on the troposphere temperature dataset and achieve more refined results than the previous methods.
This paper is organized as follows: . Section 2 describes the background of convectiondiffusion equation. Section 3 elaborate PGnet. Section 4 describes the training details and the performance evaluation. Conclusion and discussion are shown in Section 5.
This paper proposed a two-stage model PGnet that jointly takes known physical laws governed elements and unknown laws governed or conflict happened elements into account. The physics processes help to predict known physical laws governed elements. The generator network helps to capture unknown laws and to synthesis conflict elements.

Background
Partial differential equations (PDEs) play a prominent role in many disciplines of science and engineering. It describes a wide variety of physical phenomena such as sound, heat, fluid dynamics. A specific PDE equation can describe how some physical phenomena transfer into a physical system, such as physical particles, energy(temperature). And it is called the convection-diffusion equation that integrates the diffusion process and the convection process. The traditional method forecasts temperature by solving the convection-diffusion equation using numerical method [26]. The convection-diffusion equation is illustrated in equation 1. Let w be the vector velocity field of the flow with two components (u, v), velocities along with x and y directions, T be the temperature, the governing equations for this physical system are : where ∇ denotes the gradient operator, ∇ 2 denotes the Laplacian operator, and D is the diffusion coefficient. According to theorem 1 of [26], for the initial condition T 0 , there exists a unique global solution T (x, t) to the convection-diffusion equation 1 : where k(u, v) = 1 4πDt e − 1 4Dt u−v 2 is the kernel. Provide the motion w and the diffusion coefficient D. It states that for any timestamp and location, temperature T (x, t) can be calculated by a convolution operation between an initial condition T 0 and a Gaussian probability density function. In other words, if the troposphere temperature underlying advection mechanisms were known, future troposphere temperature could be predicted from previous ones. Unfortunately, neither the initial conditions, the motion vector, nor the diffusion coefficient is known. They have to be estimated from data. Using the same method in [26], in the first-stage of PGnet, we learn to predict a motion field analog to the w in equation 3 to generate intermediate physical processes results. Discretizing equation 2. by replacing the integral with a sum, and setting temperature field T i as the initial condition, we can calculate the future temperature field T i+1 based on the motion field estimateŵ.T As seen by the relation with the solution of the advection-diffusion equation, we use the warping mechanism in [26] that clearly adapted to the modeling of phenomena governed by the advection-diffusion equation. Troposphere temperature forecasting is a particular case.

Method
We view the forecasting of temperature as a video prediction task, giving a series of historical N frames X N = {T 1 , T 2 , . . . , T N } and predicting the future K frames Each frame can be perceived as a local area temperature field. As shown in figure 1, we design a two-stage model PGnet to disentangles physical process propagation from physical-agnostic generation. The first physical processing stage propagates the pixels constrained by convection-diffusion equations using the method described in 2. The first-stage generates a mask matrix and the propagated image that contain conflicting pixels. We define conflicting pixels, including boundary-affected pixels and collision pixels. As we treat the flow field as images, the images' boundary pixels propagated from previous images' outer areas are called boundary affected pixels. There are two cases for collision pixels. Case one: there is no pixel propagate to this coordinate, but boundary conditions do not cause it. Case two: there are at least two pixels propagate to this coordinate. A mask matrix that distinguishes the conflicting pixels is used. Please refer to section 3.4 for the detailed description of mask matrix. The evolution method with a jump pattern strategy used in the first stage will be described in sections 3.2 and 3.3.
Moreover, the second stage network's purpose is to generate the unreliable pixels indicated in the mask matrix. Pixels are not isolated. There are many cases [39][40][41] where appropriate spatial contexts must be retrieved. Spatial context encoders [42] query learned dataset priors with an exposed appearance searching for missing patches. We use the generation network stage to encode the propagated image's spatial context into a latent space. By employing the decoder network, we aim to generate the temperature field, wildly conflicting pixels from the latent space.  Figure 1: The overall architecture is shown in (a). The motion field is generated from input images with skip connected convolutional neural networks. F is a motion field evolution function that learns time-dependent dynamic characteristics of the motion field. The last frame of inputs is warped by the motion field to predict the propagate image. The generation part is also a CNN with encoder-decoder architecture that concatenates propagate image and the mask feature as input to generate the future frame. (b) shows the probabilistic graphical model representation of the proposed motion evolution method in which the evolution order M = 1, the number of inputs frames N = 2, and predicts K frames.

Algorithm 1 Jump Pattern Algorithm
Input: Require data T 1:N , number of predict frames K Output: Predicted imagesT N +1:N +K 1: Get the last input frame T N 2: for i = 0 to K do 3: Predicte motion field dW with physical process network 4: if i==0 then 5: Compute mask M with motion field W 12: Predicte imageT N +i withT N +i and M using generator network 13: returnT N +1:N +K

Model Components
For the physical process stage of PGnet, Conv-Deconv Network with skip connections is first used to produce the interval motion field. The final motion field is computed by motion evolution function F which will be explained in 3.3. A warp scheme described in section 2 that can apply the advection-diffusion constraint is employed to predict the propagate image. Different from [26], the warped image that contains conflicting pixels are served as the final output, the propagate image is utilized as an intermediate feature map in our design. For the generation network, a convolutional encoder-decoder network with symmetric skip connections is used. First, a mask is computed by the motion field to distinguish conflicting pixels. The output propagate image is channel-wise concatenated with mask feature and fed into the generation network. In order to improve accuracy and generate realistic images, an evolution method with a jump pattern strategy is put forward for multi-frame prediction and will be discussed in the below subsection.

Jump Pattern Algorithm
Most of the approaches (ConvLstm [12], DeepRNN [16], MCnet [43], Traj-GRU [15], Vid2Vid [19], etc [44][45]) implement the next-step prediction objective by replacing T n+1 withT n+1 and evaluated at each time-step. These approaches make the hypothesis that front predictions are ground truth for later prediction. However, this hypothesis is extremely strict and leads to the blurry unrealistic image and decrease accuracy, especially for the later frames.
To solve the problem mentioned above and reduce the error accumulation effect, we introduce a jump pattern algorithm shown in algorithm, where T i represents the ith frame, dW temp represents the cache interval motion field, dW represents the current interval motion field. W represents the final motion field, M represents the computed mask. warp(A, B) represents the warp network that deformation the input tensor A by the motion field B. It should be noted that warp (A, B) is also used to warp the motion field in our algorithm which tracks pixels over time steps that distinguish it from previous works. F (.) is motion evolution function. Different from the original method in the Journal of Statistical Mechanics [26] that the next frame is distorted from the previous frame, the frames of all steps are warped from the last frame of input images in our method. As a result, the times of warp and interpolation are reduced to one for each image and the accumulation error is restrained. We verify the effectiveness of this algorithm in the experiment section.

Motion Evolution
The natural spatiotemporal processes like troposphere temperature can be highly nonstationary in many ways. From Cramèr's Decomposition [46], any non-stationary process can be decomposed into deterministic, time-variant polynomials, plus a zero-mean stochastic term. We formally develop a probabilistic graphical model to learn timevariant features by introducing a motion evolution order M which represents the current motion is dependent on previous M motions. Generally, the motion field can be seen as a hidden state of different steps. A specific graph G is illustrated in fig:1 (b) with evolution order M = 1 inputs N = 2 images and predicts K frames without jump pattern. Where H represents the hidden state of different prediction steps (e.g., interval motion field). The structure of the graph G induces the following conditional independence relationships.
Generally, for evolution order M , inputs N images and predict K frames, we have the following conditional independence relationships. That is, the future observation is independent of past observations gave the current observation and future hidden state, and the current ith hidden state is independent of i − N past observations given the previous M hidden state and previous N observation. By the following equation, we can model the joint distribution of T 1:N +K and H 1:N +K by using the motion evolution method.  (dW temp , dW )). Momentum based function F (.): dW temp = (1 − β) * dW + β * dW temp . As Lucas-Kanade method [47] assumes, the optical flow is essentially constant in a local neighborhood.
We argue that such locality can be learned by CNN based function F (.). And the previous M motion fields are concatenated with the current motion field and input into the convolutional network. Besides, inspired by inertia, the motion vector generally does not vary dramatically. And humans can track the trajectory of objects with the latent notion of inertia. For our task, the motion field of different prediction steps should maintain stability and continuity. In order to obtain the beneficial effect of inertia, momentum-based function F (.) is proposed (M = 1 in this case). Where β ∈ [0, 1) is momentum coefficient.

Mask-Loss
Compared to deep learning methods, boundary conditions are often critical in the physical process methods like NWP. For a flow field image, most deep learning methods treat the pixels propagate within the image boundaries. This assumption is limited for the small region because boundary pixels are often propagated from outer areas. Besides, pixels also have occlusion problems and the "ghosting" effect [38] when regions in the target frame to which extrapolated optical flow have no projection. Motivated by these ideas, a mask that can distinguish conflicting pixels, including boundary conditions affected pixels and collision pixels, is introduced. Furthermore, a mask-loss is designed to dynamically adjust the contributions of the physics process and generation network. An energy-based method to compute the occlusion map from the optical flow is mentioned in [38,48,49]. Differed from [38], the mask in our method is achieved by the motion field that generated from the physical process stage of PGnet. Besides, the boundary conditions are also considered. The pixel density can be viewed as an energy map. In order to distinguish pixels influenced by border inspection conditions, a specific padding value(zero in our study) is used in the warp scheme [26]. Consequently, the pixels propagate from the outer area can be identified. We aim to compute a mask matrix that distinguishes conflicting pixels, including boundary-affected pixels and collision pixels. Similar to the method in [38]. We initialize the energy field of the first frame to be a matrix filled with ones, denoted as E 1 x,y . Given a motion field, for each pixel in the first frame, the energy unit on each coordinate will be computed by its corresponded coordinates in the second frame using the equation 4. Given the motion field and we get a new energy field E 2 x,y . We consider two special cases for each coordinate (x, y) in the second frame: 1. If E 2 i,j = 0. there is no pixel moving to this coordinate(mostly the boundary affected pixels padded by warp net) 2. If E 2 i,j ≥ 2. there are at least two pixels in the first frame compete for the same location, which suggests it occluded and not trustable. The mask are generated by E 2 i,j values.
To dynamically adjust the contributions of physics process and generation network a masked loss is designed. The masked loss and overall loss is formulated below.
Where N = W * H is the number of pixels. W and H are the width and height of the image. The parameter α is used to balance the weights of different parts. Different from the magnitude regularization in [26], which causes unstable of the model because of its negative coefficient. We just additionally apply divergence loss and smoothness loss.

Dataset
The dataset used in our experiment comes from ERA5 [50], a climate reanalysis dataset from ECMWF (European Centre for Medium-Range Weather Forecasts) [51]. The ERA5 dataset offers various atmospheric, land-surface, and oceanic variables in a spatial resolution of 31 km, a time resolution of 1 hour, and 137 vertical levels. In this experiment, temperature data of 500 hPa level is used and sampled at 6-hour intervals within 1980-01-01 00:00 UTC to 2018-01-01 00:00 UTC. The sampled data of the adjacent three days form a sequence of 12 images in time order. Then we got 4510 image sequences in sum; 85% is used to training 5% for validation and 10% for testing. The latitude and longitude range of the dataset is 10.5 • N to 74 • N and 72 • E to 135.5 • E, with a spatial resolution of 0.5 degrees (about 50 km).

Baseline Comparison
We evaluated our approach on the ERA5 dataset and compared it with several baselines. The model with 4 input frames is forecasting images on a horizon of 8 and evaluated with the mean square error (MSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM) metrics and pattern correlation (CORR). Besides, MSE scores are also given by different steps. The hyperparameters are tuned using the validation set. Models are trained on V100 NVIDIA GPUs. For measuring the effectiveness of the jump pattern algorithm and the motion evolution method, a progressive ablation experiment is designed with motion evolution order M = 1. Higher-order evolution is an interesting area for future research. Two variants of our model with different forms of function F (.) are evaluated. For CNN based function F (.), two 3 * 3 convolutional layer and one 1 * 1 convolutional layer [52] is stacked to capture spatial-temporal correlation of the motion field. Momentum based function F (.) is designed as PGnet-Momentum that concatenates motion field with propagating image and the mask feature as the input of the generator network. The hyperparameter α was set to 0.9 by cross-validation. Our model can be viewed as an upgraded version of the model proposed in [26], a physically-constrained advection-diffusion flow model. Compared with [26], our method not only applies physical constraints but also uses more deep learning methods like the generator network and motion evolution. We also introduce two purely deep learning models ConvLSTM and DeepRNN, as our baselines. ConvLSTM uses convolutional transitions in the inner LSTM module, and DeepRNN [16] can stack RNN deep effectively.

Quantitative Results
Ablation To better evaluate the effectiveness of our jump pattern algorithm and motion evolution method, we finally design a progressive ablation experiment. Table 1 shows the experimental results of ablation on validation set. MSE, SSIM,PSNR are measured as average scores on every predicted frame. We can see that the generation network elevated the scores most. Jump pattern algorithm can improve MSE and PSNR benefits from the suppressed cumulative error by reducing the times of interpolation and deformation.
When it comes to motion evolution, the method can greatly increase the scores of all metrics because of the dynamic characteristics are learned by the proposed motion evolution method. The result is in line with our theoretical expectation and proves the effectiveness of the jump pattern and motion evolution.
Jump Pattern The MSE and SSIM scores of each step is shown in Figure 2. With the increase of the step, all model's prediction accuracy and similarity are decreasing. Our models outperform base model Emmanuel et al.(2019) [26] at every prediction steps. and PGnet W/G,J,M achieves the best performance. The base model [26] seems not adapted to our instantaneous temperature dataset, which has more time correlations, and it can not generate pixels when conflict happens. PGnet W/G,J produce a lower MSE score in the short term (first three steps) compared with PGnet W/G, but capture long term dynamics. The score gap between PGnet with and without the jump pattern algorithm indicates the effectiveness of the algorithm. Figure 2 (a), the motion evolution method with momentum improves the MSE score by 0.43 on average for every prediction step. Figure 2 (b) also indicates that the motion evolution method with momentum improves the SSIM at every step. The results show that our method can keep similarities and reduce errors by capturing the time-variant features of the motion field. We also found that the motion evolution method leads to a faster convergence rate and a lower loss during training.  In order to study the impact of different types of motion evolution methods, we compared the step-8 prediction frame of two variants of the motion evolution method mentioned in 3.3. β is set to 0.9999 for PGnet-momentum that will be described below. From Figure 3, the shape information of the vortex has been lost for PGnet-Conv, but captured by PGnet-Momentum even in the long term. PGnet-Momentum produces images that are closer to the real temperature field image. One explanation is that timevarying features are more important than local constant spatial features for long-term prediction. As a result, PGnet-Conv with convolutional structure keeps more attention on spatial features and lost various detailed features of time in the long run. In contrast, PGnet-Momentum tracks the dynamic trajectory of every pixel and predicts a sharper image in the long term. Besides, the impacts of the coefficient β for method momentum   Table 2, When β is in 0.999 ∼ 0.9999, it works reasonably fine. It indicates that a relatively large momentum is beneficial to motion. When β is small (e.g., 0.9) or no momentum (β is 0), MSE drops significantly; When β is 1, there is no momentum, and the motion is consistent with the first motion field for every prediction step which is not considered in this paper. These results support our motivation of the motion field will not change dramatically.

Motion Evolution From
Evaluation Quantitatively, MSE, PSNR, SSIM scores of our model outperform other models. Table 3 shows the evaluation of our model and other baselines [26][15] [16]. Our model PGnet-Conv PGnet-momentum all outperforms other candidate models. And PGnet-momentum achieves the best performance on our dataset and respectively improve MSE scores by 2.6% on average. Figure 4 exhibits the motion field of our model. The picture above is predicted by PGnet W/G and the below picture is generated by PGnet W/G,J,M. As we see, PGnet without jump pattern is more messy and changeable. In contrast, the motion fields of different prediction steps with jump pattern and motion evolution are more similar and increasing with time steps and produce more convergent results.

ConvLstm
DeepRnn PGnet (a) step 1 (b) step 2 (c) step 3 (d) step 4 (e) step 5 (f) step 6 (g) step 7 (h) step 8 Figure 5: Experiment results of different models in the 2 days (8 steps) temperature field image prediction task. The time difference between each consecutive step is 6 hours. The bright spot shown in the small red box indicates the highest temperature, while the vortex in the large red box represents dramatic temperature changes. Our new model captured bright spot and vortex, which are distinct from the absence of these features in other models. Figure 5 presents the ground truth and the predicted frames of different models. The PGnet model using momentum-based F (.) with β = 0.9999 and evolution order M = 1. All models are given 4 frames as input and required to predict the next 8 frames. In contrast to the base model CDNN with regularization [26], other methods capture more dynamics. It indicates that although the CDNN model is constrained by physical equations and produces clear prediction pictures, it can not learn the dynamics. ConvLSTM and DeepRNN model seems to capture some dynamics but suffers more from blurriness with the step increases because of the lack of physical constraints and cumulative errors. Besides, our model benefits from both physics and deep learning methods. It learns higher-order dynamics by introducing the motion evolution method. As demonstrated in Figure 5 that both the bright spot and vortex shown in red boxes are captured by our model, while other models are missed. PGnet predictions are the closest to ground truth and captures more details that are essential to weather forecasting.

Conclusion
In this study, we introduce PGnet that incorporates a physic-informed method and an image synthesis neural network for troposphere temperature prediction. Towards learning time-variant features and increase accuracy, the motion field evolution method with a jump pattern strategy are proposed. We validate our proposal through a series of ablation experiments and provide exhaustive comparisons of PGnet with baselines. Quantitative studies on the ERA5 500 hPa level temperature dataset demonstrate that the physics-informed model can effectively improve the accuracy of upper atmosphere temperature prediction. The proposed method's effectiveness on other variables and levels is an exciting research area that we will study in the future.