UReslham: Radar reflectivity inversion for smart agriculture with spatial federated learning over geostationary satellite observations

The frequent occurrence of severe convective weather has adverse effects on the smart agriculture industry. To enhance the prediction of severe convective weather, an inversion model can fill radar reflectivity data gaps by leveraging geostationary satellite data, offering more comprehensive and accurate meteorological support for smart agriculture systems. Nevertheless, collaborative cross-regional inversion driven by dispersed radar data faces challenges in efficiency, privacy, and model accuracy. To this end, we employ a U-shaped residual network with an embedded light hybrid attention mechanism and use a federated averaging algorithm for efficient distributed training across multiple devices, which preserves the privacy of data from different locations and thereby improves inversion performance. In addition, to address the imbalanced nature of radar data, a weighted loss function is designed to enhance the model's sensitivity to high radar reflectivity. Experimental results demonstrate that the proposed model improves radar reflectivity inversion performance across different thresholds compared with other models, substantiating the superiority of the proposed approach.


INTRODUCTION
In recent years, there has been a growing concern in society regarding extreme weather events.
The suddenness, widespread nature, and destructiveness of these events pose a serious threat to the agricultural industry. 1,2 As a cornerstone of the national economy, agriculture urgently needs effective measures to mitigate losses and ensure food supply in the face of meteorological disasters. Especially in the current era of rapid technological advancement, smart agriculture has emerged as a crucial direction for agricultural production. In this context, accurate weather data are critical to the realization of smart agriculture. By integrating meteorological data with advanced technology, precise predictions of extreme weather events can be made, 3 allowing agricultural producers to better respond to climate change and disaster risks. Smart agriculture can then adjust planting plans, irrigation schemes, and harvesting timings more effectively, increasing crop yields and quality, which contributes to sustainable agriculture goals and promotes the modernization of agriculture. In this new age of agriculture, the importance of meteorological data is increasingly highlighted, making it an indispensable element in ensuring the sustainable development of agriculture and food security. Meteorological radar plays an indispensable role in this system, 4 leveraging its ability to monitor continuously and frequently over extended periods and its strong penetration capability to accurately capture the characteristics of particles within clouds. However, radar installation requirements are high, and in steep terrain radar signals are susceptible to scattering interference. This is particularly evident in the southwest region, where radar coverage blind spots exist, 5 affecting the accuracy of meteorological information acquisition for smart agriculture. Despite a nationwide coverage area of approximately 2.2 million square kilometers, radar stations in the central and eastern regions are spaced 150-200 km apart, while in the western regions the spacing increases to 250-300 km. This not only hampers effective monitoring of extreme weather but also limits the nationwide application of smart agriculture systems. Therefore, to further enhance the monitoring of agricultural meteorological disasters, it is crucial to address the issue of insufficient radar data. Doing so will meet the demand for efficient utilization of meteorological information in smart agriculture and promote sustainable agricultural development.
With the continuous improvement of infrastructure, the development of remote sensing and meteorological satellite technologies has led to explosive growth in the information generated from satellite and radar data. 6,7 Compared with radar, meteorological satellites 8 offer advantages such as a broad observation range and high spatial-temporal resolution. Positioned in geostationary orbits high above the Earth, satellites continuously capture vast amounts of remote sensing and meteorological data. When terrain and clutter interference are excluded, however, radar monitoring of meteorological elements in a given area is more precise than satellite observation. By studying models that invert radar reflectivity from satellite data, it is possible to compensate for missing radar data, thus addressing data gaps and nonuniformity. This integrated use of data provides smart agriculture with more comprehensive and accurate meteorological information, enabling more intelligent and efficient agricultural production. Moreover, with the booming development of deep learning in recent years and its penetration into the meteorological field, 9 some progress has been made on radar reflectivity inversion models. However, existing inversion models still face significant hurdles.
The first challenge lies in the model's inversion capability amidst complex data. Because intricate satellite data contain numerous interfering features, extracting the fundamental information closely related to radar reflectivity proves exceedingly challenging for most existing models. Additionally, the lack of raw-information assistance during the inversion from highly condensed features back to reflectivity leads to less accurate outcomes. Hence, there is an urgent need for new high-precision inversion models to bridge the relationship between high-spatiotemporal-resolution satellite data and radar reflectivity data.
The second challenge lies in model parameters: typical deep learning algorithms involve a massive number of parameters, requiring substantial computational power and energy to support, which inevitably results in significant carbon emissions and energy consumption. This contradicts the direction of sustainable, low-carbon development in smart agriculture. The third challenge is the phenomenon of data silos. One contributing factor is that radar stations are typically located in different geographical locations, each responsible for monitoring and collecting radar data in a specific region. This decentralization results in a lack of effective data exchange between radar stations, forming data silos among them. Another factor is that satellite and radar data are transmitted from various dispersed data reception platforms to central servers, with some probability of exposing sensitive information during transfer, posing a risk of data leakage. To address such issues, Google introduced federated learning, which allows each platform to process data and train models locally without transmitting raw data to a central server. Each platform only shares encrypted model parameters, weights, gradients, and other features, ensuring data privacy. This learning approach not only addresses hardware and data transfer issues but also effectively protects sensitive information.
To address the aforementioned three major challenges, we adopt lightweight network models. By inverting meteorological radar reflectivity from multichannel satellite data, we ensure high inversion accuracy while reducing network parameters. This approach significantly alleviates the problem of gaps in radar data, providing more accurate and efficient data support for smart agriculture while promoting better agricultural production decision-making and management. Additionally, we introduce a spatial federated learning framework that utilizes the idle resources of multiple servers to achieve efficient parallel training. In the field of smart agriculture, given the importance of sustainable computing, our research focuses on the energy efficiency and data efficiency of intelligent computing. Through the application of lightweight networks and federated learning, our goal is to enhance the performance of inversion models while minimizing resource consumption and carbon emissions in smart agriculture, injecting new impetus into the high-quality development of agricultural production. The main contributions of the paper are summarized below:
• We introduce a spatial federated learning framework, allowing each radar station to conduct data processing and model training locally on satellite and radar datasets. This ensures data privacy while enabling efficient parallel training.
• We propose a U-shaped residual network that combines the hierarchical residual structure with the U-shaped network design concept. It extracts essential information features from intricate satellite data more precisely and establishes a relationship bridge between the satellite data and the radar reflectivity data, providing more accurate weather information for smart agriculture.
• We incorporate a hybrid attention mechanism layer into the reconstruction of radar reflectivity. This mechanism enhances the information support during reconstruction, improving inversion accuracy while minimizing the parameters of the network model, thereby reducing the energy consumed during training and achieving low-carbon sustainable computing.

The inversion of remote sensing data
In recent years, the relationship between smart agriculture and meteorology has become increasingly intertwined. Das et al. 10 emphasized the significance of agricultural meteorology research related to extreme events. Zhang et al., 11 considering the current status of meteorological services and smart agriculture, analyzed the demand for meteorological services in smart agriculture and provided targeted suggestions to enhance them. With the further development of satellite sensing technology, Mahankale 12 utilized satellite meteorological data to investigate spatial variations in agricultural productivity and their correlation with weather variables such as temperature and precipitation. Such research aids decision-makers in formulating effective agricultural policies to mitigate the impact of climate change on crop yields. In the context of smart agriculture and meteorological big data, artificial intelligence algorithms can identify trends and correlations in large datasets, optimizing numerical inversion models, improving data processing efficiency, and better estimating and correcting data quality.
In the early stages of artificial intelligence, researchers applied the technology to remote sensing-based precipitation estimation, and the results indicated that such algorithms outperformed traditional generalized linear models. 13,14 Later, with the emergence of Convolutional Neural Networks (CNNs), 15 an increasing number of researchers applied convolutional networks to precipitation and infrared inversion, 16,17 confirming the feasibility of CNNs in meteorology. Building upon this progress, Ronneberger 18 improved the CNN architecture with U-Net, which has since been widely adopted in image segmentation. Numerous investigations have demonstrated the superior performance of deep learning architectures over conventional techniques in convective weather inversion experiments. 19 Liu et al. 20 established a relationship between cloud-top infrared brightness temperature data from meteorological satellite observations and precipitation intensity to perform precipitation forecasting. Even so, meteorological satellite observations have limitations, as they cannot penetrate cloud clusters to observe internal conditions. For instance, some cloud systems do not generate precipitation, yet their cloud-top temperatures may be relatively low, causing them to be mistakenly identified as precipitation clouds and resulting in prediction errors. Beusch et al. 19 introduced a rainfall inversion approach grounded in neural networks, a noteworthy advancement beyond conventional linear models. Meanwhile, Wang 21 pioneered a deep learning inversion technique specifically targeting infrared precipitation, substantiating its efficacy by fusing a convolutional neural network with inputs from multiple physical channels. Kyle et al. 22 built a convolutional neural network to convert GOES-R radiance and lightning observations into a synthetic radar reflectivity field to improve short-term convective-scale forecasts of high-impact weather hazards. Duan et al. 23 reconstructed radar reflectivity with a U-Net network from Himawari-8 satellite radiance data; analyzing an individual summer convective event in northern China, they reproduced the location, shape, and intensity of the convective storm well. Lin et al. 24 proposed a novel approach integrating advanced encryption standards, multiscale feature fusion, and attention techniques to safeguard the transmission of remote sensing data and enhance the robustness of meteorological satellite data retrieval models, particularly addressing nonprecipitating cloud interference. Some researchers have also used deep learning to estimate synthetic radar reflectivity from observations of China's new-generation FY-4A geostationary meteorological satellite and terrain data. 25,26 Despite this progress, current inversion techniques struggle to maintain high spatial and temporal resolution during neural network training, so the inverted radar data often fail to meet expectations. It is therefore necessary to design a strong neural network that maintains high spatial and temporal resolution in order to improve inversion accuracy and obtain accurate radar data.

Federated learning framework
In 2016, McMahan et al. 27 proposed the federated averaging algorithm, which uses a central server to coordinate multiple clients for distributed collaborative training. Federating multiple clients solves a distributed machine-learning problem in which all client data are controlled by the clients themselves and kept local, never shared. Each client uses its data to update the parameters iteratively and transmits them to the central server, where the parties' models are finally aggregated into the federated shared model. Federated learning is classified into horizontal federated learning, vertical federated learning, and federated transfer learning based on differences in the feature space and sample space of the data. Horizontal federated learning suits data that share the same feature space across different samples; vertical federated learning suits parties holding different but related features; federated transfer learning suits cases where both the feature space and sample space differ. In our dataset, although the sample data come from different places, they share the same feature information, so we adopt horizontal federated learning.
In recent years, more and more sectors have begun to deploy federated learning in practice. Ramaswamy et al. 28 fused a word-level recurrent neural network language model with federated learning, training the language modeling task locally on each client and demonstrating the feasibility of federated learning for natural language understanding tasks. In 2018, Brisimi et al. 29 employed federated learning to train Support Vector Machine (SVM) classifiers on training datasets containing confidential electronic health information from various hospitals. This approach holds immense potential in the medical domain, enabling tasks such as predicting heart conditions and assessing the necessity of hospitalization while safeguarding patient privacy through decentralized model training.
Moreover, in the face of the COVID-19 pandemic over the past two years, medical institutions have had high confidentiality requirements for patients' medical data, making it challenging to share medical information among healthcare facilities and restricting the application of deep learning in the medical domain. Wang et al. combined federated learning and blockchain technology to effectively segment patients' chest CT images, helping doctors diagnose COVID-19 pneumonia.

MODEL FRAMEWORK
The overall architecture of the model is depicted in Figure 1. We employ lightweight network models and spatial federated learning, utilizing the idle resources of multiple servers to achieve efficient parallel training. Our objective is not only to enhance the performance of the inversion model but also to minimize resource consumption and carbon emissions in smart agriculture. Specifically, during training, satellite and radar data collected from various regions are transmitted to the nearest Smart Agriculture Workstation, which serves as a server.
Within the federated learning framework, multiple servers concurrently conduct model training. Once trained, the models are deployed on the nearest servers for prediction and meteorological data integration; the results are then transmitted to Smart Agriculture Automated Weather Stations within a certain range. This provides smart agriculture with more comprehensive and accurate meteorological information, contributing to intelligent and efficient optimization of agricultural production.

F I G U R E 1
The overall model framework.

Model training framework based on spatial federated learning
In the context of smart agriculture, while data interoperability exists in most regions, there are challenges such as slow data exchange and the risk of data leakage during transmission, particularly for relatively sensitive radar reflectivity data. Therefore, in this paper we adopt a model training framework based on spatial federated learning, in which local training repeats until the predefined number of iterations is reached. In the practical scenario of agricultural meteorological data, each smart agriculture workstation serves as a data server, receiving the satellite and radar data signals covering its region. The model is trained and iterated inside each server, and only training parameters are transferred between servers, without transmitting the private data itself. After training, the predicted results are transmitted to the nearest smart agriculture automated meteorological station.
The detailed training process of the federated averaging algorithm is illustrated in Algorithm 1.
The algorithm defines a total of N clients involved in training; C represents the proportion of clients participating in each round of computation, so C × N clients take part in each round. E is the number of local epochs each client runs over its data in each round, B is the batch size used for client updates, and D is the total number of samples across all clients. Additionally, the learning rate is denoted η, and d_n represents the number of samples held by client n.
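As a rough illustration of the training loop described above, the following NumPy sketch implements federated averaging for a toy one-parameter model. The client data, the linear model, and the hyperparameter values are invented for illustration; only the structure (a fraction C of clients per round, E local epochs with batch size B, and server aggregation weighted by d_n / D) follows the algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def client_update(w, data, epochs, batch_size, lr):
    """Run E local epochs of minibatch SGD on one client's data.
    The model is a toy 1-D linear fit y = w*x, standing in for the network."""
    x, y = data
    for _ in range(epochs):
        for start in range(0, len(x), batch_size):
            xb, yb = x[start:start + batch_size], y[start:start + batch_size]
            grad = np.mean(2 * (w * xb - yb) * xb)  # d/dw of the MSE
            w = w - lr * grad
    return w

def federated_averaging(clients, rounds, frac, epochs, batch_size, lr):
    """FedAvg: each round, a fraction C of the N clients trains locally;
    the server averages their weights in proportion to d_n / D."""
    w_global = 0.0
    n_pick = max(1, int(frac * len(clients)))
    for _ in range(rounds):
        picked = rng.choice(len(clients), size=n_pick, replace=False)
        sizes = [len(clients[i][0]) for i in picked]
        local_ws = [client_update(w_global, clients[i], epochs, batch_size, lr)
                    for i in picked]
        w_global = sum(w * s for w, s in zip(local_ws, sizes)) / sum(sizes)
    return w_global

def make_client(n):
    # All clients follow y = 3x (same feature space, different samples:
    # the horizontal federated setting).
    x = rng.standard_normal(n)
    return (x, 3.0 * x)

clients = [make_client(32) for _ in range(3)]
w = federated_averaging(clients, rounds=5, frac=1.0, epochs=2, batch_size=8, lr=0.1)
print(round(w, 2))  # converges toward 3.0
```

Only `w_global` and the local updates cross the server boundary here; the per-client `(x, y)` arrays never leave `client_update`, mirroring how raw radar data stay on each workstation.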

U-shaped residual network with attention
We propose a U-shaped residual network with an embedded attention mechanism. The overall architecture of the network is illustrated in Figure 3. The network consists of three main parts: the encoding layer with residual structures, the light hybrid attention mechanism layer, and the decoding layer with cross-layer feature reconstruction.

Encoder
The traditional encoder of the U-Net network, that is, the feature extraction layer, has only two convolutional layers per stage, each followed by basic batch normalization and a ReLU activation. Although it attracted great attention initially, when applied to radar inversion this shallow network fails to extract effective features, necessitating an increase in the depth of the neural network. However, deeper networks introduce new challenges, such as the gradient explosion commonly observed in deep networks. To avoid such problems, ResNet 30 proposed the residual structure. The residual structures of the solid and dashed lines are shown in Figure 4. The calculation formula is out = p(x) + x, where x is the input of the structure, p(x) represents a series of operations on the input x, and the addition is element-wise matrix addition.

F I G U R E 4
The residual structures of the solid and dashed lines: the solid-line structure on the left and the dashed-line structure on the right. The dashed-line residual structure has a downscaling effect. (A) Residual solid structure, (B) residual dashed structure.
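As a minimal sketch, the residual computation out = p(x) + x can be written directly. The branch `p` here is a simple stand-in for the actual convolution + batch normalization + ReLU stack, and the dashed-line variant adds a projection on the shortcut so the branch may change shape.

```python
import numpy as np

def residual_block(x, p):
    """Solid-line residual structure: out = p(x) + x.
    p stands in for the conv/BN/ReLU stack; the identity shortcut
    is added element-wise."""
    return p(x) + x

def residual_block_dashed(x, p, shortcut):
    """Dashed-line structure: a projection (e.g., a strided 1x1 conv)
    on the shortcut lets the block downsample or expand channels
    while keeping the two branches' shapes compatible."""
    return p(x) + shortcut(x)

# Toy branch: a fixed linear map standing in for the conv stack.
x = np.array([1.0, -2.0, 3.0])
out = residual_block(x, lambda v: 0.5 * v)
print(out)  # [ 1.5 -3.   4.5]
```

Because the shortcut carries x through unchanged, gradients can flow around `p`, which is what lets the encoder stack many basic blocks without the instabilities mentioned above.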
Compared with continuously emerging novel network architectures, the plain ResNet is less satisfactory, particularly at restoring condensed features through feature fusion in the recovery network. However, the ResNet residual structure itself still prevails within the intricacies of various novel network designs. In this paper, the hierarchical residual structure of ResNet is combined with the U-shaped network design concept to seek better inversion results. Specifically, the core idea behind the U-shaped network design is to integrate the encoder and decoder into a symmetrical network architecture. The encoder's task is to gradually reduce the spatial resolution of the input image and extract high-level features, while the decoder gradually restores the resolution, utilizing the features extracted by the encoder to generate the final result. We embed residual modules into the encoder to fully leverage their feature extraction capabilities and to enhance the effectiveness of the feature recovery stage. In the detailed implementation, before the basic blocks in each layer, the image is first halved in height and width using a MaxPool operation, and then a series of basic blocks performs feature extraction and channel dimension expansion. Notably, the dashed-line residual structure facilitates dimensionality increase between basic blocks with different stacking levels, allowing feature maps on the shortcut branch to be added to feature maps on the main branch in the same shape. This ensures that feature copies from the skip connections on the right side keep the same shape as the upsampled feature copies from the lower layer.

Decoder
In the decoder stage, upsampling to restore image size essentially creates detail from very little, requiring a large amount of auxiliary information. Specifically, as the network deepens, the feature map grows in the channel dimension but shrinks spatially, so the richness of high-resolution detail in the feature maps decreases and auxiliary information becomes increasingly scarce. In our network, each layer of the encoding stage uses multiple basic blocks for feature learning to maintain high-resolution information and generates copies that provide high-resolution auxiliary information for the upsampling process in the decoder. The network uses five upsampling operations in total to gradually restore the image size and reconstruct image details. In the cross-layer feature reconstruction operation, each decoding layer embeds the scale information from the corresponding encoding layer into a hybrid attention mechanism layer, followed by a 1 × 1 convolution and a ReLU operation. The purpose is to combine high-resolution detail from the encoder stages with the different upsampling stages of the decoder. This fusion introduces multiscale, multilevel information, comprehensively enhancing the reconstruction of image details.
In U-Net, skip connections generate feature copies at different scales for each stage of the decoder. In addition, our network embeds a hybrid attention mechanism layer for each of these feature copies. This allows the network to learn the weights of features at different scales and selectively pass the most informative features to the corresponding decoder scale for image reconstruction. The network performs cross-layer feature reconstruction on feature maps with spatial sizes of 14, 28, 56, 112, 224, and 448, six scales in total, thus compensating for high-resolution detail information during decoder upsampling.
In addition to cross-layer feature reconstruction, the decoder also performs cross-layer splicing. Specifically, every decoder scale except the last must splice the feature maps produced by cross-layer feature reconstruction with the feature maps obtained from lower-level upsampling. After concatenation, a 1 × 1 convolution and ReLU operation are applied to the combined feature maps. Note that the 1 × 1 convolution and ReLU mentioned for the encoder aim to improve cross-channel information interaction and increase the nonlinearity of the feature map, whereas here the operation mainly performs dimensionality reduction to facilitate the next upsampling. For instance, in the fourth scale's decoder layer, the feature map of size (512, 28, 28) is concatenated with the feature map of size (256, 28, 28) obtained from cross-layer feature reconstruction, yielding a feature map of size (768, 28, 28). A 1 × 1 convolution and ReLU then reduce it back to (512, 28, 28).
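The shape bookkeeping in the example above can be checked with a small NumPy sketch. The random feature maps and weights are placeholders; the 1 × 1 convolution is written out as what it is, a per-pixel linear map over channels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Upsampled feature map from the lower layer and the cross-layer
# reconstructed copy from the encoder, in (channels, height, width) layout.
upsampled = rng.standard_normal((512, 28, 28))
skip_copy = rng.standard_normal((256, 28, 28))

# Channel-wise concatenation: (512, 28, 28) + (256, 28, 28) -> (768, 28, 28)
merged = np.concatenate([upsampled, skip_copy], axis=0)

# A 1x1 convolution is a linear map over channels applied at every pixel;
# followed by ReLU it reduces the merged map back to (512, 28, 28).
w = rng.standard_normal((512, 768)) * 0.01  # (out_channels, in_channels)
reduced = np.einsum("oc,chw->ohw", w, merged)
reduced = np.maximum(reduced, 0.0)  # ReLU

print(merged.shape, reduced.shape)  # (768, 28, 28) (512, 28, 28)
```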
All upsampling operations in this network use bilinear interpolation, which computes each output value as a weighted average of neighboring pixels in the input tensor. Compared with methods such as nearest-neighbor interpolation, bilinear interpolation produces smoother results and preserves the spatial correspondence between input and output, with corner pixels aligned to maintain spatial relationships.
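A minimal NumPy implementation of bilinear upsampling with corner pixels aligned, matching the behavior described above; the tiny 2 × 2 input is purely illustrative.

```python
import numpy as np

def bilinear_upsample(img, out_h, out_w):
    """Bilinear upsampling with corner pixels aligned: each output value is
    a weighted average of the four nearest input pixels."""
    in_h, in_w = img.shape
    # Map output coordinates back onto the input grid, keeping corners fixed.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    return ((1 - wy) * (1 - wx) * img[np.ix_(y0, x0)]
            + (1 - wy) * wx * img[np.ix_(y0, x1)]
            + wy * (1 - wx) * img[np.ix_(y1, x0)]
            + wy * wx * img[np.ix_(y1, x1)])

img = np.array([[0.0, 2.0],
                [4.0, 6.0]])
up = bilinear_upsample(img, 3, 3)
print(up)  # [[0. 1. 2.] [2. 3. 4.] [4. 5. 6.]]
```

Note how the four corner values of the output equal the four input pixels exactly; that is the "corners aligned" convention the text refers to.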

Light hybrid attention module
The skip connection used in the traditional U-Net network effectively mitigates the vanishing gradient problem, helping the network learn features, accelerate convergence, and improve performance. Nonetheless, skip connections may introduce information redundancy: some features appear at multiple levels, causing the network to learn the same features repeatedly while neglecting other useful information. Inspired by Squeeze-and-Excitation (SE) networks, which introduced an attention mechanism along the channel dimension, we adjust the structure of the network's convolutional and activation layers and embed an attention layer at each level to obtain more effective feature information from the actual data, which leads to better generation results. We aim to improve the feature correlation between the decoder and the encoder by adding a branch with a light hybrid attention layer before each layer's feature extraction of the input images; this effectively reduces the feature loss incurred during upsampling. The light hybrid attention layer consists of a channel attention layer and a spatial attention layer; the overall architecture is shown in Figure 5.
The entire process can be summarized as

X_mid = A_c(X) ⊗ X,
X_end = A_s(X_mid) ⊗ X_mid,

where X ∈ R^(C×H×W) is the input feature, X_mid is the result after channel attention, and X_end is the result after spatial attention and also the output. ⊗ denotes element-wise multiplication (with broadcasting). A_c ∈ R^(C×1×1) is the channel attention and A_s ∈ R^(1×H×W) is the spatial attention.
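The two-step reweighting can be sketched in NumPy. The attention score maps are taken as given (zero logits here, purely for illustration), since the point of this sketch is only the broadcasted element-wise multiplications and the resulting shapes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def apply_hybrid_attention(x, channel_scores, spatial_scores):
    """X_mid = A_c ⊗ X, then X_end = A_s ⊗ X_mid, where ⊗ is element-wise
    multiplication with broadcasting.

    channel_scores: (C, 1, 1) logits -> channel attention A_c
    spatial_scores: (1, H, W) logits -> spatial attention A_s
    """
    a_c = sigmoid(channel_scores)   # A_c in R^(C x 1 x 1)
    x_mid = a_c * x                 # reweight whole channels
    a_s = sigmoid(spatial_scores)   # A_s in R^(1 x H x W)
    x_end = a_s * x_mid             # reweight spatial positions
    return x_end

C, H, W = 4, 8, 8
x = np.ones((C, H, W))
out = apply_hybrid_attention(x, np.zeros((C, 1, 1)), np.zeros((1, H, W)))
print(out.shape, out[0, 0, 0])  # (4, 8, 8) 0.25  (sigmoid(0)^2 = 0.25)
```

Broadcasting does the bookkeeping: a (C, 1, 1) map scales every pixel of each channel equally, while a (1, H, W) map scales every channel at each pixel equally.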

F I G U R E 6
The channel attention layer. The gray blocks represent the convolutional module, described in detail below.
It is worth noting that using channel attention or spatial attention in isolation can neglect crucial information: the former may overlook spatial positions, while the latter may disregard interchannel relationships. To handle images and other tasks effectively, it is common to integrate both. This combined attention mechanism simultaneously considers the correlations among feature channels and among spatial locations, capturing image feature relevance more comprehensively and, consequently, improving model performance.
The channel attention layer is shown in Figure 6. Channel attention is computed as

A_c(X) = σ(ConvModule(AvgPool(X)) + ConvModule(MaxPool(X))),

where the first convolutional layer of the ConvModule reduces the channel dimension by the reduction ratio r (the choice of specific values of r is explained later) and σ is the Sigmoid function.
Each channel of the feature map has equal weight before the attention layer is applied. After channel attention, the weights of the different channels change so that the network pays more attention to the channels that are most relevant to the outcome. Similar to the SE mechanism, the channel attention layer performs global average pooling over the input channels; in addition, to reduce the interference of redundant information in each layer's features, a copy of the input features is passed through global maximum pooling. The two features produced by the average and maximum pooling layers are then jointly fed into the convolutional module for channel weight learning. Finally, the two branches are combined, followed by a convolutional operation and Sigmoid processing; the result serves as the input to the spatial attention layer.

F I G U R E 7
The spatial attention layer.
The convolutional module used here for channel weight learning consists of two convolutional layers with a ReLU layer in between. The first convolutional layer reduces the input dimension to 1/16 of its original size; after the ReLU layer, the second convolutional layer recovers the original input dimension. The value 16 can be set according to the actual number of channels: the input channels in this experiment are 64, 128, 256, and 512, so 16 is chosen as a common divisor to reduce the number of parameters while ensuring that the weights can still be learned.
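A sketch of this two-layer bottleneck acting on a globally pooled channel vector, with C = 64 and r = 16 as in the experiment; the random weight matrices are placeholders for the learned 1 × 1 convolutions.

```python
import numpy as np

def conv_module(v, w1, w2):
    """Two 1x1 convolutions with a ReLU in between, acting on a pooled
    channel vector: C -> C/r -> C, with reduction ratio r = 16."""
    hidden = np.maximum(w1 @ v, 0.0)  # reduce to C/r channels, then ReLU
    return w2 @ hidden                # recover C channels

C, r = 64, 16
rng = np.random.default_rng(1)
w1 = rng.standard_normal((C // r, C)) * 0.1  # (4, 64): squeeze
w2 = rng.standard_normal((C, C // r)) * 0.1  # (64, 4): excite

pooled = rng.standard_normal(C)  # e.g., the global-average-pooled channels
scores = conv_module(pooled, w1, w2)
print(scores.shape)  # (64,)
```

With r = 16, the bottleneck needs 2 · C · C/16 = C²/8 weights instead of the C² a single full layer would use, which is the parameter saving the text refers to.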
The spatial attention layer is shown in Figure 7. Spatial attention is computed as

A_s(X) = σ(f^(7×7)(Concat[AvgPool(X); MaxPool(X)])),

where σ is the Sigmoid function, f^(7×7) is a convolution with a 7 × 7 filter, Concat is the splicing operation used to join tensors, and the pooling here is taken over the channel dimension.
For the channel-processed feature map, a replica is generated and then average- and max-pooled along the channel dimension. To illustrate, an original feature map of size (512, 64, 64) is transformed by these operations into a new feature map of size (1, 64, 64). This map is then combined with the other pooled map, resulting in a feature map of size (2, 64, 64), which is fed into a convolutional layer with input channels = 2 and output channels = 1. Finally, a Sigmoid layer learns the spatial weights of the feature map.
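The channel-dimension pooling and concatenation can be sketched as follows; only the shapes matter here, and the random input is a placeholder for the channel-processed feature map.

```python
import numpy as np

def spatial_descriptor(x):
    """Collapse the channel dimension with average and max pooling and stack
    the two maps: (C, H, W) -> (2, H, W), the input to the 7x7 convolution."""
    avg_map = x.mean(axis=0, keepdims=True)  # (1, H, W)
    max_map = x.max(axis=0, keepdims=True)   # (1, H, W)
    return np.concatenate([avg_map, max_map], axis=0)

x = np.random.default_rng(2).standard_normal((512, 64, 64))
desc = spatial_descriptor(x)
print(desc.shape)  # (2, 64, 64)
```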

Loss function
In the experiment, there is a significant data imbalance issue in the radar dataset. In most regions of the radar data, the radar reflectivity is low, indicating the absence of precipitation. Severe convective weather is a low-probability event; the higher radar reflectivity values that represent it account for a very small proportion of the radar data.
Therefore, the inversion model must focus on regions with higher radar reflectivity. For example, the radar data at 4:00 p.m. on June 29, 2021 is selected for probability statistics, and the statistical results are shown in Table 1. In this context, weight(y) represents the model's level of attention to reflectivity y. The larger an interval's proportion and the lower its reflectivity, the smaller the corresponding weight(y) should be set, so that the model prioritizes high reflectivity. Considering that the probability of severe convective weather varies across seasons, and based on previous experience, this paper assigns weights of 1, 10, 15, 66, and 800 to five intervals, with their practical meanings indicated in Table 1. It should be noted that, for various reasons, radar data may contain NaN values, negative values, and extremely high reflectivity values. For NaN and negative values, we set the weight to 1, and reflectivity values exceeding 75 are capped at 75.
To solve this problem, we use a weighted loss function to address the data imbalance and improve the model's ability to invert larger radar reflectivity values. The probability statistics and weight settings for the radar data are shown in Table 1, and the loss function is defined as

Loss = (1/M) Σ_{i=1}^{M} weight(y_i) · (ŷ_i − y_i)²,

where M is the number of pixel points in one radar image, ŷ_i is the radar reflectivity predicted by the model at pixel i, and y_i is the true observed radar reflectivity.
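A sketch of the weighted loss, assuming a weighted mean-squared-error form. The interval boundaries below are illustrative stand-ins, since the exact cut points belong to Table 1; only the weight values 1, 10, 15, 66, and 800 and the NaN/negative/cap rules come from the text.

```python
import math

# Hypothetical interval boundaries (dBZ). The paper assigns the five weights
# 1, 10, 15, 66, and 800 to five reflectivity intervals (Table 1); the exact
# cut points used here are illustrative assumptions.
BOUNDS = [20.0, 35.0, 45.0, 55.0]
WEIGHTS = [1.0, 10.0, 15.0, 66.0, 800.0]

def weight(y):
    """weight(y): NaN and negative reflectivity get weight 1; values above
    75 dBZ are capped at 75 before binning into the five intervals."""
    if math.isnan(y) or y < 0:
        return 1.0
    y = min(y, 75.0)
    for bound, w in zip(BOUNDS, WEIGHTS):
        if y < bound:
            return w
    return WEIGHTS[-1]

def weighted_mse(pred, obs):
    """Weighted MSE over all M pixels: (1/M) * sum weight(y_i) * (yhat_i - y_i)^2."""
    m = len(obs)
    return sum(weight(y) * (p - y) ** 2 for p, y in zip(pred, obs)) / m
```

The steep weight ramp (up to 800) is what pushes the optimizer to care about the rare high-reflectivity pixels that plain MSE would effectively ignore.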

Satellite data
The meteorological satellite data used in this study are the full-disc scan grid data from the Himawari-8 advanced meteorological satellite, launched by Japan in 2014. This satellite is a pioneer among the next-generation geostationary meteorological satellites. Compared to its predecessor, Himawari-7, its onboard sensor performance has improved significantly: the observation frequency, coverage range, spectral channels, and image resolution have all increased, leading to a substantial enhancement in the quality of satellite cloud images. Specifically, in comparison with the MTSAT series, the Himawari-8 meteorological satellite has increased the number of observation channels from 5 to 16, comprising three visible, three near-infrared, and ten infrared channels. Moreover, the satellite has higher temporal and spatial resolution, with a resolution of 0.5 km in the visible channels and 1-2 km in the near-infrared and infrared channels, and it observes the full disc once every 10 min. Fully leveraging the capabilities of this satellite can therefore greatly improve the accuracy of typhoon and heavy rain forecasting, providing a more advanced and effective tool for medium-range weather monitoring and prediction. This, in turn, ensures the reliability of meteorological forecasting and disaster weather monitoring, offering valuable support for decision-making, public services, and scientific research. The satellite data can be registered for and downloaded from JAXA's P-Tree Data Service Network at the following link: http://www.eorc.jaxa.jp/ptree/.

Radar data
In the western regions of China, the spacing between radar stations is relatively large, generally ranging from 250 to 300 km. Such spacing results in lower spatial and temporal resolution of radar data, making it difficult to effectively capture the details and evolution of short-duration heavy rainfall, torrential rain, typhoons, and other extreme weather events. Moreover, the complex geographical conditions in the western regions, including high mountains, hills, deserts, and climate variations, introduce potential interference and noise into radar data, reducing its reliability and accuracy. In comparison, in the central and eastern regions of China, the spacing between single-point radar stations is relatively close, generally ranging from 150 to 200 km, which yields higher spatial and temporal resolution. Such a layout facilitates timely monitoring and accurate prediction of rapidly changing weather phenomena, providing crucial support for weather disaster early warning and response. Therefore, the radar reflectivity data used in this paper, that is, the labeled data, is radar data with wide weather radar coverage in eastern China, with a geographic resolution of 0.01° and a temporal resolution of 6 min.

Data processing
The Himawari-8 satellite's sensors contain multiple channels, and each channel has its unique observation wavelength, providing distinct identification information and serving specific tasks.
In theory, feeding all of the satellite channel data into the model should give better results, but in practice this creates several problems during training. Firstly, the width of the neural network increases accordingly. Increasing the width of the network in the pretraining phase enables it to better capture the complex relationships in the data and learn the data features more efficiently, but in the later phase this can make the model overly complex, so that it fails to generalize well to new data and overfits. Secondly, fully incorporating all the information leads to a significant increase in the number of training parameters in the network; existing hardware devices struggle to handle such massive inputs, causing the training process to proceed very slowly. In addition, some of the satellite channel data is poorly correlated with radar reflectivity and carries redundant information, which can slow down the training process or even lead to model overfitting.
For the above reasons, the 16 channels of data must first be screened for radar reflectivity inversion based on satellite data. When selecting satellite channel data, the first step is to remove data with missing channels and time gaps. After consulting relevant research materials, five channels are selected, namely channels 7, 9, 13, 15, and 16. Additionally, for the sake of training convenience, we applied interpolation algorithms to resample the original satellite data. The processed satellite channel data has a spatial resolution of 0.04° and a temporal resolution of 10 min, covering data from 2020 to 2022. An important observation is that channel 16 is sensitive to cirrus clouds, which are non-precipitating upper-level clouds characterized by their low cloud-top temperatures. Channel 16 nevertheless contains crucial information for inversion and cannot be discarded, so it is essential to minimize interference from non-precipitating clouds in higher layers. Through the consultation of relevant materials, 31 it is found that the brightness temperature difference between channels can also characterize cloud properties, facilitating the capture of strong convective areas. Therefore, based on previous studies, 32 the model presented in this paper is additionally trained on the brightness temperature difference between channels 13 and 15, which have similar wavelengths. The information for the five satellite channels selected in this paper is shown in Table 2.
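The channel selection and brightness-temperature-difference feature can be assembled as in the sketch below; the function name and the dict-of-grids layout are illustrative, not part of the paper's pipeline.

```python
def build_inputs(channels):
    """Assemble model input planes from a dict mapping channel number to a
    2-D brightness-temperature grid (nested lists).

    Channels 7, 9, 13, 15, and 16 are kept, and the brightness temperature
    difference BTD = T(ch13) - T(ch15) is appended as an extra feature plane.
    """
    selected = [7, 9, 13, 15, 16]
    planes = [channels[c] for c in selected]
    # Brightness temperature difference between channels 13 and 15.
    btd = [[a - b for a, b in zip(r13, r15)]
           for r13, r15 in zip(channels[13], channels[15])]
    planes.append(btd)
    return planes
```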
In the dataset, we interpolate the satellite raw data to a spatial resolution of 0.04°, while the radar data has a spatial resolution of 0.01°. In the data acquisition process, both the first and last grid points are included: for example, spanning 20 degrees of longitude, a resolution of 0.04° yields 501 data points, while a resolution of 0.01° yields 2001 data points. Furthermore, the temporal resolutions of the radar and satellite data differ, and aligning the data on identical timestamps alone would leave too few samples for model training. Since atmospheric dynamics exhibit only minor fluctuations within short time intervals, this study selects radar grid data whose timestamps differ by at most 2 min from a satellite observation as labeled training data. The dataset was randomly distributed across the clients, with each client holding data from different time intervals but covering the same geographical region.
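The grid-point arithmetic and the 2-min temporal matching described above can be checked with a small sketch (function names and the minutes-since-epoch timestamp convention are illustrative):

```python
def grid_points(span_deg, res_deg):
    """Number of grid points across a span when both endpoints are included:
    span / resolution + 1 (rounded to absorb floating-point error)."""
    return round(span_deg / res_deg) + 1

def match_labels(sat_minutes, radar_minutes, tol=2):
    """Pair each satellite timestamp with radar timestamps differing by at
    most `tol` minutes. Timestamps are minutes since an arbitrary epoch."""
    return {s: [r for r in radar_minutes if abs(r - s) <= tol]
            for s in sat_minutes}
```

For the 20° example in the text, `grid_points(20, 0.04)` gives 501 and `grid_points(20, 0.01)` gives 2001.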

Evaluation of indicators
According to the literature, we employ five evaluation metrics to assess the performance of the inversion model: Heidke Skill Score (HSS), Critical Success Index (CSI), Accuracy (ACC), False Alarm Ratio (FAR), and Probability of Detection (POD).
To calculate these evaluation indexes, three thresholds are chosen in this paper: 5, 20, and 35. According to these thresholds, the radar grid point data are converted into a binary (0/1) matrix to compute the values of TP (True Positive), FP (False Positive), FN (False Negative), and TN (True Negative); the specific meanings of these parameters are shown in Table 3. Given the input matrix data and a threshold value, we stipulate that any value in the matrix greater than the threshold is set to 1, and any other value is set to 0. This operation is applied to both the predicted and the observed matrix data.
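The thresholding and contingency counting can be sketched as:

```python
def confusion_counts(pred, obs, threshold):
    """Binarize both 2-D grids at `threshold` (value > threshold -> 1, else 0)
    and count TP, FP, FN, TN over all grid points."""
    tp = fp = fn = tn = 0
    for p_row, o_row in zip(pred, obs):
        for p, o in zip(p_row, o_row):
            pb, ob = int(p > threshold), int(o > threshold)
            if pb and ob:
                tp += 1      # predicted 1, observed 1
            elif pb and not ob:
                fp += 1      # predicted 1, observed 0
            elif ob:
                fn += 1      # predicted 0, observed 1
            else:
                tn += 1      # predicted 0, observed 0
    return tp, fp, fn, tn
```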
These metrics assess the performance of the inversion model in identifying positive and negative cases. Specifically, HSS and CSI measure the skill and sensitivity of the model, respectively; ACC assesses the overall correctness of the model; FAR quantifies the tendency of the model to produce false alarms; and POD assesses the model's ability to detect true events. By considering the results of these metrics together, we can evaluate the performance and effectiveness of the inversion model more fully. The calculation formulas are

POD = TP / (TP + FN),
FAR = FP / (TP + FP),
CSI = TP / (TP + FP + FN),
ACC = (TP + TN) / (TP + FP + FN + TN),
HSS = 2(TP · TN − FP · FN) / [(TP + FN)(FN + TN) + (TP + FP)(FP + TN)].
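These standard skill scores translate directly into code (assuming no denominator is zero, which holds for any non-degenerate contingency table):

```python
def scores(tp, fp, fn, tn):
    """Standard categorical skill scores from a 2x2 contingency table."""
    pod = tp / (tp + fn)                      # probability of detection
    far = fp / (tp + fp)                      # false alarm ratio
    csi = tp / (tp + fp + fn)                 # critical success index
    acc = (tp + tn) / (tp + fp + fn + tn)     # accuracy
    hss = 2 * (tp * tn - fp * fn) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))  # Heidke skill score
    return {"POD": pod, "FAR": far, "CSI": csi, "ACC": acc, "HSS": hss}
```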

Experimental settings
In this study, the aforementioned federated averaging algorithm serves as the basis for the distributed training architecture. Unlike mobile distributed training clients, each data-storing satellite site runs on a computer with stable performance and a highly available real-time network connection. As a result, the model is trained with a proposed C-proportion of 1, meaning that all clients participate in each computation round, for a total of four clients. The total number of rounds, denoted T, is set to 5, with each round consisting of 10 epochs. Furthermore, training uses the ADAM optimizer with default hyperparameter settings together with the proposed loss function, and the learning rate is set to 0.001. 80% of the data samples were randomly selected for network training and the rest were used as the validation set. Details of the hardware and software configuration for the experiments are given in Table 4.

TABLE 3
The specific meanings of the parameters.

                    Predicted 1    Predicted 0
Real situation 1       TP             FN
Real situation 0       FP             TN

Comparison experiment
The loss curve reflects the model's performance and progress during training. The loss function measures the gap between the model's predictions and the true labels, and can also be viewed as how well the model fits the training data. The model proposed in this paper runs five rounds on the simulated clients, with each client training for 10 epochs per round, for a total of 50 epochs. The loss curves of the clients are shown in Figure 8.
The graph reveals a gradual, stabilizing downward trend in the overall training loss curve. It is worth noting that during the experiment, the model loss values fluctuated more than usual in the batches of training just after the key points of 10, 20, 30, and 40 epochs. This phenomenon is largely attributed to the fact that, at these points, all the clients synchronize their parameters with the central server, as discussed below in the context of Figure 8.

The parameter statistics for the various models are presented in Table 5. To be fair, the input size is fixed at 4.07 MB. From the perspective of parameter count, the poor performance of the U-Net model is likely attributable to its simplicity: each encoding layer contains only two convolutional blocks for feature learning, which leads to inadequate learning results. Although the DeepLabv3 model has a relatively small total parameter count, consisting mainly of weights and biases, it relies on a substantial number of complex operations, most likely including spatial pyramid pooling and dilated convolutions. This leads to an excessive number of floating point operations, resulting in longer inference times and greater computational resource demands, so it is advisable to run it on devices with strong GPU performance. While the Danet model demands fewer FLOPs than DeepLabv3, it comes with a larger total parameter count; when using Danet, it is therefore crucial to balance the total parameter count against the available memory. The UResLham model proposed in this study requires three times the total number of parameters of the DeepLabv3 model, but its FLOPs demand is only 14.8% of the latter. This implies a lower computational resource requirement, making it well suited for dedicated meteorological data servers, which are characterized by large storage capacity and high data throughput but fewer resources for deep learning.
We use the U-Net 18 model, the Danet 33 model, and the DeepLabv3 34 model for comparative experiments on the same dataset. For fairness, all hyperparameters are set to the same values as in the experiment above: the total number of rounds T is 5, and each client trains for 10 epochs per round, totaling 50 epochs. Three thresholds of 5, 20, and 35 were chosen for evaluating the relevant metrics. The experimental results for the six evaluation metrics are depicted in Figure 9. Among these metrics, higher values of HSS, CSI, ACC, and POD indicate better performance, while higher values of root mean squared error (RMSE) and FAR indicate worse performance.

TABLE 5
The parameter statistics of the different models.

The U-Net model is relatively simple, leading to more inaccuracies in the inversion process and thus higher RMSE values. While the other models generally cover the same areas in their inversions, they exhibit significant discrepancies between the retrieved reflectivity values and the actual values at individual points. Our model incorporates attention mechanisms and a wealth of auxiliary information during the feature recovery process, aiming not only to approximate the actual regions but also to minimize the differences between the retrieved reflectivity and the ground truth values to the greatest extent possible. The reduction in the RMSE metric signifies an enhancement in performance for the proposed model.

Regarding the FAR metric, all models exhibit the lowest FAR values at a threshold of 20, while the FAR values peak at a threshold of 35. The likely reason is that intense precipitation events are relatively rare, characterized by long intervals between events and a concentration of their incidence in specific regions. The FAR is also relatively high at a threshold of 5 because, when the model identifies areas of heavy precipitation where radar reflectivity is elevated, it may also encompass neighboring regions with the potential for minor precipitation; this highlights an area for future improvement in our work. At a threshold of 35, the model proposed in this study achieves a FAR that is lower by 35.02%, 11.74%, and 2.5%, respectively, than those of the other models. The HSS metric follows a consistent pattern, albeit with the opposite interpretation to the FAR metric.
For the POD metric, the models have the highest POD at a threshold of 5 and the smallest at a threshold of 35. The model in this paper achieves 0.98 at a threshold of 5, an improvement of 2.7, 0.4, and 1.9 percentage points over the other models. At the thresholds of 20 and 35, prediction becomes more difficult for all models, and the POD of the proposed model also decreases, to 0.709 and 0.647, respectively, again owing to the low-probability nature of strong convective weather. However, compared to the other three models, the greatest percentage improvement is observed at the threshold of 35. The CSI indicator follows the same trend.
Concerning the ACC metric, at a threshold of 35 there is minimal variation among the models. This is largely because intense convective weather is a rare event whose presence in radar reflectivity areas is limited, making it reasonably predictable for most models. Nonetheless, the performance of all models deteriorates notably at thresholds of 5 and 20, primarily due to the broader coverage of regions with low radar reflectivity, which poses a challenge for accurate prediction and consequently reduces precision.
Taken together, these metrics demonstrate significant improvements in radar reflectivity inversion by the model proposed in this study compared to the other models. The proposed model not only performs better in estimating regions with lower radar reflectivity but also exhibits remarkable enhancements in estimating areas with high radar reflectivity.

Ablation experiment
We also consolidate all client data for centralized training under identical hyperparameter conditions. The scale of the data we use is relatively small compared to real deployments, so the efficiency difference between the two training methods is not significant. In practical scenarios, despite the network latency caused by parameter transmission, distributed training is faster and better protects the privacy and security of the data than centralized training. The performance comparison between centralized training and distributed training using the federated averaging algorithm is shown in Figure 10. From the graph, it can be observed that the metrics under distributed training suffer only relatively minor losses compared to centralized training. This is attributed to the data cleansing applied beforehand, the rigorous adherence of each client's data to consistent standards, and the use of a cosine annealing schedule to adjust the learning rate. Despite these factors, however, the distributed model parameters still lag slightly behind those obtained by centralized training.
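The cosine annealing schedule mentioned above follows the standard form lr(t) = lr_min + (lr_max − lr_min)(1 + cos(πt/T))/2; a minimal sketch, using the paper's base learning rate of 0.001 as lr_max (lr_min = 0 is an assumption):

```python
import math

def cosine_annealing(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine-annealed learning rate:
    lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1.0 + math.cos(math.pi * step / total_steps))
```

The rate starts at lr_max, decays smoothly, and reaches lr_min at the final step, which avoids the abrupt drops of a stepwise schedule.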
In addition, to demonstrate the effectiveness of the light hybrid attention embedded in the decoder stage, comparative ablation experiments were performed under the same conditions. The loss curves of the two models on the same client are shown in Figure 11. As can be seen from the figure, the loss curve of the model without the embedded hybrid attention mechanism initially shows lower loss values, but it converges more slowly in the later stages and tends to saturate. In contrast, the model with the embedded mechanism converges faster and maintains a continuous convergence trend.
Table 5 above already contains the parameter statistics for both models. As can be seen from the table, the network embedded with five light hybrid attention modules experiences only a marginal increase in the total number of parameters, and although the total number of floating point operations increases, performance improves significantly. This underlines the lightweight nature of the light hybrid attention layer. Tables 6-8 compare the evaluation indicators of the two models under the different thresholds.

FIGURE 11
The loss curve graph of the two models.

TABLE 6
The evaluation indicators of the two models when the threshold is 5.

From the tables, it can be observed that the UResLham model shows improvements in various metrics compared to the model without the Lham module. Taking the threshold of 35 as an example, CSI increased by 0.08, FAR decreased by 0.08, HSS improved by 0.09, POD increased by 0.09, and ACC remained the same. This demonstrates that embedding a hybrid attention mechanism can effectively enhance the feature correlations between the decoder and encoder, which in turn aids feature learning, accelerates convergence, and enhances overall model performance.

Case study

Figure 12 displays the inversion results of the different methods at the same time point, 6:30 p.m. on August 11, 2021. From the figure, it is evident that the inversion performance of the U-Net algorithm is the poorest. It predicts only the area of high radar reflectivity in the center, where intense precipitation occurs; the lower right corner and the upper corner are not predicted, and there is also a widespread phenomenon of false alarms. This is most likely due to the insufficient depth of the network layers. In the case of Danet, heavy precipitation is accurately predicted in the lower right and central areas, but the upper area is not, and there are many incorrect predictions in areas of low radar reflectivity. In comparison, DeepLabv3 shows a significant improvement over Danet but predicts too many areas of heavy precipitation, with some surrounding areas also predicted as having heavy precipitation. These problems with the Danet and DeepLabv3 models are probably due to an excessive number of model parameters, which require more iterations to find appropriate weights. For the same number of iterations, the URes architecture shows superior performance, accurately predicting the three heavy precipitation areas in the middle, lower right, and upper regions. However, the model without the Lham module shows a problem of overprediction, whereas the UResLham model with the added module avoids overprediction and accurately represents the areas of intense precipitation. Nevertheless, the model proposed in this study also suffers from a notable false alarm problem: in several scattered regions of low radar reflectivity in the observed image, the model predictions show surrounding clusters, suggesting light drizzle in these areas. This aspect will require further improvement in subsequent phases. Overall, the proposed model is a significant improvement over the other models in predicting areas of high radar reflectivity.

CONCLUSIONS
We propose a U-shaped residual network with an embedded attention mechanism. To accommodate the decentralized nature of the data sources, the federated averaging algorithm is used for distributed training, which enables extensive sharing and collaborative use of data from different locations while ensuring data privacy. When converting multichannel satellite data into radar reflectivity, the U-shaped residual model with embedded hybrid attention layers effectively extracts critical features and progressively restores spatial detail. In comparative experiments, various evaluation metrics of the model show significant improvements over other models. Furthermore, the introduced loss function partially mitigates the effects of the unbalanced radar reflectivity distribution by directing the inversion process to prioritize regions with higher radar reflectivity, which benefits the prediction of intense precipitation events. Subsequent work will focus on the false alarms the model produces in areas of low radar reflectivity, which may represent instances of light drizzle, to further improve performance. The model proposed in this paper allows observed satellite data to be inverted into corresponding radar reflectivity data for specific regions, thereby filling gaps and reducing inhomogeneities in the radar data.

We adopted the Federated Averaging algorithm as the foundation of our distributed training architecture. This algorithm is commonly used in the field of federated learning and aims to facilitate collaborative model training among different satellite radar data sites while ensuring that the privacy of the data remains uncompromised. The federated learning framework is presented in Figure 2. In the federated averaging algorithm, the central server initializes the global model parameters, transmits them to all the sites (that is, clients) involved in the training, and selects several clients for training in a fixed ratio. The selected sites train locally on the datasets they hold, starting from the global model parameters. In each round of training, each site updates its local model and computes the local model parameters, which are then transmitted to the central server via a secure communication channel. No raw data is stored at the central server; only the local model parameters from each site are received. Next, the central server aggregates the local model parameters into global model parameters using weighted averaging. After the aggregation, the central server sends the updated global model parameters back to each site. The sites then utilize the global model parameters to update their local models, enabling collaborative global model training. The process is repeated over multiple rounds until the specified number of rounds T is reached.

FIGURE 2
The spatial federated learning framework.

Algorithm 1 Federated averaging algorithm
Require: N ≥ 1, C ∈ (0, 1], T > 0, D > 0
Ensure: ω_G^T
1: initialize global weights ω_0
2: for t = 1, 2, 3, … do
3:   SC_t ⇐ max(C × N, 1), clients selected in round t
4:   for each client n ∈ SC_t do
5:     B_n ⇐ (d_n split into batches of size B, running on client n)
6:
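One round of the federated averaging procedure can be sketched in pure Python, with model weights as flat lists of floats; `local_update` is an illustrative stand-in for the clients' local training epochs, and this sketch fixes C = 1 (all clients participate), as in the experiments.

```python
def fedavg_round(global_w, client_datasets, local_update):
    """One round of Federated Averaging with C = 1 (all clients participate).

    Each client refines a copy of the global weights on its own data via
    `local_update` (a stand-in for local SGD epochs) and returns new weights;
    the server aggregates them with a weighted average proportional to each
    client's dataset size. No raw data ever leaves a client.
    """
    local_ws, sizes = [], []
    for data in client_datasets:
        local_ws.append(local_update(list(global_w), data))  # train on a copy
        sizes.append(len(data))
    total = sum(sizes)
    # Weighted average: w_global = sum_n (|d_n| / |d|) * w_n
    return [sum(s / total * w[i] for s, w in zip(sizes, local_ws))
            for i in range(len(global_w))]
```

In the paper's setting this round would be repeated T = 5 times over four clients, with 10 local epochs inside each `local_update` call.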

FIGURE 3
The architecture of the U-shaped residual network with attention. Extracting effective features from the matrix input data is the main function of the encoder layer (see Section 3.2.1). The hybrid attention mechanism layer consists of light spatial and channel attention modules, which are mainly used to improve the correlation between the encoder and the decoder (see Section 3.2.3). The main function of the decoder layer, which incorporates a cross-layer feature reconstruction technique, is to incrementally recover the image size (see Section 3.2.2).

FIGURE 5
It mainly consists of two submodules. The input matrix is passed through each module to learn the corresponding module weights by matrix multiplication.

TABLE 4
Hardware and software configuration for the experiments.

FIGURE 8
The loss curve of the model.

At each of these synchronization points, all clients transmit their model parameters to the central server, which aggregates them with the weighted averaging algorithm and updates the global model parameters. Subsequently, the updated global model parameters are disseminated to each client. This process introduces a certain discrepancy between the updated global model parameters and the local model parameters each client had just learned, resulting in the fluctuations at the critical points. Nevertheless, apart from these key nodes, the model as a whole shows a steady downward trend with gradual convergence. In addition, the third client converges significantly faster, at lower loss values, than the others. After several checks, it was found that this was due to a more reasonable data distribution on client 3 compared to the others.

FIGURE 9
Indicator results of the different models. (A) Accuracy. (B) Critical success index. (C) False alarm ratio. (D) Heidke skill score. (E) Probability of detection. (F) Root mean squared error.

The RMSE metric is discussed first; since the RMSE values of a given model are the same across thresholds, only the RMSE values for a threshold of 35 are presented in the figure. RMSE serves as a representative measure of model prediction error, where smaller values indicate closer agreement between predicted and actual values; it is particularly sensitive to outliers, as squaring the differences amplifies larger errors. The proposed model achieved an RMSE of 3.5, notably lower than Unet's 7.7, Danet's 4.0, and DeepLabv3's 4.

FIGURE 10
The performance comparison between centralized training and distributed training. (A) Accuracy. (B) Critical success index. (C) False alarm ratio. (D) Heidke skill score. (E) Probability of detection. (F) Root mean squared error.

FIGURE 12
The inversion results of the different methods at the same time point, 6:30 p.m. on August 11, 2021.
TABLE 1
Probability statistics and weight setting for radar data.

TABLE 2
Parameters of selected satellite channels.
TABLE 7
The evaluation indicators of the two models when the threshold is 20.

TABLE 8
The evaluation indicators of the two models when the threshold is 35.

Abbreviations: ACC, accuracy; CSI, critical success index; FAR, false alarm ratio; HSS, Heidke skill score; POD, probability of detection.