Tropical cyclone intensity prediction by inter- and intra-pattern fusion based on multi-source data

Tropical cyclones (TCs) are among the most destructive natural disasters and can cause severe loss of life and economic damage in coastal areas worldwide. Accurate TC intensity prediction is critical for disaster prevention and loss reduction, but the dynamic processes involved in TCs are complicated and not adequately understood, which makes intensity prediction a challenging task. In recent years, several deep-learning (DL)-based methods have been developed for TC prediction by mining TC intensity series or related environmental factors. However, information hidden between the two different data sources is generally ignored. Here, a novel DL-based TC intensity prediction network named Pre_3D is proposed, which aims to mine the intra-patterns of TC intensity and the inter-patterns of related external factors independently using separate feature extraction sub-networks. An MLP network is adopted to achieve adaptive fusion of the two patterns for accurate TC intensity prediction. TC records from several agencies were used to evaluate the generalizability of the proposed framework, and extensive experiments were conducted to validate its effectiveness. The experimental results demonstrate that the models based on the Pre_3D framework achieve strong performance. In particular, ConvGRU-based Pre_3D yields an improvement of over 15% in 24 h prediction accuracy relative to official agencies.


Introduction
Tropical cyclones (TCs) are among the most destructive natural hazards and are accompanied by storm winds, torrential rains, and floods (Zhang et al 2009, Woodruff et al 2013, Emanuel 2018). The prediction of TC track and intensity (the maximum sustained wind speed of a TC) has become one of the most active research topics in meteorology and oceanography (Gao et al 2016). In recent decades, remarkable advances in track prediction have been made with the development of in situ observation techniques and dynamical models (Davis 2018, Kim et al 2019, Chen and Yu 2020). Average track prediction errors for TCs in the Atlantic basin from the National Hurricane Center have been reduced by ∼50% since the mid-1990s. However, TC intensity prediction is still a challenging task (Cangialosi et al 2020).
Current TC intensity prediction models can be divided into three main categories: dynamical models, statistical models, and deep-learning (DL) models. Dynamical methods generally predict TC intensity by solving hydrodynamic equations on the basis of the meteorological fields (Lynch 2008). Representative examples are the regional Hurricane Weather Research and Forecasting model (Tallapragada et al 2014) and the global European Centre for Medium-Range Weather Forecasts model (Benedetti et al 2009). In contrast, statistical methods usually develop multiple regression equations based on historical data, with representative methods including the Statistical Hurricane Intensity Prediction Scheme (SHIPS; DeMaria and Kaplan 1994), Decay-SHIPS (DSHP; DeMaria et al 2005), and the Logistic Growth Equation Model. These two categories of TC prediction method have provided technical support for TC early warning systems. However, the dynamic and nonlinear processes in TCs have been investigated using mainly short-term records and are not yet well quantified, so the prediction accuracy and robustness of the above methods are still limited (Sandery et al 2010, DeMaria et al 2014, Altman et al 2018).
DL algorithms have a strong ability to automatically extract high-level representations from complex data, and they have been successfully applied in many prediction tasks (Xiao et al 2019, Ravuri et al 2021, Zhang et al 2022). They are also capable of learning nonlinear change patterns in TC intensity and extracting potential relationships between TC intensity and environmental variables to further improve prediction performance. For example, a 12-72 h intensity prediction model was designed for the northwest Pacific Ocean (NWPO) using a multi-layer perceptron (MLP; Baik and Peak 2000), reducing the average prediction error by 7%-16% relative to the multiple linear regression method. An MLP network was also applied to the Atlantic Basin, reducing the prediction error by 5%-22% compared with statistical and dynamical models such as SHIPS and DSHP (Xu et al 2021). In addition to these simple DL-based algorithms, Pan et al (2019) introduced the Long Short-Term Memory (LSTM) network to TC intensity prediction; because it can handle long-term dependencies in time-series data, it achieved improved prediction performance.
The aforementioned DL-based methods predict TC intensity by directly mining the change patterns of intensity time series. Other methods use only the surrounding environmental data (such as sea surface temperature [SST] or u- and v-component winds) as input variables. For example, Wang et al (2020) proposed a 24 h prediction structure, which first captured key features of the environment around the TC using a 3D convolutional neural network (3D-CNN) and then mapped the relationships between TC intensity in the next time interval and key features in the current environment with an MLP network. Variations in the surrounding environmental data also contain hidden regularities in the time dimension, and these change patterns are helpful for intensity prediction; they can be investigated using LSTM or convolutional gated recurrent unit (ConvGRU) networks (Zhang et al 2022).
In summary, current TC intensity prediction studies are based mainly on mining the change patterns of either the TC intensity sequence (intra-pattern) or the external environmental variables (inter-pattern).
Here, a new DL-based model for TC intensity prediction is developed by fusing multi-source data. It contains three sub-networks: sub-network 1 extracts TC intensity intra-patterns using an Encoder-Decoder (Enc-Dec) model with 1D convolution, while sub-networks 2 and 3 mine the influence of 2D and 3D marine and atmospheric variables (e.g. SST and u- and v-component winds) on TC intensity, which together constitute the inter-pattern. TC intensity prediction is achieved by merging the inter- and intra-patterns using an MLP network. The model details are described in section 2, and the performance of the proposed model is assessed in section 3. We further analyze the characteristics of forecasting errors from the perspectives of TC intensity intervals, intensity variations, and spatial distribution in section 4. Finally, we summarize the main findings of the paper in section 5.

Best-track and reanalysis data
The NWPO is the most active TC basin, accounting for almost one-third of global TCs (Guan et al 2018), and is therefore deemed an ideal region for the study of TCs and related topics. This study investigated multi-source data for TCs, including best track (BST) data and a variety of environmental variables. The BST data, provided by the International Best Track Archive for Climate Stewardship TC database, contain basic information for cyclones at 6 h intervals from different agencies such as the Regional Specialized Meteorological Center (RSMC) Tokyo, the China Meteorological Administration (CMA), and the Joint Typhoon Warning Center (JTWC). Each record includes, among other fields, the timestamp, longitude, latitude, and maximum sustained wind speed.
Environmental variables including SST, vertical wind shear (VWS), and multi-isobaric surface meteorological variables composed of the u-component of wind (u), v-component of wind (v), temperature (T), and geopotential height (z) have been shown to be strongly correlated with TC intensity (Baik and Peak 2000, Vecchi and Soden 2007, Tang and Emanuel 2010, Wu et al 2021). Meteorological environmental data including u, v, T and z were obtained from the 'ERA5 hourly data on pressure levels from 1979 to present' dataset (Hersbach et al 2018). We chose the four isobaric surfaces of 250, 500, 750, and 1000 hPa to characterize the vertical structure of TCs, and the vertical shear was calculated from u and v at 850 and 200 hPa. Similarly, the SST variable was taken from the 'ERA5 hourly data on single levels from 1979 to present' dataset. The spatial resolution of the above datasets is 0.25° and the temporal interval is 6 h.
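The text states only that the vertical shear was calculated from u and v at 850 and 200 hPa. A minimal sketch of that calculation is given below, assuming the conventional definition of deep-layer shear as the magnitude of the vector wind difference between the two levels. The function names and the nested-list grid layout are illustrative and not taken from the original implementation.

```python
import math


def vertical_wind_shear(u200, v200, u850, v850):
    """Magnitude of the 200-850 hPa vector wind difference.

    Assumed definition: sqrt((u200 - u850)^2 + (v200 - v850)^2).
    The result has the same units as the inputs (e.g. m/s).
    """
    return math.hypot(u200 - u850, v200 - v850)


def shear_field(u200, v200, u850, v850):
    """Apply the shear formula point by point to 2D grids (nested lists)."""
    return [
        [vertical_wind_shear(a, b, c, d) for a, b, c, d in zip(ru2, rv2, ru8, rv8)]
        for ru2, rv2, ru8, rv8 in zip(u200, v200, u850, v850)
    ]


# Toy example on a single grid point.
print(vertical_wind_shear(13.0, 4.0, 10.0, 0.0))
```

Applied to each 0.25° grid point of the ERA5 fields, this would yield a 2D VWS map of the same shape as the SST field.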

Models and methods
Assuming that the TC intensity at time t is related to the previous (s + 1) timestamps, the prediction problem can be defined as:
$$Y_{t+6k} = f(X_{t-6s}, \ldots, X_{t-6}, X_t) \tag{1}$$
where $Y_{t+6k}$ is the predicted TC intensity at time (t + 6k), and k is the prediction step; f represents the prediction model derived by training on historical data. The multi-source input at time t comprises three parts, $X_t = [X_{1D}, X_{2D}, X_{3D}]$, where $X_{1D}$ represents a 1D time series including the TC intensity, latitude, and longitude; $X_{2D}$ contains the 2D data of SST and VWS; and $X_{3D}$ includes the 3D meteorological data for multiple isobaric surfaces. The determination of parameter s is discussed in section 3.2.
To fully mine the implicit relationships between environmental variables and TC intensity, three sub-networks were integrated for the analysis of multi-source variables. The overall network framework is shown in figure 1, and the sub-networks of the proposed model are described as follows.
• Sub-network 1: TC intensity temporal feature extraction This sub-network was designed mainly to mine TC intensity intra-change patterns, and it comprises a 1D convolutional neural network (1D-CNN) and a gated recurrent unit (GRU) network. The local features of $X_{1D}$ are extracted through the 1D-CNN and fed into the Encoder-Decoder (Enc-Dec) structure, which comprises two independent GRUs (Cho et al 2014): the input data are abstracted into a higher-level feature vector in the Encoder network, and this vector is used as the hidden state of the Decoder network. This further enhances the extraction of long-term dependencies. To make fuller use of historical information and adaptively extract features at key moments, an attention mechanism is embedded in the Enc-Dec structure. The output of the sub-network is denoted as Output_1D.
• Sub-network 2: 2D variables feature extraction Unlike the intensity sequence, the SST and VWS are spatially 2D, and standard recurrent neural networks (RNNs) are unable to handle them. In this sub-network, we first used a 2D-CNN with a kernel size of 1 to obtain the attention weights of different spatial positions, strengthening the feature extraction of key areas. The spatial attention map was then combined with the input variables to obtain the reconstructed feature maps. ConvGRU is a variant of GRU that replaces the matrix multiplications in the GRU gates with convolution operations, which gives it the ability to extract spatial information (Shi et al 2015, Woo et al 2018). The reconstructed feature maps are fed into a set of ConvGRU units to simultaneously extract the spatio-temporal features of the 2D meteorological data. The output of the sub-network is denoted as Output_2D.
• Sub-network 3: 3D variables feature extraction The feature extraction for 3D spatial variables is similar to that for 2D data, with the ConvGRU network also being used. However, 3D spatial information and temporal correlations of meteorological environmental data are extracted using 3D-CNN and 3D-ConvGRU, respectively. The output of the third sub-network is denoted as Output_3D.
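For reference, a commonly used formulation of the ConvGRU cell is sketched below. It follows the gate convention of Cho et al (2014), with $*$ denoting convolution, $\odot$ the element-wise (Hadamard) product, and $\sigma$ the sigmoid function. The exact variant, kernel sizes, and activation functions used in Pre_3D are not specified in this section, so the symbols $W_{x\cdot}$, $W_{h\cdot}$ and $b_{\cdot}$ should be read as generic placeholders rather than as the authors' parameterization.

```latex
\begin{aligned}
z_t &= \sigma\left(W_{xz} * x_t + W_{hz} * h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_{xr} * x_t + W_{hr} * h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_{xh} * x_t + W_{hh} * (r_t \odot h_{t-1}) + b_h\right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```

Here $x_t$ is the input feature map at time step t and $h_t$ the hidden state. For the 3D variant, the 2D convolutions would be replaced by 3D convolutions over the additional pressure-level dimension.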
To further extract the potential implications of environmental variables for TC intensity prediction and to develop a comprehensive prediction model, the output feature vectors of the above sub-networks were fused. Here, we adopted an MLP network to optimize the weights and biases for all input features. The network framework is named Pre_3D, and it can be expressed as:
$$y_{pre} = F\left(W \cdot [\mathrm{Output\_1D}, \mathrm{Output\_2D}, \mathrm{Output\_3D}] + b\right)$$
where $y_{pre}$ represents the predicted TC intensity; F is the multilayer perceptron network; and W and b are the weights and biases applied to the sub-module outputs, respectively.
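The fusion step can be illustrated with a small, dependency-free sketch: the three sub-network outputs are concatenated and passed through a perceptron with one hidden layer. The number of layers, the ReLU activation, and all function and parameter names below are assumptions made for illustration; the source does not specify the MLP configuration.

```python
def linear(x, weights, bias):
    """Affine map: y_j = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def fuse(output_1d, output_2d, output_3d, w_hidden, b_hidden, w_out, b_out):
    """Concatenate the three feature vectors and map them to one intensity value.

    Hypothetical two-layer MLP; in practice the weights are learned jointly
    with the sub-networks rather than fixed by hand.
    """
    features = list(output_1d) + list(output_2d) + list(output_3d)
    hidden = relu(linear(features, w_hidden, b_hidden))
    return linear(hidden, w_out, b_out)[0]


# Toy example with hand-picked weights (illustrative only).
y_pre = fuse(
    [1.0], [2.0], [3.0],
    w_hidden=[[1.0, 1.0, 1.0], [-1.0, 0.0, 0.0]], b_hidden=[0.0, 0.0],
    w_out=[[0.5, 2.0]], b_out=[1.0],
)
print(y_pre)
```

Because the fusion weights are trained end to end, the relative contribution of the intra- and inter-pattern features is learned from the data instead of being set manually.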

Data processing and training
Environmental data were discrete-valued and recorded synchronously for each grid of the reanalysis.
In the NWPO, 1° in latitude or longitude represents ∼111 km. Therefore, to accommodate the horizontal scale of TC structures (∼1000 km; Chen et al 2019), the NWPO was divided into a 10° × 10° grid. Data for TC intensity, latitude and longitude, and marine and meteorological environmental variables were then segmented using sliding windows to generate short sequences. The length of a sequence was (s + k), as defined in equation (1). Data records for 2000-2019 from RSMC Tokyo, CMA, JTWC, and HURDAT were split into chronological sequences, with the available samples for each agency shown in table 1.
To ensure that data in the test set did not overlap with the training set, the data set was divided into separate training (2000-2013) and test (2014-2019) sets at an approximately 7:3 ratio. RSMC Tokyo is the World Meteorological Organization's official agency for the NWPO, so in the experiments here we analyzed data mainly from RSMC Tokyo, with data from other agencies being used for auxiliary validation.
To avoid the impact of different scales across the input data, the input data were normalized as follows:
$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
where $x$ and $x_{norm}$ are the original input variable and the corresponding normalized value, and $x_{min}$ and $x_{max}$ represent the minimum and maximum values, respectively.
The PyTorch framework was used to implement and train the proposed model for TC intensity prediction. All experiments were completed on a PC configured with an Intel Core™ i9-9900K CPU, 32 GB of memory, and an NVIDIA RTX 2080 Ti GPU with 11 GB of memory. The Adaptive Moment Estimation (Adam) optimizer was selected for its proven generality and fast convergence in DL networks.
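The preprocessing steps described above (min-max normalization, sliding-window sampling, and a chronological train/test split) can be sketched as follows. This is an illustrative outline, not the original pipeline: the function names, the `n_inputs`/`n_ahead` parameterization of the window, and the `"year"` key on each sample are assumptions.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using x_norm = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        return [0.0 for _ in values]  # constant series: avoid division by zero
    return [(v - x_min) / span for v in values]


def sliding_windows(series, n_inputs, n_ahead):
    """Build (input window, target) pairs from one 6-hourly track.

    n_inputs consecutive records form the input; the target is the record
    n_ahead steps after the last input record.
    """
    pairs = []
    last_start = len(series) - n_inputs - n_ahead
    for start in range(last_start + 1):
        window = series[start:start + n_inputs]
        target = series[start + n_inputs + n_ahead - 1]
        pairs.append((window, target))
    return pairs


def split_by_year(samples, last_train_year=2013):
    """Chronological split: samples up to last_train_year go to training."""
    train = [s for s in samples if s["year"] <= last_train_year]
    test = [s for s in samples if s["year"] > last_train_year]
    return train, test


# Toy track of ten 6-hourly intensities (kt); five inputs, 24 h (four steps) ahead.
track = [35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0, 75.0, 80.0]
for window, target in sliding_windows(track, n_inputs=5, n_ahead=4):
    print(window, "->", target)
```

In practice, $x_{min}$ and $x_{max}$ would normally be computed on the training set only and reused for the test set, so that no information from the test period leaks into the model inputs.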

Evaluation metrics
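To compare the performances of different neural network structures in the prediction of TC intensity, the mean absolute error (MAE), mean squared error (MSE), R-squared ($R^2$), and mean absolute percentage error (MAPE) were used as evaluation metrics. MAE measures the absolute magnitude of the prediction error; MSE indicates the accuracy of prediction and penalizes large errors more heavily; MAPE is independent of the scale of the measurement and reflects the relative magnitude of the error; and $R^2$ indicates the goodness of fit of the prediction result. These metrics are defined as follows:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
where $y_i$ and $\hat{y}_i$ are the actual and predicted values at time i, N is the number of time points to be predicted, and $\bar{y}$ is the average of the actual intensity.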
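The four metrics can be computed directly from paired lists of observed and predicted intensities. The sketch below uses the standard textbook definitions; the function names are illustrative, and the toy values are not taken from the experiments.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the inputs (kt)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error, in squared units (kt^2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must be nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy example (kt); values are illustrative only.
observed = [40.0, 50.0, 80.0]
predicted = [44.0, 45.0, 80.0]
print(mae(observed, predicted), mse(observed, predicted))
print(mape(observed, predicted), r_squared(observed, predicted))
```

Note that MAPE is undefined when an observed value is zero; this does not arise for TC intensity records, which are strictly positive wind speeds.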

Parameter determination
The prediction step is usually 4-20 timestamps in TC intensity forecasting, with a corresponding time interval of 24-120 h (Huang et al 2021). Among these different step lengths, 24 h forecasting is the most important and of greatest concern to the official agencies, so here we set the prediction step to 4 (i.e. 24 h). The input time horizon is an important parameter for time-series forecasting models. Short horizons may miss critical long-term dependencies, while long horizons may introduce interfering information. The time horizon should therefore be optimized for actual applications. Here, a double-layer GRU network was used to analyze the time dependencies of TC intensity sequences, and the network was run five times with time horizons of 1-5, with corresponding MAEs of 8.68, 8.42, 8.37, 8.26, and 8.35 kt, respectively. Therefore, in the subsequent experiments, the parameter s was set to 4, which is consistent with the findings of Chen et al (2019) and Zhang et al (2022).
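The selection of s amounts to choosing the time horizon with the smallest validation MAE. Using the five MAE values quoted above, this can be written as a short sketch (the variable names are illustrative):

```python
# MAE (kt) of the double-layer GRU for each tested time horizon, as quoted in the text.
mae_by_horizon = {1: 8.68, 2: 8.42, 3: 8.37, 4: 8.26, 5: 8.35}

# Pick the horizon with the lowest MAE.
best_s = min(mae_by_horizon, key=mae_by_horizon.get)
print(best_s, mae_by_horizon[best_s])
```

The MAE decreases monotonically up to a horizon of 4 and rises again at 5, which matches the trade-off described above between missing long-term dependencies and admitting interfering information.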

Verification and evaluation on module contribution
In previous studies of TC intensity prediction, only the intensity series of cyclones or the environmental information was considered, ignoring their correlation. The Pre_3D model focuses on the extraction and integration of environmental information with intensity information. A series of ablation experiments was undertaken to analyze the contribution of each sub-network to performance, with results presented in table 2. Sub-network 1 was analyzed with only BST data as input. We conclude that, compared with LSTM and RNN, the GRU has a greater ability to extract TC temporal information. With the addition of the Enc-Dec structure, historical information can be fully mined and the forecasting capability of the model further improved. Pre_1D comprises the complete sub-network 1, in which an attention module is attached to the Enc-Dec structure to adaptively fuse data at each time step. When sub-network 2 was activated, Pre_2D was completed. The prediction performance of the model improved substantially, indicating that the features of the two-dimensional environmental variables were effectively and adequately extracted. Comparison of the experimental results for the two groups indicates that the spatial attention (SA) module contributes to prediction accuracy by intensifying the focus on key regions. When 3D data were added to the input set, Pre_3D was presented in its entirety. The prediction accuracy was further improved relative to Pre_2D, which indicates that sub-network 3 can also effectively extract 3D environmental information and contribute to the prediction of TC intensity.

Evaluation of different methods
The previously proposed DL cyclone prediction models Hybrid CNN-LSTM, TC 3DCNN and TC_Pred (Zhang et al 2022) were re-implemented, and the results are shown at the bottom of table 1. Lacking the BST sequence as input, none of these methods achieved higher prediction accuracy than Pre_3D. In addition, several widely used spatiotemporal data-processing methods, including fully connected LSTM (FC-LSTM), ConvLSTM, ConvGRU, TrajGRU, ST-LSTM, and MIM, were embedded in the Pre_3D framework for comparison. As shown in figure 2, all models based on the Pre_3D framework perform better than the baseline (8.26 kt). In particular, Pre_3D (ConvGRU) had the best performance in both MAE and MSE, with a 12.36% improvement in MAE compared with the baseline, which further demonstrates the ability of the Pre_3D framework to efficiently extract information from environmental variables and combine it with the TC time series to achieve accurate predictions.
Meanwhile, the inability of Pre_3D (FC-LSTM) to extract the spatial features of environmental information leads to poor performance. Pre_3D (ConvLSTM), Pre_3D (TrajGRU), Pre_3D (ST-LSTM), and Pre_3D (MIM) share similar network structures and are roughly similar in performance. The optical-flow units in TrajGRU overlap functionally with the SA module, as both aim to focus on key regions of the environmental variables, which may explain the poorer performance of TrajGRU. The memory units in ST-LSTM and MIM may be advantageous in longer prediction tasks. ConvLSTM and ConvGRU have similar prediction performances, but because GRU has fewer parameters and a higher training speed, Pre_3D (ConvGRU) was used in the subsequent experiments.
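The relative improvement quoted above relates the model MAE to the baseline MAE. The short sketch below shows the arithmetic and the MAE implied by a 12.36% improvement over the 8.26 kt baseline. The implied value is derived here from the two quoted figures and is not a number reported in this section; the function names are illustrative.

```python
def relative_improvement(baseline_error, model_error):
    """Percentage reduction of the error relative to the baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


def implied_error(baseline_error, improvement_percent):
    """Error that corresponds to a given percentage improvement over the baseline."""
    return baseline_error * (1.0 - improvement_percent / 100.0)


baseline_mae = 8.26  # kt, double-layer GRU baseline quoted in the text
model_mae = implied_error(baseline_mae, 12.36)
print(round(model_mae, 2))  # approximately 7.24 kt
```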

Evaluation of model generalizability
The above experiments are based on the TC records of RSMC Tokyo, and the method was validated using a fixed training/testing set. For further analysis of the generalization of Pre_3D, the framework was examined from two perspectives: (a) different TC records for the NWPO and the North Atlantic (NA) were used for training and testing of the model and for comparison with the prediction errors given by official agencies; (b) the training and testing sets were redistributed for the TCs recorded by RSMC Tokyo to verify the temporal stability of the framework.
TC records from CMA and JTWC in the NWPO, and from the North Atlantic hurricane database (HURDAT) in the NA, were used to validate the generalization performance of the model. Different functional modules were first validated for the different datasets, with MAEs as shown in figure 3. Compared with Pre_1D and Pre_2D, Pre_3D showed some improvement in the prediction results for each dataset, especially for the records from CMA, which improved by 12.5% and 3.4% relative to Pre_1D and Pre_2D, respectively. This improvement further illustrates the effectiveness of the environmental feature extraction method and the feature fusion process. The prediction results of Pre_3D (ConvGRU) for each dataset and the prediction errors for each official agency are shown at the bottom of table 2. The prediction results for CMA and JTWC show a significant improvement over the forecasting agencies, with a 19.1% improvement in prediction accuracy for the CMA dataset in particular. However, although both sub-networks 2 and 3 improved the prediction results to some extent with HURDAT, the final prediction results are still inferior to the official agency baseline (9.30 kt).
In addition, the RSMC Tokyo dataset was divided into five different training and testing sets (figure 4(a)) with reference to the k-fold cross-validation method, and independent training and testing were performed to verify the stability of the method on different datasets. The MAE and MSE of the five splits are shown as box plots in figures 4(b) and (c), with the mean values of the five experiments shown as red lines in the boxes. The feature extraction ability of each sub-network and the effectiveness of the fusion method were once again confirmed by comparing the mean results indicated by the red lines. For each split, comparison with the agency prediction results (table 2) shows smaller prediction errors for the Pre_3D framework, although the results fluctuate between splits. For example, the test result for split 2 is 6.64 kt, and that for split 1 is 7.23 kt, which may be due to the presence of more difficult cyclones in the test set of split 1.

Errors with different TC intensities
According to the cyclone classification standard of RSMC Tokyo, TCs with recorded intensity can be classified into five categories: Tropical Storm, Severe Tropical Storm, Typhoon, Very Strong Typhoon and Violent Typhoon. The corresponding MAEs and MAPEs were calculated for each category (table 3). The Tropical Storm category has the highest MAPE, and the Violent Typhoon category has the highest MAE. It can be concluded that the strongest and weakest TCs are the most challenging and lead to larger forecasting errors. The MAE reached 10.94 kt for the Violent Typhoon category, possibly because the change patterns and the influence of environmental variables become increasingly complex with increasing TC intensity. The large deviation of intensity inversion when the wind speed is high also increases the prediction error (Black 2003, Klotz and Uhlhorn 2014). The MAPE for the Tropical Storm category was exceptionally high at 16.26%, possibly because of the difficulty of observation when the TC intensity is low, and because data for tropical storms are relatively limited and key features are not fully extracted. Furthermore, most samples in this category were associated with TC landfall, during which energy dissipates and intensity variation patterns become more complicated.
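A per-category error breakdown of this kind can be sketched as follows. The wind-speed thresholds are an assumption based on the commonly cited Japan Meteorological Agency scale in knots (34, 48, 64, 85 and 105 kt); the boundaries actually used for table 3 are not reproduced in this section, and the function names are illustrative.

```python
# Assumed category bounds in kt: (name, lower bound inclusive, upper bound exclusive).
CATEGORIES = [
    ("Tropical Storm", 34.0, 48.0),
    ("Severe Tropical Storm", 48.0, 64.0),
    ("Typhoon", 64.0, 85.0),
    ("Very Strong Typhoon", 85.0, 105.0),
    ("Violent Typhoon", 105.0, float("inf")),
]


def categorize(intensity_kt):
    """Return the category name, or None below tropical-storm strength."""
    for name, low, high in CATEGORIES:
        if low <= intensity_kt < high:
            return name
    return None


def errors_by_category(y_true, y_pred):
    """Group samples by the category of the observed intensity; return MAE and MAPE."""
    groups = {}
    for t, p in zip(y_true, y_pred):
        name = categorize(t)
        if name is not None:
            groups.setdefault(name, []).append((t, p))
    return {
        name: {
            "mae": sum(abs(t - p) for t, p in pairs) / len(pairs),
            "mape": 100.0 * sum(abs(t - p) / t for t, p in pairs) / len(pairs),
        }
        for name, pairs in groups.items()
    }


# Toy example (kt); values are illustrative only.
print(errors_by_category([40.0, 40.0, 110.0], [44.0, 36.0, 100.0]))
```

The toy example also shows why the two metrics can rank categories differently: a 4 kt error on a 40 kt storm is a 10% relative error, whereas a 10 kt error on a 110 kt typhoon is only about 9%.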

Errors with different TC intensity variations (TCIVs)
Apart from TC intensity, the TC intensity variation (TCIV) is also an area of interest in TC studies. Here, the TCIV is defined as the change of TC intensity within 24 h (kt per 24 h). A scatter diagram showing the relationship between the predicted TCIV and the real TCIV derived from BST data is shown in figure 5(a), with a correlation coefficient of ∼0.80 indicating the high prediction accuracy of the proposed method. The distribution of test samples and intensity-prediction results with different TCIVs is shown in the heat map (figure 5(b)). It can be seen that the 24 h intensity variation is concentrated mainly in the [−20 kt, 20 kt] range. Furthermore, the prediction error calculated under different intensity variations was found to be positively correlated with the magnitude of the TCIV (figure 5(c)). Large intensity variations may thus reduce the performance of the proposed method. Moreover, the influence of positive intensity variations on the prediction results is much greater than that of negative variations, implying that it is more difficult to make accurate predictions during the strengthening of TCs than during their weakening. However, prediction errors are relatively small for intensity variations within the concentration range of [−20 kt, 20 kt] (figure 5(b)).
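The TCIV comparison can be sketched as below. We assume that the predicted TCIV is the predicted 24 h intensity minus the observed current intensity, and that the correlation coefficient is the Pearson coefficient; neither detail is stated explicitly in the text, and the function names are illustrative.

```python
import math


def tciv(intensity_now, intensity_24h_later):
    """24 h intensity change in kt; positive values indicate strengthening."""
    return intensity_24h_later - intensity_now


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy example (kt); values are illustrative only.
current = [50.0, 60.0, 70.0, 80.0]
observed_24h = [65.0, 55.0, 90.0, 80.0]
predicted_24h = [60.0, 58.0, 82.0, 78.0]

real_tciv = [tciv(c, o) for c, o in zip(current, observed_24h)]
pred_tciv = [tciv(c, p) for c, p in zip(current, predicted_24h)]
print(real_tciv, pred_tciv, pearson_r(real_tciv, pred_tciv))
```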

Spatial distribution of errors
Division of the NWPO (0°-60°N, 100°-180°E) into a 2° × 2° grid allows investigation of the spatial distribution of prediction errors by calculation of the average error for each grid cell. Moreover, referring to the [...] .91%, and 12.12%, respectively. It is evident that the prediction errors of regions 1 and 3 are greater than those of region 2 for both MAEs and MAPEs. Although our model achieves favorable prediction results for most samples, the extraction of different features in different regions must be further strengthened. In addition, large errors occurred mainly in coastal areas and in the areas where TCs form, so it may be helpful to take regional differences into consideration.
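The gridded error map can be sketched as a simple binning operation: each test sample is assigned to a 2° × 2° cell by its latitude and longitude, and the absolute errors are averaged per cell. The domain bounds follow the text; the function names and the (lat, lon, error) sample layout are assumptions.

```python
import math

LAT_MIN, LAT_MAX = 0.0, 60.0     # degrees north
LON_MIN, LON_MAX = 100.0, 180.0  # degrees east


def grid_cell(lat, lon, cell_deg=2.0):
    """Row and column index of the cell that contains the point."""
    row = math.floor((lat - LAT_MIN) / cell_deg)
    col = math.floor((lon - LON_MIN) / cell_deg)
    return row, col


def mean_error_per_cell(samples, cell_deg=2.0):
    """Average absolute error per cell; samples are (lat, lon, abs_error) tuples."""
    sums = {}
    for lat, lon, err in samples:
        if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
            continue  # skip samples outside the study domain
        key = grid_cell(lat, lon, cell_deg)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + err, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


# Toy example; values are illustrative only.
samples = [(21.0, 131.5, 6.0), (20.5, 130.1, 8.0), (70.0, 131.0, 5.0)]
print(mean_error_per_cell(samples))
```

Cells that contain few samples give noisy averages, so in practice it may be worth recording the per-cell sample count alongside the mean error.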

Conclusions
In this study, a DL-based TC intensity prediction framework named Pre_3D is proposed. In Pre_3D, three separate feature extraction sub-networks are designed to efficiently extract cyclone intensity variation and spatial environmental features from multi-source data, and the adaptive fusion of the features from each sub-network is achieved using an MLP network. By fusing the intra-patterns of the intensity series with the inter-patterns arising from external environmental features, the intensity can be predicted more accurately. Pre_3D was evaluated using ERA5 reanalysis and BST data. A series of ablation experiments was conducted to illustrate the contribution of the various parts of Pre_3D, and multiple TC records were used to evaluate the generalizability of the framework. The results indicate that the prediction accuracy of the proposed method is significantly improved relative to official agencies and frequently used DL-based models. Based on the analysis of error distributions, we conclude that TCs with large intensity changes or located in low-latitude areas are more difficult to model. The proposed method also has some limitations. For example, the selection of environmental variables and the determination of the spatial scales of these variables need further refinement. The construction of individual models for different intensity ranges and spatial regions may also help to optimize the predictions. In addition, since ERA5 is not available in real time, it is worth considering how transfer learning can be used to adapt models trained on reanalysis datasets to the more limited environmental data available in real time.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://apps.ecmwf.int/data-catalogues/era5 and www.ncdc.noaa.gov/ibtracs.