1. Introduction
Tropical cyclones (TCs) that originate over the western region of the Southwest Pacific Ocean (SWPO; 5° S–40° S, 135° E–180° E) can devastate the north and northeast coasts of Australia, New Zealand, and the islands of Melanesia (including Vanuatu, New Caledonia, and Fiji). The region from the Northern Territory to southeast Queensland is the most developed, urbanized, and populated area of Australia, making this region extremely vulnerable to the impacts of extreme TC wind speeds. TCs are an annual threat to northeastern Australia [1]. The threat is exacerbated in the far northeastern region of Australia because of the complex terrain’s proximity to the coastline, which increases the destructive potential of landfalling TCs [2]. Western SWPO islands such as Vanuatu, Fiji, and the Solomon Islands are also vulnerable to TC landfalls [3] because of their unfavorable shoreline-to-land-area ratio [4] and their combination of low-lying coral atolls, reef islands, and volcanic islands [5]. The western SWPO islands also contain a large proportion of the world’s biodiversity. More intense landfalling TCs in a changing climate threaten these natural and human habitats, and the situation is worsened by limited financial security and limited mitigation and adaptation strategies [6].
The changing intensity of TCs prior to landfall and its linkage with large-scale geophysical parameters have attracted the attention of meteorologists and climatologists [7,8]. Coastal safety depends predominantly on accurate TC intensity prediction in the short window just prior to landfall, so that coastal communities have enough time to start evacuation procedures. The intensity change prior to landfall depends on environmental conditions [9,10], the bathymetric characteristics of the ocean floor [11], changes in ocean temperature [12,13], and the terrain characteristics of the coast [14], among other factors. The environment surrounding a TC making landfall along the northeast Australian coast differs from that of a TC making landfall on the nearby SWPO islands because of different large-scale conditions, including sea surface temperature (SST) and sea level pressure [15,16]; teleconnection patterns such as the El Niño–Southern Oscillation (ENSO), which causes a shift in SST towards the west during La Niña years and towards the east during El Niño years [17]; and topographic effects [18]. These differences make it difficult for computer models to accurately predict intensity change prior to landfall.
Threats associated with a TC approaching land are partially a function of its wind speed [8]. However, accurately predicting TC intensity has been more difficult than predicting TC tracks in recent decades [
19]. Rappaport, Franklin [
8] highlighted this disparity by mentioning the need for accurately predicting TC intensity, specifically the intensification and weakening before landfall. Research on the changes in wind speed or minimum central pressure has been primarily focused on the degradation of TCs after making landfall [
16,
20,
21]. Comparatively few studies have emphasized TC intensification and weakening before landfall [
8,
19,
22]. Several mechanisms have been hypothesized to influence TC intensity near land and could be relevant during the final day or hours before landfall. Wu [23] suggested that heat and moisture fluxes over land differ from those over water and that the extent of their influence depends on the amount of time a storm spends over the ocean and over land. Entrainment of dry, continental air has also been identified as a determining factor of TC intensity before landfall [
21,
24].
Atmospheric aerosols substantially influence the spatial distribution of cloudiness and hydrometeor contents that ultimately impact the wind speed of a TC approaching land. Khain, Lynn [
25] found continental aerosols invigorated convection mainly toward the periphery of Hurricane Katrina in 2005. This led to weakening prior to landfall. The largest decline in winds took place approximately 24 h before landfall, just after Hurricane Katrina reached maximum intensity. Increasing aerosol intrusion into the rainband region weakens TCs [
26]. Dry air that contains atmospheric aerosols can negatively impact TCs [
27] by fostering enhanced cold downdrafts [
28,
29] and lowering the convective available potential energy within a TC [
30]. According to ref. [
24], dry air intrusion into a moist storm envelope weakens landfalling TCs by reducing surface moisture fluxes and increasing surface friction. The surface energy supply to the TC becomes insufficient to counteract the negative effects from enhanced dry air entrainment. Gulf Coast landfalling Hurricanes Opal (1995), Lili (2002), and Ivan (2004) weakened before their centers crossed the coastline as the eyewall convection started weakening, owing to dry air intrusion [
31].
The depth of the ocean mixed layer is an important factor in TC intensity, as a shallow mixed layer allows storms to upwell cold sub-surface water, which weakens the storm [32]. Shallow coastal water likewise allows storms to upwell cool water to the surface. This negative feedback causes the storm to limit its own intensification [32,33], which underscores the importance of ocean heat content (OHC) in TC intensity change before landfall.
There are several other ocean and atmospheric processes that modify the intensity of a storm approaching land. Steep coastal orography leads to increases in the storm intensity and associated damages over the northeastern coast of Australia [
1]. TCs that approach land during neap tides (when the tides are the smallest in a location) tend to be less intense than those arriving at other times [
19].
TC intensity changes prior to landfall have been emphasized in previous studies using observational and statistical analysis [
8,
19,
22,
34], high-resolution numerical simulations [
15,
30,
31], and satellite image analyses [
35,
36]. The success of statistical models depends on satisfying certain assumptions, and these models are susceptible to multicollinearity. Machine learning models do not rely on an assumption of no multicollinearity, so their predictive performance is less affected by its presence. Decision trees have been used to classify 24-h intensity changes [
36] over the North Pacific. Geng, Shi [
37] used a finite mixture model-based cluster algorithm to classify landfalling TCs over China’s coast. For TC track prediction, ref. [
38] used a deep learning framework with a bidirectional gated recurrent unit network that outperforms other state-of-the-art deep learning models, including a recurrent neural network, long short-term memory neural network, and gated recurrent unit network. A deep learning-based multilayer perceptron TC intensity prediction model that used the global Statistical Hurricane Intensity Prediction Scheme predictors correctly predicted more rapid intensification events [
39]. A 3D convolutional neural network (3D-CNN), along with image processing, significantly improved the mean absolute error of TC intensity change predictions and the accuracy of TC intensifying or weakening classification [
40]. By using a CNN structure and geostationary satellite Himawari-8 cloud products, ref. [
41] reported that the model is conducive to estimating TC intensity. A neural network framework for TC intensity prediction named TC-Pred, which uses a novel feature extraction and aggregation approach and considers the characteristics of multi-source environmental variables, was designed by ref. [42]. Their results indicate that TC-Pred outperforms other machine learning models at 6 h, 12 h, 18 h, and 24 h intervals. However, studies that use machine learning to classify TC intensification or weakening just before landfall are comparatively sparse in the literature, particularly over the western SWPO.
Mitigating TC risk requires a more informed coastal community. Accurately predicting TC intensity prior to landfall should lead to a higher success rate of informed decisions along the coast. The purpose of this study is to develop a prediction model for whether a TC will intensify or weaken six hours prior to landfall using a random forest classifier and historical data, including physical observations of TCs 24 h prior to landfall.
2. Data
This paper utilizes the Southwest Pacific Enhanced Archive for Tropical Cyclones (SPEArTC) best track six-hourly data from 1980 to 2018 to extract the geographical position and Vmax of each storm, including the period from six to twenty-four hours prior to landfall. Storms generated within 5° S–40° S and 135° E–180° E are considered for this study. Determining the precise time and location of landfall is crucial. The National Hurricane Center defines landfall as the intersection of the surface center of a TC with the coastline [43]. However, the exact time and location of a TC landfall are sometimes difficult to identify. The TCs that made landfall within the study region have been categorized into the following two groups: (1) mainland Australia landfall TCs, and (2) SWPO island landfall TCs (Figure 1). This study considers only the first landfall, when a TC intersects land for the first time. For both mainland and island cases, when a TC first crosses land, the closest hour prior to the intersection is considered the landfall hour in this study. Storm hours after landfall were not included because of the influence of land on intensity. If the Vmax at the landfall hour increased or remained the same relative to 24 h prior to landfall, the TC is considered intensifying (labeled as 1) (Figure 2a). If the Vmax at the landfall hour decreased relative to 24 h prior to landfall, the TC is considered weakening (labeled as 0) (Figure 2a).
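The labeling rule above can be sketched in a few lines of Python. This is only an illustration: the function name and the example wind speeds are hypothetical and are not taken from the SPEArTC archive.

```python
def label_intensity_change(vmax_24h_prior, vmax_landfall):
    """Return 1 (intensifying) if Vmax at the landfall hour is greater than
    or equal to Vmax 24 h earlier, otherwise 0 (weakening)."""
    return 1 if vmax_landfall >= vmax_24h_prior else 0


# Hypothetical (Vmax 24 h prior, Vmax at landfall hour) pairs, for illustration only.
examples = [(25.0, 33.0), (40.0, 40.0), (45.0, 38.0)]
labels = [label_intensity_change(before, at_landfall) for before, at_landfall in examples]
print(labels)  # [1, 1, 0]
```

Note that an unchanged Vmax falls into the intensifying class, as stated in the text.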
This study restricted the selection of TCs to only those storms that spent ≥24 h over the ocean before making the first landfall, so that each storm had sufficient time over the ocean to gain strength from the surrounding environment. These criteria resulted in a total of 68 TCs that made Australian mainland landfall, and 99 TCs that made SWPO island landfall. Of the 68 mainland landfalls, 69% intensified and 31% weakened prior to landfall, and of the 99 island landfalls, 82% intensified and 18% weakened. These class-specific differences motivated us to incorporate cross validation (CV) and sampling techniques to adjust the baseline differences [
44] while performing class-specific prediction.
Several geophysical and aerosol variables are used in this study as input variables. To capture a localized picture of the relationship, the variables that correspond to each sample are extracted from the grid cell nearest to each storm point 24 h before landfall, rather than from domain averages. Gridded multidimensional atmospheric, oceanic, and aerosol variables that were spatially and temporally closest to the storm position 24 h prior to landfall were collected for each sample. The input variables considered in this study are latitude, longitude, Vmax (m s−1), vertical wind velocity (VW), sea surface skin temperature (SkT), ocean heat content (OHC) at a 700-m depth, specific humidity at 300 hPa (sphum), air temperature at 200 hPa (airtemp), 850–200 hPa vertical wind shear (VWS), and sea salt extinction aerosol optical depth (SSAOD) at 550 nm (Table 1). TCs enhance upper ocean mixing and SST cooling during intensification [
45]. SST can influence TC maximum intensity through its influence on upper atmospheric temperatures [
46]. In addition to a warmer SST, TC intensification or weakening is also affected by the vertical thermodynamic properties of the atmosphere [
47]. The historical upper tropospheric temperature has increased significantly faster than the tropical mean, and the warming of the upper troposphere controls TC intensification, which is associated with ocean warming [
48,
49,
50,
51,
52]. Thus, considering the upper troposphere is important to better understand the thermodynamic influence on TC intensity.
SSAOD data were collected from the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2). MERRA-2 is a NASA atmospheric reanalysis tool for the satellite era that uses the Goddard Earth Observing System Model, version 5 (GEOS-5), with its Atmospheric Data Assimilation System (ADAS), version 5.12.4 [
53]. The geophysical variables in
Table 1 were collected from the National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis 1 project [
54]. The reanalysis product provides data four times daily at six-hourly intervals (0 Z, 6 Z, 12 Z, and 18 Z). For each sample, the variables were taken from the data point in the above-mentioned sources that was spatially and temporally closest to the storm position 24 h prior to landfall (Figure 2b). Monthly OHC data for the 700-m depth were collected from the Ocean ReAnalysis System 5 (ORAS5), produced by the European Centre for Medium-Range Weather Forecasts (ECMWF).
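The nearest-grid-cell matching described in this section can be illustrated with a short Python sketch. The grid coordinates, time axis, and storm position below are hypothetical, and the sketch assumes a regular grid whose longitudes stay within the study domain (135° E–180° E), so no wrap-around at the date line is handled.

```python
def nearest_index(axis_values, target):
    """Index of the axis value closest to the target."""
    return min(range(len(axis_values)), key=lambda i: abs(axis_values[i] - target))


def nearest_grid_cell(lats, lons, times, storm_lat, storm_lon, storm_time):
    """Return (lat index, lon index, time index) of the grid point that is
    spatially and temporally closest to a storm fix."""
    return (
        nearest_index(lats, storm_lat),
        nearest_index(lons, storm_lon),
        nearest_index(times, storm_time),
    )


# Hypothetical 2.5-degree grid and six-hourly time axis (hours), for illustration only.
grid_lats = [-10.0, -12.5, -15.0, -17.5]
grid_lons = [145.0, 147.5, 150.0]
grid_times = [0, 6, 12, 18]

# Hypothetical storm fix 24 h before landfall.
cell = nearest_grid_cell(grid_lats, grid_lons, grid_times, -14.1, 146.9, 11)
print(cell)  # (2, 1, 2)
```

The returned indices would then be used to read each gridded variable at that single cell, rather than averaging over a surrounding domain.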
3. Method
To test collinearity between the predictor variables, we computed the correlation matrix of the contributing variables for both the mainland and island cases to understand the strength of the relationship between each pair of variables. The collinearity test was performed using the “Hmisc” package in R.
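As a rough illustration of such a collinearity check, the following Python sketch computes a Pearson correlation matrix from scratch. It does not reproduce the R package used in the study, and the variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical predictor values, for illustration only.
predictors = {
    "skt": [28.0, 29.0, 30.0, 31.0],
    "ohc": [2.0, 4.0, 6.0, 8.0],
    "vws": [12.0, 9.0, 7.0, 3.0],
}
corr = correlation_matrix(predictors)
```

Pairs with an absolute correlation close to 1 (here the made-up "skt" and "ohc" columns) would be flagged as strongly collinear.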
We used the “randomForest” package [
55] in the R programming language to generate our machine learning classifiers. Random forest classification models [
56] were used to classify TC intensification or weakening prior to landfall over Australia and the nearby SWPO islands using variables 24 h prior to landfall. We built separate random forest classifiers for mainland and island landfalling TCs to reduce potential bias introduced by different orographic influences [
3]. For both models, 80% of the data were used to train the models, and 20% were used to test model performance.
Random forest is a non-parametric, ensemble-based supervised machine learning model that generates a large number of individual classification trees (K) by drawing multiple bootstrap samples with replacement from the training data. Each tree is trained on a distinct bootstrapped sample. At each node of each tree, a random subset of m of the M input variables is selected, and the best split on these m variables is used to grow the tree [56]. This leads to an ensemble of K classification trees, and the prediction from the ensemble is the majority vote of the predictions of the individual trees. This process reduces the correlation between the constructed classification trees, which lessens the influence of multicollinearity between variables, and also reduces variance [
56].
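The three ingredients described above (bootstrap sampling, a random subset of variables at each split, and aggregation across trees) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study: the helper names are hypothetical, no trees are actually grown, and the aggregation shown is a simple majority vote over class labels.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, m, rng):
    """Pick m of the M candidate variables, without replacement, for one split."""
    return sorted(rng.sample(range(n_features), m))


def majority_vote(tree_predictions):
    """Combine the class labels predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)                    # fixed seed so the sketch is repeatable
idx = bootstrap_indices(10, rng)          # rows used to train one tree
features = random_feature_subset(10, 3, rng)  # variables considered at one node
vote = majority_vote([1, 0, 1, 1, 0])     # hypothetical votes from five trees
print(vote)  # 1
```

Because each bootstrap sample repeats some rows and omits others, every tree sees a slightly different training set, which is what decorrelates the trees.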
Random forest provides a list of the importance of each variable [
57]. The mean decrease Gini (MDG) index ranks the importance of each variable in the classification decision. For the classification decision, the node impurity is measured by the Gini index, which can be used as a general indicator of variable relevance. This variable importance score provides a relative ranking of the variables and is technically a by-product of training the random forest. At each node within the binary trees of the random forest, the optimal split is chosen using the Gini impurity value, which measures how well a potential split separates the samples of the two classes in that node. This indicates the efficiency of the Gini index in explicit variable selection [58]. The MDG of a variable is the total decrease in node impurity from splits on that variable, averaged over all the trees. The larger the MDG value, the higher the importance of the variable [
59].
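For a two-class node with a fraction p of intensifying cases, the Gini impurity is 1 − p² − (1 − p)². The Python sketch below computes this impurity and the decrease achieved by a single split, using the textbook proportion-weighted form. It is an illustration only: the function names are hypothetical, and a specific software package may scale the decrease differently (for example by node size).

```python
def gini_impurity(labels):
    """Gini impurity of a node holding binary labels (1 = intensifying, 0 = weakening)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node with three intensifying and three weakening cases.
parent = [1, 1, 1, 0, 0, 0]
perfect_split = gini_decrease(parent, [1, 1, 1], [0, 0, 0])
poor_split = gini_decrease(parent, [1, 1, 0], [1, 0, 0])
print(perfect_split)  # 0.5
```

Summing such decreases over every split that uses a given variable, and averaging over the trees, gives that variable's MDG; a split that separates the classes well (the first example) contributes far more than one that does not (the second).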
CV involves partitioning the data into a number of groups, using each in turn as a test set for the models produced using the remaining data, and choosing the method that achieves the highest accuracy [
60,
61,
62]. It is the optimum method of model selection, especially when the data size is small, and is often used to increase the generalization ability of a classifier [
63]. This paper uses the following two types of CV methods: repeated K-fold CV and leave-one-out CV (LOOCV). LOOCV is widely used, as it can provide an unbiased estimate of the model performance for the test data [
64]. LOOCV is also appropriate when the dataset is small and an accurate estimate of model performance is more important than computational cost. However, robust model selection should also be based on minimizing the generalization errors. Repeated K-fold CV is an unbiased estimator of the variance in K-fold CV [65]. Repeated K-fold CV repeats the K-fold CV process multiple times, with each repeat splitting the same dataset into different folds, and reports the mean performance across all folds and all repeats. This process improves the estimate of the mean model performance.
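The two resampling schemes can be sketched as index generators in Python. This is a minimal illustration with hypothetical function names; it produces only the train/test index splits and does not fit any model.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and return k (train, test) index splits."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def repeated_k_fold(n_samples, k, repeats, seed=0):
    """Repeat K-fold CV, reshuffling the same dataset into new folds each time."""
    rng = random.Random(seed)
    all_splits = []
    for _ in range(repeats):
        all_splits.extend(k_fold_indices(n_samples, k, rng))
    return all_splits


def leave_one_out(n_samples):
    """LOOCV: each sample is the test set exactly once."""
    return [([j for j in range(n_samples) if j != i], [i]) for i in range(n_samples)]


splits = repeated_k_fold(12, k=3, repeats=3)  # 3 folds x 3 repeats = 9 splits
loo = leave_one_out(5)                        # 5 splits, one held-out sample each
print(len(splits), len(loo))  # 9 5
```

A model's score would be computed on each test split and then averaged over all splits, which is the mean performance reported for repeated K-fold CV.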
The number of predictor variables tried at each split (mtry = p) and the number of trees (ntree = n) were tuned for optimal performance using two tuning approaches: repeated three-fold CV with three repeats, and LOOCV. The maximum depth of the individual trees was selected as the stopping criterion and was set to 4. Based on the models’ performance on the testing data with different combinations of p and n, mtry = 10 and ntree = 500 gave the best classification accuracy for all the classification models in both the mainland Australia and SWPO island cases.
Classification accuracy is the ratio of the number of correct predictions to the total number of input samples. However, if the classes are unbalanced, accuracy is biased toward the class that contains the higher number of cases [
66]. A classification data set with skewed class proportions is called unbalanced. Several studies have found class imbalance to be responsible for the low accuracy of traditional classification models, including linear discriminant analysis [
67], support vector machines [
68], and classification trees [
66]. One approach to handling this issue is to re-balance the training data set by applying different sampling techniques, including oversampling of the minority class, the synthetic minority oversampling technique (SMOTE), and random oversampling examples (ROSE). Sampling methods preprocess the data by constructing a balanced data set and adjusting the prior distributions of the majority and minority classes [
69,
70].
Oversampling balances the differences between the majority and minority classes by randomly replicating samples from the minority class [
71]. However, as this method contributes to the balance of the class distribution without adding new information to the dataset, it can cause overfitting. Chawla, Bowyer [
72] proposed SMOTE, which creates synthetic samples of the minority class rather than oversampling with replacement. ROSE sampling involves smoothed bootstrapping to draw artificial samples from the feature space neighborhood around the minority class [
73]. As the ROSE sampling process generates new artificial data that have not been observed previously, it reduces the risk of overfitting and improves the generalization ability [
74]. For both mainland and island landfalls, the majority of the cases belong to the intensifying class rather than the weakening class. Therefore, this paper applied all three types of sampling to the random forest model. Because the events are rare, the statistical inferences that the model can draw from the data are somewhat limited.
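Of the three techniques, plain random oversampling is the simplest to illustrate. The Python sketch below replicates randomly chosen minority-class samples until the classes are balanced. The function name and data are hypothetical, and SMOTE and ROSE, which generate new synthetic samples instead of copies, are not implemented here.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Replicate randomly chosen samples of the smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical training set: four intensifying cases (1) and one weakening case (0).
train_x = ["a", "b", "c", "d", "e"]
train_y = [1, 1, 1, 1, 0]
balanced_x, balanced_y = random_oversample(train_x, train_y)
print(balanced_y.count(1), balanced_y.count(0))  # 4 4
```

Resampling of this kind is applied to the training data only, so that the test data keep the original class proportions.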
The ultimate standard for the performance of a machine learning model is its predictive capability using the testing data (i.e., generalization error). This paper used classification accuracy and a confusion matrix (
Table 2) [
75,
76,
77] to evaluate the model’s prediction performance. The confusion matrix is used to calculate sensitivity, specificity, and the area under the curve (AUC) to understand the robustness of the model.
Sensitivity, or true positive rate (TPR; Equation (1)), is the model’s ability to correctly detect intensifying cases when the storm is indeed intensifying. False negatives (FN) are intensifying cases that are mistakenly classified as weakening. Specificity, or true negative rate (TNR; Equation (2)), is the proportion of weakening cases that the model correctly identifies. False positives (FP) are weakening cases that are mistakenly classified as intensifying. The metrics in Equations (1) and (2) assess the potential weaknesses in the model that may be missed by reporting prediction accuracy alone, especially for unbalanced datasets [
78].
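$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \quad (1)$$
where TP is the number of true positives.
$$\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \quad (2)$$
where TN is the number of true negatives.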
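Using the standard definitions, sensitivity is TP/(TP + FN) and specificity is TN/(TN + FP). The following Python sketch computes both from a pair of label lists; the function names and the example labels are hypothetical and are not results from this study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP with 1 = intensifying (positive), 0 = weakening (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """True positive rate: share of intensifying cases correctly detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of weakening cases correctly detected."""
    return tn / (tn + fp)


# Hypothetical observed and predicted labels, for illustration only.
observed = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 1, 0]
tp, fn, tn, fp = confusion_counts(observed, predicted)
print(sensitivity(tp, fn), specificity(tn, fp))  # 0.75 0.5
```

In this made-up example the overall accuracy is 4/6, which hides the fact that only half of the weakening cases were identified; this is the kind of weakness the two rates are meant to expose.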
The relationship between classification confidence/probability and correct labeling can be illustrated by using a receiver operating characteristic (ROC) curve [
79]. The ROC curves have been created with three probabilistic thresholds. The ROC curve plots the probability of detection against the probability of false detection [78], and the AUC summarizes this trade-off, with a value approaching 1.0 indicating high sensitivity and specificity [
79], and thus a robust model.
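As an illustration of these quantities, the Python sketch below computes one ROC point at a chosen probability threshold and the AUC through its rank interpretation (the probability that a randomly chosen intensifying case receives a higher score than a randomly chosen weakening case, with ties counted as one half). The function names, scores, and threshold are hypothetical.

```python
def roc_point(y_true, scores, threshold):
    """Return (false positive rate, true positive rate) when cases with
    score >= threshold are classified as intensifying (1)."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    return fp / (fp + tn), tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities of intensification.
labels = [1, 1, 1, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2]
fpr, tpr = roc_point(labels, probs, threshold=0.5)
auc = roc_auc(labels, probs)
```

Evaluating `roc_point` at several thresholds traces out the ROC curve, while `roc_auc` gives a single threshold-independent summary.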
5. Conclusions
TCs are capable of generating significant economic and human loss. Accurate predictions of weakening or intensifying TCs prior to their arrival at the coast can assist in minimizing the potential disasters. This study utilized random forest classifiers to predict TC intensification or weakening based on multiple geophysical and aerosol variables and the initial intensity of TCs 24 h prior to their landfall. TCs were separated into Australian mainland and SWPO island TCs based on their location of landfall, due to their different terrain characteristics. To alleviate the class imbalance problem, sampling techniques were applied to the training data, and to achieve optimum performance on the testing data, two types of CV techniques were applied to the model. Nine different combinations of random forest models were trained to achieve the highest accuracy and robust results using the testing data.
Random forest models with LOOCV and ROSE sampling demonstrated the best performance for both the mainland and island landfall cases. Longitude, initial intensity, and SkT were highlighted as the important variables for classifying both mainland and island landfalling TCs, indicating their importance in TC prediction for the entire SWPO basin. The importance of longitude indicates the seasonal shift of the large-scale atmospheric–oceanic circulation and the associated ocean temperature and cloud belt shifts. A subset of category three TCs from both mainland and island cases was used to generate another random forest model to validate the importance of initial intensity. The model ranked initial intensity as the highest contributing factor in the classification decision. This result provides an opportunity to further study the influence of large-scale climate dynamics on the initial intensity of TCs by integrating high-resolution model data with machine learning techniques.
Incorrectly classified cases were analyzed by sorting them according to their initial intensity hour, landfall hour, monthly distribution, and 24-h intensity change. The number of incorrect cases was highest during the afternoon hours in the peak TC season, mainly over the western Cape York Peninsula and the Vanuatu island chain. Cloud cover over the western Cape York Peninsula [97] and the mountainous terrain characteristics over the Vanuatu island chain [50] hinder the model from providing correct landfalling intensity prediction. The higher number of FP cases in both the mainland and island models indicates that the models cannot provide absolute accuracy when classifying intensifying cases. This limitation motivates further optimization of the model with more data and of the model hyperparameters with more detailed and advanced CV and calibration techniques. A detailed analysis of the incorrectly classified cases using six-hourly or hourly radar data and advanced machine learning models is recommended.
This research represents an attempt at modeling TC strength just before landfall using machine learning. Future research will be dedicated to improving the sampling of the storm environment; including reanalysis track data for model performance verification; identifying additional environmental and aerosol factors that could improve intensity predictions of TCs approaching land, such as eddy flux convergence [102], the upper-level trough [7], the influence of terrain, and continental aerosols such as PM2.5; and investigating the differences between large-scale parameters following model successes and failures.