A Transformer‐Based Deep Learning Model for Successful Predictions of the 2021 Second‐Year La Niña Condition

A purely data‐driven, transformer‐based model with a novel self‐attention mechanism (3D‐Geoformer) is used to make predictions in a rolling manner similar to that of dynamical coupled models. The 3D‐Geoformer successfully predicts the 2021 second‐year cooling conditions that followed the 2020 La Niña event, including covarying anomalies of surface wind stress and three‐dimensional (3D) upper‐ocean temperature, the reoccurrence of negative subsurface temperature anomalies in the eastern equatorial Pacific, and a corresponding turning point of sea surface temperature (SST) evolution in mid‐2021. To interpret this success, sensitivity experiments are performed in which the modulating effects of wind and subsurface thermal forcings on SST are considered separately in the input predictors. A comparison with physics‐based modeling is also conducted, illustrating the suitability and effectiveness of 3D‐Geoformer as a new platform for El Niño‐Southern Oscillation (ENSO) studies.

GAO ET AL. 10.1029/2023GL104034 2 of 10
Efforts have been made to develop and improve coupled ocean-atmosphere models for predicting ENSO (Tang et al., 2018; Zhang et al., 2020). For example, more than 20 statistical and dynamical models have been used to make real-time ENSO predictions, and the results are routinely collected by the International Research Institute for Climate and Society (IRI) for further applications (Barnston et al., 2012; Cane et al., 1986; Zhang & Gao, 2016). However, many models failed to predict the Niño 3.4 SST evolution during 2020-2021 (Figure 2). In particular, they show obvious biases and uncertainties in predicting the 2021 second-year cooling condition, with its turning point of SST anomaly evolution in mid-2021. This difficulty in predicting the prolonged La Niña conditions presents a great challenge for the climate modeling community, indicating a clear need to improve real-time predictions and to understand the responsible processes.
Recently, data-driven deep learning-based modeling has made great progress in geoscience (e.g., Liu et al., 2021; Reichstein et al., 2019; Zheng et al., 2020; Zhu et al., 2022), with novel applications to ENSO studies (Ham et al., 2019, 2021; J. Hu et al., 2021; Mu et al., 2021; Nooteboom et al., 2018; Petersik & Dijkstra, 2020). It has been demonstrated that ENSO predictions can be made using these innovative modeling platforms. In particular, Zhou and Zhang (2023) recently developed a novel transformer-based deep learning model using a specific self-attention-enhanced neural network (Vaswani et al., 2017), named 3D-Geoformer, which predicts ENSO-related three-dimensional (3D) upper-ocean temperature anomalies together with wind stress anomalies well. The deep learning-based model is designed in the same way as dynamical models to represent the coupling among surface winds, 3D subsurface temperature and SST.
Here, we take advantage of this transformer-based deep learning model to demonstrate its striking performance in predicting the 2020-2021 prolonged La Niña conditions in the tropical Pacific. We focus our analysis on the second-year cooling in 2021 because, as mentioned above, many dynamical models have difficulty in predicting this evolution. Furthermore, to improve the interpretability of the results, sensitivity experiments are conducted by modulating the intensities of the subsurface thermal effect and wind forcing. An additional comparison between this 3D-Geoformer and a dynamical coupled model is also presented for prediction and process understanding during 2021.
Figure 2. The Niño 3.4 sea surface temperature anomalies (3-month average over the region (5°S-5°N; 170°-120°W)) during 2020-2021, observed (black dotted line) and predicted by statistical (green dotted lines) and dynamical (blue dotted lines) models from the International Research Institute for Climate and Society (IRI) collection and 3D-Geoformer (red dotted lines). Each colored line indicates the trajectory of a 9-month prediction made starting in different months. Note that the statistical and dynamical model results are separately averaged values taken directly from the IRI website at https://iri.columbia.edu/our-expertise/climate/forecasts/enso/current/.

A Transformer-Based Deep Learning Model (3D-Geoformer)
The transformer-based deep learning model (named 3D-Geoformer) used in this study was recently developed by Zhou and Zhang (2023) and is a novel self-attention-based neural network for ENSO-related multivariate modeling. The model is built on an encoder-decoder scheme with associated modules, including two data preprocessing modules, encoder and decoder components, and an output layer. More specifically, within the tropical ocean (92°E-30°W, 20°S-20°N), nine fields (anomalies of zonal and meridional wind stress, and seven-layer temperature at depths of 5, 20, 40, 60, 90, 120, and 150 m) over 12 consecutive months are fed into the model as input predictors; we chose the seven specific depths as predictors so that the vertical sampling adequately represents the fine structure of upper-ocean temperature evolution. The same variables are produced as output from the 3D-Geoformer for the following 20 months as targeted predictands through rolling predictive steps. A more detailed description and essential discussion of the method are included in Supporting Information S1. This 3D-Geoformer has several characteristics that differ from other deep learning-based models for ENSO predictions. For example, it uses a self-attention-based transformer architecture to establish multivariable relationships regardless of their respective spatial-temporal distances. Therefore, the 3D-Geoformer can more easily and effectively learn nonlocal teleconnections and long-term dependence within the geoscience data and greatly improve the prediction accuracy.
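As a reminder of why self-attention can relate inputs regardless of their separation, the following minimal sketch implements the generic scaled dot-product attention of Vaswani et al. (2017). It is not the specific 3D-Geoformer mechanism, which is described in Zhou and Zhang (2023); the token count, feature size, and random inputs are hypothetical values chosen only for illustration.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Generic attention: softmax(q k^T / sqrt(d)) v, applied row-wise."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                          # pairwise similarities
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)         # each row sums to 1
    return weights @ v, weights

# Hypothetical example: 6 space-time patches ("tokens") with 8 features each.
rng = np.random.default_rng(0)
n_tokens, d_model = 6, 8
tokens = rng.standard_normal((n_tokens, d_model))

attended, attn_weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Every token receives a weight for every other token in a single step, so distant patches can interact without passing through intermediate neighbors.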
Multivariate predictor variables are selected to adequately represent ocean-atmosphere interactions and the Bjerknes feedback consisting of sea surface wind, SST, and subsurface temperature anomalies (Jin & An, 1999), since the intensities of wind forcing and the thermocline effect (as represented by the subsurface temperature) are both important to SST evolution in the tropical Pacific (Gao & Zhang, 2017). Moreover, the same variables are produced as output predictands, achieving 3D multivariate modeling capability. In addition, the model takes 12 consecutive monthly fields as initial conditions, so that 12 months of information (a 1-year time interval) is contained in the input predictors, whereas dynamical models usually use only a single instant for initial conditions. The 3D-Geoformer also adopts a rolling manner in multivariate predictions similar to dynamical coupled models. Taking a prediction that starts in January 2021 as an example, nine fields during a 12-month time interval (January-December 2020) are used as input predictors; then, the model is used to make a one-month forward prediction, generating the same fields for January 2021 as predictands. Afterward, new 12-month predictor fields (from February 2020 to January 2021) are formed by combining the 11-month predictors during February-December 2020 and the 1-month predicted fields for January 2021, which are then fed into the model as new input predictors to generate the next prediction for February 2021. This procedure continues in the same manner as dynamical models do in their forward predictions.
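The rolling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: `rolling_forecast` and `damped_persistence` are hypothetical names, the array layout (month, field, lat, lon) is assumed, and the damped-persistence function merely stands in for the trained network.

```python
import numpy as np

def rolling_forecast(one_step_model, history, n_steps):
    """Advance a 12-month predictor window one month at a time.

    history: array (12, n_fields, n_lat, n_lon) of monthly anomaly fields.
    Returns an array (n_steps, n_fields, n_lat, n_lon) of predicted months.
    """
    window = history.copy()
    outputs = []
    for _ in range(n_steps):
        next_month = one_step_model(window)          # (n_fields, n_lat, n_lon)
        outputs.append(next_month)
        # Drop the oldest month and append the newly predicted one.
        window = np.concatenate([window[1:], next_month[None]], axis=0)
    return np.stack(outputs)

def damped_persistence(window, decay=0.9):
    """Toy stand-in for the trained model: damp the most recent month."""
    return decay * window[-1]

history = np.ones((12, 9, 4, 5))   # 12 months, nine fields, small toy grid
forecast = rolling_forecast(damped_persistence, history, n_steps=3)
```

After the first step the window already contains one predicted month, so later predictions depend increasingly on the model's own output, as in forward integration of a dynamical model.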
The training data used in 3D-Geoformer include simulation products from 23 Coupled Model Intercomparison Project Phase 6 (CMIP6) climate simulations during 1850-2014. As CMIP models are not all the same in terms of their abilities to simulate ENSO (e.g., Planton et al., 2021), we have not purposely selected models that did a good job in representing ENSO, as indicated in Table S1 in Supporting Information S1. The validation data used in the 3D-Geoformer are reanalysis datasets from Simple Ocean Data Assimilation (SODA) during 1871-1979 (Carton & Giese, 2008) and Ocean Reanalysis System 5 (ORAS5) during 1958-1979 (Zuo et al., 2018). In addition, the Global Ocean Data Assimilation System (GODAS) reanalysis data set (Behringer & Xue, 2004) is used to assess the multivariate ENSO prediction skill. This purely data-driven spatiotemporal attention-enhanced model demonstrates excellent performance in retrospective ENSO prediction (Figure S8 in Supporting Information S1).

The Evolution of the 2021 Second-Year Cooling Condition and Its Predictions
A La Niña condition emerged in late 2020 in the tropical Pacific, followed by second-year surface cooling in late 2021 (Figure 1; see also Supporting Information S1 for more details). When 3D-Geoformer is used to predict the 2020-2022 cooling conditions in the tropical Pacific, its real-time predictions perform very well. As an example, the predicted Niño 3.4 SST anomalies during 2020-2022 are presented in Figure 2 (also Figure S1 in Supporting Information S1); note that our model actually produces 3D temperature fields and we then calculate the Niño 3.4 SST index for this figure. It is clear that 3D-Geoformer predicts the double La Niña conditions in late 2020 and 2021 well. The model predictions followed the corresponding observations closely and were clearly better than the averaged prediction results collected by the IRI (Barnston et al., 2012) and the North American Multi-Model Ensemble (NMME, Kirtman et al., 2014) products; see details in Supporting Information S1.
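The Niño 3.4 index mentioned above is an area average of SST anomalies over 5°S-5°N, 170°-120°W, smoothed over 3 months. A minimal sketch of that calculation is given below; the function names, the 0-360°E longitude convention, the cosine-latitude weighting, the centered 3-month window, and the toy grid are assumptions made for illustration and are not taken from the authors' code.

```python
import numpy as np

def nino34_index(sst_anom, lat, lon):
    """Area-weighted mean SST anomaly over 5S-5N, 170W-120W.

    sst_anom: array (time, lat, lon); lon in degrees east (0-360).
    """
    lat_mask = (lat >= -5) & (lat <= 5)
    lon_mask = (lon >= 190) & (lon <= 240)       # 170W-120W in degrees east
    box = sst_anom[:, lat_mask][:, :, lon_mask]
    # Assumed weighting: grid-cell area is proportional to cos(latitude).
    weights = np.cos(np.deg2rad(lat[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return (box * weights).sum(axis=(1, 2)) / weights.sum()

def three_month_mean(series):
    """Running 3-month average; the output is two samples shorter."""
    return np.convolve(series, np.ones(3) / 3, mode="valid")

# Toy grid covering the model domain (92E-30W, 20S-20N) with a uniform -1 anomaly.
lat = np.arange(-20.0, 21.0, 5.0)
lon = np.arange(92.0, 331.0, 2.0)
sst_anom = np.full((5, lat.size, lon.size), -1.0)

monthly = nino34_index(sst_anom, lat, lon)
smoothed = three_month_mean(monthly)
```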
Here, we further focus on the 2021 second-year cooling condition when starting prediction from January 2021. The 3D-Geoformer can depict the 3D evolution of upper-ocean temperature anomalies extremely well (Figure 3). Then, the predicted 3D temperature evolution and winds in the tropical Pacific are analyzed to understand the processes affecting the second-year cooling during 2021. Compared with the corresponding observations (Figures S2 and S3 in Supporting Information S1), the evolving relationships between wind stress, SST, and subsurface temperature during the 2021 second-year cooling state are captured well by the 3D-Geoformer (Figure 3 and Figure S4 in Supporting Information S1). In particular, the model can predict the prolonged cooling conditions during 2020-2021 well at lead times of 6 months and longer, capturing the 2020 La Niña and the 2021 second-year La Niña conditions, with a turning point of SST anomaly evolution in mid-2021. As we trace the prediction of the 2021 SST evolution from the 3D-Geoformer simulation, it is found that subsurface temperature evolution is a crucial factor affecting SST conditions.
As coherent relationships among anomalies of SST, wind stress and upper-ocean temperature in 2021 are predicted well when starting in January 2021 using 3D-Geoformer, these fields can be used to illustrate processes leading to SST evolution in 2021. During the 2020 La Niña event, positive subsurface temperature anomalies prevailed in the western equatorial region, along with large negative subsurface temperature anomalies centered in the central-eastern equatorial Pacific. When the positive subsurface anomalies propagated eastward from the western Pacific, the negative subsurface temperature anomalies in the east gradually weakened. However, cold conditions returned in approximately July 2021, and the eastward extension of the positive subsurface anomalies was interrupted. Thereafter, the negative subsurface temperature anomalies intensified and expanded to the west, resulting in second-year surface cooling in late 2021. A few processes within the coupled ocean-atmosphere system tended to maintain the negative subsurface temperature anomalies in the eastern tropical Pacific; these are described in more detail in Supporting Information S1.

Sensitivity Experiments Conducted Using 3D-Geoformer for Interpretability
Based on the above arguments, further sensitivity experiments are conducted to illustrate why the purely data-driven 3D-Geoformer can predict the 2021 second-year cooling condition well. Note that the 3D-Geoformer is trained by adopting a self-attention-based architecture among the input predictors, including upper-ocean temperature and surface wind stress, with well-established multivariable relationships in parallel regardless of their spatial and temporal distances. From a physical point of view, the thermocline structure and variability in the tropical Pacific provide a physical basis for predicting ENSO evolution (Jin, 1997; Meinen & McPhaden, 2000; Wyrtki, 1985); the related effects are referred to as the thermocline feedback. Local processes in the central-eastern Pacific are also important to SST anomalies, including subsurface thermal influences from off-equatorial regions (Luo et al., 2009). As shown above, the negative subsurface temperature anomalies in the eastern tropical Pacific re-emerged in mid-2021, acting to cool the surface in the central-eastern tropical Pacific; moreover, the positive subsurface temperature anomalies, which can be traced from the western tropical Pacific and propagated along the equator, retreated and weakened. Thus, a second-year cooling condition occurred in 2021, with a turning point of SST anomaly evolution in mid-2021.
To mimic these processes and their relative effects on SST evolution, we conduct sensitivity experiments using 3D-Geoformer for its prediction during 2021. One type of experiment is designed to illustrate the effect of the reoccurrence of negative subsurface temperature anomalies in mid-2021 in the eastern tropical Pacific. Specifically, in the rolling prediction experiments performed starting from each month in 2021, the simulated subsurface temperature anomalies in the eastern tropical Pacific (150°-80°W, 20°S-20°N) are artificially removed from the input predictors from June to December 2021 in each prediction experiment (specified to be zero values); thus, the effect due to the negative subsurface temperature anomalies in the prediction experiments is not considered for the eastern tropical Pacific during 2021.
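The removal of eastern Pacific subsurface anomalies from the predictors can be sketched as a simple masking step. The sketch below is illustrative only: `remove_east_subsurface` is a hypothetical helper, the array layout (month, field, lat, lon) and the 0-360°E longitude convention are assumed, and which of the nine predictor fields count as "subsurface" is an assumption, since the text does not specify the channel indices.

```python
import numpy as np

def remove_east_subsurface(predictors, lon, subsurface_channels, month_mask):
    """Return a copy with selected anomalies set to zero east of 150W.

    predictors: array (month, field, lat, lon); lon in degrees east (0-360).
    subsurface_channels: field indices treated as subsurface temperature.
    month_mask: boolean array (month,) marking the months to modify.
    """
    out = predictors.copy()
    east = np.flatnonzero((lon >= 210) & (lon <= 280))   # 150W-80W
    months = np.flatnonzero(month_mask)
    all_lats = np.arange(out.shape[2])                   # full 20S-20N domain
    out[np.ix_(months, subsurface_channels, all_lats, east)] = 0.0
    return out

# Toy example: a Jan-Dec window, nine fields, coarse grid.
lon = np.array([150.0, 180.0, 210.0, 240.0, 270.0, 300.0])
predictors = np.ones((12, 9, 3, 6))
june_to_december = np.arange(12) >= 5
subsurface_channels = list(range(3, 9))   # assumed indices of deeper layers

experiment = remove_east_subsurface(
    predictors, lon, subsurface_channels, june_to_december
)
```

Working on a copy keeps the control predictors untouched, so the control and sensitivity predictions can be compared from the same starting fields.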
Figures 4a and 4b illustrate the effects on the SST predictions in 2021 when the negative subsurface temperature anomalies in the eastern tropical Pacific in mid-2021 are considered or not. As argued above, in mid-2021, the coupled ocean-atmosphere system in the tropical Pacific can evolve into warming or cooling conditions in late 2021, depending on the relative dominance of two competing processes: the remote warming effect associated with the positive subsurface temperature anomalies from the western basin and the local cooling effect associated with the negative subsurface temperature anomalies in the eastern tropical Pacific. If the negative subsurface temperature anomalies in the eastern tropical Pacific had not re-emerged in mid-2021, the remote warming effect from the west would have dominated, the turning point of SST anomaly evolution would not have occurred in mid-2021, and there would have been no second-year cooling condition in late 2021. As a result, surface warming would have persisted into late 2021, with weak SST anomalies (Figure 4b). On the other hand, if the effect of the negative subsurface temperature anomalies is dominant, a cold condition can re-emerge and prevail as observed. These experiments clearly illustrate that the reoccurrence of the negative subsurface temperature anomalies in the eastern tropical Pacific in mid-2021 is a determining factor for the reoccurrence and intensification of the second-year cooling condition in 2021, along with the turning point of SST anomaly evolution in mid-2021.

The Role Played by Intensity of Subsurface Thermal Effect on the Phase Transition and Amplitude of SST Anomalies During 2021
As analyzed above, ENSO predictions are sensitive to the way the subsurface temperature effect is depicted. Further sensitivity experiments are therefore conducted by changing the intensity of the subsurface thermal effect on SST prediction; here, a parameter, α_sub, is introduced to represent the subsurface temperature effect on SST. Specifically, in the rolling prediction experiments started from each month in 2021, the amplitude of the subsurface temperature anomaly (T_sub) produced by the 3D-Geoformer during 2021 is artificially reduced before it is used as an input predictor in each prediction experiment. For example, α_sub = 0.5 represents a reduction by half in the subsurface thermal effect on SST prediction, as shown in Figure 4c. Changes in the intensity of the subsurface temperature anomalies affect the phase transition and amplitude of SST anomalies. The second-year cooling conditions in 2021 would not occur if the intensity of the thermocline effect were represented below certain levels, and thus no turning point of SST anomaly evolution would be evident in mid-2021.

An Equally Important Effect on SST Prediction Associated With Wind Forcing During 2021
The wind field is another active element in the thermocline feedback loop, playing an important role in ENSO evolution. As observed and predicted by the 3D-Geoformer, an easterly wind anomaly persisted over the central-eastern regions during 2021, which acted to maintain the negative subsurface temperature anomaly pattern in the eastern equatorial Pacific. Thus, additional sensitivity experiments are conducted using the 3D-Geoformer by changing the intensity of wind forcing; the wind anomaly intensity is multiplied by a parameter, α_τ. Specifically, in the rolling prediction experiments started from each month in 2021, the zonal and meridional wind stress anomaly intensities used as input predictors are artificially adjusted by α_τ in each prediction experiment. For example, α_τ = 0.5 represents a case with the wind forcing effect reduced by half, used to examine the role played by the easterly wind forcing in the second-year cooling condition in 2021. Figure 4d shows the predicted Niño 3.4 SST anomalies when adopting α_τ = 0.5 relative to the control experiment.
The results indicate that whether SST evolves into a cooling condition in 2021 is sensitive to the magnitude of the easterly wind anomalies. The persistence of the easterly wind anomaly and its related effects on the ocean favored a cooling SST tendency in 2021.
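Both intensity experiments amount to multiplying selected predictor fields by a scaling factor before they are passed to the model. A minimal sketch follows, with the two factors named `alpha_tau` (wind stress) and `alpha_sub` (subsurface temperature) in the code. The channel ordering (two wind stress components followed by seven temperature layers) and the choice of which layers are scaled as subsurface are assumptions for illustration only.

```python
import numpy as np

WIND_CHANNELS = [0, 1]                    # assumed: zonal, meridional wind stress
SUBSURFACE_CHANNELS = list(range(3, 9))   # assumed indices of deeper layers

def scale_predictors(predictors, alpha_tau=1.0, alpha_sub=1.0):
    """Return a copy of (month, field, lat, lon) predictors with scaled anomalies."""
    out = predictors.astype(float, copy=True)
    out[:, WIND_CHANNELS] *= alpha_tau         # wind forcing intensity
    out[:, SUBSURFACE_CHANNELS] *= alpha_sub   # subsurface thermal intensity
    return out

control = np.ones((12, 9, 2, 3))
half_wind = scale_predictors(control, alpha_tau=0.5)
half_subsurface = scale_predictors(control, alpha_sub=0.5)
```

With both factors left at 1.0 the function reproduces the control predictors, so each experiment differs from the control in exactly one forcing.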

Comparisons With Other Dynamical Coupled Models
The evolution of double and triple La Niña conditions in the tropical Pacific has been of great interest because they occur in the global warming context. Numerous previous studies have been conducted to understand the processes responsible for the occurrences of multiyear La Niña events. Correspondingly, previous physics-based modeling studies using an intermediate coupled model (ICM) have shown the importance of local negative subsurface thermal anomalies to SST prediction in the 2021 prolonged La Niña condition (Zhang et al., 2022). It was demonstrated that the ICM can also predict the second-year sea surface cooling in late 2021 at least one year in advance, with a turning point of SST anomalies in mid-2021 (Figure 5). Note that this ICM adopts an empirical parameterization for the temperature of subsurface water entrained into the mixed layer (T_e) through its relationships with sea level, which explicitly represents the thermocline feedback (Zhang et al., 2003). To put our purely data-driven modeling into context, in this subsection, we also briefly present analyses performed using this ICM in terms of the second-year cooling condition in 2021. Sensitivity experiments are also conducted using the ICM to demonstrate how its SST predictions are directly affected by the intensities of subsurface thermal effects and wind forcing, which are represented by α_Te and α_τ, respectively. Figure S6 in Supporting Information S1 shows the predicted Niño 3.4 SST anomalies during 2020-2021 obtained by reducing the intensities of subsurface thermal effects and wind forcing, respectively. The ICM can predict the second-year cooling conditions during 2021 well, with a turning point of SST anomaly sign in mid-2021 (Figure S6a in Supporting Information S1).
As indicated by alternately taking α_Te = 0.5 and α_τ = 0.5 in the sensitivity experiments, if the wind anomaly intensity is represented below certain levels or the intensity of the subsurface thermal effect is not strong enough, the second-year cooling in 2021 cannot be predicted by the ICM (Figures S6b and S6c in Supporting Information S1). In other words, to correctly predict the observed second-year cooling conditions in the fall of 2021, the ICM needs to adequately represent the intensities of both the wind forcing and the subsurface thermal effect.
As demonstrated here, the ICM and the purely data-driven deep learning-based 3D-Geoformer are constructed completely differently, but both successfully predict the SST evolution for the 2021 second-year cooling conditions. Sensitivity experiments that change the surface wind and subsurface temperature effects illustrate the roles of these effects in ENSO predictions. Note that one can independently change predictors (wind stress and temperature) as inputs in the deep learning-based statistical model, but doing this in the context of a dynamical model (e.g., the ICM) may introduce inconsistency between the effects induced by wind stress and temperature perturbations. Specifically, the zonal wind stress along the equator at low frequencies is roughly in a dynamical balance with the depth-integrated pressure gradient (Yu & McPhaden, 1999). Hence, if one independently changes the wind stress or temperature in the eastern equatorial Pacific, the coupled system will adjust to bring them into a new balance. Since such an adjustment is very quick on the equator (Jin, 1997), the outcomes from the ICM-based sensitivity experiments do not change. Another major factor affecting ENSO predictions is model systematic errors that reduce the ability of coupled models to properly represent ENSO feedback processes, such as the thermocline feedback; here, two parameters, α_τ and α_sub, are introduced to represent wind and temperature effects, which are used to conduct sensitivity experiments (Gao & Zhang, 2017). Taken together, the results independently obtained from completely different models indicate that it is necessary to adequately represent the thermocline feedback in predictive models, either dynamical coupled models or purely data-driven models, so that ENSO predictions can be improved effectively.

Conclusion and Discussion
Great challenges still exist in making accurate real-time predictions of SST evolution during the 2020-2022 prolonged La Niña conditions. In this paper, a transformer-based deep learning model, named 3D-Geoformer, developed by adopting a novel self-attention-enhanced neural network, is used to make real-time SST predictions during 2020-2021, with a focus on three-dimensional oceanic processes that are responsible for the second-year surface cooling condition in 2021, which followed the 2020 La Niña event. By taking long time-interval information (i.e., 12-month 3D temperature fields) as predictor inputs, the 3D-Geoformer can adequately represent the related processes during 2020-2022 and thus make successful predictions, which can be compared with those obtained from other dynamical coupled models.
As analyzed from observation and dynamical models, the 3D-Geoformer achieved a good prediction for the evolution of upper ocean temperature anomalies during 2020-2021, particularly the second-year cooling condition in 2021 that can be predicted when starting from January 2021 as an initial condition, with a turning point of SST anomaly evolution in mid-2021. It is a clear demonstration of the potential for purely data-driven deep learning models to represent physical processes and predict ENSO. The applications of the 3D-Geoformer presented in this paper for process representations and understanding also illustrate an innovative method for ENSO studies.

Data Availability Statement
The data for this study have been deposited in the Marine Science Data Center, Chinese Academy of Sciences. The data can be found at http://english.casodc.com/data/metadata-special-detail?id=1659173969036693506.