Long‐Lead Drought Forecasting Across the Continental United States Using Burg Entropy Spectral Analysis With a Multiresolution Approach

Amid climate change, extreme droughts have occurred more frequently, accompanied by increasing socio‐economic‐environmental losses. To prepare for upcoming droughts, proactive drought management planning based on drought forecasting has been highlighted. This study, therefore, developed a long‐term drought forecasting model using Burg entropy spectral analysis with the maximal overlap discrete wavelet transform across the continental United States (CONUS). The model forecasted the monthly self‐calibrated Palmer Drought Severity Index (scPDSI) up to median lead times of 12 months with root mean square error less than 0.8 and the modified Nash‐Sutcliffe efficiency coefficient greater than 0.6 across CONUS. The lead time with acceptable accuracy was affected by how the model parameterized the low‐frequency structure of scPDSI, which was significantly correlated with sea surface temperature climate modes. The correlation between low‐frequency signals and Niño‐3.4 tended to be weaker for regions located at higher elevations. Thus, higher variability in the spectral structures of low‐frequency signals at higher‐elevation regions led to shorter memory/persistence of the time series, which shortened the forecast lead time. In addition, higher variability in precipitation, linked to the variability of scPDSI, also shortened the forecast lead time. To appraise the applicability of the suggested forecast model to drought management planning, drought characteristics (i.e., severity, duration, and area) were obtained by the three‐dimensional drought clustering algorithm for the forecasted and observed scPDSI. Then, risk was analyzed for drought characteristics using the vine copula. Risk analysis for drought characteristics forecasted up to a 12‐month lead time showed the potential applicability of the model to drought mitigation plans.

led to more extended lead times with statistically significant accuracy: some exceptional studies showed lead times up to 12 months (e.g., Özger et al., 2012), but most studies showed lead times of roughly 6 months (e.g., Altunkaynak & Jalilzadnezamabad, 2021; Kim & Valdés, 2003; Özger et al., 2020). ML models learn nonlinear patterns directly from "big data" and overcome the simplified mathematical surrogates that humans derive for complex phenomena (Rackauckas et al., 2020). However, without predefined structures governing a problem, ML models require extensive homogeneous training data and might lose their reliability under changing conditions (e.g., climate change), especially since "big data which can address dynamics" is not always available (Nearing et al., 2021; Rackauckas et al., 2020). Furthermore, although the interpretability of deep learning (DL) has improved with sensitivity analysis and with integrated-gradients and expected-gradients methods that can facilitate the understanding of feature importance in a learning process (Jiang et al., 2020; Kratzert et al., 2019; Nearing et al., 2021; Samek et al., 2019; Sundararajan et al., 2017), those interpretability methods have their own assumptions and/or limitations (Höge et al., 2022; Sundararajan et al., 2017). Therefore, "theory-guided data science (TGDS)" and generalizable models with scientific consistency and interpretability are needed (Karpatne et al., 2017; Nearing et al., 2021).
Although cause-and-effect statistical models (i.e., bivariate or multivariate analysis) are likely to outperform univariate time series models in long-term forecasting, the forecasting skills of multivariate models are sensitive to the selection of predictors. Therefore, weighing the practicality of long lead-time drought forecasting at the continental scale, this study applied a univariate time series model with Burg entropy theory. Univariate time series models, represented by the autoregressive integrated moving average (ARIMA) model and its variants, have been widely applied for hydrological forecasting (e.g., Fernández et al., 2009; Mishra & Desai, 2005; Moghimi et al., 2020). Although ARIMA may simulate short-memory processes, fractional ARIMA, the fractional Gaussian noise generator (FGNG), which is theoretically the sum of infinite terms of autoregressive (AR) models of order 1, and the generalized autocovariance function incorporating ARMA and FGNG can capture any second-order stochastic structure in a wide range of frequency bands and successfully simulate highly nonlinear and long-range dependence processes (Beran, 1992; Koutsoyiannis, 2000; Mandelbrot, 1971). So, the application of univariate time series models is not limited to systems with exponentially decaying autocorrelation (i.e., short-memory processes). As another example, the asymmetric moving average approach successfully simulated the irreversible streamflow process while accurately estimating the second-order structures and higher-order statistics of time series (Koutsoyiannis, 2020).
Besides, a univariate time series analysis at different frequency-amplitude modulation regimes helps quantify a low-frequency structure with a long memory, which is important for long-lead-time forecasting and can be addressed by a simple time series model (i.e., the AR model; Kwon et al., 2007; Rajagopalan et al., 1998). Time series analysis at multiple frequency bands is also a physically reasonable approach when briefly looking at drought causative mechanisms (Han & Singh, 2020): sea surface temperature (SST) climate modes at large-scale frequencies (i.e., interannual to several decades), relatively short periodic modulators represented by internal atmospheric variabilities (e.g., Rossby waves, atmospheric rivers, and atmospheric blocking), and initial hydrologic conditions (IHCs) affecting the onset and evolution of drought.
Since univariate time series models utilize their temporal dependencies and/or periodicities in forecasting, harnessing the spectral structure with accurate peak locations in each decomposed frequency domain is essential for accurate and reliable drought forecasting (Han & Singh, 2020). The second-order moments for stochastic time series modeling can be estimated by several statistical approaches, such as the autocovariance function, power spectrum, climacogram, and entropy spectral analysis (Dimitriadis & Koutsoyiannis, 2015; Koutsoyiannis, 2010). Since the accuracy and reliability of autocovariance decrease as the lag order becomes higher due to the limited sample size, the bias in the estimated autocovariance may reduce the accuracy of the power spectrum (Dimitriadis & Koutsoyiannis, 2015). However, the climacogram, which is defined as the variance of the averaged process against the averaged time scale, can address the long-term change in persistence and clustering through the Hurst coefficient. It has yielded accurate estimation of autocovariance and power spectrum for time series modeling (Dimitriadis & Koutsoyiannis, 2015). Entropy spectral analysis also can accurately estimate the spectral and autocorrelation structures of time series, without assuming zeros for the unknown parts of autocorrelation, and determines the spectral power/density which maximizes entropy among spectral signatures satisfying predefined constraints. Besides, entropy spectral analysis has been successfully applied for long-term streamflow forecasting, and its easy connection to the AR process could facilitate time series modeling (e.g., ARMA models; Krstanovic & Singh, 1993a, 1993b).

10.1029/2022WR034188
However, entropy spectral analysis does not seem to have been applied for drought forecasting. Although regional differences exist, drought has more irregular cyclic/seasonal patterns than streamflow, besides the nonhomogeneity of the stochastic structure embedded in it (Dracup et al., 1980; T. Zhang et al., 2021). Therefore, the objective of the study was to develop a drought forecasting model with a long lead time (longer than half a year) for the continental United States (CONUS) by applying Burg entropy spectral analysis (BESA) to multiple frequency domains decomposed from the drought time series. Thus, the resulting drought forecast is a superposition of the forecasts of each decomposed signal.
The maximal overlap discrete wavelet transform (MODWT; Percival & Walden, 2000) was used in decomposing the drought time series. Compared to discrete wavelet transform (DWT), MODWT is considered a better method for multi-frequency analysis, since it avoids the downsampling issues and the shifting of decomposed signals (Cornish et al., 2006;Percival & Walden, 2000).
The proposed drought forecasting model was validated using the self-calibrated Palmer Drought Severity Index (scPDSI) that seamlessly covers CONUS with 0.5° spatial resolution for the observation period 1901-2020. However, BESA with MODWT assumes that grids are independent and does not use the spatial dependence structure in the spatial domain of scPDSI. Since the fundamental purpose of drought forecasting is to set up a proactive drought mitigation plan, drought characteristics of forecasted time series were detected by the three-dimensional (3D) drought clustering algorithm (Andreadis et al., 2005; Han & Singh, 2021), and their drought frequency and risk analyses were done through the vine copula (L. Zhang & Singh, 2019) to assess the applicability of the proposed drought forecasting model to practical water resources management schemes.

Data
scPDSI, distributed by the Climatic Research Unit (CRU), was used to validate the Burg entropy-based drought forecasting model across CONUS. scPDSI was proposed by Wells et al. (2004) to facilitate regional comparison of PDSI by dynamically adopting regional climate data and calibrating spatially varying climatological norms for quantifying drought. Besides, scPDSI distributed by CRU used the Penman-Monteith parameterization to compute the potential evapotranspiration (PET) for drought quantification instead of using Thornthwaite's method (van der Schrier et al., 2013). Therefore, the other hydrological variables (e.g., net radiation flux, soil heat flux density, wind speed, latent heat of vaporization, and the rate of water vapor transfer from the ground) incorporated in calculating PET allowed scPDSI to better simulate the effects of temperature-driven water and heat flux transfers. Thus, scPDSI may provide a solid reference when evaluating the forecasting skills of the proposed model across CONUS. scPDSI covering global lands with a spatial resolution of 0.5° has a data period of 1901-2020 at a 1-month time scale (Barichivich et al., 2021; van der Schrier et al., 2013). However, we used scPDSI spatially confined within CONUS for the period of 1901-2020 (1,440 months long).
Besides scPDSI, SST climate indices (Niño-3.4, Pacific Decadal Oscillation (PDO), and Atlantic Multidecadal Oscillation (AMO)) and elevation data were used for evaluating the performance of the proposed drought forecast model, since SST teleconnections and local geophysics can modulate hydroclimate in inland areas (Han & Singh, 2020; Hidalgo & Dracup, 2003). Monthly SST climate indices covering the same period as scPDSI were collected from the NOAA Physical Science Laboratory (PSL). For a detailed discussion on Niño-3.4, PDO, and AMO, reference is made to Trenberth (1997), Mantua and Hare (2002), and Trenberth and Shea (2006), respectively. This study also used the Global Multi-Resolution Terrain Elevation Data 2010 (GMTED2010) of 30″ spatial resolution as elevation data. GMTED2010 covers the globe, but we extracted the area of CONUS and spatially interpolated the raw data on 0.5° × 0.5° grids using inverse distance weighted interpolation (Ahrens, 2006). In addition, for the forecast performance evaluation and implementation of long short-term memory (LSTM) neural networks, NOAA nClimGrid monthly precipitation and temperature, which cover CONUS and Alaska with 5 km grid resolution, were used (Vose et al., 2014). However, precipitation and temperature confined in CONUS were spatially interpolated on the 0.5° × 0.5° grids of scPDSI using the ordinary Kriging method based on the semivariogram, the spatial structure of the observed field (Chiles & Delfiner, 2012).

Water Resources Research
HAN AND SINGH

Maximal Overlap Discrete Wavelet Transform
Frequency is a physical attribute of a time series, and any time series can be expressed, via the spectral representation underlying Parseval's theorem, as the weighted sum of simple oscillatory components, $x_t = \int_{-0.5}^{0.5} e^{i2\pi f t}\, dZ(f)$, where $i$, $f$, $t$, and $|dZ(f)|^2$ represent, respectively, the imaginary unit, frequency, time instance, and a power or distributed variance corresponding to $f$ (Kumar & Foufoula-Georgiou, 1997). Since drought time series (scPDSI) can be modeled as nonstationary and composed of multiple frequencies, frequency-time local analysis or multiresolution analysis is required to understand the embedded physical attributes (Kumar & Foufoula-Georgiou, 1997; Sang, 2013). We applied the MODWT multiresolution analysis to scPDSI to obtain the decomposed signals within multiple frequency bands. MODWT, which applies the DWT filters without downsampling, produces zero-phase shifts in wavelet and scaling coefficients so that the decomposed signals are exactly aligned with the original signal (Percival & Walden, 2000).
The basic components of MODWT are wavelet and scaling filters that are represented by a mother wavelet. For more than two decades, MODWT has been used in conjunction with the Haar wavelet (Percival & Guttorp, 1994; Percival & Mofjeld, 1997). Instead of the Haar wavelet, we employed the Daubechies wavelet with a filter of width 4 (DW4) due to its steeper increase (decrease) in the squared gain function of the wavelet (scaling) filter: this physically means a sharper separation between low and high frequencies (Percival & Walden, 2000). The assumed MODWT produces sub-signals at different scales ($j = 1, \cdots, J$); the wavelet filter ($\tilde{h}_j$) performs band-pass filtering with a pass-band of $[1/(2^{j+1}\Delta t),\, 1/(2^{j}\Delta t)]$ and the scaling filter ($\tilde{g}_j$) performs low-pass filtering with a pass-band of $[0,\, 1/(2^{j+1}\Delta t)]$ at the $j$th scale, where $\Delta t$ is 1 month in this study. Thus, the decomposition of the original time series up to the $J$th scale is a cascading filtering process (Figure 1a; Percival & Walden, 2000; Seo et al., 2017): after the filtering process at the $j$th scale, a high-frequency wavelet coefficient ($\tilde{W}_j$) and detail signal ($D_j$), and a low-frequency scaling coefficient ($\tilde{V}_j$) and smooth signal ($S_j$) are produced. Whereas $D_j$ is retained to represent details at the $j$th scale, the scaling coefficient $\tilde{V}_j$ is passed down as the input signal for the filtering process at the $(j+1)$th scale (Figure 1a). Repeating the filtering process up to the $J$th scale yields $D_j$ at each scale and $S_J$ at the last scale (Figure 1a). Therefore, the original time series ($x_t$) can be recovered from its MODWT as $x_t = \sum_{j=1}^{J} D_{j,t} + S_{J,t}$ (Percival & Walden, 2000; Zhu et al., 2014). The mathematical procedure to implement MODWT decomposition is summarized in Text S1 in Supporting Information S1. For a more detailed discussion on DWT and MODWT, reference is made to Percival and Walden (2000).
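To make the cascading filtering concrete, the pyramid algorithm can be sketched in Python with the DW4 filter and circular boundary conditions. This is a minimal illustration, not the authors' implementation; the function name `modwt` is ours.

```python
import numpy as np

# Daubechies-4 scaling (low-pass) filter; the wavelet filter follows from the
# quadrature-mirror relation h_l = (-1)^l * g_{L-1-l} (Percival & Walden, 2000).
SQ3 = np.sqrt(3.0)
g = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4.0 * np.sqrt(2.0))
h = np.array([(-1) ** l * g[::-1][l] for l in range(4)])
# MODWT filters are the DWT filters rescaled by 1/sqrt(2)
g_m, h_m = g / np.sqrt(2.0), h / np.sqrt(2.0)

def modwt(x, J):
    """MODWT pyramid algorithm with circular (periodic) boundary conditions.

    Returns the wavelet coefficients W[0..J-1] (one array per scale) and the
    level-J scaling coefficients V.
    """
    V = np.asarray(x, dtype=float)
    N = len(V)
    W_all = []
    for j in range(1, J + 1):
        Wj, Vj = np.zeros(N), np.zeros(N)
        step = 2 ** (j - 1)  # filters are upsampled by inserting zeros
        for t in range(N):
            for l in range(4):
                k = (t - step * l) % N
                Wj[t] += h_m[l] * V[k]  # band-pass output at scale j
                Vj[t] += g_m[l] * V[k]  # low-pass output at scale j
        W_all.append(Wj)
        V = Vj  # scaling coefficients cascade to the next scale
    return W_all, V
```

A useful sanity check is energy preservation, $\|X\|^2 = \sum_j \|\tilde{W}_j\|^2 + \|\tilde{V}_J\|^2$, which underlies the variance decomposition discussed later.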
We decomposed scPDSI up to five scales ($J = 5$), as shown in Figure 1b (an example arbitrarily taken near central Texas); since the smooth signal at the last scale ($S_5$) had an extremely tight frequency band (see Figure 2), further decomposition could not have provided additional information. MODWT was applied to scPDSI at each grid, and then BESA was applied to all $D_j$ and $S_5$ to forecast them. Then, the forecasted details and smooth signals were superposed to obtain the resulting forecast of scPDSI at each grid.

Burg Entropy Spectral Analysis
Spectral density implies the distribution of the variance of a time series in the frequency domain such that it signifies the embedded periodicity. Linear prediction models (i.e., AR) can also express the spectral density using their prediction coefficients (Burg, 1975; Shumway & Stoffer, 2011). Based on spectral density as a key linkage component, entropy theory combines spectral analysis and time series analysis for forecasting. There are two pillars of entropy theory (Jaynes, 1957a, 1957b; Kullback, 1997; Singh, 2013, 2014): (a) the principle of maximum entropy (POME) and (b) the principle of minimum cross entropy (POMCE). Whereas POME finds the spectral density with the highest entropy among spectral densities that satisfy the same statistical criteria, POMCE finds the spectral density as POME does but as close to the given prior spectral density as possible. This study used BESA, which employs POME, for drought forecasting, since it is mathematically simple but has shown promising long-term streamflow forecasts (Burg, 1967, 1975; Krstanovic & Singh, 1993a, 1993b). Forecasting time series using BESA involves the determination of entropy and constraints, the estimation of the Lagrange multipliers, and the extension of the autocorrelation function and time series (Krstanovic & Singh, 1993a, 1993b). Entropy theories, including BESA, estimate spectral density using incomplete autocorrelation without zeros padded for unknown lags. Their general objective, thus, is to extend autocorrelation to infinite lags in a manner that maximizes (minimizes) the entropy for POME (POMCE) so that the least biased spectral density is estimated. Therefore, before discussing Burg entropy theory further, we define the autocorrelation $\rho(k)$ of time series $x_t$ with sample size $N$ as

$\rho(k) = \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \Big/ \sum_{t=1}^{N} (x_t - \bar{x})^2$

where $k = 1, \cdots, m$, since $\rho(k)$ is the primary statistic used to define constraints for maximizing the Burg entropy (Krstanovic & Singh, 1993a, 1993b).
Krstanovic and Singh (1993a, 1993b) suggested calculating $\rho(k)$ up to lags of $N/2$ for periodic time series or lags of $N/4$ for nonperiodic time series to obtain accurate autocorrelation. Since the decomposed signals of scPDSI do not have significant periodicities within their designated frequency bands, $\rho(k)$ was calculated up to lags of $N/4 = m$. Hereafter, decomposed signal(s) of scPDSI is(are) referred to as signal(s).
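The sample autocorrelation defined above can be computed as follows (a minimal sketch, not the authors' code):

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation rho(k) for k = 0..max_lag:
    rho(k) = sum_{t=1}^{N-k} (x_t - xbar)(x_{t+k} - xbar)
             / sum_{t=1}^{N} (x_t - xbar)^2."""
    x = np.asarray(x, dtype=float)
    xm = x - x.mean()
    denom = np.dot(xm, xm)  # total (unnormalized) variance
    return np.array([np.dot(xm[: len(x) - k], xm[k:]) / denom
                     for k in range(max_lag + 1)])
```

For a nonperiodic signal of length `N`, `autocorr(x, N // 4)` gives the lags used here.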

Derivation of Entropy Spectral Density
The second-order statistics of time series are stored in the spectral density. When the spectral density is normalized and denoted as $p(f)$, $p(f)$ performs as the probability density used in entropy theory (Burg, 1975; Shannon, 1948; Shore, 1981; Tzannes et al., 1985). Burg entropy is defined as the integral of the log of the normalized spectral density as

$H = \int_{-W}^{W} \ln p(f)\, df \quad \text{(Equation 1)}$

where $f$ is the frequency and $W$ is the Nyquist fold-over frequency, which is equivalent to $1/(2\Delta t)$, where $\Delta t$ is the sampling interval (herein $\Delta t$ = 1 month).
From a stochastic viewpoint, there may be a number of spectral densities that satisfy specified constraints. However, finite observations/information may not fully represent a system in terms of ergodicity. Therefore, accommodating as much uncertainty as possible while satisfying given conditions can formulate a stochastic process with the least bias. Before maximizing the defined Burg entropy (Equation 1) using Lagrange multipliers, constraints need to be defined as

$\int_{-W}^{W} p(f)\, e^{i2\pi f k \Delta t}\, df = \rho(k), \quad k = 0, 1, \cdots, m \quad \text{(Equation 2)}$

where $i$ is the imaginary unit and $\rho(k)$ is the autocorrelation function at the $k$th lag. When $k$ is 0, Equation 2 yields the total power of 1. Defining constraints through the Wiener-Khinchin (WK) theorem (i.e., Equation 2) enables the entropy theory to link the time domain characteristics to the frequency domain characteristics and provides an upper bound for maximizing the entropy (Burg, 1975).
Based on the constraints, the defined Burg entropy is maximized for deriving the least-biased spectral density using the Lagrangian function as

$L = \int_{-W}^{W} \ln p(f)\, df - \sum_{k=-m}^{m} \lambda_k \left[ \int_{-W}^{W} p(f)\, e^{i2\pi f k \Delta t}\, df - \rho(k) \right] \quad \text{(Equation 3)}$

where $\lambda_k$ are the unknown Lagrange multipliers.
The least-biased spectral density is obtained for Burg entropy by taking the partial derivative of $L$ with respect to $p(f)$ and equating the derivative to zero, $\partial L / \partial p(f) = 0$, such that $H$ is maximized. The spectral density so obtained can be expressed as

$p(f) = \left[ \sum_{k=-m}^{m} \lambda_k\, e^{i2\pi f k \Delta t} \right]^{-1} \quad \text{(Equation 4)}$

The entropy spectral density obtained by maximizing the Burg entropy has a linear prediction filter form (Burg, 1967, 1975). Therefore, the problem at hand now is to estimate $\lambda_k$ and then relate them to the prediction coefficients of the AR process. The process for estimating the Lagrange multipliers is discussed in the following subsection.

Estimation of Parameters
The Burg entropy spectral density, Equation 4, is expressed in terms of the Lagrange multipliers $\lambda_k$. Besides the estimation of the least biased spectral density, the essential objective of using BESA is to forecast drought. Therefore, bridging to the prediction coefficients of linear prediction models (herein the AR process) is important. To achieve the objective, the Levinson-Burg algorithm was used for BESA in estimating $\lambda_k$.

Figure 2. Spectral density estimated for signals ($D_1$ to $S_5$) at the arbitrarily selected grid from the state of Texas (red star in Figure 1). Gray dashed lines and red solid lines represent spectral densities estimated by Fast Fourier Transform (FFT) and Burg entropy spectral analysis (BESA), respectively.
The Levinson-Burg algorithm improves on the Durbin-Levinson algorithm, which minimizes the mean squared forward prediction error, by minimizing the mean of the forward and backward prediction errors (Burg, 1967, 1975). For a detailed discussion on the Durbin-Levinson algorithm, reference is made to Shumway and Stoffer (2011). As a first step to implement the Levinson-Burg algorithm, $p(f)$ of Equation 2 is replaced with Equation 4 as

$\int_{-W}^{W} \left[ \sum_{j=-m}^{m} \lambda_j\, e^{i2\pi f j \Delta t} \right]^{-1} e^{i2\pi f k \Delta t}\, df = \rho(k) \quad \text{(Equation 5)}$

Equation 5 relates the observed autocorrelation and the Lagrange multipliers. Then, going through the sequential mathematical deductions leads to the relationship between $\lambda_k$ and the prediction coefficients of the AR process as Equation 6 and yields a linear prediction error filter in the form of the Yule-Walker equation as Equation 7 (see Appendix A for the mathematical deduction).
Therefore, the prediction coefficients ($a_m(j)$) can be calculated recursively up to model order $M$ (i.e., $m = 1, 2, \cdots, M$) by the Levinson-Burg algorithm as follows:
1. Calculate the reflection coefficient at the $m$th order from the $(m-1)$th-order errors: $k_m = -2 \sum_t f_{m-1}(t)\, b_{m-1}(t-1) \big/ \sum_t \left[ f_{m-1}^2(t) + b_{m-1}^2(t-1) \right]$
2. Calculate the forward error at the $m$th order: $f_m(t) = f_{m-1}(t) + k_m\, b_{m-1}(t-1)$
3. Calculate the backward error at the $m$th order: $b_m(t) = b_{m-1}(t-1) + k_m\, f_{m-1}(t)$
4. Calculate the prediction coefficients at the $m$th order: $a_m(j) = a_{m-1}(j) + k_m\, a_{m-1}(m-j)$ for $j = 1, \cdots, m$, with $a_m(m) = k_m$
where $a_0(0) = 1$ and $f_0(t) = b_0(t) = x(t)$; $x(t)$ is a signal. Model order $M$ is defined by the Bayesian Information Criterion (BIC; Schwarz, 1978). The mathematical formula of BIC can be found in Text S2 in Supporting Information S1. The estimated prediction coefficients were used for forecasting time series and calculating the Burg entropy spectral density in the relationship with $\lambda_k$.
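The recursion above amounts to the classical Burg method for AR parameter estimation. A minimal Python sketch follows (the function name `burg_ar` is ours; this is a generic implementation, not the authors' code):

```python
import numpy as np

def burg_ar(x, order):
    """Estimate the AR prediction-error filter a (a[0] = 1) and innovation
    power E by the Levinson-Burg recursion on forward/backward errors."""
    x = np.asarray(x, dtype=float)
    f, b = x.copy(), x.copy()      # forward / backward prediction errors
    a = np.array([1.0])            # prediction-error filter, order 0
    E = np.dot(x, x) / len(x)      # zero-lag power
    for m in range(order):
        ef = f[m + 1:].copy()      # f_m(t),   t = m+1 .. N-1
        eb = b[m:-1].copy()        # b_m(t-1), t = m+1 .. N-1
        # reflection coefficient minimizing forward + backward error power
        k = -2.0 * np.dot(ef, eb) / (np.dot(ef, ef) + np.dot(eb, eb))
        # Levinson update of the prediction-error filter
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        # update the error sequences for the next order
        f[m + 1:] = ef + k * eb
        b[m + 1:] = eb + k * ef
        E *= 1.0 - k * k
    return a, E
```

With this sign convention, an AR process $x_t = \sum_j \phi_j x_{t-j} + \varepsilon_t$ yields $a(j) \approx -\phi_j$; model order selection (e.g., by BIC) would wrap this routine in a loop over candidate orders.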

Extension of Autocorrelation and Forecasting scPDSI
Since the Lagrange multipliers for BESA, derived in the previous subsection, were associated with the prediction coefficients of the AR process, the estimation of parameters up to model order $M$ requires the extension of the autocorrelation function (Equations 7-11). Therefore, the autocorrelation is extended by maximizing the Burg entropy in the form of

$\rho(k) = -\sum_{j=1}^{M} a_M(j)\, \rho(k-j), \quad k > M$

For the mean squared error of the forecasted time series to be minimum, the prediction coefficients for time series forecasting should be the same as those used in the extension of the autocorrelation function (Cui, 2015). Therefore, when using BESA, the time series is forecasted using a linear combination weighted by the prediction coefficients as

$\hat{x}(t) = -\sum_{j=1}^{M} a_M(j)\, x(t-j)$

The sequential forecasting procedure discussed for BESA was applied to each signal, and the resulting forecast of scPDSI was obtained by aggregating the forecasted signals. The forecasting process was repeated at every grid across CONUS.
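The extension and forecasting steps can be sketched as follows, assuming a prediction-error filter `a` (with `a[0] = 1`) such as one produced by the Levinson-Burg recursion. Function names are ours, and this is a generic illustration rather than the authors' code; multi-step forecasts are obtained recursively, with each forecast re-entering as an input.

```python
import numpy as np

def extend_autocorr(rho, a, extra):
    """Extend the autocorrelation beyond the known lags with the recursion
    rho(k) = -sum_{j=1}^{M} a[j] * rho(k-j) for k > M."""
    a = np.asarray(a, dtype=float)
    r = list(rho)
    for _ in range(extra):
        # r[-1:-len(a):-1] is [rho(k-1), ..., rho(k-M)]
        r.append(-np.dot(a[1:], r[-1:-len(a):-1]))
    return np.array(r)

def ar_forecast(history, a, steps):
    """Recursive multi-step forecast:
    xhat(t) = -sum_{j=1}^{M} a[j] * x(t-j); forecasts re-enter as inputs."""
    a = np.asarray(a, dtype=float)
    m = len(a) - 1
    buf = list(history)
    out = []
    for _ in range(steps):
        xhat = -np.dot(a[1:], buf[-1:-m - 1:-1])  # [x(t-1), ..., x(t-m)]
        buf.append(xhat)
        out.append(xhat)
    return np.array(out)
```

In the setting of this paper, `ar_forecast` would be applied to each decomposed signal and the results superposed to form the scPDSI forecast at each grid.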

Entropy Spectral Analysis
The first step for entropy-based forecasting is the estimation of spectral density. Figure 2 shows the spectral densities estimated by BESA with the reference spectral densities obtained by the Fast Fourier Transform (FFT), as an example taken at the arbitrarily selected location near central Texas (red star in Figure 1c). More noise-like spectral structures were embedded as the signal went through a higher-frequency band-pass filter (Figure 2). To clarify, $D_1$ ($S_5$) had the highest (lowest) frequency band, or the lowest (highest) scale level. The spectral density estimated by BESA could detect the exact location of the highest peak with a high resolution as the scale of decomposition increased (i.e., $j$ increases from 1 to 5) due to the simpler spectral structures embedded therein (Figure 2). Since different regions may have different spectral structures for each signal, we also show the estimated spectral densities for the decomposed signals of scPDSI arbitrarily selected from other states in Figures S1-S3 of Supporting Information S1 for more examples: California (Figure S1 in Supporting Information S1; blue star in Figure 1c), Colorado (Figure S2 in Supporting Information S1; yellow star in Figure 1c), and Illinois (Figure S3 in Supporting Information S1; black star in Figure 1c). Overall, as more peaks existed in signals (i.e., $D_1$ and $D_2$), BESA was likely to lose accuracy in estimating the primary and/or secondary peaks and detecting their locations. The degrading estimation skill of BESA for multi-peak spectral cases has been reported by Cui and Singh (2015) and Burg (1967).
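Given a prediction-error filter and its innovation power, the entropy (maximum-entropy/AR-form) spectral density can be evaluated on a frequency grid. The sketch below is ours, not the authors' code; it illustrates peak detection for a single-peak AR(2) spectrum, where the peak frequency is recovered almost exactly.

```python
import numpy as np

def burg_psd(a, E, freqs, dt=1.0):
    """Maximum-entropy (AR-form) spectral density from the prediction-error
    filter a (a[0] = 1) and innovation power E:
    P(f) = E * dt / |sum_j a[j] * exp(-i 2 pi f j dt)|^2."""
    a = np.asarray(a, dtype=float)
    lags = np.arange(len(a))
    # complex exponentials evaluated on the (freq x lag) grid
    z = np.exp(-1j * 2.0 * np.pi * np.outer(np.asarray(freqs, float), lags) * dt)
    A = z @ a  # prediction-error filter transfer function
    return E * dt / np.abs(A) ** 2

# Hypothetical single-peak example: AR(2) with a spectral resonance near f0
r, f0 = 0.95, 0.1
a = np.array([1.0, -2.0 * r * np.cos(2.0 * np.pi * f0), r * r])
freqs = np.linspace(0.0, 0.5, 2001)
psd = burg_psd(a, 1.0, freqs)
```

For multi-peak spectra (as in the high-frequency signals $D_1$ and $D_2$), a low-order AR form like this one can misplace or merge peaks, which is consistent with the degradation noted above.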
However, the variance of scPDSI, $\hat{\sigma}_X^2$, can be expressed by the wavelet coefficients ($\tilde{W}_j$) and scaling coefficients ($\tilde{V}_J$) of MODWT as

$\hat{\sigma}_X^2 = \frac{1}{N} \sum_{j=1}^{J} \|\tilde{W}_j\|^2 + \frac{1}{N} \|\tilde{V}_J\|^2 - \bar{X}^2$

Then, the percentage of variance explained (PVE) by each signal can be calculated as $\|\tilde{W}_j\|^2 / (N \hat{\sigma}_X^2) \times 100$ for the detail signals and $\|\tilde{V}_5\|^2 / (N \hat{\sigma}_X^2) \times 100$ for the smooth signal, so that the variance of scPDSI was explained as: 2.5% by $D_1$, 3.6% by $D_2$, 7.4% by $D_3$, 16.2% by $D_4$, 25.1% by $D_5$, and 45.2% by $S_5$. Since a higher-scale signal explained more variance of scPDSI, more accurate estimation of the spectral densities of $D_3$ through $S_5$, which explained ∼94% of the total variance, was significant and may lead to more accurate forecasting. $D_1$, $D_2$, and $D_3$ were modulated by periodicity bands ranging from 2 to 16 months, so the local dynamics/variabilities of hydrometeorological variables (e.g., precipitation, temperature, and evapotranspiration) ranging from the sub-seasonal to inter-annual scales can significantly contribute to the variability of $D_1$, $D_2$, and $D_3$. On the other hand, the low-frequency climate modes (e.g., ENSO, PDO, and AMO, which have periods ranging from 2 years to multi-decadal; McCabe et al., 2004; Sheffield & Wood, 2012) and their on-or-off phases can contribute to the modulations of $D_4$ through $S_5$, which have periodicity bands longer than 16 months. Application of the empirical orthogonal function (EOF; refer to Jolliffe (2002) for a detailed discussion) to the signals superposed from $D_4$, $D_5$, and $S_5$ across CONUS could reveal the spatiotemporal variabilities and their relations to the climate modes. The first three principal components, PC1, PC2, and PC3, explained 26%, 13%, and 11% of the variability of the superposed low-frequency signals, respectively. Since each of the remaining EOF modes individually explained less than 6% of the variance, we analyzed the first three leading PCs, as shown in Figure 3a.
Eigenvectors corresponding to PC1 (EOF1) were multiplied by −1 for easy comparison to the other modes; they showed negative modes almost across CONUS but with higher negative loadings for the Great Plains and the Midwestern United States (Figure 3b). EOF1 represented almost continental-scale drought (Figure 3b), and this spatial pattern was identified by calculating the composite mean of the superposed low-frequency signals (Figure 3e) corresponding to the periods when PC1 was below its 10th percentile (red shading in the top panel of Figure 3a). The extremely low PC1 values occurred in 1910-1911, 1917-1918, 1925, 1930-1931, 1933-1936, 1939-1940, 1953-1957, 1963-1964, 1976-1977, and 1988-1989. For example, the extreme droughts of the 1930s were represented by the Dust Bowl, which was the most devastating drought of the 20th century and affected two-thirds of the United States and parts of Mexico and Canada (Schubert et al., 2004). The Dust Bowl was initiated by La Niña and prolonged by local land-atmosphere feedback (Cook et al., 2009, 2013; Han & Singh, 2020). This was supported by the significant correlation (p value <0.05) between climate indices (e.g., Niño-3.4, PDO, and AMO) and PC1 (Table 1). In performing the correlation analysis for the PCs, Niño-3.4, PDO, and AMO were decomposed in the same way as scPDSI, and then the last two lowest-frequency signals were superposed. The superposed low-frequency climate signals were normalized (Figure S4 in Supporting Information S1). Hereafter, the normalized low-frequency Niño-3.4, PDO, and AMO signals are referred to as Niño-3.4, PDO, and AMO, respectively.
The second mode of EOF (EOF2) showed a north-south dipole but with pronounced drought conditions in the Southwest, the lower Colorado River basin, and over Texas (Figure 3c). PC2 had statistically significant (p value <0.05) correlations to climate modes, as listed in Table 1. The loading pattern of the second mode has been identified as the ENSO-impacted drought spatial pattern for CONUS in other studies (e.g., Cole & Cook, 1998; Mo et al., 2009; Steinschneider et al., 2016). The third mode of EOF (EOF3) showed an east-west dipole (Figure 3d). The statistically significant correlations of PC3 to climate modes are summarized in Table 1. Although the correlations of PC3 to climate modes were weak, the zonal type of dipole in the correlation of precipitation to ENSO (i.e., positive [negative] correlation in the east [west] of the United States) has been identified by Enfield et al. (2001). Besides, the correlations between the PCs and PDO and AMO could reinforce or offset the ENSO effects and affect the time lags of drought (Enfield et al., 2001).
The spatial EOF loading patterns of the three leading PCs were also identified by Steinschneider et al. (2016), who conducted an EOF analysis of a multi-centennial tree-ring-based paleo reconstruction of PDSI and showed the relation of ENSO to the EOF loading patterns via the correlation between PCs and SST anomalies. When we performed EOF analysis of scPDSI (the original observations before applying MODWT) across CONUS, the same EOF loading patterns as those of the low-frequency signals were identified, but the correlations between climate modes and the PCs of scPDSI were not statistically significant (not shown; the PCs of scPDSI are shown in Figure S5 in Supporting Information S1). Therefore, since $D_4$ through $S_5$ explained more than 80% of the variance of scPDSI and the spatial variability of scPDSI across CONUS was the same as that of the low-frequency signals (i.e., the superposition of $D_4$ through $S_5$), we can deduce that the climate modes likely affected the variability of $D_4$ through $S_5$ and that the low-frequency signals were important for forecasting and understanding the spatiotemporal variability of scPDSI. In this context, accurate spectral estimation by BESA for low-frequency signals can be the cornerstone of accurate long-term drought forecasting.

Forecasting scPDSI
The entropy spectral density in the previous subsection was accomplished by estimating the Lagrange multipliers associated with the prediction coefficients of the AR process (Equation 6). Thus, based on the estimated entropy spectral density, our primary goal was to forecast scPDSI at the longest lead time with acceptable accuracy. To evaluate the proposed drought forecasting method, we performed a cross-validation procedure during the validation period at different lead times (from 1 to 48 months). About 20% of the sample size, amounting to 288 months (1997-2020), was used for the validation period. The prediction coefficients and model order were defined for each decomposed signal at each grid during the calibration period. With the fixed parameters and model order for each signal, the value of each month for each signal during the validation period was forecasted at different lead times. Then, the forecasted values of the signals were superposed to obtain the resulting forecasts of scPDSI for each month at different lead times. The forecasting process was repeated for the validation period for all grids across CONUS. The goodness-of-fit of forecasts to observations during the validation period was quantified by the root mean square error (RMSE), coefficient of determination ($R^2$), and the modified Nash-Sutcliffe efficiency (NSE) coefficient, since each evaluation metric has its limitations. The mathematical expressions for the goodness-of-fit measures used can be found in Text S3 in Supporting Information S1. RMSE measures the error between forecast and observation in terms of the average magnitude of difference in the scale of observations. A lower RMSE indicates less error in forecasts. $R^2$ measures how much variance of the observations a model can explain (Ritter & Muñoz-Carpena, 2013). So, a higher $R^2$ indicates better forecasting. However, $R^2$ is not sensitive to error (or proportional differences) from observations (Legates & McCabe, 1999).
NSE is one minus the ratio of the error variance of forecasts to the variance of observations (Nash & Sutcliffe, 1970). An NSE value of 1 represents perfect agreement with observations, and NSE decreases as forecasting accuracy degrades. However, since the original NSE is sensitive to extreme values (Krause et al., 2005), the modified NSE was used for better evaluation. Hereafter, the modified NSE is referred to as NSE.
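For concreteness, the three measures can be sketched as follows; using exponent j = 1 in the modified NSE (absolute rather than squared deviations, following Krause et al., 2005) is our assumption about the exact variant used, since the paper's formulas are in Text S3.

```python
import numpy as np

def rmse(obs, fc):
    """Root mean square error between observations and forecasts."""
    obs, fc = np.asarray(obs, float), np.asarray(fc, float)
    return np.sqrt(np.mean((obs - fc) ** 2))

def r_squared(obs, fc):
    """Coefficient of determination: squared Pearson correlation."""
    r = np.corrcoef(np.asarray(obs, float), np.asarray(fc, float))[0, 1]
    return r ** 2

def modified_nse(obs, fc, j=1):
    """Modified Nash-Sutcliffe efficiency (Krause et al., 2005).
    j=1 uses absolute deviations, reducing sensitivity to extremes."""
    obs, fc = np.asarray(obs, float), np.asarray(fc, float)
    num = np.sum(np.abs(obs - fc) ** j)
    den = np.sum(np.abs(obs - obs.mean()) ** j)
    return 1.0 - num / den
```

Note that `r_squared` is insensitive to bias: a forecast that is a perfect linear rescaling of the observations still scores 1, which is exactly the limitation of R² discussed above.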

Overall Forecasting Performance of BESA
The performance metrics (RMSE, R², and NSE) were calculated at different lead times for each grid and then grouped by lead time to show the overall forecast performance versus lead time (Figure 4). The box plots in Figure 4 show the range of percentiles of the performance metrics calculated across CONUS (from 3,273 grid points) at different lead times. Red dots and the interpolated blue lines in Figure 4 indicate the mean and median, respectively. We selected the evaluation criteria of acceptable forecasting performance as 0.8 for RMSE, 0.7 for R², and 0.6 for NSE. While Ritter and Muñoz-Carpena (2013) suggested 0.65 of the original NSE as a criterion for acceptable model performance, Krause et al. (2005) showed that the modified NSE tended to yield lower values than the original NSE did, since it is not sensitive to extreme values. Therefore, we decided on 0.6 as the evaluation criterion for NSE. The RMSE (R² and NSE) values at each lead time were distributed with positive (negative) skewness, although R² had a positive skewness after a 36-month lead time. So, the performance metrics were congregated toward their better-performance sides. Besides, the interquartile ranges (IQRs) for RMSE, R², and NSE were quite narrow (<0.2) up to a 15-month lead time. Therefore, the median values at different lead times were used as a statistic to represent the overall forecast performance. The trend of the medians of all performance metrics showed a decrease in accuracy as lead time increased. In addition, the IQRs widening with increasing lead time also showed the degrading reliability of forecasts at longer antecedence. However, RMSE showed a median of ≤0.8 up to lead times of 15 months, and R² (NSE) showed a median of ≥0.7 (0.6) up to lead times of 18 (15) months.
The split-sample into calibration (hereafter referred to as Cal) and validation (hereafter referred to as Val) has no general rules to follow, although Klemeš (1986) suggested 70% of the data period for calibration and 30% for validation when a sample is not long enough for a one-half split. Therefore, this study investigated how the forecast accuracy of BESA changes as the Cal-Val period varies: 80% Cal and 20% Val, 70% Cal and 30% Val (1985-2020), and 60% Cal (1901-1971) and 40% Val (1972-2020). The variation of error metrics along lead times shown in Figure 4 was also found in the different Cal-Val periods (Figure S6 in Supporting Information S1): as lead time increased, the forecast accuracy tended to degrade and the IQRs to widen. However, the median and IQR of the error metrics were not significantly different among the Cal-period cases, except for the median of RMSE (Figure S6 in Supporting Information S1 and Table 2). The median of RMSE tended to increase as the Cal period became shorter. Table 2 lists the median and IQR of the error metrics for the different Cal periods at different lead times. Since no significant differences in the forecast accuracy of BESA among the different Cal periods were found except for RMSE, we used the Cal period of 80%, which led to the minimum bias. Nevertheless, BESA seems to calibrate parameters decently when the calibration period is relatively short but long enough to estimate the periodicities in the signal. Further analyses in the following were done with the results from the Cal period of 80%.
The direct and intuitive categorical forecast accuracies can be quantified by hit, false positive, and false negative rates (Kim & Valdés, 2003; Shin et al., 2020; Wilks, 2011). In this study, the hit rate is the probability of all drought events correctly forecasted by BESA; the false positive rate represents the probability of drought events forecasted by BESA that were false (i.e., the fraction of the number of times BESA forecasted drought events when the observations did not indicate drought events); and the false negative rate is the fraction of the number of times BESA did not forecast drought events when the observations indicated drought events. Drought events were defined as months when scPDSI fell below −2, the threshold of moderate drought (Wells et al., 2004). Table 3 shows the overall probabilistic scores of the hit, false positive, and false negative rates at different lead times. As lead time increased, the hit rate decreased and the false positive rate increased. However, up to a 15-month lead time, BESA correctly forecasted drought events over 78% of the time and falsely alarmed drought events only ∼3% of the time. The false negative rate is complementary to the hit rate.
Based on the hit and false positive rates, the receiver operating characteristic (ROC) score can be calculated. The ROC score assesses the ability to discriminate binary outcomes (i.e., true: correct forecasts of drought events; false: false forecasts of drought events) (Archer & Fowler, 2008; Shin et al., 2020). For the mathematical details of ROC, reference is made to Archer and Fowler (2008). The ROC score was calculated for the forecasts at different lead times for each grid, and the scores were grouped by lead time (Figure 5). A perfect ROC score is 1, and an ROC score less than 0.5 indicates poor forecasts. Although the ROC score decreased as lead time increased, its median was ∼0.85 at a 15-month lead time. Therefore, in categorical forecasting skill scores, BESA also showed good forecast performance for at least up to 15 months of lead time.
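The categorical scores and the ROC score can be sketched as below; the contingency-table definitions (hit = TP/(TP + FN), false positive = FP/(FP + TN)) are our reading of the rates described above, and using a continuous drought-likeness score (e.g., the negated forecasted scPDSI) for ROC is an assumption.

```python
import numpy as np

def categorical_scores(obs_drought, fc_drought):
    """Hit, false positive, and false negative rates for binary
    drought/no-drought series (boolean arrays)."""
    obs = np.asarray(obs_drought, bool)
    fc = np.asarray(fc_drought, bool)
    tp = np.sum(obs & fc)
    fn = np.sum(obs & ~fc)
    fp = np.sum(~obs & fc)
    tn = np.sum(~obs & ~fc)
    hit = tp / (tp + fn)        # fraction of observed droughts forecasted
    false_pos = fp / (fp + tn)  # fraction of non-drought months falsely alarmed
    false_neg = fn / (tp + fn)  # complement of the hit rate
    return hit, false_pos, false_neg

def roc_auc(obs_drought, score):
    """ROC score (area under the ROC curve) via the rank-sum identity:
    the probability that a drought month receives a higher score than
    a non-drought month (ties counted half)."""
    obs = np.asarray(obs_drought, bool)
    s = np.asarray(score, float)
    pos, neg = s[obs], s[~obs]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For example, `roc_auc([True, True, False, False], [0.9, 0.8, 0.2, 0.1])` yields the perfect score of 1, since every drought month outranks every non-drought month.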
As another avenue of investigating the overall performance of the suggested model over CONUS, the spatially averaged scPDSI and the cross-validated forecasts were compared for 3-month, 6-month, 9-month, 12-month, and 15-month lead times (Figure 6). As lead time increased, the characteristics of the time series were forecasted more dully, so that the underestimation of drought events (the periods below zero) or peaks became progressively pronounced. A possible attribute of scPDSI that may affect the underestimation of forecasts is the Hurst phenomenon, which is a critical statistical attribute for simulating time series with long-term dependency (Beran, 1987; Hurst, 1951; Koutsoyiannis, 2010; Tatli & Dalfes, 2020). The Hurst phenomenon represents long-range dependence in time series and is associated with the clustering of irregular alternations (e.g., high and low flow regimes) in time series (Beran, 1992; Dimitriadis et al., 2021). The possible lack of full replication of the Hurst phenomenon in the entropy spectral analysis may not yield a perfect estimation of the second-order structure with a lower decaying rate (Beran, 1987). However, the proposed model was able to forecast all the drought events up to 15 months of lead time, and all the observed values were inside the 95% forecast intervals (Figure 6). Based on the overall performance evaluation at different lead times, as shown in Figures 4-6, the proposed model seems to have acceptable forecasting skills up to lead times of 15 months in terms of median and average. The variability in the forecast performance at different lead time categories (Figures 4 and 5) was due to the regionally different forecasting skills of the proposed model. Thus, the important question we needed to elucidate was what caused the regional variation in the forecast performance.
In the following subsection, therefore, the regional forecasting performance and the potential factors that can modulate them are discussed.

Regional Forecast Performance of BESA
The proposed BESA drought forecasting model showed regional variation in its forecast performance (Figure 7). Up to lead times of 3 months, all three metrics showed acceptable accuracies across CONUS: RMSE ≤ 0.5, R² > 0.85, and NSE > 0.75 (Figure S7 in Supporting Information S1). However, as lead time increased further, regional patterns in forecast performance became pronounced (Figure 7). We only show the regional distributions of the performance metrics at lead times of 6, 12, and 15 months to highlight changes in the regional patterns. Based on the evaluation criteria of forecast performance defined in the previous section, blue and green shaded regions in Figure 7 had acceptable accuracies at each lead time. NSE showed the same regions with degraded performance as RMSE did, but including Arizona and excluding some of the southeastern states (Figure 7). Although R² showed regional patterns similar to those of NSE, its values were greater than 0.7 up to a 15-month lead time across CONUS except for some parts of Utah and the Northeastern United States, which had the worst performance in RMSE and NSE. A high value of R² does not necessarily indicate good forecasts, since it expresses collinearity without considering the bias between forecasts and observations (Legates & McCabe, 1999). Therefore, RMSE and NSE are considered more reliable goodness-of-fit measures in the analysis of regional patterns of forecast skills. However, since the two metrics did not show perfectly matching regional patterns, we adopted a single lead time implying acceptable accuracy in both RMSE and NSE simultaneously as a metric to simplify the regional analysis.
The lead time was defined as follows: (a) find the maximum lead time of forecasts with RMSE less than or equal to 0.8 at a specified grid; (b) find the maximum lead time of forecasts with NSE greater than or equal to 0.6 at the same grid; (c) take the shorter of the lead times from steps (a) and (b); and (d) repeat steps (a)-(c) for each grid. The spatial distribution of the defined lead times and their histogram are shown in Figure 8.
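The steps above can be sketched per grid as follows; treating the "maximum lead time" as the largest lead time passing the criterion (rather than the longest contiguous run from 1 month) is our assumption.

```python
import numpy as np

def acceptable_lead_time(rmse_by_lead, nse_by_lead, lead_times,
                         rmse_max=0.8, nse_min=0.6):
    """Lead time with acceptable accuracy at one grid:
    the shorter of (a) the maximum lead time with RMSE <= rmse_max and
    (b) the maximum lead time with NSE >= nse_min."""
    lt = np.asarray(lead_times)
    rmse_ok = np.asarray(rmse_by_lead, float) <= rmse_max
    nse_ok = np.asarray(nse_by_lead, float) >= nse_min
    lt_rmse = lt[rmse_ok].max() if rmse_ok.any() else 0
    lt_nse = lt[nse_ok].max() if nse_ok.any() else 0
    return int(min(lt_rmse, lt_nse))   # step (c): take the shorter one
```

For instance, with RMSE passing up to 6 months and NSE passing up to 9 months, the grid is assigned a 6-month lead time.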
The spatial distribution of lead times ranged from 3 to 27 months, and the distribution had a mode of 15 months and a median of 12 months (Figure 8b). The warmer colors (e.g., red and orange) indicated regions that maintained acceptable forecast accuracies only for shorter lead times (i.e., up to 9 months; Figure 8a). The Intermountain West and the Northeastern United States (especially the north side of the Appalachians) had spatially extended regions with the shortest lead time (Figure 8a). The red-shaded regions in the Intermountain West and the Northeastern United States had median elevations of 1,632 and 457 m, respectively. These regions with the shortest lead times were mountainous areas and had a median autocorrelation of scPDSI at a 1-month lag of 0.86. On the other hand, regions with longer lead times had a relatively higher lag-1 autocorrelation and were located at lower elevations in terms of the median (Table 4). Besides, the persistency of scPDSI in Table 4 was defined as the lag in months at which the autocorrelation first fell below 0.2. So, regions with longer lead times in drought forecasting were likely to have a stronger serial correlation, longer memory/persistency, and lower elevations. Since the distributions of autocorrelation, persistency, and elevation in the different lead time (LT) categories (i.e., LT = 3 months, 3 months ≤ LT < 6 months, 6 months ≤ LT < 12 months, and LT ≥ 12 months) did not follow the normal distribution, median values were used for comparison, and their standard deviations are noted inside the parentheses in Table 4. We applied the Mann-Whitney U (MWU) test to determine whether the distributions of the statistics (i.e., lag-1 ACF and persistency) and their median values were statistically different (McKnight & Najab, 2010; Nachar, 2008). The pairs LT = 3 and 3 ≤ LT < 6, 3 ≤ LT < 6 and 6 ≤ LT < 12, and 6 ≤ LT < 12 and LT ≥ 12 showed p values less than 0.05, so the null hypothesis (that the two distributions/medians were identical) was rejected for all pairs regarding lag-1 ACF and persistency, respectively. Therefore, in each row, the column-wise differences of the medians in Table 4 were statistically significant. The median elevation in the LT = 3 category (sample size of 48) was small, since 80% of the sample belonged to the north side of the Appalachians, where the mean elevation is 447 m. So, the high elevations of the Intermountain West could not raise the median elevation in the LT = 3 category. However, the inverse relationship between elevation and lead times and the positive relationship between the strength of autocorrelation/persistency and lead times were observed from the last three columns in Table 4.
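The MWU comparison between lead-time categories can be reproduced with scipy.stats; the sample values below are synthetic stand-ins (means loosely motivated by Table 4), not the study's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical lag-1 ACF samples for two lead-time categories
# (illustrative values only, not the study's grid data).
acf_short_lt = rng.normal(0.86, 0.03, 200)   # e.g., LT = 3 months
acf_long_lt = rng.normal(0.93, 0.03, 200)    # e.g., LT >= 12 months

# Nonparametric test: no normality assumption is needed,
# matching the reason the MWU test was chosen in the text.
stat, p_value = mannwhitneyu(acf_short_lt, acf_long_lt,
                             alternative='two-sided')
different = p_value < 0.05   # reject: the two distributions are identical
```

With clearly separated samples like these, `different` evaluates to `True`, i.e., the two categories have statistically different lag-1 ACF distributions.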
scPDSI is a drought index that quantifies the shortage or excess of precipitation relative to the amount of precipitation required to maintain climatologically normal soil moisture (CNSM) at a monthly scale (Palmer, 1965; Wells et al., 2004). Once the required precipitation (P*) to maintain CNSM, which is defined by local characteristics of ET, recharge, runoff, and loss, was determined, scPDSI was calculated from the deviation of precipitation (P) from P* (Palmer, 1965; Wells et al., 2004). Therefore, the variability of P was significantly correlated with that of scPDSI (MacDonald & Tingstad, 2007). The compound impacts of the direction and depth of air moisture flux, wind speed, and the geographical features of mountains complicate the temporal and spatial precipitation patterns (Yuter et al., 2011). For example, the northwest of Utah, where the suggested model had the shortest lead time, is located under the regime of the Southwest monsoon typically approaching from the Gulf of California (Higgins et al., 1997; MacDonald & Tingstad, 2007). In reality, however, the vertical mixing of moist air masses from the Gulf of California (flows below 850 hPa) and the Gulf of Mexico (flows above 850 hPa), antecedent soil moisture, and snow water equivalent modulate the strength of the monsoon and increase the variability of precipitation (MacDonald & Tingstad, 2007). Besides, MacDonald and Tingstad (2007), Nigam et al. (1999), and Woodhouse (2003) found low correlations between precipitation (and PDSI) and SST teleconnections (i.e., ENSO and PDO) in the northwest of Utah, so the low-frequency signals of scPDSI had less significant quasi-oscillations/periodicities due to the probable frequency shift of precipitation in the interannual to multidecadal frequency bands. Figure 9a shows the absolute correlation coefficients between the low-frequency signal (D4 + D5 + S5) and ENSO (Niño-3.4) at each grid.
The absolute correlation coefficients were grouped by regions that fell in the same lead time category (i.e., 3 ≤ LT < 6, 6 ≤ LT < 12, or LT ≥ 12; Figure 9b). The lowest median correlation (0.07), in the category of 3 ≤ LT < 6, can support the weak ENSO implications in highly elevated regions (Table 4). Besides, regions with lead times longer than 6 months were likely to have stronger associations with ENSO (Figure 9b). Hidalgo and Dracup (2003) also showed that the strength of the Pacific SST teleconnections was weaker in higher-elevation regions than in lower-elevation regions in the upper Colorado River basin.
Precipitation variability in the Northeastern United States is affected by regional to global teleconnections, represented by ENSO and the Pacific-North American (PNA) pattern, that can modulate the air masses entering the northeast of the United States (Hubeny et al., 2011). The continental dry air masses in zonal flows and the moist air masses from the Gulf of Mexico and Atlantic coastal areas in meridional flows are the two primary air masses entering the Northeastern United States. PNA strongly affects the tropospheric circulation patterns, so that meridional (zonal) tropospheric flow leads to the dominance of meridional (zonal) moist (dry) air inflows from the oceans (continent) (Hubeny et al., 2011). Besides, El Niño (La Niña) years are associated with a positive (negative) PNA index that leads to meridionally (zonally) dominant flows (Hubeny et al., 2011). However, the barrier effects of the Appalachian Mountains and the IHCs thereof affect precipitation production from the meridional moisture supply and alter the strength of the SST teleconnection. Therefore, an increase in variability of the low-frequency signals over the north side of the Appalachian Mountains can lead to a short lead time. Figure 10 shows the median model order for each signal in the different lead time categories. The model orders, which were selected in forecasting each signal decomposed from scPDSI, were grouped into the different lead time categories, and the median model orders were calculated. Significant increasing trends in model order along the lead times were found in D4 and D5, which had frequency bands between 16 and 64 months, the known ENSO periodicity (Sheffield & Wood, 2012).
When considering the relationships among lead times, elevations, strength of autocorrelation and persistency of scPDSI, correlation of the low-frequency signals of scPDSI with Niño-3.4, and model orders, we could deduce that weaker ENSO implications and an increase in elevation tended to lead to shorter persistency/memory of the time series, so that the model order decreased. The model order was related to the level of persistency of the time series. Therefore, an increase in persistency in D4 and D5 led to a longer lead time for scPDSI forecasts with acceptable accuracy. Other geographical features (e.g., proximity to the coast), vegetation types, and geological features can influence the nature of scPDSI. However, this study focused on the implications of elevation and SST teleconnections for the characteristics of scPDSI that affected the forecast skills of the proposed drought forecasting model. Besides the implications of SST teleconnections for scPDSI and forecast performance at a large time scale, we will discuss how precipitation was associated with scPDSI and forecast performance. Precipitation is the main forcing and drought indicator that can modulate scPDSI (MacDonald & Tingstad, 2007; Palmer, 1965; Wells et al., 2004). Among Niño-3.4, PDO, AMO, precipitation, and temperature, precipitation had the highest correlation to scPDSI at a 1-month scale (not shown). To investigate the implication of precipitation for scPDSI, we decomposed precipitation using MODWT into six signals, as we did in the previous section for scPDSI. Then, we found the level of superposition and lag that yielded the highest correlation to scPDSI. For example, the level of aggregation 3 and lag 1 of precipitation mean that S5 + D5 + D4 + D3 at a 1-month lag yielded the highest correlation, and the level of aggregation 5 and lag 3 of precipitation mean that S5 + D5 at a 3-month lag yielded the highest correlation.
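The search over superposition levels and lags can be sketched as below, assuming the six decomposed signals are already available (e.g., from a MODWT routine) and ordered smooth-first; the mapping of "level of aggregation k" to the superposition S5 + D5 + ⋯ + Dk, and the 12-month maximum lag, are our reading of the text, not stated parameters.

```python
import numpy as np

def best_superposition_and_lag(signals, target, max_lag=12):
    """Find the superposition level and lag of decomposed precipitation
    signals that correlate best with scPDSI.

    `signals` is a (6, n) array ordered [S5, D5, D4, D3, D2, D1]
    (smooth first, then details coarse to fine), so cumulative sums
    over axis 0 give the superpositions S5, S5+D5, S5+D5+D4, ...
    """
    supers = np.cumsum(np.asarray(signals, float), axis=0)
    target = np.asarray(target, float)
    best_rho, best_level, best_lag = 0.0, None, None
    for i, s in enumerate(supers):
        # Our reading of "level of aggregation": level 5 = S5 + D5,
        # ..., level 1 = all six; i = 0 (S5 alone) is labeled 6 here.
        level = 6 - i
        for lag in range(max_lag + 1):
            x, y = (s, target) if lag == 0 else (s[:-lag], target[lag:])
            rho = np.corrcoef(x, y)[0, 1]
            if abs(rho) > best_rho:
                best_rho, best_level, best_lag = abs(rho), level, lag
    return best_rho, best_level, best_lag
```

If the target were exactly the S5 + D5 superposition delayed by 3 months, the search would recover level 5 and lag 3 with a correlation near 1.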
So, we can infer the following: (a) as the level of correlation decreases, the regions can have room for other hydrometeorological variables (e.g., soil moisture and snow) that can significantly impact the variation of scPDSI; (b) regions with a lower level of aggregation can have more dynamic temporal patterns in precipitation and scPDSI; and (c) regions with a shorter lag can have scPDSI that responds promptly to precipitation. Figure 11a shows the spatial distribution of the highest correlation to scPDSI of precipitation superposed at different levels of aggregation (Figure 11b) and at different lags (Figure 11c). The western United States tended to have a lower correlation than the eastern United States, probably due to the more significant implication of ENSO (Figure 9a) and higher topographic variation (Figure 1c), which may complicate the hydrologic processes at different frequency bands beyond the implication of precipitation. Furthermore, mountainous states and Appalachian states tend to have a lower level of aggregation and shorter lags, implying a more dynamic precipitation process that can pass higher variability to scPDSI.
The level of aggregation and the lag were grouped by regions that fell in the same lead time category (i.e., 3 ≤ LT < 6, 6 ≤ LT < 12, or LT ≥ 12; Figure 12). As lead time (i.e., forecast performance) increased, less dynamic precipitation processes (i.e., a higher level of aggregation and a longer lag) were implicated (Figure 12).

Comparison With LSTM
The forecasting performance of BESA was compared with state-of-the-art LSTM encoder-decoder (LSTM-ED; hereafter called LSTM) neural networks at the sample locations shown in Figure S8 in Supporting Information S1. The five sample locations in Figure S8 in Supporting Information S1 were randomly selected among grids corresponding to the 95th, 75th, 50th, 25th, and 5th percentiles of the ranked R² of BESA. For a discussion of the concepts of the vanilla LSTM and LSTM-ED, references are made to Chollet (2018), Fang et al. (2017), Jiang et al. (2020), and Xiang et al. (2020). Monthly precipitation P, temperature T, Niño-3.4, PDO, AMO, and the past scPDSI were used as input, and the following future scPDSI was used as target data. However, to prepare the input data set with the highest association with the target data set, we conducted a correlation analysis as follows: 1. MODWT was applied to the input variables, except for scPDSI: each variable was decomposed into six signals, as we did for scPDSI for BESA. 2. Cross-correlation analysis was conducted between scPDSI and different levels of superposition of the decomposed signals at different lags for each predictor. As an example for a predictor X, calculate the correlation coefficient vector [ρ(S5(t − τ), scPDSI(t)), ρ(S5(t − τ) + D5(t − τ), scPDSI(t)), ⋯, ρ(S5(t − τ) + D5(t − τ) + ⋯ + D1(t − τ), scPDSI(t))]^T, where τ is the lag and T indicates the transpose of a vector. S5 is the smooth signal and D5, ⋯, D1 are the detail signals at different levels of decomposition of X. Then, find the level of superposition and lag that yield the highest absolute correlation to scPDSI. Therefore, the input data set comprised the aforementioned six predictors: except for scPDSI, the other five predictors in the input data set were superpositions of the decomposed signals at the level of superposition and lag that had the highest correlation to the target data scPDSI. The training and validation periods were set the same as for BESA.
Although LSTM can be trained for multi-sequence-to-multi-sequence, we trained LSTM for multi-sequence-to-one for lead time forecasting, as BESA did. To avoid subjective manipulation in tuning hyperparameters, we used Bayesian optimization, referred to as Sequential Model-Based Global Optimization (SMBO) with the Tree-structured Parzen Estimator (TPE) (hereafter called SMBO-TPE; Bergstra et al., 2011). For the concept and a detailed discussion of SMBO-TPE, reference is made to Bergstra et al. (2011). The optimal hyperparameters obtained for each sample point are listed in Table S1 in Supporting Information S1. The LSTM-based models were developed using the Keras framework with the TensorFlow backend in Python, and an NVIDIA GeForce RTX 3060 GPU was used in training.

Figure 11. Spatial distribution of (a) the highest correlation between self-calibrated Palmer Drought Severity Index (scPDSI) and precipitation superposed at the level of aggregation in (b) and the lag in (c). Note that the floating-point numbers, rather than integers, in the color bar of (b) are due to spatial interpolation.

Figure 13 shows the variation of the error metrics computed from BESA and LSTM at different lead times for each sample location. For all sample locations and all lead times, BESA outperformed LSTM. Figure 14 shows the forecasted scPDSI from BESA (Figure 14a) and LSTM (Figure 14b) for the sample location corresponding to the 75th percentile at different lead times for the validation period. Since the case of the 75th percentile yielded the least difference between BESA and LSTM, we show it as a sample illustration. Although LSTM at lead times beyond 1 month followed the trend of the observations, its bias increased significantly (Figure 14b). On the other hand, BESA showed more accurate forecasting at all lead times compared to LSTM (Figure 14). If the LSTM were trained in a many-to-many approach, it might yield improved performance over a specified period of interest. However, the many-to-many approach forecasts a desired period, not an end point of the period. Besides, the LSTM was trained at a single grid for this comparison. LSTMs may improve their forecast accuracy when regional data are used with stacked convolutional layers; even at a single grid, one-dimensional (1D) convolutional layers stacked with LSTMs may improve forecast skills, since a 1D convolutional layer can extract features at different frequencies (Chollet, 2018; Feng et al., 2020; Kratzert et al., 2019).
BESA is also inherently interpretable in how it forecasts scPDSI. LSTM or DL can likewise be improved in terms of forecast accuracy and interpretability when internally or externally coupled with physics or prior knowledge within the framework of TGDS (Höge et al., 2022; Jiang et al., 2020; Karpatne et al., 2017; Rackauckas et al., 2020). On the other hand, DL can extract data features that conventional hydrologic approaches cannot use. For example, the improved prediction in ungauged basins using DL suggests that there are hidden data features that the conventional hydrologic approach (e.g., process-based models) cannot use (Nearing et al., 2021). Therefore, DL can be improved within TGDS, and statistical and physical hydrology can advance their modeling and conceptual bases with the benefit of DL. However, since improving or developing LSTMs for drought forecasting is not the mission of this study, we will not discuss LSTMs or DL further in the following sections.

Drought Frequency Analysis
The purpose of forecasting drought is to plan drought management proactively, and practical drought management can be facilitated by appraising drought risk (Ji et al., 2022; Singh et al., 2007). The drought risk can be quantified by drought frequency analysis. Therefore, we investigated how accurately the forecasted drought time series for the validation period characterized drought events and their risk. Drought has multivariate characteristics, such as severity, duration, area, and inter-arrival time, and evolves in a 3D domain (i.e., longitude, latitude, and time; Andreadis et al., 2005; Han & Singh, 2021). To estimate more realistic severity (S), duration (D), and area (A) of drought events, the 3D drought clustering algorithm of Andreadis et al. (2005) was adopted, and to analyze drought frequency more accurately, we used the vine copula, which facilitated multivariate (herein trivariate: S, D, and A) frequency analysis. The 3D drought clustering algorithm found drought patches that were spatially adjoined and temporally overlapped, so that spatiotemporally connected grids comprised a drought event; in turn, the drought characteristics of events were calculated. The drought patches were spatially adjoined grids under "drought," defined as scPDSI below −2, the threshold value of moderate drought (Wells et al., 2004). Except for the threshold value in defining the drought grids, all procedures of the 3D drought clustering algorithm followed Han and Singh (2021). For methodological details, references are made to Han and Singh (2021) and Andreadis et al. (2005).
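A simplified stand-in for the 3D clustering step can be written with 3D connected-component labeling: cells under the −2 threshold that touch in space or overlap in consecutive months are merged into one event. The actual algorithm of Andreadis et al. (2005) screens patch areas month by month before linking; here a hypothetical `min_cells` count stands in for the ∼115,000 km² area threshold.

```python
import numpy as np
from scipy import ndimage

def cluster_droughts_3d(scpdsi, threshold=-2.0, min_cells=5):
    """Label 3D drought events in a (time, lat, lon) scPDSI array.
    Cells below `threshold` that are face-adjacent in space or overlap
    in consecutive months receive the same event label; clusters with
    fewer than `min_cells` cells (hypothetical screen) are dropped."""
    in_drought = np.asarray(scpdsi) < threshold
    # 6-connectivity in 3D: adjacency along time, lat, or lon only
    structure = ndimage.generate_binary_structure(3, 1)
    labels, n_events = ndimage.label(in_drought, structure=structure)
    # Screen tiny clusters (analogue of the area threshold)
    sizes = np.bincount(labels.ravel())
    for lab in range(1, n_events + 1):
        if sizes[lab] < min_cells:
            labels[labels == lab] = 0
    return labels
```

A 3-month, 2×2-cell blob thus becomes one labeled event, while an isolated single drought cell is screened out.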
Once drought events were detected and their S, D, and A were calculated from observations and forecasts at different lead times, trivariate joint cumulative distribution functions (CDFs) and return periods were estimated by the vine copula. The vine copula reduces the dimension of a multivariate copula into bivariate copulas through pair copula constructions (PCCs) that are later combined again into a multivariate joint probability function (Wu et al., 2021; L. Zhang & Singh, 2019). Application of the vine copula has three steps (Czado, 2019): (a) transform the marginals to the domain of [0, 1] by the probability integral transform (PIT). We used the empirical distribution function (i.e., Gringorten plotting positions) to estimate the marginal distributions for calculation efficiency; (b) find the best-fitted bivariate copula for each pair copula. The parameters of the copulas were estimated by maximum likelihood estimation (MLE); and (c) calculate the joint CDF and return periods. For a detailed discussion of copula theory and trivariate drought frequency analysis using the vine copula, one can refer to L. Zhang and Singh (2019).

Figure 13. Comparisons of root mean square error (RMSE), R², and Nash-Sutcliffe efficiency (NSE) for the sample locations corresponding to the 95th, 75th, 50th, 25th, and 5th percentiles of R² of Burg entropy spectral analysis (BESA) (first to last rows, respectively). Blue (red) lines indicate the behavior of BESA (long short-term memory [LSTM]).

Characteristics of Droughts
During the validation period, the 3D drought clustering algorithm detected 63 events for the observed scPDSI and 55, 56, 62, 43, and 56 events for the forecasted scPDSI at lead times of 3, 6, 9, 12, and 15 months, respectively. Drought characteristics of the detected events were calculated as follows:
1. Severity (S): The accumulated scPDSI values of all drought patches belonging to an event, S = Σ_{t=1}^{D} Σ_{j=1}^{M_t} Σ_{i=1}^{n_{j,t}} |scPDSI_{i,j,t}| · a_i, where D is the duration of a drought event, M_t is the number of drought patches in month t, n_{j,t} is the number of grids of drought patch j in month t, a_i is the area of grid i, and |⋅| represents the absolute value. Therefore, S is expressed in units of km²·month.
2. Duration (D): The period between the beginning and the end of an event. D is expressed in units of months.
3. Area (A): The projected area of the 3D drought event. A is expressed in units of km² or % of the area encompassing CONUS.
Table S2 in Supporting Information S1 lists the top three most severe scPDSI drought events obtained from observations and forecasts at different lead times among all the detected events. The area (in km²) and severity of all detected patches (i.e., spatially contiguous drought grids, at individual months, that comprise a drought event) in all drought events were grouped by occurrence month and then accumulated each month to construct time series of severity and area (in %) (Figure 15). The time series of area from the drought events detected by the 3D clustering algorithm using the observed scPDSI (black lines in Figure 15a) are well matched with the time series of CONUS percentage drought area monitored by the U.S. Drought Monitor (USDM; https://droughtmonitor.unl.edu/DmData/TimeSeries.aspx).
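Given a labeled 3D array in which each drought event carries a unique integer label, the three characteristics can be computed as below; the uniform cell area is a hypothetical placeholder for the actual grid-cell areas.

```python
import numpy as np

def drought_characteristics(labels, scpdsi, cell_area_km2=2500.0):
    """Severity (km^2 * month), duration (months), and projected area (km^2)
    for each labeled 3D drought event. `labels` and `scpdsi` share the
    shape (time, lat, lon); `cell_area_km2` is a hypothetical uniform
    grid-cell area."""
    events = {}
    for lab in np.unique(labels[labels > 0]):
        mask = labels == lab
        months = np.where(mask.any(axis=(1, 2)))[0]
        duration = int(months.max() - months.min() + 1)
        # Severity: accumulated |scPDSI| over all cells of the event,
        # weighted by cell area (units of km^2 * month)
        severity = float(np.abs(scpdsi[mask]).sum() * cell_area_km2)
        # Projected area: footprint of the event collapsed along time
        area = float(mask.any(axis=0).sum() * cell_area_km2)
        events[int(lab)] = {'S': severity, 'D': duration, 'A': area}
    return events
```

For a single-cell patch lasting 2 months with scPDSI = −3, this returns D = 2, S = 6 × cell area, and A = one cell area.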
As previous research showed a high correlation between drought area and severity (Han & Singh, 2021; Li et al., 2020), the temporal behaviors of area and severity were almost identical (Figure 15) and had a correlation greater than 0.95 for the observations and all lead times. As lead time increased, drought area and severity were underestimated, as shown in Figure 15. The RMSE, R², and NSE of the forecasted drought area and severity at different lead times are summarized in Table 5. As the cross-validated time series showed reduced forecasting skills with increasing lead time in Section 3.2, drought area and severity were forecasted with a lower level of accuracy as lead times became longer. However, NSE had acceptable accuracy up to lead times of 12 (9) months for area (severity). The lead times with acceptable accuracy by NSE for the forecasted drought characteristics were thus shorter than the 15 months attained by the cross-validated time series (Figure 4). The only parameter used in clustering drought patches in three dimensions was the area threshold to define a spatially contiguous drought patch and overlapping drought patches on the time axis. We adopted an area threshold of ∼115,000 km², as did Han and Singh (2021), since a too-large area threshold yielded fewer drought events and a too-small threshold resulted in spuriously long persistent drought events. As lead time increased, smaller drought patch areas were likely to be produced due to the trend of underestimation in the time series forecasts, so more drought patches were screened out by the area threshold and/or could not evolve into more extreme droughts. Therefore, a higher level of underestimation and a shorter lead time were obtained for the time series of drought characteristics (i.e., severity and area) compared to the case of the cross-validated time series. Given the acceptable accuracies for the drought characteristics, frequency/risk analysis was performed up to a 12-month lead time in the following section.

Risk Analysis
Risk analysis was performed for drought characteristics obtained by the 3D drought clustering algorithm. The dependency among drought severity (S), duration (D), and area (A) was measured by Kendall's rank correlation coefficient. All three drought characteristics were positively dependent in all cases (i.e., observations and forecasts at different lead times). Duration was used as the center variable in constructing the trivariate vine copula (TVP), since it had the highest correlation with the other two variables in all cases. Thus, the TVP of (S, D, A) was constructed using the following two-tree structure: tree 1 (T1): S-D-A; and tree 2 (T2): S|D-A|D. Therefore, the TVP comprised three bivariate copulas: C(F(S), F(D)), C(F(A), F(D)), and C(F(S|D), F(A|D)), where F(⋅) denotes the marginal CDF. To find the best-fitted copulas for the three bivariate copulas, Archimedean copulas (e.g., Gumbel-Hougaard [hereafter Gumbel], Frank, Clayton, and Joe) were applied. The parameters of the applied copulas were estimated by MLE for the three bivariate copulas separately, and the copula with the smallest BIC was selected as the best-fitted copula for each bivariate copula. Gumbel and Clayton were selected, but in different combinations for observations and forecasts at varying lead times (see Table S3 in Supporting Information S1). The joint distributions of the fitted TVPs matched the empirical joint distributions well in all cases, as shown in Figure S9 in Supporting Information S1. Thus, the fitted TVPs were applied for drought frequency analysis.

Figure 15. Time series of drought area (a) and severity (b) of drought events detected by the 3D clustering algorithm using the observed self-calibrated Palmer Drought Severity Index (scPDSI) (black lines) and forecasted scPDSI at different lead times (red, orange, pink, green, and light blue lines indicate 3-month, 6-month, 9-month, 12-month, and 15-month lead times, respectively).
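The center-variable rule above (pick the variable with the highest Kendall correlation to the other two) can be sketched as follows; the synthetic S, D, A samples and the helper function are illustrative assumptions, not the study's data.

```python
import numpy as np
from scipy.stats import kendalltau

def pick_center_variable(samples):
    """Choose the vine-copula center variable as the one with the highest
    total Kendall's tau with the remaining variables (the paper's criterion)."""
    names = list(samples)
    score = {v: sum(abs(kendalltau(samples[v], samples[u])[0])
                    for u in names if u != v)
             for v in names}
    return max(score, key=score.get)

# Synthetic S, D, A in which duration drives both severity and area,
# mimicking the positive dependence reported for drought characteristics.
rng = np.random.default_rng(0)
D = rng.normal(size=500)
S = D + 0.3 * rng.normal(size=500)
A = D + 0.3 * rng.normal(size=500)
center = pick_center_variable({"S": S, "D": D, "A": A})
```

For data constructed this way the selected center is D, matching the tree-1 structure S-D-A used in the study.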
The trivariate joint CDF obtained from observations is shown in Figure 16a at different levels of non-exceedance probability (i.e., 1, 0.9, 0.75, and 0.25). As the non-exceedance probability (NEP) decreased, the convex surfaces, which indicate threshold values corresponding to the specified NEP, expanded, so that lower values of severity, duration, and area resulted. The trivariate joint CDFs obtained from the forecasts at different lead times showed almost identical behaviors to Figure 16a (not shown). To facilitate visual comparison of the trivariate joint CDFs between observations and forecasts, we show the surfaces of threshold values corresponding to each NEP (i.e., the convex surfaces) estimated from forecasts at different lead times and from observations (Figures 16b-16e). However, to make the patterns easier to distinguish, the surfaces of the 6-month and 9-month lead times are not shown for an NEP of 0.9, and that of the 9-month lead time is not shown for an NEP of 0.75. Although the convex surfaces of forecasts were positioned lower than those of observations, the surface of the 9-month lead time was positioned higher than that of the 3-month lead time, and the surface of the 12-month lead time was located higher than that of the 6-month lead time, at NEPs lower than 0.5 (Figure 16d). Besides, the surfaces of the 9-month and 12-month lead times were positioned higher than that of the 3-month lead time at NEPs lower than 0.25 (Figure 16e). Figure 17 shows the non-exceedance bivariate joint probabilities of observations and forecasts at different lead times when one of the three variables was fixed at the 100% percentile. The ridge (the most curved part) of the forecast contours shifted to lower values compared to observations for three different conditions: S = 100%, D = 100%, and A = 100%. The diagonal lowering tendency of the 3-month lead-time forecast was smaller than that of the other lead times at NEPs greater than 0.5.
However, below an NEP of 0.5, the 9-month lead-time forecast showed less underestimation than the 3-month lead-time forecast; and below an NEP of 0.3, the underestimation at a 3-month lead time was larger than at the other lead times (Figure 17). Therefore, based on the trivariate and bivariate joint CDFs (Figures 16 and 17), we found the following: forecasts at all lead times were underestimated; the level of underestimation at a 3-month lead time was the least in the high percentile range (i.e., NEP > 0.5); and at NEPs lower than 0.3, the level of underestimation was least at a 9-month lead time and increased in the order of 12-month, 6-month, and 3-month lead times. The cross-validated time series showed a monotonic decrease (increase) in forecast skill (underestimation) as lead time increased. However, the greater underestimation at longer lead times prevented the 3D drought clustering algorithm from letting drought events evolve to extremes as large as those in observations and 3-month lead-time forecasts, while it produced many more severe drought events than the 3-month lead time in the lower percentile ranges. The inversion of the underestimation trend between the high and lower percentile ranges could thus arise. For a quantitative evaluation of the applicability of the proposed drought forecasting model to design practice, return period and risk were estimated at different lead times and compared to those of observations. Risk analysis was performed using the joint and conditional return periods. For the joint return period, either the "AND" or the "OR" case can be used. The "OR" case tends to have a lower return period than the "AND" case (Ji et al., 2022). We investigated only the "AND" case of T(S > s ∩ D > d ∩ A > a); its mathematical expression can be found in Text S4 in Supporting Information S1.
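The "AND" exceedance probability behind T(S > s ∩ D > d ∩ A > a) follows from the joint CDF by inclusion-exclusion, with T = μ/P and μ the mean interarrival time of drought events. A minimal sketch, assuming an independence copula as a stand-in for the fitted vine copula and a hypothetical μ:

```python
def and_exceedance_prob(F1, F2, F3, F12, F13, F23, F123):
    """P(S > s, D > d, A > a) from marginal, bivariate, and trivariate
    non-exceedance probabilities via inclusion-exclusion."""
    return 1 - F1 - F2 - F3 + F12 + F13 + F23 - F123

def and_return_period(mu_years, p_exceed):
    """T(S > s AND D > d AND A > a) = mu / P, with mu the mean
    interarrival time of drought events (assumed value below)."""
    return mu_years / p_exceed

# Stand-in joint model: independence copula (the paper uses a fitted vine copula),
# so P(S > s, D > d, A > a) should reduce to (1 - Fs)(1 - Fd)(1 - Fa).
Fs, Fd, Fa = 0.9, 0.8, 0.85
p = and_exceedance_prob(Fs, Fd, Fa, Fs * Fd, Fs * Fa, Fd * Fa, Fs * Fd * Fa)
T = and_return_period(mu_years=1.5, p_exceed=p)
```

With the fitted TVP, the bivariate and trivariate non-exceedance terms would instead come from the pair-copula construction of Text S4.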
For example, we calculated T(S > s ∩ D > d ∩ A > a) for some observed drought events, ranging from slight to extreme, by inputting them into the TVPs fitted to observations and forecasts at different lead times (Table 6). The joint return period T showed an upward trend as lead time increased. However, the trends were not monotonic against lead time because of the shape/behavior of the joint CDF (Figures 16 and 17). To minimize the implications of the 3D clustering algorithm for the estimated return periods and to simplify the analysis, the return periods were regressed against lead times for each observed drought event. Besides, the risk R = 1 − (1 − 1/T)^n, where T is the return period and n is a planning period (Read & Vogel, 2015), was calculated using n = 30 years and is listed in parentheses in Table 6. As a drought event became more extreme and lead time increased, the differences between the return periods obtained from the TVPs fitted to observations and to forecasts increased. Although risk varied with T, when planning water resources for the next 30 years under anticipated climate change, the risk calculated from forecasts up to a 12-month lead time did not significantly deviate from that calculated from observations. Therefore, the probability of exceeding the drought event at least once during the design period was not significantly different between observations and forecasts up to a 12-month lead time.

Figure 17. Bivariate joint cumulative distribution function (CDF) conditioned on the 100% percentile value of (a) severity, (b) duration, and (c) area. Solid lines represent observations. Dotted, dot-dashed, double dot-dashed, and dashed lines represent forecasts at 3-month, 6-month, 9-month, and 12-month lead times, respectively.
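The risk measure of Read and Vogel (2015) used above, R = 1 − (1 − 1/T)^n, is the probability of at least one exceedance during an n-year planning period; a minimal sketch (the example T value is illustrative):

```python
def risk_of_exceedance(T, n=30):
    """R = 1 - (1 - 1/T)**n: probability that an event with return period T
    (years) occurs at least once during an n-year planning period."""
    return 1.0 - (1.0 - 1.0 / T) ** n

# A 100-year drought still has a ~26% chance of occurring within a 30-year plan.
r = risk_of_exceedance(T=100, n=30)
```

This is why moderate differences in estimated return periods at different lead times can translate into only small differences in 30-year planning risk.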
The conditional return period is useful to investigate how the return period differs under different conditioning assumptions (L. Zhang & Singh, 2007), such as TC(A > a | S ≤ s, D ≤ d). Mathematical expressions for each conditional return period can be found in Text S4 in Supporting Information S1. As examples, the conditional return periods under different assumptions were calculated using the copulas fitted to observations and forecasts at different lead times, and the associated risk was calculated as for the trivariate joint return period and is listed in parentheses in Table 7. For return periods computed using an arbitrary combination of drought characteristics, the differences between the return periods obtained from the TVPs fitted to observations and to forecasts increased as lead time increased, as the trivariate joint distribution showed. The trends of the joint and conditional return periods were maintained for other combinations of observed drought characteristics (not shown). Therefore, an increase in return periods with lead time implied an increase in the level of underestimation of the forecasts. However, the risks corresponding to the estimated return periods at different lead times did not significantly deviate from those for observations. Therefore, the suggested model can provide a 1-year period to prepare for upcoming droughts without compromising risk analysis for drought mitigation planning.

Conclusions
In this study, we developed a drought forecasting model for the CONUS using BESA with a multi-frequency-band approach to achieve accurate drought forecasts at long lead times. Monthly scPDSI for the period 1901-2020 was used for model calibration and validation, using the first 80% and the last 20% of its sample size, respectively.
BESA is known to accurately estimate spectral density by maximizing entropy without assumptions about unknowns, and to easily associate the estimated spectral density with the AR forecasting process (Krstanovic & Singh, 1993a, 1993b). However, drought indicators (e.g., precipitation, temperature, and evapotranspiration) modulated at various periodic scales impart multiple frequencies to drought time series (herein scPDSI), so that direct application of BESA could not achieve accurate long-term drought forecasting. Therefore, scPDSI was first decomposed into signals in six different frequency bands using the MODWT, so that BESA could be applied to simpler spectral structures.
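The decomposition step can be illustrated with a minimal, numpy-only Haar MODWT pyramid; this is a sketch under assumed choices (Haar filter, circular boundary), not the wavelet configuration used in the study. For the Haar filter, the detail signals and the final smooth signal conveniently sum back to the original series.

```python
import numpy as np

def modwt_haar(x, levels):
    """Haar MODWT via the pyramid algorithm with circular boundary handling.
    Returns the detail signals W_1..W_J and the smooth signal V_J."""
    v = np.asarray(x, dtype=float)
    details = []
    for j in range(1, levels + 1):
        shifted = np.roll(v, 2 ** (j - 1))   # lag by 2^(j-1) samples
        details.append((v - shifted) / 2.0)  # wavelet (detail) coefficients
        v = (v + shifted) / 2.0              # scaling (smooth) coefficients
    return details, v

rng = np.random.default_rng(1)
x = rng.normal(size=256)
W, V = modwt_haar(x, levels=5)
# For this Haar construction the decomposition is additive: x = W_1 + ... + W_5 + V_5,
# and the MODWT energy decomposition ||x||^2 = sum_j ||W_j||^2 + ||V_J||^2 holds.
```

Each W_j isolates variability on scales of roughly 2^j samples, which is how longer-periodicity (lower-frequency) signals with simpler spectral structures are exposed to BESA.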
Lagrange multipliers and prediction coefficients were estimated to obtain the spectral densities of the signals based on BESA. Investigating the BESA spectral density of each decomposed signal indicated its applicability to that signal. BESA was not accurate in estimating the spectral densities of signals in the high-frequency bands (W1 and W2). However, when signals had simpler or fewer peaks, BESA estimated spectral densities more accurately. Therefore, BESA accurately estimated the spectral densities of signals with periodicity bands longer than 16 months (i.e., W3 to W5), which explained ∼94% of the variability of scPDSI. EOF analysis showed that the low-frequency signals (W4 to W5), which explained ∼80% of the total variance, were significantly associated with SST climate modes (i.e., Niño-3.4, PDO, and AMO) and sufficiently represented the spatiotemporal variability of scPDSI across CONUS. Therefore, the low-frequency signals that BESA could accurately parameterize were the backbone of long-lead-time drought forecasting.
The cross-validation of the forecasted scPDSI at lead times ranging from 1 to 48 months was evaluated based on RMSE, R², and NSE. Without considering the regional variation of forecasting performance, the suggested model forecasted scPDSI up to a 15-month lead time with acceptable accuracies of median RMSE < 0.8, median R² ≥ 0.7, and median NSE ≥ 0.6. However, analysis of regional forecasting performance based on the lead time as a metric, which can accommodate the accuracy criteria of RMSE (<0.8) and NSE (≥0.6), showed that 51% (75%) of CONUS had lead times longer than 12 (9) months and elucidated the potential contributors to forecast performance. Regions located at higher elevations tended to have shorter lead times than regions at lower elevations. In addition, regions with more dynamic precipitation tended to have shorter lead times. The strength of autocorrelation at a 1-month lag and the persistency/memory of scPDSI decreased as elevation increased. Besides, especially for W4 and W5, the model order, which can reflect the strength of dependency of a time series, showed a decreasing tendency in regions with shorter lead times. The signals W4 and W5 had periodicity bands corresponding to ENSO. The correlation between the low-frequency signals and Niño-3.4 was weak for higher-elevation regions, so forecastability for the low-frequency signals degraded because of the higher variability or dynamics embedded in them. Therefore, the level of association of the low-frequency signals with SST climate modes and the complexity of precipitation-producing mechanisms related to geophysical factors (i.e., elevation) affected the dependency of the signals, which caused the regional variation of forecasting performance.
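The lag-1 autocorrelation used above as a persistence/memory metric can be sketched as follows; the AR(1) demo series is an illustrative assumption, not the scPDSI data.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1, a simple persistence/memory
    metric for a drought-index time series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

# A persistent AR(1)-like series has high lag-1 autocorrelation,
# consistent with longer forecastable lead times.
rng = np.random.default_rng(2)
z = rng.normal(size=4000)
persistent = np.empty_like(z)
persistent[0] = z[0]
for t in range(1, z.size):
    persistent[t] = 0.9 * persistent[t - 1] + z[t]
r = lag1_autocorr(persistent)
```

A rapidly alternating (anti-persistent) series, by contrast, yields a strongly negative value, which is the short-memory regime associated here with higher-elevation regions.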
Besides, when comparing the forecasting performance of BESA with that of LSTM at sample locations randomly selected among grids corresponding to the 95th, 75th, 50th, 25th, and 5th percentiles of the ranked R², BESA outperformed LSTM at all sample locations and all lead times.
The purpose of drought forecasting is to prepare a proactive drought management plan. To appraise the applicability of the proposed forecast model for a drought planning scheme, drought frequency and risk analyses were conducted by comparing observed and forecasted drought characteristics. The 3D drought clustering algorithm was applied to obtain the characteristics (i.e., severity, duration, and area) of the observed and forecasted scPDSI time series across CONUS. The vine copula was applied for drought frequency analysis of the trivariate drought characteristics. The joint and conditional bivariate CDFs confirmed the underestimation of forecasted droughts, so that the joint and conditional bivariate return periods increased with lead time relative to observations. However, the risks calculated for a 30-year planning period did not show significant differences between those obtained from observations and from forecasts up to a 12-month lead time.
Therefore, the drought forecasting model equipped with BESA and a multiresolution approach showed its applicability for long-term drought forecasting across CONUS, although high mountainous regions had lead times of 3-6 months. Besides, risk analysis for the next 30-year planning period showed that applying the proposed model would not compromise the accuracy of drought mitigation or water resources planning when considering forecasts up to a 12-month lead time. We note that the return period for risk analysis was estimated without considering the temporal dependency of scPDSI, as in conventional frequency analysis. On the other hand, Volpi et al. (2015) showed that the time-correlation structure of hydrological processes impacts the shape of the probability function of the interarrival time of events and the exceedance probability. Since this study did not mainly assess the improvement in the probability of failure, we did not consider the time-correlation structure for frequency analysis. However, more rigorous drought risk analysis could be achieved using the Equivalent Return Period, which can address the autocorrelation in hydrologic time series. Although this study validated the proposed drought forecasting model, which applied entropy theory for the first time, using scPDSI at a 1-month scale, the model is promising for forecasting scPDSI at different temporal scales and other drought indices with a longer memory than scPDSI (e.g., hydrological droughts) at long lead times.

Appendix A: Deduction for Burg's Linear Prediction Error Filter
The deduction starts with Equation 5, which expresses the constraints for the Burg entropy spectral density. To obtain the Lagrange multipliers in a computationally efficient manner, their relationship with the prediction coefficients a_k of the AR process is constructed as follows (Burg, 1975; Cui, 2015). The spectral density of the AR process is expressed as

S(f) = σ²Δt / |1 + Σ_{k=1}^{m} a_k e^(−i2πfkΔt)|²,

where σ² is the least mean squared error (herein Δt = 1) and the spectral density is positive definite; terms marked with an asterisk, (⋅)*, denote complex conjugates of (⋅).
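Under the appendix's Δt = 1 convention, Burg's recursion for the prediction coefficients and least mean squared error can be sketched in a few lines; this is a minimal numpy implementation of the standard Burg algorithm, with an illustrative AR(1) demo series, not the authors' code.

```python
import numpy as np

def burg_ar(x, order):
    """Burg's method: estimate AR prediction coefficients a_1..a_m and the
    least mean squared error sigma^2 by minimizing the forward + backward
    prediction error power at each stage (Levinson-type recursion)."""
    x = np.asarray(x, dtype=float)
    f = x.copy()                       # forward prediction errors
    b = x.copy()                       # backward prediction errors
    a = np.zeros(0)                    # coefficients of 1 + sum_k a_k z^-k
    e = float(np.dot(x, x)) / x.size   # zeroth-order error power
    for m in range(1, order + 1):
        fm, bm = f[m:], b[m - 1:-1]
        k = -2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
        f[m:], b[m:] = fm + k * bm, bm + k * fm   # update error sequences
        a = np.concatenate([a + k * a[::-1], [k]])  # Levinson update
        e *= 1.0 - k * k
    return a, e

def burg_spectrum(a, e, freqs):
    """S(f) = sigma^2 / |1 + sum_k a_k e^{-i 2 pi f k}|^2 (Delta t = 1)."""
    k = np.arange(1, a.size + 1)
    denom = np.abs(1.0 + np.exp(-2j * np.pi * np.outer(freqs, k)) @ a) ** 2
    return e / denom

# AR(1) example: x_t = 0.9 x_{t-1} + eps_t, so a_1 should be close to -0.9
# and the spectrum should peak at low frequency (persistent, red-noise-like).
rng = np.random.default_rng(3)
eps = rng.normal(size=3000)
x = np.empty_like(eps)
x[0] = eps[0]
for t in range(1, eps.size):
    x[t] = 0.9 * x[t - 1] + eps[t]
a, e = burg_ar(x, order=1)
S = burg_spectrum(a, e, np.array([0.0, 0.5]))
```

The low-frequency peak of such a spectrum is the spectral signature of the persistent signals (W4, W5) that carried the model's long-lead forecast skill.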