Validating the spatial variability of the semidiurnal internal tide in a realistic global ocean simulation with Argo and mooring data

. The autocovariance of the semidiurnal internal tide (IT) is examined in a 32-day segment of a global run of the HY-brid Coordinate Ocean Model (HYCOM). This numerical simulation, with 41 vertical layers and 1/25 ◦ horizontal resolution, includes tidal and atmospheric forcing allowing for the generation and propagation of IT to take place within a realistic eddying general circulation. The HYCOM data are in turn compared with global observations of the IT around 1,000 dbar, from Argo ﬂoat park phase data and mooring records. HYCOM is found to be globally biased low in terms of the IT variance and decay 5 of the IT autocovariance over timescales shorter than 32 days. Except in the Southern Ocean, where limitations in the model causes the discrepancy with in situ measurements to grow poleward, the spatial correlation between the Argo and HYCOM tidal variance suggests that the generation of low-mode semidiurnal IT is globally well captured by the model.


Introduction
Internal tides (IT) are internal waves generated by the interaction of tidal currents with rough bathymetry. The radiated wave 10 beams can travel thousands of kilometers (e.g., Zhao et al., 2016;Buijsman et al., 2020) over which they undergo dissipative processes as they interact with the eddying ocean and bottom topography, to eventually break (MacKinnon et al., 2017). The dissipation of IT represents a major source of vertical mixing in the ocean interior (de Lavergne et al., 2020). As such, IT have a key influence on the ocean state (Melet et al., 2016), and therefore on the global climate system (see Melet et al., 2022, for a review on the subject). description of the Argo data processing and quality controls we employed is available from Hennon et al. (2014) and Geoffroy and Nycander (2022).

Global Multi-Archive Current Meter Database (GMACMD)
The Global Multi-Archive Current Meter Database (GMACMD; Scott et al., 2011) compiles tens of thousands of oceanographic time series from moorings. Previous model-data validation efforts involving both HYCOM and mooring data notably 95 include Luecke et al. (2020), where the authors compared the temperature variance and kinetic energy, over various frequency bands, in realistic global ocean simulations to more than 3,800 instrumental records.
Here, we extracted 331 temperature time series spanning 1972-2010 and meeting the following criteria: 1. The mooring lies in water deeper than 2,000 m 2. The record is longer than 64 days with a sampling interval shorter than 3 hr, for adequate resolution of the semidiurnal 100 tidal signals 3. The instrument depth is within ±200 m from 1,000 m One particular mooring is used as an example in Sect. 3. It recorded during 366 days in the years 1982-83 at the position 28.00 • N, 151.95 • W (north of the Hawaiian Ridge). As previously documented by Alford and Zhao (2007), we refer to the instrument at 1,119 m depth as mooring No. 2. This instrument sampled temperature with a 0.25 h resolution. 105

HYbrid Coordinate Ocean Model (HYCOM)
This study uses 32 days, from May 20 to June 20, 2019, of hourly output at 1,000 m depth from a global run of HYCOM, with 41 vertical layers and 1/25 • horizontal resolution. The non data-assimilative simulation, designated 'GLBy190.04', includes realistic tidal and atmospheric forcing enabling the generation and propagation of IT within the eddying general circulation.
The high resolution 2D fields are complemented by monthly-mean 3D fields of temperature and salinity subsampled to 1 • .

110
A Lagrangian analysis of the simulation is used for a direct model-data validation with Argo floats. The Argo quasi-Lagrangian sampling is mimicked by releasing 41644 particles randomly across the world oceans. We let the particles be advected by the 2D velocity field at 1,000 m for 32 days while sampling temperature with an hourly resolution. This Lagrangian sampling of HYCOM is achieved using the software Parcels (Van Sebille et al., 2021). The Lagrangian simulation uses a classic Runge-Kutta method for computing the advection of the particles (with 5 minutes integration time step). As for 115 the Argo data, we discard particles with a mean speed larger than 0.1 m s −1 . We also discard any particle crossing the 1,000 m isobath during the simulation.

Methods
In this section, we present a local example to introduce the methods that will be used in Sect. 4 for the global comparison. We start by comparing the Argo observations to the Lagrangian sampling of the HYCOM data. Next, we investigate the effects of 120 the drift by comparing Lagrangian and Eulerian samplings of the numerical simulation. We end the section by introducing the comparison between the Eulerian HYCOM time series and mooring observations.
3.1 Lagrangian sampling of the isopycnal displacement at 1,000 dbar As in Hennon et al. (2014) and Geoffroy and Nycander (2022), we define the vertical isotherm displacement observed by a Lagrangian particle η L 1000 as 125 η L 1000 (t) = where T is the time average of the temperature measurements T (t) over a particle trajectory, and (dT /dP ) 1000 (t) is the temperature gradient at 1,000 dbar. Hence, the vertical isotherm displacement is simply computed as the temperature anomaly recorded by a drifting particle at 1,000 dbar divided by the vertical temperature gradient at that depth. For HYCOM particles, we compute (dT /dP ) 1000 as the mean, between 900 and 1100 m, of the vertical gradient calculated from the modeled monthly-130 mean 3D temperature field. We then linearly interpolate the temperature gradient in the horizontal to the successive particle positions (hence the time dependence). To avoid any spurious displacements, we set η L 1000 to NaN whenever the magnitude of the temperature gradient is smaller than 3 × 10 −5 • C dbar −1 .
In the case of Argo floats, Eq. (1) is evaluated over each park phase. T then represents the time average of the temperature measurements T (t) over a park phase, and the vertical temperature gradient is estimated from the average of the two neigh-135 boring ascending profiles. Specifically, (dT /dP ) 1000 (t) is computed as the mean, within 100 dbar of the parking pressure, of the vertical gradient calculated from the average temperature profile. As for the HYCOM particles, we discarded the data from the whole park phase whenever the magnitude of the temperature gradient is smaller than 3 × 10 −5 • C dbar −1 . As explained in Sect. 2.1, η L 1000 time series from successive park phases are stitched together to constitute 32-day time series. Both for Argo floats and HYCOM Lagrangian particles, the low frequency background activity is filtered out from η L 1000 140 using a fourth-order Butterworth filter with a cut-off frequency of 0.3 cpd. The inertial peak is not removed by this filter poleward of about ±10 • of latitude.

Averaging sample autocovariance series
The HYCOM derived Lagrangian time series η L 1000 can be analyzed in the same way as in Geoffroy and Nycander (2022). We illustrate the mapping process below. The geographical patch presented here is also described in Geoffroy and Nycander 145 (2022). For this example area, Fig. 1 shows 8 32-day segments of Argo data and 13 HYCOM particles selected using their median position. The circular patch of radius 200 km containing the latter median positions is centered on the mooring No. 2. From a finite time series η(t) one can calculate the sample autocovarianceR(τ ): where N is the total number of observations, andη is the sample mean of the series. Note thatR(0) is the sample variance of 150 the series. Here, N is taken as the number of non-missing observations, accounting for gaps in the Argo time series (during the descent, main profiling, and surface transmission phases of the float cycle, or because of failed quality controls). The sample autocovariance as defined in Eq. (2) is an unbiased estimator of the "true" autocovariance (i.e.,R(τ ) converges to the true value R(τ ) for infinitely large N ). In the remainder of this article we only compute the variance and autocovariance of vertical isotherm displacement time series. For the sake of brevity, we simply refer to them as the variance and autocovariance.

155
The sample autocovariances for all HYCOM particles within the circular patch shown in Fig. 1 are averaged to obtain a local mean autocovariance R L HYCOM . The local mean autocovariance from the Argo data, R Argo , is computed in the same way. The standard error of the local mean autocovariance is computed as where STD is the standard deviation of the sample autocovariances over the subset, and N p is the number of particles in the 160 subset. Fig. 2a demonstrates the good agreement of the two datasets until τ ≈ 200 h. Past this time lag limit, R Argo falls under its 95% confidence interval (95% C.I. = ±2 SEM) and thus cannot be considered significantly different from zero.
A handy tool for monitoring the evolution of the autocovariance of the semidiurnal IT is complex demodulation. Here, it consists in the least squares fitting of A cos(ω SD τ ) + B sin(ω SD τ ), where ω SD = (ω M2 + ω S2 )/2 is the semidiurnal frequency, and Φ are the amplitude and phase, respectively. We then define the complex demodulate as C = √ A 2 + B 2 . This positive definite amplitude follows the envelope of the modulated sinusoidal with frequency ω SD . Note that this definition differs from the complex demodulation used in Geoffroy and Nycander (2022), where the authors fitted C cos(ω SD τ ), i.e. with Φ = 0. In practice, we compute the complex demodulate of the autocovariance following the harmonic analysis method of Cherniawsky et al. (2001). The demodulation over a given 48-h window is performed in two iterations. We first fit A cos(ω SD τ )+B sin(ω SD τ ) 170 to the signal and compute the root mean square error (RMSE). We then repeat the fitting with a trimmed signal that excludes values outside of ±2 RMSE from the previously fitted curve.
Complex demodulation is just a convenient way of finding the envelope of the sample estimate of an underlying true oscillating function at a given frequency. However, as an estimate of the envelope of the true oscillating function, the complex demodulate can be shown to be biased high (see Appendix A). This bias is greatly mitigated (i) at short time lags, and (ii) when 175 the sample size is large (e.g., when demodulating regional or global mean autocovariances). Throughout this paper we only rely on complex demodulation in one case or the other. A conservative estimate of the uncertainty of the envelope of the sample function (i.e., the uncertainty of the demodulate) is the uncertainty of the sample function itself. If this uncertainty range is larger than the envelope of the sample function, the conclusion is not that the envelope of the true function can be negative, but that it is not significantly different from zero. Here, we evaluate the uncertainty affecting the complex demodulate of a mean 180 autocovariance series over a 48-h window by computing the median of the standard error over that window, hereinafter denoted SEM. The corresponding 95% confidence intervals are taken as ±2 SEM.
Following Geoffroy and Nycander (2022), the semidiurnal IT variance can be estimated from the first 48-h demodulate at the semidiurnal frequency. In Appendix B we investigate the potential contamination of the first demodulate by the non-tidal variability (noise). We found that a background noise can contribute either positively or negatively (a consequence of filtering 185 the time series) to the amplitude of the first demodulate, depending on its characteristic timescale. Notwithstanding, for typical signal-to-noise ratio and timescale of the background noise observed by Argo floats, the first demodulate was seen to remain a conservative estimate of the IT variance. The effects of the non-tidal variability on the IT variance estimate as computed in Geoffroy and Nycander (2022) thus do not put their results into question.
Since HYCOM does not resolve the full spectrum of the oceanic variability, in particular the variability associated with short 190 timescales, the corresponding stochastic noise is expected to differ from the one captured by the in situ data (both in terms of variance and characteristic timescale). The contamination of the first demodulate by this noise would therefore account for a systematic bias when comparing the simulated data with observations. To limit this, we chose to consistently subtract an estimate of the autocovariance of the non-tidal variability from the sample mean autocovariance before computing the complex demodulates. The way we obtain such an estimate is described in Appendix C. Namely, we fit a model of the autocovariance 195 of a tidal variability on top of a background noise to the sample autocovariance by nonlinear least squares. Besides the noise parameters, this also provides us with an estimate of the tidal variance. Still, in this work, we chose to perform the comparisons in terms of the first demodulate for it is a conservative and more robust estimate of the IT variance.
The result of the complex demodulation at the semidiurnal frequency applied to our two (noise corrected) mean autocovariance series is presented in Fig. 2b

Eulerian perspective
The decay with time lag of the demodulates represented as red crosses in Fig. 2b mirrors the decorrelation captured by the 205 Lagrangian sampling of the Argo floats. The motion of the instruments results in decorrelating effects that cannot be isolated from the loss of coherence of the IT. However, by analyzing the HYCOM data within a Eulerian framework, one can directly monitor the decorrelation of the IT. The Eulerian HYCOM time series can then be compared with observations from moorings.
In the Eulerian framework, our methods remain practically unchanged. We now define the vertical isotherm displacement at a given location as where T (x, y) is the time average of the temperature field T (x, y, t), and (dT /dP ) 1000 (x, y) is the temperature gradient at 1,000 m, calculated from the modeled monthly-mean 3D temperature field. As for the Lagrangian time series, we apply a fourth-order Butterworth high-pass filter with a cut-off frequency of 0.3 cpd on η E 1000 . For each HYCOM particle, we compute 32-day long time series of η E 1000 at the successive positions of the particle subsam-215 pled at a 12-h rate. We then calculate the sample autocovariance from each of these time series and average the results over the particle's trajectory. As in the Lagrangian framework, the resulting averages can be averaged over different particles to compute local and global mean autocovariance series. By estimating the Eulerian autocovariance along the Lagrangian trajectories we account for the geographic variability of the IT. Also, our Eulerian sample autocovariance estimates contain many more degrees of freedom than their Lagrangian equivalents; As a result, the uncertainty affecting the Eulerian autocovariance 220 estimates is much smaller.
Eq. (4) is also used to compute vertical displacement time series from the mooring temperature records. Here, the vertical temperature gradient is computed from the annual mean climatology (WOA18; Boyer et al., 2018) as an average of the temperature gradient within 100 m of the instruments' depth. As for the HYCOM and Argo data, we discarded instruments for which the magnitude of the temperature gradient is smaller than 3×10 −5 • C dbar −1 . Each mooring time series is split into successive 225 32-day segments. We then compute the sample autocovariance for each high pass filtered segment and average them to obtain a mean autocovariance.
The Eulerian sampling serves two main purposes: (i) validating the variance of the IT measured by the Lagrangian particles, and (ii) comparing the decorrelation of the IT in the HYCOM data with mooring observations. We illustrate these two aspects for the local example introduced in Sect. 3.2 in Fig. 3 and 4, respectively. 230 Fig. 3 shows the local mean autocovariance series at 1,000 dbar computed from both the Lagrangian η L 1000 (solid black curve) and Eulerian η E 1000 (solid red curve). As expected, the two autocovariance series demonstrate a close agreement at short time lags, before the motion of the particles causes the Lagrangian R  We bin the global collection of HYCOM particles based on their median position, using circular geographical patches of radius 200 km centered on a regular 2.5 • ×2.5 • grid. Here, we use only particles for which (i) the mean speed is lower than 0.1 m s −1 (to avoid contamination by lee waves), and (ii) the variance of η L 1000 is lower than 1 × 10 4 m 2 . The second criterion accounts for 41 discarded particles (out of 41,644) that we consider as outliers, most of which are located in the Labrador Basin or in shallow water close to Antarctica. Including it instead would only marginally affect our results. We did not investigate more 245 the cause of these extreme values.
As in Sect. 3.2 and 3.3, we compute an autocovariance series for each HYCOM particle, in the Lagrangian and Eulerian framework, separately. Then, we average these autocovariances over the corresponding patches to obtain local mean autocovariance series R L HYCOM and R E HYCOM . For each geographical patch, we get local estimates of the semidiurnal IT variance from the first 48-h complex demodulate of R L HYCOM and R E HYCOM . 250 We start by checking how the Lagrangian sampling affects the semidiurnal IT variance. Fig. 5 shows the first 48-h complex demodulate of R E HYCOM plotted as a function of the first demodulate of R L HYCOM for our collection of geographical patches. The agreement is close to perfect (with a Pearson's r squared, or coefficient of determination r 2 ≈ 0.99 and 0.76 in log-log and linear domain, respectively). Thus, the motion of the Lagrangian particles, and therefore of the Argo floats, have no significant impact on the measured variance of the IT. 255 We can then map the semidiurnal IT variance (here taken as the first 48-h complex demodulate of R L HYCOM ) and the associated SEM on a 2.5 • ×2.5 • grid (see Fig. 6a and 6b, respectively). In each figure we show only the bins which yield an IT variance larger than one SEM. Note that the latter criterion is less binding than the one used for the global maps in Geoffroy and Nycander (2022) since the uncertainties are smaller here, by definition. Fig. 6a can be directly compared with the global map of the semidiurnal IT variance computed from Argo data (see Fig. 6c and 6d, updated from Geoffroy and Nycander, 2022). As 260 documented in Buijsman et al. (2020), HYCOM is known to be subject to a thermobaric instability (TBI) in the north Pacific  (dashed black rectangle in Fig. 6a). Since only a few patches of Argo data are available at the edge of this TBI area, we do not exclude it from the subsequent analysis.
The main patterns visible in Fig. 6a and Fig. 6c broadly agree, with energy radiating away from low-mode IT generation hotspots, namely near Madagascar, Hawaii, and the tropical south and southwest Pacific. In Fig. 7a we show the Argo derived 265 semidiurnal IT variance plotted as a function of the corresponding one from HYCOM. The two datasets are well correlated (r 2 = 0.52 and 0.38 in log-log and linear domain, respectively), but the semidiurnal IT variance is systematically smaller in HYCOM than in the Argo data. We investigate this bias by looking at the geographical distribution of the HYCOM to Argo semidiurnal IT variance ratio.
For presentation purposes, instead of the latter ratio we plot the proxy σ 2 HYCOM /(σ 2 HYCOM + σ 2 Argo ) in Fig  Atlas of σ 2 HYCOM, SD /(σ 2 HYCOM, SD + σ 2 Argo, SD ), a proxy for the HYCOM to Argo semidiurnal IT variance ratio at 1,000 dbar. The value of this proxy is close to 0 where σ 2 HYCOM, SD σ 2 Argo, SD , 0.5 where σ 2 HYCOM, SD ∼ σ 2 Argo, SD , and 1 where σ 2 HYCOM, SD σ 2 Argo, SD . The ocean mask is colored in yellow for readability. (c) Zonal mean of the semidiurnal IT variance from Argo (dash-dotted black curve) and HYCOM (solid black curve) as a function of latitude and their respective 95% C.I. (light green and gray shading, respectively). The red curve represents the number of geographical bins used to compute the zonal mean at a given latitude (red axis on the right). The vertical dashed red lines are placed at 50 • S and 50 • N.
The representativity of the zonal mean variances is smaller north of 40 • N, because of the scarcity of available data (solid 280 red curve in Fig. 7c). We therefore focus on the pronounced discrepancy affecting the Southern Ocean. We gather the data available over the unmasked bins in Fig. 7b into two groups, north and south of 50 • S. For each group and the global collection of bins, we compute a mean autocovariance by averaging the corresponding Argo and HYCOM local mean autocovariance series (see Fig. 8). In Table 1, we summarize the semidiurnal IT variance values computed as the first 48-h demodulate of the mean autocovariance for each group. A significant fraction of the divergence visible in the global mean autocovariance at short 285 time lags (see Fig. 8a and 8b) can be explained by the larger discrepancy south of 50 • S (see Fig. 8e and 8f). In this region, the Argo derived semidiurnal IT variance is close to 3.5 times larger than the simulated one. The agreement is better north of 50 • S, where the corresponding factor is 1.5 (see Fig. 8c and 8d). In a separate section we will discuss possible explanations for both the lower semidiurnal IT variance in HYCOM globally and the even lower simulated IT variance in the Southern Ocean.   In contrast to Argo floats, the Eulerian sampling of moorings allows us to directly monitor the decorrelation of the IT. In a procedure similar to that in Sect. 4.1, we bin the global collection of HYCOM particles based on their median position, but this time using geographical patches (of radius 200 km) centered on the available moorings. We compute a sample autocovariance in the Eulerian framework for each particle, and average them within the corresponding patches to obtain local mean autocovariance series R E HYCOM . The mooring time series in a given patch is split into successive 32-day segments. We then compute a 295 sample autocovariance for each segment and average them to obtain a mean autocovariance R Moor . Again, we use only particles for which (i) the mean speed is lower than 0.1 m s −1 (to avoid contamination by lee waves), and (ii) the variance of η L 1000 is lower than 1 × 10 4 m 2 (outliers). After some additional quality controls, mainly discarding bins where either the mooring or HYCOM complex demodulates fall under one SEM within 15 days of time lag, we are left with 172 moored instruments and the corresponding patches of simulated data.

300
For all the datasets used in this study, we found the probability distribution of the global collection of local mean autocovariance at any given time lag to be skewed (not shown). This is not an issue when computing average statistics from the Argo or the corresponding simulated data, since the number of samples (i.e., the number of geographical bins) is very large and the sample mean is therefore expected to be normally distributed, by virtue of the central limit theorem. In contrast, geographical bins where mooring data are available are fewer. Thus, the influence of the tail of the distribution on the sample mean is larger 305 when analyzing the relatively small collection of bins where mooring data are available than when considering the global collections of Argo and HYCOM data. This precludes the use of statistics that assume a normal distribution when describing regional or even global averages of the autocovariances computed from moorings. To limit the effects of skewness, we discard bins for which either the mooring or the simulated first 48-h demodulate is above the 95 th percentile of its observed distribution (here P 95 Moor ≈ 575 m 2 and P 95 HYCOM ≈ 330 m 2 , respectively). The latter criterion accounts for 12 additional discarded bins, 310 all located in moderate to strong mesoscale activity areas. Even after discarding these extreme samples, the data remain highly skewed.
The moorings may not offer as much spatial coverage as the Argo floats do, but they still provide an opportunity to validate the geographical variability of the semidiurnal IT variance in HYCOM. As in Sect. 4.1, we can get local estimates of the semidiurnal IT variance from R E HYCOM and R Moor in each geographical patch centered on a mooring. Fig. 9a shows the scatter 315 plot of the first 48-h complex demodulate of R E HYCOM as a function of the first demodulate of R Moor for our collection of geographical bins. As with the Lagrangian data (see Fig. 7a), the correlation is good (r 2 = 0.51 and 0.40 in log-log and linear domain, respectively), but the semidiurnal IT variance is systematically smaller in HYCOM than in the mooring data. In Fig. 9b and 9c we show the geographical location of the patches along with a proxy for the HYCOM to mooring semidiurnal IT variance ratio, and the histogram of this proxy, respectively. The latter histogram shows that the distribution of the proxy is 320 centered around 0.37, corresponding to a ratio of 0.6, approximately.
To measure the strength of the decorrelation affecting the Eulerian mean autocovariances, we define the semidiurnal coherent variance fraction (SCVF 15 ) as the ratio between the 48-h demodulate at τ ≈ 15 days (i.e., one spring-neap period) and the first demodulate. The SCVF 15 being calculated from the demodulate at relatively long time lags, it is meaningful only when estimated from a large enough sample size (i.e., with minimal uncertainty). Therefore, we start by considering all the data 325 available over our global collection of geographical bins.  We investigate a potential latitudinal dependence by plotting the mean autocovariance series computed separately from the geographical bins lying north (149 instruments) and south (11 instruments) of 50 • S (see Fig. 10c and 10d, and Fig. 10e and   335 10f, respectively). The mean semidiurnal IT variance and the mean SCVF 15 for each group are shown in Table 2. Again, the divergence between the two datasets appears enhanced south of 50 • S. Moreover, the SCVF 15 is larger in the Southern Ocean than elsewhere for HYCOM, but smaller for the moorings. This indicates a weaker, or slower, decorrelation of the IT in HYCOM in the Southern Ocean than in the global ocean.
The slower decorrelation of the IT in the simulation can be explained by some decorrelating processes, such as eddies or 340 submesoscale variability, being weaker in HYCOM than in the real ocean. It could also be explained by the time variability of certain decorrelating processes. Our numerical simulation only spans May 20 to June 20, 2019. Therefore, it is potentially missing processes that would specifically occur or intensify at another time of the year (or in a different year). On the other hand, the mooring data span several decades. Thus, our single month of data from HYCOM may not be representative of the broader temporal sampling of the mooring data.

345
The spatial distribution of the moorings is sparse and tends to be denser in particular areas (e.g. the Gulf Stream region). This is all the more true south of 50 • S , where much fewer moorings are available than in the northern region, and where they are mostly located in the Drake Passage. For both these reasons, the mean autocovariances computed from moorings north and especially south of 50 • S cannot be considered truly representative of the IT in these vast regions. Nonetheless, they remain representative of the collection of geographical bins used to construct them.
350 Table 2. Summarizing numerics of the autocovariance series plotted in Fig. 10

Apparent decorrelation
In Sect. 4.1 we demonstrated that the motion of the floats has no significant impact on the measured total variance of the IT.
However, the Doppler effect and spatial decorrelation induced by the Lagrangian sampling of the floats both act as decorrelating processes causing the autocovariance to decay with increasing time lag (Geoffroy and Nycander, 2022). Following Caspar-Cohen et al. (2022), we call this mechanism apparent decorrelation, as it is unrelated to the decorrelation of the IT. Geoffroy 355 and Nycander (2022) estimated the apparent decorrelation timescale to be longer than that of the autocovariance function observed by Argo floats, on a global average. Thus, they concluded that the decay of the global mean autocovariance observed by Argo floats is primarily due to the decorrelation of the IT.
The HYCOM data allow to study this by directly comparing the global mean autocovariance computed in the Lagrangian and Eulerian frameworks (see Fig. 11a and 11b). As expected, the Lagrangian (R L HYCOM , solid black curve in Fig. 11a) and Eulerian

360
(R E HYCOM , solid red curve) global mean autocovariances are virtually identical until τ ≈ 50 h. Past this limit, the more rapid decay of R L HYCOM can only be caused by the apparent decorrelation, while R E HYCOM continues to solely reflect the decorrelation of the IT. Our aim in the present section is to find estimates for the characteristic timescales of the apparent decorrelation and decorrelation of the IT in HYCOM and compare it with observations. complex demodulate at the semidiurnal frequency of the Eulerian autocovariance computed over a time lag window centered on τ : Here, σ 2 SD is the total semidiurnal internal tide variance, α is the stationary fraction, T is the IT decorrelation timescale, and σ 2 AM and ω AM are the variance and angular frequency of an amplitude modulating sinusoidal, respectively. A heuristic model 370 for the demodulate of the corresponding Lagrangian autocovariance is then where T app is the apparent decorrelation timescale.
A constrained least squares fit of the model Eq. (5) to the complex demodulate series computed from R E HYCOM (red crosses in Fig. 11b) yields T = 94 h. The fitted model as well as the exponential decay due to the IT decorrelation (i.e., only the term 375 proportional to σ 2 SD ) are represented in Fig. 11b as the solid and dashed red curves, respectively. On a global average, 95% of the decorrelation of the nonstationary IT in HYCOM is therefore achieved within 3T ≈ 300 h (for exp(−3) ≈ 0.05), i.e. well under the 32 days of data.
By dividing R L HYCOM by R E HYCOM , we can isolate the effects of the apparent decorrelation on R L HYCOM . In Fig. 11c we plot this ratio after applying a median filter with a window of 18 h (solid black curve). A least squares fit of a simple decaying 380 exponential to the obtained curve yields an apparent decorrelation timescale T app = 209 h (dashed black curve). For verification, we compute the Lagrangian C L SD from Eq. (6), by multiplying the fitted model of C E SD (solid red curve Fig. 11b) by the fitted exponential decay due to the apparent decorrelation (dashed black curve in Fig. 11c). The result (solid black curve in Fig. 11b) closely follows the demodulated Lagrangian autocovariance (black crosses in Fig. 11b). The different parameters obtained from the Eulerian and Lagrangian simulated data are summarized in Table 3. Note that the fitted amplitude modulating 385 sinusoid is close to the spring-neap cycle, here ω AM = |ω M2 − ω S2 | ≈ 0.0177 h −1 . Table 3. Summary of the parameters estimated from the simulated Eulerian and Lagrangian data. These values are used to compute C L SD from the model Eq. (6) (solid black curve in Fig. 10b). We can compare the global mean values of T and T app obtained from the simulation with the Argo observations. The geographical coverage of the Argo data is less than that of HYCOM (roughly 55% in area). Hence, instead of the global collections of floats and Lagrangian particles, we now consider the intersection of the two datasets taken as the unmasked bins shown in Fig. 7b and the corresponding local mean autocovariances. These local mean autocovariances are averaged further 390 to obtain a global mean autocovariance for each dataset that is representative of a common area. We then fit the model Eq. (6) and (5) to the demodulates of the Argo global mean autocovariance (see Fig. 12, updated from Geoffroy and Nycander, 2022).
The parameters fitted to the Argo observations, as well as the parameters obtained by repeating the above procedure for the corresponding simulated data, are gathered in Table 4. The values of T app and T computed here, from Argo data only, are similar to those reported by Geoffroy and Nycander (2022), where the authors estimated it from a comparable collection of Argo floats and HRET (in particular, the stationary limit was determined from HRET instead of by fitting). The fitted stationary fraction α, however, is about 3 times larger than previously reported. This could be explained by a biased low stationary variance (obtained by projecting HRET at 1,000 dbar) in Geoffroy and Nycander (2022). As from the simulated data, fitting our model to the Argo global mean demodulates yield a T shorter than T app . While the Argo T app is almost identical to the HYCOM value, T is about twice smaller.
400 Figure 12. Global mean autocovariance at 1,000 dbar computed from the Argo data over the unmasked bins in Fig. 7b (solid black curve), and corresponding complex demodulates at the semidiurnal frequency (black crosses). The uncertainties are vanishingly small (not shown).
The solid red curve represents the result of fitting the model Eq. (6) and (5) to the demodulates.
Similarly to Geoffroy and Nycander (2022), we conclude that the decorrelation of the IT is more rapid than the apparent decorrelation on a global average. Yet, according to the simulated data, it is only more rapid by a factor of 2, while this factor is 4 according to the Argo data. Hence, some decorrelating processes appear to be weaker in the simulation than in the real ocean. As explained in Sect. 4.2, the slower decorrelation of the IT in HYCOM could also be explained by certain decorrelating processes being weaker than usual in the May 20 to June 20, 2019 period of outputs used here. Nevertheless, the decorrelation 405 of the IT typically is at least as important as the apparent decorrelation over the first few days of time lag. This result might not hold true everywhere, since the geographical variability of T and α is expected to be large.
Lastly, Zaron (2022) provides a valuable comparison point for the results above. Along the same lines as in this section, he uses the spatially averaged frequency spectrum of sea level observations from satellite altimetry to measure properties of both the baroclinic tidal peak and continuum (i.e., the coherent and incoherent IT variability, respectively). For the semidiurnal band, 410 in the latitude range from 30 • S to 30 • N, the author found that about 62% of the IT variance eventually becomes decorrelated within 34 d. Assuming an exponential decay as in the model Eq. (5), this would represent a stationary fraction α ≈ 0.38 and a decorrelation timescale T ≈ 163 h (for 99% of the nonstationary IT to decorrelate in 5T ). Repeating the fitting procedure above for data limited to the ±30 • latitude range, we recover values in reasonable agreement from the Argo observations: α ≈ 0.29 and T ≈ 144 h (see Table 4). The HYCOM fitted α ≈ 0.64 and T ≈ 105 h agree less well with the results of Zaron (2022), but 415 still are in the same ballpark. In this latitude range, the bins used for the fitting represent roughly 45% of the oceanic area. Table 4. Parameters from the fitting of the model Eq. (6) to the demodulates of (i) the global mean autocovariance computed from the Argo data over the unmasked bins in Fig. 7b (see Fig. 12), and (ii) the mean autocovariance computed from the same bins but limited to the latitude range from 30 • S to 30 • N. These parameters are also estimated from the simulated Eulerian and Lagrangian data in the same locations.

Why is the IT variance lower and IT decorrelation weaker in HYCOM than in the observations, particularly in the Southern
Ocean ? At the time of writing we cannot think of a particular reason for either the Argo or the mooring derived IT variance to be biased high globally. In particular, the correction accounting for the non-tidal variability we systematically subtract from 420 the sample autocovariance precludes any contamination of the first demodulate by the background noise (see Appendix C).
We also investigated whether the bias in the Southern Ocean could be related to the contamination of the first 48-h demodulate at ω SD by near-inertial waves as we approach the M 2 critical latitude (where f = ω M2 , at about 74 • S). To check this, we can map the semidiurnal IT variance from the Argo data anew (as shown in Fig. 6c), this time fitting an additional cos(f τ ), where f is the local Coriolis frequency, to the local mean autocovariances. The result of the fit becomes unstable at 74 • S, but 425 the zonal mean of the demodulates at f does reach a maximum around 60 • S while the zonal mean of the demodulates at ω SD remains unaffected (not shown). We conclude that the first 48-h demodulate at the semidiurnal frequency is not affected by near-inertial waves, at any latitude.
As for the model, the horizontal grid spacing limits the number of vertical modes correctly resolved to the first five modes, equatorward of ±50 • latitude and for seafloor depths deeper than 1250 m (Buijsman et al., 2020). Approaching the poles, the 430 number of model layers below the mixed layer, and hence the vertical resolution, decreases. This further limits the number of resolved modes in HYCOM south of 50 • S: roughly, modes 3 and 2 are only partially resolved poleward of 60 • S and 65 • S, respectively. Although the bulk of the IT variance at 1,000 dbar is likely related to mode-1 waves on a global average (Geoffroy and Nycander, 2022), the contribution from higher modes can become significant locally. In principle, both Argo floats, moorings, and HYCOM data include the effect of higher modes, but these are less well resolved in HYCOM, particularly 435 in the Southern Ocean. Together with the assumption that higher-mode IT are less coherent (Egbert and Ray, 2017), this may explain the lower IT variance and weaker IT decorrelation in HYCOM than in the observations, both on a global average and more specifically in the Southern Ocean. It is also in line with the larger mean SCVF 15 computed from HYCOM data in the Southern Ocean compared with the rest of the globe (indicating a weaker decorrelation there), whereas for moorings the SCVF 15 is smaller in this region (see Table 2).

440
The mode-m vertical structure of the isopycnal displacement Φ m (z) is obtained by solving the Sturm-Liouville problem with the boundary conditions Φ m (0) = Φ m (−H) = 0, where H is the ocean depth, N (z) is the buoyancy frequency profile, and −1/c 2 m is the eigenvalue corresponding to the eigenfunction Φ m (z) for mode-m (Gill, 1982). The modal partitioning of the IT energy at a given location is mainly determined by the conversion rate (both local and remote) and lifetime of each mode 445 (de Lavergne et al., 2019). Additionally, depending on the local stratification and ocean depth, the parking depth at 1,000 dbar can be more or less close to the anti-node (point of maximal displacement) and node (point of no displacement) of the different vertical modes. Therefore, the normalized contribution of mode-m relative to mode-1 waves to the variance recorded at this depth is weighted by a coefficient γ m1 . In Appendix B of Geoffroy and Nycander (2022), the authors derived an expression for this coefficient: In Fig. 13a and 13b we plot the global maps of γ 21 and γ 31 at 1,000 dbar, respectively, computed from Eq. (8)  low HYCOM to Argo IT variance ratios (darker blue pixels in Fig. 7b) spatially coincide with large γ 31 , mostly, or γ 21 to some extent (especially in the Weddell Sea). South of 60 • S, either γ 31 or γ 21 is larger than unity, hence the contribution from mode-3 or mode-2 waves to the IT variance recorded at 1,000 dbar is magnified compared to the contribution from mode-1 waves. However, modes 2 and higher are not well resolved in HYCOM at these latitudes.
The magnification of the contributions from modes 2 and 3 to the variance at 1,000 dbar in the Southern Ocean only affects 460 how propagating IT are perceived at the Argo parking depth. This has no connection with the generation and dissipation processes that set the underlying modal partitioning of the IT energy. Additional explanations might be found by examining whether the main parameters affecting the generation of IT (namely the bottom topography, barotropic tidal forcing, and stratification) are less accurate in this region than in the rest of the globe.
The pattern of the enhanced discrepancies between Argo and HYCOM in the Southern Ocean (darker blue in Fig. 7 been mapped using shipborne techniques. Therefore, global bathymetric products widely rely on depth prediction from satellite gravimetry (Smith and Sandwell, 1994). Small scale features such as abyssal hills are not resolved by this technique. In the recent SRTM15+ bathymetry (Tozer et al., 2019), gravity predicted depths were estimated to have root-mean-square errors with respect to TPXO, a state of the art data-assimilative tide model (Egbert and Erofeeva, 2002). However, the tidal elevations in TPXO itself also have errors. Both Stammer et al. (2014) and Zaron and Elipot (2021) point at the imperfection of the modeled tidal elevations near Antarctica (using data from GRACE and CryoSat-2, respectively). On the other hand, it is not so much the sea surface height that matters here, but the tidal currents. Zaron and Elipot (2021) used surface drifter observations for evaluating TPXO predicted tidal currents throughout much of the global oceans. Unfortunately, the spatial 485 density of observations is too poor to evaluate the model performance around much of Antarctica.
Lastly, to assess the stratification in HYCOM, we compare the phase speed of a mode-1 gravity wave in the model with the phase speed determined from climatology. The phase speed of a mode-m gravity wave c m is directly related to the eigenvalue −1/c 2 m obtained by solving the Sturm-Liouville problem Eq. (7). Having already solved Eq. (7) for the climatology, we now solve it for our simulated monthly-mean 3D fields of temperature and salinity (subsampled to 1 • ). As for the climatology, the 490 HYCOM data were extended down to the reference bathymetry from GEBCO 2019, wherever deeper, by appending the stored bottom value. We then linearly interpolated the climatological phase speed of a mode-1 gravity wave c WOA 1 at 1/4 • resolution onto the coarser 1 • HYCOM grid.
In Fig. 14a we plot the climatology to HYCOM phase speed ratio. Most of the visible differences fall in the range of expected interanual variability of less than 10% (Chelton et al., 1998). However, in a few areas around Antarctica there are 495 larger departures of the ratio from unity. Fig. 14b shows the zonal mean of the mode-1 phase speed from Argo and HYCOM as a function of latitude. Equatorward of ±60 • , the two curves agree almost perfectly, and they also visually agree with the results of Chelton et al. (1998) (not shown). Poleward of 60 • S and 60 • N, however, differences steadily grow. In the Southern Ocean, the zonal mean climatology to HYCOM phase speed ratio peaks at 1.7 (see Fig. 14c). Typically, weaker stratification (smaller phase speed) results in smaller energy conversion.

Conclusions
In this work we compared a 32-day segment of a global run of the HYCOM model, including realistic tidal and atmospheric forcing, with in situ observations of the semidiurnal IT around 1,000 dbar. First, a Lagrangian sampling of the simulation was compared to park phase data from Argo floats to validate the geographical variability of the semidiurnal IT variance in HYCOM (see Sect. 4.1). Then, the Eulerian simulation outputs were directly compared to geographically sparser mooring 505 records, in terms of variance and decorrelation of the IT (see Sect. 4.2).
The main spatial patterns of the simulated IT variance at 1,000 dbar broadly agree with Argo observations, with energy radiating away from low-mode IT generation hotspots (see Fig. 6). Nonetheless, on a global average, the HYCOM data exhibit a smaller semidiurnal IT variance than observed by Argo floats, by a factor 0.58 (see Table 1). This is in line with the results of Ansong et al. (2017) and Luecke et al. (2020), who found HYCOM values to be biased low by a similar factor when comparing 510 the simulated M 2 IT energy flux and temperature variance to historical mooring observations. While the difference between the model and Argo data appears reasonably homogeneous across most of the world ocean, it steadily increases towards the poles (see Fig. 7). Because of the scarcity of Argo floats available in the northernmost region, we focused on the Southern Ocean. On average, south of 50 • S, we found that the simulated semidiurnal IT variance is smaller than the variance observed by Argo by a factor 0.29. North of 50 • S, this factor is 0.69 (see Table 1).

515
The mooring data support the above results for the semidiurnal IT variance. Additionally, we found that the decorrelation affecting the semidiurnal IT in HYCOM over a 32-day window is weaker than observed in the mooring records, on average (see Fig. 10 and Table 2). In other words, over timescales shorter than 32 days, the IT in HYCOM are more coherent than in observations. This weaker decorrelation of the IT in HYCOM can be explained by some decorrelating processes being weaker in the model than in the real ocean. It could also be explained by certain decorrelating processes being unusually weak in 520 the May 20 to June 20, 2019 period of outputs used in this work. Depending on the location, the complete decorrelation of the nonstationary IT is not systematically observable in a 32-day duration. Longer time series are thus needed to accurately describe the decorrelation of the IT. Nonetheless, we found that the semidiurnal IT autocovariance in HYCOM actually reaches its stationary limit within approximately 300 h on a global average, i.e. well under our 32 days of simulated data (see Fig. 11).
Put together, these results support the conclusions of Buijsman et al. (2020), who found that the stationary (i.e., the longterm 525 coherent) M 2 IT solution from HYCOM is too energetic compared with altimetry. Note that their comparison was based on one-year simulated time series corrected for the duration difference with 17-year long altimetry records.
We also investigated the effects of the Lagrangian sampling inherent to the Argo floats. When comparing autocovariances computed from the HYCOM data sampled in the Lagrangian and Eulerian frameworks, respectively, we found the IT variance to be unaffected in the mean (see Fig. 5). Moreover, the simulated apparent decorrelation (the decorrelation due to the motion 530 of a Lagrangian particle) agrees very well with the apparent decorrelation experienced by Argo floats, on a global average (see Sect. 4.3). The (Eulerian) decorrelation of the IT in the simulated data, on the other hand, typically is half as rapid as the one inferred from Argo observations. This would make the IT decorrelation at least as important as the apparent one over the first few days of time lag. However, the geographical variability of the duration and strength of the IT decorrelation is expected to be large. Limiting the comparison to the latitude range from 30 • S to 30 • N, the Argo and HYCOM data lead to an IT 535 decorrelation timescale of about 4.5 and 6 d, respectively (see Table 4), in reasonable agreement with the results of Zaron (2022).
Finally, we discussed the potential sources of bias. We could not think of a particular reason for the IT variance obtained from either the Argo or the mooring data to be biased high, particularly in the Southern Ocean. However, HYCOM is subject to various limitations. First and foremost, the model can only correctly resolve vertical modes up to 5 in most of the global 540 oceans. Approaching the poles, the reduced number of layers further limits the number of resolved modes. While mode-1 IT supposedly accounts for most of the tidal variability at 1,000 dbar on a global average (Geoffroy and Nycander, 2022), in situ instruments also record the contribution from higher modes that can become significant locally. This may explain why the IT variance is larger in the in situ data than in HYCOM, particularly in the Southern Ocean. Insufficient model stratification also seems to be a specific problem in that very region. We have not been able to quantitatively explain the overall smaller IT 545 variance in HYCOM than in the in situ data over the global ocean. In principle, it could be due to limitations in the bathymetry, barotropic tidal forcing, or the stratification.
Code and data availability. Argo data were obtained from U.S. GDAC (ftp://usgodae.org/pub/outgoing/argo). These data were collected and made freely available by the International Argo Program and the national programs that contribute to it. (https://argo.ucsd.edu, https: //www.ocean-ops.org). The Argo Program is part of the Global Ocean Observing System. An Argo Iridium float list is maintained by
There is no long term availability plan for the HYCOM data used in this work. Climatological data were obtained from the World Ocean Atlas 2018 (https://accession.nodc.noaa.gov/NCEI-WOA18). Bathymetric data were obtained from the GEBCO Compilation Group (https://www.
Appendix A: Bias of the complex demodulate of a sample mean function 560 We consider the function R(τ ) on a 48-h interval. This is fitted to the function A cos(ωτ ) + B sin(ωτ ), and we then define This procedure is a mapping from the space of functions R to the positive number C, i.e. a functional. We can write this symbolically as

565
where Φ is the functional.
Φ obviously has the property where a is a constant. Φ also has the property (proof hereafter) 570 with equality only if R 1 (τ ) and R 2 (τ ) are proportional, i.e. perfectly correlated with positive correlation.
Let {R i (τ )} be an infinite series of functions drawn from a stochastic distribution. We have access to only N of these (typically around 10). Define the sample average is an unbiased estimate of the true average R(τ ): We also want an estimate of Φ R . Denote the true value by C ∞ : A sample estimate might be 580 However, (A4) implies that this is biased. For example, define We get with equality only if R N and R M are proportional. But R N and R M are stochastic functions drawn from the same distribution, 585 and therefore almost never proportional. On average we also have Φ R N = Φ R M , hence Eq. (A10) implies so that C N is a decreasing function of N . Thus, we expect C N to be larger than C ∞ . In other words, the complex demodulate of the sample function R N (τ ), i.e. the fitting defined in Eq. (A2), is a biased high estimate of the envelope of the true function R(τ ).

590
We now show the inequality Eq. (A4). It resembles the triangle inequality for two vectors: For 2D vectors a = (a 1 , a 2 ), b = (b 1 , b 2 ), this gives which is valid for arbitrary numbers a 1 , a 2 , b 1 , and b 2 .

595
Define a scalar product f, g between the functions f (τ ) and g(τ ) as the integral of f g over the 48-hour interval: Suppose that the period of sin(ωτ ) exactly fits this window (note that this is the case in the present work, where 4 × 2π/ω SD ≈ 48.8 h). Then sin(ωτ ) and cos(ωτ ) are orthogonal: cos(ωτ ), sin(ωτ ) = 0.

610
Appendix B: Effects of the non-tidal variability on the complex demodulates Two aspects of the processing originally used in Geoffroy and Nycander (2022) are important in mitigating a potential contamination of the first demodulate by the non-tidal variability (noise). (i) The demodulation over a given 48-h window is performed in two iterations. A cos(ω SD τ ) + B sin(ω SD τ ) is first fitted to the signal and the root mean square error (RMSE) is computed.
The fitting is then repeated with a trimmed signal that excludes values outside of ±2 RMSE from the previously fitted curve.

615
(ii) The time series are high-pass filtered prior to computing their sample autocovariance. Suppose the non-tidal variability ρ(t) is a first order autoregressive process (AR1) characterized by zero mean, the timescale τ ρ , and the white Gaussian noise ε ρ (t) with zero mean and variance σ 2 ε, ρ . The variance and autocovariance of an AR1 process are given by respectively. For very short τ ρ , ρ(t) is close to a white noise, i.e. showing a flat power spectrum. As τ ρ increases, ρ(t) resembles 620 a red noise with increasingly steep exponential decay in spectral space. Therefore, as τ ρ increases, an increasing fraction of σ 2 ρ is filtered out by the high-pass filter.
We can investigate this further using a simple model, adapted from Geoffroy and Nycander (2022), of a tidal variability on top of a background noise: Here, ω i and A i are the angular frequency, and the amplitude of the tidal constituent i, respectively, where i ∈ {M 2 , S 2 }. φ(t) and ρ(t) are AR1 processes representing random phase modulations and a non-tidal variability, respectively. These AR1 processes can be defined by their variance and characteristic time. We will vary σ 2 ρ and τ ρ (the variance and characteristic time of ρ(t), respectively) with the rest of the variables set to realistic values: the tidal variance σ 2 SD = (A 2 M2 + A 2 S2 )/2 = 40 m 2 , and σ 2 φ = 1.19 rad 2 and τ φ = 150 h (the variance and characteristic time of φ(t), respectively). Since the tidal variance is fixed, 630 varying σ 2 ρ is the same as varying the signal to noise ratio S/N = σ 2 SD /σ 2 ρ . For reference, in Appendix C we estimate the global distribution of the signal-to-noise ratio observed by Argo floats (not shown). This distribution is skewed, with median 0.8 and 5 th percentile 0.2, approximately. We also estimate the median τ ρ ≈ 1 h from the global collection of floats.
We start by varying τ ρ for a fixed variance value of σ 2 ρ = 200 m 2 , corresponding to the extreme S/N = 0.2. For a given value of τ ρ , we generate 1000 32-day long synthetic time series h(t) following the model Eq. (B2). The time series are high-pass filtered using a fourth-order Butterworth filter with a cut-off frequency of 0.3 cpd. We then compute a sample autocovariance from each filtered time series and average these to obtain a mean autocovariance. Finally we compute the 48-h demodulates of the mean autocovariance following the complex demodulation method described in Sect. 3.2.
In Fig. B1, we show the mean autocovariance and the corresponding 48-h demodulates for τ ρ = 0.1, 1, and 12 h. The value of the first demodulate for τ ρ = 0.1 is essentially the same as if there was no ρ(t) at all, namely C(τ = [0, 48]) ≈ 34 m 2 . For 640 longer τ ρ , the main difference with the previous curve can be easily visualized from the first few demodulates. In particular, the first demodulates C(τ = [0, 48]) ≈ 38 and 31 m 2 , corresponding to τ ρ = 1 and 12 h, are systematically biased high and low, respectively. We repeat the same experiment for τ ρ ranging from 0 to 48 h, and focusing on the effects of the noise on the first demodulate.
In Fig. B2, we plot the first demodulate as a function of τ ρ (solid curve). For any τ ρ , the first demodulate is seen to remain 645 lower than the true σ 2 SD = 40 m 2 . Moreover, two regimes can be identified: (i) for τ ρ shorter than 5 h, roughly, the contribution of the non-tidal variability to the first demodulate is positive and peaks for τ ρ = 1 h where C(τ = [0, 48]) ≈ 38 m 2 , and (ii) for longer τ ρ , this contribution is negative with a minimum C(τ = [0, 48]) ≈ 30 m 2 at τ ρ = 11 h. For a more typical S/N = 0.8, i.e. setting σ 2 ρ = 50 m 2 in our synthetic time series, we found a similar dependence of the first demodulate on τ ρ as previously but with a much smaller span in amplitude (dash-dotted curve in Fig. B2). The contribution of the non-tidal variability to  Hence, for the typical S/N = 0.8 and τ ρ = 1 observed by Argo floats, we estimate that the non-tidal variability can lead to a first demodulate being biased high by roughly 3%. For the same characteristic timescale but S/N = 0.2 (the 5 th percentile of the global S/N distribution sampled by Argo floats), this bias can reach 10%. Notwithstanding, even for such extreme S/N 655 value, the first demodulate is seen to remain a conservative estimate of the IT variance. The effects of the non-tidal variability on the IT variance estimate as computed in Geoffroy and Nycander (2022) thus do not put their results into question.
It is not straightforward to understand why the contribution of the non-tidal variability to the value of the first demodulate can be negative. Because of the way we defined ρ(t), i.e. an AR1 process with positive parameter, its autocovariance should remain positive. Oscillations of R ρ (τ ) are actually a consequence of the high-pass filter applied to the time series. We show 660 this in Fig. B3 where we plot the theoretical autocovariance of ρ(t) with τ ρ = 12 h and σ 2 ρ = 200 m 2 (solid black curve), and the sample mean autocovariance from 1000 non filtered and 1000 high-pass filtered synthetic time series (solid and dash-dotted red curve, respectively) constructed using the same parameters.
As mentioned in Sect. 3.2, we expect the stochastic background noise affecting the simulated data to be different from the one recorded by the in situ instruments. Therefore, to prevent any systematic bias in the comparisons presented in this work, 665 we consistently correct the sample autocovariance by subtracting an estimate of the autocovariance of the non-tidal variability before performing the complex demodulation (see Appendix C). We illustrate the effects of this correction in Fig. B4. Here, we proceed in the same way as for Fig. B2, but this time varying S/N (from 4 × 10 −2 to 4) for fixed τ ρ = 1 and 12 h. While the first demodulates of the non corrected mean autocovariances diverge for small S/N values (solid curves), the first demodulates of the noise corrected autocovariances are virtually constant (dash-dotted curves). Figure B3. Theoretical autocovariance of the AR1 process ρ(t) with τρ = 12 h and σ 2 ρ = 200 m 2 (solid black curve), and sample mean autocovariance computed from 1000 non filtered and 1000 high-pass filtered synthetic time series of ρ(t) (solid and dash-dotted red curve, respectively) constructed using the same parameters. Figure B4. First 48-h demodulate of the mean autocovariance computed from 1000 synthetic time series generated following the model Eq. (B2) as a function of S/N for fixed τρ = 1 (solid black curve) and 12 h (solid red curve). The first demodulates of the corresponding noise-corrected autocovariances are shown as the black and red dash-dotted curves, for τρ = 1 and 12 h, respectively.

Appendix C: Estimating the autocovariance of the non-tidal variability
In the present appendix, we derive a model for the autocovariance of a tidal variability on top of a high-pass filtered stochastic noise. In Appendix B we saw that the autocovariance of the non-tidal variability is affected by the high-pass filter we apply on the original time series (see Fig. B3). This consequence of filtering the time series was not taken into account by Geoffroy and Nycander (2022) in the model of the background noise they used.

675
Assume that the non-tidal variability ρ(t) is a first order autoregressive process (AR1) characterized by zero mean, the timescale τ ρ , and the white Gaussian noise ε ρ (t) with zero mean and variance σ 2 ε, ρ . In the time domain, the filtered ρ (t), can be modeled as the convolution of the unfiltered ρ(t) with the impulse response of the filter h(t). We are, however, interested in the autocovariance of ρ (t), R ρ (τ ), which is closely related to the power spectrum (one is the Fourier transform of the other).
Luckily, it is much simpler to perform this operation in the frequency domain, where the power spectrum at the output of a 680 linear filter, P ρ (z), is related to the power spectrum of the input stochastic process, P ρ (z), by P ρ (z) = |H(z)| 2 P ρ (z).
Here, H(z) is the system or transfer function of the filter, and the z-domain is limited to the unit circle z = exp(jω∆t), i.e. the z-transform is equivalent to the discrete Fourier transform, where ω is the angular frequency, and ∆t is the constant sampling interval (Kay and Marple, 1981). Moreover, the power spectrum of an AR1 process is a standard result: 685 P ρ (ω) = σ 2 ε, ρ ∆t 1 + exp(−2/τ ρ ) − 2 exp(−1/τ ρ ) cos(ω∆t) .
The Butterworth filter is linear in amplitude but not in phase. As a workaround, we applied a second order filter twice, once forward and once backward, to recover a 4 th order filter with zero phase. The transfer function of a second order high-pass Butterworth filter in the z-domain, H B2 (z), can be obtained from its Laplace transform:
Since ρ is real-valued, the autocovariance R ρ (τ ) is given by the inverse discrete Fourier cosine transform of its power spectrum P ρ (ω) (computed numerically).
As mentioned in Sect. 3.2, to limit any systematic bias we consistently estimate and subtract the autocovariance of the nontidal variability from the sample autocovariance before performing the complex demodulation. We obtain such an estimate 700 from the least squares fit of the model adapted from Geoffroy and Nycander (2022): corresponding to the autocovariance of a tidal variability affected by a background noise: φ i (t) is an AR1 process representing the random phase modulations affecting the constituent i. For the sake of simplicity, we assume that the decorrelating processes act in the same manner on neighboring tidal frequencies, that is, φ M2 (t) = φ S2 (t) and φ K1 (t) = φ O1 (t). Note that the high-pass filter used in this work was designed so as not to affect the tidal signal. ρ f (t) and ρ s (t) are high-pass filtered AR1 processes accounting for a fast and a slow non-tidal variability, respectively. All the AR1 processes are characterized by zero mean, the time scale τ X , and the white Gaussian noise ε X (t) with zero mean and variance 710 σ 2 ε, X , and X ∈ {φ, ρ f , ρ s }. The variance and autocovariance of an AR1 process are respectively. The model Eq. (C7) does not take into account the effects of the drift of the floats or Lagrangian particles.
We fit the full model Eq. (C7), containing 12 parameters, to the sample autocovariance by nonlinear least squares, with weights inversely proportional to the corresponding SEM, and imposed bounds on the parameters (since we only fit positive 715 timescales and variances). In particular, the timescales of the fast and slow non-tidal variability are restricted to the [0, 5] and [5,48] h range, respectively. The autocovariance of the non-tidal variability is then computed as using the fitted parameters. Lastly, R ρ (τ ) is subtracted from the sample autocovariance to correct for the effects of the filtered background noise.

720
The inclusion of a tidal variability in the model Eq. (C7) primarily aims at limiting the projection of the observed tidal variability onto our model for the background noise when fitting. Nonetheless, the fitted parameters can be used to compute an estimate of the semidiurnal IT variance σ 2 SD, LS = (A 2 M2 + A 2 S2 )/2, and further to estimate the signal-to-noise ratio characterizing the filtered Argo records as S/N = σ 2 SD, LS /R ρ (0). In Fig. C1 we show maps of σ 2 SD, LS , R ρ (0), and S/N computed from the same collection of local mean autocovariances as used in Fig. 6c. Note that σ 2 SD, LS and R ρ (0) are not correlated (r 2 = 0.13 725 and 0.009 in log-log and linear domain, respectively). The least squares fitted σ 2 SD, LS appear reasonable when compared with the first demodulates presented in Fig. 6c. The coefficient of determination r 2 = 0.53 and 0.29 in log-log and linear domain, respectively, indicates a good agreement between the two estimates. Moreover, the global mean and median of their ratio are 0.8 and 0.7, respectively, accounting for the decorrelation of the IT taking place in the first 48 h. For such local mean autocovariances, however, the uncertainties are simply to high to reliably estimate the parameters of the decorrelating process φ i (t).

730
Finally, we chose to perform the comparisons presented in this work in terms of the first demodulate, for it is a conservative and more robust estimate of the IT variance.   Fig. 6c. (b) Non-tidal variance from the same fitting as in (a). (c) Signal-to-noise ratio computed as the ratio of (a) over (b). The ocean mask is colored in yellow for readability.
Competing interests. The authors declare that they have no conflict of interest.
by the Office of Naval Research (ONR) grant numbers N00014-19-1-2704 and N00014-19-1-2712, respectively, which both fall under the project "Modeling, characterizing, and predicting effects of internal gravity waves on acoustic propagation on basin to global scales". JFS was supported by ONR grant number N0001422WX01919, which falls under the project " Diagnosis and validation of the time and spatial variability of remotely generated internal waves in global ocean simulations". The authors would like to thank Shane Elipot and one additional anonymous reviewer who greatly helped improve this manuscript.