An evaluation of eight global ocean reanalyses for the Northeast U.S. Continental Shelf

The Northeast U.S. Continental Shelf (NES), extending from the Gulf of Maine to Cape Hatteras, is a dynamic region supporting some of the most commercially valuable fisheries in the world. This study aims to provide a systematic assessment of eight widely used, intermediate-to-high spatial resolution global ocean reanalysis products (CFSR, ECCO, ORAS, SODA, BRAN, GLORYS, GOFS3.0, and GOFS3.1) against available in situ and satellite ocean observations. In situ observations include water level from tide gauges


Introduction
The Northeast U.S. Shelf (NES), extending from the Gulf of Maine to Cape Hatteras, North Carolina (Fig. 1), is a dynamic region supporting some of the most commercially valuable fisheries in the world (Hare et al., 2016). The region receives cold and fresh water that originates in the Arctic along with the accumulation of coastal discharge and ice melt that has been advected thousands of kilometers along the western boundary of the North Atlantic (Chapman and Beardsley, 1989; Townsend et al., 2015; Fratantoni and Pickart, 2007; Richaud et al., 2016). Warm and salty water advected by the Gulf Stream also influences the composition of water masses within the NES. As an example, Gulf Stream warm core rings provide episodic impingement of slope and offshore waters onto the shelf, which can significantly change the hydrography and circulation on the NES (e.g., Joyce et al., 1984; Chen et al., 2014a,b; Ullman et al., 2014; Zhang and Gawarkiewicz, 2015). Separating the shelf and slope water is a thermohaline front, which is dynamically trapped along the shelfbreak (Gawarkiewicz and Chapman, 1992; Chapman and Lentz, 1994; Chapman, 2000). Bathymetry in the NES region is variable and complex, consisting of broad shallow banks (e.g., Georges Bank and Nantucket Shoals), isolated deep basins and channels in the Gulf of Maine (e.g., Northeast Channel and Great South Channel), and a shelfbreak which shoals dramatically from 300 m near the Gulf of Maine to 50 m off Cape Hatteras (Fig. 1). These bathymetric features place strong constraints on the large-scale circulation and result in local modifications to the basic hydrographic structure that might not be well represented in global reanalyses with insufficient spatial resolution.
The NES region has been experiencing rapid warming (Pershing et al., 2015; Goncalves Neto et al., 2021; Seidov et al., 2021), frequent and intense marine heatwaves (e.g., Chen et al., 2014a; Chen et al., 2015; Gawarkiewicz et al., 2019; Chen et al., 2022), and rapid sea level rise (e.g., Sallenger et al., 2012; Piecuch et al., 2018). Off the shelf, significant changes have been reported in the position, meandering character, and frequency of eddy formation by the Gulf Stream (e.g., Andres, 2016; Gangopadhyay et al., 2019). Understanding the impacts of these changes on shelf habitat is challenged in part by a lack of continuous high-resolution ocean observations spanning the NES. Therefore, combining ocean observations and models is necessary to assess the impacts of ocean change on the marine ecosystem.
Given their limited spatial resolution, global climate models are unable to resolve regional ocean circulation on the NES. Saba et al. (2016) found that the global climate models with standard ocean resolution (1°) exhibit particularly strong warm and salty biases on the NES due to the coarse horizontal resolution and lack of fine-scale bathymetry within the simulations. Similarly, several studies have found that the seasonal prediction skill of sea surface temperature (SST) for the NES is limited (Hervieux et al., 2019) and the least skillful among 11 Large Marine Ecosystems surrounding North America (Jacox et al., 2020), based on predictions using global models with standard ocean resolutions. The complex coastal topography and important regional processes that influence the NES, such as cross-shelf exchange, tidal forcing, freshwater discharge, and strong air-sea interactions, are probably not being realistically resolved by the coarser resolution products.
Earlier studies have used dynamical downscaling to investigate the shelfbreak frontal system (Chen and He, 2010), regional circulation dynamics (Chen and He, 2015), heat balance (Wilkin, 2006; Chen and He, 2015; Chen et al., 2016), Mid-Atlantic Bight (MAB) Cold Pool (Chen et al., 2018), Gulf Stream eddy energetics (Kang and Curchitser, 2015), and future climate impacts on the NES region (Alexander et al., 2020; Shin and Alexander, 2020). These studies have shown that dynamical downscaling produces reasonable and improved representations of the ocean circulation on the NES compared with global models, especially where the relevant bathymetric features (e.g., shelfbreak and basins) are better resolved. Global ocean reanalyses, which combine models and observations via data assimilation, are a useful tool to provide ocean state estimates and boundary conditions for regional models. However, their realism for a particular region of interest should be critically assessed against available observations (e.g., Moore et al., 2019). On a global scale, extensive studies comparing reanalysis products have found the largest biases of ocean state variables in coastal areas, western boundary currents, and the deep ocean (Ryan et al., 2015; Balmaseda et al., 2015; Karspeck et al., 2017; Palmer et al., 2017; Toyoda et al., 2017; Storto et al., 2017; Valdivieso et al., 2017). On a regional scale, Souza et al. (2021) compared four reanalysis products in New Zealand coastal waters, Oke et al. (2012) and Divakaran et al. (2015) compared five reanalysis forecast systems in Australian waters, Amaya et al. (2022) compared three reanalyses on the west coast of the United States, and Russo et al. (2022) compared three reanalyses in South African waters. These studies show significant differences between the reanalyses and among regions, and no single product performed best across all parameters. A study by Chi et al. (2018) compared 13 reanalyses in the Gulf Stream region and found that most of the products fail to reproduce the basic features of the Gulf Stream. Considering regional differences, global ocean reanalyses need to be evaluated within each area of interest. To our knowledge, a systematic assessment of global ocean reanalyses targeted to the NES has not yet been conducted.
Fig. 1. Maps of the area of study with bathymetry shown as color shading and the major features of the surface circulation. a) The Northwest Atlantic region including the Labrador Shelf, the Newfoundland Shelf, the Gulf of Saint Lawrence, the Scotian Shelf, the Gulf of Maine and the Mid-Atlantic Bight. b) The Northeast U.S. Shelf (NES) defined by the red square in (a) including the in situ observations used in this study to compare with reanalysis. The grey dots show the National Oceanic and Atmospheric Administration (NOAA) Northeast Fisheries Science Center (NEFSC) surface and bottom temperature and salinity vertical profiles, the orange dots show the Oleander profiles used in this study, the pink pentagons indicate the NorthEastern Regional Association of Coastal Ocean Observing Systems (NERACOOS) moorings (A01, E01 and F01), the dark orange triangles indicate the National Data Buoy Center (NDBC) temperature buoys (44025: Long Island; 44008: Nantucket; 44025: Gulf of Maine), the blue circles indicate the tide gauges (749: Chesapeake Bay; 264: Atlantic City, NJ; 742: Woods Hole, MA) and the black lines and circles indicate the along-track SLA passes (243, 228, 126 and 141) and locations used in this comparison. The vertical yellow line denotes the cross-shelf transect used to compare the thermohaline structure at the shelfbreak front. The colored polygons show the regions used in this study to compare the NOAA NEFSC dataset. SMAB and NMAB stand for Southern and Northern Mid-Atlantic Bight, respectively. GB stands for Georges Bank. WGOM and EGOM stand for the western and eastern Gulf of Maine, respectively.
This study aims to provide a systematic assessment of intermediate-to-high spatial resolution global ocean reanalysis products, widely used in the ocean and climate community, against available in situ and satellite observations on the NES. Because direct observations are nonuniform in time and space, it is useful to assess the fidelity of available reanalysis products in reproducing the observed distribution of ocean properties, circulation, and variability in the NES region. A systematic analysis of the available global ocean reanalysis products will provide guidance for users to choose the most suitable product for their particular applications.
Local temperature dynamics are critical to the state of the NES ecosystem, where the temperature gradients are particularly sharp, because they impact the fish distribution more strongly than in other ecosystems where the temperature distribution is comparatively more uniform (Pinsky et al., 2013). Moreover, studies have found a strong relationship between stock productivity and climate variables for several groundfish species in the region (Miller et al., 2016; Miller et al., 2018; Xu et al., 2018; Bell et al., 2018; O'Leary et al., 2019). As a result, global reanalyses are already being considered in stock assessments on the NES (NEFSC, 2020), making a regional assessment of available products even more critical. Our results can, therefore, be used to address the needs of fisheries management in the region by identifying which product is best suited for the full reconstruction of key variables such as surface and bottom temperature (Miller et al., 2016; Chen et al., 2021; du Pontavice et al., 2022).
Here, we present an assessment and comparison of eight reanalysis products ranging from intermediate resolution (1/2°) to high resolution (1/12°) across a 24-year overlapping period, 1994-2017. In Sections 2 and 3, we describe the reanalyses and the observations used in their evaluation. The results are presented in Section 4: we first consider the mean circulation in the greater Northwest Atlantic as represented by the reanalyses relative to observations, followed by a comparison of the temperature, salinity, and sea level anomaly (SLA) variability on the shelf. A summary and discussion are presented in Section 5.

Reanalysis
The global reanalyses evaluated here include the 'Climate Forecast System Reanalysis' (CFSR), the 'Estimating the Circulation and Climate of the Ocean' (ECCO), the 'Ocean and sea-ice ReAnalyses System' (ORAS), the 'Simple Ocean Data Assimilation' (SODA), the 'Bluelink Reanalysis' (BRAN), the 'Global Ocean Reanalysis Simulations' (GLORYS) and the 'Global Ocean Forecast System' (GOFS) versions 3.0 and 3.1. A brief summary of each reanalysis is provided below and in Table 1, and the temporal availability is shown in Fig. 2. We group CFSR, ECCO, ORAS and SODA as the coarser-resolution reanalyses since their grid spacing is larger than or equal to 1/4° (≈27 km at the NES latitude band), while BRAN, GLORYS and the two versions of GOFS are grouped as the high-resolution products since their grid spacing is smaller than or equal to 1/10° (≈11 km at the NES latitude band).
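The approximate kilometer equivalents quoted for each grid can be reproduced with a simple degree-to-distance conversion. The following is a minimal sketch, assuming a spherical Earth with about 111.2 km per degree of latitude; the function name `grid_spacing_km` and the 40°N reference latitude are illustrative choices, not taken from the reanalysis documentation. The quoted values appear to match the north-south spacing more closely than the east-west spacing.

```python
import math

# Assumption: spherical Earth, ~111.2 km per degree of latitude.
KM_PER_DEG_LAT = 111.2


def grid_spacing_km(res_deg, lat_deg=None):
    """Approximate grid spacing in km for a resolution given in degrees.

    With lat_deg=None the north-south (meridional) spacing is returned.
    With a latitude, the east-west (zonal) spacing at that latitude is
    returned, which shrinks with the cosine of latitude.
    """
    spacing = res_deg * KM_PER_DEG_LAT
    if lat_deg is not None:
        spacing *= math.cos(math.radians(lat_deg))
    return spacing


# Nominal resolutions as stated in the text.
for name, res in [("CFSR", 0.5), ("ORAS", 0.25), ("BRAN", 0.1), ("GLORYS", 1 / 12)]:
    ns = grid_spacing_km(res)
    ew = grid_spacing_km(res, lat_deg=40.0)  # 40N is an illustrative NES latitude
    print(f"{name}: {ns:.1f} km north-south, {ew:.1f} km east-west at 40N")
```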
One of the important consequences of the horizontal resolution is the degree of smoothing that is applied to the bathymetry, which affects the circulation and mixing of water masses. Fig. 3 shows the bathymetry from the ETOPO1 global elevation dataset, which has 1 arc-minute resolution (NOAA National Geophysical Data Center, 2009), and from each reanalysis, as well as the differences between the ETOPO1 bathymetry and each reanalysis. The coarser resolution products do not fully resolve the shelfbreak topography or key features in the Gulf of Maine (GOM) like the Northeast Channel. CFSR and ECCO poorly resolve the shelfbreak and the GOM and are too deep over most of the shelf compared to observations, whereas ORAS and SODA are too shallow in some parts of the GOM compared to observations. By comparison, higher resolution products compare more favorably to the ETOPO1 product. These differences are important when comparing key variables such as bottom temperature and salinity, as shown below.

Table 1
Reanalyses used in this study and their attributes.

CFSR
CFSR, comprising CFSRv1 (available from 1979 to 2011; Saha et al., 2010) and CFSRv2 (available from 2011 to 2022; Saha et al., 2014), is a coupled atmosphere-ocean-land surface-sea ice system. The reanalysis has a monthly and 6-hourly temporal resolution, and it is available at a 0.5° (≈55 km at the NES latitude band) horizontal resolution with 40 vertical levels. The ocean component is the Modular Ocean Model (MOM) version 4 sea ice model (Griffies et al., 2015) and the atmospheric component is GFS. The CFSR coupler sends and receives data, including atmospheric fluxes, between the MOM version 4 sea ice model and GFS at every time step (Saha et al., 2010). CFSR uses the Global Ocean Data Assimilation System (GODAS) 3D-Var assimilation scheme (Derber and Rosati, 1989). The reanalysis assimilates temperature observations from fixed moorings, Argo (Roemmich and Gilson, 2009; Riser et al., 2016) and eXpendable BathyThermographs (XBTs) from the National Oceanographic Data Center World Ocean Database (WOD) 1998 (Conkright et al., 2002), and from the Global Temperature and Salinity Profile Programme (GTSPP). Salinity profiles are assimilated from Argo when available. SST is assimilated every 6 h to the daily mean from the National Oceanic and Atmospheric Administration (NOAA) Optimum Interpolation SST product (OISST) (Reynolds et al., 2007). Sea surface salinity data are assimilated from a climatological map based on the WOD 1998. Atmospheric data are assimilated from the National Centers for Environmental Prediction (NCEP).

ECCO
The ocean sea-ice state estimate ECCO version 5 alpha has 50 levels in the vertical and a variable horizontal resolution that increases toward the tropics, with roughly 0.25° (≈27 km) in the NES (Forget et al., 2015). It is available at a monthly temporal resolution from 1992 to 2017. The ocean model is the MITgcm (Marshall et al., 1997) LLC270 (Fenty et al., 2017) and it is forced with atmospheric variables from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim forcing. ECCO uses the Tangent Linear and Adjoint Model Compiler with a variational data assimilation scheme which directly assimilates vertical mixing coefficients and components of the air-sea fluxes. Along-track sea level anomalies are assimilated from several satellite altimeters (Forget and Ponte, 2015) relative to a mean dynamic topography (MDT) computed from satellite altimetry and tide gauges (Andersen et al., 2018). Monthly ocean bottom pressure anomalies from GRACE Mass concentration (Watkins et al., 2015), daily SST fields from the Advanced Very High Resolution Radiometer (AVHRR) (Reynolds et al., 2002), and daily sea-ice concentration fields from the Special Sensor Microwave/Imager (Meier et al., 2017) are also included in the assimilation scheme. The primary in situ data include the global array of Argo profiling floats (Roemmich and Gilson, 2009; Riser et al., 2016), shipboard Conductivity, Temperature, and Depth (CTD) and XBT hydrographic profiles, the monthly temperature and salinity climatology from the World Ocean Atlas (WOA) 2009 (Boyer et al., 2009), tagged marine mammals (Roquet et al., 2017; Treasure et al., 2017), and ice-tethered profilers in the Arctic (Krishfield et al., 2008).

ORAS
The ORAS version 5 (ORAS5; Zuo et al., 2017, 2019) is available from 1979 to 2019 and has a daily and monthly temporal resolution with a 0.25° resolution (≈27 km at the NES latitude band). There are 75 vertical depth levels. The product includes sea ice and surface wave models and uses the Nucleus for European Modelling of the Ocean (NEMO) version 3.4.1 ocean model (Madec, 2016) coupled to the LIM2 sea-ice model (Fichefet and Maqueda, 1997). The product uses surface forcing from the ERA-Interim reanalysis. The assimilation is conducted using NEMOVAR (Weaver et al., 2005; Mogensen et al., 2012). It assimilates reprocessed SST from HadISST2 and sea-ice concentration from OSTIA (Operational Sea Surface Temperature and Sea Ice Analysis; Donlon et al., 2012), reprocessed in situ profiles from the Met Office Hadley Centre observations dataset "EN4" (Good et al., 2013) and sea level from AVISO (Archiving, Validation and Interpretation of Satellite Oceanographic data; Pujol et al., 2016) using an MDT from a model run assimilating temperature and salinity (Balmaseda et al., 2013). For this study we use the ensemble mean, which consists of five members with differences in the perturbations added to the assimilated and forcing fields.

SODA
The SODA version 3.12.2 (Carton et al., 2018) has a horizontal resolution of 1/4° (≈27 km at the NES latitude band), a 5-day temporal resolution and 50 vertical levels. It is available from 1980 through 2016. The ocean component of the product is MOM5 (Griffies et al., 2015) and the atmospheric forcing is the Japanese 55-year Reanalysis (JRA55; Shinya et al., 2015). SODA uses optimal interpolation (Bloom et al., 1996). The main datasets that SODA assimilates are shipboard CTD and XBT hydrographic profiles from the WOD13 (Smolyar, 2013) and the remotely sensed L3 Pathfinder version 5.2 AVHRR SST (Casey et al., 2010). Note that SODA only assimilates temperature and salinity data. It does not assimilate any altimeter data.

BRAN
BRAN version 2020 (Chamberlain et al., 2021a) is available from 1993 to 2019. The reanalysis is available at daily temporal resolution and at 1/10° (≈11 km at the NES latitude band) horizontal resolution with 50 vertical levels. The ocean component is the Ocean Forecasting Australian Model (OFAM) version 3, which is configured with MOM5 and forced by JRA55. BRAN assimilates data using the Ensemble Optimal Interpolation capability of EnKF-C (Sakov, 2014). It assimilates satellite SST, satellite SLA, and in situ temperature and salinity. The satellite SST data assimilated are from AVHRR and the Along Track Scanning Radiometer (ATSR) (Embury et al., 2019). Along-track satellite SLA data from all available platforms from the Radar Altimeter Database System (RADS v.4, Scharroo et al., 2013) are assimilated using an MDT from a model run with no data assimilation (Chamberlain et al., 2021b). In situ observations of temperature and salinity are assimilated from the Copernicus Marine Environment Monitoring Service (CMEMS) Coriolis Ocean dataset for ReAnalysis (CORA, versions 5.0 and 5.1; Cabanes et al., 2013) and from a near-real time database maintained at the Australia Bureau of Meteorology.

GLORYS
The GLORYS12 version 1 is available from 1993 to 2019 (Lellouche et al., 2021) at a daily and monthly temporal resolution and 1/12° (≈9 km at the NES latitude band) horizontal resolution with 50 vertical levels. The ocean component is generated using NEMO and the atmospheric forcing is provided by the ECMWF ERA-Interim. The reanalysis is produced using a reduced-order Kalman Filter scheme and a 3D-Var scheme for the correction of large-scale biases in temperature and salinity. Observations of delayed time sea level anomaly from all altimetric satellites from CMEMS, satellite-based SST from OISST AVHRR-only (Reynolds et al., 2007), sea ice concentration from the Centre ERS d'Archivage et de Traitement (Girard-Ardhuin et al., 2008), and in situ temperature and salinity vertical profiles from the CORA v4.1 database (Cabanes et al., 2013) are jointly assimilated. The MDT used to assimilate SLA is also obtained from the CMEMS product version CNES-CLS-13 (Lellouche et al., 2018).

GOFS 3.0 and 3.1
We evaluate GOFS versions 3.0 and 3.1 (Cummings and Smedstad, 2014). The products are available every three hours and have a 1/12° (≈9 km at the NES latitude band) horizontal resolution and 40 vertical levels. Both GOFS versions are available as reanalysis and analysis. The analysis is a daily nowcast through a seven-day forecast. The reanalysis is a hindcast simulation that reconstructs the ocean state. For GOFS 3.0, the reanalysis is available from 1992 to 2012 and analysis starts from 2013 onward. For GOFS 3.1, the reanalysis is available from 1994 to 2015 and analysis starts from 2015 onward. The vertical coordinates of the reanalyses are of hybrid type, a combination of terrain-following and vertical z-levels. The ocean component is derived from the HYbrid Coordinate Ocean Model (HYCOM). Atmospheric forcing is obtained from NCEP for both reanalysis versions and from the Navy Operational Global Atmospheric Prediction System (NOGAPS) and Surface and NAVy Global Environmental Model (NAVGEM) for the analysis version 3.0 and only NAVGEM for the analysis version 3.1. HYCOM assimilates data using a 3D-Var Navy Coupled Ocean Data Assimilation (NCODA) multivariate optimal interpolation scheme as described by Cummings (2005) and Cummings and Smedstad (2014). GOFS assimilates satellite altimeter observations, satellite and in situ SST observations, as well as in situ vertical temperature and salinity profiles from XBTs, Argo floats and moored buoys. In both GOFSs, MDT used to assimilate altimetry is obtained from synthetic profiles, while GOFS3.1 does not explicitly use a MDT. The main differences between versions include the equation of state, the surface wind, the radiation forcing and the sea surface salinity relaxation (Metzger et al., 2017).

Observations
We evaluate the eight reanalyses relative to the following set of publicly available observational datasets. The observational datasets are chosen based on geographic location, resolution, and time period in order to provide a systematic comparison that will cover several years and regions of the NES. Fig. 2 shows the temporal availability of all datasets used in this study. Below we give a brief description of each and indicate whether they are assimilated by any of the reanalysis products evaluated in this study. Except for the sea level tide gauges, observations are partially or fully assimilated by one or more of the reanalysis products. While the observations are not independent, our goal is to evaluate which reanalysis best represents the ocean dynamics and properties on the NES.

Absolute dynamic topography and sea level anomaly from satellite CMEMS altimetry
We use the daily satellite altimetry product from CMEMS to evaluate the absolute dynamic topography (ADT) and sea level anomaly (SLA) in each reanalysis product (Pujol et al., 2016). The CMEMS dataset begins in 1993 with a daily temporal resolution and 0.25° (≈25 km) horizontal resolution, although we only consider data from 1994 to 2017 to overlap with the period covered by the reanalyses. We use the blended satellite gridded product, which combines along-track data from all the available satellite missions and has errors of about 1-2 cm² for wavelengths larger than 250 km. The SLA is computed with respect to the 20-year mean (1993-2012). It should be noted that all reanalyses except SODA assimilate some form of satellite altimetry, with slight differences in the altimeters chosen and the processing level (Table 1). We use the ADT calculated from the MDT version CNES-CLS18 (Mulet et al., 2021) to assess the overall large-scale circulation including the Gulf Stream. We use along-track SLA measurements made by Topex/Poseidon and Jason from 1994 to 2017 to overlap with the reanalysis period. The locations used for this comparison are shown in Fig. 1 and are chosen from the along-track passes closest to the NES (228, 243, 126 and 141) and at points closest to the shelfbreak, the slope water and the Gulf Stream.

Sea surface temperature from NOAA-OISST
We evaluate SST in the reanalyses using the NOAA OISST product, which is primarily based on the AVHRR SST (Reynolds et al., 2007). The NOAA OISST incorporates data from satellite radiometers as well as in situ data from buoys, ships, and Argo floats. It is available as a gridded dataset with a 0.25° (≈25 km) spatial resolution and a daily temporal resolution. The product is available from 1981 onward, but only data from 1994 to 2017 are used in this study to align with the reanalyses. CFSR has directly assimilated the OISST product.

NCEI temperature and salinity climatology
To evaluate the horizontal and vertical thermohaline structure on the NES, we use the NOAA National Centers for Environmental Information (NCEI) Northwest Atlantic Regional climatology (Seidov et al., 2018). The climatology is generated from the WOD13 dataset and is available in 10-year averages (except for the 2005-2012 average) at a spatial resolution of 1/10° (≈10 km). We use the average of two available periods: 1995 to 2004 and 2005 to 2012.

NEFSC temperature and salinity hydrography
In situ bottom and surface temperature and salinity are obtained from the NOAA Northeast Fisheries Science Center (NEFSC) Ecosystem Monitoring (ECOMON) program (Fratantoni et al., 2019). ECOMON conducts hydrography and plankton surveys up to six times yearly over the continental shelf between Cape Hatteras, North Carolina and Cape Sable, Nova Scotia. Observations are available from 1977 through the present and we compare the data only between 1994 and 2017 to overlap with the reanalyses. Data collected within the upper 5 m are considered surface values and within 10 m of the bottom are considered bottom values. This database is included in the WOD database and, therefore, is part of the NCEI climatology.
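The surface and bottom selection rule can be illustrated with a short sketch. The 5 m and 10 m thresholds come from the text; the function name `surface_and_bottom`, the input layout, and the choice to average all samples that fall inside each layer are assumptions made here for illustration and may differ from the processing actually applied to the ECOMON profiles.

```python
import numpy as np


def surface_and_bottom(depth_m, values, bottom_depth_m,
                       surface_max_depth=5.0, bottom_window=10.0):
    """Extract surface and bottom values from one vertical profile.

    depth_m        : sample depths in meters (positive downward)
    values         : temperature or salinity at those depths
    bottom_depth_m : water depth at the station
    Samples in the upper 5 m count as surface, and samples within 10 m
    of the bottom count as bottom. Averaging within each layer is an
    assumption of this sketch. NaN is returned for an empty layer.
    """
    depth_m = np.asarray(depth_m, dtype=float)
    values = np.asarray(values, dtype=float)

    surface_samples = values[depth_m <= surface_max_depth]
    bottom_samples = values[depth_m >= bottom_depth_m - bottom_window]

    surface = surface_samples.mean() if surface_samples.size else np.nan
    bottom = bottom_samples.mean() if bottom_samples.size else np.nan
    return surface, bottom


# Hypothetical profile at a 60 m deep station.
surf_t, bott_t = surface_and_bottom(
    depth_m=[2, 4, 20, 55, 58],
    values=[18.0, 17.8, 12.0, 8.0, 7.8],
    bottom_depth_m=60.0,
)
```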

Temperature from XBT Oleander transects
The vertical temperature distribution on the continental shelf is compared with vertical XBT profiles of subsurface temperature deployed from the CMV Oleander. Observations have been collected aboard CMV Oleander along its weekly transit from Port Elizabeth, New Jersey to Bermuda since 1977. Here we compare data for 1994-2017 only (Flagg et al., 1998; Rossby and Gottlieb, 1998; Rossby et al., 2019). The average number of profiles available per transect is 19 before 2008 and over from 2008 onward. Locations shoreward of the 200-m isobath are used in this study and are shown in Fig. 1b. The Oleander dataset is part of the WOD and GTSPP datasets.

Subsurface temperature from NERACOOS
To evaluate the interannual variability of subsurface temperature in the GOM, we use CTD data from instruments deployed at various depths on moorings in the Gulf of Maine as part of the NorthEastern Regional Association of Coastal and Ocean Observing System (NERACOOS) (Wallinga et al., 2003). Locations are shown in Fig. 1b. We use data from instruments deployed at 1, 20, and 50 m on moorings A01, E01 and F01 (data for mooring A01 are only available at 1 and 20 m). The moorings with the longest data record are chosen. The moorings were deployed in 2001, and comparisons are made using data between 2004 and 2018 at 1 m, and between 2004 and 2012 for data at 20 m and 50 m due to some gaps in the observations. The moorings are maintained by the University of Maine.

Tide gauge station data
To evaluate the sea level variability near the coast, we use research quality data from three tide gauges maintained by the Joint Archive for Sea Level in conjunction with the University of Hawaii Sea Level Center and NCEI (749: Chesapeake Bay; 264: Atlantic City, NJ; 742: Woods Hole, MA). Locations are shown in Fig. 1b. These observations are not assimilated by any of the reanalyses. Locations were chosen such that the time availability of the tide gauges overlaps with the 1994-2017 reanalysis time period.

Mean circulation in the Northwest Atlantic
We first examine climatological mean Sea Surface Height (SSH) in the greater Northwest Atlantic (35°N-50°N, 80°W-50°W). Fig. 4 shows the difference (bias) between the climatological mean SSH from each reanalysis and the mean ADT from CMEMS. Note that the spatial means, defined as the average over the entire region shown in Fig. 4, are removed from both simulated and observed mean SSHs. The spatially averaged RMSE (Root Mean Squared Error) of the biases is listed above each panel. Due to the different MDTs used in the data assimilation of altimetry data (Section 2), the biases are in part determined by the similarity between the MDT used for each reanalysis and the one used for the CMEMS observation. For example, GLORYS uses the CMEMS MDT in its assimilation and is thus expected to exhibit relatively small biases. Differences with observations occur for all the reanalyses, which exhibit a dipole pattern with large positive (negative) values north (south) of the strong climatological SSH gradient associated with the Gulf Stream. CFSR, ECCO, and ORAS have positive biases that exceed 50 cm, negative biases of −30 cm or lower, and RMSE values of about 15 cm. SODA and BRAN have similar maximum biases of −50 and +25 cm and −40 and +20 cm, respectively. GLORYS and GOFS show the lowest dipole values, with GLORYS having biases with an amplitude of ≈15 cm and RMSE of 4 cm, and both GOFS products having biases of ≈−20 and ≈−30 cm and RMSE of 6 and 8 cm for versions 3.0 and 3.1, respectively. Fig. 4 also shows the mean Gulf Stream path from satellite observations and each reanalysis. The path is computed by selecting grid points with the maximum standard deviation of SLA at each longitude (Pérez-Hernández and Joyce, 2014). The mean biases and RMSE differences between the Gulf Stream path location from reanalyses and observations as well as the separation latitude for each product are included in Table 2.
The separation latitude is defined as the latitude where the Gulf Stream path, computed as described above, intersects the 2000-m isobath.The Gulf Stream path is shifted mostly north (mean biases larger than 78 km and RMSE larger than 100 km) in CFSR, ECCO and ORAS while in the high-resolution products and SODA the differences are less than 18 km, with the smallest mean biases (<7 km) and RMSE differences (~15 km) in GLORYS and BRAN.
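The path definition described above (the latitude of maximum SLA standard deviation at each longitude) and the path bias statistics can be sketched as follows. This is a minimal illustration on synthetic data, assuming SLA arrays ordered as (time, latitude, longitude), model and observed paths on the same longitudes, and about 111.2 km per degree of latitude; the function names are hypothetical and the sketch omits any smoothing or quality control that the cited method may apply.

```python
import numpy as np


def gulf_stream_path(sla, lat):
    """Latitude of the maximum temporal SLA standard deviation per longitude.

    sla : array (time, lat, lon); lat : array (lat,)
    Note: np.nanargmax raises if a longitude column is entirely NaN.
    """
    sla_std = np.nanstd(sla, axis=0)              # (lat, lon)
    return lat[np.nanargmax(sla_std, axis=0)]     # (lon,)


def path_bias_rmse_km(path_model, path_obs, km_per_deg=111.2):
    """Mean bias and RMSE (km) of the path latitude; positive means north."""
    diff_km = (np.asarray(path_model) - np.asarray(path_obs)) * km_per_deg
    return diff_km.mean(), np.sqrt(np.mean(diff_km ** 2))


# Synthetic demonstration: variability peaks at 38N in the "observations"
# and at 39N in a hypothetical "model".
rng = np.random.default_rng(0)
lat = np.arange(35.0, 45.5, 0.5)
noise = rng.standard_normal((200, 1, 8))          # (time, 1, lon)


def synthetic_sla(center_lat):
    amplitude = np.exp(-((lat - center_lat) ** 2))   # (lat,)
    return amplitude[None, :, None] * noise          # (time, lat, lon)


path_obs = gulf_stream_path(synthetic_sla(38.0), lat)
path_model = gulf_stream_path(synthetic_sla(39.0), lat)
bias_km, rmse_km = path_bias_rmse_km(path_model, path_obs)
```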
The dipole pattern in SSH bias is likely due to two factors. First, in all the coarser resolution products except SODA, the Gulf Stream is shifted shoreward relative to observations and separates from the coast at a higher latitude (>37°N) than in observations (35.8°N). This overshooting problem has been studied extensively in numerical model simulations (e.g., Wang et al., 2014; Small et al., 2014; Saba et al., 2016; Ezer, 2016; Chassignet and Xu, 2017), yet it is still present in modern reanalyses even with the assimilation of altimetry data. The accurate representation of the Gulf Stream is key to capturing regional circulation dynamics and the shelfbreak frontal structure since the Gulf Stream is the primary source of warmer and saltier water for the NES (Loder, 1998). A more shoreward-shifted Gulf Stream will therefore modify the water mass characteristics at the shelfbreak and can have important implications for the frontal structure separating the warm and salty slope water from the cold and fresh shelf water. The second factor which may be contributing to the dipole pattern in SSH bias is the inability of the coarser resolution products to reproduce the narrow width of the Gulf Stream in the cross-jet direction. Smoothing in the cross-stream direction results in a weaker jet with positive and negative anomalies to the north and south of the jet axis, respectively.
To quantitatively assess which of the two factors is more important, we calculate a synthetic bias defined as ΔSSH_synt(x, y) = SSH_obs(x, y + dy) − SSH_obs(x, y), where dy is the magnitude in km of the Gulf Stream position bias, estimated from the Gulf Stream main axis in each reanalysis and in observations and calculated for each longitude band. Then we estimate the portion of the SSH bias that is not related to the Gulf Stream position, presumed to be associated with a Gulf Stream that is too broad. This residual is defined as ΔSSH_residual = (SSH_reanalysis − SSH_obs) − ΔSSH_synt. Fig. 5 shows the biases associated with the Gulf Stream position (ΔSSH_synt) and with the Gulf Stream width (ΔSSH_residual). This kinematic calculation shows that the SSH biases in CFSR, ECCO and ORAS are mostly attributable to the misrepresentation of the Gulf Stream position, whereas the dipole biases in the high-resolution products are mostly attributed to the Gulf Stream width and, in the case of GLORYS, to the commonality between the MDT product used for the assimilation (CNES-CLS-13) and for the comparison (CNES-CLS-18), although some differences have been previously observed between these products (Wilkin et al., 2022).
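The kinematic decomposition can be sketched numerically as follows. This is an illustrative implementation of the two definitions given in the text, assuming mean SSH fields on a common (latitude, longitude) grid with increasing latitude, one signed position offset dy per longitude, and about 111.2 km per degree of latitude. The function name, the use of linear interpolation, and the edge handling (np.interp holds the boundary value outside the latitude range) are choices made for this sketch and are not described in the source.

```python
import numpy as np


def ssh_bias_decomposition(ssh_model, ssh_obs, lat, dy_km, km_per_deg=111.2):
    """Split the mean SSH bias into position and residual (width) parts.

    ssh_model, ssh_obs : arrays (lat, lon) of time-mean SSH
    lat                : increasing latitudes (lat,)
    dy_km              : Gulf Stream position offset per longitude (lon,), in km
    Returns (dssh_synt, dssh_residual), each of shape (lat, lon), where
      dssh_synt     = SSH_obs(x, y + dy) - SSH_obs(x, y)
      dssh_residual = (SSH_model - SSH_obs) - dssh_synt
    """
    ssh_model = np.asarray(ssh_model, dtype=float)
    ssh_obs = np.asarray(ssh_obs, dtype=float)
    dy_deg = np.asarray(dy_km, dtype=float) / km_per_deg

    shifted = np.empty_like(ssh_obs)
    for j in range(ssh_obs.shape[1]):
        # Evaluate the observed SSH at latitude y + dy for this longitude.
        shifted[:, j] = np.interp(lat + dy_deg[j], lat, ssh_obs[:, j])

    dssh_synt = shifted - ssh_obs
    dssh_residual = (ssh_model - ssh_obs) - dssh_synt
    return dssh_synt, dssh_residual


# Synthetic check: SSH falls linearly by 0.1 m per degree of latitude and the
# offset is one degree, so the interior synthetic bias is -0.1 m everywhere.
lat = np.linspace(30.0, 45.0, 31)
ssh_obs = np.repeat((-0.1 * lat)[:, None], 4, axis=1)
dy_km = np.full(4, 111.2)
synt, _ = ssh_bias_decomposition(ssh_obs, ssh_obs, lat, dy_km)
# A model whose bias is purely a position shift leaves a zero residual.
_, residual = ssh_bias_decomposition(ssh_obs + synt, ssh_obs, lat, dy_km)
```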
Figs. 6 and 7 compare the observed SSH variability from CMEMS with that from each reanalysis and map the differences, with the spatial RMSE of the biases included in Table 2. These figures capture how well the reanalyses represent the regional sterodynamic processes, including the Gulf Stream variability, which is an important influence on the thermal structure on the shelf. CFSR, ECCO and ORAS have the lowest variability in and to the south of the Gulf Stream region (<30 cm). SODA is the best of the low-resolution products, capturing reasonably well the variability of the Gulf Stream west of 60°W. The variability in the high-resolution reanalyses is in reasonable agreement with observations, with maximum values reaching roughly 45 cm (Fig. 6). In the high-resolution products, there is an overall overestimation of the variability throughout the domain. This is expected since these higher-resolution reanalyses more effectively capture the mesoscale eddies (Fig. 7). Among them, GLORYS shows the lowest spatial RMSE value (≈1 cm), while BRAN and GOFS have values of about 3 cm.
To assess the representativeness of sea-level anomalies (SLA) in each reanalysis, we have identified four altimetry passes from the Topex/Poseidon and Jason satellite missions that cross the Gulf Stream and NES and include observations that overlap the reanalysis time period. For each reanalysis, the sea-level anomaly is calculated as the SSH minus the time-averaged SSH from 1994 to 2012. Taylor diagrams shown in Fig. 8 summarize along-track SLA comparisons at several locations shown in Fig. 1, including one point near the axis of the Gulf Stream, one point in the Slope Sea north of the Gulf Stream and one point near the shelfbreak. We focus our comparisons on just the high-resolution reanalysis products and SODA.
Among these products, no single product outperforms all others at the 12 chosen locations when comparing SLA, but GLORYS has the largest correlations in 8 of the 12 comparisons (Table 2). Correlations (r) are larger than 0.4 in all products, except at some locations in SODA and GOFS3.0, and standard deviations fall within 25% of observations except at points near the shelfbreak (circles). The points near the shelfbreak also tend to exhibit larger RMSE than points in the Gulf Stream (crosses) and Slope Sea (triangles). The strongest correlations (r > 0.7) and the standard deviations closest to observations are found at points near the Gulf Stream axis (crosses) and in the passes that are farthest from the coast (126 and 141).
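The three statistics summarized on a Taylor diagram can be computed directly from a pair of co-located time series. The sketch below assumes the RMSE shown on the diagrams is the centered (mean-removed) RMS difference, which is the usual Taylor-diagram convention, and that all quantities are normalized by the observed standard deviation as described for Fig. 8. The function name is hypothetical.

```python
import numpy as np


def taylor_stats(model, obs):
    """Correlation, normalized standard deviation and normalized centered RMSE.

    Both inputs are 1-D series sampled at the same times; pairs containing
    NaN are dropped before the statistics are computed.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = np.isfinite(model) & np.isfinite(obs)
    m = model[ok] - model[ok].mean()   # anomalies about the series mean
    o = obs[ok] - obs[ok].mean()
    sd_m, sd_o = m.std(), o.std()
    r = float(np.mean(m * o) / (sd_m * sd_o))
    norm_sd = float(sd_m / sd_o)
    crmse = float(np.sqrt(np.mean((m - o) ** 2)) / sd_o)
    return r, norm_sd, crmse
```

With this normalization the three values satisfy crmse² = 1 + norm_sd² - 2·norm_sd·r, which is the geometric relation that lets a single point on the diagram encode all of them.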
A. Carolina Castillo-Trujillo et al.

Table 2
Summary of the NES comparisons. The reanalysis products that compare best to observations are listed in bold, and the products having an error value within 10% of the best comparison are highlighted with shaded backgrounds.

Mean surface and bottom temperature and salinity on the NES
We present the differences between the mean surface and bottom temperature and salinity on the NES. The annual mean SST from the OISST data set is compared with temperature data from each reanalysis at the shallowest available depth (Table 1) averaged between 1994 and 2017 (Fig. 9). Hereinafter, in situ temperature measurements have been converted to potential temperature to compare with the reanalyses. The lowest RMSE (<0.5 °C) and biases (<1 °C) are found in the high-resolution products (BRAN, GLORYS and GOFSs) and CFSR, while ECCO, ORAS and SODA have RMSE values closer to 1 °C and biases as large as 2 °C, particularly near the shelfbreak. It is not surprising that CFSR has low biases compared to the rest of the coarser resolution products since the OISST product is directly assimilated by CFSR with strong nudging (Saha et al., 2010). The rest of the reanalyses assimilate slightly different satellite SST products (Table 1).
There are important regional differences in SST between the reanalyses. CFSR and ECCO are dominated by warm biases, with ECCO showing the largest values (≈2 °C) throughout the NES. SODA exhibits mostly cold biases of about 1 °C. ORAS contains a region of large positive biases (>3 °C) in the southern MAB and near the shelfbreak. The high-resolution products show positive and negative biases of about 1 °C or less across the NES. All products except for CFSR and ECCO exhibit cold biases in the northern MAB. In the GOM, warm biases of 0.5 °C or more are found in all products except for SODA.
These regional differences indicate the distinct processes contributing to surface temperature biases on the NES. For example, in the GOM and Georges Bank, warmer water can indicate a lack of the tidal mixing that is pervasive in these regions due to the large tides. In contrast, warmer waters in the MAB, in the coarser products, are likely due to the misrepresentation of the Gulf Stream path and width as shown by Fig. 5.
We also compared the temperature fields from the reanalyses with three NDBC buoys located near the coast (Fig. 1b). Point-to-point comparisons were made between the buoy and reanalysis surface temperature anomalies, with the reanalysis SSTs extracted using a nearest-neighbor interpolation method and the seasonal cycle removed. A Taylor diagram and the time series are shown in Figs. S1 and S2. Correlations are significant, but the reanalyses underestimate the temperature variability and have RMSE values larger than 0.25 °C, even though the buoy temperatures are assimilated by all of the reanalyses.
In the rest of this section and in section 4.3, we will use the NCEI climatology averaged from 1995 to 2012 to compare with the reanalyses. The Sea Surface Salinity (SSS) climatology from reanalyses is compared with the NCEI climatology at its shallowest depth (0 m) (Fig. 10). The NCEI climatology is constructed from WOD13, which itself is partially or fully assimilated by all of the reanalyses (Table 1). The reanalyses assimilate fewer salinity observations than temperature observations, and hence it is expected that temperature will be better represented than salinity in reanalyses on the shelf. As with SST, there are large differences between reanalyses and across the NES sub-regions. RMSE is smallest for the high-resolution products and SODA (<0.6 psu), while RMSE is about 1 psu for the remaining coarse resolution products. The southern MAB is saltier in all reanalyses (>1 psu), although in CFSR, ECCO and ORAS the region of salty bias extends farther north and is stronger (>2 psu) than in the high-resolution products. This is particularly true for CFSR. In large areas of the GOM, all reanalyses are fresher than the NCEI climatology, with the largest negative bias found in ECCO and ORAS (≈1 psu).
As with SST, positive SSS biases in the MAB are likely due to the misrepresentation of the Gulf Stream which, as shown above, is shifted closer to the shelf in the coarser resolution products. In contrast, the biases in the GOM could be due to the misrepresentation of tidal mixing in the reanalyses. For example, the lack of tides would inhibit the injection of salty deep water to the surface layer, leading to a fresh bias near the surface, particularly in places where tidal forcing is strong, such as the Bay of Fundy at the northern end of the GOM. One notable difference in the SST and SSS between the reanalyses is that SODA is colder and fresher throughout the NES, compared with the rest of the products.
Next, the climatological mean bottom temperature from each reanalysis product is compared with the NCEI climatology (Fig. 11). The RMSE values are larger than 3 °C in CFSR, ECCO and ORAS, while SODA, BRAN and GOFSs have values closer to 2 °C and GLORYS shows the smallest RMSE of 1 °C. Similarly, the coarser products have the largest positive biases (~4 °C) in the NES. One exception among the low-resolution products is SODA, in which positive and negative biases are less than 3 °C. The distribution of biases in CFSR largely reflects the unrealistic representation of bottom topography, which dictates the paths for the circulation and mixing of water masses near the bottom (Richaud et al., 2016) and at the shelfbreak front. Cold biases in the central GOM in CFSR likely reflect the fact that the Northeast Channel is not well resolved at the coarse resolution (Fig. 3), thereby restricting the influx of warm slope water to the deep basins of the GOM (Ramp et al., 1985; Smith et al., 2001; Greene et al., 2013). Interestingly, other low-resolution products with better representation of the bottom topography in the GOM (Fig. 3) do not show the cold bias. Instead, both ECCO and ORAS have large warm biases throughout the NES, perhaps from too much mixing with the slope water due to the lack of strong shelfbreak fronts. The MAB is generally warmer in all reanalyses, with temperature differences exceeding 1.5 °C. BRAN has warmer biases (3 °C) in the MAB than the rest of the high-resolution products. The warm biases in the MAB are likely associated with the poor representation of the Cold Pool, as will be discussed below. The Cold Pool is a seasonal bottom-trapped cold water mass, formed locally through winter convection and mixing and maintained by the southwestward advection of cold water and weak vertical mixing in spring and summer (e.g., Houghton et al., 1982; Lentz, 2017; Chen et al., 2018).
The mean bottom salinity is compared with the NCEI climatology in Fig. 12. Biases vary significantly between the coarse and high-resolution reanalyses. Overall, the RMSE is larger in ORAS (≈3 psu) and between 1 and 2 psu in the rest of the products, except for BRAN which has a value of 0.6 psu. The coarser resolution products, except SODA, are saltier by more than 1 psu throughout the shelf, except in the central GOM in CFSR. As with bottom temperature, these anomalously salty bottom waters on the shelf could result from the lack of strong shelfbreak fronts. On the other hand, the fresh bias in the deep GOM in CFSR is likely related to the lack of warm and salty inflow from the slope as noted above. The high-resolution reanalyses and SODA exhibit much lower biases (<1 psu) for most of the region and are skewed toward fresher waters. Of the high-resolution products and SODA, one notable difference is that the GOM is only slightly saltier and almost unbiased in GLORYS and GOFSs while it is slightly fresher (≈1 psu) in SODA and BRAN.
Comparing the temperature and salinity biases near the surface and the bottom indicates that all of the reanalyses have larger errors near the bottom. This difference is likely due to a few factors. First, as discussed above, a proper representation of bottom topography is key to producing realistic temperature and salinity fields near the bottom. Second, the temperature and salinity variabilities at depth are controlled by both advective fluxes and vertical mixing, which are more challenging for the global models to correctly resolve in dynamically complex coastal regions such as the NES. Third, data assimilation provides relatively weaker constraints on bottom temperature and salinity due to sparser subsurface observations.

Shelfbreak front
In the NES region, a persistent thermohaline front is maintained near the shelfbreak, with fresh and cold water on the shoreward side and salty and warm water on the seaward side (Bigelow, 1933; Linder and Gawarkiewicz, 1998; Fratantoni and Pickart, 2007). Enhanced productivity occurs in the vicinity of the front, making the region critical for supporting commercial fisheries (Marra et al., 1990; Linder and Gawarkiewicz, 1998; Ryan et al., 1999; Oliver et al., 2022). Since the front is clearly identified in the climatological mean, although smoothed due to temporal averaging (Linder and Gawarkiewicz, 1998), we compare the mean vertical distribution of temperature and salinity over a cross-shelf section (Fig. 1) during winter (January, February and March) and summer (July, August and September) (Figs. 13-14) against the NCEI dataset.
[...] reproduced in the high-resolution products, but it is shifted seaward in GOFS3.1 compared to the rest of the high-resolution products. Mean biases vary between 0.3 and 0.9 psu and are larger than 0.5 psu in the coarser products and in summer in BRAN. A large positive (salty) subsurface bias near 50-100 m is observed in CFSR, ECCO and ORAS in summer. The rest of the products have fresh and salty biases (≈0.5 psu) on the shelf and outer shelf, respectively, except for BRAN in summer, which has a salty bias throughout the transect, and GOFS3.1, which shows mostly fresh biases throughout the transect.
Fig. 8. Comparisons between sea-level anomalies from along-track data from the Topex/Poseidon and Jason missions (passes 243, 228, 126 and 141) and from the high-resolution reanalyses and SODA at the locations shown in the inset. The circle, triangle and cross symbols denote the locations on each of the altimetry passes closest to the shelfbreak, in the slope water and at the Gulf Stream position, respectively. Reanalyses and observations have been normalized by the standard deviation of the observations so that observations are always located on the x-axis with a standard deviation equal to 1.
The along-shelf (roughly zonal) velocity is overlaid in Fig. 13 and Fig. 14 to compare, among the reanalyses, the representation of the shelfbreak frontal jet, which transports cold and fresh waters to the NES (Linder and Gawarkiewicz, 1998; Fratantoni et al., 2001). CFSR does not reproduce the shelfbreak jet at all, while ECCO and ORAS only capture a broad, weak westward flow in winter, suggesting that the warm biases in Fig. 13 are partly due to the misrepresentation of the jet. The jet position varies between summer and winter and among reanalyses. In particular, the jets in SODA, GLORYS, and BRAN are located farther inshore, along with the fronts, in summer (Fig. 13). This shoreward shift can bring warmer and saltier waters to the outer shelf. The shelfbreak jet is produced from the thermal wind balance at the front, and hence, it is sensitive to several forcing factors, including local winds and upstream water properties, which are represented differently in different reanalyses. Observational studies have found that the jet width is on the order of 15 to 20 km near the 150-m isobath (Linder and Gawarkiewicz, 1998; Fratantoni et al., 2001) and that it is located farther offshore in summer than in winter (Linder and Gawarkiewicz, 1998). GOFS3.1 seems to best capture these characteristics of the shelfbreak jet described in the aforementioned studies.
The differences in biases in Fig. 13 and Fig. 14 indicate that different processes are modulating the temperature and salinity structure. The warm and salty biases observed in the coarser resolution products are also related to their inability to resolve the shelfbreak topography, leading to a poor representation of the shelfbreak frontal system (e.g., the front being weakened, flattened, and displaced). In comparison, the strong summer bias in BRAN could be related to the misrepresentation of the Cold Pool.
To better quantify the frontal strength and the baroclinic current shear, we have calculated the cross-shelf density gradient following Linder and Gawarkiewicz (1998). We first computed the potential density over the transect defined in Fig. 1. Then, on each transect, we subtracted the onshore density (averaged over a 10 km horizontal interval from 30 to 20 km shoreward of the 100 m isobath) from the offshore density (averaged over a 10 km horizontal interval from 20 to 30 km seaward of the 100 m isobath) to estimate the cross-shelf density gradient. Fig. 15 shows the results of these calculations averaged over three different depth ranges (5-15, 25-35 and 45-55 m) and over each month for the overlapping period of 1995 to 2012. The density gradients calculated from the NCEI climatology (Fig. 15, black line) are consistent with those from Linder and Gawarkiewicz (1998) in their Fig. 7. Observations show that the density gradient is almost independent of depth from December to April, reflecting the lack of stratification during winter. In contrast, from May to August the density gradient decreases to less than 0.2 kg m⁻³ at the surface while it remains between 0.5 and 1 kg m⁻³ at the bottom and intermediate layers, since stratification is present during this time of the year. From August to December, the density gradient increases at the surface and at the intermediate layer to about 1 kg m⁻³, while at the deepest layer it remains between 0.5 and 1 kg m⁻³. Overall, the coarser products (Fig. 15, left column) do not reproduce well the seasonal variations of the cross-shelf density gradients, with the largest biases found at depth. CFSR has the weakest gradients (<0.2 kg m⁻³) at all depths. ECCO has weaker gradients than observations at the intermediate and deeper layers throughout the year, while ORAS has weaker than observed gradients from July through October in the deeper layer. These differences in density gradients are likely due to the lack of shelfbreak topography and a misrepresentation of the shelfbreak jet as discussed previously. Of the high-resolution products (Fig. 15, right column), BRAN best represents the gradients at the surface but shows the largest biases (>1 kg m⁻³) at the intermediate and deeper layers from April to October, reinforcing the conclusion that BRAN is likely not reproducing the Cold Pool well (Fig. 13). GLORYS and GOFSs reproduce the seasonal variations at depth but not the summer decrease in density gradients at the surface, possibly due to a misrepresentation of the surface fluxes and the induced surface warming. One notable result is that CFSR, ECCO and ORAS show a summer decrease in the density gradient at depth which is not observed in the NCEI climatology, suggesting a deeper mixed layer depth in the reanalyses than in the observations.
In summary, CFSR has the weakest density gradients (<0.5 kg m⁻³) at all depths and the largest differences when compared to observations, while BRAN best reproduces the NCEI dataset at the surface. GLORYS and GOFSs are more accurate at the bottom and intermediate layer. The skill of the product varies by feature, e.g., location and strength of the front, structure of the jet, representation of the Cold Pool, and seasonality. Therefore, the determination of the best product depends on the primary focus of the application.
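The cross-shelf density difference described in this section (offshore minus onshore density, each averaged over a 10 km interval on either side of the 100 m isobath and over a depth layer) can be sketched as follows. The array layout, the sign convention for cross-shelf distance and the function name are assumptions made for illustration only.

```python
import numpy as np


def cross_shelf_density_difference(rho, x_km, depth_m, layer):
    """Offshore-minus-onshore potential density for one depth layer.

    rho     : 2-D array (ndepth, nx) of potential density on the transect
    x_km    : 1-D array (nx,), cross-shelf distance from the 100 m isobath,
              assumed positive seaward and negative shoreward
    depth_m : 1-D array (ndepth,) of depths in m, positive downward
    layer   : (top, bottom) depth limits in m, e.g. (5, 15)
    """
    rho = np.asarray(rho, dtype=float)
    x_km = np.asarray(x_km, dtype=float)
    depth_m = np.asarray(depth_m, dtype=float)
    in_layer = (depth_m >= layer[0]) & (depth_m <= layer[1])
    onshore = (x_km >= -30.0) & (x_km <= -20.0)   # 30 to 20 km shoreward
    offshore = (x_km >= 20.0) & (x_km <= 30.0)    # 20 to 30 km seaward
    rho_on = np.nanmean(rho[np.ix_(in_layer, onshore)])
    rho_off = np.nanmean(rho[np.ix_(in_layer, offshore)])
    return float(rho_off - rho_on)
```

Calling the function once per month and per layer, for example with the layers (5, 15), (25, 35) and (45, 55), would give curves of the kind compared in Fig. 15.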

Temperature comparisons with the Oleander section
We now evaluate how well the reanalysis products reproduce the seasonal variability of temperature within the water column by comparing the simulated profiles with XBT temperature data collected by the CMV Oleander between 1994 and 2017. These observations are assimilated in some form by all of the reanalysis products, since they are included in the WOD, EN4, and CORA databases (Table 1). Temperature profiles from the high-resolution reanalysis products and SODA were interpolated in time and space to align with the observed XBT profiles (only the high-resolution products and SODA are included in this comparison). We separate the results into summer (July to September) and winter (January to March) seasons and select only the profiles collected shoreward of the 200-m isobath, since we are interested in the vertical structure of the water column on the shelf (Fig. 16).
The mean vertical structure of temperature for summer and winter (Fig. 16a and 16d) is consistent with temperature features from previous studies (Linder and Gawarkiewicz, 1998; Linder et al., 2006; Forsyth et al., 2015). In summer, surface heating stratifies the water column, and the surface temperature is about 22 °C. The temperature decreases to a minimum of 9 °C at 50 m due to weak vertical mixing and the presence of the Cold Pool. At 100 m, the temperature slightly increases to 13 °C and gradually decreases to 11 °C at 200 m. In winter, the temperature is more uniform due to vertical mixing from stronger winds (Zhang et al., 2011). The temperature increases from a minimum of 8 °C at the surface to 13 °C at 100 m, remaining roughly constant to the bottom. Overall, biases and RMSE are larger in summer than in winter, consistent with the shelfbreak front comparisons. In summer, the biases are largest near 50 m, within the seasonal thermocline, whereas in winter the biases are largest below 100 m depth. GLORYS best represents the observations in summer, with biases of less than 1 °C and RMSE of about 3 °C, while in winter GOFS3.1 best reproduces the observations, with biases of less than 1 °C and RMSE of about 2 °C. One interesting feature of this comparison is that SODA's biases in summer are mostly negative and stable throughout the water column, while the high-resolution products have positive biases with the largest values in the upper 50 m. The maximum RMSE and bias in summer correspond to the base of the thermocline, suggesting that the reanalyses do not properly represent this feature.

Temperature comparisons using NERACOOS moorings
Temperature from instruments deployed on three moorings (A01, [...] -S5). The reanalyses perform better at the surface than at depth since more observations are assimilated at the surface than at depth. Correlations at the surface are clustered between 0.8 and 0.95 with RMSEs between 0.25 °C and 0.60 °C, except for CFSR at the northern moorings (E01 and F01). BRAN has the highest correlations at the surface (r ≈ 0.95), followed closely by ORAS and GOFS3.1 (r > 0.90). At 20 m, there are significant differences between locations and reanalyses. The r values range from 0.4 to 0.9 with RMSEs between 0.3 °C and 1 °C, and the weakest correlations are found at the southern mooring (A01). BRAN has the best r values of 0.9 when compared to temperatures from the northern moorings (E01 and F01), but the agreement drops to 0.5 at the southern mooring (A01). At 50 m depth, r is clustered between 0.6 and 0.9 with RMSEs between 0.4 °C and 1 °C, although data from mooring A01 are not available at this depth. GOFS3.1 and BRAN have the best r values (>0.8) at both moorings.
Fig. 16. Mean of the eXpendable BathyThermograph (XBT) temperature profiles from the CMV Oleander (a,d) and bias (b,e) and RMSE (c,f) between reanalyses and observations. Summer (top row) and winter (bottom row) correspond to the months from July to September and from January to March, respectively. Values are calculated only from Oleander profiles shoreward of the 200-m isobath, as shown in Fig. 1b. Panels (a) and (d) show the number of profiles used in this intercomparison at each depth (shading).
CFSR performs worse than the other reanalyses at the surface and has low correlations at 20 and 50 m, probably due to the lack of spatial resolution in the GOM (e.g., Fig. 3). However, correlations do not necessarily increase with increasing spatial resolution. For example, ECCO, ORAS and SODA have r and RMSE values comparable to the high-resolution products, except for ORAS at 50 m depth. In contrast, GOFS3.0 has some of the lowest correlations (r < 0.7) at 20 and 50 m. The reanalyses also better represent the standard deviation at the surface than at depth. In the GOM, temperature in the upper ocean is influenced by surface heating and the inflow of fresh and cold water from the Scotian Shelf and its modification through convective mixing (Mountain and Manning, 1994). Therefore, variations in r are primarily related to the reanalyses not accurately depicting the interannual variations in surface heating and cooling, and the advection and vertical mixing of cold waters coming from the north.
Fig. 17. Taylor diagrams for the comparisons of reanalyses and observed subsurface temperature at three NERACOOS mooring stations (A, E, F) in the GOM (locations shown in Fig. 1b) for values at 1 m (a-c), 20 m (d-f) and 50 m (g-h). The radial coordinate of the Taylor diagrams is the standard deviation, the angular coordinate is the correlation (r), and the RMSE is proportional to the distance from the observation standard deviation to the reanalyses. Black symbols indicate the temperature standard deviation from each mooring location and depth.

Temperature trends
The temperature on the NES has been increasing over the last century at a rate of about 0.007 °C/year in the MAB and of 0.0010 °C/year in the Gulf of Maine (Shearman and Lentz, 2010). Over the last 10 years, the warming trend has accelerated: recent studies have shown an SST trend of about 0.26 °C/year in the GOM (Mills et al., 2013) and 0.24 °C/year over the upper 200 m using the Oleander data (Forsyth et al., 2015). If one is interested in using reanalyses to dynamically downscale climate projections, it is important to know how well the warming trends are being reproduced in each simulation. Here, we will use the temperature data from the NERACOOS moorings and the Oleander profiles to assess how well the reanalyses reproduce temperature trends in the NES.
Table 2 shows the trends from all three moorings averaged at each depth and from the Oleander XBT profiles shoreward of the 200 m isobath over the time periods described in the table caption (time series shown in Supplementary Material; Fig. S6). All of the observations show a warming trend at the surface and at depth. The moorings show a trend of 0.1 °C/year at the surface, 0.25 °C/year at 20 m and 0.24 °C/year at 50 m, while the Oleander dataset shows a trend of 0.06 °C/year. All reanalyses reproduce a warming trend in the GOM, with GLORYS and BRAN best reproducing the trends at all depths. At the surface, SODA and ORAS reproduce the trend as well as GLORYS and BRAN (differences of less than 0.01 °C/year), while at depth, CFSR and ECCO are the second-best reanalyses, with differences of less than 0.03 °C/year between the reanalyses and the observations. When comparing the trends from the Oleander dataset, all reanalyses underestimate the trend, with BRAN and GOFS3.0 showing the best comparisons and a trend of 0.03 °C/year, while GOFS3.1 has the slowest warming trend of 0.02 °C/year.
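A temperature trend in °C/year of the kind reported above can be estimated from a time series by fitting a straight line. The sketch below assumes an ordinary least-squares fit, which may differ from the exact method behind Table 2; the function name is hypothetical.

```python
import numpy as np


def linear_trend_per_year(time_years, temperature):
    """Least-squares slope of temperature against time, in units per year.

    time_years  : 1-D array of decimal years (e.g. 2005.5)
    temperature : 1-D array of temperatures; NaN values are ignored
    """
    t = np.asarray(time_years, dtype=float)
    y = np.asarray(temperature, dtype=float)
    ok = np.isfinite(t) & np.isfinite(y)
    # Centering time keeps the fit well conditioned for large year values
    t_centered = t[ok] - t[ok].mean()
    slope, _intercept = np.polyfit(t_centered, y[ok], 1)
    return float(slope)
```

Applying the same function to the observed series and to each co-located reanalysis series gives directly comparable trends, since both are then sampled at the same times and share the same gaps.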

Interannual variability of temperature and salinity using the NEFSC dataset
Here, we assess the interannual variability of surface and bottom temperature and salinity in each reanalysis relative to observations collected by the NEFSC from 1994 to 2017 within the regions shown in Fig. 1b. First, the reanalyses are interpolated onto the position (longitude and latitude) and date corresponding to each observation using the nearest neighbor method. (Note that daily data are not available for CFSR, ECCO, and ORAS and thus, they are not included in this analysis.) Then, the yearly mean is computed for both reanalyses and observations and averaged over each region. Fig. 18 presents the mean biases per region and variable as a quilt diagram (time series shown in Supplementary Material; Figs. S6-S7).
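The two steps described above, nearest-neighbor matching of a gridded field to each observation and averaging of yearly means, can be sketched as follows. The sketch assumes a rectilinear grid with 1-D time, latitude and longitude axes and times expressed as plain numbers (for instance days since a reference date); the function names are hypothetical.

```python
import numpy as np


def colocate_nearest(field, t_grid, lat_grid, lon_grid, t_obs, lat_obs, lon_obs):
    """Value of a (nt, nlat, nlon) field at the grid point and time step
    nearest to each observation."""
    field = np.asarray(field, dtype=float)
    it = np.abs(np.asarray(t_grid)[:, None] - np.asarray(t_obs)[None, :]).argmin(axis=0)
    iy = np.abs(np.asarray(lat_grid)[:, None] - np.asarray(lat_obs)[None, :]).argmin(axis=0)
    ix = np.abs(np.asarray(lon_grid)[:, None] - np.asarray(lon_obs)[None, :]).argmin(axis=0)
    return field[it, iy, ix]


def yearly_mean_bias(model_vals, obs_vals, years):
    """Average over years of (yearly-mean model - yearly-mean observation)."""
    model_vals = np.asarray(model_vals, dtype=float)
    obs_vals = np.asarray(obs_vals, dtype=float)
    years = np.asarray(years)
    diffs = [model_vals[years == y].mean() - obs_vals[years == y].mean()
             for y in np.unique(years)]
    return float(np.mean(diffs))
```

Averaging by year before averaging across years keeps a heavily sampled year from dominating the regional bias, which matters for survey data whose sampling density changes over time.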
Results are similar to the mean biases discussed above (Figs. 9-12). Overall, bottom temperatures have larger biases than surface temperatures. The reanalyses are biased towards warmer SST, with some exceptions such as SODA in the southern MAB (SMAB) (-0.45 °C). Georges Bank (GB) and EGOM have the largest biases in SST, with the former measuring roughly 0.7 °C in GLORYS and GOFSs and the latter roughly 1 °C in SODA. SODA and BRAN have mean biases of 0.3 °C and 0.1 °C, respectively, in the GB region. The largest biases and differences in bottom temperature are observed in the northern and southern MAB, with SODA and BRAN showing the largest negative (<-1 °C) and positive (>2 °C) biases. By comparison, GLORYS performs best across all regions. Both results are consistent with the spatial pattern of bottom temperature biases shown in Fig. 11. There are significant regional differences: the eastern GOM is weakly biased toward colder bottom temperatures while the western GOM is biased toward warmer bottom temperatures. The GB region is biased cold across all reanalyses except for BRAN, which is biased toward warmer temperatures in all regions except the EGOM, likely due to the absence of the Cold Pool as has been discussed throughout this study. The large biases in bottom temperature and the variability observed across the various reanalysis products are not observed in SST. Processes like the misrepresentation of the topography, vertical mixing and advection of waters may contribute to these differences. Moreover, reanalyses assimilate fewer observations at depth than at the surface.
The surface and bottom salinity biases are less than 1 psu, with SODA showing the lowest biases in surface salinity. As with the temperature, there are pronounced differences between regions and reanalyses. The SMAB is saltier and presents the largest biases (≈0.6 psu) at the surface, with, interestingly, smaller values for bottom salinity (≈0.2 psu). These large surface biases are likely due to the misrepresented Gulf Stream in the coarser products, discussed earlier, and perhaps due to an incorrect representation of river sources, which are important in the region (Castelao et al., 2008; Whitney, 2010; Geiger et al., 2013). The bottom salinity has fresh biases in SODA and BRAN in all regions except for BRAN in the SMAB, while GLORYS is biased towards salty waters (<0.3 psu) and GOFSs have both warm and salty biases across all regions. Some of the bottom temperature biases discussed above could be related to the presence (or absence) of the Cold Pool, a bottom-trapped water mass important for the fisheries management of the NES, particularly for the southern New England yellowtail flounder (Sullivan et al., 2005; Miller et al., 2016; Xu et al., 2018). We have calculated the Cold Pool Index using the NEFSC and reanalyses bottom temperature (Fig. 19). Our calculation is adapted from Miller et al. (2016) and du Pontavice et al. (2022) as follows. We first delineate the spatial domain comprising the MAB and the Southern New England shelf between the 20 m and 200 m isobaths (Sullivan et al., 2005). We then define the Cold Pool domain as the area within that domain where the average bottom temperature was cooler than 10

Sea level near the coast
Satellite altimetry, which is fully or partially assimilated by most of the reanalyses, has known errors near the coast. Sea level data from tide gauges, which are not assimilated by any of the reanalyses, are used as an independent validation dataset for sea level near the coast. We choose three locations with the longest available records: Woods Hole, MA; Atlantic City, NJ; and Chesapeake Bay. Observations and reanalyses are linearly detrended and the seasonal cycle is removed before comparing the monthly averages. Sea level data from tide gauges have been adjusted for the inverted barometer effect following Piecuch and Ponte (2015) by removing the response to barometric pressure using the monthly sea level pressure data from ERA5. Reanalyses do not need to be adjusted for variations caused by the inverted barometer effect since none of the reanalyses consider pressure forcing. A Taylor diagram showing these comparisons is presented in Fig. 20. Observations and reanalyses are normalized by the standard deviation of the observations. The time series used in this comparison are found in the Supplementary Material (Fig. S8).
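The preprocessing described above can be sketched for a monthly series. The inverted barometer response is written here in its common hydrostatic form, -(p - p_ref)/(ρg), with the time mean of the local pressure used as the reference; this reference, the constant density and the function names are simplifying assumptions and may differ from the formulation of Piecuch and Ponte (2015).

```python
import numpy as np

RHO_SEAWATER = 1025.0  # kg m^-3, assumed constant density
GRAVITY = 9.81         # m s^-2


def inverted_barometer(slp_pa, p_ref=None):
    """Sea-level response (m) to sea level pressure anomalies (Pa)."""
    slp = np.asarray(slp_pa, dtype=float)
    if p_ref is None:
        p_ref = np.nanmean(slp)  # assumption: time-mean local pressure
    return -(slp - p_ref) / (RHO_SEAWATER * GRAVITY)


def detrend_and_deseason(monthly, months):
    """Remove a linear trend, then the mean of each calendar month."""
    y = np.asarray(monthly, dtype=float)
    months = np.asarray(months)
    t = np.arange(y.size, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    detrended = y - (slope * t + intercept)
    out = detrended.copy()
    for m in range(1, 13):
        sel = months == m
        if sel.any():
            out[sel] -= detrended[sel].mean()
    return out
```

A tide-gauge series would be adjusted as `gauge - inverted_barometer(slp)` before `detrend_and_deseason` is applied, while the reanalysis series would go through `detrend_and_deseason` only.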
Overall, correlations are smaller than 0.9 and the RMSE is larger than 0.4. ORAS, SODA, BRAN, and GLORYS are clustered in the bottom left corner of the Taylor diagram and have r larger than 0.6, with BRAN and GLORYS having the largest correlation (r > 0.8) and smallest RMSE. These products have standard deviations 20% lower than the observations and their RMSE is between 0.6 and 0.8. On the other hand, CFSR, ECCO, and GOFSs have the lowest correlations (r < 0.2) with standard deviations almost twice the size of observations. Several factors might contribute to the low r values and the large RMSE. For example, the coarser resolution CFSR and ECCO likely do not resolve the sea level variability associated with local winds, which are important for sea level variability near the coast (Piecuch and Ponte, 2015; Piecuch et al., 2016; Andres et al., 2013; Woodworth et al., 2014).
The low r values in the high-resolution GOFS products might be related to the hybrid vertical coordinate system of the model. Conversely, ORAS does not assimilate satellite altimetry in regions having depths shallower than 500 m and uses coastal winds to improve the assimilation (Mogensen et al., 2012; Zuo et al., 2019).

Summary and discussions
We have compared temperature, salinity and sea surface height from eight ocean reanalysis products to a variety of in situ, satellite-derived and climatological observations on the Northeast U.S. continental shelf (NES). Table 2 summarizes our comparisons. For each metric, the reanalysis products that compare best to observations are listed in bold, and products whose absolute error is within 10% of that of the best reanalysis are additionally highlighted with shaded backgrounds. Overall, there is not one product that is best across all metrics and across all regions. Out of the 65 comparisons, the high-resolution products outperform the coarser products for all but three. More specifically, GLORYS and BRAN outperformed the rest of the products, performing best in 22 and 25 of the categories, respectively. In addition, these two products produced error metrics within 10% of the best reanalysis in 36 and 35 of the comparisons. However, BRAN performed as poorly as some of the coarser resolution products in its estimation of subsurface temperature over the shelf, as it did not reproduce the Cold Pool water mass, which is important for fisheries in the region. Overall, GLORYS most accurately reproduces the subsurface temperature on the shelf and the SSH variability; BRAN most closely reproduces the surface temperature on the shelf and sea level near the coast; GOFS3.1 performs better than the rest of the products in reproducing the density gradient at the shelfbreak front; and GOFS3.0 performs best in reproducing the vertical structure of salinity.
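As a rough illustration of the Table 2 convention described above, the sketch below picks the best product for one metric and flags those within 10% of it. The function name, the interpretation of "within 10%" as an error no larger than 1.1 times the smallest error, and the numbers in the usage example are all hypothetical.

```python
def rank_products(errors, tolerance=0.10):
    """Return the best product and the products within `tolerance` of it.

    `errors` maps product name -> absolute error for a single metric.
    Assumption: "within 10%" means error <= (1 + tolerance) * smallest error.
    """
    best = min(errors, key=errors.get)
    limit = errors[best] * (1.0 + tolerance)
    close = sorted(p for p, e in errors.items() if p != best and e <= limit)
    return best, close


if __name__ == "__main__":
    # Hypothetical error values for one metric, not taken from Table 2.
    example = {"GLORYS": 0.50, "BRAN": 0.54, "SODA": 0.80}
    print(rank_products(example))
```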
Depending on the application, the temporal and spatial resolutions and record length of the reanalysis may be important factors to consider when choosing a reanalysis. ORAS, SODA and CFSR are available from the 1980s onward (when routine satellite SST measurements became available), while the rest of the products are available from the early 1990s (when satellite SSH observations became available). Computational resources might make the coarser resolution products easier to obtain than the high-resolution products. Therefore, SODA might be a viable choice if one is interested primarily in temperature and salinity, or ORAS if the interest is in sea level near the coast (Table 2).
Even though reanalysis products provide more accurate information than unconstrained numerical model simulations, errors remain due to the inadequate coverage of ocean observations (Balmaseda et al., 2015; Toyoda et al., 2017), the assimilated data, assimilation methods, model physics, and atmospheric forcing. Below we present a brief summary of the processes likely responsible for the observed biases in this study. Reanalyses are primarily limited by their spatial resolution. CFSR, ECCO and ORAS show the Gulf Stream separation shifted ≈2 degrees of latitude further north than in satellite observations, leading to mean sea level RMSE larger than 40 cm near the shelf. This unrealistic separation brings warmer and saltier surface water closer to the NES and likely produces mean biases larger than 3 °C and 2 psu in surface temperature and salinity in the MAB. Moreover, the shelfbreak topography, a dynamical factor that inhibits shelf-slope exchanges of water masses, is not accurately represented in the coarser resolution products. As a result, the shelf and slope waters in these products can be too salty and warm, leading to an unrealistic (weaker than observed) frontal structure in the MAB.
There are significant regional differences common across the reanalyses. The largest positive salinity biases are also found in the southern MAB and closer to the coast, even in the high-resolution products (≈1 psu), albeit weaker than in the low-resolution products. This suggests that the reanalyses might not accurately represent river sources. Studies have shown that the nearby Hudson River, Delaware Bay and Chesapeake Bay deliver large fresh surface salinity anomalies to the southern MAB shelf (Castelao et al., 2008; Whitney, 2010; Geiger et al., 2013). Of the high-resolution products, BRAN has the most difficulty in reproducing the subsurface thermohaline structure on the shelf, as it is the only high-resolution product missing the Cold Pool. The absence of tidal mixing likely contributes to biases observed in the bottom temperature and salinity in these regions, particularly in the GOM and Georges Bank where tides are strong. On the other hand, some reanalyses might exhibit excessive mixing across the weak shelfbreak front or they might not represent the advection of cold and fresh waters from the Scotian Shelf, increasing the temperature and salinity biases on the NES.

Fig. 20. Taylor diagrams for the comparisons between reanalyses and observed sea level at 3 tide gauge stations. Comparisons are made from data at Chesapeake Bay (749; a), Atlantic City, NJ (264; b), Woods Hole, MA (742; c) (tide gauge locations are shown in Fig. 1b). Reanalyses and observations have been normalized using the standard deviation of the observation so that observations are always located on the x-axis with standard deviation equal to 1.
One important result from this study is that all eight reanalyses reproduce a warming trend on the NES at the surface and at depth, with some differences among reanalyses not necessarily related to the reanalysis resolution. BRAN and GOFS3.0 best represent the trend when compared with the Oleander observations, while GLORYS and CFSR best represent the subsurface temperature trend in the GOM.
The depth of the real ocean bottom varies significantly from the depth of the nearest reanalysis grid cell (Fig. 3). This can inflate biases computed for bottom temperature and salinity, particularly in regions having more variable topography, like near the shelfbreak or in the Gulf of Maine. For example, in the GOM, the Northeast Channel is absent in CFSR while better represented in the high-resolution products. This channel is crucial because it is the main conduit for deeper slope waters to enter the NES region, and therefore responsible for setting the thermohaline structure in the deep GOM, establishing density gradients that drive the general circulation in the basin, and contributing to the hydrography in the shallower waters further downstream.
Each reanalysis assimilates a different set of observations and uses a different assimilation method. A summary of the data assimilated and the assimilation method used by each reanalysis product is shown in Table 1. CFSR, ECCO and SODA assimilate different versions of the WOD, which is also used to compute the NCEI climatology. BRAN and GLORYS assimilate CORA, ORAS assimilates EN4, and the GOFS products assimilate several datasets; all of these include data from the WOD and are therefore included in the NCEI climatology. Similarly, satellite observations of temperature and sea level are assimilated by most of the reanalyses, but each product assimilates a different set of satellite observations (Table 1) and uses a different reference mean sea level to assimilate data, making it difficult to estimate the biases. Therefore, the comparatively small biases in GLORYS SSH could be related to the fact that the CMEMS product is used as the reference sea level when assimilating SLA, while the small biases in CFSR SST could be related to the fact that the NOAA OISST product is used in the CFSR assimilation. The assimilation method is also a factor in these comparisons. The Kalman filter assimilation method has been shown to perform better than 3D-Var in many cases (Miyoshi, 2005; Whitaker et al., 2008), perhaps contributing to the better agreement realized by GLORYS. Some of the biases presented in this study may also exist due to the choice of the observational dataset used for the intercomparison. The post-processing algorithm used to derive the NOAA OISST product can produce errors as significant as some of the biases found in this analysis (0.6 °C; Reynolds et al., 2007).
Fisheries groups are increasingly turning to reanalysis products for a more comprehensive description of the physical environment on the NES. As an example, statistical predictive models using GLORYS bottom temperature have been used in stock assessments (NEFSC, 2020) and to improve the skill of short-term forecasts (Chen et al., 2021). Therefore, the comparisons presented in this study could have important implications for fisheries management on the NES. As an example, the Cold Pool, a bottom-trapped water mass important for the recruitment and spawning of the southern New England yellowtail flounder, is only well represented in four (SODA, GLORYS and the two GOFS products) of the eight reanalyses compared here.
This comprehensive survey provides information to academics, governmental agencies, and industries on which reanalysis is best depending on the focus of interest. Our results show that the reanalysis products are limited in representing the coastal environment, emphasizing the need for regional downscaled modeling in the NES.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 2. Temporal availability of all observational data sets and the 8 different global ocean reanalyses used in this study. The symbol * next to the dataset name denotes data that are available before 1979. The symbol + denotes data that are available as climatology in decadal periods only.

Fig. 3. Bathymetry from (first and third column) the ETOPO1 global elevation dataset having 1 arc-minute resolution and each reanalysis used in this study and (second and fourth column) the difference between each of the reanalysis bottom depths used in this study minus the ETOPO1 bathymetry. Note that the pixel size in each panel corresponds to the horizontal resolution of each reanalysis, since the ETOPO1 bathymetry has been interpolated onto the reanalysis grid before differencing. The 50 and 200 m isobaths are shown by the yellow contours. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Difference between reanalysis and observed Sea Surface Height (SSH, cm) from CMEMS (reanalysis minus observation). Comparisons are made using monthly data between 1994 and 2012. The grey solid (positive values) and dashed (negative values) thin lines indicate the mean SSH from observations at an interval of 10 cm from −100 to 100. The black (dark grey) thick lines indicate the Gulf Stream path calculated from the altimetry (reanalyses). The green dashed line indicates the 2000-m isobath. The spatial RMSE of the biases in cm for each reanalysis is shown on top of each panel. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Biases associated with the Gulf Stream position (ΔSSH_synt; first and third column) and with the Gulf Stream width (ΔSSH_residual; second and fourth column) calculated from the biases shown in Fig. 4. The grey lines indicate the mean SSH from observations at an interval of 10 cm. The black (dark grey) thick lines indicate the Gulf Stream path calculated from the altimetry (reanalyses).

Fig. 6. SSH variability (cm) from (a) CMEMS and (b-i) reanalyses. Comparisons are made using monthly data between 1994 and 2012 to overlap with all the reanalyses. The maximum value for each dataset is shown on top of each panel. The seasonal cycle has not been removed in any of the panels. The black (dark grey) thick lines indicate the Gulf Stream path calculated from the altimetry (reanalyses). The grey dashed line indicates the 2000-m isobath.

Fig. 7. Difference in SSH variability between reanalysis products and CMEMS gridded altimetry product (cm; reanalysis minus observation). Comparisons are made using monthly data between 1994 and 2012 to overlap with all the reanalyses. The spatial RMSE of the biases in cm for each reanalysis is shown on top of each panel. The black (dark grey) thick lines indicate the Gulf Stream path calculated from the altimetry (reanalyses). The grey dashed line indicates the 2000-m isobath.
Fig. 13 shows a cross-shelf vertical transect of temperature from the NCEI climatology and the reanalyses, along with the associated biases. On the shelf, the seasonal progression of temperature is characterized by the transition from a vertically well-mixed water column in winter to a vertically stratified one in summer. The formation of the seasonal thermocline during spring and summer traps the Cold Pool of winter-mixed water near the bottom. In winter, observations and reanalyses both show colder waters (≈5 °C) inshore of warmer waters (>10 °C). In summer, surface temperatures exceed 20 °C across all products, but the bottom-trapped Cold Pool (≈5 °C) is only observed in GLORYS, GOFSs and SODA. In winter, temperature biases are less than 3 °C and weaker than in summer, except for ECCO, which has a positive subsurface bias as large as 6 °C. The smallest mean bias is observed in GOFS3.1 (0.36 °C). In summer, mean biases are larger than 3 °C in CFSR, ECCO, ORAS and BRAN and have the smallest value of 1.4 °C in GLORYS and GOFS3.1. The cross-frontal vertical transect of salinity and its biases in comparison to the NCEI climatology are shown in Fig. 14. The 34.5 isohaline contour, which denotes the front location, is overlaid on Fig. 14. The horizontal distribution of salinity does not change much between seasons, with fresh water always found inshore of salty water and the mean front located at about the 100 m isobath. The front is weaker in CFSR, ECCO and ORAS, reinforcing the notion that resolution is an important factor in reproducing the shelfbreak front. The front is well

Fig. 9. Difference between mean Sea Surface Temperature (SST; °C) from reanalysis and NOAA OISST observation (reanalysis minus observation). Comparisons are made between 1994 and 2017 when all datasets overlap. The spatial RMSE of the biases in °C for each reanalysis is shown on top of each panel. The black dashed line indicates the 200-m isobath from each reanalysis (Fig. 3).

Fig. 10. Difference between Sea Surface Salinity (SSS; psu) from reanalysis and the NCEI climatology (reanalysis minus observation-based climatology). The spatial RMSE of the biases in psu for each reanalysis is shown on top of each panel. Comparisons are made between 1995 and 2012 to overlap with the NCEI climatology. The black dashed line indicates the 200-m isobath from each reanalysis (Fig. 3).

Fig. 11. Difference between reanalysis and observed Bottom Temperature (BT; °C). Observations are provided by NCEI. The spatial RMSE of the biases in °C for each reanalysis is shown on top of each panel. Comparisons are made between 1995 and 2012 to overlap with the NCEI climatology. The black dashed line indicates the 200-m isobath from each reanalysis (Fig. 3).

Fig. 12. Difference between reanalysis and observed Bottom Salinity (BS; psu). Observations are provided by NCEI. The spatial RMSE of the biases in psu for each reanalysis is shown on top of each panel. Comparisons are made between 1995 and 2012 to overlap with the NCEI climatology. The black dashed line indicates the 200-m isobath from each reanalysis (Fig. 3).

Fig. 13. Mean temperature across the north-south cross-shelf section shown in Fig. 1b for the NCEI climatology and the 8 reanalyses for winter (first column) and summer (second column). The black contours indicate the along-shelf velocity at intervals of 2 cm s⁻¹. The solid (dashed) lines indicate eastward (westward) velocities. Mean temperature biases between the NCEI climatology and the reanalyses for winter (third column) and summer (fourth column) are also shown. Winter means are calculated from January to March and summer means are calculated from July to September. Comparisons are made between 1995 and 2012 to overlap with the NCEI climatology. The spatial RMSE of the biases is shown in the bottom left corner of the third and fourth columns. The bathymetry from ETOPO1 and from each reanalysis is interpolated onto the transect of the observations and simulations and is shown as a solid black line.

Fig. 14. As in Fig. 13 but for salinity and with the dashed black line indicating the 34.5 isohaline contour, which is an indicator for the location of the shelfbreak front.

Fig. 15. The seasonal variation of the cross-shelf density gradients from the NCEI climatology and the coarser (left column) and high-resolution reanalyses (right column) averaged over three different depth ranges: 5-15 m (top row), 25-35 m (middle row) and 45-55 m (bottom row). The gradients are calculated over the vertical transect shown in Figs. 13 and 14.

A. Carolina Castillo-Trujillo et al.

E01 and F01) at different depths (1, 20 and 50 m) are used to assess the interannual variability of subsurface temperature in the GOM from 2004 to 2018. Comparisons at 20 m and 50 m depth are limited to 2004-2012. The observed and reanalyzed temperature time series are first detrended and the seasonal cycle removed. Taylor diagrams summarizing the comparisons of monthly anomalies between observations and reanalyses are shown in Fig. 17. The time series used for these analyses are shown in the supplementary material (Figs. S3

Fig. 18. Quilt diagram showing a summary of the mean biases (reanalyses minus observations) of the interannual variability in surface and bottom temperature (°C; a-b) and salinity (psu; c-d) using the NEFSC temperature and salinity dataset for the SMAB, NMAB, GB, WGOM and EGOM (regions delineated in Fig. 1).

Fig. 19. Cold Pool Index (CPI) time series between 1994 and 2017. The dark black line is calculated from the NEFSC bottom temperature observations. Only reanalyses which have a Cold Pool domain, defined as the area where the time-averaged bottom temperature is cooler than 10 °C between June and September from 1994 to 2017, are shown. Mean biases between the reanalyses and observations are shown in parentheses.
°C between June and September from 1994 to 2017. The Cold Pool Index from observations was calculated as follows: the NEFSC temperature data were interpolated onto 0.25 degree longitude and latitude grid cells and the time-averaged bottom temperature between June and September was calculated at each grid point to define the Cold Pool domain. The Cold Pool Index was then calculated as the difference between the yearly averaged bottom temperature and the averaged bottom temperature from June to September over the period from 1994 to 2017, over the Cold Pool domain only. The Cold Pool Index from each reanalysis was calculated as the sum of the difference between the yearly averaged bottom temperature and the time-averaged bottom temperature from 1994 to 2017 between June and September at each grid point (only over the Cold Pool domain). The Cold Pool Index from observations and reanalyses is shown in Fig. 19. Consistent with our previous comparisons, the Cold Pool domain is not reproduced in BRAN and, of the coarser products, it is only reproduced in SODA. Reanalyses are biased towards warmer temperatures except for GOFS3.0. GLORYS best reproduces the index, with a mean bias of 0.18 °C, while the GOFS products have the largest biases (0.34 °C and 0.35 °C for GOFS3.0 and GOFS3.1, respectively).
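The Cold Pool Index calculation described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' implementation: the function name and array layout are assumptions, and the anomalies are averaged over the Cold Pool domain instead of summed, which changes the result only by a factor equal to the number of grid cells in the domain.

```python
import numpy as np


def cold_pool_index(bottom_temp, threshold=10.0):
    """Compute a Cold Pool Index from gridded bottom temperature.

    bottom_temp : array of shape (n_years, n_lat, n_lon) holding the
        June-September mean bottom temperature (°C) for each year
        (hypothetical input layout).
    Returns (index, domain): one index value per year and the boolean
    mask of the Cold Pool domain.
    """
    bt = np.asarray(bottom_temp, dtype=float)
    climatology = bt.mean(axis=0)            # long-term June-September mean per cell
    domain = climatology < threshold         # Cold Pool domain: mean cooler than 10 °C
    anomalies = bt - climatology             # yearly departure from the long-term mean
    # Assumption: average (not sum) the anomalies over the Cold Pool domain.
    index = anomalies[:, domain].mean(axis=1)
    return index, domain


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic 24-year record on a 4 x 5 grid, for illustration only.
    synthetic = 8.0 + rng.standard_normal((24, 4, 5))
    cpi, mask = cold_pool_index(synthetic)
    print(cpi.round(2), int(mask.sum()))
```

A product with no grid cell cooler than the threshold returns an empty domain, so the index is undefined (NaN); this corresponds to the case in which a reanalysis has no Cold Pool domain.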