A mapped dataset of surface ocean acidification indicators in large marine ecosystems of the United States

Mapped monthly data products of surface ocean acidification indicators from 1998 to 2022 on a 0.25° by 0.25° spatial grid have been developed for eleven U.S. large marine ecosystems (LMEs). The data products were constructed using observations from the Surface Ocean CO2 Atlas, co-located surface ocean properties, and two types of machine learning algorithms: Gaussian mixture models to organize LMEs into clusters of similar environmental variability and random forest regressions (RFRs) that were trained and applied within each cluster to spatiotemporally interpolate the observational data. The data products, called RFR-LMEs, have been averaged into regional timeseries to summarize the status of ocean acidification in U.S. coastal waters, showing a domain-wide carbon dioxide partial pressure increase of 1.4 ± 0.4 μatm yr−1 and pH decrease of 0.0014 ± 0.0004 yr−1. RFR-LMEs have been evaluated via comparisons to discrete shipboard data, fixed timeseries, and other mapped surface ocean carbon chemistry data products. Regionally averaged timeseries of RFR-LME indicators are provided online through the NOAA National Marine Ecosystem Status web portal.


Background & Summary
The accumulation of carbon dioxide (CO 2 ) in the atmosphere as a result of human activities, and the uptake of ~25% of anthropogenic CO 2 by the ocean 1,2 , has led to increasing acidity of ocean waters of about −0.016 pH units per decade on a global scale since the 1980s [3][4][5][6] .This ocean acidification (OA) signal is measurable at time series sites 7,8 , observed in mapped data products of CO 2 partial pressure 6,[9][10][11][12] , captured by decadal repeat hydrographic cruises 13,14 , and simulated by ocean models 15 and coupled Earth system models 5,16,17 .Superimposed on steady increases in accumulated anthropogenic carbon (C ant ) and decreases in ocean pH, however, are various modes of temporal (e.g., diurnal, seasonal, interannual) and spatial (e.g., latitudinal, nearshore-offshore) variability, which are particularly pronounced in coastal ecosystems.This variability in coastal OA brings unique impacts to marine organisms that reside in coastal zones and are vulnerable to corrosive waters 18 .
Large marine ecosystems (LMEs) are ocean regions that border coastlines and are characterized by distinct bathymetry, hydrography, productivity, and trophic structure 19 .LMEs encompass estuaries and river mouths, nearshore coastal zones, continental shelves, and the outer margins of ocean current systems.Typically, the offshore boundary of an LME extends to the continental shelf break or to the seaward edge of a current system.Due to their coastal proximity, LMEs tend to be natural hotspots of variability in carbon cycling and rapid exchange between carbon pools.For example, intense surface primary productivity in the coastal ocean is fueled by nutrients from river input, atmospheric deposition, and coastal upwelling 18 ; sinking organic matter from surface production leads to intense respiration throughout the water column and at the seafloor 20 ; and high rates of sedimentation are observed in LMEs from both biogenic and lithogenic inputs 21 .
Ongoing anthropogenic climate drivers coupled with the natural processes occurring in coastal ecosystems make it challenging to attribute modes of OA variability to the appropriate driving mechanisms.For example, anthropogenic eutrophication from freshwater runoff and atmospheric pollution can augment natural nutrient inputs, leading to even greater net primary production in coastal surface waters and greater respiration in subsurface waters.Whereas the direct effect of CO 2 uptake by primary producers mitigates OA at the surface, highly respired subsurface waters can be laterally transported and upwelled onto the continental shelf, leading to enhanced OA in the surface waters there 18,22 .These and other OA-modulating processes differ across ecosystems 23 , but their impacts are frequently correlated with environmental driver variables such as sea surface height, temperature, salinity, and chlorophyll-a concentration.These correlations allow OA metrics to be reconstructed from measurements and data products that are available at high spatial and temporal resolution 11,24,25 .
The data product described here is based on direct observations, which are used to reconstruct a recent history of surface ocean OA indicators at monthly, 0.25° resolution in U.S. LMEs.Observations are from a publicly available, annually updated database of surface CO 2 observations: the Surface Ocean CO 2 Atlas (SOCAT) 26 .SOCAT is an international data synthesis effort that has facilitated the production of global surface CO 2 flux maps 27 that contribute data-constrained estimates of the ocean CO 2 sink in the Global Carbon Budget 28 .We also rely on publicly available satellite-derived surface ocean properties and data reanalysis products to leverage the predictive power of environmental variables for upscaling SOCAT observations across U.S. LMEs.This kind of spatiotemporal upscaling has historically been accomplished using statistical interpolations 29,30 , multiple linear regressions 31 , and machine learning approaches 9,24,32,33 .We build upon the approach of Sharp et al. 24 -who presented a monthly surface ocean CO 2 partial pressure (pCO 2 ) mapped product for the California Current System region called RFR-CCS -to train random forest regression (RFR) algorithms to predict surface CO 2 fugacity (fCO 2 ) from environmental variables that can be derived with spatial and temporal continuity across U.S. LMEs.
We advance the Sharp et al. 24 approach by first clustering each LME into sub-regions with similar environmental variability using Gaussian mixture modelling.In addition to fCO 2 , we predict surface total alkalinity (and nutrients) from empirical property estimation algorithms that have been validated and published 34 .We use fCO 2 and total alkalinity (A T ) to compute eight additional OA indicators -partial pressure of CO 2 , total dissolved inorganic carbon, pH on the total scale, hydrogen ion amount content, carbonate ion amount content, saturation states for aragonite and calcite, and the Revelle factor -to produce monthly data products over 1998-2022 on a 0.25° × 0.25° resolution grid.We refer to these data products as RFR-LMEs 35 , which are freely available online and will be updated annually.Throughout this paper, we will use the term "mapped data products" to describe RFR-LMEs; "mapping" refers to the reconstruction of OA indicators on monthly, spatially continuous grids via the two-step approach of clustering on regional variability and applying trained RFR algorithms to gridded predictor variables.
This work was partially motivated by a partnership with the National Oceanic and Atmospheric Administration (NOAA) Ecosystem Indicators Working Group (EIWG), who manage the National Marine Ecosystem Status (NaMES) website (https://ecowatch.noaa.gov).The NaMES website was created to provide an at-a-glance overview of conditions in U.S. LMEs.These conditions are presented as indicators, which are quantitative and/or qualitative measures of key components of the ecosystem and span the following categories: climatological (e.g., El Niño Southern Oscillation index), physical-chemical (e.g., sea surface temperature), biological (e.g., chlorophyll-a concentration), and human dimensions (e.g., coastal county population).Indicator datasets are used by many NOAA stakeholders, such as fisheries managers, to monitor their ecosystems of interest and to assess the potential for future changes.Indicators included on the NaMES website must be theoretically sound, have demonstrable importance to the system, be relevant and understandable, show sensitivity to environmental variability or policy actions, and complement other indicators that are already served.This paper will describe the theoretical basis of RFR-LMEs and their relevance their respective ecosystems, to justify the use of RFR-LMEs as NaMES indicators of ocean acidification.The NaMES requirements also state that the data used to develop ecosystem indicators should be publicly available, quantitative, directly measurable, and updated on a regular basis; they stipulate that data should have adequate spatial coverage and that the time-series duration should be greater than 10 years and expected to continue for the foreseeable future.Because RFR-LMEs fit these requirements, we aggregate three mapped OA indicators from RFR-LMEs into monthly and annual regional averages of those indicators (and their uncertainties).Timeseries of the selected OA indicators (pCO 2 , pH on the total scale, and aragonite saturation state) are available on the NaMES website and, like the RFR-LME mapped data products, will be updated annually.

Methods
An overview of the methodological procedure to create RFR-LMEs is provided in Fig. 1.First, data were obtained from a variety of sources and bin-averaged or interpolated onto a consistent grid.Then, within each LME, a two-step cluster-regression strategy was employed.In the first step, spatial clusters were created using Gaussian mixture models (GMMs) based on variability in environmental predictors.In the second step, random forest regression (RFR) algorithms were trained for each cluster using fCO 2(SOCAT) as the target variable and co-located environmental variables as predictors.These algorithms were then applied to gridded (0.25° × 0.25°) monthly environmental predictor fields to create monthly RFR-LME mapped data products of sea surface CO 2 fugacity (fCO 2(RFR-LME) ).Applying GMMs on surface data to first divide each LME into subregions reduces the burden on the RFRs to represent many different regimes of dynamic variability at once.Therefore, the RFR algorithms are able to reconstruct sea surface fCO 2 more accurately than if all data points from the entire LME were included in the algorithm training 36 .To create RFR-LMEs for the other indicators, sea surface total alkalinity and nutrient values were estimated, and carbonate system calculations were performed.Uncertainties were propagated through these calculations to obtain uncertainty estimates for each RFR-LME.Finally, RFR-LMEs were evaluated against independent datasets.Data sources.Surface ocean fCO 2 observations were downloaded from the Surface Ocean CO 2 Atlas Version 2023 (SOCATv2023; https://doi.org/10.25921/r7xa-bt92) 37in a large quadrangle surrounding North America and U.S. Pacific Islands with the following coordinates: 18°S to 82°N, 140°E to 58°W (Fig. 2).These observations were filtered by year (1998-2022), dataset flag (A, B, C, or D), and quality flag (q.f.= 2, good data), and binned into 0.25 degrees latitude by 0.25 degrees longitude monthly grid cells using platform-weighted averages.A spatial resolution of 0.25° × 0.25° was chosen largely for coherence with the majority of available predictor datasets.Platform-weighted averages mean that, within each latitude by longitude by month bin, a platform-specific (e.g., ship-only, mooring-only) average was first calculated, then an average was taken of those averages (if more than one platform was represented within the cell).This was done to mitigate unwanted biases toward high-resolution measurement systems.For validation exercises, this binning process was also repeated with only moored buoy observations and with a dataset that excluded moored buoy observations.
Binned observations were grouped into eleven LMEs defined according to the United States Exclusive Economic Zone (EEZ), in accordance with the practice of the NOAA EIWG (Table 1; Fig. 2).Platform-weighted fCO 2 from SOCATv2023 observations (fCO 2(SOCAT) ) in each of these grid cells over time shows large-scale patterns of spatial variability (Fig. 2a) -such as relatively high fCO 2(SOCAT) at the equator and relatively low fCO 2(SOCAT) surrounding Alaska, compared to the region as a whole -and temporal variability (Fig. 2b) -such as relatively high standard deviation in fCO 2(SOCAT) observations surrounding Alaska and near the coastlines of the continental U.S. compared to the relatively low standard deviation in these observations around the Pacific Islands, again compared to the region as a whole.The distribution of the total number of months sampled within each 0.25° × 0.25° grid cell and the number of months of the year sampled at least once across the full dataset (1998-2022) within each grid cell reveal consistent patterns (Fig. 2c,d).The Northeast U.S. has especially high observational coverage (9.5% of all 0.25° × 0.25° monthly grid cells covered and 63.6% of seasonal seasonally binned 0.25° × 0.25° grid cells covered); the Southeast U.S. (4.0%total, 50.0%seasonal), Gulf of Mexico (5.2% total, 53.1% seasonal), Caribbean Sea (6.6% total, 40.2% seasonal), and California Current System (3.0% total, 42.1% seasonal) have moderately high observational coverage (Table 1).Observational coverage generally decreases farther offshore.
Next, gridded fields of satellite, reanalysis, and in situ observational products were downloaded from the sources detailed in Table 2.When applicable, these fields were re-gridded using standard interpolation functions to match the resolution and/or central grid cell positions of the binned fCO 2(SOCAT) observations.In many cases, multiple datasets could be chosen, but preference were given to those that were provided at 0.25° resolution and that covered the relevant time and space.Rigorous comparison between different input datasets is planned for future development of RFR-LMEs as they are prepared for dynamic, operational production.Sea surface temperature (SST; Fig. 2e) and ice concentration were obtained from the NOAA Optimum Interpolation Sea Surface Temperature version 2 (OISSTv2) product at daily, 0.25° × 0.25° resolution 38 ; values were averaged to monthly resolution.Sea surface salinity (SSS; Fig. 2f) and mixed layer depth (MLD) were obtained from the Copernicus Marine Environment Monitoring Service (CMEMS) Global Ocean Ensemble Physics Reanalysis (GLORYS) product at monthly, 0.25° resolution 39 .Sea surface height (SSH) was obtained from the CMEMS satellite gridded product, which is produced at monthly, 0.25° resolution by optimal interpolation of along-track measurements from available altimeter missions 40 .Sea surface chlorophyll (CHL) was obtained from the National Aeronautics and Space Administration (NASA) Ocean Colour Level-3 Mapped Chlorophyll Data product at monthly, 1/12° resolution and re-gridded to 0.25° resolution 41,42 .One-dimensional, linear interpolation was used within each (f) sea surface salinity from the CMEMS GLORYS reanalysis product 39 ; (g) wind speed from the ECMWF ERA5 reanalysis product 43 ; and (h) bathymetry from the ETOPO 2022 Global Relief Model 44 .LME name abbreviations shown in panel (a) are provided in Table 1.
grid cell to fill gaps in the chlorophyll dataset.Wind speed (Fig. 2g) was obtained from the fifth generation European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis for the global climate and weather (ERA5) at monthly, 0.25° resolution 43 .Bathymetry (Z, Fig. 2h) was obtained from the ETOPOv2022 Global Relief Model at 1/60° resolution and re-gridded to 0.25° resolution 44 .Sea level pressure (SLP) was obtained from the National Centers for Environmental Prediction/Department of Energy (NCEP/DOE) Reanalysis II model at monthly, 2.5° resolution and interpolated to 0.25° resolution 45 .Atmospheric pCO 2 was obtained from the NOAA Marine Boundary Layer (MBL) product at weekly resolution and varying latitudinal resolution and was re-gridded to monthly, 0.25° resolution 46 .Binned observations of fCO 2(SOCAT) were co-located in both time and space with the gridded predictors in preparation for algorithm training.
As one of several validation exercises, the pCO 2 reconstructions from our method were compared to other mapped pCO 2 data products downloaded from SeaFlux (v2021.04) 47, which is an ensemble of six surface pCO 2 products that enables users to calculate air-sea CO 2 flux consistently across the global ocean 27 .SeaFlux harmonizes six data-based pCO 2 products: CMEMS-FFNN 9,48 , MPI-SOMFFN 33,49 , and NIES-FNN 50 , which are each constructed using neural networks; JENA-MLS 30 , which is constructed based on a mixed layer scheme; JMA-MLR 10 , which is constructed using multiple linear regressions; and CSIR-ML6 36 , which is constructed using an ensemble of multiple machine-learning techniques.In an additional exercise, RFR-LME mapped data products were evaluated through comparisons to co-located, independently calculated OA indicators from research cruises included in the Global Ocean Data Analysis Project database (GLODAPv2.2022) 51and the Coastal Ocean Data Analysis Project -North America database (CODAP-NA) 52 .Measurements of SST, SSS, A T , C T , and nutrients were obtained from the GLODAP and CODAP-NA databases, then filtered to retain only observations with good quality flags for each of those variables that were collected at a depth of 10 meters or less.

Spatial clustering.
Three clustering methods were tested: self-organizing mapping -a neural-network-based method of producing a low-dimensional representation of a set of input data -k-means clustering -an iterative method that optimizes a defined number of centroids by minimizing the in-cluster distances from the centroid for a multidimensional dataset -and Gaussian mixture modelling (GMM) 53 -a method of clustering that assumes a multidimensional dataset is represented by a mixture of several Gaussian distributions with different properties.Of these methods, preliminary testing suggested GMM provided the best results in terms of k-fold cross-validated root-mean-square error (RMSE) in fCO 2 (described in the following section) after RFRs were fit for each cluster.In addition, GMM clustering affords the benefit of providing probabilities that each spatiotemporal grid cell belongs within a given cluster instead of simply providing the cluster assignment to each grid cell as done in the other two clustering methods.These probabilities are used in our method to mitigate discontinuities at boundaries between clusters.
Variability (defined as the standard deviation within a spatial grid cell over time) in SLP, SST, and CHL were used as feature sets to form clusters in most LMEs; CHL was replaced with wind speed in two LMEs (BS and NBCS) due to insufficient CHL observations at high latitudes.The decisions to cluster based on variability over time instead of monthly values and to use the specified sets of variables were based on initial testing and optimization in terms of k-fold cross-validated RMSE in fCO 2 (not shown).Future development of RFR-LMEs may continue to explore alternative clustering strategies.
GMM models with full, unshared covariance matrices were created using the MATLAB "fitgmdist" function.Full covariance matrices were used for GMM based on the a priori assumption that some of the predictor variables were correlated due to the nature of oceanographic environmental variables.Covariance matrices for GMM were unshared based on the a priori assumption that each spatial cluster had its own, different covariance matrix.The number of components (i.e., clusters; N in Table 1) was optimized, primarily by minimizing the k-fold cross-validated RMSE in fCO 2 , but also taking into account the Bayesian information criterion -a  measure of model fit that includes a penalty for the number of clusters -and silhouette score -a measure of the accuracy of the clustering technique that is calculated by comparing each point's similarity to the other points in its assigned cluster to how dissimilar it is to the points in the next nearest cluster (Fig. 3).

Machine learning regressions.
Once the numbers of spatial clusters were determined for each LME, random forest regressions (RFRs) 54 were trained for each cluster within each LME using binned fCO 2(SOCAT) as a target variable and each of the co-located gridded variables listed in Table 2 along with longitude (degrees east with a 0° to 360° convention), latitude, distance from the coast, month of the year (sine-and cosine-transformed to maintain cyclicity throughout the year and predictability within each month), and year as predictors.These variables were found to be useful predictors of fCO 2 by Sharp et al. 24 .RFRs are a collection of regression "trees", each of which is trained with a bootstrapped subset of the dataset.Each tree aims to generate a representation of the relationship between the predictor variables and the target variable for its bootstrapped subset of the data.This is done by splitting the data into a series of "branches" based on the predictors.At each branch point, only a random subset of the predictor variables is made available to the algorithm.The algorithm then optimally selects a predictor dataset and a specific value from that dataset on which to split the dataset into two additional branches/groups with the lowest possible within-group fCO 2(SOCAT) variance.This continues until the branches become "leaves", which means they are no longer split, either due to reaching a defined minimum leaf size or a certain criterion (e.g., variance of the remaining fCO 2(SOCAT) observations).The use of an ensemble of regression trees constitutes the "forest" aspect of an RFR.The "randomness" aspect of the forest is due to the fact that each tree is constructed with different subsets of the full dataset and that different subsets of the predictors are available at each branch point, making it possible for each tree to provide a slightly different empirical regression for the dataset.New predictor data can be passed through each  tree in the ensemble of a trained RFR, and an average of the values output from each tree is the fCO 2 prediction (fCO 2(RFR-LME) ).
For each cluster, all grid cells with a GMM probability of greater than 10% for that cluster were used to train an RFR using the MATLAB "TreeBagger" function.This means that many grid cells on the geographic boundary between one or more clusters may then have been used to train multiple RFRs.The number of trees used for each RFR was set to 1000, which was confirmed to be sufficient through visual inspection of the out-of-bag RMSE with respect to the number of trees (not shown).The minimum leaf size was set to three based on k-fold cross-validation testing, and the number of predictors used for each decision split was set to 6 (equal to the total number of predictors divided by three and rounded up to the nearest whole number).
To create an RFR-LME map of fCO 2 for each LME, all the gridded predictor variables (0.25° × 0.25°, monthly) within the LME were run through each cluster-specific RFR.This produced N fCO 2(RFR-LME) maps for each LME, where N is equal to the number of clusters.These maps were then merged as weighted average fCO 2(RFR-LME) maps using the GMM probabilities as weights, which helped to smooth out discontinuities between clusters.Lastly, RFR-LME maps of fCO 2 were converted to maps of pCO 2 (pCO 2(RFR-LME) ) using SST and SLP 55 .
Cross-validation was used to evaluate the skill of the fCO 2(RFR-LME) estimates in each cluster and overall in each LME.This k-fold cross-validation was performed by sequentially withholding subsets of 20% of data, training versions of RFR algorithms with the remaining 80% of data, then, for each data point in the validation dataset, comparing the fCO 2 obtained using the k-fold cross-validation algorithms (fCO 2(RFR-LME-kFold) ) to the observed fCO 2(SOCAT) value.This procedure was repeated five times for each LME so all data points were included in the validation data once, producing ΔfCO 2 values for each data point.

Uncertainty estimation.
Uncertainties in RFR-LME maps of fCO 2 were evaluated based on the previously described k-fold cross-validation approach.First, spatially gridded absolute values of ΔfCO 2 from k-fold cross-validation were low-pass filtered (using 0.5° × 0.5° windows) two times in each LME to begin to fill nearby grid cells with uncertainty values.Then, nearest-neighbor interpolation was used to fill any remaining empty grid cells with data-based, spatially scaled uncertainty values ( fCO s 2( )

Ε
).This approach only assesses the strength of the fit for available.It is therefore prudent to assign greater uncertainties for periods and regions where training data are less abundant or absent.For this reason, the Ε fCO s 2( ) values were further scaled over time by calculating two scaling factors specific to each LME, one representing the seasonal data coverage (using 3-month running means of the relative data coverage across the seasonal cycle) and another representing the relative annual data coverage (using 5-year running means of the relative data coverage across the timeseries).
The window sizes of the scalers were selected to balance data coverage in each time window with realistic periods of time over which observational data may exhibit serial correlations.
Uncertainties in ESPER-estimated A T and nutrients were provided by the ESPER algorithms, which estimate uncertainty using a polynomial fit to salinity and depth.The ESPER algorithms are less skillful in the surface ocean where we use them than they are globally across all depths, and the uncertainty estimates are correspondingly greater at shallow depths.
The uncertainty estimates were propagated along with standard estimated total uncertainties in carbonate system constants (see Table 1 in Orr et al. 64 ) to calculate uncertainty in mapped OA indicators.Gaussian uncertainty propagation was employed, using CO2SYSv3 for MATLAB 56 , which is based on uncertainty propagation code introduced in CO2SYSv2 by Orr et al. 64 .Validation and evaluation.The skill of the RFR-LME maps was evaluated through comparisons with co-located OA indicators independently calculated from the ship-based GLODAPv2.2022and CODAP-NA measurements described above.OA indicators were computed at in situ temperature from the A T and C T observations using CO2SYSv3 for MATLAB 56 and the same equilibrium constants as before.Although the GLODAPv2.2022and CODAP-NA databases also include pH T and pCO 2 measurements, they are not as widespread as A T and C T measurements, so we chose to calculate all indicators from A T and C T for evaluation.Each observation was then co-located with the corresponding RFR-LME grid cell and compared.
In addition, RFR-LME maps were compared to global mapped data products of sea surface pCO 2 obtained from SeaFlux (v2021.04) 47.Long-term averages of pCO 2 from RFR-LME maps and SeaFlux maps were computed across the overlapping time periods of both products (i.e., 1998-2019).Mapped differences between RFR-LME and each SeaFlux ensemble member, as well as an average across the ensemble, were computed and compared.
Finally, observations of pCO 2 at fixed buoy locations were compared to pCO 2 from RFR-LME data products at grid cells corresponding to those moored buoy observations.For this exercise, special-case RFR-LME maps were created by training RFRs on gridded fCO 2(SOCAT) data with buoy observations excluded, then using those algorithms to construct the maps.Comparing pCO 2 mapped from datasets both with and without the underlying buoy observations allowed for evaluation of the influence that those seasonally resolved observations have on the fidelity of the pCO 2 reconstruction.pCO 2 values extracted from the mapped SeaFlux datasets were also included in this comparison, allowing for separate evaluation of how the LME-scale, 0.25° × 0.25° monthly reconstructions compare to global 1° × 1° monthly reconstructions.

Data Records
RFR-LME maps can be accessed through the NOAA National Centers for Environmental Information (NCEI) via the Ocean Carbon and Acidification Data System (OCADS; https://doi.org/10.25921/h8vw-e872) 35.The dataset is available in NetCDF format on 0.25° × 0.25° spatial grids at monthly timesteps.Each mapped OA indicator and its uncertainty is provided via a separate NetCDF file, along with a reference grid that indicates to which LME each spatial grid cell belongs.Additionally, regional timeseries for CO 2 partial pressure, calcium carbonate saturation state, and pH are displayed at the NOAA Marine Ecosystem Status website (https://ecowatch.noaa.gov).Average values, trends, seasonal amplitudes, and uncertainty estimates of ocean acidification indicators from RFR-LMEs vary considerably among the regions (Tables 3-6).
Long-term means (Table 3; Fig. 4) allow for the description of LME-scale patterns in surface ocean carbonate chemistry.Tropical LMEs (PI and CS) are characterized most notably by high carbonate ion parameters ([CO 3 2− ] T , Ω ar , and Ω ca ) and low RF values.Within this pair, the CS can be described as more acidified (higher pCO 2 and lower pH T ) but better buffered (lower RF and higher A T /C T ratio).Subtropical Atlantic LMEs ] T , Ω ar , and Ω ca ) and low RF values.Compared to the Tropical LMEs however, Subtropical Atlantic LMEs have higher C T and A T values, although A T /C T ratios and therefore RFs are similar between the two groups.Temperate and subarctic coastal LMEs (CCS, GA, and NE) can generally be considered intermediate in all parameters: pCO 2 , pH, carbonate ion parameters, C T , A T , and RF.Within the group, the GA has the highest RF and lowest carbonate ion parameters, the NE has the lowest RF and highest carbonate ion parameters, and the CCS is between the two.Subarctic North Pacific LMEs (AI and EBS) are characterized by high C T , pCO 2 , and RF; and low pH T and carbonate ion parameters.Arctic LMEs (NBCS and BS) are characterized by high pH T and RF; and low A T , pCO 2 , and carbonate ion parameters.Spatial variability in OA indicators is evident within each LME and throughout the seasonal cycle (Fig. 5).For example, the CCS develops a strong dipole in the summer (June/July/August; Fig. 5c), with low C T off the coast in the northern C T and high C T off the coast in the central CCS.This dipole becomes much weaker in the winter (December/January/February; Fig. 5a).Similarly, relatively low C T occurs off the coast in the northern NE region in the summer but disappears in the winter (Fig. 5c).The southern continental Alaskan coastline exhibits low C T , especially nearshore in the summer, whereas the northern Alaskan coastline is relatively higher in C T than nearby offshore waters in the Arctic Ocean.A band of relatively low C T is evident from about 10° to 20° N in the PI region, between higher C T in the equatorial Pacific and North Pacific subtropical gyre, a feature that has appeared in other sea surface C T data products 6 .
Mapped indicator uncertainties (see Fig. 6) are served alongside RFR-LME maps 35 , providing a resource for evaluating uncertainty in OA indicator values at a given location.Area-weighted mean u[pCO 2(RFR-LME) ] was 12.0 μatm across the entire domain, u[pH T(RFR-LME) ] was 0.015, and u[Ω ar(RFR-LME) ] was 0.18.These domain-wide means are influenced by the large area and low uncertainties in the Pacific Islands region; individual LME uncertainties, particularly in Artic and Subarctic LMEs, may be considerably larger.Spatial patterns of uncertainties also differ for different OA indicators.For example, u[Ω ar(RFR-LME) ] tends to be relatively high in the tropical LMEs (Fig. 6d), where Ω ar(RFR-LME) is also high (Fig. 4e); on the other hand, u[pCO 2(RFR-LME) ] is extremely low in tropical the LMEs (Fig. 6b).
Uncertainty values reflect not only uncertainty in the RFR predictions, but also uncertainty introduced by interpolating over spatial and temporal gaps in observational coverage.Average uncertainty values for each LME are presented alongside OA indicator timeseries on the NOAA NaMES website.Importantly, the uncertainty values provided in Table 6 and on the NaMES website represent weighted means of grid-cell-level uncertainties rather than uncertainties corresponding to region-wide averages, which may or may not be smaller due to cancelling errors that are removed by areal averaging or larger due to inadequacies of our spatiotemporal scaling approach for representing uncertainties in under-sampled times and locations.

technical Validation
Data-based validation.A k-fold cross-validation approach was used to assess the skill of the fCO 2 estimates and subsequent OA indicator calculations.Region-wide error statistics for each of the eleven LMEs (before the spatial and temporal scaling) indicate that fCO 2(RFR-LME-kFold) values are centered around (mean and median errors all close to zero) and tend to correlate closely with (nine of the eleven R 2 values are 0.8 or greater) the measured values of fCO 2(SOCAT) (Table 7).Root mean square errors (RMSEs) are generally about three times larger than median absolute errors, indicating error populations with long tails of a few particularly large errors.When viewed spatially (Fig. 6a), absolute differences (|ΔfCO 2 |) between fCO 2(RFR-LME-kFold) and fCO 2(SOCAT) are greatest near the coast and in the North Pacific and Arctic, and smallest in the open ocean and in the tropics and subtropics.High |ΔfCO 2 | values tend to correlate with areas of high background variability in fCO 2(SOCAT) (Fig. 2b), emphasizing that the RFR algorithms may struggle to capture extreme values, which is consistent with the aforementioned long-tailed error populations.
Comparison to discrete shipboard data.The RFR-LME fields presented in this work are constructed using surface CO 2 measurements from shipboard flow-through analyzers.This automated observational approach allows for the collection of high spatial and temporal resolution observations of surface ocean carbonate chemistry.Discrete bottle measurements of carbonate chemistry parameters represent another approach for monitoring ocean acidification.The discrete approach allows for high-quality observations throughout the water column.Here we take near-surface discrete bottle measurements of A T and C T from GLODAPv2.2022 51nd CODAP-NA 52 , use those measurements to calculate OA indicators, and compare those calculated values with mapped surface OA indicators from RFR-LMEs.representative of all 14 mooring sites compared to each mapped product, where the boxes extend from the 25 th to 75 th percentile, the center line shows the median of the data, the whiskers extend to the most extreme data points not considered outliers, and dots denote outliers (arrows denote where outliers do not appear within the axis limits). RF-LME is the product described in this work and RFR-LME-NM is the constructed using the same method but without moored buoy observations.References for the SeaFlux mapped products are provided in the Data sources section.RFR-LME indicator values are generally in good agreement with calculations from discrete bottle measurements (Fig. 10).Compared to the k-fold-validation-based uncertainty estimates (Table 6), a greater spread (i.e.larger IQRs) in the differences between GLODAP/CODAP and RFR-LME values is expected in this exercise for two reasons.First, uncertainty stemming from CO 2 system calculations will contribute to the spread (e.g., Orr et al. 64 ), since GLODAP/CODAP indicators values are calculated from A T and C T and RFR-LME indicator values are calculated from fCO 2 and A T .As an example, average propagated uncertainties for GLODAP/CODAP calculations using standard measurement errors for A T and C T (2 μmol kg −1 for both) and for equilibrium constants 64 were calculated as 12.0 μatm for pCO 2 , 0.014 for pH T , and 0.11 for Ω ar .In addition, the two datasets are fundamentally different in their spatiotemporal resolution.RFR-LME grid cells represent averages for large swaths of the surface ocean over a monthly timestep, whereas shipboard measurements are appropriate for a distinct point in space at a distinct time.This spatiotemporal mismatch is especially noteworthy in the coastal ocean where diurnal and other sub-monthly modes of variability operate over spatial scales much finer that 0.25 degrees of latitude or longitude.The calculations from bottle measurements also tend to indicate higher pCO 2 and therefore lower pH T and Ω ar .These offsets between the two datasets may be partly related to inconsistencies in carbonate chemistry calculations, whereby calculations from A T and C T at most surface conditions tend to produce lower pH T (and higher pCO 2 ) values than corresponding measurements of those properties 65,66 .
Comparison to moored buoy time series data.Timeseries of pCO 2 from fixed grid cells of RFR-LME maps and RFR-LME maps constructed without moored buoy observations (RFR-LME-NM) were compared to pCO 2 observations at fixed buoy locations that were extracted from the SOCATv2023 database and aggregated in monthly bins.This provides a test of the capacity of RFR-LMEs to reproduce monthly variability in validation measurements that were withheld from training, and can be considered an assessment of the RFR-LME skill with monthly variability generally.Mapped global data products of surface ocean carbonate chemistry obtained from SeaFlux 27,47 were also compared to the moored buoy observations.Differences between moored buoy observations and mapped products (Table 9; Fig. 11) suggest that timeseries extracted from our regionally focused RFR-LME maps more meaningfully reflect observed pCO 2 than those from mapped global products.Like RFR-LME, most of these alternative products were trained from versions of SOCAT that include the buoy observations.The average median ( ± IQR) ΔpCO 2 (pCO 2(moor.)-pCO 2(grid) ) was 0.1 ± 23.6 μatm for RFR-LME and increased to −13.0 ± 49.2 μatm for the RFR-LME-NM product, which excluded these observations from the training data.These increased error statistics emphasize the value of moored buoy observations for the surface CO 2 observing system.Still, all but one (JENA-MLS; ΔpCO 2 = −2.2± 48.7) of the mapped data products from SeaFlux exhibited more variability in their differences from buoy observations than even the version of RFR-LME constructed without moored buoy observations.JENA-MLS may perform better at representing pCO 2 at these mooring sites because it explicitly models mixed layer fluxes and processes rather than relying on empirical relationships learned from large sets of data.
Individual timeseries from moored buoy sites (Fig. 12) emphasize the significant seasonal and interannual variability in buoy pCO 2 observations (black dots), even when aggregated in monthly bins, and the challenge for mapped products (colored lines) to accurately capture each of those variations at a local scale.The performance of the RFR-LME maps compared to the global mapped products reinforces the notion that locally specific relationships captured by training machine learning algorithms at the scale of objectively defined clusters within LMEs can resolve fine-scale variations in ocean biogeochemistry more effectively than global-scale algorithms, even though those global-scale algorithms are trained with a larger amount of data 24,67 .Positive trends in pCO 2 superimposed upon seasonal variations are visible in both the moored buoy observations and mapped data product timeseries (Fig. 12).
Comparison to mapped data products.Finally, RFR-LME surface pCO 2 was compared directly to the six global-scale mapped products of pCO 2 from SeaFlux across the overlapping interval between them (1998-2019).Maps of average surface pCO 2 display similar patterns across all six SeaFlux products, but differences between those products and RFR-LME (ΔpCO 2 = pCO 2(SeaFlux) − pCO 2(RFR-LME) ) reveal subtle regional differences (Fig. 13).SeaFlux provides a pCO 2 filler field derived from Landschützer et al. 68 to fill spatial gaps in global surface products; this gap filler is not used to produce the difference maps displayed in Fig. 13.However, for spatial consistency, it is used to calculate the averages and standard deviations of the differences for each data product shown in Fig. 13.
In the tropical Pacific, RFR-LME maps agreed well with all products but NIES-FNN, where a prevailing negative bias is evident in that product.In the Atlantic, RFR-LME maps generally agreed well, with visible biases in the Mississippi plume (CSIR-ML6), Georges Bank (JMA-MLR), Caribbean (JENA-MLS), and throughout (NIES-FNN).Coastal negative biases are visible for most products in the central CCS region, and coastal positive biases are visible in the northern CCS region.Both positive and negative biases occur in the regions surrounding Alaska, where low observational density likely leads to significant diversity in pCO 2 estimates among the gap-filling approaches.Despite these regional discrepancies with some individual products, the median (±1 IQR) ΔpCO 2 for the ensemble average of all six SeaFlux products is 0.8 ± 16.6 μatm.This indicates that RFR-LME -which represents local-scale temporal variability in surface pCO 2 more effectively than global products (Table 9; Fig. 11) -agrees at broad scales with observation-based products that are well accepted and widely used by community-wide synthesis efforts such at the Global Carbon Budget 2 and REgional Carbon Cycle Assessment and Processes Project (RECCAP2) 69 .

Fig. 1
Fig. 1 Schematic of the procedure used to construct the ocean acidification indicator data products described by this study.Steps within each section on the right (A, B, and C) are labelled in the schematic on the left.

Fig. 2
Fig. 2 Observations used to develop ocean acidification indicator data products in the eleven U.S. large marine ecosystems (LMEs) considered in this study.LME boundaries are displayed with (a) platform-weighted fCO 2 from SOCATv2023, averaged over 1998-2022 in 0.25 degrees latitude by 0.25 degrees longitude grid cells; (b) platform-weighted fCO 2 variability from SOCATv2023, calculated as the standard deviation over time in 0.25 degrees latitude by 0.25 degrees longitude grid cells; (c) the total number months over the timeseries with at least one observation in each grid cell; (d) months of the seasonal cycle with at least one observation in each grid cell; (e) sea surface temperature from OISSTv2 38 ; (f) sea surface salinity from the CMEMS GLORYS reanalysis product 39 ; (g) wind speed from the ECMWF ERA5 reanalysis product 43 ; and (h) bathymetry from the ETOPO 2022 Global Relief Model 44 .LME name abbreviations shown in panel (a) are provided in Table1.

Fig. 3
Fig. 3 Example for Gaussian Mixture Model (GMM) optimization in the North Bering Chukchi Seas (NBCS).(a) Parameters used for GMM evaluation in the NBCS, plotted against the number of GMM clusters (N).The goal of the optimization procedure was to minimize the root mean squared error, maximize the global mean silhouette score, and identify the N at which the Bayesian information criterion was no longer sharply decreasing.N = 6 was ultimately selected for the NBCS region.(b) Distribution of GMM clusters for the NBCS and (c) the probability for grid cells belonging to cluster 1 (C1).

9 Table 4 .
Long-term trends and uncertainties for OA indicators in each LME.Trends and trend uncertainties are determined by fitting a linear least-squares model with an intercept, trend, and annual and semi-annual harmonics to monthly area-weighted average indicator values.Area-weighted averages are calculated using a consistent fraction of ice-free cells for each month in each region, even though in reality some years have less ice coverage than others.Uncertainties are calculated by scaling the standard error on the trend by the effective degrees of freedom, determined from the decorrelation timescale of residual values.
and spatial coordinate predictors in ESPER-NNs mean that ESPERs function similarly to regional property estimation algorithms.ESPERs also provide the benefit of estimating uncertainty corresponding to each predicted value, allowing for the propagation of those uncertainties through downstream computations.The ESPER-Mixed routine (an average of both the ESPER-LIR and ESPER-NN approaches) was used for this study, due to assessment statistics that have indicated a lower global RMSE for the ESPER-Mixed approach (e.g., a global average RMSE of 3.7 μmol kg −1 for A T ) compared to ESPER-LIR (4.0 μmol kg −1 ) and ESPER-NN (4.1 μmol kg −1 ) when producing property estimates from SSS and SST34 .

Fig. 7
Fig. 7 pCO 2 timeseries for the eleven U.S. LMEs considered in this study.Area-weighted annual (red) and monthly (black) means are presented along with envelopes of uncertainty.Uncertainties are calculated by scaling k-fold cross-validated uncertainties spatially with a two-dimensional low-pass filter, then temporally according to long-term data coverage (5-year running windows) and seasonal data coverage (3-month running windows).Average values across the timeseries are indicated by dotted lines.Note the different y-axis for each LME.For many LMEs, uncertainties are larger near the beginning of the timeseries, when SOCAT observations are less dense.Inset within each timeseries plot is a figure showing the percent of the LME represented across the seasonal cycle by grid cells that remain ice-free across the entire timeseries; these are the grid cells used to compute monthly and annual means.

Fig. 8
Fig. 8 pH T timeseries for the eleven U.S. LMEs considered in this study.Same as Fig. 7, except for pH T .

Fig. 9 Ω
Fig.9Ω ar timeseries for the eleven U.S. LMEs considered in this study.Same as Fig.7, except for Ω ar .

Fig. 10
Fig. 10 Comparisons between OA indicators retrieved from RFR-LME maps and those calculated from discrete observations.(a,c,e) Histograms showing differences between calculations of (a) pCO 2 , (c) pH T , and (e) Ω ar from discrete surface (depth ≤ 10 m) observations of A T and C T and values of the same OA indicators from RFR-LME maps.Discrete observations that fall within the boundaries of LMEs were obtained from the GLODAPv2.2022 51and CODAP-NA 52 data products.Error statistics shown represent the median errors and the interquartile ranges of errors for each comparison.(b,d,f) Mapped differences between calculations of (b) pCO 2 , (d) pH T , and (f) Ω ar from discrete surface (depth ≤ 10 m) observations of A T and C T and values of the same OA indicators from RFR-LME maps.Discrete differences are binned into 1° × 1° grid cells for this map.

Fig. 11
Fig. 11 Summarized differences between monthly binned moored buoy pCO 2 observations and mapped pCO 2 data products.(a) Medians and (b) interquartile ranges of differences, (c) differences in seasonal amplitudes, and (d) correlations of residual values (after removing the trend and seasonal cycle) between binned moored buoy pCO 2 observations and mapped pCO 2 data products.Each of these statistics are shown as boxplotsrepresentative of all 14 mooring sites compared to each mapped product, where the boxes extend from the 25 th to 75 th percentile, the center line shows the median of the data, the whiskers extend to the most extreme data points not considered outliers, and dots denote outliers (arrows denote where outliers do not appear within the axis limits).RFR-LME is the product described in this work and RFR-LME-NM is the constructed using the same method but without moored buoy observations.References for the SeaFlux mapped products are provided in the Data sources section.

Fig. 12
Fig. 12 Comparisons between pCO 2 from selected moored buoy observations, RFR-LME and RFR-LME-NM maps, and other mapped surface products.(a) Mapped long-term mean pCO 2(RFR-LME) along with mean pCO 2 from moored buoy observations (shaded dots).Colors of the arrows in the map correspond to the colors of the outlines of timeseries plots from grid cells that match the buoy locations.(b-i) Each timeseries shows buoy observations aggregated into monthly bins (black dots), corresponding timeseries from RFR-LME maps (red solid lines) and RFR-LME-NM maps (blue dashed lines), and corresponding timeseries from mapped global data products included in SeaFlux 47 (thin colored lines).

Fig. 13
Fig. 13 Comparisons between mapped surface pCO 2 products and RFR-LME maps.Long-term mean pCO 2 for each of the SeaFlux 47 mapped global products (a,c,e,g,i,k) and differences between those products and RFR-LME (ΔpCO 2 = pCO 2(SeaFlux) − pCO 2(RFR-LME) ; b,d,f,h,j,m) are shown.Area-weighted averages and standard deviations of ΔpCO 2 are provided above each set of two figures.

Table 1 .
Summary information for the eleven U.S. large marine ecosystems (LMEs) considered in this study.Areas are calculated as the sum area of all 0.25° × 0.25° grid cells that fall within the limits of each LME, taking into account the percentage of each grid cell covered by ocean determined via the ETOPOv2022 Global Relief Model.Data coverage is calculated as the percentage of monthly grid cells occupied by an observation over the entire time series (Total) and binned over a seasonal cycle (Seasonal).The number of Gaussian Mixture Model (GMM) clusters (N) for each LME is determined by the GMM optimization procedure described in the Spatial clustering section.

Table 2 .
Sources of gridded fields of satellite, reanalysis, and in situ observational products used to create RFR-LME maps.The digital object identifiers and original three-dimensional resolutions of each product are provided.*Used for GMM clustering in most LMEs.# Used for GMM clustering in BS and NBCS.+ Not used for RFR training in the BS or NBCS.

Table 3 .
Long-term mean values for OA indicators in each LME.Long-term means are calculated as averages over the monthly timeseries (1998-2022) of area-weighted average indicator values.

LME pCO 2 μatm A T μM C T μM pH T [H + ] T nM RF
. ESPERs consist of both locally interpolated multiple linear regressions (ESPER-LIR) and feed-forward neural networks (ESPER-NN) trained to estimate seawater properties from a given set of input properties.Though ESPERs are global in nature, the regionally tuned ESPER-LIR coefficients

Table 6 .
Average uncertainties for OA indicators in each LME.Uncertainties are determined by filtering and scaling fCO 2 error estimates, pairing those with A T uncertainties from ESPER estimates, and propagating those through CO 2 system calculations.

Table 5 .
Seasonal amplitudes for OA indicators in each LME.Seasonal amplitudes are calculated from the annual sine and cosine component amplitudes of a linear least-squares model with an intercept, trend, and annual and semi-annual harmonics fit to area-weighted average indicator values.
ms ref is the reference month in the series for each time step(1-228), n obs(ms) is the number of grid cells with observations in the corresponding month of the series, n tot(ms) is the total number of available grid cells in the corresponding month, and MS is the total number of months considered within the window for each time step.Fewer months were considered within each window near the beginning and end of the time series.Finally, the estimated uncertainty of fCO ref ref where ms is the numbered month in the full series (1-228),

Table 8 .
Long-term mean values, trends, and seasonal amplitudes for temperature and salinity in each LME.Long-term means are calculated as averages over the monthly timeseries (1998-2022) of area-weighted average values; trends and trend uncertainties are determined by fitting a linear least-squares model with an intercept, trend, and annual and semi-annual harmonics to monthly area-weighted average values; and seasonal amplitudes are calculated from the annual sine and cosine component amplitudes of the linear least-squares model.

Table 9 .
Medians and interquartile ranges (μatm) of comparisons between moored buoy observations and corresponding grid cells from mapped monthly sea surface pCO 2 data products.Only buoys within LMEs and with observations in more than 36 months of the timeseries were included.Grid cells nearest in space to the moored buoy coordinates were selected from each data product.If the nearest grid cell did not contain pCO 2 values, the next nearest grid cell was selected.This process was repeated until the chosen grid cell contained pCO 2 values.