Revisiting Oxygen-18 and Clumped Isotopes in Planktic and Benthic Foraminifera

DAËRON AND GRAY 10.1029/2023PA004660 2 of 29

Historically […] between their calcification temperatures (which may vary with depth and season) and mean annual sea surface temperatures (or some other convenient metric used to describe the climate) is non-trivial and cannot be assumed to remain constant at geological timescales.
More recently, carbonate clumped-isotope (Δ47) paleothermometry has allowed calcification temperature to be directly constrained, enabling both temperature and δ18Osw to be reconstructed from foraminiferal CaCO3 (Eiler, 2011). Over the past two decades, clumped-isotope geochemistry has seen a steady stream of methodological improvements driven by concerted community efforts. These recently led to the definition of the I-CDES reference scale (InterCarb-Carbon Dioxide Equilibrium Scale), which resolves long-standing inter-laboratory discrepancies in carbonate Δ47 measurements (Bernasconi et al., 2021), along with apparently unifying calibrations of calcite Δ47 thermometry (Anderson et al., 2021). Calibration studies have so far concluded that the relationship between foraminifer calcification temperatures and their Δ47 values is the same as that for the majority of CaCO3 minerals, including many inorganic/synthetic carbonates (Grauel et al., 2013; Meinicke et al., 2020; Peral et al., 2018; Piasecki et al., 2019; Tripati et al., 2010). Although they predate the I-CDES itself, the three most recent of these studies anchored their Δ47 measurements to the same carbonate reference materials used to define the I-CDES scale. In theory, this should make it straightforward to directly compare their results, but as pointed out by Meinicke et al. (2020), there are important differences in how these studies estimate "true" calcification temperatures independently of Δ47.
In this work, we start by laying out a comprehensive framework for the quantitative interpretation of oxygen-18 in foraminifera, bringing together (a) an extensive compilation of core-top, culture, and plankton tow data constraining oxygen-18 fractionation factors as a function of temperature in different foraminiferal species, (b) a compilation of typical habitat depths for planktic species from depth-stratified tows, and (c) modern databases of seawater temperature, δ18Osw, and chemistry. This framework is generally consistent with a large compilation of planktic foraminifera from Holocene core tops (Malevich et al., 2019), although it raises questions regarding true calcification depths. Building on this framework, we then jointly reassess the results of the three most recent foraminifer Δ47 calibration studies. As previously reported, we find excellent agreement between δ18O-derived and Δ47-derived temperature estimates in planktic foraminifera, implying that they conform to recently published, mostly inorganic I-CDES calibrations. The case of benthic foraminifera is not as clear-cut, with conspicuous discrepancies between atlas and clumped-isotope estimates of temperature. Despite its first-order approximations and assumptions, we believe this case study shows that the framework described here provides a useful, data-based foundation for interpreting foraminiferal isotopic records. While many of the calibration issues discussed here may seem to be deep in the methodological weeds of δ18O and Δ47 thermometry, we illustrate in the final section how these issues, and the related uncertainties, have a far from trivial impact on our understanding of Cenozoic climate evolution, and by inference on the climate sensitivity and polar amplification derived from these records (Cramwinckel et al., 2018; Gaskell et al., 2022; Hansen et al., 2013; Westerhold et al., 2020).

Objectives
Paleoceanographic reconstructions based on foraminiferal δ18Oc are one of the oldest branches of paleoclimatology (Emiliani, 1954a, 1954b). Many of the concepts and nomenclature in use today (e.g., "vital effects" or "expected equilibrium values") reflect this long history, and some of the classical formulas still routinely used are at odds with more recent, yet robust, observations. This is particularly apparent when combining oxygen-18 methods with other tracers of seawater temperature, such as Mg/Ca ratios (Weldeab et al., 2007) or clumped isotopes (Meckler et al., 2022), each with their own set of methodological challenges. This work aims to revisit the consistency between climatological, δ18O-derived, and Δ47-derived temperatures, with two primary targets: (a) to critically revisit the methods by which oxygen-18 thermometry may be used to constrain foraminiferal calcification temperatures, and (b) to reassess, based on the full I-CDES-reprocessed clumped-isotope data set, whether the Δ47 values of foraminifera differ significantly from those predicted by I-CDES calibration studies based on other types of biogenic or abiotic carbonates (Anderson et al., 2021; Fiebig et al., 2021).

Least Squares Methods
Regression methods used in this study are either "simple regressions," that is, least squares regression considering only residuals in the dependent variable, with each observation carrying an equal weight and no attempt to quantify model uncertainties, or "York regressions," that is, straight-line fitting of (X, Y) data considering uncertainties in both variables (York et al., 2004). In the latter case, we use the root mean squared weighted deviation (RMSWD), equivalent to the square root of the reduced χ² statistic, to assess goodness-of-fit:

$$\mathrm{RMSWD} = \sqrt{\frac{\chi^{2}}{\nu}} \qquad (1)$$

with χ² being the sum of squared, error-weighted regression residuals and ν being the number of degrees of freedom. As a rule of thumb, RMSWD values can be used to check a posteriori, based on the magnitude of the regression residuals, whether observation uncertainties have been reasonably assigned: an RMSWD value much larger than one suggests that the uncertainties assigned to the (X, Y) observations are underestimated by a factor roughly equal to the RMSWD value. Conversely, an RMSWD value much less than one suggests overestimated uncertainties.
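The York fit and the RMSWD statistic of Equation 1 can be sketched in Python as follows. This is a minimal, hypothetical implementation of the iterative algorithm of York et al. (2004), assuming Gaussian errors and an optional correlation coefficient r between X and Y errors; it is not the code used in this study, and the function name `york_fit` is illustrative only.

```python
import numpy as np

def york_fit(x, y, sx, sy, r=0.0, tol=1e-12, max_iter=100):
    """Fit y = a + b*x with uncertainties in both variables (York et al., 2004).

    x, y   : observations
    sx, sy : 1-sigma uncertainties on x and y
    r      : correlation between x and y errors (scalar or array)
    Returns (a, b, se_a, se_b, rmswd).
    """
    x, y, sx, sy = (np.asarray(v, dtype=float) for v in (x, y, sx, sy))
    r = np.broadcast_to(np.asarray(r, dtype=float), x.shape)
    wx, wy = 1.0 / sx**2, 1.0 / sy**2
    alpha = np.sqrt(wx * wy)

    def weights_and_beta(b):
        W = wx * wy / (wx + b**2 * wy - 2 * b * r * alpha)
        xbar, ybar = np.sum(W * x) / W.sum(), np.sum(W * y) / W.sum()
        U, V = x - xbar, y - ybar
        beta = W * (U / wy + b * V / wx - (b * U + V) * r / alpha)
        return W, xbar, ybar, U, V, beta

    b = np.polyfit(x, y, 1)[0]  # ordinary least-squares slope as starting guess
    for _ in range(max_iter):
        W, xbar, ybar, U, V, beta = weights_and_beta(b)
        b_new = np.sum(W * beta * V) / np.sum(W * beta * U)
        converged = abs(b_new - b) <= tol * max(1.0, abs(b_new))
        b = b_new
        if converged:
            break

    # Final weights, intercept, and parameter standard errors
    W, xbar, ybar, U, V, beta = weights_and_beta(b)
    a = ybar - b * xbar
    x_adj = xbar + beta                       # least-squares adjusted x values
    x_adj_mean = np.sum(W * x_adj) / W.sum()
    se_b = np.sqrt(1.0 / np.sum(W * (x_adj - x_adj_mean) ** 2))
    se_a = np.sqrt(1.0 / W.sum() + x_adj_mean**2 * se_b**2)

    # Goodness of fit: RMSWD = sqrt(chi2 / nu), nu = N - 2 for a straight line
    chi2 = np.sum(W * (y - a - b * x) ** 2)
    rmswd = np.sqrt(chi2 / (x.size - 2))
    return a, b, se_a, se_b, rmswd
```

Fitting data whose Y errors are deliberately set too small returns an RMSWD well above one, which, following the rule of thumb above, flags those uncertainties as underestimated.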

Original Studies
Peral et al. (2018) analyzed 25 planktic and 2 benthic foraminifer samples from 12 core tops (Table 1; Figure 1). Samples in that study were reacted in a common acid bath at 90°C, and the typical amount of CaCO3 per replicate analysis was 20-30 μmol. They did not find evidence for detectable size-fraction effects on either Δ47 or δ18O, except for G. inflata, whose carbonate δ18O values (δ18Oc) varied substantially (±0.4‰) with size fraction. Different size fractions of G. inflata were thus treated as independent samples (using the nomenclature of Daëron (2021), where "sample" designates some amount of homogeneous carbonate material subjected to one or more replicate analyses), while samples for the other species were defined as a unique combination of core top and species. The results of Peral et al. (2018) are consistent with earlier evidence (Grauel et al., 2013; Tripati et al., 2010) arguing against large species-specific or pH-dependent effects on foraminifer Δ47. Piasecki et al. (2019) and Meinicke et al. (2020) both used another sample preparation protocol, where smaller replicates (each ∼1.0-1.5 μmol) were acid-reacted at 70°C using a modified Kiel device. Piasecki et al. (2019) analyzed 43 benthic samples from 13 core tops. They did not find evidence for detectable species or size-fraction effects on Δ47, and thus computed the average Δ47 composition for each of the 13 core tops by binning all size fractions and all species together at each site. Meinicke et al. (2020) analyzed 43 planktic samples from a different set of 13 core tops. They did not specifically test for size effects on Δ47, but again concluded against detectable species-specific effects.
Piasecki et al. (2019) and Meinicke et al. (2020) tested various methods aiming to estimate "true" planktic calcification temperatures independently of Δ47. These methods can be broadly categorized as based either on seawater atlas temperatures or on oxygen-18 thermometry. In the first case, calcification temperatures are constrained by looking up, in a gridded database such as the World Ocean Atlas (WOA) (Locarnini et al., 2018), monthly or seasonally averaged seawater temperatures corresponding jointly to a certain seasonal time window and a certain range of water depth. These space-time constraints are assigned a priori and depend on the planktic species considered. The second approach, instead of seawater temperatures, considers seawater δ18O values (δ18Osw) from a gridded database of mean annual δ18Osw (LeGrande & Schmidt, 2006) and combines this information with foraminifer δ18O measurements to constrain calcification temperatures.
Despite using slightly different methods to estimate calcification depths, both Peral et al. (2018) and Meinicke et al. (2020) concluded that temperature estimates obtained from the isotopic method are more useful, largely because sea surface temperatures (SST) vary strongly with season, whereas δ18Osw remains relatively constant within the water column and throughout the year. However, the two groups ended up making different choices regarding which water-calcite oxygen-18 fractionation relationship best applies to planktic foraminifers. Peral et al. (2018) opted for the Kim and O'Neil (1997) calibration, which is based on synthetic calcites precipitated at 10, 25, and 40°C. They argued that this assumption, when combined with seawater temperature databases and models of temperature-dependent foraminifer calcification rates based on culture experiments (Lombard et al., 2009), yields good first-order predictions for foraminifer δ18Oc values, with root mean square residuals on the order of 0.2‰ (Roche et al., 2018). Conversely, Meinicke et al. (2020) opted for the calibration of Shackleton (1974, equation D), which is derived from synthetic calcites precipitated at 0 and 25°C (O'Neil et al., 1969; Tarutani et al., 1969) and was found to be consistent with benthic Uvigerina from three core tops with modern temperatures between 1 and 7°C. The effect of choosing the former calibration over the latter is negligible at ∼20°C, but reaches +1.5°C around 30°C and −2.2°C around 0°C (Figure 4d of Meinicke et al. (2020)).
The findings of all three studies are summarized in Figure 2. Despite the use of different analytical protocols, these results were all anchored to the CDES scale of Dennis et al. (2011) using the same set of reference materials (ETH-1/2/3/4, with nominal values from Bernasconi et al., 2018). However, the spread of Δ47 values predicted by each study at low temperature (∼0°C) is on the order of 18 ppm, equivalent to ∼4°C. Some of that spread may arise from the use of different calcification temperature assumptions, or simply reflect analytical scatter, but it is also possible that some foraminifer groups (for example, benthic versus planktic) are characterized by different relationships between Δ47 and temperature.

Conversion to Δ47 (I-CDES) Values
We reprocessed the original raw data of Peral et al. (2018) using a "pooled regression" approach as implemented by the D47crunch library (Daëron, 2021), using the I-CDES nominal values assigned to ETH-1/2/3/4 by Bernasconi et al. (2021). As in the original study, "samples" are defined by default as a unique combination of core site, species, and size fraction. We then use D47crunch's built-in combine_samples() method to combine all size fractions with the same core and species, except for G. inflata samples (see Section 2.3.1). Because it properly accounts for analytical error covariance between the Δ47 values being combined, this two-step approach avoids underestimating the final standardization errors.
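Why error covariance matters here can be illustrated with a short, hypothetical Python sketch. It is not the D47crunch implementation of combine_samples(); it only shows that when two Δ47 values share standardization errors (positive covariance), the standard error of their weighted mean, obtained as w·C·wᵀ, is larger than the value obtained by ignoring the off-diagonal terms. Equal weights are assumed by default, and the numbers below are made up for illustration.

```python
import numpy as np

def combine_correlated(values, cov, weights=None):
    """Weighted mean of correlated estimates, with full covariance propagation."""
    values = np.asarray(values, dtype=float)
    cov = np.asarray(cov, dtype=float)
    w = np.ones_like(values) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                  # normalize weights to sum to one
    mean = w @ values
    se = np.sqrt(w @ cov @ w)        # variance of a linear combination: w C w^T
    return mean, se

# Hypothetical Δ47 values (‰) of two size fractions sharing standardization errors
d47 = [0.600, 0.620]
cov = [[1.0e-4, 0.5e-4],
       [0.5e-4, 1.0e-4]]

mean, se_full = combine_correlated(d47, cov)
_, se_naive = combine_correlated(d47, np.diag(np.diag(cov)))  # off-diagonal terms dropped
print(f"mean = {mean:.3f} ‰, SE with covariance = {se_full:.4f}, SE ignoring it = {se_naive:.4f}")
```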
The original data of Meinicke et al. (2020) were reprocessed by Meinicke et al. (2021), who provided a recalculated I-CDES calibration equation. Although the corresponding raw data are archived in the EarthChem database, the Kiel-device approach used at the University of Bergen standardizes measurements based on reference materials analyzed in a sliding time window, rather than by grouping analyses in discrete analytical sessions. While this approach is entirely valid in itself, the statistical treatment implemented in D47crunch does not properly apply to a sliding window, and to the best of our knowledge there is no published method to reliably propagate full standardization uncertainties for that approach. However, when following best practices (replicate analyses sufficiently separated in time; evenly distributed measurements of standards), the sliding-window approach should provide useful estimates of analytical repeatability, despite effectively neglecting all inter-sample error correlations. Instead of attempting to reprocess the original data of Piasecki et al. (2019) […] do not extend down to the true depth of the core top. In such cases, we estimate bottom seawater temperature at the core top by using the nearest neighboring grid node with a temperature profile reaching sufficient depth. We check the consistency between the temperature profile interpolated at the latitude and longitude of the core and the nearest-neighbor temperatures by visual inspection of the two superimposed profiles (Figure S1 in Supporting Information S1).
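The nearest-neighbor fallback described above can be sketched as follows. This is a hypothetical helper, not the code used in this study: it assumes that the grid is supplied as flat arrays of node coordinates together with the maximum depth reached by each node's profile, and it measures distances along great circles (haversine formula) on a spherical Earth with a mean radius of 6371 km.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))

def nearest_deep_node(node_lats, node_lons, node_max_depths, lat, lon, required_depth):
    """Index and distance (km) of the closest node whose profile reaches required_depth (m)."""
    deep_enough = np.asarray(node_max_depths, dtype=float) >= required_depth
    if not deep_enough.any():
        raise ValueError("no grid node reaches the required depth")
    dist = haversine_km(lat, lon,
                        np.asarray(node_lats, dtype=float),
                        np.asarray(node_lons, dtype=float))
    dist = np.where(deep_enough, dist, np.inf)   # exclude nodes that are too shallow
    i = int(np.argmin(dist))
    return i, float(dist[i])

# Hypothetical grid: the closest node (index 0) is too shallow for a 3000 m core top
i, d = nearest_deep_node([0, 0, 5], [0, 1, 5], [1000, 4000, 5000],
                         lat=0, lon=0.2, required_depth=3000)
```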
For seawater temperatures nearer to the surface, which potentially experience large seasonal variations, at a given latitude and longitude we interpolate the gridded mean monthly temperature fields from the same WOA23 database.This allows us to compile, for any given site, histograms of temperatures integrated over arbitrary ranges of calcification depths and months.
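To illustrate, the sketch below pools temperatures over a depth window and a set of months, then bins them. It is a hypothetical helper, not the authors' code: it assumes that the monthly WOA23 profiles at the site have already been extracted onto a common depth grid (a 12 × number-of-depths array without gaps), samples the water column every 1 m (an assumed resolution), and lets NumPy choose the bin edges.

```python
import numpy as np

def temperature_histogram(depths, monthly_profiles, zmin, zmax, months, bins=20):
    """Pool temperatures over depths zmin..zmax (1 m steps) and months (1-12), then bin them."""
    depths = np.asarray(depths, dtype=float)              # grid depths (m), increasing
    profiles = np.asarray(monthly_profiles, dtype=float)  # shape (12, len(depths))
    z = np.arange(zmin, zmax + 1, dtype=float)            # 1 m vertical sampling
    pooled = np.concatenate([np.interp(z, depths, profiles[m - 1]) for m in months])
    counts, edges = np.histogram(pooled, bins=bins)
    return pooled, counts, edges

# Hypothetical site: identical linear profiles every month, summer window, 0-100 m
depths = [0, 50, 100, 200]
profiles = [[20.0, 15.0, 10.0, 0.0]] * 12
pooled, counts, edges = temperature_histogram(depths, profiles, 0, 100, months=[6, 7, 8])
```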

Estimates of Calcification Temperatures From Oxygen-18

Species-Specific Oxygen-18 Fractionation Relationships
As previously argued by Peral et al. (2018) and Meinicke et al. (2020), we concur that oxygen-18 thermometry potentially provides reasonably accurate constraints on planktic foraminifer calcification temperatures, but with an important caveat: such estimates are critically sensitive to the choice of which oxygen-18 fractionation relationship(s) should apply. Here, we address this issue through the pragmatic approach of compiling published data reporting 18O/16O fractionation between seawater and foraminifera at various temperatures, either from culture experiments, from stratified plankton tows, or from sediment samples in the case of benthic species (Table 2). This compilation only includes studies with direct temperature measurements and direct δ18Oc and δ18Osw measurements that can be linked reasonably well to the modern VPDB and VSMOW scales. We excluded several studies in which only a few observations spanned a narrow range of temperature, but in which the oxygen-18 dispersion was unrealistically large (much greater than ±1‰). As noted by Mulitza et al. (2003), their tow results for T. sacculifer differ from earlier culture experiments (Erez & Luz, 1983; Spero & Lea, 1993), perhaps due to differences in carbonate chemistry between the culture experiments and the present-day ocean. We thus elected to exclude T. sacculifer observations from these earlier culture studies. Finally, for one of the stratified plankton tow studies, that of Lončarić et al. (2006), we applied an additional data filter by only considering collection depths consistent with our best estimates for the living depths of G. inflata and G. truncatulinoides (see Section 2.5.2).

Estimates of Calcification Depth and Seawater δ18O
For each species of planktic foraminifera, we compiled typical living depths based exclusively on previously published estimates from depth-stratified plankton tow hauls (Greco et al., 2019; Meilland et al., 2019; Rebotim et al., 2017). Although true calcification depths may in theory differ from habitat depth, and both are likely to vary geographically, seasonally, and at longer, geologic timescales, we make the pragmatic initial assumption that planktic foraminifera may calcify at any depth within the generally assumed habitat range listed in Table 3. We acknowledge that this assumption is far from robust when attempting to derive calcification conditions from temperature profiles, due to potentially strong vertical gradients and/or seasonal variations in SST. By contrast, local δ18Osw values are much more constant as a function of season and/or depth, with typical residuals of ±0.11‰, roughly equivalent to ±0.6°C (see Figure 4a).
Here, instead of the seawater δ18O model of LeGrande and Schmidt (2006), we use the gridded model of monthly averaged δ18Osw by Breitkreuz et al. (2018), which is derived from the same δ18O observations as LeGrande and Schmidt (2006) but combines them with a general circulation model and additional climatological observations of temperature and salinity. This approach avoids sharp transitions between water masses or in areas with sparse observations, and takes seasonal variability into account in a manner consistent with physical laws, yielding monthly as well as annual mean values of δ18Osw with a grid resolution of one degree. To produce monthly average δ18Osw profiles at a given latitude and longitude, we use the same method as for temperature profiles (Section 2.4), by looking for the nearest neighboring grid node with sufficient depth range.
Given any combination of latitude, longitude, and planktic foraminifer species, we start by looking up the minimum and maximum living depths for that species (Table 3). We then select the nearest grid node with sufficient depth range, and interpolate each monthly mean δ18Osw profile over the living depth range with a depth resolution of 1 m. Over a depth range of N meters, this process yields a population of (N + 1) × 12 values, whose arithmetic mean provides an estimate of δ18Osw for this particular combination of latitude, longitude, and species. The standard deviation of this population, noted σssv, is used to quantify the spatial and seasonal variability of δ18Osw in the model. The final standard error assigned to δ18Osw for this particular combination of latitude, longitude, and species is defined as the quadratic sum of σssv (reflecting in-model variability at this site) and an arbitrary "model error" of 0.1‰ reflecting the model's accuracy.
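The bookkeeping described above can be sketched as follows, assuming that the monthly δ18Osw profiles of the selected grid node are available as a 12 × (number of model depths) array and that the living-depth bounds are whole meters. This is an illustrative reimplementation with hypothetical names, not the authors' code; the text does not specify whether σssv is a population or sample standard deviation, so the population form (NumPy's default) is assumed.

```python
import numpy as np

MODEL_ERROR = 0.1  # ‰, the arbitrary "model error" quoted in the text

def d18Osw_estimate(depths, monthly_profiles, zmin, zmax, model_error=MODEL_ERROR):
    """Mean δ18Osw over a living-depth range (1 m steps) and 12 months, with its standard error."""
    depths = np.asarray(depths, dtype=float)
    profiles = np.asarray(monthly_profiles, dtype=float)   # shape (12, len(depths))
    z = np.arange(zmin, zmax + 1, dtype=float)             # N + 1 depths for an N-meter range
    population = np.array([np.interp(z, depths, p) for p in profiles])  # shape (12, N + 1)
    mean = population.mean()
    sigma_ssv = population.std()                # population SD (ddof=0), an assumption
    se = np.hypot(sigma_ssv, model_error)       # quadratic sum of the two error terms
    return mean, sigma_ssv, se, population.size
```

With a 0-100 m living depth range, for example, the population holds 101 × 12 = 1212 values.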

Isotopic Estimates of Calcification Temperature
For all benthic and planktic samples in the three Δ47 studies, we compute "oxygen-18" estimates of temperature (T18) by combining (a) the δ18Oc values originally reported for that sample, (b) δ18Osw values estimated as in Section 2.5.2, and (c) a species-specific relationship linking temperature to the oxygen-18 fractionation factor 18α between carbonate and water. In some cases where we lack observations constraining 18α for a given species, we use an aggregate relationship derived from observations on other species of the same genus. In the single case of Pulleniatina obliquiloculata, lacking observations at the genus level, we resort to an even more generalized relationship based on aggregating all planktic species. Although this issue with P. obliquiloculata only affects 3 out of 68 planktic samples, it remains potentially problematic (see further discussion in Section 3.4.3).
The uncertainty associated with each T18 value is computed as the quadratic sum of three independent error components derived respectively from (a) the final standard error on δ18Osw as defined in Section 2.5.2, (b) the reported standard error on δ18Oc, and (c) the uncertainty on the species-specific relationship linking temperature to 18α.
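Putting the two preceding paragraphs together, a T18 estimate and its uncertainty could be computed as in the sketch below. It rests on several assumptions not spelled out in this section: the relationship 1000⋅ln(18α) = A/T + B with A = 18.03 × 10³ K used elsewhere in this study; the standard conversion δ18O(VSMOW) = 1.03091 × δ18O(VPDB) + 30.91 for carbonate values; an uncertainty on the fractionation relationship expressed as a standard error on B; and linear (first-order) error propagation. The default B is the Kim and O'Neil (1997) intercept for inorganic calcite (−32.42), used only as a placeholder for species-specific values. Function names are hypothetical.

```python
import math

A_KO97 = 18.03e3   # K, slope term adopted after Kim and O'Neil (1997)
B_KO97 = -32.42    # Kim and O'Neil (1997) intercept; placeholder for a species-specific B

def vpdb_to_vsmow(d18O_vpdb):
    """Convert a δ18O value (‰) from the VPDB to the VSMOW scale."""
    return 1.03091 * d18O_vpdb + 30.91

def t18(d18Oc_vpdb, d18Osw_vsmow, B=B_KO97, A=A_KO97):
    """Oxygen-18 temperature (°C) from 1000 ln(18α) = A/T + B."""
    d18Oc_vsmow = vpdb_to_vsmow(d18Oc_vpdb)
    L = 1000 * math.log((1000 + d18Oc_vsmow) / (1000 + d18Osw_vsmow))  # 1000 ln(18α)
    return A / (L - B) - 273.15

def t18_with_error(d18Oc_vpdb, se_c, d18Osw_vsmow, se_sw, B=B_KO97, se_B=0.0, A=A_KO97):
    """T18 (°C) and its 1SE, as the quadratic sum of three independent error terms."""
    d18Oc_vsmow = vpdb_to_vsmow(d18Oc_vpdb)
    L = 1000 * math.log((1000 + d18Oc_vsmow) / (1000 + d18Osw_vsmow))
    T = A / (L - B)                                  # kelvin
    dT_dL = -T**2 / A                                # derivative of T = A / (L - B)
    dT_dB = T**2 / A
    dL_dc = 1000 * 1.03091 / (1000 + d18Oc_vsmow)    # includes the VPDB-to-VSMOW slope
    dL_dsw = -1000 / (1000 + d18Osw_vsmow)
    terms = (dT_dL * dL_dsw * se_sw,   # (a) δ18Osw
             dT_dL * dL_dc * se_c,     # (b) δ18Oc
             dT_dB * se_B)             # (c) fractionation relationship, via B
    return T - 273.15, math.sqrt(sum(t**2 for t in terms))
```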

Species-Specific Oxygen-18 Fractionation Relationships
Figure 3 shows oxygen-18 fractionation observations for the studies listed in Table 2. In spite of clear species-specific offsets, for all species with sufficient temperature coverage the thermal sensitivity of 18α is very close to the Kim and O'Neil (1997) slope of −0.2‰ per K, which is itself indistinguishable from that for quasi-equilibrium 18α values (Daëron et al., 2019) or for dissolved carbonate/water and bicarbonate/water fractionations (Beck et al., 2005). These observations may be simply explained by postulating that, to first order, the temperature sensitivity of 18α is inherited from dissolved (bi)carbonate ions, with additional, second-order non-equilibrium fractionation effects controlled by other factors such as pH, ion concentrations, or symbiont activity.
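Differentiating the Kim and O'Neil (1997) form 1000⋅ln(18α) = A/T + B, with A = 18.03 × 10³ K, shows where the ∼−0.2‰ per K figure comes from, and that it varies only mildly across the range of oceanic temperatures (values rounded):

$$\frac{d\,(1000 \ln {}^{18}\alpha)}{dT} = -\frac{A}{T^{2}} \approx
\begin{cases}
-0.24\ \text{‰ K}^{-1} & \text{at } T = 273.15\ \text{K}\ (0\,^{\circ}\text{C})\\
-0.21\ \text{‰ K}^{-1} & \text{at } T = 293.15\ \text{K}\ (20\,^{\circ}\text{C})\\
-0.20\ \text{‰ K}^{-1} & \text{at } T = 303.15\ \text{K}\ (30\,^{\circ}\text{C})
\end{cases}$$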
As a practical course of action, we propose to approximate 18α for each species as an affine function of 1/T of the form 1000⋅ln(18α) = A/T + B, with A = 18.03 × 10³ K after Kim and O'Neil (1997) and B being a species-specific offset, determined by least squares regression of the data shown in Figure 3. We acknowledge that this approximation fails to account for the influence of factors other than temperature, such as the indisputable effects of lighting conditions on 18α values in O. universa (Figure 1 of Bemis et al. (1998)). These second-order factors, however, are also sampled in the data set compiled here (e.g., both high- and low-light O. universa experiments are included in Figure 3). We can thus estimate the scatter introduced by non-thermal factors based on the regression residuals for each species. A histogram of all such residuals is shown in Figure 4b, with 95% of the residuals within ±0.42‰, roughly equivalent to ±2°C. To the best of our knowledge, none of the existing methods for reconstructing environmental paleotemperatures offer much better precision or accuracy than that, particularly when considering that these residuals of ±0.42‰ reflect a combination of the natural, "true" variability of foraminiferal oxygen-18 thermometry with observation errors in temperature estimates and in δ18Osw and δ18Oc measurements. These observation errors are likely to cancel out for regressions based on many observations such as those of Figure 3, so that the accuracy of temperature reconstructions derived from sufficiently precise constraints on δ18Osw and δ18Oc may end up being better than ±2°C.
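Because A is held fixed, the least-squares estimate of B reduces to the mean of the per-observation offsets 1000⋅ln(18α) − A/T. The sketch below illustrates this with a hypothetical function name; it assumes equal weights and derives the standard error of B from the observed scatter itself (one fitted parameter, so N − 1 degrees of freedom). The returned residuals are the quantities whose distribution is summarized in Figure 4b.

```python
import numpy as np

A = 18.03e3  # K, fixed slope after Kim and O'Neil (1997)

def fit_offset_B(T_celsius, thousand_ln_alpha, A=A):
    """Least-squares B in 1000 ln(18α) = A/T + B, with A held fixed."""
    T_K = np.asarray(T_celsius, dtype=float) + 273.15
    offsets = np.asarray(thousand_ln_alpha, dtype=float) - A / T_K  # one B per observation
    B = offsets.mean()                    # minimizes the sum of squared residuals
    residuals = offsets - B
    sd = residuals.std(ddof=1)            # observation scatter, one fitted parameter
    se = sd / np.sqrt(offsets.size)       # standard error of B
    return B, se, sd, residuals
```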
We thus propose that the first-order species-specific oxygen-18 fractionation relationships summarized in Table 4 and Figure 5 provide a useful, updated framework for applying oxygen-18 thermometry to foraminifer shells. This comes with the caveat, however, that there are still many gaps in our understanding of non-equilibrium fractionation effects in foraminifer shells (and, more generally, in most biogenic carbonates). As a result, predicting whether modern species-specific 18α calibrations apply to extinct species, or to past environments with a seawater chemistry very different from modern conditions, remains problematic.

Apparent Discrepancies Between Oxygen-18 and Atlas Temperatures
We assess the accuracy of our oxygen- […] 6 and 7). If we reprocess these 16 discordant samples with less strict assumptions about their calcification environment, allowing for a calcification depth range of 0-500 m, only six samples remain discordant.
As an independent check, we also applied the same methodology to a larger, global compilation of ∼2,600 core-top samples (Malevich et al., 2019), comprising five widely studied planktic species (G. ruber, T. sacculifer, G. bulloides, N. incompta, and N. pachyderma). About 12% of the Malevich et al. (2019) samples are discordant (down to 8% assuming a 0-500 m calcification depth range), suggesting that the abundance of discordant samples in our clumped-isotope data set is not particularly unusual.
There are several possible explanations for the observation that a minority of planktic samples have oxygen-18 compositions seemingly irreconcilable with modern environmental temperatures at shallow depths:
• Pre-Holocene foraminifera: due to bioturbation effects, low sedimentation rates, and/or poor chronological constraints, some samples may include material from glacial periods. This would be consistent with the observation that virtually all discordant samples appear cooler than expected (Figure 8), which in this scenario could result from a combination of cooler seawater and higher δ18Osw values in glacial times. A first-order prediction for this hypothesis is that the clumped-isotope signatures of discordant samples should accurately record the cooler waters but fail to account for the underestimated δ18Osw, yielding Δ47 values greater than expected from 18α by up to ∼15 ppm (equivalent to −5°C, i.e., +1‰ in δ18Osw). As discussed in Section 3.4.2, this does not appear to be the case.
• Inaccurate species-specific 18α functions: it is possible that the observations summarized in Figure 3 fail to capture the natural range of 18α values associated with some planktic species. However, the discordant observations appear to be broadly distributed among species, including some for which existing constraints on 18α seem quite robust (G. bulloides, G. ruber white).
• Inaccurate δ18Osw model: as noted by Breitkreuz et al. (2018), their model does not account for oxygen-18-depleted precipitation, leading to comparatively large errors in shallow Arctic seawater at latitudes >70°N (Figure 2 of Breitkreuz et al. (2018)). However, out of the 37 cores considered here, the 4 located in areas where the δ18Osw model performs poorly contain none of the discordant observations, which in fact appear fairly randomly distributed with respect to latitude, depth, or ocean basin. More generally, errors in δ18Osw are expected to bias all species from a given core equally, which is not the case.
• Gametogenic calcite and/or deeper-than-assumed calcification: many planktic species are known to precipitate a layer of gametogenic calcite at depths greater than their observed living habitat. Such precipitation, taking place in deeper and colder waters, should drive δ18Oc toward higher-than-expected values (e.g., Caron et al., 1990; Duplessy et al., 1981; Hamilton et al., 2008; Spero & Lea, 1993). This explanation would be consistent with the observation that almost all of our discordant samples become concordant when relaxing calcification depth assumptions. A first-order prediction for this hypothesis is that clumped-isotope signatures should covary with T18 in the same way for concordant and discordant samples alike, which is indeed what we observe in Section 3.4.2. Further support for this hypothesis comes from the carbon-13 composition of discordant versus concordant samples: in ocean basins with strong vertical δ13C gradients (Indian and Pacific oceans), discordant samples have lower δ13C values than concordant ones from the same site, whereas discordant samples from the North Atlantic Ocean, where the gradient is much weaker, have δ13C values indistinguishable from concordant samples from the same site (Figure S4 in Supporting Information S1). We note that, even though it appears unlikely that gametogenic and non-gametogenic calcite would strictly follow the same 18α functions, the overall resulting enrichment is unlikely to further offset apparent temperatures by more than 1°C.
• Cryptic diagenesis in deeper waters: all but one discordant sample in the Δ47 data set have T18 values cooler than surface seawater but warmer than local bottom waters. This would be consistent with cryptic, partial overprinting by secondary carbonate precipitation during early diagenesis, despite the absence of clear evidence for such (re)crystallization. Although it has been proposed, somewhat controversially, that burial-induced isotopic re-equilibration widely affects δ18Oc in well-preserved, glassy foraminifera in a visually undetectable manner (e.g., Bernard et al., 2017; Cisneros-Lazaro et al., 2022), the purported diffusion mechanism would act on much longer timescales (>1 Ma) and would be expected to erase clumped-isotope signatures several orders of magnitude more rapidly than it could substantially alter δ18Oc.

Figure caption (partial): […] Although these fractionation relationships differ between species, their temperature sensitivities remain indistinguishable from that for inorganic calcite (blue lines). All temperature observations are from the original studies.
Our provisional conclusion is that in many cases, due to a combination of gametogenic calcite production and/or greater-than-expected vertical mobility in various planktic species, we lack reliable a priori knowledge regarding when and where planktic calcification occurs. We thus concur with Peral et al. (2018) and Meinicke et al. (2020) that our best option is to use oxygen-18 thermometry to estimate calcification temperatures integrated over foraminiferal lifetimes. A critical prediction for this approach is that T18 and Δ47 should be strongly correlated, and that this covariation should be the same for concordant and discordant samples.

Cold End-Members in the Planktic Data Set
Currently, only three planktic samples, one N. incompta from the Southern Ocean in Peral et al. (2018) and two N. pachyderma from the North Atlantic in Meinicke et al. (2020), effectively constrain planktic Δ47 below 2°C. Technically, only one of the three is flagged as discordant, but the T18 estimates for all three samples are well below surface WOA23 temperature estimates (Figure 15). The discordant N. incompta is unusual because its T18 of −1.1 ± 1.0°C is irreconcilable with the local modern bottom temperature of 2.3 ± 0.2°C, making it very unlikely that the discrepancy could result from poorly constrained calcification depth. The two N. pachyderma are from a region where the δ18Osw model of Breitkreuz et al. (2018) performs poorly due to oxygen-18-depleted precipitation. However, in the high-latitude environments of these three samples, the spread of monthly temperatures throughout the whole water column remains small (±0.8°C, 1SD), so that we can reasonably reassign calcification temperatures based on this narrow temperature range at each site (note, however, that this only minimally affects the two N. pachyderma samples, increasing their temperatures by ∼1°C; see Figures S2 and S3 in Supporting Information S1). In the rest of this study, we only consider the reassigned atlas temperatures for these three samples, keeping in mind that this approach is only applicable where vertical and seasonal variations of temperature remain small.

Independent Estimates of Benthic Calcification Temperatures
Figure 9 summarizes the currently available constraints on calcification temperatures for the benthic samples of Piasecki et al. (2019) and Peral et al. (2018) whose species or genus allows using one of the 18α calibrations listed in Table 4. About half of the core-top sites in Figure 9 display some kind of discrepancy between bottom temperatures estimated from WOA23, originally reported in situ measurements, and/or T18 estimates based on one or more benthic species. Some species, such as those within the Cibicidoides genus, yield T18 estimates generally consistent with atlas and in situ temperatures. Other species sometimes yield T18 clearly at odds with atlas and in situ estimates, in spite of apparently robust calibration constraints on 18α. The worst offenders are H. elegans, the only aragonitic species considered here, and U. peregrina, the only infaunal one. For H. elegans, the three warmest core tops yield T18 estimates systematically warmer than the other species at those sites, which could reflect a potentially steeper-than-assumed slope of 18α (see Figure 3). In one case, U. peregrina yields T18 7-10°C colder than other estimates. Although it would be tempting to attribute this to inaccurate in situ constraints, other […]

Figure caption (partial): Overall residuals for the species-specific oxygen-18 relationships listed in Table 4, based on all studies shown in Figure 3.
Table caption (partial): […] SE is the standard error of the best-fit values of B, with observation error estimates based on the scatter (RMSWD) in each data set. SD is the standard deviation of all residuals for each population. Best-fit B values and observation scatter are also shown in Figure 5.
Noting that T₁₈ estimates, where they deviate strongly from the others, do not display any systematic bias, we propose that the most conservative strategy for now is to keep the originally reported in situ temperatures where available. Otherwise, we use bottom WOA23 temperatures, as for both of the Peral et al. (2018) cores and three of the Piasecki et al. (2019) cores. The basic observation nevertheless remains that, in several locations, the δ¹⁸O_c values of different species do not appear consistent with existing modern observations of ¹⁸α, unless we assume that these species somehow record different temperatures and/or δ¹⁸O_sw values.
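To make the T₁₈ bookkeeping concrete, the sketch below does three things. It computes an oxygen-18 temperature from δ¹⁸O_c and δ¹⁸O_sw, inverts the same relationship for δ¹⁸O_sw, and fits an intercept B with the slope held fixed, in the spirit of the table note above. It assumes a calibration of the form 1000 ln ¹⁸α = A·10³/T + B (T in kelvins). The Kim and O'Neil (1997) inorganic-calcite constants are used purely as placeholders, and the species-specific values of Table 4 are not reproduced. The unweighted estimate of B is a simplification, and all function names are hypothetical. The VPDB-to-VSMOW conversion uses the standard ratio of 1.03092 between the two scales.

```python
import math

KELVIN = 273.15
# Assumed calibration form: 1000·ln(18α) = A·1e3/T + B, with T in kelvins.
# Placeholder constants from Kim & O'Neil (1997) for inorganic calcite; the
# species-specific B values of Table 4 would replace B.
A = 18.03
B = -32.42


def vpdb_to_vsmow(d_vpdb):
    """Convert a carbonate δ18O (‰) from the VPDB to the VSMOW scale."""
    return 1.03092 * (1000.0 + d_vpdb) - 1000.0


def t18(d18o_c_vpdb, d18o_sw_vsmow, a=A, b=B):
    """Oxygen-18 temperature (°C) from δ18O_c (VPDB) and δ18O_sw (VSMOW)."""
    alpha = (1000.0 + vpdb_to_vsmow(d18o_c_vpdb)) / (1000.0 + d18o_sw_vsmow)
    return a * 1e3 / (1000.0 * math.log(alpha) - b) - KELVIN


def d18o_sw(d18o_c_vpdb, t_c, a=A, b=B):
    """Inverse problem: seawater δ18O (VSMOW) from δ18O_c (VPDB) and temperature (°C)."""
    alpha = math.exp((a * 1e3 / (t_c + KELVIN) + b) / 1000.0)
    return (1000.0 + vpdb_to_vsmow(d18o_c_vpdb)) / alpha - 1000.0


def fit_b(t_c_values, thousand_ln_alpha, a=A):
    """Best-fit intercept B with the slope held at `a`.

    Unweighted sketch: B is the mean residual, SD the observed scatter,
    and SE = SD / sqrt(n). Returns (B, SE, SD).
    """
    r = [y - a * 1e3 / (t + KELVIN) for t, y in zip(t_c_values, thousand_ln_alpha)]
    n = len(r)
    b_hat = sum(r) / n
    sd = math.sqrt(sum((ri - b_hat) ** 2 for ri in r) / (n - 1))
    return b_hat, sd / math.sqrt(n), sd
```

Because `t18` and `d18o_sw` share one calibration, any error in B propagates identically into reconstructed temperatures and into reconstructed δ¹⁸O_sw. This is one reason the choice of ¹⁸α calibration matters so much in the sections that follow.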

Existing Constraints on Equilibrium/Inorganic I-CDES Calibrations
Two recent calibration studies provide constraints on the relationship between Δ₄₇-ICDES values and carbonate formation temperatures. In the first, Anderson et al. (2021) analyzed six newly obtained glacial lake carbonates and re-analyzed, at MIT, 35 samples from earlier studies comprising natural calcites, synthetic precipitates, and experimentally heated calcites. They also reported new measurements, performed at LSCE, of mammillary calcites from Devils Hole and Laghetto Basso. The very slow, inorganic precipitation of these calcites from barely supersaturated waters offers optimal conditions for achieving isotopic equilibrium (Coplen, 2007; Daëron et al., 2019). The calibration equation published by Anderson et al. (2021) was obtained by combining these new results with those of previous studies. These include the Peral et al. (2018) data, with the original calcification temperature estimates based on Kim and O'Neil (1997), and the planktic data of Meinicke et al. (2020), with temperatures based on Shackleton (1974). Directly comparing that equation to the foraminifer data we revisit here would thus present an obvious circularity. For this reason, we compute here an "MIT calibration" corresponding to the York regression of all analyses performed at MIT, based on the I-CDES values originally reported in Table S1 of Anderson et al. (2021) (computation included in our code repository):

[Equation 2: MIT calibration]

There is no such circularity issue with the calibration study of Fiebig et al. (2021), which includes new measurements of the same two mammillary calcite samples along with a suite of calcites precipitated or re-equilibrated at much higher temperatures. We thus use here the published version of their calibration equation:

[Equation 3: Fiebig et al. (2021) calibration]

Finally, one may also constrain equilibrium Δ₄₇ values at Earth-surface conditions by combining the measurements of Devils Hole and Laghetto Basso calcite reported in these two studies, for a total of 76 replicates with an external Δ₄₇ repeatability of 0.009‰. These independent measurements are statistically indistinguishable (RMSE = 2.6 ppm at the sample level). Together they yield the following "Devils Laghetto" equilibrium relationship, with T in kelvins:

$$\Delta_{47\text{-ICDES}} = \frac{39.09 \times 10^{3}}{T^{2}} + 0.1535 \qquad \text{(Devils Laghetto calibration)} \tag{4}$$

The three calibrations above do not differ significantly at ambient temperatures. Their maximum spread remains smaller than ±0.4°C between 7 and 30°C. From 7 to 0°C, Equation 3 returns increasingly lower temperatures than the other two equations, and the total spread reaches ±0.8°C at 0°C. This is well within the 95% confidence bounds of any of these regressions: about ±1.8°C for MIT and Fiebig et al. (2021), and around ±1.2°C for Devils Laghetto.
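Equation 4 is simple enough to invert analytically. The sketch below assumes T in kelvins, the usual convention for Δ₄₇ calibrations, and uses hypothetical function names. It returns three quantities: the equilibrium Δ₄₇ expected at a given temperature, the temperature implied by a measured Δ₄₇, and the local slope dΔ₄₇/dT = −2·(39.09 × 10³)/T³. That slope translates Δ₄₇ errors into temperature errors.

```python
import math

# Coefficients of the "Devils Laghetto" relationship (Equation 4); T in kelvins.
A_DL = 39.09e3
B_DL = 0.1535
KELVIN = 273.15


def delta47_from_celsius(t_c, a=A_DL, b=B_DL):
    """Equilibrium Δ47 (I-CDES, ‰) expected at temperature t_c (°C)."""
    t_k = t_c + KELVIN
    return a / t_k ** 2 + b


def celsius_from_delta47(d47, a=A_DL, b=B_DL):
    """Invert Equation 4: T = sqrt(a / (Δ47 - b)), returned in °C."""
    if d47 <= b:
        raise ValueError("Δ47 must exceed the intercept b")
    return math.sqrt(a / (d47 - b)) - KELVIN


def d_delta47_d_t(t_c, a=A_DL):
    """Local sensitivity dΔ47/dT in ‰ per K, i.e. -2a / T³."""
    t_k = t_c + KELVIN
    return -2.0 * a / t_k ** 3
```

At 20°C the slope is about −0.0031‰ per °C. The 0.009‰ replicate repeatability quoted above therefore corresponds to roughly 3°C for a single replicate, and averaging many replicates is what brings sample-level uncertainties down.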

Δ 47 Calibration of Planktic Foraminifera
As shown in Figures 10 and 11, Δ₄₇-ICDES values for all concordant planktic samples, when plotted against T₁₈, are in excellent agreement with all three inorganic calibrations (Equations 2-4). Within the concordant planktic data set (52 […]), the Peral et al. (2018) results and those of Meinicke et al. (2020) are statistically indistinguishable (ANCOVA p-values of 0.63 and 0.14 for slope and intercept, respectively).
Strikingly, the discordant planktic samples in both studies appear to follow the same relationship between Δ₄₇ and T₁₈ as concordant foraminifera.
The RMSWD of 0.9 for the regression of the concordant planktic samples does not change significantly when discordant samples are also included. A value close to one implies that the regression residuals are consistent with those expected from the joint uncertainties in Δ₄₇ and T₁₈. This is in line with our earlier hypothesis that the discordant samples simply calcify in deeper, colder water than expected from typical habitat depths.
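The regressions discussed here fit a straight line to Δ₄₇ versus a function of temperature, with errors in both variables. These include the York regression behind the MIT calibration and the planktic regression whose RMSWD is quoted above. Below is a minimal pure-Python sketch of the iterative York regression (in the formulation of York et al., 2004). It assumes uncorrelated x and y errors. For a Δ₄₇ calibration, x would typically be 10³/T² (T in kelvins) and y the Δ₄₇ value. The function name is hypothetical, and this is not the code used in the study.

```python
import math


def york_fit(x, y, sx, sy, tol=1e-12, max_iter=200):
    """Fit y = a + b*x with independent Gaussian errors sx (on x) and sy (on y).

    Iterative York algorithm with zero x-y error correlation.
    Returns (b, a, se_b, se_a, rmswd).
    """
    n = len(x)
    wx = [1.0 / s ** 2 for s in sx]
    wy = [1.0 / s ** 2 for s in sy]

    def state(b):
        # Combined weights, weighted centroid, and York's beta terms for slope b.
        w = [p * q / (p + b * b * q) for p, q in zip(wx, wy)]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        u = [xi - xbar for xi in x]
        v = [yi - ybar for yi in y]
        beta = [wi * (ui / q + b * vi / p) for wi, ui, vi, p, q in zip(w, u, v, wx, wy)]
        return w, sw, xbar, ybar, u, v, beta

    # Initial guess: ordinary least-squares slope.
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    for _ in range(max_iter):
        w, sw, xbar, ybar, u, v, beta = state(b)
        b_new = (sum(wi * bi * vi for wi, bi, vi in zip(w, beta, v))
                 / sum(wi * bi * ui for wi, bi, ui in zip(w, beta, u)))
        done = abs(b_new - b) <= tol * max(1.0, abs(b_new))
        b = b_new
        if done:
            break

    w, sw, xbar, ybar, u, v, beta = state(b)
    a = ybar - b * xbar
    # Standard errors from the least-squares-adjusted x values (xbar + beta).
    x_adj = [xbar + bi for bi in beta]
    x_adj_bar = sum(wi * xi for wi, xi in zip(w, x_adj)) / sw
    se_b = math.sqrt(1.0 / sum(wi * (xi - x_adj_bar) ** 2 for wi, xi in zip(w, x_adj)))
    se_a = math.sqrt(1.0 / sw + x_adj_bar ** 2 * se_b ** 2)
    # Reduced chi-square (MSWD) and its square root (RMSWD).
    chi2 = sum(wi * (yi - b * xi - a) ** 2 for wi, xi, yi in zip(w, x, y))
    rmswd = math.sqrt(chi2 / (n - 2))
    return b, a, se_b, se_a, rmswd
```

An RMSWD near 1, like the 0.9 reported above, means the observed scatter about the line matches what the stated x and y uncertainties predict. Values well above 1 would point to unaccounted-for sources of error.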
When both concordant and discordant samples are included, the reprocessed results of Peral et al. (2018) and those of Meinicke et al. (2020) are once again statistically indistinguishable (ANCOVA p-values of 0.42 and 0.13 for slope and intercept, respectively). We may thus compute the following best-fit regression for all planktic foraminifera considered here (but see Section 3.4.3 below):

[Equation 5: planktic regression]

To simplify computing regression standard errors, this equation may be reformulated as a sum of two statistically independent components:

[Equation 6: Equation 5 expressed as a sum of two statistically independent components]

Based on these results, we conclude that the relationship between calcification temperatures and Δ₄₇-ICDES values in planktic foraminifer tests is indistinguishable from that observed for inorganic calcite precipitated from solutions with isotopically equilibrated dissolved inorganic carbon (DIC). In practice, this means that Equations 2-6 should all yield adequate, and statistically indistinguishable (cf. Figure 11), reconstructions of planktic foraminifer calcification temperatures. However, a substantial minority of planktic foraminifera have δ¹⁸O and Δ₄₇ compositions that both appear to record substantially deeper calcification than usually assumed. These reconstructed temperatures therefore do not necessarily reflect surface conditions exclusively.
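Equation 6 is not reproduced here, but the idea of splitting a regression into two uncorrelated components can be illustrated generically. Write Δ₄₇ = a·(x − x₀) + b, with x₀ chosen so that the errors on a and b are independent (for a weighted least-squares fit, the weighted mean of x). The standard error of a predicted Δ₄₇ is then √(σa²(x − x₀)² + σb²). The sketch below assumes x = 10³/T², as in Equation 4. It converts the Δ₄₇ calibration error into a first-order temperature error using the local slope. Function names are hypothetical, and any coefficient values fed to it are illustrative, not those of Equations 5 or 6.

```python
import math

KELVIN = 273.15


def x_of(t_c):
    """Temperature variable assumed here: x = 1e3 / T², with T in kelvins."""
    return 1e3 / (t_c + KELVIN) ** 2


def se_delta47(t_c, se_a, se_b, x0):
    """Calibration SE of predicted Δ47 for Δ47 = a·(x - x0) + b, a and b uncorrelated."""
    return math.hypot(se_a * (x_of(t_c) - x0), se_b)


def se_temperature(t_c, a, se_a, se_b, x0):
    """First-order calibration SE of a reconstructed temperature (K, equivalently °C).

    Divides the Δ47 error by |dΔ47/dT| = 2·a·1e3 / T³ (valid for small errors).
    """
    t_k = t_c + KELVIN
    return se_delta47(t_c, se_a, se_b, x0) / (2.0 * a * 1e3 / t_k ** 3)
```

With this parameterization the calibration error is smallest at x = x₀, where it reduces to σb, and grows quadratically away from the centroid of the calibration data. Analytical errors on a sample's own Δ₄₇ would be added separately.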

Poor Constraints on 18 α for P. obliquiloculata
As noted in Section 2.5.3, to the best of our knowledge we lack observations that could robustly constrain the relationship between ¹⁸α and calcification temperature in P. obliquiloculata or in other species of the same genus. We thus originally assigned T₁₈ values for the three samples of that species based on an ¹⁸α equation averaged over all planktic observations (Table 4, Figure 5). All three samples are flagged as discordant, with T₁₈ estimates 5-12°C colder than atlas temperatures (Figure 15). All of them also plot well below the overall planktic Δ₄₇ regression line, with some of the most negative regression residuals in the whole data set (Figure 12). Because the P. obliquiloculata samples are among the warmest in our data set, assigning them grossly inaccurate calcification temperatures is likely to strongly bias regression results. Using water-column-averaged monthly atlas temperatures, as we did for the cold, high-latitude samples above, is not particularly useful here: the resulting very large uncertainties essentially nullify any influence these three samples may exert. Lacking a better option, we chose to exclude the P. obliquiloculata observations from the regression used to compute Equations 5 and 6.

Figure 9. […] Peral et al. (2018) and Piasecki et al. (2019), between oxygen-18 estimates of calcification temperatures (95% error bars), bottom mean annual temperatures (95% gray shading), and originally reported calcification temperatures (blue lines). Species listed with asterisks are those without direct observations constraining ¹⁸α.

[…] 5, with apparent residuals ranging from −7.5 to −0.6°C. The corresponding Z-scores (the number of standard errors by which each residual deviates from zero) range from −6.4 to −0.2. Only 5 scores out of 15 lie within ±1.96, that is, the 95% confidence interval for a normal distribution (Figure 13). Plotting these residuals and Z-scores by species/genus, instead of averaging by core site, reveals no obvious correlation with genus and no difference between infaunal and epifaunal species (vertical and horizontal diamonds in Figure 13, respectively). Only 24 of these 45 Z-scores lie within ±1.96, once again making it very unlikely that the benthic residuals can be attributed to random analytical scatter. Judging from these large, systematic offsets, the results obtained by Peral et al. (2018) and Piasecki et al. (2019), if taken at face value, would imply that benthic foraminifera do not follow the same relationship between Δ₄₇ and temperature as their planktic cousins nor, by extension, as inorganic calcites.
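The statement that only 5 of 15 (or 24 of 45) Z-scores fall within ±1.96 can be turned into an explicit probability. Assume the residuals are independent and normally distributed. Correlated standardization errors, discussed below, would violate this assumption. Each Z-score then has a 95% chance of lying within ±1.96, so the number that do follows a binomial distribution. The helper names below are hypothetical.

```python
import math


def z_score(residual, se):
    """Number of standard errors separating a residual from zero."""
    return residual / se


def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal variable Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def prob_at_most_k_within(k, n, p_within=0.95):
    """Binomial probability that at most k of n independent Z-scores lie within ±1.96."""
    return sum(math.comb(n, j) * p_within ** j * (1.0 - p_within) ** (n - j)
               for j in range(k + 1))
```

For example, `prob_at_most_k_within(5, 15)` and `prob_at_most_k_within(24, 45)` are both far below 10⁻⁹. Pure random scatter is therefore an implausible explanation, unless the underlying errors are much larger than stated or strongly correlated.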
Other types of biogenic carbonates, such as corals and some brachiopods, display greater-than-expected Δ₄₇ values, most likely reflecting disequilibrium between water and DIC associated with CO₂ absorption (Bajnai et al., 2020; Davies et al., 2022; Letulle et al., 2022; Saenger et al., 2012). However, as shown in Figure 14, there does not appear to be any correlation between the benthic residuals and the seawater chemistry at the core-top sites (e.g., salinity or calcite saturation). Alternatively, the large benthic residuals could be mitigated (but not eliminated) by supposing that our planktic data set underestimates cold-end-member Δ₄₇ values by ∼20 ppm. Such a large bias, however, is unlikely to result simply from inaccurate temperature constraints, given the narrow range of water-column temperatures for the coldest planktic samples (Section 3.2.2).
By contrast with these results, older core-top studies of foraminifer Δ₄₇, predating the use of I-CDES reference materials, did not report any significant discrepancies between planktic and benthic foraminifera. Tripati et al. (2010), based on 24 planktic and 11 benthic foraminifer samples, did not observe any obvious difference in T-Δ₄₇ relationships between planktics and benthics. Any such difference would have been difficult to observe, however, given that there was little overlap in calcification temperatures between the two sample groups. Grauel et al. (2013) only analyzed three benthic samples (U. mediterranea and C. pachyderma). Their measured Δ₄₇ values are arguably 10-20 ppm greater than those of planktic samples with similar calcification temperatures, but it would be difficult to claim that this apparent offset exceeds the level of analytical uncertainty in that early study.
It is notable that all but two of the published benthic data points were analyzed in the early days of a single laboratory over a relatively short time frame. Without making unfair assumptions about the methodology used by Piasecki et al. (2019), it is worth recalling two points: the standardization of raw Δ₄₇ values necessarily contributes to final analytical uncertainties, and this contribution tends to affect samples analyzed together in a correlated manner (Daëron, 2021). N. Meckler, who was one of the authors of the Piasecki et al. (2019) study and who reviewed the present work, suggested that this particular data set may not be as robust as it would be following today's best practices. We reviewed the corresponding raw data, which she kindly shared, and we concur with her assessment. The level of replication of unknown samples and the temporal distribution of unknown versus standard replicates in that study were not ideal, making final average Δ₄₇ values potentially susceptible to substantial standardization errors. This hypothesis is also supported, albeit circumstantially, by the much improved agreement between benthic δ¹⁸O_c-derived and Δ₄₇-derived Cenozoic temperatures when using our new planktic calibration (see Section 3.6.2 below).
In light of all the above, we strongly advocate that new, independent studies test whether the benthic observations we have so far, most of them from a single study, can be reproduced in different laboratories. This could be done, for instance, by obtaining tight constraints on foraminiferal Δ₄₇ in cold waters and comparing them, as was done in Figure 10, with published, robust values for natural carbonates formed at similar temperatures. Examples include Laghetto Basso calcite, lacustrine carbonates from Lakes Joyce, Fryxell, and Vanda (Anderson et al., 2021), and A. colbecki scallops from Petrel Island (Huyghe et al., 2022).

Figure 10. Foraminiferal Δ₄₇ as a function of calcification temperature as documented by the three studies considered here. As discussed in the text, the three planktic samples with the coldest T₁₈ values were assigned calcification temperatures based on the narrow range of monthly WOA23 temperatures between 0 and 1,500 m depth (Section 3.2.2). Three discordant planktic samples of the species P. obliquiloculata were excluded due to poorly constrained ¹⁸α values (Section 3.4.3). Benthic calcification temperatures are from the original publications, where available, or otherwise redetermined from WOA23. The MIT calibration shown here is recalculated from the full results listed in Table S1 of Anderson et al. (2021), in order to avoid including the foraminifer observations reassessed here (see Section 3.4.1). The low-temperature natural carbonates mentioned in Section 3.5 are shown as green squares and diamonds.

Oxygen-18 Fractionation Between Seawater and Foraminifera
In the context of the times, it was natural for Craig (1965) and Shackleton (1974) to interpret observations linking ¹⁸α and temperature in terms of isotopic equilibrium. They did so even though they were well aware, as noted by Urey (1947), that "whether animals lay down carbonates in equilibrium with water" remained an open question. The answer to that question has strong practical consequences, however. In the classical case, two phases achieve isotopic equilibrium through equal and opposite isotopic fluxes associated with a reversible reaction. The isotopic equilibrium constant can then be expressed in terms of ratios of partition functions arising from statistical mechanics, and generally depends only on temperature (Bigeleisen & Goeppert Mayer, 1947; Urey, 1947). By contrast, the reaction may be irreversible, or its opposing fluxes may differ greatly, as when carbonates precipitate rapidly from supersaturated solutions. The effective fractionation factor between phases then depends on the reaction pathway(s) and their relative rates. It may thus vary with factors other than temperature, such as pH, salinity, or ion concentrations (e.g., Devriendt et al., 2017; Watkins et al., 2014). As pointed out previously, however, a carbonate mineral may achieve clumped-isotope equilibrium even when its "bulk" δ¹⁸O and δ¹³C values are out of equilibrium with water and/or DIC in its parent solution (e.g., Eiler, 2011; Watkins & Hunt, 2015).
The observations summarized in Figures 3-5 document how ¹⁸α at a given temperature may differ substantially among benthic and planktic foraminifer species, as has been known since Duplessy et al. (1970) and Shackleton et al. (1973). But it is also clear that carbonate ion concentration and pH affect apparent ¹⁸α values in some planktic species (Spero et al., 1997), as do irradiance levels in some symbiotic species (Bemis et al., 1998; Spero, 1992; Spero & Lea, 1993). Defining species-specific ¹⁸α calibrations, as was done here, is a flawed but useful shorthand. The species label serves as an imperfect alias for a complex set of chemical and metabolic conditions which we are still struggling to model quantitatively (Zeebe et al., 2008). However useful this approach may be for modern observations, it comes with a critical caveat. Unless we improve our quantitative understanding of oxygen-isotope fractionation in foraminifera, we are ill-equipped to assess whether the modern, observable ¹⁸α calibrations apply to past oceans with very different seawater chemistry.

Revisiting Benthic Δ 47 Records of Cenozoic Seawater Temperatures
Our use of species-specific ¹⁸α calibrations, instead of a single general calibration as was done previously, shifts reconstructed temperatures relative to the previously published equations (Figure 11). The shift is −1°C for the Peral et al. (2022) calibration and −0.5 to −3°C for the Meinicke et al. (2021) calibration. For Peral et al. (2022), the difference is mostly due to the switch from Kim and O'Neil (1997) to updated ¹⁸α relationships. The Meinicke et al. (2021) offset reflects both the switch from Shackleton (1974) and the inclusion, in the original publication, of the benthic data of Piasecki et al. (2019). This offset is noteworthy in light of a recent finding by Meckler et al. (2022). They interpreted clumped isotopes in North Atlantic benthic foraminifera with the Meinicke et al. (2021) calibration, assuming that benthic and planktic Δ₄₇ follow the same calibration function. The resulting Paleocene to Miocene temperatures are much warmer (by 2-3°C on average; cf. Figure S5 in Supporting Information S1) than expected from classical oxygen-18 reconstructions (in this case, Cramer et al., 2011). They suggested that the discrepancy could be resolved by accounting for poorly constrained pH effects on benthic foraminifer δ¹⁸O_c.
Simply updating the Δ₄₇ calibration used by Meckler et al. (2022) to our revised planktic calibration (Equation 5) virtually eliminates the average offset between the Δ₄₇-derived paleotemperature estimates (T₄₇) and the T₁₈ values from Cramer et al. (2011) (Figures 16 and 17 and Figure S5 in Supporting Information S1). It also brings the results of the two methods into much closer agreement over large spans of the Cenozoic (Figure 18). This is not to dispute the validity of the pH issues raised by Meckler et al. (2022) and others. Rather, this finding highlights how sensitive some interpretations may be to our choice of ¹⁸α and Δ₄₇ calibrations when applied to foraminifer records (see also Figures S6-S8 in Supporting Information S1, using other I-CDES calibrations).
Furthermore, the overall agreement between the two benthic temperature records does not preclude the possibility, as argued by Meckler et al. (2022), that clumped isotopes reveal previously unrecognized structure in the benthic carbonate record. To characterize any remaining mismatch between the δ¹⁸O_c and reprocessed Δ₄₇ records, we compute ΔT₄₇₋₁₈ (Figure 17), defined as the difference between the new T₄₇ values and the T₁₈ values from Cramer et al. (2011). We estimate ΔT₄₇₋₁₈ uncertainties based on (a) the analytical uncertainties reported by Meckler et al. (2022), (b) the calibration uncertainties of Equation 6, and (c) the T₁₈ uncertainties reported by Cramer et al. (2011). To smooth out analytical scatter, we then subject the ΔT₄₇₋₁₈ time series to a LOWESS regression (Cleveland, 1979; Cleveland & Devlin, 1988) with bandwidths ranging from 5 to 25 Ma. We estimate the 95% confidence limits of the LOWESS curve using a quasi-Monte Carlo simulation, in which we generate 2¹³ quasi-random versions of the ΔT₄₇₋₁₈ data set with multivariate Gaussian scatter. This approach samples the error estimates described above more efficiently than traditional Monte Carlo methods (Roy et al., 2023). As shown in Figure 17, ΔT₄₇₋₁₈ is not statistically different from zero (at the 95% confidence level) over most of the Cenozoic. However, T₄₇ remains significantly warmer than T₁₈ in three time intervals centered around 57, 50, and 39 Ma, whose lengths are sensitive to the smoothing bandwidth. These offsets might conceivably reflect differences in spatial sampling between the two records. They could just as likely indicate that one or more assumptions underlying these methods fail during these intervals. The two most likely culprits are δ¹⁸O_sw reconstructions based on ice volume/composition estimates, and the assumption of constant ¹⁸α relationships through time. The same issue might also explain Plio-Pleistocene T₄₇ values that appear significantly colder than the T₁₈ estimates. That observation, however, is quite sensitive to the choice of Δ₄₇ calibration, because equilibrium Δ₄₇ is less tightly constrained at temperatures close to 0°C. Applying the MIT calibration instead of our planktic regression would increase Pleistocene T₄₇ estimates by 1°C. Applying a recent, more comprehensive compilation of Δ₄₇ calibration data, based on 104 samples with formation temperatures down to −2°C (OGLS23 calibration, Daëron & Vermeesch, 2023), would increase them by 1.5°C, bringing them much closer to Pleistocene bottom-water conditions. That being said, we acknowledge that the statistical treatment performed here remains rudimentary. In particular, our LOWESS procedure assumes statistically independent ΔT₄₇₋₁₈ uncertainties, which leads to well-known issues when smoothing data with correlated errors (e.g., Kohn et al., 2000). Given the standardization issues mentioned in Section 2.3.2, this may or may not be problematic. It will be important to determine the extent and root causes of these mismatched intervals, because the scale of the discrepancies is far from negligible. An offset exceeding 3°C in deep seawater temperatures has strong implications for the constraints we can place on parameters such as polar amplification and the Earth's climate sensitivity using the benthic record (e.g., Gaskell et al., 2022; Hansen et al., 2013). In light of the improved agreement between T₄₇ and T₁₈ over much of the Cenozoic, prior estimates of these parameters based on benthic δ¹⁸O_c records are not likely to be substantially underestimated.
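The smoothing and confidence-band steps described above can be sketched in pure Python as follows. This is not the authors' code, and several details are assumptions. The tricube-weighted local linear smoother is a simplified, fixed-bandwidth stand-in for Cleveland's LOWESS, with no robustness iterations and the bandwidth treated as a half-width. A randomly shifted Halton sequence stands in for whichever low-discrepancy generator Roy et al. (2023) describe. As in the procedure described in the text, errors are drawn independently for each sample, so correlated standardization or calibration errors are ignored. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def tricube_local_linear(x, y, x_eval, half_width):
    """Simplified LOWESS: tricube-weighted local linear fit at each x0 in x_eval."""
    fitted = []
    for x0 in x_eval:
        w = [(1.0 - (abs(xi - x0) / half_width) ** 3) ** 3 if abs(xi - x0) < half_width
             else 0.0 for xi in x]
        sw = sum(w)
        if sw == 0.0:
            fitted.append(float("nan"))  # no data inside the window
            continue
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def first_primes(n):
    """First n primes, used as Halton bases (one dimension per data point)."""
    primes, k = [], 2
    while len(primes) < n:
        if all(k % p for p in primes if p * p <= k):
            primes.append(k)
        k += 1
    return primes


def radical_inverse(i, base):
    """i-th element of the van der Corput sequence in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r


def qmc_lowess_band(x, y, sigma, x_eval, half_width, m=13, seed=0, level=0.95):
    """Pointwise confidence band of the smoother from 2**m quasi-random data sets."""
    n_draws = 2 ** m
    bases = first_primes(len(x))
    rng = random.Random(seed)
    shifts = [rng.random() for _ in x]  # random (Cranley-Patterson) shift
    std_normal = NormalDist()
    curves = []
    for i in range(1, n_draws + 1):
        y_sim = []
        for j, (yj, sj) in enumerate(zip(y, sigma)):
            u = (radical_inverse(i, bases[j]) + shifts[j]) % 1.0
            u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf finite
            y_sim.append(yj + sj * std_normal.inv_cdf(u))
        curves.append(tricube_local_linear(x, y_sim, x_eval, half_width))
    lo_q, hi_q = (1.0 - level) / 2.0, (1.0 + level) / 2.0
    lower, upper = [], []
    for k in range(len(x_eval)):
        vals = sorted(c[k] for c in curves)
        lower.append(vals[int(lo_q * (n_draws - 1))])
        upper.append(vals[int(hi_q * (n_draws - 1))])
    return lower, upper
```

A band that excludes zero at a given age corresponds to the shaded "significant mismatch" intervals of Figure 17. Plain Halton sequences degrade in high dimensions, so with hundreds of samples a scrambled Sobol generator would be the safer choice.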

Clumped-Isotope Thermometry of Planktic Foraminifera
The question of a planktic Δ₄₇ calibration is more straightforward, based on three lines of agreement: (a) between Peral et al. (2018) and Meinicke et al. (2020); (b) between concordant and discordant samples; and (c) between planktic foraminifera and the (mostly) inorganic calibration data of Anderson et al. (2021) and Fiebig et al. (2021). The clumped-isotope compositions of planktic foraminifera thus appear to offer robust constraints on their calcification temperatures. However, such reconstructions should take into account the gaps in our knowledge of true mineralization depths, even for species whose living depths appear to be well known. Similarly, our ability to reconstruct the δ¹⁸O_sw values of ancient oceans still critically depends on our knowledge of the laws governing oxygen-18 fractionation in different foraminiferal species, and of how they may have varied through time.

Conclusion
Here we describe an easily extendable, open-source framework to systematically compile and combine data relevant to the interpretation of foraminiferal δ¹⁸O_c and Δ₄₇ records. The best currently available constraints make it clear that ¹⁸α calibrations differ markedly between species. Nevertheless, the temperature sensitivity of ¹⁸α remains indistinguishable from that of inorganic calibrations such as Kim and O'Neil (1997) or Daëron et al. (2019). We should consider these species-specific calibrations as a flawed but useful shortcut, potentially masking a complex set of chemical and metabolic processes which may vary through time. Based on a large number of observations, the δ¹⁸O_c values of most planktic samples are consistent with seawater temperature and δ¹⁸O_sw over their expected living depth range. However, a non-negligible proportion have heavier-than-predicted δ¹⁸O_c values, best explained by calcification in deeper, colder waters. This highlights the limits of our a priori knowledge of when and where planktic calcification occurs.
Based on these newly compiled ¹⁸α observations, we also revisit the assignment of oxygen-18-based calcification temperatures for the data reported by Peral et al. (2018) and Meinicke et al. (2020). We find that the Δ₄₇ values of planktic foraminifera in these two studies are in excellent agreement with the largely inorganic I-CDES calibrations of Anderson et al. (2021) and Fiebig et al. (2021). The benthic data reprocessed here are more ambiguous, however. On one hand, the available modern benthic observations yield apparent Δ₄₇-based temperatures up to 7.5°C colder than local bottom seawater. On the other, applying an equilibrium Δ₄₇ calibration to the benthic samples of Meckler et al. (2022) reconciles their results, to first order, with the deep-ocean temperature record from benthic δ¹⁸O_c over most of the Cenozoic. This highlights how sensitive some interpretations may be to our choice of ¹⁸α and Δ₄₇ calibrations. The apparent contradiction may be readily explained by methodological limitations in one of the modern benthic studies, but conclusively proving this will require new Δ₄₇ measurements of benthic foraminifera from well-constrained core tops. Nevertheless, deep-ocean temperatures derived from Δ₄₇ and δ¹⁸O appear to remain irreconcilable during some Late Paleocene and Eocene intervals. This suggests the breakdown of one or more of the assumptions underlying the paleothermometers, such as δ¹⁸O_sw reconstructions and/or ¹⁸α relationships. Solving these issues will have direct implications for the constraints we can place on parameters such as climate sensitivity and polar amplification using the paleoclimate record, and more generally for our understanding of past and future climates.

Figure 16. […] Meinicke et al. (2021). Note the conspicuous offset between the Δ₄₇ paleotemperatures (round markers, with 95% confidence intervals) and reconstructions based on benthic foraminifer δ¹⁸O_c (orange lines with shaded confidence limits). Right column: using this study's planktic regression instead (Equation 5) largely reconciles these results with the δ¹⁸O_c record. See Figure 17 and Figure S5 in Supporting Information S1 for corresponding plots of the offset between T₄₇ and T₁₈, and Figures S6-S8 in Supporting Information S1 for the different I-CDES calibrations. When using the planktic calibration, the average δ¹⁸O_sw before 45 Ma is −0.60 ± 0.16‰ (2SE) for the Δ₄₇ samples, compared with an average value of −0.73 ± 0.4‰ for the same period in the Cramer et al. (2011) reconstruction.

Figure 17. […] 2022), reprocessed using this study's planktic regression (Equation 5), and the corresponding δ¹⁸O_c-derived T₁₈ record from Cramer et al. (2011). Vertical error bars account for estimated analytical and calibration errors on Δ₄₇ as well as the originally reported T₁₈ uncertainties. The green shaded area is the central 95% confidence band for a LOWESS regression of ΔT₄₇₋₁₈ with a bandwidth of 10 Ma. Top right: kernel density estimate for the whole (unsmoothed) data set, showing that the long-term average of ΔT₄₇₋₁₈ is close to zero. Center panel: white areas correspond to periods when ΔT₄₇₋₁₈ does not differ significantly from zero. Shaded areas correspond to periods when T₄₇ is significantly warmer (red) or colder (blue) than T₁₈ at the 95% confidence level, with color density corresponding to the 2.5% and 97.5% quantiles, respectively, of (T₄₇ − T₁₈). Color densities conservatively denote the minimum level of mismatch (at 95% confidence); average values of smoothed ΔT₄₇₋₁₈ lie further from zero. Bottom panel: smoothed benthic δ¹⁸O_c record of Westerhold et al. (2020).

Figure 18. […] Cramer et al. (2011). In yellow: 95% confidence band of a LOWESS regression of the Meckler et al. (2022) data, converted to T₄₇ using the Meinicke et al. (2021) calibration. In blue: 95% confidence band of a LOWESS regression of the same data, converted to T₄₇ using our new planktic calibration (Equation 5). The LOWESS bandwidth was chosen arbitrarily to yield approximately the same width and level of detail as the Cramer et al. (2011) reconstruction. LOWESS 95% confidence limits are estimated using a quasi-Monte Carlo simulation in which we generate 2¹³ quasi-random versions of the T₄₇ data set.

Acknowledgments
Originally, this study grew out of MD's visit to Utrecht at the invitation of M. Ziegler and I. J. Kocken. This work benefitted in many ways from stimulating discussions with M. Peral, J.-C. Duplessy, C. Waelbroeck, E. Michel, and members of the Paléoceans group at LSCE. We are grateful to N. Meinicke and N. Meckler, who graciously shared unpublished I-CDES-reprocessed calibration data. This report was greatly improved by the detailed and thoughtful comments of three reviewers, including N. Meckler, whose candid feedback was remarkably fair and constructive despite our differences of interpretation on some issues. We also benefitted greatly from the editorial handling of M. Huber, whose suggestions substantially improved the final contents of this work.

Figure 3. Constraints linking ¹⁸α to calcification temperatures for the species listed in Table 2. Although these fractionation relationships differ between species, their temperature sensitivities remain indistinguishable from that of inorganic calcite (blue lines). All temperature observations are from the original studies.

Figure 4. Sources of uncertainty affecting the use of oxygen-18 thermometry to constrain planktic foraminifer calcification temperatures. (a) Overall residuals for the depth- and month-integrated δ¹⁸O_sw values used to estimate seawater oxygen-18 composition for each planktic foraminifer sample. (b) Overall residuals for the species-specific oxygen-18 relationships listed in Table 4, based on all studies shown in Figure 3.

Figure 5. Graphical summary of the best-fit B values listed in Table 4 for planktic (green) and benthic (purple) species, or by genus (blue). Dark bars correspond to 95% confidence limits based on the SE of best-fit values; light bars represent the total spread of observations. Gray shaded regions correspond to the calcite calibrations of Kim and O'Neil (1997) and Shackleton (1974) in the range of 0-25°C. Also shown is the aragonite calibration of Grossman and Ku (1986), which potentially applies to H. elegans, the only aragonitic species shown here.
the Peral et al. (2018) results and those of Meinicke et al. (2020) are statistically indistinguishable (ANCOVA p-values of 0.63 and 0.14 for slope and intercept, respectively).

Figure 6. Planktic samples are categorized as concordant (top panel) or discordant (bottom panel) depending on whether the oxygen-18 estimate of their calcification temperature (red or black 95% error bar) overlaps with the seasonal distribution of temperatures (blue histogram) in the assumed calcification depth range for that species.

Figure 7. Comparison, for each planktic sample from Peral et al. (2018) and Meinicke et al. (2020), between oxygen-18 estimates of calcification temperatures (95% error bars) and the year-long distribution (blue histograms) of monthly mean temperatures over the assumed living depth interval. T₁₈ error bars for concordant and discordant samples are shown in black and red, respectively.

Figure 8. Distribution of discordant samples in our clumped-isotope data set (Peral et al., 2018; Meinicke et al., 2020; top two rows) and in a much larger compilation of Holocene core tops (Malevich et al., 2019; bottom row). Top row: T₁₈ versus atlas temperatures over the assumed living depth range (left panel) or over 0-500 m (right panel) for the clumped-isotope data set. Center row: comparison of T₁₈ with local bottom ocean temperatures and with atlas temperatures over the assumed living depth range (left panel) or over 0-500 m (right panel) for the clumped-isotope data set. Bottom row: T₁₈ versus atlas temperatures over the assumed living depth range (left panel) or over 0-500 m (right panel) for the Malevich et al. (2019) data set. The right panel in each row only shows previously discordant samples.

Figure 11. Left panel: 95% confidence ellipses for the regression slope and Δ47 value at 15°C for various regressions. The "Devils Laghetto" regression (purple ellipse, Equation 4) only includes slow-growing calcite believed to achieve quasi-equilibrium Δ47 values, based on the independent measurements (green triangles in Figure 10) reported by Anderson et al. (2021) and Fiebig et al. (2021). Due to the use of a quadratic formula by Fiebig et al. (2021), only their local slope and 95% confidence region for Δ47 at 15°C are shown here. Right panel: difference in reconstructed temperatures using various calibrations. 95% confidence bounds (not shown here) for the MIT and Fiebig et al. (2021) calibrations are both around ±1.8°C; those for the Devils Laghetto regression are about ±1.2°C.
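Calibrations of the form Δ47 = a × 10⁶/T² + b invert in closed form, which is how a single Δ47 value maps onto different temperatures under different calibrations (right panel). The coefficients below are hypothetical placeholders of a plausible magnitude, not values from this study or from any cited calibration; a quadratic calibration such as that of Fiebig et al. (2021) would need a different inverse.

```python
import math

def d47_from_t(t_celsius, a, b):
    """Forward calibration Δ47 = a * 1e6 / T^2 + b, with T in kelvin."""
    return a * 1e6 / (t_celsius + 273.15) ** 2 + b

def t_from_d47(d47, a, b):
    """Inverse calibration, returning T in °C; defined only for d47 > b (with a > 0)."""
    return math.sqrt(a * 1e6 / (d47 - b)) - 273.15

# Hypothetical (a, b) pairs, placeholders only.
CAL_1 = (0.0390, 0.155)
CAL_2 = (0.0400, 0.145)

for d47 in (0.55, 0.60, 0.65):
    dt = t_from_d47(d47, *CAL_2) - t_from_d47(d47, *CAL_1)
    print(f"Δ47 = {d47:.2f}: T(CAL_2) - T(CAL_1) = {dt:+.1f} °C")
```

Evaluating `d47_from_t(15.0, a, b)` gives the "Δ47 value at 15°C" used as the second ellipse coordinate in the left panel.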

Figure 12. Regression residuals for the planktic samples from Peral et al. (2018) and Meinicke et al. (2020). Error bars in the top panel are 95% confidence limits derived from combined uncertainties in Δ47 and calcification temperatures. Z-scores in the bottom panel are computed based on these same error bars.
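One common way to combine the two uncertainty sources is to propagate the temperature error through the local slope of the regression, dΔ47/dT = −2a × 10⁶/T³, and add it in quadrature to the Δ47 error. The sketch below assumes that regression form, independent Gaussian errors, and hypothetical names and coefficients; it is not necessarily how the combined error bars in this figure were computed.

```python
import math

def predicted_d47(t_celsius, a, b):
    """Regression prediction Δ47 = a * 1e6 / T^2 + b (T in kelvin)."""
    return a * 1e6 / (t_celsius + 273.15) ** 2 + b

def residual_z(d47_obs, sigma_d47, t_celsius, sigma_t, a, b):
    """Return (residual, combined 1-sigma error, Z-score) for one sample."""
    tk = t_celsius + 273.15
    slope = -2 * a * 1e6 / tk ** 3              # dΔ47/dT at this temperature
    sigma = math.hypot(sigma_d47, slope * sigma_t)  # quadrature sum of both errors
    resid = d47_obs - predicted_d47(t_celsius, a, b)
    return resid, sigma, resid / sigma
```

Under these assumptions, a 95% error bar corresponds to ±1.96 times the combined 1-sigma error.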

Figure 13. Temperature residuals (left panels) and corresponding Z-scores (right panels) for benthic samples from Peral et al. (2018) and Piasecki et al. (2019), relative to the planktic regression of Equation 5. Top row: results averaged by core top. Bottom row: results averaged by species at each core top. Vertical and horizontal diamonds represent infaunal and epifaunal species, respectively. Colored markers correspond to different genera. The right axis of each right-hand panel indicates the p-values corresponding to the Z-scores for a Gaussian distribution.
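Each Z-score maps to a Gaussian p-value; assuming a two-sided test, p = erfc(|Z|/√2). The sketch below also shows one way to average temperature residuals by group (core top, or species at each core top) using inverse-variance weights. That weighting scheme and the function names are assumptions made for illustration.

```python
import math
from collections import defaultdict

def two_sided_p(z):
    """Two-sided p-value of a standard-normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def average_by_group(records):
    """records: iterable of (group, residual_degC, sigma_degC).

    Returns {group: (weighted mean residual, its 1-sigma error, Z, p)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0])   # [sum of w*r, sum of w]
    for group, resid, sigma in records:
        w = 1.0 / sigma ** 2                 # inverse-variance weight
        sums[group][0] += w * resid
        sums[group][1] += w
    out = {}
    for group, (swr, sw) in sums.items():
        mean, se = swr / sw, 1.0 / math.sqrt(sw)
        z = mean / se
        out[group] = (mean, se, z, two_sided_p(z))
    return out
```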

Figure 15. Comparison between oxygen-18 estimates of calcification temperatures (95% error bars) and year-long distribution (blue histograms) of monthly mean temperatures over the assumed living depth interval for the three coldest planktic samples (left column, see Section 3.2.2) and the P. obliquiloculata samples (right column, see Section 3.4.3).

Figure 16. Left column: original reconstructions of Cenozoic deep ocean temperatures and δ18Osw by Meckler et al. (2022) using the Δ47 calibration of Meinicke et al. (2021). Note the conspicuous offset between the Δ47 paleotemperatures (round markers, with 95% confidence intervals) and reconstructions based on benthic foraminifer δ18Oc (orange lines with shaded confidence limits). Right column: using this study's planktic regression instead (Equation 5) largely reconciles these results with the δ18Oc record (see Figure 17 and Figure S5 in Supporting Information S1 for corresponding plots of the offset between T47 and T18, and Figures S6–S8 in Supporting Information S1 using the different I-CDES calibrations). When using the planktic calibration, the average δ18Osw before 45 Ma is −0.60 ± 0.16‰ (2SE) for the Δ47 samples, compared with an average value of −0.73 ± 0.4‰ for the same period in the Cramer et al. (2011) reconstruction.
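Once a temperature is fixed (here from Δ47), δ18Osw follows by solving the δ18O thermometer for seawater. The sketch below uses the textbook Kim and O'Neil (1997) calcite equation and the standard VPDB-to-VSMOW conversion purely as stand-ins; the reconstructions in this figure rely on benthic relationships not reproduced here, and all names and inputs are hypothetical. It then summarizes a set of values as mean ± 2SE, as in the caption.

```python
import math
import statistics

def d18o_sw(t_celsius, d18o_c_vpdb):
    """Seawater δ18O (VSMOW) from temperature (°C) and calcite δ18O (VPDB).

    Assumes 1000 ln(alpha) = 18.03e3 / T - 32.42 (Kim and O'Neil, 1997).
    """
    d_c = 1.03091 * d18o_c_vpdb + 30.91                      # calcite δ18O on VSMOW
    alpha = math.exp((18.03e3 / (t_celsius + 273.15) - 32.42) / 1000)
    return (1000 + d_c) / alpha - 1000

def mean_2se(values):
    """Mean and twice its standard error (sample standard deviation / sqrt(n))."""
    n = len(values)
    return statistics.mean(values), 2 * statistics.stdev(values) / math.sqrt(n)
```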

Figure 17. Top left: plot of ΔT47-18, defined as the difference between the benthic T47 values of Meckler et al. (2022), reprocessed using this study's planktic regression (Equation 5), and the corresponding δ18Oc-derived T18 record from Cramer et al. (2011). Vertical error bars account for estimated analytical and calibration errors on Δ47 as well as the originally reported T18 uncertainties. Green shaded area is the central 95% confidence band for a LOWESS regression of ΔT47-18 with a bandwidth equal to 10 Ma. Top right: kernel density estimate for the whole (unsmoothed) data set, showing that the long-term average of ΔT47-18 is close to zero. Center panel: white areas correspond to periods when ΔT47-18 does not significantly differ from zero, and shaded areas to periods when T47 is significantly warmer (red) or colder (blue) than T18 at the 95% confidence level, with color density corresponding to the 2.5% and 97.5% quantiles, respectively, of (T47 − T18). Note that color densities conservatively denote the minimum level of mismatch (at 95% confidence); average values of smoothed ΔT47-18 are further from zero. Bottom panel: smoothed benthic δ18Oc record of Westerhold et al. (2020).
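LOWESS fits a weighted straight line around each evaluation point, with weights that fall to zero at the edge of the smoothing window. The standard-library sketch below is a single-pass (non-robust) local-linear variant with tricube weights; reading the 10 Ma bandwidth as a window half-width is an assumption, the function names are hypothetical, and the confidence band requires an additional resampling step not shown here.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def lowess(x, y, x_eval, half_width):
    """Local weighted linear fit of y on x, evaluated at each point in x_eval."""
    out = []
    for x0 in x_eval:
        w = [tricube((xi - x0) / half_width) for xi in x]
        sw = sum(w)
        if sw == 0:
            out.append(float("nan"))      # no data inside the window
            continue
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0   # fall back to a local mean
        out.append(my + slope * (x0 - mx))
    return out
```

With ages in Ma and ΔT47-18 values in °C, a call such as `lowess(ages, dT, grid, half_width=5.0)` would correspond to a 10 Ma full window under this assumption; library implementations such as statsmodels instead specify the window as a fraction of the data.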

Figure 18. Comparison of smoothed reconstructions of deep ocean temperature. In gray: 90% confidence band of T18 reconstructed by Cramer et al. (2011). In yellow: 95% confidence band of a LOWESS regression of the Meckler et al. (2022) data, converted to T47 using the Meinicke et al. (2021) calibration. In blue: 95% confidence band of a LOWESS regression of the same data, but converted to T47 using our new planktic calibration (Equation 5). The LOWESS bandwidth was chosen arbitrarily to yield approximately the same width and level of detail as the Cramer et al. (2011) reconstruction. LOWESS 95% confidence limits are estimated using a quasi-Monte Carlo simulation in which we quasi-randomly generate 2¹³ (8,192) versions of the T47 data set.
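A quasi-Monte Carlo estimate replaces pseudo-random draws with a low-discrepancy sequence, so that 2¹³ perturbed copies of the data cover the error distribution evenly. The generator used for this figure is not specified here; the sketch below assumes a Halton sequence (one prime base per data point) mapped to Gaussian errors with `statistics.NormalDist().inv_cdf`, applies an arbitrary statistic to each copy, and reports its 2.5% and 97.5% quantiles. Names and inputs are hypothetical.

```python
from statistics import NormalDist, quantiles

def first_primes(n):
    """First n prime numbers, used as Halton bases."""
    primes, k = [], 2
    while len(primes) < n:
        if all(k % p for p in primes if p * p <= k):
            primes.append(k)
        k += 1
    return primes

def radical_inverse(k, base):
    """Van der Corput radical inverse of integer k >= 1; lies strictly in (0, 1)."""
    inv, scale = 0.0, 1.0 / base
    while k > 0:
        k, digit = divmod(k, base)
        inv += digit * scale
        scale /= base
    return inv

def qmc_interval(values, sigmas, statistic, m=13):
    """2.5% and 97.5% quantiles of `statistic` over 2**m quasi-random perturbations."""
    gauss = NormalDist()
    bases = first_primes(len(values))
    results = []
    for k in range(1, 2 ** m + 1):
        draw = [v + s * gauss.inv_cdf(radical_inverse(k, b))
                for v, s, b in zip(values, sigmas, bases)]
        results.append(statistic(draw))
    cuts = quantiles(results, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]
```

For a smoothing band, `statistic` would return the smoothed value at one age for each perturbed copy. Halton sequences degrade in high dimensions, so with many data points a scrambled Sobol sequence (for example `scipy.stats.qmc.Sobol` with `random_base2(m=13)`) is a common alternative.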

Table 1
Core-Top Sites Considered in This Study, With Bottom Temperatures From WOA23, δ18Osw
Piasecki et al. (2019) assigned benthic calcification temperatures based either on in situ measurements reported in earlier studies or on WOA estimates of bottom seawater temperatures.

Estimates of Calcification Temperatures From the World Ocean Atlas
Locations of the core-top sites listed in Table 1.

Table 2
Studies Used Here to Constrain How the Oxygen-18 Fractionation (18α) Between Seawater and Foraminiferal CaCO3 Varies With Temperature

Table 3
Best Estimates of Habitat Depth Ranges for Planktic Species

including all monthly means over the assumed depth range for the sample's species. Most of the planktic samples in our Δ47 data set (52 out of 68) pass this test, but the remaining 16 "discordant" samples yield significantly cooler T18 values than the coldest environmental conditions (Figures