A robust calibration of the clumped isotopes to temperature relationship for foraminifers

The clumped isotope (Δ47) proxy is a promising geochemical tool to reconstruct past ocean temperatures far back in time and in unknown settings, due to its unique thermodynamic basis that renders it independent from other environmental factors like seawater composition. Although the method was previously hampered by large sample-size requirements, recent methodological advances have made the paleoceanographic application of Δ47 on small (<5 mg) foraminifer samples possible. Previous studies show a reasonable match between Δ47 calibrations based on synthetic carbonate and various species of planktonic foraminifers. However, studies performed before recent methodological advances were based on relatively few species and data treatment that is now outdated. To overcome these limitations and elucidate species-specific effects, we analyzed 14 species of planktonic foraminifers in sediment surface samples from 13 sites, covering a growth temperature range of 0–28°C. We selected mixed layer-dwelling and deep-dwelling species from a wide range of ocean settings to evaluate the feasibility of temperature reconstructions for different water depths. Various techniques to estimate foraminifer calcification temperatures were tested in order to assess their effects on the calibration and to find the most suitable approach. Results from this study generally confirm previous findings that there are no species-specific effects on the Δ47-temperature relationship in planktonic foraminifers, with one possible exception. Various morphotypes of Globigerinoides ruber were found to often deviate from the general trend determined for planktonic foraminifers. Our data are in excellent agreement with a recent foraminifer calibration study that was performed with a different analytical setup, as well as with a calibration based exclusively on benthic foraminifers. A combined, methodologically homogenized dataset also reveals very good agreement with an inorganic calibration based on travertines.
Our findings highlight the potential of the Δ47 paleothermometer to be applied to recent and extinct species alike to study surface ocean temperatures as well as thermocline variability for a multitude of settings and time scales. © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


INTRODUCTION
The investigation of past climate change relies to a large degree on reconstructions of ocean conditions. Not only is the ocean a major player in the global climate system responsible for storage and redistribution of heat; ocean sediments also constitute vital archives that can be used for climate reconstructions on various time scales (e.g. Zachos et al., 2001). Several geochemical proxies have been used to reconstruct ocean conditions, such as sea surface temperature, prior to the instrumental era. These include paleothermometers, for instance stable oxygen isotopes (δ18O) measured in calcareous tests of foraminifers (Urey, 1947; Epstein et al., 1951; Emiliani, 1966; Pearson, 2012), Mg/Ca ratios of foraminifers (Nürnberg et al., 1996; Lea et al., 1999), the unsaturation of organic ketone molecules (UK37) produced by marine nannoplankton (Brassell et al., 1986; Prahl et al., 1988) and the TEX86 proxy based on membrane lipids of archaea (Schouten et al., 2002).
Although each of these proxies is characterized by its individual strengths and weaknesses, two sources of uncertainty are particularly problematic for the application on longer time scales. First, temperature proxies such as δ18O and Mg/Ca in foraminifers depend on the composition of the fluid from which the signal is formed and therefore require precise knowledge of the seawater composition (δ18O_seawater and Mg/Ca, respectively) at the time of formation. Second, the recorded signal in most proxies is to some extent influenced by biological processes that need to be accounted for. These so-called vital effects can be species-specific (e.g. Bemis et al., 1998; Turich et al., 2007; Regenberg et al., 2009; Ho et al., 2014; Ezard et al., 2015; Jentzen et al., 2018; Polik et al., 2018), thus increasing the uncertainty of environmental reconstructions, particularly for data derived from extinct species. Clumped isotopes have the potential to circumvent both of these problems.
The carbonate clumped isotope method is based upon the fact that the abundance of doubly substituted carbonate ions containing both rare isotopes 18O and 13C increases with decreasing temperature. While carbonate formed under equilibrium conditions generally contains more bonds between two heavy isotopes than expected for a random (stochastic) distribution, the amount of this excess is temperature dependent (e.g. Bigeleisen and Mayer, 1947; Urey, 1947; Eiler and Schauble, 2004; Schauble et al., 2006). Temperature-dependent equilibrium constants determine the relative abundance of the ¹³C¹⁸O¹⁶O₂²⁻ isotopologue in isotope exchange reactions (e.g. Wang et al., 2004; Schauble et al., 2006).
The relative abundance of these multiply substituted (clumped) isotopologues in a carbonate can therefore be used as a measure for its formation temperature (e.g. Eiler and Schauble, 2004; Ghosh et al., 2006; Schauble et al., 2006; Eiler, 2007). An important aspect distinguishing this paleothermometer from other approaches is its independence from the isotopic composition of the aqueous solution it precipitated from (Eiler, 2007). The Δ47 value measured in acid-liberated CO2 reflects the excess abundance (in ‰) of doubly substituted molecules relative to a random distribution that is calculated for each sample. The relationship between the normalized Δ47 value and carbonate formation temperature has been defined by theoretical, experimental, and empirical calibrations (e.g. Ghosh et al., 2006; Ghosh et al., 2007; Tripati et al., 2010; Grauel et al., 2013; Henkes et al., 2013; Zaarur et al., 2013; Wacker et al., 2014; Kele et al., 2015; Bonifacie et al., 2017; Kelson et al., 2017; Breitenbach et al., 2018; Peral et al., 2018; Petersen et al., 2019).
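For orientation, the excess described above is commonly written in the following form. This is a sketch of the standard definition as it appears in the clumped isotope literature (e.g. Eiler, 2007), not a formula quoted from this section. The symbols are introduced here for illustration: R^i is the measured abundance ratio of mass i relative to mass 44 in the sample CO2, and R^{i*} is the corresponding ratio expected for a stochastic distribution of isotopes. The result is expressed in ‰.

```latex
\Delta_{47} = \left[
  \left( \frac{R^{47}}{R^{47*}} - 1 \right)
  - \left( \frac{R^{46}}{R^{46*}} - 1 \right)
  - \left( \frac{R^{45}}{R^{45*}} - 1 \right)
\right] \times 1000
```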
The first two studies of foraminifers (Grauel et al., 2013; Tripati et al., 2010) indicated that, within their sample sets, foraminifers follow a single Δ47-temperature relationship. Hence these studies paved the way for the application of clumped isotope thermometry on foraminifers from sedimentary archives (e.g. Tripati et al., 2014). However, due to a relative lack of data for the cold temperature end of these foraminifer calibrations, coupled with recent developments in data processing and correction methods, additional studies have been underway. The more recent works (Peral et al., 2018; Piasecki et al., 2019) have utilized progress made in community-wide efforts to facilitate interlaboratory data comparison using the recalculation of absolute isotope ratios (Daëron et al., 2016; Schauer et al., 2016) and further redefinition of carbonate standards, building on the definition of an "absolute" reference frame by Dennis et al. (2011), as well as newly developed analytical approaches (Hu et al., 2014). Peral et al. (2018) and Piasecki et al. (2019) focused mostly on planktonic and benthic foraminifers, respectively, and both concluded that foraminifer-based Δ47-T calibrations agree with inorganic calibrations. Although the two equations are statistically indistinguishable from each other, temperatures calculated with these calibrations diverge towards the cold end of ocean temperatures (~0°C) by more than 2.5°C. A challenge for surface sediment-based calibrations using foraminifers is the difficulty in determining the actual calcification temperature, particularly for planktonic foraminifers.
Additionally, the small temperature range recorded in foraminifers poses a persisting problem for foraminifer Δ47-T calibrations, because large datasets are required to extract an accurate linear relationship from the relatively large uncertainty of individual measurements (Fernandez et al., 2017). The relatively low signal-to-noise ratio compared to other geochemical proxies such as δ18O might mask smaller, potentially species-specific, secondary effects. These potential secondary effects include pH or kinetic effects suggested for other types of marine biogenic carbonates/marine invertebrate organisms (e.g. Bajnai et al., 2018; Daëron et al., 2019; Davies and John, 2019). Divergences among older foraminifer-based calibrations can partly be explained by methodological or inter-laboratory differences such as the 17O correction (Schauer et al., 2016; Bernasconi et al., 2018; Petersen et al., 2019), the choice of standards, the acid digestion temperature (Defliese et al., 2015) and the common acid bath vs. the micro-volume approach (reviewed in Spencer and Kim, 2015).
These uncertainties underline the importance of further studies investigating method- and laboratory-specific differences as well as potential species effects. Ultimately the aim is to determine a common foraminifer calibration to enable a widespread application of clumped isotope analysis in foraminifers. At the same time, using clumped isotopes on foraminifers holds enormous potential for paleoceanographic reconstructions when coupled with other proxies: As highlighted by Breitenbach et al. (2018) and Evans et al. (2018), clumped isotope measurements can be combined with other foraminifer-based proxies to disentangle ocean temperature from other influences, such as changing seawater composition (e.g. past Mg/Ca changes; Evans et al., 2018).
Similarly to other foraminifer-based proxies such as δ18O (e.g. Mulitza et al., 1997), the Δ47 signal in foraminifers could be used to reconstruct temperature gradients in the water column by comparing species from different depth habitats. In the case of Δ47, using a single calibration is advantageous as it allows for a direct comparison of data from various species without any additional uncertainty introduced by individual, species-specific calibrations.
Here, we present new foraminifer-based Δ47 data analyzed on 14 species of planktonic foraminifers from surface sediments from 13 sites, covering a calcification temperature range of ~0-28°C. We study potential species-specific effects on the clumped isotope measurements and compare our results to recent Δ47-T calibrations. Data from our study are combined with data from Peral et al. (2018) and Piasecki et al. (2019) to determine a common foraminifer-based calibration and compare it to inorganic Δ47-T calibrations. Finally, we evaluate whether temperature reconstructions for different depth levels of the water column are feasible with the reduced sample requirements of our analytical approach.

Sites and samples
Surface sediment samples (mostly 0-1 cm, see Table 1) from 13 sites in the Nordic Seas, the North Atlantic, Indian Ocean, and Pacific Ocean were used in this calibration study (Fig. 1 and Table 1). The sites were selected to cover a wide range of oceanographic conditions and species of foraminifers.
Species characteristics and assumptions regarding their ecology are crucial to the interpretation of the Δ47 data, in particular when various species are compared to each other. Table 2 provides a summary of the species-specific characteristics considered, such as the presence of photobiotic symbionts, spatial and seasonal distribution, preferred habitat depth, the tendency to form gametogenetic calcite prior to reproduction and the accumulation of thick calcite crusts.

Sample preparation
All samples were wet-sieved over a 63 µm sieve and dried at 50°C. The coarse fraction was then dry-sieved into size fractions of <150 µm, 150-250 µm, 250-315 µm, 315-355 µm, 355-400 µm, 400-500 µm and >500 µm. For each sample, at least 2 mg of foraminifer tests of each species were collected under the microscope. The preservation of all individual specimens was assessed under the microscope and translucent specimens were preferred for analysis where available. Only fully intact pristine-looking tests were selected for analysis. Broken specimens as well as specimens containing substantial infillings, secondary calcite overgrowth or oxide coatings were excluded from analysis. Additionally, SEM images were used for selected samples to confirm that the foraminifers were well preserved. The size fractions used for the analysis were individually selected for each sample (Table 3). We attempted to obtain enough adult specimens of each species to allow an accurate isotope analysis while keeping the size range as narrow as possible in order to limit ontogenetic effects. Therefore, the size fraction in which most of the adult specimens at a given site were found was selected for analysis. Size fractions with a small number of very large individuals were excluded, as were smaller size fractions potentially containing juvenile specimens.
A modified version of the cleaning protocol for foraminiferal Mg/Ca analysis published by Barker et al. (2003) was used to remove contaminants. Batches of 200 to 1300 µg of foraminifer tests were cleaned at a time, with each sample being represented by at least three individually cleaned sub-samples. The foraminifers were placed between two glass plates and carefully crushed in order to crack open all chambers and allow for subsequent cleaning. The crushed tests were sonicated three times for 30 s with DI water and rinsed with DI water after each sonication step. Samples were then sonicated once for 15 s with methanol and subsequently rinsed three times with DI water. After removing excess DI water, the cleaned samples were dried in an oven at 50°C. The comparison of several cleaning steps and intensities (see also Grauel et al., 2013; Peral et al., 2018) led to the decision to leave out the H2O2 treatment suggested by Barker et al. (2003) to remove organic material for Mg/Ca analysis.

Measurement procedure
All measurements took place between November 2016 and March 2018, with replicate measurements of individual samples spread over several weeks to months. Measurements were performed using a Thermo Scientific MAT 253Plus mass spectrometer coupled to a KIEL IV carbonate device (Thermo Fisher Scientific, Bremen, Germany) equipped with a Porapak trap to capture organic contaminants (Schmid and Bernasconi, 2010). The Porapak trap was operated at −20°C during the measurement. Between runs, the trap was heated to 120°C for at least one hour for cleaning. In the Kiel device, each aliquot is reacted individually with phosphoric acid at 70°C.
We measured 15 to 30 (average n = 19) aliquots (100-130 µg each) for every sample. Average values were then calculated and used for the calibration. Samples were measured using the long-integration dual-inlet (LIDI) method described by Hu et al. (2014). This method measures the sample and reference gas separately with decreasing pressure from a micro-volume. Samples were measured first for 400 seconds with signals typically decreasing from ~16 V to ~10 V (m/z = 44). Afterwards the reference gas was adjusted to the same initial pressure and measured accordingly. The shot noise limit for these intensities and integration times is 0.03‰ when applying a typical scale decompression factor for this system. Peak scans (varying high voltage between 9.4 and 9.6 kV) at m/z 44 intensities of 5, 10, 15, 20 and 25 V were performed once a day for the pressure baseline correction following Bernasconi et al. (2013) and Meckler et al. (2014). This and subsequent corrections were applied using the Easotope software package (John and Bowen, 2016). The "Brand parameters" suggested by Daëron et al. (2016) and Schauer et al. (2016) were used for the 17O correction. In every run (maximum 46 aliquots), the sample measurements were bracketed by five blocks consisting of the four ETH carbonate standards ETH1 to ETH4 using the values reported in Bernasconi et al. (2018). Three of these standards were used to transfer the results into the absolute reference frame (Dennis et al., 2011), which corrects the measurements for offsets and scale compression, while the fourth standard was treated like a sample to monitor the corrections applied to the data. In addition, the long-term averages of ETH 1 and 2 were used to monitor the pressure baseline correction, which should result in the same Δ47 values. Baseline-corrected Δ48 values were used as a contamination monitor. No contamination was detected in any of the samples.
The average long-term reproducibility (1SD) of Δ47 measured in the carbonate standards after correction varies from 0.031‰ to 0.038‰ (see Appendix Table A1). Each replicate measurement was corrected using a total number of 60-80 standard measurements from the same and adjacent days. The exact number was chosen according to the instrument stability (see Piasecki et al., 2019 for more information). In addition to correcting for instrumental drift using carbonate standards, we distributed replicate measurements of all samples over long time intervals of up to several months to ensure that aliquots from as many samples as possible were measured in parallel.
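The link between single-aliquot reproducibility and the precision of a sample mean can be illustrated with a short calculation. The sketch below assumes that replicates are independent and that the standard reproducibility quoted above (0.031-0.038‰) is a reasonable stand-in for the scatter of sample replicates; the function name is hypothetical and not part of the processing software used in the study.

```python
import math


def standard_error(sd_permil: float, n_replicates: int) -> float:
    """Standard error of the mean for n independent replicates with standard deviation sd."""
    if n_replicates < 1:
        raise ValueError("need at least one replicate")
    return sd_permil / math.sqrt(n_replicates)


# Reproducibility range quoted in the text (1SD, in permil) combined with
# the replicate counts used per sample (15 to 30, average 19).
for sd in (0.031, 0.038):
    for n in (15, 19, 30):
        print(f"SD = {sd:.3f} permil, n = {n}: SE = {standard_error(sd, n):.4f} permil")
```

Under these assumptions, 15 to 30 replicates bring the uncertainty of a sample mean down to well below 0.01‰, which is why many aliquots per sample are needed for a usable temperature estimate.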
The ETH standard values adopted in this study were reported by Bernasconi et al. (2018), who used a +0.062‰ correction (Defliese et al., 2015) for differences in acid fractionation between digestion at 70°C and the classical 25°C digestion temperature. Applying the recently updated acid fractionation correction of 0.066‰ for this temperature difference (Petersen et al., 2019) would increase all of our Δ47 values by 0.004‰. Should the ETH standard values be updated in the future, it is possible to recalculate the values from this study using the replicate-level raw data that are provided in the EarthChem database (https://doi.org/doi:10.1594/IEDA/111435).

Foraminifer calcification temperature estimates
In order to establish a calibration relating the Δ47 signal in planktonic foraminifers to water temperature, the calcification temperature for each species at each site needs to be estimated. Since our sample set comprises a large number of different species from a wide range of geographical regions, the estimation of calcification temperatures is subject to a number of uncertainties. Calcification temperatures were hence calculated using different approaches (Methods 1 to 3) in order to find the optimal solution.
Method 1: If calcification depths and possible seasonality effects are known for the species and geographical regions, the water temperature can be taken from reanalysis data presented in the World Ocean Atlas (WOA; Locarnini et al., 2010). Because typical foraminifer water depth habitats vary over time, depending on both the availability of prey and ontogeny (Schiebel and Hemleben, 2017), the environmental signal recorded by the bulk foraminifer tests is rather an average across the entire life cycle of individuals and assemblages (e.g. Deuser and Ross, 1989). We therefore used published apparent calcification depths from studies utilizing other temperature proxies such as oxygen isotopes and Mg/Ca on planktonic foraminifers (e.g. Schiebel and Hemleben, 2017 and references therein) as basis for atlas-based calcification temperatures (Table 3). This approach suffers from insufficient information regarding foraminiferal calcification depths for individual regions and species. Also, the temperature information derived from the World Ocean Atlas may not provide the same accuracy everywhere because the data are interpolated to all ocean regions and standard depth levels. Here we used the annual mean water temperature of the assumed calcification depth as basis for further calculations. Seasonal temperature variability was factored into the uncertainty calculations. For Method 1, the overall temperature uncertainty is given by the standard deviation of all monthly temperatures at the assumed calcification depth of each species at a given site.
Fig. 1. Map (Schlitzer, 2018) showing surface sediment locations (pink filled circles), from which foraminifer specimens were selected. Bathymetric data from GlobHR (reference available in Ocean Data View).
Table 2. Summary of species-specific characteristics for the planktonic foraminifers analyzed in this study (Schiebel and Hemleben, 2017, and references therein). Recoverable column headings: Species; Spinose; Symbiont-bearing.
Method 2: Calcification temperatures can also be calculated from the measured δ18O of the foraminiferal calcite (δ18O_calcite) using a δ18O-temperature calibration (e.g. Shackleton et al., 1973). For this approach, the δ18O of the seawater (δ18O_seawater) is needed, which we obtained for the assumed calcification depths from the database of LeGrande and Schmidt (2006). Due to species-specific disequilibrium effects (suggested by Urey, 1947; Shackleton et al., 1973), specific δ18O-T calibrations have been derived for certain species and ocean regions (reviewed in Pearson, 2012).
However, as such calibrations are only available for some of the species studied here, we decided to apply the multi-species δ18O-temperature equations of Kim and O'Neil (1997) and Shackleton (1974) for the entire dataset, acknowledging that some of the reconstructed temperatures may be biased by species-specific effects. The extent of such effects, however, is still a matter of debate (Niebler et al., 1999; Schiebel and Hemleben, 2017). We tested the sensitivity of our results to corrections for species-specific differences using the available information (Appendix Table A2). Applying species-specific δ18O corrections led to a calibration line within the error of the uncorrected δ18O data (Table A3). Furthermore, using species-specific corrections hardly changes the influence of individual species on the slope of the calibration line (Fig. A1). Because of the uncertainty introduced by the large spread of published values for species-specific corrections and the lack of improvement to our fit when applying a correction, we decided against applying any correction to the δ18O_calcite data used for the calibration. Temperature estimates from two commonly used calibrations (Shackleton, 1974, equation D; Kim and O'Neil, 1997, modified by Bemis et al., 1998, Table 1) were compared. Following the recommendation of Bemis et al. (1998) and Pearson (2012), factors of 0.20‰ (Epstein et al., 1953) and 0.27‰ (Hut, 1987) were used to convert from VSMOW to VPDB for the δ18O-T calibrations of Shackleton (1974) and Kim and O'Neil (1997), respectively. The uncertainty of each calcification temperature estimate was calculated as a combination of several individual uncertainties: We used the standard deviation of the measured δ18O_calcite values to account for the variability of the sample material and the uncertainty of the isotope measurement.
Mean δ18O_seawater values were calculated for the depth intervals that were assumed to best represent the calcification depths (Table 3, same as in Method 1) of the samples. The standard deviation of δ18O_seawater over this depth interval was taken as uncertainty. An additional 0.2‰ was added to account for the uncertainty introduced by the gridded dataset (following Peral et al., 2018).
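The two δ18O-temperature equations compared in Method 2 can be sketched as follows. The quadratic coefficients are the forms commonly tabulated for Shackleton (1974) and for Kim and O'Neil (1997) as modified by Bemis et al. (1998); they are quoted here from general knowledge of that literature rather than from this section and should be checked against the original sources. Subtracting the 0.20‰ and 0.27‰ factors from the VSMOW seawater value is likewise an assumption about how the conversion is applied, and the function names and example values are hypothetical.

```python
def temp_shackleton_1974(d18o_calcite_vpdb: float, d18o_seawater_vsmow: float) -> float:
    """Temperature (deg C) from the quadratic form attributed to Shackleton (1974)."""
    # Seawater value moved to the VPDB scale with the 0.20 permil factor (assumed subtraction).
    d = d18o_calcite_vpdb - (d18o_seawater_vsmow - 0.20)
    return 16.9 - 4.38 * d + 0.10 * d ** 2


def temp_kim_oneil_1997(d18o_calcite_vpdb: float, d18o_seawater_vsmow: float) -> float:
    """Temperature (deg C) from Kim and O'Neil (1997) in the quadratic form of Bemis et al. (1998)."""
    # Seawater value moved to the VPDB scale with the 0.27 permil factor (assumed subtraction).
    d = d18o_calcite_vpdb - (d18o_seawater_vsmow - 0.27)
    return 16.1 - 4.64 * d + 0.09 * d ** 2


# Illustrative inputs only: the calcite values span the measured range reported
# in the study, while the seawater values are made up for this example.
for calcite, seawater in ((-2.29, 0.5), (3.66, 0.3)):
    t_s = temp_shackleton_1974(calcite, seawater)
    t_k = temp_kim_oneil_1997(calcite, seawater)
    print(f"d18O_calcite = {calcite:+.2f}: Shackleton {t_s:.1f} C, Kim and O'Neil {t_k:.1f} C")
```

With these inputs the two equations differ by well under 1°C at the warm end and by roughly 2°C at the cold end, which is the pattern of divergence the comparison of calibrations is meant to expose.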
Method 3: In order to avoid relying on assumed depth habitats, calcification temperatures were also estimated from hypothetical δ18O depth profiles of calcite formed in the water column (δ18O_calculated). For this approach, the WOA temperature data are used in combination with a published δ18O-T calibration (Shackleton, 1974) to produce vertical profiles of hypothetical δ18O to which the measured foraminiferal δ18O_calcite is compared, in order to determine the apparent calcification depth (ACD) and subsequently the corresponding WOA temperature. This approach has previously been applied in Mg/Ca-temperature calibration studies on foraminifers (e.g. Groeneveld and Chiessi, 2011). The δ18O_calculated of calcite is calculated for the entire water column at each sample site, combining WOA water temperature data (Locarnini et al., 2010) and δ18O_seawater values (LeGrande and Schmidt, 2006). Method 3 has the advantage that no assumptions regarding habitat depth are needed, either for the atlas-derived water temperature or for the δ18O of the water. This way, seasonal or ontogenetic variations in the calcification depth are accounted for as well.
If the measured δ18O_calcite of a sample was not found in the δ18O_calculated values (calculated temperatures warmer/colder than the observed maximum/minimum annual mean water temperature), the annual mean water temperature at 0 m depth was used. Therefore, extreme temperature cases not represented in WOA are excluded with this method, eliminating temperatures warmer or colder than observed at these sites. Nonetheless, Method 3 is associated with several uncertainties stemming from both the analytical and the natural variability of foraminiferal δ18O, as well as from the atlas δ18O_seawater. Uncertainties were propagated using a Monte Carlo approach. First, assuming a conservative error of 0.2‰ for the atlas δ18O_seawater (following Peral et al., 2018), we generated 10,000 iterations of δ18O_calculated (Step 1 in Fig. 2A), using the equation of choice. For all these iterations, we then performed Step 2 (Fig. 2B) considering the uncertainty in δ18O_calcite measurements (estimated from the standard deviation of replicate measurements) to obtain the ACD and calcification temperatures. We then calculated average ACDs and calcification temperatures for each sample from the individual iterations. Temperature estimates using all three approaches are compared in Table 3.
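The two-step procedure of Method 3 and its Monte Carlo error propagation can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the Shackleton (1974) equation is used in its commonly tabulated quadratic form, the calculated profile is assumed to increase monotonically with depth, the first profile level is assumed to be 0 m, errors are drawn from normal distributions, and the seawater error is applied as one shared offset per iteration. The function names and the example profile are hypothetical.

```python
import numpy as np


def d18o_equilibrium_calcite(temp_c, d18o_sw_vsmow):
    """Step 1: calcite d18O expected at each depth, from inverting
    T = 16.9 - 4.38*d + 0.10*d**2 with d = d18O_calcite - (d18O_sw - 0.20)."""
    temp_c = np.asarray(temp_c, dtype=float)
    d = (4.38 - np.sqrt(4.38**2 - 4 * 0.10 * (16.9 - temp_c))) / (2 * 0.10)
    return d + (np.asarray(d18o_sw_vsmow, dtype=float) - 0.20)


def apparent_calcification(d18o_calcite, depth_m, temp_c, d18o_sw_vsmow):
    """Step 2: return (ACD in m, temperature in deg C) for one measured value."""
    profile = d18o_equilibrium_calcite(temp_c, d18o_sw_vsmow)
    if d18o_calcite < profile.min() or d18o_calcite > profile.max():
        # Value not represented in the profile: use the temperature at 0 m.
        return 0.0, float(temp_c[0])
    acd = float(np.interp(d18o_calcite, profile, depth_m))  # needs an increasing profile
    return acd, float(np.interp(acd, depth_m, temp_c))


def monte_carlo(d18o_calcite, sd_calcite, depth_m, temp_c, d18o_sw_vsmow,
                sd_sw=0.2, n_iter=10_000, seed=0):
    """Propagate seawater and calcite d18O errors; returns mean ACD, mean T, SD of T."""
    rng = np.random.default_rng(seed)
    depth_m, temp_c, d18o_sw_vsmow = (np.asarray(v, dtype=float)
                                      for v in (depth_m, temp_c, d18o_sw_vsmow))
    acds, temps = np.empty(n_iter), np.empty(n_iter)
    for i in range(n_iter):
        sw_i = d18o_sw_vsmow + rng.normal(0.0, sd_sw)          # shared offset (assumption)
        calcite_i = d18o_calcite + rng.normal(0.0, sd_calcite)
        acds[i], temps[i] = apparent_calcification(calcite_i, depth_m, temp_c, sw_i)
    return acds.mean(), temps.mean(), temps.std(ddof=1)


# Made-up water column for illustration (not WOA data).
depth = np.array([0.0, 50.0, 100.0, 200.0, 500.0])
temp = np.array([25.0, 22.0, 18.0, 14.0, 8.0])
seawater = np.full(5, 0.5)
print(monte_carlo(0.05, 0.1, depth, temp, seawater, n_iter=2000))
```

The surface fallback mirrors the rule described in the text for values not found in the calculated profile. It also means that Monte Carlo draws falling outside the profile are pulled to the 0 m temperature, so the returned averages can be skewed for samples close to either end of the profile.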

Linear regression
In order to account for the uncertainty in both Δ47 and calcification temperature, we calculate regression slopes and intercepts using the method of York et al. (2004). This method is commonly used in regression analysis of clumped isotope calibration data (e.g. Huntington et al., 2009; Grauel et al., 2013; Peral et al., 2018) and thus facilitates the intercomparison of calibrations across studies. We estimated the uncertainty on the slope and intercept and 95% confidence envelopes on the regression lines using quantiles of 100,000 bootstrap samples. These were obtained by randomly resampling with replacement from the original data with its associated uncertainties, thereby maintaining the original sample size.
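A regression of this kind can be sketched as follows. The code implements the iterative straight-line fit of York et al. (2004) for the special case of uncorrelated x and y errors, followed by a bootstrap that resamples data points together with their uncertainties. It is a minimal illustration, not the authors' implementation: the zero error correlation, the function names, the default of 2000 resamples (the study used 100,000) and the example numbers are all assumptions made here.

```python
import numpy as np


def york_fit(x, y, sx, sy, tol=1e-12, max_iter=100):
    """Slope and intercept of a York et al. (2004) fit with uncorrelated errors."""
    x, y, sx, sy = (np.asarray(v, dtype=float) for v in (x, y, sx, sy))
    wx, wy = 1.0 / sx**2, 1.0 / sy**2
    b = np.polyfit(x, y, 1)[0]                    # ordinary least squares as starting slope
    for _ in range(max_iter):
        w = wx * wy / (wx + b**2 * wy)            # combined weights for error correlation r = 0
        x_bar, y_bar = np.sum(w * x) / np.sum(w), np.sum(w * y) / np.sum(w)
        u, v = x - x_bar, y - y_bar
        beta = w * (u / wy + b * v / wx)
        b_new = np.sum(w * beta * v) / np.sum(w * beta * u)
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    w = wx * wy / (wx + b**2 * wy)                # recompute weighted means with the final slope
    x_bar, y_bar = np.sum(w * x) / np.sum(w), np.sum(w * y) / np.sum(w)
    return b, y_bar - b * x_bar


def bootstrap_york(x, y, sx, sy, n_boot=2000, seed=0):
    """2.5% and 97.5% quantiles of slope (column 0) and intercept (column 1)."""
    rng = np.random.default_rng(seed)
    arrays = [np.asarray(v, dtype=float) for v in (x, y, sx, sy)]
    n = arrays[0].size
    fits = []
    while len(fits) < n_boot:
        idx = rng.integers(0, n, size=n)          # resample with replacement, same sample size
        if np.ptp(arrays[0][idx]) == 0:           # all x identical: no line is defined
            continue
        fits.append(york_fit(*(a[idx] for a in arrays)))
    return np.quantile(np.array(fits), [0.025, 0.975], axis=0)


# Made-up data for illustration: x = 10^6/T^2 (T in K), y = D47 in permil.
x = np.array([11.1, 11.6, 12.0, 12.5, 13.0, 13.4])
y = np.array([0.655, 0.672, 0.690, 0.712, 0.735, 0.752])
sx = np.full(6, 0.10)
sy = np.full(6, 0.007)
print(york_fit(x, y, sx, sy))
print(bootstrap_york(x, y, sx, sy, n_boot=500))
```

Resampling whole data points (value plus uncertainty) keeps each measurement tied to its own error estimate, which matches the description of resampling the original data with its associated uncertainties.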

Δ47, δ18O and δ13C data
Average data for each sample as well as environmental parameters such as the estimated δ18O_seawater values and calcification temperatures reconstructed using various approaches (see Section 2.4) are summarized in Table 3. The average Δ47 data cover a range of 0.103‰ with a standard error of the mean for individual samples of 0.005-0.009‰. The lowest (0.653‰) and highest (0.756‰) Δ47 values correspond to the lowest (−2.29‰) and highest (3.66‰) δ18O_calcite values, respectively (Fig. 3A). Overall, there is a strong positive correlation (0.95 using Pearson's product-moment correlation) between both variables for the calculated averages. The standard deviation of replicate δ18O measurements is 0.05-0.26‰ (standard error: 0.01-0.06‰). Mean δ13C values for the samples measured in this study range between −1.2‰ and 3.2‰ with standard deviations between 0.04 and 0.24‰ (Fig. 3B) and standard errors between 0.01 and 0.06‰. The δ13C values and the Δ47 signal do not show a clear relationship. The standard errors of the mean and the standard deviations are not correlated with the mean Δ47, δ18O and δ13C, respectively. Moreover, there is no systematic difference in the isotopic composition between species with and without photosymbionts.

Calcification temperatures
Because the calcification temperatures of planktonic foraminifers are challenging to estimate, we approximated them using three approaches (see Section 2.4). All three methods reveal strong correlations (correlation coefficient between −0.91 and −0.95 using Pearson's product-moment correlation) between estimated calcification temperature (10⁶/T², T in K) and Δ47. Detailed information on the regression lines derived from the different temperature estimates can be found in the Appendix (Table A3). Despite the strong correlations that were found for all the different methods, calcification temperature datasets differ from each other (Fig. 4A-D, Table 3).
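The regression variable 10⁶/T² and the way a linear calibration in that variable is inverted for temperature can be sketched as follows. The slope and intercept used in the example are placeholders chosen only to exercise the functions; they are not calibration coefficients from this study, and the function names are hypothetical.

```python
import math


def temp_c_to_x(temp_c: float) -> float:
    """Convert temperature in deg C to the regression variable 10^6 / T^2 (T in K)."""
    t_kelvin = temp_c + 273.15
    return 1e6 / t_kelvin**2


def d47_to_temp_c(d47: float, slope: float, intercept: float) -> float:
    """Invert d47 = slope * 10^6/T^2 + intercept for temperature in deg C."""
    x = (d47 - intercept) / slope
    if x <= 0:
        raise ValueError("value lies outside the range the calibration can invert")
    return math.sqrt(1e6 / x) - 273.15


# The calibrated temperature range of roughly 0 to 28 deg C spans only a
# narrow interval of the regression variable.
print(temp_c_to_x(0.0), temp_c_to_x(28.0))

# Round trip with placeholder coefficients (hypothetical, for illustration only).
slope, intercept = 0.04, 0.2
d47 = slope * temp_c_to_x(10.0) + intercept
print(d47_to_temp_c(d47, slope, intercept))
```

Because colder temperatures correspond to larger values of 10⁶/T², the 0-28°C range maps onto an interval of only about 11.0 to 13.4 in this variable, which illustrates why a large number of precise data points is needed to constrain the slope.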
The differences in estimated calcification temperature are largest between Method 1 using the World Ocean Atlas 2009 and the methods using δ18O-T relationships (Fig. 4D). While the slopes of linear regression models for Methods 1 and 2 are similar (Fig. 4A), the dataset using WOA-based temperatures is characterized by larger variability (up to 13°C temperature difference for similar Δ47 values, Fig. 4A). This is reflected in a lower correlation coefficient (−0.91 compared to −0.95 using Pearson's product-moment correlation). Hence, the uncertainty related to insufficient ecological information for certain species and/or regions seems to lead to a larger uncertainty for temperature estimates using Method 1 compared to δ18O-based approaches (Methods 2 and 3).
When comparing calcification temperatures which were derived from two different δ18O-T calibration equations (Shackleton, 1974; Kim and O'Neil, 1997) using Method 2 (Fig. 4D), temperature estimates for the tropical species largely agree (average temperature difference 0.7°C), while towards the cold end of the calibration temperature estimates increasingly deviate from each other (maximum 2.2°C). As a result, the regression calculated using the Kim and O'Neil (1997) δ18O-T calibration reveals a flatter slope (Fig. 4B). The coldest temperature estimates calculated using Kim and O'Neil (1997) are well below 0°C. These temperature estimates do not agree with the available temperature data from the WOA, neither with the annual mean temperature nor seasonal extremes. In contrast, temperatures calculated using the δ18O-T calibration of Shackleton (1974) are in agreement with the temperature ranges reported in the WOA.
Fig. 2. Schematic drawing illustrating the two-step process (Method 3) used to assess apparent calcification temperatures from a combination of WOA-based temperature data and a δ18O-T calibration. A: Annual mean temperature data (Locarnini et al., 2010) and δ18O_seawater data (LeGrande and Schmidt, 2006) are used to generate a vertical δ18O_calculated profile for calcite formed at any given location. B: The comparison between the measured δ18O_calcite value of a foraminifer sample and the theoretical δ18O_calculated profile is used to find the apparent calcification depth (ACD) of a foraminifer species. The annual mean water temperature at the ACD serves as best estimate for the calcification temperature of the foraminifer.
Temperatures estimated using Method 3 show a good agreement within the error estimates with those derived from Method 2 (Fig. 4C and D), and both datasets reveal similar correlation coefficients with Δ47. The slopes of both linear regressions agree within error (Fig. 4C). Most samples reveal less than 1°C temperature differences between both methods (Methods 2 and 3). In most cases with larger offsets the measured δ18O value of the sample is lighter (on average 0.22‰) than the δ18O_calculated value at the sea surface. As Method 3 uses the temperature at sea surface whenever the δ18O_calcite value is lighter than the δ18O_calculated value at 0 m, the resulting temperature estimate is lower than the one that is only based on the δ18O-T calibration.

Relationship between Δ47 and foraminifer calcification temperature
The method used to estimate calcification temperatures of planktonic foraminifers has an influence on the resulting Δ47 vs. temperature calibration. Fig. 4D demonstrates that the differences between Method 1 and Method 2 are pronounced for some of the samples, whereas others are less sensitive to the choice of method. We attribute this result to the varying accuracy of the ecological assumptions made for individual sites and species when using Method 1. Therefore, the WOA-based temperature estimates generated using Method 1 appear less applicable for a temperature calibration than temperature estimates based on the better established δ18O-T relationship (Methods 2 and 3). Selecting an appropriate δ18O-T calibration for the reconstruction of calcification temperatures from the δ18O values is, however, crucial, as there are systematic differences between temperature estimates generated with different calibrations (Fig. 4D).
It is unlikely that the coldest temperature estimates generated with the calibration of Kim and O'Neil (1997) are accurate, as they do not agree with the WOA temperature data at these sites. A possible cause for this discrepancy between the two calibrations is inherent in the way they were generated: Both calibrations use data from a temperature range of 0-500°C by combining foraminifer data and inorganic calcite data from a study of O'Neil et al. (1969). The calibration of Shackleton (1974) combined this extensive inorganic dataset with foraminifer samples covering a temperature range of 0-7°C and was specifically proposed to represent cold temperature carbonate samples (Shackleton, 1974; reviewed in Pearson, 2012). In contrast, the calibration of Kim and O'Neil (1997) used samples from a temperature range of 10-40°C. Therefore, the latter equation may be less suitable to be applied to foraminifers calcifying at low water temperatures that are beyond the calibrated range. While acknowledging a remaining uncertainty related to the choice of δ18O-T calibration, we surmise that the calibration of Shackleton (1974) is the most reliable basis for temperature reconstructions from diverse settings and for a large number of different species based on the arguments outlined above.
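The growing offset between the two δ18O-T calibrations towards cold temperatures can be illustrated with a short Python sketch. The coefficients below are the widely quoted quadratic forms of the two equations and are an assumption here; the exact expressions and the seawater scale conversion applied in this study are not reproduced. Function names and input values are hypothetical.

```python
def temp_shackleton_1974(d18o_calcite, d18o_sw_vpdb):
    """Temperature (deg C) from the commonly quoted Shackleton (1974) quadratic."""
    d = d18o_calcite - d18o_sw_vpdb
    return 16.9 - 4.38 * d + 0.10 * d * d


def temp_kim_oneil_1997(d18o_calcite, d18o_sw_vpdb):
    """Temperature (deg C) from a commonly quoted quadratic approximation
    of the Kim and O'Neil (1997) relationship."""
    d = d18o_calcite - d18o_sw_vpdb
    return 16.1 - 4.64 * d + 0.09 * d * d


# Hypothetical calcite-minus-seawater differences from warm to cold conditions.
for diff in (-2.0, 0.0, 2.0, 4.0):
    t_s = temp_shackleton_1974(diff, 0.0)
    t_k = temp_kim_oneil_1997(diff, 0.0)
    print(f"d18O difference {diff:+.1f}: Shackleton {t_s:.2f}, "
          f"Kim and O'Neil {t_k:.2f}, offset {t_s - t_k:.2f}")
```

With these assumed coefficients the offset between the two equations increases steadily from the warm to the cold end, and the Kim and O'Neil form is the first to return temperatures below 0°C, in line with the pattern described above.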
Whether this equation is used directly (Method 2) or combined with available temperature data (Method 3) has only a minor influence on the resulting calibration line (Fig. 4C and D). One potential weakness of Method 3 compared to Method 2 is indicated by the observation that some sea surface sample values are not represented by the δ18O calculated profile. This could be explained by some unaccounted-for species-specific disequilibrium effects. Another reason for these δ18O calcite values that point to warmer temperatures than the annual mean SST could be seasonality effects in the life cycle of certain foraminifer species. These effects may bias the signal towards summer temperatures. This could for example affect samples of G. ruber pink as well as N. pachyderma since both species may calcify relatively close to the sea surface and are reported to reach maximum abundances during summer season (Table 2).
Fig. 4. [...] (Shackleton, 1974). Solid lines represent the 95% confidence intervals. D: Comparison of temperatures estimated using Method 2 (Shackleton, 1974) with other temperature datasets used in A-C (blue dots: Method 1; orange dots: Method 2 using Kim and O'Neil (1997); pink dots: Method 3). Error bars represent the temperature uncertainty (A-D) in x direction and one standard error of the mean Δ47 (A-C) or the uncertainty of the temperature estimates (D) in y direction, as given in Table 3. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
On the other hand, assumptions regarding foraminifer ecology might not represent all of the data equally well and can potentially introduce errors, as evident from Fig. 4A. Consequently, the advantage of Method 3 over Method 2 is its independence from any assumptions concerning the calcification depth of the analyzed specimens. Moreover, since the calcification temperatures are derived from WOA data, extreme temperature values outside the observed annual mean temperature are excluded. We therefore use calcification temperatures derived with Method 3 for all subsequent calculations.
The relationship between Δ47 and the calcification temperatures derived from Method 3 was determined with a York regression and is given by Eq. (1).
The Δ47 data for all measured species follow a clear linear relationship (Fig. 5), albeit with noticeable scatter, which is most pronounced at the warm end of the calibration (>22°C). Interestingly, earlier studies (Tripati et al., 2010; Grauel et al., 2013) have observed the opposite (i.e., larger variability at the cold temperature end, see Section 4.2). The observed variability is not related to a specific site being systematically offset from the general trend (Fig. 5A), but may stem from a variety of reasons as discussed in the following.
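A York-type regression accounts for uncertainties in both the temperature term and Δ47. The following Python sketch shows one way such a fit can be computed, following the iterative scheme of York et al. (2004) under the assumption of uncorrelated x and y errors. It is an illustration only and not the code used for Eq. (1); the function name and the example numbers are hypothetical.

```python
import math


def york_fit(x, y, sx, sy, tol=1e-12, max_iter=200):
    """Straight-line fit with errors in x and y (uncorrelated errors assumed).

    Returns (slope, intercept, se_slope, se_intercept).
    """
    wx = [1.0 / s ** 2 for s in sx]
    wy = [1.0 / s ** 2 for s in sy]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Ordinary least squares slope as the starting value.
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    for _ in range(max_iter):
        # Combined weights for the current slope estimate.
        w = [p * q / (p + b * b * q) for p, q in zip(wx, wy)]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        u = [xi - xbar for xi in x]
        v = [yi - ybar for yi in y]
        beta = [wi * (ui / q + b * vi / p)
                for wi, ui, vi, p, q in zip(w, u, v, wx, wy)]
        b_new = (sum(wi * bi * vi for wi, bi, vi in zip(w, beta, v))
                 / sum(wi * bi * ui for wi, bi, ui in zip(w, beta, u)))
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    a = ybar - b * xbar
    # Standard errors from the adjusted x values.
    x_adj = [xbar + bi for bi in beta]
    xbar_adj = sum(wi * xi for wi, xi in zip(w, x_adj)) / sw
    se_b = math.sqrt(1.0 / sum(wi * (xi - xbar_adj) ** 2
                               for wi, xi in zip(w, x_adj)))
    se_a = math.sqrt(1.0 / sw + xbar_adj ** 2 * se_b ** 2)
    return b, a, se_b, se_a


# Hypothetical calibration points: x = 10^6/T^2 (T in K), y = D47 in permil.
temps_c = [2.0, 10.0, 18.0, 26.0]
x_vals = [1e6 / (t + 273.15) ** 2 for t in temps_c]
y_vals = [0.724, 0.700, 0.672, 0.648]
print(york_fit(x_vals, y_vals, [0.08] * 4, [0.009] * 4))
```

In contrast to an ordinary least squares fit, the weights here depend on the slope itself, which is why the solution is found iteratively.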
The small amounts (<5 mg) of sample material measured in this study may have led to a slightly larger scatter than observed in the recent foraminifer-based Δ47-T calibration by Peral et al. (2018) where 16-20 mg were used. The samples in this study integrate over fewer individual specimens (minimum ~100) than studies measuring larger samples and could be affected by individual tests that deviate from the mean Δ47 values. Hence, the scatter of the Δ47 signal could likely be reduced further by measuring more replicates at the expense of slightly larger sample requirements.
Besides the measurement procedure, there are several potential reasons for individual samples to deviate from the described Δ47-T relationship, related to either the calcification temperatures that were calculated from δ18O calcite or the Δ47 values. Because calcification temperatures can only be estimated, any divergence from the true calcification temperature can potentially cause affected samples to deviate from the general trend. Surface water conditions in particular can be highly variable (e.g. on a seasonal scale), potentially influencing the isotopic signal recorded by the foraminifers (e.g. Curry et al., 1983). However, potential seasonal temperature effects are largely accounted for by the combined approach of calculating δ18O values and using WOA-based water temperatures (Method 3).
Despite the fact that planktonic foraminifers appear to calcify slightly offset from isotopic equilibrium with respect to stable oxygen isotopes of ambient seawater (Daëron et al., 2019), we do not see clear evidence that species-specific disequilibrium effects on δ18O enhanced variability in our dataset. Species-specific correction for disequilibrium effects using published values would result in colder temperature estimates for several tropical surface species (Appendix Table A2). However, since there is a wide range of published disequilibrium correction factors (e.g. 0.0-1.0‰ for G. ruber (white), Niebler et al. (1999)), it remains difficult to assess their influence on the temperature estimates. Depending on the choice of correction factor for each species, the scatter of tropical surface-dwelling species in the Δ47 signal may in fact increase. In any case, a disequilibrium correction of the δ18O calcite signal moves the data of several warm-water surface species in the same direction and thus will not reduce the scatter of the data (Appendix Table A2 and Fig. A1).
Early diagenetic alterations such as secondary calcite precipitates grown at colder water temperatures may bias calcification temperature reconstructions from δ18O measurements towards colder values (e.g. Pearson, 2012). However, such alteration should affect both the δ18O and the Δ47 signal and bias all samples from the same site towards colder temperatures. Especially if two species from the same site are characterized by similar calcification depths, early diagenetic effects should be similar for both. Yet, the calculated calcification temperatures from δ18O calcite of different surface-dwelling species from the same sites generally agree well: For example, at site SO164-25-3 located in the Caribbean, differences of up to 0.029‰ in the Δ47 signal were observed for surface species assumed to calcify at similar depth. For the same species the δ18O-based temperature estimates are characterized by a relatively small difference of 1.8°C (see Fig. 5A), and SEM images taken for several species at this site did not reveal any signs of secondary calcification. Moreover, we take the aforementioned good agreement between the temperatures calculated from δ18O (Method 2) and temperature estimates based on WOA data in combination with published calcification depths (Method 1) as an indication that the δ18O signal is not altered (average temperature difference for tropical species: 0.7°C). Because clumped isotopes depend only on the mineral formation temperature, a stronger influence of early diagenesis on the Δ47 signal than on δ18O is unlikely and has not been observed, even in samples as old as 44 Ma.
Possible short-term variability of the surface water δ18O due to salinity changes could introduce larger variability of the δ18O calcite signal of surface-dwelling species (reviewed in Pearson, 2012). For instance, site SO164-25-3 is located in an area that is influenced by the Amazon and Orinoco River plumes and may thus be experiencing considerable salinity changes (Schmuker and Schiebel, 2002). A surface-water salinity effect that influences the oxygen isotope signal of the upper water column could potentially bias the estimated calcification temperature of surface species at site SO164-25-3. For this site, however, we measured several surface-dwelling species including three morphotypes of the same species, G. ruber (G. ruber white s.s., s.l. and G. ruber pink). Although the δ18O calcite measurements from all three morphotypes agree well, not only among this species but also with other species from the same site, the Δ47 signal of these three samples reveals notable differences (~0.03‰). Moreover, none of the species measured for this site is characterized by a particularly variable δ18O calcite signal (SD of 0.09-0.16) and the calculated calcification temperatures are similar to the WOA-based estimates.
As suggested by various studies (e.g. Spero et al., 1997), pH shifts the δ18O signal of foraminiferal calcite towards more negative values with increasing pH. This effect was estimated by Zeebe (1999) to amount to −1.42‰ per unit of pH. The presence of photosymbionts is expected to increase the internal pH of foraminifers by up to 0.5 pH units (Rink et al., 1998) and thereby could bias temperatures calculated from δ18O calcite towards warmer values (reviewed in Pearson, 2012). For benthic foraminifers, on the other hand, Marchitto et al. (2014) found no clear pH effect. If pH had a strong influence on the calcification temperatures estimated from δ18O calcite in this study, we would expect all symbiont-bearing species to reveal systematically warmer temperatures and disagree with atlas-based (Method 1) temperatures. Although this is not the case, we cannot exclude that pH effects contributed some additional scatter to the δ18O calcite signal of surface-dwelling species. Potential pH effects on the Δ47 signal will be discussed in Section 4.2. We suggest that the number of replicate measurements is the most important factor causing scatter in our foraminiferal Δ47-T dataset. Nonetheless, we will investigate the data for species-specific effects possibly contributing to the scatter of the Δ47 signal in the following section.
Fig. 5. [...] (Table 3). Calcification temperatures are given in 10^6/T^2 (T in K) and °C. The blue line and gray shaded area show the linear regression (Eq. (1)) and 95% confidence interval.

Species-specific effects
The question whether the Δ47 in planktonic foraminifers is influenced by any species-specific effects is of vital importance to the application of this proxy for paleoceanography, since the absence of species effects would imply that Δ47 can be applied far back in time, despite evolutionary changes in species composition. Previous Δ47-T calibration studies on foraminifers found large scatter and a potential discrepancy between foraminifers and inorganic calibrations at the cold end of the calibration (Tripati et al., 2010; Grauel et al., 2013), which were attributed to kinetic effects during the calcification process of foraminifers in cold-water conditions resulting in lower and more variable Δ47 values. In this study, in contrast, we observe neither increased scatter nor deviations towards low Δ47 in the cold-end foraminifer samples (Fig. 5). Cold-water species such as N. pachyderma from multiple sites do not show systematically negative or larger residuals. While most species are distributed relatively close to the calibration line (±0.01‰) and do not reveal any systematic offset, there are a few exceptions (Fig. 6A): One of two samples measured on G. conglobatus (site SO225-53-1 from the Manihiki Plateau) plots ~0.015‰ above the linear fit. Furthermore, all but one of the G. ruber samples from multiple sites are characterized by higher Δ47 values (residuals 0.01-0.02‰). This includes all three morphotypes of G. ruber measured in this study. In addition, all three samples of P. obliquiloculata show lower Δ47 values, with two of them <−0.010‰.
Taking the uncertainty of the calcification temperature estimates and the clumped isotope measurements into account, the measured, species-specific Δ47 data from this study do not reveal any statistically significant deviation from the linear relationship determined for the entire dataset. Some species are only represented by a single sample, whereas up to six samples from different sites were included for species that are frequently used for paleoceanography (such as G. bulloides, G. ruber and N. pachyderma). The limited temperature ranges of individual species do not allow for the calculation of individual, species-specific regression lines.
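One simple way to put a residual into the context of both uncertainties is to scale it by the combined error of the Δ47 measurement and the temperature term. The Python sketch below is meant as an illustration of this idea only and is not the statistical test applied in this study; the function name, the calibration coefficients and the sample values are hypothetical.

```python
import math


def scaled_residual(x, d47, se_x, se_d47, slope, intercept):
    """Residual from the calibration line and its size in units of combined error.

    x is the temperature term 10^6/T^2 (T in K); d47 is in permil.
    """
    residual = d47 - (slope * x + intercept)
    # Propagate the temperature uncertainty into D47 space via the slope.
    sigma = math.sqrt(se_d47 ** 2 + (slope * se_x) ** 2)
    return residual, residual / sigma


# Hypothetical warm-water sample plotting 0.015 permil above a hypothetical line.
res, z = scaled_residual(x=11.0, d47=0.655, se_x=0.1, se_d47=0.009,
                         slope=0.04, intercept=0.2)
print(f"residual {res:.3f} permil, {z:.2f} combined standard errors")
```

A residual of 0.01-0.02‰ can thus remain within about two combined standard errors when the analytical uncertainty of a single sample is of similar magnitude.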
To further test the influence of individual species or genera on the Δ47-T calibration, we removed individual taxa from the dataset one at a time and compared the resulting slopes of the calibrations to the slope of the entire dataset (Fig. 6B). All of the slopes calculated for such data subsets fall within the 95% confidence interval of the slope calculated for the entire dataset. The two species that have the largest influence on the slope are N. pachyderma and G. ruber. This observation could be related to the position of the data from these two species at the cold (N. pachyderma) and warm end (G. ruber) of the dataset, as the regression line is particularly sensitive to data at both ends of the temperature range. The dataset without N. pachyderma is characterized by a flatter slope (0.0380 compared to 0.0397 for the entire dataset). This deviation could be explained by the fact that the exclusion of this cool (<10°C), high-latitude species from the dataset reduces the entire temperature range by 7°C and hence raises the uncertainty of the Δ47-T calibration (standard error raised from 0.0021‰ to 0.0025‰).
Fig. 6. Evaluation of possible species-specific effects on the calibration; A: Δ47 residuals of single foraminifer species, calculated as deviation from the linear regression presented in Fig. 5 (individual species/species groups are displayed by different symbols and colors similar to Fig. 5B). The residuals show no significant trend. B: Calculated slopes of the linear regression for subsets of the data (black dots) excluding one species at a time to test the influence of individual species on the calibration. The black line represents the slope of the complete dataset (gray area: 95% confidence interval of the mean slope). Error bars represent the 95% confidence intervals of the slopes.
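The exclusion test can be sketched as a loop that refits the calibration once per taxon. The Python example below uses an ordinary least squares slope for brevity, whereas the calibration in this study is based on a York regression, so it illustrates the procedure rather than reproducing the slopes in Fig. 6B. The taxon labels and data points are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))


def leave_one_taxon_out(samples):
    """Map each taxon to the slope obtained when that taxon is excluded.

    samples is a list of (taxon, x, d47) tuples with x = 10^6/T^2 (T in K).
    """
    slopes = {}
    for taxon in sorted({name for name, _, _ in samples}):
        kept = [(x, y) for name, x, y in samples if name != taxon]
        slopes[taxon] = ols_slope([p[0] for p in kept], [p[1] for p in kept])
    return slopes


# Hypothetical dataset: "cold" and "mid" lie on a line with slope 0.04,
# while "warm" is offset by +0.02 permil towards higher D47.
samples = [
    ("cold", 13.0, 0.720), ("cold", 12.8, 0.712),
    ("mid", 12.0, 0.680), ("mid", 11.8, 0.672),
    ("warm", 11.2, 0.668), ("warm", 11.0, 0.660),
]
full_slope = ols_slope([s[1] for s in samples], [s[2] for s in samples])
print(full_slope, leave_one_taxon_out(samples))
```

In this constructed example, removing the warm-end taxon with elevated Δ47 steepens the slope, which mirrors the direction of change described for the dataset without G. ruber.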
For G. ruber, all but one sample are offset by on average 0.015‰ to a higher Δ47 value (lower temperature) from the calculated Δ47-T regression line of the entire dataset (Figs. 5B, 6A). The linear fit calculated for the dataset without G. ruber exhibits the steepest slope (0.0419) of any of the subsets presented in Fig. 6B, suggesting that these data exert the strongest influence on the regression, albeit not significant at the 95% confidence level. This relatively large effect on the slope despite the temperature range of the dataset being largely unaltered may imply that the G. ruber Δ47-T dependency is different from other planktonic foraminifer species. However, the observation that not all G. ruber data show the offset to higher Δ47 values is inconsistent with this hypothesis. For Caribbean site SO164-25-3, for example, we analyzed three different morphotypes of G. ruber (G. ruber white s.s., s.l. and G. ruber pink). Two of these samples (G. ruber white s.l. and G. ruber pink) show the offset to higher Δ47, whereas the third morphotype (G. ruber white s.s.) reveals a lower (by ~0.03‰) Δ47 value than the other two and plots below the regression line (Fig. 5B). Moreover, Δ47 in G. ruber was also analyzed in the recent study by Peral et al. (2018), which did not reveal systematic species-specific behavior of G. ruber, although one of three samples was characterized by slightly higher Δ47. Overall, evidence regarding species-specific effects in G. ruber is inconclusive and we do not consider G. ruber to calcify systematically offset from other species of foraminifers with respect to Δ47. Individual samples of other species (such as G. conglobatus) also deviate from the linear regression to a similar degree. A combination of the intrinsic uncertainty of the clumped isotope measurement together with natural variability of the sample material could explain the observed scatter of the Δ47-T data, including the apparent deviation of G. ruber from the trend.
However, we cannot rule out the existence of relatively small and possibly variable secondary influences on the Δ47 signal during the calcification process in the surface water. The fact that the shallowest surface-dwelling species G. ruber shows the strongest deviation of individual samples from the general Δ47-T relationship raises the question whether there could be additional effects besides temperature on Δ47 in species living close to the sea surface. This has been described for other groups of marine calcifying organisms as well. Tropical shallow-water corals, for example, show increasing Δ47 with increasing calcification rates (Saenger et al., 2012). Kinetic effects on the Δ47 signal related to growth rates in brachiopods were shown by Bajnai et al. (2018). Moreover, Davies and John (2019) reported evidence for a constant offset of echinoid Δ47 values from inorganic calcite that might be related to internal pH of the calcifying fluid in echinoids being offset from seawater pH. If this were true for foraminifers as well, the effect would be expected to be larger for symbiont-bearing species such as G. ruber because the internal pH in these species was reported to be higher (Rink et al., 1998). In contrast to this hypothesis, however, Tang et al. (2014) observed in inorganic calcite precipitation experiments that the Δ47-T relationship is largely insensitive to pH and growth rate at the external and internal conditions expected during foraminifer calcification. Tripati et al. (2015) also found that even major changes in ocean chemistry (pH and salinity) expected during the Cenozoic have only small to negligible effects on the Δ47 signal of marine carbonates.
One potential explanation for increased scatter of the Δ47 values in surface-dwelling planktonic foraminifers could be the additional influence of photosynthesis on the calcification process: Photosymbionts may potentially cause disequilibrium effects on the recorded Δ47 signal because they not only alter the microenvironment from which calcite is excreted but also affect calcification rates (de Nooijer et al., 2014). An effect of strong photosynthesis in plants on Δ47 measured in residual CO2 gas was demonstrated by Laskar and Liang (2016), who also reported plant photosynthesis to decouple the δ18O and Δ47 values. However, studies investigating photosynthesis in foraminifers found that both the kind of symbionts and the concentration of chlorophyll as a measure for photosynthetic activity in G. ruber are similar to other species, such as T. sacculifer (Fujiki et al., 2014; Takagi et al., 2019). This would suggest that any effect photosynthesis may have on Δ47 in G. ruber should also occur in other symbiont-bearing species, whereas we do not observe any systematic difference between symbiont-bearing and non-symbiotic foraminifer species (Fig. 5B). Because G. ruber has a shallower habitat depth than most other planktonic foraminifers (e.g. Wang, 2000), the influence of photosynthesis might be larger on this species than on deeper-dwelling symbiotic species. A higher symbiont density under high-light conditions likely affects the calcification process of G. ruber as already suggested for boron isotopes (see Hönisch and Hemming, 2004). However, this effect cannot explain the differences of the Δ47 signal in three morphotypes of G. ruber from the same site (see Fig. 5). In particular, the G. ruber (white) s.s. morphotype reported to live closest to the sea surface yields Δ47 values closest to the calibration line. More work is needed to test possible secondary influences on the incorporation of the Δ47 signal and evaluate whether there might be any significant species-specific effects in G. ruber and similar shallow-dwelling species.

Comparison with other clumped isotope calibrations
Our data are in overall good agreement with recent clumped isotope calibrations of Bonifacie et al. (2017), Kele et al. (2015) as recalculated by Bernasconi et al. (2018), and Petersen et al. (2019) (Fig. 7). Our linear regression has a slope between the flatter Petersen et al. (2019) calibration and the steeper slopes published by Bonifacie et al. (2017) and Kele et al. (2015) (Table 4). Since the slope and intercept are negatively correlated in the regression analysis, the intercept of our regression is higher than the ones published by Bonifacie et al. (2017) and Kele et al. (2015). The Petersen et al. (2019) calibration, which is based on a compilation of synthetic carbonate data, shows a slight but apparently systematic offset towards higher Δ47 values compared to our data. Most of the datasets included in this compilation, as well as in the compilation by Bonifacie et al. (2017), however, used different analytical and data correction procedures compared to our study. Bonifacie et al. (2017) combine data from various existing calibrations that were generated on a variety of analytical setups and using different standards. Also, several of the datasets included in the Bonifacie et al. (2017) calibration were calculated with the parameter set for 17O correction by Gonfiantini et al. (1995), which was later shown to have caused discrepancies among samples with very different bulk compositions (Daëron et al., 2016; Schauer et al., 2016). For example, the intercept of the travertine calibration of Kele et al. (2015) decreased by 0.038‰ with the updated 17O abundance correction of Brand et al. (2010). However, the original data are included in the compilation of Bonifacie et al. (2017). Recalculating the calibration of Bonifacie et al. (2017) using the updated Kele dataset might therefore lead to a lower intercept and move this calibration closer to our calibration (Eq. (1)).
The underlying data of the (fully recalculated, see Bernasconi et al., 2018) travertine calibration by Kele et al. (2015) agree well with our foraminifer data (Fig. 7). This is most likely due to similarity in analytical set-up, raw data treatment and correction using the same carbonate standard values. Differences in data correction and/or analytical procedures may thus explain some degree of systematic offset between calibrations. In addition, Fernandez et al. (2017) pointed out that a small temperature range of some Δ47-T calibrations is among the important factors that can explain discrepancies between various calibration lines. Given the small temperature range biogenic samples like foraminifers cover, the slight discrepancies between calibrations are not surprising.

Comparison of foraminifer-based calibrations
Comparing our data to recent Δ47-T calibrations of Peral et al. (2018), Piasecki et al. (2019) and Breitenbach et al. (2018) (Fig. 8) contributes to the ongoing debate regarding inter-laboratory differences and improves foraminifer-based Δ47-T calibration efforts. In order to treat the datasets consistently, we recalculated calcification temperatures for the datasets of Peral et al. (2018) and Breitenbach et al. (2018) using Method 3. The previously published bottom water temperatures for the benthic dataset of Piasecki et al. (2019) were kept. Bottom water conditions are assumed to be relatively constant over time, such that instrumental measurements can be regarded as reliable to calibrate Δ47 data of benthic foraminifers. Such an approach independently verifies that the calculated planktonic calcification temperatures are realistic, provided that neither planktonic nor benthic foraminifers record Δ47 values significantly offset from the inorganic Δ47-T relationship.
Table 4. Our foraminifer-based Δ47-T calibration compared to recent clumped isotope calibrations in the 25°C reference frame. All calibrations except for Bonifacie et al. (2017) and Petersen et al. (2019) were calculated using ETH carbonate standards and the correction for temperature-dependent acid fractionation that was published by Defliese et al. (2015). The equation published by Bonifacie et al. (2017) was converted to the 25°C reference frame using a correction for temperature-dependent acid fractionation of 0.082‰ (Defliese et al., 2015) on the intercept. The intercept of the Petersen et al. (2019) calibration was lowered by 0.004‰ (see Section 2.3) for comparability with equations based on the acid fractionation factors published by Defliese et al. (2015).
Regression | Slope * 10^6/T^2 ± 1 SE | Intercept ± 1 SE | Type of material
Bonifacie et al. (2017) | 0.0422 ± 0.0019 | 0.208 ± 0.0207 | Various - compilation of existing calibration data
Breitenbach et al. (2018), Peral et al., 2018 | 0.0418 ± 0.0016 | 0.2017 ± 0.0195 | Foraminifers
Combined calibration (Peral et al., 2018 | 0.0431 ± 0.0016 | 0.1876 ± 0.0189 | Foraminifers
In general, the datasets are in good agreement across the entire temperature range from −1 to 28°C. The benthic foraminiferal Δ47 data of Piasecki et al. (2019) were generated in the same laboratory as the data from this study and seem to indicate a slight deviation from our planktonic Δ47-T calibration towards higher Δ47 values for temperatures below 15°C. At the same time, the variability of the benthic Δ47 data at the cold end of the calibration is relatively large. This can be explained by individual data points that are based on fewer replicate measurements due to sample limitations.
The planktonic foraminiferal Δ47-T data from Breitenbach et al. (2018) are characterized by a larger scatter. While all four datasets presented in Fig. 8 overlap for the warm end of the Δ47-T calibration, Δ47 values at the cold end (<13°C) tend to be lower by ~0.02-0.03‰ in the Breitenbach dataset. Breitenbach et al. (2018) acknowledge the small number of samples and replicates as well as the relatively large scatter of the dataset, which was generated with the primary purpose of comparing Δ47 and Mg/Ca.
The data from Peral et al. (2018) show excellent agreement with our measurements, with confidence intervals overlapping for the whole temperature range. This is particularly noteworthy as the two datasets were derived with completely different analytical setups: In this study, samples were digested at 70°C in a Kiel IV carbonate preparation device with a short Porapak column and subsequently measured on a Thermo Fisher Scientific MAT 253 Plus in microvolume mode with the LIDI approach. In contrast, Peral et al. (2018) used a common acid bath operated at 90°C, a GC column for contaminant removal, and carried out the isotope measurements on a VG Isoprime mass spectrometer under constant gas pressure. The good agreement of the data provides further evidence that different measurement techniques provide comparable Δ47 data as long as they are corrected using the same carbonate standards (in this case ETH 1-4; Bernasconi et al., 2018) and the "Brand parameters" for the 17O abundance correction (Daëron et al., 2016; Schauer et al., 2016).
Based on these considerations, the various recent datasets containing foraminiferal Δ47 data with comparable data treatment were combined to enhance the accuracy of an overarching Δ47-T calibration valid for all foraminifer species. We excluded the Δ47 data of Breitenbach et al. (2018) due to the small number of replicate measurements (although we report a version of a combined foraminifer-based calibration including this dataset in Table 4). The resulting Δ47-T calibration encompassing the data of Peral et al. (2018), Piasecki et al. (2019) and this study (Eq. (2), Fig. 8) falls within the error of the regression exclusively derived from our data (which is characterized by a slightly flatter slope) and emphasizes the conformity and compatibility of the three Δ47 datasets.
The recalculated version of the Kele et al. (2015) calibration (see Bernasconi et al., 2018) lies within the confidence interval of the combined foraminiferal Δ47-T calibration presented here. For the calibrated temperature range our combined calibration yields temperature estimates within 1°C of the Kele et al. (2015) calibration, with the largest difference at the cold end of the calibrated temperature range. Given the fact that the Kele et al. (2015) calibration used the same carbonate standards for the corrections, this agreement suggests that foraminifers follow the same Δ47-T relationship as the travertines. We note that this agreement between our combined foraminifer-based calibration and the Kele travertine calibration does not exclude the possibility that all of these carbonates are influenced by some degree of disequilibrium fractionation (Watkins and Hunt, 2015; Daëron et al., 2019).
Fig. 8. [...] Breitenbach et al. (2018) were not included in the combined calibration due to the larger variability of the dataset (also discussed in Breitenbach et al., 2018).
For future studies using foraminifer samples we recommend applying our combined foraminifer-based calibration (Eq. (2)) rather than the recalculated travertine calibration of Kele et al. (2015). Although the Kele et al. (2015) calibration has the advantage of covering a much wider temperature range, our combined calibration is based on a large number of foraminifer samples from different studies and laboratories. Hence it is characterized by a smaller uncertainty within the normal ocean temperature range compared to the Kele et al. (2015) calibration. However, the reconstructed temperatures applying either of the two calibrations fall within less than 1°C of each other. Using the long-integration dual-inlet (LIDI) method, a sample size of 2-5 mg of foraminifers is enough for 20 to 40 replicate measurements, which is the equivalent of a temperature uncertainty of 1.5°C or less on the measurement.
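The link between the number of replicates and the temperature uncertainty follows from the form of the calibration, Δ47 = slope × 10^6/T² + intercept, whose derivative gives σT ≈ σΔ47 × T³ / (2 × slope × 10^6). The Python sketch below applies this first-order propagation. As an illustration it uses the slope and intercept of one foraminifer row of Table 4 (0.0431 and 0.1876); the per-replicate standard deviation of 0.03‰ is a hypothetical value and not a figure reported in this study, and calibration uncertainty is ignored.

```python
import math


def d47_to_temp_c(d47, slope, intercept):
    """Invert D47 = slope * 10^6 / T^2 + intercept for T (returned in deg C)."""
    return math.sqrt(slope * 1e6 / (d47 - intercept)) - 273.15


def temp_se_from_replicates(temp_c, slope, replicate_sd, n_replicates):
    """First-order temperature standard error from the D47 standard error."""
    t_k = temp_c + 273.15
    se_d47 = replicate_sd / math.sqrt(n_replicates)
    # |dT/dD47| = T^3 / (2 * slope * 10^6)
    return se_d47 * t_k ** 3 / (2.0 * slope * 1e6)


slope, intercept = 0.0431, 0.1876   # illustrative values taken from Table 4
replicate_sd = 0.03                 # hypothetical per-replicate SD in permil
for n in (10, 20, 40):
    se_t = temp_se_from_replicates(15.0, slope, replicate_sd, n)
    print(f"{n} replicates: about {se_t:.2f} deg C at 15 deg C")
```

Because the standard error of the mean scales with the inverse square root of the number of replicates, quadrupling the replicates halves the temperature uncertainty. The T³ term also means that the same Δ47 uncertainty translates into a larger temperature uncertainty at the warm end of the calibration.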

Water column temperature gradients
A widely accepted approach to gain a deeper understanding of past oceanographic changes is to combine geochemical information (e.g. combined analyses of foraminiferal Mg/Ca and δ18O) from calcitic tests of shallow and deep-dwelling foraminifer species, allowing reconstruction of water column stratification. Based on the notion that there are no discernible species effects on the Δ47-T calibration presented above (Eq. (2)) and the close agreement with the travertine-based Kele calibration (Kele et al., 2015), we test how reliably vertical temperature gradients can be reconstructed from foraminiferal Δ47 data. We compare Δ47-derived temperatures from various species from two Pacific and Indian Ocean sites to annual mean water temperatures and seasonal extremes at these locations (Locarnini et al., 2010) (Fig. 9). To avoid circular reasoning, the Δ47-temperature estimates were derived from the Kele calibration and plotted at the respective assumed calcification depths of the species, based on the available ecological information (cf. Table 3). This exercise should be seen as a feasibility study.
Within error, the D47-temperature estimates from almost all foraminifer species agree with the annual mean temperature at the respective sample locations. The absolute temperature difference of ~15°C between the two sites is well reflected in the D47-temperature signal. On the vertical scale, shallow-dwelling species commonly show higher D47-temperatures than deep-dwelling species, and the reconstructed temperature differences reflect the different gradients at the two sites very well. Only G. menardii and G. ruber from site WIND 33B in the Indian Ocean yield D47-temperatures that appear too cold for their assumed calcification depths (by 5 and 6°C, respectively, Fig. 9). The calcification temperature reconstructed for G. menardii suggests a habitat in the lower thermocline at this site, deeper than commonly assumed, which is also seen in the deeper δ18O-based apparent calcification depth (Table 3). The apparent cold bias of G. ruber is a recurrent feature observed at various sites and ocean settings in this study, as discussed above. Overall, this exercise demonstrates that D47 can be used to reconstruct vertical temperature gradients within the water column while avoiding the uncertainty introduced by the use of individual, species-specific calibrations for other foraminifer-based geochemical proxies.
Fig. 9. Atlas-based mean annual water temperature (solid lines) and seasonal temperature range (dashed lines) (Locarnini et al., 2010) for sites SO213-84-2 (blue) and WIND 33-B (pink), and clumped isotope temperatures (using the calibration published by Kele et al., 2015, recalculated by Bernasconi et al., 2018) plotted against water depth/assumed calcification depth (Table 3). Error bars in the x-direction represent the temperature uncertainty due to the standard mean error of the measurement, while error bars in the y-direction show the uncertainty of the calcification depth based on the available information presented in Table 3.
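The comparison described above can be sketched in a few lines: interpolate an atlas temperature profile at each species' assumed calcification depth, take the offset of the D47 temperature from it, and compute the shallow-minus-deep temperature difference with its combined uncertainty. All numbers below (profile, depths, temperatures, errors) are invented for illustration and do not correspond to the sites or species in Fig. 9. The function names are likewise hypothetical.

```python
import numpy as np

# Hypothetical atlas profile: depth (m) and annual mean temperature (deg C).
atlas_depth_m = np.array([0.0, 50.0, 100.0, 200.0, 500.0])
atlas_temp_c = np.array([28.0, 27.0, 24.0, 16.0, 9.0])

# Hypothetical samples:
# (label, assumed calcification depth in m, D47 temperature in deg C, 1 SE in deg C)
shallow = ("shallow dweller", 30.0, 26.5, 1.5)
deep = ("deep dweller", 300.0, 12.0, 1.5)

def offset_from_atlas(depth_m, temp_c):
    """D47 temperature minus the atlas temperature interpolated at depth_m."""
    return temp_c - float(np.interp(depth_m, atlas_depth_m, atlas_temp_c))

def vertical_gradient(shallow_sample, deep_sample):
    """Shallow-minus-deep temperature difference and its combined 1 SE.

    The two errors are assumed independent and are added in quadrature.
    """
    diff = shallow_sample[2] - deep_sample[2]
    se = float(np.hypot(shallow_sample[3], deep_sample[3]))
    return diff, se

if __name__ == "__main__":
    for label, depth, temp, se in (shallow, deep):
        print(label, round(offset_from_atlas(depth, temp), 2))
    print(vertical_gradient(shallow, deep))
```

A negative offset in this sketch corresponds to an apparent cold bias relative to the assumed calcification depth. The depth uncertainty shown as y-direction error bars in Fig. 9 is not propagated here.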

CONCLUSION
By analyzing D47 in 14 species from 13 globally distributed core-top samples, this study confirms findings from previous studies (Tripati et al., 2010; Grauel et al., 2013; Peral et al., 2018) that found foraminifers to follow the same relationship between D47 and the carbonate formation temperature as inorganic calcite. The substantial number of different foraminifer species analysed here, as well as the large number of samples from different sites for some species, greatly increases confidence in this finding. Although small species-specific effects within the analytical uncertainty cannot be completely ruled out, no significant systematic effect could be identified in this study. The only possible deviation from the D47-T relationship that cannot be explained by the uncertainty associated with foraminiferal ecology is the mixed-layer species G. ruber, showing apparent cold biases in some samples. However, the results for this species remain inconclusive, warranting a more detailed study on the clumped isotope signal in G. ruber.
We demonstrate that results from different laboratories and various measurement setups are in good agreement when the D47 data are corrected using the same carbonate standards and the latest 17O abundance correction parameters. The combination of natural variability, relatively large uncertainties of the estimated calcification temperatures and the comparatively small natural temperature range affect the precision of any D47-T calibration based only on foraminifers. We minimize this problem by combining several available foraminifer-based calibrations and calculating a common D47-T calibration. Within the error, this combined calibration is identical to the recalculated travertine calibration of Kele et al. (2015) (see Bernasconi et al., 2018). Temperatures reconstructed using either of the two calibrations fall within less than 1°C of each other. Because of the smaller uncertainty within the ocean temperature range, we recommend using our combined calibration (Eq. (2)) for foraminifer samples. Finally, we show that the reconstruction of temperature profiles through the water column from clumped isotope measurements is feasible using micro-volume measurements on different species within the same sample.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
... with the sample preparation.
... Table A2) were applied. The data are differentiated for each species (displayed by different symbols and colors as in Fig. 5B). Applying species-specific disequilibrium corrections to the δ18O calcite data prior to calculating the calcification temperature does not result in smaller residual values compared to Fig. 6A.
Table A3. Correlation coefficients (Pearson's product-moment correlation), slopes and intercepts calculated using various calcification temperatures for the following linear regression: D47 = (m ± SE) × 10^6/T^2 + (b ± SE) (T in K).
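The quantities listed for Table A3 (Pearson correlation, slope and intercept with standard errors) can be obtained from a linear regression of D47 on 10^6/T^2. The sketch below assumes plain ordinary least squares. The fit used for the table may have weighted the data or accounted for errors in both variables, so this is only an illustration of the regression form. The function name `fit_calibration` and the synthetic input data are hypothetical.

```python
import numpy as np

def fit_calibration(temp_c, d47):
    """Ordinary least squares of D47 on x = 1e6 / T^2 (T in kelvin).

    Returns Pearson r, slope m, SE of m, intercept b and SE of b.
    """
    x = 1e6 / (np.asarray(temp_c, dtype=float) + 273.15) ** 2
    y = np.asarray(d47, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    m = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b = y.mean() - m * x.mean()
    resid = y - (m * x + b)
    s2 = np.sum(resid ** 2) / (n - 2)   # residual variance
    se_m = np.sqrt(s2 / sxx)
    se_b = np.sqrt(s2 * (1.0 / n + x.mean() ** 2 / sxx))
    r = np.corrcoef(x, y)[0, 1]
    return r, m, se_m, b, se_b

if __name__ == "__main__":
    # Synthetic data: placeholder coefficients plus random noise (not real measurements).
    rng = np.random.default_rng(0)
    temps = np.linspace(0.0, 28.0, 15)
    d47 = 0.04 * 1e6 / (temps + 273.15) ** 2 + 0.15 + rng.normal(0.0, 0.01, temps.size)
    r, m, se_m, b, se_b = fit_calibration(temps, d47)
    print(f"r={r:.3f}  m={m:.4f} +/- {se_m:.4f}  b={b:.4f} +/- {se_b:.4f}")
```

Running the same function with different calcification temperature estimates for the same D47 values gives one row of slopes, intercepts and correlation coefficients per temperature estimate, which is the kind of comparison the table summarizes.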