What is the link between temperature, carbon dioxide and methane? A multivariate Granger causality analysis based on ice core data from Dome C in Antarctica

This paper continues the work of Kang and Larsson Theoretical and Applied Climatology 116:537–548 (2014) What is the link between temperature and carbon dioxide levels? A Granger causality analysis based on ice core data. Theoretical and Applied Climatology 116:537-548, by adding methane to the study of causality between temperature and carbon dioxide. The data used goes 800,000 years back in time which is possible due to it being extracted from ice cores located at Dome C in Antarctica. First we use linear interpolation to make the three data sets equidistant to be able to employ statistical methods. We adjust for a deterministic trend and find the best model fit to be an autoregressive model with lag two and a piecewise quadratic trend. By employing multivariate Granger causality tests we find strong evidence that temperature, carbon dioxide and methane all Granger cause each other in both directions i.e. carbon dioxide Granger causes temperature and temperature Granger causes carbon dioxide, etc. This is in accordance with the findings of Kang and Larsson, with the extension that we can add methane to establish a trivariate feedback system between temperature, carbon dioxide and methane.


Introduction
Global warming has for quite some time been one of humanity's biggest challenges and, to a very high degree, still is. The aim of this article is to study the long term relationship between temperature, carbon dioxide and methane from ice core data from Antarctica, to see whether there exists a statistically significant causality between the three variables in question using the multivariate Granger causality test. This is a continuation of the paper from Kang and Larsson (2014) where they found a long term relationship between temperature and carbon dioxide concentration. In the present paper, we bring in methane concentration into the analysis.
References to previous related work are given in Kang and Larsson (2014). Here, we complement this with a brief account of more recent research on related subjects. Two papers that study the impact of carbon dioxide and methane on temperature, and vice versa, using ice core data, are Davidson et al. (2016); Stips et al. (2016). The statistical model of Davidson et al. (2016) resembles our model. They also use a vector autoregressive (VAR) model which takes the deterministic trends (Milankovitch cycles) into account. But in contrast to Kang and Larsson (2014) and the present paper, these trends are modeled via known orbital variables. Another difference is that Ice volume is also brought in as an additional, fourth variable. Granger causality is not tested via parameter restriction. Instead, the analysis focuses on prediction intervals for changes of the dependent variable (e.g. temperature) when one of the covariates (e.g. carbon dioxide concentration) increases its value. In this way, a causal effect of temperature on carbon dioxide and methane is found, but not the other way around.
The same issue is under study in Stips et al. (2016). The methodological tools here are differential equations and Shannon entropy. Data used are both recent (from 1850 and on) and from ice cores. For more recent data, it is found that greenhouse gases, in particular carbon dioxide, are the main causal drivers of the temperature rise. But for ice core data, the cause-effect direction is reversed. The explanation is that the feedback of greenhouse gases to temperature changes seems to be slower that the fast response of temperature to changes of greenhouse gas concentrations. This finding is corroborated by Muryshev et al. (2019).
Other papers of special interest include the following: Faes et al. (2017) modifies the Granger causality testing method in VAR models and conclude that, when analyzing ice core data, the proof for carbon dioxide rise to cause temperature rise is stronger than for the opposite causation. The papers by Seip et al. (2018); Vakulenko et al. (2016) point out, using different methods, that causalities between carbon dioxide concentration and temperature may go in different directions over different time periods. Finally, Estrada and Perron (2019) analyze data from the last centuries and find that breaks in carbon dioxide trend to precede breaks in temperature trend, which they interpret as a sign of causality.
The structure of the present paper is as follows: in Section 2 a short description of the data is given followed by a briefing of the data processing necessarily to be able to employ traditional statistical methods. Thereafter, in Section 3, the statistical analysis used is described. Section 4 initiates with a briefing of the results regarding model selection, optimizing break points and some residual analysis, culminating in the results of the multivariate Granger causality tests. Finally in Section 5 a discussion of the results and comments on future work takes place.
The statistical analysis was performed using the statistical computer software R version 3.2.3.

Background and data
The source of data, National Centers for Environmental Information (NCEI) monitored by National Oceanic and Atmospheric Administration (NOAA) is a world wide climate and historical weather database. More specifically the data used in this study was collected at Dome C (by European Project for Ice Coring in Antarctica (EPICA)) which is located at east Antarctica. The temperature, carbon dioxide and methane series all go from, approximately, 800,000 years back in time to around 100 years before present. This is possible by drilling a 3,270 m ice core from which the climate and gas data can be retrieved. Using the attribute of ice cores to store small air bubbles makes it possible to access information about gas compositions in the past. By analyzing different isotopes it is also achievable to reconstruct the temperature level, cf Alley (2010).

Data processing
The three data sets (temperature, CO 2 and CH 4 ) all span over roughly the same time period,however, there are large differences in number of observations and they are not observed at the same time. Henceforth, to be able to properly analyse the data, a simple linear interpolation was used to make the observations of the three series equidistant. This was done by using the following formula Here x 1 is observed at time t 1 and x 2 at time t 2 hence (1) will calculate x at time t 1 < t < t 2 . The interval was chosen in agreement with Kang and Larsson (2014) and was set to be 1,000 years. The number of interpolation points is a trade-off: choosing too few might result in a loss of information whereas too many will lead to artificiality resulting in a possible bias.

Structural break points
By inspecting Figs. 1, 2 and 3 several obvious structural break points are observed and the pattern seems to be similar among the three series. As suggested by Kang and Larsson (2014) the eight initial break points shown in Table 1 will be used for the three series. The break points will later on be modified to optimize fit and the result of that modification is also shown in Table 1.
where ( 1t , 2t , 3t ) are iid with expectation 0 and the deterministic terms are where i = 1,2,3 (temperature, CO 2 , CH 4 ) and I T j ≤ t < T j+1 is 1 if T j ≤ t < T j+1 and 0 otherwise. In (3) d i j0 is the constant term, d i j1 the linear and d i j2 the quadratic. p in (2) is the number of lags.

Akaike information criterion
The Akaike information criterion (AIC), cf Akaike (1974), measures how well a model fits the data. The lower value of AIC the better the fit. This was taken into consideration when finding the optimal model. The formula is where k is the number of parameters in the model andL is the maximum likelihood.

Out of sample analysis
In addition to AIC an out-of-sample analysis was employed to help decide the final model. This was done by estimating the models using all of the data with the exception of the last 50 observations. Thereafter the ruled out observations were predicted from the estimated models and, lastly, the residual

Granger causality test
The process X 2 Granger causes X 1 if δ k12 = 0 for some k in (2), and similarly, X 1 Granger causes X 2 if δ k21 = 0 for some k as first described by Granger (1969). This was later developed into a formal F test by Sims (1972). In this paper, we prefer to use an alternative test proposed by Lütkepohl (1982). He used the likelihood ratio test where T is the number of observations, RSS r and RSS u are the residual sums of squares of the restricted and unrestricted model, respectively. Then Lütkepohl compared that to the χ 2 distribution. The number of degrees of freedom is the number of restricted parameters. For every trivariate model Lütkepohl carried out three tests. E.g. in our case, for the trivariate temperature model the first test would have the null hypothesis that the lag coefficients for CO 2 and CH 4 would be zero (δ k12 = 0 and δ k13 = 0, for k = 1,2), versus the alternative hypothesis that these two lag coefficients are nonzero. This test determines if at least one of δ k12 or δ k13 is nonzero and therefore has an impact on temperature. The number of degrees of freedom is 2 p.
The second test has the null hypothesis that only δ k12 is zero versus the same alternative. The third test has the null hypothesis that only δ k13 is zero versus the same alternative. The latter two tests help determine the significance of the two variables one by one. The number of degrees of freedom is p.

Results
To investigate whether the three reconstructed series were stationary or not the augmented Dickey-Fuller test (Dickey and Fuller 1979) was used. The null hypothesis of nonstationarity was tested against the alternative of stationarity. The test gave rejection in all the three cases (test statistics −4.8, −4.6 and −6.1 for the temperature, carbon dioxide and methane series, respectively). The Akaike Information Criterion (AIC) was calculated for six different models for each of the three series. That is (2) with p = 1 with and without the piecewise quadratic trend (3) and (2) with p = 2 with and without the piecewise quadratic trend (3). The result of this can be seen in Table 2 where the lowest AIC and therefore the best fit was given by AR(2) with piecewise quadratic trend for temperature and CO 2 . However, the best fit for CH 4 was given by AR(1) with piecewise quadratic trend, though by a very small margin. Furthermore by inspecting Table 3, where residual sum of squares (RSS) of forecast errors in the out-of-sample analysis were calculated for the six Pqt is short for piecewise quadratic trend and mbp is short for modified break points 3.264 × 10 −10 6.0*** One, two or three stars indicate significance level at 5, 1 and 0.1%, respectively different models, similar results were given with the exception that AR(2) with piecewise quadratic trend was the optimal model for temperature and CH 4 while AR(1) with piecewise quadratic trend was slightly better for CO 2 . With these results in mind the model with piecewise quadratic trend, modified break points and lag length two was chosen for all three series as the final model, considering that it was the model that overall produced the best fit when weighing in AIC and the out-of-sample analysis. The parameter estimates for the final trivariate models with modified break points for temperature, CO 2 and CH 4 can be seen in Tables 4, 5 and 6, respectively. The modified break points, together with the initial break points, can be seen in Table 1. They were produced for every series separately to optimize the fit. The AIC and RSS of the models with modified break points are also shown in Tables 2 and 3, respectively. 3.9 *** One, two or three stars indicate significance level at 5, 1 and 0.1%, respectively The modified break points were calculated according to a procedure presented in Kang and Larsson (2014) where, initially, the last 1 three break points (T 5 , T 6 and T 7 ) were fixed. Thereafter an iteration is initiated to find the set of T 2 , T 3 and T 4 that minimizes the RSS within ±10 of the previous set. Then it proceeds by fixing the newly updated set of T 2 , T 3 and T 4 and employing the same iteration to T 5 , T 6 and T 7 until optimized.
The autocorrelation function and partial autocorrelation function of the residuals for the three reconstructed series were calculated. No obvious patterns were detected. These figures, as well as estimated parameters for all univariate and bivariate models, may be obtained from the authors upon request. One, two or three stars indicate significance level at 5, 1 and 0.1%, respectively

Multivariate Granger causality tests
The results from the multivariate Granger causality tests are shown in Table 7, and it is clear that we can reject all the null hypotheses at any realistic significance level. This means that there is strong empirical evidence that temperature Granger causes CO 2 , CO 2 Granger causes CH 4 , CH 4 Granger causes temperature, CO 2 Granger causes temperature, CH 4 Granger causes CO 2 and temperature Granger causes CH 4 .

Discussion and conclusions
First we processed the data by linear interpolation to make classical statistical methods easier to employ. Thereafter we found the best model fit to be an autoregressive model with lag two and a piecewise quadratic trend. The residual analysis showed no obvious patterns. The main results in this paper strengthen the findings of Kang and Larsson (2014) that temperature Granger causes CO 2 and CO 2 Granger causes temperature. We can also further expand that result by presenting empirical evidence that there exists a whole feedback system between temperature, CO 2 and CH 4 where everyone one of them affects the other. This means that the continuous emission of the greenhouse gases CO 2 and CH 4 will, indeed, lead to an increase of the surface temperature.
For further studies on the subject one could consider including more variables, like nitrous oxide (N 2 O) concentration into the analysis, cf Fisher et al (2019). N 2 O is, just like CO 2 and CH 4 , a greenhouse gas so it would be interesting to see in what way it would interact with our established trivariate feedback system and N 2 O concentration is retrievable from ice cores hence it would not be difficult to carry through.
Ideally all the data would have been collected at the same time and the three series would have been equidistant from the start to avoid the necessity of data processing i.e artificiality but, alas, this was not the case. Furthermore one could always consider different, more advanced, statistical models e.g. nonlinear or continuous time models as suggested by Kang and Larsson (2014).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.