Comment on ‘Quantifying the consensus on anthropogenic global warming in the scientific literature’

Cook et al’s highly influential consensus study (2013 Environ. Res. Lett. 8 024024) finds different results than previous studies in the consensus literature. It omits tests for systematic differences between raters. Many abstracts are unaccounted for. The paper does not discuss the procedures used to ensure independence between the raters, to ensure that raters did not use additional information, and to ensure that later ratings were not influenced by earlier results. Clarifying these issues would further strengthen the paper, and establish it as our best estimate of the consensus.

The consensus paper by Cook et al (2013) generated a lot of interest. Consensus is not proof, but occasional stock-takes of the state of scientific knowledge are useful for identifying fruitful new research avenues and potential paradigm shifts. Agreement, or perceived agreement, about the extent and causes of climate change has no bearing on rational choices about greenhouse gas emission reduction, which are driven by the trade-offs between the impacts of climate change and the impacts of climate policy, but it does affect the public perception of, and the political debate on, climate policy, as does the integrity of climate research. Cook et al (2013) estimate the fraction of published papers that argue, explicitly or implicitly, that most of the recent global warming is human-made. They find a consensus rate of 96%-98%. Other studies (later ones identified through forward citations to Cook et al in Scopus, earlier ones through Cook et al's backward references and the references therein) find different numbers, ranging from 47% in Bray and von Storch (2007) to 96% in Carlton et al (2015). Cook et al use the whole sample. Other studies find substantial variation between subsamples. Doran and Zimmerman (2009), for instance, find 82% for the whole sample, while the consensus in subsamples ranges from 47% to 97%. Verheggen et al (2014) find 66% for the whole sample, with subsample consensus ranging from 7% to 79%. Figure 1 shows these estimates; see also table A1 in the appendix.
Measuring 'consensus' is, of course, not easy: the human brain always reinterprets the information presented. Different studies may have different objects of consensus. This is illustrated by Carlton et al (2015), who ask four different questions (about the impact on climate change of human activities, greenhouse gases, carbon dioxide, and the Sun) and find four different results for the consensus rate (90%, 96%, 89%, and 71%, respectively). Other survey studies ask slightly different questions again. Oreskes (2004)
paper (Cook and Cowtan 2015) and they disagree with the authors too (Tol 2014a).
These differences notwithstanding, the results of Cook et al (2013) appear to be at the high end of the consensus literature when 'no position' abstracts are excluded, and at the low end when they are included. As Cook et al's sample is so much larger than those of the other studies, one would expect their results to lie towards the centre of the earlier estimates. Figure 1 highlights that this is not the case.
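To make the excluded/included distinction concrete, the short sketch below computes the consensus rate both ways from abstract counts; the counts are hypothetical placeholders, not the Cook et al (2013) data.

```python
# Illustrative only: hypothetical counts, not the Cook et al (2013) data.
endorse = 900        # abstracts endorsing anthropogenic warming
reject = 30          # abstracts rejecting it or uncertain
no_position = 2000   # abstracts taking no position

# Consensus rate among abstracts that take a position ('no position' excluded).
rate_excluding = endorse / (endorse + reject)

# Consensus rate over all abstracts ('no position' included).
rate_including = endorse / (endorse + reject + no_position)

print(f"excluding 'no position': {rate_excluding:.1%}")   # about 97%
print(f"including 'no position': {rate_including:.1%}")   # about 31%
```

The same endorsement counts thus yield very different headline numbers depending on how 'no position' abstracts are treated.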
It may be that there is a trend in consensus findings, and that the study by Cook et al stands out simply because it is recent. Cook et al (2013) argue that there is an upward trend in the consensus, but Tol (2014a) shows that this is a trend in composition rather than in agreement. There appears to be no trend in the consensus rate across studies. There is no statistically significant trend in the results that include 'no position'. There is a statistically significant trend in the results that exclude 'no position', but this trend disappears if the 1996 Bray and von Storch estimate is omitted. See figure A1 in the appendix.
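As an illustration of the kind of trend test referred to above, the sketch below fits an ordinary least-squares trend of consensus rate on study year, and repeats the fit with the earliest estimate dropped; the (year, rate) pairs are invented placeholders, not the estimates of table A1 or figure A1.

```python
# Illustrative only: invented (year, consensus rate) pairs, not the estimates in table A1.
import numpy as np
from scipy import stats

years = np.array([1996, 2004, 2009, 2013, 2014, 2015])
rates = np.array([0.47, 0.75, 0.82, 0.97, 0.91, 0.96])

def trend(y, r):
    """Return the OLS slope (per year) and its p-value for rate regressed on year."""
    fit = stats.linregress(y, r)
    return fit.slope, fit.pvalue

print(trend(years, rates))            # trend using all estimates
print(trend(years[1:], rates[1:]))    # the same trend with the earliest estimate omitted
```

Whether a fitted trend survives the omission of a single early data point is a simple robustness check of the kind discussed in the text.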
The problem may lie in the methodology of Cook et al (2013), although earlier papers are not above criticism either (Peiser 2005, Duarte 2014). Reusswig (2013) praises Cook et al, but Legates et al (2015) and Tol (2014a) question its data and methodology (Bedford and Cook 2013, Cook et al 2014a, Tol 2014b). Dean (2015) notes that the paper omits inter-rater reliability tests; Cook and Cowtan (2015) add these. These methodological exchanges omit the following five points:

1. Cook et al (2013) do not show tests for systematic differences between raters (a minimal sketch of such a test is given below). Abstract rater IDs may or may not be confidential (Queensland 2012, 2014), but the authors could have reported test results without revealing identities.

2. The paper argues that the raters were independent. Yet the raters were drawn from the same group. Cook et al (2013) are unfortunately silent on the procedures that were put in place to prevent communication between raters.

3. The paper states that 'information such as author names and affiliations, journal and publishing date were hidden' from the abstract raters. Yet such information can easily be looked up. Unfortunately, Cook et al (2013) omit the steps taken to prevent raters from gathering additional information, and the procedure for disqualifying ratings based on such information.

It would be of considerable benefit to readers if these issues were clarified, if at all possible. That would help to convince people that the results of Cook et al are not just different from, but better than, those in other studies. Cook et al (2013) renewed interest in the question of how to communicate (climate) science. While several studies show that people respond to cues about the scientific consensus (Guy et al 2014, Myers et al 2015, van der Linden 2015, van der Linden et al 2014), other studies show that this effect is dominated in the long run by other factors (Bliuc et al 2015, Campbell and Kay 2014, Kahan 2015).
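On point 1, a test for systematic differences between raters need not reveal rater identities: a chi-squared test of whether the distribution over endorsement categories depends on the rater, and Cohen's kappa for pairs of raters on jointly rated abstracts, would suffice. The sketch below shows both on made-up counts; the numbers are not the Cook et al rater data.

```python
# Illustrative only: made-up counts, not the Cook et al (2013) rater data.
import numpy as np
from scipy import stats

# Systematic differences between raters: each row is one rater's counts of
# abstracts assigned to (endorse, no position, reject).
per_rater = np.array([
    [310, 620, 15],   # rater 1 (hypothetical)
    [295, 650, 10],   # rater 2 (hypothetical)
    [350, 580, 25],   # rater 3 (hypothetical)
])
chi2, p, dof, _ = stats.chi2_contingency(per_rater)
print(f"rating distribution independent of rater: chi2 = {chi2:.1f}, p = {p:.3g}")

# Inter-rater reliability for two raters on a common set of abstracts:
# cross-tabulation of rater A's category (rows) against rater B's (columns).
cross = np.array([
    [120,  15,  2],
    [ 18, 200,  5],
    [  1,   6, 10],
])
n = cross.sum()
p_observed = np.trace(cross) / n
p_expected = cross.sum(axis=1) @ cross.sum(axis=0) / n**2
kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"Cohen's kappa = {kappa:.2f}")
```

Reporting such statistics would address the inter-rater concerns without disclosing which rater produced which rating.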

Acknowledgments
Oliver Bothe, Collin Maessen, Ken Rice, Bart Verheggen and two anonymous referees provided excellent comments on a previous version of this paper.

Appendix A