To the Editor:

Human beings are organisms with brains who have emotions and agency. The nature and size of our brains is what gives us these capacities, but our feelings and our behaviour are not best understood as simply the property of our brains. They are the properties of us as whole human beings immersed in our social world [1]. Inappropriately ascribing feelings, thoughts, preferences and other characteristics of humans to the brain is known as the mereological fallacy [2]. For example, the association between activation of certain areas of the brain and ‘liking the Beatles’ could be studied, but this preference is more likely to be understood by examining the relationships and experiences associated with this music in the person’s life. The same applies to feelings of despair and hopelessness.

First, it is worth stating that none of the four correspondents [3,4,5,6] challenges the conclusion of our umbrella review [7], which was that we found ‘no support for the hypothesis that depression is caused by lowered serotonin activity or concentrations.’

Fountoulakis and Tsapakis [6] argue that demonstrating a lack of evidence for the serotonin hypothesis [7] is not the same as rejecting any biological basis for depression, and that such a rejection condemns us to a belief in the supernatural or idealism, while Arnone et al. imply it is a rejection of science [3]. Whilst there is the possibility that another biological cause for depression will be found, studies from neuroimaging [8], research on specific genes [9, 10] and hypothesised markers of depression from existing neurochemical and hormonal theories [11] have not returned convincing or consistent evidence of a biological abnormality. Moreover, although we would agree that serotonin and other neurotransmitters may well be involved in depression in ‘a more complex manner’ [3], without specifying the nature of the relationship, this is equivalent to saying that blood is involved in depression. It surely is, but this has little bearing on the nature, origins or causation of depression.

We suggest that in the absence of convincing proof of a pathological process, it is more likely that depression is part of the range of emotional reactions to the circumstances of life that are typical of humans. This is not supernaturalism, and it is not ‘nescience’ - it is pro-science to understand the limits of certain sorts of research. We agree that mental activity arises from brain activity, but it seems more likely that depression is the result not of a faulty brain but rather a normal brain responding to stress or adversity: in other words, a behavioural state best understood at the level of the mind (that is, the thoughts, feelings and actions of human beings in their social context) and not of the brain.

The anti-reductionist position we are advocating holds that the nature of causation in the context of human experience - ‘causation’ that involves meaning, interpretation and agency - is different from the mechanical nature of causation in the physical world. Nevertheless, there are well-established and consistent links between environmental stressors and depression, with large effect sizes [8, 10, 12]. For example, in an analysis of neuroimaging markers for depression, the largest significant effect size for neuroimaging was small (Cohen’s d of 0.23 for Voxel Based Morphometry in structural Magnetic Resonance Imaging), while the effects of social support and childhood maltreatment were large (Cohen’s d of 1.04 and 0.98, respectively, both transformed from eta squared effect sizes) [8]. This is consistent with the effect of stressful life events on the risk of depression (odds ratio 5.64, equivalent to a Cohen’s d of 0.96) [13]. The large effect of social context is consistent with the fact that 70% of people will meet criteria for clinical depression or anxiety by the age of 45 [14]. It seems difficult to believe that 70% of people could have a significant biological defect of any kind. This is not to say that there is no biology involved in depression (and other mood states) – Kendler’s studies found that personality (neuroticism) affects the relationship between stress and depression [12], and, apart from upbringing, genetics might have some role in personality. But this is a different proposition from depression being caused by a specific biochemical deficit.
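The effect-size conversions quoted above can be illustrated with a minimal sketch. It assumes the commonly used logit method for converting an odds ratio to Cohen's d, and the two-group formula for converting eta squared; the cited analyses may have used other methods, and the function names here are purely illustrative. Under these assumptions an odds ratio of 5.64 corresponds to a d of roughly 0.95, close to the 0.96 reported.

```python
import math


def odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to Cohen's d (logit method, an assumption here).

    d = ln(OR) * sqrt(3) / pi
    """
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def eta_squared_to_d(eta_sq: float) -> float:
    """Convert eta squared to Cohen's d for a two-group comparison (assumed).

    d = 2 * sqrt(eta^2 / (1 - eta^2))
    """
    return 2 * math.sqrt(eta_sq / (1 - eta_sq))


# Odds ratio for stressful life events quoted in the text.
d_life_events = odds_ratio_to_d(5.64)
print(f"d for OR 5.64: {d_life_events:.2f}")
```

The small gap between this result and the published figure probably reflects rounding or a slightly different conversion constant; the sketch is not a reconstruction of the original authors' calculation.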

There will always be some study somewhere showing a link between a particular biological parameter and depression. Since thousands of studies are done each year, each making dozens of measurements, many of the positive results turn out to be spurious. This is likely to be the case with the research described by Arnone et al., who argue that serotonin is likely to be involved ‘in the pathophysiology of depression in a more complex manner’ while acknowledging that, in fact, findings are contradictory and inconsistent.

Moreover, we did not include these areas of research because, except for the recent study of amphetamine-induced serotonin release (which we examined in our previous response [15] and which came out subsequent to our paper), they do not involve measuring the relationship between serotonin and depression, and numerous assumptions and questionable interpretations are required to bring the observations to bear on the nature of this relationship.

Take the cognitive neuroscience experiments on emotional responses, for example. Arnone et al. claim that Selective Serotonin Reuptake Inhibitors (SSRIs) have been ‘demonstrated to affect’ emotion processing and cite a study showing reduced recognition of negative facial expressions and increased ‘positive affective memory’ in people taking antidepressants [16]. However, although volunteers given citalopram or reboxetine in this particular experiment were less likely to recognise anger, fear and disgust than those on placebo, there were no differences for sadness and happiness, and several other similar experiments found the opposite effect: in two other studies, people on citalopram were more likely to recognise fearful expressions, and in one, people taking duloxetine were more likely to recognise disgust (and happiness, although this effect was weaker) [17,18,19]. One further study of duloxetine found no effect on recognition of any emotions [20] and one found a small effect on sadness only [21]. Although the reported ability of SSRIs and other antidepressants to numb emotions might predict reduced sensitivity to expressions of strong emotion, these experiments do not produce reliable and replicated evidence of any effect, and do not justify Arnone et al.’s claim that serotonin modulates ‘key neuropsychological processes that are relevant to depression.’

Arnone et al. then go on to discuss research on serotonin and ‘reward learning’, suggesting that this provides evidence that serotonin is involved in ‘motivation, decision-making and cognitive appraisal of positive and negative experiences’. However, they admit that studies of the effects of SSRIs on reward-driven learning are inconsistent, with some suggesting SSRIs improve it and some that they impair it. They cite animal studies as supporting the role of serotonin in reward learning, but the results of animal studies are also inconsistent, as revealed in a recent review that, despite attempting to account for numerous contradictory findings, nevertheless admits they are ‘difficult to interpret’ (p.142) [22].

Next, Arnone et al. propose that depression is a result of ‘fronto-temporal functional and structural dysconnectivity in brain networks modulated by serotonin’. However, it is difficult to discern a testable hypothesis here, and what it means for a brain region (the dorsomedial prefrontal cortex) to have ‘opposite top-down and bottom-up functions’ is not clear. The proposed relationship between Transcranial Magnetic Stimulation (TMS) and serotonin is also not clearly explicated, and since the efficacy of TMS is not well-established (a recent umbrella review shows lack of statistically significant effects, poor quality studies, heterogeneity and publication bias [23]), the fact that a small study showed opposite correlations between treatment response and 5HT1A receptors in different brain regions is difficult to interpret. Electro-convulsive Therapy (ECT) disrupts numerous brain systems and functions [24, 25], so it is not surprising if it impacts the serotonin system, but this does not indicate anything about the origins of depression.

Smith et al. [4] make a number of unsubstantiated criticisms and trivial points about methodological choices in our umbrella review. Smith et al. are incorrect in accusing us of not following a clearly documented and reproducible process for identification and inclusion of research papers, and we strongly reject the suggestion that we selected papers to suit our preconceptions. Leading US psychiatrists, Ronald Pies and George Dawson, pointed out that four previous reviews of the area had also concluded that ‘the total evidence [for the serotonin theory of depression] was inconclusive or inconsistent’ and they and numerous other commentators have declared that our results are so obvious as to be positively boring [26]. Contrary to the accusations, we had a clear study selection process: we outlined transparent inclusion and exclusion criteria in a pre-published protocol (and in the paper), presented explanations for every step of our selection process, including reasons for exclusion of studies at abstract level, and specific exclusions for papers at full-text examination as recommended by PRISMA guidance [27].

Specifically, as pre-specified in the PROSPERO protocol [28] we chose a priori to include ‘systematic reviews, meta-analysis, umbrella reviews or individual patient meta-analysis or very large dataset analysis based on a pre-specified set of studies or data set.’ This decision was made because there are large studies which do not consist of systematic reviews which nonetheless contain important information pertaining to the question of whether serotonin levels are associated with depression, such as large genetic studies and collaborative meta-analyses, and it would be quite misleading not to include these studies which are much larger (10,000s of participants) than some systematic reviews.

Smith et al. specifically criticise us for excluding Bell et al. [29]. However, Bell et al. was a narrative review, neither a systematic review nor a meta-analysis, and therefore did not meet our inclusion criteria.

We were transparent in our modification of the protocol to optimise the quality appraisal of included studies and to add a measure of certainty (GRADE). AMSTAR-2 is a scale designed to appraise the quality of systematic reviews and meta-analyses, and therefore includes several questions about the search strategy (e.g. PICO, study type, literature search strategy, etc.) [30], which are not applicable to meta-analyses that did not involve literature searching. In their discussion of the identification of critical domains for review validity, the authors of the AMSTAR-2 tool themselves point out that ‘our listing is a suggestion and appraisers may add or substitute other critical domains’ [30]. As an example of when the critical nature of certain items might be questioned they highlight the output from ‘clinical trial collaborative groups’ ‘using meta-analysis to summarise a known literature base’ [30] with clear relevance to the Culverhouse collaborative meta-analysis in our umbrella review [31]. It is relatively common to modify quality rating scales, including the AMSTAR (e.g. in umbrella reviews where the identified studies were not just randomised controlled trials (RCTs) [32] or in other cases when criteria were modified to reflect the included studies [33]), and the modified scale was only applied to two of our included studies.

We were transparent about our reasons for modifying the scale and displayed the modified questions in the Supplementary Material of our review as well as our judgement on each item. We used criteria such as ‘Did the review authors explain their selection of the study designs for inclusion in the meta-analysis?’ and ‘Did the authors use a comprehensive search for all the relevant data?’ Although other criteria are possible, we submit that many readers would find such criteria reasonable and sensible to appraise these studies.

Smith et al. criticise our use of GRADE. As we mentioned in our previous reply [15], GRADE was originally established to appraise treatment studies to develop clinical practice guidelines [34] and does not therefore capture the most relevant aspects of aetiological research. Smith et al. criticise us for removing the GRADE criterion regarding whether studies were RCTs or observational, for example, but randomisation is inapplicable to the majority of the research we looked at, since it did not involve an intervention. There are clear precedents for adapting the GRADE as we did for use in aetiological research or for devising other assessments of certainty [11, 35]. Indeed, in a review of approaches for assessing certainty in umbrella reviews, it was found that the majority of umbrella reviews published in top ranking journals varied their criteria for assessing certainty of evidence [36].

As explained by Guyatt, one of its originators, the GRADE approach is recognised to be inherently subjective but its aim is to provide a reproducible and transparent framework for grading certainty [34]. We transparently documented our criteria and our decision on each factor for each study so that readers may come to conclusions for themselves. It did not seem a useful approach to mechanically implement an irrelevant set of criteria.

Smith et al. criticise us particularly for selecting sample size as an important criterion. While we agree that sample size means most when contextualised as part of a power calculation, even then it involves considerable estimation of likely effects and is only possible for a specific study or area, and in any case it is rarely done in systematic reviews and meta-analyses. We were influenced in choosing cut-offs for what counted as a large study by previous umbrella reviews assessing credibility, e.g. Arango et al. [35], which chose 1000 participants as the cut-off (a commonly employed threshold in credibility assessments [36]). The cut-off of 10,000 for ‘large’ genetic studies is a conservative rule of thumb from analyses in this area [37].

Smith et al. also criticise the GRADE criterion we included regarding ‘unified analysis conducted on original data’. But a unified, collaborative analysis (e.g. Culverhouse) is usually considered superior to the synthesis of the results of diverse analyses because this allows standardisation of analysis, consistency of inclusion and exclusion criteria and outcomes, amongst other strengths; aggregate data, on the other hand, are often presented differently across studies (e.g. odds ratio versus relative risk), or poorly reported [38, 39].

Smith et al. criticise us for selecting only the five most recent reviews in a single area where more reviews had been conducted. We chose this criterion a priori to limit the inclusion of repetitive and overlapping reviews. This only occurred in the genetic research, in which later reviews included the studies included in earlier reviews and the large collaborative meta-analysis included data from most of the previous studies.

Smith et al. claim that we chose misleading citations and suggest that a paper by Jakobsen et al., which we cited as considering the possibility that antidepressant effects are explained by an amplified placebo effect, does not do so. In fact, it does. While Jakobsen et al. do not use the phrase ‘amplified placebo effect’, they discuss this effect when they refer to unblinding of participants due to recognised effects from antidepressants, which compromise ‘valid assessment of subjective effects’ [40]. Smith et al. fail to appreciate the concept of the amplified placebo effect. This concept is important because RCTs which examine antidepressants versus placebo purport to assess drug-placebo difference on the premise that the difference in outcomes is attributable solely to the pharmacological effect of the drug. However, this assumption is violated if patients are unblinded by alterations in physical and mental experience produced by a drug, including, but not limited to, recognised adverse effects. The possible impact of amplified placebo effects is illustrated by a study in which all participants were given an antidepressant, with half told the truth and half told they had been given an active placebo, thus isolating the expectation effect [41]. This study found a difference of 5 points on the MADRS between the two groups, twice the antidepressant-placebo difference found in meta-analysis of antidepressants [42].

We do not believe readers would have been misled by our citation of a paper by one of the authors of the review when this is clearly stated in the reference list. The way we referred to this paper is common in academic work.

Smith, Ahmed and Fountoulakis ignore the considerable methodological concerns with studies that find a benefit of antidepressants over placebo, including publication bias, short duration of studies, unblinding (as mentioned), withdrawal effects from cessation of pre-study medication, dichotomisation of continuous data and the question of whether (even setting aside all these issues) the differences are of a clinically important degree [42,43,44]. Fountoulakis and Tsapakis argue that a drug can have an indirect effect, but as outlined previously [15] drugs like diuretics have biological mechanisms that act on the causal pathway to heart failure. For drugs that act on mood there are simpler, more plausible explanations for the small differences from placebo than hypothesised indirect biological effects, such as the numbing of emotions produced by antidepressants [45] and amplified placebo effects.

Smith et al. query the references provided for highlighting the importance of sample size and uniform analysis of original data. However, the genetic studies we cited (Border et al. and Culverhouse et al.) clearly considered large sample size and uniform analyses to be essential justifications for their re-examination of this area [10, 31].

Even if every criticism put forward by Smith et al. were true (and they are not), it would not alter the conclusion of the umbrella review. Changing the sample size cut-off for large studies, inclusion of a further 11 less recent meta-analyses, using the original AMSTAR-2 criteria or original GRADE criteria (which would downgrade most studies equally for not being RCTs) or substituting alternative references would not change the conclusion that there is no convincing evidence that low serotonin or serotonergic activity is associated with depression.

We suggest that this letter is an example of the logic-chopping fallacy (trivial objections). The quibbling over minor differences of opinion with regard to methodological approaches and confected concerns over choice of references appear to disguise our correspondents’ displeasure at the exposure of the failure of the biomedical enterprise to understand depression.

To conclude, Smith et al., despite agreeing with us and with our other correspondents that the serotonin theory of depression (the idea that depression is caused by low serotonin levels or activity) is ‘overly simplistic,’ cannot admit that this is the same as saying it currently lacks convincing evidence. Smith et al., like Arnone et al. and most of our previous correspondents, seem unable to let this defunct theory rest in peace. There is abundant evidence that it is the context of our lives and not the balance of our chemicals that offers the most insight into depression.