The ice in voices: Understanding negative content in auditory-verbal hallucinations

may reduce it remains poorly understood. This paper offers definitions of negative voice content and considers what may cause negative voice-content. We propose a framework in which adverse life-events may underpin much negative voice-content, a relation which may be mediated by mechanisms including hypervigilance, reduced social rank, shame and self-blame, dissociation, and altered emotional processing. At a neurological level, we note how the involvement of the amygdala and right Broca's area could drive negative voice-content. We observe that negative interactions between hearers and their voices may further drive negative voice-content. Finally, we consider the role of culture in shaping negative voice-content. This framework is intended to deepen and extend cognitive models of voice-hearing and spur further development of psychological interventions for those distressed by such voices. We note that much of the relevant research in this area remains to be performed or replicated. We conclude that more attention needs to be paid to methods for reducing negative voice-content, and urge further research in this important area.

and borderline personality disorder (Larøi et al., 2012). They are also found in the wider general population, in people who are not impaired by the experience (Johns et al., 2014; Kråkvik et al., 2015). This raises the question as to why some individuals are able to cope with this experience and function in society, whereas others suffer social and occupational impairment and seek (or are mandated) psychiatric help.
AVH have a heterogeneous phenomenology (McCarthy-Jones et al., 2012; Nayani & David, 1996; Woods et al., 2015), but one prominent aspect of the experience that has a prima facie relation to distress and impairment is negative content, i.e., hearing voices that threaten, frighten, criticize, or abuse the hearer. The majority of people diagnosed with schizophrenia who report AVH have voices with negative content (Nayani & David, 1996). Studies systematically examining this have reported that the most common forms that negative content takes are threats and negative evaluations of the person. For example, Nayani and David (1996) found that 77% of participants reported content coded as critical (e.g. "you can't do anything right"), 70% as abusive (e.g. "ugly bitch"), and 66% as frightening (e.g. "we are going to kill you"). An analysis by Copolov et al. (2004) found that negative affect associated with voices was particularly associated with participants deeming the tone of the voices they heard to be angry, menacing, harsh and malicious/nasty, and stating that their content was derogatory, persecutory, abusive/insulting, accusatory, intrusive, critical, obscene and threatening. Such negative content is likely the reason why hallucinations are associated with an increased risk of suicide in people diagnosed with schizophrenia (Kjelby et al., 2015). AVH with negative content are also heard by individuals with diagnoses including substance abuse disorders, borderline personality disorder, bipolar disorder, post-traumatic stress disorder, and dissociative disorders (Anketell et al., 2010; Jessop et al., 2008; Larøi et al., 2012; see Waters & Fernyhough, 2017, for a review).
It has been argued that one major difference between clinical and non-clinical voice-hearers is negative voice-content (Larøi, 2012). For example, Daalman et al. (2010) compared AVH in people diagnosed with a psychotic disorder to AVH in non-clinical voice-hearers, and found greater negative emotional content in patients. When negative content was used as a predictor in a regression model, there was an 88% probability of a correct categorization as a patient/non-patient based on this characteristic. Furthermore, negative content was the variable that best predicted whether a person experiencing AVH had a psychotic disorder when compared with the other variables that also significantly distinguished the two groups (age at onset, frequency, control). Honig et al. (1998) found similar results when comparing non-patients who heard voices with two clinical voice-hearing groups (schizophrenia, dissociative disorders). All three groups reported positive voices, but there were group differences in the prevalence of negative voices. Negative voices were reported by 100% of the schizophrenia group, 93% of the dissociative group, but only 53% of the non-patient group, suggesting that the amount of negative voice content may influence the need for care. Taken together, these two studies hence suggest negative content is an important factor predicting need for care in someone hearing voices. However, despite an extensive number of excellent reviews on AVH (Baumeister et al., 2017; de Leede-Smith & Barkus, 2013; Mawson et al., 2010; Upthegrove et al., 2016), none examine the reasons why AVH are negative.

Negative voice-content and the cognitive model
In 1989, Lorna Smith Benjamin argued that the content of a person's voices was directly responsible for the person's behavioural response to them. She claimed that: "If the voice attacks the patient, he or she is depressed and suicidal. If the voice tells the patient to kill others, then, if the patient loses self-control, murderous attacks on others are likely. If the voice tells the person he or she is wonderful and powerful, grandiose manic behaviours appear" (Benjamin, 1989, p. 293).
However, the development of the cognitive model of voice-hearing (Chadwick & Birchwood, 1994) drew attention away from such arguments. Chadwick and Birchwood argued that beliefs voice-hearers have about voices, in particular their perceived omnipotence and perceived intention to do evil or good (i.e., malevolence or benevolence), were key determinants of ensuing distress and problem behaviour. This model was interpreted by Peters et al. (2012) as indicating that "distress and behavioural repertoire in voice hearers is most closely tied to beliefs about voices, irrespective of content" (p. 1507, italics added), and that "It's not what you hear, it's the way you think about it". As a result, cognitive behavioural therapy (CBT) has focused on changing voice-hearers' beliefs about their voices, in order to reduce distress and minimize problem behaviour. Although appraisals do influence the level of distress and problem behaviour that AVH cause, contemporary instantiations of the cognitive model have neglected the role of voice-content itself. And yet, Chadwick and Birchwood's (1994) study actually stressed the importance of voice-content, stating that "voice content was frequently put forward [by voice-hearers] as evidence for a particular belief" (p. 192). They added to this the caveat that "the class of belief was not always understandable in the light of voice content alone" (italics added, p. 192). This caveat has been supported by empirical work. In a study by van der Gaag (2003), the presence of negative voice-content in patients was 'objectively' established by asking two independent laypeople to rate whether the transcribed content of voices was negative. This was done based on their judgment as to whether the hearer was being negatively evaluated (e.g., condemned, humiliated, called names), threatened, or was instructed to do something nasty or dangerous.
Patients were then asked to rate the intent (malevolent, benevolent, or neutral) they perceived the voice to have. Of the 16 voices of patients that the external raters deemed to have negative content, 14 were believed to be malevolent by the voice-hearers (i.e. 88%), while 2 were believed to be benevolent. Furthermore, whereas the external raters deemed 25 voices to have neutral content (e.g. "Make a pot of tea" or "You should go for a walk in the park"), voice-hearers believed 14 of them (i.e. 56%) to have some malevolent intent. This supported Chadwick and Birchwood's contention that negative content was a strong driver of perceived malevolence of AVH, but also that such negative voice-content was neither necessary nor sufficient for perceived malevolence.
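The proportions reported for van der Gaag (2003) can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that assumes only the counts given in the text above; the variable and function names are illustrative and are not taken from the original study.

```python
# Counts as reported in the text; all names here are illustrative only.
negative_total = 16       # voices rated as negative content by external raters
negative_malevolent = 14  # of those, believed malevolent by the voice-hearer
neutral_total = 25        # voices rated as neutral content by external raters
neutral_malevolent = 14   # of those, believed to have some malevolent intent


def percent(part, whole):
    """Return part/whole as a whole-number percentage (rounding half up)."""
    return int(100 * part / whole + 0.5)


pct_negative_malevolent = percent(negative_malevolent, negative_total)
pct_neutral_malevolent = percent(neutral_malevolent, neutral_total)
negative_benevolent = negative_total - negative_malevolent

print(pct_negative_malevolent, pct_neutral_malevolent, negative_benevolent)
```

Read this way, the two negative-content voices believed to be benevolent show that negative content was not sufficient for perceived malevolence, and the malevolent appraisals of neutral-content voices show that it was not necessary.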
Defining negative voice-content is problematic, however. For instance, negative voice-content cannot simply be adjudged from the decontextualized linguistic content of what is said, in the sense of the 'objective' assessments employed by van der Gaag (2003). Indeed, for many linguistic theorists, decontextualized linguistic content cannot truly yield content at all, and it certainly cannot yield the negativity we require. For example, hearing a voice with apparently positive content, such as "you're a good girl", could be experienced as negative if it was said by the voice of someone who had previously abused the hearer. So it seems that the hearer's reaction to what is said needs to be taken into account. But here it is important to distinguish between two different kinds of reaction, one of which is integral to negative content, the other of which is not. We cannot define negative content as simply that which evokes a negative emotional reaction in the hearer. This would both rule in and rule out cases that intuitively we would not want to rule in or rule out. For example, we want to allow that a hearer can experience an utterance with negative content, but learn not to be hurt by it (arguably, techniques like Relating Therapy (Hayward & Fuller, 2010) are trying to achieve just that). In such a case, there still is negative voice content; indeed, that is precisely why the hearer's ability to overcome it counts as an achievement. Conversely, hearing a voice with apparently neutral content (e.g., "get the milk") could result in a negative response in the hearer. This could be due to its troubling frequency or due to social representations of voice-hearing in the West that link it to madness and mental violation (Luhrmann et al., 2015a). Yet we would not want to call this negative content.
These intuitions about what we would and would not want to count as negative content reflect two different kinds of reaction on the part of the hearer: one is the interpretative reaction, which involves ascertaining the content and force of the utterance, and the other is the post-interpretative reaction, namely how the person reacts once the content of the utterance has been established. It is the former reaction that contributes to negative content: the latter does not. To use an example, if someone insults me, there are two quite different reasons why I might fail to be hurt by this. It could be because I fail to interpret it as an insult, or because I interpret it as an insult but (for a number of possible reasons) am not hurt by it (maybe I don't care what the insulter thinks of me). In the former case, I have not experienced negative content (even though it was intended that I should!); in the latter case, I have, but have overcome it. So, negative content is not just decontextualized linguistic content: the very same words can be used to express positive, negative or neutral content. There is nothing intrinsically negative (or positive) about a sequence of words. What is needed in addition is the hearer's interpretative reaction to the utterance. There is one further complication surrounding an adequate definition of "negative content", which is not so much to do with content as with negativity. The simplest case of content negativity involves a case where the hearer is the direct target of negativity (e.g. an insult). However, this need not always be the case in order to generate negative content. For example, a command to do something awful will count as negative content (and will need to be interpreted as such by the hearer) even though the hearer is not the direct target. But as with insults, there are two levels of reaction to negative commands: one interpretative, the other post-interpretative.
With these considerations in mind we can present a solid working definition of negative content. This definition utilises the language of the UK Equality Act 2010 (Chapter 15, Section 26). Borrowing the wording of this Act, we define negative voice content as speech by voices that a reasonable person would interpret as violating their dignity, or creating an intimidating, hostile, degrading, humiliating or otherwise offensive environment.
Examining the common classes of content described as "negative" suggests that what we observe in the content of voices does not seem to be captured by simple language-based definitions of negativity such as the frequency or proportion of negatively valenced adjectives. Most negative voices take the form of name-calling, criticize what someone is doing, and use aggressive or threatening phrases (Nayani & David, 1996). Because the hearer is usually the object of these verbalizations, these seem defined more by their interpersonal characteristics of hostility than by characteristics of the words heard. Such verbally expressed interpersonal exchanges would typically be characterized as attacking or abusive. This framing of voice content as "hostile" or "abusive" rather than "negative" has implications for which aspects of meaning attached to voices are better considered as secondary appraisals and which are more fundamental features of the experience. When Chadwick and Birchwood (1994) highlighted the importance of beliefs about voice malevolence and benevolence, they were conveying that voice hearers experience both an appraisal of the intentionality of voices' verbalizations towards them, and a valence (harm versus benefit) of this. Whilst the belief about intentionality (voices having free will and acting purposefully) appears to be an appraisal of the experience, the orientation of that intent towards harm versus benefit is likely to be primarily reflective of the degree of hostile content (with this perceived intent potentially reciprocally influencing the hostility of the content).
Even if the relation between negative voice-content and distress/problem behaviour were to be mediated by beliefs about voices, effecting enduring, robust, and stable changes to beliefs about voice intentions is unlikely without changing negative voice-content. This is because, as Chadwick and Birchwood (1994) noted, voice-content shapes beliefs, such as perceived malevolence, which are therefore not independent from voice-content. There is currently no evidence that CBT is able to alter the perceived malevolence of voices, which is unsurprising as it is not a stated goal of CBT to alter negative voice-content (Morrison & Barratt, 2009). It is possible that the relatively low effect-sizes of CBT for voice-hearing (Jauhar et al., 2014; van der Gaag et al., 2014) may be, in part, due to attempts to alter beliefs about voices in the absence of effective ways to change the negative content of the voice itself. It hence seems reasonable to suggest that we need to extend and deepen the cognitive model of voice-hearing to be able to reduce the negative content of the voice itself (McCarthy-Jones, 2014). To do this, we need to understand what causes negative voice-content.

What causes negative content?
The first proposal we will examine is that voices can be experienced as having negative content as a function of their origin in negative life experiences, with the contents of AVH being related to "the stresses that precipitate" them (Bentall, 1990, p. 91).

Adverse life experiences
There is extensive evidence of an association between AVH and the presence of earlier traumatic life-events in both clinical (Read & Argyle, 1999; Read et al., 2003; Morrison & Peterson, 2003; Offen et al., 2003) and non-clinical (Shevlin et al., 2007) populations. There is also evidence that the content of AVH may reflect something of traumas that people have experienced, when present. Reiff et al. (2012) interviewed 21 people diagnosed with schizophrenia-spectrum disorders who had experienced physical or sexual abuse as a child. They found that 76% of these patients made links between their child abuse and the content of their hallucinations/delusions. For example, one participant who had suffered child sexual abuse by their step-father later heard the hallucinated voice of the step-father; another who was raped in childhood later heard a hallucinated voice that threatened rape. Similarly, Thompson et al. (2010) found that, in a population at ultra-high risk for psychosis, having a history of sexual trauma increased the odds nine-fold that sub-threshold psychotic experiences with sexual content would be experienced. Likewise, Raune et al. (2006) found that attributes of stressful events in the year preceding psychosis onset were associated with core themes of both delusions and hallucinations. Finally, Corstens and Longden (2013) found that 94% of voice-hearers, the majority of whom had been diagnosed with schizophrenia, had voices whose content could be related to earlier emotionally overwhelming events, with both voices and adverse events often sharing common emotions (e.g. both involving low self-worth, anger, shame, and guilt) and common individuals (e.g. both involving a family member, a past abuser).
Yet there is also evidence that the relation between AVH content and traumatic life-events may not be as strong as suggested above. Hardy et al. (2005) examined the relations between trauma and AVH-content and found that 57.5% of patients with a history of trauma had either direct or indirect (thematic) links between the content of their trauma and the content of their AVH. Not only did the remainder (42.5%) have no clear links between their trauma and their AVH-content, but when the voices of patients with a history of trauma (n = 40) were compared with the voices heard by patients with no trauma history (n = 35), there were no differences between the groups in the levels of threat, humiliation or guilt in the voices. This suggests that, even if trauma is a determinant of negative content, it is not the only factor.
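To make the size of these subgroups concrete, the percentages can be converted back into approximate patient counts. This is a minimal sketch that assumes both percentages refer to the n = 40 patients with a trauma history, as the text implies; the variable names are illustrative.

```python
# Assumption: both percentages are proportions of the n = 40 trauma group.
n_trauma = 40
pct_linked = 57.5    # direct or thematic trauma-voice links
pct_unlinked = 42.5  # no clear links

n_linked = round(n_trauma * pct_linked / 100)
n_unlinked = round(n_trauma * pct_unlinked / 100)

print(n_linked, n_unlinked, n_linked + n_unlinked)
```

Under that assumption the two subgroups account for all 40 patients, and each is small enough that the percentages should be read with corresponding caution.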
Furthermore, there is some evidence that the presence of trauma does not uniquely predict the presence of voices with negative content, but rather predicts the presence of voice hearing per se. Daalman et al. (2012) compared the prevalence of childhood trauma in (i) healthy controls, (ii) people with AVH without a need for care (non-patients), and (iii) patients who heard AVH in the context of a clinically diagnosed psychotic disorder. Five types of childhood trauma were assessed (sexual, physical, and emotional abuse, and physical and emotional neglect). Each was treated as an individual dichotomous variable ('moderate or severe' exposure was either undergone or not). The results revealed that both non-patients with AVH and psychotic patients with AVH experienced more childhood trauma than the healthy control group, but that there were no significant differences in the prevalence of childhood trauma between the two groups experiencing AVH. From this it could be inferred that if levels of trauma do not differ between two groups of voice-hearers that are known to have markedly different levels of negative voice-content and distress, then trauma may not be a predictor of either negative voice-content or distress.
Daalman et al. reported further analyses that explicitly addressed this. They found that none of the five forms of trauma assessed could predict the 'emotional valence of content'. This was a composite measure made up of three PSYRATS-AH items (amount of negative content, degree of negative content, and amount of distress). This could be taken to suggest that trauma may render the person vulnerable to experiencing AVH, and that further explanations are needed as to why trauma sometimes leads to AVH with negative content, sometimes leads to AVH with positive (e.g., comforting) content, and often results in both. However, this study had a number of limitations. First, it only assessed the presence/absence of five types of trauma and used a dichotomised measure of these trauma variables, which limited the power of the study. Second, the inclusion of 'amount of distress' in the outcome measure (emotional valence of content) makes it difficult to interpret this measure as a measure of negative voice-content alone.
A recent study has more directly addressed the question as to whether there is an association between trauma and negative voice-content in people who hear voices. Rosen et al. (2017) examined the relation between childhood adversity and negative voice-content in a clinical sample of 61 patients (48 schizophrenia, 13 bipolar) who heard voices. Adverse childhood experiences were assessed using a continuous variable: scores on the Adverse Childhood Experience (Reavis et al., 2013). Negative voice-content was assessed using the two negative content items (amount and degree) of the Psychotic Symptom Rating Scale (Haddock et al., 1999). A significant correlation (r = 0.44) was found between these two variables: greater levels of childhood adversity were associated with greater levels of negative voice-content. It is also notable that the study found that negative voice-content fully mediated the relation between childhood adversity and voice-distress.
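To put the reported correlation in context, it can help to look at the shared variance it implies and at the precision a sample of this size allows. The sketch below is illustrative only: it assumes that r = 0.44 is a Pearson correlation and that all 61 participants contributed to it, neither of which is stated in the text above, and it uses the standard Fisher z-transformation to approximate a 95% confidence interval. The function name is hypothetical.

```python
import math


def pearson_r_ci(r, n, z_crit=1.96):
    """Approximate CI for a Pearson r via the Fisher z-transformation.

    Assumes bivariate normality; z_crit = 1.96 gives a 95% interval.
    """
    z = math.atanh(r)             # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)   # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


r, n = 0.44, 61                   # values reported in the text; n assumed complete
shared_variance = r ** 2          # proportion of variance shared by the two measures
ci_low, ci_high = pearson_r_ci(r, n)

print(f"r^2 = {shared_variance:.2f}, 95% CI for r = [{ci_low:.2f}, {ci_high:.2f}]")
```

Under these assumptions roughly a fifth of the variance is shared and the interval is wide, which is consistent with the call for replication made elsewhere in this paper.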
Despite their apparently conflicting findings, both the Rosen et al. (2017) study and the study of Daalman et al. (2012) highlight the need to understand what may mediate the relation between trauma and negative voice-content.
In an epidemiological study, Bless et al. (2018) examined a number of aspects relating to first onset of AVH, such as the role of adverse life events at first onset of AVH on symptom severity and general mental health. Participants who reported adverse life events at first onset of AVH (adverse-trigger group; N = 76) and participants who did not report any specific events at first onset of AVH (no-adverse-trigger group; N = 59) were compared on a large array of variables, including emotional content of AVH. Results showed that AVH in the adverse-trigger group were experienced as more emotional (i.e. significantly more negative and positive AVH, and significantly less neutral AVH) compared to the no-adverse-trigger group.

Threat as a mediator between adverse life experiences and negative voice-content
It may be that a continuing sense of threat after adverse life experiences plays a role in generating negative voice-content. One response to trauma that has been proposed to result in negative voice-content is hypervigilance. Dodgson and Gordon (2009) theorize that voice-hearing results from people scanning their environment for threats (due to understandable, salient concerns they have), which in turn encourages auditory false positives; i.e. AVH. For example, Smailes et al. (2015) describe the case of an individual who was involved in a violent confrontation with a gang and began to be hypervigilant for signs that he would be attacked. He then developed AVH that involved hearing threatening comments from people passing his house at night. The negative content of his voices could hence be conceptualized as directly arising from hypervigilance for threats relating to his understandable concerns and worries. There is at least some experimental evidence consistent with this account (Garwood et al., 2015; Dudley et al., 2014). It is notable here, given Deamer and Wilkinson's (2015) conception of the 'speaker behind the voice', that what appears to be driving the perceptions of threats in hypervigilance AVH is the perception of malevolent agents. We will return to the clinical implications of this distinction later.
Hypervigilance AVH may be part of a wider class of AVH representing the intrusion of highly salient material into consciousness. As what is salient to us often involves threats to our physical integrity or social value, this may explain why the content of AVH is so often negative. In particular, it has been argued that experiences of social threat after trauma, signalled through the emotion of shame, may both encourage the development of AVH and colour their content (McCarthy-Jones, 2012, 2017a). The Compassion Focused Therapy (CFT) framework suggests that an inability to mitigate shame, either through an inability to self-soothe, or through a lack of available external soothing (due to, for example, social isolation), could perpetuate this threat (Mayhew & Gilbert, 2008; Gilbert, 2009). By helping people to develop self-compassion and self-soothing, perceived threats can be reduced. For example, when Mayhew and Gilbert (2008) applied CFT to three patients who heard voices they found that CFT "had a major effect on voice-hearers' hostile voices, changing them into more reassuring, less persecutory and less malevolent voices". This suggests that experiences of shame and guilt may underpin some negative voice-content and, crucially, that working with these emotions in therapy can result in changes to the affective valence of voice-hearing.
In addition to threats to one's physical integrity and social value, threats may also arise to one's cognitive integrity. Psychodynamic models of psychosis have long suggested that the content of psychotic experiences might be understood as reflecting underlying conflict or anxiety. A more recent cognitive formulation with parallels to this idea comes from Morrison's cognitive model of voices (Morrison et al., 1995). This suggests that hallucinations may arise when intrusive thoughts and images create cognitive dissonance (Festinger, 1959), resulting in them being externally attributed, or disavowed. Whilst this proposed mechanism might explain why negative material is particularly likely to be manifest in voices, there is, as yet, no direct empirical evidence that voice content reflects material that is regarded as inconsistent with views of self. On the other hand, more recent research has implicated dissociation as a process involved in hallucinations, with one idea being that voices may reflect dissociated parts of self, reflecting either responses to trauma (e.g. Corstens et al., 2008), and/or difficulties integrating different I-positions (Perona-Garcelan et al. 2015). From this perspective, dissociated self-states may be particularly associated with adverse experiences, giving rise to associated negative content. At present, although there is some evidence that dissociation mediates the relation between trauma and hallucinations (Varese, Barkus, & Bentall, 2012), it is unclear whether dissociation predicts hallucination-proneness in general, or specifically negative content.
A related idea, which does not require explanation in terms of defence processes, is that negative voice-content is a manifestation of cognitions related to self-monitoring and self-regulation of behaviour, in a similar way to models of self-criticism (Carver et al., 1999; Higgins, 1987). Self-regulation appears to be manifest in neutral running commentary at some times for some people, and one may speculate that self-critical content, which often coexists with running commentary, may emerge when there are discrepancies between perceived behaviour and internalized goals. Alternatively, threat-related content may emerge in response to an awareness of environmental threats to goals. In support of this there is some evidence that content of voices is thematically linked with internal representations of desired and undesired states (Varese et al., 2016).

Altered emotional processing as a mediator between trauma and negative voice-content
Trauma could also lead to voice-hearing with negative content due to altered affective processing. This term can cover increased sensitivity to negative emotion (neuroticism), increased sensitivity to emotional arousal, and increased use of (maladaptive) emotion regulation strategies. Increased sensitivity to negative emotion and emotional arousal is present in schizophrenia in general (Aleman & Kahn, 2005; Cohen & Minor, 2010), and may be related specifically to a dysfunction in the ability to effectively inhibit or disengage from negative emotions. There is some evidence that such alterations may be specific to voice-hearing, although a note of caution is required, as evidence comes from studies including non-clinical samples and often involves correlational analyses. Studies (e.g., Larøi et al., 2005) report significant correlations between hallucination-proneness and neuroticism, and significant correlations have been observed (e.g. Van t'Wout et al., 2004; Larøi et al., 2008) between hallucination-proneness and a high degree of "emotionalizing" (the degree to which someone is emotionally aroused by emotion-inducing events). These findings suggest that hallucinations may be related to an increased sensitivity for emotional arousal. Indeed, there is also considerable evidence showing that voice-hearers with depressed mood (for which neuroticism is a major risk factor) are more likely to have clinically relevant hallucinations, i.e. those associated with distress and dysfunction and occurring in the context of a psychiatric diagnosis (Krabbendam et al., 2005; Smith et al., 2006).
Evidence of atypical emotional processing in relation to AVH also comes from neuroimaging studies. Escarti et al. (2010) used fMRI to examine people diagnosed with schizophrenia with and without AVH, as well as healthy controls. Participants were asked to listen passively to lists of both emotional and neutral words. The emotional words were selected according to their frequency of occurrence in the patients' hallucinations. As such, the emotional words here were all negatively valenced words. Results revealed that patients with AVH showed significantly greater activation in the amygdala and the parahippocampal gyrus compared to the other patient group and controls during the processing of emotional words. Similar results are reported in Sanjuan et al. (2007). However, the reverse pattern (hypoactivation in the amygdala) in patients with AVH was reported in Kang et al. (2009), although this study used a different design: participants listened to emotional sounds (e.g. laughing, crying) and were asked to identify the gender of the voice.
AVH with negative content may be related to the use of maladaptive emotion regulation strategies. Studies have linked emotional suppression to the severity of voices, although not to negative content specifically. Within a group of people diagnosed with schizophrenia with AVH, Badcock et al. (2011) reported that an increased use of expressive suppression (as assessed with the Emotion Regulation Questionnaire; Gross and John, 2003) was significantly correlated with severity of AVH (frequency, duration, loudness), and that this was independent of levels of anxiety and depression, and medication. The authors conclude that "…the precise mechanism underlying the association between emotional suppression and severity of AH is, as yet, unclear" but they suggest that using suppression may limit available executive resources (e.g. lack of inhibitory control). Similarly, there is evidence from non-clinical samples (Laloyaux et al., 2016) that maladaptive emotion regulation strategies mediate the relation between adverse life events and hallucination-proneness. If and how emotional suppression could result in the dissemination of emotions into the contents of AVH is currently not known and therefore needs to be directly examined in future studies. However, an association between voice content and mood is consistent with the clinical observation that content is frequently mood-congruent in persons with affective psychoses (see Toh, Thomas & Rossell, 2015). One possibility (cf. Larøi, 2012) is that in the face of highly emotional/stressful life-events, patients with AVH are not optimally equipped to deal with these and therefore (in addition to other factors) may, for example, use maladaptive emotional regulation strategies to cope with them. One consequence of this may be allowing these events or situations to have an important influence on the content of their AVH (cf. Badcock et al. 2011). The contrary (i.e. use of adaptive emotional regulation strategies) would thus be considered as a protective factor, not necessarily resulting in the absence of experiencing AVH but rather in reducing the chances that these life-events will have an important impact on the content of their AVH. This claim remains to be directly examined.

Schemas as a mediator between trauma and negative voice-content
Schemas (generalized cognitive representations of prior experience) of self and others have been proposed to underpin the development of negative content. Some models have proposed that schemas may also directly influence voice content (e.g., Beck and Rector, 2003;Paulik, 2012), and they hence have the potential to mediate between traumatic events and voices with negative content. Although Smith et al. (2006) found that clinical voice-hearers with greater levels of depression and lower self-esteem heard voices with more intense negative content, they found negative content to be unrelated to negative evaluative beliefs. Similarly, Thomas et al. (2015) found, in a clinical sample of voice-hearing patients, that there were no correlations between negative (or positive) self- or other-schemas and the amount or degree of negative voice content. This suggests that, whilst the degree of negative content experienced may be related to emotional state (as discussed above), it does not simply reflect the strength of negative schemas. However, it should be noted that the association between schemas and voice content has only been examined at a broad level: internalized representations of specific others or more specific situations may still pertain to voice content.

Neurophysiology
We have previously examined an explanation in which voices with negative content arise from current or past emotional events/concerns in the person's life. An alternative framework is one in which negative voice-content does not necessarily arise from the person's past or present concerns. One model within this framework proposes that the negative emotional valence of AVH results from the activation of right Broca's area (Sommer et al., 2008), and that speech produced by this area has an inherent tendency to have negative affect. Building on their research group's neuroimaging studies showing right Broca's area activation during AVH (Sommer et al., 2008), Sommer and Diederen (2009) have proposed an account of voices grounded in the activation of the right hemisphere homologue of Broca's area. They note phenomenological similarities between AVH and the speech that can be produced by the right hemisphere in patients with severe aphasia, which often takes the form of repetitive, simple 'automatic speech' utterances with little variation, frequently consisting of terms of abuse or swear words. It is possible that this relation pertains because right Broca's area plays a role in inhibitory processes (Sommer et al., 2008).
Neuroimaging studies have implied a role for the amygdala and connected emotional networks in AVH. For example, in Escarti et al. (2010), a study described above, patients with AVH showed significantly greater activation in the amygdala and the parahippocampal gyrus during the processing of negatively valenced emotional words, compared to patients without AVH. In contrast, during the resting state, Vercammen et al. (2010) reported reduced connectivity of the amygdala with the left temporoparietal region (an area of speech perception cortex typically found to be overactive during AVH). In a similar vein, Van Lutterveld et al. (2014) found that the amygdala functioned less as a hub region in resting state networks in nonpsychotic subjects with AVH. Interestingly, a recent glucose-PET imaging study measured brain metabolism during the perception of aversive auditory stimuli mimicking the content of AVH in remitted patients with schizophrenia who previously reported severe AVH during an acute psychotic episode (Horga et al. 2014). The patients showed an exaggerated response to the AVH-like stimuli in limbic and paralimbic regions, including the left amygdala, in comparison to healthy control subjects.
It remains an open question as to whether such neural changes represent an instantiation at the neural level of the subjective experience of threats, discussed in section 3.1, or whether these are endogenous changes which produce negative content in the absence of perceived external threats.

Culture
It has been suggested that culture shapes hallucinations across multiple levels: prevalence, content, and meaning. Very few studies have directly examined whether the affective nature of AVH content may be influenced by culture and the immediate environment. Mitchell and Vierkant (1989) compared two samples of patients with schizophrenia from the same hospital but who were hospitalized during two distinct periods: those admitted from 1933 to 1939 and those admitted from 1986 to 1987. The files reporting delusions and/or hallucinations were then examined. Differences were found regarding the primary sources of the AVH. The 1930s group reported sources that were religious in nature (God, the Holy Ghost, spirits) while the 1980s group reported sources that were religious, secular and technological (God, devils/demons, doctor, scanner, television, radio). Differences were also observed concerning the content of AVH, in particular for command hallucinations. In the 1930s group these were benign and religious ("be a better person", "live right", "be good and go to heaven", "lean on the Lord"), whereas in the 1980s group they were negative and destructive ("kill oneself/others", "do perverse things", "set fire to the lawn"). Finally, there were reported differences in terms of the affective content of the AVH: "We want to go to heaven" versus "You will be crucified for your sins". The cultural milieu may hence increase the probability of AVH having negative content.
Further evidence for this proposition comes from a landmark study by Luhrmann et al. (2015a, 2015b) that compared AVH among people with schizophrenia in three settings (San Mateo, California; Accra, Ghana; Chennai, India). This found that negative content of AVH may vary according to culture, and that culture may play a role in "colouring" the content of AVH. In general, the American sample did not treat their voices as persons, their accounts of voice-hearing were filled with violence and they disliked their voices. Patients in Chennai and Accra, by contrast, did not experience voice-hearing as necessarily bad and were more likely to both identify voices as people they know and to describe conversational relationships with their voices. The authors (Luhrmann et al., 2015a, 2015b) suggest that cultural expectations about the mind may shape the voice-hearing experience of persons with psychotic disorder. Western culture tends to view mind (and self) as a separate, private place, whereas non-Western cultures are more likely to imagine mind and self as interwoven with others. Thus, the American patients in their study may have felt assaulted by voices, and may have perceived them as an intrusion into their private world, as violations of the mind. In contrast, patients with a more "porous" view of mind/self, as may have been the case for the patients from India and Ghana, were not as troubled by these uncontrollable experiences, as they might have, for example, interpreted them as people who cannot be controlled, thus interpreting voices as relationships and not as the sign of a violated mind.

Negative engagements and relationship with voices
Given that it is common for people to engage with hallucinations via subvocal (and sometimes overt) verbal interaction (Leudar et al., 1997), negative content often results in the person attempting to cope with these experiences via hostile responses such as telling them to go away or shouting at them (Thomas, McLeod & Brewin, 2009). This may lead to hostile dialogue evolving into regular patterns of negative interaction with voices, which over time may reinforce cognitive structures associated with negative content (Thomas, 2015). Whilst negative voice-content may lead to a negative relationship between the person and their voice, which may reinforce negative voice-content, it may also be the case that a negatively perceived relationship with a voice itself drives negative voice-content. In order to establish this, it would need to be tested whether or not changing one's relationship with a voice can lead to changes in its content. There is some preliminary evidence that this may be the case (Corstens et al., 2014;Longden, 2013). For example, Longden (2013) describes the results of being compassionate towards the very hostile and angry voices she heard: "Awful things have happened to you," I said to them [the voices] one day, "and you've carried all the negative emotions and memories. And all I've ever done in return is attack and criticize you. It must have been really hard to be so vilified and misunderstood." There was a very long pause before a response finally came: "Yes. Thank you." This process of accepting and changing one's relationships with voices has been promoted as an important means of adapting to the experience within the Hearing Voices Movement (Corstens et al., 2014), and interventions promoting less negative forms of relating to voices have shown some promise (Hayward et al., 2016;Hayward & Fuller, 2010). Further evidence is still needed of the effectiveness of such interventions in causally impacting upon negative voice-content.

Discussion
Negative voice-content plays an important role in determining ensuing distress and need for care in someone who hears voices. The findings of this paper suggest the following conceptualization. Voice-hearing is an experience in which some cognitions come to be experienced as alien and audible. Negative emotional material may be more likely to manifest in this way, due to the tendency for intrusion into awareness to result from incompletely processed traumatic events (i.e. those that are processed in a dissociative manner), and the salience of cognitions pertaining to physical or social threats. The exact verbalizations heard will likely be shaped by current environment, culture, and past experiences, although not necessarily the overall degree of negativity. Such negative voice-content may then be reinforced by the person's response, such as arguing back with hostile material, and by rebound effects from attempts to suppress (Wegner, 1989). Cultural differences noted might reflect reactions to voice content which result in greater attempts to resist or suppress voices in Western cultures. Negative content then encourages distress and impairment in the hearer. These mechanisms and factors are illustrated diagrammatically in Fig. 1.
It should be noted that the diagram presented in the Figure is meant to act as a framework to guide future research, rather than being a validated theoretical model of the aetiology of AVH. The evidence for the pathways illustrated in this Figure is of a variety of qualities, ranging from speculation to replicated empirical findings. For example, the potential link between right Broca's area and the negative content of AVH is a speculation made in light of the finding that right Broca's area was activated during some patients' AVH (Sommer et al., 2008). Whether scanning the environment for threats (hypervigilance) is a potential factor in generating negative voice-content has only been examined by one research group and their findings are in need of replication. Likewise, the hypothesis that culture may colour the (negative) content of AVH has yet to be properly examined. This stands in contrast to other relations in our framework that have been widely replicated. This includes the association between adversity and negative affect (e.g., Lindert et al., 2014;Norman et al., 2012) and the association between negative affect, emotional regulation strategies such as dissociation and suppression, and intrusions (McNally and Ricciardi, 1996;Wenzlaff et al., 1988). It is thus clear that future studies are needed to examine the mechanisms that we propose play a role in the development of negative AVH content.
F. Larøi et al. Clinical Psychology Review 67 (2019) 1-10 The potential contributors to negative voice-content identified by this review can be seen to fall into two distinct classes. The first are what could be termed negative meaning structures, such as negative thoughts, schemas, mood, traumas, and the wider culture. A causal role for such factors would be assessed through studies of individual differences in content among voice hearers. However, beyond this, we have seen intimations that there may be something about the underlying mechanisms that produce voice-hearing (regulation/inhibition problems, hypervigilance) that means that negative emotional material is more common. This question will be informed by studies of the typical content of hallucinations, and by comparing cognitive and neural processing in hallucinators versus nonhallucinators.
A prediction of this latter observation is that voice-hearing is more likely to have negative content than positive or neutral content. To assess if voice-hearing is more likely to have negative than positive content (ignoring neutral content for the purposes of simplification), we would need to know: 1) the prevalence of negative and positive voice-content in clinical populations, 2) the prevalence of negative and positive voice-content in non-clinical populations, and 3) the ratio of clinical to non-clinical voice-hearing.
Items #1 and #2 can be estimated. The prevalence of positive and negative voice-content in clinical populations, if it were approximately the same as in schizophrenia (an unexamined assumption, in terms of available empirical evidence), could be estimated as 50% and 80% respectively (e.g., Nayani & David, 1996). The prevalence of positive and negative content in the non-clinical population can be estimated from a recent study by Kråkvik et al. (2015) which found that, of people in the general population who had never sought professional help for the voices they heard, 29% had heard voices with negative content, and 48% had heard voices with positive content. It is interesting to note from Kråkvik et al.'s preliminary study that the ratio of negative:positive voice-content in clinical and non-clinical voice-hearers appears to be inverted (being approximately 5:3 in clinical hearers and 3:5 in non-clinical hearers). However, more studies are required to gain a reliable estimate of the ratio of negative:positive voice-hearing in the general population.
It is hard to estimate item #3 (the ratio of clinical to non-clinical voice-hearing). First, the prevalence of voice-hearing in the non-clinical population is unclear. Johns et al. (2004) found that, after excluding people with probable psychosis, 0.7% of people reported the experience of having heard voices saying quite a few words or sentences in the past year. However, they did not report on the frequency of this experience. In contrast, Kråkvik et al. (2015) found that 0.3% of people in the general population heard voices daily without seeking help, and 0.9% heard voices at least several times a week, but did not assess the extent of these experiences, which could simply have been hearing one's name called (a voice-hearing experience of limited comparability to that found in clinical populations). Based on such figures, it has been argued that the prevalence of clinical voice-hearing exceeds that of non-clinical voice-hearing (McCarthy-Jones, 2017b); however, more data are needed before we could be certain of this, and hence before we could conclude that voice-content is inherently more likely to be negative.
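The dependence of this conclusion on item #3 can be made concrete with a short calculation. The following is a minimal sketch, not an analysis performed in this review: it assumes that the pooled prevalence of each content type is a simple weighted average of the clinical and non-clinical figures quoted above (80%/50% and 29%/48%), and the function and variable names are illustrative only. Because a hearer may report both negative and positive content, the two pooled figures are not mutually exclusive proportions.

```python
# Illustrative sketch: how the clinical share of voice-hearers (item #3)
# determines whether negative or positive content is more prevalent overall.
# Prevalence figures are those quoted in the text; the weighted-average
# pooling is an assumption made here for illustration.

NEG_CLINICAL, POS_CLINICAL = 0.80, 0.50        # e.g., Nayani & David (1996)
NEG_NONCLINICAL, POS_NONCLINICAL = 0.29, 0.48  # Kråkvik et al. (2015)


def pooled_prevalence(p_clinical, p_nonclinical, clinical_share):
    """Prevalence among all voice-hearers, weighting each group by its share."""
    return clinical_share * p_clinical + (1.0 - clinical_share) * p_nonclinical


def break_even_clinical_share(neg_c, pos_c, neg_n, pos_n):
    """Clinical share at which pooled negative and positive prevalence are equal."""
    clinical_gap = neg_c - pos_c      # excess of negative content in clinical hearers
    nonclinical_gap = pos_n - neg_n   # excess of positive content in non-clinical hearers
    return nonclinical_gap / (clinical_gap + nonclinical_gap)


threshold = break_even_clinical_share(
    NEG_CLINICAL, POS_CLINICAL, NEG_NONCLINICAL, POS_NONCLINICAL
)
print(f"Break-even clinical share: {threshold:.3f}")

# Hypothetical clinical shares, chosen only to show both sides of the threshold.
for share in (0.2, 0.4, 0.6):
    neg = pooled_prevalence(NEG_CLINICAL, NEG_NONCLINICAL, share)
    pos = pooled_prevalence(POS_CLINICAL, POS_NONCLINICAL, share)
    print(f"clinical share {share:.1f}: negative {neg:.3f}, positive {pos:.3f}")
```

Under these assumptions, negative content would be the more prevalent type overall only if clinical hearers made up more than roughly 39% of all voice-hearers (0.19/0.49), which illustrates why a reliable estimate of item #3 is needed before drawing a conclusion.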
There are a number of potential implications of this review for reducing distress and dysfunction driven by negative voice-content. Interventions to date have primarily attempted to effect change via targeting appraisals and responses to voices. These may have a role in helping the person break out of patterns of responding to negative content that may maintain the activation of negative meaning structures, such as immersion in the experience, hostile responses, or compliance (Hayward et al., 2016;Thomas, 2015). However, there is a need to go beyond this and consider factors that may reduce the occurrence of negative content more directly (McCarthy-Jones, 2014). Traditional methods of cognitive restructuring and working with negative cognitions and schemas may apply, but these have had limited direct study in people with persisting voices. However, there has been study of intervention approaches aiming to reinforce positive self-schemas. An approach to increase accessibility of positive self-esteem had beneficial effects on positive psychotic symptom severity in a small trial with persons with a schizophrenia-related diagnosis (Hall & Tarrier, 2005), although specific measures of voices were not obtained. In another trial, a competitive memory training approach was used that aimed to reinforce positive memories that were inconsistent with negative voice content (Van der Gaag et al., 2012). This trial found positive outcomes for depressive symptoms but the observed effect on overall voice severity (d = 0.30) was not significant. The extent to which these approaches lead to changes in negative self-representations is unclear, and it may be that they primarily buffer against negative voice content, allowing the person to decentre from it.
To more explicitly target potential changes in negative voice content, progress may also be made by considering the observed associations with trauma. A key direction is to empirically test the idea that trauma-focused interventions can reduce the negative content of AVH and, if so, to explore what specific parts of such a therapy lead to this change. If negative content arises from intrusion of trauma-related material, exposure-based interventions that involve reprocessing of associated memories and cognitive structures to reduce intrusions could be hypothesised to reduce the frequency of negative voice experiences. There is emerging evidence that exposure-based trauma-focused therapies might have a beneficial secondary impact on psychosis symptoms in persons with comorbid post-traumatic stress disorder (de Bont et al., 2016). However, there is need for focused study of the effects of trauma reprocessing on negative voices as a primary target.
Additional therapeutic strategies include targeting factors which modulate the situational activation of negative meaning associated with voices. These are likely to vary from person to person, but may include shame-triggering stimuli, anxiety, and mood variation. Such strategies could include providing voice-hearers with ways of reducing the sense of ongoing threat in their lives, or helping people use more adaptive emotion regulation strategies (and reduce the use of maladaptive ones). Whilst these skills are already often taught for emotion regulation to reduce distress, more targeted changes in the occurrence of negative content may arise from the explicit application of these methods at the emergence of voice-hearing episodes, or to cope with common triggers of these episodes. Recognition of links between voices and environmental and affective antecedents might be fostered by advances in self-monitoring technologies such as smartphone apps. Additionally, people with elevated experience of shame, guilt or self-blame (in general and/or directly related to AVH experience) may benefit in particular from methods developed in Compassion Focused Therapy (Braehler et al., 2013). Considering interventions more broadly, there is a need to examine if neurostimulation of right Broca's area may also be able to reduce negative content. Finally, there is the need to consult further with people who hear voices themselves, in order to draw upon their insights into the causes and maintenance factors of their voices.

Role of funding sources
No funding was provided for the writing of the manuscript.

Contributors
All authors contributed significantly to the writing of the manuscript and all authors have approved the final manuscript.