Multiplicity in the experience of voice-hearing: A phenomenological inquiry

J.C

Given its considerable complexity, understanding the phenomenology of voice-hearing is essential to developing a satisfactory explanatory theory (Larøi, 2006) or theories (McCarthy-Jones et al., 2014a) of the experience. For example, McCarthy-Jones, Thomas, et al. (2014a) noted that voices "may be … accusing or enthusing, individual or chorus, spoken or sung, recognized acquaintance or anonymous interlocutor, memories of words past or virgin encounters, stilted-repetitive or novel-creative, heard inside the head or perceived in the world, and spoken to or about the person who hears them" (p. S275). Furthermore, it is well known that individuals often hear a number and variety of different voices. Yet phenomenological studies have not provided detailed descriptions of the content and nature of multiple voices experienced by the same person, despite noting and sometimes enquiring about their existence (Carter et al., 1995; McCarthy-Jones et al., 2014b; Nayani and David, 1996; Stephane et al., 2003; Woods et al., 2015). Conclusions about mechanisms, subtypes, or risk factors for hearing voices may not be valid unless this multiplicity is adequately reflected in the data. In the current study we therefore sampled several utterances from several voices reported by each participant. We investigated the degree of consistency within voices and within utterances, and tested hypotheses about risk factors when taking this multiplicity into account.
Numerous theories have proposed that the experience of hearing voices involves a misattribution of an inner event (i.e., inner speech) to an external source. However, this approach does not readily account for the heterogeneity of the voices heard (Larøi, 2006) or the interpersonal dynamics of the experience (Longden et al., 2019), and has been somewhat superseded by an attempt to describe subtypes of voices (Stephane et al., 2003). One cluster analysis distinguished between (a) repetitive, constant, commanding, and commenting voices, (b) voices that appeared to replay previously heard words or conversations, and (c) voices that did not address the person, spoke in the first person, and were similar but not identical to words or conversations that had previously been heard (McCarthy-Jones et al., 2014b).
However, in the absence of more detailed information it is hard to establish whether such variability is best expressed in terms of subtypes or dimensions (McCarthy-Jones et al., 2014a). There is also the potential issue that voices might make quite different types of utterance on different occasions. This would suggest that the same voice had the capacity to reflect more than one subtype. Some degree of consistency across multiple utterances is required in order to establish the presence of the kind of stable subtype described above.
In terms of clinical categorisation, voice-hearing is particularly associated with a diagnosis of a psychotic disorder (Aleman and Larøi, 2008; Sommer et al., 2010), although it has also been reported in 5.4% of a general population sample of young adults (Zammit et al., 2013), and in 10-15% of the general population over a lifetime (Tien, 1991). Voice-hearing is also highly prevalent in people diagnosed with non-psychotic disorders such as dissociative identity disorder (Ross et al., 1990; Sar and Öztürk, 2008), posttraumatic stress disorder (Brewin and Patel, 2010; McCarthy-Jones and Longden, 2015), and borderline or emotionally unstable personality disorder (Kingdon et al., 2010).
A number of differences have been found between the voices heard by individuals with a diagnosis of psychosis and those heard by individuals in the general population, including more negative content (Larøi et al., 2012; Waters and Fernyhough, 2017; Woods et al., 2015), along with greater frequency and less controllability (Daalman et al., 2011; Waters and Fernyhough, 2017). In patient populations voices tend to be more intrusive and ego-dystonic (Sorrell et al., 2010) and begin at an earlier age than in non-patients (Daalman et al., 2011). In contrast, comparisons of voices reported by individuals with different diagnoses have provided little evidence for any reliable differences in their characteristics (Larøi et al., 2012; Slotema et al., 2012; Waters and Fernyhough, 2017; Woods et al., 2015). However, these comparisons also rely on the assumption that voices are consistent in their content. If they made different kinds of utterance on different occasions this could qualify our understanding of the relationship between voice quality and the presence of various kinds of diagnoses.
In the current study respondents who reported hearing voices were asked to describe up to five voices and give up to five examples of what each voice said to them. The first objective was to investigate whether respondents reported multiple voices and, if so, to document the characteristics of each voice, such as their age, gender, familiarity, and the number and content of their utterances. It has been observed that many voice-hearers do not merely describe hearing a voice but having a relationship with it (Chadwick and Birchwood, 1994; Hayward, 2003). This could be related to specific syntactic elements of what is spoken, and we coded whether voices ever referred to themselves in the first person (McCarthy-Jones et al., 2014b).
The second objective was to use these descriptive data to test predictions derived from existing theories. It was hypothesized that, if voices represent inner speech, (a) the perceived gender of the voice would correspond to that of the respondent and (b) the age attributed to the voice would not exceed that of the respondent. Further, if voices represent replays of previous experience, it was hypothesized that some of them would be expected to be identified as familiar. Finally, if voices can be readily assigned to subtypes, it was hypothesized that content would be relatively stable across different sample utterances. Stable patterns could confirm existing subtypes or suggest alternative candidates.
The amount of criticism and hostility expressed by voices has also been linked to distress in the voice-hearer, dependent on how messages are appraised (Chadwick and Birchwood, 1994; Thomas et al., 2009). The third objective was therefore to associate features of individual voices (such as their gender, their perceived age, congruence with the respondent's gender, presence of triggers for voice-hearing, and the age they were first heard) with the overall negativity of the voice as assessed by independent raters. No specific hypotheses were formulated.
The final objective was to determine whether overall voice negativity was simultaneously related to characteristics of the person (their gender and mental health status) and to characteristics of the voices (whether or not they were of the same gender as the person hearing them, the age they were first heard, and the presence of triggers to voice-hearing). Based on previous work it was hypothesized that voices would be more negative in content among those diagnosed with psychosis than among those without a diagnosis, but no prediction was made for the comparison between those diagnosed with psychosis and other mental health conditions. The analyses were repeated with two other dependent variables, whether or not voices referred to themselves and utterance length.

Survey procedure
Data were collected via an online survey that was advertised for anonymous completion via service-user websites for hearing voices (Intervoice and the National Survivors Network). Social media accounts, created for the study, additionally advertised the survey on Facebook and Twitter. The study was further advertised by contacting the administrators of peer-support groups throughout the UK and asking them to share the advertisement for the survey amongst their members.
The study was described to potential participants as follows: "This study aims to document and analyse the content of voice hearing experiences (sometimes referred to as 'auditory verbal hallucinations'). By 'hearing voices' we mean the experience of having a strong perception of hearing a voice that is not identified as being your own internal voice, and does not originate from another person nearby. Although many people in the general population have this experience from time to time, it is still not well understood and in many people's minds may be associated with mental health difficulties". Participants provided informed consent before completing the survey. Ethical approval was granted by the UCL Research Ethics Committee (project ID number 8279/001). The survey was available for completion from December 2015 to May 2019.

Survey materials
After answering demographic questions participants were asked whether they had ever received a mental health diagnosis, and if so to state any diagnoses received. For analytic purposes these were classified as psychosis versus other diagnoses. They were then asked how many voices they heard and answered questions about up to five of the most prominent voices. For each voice, participants stated its perceived gender and age, if they were able to do so; reported at what age they first remembered hearing the voice; indicated whether or not they perceived the voice to have changed over time; stated whether or not the voice was of a familiar person and indicated who this was; described whether internal conversations were initiated by themselves or by the voice; stated whether or not the voice was triggered by particular circumstances; and gave up to five verbatim examples of what they heard the voice say.

Data analysis
Based on repeated inspection and discussion of a sample of the data, a coding manual was derived which contained six general categories. The content of each individual utterance was then coded by AM into one of the six categories: negative (consisting of criticism, derision, hostility, threats, orders, and instructions to harm self or others), positive (consisting of encouragement, reassurance, expressions of concern, and constructive advice), neutral (consisting of mixed positive and negative material, or material that was not clearly one or the other, general questions or statements, and survey-related comments), delusional (consisting of material with paranoid or other similar content), no clear meaning (e.g., phrases or sentences that did not make sense, non-words, numbers or calculations, and non-speech vocalisations such as humming), and uncodeable (survey entries that did not describe actual voice content but, for example, described the voice or its behavior). Examples are given in Table 1. An independent rater (CRB) coded a sample of 136 separate utterances, obtaining agreement of 78% (kappa = .69). On whether utterances included the voices referring to themselves (simplified yes/no) there was 91% agreement (kappa = .78).
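The agreement figures above pair raw percentage agreement with a chance-corrected kappa. The following is a minimal sketch of how the two quantities relate, assuming Cohen's kappa for two raters (the text says only "kappa"). The function name and the six example codes are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length lists of codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of utterances")
    n = len(rater_a)
    # Proportion of utterances on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Degenerate case: both raters used a single identical category throughout.
        return observed, 1.0
    return observed, (observed - expected) / (1 - expected)


# Hypothetical codes for six utterances (illustration only, not study data).
coder_1 = ["negative", "negative", "positive", "neutral", "negative", "positive"]
coder_2 = ["negative", "neutral", "positive", "neutral", "negative", "positive"]
agreement, kappa = cohens_kappa(coder_1, coder_2)
```

Because kappa discounts agreement expected by chance, it is lower than the raw percentage whenever some categories are used much more often than others, as with the negative category here.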
Analyses were conducted at two separate levels, that of individual voices and that of the person. First, the characteristics of individual voices were described, including their age, gender, familiarity, and the number and content of their utterances. An additional variable was created to indicate whether the perceived gender of the voice was congruent with the participant's own gender. Treating each voice as an independent observation, negativity was calculated as the number of utterances coded as negative divided by the total number of utterances for that voice. This index was then correlated with other voice characteristics. The association of negativity with the categorical variable perceived age of the voice was tested using 1-way ANOVA and post-hoc Dunnett's T3 tests allowing for unequal variances.
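The two derived voice-level variables described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the field names, category labels, and example voices are hypothetical, and the denominator is assumed to be every coded utterance supplied for a voice.

```python
def voice_negativity(utterance_codes):
    """Proportion of a voice's coded utterances that fall in the 'negative' category."""
    if not utterance_codes:
        raise ValueError("a voice needs at least one coded utterance")
    negative = sum(code == "negative" for code in utterance_codes)
    return negative / len(utterance_codes)


def gender_congruent(participant_gender, voice_gender):
    """True or False when both genders are known; None when either is missing."""
    if participant_gender is None or voice_gender is None:
        return None
    return participant_gender == voice_gender


# Hypothetical voices (illustration only, not study data).
voices = [
    {"participant_gender": "female", "voice_gender": "male",
     "codes": ["negative", "negative", "neutral", "positive"]},
    {"participant_gender": "female", "voice_gender": None,
     "codes": ["positive"]},
]
summary = [
    (voice_negativity(v["codes"]),
     gender_congruent(v["participant_gender"], v["voice_gender"]))
    for v in voices
]
```

Returning None for congruence when a gender is unknown keeps such voices out of the congruent versus incongruent comparison instead of counting them as incongruent.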
Second, at the level of the individual person, the number of voices reported was related to gender and mental health status. Multi-level analysis was then employed to reflect the hierarchical structure of the data with utterances nested within voices nested within individuals. These analyses investigated the simultaneous effects of five potential risk factors (diagnosis of psychosis versus other diagnosis versus no diagnosis, participant gender, voice gender incongruence, reports of triggers for voice-hearing, and the age voice was first heard) on three dependent variables (utterance negativity, utterance self-reference, and utterance wordcount). Intra-class correlation coefficients were calculated within multi-level analyses to assess the degree of agreement separately at the participant and at the voice levels.
The eventual sample of 92 was 78% female (n = 69) with an average age of 40.41 (SD = 12.30). The majority (53.5% of those answering; n = 46) had completed an undergraduate or postgraduate degree. Most came from the United Kingdom (40.2%; n = 37) or the United States (35.9%; n = 33). Some reported having not received any psychiatric diagnosis (13.3%; n = 12); 65.6% (n = 59) had received a diagnosis involving a psychotic disorder and the remaining 21.1% (n = 19) had received other, non-psychotic diagnoses. Most (58.9%; n = 53) reported currently taking psychotropic medication.

Voice-level analyses
Number and type of utterance. Of 228 individual voices with at least one valid sample utterance, 182 (79.8%) included a second utterance, 158 (69.3%) a third utterance, 127 (55.7%) a fourth utterance, and 82 (35.9%) a fifth utterance. The mean number of utterances reported for each voice was 3.41 (SD = 1.55). Table 2 indicates how the utterances were coded. Negative utterances were most common, followed by neutral and positive utterances. This ordering was maintained no matter how many sample utterances were provided. Smaller numbers of utterances contained explicitly delusional content or had no clear meaning.
Demographic characteristics of voices. The gender attributed to individual voices was predominantly male (53.9%, n = 123) rather than female (29.8%, n = 68). In 15.8% (n = 36) of cases, the respondent was not able to attribute a gender. Where respondents assigned a gender to both themselves and the voice, gender was incongruent in 58% (n = 109) of cases and congruent in 42% (n = 79). The age attributed to the voice was as follows: child (7.5%; n = 17), adolescent (6.6%; n = 15), young adult (10.1%; n = 23), adult (36.0%; n = 82), middle-aged (16.7%; n = 38), and elderly (4.8%; n = 11). In almost one fifth of cases (18.4%; n = 42) respondents were unable to attribute an age to the voice. A total of 34 respondents (37.0%) indicated that they had voices to which they attributed different ages (e.g., child and adult) whereas 17 (18.5%) reported hearing a voice with a markedly greater age than their own.
Other characteristics of voices. The earliest age at which the voice was heard spanned the entire age range, from less than five years old to 70 years. There were no apparent peaks at any particular life stage. In 59% of cases (n = 134) respondents reported that the voice had always been the same, whereas a change in the voice was reported in 41% (n = 93) of cases. In 71.9% (n = 164) of cases, the respondent did not associate the voice with a familiar person. In the other 28% of cases there was a wide variety of responses, some nominating a family member, a friend, a teacher, a therapist, or an acquaintance, while others had a feeling of familiarity without being able to nominate an individual. Specific situational triggers were reported for 59% (n = 134) of the voices. Finally, in 45.6% (n = 104) of cases respondents indicated that the voice initiated their interactions, whereas they reported initiating the interactions themselves in only 6.1% (n = 14) of cases. In the remaining 48.2% (n = 110) of cases respondents were either unable or unwilling to answer this question.
Correlates of voice negativity. Negativity was then associated with the other variables describing each voice (voice gender, age first heard, voice changed over time, voice was of a familiar person, voice initiated conversations, specific situational triggers). The only significant correlations were with age first heard, r(227) = −0.14, p = .041, and with the voice sounding like a familiar person, r(228) = 0.14, p = .034. Greater negativity was associated with first hearing the voice at a younger age and with greater familiarity. Results for the categorical variable measuring the perceived age of the voice are shown in Table 3. The overall ANOVA was significant, F(5, 180) = 3.82, p = .003, partial eta squared = 0.096. Child voices were significantly less negative than all other voices except the elderly (p < .05); the other voices did not differ from each other.

Table 1. Example utterances.
Coding category: Examples
Negative: "she's doing it all wrong"; "you disgust me"
Positive: "we want to help you"; "you should go to sleep"
Neutral: "go check outside"; "you're a failure … I do love you though"
Delusional: "the mindreaders are coming to get you"; "smash the satellites"
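The effect size reported for the perceived-age ANOVA can be cross-checked from the F statistic and its degrees of freedom. For a one-way design, partial eta squared equals F × df_effect / (F × df_effect + df_error). The short sketch below applies that identity to the values given in the text; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both degrees of freedom positive")
    effect = f_value * df_effect
    return effect / (effect + df_error)


# Values reported in the text: F(5, 180) = 3.82.
eta_sq = partial_eta_squared(3.82, 5, 180)
print(round(eta_sq, 3))  # 0.096
```

The recovered value agrees with the reported partial eta squared of 0.096.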

Person-level analyses
Demographic characteristics. The total number of voices reported to have been heard was not significantly associated either with the respondent's sex (male mean = 4.41, female mean = 3.79, t(77) = 0.73, p = .465) or with their diagnostic status (psychosis mean = 4.04, other diagnosis mean = 4.24, no diagnosis mean = 2.00, F(2, 76) = 2.19, p = .119). The mean number of different voices that were described in detail for the purposes of the survey (maximum of five) was 2.47 (SD = 1.38). A total of 42 (45.7%) respondents had at least one voice that made exclusively negative utterances and 12 (13.0%) had at least one voice that made exclusively positive utterances. Voices that mentioned themselves at least once were reported by 58 (63.0%) respondents.
Risk factors. A multi-level analysis investigating potential risk factors for utterance negativity (Table 4a) found one significant univariate effect, such that voices first heard at younger ages were more likely to be negative. This was no longer significant when the other potential risk factors were controlled for. Multivariate analysis confirmed the prediction that voices reported by respondents with a diagnosis of psychosis would be rated as more negative than voices heard by those with no diagnosis. Intra-class correlations indicated high correlations in negativity among the utterances within each voice, and much lower correlations among the voices nested within each participant.
A second multi-level analysis investigating potential risk factors for utterance self-reference (Table 4b) found three effects that remained significant after controlling for the other variables. Voices that referred to themselves were more common among female participants, were more likely to be associated with the presence of triggers, and were more likely to have been first heard at an older age. Intra-class correlations demonstrated a moderate level of agreement among the individual utterances making up each voice, but much lower levels of agreement between an individual participant's different voices.
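For readers less familiar with three-level models, the two intra-class correlations discussed in this section can be illustrated from variance components. The sketch assumes a linear random-intercept model with utterances nested in voices nested in participants; the text does not state how the coefficients were actually computed, and the variance values below are invented for illustration only.

```python
def three_level_iccs(var_person, var_voice, var_residual):
    """Return (participant-level ICC, voice-level ICC) from three variance components.

    participant-level: expected correlation between utterances from different
    voices heard by the same participant.
    voice-level: expected correlation between utterances from the same voice.
    """
    if min(var_person, var_voice, var_residual) < 0:
        raise ValueError("variance components cannot be negative")
    total = var_person + var_voice + var_residual
    if total == 0:
        raise ValueError("total variance must be positive")
    icc_person = var_person / total
    icc_voice = (var_person + var_voice) / total
    return icc_person, icc_voice


# Hypothetical variance components (illustration only, not study estimates).
icc_person, icc_voice = three_level_iccs(var_person=1.0, var_voice=2.0, var_residual=1.0)
```

Under these assumptions the voice-level coefficient can never be smaller than the participant-level one. A large gap between the two, as reported here, means that most of the shared variance lies between a participant's voices, not between participants.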

Discussion
This study reports several novel findings concerning voice-hearing. First, regardless of gender and diagnostic status, respondents described being able to distinguish an average of four separate voices, ascribing to the majority of them a gender different to their own. Second, although negative utterances and voices were numerically most common, almost 40% of voices contained utterances with mixed types of content. Third, despite this variability the content of utterances was more consistent within voices than between voices. Fourth, regardless of diagnostic status, around half of the voices referred explicitly to themselves. Fifth, voices perceived as belonging to young children had less negative content than voices perceived as belonging to adolescents or most older age groups. Other observations were that there was a wide variation in length of utterances, although the median utterance was short (six words). Voices with delusional or meaningless content were very much in the minority and conversations were rarely initiated by the voice-hearer.
Consistent with Nayani and David (1996), the majority of voices were perceived as male. The perceived gender of the voice often differed from that of the respondent, and in a minority of cases the perceived age was substantially greater. These observations appear inconsistent with the hypothesis that all voice-hearing involves a misattribution of inner speech. Similarly, the voice was only associated with a familiar person in a minority of cases. This supports previous observations concerning the replay of experiences (McCarthy-Jones et al., 2014b), confirming the idea that they represent a specific subtype. Of additional interest was the fact that around half the voices referred to themselves during at least one utterance, thereby seeming to claim a distinct personhood. This may be related to the fact that voice-hearers typically describe having a relationship with their voice that is in some ways similar to that with a real person (Chadwick and Birchwood, 1994; Hayward, 2003; Thomas et al., 2009).
Many of the findings appear to be in line with the notion that voices are better conceptualized as a dissociative rather than as a psychotic phenomenon (Moskowitz and Corstens, 2007). This has received growing conceptual and empirical support (Longden et al., 2019, 2020; Moskowitz et al., 2017; Pilton et al., 2015), and we suggest that our findings are consistent with this model. For example, voices differing in age/gender from the hearer, a greater variation in utterances between voices than within a single voice, and the frequency of self-referential comments, all correspond with a dissociative framework: specifically, that voice-hearing (at least in some cases) may represent trauma-induced alterations in consciousness which are experienced as interpersonally dynamic, ego-dystonic, and perceptually detached from the person themselves (Dorahy and Palmer, 2016).
Negativity is a key characteristic of voices that is related to the amount of distress they cause. Analyses of individual voices found an association between negativity and identifying the voice with a familiar person, suggesting that at least some voices represent the internalisation of a relationship with a hostile or critical figure from participants' lives. It was also notable that voices perceived as belonging to young children were associated with very low levels of negativity compared to all other ages. A parallel observation is that schema therapists working predominantly with pervasive problems in self-perception and relationships with others have often identified a state of being in their clients that corresponds to a vulnerable child (Arntz et al., 2021). This state incorporates elements of feelings, bodily states, and beliefs associated with childhood, coupled with a coping response involving resignation or surrender. In turn, consistent with a trauma and dissociation framework, it is of additional interest that child voices have been reported with much greater prevalence in patients with a diagnosis of dissociative disorder relative to schizophrenia (Dorahy et al., 2009;Laddis and Dell, 2012).
As stated previously, analyses of voice-hearing should ideally incorporate variability in both the voices heard and in the individual utterances made by each voice. The multi-level analyses we employed clearly indicated that there was consistently more variability at the level of the person than at the level of the voice. That is, there was more consistency between the utterances made by each voice than there was between the voices heard by each person. This was most obviously the case for negativity and utterance wordcount. The implication is that representing and assessing the voice-hearing experience requires routine inquiry into several of the most prominent voices that hearers experience, rather than basing conclusions on the content of a single voice.
Once this variability was taken into account, utterance negativity was not significantly related to any of the potential risk factors in the adjusted analyses except a diagnosis of psychosis (versus no diagnosis). The lack of any effect for the presence of psychosis versus other diagnoses reflects previous findings suggesting that people with other disorders experience equally distressing voices (Kingdon et al., 2010; McCarthy-Jones and Longden, 2015; Ross et al., 1990; Sar and Öztürk, 2008; Slotema et al., 2012). Other sources of negativity need to be explored, including exposure to critical others, traumatic sequelae, social stigma, and the lack of appropriate sources of emotional support.
There were several multivariate risk factors for a syntactic utterance element, the voice referring explicitly to itself (e.g., "I'm behind you watching your every move"; "I am so sick of your complaining"). This was predicted by female participant gender, age hearing the voice for the first time, and most strongly by the existence of specific situational triggers. As previously mentioned, we speculate that self-reference might reflect a developmentally more advanced voice subtype associated with a greater fragmentation in the sense of self, and might be more common in dissociative disorders (Dorahy and Palmer, 2016).
Among the limitations of the study is the reliance on online data collection. This did not permit a more detailed exploration of respondents' experiences (e.g., asking respondents to classify their voices: McCarthy-Jones et al., 2014b) nor the opportunity to conduct validity checks. For example, a high proportion of the sample chose to not supply examples of what they heard their voices say. It would be valuable to understand whether this response was due to the demands of the study or to some other factor, such as a belief in not being allowed to talk about voice content. Nevertheless, it is striking that many respondents were prepared to describe their experiences in great detail. Other limitations include the fact that the sample was predominantly female and university-educated, and that numbers were modest, particularly for voice-hearers who had not received any psychiatric diagnosis. The results should therefore be treated with caution. A final issue is that responses were coded by external raters. Voice-hearers themselves are usually much better placed to describe the meaning of what voices say and identify important distinctions.
Nevertheless, the study has suggested that there is significant variability at the level of different utterances within voices and more prominently at the level of different voices within an individual. Apart from underscoring the potential value of collecting multiple utterances from multiple voices, the data were inconsistent with general cognitive explanations for voices, such as the misattribution of inner speech, and more congruent with a dissociation model of voice-hearing. While supporting approaches based on subtype or dimensional methods of classification, they additionally indicate that these might be further developed by assessing multiplicity. Although there are multiple bases for subtyping, including neurology, causal antecedents, and response to treatment, phenomenology provides the most direct window into the experience that is being explained, including implications for a diagnosis of dissociative as opposed to psychotic disorder (Moskowitz et al., 2017). More accurate and sophisticated descriptive data can provide a firm foundation for linking different levels of explanation.
Our data add to previous findings concerning the frequency of voice-hearing and the extent to which voices are subjectively believed or a source of distress (e.g., Nayani and David, 1996; Larøi et al., 2012). The confirmation that hearers report a large number and variety of voices has important implications for research and clinical evaluation. It cautions that conclusions about voice-hearing may need to be based on more extensive sampling than is usually the case. Furthermore, basing responses on questions about a single voice or a single utterance may obscure significant variability and complexity in an individual's experiences.