The effect of auditory verbal imagery on signal detection in hallucination-prone individuals

Highlights
• Investigated relation between hallucinations, mental imagery and signal detection.
• Individuals prone to hallucinations showed a lower SDT response bias with imagery.
• Finding held for both instructed and self-reported use of auditory verbal imagery.
• Atypical auditory imagery may lead to the generation of auditory hallucinations.


Auditory verbal hallucinations and inner speech
Auditory verbal hallucinations (AVHs) are the experience of hearing a voice in the absence of any speaker. Although commonly associated with a diagnosis of schizophrenia, AVHs also occur in around 1.5-3% of the healthy, nonclinical population (Tien, 1991). There is emerging evidence that the predisposition to AVHs may lie on a continuum, ranging from individuals who frequently experience hallucinations to individuals who rarely or never report them (Johns & van Os, 2001; Johns et al., 2014). A fruitful line of research is therefore to investigate whether cognitive traits and biases associated with hallucinations in clinical populations are shared by individuals in the general population who report frequent hallucinatory experiences (Badcock & Hugdahl, 2012).
The most prominent cognitive model of AVHs suggests that they occur when an internal mental event (such as inner speech or auditory verbal imagery, AVI) is misattributed to an external source (Ditman & Kuperberg, 2005; Frith, 1992; Jones & Fernyhough, 2007b). This strand of research has been embedded in the source monitoring framework, which attempts to explain how we make judgements regarding the origin of information (i.e., its source; Johnson, Hashtroudi, & Lindsay, 1993). Specifically, AVHs have been linked to an externalising bias in reality monitoring, the ability to distinguish between internally generated and externally generated perceptions (Bentall, Baker, & Havers, 1991). Externalising biases have variously been linked to excessively vivid mental imagery (Aleman, Böcker, Hijman, de Haan, & Kahn, 2003), and to low cognitive effort/intrusiveness associated with mental imagery (Jones & Fernyhough, 2009; Morrison, Haddock, & Tarrier, 1995). On a mechanistic level, forward models may be involved in predicting the sensory consequences of motor processes, and successful prediction via an efference copy may be one way in which self-generated actions are experienced as such (Frith, Blakemore, & Wolpert, 2000). Aberrant efference copy mechanisms could therefore underlie the misattribution of internal mental events to an external, non-self source (Ford & Mathalon, 2005).
Reality monitoring for verbal stimuli has typically been assessed using source memory paradigms, which require participants to recall whether words were spoken by the experimenter or by themselves. A common finding is that patients with a diagnosis of schizophrenia who hallucinate, compared to those who do not hallucinate, are more likely to misremember words as having been spoken by the experimenter, but do not make the reverse error (e.g., Brunelin et al., 2006; Woodward, Menon, & Whitman, 2007). That is, participants who hallucinate tend to show an 'externalising bias' on reality monitoring tasks. Consistent with continuum models of AVHs (Van Os, Hanssen, Bijl, & Ravelli, 2000), non-clinical samples who report higher levels of hallucination-proneness also show a similar pattern of responding on reality monitoring tasks (Brookwell, Bentall, & Varese, 2013; Larøi, Van der Linden, & Marczewski, 2004). However, these tasks are not ideally positioned to test models of AVHs that specify the misattribution of internal mental events such as inner speech, for two main reasons: (1) they are not 'online' measures (source memory tasks, for example, are 'offline' in that they require participants to decide who generated words earlier in the testing session); (2) they are either not specific to monitoring of speech or, if they are, are likely to use 'overt' (out loud) speech, as opposed to engaging the participant in auditory verbal imagery or inner speech. This limits the applicability of the results to inner speech models of AVHs, because it assumes that overt vocalisation in an experimental situation utilises the same mechanisms as covert or inner speech.
Although there is evidence that overt and covert speech share cognitive and neural mechanisms, particularly in relation to the motor system (Alderson-Day & Fernyhough, 2015; Perrone-Bertolotti, Rapin, Lachaux, & Loevenbruck, 2014), any inferences from studies using overt speech about the nature of covert speech are necessarily indirect, and based on the assumption that similarities between the two are more significant than the differences.
A similar line of research has attempted to engage the participant in an 'online' decision making process, referred to as 'reality discrimination', requiring participants to respond immediately as to whether a perception was internal or external (in contrast to the 'offline' decisions required in a source memory task, which typically require a decision to be made at a later time point, e.g., Woodward et al., 2007). Reality discrimination tasks typically take the form of signal detection tasks, in which the participant must decide whether a voice was present in a burst of noise. In these tasks, hallucinating patients tend to show a bias towards responding that a voice is present in the noise (Varese, Barkus, & Bentall, 2012). In a non-clinical sample, participants who reported more hallucinatory experiences also showed the same bias in responding. These findings have been linked theoretically to the reality monitoring tasks described above, as providing evidence linking AVHs to an externalising bias (Brookwell et al., 2013). It is unclear, however, to what extent performance on auditory signal detection tasks relates to inner speech/AVI processes, since participants are not specifically instructed to use imagery during the task. One way to address this concern would be to employ a paradigm that requires participants to engage in covert AVI, whilst simultaneously detecting the presence or absence of a similar auditory verbal stimulus.

Mental imagery and perception
Previous research on the interaction between mental imagery and perception has come closest to meeting the two criteria outlined above (engaging participants in an online task, and controlling the mental imagery they generate while performing the task). Perky (1910) carried out a series of experiments that suggested that visual imagery interfered with the simultaneous perception of a visually presented stimulus (subsequently referred to as the Perky Effect). For example, participants who engaged in visual mental imagery of an object took longer to detect a visually presented stimulus of the same object than did participants who did not generate any mental imagery. This was taken to indicate that, since mental imagery and perception could be confused, they must rely on similar mechanisms.
However, others have found that mental imagery actually facilitates perception in the visual modality (Peterson & Graham, 1974). This finding has also been replicated in the auditory modality; for example, Farah and Smith (1983) engaged participants in auditory imagery of a pure tone, whilst simultaneously requiring them to detect a similar tone in noise. Participants were therefore required to distinguish between self-generated, internal mental imagery and an external stimulus. Using auditory imagery facilitated perception of the tone, although the task used did not include trials with no signal present, and so signal detection analysis was not reported. Findings on the interaction between imagery and perception have, therefore, been equivocal. More recently, Aleman et al. (2003) used a similar paradigm with a sample of patients with a diagnosis of schizophrenia, showing that the 'gain' on perception of a pure tone due to auditory imagery was strongly correlated with hallucination severity. This finding was interpreted as reflecting an over-reliance on top-down processes in hallucinating patients (which could also be related to a bias towards labelling internal imagery as external).
One problem with these studies is that they do not measure the effect of imagery on the tendency to falsely detect a signal in noise, because there is always a signal present. This is a key variable when linking performance to the tendency to hallucinate, and also when performing signal detection analysis. From the data presented by Aleman et al. (2003), for example, it is not possible to tell whether the 'gain' on perception was due to a change in sensitivity (an increased ability to distinguish signal from noise), or a change in response bias (i.e., participants being more willing to respond that a tone was present when using imagery). The previously discussed literature relating to biases in reality monitoring/discrimination would imply that it may be the latter. Imagery-perception interaction tasks, though, have the advantage of directly engaging participants in internal mental imagery (as opposed to speaking aloud), and requiring them to distinguish whether any subsequent perception was internally generated or not, hence addressing the concern about reality discrimination tasks described above. As well as being informative on the nature of mental imagery, this makes the tasks ideal for testing inner speech models of AVHs.

Valence and externalising biases
A further question stemming from inner speech models of AVHs relates to what may cause one instance of inner speech to become misattributed, but not another instance. As discussed, source monitoring theories may appeal to vividness of mental imagery and the effort associated with cognitions, but theories of AVHs have also suggested that unpleasant, ego-dystonic cognitions are more likely to become misattributed (Morrison et al., 1995). This is consistent with reports that AVHs are more likely to occur in conditions of negative affect or stress (Nayani & David, 1996), and with the well-replicated finding that, at least in psychiatrically healthy individuals, negative events are more likely to be dismissed as attributable to a non-self cause (Mezulis, Abramson, Hyde, & Hankin, 2004).
However, the findings of studies that have investigated whether negatively valenced cognitions in source memory/signal detection tasks are more likely to become externally misattributed than neutral or positively valenced cognitions have been inconsistent (Bendall, Jackson, & Hulbert, 2011; Morrison et al., 1995). In a recent meta-analysis of these studies (Brookwell et al., 2013), slightly larger externalising biases were observed in studies that employed positive stimuli (Hedges' g = 0.75) than in those that employed negative (g = 0.62) or neutral stimuli (g = 0.50). However, because of the designs of the synthesized studies, it was not possible to examine whether these effect sizes differed from one another, which makes drawing any conclusions very difficult (especially when only five studies were included in the meta-analysis). Recent work investigating how emotion can modulate intentional binding (a low-level measure of sensorimotor agency) showed that intentional binding was reduced (i.e., participants' sense of agency over their actions was reduced) when participants' actions were paired with a negative emotional outcome in comparison to when actions were paired with a neutral or positive outcome (Yoshie & Haggard, 2013). Thus, there are reasons to believe that negatively valenced cognitions are more likely to be misattributed to an external source than are neutral or positively valenced cognitions. However, further work is needed to establish whether this is the case.

The present study
The present study therefore set out to test whether performance on an auditory signal detection task was affected by the generation of auditory verbal mental imagery, and whether this was modulated by the emotional valence of the stimuli, and/or self-reported hallucination-proneness, in a non-clinical sample. This extends previous research by directly linking the putative 'raw material' of AVHs (auditory imagery) with a frequently used task that can elicit false perceptions, with the participant engaged immediately in a decision about the perceptual origin of a stimulus.
Two key changes were made to previous paradigms investigating AVI and signal detection. Firstly, trials in which no signal was present were included, to allow the generation of 'false alarm' responses and the calculation of relevant signal detection measures. Secondly, the signal detected was a voice (as opposed to a pure tone), in order to maximise the relevance of the task to inner speech models of AVHs. We conducted two experiments which manipulated or measured the extent to which participants generated AVI. In Experiment 1, participants completed two blocks of auditory signal detection: one in which they were cued with a short sentence and required to generate AVI of that sentence whilst performing the task, and one in which there was no cued sentence or instruction to use AVI. In Experiment 2, a different sample of participants completed two blocks of auditory signal detection: one in which there was no cued sentence or instruction to use AVI, and one in which they were cued with a short sentence, but with no instruction to use AVI. Participants then retrospectively reported the extent to which they engaged in AVI whilst attempting to detect the auditory stimulus.
We predicted that use of AVI would lead to a lower response bias, because participants would have more opportunity to misattribute a self-generated event to an external source. Such a bias would lead participants to correctly detect more voices in the noise, but also incorrectly detect more voices in the noise (i.e., report hearing a voice when none was present). Furthermore, drawing on the previous literature linking signal detection performance and AVHs, we predicted that participants who scored highly on self-report measures of proneness to hallucinations may be particularly vulnerable to this effect. Finally, consistent with the findings of Yoshie and Haggard (2013), and with previous findings relating to self-attribution biases mentioned above, we predicted that participants' response biases would be lower when they generated negative AVI than when they generated positive AVI.

Experiment 1
In the first experiment, participants performed an auditory signal detection task under two conditions: (1) with a visually presented verbal cue and an instruction to use AVI, and (2) with no cue, and no instruction to use AVI. We predicted that using imagery would lower participants' response bias (making them more likely to report hearing a voice in the noise), but have no effect on sensitivity (the ability to distinguish between the voice and the noise). In half of the trials, participants were asked to engage in positive AVI (e.g., "I am good") and in half, participants were asked to engage in negative AVI (e.g., "I am bad"). We predicted that participants' response bias would be lower when generating negative AVI than when generating positive AVI.

Participants
The sample consisted of 125 participants from the undergraduate and staff population of Durham University, UK. Five participants were excluded from the final sample, due to technical malfunctions during the testing session (n = 3), or because their task sensitivity (d′; see below) was classified as an outlier (>4.5 standard deviations above the mean) (n = 2). The final sample size was therefore 120 (number of females = 91; mean age = 20.7 years, SD = 2.5, range = 18-30 years). All participants were native English speakers and none reported any hearing problems. In return for their time, participants were given course credit or a small payment.

Signal detection task
The task required participants to listen to bursts of noise, and to respond whether a voice stimulus was present in the noise. All participants completed two conditions: one block in which they were asked to use auditory verbal imagery whilst detecting the stimuli (the 'AVI' condition) and one block in which they were not (the 'non-AVI' condition).
Each block of the signal detection task (SDT) consisted of 80 trials, each lasting 5 s (plus response time). Fig. 1 shows an illustration of a single trial of the task. In the AVI condition, participants were first presented with a 3-4 syllable sentence, in the centre of the screen, for 1.5 s. In the non-AVI condition, participants were simply presented with a blank screen for 1.5 s. This was followed by an on-screen countdown, which consisted of a shrinking circle, to mimic a '3, 2, 1. . .' countdown. (It was not appropriate to use a verbal on-screen countdown, as this may have interfered with processing of the presented sentence.) Pink noise (which consists of equal energy per octave; generated using Audacity 2.0.2) began playing through the provided headphones, simultaneously with the countdown. The countdown was followed by a fixation cross, which was present on the screen for 2 s. The participants were informed that, if there was a voice present in the noise, it would only appear when the fixation cross was present. Participants were then prompted to respond, with a button press, as to whether they believed a voice was present in the noise.
The sentence presented to the participant always took the form of the words 'I am', followed by an adjective (i.e., the participant was required to repeat a sentence about themselves). Half of the trials presented a positively valenced sentence (e.g., "I am happy"), whereas the other half presented negatively valenced sentences (e.g., "I am sad"). These sentences were rated for valence on a Likert scale (1 = very negative; 7 = very positive), by a separate subset of participants (N = 13), none of whom participated in the main experiments. The ratings of the words used for positive sentences (M = 6.04, SD = 0.42) were significantly higher than the ratings of the words used for negative sentences (M = 2.13, SD = 0.66), t(65.9) = 31.53, p < .001 (equal variances not assumed). No individual positive word had a mean rating lower than that of any negative word. Positively and negatively valenced sentences were also matched for number of syllables.
In 44/80 trials, a voice stimulus was embedded in the pink noise as the fixation cross appeared on the screen, lasting for 1.5 s ('voice-present trials'). The stimuli in the voice-present trials always spoke the same words that had been presented to the participant, in a male voice. The pink noise remained at the same volume for all trials for all participants; that is, only the volume of the voice stimuli differed between trials to vary the signal-to-noise ratio (voice stimulus to pink noise volume ratio). The signal-to-noise ratio in the trials varied between four different levels, based on the performance of pilot participants (N = 10, none of whom participated in the main experiment). Of the 44 voice-present trials in each signal detection block, 8 consisted of stimuli that pilot participants detected on 100% of trials, designed to ensure that all participants in the main experiment could correctly detect some of the voice stimuli. The remaining 36 trials were split evenly between volumes at which pilot participants detected the voice stimuli on 75%, 50% and 25% of trials. This was designed to maximise the ambiguity of the presented stimuli. In the remaining 36/80 trials, no voice was embedded in the noise ('voice-absent trials'). The 44:36 ratio of present:absent trials was roughly based on previous auditory signal detection studies (e.g., Barkus, Stirling, Hopkins, McKie, & Lewis, 2007). Signal-to-noise ratio and the presence/absence of a voice stimulus were balanced across valence of the presented sentence.
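The composition of a single block described above can be sketched in a few lines of Python. This is an illustrative reconstruction only: the function and field names are hypothetical, the signal-to-noise levels are labelled by pilot detection rate because the actual volumes are not reported, and the random shuffling of trial order is an assumption rather than something stated in the text.

```python
import random
from collections import Counter

# Voice-present trials per signal-to-noise level, labelled by the pilot
# detection rate that defined each level (actual volumes are not reported).
SNR_TRIAL_COUNTS = {"100%": 8, "75%": 12, "50%": 12, "25%": 12}
N_VOICE_ABSENT = 36


def build_block(seed=None):
    """Return a shuffled list of 80 trial dicts for one block (hypothetical layout)."""
    trials = []
    for level, count in SNR_TRIAL_COUNTS.items():
        for i in range(count):
            # Alternate valence so that each level is split evenly.
            valence = "positive" if i % 2 == 0 else "negative"
            trials.append({"voice_present": True, "snr_level": level, "valence": valence})
    for i in range(N_VOICE_ABSENT):
        valence = "positive" if i % 2 == 0 else "negative"
        trials.append({"voice_present": False, "snr_level": None, "valence": valence})
    # Assumption: trial order is randomised within a block.
    random.Random(seed).shuffle(trials)
    return trials


block = build_block(seed=1)
counts = Counter((t["voice_present"], t["valence"]) for t in block)
print(len(block), dict(counts))
```

Under this layout, each valence contributes 22 voice-present and 18 voice-absent trials, which is one way of satisfying the balancing constraint described in the text.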

Measure of hallucination-proneness and AVI-usage
The Revised Launay-Slade Hallucination Scale (LSHS-R) was used to assess hallucination-proneness. This 9-item measure was adapted by McCarthy-Jones and Fernyhough (2011) from the 20-item Launay-Slade Hallucination Scale used by Morrison, Wells, and Nothard (2000). The scale assesses proneness to hallucinatory experiences in both the auditory and visual modality. It has been shown to have high internal reliability and a more valid factor structure than previous versions of the Launay-Slade Hallucination Scale.
Participants were also asked to estimate the extent to which they had generated AVI at the appropriate time during the task, giving a number between 0 and 100.

Procedure
Participants wore over-the-ear headphones (Logik LHHIFI10) to complete the task. They were informed that they would be listening to bursts of noise, and listening out for a voice in the noise, responding present/absent with a button press at the end of each trial. All participants were told that some voices would be easier to hear, whereas others would be quieter and harder to detect, although they were not informed how often a voice was likely to be present. For the AVI condition, participants were instructed to "imagine saying the sentence to yourself silently", at the same time as the fixation cross appeared on the screen. Participants were told they did not need to deliberately move their mouth or speak aloud during the main task, but simply to use auditory imagery of the presented sentence. They were also informed that, if there was a voice present in the noise, it would be presented at the same time as the fixation cross. During the practice phase of the AVI condition, participants were asked to speak the sentence out loud at the required time point for the first four trials, to ensure that they understood the instructions. If the participant did not vocalise the sentence at the appropriate time point, the practice trials were repeated until they were able to perform the task as requested. After the practice trials, participants were asked whether they understood the instructions relating to using AVI, and offered the chance to repeat the practice if unsure. In the event that participants did not understand the instructions relating to AVI (which was rare), various different descriptions were given to aid understanding (e.g., 'inner speech', 'auditory imagery' or 'talking to yourself in your head'). In the non-AVI condition, participants were simply asked to detect a voice in the noise, but not given any instructions about imagining a voice.
Fig. 1. Schematic representation of a single trial in the AVI condition of the signal detection task. A sentence is presented to the participant (Screen 1), followed by a 1500 ms countdown (Screens 2-4), followed by a fixation cross, which, on voice-present trials, was accompanied by a voice stimulus (Screen 5). Participants were instructed to 'imagine saying' the presented sentence when they saw the fixation cross, and then provide a response as to whether they believed a voice was present during Screen 6. The proportion of the trial during which pink noise played is indicated by the dashed line.
All participants completed both conditions of the SDT. The order in which they completed the tasks was counterbalanced across participants. Between the two blocks of trials, participants completed the self-report items (see Section 2.1.2), and completed several other tasks (to be reported elsewhere).

Data analysis
Performance on the SDT was analysed using signal detection theory. For each trial response, there were therefore four possible outcomes: hit (voice-present, 'present' response), miss (voice-present, 'absent' response), correct rejection (voice-absent, 'absent' response) and false alarm (voice-absent, 'present' response). From these, signal detection parameters relating to response bias (β) and sensitivity (d′) were calculated. Following Stanislaw and Todorov (1999), β was calculated as follows: $\beta = \exp\left\{ \frac{Z(FA)^2 - Z(H)^2}{2} \right\}$, where Z(H) and Z(FA) are the standardised (z-transformed) hit rate and false alarm rate. β, based on the likelihood ratio of the signal and noise distributions, was chosen as a measure of response bias to be consistent with previous studies relating auditory signal detection performance to hallucination-proneness (e.g., Varese, Barkus, & Bentall, 2011; Vercammen, de Haan, & Aleman, 2008) which have shown robust links to hallucination-proneness in a recent meta-analysis (Brookwell et al., 2013). A β value <1 corresponds to a bias towards responding 'yes', whilst a value >1 corresponds to a bias towards responding 'no'. d′ is defined as the difference between the standardised hit rate and false alarm rate, with a value of 0 representing performance at chance level. There were two within group variables: task condition and sentence valence. A median split was also performed on the data according to score on the LSHS-R, grouping the participants into high (scoring ≥ 15 on the LSHS-R, N = 61) and low (scoring < 15, N = 59) hallucination-proneness; hence there was one between group variable: hallucination-proneness group (high/low). We therefore performed a 2 × 2 × 2 mixed model ANOVA, with response bias (β) as the dependent variable, and task condition (AVI/non-AVI), sentence valence (positive/negative) and hallucination-proneness group (high/low) as independent variables.
This analysis was also repeated with sensitivity (d′) as the dependent variable, to test whether the manipulation affected participants' ability to distinguish the voice from the noise. Where data was non-normally distributed, Mann-Whitney U tests were used during further analysis. Signal detection performance was also modelled using regression analysis to investigate the contribution of different variables to task performance in the different conditions.
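As an illustration of the two signal detection parameters defined above, the following Python sketch computes sensitivity and response bias from the four trial outcome counts, using the Stanislaw and Todorov (1999) formulas. The function name and the example counts are hypothetical. The text does not state how hit or false alarm rates of exactly 0 or 1 were handled, so this sketch simply refuses to compute in those cases rather than assuming a particular correction.

```python
import math
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, beta) from trial counts, after Stanislaw & Todorov (1999)."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    # Z is the inverse of the standard normal CDF and is undefined at 0 and 1.
    # The handling of such rates is not reported in the text, so no correction
    # is assumed here.
    if not (0 < hit_rate < 1 and 0 < fa_rate < 1):
        raise ValueError("rates of 0 or 1 need a correction before z-transforming")
    z_hit = NormalDist().inv_cdf(hit_rate)
    z_fa = NormalDist().inv_cdf(fa_rate)
    d_prime = z_hit - z_fa                          # sensitivity
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2)   # response bias
    return d_prime, beta


# Hypothetical participant: 22 of 44 voices detected, 4 false alarms on 36 noise trials.
d_prime, beta = sdt_measures(hits=22, misses=22, false_alarms=4, correct_rejections=32)
print(round(d_prime, 2), round(beta, 2))
```

In this made-up example β is greater than 1, i.e. a bias towards responding 'absent'; a participant whose hit and false alarm rates are symmetric about 50% (e.g., 75% and 25%) would have β = 1.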

Results
Participants generally reported being able to complete the task as instructed without difficulty, and reported a relatively high level of AVI use at the fixation cross (M = 91.7, SD = 9.0). There was no difference in the amount of AVI-usage reported between participants in the high (M = 90.58, SD = 9.2) and low (M = 92.88, SD = 8.6) hallucination-proneness groups: t(117) = 1.40, p = .16, d = 0.26. Descriptive statistics for response bias (β) and sensitivity (d′) on the auditory SDT under the two conditions (AVI/non-AVI), for high and low hallucination-prone participants, are shown in Table 1.

Response bias (β)
For descriptive statistics, see Table 1. For β (response bias), a 2 × 2 × 2 mixed model ANOVA showed a main effect of task condition (AVI/non-AVI): F(1,118) = 5.99, p = .016, ηp² = .048, showing that participants performed with a lower response bias in the AVI condition (M = 2.41, SD = 2.7) compared to when not using AVI (M = 2.97, SD = 3.01). There was no overall effect of hallucination-proneness group (F(1,118) = 0.43, p = .51, ηp² = .004). There was, however, a significant task condition × hallucination-proneness interaction: F(1,118) = 4.47, p = .037, ηp² = .037 (see Fig. 2a). Further analysis using Mann-Whitney U tests showed that the effect of AVI appeared to be specific to highly hallucination-prone participants, who had a significantly lower response bias in the AVI condition (Mdn = 1.12) than in the non-AVI condition (Mdn = 1.79): z = 3.51, p < .001, r = .45. In the low hallucination-prone participants, there was not a significant difference in β between the AVI condition (Mdn = 1.50) and the non-AVI condition (Mdn = 1.47): z = 0.11, p = .91, r = .01. There was no effect of sentence valence on β: F(1,118) = 1.18, p = .28, ηp² = .01, nor any interactions between sentence valence and any other variables (all ps > .13).
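The effect sizes reported above can be related to their test statistics with two standard conversions, sketched below in Python. The text does not state how the effect sizes were computed, so both formulas are assumptions: partial eta squared is taken as F·df_effect / (F·df_effect + df_error), and r as z/√N with N set to the size of the relevant hallucination-proneness group. Under these assumptions the results agree with the reported values to within rounding of the published statistics.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def r_from_z(z, n):
    """Effect size r for a rank-based test, using the common z / sqrt(N) conversion."""
    return z / math.sqrt(n)


# ANOVA effects on response bias, using the F values reported in the text.
eta_condition = partial_eta_squared(5.99, 1, 118)     # task condition
eta_interaction = partial_eta_squared(4.47, 1, 118)   # condition x group
eta_group = partial_eta_squared(0.43, 1, 118)         # hallucination-proneness group

# Follow-up tests, assuming N is the group size (61 high, 59 low).
r_high = r_from_z(3.51, 61)
r_low = r_from_z(0.11, 59)

print(round(eta_condition, 3), round(eta_interaction, 3), round(eta_group, 3))
print(round(r_high, 2), round(r_low, 2))
```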
Given that the use of median splits can sometimes result in spurious interaction effects (McClelland & Judd, 1993), we also analysed this data using two separate regression analyses. In one regression model, response bias (β) in the AVI condition was the dependent variable, with β in the non-AVI condition and Launay-Slade Hallucination Scale (hallucination-proneness) score (left as a continuous variable) as independent variables. This model significantly predicted β in the AVI condition (F(2,119) = 23.3, R² = .20, p < .001), with both response bias in the non-AVI condition (standardised coefficient β = .41, p < .001) and hallucination-proneness (β = −.193, p = .02) emerging as independent predictors of response bias in the AVI condition. In the second regression model, response bias in the non-AVI condition was the dependent variable, with response bias in the AVI condition and hallucination-proneness (again, left as a continuous variable) used as independent variables. This model significantly predicted response bias in the non-AVI condition (F(1,119) = 11.62, R² = .17, p < .001), although only AVI-condition response bias was a significant predictor (β = .41, p < .001), with hallucination-proneness not significantly predicting performance in this condition (β = .03, p = .74). Therefore, to summarise, when controlling for performance on the other signal detection condition, hallucination-proneness only significantly predicted response bias in the task condition in which participants used AVI.
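For readers unfamiliar with this kind of model, the sketch below shows how standardised coefficients for a two-predictor regression can be obtained from the pairwise correlations alone. It is a minimal illustration, not the analysis pipeline used in the study: the function names are hypothetical, the data are invented purely for demonstration, and significance testing is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardized_betas(y, x1, x2):
    """Standardised coefficients and R^2 for the model y ~ x1 + x2."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    denom = 1 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = b1 * r_y1 + b2 * r_y2
    return b1, b2, r_squared


# Invented data for eight hypothetical participants (not from the study).
bias_avi = [1.1, 0.9, 2.4, 1.6, 3.0, 1.2, 0.8, 2.1]      # response bias, AVI block
bias_non_avi = [1.5, 1.2, 2.6, 1.4, 3.3, 1.9, 1.3, 2.0]  # response bias, non-AVI block
lshs_score = [18, 22, 10, 14, 9, 20, 25, 12]             # hallucination-proneness

b_non_avi, b_lshs, r_squared = standardized_betas(bias_avi, bias_non_avi, lshs_score)
print(round(b_non_avi, 2), round(b_lshs, 2), round(r_squared, 2))
```

Swapping the roles of `bias_avi` and `bias_non_avi` gives the second model described above, in which non-AVI response bias is the dependent variable.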

Sensitivity (d′)
For descriptive statistics, see Table 1. For d′ (sensitivity), there was no effect of task condition (F (1,118 There was also no effect of gender on task sensitivity

Discussion
The first key finding from Experiment 1 was that when participants were instructed to use AVI during a signal detection task, there was a significant drop in response bias. That is, participants were more likely to respond that a voice was presented in the noise when they engaged in AVI, regardless of whether a voice was actually presented. However, using AVI did not affect participants' sensitivity (ability to distinguish between the speech and the noise). Importantly, the interaction between task condition and hallucination-proneness indicated that the effect of AVI on response bias was specific to participants who scored highly on self-reported hallucination-proneness, whilst there was no difference between the imagery conditions in participants who reported few hallucinatory experiences. This finding suggests that the use of AVI caused hallucination-prone participants to exhibit a lower response bias. That is, when participants prone to hallucinations generate mental imagery, they are more likely to respond 'present', regardless of whether the stimulus was actually present or not. This is consistent with models that suggest AVHs may occur when inner speech is misattributed to an external source, which may link to excessively vivid mental imagery, low levels of cognitive effort, and/or aberrant predictive processes (Brookwell et al., 2013; Ford & Mathalon, 2005).
It could be argued that previous findings of lower response bias in signal detection tasks in hallucination-prone participants could be due to a higher rate of spontaneous AVI usage. This interpretation, though, is not supported by the present data, which indicated that there was no difference in AVI-usage between individuals scoring high and low in hallucination-proneness. However, since all participants were instructed to use AVI, this may have masked potential differences in spontaneous AVI usage (a possibility that is explored further in Experiment 2, below).
Table 1. Means and standard deviations for Experiment 1, showing performance on the auditory signal detection task in the AVI and non-AVI task conditions, for high and low hallucination-prone participants, for positively and negatively valenced stimuli (M, SD). Column headings: Hallucination-proneness; Valence.
Contrary to our prediction, there was no effect of the emotional valence of the imagined sentence on performance, nor any interaction between valence and any other variables. This is consistent with previous research showing that, on a source memory task, the emotional valence of the stimuli did not affect performance or interact with hallucination-proneness (Bendall et al., 2011).

Experiment 2
Experiment 1 showed that using AVI during auditory signal detection led to a reduction in response bias, specifically in highly hallucination-prone individuals. However, one concern is that simply cuing participants with a sentence to imagine could have altered task performance, and that the observed effect could be due to priming of the sentence, rather than the use of AVI specifically (although the observed interaction with hallucination-proneness would still be of interest). The data from Experiment 1 cannot address this concern: because almost all participants reported a high level of AVI use, it was not possible to determine whether the effect was specific to participants who engaged in high levels of AVI. We therefore conducted a second experiment, using identical stimuli, in which participants were not instructed to use AVI, but were still cued with a sentence before each signal detection trial. After task completion, participants were asked to report the extent to which they felt they had used AVI whilst performing the task. The rationale for this design was that it made it possible to investigate whether signal detection performance was associated with AVI use, even when participants were not explicitly instructed to use it. This design also tested whether participants who were more hallucination-prone reported using more spontaneous AVI, leading to a lower response bias, or whether the two interacted (i.e., participants who were both highly hallucination-prone and reported using high levels of AVI had lower response biases, but the overall level of AVI was not associated with hallucination-proneness). Based on the findings in Experiment 1, we predicted that, in Experiment 2, there would be a difference between the two task conditions only in individuals who scored highly on hallucination-proneness and who reported using high levels of AVI whilst performing the SDT.

Participants
The sample consisted of 60 participants from the undergraduate and staff population of Durham University, UK, none of whom had taken part in Experiment 1 (number of females = 48; mean age = 19.73 years, SD = 2.5, range = 18-30 years). All participants were native English speakers and none reported any hearing problems. In return for their time, participants were given course credit or a small payment.

Procedure
Using the same stimuli and equipment as in Experiment 1, participants completed two blocks of the SDT. Participants were given the same instructions for how to complete the task, with the only difference being that they were given no instructions relating to AVI. Therefore, they completed two conditions: a 'non-cued' condition (identical to the non-AVI condition in Experiment 1) and a 'cued' condition (in which the same sentences as in Experiment 1 were presented before each burst of noise, but there were no AVI instructions). Participants were informed that, in the cued condition, the sentence presented to them on-screen would be the same as that spoken by the voice they were instructed to detect, although they were not required to be able to comprehend the sentence in the noise to respond 'yes'.
As in the first experiment, participants completed the 9-item LSHS-R as a measure of hallucination-proneness (see Section 2.1.3) between the two blocks of the SDT. After completion of the tasks, participants were presented with the following question: 'This question relates to the task in which you were presented with a sentence before listening to the noise. When the fixation cross appeared on the screen, did you find yourself using 'inner speech' to say the previously presented sentence? If yes, what percentage of the time do you think you did this? (0-100)'.

Data analysis
As in Experiment 1, we performed a median split on the data according to LSHS-R score (high: ≥15, N = 32; low: <15, N = 28). We also performed a median split on the data according to the amount of AVI reported by the participants (high: ≥75, N = 32; low: <75, N = 28) and conducted a 2 × 2 × 2 mixed model ANOVA, with task condition (cued/non-cued) as a within-subject variable, and hallucination-proneness (high/low) and AVI use (high/low) as between-subject variables. Due to the lack of effect of valence in Experiment 1, we did not include valence as a within-subject variable in this experiment. The dependent variable was response bias (β) on the SDT. The analysis was also repeated using sensitivity (d′) as the dependent variable.
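As an illustration of how the two dependent variables relate to one another, the following is a minimal sketch of the standard equal-variance Gaussian signal detection measures, where z is the inverse of the standard normal cumulative distribution, d′ = z(H) − z(FA), and β = exp([z(FA)² − z(H)²]/2). The function names and the hit and false-alarm rates are hypothetical and chosen only for illustration; they are not data from the present study, and any correction applied for extreme rates is not shown.

```python
from math import exp
from statistics import NormalDist

# Inverse standard-normal CDF, i.e. the z-transform of a proportion.
_z = NormalDist().inv_cdf


def d_prime(hit_rate, fa_rate):
    """Sensitivity: distance between the signal and noise distributions."""
    return _z(hit_rate) - _z(fa_rate)


def beta(hit_rate, fa_rate):
    """Response bias as a likelihood ratio; lower values mean a more
    liberal criterion (more willingness to respond 'present')."""
    zh, zf = _z(hit_rate), _z(fa_rate)
    return exp((zf ** 2 - zh ** 2) / 2)


# Hypothetical (hit rate, false-alarm rate) pairs, NOT data from the study.
baseline = (0.60, 0.10)
liberal = (0.75, 0.20)  # more 'present' responses on signal and noise trials

for label, (h, f) in (("baseline", baseline), ("liberal", liberal)):
    print(f"{label}: d' = {d_prime(h, f):.2f}, beta = {beta(h, f):.2f}")
```

With these illustrative rates, responding 'present' more often on both signal and noise trials lowers β while leaving d′ almost unchanged, which is the pattern that a shift in response bias without a change in sensitivity would produce.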

Results
Participants reported using AVI ('inner speech') a relatively high amount, considering that no instructions were given (M = 66.83, SD = 29.1), although estimates ranged from the bottom to the top of the scale (range = 0-100). A Mann-Whitney U test indicated that there was no difference between the high (Mdn = 77.50) and low (Mdn = 75.00) hallucination-proneness groups in the amount of AVI retrospectively reported (U = 436, p = .86, r = .02).
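The effect size reported for this comparison can be approximately recovered from the U statistic using the large-sample normal approximation, with r = |z|/√N. The sketch below assumes group sizes of 32 and 28 (taken from the LSHS-R median split described in the data analysis section) and applies no correction for ties or continuity, so it should be read as a plausibility check rather than as the exact procedure used. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_normal_approx(u, n1, n2):
    """Return (z, two-sided p, r) for a Mann-Whitney U statistic.

    Uses the large-sample normal approximation without tie or
    continuity correction (an assumption made for this sketch).
    """
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    r = abs(z) / sqrt(n1 + n2)  # effect size r = |z| / sqrt(N)
    return z, p, r


# U from the text; group sizes are assumed from the LSHS-R median split.
z, p, r = mann_whitney_normal_approx(u=436, n1=32, n2=28)
print(f"z = {z:.2f}, p = {p:.2f}, r = {r:.2f}")
```

Under these assumptions, the approximation gives values that round to the reported p = .86 and r = .02.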
There was a significant interaction between hallucination-proneness and AVI-usage (F(1,56) = 4.62, p = .036, ηp² = .076), although the three-way interaction between task condition, hallucination-proneness and inner speech usage was not significant (F(1,56) = 3.16, p = .08, ηp² = .053). However, given that the three-way interaction was close to significance, and that the AVI-usage variable specifically referred to the cued condition, we explored the result further by conducting two 2 × 2 [AVI-usage × hallucination-proneness] ANOVAs, for the cued and non-cued conditions separately.
For the non-cued condition, there was no effect of hallucination-proneness (F(1,56) = 1.91, p = .17, ηp² = .033), no effect 058, d = 0.52. Importantly, in the cued condition, there was a significant interaction between hallucination-proneness and AVI-usage (F(1,56) = 9.12, p = .004, ηp² = .14). This interaction effect was explored by conducting two Mann-Whitney U tests for the cued condition, comparing β between the high/low AVI-usage groups for both the high and low hallucination-proneness groups. This showed that, for the low hallucination-proneness group, there was no significant difference in β between the high (Mdn = 1.60) and low (Mdn = 1.80) AVI-usage groups (U = 93.0, p = .84, r = .04). However, for the high hallucination-proneness group, there was a significant difference in β between participants who reported high levels (Mdn = 1.01) of AVI-usage and those who reported low levels (Mdn = 2.53) of AVI-usage (U = 66.0, p = .02, r = .41).
Again, to further explore the data, we conducted a regression analysis using response bias (β) in the cued condition as the dependent variable. β in the non-cued condition, hallucination-proneness score, inner speech usage, and the interaction term (hallucination-proneness × inner speech usage) were included as independent variables. Although hallucination-proneness score was used as a continuous variable, inner speech usage was kept as a dichotomous variable because a large proportion of participants reported either very high or very low levels of inner speech (59.7% of participants reported using inner speech on either 0 or >80% of trials). This model significantly predicted response bias in the cued condition (F(4,59) = 4.26, R² = .24, p = .004), with β in the non-cued condition (β = .32, p = .014) and hallucination-proneness (β = .81, p = .027) significantly predicting response bias. Inner speech usage was not a significant predictor (β = −.19, p = .116), although the inner speech usage × hallucination-proneness interaction did significantly predict response bias in the cued condition (β = .25, p = .033). When this analysis was repeated with response bias on the non-cued condition as the dependent variable, the model significantly predicted performance (F(4,59) = 3.54, R² = .21, p = .012), but only response bias on the cued condition emerged as a significant predictor (β = .33, p = .014), whereas hallucination-proneness (β = −.30, p = .43), inner speech usage (β = −.12, p = .33), and the interaction between the two (β = .05, p = .89) were not significant predictors. To summarise, the regression analysis indicated that, when controlling for the non-cued condition, hallucination-proneness, and inner speech usage, the interaction between hallucination-proneness and inner speech usage significantly predicted response bias on the cued condition. However, these variables did not predict response bias on the non-cued condition.
As before, we further explored this finding by conducting a regression analysis, this time with task sensitivity (d′) in each condition as the dependent variable. d′ in the non-cued condition, hallucination-proneness, inner speech usage, and the hallucination-proneness × inner speech usage interaction term were used as independent variables. Overall, the model significantly predicted d′ in the cued condition (F(4,59) = 9.51, R² = .41, p < .001). However, only d′ in the non-cued condition was a significant independent predictor (β = .62, p < .001). Hallucination-proneness (β = .36, p = .25), inner speech usage (β = −.05, p = .61), and the hallucination-proneness × inner speech usage interaction term (β = −.29, p = .36) were not significant predictors of d′ in the cued condition. The same pattern of results emerged when d′ in the non-cued condition was used as a dependent variable. d′ in the cued condition was a significant predictor of d′ in the non-cued condition (β = .62, p < .001), but hallucination-proneness (β = −.25, p = .44), inner speech usage (β = −.07, p = .54) and the interaction between the two (β = .15, p = .63) were not significant predictors of d′ in the non-cued condition. As in Experiment 1, there was no effect of gender on signal detection task sensitivity (F(1,53) = .04, p = .84, ηp² = .001), nor any interaction between gender and task condition (F(1,53) = .08, p = .77, ηp² = .002), inner speech usage (F(1,53) = .57, p = .45, ηp² = .011), or hallucination-proneness (F(1,53) = 1.82, p = .18, ηp² = .033).
Table 2. Means and standard deviations for Experiment 2, showing performance on the auditory signal detection task, for high and low hallucination-prone participants, who reported high and low levels of AVI (M, SD).

Discussion
Experiment 2 used the same auditory and visual stimuli as Experiment 1; only the task instructions differed, in that participants were not told that they should use AVI. After completing the task, participants estimated the extent to which they had spontaneously engaged in AVI after being cued with a sentence. Both response bias and sensitivity were affected by the presence of a sentence cue only in participants who both (a) reported high levels of hallucination-proneness and (b) reported using high levels of AVI whilst detecting a voice stimulus (although it should be noted that the three-way interactions only reached trend levels of significance). Nevertheless, these results indicate that if the sentence cue did not cause participants to use high levels of AVI, then it did not have an effect on task performance. However, within the group of participants who used high levels of AVI, only those participants who also reported high levels of hallucination-proneness showed a bias towards perceiving a voice in the noise, and showed reduced sensitivity, when cued with a sentence.
These results are partially consistent with the results from Experiment 1: they indicate that highly hallucination-prone individuals show a lower response bias when using AVI. However, unlike Experiment 1, the results from Experiment 2 indicated that use of AVI also affected sensitivity to the task in highly hallucination-prone individuals (although sensitivity was not predicted by hallucination-proneness in the regression analyses). This was an unexpected finding, which may be explained by a greater increase in the number of 'false alarm' responses relative to the increase in 'hit' responses. That is, if the participant mistook internal, self-generated AVI for an external, non-self-generated stimulus, it may have had a relatively smaller effect on the hit rate, especially if presentation of a stimulus at a low signal-to-noise ratio affected performance. For example, the presentation of voice stimuli (even below a participant's auditory threshold) may have interfered with the likelihood that internally generated AVI was mistaken as external.
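The account above can be illustrated with a short sketch using hypothetical hit and false-alarm rates (not data from the present study), assuming the standard equal-variance Gaussian model in which d′ = z(H) − z(FA). The function name and all rates are assumptions made for illustration only. When false alarms increase by more than hits, both β and d′ fall.

```python
from math import exp
from statistics import NormalDist


def sdt(hit_rate, fa_rate):
    """Return (d', beta) under the equal-variance Gaussian model."""
    z = NormalDist().inv_cdf
    zh, zf = z(hit_rate), z(fa_rate)
    return zh - zf, exp((zf ** 2 - zh ** 2) / 2)


# Hypothetical rates: false alarms rise by .15, hits by only .05.
d_noncued, beta_noncued = sdt(hit_rate=0.60, fa_rate=0.10)
d_cued, beta_cued = sdt(hit_rate=0.65, fa_rate=0.25)

print(f"non-cued: d' = {d_noncued:.2f}, beta = {beta_noncued:.2f}")
print(f"cued:     d' = {d_cued:.2f}, beta = {beta_cued:.2f}")
```

In this illustration, the larger rise in false alarms relative to hits reduces d′ as well as β, so a criterion shift driven mainly by noise trials can appear as a loss of sensitivity.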

General discussion
To summarise, the two experiments reported in this paper examined the effect of the generation of auditory verbal imagery (AVI) on auditory signal detection, in participants who reported high or low levels of hallucination-proneness. Experiment 1 showed that, when instructed to use AVI, participants showed a lower response bias, being more willing to respond that a voice was present in noise (regardless of its actual presence), compared to performance on a standard auditory signal detection task. Further analysis showed that this effect was specific to participants who reported high levels of hallucination-proneness. Emotional valence of the material being imagined did not affect performance. Experiment 2 compared performance on a standard auditory signal detection task, and a variant of the task in which participants were cued with a sentence to detect, but not given any instructions to use AVI. The results suggested that hallucination-prone participants only showed a lower response bias when they retrospectively reported using AVI, despite not being instructed to do so. In Experiment 2, counter to expectations, there was some evidence that task sensitivity was also affected by usage of AVI.
These findings provide support for models of AVHs which suggest that they result from an external misattribution of an internal mental event, such as inner speech (Ditman & Kuperberg, 2005;Frith, 1992;Jones & Fernyhough, 2007b). The present studies partially support previous findings which have shown lower response biases in auditory signal detection, in both clinical and non-clinical samples that report frequent hallucinatory experiences (Brookwell et al., 2013), and extend the findings by showing that hallucination-prone individuals only showed a lower response bias when using AVI. As far as we are aware, previous studies that have linked performance on signal detection tasks to hallucinations have not incorporated variation in AVI/inner speech usage into their study design.
Given that, in both experiments, there was no association between level of reported AVI and level of reported hallucination-proneness, the results cannot be explained in terms of increased AVI usage in hallucination-prone individuals. Instead, the results suggest that when hallucination-prone individuals do use AVI, it is more liable to become externally misattributed. This is consistent with the previously outlined inner speech models of AVHs, suggesting that performance on reality discrimination signal detection tasks may be related to problems with self-monitoring of internally generated cognition. The present study does not, however, provide evidence to distinguish between precise mechanisms at play in reality discrimination or self-monitoring biases. It is possible that hallucination-prone participants misattributed AVI due to high levels of vividness of the imagery, making it harder to distinguish from a 'real' perception. High levels of vividness of mental imagery may be a trait shared by hallucination-prone individuals, which could lead to a higher likelihood of external misattributions. Alternatively, low levels of cognitive effort associated with AVI generation may have led to a similar effect in this group. Interestingly, hallucination-prone participants in the present study showed the opposite effect to that elicited by Perky (1910), instead showing patterns of response more consistent with those reported by Peterson and Graham (1974) or Farah and Smith (1983), who showed facilitation of perception by use of mental imagery. However, the present data support the hypothesis that this may be due to the effect of imagery on response biases, rather than sensitivity.
Neuroscientific findings describing activations in inner speech and those occurring during AVHs have implicated speech production areas, as well as primary and secondary auditory cortical regions, in the generation of AVHs (Allen, Larøi, McGuire, & Aleman, 2008), as well as showing higher levels of activity in auditory cortical (including speech perception) regions when patients with a diagnosis of schizophrenia use inner speech (Simons et al., 2010). Ford et al. (2001) previously showed that inner speech usage in hallucinating individuals was not associated with the same cortical attenuation in response to an external stimulus as in non-hallucinating individuals, suggesting that self-monitoring failures, which may underlie lack of agency over inner speech, are linked to aberrant predictive processes. To support this, a recent study showed that cortical attenuation to self-generated actions in individuals scoring highly on measures of schizotypy was reduced (Oestreich et al., 2015). Furthermore, Moseley, Fernyhough, and Ellison (2014) showed that modulating excitability in superior temporal regions affected performance on a signal detection task similar to the task used in the present study. In combination with the present results, this might imply that hallucination-prone individuals' inner speech may be associated with higher levels of vividness, reflected in higher levels of activity in speech perception regions. This is supported by neuroimaging findings showing that auditory mental imagery (in a non-clinical sample) rated as higher in vividness is associated with higher levels of activity in speech perception regions (Zvyagintsev et al., 2013). This is also consistent with the conclusions of Aleman et al. (2003), who interpreted the effects of auditory imagery as evidence of higher perceptual detail in the AVI of hallucinators, and may suggest that earlier findings relating to the interaction between imagery and perception (Farah & Smith, 1983) are linked to reality discrimination biases through the perceptual detail involved in auditory imagery. These findings suggest that cortical attenuation in sensory regions, commonly linked to the sense of agency, may be linked to perceptual detail in mental imagery, and hence reality discrimination biases.
Unexpectedly, hallucination-prone participants did not show a lower response bias on signal detection overall: the effect was only observed when using AVI. In this respect, the results are inconsistent with previous findings (e.g., Barkus et al., 2007; Rankin & O'Carroll, 1995; Varese et al., 2011) which have suggested that hallucination-proneness is associated with a lower response bias in typical auditory signal detection. It is possible that task differences (for example, cueing the participants with a fixation cross at the point of the voice stimulus presentation, even in non-AVI versions of the task) could have affected performance in our study, and may make our results in non-AVI conditions non-comparable with previously conducted research. Interestingly, two studies which have previously used auditory signal detection paradigms in which participants were cued at the precise time point they should attempt to detect a voice, reported no association between hallucination-proneness and signal detection performance (Hoskin, Hunter, & Woodruff, 2014; Vercammen et al., 2008). It is possible that presentation of the cue with voice presentation may have focused attention on voice detection, and therefore reduced the rate of spontaneous AVI, which may have masked the association with hallucination-proneness. In future experiments, it would be informative to include a condition in which no voice presentation cue is included, to test this hypothesis. A further manipulation would be to engage participants in different types of mental imagery (e.g., visual or motor imagery) to test modality specificity of the effect of imagery on auditory signal detection. If only use of AVI elicited lower response biases in hallucination-prone individuals, this would provide further support for our interpretation that AVI could become misattributed to an external source (the noise). In contrast, if this effect was also elicited by visual imagery (on an auditory task), this would imply that more general processes may underlie the effect.
Indeed, one possible objection to the interpretation of these data as relating directly to the external misattribution of internal mental imagery relates to the role of working memory, and the cognitive load associated with generating AVI during the task. Research has previously shown that increasing working memory load can lead to a reduction in the sense of agency over self-generated actions (Hon, Poh, & Soon, 2013). From our data, it is not possible to rule out the possibility that the increased working memory load (by presenting a sentence to be detected) may have interacted with hallucination-proneness, which could underlie the observed effect. However, this explanation seems unlikely, given the relatively light cognitive load involved in our task: Hon et al., for example, did not find an effect of working memory on the sense of agency using a lower working memory load (two presented items), but did with a higher load (six presented items).
It is also not possible to rule out that attentional processes may underlie the observed effect; for example, heightened attention to the to-be-detected stimuli in the AVI condition may have increased the participants' willingness to respond that a voice was present (although the reverse could also be the case, in that heightened attention could plausibly decrease willingness to respond that a voice was present). Contemporary cognitive theories have suggested that biased attentional processes may underlie some AVHs (Hugdahl et al., 2008), and it is likely that reality discrimination biases and attentional biases are not wholly independent constructs. Future research, though, should investigate the relation between working memory, attentional biases and auditory signal detection in relation to hallucinations.
A key area of research will be to understand what causes some instances of AVI/inner speech to become misattributed, but not others. The present study found no evidence that negatively valenced words were more likely to become misattributed, which does not provide evidence for the hypothesis that negative, egodystonic thoughts may be externalised and experienced as a hallucination (Morrison et al., 1995). This supports previous research using source memory tasks, which found that words associated with traumatic events were not more likely to be externally misattributed (Bendall et al., 2011), but stands in contrast to research suggesting that emotionally negative outcomes affect low-level sensorimotor agency (Yoshie & Haggard, 2013) and higher-level attributional biases (Mezulis et al., 2004). Previous research has, however, shown that inducing negative affect in participants causes an increase in the number of external misattributions on a typical auditory signal detection paradigm (Hoskin et al., 2014;Smailes, Meins, & Fernyhough, 2014). This might imply that the content of the inner speech does not play a role in its misattribution, but instead a general state of negative affect may cause an increase in the likelihood of external misattributions.
Furthermore, it is possible that the valence of the AVI may interact with individual schemas relating to the self-concept; that is, the likelihood that a negatively valenced statement will be externally misattributed may be related to the extent to which the individual holds negative views about themselves. It is possible that a negative self-concept would lead to fewer misattributions of negative items (as they may be more likely to be attributed to the self). If this were the case, emotional valence of the stimuli used in the signal detection task might be expected to interact with measures of self-esteem or negative schemas relating to the self, as opposed to proneness to hallucinations. Thus, future research should aim to examine the role played by positive and negative beliefs about the self, as well as the role of affective states, in modulating participants' reality discrimination abilities. An alternative (although not exclusive) possibility is that dialogic inner speech (that takes on the quality of a back-and-forth conversation), or inner speech that includes the voices of other people, may be more likely to be misattributed under conditions of high cognitive load or stress (Fernyhough, 2004;Jones & Fernyhough, 2007a). Further research that manipulates qualitative aspects of AVI and investigates their interaction with affective state is merited.