Effects of expectations and sensory unreliability on voice detection – A preregistered study

The phenomenon of "hearing voices" can be found not only in psychotic disorders, but also in the general population, with individuals across cultures reporting auditory perceptions of supernatural beings. In our preregistered study, we investigated a possible mechanism of such experiences, grounded in the predictive processing model of agency detection. We predicted that in a signal detection task, expecting fewer or more voices than actually present would drive the response bias toward a more conservative or a more liberal response strategy, respectively. Moreover, we hypothesized that adding sensory noise would enhance these expectancy effects. In line with our predictions, the findings show that the detection of voices relies on expectations and that this effect is especially pronounced in the case of unreliable sensory data. As such, the study contributes to our understanding of predictive processes in hearing and of the building blocks of voice hearing experiences.


Introduction
"One of them could have possibly been an owl, but the other one was, like, a cackling." The Blair Witch Project Imagine entering a house that is said to be haunted, where pitch-black darkness makes it impossible to discriminate what is in one's surroundings.Your mind has no choice but to resort to the auditory modality, but the humming sound you hear might as well be the whistle of the wind, the wailing of an animal, or a whisper.Unable to distinguish what is out there, your mind may be forced to rely on what it expected to encounter in the haunted house in the first place.This, in turn, can leave you with an eerie feeling that what you heard was the voice of a ghost.
While "hearing voices" is a phenomenon often treated as a crucial symptom of psychotic disorders (Waters et al., 2017), a growing body of research recognizes that it is common also in non-clinical populations (Pierre, 2010;Waters et al., 2017).Voice hearing also appears in many religious traditions, both as paradigmatic events (e.g., Saul hearing the voice of God) and personal experiences declared by individuals: believers from the Vineyard Church claim to hear God (Luhrmann, 2012; see also Cook, 2019 for discussion of voice hearing in Christian tradition), members of the Ilahita Arapesh cult consider ritual sounds to be voices of spirits (Tuzin et al., 1984) and paranormal believers tend to hear voices in noisy recordings, treating them as evidence for the existence of ghosts and other otherworldly beings (see Nees & Phillips, 2015).How do the experiences of voice hearing come about?
At least since the seminal work of James (1902/2011), many explanations of supernatural experiences have been proposed (e.g., Tuzin et al., 1984; Pyysiäinen, 2001), with some focusing particularly on the experience of direct contact with a supernatural being (e.g., Luhrmann, 2012; Andersen, 2019; Van Leeuwen & van Elk, 2019; Luhrmann et al., 2021). In the current study, we investigate the mechanism of voice hearing proposed by one of these latter theories, namely, Andersen's (2019) predictive processing model of agency detection (PPAD).
The term "agency detection", coined within the cognitive science of religion (CSR; see White, 2021), traditionally encompasses various phenomena related to detecting intentional agents in one's surroundings, which is said to be a core cognitive capacity of humans (Spelke & Kinzler, 2007;Spelke, 2022).Previous studies on agency detection focused on the recognition of biological motion (e.g., Mar et al., 2007;Maij et al., 2019), the detection of agents said to inhabit a virtual environment (e.g., Andersen et al., 2019b;Tratner et al., 2020), as well as voice detection (e.g., Maij et al., 2019), among others.Importantly, it has been claimed that agency detection follows a "better safe than sorry" evolutionary logic.In the environment of evolutionary adaptedness, it was better to falsely detect an agent than to neglect signs of agency and risk being unprepared for a potential encounter, e.g., with a predator.Based on that, some researchers posited that humans tend to detect non-existent beings when faced with ambiguous sensory data, which could account for the ubiquity of religious beliefs (e.g., Barrett, 2000;Barrett & Lanman, 2008).However, recently, Andersen (2019) presented a new, PPAD model grounded in the framework of predictive processing (see e.g., Hohwy, 2013;Clark, 2016;Hohwy, 2020;Clark, 2023) and argued that humans are not hard-wired to overestimate agency in their surroundings.In PPAD, false positive agency detections are primarily attributed to prior expectations about agents combined with the unreliability of the sensory data.Notably, in the context of encounters with supernatural agents, these expectations can be shaped by sociocultural learning, including religious beliefs, and religiously meaningful environments often provide induce perceptual uncertainty by limiting or distorting sensory data.
As discussed below, predictive processing-based research on audition provides tentative support for Andersen's model, and while its core predictions have been supported in a study focusing on the visual modality (Andersen et al., 2019b), we decided to put them to a test in a purely auditory setting to explore whether the proposed mechanism could underlie at least some instances of voice hearing. Specifically, we inquired whether varying expectations and unreliable sensory data can bias the detection of voices. Below, we elaborate on the theoretical framework and the study rationale.

Predictive account of agency detection
After Guthrie (1993) proposed that humans have a universal inclination towards anthropomorphism, i.e., the tendency to perceive human-like patterns in natural phenomena, a new research program emerged, building upon the idea that our brain is equipped with a cognitive module dubbed the "Hyperactive Agency Detection Device" (HADD; Barrett, 2000). The hyperactivity of the HADD was supposed to manifest in a high number of false alarms, which served an adaptive function. Based on the logic of error management theory (see Johnson et al., 2013), it is better to falsely detect a potentially dangerous agent than to assume safety. Thus, it has been posited that the HADD helped us avoid costly errors during our evolutionary history, such as failing to detect a predator when one was actually present. Furthermore, the activation of the HADD was said to be more likely in the face of ambiguous sensory data (Barrett & Lanman, 2008) or in situations of perceived threat (Maij et al., 2019). As for the relationship between agency detection and religion, Guthrie (1993) argued that the universality of our anthropomorphic tendencies could explain the prevalence of supernatural beliefs. However, Barrett & Lanman (2008) narrowed this proposal down to the claim that we tend to explain anomalous agency detection experiences by referring to supernatural beings that we already believe in.
Several experimental studies have been conducted to test core predictions of the HADD model. First, when it comes to agency detection under ambiguous sensory data, van Elk (2013) found that low and intermediate levels of noise led paranormal believers to detect more illusory agency than skeptics. However, this effect disappeared when the levels of noise were high. In another series of studies, van Elk and colleagues (2016) primed participants with human and supernatural agent concepts and measured their false positive agency detections in stimuli with different levels of noise. Here, no effect of noise on agency detection was found.
Second, it has been predicted that the HADD should result in more false positive agency experiences under conditions of threat. However, in a series of experiments including various operationalizations of agency detection (e.g., biological motion recognition, an auditory task, and an agency detection task in virtual reality), Maij et al. (2019) did not observe any impact of the feeling of threat on the detection of agents, a result that contrasts with the "better safe than sorry" principle derived from error management theory, as noted by the authors.
Finally, when it comes to the relationship between agency detection and supernatural beliefs, studies have shown mixed results. For example, the aforementioned study by van Elk (2013) found an effect of paranormal beliefs on agency detection, and, in line with that result, Willard and Norenzayan (2013) reported a relationship between paranormal, but not religious, beliefs and agency detection. On the other hand, Tratner et al. (2020) did not find supernatural belief to predict agency detection in virtual reality, and the study by van Elk et al. (2016) did not corroborate an effect of supernatural primes on the detection of agents. Overall, the empirical findings have resulted in some revisions of the original HADD model and theory (Andersen, 2019; Van Leeuwen & van Elk, 2019). For instance, Andersen (2019) argued that HADD is based on a modular approach in evolutionary psychology, which pictures the mind as a Swiss-army knife (Cosmides & Tooby, 1994) composed of independent, encapsulated modules (for a broader discussion of the HADD model, see also Van Leeuwen & van Elk, 2019; Lisdorf, 2007). Based on both theoretical and empirical concerns, instead of a modular-evolutionary view of HADD, Andersen (2019) proposed a domain-general cognitive mechanism of agency detection embedded within the predictive processing framework: the PPAD model.

P. Szymanek et al.
Predictive processing is an approach to cognition that has gained traction in recent years (Hohwy, 2013, 2020; Parr, Pezzulo & Friston, 2022; Clark, 2016, 2023), also within CSR (e.g., Schjoedt et al., 2013; Andersen et al., 2019a; Andersen et al., 2019b). While different versions of the predictive processing framework exist (see Litwin & Miłkowski, 2020), the main premise remains the same: our mind works as a prediction-generating machine, with hypotheses, stemming from our model of the world, being tested against the incoming sensory stimuli. Should the model and the data not fit together, a prediction error is generated, updating the model in response to the perceived stimuli so that new predictions are more accurate in the future. To estimate whether a given prediction matches what is currently perceived, the brain performs a Bayesian-like inference: it estimates the posterior probability of different models (interpretations) based on the available evidence by considering the probability of the evidence given that a prediction is correct (the "likelihood") and moderating it with an estimate of how probable the prediction is independent of any data (the "prior probability"). When external stimuli become uncertain and the likelihoods of different interpretations start to approximate each other, the precision of the bottom-up sensory input is down-weighted and the importance of the prior probability increases. Or, to put it in other terms, when the data is expected to be imprecise, which leads to a large prediction error (Feldman & Friston, 2010), the error no longer drives our perception to the same extent as in the case of sensory data of high expected precision. For example, in a well-lit room, we can see clearly and base our perception on both the likelihood and the prior probability of external stimuli, but as the light dims, we are forced to resort to our assumptions about the world.
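The inference logic described above can be sketched in a few lines of code. This is our illustration, not part of the study; the function name and all probability values are hypothetical. It shows the key consequence exploited by the PPAD model: as the evidence becomes ambiguous (likelihood ratio approaching 1), the posterior collapses onto the prior.

```python
# Illustration (not from the study): Bayesian updating of the hypothesis
# "an agent is present". As the sensory evidence becomes ambiguous
# (likelihood ratio -> 1), the posterior collapses onto the prior.

def posterior(prior: float, likelihood_ratio: float) -> float:
    """P(agent | data) from P(agent) and P(data | agent) / P(data | no agent)."""
    odds = (prior / (1 - prior)) * likelihood_ratio
    return odds / (1 + odds)

prior = 0.8  # a strong expectation of encountering an agent (made-up value)

# Clear evidence against an agent overrides the prior...
print(posterior(prior, likelihood_ratio=0.05))  # ~0.17
# ...but ambiguous evidence leaves the prior in charge.
print(posterior(prior, likelihood_ratio=1.0))   # ~0.8
```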
Thus, the PPAD model proposes that humans have no default bias that would make us detect illusory agency. Instead, detection of agents that are not really there is most likely to happen when 1) our prior probability of encountering an agent is high and 2) the incoming sensory data is ambiguous, which results in perceptual processes relying more strongly on priors. Importantly for the study of religion, Andersen argues that the priors can be shaped by, among others, the sociocultural transmission of religious ideas, which provides us with the "distribution" of supernatural agents in our surroundings (e.g., "You can hear God in a church"). On the other hand, the environments where supernatural agents are said to roam are usually places of low sensory reliability: dark, misty, full of strange sounds or lights, etc. All in all, the PPAD model offers an explanation of why people tend to encounter various sorts of otherworldly beings, based on a plausible process of religious learning shaping our expectations and on the lack of reliable sensory data, tied to religiously relevant surroundings.
To provide preliminary evidence for the theory, Andersen et al. (2019b) conducted a virtual reality study in which participants were asked to detect beings that were supposed to inhabit a virtual forest. However, no real agents were implemented in the virtual environment. The authors used a priming manipulation, inducing high expectations (i.e., one group was told that there were multiple beings in the forest and that there was a 95 % chance of detecting at least one) or low expectations (i.e., the other group was told the opposite, with a 5 % chance). Additionally, the authors manipulated sensory reliability: all participants went through two versions of the same forest, one with clear weather and one with dense mist. As predicted, the participants detected significantly more agents both when informed of a high probability of encounters and when placed in a situation of high sensory ambiguity.
The idea that perception is shaped by our expectations has already been supported by several empirical studies (see De Lange et al., 2018). For example, neuroimaging studies show that expecting a particular stimulus biases representations in the visual cortex (Kok et al., 2013) and decreases their amplitude while making them more accurate (Kok et al., 2012). Regarding auditory processing, not only is there general evidence for a prediction-testing mechanism in our auditory system (see Bendixen et al., 2012; Bendixen, 2014; Heilbron & Chait, 2018), but also, specifically, auditory verbal hallucinations have been explained either as predictive processing abnormalities (Fletcher & Frith, 2009; Wilkinson, 2014) or, on the contrary, as a result of the normal functioning of the predictive brain. For example, in line with the PPAD model, Benrimoh et al. (2018) suggest that auditory hallucinations originate from strong priors and sensory data of low precision, which also aligns with reports of voice hearing in the context of sensory deprivation (Pierre, 2010). Unable to decide between different hypotheses based on sensory data, we resort to our internal model of the world, in which a human voice is often considered the most likely auditory percept to appear (Wilkinson & Bell, 2016; see also Swiney & Sousa, 2013 for a discussion of voice hearing and attributing agency to thoughts).
Overall, previous research suggests that detecting auditory, agent-related stimuli is influenced by both one's prior expectations and the reliability of the sensory data. In our experimental study, we manipulated these factors to investigate their impact on voice detection.

The present study
We used a modified version of the Auditory Agent Detection Task (see Maij et al., 2019), in which participants were asked to detect voices in a series of auditory stimuli, of which 50 % had an actual voice embedded in them. The task was to respond with the left or the right button if they heard or did not hear a voice, respectively. Participants were randomly assigned to two groups, primed to expect fewer or more voices than objectively present (henceforth: "low" and "high" expectations). We adopted a signal detection approach (see, e.g., Macmillan, 2002), which allows one to assess the performance of a given agent in discriminating signal from noise by analyzing the numbers of correct responses (hits and correct rejections) and incorrect responses (misses and false alarms). Based on those numbers, we could calculate two measures of participants' performance: perceptual sensitivity (d'), the capacity to distinguish between stimuli with an embedded voice and stimuli without it, and criterion (c), an indicator of the response bias, i.e., a preference for the "voice present" or the "voice not present" response. As our first goal was to test whether the response bias is affected by expectations regarding the number of voices, we believe that signal detection theory was a well-suited approach that allowed us to explore whether different expectations drive not only false positive detections, but also false negatives and correct responses.
To test the second prediction of the model, namely, that sensory unreliability increases reliance on prior expectations, we added white noise to half of all recordings, expecting that participants would exhibit an even stronger, expectations-driven response bias while detecting voices in noisy stimuli. Thus, our hypotheses were as follows:
1. Participants in the high expectations group will exhibit a stronger bias towards positive responses than participants in the low expectations group.
2. Participants in the high expectations group will exhibit a stronger bias towards "signal present" responses while detecting voices in stimuli with noise as compared to stimuli with no noise.
3. Participants in the low expectations group will exhibit a stronger bias towards "signal not present" responses while detecting voices in stimuli with noise as compared to stimuli with no noise.
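The logic behind these hypotheses can be illustrated with a small simulation (ours, not the study's code) of the standard equal-variance Gaussian observer: shifting the criterion changes hit and false alarm rates together while leaving sensitivity untouched. The function name and all parameter values below are hypothetical.

```python
# Illustration (not the study's code): an equal-variance Gaussian
# signal detection observer. Shifting the criterion c trades hits
# against false alarms while sensitivity (d') stays constant.
from statistics import NormalDist

def rates(d_prime: float, criterion: float) -> tuple:
    """Hit and false alarm rates for noise ~ N(0, 1), signal ~ N(d', 1),
    responding "yes" whenever the observation exceeds d'/2 + c."""
    phi = NormalDist().cdf
    threshold = d_prime / 2 + criterion
    hit = 1 - phi(threshold - d_prime)
    false_alarm = 1 - phi(threshold)
    return hit, false_alarm

# Hypothetical values: a liberal (high-expectations) and a conservative
# (low-expectations) observer with identical sensitivity.
liberal = rates(d_prime=1.5, criterion=-0.5)
conservative = rates(d_prime=1.5, criterion=0.5)
print(liberal, conservative)
```

The liberal observer produces more hits and more false alarms than the conservative one, which is exactly the response pattern the hypotheses predict for the high versus low expectations groups.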
Additionally, to make sure that our effects were not driven by a drop in general performance, we also tested the following assumptions:
• Participants in the high expectations group will not exhibit a different perceptual sensitivity than participants in the low expectations group.
• Participants will not exhibit a different sensitivity to signals while detecting voices in stimuli with noise as compared to stimuli without noise.
Furthermore, we added a confidence measure to our design, with which participants could rate how confident they were of each response. We also used two scales to investigate the relationship between agency detection and supernatural beliefs (see above) as well as the potential moderating effect of absorption, which is supposed to drive experiences of sensed presence, a concept closely related to agency detection (Luhrmann et al., 2021). Finally, we decided to minimize possible distractions by depriving participants of visual cues using a blindfold.

Participants
To determine our sample size, we conducted a G*Power analysis for linear multiple regression. Assuming a medium-sized effect (rule-of-thumb value of f² = 0.15) and three predictors (expectancy, noise, and their interaction), the required sample size for 0.95 power was 119. Accordingly, we followed a stopping rule, i.e., we finished our data collection when we reached a valid sample of 119 participants. To this end, we monitored any issues occurring during the trials, such as failing to complete the procedure (see the Results section). We did not, however, access the collected data before a final, valid sample of 119 participants was collected.
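For readers without G*Power, the same a priori computation can be reproduced, e.g., in Python with SciPy. This is our re-derivation, not the authors' procedure, under the standard assumption that the multiple-regression F statistic follows a noncentral F distribution with noncentrality λ = f²·N under the alternative.

```python
# Our re-derivation of the reported G*Power analysis (the authors used
# G*Power itself). Under the alternative, the F test for a multiple
# regression with k predictors follows a noncentral F distribution
# with noncentrality lambda = f^2 * N.
from scipy.stats import f as f_dist, ncf

def power(n: int, k: int = 3, f2: float = 0.15, alpha: float = 0.05) -> float:
    crit = f_dist.ppf(1 - alpha, k, n - k - 1)
    return ncf.sf(crit, k, n - k - 1, f2 * n)

n = 10
while power(n) < 0.95:
    n += 1
print(n)  # 119, matching the sample size reported above
```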
Overall, we collected data from 122 Polish-speaking participants (77 F, 34 M, 11 other/data unavailable; age: M = 25.2, SD = 4.2) with no hearing impairment or with corrected hearing, recruited through announcements on Facebook. Each participant received a 50 PLN (approx. 12 USD) gift card to a media store for taking part in the experiment. The study was conducted in a group setting, with up to 6 participants in the lab at the same time.
Fig. 1. The structure of an individual trial of the auditory procedure. Note. In each trial, participants heard a "fixation sound" (a long, high tone), after which the stimulus was presented. Then, after a short, high tone, participants pressed the left or the right button. Finally, after a short, low tone was presented, participants rated their subjective response confidence on the arrow buttons.

Study design and procedure
We used a mixed factorial design, with one between-subject factor (dividing the participants into the high expectations and low expectations groups) and one within-subject factor (white noise present in half of the stimuli). Of note, we did not use a repeated measures design for the former factor, as it could increase the risk of participants guessing the hypotheses. Participants in the high expectations group were told in the instruction that there was a high number of voices in the stimuli and a 75 % chance of a voice being present in any of the recordings; conversely, the low expectations group was told that the number was small and that there was a 25 % chance. The voices were present in exactly fifty percent of all stimuli.
The participants were allocated to one of the two groups based on their arrival time. After a short introduction, they were informed that they were about to participate in an experiment on auditory perception, in which their task would be to detect voices in a series of recordings.
Next, participants gave their informed consent and received detailed instructions. Their task consisted of listening to a series of five-second recordings in headphones and trying to detect human voices embedded in them. The beginning of each recording was signaled by a high tone (0.5 s, 450 Hz, amplitude 0.2). Next, after the target stimulus, a shorter high-pitch sound signal (0.1 s, 750 Hz, amplitude 0.2) was played, signifying a three-second break to press the "Voice present" (left control) or "Voice not present" (right control) response button. Both of those buttons had tactile cues made from the same type of sandpaper. Next, a low-pitch signal (0.4 s, 150 Hz, amplitude 0.2) preceded a second three-second break, allowing participants to give their meta-cognitive judgment by estimating the certainty of their response (up: not certain at all; left: not certain; down: certain; right: very certain). The confidence buttons also had sandpaper tactile cues, with the finest grade of sandpaper used for the "not certain at all" key and granularity increasing up to the coarsest grade used for "very certain". After pressing any of the arrow keys, a new trial started (see Fig. 1 for the structure of an individual trial).
Participants put on the blindfolds and headphones and began the auditory procedure, which started with a recording of the detailed instruction, the same for each participant. After that, the participants went through a training phase consisting of 10 trials with 7 or 3 voices (for the groups later primed to have high or low expectations, respectively), which were easier to detect than in the main experimental phase. Just before the start of the main phase, participants listened to a final instruction, in which they were informed about the number of voices present in all trials and the average chance of detecting a voice in each trial (high: 75 % vs. low: 25 %).
The experimental phase of the procedure consisted of 120 trials presented in two randomized blocks, with an up-to-one-minute, skippable break after 60 trials, followed by an unskippable reminder of the instruction, including the manipulation. After finishing the task, each participant went through the questionnaires, was debriefed, and received a gift card.
The study conformed to the Declaration of Helsinki and was approved by the Ethics Committee at the Jagiellonian University. The study was preregistered at the Open Science Framework (https://osf.io/ywpg2). It was conducted at the facilities of the Mathematical Cognition and Learning Lab.

Auditory procedure and stimuli
The experimental procedure was prepared using the PsychoPy software (Peirce et al., 2019).
For generating and processing the fixation sounds and stimuli, we used the Audacity software. Each target stimulus was 5 s long. For the background, inspired by the forest-like virtual environment used by Andersen et al., we used 10 randomly chosen sound fragments of a free-license forest ambience (Sounds of the Forest Night, 2022), with a reverb added to enhance the feeling of surround sound. Next, we recorded a male speaker pronouncing 10 Polish pseudowords chosen randomly, but with an equal share of same-length words, from the validated dataset by Imbir et al. (2015). We decided to use pseudowords instead of real words to prevent semantic and memory processes from coming into play. We used only the pseudowords with the highest ratings from competent judges.
Each of the 10 forest ambience sounds was copied, and white noise (amplitude: 0.015) was added to the newly created 10 sounds. To each of the 20 resulting files we then added a voice three times, placing the voice in the first third, the middle, or the final third of the sound timeline. That way, we obtained 60 sound files with a voice (30 noise and 30 no noise). As for the sounds without a voice, each of the 20 files from before adding the voices was repeated 3 times in the experiment.
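The stimulus bookkeeping above can be sketched as follows (our illustration; the file labels are hypothetical): 10 ambiences × 2 noise levels × 3 voice positions give the 60 voice trials, and the 20 voice-free files repeated three times give the 60 no-voice trials.

```python
# Our sketch of the stimulus set described above; file labels are
# hypothetical. 10 ambiences x 2 noise levels x 3 voice positions
# = 60 voice trials; 20 voice-free files x 3 repetitions
# = 60 no-voice trials; 120 trials in total.
from itertools import product

ambiences = [f"forest_{i:02d}" for i in range(1, 11)]
noise_levels = ["no_noise", "noise"]
positions = ["first_third", "middle", "final_third"]

voice_trials = [
    {"ambience": a, "noise": n, "voice": True, "position": p}
    for a, n, p in product(ambiences, noise_levels, positions)
]
no_voice_trials = [
    {"ambience": a, "noise": n, "voice": False, "position": None}
    for a, n in product(ambiences, noise_levels)
    for _ in range(3)  # each voice-free file repeated three times
]
trials = voice_trials + no_voice_trials
print(len(voice_trials), len(no_voice_trials), len(trials))  # 60 60 120
```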
The volume of all sounds used in the stimuli was modified as follows: +15 dB (forest ambience), +7 dB (white noise), and −17 dB (voices). Regarding the training stimuli, we used another 5 fragments of forest ambience, did not embed noise in them, and made the audibility of the voices significantly higher (we decreased the original volume by only 5 dB) to allow participants to learn to detect the target stimuli during the practice trials. Five additional pseudowords were recorded specifically for the training and distributed unequally between the study groups (7 vs. 3 voices present in the training). All auditory stimuli can be found in the Supplementary Materials.

Questionnaires
Two questionnaires, namely, the Paranormal and Supernatural Belief Scale (Dean et al., 2021) and the Absorption Scale (see Luhrmann et al., 2021), both translated using a back-translation procedure, were distributed at the end of the procedure. Apart from completing these two scales, participants were also asked about their age, gender, and education, as well as whether they suspected what the real purpose of the study was and at what point they started to have an idea about the hypothesis being tested.

Equipment
The procedure ran on Dell OptiPlex 7000 PCs, with AKG headphones, chosen for their good external noise-canceling capacity.

Pilot study
Before the main data collection, two short pilot studies were conducted. A short description of these pilot sessions can be found in the Supplementary Materials.

Preprocessing of signal detection data
We calculated z-scores for the numbers of hits, false alarms, omissions, and correct rejections for each participant (separately for the noise and no noise conditions). Based on these, we computed sensitivity (d' = z(hits) − z(false alarms)) and criterion (c = −½(z(hits) + z(false alarms))). A higher c corresponded to a more conservative strategy (detecting fewer signals and committing fewer false alarms) and a lower c to a more liberal strategy (detecting more signals and committing more false alarms). Thus, our hypotheses predicted 1) a lower c in the high expectations group compared to the low expectations group; 2) a lower c in noise stimuli than in no noise stimuli within the high expectations group; and 3) a higher c in noise stimuli than in no noise stimuli within the low expectations group.
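As a minimal sketch (ours, not the preregistered analysis script), the two measures can be computed from raw counts as follows. The example counts are hypothetical, and note that hit or false alarm rates of exactly 0 or 1 would require a correction before the z-transform.

```python
# Our minimal re-implementation of the preregistered measures:
# d' = z(H) - z(F) and c = -(z(H) + z(F)) / 2, where H and F are the
# hit and false alarm rates. Rates of exactly 0 or 1 would need a
# correction (e.g., log-linear) before the z-transform.
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    z = NormalDist().inv_cdf
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, criterion

# Hypothetical counts for one participant and condition:
d_prime, c = sdt_measures(hits=24, misses=6, false_alarms=9, correct_rejections=21)
print(round(d_prime, 2), round(c, 2))  # 1.37 -0.16 (slightly liberal)
```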

Method of main analysis
To test our hypotheses, we conducted a linear mixed effects regression using the R language (R Core Team, 2022) and the "lme4" R package (Bates et al., 2015). We followed the guidelines by Winter (2013). To interpret the results, we used the "report" R package (Makowski et al., 2023), which computes confidence intervals and p-values using a Wald t-distribution approximation.

Main analysis
We started the preregistered data preprocessing by excluding 3 out of the 122 participants based on issues that occurred during the experimental trials (one encountered a malfunction in the procedure; one had already taken part in an analogous, unpublished study; one sneaked into the lab, started the procedure on their own, and exhibited signs of general confusion). Then, we split the data by noise and calculated the numbers of hits, false alarms, omissions, correct rejections, and missing responses, as well as the respective rates, split by the block of trials. Next, we excluded from all analyses 12 participants who correctly guessed the purpose of the study and declared guessing it no later than in the middle of the auditory procedure. Finally, as specified in the preregistration, another 3 participants were excluded based on their number of missing detection responses (M ± 3 SD outliers) and low correctness (overall hit rate < 0.4 or overall false alarm rate > 0.6). That left us with final data from 104 participants. We conducted additional quality checks to make sure that the hardware had functioned in good order. The signal detection measures are summarized in Table 1.
After transforming the data with an inverse tangent, we tested the assumption of homogeneity of variance with Levene's test separately for group and noise as predictors (p = 0.884 and p = 0.282, respectively), and we tested the assumption of normality of residuals using the Shapiro-Wilk test (p = 0.194). Then, we fitted the linear mixed effects model with group and group × noise as predictors, adding a random effect for participants. No outliers (M ± 3 SD) of the dependent variable were found. The model's total explanatory power was R² = 0.72. We found a statistically significant effect of group (β = −0.27, t(202) = −2.65, 95 % CI [−0.47, −0.07], p = 0.009) and an interaction effect of group and noise (β = −0.30, t(202) = −3.57, 95 % CI [−0.46, −0.13], p < 0.001). These results led us to reject the null hypotheses, namely the lack of a group effect on response bias and the lack of an interaction between group and noise.
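The structure of this model can be sketched in Python with statsmodels (our re-creation; the study used lme4 in R). The simulated data, group coding, and effect sizes below are purely hypothetical and serve only to demonstrate the fitting step: response bias regressed on group and the group × noise interaction, with a random intercept per participant.

```python
# Our Python re-creation of the model structure (the study used lme4 in
# R): response bias regressed on group and the group x noise interaction
# with a random intercept per participant. All data and effect sizes
# here are simulated/hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for pid in range(80):
    group = pid % 2                      # 0 = low, 1 = high expectations
    subject_effect = rng.normal(0, 0.1)  # random intercept
    for noise in (0, 1):
        c = 0.2 - 0.4 * group - 0.3 * group * noise + subject_effect + rng.normal(0, 0.1)
        rows.append({"pid": pid, "group": group, "noise": noise, "c": c})
df = pd.DataFrame(rows)

model = smf.mixedlm("c ~ group + group:noise", df, groups=df["pid"]).fit()
print(model.params["group"], model.params["group:noise"])  # both negative here
```

With the built-in negative effects, the fitted coefficients for group and group:noise come out negative, mirroring the direction of the effects reported above.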
Next, we examined the estimated marginal means and compared them pairwise using Tukey's correction. We found a statistically significant difference between the no noise and noise conditions in the low expectations group (t = −3.05, p = 0.015, d = −0.62, 95 % CI [−0.34, −0.03]) in the predicted direction. The difference between the no noise and noise conditions in the high expectations group had the predicted direction, but it did not turn out to be statistically significant (t = 1.92, p = 0.225, d = 0.37, 95 % CI [−0.04, 0.26]). However, the effect of group was statistically significant in the predicted direction within both the no noise condition (t = 2.63, p = 0.046, d = 0.9, 95 % CI [0.01, 0.54]) and the noise condition (t = 5.50, p < 0.001, d = 1.89, 95 % CI [0.30, 0.84]), with the latter effect size being larger than the former. The estimated marginal means are presented in Fig. 2. To test our additional assumptions, namely, that noise and group do not predict decreased sensitivity (d'), we conducted two Wilcoxon tests, since the assumptions for linear regression were violated. While the paired-samples Wilcoxon test showed that d' values were lower in the noise condition than in the no noise condition (V = 1716, p = 0.002), the independent-samples test did not provide evidence for an effect of group on sensitivity (W = 5168, p = 0.593; see Fig. 3).
We also tested for effects of block order on both sensitivity and response bias: after checking the adequate assumptions, we ran a linear mixed effects regression for d' ~ block and compared it with the null model. We did not find a statistically significant effect of block (χ²(1) = 0, p = 1). The same was true for a paired-samples t-test for c ~ block (t < 0.001, p = 1). However, to additionally check whether our main effect of group × noise on the response bias changed over time, we decided to fit the mixed model from the main analysis separately to the data from the first and second blocks. The effect of group × noise on response bias was more pronounced in the second block compared to the first, though the pattern in the data was consistent across blocks. A table of p-values and plots of the estimated marginal means in block 1 and block 2 can be found in the Supplementary Materials.

Individual differences: Supernatural beliefs
In order to test the hypothesis that supernatural beliefs are linked to agency detection (e.g., Nieuwboer et al., 2014; Riekki et al., 2013; Riekki et al., 2014; Tratner et al., 2018; van Elk, 2013; van Elk et al., 2016), we performed the same linear mixed-effects regression as in the main analysis, this time adding mean scores on the PSBS scale as an additional predictor, together with its interactions with the other factors. We started by preprocessing the questionnaire data and excluding 4 participants with missing responses (no participant failed the attention check). Cronbach's alpha was above our preregistered value of 0.7, and statistical assumptions were not violated. There were no +/-3 SD outliers in response bias or PSBS mean scores. The model's total explanatory power was R² = 0.73. Within this model, group (β = -0.53, t(194) = -1.35, 95% CI = [-1.30; 0.24], p = 0.180) was not a statistically significant predictor of response bias, and there was also no effect of PSBS score (β = -0.20, t(194) = -1.70, 95% CI = [-0.44; 0.03], p = 0.090). This analysis also did not show any interaction between group and noise (p = 0.251), group and PSBS score (p = 0.473), noise and PSBS score (p = 0.107), or between all three predictors (p = 0.838).

Individual differences: Absorption
The next exploratory analyses concerned individual differences in absorption, which has been suggested to drive experiences of sensed presence across individuals (Luhrmann et al., 2021). We were interested in whether absorption predicts response bias and interacts with expectation or noise, eliciting a more liberal or more conservative strategy when detecting voices. We used the data preprocessed as described in the previous subsection and focused on the sum scores of the absorption scale. Cronbach's alpha for absorption scores was higher than our preregistered value of 0.7, and we did not find evidence for violation of statistical assumptions. There were no +/-3 SD outliers of response bias or absorption.

Fig. 2. Estimated marginal means of response bias by group (high expectations vs. low expectations) and noise (noise present vs. no noise). Note. Error bars represent SEs. Lower response bias corresponds to a more liberal signal detection strategy.

Confidence responses
Regarding the metacognitive judgment of how confident the participant was in each voice detection response, we explored the possibility that participants responded with different confidence as a function of group and noise. However, since confidence scores, unlike response bias, were independent of whether a signal was present and whether a response was correct, we decided to look only at hits and correct rejections, fitting two confidence ~ group * noise linear mixed models, one for each type of response. In both analyses, following our preregistration, we excluded 2 participants who were +/-3 SD outliers in missing detection responses and 7 participants who were +/-3 SD outliers in missing confidence responses. Then, we calculated the mean confidence for each participant.
Thus, our analysis of the metacognitive confidence ratings suggests that 1) noise decreases overall confidence across groups; 2) the high-expectations group was more confident in correct "signal present" responses; and 3) the low-expectations group was more confident in correct "signal not present" responses. However, this pattern needs to be interpreted cautiously.
In our final exploratory test, we analyzed the correlation of confidence scores with response bias and sensitivity across the whole sample. While confidence and sensitivity were not related (r = 0.04, 95% CI = [-0.16; 0.23], p = 0.697), we found a correlation between confidence and response bias (r = 0.36, 95% CI = [0.18; 0.53], p < 0.001). After excluding two +/-3 SD outliers in confidence, the correlation was even more pronounced (r = 0.47, 95% CI = [0.29; 0.61], p < 0.001; see the correlation plot in the Supplementary Materials), reflecting that a more conservative response bias was strongly associated with higher confidence ratings.

Discussion
We found that expecting to hear fewer or more voices than actually present leads to a more conservative or more liberal response bias, respectively. Participants primed to expect a high number of voices (75% chance for each stimulus) were generally more likely to detect a present voice (i.e., they had a higher hit rate), but also more likely to commit a false alarm. Conversely, participants expecting a low number of voices (25% chance for each stimulus) had a higher number of correct rejections, but also of misses. We also found that this effect became significantly larger when white noise was present in the auditory stimuli. Together, these findings provide insight into the proximate mechanisms underlying auditory agency detection. Below, we elaborate on the findings and their broader theoretical implications.
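The hit-rate/false-alarm trade-off described above is exactly what a pure criterion shift predicts under the equal-variance Gaussian signal detection model: moving the criterion changes hits and false alarms together while sensitivity stays fixed. The sketch below is ours, for illustration only; the function name `rates` and the parameter values are assumptions, not quantities from the study.

```python
from statistics import NormalDist

def rates(d_prime, criterion):
    """Hit and false-alarm rates under the equal-variance Gaussian model:
    noise evidence ~ N(0, 1), signal evidence ~ N(d', 1);
    respond "voice present" whenever evidence exceeds the criterion."""
    nd = NormalDist()
    hit_rate = 1 - nd.cdf(criterion - d_prime)  # P(respond "yes" | voice present)
    fa_rate = 1 - nd.cdf(criterion)             # P(respond "yes" | no voice)
    return hit_rate, fa_rate

# Same sensitivity, different criteria: a liberal vs. a conservative observer
liberal_h, liberal_fa = rates(d_prime=1.0, criterion=0.0)
conservative_h, conservative_fa = rates(d_prime=1.0, criterion=1.0)
# The liberal observer has both more hits and more false alarms,
# mirroring the high- vs. low-expectations groups.
```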

Predictive processing in auditory agency detection
Our first hypothesis, which predicted that expectations would drive participants' response bias, was supported. The change in response bias observed between groups shows that expectations regarding the probability of encountering auditory signals influence both the number of hits and the number of correct rejections. While this finding contributes to the existing literature on expectations and perception (e.g., De Lange et al., 2018; Kok et al., 2012; Kok et al., 2013), we need to take into consideration that the effect could be related to participants' experience of task demands and their need to comply with the instructions provided. However, we believe that this explanation is rather unlikely, because we also found that the effect of expectations was stronger for stimuli with noise, which indicates that expectations came into play primarily when the sensory information was ambiguous.
Therefore, our results generalize the effect already found in a visual setting (Andersen et al., 2019b) to the auditory domain, suggesting that similar mechanisms are at play for agency detection in at least both of these modalities. According to Andersen's (2019) predictive processing account, our perception is intrinsically shaped by our prior beliefs about the world. Assuming that these prior beliefs can be affected by proximate sociocultural transmission, such as the information presented to participants in our study, our results support the idea that auditory perception may also be shaped by this predictive mechanism. The results also speak for the predictive processing account when we consider the increase of the expectancy effect in stimuli with noise. We hypothesized that within both groups, the presence of noise would enhance the response bias. We found this effect only in the low-expectations group, where participants showed a stronger conservative bias for stimuli with noise, which corroborated Hypothesis 3. We did not find evidence for the analogous effect in the high-expectations group; hence, we failed to provide support for Hypothesis 2.
However, as the effect of group was over two times larger in the noise condition, we believe that overall, our data align well with predictions stemming from Andersen's PPAD model. As pointed out by Andersen (2019; see also Benrimoh et al., 2018), within the brain, the probability of a given hypothesis is based not only on its prior probability but also on its likelihood. In conditions of sensory unreliability, the difference between the likelihoods of different hypotheses decreases; in other words, the unreliable or scarce data do not fit one hypothesis better than another, so priors come into greater play. We believe that our data might reflect that effect. Importantly, against our assumptions, we found that the presence of noise decreased sensitivity, but we find it unlikely that the clear, two-way pattern in the data appeared due to a mere drop in participants' general performance. In this regard, adopting a signal detection approach allows a more in-depth analysis of the effect, and we believe that further studies on PPAD would greatly benefit from employing the same method.
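The prior-likelihood trade-off described above can be made concrete with a two-hypothesis Bayes' rule toy example (our illustration; the numbers are arbitrary and not fitted to the data): when the stimulus is clear, the likelihoods differ sharply and the posterior is dominated by the data, but when noise makes the likelihoods nearly equal, the posterior tracks the prior.

```python
def posterior_voice(prior_voice, lik_voice, lik_no_voice):
    """Posterior P(voice | data) by Bayes' rule for two hypotheses."""
    evidence_for = prior_voice * lik_voice
    evidence_against = (1 - prior_voice) * lik_no_voice
    return evidence_for / (evidence_for + evidence_against)

# Clear stimulus: likelihoods differ strongly, so the prior matters little
clear_low = posterior_voice(0.25, lik_voice=0.9, lik_no_voice=0.1)   # 0.75
clear_high = posterior_voice(0.75, lik_voice=0.9, lik_no_voice=0.1)  # ~0.96

# Noisy stimulus: likelihoods nearly equal, so the posterior tracks the prior
noisy_low = posterior_voice(0.25, lik_voice=0.55, lik_no_voice=0.45)   # ~0.29
noisy_high = posterior_voice(0.75, lik_voice=0.55, lik_no_voice=0.45)  # ~0.79

# The gap between the high- and low-prior posteriors is wider under noise:
# priors come into greater play exactly when the data are unreliable.
```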
Our findings also contribute to the more general discussion of mechanisms underlying agency detection. First, if we assume that humans possess a HADD, which is often considered an encapsulated, domain-specific module, we should not find expectations to affect agency detection.1 Nevertheless, our study does not undermine the possibility that humans are naturally biased toward detecting agents, but it shows that they can be socioculturally biased through the induction of expectations. Second, within the HADD model, ambiguous data were expected to increase the number of false-positive agency detections independently of one's prior beliefs or dispositions. The pattern in our data challenges this prediction, showing that noise can render people either more alert or more neglectful of agency. However, it must be noted that, especially in the case of agency detection, the laboratory setting of our study might not provide results generalizable to real-life situations, where contextual and embodied cues could greatly inform our perception of agents.

Voice hearing experiences
How does our study answer the question of how experiences of voice hearing come about? If indeed humans are more prone to hear agents in unreliable stimuli when they have relevant expectations, this effect might be at play in auditory religious experiences, as religious and paranormal contexts can affect our prior beliefs analogously to the expectancy manipulation administered here. In the process of sociocultural transmission, we acquire information about when (e.g., at nighttime or during prayer) and where (e.g., in church or in a forest) to expect to hear a supernatural being (e.g., the voice of God or spirits). Moreover, while some illusory auditory experiences can be rather mundane in nature, a subsequent religious interpretation can make them more meaningful (see also Van Leeuwen & van Elk, 2019). Similarly to the effect of noise in our study, religiously relevant environments and activities often distort or limit sensory data, which makes us rely more on prior expectations (Andersen, 2019). When it comes to auditory perception, these places and conditions may include, e.g., the use of instruments during a religious ritual; natural or artificial white noise; chanting and chatter; natural sounds such as wood creaking or feathers moving; and many other stimuli. As our laboratory study investigated only a "model" situation, we cannot conclude that the effect we found is what drives actual experiences, religious or not. We believe that, should further findings corroborate the PPAD, an important step within the research program of CSR will be to build an explanatory bridge, informed by anthropological research, that would link the model to actual experiences of illusory agency detection found "in the wild." Nevertheless, our study raises the possibility that some religious experiences could be rooted in the "normal" functioning of a domain-general cognitive mechanism.

1 However, Barrett and Lanman (2008) seem to endorse the idea that HADD experiences can be driven by prior expectations regarding supernatural agents, writing that "once a human mind is exposed to concepts of [supernatural] agents, those concepts are likely to be remembered and thought about and are also likely to both encourage and explain HADD experiences" (p. 116).
There might also be a component of voice hearing experiences linked to one's metacognitive judgment of what is being perceived. In our analysis of subjective response confidence, we found a general pattern in the data, where high and low expectations drove participants to be more confident in hits and correct rejections, respectively. We also found a correlation between confidence ratings and response bias, reflecting that participants with a more conservative response strategy were more confident in their responses. While these results have to be interpreted carefully, we speculate that expectations affected participants' metacognitive judgments of their perceptions in a direction aligning with previous research (Sherman et al., 2015): there was an improvement in metacognition in trials where prior expectations were congruent with one's response (on metacognition and signal detection, see, e.g., Maniscalco & Lau, 2012; Fleming, 2017). This phenomenon might also be relevant for research on religious experiences, as expectations elicited in a religious or paranormal context could make one more confident that what they witnessed was real. Especially during collective ceremonies and rituals, where participants' shared expectations are boosted by the presence of religious authority and cognitive resources are depleted (see Schjoedt et al., 2013), reports of miracles, auras, lights, meteorological phenomena, and the like might be driven by higher confidence in one's percepts.
Contrary to frequent claims that individual differences in agency detection are related to supernatural beliefs (e.g., van Elk, 2013; Tratner et al., 2020), we did not find evidence for this relationship. This null finding adds to the literature, in which some studies reported a relationship between agency detection and supernatural beliefs (Riekki et al., 2013; van Elk, 2013; Nieuwboer et al., 2014), while others failed to provide evidence (e.g., Tratner et al., 2020) or found mixed results (van Elk et al., 2016). Note that based on the predictive processing account, there is no a priori reason why supernatural beliefs should be related to agency detection, though we speculate that some forms of beliefs could motivate believers to look for more patterns in ambiguous data (see, e.g., Nees and Phillips, 2015; Van Prooijen et al., 2017; Narmashiri et al., 2023), thus increasing the prior probability assigned to hypotheses involving agents. It needs to be acknowledged, though, that since participants in our study were primed to over- or underestimate the number of voices, the null finding for the relationship between beliefs and agency detection might be a false negative: the effect of actual religious beliefs could have been "suppressed" or "overruled" by the experimentally induced expectations. Another issue relates to the low variance of supernatural beliefs in our sample: the highest PSBS score was 44, which is only halfway to the absolute maximum of the scale, indicating that our participants were not strong believers (see histogram in the Supplementary Materials). It is possible that with a more religiously heterogeneous sample the analysis would reveal different findings (we thank our anonymous Reviewer for noticing these limitations).
Finally, it is important to note that aberrant voice hearing, whether religiously meaningful or not, is often considered a symptom of a psychiatric disease. However, we do not draw any conclusions regarding possible mechanisms of those experiences; a follow-up study with a clinical population could reveal whether there are comparable effects of expectations and sensory unreliability in patients living with psychosis.

Limitations and further research directions
Following open science guidelines, we conducted our data preprocessing and analysis as consistently with our preregistration as possible, which required us to exclude 12 participants based on their correctly guessing the purpose of the study. However, 17 more participants also expressed some suspicions about the goal of the experiment (likely because they noticed that the actual number of voices differed from what was instructed), but did not clarify when exactly their first suspicions arose. For further studies using a misrepresentation similar to ours, we recommend exploring participants' suspicions in more detail, preferably in a short interview. A third party could be consulted to determine which participants should be excluded from the analysis. To test the robustness of our effects, we also ran the main analysis with the 12 participants included in the data. Except for revealing a main effect of noise (p = 0.003), the results did not differ significantly from the previous analysis.
Second, a general impression shared by some of our participants (mostly those who at least partially guessed the purpose of the study at some point during the trials) was that it was hard for them to believe that the chance of hearing a voice was really 25% or 75%; they reported simply hearing more or fewer voices than that. We also speculate that there might be an asymmetry in how credible the instructions were for each group, as it might have been easier for a participant to believe that the chance of hearing a voice was 75% ("I probably missed a voice") than to believe that the chance was 25% ("I definitely hear more voices than they said!"). In the future, these issues might be resolved by changing the 25/75% proportions to a potentially more credible 40/60%, though this would likely decrease the efficacy of the manipulation.
Third, it is worth mentioning that part of our effects could be driven by sensory deprivation, as participants were blindfolded. Sensory deprivation in general can affect individuals' proneness to hallucinate (Daniel & Mason, 2015; see also Pierre, 2010) and, as such, could be a potentially confounding factor that we did not control for. However, as participants were blindfolded irrespective of condition, the relative differences we found are likely robust.
Finally, looking ahead, an important next step could involve conducting a study similar to ours and to Andersen et al.'s, but in a more ecologically valid design. As we acknowledged earlier, our study was far removed from any settings in which actual religious experiences take place, so the conclusions regarding these experiences are quite limited. We believe that a more ecologically valid, conceptual replication of our study, in which the expectancy manipulations are applied to more lifelike information, would be in order. A promising way to do this would be to arrange a place that participants could enter with varying expectations regarding not only the quantity but also the type of agents, and to manipulate the place's soundscape with the use of relevant noise. Finding an effect similar to ours in a naturalistic setting would make a strong argument for considering the predictive processing-based mechanism one of the building blocks of religious experiences. Another possible design could involve a manipulation that induces strong expectations of spiritual experience, such as the sham God helmet (see, e.g., Simmonds-Moore et al., 2019), alongside a manipulation of expectations regarding the number of voices that can appear.

Conclusions
In our preregistered study, we found that the detection of voices can be driven by expectations and that this effect is enhanced under conditions of sensory unreliability.Our findings are consistent with the predictive processing account of agency detection and add to the discussion on the boundary conditions of voice hearing experiences.

Funding
The research has been supported by a grant from the Priority Research Area 'Society of the Future' under the Strategic Programme Excellence Initiative at the Jagiellonian University.

Fig. 4 .
Fig. 4. Estimated marginal means of confidence by group and noise for hit responses (A) and for correct rejections (B). Note. Error bars represent SEs. Higher scores represent higher confidence ratings for (A) hit responses, in which participants correctly identified the presence of a voice in the auditory stimuli, and (B) correct rejections, in which participants correctly identified the absence of a voice in the auditory stimuli.

Table 1
Mean number of responses in signal detection task by noise and group.
Note. FA = False alarms. CR = Correct rejections. SDs are shown in brackets.