A generative framework for the study of delusions

Despite the ubiquity of delusional information processing in psychopathology and everyday life, formal characterizations of such inferences are lacking. In this article, we propose a generative framework that entails a computational mechanism which, when implemented in a virtual agent and given new information, generates belief updates (i.e., inferences about the hidden causes of the information) that resemble those seen in individuals with delusions. We introduce a particular form of Dirichlet process mixture model with a sampling-based Bayesian inference algorithm. This procedure, depending on the setting of a single parameter, preferentially generates highly precise (i.e., over-fitting) explanations, which are compartmentalized and thus can co-exist despite being inconsistent with each other. Especially in ambiguous situations, this can provide the seed for delusional ideation. Further, we show by simulation how the excessive generation of such over-precise explanations leads to new information being integrated in a way that does not lead to a revision of established beliefs. In all configurations, whether delusional or not, the inference generated by our algorithm corresponds to Bayesian inference. Furthermore, the algorithm is fully compatible with hierarchical predictive coding. By virtue of these properties, the proposed model provides a basis for the empirical study and a step toward the characterization of the aberrant inferential processes underlying delusions.


Introduction
Delusions are implausible beliefs which are held with absolute conviction and cannot be changed by countervailing evidence (Jaspers, 1913; American Psychiatric Association, 2013). They are among the core symptoms of psychosis, and a majority of individuals with schizophrenia experience delusional beliefs during the course of their illness (Harrow et al., 1995).
Despite the importance of delusions in psychiatric nosology and their debilitating effect on patients, their underlying mental and biological mechanisms are still poorly understood. In particular, a generative computational framework for the study of delusions is still lacking. Such a framework, situated in the context of Computational Psychiatry (Montague et al., 2012; Stephan and Mathys, 2014; Wang and Krystal, 2014; Mathys, 2016; Adams et al., 2016; Huys et al., 2016), would allow for the systematic testing of mechanistic hypotheses regarding the emergence and maintenance of delusions. This framework should be computational in the sense that it conceptualizes delusions in terms of formal mathematical computations imputed to the mind. Beyond that, it should be generative in the sense that it allows for building models of minds which can be configured so that they generate delusional beliefs (where both belief and delusional are well-defined mathematically while also reflecting the clinical usage of these terms).
In this article we make an initial suggestion for such a generative computational framework. We introduce a model that combines three strands of thinking about mind-building and delusion formation. This model is based on Dirichlet process mixture models of concept learning (Tenenbaum and Griffiths, 2001), hierarchical predictive coding (Rao and Ballard, 1999; Friston, 2005a; Sterzer et al., 2018), and the use and abuse of auxiliary hypotheses in hypothesis testing and Bayesian inference (Duhem, 1906; Quine, 1951; Strevens, 2001; Jaynes, 2003; Gershman, 2019). Based on our suggested model, we simulate agents who update their beliefs in response to new information. We show that by manipulating the single decisive parameter of our model, we can generate belief patterns which can be characterized on a spectrum from delusional to appropriate, given the agent's input. We interpret the agent's behaviour in terms of previous conceptualizations of delusions, and we point out possible empirical ways to quantify our model's delusion-generating parameter in experimentally or naturally observed behaviour.

Delusions as a consequence of aberrant inference
Our approach builds on the three conceptual foundations mentioned above. Turning first to hierarchical predictive coding, the idea that inferential mechanisms support the formation and maintenance of delusions has led to an influential characterization in terms of deviations from Bayesian inference (Hemsley and Garety, 1986; Coltheart et al., 2010). Similarly, biases of probabilistic reasoning have been invoked to understand the process of delusion formation, such as limited data-gathering ("jumping to conclusions", see Speechley et al., 2010; Dudley et al., 2016) or a bias against disconfirmatory evidence (Woodward et al., 2006). Furthermore, a failure to think of alternative accounts of the delusion (a lack of belief flexibility) was found to be related to how strongly a delusion was held ("delusional conviction"; e.g., Freeman et al., 2004; Garety et al., 2005), and a number of recent reviews have underlined the importance of cognitive biases in delusional ideation (McLean et al., 2017; Broyd et al., 2017; Bronstein et al., 2019).
Predictive coding (PC) is a general account of brain function (Rao and Ballard, 1999; Friston, 2005b) which assumes that the brain infers the causes of its sensations using a hierarchical model of its environment. Applied to psychosis, the account emphasizes the balance between top-down predictions and bottom-up prediction error (PE) signals (Fletcher and Frith, 2009; Corlett et al., 2010, 2016; Sterzer et al., 2018). In this framework, prior beliefs are encoded in predictions about sensory inputs. Discrepancies between these predictions and the actual sensory stimulation lead to changes in beliefs whose magnitude depends on the precision of the predictions. Delusion formation then reflects a compensatory response to imbalances of the hierarchical inference scheme (Adams et al., 2013; Corlett et al., 2016; Fletcher and Frith, 2009). Specifically, delusions might result from attempts to explain highly precise low-level PEs. The resulting explanations are epistemically inappropriate beliefs at higher levels of the processing hierarchy (Adams et al., 2013; Schmack et al., 2013).

Central and auxiliary hypotheses
A second foundation for our approach is the notion of "explaining away". This phenomenon occurs in Bayesian belief networks and denotes the case in which, given two potential causes for an effect, the presence of one cause makes the other less likely.
In Bayesian terms, the maintenance of delusions (and of beliefs in general) is usually attributed to strong prior beliefs. However, inductive inferences critically depend on beliefs about the structural dependencies between the relevant variables. For example, what one person takes to be evidence for a hypothesis, another person interprets as contradictory evidence. This can happen without contradicting the rules of logic because the direction of belief updating depends on other beliefs (Jern et al., 2014). A ubiquitous example of this phenomenon is the "explaining away" of evidence in common-effect networks, where the presence of one cause makes another less likely. This implies that the interpretation of an observation depends on the ability of the observer to generate additional assumptions, called auxiliary hypotheses, which can "explain away" the evidence or even turn it into its contrary.
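The explaining-away effect can be made concrete with a small numerical sketch (our illustration, not drawn from the article; all probabilities below are assumed for the example). Two independent causes A and B feed a common effect E through a noisy-OR likelihood; observing E raises the probability of A, but additionally learning that B is present pushes that probability back down:

```python
# Minimal common-effect (v-structure) network, A -> E <- B, illustrating
# "explaining away". All numbers are illustrative assumptions.
from itertools import product

p_a, p_b = 0.1, 0.1  # independent prior probabilities of the two causes

def p_e_given(a, b):
    # Noisy-OR likelihood: each present cause triggers the effect with prob. 0.8.
    return 1.0 - (1.0 - 0.8 * a) * (1.0 - 0.8 * b)

def posterior_a(observed_b=None):
    """P(A=1 | E=1 [, B=observed_b]) by brute-force enumeration of the joint."""
    num = den = 0.0
    for a, b in product((0, 1), repeat=2):
        if observed_b is not None and b != observed_b:
            continue
        joint = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b) * p_e_given(a, b)
        den += joint
        if a:
            num += joint
    return num / den

print(posterior_a())              # P(A=1 | E=1): the causes compete to explain E
print(posterior_a(observed_b=1))  # P(A=1 | E=1, B=1): knowing B "explains away" A
```

With these numbers, observing E alone raises P(A=1) from 0.1 to roughly 0.53, while additionally observing B drops it back to roughly 0.12: B takes the blame for E.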
The idea goes back to Duhem's (1906) and Quine's (1951) insight that evidence from an experiment cannot refute a single scientific hypothesis, but only a conjunction of hypotheses (cf. Strevens, 2001; Jaynes, 2003). Gershman (2019) presented an analysis showing that in a Bayesian model, hypotheses with weaker prior probability can act as a "protective belt" and, in the face of disconfirmatory evidence, take the blame instead of a central hypothesis (i.e., one with a stronger prior). This represents an effective strategy of belief preservation that depends on the creation of auxiliary hypotheses.
While these demonstrations of the explaining-away effect assume the existence of auxiliary hypotheses as given, the framework we introduce here allows for the generation of new auxiliary hypotheses which serve to explain observations that, under a different configuration, could have been explained by nuancing an existing explanation.

Dirichlet process mixture models
Human reasoning processes have a characteristic ability to deal with uncertainties due to incomplete or noisy information and to build open-ended models of adaptive complexity. Much of this uncertainty is due to unobserved variables and the relations between them. When reasoning about a particular course of events, we compare hypotheses about the statistical structure of the world. A common problem is to detect when observations can be partitioned into separate groups, where each group is explained by a distinct cause. A solution to this is provided by Dirichlet process mixture models (DPMMs) (Teh et al., 2006; Doshi-Velez, 2009). These allow for inferring, for each data point, the group it most likely belongs to. A version of the Dirichlet process was independently proposed by Anderson (1991) for a theory of human category learning. Fig. 1 illustrates the behaviour of the model. Notably, it allows modelling the classification of anomalies that require novel categories. The inference of a separate category has a strong influence on subsequent belief updates, since data that belong to one category are assumed to be independent of all other categories. Crucially, the Dirichlet process prior assumes the existence of a potentially infinite number of groups and is thus a model for open-ended learning, adapting to increasing amounts of data by increasing model complexity. This means that it provides a solution to the problem of model selection, in which a best model is to be chosen in terms of accuracy and complexity. The Dirichlet process represents a suitable prior for such inferences, and DPMMs are a Bayesian solution to the problem of structure learning (Gershman and Blei, 2012). For this reason, DPMMs have found broad application in the modelling of higher-order human cognition (e.g., Kemp et al., 2010; Collins and Koechlin, 2012).
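The clustering behaviour of the Dirichlet process prior can be sketched via its Chinese-restaurant-process representation (a standard construction; the parameter values below are illustrative, not taken from the article):

```python
# Chinese-restaurant-process prior underlying a DPMM: larger alpha yields
# more clusters. Parameter values are illustrative.
import random

def crp_partition(n, alpha, rng):
    """Sample a partition of n observations from a CRP(alpha) prior."""
    counts = []  # number of observations per cluster
    for i in range(n):
        # Existing cluster k is chosen with prob. counts[k] / (i + alpha),
        # a brand-new cluster with prob. alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster ("new cause")
        else:
            counts[k] += 1
    return counts

rng = random.Random(0)
few = crp_partition(500, alpha=0.5, rng=rng)
many = crp_partition(500, alpha=20.0, rng=rng)
print(len(few), len(many))  # low alpha: few large clusters; high alpha: many small ones
```

The rich-get-richer dynamics of the existing-cluster weights are what later give an established central hypothesis its "pull" in the simulations.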

Model description
We harness the power of this approach in proposing a generic DPMM that describes delusion formation and maintenance. We do this in the context of a learner performing online inference about the latent structure of the environment based on a set of observed events. This constitutes a structure learning problem in statistics, and the learner is assumed to solve it (in a manner consistent with Bayesian inference) by iterating two steps. First, the learner has to partition the data into separate groups based on whether they are explainable by the same underlying cause. Second, given the grouping of the data, the learner can then infer a specific model for each group. We define the act of explaining an event or observation as the inference of a single cause. Causes thus provide explanations for events. That is, they are models of the learner's environment (i.e., they define a probability distribution over current and future observations). The learner is equipped with a set of prior beliefs which are encoded in a hierarchical generative model for the events. Further, the learner has a set of existing models derived from prior experience of the world, which can be used to explain new observations. However, the existing explanations stand in competition with a mechanism for generating new explanations constructed from higher levels of the model, that is, from the prior over explanations. The structure of the learner's prior beliefs allows for a potentially infinite number of causes. This means that, depending on their priors, learners can consider any new observation an anomaly, i.e., as belonging to a hitherto unobserved cause. A formal description of our model is given in the Appendix. We implement Bayesian inference for this model using Algorithm 8 from Neal (2000).
The assumption of an infinite collection of causes allows learners to continually discover new ones, building new theories as they make more and more observations. Still, at any point there is only a finite number of causes (at most one per individual observation), and the ease with which new causes are assumed is affected by the priors and by the concentration parameter α. Low values of α favour a small number of causes that each account for many observations, while high values favour many small, uniformly sized clusters of observations.
Inference about the underlying cause of an observation proceeds in two steps. In a first step, m potential explanations are drawn from the generative model M. For Gaussian models, the explanations correspond to parameter values (μ, τ), which are drawn from the prior. In a second step, these candidates are compared with the set of already known explanations in terms of their plausibility (i.e., likelihoods). The plausibility judgments are modulated by the respective prior probabilities. These are proportional to the number of previous observations accounted for by an existing explanation. The prior probability for previously unobserved causes depends only on the α parameter, which encodes a general expectation of new causes. The assignment to a cause is chosen according to these factors. The proposals for new causes drawn from the prior that were not selected are discarded after this step, and new proposals are drawn for the next inference. Following the assignment of an observation to a cause, the next inference step is to integrate the information into the model associated with that cause. The specific form of this belief update depends on the form of the cause-specific models. After updating the separate hypotheses, the higher-level beliefs are updated. These may include hyper-priors over the parameters of the prior distribution for the cause-specific models and the belief about α. Intuitively, after inferring many new causes, the belief about α will change so that this becomes what is expected in the following. Iterating over these belief updates constitutes a Markov chain that leads to an approximation of the correct posterior belief (Neal, 2000).
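The assignment step can be sketched as follows (in the spirit of, but not identical to, Neal's Algorithm 8; the helper names and parameter values are our illustrative assumptions, not the authors' code):

```python
# One assignment step: an observation is routed either to an existing cause
# or to one of m fresh candidate causes drawn from the prior.
import math
import random

def normal_pdf(y, mu, tau):
    # Gaussian likelihood parameterized by precision tau = 1 / variance.
    return math.sqrt(tau / (2 * math.pi)) * math.exp(-0.5 * tau * (y - mu) ** 2)

def assign(y, clusters, alpha, m, draw_from_prior, rng):
    """clusters: list of (count, mu, tau). Returns (chosen index, proposals);
    indices >= len(clusters) denote one of the m new proposals."""
    proposals = [draw_from_prior(rng) for _ in range(m)]
    # Existing causes: prior weight proportional to how many observations they explain.
    weights = [n * normal_pdf(y, mu, tau) for n, mu, tau in clusters]
    # New-cause candidates: prior weight alpha / m each.
    weights += [(alpha / m) * normal_pdf(y, mu, tau) for mu, tau in proposals]
    return rng.choices(range(len(weights)), weights=weights)[0], proposals

rng = random.Random(1)
clusters = [(200, 0.0, 1.0)]               # one established central cause
prior = lambda r: (r.gauss(0, 5), 100.0)   # candidates with high expected precision
idx, props = assign(8.0, clusters, alpha=1.0, m=5, draw_from_prior=prior, rng=rng)
```

Unselected proposals are discarded, mirroring the step in the text where fresh candidates are drawn anew for each inference.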

Simulation of the emergence of a delusion
As an illustration of our model's basic belief dynamics, we demonstrate an inference process that can be characterized as appropriate or delusional depending on the setting of a single parameter, the expected precision of explanations μτ. In what follows, we explain data y ∈ ℝ based on simple Gaussian assumptions. That is, the cause-specific models are Gaussians characterized by mean and precision parameters (μk, τk). The prior distributions over these cause-specific means and precisions are defined at a higher level of the model. These priors influence the generation of candidates for new explanations. They also play a role in the process of updating the internal structure of existing explanations (through Bayes' rule, as in all Bayesian accounts of inference).
Of special interest is μτ, the expected precision of explanations. Under Gaussian assumptions, it is the mean of the prior on the precision parameter τk for explanation k. In other words, it specifies the prior belief about the expected inverse variance of observations under any of the currently held models. Generalizing beyond Gaussian assumptions, the expected precision can be cast as the negative entropy of explanations generated by the prior. In this view, high expected precision implies a prior criterion for generating explanations: it favours those explanations that, conditional on being true, assign a high likelihood value to observations. Such strong priors about the expected precision lead to an "overfitting" of explanations, that is, to generating hypotheses that over-accommodate the current data. This is related to a suggestion made in previous accounts of delusional thinking (Stone and Young, 1997; McKay, 2012) that a bias toward "explanatory adequacy", whereby the likelihood is over-weighted at the expense of the prior, plays a role in delusions. For example, Coltheart et al. (2010) develop their account with reference to Capgras' delusion, which involves the belief that a close friend or relative has been replaced by a physically identical impostor. McKay (2012) explains Capgras' delusion as arising from brain damage or disruption which causes the face recognition system to become disconnected from the autonomic nervous system, generating anomalous data (Factor One). This disconnection occurs in conjunction with a bias toward explanatory adequacy (Factor Two), such that the affected individual updates beliefs as if ignoring the relevant prior probabilities of candidate hypotheses.
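The likelihood advantage of over-precise explanations can be seen in a two-line comparison (illustrative numbers, not from the article): conditional on being true, an explanation tailored to the datum with high precision assigns it a much higher likelihood than a broad, a priori plausible one.

```python
# "Explanatory over-fitting": a needlessly precise explanation centered on the
# datum assigns it a far higher likelihood than a broad, well-calibrated one.
import math

def normal_pdf(y, mu, tau):
    # Precision parameterization, tau = 1 / sigma^2.
    return math.sqrt(tau / (2 * math.pi)) * math.exp(-0.5 * tau * (y - mu) ** 2)

y = 1.3
broad = normal_pdf(y, mu=0.0, tau=1.0 / 100)   # low expected precision
precise = normal_pdf(y, mu=1.3, tau=100.0)     # over-fitted, centered on the datum

print(precise / broad)  # the over-precise explanation wins by a large factor
```

If explanations are then evaluated only by likelihood, as for freshly generated candidates in our framework, the over-precise one is systematically preferred.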
Our DPMM account provides a different perspective. The possibility of assigning observations to different explanations allows for deviations from the ideal of a single coherent belief system. In this account, delusional belief updating results from an exaggerated preference for high-precision explanations. Observations are assigned to highly precise explanations, which, once generated, are evaluated only by their likelihood, which will be high by construction. In this manner, our framework allows for the co-existence of many high-precision explanations, which corresponds to a compartmentalization of an individual's worldview into many, possibly contradictory, models.

Fig. 1. Categorization and explanation in our framework (schematic). In the top panels, the same initial belief is depicted on the left and right, with separate explanations (causes) represented by Gaussians. On the left, the new observation (black dot) has a larger deviation from the existing causes than on the right. Here, the model infers a new cause and fits a corresponding cause to explain the observation (red Gaussian, bottom left). On the right, a less extreme observation is integrated into an existing cause (blue Gaussian, bottom right), which leads to a change in the structure of the corresponding explanation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2 illustrates this in the context of delusional misidentification as described in a case study of Capgras' delusion (Hirstein and Ramachandran, 1997). Instead of attributing small variations (whatever their origin) to randomness or coincidence, patient DS infers additional explanatory structure. Hirstein and Ramachandran (1997) proposed that Capgras' might be part of a more general memory management problem: When you or I meet a new person, our brains open a new file, as it were, into which go all of our memories of interactions with this person. When DS meets a person who is genuinely new to him, his brain creates a file for this person and the associated experiences, as it should. But if the person leaves the room for 30 min and returns, DS's brain, instead of retrieving the old file and continuing to add to it, sometimes creates a completely new one. Why this should happen is unclear, but it may be that the limbic emotional activation from familiar faces is missing and the absence of this 'glow' is a signal for the brain to create a separate file for this face (or else the presence of the 'glow' is needed for developing links between successive episodes involving a person).
Here, instead of memory files, we suggest that observations are filed away in separate explanations. A delusion results because the expectation of high precision leads to over-precise explanations that do not generalize and therefore lead to large prediction errors in the face of additional data. At the same time, the compartmentalization of separate explanations prevents belief change and elaboration in spite of these large prediction errors, since it prevents "joining the dots". These elements combined lead to the phenomenon of aberrant salience as proposed in predictive coding accounts of psychosis (Kapur, 2003). Our framework explains this aberrant (increased) salience as prediction errors resulting from overly precise explanations. The emergence of central delusional beliefs is all but inevitable under these circumstances: anything confirming an existing explanation will (simply by the mechanics of the Bayesian inference mechanism associated with our DPMM) increase this explanation's "pull", but not its reach, while anything contradicting it is explained away with high precision.
While our framework is silent on the content of the central beliefs that are likely to emerge, it allows for models where candidate explanations generated are predominantly self-related, derogatory, grandiose, etc. Specific models of this kind within the proposed framework will be the focus of future work.

Simulation of delusion maintenance
In order to show delusion maintenance, we again make Gaussian assumptions, but this time with an established central belief. We simulate two learners differing only in expected precision μτ, with identical initial beliefs and presented with identical observations. Fig. 3 shows the main result. Two belief systems differing only in their priors on μτ change in a radically different manner when presented with observations that are either integrated (low μτ) into the existing explanations (i.e., clusters) or mostly require new explanations (high μτ) to be accounted for. Observations are created by sampling from a uniform distribution, and the initial belief is represented by a cluster (n1 = 200) constituting an initial central hypothesis. After generation of 50 new observations, we compute the predicted labels for them. Next, we compute the posterior for the labels zi and the cause-specific parameters ϕk = (μk, τk), k = 1, ..., by running a Gibbs sampler for 10 iterations, which is sufficient for convergence of the (now updated) central hypothesis. In each iteration the labels are re-sampled according to their full-conditional probabilities and the cause-specific parameters are re-estimated accordingly. This corresponds to Algorithm 8 in Neal (2000).
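The qualitative logic of this simulation can be sketched as follows (a simplified re-implementation under our own assumptions, not the authors' code: MAP assignments instead of Gibbs sampling, and a best-case new-cause proposal centred on the datum instead of prior draws):

```python
# Sequential assignment of uniform observations against a central cause,
# under low vs high expected precision mu_tau. All settings are illustrative.
import math
import random

def normal_pdf(y, mu, tau):
    return math.sqrt(tau / (2 * math.pi)) * math.exp(-0.5 * tau * (y - mu) ** 2)

def run(mu_tau, rng, n_obs=50, alpha=1.0):
    clusters = [[200, 0.0, mu_tau]]  # [count, mean, precision]: central belief
    for _ in range(n_obs):
        y = rng.uniform(-10, 10)
        weights = [n * normal_pdf(y, mu, tau) for n, mu, tau in clusters]
        # Best-case new cause: a fresh explanation centered exactly on y.
        weights.append(alpha * normal_pdf(y, y, mu_tau))
        k = max(range(len(weights)), key=weights.__getitem__)  # MAP assignment
        if k == len(clusters):
            clusters.append([1, y, mu_tau])  # open a new ad-hoc explanation
        else:
            c = clusters[k]                  # fold y into the cause's running mean
            c[1] = (c[0] * c[1] + y) / (c[0] + 1)
            c[0] += 1
    return clusters

rng = random.Random(0)
integrative = run(mu_tau=0.01, rng=rng)   # low expected precision
delusional = run(mu_tau=100.0, rng=rng)   # high expected precision
print(len(integrative), len(delusional))  # compartmentalization under high mu_tau
```

Under low μτ the central cause absorbs everything and its mean drifts; under high μτ almost every observation spawns its own narrow explanation, leaving the central hypothesis untouched.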
Fig. 3 shows the change in the belief regarding the "central hypothesis". The bottom left panel shows the updated belief of an agent with a relatively low value of μτ, i.e., a value encoding the expectation of rather imprecise observations, corresponding to wide cause distributions. For this learner, the updated belief given the presented observations is more imprecise. In other words, it has become capable of integrating observations that were somewhat outside its initial distribution, leading to a widening of the density. This can be seen as signalling a reduction of certainty regarding the initial explanation for the observations. The right column shows the updated belief of an agent with a relatively high value of the expected precision parameter μτ. Given this prior, the agent ends up with a belief that is not changed much in terms of "content" (i.e., the expected observations under the model k, namely μk) and is more precise than before. Inference with such a prior exhibits a confirmatory arbitration of evidence which leads to the reinforcement of current beliefs. Even slight deviations are treated as outliers so as to maintain the parameters and meaning of the central hypothesis. Note that the simple Gaussians we used here serve to make a general point. It is in principle straightforward to replace them with more complex Bayesian networks representing nontrivial causal structures. Under conditions of delusional belief updating (i.e., aberrant μτ), the separation of explanatory categories prevents making connections between observations that challenge current beliefs and which could lead to very different beliefs altogether. Applying the simulation in Fig. 3 to the example by Coltheart et al. (2010, p. 279), we may take the input to represent the various observations of their Capgras' patient: For example, the subject might learn that trusted friends and family believe the person is his wife, that this person wears a wedding ring that has his wife's initials engraved in it, that this person knows things about the subject's past life that only his wife could know, and so on.

Fig. 2. A simulation of delusional misidentification. In a case study, Hirstein and Ramachandran (1997) presented Capgras' patient DS with a sequence of photographs of the same model's face looking in different directions (here, we represent the photographs as points on a line; observations that are perceptually similar fall close together on this abstract dimension). The left panel shows a simulation of inference in healthy observers: a single underlying cause ("the same person, photographed multiple times"; represented as a single Gaussian) is inferred. On the right, the inference observed in patient DS is simulated ("different women who looked just like each other"; represented by multiple Gaussians). The two simulations from our model differed only in the expected precision (left: μτ = 1/100, right: μτ = 100). Inputs and all other parameters were equal.
Each of these observations would normally lead to a change in the central belief.However, the generation of ad-hoc explanations as in our simulation could explain how the subject maintains the impostor belief.

Discussion
We have introduced a framework allowing for the description and generative construction of delusional inference. This is based on approximate Bayesian inference using Dirichlet process mixture models applied to structure learning problems. We have shown how an optimal inference algorithm can, endowed with particular higher-order beliefs, exhibit behaviour resembling delusional inference. Importantly, the outcome of the inference process was influenced by the prior beliefs about the expected precision of explanations. A strong belief in precise observations leads to the plentiful generation of over-fitting explanations, some of which are bound to coincide with an observation, leading to their acceptance over an a priori more plausible explanation.

Relation to previous work
Hierarchical predictive coding is one of the most promising computational frameworks for the description of delusions, and a misalignment in the hierarchical signalling of precision has often been invoked as the underlying reason for the emergence of delusions (Corlett et al., 2007, 2009; Fletcher and Frith, 2009; Sterzer et al., 2018). Our framework is fully consistent with these ideas. Indeed, it is exactly (not to say precisely...) an exaggerated expected precision μτ which is sufficient to explain the formation and maintenance of delusional information processing. However, the approach we introduce goes beyond previous predictive coding accounts of delusions in that it comes with a fully specified generative algorithm. Furthermore, the large prediction errors entailed by an over-fitting structure learning process provide the basis for the phenomenon of aberrant salience, which in our framework can explain the emergence of central beliefs with high "pull" surrounded by ad-hoc explanations shielding them from elaboration.

Fig. 3. Belief-preserving evidence integration. Initial belief (upper row) and final belief (lower row) after inference given new observations. The difference in final beliefs is a function of the expected precision μτ alone. All other settings and inputs are the same. Bottom left: μτ ~ HN(100, 10). The existing explanation (blue Gaussian) is elaborated (i.e., broadened) in response to new observations, which are to a considerable extent integrated into the already existing, but now elaborated, model. Bottom right: μτ ~ HN(1/100, 10). The existing explanation is narrowed, but its dominance remains unaffected. New observations which do not fit it exactly are explained away (i.e., assigned to their own little ad hoc explanations). While both of these ways of processing the same information correspond to Bayesian inference (albeit under different values for μτ), the inference process on the right can be characterized as delusional. Further details and the code for reproducing this simulation can be found here: https://tinyurl.com/y3m79qdw. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Our model builds on and extends latent cause models in reinforcement learning (Courville et al., 2006; Redish and Johnson, 2007). Gershman et al. (2010) showed how state classification can be derived as rational inference in a Dirichlet process mixture model. While these authors focus on the role of the concentration parameter α, we investigate the role of prior beliefs in the inference of new causes and belief change. Another important difference is that in their model, inputs consist of features which include the context that needs to be inferred, while in our model the agent receives no additional cue about context but has to infer it from the observations alone. Furthermore, our model has an additional hierarchical layer which allows for varying prior beliefs about the precision of observations.

Single-factor versus dual-factor explanations of delusions
There is a debate about whether delusions can be explained by a single factor or whether there need to be at least two. Hierarchical predictive coding is the classic example of a single-factor framework (Fletcher and Frith, 2009), while two factors are required according to Coltheart et al. (2010). Our model speaks to this question in that it provides a generative process where changing a single parameter is enough to move from appropriate to delusional thinking. While this indicates that single-factor explanations of delusion formation and maintenance are possible, the framework does not preclude the presence of additional factors. For example, the process of hypothesis generation could be disordered in addition to the expected precision μτ. Furthermore, the framework allows for quantitative comparisons of single-factor and k-factor hypotheses.
Our framework takes the perspective that belief states are never per se delusional; rather, the way information is processed can be delusional. From this perspective, it is the combination of the largely immutable central belief and the disconnected auxiliary hypotheses proliferating around it which together constitute the delusion. The delusionality does not lie in any one belief but in the way a belief (i.e., a model of the world) is prevented from being deepened and broadened. Instead, all the information that could drive such a deepening and broadening is explained away. While the models in our simulations were simply clusters of observations explained by Gaussians, Dirichlet process mixture models are not restricted to such simple examples. In principle, such Gaussian clusters can be replaced with elaborate causal models as in Tenenbaum et al. (2011). From the perspective of our framework, delusions are initially adequate causal models in need of elaboration. They are formed by arresting the development of a particular causal model and are maintained by the same mechanism: keeping the model insulated from new evidence.

Limitations and extensions
Our model does not by itself speak to the question of how a maladaptive expected precision μτ could evolve developmentally. However, it fits closely with the concept of epistemic trust. This is "an individual's willingness to consider new knowledge from another person as trustworthy, generalizable, and relevant to the self" (Fonagy and Allison, 2014) and is of great clinical importance in the conceptualization and treatment of borderline personality disorder. Our framework allows us to interpret μτ as an inverse quantification of epistemic trust (i.e., as a quantification of epistemic mistrust): low μτ leads to the integration of new information and to a corresponding broadening and enrichment of existing models of the world, while high μτ leads new information to be explained away when it does not fit an existing model exactly, accompanied by a narrowing of explanations. This provides a mechanistic computational account of epistemic (mis)trust, and it will be interesting to study the relation between empirical measures of expected precision μτ and epistemic trust in future work.
An important limitation is that we have not estimated μ_τ from observed behaviour. Not least, this is due to the difficulty of devising behavioural experiments where participants are given scope to behave in a sufficiently open-ended manner for ecologically valid forms of delusional behaviour to emerge while still keeping to a controlled experimental setting. For the study of delusional belief dynamics, popular experiments in computational psychiatry such as reversal learning tasks (Schlagenhauf et al., 2014; Waltz, 2017) or the beads task (Adams et al., 2018; Baker et al., 2019) are too restricted in the range of behaviour they allow. We therefore face the challenge of coming up with tasks that enable us to apply our framework to experimental data.
Examples of applications of DPMMs to experimental data are Collins and Koechlin (2012) and Donoso et al. (2014), where the authors model inferential computations underlying reasoning processes in the prefrontal cortex (PFC). Specifically, they showed that the PFC is involved in monitoring the reliability of the current behavioural strategy and of a number of counterfactual ones in a learning paradigm. While in their tasks the reasoning processes were about behavioural strategies, similar metacognitive processes may be used in the inferential domain, for example in model selection. In this domain, it is challenging to infer metacognitive processes from behavioural data because the mapping from reasoning to actions is hard to constrain adequately: not too simple (e.g., tasks involving binary choices, not requiring higher-order reasoning) and not too open-ended (defying formal analysis and modelling). It is therefore important to ground the design of such tasks in formal accounts such as the one we propose here. Furthermore, functional imaging combined with formal modelling can reveal differences in inference processes that may not be expressed in directly observable behaviour. Taken together, behavioural tasks calibrated for meta-inference, neuroimaging, and hierarchical modelling frameworks like the one proposed here hold promise for the understanding of delusions, which play out mostly within the unobservable realm of thought and only rarely relate to behaviours in predictable ways.

Conclusion
Our proposed framework is an initial attempt at a formal conceptualization of delusional thinking. While previous computational descriptions stopped short of proposing a fully generative process, our framework provides this. It covers the spectrum from delusional to appropriate treatment of new information with adjustments to only a single parameter, and it can describe the emergence and maintenance of a delusion as a one-factor process. Furthermore, our framework is consistent with Bayesian inference and hierarchical predictive coding. While this is only a first step, one that will doubtless be improved upon and whose empirical applications are still missing, it sets a benchmark by combining the properties just mentioned: generativity, simplicity, single-factor sufficiency, and consistency with Bayesian inference.

Declaration of competing interest
The authors declare no conflicts of interest.

Role of funding sources
Funding sources had no influence in study design; nor in the collection, analysis and interpretation of data; nor in the writing of the report; nor in the decision to submit the article for publication.

Appendix A. Details of model and inference algorithm
Formally, our model performs inference for a mixture model with a Dirichlet process (DP) prior. We assume a data set y = (y_1, …, y_n) and a corresponding set of latent labels z = (z_1, …, z_n). The generative model can be written as follows:

z ~ CRP(α)
θ_k ~ G_0, k = 1, 2, …
y_i | z_i = k ~ G(θ_k)

Here, CRP denotes the Chinese restaurant process, a particular representation of the DP that provides a probability distribution over the space of data partitions. For the choices we make in our simulation, this becomes

z ~ CRP(α)
μ_k ~ N(μ_μ, τ_μ), τ_k ~ HN(μ_τ, τ_τ)
y_i | z_i = k ~ N(μ_k, τ_k)

Based on the partition structure in the generative model, we can write the joint probability as

p(y, z, θ) = p(z) ∏_{k=1}^K p(θ_k) ∏_{i=1}^n p_G(y_i | θ_{z_i})

where p_G(y | θ) denotes the density of distribution G(θ) evaluated at y. Due to the exchangeability of the DP, we can compute the full-conditional distributions by assuming the current observation has index n, where the full-conditional has a simple form that we use to perform Gibbs sampling:

p(z_n = k | z_1, …, z_{n−1}, y_n, θ) ∝ p(z_n = k | z_1, …, z_{n−1}) p_G(y_n | θ_k)

with the prior probability for that assignment, p(z_n = k | z_1, …, z_{n−1}), given by

n_k / (n − 1 + α), if k is an existing cause, i.e. k ≤ K
(α/m) / (n − 1 + α), if k is a new cause, i.e. K < k ≤ K + m

where n_k is the number of observations already assigned to cause k, and with temporary candidate parameters for the m new components drawn from their respective priors μ_k ~ N(μ_μ, τ_μ) and τ_k ~ HN(μ_τ, τ_τ), K < k ≤ K + m. The parameters {z_1, …, z_n, ϕ_1, …, ϕ_K} represent the state of a Markov chain that is iteratively updated and can be used to estimate functions of the posterior over the parameters. Specifically, we iterate draws from the full-conditionals of the z and the cluster parameters ϕ according to Algorithm 8 in Neal (2000).
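The label-resampling step just described can be sketched in a few dozen lines. The following is a minimal Python illustration in the spirit of Neal's (2000) Algorithm 8, not the authors' Julia implementation, and deliberately simplified: cluster parameters are drawn fresh from simplified priors rather than updated by MCMC, a singleton's parameters are discarded rather than recycled as an auxiliary component, and all hyperparameter values are made up for illustration.

```python
import math
import random

random.seed(0)

ALPHA, M = 1.0, 3         # CRP concentration; number of auxiliary components (Neal 2000, Alg. 8)
MU_MU, TAU_MU = 0.0, 0.1  # prior over cluster means: mu_k ~ N(MU_MU, 1/TAU_MU)
MU_TAU = 1.0              # location of the folded-normal prior over cluster precisions

def normal_pdf(y, mu, tau):
    """Gaussian density parameterized by mean mu and precision tau."""
    return math.sqrt(tau / (2 * math.pi)) * math.exp(-0.5 * tau * (y - mu) ** 2)

def draw_params():
    """Draw candidate cluster parameters (mu_k, tau_k) from simplified priors."""
    mu = random.gauss(MU_MU, 1 / math.sqrt(TAU_MU))
    tau = abs(random.gauss(MU_TAU, 1.0))  # folded normal as a stand-in for HN(mu_tau, tau_tau)
    return mu, tau

def gibbs_sweep(y, z, params):
    """One label-resampling sweep for a DP mixture, after Neal (2000), Algorithm 8."""
    n = len(y)
    for i in range(n):
        counts = {}                          # cluster sizes, excluding observation i
        for j, zj in enumerate(z):
            if j != i:
                counts[zj] = counts.get(zj, 0) + 1
        existing = sorted(counts)
        fresh = [max(params) + 1 + a for a in range(M)]
        for k in fresh:                      # temporary parameters for auxiliary clusters
            params[k] = draw_params()
        weights = []
        for k in existing + fresh:
            # n_k/(n-1+alpha) for existing clusters, (alpha/M)/(n-1+alpha) for new ones
            prior = counts.get(k, ALPHA / M) / (n - 1 + ALPHA)
            weights.append(prior * normal_pdf(y[i], *params[k]))
        r, acc = random.random() * sum(weights), 0.0
        for k, w in zip(existing + fresh, weights):
            acc += w
            if r <= acc:
                z[i] = k
                break
        for k in set(params) - set(z):       # drop parameters of unused clusters
            del params[k]
    return z, params

y = [0.0, 0.1, 8.0, 8.1]                     # two well-separated groups of observations
z = [0, 0, 0, 0]                             # initial labels: everything in one cluster
params = {0: (0.0, 1.0)}                     # (mean, precision) of the initial cluster
for _ in range(5):
    z, params = gibbs_sweep(y, z, params)
print(z)
```

After each sweep, the set of cluster labels in z and the set of parameterized clusters coincide; in the full algorithm, the parameter draws would be replaced by MCMC updates of each occupied cluster's posterior.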

Simulation details
For the simulations for Fig. 3, we first initialize a single cluster with an initial dataset D_init = {(y_i, z_i)}, i = 1, …, 200. This means computing the posterior for cluster k given all data with z_i = k. We simulated single random-walk Metropolis-Hastings chains to obtain J = 1000 samples from the posterior, ϕ_j* ~ π(μ_k, τ_k | μ_μ, τ_μ, μ_τ, τ_τ), and set ϕ_k = (1/J) ∑_{j=1}^J ϕ_j*. Given this initial belief state (a mixture with a single cluster), which was kept identical for the simulations with different priors, we perform Bayesian inference using Markov chain Monte Carlo sampling according to Algorithm 8 in Neal (2000). Specifically, we scan through a new batch of data D_new = {y_i*}, i = 1, …, 50, and sample initial values for the labels z_i*, i = 1, …, 50, according to the predictive probabilities. For each change in the partition implied by the z_i, we update the affected cluster parameters by performing 10 MCMC steps toward the posterior (as described for the initialization), starting from an initialization at the previous estimate. After this initial pass, we perform 20 additional iterations over all observations, both D_init and D_new, and resample the cluster labels according to the algorithm detailed above. The simulation was performed with the following hyperparameter settings: μ_μ = 0.0, τ_μ = 1/10, τ_τ = 10, with the priors differing only in HN(μ_τ^(j), τ_τ), where μ_τ^(1) = 1/100 and μ_τ^(2) = 100 for the two models. The simulation was implemented in Julia (https://julialang.org) and our code is freely available at: https://tinyurl.com/y3m79qdw.
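The difference between the two prior configurations can be visualized by sampling from the prior over cluster precisions under each setting of μ_τ. The Python sketch below (not the paper's Julia code) assumes HN(μ_τ, τ_τ) denotes a folded normal with location μ_τ and precision τ_τ; that parameterization is our assumption for illustration.

```python
import math
import random

random.seed(1)

def half_normal(mu, tau, n):
    """Samples of |N(mu, 1/tau)|: assumed parameterization of HN(mu, tau)."""
    sd = 1 / math.sqrt(tau)
    return [abs(random.gauss(mu, sd)) for _ in range(n)]

TAU_TAU = 10.0                                # hyperparameter shared by both configurations
low = half_normal(1 / 100, TAU_TAU, 10_000)   # configuration 1: mu_tau = 1/100
high = half_normal(100.0, TAU_TAU, 10_000)    # configuration 2: mu_tau = 100

mean = lambda xs: sum(xs) / len(xs)
print(mean(low), mean(high))  # expected cluster precisions differ by orders of magnitude
```

Under the first configuration, cluster precisions are expected to be small (broad, accommodating explanations); under the second, they concentrate around 100 (narrow, over-fitting explanations), which is what drives the contrasting belief dynamics in Fig. 3.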