The self and the Bayesian brain: Testing probabilistic models of body ownership through a self-localization task

multisensory


Introduction
Throughout our everyday experience, we are constantly accompanied by the pre-reflexive feeling of being "here and now", experiencing the external world from the location and perspective of a body that we perceive as our own. The ensemble of these experiences has been termed bodily self-consciousness (BSC) and is considered the minimal building block of consciousness and self-awareness (Blanke et al., 2015). BSC and the identification with our own body are so rooted in our everyday experience that they are easily taken for granted as a part of our normal brain functioning. However, a rich body of experimental studies suggests that BSC is a continuously built, in fieri phenomenon, linked to specific neural mechanisms. The manipulation of various aspects of multisensory processing can alter key components of BSC, such as body ownership, inducing self-attribution of an external object (Botvinick & Cohen, 1998) or dis-embodiment of an actual body part (Longo et al., 2008; della Gatta et al., 2016). These lines of evidence suggest that body ownership is the result of the multisensory integration of tactile, proprioceptive, and visual bodily stimuli in both the spatial and temporal domains. Accordingly, an influential theoretical view proposes that body ownership emerges when multisensory bodily stimuli are congruent with the normally experienced signals originating from one's own body and can be altered otherwise (Armel & Ramachandran, 2003). In line with Bayesian accounts of low-level multisensory integration (Ernst & Banks, 2002a), it has been proposed that such a principle could be conceptualized as an inference-like process. Armel and Ramachandran (2003) proposed that the illusory ownership of a fake hand when stroked in synchrony with one's own hand (the rubber hand illusion, RHI) emerges from "Bayesian logic".
The idea was that, since the repeated co-occurrence of visual and tactile stimulation would be very hard to obtain by chance, the brain deems the hypothesis that the fake hand and the real hand are the same, i.e., one's own hand, the most probable. Over the years, such inference-based accounts have also included the integration of mental states (Moutoussis et al., 2014) and interoceptive signals (Limanowski & Blankenburg, 2013; Seth, 2013; Seth & Friston, 2016; Seth & Tsakiris, 2018), linking inference on internal bodily and neural states to self-consciousness.
Arguably, one of the keys to the success of Bayesian approximations of brain function is that they constitute a normative framework, i.e., they find a clear evolutionary motivation in the need to behave optimally in a noisy sensory environment. In its initial field of application to low-level multisensory integration processes, the validity of such an approach has been rigorously proven experimentally by showing signatures of optimality in various setups and sensory modalities (Alais & Burr, 2004; Butler et al., 2010; Ernst & Banks, 2002a; Thurman & Lu, 2014). In contrast, although Bayesian descriptions of BSC have been popular for almost two decades, most accounts are purely conceptual (Apps & Tsakiris, 2014; Litwin, 2020; Noel et al., 2018; Seth, 2013; Seth & Friston, 2016) or mathematical (Kilteni et al., 2015). Experimental studies in their support, where modelling is paired with ad hoc behavioural assessments, are still rather scarce (Chancel et al., 2022; Fang et al., 2019; Samad et al., 2015) and do not provide conclusive proof of the optimality of behaviour, weakening the motivation for the use of a normative model (see below for a detailed discussion).
Here, we aimed at extending the evidence base for Bayesian theories of BSC. To do so, we focused on body ownership as its key and arguably easiest component to be quantified. We tested a Bayesian model of hand ownership by means of a virtual reality-based reaching task, recently introduced to manipulate visual and proprioceptive disparity (VPD task) and derive an implicit measure of hand misattribution (Fang et al., 2019) (see below). We validated this model by rigorously assessing optimality through two alternative strategies: by testing the Bayesian model against appropriate competing models, and by separately measuring the unisensory visual and proprioceptive components.

Body ownership as the result of Bayesian causal inference
Like previous Bayesian approaches to body ownership, the proposed model has its roots in classical models of multisensory integration (Ernst & Banks, 2002a). In such models, cues are weighted according to their precision (i.e., the inverse of their variance), under the assumption that they originate from the same physical source (forced fusion models). However, in the real world, stimuli occur simultaneously at multiple locations, and the brain needs to figure out which ones come from the same source and therefore have to be integrated (Shams & Beierholm, 2010). It has been suggested that this problem may also be solved in a probabilistic framework, i.e., Bayesian Causal Inference (Bayesian CI), whereby the brain infers the probability that two unisensory stimuli originate from the same cause, based on their spatial and temporal congruencies (Körding et al., 2007). This approach can be applied to the processing of unisensory bodily stimuli to explain how the feeling of owning a body as one's own could emerge from their integration. Based on a model of the expected mutual relations between the sensory stimuli normally originating from the body, the existence of the body itself would be inferred as their common physical cause. Then, the feeling of such a body as one's own would emerge by identifying with that "same old body always there" (James, 1890). This general principle has been translated into different mathematical formulations to model the relevant sensory variables (tactile, proprioceptive, visual cues etc.). As of now, such models have been experimentally tested in only two empirical studies with different experimental setups (Fang et al., 2019; Samad et al., 2015).
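To make the forced fusion rule concrete, the minimal sketch below combines a visual and a proprioceptive position estimate with weights proportional to their precisions (inverse variances). The function name `forced_fusion` and all numerical values are illustrative assumptions and are not taken from the studies cited above.

```python
def forced_fusion(x_v, x_p, sigma_v, sigma_p):
    """Precision-weighted average of two cues assumed to share one cause.

    x_v, x_p: visual and proprioceptive position estimates (degrees).
    sigma_v, sigma_p: standard deviations of the two cues (degrees).
    Returns the fused estimate and its standard deviation.
    """
    prec_v = 1.0 / sigma_v ** 2           # precision = inverse variance
    prec_p = 1.0 / sigma_p ** 2
    w_v = prec_v / (prec_v + prec_p)      # weight given to vision
    fused = w_v * x_v + (1.0 - w_v) * x_p
    fused_sigma = (1.0 / (prec_v + prec_p)) ** 0.5
    return fused, fused_sigma


# Hypothetical values: the visual cue is less noisy than the proprioceptive one.
estimate, sd = forced_fusion(x_v=10.0, x_p=0.0, sigma_v=2.0, sigma_p=4.0)
print(estimate, sd)
```

Under forced fusion the fused estimate is always pulled towards the more reliable cue, and its standard deviation is smaller than that of either cue alone, whatever the disparity between the cues. This is why such a model cannot account for segregation at large disparities.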
Samad and colleagues (Samad et al., 2015) used a Bayesian CI model to account for the RHI, whereby the estimated probability of common cause (P_com) of visual and tactile inputs, as a function of the congruency between tactile, real hand stimulation and visual, rubber hand stimulation, is taken as a measurable estimate of ownership for the rubber hand. According to the model, P_com varies as a function of the spatial and temporal disparity between visual and proprioceptive cues about touch location and timing. Samad and colleagues were the first to empirically test a specific prediction of a Bayesian model of body ownership. Indeed, as predicted by the model, the visual presentation of the rubber hand (in a position congruent with the participant's hand and within the hand peripersonal space), even in the absence of any tactile stimulation, was found to be sufficient to induce the illusion, although to a lesser extent.
More recently, Fang and colleagues (Fang et al., 2019) modelled ownership of a virtual hand as a function of visuo-proprioceptive disparity during a reaching task, based on an adaptation of classic paradigms of visuo-motor rotation (Krakauer, 2009). In their experiment, macaques and human participants had to reach targets with their real (proprioceptive) hand, hidden from view and replaced by a virtual hand, presented with various degrees of visuo-proprioceptive disparity. The error in the final reaching position induced by the virtual hand increased as a function of its displacement within a given range of visuo-proprioceptive disparity but decreased for larger levels of disparity. Such behaviour was well modelled by the predictions of a Bayesian CI model, inferring the probability that visual information from a virtual hand and somatosensory information from the real hand originate from the same physical cause (P_com). Since large visuo-proprioceptive disparities are unlikely to be generated by the same physical object (hand), P_com (and therefore, the probability that the virtual hand belongs to the subject) decreases as disparity increases, so that a lower weight is attributed to vision. Explicit ownership ratings in humans covaried with P_com, suggesting that this parameter might provide an implicit measure of subjective ownership probability at a trial-by-trial level.

1.2. Looking for a signature of optimality: limitations of previous studies
The works from Samad and Fang indeed constituted a major advancement. They first demonstrated that Bayesian CI can be applied to model some aspects of body ownership in an experimental setting. Furthermore, they developed the paradigms that made this possible. In particular, through Fang's virtual reality (VR) paradigm it is possible to parametrize the degree of visuo-proprioceptive disparity and to easily collect unprecedented amounts of data for modelling. However, despite such important achievements, these previous works do not provide conclusive evidence on whether body ownership can be truly described as the result of the optimal integration of multisensory signals, for the following reasons.
As already anticipated, Bayesian models assume optimality as a normative constraint on brain function. Typically, optimality is defined as the behaviour minimizing squared errors on multisensory estimates (in this case: position estimates) depending on a set of (unknown) free parameters (in this case: unisensory precisions). In the two previous studies, the match between behaviour and model predictions was used as a proof of optimal integration. Predictions from Bayesian CI are in line with classical principles of multisensory integration, stating that stimuli are integrated across modalities only if they are presented close in space and time (Cuppini et al., 2018; Stein et al., 2014; Stein & Stanford, 2008), i.e., the "binding window" is spatio-temporally constrained. In order to show that those principles truly arise from Bayesian CI, the model should be tested against competing models implementing such spatio-temporal constraints outside the Bayesian framework. Until then, the pattern of decreasing visuo-proprioceptive integration with increasing spatio-temporal disparity observed by Fang and colleagues cannot be considered as an exclusive signature of optimality. Previous studies, instead, performed no model comparison, or only tested Bayesian CI against a forced fusion model, which also takes the Bayesian hypothesis for granted. Moreover, such a model does not constitute a satisfactory competing model. Indeed, regardless of the Bayesian or non-Bayesian nature of the phenomenon, it is obviously wrong at large disparities due to the just mentioned spatio-temporal constraints on multisensory integration.
An alternative way to show that multisensory integration principles arise from Bayesian CI would be to directly link multisensory behaviour and its unisensory components. To do this, it is necessary either to manipulate the unisensory parameters in a controlled manner, or to directly measure them (Rohde et al., 2016). Compelling evidence in favour of optimal Bayesian CI would then be, for example, that the size of the spatial and temporal windows for integration varies with the amount of sensory noise consistently with model predictions. This was the strategy used in a very recent preprint by Chancel and colleagues (Chancel et al., 2022), who manipulated the amount of visual noise during the RHI. The authors found evidence in favour of a BCI model compared to a model that did not take into account the sensory manipulation. Alternatively, the windows for the integration of multisensory stimuli could be predicted from (independently measured) unisensory precisions. Such a comparison was not possible in previous studies, where the unisensory parameters were either taken to be fixed values from the literature (Samad et al., 2015), or fitted from behavioural data (Fang et al., 2019).
In sum, we argue that rigorously testing the optimality assumption is crucial to support the normative justification of Bayesian models for BSC. This can be done either through model selection against appropriate alternative models, or by independently measuring/manipulating unisensory components.

Our approach
In the present work, we aimed at providing empirical evidence for the hypothesis that body ownership emerges from a Bayesian inference process, as a key component of quantitative bottom-up accounts of self-consciousness. To do this, we extended and revised the existing models and behavioural validations, focussing on spatial and temporal features of visual and proprioceptive inputs. We introduced a virtual reality adaptation of the reaching task used by Fang and colleagues, i.e., the visuo-proprioceptive disparity task (VPD, Fig. 1a), as a base for the assessment of the Bayesian CI model predictions. Note that, since the VPD task is based on an active reaching movement, both visuo-proprioceptive and visuo-motor congruencies are manipulated. However, for simplicity, we will refer only to visuo-proprioceptive congruency throughout the text. Compared to the most used multisensory illusions, such as the RHI, this approach has two main advantages. First, the quantitative nature of the variables at play allows collecting granular data, which are especially suitable for modelling. Second, the use of an active motor task (as opposed to the RHI or to a passive proprioceptive drift measure) allows the collection of the large number of trials that is needed for the rigorous validation of a computational model. In our approach, the link between subjective ownership and the Bayesian CI model can only be assessed indirectly, by correlating proprioceptive drift measures and subjective ratings. The link between proprioceptive drift and body ownership is an object of debate, as a correlation is not observed in all experimental setups (Abdulkarim & Ehrsson, 2016; Rohde et al., 2011).
The lack of a perfect overlap between body ownership ratings and proprioceptive drift may reflect the multicomponent nature of bodily experience, which includes both the feeling of a body part as belonging to one's own body and the feeling of it occupying a specific location in space (and in time). These are considered different, yet related components of bodily self-consciousness (Blanke et al., 2015; Blanke & Metzinger, 2009; Serino et al., 2013). In line with the approach introduced by Fang and colleagues, we chose to validate our model on proprioceptive drift as it is an implicit and objective measure that does not suffer from the well-known limitations of explicit ratings and questionnaires. We tried to generalize this computational framework to body ownership in a second, separate step, by testing the correlation between proprioceptive drift and explicit ownership ratings. We argue that the advantages in terms of ease of data collection and data granularity outweigh the lack of a direct model validation on ownership ratings.
In order to investigate different features of the Bayesian CI integration process, in addition to the spatial manipulation of visuo-proprioceptive disparity as in Fang and colleagues, we also modulated the temporal disparity, by adding different levels of delay between the participant's movement and virtual reality visual feedback. The addition of a temporal modulation strengthens the interpretation of the behaviour observed in the VPD task and the underlying model. Indeed, ownership (or disownership) has to be a perceptually unitary phenomenon, regardless of whether it arises mainly from spatial or temporal cues. However, in a purely spatial task, reaching bias can be explained as the result of visuo-proprioceptive integration, with ownership being a mere epiphenomenon. If our temporal manipulation also induced the same reaching bias as the spatial manipulation, this simplistic explanation would be ruled out, supporting the idea that ownership can be truly measured from reaching errors as the hidden variable linking spatial and temporal biases.

Fig. 1. Experimental tasks and model rationale. In the visual-proprioceptive disparity task (multisensory task, a), a variable angle disparity α and temporal delay Δt are introduced between the real and the virtual hand during reaching movements towards a set of visual targets. The red hand represents the visual feedback from the virtual hand, the blue hand the proprioceptive feedback from the real hand, while the green hand is the final estimate resulting from visuo-proprioceptive integration. The relative weight attributed to the visual and the proprioceptive feedback determines the amount of error in the final position of the reaching movement. Hand position estimates (b) according to the Bayesian CI model (green) as a function of spatial and temporal disparities between the visual-virtual (blue) and proprioceptive-real (red) hands. The bottom row summarizes the set of tasks assessing each unisensory component. Proprioceptive precision: in the proprioceptive judgement task (PJ) (c), a virtual hand (red) is displayed in virtual reality at the left or the right of the participant's real hand (blue); the perceived position of the hand is determined using a two-alternative forced-choice converging algorithm. In the open-loop (OL) reaching task (d), participants reach targets without visual feedback of the hand. Visual precision (e): in the midline judgement task (MJ), participants have to report when they feel that a visual cue, moving across their visual field, is at their body midline. Temporal precision (f): in the simultaneity judgement task (SJ), participants evaluate the synchrony between the onset of their voluntary reaching movements and the displacement of the hand presented in virtual reality with a variable delay in the visual feedback. A visual morphing task (VM, not in the figure) has been used to measure the accuracy in the encoding of the visual features of one's own hand, as part of the prior.

Main hypotheses of the study
In order to investigate whether body ownership can be modelled as a bottom-up process, whereby sensory information is integrated in a behaviourally optimal way, we:
I) Validated a Bayesian CI model of an implicit measure of body ownership, namely the reaching bias induced by the temporal and spatial manipulation in the multisensory reaching task (Hypothesis I).
II) Assessed whether the implicit measure modelled by the Bayesian CI model also accounts for the subjective feeling of ownership, as assessed through explicit questions (Hypothesis II).
Hypothesis I. We aimed at performing a rigorous model validation by tackling the two main limitations of previous studies. That is, we compared our Bayesian CI model to a set of truly competing models (model selection) and compared model estimates to independently assessed unisensory parameters (unisensory correlations).
The first approach requires that a Bayesian CI model taking into account both the spatial and temporal manipulation (BCI_ST) should outperform an appropriate set of competing alternative models (Hypothesis Ia). First, we tested it against the forced fusion model (FF) (replicating Fang et al., 2019) and a model including only spatial disparity (BCI_S). Crucially, we further tested our data against two heuristic, non-Bayesian models, designed to describe experimental data well within a non-Bayesian framework (Heuristic and semi-FF, see methods for a detailed description). These models simply describe a continuous transition from integration to segregation, in line with basic multisensory integration principles but outside a Bayesian framework.
The second approach to validate the BCI_ST model is to independently investigate unisensory components to assess optimality. As mentioned, this can be done by either manipulating or independently measuring unisensory precisions. Here we chose the latter method due to the difficulty of accurately manipulating unisensory noise for both vision and proprioception in healthy humans during the multisensory VPD task. Indeed, while manipulating only visual precision would be possible in our setup, manipulating proprioception with the same reliability is problematic. Here, we aimed at mapping the contribution of the different unisensory modalities to the multisensory process with comparable resolution, and we therefore chose to measure, rather than manipulate, unisensory precisions. This approach is particularly relevant in light of future applications of the proposed methodology to clinical populations suffering from deficits that can affect any sensory modality. The Bayesian CI model provides predictions of the reaching bias as a function of both spatial and temporal disparity, depending on four free parameters: σ_v, the unisensory visual precision, σ_p, the unisensory proprioceptive precision, the temporal precision σ_t and a global prior about ownership of the virtual hand P_p. The free parameters have been fitted from experimental data at the individual level by finding the parameter set that maximizes the match between model predictions and reaching bias. These parameters have then been independently extracted from a set of dedicated tasks (Fig. 1) and correlated with those fitted from the multisensory task, thus assessing whether reaching errors, and therefore the size of the binding/ownership window, are coherently predicted by unisensory precisions (Hypothesis Ib).
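The fitting step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the predicted bias uses a standard causal inference form with uniform priors over a 180° spatial range and a 30 s temporal range, evaluated without sensory noise at the nominal disparities; a coarse grid search stands in for a proper optimizer; and the function names, parameter values and disparities are all hypothetical.

```python
import math
from itertools import product


def bci_bias(d_s, d_t, sigma_v, sigma_p, sigma_t, p_prior,
             range_s=180.0, range_t=30.0):
    """Noise-free reaching bias (deg) predicted for one disparity pair."""
    var_s = sigma_v ** 2 + sigma_p ** 2
    # Likelihood of the disparities under a common cause (C = 1).
    like_c1 = (math.exp(-0.5 * d_s ** 2 / var_s) / math.sqrt(2 * math.pi * var_s)
               * math.exp(-0.5 * d_t ** 2 / sigma_t ** 2)
               / math.sqrt(2 * math.pi * sigma_t ** 2))
    # Under separate causes (C = 2) disparities are uniform over the ranges.
    like_c2 = 1.0 / (range_s * range_t)
    p_com = like_c1 * p_prior / (like_c1 * p_prior + like_c2 * (1 - p_prior))
    w_v = sigma_p ** 2 / var_s        # visual weight under forced fusion
    return p_com * w_v * d_s          # drift towards the virtual hand


def sum_squared_error(data, params):
    """Mismatch between observed biases and model predictions."""
    return sum((bias - bci_bias(d_s, d_t, *params)) ** 2
               for d_s, d_t, bias in data)


def fit_grid(data, grid):
    """Return the grid point (sigma_v, sigma_p, sigma_t, p_prior) with lowest error."""
    best = min(product(*grid), key=lambda p: sum_squared_error(data, p))
    return best, sum_squared_error(data, best)


# Hypothetical "participant": synthetic, noise-free biases from known parameters.
true_params = (3.0, 6.0, 0.2, 0.8)
disparities = [(d_s, d_t) for d_s in (0, 5, 10, 20, 30, 40)
               for d_t in (0.0, 0.2, 0.4, 0.8)]
data = [(d_s, d_t, bci_bias(d_s, d_t, *true_params)) for d_s, d_t in disparities]

grid = [(2.0, 3.0, 4.0), (5.0, 6.0, 7.0), (0.1, 0.2, 0.4), (0.5, 0.8, 0.95)]
best, best_sse = fit_grid(data, grid)
print(best, best_sse)
```

With real, noisy data the error surface would be explored with a continuous optimizer and the fit repeated from several starting points. The grid search is used here only to keep the sketch dependency-free.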
Both model selection (Hypothesis Ia) and unisensory correlations (Hypothesis Ib) can be considered as compelling signatures of optimality. Therefore, we chose to consider model validation as achieved if at least one of these two methods proves to be successful.
Hypothesis II. In order to establish a full link between Bayesian integration and subjective body ownership, we hypothesize that explicit evaluation of the ownership feeling during the VPD task should match the probability of multisensory integration (P_com), estimated by the Bayesian CI model in both the spatial and temporal domains. To test this hypothesis, subjective ratings have been collected and related to reaching bias, to ascertain the link between reaching bias and subjective ownership.
In summary, this study provides evidence to validate or reject the hypothesis that body ownership arises from a Bayesian CI process, supporting the quantitative approach to the investigation of BSC (see Table 1).

2. Materials and methods

Transparency statement
We report how we determined our sample size, all data exclusions, all inclusion/exclusion criteria, whether inclusion/exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study.

Bayesian CI model
Following several successful approaches to model multisensory integration in probabilistic terms (Fang et al., 2019; Körding et al., 2007; Rohe & Noppeney, 2015; Samad et al., 2015), we modelled the process of visuo-proprioceptive integration in a Bayesian Causal Inference (Bayesian CI) framework. Early Bayesian models of multisensory integration, called forced fusion models, postulated that the brain estimates the position of a stimulus by simply combining unisensory estimates with a weight that is inversely proportional to their variance, quantified as mean squared error. Behaviour under such models is optimal (i.e., it minimizes the mean squared error of position estimates) only under the condition that the sensory inputs considered in the different modalities always have the same physical source. Clearly, in real-life situations, where several stimuli are presented to different modalities simultaneously, this assumption is not guaranteed. Therefore, before integrating unisensory estimates, the brain needs to infer whether and which stimuli need to be combined at all. Bayesian CI models account for this additional level of complexity by incorporating this inference in a probabilistic framework, in which the probability that two stimuli in different modalities have the same physical source is estimated from their features. In our case, this framework was applied to the integration of visual and proprioceptive inputs about the hand, focussing on the specific factors that we expect to intervene in our multisensory task. As already extensively documented at the qualitative level by behavioural studies, the main factors contributing to hand ownership in a visuo-motor task are spatial and temporal visuo-proprioceptive congruencies.
Table 1

Question
Does a Bayesian CI process combining visual and proprioceptive information in the temporal and spatial domain account for implicit body ownership?
Does subjective body ownership, as assessed through explicit questions, emerge from the same visuo-proprioceptive integration process?

Hypothesis Ia
The proposed spatio-temporal BCI_ST model, fitted to reaching errors in the VPD task, should outperform all the other alternative models (FF, BCI_S, Heuristic, Semi-FF).

Hypothesis Ib
The parameters estimated from the unisensory tasks (σ_p, σ_v, σ_t, P_p) should correlate with the corresponding parameters extracted from the multisensory task through the BCI_ST model.

Hypothesis II
The subjective ownership ratings are reflected by the estimated probability of a common source of visual and proprioceptive information about the hand (P_com) in both the spatial and temporal domain.

Sampling plan (e.g., power analysis)
We used the model fitting procedure described in the Analysis plan to extract unisensory parameters from the VPD task (σ_p, σ_v, σ_t, P_p). Then, we assessed the significant correlations of those parameters with the corresponding parameters extracted from unisensory tasks (OL, PJ, MJ, SJ, VM).
Pearson correlation scores between the standard deviation of the bivariate Gaussian fitted on ownership ratings and the width of P_com extracted from the VPD in both the spatial and temporal dimensions. The scores are expected to be significantly positive.

Therefore, to derive the equations of the BCI_ST model, we started from describing the generative model of the sensory stimuli underlying those congruencies, that is, the joint probability distribution of physical stimuli and their associated neural representation. In particular, in the case of our reaching task, the physical stimuli of interest are the hand position defined by visual and proprioceptive stimuli (s_v and s_p, expressed in degrees from the shoulder), and their relative timing with respect to the reaching movement (t_v and t_p, expressed in seconds). First of all, the visual and proprioceptive inputs may have one (C = 1) or two (C = 2) causes, that is, the virtual hand either is or is not the participant's hand. C is drawn from a Bernoulli distribution with probability P_p of a common cause, i.e., p(C = 1) = P_p. Then, we model the joint probability distribution of visual and proprioceptive inputs in the spatial and temporal domain, conditional on whether C = 1 or C = 2. Given the radial nature of our task, we use the angle from target origin as the most natural coordinate for positions. If C = 1, then the visual and proprioceptive position of the hand is the same, s_v = s_p = s, and is drawn from a uniform distribution on the -90° to 90° range, approximating the set of reachable angles. Previous works (Fang et al., 2019; Samad et al., 2015) used a Gaussian centred at 0 and with a very large standard deviation (σ = 10,000) to approximate a uniform distribution. While simpler to treat analytically, this choice is problematic since the value chosen for the width of the positional prior influences the fitted value of the common cause prior P_p. This is because, although a sufficiently wide Gaussian can always be approximated by a uniform distribution, the exact value of its standard deviation still influences model predictions through the normalization constant (see Supplementary material for the detailed calculation). In our case, by explicitly choosing a uniform distribution, the value of the normalization constant is naturally constrained by the reachable range and is thus less arbitrary. Similarly, the timing of visual and proprioceptive inputs related to the movement is the same, t_v = t_p = t, and is drawn from a uniform distribution. The range of the distribution was fixed at 0 to 30 s, as a plausible value of the average interval between different movements. If C = 2, s_v and s_p are drawn independently from the same uniform distributions.
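The dependence of the fitted common-cause prior on the width of the positional prior can be illustrated with a short calculation. The sketch below considers the spatial dimension only and assumes a positional prior that is approximately flat, with density $\pi_0$, over the range spanned by the sensory noise. The symbols $x_v$ and $x_p$ (noisy internal representations of position), $\delta_s = x_v - x_p$, $\pi_0$ and $\sigma_0$ are introduced here for illustration; the authors' full calculation is in their Supplementary material.

```latex
\[
\begin{aligned}
p(x_v, x_p \mid C=1) &= \int \pi(s)\,\mathcal{N}(x_v; s, \sigma_v^2)\,\mathcal{N}(x_p; s, \sigma_p^2)\,ds
  \;\approx\; \pi_0\,\mathcal{N}(\delta_s; 0, \sigma_v^2 + \sigma_p^2), \\
p(x_v, x_p \mid C=2) &\approx \pi_0^2, \\
\frac{p(C=1 \mid x_v, x_p)}{p(C=2 \mid x_v, x_p)} &\approx
  \frac{P_p}{1 - P_p}\cdot\frac{\mathcal{N}(\delta_s; 0, \sigma_v^2 + \sigma_p^2)}{\pi_0}.
\end{aligned}
\]
```

Under these assumptions, behaviour constrains only the ratio $P_p / \big((1 - P_p)\,\pi_0\big)$, so the assumed prior density trades off against $P_p$. A uniform prior on the -90° to 90° range gives $\pi_0 = 1/180 \approx 5.6 \times 10^{-3}$ per degree, whereas a zero-centred Gaussian with $\sigma_0 = 10{,}000$ gives $\pi_0 \approx 1/(\sqrt{2\pi}\,\sigma_0) \approx 4.0 \times 10^{-5}$ per degree, so the same data would be fitted with a very different value of $P_p$.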
As routinely done in Bayesian modelling, in order to simulate variability in sensory inputs, we assume that the true positions and timings of the sensory inputs are corrupted by unbiased Gaussian noise to generate their internal representations (equation 2). The first two variables in equation (2) refer to visual and proprioceptive-motor positions, and the other two to visual and proprioceptive-motor timing of movements, respectively. Bayes' theorem then allows us to compute the posterior distributions for the positions of the stimuli and the number of underlying causes that an ideal observer would compute, provided that they know the distribution of internal representations conditioned on the true positions and number of causes. We start from the posterior over the number of causes of the observed stimuli (equation 3). Excluding for simplicity the region outside the -90° to 90° and 0 to 30 s ranges, where the contribution to the integral is negligible, the likelihood function defined by our generative model for a common cause is given by equation (4), where α is the normalization constant, δ_s and δ_t denote spatial and temporal disparities respectively, and σ_s² and σ_t² are short forms for the sum of spatial and temporal variances. See Supplementary material for details about the approximation in equation (4). When there are two separate causes, the likelihood takes a simpler form (equation 5). The probability of common cause, P_com, then follows (equation 6). The final estimate of hand position is obtained by combining the forced fusion and proprioceptive estimates, weighting them by the probability of common and separate causes, respectively (equation 7).
Then, we explored BCI_ST model predictions about sense of ownership (P_com) in a spatio-temporal disparity setup through numerical simulations. To illustrate model predictions, shown in Fig. 2, we selected plausible parameters for the unisensory precisions and the prior, and a set of "ground truth" temporal and spatial disparities, representing the actual physical spatial and temporal disparity between visual and proprioceptive inputs.
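For reference, the inference steps of the model can be sketched in the standard causal inference form of Körding et al. (2007), adapted to uniform priors over the reachable range and the inter-movement interval. This is a reconstruction under assumptions, not a copy of the authors' equations. The symbols $x_v, x_p$ (noisy position representations) and $\tau_v, \tau_p$ (noisy timing representations) are introduced here for illustration, as is the explicit value of the constant $\alpha$. Timings are assumed to be independent under separate causes, and edge effects of the uniform priors are ignored.

```latex
\[
\begin{aligned}
\delta_s &= x_v - x_p, \qquad \delta_t = \tau_v - \tau_p, \qquad \sigma_s^2 = \sigma_v^2 + \sigma_p^2, \\
p(x_v, x_p, \tau_v, \tau_p \mid C=1) &\approx \alpha\,\mathcal{N}(\delta_s; 0, \sigma_s^2)\,\mathcal{N}(\delta_t; 0, \sigma_t^2),
  \qquad \alpha = \frac{1}{180 \cdot 30}, \\
p(x_v, x_p, \tau_v, \tau_p \mid C=2) &\approx \alpha^2, \\
P_{com} = p(C=1 \mid x_v, x_p, \tau_v, \tau_p) &=
  \frac{P_p\,\mathcal{N}(\delta_s; 0, \sigma_s^2)\,\mathcal{N}(\delta_t; 0, \sigma_t^2)}
       {P_p\,\mathcal{N}(\delta_s; 0, \sigma_s^2)\,\mathcal{N}(\delta_t; 0, \sigma_t^2) + (1 - P_p)\,\alpha}, \\
\hat{s}_{FF} &= \frac{\sigma_p^2\,x_v + \sigma_v^2\,x_p}{\sigma_v^2 + \sigma_p^2}, \\
\hat{s} &= P_{com}\,\hat{s}_{FF} + (1 - P_{com})\,x_p .
\end{aligned}
\]
```

In this form, $P_{com}$ is close to 1 when both disparities are small relative to $\sigma_s$ and $\sigma_t$, and falls towards 0 as either disparity grows. The position estimate $\hat{s}$ then moves from the forced fusion estimate back to the proprioceptive one.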
Then, we added Gaussian noise of variance σ_s² and σ_t², respectively, in order to obtain samples from the noisy internal representation of the stimuli. As noted in Körding et al. (2007), this procedure is the only correct means of simulating behavioural experiments within Bayesian models of brain function. The process was repeated 1000 times and the probability of common cause P_com was extracted following equation (6) and averaged across the 1000 trials. We show how P_com varies as a function of spatial and temporal disparity in Fig. 2a. Consistent with expectations and qualitative findings from behavioural studies, the analysis resulted in a region of very high ownership probability when spatio-temporal incongruences are below a certain threshold. This can be seen as the mathematical counterpart of the empirical notion that in the case of little or no disparity, as in normal conditions, the feeling of ownership is granted and constant. We then simulated the expected results from our VPD task, for the same set of parameters. For each spatio-temporal disparity, we simulated 1000 trials (a number large enough to render sampling noise negligible) by adding Gaussian noise to the real positions and timings. Then, we extracted the hand position estimate according to the BCI_ST model and computed the average reaching bias as a function of the spatial and temporal disparity.
Finally, to illustrate how model predictions can be recovered from noisy behavioural data at the single participant level, we performed the same simulation, with the limited subset of spatiotemporal disparities and the number of trials of our actual experiment. Fig. 2d shows the ownership probability, as extracted from our simulated behavioural experiment, and Fig. 2c shows the reaching bias associated with the spatiotemporal disparities explored in our setup.
Fig. 2. Panel (a) shows the average probability of common cause as a function of spatial and temporal disparity, obtained after simulating BCI_ST model predictions for 1000 trials (in order to obtain a virtually noiseless prediction). The red half-circle denotes the area where the probability of common cause is above 95% and can be seen as the region corresponding to a subjective experience of complete ownership. Only positive values of temporal disparity are plotted, as they are the only ones that can be achieved in our experimental setup, but model predictions are symmetrical with respect to time. Panel (b) shows the (averaged, approximately noiseless) simulated results for a participant with the same typical parameters in the visual-proprioceptive disparity task. The y axis indicates the proprioceptive drift in the reaching movement, so that movements completely based on proprioception would have drift equal to 0, and movements completely based on vision would have drift equal to the shown disparity. Different shades of green denote different values of temporal delay, increasing from dark to light green. Panel (c) is the result of the same simulation as panel (b), run with spatio-temporal disparities and number of trials matching the experimental design of our VPD task to obtain data from one surrogate participant. The 2D heat map in (d) shows how model predictions based on parameters fitted from noisy simulated experimental data match results based on the ground truth parameters used in panels (a), (b), (c). Values of σ_v, σ_p, σ_t and P_p were obtained by fitting the BCI_ST model on the simulated data shown in panel (c) and used to recover the expected probability of common cause as done for panel (a). The overall shape of P_com as a function of spatial and temporal disparity is very similar to the one obtained from ground truth parameters. Inside the black circles we show "empirical" P_com values, defined as the ratio between simulated drift and the forced fusion estimate, so a drift coinciding with the forced fusion estimate would correspond to P_com = 1, and no drift to P_com = 0. This analysis was only performed for visualization purposes and is not used to extract model parameters, as they are recovered more robustly by directly fitting reaching errors from the VPD task.

Alternative models
As a first model validation, we tested the spatio-temporal Bayesian CI (BCI_ST) model against the forced fusion model (FF), to replicate the analysis presented by Fang and colleagues. Using the same notation as in the previous paragraph, the forced fusion estimate is simply:
Additionally, we tested it against a Bayesian CI model including only the effect of spatial disparity in the estimate of P_com (BCI_S). This is very similar to the original model proposed by Fang:
As stated in the introduction, we also aim at challenging the BCI_ST model through alternative models that, unlike the FF model, are specifically designed to fit experimental data well and are outside the Bayesian framework. Only if the BCI_ST model also outperforms them can it yield predictions that are subtle enough to advance our understanding of the link between body ownership and multisensory integration.
In practice, we challenged the BCI_ST model with two models that heuristically implement the combination of visual and proprioceptive information through the notion of a spatial and temporal binding window in multisensory integration, as known from classical literature (Cuppini et al., 2018; Stein et al., 2014; Stein & Stanford, 2008).
The first model is completely non-Bayesian, and we will hence refer to it as the Heuristic model. We assume that visual and proprioceptive information are combined according to weights that do not depend directly on unisensory precisions. We then implement the spatial rule of multisensory integration by imposing that the visual weight decreases as a function of spatial and temporal disparity following a bivariate Gaussian, peaking at zero disparity. A Gaussian random error is added to the estimate to model neural noise.
Here, W_v denotes the weight of visual information at zero disparity, and the proprioceptive weight is set to 1 by definition, as the final estimate only depends on their ratio. a_s and a_t respectively indicate the spatial and temporal width of the "binding window". Note that the estimate is based on the true position of the physical stimuli (s_p and s_v) instead of their noisy neural representation (x_p and x_v), since the model is purely heuristic, and the stochasticity of neural processing is rendered by the Gaussian error term added to the estimate. The second model is based on the forced fusion model, and it therefore employs noisy internal representations as a starting point. However, unlike in the Bayesian CI model, the transition between the integration and the segregation regime is not governed by optimal inference, but again by the heuristic principle of a spatial and temporal binding window which is not optimised on unisensory precisions. In other words, the forced fusion estimate is obtained according to classical optimal integration, and it is then combined with the purely proprioceptive estimate with a weight that decreases as a Gaussian function of spatial and temporal disparity. Crucially, the spatial and temporal widths of this binding window do not depend on unisensory precisions. We will therefore refer to this model as the semi-forced fusion model (Semi-FF).
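As a rough illustration of how the two alternative models differ, the sketch below implements one possible reading of their verbal description. Several details are assumptions made only for this example: the bivariate Gaussian window is taken to be axis-aligned (no spatio-temporal correlation), the Semi-FF window is assumed to peak at 1 and to act on the perceived rather than the physical spatial disparity, and the function names are hypothetical.

```python
import math
import random

def binding_window(d_s, d_t, a_s, a_t):
    """Axis-aligned bivariate Gaussian window, equal to 1 at zero disparity."""
    return math.exp(-d_s ** 2 / (2.0 * a_s ** 2) - d_t ** 2 / (2.0 * a_t ** 2))

def heuristic_estimate(s_v, s_p, d_t, w_v0, a_s, a_t, noise_sd, rng):
    """Heuristic model: weights act on the true stimulus positions.

    The proprioceptive weight is fixed to 1; a Gaussian error term is
    added to the final estimate to mimic neural noise.
    """
    w_v = w_v0 * binding_window(s_v - s_p, d_t, a_s, a_t)
    estimate = (w_v * s_v + s_p) / (w_v + 1.0)
    return estimate + rng.gauss(0.0, noise_sd)

def semi_ff_estimate(x_v, x_p, d_t, sigma_v, sigma_p, a_s, a_t):
    """Semi-FF model: optimal fusion gated by a heuristic binding window.

    x_v and x_p are noisy internal representations; the window widths
    a_s and a_t are free parameters unrelated to sigma_v and sigma_p.
    """
    w_v = sigma_p ** 2 / (sigma_v ** 2 + sigma_p ** 2)
    fused = w_v * x_v + (1.0 - w_v) * x_p
    gate = binding_window(x_v - x_p, d_t, a_s, a_t)
    return gate * fused + (1.0 - gate) * x_p
```

In both functions the mean estimate is pulled towards vision at small disparities and reverts to proprioception at large ones, but the trial-to-trial dispersion is set by `noise_sd` in the first case and by the sensory noise in `x_v` and `x_p` in the second.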
As shown in Fig. 3, the predictions of both the Heuristic and Semi-FF models on the average error as a function of disparity are very similar to those of the BCI_ST model. The key difference is that, unlike in the BCI_ST model, the parameters governing the average disparity-error relation are partially or completely independent of the parameters governing the dispersion of individual reaching errors around the average. Therefore, both the Heuristic and Semi-FF models yield subtly but consistently different predictions, which makes it possible to disentangle the different underlying functions by model comparison on behavioural data (see Sampling plan) and to assess the presence of a Bayesian signature in the integration process.

Materials and general procedure
The battery of tasks is administered via the Oculus Rift S Virtual Reality system, comprising an Oculus Rift S head-mounted display (HMD) and two Oculus Touch motion controllers. The visual morphing (VM) task is implemented in Python, while all the other tasks are implemented in Unity and are compatible with other virtual reality systems allowing 6-axis hand and head tracking (e.g., HTC Vive). A virtual reality implementation of our setup was preferred to a physical implementation for several reasons. First of all, the tracking accuracy achieved by modern HMDs, albeit still inferior to dedicated kinematics recording systems (e.g., Vicon optical camera system), makes it possible to record kinematics with a precision that should be largely sufficient for our tasks (<1 cm) (Jost et al., 2019; Spitzley & Karduna, 2019). Second, the use of a commercial, readily available apparatus allows a quick and standardized replication of the tasks for large-scale data collection and sharing. Third, the use of an immersive environment makes it possible to fully control and standardize visual inputs, while enhancing the vividness of the task experience. The behavioural experiment consisted of a main task (VPD) and five complementary tasks. The VPD task yields an estimate of ownership of a virtual hand as the tendency to integrate its visual position with proprioception while estimating one's hand position. It consists of the repetition of a reaching movement while a variable angular disparity or temporal delay is introduced between the movement of participants' real hand and a virtual hand displayed in immersive VR. The other tasks assessed independently the relevant parameters included in the BCI_ST model (i.e., σ_p, σ_v, σ_t and P_p).
To assess the proprioceptive unisensory precision (σ_p) we considered several options. Position matching tasks are commonly used in the literature to assess proprioception. However, these tasks present some difficulties, as they may require a memory component (Rincon-Gonzalez et al., 2011), or the computation of a symmetric target location compared to a reference position in the contralateral space [i.e., a mirror-matching task (Dukelow et al., 2010)]. Another important disadvantage in using a mirror-matching task would be that the proprioceptive precision of the active body part (in our case, the right hand) cannot be disentangled from the precision of the target body part (in our case, the left hand). This issue is especially relevant since this model and the associated tasks may be applied in the future to the study of clinical populations with lateralized deficits (e.g., stroke patients suffering a contralesional upper limb impairment).
For this reason, we searched the literature for previous studies in which proprioceptive precision was assessed unilaterally (Desmurget et al., 2000; Haggard et al., 2000; Jones et al., 2010; Longo & Haggard, 2010). We devised a proprioceptive judgement task (PJ, Fig. 1c) in virtual reality in which the felt position of the hand is assessed through a forced-choice converging algorithm. However, the parameter σ_p estimated through the model from the multisensory task likely incorporates motor components that are not captured by a purely proprioceptive task. Therefore, in addition to the PJ, an open-loop reaching task (OL) was used to isolate the combination of motor and proprioceptive components which can determine the end-point precision in a reaching task (Fig. 1d). In this task, participants performed the same reaching movements towards virtual visual targets as in the multisensory task (VPD), but in the absence of visual feedback about hand position.
Fig. 3. Panels (a–c) show the induced drift in the VPD task for a simulated subject in the three models respectively. Model parameters were set to plausible values, and the data were simulated first for the BCI_ST model. Then, these data were fitted with the Heuristic and Semi-FF models to obtain the set of corresponding parameters able to generate the most similar results in the two alternative models. Those parameters were then used for panels (b–c). This way, the three models can be compared in the case where their predictions are the most similar. For simplicity, only the zero temporal lag was included in the figure. The mean drift (solid line) is rather similar across the three models, but the dispersion around it is different at all disparities between the BCI_ST and Heuristic models, while the difference is present mainly at intermediate disparities between the BCI_ST and the Semi-FF models. As shown in our Sampling plan, this difference should be enough to disentangle the different models with the planned sample size. Panels (d–e) show the full distribution of simulated drifts for the three models at −26.6 degrees of disparity, where the difference between the BCI_ST and the other models is most evident, in that only the BCI_ST model predicts a skewed distribution. The red asterisks show 24 simulated reaching errors (the same number of trials as in our actual task), to qualitatively demonstrate that the difference can be appreciated in our experiment.
Concerning visual precision, it is worth noting that, although visual acuity is extremely high compared to proprioceptive precision in humans (Jones et al., 2010; Kniestedt & Stamper, 2003), the final accuracy in visually determining the hand's position does not depend on visual acuity alone. Indeed, when coordinating vision and proprioception in a motor task, it is necessary to transform the extremely accurate retinotopic visual information into body-centred coordinates, through a set of computations involving gaze angle and head orientation (Buneo et al., 2002). In this view, we believe that participants' precision in visually determining their body midline, as assessed by a midline judgement (MJ) task (Fig. 1e), can be used as the closest approximation to isolate the contribution of visual information to position estimates in our task.
The temporal precision is the third main parameter of our model. This component was measured through a simultaneity judgement task (SJ), whereby participants evaluated the synchrony between the onset of a voluntary reaching movement and the displacement of the hand displayed in virtual reality (Fig. 1f).
The purely "bottom-up" Bayesian approach focussing on visuo-tactile congruencies, initially used to model body ownership, has been extended in conceptual models to also incorporate "top-down" cognitive constraints, given by the posture and the visual appearance of the body (Tsakiris, 2010; Tsakiris et al., 2010; Tsakiris & Haggard, 2005). These constraints (Armel & Ramachandran, 2003; Ehrsson, 2005; Makin et al., 2008; Tsakiris, 2010; Tsakiris et al., 2010; Tsakiris & Haggard, 2005) have been described as the result of another inference process, comparing incoming visual features with an internal model of the body to estimate the probability that they originate from one's own body (Apps & Tsakiris, 2014; Kilteni et al., 2015; Tsakiris, 2017). Nevertheless, the role of visual appearance has not yet been studied quantitatively within a Bayesian framework. Although the problem is too complex and high dimensional to be explicitly modelled in a Bayesian CI model, its influence is reflected in the global "prior" as the marginal probability of all the factors that are not modulated during the experiment and included in the other parameters' computation. Based on previous accounts (Tsakiris, 2010) and as suggested by Fang's study (Fang et al., 2019), most of the variance in such a "prior" is likely explained by high-level visual features of the stimulus. In the attempt to quantify their role, we implemented a VM task, based on the continuous morphing of pictures of the participant's real hand into other people's hands. Similarly to what was planned for the visual and proprioceptive precision in the spatial and temporal domain, we measured participants' accuracy in recognizing their own hand against other realistic hands. Note that we deliberately chose not to test their recognition ability against virtual hand avatars (such as the ones used in the VPD task), in order to make the task more challenging, according to internal pilots.
Furthermore, this would make it possible to demonstrate that the visual recognition ability predicts behaviour in a general, rather than an idiosyncratic manner, allowing a more compelling validation of the Bayesian CI model. If the inter-individual differences in discriminating the visual appearance of one's own hand play a quantifiable, probabilistic role in determining ownership, we expect that the accuracy in the VM task should correlate with the prior probability of cue combination fitted from the multisensory task.

Visual-proprioceptive disparity (VPD) task
Participants are requested to sit in front of a chest-height table, with their arm placed in front of them. Participants wear a head-mounted display (HMD) and hold a motion controller in their right hand. During the experiment, participants cannot see their real hand, but a realistic hand is displayed in virtual reality using the tracking of the motion controller. During the task, the spatial congruency between visual and proprioceptive information is manipulated as an angular disparity between the real (proprioceptive) and a virtual (visual) hand. Moreover, a delay is introduced between the onset of the real and the displayed movement in order to alter the temporal congruency of the stimuli (Fig. 1a).
Participants are asked to make reaching movements to targets in virtual reality (white spheres with 3 cm diameter) from a fixed starting position. The starting point is a sphere of 15 cm diameter, fixed 15 cm away from the participant's sternum. Target positions are arranged on an arc centred on the resting position. The arc radius is set according to each participant's maximum reaching distance, calibrated at the beginning of the experiment.
The task consists of four experimental blocks with slightly different designs. In the first three blocks, 7 targets (from T1 to T7) are equally spaced between −45° and 45° with respect to the participant's sternum. Across trials, the visual hand is randomly rotated with a given angular disparity from the participants' proprioceptive hand, with their sternum as the (vertical) rotation axis. Additionally, a temporal delay of 0, 100, 250 or 400 msec is added between the onset of the movement and the displacement of the virtual hand. For the 0 msec delay condition, 7 spatial disparities are used: 0°, ±13.3°, ±26.6° or ±40° (+: clockwise, CW; −: counterclockwise, CCW). For the 100, 250 and 400 msec delay conditions, 6 spatial disparities, uniformly distributed over the same range, are used: ±8°, ±24° or ±40°. This was done to increase the variability of the explored disparities, and to avoid collecting uninformative trials at zero spatial disparity. All the possible combinations of target position, temporal and spatial disparity [7 × (7 + 6 × 3)] are tested in randomized order for a total of 175 trials in each of the first three blocks. In the fourth block, in which subjective ratings of ownership are also collected (see below), only 3 targets are presented to keep the total duration constant, and again one trial is collected for each combination of target, disparity and delay (75 trials in total). Each block lasts approximately 15 min, and 600 trials are collected over approximately 75 min (including 5 min breaks between blocks).
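The trial counts above can be checked by enumerating the design. The following sketch builds the list of conditions for one block and shuffles it; the target indices are placeholders (the text does not state which 3 targets are used in the fourth block), and the function names are hypothetical.

```python
import random
from itertools import product

DELAYS_MS = (0, 100, 250, 400)
# Spatial disparities in degrees (+: clockwise, -: counterclockwise).
DISPARITIES_NO_DELAY = (0.0, 13.3, -13.3, 26.6, -26.6, 40.0, -40.0)
DISPARITIES_DELAYED = (8.0, -8.0, 24.0, -24.0, 40.0, -40.0)

def conditions():
    """All (delay, disparity) pairs: 7 at zero delay plus 6 for each other delay."""
    pairs = []
    for delay in DELAYS_MS:
        disparities = DISPARITIES_NO_DELAY if delay == 0 else DISPARITIES_DELAYED
        pairs.extend((delay, disparity) for disparity in disparities)
    return pairs

def build_block(n_targets, seed):
    """One trial per (target, (delay, disparity)) combination, in random order."""
    trials = list(product(range(1, n_targets + 1), conditions()))
    random.Random(seed).shuffle(trials)
    return trials
```

With 7 targets `build_block` yields the 175 trials of each of the first three blocks, and with 3 targets the 75 trials of the fourth block, for 600 trials overall.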
Participants are requested to place their hand on the starting position to initiate a trial. At the beginning of the trial, the virtual hand is rotated by one of the possible disparity angles over 1 sec, and the target appears. This mismatch between the real (proprioceptive) and the virtual (visual) hand is maintained for 1.5 sec as the preparation period. In order to make the temporal delay apparent along with the angular disparity during the preparation period, participants are instructed to make a prono-supination movement of the hand at a speed of approximately 1 Hz while fixating the virtual hand. After the preparation period, the target is turned green as a "go" signal. Movement of the hand outside the resting position at any time before the "go" cue automatically restarts the trial. Participants are instructed to reach the target with their real (proprioceptive) hand and return to the resting position within 1.5 sec, ending the trial. The spatial and temporal mismatch is maintained throughout the whole trial along with the hand movement.
Additionally, in the fourth block participants are requested to report their subjective feeling of ownership for the virtual hand through the virtual interface, evaluating their agreement with the statement [adapted to VR from Fang et al. (2019)] "I felt as if the virtual hand was my hand" on a 1–10 Likert scale, by pointing to the answer with the motion controller.

Proprioceptive judgement (PJ) task
The set-up of this task is similar to the VPD (Fig. 1c). Participants' real hand is not displayed in virtual reality. The experimenter passively moves participants' real hand to one out of 7 possible target positions (from T1 to T7) arranged at 0°, ±15°, ±30°, ±45° with respect to participants' sternum on an arc with radius equal to each participant's maximum reaching distance. Target position is selected randomly trial by trial. A two-alternative forced-choice converging algorithm is used to find the position in which the participants perceive their hand. At the beginning of each trial, a virtual hand is displayed at +30° (right) or −30° (left) with respect to participants' real hand. The sign of the initial angle is randomized trial by trial. Participants then report whether they feel that the displayed hand is located to the left or right of their real, unseen hand. In the following step, the position of the virtual hand is moved halving the angle and mirroring it in the opposite direction with respect to participants' previous answer. In five steps, the algorithm converges towards a certain angle at which participants have an equal probability of reporting left or right. The proprioceptive-based estimation is computed as the intermediate hand position between the last displayed position and the next position that would have been displayed by the algorithm according to the participant's last answer. Each target position is tested 4 times in randomized order, for a total of 28 trials.

Open-loop reaching task (OL)
The set-up of this task is similar to the VPD task (Fig. 1d), except that the virtual hand is not displayed; therefore, participants do not receive any visual feedback about their hand for the entire duration of the experiment. From a fixed starting position, participants are asked to make a reaching movement to one out of seven visual targets (from T1 to T7), arranged at 0°, ±15°, ±30°, ±45° with respect to participants' sternum. Target position is selected randomly trial by trial. Participants are required to place their hand on the starting position for 1.5 sec to initiate a trial. After the initiation period, one of the targets appears. The reaching target is then turned green as a "go" signal. Movement of the hand outside the resting position at any time during the initial resting period automatically restarts the trial. Participants have to reach the target with their real hand and come back to the resting position within 1.5 sec, ending the trial. Each participant is asked to complete 10 trials for each target (10 × 7 = 70 trials).

Midline judgement task (MJ)
In this task, participants are asked to sit on a chair keeping their head and trunk aligned while wearing an HMD. On each trial, a white sphere with 3 cm diameter moves horizontally across participants' field of view at a speed of 10°/s, starting from ±45°, ±40°, ±35°, and ±30° from the body midline, on an arc centred on participants' sternum, with a radius equal to their maximum reaching distance, as in the VPD (Fig. 1e). Participants have to report when they feel that the visual cue is aligned with the midline of their body by pressing a response button, with the possibility to subsequently ask the experimenter to manually adjust the judgement. The starting positions of the visual cue are randomized across trials. 24 trials in total are collected.

Simultaneity judgement task (SJ)
This task consists of a series of reaching movements in virtual reality towards 10 targets (from T1 to T10) equally spaced between −45° and +45° with respect to participant's sternum (Fig. 1f). On each trial, the displacement of the hand in virtual reality is delayed by a variable amount with respect to the onset of the movement of the real hand, spanning 8 values equally spaced between 0 and a maximum of 175 or 350 msec (see below). 10 trials are collected per temporal disparity, with each target repeated once. The order of targets and delays is randomized across the task. On each trial, participants are asked to report whether their movement and the displacement of the virtual hand occurred at the same time, answering the question "did you notice a delay between the virtual hand and your hand?". The answer is recorded as a binary variable by the experimenter.
A practice session was conducted to let participants familiarize themselves with the task and to determine for each subject the maximum delay for the experiment. The practice session consisted of 15 trials: five for each of the 0, 175 and 350 msec delays. If the participant noticed the delay (when present) in at least 9 out of 10 trials, the maximum delay was set at 175 msec, and at 350 msec otherwise.

Visual morphing task (VM)
At the beginning of the task, 3 digital pictures of the participant's right hand are taken, with 3 different postures with varying distance between the fingers (narrow, medium and large). The pictures are converted to black and white, scaled to 300 × 400 pixels, and the background is removed. Each picture is morphed towards 10 target hands from a fixed database of hands (5 male and 5 female lab members). The morphing was performed using an automated feature mapping software (Liao et al., 2017). Ten intermediate morphing steps are created for each target, each frame representing a 10% incremental change from participant's hand to a target hand or from 100% self to 0% self. A random rotation and translation (uniform on the −10°/10° and −10/10 pixel ranges, respectively) are added to each image to prevent the participant from learning specific orientations and positions of the hands. The morphing of each image is checked by visual inspection prior to the task, and generates 100 morphing steps, uniformly distributed between 0 (i.e., 0% self) and 100 (100% self). For each of the 10 target images, 15 steps of morphing are selected, in such a way that sampling is more frequent for intermediate (harder to recognize) levels of morphing. In the end, frames number 1, 17, 28, 36, 41, 45, 48, 50, 52, 55, 59, 64, 72, 83, and 100 are selected. One frame for each target image and level of morphing is selected based on the quality of the morphing (e.g., absence of deformations, discontinuity of the margins, etc.). Hence, a set of 100 images is created. Participants sit in front of a screen, with their hand occluded from view.
At each trial, one of the 100 possible images is presented on a computer screen, while the question "is this a picture of your hand?" is displayed at the top of the image. Participants answer the question by pressing a left or a right button, for negative and positive answers respectively, and their responses and reaction times are recorded.

Inclusion criteria and missing values
According to the power analysis described below, 40 right-handed participants were recruited. Inclusion criteria were: no history of neurological, vestibular or psychiatric disorder, and normal or corrected-to-normal binocular vision for VR. Participants were informed about the inclusion criteria beforehand and asked to apply only if no criteria were violated. There was no outlier removal in the collected data. If at any point technical issues arose during an experiment interfering with the experiment's procedure or data-logging, the participant was excluded from the experiment in which the issue emerged. Issues rated as interfering with the experiment's procedure include any type of freezing of the displayed virtual environment or any other faulty distortion of the presented virtual environment. Technical issues were recognized by the experimenter, who monitored the procedure of all experiments on a separate display. If a participant wanted to stop the experiment due to motion sickness or any other discomfort, he/she was excluded from the experiment. Each task was considered complete if the participant performed at least 80% of the trials. Participants who failed to complete more than one task were excluded from the experiment. In addition, any participant with less than 80% of the trials in the VPD task was excluded. All participants excluded for the above reasons were replaced with another participant. These criteria were established before data collection.

Monte Carlo simulations approach
Due to the complexity of the planned analysis, and the scarcity of published empirical data using a similar setup, we could not perform a power analysis through standard techniques for most hypotheses of our study. Therefore, in order to determine the optimal sample size, we relied on a custom method combining Monte-Carlo simulations and previous data to estimate the chances of observing the hypothesized effect. In short, we used model fits from Fang's (Fang et al., 2019) largest experiment to infer a plausible distribution of the main model parameters, and we simulated behavioural results from 500 surrogate participants, assuming that the BCI_ST model, described in Section 2.3.1, is correct. Then, our analysis pipeline was repeatedly run on several random samples of surrogate participants of different sizes, and the fraction of resamples yielding significant results was computed. This should provide an unbiased estimate of the probability of observing an effect, assuming that the BCI_ST model is correct.

Hypothesis Ia: model validation through model selection
Concerning the hypothesis that the BCI_ST model would outperform both the BCI_S model and the FF model, since real data from a similar experiment are available, we relied on the sample of the largest experiment performed by Fang on humans. A sample size of 22 participants was largely sufficient to select the best model with fewer trials per participant.
In order to assess the probability of the BCI_ST model outperforming the Heuristic and Semi-FF models, we also fitted the surrogate participants (simulated with the BCI_ST model as ground truth) with these two alternative models, and compared them to the BCI_ST model by computing the exceedance probability over resamples. We explored sample sizes ranging from 5 to 60 in steps of 5. For each sample size N, subsets of N participants were randomly selected 10,000 times (with replacement) from our pool of 500 simulated participants, and the fraction of resamples yielding significant results was computed. The models yielded a good fit compared to the FF model (R²: BCI_ST: .911 ± .0016; Heuristic: .884 ± .002; Semi-FF: .893 ± .002; FF: .854 ± .003). Nevertheless, when tested by computing the exceedance probability, the BCI_ST model overwhelmingly outperformed both alternative models even with 5 participants (power = 100%). The average log-likelihood difference in favour of the BCI_ST model was very large: 65.3 ± 43.4 (SD) against the Heuristic model, and 47.66 ± 30.7 (SD) against the Semi-FF model. Admittedly, the brain is expected to only approximately perform Bayesian CI, while our simulations are based on an exact implementation of the model. Nevertheless, our analysis shows that the predictions of the different models differ quite radically. If the brain approximates optimal Bayesian CI closely enough to render the Bayesian hypothesis meaningful within this context, our model comparison pipeline should be able to detect it with 40 participants.
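The resampling logic of this power estimate can be sketched as follows. The sketch is generic: `is_significant` stands for whatever analysis is run on each resample (here, the exceedance probability computation, which is not reproduced), and `mean_clearly_positive` is a purely hypothetical stand-in criterion with an arbitrary threshold, included only to make the example self-contained.

```python
import random
import statistics

def estimate_power(pool, is_significant, sample_size,
                   n_resamples=10_000, seed=0):
    """Fraction of resamples of size `sample_size` flagged as significant.

    pool: one summary value (or record) per surrogate participant.
    is_significant: callable mapping a resample to True or False.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.choices(pool, k=sample_size)  # sampling with replacement
        if is_significant(sample):
            hits += 1
    return hits / n_resamples

def mean_clearly_positive(sample, threshold=2.0):
    """Hypothetical criterion: one-sample t statistic above `threshold`."""
    sd = statistics.stdev(sample)
    if sd == 0.0:
        return statistics.mean(sample) > 0.0
    t_stat = statistics.mean(sample) / (sd / len(sample) ** 0.5)
    return t_stat > threshold
```

A power curve is then obtained by calling `estimate_power` for each sample size from 5 to 60 in steps of 5, using per-participant quantities from the 500 surrogate participants as `pool`.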

Hypothesis Ib: model validation through unisensory correlations
The above-described Monte-Carlo approach was then used for assessing the correlations between model parameters extracted from the VPD task (s p , s v , s t ), and unisensory precisions extracted from the PJ and OL (s p ), MJ (s v ) and SJ tasks (s t ).
First of all, ground truth values for the parameters s p and s v were drawn, for 500 surrogate participants, from a distribution modelled on the parameters extracted from our pilot. Therefore, values of s p were drawn from a Gaussian of mean 5.56, while values of s v were drawn from a Gaussian of mean 5.23. The means of the randomly generated parameters were the same as those obtained in Fang's (Fang et al., 2019) VPD experiment with 22 participants, the largest similar dataset available. The standard deviation of the Gaussians was set to 2. A lower and upper cut-off of 3 and 8 were set on both values to avoid unrealistic extreme values. For s t , since it was not possible to collect pilot data, we assumed a distribution based on literature about the perception of delay in a visuomotor task. Fitting our model of the SJ on data extracted from figures in the only previous study that was well adapted to this goal (Farrer et al., 2008) indicates a value of s t of about 85 msec (see Supplementary material for details). To cover a plausible and wide range of temporal precisions, we assumed s t to be uniformly distributed in the range 50–120 msec, which is symmetrical around 85 msec. In order to be conservative, the power analysis was performed only with the longer delays (0–350 msec), which should guarantee that in the actual experiment the values of s t should be extracted with equal or greater precision than in our simulations. Finally, values of P p were uniformly distributed between .4 and 1, symmetrical around the average reported value of .7. Then, the VPD, PJ, OL, MJ and SJ tasks were simulated for each participant.
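As an illustration of the sampling scheme just described, the following is a minimal sketch in Python, not the original simulation code. The function names are hypothetical, and it assumes that the cut-offs at 3 and 8 were enforced by redrawing out-of-range values (the text does not say whether values were redrawn or clipped).

```python
import random

def truncated_gauss(rng, mean, sd, low, high):
    """Draw from a Gaussian, redrawing until the value lies in [low, high].
    Redrawing (rather than clipping) is an assumption of this sketch."""
    while True:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            return value

def draw_surrogate_parameters(n_participants=500, seed=0):
    """Ground-truth parameters for a pool of surrogate participants."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_participants):
        pool.append({
            "s_p": truncated_gauss(rng, 5.56, 2.0, 3.0, 8.0),  # proprioceptive
            "s_v": truncated_gauss(rng, 5.23, 2.0, 3.0, 8.0),  # visual
            "s_t": rng.uniform(50.0, 120.0),                   # temporal, msec
            "P_p": rng.uniform(0.4, 1.0),                      # prior
        })
    return pool

pool = draw_surrogate_parameters()
```

Each dictionary in `pool` would then serve as the ground truth from which the VPD, PJ, OL, MJ and SJ tasks are simulated.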
In order to simulate the VPD task, we simply applied the same procedure used to simulate the trials at a given amount of visuo-proprioceptive disparity, used for model fitting and described in the previous section. In order to simulate the reached position in the OL task, or the judged position in the PJ task, we added Gaussian noise of standard deviation s p to the target position for each trial. Similarly, for the MJ task, we generated 24 normally distributed values, with standard deviation s v . Finally, we used the same procedure developed for model fitting to simulate the SJ task.
After the data was generated, we extracted the unisensory parameters from the multisensory and unisensory tasks as described in the Analysis plan (Section 3.5.5). Then, we evaluated the probability of observing significant effects as a function of sample size through repeated resampling, as described previously. As a final outcome, we chose the statistical significance of the Pearson correlation between the values of s p , s v , s t obtained by fitting the BCI ST model on the multisensory task, and their unisensory counterparts from the other five tasks, with a threshold of p = .004.
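The resampling logic can be sketched as follows. This is a hedged illustration rather than the pipeline actually used: the function names are hypothetical, the toy pool at the bottom is invented for demonstration, and the significance test uses the two-sided Fisher z approximation as a stand-in for whichever exact test was applied to the Pearson correlation.

```python
import math
import random
from statistics import NormalDist

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # degenerate resample: treated as no correlation
    return sxy / math.sqrt(sxx * syy)

def fisher_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation (requires n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def estimate_power(measured, fitted, n, alpha=0.004, n_resamples=2000, seed=0):
    """Fraction of resamples of size n (drawn with replacement from the
    pool) in which the correlation is significant at level alpha."""
    rng = random.Random(seed)
    indices = range(len(measured))
    hits = 0
    for _ in range(n_resamples):
        sample = rng.choices(indices, k=n)
        x = [measured[i] for i in sample]
        y = [fitted[i] for i in sample]
        if fisher_p_value(pearson_r(x, y), n) < alpha:
            hits += 1
    return hits / n_resamples

# Toy pool of 500 surrogate participants (invented numbers)
toy_rng = random.Random(1)
measured = [toy_rng.uniform(3.0, 8.0) for _ in range(500)]
fitted = [m + toy_rng.gauss(0.0, 1.0) for m in measured]
power_small = estimate_power(measured, fitted, n=10)
power_large = estimate_power(measured, fitted, n=35)
```

Repeating the call over a grid of sample sizes gives a power curve of the kind summarized in Fig. 4a; the sketch uses 2,000 resamples for speed, whereas 10,000 were used in the text.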
This procedure allowed us to identify the major sources of uncertainty in model fitting, especially for the multisensory task, where all parameters are estimated at once, and optimize the experimental design accordingly. The final combination of spatial and temporal disparities presented in the previous sections was selected among several possible designs, as the one maximizing power while keeping the expected duration of the multisensory task below 90 min. Similarly, the number of repetitions in the unisensory tasks was defined by heuristically optimizing the trade-off between task duration and statistical power, based on the expected precision in measuring the relative model parameter as seen in Fig. 4c. As summarized in Fig. 4a, the analysis shows that a sample of 35 participants should be sufficient to obtain a power above 90% for all the three correlations with a significance threshold of .004 (92.7% with 15 participants for MJ, 94.8% with 15 participants for PJ, 96.4% with 10 participants for OL and 93.4% with 35 participants for SJ).
Fig. 4 – Power analyses. Panel (a) shows the results of the power analysis conducted by Monte-Carlo simulation for Hypothesis Ib. We plot the probability of observing a significant effect after Bonferroni correction (p < .004) as a function of the number of participants. The red, cyan, blue and grey dots denote respectively the correlations between the values of s v , s p , s t as extracted from the multisensory task (VPD) and its unisensory correspondents. The black dots indicate the estimated probability of observing a significant correlation between the slope of the VM task and the value of P p extracted by fitting in the VPD task. Probabilities for each sample size were computed over 10,000 random draws, with error bars estimating the 95% c.i. due to the sampling procedure. Panel (b) shows the average Pearson correlation value between true parameters used in simulations and fitted values, over 10,000 random draws of simulated participants. The colour-coding of different parameters is the same as in panel (a). Panel (c) shows the same average correlations between parameters extracted from unisensory tasks and true parameters used in simulations. Error bars in panels (b) and (c) denote 95% confidence intervals obtained by applying the Fisher transformation to correlation coefficients and then applying the inverse transformation to the confidence intervals on the Z scores obtained in such a way.
Finally, as no simple mathematical model can be used to describe the expected relation between the VM task and the prior on common cause in the visuo-proprioceptive disparity task (P p ), we relied on more conventional methods, based on the assumed value of the correlation. Our simulations indicate that the expected correlation between the true prior and the value fitted from the VPD task should be R ~ .777. We assumed the morphing slope value would have the same correlation with the true value of P p .
For each sample size N, we selected N surrogate participants 10,000 times and generated n random vectors with an imposed correlation coefficient of .777 with respect to the true value of P p . The obtained values were then correlated with the fitted value of P p , and a significance test was performed with α = .004. After computing the fraction of significant correlations, we found that a sample size of 35 would yield a power of 92%.
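One standard way to generate a vector with an imposed correlation to a reference vector is sketched below. It is an assumption of this sketch that the correlation is imposed exactly at the sample level (through orthogonalization of random noise); the original procedure may instead have imposed it only in expectation. All function names are hypothetical.

```python
import math
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def pearson_r(x, y):
    xc, yc = center(x), center(y)
    return dot(xc, yc) / math.sqrt(dot(xc, xc) * dot(yc, yc))

def vector_with_correlation(x, r, rng):
    """Return y whose sample Pearson correlation with x equals r.
    Needs len(x) >= 3 and a non-constant x."""
    xc = center(x)
    noise = center([rng.gauss(0.0, 1.0) for _ in x])
    # Gram-Schmidt step: remove the component of the noise along x
    proj = dot(noise, xc) / dot(xc, xc)
    perp = [e - proj * a for e, a in zip(noise, xc)]
    x_norm = math.sqrt(dot(xc, xc))
    p_norm = math.sqrt(dot(perp, perp))
    weight = math.sqrt(1.0 - r * r)
    return [r * a / x_norm + weight * e / p_norm for a, e in zip(xc, perp)]

rng = random.Random(3)
true_prior = [rng.uniform(0.4, 1.0) for _ in range(35)]  # toy P p values
proxy_slope = vector_with_correlation(true_prior, 0.777, rng)
```

The returned vector has zero mean and unit norm; since the Pearson correlation is invariant to shifts and positive rescaling, it can be mapped to any desired scale without changing the imposed correlation.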

Hypothesis II: link with subjective ownership
Concerning the hypothesis that subjective ratings should correlate with reaching biases, we based sample size estimation on the previous work by Fang (Fang et al., 2019), who performed a very similar analysis with fewer trials per participant. Assuming a true R value of this correlation to be equal to .82, a 90% power can be obtained with 10 participants.
In summary, based on the hypothesis that requires the largest sample size, we estimate that 35 participants should be sufficient for meeting our statistical power target of 90%. To be conservative, we oversampled to 40 participants.

Analysis plan
Overall, the main hypothesis of our study is the validity of the Bayesian CI model in both the spatial and temporal domains (BCI ST ) as a potential mechanism for body ownership. This hypothesis was tested in two steps: model validation (Hypothesis I) and linking the model predictions to subjective ownership (Hypothesis II).

Hypothesis Ia: model validation through model selection
We fitted the BCI ST , FF, BCI S , Heuristic and Semi-FF models on our multisensory reaching task (VPD), and then tested the BCI ST model against the competing models by evaluating the exceedance probability (original code available from: https://github.com/sjgershm/mfit). For all models, the fitting approach was similar to that of Fang and colleagues (Fang et al., 2019). For each spatial and temporal disparity, we simulated 5000 trials and maximized the likelihood of the data given the simulated model predictions, with respect to the set of fitted parameters (e.g., s v , s p , s t , P p , for the Bayesian CI model). The fitting was performed through the BADS Matlab optimization tool (https://github.com/lacerbi/bads) (Acerbi & Ma, 2017). To avoid convergence problems or poor optimization, the fitting procedure was repeated 5 times with different randomly selected starting parameters, and the fit with the highest log-likelihood was selected. No pre-processing step was performed on the data, except for the removal of systematic biases in reaching (possibly due to tracking or VR calibration). This was done by subtracting, for each participant, the mean reaching bias at zero spatial and temporal disparity.
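The multi-start strategy (several optimizations from random starting points, keeping the fit with the highest log-likelihood) can be illustrated with the sketch below. It is deliberately simplified: a crude stochastic hill climber stands in for BADS and is not equivalent to it, and a toy Gaussian likelihood stands in for the simulation-based model likelihood. All names and numbers are hypothetical.

```python
import math
import random

def gaussian_log_likelihood(params, data):
    """Toy objective: log-likelihood of data under a Gaussian (mu, sigma)."""
    mu, sigma = params
    if sigma <= 0:
        return -math.inf
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const - (x - mu) ** 2 / (2.0 * sigma ** 2) for x in data)

def local_search(objective, start, rng, step=1.0, n_iter=2000):
    """Stochastic hill climbing from one starting point (stand-in for BADS)."""
    best, best_value = list(start), objective(start)
    for _ in range(n_iter):
        candidate = [p + rng.gauss(0.0, step) for p in best]
        value = objective(candidate)
        if value > best_value:
            best, best_value = candidate, value
        else:
            step *= 0.995  # shrink the proposal width after a failed move
    return best, best_value

def multi_start_fit(objective, bounds, n_starts=5, seed=0):
    """Run the local search from n_starts random points within bounds
    and keep the run with the highest log-likelihood."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_starts):
        start = [rng.uniform(low, high) for low, high in bounds]
        runs.append(local_search(objective, start, rng))
    return max(runs, key=lambda run: run[1])

data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.5) for _ in range(200)]
best_params, best_ll = multi_start_fit(
    lambda p: gaussian_log_likelihood(p, data),
    bounds=[(-10.0, 10.0), (0.5, 10.0)],
)
```

Selecting the best of several runs guards against a single run stopping in a poor local optimum, which is the purpose of the 5 repetitions described above.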

Hypothesis Ib: model validation through unisensory correlations
As an alternative model validation, we proposed to compare individual parameters extracted from the multisensory task through the BCI ST model with their unisensory correspondents measured through independent tasks. Such tasks were designed to match the VPD setup, to capture the relevant unisensory components as accurately as possible. However, this independent measure necessarily implies some variations, and a strict 1 to 1 correspondence between the measured and the fitted parameters cannot be guaranteed. Still, a correlation between the measured unisensory and the fitted parameters is to be expected if the multisensory task relies on the proposed Bayesian CI process. Therefore, we assessed whether the parameters extracted from the unisensory tasks positively correlate with the ones extracted from the multisensory task by performing significance tests on Pearson correlation scores. Regarding proprioception, the correlation of the s p fitted by the model with the proprioceptive precision extracted from either of the two proprioceptive tasks (PJ and OL) was considered valid. The detailed extraction of unisensory parameters is described below (Section 3.5.5).

Hypothesis II: link with subjective ownership
Finally, we compared model predictions about P com with subjective ratings about ownership. We fitted a bivariate Gaussian to the average subjective ratings as a function of spatial and temporal disparity, in order to extract the values of the tolerated spatial and temporal disparities as the standard deviations of the fitted Gaussian. Those values were correlated with the values obtained by performing the same fit on P com values extracted from model fitting on the reaching task. To obtain P com values for each subject, we simulated 5000 trials for each spatio-temporal disparity, using the parameters fitted from the VPD task, and took the average value of P com , similarly to what was done in Fig. 2d. Since negative temporal disparities cannot be sampled, we mirrored the ratings and P com values symmetrically with respect to the spatial axis, so as to be able to perform the fits. We then performed significance tests on Pearson correlation scores. The Gaussian fit has six free parameters: the spatial and temporal standard deviation (our parameters of interest), the spatial and temporal means, a normalization constant and a global offset.
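For concreteness, one functional form compatible with the six free parameters listed above is an axis-aligned Gaussian with an additive offset. The symbols below are introduced here for illustration only, and the absence of a spatio-temporal correlation term is an assumption inferred from the parameter count, not a statement of the exact function that was fitted.

```latex
f(\Delta x, \Delta t) \;=\; A \,
  \exp\!\left[ -\frac{(\Delta x - \mu_x)^2}{2\,\sigma_x^2}
               -\frac{(\Delta t - \mu_t)^2}{2\,\sigma_t^2} \right] + c
```

Here \Delta x and \Delta t are the spatial and temporal disparities, \sigma_x and \sigma_t are the tolerated disparities of interest, \mu_x and \mu_t are the means, A is the normalization constant and c is the global offset. Under the mirroring described above, each value measured at \Delta t > 0 is also assigned to -\Delta t before fitting.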

Interpretation of possible outcomes
Model validation (Hypothesis I) was considered achieved if we could demonstrate the presence of optimality by confirming Hypothesis Ia or Hypothesis Ib. Model selection (Hypothesis Ia) was considered successful if the BCI ST outperformed all the alternative models (exceedance probability >.95 against FF, BCI S , Heuristic, semi-FF). Regarding the unisensory correlations (Hypothesis Ib), a significant correlation, after Bonferroni correction (5 parameters, p < .01), between at least one of the unisensory estimates and the corresponding model parameter was considered sufficient. Indeed, despite our careful planning, it is possible that one or more unisensory tasks do not measure the appropriate unisensory function involved in the putative Bayesian CI process taking place in the multisensory task. On the contrary, if the BCI ST is not valid, the chances of a significant correlation between the independent estimates of a unisensory precision and the corresponding model parameter are extremely low. In any case, even if all of the unisensory tasks fail to capture the relevant unisensory component, model selection should still be able to demonstrate the validity of the proposed model, as it is performed purely on the multisensory task.
Hypothesis II was accepted if the correlation described in Section 3.5.3 was significant.

Extraction of unisensory parameters
Regarding the OL task, where the source of variability in reaching movements is proprioceptive-motor noise, we expect:
$$x_r \sim \mathcal{N}\left(x_t,\; s_p^2\right)$$
where x r denotes the reached position, and x t the target position. Then, in order to extract s p , we simply need to fit x t ~ x r , and extract the root mean square error of the fit. The same reasoning holds for the PJ task, by replacing reached positions with individual proprioceptive judgements. In order to control for possible active movements of the hand during the task, trials in which the participant's hand showed a displacement >5 mm [i.e., a value which takes into account the reported precision of the motion tracking system (Jost et al., 2019)] were discarded from the analysis.
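The extraction of s p as a root mean square error can be illustrated on simulated data. The sketch below is a minimal example under stated assumptions: responses are regressed on targets by ordinary least squares (the direction of the fit is our reading of the text), the RMSE is computed with a denominator of n, and the target layout, trial count and noise level are invented for demonstration.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def estimate_noise_sd(targets, responses):
    """Root mean square error of the linear fit, used as the noise estimate."""
    slope, intercept = fit_line(targets, responses)
    residuals = [r - (slope * t + intercept) for t, r in zip(targets, responses)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

# Toy simulation: 7 equally spaced targets, 100 repetitions, true noise of 5
rng = random.Random(7)
true_sd = 5.0
targets = [-45.0 + 15.0 * k for k in range(7)] * 100
responses = [t + rng.gauss(0.0, true_sd) for t in targets]
estimate = estimate_noise_sd(targets, responses)
```

With many more trials than a real session would contain, the estimate lands close to the simulated value of 5; with realistic trial counts the estimate is noisier, which is what the power analysis quantifies.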
Similarly, in our MJ task, we expect:
$$x_j \sim \mathcal{N}\left(x_m,\; s_v^2\right)$$
where x j denotes the judged midline position, and x m the true midline position. Then again, s v can be simply extracted as the root mean square error of midline judgements. The analyses are slightly more complicated for the extraction of s t . Participants do not directly report judgements about timing, as this would be hard to do practically and possibly introduce cognitive biases, but instead they express judgements about simultaneity. In a Bayesian framework, this is best described as another causal inference process. The causal inference equations are very similar to the ones described for inferring P com , the main difference being that they extend only to the temporal domain. P sim is the inferred probability that the motor command and the observed movement are simultaneous given a perceived amount of delay. Assuming that participants report stimuli to be simultaneous when P sim is larger than a given threshold b, the expression for P sim can be used to predict the shape of the psychometric curve obtained in our simultaneity judgement. Then the value of s t can be recovered by fitting through a process similar to the one used in our multisensory task. It is important to note that, as verified by simulations, the fitted value for s t depends only on the slope of the psychometric curve and is not affected by the values of the prior on simultaneity or the response criterion (see Supplementary material).
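The general structure of such a temporal causal inference can be written with Bayes' rule for two competing hypotheses, as sketched below. This is only the generic form: the symbols \hat{d} (perceived delay) and P_{prior} (prior on simultaneity) are introduced here for illustration, and the specific likelihood functions of the model, through which s t enters, are not reproduced.

```latex
P_{sim} \;=\;
  \frac{p(\hat{d} \mid \mathrm{sim})\, P_{prior}}
       {p(\hat{d} \mid \mathrm{sim})\, P_{prior}
        + p(\hat{d} \mid \mathrm{not\ sim})\,\bigl(1 - P_{prior}\bigr)}
```

Under the decision rule stated above, a "simultaneous" response is predicted whenever P_{sim} exceeds the threshold b, so the predicted proportion of "simultaneous" responses at each physical delay is the probability that the noisy perceived delay falls in the region where this condition holds.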
Finally, since the VM task cannot be connected mathematically to the Bayesian CI model in a straightforward manner, we chose an empirical criterion for assessing its impact. We performed a logistic fit on the judgements expressed as a function of the percentage of morphing according to the following model:
$$\ln\left(\frac{p_y}{1-p_y}\right) = b_0 + b_1 x + \varepsilon$$
where p y denotes the probability of replying "yes" in the VM task, x denotes the morphing percentage (with 0 meaning 0% self, and 100 meaning 100% self), b 0 and b 1 are fit parameters and ε is the error term. Then, the subjective equivalence point is given by −b 0 /b 1 , and the slope at that point by b 1 /4. We used such value of the slope as a proxy of accuracy in visually discriminating one's own hand. In principle, participants are expected to be stricter in embodying the virtual hand, which does not look like their own. We therefore expect a significant positive correlation between values of P p and slopes of the psychometric function.
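The expressions for the subjective equivalence point and for the slope follow directly from the logistic function, as the short derivation below shows (the error term is omitted, and x_{PSE} is a label introduced here for the subjective equivalence point).

```latex
p_y(x) = \frac{1}{1 + e^{-(b_0 + b_1 x)}}
\qquad\Longrightarrow\qquad
p_y(x_{PSE}) = \tfrac{1}{2}
\;\Longleftrightarrow\;
b_0 + b_1 x_{PSE} = 0
\;\Longleftrightarrow\;
x_{PSE} = -\frac{b_0}{b_1}

\frac{dp_y}{dx} = b_1\, p_y\,(1 - p_y)
\qquad\Longrightarrow\qquad
\left.\frac{dp_y}{dx}\right|_{x = x_{PSE}}
  = b_1 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \frac{b_1}{4}
```

A steeper psychometric function (larger b 1 ) therefore corresponds to a sharper discrimination between one's own hand and the morphed hand.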

Timeline
Pending limitations deriving from the COVID-19 pandemic, data collection can be initiated immediately upon Stage 1 acceptance in principle, and the study can be completed within 4 months since then. Approximately one month will be needed for data collection, two months for data analysis and one month for writing and editing.

Pilot data
To confirm the possibility of translating the task developed by Fang and colleagues to immersive virtual reality, and test our model fitting procedure, we collected pilot data from 10 healthy participants (4 females, aged 24.1 ± 2.4 years, age range 21–29). The task was a close replication of Fang's task (including the spatial, but not the temporal manipulation of visuo-proprioceptive disparity). Besides the experimental design in terms of spatial (angular) disparities, the task was the same as described in Section 2.3.3. For each participant, three experimental blocks with slightly different designs were collected. In the first two blocks, we collected a total of 49 trials with 7 disparities (at 0°, ±13.3°, ±26.6° or ±40°). Each disparity was presented 7 times, each time with a different target out of 7 equally spaced targets between −45° and 45°. Only 5 repetitions per disparity were collected in the third block, with 5 targets equally spaced on the same range. Target positions and spatial disparities were randomized within each block. The results of the reaching task, shown in Fig. 5a, are qualitatively in line with what was reported by Fang, confirming that the experimental setup can be successfully exported to an immersive virtual reality environment. We then tested whether our data were well modelled by a Bayesian CI model (BCI S ) by applying a fitting procedure very similar to the one proposed by Fang and colleagues to extract model parameters (see Analysis plan for details). As done by Fang and colleagues (Fang et al., 2019), we then compared the BCI S model to the FF model predictions (fixed weights to vision and proprioception at all disparities).
We used each model's Bayesian information criterion as an approximation of model evidence, and computed the models' exceedance probability (Wozny et al., 2010) (original code available from: https://github.com/sjgershm/mfit). We found this analysis to favour the BCI S model with an exceedance probability = .894, in line with the value reported by Fang (Fang et al., 2019) in the second experiment with 8 human participants (.954), showing that data from our VPD task is also quantitatively in line with this previous study. We then used the distribution of the extracted parameters as a basis for our power analysis.
We also performed a smaller pilot study on two healthy participants (2 males, aged 24 and 28 years) to test the practical feasibility of combining both spatial and temporal disparities during the task, as this had never been done before (Fig. 5b and c). The experimental design was exactly as described in the methods, except that, for temporal delays larger than zero, 6 disparities were tested, uniformly distributed between −33.3° and 33.3° instead of −40°/40°. The design was changed as further simulations showed the proposed design to be slightly better in terms of statistical power. Both participants were able to execute the task correctly, and the effect of temporal delay is present in both participants in line with our expectations. We used these pilot data to test our analysis pipeline and parameter extraction via model fitting. The fit converged for both participants and accurately modelled the data (R² = .940 and .905 respectively), yielding values of the parameters in line with our expectations (S01: s v = 7.1, s p = 4.63, s t = .118, P p = .938; S02: s v = 9.64, s p = 5.98, s t = .101, P p = .826).

Results
As planned, data from 40 healthy participants was collected for our study. Due to technical issues during data collection (failure of the morphing algorithm), data for the VM task was not collected for two participants. According to the preregistered criteria, these participants were still included in the analyses involving all the other tasks.

Hypothesis I
Fig. 5 – Pilot data for a purely spatial (a) and spatio-temporal (b, c) disparity setup. Panel (a) shows the results from a larger pilot study on 10 participants with the VPD task, with no temporal delay (as in Fang et al., 2019). The x axis indicates visuo-proprioceptive disparities, defined as the virtual hand angle minus the real hand angle (positive angles being on the right). The y axis indicates the proprioceptive drift defined as the target's angle minus the real hand's angle (a participant reaching left of a target experiences a proprioceptive drift towards the right, and vice versa). The blue and red dashed lines represent the expected drift in the case of a purely proprioceptive or visual dominance, respectively. The grey dashed line represents the predicted drift from a FF model of visual-proprioceptive integration, while the black dashed line shows the averaged predictions of the BCI S model, in close agreement with averaged experimental results represented by the green solid line. Panels (b) and (c) show results for two pilot participants from the new spatio-temporal disparity setup. Solid lines represent conditional means, and the colour code represents the different temporal delays tested (T disparity; as in Fig. 1b). As expected, drift values increased at increasing temporal delays.
Hypothesis I consisted in the validation of a Bayesian CI model describing participants' behaviour in a multisensory reaching task under visuo-proprioceptive disparity (VPD) as the result of the optimal integration of visual and proprioceptive information in both the spatial and temporal domain (BCI ST ). We aimed at demonstrating our hypothesis through two approaches. All models yielded good fits for reaching errors (R²: BCI ST, .901 ± .049 SD; Heuristic, .9 ± .053 SD; Semi-FF, .902 ± .052 SD; FF, .895 ± .053 SD). However, when computing the exceedance probability (EP), the BCI ST outperformed the FF (BCI ST = .999, FF = .001), BCI S (BCI ST = .984, BCI S = .016), and heuristic model (BCI ST = .984, Heuristic = .016), but not the Semi-FF, which instead slightly outperformed the BCI ST model (BCI ST = .04, Semi-FF = .96). At the single subject level, amongst our 40 subjects, the BCI ST model outperformed the FF in 35 participants, the BCI S in 26 participants, the Heuristic in 26 participants and the Semi-FF in 16 participants. We show the average fits for all models, as well as for two exemplary participants, in Fig. 6.
Regarding the unisensory correlations (Fig. 7b), the fitted proprioceptive precision (s p ) significantly correlated with the proprioceptive precision, as measured by both the PJ task (r = .43, p = .005, p = .027 after Bonferroni correction) and the OL (r = .43, p = .006, p = .031 after Bonferroni correction); in contrast, neither the fitted visual (s v ) nor the fitted temporal (s t ) precisions correlated with the corresponding unisensory precisions measured by the MJ (r = .26, p = .1 uncorrected) and SJ task (r = .18, p = .26 uncorrected) respectively. No significant correlation was found between P p and the ability to discriminate the higher order visual features of the hand, as measured by the VM task (r = .041, p = .8 uncorrected). These results were replicated with a different choice of the flat priors in the BCI ST model (see Supplementary material).
In summary, Hypothesis Ib can be considered satisfied, and, according to the pre-registered analysis plan, Hypothesis I can therefore be accepted despite Hypothesis Ia not being fully satisfied.
Fig. 6 – Model comparison. Panel (a) shows the average reaching errors in solid lines, and the corresponding model fits in dashed lines, for all the 5 models we compared. Note that the fitting is based on the full data distribution, taking into account not only the mean reaching error but the dispersion of individual trials around the mean. Hence, the best fits may not be the ones best approximating the mean, probably leading to small discrepancies between fitted and actual data. Panels (b) and (c) show the full data and model distribution for the three key models (BCI ST , Heuristic, Semi-FF), for two exemplary subjects at zero temporal disparity. In the first (b), the BCI ST outperformed all alternative models; in the second (c), it outperformed the heuristic but not the Semi-FF. The fitted data density is encoded in the colour, and its average in the dashed black line. The solid black line represents the data average and red dots individual trials. As visible in the plots, the BCI ST and Semi-FF model mainly differ in the transition between low-disparity visual dominance and high-disparity proprioceptive dominance, which is much sharper in the BCI ST and leads to an almost bimodal distribution (see also Fig. 3d). In participants in which these subtle differences are not observable, the Semi-FF model ends up outperforming the BCI ST due to the greater flexibility granted by its fully independent parameters.

Hypothesis II
With Hypothesis II, we aimed at demonstrating that reaching errors in the VPD task, as modelled by our BCI model, truly reflect the subjective experience of body ownership. To do so, we extracted from reaching errors the probability of ownership as a function of both spatial and temporal disparity and computed the spatio-temporal extent of the tolerated window for implicit ownership by fitting a Gaussian function to P com values. We hypothesized that the size of such implicit ownership window would positively correlate with the size of the explicit ownership window, extracted through the same method from explicit ownership ratings.
We found that neither the size of the spatial, nor the one of the temporal windows for implicit ownership were correlated with their explicit counterpart (respectively: R = .07, p = .67; R = −.26, p = .098, Fig. 7c). Therefore, Hypothesis II cannot be considered to be satisfied according to the pre-registered analysis plan.
However, when investigating in detail the different steps of the analysis with empirical data, we found a computational issue in the parameters used for Gaussian fits on P com values, which yielded noisy fitted values (see Section 6.3.2 for details). We hypothesized that such fitting instability might explain the observed lack of the correlation predicted by Hypothesis II (see Supplementary material for details). Therefore, we performed the conceptual replication of Hypothesis II after resolving such technical issue, as detailed in the exploratory analyses.
Fig. 7 – Pre-registered analyses. (a) Shows the average proprioceptive drift (left plot), subjective ownership ratings (centre plot), and estimated P com (right plot) for the VPD task in 40 participants. The colour code represents the different temporal delays tested. Dashed lines represent model fits. (b) Shows the correlation between the unisensory precisions fitted by the BCI ST model from reaching errors at the multisensory task (VPD) and those measured by the unisensory tasks, with each plot representing a different unisensory task. (c) Shows the correlation between the implicit (as extracted from reaching errors) and explicit (as extracted from ownership ratings) window for ownership. The top plot represents the spatial window, the bottom plot the temporal window.

Relationship between unisensory precisions and body ownership
Model selection showed that the BCI ST performed worse than the Semi-FF (see Section 6.1). Conceptually, the main difference between these two models is that the first assumes the spatial and temporal binding windows to be optimised on unisensory precisions, while in the latter these are considered independent. To assess more directly the existence of a relationship between unisensory precisions and the ownership window, we investigated the correlation between the width of P com extracted by the BCI ST in both the spatial and temporal dimensions and unisensory precisions extracted from the unisensory tasks. We found a positive correlation between s p measured by the PJ task and s Pcom in the spatial domain (R = .33, p = .035, Fig. 8a). This is directly visible when splitting reaching errors as a function of disparity between high and low s p subjects (Fig. 8b). In contrast, s Pcom did not correlate with the s p measured by the OL in the spatial domain (R = .13, p = .43) nor with s t measured by the SJ in the temporal domain (R = .043, p = .79). Another interesting model prediction is that natural hand ownership, which in our setup is simulated by ratings at zero spatio-temporal disparity, should negatively correlate with proprioceptive precision. This may be especially relevant to the study of ownership alterations in patients with somatosensory deficits. Indeed, a less precise proprioception leads to higher visuo-proprioceptive incongruences in neural encoding, and less certainty that the hand is one's own even when the underlying physical stimuli are aligned. This was indeed the case in our data, where a significant negative correlation between s p (PJ) and ownership ratings at zero disparity was observed (R = −.52, p = .0006, Fig. 8c).

Hypothesis II analyses with improved fit quality
As mentioned in Section 6.2, we identified a technical issue leading to poor quality of the Gaussian fits on P com values, which is a crucial step for the validation of Hypothesis II. Essentially, the issue derived from the presence of three unnecessary free parameters in the fit of P com values: the spatial and temporal location of the peak of the Gaussian, and a constant offset added to the Gaussian function. These parameters are necessary to fit explicit ratings, which could in principle peak at non-zero spatio-temporal disparity and may not reach zero even at very large disparities due to interindividual differences. For simplicity, they were left free also when fitting P com values, although the model used to compute the P com values in the first place imposes per se these parameters to be zero. As detailed in the Supplementary material, this led to extremely poor quality of P com fits, as in several subjects the fitted values largely varied when repeating the fits with different starting parameters and could be therefore considered almost meaningless. The issue is completely resolved by fixing the unnecessary free parameters to zero. In addition, we discovered that the fitting quality could be further improved by increasing the iterations of the BCI ST fit to 100,000 (see Supplementary material).
We performed further analyses after implementing such improvements in the fitting procedures. First, we confirmed that all results from Hypothesis I are left virtually unchanged by the modifications (see Supplementary material). Then, we tested whether the failure to confirm Hypothesis II could be determined mainly by such computational issue, by replicating the key correlation analysis, using the improved fits as an input. Indeed, since the modifications to the analysis procedure only improve the numerical quality of the fit, the conceptual and statistical meaning of the analysis is left unchanged.
Fig. 8 – […] show the average proprioceptive drift in each of the tested delays for lower (dark blue) and higher (light blue) proprioceptive precision measured by the PJ task. Panel (c) shows the correlation between average ownership ratings at zero disparity and proprioceptive precision (PJ).
Fig. 9 – Hypothesis II revised. Panel (a) shows the correlation between the implicit and explicit window for ownership, after resolving fitting issues. The left plot represents the spatial window, the right plot the temporal window. Panel (b) shows the correlation between residual drift and residual ownership measures, with each subplot corresponding to one of the 40 participants. Panel (c) shows individual Pearson correlation coefficients between implicit and explicit ownership ratings. Subjects in red show a significant correlation at p < .05. Panel (d) illustrates the result on raw data in a paradigmatic subject. The solid blue line indicates the average drift for each disparity and dots the individual reaching errors. Therefore, trials between the line and zero have negative residual drift (more proprioceptive weight), and vice versa. The colour of the dots indicates the ownership rating (1–10) associated to that trial. Higher ownership ratings are associated with more weight attributed to the virtual hand.
We found that, in the spatial domain, there was a significant (R = .32, p = .041, Fig. 9a, left) and positive correlation between the explicit and implicit window for ownership, as expected according to Hypothesis II. Therefore, participants who integrated the virtual hand in their reaching movement up to larger spatial disparities also reported higher levels of ownership up to larger spatial disparities.
Surprisingly, we found a significant negative correlation (R = −.35, p = .029, Fig. 9a, right) in the temporal domain. This result is puzzling and counterintuitive, and may depend on computational artifacts in the fitting, or on other uncontrolled factors. Indeed, the overall effect of the temporal manipulation on reaching errors is small compared to that of the spatial manipulation, leading to less reliable estimates of the dependence of P_com on temporal disparities. This is reflected by the large and unrealistic fitted values for the temporal window, as well as by the lack of correlation between fitted and measured σ_t. Therefore, the negative correlation should be interpreted with caution.

Trial-by-trial correlation between ownership and reaching errors
Since both pre-registered and exploratory analyses addressing Hypothesis II yielded mixed results, we decided to further explore the link between subjective ratings and reaching errors at the individual trial level, which was not possible with the experimental design used by Fang et al., who collected explicit ownership ratings only by blocks. This way, we aimed to test the conceptual question of Hypothesis II in a way that is less affected by computational issues of numerical stability and can benefit from the higher statistical power of conducting analyses at the individual trial level. If reaching errors truly reflect the subjective ownership feeling at the implicit level, we would expect that trials with higher explicit ownership would be associated with a higher weight attributed to the virtual hand even at fixed spatio-temporal disparity. To test this hypothesis, we computed "residual" explicit ownership ratings at the individual level, by subtracting from each rating the average ownership rating at the same spatio-temporal disparity. Similarly, residual drift (i.e., putative implicit ownership) values were obtained by subtracting from each reaching error its average value at the same disparity. In order for positive residual drift values to always indicate a larger visual weight and vice versa, residual drift values were multiplied by the sign of the spatial disparity.
Zero disparity values were excluded as they yield no meaningful information in this analysis. Then, we tested whether residual drift and ownership values were correlated by means of a linear mixed model. Regardless of the structure of random effects used, we found a strong effect of residual drift values on residual ownership values (p < .001). Indeed, a correlation between residual ownership and drift values can be observed in almost all subjects individually, with 31 out of 40 participants showing a significant correlation at p < .05 (see Fig. 9b–d). The average correlation coefficient was .381 ± .032 S.E., significantly larger than 0 at p < .001. This result strongly suggests that individual reaching errors reflect subjective ownership for the virtual hand as reported by the participant within the same trial.
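The residual computation described above can be sketched for a single participant as follows. This is a minimal illustration and not the authors' analysis code: the function name `residual_correlation`, the array layout, and the sign convention (reaching errors measured as positive in the direction of positive disparity) are assumptions, and a plain per-participant Pearson coefficient stands in for the linear mixed model used at the group level.

```python
import numpy as np

def residual_correlation(disparity, delay, reach_error, ownership):
    """Pearson r between residual drift and residual ownership (one participant).

    Hypothetical helper: all inputs are 1-D sequences with one entry per trial.
    """
    disparity = np.asarray(disparity, dtype=float)
    delay = np.asarray(delay, dtype=float)
    reach_error = np.asarray(reach_error, dtype=float)
    ownership = np.asarray(ownership, dtype=float)

    res_drift = np.zeros_like(reach_error)
    res_own = np.zeros_like(ownership)

    # Subtract the condition mean within each spatio-temporal disparity cell.
    for d, t in set(zip(disparity.tolist(), delay.tolist())):
        mask = (disparity == d) & (delay == t)
        res_drift[mask] = reach_error[mask] - reach_error[mask].mean()
        res_own[mask] = ownership[mask] - ownership[mask].mean()

    # Flip the sign so that positive residual drift always means a larger
    # shift towards the virtual hand (assumed sign convention, see lead-in).
    res_drift *= np.sign(disparity)

    # Zero-disparity trials carry no information for this analysis.
    keep = disparity != 0
    return np.corrcoef(res_drift[keep], res_own[keep])[0, 1]
```

Because the condition means are removed before correlating, any relationship that survives cannot be driven by the average effect of disparity on either measure.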

Discussion
In this work, we aimed at rigorously assessing whether the emergence of body ownership from spatial and temporal visuo-proprioceptive congruency carries the signature of optimal Bayesian inference. To do so, we measured the proprioceptive drift induced in reaching movements by a spatiotemporally incongruent virtual hand and compared it to predictions of a Bayesian Causal Inference (BCI_ST) model. The validation of the model was organised around two pre-registered hypotheses. With Hypothesis I, we aimed at demonstrating not only that the model fits the experimental data well, but also that the data show compelling evidence of Bayesian processing. This was done by comparing the model with appropriate non-Bayesian alternative models and by directly comparing the unisensory precisions fitted from the multisensory reaching task (VPD) to independent measures of unisensory precisions. With Hypothesis II, we aimed at demonstrating that the reaching errors induced by the virtual hand can be considered as an index of implicit ownership and reflect subjective ratings of ownership (explicit ownership), by assessing the relationship between these two measures.
Hypothesis I was accepted according to the pre-registered analysis plan. A significant correlation was observed between the proprioceptive precision fitted from the VPD task and that measured through the unisensory static (PJ) and dynamic (OL) tasks. While there is a significant overlap between the VPD and OL tasks (both involving active reaching movements), the static PJ task is orthogonal to the VPD task and provides a more compelling signature of Bayesian optimality. No significant correlation was found between measured and fitted temporal precision (σ_t), visual precision (σ_v) and common cause prior (P_p). Model comparison also favoured the BCI_ST model, which outperformed all but one of the alternative models, as we will discuss in more detail later.
Hypothesis II, instead, was not validated according to our pre-registered analysis plan, as no significant correlation was observed between sizes of the spatial or temporal windows for implicit and explicit ownership. Nevertheless, detailed investigation of the fitting procedures involved in the analysis with the full sample of empirical data revealed a purely computational issue which affected the quality of the results. After resolving it, a significant positive correlation emerged between the sizes of the spatial windows for implicit and explicit ownership, confirming the results by Fang and colleagues (Fang et al., 2019). Surprisingly, a negative and significant correlation emerged in the temporal domain. Since the temporal manipulation was only partially successful and led to noisy estimates of σ_t, we believe this result may be due to numerical artifacts and should be taken with caution. To further investigate the question raised in Hypothesis II, we performed an alternative exploratory analysis at the single trial level, revealing a strong correlation between variability in ownership ratings and reaching errors. Trials in which the virtual hand biased the reaching movement more strongly than on average were associated with ownership ratings higher than the average.
Such correlation was significant not only at the group level, but also at the single subject level for 31 out of 40 (77.5%) participants.
In sum, our results suggest that visuo-proprioceptive integration and the sense of body ownership both carry the signature of Bayesian, uncertainty-based processing, and can be well described by a Bayesian Causal Inference model.
It is worth noting that the analysis plan was designed to detect any signature of Bayesian processing, with different alternative outcomes being considered sufficient for validating Hypothesis I. This is already an important advancement with respect to most previous studies, whose experimental designs did not allow such a feature to be rigorously demonstrated. Still, when all the pre-registered analyses are strictly followed, not all experimental results were in line with the initial hypotheses. Since unsuccessful analyses can be as informative as successful ones, we will discuss these limitations extensively, in line with the philosophy of pre-registered reports.
First, three out of the five expected correlations between the fitted and measured parameters were not significant. The simplest explanation is that the unisensory tasks do not capture model parameters or do so with insufficient precision to reach the necessary statistical power. In our view, this is the most likely explanation for the absence of correlation between the fitted and measured σ_v and P_p. Indeed, although both parameters routinely appear in similar studies (Fang et al., 2019; Körding et al., 2007; Samad et al., 2015), they are conceptually difficult to measure independently, which is arguably a common limitation in Bayesian models of brain function. In contrast, the definition of temporal precision within the context of our experimental task seems less ambiguous, and the absence of correlation is more surprising. Indeed, the comparison with the purely spatial BCI model shows that the temporal manipulation does affect behaviour. However, this modulation is weaker than expected (likely also affecting the stability of the fit of the underlying parameter, see Fig. S6) and not coherent with model predictions. Possibly, this is because the brain may optimise behaviour in a task-dependent way (Limanowski & Friston, 2020), an aspect which was not included in our model. Since the focus of the task is to maximise spatial accuracy, while attentional and computational resources are limited, spatial parameters may be optimised more accurately than temporal ones.
Another key aspect in Hypothesis I is that model selection showed that the BCI_ST model is outperformed by the Semi-FF model, in which unisensory precisions determine the relative weight of vision and proprioception, but not the size of the spatial and temporal windows of integration. This may seem surprising, and contradictory to the presence of correlations between measured and fitted unisensory precisions. Nevertheless, it is important to note that the Semi-FF model was constructed post-hoc with the sole purpose of providing the best fit of already known empirical data. This way, it benefits from the absence of any theoretical constraint, but for the same reason it lacks any deeper insight into the underlying mechanism. More conceptually, Bayesian approaches to brain function start from the hypothesis that the brain approximates optimal inference, and arguably share the limitation that they do not provide predictions about how good the approximation will be. Whether an ad-hoc model can outperform a Bayesian model is therefore more an empirical question than a conceptual one. Thus, while this does not undermine the Bayesian brain hypothesis conceptually, it may limit its practical utility. On the one hand, the Semi-FF model cannot account for the observed correlations between unisensory and fitted parameters, suggesting that it does not provide the most exhaustive description of brain function. On the other hand, the fact that a non-Bayesian model can outperform a reasonably sophisticated Bayesian model confirms the difficulty of accurately reproducing subtle aspects of behaviour within the Bayesian framework (such as those visible in Figs. 3d and 6b–c). This confirms the need to refine current models through rigorously paired theoretical and experimental research.
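As a reminder of what "unisensory precisions determine the relative weight of vision and proprioception" means in practice, the standard reliability-weighted fusion rule (in the form popularised by Ernst & Banks) can be written as below. This is a generic textbook sketch and not necessarily the exact parametrisation of the models fitted here; the symbols x_v and x_p (visual and proprioceptive position signals), σ_p (proprioceptive noise) and w_v (visual weight) are introduced for illustration only.

```latex
\hat{x} = w_v \, x_v + (1 - w_v) \, x_p ,
\qquad
w_v = \frac{1/\sigma_v^{2}}{1/\sigma_v^{2} + 1/\sigma_p^{2}}
    = \frac{\sigma_p^{2}}{\sigma_v^{2} + \sigma_p^{2}} .
```

Under this rule, the less noisy modality receives the larger weight. In a full causal inference model the same noise parameters additionally shape the probability of a common cause, and hence the width of the integration windows, which is the dependence that the Semi-FF model drops.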
Coming back to Hypothesis II, taken together our results suggest that reaching errors induced by visuo-proprioceptive disparity do reflect subjective ownership feelings. In this sense, the last exploratory analysis (shown in Fig. 9) is especially interesting, as it shows a strong connection between random fluctuations of proprioceptive drift and subjective ownership. This is fully in line with a Bayesian account of body ownership, in which noisy and constantly fluctuating sensory information leads to subjective ownership through an online inference process. This analysis is also relevant for the recent debate about demand characteristics of tasks used to assess body ownership (Lush et al., 2020). While it is possible to imagine that subjects may provide lower ownership ratings in the presence of incongruence due to demand characteristics, our findings directly link the residual reaching errors and ownership ratings, after removing the average main effect of disparity. Obtaining these quantities requires a complex mathematical computation; they are therefore cognitively inaccessible and arguably immune to demand characteristics. Furthermore, compared to explicit paradigms such as the RHI, our VR adaptation of Fang's task makes it easy to collect a greater number of quantitative data points while parametrically exploring several modulating factors.
Our paradigm does not directly apply the causal inference framework to subjective ownership ratings. This is instead the approach used by a recent work (Chancel et al., 2022). Their main finding is that trial-by-trial variations of noise in visual information are taken into account in rating ownership in a rubber hand setup, since the tolerance to temporal delays in visuo-tactile stimulation increases with levels of sensory noise rather than being constant. This is a strong hint that uncertainty-based inference is taking place. However, these results cannot be taken as a quantitative demonstration of causal inference in body ownership, since the effect of sensory noise on unisensory precision is not independently quantified [as e.g., in Ernst & Banks (2002b)] and could not be compared to model fits.

Data availability plan
Compiled code for the experimental paradigms, custom analysis code, model fit outputs and all the data collected for this study are available from: https://osf.io/azh8p/?view_only=abdd1b1d33c24794b048847432374720. For further information on the code contact the first author (TB). The stage 1 accepted protocol can be found at: https://osf.io/bzyge.

Ethical approval plan
Participants were asked to sign an informed consent form prior to starting the experiment. All experimental procedures were approved by the Ethical Committee of Human Research of the Vaud canton (CER-VD, project identifier: 2017-01588), Switzerland, and were run in accordance with the ethical guidelines of the ethical committee and the Declaration of Helsinki. Participants were recruited using the online platform SonaSystem of the University of Lausanne (https://epflunil.sona-systems.com) and compensated 20 Swiss Francs per hour for their time.

Open practices
The study in this article earned Open Data, Open Material and Preregistered badges for transparent practices. The data and material used for this study are available at: https://osf.io/azh8p/?view_only=abdd1b1d33c24794b048847432374720 and the preregistered study is available at: https://osf.io/bzyge.