Knowing me, knowing you: Interpersonal similarity improves predictive accuracy and reduces attributions of harmful intent

To benefit from social interactions, people need to predict how their social partners will behave. Such predictions arise through integrating prior expectations with evidence from observations, but where the priors come from and whether they influence the integration into beliefs about a social partner is not clear. Furthermore, this process can be affected by factors such as paranoia, in which the tendency to form biased impressions of others is common. Using a modified social value orientation (SVO) task in a large online sample (n = 697), we showed that participants used a Bayesian inference process to learn about partners, with priors that were based on their own preferences. Paranoia was associated with preferences for earning more than a partner and less flexible beliefs regarding a partner's social preferences. Alignment between the preferences of participants and their partners was associated with better predictions and with reduced attributions of harmful intent to partners. Together, our data and model expand upon theories of interpersonal relationships by demonstrating how dyadic similarity mechanistically influences social interaction by generating more accurate predictions and less threatening impressions.


Introduction
How do people learn about the properties or preferences of other people in the world? Generically, they start with prior expectations and update them in the light of observations (Chater, Tenenbaum, & Yuille, 2006; Plitt & Giocomo, 2021; Vilares & Kording, 2011). When lacking specific information about others, a natural and readily accessible source of prior expectations for people is their own beliefs or preferences (Krueger & Clement, 1994). This will generate self-related biases in the inferences made about others (Andersen & Chen, 2002; Andersen & Glassman, 1996; Buckner & Carroll, 2007; Robbins & Krueger, 2005; Suzuki, Jensen, Bossaerts, & O'Doherty, 2016). These biases will be particularly important when data are scarce, a common regime in social contexts.
People's priors can also exert an enduring influence over the whole course of learning about others, as if their interpretation of observations is coloured by what they themselves think or would do in the same circumstances. This can affect predictions of the beliefs or actions of others. For instance, people's predictions about the choices partners would make between alternative snack foods remain partly biased by their own preferences, even after substantial observations of their partners' picks (Tarantola, Kumaran, Dayan, & De Martino, 2017). Priors can also influence learning in social interactions, for example when cooperating with or trusting others. This is particularly apparent in psychiatric disorders, many of which involve distressing changes in inferences about the social orientations of others. For example, persecutory beliefs are associated with biases in social impression formation (Barnby, Bell, Mehta, & Moutoussis, 2020; Diaconescu, Wellstein, Kasper, Mathys, & Stephan, 2020; Lincoln, Peter, Schäfer, & Moritz, 2010; Raihani & Bell, 2017; Saalfeld, Ramadan, Bell, & Raihani, 2018; Wellstein et al., 2020), and a defining feature of paranoia is an exaggerated belief that harm will occur, and that other people intend for it to happen (Freeman & Garety, 2000). In experimental settings, more paranoid people attribute more harmful intentions to others, including in scenarios where the partner's true intentions are ambiguous (Barnby, Deeley, Robinson, Raihani, Bell, & Mehta, 2020; Greenburgh, Bell, & Raihani, 2019; Raihani & Bell, 2017; Saalfeld et al., 2018).

Abbreviations: AICc, Akaike Information Criterion (corrected); BIC, Bayesian Information Criterion; CBM, Concurrent Bayesian Modelling; CI, Confidence Interval; ICAR, International Cognitive Ability Resource, Progressive Matrices; R-GPTS-B, Subscale B of the Revised Green Paranoid Thoughts Scale; SVO, Social-Value Orientation.
Paranoia is also associated with variation in social preferences: more paranoid individuals are less trusting and less cooperative in experimental economic games (Fett et al., 2012; Hula, Montague, & Dayan, 2015; Hula, Vilares, Lohrenz, Dayan, & Montague, 2018; King-Casas et al., 2008; Raihani, Martinez-Gatell, Bell, & Foulkes, 2021; Xiang, Ray, Lohrenz, Dayan, & Montague, 2012), and other work has shown that paranoia positively predicts the enjoyment of negative social interactions (Raihani et al., 2021) and the willingness to inflict financial harm on a partner ('punishment'; Raihani et al., 2021). Given the striking disruption that paranoia causes to interpersonal dynamics, both in health and illness, it is critical to understand the role of priors in the process of belief formation when information is scarce.

Fig. 1. Study design. In phase 1, participants chose between two options for 18 trials. In phase 2, participants were randomly matched with a prosocial, competitive, or individualistic partner and predicted which options their partner would choose on 36 trials. Participants received feedback after each trial about what their partner actually chose. After phase 2, participants were asked to infer the extent to which they believed their partner was motivated by self-interest and harmful intent, respectively. They were also asked to classify their partner into one of three categories depicting the partner's social preferences. These categories were described as whether they thought their partner was primarily aiming to (i) equalise payoffs, (ii) earn as much money as possible, or (iii) prevent the participant from earning money.
Here, we asked how participants' own social preferences influenced learning about the social preferences of others when an interaction partner's social preferences had financial consequences for the participant themselves. We also examined how variation in paranoia correlated with participants' own social preferences and the way that they formed and updated beliefs about their partners. To do this, we used a modified social-value orientation approach (SVO, Murphy, Ackermann, & Handgraaf, 2011;Murphy & Ackermann, 2014). SVO describes a participant's social preferences and can be measured using a task where decision outcomes impact both the participant and a notional recipient. Specifically, participants can be classified as being prosocial if they typically prefer equal outcomes, individualistic if they prefer to maximise absolute earnings for themselves, and competitive if they prefer to maximise the relative payoff difference between themselves and a partner. In phase 1 of our modified SVO task, participants acted in the SVO decider role for 18 trials; in phase 2, they acted as the recipient for 36 trials when choices were made by a (new) partner. In phase 2, participants predicted which option the partner would choose in each trial, allowing us to measure initial priors about the partner and subsequent learning (Fig. 1).
We formalised the influence of a participant's SVO on their learning about their partner by building and comparing ten Bayesian belief models and five heuristic models (Table A.1). We used phase 1 behaviour to estimate participants' own SVO, operationalised in terms of parameters of their subjective utility for earnings for themselves and their partner. In phase 2, models either (a) integrated a participant's SVO into their prior beliefs about their partner in a Bayesian updating process, (b) used separate prior values for the inferred SVO of a partner (but still used a Bayesian updating process), or (c) adopted a heuristic reinforcement-learning approach.

Participants
We recruited 750 participants for the initial baseline assessment via Prolific Academic in March 2020. Participants first completed the R-GPTS (Freeman et al., 2021), the ICAR matrices (Condon & Revelle, 2014), and provided demographic information (age, sex, education). After a minimum interval of seven days, we were successful in recalling 697 participants to take part in the experimental paradigms. All participants were between the ages of 16-65, were UK residents, fluent in English, had at least a 90% approval rating on Prolific Academic, and had no prior or current psychiatric diagnosis.
Overall, our sample passed quality control checks, with only 7.1% failing both control questions, 14.9% getting one control question wrong, and 78% getting both control questions correct. Task comprehension was included as an explanatory variable in all regression models. In addition, on a scale of 0-100 (0 = I did not believe that my partner was a real person, 100 = I believed that my partner was a real person), participants were more inclined to believe their partners in the game were real (mean = 59.61, sd = 30.37, median = 66, min = 0, max = 100, skew = − 0.43).

The modified SVO task
We built a modified SVO task based on existing paradigms (Murphy et al., 2011). Participants were asked to play in two phases of the task. Participants were informed that the points they earned in the task would contribute to an overall point total which was pooled over a series of tasks. Other tasks in the series that contributed to the points total are reported in a different paper (Barnby et al., In Prep). Participants were informed they would be matched with two anonymous partners (one for Phase 1 and another for Phase 2) online.
In Phase 1, participants played the role of the decider in a two-player SVO task (Murphy et al., 2011) over 18 trials. In each trial, participants chose between two options determining financial rewards (framed as points) for themselves and an anonymous partner (Table B.1). Participants made 6 choices between prosocial and competitive options, 6 choices between individualistic and competitive options, and 6 choices between prosocial and individualistic/competitive options. Options in this latter category were classified as individualistic/competitive because they were consistent both with a participant's preference to maximise their own payoffs and with a preference to establish a payoff advantage over the partner. We did not use these categorical labels when presenting options to participants; instead, each trial used the neutral labels 'Option 1' and 'Option 2'.
In Phase 2, participants played in the recipient role with a new partner with whom they were randomly matched. For a breakdown of the distribution of participants' preferences within each partner type, see Table F.1 and Fig. G.1. Over 36 trials, participants predicted which of two options the partner would choose. Participants were incentivised to predict accurately because accurate predictions contributed to their total point score, which determined entry into a financial lottery. Partner decisions were determined a priori and without noise: prosocial partners always chose the option that maximised equality (and never chose the competitive option in competitive/individualist option pairs); competitive partners always chose the option that reduced the participant's bonus as much as possible (and chose the individualist option otherwise); and individualist partners always chose the option with the highest payoff for themselves (and chose the prosocial option otherwise). After predicting which option their partner would choose, participants were given feedback about whether their answer was correct (Fig. 1). Finally, participants were asked to what extent they thought their partner was motivated by harmful intent and by self-interest (using two separate slider scales from 0 to 100, with the slider invisible until the participant had made their first click). They then answered (using a three-option forced-choice question) whether they thought their partner was (1) aiming to share the money equally, (2) trying to earn as much money as possible, or (3) trying to prevent the participant from earning money.
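The three deterministic partner policies can be sketched as follows (a minimal Python illustration; the function name and the tie-breaking rule are our assumptions, not taken from the study materials):

```python
def partner_choice(policy, option1, option2):
    """Choose between two (partner_payoff, participant_payoff) pairs under
    the deterministic partner policies described above.
    Tie-breaking in favour of option1 is our assumption."""
    (s1, o1), (s2, o2) = option1, option2  # s = partner's own payoff, o = participant's

    if policy == "prosocial":
        # always pick the option that maximises equality of payoffs
        score = lambda s, o: -abs(s - o)
    elif policy == "competitive":
        # always pick the option that reduces the participant's payoff most
        score = lambda s, o: -o
    elif policy == "individualist":
        # always pick the option with the highest payoff for the partner
        score = lambda s, o: s
    else:
        raise ValueError(f"unknown policy: {policy}")

    return 0 if score(s1, o1) >= score(s2, o2) else 1
```

For instance, offered an equal split (85, 85) against a self-favouring (100, 50), a prosocial partner picks the first option while individualist and competitive partners pick the second.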
The categorical pairwise choice analysis used the cumulative sum of each type of decision a participant made within each choice pair. For example, within prosocial-competitive choice pairs, prosocial choices were dummy coded as 1, and all instances of a participant choosing the prosocial option when this choice pair was available were summed (for a maximum of 6 per participant, per choice pair; see Table B.1). We constructed three separate models, one for each choice pair (prosocial-competitive; prosocial-individualist/competitive; competitive-individualist).
We derived regression estimates (which we refer to as 'estimates' in the text) associated with parameters in our statistical models using model averaging (Burnham & Anderson, 1998; Grueber, Nakagawa, Laws, & Jamieson, 2011), which accounts for the uncertainty in parameter estimates when more than one model is consistent with the observed data. Briefly, we defined a global model containing all explanatory terms and interactions of interest and then derived a top model set using the dredge function in MuMIn (Bartoń, 2020), which compares the global model and all possible submodels. We defined the top model set as the model with the lowest AICc value plus all models within 2 AICc units of that top model. Parameter estimates were then obtained by averaging across this top model set, which accounts for the fact that some parameters do not appear in all the top models. We report full rather than conditional model-averaged estimates, as the former are more conservative. All associations derived from model averaging control for paranoia, general cognitive ability, task comprehension, age, and sex, unless stated otherwise.
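The averaging procedure can be sketched as follows (a hedged Python analogue of the dredge/MuMIn workflow; the function names and input format are ours, not the package's API):

```python
import math

def aicc(loglik, k, n):
    """Corrected Akaike Information Criterion for a model with
    log-likelihood `loglik`, k parameters, and n observations."""
    return -2.0 * loglik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def full_model_average(models):
    """Full model-averaged coefficients over the top model set
    (the lowest-AICc model plus all models within 2 AICc units).
    'Full' averaging counts a coefficient as 0 in models that omit it,
    which shrinks estimates and is the more conservative choice.

    models: list of dicts like {"aicc": float, "coefs": {name: estimate}}.
    """
    best = min(m["aicc"] for m in models)
    top = [m for m in models if m["aicc"] - best <= 2.0]
    # Akaike weights, renormalised over the top set
    weights = [math.exp(-0.5 * (m["aicc"] - best)) for m in top]
    total = sum(weights)
    names = {name for m in top for name in m["coefs"]}
    return {name: sum(w * m["coefs"].get(name, 0.0)
                      for w, m in zip(weights, top)) / total
            for name in names}
```

A model many AICc units worse than the best is simply excluded, and a coefficient that appears in only some of the top models is shrunk toward zero by the full-averaging rule.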
All averaged regression models are given an identifier (e.g., Model 1a) to signify which estimates came from the same model and correspond to the model signifier in the RMarkdown workbook available on GitHub (see below for link).
In addition to the model-averaging approach described above, we also conducted stepwise linear regression models to explore the variance explained by each parameter that appeared in the top model set; we report r² values associated with these models, generated via a standard stepwise approach (i.e., by sequentially adding terms of interest to our baseline model). We used the 'lm' function in R, and we report the overall model fit in addition to regression coefficient strength (referred to as 'non-averaged estimates').

Computational modelling
We implemented a suite of models that belong to two broad 'classes' of computational theory in social learning: reinforcement learning models (k = 5) and more structured inferential Bayesian models (k = 10; Table A.1). While reinforcement learning models allow the tracking of reward-predictive signals from social others, Bayesian models instantiate explicit adherence to the potentially structured nature of inference about others (Vélez & Gweon, 2021). Both are important for testing which approach best explains how our participants integrate their own preferences into their beliefs about their partner. Rather than using the three categorical SVO definitions as in the heuristic models (individualist, competitive, prosocial), the Bayesian models decompose the preferences of participants and partners according to a reduced form of the Fehr-Schmidt inequality aversion model (Fehr & Schmidt, 1999), which parameterises the subjective utility U of a choice between reward R self for the chooser and R other for the partner as follows:

U α,β (R) = α · R self + β · (R self − R other)

where R = {R self, R other}. Given a choice between two such option pairs, R = {R 1; R 2}, the probability of choosing the first option is taken to be

p(c = 1 | α, β; R) = σ(U α,β (R 1) − U α,β (R 2))     (1)

where σ(•) is the logistic sigmoid.
Here, α describes the weight a participant places on their own payoff (in one reduced Bayesian model we set α = 0), and β the weight a participant places on their payoff relative to the payoff of their partner. Large positive or negative values of β indicate, respectively, that participants like or dislike earning more than their partner. We can therefore describe the terms α and β as reflecting preferences for absolute and relative payoffs, respectively. For the option set we used, R self > R other, so one can also write U α,β (R) = (α + β) · R self − β · R other.
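In code, the utility and choice rule read as follows (a minimal sketch; the payoff values used in the example are illustrative):

```python
import math

def utility(alpha, beta, r_self, r_other):
    # U_alpha,beta(R) = alpha * R_self + beta * (R_self - R_other)
    # which, since R_self > R_other in this option set, equals
    # (alpha + beta) * R_self - beta * R_other
    return alpha * r_self + beta * (r_self - r_other)

def p_choose_first(alpha, beta, option1, option2):
    # eq. (1): the choice probability is the logistic sigmoid of the
    # utility difference between the two options
    du = utility(alpha, beta, *option1) - utility(alpha, beta, *option2)
    return 1.0 / (1.0 + math.exp(-du))
```

A chooser with β < 0 (who dislikes advantageous inequality) strongly prefers an equal split (85, 85) over a competitive option (85, 35), while a chooser with β = 0 is indifferent between the two, because R self is identical.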
Following usual SVO practice, the partners in our study acted according to the choices reported in the Appendix (Table B.1). Broadly, individualist partners had high α and minimal β; prosocial partners had substantially negative values of β, so that their subjective utility was reduced if they earned a lot more than their partners; and competitive partners tended to have more positive values of β, so their subjective utility was increased when they earned more than their partners. Competitive and prosocial partners still tended to have positive values of α, because given choices with equal values of R self − R other , they tend to favour the option with a larger R self .
All models were built, fitted, and compared in Matlab (MATLAB, 2020) using the CBM toolbox (Piray, Dezfouli, Heskes, Frank, & Daw, 2019). Model comparison metrics (e.g., iBIC; Huys et al., 2011) estimated from the Laplace approximation are useful for individually fitted models, but they treat each model as a fixed effect during comparison, which leaves parameter estimation in hierarchical, mixed-effects models susceptible to outliers (Stephan, Penny, Daunizeau, Moran, & Friston, 2009). To overcome this, we used concurrent Bayesian model fitting, which hierarchically estimated participants' parameters for each model while simultaneously comparing all models with the CBM toolbox (Piray et al., 2019) in a stepwise manner. We used broad priors (mean = 0, variance = 7.5) for both the individual and hierarchical fits of each model.
Model comparison consisted of four steps. First, we fitted all model parameters to individuals using the Laplace approximation with the 'cbm_lap' function in the CBM toolbox. Second, we compared an initial set of 11 models. Third, we retained the viable models from the previous step and compared them with three additional models. Fourth, we compared the winning models, retaining those with a responsibility greater than or equal to 1%. We performed recovery analyses before model comparison to confirm that each model could simulate and recover data as expected. All models contributing <1% to the overall hierarchical fit across models were excluded at each step. We also concurrently compared models 1-15 together using CBM in a subsample of 100 random participants to ensure that the outcome of the stepped approach was reproducible across the model space. Here, we present the formalisation of the winning model (Model 2). See the Appendix for the formalisation of the other models (Text A.1; B.1).
In the winning model (Model 2; Table A.1), the actual value of a participant's own preferences in Phase 1 influences the inferences they make about their partner in Phase 2. Therefore, both phases are important to clarify the preferences of the partner, and we engaged in simultaneous estimation of parameters from Phase 1 and Phase 2 rather than separating each segment of the task with separate models.

Phase 1 -Estimating participants' social preferences
We modelled participants' social preferences as ranging along two dimensions: absolute payoffs (α ppt), indicating the weight participants place on their own payoffs; and relative payoffs (β ppt), indicating the weight participants place on payoffs relative to the partner. The Fehr-Schmidt model also includes a term quantifying how much a participant dislikes relative discrepancies in payoffs that result in them earning less than their partner ('disadvantageous inequality'), which can arise when R self < R other, although this circumstance does not arise in our options. For one (ultimately losing) model, we adopted the restriction α = 0 (meaning that only β determined the SVO of a participant and their belief about a partner).
Over 18 trials, participants made binary choices c t, t = {1…T}, about whether option 1 or option 2 should be chosen given the returns R t = {R t,1; R t,2} = {R self t,1, R other t,1; R self t,2, R other t,2} for self and other for both offers, such that the log likelihood of the participant choosing option c t = 1 is:

LL t = log p(c t = 1 | α ppt, β ppt; R t) = log σ(U α ppt, β ppt (R t,1) − U α ppt, β ppt (R t,2))     (2)
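This Phase 1 estimation can be sketched with a simple maximum-likelihood fit (a crude grid search, standing in for the hierarchical CBM fitting the authors actually used; the grid range and the option values in the example are illustrative choices of ours):

```python
import math
import numpy as np

def phase1_loglik(alpha, beta, choices, options):
    """Sum over trials of log p(c_t | alpha, beta; R_t).
    choices[t] is 0 or 1; options[t] = ((self1, other1), (self2, other2))."""
    eps = 1e-12
    ll = 0.0
    for c, (r1, r2) in zip(choices, options):
        u1 = alpha * r1[0] + beta * (r1[0] - r1[1])
        u2 = alpha * r2[0] + beta * (r2[0] - r2[1])
        p1 = 1.0 / (1.0 + math.exp(-(u1 - u2)))
        p1 = min(max(p1, eps), 1.0 - eps)  # guard against log(0)
        ll += math.log(p1 if c == 0 else 1.0 - p1)
    return ll

def fit_svo(choices, options, grid=np.linspace(-1.0, 1.0, 41)):
    """Maximum-likelihood (alpha_ppt, beta_ppt) over a coarse grid."""
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: phase1_loglik(ab[0], ab[1], choices, options))
```

A decider who always picks the equal option is recovered with a negative β (dislikes advantageous inequality), while a consistently competitive decider is recovered with a positive β.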

Phase 2 -How participants learned about their partners' social preferences
We then modelled participants' beliefs about their partner's SVO as ranging along two dimensions, α par and β par.
Over 36 trials, participants made binary predictions about whether option 1 or option 2 would be chosen by their partner, given the returns R t = {R t,1; R t,2} for each pair of options. They then discovered which option the partner actually chose, which we write as d t.
The participant assumed that the partner chooses in the same way that they do themselves, but with SVO parameters α par, β par, which they needed to infer from observation. That is, the log likelihood that the partner chose d t is LL t = log p(d t | α par, β par; R t), using the same formula as in eq. (1). The partner's decisions D t = {d 1, d 2, …, d t} were used to update the participant's beliefs about the partner's α par, β par, written as p(α par, β par | D t).
The starting point for these beliefs (written as p(α par, β par | D 0)) was the participant's prior. For this model, we assumed that this was a factorised distribution with each parameter centred on the participant's own preferences α ppt m, β ppt m, but with standard deviation parameters α σ and β σ that characterised the extent to which the participant thought their partner might differ from themselves (belief flexibility). Therefore, we have:

p(α par, β par | D 0) = N(α par; α ppt m, α σ) · N(β par; β ppt m, β σ)     (3)

where eq. (3) signifies that the independent probabilities of α par and β par are given by normal density distributions over all possible values of α par and β par, determined by the participant's priors (α ppt m, α σ with regard to α par, and β ppt m, β σ with regard to β par). We then assumed that a participant's posterior beliefs about their partner over trials t = 1…36, given the partner's decisions, followed Bayes' rule:

p(α par, β par | D t) ∝ p(d t | α par, β par; R t) · p(α par, β par | D t−1)

For efficiency, we represented p(α par, β par | D t) as a matrix over a fixed grid of α and β values, θ t, from which we could calculate the participant's beliefs about their partner's SVO preferences on each trial, and marginalise over θ t to obtain the belief a participant had about each dimension of their partner's SVO separately. The model then stated that the participant predicts the partner's decision in the next trial by calculating the probability determined by the utility differences ΔU α par, β par (R t+1), as in eq. (1), summed over the joint distribution θ t over the partner's parameters:

p(option 1 at t + 1 | D t) = Σ α par, β par θ t(α par, β par) · σ(ΔU α par, β par (R t+1))

and then performed probability matching, so that the participant predicts option 1 with exactly this summed probability.
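The grid-based updating scheme can be sketched as follows (a minimal Python analogue; the grid resolution, prior widths, and payoff values are illustrative choices of ours, not the fitted values):

```python
import numpy as np

def make_prior(a_m, a_sd, b_m, b_sd, grid):
    """Factorised Gaussian prior over (alpha_par, beta_par) on a fixed grid,
    centred on the participant's own preferences (eq. 3)."""
    pa = np.exp(-0.5 * ((grid - a_m) / a_sd) ** 2)
    pb = np.exp(-0.5 * ((grid - b_m) / b_sd) ** 2)
    theta = np.outer(pa, pb)          # rows: alpha_par, columns: beta_par
    return theta / theta.sum()

def choice_prob_grid(option1, option2, grid):
    """p(partner chooses option 1 | alpha_par, beta_par) for every grid cell."""
    A, B = np.meshgrid(grid, grid, indexing="ij")
    u1 = A * option1[0] + B * (option1[0] - option1[1])
    u2 = A * option2[0] + B * (option2[0] - option2[1])
    return 1.0 / (1.0 + np.exp(-(u1 - u2)))

def bayes_update(theta, d, option1, option2, grid):
    """Bayes rule on the grid: posterior ∝ likelihood of observed choice d × prior."""
    p1 = choice_prob_grid(option1, option2, grid)
    theta = theta * (p1 if d == 0 else 1.0 - p1)
    return theta / theta.sum()

def predict_first(theta, option1, option2, grid):
    """Predictive probability of option 1, summed over the joint belief;
    probability matching then predicts option 1 with this probability."""
    return float((theta * choice_prob_grid(option1, option2, grid)).sum())
```

For example, a participant whose prior is centred on prosocial preferences initially expects the equal split, but after repeatedly observing an individualist partner take the self-favouring option, the predictive probability of the equal split falls.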

Updates in our inferences about the beliefs of participants about their partners
Updates between the prior and posterior distributions of our inferences about what a participant believed about their partner were calculated as the absolute difference between the mean of their prior at the start of phase 2 (trial 0; which, according to the winning model, came from their own preferences) and the mean posterior approximation of the participant's belief about the partner along each dimension at trial 36 of phase 2, weighted by the baseline similarity of the participant and their partner, i.e., the number of identical decisions the participant and their partner would have made over the Phase 2 choices had no learning occurred.
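Under the model's utility form, baseline similarity and the weighted update can be sketched like this (our own illustrative helpers; expressing similarity as a fraction and treating the weighting as multiplicative are our readings of the text, not a quotation of the authors' code):

```python
def baseline_similarity(ppt, partner, option_pairs):
    """Fraction of Phase 2 option pairs on which a participant and a partner
    with (alpha, beta) preferences would deterministically make the same
    choice before any learning occurred."""
    def pick(alpha, beta, r1, r2):
        u = lambda r: alpha * r[0] + beta * (r[0] - r[1])
        return 0 if u(r1) >= u(r2) else 1
    same = sum(pick(*ppt, r1, r2) == pick(*partner, r1, r2)
               for r1, r2 in option_pairs)
    return same / len(option_pairs)

def weighted_update(prior_mean, posterior_mean, similarity):
    # absolute shift of the inferred partner parameter between trial 0
    # and trial 36, weighted by baseline similarity (assumed multiplicative)
    return abs(posterior_mean - prior_mean) * similarity
```

Two identical preference profiles yield a similarity of 1, whereas a prosocial participant paired with a competitive partner agrees on few or none of the pairs.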

Phase
When asked to explicitly classify their partners according to prosocial, individualistic, or competitive preferences following phase 2 predictions, participants were generally accurate: 71% of people correctly identified competitive partners as trying to stop them earning money; 84% correctly identified individualistic partners as trying to earn as much money as possible; and 93% correctly identified prosocial partners as trying to share payoffs as equally as possible (Fig. C.1). A Kruskal-Wallis rank sum test found that more paranoid participants differed significantly (although marginally) in their classification of prosocial partners (Kruskal-Wallis χ²(2) = 6.00; p = 0.0497). To explore this further, pairwise comparisons using the Dunn test with Benjamini-Hochberg correction found a significant difference between competitive and prosocial classifications (Dunn estimate = −2.32, p = 0.03), such that more paranoid individuals were more likely to classify prosocial partners as trying to stop them earning money rather than as trying to share the money equally.

Fig. 2. Model agnostic analysis of phase 2. (A) Harmful intent and self-interest attributions made at the end of the task for each level of paranoia, determined via split mean (median = 1), for each partner policy. Linear modelling was used for the main analysis and confirmed the split-mean comparison differences. Split-mean differences were calculated for this visualisation using non-parametric ANOVA. ns = not significant, * = p < 0.05, ** = p < 0.01. (B) Pearson correlations between total correct answers and attributions made about the partner at the end of phase 2. These associations were confirmed with more complex models in the text controlling for age, sex, task comprehension, general cognitive ability, and paranoia.

Computational modelling
We found that a four-parameter Bayesian updating model fitted the data best (Fig. 3A). The model predicted the trial-by-trial responses of participants well, both overall and for each participant (Fig. 3D & E), was able to reproduce behaviour (Fig. 3G), and had parameters that could be appropriately recovered (Fig. 3H). The model fits α ppt m and β ppt m to data from phase 1 and phase 2 together; see Table A.1 for the relationship to parameters estimated from phase 1 and phase 2 separately.
All alternative models are defined in the Appendix (Table A.1; Formalisms: Text S1; Text S2). The best-performing model fit better than similar models (inspired by Tarantola et al., 2017) in which the participant's own preferences exerted a persistent bias over choices as well as affecting the initial condition (see Appendix, Fig. A.1). Paranoia was associated with participants' preference for options in which they earned more than their partners. Participants who identified as female were also less competitive than those identifying as male (estimate = −0.16, 95%CI: −0.32, −0.01; Model 5b). There was no effect of general cognition or age on social preferences for relative payoffs. We therefore also included the belief flexibility terms (α σ and β σ) in the model exploring predictive accuracy. The belief flexibility terms, α σ and β σ, were positively associated with each other (estimate = 0.35, 95%CI: 0.28, 0.42; Model 6). Predictive accuracy was positively associated with baseline similarity and with the belief flexibility parameters α σ and β σ (Model 7). Neither paranoia nor general cognitive ability was associated with predictive accuracy in this model (Model 7). To unpack these relationships: initially regressing predictive accuracy against general cognition, age, sex, and task comprehension generated an r² of 0.03 (p < 0.001). Including paranoia did not improve the model (r² = 0.03, p < 0.001; F = 2.22, p = 0.14). Including baseline similarity significantly improved the model (r² = 0.07; F = 27.8, p < 0.001), indicating that similarity between participants and their partners was associated with increased accuracy (non-averaged estimate: 0.19, 95%CI: 0.12, 0.27). Including α σ and β σ significantly improved the model further (r² = 0.52, p < 0.001; F = 319.8, p < 0.001), with both positively associated with predictive accuracy (α σ non-averaged estimate: 0.39, 95%CI: 0.34, 0.45; β σ non-averaged estimate: 0.56, 95%CI: 0.50, 0.62).
We also allowed α σ, β σ and baseline similarity to interact in a final model; this significantly improved the model (r² = 0.61, p < 0.001; F = 37.7, p < 0.001). In this final model there was an interaction between baseline similarity and β σ (non-averaged estimate: −0.31, 95%CI: −0.37, −0.25), as well as between baseline similarity and α σ (0.12, 95%CI: 0.04, 0.19).
In two separate models including general cognitive ability and paranoia, general cognitive ability was not associated with either α σ or β σ (Model 8a), whereas paranoia was negatively associated with β σ (estimate: −0.06, 95%CI: −0.15, −0.00; Model 8b), after controlling for participant-partner baseline similarity, age, sex, and task comprehension. There was no interaction between α σ, β σ and baseline similarity in either model. This suggests that paranoia was specifically associated with increased belief rigidity concerning the value a partner placed on relative (rather than absolute) payoffs.

Inferential updating
We explored the change in the statistical inferences made about the beliefs participants held about their partners by testing the difference in the mean values of the inferred distributions before and after a participant learnt about their partner, normalised by their similarity prior to learning [Δ(α par m); Δ(β par m)], and the impact of paranoia on this process (see Table F.1 for summary statistics). We observed significantly larger Δ(α par m) and Δ(β par m) in competitive partner conditions compared to individualist and prosocial partner conditions (see Table F.1). There was no association between paranoia and Δ(α par m) (−0.05, 95%CI: −0.19, 0.42; Aux Model 1), and there was no interaction between paranoia and partner after controlling for age, sex, general cognition and task comprehension. Paranoia was associated with lower Δ(β par m) across the board (−0.16, 95%CI: −0.27, −0.05; Aux Model 2), and there was an interaction between paranoia and partner such that those more paranoid changed their beliefs more with prosocial versus competitive partners (0.18, 95%CI: 0.02, 0.33; Aux Model 2) and more with individualist versus competitive partners (0.17, 95%CI: 0.02, 0.32; Aux Model 2), although there was no difference between prosocial and individualist partners, after controlling for age, sex, general cognition and task comprehension.
We found a relationship between Δ(α par m) and harmful intent attributions, such that the more the inferences about participants' beliefs about their partner moved away from α ppt m, the larger the attributions of harmful intent made by the participant (0.31, 95%CI: 0.24, 0.38; Aux Model 3a), after controlling for general cognition, age, sex, and task comprehension. The same was also true for Δ(β par m): the further participants needed to move away from their β ppt m, the larger their attributions of harmful intent (0.37, 95%CI: 0.30, 0.44; Aux Model 3b). Self-interest attributions were positively associated with predictive accuracy and negatively associated with β σ (Table E.1; Model 9b).
Paranoia, general cognition and α σ were not associated with self-interest attributions (Model 9b).
We found no relationship between Δ(α par m) or Δ(β par m) and self-interest attributions (Aux Model 4a; Aux Model 4b), after controlling for general cognition, age, sex, and task comprehension.

Discussion
How do people learn about others when given little information? Core social-cognitive theory (Andersen & Chen, 2002; Andersen & Glassman, 1996) suggests that they use themselves as a starting point. Previous experimental evidence suggests that the neural regions participants use to estimate the choices others will make are similar to those they use to make choices for themselves (Behrens, Hunt, Woolrich, & Rushworth, 2008; Nicolle et al., 2012), with the right temporoparietal junction (Zhang & Gläscher, 2020) and anterior cingulate gyrus (Chang, Gariépy, & Platt, 2013) implicated in the integration of self and other choice information. We explored this hypothesis by asking whether participants' own social preferences influenced the way they learned about the social preferences of others, and how both traits were associated with paranoia. We found that people used their preferences as a prior for learning about others and that they were more accurate at predicting the choices of partners who had similar social preferences to them. This accuracy could not be attributed to baseline participant-partner similarity, nor to willingness to adapt. Similarity between the preferences of participants and their partners reduced harmful intent attributions. More paranoid individuals were less flexible in adapting to a partner's prosocial or competitive nature, supporting the idea that paranoia involves alterations to key social computational processes that update representations of interaction partners.
Participants' predictions about their partners' social preferences were best fit by a Bayesian learning model incorporating their preferences as the central tendency of their priors about their partners, and parameters governing the flexibility of these beliefs. This model outperformed alternatives in which the participants' own preferences played either no, or a diminished, role in modelling their partners' preferences, and variants (based on Tarantola et al., 2017) in which participants' own preferences persistently influenced predictions (for instance, predicting options by ignoring their partner's social preferences, or predicting options based on the best outcome for themselves). Unlike Tarantola et al. (2017), the money received by participants here depended on their partner's choices, which may have influenced the salience of the predictions. Alternatively, participants' preferences for snack foods could be more pronounced than for the subtler social value choices, implying that they exerted a more substantial effect. The Bayesian model also outperformed heuristic alternatives in which participants learned that their partners made choices in one of the three simple categories associated with SVO. This suggests that the nuance afforded by the full social preference model is beneficial.
Together, our data and model provide mechanistic insights into how interpersonal similarity affects social interaction. Participants perceived partners who were more like themselves (including more competitive ones) as being less intentionally harmful. This was an unanticipated result that should ideally be replicated before we draw firm conclusions, although it is consistent with prior theory. For instance, behavioural or psychological similarity to a social partner can foster social bonding and perception of friendship quality (Bolis, Lahnakoski, Seidel, Tamm, & Schilbach, 2021; Redcay & Schilbach, 2019). These empirical findings are framed within the 'dialectical misattunement hypothesis' (Bolis, Balsters, Wenderoth, Becchio, & Schilbach, 2017): mismatch between two individuals, through communication misalignment and unpredictability of action, reduces the efficiency of information transfer and bonding and may increase distrust in social relationships. In one virtual reality study, unfamiliarity, feeling out of place, and feeling like an outsider increased participants' perceptions of being judged by avatars, as well as increased perceptions of these avatars aiming to cause emotional distress (Riches et al., 2020). Likewise, epidemiological work has observed increased psychosis risk in people who are marginalised and 'othered' in a community (El Bouhaddani, van Domburgh, Schaefer, Doreleijers, & Veling, 2019; Kirkbride et al., 2017). Here we present evidence and a formal model suggesting that similarity between a participant and a partner in non-clinical populations facilitates more precise predictions about the partner, which may, in turn, reduce attributions of harmful intent.
Another possibility is that this relationship stemmed from participants' failure to recognise when their own decisions reflected harmful intent (specifically, when making choices that caused a larger discrepancy in earnings between themselves and their partner). Unfortunately, participants did not provide self-assessments of their intentions, so we cannot test whether participants recognised that their own competitive decisions reflected spiteful (and potentially harmful) motives. This would be fruitful to explore in any future work.
Paranoia was positively associated with preferences for more unequal payoffs in Phase 1, and with less flexible predictions and belief updating about a partner's preferences for relative payoffs in Phase 2, regardless of baseline similarity. The relationship between paranoia and competitive preferences is consistent with prior work (Raihani, Martinez-Gatell, Bell, & Foulkes, 2021; Schaerer, Foulk, du Plessis, Tu, & Krishnan, 2021). Our observations of belief rigidity and reduced social belief updating in paranoia are also consistent with previous experimental evidence in those with schizophrenia and borderline personality disorder (Henco, Diaconescu, Lahnakoski, Brandi, Hörmann, Hennings, & Mathys, 2020; Wellstein et al., 2020), and suggest that this reduction in belief flexibility may specifically impinge upon the ability to form and update stable beliefs about the partner's preferences for equality or inequality. Interestingly, as more paranoid people were more likely to make competitive decisions in Phase 1 and attributed more harmful intent to their partners in Phase 2 (replicating Barnby, Bell, et al., 2020a; Greenburgh et al., 2019; Raihani & Bell, 2017; Saalfeld et al., 2018), the increased baseline similarity should have helped participants be more accurate in predicting the decisions of competitive partners in Phase 2. Nevertheless, more paranoid individuals were not more accurate in predicting the decisions of competitive partners, which implicates reduced belief flexibility as a potential cause. It may be that the degree of similarity between more paranoid participants and competitive partners was insufficient to overcome their reduced β σ . This combination of participant-partner similarity and belief flexibility should be explored in future work.
Given that autistic-like traits have also shown associations with a reduction in social information processing (Sevgi, Diaconescu, Henco, Tittgemeyer, & Schilbach, 2020), it is also important to measure autistic-like traits in future uses of this task to understand the unique contribution of pre-existing paranoia. Our and prior data are consistent with claims that positive symptoms of psychosis (which frequently involve paranoia) may stem from a general inability to reconcile incoming information with current predictions (Fletcher & Frith, 2009), and this may be particularly applicable in social contexts (Bolis et al., 2017).
We note two main limitations. First, we recruited a sample that exclusively resided in the UK. While our sample was diverse in age, self-reported ethnicity, and sex, given the disparity in neuroimaging evidence between how familiar and unfamiliar individuals are represented in the brain (Ng, Han, Mao, & Lai, 2010; Zhu, Zhang, Fan, & Han, 2007), it may be that the model for representing social others will transfer cross-culturally, although the degree to which our beliefs and values are integrated into our priors over others may vary. Second, we focussed on a particular type of prosocial behaviour, where people show preferences for equal outcomes rather than preferring the partner to earn more than themselves. Meta-analysis suggests that different types of prosocial behaviour cluster together and are implemented with different subtleties in neural activation maps (Rhoads, Cutler, & Marsh, 2021), and therefore it is unclear whether different forms of prosocial behaviour would produce similar learning effects, or perhaps show more specific persistent choice biases as reported in Tarantola et al. (2017).
In sum, we used an SVO task that involved real financial incentives to examine whether and how people use their own social preferences to learn about the social preferences of others. Consistent with accounts of social learning, we found that people used their own social preferences as a prior for learning about the social preferences of a partner and that impressions were updated in a Bayesian manner. More paranoid participants held more rigid beliefs about the partner's preference for inequality, updating their posterior beliefs less across the board. Finally, participants adopted a relative notion of harm, rating choices consistent with their own preferences as being less intentionally harmful.

Data and code availability
All data, model code, analysis scripts, and an R-Markdown workbook to reproduce the regression analyses are available on GitHub: https://github.com/josephmbarnby/Barnby_etal_2021_SVO.

Declaration of Competing Interest
None to declare.

Table: model descriptions and free parameters (the winning model is highlighted in bold). NB: our winning model estimated individual parameters simultaneously for both Phases 1 and 2, since the participants' own preferences play a critical role in their learning about their partners. Since we directly perform analyses on the participants' own parameters, α ppt m and β ppt m , we sought to ensure that this simultaneous fit was not corrupting our estimates of these quantities; we therefore also estimated each phase separately. Among the competing models: Model 8 was identical to Model 2, with the addition of two lapse parameters in Phase 2 (ε con , ε incon ) that quantify whether a participant is biased to persistently predict options their partner might choose that are congruent or incongruent with their own SVO; Model 9 (free parameters: α ppt m , β ppt m , α σ , β σ , ρ con , ρ incon ) was identical to Model 2, with the addition of two parameters that quantify whether a participant is persistently biased to learn about and predict a partner's choices that are congruent (ρ con ) or incongruent (ρ incon ) with their own SVO.

Figure: simulations of prior (t = 0) and posterior (t = 36) beliefs over α par (A) and β par (B) of synthetic prosocial, competitive, and individualistic participants (x-axis facets) that have either low, medium, or high σ over their prior beliefs (y-axis facets) about partners of different SVO types.

Table: option pairs (participant, partner) presented to participants in Phase 1. Participants made 18 choices in three categories. All participants saw all option pairs, presented in a random order with on-screen position counter-balanced, e.g., within prosocial-competitive option dyads, prosocial options appeared on the left three times and on the right three times.

In Phase 1, participants acted in the role of decider in a modified SVO task. Participants chose between two options that determined the allocation of points between themselves and their partner (the receiver).
Specifically, participants made six choices between prosocial and individualistic options, six choices between prosocial and competitive options, and six choices between individualistic and competitive options.

Predictive accuracy was modelled as:

Predictive Accuracy = β0 + (β1 × α σ ) + (β2 × β σ ) + (β3 × Baseline Similarity) + (β4 × Paranoia) + (β5 × General Cognition) + (β6 × Age) + (β7 × Sex) + (β8 × Task Comprehension)

These equations were applied to all models for Phase 1, aside from the 'beta-only' model, in which we modelled participant SVO as ranging along one dimension, relative payoffs (β ppt ): how much participants preferred creating a discrepancy between their own and their partner's earnings.
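The regression specification above can be illustrated with ordinary least squares on simulated data. This is a minimal sketch: all variable values and the true coefficients are placeholders, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated predictors standing in for the paper's regressors (placeholder data).
X = np.column_stack([
    np.ones(n),              # intercept (beta0)
    rng.normal(size=n),      # alpha_sigma
    rng.normal(size=n),      # beta_sigma
    rng.normal(size=n),      # baseline similarity
    rng.normal(size=n),      # paranoia
    rng.normal(size=n),      # general cognition
    rng.normal(size=n),      # age
    rng.integers(0, 2, n),   # sex
    rng.normal(size=n),      # task comprehension
])
true_betas = np.array([0.5, 0.0, -0.3, 0.4, -0.2, 0.1, 0.0, 0.0, 0.1])

# Simulated outcome: predictive accuracy as a linear combination plus noise.
y = X @ true_betas + rng.normal(scale=0.1, size=n)

# Least-squares estimates of beta0..beta8.
betas_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With enough simulated observations, the estimated coefficients recover the generating values, mirroring how the reported coefficients (e.g., for baseline similarity) are read from the fitted model.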
In Phase 1, participants made 18 choices c t , t = {1…T}, about whether option 1 or option 2 should be chosen given the utility (U t ) of each, such that the likelihood of choosing option 1 was:

p(c t = 1) = σ(ΔU t ), where σ(x) = 1/(1 + exp(−x)) is the logistic sigmoid and ΔU t is the difference in utility between option 1 and option 2.

Phase 2: how participants learned about their partners' social preferences.
Over 36 trials, participants made binary predictions d̂ t , t = {1…T}, about whether option 1 or option 2 would be chosen by their partner, given the offers R t of each pair. They then discovered what the partner actually chose, which we write as d t .
The participant assumed that the partner chose in the same way as they did themselves, but with SVO parameters β par , which they needed to infer from observation. That is, the log likelihood that the partner chose d t is LL = log(p(d t | β par ; R t )), using the same formula as eq. 8.
The partner's decisions D t = {d 1 , d 2 , …, d t } were used to update a participant's beliefs about a partner's β par , written as p(β par | D t ). The starting point for these beliefs, written as p(β par | D 0 ), was the participant's prior. For model 1, we assumed that this was a factorised distribution with each parameter centred on the participant's own preference β ppt m , but with a standard deviation parameter β σ that characterised the extent to which the participant thought their partner might differ from themselves. Therefore, we had:

p(β par | D 0 ) = N(β par ; β ppt m , β σ ), a Gaussian with mean β ppt m and standard deviation β σ .
We then assumed that a participant's posterior beliefs about their partner over trials t = 1…36, given a partner's decisions, followed Bayes' rule:

p(β par | D t ) ∝ p(d t | β par ; R t ) × p(β par | D t−1 ).

For efficiency, we conveniently represented p(β par | D t ) by a vector over a fixed grid of β values, θ βpar t , from which we calculated the participant's beliefs about their partner's SVO preferences on each trial. The model then stated that the participant predicted their partner's choice with the probability determined by the utility differences ΔU βpar (R t+1 ), as in eq. (1), summed over the distribution θ βpar t over the partner parameters:

p(option 1 at t + 1) = Σ βpar θ βpar t σ(ΔU βpar (R t+1 )),
and then performed probability matching, so that the participant's prediction was emitted with exactly this probability.

Participant ignores the partner (Model 3).
We fitted the data under the rule that the participant predicted their partner's decision based on their own α ppt and β ppt values, rather than considering their partner's preferences. In this model, participants did not update their inferences based on feedback about whether they were correct or incorrect.
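The prior-update-predict cycle described above (a Gaussian prior centred on the participant's own β ppt m , Bayes-rule updating on a grid, and probability matching) can be sketched in a minimal one-dimensional form over β par only. The grid bounds and the utility-difference parameterisation are illustrative assumptions, not the paper's exact forms.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def delta_utility(offers, beta):
    """Illustrative utility difference between option 1 and option 2 for an SVO
    weight beta on payoff discrepancy. offers = ((self1, other1), (self2, other2))."""
    (s1, o1), (s2, o2) = offers
    return beta * (s1 - o1) - beta * (s2 - o2)

# Discrete grid over possible partner beta values (theta^beta_t in the text).
grid = [i / 10.0 for i in range(-30, 31)]

def prior(beta_ppt_m, beta_sigma):
    """Gaussian prior centred on the participant's own preference, normalised on the grid."""
    p = [math.exp(-0.5 * ((b - beta_ppt_m) / beta_sigma) ** 2) for b in grid]
    z = sum(p)
    return [v / z for v in p]

def update(belief, offers, partner_chose_1):
    """Bayes rule: multiply by the likelihood of the observed choice, renormalise."""
    lik = [sigmoid(delta_utility(offers, b)) for b in grid]
    if not partner_chose_1:
        lik = [1.0 - l for l in lik]
    post = [p * l for p, l in zip(belief, lik)]
    z = sum(post)
    return [v / z for v in post]

def p_predict_1(belief, offers):
    """Predicted choice probability: sigmoid summed over the belief grid
    (probability matching emits option 1 with exactly this probability)."""
    return sum(p * sigmoid(delta_utility(offers, b)) for p, b in zip(belief, grid))

# A prosocial participant (beta_ppt_m = -1) repeatedly observes competitive choices.
belief = prior(beta_ppt_m=-1.0, beta_sigma=1.0)
offers = ((10, 2), (10, 10))  # option 1 creates a payoff discrepancy; option 2 is equal
for _ in range(10):
    belief = update(belief, offers, partner_chose_1=True)
```

After repeated competitive observations, the belief mass shifts away from the self-centred prior toward positive β par , so the predicted probability of the discrepancy-creating option rises, which is the qualitative behaviour the winning model captures.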
We introduced a version of the winning model with a parameter that pulled the central tendency of a participant's prior beliefs about a partner's α par and β par toward zero using a shrinkage parameter (ω) between Phase 1 and Phase 2; this replaces eq. 3. In this model we allowed for a parameter ζ to account for probability under- or over-matching by a participant when emitting a predicted decision, although inference occurred as per eq. 5; this equation replaces eq. 7. Model 7 allowed for the participant to make lapse errors during their predictions of a partner using a single parameter ε, and therefore replaces eq. 7.

Participant choice congruency bias (Model 8).
In model 6, the participant's own return influenced the way that they learned about their partner. In model 8, we considered the case in which learning proceeds normally, but the participant evaluated the probability that the partner chose a particular option in a way that was biased by the participant's preferences. To do this, we considered whether a potential partner prediction was congruent with the participant's preferences (having a greater utility for the participant based on the participant's SVO parameters) or incongruent. We implemented this using two lapse-like parameters, each bounded between 0 and 2, and therefore replaced eq. 7 accordingly.

Participant learning congruency bias (Model 9).
In model 9, we allowed congruency to affect learning as well as predictions, and therefore replaced eq. 5 accordingly.

In a further model, the decisions D t = {d 1 , d 2 , …, d t } a partner made were used to update a participant's beliefs about a partner's α and β, θ αpar,βpar t , as before. However, in contrast to previous models, a participant's prior over a partner's initial probabilistic parameters α par and β par was parametrised by a new α and β which formed the central tendency of their beliefs (α 2 m and β 2 m ). This new joint probability was given a particular standard deviation along each dimension (α σ and β σ ).
Together, this formed the adjusted initial joint distribution of their beliefs about a partner's SVO, although all inference and probability matching occurred as usual. This replaces eq. 3.
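The adjusted initial joint distribution described above can be sketched as a factorised two-dimensional prior on a discrete grid. The grid bounds and the example central tendencies are illustrative assumptions.

```python
import math

def gaussian(x, m, sd):
    """Unnormalised Gaussian density."""
    return math.exp(-0.5 * ((x - m) / sd) ** 2)

def joint_prior(alpha_m, beta_m, alpha_sigma, beta_sigma, grid):
    """Factorised joint prior over (alpha_par, beta_par): independent Gaussians
    centred on (alpha_m, beta_m) with standard deviations (alpha_sigma, beta_sigma),
    normalised over the grid."""
    p = [[gaussian(a, alpha_m, alpha_sigma) * gaussian(b, beta_m, beta_sigma)
          for b in grid] for a in grid]
    z = sum(sum(row) for row in p)
    return [[v / z for v in row] for row in p]

grid = [i / 10.0 for i in range(-30, 31)]

# Example: central tendency (1.0, -0.5) with moderate spread along each dimension.
prior = joint_prior(alpha_m=1.0, beta_m=-0.5, alpha_sigma=0.5, beta_sigma=0.5, grid=grid)
```

Inference then proceeds over this joint grid exactly as in the one-dimensional case, with the belief renormalised after each observed partner choice.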
Text B.1: Competing model heuristic formalism. We calculated a participant's preferences using the same framework as models 1-10. We then constructed a variation on the classic Q-learning model (Watkins, 1989; Watkins & Dayan, 1992) that computes the subjective internal value of choice types in the environment in Phase 2. The classic model computes an option value Q a t for each option, in our case where a ∈ {P, I, C} ranges over the three possible categorical social-value decisions a partner might make (prosocial, individualistic, competitive), rather than the actual values of the options. The values were initialised to Q a 0 = 1/3 (the mean reward expected given that each potential SVO choice has equal probability of giving a 1, correct, or 0, incorrect, outcome). Then, if the participant predicted option â t on trial t and the partner chose option a t , the value of the predicted option was updated according to the delta rule:

Q^{â t}_{t+1} = Q^{â t}_t + λ(r t − Q^{â t}_t),

where r t = 1 if â t = a t (i.e., the participant's prediction was correct), r t = 0 otherwise, and λ is the learning rate. In model 11, we applied the same learning rate to all 36 trials in Phase 2. In model 12, we allowed for different learning rates according to whether the participant was incorrect or correct [λ neg , λ pos ]. In model 13, we had different learning rates [λ P , λ I , λ C ] for each of the three possible choices â t of the participant on trial t. In model 14, there were separate learning rates [λ c , λ i ] according to whether the partner's choice was congruent or incongruent with the participant's SVO, using the Phase 1 inference process of the Bayesian model to infer the participant's preferences. Finally, in model 15 we added a consonance parameter ω (0 < ω < 1) that initialised Q a 0 = ω for the paradigmatic partner type the participant was most like, given the mode of their categorical decisions in Phase 1, and Q a 0 = (1 − ω)/2 for each of the other two partner types.
The prediction of the participant for trial t + 1 was then calculated using a softmax function of the values Q a t , subject to a decision temperature, τ:

p(â_{t+1} = a) = exp(Q^a_t / τ) / Σ_{a′} exp(Q^{a′}_t / τ).
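The simplest of these comparison models (Model 11: a single learning rate) together with the softmax prediction rule can be sketched as follows. The always-competitive simulated partner and the parameter values (λ = 0.2, τ = 0.5) are illustrative assumptions.

```python
import math
import random

CATEGORIES = ["P", "I", "C"]  # prosocial, individualistic, competitive

def init_q():
    """Values initialised to 1/3: the mean reward when each SVO category is equally likely."""
    return {a: 1.0 / 3.0 for a in CATEGORIES}

def update_q(q, predicted, actual, lam):
    """Delta rule on the predicted option: correct predictions earn r = 1, else r = 0."""
    r = 1.0 if predicted == actual else 0.0
    q[predicted] += lam * (r - q[predicted])
    return q

def softmax_predict(q, tau):
    """Prediction probabilities: softmax of the Q-values with decision temperature tau."""
    exps = {a: math.exp(q[a] / tau) for a in CATEGORIES}
    z = sum(exps.values())
    return {a: v / z for a, v in exps.items()}

# A partner who always chooses competitively; the model learns to favour "C".
random.seed(0)
q = init_q()
for _ in range(36):
    probs = softmax_predict(q, tau=0.5)
    predicted = random.choices(CATEGORIES, weights=[probs[a] for a in CATEGORIES])[0]
    q = update_q(q, predicted, actual="C", lam=0.2)
```

Because correct predictions of "C" push its value toward 1 while incorrect predictions of the other categories push their values toward 0, the competitive category's value can only rise above its 1/3 starting point while the others fall, so predictions increasingly favour "C".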