Naïve information aggregation in human social learning

To glean accurate information from social networks, people should distinguish evidence from hearsay. For example, when testimony depends on others' beliefs as much as on first-hand information, there is a danger of evidence becoming inflated or ignored as it passes from person to person. We compare human inferences with an idealized rational account that anticipates and adjusts for these dependencies by evaluating peers' communications with respect to the underlying communication pathways. We report on three multi-player experiments examining the dynamics of both mixed human-artificial and all-human social networks. Our analyses suggest that most human inferences are best described by a naïve learning account that is insensitive to known or inferred dependencies between network peers. Consequently, we find that simulated social learners that assume their peers behave rationally make systematic judgment errors when reasoning on the basis of actual human communications. We suggest human groups learn collectively through naïve signaling and aggregation that is computationally efficient and surprisingly robust. Overall, our results challenge the idea that everyday social inference is well captured by idealized rational accounts and provide insight into the conditions under which collective wisdom can emerge from social interactions.

One important aspect of successful social learning is the ability to distinguish the communication of new first-hand evidence from hearsay (Berg, 1993; Budescu & Yu, 2007; Enke & Zimmermann, 2019; Hahn, Hansen, & Olsson, 2020; Jönsson, Hahn, & Olsson, 2015; Whalen, Griffiths, & Buchsbaum, 2018). When agents communicate with one another, or base their beliefs on shared evidence, there is a danger of information becoming inflated or ignored as it spreads through a social network. For example, when two colleagues B and C recommend a new local coffee shop, it might seem like a ringing endorsement, until you learn that only B has been to the coffee shop while C just heard about it from B. This makes C's testimony non-independent and, in this case, practically worthless. From the perspective of rational analysis (Oaksford, Chater, et al., 2007), dependencies between agents determine how much weight one of them should place on what each of the others says. Specifically, rational social learners should use their knowledge about who tends to communicate with whom (the structure of their social network) to make inferences about the truth behind a claim. Similarly, rational social learners should consider the timing and content of agents' statements to infer who is learning from whom, and when social judgments are indicative of new first-hand evidence.
In line with these predictions, recent empirical studies have shown correlations between human inferences and rational model simulations when reasoning with dependent sources of information (Fränken, Theodoropoulos, Moore, & Bramley, 2020; Whalen et al., 2018) as well as concordance between a rational theory-of-mind based framework and problems of reverse engineering the intended meaning of others' utterances or the beliefs and desires that caused their behavior more generally (Goodman & Stuhlmüller, 2013; Hawthorne-Madell & Goodman, 2019; Jara-Ettinger, Gweon, Schulz, & Tenenbaum, 2016; Jara-Ettinger, Gweon, Tenenbaum, & Schulz, 2015; Lucas et al., 2014; Wu et al., 2021). In these works, social inference is grounded in a "utility-maximizing" assumption, making it possible to reverse engineer the causes of peers' behavior by inverting a generative model of how they ought to rationally update and express their beliefs to achieve their goals (Baker, Jara-Ettinger, Saxe, & Tenenbaum, 2017; Baker, Saxe, & Tenenbaum, 2009; Jara-Ettinger, Schulz, & Tenenbaum, 2020; Lopez-Brau, Kwon, & Jara-Ettinger, 2022).

https://doi.org/10.1016/j.cognition.2023

Fig. 1. [a] On Planet Blue, 2/3 of the fish are blue and 1/3 are red. On Planet Red, the proportions are reversed. These proportions are known to participants. At each trial $t$, participants sample private evidence $d_t \in \{\text{blue fish}, \text{red fish}, \text{no evidence}\}$ from the unknown planet and observe social evidence corresponding to the previous judgments provided by two other agents (agents B and C). Participants then provide judgments about the planet they are on using a seven-point scale, ranging from 3-blue (highly confident planet blue) to 3-red (highly confident planet red), as well as a structure judgment about the communication structure between all three agents. Participants know that other agents' judgments are elicited using the same seven-point scale and that all are incentivized to be correct. [b] Illustration of computational models. The baseline model (Level-0) uses a beta-binomial model to update its beliefs strictly from private evidence. The naïve social learning model (Level-1) aggregates other agents' judgments and inverts a generative model of the observed judgments to infer the underlying belief (and thereby the evidence), followed by beta-binomial updates combining both observed private evidence and other agents' inferred private evidence. The idealized rational model (Level-2) further conditions its inferences about the unknown private evidence on the communication structure (i.e., it corrects for dependencies in evidence), followed by the same beta-binomial updates as the Level-1 model. [c] Experimental setup. In Experiment 1, participants (agent A) interact with two idealized (artificial) agents whose judgments were fixed across conditions. Between conditions, we manipulate whether participants (agent A) receive no structure information, are told that C could see B's judgments, or are told the reverse. In Experiments 2-3, the focal participant (agent A) interacts with two other human participants (B & C) and the actual network structure is manipulated across two conditions. Participants are informed about the network structure in Experiment 2 but must infer it in Experiment 3. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
While there is good evidence people are capable of these kinds of inferences, in this paper, we challenge the idea that social learning is, in general, well captured by such computationally involved and idealized rational accounts. To investigate this, we study human learning in micro social networks made up of both simulated and actual human learners. We find instead that social inference in this iterated online multi-person setting is more often dominated by computationally cheap "naïve" inferencing in which learners integrate their peers' judgments at face value, lacking sensitivity to the nuances of their dependencies and redundancies. Inspired by standard "urn" tasks (Anderson & Holt, 1997), we develop a novel social learning task in which participants combine first-hand observations (private evidence) with judgments from two other agents who are making their own private observations (social evidence). Participants are tasked with inferring a property of their shared environment (planet judgment) and in some conditions also with inferring the underlying communication structure between themselves and the other agents (structure judgment; Fig. 1a). Since participants cannot directly access each other's private evidence, to succeed they must estimate the summative impact of the unknown private evidence seen by the other agents (simulated or human) from observations of their public judgments and combine this with their own private evidence. One way to approach this task is by attempting to invert a generative model of the origins of peers' judgments, taking into account the likely dependencies in the social evidence due to the structure of the network and iterated nature of the task (e.g., Whalen et al., 2018). In contrast, pilot work in a similar task (Fränken, Valentin, Lucas, & Bramley, 2021) suggested that this behavior may be more of an exception than a norm: many participants may adopt a more naïve approach, where they simply incorporate other agents' judgments in a heuristic manner.
To test this and understand how participants make sense of the social evidence, we analyze behavior using three nested computational models (Fig. 1b), which, akin to rational speech act models (cf. Frank & Goodman, 2012), assume different levels of recursion: (1) a baseline model (Level-0), which ignores social evidence and forms inferences strictly from private evidence; (2) a computationally cheap, naïve social learning account (Level-1), based on Fränken et al. (2021), which aggregates other agents' judgments at their face value; and (3) an idealized rational account (Level-2), which, similar to rational models of testimony (e.g., Fränken et al., 2020; Pilditch, Hahn, Fenton, & Lagnado, 2020; Whalen et al., 2018; Xie & Hayes, 2022), conditions its inferences about the unknown evidence on the underlying communication structure between agents. Using our task and modeling infrastructure, we study human inferences across three behavioral experiments. Our experiments differ in terms of the number of human versus artificial agents making up the network, as well as in terms of these agents' knowledge about the structure of the network (Fig. 1c). In Experiment 1, there is just one human participant per network (agent A) who interacts with two artificial agents (B and C). In Experiments 2-3, a focal participant (agent A) interacts with two other human participants (agents B and C).
The rest of our paper is structured as follows: We first introduce our social learning task (Task), then formalize inferences in our task through the lens of our computational models (Computational Models). Next, we describe our three behavioral experiments and report our findings (Experiments). We then conclude with a discussion of the broader impact and the limitations of our work (General Discussion).

Task
In our social learning task, participants have to combine their own private evidence samples from the environment with other agents' judgments to make their own judgments (see Fig. 1a and Fig. 2 for an illustration; see Fig. A.1 and our online for full instructions). Under our minimal cover story, participants have crash-landed on one of two planets with different proportions of blue and red fish. Participants are told that on the first planet, Planet Blue, 2/3 of the fish are blue and 1/3 of the fish are red. On the second planet, Planet Red, the proportions are reversed. Aside from the different proportions of red and blue fish, the planets are indistinguishable. Over the course of ten trials, participants sample private evidence by fishing, occasionally catching either a blue or a red fish and so learning about the proportion of fish on the current planet. The probability with which participants catch a blue or red fish versus no fish (i.e., no evidence) at each trial is unknown to participants. Alongside their own private observations, participants can see the previous judgments provided by two other agents. Importantly, the private evidence collected by other agents is never observed by participants. Moreover, participants do not know how frequently other agents observe private evidence, nor do they know at which trials other agents observe private evidence. The only thing participants know is that other agents are on the same planet, and hence the evidence sampled by the other two agents must come from the same distribution. To succeed in the task, participants must thus infer the summative impact of the unknown private evidence seen by the other two agents from observations of the other agents' public judgments and combine it with their own private evidence. Participants have to provide a judgment about the planet they are on repeatedly across a series of ten trials, using a seven-point scale ranging from 3-blue (highly confident planet blue) to 3-red (highly confident planet red). This in turn forms the social evidence that other participants see (depending on their position in the network). Participants can always see their own history of private evidence samples and their own previous judgments, and crucially also the previous judgments of neither, one, or both other agents, depending on their position in the network.
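To make the cover story concrete, the following minimal sketch computes how strongly a set of catches favors Planet Blue for an idealized observer who compares only the two planets. This is not one of the paper's models (those use beta-distributed beliefs); the function name `posterior_blue`, the equal prior over planets, and the assumption that the chance of catching no fish is the same on both planets (so empty trials drop out) are all illustrative assumptions.

```python
from fractions import Fraction

# Proportion of blue fish on each planet, as stated in the cover story.
P_BLUE_FISH = {"Planet Blue": Fraction(2, 3), "Planet Red": Fraction(1, 3)}


def posterior_blue(n_blue: int, n_red: int) -> Fraction:
    """Posterior probability of Planet Blue after n_blue blue and n_red red catches.

    Assumes equal prior probability for both planets (an illustrative assumption).
    """
    likelihood = {
        planet: p ** n_blue * (1 - p) ** n_red
        for planet, p in P_BLUE_FISH.items()
    }
    return likelihood["Planet Blue"] / sum(likelihood.values())


print(posterior_blue(0, 1))  # a single red fish -> 1/3
print(posterior_blue(2, 1))  # two blue, one red -> 2/3
```

Under these assumptions only the difference between blue and red catches matters, which is why a participant also needs the evidence hidden behind the other agents' judgments to become confident.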
When inferring the unknown private evidence from other agents' judgments, participants may or may not be told about the communication structure between themselves and the other two agents. Specifically, our task enables us to manipulate the social information network (who actually sees whose judgments) as well as participants' beliefs about the parts of the network they do not observe directly (whether other agents can see their judgments, or each other's judgments). The true structure influences how information propagates, and as such participants' structure beliefs influence the weights they should assign to the judgments of other agents (i.e., discount or increase the inferred evidence behind other agents' judgments). For example, if agent A believes that agent C sees agent B's judgments, they should anticipate that there is likely to be some redundancy whereby C's judgments are partly a consequence of B's (akin to our coffee shop example from the introduction). If C makes similar judgments to B, perhaps they have not seen any evidence of their own. Alternatively, if C's judgments differ dramatically from B's (Fig. 2, left), this would suggest that C has observed substantial evidence, enough to override the influence of judgments from B. Furthermore, if the distal network structure is unknown to an agent, there is the potential for them to infer it by recognizing patterns of inheritance (i.e., if C's judgment reliably shifts in line with B's preceding judgment). In our experiments, we either explicitly provide the network structure to participants (Experiment 1, conditions two and three, and Experiment 2) or withhold structure information (Experiment 1, condition one, and Experiment 3). In conditions where we withhold structure information, we have participants provide a structure judgment at the end of the task (Fig. 2, right). This setup enables us to study whether participants aggregate judgments at their face value or additionally consider and accommodate potential redundancies implied by the communication structure, and further whether they can infer this structure from the communication sequences. We next formalize these intuitions using a nested set of computational models.

Overview
We formalize learning in our tasks through the lens of Bayesian inference. At a high level, the idea is that agents assign different degrees of belief to different states of the world, which we represent using probability distributions. As they encounter new private (binomial) evidence $d_t \in \{\text{blue fish}, \text{red fish}, \text{no evidence}\}$ at each trial, agents can update this distribution using conjugate updates within a beta-binomial scheme (no evidence → no update; see Fig. 3, left). Given a belief such as $h \sim \mathrm{Beta}(\alpha = 3, \beta = 2)$, we assume agents derive probability masses for each of the $m = 7$ discrete judgment options using the cumulative density of the beta distribution $F(\alpha, \beta)$ (Eq. (1)) for each $j_k$ where $k \in \{1, 2, \dots, m\}$ (see Fig. 3, right). Moreover, given an observed judgment, such as "1-red", shown in Fig. 3, one can infer a distribution over possible beliefs $h$ and thus make a guess about the evidence going into this judgment by "inverting" the inference model. The primary difference between our computational models listed below lies in the mechanism used to apply this inversion. This allows us to study whether people are primarily using computationally cheap, naïve (Level-1) inferences, which were previously explored in a simplified setting in Fränken et al. (2021), or whether people use rational, structure-sensitive (Level-2) inferences, as proposed in other previous work (e.g., Fränken et al., 2020; Pilditch et al., 2020; Whalen et al., 2018).

Fig. 3. Left: an agent increments the $\alpha$ parameter by one to update its belief from $h \sim \mathrm{Beta}(\alpha = 2, \beta = 2)$ to $h \sim \mathrm{Beta}(\alpha = 3, \beta = 2)$. Right: agents use their beliefs $h \sim \mathrm{Beta}(\alpha, \beta)$ to make judgments by assigning probabilities to each response option on the seven-point scale. Similarly, given an observed judgment, such as "1-red", which corresponds to the highest bar with a gray border, agents can infer the most likely belief (and thus, the underlying evidence) that must have produced the observed judgment by inverting the model. This setup allows agents to flexibly infer unknown private evidence from observed judgments. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
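The belief-to-judgment mapping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the seven response options are assumed to correspond to seven equal-width bins of the unit interval, $\alpha$ is treated as the red count and $\beta$ as the blue count (inferred from the worked examples in the text), and the helper names `beta_cdf`, `update`, and `judgment_probs` are hypothetical. The beta CDF is integrated numerically so that the sketch needs only the standard library.

```python
import math

LABELS = ["3-blue", "2-blue", "1-blue", "neutral", "1-red", "2-red", "3-red"]


def beta_cdf(x, a, b, steps=2000):
    """CDF of Beta(a, b) at x via composite Simpson integration (assumes a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def pdf(u):
        return norm * u ** (a - 1) * (1 - u) ** (b - 1)

    width = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * width)
    return total * width / 3


def update(a, b, evidence):
    """Conjugate update: red increments alpha, blue increments beta (assumed mapping)."""
    if evidence == "red":
        return a + 1, b
    if evidence == "blue":
        return a, b + 1
    return a, b  # no evidence -> no update


def judgment_probs(a, b, n_bins=7):
    """Probability mass of Beta(a, b) in each of n_bins equal-width bins (assumed binning)."""
    edges = [k / n_bins for k in range(n_bins + 1)]
    return [beta_cdf(hi, a, b) - beta_cdf(lo, a, b) for lo, hi in zip(edges, edges[1:])]


a, b = update(2, 2, "red")              # Beta(2, 2) -> Beta(3, 2)
probs = judgment_probs(a, b)
best = LABELS[probs.index(max(probs))]
print((a, b), best)                     # (3, 2) 1-red
```

With these assumptions, the belief Beta(3, 2) places most mass on the "1-red" option, matching the example in the text; other binning schemes would be equally easy to substitute.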

Baseline: Level-0 Inference
Our baseline model, Level-0 inference, disregards social evidence and thus applies no inversion of other agents' beliefs. Instead, the Level-0 model strictly updates beliefs $h \sim \mathrm{Beta}(\alpha, \beta)$ upon observing private evidence by sequential application of Bayes' rule $P(h \mid d_t) \propto P(d_t \mid h) P(h)$ at each trial $t$ using simple conjugate updates.

Naïve Social Learning: Level-1 Inference
Our first social inference model, Level-1 inference, combines observed private evidence with observed judgments from other agents to update beliefs $h \sim \mathrm{Beta}(\alpha, \beta)$. Specifically, for every other agent $i$, Level-1 uses that agent's most recent judgment $j^{(i)}_{t-1}$ to compute a posterior probability for each of the agent's most recent private evidence observations, $P(d^{(i)}_{t-1} \mid h^{(i)}_{t-2}, j^{(i)}_{t-1})$, given the agent's previous belief $h^{(i)}_{t-2}$ and observed judgment $j^{(i)}_{t-1}$ (Eq. (3)). Here, $h^{(i)}_{t-2}$ corresponds to agent $i$'s own beta-distributed belief prior to incorporating their private evidence observation $d^{(i)}_{t-1}$. For simplicity, Level-1 inference assumes that other agents are "reliable" (cf. Hawthorne-Madell & Goodman, 2019), meaning that they assign a higher likelihood to response bins with a high posterior probability under their beliefs (see Additional Model Details for further details). Consequently, Level-1 inference produces a prediction about the most likely evidence $d^{(i)}_{t-1}$ leading to the observed judgment $j^{(i)}_{t-1}$. This inference captures how the previous belief $h^{(i)}_{t-2}$ of an agent $i$ must have changed in response to the inferred evidence to produce the observed judgment. As an example, imagine that $h^{(i)}_{t-2}$ is equal to $\mathrm{Beta}(\alpha = 1, \beta = 1)$. If we now observe a new judgment $j^{(i)}_{t-1}$ which is equal to "1-blue", we would identify $d^{(i)}_{t-1} = \text{blue fish}$ as the most probable evidence leading to the new judgment. Importantly, this inference process allows us to incorporate uncertainty, meaning that we update our belief about another agent's parameters ($\alpha = 1$ and $\beta = 1$) based on the probability of a blue fish versus a red fish given $j^{(i)}_{t-1}$. For example, assuming a uniform prior over private evidence and a judgment such as "1-blue", we may arrive at a posterior probability of 0.95 for $d^{(i)}_{t-1} = \text{blue fish}$, a probability of 0.01 for $d^{(i)}_{t-1} = \text{red fish}$, and a probability of 0.04 for $d^{(i)}_{t-1} = \text{no evidence}$, meaning that the updated marginal belief about $i$'s belief $h^{(i)}_{t-1}$ corresponds to $\mathrm{Beta}(\alpha = 1.01, \beta = 1.95)$.
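The inversion step can be sketched as follows, under loudly stated assumptions: judgments are assumed to be generated in proportion to the mass of the agent's beta belief in seven equal-width bins, raised to a hypothetical `sharpness` exponent that stands in for the "reliability" assumption. The paper's actual likelihood is specified in its Additional Model Details, so this sketch does not reproduce the 0.95 / 0.01 / 0.04 figures quoted in the text; it only shows the mechanics. All function names are hypothetical, and $\alpha$ is treated as the red count and $\beta$ as the blue count, as the Beta(1.01, 1.95) example implies.

```python
import math


def beta_cdf(x, a, b, steps=2000):
    """CDF of Beta(a, b) at x via composite Simpson integration (assumes a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    width = x / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        u = k * width
        total += weight * norm * u ** (a - 1) * (1 - u) ** (b - 1)
    return total * width / 3


def bin_mass(a, b, index, n_bins=7):
    """Mass of Beta(a, b) in the judgment bin with 0-based index (0 = 3-blue, 6 = 3-red)."""
    return beta_cdf((index + 1) / n_bins, a, b) - beta_cdf(index / n_bins, a, b)


def evidence_posterior(a, b, judgment_index, sharpness=1.0):
    """P(evidence | previous belief Beta(a, b), judgment) with a uniform prior over evidence."""
    candidates = {"red": (a + 1, b), "blue": (a, b + 1), "none": (a, b)}
    scores = {
        ev: bin_mass(ua, ub, judgment_index) ** sharpness
        for ev, (ua, ub) in candidates.items()
    }
    z = sum(scores.values())
    return {ev: s / z for ev, s in scores.items()}


def soft_update(a, b, posterior):
    """Add the posterior probabilities of red and blue as fractional pseudo-counts."""
    return a + posterior["red"], b + posterior["blue"]


post = evidence_posterior(1, 1, judgment_index=2)   # observed "1-blue" from Beta(1, 1)
print(max(post, key=post.get))                      # blue

# Using the probabilities quoted in the text instead of the sketch's own likelihood:
quoted = {"blue": 0.95, "red": 0.01, "none": 0.04}
print(tuple(round(v, 2) for v in soft_update(1, 1, quoted)))  # (1.01, 1.95)
```

The second call shows why the text arrives at Beta(1.01, 1.95): each possible observation contributes to the belief in proportion to its posterior probability.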
Level-1 inference assumes that all agents form beliefs independently of one another, that is, it implicitly and erroneously assumes that agents never observe each other's judgments. Given this assumption, a Level-1 learner can update its belief $h$ at each time step by conditioning jointly on its current private evidence $d_t$ and the inferred beliefs behind the judgments of $n$ independent agents (Eq. (4)), where $h^{(1)}_{t-1}, \dots, h^{(n)}_{t-1}$ are the inferred beliefs for the $n$ agents based on their own (unobserved) private evidence from the previous time step, obtained from Eq. (3).
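One simple way to realize this kind of face-value aggregation is to pool pseudo-counts, as in the sketch below. This is an illustrative assumption, not the paper's Eq. (4): every agent is assumed to start from the same Beta(1, 1) prior, and the evidence inferred for each agent is taken to be the difference between that agent's inferred belief and the prior. The function name `level1_belief` is hypothetical.

```python
def level1_belief(prior, own_counts, inferred_beliefs):
    """Pool own evidence with the evidence inferred behind each other agent's judgment.

    prior            -- (alpha, beta) shared by all agents (assumed)
    own_counts       -- (red, blue) fish the learner caught itself
    inferred_beliefs -- list of (alpha, beta) inferred for the other agents
    Agents are treated as independent, so inherited evidence may be counted twice.
    """
    a0, b0 = prior
    a = a0 + own_counts[0]
    b = b0 + own_counts[1]
    for a_i, b_i in inferred_beliefs:
        a += a_i - a0  # inferred red pseudo-counts for this agent
        b += b_i - b0  # inferred blue pseudo-counts for this agent
    return a, b


# The learner caught one red fish; B and C are each inferred to hold Beta(1.01, 1.95).
a, b = level1_belief((1, 1), own_counts=(1, 0), inferred_beliefs=[(1.01, 1.95), (1.01, 1.95)])
print(round(a, 2), round(b, 2))  # 2.02 2.9
```

Because the two inferred beliefs are simply added, any evidence that C inherited from B would enter the total twice, which is the error the structure-sensitive model is designed to avoid.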

Rational Social Learning: Level-2 Inference
Our most sophisticated model, Level-2 inference, extends Level-1 by considering whether agents can see each other's judgments during previous time steps. Level-2 inference corrects for such dependencies by evaluating agents' beliefs to build a joint probability distribution over the potential histories of private evidence observed by each agent, which can then be marginalized over. If the structure is unknown, this can be done separately under every structure hypothesis $s \in \mathcal{S}$, and the learner can additionally marginalize over their structure uncertainty. Formally, Level-2 inference is expressed in Eq. (5). Here, $P(s)$ corresponds to the probability of a given network structure $s \in \mathcal{S}$, and $P(h^{(n)}_{t-1} \mid h^{(1)}_{t-2}, \dots, h^{(n-1)}_{t-2}, j^{(n)}_{t-1}, s)$ is computed recursively until a termination condition is met. In our setup, this termination condition could be (1) finding an agent that has no parents (i.e., an independent agent) or (2) the start of the game if there are no independent agents.
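The core idea of correcting for dependencies can be illustrated with a deliberately simplified sketch. It is not the paper's Eq. (5): evidence is reduced to signed units (positive for red, negative for blue), time lags are ignored, each agent has at most direct parents with no longer chains, and the names `own_evidence`, `naive_total`, and `structured_total` are hypothetical.

```python
def own_evidence(agent, implied, parents):
    """Evidence attributable to the agent alone.

    implied -- net evidence each agent's judgment implies at face value
    parents -- maps an agent to the agents whose judgments it can see
    An agent without parents is independent, so its judgment is taken at face value.
    """
    inherited = sum(implied[p] for p in parents.get(agent, []))
    return implied[agent] - inherited


def naive_total(own, implied):
    """Level-1 style tally: every judgment is counted at face value."""
    return own + sum(implied.values())


def structured_total(own, implied, parents):
    """Level-2 style tally: inherited evidence is removed before counting."""
    return own + sum(own_evidence(agent, implied, parents) for agent in implied)


# B and C both lean "1-red" by one unit; C can see B, and the learner has no evidence.
implied = {"B": 1, "C": 1}
parents = {"C": ["B"]}
print(naive_total(0, implied))                 # 2
print(structured_total(0, implied, parents))   # 1
```

In this toy case C's red-leaning judgment is fully explained by B's, so the structure-sensitive tally counts the underlying red evidence once rather than twice.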
Importantly, in the present analyses, $s$ was provided to Level-2 inference (either through experimental instructions or by eliciting structure judgments from participants), which allows us to disregard the marginalization over structures $\mathcal{S}$. We describe how to update $P(s)$ given a judgment sequence in Appendix Structure Learning. To understand the intuition behind Level-2 inference, consider condition two (A told C sees B) from Experiment 1. Here, B is an independent agent and the parent of C, who can see B's judgments. To infer the evidence behind both B's and C's observed judgments, Level-2 inference involves first computing $P(d^{(B)}_{t-1} \mid h^{(B)}_{t-2}, j^{(B)}_{t-1})$ and updating B's belief, and then inferring C's evidence in a way that conditions on the putative evidence B observed two time steps ago, $d^{(B)}_{t-2}$, which C presumably inferred and incorporated at the previous time step. Thereafter, the model updates $h^{(C)}_{t-1}$ based on both the private evidence imputed for C, $d^{(C)}_{t-1}$, and the private evidence C previously imputed from B, $d^{(B)}_{t-2}$. Consequently, if for example C provides a judgment $j^{(C)}_{t-1}$ equal to "1-red" upon observing a judgment $j^{(B)}_{t-2}$ equal to "1-red" from B at $t - 2$, Level-2 inference accounts for this dependency when inferring $d^{(C)}_{t-1}$, such that the evidence most likely to have produced the observed judgments is not counted twice (see dependency in Fig. 1b).

Accommodating Autocorrelation: "Sticky" Models
In our task, participants are required to make the same judgment ten times as evidence arrives (Fig. 2, left). This setup presents a challenge for the above inference models, which predict that judgments at a specific time point are based on the total evidence (both observed and inferred) and are independent of the participant's previous judgments. A wealth of research shows that people's responses, when probed repeatedly, tend to be autocorrelated over and above what is licensed by the evidence. This has been shown in both single-learner (e.g., Bramley, Dayan, Griffiths, & Lagnado, 2017; Dasgupta, Schulz, Tenenbaum, & Gershman, 2020; Hogarth & Einhorn, 1992; Lieder, Griffiths, Huys, & Goodman, 2018) and multi-agent settings (e.g., Fränken, Theodoropoulos, & Bramley, 2022). To partially accommodate such order effects, and thereby enhance our models' ability to capture the meaningful patterns in participants' judgment sequences, we incorporated an additional variant ("family") of the inference models. In addition to using the cumulative density $F(\alpha, \beta)$ (Eq. (1)) to assign discrete probabilities to each judgment $j_k \in \{j_1, j_2, \dots, j_m\}$ followed by a softmax function, the second family of models incorporates an additional free parameter (a mixture weight), which mixes each soft-maxed model prediction with the possibility of simply "sticking" to the previous judgment (see Additional Model Details for details). Lastly, we include a random baseline model that predicts each judgment with a uniform probability of $1/m$ ($m = 7$), resulting in a total of seven competing models for each experiment.
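The "sticky" mixture can be sketched as below. The exact parameterization is given in the paper's Additional Model Details; here the softmax is applied directly to the model's bin probabilities with a hypothetical `temperature`, and `stick_weight` plays the role of the mixture weight. Both names, and the choice to soft-max raw probabilities, are assumptions made for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of values."""
    scaled = [v / temperature for v in values]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sticky_probs(model_probs, prev_index, stick_weight, temperature=1.0):
    """Mix the soft-maxed model prediction with repeating the previous judgment.

    With probability stick_weight the previous judgment is repeated; otherwise the
    response is drawn from the soft-maxed model prediction.
    """
    soft = softmax(model_probs, temperature)
    return [
        (1 - stick_weight) * p + stick_weight * (1.0 if k == prev_index else 0.0)
        for k, p in enumerate(soft)
    ]


model_probs = [0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05]  # illustrative values only
mixed = sticky_probs(model_probs, prev_index=2, stick_weight=0.5, temperature=0.1)
print([round(p, 3) for p in mixed])
```

Setting `stick_weight` to zero recovers the non-sticky family, so the two families are nested and can be compared on the same judgment sequences.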

Experiments
Using our task and computational models, we study human inferences across three behavioral experiments. In Experiment 1, one participant (playing agent A) interacts with two simulated agents (agents B & C). In Experiments 2 and 3, a focal participant (agent A) interacts with two other real human participants (agents B & C). For each experiment, we first unpack participants' judgment patterns descriptively, followed by an aggregate (group-level) analysis comparing directional differences between conditions. We then evaluate each model's predictive accuracy on an individual participant level using leave-one-out cross-validation, which is the main focus of our analysis. The experiments were not preregistered. Based on our previous work (Fränken et al., 2020, 2021), we hypothesized that participants' judgments in this setting would be best described by a naïve, Level-1 account.

Experiment 1

Setup
We first examined a controlled setting with one participant (agent A) and two artificial agents, B and C, each operating under known and unknown social network structures. Using this setup, we studied participants' inferences across three between-subject conditions. In the first condition, no structure information, participants did not receive any information about the relationship between B and C, thus requiring them to infer the underlying network structure at the end of the game. In the second condition, A told C sees B, and the third condition, A told B sees C, participants were instructed that agents B and C were non-independent, changing the impact of their judgments from the Level-2 model perspective.

Participants
We recruited 150 adults from Prolific Academic (Palan & Schitter, 2018), aiming for a sample size of 50 participants randomly assigned to each condition. Four participants dropped out, resulting in a final sample of $N = 146$ (age: 35.94 ± 15.52; 98 female, 48 male). Of these, 47 participants were assigned to the first condition (no structure information), 50 participants to the second condition (A told C sees B), and 49 participants to the third condition (A told B sees C). Participants received a payment equivalent to an hourly rate of £5.02. This payment included a base amount of £1.00 and a performance bonus of up to £1.50. To incentivize high-quality judgments, we paid each participant a small bonus of £0.15 on every trial in which the participant's judgment was in the correct direction on the scale from the midpoint, where the "correct" direction was determined by the combined amount of evidence observed by the entire network. Before starting the main task, participants completed a brief training to familiarize themselves with the game environment and reward structure. Detailed instructions are provided in Fig. A.1 and our online .

Stimuli
In all three conditions, participants played the role of agent A and caught a single red fish themselves on trial 2 ($t = 2$) and no fish on any other trial. In all three conditions, the participants also saw the same sequence of opposing judgments from B and C (Fig. 2, left), which, depending on the relationship between B and C, led to qualitatively different predictions for Level-2 inferences. We selected a judgment sequence for agents B and C to produce a large qualitative difference for Level-2 predictions between conditions. When generating stimuli, we assumed that agents B and C were reliable (cf. Hawthorne-Madell & Goodman, 2019), meaning that their provided judgments corresponded to those with a high posterior probability under their respective beliefs $h$. The resulting judgment sequences for B and C are shown in Fig. 2, left. We note that there is no ground truth in this condition, as we picked B's and C's judgments in advance to separate both model predictions within a condition as well as a given model's predictions between conditions.

Results
We begin by describing participants' average judgments (Fig. 4a) across different conditions. In the first condition (no structure information), the most likely private evidence distribution on the planet indicated participants were on Planet Blue. Consistent with this, 80% of participants gave a judgment in favor of Planet Blue at the final time step (mean ± standard error = 1.16 blue ± 0.20). In the second condition (A told C sees B), the most likely private distribution suggested participants were on Planet Red. However, only 34% of participants opted for Planet Red in the final judgment (and on average, preferred blue with a mean of 0.34 blue ± 0.19). Finally, in the third condition, 65% preferred blue at the final time step (0.98 blue ± 0.21), which again aligned with the true evidence distribution.
To better understand individual-level behavior, we derived quantitative predictions for each inference model. The number of participants best predicted by each model during cross-validation within each condition is shown in Fig. 4b. Results from our individual-level analysis suggest predominantly naïve, structure-insensitive inferences with a bias towards sticking with the previous judgment (Level-1 sticky). Specifically, across all conditions, our Level-1 sticky account best predicted 48% of participants overall, while 24% across conditions were best characterized by the rational, structure-sensitive Level-2 sticky competitor. The majority of the remaining 28% across conditions were best accounted for by the non-sticky Level-1 and Level-2 variants. These results suggest that a nontrivial number of participants were able to account for dependencies between agents' judgments in line with the predictions of Level-2 inference, while the majority were better described by a structure-insensitive naïve account.
In condition one with no structure information, we wanted to ensure that the simulated judgment sequences for B and C made it difficult to infer the existence or direction of any dependency between B and C, since this would undermine the instruction manipulation. To test the degree to which the sequence of simulated judgments was informative about the communication structure, we present participants' structure judgments in condition one alongside posterior probabilities for each connection under a Bayesian structure learning model (see Structure Learning for details; Fig. 4c). This reveals that a normative structure learner assigns a low probability to a dependent communication structure being behind this sequence, assigning a probability of 0.03 to C sees B and 0.21 to B sees C. Participants' edge selections were also lower than random chance at 22% and 37%, suggesting that they could not tell whether or not there was a dependency between B and C. Overall, the above edge selections primarily functioned as a sanity check and were deemed sufficiently low for our manipulation to be effective. Interestingly, this analysis did reveal that both participants and our structure learner frequently and erroneously hallucinate that either or both of B and C were reacting to their judgments (i.e., those of agent A).
Overall, in Experiment 1, the dominant pattern of judgments reflected naïve social inference, with most participants overlooking the dependency between sources. This is also evidenced by a preference for blue across all three conditions at the final time step, despite the simulated agents' judgments being rationally consistent with there having been more red than blue fish caught overall in condition C sees B (see Fig. 4a). Our group-level analysis revealed that, averaged over time, participants' judgments in condition C sees B were significantly "redder" than in the other two conditions, qualitatively in line with the Level-2 account, which best predicted 24% of participants' individual inferences across conditions. This suggests that, assuming a controlled setting with simulated agents, we are able to replicate previous structure-sensitive inference (e.g., Fränken et al., 2020; Whalen et al., 2018). However, our detailed investigation of individual-level behavior revealed that this statistical effect was driven by only a minority of the participants.

Experiment 2

Setup
A shortcoming of the simulated agent setting is that it is not clear how human-like the behavior of the simulated agents is and what effect that has on the judgments. To explore a slightly more naturalistic setting, we next investigate inferences with three real participants. For Experiment 2, we focused on a known network structure setting and compared two between-subject conditions. In the first condition (independent known), participants playing the roles of B and C were independent of one another, that is, they could not see each other's judgments but only received private evidence. In the second condition (C sees B known), participant C could see participant B's previous judgments, making C's judgments dependent on B. All participants were made aware of the communication structure prior to starting the task, and it was visualized throughout the experiment (see Fig. 2a). Apart from the above differences, instructions and procedures in Experiment 2 were identical to those in Experiment 1. Since this network structure means that agent A is the only one that has to deal with dependent information sources, we report our primary analyses from the perspective of the focal participant (agent A).

Participants
To recruit participants via Prolific, we developed a client/server application enabling real-time interactions between three randomly matched participants using their web browsers. Once matched, each triad was randomly assigned to one of the two between-subject conditions, and the three participants were randomly assigned to one of the three roles. Overall, we recruited 126 participants (25.82 ± 7.08 years, 70 female, 56 male) comprising 42 triads. Twenty-one triads (63 participants) were assigned to condition one (independent known) and another 21 triads (63 participants) were assigned to condition two (C sees B known). Participants were paid as in Experiment 1.

Stimuli
Similarly to Experiment 1, the participant playing agent A observed one red fish on trial two. Meanwhile, the participant playing agent B observed one blue fish on trial three, and agent C observed one red fish on trial seven. The staggered and alternating private evidence seen by players A, B and C was selected to produce differences in the judgments of agent A depending on both the structure condition and whether they performed predominantly naïve Level-1 or Level-2 inferences. To unpack why this is the case, recall that agent B does not get any social evidence, and agent C gets only evidence about B's earlier judgments in condition two (C sees B known), meaning only agent A needs to worry about dependence between judgments. Given this setup, we anticipate that B will make blue-leaning judgments in both conditions since all they see is one blue fish. In condition two, agent C will regard B's blue-leaning judgments as initial evidence favoring the blue planet before their own red catch finally leaves them with a roughly neutral judgment. If A overlooks this dependency (naïve inference), they miss the chance to deduce that C's eventually neutral judgment is actually most compatible with C having seen a red fish. As a result, a naïve agent A will consider their own red fish and B's blue judgment as providing insufficient evidence for either planet, leading to A holding a neutral final judgment in condition two. In condition one (independent known), C is likely to make red-leaning judgments and so A should also favor red on the balance of evidence. However, if A correctly identifies the structure and adjusts for the dependency between B and C in a rational, Level-2 manner, we would expect A to infer the additional red fish from C's neutral judgments in condition two, resulting in a preference for the red planet in both conditions.
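
The predicted pattern can be made concrete with a minimal sketch. It rests on simplifying assumptions that are ours and not part of the reported models: each planet's majority color makes up 2/3 of its fish (a 2:1 likelihood ratio per fish), all agents start from a uniform prior, and B and C report their beliefs without noise. The function names `p_red` and `final_beliefs` are hypothetical.

```python
from fractions import Fraction


def p_red(net_red: int) -> Fraction:
    """Posterior for Planet Red given the net count of red minus blue fish.

    Assumes a uniform prior and a 2:1 likelihood ratio per fish, so the
    posterior odds for red are 2 ** net_red.
    """
    odds = Fraction(2) ** net_red
    return odds / (odds + 1)


def final_beliefs(c_sees_b: bool) -> dict:
    """Final belief of agent A under naive (Level-1) and Level-2 inference."""
    a_fish, b_fish, c_fish = +1, -1, +1  # red fish = +1, blue fish = -1

    # B only has private evidence; C adds B's judgment if C can see it.
    b_judgment = b_fish
    social_input_to_c = b_judgment if c_sees_b else 0
    c_judgment = c_fish + social_input_to_c

    # Naive A treats every judgment as independent first-hand evidence.
    naive = a_fish + b_judgment + c_judgment

    # Level-2 A subtracts what C inherited from B to recover C's own catch.
    c_private = c_judgment - social_input_to_c
    level2 = a_fish + b_judgment + c_private

    return {"naive": p_red(naive), "level2": p_red(level2)}


print(final_beliefs(c_sees_b=True))
print(final_beliefs(c_sees_b=False))
```

Under these assumptions the two accounts coincide when B and C are independent and come apart only when C sees B, where the naïve learner ends up neutral and the Level-2 learner recovers the belief implied by two red fish and one blue fish.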

Results
At the final time step, there was no statistical difference between A's judgment preferences across conditions (Fig. 5a). In condition one (independent known), 52.4% of participants assigned to A preferred Planet Red (mean ± standard error = 0.29 red ± 0.29). In condition two (C sees B known), the preference for red remained the same at 52.4%, although the mean was smaller at 0.14 red ± 0.44. This may be a result of the structure manipulation, although there was no difference between conditions when averaging A's judgments across time steps (standardized z-score: z = 0.70, p > 0.05, CLES = 0.564). Full judgment sequences are presented in Fig. A.3. We next repeated our cross-validation analysis, which again supported predominantly naïve inference with a bias towards sticking close to earlier judgments (Level-1 sticky) in condition two (C sees B known), best predicting 62% of participants compared to only 10% best accounted for by structure-sensitive Level-2 sticky (Fig. 5b). This was a notably smaller percentage compared to Experiment 1. Note that in condition one (independent known), Level-1 and Level-2 inference predictions coincide, as the naïve aggregation of evidence is the same as assuming independence. Overall, results from Experiment 2 replicate the pattern of predominantly naïve inferences found in Experiment 1 but now do so in a genuinely social scenario.
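
For readers unfamiliar with the common language effect size (CLES), it can be read as the probability that a randomly drawn judgment from one condition exceeds a randomly drawn judgment from the other, with 0.5 indicating no difference. The sketch below illustrates this reading; counting ties as one half is our assumption about the variant used, the function name `cles` is hypothetical, and the input values are made up for illustration and are not data from the experiments.

```python
from itertools import product


def cles(xs, ys):
    """Probability that a random x exceeds a random y, counting ties as half."""
    wins = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in product(xs, ys)
    )
    return wins / (len(xs) * len(ys))


# Hypothetical time-averaged judgments (illustrative only, not study data)
independent = [0.5, 0.2, -0.1, 0.8]
c_sees_b = [0.3, 0.2, -0.4, 0.6]
print(cles(independent, c_sees_b))
```

Values close to 0.5, such as the 0.564 reported above, indicate that judgments in the two conditions overlap almost completely.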

Experiment 3

Setup
In Experiment 2, we gave participants complete access to the structure of their social network (i.e., full knowledge of who sees whose judgments). However, in real social interactions, we often have no, limited, or uncertain knowledge about other people's precise communication histories. To assess how this additional layer of uncertainty and complexity affects inferences in the current learning problem, we finally studied a setting with three human agents and an initially unknown social network structure. As with the first condition from Experiment 1, participants had to provide structure judgments at the end of the task, which we used for the Level-2 model.

Participants
We used the same client/server software as in Experiment 2 to synchronize participants into triads. Overall, we recruited 129 adults (25.22 ± 7.04 years, 61 female, 68 male) through Prolific and paid them as in Experiments 1-2. Procedures were identical to Experiment 2, with the only exception being the omission of the network structure instruction and the structure visualization throughout the task. In total, 22 triads (66 participants) were assigned to condition one (independent unknown) and 21 triads (63 participants) were assigned to condition two (C sees B unknown).

Stimuli
As in Experiment 2, agent A caught a red fish on trial two, B caught a blue fish on trial three, and C caught a red fish on trial seven.

Results
There was no notable difference between A's judgment preferences across conditions at the final time step (Fig. 6a). In condition one (independent unknown), 45.5% of participants assigned to A preferred Planet Red (mean ± standard error = 0.41 red ± 0.38). In condition two (C sees B unknown), the preference for red remained similar at 47.6%, again with a lower average judgment of 0.29 red ± 0.25. This was presumably a result of the structure manipulation, although there was no significant difference between conditions when looking at A's time-averaged judgment sequence (standardized z-score: z = 0.486, p > 0.05, CLES = 0.544). Our cross-validation model fit for Experiment 3 revealed that Level-1 sticky best predicted 41% of participants in condition one, which was again higher than Level-2 sticky (23%; see Fig. 6b). In condition two, Level-1 sticky and Level-2 sticky inference (conditioning on participants' finally judged structure, see below) best predicted the same proportion of participants (29%). Overall, these results provide additional support for the previously found naïve pattern across conditions. Finally, examining the quality of the structure judgments, the proportion of participants marking each edge is shown in Fig. 6c and compared against the marginal posterior edge probabilities under our Bayesian structure learning model. As expected, the proportion of participants marking an edge from B to C was higher in condition two (60% for participants and 75% for the Bayesian structure learner) than in condition one, in which B and C were independent (31% for participants and 60% for the Bayesian structure learner). Notably though, the accuracy of the structure judgments by both participants and the normative structure learner was low. Both participants and the model often wrongly judged that their own (agent A's) judgments were visible to agents B and C.
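
A marginal posterior edge probability is the total posterior mass of all candidate structures that contain that edge. The following sketch illustrates this step only; it is not the structure learning model used here. The posterior over structures is a hypothetical input, and the function name `marginal_edge_probabilities` is our own.

```python
def marginal_edge_probabilities(posterior):
    """Sum posterior mass over all structures containing each directed edge.

    posterior: dict mapping a frozenset of (source, target) edges to an
    (optionally unnormalized) posterior weight for that structure.
    """
    total = sum(posterior.values())
    marginals = {}
    for edges, weight in posterior.items():
        for edge in edges:
            marginals[edge] = marginals.get(edge, 0.0) + weight / total
    return marginals


# Hypothetical posterior over three candidate structures (not fitted values)
posterior = {
    frozenset(): 0.25,                          # B and C independent
    frozenset({("B", "C")}): 0.5,               # C sees B
    frozenset({("B", "C"), ("A", "C")}): 0.25,  # C sees both A and B
}
print(marginal_edge_probabilities(posterior))
```

Comparing such marginals against the proportion of participants who marked each edge is what allows participants and the structure learner to be placed on the same scale.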

Differences between experiments
An important consideration in interpreting the above findings is the different nature of the social evidence (i.e., judgments by B and C) between Experiments 2-3 and Experiment 1. The social evidence in Experiment 1 was simulated to align with rational and reliable inference patterns (see Eq. (A.2)), while in Experiments 2-3, the judgments by B and C were made by actual human participants. Human judgments varied across triads, often diverging from the patterns predicted by rational simulations (see Fig. A.3). While this discrepancy had no direct influence on our ability to characterize agent A's (participant) inferences as predominantly naïve, it significantly affected the performance of the structure-sensitive Level-2 learner. Specifically, Level-2 predictions diverged directionally from those of an omniscient observer of all the caught fish (see Fig. 5a and Fig. 6a). Notably, in Experiment 3 condition two, where the social network structure was undisclosed and agent C could see agent B's judgments, Level-2 inferences in the role of agent A led to predictions that were directionally wrong on average, slightly favoring the blue planet (Fig. 6a) despite the overall evidence observed by all agents being two red fish versus one blue fish. This was due to Level-2 inference incorrectly assuming that the human agents were also behaving like rational utility-maximizing learners, while model fitting suggests this was only true for a small proportion of participants. Diverging from Level-2's predictions, participants playing agent A made judgments that actually aligned more closely with the ground truth, favoring the red planet in both conditions of Experiment 3. This is consistent with the idea that participants relied on simpler accumulation strategies, leading to inferences more robust to the complexity and ambiguity inherent in social evidence.

General discussion
A key feature of our analyses and results is that the structure of a social network has the potential to shape the beliefs of its members, often away from the ground truth. As such, we view our results as complementary to prior simulation-based studies that demonstrate this distorting effect of social network structure (Fränken & Pilditch, 2021; Hahn et al., 2020; Lewandowsky, Pilditch, Madsen, Oreskes, & Risbey, 2019; Madsen, Bailey, & Pilditch, 2018; Madsen & Pilditch, 2018). These studies simulate how information propagates through large artificial networks, typically assuming that social communications are received and integrated accurately but ''naïvely'' (in our terminology), albeit exploring how they may be tempered by agent-specific considerations like trust. For example, Hahn et al. (2020) show that both dense connectivity and clustering in artificial social networks reduce the ''truth tracking'' of information propagation among otherwise rational agents. Here we show that even in minimal three-person social networks with known structure, the kinds of naïve inference that produce these distortions are the dominant behavior of human social reasoners.
Building on a distinct literature that has developed computational models of rational social learning, we noted how previous social inference accounts depend on a utility-maximizing assumption under which people reverse engineer one another's mental states and the evidence behind them by inverting a generative model of how those people form and express their beliefs. Our experiments probed the extent to which this framework captures real multi-agent social learning dynamics. The results suggest that while a minority of participants exhibit hallmarks of sensitivity to the principles of such rational social inferences, the dominant inference mode is more naïve, seeming to lack accommodation of communication-history-based dependencies between peers. Our results thus challenge a number of recent findings suggesting that human social learners reliably engage in sophisticated, rational inferences when reasoning from others' behavior (Baker et al., 2017; Fränken et al., 2020; Whalen et al., 2018). A possible implication of this is that individuals may not only discount or inflate the evidence from dependent sources when directly hearing from them, but also when receiving information about dependent sources' opinions indirectly. For instance, if person B recommends a restaurant and mentions that person C also enjoys it, a naïve recipient A, upon meeting B and hearing about the restaurant, may mistakenly update their beliefs based on B's report.
Moreover, results from our all-human networks with noisy heterogeneous communication patterns demonstrate that assuming other agents are rational reasoners (e.g., Jara-Ettinger et al., 2015) can also be a cause of systematic judgment errors when those agents fall short of this standard (Fig. 6a). Contrary to the directionally wrong predictions of the rational Level-2 model, human inferences were surprisingly robust, with most agent A participants ending up with a directionally correct belief about the planet they were on. This might suggest we adopt simpler models of our peers than the utility-maximizing ''homo economicus'' central to more idealized accounts (Baker et al., 2009, 2017; Jara-Ettinger et al., 2020), thereby accounting for the fact that others' reasoning strategies are also often fallible and naïve. This is essentially the opposite move to attempting to accommodate such inter-individual limitations and heterogeneity explicitly in one's social reasoning. For instance, one could articulate a ''Level-3'' extension that attempts to anticipate the various departures peers make from rational inferencing (Alanqary et al., 2021). However, this approach is computationally expensive and demands additional theory-of-mind recursion and costly marginalization (Alon, Schulz, Dayan, & Rosenschein, 2022; Camerer, Ho, & Chong, 2004; Oey, Schachner, & Vul, 2022). On the face of it, such complex reasoning seems unlikely to be resource rational in most circumstances (Lieder et al., 2018), given that the computational limitations of the social reasoner are on average going to be just as severe as those plaguing their social peers. As such, we propose that, collectively, human societies may find a better computation-accuracy trade-off in the social inference sphere through mutual adoption of more naïve social learning heuristics.
There are obvious limitations to our study. While our focus on human-human interactions extends previous work that was restricted to simulated social peers (Enke & Zimmermann, 2019; Fränken et al., 2020) and known dependencies (Pilditch et al., 2020; Whalen et al., 2018), the limited degree to which participants engaged in rational inferences might be a result of the low bandwidth of the social evidence (i.e., rating scales) or the incentive structure (participants were rewarded for their own success irrespective of others). Future extensions of the present paradigm could thus explore the consequence of allowing participants to provide richer, more linguistic, social evidence, and so increase the bandwidth of communication channels in the network. Moreover, extensions could examine the impact of cooperative incentives, which might lead participants to signal strategically and make correspondingly different social inferences in order to maximize a group's overall payoff. Such a setting might plausibly result in more deeply recursive social inferences, as a learner's reward would directly depend on the beliefs and performance of their peers. Cooperative incentives would also open the door to incorporating theories of active learning (Coenen, Nelson, & Gureckis, 2019) and metacognition (Fleming & Daw, 2017) to capture how people might use their communication signals to resolve uncertainty about peers' competence, motivations, or about the social network structure. Another limitation of the present work is that we did not account for participants' perceived credibility or reliability estimates of other players, which have previously been shown to influence social belief revisions and dependency judgments (Bovens, Hartmann, et al., 2003; Hahn, Harris, & Corner, 2009; Harris, Hahn, Madsen, & Hsu, 2016; Madsen, Hahn, & Pilditch, 2020). Additional modeling extensions could thus probe or manipulate learners' beliefs about the expertise or trustworthiness of peers' judgments to better understand how people weigh private versus social evidence. Furthermore, given that people's judgments in our task were best described by autocorrelated (''sticky'') variants of our models, another extension could be to incorporate others' tendency towards autocorrelation when reasoning about their judgments. Finally, it is important to further establish the conditions under which naïve inference can be adaptive, as well as its implications for explaining population dynamics like information cascades (Bikhchandani, Hirshleifer, & Welch, 1992) and echo chambers (Madsen & Pilditch, 2018).
In sum, we found that people's inferences in an iterated social learning setting were relatively insensitive to dependencies between their network peers. Moreover, we found that simulated rational learners who assume peers behave rationally make systematic judgment errors when reasoning from genuine noisy human judgments. In contrast, human learners appear to succeed in our task through naïve inference

Fig. 1. High-level overview of the paper. [a] Task illustration from the perspective of agent A (the focal participant across experiments). Over the course of ten trials, participants reason about their unknown location in space, i.e., whether they are on Planet Blue or Planet Red. Each planet has a different proportion of red and blue fish. On Planet Blue, 2/3

Fig. 2. Example trial from Experiment 1. Left: Participants (agent A) provide a planet judgment at trial ten. At this trial, the participant did not observe additional private evidence (i.e., they did not catch fish). Previous private evidence (one red fish observed at trial two), the participant's own judgments, as well as the previous judgments provided by agents B and C are shown in the trial summary section. In the displayed condition (A told C sees B), the communication structure was known to participants. In addition to explaining the communication structure to participants in the instructions (Fig. A.1), the trial summary section includes arrows to indicate who can see whose judgments. Right: In condition one with no structure information, participants (agent A) provide a structure judgment after completing all ten planet judgments. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Results from Experiment 1. [a] Average human and model judgments (y-axis) across the ten time steps (x-axis) of our task for each condition. [b] Cross-validation results showing the number of participants best fit (y-axis) by each model (x-axis). [c] Average edge probabilities for participants (left) and a Bayesian structure learner (right; see Structure Learning) provided in condition one with no structure information. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Results from Experiment 2. [a] Average human and model judgments (y-axis) across the ten time steps (x-axis) of our task for each condition. ''Combined private evidence'' refers to the ground truth, i.e., the predictions of an omniscient learner that had access to all private evidence observed by agents (see Fig. A.2b). [b] Cross-validation results showing the number of participants best fit (y-axis) by each model (x-axis). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. Results from Experiment 3. [a] Average human and model judgments (y-axis) across the ten time steps (x-axis) of our task for each condition. ''Combined private evidence'' refers to the ground truth, i.e., the predictions of an omniscient learner that had access to all private evidence observed by agents (see Fig. A.2c). [b] Cross-validation results showing the number of participants best fit (y-axis) by each model (x-axis). [c] Average edge probabilities for participants (left) and a Bayesian structure learner (right). Inferred edge probabilities from B to C in condition two were higher (60% for participants and 75% for a normative structure learner) as compared to condition one, in which B and C were independent (31% for participants and 60% for a normative structure learner).