Active causal structure learning in continuous time

Research on causal learning has largely focused on learning and reasoning about contingency data aggregated across discrete observations or experiments. However, this setting represents only the tip of the causal cognition iceberg. A more general problem lurking beneath is that of learning the latent causal structure that connects events and actions as they unfold in continuous time. In this paper, we examine how people actively learn about causal structure in a continuous time setting, focusing on when and where people intervene on the system and how this shapes their learning. Across two experiments, we find that participants’ accuracy depends on both the informativeness and evidential complexity of the data they generate. Moreover, intervention choices strike a balance between maximizing expected information and minimizing expected inferential complexity. That is, we find people time and target their interventions to create simple yet informative causal dynamics. We discuss how the continuous-time setting challenges existing computational accounts of active causal learning, and argue that metacognitive awareness of one’s inferential limitations plays a critical role for successful learning in the wild.

In this paper we show that human learners are sensitive to time, not just in terms of how it impinges on what can be learned from evidence in principle, but also in terms of how it shapes the practicalities of interpreting that evidence as it arrives. Our experiments and modeling show that participants adapted their actions to the ongoing event dynamics during learning so as to strike a balance between expected information gain and anticipated inferential cost. These results contribute to our understanding of causal inference in continuous time, add a new dimension to the study of human active learning, and offer new directions for causal learning research.


Introduction
The ability to predict, plan, and control events in the world demands a sophisticated representation of the world's causal structure. Learning such a causal model requires gathering causal evidence through interventions (Pearl, 2000) - actions that manipulate the environment in ways that reveal what causes what and distinguish spurious correlations from genuine causal relationships. However, learning causal structure in general, and selecting interventions in particular, are computationally challenging problems even under idealized conditions (Bramley, Dayan, Griffiths, & Lagnado, 2017). In everyday life, this challenge is compounded by the need to interact with the causal environment in real time, bringing computational constraints to the fore (Griffiths, Lieder, & Goodman, 2015; Simon, 1982). In this paper, we explore how people actively learn about causal structure in real time. To do this, we introduce a causal learning task in which participants interact with causal devices in real time, deciding when and where to intervene in order to gather information about how the device works. To motivate our novel experiments and modeling, we first summarize prior empirical work on active causal learning and point out some of its limitations. We then introduce notions of resource-rational behavior (Lieder & Griffiths, 2020; Simon, 1982) that serve as a guideline for our computational modeling framework. We then investigate human active learning about a range of acyclic and cyclic causal devices in two experiments. We analyze participants' causal judgments and intervention patterns both descriptively and through comparison with a range of models. We contrast an unbounded computational account that optimizes the expected information density of its interactions with the devices against bounded models that balance information and inferential complexity. Finally, we discuss the broader implications of this perspective for accounts of human learning.

Prior work on active causal learning
Everyday cognition is rich with causal beliefs that explain the progression of events, shape our predictions about what is to come, and allow us to choose actions to realize our goals. For example, you might recognize a squeaking sound as caused by the opening of your garden gate, predict the doorbell will ring with your food delivery, and get up to answer the door in anticipation. Many researchers have used a causal Bayesian network framework to study how people build up and represent networks of beliefs about causal mechanisms and affordances (Bramley, Dayan, et al., 2017; Griffiths & Tenenbaum, 2009; Lagnado & Sloman, 2002; Lucas & Griffiths, 2010; Meder, Gerstenberg, Hagmayer, & Waldmann, 2010; Rehder, 2014; Rottman & Hastie, 2016; Schulz, Gopnik, & Glymour, 2007; Sobel & Kushnir, 2006; Stephan, Tentori, Pighin, & Waldmann, 2021; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003). While the particulars of these studies are diverse, many share a core set of properties illustrated in Fig. 1a. Participants are typically asked to distinguish between a set of candidate causal structures on the basis of evidence. Often this evidence takes the form of ''snapshot'' samples of discrete variables' states. Most often, the variables of interest are binary, with one value construed as a variable being ''present'' or ''active'' and the other as ''absent'' or ''inactive''. However, when only covariation data for a set of variables is available, observational samples are insufficient to uniquely reveal structure (Pearl, 2000; Spirtes, Glymour, Scheines, & Heckerman, 2000). For example, if a learner observes two variables co-occurring, such that when one is active (or inactive) the other one tends to be active (or inactive) too, they cannot tell if one is causing the other or if they share an unobserved common cause.
One solution is to intervene (Pearl, 2000) - manipulating one or more variables in the system by fixing them to particular values and observing how this affects the rest of the system. A number of experiments have allowed participants to perform such interventions in order to support their learning (Bramley, Dayan, et al., 2017; Bramley, Lagnado, & Speekenbrink, 2015; Coenen, Rehder, & Gureckis, 2015; Lagnado & Sloman, 2002; Steyvers et al., 2003).
Studies have shown that well-chosen interventions can speed up learning, allowing learners to target their uncertainty and quickly narrow in on the true model. However, poorly chosen interventions can be worse than random actions or passive observations (Settles, 2009). In the covariation-data setting, adults and children have been found to be able to select informative interventions and learn successfully from them about probabilistic systems involving a handful of variables (Bramley et al., 2015; Coenen et al., 2015; McCormack, Bramley, Frosch, Patrick, & Lagnado, 2016; Meng, Bramley, & Xu, 2018; Steyvers et al., 2003). At a normative level, informative interventions are those whose consequences are expected to strongly distinguish among the potential hypotheses, maximally decreasing global uncertainty in expectation (Tong & Koller, 2001), or maximizing the chances of inferring the true causal structure that gave rise to the data (Nelson, 2005). A number of experiments have demonstrated broad alignment with these norms in both adults and children, but also departures from the normative predictions which suggest that process-level considerations are important for fully characterizing people's inferences (Bramley, Dayan, et al., 2017; Bramley et al., 2015). For example, people often choose an intervention that is expected to confirm or refute a currently-favored hypothesis rather than one that provides more information about the full hypothesis space (Coenen et al., 2015; Klayman & Ha, 1989; Meng et al., 2018; Steyvers et al., 2003). People sometimes also rely on generic strategies such as systematically fixing the values of some variables while varying others in order to isolate one potential relationship at a time (Bramley et al., 2015; McCormack et al., 2016; Schulz et al., 2007).
One manifestation of this is the so-called control of variables strategy, in which a set of candidate causal variables are fixed and one variable is changed in each experiment (Chen & Klahr, 1999; Kuhn & Brannock, 1977; Zimmerman, 2007). Following such a strategy has been emphasized in developmental psychology as a marker of mature scientific experimentation, but this strategy turns out to be suboptimal in certain environments (Bramley, Jones, Gureckis, & Ruggeri, 2022; Coenen, Ruggeri, Bramley, & Gureckis, 2019). Finally, adults choose interventions adaptively, taking into account environmental factors such as time pressure, as well as whether a strategy was informative in the past (Coenen et al., 2015).

What prior work has neglected
Previous work on causal learning has largely focused on situations that mimic idealized laboratory conditions. In these studies, participants perform interventions in a discrete trial-by-trial manner, and the values of all variables are revealed all at once. In this way, participants are invited to generate and reason from a series of independent observations. Fig. 1b illustrates an example of this atemporal evidence, generated from interventions and subsequent observations of the variables in a stochastic system (see Bramley et al., 2015; Coenen et al., 2015, for example). Information arrives in three independent trials in the form of variable states (yellow = present or active; gray = absent or inactive) conditional on interventions (i.e. variables fixed on or blocked off by the learner).
At a computational level, the problem is one of identifying the true generative causal Bayesian network - the parameterized graph that captures the patterns of covariation between the variables under both observations and any hypothetical intervention (Pearl, 2000). For example, in Trial 1 in Fig. 1b we see that, conditional on an intervention that activates A, B activated and C did not. This can be written as {B = 1, C = 0 | Do[A = 1]}, where 1 indicates a variable was active, 0 indicates it was inactive, and Do[.] indicates a variable was fixed through intervention and thereby disconnected from its normal causes on this trial. Interventions can target multiple variables. For example, in Trial 3, both A and C are manipulated as A is activated and C is blocked (fixed to be inactive). In this kind of task, ideal inference and intervention selection are well-understood computationally, facilitating comparison between behavior and rational norms (e.g. Rottman & Hastie, 2014). However, this task setup differs in several respects from the causal learning and reasoning problems people face in daily life when they (1) take into account temporal information, (2) deal with evidence that is interdependent, and (3) encounter causal learning problems where the underlying causal mechanism may be cyclic.
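To make the atemporal inference problem concrete, here is a minimal sketch (not the paper's implementation) that scores the trial {B = 1, C = 0 | Do[A = 1]} against two candidate structures, assuming each causal link succeeds independently with probability 0.9 and no component activates spontaneously:

```python
from itertools import product

S = 0.9  # hypothetical per-link success probability

def p_obs(edges, do, obs, s=S):
    """Exact P(obs | do) for a small acyclic device, summing over which links fire."""
    edges = list(edges)
    total = 0.0
    for fires in product([True, False], repeat=len(edges)):
        p = 1.0
        for f in fires:
            p *= s if f else (1 - s)
        # propagate activation: start from fixed-on variables, close under firing links
        active = {v for v, val in do.items() if val == 1}
        blocked = {v for v, val in do.items() if val == 0}
        changed = True
        while changed:
            changed = False
            for (u, v), f in zip(edges, fires):
                if f and u in active and v not in active and v not in blocked:
                    active.add(v)
                    changed = True
        if all((v in active) == bool(val) for v, val in obs.items()):
            total += p
    return total

chain = [("A", "B"), ("B", "C")]      # A -> B -> C
fork  = [("A", "B"), ("A", "C")]      # B <- A -> C
trial = ({"A": 1}, {"B": 1, "C": 0})  # {B = 1, C = 0 | Do[A = 1]}
print(p_obs(chain, *trial))  # ≈ 0.09 (0.9 * 0.1)
print(p_obs(fork, *trial))   # also ≈ 0.09
```

Both structures assign the same likelihood to this trial, illustrating how a single activating intervention can fail to discriminate candidate structures that share a dependency pattern.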

Time
Most previous studies removed temporal information, including the order of events and the delays between them. For example, Coenen et al. (2015) described a cover story of computer-chip systems where the causal relationships are the passage of electrical current from the energy source to the components, occurring too fast to distinguish order of activation. Other studies only allowed participants to view the final outcome (Bramley et al., 2015; Rottman & Keil, 2012). In contrast, many everyday causal relationships take time to propagate, meaning that the temporal order and delay between events is relevant for inferring causal relationships. The notion that causes must precede their effects is foundational to the concept of causation (Burns & McCormack, 2009; Lagnado, Waldmann, Hagmayer, & Sloman, 2007; White, 2006). Indeed, people have been shown to rely on temporal order to guide causal inference even when it conflicts with covariation information (Lagnado & Sloman, 2006), and to assign low probabilities to mechanistic explanations for event sequences that would require an effect to have occurred at the same time as its cause (Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018).
People not only have expectations about the order of events but also about the delays between them, giving higher causal strength ratings when delays between a putative cause and effect are short and reliable (Greville & Buehner, 2010) as well as when they conform to prior or mechanistic expectations (Buehner & May, 2004; Buehner & McGregor, 2006; Hagmayer & Waldmann, 2002). For example, Buehner and McGregor (2006) found that participants gave higher causal judgments about the insertion of a ball turning on a light on a physical apparatus when the light came on after a few seconds rather than instantly, if they were aware it took time for the ball to roll through the apparatus and reach the light switch. A separate line of work has studied inference and representation of continuous variables in continuous time (Davis, Bramley, & Rehder, 2020; Soo & Rottman, 2018). However, temporal information is yet to be examined in the context of active causal learning.

Interdependence
Under the laboratory conditions created in prior experiments, evidence is taken to come from multiple ''independent, identically distributed'' (i.i.d.) observations or interventions. For example, trials may pertain to different test subjects drawn from the same population (e.g. pairs of patients and treatments; Buehner, Cheng, & Clifford, 2003), or might involve repeated interactions with the same causal mechanism, but collected via a protocol that ensures variables ''reset'' from one trial to the next (e.g. the ''blicket detector''; Gopnik, Sobel, Schulz, & Glymour, 2001; Lucas, Bridgers, Griffiths, & Gopnik, 2014). However, it is rare for everyday experience to exhibit these properties. In life, there is no magic reset button. It is hard to be sure whether and when a causal system has been reset without understanding its underlying mechanism (defeating the goal of the exercise).
To illustrate this, imagine wondering why your puppy is unusually excited one evening. You consider two candidate hypotheses: Perhaps his elevated mood is due to a new variety of dog food you fed him at 5pm, or perhaps it is because of a new floral scent on the road where you walked him at 6pm. The puppy might still be happy about his dinner even after having smelt the flowers. A poor approach to resolving the question would be to always feed him beside the flower bed. It would be better to vary the relative time of walking and feeding him while keeping a close eye on the time intervals implied by different causal explanations.
This example illustrates that active learning in everyday life is better understood as a rolling sequence of interventions, with cause and effect events unfolding on a single continuous timeline. In fact, Rottman and Keil (2012) found that when presented with a sequence of experimental results, even paired with a cover story that implied these experiments were independent, many participants judged causal relationships by how values changed relative to their state on the preceding observation, rather than treating the samples as independent (see also Derringer & Rottman, 2018). This suggests that when evidence arrives over time, people strongly assume temporal dependence. Thus, it seems that temporal dependence not only reflects genuine causal phenomena but that it may also better match laypeople's intuitive causal theories than time-agnostic Bayesian networks do.

Causal cycles
Causal learning studies have largely focused on acyclic causal systems where causal influences flow only in one direction, never revisiting the same components. This is partly due to the conceptual and mathematical convenience afforded by the formalism of acyclic causal Bayesian networks (see Rottman & Hastie, 2014, for a review). The continuous-time setting enables us to investigate cyclic causal relationships. A causal mechanism is cyclic if it has at least one component whose descendants include itself (Pearl, 2000). This means that the components that form part of the cycle, or outputs from it, may occur in repeated alternating fashion (e.g. a bidirectional connection A ↔ B could generate a sequence of events A, B, A, B, A, …). Many causal processes in the natural world are cyclic (Malthus, 1872), and people frequently report causal beliefs that include cyclic relationships when allowed to do so in experiments (Kim & Ahn, 2002; Nikolic & Lagnado, 2015; Rehder, 2017; Sloman, Love, & Ahn, 1998), making this an important aspect of causal cognition to study.

The current paradigm
The learning problem. Departing from the atemporal setting, we focus on what people can learn from interventions and observations of events within a single continuous timeline. We study a setting in which effect events follow their causes with some stochastic but predictable delay.
This causally-connected point-event setting has been used in a number of recent studies of temporal causal reasoning. It rests on a firm mathematical foundation that supports normative inferences from temporal information to causal structure. Griffiths and Tenenbaum (2009) and Greville and Buehner (2010) first demonstrated that people can infer how pairs of variables affect one another from observing sequences of point events. Pacer and Griffiths (2012, 2015) developed a model that infers causal relationships based on the occurrence of putative cause events that influence the rate at which the relevant effect events occur over time. Bramley, Gerstenberg, Mayrhofer, and Lagnado (2018) built models that combined hard order constraints with soft delay expectations to best capture structure judgments: Even when order information was fixed, participants were still sensitive to the variation in inter-activation delays between events and used it to distinguish between certain causal structures.
More recently, research has focused on so-called ''actual causation'' (Halpern, 2016): The question of which out of multiple candidate events actually brought about the outcome (Gerstenberg, Goodman, Lagnado, & Tenenbaum, 2021; Stephan, Mayrhofer, & Waldmann, 2020). Using key ideas from the ''actual causation'' literature, recent work has looked at how causal structures can be identified from temporal information by considering the different possible causal pathways that could have produced the observed events conditional on different underlying causal mechanisms (Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018; Gong & Bramley, 2020; Valentin, Bramley, & Lucas, 2020). We follow this approach, using Gamma distributions to model the distribution of causal delays exhibited by a particular causal component of a device across instances (see Fig. 1b). Gamma distributions define a probability density over (0, +∞) via a shape parameter and a rate parameter, allowing for a variety of causal delay distributions with differing means and more or less variability (see Fig. 1c).
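As a concrete sketch of this delay model (the parameter values here are illustrative, not the experiments'), shape/rate pairs with equal means but different variances can capture reliable versus unreliable delay conditions:

```python
import random

def gamma_delay(rng, shape, rate):
    # random.gammavariate takes shape (alpha) and scale (beta = 1 / rate)
    return rng.gammavariate(shape, 1.0 / rate)

# Same mean delay (shape / rate = 1.5), different variability (shape / rate**2):
reliable   = dict(shape=36.0, rate=24.0)  # variance ≈ 0.06
unreliable = dict(shape=2.25, rate=1.5)   # variance = 1.0

rng = random.Random(0)
samples = [gamma_delay(rng, **reliable) for _ in range(10_000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))  # close to shape / rate = 1.5
```

The mean of a Gamma distribution is shape/rate and its variance is shape/rate², so scaling both parameters up together keeps the expected delay fixed while making delays more reliable.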
While temporal information was key to how the evidence was presented in the studies above, the data was not fully continuous in the sense that participants' experiences were still broken into separate, independent episodes. For example, in order to set things in motion in Bramley, Gerstenberg, Mayrhofer, and Lagnado (2018), each clip began with the system at rest, perturbed by an exogenously caused root-component activation, with effects following from there. Since components could only activate once in these tasks, the system would quickly reach a steady state. This still departs from a fully continuous-time setting in which interventions and effects are intermingled and components may exhibit multiple activations within the same episode. A fully continuous setting makes it more difficult to figure out what caused what, because any given event might be attributed to an earlier-occurring event and might have its own effects that are still to occur.

Activating and blocking interventions in time.
In our experiments we will allow learners to intervene on the causal system in two ways:
1. By activating components, thus potentially setting in motion a new sequence of events.
2. By blocking components, thus preventing that component both from being activated and from activating any other components until it is unblocked again.
Activating and blocking are superficially analogous to fixing variables to be on (Do[X = 1]) or off (Do[X = 0]) in the atemporal setting. However, they also differ in important ways. In the continuous-time setting, activating does not disconnect a component from its normal causes. The intervened-on component can be activated again an arbitrary number of times during the same episode, either by the intervener or when caused by other variables in the system. For instance, if the activated component is part of a cycle, we would expect it to be re-activated repeatedly following its initial activation until one of the causal connections fails. Thus, activation is better thought of as a shock to the system than as a form of graph surgery.
Fig. 2. Sketch of the ideal observer inference algorithm and of approaches to minimizing complexity. (a) Ideal Bayesian inference considers each possible structure hypothesis and every possible causal path that could describe how that structure produced the observations. The number of possible paths grows rapidly with the number of ''nearby'' events, as illustrated with an example recursion tree showing all twelve paths connecting six events conditional on a single structure (see Appendix A for a full description of notation). Two of the paths are further displayed in a timeline format, with arrows showing the hypothesized generative process and red arrows highlighting the different delay implications. (b) Three examples of interventional strategies that help reduce the inferential cost of processing generated evidence. Evidence sequences (2), (4), (5) and (7) are less complex to process than Evidence (1), (3) and (6).
On the other hand, blocking actions do exemplify the ''graph surgery'' property in Pearl's sense. They disconnect the blocked component from its normal causes until it is unblocked again (Fig. 1b). In the atemporal setting, blocking is essential for discriminating between certain structures (Bramley et al., 2015; McCormack et al., 2016; Schulz et al., 2007). For example, turning on a single component (i.e. Do[A = 1], Do[B = 1] or Do[C = 1]) generates a similar pattern of dependence under an A → B → C chain and a fully connected structure (A → B, B → C, A → C): activating A affects B and C, activating B affects C but not A, activating C affects neither A nor B. This makes it difficult and inefficient to distinguish these structures based on activating interventions alone, and impossible in deterministic settings. To identify whether there is a direct link between A and C, one must turn on A while simultaneously blocking or disabling B (i.e. Do[A = 1, B = 0]). The current continuous-time setting endows blocking with different implications. Since causes generate effects individually, blocking is not strictly required to distinguish direct and indirect paths. Going back to the chain vs. fully connected example above, a fully connected system would normally produce two staggered activations of C following an intervention on A while the chain would produce only one, making them distinguishable in principle. Nevertheless, blocking may be useful for reducing the computational complexity and ambiguity of parsing the consequent event sequences. The learner can use blocking to reduce the number of events, or to remove a component from consideration, while keeping the remaining events informative (see the section below for examples of two relevant strategies).
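The chain versus fully-connected contrast can be checked with a small simulation, assuming (as in the devices described later) 90%-reliable links and that each token activation sends one effect down each outgoing link; the expected number of activations of the downstream component then differs between the two structures:

```python
import random

def n_C_activations(edges, n_sim=50_000, p=0.9, seed=1):
    """Average number of C activations following Do[A = 1] in an acyclic device,
    where each activation independently triggers each outgoing link with prob p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sim):
        frontier = ["A"]  # one activating intervention on A
        while frontier:
            src = frontier.pop()
            for u, v in edges:
                if u == src and rng.random() < p:
                    if v == "C":
                        total += 1
                    frontier.append(v)
    return total / n_sim

chain = [("A", "B"), ("B", "C")]
full  = [("A", "B"), ("B", "C"), ("A", "C")]
print(round(n_C_activations(chain), 2))  # ≈ 0.81 (0.9 * 0.9)
print(round(n_C_activations(full), 2))   # ≈ 1.71 (0.9 + 0.9 * 0.9)
```

Under the fully connected structure C is activated nearly twice per intervention on average (once directly, once via B), whereas the chain produces at most one activation, so the event count alone carries structural information.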

Cognitive resource limitations
In the atemporal setting, causal reasoning is a little like crafting an essay: Evidence can be collected, organized and put together carefully, with room for reorganizing and backtracking in the search for an effective structure. However, real-time learning more closely resembles writing under exam conditions: One must react immediately to the prompts, bringing one's inferential tools to bear quickly and efficiently without the luxury of time to backtrack.
We lay out how an ideal Bayesian observer learns causal structure from temporal evidence in Appendix A. This shows that the amount of computation needed to process the evidence compounds rapidly as more events occur. An ideal learner needs to consider all the plausible pathways through which a particular causal structure might have produced an observed pattern of events (Halpern, 2016). For example, consider intervening once each on A and B and then observing two subsequent activations of C. To calculate the overall likelihood that a ''collider'' structure A → C ← B could have produced this pattern, we would need to take into account two possibilities: (1) that A produced the first activation of C and B the second one, or (2) that B caused the first one and A the second. However, there will generally be far more than two such possibilities. Fig. 2a shows a more complex example in which twelve possible causal paths could link six events for a single causal structure. For a handful more events the number of paths can easily grow into the millions.
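The growth in the number of causal parses can be illustrated with a toy count, under the simplifying assumption that each effect event must be attributed to a distinct candidate cause event (each cause produces a given effect at most once):

```python
from itertools import permutations

def n_parses(n_cause_events, n_effect_events):
    """Number of ways to attribute each effect event to a distinct cause event."""
    return sum(1 for _ in permutations(range(n_cause_events), n_effect_events))

print(n_parses(2, 2))  # 2: the two possibilities in the collider example above
print(n_parses(6, 6))  # 720: a few more events, hundreds of parses per structure
```

Since this count must be computed and marginalized over for every candidate structure, the total inferential burden multiplies quickly with event density.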
Challenging such a naive idealized account of causal inference is the basic fact that human minds are bounded in their capacity to compute and store information. Human reasoning and decision-making necessarily deviate from such intractable, computational-level ideals (Anderson, 1990; Simon, 1982). Given the computational complexity and conceptual centrality of structure learning in cognition, we expect computational costs to play a large role when structure inference must take place in real time (Christiansen & Chater, 2016). Several process-level proposals have been explored in the literature as candidates for how people approximate normative structure inference. People may consider only a few sampled hypotheses (Bonawitz, Denison, Gopnik, & Griffiths, 2014), incrementally adjust a focal hypothesis to accommodate new evidence (Bramley, Dayan, et al., 2017; Davis et al., 2020; Fernbach & Sloman, 2009), rely on recent evidence (Bramley, Dayan, et al., 2017; Bramley et al., 2015), rely on summary statistics (Gong & Bramley, 2020; Ullman, Stuhlmüller, Goodman, & Tenenbaum, 2018), or rely on simple heuristics such as equating temporal order with causal order (Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018; Bramley, Mayrhofer, Gerstenberg, & Lagnado, 2017; Burns & McCormack, 2009).
While we expect some combination of the above ideas to be in play in how participants solve our task, we here explore a complementary facet of causal learning: the active gathering of evidence through interventions to support the inference process. We ask whether people time and target their interventions so as to manage the inferential complexity of parsing the resultant evidence, while still producing informative evidence overall. Fig. 2b illustrates this idea by displaying three potential intervention strategies for managing evidential complexity. In the first example, the ground truth is a common-cause (''fork'') structure. An unbounded ideal learner learns about as much from Evidence 1 as from Evidence 2. However, if we assume that the ability to process evidence is a function of its complexity and that this is related to the density of the events being reasoned about, then it is clear that Evidence 2 is the more useful for a bounded learner. Here, the events are better separated, so there is much less ambiguity about the plausible causes of each token event, and therefore less need to engage in costly averaging over many potential causal pathways under each structure hypothesis.
Bounded learners may also choose to block components of a system to make the event stream manageable for reasoning. As shown in Evidence 3 of Fig. 2b, having performed two activating interventions in a cyclic system, a learner may experience a confusing pattern of parallel excitation. Evidence 4 shows how this can be avoided using a ''controlled'' testing strategy that blocks a component before activating another. This approach allows a learner to isolate a subsystem of a larger system. Such a controlled test means fewer interpretations of the evidence need be considered: In Evidence 4, the remaining events are straightforwardly indicative of the substructure linking the unblocked components. Well-timed blocks might also be used to impose pseudo-independence and trial-like structure within a continuous interaction. For example, as shown in Evidence 5, one might block, wait, then unblock components to ''reset'' a system, preventing any ongoing activity from complicating the inference process going forward.
Finally, although activating a suspected root component tends to produce more evidence about a causal structure than activating its suspected tail nodes (Evidence 6), bounded learners might sometimes avoid root components. For instance, if primarily interested in understanding a presumed downstream subpart of a complicated causal mechanism, one might intervene locally to avoid extraneous events and activity (Evidence 7). Note that these considerations are not independent. Spreading out interventions in time, for example, requires the learner to wait for the system to calm down. The same could be accomplished by blocking and unblocking a component to reset the system.
Finding ways to balance informativeness and complexity in generating evidence is conceptually related to the notion of bounded rationality (Anderson, 1990; Simon, 1982). The basic idea is that human minds have evolved or discovered solutions that trade off efficiently between the costs of computation and its rewards in greater accuracy or performance. In particular, one can incorporate computational costs into a solution space formally with a resource-rationality analysis (see Griffiths et al., 2015; Lieder & Griffiths, 2020; Shenhav et al., 2017, for review). Such analyses suggest that a number of decision-making phenomena classically seen as irrational - such as anchoring and probability matching - may instead represent efficient solutions to a computation-value tradeoff under some sensible approximation scheme (Callaway et al., 2022; Dasgupta, Schulz, & Gershman, 2017; Hawkins, Gweon, & Goodman, 2021; Lai & Gershman, 2021; Lieder, Griffiths, Huys, & Goodman, 2018). We similarly use a resource-rationality framework to analyze adults' intervention choices and judgments in our tasks. This involves first considering the impact of both information and complexity on inferential success, and second, modeling intervention selection as driven by a goal of maximizing the expected informativeness of the evidence while minimizing the expected inferential cost of processing it.
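A minimal sketch of such an objective follows; the linear information-minus-cost form, the lambda weight, and all numbers are hypothetical, chosen purely for illustration:

```python
def utility(expected_info_gain, expected_n_events, lam=0.1):
    """Bounded objective: expected information minus a complexity cost
    proportional to the expected number of events to be parsed."""
    return expected_info_gain - lam * expected_n_events

# Hypothetical candidate interventions (values for illustration only):
candidates = {
    "activate root":    dict(expected_info_gain=1.2, expected_n_events=9.0),
    "activate leaf":    dict(expected_info_gain=0.6, expected_n_events=2.0),
    "block + activate": dict(expected_info_gain=0.9, expected_n_events=3.0),
}
best = max(candidates, key=lambda a: utility(**candidates[a]))
print(best)  # "block + activate": informative yet cheap to parse
```

With lambda = 0 this objective reduces to the unbounded information-maximizing account; increasing lambda shifts choices toward sparser, easier-to-parse evidence.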

Overview of experiments
We conducted two experiments to test how people actively learn causal structures in a continuous-time setting. In both experiments, we manipulated the reliability of the cause-effect delays and included a range of acyclic and cyclic causal structures. In Experiment 1, we only allowed participants to activate components while in Experiment 2, we also allowed them to block components.
In line with our normative account of causal inference in this setting, we hypothesized that performance would be lower in the irregular delay condition given that evidence about what caused what is more ambiguous. In line with our bounded inference account, we also hypothesized that performance would be worse in cyclic systems given the likely increase in event density, interdependence and concomitant complexity. However, we further expected accuracy to depend on the quality and reactivity of participants' intervention choices. Thus, we also examine whether and how participants' intervention selection differs across devices and delay conditions, asking to what extent intervention choice is reactive to the behavior of the device being explored, and whether this reactivity reflects rational anticipation and active management of expected information gain and evidential complexity.

Participants
Seventy-four participants (40 female, 34 male, aged 30 ± 11) were recruited from Prolific Academic and were randomly assigned to either the reliable-delay (n = 36) or unreliable-delay (n = 38) condition. Participants received a basic payment of £1 and a bonus depending on performance (see Incentives section). Nine additional participants were tested but removed from the analysis because they left the default ''unconnected'' connection judgment for all causal component pairs on all trials (n = 6) or had at least one trial in which they performed no interventions at all (n = 3). The sample size was chosen to be in line with related work on causal learning (Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018; Coenen, Ruggeri, et al., 2019).
Participants were asked to investigate abstract causal ''devices'' connected by hidden causal links (Fig. 3a). The causal links produce point events in the form of activations of the device's components over time. For causally related components, an activated component will probabilistically activate each of its effect components once after some delay. All causal connections worked 90% of the time and no events occurred without being caused by an intervention or other event (i.e. none of the components activated spontaneously). Participants were informed and tested on this in the instructions.
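To make the device dynamics concrete, the following is a minimal simulation sketch. The edge encoding, the gamma-distributed delay, and its parameters are our own illustrative assumptions (the two conditions in the experiments differ in delay variability); only the 90% link reliability, the lack of spontaneous activations, and the 45 s trial window come from the task description.

```python
import random

def simulate_device(edges, interventions, p_work=0.9, delay_fn=None, horizon=45.0):
    """Simulate point-event dynamics of a causal device (illustrative sketch).

    edges: dict mapping component -> list of effect components
    interventions: list of (time, component) activations by the learner
    delay_fn: samples a cause-effect delay (assumed gamma-distributed here)
    """
    if delay_fn is None:
        delay_fn = lambda: random.gammavariate(4.0, 0.375)  # mean 1.5 s (assumed)
    # Event queue of (time, component); no component activates spontaneously.
    pending = sorted(interventions)
    events = []
    while pending:
        t, c = pending.pop(0)
        if t > horizon:
            continue  # trial is over; drop effects beyond the window
        events.append((t, c))
        for effect in edges.get(c, []):
            if random.random() < p_work:  # causal links work 90% of the time
                pending.append((t + delay_fn(), effect))
        pending.sort()
    return sorted(events)

# Example: a three-component chain A -> B -> C with one activation of A at t = 1 s.
trace = simulate_device({'A': ['B'], 'B': ['C']}, [(1.0, 'A')])
```

Note that for cyclic structures the queue can keep refilling itself, which is why event density compounds with each additional intervention there; only link failures and the trial horizon terminate the cascade.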
Each participant learned about 12 test devices with either 3 or 4 components, including 6 acyclic structures and 6 cyclic structures (Fig. 4). The acyclic structures were chosen to exemplify a variety of causal relationships including common effects (i.e. ''colliders'', Acyclic1 and Acyclic4), chains (Acyclic2 and Acyclic5), and common causes (i.e. ''forks'', Acyclic3 and Acyclic6). The cyclic devices were chosen so as to approximately match the number of edges in the acyclic systems while investigating a variety of arrangements. These included full loops (Cyclic1 and Cyclic4), short loops with incoming connections (Cyclic2 and Cyclic5), and short loops with outgoing connections (Cyclic3 and Cyclic6).
Interface. Fig. 3 shows the task interface. For each device, participants saw the 3 or 4 components visualized as gray circles evenly spaced on a white background. Participants had 45 s to learn about how the components were connected. During this time, they could intervene and activate components by left-clicking on them up to 6 times. Intervened-on components were marked by a ''+'' symbol (Fig. 3a). All activated components turned yellow for 200 ms and then returned to gray (Fig. 3b). At the beginning, all components were inactive, and no connecting links were marked between them.
Participants were able to indicate their current belief about the causal structure as often as they liked during each learning problem. To do so, participants clicked on the gray area between components to toggle between a causal connection in either direction, both directions, or no connection. Each click cycled through the options (→, ←, ↔, no relationship) in a random order that varied between participants. Participants confirmed their choices by clicking a confirm button that appeared in the middle (Fig. 3c). Links did not disappear after being confirmed, so participants were still able to update earlier judgments. What participants had marked at the end of 45 s was automatically registered as the final judgment for that trial. At the end of the trial, participants received feedback about which connections they had marked correctly or incorrectly (Fig. 3d). Since any pair of components might be unconnected, or have a directed (→ or ←) or bidirectional (↔) causal connection, the response space includes 64 possible structures for 3-variable devices and 4096 possible structures for 4-variable devices, of which exactly one truly reflects the hidden causal structure.

(Fig. 4 caption: orange highlights the ground truth; Act = average number of activating interventions performed; Acc = mean accuracy; Str = proportion of participants who detected the whole structure correctly.)
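The response-space sizes quoted above follow directly from counting: each unordered pair of components can be in one of four states, and the pairs combine independently. A quick check:

```python
from math import comb

def num_structures(n_components, states_per_pair=4):
    # Each unordered pair is one of: ->, <-, <->, or unconnected.
    return states_per_pair ** comb(n_components, 2)

print(num_structures(3))  # 3 pairs -> 4**3 = 64
print(num_structures(4))  # 6 pairs -> 4**6 = 4096
```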
Incentives. We incentivized participants to mark the correct causal links as early as possible within the trial by rewarding them based on their accuracy at a random time point during each problem. This bonus scheme means it is in participants' interest to register their best guess accurately and early, and to update it whenever their conclusions change during a learning episode (cf. Bramley, Dayan, et al., 2017). The scheme also shapes the nature of an ideal intervention strategy: one should balance the benefit of waiting to intervene until one knows more against the opportunity cost of waiting too long and missing out on what could have been learned from an earlier intervention. In order to perform well in our task, learners must consider not only how and where to intervene next, but also when to do so.

Procedure
In the main task, each participant faced the 12 test devices in random order with randomly positioned and unlabeled components. Prior to the inference task, participants completed instructions, a practice trial, and comprehension checks. Participants were told that they would be investigating the causal structure of a number of abstract ''devices''. In the instructions, participants were trained on the true cause-effect delays in their condition and shown a video example of a device with its causal links revealed. They were then trained on how to provide structure judgments. Participants learned that they would receive a £0.03 bonus for each connection correctly marked at a randomly chosen time point, not revealed to them, during each trial (for a theoretical maximum total bonus of £1.62). This was emphasized in the instructions to encourage participants to mark connections as quickly as possible. Participants had to correctly answer 5 comprehension check questions before proceeding to the main task. Finally, participants completed a practice trial on a device with a collider structure (Acyclic1 in Fig. 4).

Results
We first report participants' judgment accuracy (i.e. what proportion of connections participants correctly identified at the end of each trial) by delay condition (reliable vs. unreliable), device type (acyclic vs. cyclic), and number of components (3 vs. 4). We then discuss characteristic error patterns under specific causal structures. Our accuracy analyses use linear mixed-effect models (LMMs) including random slopes and intercepts for subject ID and structure type (Brauer & Curtin, 2018). For all LMMs we report standardized coefficient estimates β (indicating how many standard deviations the outcome variable changes when the independent variable changes from one condition to the other), t values, significance, and 95% confidence intervals (CI). We then compare participants' trial-by-trial accuracy against the predictions of a normative inference model. We explore whether deviations from normative responding are related to the density of events in that trial as a basic index of complexity. Finally, we focus on participants' intervention choices and explore whether interventions are driven by a trade-off between expected evidence strength and complexity.
Accuracy. Participants confirmed their causal judgments 2.45 ± 1.31 times per trial. Final judgments, i.e. what participants had marked at the end of the trial, identified the majority of the causal connections correctly (62% ± 34%) but with marked variation across and within devices. Participants' final judgments generally improved on the accuracy of their initial judgments, i.e. what participants had marked as their first answers, in the 79% of trials in which participants made more than one judgment (initial accuracy: 58% ± 30%; β = 0.11, t = 2.36, p = .024, CI = [0.02, 0.20]). In the following, we focus on participants' final structure judgments.

Accuracy was above chance for all structures in both delay conditions (ps < .01), with the exception of Cyclic1 in the unreliable condition (t(37) = 1.94, p = .06, Fig. 4). Table 1 shows accuracy separated by condition. There was a main effect of structure cyclicity (β = 0.50, t = 2.69, p = .026, CI = [0.14, 0.86]) such that accuracy was higher for acyclic than cyclic structures. There was no significant main effect of delay reliability.

Error patterns. Participants were best at inferring the structure of colliders (Acyclic1 and Acyclic4, see Fig. 4). These structures were naturally simple in their evidence since no intervention would cause more than one effect. For the three-component chain (Acyclic2), 15% of participants added an erroneous direct connection from the first component to the last. Similarly, for the four-component chain (Acyclic5), participants frequently also added one or more erroneous ''shortcut'' links (12%, 8%, and 13% for the three possible shortcut connections) in addition to the true connections. These errors cohere with previous findings suggesting that people rely on local computations when inferring causal structure, resulting in the addition of extraneous connections in chain structures (Davis et al., 2020; Fernbach & Sloman, 2009). Some participants mistook the fork Acyclic3 for a chain with the same root component. This lines up with the idea that people tend to fall back on temporal order as a cue to causal order (Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018; McCormack et al., 2016), tending to link the effect components of a fork in whatever order they happened to activate. These error patterns did not differ significantly between the reliable and unreliable groups (χ2 tests, ps > .10). For the cyclic structures, participants' judgments varied considerably, so we focus on the individual-connection-level errors shown in Fig. 4. In the full loops (Cyclic1 and Cyclic4), participants frequently judged directed or disconnected links as bidirectionally connected (Fig. 4, black bars). This was more prevalent in the unreliable group than in the reliable group (Cyclic1: 37% vs. 23%, χ2(1) = 4.31, p = .04; Cyclic4: 22% vs. 10%, χ2(1) = 11.41, p < .001), suggesting that reliable delays make it easier to detect full loop structures. This makes sense since regular delays produce much more sequentially reliable and predictable patterns of reactivation. Participants had relatively little trouble identifying loops with incoming connections (Cyclic2). However, performance was very poor for structures comprising feedback loops with outgoing connections (Cyclic3, Cyclic5, and Cyclic6). The outgoing component in each of these structures was frequently taken to be a constituent of the feedback loop, often being assigned a bidirectional connection with one of the loop constituents. This is reasonable since, for these structures, recurrent and close-in-time events occurred not only at the components forming the loop but also at the output components, making it difficult to tell which components were involved in actively sustaining the looping pattern of activations. Participants often connected an output component to the loop element that typically activated in close temporal proximity. For example, many participants marked an erroneous bidirectional connection between the output component and a loop component in Cyclic3, Cyclic5, and Cyclic6, in spite of the fact that this temporal proximity is really due to the two components sharing a common cause.
Participants were normally correct about whether the structure was cyclic or acyclic. Participants' structure judgments belonged to the correct class 82% ± 38% of the time for the acyclic class and 77% ± 42% of the time for the cyclic class. There was no difference in the frequency of mistaking cyclic for acyclic vs. acyclic for cyclic (t(73) = 1.15, p = .25), and the rate did not differ between reliable and unreliable delay conditions (ps > .10).
Informativeness and event density. We calculated the accuracy of an ideal observer (IO) based on the 45 s of evidence generated by each participant on each trial. This acts as a measure of how informative the evidence generated by the participants was (cf. Bramley et al., 2015). The IO was more accurate in the reliable (97% ± 7%) than the unreliable (94% ± 12%) condition (β = 0.23, t = 2.61, p = .011, CI = [0.06, 0.40]). In contrast to human learners, the IO was more accurate in identifying the structure of cyclic (98% ± 8%) than acyclic (93% ± 11%) devices. These results suggest that evidence complexity is critical in this task. For the IO, complexity is generally positively correlated with success: the more activations there are, the more information an ideal observer can use to reduce its uncertainty. However, non-ideal human learners clearly struggled to deal with complex evidence. As shown in Fig. 7, the best-performing participants were generally those who were able to generate evidence that would have enabled the IO to be highly accurate, but that was also low in event density. We later compare computational models that capture the influence of complexity on human judgments and intervention selection, capturing the qualitative differences between cyclic and acyclic cases (see the section on Modeling the Judgments).
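The ideal observer's core computation is Bayesian structure inference from event timings. The sketch below is a deliberately simplified stand-in, not the paper's model: it scores a candidate structure by attributing each non-intervened event to its single best-fitting parent event under an assumed gamma delay density, whereas a full IO marginalizes over all possible causal attributions (and over link failures). All parameter values are illustrative.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density, used here as an assumed cause-effect delay distribution."""
    if x <= 0:
        return 0.0
    return (x ** (shape - 1) * math.exp(-x / scale)) / (math.gamma(shape) * scale ** shape)

def structure_log_score(edges, events, interventions, shape=4.0, scale=0.375, p_work=0.9):
    """Crude log-likelihood sketch for one candidate structure.

    Each non-intervened event must be explainable by some earlier event at a
    parent component; we keep only the best-fitting attribution per event.
    """
    intervened = set(interventions)
    log_score = 0.0
    for t, c in events:
        if (t, c) in intervened:
            continue  # interventions need no causal explanation
        parents = [(pt, pc) for pt, pc in events
                   if pt < t and c in edges.get(pc, [])]
        if not parents:
            return float('-inf')  # event unexplained under this structure
        best = max(gamma_pdf(t - pt, shape, scale) for pt, pc in parents)
        log_score += math.log(p_work * best + 1e-12)
    return log_score

# Example: an activation of A at t = 0 followed by B at t = 1.5 s is consistent
# with the chain A -> B but inexplicable under the empty structure.
ev = [(0.0, 'A'), (1.5, 'B')]
chain_score = structure_log_score({'A': ['B']}, ev, [(0.0, 'A')])
empty_score = structure_log_score({}, ev, [(0.0, 'A')])
```

Normalizing such scores over the 64 (or 4096) candidate structures would yield a posterior over structures, which is the sense in which denser evidence can only help an ideal observer.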
When to intervene. We now assess whether participants' intervention choices are qualitatively consistent with the idea that they choose interventions that generate strong evidence while minimizing evidential complexity. For example, participants may choose to perform fewer interventions when experiencing a large number of events, as tends to occur in cyclic structures and, to a lesser extent, in structures with four components (Fig. 8). As shown in Fig. 9, on average, participants performed 4.68 ± 1.46 out of the maximum of 6 interventions on each trial, performing about the same number in the unreliable (4.76 ± 1.41) and reliable conditions. These results correspond to the simulation results in Fig. 8: in cyclic structures, even a few interventions allowed for ceiling-level accuracy in principle, while the event density compounded dramatically with each additional intervention. This means that, in cyclic structures, the computational cost of additional interventions quickly outweighs the value of the new information. Event density also increased going from three- to four-component devices (at least for the structures we tested), but this increase comes alongside an increase in the amount of structure to be learned. This means that the evidence-strength gap between three- and four-component devices could be narrowed by intervening more frequently in four-component devices.
The average interval between interventions depended on cyclicity (β = 0.67, t = 7.22, p < .001, CI = [0.49, 0.86]), with participants waiting longer before the next intervention when the structure being learned was cyclic (9.38 ± 5.94 s) rather than acyclic (5.49 ± 2.64 s, Fig. 9). There was no evidence for a difference in this measure between the unreliable and reliable delay conditions (7.55 ± 5.10 s vs. 7.26 ± 4.83 s) or between three- and four-node problems (7.19 ± 4.92 s vs. 7.63 ± 5.01 s). As shown in the example in Fig. 2b, even if the total number of events is identical, learning is easier when the events are more spread out across the trial. Intervention-spreading may be particularly important for cyclic structures, where the event density is higher.
We tested whether participants' tendency to wait longer under cyclic structures is driven by an anticipation of complexity. As a first pass, we calculated a moment-by-moment expectation of the computational cost of the upcoming evidence, assuming no further intervention is performed. For this, we calculated the number of events expected to occur in the near future as a result of earlier activity (see Modeling the Interventions for more details). We can then compare the moments in which participants did nothing with those in which they performed an intervention. As shown in Fig. 10, people waited, i.e. did not perform any intervention, in 88% of the 1-second windows in which they could have acted (i.e. had not yet run out of activating interventions). In the 12% of time windows where they did intervene, the number of already-expected events (Median = 0, Mean = 1.86) was lower than in those where they did nothing (Median = 1.25, Mean = 2.96, Mood's median test: χ2(1) = 854.35, p < .001). This result suggests that participants tended to wait until there was not too much expected activity before performing their next intervention.
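A complexity index of this kind can be sketched as follows. Assuming gamma-distributed delays (the shape and scale below are illustrative, with the shape an integer so the Erlang closed-form CDF applies), the expected number of caused events arriving in the next window is a sum, over each past activation's believed outgoing edges, of the probability that the corresponding delayed effect falls in that window. This simplified version does not subtract effects that have already been observed, which the full measure would account for.

```python
import math

def gamma_cdf(x, shape=4, scale=0.375):
    """Gamma CDF for integer shape (Erlang closed form)."""
    if x <= 0:
        return 0.0
    lam = x / scale
    return 1.0 - sum(math.exp(-lam) * lam ** k / math.factorial(k)
                     for k in range(shape))

def expected_upcoming(events, edges, now, window=1.0, p_work=0.9):
    """Expected number of caused events arriving in (now, now + window].

    events: past (time, component) activations
    edges:  the learner's believed structure, component -> list of effects
    """
    total = 0.0
    for t, c in events:
        elapsed = now - t
        for _effect in edges.get(c, []):
            # Probability the delayed effect of this activation lands in the window.
            total += p_work * (gamma_cdf(elapsed + window) - gamma_cdf(elapsed))
    return total
```

Under this measure, a moment just after several activations in a densely connected (or cyclic) structure scores high, which is exactly when participants tended to hold off on intervening.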
Where to intervene. An efficient sequence of interventions involves both a healthy dose of early exploration, trying each component to learn its effects, and a later exploitative, reactive focus, namely a tendency to repeat activating components that showed promise in producing effects. This repetition allows a learner to gather evidence about the order and delay with which effects propagate through the system. This information is crucial for distinguishing between devices with overlapping causal structure. We see clear qualitative evidence of such exploration and exploitation in participants' choices. Fig. 4's node shading shows the aggregate proportion of interventions on each node in each structure. Participants' interventions were relatively evenly distributed across components for most devices, with a slight tendency to activate more causally ''central'' nodes (i.e. nodes with many descendant edges; Coenen et al., 2015) in devices that have them, such as the root component in the two common-cause structures (Acyclic3 and Acyclic6).
A marker of early exploration is a tendency to initially sample components to test without replacement, that is, choosing something different to activate on one's second test than one's first, and so on. Fig. 11 shows how frequently participants selected a novel component to intervene on as a function of serial intervention position within the trial. We compared participants' choices against chance (i.e. a random intervener) as well as against the choices of an idealized expected-information-gain maximizing intervener (the EIG intervener, see the section on Modeling the Interventions for more details) taking actions at the same moments as participants and conditioning on the same prior evidence. For both three- and four-node structures, participants were more likely than chance to intervene on untested components until the number of interventions exceeded the number of components in the system (ts(73) > 10.91, ps < .001). This shows that participants were not intervening randomly and suggests that they typically began by exploring system components they had not yet activated.
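The chance baseline here can be computed analytically. If each of the first i interventions is drawn uniformly at random (with replacement) from K components, the probability that the i-th targets a previously untested component is ((K − 1)/K)^(i−1). A one-line sketch (our own illustrative calculation, not the paper's code):

```python
def p_novel_random(k_components, i):
    """P that the i-th uniformly random intervention hits an untested component.

    By symmetry, a fresh pick is novel iff each of the i - 1 earlier independent
    picks missed it, each with probability (k - 1) / k.
    """
    return ((k_components - 1) / k_components) ** (i - 1)

# e.g. for a 3-component device the baseline falls from 1.0 to 2/3 to 4/9
# across the first three interventions.
```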
The simulated informationally efficient intervener shows a similar pattern for the first several interventions (Fig. 11). Beyond this point, the efficient intervener's decisions, along with those of participants, become reactive to the past evidence in complex ways that do not submit to a straightforward aggregate measure. As such, we examine these choices more closely through modeling in a section after the experiments.

Discussion
In Experiment 1, we showed that people are able to infer causal structure through active intervention in a challenging continuous-time learning setting. We found that participants had different error patterns to an ideal observer model, in particular making more accurate judgments about acyclic structures than cyclic structures while the ideal observer showed the reverse pattern. We also found that differences in accuracy across conditions were associated with differences in the character of the evidence. The informativeness of evidence predicted participants' performance in acyclic structures, but the complexity of evidence appeared to dominate participants' performance in cyclic structures, where complexity was generally higher. Participants who were able to generate evidence that was informative but not overly complex tended to perform best overall. We take this to support our central idea that managing computational cost plays an important role in interventional decisions and success in the real-time causal learning setting.
Intervention choices were partly shaped by a drive to control computational demands. In terms of when to intervene, participants performed fewer interventions and waited longer between them on cyclic structures, which tended to produce more events. They also tended to perform more interventions on four-node structures, which yielded a similar number of events to three-node structures, presumably responding to the greater initial uncertainty (the larger space of structure possibilities). When the expected upcoming evidential complexity was already high, participants were more likely to wait rather than activate another component and produce more events.
In terms of which components participants would target, we found they used their interventions to systematically explore the devices, tending to select a hitherto untested component for their first few interventions, qualitatively in line with the behavior of an efficient information-maximizing agent. Participants also showed a tendency to repeat-intervene on causally ''central'' components once these were discovered. Note that the role of a root-component activation differs in this setting from the atemporal settings studied in the past literature. Intervening on known-to-be causally central components has previously been framed as a heuristic Positive Testing Strategy (Coenen et al., 2015; Steyvers et al., 2003) on the grounds that it is often correlated with expected information yet much easier to calculate. Positive testing can be very poor in the atemporal setting because multiple causal influences from the root component overshadow one another, since all effects are revealed at once. However, in the continuous-time point-event case, intervening on a suspected root will often generate rich and diagnostic evidence through the delays and order variability in the propagation of activity through the system (Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018). We will test the extent to which participants' specific where-to-intervene choices reflect an information-gain norm in our model fitting to follow (see the section on Modeling the Interventions).

Experiment 2: Activating and blocking
Experiment 2 aims to replicate and extend the results of Experiment 1. This time, participants were not only able to activate components but could also choose to block components, temporarily preventing them from activating until unblocked again. Intuitively, blocking gives the learner a greater degree of control over interactions with and observations of the system, as they can now isolate components to focus on, and also take control of ongoing activity in the system. On the other hand, a larger action space increases the complexity of the intervention decision-making problem. We will examine whether the relationship between ideal observer accuracy, human accuracy, and event density is similar to the activation-only setting, and explore how participants use the blocking function. In particular, we will assess whether participants spontaneously use blocks to reduce the complexity of evidence without substantially reducing its diagnosticity about the causal relationships.

Participants
Ninety-five participants (54 female, 40 male, 1 nonbinary, aged 36 ± 12) were recruited from Prolific Academic and were randomly assigned to the reliable-delay (n = 48) or unreliable-delay (n = 47) condition. They received a basic payment of £1 and a bonus depending on performance, as in Experiment 1. Fourteen additional participants were tested but removed from the analysis because they reported for all trials that the structure was completely unconnected, which was the initial default (n = 7), or did not perform any interventions in at least one trial (n = 7).

Design & procedure
The interface was similar to Experiment 1 with a few changes. In addition to activating components, participants were also able to block components by right clicking on them and to unblock them again with an additional right click. Blocked components were marked visually by turning gray and by showing a stop sign on them (Fig. 3b). Blocked components did not activate when they otherwise would have been caused to do so by another event or by a left click activation intervention. While participants were limited to 6 activations (as in Experiment 1), we did not limit how many times components could be blocked and unblocked.
We also extended the test set of causal devices. We added two densely connected acyclic structures (Acyclic7 and Acyclic8, Fig. 12), two densely connected cyclic structures (Cyclic7 and Cyclic8), as well as two devices with no causal connections between the components (Unlinked1 and Unlinked2). The inclusion of unconnected structures served to explore how people intervene in one extreme setting where no effects are ever experienced. While unconnected devices are technically acyclic, they are also qualitatively unique, and as such we treated them as a separate device type in our analyses. The new densely connected structures, by comparison, might produce particularly complex evidence and would thus be particularly amenable to the use of blocks. Given the larger set of test stimuli, we did not include a practice trial in Experiment 2. In other respects, the instructions, incentive structure, and randomization procedure were identical to those of Experiment 1.
We improved the interface in two ways. First, we were concerned that occasionally two activations of a component would overlap, making them hard to distinguish visually. In Experiment 1, each activation caused the component to turn yellow for 200 ms; if two activations overlapped, this would result in the component appearing yellow for longer but without a clear delineation between events. In Experiment 2, we had each component turn yellow and then fade back to gray over 200 ms, which made it easier to detect distinct activation events even if their onset times were very close together. Second, to make providing judgments more seamless, participants were not required to click a ''confirm'' button to register when they had finished making a change to their structure judgment, as they had been in Experiment 1. One second after they stopped clicking on the edges, the state of their currently marked structure was automatically registered as their latest judgment.

Results
As in Experiment 1, we first look at judgment accuracy and error patterns, and then at intervention strategies. We first focus on use of activations and then explore when and where participants use the novel blocking function.
Accuracy. Participants registered judgments 4.39 ± 2.43 times per trial. Within trials for which an answer was registered more than once (86% of all trials), final judgments were more accurate than initial judgments, with 60% ± 34% compared to 48% ± 24% of connections correctly identified (β = 0.41, t = 14.09, p < .001, CI = [0.35, 0.47]). Participants' judgments became more accurate as they approached the end of the trial (β = 0.11, t = 10.92, p < .001, CI = [0.09, 0.12]). As in Experiment 1, we focus on the final answers as our primary measure of task performance. Table 1 shows participants' accuracy separated by condition. Performance in both reliability conditions was significantly above chance (random: 25%; reliable: t(47) = 11.58, p < .001, Cohen's d = 1.67; unreliable: t(46) = 13.63, p < .001, Cohen's d = 1.99). The average accuracy for all 18 structures was above chance in the reliable condition.

Error patterns. Fig. 12 shows the types of errors people made in inferring causal structures. For chain structures (Acyclic2, Acyclic5), there were no systematic errors mistaking them for fully-connected structures or fork structures (less than 5%). Similar to Experiment 1, 15% of participants mistook the fork structure Acyclic3 for a chain, while 12% mistook it for a fully-connected structure by adding a directed link between the two child nodes. In the case of the fully-connected structure Acyclic7, 15% of participants disregarded one of the directed links, while 10% confused it for a cyclic structure containing a bidirectional connection. For Acyclic8, 13% of participants confused one directed connection for another. The error patterns did not significantly differ between the unreliable and reliable delay conditions (χ2 tests, ps > .10). Similar to Experiment 1, these error patterns seem consistent with the idea that reliance on local computation and simple event order played a role in some participants' judgments (Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018; Burns & McCormack, 2009; McCormack et al., 2016).
For cyclic structures, participants in Experiment 2 also performed poorly on structures that contained one or more output components. The output components were frequently judged to be constituents of the feedback loop. Participants tended to connect components whose activations often occurred close in time. This pattern was replicated in the two new structures Cyclic7 and Cyclic8, where participants frequently marked erroneous bidirectional connections between output components and loop components.
As with Experiment 1, participants could generally tell whether a structure was cyclic or acyclic regardless of whether they got all causal connections correct. They chose the correct class 70% ± 46% of the time for acyclic structures (excluding the unlinked structures) and 80% ± 40% of the time for cyclic structures. Participants more often mistook acyclic structures for cyclic than the reverse (t(94) = 2.46, p = .02).
When to block. Participants performed blocks on 27% of trials (949 times in 1710 trials, 0.55 ± 1.17 per trial). 75 of 95 participants (79%) used blocking at least once. Given that the frequency of blocking was much sparser than activations, we coded trials as 1 (used blocks) or 0 (no blocks) in our statistical analyses and fitted logistic regression models to explore when blocking was used.
We found that the propensity to block did not differ between the reliable (28%) and unreliable (26%) conditions. Surprisingly, the propensity to use blocks when facing unlinked (26%) structures did not differ significantly from cyclic or acyclic structures (ps > .05). We had anticipated participants would be less likely to block in unlinked structures, since a key function of blocks in this setting is to manage evidential complexity, and in the unlinked structures complexity is always minimal. We speculated that some uses of blocks might be spurious since we did not limit their use. For example, sometimes participants may have blocked and unblocked components simply to kill time until the end of the trial, especially after they had used up their activations and the system had been silent for a while. To further explore how participants used blocking, we categorized blocking actions, focusing on two plausible goals of blocking that have distinct empirical signatures: (1) blocking in combination with activating to control for confounding causal paths, and (2) blocking to reset the device.
For both of these uses, we derived simple operationalizations. We take ''Controlling'' blocks to be those that appear to be used as a way to perform a controlled test, essentially isolating a sub-network made up of all the components except the blocked one(s). This way, sub-networks can be investigated through an activation without the possibility of interference from any pathways through the blocked component. In contrast, the ''Resetting'' category includes those where the learner blocks and then unblocks a component before performing another activation, without activating any other component while the component is blocked. In the current setting, Resetting blocks serve to short-circuit ongoing chains of causal effects of previous interventions, essentially resetting the mechanism so that subsequent tests can be performed without interference. Both forms of blocking reduce the density of events experienced during the trial, but do so in conceptually different ways (see examples in Fig. 2b). Resetting blocks, where the next action is to unblock the same component, made up 55% of blocks; Controlling blocks, those followed by an activation of a different component, made up 24%; and the remaining 21% were classified as ''Other''. This nuisance category includes cases where a block is performed by a participant who has no activations remaining, or who performs no subsequent activation or unblocking action before the end of the trial.
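These operationalizations can be expressed as a simple forward scan over a trial's action sequence. The action encoding below is our own hypothetical one; the labels follow the definitions above: a block is ''Resetting'' if the same component is unblocked before any further activation, ''Controlling'' if an activation occurs while it is still blocked, and ''Other'' otherwise.

```python
def classify_blocks(actions):
    """Label each block action as 'Resetting', 'Controlling', or 'Other'.

    actions: time-ordered list of (kind, component) tuples with kind in
             {'block', 'unblock', 'activate'} (hypothetical encoding).
    """
    labels = []
    for i, (kind, comp) in enumerate(actions):
        if kind != 'block':
            continue
        label = 'Other'  # default: no later unblock or activation in the trial
        for later_kind, later_comp in actions[i + 1:]:
            if later_kind == 'unblock' and later_comp == comp:
                label = 'Resetting'   # unblocked again before any activation
                break
            if later_kind == 'activate':
                label = 'Controlling'  # tested the system while still blocked
                break
        labels.append(label)
    return labels
```

For example, block A then unblock A then activate B yields 'Resetting', whereas block A then activate B yields 'Controlling'.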
We checked whether participants used Resetting and Controlling blocks in ways that make sense from a bounded rationality perspective. We assume that Resetting is useful at moments where expected future complexity is high, while Controlling blocks should be used when people expect low complexity (i.e. when they are ready for the next activating intervention). In line with this, the number of expected unrevealed events was higher for seconds in which Resetting blocks were performed (Median = 0.21, Mean = 2.62) than those in which Controlling blocks were performed (Median = 0, Mean = 2.12; Mood's median test: χ²(1) = 14.69, p < .001, Fig. 10). We also asked whether performing blocks was related to participants' accuracy. We compared accuracy across structures in Experiments 1 and 2 (Fig. 4 vs. Fig. 12) and found that accuracy on the full loops Cyclic1 (t(167) = 3.62, p < .001) and Cyclic4 (t(167) = 2.53, p = .012) differed significantly between the two experiments. We checked whether blocking function accounted for this difference. For Cyclic1, participants in Experiment 2 who performed at least one Resetting block (26%) were more accurate than those who did not make this kind of block (t(93) = 2.42, p = .018, Cohen's d = 0.56). However, Controlling blocks (17%) did not make a significant difference (t(93) = 1.48, p = .141). This finding was replicated in Cyclic4, where accuracy was positively associated with the use of Resetting blocks (16%, t(93) = 3.58, p < .001, Cohen's d = 1.01), but not with Controlling blocks (6%, t(93) = 1.87, p = .06). This indicates that Resetting blocks may be more helpful than Controlling blocks. Given that performance was better for several cyclic structures, there was some benefit to having the blocking ability.
Where to activate. Similar to Experiment 1, participants tended to explore the devices initially by activating untested components. Participants were more likely than chance to activate an untested component with their second and third activating interventions (and their fourth for four-node devices, ps < .001), which is in line with how the informationally efficient intervener behaves (Fig. 11).
Discussion. In Experiment 2, we allowed participants to use blocking as a tool for causal learning. The addition of blocking made the action space larger but also gave participants more fine-grained control over the learning input, allowing them to not just inject excitation into the system but also to selectively inhibit it. As found in Experiment 1, evidence informativeness positively predicted participants' performance in acyclic structures while evidential complexity negatively predicted performance in cyclic structures. Accuracy was less strongly associated with structure cyclicity than in Experiment 1, which may in part be due to the fact that blocking helped participants to accommodate and counteract the differences in excitability and ambiguity characteristic of interactions with the different causal devices. We also replicated the finding that people performed fewer activations and waited longer to perform their next activation in cyclic structures where expected computational cost was generally higher.
Participants used blocking in only a quarter of trials. However, when blocking was employed it was used in sensible ways that primarily managed inferential complexity. Participants blocked more often in cyclic than acyclic devices, and did so when many events could be expected to occur in the near future. This is consistent with the assumption that learners take management of computational cost into consideration when choosing how to intervene in real time. We categorized blocks according to two potential goals: Resetting the system, and Controlled testing, i.e. combining a block with an activation to test a subsystem in isolation. Both of these strategies were more likely to be employed after moments of high expected complexity.

Modeling the judgments
The following two sections detail our quantitative analysis of the role of complexity in shaping participants' causal judgments and intervention choices. We compare a set of computational models to demonstrate that: (1) Participants' causal judgments were affected by evidential informativeness and complexity and (2) participants' interventions strike a balance between the informativeness and complexity of future evidence. Readers less interested in technical detail can safely skip ahead to the General Discussion.
To quantitatively test the idea that evidence complexity is not just positively related to informativeness, but also impacts human performance directly, we built a computational-level model that assumes human causal judgments about each connection, j ∈ {→, ←, ↔, ∅}, are a noisy version of the ideal observer's posterior marginalized across connections, P_IO(j | d), where the noise degree depends on the density, and hence complexity, of the evidence. We capture this with a dynamic softmax function (Luce, 1959):

P(j | d) ∝ exp(P_IO(j | d) / τ(ρ))   (1)

where ρ denotes a trial's event density (average number of events per second). The judgment temperature component is thus a linear function of events, τ(ρ) = τ₁ρ + τ₂, with two parameters τ₁, τ₂ ∈ (0, +∞) that are constant across trials, while ρ varies across trials depending on what interventions are performed and how the system reacts to them. As τ(ρ) → +∞, model predictions become more uniform, while as τ(ρ) → 0 the predictions become more deterministic and increasingly resemble those of an ideal observer maximizing over the posterior. We fit this model to participants' choices across the two experiments and compared it to a baseline model that made random judgments, and an informativeness-based model that only considers the ideal observer's judgments, obtained by omitting the density-dependent term τ₁ρ from Eq. (1). We used hold-one-device-out cross-validated log-likelihood as our primary measure of model fit, but also include BIC for completeness and comparison with past work (Tauber, Navarro, Perfors, & Steyvers, 2017). Our cross-validation scheme is conservative, since it requires a unified explanation for human data despite different causal devices exhibiting markedly different characteristic dynamics. Table 2 shows the results. In both Experiments 1 and 2, the model that combines informativeness and complexity outperforms the informativeness-only model. Individual results are relatively similar between the two models in Experiment 1, but more strongly favor the combination model in Experiment 2, where there were more data points per person (12 vs. 18 data points in Experiments 1 and 2, respectively). These results suggest that, while more complex evidence carries more information on average, its complexity takes a toll on human performance, presumably due to our cognitive limitations.
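A minimal sketch of this judgment model in Python, assuming a temperature that grows linearly with event density as described above (function and parameter names are illustrative):

```python
import math

def judgment_probs(io_posterior, density, tau1, tau2):
    """Noisy-softmax judgment model (illustrative sketch).
    io_posterior: ideal-observer posterior over the four connection states,
    e.g. {'A->B': .., 'A<-B': .., 'A<->B': .., 'none': ..}.
    The temperature is linear in the trial's event density, so denser
    (more complex) evidence yields noisier judgments."""
    tau = tau1 * density + tau2  # density-dependent temperature
    weights = {s: math.exp(p / tau) for s, p in io_posterior.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}
```

With low density the predictions concentrate on the ideal observer's preferred state; with high density they flatten toward uniform, even though the underlying posterior is unchanged.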

Modeling the interventions
The final part of our analysis describes a computational account of complexity-sensitive intervention selection and compares it to participants' intervention choices in both experiments. Intervention selection is the problem of choosing what to do now in order to support future learning. Normatively, this depends on the learner's prior over causal structures P(S) at the point of the decision, which in turn depends on the already-observed data d and earlier interventions a. In our first experiment, where participants can activate but not block, each participant must choose, at each moment in time, between intervening on one of the components or doing nothing, leading to the intervention space 𝒜 = {a_A, a_B, a_C, ∅} for three-node systems, where a_X denotes activating component X. If the learner has used all their activation chances, this reduces to just the option of doing nothing, ∅. In our second experiment, where participants and models could also block components, the action space is larger, including actions b_X that toggle the block status of each node such that it becomes blocked if currently unblocked, or unblocked if currently blocked (e.g. 𝒜 = {a_A, a_B, a_C, b_A, b_B, b_C, ∅}). While in principle this intervention decision needs to be made constantly, at every instant throughout the trial, in practice we simplified our analyses by assuming that learners make exactly one intervention selection decision per second. 3

Expected information gain
Information gain (IG) is a common currency for measuring the value of evidence for an ideal learner (Coenen, Nelson, & Gureckis, 2019; Nelson, 2005; Shannon, 1948). The goal is to select the intervention (or sequence of interventions) that is expected to have high information gain or, in other words, that best reduces the learner's uncertainty. To do this exactly, one must quantify how much every possible intervention decision a* is expected to reduce future uncertainty about the structure of the causal system, given the current beliefs and marginalizing over possible future evidence. We take a greedy approach, favoring actions expected to maximally reduce future uncertainty at this point but without considering potential subsequent actions. 4 The learner's uncertainty at time t can be measured by calculating the Shannon entropy H(t) of the current prior P(S) based on all the evidence experienced so far:

H(t) = −Σ_s P(s) log P(s)   (2)

The ideal calculation of future information should consider all possible future evidence up to some future time point t′, given the hypothetical action a*. However, unlike the atemporal setting, the outcome space here is continuous, meaning we must approximate this integral by sampling a subset of possible futures. We achieve this by simulating a set of possible outcome sequences d̃ ∈ D̃ under different structures. We further assume the number of samples simulated under each structure is based on the structure's (current) prior probability (Nelson, 2005). 5 For each simulated future d̃ ∈ D̃ we compute the information gain as:

IG(d̃; a*) = H(t) − H(t′ | d̃, a*)   (3)

and expected information gain as:

EIG(a*) = (1/|D̃|) Σ_{d̃∈D̃} IG(d̃; a*)   (4)

Note that in this setting, the anticipated information results not only from the focal action choice a*, but also from other recent actions and effects that may still be expected to produce further effects and evidence. This means that one can often expect substantial information to be forthcoming even when choosing not to act (a* = ∅). Fig. 14 shows an example where the learner has already intervened, activating C at t = 0. Even though no effects have occurred yet, they are considering what to do one second later, at t = 1. The value of doing nothing (∅) is relatively low at this point as it only includes expected information resulting from the previous intervention. The learner expects less value from activating C a second time than from activating something else, since they expect to learn about the consequences of C from their first intervention. The blocking actions ({b_A, …, b_C}) also have low expected information since, at this stage, they would only serve to block potentially informative dynamics produced by the previous intervention.
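The entropy-difference computation can be sketched as follows, with simulated futures supplied externally; the `update` argument stands in for the ideal observer's posterior update and is a hypothetical interface, not the authors' implementation:

```python
import math

def entropy(posterior):
    """Shannon entropy (bits) of a dict mapping structures to probabilities."""
    return -sum(p * math.log2(p) for p in posterior.values() if p > 0)

def expected_info_gain(prior, futures, update):
    """Greedy EIG for one candidate action (illustrative sketch).
    futures: simulated event sequences, sampled in proportion to the prior.
    update(prior, future) -> posterior after observing that future."""
    h_now = entropy(prior)
    gains = [h_now - entropy(update(prior, f)) for f in futures]
    return sum(gains) / len(gains)  # Monte Carlo average over futures
```

A future that would fully resolve a 50/50 prior over two structures yields 1 bit; a future leaving beliefs unchanged yields 0.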
3 Thus, we do not attempt to predict when, within a specific 1-second window, any action would be taken, but just what action, if any, is performed in each window. Occasionally participants performed more than one action within a 1-second window. This was very rare, however, occurring in only 0.41% of windows in Experiment 1 and 0.36% in Experiment 2. For simplicity, we treated these multi-action windows as missing data and modeled the remaining >99% of windows. 4 This is a common choice due to submodularity results about the diminishing utility of planning ahead in active learning problems (Guillory, 2012). 5 All results are based on sampling 512 event sequences from each window. We selected this number for computational practicability; it is also a multiple of 64, which is the cardinality of the space of three-component structures and the square root of the cardinality of the space of four-node structures. We checked that this sample size resulted in stable and consistent results by replicating the simulation process with different seeds.
Fig. 14. Example of expected information gain (EIG) and expected computational cost (ECC). The learner activated C at t = 0 and is now deciding what to do at t = 1. The notations a_X, b_X, and ∅ stand for choices to activate component X, block component X, or do nothing, respectively. Both EIG and ECC are temporally discounted. ECC was calculated based on expected local events with a polynomial function.

Expected cost of inference
There are various ways to measure the computational costs of integrating causal structure evidence. Our inference framework works by considering the various pathways connecting the interventions and effects under each considered structure. The number of paths scales rapidly with the number of plausibly-related effects (Fig. 2a), meaning a naïve realization of our ideal observer performs an amount of computation that scales super-exponentially in the total number of events observed so far. Thus, considering all past events back to the beginning of time, which we call the global event set, is clearly infeasible outside of very simplified toy settings. Inevitably, practical constraints come into play, such as excluding from consideration events that occurred long enough ago to have a negligible chance of having caused the most recent effect. For simplicity, in our primary analyses we simply assume learners are focused on a 4-second ''backtracking window'' (see Fig. 15a; cf. Gerstenberg, Bechlivanidis, & Lagnado, 2013). That is, we assume learners enumerate and consider causal pathways involving events or interventions from up to several seconds prior to the moment at which the inference is taking place. We chose 4 s as the window size as this is long enough to include all plausible causes for any newly occurring event under our delay regime. We refer to these as the local event set and assume the learner reasons over a rolling window of local events throughout the trial. This results in a measure of inferential cost that shifts throughout a trial as a function of the number of recent events (see Fig. 15a). We also examine other choices of window size in Table C.1.
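For illustration, the rolling local event set can be computed from a sorted list of event times; this is a sketch, and the exact window-boundary convention is an assumption:

```python
from bisect import bisect_left

def local_events(event_times, now, window=4.0):
    """Events inside the backtracking window [now - window, now].
    event_times must be sorted ascending; the 4 s default follows
    the window size described above."""
    lo = bisect_left(event_times, now - window)  # first event >= now - window
    return [t for t in event_times[lo:] if t <= now]
```

The inferential cost at any moment is then some increasing function of `len(local_events(...))`, recomputed as the window rolls forward.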
While idealized Bayesian inference also requires estimation of the evidence in parallel under all possible hypotheses, in practice it is implausible that a bounded learner would consider the entire hypothesis space at the same time, since this quickly becomes intractable as the number of components increases. For instance, there are 4096 possible structures linking 4 components together, and this would increase to 1,048,576 if there were 5 components in the system. A recent proposal for how learners mitigate the complexity of structure inference in the natural world is that they consider hypotheses sequentially. For example, in the atemporal dataset setting, it has been argued that participants consider evidence under a single favored hypothesis at a time, regenerating or adapting this hypothesis only to the extent that it fails to explain the most recent evidence (Bonawitz et al., 2014; Bramley, Dayan, et al., 2017).
Since humans must, by necessity, find a more scalable approach to causal inference than our normative algorithm in order to succeed in the wild, we think of idealized Bayesian inference as an upper bound on the computational cost of inference. We explore intervention behavior under several plausible inference-complexity-scaling functions based on either the global or local number of events n and some base parameter b. These functions, including linear (b·n), polynomial (n^b), and exponential (b^n) scaling, differ in how fast the cost grows as the number of events increases (Fig. 15b).
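The three scaling families might be written as follows (a sketch, with event count n and a single free parameter b):

```python
def linear_cost(n, b):
    return b * n  # cost grows by a constant b per extra event

def polynomial_cost(n, b):
    return n ** b  # e.g. b = 2 gives a quadratic cost in event count

def exponential_cost(n, b):
    return b ** n  # each extra event multiplies the cost by b
```

Only the polynomial and exponential forms make the marginal cost of an additional event grow with the number of events already expected, which matters for the intervention predictions discussed below.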
Similar to expected information gain, we can also anticipate the computational cost (CC) of integrating future evidence. This involves counting the events occurring in simulated outcomes d̃ ∈ D̃. For each hypothetical future time point t′ considered, we count the recent events n(t′) and compute the consequent complexity of performing inference about how these events could relate:

CC(t′; d̃) = f(n(t′))   (5)

where we will later allow the complexity function f to be of linear, polynomial or exponential form with some parameter b, in either the anticipated local or global events (see Fig. 15). We can then compute the Expected Computational Cost (ECC) by averaging over d̃ ∈ D̃:

ECC_{t′}(a*) = (1/|D̃|) Σ_{d̃∈D̃} CC(t′; d̃)   (6)

Resource-rational intervention utility
According to the resource-rational framework (Lieder & Griffiths, 2020), the expected utility of an action E[U(a*)] to a bounded learner balances expected reward and cost of computation. In our case, this results in the following equation:

E[U(a*)] = Σ_{t′=t}^{t+h} γ(t′) [EIG_{t′}(a*) − λ · ECC_{t′}(a*)]   (7)

where we assume a 1 s granularity for measurement, λ scales the cost component to align it with the epistemic reward scale of bits, and the sum aggregates the expected future gains and costs over future seconds up until t + h, with γ(t′) as a discount function which diminishes the utility of information and the dis-utility of computational costs the further into the future they occur. In our case this is simply done according to how long remains until the trial ends (i.e. the remaining chance to affect the bonus):

γ(t′) = (T − t′) / T   (8)

where T is the trial length. The ideal horizon h should extend to the end of the learning episode (i.e. 45 s in our experiments), but we found no substantial impact on our choice predictions beyond t + 6. 6 Finally, a resource-rational learner should behave according to:

a = argmax_{a*} E[U(a*)]   (9)

Fig. 15 visualizes the various elements of a trial that combine into our resource-rational algorithm, and Fig. 14 shows an example in which information gain and inferential complexity differ in the choices they favor and hence trade off. In sum, our framework captures how a resource-rational agent should decide when and where to intervene to support their causal structure learning. We will compare human interventions against the predictions of this modeling framework.
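Putting the pieces together, one second-by-second reading of this utility computation looks like the following. This is a sketch under the assumption that the discount is proportional to remaining trial time; all names and the candidate profiles are illustrative:

```python
def expected_utility(eig_per_sec, ecc_per_sec, t, trial_len, lam):
    """Discounted utility of one candidate action. eig_per_sec / ecc_per_sec
    give the expected information gain and computational cost for each
    future second t, t+1, ...; lam converts cost onto the bit scale."""
    total = 0.0
    for k, (eig, ecc) in enumerate(zip(eig_per_sec, ecc_per_sec)):
        discount = max(0.0, (trial_len - (t + k)) / trial_len)  # remaining-time discount
        total += discount * (eig - lam * ecc)
    return total

def choose_action(candidates, t, trial_len, lam):
    """argmax over candidate actions, each with its own EIG/ECC profile."""
    return max(candidates, key=lambda a: expected_utility(
        candidates[a]['eig'], candidates[a]['ecc'], t, trial_len, lam))
```

With a high cost weight, a learner facing an informative but complexity-inducing activation prefers to wait; with a negligible cost weight, they act greedily on information.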

Model fitting
Our primary class of models is based on the utility function specified in Eq. (7). That is, one that is sensitive to both expected information and computational cost. The measurement of this cost is formed as (linear, polynomial, or exponential) complexity functions of (global or local) real-time expected event numbers. The complexity function controls how quickly the model expects computational costs to increase as the event number increases. For intervention decisions, this affects how strongly the model predicts people will favor choices less likely to result in large numbers of events in the future. Note that only polynomial or exponential scaling can capture the phenomenon that the more unrevealed events are expected from previous interventions, the more likely the learner is to avoid future activating interventions (Fig. 15c). We used polynomial scaling with a generic exponent of 2 as the primary form here; results for other complexity functions with different exponents or bases can be found in Table C.2. To investigate whether participants are sensitive to both expected information gain and expected computational cost, we also examined a purely information-driven model that removes ECC from Eq. (10).

6 Intuitively, this horizon is reasonable here for several reasons: (1) The rational temporal discount factor makes the distant future less important. (2) Expected information gain under the ''greedy'' assumption of no future activations approaches zero after a handful of seconds, by which time even the most complex causal systems have had enough time to loop through all their causal relationships at least once. (3) The inherently stochastic delays, combined with the complicated causal interactions and compounded by the learner's uncertainty thereof, lead to complicated simulated dynamics whose predictive power rapidly drops toward chance beyond a few seconds (cf. Bramley, Gerstenberg, Tenenbaum, & Gureckis, 2018).
However, a model that considers computational costs could easily beat such a purely information-driven model simply because greedy-EIG underestimates the value of waiting when opportunities to intervene are finite. 7 In contrast, the vast majority of time windows in the human data did not contain an action. 8 Therefore, we included a constant bias against action that increased the probability of not acting: b(a) = 1 if a = ∅ and b(a) = 0 otherwise. This allows for a fairer comparison between dynamic-cost-dependent and cost-free models. If our EIG-ECC model outperforms the EIG model, this means that participants timed their interventions in a reactive way to cope with the expectation of computational cost, rather than simply avoiding action with a constant probability across time.

We assumed stochasticity in participants' intervention choices, captured by a softmax function (Luce, 1959) over the resultant values. The resource-rational prediction is:

P(a*) ∝ exp(E[U(a*)] / τ)   (10)

The model-fitting procedure is similar to what we used for the judgment models. We provide hold-one-device-out cross-validation results and BIC results at both the aggregate and individual levels. As shown in Table 3, across both experiments, models that considered both expected information gain and inference cost outperformed purely information-driven models. The best variant for both experiments was one that anticipated costs on the basis of a polynomial function of the expected local events. Models including both information and costs also best fit more individuals in both experiments than the other models we considered (78% of participants were best fit by one of the cost-dependent models, 63% by the local-cost model specifically). Fig. 16 gives an example from Experiment 1 in which the combination of expected information and cost gives a better account of participants' intervention choices than either does alone. In Appendix B, we use the fitted parameters to simulate interventions and judgments and show that these align qualitatively with the human patterns.
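For concreteness, the softmax choice rule and the per-window log-likelihood used in fitting can be sketched as follows (illustrative; the temperature parameter and data format are assumptions):

```python
import math

def action_probs(utilities, temp):
    """Softmax over expected utilities of the candidate actions."""
    m = max(utilities.values())  # subtract max for numerical stability
    weights = {a: math.exp((u - m) / temp) for a, u in utilities.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

def log_likelihood(choices, utilities_per_window, temp):
    """Summed log-likelihood of the observed per-second choices; this is
    the quantity evaluated under hold-one-device-out cross-validation."""
    return sum(math.log(action_probs(utils, temp)[choice])
               for choice, utils in zip(choices, utilities_per_window))
```

Fitting then amounts to choosing the model's parameters to maximize this log-likelihood on the training devices and evaluating it on the held-out device.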
The fact that the boundedly rational models outperformed pure information-seeking models corroborates our central idea that participants' interventions were shaped both by how much information they expected to gain and by how hard they would have to work to process potential future information. Note that, while we do not present them for space reasons, model variants sensitive only to cost but not information perform worse than all the models we present, irrespective of how the cost is calculated. These models invariably favor waiting or blocking over activating components.
More individuals in Experiment 2 were best fit by the cost-free model according to BIC. This suggests that cost-free and cost-dependent models did not differ as much as in Experiment 1 when explaining human interventions. This might be because the computational cost component of the model predicted that learners should block fairly frequently, while participants generally blocked less often than predicted. We suspect that this is partly due to a preference for simplicity, but in terms of strategy choice rather than evidence, with blocking strategies being intuitively more involved. Furthermore, our models so far only consider the information gain and computational cost of the current intervention, while, as discussed, people are likely to plan ahead when using blocking, for instance combining a block with a subsequent activation, which goes beyond the capability of this greedy model.

Prospective vs. retrospective complexity
We explored whether expected computational cost, which depends on both recent events and how many events are anticipated to happen in the near future, can be substituted with a simpler retrospective computational cost consideration based only on how many events have occurred recently. To test this, we ran retrospective variants of each model in Table C.3, finding that these were always a slightly worse fit than their prospective versions. This could be because, while the retrospective approach captures a sensible and simple heuristic of waiting until the system is quiet, this behavior can also be accounted for by expected complexity. Moreover, retrospective complexity is insensitive to earlier learning about the structure within a trial. For instance, one might have learned that the current system is highly excitatory (captured by the evolving prior shaping expected complexity) but expect that activity will have died down by the time of the next intervention.
Retrospective complexity also only accounts for when to intervene, but not where. In contrast, expected complexity serves as a guide for both when and where to intervene. For example, if a learner has already discovered a component that seems to generate a large number of events, they may decide not to activate it again. To test whether cost-dependent choice of where to intervene is a significant feature of our participants' intervention selections, we also fit resource-rational models to only the time windows in which participants made activations. At the aggregate level, the EIG-ECC model (cross-validated log-likelihood: −4788 in Experiment 1 and −9054 in Experiment 2) had only minor advantages over a pure EIG model in predicting these windows (cross-validated log-likelihood: −4789 in Experiment 1 and −9054 in Experiment 2). However, it did best capture 31% of individuals. The individuals best fit by the EIG-ECC model in terms of the windows in which they acted performed better than the 55% of people who were best fit by an EIG-only model (accuracy: 67% vs. 59%, t(144) = 2.22, p = .028, Cohen's d = 0.38), and better than the 14% of people best fit as selecting components to intervene on randomly (accuracy: 67% vs. 49%, t(74) = 3.97, p < .001, Cohen's d = 0.99).

General discussion
In a dynamically unfolding world, uncovering causal relationships requires online control and processing of continuous sensory information. To learn about how the world works, one must choose where to act, how to act, and when to do so while also tracking what happens before, during, and after one's actions. In this paper, we investigated human learning in a setting where learners use freely timed interventions to investigate the underlying causal structure responsible for devices' patterns of real-time component activations. We investigated what factors affected the quality of their inferences, and what strategies they used to choose and time their interventions. We hypothesized that computational limitations, and a rational anticipation thereof, would play a key role in shaping real time active learning. Thus, we endeavored to quantify the actual and anticipated computational cost of the evidence stream in our task and used model fitting to show that this could help explain both human judgments and intervention patterns.

What we found
Our empirical findings fall into two classes: (1) Insights about features of real time causal systems that determine how easily people can learn them and (2) insights about how people choose interventions to support their learning.
What experimental factors affected participants' learning? Across both our experiments, participants had more success identifying acyclic than cyclic structures, while an ideal observer model showed the reverse pattern, highlighting a fundamental divergence between ideal and bounded learning. The ideal observer benefits from the higher density of events produced by feedback loops, essentially because it is able to enumerate and marginalize over the many possible causal explanations for the data and make ideal use of the rich timing information. Meanwhile, participants' ability to do this was presumably limited by their information processing capacity, leading to a kind of ''less is more'' phenomenon (cf. Gigerenzer & Todd, 1999) in which simpler evidence was often more valuable to them even when it was less normatively informative. In line with this, we showed that human accuracy patterns can be accounted for through a corruption of the ideal observer that assumed bandwidth limitations on the processing of evidence, such that inferential noise and probability of error increase with the compounding effect of event density, potentially more than counteracting the value of the additional information.
While past work has demonstrated that people are sensitive to delay reliability, and use delay information in addition to order information to shape causal structure judgments (Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018; Greville & Buehner, 2010), reliability had little impact on performance here. Delay condition did not make a statistically significant difference to accuracy in either experiment, suggesting reliable delays may be less critical in the active learning setting, where interventions can be dynamically adjusted to accommodate experienced variability. Another possibility is that our delay manipulation was too subtle. However, our unreliable condition had a sevenfold wider standard deviation, which is both salient in visualizations of the trials and commensurate with past work (Bramley, Mayrhofer, et al., 2017). We note also that the top 10% of performers were almost all in the reliable condition (100% in Experiment 1 and 90% in Experiment 2). We take this to suggest that delay reliability is important for achieving high accuracy. Additionally, reliable-condition participants were better at identifying the full-loop cyclic structures than unreliable-condition participants, which might suggest that reliable delays allowed extended temporal patterns, such as the periodicity of cyclic activations, to contribute to structure identification.
We found several other systematic judgment errors. Some participants mistook chain structures for fully connected ones, marking extraneous indirect links from the root components to distal child components. This lines up with the findings of a number of atemporal causal learning studies (Fernbach & Sloman, 2009; Lagnado & Sloman, 2004), as well as studies that have used continuous-valued variables (Davis et al., 2020). This pattern may reflect the ''local computations'' idea that people often focus on subparts of the larger system, such as pairs of variables, and so experience an appearance of direct causation when observing indirectly connected components. Participants were also quite likely to mistake fork structures for chain structures. Since the outputs of a fork would invariably occur in some staggered pattern, this seems straightforwardly consistent with occasional fallback on a heuristic of taking temporal order to directly reflect causal order (cf. Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018; McCormack et al., 2016).
Among the cyclic structures, participants frequently mistook output components as constituent elements of their upstream feedback loops (e.g. in Cyclic3, Cyclic5 and Cyclic6). This is not unreasonable because these output components tended to activate almost in lockstep with components of the feedback loop that produced them. While informative to an ideal observer, the subtle differences in inter-event delays between components that formed part of the loop, and components that formed the loop's outputs were presumably difficult to process for bounded human learners.

How do people choose what interventions to perform?
Participants generally performed fewer activations in cyclic structures. Multiple activating interventions in cyclic structures could quickly compound the complexity of the subsequent evidence, presumably overwhelming bounded learners and motivating them to avoid this. For four-node structures, more activations were required to achieve a level of certainty equivalent to that in three-node structures, and participants performed more activations on these problems, especially in Experiment 1. More generally, participants appeared to adjust their interventions depending on the current and anticipated event density. They tended to perform more activating interventions at moments when the number of expected unrevealed events was low, and were more likely to wait or use blocks to reset the system when the number of expected unrevealed events was high. Strikingly, in both experiments, the participants who performed well were those who managed to generate evidence that was relatively more informative and less complex (Fig. 7). In Experiment 2, participants did not use blocks nearly as often as activations, but most participants did use them, particularly in excitable structures and at moments of high anticipated complexity. Participants performed better on the full-loop structures (Cyclic1 and Cyclic4) when allowed to block, compared to Experiment 1. Full-loop structures are intuitively far easier to understand when exhibiting a single oscillating cycle of activity. If participants performed a second activating intervention before such a cycle died out, they would face confusing evidence patterns with two excitations traveling around the system in tandem, potentially overtaking one another and going in and out of sync. Resetting blocks in particular seemed to help learners control such complicated scenarios.

Resource-rational active structure learning
The bounded nature of cognitive computation has long been discussed in relation to models of human learning (Anderson, 1990; Simon, 1982). While early research conceptualized the role of cognitive resources qualitatively, more recent studies have aimed to quantify cognitive costs and estimate boundedly rational norms (Lieder & Griffiths, 2020; Vul, Goodman, Griffiths, & Tenenbaum, 2014). Utility functions that combine both expected rewards and computational costs have been shown to better capture a variety of human behaviors including estimation (anchoring-and-adjustment, Dasgupta et al., 2017; Lieder et al., 2018), planning (Callaway et al., 2022), information sampling (Petitet, Attaallah, Manohar, & Husain, 2021), decision making (Gershman, 2020), and communication (Hawkins et al., 2021). The current paper extends this line of research to the problem of real-time active causal learning. By building and comparing computational models, we first showed that participants' causal judgments depended on both the informativeness and the complexity of the generated evidence. More importantly, we then showed that in addition to the standardly-considered exogenous costs of interventions (Coenen et al., 2015; Coenen, Ruggeri, et al., 2019), people also care about the internal costs that arise from integrating different forms of causal interaction data. That is, learners were sensitive to the fact that information following an intervention has to be processable to be useful.
Specifically, out of the measures of complexity we examined, a polynomial function of inference-relevant events (those in the recent past and expected in the near future) best captured the influence of complexity on intervention choice, and we found that prospective complexity as well as retrospective complexity contributed to participants' choices. While it would be premature to take this functional form as final, or to make a judgment about whether participants under- or over-anticipated the actual effect of complexity on their inferences, we feel this reflects an intuitively sensible and plausible sensitivity to local events, capturing the fact that inferential complexity compounds as the number of actual causal relata increases (Bramley, Dayan, et al., 2017; Fernbach & Sloman, 2009; Van Rooij, Blokpoel, Kwisthout, & Wareham, 2019).
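As a concrete illustration, the trade-off described above can be sketched as a utility that rewards expected information and penalizes a polynomial function of inference-relevant events. The functional form, parameter names, and values here are assumptions for illustration, not the paper's fitted model.

```python
def intervention_utility(expected_info_gain, recent_events, expected_events,
                         exponent=1.5, cost_weight=0.5):
    """Hypothetical resource-rational utility for a candidate intervention.

    Complexity is a polynomial function of inference-relevant events:
    those in the recent past (retrospective) plus those expected in the
    near future (prospective). Parameter values are illustrative only.
    """
    complexity = (recent_events + expected_events) ** exponent
    return expected_info_gain - cost_weight * complexity
```

On this sketch, an intervention promising the same information is less attractive at busy moments: `intervention_utility(2.0, 4, 3)` is lower than `intervention_utility(2.0, 0, 1)`, which mirrors participants' tendency to wait or block when many unrevealed events were anticipated.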
The issue of just how complexity scales raises a question as to what inference process learners actually used in this task. Although many papers, including this one, have laid out computational-level mechanisms of causal structure induction (Griffiths & Tenenbaum, 2009; Pearl, 2000; Rottman & Hastie, 2014), these are typically intractable, requiring a run time that often scales far worse than exponentially in the number of relata. As has been argued forcefully elsewhere (Van Rooij et al., 2019), this makes most computational-level models ''non-starters'' as process accounts of human inference in natural settings, since any plausible account will have to deal with more than a handful of components or events without requiring a time to compute that is beyond the lifespan of the organism (or even the universe). We note that the resource-rational framework adds another layer of computation, which is itself intractable. We use it here to establish that people are sensitive to information and computational cost, but we do not provide a recipe for how learners anticipate these costs, given that this depends critically on their inferential processing. Human learning is necessarily more piecemeal and approximate and, indeed, human responses are much noisier than our ideal observers'. There are some promising avenues for process accounts that can model aspects of this variability and noise. Simulation-based (Gerstenberg et al., 2021), summary-statistic (Gong & Bramley, 2020), and incremental search (Bramley, Dayan, et al., 2017) algorithms have all been proposed in recent years as accounts of how learners simplify and approximate solutions to structure learning.
When considering complexity, it is perhaps surprising that there was not more difference in performance between three- and four-variable problems, since the latter involve an order of magnitude more hypotheses. However, this is in line with recent incrementalist accounts. It has been argued that learners form one or a few hypotheses at a time (Bonawitz et al., 2014), or focus on subparts of the larger system (Davis et al., 2020; Fernbach & Sloman, 2009). These accounts are better able to scale up to inferences among more relata (Bramley, Dayan, et al., 2017). Bramley, Dayan, et al. (2017) show that, in inference from interventions in the atemporal covariation setting, people rely on sequential local changes to gradually update their beliefs to incorporate new evidence. Compared to maintaining a global prior, this localist approach may help people deal with situations involving more than a handful of variables without invoking an exponential increase in computation or a catastrophic loss of performance. Thus we conclude from this pattern that, however people manage the complexity of real-time causal structure inference, they do so in a way that is affected by the number of events but less so by the total number of components. Indeed, the run time of our ideal observer was far more sensitive to the number of paths it had to evaluate per possible hypothesis than to the number of hypotheses it evaluated.
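The growth of the hypothesis space is easy to verify. Assuming, as in our task, directed links with cycles permitted and no self-loops, each ordered pair of components independently has or lacks a link, so the count below gives 64 structures for three nodes versus 4096 for four:

```python
def num_structures(n_components):
    """Number of directed graphs over n components (no self-loops,
    cycles allowed): one independent present/absent choice per
    ordered pair of distinct components."""
    n_ordered_pairs = n_components * (n_components - 1)
    return 2 ** n_ordered_pairs
```

This makes concrete why exhaustive hypothesis enumeration scales so badly, and why the number of actual-causation paths per hypothesis, rather than the hypothesis count alone, dominated our ideal observer's run time.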

Insights for a process-level model of real time active causal learning
Causal structure induction. We have shown how complexity affects human judgments from a computational-level perspective, essentially shaping the nature of the optimization problem faced by active causal learners in real time (Marr, 1982). Recent work on observational causal learning has generally found simple event order to be a strong driver of structure judgments, with delay expectations tending to have smaller and subtler effects on inferences (Bramley, Gerstenberg, Mayrhofer, & Lagnado, 2018; Valentin et al., 2020). As such, some form of simple-endorser heuristic (Bramley et al., 2015; Fernbach & Sloman, 2009), which assumes that people attribute an effect to the most recent event, may do a reasonable job of describing some participants' judgments. Such a heuristic, however, does not explain participants' good performance in distinguishing acyclic and cyclic classes. Taking fork structures as an example (Acyclic3 and Acyclic6), the order of child node activations would often reverse following repeated activations of the root node, but participants rarely reported bidirectional connections between these child components. More fundamentally, a learner that relies on a temporal order heuristic would lack the representation of the hypothesis space and uncertainty needed to guide intervention selection. We would need a separate account of how people make intervention choices, and a linking model for how the inferences and activation choices are connected with one another.
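To make the simple-endorser idea concrete, here is one minimal version (our own sketch, not a fitted model): attribute each observed event to the most recent preceding event at another component. Run on a fork whose children reverse order across repetitions, it accumulates a spurious child-to-child link in both directions, which participants rarely reported.

```python
def order_heuristic(event_stream):
    """Attribute each event to the most recent preceding event at another
    component and return the induced set of directed links (a sketch of a
    temporal-order / simple-endorser heuristic)."""
    edges = set()
    for prev, curr in zip(event_stream, event_stream[1:]):
        (t_prev, c_prev), (t_curr, c_curr) = prev, curr
        if c_prev != c_curr:
            edges.add((c_prev, c_curr))
    return edges

# a fork A -> B, A -> C whose child order reverses across two activations
run1 = [(0.0, "A"), (1.0, "B"), (1.2, "C")]
run2 = [(5.0, "A"), (6.1, "C"), (6.3, "B")]
beliefs = order_heuristic(run1) | order_heuristic(run2)
```

Here `beliefs` contains both the child-to-child links B&rarr;C and C&rarr;B alongside the true links, illustrating why a pure order heuristic cannot account for participants' accurate fork judgments.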
One other finding that a comprehensive process account would need to accommodate is that, in cyclic structures, participants frequently drew a bidirectional link between the output component and the component in the loop that tended to activate closest in time (Cyclic3, 5--7, and 8), despite the fact that participants were trained to expect longer delays than this between truly causally related events. One interpretation is that repeated spontaneous activations are a prototypical signal of cyclic relationships (Valentin et al., 2020). Our participants may have reasoned in terms of this abstract feature, and applied it without thinking more carefully about the exact timing at which the events happened (Goodman, Ullman, & Tenenbaum, 2011). Alternatively, it could be that our pre-training on delays was not sufficient to overrule a prior expectation of causal contiguity, that is, that cause and effect events happen close together in time (Hume, 1740). Indeed, past studies have required strong manipulations to override this preference (Buehner & May, 2004; Buehner & McGregor, 2006). More work is needed to better understand how temporal delays affect causal judgments.
Intervention. One long-standing debate centers on whether human active learning and intervention choice involve anticipation of information at all, or whether they rely on heuristics such as simple endorsement (Bramley et al., 2015) or positive or confirmatory testing (Coenen et al., 2015; Steyvers et al., 2003). In the current setting, one reasonable heuristic might be to explore components until effects are discovered. If a component appears to produce multiple effects, a learner might repeat-test it, or probe the components at which the effects occurred. In this way, learners might follow a kind of extended positive testing strategy in which they focus their energies on components deemed likely to produce effects, so as to gather evidence ''by making the machine go''. This reflects the rationale behind positive testing that has featured in the literature on atemporal active causal learning (Austerweil & Griffiths, 2011; Bramley, Dayan, et al., 2017; Coenen et al., 2015; Klayman & Ha, 1989; Steyvers et al., 2003). However, distinct from the atemporal setting, there is much that can be learned by repeatedly testing suspected root components in our experiments, making it hard to distinguish whether the repeated selection of root components was driven by explicitly computing expected information gain or by a simpler strategy combining random exploration with positive testing.
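A rough sketch of the extended positive-testing strategy described above (an illustrative policy of our own, not one fitted in the paper): re-test the component that has produced the most apparent effects so far, and fall back to random exploration of untried components otherwise.

```python
import random

def positive_test_choice(components, effect_counts, tried, rng=random):
    """Pick the next component to activate.

    effect_counts: apparent effects following past activations of each
    component. Repeat-test apparent effect producers ('make the machine
    go'); otherwise explore a not-yet-tried component at random.
    """
    producers = [c for c in components if effect_counts.get(c, 0) > 0]
    if producers:
        return max(producers, key=lambda c: effect_counts[c])
    untried = [c for c in components if c not in tried]
    return rng.choice(untried if untried else components)
```

Because repeat-testing suspected roots is also highly informative in our task, this policy and an expected-information-gain maximizer make overlapping predictions, which is exactly why the two are hard to distinguish with the present data.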
Additionally, it is possible that people choose when to intervene separately from where to intervene, for instance using current complexity to decide when to perform the next intervention and then selecting where without regard to anticipated complexity. To resolve these questions about psychological processing, future studies could set up continuous-time active causal learning scenarios that pit the predictions of heuristics against those of computational norms. However, the current work shows that whatever heuristic or adaptive toolbox is proposed, to fully capture intervention choice it must include strategies that are at least somewhat responsive to experienced and anticipated complexity.

Future questions
Some questions related to continuous-time causal learning remain for future research. One open question is where cyclic structures fit into the overall landscape of causal cognition. A full representation of a cyclic system seems to demand a temporal dimension, and predictions are generally sensitive to the system's current state. In our experiments, participants performed well above chance for most connections in most cyclic structures without extra training, and were able to reliably determine whether a structure contained a cycle even though they were less accurate in identifying the exact structure. This suggests that they can understand cyclic relationships relatively intuitively. Nevertheless, some cyclic structures may be particularly challenging for people to understand, and the reasons for this may go beyond what is captured by our general computation cost account. For example, participants performed relatively poorly in identifying the internal structure of cyclic structures with a loop output, as discussed above. An analogous ambiguity in reality could arise wherever it is unclear which events are pure effects (with no potential to control the system dynamics, such as symptoms of a disease) and which are constituent parts of the system's feedback loop (such as the pathogen). If one is interested in controlling a cyclic system, it is important to identify and act on components that are inputs to, or constituent parts of, the feedback loop, rather than pure effects, since only by doing so can one nudge the system toward whatever state one wants it to take. It would therefore be valuable to explore further the factors that affect learning and control in cyclic systems.
Another open question concerns the relationship between temporal and covariation-based causal learning. One possibility is that these depend on separate learning processes, but it also seems likely that there are points of overlap. For example, people may extract covariation information from continuous-time evidence through some process of abduction and discretization. Furthermore, interventions might help to create data that is more amenable to these forms of summarization. Better understanding of human causal induction requires us to go beyond covariation-based causal Bayesian networks (Pearl, 2000), but this should not involve discarding the insights they have provided in the search for a unified account of causal learning. Our current paradigm simplifies causes and effects to point events with no measurable duration. However, actual events are often extended in time in complex ways and many require a reset or refractory period between occurrences. Therefore, it might be informative to also consider a setting in which causes must be reset or take time to recover, making this paradigm more comparable to statistics-based causal learning.

Conclusions
Everyday experience is rich with events that reoccur and can be causally related in ways that allow us to predict, control, and make sense of what has happened and what is likely to happen next. While previous research on active causal learning has often sidestepped the temporal dimension, in this paper we show that human learners are sensitive to time, not just in terms of how it impinges on what can be learned from evidence in principle, but also in terms of how it shapes the practicalities of gathering and interpreting that evidence. Our experiments and modeling show that participants' causal judgments depend not just on the informativeness but also on the complexity of the evidence they gather, and that they adapt their actions to the ongoing event dynamics during learning so as to strike a balance between expected information gain and anticipated inferential complexity. These results contribute to our understanding of causal inference in continuous time, add a new dimension to the study of human active learning, and offer new directions for research into human learning.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
T. Gong et al.

Fig. B1. Results from simulated evidence according to the parameters fit in the intervention and judgment models.

Appendix A. Ideal observer (IO) learning
We formulate how a learner should ideally update their beliefs after seeing evidence produced by interventions. The ideal observer infers a posterior distribution $P(s \mid d; a)$ over causal structures $s \in S$ based on evidence $d$ conditional on interventions $a$ using Bayes' rule:

$$P(s \mid d; a) = \frac{P(d \mid s; a)\,P(s)}{\sum_{s' \in S} P(d \mid s'; a)\,P(s')}$$

Here, $P(s)$ denotes the prior probability distribution over causal structures, and $P(d \mid s; a)$ denotes the likelihood of the observed data conditional on the interventions under each possible causal structure. We assume that the data $d$ consists of all non-interventional activation events and their timings, indexed by their chronological order and subscripted by the component at which they occur, and that this is conditioned on the set of interventions $a$, including all activations and blocks performed by the learner during the learning episode. As mentioned in the main text, when calculating the likelihood of the data given a candidate structure, there are likely to be multiple potential paths of actual causation. Each of these has its own likelihood. To construct the total likelihood of a hypothesized causal structure $s$ and interventions producing a set of events, we must consider all possible causal paths $z \in Z_s$ that could describe what actually happened given structure $s$, and then repeat this for every $s \in S$. Since the path set $Z_s$ is exclusive and exhaustive conditional on the structure under consideration, we can sum the path likelihoods to calculate the total likelihood of that structure producing the data:

$$P(d \mid s; a) = \sum_{z \in Z_s} P(d, z \mid s; a)$$

To construct the possible paths, each effect event must be attributed to exactly one preceding event occurring at a component with a causal link to that effect in structure $s$. Assessing the likelihood of each valid path involves two parts: (1) explaining all actual effects; and (2) explaining away any expected effects that did not occur. The first part is simply the product of the gamma densities for all the causal delays between observed effects and their putative causes, where each delay contributes $\Gamma_{\mathrm{pdf}}(t_{\mathrm{effect}} - t_{\mathrm{cause}};\, \alpha, \beta)$.

For the latter part, we need to check that each cause event assumed by the hypothetical structure has its corresponding effect(s) in the path. Each effect that is missing must have failed (with probability $1 - w$), or (with probability $w$) be either yet to occur or have been blocked from occurring. Combining these possibilities, we get the following expression:⁹

$$w \Big[ \underbrace{\Gamma_{\mathrm{cdf}}\big(\max(0,\, t_{\mathrm{block\ onset}} - t_{\mathrm{cause}}),\ \min(t_{\mathrm{block\ offset}} - t_{\mathrm{cause}},\ t_{\mathrm{now}} - t_{\mathrm{cause}});\ \alpha, \beta\big)}_{\text{activation was blocked}} + \underbrace{\big(1 - \Gamma_{\mathrm{cdf}}(t_{\mathrm{now}} - t_{\mathrm{cause}};\ \alpha, \beta)\big)}_{\text{activation has not occurred yet}} \Big] + \underbrace{(1 - w)}_{\text{activation failed}} \tag{15}$$

⁹ $\Gamma_{\mathrm{cdf}}(t_1, t_2;\, \alpha, \beta)$ in Eq. (15) denotes the cumulative probability of a delay being between $t_1$ and $t_2$ in length.

Thus, the likelihood of a particular causal path given a particular causal structure can be calculated exactly via a combination of diagnostic reasoning (attributing exactly one cause to each observed effect) and predictive reasoning (attributing exactly one effect or failure to each causal link coming out of each activated component).

Note on parameter fitting: the base/exponent parameter $b$ was fit by grid search over $(1, 4)$, using a step of 0.002 for the range $[1.002, 1.018]$, a step of 0.02 for the range $[1.02, 1.18]$, and a step size of 0.2 thereafter. These steps approximate log-uniform intervals, so are suitable for fitting a parameter bounded at the lower end but not at the higher end. Other parameters were fitted given a fixed $b$. We report the two cases in which CV and BIC deviated in their results.
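The path-summed likelihood above can be sketched in code. The following is a simplified, stdlib-only illustration that omits blocking interventions and some path-validity constraints (for instance, it allows one cause event to explain multiple effects via one link). The delay parameters and causal strength are illustrative assumptions, and the gamma CDF is approximated by trapezoidal integration rather than a library call.

```python
import math
from itertools import product

ALPHA, BETA = 10.0, 5.0   # assumed gamma delay parameters (mean delay 2s)
W = 0.9                   # assumed causal strength (probability a link fires)

def gamma_pdf(t, a=ALPHA, b=BETA):
    """Gamma density of a causal delay t (rate parameterization)."""
    return 0.0 if t <= 0 else b ** a * t ** (a - 1) * math.exp(-b * t) / math.gamma(a)

def gamma_cdf(t, a=ALPHA, b=BETA, steps=400):
    """Trapezoidal approximation to the gamma CDF (keeps the sketch stdlib-only)."""
    if t <= 0:
        return 0.0
    h = t / steps
    area = 0.5 * (gamma_pdf(0.0, a, b) + gamma_pdf(t, a, b))
    area += sum(gamma_pdf(i * h, a, b) for i in range(1, steps))
    return min(1.0, area * h)

def structure_likelihood(events, structure, t_now):
    """Sum path likelihoods P(d | s; a) for one candidate structure.

    events:    list of (time, component, is_intervention), chronological
    structure: set of directed links (cause_component, effect_component)
    """
    effects = [e for e in events if not e[2]]
    # diagnostic step: candidate causes for each effect are earlier events
    # at a component with a link to the effect's component
    candidates = []
    for t_e, c_e, _ in effects:
        causes = [(t_c, c_c) for t_c, c_c, _ in events
                  if t_c < t_e and (c_c, c_e) in structure]
        if not causes:
            return 0.0  # some effect is inexplicable under this structure
        candidates.append(causes)
    total = 0.0
    for path in product(*candidates):  # one attribution per effect
        like = 1.0
        used_links = set()
        for (t_e, c_e, _), (t_c, c_c) in zip(effects, path):
            like *= W * gamma_pdf(t_e - t_c)
            used_links.add((t_c, c_c, c_e))
        # predictive step: explain away expected effects that never appeared,
        # as failed (1 - W) or as yet to occur (W * (1 - cdf))
        for t_c, c_c, _ in events:
            for cause, effect in structure:
                if cause == c_c and (t_c, c_c, effect) not in used_links:
                    like *= (1 - W) + W * (1 - gamma_cdf(t_now - t_c))
        total += like
    return total
```

For an activation of A at t = 0 followed by an activation at B near the expected delay, the chain A&rarr;B scores strictly higher than both the unlinked structure (which cannot explain the B event at all) and the bidirectional structure (which must additionally explain away the missing return activation at A).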

Appendix B. Comparing simulated resource-rational interventions and judgments
To test whether our intervention and judgment models can replicate participants' qualitative behavior patterns, we use the parameters fit to human data to simulate resource-rational agents that intervene on the same devices examined in Experiments 1 and 2. In both the reliable and unreliable delay conditions, we generated 30 simulated learners. The intervention patterns are shown in Fig. B1. For both experiments, simulated resource-rational learners activated components in acyclic structures more than in cyclic structures, and in four-node structures more than in three-node structures. They waited longer to perform their next intervention when the structures were cyclic than when they were acyclic. For Experiment 2, they performed more blocking actions in cyclic devices than in acyclic devices. These results demonstrate that our intervention model is capable of replicating a wide range of human intervention patterns.
We also provided simulated evidence to the judgment model, which was based on parameters fitted with human data. For both experiments, the noisy-IO judgment model replicated the human result that acyclic structures had higher accuracy than cyclic structures. Unlike participants, these simulations were not more accurate on unlinked structures in Experiment 2. This could be due to some extra assumptions that we did not include in our models, such as the possibility that rather than beginning each trial with a uniform prior over structures, participants may have expected causal models to be sparse (Lu, Yuille, Liljeholm, Cheng, & Holyoak, 2008).