Advances in techniques for imposing reciprocity in brain-behavior relations

To better understand human behavior, the emerging field of model-based cognitive neuroscience seeks to anchor psychological theory to the biological substrate from which behavior originates: the brain. Despite complex dynamics, many researchers in this field have demonstrated that fluctuations in brain activity can be related to fluctuations in components of cognitive models, which instantiate psychological theories. In this review, we discuss a number of approaches for relating brain activity to cognitive models, and expand on a framework for imposing reciprocity in the inference of mental operations from the combination of brain and behavioral data.


Introduction
The evolution of technology for measuring brain signals, such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI), has provided exciting new opportunities for studying mental processes. Today, scientists interested in studying cognition are faced with many options for relating experimentally derived neurophysiological variables to the dynamics underlying a cognitive process of interest. While conceptually the presence of these new "modalities" of cognitive measures could have immediately spawned an interesting new integrative discipline, the emergence of such a field has been slow relative to the rapid advancements made in these new technologies. Until a little over a decade ago, much of our understanding of cognition had been advanced by two dominant but virtually non-interacting groups. The larger group, cognitive neuroscientists, relies on models to understand patterns of neural activity brought forth by the new technologies. Like those of experimental psychologists, the models and methods used by cognitive neuroscientists are typically data-mining techniques, an approach that often disregards the computational mechanisms that might detail a cognitive process. The other group, mathematical psychologists, is strongly motivated by theoretical accounts of cognitive processes, and instantiates these theories by developing formal mathematical models of cognition. The models often detail a system of computations and equations intended to characterize the processes assumed to take place in the brain. As a formal test of their theory, mathematical psychologists usually rely on their model's ability to fit and predict behavioral data relative to the model's complexity.
A recent trend in cognitive science is to blend the theoretical and mechanistic accounts provided by models in the field of mathematical psychology with the high-dimensional data brought forth by modern measures of cognition. For example, Forstmann et al. (2011) advocated for the use of reciprocal relationships between the latent processes assumed by cognitive models and analyses of brain data. While blending these two fields may seem conceptually ideal, as this review will discuss, it is often not straightforward to impose such a relationship (Teller, 1984; Schall, 2004), as there are many theoretical, philosophical, and methodological hurdles any researcher must overcome. Yet the pursuit continues because the payoff is too enticing for some researchers to be deterred: the notion that agreed-upon theoretical and computational mechanisms supporting cognition could be substantiated in the one organ housing mental operations presents a unique opportunity for major advancements in the understanding of human behavior.

Reciprocal relations between brain and behavior
The relationship between fluctuations in neural data and cognitive mechanisms can be assessed through statements about the particular nature of the mapping between neural states and latent cognitive processes (Brindley, 1970; Teller, 1984; Schall, 2004). These statements are known as linking propositions, and they can be formally tested and distinguished. For example, Teller (1984) devised a set of different linking propositions specifying how physiological states map onto psychological states. In Teller's view, linking propositions should be defined by a set of logical relations, and she used systems of relations to define families of linking propositions: identity, similarity, mutual exclusivity, simplicity, and analogy. While these propositions are philosophically desirable, they depend on equality statements, which are impossible to observe in the real world as neurons cannot produce the exact same pattern of firing from one trial to the next. In our view, as trial-to-trial fluctuations in neuronal firing are unlikely to be perfectly predictive of decision dynamics, perfectly axiomatic models can be ruled out. Instead, to practically impose logical relations, we can define statistical relationships that quantify evidence for each logical proposition (see Schall, 2004 for a detailed discussion). Because these statistical relationships are posited to quantify evidence, they are viewed as being fundamentally different from perfectly causal models such as those discussed in Pearl (2009), although the intentions may often be similar in spirit. Throughout our review, we will refer to the equations defining statistical relationships as the linking function, and will only consider probabilistic links rather than fully causal ones.
The purpose of defining the linking function is to then test which brain areas are related to the psychological variables we care about. In Teller's terms, neurons that form clear logical relationships to psychological states are known as bridge locus neurons. In our terms, bridge locus neurons are neurons whose association to psychological variables is quantified through the linking function. In assessing whether brain areas are related to psychological variables, it is vital that we quantify evidence as either confirming or refuting the linking propositions. This way, we will have a clear rule about whether or not brain areas constitute the bridge locus. Fig. 1 illustrates the concept of the bridge locus, and possible considerations for its instantiation. In each panel, hypothetical brain regions are related to mechanisms within a popular cognitive model, known as the diffusion decision model (DDM; Ratcliff, 1978; Ratcliff and Rouder, 1998; Forstmann et al., 2016). The DDM is useful because it mathematically specifies how psychological variables assumed in the model are related to behavioral variables observed in experiments. For example, consider a choice between leftward and rightward motion in the classic random dot motion task. When viewing the stimulus, we notice small local effects of coherent motion, and over time, we arrive at a general consensus about which of the two motion directions is more likely. The DDM instantiates this process through sequential sampling: we extract information from the stimulus at each moment in time, and this information is gradually accumulated until we have enough information to make a decision. Conceptually, each response option can be represented as a boundary in an "evidence" space, and the time at which accumulated evidence first reaches a boundary is the time at which a choice is made.
The DDM defines psychological variables in terms of mechanisms, and these mechanisms can be adjusted for individuals or trials to better explain how behavioral data came about. Two of the key mechanisms in the model are the rate of evidence accumulation (i.e., the drift rate illustrated as the black arrow pointing toward a boundary), and the initial evidence for the alternatives (i.e., the starting point of the accumulation process). If we were to relate these mechanisms to brain data, there are a number of possible linking propositions that should be tested. Considerations in forming the bridge locus are (1) the number of candidate brain regions (one or many), (2) the number of psychological mechanisms (one or many), and (3) which brain regions should be related to which mechanisms in the model.
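To make the drift rate and starting point concrete, the following minimal sketch simulates single DDM trials with a simple Euler scheme. The function name `simulate_ddm`, the unit noise scale, the 1 ms time step, and the non-decision time `ndt` are illustrative assumptions rather than values taken from any particular study; the two boundaries sit at 0 and `boundary`.

```python
import numpy as np

def simulate_ddm(drift, boundary, start, ndt=0.3, noise=1.0, dt=0.001,
                 max_time=5.0, rng=None):
    """Simulate one diffusion decision model trial (Euler-Maruyama scheme).

    Evidence begins at `start` (0 < start < boundary), drifts at rate `drift`,
    and is perturbed by Gaussian noise. A choice is made when evidence reaches
    `boundary` (returns 1) or 0 (returns 0). Returns (choice, rt), or
    (None, nan) if neither boundary is reached within `max_time`.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = start, 0.0
    sqrt_dt = np.sqrt(dt)
    while t < max_time:
        x += drift * dt + noise * sqrt_dt * rng.standard_normal()
        t += dt
        if x >= boundary:
            return 1, t + ndt   # upper boundary, e.g., "rightward"
        if x <= 0.0:
            return 0, t + ndt   # lower boundary, e.g., "leftward"
    return None, np.nan

rng = np.random.default_rng(1)
unbiased = [simulate_ddm(1.0, 1.0, 0.5, rng=rng)[0] for _ in range(500)]
biased = [simulate_ddm(0.0, 1.0, 0.8, rng=rng)[0] for _ in range(500)]
print(np.mean(unbiased), np.mean(biased))  # should be near 0.73 and 0.80
```

The two runs separate the mechanisms: a positive drift rate favors the upper response through the accumulation process itself, whereas moving the starting point toward the upper boundary biases choices even when the drift rate is zero (the upper-boundary probability is then start/boundary).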
In the field of model-based cognitive neuroscience, there are now many different approaches for identifying the bridge locus (de Holl et al., 2016; Turner et al., 2017b). Consistent with the mathematical propositions of the bridge locus, several researchers have attempted to infer causality between the two streams of data by either directly replacing mechanisms in cognitive models with neural data, or by searching for brain regions whose statistical properties resemble the statistical properties of cognitive mechanisms. We now review these causally motivated approaches.

Direct input
The first approach we consider links neural activity from a given brain area directly to the dynamics of a decision model, and so we refer to it as the direct input approach. One of the issues with using cognitive models such as the DDM is that they are inherently and intentionally abstract. A drift rate defines the rate of accumulation, but what is the drift rate in terms of the neurophysiological process in the brain? Previous research has shown that several areas in the brain, such as the frontal eye field (FEF) and lateral intraparietal (LIP) area, exhibit an "accumulation to threshold" property, where the cumulative sum of their firing rates increases to a threshold level during the decision period, followed immediately by the initiation of a saccade (Bogacz and Gurney, 2007; Boucher et al., 2007; Glimcher, 2003; Hanes and Schall, 1996; Heekeren et al., 2004; Liu and Pleskac, 2011; Mulder et al., 2014a,b; Purcell et al., 2012; Purcell and Palmeri, 2016; Roitman and Shadlen, 2002; Shadlen and Kiani, 2013; Shadlen and Newsome, 2001; Smith and Ratcliff, 2004; Summerfield and de Lange, 2013). The pattern exhibited by these neurons is taken to be analogous to the accumulation processes in modern accumulator models, as described above (Brown and Heathcote, 2008; Ratcliff, 1978; Ratcliff and Rouder, 1998; Usher and McClelland, 2001). Because of the striking similarities between the firing of FEF and LIP neurons and the evidence accumulation process in cognitive models, it seems reasonable that the activity in these neurons may map directly onto the accumulation process.
One approach to make accumulator models more concrete is to use the neural activity during a decision process to replace the mathematical mechanism that generates evidence accumulation in the model. This approach tightly constrains the link between neural and behavioral data because the neural data are used to generate a direct prediction about behavioral data in the task. This approach was first explored in Purcell et al. (2010), who mapped the firing rate of neurons in the FEF to the evidence accumulation process in an accumulator model. Specifically, the authors mapped the firing rate of visually responsive neurons within the FEF onto perceptual evidence and the firing rate of movement-related neurons onto evidence accumulation, driving the decision process. Here, the neural activity served as a direct input to the behavioral model, obviating the need for latent processes representing evidence accumulation, such as drift rate and starting point. This allowed for a more explicit test of whether visually responsive and movement-related neurons in ocular motor areas of the brain could predict the onset and location of a saccade in a perceptual decision-making task, rather than attempting to explain this process post hoc by interpreting latent parameter estimates and mapping them onto the proposed underlying mechanisms. Since this initial investigation, several other examples have provided convincing links between the neuronal activity in the ocular motor areas and the dynamics of accumulator models (Purcell et al., 2010, 2012; Purcell and Palmeri, 2016; Schall et al., 2011).
Fig. 1. Considerations when mapping brain to behavior. When forming a map between neural and behavioral data, one must consider the type of connections that must be built, such as the number of brain regions and how they are connected to the mechanisms in a cognitive model. For example, one may consider the joint activation of only a single brain region (top row), a single cognitive mechanism (left column), many brain regions (bottom row), and many cognitive mechanisms (right column).
In addition to exploring how the neuronal activity of the FEF could map onto the evidence accumulation process, the direct input approach has been useful in distinguishing between competing accumulator model dynamics, expanding our understanding of how subjects complete the task. By exploiting the constraint imposed between the neural and behavioral data, Purcell and colleagues were able to test which transformations of the neural data were needed to best account for the behavioral data. Specifically, they tested assumptions about how evidence was accumulated over time, such as independent race and counter models (Smith and Van Zandt, 2000; Vickers, 1979), diffusion and random walk models (Laming, 1968; Link and Heath, 1975; Nosofsky and Palmeri, 1997; Ratcliff, 1978; Ratcliff and Rouder, 1998), competing accumulator models (Usher and McClelland, 2001), and gated models (Purcell et al., 2010, 2012; Schall et al., 2011).
Direct input models are most directly related to causal models in the sense that they typically involve direct transformations of the neural signal into decision variables, such as the rate of accumulation in sequential sampling models. Once a transformation has been specified within the model, any fluctuations in the neural data manifest directly as fluctuations in the behavioral response. While in principle this approach has a more concrete link, it makes strong assumptions about the veridicality of the neural data, while still treating the behavioral data as probabilistic. This creates some statistical issues when generating the behavioral metrics, as the length of the neural data is explicitly linked to the latency of the behavioral response. For example, Purcell et al. (2010) defined the decision variable as a stochastic process whose primary drive was a direct function of a single-unit recording. One can simulate the model using the single-unit data up until the length of the neural data has been exceeded. However, if the decision model has not reached a criterion to produce a prediction for the observed behavioral data, how does one go about extrapolating the neural data to continue the stochastic simulation? One approach is to pool the single-unit data across trials (for example, within a condition of the experiment) to create an aggregated signal from which simulations can be performed. The pooling approach ensures that a decision can be reached, but it also treats across-trial variability in the neural signal as noise, which sacrifices the trial-level resolution that single-unit recordings provide. Another solution is to treat the neural data as probabilistic, which in turn creates a statistical rather than purely causal link. Because treating both neural and behavioral data as probabilistic is more consistent with what we refer to as a joint model, we save the discussion of this alternative approach until later (see Cassey et al., 2016).
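The extrapolation problem can be seen in a toy simulation. The sketch below drives a noisy decision variable with a single-trial firing-rate trace and, once that trace runs out, falls back on a trial-averaged (pooled) trace from the same condition. The function `direct_input_trial`, the linear `gain` that converts spikes/s into evidence, and the noise level are hypothetical choices for illustration; this is not the model of Purcell et al. (2010), which compared several transformations of the neural signal, including gated accumulation.

```python
import numpy as np

def direct_input_trial(rate_trace, pooled_trace, threshold, gain=0.01,
                       noise=0.1, dt=0.001, max_steps=5000, rng=None):
    """Accumulate a decision variable driven directly by neural input.

    rate_trace:   single-trial firing rate (spikes/s), one sample per dt.
    pooled_trace: trial-averaged firing rate for the same condition, used
                  once the single-trial recording is exhausted; its last
                  value is held if it is exhausted too.
    Returns (decision_time_s, used_pooled), or (nan, True) if no crossing.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = 0.0
    for i in range(max_steps):
        if i < len(rate_trace):
            drive, used_pooled = rate_trace[i], False
        else:
            # Extrapolation step: across-trial variability is now ignored.
            drive = pooled_trace[min(i, len(pooled_trace) - 1)]
            used_pooled = True
        x += gain * drive * dt + noise * np.sqrt(dt) * rng.standard_normal()
        if x >= threshold:
            return (i + 1) * dt, used_pooled
    return np.nan, True

short_recording = np.full(100, 50.0)   # hypothetical 100 ms trace at 50 spikes/s
pooled = np.full(300, 50.0)            # hypothetical condition average
print(direct_input_trial(short_recording, pooled, threshold=0.2, noise=0.0))
```

With the noise switched off, the threshold is reached only after about 400 ms, so most of the simulated decision is driven by the pooled signal rather than by the recorded trial, which is exactly the loss of trial-level information described above.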

Indirect input
The field of reinforcement learning developed an approach to find neural (often fMRI) correlates of internal model representations (Gläscher and O'Doherty, 2010; O'Doherty et al., 2007). For example, the Rescorla-Wagner model (Rescorla and Wagner, 1972) of classical conditioning characterizes learning as a process of sequentially updating the expected value associated with presented stimuli. The updating process depends on the mismatch between the expected value and actual outcome (the prediction error), modulated by a learning rate parameter that can be estimated using behavioral data. The resulting model produces trial-by-trial expected values and prediction errors, which can be regressed against neural data to find neural correlates of internal representations. Multiple applications of this approach suggest critical roles of the ventral striatum in encoding prediction errors and of orbitofrontal and mediofrontal cortex in encoding expected value (Daw et al., 2006; Gläscher et al., 2009; Hampton et al., 2006; O'Doherty et al., 2003; Tanaka et al., 2004).
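A minimal sketch of this model-based regression follows, assuming a single stimulus, a fixed learning rate (in practice estimated from behavior), and a synthetic "neural" signal. It omits convolution with a hemodynamic response, which a real fMRI analysis would require; the function names are illustrative.

```python
import numpy as np

def rescorla_wagner(outcomes, alpha, v0=0.0):
    """Trial-by-trial expected values (before feedback) and prediction errors."""
    values = np.empty(len(outcomes))
    errors = np.empty(len(outcomes))
    v = v0
    for t, r in enumerate(outcomes):
        values[t] = v
        errors[t] = r - v           # prediction error
        v += alpha * errors[t]      # Rescorla-Wagner update
    return values, errors

def fit_regressor(regressor, signal):
    """Ordinary least squares with an intercept; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(regressor), regressor])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return beta

# Synthetic example: a signal that tracks prediction errors with slope 2.
rng = np.random.default_rng(0)
outcomes = rng.binomial(1, 0.7, size=200).astype(float)
values, errors = rescorla_wagner(outcomes, alpha=0.3)
signal = 2.0 * errors + rng.normal(0.0, 0.2, size=200)
print(fit_regressor(errors, signal))   # slope should be close to 2
```

The same regression can be repeated with `values` as the regressor to look for correlates of expected value rather than prediction error.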
Internal model representations also provide a means to perform model discrimination. Mack et al. (2013) addressed the debate (Minda and Smith, 2002; Zaki et al., 2003) on whether category representations are based on exemplars or on prototypes (also see Palmeri, 2014). Prototype theory (Posner and Keele, 1968; Reed, 1972) holds that category representations are based on abstract prototypes that bear resemblance to all members of the category, while exemplar theory (Medin and Schaffer, 1978; Nosofsky, 1986) argues that category representations are based on episodic traces formed during learning. Computational models of both theories fit behavioral data equally well, yet their internal representations differ. Mack et al. exploited this discrepancy using multivariate pattern analysis (MVPA) to decode both models' internal representations from fMRI data obtained while participants performed an object categorization task. The results showed that neural data resembled the internal representations of exemplar theory much more closely than those of prototype theory.
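The divergence between the two accounts can be illustrated with a toy sketch, assuming two-dimensional stimuli, city-block distance, and an exponential similarity gradient with sensitivity `c` (a common simplification in exemplar models such as the generalized context model). The function names and stimulus values are hypothetical.

```python
import numpy as np

def exemplar_similarity(probe, exemplars, c=1.0):
    """Exemplar account: summed similarity to every stored category member."""
    dists = np.abs(np.asarray(exemplars) - probe).sum(axis=1)  # city-block
    return np.exp(-c * dists).sum()

def prototype_similarity(probe, exemplars, c=1.0):
    """Prototype account: similarity to the category's central tendency."""
    prototype = np.asarray(exemplars).mean(axis=0)
    return np.exp(-c * np.abs(prototype - probe).sum())

exemplars = np.array([[0.0, 0.0], [2.0, 2.0]])   # two studied category members
for probe in (np.array([1.0, 1.0]), np.array([0.0, 0.0])):
    print(probe, exemplar_similarity(probe, exemplars),
          prototype_similarity(probe, exemplars))
```

The prototype account rates the never-studied center [1, 1] as most typical, whereas the exemplar account favors the studied item [0, 0]. Trial-by-trial quantities of this kind are the sort of internal representation that can then be compared against patterns of neural activity.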
Relating internal model representations to neural activity is also a prominent method in the field of vision. For example, the recent success of deep neural networks (DNNs; Kriegeskorte, 2015; LeCun et al., 2015; Yamins and DiCarlo, 2016) in predicting object category spawned research lines investigating how well DNN representations of visual objects correspond to representations in human cortex (Cichy et al., 2016; Güçlü and van Gerven, 2015; Khaligh-Razavi and Kriegeskorte, 2014; Yamins et al., 2014). In one study, Güçlü and van Gerven (2015) transformed DNN representations into predicted neural responses, and correlated these with actual neural responses across the ventral stream of the visual pathway. They showed that the gradient of increasing complexity of object representations across layers in the DNN closely matched the increasing complexity of object representations across the ventral stream. These and similar approaches with other encoding models (Kay et al., 2013a,b; Kay and Yeatman, 2017) help us understand what kinds of computations the brain performs to transform sensory information into meaningful representations.
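A linear encoding model of this kind can be sketched as ridge regression from model features (for example, the activations of one DNN layer) to voxel responses, scored by correlation on held-out stimuli. The data below are synthetic, and the penalty `lam` and the function names are illustrative assumptions rather than the exact procedure of Güçlü and van Gerven (2015).

```python
import numpy as np

def fit_ridge(features, responses, lam=1.0):
    """Closed-form ridge weights mapping features (n x p) to voxels (n x v)."""
    p = features.shape[1]
    return np.linalg.solve(features.T @ features + lam * np.eye(p),
                           features.T @ responses)

def voxelwise_r(predicted, observed):
    """Pearson correlation between predicted and observed responses per voxel."""
    pc = predicted - predicted.mean(axis=0)
    oc = observed - observed.mean(axis=0)
    return (pc * oc).sum(axis=0) / np.sqrt((pc ** 2).sum(axis=0) * (oc ** 2).sum(axis=0))

# Synthetic data: 200 stimuli, 10 layer features, 5 voxels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
W_true = rng.normal(size=(10, 5))
Y = X @ W_true + rng.normal(scale=0.5, size=(200, 5))
train, test = slice(0, 150), slice(150, 200)
W = fit_ridge(X[train], Y[train], lam=1.0)
r = voxelwise_r(X[test] @ W, Y[test])
print(r)
```

Repeating this fit for each network layer and each cortical region, and asking which layer best predicts which region, is one way to express the layer-to-region correspondence described above.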

Parametric maps
Where the indirect input approach relates internal model representations to neural data, another approach is to regress the cognitive model parameters themselves onto neural data (Forstmann et al., 2008, 2010a,b, 2016; Boehm et al., 2014; Ho et al., 2012; Mansfield et al., 2011; Mulder et al., 2012, 2014; Summerfield and Koechlin, 2010; van Maanen et al., 2011; White et al., 2012, 2014; Rodriguez et al., 2015; Turner et al., 2018a). Generally, the aim is to explore neural mechanisms underlying cognitive processes. In a now classic example, Forstmann et al. (2008) used this approach to study neural adjustments underlying the speed-accuracy trade-off (SAT) in perceptual decision-making. The SAT refers to the ability to increase accuracy at the cost of speed and vice versa (Bogacz et al., 2010; Heitz and Schall, 2012). Experiments studying the SAT generally instruct participants to stress either speed or accuracy in each upcoming decision-making trial. Decision-making models are then used to quantify the difference in response caution between SAT instructions, and this difference serves as a measure of participants' flexibility in adjusting their behavior. Forstmann et al. (2008) found that individual variability in response caution adjustments correlates with individual variability in blood-oxygen-level-dependent (BOLD) responses in the striatum and presupplementary motor area. Multiple follow-up studies (Mansfield et al., 2011), for example using structural MRI measures (Forstmann et al., 2010b, 2011) or focusing on within-subject variability by calculating trial-by-trial adjustment in response caution (Boehm et al., 2014; van Maanen et al., 2011; Turner et al., 2015), provided additional evidence for a role of these areas in control of response caution.
These results are especially interesting as they support prominent theories of action selection in the basal ganglia (Alexander, 1986; Bogacz and Larsen, 2011; Frank, 2006; Lo and Wang, 2006; Ratcliff and Frank, 2012; O'Reilly and Frank, 2006).
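The across-subject version of this analysis reduces to two steps: estimate a caution parameter (for example, DDM boundary separation) separately under each SAT instruction, then correlate the per-subject difference with a per-subject BOLD contrast. In the minimal sketch below, the boundary estimates are assumed to come from an earlier model fit (not shown) and all numbers are hypothetical. Note that this two-stage procedure treats the point estimates as if they were known without error.

```python
import numpy as np

def caution_adjustment(boundary_accuracy, boundary_speed):
    """Per-subject change in response caution between SAT instructions."""
    return np.asarray(boundary_accuracy) - np.asarray(boundary_speed)

def parametric_map_r(model_effect, bold_effect):
    """Pearson correlation between a model-based effect and a BOLD contrast."""
    return np.corrcoef(model_effect, bold_effect)[0, 1]

# Hypothetical boundary estimates for five subjects, plus a BOLD contrast
# (accuracy minus speed) from one region of interest.
a_acc = np.array([1.6, 1.4, 1.9, 1.2, 1.7])
a_spd = np.array([1.0, 1.1, 1.0, 1.0, 1.1])
bold = np.array([0.4, 0.25, 0.8, 0.15, 0.5])
r = parametric_map_r(caution_adjustment(a_acc, a_spd), bold)
print(r)
```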
Perceptual decision-making models allow researchers to quantify other latent processes as well. Various studies (Forstmann et al., 2010a; Mulder et al., 2012, 2014; Summerfield and Koechlin, 2010) focused on the neural mechanisms that allow for flexible adjustment of behavior due to biasing information. Typically, participants were presented with a cue providing either prior information (i.e., the cued choice option is more likely to be correct), or potential pay-off (i.e., the reward associated with the cued choice option is higher). After the amount of choice bias was quantified using decision-making models, individual differences in bias were correlated with differences in neural measures. The results suggest that in addition to frontoparietal networks (Mulder et al., 2012), the orbitofrontal cortex is involved in processing such bias cues (Forstmann et al., 2010a; Summerfield and Koechlin, 2010). These results imply a role for the orbitofrontal cortex in encoding expected reward, which is corroborated by the reinforcement learning literature described above.
As another example, Turner et al. (2018) examined the relationship between nonlinear mechanisms in decision processes and the engagement of prefrontal cortex in the intertemporal choice task. In this task, subjects are asked to choose between a lower-valued immediate reward and a higher-valued reward at some point in the future. Similar to the adjustments made in perceptual models for preferential choice (Usher and McClelland, 2004; Hotaling et al., 2010; Turner et al., 2018c), Turner et al. (2018) adapted mechanisms such as lateral inhibition and leakage, which are intrinsic to Decision Field Theory (Busemeyer and Townsend, 1993) and the Leaky Competing Accumulator model (Usher and McClelland, 2001, 2004; see Box 1), to examine a broad range of possible theoretical explanations of how self-control processes emerge when making goal-directed decisions. Importantly, their analyses revealed that when subjects engage in a self-controlled decision that maximizes reward despite a temporal cost, their brains are differentially activated relative to when they make impulsive decisions that minimize temporal cost and do not maximize reward. After fitting their models hierarchically to data from many individuals, they determined that the best explanation for this neural asymmetry was a dynamic, oscillatory feature selection process (Busemeyer and Townsend, 1993; Hotaling et al., 2010; Dai and Busemeyer, 2014) combined with active asymmetric suppression (i.e., through lateral inhibition; Usher and McClelland, 2001, 2004) of tempting, yet inferior, choice options. Furthermore, Turner et al. (2018) showed that the degree of asymmetry in active suppression was significantly correlated, on a trial-by-trial level, with activity in fronto-parietal brain areas often engaged in cognitive control (Botvinick et al., 2001, 2004).

Joint models enforce statistical reciprocity
As discussed in Section 1, linking propositions are strict logical statements between physiological and psychological variables. However, because both neural and behavioral data are noisy and biologically constrained, strict linking propositions are impossible to instantiate formally (Schall, 2004). As a remedy, our methods of assessing the strength of a relationship should be based on statistical principles, where noisy relationships in the data are taken into account. Importantly, to test which brain regions constitute the bridge locus and which do not, we must quantify the strength of the relationship by carefully considering all sources of variability in the neural and behavioral measures. Furthermore, as highlighted in Forstmann et al. (2011), it is important that the link be reciprocal, as both the physiological and psychological bases of the bridge locus are random variables.
One new approach for addressing the statistical uncertainty of the bridge locus while simultaneously imposing a reciprocal link between brain measures and decision variables is the "joint modeling" approach (Turner et al., 2013, 2016, 2017; Turner, 2015; Cassey et al., 2016). Unlike the direct input or parametric map approaches, joint models enforce a constraint on model parameters based on the random variation in the neural data. In other words, if one treats the neural data as a statistical covariate within the model, the estimates of the behavioral model parameters can be better informed. This simple strategy gives joint models some important advantages. For example, joint models are better equipped to (1) handle mismatching data (i.e., when the size of the neural data differs from the size of the behavioral data) and missing data, (2) perform inference on the magnitude of brain-behavior relationships (i.e., they are not subject to Type I errors as in the parametric mapping approach), (3) compare hypothesized brain-behavior relationships across models, and (4) make predictions about either neural or behavioral data. Fig. 2 provides an illustration of the joint modeling approach, applied to 30 s worth of an experiment involving a decision among three alternatives. Neural and behavioral data are separated into streams, and each measure is captured by "submodels." For neural data, candidate sets of ROIs are defined (left panel) and the time course of their activation is extracted. A statistical model of how stimulus presentations (red triangles) affect the BOLD response is specified and fit to the extracted neural signal (middle). The process of fitting the model to data yields estimates of activation parameters δ for each stimulus presentation. For the behavioral data, a cognitive model is developed (left), and similarly fit to behavioral data such as choice response time measures (middle).
Parameter estimates θ quantify how the stimulus presentations affect the psychological processes during the task. Finally, to impose statistical reciprocity, a linking function specifies how δ and θ are related (see Box 2).
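The neural submodel can be as simple as a single-trial general linear model in which each stimulus presentation gets its own regressor, so that the fitted coefficients serve as trial-wise activation estimates. The sketch below mirrors the 30 s example of Fig. 2 but assumes a simplified single-gamma hemodynamic response (not the canonical double-gamma function used by standard fMRI packages), a repetition time of 1 s, and noiseless synthetic data; all function names are hypothetical.

```python
import numpy as np
from math import factorial

def gamma_hrf(t, shape=6):
    """Simplified single-gamma HRF (an assumption); peaks at t = shape - 1 s."""
    tp = np.clip(np.asarray(t, dtype=float), 0.0, None)   # zero before onset
    return tp ** (shape - 1) * np.exp(-tp) / factorial(shape - 1)

def single_trial_design(onsets, n_scans, tr=1.0):
    """One HRF-shaped regressor per stimulus onset (columns = presentations)."""
    times = np.arange(n_scans) * tr
    return np.column_stack([gamma_hrf(times - onset) for onset in onsets])

def estimate_activations(bold, onsets, tr=1.0):
    """Least-squares estimate of each presentation's activation amplitude."""
    X = single_trial_design(onsets, len(bold), tr)
    amplitudes, *_ = np.linalg.lstsq(X, bold, rcond=None)
    return amplitudes

# 30 s of synthetic data with three stimulus presentations.
onsets = [2.0, 12.0, 22.0]
true_amplitudes = np.array([1.0, 0.5, 2.0])
bold = single_trial_design(onsets, 30) @ true_amplitudes
print(estimate_activations(bold, onsets))  # approximately [1.0, 0.5, 2.0]
```

Real BOLD time courses also contain noise and slow drift, so a practical design would add nuisance regressors and a noise model; the point here is only that each presentation yields its own activation estimate, which is what a linking function can then relate to the cognitive model's parameters.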
Of course, there are many different ways of creating a linking function between the two streams of data, and these linking functions have different advantages and disadvantages. One aspect of the linking function that is important for creating divisions in the literature is the manner in which reciprocity is imposed. For example, links can be imposed that are partially reciprocal, where only one set of parameters (e.g., θ) is influenced by both streams of data. On the other hand, fully reciprocal links can be specified such that both sets of parameters (i.e., δ and θ) are influenced by both streams of data. The definition is a technical one, but it distinguishes the types of reciprocity in terms of how the likelihood function relating model parameters to data is specified. If the likelihood of a stream of data can be written as a function of one (i.e., partial) or both (i.e., full) sets of parameters, it is what we call a joint model. Because the (likelihoods of the) approaches we discuss in Section 1 cannot be expressed in this way, we do not consider them to be joint models per se, although clearly the intentions are similar. Fig. 3 illustrates three different ways of specifying the linking structure, two of which have been used, and one we will discuss in the Future Directions section below (Fig. 3, right panel). The left panel shows an approach we refer to as a Directed joint model, where neural features serve as regressors for model parameters. For example, a linear plane could be used to relate the activations δ1 and δ2 of two regions of interest to a model parameter θ (bottom panel). Here, the values of δ1 and δ2 strictly determine the value of θ, and so the path of influence is unidirectional, constituting partial reciprocity. Another approach is the Covariance approach, where a probabilistic linking function is imposed.
Here, all neural features can interact with one another, as well as with the model parameters. The probabilistic map can be used to specify a distribution on θ, where the values of δ1 and δ2 are used to slice through a hyperellipsoid. Here, the path of influence is bidirectional (i.e., double-headed arrows), constituting full reciprocity. Finally, to create more flexible maps, one could use a Neural Network approach where all neural features map to "hidden states" before being converted into model parameters. These linking functions allow for distributed activation that can be highly nonlinear, yet they still establish only partial reciprocity. We now discuss these linking functions in turn.
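For the Covariance approach, one concrete choice (an assumption here, though a multivariate normal is a natural default) is to let the neural features and the model parameter be jointly normal. "Slicing" the ellipsoid at observed neural values then amounts to standard Gaussian conditioning: the parameter's mean shifts by a regression on the neural deviations, and its variance shrinks by the amount those features explain. The function below is a hypothetical sketch that stores the model parameter in the last position of the joint vector.

```python
import numpy as np

def condition_on_neural(neural, mu, sigma):
    """Mean and variance of a model parameter given observed neural features.

    The joint vector is (neural_1, ..., neural_k, parameter) ~ MVN(mu, sigma),
    with the model parameter in the last position.
    """
    neural = np.asarray(neural, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    k = len(neural)
    s_nn = sigma[:k, :k]                     # covariance among neural features
    s_pn = sigma[k, :k]                      # parameter-neural covariances
    w = np.linalg.solve(s_nn, s_pn)          # implied regression weights
    mean = mu[k] + w @ (neural - mu[:k])
    var = sigma[k, k] - s_pn @ w
    return mean, var

mu = np.zeros(3)
sigma = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.5],
                  [0.5, 0.5, 1.0]])
print(condition_on_neural([1.0, 1.0], mu, sigma))  # mean 1.0, variance 0.5
```

Because the same covariance matrix can be conditioned in the other direction (neural features given the parameter), information flows both ways, which is the sense in which this link is fully reciprocal.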

Directed models
The left panel of Fig. 3 illustrates the basic idea behind directed joint models: parameters of a behavioral model are linked to parameters of a neural signal of interest through a deterministic function.
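A minimal sketch of a directed link follows, assuming a linear plane like the one in Fig. 3 and, as the behavioral submodel, a diffusion process with unit noise. The neural activations fully determine a trial-wise drift rate, which in turn fixes the predicted probability of the upper-boundary response through the standard first-passage formula for a Wiener process with absorbing boundaries. The function names and coefficient values are hypothetical.

```python
import numpy as np

def directed_link(activations, beta0, betas):
    """Deterministic linear link: parameter_t = beta0 + sum_j betas[j] * activation_tj."""
    return beta0 + np.asarray(activations, dtype=float) @ np.asarray(betas, dtype=float)

def ddm_upper_prob(v, a, z):
    """P(upper boundary) for a Wiener process with drift v, unit noise,
    boundaries at 0 and a, and starting point z."""
    v = np.asarray(v, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        p = np.expm1(-2.0 * v * z) / np.expm1(-2.0 * v * a)
    return np.where(v == 0.0, z / a, p)   # the v -> 0 limit is z / a

# Two regions' single-trial activations -> trial-wise drift -> choice probability.
activations = np.array([[1.0, 2.0],
                        [0.0, 0.0]])
drift = directed_link(activations, beta0=0.5, betas=[1.0, -0.25])
print(drift, ddm_upper_prob(drift, a=1.0, z=0.5))
```

Because the link is deterministic, uncertainty flows from the neural estimates into the drift rate but not back, which is the unidirectional, partially reciprocal structure described above.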

Box 1 Popular decision models describing human behavior.
There are several models that work well to describe choice response time distributions in a variety of decision making paradigms. Three popular models are the Linear Ballistic Accumulator (LBA; Brown and Heathcote, 2008) model, the Racing Diffusion Model (RDM; Logan et al., 2014), and the Leaky, Competing Accumulator (LCA; Usher and McClelland, 2001) model. These models make a number of different processing assumptions, and the figure below illustrates a few of these important differences. One can view the models as having similar architectures, but with increasing degrees of complexity (arranged in increasing order from left to right).

Linear Ballistic Accumulator Model: The left panel shows a graphical representation of the LBA model for a choice among three alternatives. Each response option is represented as a single accumulator (i.e., the red, blue, and green lines). Following the presentation of a stimulus, evidence ballistically accumulates for each alternative until one of the alternatives reaches the threshold (top line). The model assumes some initial amount of evidence is present for each response option, and this amount is randomly distributed across trials. The rate of evidence accumulation itself is also randomly distributed across trials, but has a fixed mean for each option, allowing one option to be chosen systematically over the others. The accumulation process in the model is linear, and each alternative accumulates information independently, meaning that the state of one accumulator does not depend on any others.
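The LBA's assumptions translate almost line for line into a simulation. The sketch below uses the parameter names common in the LBA literature (start-point range A, threshold b, drift-rate standard deviation s, non-decision time t0), but the default values and the function name are illustrative assumptions.

```python
import numpy as np

def simulate_lba(v, A=0.5, b=1.0, s=0.25, t0=0.2, n_trials=1000, rng=None):
    """Simulate choices and response times from a Linear Ballistic Accumulator.

    v: mean drift rate for each response option (one accumulator per option).
    Trials on which every sampled drift is non-positive return rt = inf.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v, dtype=float)
    k = rng.uniform(0.0, A, size=(n_trials, v.size))   # random start points
    d = rng.normal(v, s, size=(n_trials, v.size))      # random drift per trial
    with np.errstate(divide="ignore"):
        # Linear, independent rise: time for each accumulator to reach b.
        finish = np.where(d > 0, (b - k) / d, np.inf)
    choice = finish.argmin(axis=1)                     # first to threshold wins
    rt = finish.min(axis=1) + t0
    return choice, rt

choice, rt = simulate_lba([1.0, 0.5, 0.5], rng=np.random.default_rng(0))
print(np.bincount(choice, minlength=3) / choice.size)  # choice proportions
```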
Racing Diffusion Model: The middle panel illustrates a racing diffusion process (Logan et al., 2014), which is a more general case of the DDM. In the racing diffusion process, evidence for each alternative accumulates independently, as in the LBA. However, the DDM assumes that evidence accumulates in a perfectly anti-correlated fashion, meaning that evidence for one alternative is evidence against the other alternative. This feature of the DDM makes it difficult to apply directly to multi-alternative choice. Like the DDM, the racing diffusion model adds to the LBA an assumption of within-trial variability in the accumulation process. The middle panel illustrates this stochastic process by the wavy paths through evidence space as a function of time.
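A racing diffusion process can be sketched by giving each option its own noisy accumulator and stopping when the first one reaches threshold. This is a simplified sketch under assumed defaults (all accumulators start at zero, a single fixed threshold, Euler time steps); it is not meant to reproduce the exact specification of Logan et al. (2014), and the function name is hypothetical.

```python
import numpy as np

def simulate_racing_diffusion(v, b=1.0, s=1.0, t0=0.2, dt=0.001,
                              max_time=5.0, rng=None):
    """One trial of independent racing diffusions (Euler scheme).

    Each option has its own accumulator with drift v[i] and within-trial
    Gaussian noise; the first accumulator to reach b determines the choice.
    Returns (choice, rt), or (None, nan) if no accumulator finishes.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v, dtype=float)
    x = np.zeros_like(v)
    sqrt_dt = np.sqrt(dt)
    for step in range(1, int(round(max_time / dt)) + 1):
        x += v * dt + s * sqrt_dt * rng.standard_normal(v.size)
        crossed = np.flatnonzero(x >= b)
        if crossed.size:
            # Break (rare) ties in favor of the accumulator furthest past b.
            return int(crossed[np.argmax(x[crossed])]), step * dt + t0
    return None, np.nan

rng = np.random.default_rng(3)
choices = [simulate_racing_diffusion([3.0, 0.5, 0.5], rng=rng)[0] for _ in range(200)]
print(np.bincount(choices, minlength=3) / len(choices))
```

Unlike the LBA sketch, evidence here fluctuates within a trial, so even the accumulator with the weakest drift occasionally wins the race.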
Leaky Competing Accumulator Model: The LCA model was developed as a neurologically plausible way to describe the dynamics of response competition. Within the LCA, several nonlinearities complicate the accumulation process. Most importantly, the accumulators compete with one another in a way that is state-dependent: as one accumulator gathers more evidence, it can inhibit other accumulators, causing their rate of accumulation to slow and even become negative. In the illustration above, this competitive dynamic can be seen by inspecting the interaction of the accumulators, where the green accumulator dominates first the red accumulator, and later the blue accumulator. Like the DDM, the LCA assumes within-trial variability. Traditional applications of the LCA do not usually assume between-trial variability in the drift rate, and only occasionally assume between-trial variability in starting point. The LCA model also assumes that the accumulation of evidence is "leaky", meaning that some information is lost during the integration of sensory information.
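The LCA's leak, lateral inhibition, and within-trial noise can be sketched as a discretized stochastic differential equation with activations floored at zero, following the general form of Usher and McClelland (2001). The parameter values, the function name, and the option of passing a full (possibly asymmetric) inhibition matrix are illustrative assumptions; an asymmetric matrix is one way to express the kind of asymmetric suppression discussed earlier in this review.

```python
import numpy as np

def simulate_lca(inputs, leak=0.2, inhibition=0.3, b=1.0, s=0.1, t0=0.2,
                 dt=0.001, max_time=5.0, rng=None):
    """One trial of a Leaky Competing Accumulator (Euler scheme).

    inhibition: scalar (same inhibition between every pair of accumulators)
                or an n x n matrix whose entry [i, j] is j's inhibition of i.
    Returns (choice, rt), or (None, nan) if no accumulator reaches b.
    """
    rng = np.random.default_rng() if rng is None else rng
    inputs = np.asarray(inputs, dtype=float)
    n = inputs.size
    if np.isscalar(inhibition):
        W = inhibition * (np.ones((n, n)) - np.eye(n))  # no self-inhibition
    else:
        W = np.asarray(inhibition, dtype=float)
    x = np.zeros(n)
    sqrt_dt = np.sqrt(dt)
    for step in range(1, int(round(max_time / dt)) + 1):
        # Input drives, leak decays, and competitors inhibit each accumulator.
        dx = (inputs - leak * x - W @ x) * dt + s * sqrt_dt * rng.standard_normal(n)
        x = np.maximum(x + dx, 0.0)   # activations cannot go negative
        if np.any(x >= b):
            return int(np.argmax(x)), step * dt + t0
    return None, np.nan

rng = np.random.default_rng(4)
results = [simulate_lca([1.0, 0.6, 0.6], rng=rng) for _ in range(100)]
print(np.mean([c == 0 for c, _ in results]))  # proportion choosing option 0
```

As the leading accumulator grows, its inhibition pushes the competitors' net rate of change below zero, reproducing the state-dependent competition described in the box.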

Fig. 2.
Illustration of joint modeling approach. The figure shows a hypothetical example consisting of 30 s worth of an experiment involving a decision among three alternatives. For neural data, regions of interest are defined (left) and the blood oxygenation level dependent (BOLD) response can be extracted. Statistical models can be fit to the observed BOLD time course (middle), and parameters for, say, neural activation can be estimated. For behavioral data, a cognitive model is developed (left) with mechanisms that are cognitively meaningful. The model can then be fit to data (middle), and parameters for, say, drift rate can be estimated. Finally, joint models specify how the neural parameters are related to the cognitive model parameters through a linking function. In each model schematic, red triangles indicate stimulus presentations.

Box 2
Linking functions relating brain to behavior.
In describing neural data N, one approach is to use a statistical model such as the general linear model (Penny and Friston, 2004) or topographic latent factor analysis (Gershman et al., 2011). These models have a set of parameters $\delta$ that control their shape in ways that can closely match observed neural data. On the other hand, one can use theoretical models of cognitive processes with parameters $\theta$ to describe behavioral data B. To complete the process of linking the two streams of data, joint models were proposed as a way to directly relate the parameters $\delta$ describing neural data to the parameters $\theta$ describing behavioral data. Turner et al. (2013) proposed a completely generic function of the following form:

$$(\delta, \theta) \sim \mathcal{M}(\Omega). \quad (1)$$

Here, the parameter(s) $\Omega$ control the shape of the link between $\delta$ and $\theta$. The connection enforced by the overarching distribution $\mathcal{M}$ is concrete: one must make a specific assumption about the relationship between $\delta$ and $\theta$ when considering the underlying cognitive processes involved. As this article has suggested, there are many ways to specify this link: some links are probabilistic, others are deterministic, and still others are based on machine learning techniques.
The left panel of Fig. 3 illustrates the first type of joint model we discussed in this article, an approach we refer to as "Directed" (Cavanagh et al., 2011; Nunez et al., 2015, 2017; Frank et al., 2015). The Directed approach uses a set of parameters $\delta$ to describe the functional properties of neural data N through some statistical model, and these parameters also modulate the behavioral model parameters $\theta$ through a linking function $\mathcal{L}$, such that

$$\theta = \mathcal{L}(\delta). \quad (2)$$

In the applications described in this article, the linking function usually takes the form of a multivariate regression model. For example, suppose the parameters $\delta_{i,k}$ describe a set of $K$ activations on Trial $i$ from several regions of interest (i.e., $k \in \{1, 2, \ldots, K\}$). One could assume that a generic linear combination of these activations gives rise to the behavioral parameters of interest, such that

$$\theta_i = \beta_0 + \sum_{k=1}^{K} \beta_k \, \delta_{i,k}.$$

Here, the activation on each trial for each ROI is scaled by $\beta_k$ and shifted by $\beta_0$ to best capture the neural data, while also generating a good prediction for behavioral data through the parameters $\theta$. This functional form can be viewed as a single-layer perceptron model (Minsky and Papert, 1969) that maps a set of inputs $\delta$ to a set of outputs $\theta$.
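The Directed linear link is simple enough to state directly in code. The sketch below only evaluates the mapping from trial-wise ROI activations to a behavioral parameter for given coefficients; in an actual joint model the coefficients would be estimated together with the neural and behavioral submodels, which is not shown. The function name and the example values are hypothetical.

```python
from typing import Sequence


def directed_link(activations: Sequence[Sequence[float]], intercept: float,
                  weights: Sequence[float]) -> list[float]:
    """Map trial-wise ROI activations to a behavioral parameter.

    Computes theta_i = beta_0 + sum_k beta_k * delta_{i,k} for every trial i,
    i.e., a single-layer perceptron with an identity output function.
    """
    thetas = []
    for row in activations:
        if len(row) != len(weights):
            raise ValueError("each trial needs one activation per ROI weight")
        thetas.append(intercept + sum(b * d for b, d in zip(weights, row)))
    return thetas


# Hypothetical example: three trials, two ROIs, theta read as a drift rate.
trial_activations = [[0.5, -0.2], [1.0, 0.3], [0.0, 0.0]]
drift_rates = directed_link(trial_activations, intercept=1.0, weights=[0.8, 0.5])
```

Each resulting value could then serve as the single-trial drift rate passed to an accumulator model, which is how neural fluctuations propagate into predicted behavior under this linking assumption.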
If one cannot assume that there is a direct link between neural and behavioral model parameters, another approach is to specify a probabilistic link between the two sets of parameters. For example, Turner and colleagues (Turner et al., 2013, 2016; Palestro et al., 2018) have used the multivariate normal distribution to simultaneously model multivariate patterns in neural activation and behavioral model parameters through the form

$$(\delta, \theta) \sim \mathcal{N}(\phi, \Sigma),$$

where $\phi$ is a set of means for the model parameters, and $\Sigma$ contains information about the relationship between every pairwise combination of parameters in the set of model parameters. Because the complexity of this linking function grows rapidly with the number of ROIs (the number of covariance terms grows quadratically), Turner et al. (2017) investigated methods for decomposing $\Sigma$ into a factor loading matrix $\Lambda$, a factor variance matrix $\Phi$, and residual terms $\Psi$, such that

$$\Sigma = \Lambda \Phi \Lambda^{\top} + \Psi.$$

This approach has the advantage of constraining the correlation structure on the basis of the model parameters, where the factors can directly represent parameters from a cognitive model. The factor analytic approach was shown to greatly reduce the complexity of the linking function while preserving the model's out-of-sample generalizability.

Fig. 3.
Different statistical links between brain and theory. There are many ways to specify a linking function between neural and behavioral data (see Box 2). The left panel shows a generic application of a linear regression model. The middle panel shows a multivariate normal linking function that allows variability along each dimension to affect the strength of association. The right panel shows possible new directions for joint models, where multilayer connections between neural data and model parameters can be made to allow for distributed activation and more complex detection of key neural features.
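The complexity savings of the factor-analytic decomposition in Box 2, $\Sigma = \Lambda \Phi \Lambda^{\top} + \Psi$, can be checked with simple arithmetic. This minimal sketch assumes a diagonal residual matrix $\Psi$ and counts free parameters naively; identifiability constraints (for example, fixing some loadings) would reduce the factor-model count further. All function names and sizes are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def factor_covariance(loadings, factor_cov, residual_vars):
    """Build Sigma = Lambda Phi Lambda^T + Psi, with Psi diagonal."""
    sigma = matmul(matmul(loadings, factor_cov), transpose(loadings))
    for i, psi in enumerate(residual_vars):
        sigma[i][i] += psi
    return sigma


def count_parameters(p, m):
    """Free parameters: unrestricted p x p covariance vs. an m-factor model.

    The factor count ignores identifiability constraints, which would
    lower it further.
    """
    unrestricted = p * (p + 1) // 2
    factor_model = p * m + m * (m + 1) // 2 + p
    return unrestricted, factor_model


# Hypothetical size: 50 joint neural and behavioral parameters, 3 factors.
print(count_parameters(50, 3))  # -> (1275, 206)
```

The unrestricted count grows with $p^2$, whereas the factor model grows linearly in $p$ for a fixed number of factors, which is the source of the reduction described above.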
In this way, Directed joint models are quite similar to the direct input and parametric mapping approaches above, but the key difference is that the linking mechanism allows variation in $\delta$ to statistically affect variation in $\theta$. Furthermore, consistent with the identification of the bridge locus, we may have different linking structures in which several brain regions are related to one or more model parameters. Importantly, the link between $\delta$ and $\theta$ is reciprocal. Not only do the neural data have a direct effect on the parameters $\theta$, but because the precise form of the linking function makes a strong commitment to a prediction about behavioral data, the behavioral data in turn influence the parameters $\delta$. At this point, several applications of these Directed models have been reported, and they have been particularly effective in perceptual decision making tasks (Cavanagh et al., 2011; Nunez et al., 2015, 2017; Frank et al., 2015; van Ravenzwaaij et al., 2017; Ratcliff et al., 2016; Herz et al., 2017; Hawkins et al., 2017). For example, Nunez et al. (2015) used EEG data from a perceptual decision making experiment as a proxy for attention. They controlled the flicker rates of stimuli presented to subjects and measured the power of the EEG signal at these frequencies, a measure known as the steady-state visual evoked potential (SSVEP). The power at these frequencies is known to be modulated by attention. Importantly, Nunez et al. showed that individual differences in attention or noise suppression were indicative of choice behavior, specifically of faster responses with higher accuracy.
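The core of an SSVEP measure is the spectral power of the EEG at each stimulus flicker frequency. The sketch below computes that power with a single-bin discrete Fourier transform on a synthetic signal; the sampling rate, flicker frequencies, and amplitudes are hypothetical, and the spectral pipeline actually used by Nunez et al. (2015) is not reproduced here.

```python
import math


def power_at_frequency(signal, sample_rate, freq):
    """Single-bin DFT power of `signal` at `freq` Hz (normalized by length)."""
    n = len(signal)
    omega = 2.0 * math.pi * freq / sample_rate
    re = sum(x * math.cos(omega * t) for t, x in enumerate(signal))
    im = -sum(x * math.sin(omega * t) for t, x in enumerate(signal))
    return (re * re + im * im) / n


# Hypothetical 2 s recording at 250 Hz: a strong 15 Hz response to one
# flickering stimulus and a weaker 20 Hz response to another.
fs = 250
eeg = [math.sin(2 * math.pi * 15 * t / fs) + 0.3 * math.sin(2 * math.pi * 20 * t / fs)
       for t in range(2 * fs)]
```

Comparing `power_at_frequency(eeg, fs, 15)` with `power_at_frequency(eeg, fs, 20)` indicates which stimulus dominated the visual response; under the attention interpretation, such trial-wise power values would play the role of the neural parameters $\delta$ in a Directed link.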
In a particularly novel application, Frank et al. (2015) showed how models of reinforcement learning could be fused with the DDM to gain insight into activity in the subthalamic nucleus (STN). In their study, Frank et al. used simultaneous EEG and fMRI measures as covariates in the estimation of single-trial parameters. Specifically, they used activity in predefined regions of interest, including the presupplementary motor area (pre-SMA) and the STN, together with a general measure of mid-frontal EEG theta power, to constrain trial-to-trial fluctuations in response threshold, and they used BOLD activity in the caudate to constrain trial-to-trial fluctuations in evidence accumulation. Their work is important because it establishes concrete links between STN and pre-SMA communication as a function of varying reward structure, and because it provides a model that uses fluctuations in decision conflict (as measured by differences in expected rewards) to adjust response threshold from trial to trial.

While Fig. 3 illustrates how the parameters $\delta$ modulate the parameters $\theta$, other models assume the reverse influence, where the behavioral parameters $\theta$ inform the neural parameters $\delta$. As a concrete example, Cassey et al. (2016) extended the single-unit modeling work of Purcell et al. (2010) by linking firing parameters of single-unit recordings to the evidence accumulation dynamics of a decision model. Cassey et al. modeled data from a seminal experiment by Roitman and Shadlen (2002) containing behavioral recordings of two monkeys in a simple decision-making task, together with single-cell recordings from the lateral intraparietal area (LIP) of the cortex. On each trial, a random dot kinematogram appeared on the screen and the monkey indicated whether the coherently moving dots were drifting left or right. Choices and response times were recorded on each trial, as was the timing of action potentials from a set of LIP neurons. The joint model builds on the work of Purcell et al. (2010, 2012) by assuming that an evidence accumulation model can provide a tight link between the observed neural firing rate and behavioral responses. In contrast to Purcell et al. (2010, 2012), where the neural data are used directly as input to an evidence accumulation model, the joint model of Cassey et al. also included an explicit statistical model of the single-unit spike trains. Given this implementation, descriptions of the neural data can be informed by the neuron's properties, such as which neuron was being recorded, and from which monkey. The joint model of Cassey et al. (2016) was hierarchical, and the parameters of the neural submodel were allowed to vary across neurons and monkeys.

Covariance models
Directed joint models are convenient because of their simplicity, and because they posit a causal role for neural activation in decision making. However, sometimes causal links are too restrictive, and what is needed instead is a probabilistic linking function rather than a deterministic one. For example, activity in LIP has served as the neural basis for the evidence accumulation process (Roitman and Shadlen, 2002; Shadlen and Kiani, 2013; Shadlen and Newsome, 2001), but Katz et al. (2016) have shown that when LIP is superficially lesioned in nonhuman primates, patterns of decision making variables remain unaffected. This finding might suggest that while LIP activity is related to decision variables, the two may not be causally linked (Huk et al., 2017).
As Fig. 3 indicates, Directed approaches can be too constrained, making the linking structure too inflexible to capture highly complex interactions. As alluded to in Fig. 1, sometimes it will be important to capture multivariate tendencies across several ROIs, or to map brain activity onto multiple model parameters. One way to capture multivariate tendencies is the Covariance approach, which has been used productively to link multiple measures of neural data to pairwise combinations of model parameters (Turner et al., 2013, 2016, 2017, 2018; Cassey et al., 2016; Palestro et al., 2018). For example, Turner et al. (2013) used structural diffusion-weighted imaging data to explain differences in patterns of choice response time data across subjects. They showed how a joint model equipped with information about the interconnectivity of brain areas could make accurate predictions about a subject's behavioral performance in a cross-validation test (i.e., when the behavioral data were withheld). Turner et al. (2015) extended this approach to build brain state fluctuations measured with fMRI into the DDM. The problem Turner et al. (2015) addressed centered on a lack of information about within-trial accumulation dynamics. In behavioral choice response time experiments, following the presentation of a stimulus, researchers can only observe the eventual choice and response time. These data are then used to estimate parameters of a cognitive model, under the assumption that the data observed on each trial arise from the same psychological process. However, this assumption, known as stationarity, is a strong one and is seldom satisfied in empirical data (Peruggia et al., 2002; Craigmile et al., 2010). Turner et al. (2015) used a multivariate model to describe the joint activation of a set of brain regions of interest, and used this description to enhance the classic DDM.
In a cross-validation test, they showed that their extended model could generate better predictions about behavioral data than the DDM alone, demonstrating that neurophysiology can be used to improve explanations of trial-to-trial fluctuations in behavior.
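The cross-validation logic of Covariance joint models rests on a standard property of the multivariate normal linking function: once $(\delta, \theta)$ share a joint normal distribution, observing $\delta$ for a held-out subject yields a conditional distribution for $\theta$. The minimal bivariate sketch below shows only this conditioning step, assuming known means, variances, and covariance with hypothetical values; Turner and colleagues estimated these quantities within hierarchical Bayesian models, which is not shown.

```python
def conditional_normal(delta, mu_delta, mu_theta, var_delta, var_theta, cov):
    """Mean and variance of theta given delta under a bivariate normal link.

    (delta, theta) ~ N([mu_delta, mu_theta], [[var_delta, cov], [cov, var_theta]])
    """
    if var_delta <= 0 or cov * cov > var_delta * var_theta:
        raise ValueError("covariance matrix must be positive semi-definite")
    slope = cov / var_delta
    mean = mu_theta + slope * (delta - mu_delta)
    var = var_theta - cov * cov / var_delta
    return mean, var


# Hypothetical held-out subject whose neural parameter is 2 SDs above the mean.
predicted_mean, predicted_var = conditional_normal(
    delta=2.0, mu_delta=0.0, mu_theta=1.0, var_delta=1.0, var_theta=0.5, cov=0.3)
```

When the covariance is zero, the prediction collapses to the population mean of $\theta$, so any gain in predictive accuracy comes entirely from the estimated brain-behavior covariance.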
In another application, Turner et al. (2016) used the joint modeling framework to perform multimodal data fusion at the individual-subject level. In the study, subjects were assigned to groups that dictated which type of neural measures would be collected: (1) EEG, (2) fMRI, or (3) both EEG and fMRI. Within all groups, subjects completed an intertemporal choice task, providing both behavioral data in the form of choices and response times and neural data in accordance with their group assignment. For the subjects providing both EEG and fMRI, Turner et al. used a repeated measures design in which subjects returned to the lab and the order of neuroimaging modalities was counterbalanced across individuals. In using the joint model, they assumed that the common source of all the measurements (i.e., behavior, EEG, and fMRI) was the mental activity underlying each decision. Using this assumption, Turner et al. created one hierarchical joint model of all three groups, and showed that this model performed better in terms of model fit and cross-validation of individual subjects' data compared to models that only considered one (i.e., behavioral only) or two (e.g., behavior and EEG only) modalities of information. Importantly, Turner et al. showed how repeated measures experimental designs can be used to productively integrate information from EEG, fMRI, and behavioral data both within and between individuals.

Future extensions: distributed activation
Although Covariance joint models are more flexible than Directed joint models, they still may not provide the best linking function in some scenarios. Because they capture all pairwise correlations among ROIs and model mechanisms, they can be computationally complex to fit to data (Turner et al., 2017a). In fact, this complexity has limited Covariance joint models to ROI-based analyses, as whole-brain analyses are currently infeasible. While ROI-based analyses can be a productive way to integrate results across studies, they ignore potentially interesting coactivations that may be distributed across the brain (Haxby et al., 2000; Norman et al., 2006). For example, Huk et al. (2017) suggest several reasons why the firing rates of single neurons recorded from LIP should be interpreted not as directly related to decision variables per se, but rather as motor preparation signals. Instead, Huk et al. (2017) advocated the notion that the integration of motion information is distributed across the brain. While such distributed correlation of evidence variables has been observed in ROI-based joint analyses of human fMRI data (e.g., Turner et al., 2017), finer levels of analysis, such as whole-brain analyses and analyses of temporal dynamics, are needed to advance the field. To capture distributed activations, even more flexible linking functions may be necessary. As noted in Turner et al. (2018), neural networks may provide an interesting opportunity for detecting multivariate coactivation of cortical areas that are not spatially proximal. As illustrated in the right panel of Fig. 3, Neural Network extensions are not unlike Directed approaches; in fact, one can view Directed approaches as a single-layer perceptron model, an early form of connectionist model. Much like the history of the perceptron (Minsky and Papert, 1969; McClelland, 2009), there are likely many types of functions that Directed models are unable to capture.
One of the major problems that connectionist frameworks such as parallel distributed processing (PDP) models addressed was mapping functions that are not linearly separable, such as the XOR operation. In the XOR problem, a map is constructed between two inputs $x_1$ and $x_2$ and an output $y$: if exactly one input is on (i.e., $(x_1, x_2) = (1, 0)$ or $(0, 1)$), then $y = 1$; if both inputs are the same (i.e., $(x_1, x_2) = (0, 0)$ or $(1, 1)$), then $y = 0$. Single-layer perceptron models are unable to represent this type of mapping. The solution to the XOR problem was to include a hidden layer, allowing a more complex mapping function between the input and output layers. Analogously, hidden layers may be an essential component for advancing Directed joint models to capture more complex multivariate interactions with cognitive mechanisms. As we see it, Neural Network models, or highly nonlinear multivariate regression techniques in general, serve as a method of agnostically mapping the activation from many neural features onto key decision dynamics. There is nothing particularly special about Neural Network models per se, as similar nonlinear multivariate regression techniques could extend Directed joint models to capture similar patterns (see Box 2).
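The contrast between a single linear threshold unit and a network with one hidden layer can be demonstrated on the XOR table itself. In this sketch the hidden-layer weights are set by hand (XOR computed as "OR but not AND") rather than learned, and the grid search over single-unit weights only illustrates the classic result that no linear threshold unit computes XOR; the function names and the weight grid are hypothetical.

```python
import itertools


def step(z):
    """Heaviside threshold unit."""
    return 1 if z > 0 else 0


XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def single_layer(x1, x2, w1, w2, b):
    """A single-layer perceptron: one linear threshold unit."""
    return step(w1 * x1 + w2 * x2 + b)


def two_layer_xor(x1, x2):
    """Hand-set weights: XOR as 'OR but not AND' via two hidden units."""
    h_or = step(x1 + x2 - 0.5)      # on if at least one input is on
    h_and = step(x1 + x2 - 1.5)     # on only if both inputs are on
    return step(h_or - h_and - 0.5)


def single_layer_fits_xor(grid):
    """Search a grid of weights for a single unit that reproduces XOR."""
    return any(
        all(single_layer(x1, x2, w1, w2, b) == y
            for (x1, x2), y in XOR_TABLE.items())
        for w1, w2, b in itertools.product(grid, repeat=3))


grid = [k / 4 for k in range(-12, 13)]  # weights from -3 to 3 in steps of 0.25
```

In the joint-modeling analogy, the inputs would be neural features and the output a cognitive model parameter; the hidden units are what allow the link to respond to conjunctions of features that a single weighted sum cannot express.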
To capture temporal dynamics, one approach would be to model the temporal dependency among regions of interest using techniques such as Dynamic Causal Models (Friston, 2002; Friston et al., 2003), or more generally, Multivariate Dynamical Systems (Ryali et al., 2011). These techniques often require strong assumptions about the set of connectivity paths worth considering, or about how neural activation maps onto a hemodynamic signal (e.g., the Balloon model; see Buxton et al., 1998; Mandeville et al., 1999; Friston et al., 2000). Although temporally causal models are generally difficult to fit to data, considerable progress is being made to improve the feasibility of testing these dynamical approaches (Sugihara et al., 2012), and so relating the temporal dynamics of brain activity to the temporal structure of decision models may be the next frontier for imposing reciprocity in brain-behavior dynamics.
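The neural state equation in DCM is commonly written as $\dot{z} = (A + \sum_j u_j B^{(j)}) z + C u$, where $A$ holds the fixed connectivity among regions, $B^{(j)}$ the modulatory effect of input $j$, and $C$ the direct input weights. The sketch below integrates only the linear part ($B = 0$) with forward Euler, omits the hemodynamic (Balloon) forward model entirely, and uses hypothetical connectivity values; it illustrates what temporal dependency among regions means operationally, not how DCM is estimated.

```python
def simulate_linear_dcm(A, C, inputs, dt=0.01):
    """Forward-Euler integration of dz/dt = A z + C u(t).

    A: n x n connectivity matrix; C: n x m input weights;
    inputs: list of length-m input vectors, one per time step.
    Returns the list of neural state vectors, one per time step.
    """
    n = len(A)
    z = [0.0] * n
    states = []
    for u in inputs:
        dz = [sum(A[i][j] * z[j] for j in range(n))
              + sum(C[i][k] * u[k] for k in range(len(u)))
              for i in range(n)]
        z = [z[i] + dt * dz[i] for i in range(n)]
        states.append(z)
    return states


# Hypothetical two-region system: input drives region 1 only, region 1
# excites region 2, and both regions decay back toward baseline.
A = [[-1.0, 0.0], [0.8, -1.0]]
C = [[1.0], [0.0]]
states = simulate_linear_dcm(A, C, [[1.0]] * 100 + [[0.0]] * 200)
```

Region 2 responds in this simulation only through its coupling to region 1, and with a lag; relating such lagged, directed dependencies to the time course of accumulation in a decision model is the kind of reciprocity the paragraph above anticipates.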

Conclusions
To connect neuroscientific measures to psychological theory, a new wave of researchers has carefully considered how to inspect and interpret highly complex interactions across a sea of data. Many researchers have looked to computational models that instantiate psychological theories through a set of mathematical expressions, making their predictions for data in completely new experiments transparent. As the field has continued to develop, new statistical techniques have been constructed with the intention of bridging mechanisms from abstract computational models to concrete neurophysiological responses. These powerful new frameworks allow researchers to understand the complexities of brain data in terms of the psychological theories they assume. Some of these frameworks inherently assume hierarchical Bayesian architectures, which have been shown to magnify the resolution of data through the "borrowing" of statistical strength. In closing, techniques such as joint modeling provide the telescope through which neural data may be interpreted through the lens of a cognitive model.