A flexible framework for simulating and fitting generalized drift-diffusion models

The drift-diffusion model (DDM) is an important decision-making model in cognitive neuroscience. However, innovations in model form have been limited by methodological challenges. Here, we introduce the generalized drift-diffusion model (GDDM) framework for building and fitting DDM extensions, and provide a software package which implements the framework. The GDDM framework augments traditional DDM parameters through arbitrary user-defined functions. Models are solved numerically by directly solving the Fokker-Planck equation using efficient numerical methods, yielding a 100-fold or greater speedup over standard methodology. This speed allows GDDMs to be fit to data using maximum likelihood on the full response time (RT) distribution. We demonstrate fitting of GDDMs within our framework to both animal and human datasets from perceptual decision-making tasks, with better accuracy and fewer parameters than several DDMs implemented using the latest methodology, to test hypothesized decision-making mechanisms. Overall, our framework will allow for decision-making model innovation and novel experimental designs.


Introduction
The drift-diffusion model (DDM) is an important model in cognitive psychology and cognitive neuroscience, and is fundamental to our understanding of decision-making (Ratcliff, 1978;Bogacz et al., 2006). The DDM explains both choice and response time (RT) behavior across a wide range of tasks and species, and has proven to be an essential tool for studying the neural mechanisms of decisionmaking (Gold and Shadlen, 2007;Forstmann et al., 2016;Ratcliff et al., 2016). As a sequential sampling model for two-alternative forced-choice tasks, it assumes the subject obtains a continuous stream of evidence for each alternative throughout the course of an experimental trial. Evidence consists of some true underlying signal (drift) in addition to neural and environmental noise (diffusion). According to the DDM, once a decision variable tracking the difference in evidence between the two alternatives is sufficiently large, i.e. when the decision variable reaches some fixed upper or lower bound, a choice is made at the moment of bound crossing, thereby modeling for the decision both choice and RT. In its simplest form, the DDM includes four parameters: a drift rate, representing evidence; a diffusion constant or bound height representing noise or response caution; a starting position, representing side bias and often fixed at zero; and a non-decision time, representing afferent and efferent delays but external to the DDM process (Ratcliff, 1978). These models can be fit to choice and response time data.
In order to test new hypotheses about decision-making processes, there is great interest in developing extensions to the DDM. For example, the three-parameter DDM is known to be sub-optimal with respect to maximizing reward rate in standard block-design tasks when evidence strength varies unpredictably (Malhotra et al., 2018;Evans and Hawkins, 2019). Therefore, it has been hypothesized that decision-making is governed by an urgency signal which increases the probability of response over the course of a trial, implemented as either time-varying evidence gain (Cisek et al.,

GDDM as a generalization of the DDM
The classic DDM is a four-parameter model which specifies fixed values for drift, bound height or noise level, starting position bias, and non-decision time ( Figure 1). It assumes that evidence is constant throughout the trial, and that the integration process does not directly depend on time. This form is mathematically accessible, and has been used to model choice and RT data (Ratcliff, 1978;Ratcliff et al., 2016).
In order to expand our knowledge of decision-making mechanisms, there is interest in using the DDM to model a variety of experimental paradigms. To accommodate different paradigms, the DDM itself must be extended. One key example is the case where a choice bias is induced experimentally via a change in prior probability or reward magnitude, and it is hypothesized that an offset starting position alone is insufficient to capture the behavioral effects. The DDM has been extended to accommodate an experimentally-induced bias by placing a constant offset on the drift rate (Mulder et al., 2012;Ratcliff, 1985;Ashby, 1983).
There is a set of influential yet insufficient extensions to the DDM called the 'full DDM'. This set of extensions was developed to explain two specific discrepancies between the DDM and data (Anderson, 1960;Laming, 1968;Blurton et al., 2017). First, experimental data exhibited a difference in mean RT between correct and error trials which could not be captured by the classic DDM, so two parameters for across-trial variability were introduced to explain this difference: uniformly-distributed initial conditions to explain fast errors (Laming, 1968) and Gaussian-distributed drift rate to explain slow errors (Ratcliff, 1978;Ratcliff and Rouder, 1998). Second, the classic DDM also had a sharper rise time than experimental data, so uniformly-distributed across-trial variability in non-decision time was introduced to capture this effect. A previous study validated predictions of these across-trial variability parameters (Wagenmakers et al., 2008a). Compared to the classic DDM, the 'full DDM' improves the fit to data, but does not expand the range of experiments which may be performed. Likewise, it does not facilitate extenstions to test alternative cognitive mechansims.. A more versatile framework is therefore needed.
The GDDM framework provides a flexible means for developing extensions to the DDM (Figure 1) which can accommodate a wide range of experimental designs and hypotheses about decision-making. The GDDM allows for extensions in five key aspects. First, the drift rate and noise level may depend on time. This allows a range of new experimental paradigms to be modeled using the DDM, such as doubly-stochastic evidence (Zylberberg et al., 2016), pulsed evidence (Huk and Shadlen, 2005), discrete evidence (Cisek et al., 2009;Brunton et al., 2013;Pinto et al., 2018), or changing evidence . Additionally, the time dependence of drift rate and noise allows a time-variant urgency signal which modulates the gain of evidence (Cisek et al., 2009;Thura et al., 2012;Murphy et al., 2016;Drugowitsch et al., 2014a). Second, drift rate and noise level may depend on position. This allows for model features such as unstable (Wong and Wang, 2006;Roxin and Ledberg, 2008;Erlich et al., 2015) and leaky (Wong and Wang, 2006;Ossmy et al., 2013;Farashahi et al., 2018) integration. Leaky integration with a short time constant is functionally similar to the urgency-gated model with a low pass filter (Hawkins et al., 2015a;Thura and Cisek, 2014). Likewise, position-dependent variance may allow for multiplicative noise (Freyer et al., 2012). Third, the bound may change over time, which allows collapsing bounds according to an arbitrary function (Hawkins et al., 2015a;Evans and Hawkins, 2019;Drugowitsch et al., 2012), such as for an urgency signal. Fourth, models are parameterizable by an arbitrary dependence on fittable parameters or task conditions. These parameters may be shared between two related experimental tasks, such as an experimental and control task. It also allows alternative implementations of side bias (Hanks et al., 2011) and condition-dependent nonlinearities . Fifth, acrosstrial parameter variability is supported within the GDDM framework for both starting point and nondecision time, but not drift rate.
Mathematically, these concepts are expressed using five functions, corresponding to the drift rate, noise, bound height, starting position, and post-factum modifications to the distribution (Equation 1). Collectively, the GDDM framework allows complicated models to be expressed in a consistent manner, simplifying the way which we may think about complicated DDM extensions.

GDDM mechanisms characterize empirical data
The cognitive mechanisms by which evidence is used to reach a decision is still an open question. One way to distinguish these mechanisms from one another is by fitting models of these cognitive mechanisms to empirical RT distributions. To demonstrate the flexibility of GDDMs, we constructed a GDDM including three plausible mechanisms which are difficult or impossible to implement in a standard DDM: collapsing bounds, leaky integration, and a parameterized nonlinear dependence of drift rate on stimulus coherence. As described below in more detail, we then tested the fit of a GDDM which includes these three mechanisms, compared to the DDM and the 'full DDM', in two psychophysical datasets from perceptual decision-making tasks: one from monkeys (Roitman and Shadlen, 2002), and one from human subjects (Evans and Hawkins, 2019). We found that these mechanisms allow the GDDM to provide a better fit to RTs with a more parsimonious model than the DDM or the 'full DDM' in the monkey dataset, suggesting that these mechanisms may be important to explaining the monkeys' behavior. However, consistent with previous results (Hawkins et al., 2015a), the GDDM fit no better than the DDM or 'full DDM' in the human dataset, suggesting that different cognitive strategies may be used in each case. Roitman and Shadlen, 2002 trained two rhesus monkeys on a random dot motion discrimination task in which dots moved randomly but with varying strengths of coherent motion to the left or right (Figure 2-figure supplement 1). Motion coherence was randomly selected to be one of six conditions ranging from 0% to 51.2%. The monkeys were required to determine the direction of coherent motion and respond via saccade to flanking targets. Response time was recorded, which we fit with the standard DDM, 'full DDM', and GDDM.
We compared the GDDM to the 'full DDM' and to three different implementations of the DDM which differ in the methodology used to estimate the RT distribution. A detailed comparison of these three packages-PyDDM, HDDM, and EZ-Diffusion-as well other common packages is provided in the section 'Software package for GDDM fitting'. The DDMs included terms for drift, nondecision time, and either bound height or noise magnitude. The DDM fit by PyDDM assumed drift rate was a linear combination of some baseline drift rate and coherence (Equation 14), whereas the DDM fit by HDDM used separate drift rates for each coherence condition (Equation 16), and EZ-Diffusion fit all parameters fully independently for each coherence condition (Equation 17). We also fit a 'full DDM' model, which included across-trial variability in the drift rate, non-decision time, and starting position (Equation 15). Finally, we fit a GDDM, which consisted of a typical DDM with no across-trial variability in the parameters, but which demonstrates three mechanisms that are difficult or impossible to model in a standard DDM framework: leaky integration, an exponentially collapsing bound, and a drift rate which is determined from coherence using a fittable nonlinearity (Equation 13). Each of the two monkeys was fit using a separate set of parameters for all models.
Each model was fit to the data, using full-distribution maximum likelihood for PyDDM and HDDM, and an analytic expression for EZ-Diffusion (Figure 2, Figure 2-figure supplement 2). First we examine the fit of the model quantitatively. Bayesian information criterion (BIC) is a common way of quantifying model fit while penalizing for free parameters. According to BIC, the 6-parameter GDDM fit better than all other models ( Figure 2a). To determine whether this improved fit was due to the penalty BIC places on the number of parameters, we compared the models directly with likelihood. Again, we found that the GDDM fit this dataset better than the other models which have more parameters (Figure 2b). Finally, we tested whether this improved fit is due to the fact that PyDDM and HDDM use full-distribution maximum likelihood as an objective function, whereas EZ-Diffusion does not. We found that the mean squared error (MSE) of the GDDM was also lower than all other models ( Figure 2c). These findings of a good fit by this GDDM, compared to the DDM and the 'full DDM', can be interpreted as evidence supporting the cognitive mechanisms included in the GDDM.  Roitman and Shadlen, 2002. Five models (see Materials and methods) were fit with three different software packages to data from monkey N. Model fits are compared according to BIC (a), negative log likelihood (b), and mean squared error (MSE) (c). The psychometric (d) and chronometric (e) functions represent data as dots and models as lines. The probability density function (PDF) (f) of each model is shown for 12.8% coherence, 6.4% coherence, 3.2% coherence, and 0% coherence. The data RT Figure 2 continued on next page Additionally, we assessed the qualitative fit of the models to the data to gain insight into what aspects of the RT distributions the GDDM was better able to capture. All five models provided a good fit to empirical psychometric ( Figure 2d and The RT distributions produced by monkeys may not be representative of those produced by humans (e.g., Hawkins et al., 2015a), which show a characteristic skewed distribution. This skew is characteristic of the DDM without collapsing bounds, and highlights that a different strategy may be used, potentially driven by overtraining or task structure (Hawkins et al., 2015a). We therefore fit the same GDDM to a more representative dataset from humans performing a similar perceptual decision-making task. Specifically, from the publicly available dataset of Evans and Hawkins, 2019, which used a random dot motion discrimination task (similar to Roitman and Shadlen, 2002), we fit the same set of models to the trials with no feedback delay, which the authors had found were better fit with no or little collapse of decision bounds. Consistent with the findings of Hawkins et al., 2015a and Evans and Hawkins, 2019, here we found that the 6-parameter GDDM provides no better fit than the 11-parameter 'full DDM' and 8-parameter DDM fit using HDDM (Figure 2-figure supplement 3a-c). The GDDM, 'full DDM', and DDMs fit with PyDDM and HDDM are all able to provide good fits to the psychometric function, chronometric function, and RT distributions for human subjects (Figure 2-figure supplement 3d-f). This lack of improved fit by the GDDM suggests that the human subjects employed a cognitive strategy closer to the standard DDM, and that the specific mechanisms of this GDDM (collapsing bounds, leaky integration, and input nonlinearity) are not important for explaining psychophysical behavior in this dataset.
For the example datasets considered above, the GDDM presented here improved the fit to the monkey data and provided a good fit to the human data. We caution that this result does not necessarily mean that this particular GDDM is appropriate for any particular future study. Models should not only provide a good fit to data, but also be parsimonious in form and represent a clear mechanistic interpretation (Vandekerckhove et al., 2015). The examples here demonstrate the utility of GDDMs for quantitatively evaluating cognitive mechanisms of interest through fitting models to empirical data.

Methods for estimating RT distributions in the DDM and GDDM
There are three broad classes of methods for estimating a GDDM's RT distribution. First, analytical expressions can be derived using stochastic calculus and sequential analysis (Wald, 1945;Anderson, 1960). Analytical expressions were favored historically because they can be computed quickly by humans or computers. However, apart from the simplest form of the DDM and a few particular classes of extensions, analytical solutions often do not exist. While analytical expressions offer maximum performance and accuracy, deriving new analytical expressions is not a practical way to rapidly implement and test new GDDMs.
The second method, trial-wise trajectory simulation, fully supports GDDM simulation. This method simulates the decision variable trajectories of many individual trials to sample RTs without producing a probability distribution ( Figure 3a). Previous work has used trial-wise trajectories to simulate DDMs with no known analytical solution (e.g. Hawkins et al., 2015a). However, this flexibility comes at the cost of performance. As we show in section 'Execution time vs error tradeoff', trialwise trajectory simulation is very slow and provides a coarse-grained representation of the model. Since this method does not produce a probability distribution, it is difficult to use efficient and robust methods such as full-distribution maximum likelihood for fitting data, though progress is ongoing in this area using likelihood-free methods such as through approximate Bayesian computation (Holmes and Trueblood, 2018).
By contrast, the third method may use the Fokker-Planck equation to numerically compute the first passage time distribution. The Fokker-Planck equation is a mathematical expression which reformulates the trial-by-trial diffusion process: rather than simulating single-trial decision variable trajectories, this method 'simulates' all possible paths at once by propagating the distribution of singletrial decision variable trajectories over time ( Figure 3b). Unlike analytical solutions, numerical solutions from the Fokker-Planck equation can be computed for all GDDMs. They also generate a probability density function for the RT distribution instead of a sample, allowing fitting with fulldistribution likelihood methods. Several algorithms may be used to solve the DDM or GDDM using the Fokker-Planck equation, each with different advantages and disadvantages.
We consider three such algorithms: forward Euler, backward Euler, and Crank-Nicolson (see Appendix). These algorithms differ in how they propagate the probability distribution over time: forward Euler iteratively approximates the probability distribution of trajectory position at each timestep using the distribution at the previous timestep, while backward Euler and Crank-Nicolson iteratively solve systems of linear equations to approximate the probability distribution (Voss and Voss, 2008). The three algorithms also differ in their conceptual and technical difficulty to implement, with forward Euler being easier than backward Euler or Crank-Nicolson. In section 'Execution time vs error tradeoff', we will show that the backward Euler and Crank-Nicolson algorithms provide higher accuracy for a given execution time-and a faster execution time for a given accuracy levelthan the forward Euler algorithm and trial-wise trajectory simulation.

Execution time vs error tradeoff
Next, we evaluated the effectiveness of different methodologies for estimating the RT distribution. Two general methodological considerations for all of these models are the precision with which the simulations are performed (accuracy), and the amount of time it takes the simulation to run on a computer (execution time). Accurate RT distributions are important to ensure the result faithfully represents the model, and fast execution times are important to increase the feasibility of fitting models to data. In general, for a given algorithm, there is a tradeoff between execution time and the accuracy of the solution obtained. One algorithm may be more efficient than another algorithm if it produces similar accuracy in a shorter amount of time, or better accuracy in the same amount of time.
We directly evaluate the execution time vs error tradeoff in trial-wise trajectory simulation and in solving the Fokker-Planck equation. To do so, we consider a standard DDM with constant drift, diffusion, and bounds (Equation 18). While this is not itself a GDDM, such a model has an analytical solution to the reaction-time distribution (as a closed-form infinite sum) (Anderson, 1960;Ratcliff, 1978) which can be used as ground truth for computing the error in numerical estimates (see Materials and methods).
First, we test trial-wise trajectory simulation. We perform a parameter sweep across timesteps Dt and sample sizes N. We simulate using the Euler-Maruyama method, which is standard in the DDM literature (e.g. Hawkins et al., 2015a;Hawkins et al., 2015b;Evans et al., 2017;Harty et al., 2017;Zhou et al., 2009;Atiya et al., 2020;Simen et al., 2011;Nguyen et al., 2019;Lewis et al., 2014;Verdonck et al., 2016) and for simple models provides identical truncation error but better execution time than more sophisticated methods such as stochastic fourth-order Runge-Kutta (Gard, 1988;Brown et al., 2006). Because decreasing Dt for a fixed N increases the number of timebins and thus increases the variability across each timebin, we employed a smoothing procedure on the resulting simulated histogram to account for this. For each simulation, we smoothed the simulated RT histogram with all mean filters of size 1 to 500 points and chose the filter with the lowest error; since a filter of size one is equivalent to applying no smoothing, this procedure could only improve accuracy. In practice, selecting a filter this way would not be possible due to the lack of analytical ground truth for most GDDMs, and so this smoothing procedure artificially inflates the method's performance compared to other methods, and effectively yields an upper bound on trial-wise trajectory performance. Execution time and error are shown separately for a range of numerical parameters (Figure 4a,b). As a proxy for the execution time vs error tradeoff, we also show the product of execution time and mean squared error for the parameter sweep (Figure 4-figure supplement 1).
In general, trial-wise trajectory simulation provides a suboptimal execution time vs error tradeoff. Using the product of MSE and execution time as a proxy measure of overall performance to illustrate this tradeoff, trial-wise trajectories perform most efficiently for highly imprecise models, i.e. those with a large timestep and a small number of trajectories simulated (Figure 4-figure supplement 1). To examine the execution times and accuracies which lead to this tradeoff, we fix Dt ¼ 0:01 and plot the execution time and accuracy across many different values of sample size N (Figure 4c). We see that as execution time increases, accuracy does not increase by more than one order of magnitude. Examples of these traces are provided in Figure 4-figure supplement 2.
Next, we examine methods for solving the Fokker-Planck equation. We consider three different algorithms: forward Euler, backward Euler, and Crank-Nicolson ( Table 1). The execution time vs accuracy tradeoff of these three algorithms are governed by two parameters: the timestep Dt, and the granularity of decision variable discretization Dx. Decreasing either of these two parameters slows down the execution time but increases accuracy. These three algorithms operate in a distinct but related manner. Forward Euler is the easiest algorithm to implement, however it is only numerically stable for values of Dt which are very small relative to Dx. Forward Euler is an approximation in the limit as Dt ! 0, and thus cannot be run for large Dt step sizes relative to Dx (see Materials and methods). Backward Euler solves this problem by providing a numerically stable solution across all values of Dt and Dx. While it provides a similar execution time and accuracy as forward Euler given Dt and Dx, the fact that it is numerically stable across all values of these parameters allows more efficient values of Dt and Dx to be chosen. Crank-Nicolson further improves the accuracy and execution time over backward Euler (Table 1). Like backward Euler, Crank-Nicolson does not have theoretical constraints on Dx or Dt, but unlike backward Euler, it may experience numerical instability for models with time-variant integration bounds. In practice, for each of these methods, the required maximum step size is constrained by how strongly varying the drift and noise are in space and time, where more varying drift would require a smaller step size. Convergence of the results can be evaluated by checking results with smaller step sizes.
To evaluate the execution time vs error tradeoff, we perform a parameter sweep across values of Dt and Dx for these three algorithms and again compute the product of execution time and mean squared error as a proxy for execution time vs error tradeoff (Figure 4a,b, Figure 4-figure supplement 1). Overall, the execution time vs error tradeoff is one to six orders of magnitude better for these algorithms than for trial-wise trajectory simulation. We find that, for backward Euler and Crank-Nicolson, the execution time vs error tradeoff in the model shown in Figure 4-figure supplement 1 is maximized approximately when Dt ¼ Dx. Forward Euler is unstable in this parameterization, with the best execution time vs error tradeoff when it is closest to its stability condition. We   examine the execution time and error for parameterizations which satisfy these constraints, and compare them to trial-wise trajectory simulation (Figure 4c). We see that Crank-Nicolson is strictly better than other methods: for a given error, Crank-Nicolson offers a faster execution time, and for a given execution time, Crank-Nicolson offers lower error. However, since Crank-Nicolson may not be appropriate for models with time-variant boundaries, the second best method is also of interest. We see that, for a given execution time, backward Euler is better than forward Euler by two orders of magnitude, and better than trial-wise trajectory simulations by up to five orders of magnitude. Therefore, solving the Fokker-Planck equation using Crank-Nicolson and backward Euler represents a key advantage over the forward Euler and trial-wise trajectory simulation methods.
Software package for GDDM fitting Overview of our software package We developed a software package to encapsulate the methodological framework we have presented. Our package, PyDDM, allows the user to build models, simulate models, and fit them to data using the efficient GDDM framework methodology described previously. The structure and workflow of the package were designed to mirror the GDDM framework ( Figure 5-figure supplement 1). PyDDM selects the most efficient approach to solve a GDDM by automatically determining whether the model can be solved analytically and whether it uses time-varying bounds ( Table 1). PyDDM is also built to emphasize the modularity of the GDDM framework. Components of models may be readily reused in other models, and to promote this, we additionally provide an online database of models (the 'PyDDM Cookbook' https://pyddm.readthedocs.io/en/latest/cookbook/index. html) to promote code sharing and replication. Users are encouraged to submit their models to this database. Furthermore, PyDDM is designed to accommodate the arbitrary flexibility of GDDMs. For reliable parameters estimation, it supports differential evolution (Storn and Price, 1997) as a global optimization method in addition to local search methods such as the Nelder-Mead simplex algorithm (Nelder and Mead, 1965). Fits are performed using full-distribution maximum likelihood on the full probability distribution in order to use all available data to estimate parameters. Likelihood can be overly sensitive to outliers or 'contaminants' (Ratcliff and Tuerlinckx, 2002), and this problem has been solved elsewhere by using a mixture model of the DDM process with a uniform distribution in a fixed ratio (Wiecki et al., 2013;Wagenmakers et al., 2008b;Voss and Voss, 2007). PyDDM builds on this methodology by allowing a parameterizable mixture model of any distribution and a fittable ratio. Because some models can be difficult to understand conceptually, PyDDM also includes a GUI for visualizing the impact of different model parameters ( Figure 5-figure supplement 2), analogous to a GDDM-compatible version of the tool by Alexandrowicz, 2020. Comparison to other software packages Besides PyDDM, several software packages are available to simulate DDMs, and one package to simulate GDDMs. We compare these in detail in Table 2, to emphasize a few key points of comparison.
The choice of software package depends on the model to be fit. As demonstrated in Figure 2 and noted previously (Ratcliff, 2008), fitting a diffusion model to a small number of summary statistics, as in EZ-Diffusion, can lead to poor qualitative and quantitative model fit to the RT distribution. Better options for fitting the DDM and 'full DDM' are available, such as fast-dm and HDDM. However, these packages are limited in model form, and make it difficult to extend beyond the 'full DDM' to test additional cognitive mechanisms or harness time-varying stimulus paradigms. As we have shown previously (Figure 2), the GDDM is desirable for obtaining good model fits with few parameters.
The major highlight of PyDDM compared to other packages is that it allows GDDMs to be simulated, and is thus compatible with time-and position-dependent drift rate and noise, time-varying bounds, starting point variability, and non-decision time variability. Besides PyDDM, the only other package to our knowledge capable of simulating GDDMs is the new R package CHaRTr (Chandrasekaran and Hawkins, 2019). This package is a major innovation compared to previous approaches, as it allows flexible model forms to be tested. However, CHaRTr is based on trial-wise trajectory simulations. As shown previously (Figure 4), trial-wise trajectory simulation is especially inefficient, reducing the accuracy of the model simulations and increasing execution time by several orders of magnitude. In practice, this makes it infeasible to fit data to more flexible models. For the user to define new models, CHaRTr requires defining models in C and modifying the CHaRTr source code. By contrast, PyDDM facilitates the specification and implementation of models by defining them in user scripts written in pure Python.

Parallel performance
In addition to PyDDM's efficient algorithms for solving GDDM models, PyDDM also includes the ability to further speed up simulations by solving models in parallel. Any model can be converted from single-threaded to multi-threaded with a single function call. PyDDM solves models for different conditions on multiple CPUs via the 'pathos' library (McKerns et al., 2012). It uses a so-called 'embarrassingly parallel' algorithm, which turns the most essential parts of the simulation into independent tasks and distributes them equally across CPUs. Such parallelization can be enabled in any PyDDM script with one line of code.
In a perfectly efficient parallelized environment, a simulation utilizing N CPUs should be N times faster than a simulation utilizing one CPU. In practice, this is rare due to communication latency and PyDDM is compared to HDDM (Wiecki et al., 2013), EZ-Diffusion (Wagenmakers et al., 2007), CHaRTr (Chandrasekaran and Hawkins, 2019), Diffusion Model Analysis Toolbox (DMAT) (Grasman et al., 2009;Vandekerckhove and Tuerlinckx, 2008), and fast-dm (Voss and Voss, 2007;Voss and Voss, 2008;Voss et al., 2015). Red indicates limited flexibility, yellow indicates moderate flexibility, and green indicates maximal flexibility. For the solver, these colors indicate minimal, moderate, and maximal efficiency. )) on more than one CPU ( Figure 5). As expected, the execution time decreases when more CPUs are used to perform the simulations. However, this speedup is strikingly linear (Figure 5a); for up to 20 CPUs, PyDDM achieves close to 100% parallel efficiency (Figure 5b). This means that PyDDM is able to effectively utilize multiple processors to decrease execution time. In practice, the 6-parameter GDDM shown above fits on one CPU in under 15 minutes, and the authors routinely fit 15-parameter GDDMs on 8 CPUs with differential evolution for global parameter optimization in under 5 hours.

Discussion
The GDDM framework provides a consistent description of model extensions, and allows researchers to efficiently simulate and fit a wide variety of extensions to the DDM. Using two open datasets, we demonstrated that GDDMs are able to fit experimental data with high accuracy and few parameters. Model RT distributions are solved and fit numerically using multiple fast and accurate methods for solving the Fokker-Planck equation. We built the PyDDM software package to implement the GDDM framework. PyDDM makes it easy to utilize the efficiency of this framework, while providing additional technical advantages over other packages. The major benefit of our GDDM framework is the flexibility and ease with which the model may be extended. The GDDM generalizes the DDM to support a wide variety of task paradigms and cognitive mechanisms. Any parameter in the standard DDM is allowed to be an arbitrary function of time and/or position when applicable. Mechanisms such as leaky or unstable integration, collapsing bounds, time-dependent drift rates, and complex non-decision time and starting point distributions all fall within the GDDM framework and can be readily simulated in PyDDM. GDDMs may be built modularly, eliminating the need to reimplement a separate model for each new mechanism. This modularity also helps promote code reuse through our online database of PyDDM model components.
GDDMs may be simulated by efficiently solving the Fokker-Planck equation. In our tests, trial-wise trajectory simulation performed poorly because large increases in the number of trials did not result in as large of increases in accuracy. While the smoothing procedure we applied improved accuracy, trial-wise trajectory simulation was unable to obtain low error for reasonable execution times. The Crank-Nicolson and backward Euler algorithms performed up to five orders of magnitude better than trial-wise trajectory simulation. This increase in performance is consistent with previous work which used Crank-Nicolson to solve the Fokker-Planck equation for the 'full DDM' (Voss and Voss, 2008). However, increasing performance is critical for the GDDM not only because it reduces simulation time, but because it leads to a qualitative difference in the types of models which may be fit to data. While inefficient methods are sufficient for simulating a single instance of the model, fitting models with many parameters to data requires thousands of simulations or more. By providing efficient algorithms for solving this broad class of models, our framework makes it possible to fit parameters to data for models which were previously prohibitively time consuming. In this way, it expands the range of potential cognitive mechanisms which can be tested and experimental paradigms which can be modeled. PyDDM enables researchers to build very complex models of decision-making processes, but complex models are not always desirable. Highly complex models may be able to provide an excellent fit to the data, but unless they represent specific, plausible cognitive mechanisms, they may not be appropriate. PyDDM's complexity can be used to examine a range of potential cognitive mechanisms which were previously difficult to test, and to probe a range of task paradigms. Additionally, sometimes complex models which do represent appropriate cognitive mechanisms may exhibit inferior performance due to limitations of the available data. For instance, simulation studies suggest that when data are sparse, Bayesian fitting or EZ-Diffusion may be better able to recover parameters (Lerche et al., 2017;Wiecki et al., 2013;van Ravenzwaaij et al., 2017).
When the hypothesis demands additional model complexity, though, one critical check is to make sure the parameters of the models are recoverable. That is, given a set of models, simulated data from each model should be best fit by the model and parameters that simulated it (Wiecki et al., 2013). This explicitly tests against model and parameter degeneracy, a known weakness of the 'full DDM' (van Ravenzwaaij and Oberauer, 2009;Boehm et al., 2018). Individual GDDMs must be tested for model and parameter recovery before claims can be made about model mechanisms. PyDDM makes it straightforward to test whether a given GDDM has parameters which are recoverable, and whether any two given models can be distinguished from one another. This is facilitated because synthetic RT data can be sampled from a the calculated probability distributions of the GDDM without the need to run trial-wise trajectory simulations.
The GDDM framework supports across-trial variability in non-decision time and starting position according to any arbitrary distribution, but it does not support across-trial drift rate variability. Across-trial drift rate variability is an inherent weakness of the Fokker-Planck approach. It is technically possible to model across-trial drift rate variability using our framework, by discretizing the drift rate distribution and solving the Fokker-Planck equation for each case separately (Voss and Voss, 2008). PyDDM is not set up to ergonomically reconcile across-trial drift rate variability with timevarying evidence, which is a core feature of PyDDM. As a result, the GDDM framework does not fully support the across-trial drift rate variability aspect of the 'full DDM'. Across-trial drift rate variability is a potentially plausible cognitive mechanism used in prior literature to account for slow errors (Ratcliff and Rouder, 1998). Of note, studies suggest this mechanism may not be necessary to include to fit data (Lerche and Voss, 2016;Lerche et al., 2017). GDDMs offer researchers the capability to explore alternative cognitive mechanisms which might also account for slow errors.
PyDDM uses numerical solutions of the Fokker-Planck equation to estimate the RT distribution of the model. Other methods may be used for this estimation, such as solving the Volterra integral equation (Smith, 2000). Some prior work has used these alternative methods for solving GDDMs (Drugowitsch et al., 2012;Zhang et al., 2014;Drugowitsch et al., 2014c). Such methods already have fast solvers (Drugowitsch, 2016), which could be implemented as an alternative method within PyDDM in the future.
PyDDM also has the potential to support other applications of interest in the field. First, it could be used as a simulation engine for fitting hierarchical models of GDDM parameters, in which each subject's individual parameters are drawn from a group distribution. The most critical component to hierarchical model fitting is an efficient method for simulating model likelihood. Since PyDDM provides such a method, hierarchical fitting of GDDMs could be readily implemented as an extension to PyDDM. Second, while PyDDM currently only implements bounds which are symmetric around zero, support for asymmetric bounds is a feasible future addition. Third, PyDDM may be extended to support multi-dimensional diffusion, enabling it to represent interacting race models (Purcell et al., 2010;Usher and McClelland, 2001;Bogacz et al., 2006;Tsetsos et al., 2012;Drugowitsch et al., 2014b). The Fokker-Planck equation can be solved in higher dimensions, in principle enabling similar highly efficient methods to be used for these models as well. The GDDM framework would require only minor adaptations to be compatible with multi-dimensional diffusion. Fourth, it may be extended to allow fitting models to neural recordings or other data which represents the value of the decision variable, as in Zoltowski et al., 2020. Currently, PyDDM only supports fitting to the RT distribution.
In summary, we show that the GDDM framework is a flexible and efficient way to explore new extensions on the DDM, allowing new possibilities in task design. Our package, PyDDM, offers a convenient and modular method for implementing this framework.

GDDM description
The GDDM is described as the following differential equation for decision variable x: where '. . .' represents task conditions and fittable parameters. Initial conditions are drawn from the distribution x~X 0 ð. . .Þ. The process terminates when jxðt; . . .Þj ! Bðt; . . .Þ. This can be parameterized by five functions: . ðx; t; . . .Þ -The instantaneous drift rate, as a function of the decision variable position, time, and task parameters. In PyDDM, it is represented by the 'Drift' class and defaults to ðx; t; . . .Þ ¼ 0. . sðx; t; . . .Þ -The instantaneous noise, as a function of the decision variable position, time, and task parameters. In PyDDM, it is represented by the 'Noise' class and defaults to sðx; t; . . .Þ ¼ 1. . X 0 ð. . .Þ -The probability density function describing the initial position of the decision variable, as a function of task parameters. It must hold that P x X 0 ðx; . . .Þ ¼ 1. In PyDDM, it is represented by the 'IC' class and defaults to a Kronecker delta function centered at x ¼ 0. . Bðt; . . .Þ -The boundaries of integration, as a function of time and task parameters. In PyDDM, it is represented by the 'Bound' class and defaults to Bðt; . . .Þ ¼ 1. . p Ã C ðt; . . .Þ and p Ã E ðt; . . .Þ -Post-processing performed on the simulated first passage time distributions of correct and error RT distributions p C ðtÞ and p E ðtÞ, respectively. In PyDDM, these functions are represented by the 'Overlay' class and default to p Ã C ðt; . . .Þ ¼ p C ðtÞ and p Ã E ðt; . . .Þ ¼ p E ðtÞ. In practice, 'Overlay' is used to implement non-decision times and mixture models for fitting contaminant responses.
The organization of the code in PyDDM mirrors this structure ( Figure 5-figure supplement 1). The GDDM framework supports Markovian processes, in that the instantaneous drift rate and diffusion constant of a particle is determined by its position x t and the current time t, without dependence on the prior trajectory leading to that state. Drift rate is discretized, such that the timestep of the discretization is equal to the timestep of the Fokker-Planck solution. In practice, for typical task paradigms and dataset properties, the impact of this discretization is expected to be minimal for a reasonable timestep size.
The Fokker-Planck equation permits numerical solutions of the GDDM. The numerical algorithm is described for the GDDM with fixed bounds in this section, and the algorithm for the GDDM with time-varying bounds is described in the next section. The system is first discretized with space and time steps Dx and Dt. Without loss of generality, we assume Dx divides 2Bðt; . . .Þ and Dt divides the simulated duration. The probability density solution of Equation 2, ðx; t; . . .Þ, can then be approximated as the probability distribution function at spatial grids fÀBðt; . . .Þ þ Dx; ÀBðt; . . .Þ þ 2Dx; :::; 0; :::; Bðt; . . .Þ À 2Dx; Bðt; . . .Þ À Dxg and temporal grids fnDtg for any non-negative integer n. Due to absorbing bounds, the two spatial grids one step outward at AEBðt; . . .Þ have 0 density, and are not considered explicitly. The choice of Dt is constrained by the stability criteria and accuracy vs execution time tradeoff (see Figure 4).
Define P m i as the probability distribution at the ith space-grid and the mth time-grid. For clarity, here and below P m i denotes the total probability inside a grid of lengths Dx and Dt, and its center represented by (i,m). As such, P m i is unitless, in contrast to ðx; tÞ which has a unit of the inverse of the spatial dimension. Probability distribution functions can also be viewed as a discrete proxy of the continuous probability density (i.e. approximating ðx; tÞ at the corresponding space and time grid), which will then have a unit of the inverse of the spatial dimension. Both views are mathematically equivalent (and just off by a Dx factor to the whole equation), but the former definition was used for simplicity and interpretability.
The Fokker-Planck equations can be discretized in several distinct ways, including the forward Euler method, the backward Euler method, and the Crank-Nicolson method. In the forward Euler method, Equation 2 is approximated as: where D ¼ s 2 =2 is the diffusion coefficient. Here and below, dependencies of space, time, and other variables are omitted for simplicity. Terms in parenthesis with subscripts and superscripts are evaluated at the subscripted space grid and superscripted time grid, in the same manner as P n j . The spatial derivatives at the right side of Equation 2 are evaluated at the previous time step n À 1, such that only one term of the equation is at the current time step n, generated from the temporal derivative at the left side of Equation 2. As such, Equation 3 for each j fully determines P n j from the probability distributions at the previous time-step: It is important to note that the forward Euler method, Equation 4, is an approximation method. Since the probability distribution is iteratively propagated from early to later times, any errors of the solution accumulate over the iterative process. Moreover, the forward Euler method is prone to instability. Assuming small Dx and Dt (i.e. much less than 1), Dt Dx 2 is much larger than Dt 2Dx and thus the third term on the right side of Equation 4 should in general be much larger than the second term. If Dx and Dt are such that the third term is large for any n and j, then P n j will not be properly computed in Equation 4, and such errors will propagate to the whole space over time. The forward Euler method thus has a stability criteria: Equation 5 poses a strict constraint on the step-sizes: while Dx and Dt should both be small to minimize the error due to discretization, decreasing Dx demands a much stronger decrease in Dt, resulting in a much longer execution time of the method. This can be seen in our simulations shown in Figure 4.
The backward Euler and Crank-Nicolson methods largely circumvent the aforementioned problems. In the backward Euler method, the right side of Equation 2 is expressed at the current time step n: P n j À P nÀ1 j Dt ¼ À ðPÞ n jþ1 À ðPÞ n jÀ1 2Dx þ ðDPÞ n jþ1 þ ðDPÞ n jÀ1 À 2ðDPÞ n j Dx 2 ; In the Crank-Nicolson method, half of the right side of Equation 2) is expressed at the current time step n, and half at the previous time step n À 1: With multiple terms at the current time-step, Equation 6 and Equation 7 cannot be directly solved. Instead, the equations across all spatial grids can be summarized in matrix form: Here, n ¼ DDt=Dx 2 and ¼ Dt=Dx. In this formulation, a ¼ 1 for the forward Euler method, a ¼ 0 for the backward Euler method, and a ¼ 0:5 for the Crank-Nicolson method. Since M is a tridiagonal matrix, it can be inverted from the left side to the right side of the equation in an efficient manner. Practically, this allows the backward Euler and Crank-Nicolson method to rapidly generate the discretized probability distribution function inside the boundaries at each time-step.
Finally, the probability of decision formation at each time step is computed as the outward flux from the outermost grids AEB Ç Dx (to the hypothetical grids at AEB): This provides the correct and incorrect reaction time distributions, as well as the total correct ( P n Dp n correct ) and incorrect ( P n Dp n incorrect ) probability. Unlike the forward-Euler method which iteratively propagates the solution and the approximation error in a feedforward manner, the backward Euler and Crank-Nicolson methods solve for the matrix equation of the probability distribution evolution, and are less susceptible to error being propagated. In addition, the two methods do not have any stability criteria which constrain the step size. Contrasting the two methods, the backward Euler method has larger truncation error (OðDx; Dt 2 Þ) than the Crank-Nicolson method (OðDx 2 ; Dt 2 Þ) ( Table 1), while the Crank-Nicolson method generally has twice the execution time of the backward Euler method, assuming the matrix construction is the slowest step of the algorithms. Crucially, the solution from the Crank-Nicolson method tends to be susceptible to oscillations over iterations, especially with varying bounds (see below). PyDDM automatically chooses the best solver for any given model ( Table 1).

Fokker-Planck formalism with time-varying bounds
The previous section considered the Fokker-Planck equation of the GDDM assuming fixed bounds. The formalism can be adapted to accommodate time-varying bounds. This is more generally studied as a free boundary problem, with a typical example including the Stefan problem (Meyer, 1978;Li, 1997).
In particular, consider upper and lower bounds B n ¼ BðnDt; . . .Þ and ÀB n ¼ ÀBðnDt; . . .Þ at an arbitrary time t ¼ nDt. If B n is exactly on a grid point, the previous results apply. Alternatively, if B n is in between two grid points ðx j ; x jþ1 Þ, the solution is approximated by a linearly weighted summation of the solutions in two grid systems: an inner grid system fx k jk ¼ Àj; :::; jg and an outer grid system fx k jk ¼ Àj À 1; :::; j þ 1g.
DefineP n in=out to be the probability distribution function at time t ¼ nDt over the inner and outer grid systems, respectively. In the backward Euler method (a ¼ 0 in Equation 8),P n in=out can be computed by propagating from the actual probability distribution function at the previous stepP nÀ1 : where M in=out are as in Equation 8, but defined over the domain of inner/outer grids respectively. Furthermore, define as the weights of the linear approximation of the solution to that of the outer and inner grid systems. The resulting probability distribution function and the probabilities of decision formation at step n are:P n ¼ w outP n out þ w inP n in Dp n correct ¼ w out ðÀn þ =2Þj n x¼xjþ1 P n jþ1;out þ w in ðÀn þ =2Þj n x¼xj P n j;in Dp n incorrect ¼ w out ðÀn À =2Þj n x¼xÀjÀ1 P n ÀjÀ1;out þ w in ðÀn À =2Þj n x¼xÀj P n Àj;in In the literature of the DDM in cognitive psychology and neuroscience, collapsing bounds are the most common form of time-varying bounds considered. As bounds collapse, it is possible that the outermost grids at the previous step (with non-zero probability densities) become out-of-bound at the current step. Based on the side they become out of bound, such probabilities are added to the correct and incorrect probabilities respectively (and multiplied by w out=in for each of the outer/inner grid system). Nevertheless, increasing bounds are also supported. The algorithm for the Crank-Nicolson scheme of time-varying bounds often generates oscillatory solutions, and is thus not described here or implemented in PyDDM.

Empirical datasets
As testbeds for fitting different models, we used two empirical datasets of choices and RTs during perceptual decision-making tasks: one from Roitman and Shadlen, 2002, one from Evans and Hawkins, 2019. These references contain full details of the tasks and datasets. For the dataset of Roitman and Shadlen, 2002, we used trials from the 'reaction time' task variant, for both monkeys. For the dataset of Evans and Hawkins, 2019, we used trials from the task variant with no feedback delay.
In brief, these two publicly available datasets were selected for the following reasons. Both use random dot motion discrimination tasks, which facilitate comparison of behavior and fitting with the same GDDM. These two datasets span species, with behavior from monkeys (Roitman and Shadlen, 2002) and human subjects (Evans and Hawkins, 2019). Furthermore, both datasets have been analyzed in prior work fitting DDMs with collapsing bounds. Specifically, Hawkins et al., 2015a generally found that the dataset of Roitman and Shadlen, 2002 was better fit with collapsing bounds than with constant bounds, in contrast with typical human datasets which are well fit with constant bounds. In line with those findings, Evans and Hawkins, 2019 showed that behavior in a task variant with no feedback delay yielded behavior which was more representative of human datasets and was well fit by a DDM with decision bounds with no or little collapse. These two datasets therefore provided useful testbeds for GDDM fitting with distinct a priori predictions about the utility of collapsing bounds in a GDDM.

Compared models
The 6-parameter GDDM fit by PyDDM is parameterized by where fittable parameters are drift rate 0 , coherence nonlinearity a, leak constant ', initial bound height B 0 , bound collapse rate t, and non-decision time t nd . Additionally, C max is the maximum coherence in the experiment, which is fixed at 0.512 for the Roitman and Shadlen, 2002 dataset and 0.4 for the Evans and Hawkins, 2019 dataset. The three-parameter DDM fit by PyDDM is a subset of the GDDM framework parameterized by where C is coherence and fittable parameters are 0 , B 0 , and t nd .
The 11-parameter 'full DDM' fit by HDDM does not fall into the GDDM framework because the across-trial variability in drift rate ðx; t; . . .Þ is a random variable. Thus, the 'full DDM' is parameterized by where N is a normal distribution, and fittable parameters are six independent drift rates j for each coherence level j, non-decision time t nd , and bounds B 0 , as well as variability in drift rate s v , starting position s z , and non-decision time s T . It was fit an outlier probability of p outlier ¼ 0:05. The 8-parameter DDM fit by HDDM is a subset of the GDDM framework parameterized by ðx; t; . . .Þ ¼ j sðx; t; . . .Þ ¼ 1 where fittable parameters are B 0 , t nd , and six independent drift rates j , one for each coherence level j. It was fit an outlier probability of p outlier ¼ 0:05. The 18-parameter DDM fit by EZ-Diffusion is a subset of the GDDM framework parameterized by ðx; t; . . .Þ ¼ j sðx; t; . . .Þ ¼ 1 where fittable parameters are independent drift rates j , bounds B j , and non-decision times t j nd , for each of the six coherence level j.
All simulations were fit separately for each monkey. In the human fits, data were pooled across all subjects for the 0 s delay condition only.
In order to examine combinations of Dt and either Dx or N which simultaneously minimize both execution time and error, we plot the product of the MSE and execution time in seconds. This product should be interpreted as a heuristic which encompasses both execution time and accuracy rather than a fundamentally important quantity in and of itself.

Model for parallel performance
To analyze PyDDM's parallel performance, we benchmarked a DDM which fits into the GDDM framework, defined by: where the model was simulated using backward Euler for 64 coherence values C uniformly spaced from 0 to 0.63. Because PyDDM parallelizes functions by utilizing an embarrassingly parallel approach over experimental conditions, we included many such conditions to enable benchmarks which reflect the simulations instead of the specifics of the model. Simulations were performed on a high performance computing cluster using a single node with 28 Xeon E5-2660v4 cores. Verification was disabled for these simulations.

Validity of results
It is important that simulation results are valid and reliable. Results could be inaccurate either due to bugs in user-specified models or in PyDDM itself. Due to the ability to easily construct user-defined models, PyDDM takes this issue very seriously. In addition to utilizing standard software engineering practices such as unit and integration testing with continuous integration (Beck Andres, 2004), PyDDM is the first neuroscience software to our knowledge which incorporates software verification methods for ensuring program correctness (Ghezzi et al., 2002). It does so using a library which checks for invalid inputs and outputs within the code, and also checks for invalid inputs and outputs in user-defined models which do not explicitly incorporate the library (Shinn, 2020). Invalid inputs and outputs include anything which invalidates the assumptions of the GDDM, such as initial conditions outside of the bounds, noise levels less than zero, or probability distributions which do not sum to 1. Such checking helps ensure that results generated by PyDDM are not due to a configuration or programming mistake, which is especially important because user-specified models logic can quickly become complicated for intricate experimental designs. This, combined with the ease of reproducibility and code reuse afforded by open source software in general, means that high validity and reliability are key advantages of PyDDM.
Once we have determined all of the components of our model, we may tie them together using an instance of the 'Model' class. When creating an instance of the 'Model' class, we must also define our timestep 'dt', grid spacing 'dx', and simulation duration 'T_dur'. Shown below is the GDDM in If the model 'm' includes at least one component with a 'Fittable' instance, we must first fit the model to data by calling the 'fit_adjust_model' function, passing our model and dataset as arguments. By default, it fits using differential evolution (Storn and Price, 1997), however Nelder-Mead (Nelder and Mead, 1965), BFGS (Nocedal and Wright, 2006), basin hopping (Wales and Doye, 1997), and hill climbing (Luke, 2013) are also available. Fitting a model in parallel using n cores of a CPU is as simple as calling the function 'set_N_cpus(n)' before fitting the model. Once we load our sample 'samp' using one of a number of convenience functions for loading data, we may fit the model: set_N_cpus (4) fit_adjust_model(m, sample = samp) After fitting the model to data, the 'Fittable' instances are automatically replaced by their fitted values, so the model can be directly simulated using 'm.solve()'. If any of the model components depend on task conditions, these must be passed as a dictionary to the solver; for example, since the 'DriftCoherenceLeak' component depends on 'coh' (the trial's coherence), we must instead call m.solve(conditions={'coh':X0.064})X to simulate with a coherence of 0.064. Alternatively, the function 'solve_partial_conditions' can simulate across many task conditions simultaneously. Both the functions 'm.solve' and 'solve_partial_conditions' will return an instance of the 'Solution' class, which has methods for obtaining the correct and error RT distributions as well as useful summary statistics.