Introduction

Over the past several decades, the study of rapid decision-making has been used as a tool to investigate the properties of the psychological processes responsible for decisions. One of the distinct benefits of using rapid decision-making in this context is the ability to develop process-level models that link proposed theories to data. Within this field, more than five decades of research have been dedicated to the development and study of evidence accumulation models (EAMs), the dominant modeling paradigm used to study these decisions (Stone, 1960). However, researchers, particularly those in the neuroscience and primate research communities (Cisek, Puskas, & El-Murr, 2009; Thura, Beauregard-Racine, Fradet, & Cisek, 2012; Ditterich, 2006a; Drugowitsch, Moreno-Bote, Churchland, Shadlen, & Pouget, 2012), have begun to reconsider some of the most fundamental assumptions of this modeling framework. Whereas five decades of research have centered on the idea that accumulation of evidence over time is the basis of decisions, more recent work has suggested that temporal modulation of caution or a sense of urgency plays a prominent role in decisions. This has led to the recent proliferation of assessments of the role and importance of these factors in decision-making, as encoded in response time (RT) models (Hawkins, Forstmann, Wagenmakers, Ratcliff, & Brown, 2015; Hawkins, Wagenmakers, Ratcliff, & Brown, 2015; Evans, Hawkins, Boehm, Wagenmakers, & Brown, 2017; Evans, Hawkins, & Brown, 2018; Evans & Hawkins, 2019; Ditterich, 2006a; Dutilh et al., 2018; Drugowitsch et al., 2012; Cisek et al., 2009; Thura et al., 2012; Winkel, Keuken, van Maanen, Wagenmakers, & Forstmann, 2014; Ditterich, 2006b; Trueblood et al., 2018; Carland, Marcos, Thura, & Cisek, 2015). However, while the mathematical and statistical properties of classical EAMs have been thoroughly investigated, to date this new class of models (e.g., collapsing-bound or urgency-based models) has not. Here, we provide a thorough analysis of the basic properties of these models and address questions such as “can these models be reliably fit to data?” and “can they be used for parameter inference?”. Our purpose here is not to compare these different models. Rather, it is to assess the basic properties of these models and to determine how they can and should be used in the future.

Over the past several decades, evidence accumulation models (EAMs) have served as an important tool for investigating the properties of rapid decision-making (Hawkins et al., 2014; Matzke, Dolan, Logan, Brown, & Wagenmakers, 2013; Forstmann et al., 2011; Gomez, Ratcliff, & Perea, 2007; Ho et al., 2014; Ratcliff, Thapar, & McKoon, 2011; Ratcliff, Thapar, & McKoon, 2010; Evans & Brown, 2017; Evans, Rae, Bushmakin, Rubin, & Brown, 2017). Specifically, EAMs propose that evidence is accumulated for each of the decision alternatives until the evidence for one of the alternatives reaches a threshold level, which triggers a decision. Although several different specific EAMs have been proposed, all EAMs contain two critical parameters that explain the process described above: the “drift rate”, which is the rate of evidence accumulation for an alternative, and the “threshold”, which is the amount of evidence required to trigger a decision for an alternative (see Ratcliff, Smith, Brown, & McKoon, 2016, and Donkin & Brown, 2017, for reviews). Recently, researchers have begun to reconsider the psychological assumptions associated with response thresholds. In the past, it was typically assumed that response thresholds are constant over time, reflecting the assumption that a person’s level of caution is fixed within a decision and does not change over time. However, recent studies have begun to investigate whether those thresholds change (e.g., “collapse” or decrease) over time, reflecting the assumption that “urgency” leads to a reduction in caution over time (Cisek et al., 2009; Drugowitsch et al., 2012).

Investigation of these time-varying decision mechanisms has taken two basic forms. The first involves the presence of a “collapsing threshold”, which encodes the assumption that a decreasing amount of evidence is required to trigger a decision as the time spent on the decision increases (Drugowitsch et al., 2012; Ditterich, 2006a). Alternatively, there have also been proposals of a conceptually similar “urgency signal”, where the thresholds remain fixed, but the evidence signal is multiplied by a value that continues to increase as the time spent on the decision increases (Cisek et al., 2009; Thura et al., 2012). These time-varying mechanisms are appealing for several normative reasons. Specifically, they allow greater efficiency (e.g., maximizing the time-discounted rate of return) than fixed thresholds when experimental difficulty varies between trials (Drugowitsch et al., 2012; Thura et al., 2012), and allow fast deadlines to be met (Frazier & Yu, 2007).

The previous assessments of collapsing-threshold models have primarily relied on model comparison, e.g., comparing a fixed-threshold EAM and a collapsing-threshold EAM to determine how well each describes the process that underlies the data. The use of model comparison has resulted in mixed findings regarding whether humans generally seem to implement these time-varying mechanisms. Earlier studies focused on qualitative trends, which favored the presence of a time-varying mechanism (Cisek et al., 2009; Thura et al., 2012). However, studies using quantitative model selection have mostly shown evidence against these time-varying mechanisms (Hawkins et al., 2015; Hawkins et al., 2015; Voskuilen, Ratcliff, & Smith, 2016; Evans et al., 2017), although some recent quantitative studies (Palestro, Weichart, Sederberg, & Turner, 2018; Evans et al., 2018; Evans & Hawkins, 2019) have found evidence in favor of time-varying mechanisms in specific paradigms.

However, another important goal that has largely been ignored within the collapsing thresholds literature is parameter estimation, which involves measuring the latent parameters from data to test hypotheses (see (Kruschke & Liddell, 2015) for a discussion of the importance of estimation). With this approach, rather than formulating multiple models and comparing them, one formulates a single model whose specific parameter values determine its properties. For example, one could consider a collapsing thresholds model and measure the extent to which the thresholds collapse. A rate of collapse near 0 would be a strong indicator of the absence of collapsing thresholds, while a rate that deviates significantly from 0 would indicate its importance. However, this approach requires that the crucial parameters of the models being considered (the “collapse rate” in this example) can be reliably estimated from data. While parameter recovery studies for classical EAMs have been carried out to validate their use for parameter inference (e.g., van Ravenzwaaij & Oberauer, 2009; Donkin, Brown, Heathcote, & Wagenmakers, 2011; Lerche & Voss, 2016; White, Servant, & Logan, 2018), this has not been performed for this new class of collapsing threshold/urgency models.

Our study aims to provide the first comprehensive assessment of parameter estimation with collapsing threshold and urgency signal models by performing a large-scale parameter recovery study utilizing state-of-the-art Bayesian methods. Importantly, if a model cannot reliably estimate the correct parameter values from data generated by those very parameters, then the estimated parameters from the model when applied to real data are of little meaning. This has proven to be an issue for some existing cognitive models (e.g., see Miletić, Turner, Forstmann, & van Maanen, 2017, for a parameter recovery study on the leaky-competing accumulator [LCA; Usher & McClelland, 2001] that shows an inability to recover several key parameters of the model) and more generally for models of complex biological processes (Holmes, 2015; Gutenkunst et al., 2007). We assess the parameter identifiability of three distinct model types based on the diffusion decision model framework (DDM; Ratcliff, 1978), the linear ballistic accumulator framework (LBA; Brown & Heathcote, 2008), and the urgency gating model framework (UGM; Cisek et al., 2009; Thura et al., 2012), which are schematically outlined in Fig. 1. Additionally, we consider two variants each of the collapsing-bound diffusion and urgency gating model formulations for completeness. Finally, based on prior suppositions (e.g., Cisek et al., 2009; Thura et al., 2012) that the type of data used when working with these models is of critical importance, we perform this assessment with two types of data: constant-evidence paradigms and changing-evidence paradigms.

Fig. 1

Examples of the three different model variants considered within our manuscript. In all cases, the green lines display evidence accumulation in a constant-evidence paradigm, and the blue lines display evidence accumulation in a changing-evidence paradigm with a piecewise extension of the models. The red vertical lines display the time at which the evidence changes in the environment ($t_{switch}$), with the accumulation of both models being identical until the evidence change. a Diagram of the LBA. The solid colored lines display the accumulation of evidence for the first alternative, and the dashed colored lines display the accumulation for the second alternative. The black line displays the standard fixed threshold within the model that evidence for each alternative accumulates to, whereas the gray line displays the linear collapsing thresholds that we attempt to measure within this study. In addition, there is a switch time delay parameter ($t_{delay}$) included, which is an offset from when the evidence change in the environment begins to affect the accumulation process. b Diagram of the DDM. The solid black lines display the standard fixed thresholds, the solid gray lines display a linear collapsing threshold, and the dashed gray lines display a Weibull collapsing threshold. In addition, there is a switch time delay parameter ($t_{delay}$) included, which is an offset from when the evidence change in the environment begins to affect the accumulation process. c A diagram of the UGM

Models

We first outline the collection of models on which we perform the parameter recovery assessment within this article. Five specific models are considered: two versions of a collapsing threshold DDM (CTDDM), a collapsing threshold variant of the LBA model (CTLBA), and two variants of the UGM. The two versions of the DDM differ only in the form of the collapsing threshold: in one case we consider a linear collapse, and in the other a non-linear (Weibull) collapse. The two versions of the UGM differ only slightly in their mathematical formulation and interpretation: one is formulated in terms of a leakage process and the other in terms of a low-pass filtering process. For the collapsing threshold LBA, we consider only a version with a linearly collapsing threshold.

We provide these assessments for both constant-evidence and changing-evidence paradigms since (1) both of these paradigms have been used in previous assessments of time-varying models, and (2) as discussed by Cisek and colleagues (Cisek et al., 2009; Thura et al., 2012), changing-evidence paradigms may be more appropriate for assessing models that encode time-varying hypotheses. Therefore, the potentially richer data obtained from a changing-evidence paradigm may be more suitable for probing the importance of collapsing thresholds/urgency.

DDM

The DDM (Ratcliff, 1978; Ratcliff & Rouder, 1998), the most commonly used EAM for studying rapid, binary decisions, proposes that evidence stochastically accumulates until one of two thresholds, each corresponding to one of the binary alternatives, is reached (Fig. 1, middle panel). The accumulation of evidence between these binary alternatives can be expressed as:

$$ dE = v dt + \sigma dW $$
(1)

where E is the evidence state. This model contains the following parameters: drift rate ($v$), starting point ($z$, which represents response bias), non-decision time ($t_{er}$), and the response threshold ($a$). In this implementation of the DDM, the response thresholds are located at $\pm a$. Furthermore, we consider only the ($v$, $t_{er}$, $a$) parameters and fix $z = 0$; we fix the bias because most studies of time-varying caution do not include it. The DDM can also be extended to include between-trial variability in drift rate, starting point, and/or non-decision time, though we do not consider these factors here, as these parameters are known to have recovery issues (Lerche & Voss, 2016) and are rarely the primary focus of parameter inference.
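To make this concrete, the following minimal sketch simulates a single fixed-threshold DDM trial via an Euler-Maruyama discretization of Eq. 1; the function name, step size, and parameter values are illustrative choices rather than the settings used in our simulations.

```python
import numpy as np

def simulate_ddm_trial(v, a, t_er, sigma=1.0, dt=1e-3, max_t=5.0, rng=None):
    """Simulate one fixed-threshold DDM trial (Eq. 1) by Euler-Maruyama.

    Returns (response time, choice): choice is +1 if the upper threshold (+a)
    is crossed and -1 if the lower threshold (-a) is crossed.
    """
    rng = np.random.default_rng() if rng is None else rng
    E, t = 0.0, 0.0                      # unbiased start point (z = 0)
    while t < max_t:
        E += v * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if E >= a:
            return t_er + t, +1
        if E <= -a:
            return t_er + t, -1
    return np.nan, 0                     # no threshold reached within max_t

rt, choice = simulate_ddm_trial(v=1.5, a=1.0, t_er=0.3)  # illustrative values
```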

We provide two time-varying extensions to the DDM. Firstly, we extend the DDM to a collapsing thresholds model with a linear collapsing function:

$$ a_{upper}(t) = a_{0} - a_{slope} t $$
(2)

The lower boundary is given by $a_{lower}(t) = -a_{upper}(t)$. This adds a single extra parameter to the DDM framework: the linear rate of collapse ($a_{slope}$) from the initial threshold. Secondly, we utilize a non-linear collapse function that has been applied within previous literature (Hawkins et al., 2015; Evans et al., 2018; Evans & Hawkins, 2019), where the collapse takes the form of a Weibull function:

$$ a_{upper}(t) = a_{0} - pWeibull(t ; shape,scale) \times (a_{0}-a_{asymp}) $$
(3)

The lower boundary again satisfies $a_{lower}(t) = -a_{upper}(t)$. The Weibull function contains three parameters, corresponding to the shape of the collapse (shape), the scale of the collapse (scale), and the asymptote of the collapse ($a_{asymp}$) as $t$ becomes large. For simplicity, we fixed the minimum collapse point ($a_{asymp}$) to a very small value ($10^{-3}$), resulting in two extra parameters in addition to the standard DDM parameters: the shape (shape) and scale (scale) of the Weibull collapse function.
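For illustration, the sketch below implements the two collapse functions of Eqs. 2 and 3, taking pWeibull to be the Weibull cumulative distribution function (here via SciPy); the example parameter values are arbitrary placeholders.

```python
import numpy as np
from scipy.stats import weibull_min

def linear_threshold(t, a0, a_slope):
    """Linearly collapsing upper threshold (Eq. 2): a0 - a_slope * t."""
    return a0 - a_slope * t

def weibull_threshold(t, a0, shape, scale, a_asymp=1e-3):
    """Weibull collapsing upper threshold (Eq. 3); pWeibull is taken to be
    the Weibull CDF, an assumption consistent with the R-style name."""
    return a0 - weibull_min.cdf(t, c=shape, scale=scale) * (a0 - a_asymp)

t = np.linspace(0.0, 3.0, 301)
a_lin = linear_threshold(t, a0=1.0, a_slope=0.25)            # lower bound = -a_lin
a_wei = weibull_threshold(t, a0=1.0, shape=2.0, scale=1.0)   # lower bound = -a_wei
```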

LBA

The LBA (Fig. 1, top panel) proposes that evidence for the different alternatives in a decision accumulates independently and deterministically (e.g., with no within-trial noise), where the accumulation of evidence for each alternative can be expressed as:

$$ dE_{a} = v_{a} dt, $$
(4)

where E is the evidence state, and the subscript a indexes the alternative. In this framework, each alternative is associated with a drift rate ($v_a$) whose value corresponds to the strength of evidence for that alternative. This model contains the following parameters for a binary choice: two drift rates ($v_1$, $v_2$) for the two alternatives, the threshold ($b$), the start-point variability ($A$), the non-decision time ($t_{er}$), and the trial-to-trial variability in drift rates ($s$), which is critical to the model due to the absence of within-trial variability. This gives the LBA six parameters: $v_1$, $v_2$, $s$, $b$, $A$, and $t_{er}$.

We extend the LBA to a collapsing thresholds model with a linear collapsing function:

$$ b(t) = b_{0} - b_{slope} t $$
(5)

Importantly, a linear collapsing function maintains the computational simplicity that the LBA was originally designed for, as the time at which the threshold is crossed (i.e., the response time) can easily be obtained as the point of intersection between two lines. As our goal is to maintain the simplicity of the LBA framework, we do not include a more computationally taxing Weibull collapsing function. This adds a single extra parameter to the LBA framework: the linear rate of collapse from the initial threshold ($b_{slope}$).
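As a brief illustration of this computational convenience, the sketch below simulates a single CTLBA trial by solving for the intersection of each accumulator's linear evidence trajectory with the linearly collapsing threshold of Eq. 5; the start-point and drift-rate distributions follow the standard LBA assumptions, and the numerical values in the usage line are placeholders.

```python
import numpy as np

def simulate_ctlba_trial(v_means, s, b0, b_slope, A, t_er, rng=None):
    """Simulate one trial of the LBA with a linearly collapsing threshold.

    Each accumulator starts at k ~ Uniform(0, A) and rises at rate
    d ~ Normal(v_mean, s); it meets the collapsing threshold
    b(t) = b0 - b_slope * t when k + d * t = b0 - b_slope * t, i.e. at
    t = (b0 - k) / (d + b_slope), the intersection of two lines.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = rng.uniform(0.0, A, size=len(v_means))   # start points
    d = rng.normal(v_means, s)                   # sampled drift rates
    with np.errstate(divide="ignore"):
        t_cross = np.where(d + b_slope > 0,
                           (b0 - k) / (d + b_slope), np.inf)
    winner = int(np.argmin(t_cross))             # first accumulator to cross
    return t_er + t_cross[winner], winner

rt, choice = simulate_ctlba_trial(v_means=[1.2, 0.8], s=0.3,
                                  b0=1.5, b_slope=0.2, A=0.5, t_er=0.25)
```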

UGM

The UGM (Fig. 1, bottom panel) proposes a similar process of dependent evidence accumulation to the DDM. However, the UGM proposes that evidence is barely accumulated at all, with a focus on novel evidence (Cisek et al., 2009). The essential assumption of this model is that evidence, rather than being accumulated, is first smoothed by some form of leakage or filtering process, and then multiplied by a time-varying urgency signal.

We assess two versions of the UGM within our parameter recovery. Firstly, we assess the UGM previously fit by Hawkins et al. (2015) and Evans et al. (2017), and described previously (e.g., Carland et al., 2015), which assumes evidence is filtered prior to being weighted by the urgency signal. This version of the model has the following five parameters: $v$, $a$, $t_{er}$, $u$, and $\tau$, and can be expressed as:

$$ x_{(t+{\Delta})} = u t E_{(t+{\Delta})}, \quad E_{(t+{\Delta})} = E_{(t)} + \frac{\Delta}{\tau + {\Delta}} (-E_{(t)} + v dt + \sigma dW). $$
(6)

where the two additional parameters beyond the diffusion model, $u$ and $\tau$, are the urgency signal and the time constant of the filtering process, respectively. Since this is a non-standard random walk that does not have a clear continuous analogue, we express it in its discrete form rather than as a stochastic differential equation. Specifically, $\Delta$ is the step size of the process, $u t$ is the time-dependent urgency signal, and $x_{(t+\Delta)}$ is the urgency-transformed evidence value that is compared against the threshold at each step to determine whether a decision is made.

We note that the variable E here has a slightly different meaning than in the prior models. In prior models, E was the decision variable whose accumulation to a threshold triggers a response. The UGM theory, however, posits that the evidence signal is weighted by an urgency signal to produce an effective decision variable. Here, E represents the evidence signal, $u t$ is the urgency signal, and x is the decision variable.
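The sketch below gives one possible discretization of Eq. 6 for a single trial. Because the step size and the scaling of the momentary evidence increment are not specified here, the choices below (taking dt = Δ, dW = √Δ·ε, and the illustrative threshold value) are our own assumptions and would need adjusting to match any particular published implementation.

```python
import numpy as np

def simulate_ugm_filter_trial(v, a, t_er, u, tau, sigma=1.0,
                              delta=1e-3, max_t=5.0, rng=None):
    """Simulate one trial of the filter-version UGM (Eq. 6).

    The momentary evidence increment v*dt + sigma*dW is taken as
    v*delta + sigma*sqrt(delta)*eps, the filtered evidence E is updated with
    gain delta / (tau + delta), and the decision variable x = (u * t) * E is
    compared against the thresholds +/- a.
    """
    rng = np.random.default_rng() if rng is None else rng
    E, t = 0.0, 0.0
    while t < max_t:
        increment = v * delta + sigma * np.sqrt(delta) * rng.standard_normal()
        E += (delta / (tau + delta)) * (-E + increment)
        t += delta
        x = u * t * E                    # urgency-weighted decision variable
        if x >= a:
            return t_er + t, +1
        if x <= -a:
            return t_er + t, -1
    return np.nan, 0                     # no decision within max_t

# Illustrative call; the appropriate scale of `a` depends on the step size,
# since E here tracks the per-step evidence increment.
rt, choice = simulate_ugm_filter_trial(v=1.0, a=0.002, t_er=0.3, u=1.0, tau=0.1)
```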

The second version we consider differs slightly in its mathematical form. It assumes that rather than being subjected to a filter with an associated time constant, evidence is subject to leakage with an associated rate (Carland, Thura, & Cisek, 2015). This version of the model has the following four parameters: $v$, $a$, $t_{er}$, and $L$, and can be expressed as:

$$ x = u t E, \qquad dE = (-L E + v) dt + \sigma dW , $$
(7)

where $u$ is fixed at 1 (see Appendix C for further detail on this choice). We stress again that this is not a fundamentally different model but rather a slightly different mathematical formulation of the UGM. In particular, the leakage in this formulation has the effect of a smoothing filter with a time constant related to $1/L$. We include both of these mathematical instantiations of the UGM for completeness. We note that for this version of the UGM, the urgency rate is mathematically unidentifiable, and thus we do not attempt its recovery here; see the Appendix for further details.
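A corresponding Euler-Maruyama sketch of the leakage formulation in Eq. 7 is given below, with u fixed at 1 as described above; the specific parameter values are again illustrative assumptions.

```python
import numpy as np

def simulate_ugm_leakage_trial(v, a, t_er, L, sigma=1.0,
                               dt=1e-3, max_t=5.0, rng=None):
    """Simulate one trial of the leakage-version UGM (Eq. 7).

    The evidence is leaky-integrated, dE = (-L*E + v) dt + sigma dW, and the
    decision variable x = u * t * E (with u fixed at 1) is compared against
    the thresholds +/- a.
    """
    rng = np.random.default_rng() if rng is None else rng
    E, t = 0.0, 0.0
    while t < max_t:
        E += (-L * E + v) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
        x = 1.0 * t * E                  # u = 1 (unidentifiable; see text)
        if x >= a:
            return t_er + t, +1
        if x <= -a:
            return t_er + t, -1
    return np.nan, 0

rt, choice = simulate_ugm_leakage_trial(v=1.0, a=1.0, t_er=0.3, L=2.0)
```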

Accounting for changes of evidence

While changing-evidence paradigms have been of great interest in the decision-making literature (Diederich, 1997; Diederich & Busemeyer, 1999; Usher & McClelland, 2001; Diederich & Busemeyer, 2006; Diederich, 2008; Kiani, Hanks, & Shadlen, 2008; Tsetsos, Usher, & McClelland, 2011; Tsetsos, Gao, McClelland, & Usher, 2012; Evans et al., 2017), models extended to these paradigms have rarely been the focus of parameter recovery studies (though see Holmes, Trueblood, & Heathcote, 2016, for a parameter recovery study on the piecewise LBA, and Holmes & Trueblood, 2018, for a parameter recovery study on the piecewise DDM). In the current parameter recovery study, we assess the recoverability of parameters in changing-evidence paradigms where evidence changes at a single, known discrete point in time ($t_{switch}$). For example, in a random dot motion task, this change would correspond to a change in the motion of dots at a specific point in time (e.g., from initially moving to the left to moving to the right after 200 ms). Accounting for this change of evidence requires augmentation of the models. For the LBA, we utilize a piecewise extension (pLBA; Holmes et al., 2016) in which it is assumed that, after the change of evidence, drift rates change to reflect the new evidence. We further assume there is an unknown delay between the change of stimulus and the change of drift ($t_{delay}$), which adds an additional parameter to the model (see Fig. 1). This parameter captures the time between the objective change in stimulus and the time it takes an individual to adapt to and encode the new information. A similar piecewise extension is applied to the DDM (as in Holmes & Trueblood, 2018), which again adds the parameter $t_{delay}$.

For the UGM, we similarly assume that the evidence parameter changes in response to the change of evidence. In this case, however, we do not include a delay between the change of stimulus and the change of parameter. The addition of the delay parameter to the DDM and LBA frameworks is to account for the fact that the change of stimulus likely does not result in an instantaneous change in the drift rate parameters. Rather, there would likely be a nonlinear response to the change of evidence that changes the drift rates from $v_{before}$ to $v_{after}$ over a period of time. The inclusion of this delay parameter is essentially an approximation that accounts for the time it takes for that non-linear process to generate the new drift rates. In the UGM, however, the filtering of evidence (or alternatively leakage) already provides a mechanism to smooth the transition from before to after the change of stimulus, with the filtering time constant ($\tau$) or leakage rate ($L$) setting the timescale over which the transition occurs. More specifically, recall that in the UGM, it is not the sensory evidence that is multiplied by the urgency signal, but rather a smoothed (by filtering or leakage) version of it. That smoothing is what $t_{delay}$ in the DDM and LBA models is intended to account for, and thus $t_{delay}$ is not mechanistically needed in the UGM. Thus, additional parameters are not required to account for changes of evidence in the UGM.
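As a simple illustration of the assumed piecewise structure, the sketch below returns the drift rate at time t in a changing-evidence trial: the stimulus changes at t_switch, but the drift only changes t_delay seconds later (with t_delay = 0 for the UGM). The names and values are illustrative.

```python
import numpy as np

def piecewise_drift(t, v_before, v_after, t_switch, t_delay=0.0):
    """Drift rate at time t in a changing-evidence trial: the stimulus changes
    at t_switch, but the drift rate only switches t_delay seconds later
    (t_delay = 0 reproduces the UGM assumption of no added delay)."""
    return v_before if t < t_switch + t_delay else v_after

# Example: evidence favors +1 for 250 ms, then reverses, with a 100-ms delay
# before the drift rate reflects the change (values are illustrative).
drifts = [piecewise_drift(t, v_before=1.0, v_after=-1.0,
                          t_switch=0.25, t_delay=0.10)
          for t in np.arange(0.0, 1.0, 0.05)]
```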

Methods

For all recoveries, we used the same general method, though the exact details differed based on the number and identity of the parameters. To assess the identifiability of the parameters of each of the models, we performed a large-scale simulation/parameter recovery study utilizing state-of-the-art Bayesian methods. For each model, a large number of parameter sets were chosen, synthetic data was simulated based on those parameters, and parameter recovery was performed for each parameter set using that simulated data. We also used two types of input evidence in our simulations: constant evidence, where the evidence being integrated from the environment remains constant over time, and changing evidence, where the evidence being integrated from the environment changes at a known, fixed time during the trial. This was motivated by the suggestion of Cisek et al. (2009) that time-variant models should be most distinguishable from time-invariant models in changing-evidence paradigms.

To generate the simulated datasets, we used a Latin hypercube sampling design (McKay, Beckman, & Conover, 1979), which allows for efficient sampling of the parameter space of interest. We generated 4000 parameter sets using the Latin hypercube design. For each, a single synthetic data set was generated, with both a fixed- and a changing-evidence condition, each with 1000 trials. For the fixed-evidence assessments, we fit the models only to the fixed-evidence condition of the data.
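For concreteness, a Latin hypercube design of this kind can be generated as in the sketch below (shown here with SciPy's quasi-Monte Carlo module purely for illustration); the parameter names and bounds are placeholders rather than the ranges used in our study.

```python
from scipy.stats import qmc

# Hypothetical bounds for a linear-collapse DDM parameterized as
# (v, a0, a_slope, t_er); these are illustrative, not the study's ranges.
lower = [0.5, 0.5, 0.0, 0.1]
upper = [4.0, 2.5, 1.0, 0.5]

sampler = qmc.LatinHypercube(d=len(lower), seed=1)
unit_sample = sampler.random(n=4000)                   # points in the unit hypercube
parameter_sets = qmc.scale(unit_sample, lower, upper)  # map to parameter ranges
```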

For the changing-evidence condition, we simulated data with a single change in evidence that occurred after 250 ms, corresponding to a direct swap of the drift rates for the two accumulators in the LBA and a swap of the positive/negative sign in the DDM/UGM (i.e., both are symmetric changes in evidence). Only parameter sets that produced data sets with reasonable response time distributions were kept for further analysis. Our criteria for reasonable response time distributions were the same for all simulations for all models, and can be found in the Appendix along with the number of data sets that failed to meet these criteria. Note that it is possible that our choice of exclusion criteria could have influenced our recovery findings, as models that successfully recover the parameter sets used in this study may not be able to recover the excluded parameter sets. In order to focus on the practical use case of these models, however, we chose to study only the region of “data space” that is relevant to applications, and thus do not consider RT distributions that differ dramatically from those observed empirically.

We fit datasets using Bayesian parameter estimation with differential evolution Markov chain Monte Carlo (DE-MCMC; Ter Braak, 2006; Turner, Sederberg, Brown, & Steyvers, 2013) to estimate the posterior distributions. Since these models do not have tractable analytic likelihood functions, we utilized the recently developed probability density approximation (PDA) method to computationally approximate the needed likelihood functions (Turner & Sederberg, 2014; Holmes, 2015; Evans, Holmes, & Trueblood, 2019), which we detail further in the Appendix. To assess the accuracy of the parameter recovery, we compared the mean posterior estimated parameter values to the generating parameter values across all datasets for each parameter. Our fits were performed to (1) the constant evidence data (labeled “constant” in the figures of the results) and (2) the constant and changing evidence data simultaneously (labeled “changing” in the figures of the results) with all parameters constrained over conditions (e.g., threshold parameters are the same for both types of trials). See the Appendix for additional details regarding the generation of synthetic data and parameter recovery methods.
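To give a sense of how the PDA approximation works, the sketch below outlines its core idea for a two-choice model: simulate a large synthetic dataset at the proposed parameters, form a kernel density estimate of the RT distribution for each response (weighted by that response's probability, giving a defective density), and evaluate it at the observed data. The Gaussian kernel, the density floor, and the `simulate` interface are simplifying assumptions for illustration; the Appendix details the settings actually used.

```python
import numpy as np
from scipy.stats import gaussian_kde

def pda_log_likelihood(observed_rts, observed_choices, simulate, n_sim=10_000):
    """Approximate the log-likelihood of observed (choice, RT) data via PDA.

    `simulate(n)` is assumed to return two arrays (rts, choices) of n model-
    simulated trials at the proposed parameter values; `observed_rts` and
    `observed_choices` are numpy arrays of the data being fit.
    """
    sim_rts, sim_choices = simulate(n_sim)
    log_like = 0.0
    for response in np.unique(observed_choices):
        obs = observed_rts[observed_choices == response]
        sim = sim_rts[sim_choices == response]
        if len(sim) < 2:                        # response (almost) never simulated
            return -np.inf
        p_response = len(sim) / n_sim           # weight for the defective density
        density = gaussian_kde(sim)(obs) * p_response
        log_like += np.sum(np.log(np.maximum(density, 1e-10)))  # floor for stability
    return log_like
```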

Results

Figure 2 shows results of the recovery for the LBA model, Figs. 3 (linear) and 4 (Weibull) show recovery results for the DDM models, and Figs. 5 (filter) and 6 (leakage) show recovery results for the UGM models. In each case, the horizontal axes show the true value of the generating parameters while the vertical axes show the mean of the resulting posterior distributions. Thus, a quality recovery is characterized by a cloud of points that lie close to the diagonal (true parameter = fit parameter).

Fig. 2

Parameter recovery plots for the LBA with linear collapsing thresholds. The columns display the different parameters of the model (drift difference, drift mean, threshold, and collapse rate, respectively), and the rows display the different conditions (constant evidence, and both constant and changing evidence, respectively). Each plot displays the generated values on the x-axis and the mean of the estimated posteriors on the y-axis, with the generated/estimated combinations as black points and the red line representing perfect recovery

Fig. 3

Parameter recovery plots for the DDM with linear collapsing thresholds. The columns display the different parameters of the model (drift rate, threshold, and collapse rate, respectively), and the rows display the different conditions (constant evidence, and both constant and changing evidence, respectively). Each plot displays the generated values on the x-axis and the mean of the estimated posteriors on the y-axis, with the generated/estimated combinations as black points and the red line representing perfect recovery

Fig. 4

Parameter recovery plots for the DDM with Weibull collapsing thresholds. The columns display the different parameters of the model (drift rate, threshold, the shape of the Weibull collapse, and the scale of the Weibull collapse, respectively), and the rows display the different conditions (constant evidence, and both constant and changing evidence, respectively). Each plot displays the generated values on the x-axis and the mean of the estimated posteriors on the y-axis, with the generated/estimated combinations as black points and the red line representing perfect recovery

Fig. 5

Parameter recovery plots for the UGM with a filter process. The columns display the different parameters of the model (drift rate, threshold, the urgency multiplier, and the time constant of the filter process, respectively), and the rows display the different conditions (constant evidence, and both constant and changing evidence, respectively). Each plot displays the generated values on the x-axis and the mean of the estimated posteriors on the y-axis, with the generated/estimated combinations as black points and the red line representing perfect recovery

Fig. 6

Parameter recovery plots for the UGM with a leakage process. The columns display the different parameters of the model (drift rate, threshold, and the leakage rate, respectively), and the rows display the different conditions (constant evidence, and both constant and changing evidence, respectively). Each plot displays the generated values on the x-axis and the mean of the estimated posteriors on the y-axis, with the generated/estimated combinations as black points and the red line representing perfect recovery

In general, there appear to be three consistent trends in the parameter recoveries across all of the models. Firstly, the difference in drift rate between alternatives (which is simply the drift rate in the DDM and UGM models) recovers extremely well, regardless of the time-variant function. Thus, the discriminability or strength of evidence is recoverable in all cases, regardless of the model or type of data used. Secondly, there appears to be a clear link between the recovery of the initial threshold and the recovery of the time-variant parameters (i.e., the threshold collapse or urgency signal), where the time-variant and initial threshold parameters are either both recovered well or both recovered poorly. Specifically, both recover well in the linear CTDDM, while neither recovers well in the non-linear CTDDM or the CTLBA. This suggests that a key tradeoff exists between the initial threshold and these time-variant parameters, and that estimates of initial thresholds are only reliable within certain time-variant models. Lastly, the addition of a changing-evidence condition appears to aid the recovery of the parameters for all models, though in most cases the improvement is fairly minor.

At a more specific level, while the CTLBA accurately recovers the difference between drift rates, it cannot recover the values of the drift rates themselves, the initial threshold, or the rate of threshold collapse. While the non-linear CTDDM can very precisely recover the drift rate, it cannot recover any of the threshold parameters adequately, with either fixed or changing evidence. The linear CTDDM, on the other hand, precisely recovers all parameters, including the drift rate, the initial threshold, and the rate of threshold collapse. This is true whether fixed or changing evidence is considered, though the addition of changing evidence does somewhat improve the quality of recovery.

The UGM results are somewhat more nuanced. Recovery of the evidence strength parameter (v) is precise in both cases. Estimation of the time constant/leakage rate is reasonable for both models as well, in the appropriate regime: small time constants and large leakage rates recover well, while large time constants/small leakage rates do not. In retrospect, this is sensible because large time constants/small leakage rates correspond to timescales that are longer than the decision times themselves. More importantly, while the threshold is recoverable in the leakage formulation, neither the threshold nor the urgency value (u) is recoverable in the filter version, with either fixed or changing information.

These results indicate that the only two models (of those considered) for which accurate recovery of parameters is possible are the linear CTDDM and the leakage-formulated UGM. Importantly, however, the leakage-formulated UGM does not contain an estimable “strength of urgency” parameter, as this was shown mathematically to not be identifiable. Thus, of the five models considered, the linear CTDDM is the only one containing an accurately estimable parameter that reflects the relative importance of time-varying caution/urgency.

An important distinction is needed here, though. The fact that a model cannot recover its own parameters does not preclude its use in modeling. The inability of a model to recover its own parameters is typically due to some form of parameter degeneracy or indeterminacy, otherwise referred to as model “sloppiness” (Holmes, 2015; Gutenkunst et al., 2007). In such cases, this simply means that more than one parameter set can account equally, or almost equally, well for the same data. That is, the ability of a model to accurately fit a data set and the ability of that model to precisely recover parameters are two different things. In the case of the three models that do not recover their parameters, the “best fit” parameters do still fit the generated data very well (results not shown). It is simply the case that for those models, many parameter sets account for the data very well, and the fitting process converged onto a different set than the one that generated the data.

Discussion

Our study aimed to provide the first comprehensive assessment of whether response time models that include time-variant collapsing thresholds or urgency signals can be used as “measurement tools” to reliably estimate the latent parameter values of interest. Previously within the collapsing thresholds literature, the goal of parameter estimation has mostly been ignored in favor of model inference, although both goals should be viewed as important. We assessed five different models through parameter recovery simulations, examining whether the drift rate, threshold, and time-varying threshold/urgency parameters could be recovered for each. We performed these parameter recoveries within a constant-evidence-only paradigm (the standard within rapid decision-making) and a paradigm with both constant-evidence and changing-evidence conditions.

Our results demonstrate that, where parameter inference is concerned, there are substantial differences between the caution/urgency modulation models assessed here. Specifically, only two of these models, the diffusion decision model with a linearly collapsing threshold and the urgency gating model formulated with a leakage process, are able to accurately recover their own parameters. Of those two, however, only the CTDDM has an estimable parameter reflecting the relative importance of time-varying caution/urgency. While the leakage-formulated UGM is recoverable, this is only the case once the “strength of urgency” parameter (u) is removed, since it is mathematically unidentifiable based on response time data. While other models are capable of recovering some of their parameters, only the CTDDM is capable of recovering the critical parameters associated with caution.

One other interesting finding of our recovery study was that changing-evidence conditions appear to aid the recovery of parameters for all models. Importantly, this finding may have potential implications beyond the time-varying models that we assess within this study, extending to other decision-making models that have previously shown recovery issues. For example, the study of Miletić et al. (2017) found that the leaky-competing accumulator (LCA; Usher & McClelland, 2001), a neurally motivated model of decision-making, exhibits parameter recovery issues. As the LCA naturally extends to changing-evidence paradigms, the additional constraint may improve the recovery of its parameters, as it did for the models in our study. However, it should be noted that the improvements we observed were fairly minor, and no model parameters qualitatively shifted from “non-recoverable” to “recoverable” due to the addition of changing evidence. Regardless, future research could potentially investigate the recovery of the parameters of the LCA, as well as of other difficult-to-recover models, within changing-evidence paradigms.

Based on our findings, we make the following recommendations to researchers who wish to investigate or apply models with time-variant components. Firstly, if parameter estimation is the goal, we recommend that researchers use either the CTDDM with a linear collapse or the UGM with a leakage process. If the relative strength of time-varying caution/urgency is of primary importance, then the CTDDM should be used. Alternatively, if another type of time-variant model is of key theoretical interest, such as a CTDDM with a Weibull collapse, we recommend that researchers stick to the goal of model inference, that is, contrasting different models based on how well they fit data, rather than basing conclusions on parameters, since the parameter estimates from these models do not appear to be robust. In all cases, we strongly recommend that proper parameter recovery assessments be performed before using the parameters of any time-varying model for inference, as we have shown that small changes to the formulation of a model can result in a lack of robustness of the parameter estimates.