Quantized response times are a signature of a neuronal bottleneck in decision

The histograms of response times of optimal YES/NO decisions that are computed from a single sensory Poisson neuron are highly structured. In particular, response times in NO decisions are quantized to a small set of times, while response times in YES decisions have a multimodal structure. Both the times of NO decisions and the modes of the histogram of YES decisions are associated with the number of action potentials that were necessary to reach the decision. Their values are a function of the firing rates of the neuron in response to the two states of the stimulus.


INTRODUCTION
Consider an animal that is facing a binary decision (YES/NO, GO/NO-GO) in response to the presence/absence of a stimulus (light/no light, sound/no sound). The animal makes its decision based on the firing pattern of its sensory neurons. What is the best decision strategy, one that minimizes the response time while keeping the number of decision errors within an acceptable predefined limit? In a wide set of situations the best decision strategy is well known and it is based on Wald's sequential probability ratio test (SPRT) (Wald, 1945;Wald and Wolfowitz, 1948). Models of neuron-based decision that are based on SPRT have been successful at predicting the response time histograms of animals that are engaged in decision tasks, and neurophysiological correlates of key stages of the computation have been found (Gold and Shadlen, 2007), suggesting that the brain may be using this decision strategy. Here I explore the predictions of this theory when very few, one in the limit, neurons are engaged in the decision process. I find that response time histograms become highly structured, something that ought to be easy to reveal with simple behavioral/psychophysical experiments and might provide additional insight into how the brain computes decisions from sensory inputs.
In order to make this paper self-contained, in Section 2, I review SPRT and in Section 3, I review how to compute such decisions based on the firing pattern of Poisson (or Bernoulli) neurons. In Section 4, I explore an example with physiologically plausible constants. In Section 5, I compute the properties of response time histograms for a single neuron. In Section 6, I briefly discuss the case of multiple neurons involved in a decision. Section 8 describes the computational experiments. I conclude in Section 9 with a discussion of the main observations.

OPTIMAL BAYESIAN OBSERVER
I assume that the animal is sensing, and responding to, an underlying binary state of the world, e.g., the presence/absence of a given stimulus. I will call C = 0 one state (stimulus absence) and C = 1 the other state (stimulus presence). The animal's YES/NO decision is computed from the stimulus. The animal is rewarded if the response is appropriate, i.e., (C = 0) −→ NO and (C = 1) −→YES, and the animal is not rewarded otherwise.
Call X t the measurements on which the decision is based. I will assume that X t is the firing pattern of one sensory neuron which will be firing vigorously if the stimulus is present, and will fire at a low level of activity if the target is absent. Here t is a discrete index of time, with observations starting at t = 1, thus, if x t is the piece of information that is acquired at time t, then X t = {x 1 , . . . , x t }. As I will explain below, the theory may be developed both in discrete time and in continuous time; the two formalisms yield identical results.
Wald's SPRT strategy works by repeating at each time t the following two steps. First, compute the log ratio of the probability that C = 1 is true vs the probability that C = 0 is true, given the available data X_t:

$$R_t = \log \frac{P(C = 1 \mid X_t)}{P(C = 0 \mid X_t)} \qquad (1)$$

Second, compare R_t to two thresholds, τ_0 and τ_1 (typically τ_0 < 0 < τ_1). If R_t > τ_1, then C = 1 is much more likely than C = 0, and decision YES is made. If instead R_t < τ_0, then decision NO is made. The third possibility is that τ_0 < R_t < τ_1, i.e., the ratio is between thresholds. In this case no decision may be made because the information is insufficient, and one waits for the next piece of evidence x_{t+1}. The process is repeated until a decision is made.
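The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the simulations reported here. The function name `sprt` and its arguments are hypothetical, and the sketch assumes that the log ratio can be updated additively by a per-observation term supplied by the caller (`log_lr`), which holds when observations are independent given C.

```python
from typing import Callable, Iterable, Optional, Tuple


def sprt(observations: Iterable[int],
         log_lr: Callable[[int], float],
         tau0: float,
         tau1: float,
         r_start: float = 0.0) -> Tuple[Optional[str], int]:
    """Return (decision, t); decision is "YES", "NO", or None if the data ran out."""
    R = r_start
    t = 0
    for t, x in enumerate(observations, start=1):
        R += log_lr(x)      # step 1: update the log ratio with the new evidence
        if R > tau1:        # step 2: compare with the two thresholds
            return "YES", t
        if R < tau0:
            return "NO", t
        # otherwise tau0 <= R <= tau1: wait for the next observation
    return None, t


# Usage with made-up increments (+1 for x = 1, -0.25 for x = 0), illustrative only.
decision, t = sprt([0, 1, 0, 1, 1], lambda x: 1.0 if x else -0.25,
                   tau0=-1.5, tau1=1.5)
print(decision, t)
```

The function returns `None` as the decision when the observations are exhausted before either threshold is crossed, which corresponds to the "keep waiting" branch of the test.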

FIGURE 2 | Bernoulli representation of a sequence of action potentials. In the black plot (above) the action potentials produced by a neuron over time are indicated with vertical bars. The binary variable x_t ∈ {0, 1} (blue plot, below) is obtained by dividing time into (short) bins of length Δt; x_t = 1 if an action potential is observed in the corresponding time interval Δt, and it is equal to 0 otherwise. The bins should be short enough that the probability of observing multiple action potentials in one bin is very small (e.g., Δt = 1 ms). By taking the limit Δt → 0 one obtains a Poisson model of neuronal firing.
FIGURE 3 | [...] In the absence of action potentials the diffusion drifts by δ units/s and reaches the lower threshold at time RT^N_0 (see Section 5 and Equation 8), where a NO decision is made. When an action potential occurs the diffusion jumps by r(1). When one of the two thresholds τ is crossed, a decision is made. In this case the decision is NO; it is indicated by a red dot. It is made at time RT^N_1 (see Equation 8). Notice that if the time of the action potential changes within the interval t ∈ (0, RT^N_0), the decision is made at the same time. A number of diffusions produced by a simulation with the parameters shown in Table 1 [...]

FIGURE 4 | Patterns of action potentials and ensuing decisions when C = 0. Action potentials are represented by black vertical bars along the time axis. The corresponding diffusion R_t(X) (Equation 4) is represented by a light blue line. When R_t crosses one of the two thresholds τ a decision is made. YES decisions are represented by green dots and NO decisions by red dots. One may notice one false alarm error in the last row. The parameters of the simulation are shown in Table 1. For convenience, examples are sorted by the number of action potentials that were sufficient to reach a decision. For a frequency of each such event see Figure 6.

Wald proved that this strategy is optimal (Wald, 1945; Wald and Wolfowitz, 1948), i.e., no strategy with the same error rate yields faster decisions. A review of this strategy in the context of neurophysiological models of decision may be found in Gold and Shadlen (2007). The two thresholds τ_0, τ_1 determine the error rates, and may be computed from the expression for R_t in Equation 1:

$$R_t = \log \frac{P(C = 1 \mid X_t)}{P(C = 0 \mid X_t)} = \log \frac{P(C = 1 \mid X_t)}{1 - P(C = 1 \mid X_t)} \qquad (2)$$

where the denominator of the second fraction depends on the fact that C may only take two values, 0 and 1. Solving this expression for P(C = 1|X_t) yields:

$$P(C = 1 \mid X_t) = \frac{1}{1 + \exp(-R_t)} = g(R_t) \qquad (3)$$

where g(·) denotes the logistic function. Thus, say, if R_t = τ_1 = 2, that means that P(C = 1|X_t) ≈ 0.99 (Equation 3 and Figure 1), i.e., the false accept error rate is 1%. Conversely, if a 1% false accept error rate is deemed acceptable, then Equation 2 may be used to compute the upper threshold on R_t: τ_1 ≈ 2. Similarly, if a false reject rate of 1% is reasonable, then τ_0 = −2 (for convenience of mental calculation I am using base 10 for both exp and log). Figure 1 shows the relationship of error probability and threshold magnitude.
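The relationship between threshold and error rate described above can be checked numerically. The sketch below assumes base 10 for both exp and log, as in the text; the function names are hypothetical and chosen only for illustration.

```python
import math


def posterior_from_log_ratio(R: float, base: float = 10.0) -> float:
    """Logistic function g(R): P(C = 1 | X) for a log ratio R in the given base."""
    return 1.0 / (1.0 + base ** (-R))


def threshold_for_error(err: float, base: float = 10.0) -> float:
    """Upper threshold tau_1 at which the false accept probability equals err."""
    return math.log((1.0 - err) / err, base)


print(posterior_from_log_ratio(2.0))        # close to 0.99
print(threshold_for_error(0.01))            # close to 2
print(1.0 - posterior_from_log_ratio(1.5))  # error rate for a threshold of 1.5
```

By symmetry the lower threshold for the same false reject rate is the negative of the value returned by `threshold_for_error`.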
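Computing R_1, R_2, . . . , R_t is easy from the class conditional probabilities P(X_t|C = 1), P(X_t|C = 0) if the measurements x_t are independent when the state of the world C is known. Using Bayes' theorem:

$$R_t = \log \frac{P(X_t \mid C = 1)}{P(X_t \mid C = 0)} + \log \frac{P(C = 1)}{P(C = 0)} = \sum_{i=1}^{t} \log \frac{P(x_i \mid C = 1)}{P(x_i \mid C = 0)} + R_0 = R_{t-1} + r(x_t) \qquad (4)$$

where r(x_t) = log[P(x_t|C = 1)/P(x_t|C = 0)] is the log likelihood ratio of the current observation and R_0 = log[P(C = 1)/P(C = 0)] is the log ratio of the prior probabilities. Equation 4 shows that the computation is a diffusion, i.e., it may be carried out recursively by updating the previous value R_{t−1} with a term r(x_t) that depends only on the current observation x_t, rather than on the whole set of observations X_t. The fact that some binary decisions may be implemented by the brain with a diffusion was first suggested as a phenomenological model (Ratcliff and Hacker, 1981) and later shown to be optimal, under appropriate conditions (Gold and Shadlen, 2001).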

DECISIONS BASED ON ONE POISSON NEURON
As soon as one knows the statistics P(X_t|C) of neuronal responses X_t to stimuli C one is able to compute the log likelihood ratio R_t explicitly. Here I will derive R_t assuming that neurons produce patterns of action potentials that are distributed with Poisson statistics, a model that has been shown to be useful in many instances (Seung and Sompolinsky, 1993; Jazayeri and Movshon, 2006; Graf et al., 2011). More general Poisson-like models (Beck et al., 2008) may be used in the following analysis. They have the disadvantage of being more complex to implement and analyze and therefore I will confine myself to Poisson here. The equations are particularly simple if one uses a Bernoulli approximation to the Poisson distribution: call X_t the firing pattern of a neuron. I will assume that time has been discretized in small non-overlapping identical bins, e.g., time bins that are Δt = 1 ms long, and that the random variable x_t ∈ {0, 1} represents whether an action potential is observed during the interval corresponding to time-bin t, in which case x_t = 1, otherwise x_t = 0 (see Figure 2).
Frontiers in Computational Neuroscience | www.frontiersin.org | April 2014 | Volume 8 | Article 42

FIGURE 6 | Histograms of 2 · 10^5 decision times for diffusions with parameters as in Table 1 [...] Notice that for C = 1 no such decisions involve fewer than 2 action potentials, since 1 action potential is insufficient to cross the τ_1 = 1.5 threshold as discussed in Section 4.

Consider a neuron whose expected firing rate is λ_0 action potentials per second when C = 0, and λ_1 action potentials per second when C = 1 (if one models the firing rate with a Poisson process, λ is the expectation). In this case one may compute the value of r(x_t) to be used in the diffusion Equation 4: the variable x_t takes only two values, 0 and 1. Thus:

$$r(1) = \log \frac{p_1}{p_0}, \qquad r(0) = \log \frac{1 - p_1}{1 - p_0} \qquad (5)$$

where p_0 and p_1 are the probabilities of detecting an action potential in a given time bin when the world is in state C = 0 and C = 1, respectively. Thus, indicating with Δt the duration of a time bin, chosen so that λΔt ≪ 1:

$$p_0 = \lambda_0 \Delta t, \qquad p_1 = \lambda_1 \Delta t$$

The limit for Δt → 0 yields the continuous model. There is only one delicate point: in the limit r(0) → 0; however, δ = r(0)/Δt, the linear drift rate of the diffusion when no action potentials are observed, is different from zero. Thus:

$$\delta = \lim_{\Delta t \to 0} \frac{r(0)}{\Delta t} = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \log \frac{1 - \lambda_1 \Delta t}{1 - \lambda_0 \Delta t} = (\lambda_0 - \lambda_1) \log(e) \qquad (6)$$

$$r(1) = \log \frac{p_1}{p_0} = \log \frac{\lambda_1}{\lambda_0} \quad \text{(independent of } \Delta t \text{, thus the limit is trivial)} \qquad (7)$$

By taking the limit for Δt → 0 we obtained the exact equations for the Poisson model (Jazayeri and Movshon, 2006; Chen et al., 2011). In Equation 6 the limit may be computed by considering the Taylor expansion log(1 − λΔt) = −λΔt log(e) + o(Δt), where o(Δt) indicates higher order terms that vanish when one takes the limit. When using base 10 for the log, as in Table 1 and in Section 4, then log_10(e) ≈ 0.4343. When using natural logs, then, of course, log(e) = 1. According to Equations 6, 7 the diffusion will mostly drift linearly at a rate δ, and present jumps of height r(1) whenever an action potential is observed (see Figure 3). Notice that if one assumes, without loss of generality, that λ_1 > λ_0, then the diffusion jumps will always be positive, i.e., upwards, and the drift will always be negative, i.e., downwards. In other words, action potentials always contribute evidence toward C = 1, while quiet periods always contribute evidence toward C = 0.
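As a numerical check of the limit discussed above, the following sketch compares the per-bin Bernoulli drift r(0)/Δt with the continuous-time drift (λ_0 − λ_1) log(e) for shrinking bin sizes. The function names are hypothetical; the rates of 1 and 10 action potentials per second and the base-10 logarithm are taken from the example of Section 4.

```python
import math


def jump(lam0: float, lam1: float, base: float = 10.0) -> float:
    """Jump r(1) of the diffusion when an action potential is observed."""
    return math.log(lam1 / lam0, base)


def drift(lam0: float, lam1: float, base: float = 10.0) -> float:
    """Continuous-time drift rate delta, in units per second."""
    return (lam0 - lam1) * math.log(math.e, base)


def bernoulli_drift(lam0: float, lam1: float, dt: float, base: float = 10.0) -> float:
    """Drift rate r(0)/dt of the Bernoulli approximation with bin length dt (s)."""
    return math.log((1.0 - lam1 * dt) / (1.0 - lam0 * dt), base) / dt


print(jump(1.0, 10.0), drift(1.0, 10.0))
for dt in (1e-2, 1e-3, 1e-4):
    print(dt, bernoulli_drift(1.0, 10.0, dt))  # approaches drift(1.0, 10.0)
```

With 1 ms bins the Bernoulli drift already differs from the continuous value by well under 1%, which is why the two formalisms give nearly the same decision times.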

A CONCRETE EXAMPLE
In order to develop one's intuition it is useful to explore a concrete case. Suppose that a neuron responds to stimuli with parameters indicated in Table 1, which correspond to a neuron firing briskly at 10 Hz when the stimulus is present (C = 1) and at some "resting" level of 1 Hz when the stimulus is absent (C = 0). The diffusion starts at a value of R 0 = 0, it is incremented by r(1) = 1 units whenever an action potential is observed (this is a rare event) and it drifts by δ = −9 log 10 (e) ≈ −3.9 units/s during the intervals where no action potentials are observed (see Figure 3). Simulations of this are shown in Figures 4, 5.
Simple calculations allow one to estimate the expected response time for decisions when C = 1 and when C = 0.
Suppose that we consider an error rate in the neighborhood of 3% acceptable, then we should use thresholds τ 1 = 1.5 and τ 0 = −1.5 (Equation 3, Figure 1). Since one action potential increases the diffusion value by one unit, and 1 < τ 1 , it will take at least two action potentials to reach a YES decision, while a sufficiently long time interval with no action potentials will lead to a NO decision (see Figure 3). We explore in the following the response time for YES and NO decisions when C = 1 and when C = 0. A more systematic analysis is presented in Section 5.
YES decision after 2 action potentials (C = 1)-Let's consider first the case where the decision is made after two action potentials. After two action potentials the value of the diffusion is R_t = 2 + δt (t is the time that has passed since the beginning of observation). Thus a YES decision may be taken only if t_2 < (2 − τ_1)/|δ| ≈ 128 ms. The neuron fires with frequency λ_1 = 10 spikes/s, which makes it somewhat unlikely that two action potentials will be observed in 128 ms (the probability is about 0.23, which may be computed considering the Poisson distribution with expectation λ = t_2 λ_1). Thus, YES decisions that are based on two action potentials will only happen in a minority of cases.
YES decision after 3 action potentials (C = 1)-Consider now the case where the YES decision is made after three action potentials. This decision may not be made when t < 128 ms since in that case it would be made after the first two action potentials (see previous paragraph). Thus the decision time is at least 128 ms. Furthermore, following the reasoning in the previous paragraph, the decision time will at most be t_3 = (3 − τ_1)/|δ| ≈ 384 ms. The probability that at least three action potentials are observed in less than 384 ms is about 0.47.
NO decisions (C = 0)-Now consider the C = 0 case. If no action potentials are observed it will take τ_0/δ ≈ 1.5/3.9 ≈ 0.384 s for a negative decision to be made. However, if one action potential is observed during this time, then the diffusion will be incremented by 1 and one will have to wait ≈ 1/3.9 ≈ 0.256 s longer for a negative decision. When C = 0 it is quite likely that no action potentials are observed during a 0.384 s time interval since λ_0 = 1 spike/s; the probability of this event is ≈ 0.68 (computed from a Poisson distribution P(k = 0; λ) with λ = 0.384). Thus, one would expect the majority of negative decisions to be made after no action potentials are observed and a time of about 0.384 s has elapsed.
I simulated such a process and obtained histograms of decision times (Figure 6). The histogram of decision times for C = 0 is sparse. Only a few discrete decision times are observed. This issue is explored in Section 5.
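A simulation of this kind can be sketched as follows. This is an illustrative Python sketch, not the code used to produce Figure 6: the function names are hypothetical, the parameters (λ_0 = 1, λ_1 = 10, τ = ±1.5, base-10 logs, 1 ms bins) are those of the example above, and the number of trials is kept small so that the sketch runs quickly.

```python
import math
import random
from collections import Counter


def simulate_decision(rate_true, lam0=1.0, lam1=10.0, tau0=-1.5, tau1=1.5,
                      dt=0.001, rng=random):
    """One Bernoulli-approximated SPRT trial; returns (decision, time_s, n_spikes)."""
    p0, p1 = lam0 * dt, lam1 * dt               # spike probability per bin if C=0, C=1
    r_spike = math.log10(p1 / p0)               # increment r(1) for a bin with a spike
    r_quiet = math.log10((1 - p1) / (1 - p0))   # increment r(0) for a quiet bin
    p_true = rate_true * dt                     # the neuron actually fires at rate_true
    R, t, k = 0.0, 0, 0
    while True:
        t += 1
        if rng.random() < p_true:
            R += r_spike
            k += 1
        else:
            R += r_quiet
        if R > tau1:
            return "YES", t * dt, k
        if R < tau0:
            return "NO", t * dt, k


def no_time_histogram(n_trials=1000, seed=0):
    """Count NO decisions by (number of spikes, decision time) when C = 0."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        decision, t, k = simulate_decision(1.0, rng=rng)
        if decision == "NO":
            counts[(k, round(t, 3))] += 1
    return counts


hist = no_time_histogram()
for (k, t), n in sorted(hist.items()):
    print(f"k={k}  t={t:.3f} s  count={n}")
```

Each spike count k appears with a single decision time, so the printed histogram contains only a handful of distinct times. With 1 ms bins these times fall a few milliseconds short of the continuous-time values, because the Bernoulli drift is slightly steeper than its Δt → 0 limit.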
NO decisions-Since δ < 0, decisions made after observing k = 0 action potentials must be of type NO and are made in exactly RT^N_0 = τ_0/δ s (Figure 3). For k = 1 a NO decision takes longer because the diffusion is incremented by r(1) when the action potential is observed (Figure 3); therefore, it takes −r(1)/δ longer to reach the threshold, thus RT^N_1 = RT^N_0 − r(1)/δ. The general expression for the NO decision time after k action potentials is therefore:

$$RT^N_k = \frac{\tau_0 - k\, r(1)}{\delta} = \frac{\tau_0 - k \log(\lambda_1 / \lambda_0)}{(\lambda_0 - \lambda_1) \log(e)} \qquad (8)$$

Therefore, one would predict that when only one neuron is involved, decision times are sparse: no decision time other than these may be observed. Using the constants of Table 1 as in the example above yields RT^N_k = (1.5 + k)/(9 log_10(e)) s, i.e., one would predict the following discrete decision times: 384, 640, 895, 1151 ms etc. for k = 0, 1, 2, 3, . . . This is precisely what is observed in the simulation shown in Figure 6 (left).
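The predicted NO decision times can be reproduced with a short calculation. The sketch below assumes the constants of the example (λ_0 = 1, λ_1 = 10, τ_0 = −1.5, base-10 logs) and uses a hypothetical function name; it adds −r(1)/δ to the decision time for every observed action potential, as described above.

```python
import math


def no_decision_time(k: int, lam0: float = 1.0, lam1: float = 10.0,
                     tau0: float = -1.5, base: float = 10.0) -> float:
    """Time (s) of a NO decision reached after exactly k action potentials."""
    r1 = math.log(lam1 / lam0, base)                # jump per action potential
    delta = (lam0 - lam1) * math.log(math.e, base)  # drift rate (negative)
    return (tau0 - k * r1) / delta


times_ms = [round(1000 * no_decision_time(k)) for k in range(4)]
print(times_ms)  # [384, 640, 895, 1151]
```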
YES decisions-As shown in Figure 6, the histogram of YES response times is lumpy. It is possible to predict this observation, and to see that each mode of the histogram corresponds to a different number k of action potentials. The time it takes to compute a YES decision that is based on k action potentials has a lower and an upper bound. Let's call the bounds RT^{Y,l}_k and RT^{Y,u}_k. The maximum time that it may take for a YES decision to be made is easy to compute: k action potentials increase the diffusion by k r(1) units. For a YES decision to be made after k action potentials at the latest possible time RT^{Y,u}_k, the diffusion must be exactly at τ_1 at that time, which implies a downwards drift of magnitude |δ| RT^{Y,u}_k = k r(1) − τ_1 units. Solving for the upper bound yields:

$$RT^{Y,u}_k = \frac{k\, r(1) - \tau_1}{|\delta|} = \frac{\tau_1 - k\, r(1)}{\delta} \qquad (9)$$

The lower bound may also be computed considering the fact that it is achieved when a decision based on k − 1 action potentials is missed by a hair's breadth and is followed immediately by another action potential, which overshoots the threshold. Thus:

$$RT^{Y,l}_k = RT^{Y,u}_{k-1} = \frac{(k - 1)\, r(1) - \tau_1}{|\delta|}$$

(or zero, when this expression is negative).

Using the constants of Table 1 [...]

If one compares Equations 8 and 9 one can readily see that when τ_1 = −τ_0 the sparse RT for NO decisions and the "zeros" of the RT histogram of YES decisions are the same, i.e., YES decisions mostly happen at times that are different from NO decisions.
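The bounds on YES decision times follow from the same two quantities, the jump r(1) and the drift δ. The sketch below is a hedged illustration with a hypothetical function name and the constants of the example (λ_0 = 1, λ_1 = 10, τ_1 = 1.5, base-10 logs). It takes the upper bound to be the time at which k jumps minus the accumulated drift equals τ_1, and the lower bound to be the upper bound for k − 1 action potentials, clipped at zero.

```python
import math


def yes_time_bounds(k: int, lam0: float = 1.0, lam1: float = 10.0,
                    tau1: float = 1.5, base: float = 10.0):
    """(lower, upper) bounds in seconds for a YES decision after k action potentials.

    A negative upper bound means that k action potentials cannot reach tau1.
    """
    r1 = math.log(lam1 / lam0, base)                    # jump per action potential
    drift_mag = (lam1 - lam0) * math.log(math.e, base)  # |delta|
    upper = (k * r1 - tau1) / drift_mag
    lower = max(0.0, ((k - 1) * r1 - tau1) / drift_mag)
    return lower, upper


for k in range(1, 5):
    lo, up = yes_time_bounds(k)
    print(k, round(1000 * lo), round(1000 * up))
```

For k = 2 and k = 3 the upper bounds come out at about 128 ms and 384 ms, matching the values worked out in Section 4, and k = 1 gives a negative upper bound because a single action potential cannot cross τ_1 = 1.5.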
One last question one may ask is how the various parameters (τ and λ) affect the spacing between the response times, and the height of the peaks, in the sparse histogram of NO decisions. It is intuitive that the larger the spacing, and the higher the second peak relative to the first one, the easier it will be to observe the sparse nature of the response time histogram in an in-vivo experiment, where multiple sources of noise and variability will tend to blur away the sparse/lumpy nature of the histograms. From Equation 8, the spacing between consecutive NO response times is −r(1)/δ = log(λ_1/λ_0)/((λ_1 − λ_0) log(e)); therefore the spacing is maximum for λ_1 → λ_0. An experiment where C = 0 does not stimulate the neuron, and C = 1 stimulates the neuron lightly may therefore be the best option for revealing sparse response time histograms (see Figure 7).
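The dependence of the spacing on the firing rates can be illustrated numerically. The sketch assumes that consecutive NO decision times are separated by −r(1)/δ, the extra waiting time per action potential noted in the analysis of NO decisions; the base of the logarithm cancels in this ratio, so natural logs are used. The function name is hypothetical.

```python
import math


def no_time_spacing(lam0: float, lam1: float) -> float:
    """Spacing (s) between consecutive NO decision times: -r(1)/delta."""
    return math.log(lam1 / lam0) / (lam1 - lam0)


for lam1 in (1.1, 2.0, 5.0, 10.0, 50.0):
    print(lam1, round(no_time_spacing(1.0, lam1), 3))
```

With λ_0 = 1 the spacing decreases monotonically as λ_1 grows and approaches 1/λ_0 = 1 s as λ_1 → λ_0, in agreement with the statement that a lightly stimulated neuron gives the widest spacing.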

DECISIONS INVOLVING MULTIPLE NEURONS
Decisions involving multiple neurons firing independently may be made using SPRT and Equation 4 (detailed equations for Poisson neurons may be found in Chen et al., 2011). Each action potential contributes to the diffusion independently of the other action potentials, and the contribution of each action potential is r^i(1) = log(λ^i_1) − log(λ^i_0) (same as the case for a single neuron, see Equation 7), where i is the index of the neuron that generated the action potential. Thus, neurons that respond very differently to C = 0 and C = 1 (and thus λ^i_0 and λ^i_1 are very different) will contribute strongly to the diffusion, while neurons for which λ^i_0 ≈ λ^i_1 will have little influence.
If the population of neurons responds overall asymmetrically to C, then one would still expect to observe lumpy response time histograms if the neurons are not too many. In the special case where λ^i_1 > λ^i_0 ∀i (e.g., when C = 1 corresponds to a sound and C = 0 corresponds to no sound, and all neurons are excited by sound), then all action potentials will send the diffusion upwards and the only way a NO decision may be taken is by the diffusion drifting toward the negative threshold. In this case the NO response time histogram will be sparse. However, the analysis of this case becomes considerably more tedious than the analysis presented in Section 5.
If, on the other hand, the neurons are symmetrical with respect to C, i.e., some neurons respond preferentially to C = 0 and some to C = 1, then the drift term is zero because drift terms for different neurons are equal and opposite and thus cancel each other. In this case the response time histograms are neither lumpy, nor sparse (see Figure 8).
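The cancellation of the drift in the symmetric case can be made concrete with a small sketch. It assumes that, for independent neurons, each neuron contributes its own jump log(λ_1/λ_0) and its own drift (λ_0 − λ_1) log(e), as in the single-neuron case, and that the drifts add. The function name and the example rates are hypothetical.

```python
import math


def population_terms(rates, base: float = 10.0):
    """rates: list of (lam0, lam1) pairs, one per neuron.

    Returns (jumps, total_drift): the per-neuron jump for one action potential
    and the summed drift rate of the diffusion during quiet periods.
    """
    log_e = math.log(math.e, base)
    jumps = [math.log(lam1 / lam0, base) for lam0, lam1 in rates]
    total_drift = sum((lam0 - lam1) * log_e for lam0, lam1 in rates)
    return jumps, total_drift


# Asymmetric population: both neurons are excited by the stimulus.
print(population_terms([(1.0, 10.0), (1.0, 10.0)]))
# Symmetric pair: one neuron prefers C = 1, the other prefers C = 0.
print(population_terms([(1.0, 10.0), (10.0, 1.0)]))
```

In the symmetric pair the two jumps have opposite signs and the total drift is zero, so the diffusion moves only when action potentials arrive and decision times are no longer tied to a deterministic drift.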

FIRING RATES AND RESPONSE TIMES
It is intuitive that response times will be lower when the maximum firing rate λ_1 is much larger than the minimum firing rate λ_0. Consider Equation 8. Decisions based on zero action potentials take τ_0/δ = τ_0/(λ_0 − λ_1) (using natural logs). Since τ_0 is negative, these decisions are quicker when λ_1 >> λ_0, which makes the magnitude of the denominator larger. Similarly, when k > 0, decisions are quicker when λ_1 >> λ_0 since the numerator increases logarithmically, while the denominator increases linearly. Figure 9 shows the behavior of RT vs λ_max when λ_min = 1 for the symmetric case of two neurons described in Section 6, where for one neuron λ_1 = λ_max and λ_0 = λ_min and vice-versa for the other neuron.

METHODS
The optimal Bayesian observer based on SPRT was implemented in Matlab using the Bernoulli method described in Section 3 with Δt = 1 ms and other parameters as shown in Table 1, unless otherwise specified in the text.

DISCUSSION AND CONCLUSIONS
The analysis I presented predicts that when binary decisions are computed by a mechanism involving a single neuron, one will observe sparse response time histograms for NO responses, and lumpy response time histograms for YES responses.
A number of additional observations are possible:
1. The histograms will still look highly structured when few neurons are involved, provided that the YES/NO decision is perceptually asymmetric (e.g., detecting the presence vs the absence of a sound, a vibration, or a light spot).
2. The response time histogram will look like a log-normal distribution when the neurons involved respond symmetrically to C.
3. Even if the timing of the action potentials produced by the input neuron(s) is unpredictable and uncorrelated, as modeled by Poisson statistics, the timing of action potentials produced by the neurons that compute a decision is highly structured; it is tightly quantized in the case of NO decisions. Conceivably, this fact will enable additional computations where synchronization between action potentials is required.
4. If quantized/structured response time histograms are observed in an experiment, it may be possible not only to predict that a small number of neurons is involved, but also to estimate their firing rate.
5. The predictions I make are not affected by the real statistics of the neuron's response. They are a consequence of using SPRT and the assumption that the neuron's statistics is Poisson. It is impossible to decide whether a short sequence of action potentials is governed by Poisson or other statistics. I suspect that any Poisson-like assumption will lead to the same qualitative prediction.
Observing sparse response time histograms experimentally is difficult for a number of reasons:
1. In most systems the number of neurons involved in the computation is more than a handful.
2. If the task is symmetric (e.g., discriminate between a red and a green light), then there will be neurons tuned to both conditions, making both YES and NO histograms continuous, rather than sparse.
3. Motor response and neural propagation delays are themselves random variables which, when added to the perceptual response time, will blur away the sparse nature of the NO histograms.
I believe that one should be able to design single-neuron asymmetric preparations, for example by stimulating a single ganglion cell in the retina with a small light dot, or by stimulating a single tactile receptor in the skin of the back, where receptors are sparse. If such experiments prove to be possible, response time histograms would provide a wealth of information on the mechanisms involved in decision, including the firing rate of the neuron involved.