Stochastic optimal control of single neuron spike trains

Objective. External control of spike times in single neurons can reveal important information about a neuron’s sub-threshold dynamics that lead to spiking, and has the potential to improve brain–machine interfaces and neural prostheses. The goal of this paper is the design of optimal electrical stimulation of a neuron to achieve a target spike train under the physiological constraint of not damaging tissue. Approach. We pose a stochastic optimal control problem to precisely specify the spike times in a leaky integrate-and-fire (LIF) model of a neuron, with noise assumed to be of intrinsic or synaptic origin. In particular, we allow the noise to be of arbitrary intensity. The optimal control problem is solved using dynamic programming when the controller has access to the voltage (closed-loop control), and using a maximum principle for the transition density when the controller only has access to the spike times (open-loop control). Main results. We have developed a stochastic optimal control algorithm to obtain precise spike times. It is applicable in both the supra-threshold and sub-threshold regimes, under open-loop and closed-loop conditions, and with arbitrary noise intensity; the accuracy of control degrades with increasing noise intensity. Simulations show that our algorithms produce the desired results not only for the LIF model, but also when the neuron dynamics are given by more complex models. This is illustrated explicitly using the Morris–Lecar spiking neuron model, for which an LIF approximation is first obtained from a spike sequence using a previously published method. We further show that a related control strategy based on the assumption that there is no noise performs poorly in comparison with our noise-based strategies.
The algorithms are numerically intensive and may require efficiency refinements to achieve real-time control; in particular, the open-loop context is more numerically demanding than the closed-loop one. Significance. Our main contribution is the online feedback control of a noisy neuron through modulation of the input, taking into account physiological constraints on the control. A precise and robust targeting of neural activity based on stochastic optimal control has great potential for regulating neural activity in e.g. prosthetic applications and to improve our understanding of the basic mechanisms by which neuronal firing patterns can be controlled in vivo.


Introduction
Manipulation of individual neurons through electrical stimulation provides a means of controlling their spiking activity. In applications such as brain–machine interfaces and neuroprosthetics, a common goal is to record from a neuron and interpret the firing activity. Conversely, one may wish to stimulate a cell so that it produces a desired firing rate, either fixed or varying in time. It is known that many cells fire sequences of spikes in which the spike times, rather than just the rate, matter to post-synaptic neurons. In this paper, we explore the possibility of controlling a neuron such that it generates a sequence of spike times close to a desired sequence. We consider this problem in the framework of stochastic optimal control and give both a feedback (closed-loop) solution, when the cell voltage is explicitly observable, and an open-loop solution, when only the occurrence of spikes is observable. Importantly, we allow for an arbitrary noise intensity.
Both theoretically and in practice, related problems have been addressed in the literature. One objective has been to obtain either minimum or maximum interspike interval lengths when the input is constrained to lie between prespecified upper and lower bounds; see e.g. [19, 20] for a mathematical treatment, or [6, 25, 32] in a neuronal context. Another objective is to break a pathological synchronous firing pattern in clusters of neurons, highly relevant for neurological disorders such as epilepsy and Parkinson's disease [23, 24]; see also [13].
Our objective, namely targeting exact spike times in single neurons, has been considered mainly in the open-loop context, and either in the absence of noise or for small noise. In [1], the spike response model [15] is used to control output target spike trains, and the scheme is implemented on pyramidal cells in mouse cortical slices. Their method is numerically efficient and allows for the simultaneous control of many neurons. However, it relies strongly on the assumption that the noise (the value of β in equation (1) below) is small, and they only work with open-loop control. The difference between the objective in [1] and what we consider below is that they maximize the probability of spiking at some given time t*, whereas we minimize the mean squared difference between the realized spike time, T_sp, and the desired one, t*. Moehlis et al [21] work with a phase response model to obtain spikes at exact times, while keeping the root mean square of the input to a minimum. A similar approach is taken in [8]. Neither considers noise, though, and both work only with open-loop control. In [26], these methods are implemented on brain slices of pyramidal neurons of the rat hippocampus. While the phase response curve model is a parsimonious and effective way to describe a neuron's response to a stimulus, it is only valid in the supra-threshold regime, where the unstimulated neuron spikes periodically. Feng and Tuckwell [12] investigate the control of the firing times of a leaky integrate-and-fire (LIF) neuron by varying the intensity of a noise process that drives the voltage (there is no other intrinsic noise in the model). To obtain a given spike time, they choose the objective of minimizing the variance of the membrane potential at the desired spike time, while forcing the mean of the membrane potential at this time point to equal the threshold.
This provides exact solutions since it does not involve first-passage times, but has the drawback that there is a non-negligible probability that the obtained spike time will be far from the desired spike time.
Our objective of imposing a certain timing sequence for the spike train using an externally applied control is achieved in both the closed- and open-loop settings, and we specifically include the noise in the calculation of the controls. We restrict the controlled input to stay within pre-specified bounds, and also include a cost function to minimize intervention. Our main contributions are that we specifically allow for a non-negligible noise component in the neural activity when calculating the control, and that we consider both open- and closed-loop controls. The noise component is given by a Wiener process with a noise intensity larger than zero. In particular, we do not restrict our attention to the autonomously spiking, supra-threshold regime.
The paper is structured as follows: first, we describe the neuron model and formalize the control objective. We then describe a feedback-based solution, which assumes that the controller has detailed access to the voltage trajectory. Next, we relax the observation assumption so that the controller only has access to the spike times. Finally, we compare the two methods through simulations against a simple-minded control technique that ignores the stochastic input to the neuron.

Problem formulation
A basic but useful model for the neural membrane potential evolution is the noisy LIF model:

    dX(t) = (−X(t)/τ_c + I_ext(t)) dt + β dW(t),    X(0) = 0,    X(T_sp+) = 0.    (1)

Here, X(t) represents the membrane electric potential at time t, which in the absence of input decays to 0 with a time constant of τ_c, dW is a Brownian motion increment scaled by β, and I_ext(t) is the deterministic external input to the cell. Having last spiked at time 0, the potential hits x_th at some random time T_sp, the potential resets to 0 and the process starts all over again. We write T_sp+ for the limit from the right at T_sp. Throughout the paper, the threshold is set to one, x_th = 1.
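For concreteness, the threshold-and-reset dynamics of equation (1) can be simulated with a basic Euler–Maruyama scheme. The sketch below is our own illustration; the function name, default parameter values and step size are assumptions, not taken from the text:

```python
import numpy as np

def simulate_lif(alpha, t_max, mu=1.5, tau_c=1.0, beta=0.3,
                 x_th=1.0, dt=1e-3, rng=None):
    """Euler-Maruyama simulation of the noisy LIF model with
    threshold-and-reset: record a spike and reset X to 0 whenever
    X crosses x_th. `alpha` is a callable t -> control current."""
    rng = np.random.default_rng(0) if rng is None else rng
    x, t, spikes = 0.0, 0.0, []
    while t < t_max:
        drift = mu + alpha(t) - x / tau_c
        x += drift * dt + beta * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if x >= x_th:          # threshold crossing: spike, then reset
            spikes.append(t)
            x = 0.0
    return spikes
```

With the supra-threshold default μ = 1.5/τ_c, the uncontrolled model (alpha ≡ 0) fires repeatedly on its own.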
Suppose that we have some control over the external current, such that it can be decomposed as

    I_ext(t) = μ + α(t),    (2)

where μ is an uncontrollable, but constant, part of the external current and α(t) is controllable, i.e., it can be chosen to achieve some goal. A natural goal is to attempt to control the spike time, T_sp. That is, how do we choose α(t) such that T_sp ≈ t*, where t* is the desired spike time? A natural optimal control objective to achieve this is the least squares solution

    min over α of E[(T_sp − t*)²],    (3)

where the expectation is taken with respect to the distribution of the trajectories of X. Often, the control has certain constraints. The most common are simple box constraints:

    α_min ≤ α(t) ≤ α_max.    (4)

In addition to equation (3), we will also add to our objective a running energy cost based on the control. This regularizes the problem, eliminating the subtleties of singular-control situations, and it serves both to avoid excessive control and to avoid excessive charge building up on the cell, see [1].
So, we seek an optimal control, α*, that solves

    α* = argmin over α(·) of E[(T_sp − t*)² + ε ∫_0^{T_sp} α²(s) ds],    (5)

where ε measures how much weight we put on minimizing the energy cost. If ε = 0, then we do not care at all about the expended energy cost. It will often be the case that α is either a function or a value of that function at a particular time. This function could be random, i.e., a stochastic process, which is naturally the case when it is a function of the random realizations of X. We will try to make clear below when we are referring to α as a function and when we are merely referring to its particular value at some particular time. For example, in equation (5), α(·) refers to the function, while α(s) refers to that function's value, possibly random, at time s.
We will consider two control contexts: closed-loop and open-loop control. In closed-loop control, the value of X at time t is observable and can be used in determining the control. In open-loop control, only the spike times are observable. In the closed-loop context, we write the control as α(X(t), t), to indicate its dependence on X(t) and to express that it will be updated based on the time-course of X. In the open-loop context, we write α(t), to indicate that α(·) is decided for all times at time 0. The techniques used to obtain the optimal controls in the two scenarios differ. For the closed-loop scenario, we use dynamic programming, see [14], while for the open-loop scenario, we use a form of the maximum principle applied to the transition density of the controlled process, see [5]. The transition density is the probability density function of the process being at state y at time t, given that it was at some state x at an earlier time s.
Crucially, we assume that the model parameters μ, τ_c and β in equations (1) and (2) are known. This is a strong assumption and will be discussed later.

Parameter regimes
Different parameter regimes can be envisioned for equation (1), depending on whether the noise intensity, β, is relatively high or low, and on whether the external, uncontrollable bias current, μ, induces spikes in the absence of noise. Spikes will occur in the absence of noise if and only if μ > 1/τ_c, which is called the supra-threshold regime. When μ ≤ 1/τ_c, the regime is called the sub-threshold regime. In addition, we will investigate two values of β, which we will call high-noise and low-noise, respectively. Example values for each parameter regime are given in table 1, and we visualize a single path for each regime in figure 1.

Table 1. Example parameter values for the four regimes.

                  High noise (β = 1.5)          Low noise (β = 0.3)
    μ = 1.5/τ_c   Supra-threshold-high-noise    Supra-threshold-low-noise
    μ = 0.1/τ_c   Sub-threshold-high-noise      Sub-threshold-low-noise
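The split into the four regimes can be expressed as a tiny helper. The supra/sub criterion (μ > 1/τ_c) is from the text; the β cutoff separating "high" from "low" noise is our own assumption, placed between the example values β = 0.3 and β = 1.5:

```python
def regime(mu, tau_c, beta, beta_cutoff=1.0):
    """Classify parameters of equation (1) into one of the four regimes
    of table 1. beta_cutoff is an assumed split between the example
    low-noise (beta = 0.3) and high-noise (beta = 1.5) values."""
    bias = "supra-threshold" if mu > 1.0 / tau_c else "sub-threshold"
    noise = "high-noise" if beta >= beta_cutoff else "low-noise"
    return bias + "-" + noise
```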

Closed-loop solution-dynamic programming
We now detail the dynamic programming approach to obtaining the optimal feedback control. In closed-loop, the controller can be continuously updated depending on the realization of the stochastic process, X. Given a time t and a value X(t) = x of the voltage, let (T_sp − t) be the unknown remaining time to spike. Note that T_sp is a random variable. Given arbitrary t, x, our remaining-cost objective, J[α(·); x, t], is

    J[α(·); x, t] = E[ ((T_sp − t) − (t* − t))² + ε ∫_t^{T_sp} α²(s) ds | X(t) = x ].    (6)

That is, if time t has elapsed without a spike, we now want to minimize the difference between (T_sp − t) and (t* − t), given the current state x. The expectation in equation (6) is taken over the distribution of hitting times, T_sp, conditional on X(t), or equivalently over the distribution of forward trajectories of X starting at x and ending at the threshold. Recall that in the closed-loop scenario, we assume that the value of X(t) is known to the controller. A similar problem is discussed analytically at length in the book on optimal control by Whittle [31], although the numerics required to solve it are not discussed there.

Hamilton-Jacobi-Bellman equation
The Hamilton-Jacobi-Bellman (HJB) equation, see [14, 31], or the articles [6, 23] in a neuroscience context, associated with the optimal control problem in equation (6) is obtained as follows. We introduce the value function, w(x, t), as the minimum of the remaining-cost objective, i.e., of the cost function between the current time t and the desired spike time t*:

    w(x, t) = min over α(·) of J[α(·); x, t].    (7)

Then, w satisfies the following HJB partial differential equation (PDE):

    ∂_t w + min over α ∈ [α_min, α_max] of [ (μ + α − x/τ_c) ∂_x w + ε α² ] + (β²/2) ∂²_x w = 0.    (8)

The special feature of equation (8), in contrast to a generic parabolic PDE, is that it contains an embedded optimization that depends on the solution, w. For each x, t in the computational domain, α is chosen so as to minimize

    α ∂_x w + ε α².

Here, we can solve for the optimal control, α*(x, t), analytically as

    α*(x, t) = min( max( −∂_x w(x, t)/(2ε), α_min ), α_max ).    (9)

We need to consider boundary conditions (BCs) for w. If X(t) = x_th, then we have a spike now and T_sp = t. Thus,

    w(x_th, t) = (t* − t)².

At the threshold, the value function equals the squared difference between the desired spike time and the realized one.
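Given a numerical approximation of ∂_x w, the embedded minimization reduces to a clipped quadratic: the unconstrained minimizer of α ∂_x w + εα² is −∂_x w/(2ε), projected onto the box constraints. A one-line sketch (our own naming):

```python
import numpy as np

def optimal_feedback(dw_dx, eps, alpha_min, alpha_max):
    """Pointwise minimizer of alpha * dw/dx + eps * alpha**2, projected
    onto the box [alpha_min, alpha_max] (the closed form of eq. (9))."""
    return np.clip(-dw_dx / (2.0 * eps), alpha_min, alpha_max)
```

Note how a small ε amplifies the slope of the value function, so the control saturates at its bounds for all but the smallest values of ∂_x w.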
For large, negative values of x, we assume that w is not significantly affected by changes in x, i.e., that

    ∂_x w(x_−, t) = 0

for some lower boundary x_−. Such a BC will be justified if we choose x_− such that the probability for the process to take values smaller than x_− is small. For example, we can take x_− to be two standard deviations below the mean of the stationary distribution of the maximally inhibited process, i.e., setting α = α_min in equation (2). That is, we set

    x_− = τ_c (μ + α_min) − 2β √(τ_c/2).

Note that dynamic programming and the HJB equation work backwards. Thus, the evolution of the value function proceeds from the future to the past, and we need some terminal condition (TC) at some point in the future, possibly infinity, from which to start incrementing w using the dynamics and the BCs. To determine the TC for w, our idea is simple: if we reach t* without having spiked, we apply maximum control in the positive direction, i.e.,

    w(x, t*) = E[ (T_sp − t*)² | X(t*) = x, α ≡ α_max ].    (10)

Note that we are making an approximation here: we are ignoring the energy term, εα², in the objective for t > t*. Naturally, this approximation becomes ever more accurate for ε ≪ 1. We discuss the validity of the approximation in more detail in section 6. Alternatively, we could impose this approximate TC at some t_+ > t*.
We will encounter the quantity on the right-hand side of equation (10) repeatedly, so we give it a special name:

    T^(2)(x) = E[ (T_sp − t*)² | X(t*) = x, α ≡ α_max ].    (11)

T^(2)(x) is the second moment of the remaining time to reach the threshold starting at X(t*) = x and applying α_max throughout. This quantity can be found easily for all x in the domain by solving a stationary backward Kolmogorov equation. This extends the calculation of the first moment of an exit time as seen in textbooks such as [17], and we give the details in appendix A.
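A finite-difference sketch of this computation (the precise formulation is in appendix A of the text; the grid choices and naming here are our own): the first moment solves (β²/2) T₁'' + v T₁' = −1 and the second moment solves (β²/2) T₂'' + v T₂' = −2 T₁, with v(x) = μ + α_max − x/τ_c, absorption at x_th and reflection at the truncated lower boundary.

```python
import numpy as np

def exit_time_moments(mu, tau_c, beta, alpha_max, x_lo, x_th=1.0, n=400):
    """First and second moments of the hitting time of x_th under the
    maximally excited drift v(x) = mu + alpha_max - x/tau_c, via the
    stationary backward Kolmogorov equations (a sketch; in practice a
    banded solver would be preferable to a dense one)."""
    x = np.linspace(x_lo, x_th, n)
    h = x[1] - x[0]
    v = mu + alpha_max - x / tau_c
    D = beta**2 / 2.0

    def solve(rhs):
        A = np.zeros((n, n))
        b = np.zeros(n)
        for i in range(1, n - 1):       # centred differences in the interior
            A[i, i - 1] = D / h**2 - v[i] / (2 * h)
            A[i, i] = -2 * D / h**2
            A[i, i + 1] = D / h**2 + v[i] / (2 * h)
            b[i] = rhs[i]
        A[0, 0], A[0, 1] = -1.0, 1.0    # reflecting: T'(x_lo) = 0
        A[-1, -1] = 1.0                 # absorbing: T(x_th) = 0
        return np.linalg.solve(A, b)

    T1 = solve(-np.ones(n))
    T2 = solve(-2.0 * T1)
    return x, T1, T2
```

The array T2 evaluated on the grid is exactly the kind of terminal condition needed in equation (10).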
Thus, we restate the HJB equation in its fully specified form. We solve for w(x, t) on (x_−, x_th) × (0, t*):

    ∂_t w + min over α ∈ [α_min, α_max] of [ (μ + α − x/τ_c) ∂_x w + ε α² ] + (β²/2) ∂²_x w = 0,
    w(x_th, t) = (t* − t)²,    ∂_x w(x_−, t) = 0,    w(x, t*) = T^(2)(x).    (12)

The numerical method for the HJB equation
We now have a PDE for w and an algorithm for computing all the BCs and TCs of this PDE. It is time to discuss the numerical method for solving equation (12). Since it is one-dimensional in space, it is straightforward to apply standard centred finite differences with the Crank–Nicolson scheme to step in time, see chapter 19.2 in [28]. To resolve the nonlinearity, (∂_x w)², in the PDE, we treat it as a mixed implicit-explicit term,

    (∂_x w)² ≈ ∂_x w(x, t_{k+1}) · ∂_x w(x, t_k).

Note that the implicit term is at the previous time t_k instead of, as is conventional, the next time t_{k+1}, because we are solving for w backwards in time, from t_{k+1} to t_k down to t_0 = 0. The numerical scheme is implemented in Python, using the Scipy/Numpy library [18]. For the discretization in time and space, we choose Δx and Δt in relation to the parameter regimes. In particular, we take Δx to be less than β divided by the largest possible absolute value of (μ + α − x/τ_c) for α ∈ [α_min, α_max], x ∈ [x_−, x_th]; in our examples, this always equals μ + α_max − x_−/τ_c. This is an attempt to ensure that we are numerically in the diffusion-dominated regime. We set Δt to be twice the value of Δx divided by the largest possible absolute value of (μ + α − x/τ_c). For example, in the supra-threshold low-noise regime, we have Δx = 0.0043 and Δt = 0.0013. In all regimes, our discretization results in point-wise convergence to at least three significant digits for the value function. We consider this a sufficiently fine discretization.
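As a simplified, self-contained stand-in for that scheme, the following sketch integrates equation (12) backwards in time with a fully explicit step; the text's actual solver is Crank–Nicolson with the mixed implicit/explicit nonlinearity, and the grid sizes, boundary handling and names here are our own, chosen to respect the explicit stability limits:

```python
import numpy as np

def solve_hjb(mu, tau_c, beta, eps, t_star, a_min, a_max,
              x_lo, w_terminal, x_th=1.0, nx=100, nt=2000):
    """Explicit backward-in-time sweep for the HJB equation.
    w_terminal : array of terminal costs w(x, t*) on the grid.
    Returns the grid, w(x, 0) and the control law on the grid."""
    x = np.linspace(x_lo, x_th, nx)
    dx = x[1] - x[0]
    dt = t_star / nt
    D = beta**2 / 2.0
    w = w_terminal.copy()
    alpha = np.zeros((nt, nx))
    for k in range(nt - 1, -1, -1):          # march from t* down to 0
        t = k * dt
        wx = np.gradient(w, dx)
        a = np.clip(-wx / (2.0 * eps), a_min, a_max)   # embedded minimizer
        wxx = np.zeros_like(w)
        wxx[1:-1] = (w[2:] - 2 * w[1:-1] + w[:-2]) / dx**2
        drift = mu + a - x / tau_c
        w = w + dt * (drift * wx + D * wxx + eps * a**2)
        w[-1] = (t_star - t)**2              # BC at the threshold
        w[0] = w[1]                          # no-flux lower boundary
        alpha[k] = a
    return x, w, alpha
```

For a real run, w_terminal would come from the backward Kolmogorov computation of T^(2); any smooth array of terminal costs on the grid works for illustration.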
We next demonstrate solutions for the value function and the associated control law for a parameter set from each regime in table 1.

Solutions of the HJB equation
We set t* = 1.5 and ε = 0.001, so that the primary focus is accurate spiking, and energy minimization considerations are secondary.
In figure 2, we show the computed value function, w(x, t ), the solution to equation (12), for each of the parameter regimes; while in figure 3, we show the corresponding surface plots of α(x, t ).
Let us discuss the most salient features of figures 2 and 3. Recall that lower values of the value function, w(x, t), are preferred, since we are minimizing. We see the same general shape in all four space-time surface plots of the value function in figure 2. At the end of the interval, w(x, t*) is monotonically decreasing in x, which is to be expected given its TC in equation (10): the lower the value of X(t*), the longer on average we will have to wait for the spike to occur. As we go back in time, w inverts, with a clear peak near the upper threshold boundary. That is, for intermediate values of t, we have to weigh the risk of spiking too early, represented by the high value of w near the upper boundary, against the risk of spiking too late, represented by the high value of w at the lower boundary. As we progress even further back in time, towards the beginning of the interval, the peak near the threshold rises further, since we would be spiking even earlier, while the peak at the lower end flattens, since there is now enough time to reach the threshold despite starting far from it. The full surface plots of w(x, t) in figure 2 can be thought of as the interpolation between these three basic phases. Now focus on the controls in figure 3. Naturally, at the end of the interval, the control takes its maximum value, α(x, t*) = α_max for all x, i.e., it gives the maximum available push for the neuron to spike. This is consistent with equation (10). As we go back in time, however, the control α decreases for x below x_th, and it becomes negative, i.e., inhibitory, for x close to x_th. That is intuitively consistent with the problem objective: for t < t*, we want to bring X(t) closer to the threshold, but not too close, to avoid early spiking. Now consider the differences between the individual regimes, i.e., the effect of changing the bias, μ, and the noise intensity, β.
All other things being equal, the effect of increasing the noise intensity, β, is to lift the value function, i.e., to make our objective worse, at the beginning of the interval, and to decrease it at the end. This is to be expected, since we are attempting to minimize the variance in the spike time, and without noise there would be no variance at all. At the end of the interval, however, noise only helps, since we are then only interested in spiking as soon as possible, and a higher noise intensity will tend to decrease the spike time on average. Increasing β also has the effect of increasing the size of the boundary layer near x_th, where the value function rises steeply. That is, for small β, that layer is thin, since the risk of spiking early is significant only close to the threshold. For larger β, this layer thickens; compare panels A, C, with a thin layer, to panels B, D, with a thicker layer, in figure 2. Similarly, increasing the bias, μ, tends to decrease the value function, especially at the end of the interval, since a larger bias has the unequivocal effect of preventing late spikes.

Open-loop stochastic control
When the value of X(t) is unobservable, the transition density can be used to perform the optimization. Since the transition density follows a deterministic PDE, we apply a maximum principle for PDEs as a method of obtaining the optimal control; see [5] for details on optimization with PDEs, or the article [2] for optimization of a Fokker–Planck-type system.

Fokker-Planck equation for the state density evolution
We write the transition density of X, conditional on no spikes having occurred since t = 0, as

    f(x, t) dx = P( X(t) ∈ [x, x + dx], no spike in [0, t] | X(0) = 0 ).

Then, f satisfies a Fokker–Planck equation, see chapter 7 in [17],

    ∂_t f = −∂_x[ (μ + α(t) − x/τ_c) f ] + (β²/2) ∂²_x f,    f(x_th, t) = 0,    f(x, 0) = δ(x).    (13)

In theory, x_− = −∞, but in the numerics below, we will need to truncate it to some finite value, exactly as for the HJB equation, equation (12). Note that we write α(t) here instead of α(x, t), since we cannot use the value of X(t).
The f dynamics can also be written as

    ∂_t f = −∂_x φ,

for the probability flux

    φ(x, t) = (μ + α(t) − x/τ_c) f(x, t) − (β²/2) ∂_x f(x, t).

Then, the lower BC is

    φ(x_−, t) = 0.

We will also need shorthand notation for the differential operator on the right-hand side of equation (13). Let

    L_α[f] = −∂_x[ (μ + α − x/τ_c) f ] + (β²/2) ∂²_x f,

where [·] indicates the argument of the operator L_α. We will usually omit the subscript α, but have written it now to emphasize that the differential operator is parametrized by the control, α.
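A minimal explicit forward solver for the flux form of equation (13), with the absorbing threshold and the zero-flux lower boundary, can be sketched as follows; the smoothing of the initial delta into a narrow Gaussian, the grid sizes and the names are all our own choices:

```python
import numpy as np

def solve_fokker_planck(alpha_of_t, mu, tau_c, beta, t_star,
                        x_lo, x_th=1.0, nx=200, nt=4000, x0=0.0):
    """Explicit forward sweep for the Fokker-Planck equation in flux form.
    Returns the grid, the surviving density at t_star and the absorbed
    flux phi(x_th, t), i.e. the spike-time density."""
    x = np.linspace(x_lo, x_th, nx)
    dx = x[1] - x[0]
    dt = t_star / nt
    D = beta**2 / 2.0
    # smooth the reset delta into a narrow Gaussian the grid can resolve
    f = np.exp(-(x - x0)**2 / (2 * (5 * dx)**2))
    f /= np.trapz(f, x)
    flux_out = np.zeros(nt)
    for k in range(nt):
        v = mu + alpha_of_t(k * dt) - x / tau_c
        phi = v * f - D * np.gradient(f, dx)   # probability flux
        phi[0] = 0.0                           # reflecting lower boundary
        flux_out[k] = phi[-1]                  # outflow at the threshold
        f = f - dt * np.gradient(phi, dx)      # df/dt = -dphi/dx
        f[-1] = 0.0                            # absorbing threshold
        f = np.maximum(f, 0.0)
    return x, f, flux_out
```

With a strong excitatory control, almost all of the probability mass leaves through the threshold before t*, and flux_out approximates the first-passage-time density appearing in the objective below.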

Restating the objective in terms of the transition density
We now derive the optimality conditions for the optimal control α* in the open-loop context. Recall our objective, equation (3). We can write it in terms of the transition density, f, as

    J[α(·)] = ∫_{x_−}^{x_th} T^(2)(x) f(x, t*) dx + ∫_0^{t*} (t − t*)² φ(x_th, t) dt + ε ∫_0^{t*} α²(t) ∫_{x_−}^{x_th} f(x, t) dx dt.    (14)

Let us explain in more detail each term on the right-hand side of equation (14). The first term,

    ∫_{x_−}^{x_th} T^(2)(x) f(x, t*) dx,

counts the cost of trajectories that spike too late. This cost is the expected squared time-to-hit starting at x, with α = α_max, weighted by the probability that X(t*) = x, which is just f(x, t*). Recall that T^(2) is defined in equation (11).
The second term,

    ∫_0^{t*} (t − t*)² φ(x_th, t) dt,

counts the cost of trajectories that spike too early, that is, the squared difference between a realized spike time, T_sp = t, and the desired spike time, t*, weighted by the probability of a spike at t, which is just the outward probability flux at the threshold, φ(x_th, t). Recall that we assume x_th = 1 throughout. Note further that, due to the homogeneous Dirichlet BC at x_th,

    φ(x_th, t) = −(β²/2) ∂_x f(x_th, t).

Finally, the third term,

    ε ∫_0^{t*} α²(t) ∫_{x_−}^{x_th} f(x, t) dx dt,

is the energy cost; the inner integral, ∫_{x_−}^{x_th} f(x, t) dx, takes into account that we incur an energy cost only for those trajectories that have not yet spiked.
With that, our optimal control α*(·) will naturally be found via

    α*(·) = argmin of J[α(·)]  subject to  α_min ≤ α(t) ≤ α_max.

Optimizing using a maximum principle
By now, our problem of controlling the stochastic process in (1) has become a deterministic optimal control problem over its associated Fokker-Planck PDE. Our control α influences the evolution density f and, via f , the integrals in the objective, J.
The maximum principle for PDEs, which extends the famous Pontryagin maximum principle for finite-dimensional systems, introduces an adjoint variable, p, which solves a PDE related to the PDE satisfied by the density f, and then calculates the optimal control, α(·), as a functional of f and p.
In short, the equation for the adjoint function, p, is

    ∂_t p + (μ + α(t) − x/τ_c) ∂_x p + (β²/2) ∂²_x p + ε α²(t) = 0,
    p(x, t*) = T^(2)(x),    p(x_th, t) = (t − t*)²,    ∂_x p(x_−, t) = 0,    (15)

and then α can be found via

    α(t) = min( max( −(1/2ε) ∫ f(x, t) ∂_x p(x, t) dx / ∫ f(x, t) dx, α_min ), α_max ).    (16)

Roughly speaking, equation (16) corresponds to setting the derivative of the objective J in equation (14) with respect to α to zero, and the adjoint state, p, acts as a Lagrange multiplier corresponding to the constraint of the transition density dynamics.

[Figure 4. Optimal open-loop controls. The initial value of the process is always the reset, i.e., X(t) = 0 at time t = 0, see equation (1). From left to right, the control curves correspond to the sub-threshold-low-noise, sub-threshold-high-noise, supra-threshold-high-noise and supra-threshold-low-noise regimes.]
More practically, the quantity

    ∇J(t) = ∫ f(x, t) ∂_x p(x, t) dx + 2ε α(t) ∫ f(x, t) dx

(both integrals over [x_−, x_th]) gives the direction of increase of J at α(t), and we can use it as a gradient in a descent algorithm, given some initial guess, α_0(t), for the control. Finally, we are ready to calculate the open-loop stochastic optimal control for the four parameter regimes. We use the same numerical method for solving the PDEs as described in section 3.2, since the PDEs for f and p are very similar in structure to the PDE for w, except that they contain no nonlinear term. The details of the simple gradient-descent algorithm for obtaining the optimal open-loop control are given in algorithm B1. See [3] for a more sophisticated descent method based on the conjugate-gradient method. For our applications, the algorithm in B1 typically converges in fewer than 10 iterations, although sometimes it can take longer.
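The descent loop itself is then a plain projected-gradient iteration over the discretized control. Schematically, with grad_J standing for whatever routine evaluates the gradient above on the time grid (our own naming and step-size choice, not the exact algorithm B1):

```python
import numpy as np

def projected_gradient_descent(grad_J, alpha0, a_min, a_max,
                               step=0.5, max_iter=50, tol=1e-6):
    """Steepest descent on the discretized open-loop control, with
    projection onto the box [a_min, a_max] after each step.
    grad_J : callable alpha -> dJ/dalpha(t) on the time grid."""
    alpha = np.clip(alpha0, a_min, a_max)
    for _ in range(max_iter):
        g = grad_J(alpha)
        new = np.clip(alpha - step * g, a_min, a_max)
        if np.max(np.abs(new - alpha)) < tol:   # converged
            return new
        alpha = new
    return alpha
```

In the full algorithm, each call to grad_J requires one forward solve for f and one backward solve for p.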
The optimal controls obtained using algorithm B1 for each regime are in figure 4. Recall that the optimal control, α * (t ), is open-loop, and thus, it is only a function of time.
Let us discuss the most salient features of the controls shown in figure 4. In general, the strategy is to inhibit the neuron at the beginning of the interval, α(t) = α_min for t ≪ t*, and to excite it near the end, α(t) = α_max for t ≈ t*. The smooth portion that connects these two segments in the middle of the interval is due to the energy penalty, ε ∫ α²(s) ds. The behaviour of the control in the different regimes is relatively simple to explain, and we see what we would expect. In a sense, the bias current μ can be absorbed by the control, so increasing μ amounts to decreasing α, modulo its bounds. That is exactly the difference between the supra-threshold and sub-threshold plots in figure 4 when holding the noise intensity fixed, either β = 0.3 or β = 1.5. Both for low and high noise intensity, a reduction of μ results in an increase in α(t). The effect of varying the noise intensity, β, is more subtle, and it is not the same in the supra- versus the sub-threshold regime. In the supra-threshold regime, reducing β reduces the need for excitatory control towards the end, since the bias alone should put the neuron over the threshold. Thus, the main task of the control with low noise in the supra-threshold regime is to stop the neuron from spiking too early. In the sub-threshold regime, the situation is somewhat reversed: reducing β obviates the need to apply an inhibitory control in the early part of the interval, and also prompts the controller to apply excitatory input earlier, since the controller is now the main drive for a spike.
We should mention that as ε goes to zero, the optimal control should become bang-bang, meaning that at any time it should assume either its minimum or its maximum value. This is natural, since we have a classic situation where the dynamics are linear in the control and the cost is control-independent (if ε = 0). It is also intuitive: if there is no cost to the control, the controller would prevent early spiking by applying α_min and then, close to the desired spike time, apply α_max. Theoretically, this could help expedite the calculation of the open-loop control by reducing the problem to finding the optimal switch-point from α_min to α_max, but we have not pursued this further.
Finally, we would like to briefly compare, conceptually, the two optimal control approaches that we have used: dynamic programming and the maximum principle. The maximum principle essentially finds a critical point of the objective, akin to the first derivative test in calculus. Dynamic programming recursively builds up the optimal control from the end-time backwards, using a value function to represent the entire future cost associated with an incremental decision. The conceptual differences result in different numerical techniques. Most pertinently, when we apply dynamic programming, we can infer the optimal control right away, while for the maximum principle, we are forced to use an iterative gradient-descent procedure.

Single-spike control
Having obtained the optimal controllers, both closed-and open-loop, we evaluate their performance with simulated realizations of the voltage process, equation (1). In particular, while they both minimize the expected squared deviation of the spike time from some desired spike time, we analyse the distribution of the squared deviation and the behaviour of the controls for different parameter regimes.
In addition to the optimal controllers, we also show the behaviour of another control law, perhaps the most naive one, which is obtained by ignoring the noise and the energy penalty. That is, we find a constant value of α that satisfies the desired BCs, x(0) = 0, x(t*) = x_th. This naive controller will be called 'deterministic', since it assumes deterministic dynamics in X, i.e., β = 0.
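This deterministic controller has a closed form: the noiseless equation (1) started from the reset gives x(t) = τ_c (μ + α)(1 − exp(−t/τ_c)), and requiring x(t*) = x_th yields the constant control below (the function name is our own):

```python
import numpy as np

def deterministic_control(t_star, mu, tau_c, x_th=1.0):
    """Constant alpha for which the noiseless LIF trajectory
    x(t) = tau_c*(mu+alpha)*(1 - exp(-t/tau_c)), with x(0) = 0,
    reaches the threshold x_th exactly at t_star."""
    return x_th / (tau_c * (1.0 - np.exp(-t_star / tau_c))) - mu
```

Note that the result can be negative: in the supra-threshold regime, hitting a late target requires holding the neuron back.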
For the comparison, we sample N realizations of the controlled system and apply in turn each of the three controls. Naturally, we reuse the same realization of the underlying stochastic process, W t , for each of the three different controls.
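Reusing the noise realization amounts to drawing the Brownian increments once and feeding the same array to every controller; a sketch of this common-random-numbers comparison (the controller signatures and parameter values are our own illustration):

```python
import numpy as np

def spike_time(control, dW, mu=0.1, tau_c=1.0, beta=1.5, x_th=1.0, dt=1e-3):
    """First spike time under a given control, driven by a fixed array of
    Brownian increments dW, so that different controllers are compared on
    the same realization of W_t. control : callable (x, t) -> alpha."""
    x = 0.0
    for k, dw in enumerate(dW):
        x += (mu + control(x, k * dt) - x / tau_c) * dt + beta * dw
        if x >= x_th:
            return (k + 1) * dt
    return None                       # no spike within the horizon

rng = np.random.default_rng(1)
dW = rng.standard_normal(5000) * np.sqrt(1e-3)   # one shared noise path
t_weak = spike_time(lambda x, t: 0.5, dW)        # weakly excited
t_strong = spike_time(lambda x, t: 2.0, dW)      # strongly excited
```

Because the noise path is shared, the comparison is pathwise: with a larger constant excitation, the voltage dominates the weakly excited one at every step, so its spike can only come earlier.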
We set ε = 0.001 and α_max = 2.0. As such, the energy cost is of secondary importance, and the paramount contribution to the objective is the squared difference, (T_sp − t*)², which we are trying to minimize.
The performance of the three controllers for each parameter regime is given in table 2 and figure 5. Naturally, the closed-loop controller achieves a lower error than the open-loop one, and the naive controller fares worst. The difference in performance depends mostly on the strength of the noise. Thus, for low noise, the performance of the stochastic controllers, be they open- or closed-loop, is not much superior to the naive deterministic controller. In contrast, in the high-noise regime, using a stochastic controller gives a much lower error on average between the desired t* and the realized T_sp.
To give a better view of the performance of the controllers as a function of the noise intensity, we plot the 'percentage of correct spikes' and the 'average squared spike-time deviation' in figure 6. There, we also illustrate the effect of two values of the control bound, α_max = 2 or 4. Naturally, with a higher α_max, the error is reduced and the percentage of correct spikes is increased. As expected, the performance degrades with increasing noise intensity. When the noise is small, the supra-threshold regime behaves best, but when the noise increases, the sub-threshold regime provides more flexibility to correct for large perturbations caused by the noise.
For illustration's sake, we also visualize some trajectories from the sub-threshold-high-noise regime in figure 7, see panels A, C, E. The most notable feature of the α(·) plots in figure 7 is that, especially for ε = 0.001, there is a high level of fluctuation in the closed-loop α. That is to be expected from the optimality condition in equation (9). Since the control is proportional to the derivative of the value function and we are minimizing, roughly speaking, the control attempts to push the process, X, to the bottom of the valley formed by the value function, w. However, the stochastic fluctuations of X(t) push it randomly to either side of that 'valley' and in turn force the control to change sign to counteract them. These fluctuations are then amplified by the division by the small parameter ε, so that the control mostly bangs up or down to its extremes, α_min, α_max.

Multi-spike control
We now turn to the ultimate goal of our analysis: the control of spike trains. The major challenge when working with spike trains, one that is not present when considering intervals in isolation, is that the target spike train might contain several closely clustered spike times, such that if the first controlled spike occurs with delay, there will be too little time to hit the next target spike times, and they will all be delayed. However, we adapt the controller to the actual controlled spike occurrences and compensate, so that it keeps targeting the originally posited train. Thus, there should be no accumulation of errors, unless the controller is unable to catch up to an unusually high firing rate.
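Concretely, the adaptation amounts to re-targeting, after each realized spike, the earliest original target time still in the future, so that a delay in one interval is not propagated as a shifted target train (the naming is ours):

```python
def next_target(targets, t_now):
    """After a spike at t_now, re-target the earliest target spike time
    still in the future. Because `targets` is the original train, a
    delayed spike does not shift all subsequent targets; returns None
    once every target time has passed."""
    future = [t for t in targets if t > t_now]
    return future[0] if future else None
```

If a realized spike is so late that it overruns one or more target times, those targets are skipped rather than delayed, which is exactly why errors do not accumulate.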
We proceed by generating M = 50 realizations of N = 16 spikes each in an attempt to meet a prescribed spike train. We focus on two regimes, the supra-threshold regime with either low or high noise. The results for parameters from the low-noise regime are shown in figures 8(a) and (b) and results for the high-noise regime are shown in figures 9(a) and (b). In all cases, the target train is obtained by a simulation of the model itself, without an applied control.

Table 2. Realized and theoretical performance of the different control laws (columns: average squared spike-time deviation and expected squared spike-time deviation). The empirical performance is obtained using N = 10 000 sample paths. The theoretical performance is found using the optimal value of J for the open-loop stochastic control and the value function w(x = 0, t = 0) for the closed-loop stochastic control.

It is immediately clear that while the evoked trains closely follow the target train in the low-noise case, figures 8(a) and (b), this is much more difficult in the high-noise case, figures 9(a) and (b), and the restricted control is not able to produce reliable results. A stronger control is needed to obtain the same level of accuracy. In figure 10, we relax the bound constraints on the control [α min, α max] from [−2, 2] to [−4, 4]. Then, significantly better tracking of the target train is achieved. Thus, controlling a system under higher noise comes at the price of allowing a more powerful control signal.
The main question in this section is whether optimizing each interval separately, independently of the others, is a suitable approach when targeting a whole spike train. In particular, we do not account for the possibility that two target spikes occur close together; in that case, it might make sense to attempt to generate the first one a little earlier, on average, to give more time to generate the second. For example, in figures 8(a) and (b), the 13th target spike is successfully produced, but the 14th target spike occurs so soon after it that it is delayed in all trials. However, if the target interspike intervals are within reasonable reach of the neuron, this is not a problem. Here, 'reasonable reach' is to be understood as any period longer than the time it takes to reach the threshold in the absence of noise and under maximum excitation, i.e., α(t) = α max.
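This 'reasonable reach' has a closed form. A minimal sketch, assuming the non-dimensional LIF drift dx/dt = μ + α_max − x/τ_c with reset at 0 and threshold at 1 (our reading of the drift convention in model (1)-(2)):

```python
import math

# Shortest achievable interspike interval: noise-free time to threshold under
# maximal excitation, i.e. solve dx/dt = mu + alpha_max - x/tau_c from x = 0 to x_th.
def min_isi(mu, alpha_max, tau_c, x_th=1.0, t_ref=0.0):
    x_inf = tau_c * (mu + alpha_max)   # asymptotic voltage under maximal drive
    if x_inf <= x_th:
        return math.inf                # threshold unreachable even at alpha_max
    return t_ref - tau_c * math.log(1.0 - x_th / x_inf)
```

With the estimates used later in section 5.3 (μ = 6.17, α_max = 3.94, τ_c = 0.11), this gives roughly 0.25 in non-dimensional time units; target intervals shorter than this cannot be met even in principle.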

Controlling a biophysical model
To verify our method on a more biophysically realistic model, we use the established Morris-Lecar model [22], a Hodgkin-Huxley type conductance-based model. To mimic an experimental situation, we assume that neither the model nor its parameters are known, and simply use the LIF model (1)-(2), with parameters estimated from data sampled before the control is initiated. First, we estimate parameters for the LIF model from observations of the Morris-Lecar model. Then, we control the Morris-Lecar model using the scheme derived for the estimated LIF model. Thus, the control strategies are applied in a realistic experimental setting with no specific knowledge of the underlying mechanisms, treating the data as if coming from a black box.

The stochastic Morris-Lecar model. The stochastic Morris-Lecar model including both current and channel noise is defined as the solution to

dV_s = (1/C)(−g_Ca m_∞(V_s)(V_s − V_Ca) − g_K W_s(V_s − V_K) − g_L(V_s − V_L) + I + A(s)) ds + γ dB_s,
dW_s = (α(V_s)(1 − W_s) − β(V_s)W_s) ds + σ(V_s, W_s) dB̃_s.
The processes B_s and B̃_s are independent Brownian motions. The variable V_s represents the membrane potential of the neuron at time s, and W_s represents the normalized conductance of the K+ current. The latter varies between 0 and 1 and can be interpreted as the probability that a K+ ion channel is open at time s. The equation for the dynamics of V_s contains four terms, corresponding to the Ca2+ current, the K+ current, a general leak current and the input current I. In addition, it contains the externally applied control current A(s).
The functions α(·) and β(·) model the rates of opening and closing of the K+ ion channels. The function m_∞(·) represents the equilibrium value of the normalized Ca2+ conductance for a given value of the membrane potential. The parameters V_1, V_2, V_3 and V_4 are scaling parameters; g_Ca, g_K and g_L are conductances associated with the Ca2+, K+ and leak currents; V_Ca, V_K and V_L are reversal potentials for the Ca2+, K+ and leak currents; C is the membrane capacitance; and φ is a rate scaling parameter for the opening and closing of the K+ ion channels. The parameter γ scales the additive current noise. Conductance fluctuations caused by random opening and closing of ion channels lead to multiplicative noise in the conductance equation, modelled by the function σ(V_s, W_s). We consider the function σ(V_s, W_s) = σ √(W_s(1 − W_s)), which ensures that W_s stays bounded in the unit interval if σ ≤ 1 [9]. Parameter values used in the simulations are those of [29, 30] for a class I membrane. The noise intensity values are taken from [11]: γ = 1 mV s^(−1/2), σ = 0.2 mV s^(−1/2). Trajectories are simulated with an Euler-Maruyama scheme with a time step of 0.01 ms, which is then sub-sampled at every tenth point to compensate for approximation errors of the simulation scheme. Thus, we obtain a data set with observations every Δ = 0.1 ms. An example of a simulated trajectory is given in the top and bottom panels (V and W, respectively) of figure 11. The peaks correspond to spikes of the neuron.
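The simulation scheme can be sketched as follows. The parameter values below are illustrative class I values, not necessarily those of [29, 30]; the channel-noise form σ√(W(1−W)) and the w_∞/τ_w parameterization of the rates α(·), β(·) are our assumptions; A(s) = 0 (no control).

```python
import numpy as np

# Euler-Maruyama sketch of the stochastic Morris-Lecar model with additive current
# noise (gamma) and multiplicative channel noise sigma*sqrt(W(1-W)).
C, g_Ca, g_K, g_L = 20.0, 4.0, 8.0, 2.0      # capacitance and conductances (illustrative)
V_Ca, V_K, V_L = 120.0, -84.0, -60.0         # reversal potentials (mV, illustrative)
V1, V2, V3, V4, phi, I = -1.2, 18.0, 12.0, 17.4, 1.0 / 15.0, 40.0
gamma, sigma = 1.0, 0.2                      # noise intensities

m_inf = lambda v: 0.5 * (1.0 + np.tanh((v - V1) / V2))   # equilibrium Ca2+ conductance
w_inf = lambda v: 0.5 * (1.0 + np.tanh((v - V3) / V4))   # equilibrium K+ conductance
tau_w = lambda v: 1.0 / np.cosh((v - V3) / (2.0 * V4))   # K+ relaxation time scale

def simulate(T_ms=200.0, dt=0.01, seed=1):
    rng = np.random.default_rng(seed)
    n = int(T_ms / dt)
    V, W = np.empty(n), np.empty(n)
    V[0], W[0] = -40.0, 0.0
    for k in range(n - 1):
        I_ion = (-g_Ca * m_inf(V[k]) * (V[k] - V_Ca)
                 - g_K * W[k] * (V[k] - V_K)
                 - g_L * (V[k] - V_L) + I)
        V[k + 1] = V[k] + dt * I_ion / C + gamma * np.sqrt(dt) * rng.standard_normal()
        dW = phi * (w_inf(V[k]) - W[k]) / tau_w(V[k])
        noise_W = sigma * np.sqrt(max(W[k] * (1.0 - W[k]), 0.0))
        W[k + 1] = np.clip(W[k] + dt * dW + noise_W * np.sqrt(dt) * rng.standard_normal(),
                           0.0, 1.0)  # keep the channel variable in [0, 1]
    return V, W
```

In an experiment-like workflow one would simulate, sub-sample every tenth point as described above, and treat the resulting trace as the raw recording.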

Relating recorded voltage to a LIF model. Assume discrete observations of the membrane potential, here generated by the Morris-Lecar model as described above. These have to be related to the LIF model (1)-(2), which treats spikes as point events. Thus, the first problem is to partition the data into sub-threshold fluctuations and spikes. To make the method more robust, we operate with two thresholds. A lower threshold, below which fluctuations can be well approximated by the LIF model, is employed to identify the data used for estimation of the LIF parameters; we set this to v th = −22 mV, see figure 11, upper and middle panels. A higher threshold is set to determine the start of the spike, a point of no return from which the potential can only relax by going through a spike; we set this to −14 mV, see figure 11, upper panel. When implementing the control assuming a LIF model, we attenuate the nonlinear effects of being close to spiking in the biophysical model (or real data) by treating the interval between −22 and −14 mV as a grey zone, artificially setting the data to a constant value just below the threshold. Finally, we need to fix the resetting of the voltage, which we set to v r = −44.53 mV.
In the Morris-Lecar model, the spike is not a point event, so to compare it to a LIF model with instantaneous reset, we ignore the voltage trajectory during spikes, apply no control, and simply wait for the reset. The duration of a spike for this set of parameters is around t ref = 40 ms, see the lower panel of figure 11, at which point we restart. The time t ref between the start of the spike and the reset is then entered in the LIF model as a refractory period. The value 40 ms is obtained from studying the voltage traces and is specific to the spike dynamics of the Morris-Lecar model. In a real experimental setting it will probably be shorter, being the sum of the spike duration and the refractory period; the exact duration will depend on the system. The LIF model (1)-(2) is non-dimensional and normalized, resetting at 0 and spiking at a threshold of 1. Therefore, the data have to be transformed. First, we non-dimensionalize time by the transformation t = s/T̄, where s is the time-scale of the measurements and T̄ is some mean interspike interval; here, we use the average of the target spike trains, T̄ = 177.48 ms. Then, we transform an observation v of the membrane potential to x = (v − v r)/(v th − v r), so that the transformed data reset at 0 and spike at 1. On the transformed data, we estimate the parameters μ, τ c and β. Then, we are ready to start the control. The control α(t) is calculated on the transformed data, transformed back to a control on the original scale, A(s) = (v th − v r)τ c α(t), and fed into the Morris-Lecar model (or the cell in an experiment).
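The preprocessing and rescaling steps above can be sketched as follows. The affine voltage map x = (v − v_r)/(v_th − v_r) is our reading of the transformation described in the text; in practice, spike segments would be excised before this step.

```python
import numpy as np

# Two-threshold preprocessing: clip the grey zone between the LIF threshold (-22 mV)
# and the point of no return (-14 mV) to a constant just below threshold, then map
# voltages to the non-dimensional LIF coordinate (0 = reset, 1 = threshold).
v_th, v_spike, v_r = -22.0, -14.0, -44.53   # lower threshold, spike onset, reset (mV)
T_bar = 177.48                               # mean target interspike interval (ms)

def to_lif_coordinates(v_trace_mV, t_ms):
    v = np.minimum(np.asarray(v_trace_mV, float), v_th - 1e-6)  # grey zone -> just below v_th
    x = (v - v_r) / (v_th - v_r)             # non-dimensional voltage
    t = np.asarray(t_ms, float) / T_bar      # non-dimensional time
    return x, t

def control_to_current(alpha, tau_c):
    """Map a non-dimensional control alpha(t) back to A(s) = (v_th - v_r)*tau_c*alpha."""
    return (v_th - v_r) * tau_c * alpha
```

The inverse map `control_to_current` is what would actually be injected into the Morris-Lecar model (or the cell).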

Estimating LIF parameters from Morris-Lecar data.
Assume α(·) = 0 and discrete observations of model (1)-(2), X^(k)_n, for k = 1, ..., K, where K is the number of interspike intervals, and n = 0, 1, ..., N_k, where N_k + 1 is the number of observations in the kth interspike interval. The maximum likelihood estimators for the parameters μ, τ c and β² in the absence of a threshold are given in [4]. Now, we consider the more difficult case when only spike times are available, see [10] for a discussion. The target spike train was generated using the model itself, with parameters from the supra-threshold high-noise regime and without an applied control, α = 0. Estimating τ c from spikes is not accurate; it is (almost) unidentifiable. Therefore, we assume a fixed value of τ c = 0.11, taken from the supra-threshold low-noise regime, see table 1. This is quite different from the value of τ c estimated from intracellular recordings, so that we also investigate robustness to misspecification of this value. Thus, we only estimate μ and β, using the Fortet equation; for details, see [10]. The estimates are μ̂ = 6.17 and β̂ = 0.78. This corresponds to α(t) ∈ [±3.94] if A(s) ∈ [±10].
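For the voltage-based case, the estimation can be sketched with an exact AR(1) regression, which mimics the discrete-observation estimators of [4]. The drift convention dX = (μ − X/τ_c) dt + β dB is our assumption here.

```python
import numpy as np

# Conditional least-squares / ML sketch for the sub-threshold LIF (OU) parameters
# from discretely observed voltage with alpha = 0. The exact transition is AR(1):
# x1 = a + rho*x0 + noise, with rho = exp(-dt/tau_c) and stationary mean mu*tau_c.
def estimate_ou(x, dt):
    x0, x1 = x[:-1], x[1:]
    rho, a = np.polyfit(x0, x1, 1)          # regression slope and intercept
    tau_c = -dt / np.log(rho)
    mu = a / (1.0 - rho) / tau_c
    resid = x1 - (a + rho * x0)
    beta = np.sqrt(np.var(resid) * 2.0 / (tau_c * (1.0 - rho ** 2)))
    return mu, tau_c, beta

# Synthetic check with known parameters, simulating the exact OU transition:
rng = np.random.default_rng(7)
mu0, tau0, beta0, dt, n = 6.0, 0.11, 0.5, 0.001, 100_000
rho0 = np.exp(-dt / tau0)
sd0 = np.sqrt(beta0 ** 2 * tau0 / 2.0 * (1.0 - rho0 ** 2))
x = np.empty(n)
x[0] = mu0 * tau0
for k in range(n - 1):
    x[k + 1] = mu0 * tau0 + rho0 * (x[k] - mu0 * tau0) + sd0 * rng.standard_normal()
mu_hat, tau_hat, beta_hat = estimate_ou(x, dt)
```

As the text notes, τ_c is the hardest parameter to pin down: it enters only through the slope ρ ≈ 1, so long sub-threshold stretches are needed.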

Controlling the Morris-Lecar model.
We now have all the pieces to control the Morris-Lecar model. In the closed-loop control strategy, we use the voltage-based estimates of the LIF model parameters, while for the open-loop strategy, we use the interval-based estimates. This is a disadvantage for the open-loop strategy, since on average it will have poorer estimates of the real system, but it provides a realistic picture of an experimental situation. The closed-loop results are shown in figure 12, and the open-loop results in figure 13. In particular, the root mean-square error is 8.0 ms for the closed-loop controller and 23.2 ms for the open-loop controller; this corresponds to, respectively, 4.5% and 13.1% of the mean target interval, which is 177.48 ms (discarding the t ref portion of the interval).
The results and accuracy are comparable to those in figure 8, and show that for the purpose of stochastic control, it is not crucial to use a biophysically realistic model, nor to know the parameters exactly. This is partly due to a lower value of τ c in the Morris-Lecar model compared to what was used in the earlier simulations, and to slightly higher values of the bounds on the control. The variance is proportional to τ c β², which in the simulations is 1.125 (high noise) or 0.045 (low noise), and in the estimated Morris-Lecar model is 0.07. There might also be some error cancelling, in the sense that the data provide estimated parameters that are the best choice for the misspecified model at hand, where errors are averaged out, thus providing more robust control.
A similar result was obtained in [4], where the inhibitory and excitatory conductances were estimated from sub-threshold fluctuations using the basic LIF model, on both experimental and simulated data. Data were simulated either from a conductance-based model with Poisson synaptic input, or from the more detailed two-compartment Booth-Rinzel-Kiehn model. In both cases, the LIF model, even if misspecified, yielded sensible estimates of the conductances and their confidence intervals. It appears that the parsimony of the simple model provides robustness against nonlinear effects, spatial inhomogeneities and dynamics on various time scales.

Effect of energy penalty
So far, we have assumed a low value for the energy penalty parameter, ε, in the objective defined in equation (5). Our primary concern has been accurate spiking; penalizing energy is intended to regularize the control and avoid excessive chattering between its extreme values, rather than to minimize energy in its own right. We now explore the effect of a higher ε on the optimal controls. Intuitively, a higher value of ε will tend to bring the optimal control, α*, closer to zero. We show the effect of setting ε = 0.1 in each of the four regimes for both the closed-loop control and the open-loop control in figures 14 and 15, respectively. These should be compared to figures 3 and 4, respectively. Also, on the right of figure 7, we show example trajectories for the higher energy parameter value, ε = 0.1, in the sub-threshold high-noise regime.
For both the closed-loop and open-loop optimal controls, increasing ε tends to reduce the absolute value of α*(t). In particular, for the closed-loop control, it tends to broaden the transition region, in x, over which the control swings from its minimal, i.e. inhibitory, value to its maximal, i.e. excitatory, value. Similarly, for the open-loop control, instead of banging from its minimal bound at the beginning of the interval to its maximal bound at the end, the optimal control for ε = 0.1 tends to make a milder transition from slightly inhibitory to slightly excitatory values.
Note, however, that increasing ε has an important effect on the validity of the approximation used to form the TCs for the value function, w, and the adjoint variable, p. We assumed that applying α max is optimal for t > t*. This is indeed true in the limit ε → 0, and even for a finite value of ε it may still be the optimal thing to do. Indeed, in the original calculations, where ε = 0.001, see the panels on the left in figures 14 and 15, the so-obtained value function implied that α(x, t*) = α max. In other words, our guess that α(x, t) = α max for t > t* is self-consistent with the so-obtained value function; this is akin to confirming an ansatz. For a higher value of ε, like ε = 0.1, this is no longer the case. The red curves on the right of figure 14 are no longer pushed up to α = α max, and as such it is no longer valid, strictly speaking, to assume that the value function can be obtained by setting α(t) = α max for t > t* and then calculating the expected remaining time-to-spike. The exact same conclusion can be drawn for the open-loop control: there, the optimal control (almost) always reaches its maximum, α max, when ε = 0.001, but no longer for the higher ε = 0.1. In fact, as we mentioned after equation (10), the correct thing to do for higher values of ε is to push the calculation interval to some t+ > t*, apply the same TC at t+ instead of t* and then solve backwards (for either the closed- or open-loop control). One is guaranteed to find some t+ > t* that works, since eventually the quadratic term (t − t*)² will dominate the energy term in the objective, which is linear in t. We do not explore this further here.

Discussion
We have analysed and computed the optimal control of spikes in two scenarios: one where the underlying voltage of the neuron is observable to the controller (closed-loop control) and one where only the spikes are observable (open-loop control). The optimal control problem is solved using dynamic programming if the controller has access to the voltage, and using a maximum principle for the transition density if the controller only has access to the spike times. We have further shown how one can implement both techniques for an arbitrary spiking neuron.
Our calculations in section 5.3 show that our choice of a simple model, the LIF, is quite robust: it can be correctly and quickly estimated from data, and it allows for accurate control of realistic neurons with significant noise intensity. This, together with our demonstration in section 5.1 that the LIF control can accommodate a significant amount of noise, demonstrates the generality of our open-loop and closed-loop control methods for producing desired spike trains.
Naturally, the ability to control the system is most clearly affected by the level of the noise. In a sense, the noise acts like an adversary: in our context, it has no beneficial role and serves only to obstruct precise spiking. When the level of the noise is high, stronger control is needed, and the trade-off between accuracy (hitting the target spikes), possible damage to the system (the limits of the control) and energy expenditure has to be considered.
The bias current plays a more helpful role, at least in our examples, where a high value of the bias tends to promote precise spiking. Of course, whether it helps or hinders depends on the value of the desired spike time: a combination of a high positive bias and a 'distant' spike time will tend to be difficult to control, as the system will naturally tend to spike earlier than desired. In summary, we have found that systems in the supra-threshold low-noise regime tend to be the most accurately controlled, while those in the supra-threshold high-noise regime are the least. It should be noted that we do not require that the controlled cell is already in the supra-threshold regime; the control scheme applies in the same manner in the sub-threshold regime as well. It is only required that the maximum value of the control puts the model in the supra-threshold regime, implying that the control can always achieve supra-threshold behaviour. The supra- and sub-threshold distinction only applies to the intrinsic dynamics in the absence of control, ignoring the noise.
In both the open-loop and closed-loop contexts, finding the optimal control is computationally intensive, as we numerically solve PDEs. For these algorithms to be practical in an online setting, more work would be needed to ensure that they can be computed very quickly (in the millisecond range) or pre-computed. In that regard, the dynamic programming approach is significantly more computationally efficient, at least in the current formulation.
Our work has stayed close to the paradigm of [1] in trying to make the neuron spike at a particular time. Like them, we have also incorporated into our objective the minimization of the total energy used to achieve our goal, which, as discussed in [1], is sensible given the potential damage to the cell from accumulating charge. Furthermore, we have assumed that our control is constrained in magnitude, which is natural given physical limitations on equipment and safety considerations. Another possible constraint on the control is that it be charge-balanced, meaning that its time integral is zero, ∫ u dt = 0. Such constraints are most easily posed in the context of deterministic spiking models like the phase-response curve, see, e.g., Danzl et al [7]. It is possible, but non-trivial, to add them to the open-loop control, since that control is still deterministic. It is even less trivial when one uses dynamic programming. In particular, insisting on charge balance in our schemes would make it much more difficult to apply the TCs used in deriving both the closed- and open-loop controls.
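To see why the constraint is non-trivial, consider the most naive way of imposing it: projecting a control onto the zero-integral subspace. The sketch below is purely illustrative, not part of our scheme.

```python
import numpy as np

# Naive charge-balance projection on a uniform time grid: subtracting the mean makes
# the rectangle-rule integral sum(alpha)*dt vanish. Note, however, that the shifted
# control may violate the bounds [alpha_min, alpha_max], and shifting the tail of the
# control breaks the TC argument discussed in the text.
def charge_balance(alpha):
    alpha = np.asarray(alpha, dtype=float)
    return alpha - alpha.mean()
```

Alternating this projection with clipping to the bounds is one conceivable, but unanalysed, way to approximate both constraints at once.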
Our computational scheme has made some assumptions about the physics of the controller. Most pertinently, we must be able to measure either the detailed voltage or at least the spikes of the controlled neuron, and be able to react to them in real time. This is needed both for system identification, wherein we find the parameters of the LIF model, and for the actual control. We also need to be able to inject intra-cellular current into the cell. There exist alternatives to intra-cellular stimulation; for example, [1] discusses how photo-stimulation can be used to elicit a similar effect, and our method is relevant to that context as well. Naturally, the ability to use extra-cellular stimulation would be beneficial to the practical implementation of a neuron control strategy.
Our techniques as presented would suffer from the 'curse of dimensionality' if one were to try to control N neurons with known coupling, since in this case we would be solving N-dimensional PDEs for the control, be it closed- or open-loop. Such a scenario is further complicated by the requirements of system identification: one would have to identify the exact N-by-N coupling strengths. However, in some situations it may be possible to model network effects to first order by a rough approximation in which the effects of other neurons (mean-field effects) are taken into account by modifying the single-dimensional variables μ and β, the bias current and the noise strength. The control of N neurons could then be reduced, coarsely, to independently controlling each neuron, which is the technique described in this paper. We leave the exploration of this possibility for future work.
As an outlook, we mention that our scheme can potentially be applied to produce a periodic train in the face of noise, with applications to various biological pacemakers, or to produce the opposite effect: random behaviour at the single-cell level, for example by specifying a target train with Poisson-like characteristics.
The novelty in our work is primarily in obtaining a stochastic control scheme that is not limited by the intensity of the noise in the system. In fact, by comparison with the naive 'deterministic' controller, we have shown how wrong the control strategy can be if we assume that the noise is negligible. Given this scope, the numerical demands of our scheme are non-trivial, but our experience and experiments show that these demands can be addressed. It remains to be seen whether they can be successfully met in the lab and beyond.

Appendix A

The lower boundary x_− is in principle at −∞; for our purposes, we will set it to some finite value and impose reflecting boundaries there, just as we did for the value function w. This is a very good approximation for x_− ≪ 0, as long as α = α_max > 0.
Let B(y, t|x) be the transition density of X(t) = x given X(0) = y, considered as a function of (y, t) for fixed x. Then B solves the following (backward Kolmogorov) PDE:

∂_t B = (μ + α_max − y/τ_c) ∂_y B + (β²/2) ∂²_y B.

Let Ḡ(t, y) be the survival function for T_sp given X(t*) = y. Because the dynamics do not depend explicitly on time, we have

Ḡ(t, y) = ∫_{x_−}^{x_th} B(y, t|x) dx,

and so Ḡ, being a definite integral of B over x, satisfies the same PDE. Multiplying that PDE by 2t and integrating over t ∈ (0, ∞), with T^(1)(y) = ∫_0^∞ Ḡ(t, y) dt and T^(2)(y) = ∫_0^∞ 2t Ḡ(t, y) dt, the left-hand side turns out to be −2T^(1), so

2T^(1) = −(μ + α_max − y/τ_c) ∂_y T^(2) − (β²/2) ∂²_y T^(2). (A.8)
Similarly, we can derive an equation for T^(1), namely

1 = −(μ + α_max − y/τ_c) ∂_y T^(1) − (β²/2) ∂²_y T^(1). (A.9)

In both cases, the boundary condition is

T^(i)(x_th) = 0 (we are already spiking). (A.10)

Equations (A.9) and (A.8) can be reduced to first-order differential equations and solved analytically, at least up to some definite integral, but the analytical solution is not particularly useful or necessary for numerical calculations. Instead, we opt to calculate the solutions to (A.9) and (A.8) directly via an ODE solver, starting from T^(i)(x_th) = 0 and integrating down over x ∈ [x_−, x_th]. That is how we have obtained the TCs for both w(x, t*) and p(x, t*) to apply in, respectively, equations (12) and (15).
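The backward integration can be sketched as follows, reducing each moment equation to first order in u = ∂_y T^(n). The drift μ + α_max − y/τ_c and the reflecting condition u(x_−) = 0 are our assumed conventions.

```python
import numpy as np

# Numerical sketch of the TC computation: solve
#   (beta^2/2) T''_(n) + b(y) T'_(n) = -n T_(n-1),  T_(n)(x_th) = 0,
# with reflecting lower boundary T'_(n)(x_-) = 0, for n = 1, 2 (T_(0) = 1).
def passage_moments(mu, alpha_max, tau_c, beta, x_minus=-2.0, x_th=1.0, n_grid=4000):
    y = np.linspace(x_minus, x_th, n_grid)
    h = y[1] - y[0]
    b = mu + alpha_max - y / tau_c           # assumed drift under alpha = alpha_max
    T = [np.ones(n_grid)]                    # T_(0) = 1
    for n in (1, 2):
        u = np.zeros(n_grid)                 # u = dT_(n)/dy; reflecting: u(x_-) = 0
        for k in range(n_grid - 1):          # (beta^2/2) u' + b u = -n T_(n-1)
            u[k + 1] = u[k] + h * 2.0 / beta ** 2 * (-n * T[n - 1][k] - b[k] * u[k])
        S = np.concatenate(([0.0], np.cumsum((u[1:] + u[:-1]) * 0.5 * h)))
        T.append(S - S[-1])                  # shift so that T_(n)(x_th) = 0
    return y, T[1], T[2]                     # grid, first and second passage-time moments
```

With the section 5.3 parameter values, the resulting T^(1)(0) is close to the noise-free time-to-threshold under maximal excitation, as it should be when the drift dominates.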

Appendix B. Gradient descent algorithm for the maximum principle
Algorithm B1. Gradient descent algorithm for obtaining the optimal open-loop control

Fix t*, ε, ..., the problem parameters
Fix {t_n}, n = 0, ..., N_t, a time discretization of [0, t*]
Fix g_tol, a convergence tolerance for the gradient
Fix K_max, J_max, the maximum numbers of iterations in the outer, resp. inner, descent loops
Fix s, the initial step size.    # we use g_tol = 1e−5, K_max = 100, J_max = 10, s = 10
α_1(t) ← (α_max − α_min) · t/t* + α_min    # initial guess for the control: linear interpolation between α_min and α_max
for k = 1 ... K_max do    # outer loop: descend along successive gradients
    Calculate f_k, J_k, p_k, δ_α J_k corresponding to α_k from equations (13)-(15) and (17)
    N_active ← number of time nodes t_n where either α_k(t_n) ∉ {α_min, α_max} or δ_α J_k(t_n) points inwards
    if ||δ_α J_k||_{R^N_active} ≤ g_tol · N_active then
        # 'active' gradient is small enough, consider converged: BREAK
    end if
    # Find the step size s_j, i.e. how far to move α in the direction δ_α J_k:
    for j = 1 ... J_max do    # inner loop: determine how much to descend along the current gradient
        α_{k,j} ← α_k − s_j · δ_α J_k    # α_{k,j} is the new control strategy to try
        Calculate f_{k,j}, J_{k,j} corresponding to α_{k,j} from equations (13) and (14)
        # recall f_{k,j} is the probability density resulting from the control α_{k,j} and J_{k,j} the objective value resulting from f_{k,j}
        if J_{k,j} < J_k then    # we found a better (
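A runnable toy rendition of the projected-descent structure above, with a simple quadratic objective standing in for the PDE-based J of equations (13)-(14) (the quadratic target is purely illustrative):

```python
import numpy as np

# Projected gradient descent with a backtracking inner loop, mirroring Algorithm B1:
# outer loop follows successive gradients; inner loop halves the step until the
# objective improves; bound components whose gradient points outwards are inactive.
def descend(J, grad, alpha0, a_min, a_max, g_tol=1e-5, K_max=100, J_max=10, s=10.0):
    alpha = np.clip(alpha0, a_min, a_max)
    for _ in range(K_max):                          # outer loop
        g = grad(alpha)
        at_bound = ((alpha <= a_min) & (g > 0)) | ((alpha >= a_max) & (g < 0))
        active = ~at_bound                          # free nodes, or gradient points inwards
        if np.linalg.norm(g[active]) <= g_tol * max(active.sum(), 1):
            break                                   # 'active' gradient small enough: converged
        for j in range(1, J_max + 1):               # inner loop: backtracking step search
            trial = np.clip(alpha - (s / 2 ** (j - 1)) * g, a_min, a_max)
            if J(trial) < J(alpha):                 # found a better control: accept it
                alpha = trial
                break
    return alpha

target = np.array([-0.5, 0.3, 3.0])                 # last component lies outside the bounds
J = lambda a: np.sum((a - target) ** 2)
grad = lambda a: 2.0 * (a - target)
alpha_star = descend(J, grad, np.zeros(3), -2.0, 2.0)
```

The 'active set' bookkeeping matters: without it, components pinned at a bound with an outward-pointing gradient would prevent the convergence test from ever firing.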