A finite rate of innovation algorithm for fast and accurate spike detection from two-photon calcium imaging

Objective. Inferring the times of sequences of action potentials (APs) (spike trains) from neurophysiological data is a key problem in computational neuroscience. The detection of APs from two-photon imaging of calcium signals offers certain advantages over traditional electrophysiological approaches, as up to thousands of spatially and immunohistochemically defined neurons can be recorded simultaneously. However, due to noise, dye buffering and the limited sampling rates in common microscopy configurations, accurate detection of APs from calcium time series has proved to be a difficult problem. Approach. Here we introduce a novel approach to the problem making use of finite rate of innovation (FRI) theory (Vetterli et al 2002 IEEE Trans. Signal Process. 50 1417–28). For calcium transients well fit by a single exponential, the problem is reduced to reconstructing a stream of decaying exponentials. Signals made of a combination of exponentially decaying functions with different onset times are a subclass of FRI signals, for which much theory has recently been developed by the signal processing community. Main results. We demonstrate for the first time the use of FRI theory to retrieve the timing of APs from calcium transient time series. The final algorithm is fast, non-iterative and parallelizable. Spike inference can be performed in real-time for a population of neurons and does not require any training phase or learning to initialize parameters. Significance. The algorithm has been tested with both real data (obtained by simultaneous electrophysiology and multiphoton imaging of calcium signals in cerebellar Purkinje cell dendrites), and surrogate data, and outperforms several recently proposed methods for spike train inference from calcium imaging data.


Introduction
Understanding how information processing occurs in neural circuits is one of the principal problems of systems neuroscience. Information is encoded in the firing of action potentials (APs, or spikes) by individual neurons, and Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. information processing involves the coordination of AP firing by large populations of neurons organized into neural circuits. To understand neural information processing, we thus must monitor the activity of neural circuits at a spatial resolution sufficient to resolve many individual neurons, and a temporal resolution sufficient to resolve individual APs on individual experimental trials. Of the currently available techniques for conducting neurophysiological experiments, only multiphoton calcium imaging (Denk et al 1990, 1994, Stosiek et al 2003 and multielectrode array electrophysiology (Csicsvari et al 2003, Blanche et al 2005, Du et al 2009 offer this capability. Of these, only multiphoton calcium imaging currently allows precise three-dimensional localization of each individual monitored neuron within the region of tissue studied, in the intact brain. In order to monitor cellular activity, neurons must be labelled with a fluorescent indicator, and a number of approaches have been used to do this. Single cells can be labelled by filling the cell with dye during a whole-cell or intracellular recording (Kitamura et al 2008. Alternatively, populations of neurons can be simultaneously labelled with acetoxy-methyl (AM) ester calcium dyes (Stosiek et al 2003), allowing simultaneous monitoring of AP induced calcium signals in a plane (Ohki et al 2005) or volume (Göbel and Helmchen 2007) of tissue. To investigate information processing in neural circuits, it is necessary to relate these calcium signals to the properties of the spike trains fired by the neurons, ideally by detecting the times of occurrence of spikes with single AP resolution. A number of methods have previously been used to detect spike trains from calcium imaging data, including thresholding the first derivative of the calcium signal (Smetters et al 1999), and the application of template-matching algorithms based on either fixed exponential (Kerr et al 2005, Greenberg et al 2008 or data-derived (Schultz et al 2009, Ozden et al 2008 templates. Machine learning techniques (Sasaki et al 2008) and probabilistic methods based on sequential Monte Carlo framework (Vogelstein et al 2009) or fast deconvolution (Vogelstein et al 2010) have also been proposed.
Some broadly used methods such as template matching or derivative-thresholding have the disadvantage that they do not deal well with multiple events occurring within a time period comparable to the sampling interval. Unfortunately, given that laser-scanning two-photon imaging systems are largely limited to scan rates of 8-30 Hz when frame-scanning with sufficient spatial resolution to capture many neurons, and that neurons in many brain areas have a propensity to fire spikes in bursts, this is a relatively common occurrence in neurophysiological calcium signals. Bursts of spikes have been found to convey information with high reliability in some sensory systems (Reinagel et al 1999, Gabbiani et al 1996, and have been suggested to carry distinct sensory signals (Wang et al 2007). It is thus desirable to develop calcium transient detection algorithms that accurately detect multiple spike calcium events. As there is a trade-off between the area of tissue imaged and signal to noise ratio (SNR) (zooming in on a region of tissue allows the collection of more photons per neuron, thus offering better SNR, but limits the number of neurons that can be studied) and similarly between sampling rate and the area of tissue that can be imaged, it is desirable to improve algorithms for the detection of APs from calcium fluorescence time series.
In this paper we present a novel approach that extends modern sampling theory based on finite rate of innovation (FRI) theory. In the absence of noise, the FRI algorithm perfectly retrieves the locations of APs using a variation of a fast non-iterative algebraic method called annihilating filter (a.k.a. Prony's method). This method reconstructs complex exponentials in noise from a set of samples. We have combined this with a novel double consistency sliding window technique that improves performances in noisy scenarios. To reconstruct the time series we construct a Toeplitz matrix from the samples.
The key characteristic of this matrix is that, in the noiseless case, it is rank deficient, and its rank is always equal to the number of APs in the observation window. We run the algorithm twice, firstly with a large time window to estimate the number of spikes by singular value decomposition (SVD), and secondly, with a time window containing only a small number of spikes. In both cases, for each position of the sliding window, the algorithm outputs the locations of the K spikes assumed within the window. When the estimate of K is correct, the retrieved locations are stable among different sliding windows, and when incorrect, unstable. We construct a joint histogram of the retrieved locations with the two different window sizes. The final spike time estimates are obtained from histogram peaks, corresponding to consistent positions among different windows.
The proposed algorithm is robust in high noise scenarios, and fast enough to allow real-time spike train inference for tens of neurons. We show that for surrogate data with a temporal resolution of 27 Hz and a SNR of 10 dB the algorithm presents a spike detection rate above 95% with a false-positive rate below 0.02 Hz. Moreover, this algorithm is able to retrieve the spike locations with a precision higher than the temporal resolution of the acquired data.

Experimental methods
The data used in this study, and the experimental methods used to collect them, have been previously described (Schultz et al 2009). Briefly, Sprague-Dawley rats (P18-P29) were anaesthetized with urethane (1.2 g kg −1 ) or with ketamine (50 mg kg −1 ) / xylazine (5 mg kg −1 ). A craniotomy was made over area Crus IIa of the cerebellum, filled with 1.5-2% agarose in Ringer's solution, and a coverslip clamped above the agarose to suppress brain movement, while leaving a window open for microelectrode access. A micropipette was inserted to a depth of around 100-200 μm below the pia mater, and AM-ester calcium dye (Oregon Green BAPTA-1 AM) pressure-ejected. Imaging was performed from 30 min following dye ejection, using a two-photon laser scanning microscope (Prairie Technologies). A pulsed Titanium:Sapphire laser was used for excitation, operating at 810 nm (MaiTai, SpectraPhysics) with <100 fs pulse width and 80 MHz repetition rate, and focused using a 40×, 0.8 Numerical Aperture objective lens (Olympus).
Image frames were acquired using ScanImage software (Pologruto et al 2003) for MATLAB (MathWorks). Raster lines making up each frame were of 2 or 2.3 ms duration, resulting in frame rates of 7-16 Hz. For each region imaged, a high resolution reference image was first acquired (512 × 512 pixels, average of five frames), followed by movies at 256 × 64 or 256 × 32 pixel resolution. Fluorescence time series for each neuron were obtained by defining regions of interest (ROIs) using a combination of human operator and spatial independent component analysis (Schultz et al 2009, Reidl et al 2007, and for each time bin, averaging the values of each pixel within the ROI. To validate our event detection algorithms, we simultaneously performed targeted extracellular recordings from imaged neurons. Patch micropipettes (∼4 M ) were filled with artificial cerebrospinal fluid (ACSF), together with Alexa 594 to aid visualization of the pipette. The pipette was navigated until the tip was adjacent to a Purkinje cell soma or dendrites and CS could be observed with high SNR. We emphasize that we are using two-photon targeted (visualized) juxtacellular recording, using a patch-pipette filled with dye. Using this technique, we can observe that the pipette is attached to a cell in which fluorescence changes are observed for each AP, meaning that there is no ambiguity concerning which cell is being recorded from. Electrophysiological and imaging data were then simultaneously acquired from the same neuron (figure 1).

Mathematical model
At time t we consider the fluorescence measurement for a given ROI to be proportional to the calcium concentration plus additive Gaussian noise (Vogelstein et al 2009): (2.1) where [Ca 2+ ] t is the intracellular calcium concentration at time t, constant β represents the baseline calcium concentration of a particular cell and t the noise at time t. The noise is independently and identically distributed according to a normal distribution with zero mean and σ 2 variance.
The signal that we will consider is the normalized fluorescence  (Vogelstein et al 2009). We assume that when a neuron is activated, the calcium concentration jumps instantaneously, and each jump has the same amplitude A. The concentration then decays exponentially with time constant τ , to a baseline concentration. The one-dimensional fluorescence signal can therefore be characterized by convolving the spike train with a decaying exponential and adding noise: where the index k represents different spikes, the different t k their occurrence times and u(t ) the unit step function. Hence, the goal of the spike detection algorithm is to obtain the values t k .

Spike detection
Our spike detection algorithm is based on connecting the calcium transient estimation problem to the theory of FRI signals. We therefore first provide an overview of this theory and then present our spike detection method.
2.3.1. Overview of FRI theory. FRI theory applies to specific classes of signals which are completely specified by a finite number of free parameters. The goal of FRI algorithms is to reconstruct a signal that best fit the model given the available measurements. This is achieved by building specific matrices whose singular values and singular vectors provide the information necessary to retrieve the free parameters of the signals. Specifically, the canonical expression of a signal with FRI is given by: If the function g(t ) is known, the signal x(t ) is completely determined by the coefficients a k and the shifts t k , these are the free parameters. Introducing a counting function C x (t a , t b ) that counts the free parameters or degrees of freedom of x(t ) over the time interval [t a , t b ], the rate of innovation is defined as (Vetterli et al 2002) (2.5) We can then define FRI signals as those with a finite ρ. A typical example of such signals is a stream of K Diracs, . This signal is not-bandlimited, but we only need to know the K pair of coefficients (a k , t k ) to perfectly reconstruct it. Classical sampling theory does not allow sampling and perfect reconstruction of this type of signal. However, recent work in FRI theory has shown that this is possible (Vetterli et al 2002). In the sequel we show how it is possible to acquire the signal x(t ) = K k=1 a k δ(t − t k ) and perfectly reconstruct it from a finite set of samples.
Acquisition devices are usually modelled as a filtering stage followed by a sampling stage as illustrated in figure 2.
Filtering signal x(t ) with h(t ) = ψ (−t/T ) and retrieving samples at instants of time t = n T is equivalent to computing the inner product between x(t ) and ψ (t/T − n). Specifically, the filtered signal is given by Moreover, sampling y(t ) at regular intervals of time t = n T leads to Hence, samples y n correspond to the projection of x(t ) onto the set of functions {ψ (t/T − n)} n∈Z .
The function ψ (t ) is called sampling kernel and has to satisfy specific properties to be able to perfectly reconstruct the signal x(t ). Exponential reproducing kernels satisfy the required conditions (Dragotti et al 2007). This is a family of kernels that together with its shifted versions can reproduce exponentials of the form e α m t : (2.8) where m = 0, 1, . . . , P. This expression is satisfied for a proper choice of coefficients d m,n . The computation of these coefficients is detailed in appendix A.1. The parameters α m can be chosen arbitrarily. However we require α m = α 0 + mλ in order to be able to use the annihilating filter method described later on. Moreover, we choose them to be α m = j π P (m − P 2 ). They are selected to be purely imaginary because they are more robust against noise and in complex conjugate pairs in order to have a real valued kernel ψ (t ). E-splines are a type of functions that are able to reproduce exponentials and have the advantage of being of compact support (Urigüen et al 2011). An E-spline of order P can reproduce P + 1 different exponentials as in (2.8). Figure 3 shows an example with P = 1. This E-spline is able to reproduce two different exponentials.
Given the samples y n , we now want to retrieve the degrees of freedom {(a k , t k )} K k=1 . If we combine these samples with coefficients d m,n , we obtain where m = 0, 1, . . . , P. The new samples s m are the exponential moments of the signal x(t ). In the particular case where the input signal is a stream of Diracs, and α m can be written as α m = α 0 + mλ, the exponential moments can be expressed as a sum of exponentials (see appendix A.2): where b k = a k e α 0 t k and u k = e λt k . We are now faced with the problem of having to retrieve b k and u k from the sequence s m . The problem is linear in the parameters b k , but it is nonlinear in the parameters u k . Therefore the difficulty is in finding the nonlinear terms. We solve the problem by applying the annihilating filter method. The annihilating filter is a filter of length K + 1 with zeros at locations {u k } K k=1 . The z transform of the impulse response of the filter is thus This method is based on the observation that if we filter the sequence s m with a filter with zeros at u k the output is zero. To convince ourselves of this fact let us assume that we have a sequence with only one exponential: s m = u m 1 . The corresponding annihilating filter is 1 − u 1 z −1 . This filter computes finite differences weighted by u 1 . The output of the filter is thus s m − u 1 s m−1 . The sequence s m is cancelled out as the weight u 1 is exactly the rate of growth of the sequence. If we have more than one exponential we can annihilate the signal by cascading unitary filters where each of them cancels out one exponential. Figure 4 illustrates this concept. The z transform of filters connected in series is the product of their transfer functions. Thus the transfer function of the annihilating filter is and we have that (2.14) The matrix S is rank deficient with rank K; the system is therefore overdetermined and the solution is not unique. If we impose h 0 = 1, the system has a unique solution. Once h has been found, the locations t k are directly determined from the roots of the polynomial H(z) as u k = e λt k , where λ is the parameter of the coefficients α m = α 0 + mλ. From (2.14) and imposing h 0 = 1, it can be seen that we need at least 2K samples s m . This imposes a lower limit to the order P of the E-spline as we compute the measurements s m for m = 0, 1, . . . , P, where P is its order.
Retrieval of the parameters of a sum of exponentials in noise in the form given in (2.10) is a recurrent problem in spectral estimation. We refer the reader to Stoica and Moses (2005) for further details.
The previous theory has been presented for continuoustime signals and an analogue sampling kernel. However it can easily be extended to discrete-time signals. We can assume that the independent variable t of the input signal x(t ) is discrete. For a given temporal resolution T res , we define discrete time values as t = n T res , where n ∈ Z. The filter is then replaced by a discretized version of ψ (t ) and the convolution is computed as a summation instead of an integral. Moreover, if we set the sampling period at the output of the filter to be the temporal resolution, that is T = T res , the sampling stage after the filter can be omitted, as the filter's output y(t ) is a discrete sequence that directly corresponds to samples y n . The T = T res condition also applies to the scaling factor of the kernel which becomes ψ (t/T res ).

Data processing.
Based on the above framework, we now develop a method for spike detection in calcium transient signals. Recall that the input signal can be expressed as a stream of decaying exponentials. Moreover, we assume that there is a finite number K of spikes within the observation period. Therefore the noiseless calcium concentration variation, denoted c(t ), can be expressed as Figure 5. Filtering process of the measured signal.
Here the variable t is discrete. The detection process requires filtering the measured signal. The filter has an impulse response The output of the filter h(t ) are the samples y n that correspond to the inner product between c(t ) and shifted versions of the kernel: y n = c(t ), ϕ(t − n) . Samples y n can also be expressed as 3). One of the key points of the previously described FRI framework is that the filtered and sampled stream of Diracs is combined with the d m,n coefficients from (2.8) to obtain the sum of exponentials given in (2.10). It will become clearer in what follows, that despite the fluorescence signal being composed of a stream of decaying exponentials, this first filtering stage with the exponential reproducing function ϕ(t ) will allow us to turn the problem into retrieving the locations of a stream of Diracs.
The next step of the algorithm is to compute finite weighted differences of samples y n in order to obtain new samples z n . This is a second filtering stage with a filter with transfer function G(z) = 1 − e −αT z −1 . These steps are illustrated in figure 5. Filtering signal c(t ) with ϕ(−t ) and computing samples z n = y n − y n−1 e −αT is analogous to filtering the stream of Diracs x(t ) with a different kernel ψ (t ) (see appendix A.4). At this stage, the problem has been turned into a sampling process of a stream of Diracs. This new kernel, ψ (t ), is still able to reproduce exponentials (Unser and Blu 2005). That is, there exists coefficients d m,n such that n d m,n ψ (t − n) = e α m t .
The problem of estimating the calcium transients and the problem of reconstructing an FRI signal are now equivalent. In fact, we now have a set of samples z n = x(t ), ψ (t − n) which are equivalent to those that we would obtain if we were sampling the stream of Diracs x(t ) with the exponential reproducing kernel ψ (t ). We can therefore apply FRI techniques to retrieve the location of the Diracs and, as highlighted in (2.15), those correspond exactly to the activation times of the APs. We summarize this inference method in algorithm 1.

Spike inference in practice.
Real data presents two main issues. Firstly, in the presence of noise, the matrix S from (2.14) is not rank deficient. And secondly, the number of spikes (K) within a time interval is unknown.
In the noiseless case, the matrix S has rank K. The SVD of this matrix has therefore only K non-zero singular values. When noise is added to the input signal, the matrix S becomes full rank and if we do not have prior knowledge of K, estimating its value becomes part of the problem. In a low noise scenario and when K is not zero, K can be estimated from the singular values of S. In this case, the contribution of the signal in the singular value of S is more important than the contribution of the noise, and a clear separation can be established to estimate the number of singular values that are due to the signal.
Another effect of the noise is that equation (2.14) is not satisfied exactly. We have followed two different approaches to overcome this situation. The first approach (Blu et al 2008) starts with denoising the matrix S with an iterated algorithm (Cadzow 1988). The matrix S is Toeplitz by construction, but is not rank deficient due to the presence of noise. The iterated algorithm makes the matrix S be of rank K (using the previously estimated value of K) setting to zero the smallest singular values. This new matrix S has rank K but is not Toeplitz anymore. A new matrix is built averaging the diagonal elements of matrix S . These two steps are repeated until some stop condition is reached. The next step is to solve equation (2.14). This is done computing the total least squares solution that minimizes Sh 2 subject to h 2 = 1. The second approach is based on the matrix pencil method (Hua and Sarkar 1990) which is in essence based on the same principle that is used in the ESPRIT algorithm (Paulraj et al 1985) for the estimation of directions of arrival of signals in arrays of antennas. This approach has already been successfully used in the FRI framework (Maravić and Vetterli 2005). This method is based on the particular structure of the matrix S, which is Toeplitz and each element is given by a sum of exponentials as shown in (2.10). Let S 0 be the matrix constructed from S by dropping the last row and S 1 the matrix constructed from S by dropping the first row. It can be shown that in the matrix pencil S 0 − μS 1 the parameters {u k } K k=1 are rank reducing numbers, that is, the matrix S 0 − μS 1 has rank K − 1 for μ = u k and rank K otherwise. The parameters {u k } K k=1 are thus given by the eigenvalues of the generalized eigenvalue problem (S 0 − μS 1 )v = 0. Both approaches lead to similar performances whilst the second is computationally more efficient.
Correct estimation of the number of spikes within the time window where we are searching for spikes is crucial to obtain good performance. The previously described approach, where K is estimated from the singular values of the matrix S, has two main issues: firstly, we never detect the K = 0 case, and secondly, in very noisy scenarios (low SNR), the estimation is not very accurate. To overcome these inaccuracies we perform a double consistency analysis. In order to extract the spikes from a long data stream, the signal is sequentially analysed with a sliding window. For each position of the window, we first estimate the number of spikes within the window, and we then extract the locations of the corresponding spikes. Figure 6 illustrates this procedure. If the window has size t and the window progresses by steps of t step , the time interval processed within the ith window is where t 0 is the instant of time of the first sample of the data stream. We select t step to be equal to the temporal resolution of the data, so the window advances sample by sample. Consecutive windows, importantly, overlap to guarantee that a spike is detected among different windows. Figure 7 illustrates this sequential processing of a real fluorescence sequence. In figures 7(a) and (b) the red dots represent the retrieved locations for different positions of the sliding windows; the vertical axis represents the index of the window, and the horizontal axis the time location of the retrieved spikes. Figure 7(a) corresponds to a window size of 32 points and figure 7(b) to a window size of 8 points. The blue lines represent the locations of the real spikes, this is the ground truth data. When a spike is detected among different windows, we can see that the red dots are aligned vertically because different windows output the same location. The double consistency approach consists in running the algorithm twice following two different strategies in each execution. First, with a sufficiently large time window (32 points of the input signal) we estimate the number of spikes from the singular values of the matrix S. Second, with a sufficiently small window (8 points of the input signal) we assume that we always have a single spike within this observation window. In both cases, for each position of the sliding window, the algorithm outputs the locations of the spikes assumed to be within that window. When the retrieved locations correspond to real spikes, the locations we retrieve are stable among the different positions of the window that capture these spikes, but when the locations correspond to noise they are not stable. We construct a joint histogram of the retrieved locations with the two different window sizes. This is shown in figure 7(c). The locations of the real spikes are estimated from the peaks of the histogram. These peaks correspond to positions that are consistent among different windows. Figure 7(d) shows the fluorescence data with the real and the detected spikes. The algorithm is summarized in algorithm 2.

Generating surrogate data
We generated surrogate data with similar properties to the experimental data, in order to investigate the changes in performance of the spike detection algorithm in terms of parameters such as data SNR and the sampling frequency. We assume that the spike occurrence follows a Poisson distribution with parameter λ spikes/s. Experimental data presents occurrences between 0.45 and 0.5 spikes per second. The probability of having k spikes in the interval considered in parameter λ (one second) is given by the probability mass function of the Poisson distribution: To generate a spike train for a time interval L we divide this interval in N slots. Each slot corresponds to a time interval of t = L N seconds. The λ parameter that corresponds to this new time interval is λ = λ · t. We then create a vector k = (k 1 , . . . , k N ) of size 1 × N where each k i is a realization of the independent random variables K i ∼ Pois(λ ). The ith element of this vector, k i , gives the number of spikes that occurred during the ith time slot. We then have to generate the precise instant of time when the spike occurred. For a given time slot, we generate the k i spike locations according to a uniform distribution. The total number of spikes in the time interval L is K = N i=1 k i . Once we have generated the locations of the K spikes (t k ) K k=1 the waveform given by the exponential decaying model is: where α = 1/τ . We then generate the simulated fluorescence sequence sampling equation (2.19) at instants t = n T res for a temporal resolution of T res seconds. The data sequence is slightly smoothed before sampling in order to have a differentiable function. We can then add white Gaussian noise to satisfy a certain SNR. The SNR is computed as the ratio between the power of the signal and the power of the noise, expressed in the logarithmic decibel scale. Figure 8 shows an example of generated data with a SNR of 10 dB.

Real-time processing
The algorithm is fast enough to perform real-time spike inference. The most demanding stages in terms of computation requirements are the estimation of the number of spikes and the retrieval of the locations for each position of the sliding windows. The joint histogram's peak detection has a negligible complexity when compared to the previous stages. For each new data sample the algorithm has to perform the number of spikes estimation and locations retrieval for the 32 points and 8 points windows. Since previous locations are stored in memory, the histogram can be computed sequentially. Performance measurements have been done for the current MATLAB implementation using a commercial laptop (tested on a 2.5 GHz Intel Core i5 CPU). In our setup, the 32 points window takes on average (value obtained averaging the execution time of 1000 windows) 1.25 ms to perform the number of spikes estimation and location retrieval, and the 8 points window takes 0.49 ms. Therefore, when a new data sample is available the algorithm takes 1.74 ms to process it. The sampling period is 147.2 ms, the current implementation can thus process up to 84 data streams in parallel. The algorithm requires the samples from a whole window before being able to output a location. Therefore the output has a maximum delay of 32 samples × 147. 2 ms/sample = 4.71 s.

Results
In this section we present the performance of the spike detection algorithm with real and surrogate data. The electrophysiological measurements give us a ground truth for the spiking activity of the monitored neuron which allows measuring the performance of the algorithm with real data. A detected spike is considered to correspond to a real spike if the difference between the real location and the estimated location is smaller than or equal to a threshold. We set this threshold to be equal to the temporal resolution of the data, T res . If we denote by t k the real location of a spike andt j an estimated location, we consider that the real spike has been detected ift j ∈ [t k − T res , t k + T res ]. When a spike is assumed to correspond to a real spike, we can measure the error on the estimated location. From this error measurement we obtain a mean square error of the overall algorithm.
A limitation of the real data is the temporal resolution, which is imposed by the frame rate of the calcium imaging dataset. With the surrogate data we can control this resolution when we generate the data stream to measure the impact of this parameter to the algorithm's performance.

Real data
The real data is a data stream of 133 s with a temporal resolution T res = 0.147 s. Hence there are 903 samples. This data stream contains 62 spikes at a rate of 0.466 Hz.
The sliding window algorithm is performed twice, first with a big window of 32 samples estimating K from the estimated rank of the S matrix (thresholding of the singular values), and second with a small window of eight samples and a fixed K = 1. The spikes are detected from the resulting  histogram of the union of the locations retrieved in both iterations. The algorithm correctly detects 83.9% of the spikes. The standard deviation of the locations is 0.0503 s. There are a total of 9 false positives, this corresponds to a false positive rate of 0.0598 Hz or 1.1% if measured as the rate between false positives and total negative samples.

Surrogate data
The real data presents a spike rate of 0.466 spikes per second. We have generated surrogate data assuming that the spike occurrence follows a Poisson distribution with a parameter λ = 0.5 spikes/s and a total number of 1000 spikes. The noiseless calcium concentration signal have been generated once for a given spike distribution and with three different temporal resolutions. To analyse the performance variation for different levels of noise we have run the algorithm over 100 different realizations of noise for each level of SNR. Figure 9 summarizes the obtained performances.
From figure 9 it can be seen that the success rate of the algorithm strongly depends on the temporal resolution of the data. The higher the temporal resolution, the better the spike detection rate. The real data we have analysed presents a low temporal resolution because of the low frame rate of the calcium images ( 1 0.147 s = 6.8 Hz), but recent publications (Sadovsky et al 2011, Katona et al 2012 show that the acquisition techniques are improving, with in some situations frame rates up to 125 Hz now available. At these frame rates, our algorithm presents success rates above 95%. The performances of the detection algorithm are not particularly influenced by the noise for SNRs above 10 dB, and deteriorate slightly for lower SNRs. Increasing temporal resolution has a minor drawback, the amount of false positives slightly increases. However, the false positive rate is very low (about 15 false positives for a stream of 2000 s represents a rate of false positives below 0.01 Hz) 3 .

Comparison with existing methods
Various methods for spike inference from two-photon imaging have been presented in recent years, but to the best of our knowledge, none of them achieve these performances for realtime processing. Greenberg et al (2008) present a method based on finding a least-square solution to fit the observed fluorescence signal. With real data similar to ours, temporal resolution of 96 ms and neural activity with firing rate of 0.44 Hz, they obtain higher detection rates, 95% detection of electrically confirmed AP with a false-positive rate of 0.012 Hz. However, this method is very slow and is not suitable for real-time processing. It also has to be noted that this data was acquired from cell bodies and our from dendrites. Sasaki et al (2008) describe a new approach that combines principal component analysis and support vector machine. This method requires a learning phase to tune some parameters. The results show similar performances in terms of detection rate, with error rates <10%, but the precision of this method is lower as only a fraction of the detected spikes are detected in the correct time frame. Vogelstein et al (2009) present a sequential Monte Carlo method to infer spike trains. Again, this method is not suitable for real-time processing due to its high computational complexity. Vogelstein et al (2010) describe a fast nonnegative deconvolution filter to infer the most likely spike train given the fluorescence. The code that implements this method in MATLAB is publicly available and we have tested it with our data. The computational complexity of this method is comparable to ours. The output of this algorithm is a probability between 0 and 1 of having a spike in a given time frame. Thresholding this probability vector is how we decide if the neuron has been activated in a given time frame. The lower the threshold, the higher the detection rate, but this also increases the false positive rate. Figure 10 presents receiver operating characteristic (ROC) curves in order to compare our algorithm (FRI) and the fast nonnegative deconvolution technique with surrogate data. We have also included simulation results for two other standard algorithms, derivative-thresholding and filter and thresholding. The latter method filters the fluorescence sequence with a derivative of a Gaussian filter in order to smooth the noise and detect spikes. All four methods have a thresholding stage  to infer the spike train. A lower threshold provides a higher success rate but with the penalty of having more false positives. The simulations have been performed with the same spike train we generated to obtain the performance measurements in figure 9 and with the same realization of the noise in all four methods. We present the results for two different levels of noise. The two axis of the ROC curves are unitless as they present a ratio between true positive or negative samples and obtained positive or negative samples. The surrogate data contains 1000 true spikes and 13 587 samples (2000 s/T res ). Thus an operating point with a false positive rate of 0.01 and a true positive rate of 0.9 correctly detects 900 spikes but throws 126 false positives. It can be observed that the FRI algorithm presents better performances although it has to be noted that the fast deconvolution algorithm is faster. The time required to process a 13 600 points stream (which corresponds to the 2000 s stream of surrogate data in figure 10(a)) is around 3.85 s for the fast deconvolution algorithm and around 23.64 s for the FRI algorithm.
With real data, FRI achieves a success rate of 83.9% (52 trues spikes correctly detected out of 62) with only nine false positives. To achieve similar success rates on the same data with the fast deconvolution method, we obtain more than 100 false positives, this is more false positives than true spikes. Derivative-thresholding presents more than 200 false positives for a success rate of 83.9% and filter and threshold more than 110 false positives.

Discussion
We have presented a novel spike inference technique based on FRI theory. Spikes are detected from calcium transients in fluorescence measurements. To do this, the existing FRI framework has been extended to a new class of signals that is formed by a stream of decaying exponentials. The data obtained in this type of measurements presents low temporal resolution and is corrupted with noise. To overcome these limitations we propose a sequential non-iterative algorithm that is able to detect spikes in real-time. The proposed algorithm achieves very high success rates with a low number of false positives. These promising results are a direct consequence of the fact that the fluorescence sequence can be parametrized as a signal recoverable in the FRI setup. FRI guarantees that the recovered signal is within a specific model, and this strong prior is what makes this algorithm very effective.
Techniques for spike train inference from two-photon imaging have begun attracting substantial attention in recent years due to the promise of being able to monitor spike trains from large numbers of localized neurons simultaneously. Improvements in acquisition techniques and increasing temporal resolution demand efficient spike inference algorithms to process all this information. Our algorithm is fast and parallelizable, and is thus well-suited to this context. which is valid for any value of t. Setting t = 0, we have d m,0 = ( n e −α m n ψ (n)) −1 . Note that the summation is finite because ψ (t ) is of compact support. For each m we can then compute d m,n for any n as d m,n = e α m n d m,0 .

A.2. Exponential moments of a stream of Diracs
We define the exponential moments of a signal x(t ) as

A.3. Filtering a stream of decaying exponentials
Let We know from (2.7) that filtering signal c(t ) with a filter with impulse response h(t ) = ϕ(−t/T ) and taking samples at regular intervals t = nT can be expressed as y n = c(t ), ϕ( t T − n) . Replacing c(t ) by x(t ) * ρ α (t ) and denoting with ϕ n,T (t ) the function ϕ( t T − n) then leads to: y n = x(t ) * ρ α (t ), ϕ n,T (t ) which is also illustrated in figure A1.

A.4. Computing weighted finite differences of the samples
We now show that filtering signal c(t ) = x(t ) * ρ α (t ) = K k=1 δ(t − t k ) * A e −αt u(t ) with ϕ(−t/T ) and computing samples z n = y n − y n−1 e −αT is analogous to sampling the stream of Diracs x(t ) with a different kernel ψ (−t/T ). The weighted differences can be written as In the second part of this inner product we can recognize an expression which is similar to the Fourier transform of a first order E-Spline,β α (w) = 1−e α− jw jw−α . If we consider β −αT (−wT ) = 1−e −αT + jwT T (α− jw) it follows that If we name ψ (t ) = β αT (−t ) * ϕ(t ), the expression in (A.14) shows that samples z n are equivalent to sampling the stream of Diracs x(t ) with ψ (−t/T ).