Computational processing of neural recordings from calcium imaging data

Electrophysiology has long been the workhorse of neuroscience, allowing scientists to record with millisecond precision the action potentials generated by neurons in vivo. Recently, calcium imaging of fluorescent indicators has emerged as a powerful alternative. This technique has its own strengths and weaknesses, as well as unique data processing problems and interpretation confounds. Here we review the computational methods that convert raw calcium movies to estimates of single-neuron spike times with minimal human supervision. By computationally addressing the weaknesses of calcium imaging, these methods hold the promise of significantly improving data quality. We also introduce a new metric to evaluate the output of these processing pipelines, based on the cluster isolation distance routinely used in electrophysiology.


Introduction
Calcium imaging is a technique for recording neural activity with calcium-dependent fluorescent sensors. It produces movies of neural activity at typical rates of 1-100 Hz [1,2]. The optical nature of the technique makes it quite versatile: it can be used to record activity from thousands of neurons [3,4], to record from sub-cellular structures such as spines and boutons [5], and to track the same cells chronically over long periods of time [6]. Imaging, as opposed to electrophysiology, enables precise spatial localization of cells in tissue, which is useful for analyzing the relationship between activity and cell location, for guiding patch pipettes to a cell, and for assigning cell type information unambiguously to the recorded cells [7]. Precise spatial localization also enables post hoc characterizations in situ by immunohistology [8], cell-attached/whole-cell recordings [9,10], or electron-microscopy [11].
In addition to these strengths, calcium imaging has weaknesses that can be partially addressed computationally. Its main disadvantage is that the recorded fluorescence is only an indirect measure of neural spiking. The fluorescence of a cell imperfectly reflects the average activation of the calcium sensor, which in turn imperfectly reflects the average calcium concentration over the past several hundred milliseconds, which in turn imperfectly reflects the number of spikes fired by the cell over a few tens of milliseconds [5]. Other disadvantages of the method include its sensitivity to brain motion and the contamination of somatic signals with out-of-focus fluorescence, both of which can be partially corrected computationally.
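This cascade from spikes to fluorescence is often approximated as a linear convolution of the spike train with a fast-rising, slowly decaying kernel. As a toy illustration only (not a calibrated sensor model; the time constants and noise level below are made up for the demo), a fluorescence trace can be simulated from a binned spike train:

```python
import numpy as np

def simulate_fluorescence(spikes, decay_tau=1.0, rise_tau=0.1, dt=0.033,
                          noise_sd=0.05, seed=0):
    """Toy forward model: binned spike counts -> calcium -> fluorescence.

    Each spike adds a unit of calcium that rises quickly (rise_tau) and
    decays slowly (decay_tau); Gaussian noise stands in for shot and
    readout noise. All parameters are illustrative, not sensor-calibrated.
    """
    t = np.arange(0, 5 * decay_tau, dt)
    kernel = np.exp(-t / decay_tau) - np.exp(-t / rise_tau)
    kernel /= kernel.max()
    calcium = np.convolve(spikes, kernel)[: len(spikes)]
    rng = np.random.default_rng(seed)
    return calcium + noise_sd * rng.standard_normal(len(spikes))

# usage: ~30 Hz sampling, a few spikes
spikes = np.zeros(300)
spikes[[50, 120, 125, 200]] = 1
f = simulate_fluorescence(spikes)
```

Inverting such a forward model is exactly the deconvolution problem reviewed below; the point of the sketch is that the mapping from spikes to fluorescence is smooth, noisy and lossy.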
We review here the major components of a standard data processing pipeline for calcium imaging: motion registration, ROI extraction, spike deconvolution and quality control (Figure 1). For an in-depth review of the technical and biological aspects of calcium imaging, see [2].

Motion correction
Brain motion during calcium imaging is unavoidable; even in anesthetized animals, the heartbeat induces motion. In awake animals, brain motion can be much larger due to the animal's motor behaviors. Motion along the Z-axis is usually ignored, because it leads to smaller changes in the signal than X/Y motion, owing to the shape of the two-photon point spread function: ~8x more elongated in Z than in X/Y [1,13]. Therefore, most methods only address 2D motion in the X/Y plane, but see the end of this section for a discussion of Z movements and 3D motion correction. The X/Y motion registration can either be rigid, usually a 2D translation [14], or non-rigid, which allows different parts of the image to translate by different amounts [15,16]. The choice of rigid vs non-rigid 2D registration is often informed by the acquisition speed of a single frame. To understand why, first observe that the lines in a frame are acquired sequentially [1,14]. Motion that happens during the acquisition of a single frame therefore affects the later lines in that frame, and appears as a non-rigid stretch or shear along the Y axis, even if the underlying physical motion is approximately rigid [14] (Figure 2a). In contrast, when a single frame is acquired quickly (<50 ms), the influence of motion across the frame will be relatively uniform, and a rigid correction method can give good results.
Rigid motion is often estimated by computing the cross-correlation between an imaging frame and a pre-determined target image [17], although alternative methods exist, such as algorithms based on particle tracking [18]. In registration pipelines that use cross-correlation, the target can be defined as the average over a random subset of frames. However, unless the frames are initially well-aligned, this averaging can blur the target image, which impairs registration because the blur has the spatial scale of the features used for registration (Figure 1a). Alternatively, one can bootstrap the computation of the target frame by refining it iteratively from a subset of frames, considering only those frames that are well-aligned at each iteration [15]. To increase the accuracy of rigid registration, it is also useful to pre-whiten the Fourier spectrum of the images, so that the low frequencies do not contribute disproportionately to the estimation. The resulting method is known as "phase correlation"; it is more accurate than cross-correlation and can also be used to compute sub-pixel motion estimates [15,19,20].
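As a concrete sketch of phase correlation, the following minimal NumPy version normalizes the cross-power spectrum to unit magnitude (the whitening step described above) and reads the rigid shift off the peak of its inverse transform. Real registration pipelines add sub-pixel interpolation, spectral smoothing and batching; this is only the core idea.

```python
import numpy as np

def phase_correlation_shift(frame, target, eps=1e-6):
    """Estimate the rigid (dy, dx) shift of `frame` relative to `target`.

    The cross-power spectrum is divided by its own magnitude (spectral
    whitening), so all spatial frequencies contribute equally; the argmax
    of its inverse FFT gives the integer shift.
    """
    F = np.fft.fft2(frame)
    T = np.fft.fft2(target)
    R = F * np.conj(T)
    R /= np.abs(R) + eps                      # whitening -> "phase correlation"
    corr = np.real(np.fft.ifft2(R))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts larger than half the image back to negative values
    if dy > frame.shape[0] // 2:
        dy -= frame.shape[0]
    if dx > frame.shape[1] // 2:
        dx -= frame.shape[1]
    return dy, dx

# usage: recover a known synthetic shift
rng = np.random.default_rng(0)
target = rng.standard_normal((64, 64))
frame = np.roll(target, (3, -5), axis=(0, 1))
```

Because the FFT is periodic, the recovered shift is only defined up to the image size, hence the wrap-around correction at the end.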
Over longer periods of time (tens of minutes), the imaged tissue may drift relative to the objective due to thermal fluctuations in various parts of the microscope, physiological changes in the brain, or postural adjustments of the animal [21]. Large Z-drift may also be observed on fast timescales if the animal engages in motor behaviors such as running or licking. Online Z-correction during imaging may be used to mitigate these concerns [22,23], but such corrections are not widely used. To help users diagnose 3D movement in their own data, we illustrate a few typical situations (Figure 2b-d). We recommend that future algorithms diagnose Z-drift explicitly and try to correct it.

Figure 1 (Stringer and Pachitariu). Pipeline for calcium imaging. Step 1. The frames of a movie are aligned to a common template, such as the mean of a few well-aligned frames. If registration is successful, many small structures will be visible in the mean image, such as the thin horizontal dendrite shown here or the many small dots that represent apical dendrites. If registration fails, the mean image will be blurrier because the frames are not well aligned to each other. See Figure 2 for diagnosing possible causes. Step 2. Regions of interest are segmented, typically by matrix-factorization methods that group together pixels with correlated activity. Shown are the boundaries of detected ROIs, superimposed on a "correlation map", formed by correlating each pixel with the pixels in its immediate surround, and normalized across the image by subtracting the morphological opening. Step 3. Spike deconvolution is performed on the neuropil-corrected traces that represent the average activity of pixels inside an ROI. The result is a trace the same length as the fluorescence trace, containing estimates of the number of spikes in every bin. Step 4. After the automated processing, the user can inspect the quality of the results, correct potential mistakes, and generally assess the quality of the data and algorithm. This step usually involves a graphical user interface. The ROIs in gray have been labelled as non-somatic by the user.

Cell detection and signal extraction
Identifying cellular compartments from a movie is a nontrivial problem, compounded by the relatively unclear definition of what constitutes a "good" ROI. By one definition, a somatic ROI is good if it looks like a clearly defined "donut" in the mean image [25,26] (Figure 3a). Donut shapes are expected because genetically-encoded sensors are typically expressed in the cytoplasm of a cell and not in the nucleus [5]. However, this criterion is clearly insufficient, because sparsely firing cells are not distinctly visible in the mean image. They are, however, visible in other types of 2D maps, such as the map of average correlation of each pixel with its nearby pixels [27,28] (Figure 3b and d), suggesting that their fluorescence only reflects the baseline calcium in these cells. These cells are either extremely sparsely firing or have unhealthy calcium dynamics, and they should be excluded from further analyses or treated carefully. Even if such cells do have sparse, low-amplitude fluorescence transients, these signals are dominated by the nearby out-of-focus fluorescence (termed "neuropil", [29,30]), which may obscure any somatic activity (Figure 3f and g).

Figure 2. Potential sources of 2D non-rigid motion. (a) Abrupt movements during line scanning can shift the cells acquired later in a frame, but are typically restricted to the XY plane. (b) Vertical Z-drift can bring cells in and out of focus. This can occur on slow timescales, for example due to thermal fluctuations in the microscope, or on fast timescales, for example due to the postural changes associated with running or the head movements associated with licking. (c) Rotation along the XY axes can result in opposite vertical movements for different parts of a frame, particularly if the field of view is very large, such as on a mesoscope [12]. If the center of rotation is well outside the field of view, such motion can also appear as uniform Z-drift. (d) Non-rigid tissue deformations are unlikely in practice, but may result from vaso-dilation, brain rehydration or other physiological processes. Such motion is generally the hardest to diagnose.
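A minimal version of the correlation map mentioned above (each pixel's time course correlated with the mean of its eight neighbors) can be sketched in a few lines. This sketch omits the normalization by morphological opening used in Figure 1, and lets edges wrap around, which is acceptable for a demo:

```python
import numpy as np

def correlation_map(movie):
    """Correlate each pixel's time course with the sum of its 8 neighbors.

    movie: array of shape (time, height, width). Returns a (height, width)
    map in which active structures (somata, dendrites) stand out even when
    they are invisible in the mean image.
    """
    m = movie - movie.mean(axis=0)
    m /= m.std(axis=0) + 1e-6                 # z-score each pixel over time
    # accumulate the 8 neighboring z-scored traces via shifted copies
    neigh = np.zeros_like(m)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neigh += np.roll(m, (dy, dx), axis=(1, 2))
    neigh /= neigh.std(axis=0) + 1e-6
    return (m * neigh).mean(axis=0)           # Pearson correlation per pixel

# usage: a 3x3 block of pixels sharing a common signal lights up
rng = np.random.default_rng(0)
signal = rng.standard_normal(200)
movie = 0.5 * rng.standard_normal((200, 16, 16))
movie[:, 5:8, 5:8] += signal[:, None, None]
cmap = correlation_map(movie)
```

Because the map is driven by shared temporal activity rather than mean brightness, a sparsely firing cell with even a handful of transients will appear, while a cell with only baseline fluorescence will not.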
For these reasons, most recent pipelines use activity-based criteria to segment cells, typically detecting ROIs from the activity correlations between pixels, using methods such as independent components analysis [31], matrix factorization [15,32,33], dictionary learning [34,35] and others [36-41]. With appropriate pre-processing, for example by dimensionality reduction [15,31,42] or stochastic smoothing [43], all of these methods can in principle find the ROIs with significant activity in the field of view. However, these methods differ in their signal extraction procedure, specifically the way in which they address overlapping neural signals. "Demixing" procedures such as constrained non-negative matrix factorization aim to extract the activity of cells given their approximate, overlapping spatial masks [33]. This is a model-based procedure because it requires a generative model for the signals, such as non-negativity, sparseness, or even a full generative model of the calcium dynamics. An alternative approach is to simply ignore overlapping pixels for the purposes of signal extraction, in which case the cell traces can be extracted as simple pixel averages, without making assumptions about the generative model of these signals [15].
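The model-free alternative is short enough to sketch directly: average the pixels inside the ROI, optionally discarding pixels shared with other ROIs instead of demixing them. The mask names below are illustrative, not from any particular pipeline:

```python
import numpy as np

def extract_trace(movie, roi_mask, overlap_mask=None):
    """Model-free signal extraction: average the pixels inside an ROI.

    movie: (time, height, width); roi_mask: boolean (height, width).
    Pixels shared with other ROIs can be excluded via overlap_mask rather
    than demixed, avoiding assumptions about the signals' generative model.
    """
    mask = roi_mask.copy()
    if overlap_mask is not None:
        mask &= ~overlap_mask                 # drop ambiguous shared pixels
    return movie[:, mask].mean(axis=1)

# usage: two in-ROI pixels, one of them shared with a neighboring ROI
movie = np.zeros((5, 4, 4))
movie[:, 1, 1] = np.arange(5)
movie[:, 1, 2] = np.arange(5) + 2
roi = np.zeros((4, 4), dtype=bool)
roi[1, 1] = roi[1, 2] = True
overlap = np.zeros((4, 4), dtype=bool)
overlap[1, 2] = True
trace = extract_trace(movie, roi)
```

The design trade-off is explicit: dropping shared pixels wastes some signal, but the resulting trace is a plain average whose failure modes are easy to reason about.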
Demixing approaches are particularly problematic when there is a mismatch between the data and the model. This is often the case in practice because our understanding of the data is incomplete. For example, out-of-focus fluorescence (or neuropil) contributes substantially to the recorded signals [29,30] due to tissue-induced optical distortions and scattering [13,44]. The neuropil reflects the averaged activity of a very large number of axons and other neurites [45]. Although the presence of neuropil contamination is clear in recorded data, it is unclear to what degree it influences the signals at the soma. Some analyses suggest the contamination coefficient is relatively close to 1, similar to its value just outside the soma [15,29]. In addition, the timecourse of the neuropil changes over spatial distances of a few hundred microns [45,46]. Therefore, the neuropil contamination cannot be assumed to be constant over space, or one-dimensional, which some pipelines do by default [33]. The motion of the tissue in the Z direction, which changes the shapes of the ROIs, is another source of model mismatch, which we discuss in the next section.
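Neuropil correction itself is commonly implemented as subtraction of a scaled surround trace. A sketch follows, with two caveats: the contamination coefficient is debated (estimates from roughly 0.7 up to the near-1 values cited above appear in the literature, so the default here is an assumption to tune, not a recommendation), and estimating a good surround trace, typically from an annulus of pixels excluding other cells, is the hard part that this sketch skips:

```python
import numpy as np

def neuropil_correct(f_roi, f_neuropil, coeff=0.7):
    """Subtract a scaled neuropil trace from an ROI trace.

    f_neuropil is typically the average of surrounding pixels that exclude
    other detected cells. The coefficient is an assumption, not a measured
    constant; values near 1 have also been reported.
    """
    return np.asarray(f_roi) - coeff * np.asarray(f_neuropil)

# usage: a somatic transient riding on a scaled neuropil signal
neuropil = np.sin(np.linspace(0, 10, 100))
soma = np.zeros(100)
soma[40] = 1.0                                # one isolated transient
measured = soma + 0.7 * neuropil
corrected = neuropil_correct(measured, neuropil, coeff=0.7)
```

Because the coefficient enters linearly, an error in it leaks a scaled copy of the neuropil into every downstream analysis, which is why the spatial non-uniformity discussed above matters.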
Algorithms that detect cells in calcium imaging movies by different activity-based strategies generally perform similarly (see http://neurofinder.codeneuro.org/, but note the significant caveats on how the ground truth is defined [15]). However, once the cells are found, the signal extraction procedures vary considerably between methods, with some of them potentially biased by their model-based assumptions. We recommend signal extraction approaches that assume the least about the underlying model; in their simplicity, these approaches are easier to interpret and to diagnose for potentially crucial biases that would affect all subsequent analyses.

Spike deconvolution
Fluorescence is not a direct measure of spiking, in particular due to the slow kinetics of currently available calcium sensors. The sensors activate quickly after an action potential (50-200 ms) but deactivate slowly (500 ms to 2 s). Thus, the resulting activity traces appear to have been smoothed (or "convolved") in time with an approximately exponential kernel. The traces can be "deconvolved" through an inversion process to approximately recover the spike times. Deconvolution is clearly a computational problem, and historically one of the first in calcium imaging to be addressed [49-52].
We refer to "spike deconvolution" as any algorithm that transforms the raw calcium trace of a single neuron into a trace of estimated spikes of the same temporal length [53-60]. Typical results are illustrated in Figure 4a-c for a simple algorithm [48,50,57], which we will refer to as non-negative deconvolution (NND). Because spike trains are imperfectly reconstructed, the best estimates may be non-integer quantities (Figure 4c), although the true spike trains are always discrete sequences of action potentials (Figure 4b).
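To make the NND objective concrete, here is a deliberately naive solver that builds the full convolution matrix and hands it to a generic non-negative least-squares routine. It illustrates the optimization problem, not the OASIS algorithm [48], which solves the same problem in linear time by exploiting its structure:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.optimize import nnls

def nonneg_deconv(fluo, tau=1.0, dt=0.033):
    """Naive non-negative deconvolution with an exponential kernel.

    Solves min_s ||K s - f||^2 subject to s >= 0, where K convolves the
    spike train s with exp(-t / tau). Building the dense T x T matrix is
    O(T^2), so this is only practical for short demo traces.
    """
    T = len(fluo)
    kernel = np.exp(-np.arange(T) * dt / tau)
    K = toeplitz(kernel, np.zeros(T))         # lower-triangular convolution
    s, _ = nnls(K, fluo)
    return s

# usage: recover a noise-free synthetic spike train exactly
true_spikes = np.zeros(150)
true_spikes[[20, 60, 61, 120]] = [1.0, 0.5, 1.5, 2.0]
kernel = np.exp(-np.arange(150) * 0.033 / 1.0)
fluo = np.convolve(true_spikes, kernel)[:150]
est = nonneg_deconv(fluo)
```

On noisy data the estimates become the non-integer amplitudes discussed above; the non-negativity constraint is what suppresses the ringing that plain linear deconvolution would produce.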
An incorrect interpretation of deconvolved spike trains is that they represent the "firing rates" or "probabilities" of neural spiking; they do not. Instead, they represent a time-localized and noisy estimate of the calcium influx into the cell. The calcium influx events may vary in size from spike to spike (Figure 4e and f). Therefore, a deconvolution algorithm that estimates event amplitudes cannot perfectly reconstruct a spike train even when the exact spike times are known from simultaneous electrophysiology [47] (Figure 4d and g). This variability in spike-evoked amplitudes in turn places an upper bound on blind spike detection accuracy. The upper bound is saturated when performance is evaluated in bins of 320 ms or larger (Figure 4g). In smaller bins, the deconvolution does not saturate the upper performance bound (Figure 4g), which appears to be due to variability in the timing of the deconvolved spike times on the order of ~100 ms (Figure 4h). This timing variability may be either inherent in the data or the result of an imperfect deconvolution model. If the latter were true, then it would be possible to improve performance with more complex models. However, in a recent spike detection challenge, complex methods provided few advantages over simpler methods, even when evaluated in short bins of 40 ms [61]. The top six methods were submitted by different teams, using a variety of approaches, yet they performed statistically indistinguishably from each other on "within-sample" data (training and testing data taken from the same ground truth dataset). On "out-of-sample" data (without simultaneous electrophysiology), some of the complex methods performed worse than simple non-negative deconvolution, and some of them even introduced unwanted biases [57].

Figure 4. The limits of spike deconvolution. (a) Recorded fluorescence trace in blue, and reconstructions from spike deconvolution (red) and from direct regression on spike times (yellow). (b) Ground truth spike times recorded by simultaneous electrophysiology [47]. (c) Spike deconvolution result, and the correlation with ground truth in bins of 10, 40 and 160 ms (non-negative, using the OASIS implementation [48]). (d) Regression on ground truth spike times to obtain "optimal" amplitudes. (e) Fluorescence traces aligned to spike times and baseline-subtracted at zero time lag. (f) Variability of single-spike "optimal" amplitudes from ground-truth regression sets the limit of possible spike deconvolution performance. (g) Correlation of binned ground truth spike trains with deconvolved and "optimal" amplitudes. At large bin sizes, deconvolution saturates the maximal possible performance. (h) Failures of deconvolution at small bin sizes correspond to ambiguities of spike timing on the order of 100 ms, reflected in the shape of the cross-correlogram between deconvolved spike trains and ground truth.
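The bin-size dependence of these comparisons comes from a simple evaluation procedure: sum both the deconvolved and ground-truth trains into time bins, then correlate. A sketch (the bin sizes and the 2-sample timing jitter below are illustrative, not taken from the benchmarks):

```python
import numpy as np

def binned_correlation(est, truth, bin_size):
    """Pearson correlation of two spike trains after summing into time bins.

    est, truth: same-length 1D arrays (e.g. deconvolved amplitudes and
    ground-truth spike counts); bin_size is in samples. Larger bins forgive
    small timing errors, which is why reported accuracy depends strongly
    on the bin width.
    """
    n = (len(est) // bin_size) * bin_size
    a = est[:n].reshape(-1, bin_size).sum(axis=1)
    b = truth[:n].reshape(-1, bin_size).sum(axis=1)
    return np.corrcoef(a, b)[0, 1]

# usage: a 2-sample timing jitter is forgiven by 10-sample bins
rng = np.random.default_rng(1)
truth = np.zeros(1000)
truth[rng.choice(997, size=60, replace=False)] = 1.0
est = np.roll(truth, 2)                       # every spike detected 2 samples late
```

Even a perfect detector with a small, systematic timing error scores near zero at single-sample resolution and near one in coarse bins, mirroring the saturation behavior described above.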
Given the negligible improvements offered by complex methods and the potential biases they introduce, we advocate for using the simplest method available: non-negative deconvolution, with an exponential kernel of decay timescale appropriate to the sensor used, implemented with the fast OASIS solver [48]. Our analysis also suggests that identification of spike times with better than 100 ms precision may not be achievable with the GCaMP6 sensors on single trials (Figure 4g and h). We also note that spike deconvolution is not necessary, and should be avoided, when the raw fluorescence trace provides sufficient information. However, when timing information is important, deconvolution significantly increases data quality, as measured by metrics of signal-to-noise ratio [57].

Quality control
Last but not least, quality control is an essential component of a data processing pipeline. It allows the user to scrutinize the results of an algorithm and correct any mistakes or biases. This process benefits from a well-designed graphical interface, in combination with metrics that evaluate the success of registration, cell detection and spike deconvolution. Development in this area is lacking (but see [15]), suggesting an opportunity for the community to develop a common graphical interface and common metrics for visualizing and evaluating the results of multiple algorithms. Such a framework has emerged for spike sorting [62], which may serve as a template. In calcium imaging, commonly used metrics are simple anatomical measures (e.g. the shape and size of an ROI) and functional measures (e.g. the variance and skewness of the signal). However, other metrics may be more informative, such as the "isolation distance" of a cell's signal from the surrounding signals (Figure 3f and g), similar to metrics used in spike sorting [63]. This metric may be used as a control to ensure that ROIs with special functional properties are not poorly isolated units that largely mirror the neuropil activity.
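As a sketch of how the isolation distance from spike sorting [63] might be adapted, assume each pixel's time course has already been reduced to a short feature vector (for instance, projections onto a few principal components; the feature choice and the cluster/background split below are hypothetical, for illustration only):

```python
import numpy as np

def isolation_distance(cluster_feats, other_feats):
    """Isolation distance of a cluster of feature vectors, as in spike sorting.

    With n points in the cluster, returns the squared Mahalanobis distance
    (under the cluster's own covariance) of the n-th nearest outside point;
    larger values mean better isolation. Here "outside" points could be the
    surrounding neuropil pixels of an ROI.
    """
    n = len(cluster_feats)
    mu = cluster_feats.mean(axis=0)
    cov = np.cov(cluster_feats, rowvar=False)
    icov = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    diff = other_feats - mu
    d2 = np.einsum('ij,jk,ik->i', diff, icov, diff)  # squared Mahalanobis
    if n > len(d2):
        return np.inf
    return np.sort(d2)[n - 1]

# usage: a well-separated ROI scores higher than an overlapping one
rng = np.random.default_rng(0)
roi = rng.standard_normal((50, 3))
far_bg = rng.standard_normal((200, 3)) + 8.0
near_bg = rng.standard_normal((200, 3)) + 0.5
```

An ROI whose features are barely separable from its surround would score low, flagging it as a candidate unit that largely mirrors the neuropil.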

Specialized applications
We mention here several specialized applications of computational methods to specific types of imaging. For instance, one-photon miniscopes extend the calcium imaging method to freely moving animals [64,65]. One-photon acquisition collects much more out-of-focus fluorescence than two-photon imaging, necessitating specialized computational approaches [36,66,67]. Recordings of voltage or glutamate sensors also typically employ one-photon imaging, but these sensors have relatively low sensitivity [68,69], making spike detection highly noise-limited. Advanced computational methods may in principle detect spikes more robustly in such low-SNR recordings [70].
Other computational problems arise for calcium imaging of non-somatic compartments, such as boutons, spines, axons and dendrites [15,71]. These types of ROIs can usually be detected with standard activity-based methods, but benefit from more careful post-processing, for example in determining which neurites belong together as part of the same neuron, or in determining the contribution of the backpropagating action potential to the calcium levels in spines [5,72]. Online segmentation during an experiment can also be useful, and some algorithms are specialized to process incoming data sequentially [73]. Specialized methods to track the same cells across days have also been developed [74], as well as specialized techniques for subtracting the contribution of out-of-focus fluorescence [46]. Finally, specialized demixing algorithms have been developed for specific microscopes, such as the stereoscopic "vTwins" [75] or the tomography-based "SLAPMi" [76].

Conclusions
Here we reviewed current progress in calcium imaging data analysis, focusing on four processing steps: motion registration, ROI detection, spike deconvolution and quality control. These steps are highly modular; nonetheless, several software packages exist that provide full analysis pipelines [15,38,77,78]. Of these four steps, quality control has received the least attention, even though it is of primary interest to experimental users. Motion registration has also been relatively ignored, particularly in the Z dimension, where drift can change the functional signals over time, or even as a function of instantaneous motor behaviors [21]. ROI detection and spike deconvolution have received the most attention, with several advanced methods producing high quality outputs. We propose that the systematic biases of these methods should now be scrutinized more than their absolute performance in benchmarks, because these biases can result in incorrect scientific interpretations.
We have not focused here on computational efficiency, but we note that most recent pipelines have been designed to be relatively fast and efficient on typical fields of view containing 100-200 cells. Some pipelines are even fast enough to process data from ~10,000 simultaneously recorded cells on a consumer-level workstation [15]. Instead, our main focus in this review has been on bias and interpretability. Having used and developed these algorithms on our own data, we find ourselves often unwilling to use a complex method for potentially small gains in performance, particularly if the method may introduce additional confounds, known or unknown. Simple yet effective and unbiased tools are badly needed, as evidenced by the slow adoption of existing tools despite their intensive development. For example, only 11 out of 143 citations to the most popular software package (CNMF/CaImAn [33,77]) actually used the pipeline, with another 12 using only the spike deconvolution step. On the other hand, 48 of these citations are from other computational methods papers, suggesting that the computational field is vibrant, and would benefit from guidance and input from experimental collaborators.

Conflicts of interest statement
Nothing declared.

26. Apthorpe N et al.: Automatic neuron detection in calcium imaging data using convolutional networks. Advances in Neural Information Processing Systems 2016:3270-3278.
This paper demonstrates a supervised approach to cell detection based on convolutional neural networks fit to manually segmented data. The neural networks can be fit using 2D images of the field of view, or 2+1D sequences of images in which the third dimension is a significantly downsampled temporal sequence obtained after a series of max and average operations. Surprisingly, there was no quantitative improvement from using the 2+1D data, which might be due to the biases of the human curators making the testing datasets. The method is available in Python.

33. Pnevmatikakis EA et al.: Simultaneous denoising, deconvolution, and demixing of calcium imaging data. Neuron 2016, 89:285-299.
This paper introduced CNMF, a pipeline that simultaneously detects cells and deconvolves their spiking activity based on a generative model of the raw data. The method showed significant detection improvements compared to a major previous algorithm (ICA, Ref. [31]), as well as reduced biases in situations where the signals of cells overlap. The same algorithm is used in the CaImAn pipeline [77], available for both Matlab and Python.