CaImAn: An open source tool for scalable Calcium Imaging data Analysis

Advances in fluorescence microscopy enable monitoring larger brain areas in vivo with finer time resolution. The resulting data rates require reproducible analysis pipelines that are reliable, fully automated, and scalable to datasets generated over the course of months. Here we present CaImAn, an open-source library for calcium imaging data analysis. CaImAn provides automatic and scalable methods to address problems common to pre-processing, including motion correction, neural activity identification, and registration across different sessions of data collection. It does this while requiring minimal user intervention, with good performance on computers ranging from laptops to high-performance computing clusters. CaImAn is suitable for two-photon and one-photon imaging, and also enables real-time analysis on streaming data. To benchmark the performance of CaImAn we collected a corpus of ground truth annotations from multiple labelers on nine mouse two-photon datasets. We demonstrate that CaImAn achieves near-human performance in detecting locations of active neurons.


Introduction
Understanding the function of neural circuits is contingent on the ability to accurately record and modulate the activity of large neural populations. Optical methods based on the fluorescence activity of genetically encoded calcium binding indicators (Chen et al., 2013) have become a standard tool for this task, due to their ability to monitor in vivo targeted neural populations from many different brain areas over extended periods of time (weeks or months). Advances in microscopy techniques facilitate imaging larger brain areas with finer time resolution, producing an ever-increasing amount of data. A typical resonant scanning two-photon microscope produces data at a rate greater than 50GB/hour¹, a number that can be significantly higher (up to more than 1TB/hour) with other custom recording technologies (Sofroniew et al., 2015).
This increasing availability and volume of calcium imaging data calls for automated analysis methods and reproducible pipelines to extract the relevant information from the recorded movies, i.e., the locations of neurons in the imaged Field of View (FOV) and their activity in terms of raw fluorescence.
¹ Calculation performed on a 512×512 FOV imaged at 30Hz producing an unsigned 16-bit integer for each measurement.
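As a quick sanity check on the footnote's figure, the data rate for a 512×512 FOV at 30 Hz with 16-bit samples can be computed directly (a minimal sketch; the numbers come from the footnote above):

```python
# Data-rate sanity check for the footnote: 512x512 FOV, 30 Hz, uint16 samples.
height, width = 512, 512          # pixels per frame
bytes_per_sample = 2              # unsigned 16-bit integer
frame_rate_hz = 30                # frames per second

bytes_per_frame = height * width * bytes_per_sample
bytes_per_hour = bytes_per_frame * frame_rate_hz * 3600
gb_per_hour = bytes_per_hour / 1e9   # decimal gigabytes

print(f"{gb_per_hour:.1f} GB/hour")  # about 56.6 GB/hour, i.e. greater than 50 GB/hour
```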
Overview of analysis pipeline
The standard analysis pipeline for calcium imaging data used in CAIMAN is depicted in Fig. 1a. The data in movie format is first processed to remove motion artifacts. Subsequently the active components (neurons and background) are extracted as individual pairs of a spatial footprint that describes the shape of each component projected to the imaged FOV, and a temporal trace that captures its fluorescence activity (Fig. 1b-d). Finally, the neural activity of each fluorescence trace is deconvolved from the dynamics of the calcium indicator. These operations can be challenging because of the limited axial resolution of 2-photon microscopy (or the much larger integration volume in one-photon imaging), which results in spatially overlapping fluorescence from different sources and neuropil activity. Before presenting the new features of CAIMAN in more detail, we briefly review how it incorporates existing tools in the pipeline.
Motion Correction
CAIMAN uses the NORMCORRE algorithm, which corrects non-rigid motion artifacts by estimating motion vectors with subpixel resolution over a set of overlapping patches within the FOV. These estimates are used to infer a smooth motion field within the FOV for each frame. For two-photon imaging data this approach is directly applicable, whereas for one-photon micro-endoscopic data the motion is estimated on high-pass spatially filtered data, a necessary operation to remove the smooth background signal and create enhanced spatial landmarks. The inferred motion fields are then applied to the original data frames.
Source Extraction
Source extraction is performed using the constrained non-negative matrix factorization (CNMF) framework of Pnevmatikakis et al. (2016), which can extract components with spatially overlapping projections (Fig. 1b).
After motion correction the spatio-temporal activity of each source can be expressed as a rank-one matrix given by the outer product of two components: a component in space that describes the spatial footprint (location and shape) of each source, and a component in time that describes the activity trace of the source (Fig. 1c). The data can be described by the sum of all the resulting rank-one matrices together with an appropriate term for the background and neuropil signal and a noise term (Fig. 1d). For two-photon data the neuropil signal can be modeled as a low rank matrix (Pnevmatikakis et al., 2016). For microendoscopic data the larger integration volume leads to more complex background contamination (Zhou et al., 2018); therefore, a more descriptive model is required (see Methods and Materials (Mathematical model of the CNMF framework) for a mathematical description). CAIMAN BATCH embeds these approaches into a general algorithmic framework that enables scalable automated processing with improved results in terms of quality and processing speed.
Figure 1 (caption, continued): The algorithm represents the movie as the sum of rank-one spatio-temporal components capturing neurons and processes, plus additional non-sparse low-rank terms for the background fluorescence and neuropil activity. (e) Flow-chart of the CAIMAN BATCH processing pipeline. From left to right: Motion correction and generation of a memory efficient data format. Initial estimate of somatic locations in parallel over FOV patches using CNMF. Refinement and merging of extracted components via seeded CNMF. Removal of low quality components. Final domain-dependent processing stages. (f) Flow-chart of the CAIMAN ONLINE algorithm. After a brief mini-batch initialization phase, each frame is processed in a streaming fashion as it becomes available. From left to right: Correction for motion artifacts.
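The rank-one decomposition described above can be sketched with synthetic data (a toy numpy example; the footprint/trace shapes are illustrative, not CaImAn's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, K = 100, 50, 3              # pixels, frames, number of sources

A = rng.random((d, K))            # spatial footprints (one column per source)
C = rng.random((K, T))            # temporal activity traces (one row per source)
b = rng.random((d, 1))            # low-rank background: spatial component
f = rng.random((1, T))            # low-rank background: temporal component
E = 0.01 * rng.standard_normal((d, T))  # noise term

# Movie as a sum of rank-one source matrices plus background and noise
Y = sum(np.outer(A[:, k], C[k, :]) for k in range(K)) + b @ f + E

# The same sum of outer products is just the matrix product A @ C
assert np.allclose(Y, A @ C + b @ f + E)
```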
Estimate of activity from existing neurons, identification and incorporation of new neurons. Periodically, the spatial footprints of inferred neurons are updated (dashed lines).
CAIMAN ONLINE extends this streaming algorithm by introducing a number of algorithmic features and a CNN-based component detection approach, leading to a major performance improvement. We now present the new methods introduced by CAIMAN. More details are given in Methods and Materials, and pseudocode descriptions of the main routines are given in the Appendix.
Batch processing of large scale datasets on standalone machines
The batch processing pipeline mentioned above can become a computational bottleneck when tackled without customized solutions. For instance, a naive approach to the problem might have as a first step to load the full dataset in memory; this approach is not scalable as datasets typically exceed available RAM (and extra memory is required by any analysis pipeline). To limit memory usage, as well as computation time, CAIMAN BATCH relies on a MapReduce approach (Dean and Ghemawat, 2008). Unlike previous work (Freeman et al., 2014), CAIMAN BATCH assumes minimal computational infrastructure (down to a standard laptop computer), is not tied to a particular parallel computation framework, and is compatible with HPC scheduling systems like SLURM (Yoo et al., 2003).
Naive implementations of motion correction algorithms either need to load the full dataset in memory or are constrained to process one frame at a time, preventing parallelization. Motion correction is parallelized in CAIMAN BATCH without significant memory overhead by processing several temporal chunks of the video data on different CPUs. CAIMAN BATCH broadcasts to each CPU a meta-template, which is used to align all the frames in the chunk. Each process writes in parallel to the target file containing motion-corrected data, which is stored as a memory-mapped array.
This allows arithmetic operations to be performed against data stored on the hard drive with minimal memory use, and slices of data to be indexed and accessed without loading the full file in memory. More details are given in Methods and Materials (Memory mapping).
Similarly, the source extraction problem, especially in the case of detecting cell bodies, is inherently local, with a neuron typically appearing in a neighborhood within a small radius from its center of mass (Fig. 2a). Exploiting this locality, CAIMAN BATCH splits the FOV into a set of spatially overlapping patches, which enables the parallelization of the CNMF (or any other) algorithm to extract the corresponding set of local spatial and temporal components. The user specifies the size of the patch, the amount of overlap between neighboring patches, and the initialization parameters for each patch (number of components and background rank for CNMF, stopping criteria for CNMF-E). Subsequently the patches are processed in parallel by the CNMF/CNMF-E algorithm to extract the components and neuropil signals from each patch.
Apart from harnessing memory and computational benefits due to parallelization, processing in patches acts indirectly as a dynamic range equalizer and enables CAIMAN BATCH to detect neurons across the whole FOV, a feature absent in the original CNMF, where areas with high absolute fluorescence variation tend to be favored. This results in better source extraction performance. After all the patches have been processed, the results are embedded within the FOV (Fig. 2a), and the overlapping regions between neighboring patches are processed so that components corresponding to the same neuron are merged. The process is summarized in algorithmic format in Alg. 1 and more details are given in Methods and Materials (Combining results from different patches).
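The memory-mapping and patch-splitting strategy described above can be sketched as follows (a simplified numpy illustration; the file name, patch size, and overlap are made-up parameters, not CaImAn's defaults):

```python
import numpy as np

# Write a small synthetic movie to disk as a memory-mapped array,
# so later steps never need to load the full file in RAM.
d1, d2, T = 64, 64, 100
movie = np.memmap("movie.mmap", dtype=np.float32, mode="w+", shape=(d1, d2, T))
movie[:] = np.random.default_rng(0).random((d1, d2, T), dtype=np.float32)
movie.flush()

# Re-open read-only and slice out spatially overlapping patches;
# each patch could then be handed to an independent worker process.
data = np.memmap("movie.mmap", dtype=np.float32, mode="r", shape=(d1, d2, T))
patch, overlap = 32, 8          # illustrative sizes, not CaImAn defaults
patches = []
for i in range(0, d1 - overlap, patch - overlap):
    for j in range(0, d2 - overlap, patch - overlap):
        # Slicing a memmap reads only the touched bytes from disk.
        patches.append(data[i:i + patch, j:j + patch, :])

print(len(patches))  # number of overlapping patches covering the FOV
```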

Initialization methods for matrix factorization problems can impact results due to the non-convex nature of their objective function. CAIMAN BATCH provides an extension of the GREEDYROI method used in Pnevmatikakis et al. (2016), which detects neurons based on localized spatiotemporal activity. CAIMAN BATCH can also be seeded with binary masks obtained from different sources, e.g., through manual annotation or segmentation of a structural channel (SEEDEDINITIALIZATION, Alg. 2). More details are given in Methods and Materials (Initialization strategies).
Figure 2. Parallelized processing and component quality assessment for CAIMAN BATCH. (a) Illustration of the parallelization approach used by CAIMAN BATCH for source extraction. The data movie is partitioned into overlapping sub-tensors, each of which is processed in an embarrassingly parallel fashion using CNMF, either on local cores or across several machines in an HPC. The results are then combined. (b) Refinement after combining the results can also be parallelized both in space and in time. Temporal traces of spatially non-overlapping components can be updated in parallel (top) and the contribution of the spatial footprints for each pixel can be computed in parallel (bottom).
The method uses a simple intersection over union metric to calculate the distance between different components.

Benchmarking against ground truth
To quantitatively evaluate CAIMAN we benchmarked its results against ground truth data.
Creating ground truth data through manual annotation
We collected manual annotations from multiple independent labelers who were instructed to find round or donut-shaped active neurons on 9 two-photon in vivo mouse brain datasets. The datasets were collected at various labs and from various brain areas (hippocampus, visual cortex, parietal cortex) using several GCaMP variants. A summary of the features of all the annotated datasets is given in Table 2. Details about the annotation procedure are given in Methods and Materials.
To address human variability in manual annotation, each dataset was labeled by 3 or 4 independent labelers, and the final ground truth dataset was created by having the different labelers reach a consensus over their disagreements (Fig. 3a). The result of this process was defined as the consensus ground truth. We believe that the current database, which will be made publicly available, presents a useful benchmarking resource.
Uncertainty quantification: By employing more than one human labeler we discovered a surprising level of disagreement between different annotators (see Table 1, Fig. 3b).

Manual annotations show a high degree of variability
We compared the performance of each human annotator against a consensus ground truth. The performance was quantified with a precision/recall framework, and the results of each individual labeler against the consensus ground truth for each dataset are given in Table 1. The range of human performance in terms of F1 score was 0.69-0.94, with average 0.83 ± 0.07 (mean ± STD). All annotators performed similarly on average (0.83±0.05, 0.83±0.08, 0.84±0.06, 0.85±0.08). We also ensured that the performance of labelers was stable across time (i.e., their learning curve plateaued; data not shown). As shown in Table 1 (see also Fig. 4b) the F1 score was never 1, and in most cases it was less than or equal to 0.9, demonstrating significant variability between annotators.
Figure 4 (caption, continued): Complete results with precision and recall for each dataset are given in Table 1. (c-d) Performance of CAIMAN BATCH increases with peak SNR. (c) Example scatter plot between SNRs of matched traces between CAIMAN BATCH and ground truth for dataset K53. False negative/positive pairs are plotted in green along the x- and y-axes respectively, perturbed as a point cloud to illustrate the density. Most false positive/negative predictions occur at low SNR values. Shaded areas represent thresholds above which components are considered for matching (blue for CAIMAN BATCH selected components and red for GT selected components). (d) F1 score and upper/lower bounds for all datasets as a function of various peak SNR thresholds. Performance increases significantly for neurons with high peak SNR traces (see text for definition of metrics and the bounds).
Testing the quality of the inferred traces is a more challenging task due to the complete lack of ground truth data in the context of large scale in vivo recordings.
As mentioned above, we considered as ground truth the traces obtained by running the CNMF algorithm seeded with the binary masks obtained by the consensus ground truth procedure. After alignment of the ground truth with the results of CAIMAN, the matched traces were compared both for CAIMAN BATCH and for CAIMAN ONLINE. Fig. 5a shows an example of 5 of these traces for the dataset K53, with very similar behavior of the traces in these three different cases. To quantify the similarity we computed the correlation coefficients of the traces (ground truth vs CAIMAN BATCH, and ground truth vs CAIMAN ONLINE) for all 9 datasets (Fig. 5b-c). Results indicated that for all but one dataset (Fig. 5b) CAIMAN BATCH reproduced the traces with higher fidelity; in all cases the mean correlation coefficient was higher than 0.9, and the empirical histogram of correlation coefficients peaked at the maximum bin 0.99-1 (Fig. 5c).
Footnote 5: These precision and recall metrics are computed on different sets of neurons, and therefore strictly speaking one cannot combine them to form an F1 score. However, they can be bounded from above by being evaluated on the set of matched and non-matched components where at least one trace is above the threshold (union of blue and pink zones in Fig. 4c), or from below by considering only matched and non-matched components where both ground truth and inferred traces have SNR above the threshold (intersection of blue and pink zones in Fig. 4c). In practice these bounds were very tight for all but one dataset (Fig. 4d). More details can be found in Methods and Materials (Performance quantification as a function of SNR).
Online analysis of a whole brain zebrafish dataset
We tested CAIMAN ONLINE with a 380GB whole brain dataset of larval zebrafish (Danio rerio); the data was processed with CAIMAN without patches. Ten example temporal traces are plotted in Fig. 7b.
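The precision/recall framework used throughout (and the F1 score derived from it) can be sketched as follows (a minimal example; the counts are illustrative, not from the paper's datasets):

```python
# Precision/recall/F1 from a matching between detected and ground-truth
# components. The counts below are illustrative.
true_positives = 80    # detected components matched to ground truth
false_positives = 10   # detected components with no ground-truth match
false_negatives = 20   # ground-truth components that were missed

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```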

Computational performance of CAIMAN
We examined the performance of CAIMAN in terms of processing time for the various analyzed datasets presented above (Fig. 8). The processing time discussed here excludes motion correction, which is highly efficient and primarily depends on the level of the FOV discretization for non-rigid correction. Fig. 8b-right shows that the first two steps, which are required for each frame, can be done in real time. In Fig. 8c the cost per frame is plotted for the analysis of the whole brain zebrafish recording. The lower imaging rate (1Hz) allows the tracking of neural activity to be done with computational cost significantly lower than the 1 second between-volume imaging time (Fig. 8c), even in the presence of a large number of components (typically more than 1000 per plane, Fig. 6) and the significantly larger FOV (2048 × 1188 pixels). As expected, the cost of updating spatial footprints can be significantly larger if done simultaneously for all components (Fig. 8c, bottom). However, the average cost of updating a single spatial footprint is roughly 8ms, enabling real-time processing for each frame when this step is evenly distributed among different frames/volumes, or is performed by a parallel independent process (Giovannucci et al., 2017).
The cost of processing 1p data in CAIMAN BATCH using the CNMF-E algorithm (Zhou et al., 2018) is shown in Fig. 8d; processing in patches can lead to computational gains at the expense of increased memory usage. This is because CNMF-E introduces a background term that has the size of the dataset and needs to be loaded and updated in memory in two copies. This leads to processing times that are slower compared to the standard processing of 2p datasets, and higher memory requirements.
The choice of storage order significantly affects the speed at which data is read or written: in column-major order reading a full column is fast because memory is read in a single sequential block, whereas reading a row is inefficient since only one element can be read at a time and all the data needs to be accessed. Therefore, the original dataset must be saved in the right order to avoid performance problems. In the context of calcium imaging datasets, CAIMAN BATCH represents the dataset as a matrix Y, where each row corresponds to a different imaged pixel, and each column to a different frame. As a result, a column-major order mmap file enables fast access of individual frames at a given time, whereas a row-major order file enables fast access of an individual pixel at all times. To facilitate processing in patches, CAIMAN BATCH stores the data in row-major order. In practice, this is opposite to the order in which the data arrives, one frame at a time.
Mathematical model of the CNMF framework
The CNMF framework (Fig. 1d) represents the calcium imaging data as a matrix factorization into spatial and temporal components plus background and noise terms; for one-photon data the background is modeled via a weight matrix W ∈ ℝ^{d×d}, where the (i, j) entry models the influence of the neuropil signal at pixel j on the neuropil signal at pixel i.
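The storage-order trade-off described above can be illustrated with numpy's order flags (a minimal sketch of a pixels × frames matrix, as in the text; the sizes are illustrative):

```python
import numpy as np

d, T = 4096, 1000                       # pixels, frames
rng = np.random.default_rng(0)
data = rng.random((d, T), dtype=np.float32)

# Row-major ('C') layout of a (pixels x frames) matrix keeps each pixel's
# full time series contiguous -> fast access to one pixel at all times.
Y_row = np.asarray(data, order="C")
pixel_trace = Y_row[123, :]             # contiguous read

# Column-major ('F') layout keeps each frame contiguous
# -> fast access to one full frame at a given time.
Y_col = np.asarray(data, order="F")
frame = Y_col[:, 42]                    # contiguous read

assert Y_row.flags["C_CONTIGUOUS"] and Y_col.flags["F_CONTIGUOUS"]
assert pixel_trace.shape == (T,) and frame.shape == (d,)
```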

Combining results from different patches
To combine results from the different patches we first need to account for the overlap at the boundaries between neighboring patches. Prior to merging, the value of each component at each pixel is normalized by the number of patches that overlap in this pixel, to avoid counting the activity of each pixel multiple times. We follow a similar procedure for the background/neuropil signals from the different patches. For the case of two-photon data, the spatial background/neuropil components for each patch can be updated by keeping their spatial extent intact to retain a local neuropil structure, or they can be merged when they are sufficiently correlated in time, as described above, to promote a more global structure. For the case of one-photon data, CNMF-E estimates the background using a local auto-regressive process (see Eq. (2)) (Zhou et al., 2018), a setup that cannot be immediately propagated when combining the different patches. To combine backgrounds from the different patches, we first approximate the backgrounds B_i from all the patches i with a low rank matrix using non-negative matrix factorization of rank g_b, to obtain global spatial and temporal background components.
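The low-rank NMF approximation of the combined patch backgrounds can be sketched as follows (a generic Lee-Seung multiplicative-update NMF in numpy, not CaImAn's implementation; the rank g_b and matrix sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, g_b = 200, 150, 2                 # pixels, frames, background rank

# Stand-in for the stacked patch backgrounds B_i (non-negative by construction)
B = rng.random((d, g_b)) @ rng.random((g_b, T))

# Rank-g_b NMF via multiplicative updates: B ~ W @ H with W, H >= 0
W = rng.random((d, g_b))
H = rng.random((g_b, T))
eps = 1e-12
for _ in range(200):
    H *= (W.T @ B) / (W.T @ W @ H + eps)
    W *= (B @ H.T) / (W @ H @ H.T + eps)

rel_err = np.linalg.norm(B - W @ H) / np.linalg.norm(B)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The columns of W play the role of the global spatial background components and the rows of H the corresponding temporal components.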
The resulting components are embedded into a large matrix B ∈ ℝ^{d×T} that retains a low rank structure. After the components and backgrounds from all the patches have been combined, they are further refined by running a CNMF iteration that updates the spatial footprints, temporal traces, and neuropil activity. CAIMAN BATCH implements these steps in a highly parallel fashion.
Component quality assessment
Here we present the unsupervised and supervised quality assessment tests in more detail (Fig. 2). Spatial footprint test: each spatial footprint is matched to the raw data, averaged over the union of the intervals of highest activity, to obtain a spatial image ⟨Y_i⟩ (Fig. 2c); components whose footprints correlate poorly with this image (below a threshold θ_sp) are rejected (Fig. 2d). Trace SNR test: the average SNR is used as a measure of how unlikely it is for the transients of c_i (after some appropriate z-scoring) to have been the result of a white noise process.
To compute the SNR of a trace, let R = Y − AC − B be the residual spatiotemporal signal. We can obtain the residual signal for each component i, r_i, by projecting R onto the spatial footprint a_i. The trace c_i + r_i then corresponds to the non-denoised trace of component i. To calculate its SNR we first compute a type of z-score, using the noise level estimated as the scale parameter of a half-normal distribution (Fig. 2b). We then determine how likely it is that the positive excursions of z_i can be attributed to noise alone. The (averaged peak) SNR of component i can then be compared against an appropriate SNR threshold, e.g., θ_SNR = 2, where σ_i is the noise level for trace i.
Classification through convolutional neural networks (CNNs)
The tests described above are unsupervised but require fine-tuning of two threshold parameters (θ_sp, θ_SNR) that might be dataset dependent and might be sensitive to strong non-stationarities. As a third test we trained a 4-layer CNN to classify the spatial footprints into true or false components, where a true component here corresponds to a spatial footprint that resembles the soma of a neuron (see Fig.
2e and section Classification through convolutional networks for details). A simple threshold θ_CNN can be used to tune the classifier (e.g., θ_CNN = 0.5).
Collection of manual annotations and ground truth
We collected manual annotations from four independent labelers who were instructed to find round or donut-shaped neurons of similar size using the ImageJ Cell Magic Wand tool (Walker, 2014). We focused on manually annotating only cells that were active within each dataset (Fig. 4). Registration was performed using the REGISTERPAIR algorithm (Alg. 5), and a match was counted as a true positive when the (modified) Jaccard distance (Eq. 11) was below 0.7. Details of the registration procedure are given below (see Component registration). The classifier trained this way also generalized to the rest of the datasets (Fig. 2e).
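The trace-SNR test described above hinges on mapping a tail probability back through the probit function; a minimal stdlib illustration (the z-scored trace below is synthetic, and this simplified peak-SNR computation is a sketch, not CaImAn's exact implementation):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Synthetic z-scored trace: white noise plus a few large "transient" excursions
z = rng.standard_normal(1000)
z[[100, 101, 102]] = [5.0, 6.0, 5.5]   # injected transient (illustrative values)

# Probability that the largest excursion is explained by noise alone
nd = NormalDist()
p_min = 1.0 - nd.cdf(float(z.max()))

# Peak SNR as the probit of that tail probability: SNR(z) = -Phi^{-1}(p_min)
snr = -nd.inv_cdf(p_min)

theta_snr = 2.0   # example threshold from the text
print(f"peak SNR = {snr:.2f}, accepted: {snr >= theta_snr}")
```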

Component registration
To compute the distance between two binary masks m_1, m_2, we use the Jaccard index (intersection over union), defined as J(m_1, m_2) = |m_1 ∩ m_2| / |m_1 ∪ m_2|, and use it to define the distance metric D(m_1, m_2) = 1 − J(m_1, m_2), where distances above a threshold θ_d (e.g., 0.5) are set to infinity, so that such components are considered non-matching and false assignments are prevented. After the distance matrix D has been completed, an optimal matching between the components of the two sessions is computed using the Hungarian algorithm to solve the linear assignment problem, allowing for efficient tracking of components across multiple days (Fig. 9) and the comparison of non-consecutive sessions through the union, without the need of direct pairwise registration (Fig. 10).
Performance quantification as a function of SNR
To quantify performance as a function of SNR we approximate the ground truth traces by running CAIMAN BATCH on the datasets seeded with the "consensus" binary masks obtained from the manual annotators. After that, the average peak-SNR of a trace c with corresponding residual signal r (Eq. 5) is obtained as SNR(z) = −Φ^{−1}(p_min), where Φ^{−1}(·) denotes the probit function (quantile function for the standard Gaussian distribution), z is the z-scored version of c + r (Eq. 6), and p_min is given by (Eq. 8).
Computation of ΔF/F
The fluorescence trace f_i of component i can be written as a sum of two terms: the fluorescence due to the component's transients, which overlaps with a background fluorescence due to the baseline fluorescence of the component and neuropil activity. The background term can be expressed through a baseline extraction function BASELINE : ℝ^T → ℝ^T applied to the component trace and the estimated background signal B. Examples of the baseline extraction function are a percentile function (e.g., 10th percentile) or, for longer traces, a running percentile function, e.g., 10th percentile over a window of a hundred seconds.
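The Jaccard-based mask matching described in the registration section above can be sketched as follows (a minimal numpy illustration with toy 1-D masks; the threshold θ_d follows the text, while the brute-force assignment stands in for the Hungarian algorithm at this toy size):

```python
import numpy as np
from itertools import permutations

def jaccard_distance(m1, m2, theta_d=0.5):
    """1 - IoU of two boolean masks; infinity above the threshold theta_d."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    dist = 1.0 - inter / union
    return np.inf if dist > theta_d else dist

# Toy 1-D "masks" standing in for spatial footprints from two sessions
session1 = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 0, 1, 1]], dtype=bool)
session2 = np.array([[0, 0, 0, 0, 1, 1],
                     [1, 1, 0, 0, 0, 0]], dtype=bool)

D = np.array([[jaccard_distance(a, b) for b in session2] for a in session1])

# Optimal assignment by brute force (equivalent to the Hungarian algorithm
# for this toy size); infinite entries forbid false matches.
best = min(permutations(range(len(session2))),
           key=lambda p: sum(D[i, p[i]] for i in range(len(session1))))
print("matches:", list(enumerate(best)))
```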
To determine the optimal percentile level, an empirical histogram of the trace (or parts of it in the case of long traces) is computed using a diffusion kernel density estimator (Botev et al., 2010), and the mode of this density is used to define the baseline and its corresponding percentile level. The ΔF/F activity of component i can then be computed by normalizing the baseline-subtracted trace by the baseline. The approach we propose here is conceptually similar to practical approaches where ΔF/F is computed by averaging over the spatial extent of an ROI (Jia et al., 2011), with some differences.
Appendix 0 Figure 10. Tracking neurons across days: step-by-step description of multi session registration (Fig. 9).
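The running-percentile ΔF/F computation described in the Methods can be sketched as follows (a simplified numpy version using a fixed 10th-percentile window of about 100 seconds; the synthetic trace is illustrative, and this omits the kernel-density mode selection of the percentile level):

```python
import numpy as np

rng = np.random.default_rng(0)
T, fs = 9000, 30.0                       # samples, sampling rate (Hz)

# Synthetic fluorescence trace: slow drifting baseline + a transient + noise
t = np.arange(T) / fs
baseline_true = 100.0 + 5.0 * np.sin(2 * np.pi * t / 60.0)
trace = baseline_true + 0.5 * rng.standard_normal(T)
trace[500:520] += 30.0                   # an example calcium transient

# Running 10th-percentile baseline over a window of ~100 seconds
win = int(100 * fs)
half = win // 2
baseline = np.array([
    np.percentile(trace[max(0, i - half):min(T, i + half)], 10)
    for i in range(T)
])

dff = (trace - baseline) / baseline      # Delta F over F

print(f"peak dF/F = {dff.max():.2f}")
```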