Hierarchical network analysis of behavior and neuronal population activity

Recording of neuronal population activity in behaving animals is becoming increasingly popular. Computational markerless annotation tools allow for tracking of animal body-parts throughout the experiment. However, the question remains of how to cross-correlate the extracted behavioral data with the simultaneously acquired neuronal population activity, when both datasets are of high dimensionality. Here we propose a combined analysis, where the behavioral data is clustered into discrete states using a deep learning model and the occurrence of each state is correlated to clusters of neuronal activity. We then model the relationship between behavioral states as a network, where related states are hierarchically grouped while the similarity between their neuronal correlates is maximized. This type of analysis allows for hierarchical exploration of the bidirectional relationship between behavior and its neuronal correlates at different temporal scales.


Introduction
Understanding the relationship between neuronal activity and behavior is a major challenge in neuroscience. Precise correlation analysis requires deep quantification of both behavioral features and neuronal activity patterns. Recently, pose-estimation tools such as DeepLabCut have enabled exact tracking of animal body-parts during experiments (Mathis et al., 2018). Such tools provide a continuous representation of the animal's body movement, for which neuronal correlates could be directly detected within the population activity (Gallego, Perich, Miller, & Solla, 2017). However, it is hypothesized that behavior may be represented in the form of discrete states that organize hierarchically (Tinbergen, 1951; Wiltschko et al., 2015). Still, the challenge remains at which spatio-temporal scale these discrete states should be resolved (Berman, 2018).
In this work we first show how continuous signals obtained from behavior tracking tools can be grouped into discrete states via clustering of the latent vector obtained from a recurrent neural network autoencoder. We then demonstrate how the resulting behavioral states can be correlated to neuronal activity on different hierarchical levels by considering their similarity in behavior, neuronal activity, or both. Finally, we demonstrate our analysis on a dataset obtained from a mouse running on a linear treadmill with simultaneous imaging of the hippocampal CA1 region.

Experimental setup
A food-deprived mouse was trained to run head-fixed on a textured linear treadmill of 3.6 m length. The mouse learned to receive a liquid reward upon licking at a predefined location once per lap. This led to a repetitive behavior where the mouse was running approximately 3 laps per minute. Simultaneously, the population activity (GCaMP7s) of hippocampal CA1 pyramidal neurons was imaged at 15 Hz through an implanted hippocampal window using a two-photon resonant scanning system (Fuhrmann et al., 2015). Synchronized behavioral video monitoring was performed with a camera capturing the side view of the animal at a frame rate of 25 Hz. An exemplary video frame is shown in Figure 1 (B).

Data preprocessing
The imaging stack was down-sampled to 5 Hz and corrected for motion artifacts. Active temporal components were extracted using constrained non-negative matrix factorization (Pnevmatikakis et al., 2016). The factorization resulted in a deconvolved ∆F/F time series for each detected component. We then extracted the onset of each peak from the time series via threshold-crossing and assigned a weight according to the peak maximum. If multiple peaks occurred within a transient, the weight of each onset was set to the difference between the peak and the decay of the preceding peak, which was extrapolated by an exponential function.
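The onset-weighting step can be sketched as follows. This is a minimal illustration of the described procedure; the threshold value and the exponential decay constant `tau` are assumed parameters, not values reported in the paper.

```python
import numpy as np

def extract_onsets(trace, threshold=0.1, tau=15):
    """Detect peak onsets in a single dF/F trace via threshold-crossing.

    Each onset is weighted by its peak maximum. If a new peak occurs
    before the previous transient has decayed, its weight is reduced by
    the extrapolated exponential decay of the preceding peak.
    `threshold` and `tau` (decay constant in frames) are assumed values.
    """
    onsets, weights = [], []
    last_peak_val, last_peak_t = 0.0, None
    above = trace > threshold
    # Rising edges of the thresholded trace mark the onsets.
    crossings = np.where(above[1:] & ~above[:-1])[0] + 1
    for t in crossings:
        # Find the peak maximum of this transient (until the trace
        # falls back below threshold).
        end = t
        while end < len(trace) and trace[end] > threshold:
            end += 1
        peak = trace[t:end].max()
        if last_peak_t is not None:
            # Extrapolate the decay of the preceding peak.
            decayed = last_peak_val * np.exp(-(t - last_peak_t) / tau)
        else:
            decayed = 0.0
        weights.append(max(peak - decayed, 0.0))
        onsets.append(t)
        last_peak_val, last_peak_t = peak, t
    return np.array(onsets), np.array(weights)
```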
For behavioral pose extraction, virtual markers were placed on eight body-parts in 150 uniformly picked video frames and a residual neural network was trained to assign the virtual markers for the entire video sequence (Mathis et al., 2018).

Sequence autoencoder
In order to learn the structure of the temporal representation of a behaving animal, we built a recurrent neural network autoencoder consisting of Gated Recurrent Units (GRU). The goal of the autoencoder is to learn a d-dimensional latent vector Λ_t ∈ R^d. The input sequence X_t ∈ R^{2n×T} carries the (x, y) coordinates of the n marker positions for the video sequence [t . . . t + T]. The encoder thus learns the mapping

Λ_t = f_enc(X_t), f_enc : R^{2n×T} → R^d. (1)

Our approach is motivated by (Srivastava, Mansimov, & Salakhudinov, 2015), where a long short-term memory (LSTM) composite encoder-decoder model was proposed for unsupervised video representation learning. However, in order to make training more efficient, we chose to use GRUs instead of LSTMs in every layer of our autoencoder model (Chung, Gulcehre, Cho, & Bengio, 2014). The encoder (1) is trained to generate a latent vector Λ_t that is passed to two single-layer GRU decoders. The first one reconstructs the sequence X_t and the second predicts the subsequent sequence X_{t+T}.
The autoencoder architecture is depicted in Figure 1 (A) and was trained using the Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate of 0.001 and the mean squared error as the objective function for both reconstruction and prediction. We used Rectified Linear Unit (ReLU) activation functions in every layer. Backpropagation is performed on the combined loss, and the whole network was trained on a single Nvidia 1080 Ti GPU. All computations were done with PyTorch (Paszke et al., 2017).
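A minimal PyTorch sketch of such a composite GRU autoencoder is given below. The hidden size and the strategy of feeding the latent vector to the decoders at every time step are illustrative assumptions; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    """GRU composite autoencoder: one encoder, two single-layer GRU
    decoders (reconstruction of X_t and prediction of X_{t+T}).
    Layer sizes here are assumed, not taken from the paper."""

    def __init__(self, n_features=16, latent_dim=20, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent_dim)
        self.dec_rec = nn.GRU(latent_dim, hidden, batch_first=True)
        self.dec_pred = nn.GRU(latent_dim, hidden, batch_first=True)
        self.out_rec = nn.Linear(hidden, n_features)
        self.out_pred = nn.Linear(hidden, n_features)

    def forward(self, x):                       # x: (batch, T, 2n)
        _, h = self.encoder(x)                  # h: (1, batch, hidden)
        z = torch.relu(self.to_latent(h[-1]))   # latent vector Lambda_t
        T = x.size(1)
        z_seq = z.unsqueeze(1).repeat(1, T, 1)  # feed Lambda_t each step
        rec, _ = self.dec_rec(z_seq)
        pred, _ = self.dec_pred(z_seq)
        return self.out_rec(rec), self.out_pred(pred), z

model = SeqAutoencoder()
x = torch.randn(4, 25, 16)                      # batch of 1 s windows
rec, pred, z = model(x)
# Combined MSE loss over reconstruction and prediction targets.
loss = nn.functional.mse_loss(rec, x) + nn.functional.mse_loss(pred, x)
```

In practice the prediction decoder would be trained against the future window X_{t+T} rather than X_t; the same input is reused here only to keep the sketch self-contained.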

Behavioral states
To identify the behavioral state space B = {b_1, . . . , b_K} in our dataset, we computed the latent vector Λ_t for every data point t. As the full experiment contains N frames, the resulting feature matrix F is of dimensionality d × (N − T). We performed k-means clustering on F to identify K behavioral states. We modeled the transitions between behavioral states as a discrete-time Markov chain, where the transition probability into a future state depends only on the present state. This results in a K × K transition probability matrix T, with the elements

T_{l,k} = P(s_{t+1} = b_k | s_t = b_l) (2)

being the transition probabilities from one state b_l ∈ B to another state b_k ∈ B.
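Given the discrete state sequence produced by the k-means assignment, the transition matrix (2) can be estimated by counting and row-normalizing, e.g.:

```python
import numpy as np

def transition_matrix(states, K):
    """Estimate the K x K Markov transition probability matrix from a
    discrete state sequence (integer values in 0..K-1)."""
    T = np.zeros((K, K))
    # Count transitions between consecutive time steps.
    for l, k in zip(states[:-1], states[1:]):
        T[l, k] += 1
    # Normalize each row to a probability distribution (guarding
    # against states that never occur).
    row_sums = T.sum(axis=1, keepdims=True)
    return np.divide(T, row_sums, out=np.zeros_like(T),
                     where=row_sums > 0)
```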
The Markov chain (2) can then be represented as a weighted directed graph G, with nodes corresponding to the behavioral states and edge weights given by the transition probabilities.

We computed the pairwise correlations between the activity traces of all active components, resulting in a correlation matrix R. We then clustered R into a block-diagonal matrix Z using spectral co-clustering (Dhillon, 2001), which effectively grouped all components into M biclusters with similar values in the corresponding rows and columns of R. The choice of M must be made individually, based on the structure of the particular neuronal recording. We then reduced the dimensionality of each cluster using factor analysis into a shared component (Everett, 1984), which has been shown to be particularly consequential for behavior (Kohn, Coen-Cagli, Kanitscheider, & Pouget, 2016). An exemplary sequence of activity for each cluster is shown in Figure 2.
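The biclustering step is available in scikit-learn as `SpectralCoclustering`, shown here on a synthetic correlation matrix with planted block structure. Note that the method expects non-negative entries, so negative correlations would need to be clipped first (an assumption of this sketch).

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)

# Synthetic correlation matrix with two blocks of co-active components.
n = 20
R = rng.uniform(0.0, 0.1, size=(n, n))
R[:10, :10] += 0.8
R[10:, 10:] += 0.8
R = (R + R.T) / 2            # symmetrize, as for a correlation matrix
np.fill_diagonal(R, 1.0)

# Group rows and columns of R into M = 2 biclusters (Dhillon, 2001).
model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(R)
labels = model.row_labels_   # bicluster assignment per component
```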

Results
During the experiment we imaged 640 active components from the hippocampal CA1 region, which were grouped into M = 30 neuronal clusters. We set the sliding window size to T = 25 (1 s of video data) and the dimensionality of the latent vector Λ_t to d = 20. Hence, the model achieved a compression ratio of (16 × 25)/20 = 20. To identify behavioral states we chose K = 8 for the k-means clustering assignment, based on the elbow method. To qualitatively validate the outcome of the clustering, we inspected the original video frames for every obtained cluster. Quantitative validation was based on the original velocity signal from the treadmill experiment, which was not used to train the sequence autoencoder. Figure 1 (C) shows that a subset of behavioral states is only active during running, while a different subset of states is active during resting or reward-taking periods.

Combining neural and behavioral representation
To understand which clusters of neuronal activity are active during a behavioral state, and vice versa, we performed a combined analysis. First, we aligned the behavioral state sequence to match the temporal resolution of the neuronal data recording. We then created an M × K combined states matrix S containing the averaged normalized event rate of each neuronal cluster during every behavioral state b_k. Next, we computed the dissimilarities between the columns of S based on the Kullback-Leibler divergence,

D_KL(s_p ∥ s_q) = Σ_{m=1}^{M} s_{m,p} log (s_{m,p} / s_{m,q}), (3)

where the columns s_p, s_q with p, q ∈ {1 . . . K} are modeled as probability distributions over the neuronal clusters for two behavioral states. Applying (3) to all combinations of behavioral states results in a K × K neuronal distance matrix D.
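Computing the distance matrix D from the combined states matrix S amounts to normalizing each column into a distribution and evaluating (3) pairwise. The small `eps` added for numerical stability is an implementation assumption:

```python
import numpy as np

def kl_distance_matrix(S, eps=1e-9):
    """Pairwise Kullback-Leibler divergences between behavioral states.

    S is the M x K combined states matrix; each column is normalized
    into a probability distribution over the M neuronal clusters.
    Returns the K x K matrix D with D[p, q] = D_KL(s_p || s_q).
    """
    # Normalize columns into probability distributions; eps avoids
    # division by zero and log of zero.
    P = (S + eps) / (S + eps).sum(axis=0)
    K = P.shape[1]
    D = np.zeros((K, K))
    for p in range(K):
        for q in range(K):
            D[p, q] = np.sum(P[:, p] * np.log(P[:, p] / P[:, q]))
    return D
```

Since the KL divergence is not symmetric, D is in general asymmetric; a symmetrized variant (e.g. averaging D and its transpose) could be used where a metric is required.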

Graph to tree mapping
In order to inspect the relationship between behavior and neuronal activity at different hierarchical levels, we transformed G into a binary tree T. To do so, we iteratively merged two nodes at a time, selecting the pair (b_l, b_k) according to one of three cost functions:

(l, k) = argmax_{l≠k} T_{l,k}, (4)
(l, k) = argmin_{l≠k} D_{l,k}, (5)
(l, k) = argmax_{l≠k} T_{l,k} / D_{l,k}. (6)

The first cost function (4) merges the nodes with the highest transition probability in G, thereby considering their behavioral similarity. The second cost function (5) merges the two nodes with the smallest Kullback-Leibler dissimilarity of their neuronal representations. The third cost function (6) considers the ratio of the transition probability and the Kullback-Leibler dissimilarity for every node pair. After each reduction step, the matrices T and D are recomputed for the rows and columns corresponding to the merged nodes. Figure 4 shows the resulting trees T_T, T_D and T_R obtained via the cost functions (4)-(6). We then used the hierarchical tree representation for community structure detection, as suggested in (Newman, 2010). We could identify three communities in the tree T_R, based on a cut at the second hierarchical level. We then inspected the original video for every community and identified stereotyped behaviors such as running, reward, or transition phases. For visualization, we also annotated the obtained communities in Figure 3.
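The greedy merging procedure can be sketched as follows, using the ratio cost of the form (6). How the merged rows and columns of T and D are recomputed is not fully specified in the text; the simple averaging used here is an assumption for illustration.

```python
import numpy as np

def greedy_merge(T, D):
    """Iteratively merge the pair of states maximizing T[l, k] / D[l, k]
    (cost function of the form (6)), returning the merge order, which
    defines a binary tree over the states. Merged rows/columns are
    recomputed by simple averaging -- a simplifying assumption."""
    T, D = T.astype(float).copy(), D.astype(float).copy()
    active = list(range(T.shape[0]))
    merges = []
    while len(active) > 1:
        best, pair = -np.inf, None
        for i in active:
            for j in active:
                if i == j:
                    continue
                score = T[i, j] / (D[i, j] + 1e-9)
                if score > best:
                    best, pair = score, (i, j)
        i, j = pair
        merges.append(pair)
        # Merge node j into node i: average the corresponding
        # rows and columns of both matrices.
        T[i, :] = (T[i, :] + T[j, :]) / 2
        T[:, i] = (T[:, i] + T[:, j]) / 2
        D[i, :] = (D[i, :] + D[j, :]) / 2
        D[:, i] = (D[:, i] + D[:, j]) / 2
        active.remove(j)
    return merges
```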

Conclusion
In this paper we proposed an approach for the combined analysis of behavioral and neuronal population data. We have shown how continuous signals obtained from behavior tracking tools can be converted into discrete behavioral states by clustering the latent vectors obtained from a sequence autoencoder. We have then demonstrated how the resulting behavioral states can be correlated to clustered neuronal population activity via a hierarchical approach. Depending on the cost function used for the aggregation of states, our analysis can extract the organization of behavioral states, as well as the structure of the underlying neuronal correlates.
The analysis has been demonstrated on a dataset where a mouse was running on a linear treadmill while receiving liquid reward at a fixed location. The behavioral clustering yielded a total of 8 behavioral states, which we grouped into three communities. In these communities, five states were active during running phases, two during resting or reward taking phases, and one during transitions between the aforementioned phases. Further investigation of the communities could lead to discovery of other sub-communities, e.g. different running patterns.
Explicit graph-to-tree mapping can furthermore be useful to compare and quantify behavior across different experimental conditions, trials, and animals. Given, for example, two graphs (G_1, G_2) and their respective tree representations (T_1, T_2), it is possible to apply the Tree Edit Distance (Zhang & Shasha, 1989) to compute the dissimilarity between the two trees.
The proposed behavioral clustering method is sensitive to the choice of the time window T and the clustering parameter k. Furthermore, we believe that the application of dynamic time-warping could improve the correlation between behavioral states and neuronal activity, as suggested by (Lawlor, Perich, Miller, & Kording, 2018).
For future work, we aim to generalize the analysis to a wider range of behavioral experiments, including experiments in freely-moving animals.