Uncovering functional signature in neural systems via random matrix theory

Neural systems are organized in a modular way, serving multiple functionalities. This multiplicity requires that both positive (e.g. excitatory, phase-coherent) and negative (e.g. inhibitory, phase-opposing) interactions take place across brain modules. Unfortunately, most methods to detect modules from time series either neglect or convert to positive, any measured negative correlation. This may leave a significant part of the sign-dependent functional structure undetected. Here we present a novel method, based on random matrix theory, for the identification of sign-dependent modules in the brain. Our method filters out both local (unit-specific) noise and global (system-wide) dependencies that typically obfuscate the presence of such structure. The method is guaranteed to identify an optimally contrasted functional ‘signature’, i.e. a partition into modules that are positively correlated internally and negatively correlated across. The method is purely data-driven, does not use any arbitrary threshold or network projection, and outputs only statistically significant structure. In measurements of neuronal gene expression in the biological clock of mice, the method systematically uncovers two otherwise undetectable, negatively correlated modules whose relative size and mutual interaction strength are found to depend on photoperiod.

Understanding how billions of neurons collectively selforganise into a functionally ordered brain which is able to coordinate a variety of neural and bodily processes is one of the most fundamental quests in neuroscience.Over the last decades, evidence has accumulated that the functional organisation of the brain is modular and hierarchical.This means that the brain appears to be partitioned into mesoscopic 'functional modules' where each module is composed of neurons with a relatively similar dynamical activity, while different modules are comparatively less related to each other but sometimes hierarchically nested inside larger modules.Reliably identifying functional modules, along with their possibly hierarchical structure, is a nontrivial task complicated by many challenges and animating an active field of research.
A challenging complication is the irreducibility of functional modules to contiguous anatomical regions defined a priori and/or to local neighbourhoods in the underlying structural network of neuron-to-neuron anatomical connections [1].Indeed, while on the one hand functional modules are expected (and indeed observed) to partly reflect the local brain anatomy, on the other hand, major deviations between functional and structural networks are observed.One key example is the distinctive 'longrange' left-right symmetry of some functional modules: often, a single such module is found to be split into two spatially non-contiguous populations of neurons, located at distant but symmetric positions in the left-right direction [2,3].As an opposite example, an anatomically well-defined brain region can be functionally heterogeneous [4,5] and sometimes even display anti-correlation between the activity of some of its parts [6,7].These examples indicate a breakdown of the correspondence between structural and functional modules, showing that it is in general impossible to infer the latter purely from spatial information.Indeed, it is expected that the mapping between functional and structural networks is many-to-one, thus allowing a diversity of functions to arise from a common neuronal anatomy [1].
Precisely because they cannot be reduced to 'spatially obvious' brain regions, functional modules must contain essential information about an emergent, non-structural level of neuronal organisation which can only be investigated via the explicit analysis of time series of neuronal funtion.More specifically, recordings of neuronal function (multiple time series) are normally used to construct an association matrix (e.g.cross-correlation, mutual information, etc.) that captures the dynamical relations between each pair of units [see Fig 1].Next, the matrix can be analysed in different ways to detect the presence of functional dependencies or structure in the observed system.These dependencies between the different units can essentially be categorized as positive (+) and negative (-) interactions, i.e. correlation and anti-correlation.
In neuronal systems, like possibly many other biological networks, we expect to find positive and negative correlations.Synaptic interaction between neurons will influence the mutual phase and lead to different states of synchronization in a brain circuit.The degree of synchronization versus desynchronization is important for neural function and a disturbance in this balance can contribute to neurological disorders.In the central mammalian clock situated in the suprachiasmatic nucleus (SCN) of the hypothalamus, the state of synchronization of neurons can influence responses of the circadian clock to light and is used to encode seasonal changes in day-length.It has been suggested that inhibitory as well as excitatory neuronal interaction will contribute to the phase differences observed under different photoperiods [8,9].It is remarkable that the balance between excitatory and inhibitory activity (E/I), which is a hallmark of healthy network performance, can actually change with photoperiod [10].This implies that the identification of the functional structure that corresponds to the positive-negative FIG. 1.A comparison between the general steps of functional network construction and our new method.In the left scheme, we can see the introduction of a threshold procedure.Once the majority of the data is neglected one can create a network representation.In the right scheme, we apply our spectrum filtering by comparing the empirical data to the random null model.By directly producing a partition of the original time series into communities, our new method bypasses the functional network projection, avoiding the use of a threshold procedure.
dependencies is a crucial step to better understand the underlining mechanisms in a system.However, it is a very challenging task due to the presence of (possibly many) strong 'common trends' in the measured time series of activity of individual neurons or more general regions of interest (ROIs).Depending on the spatial and temporal resolution of the data, such time series may contain a multitude of periodic or systematic trends at different frequencies (e.g.heartbeat, breathing, circadian rhythms) which impart an overall positive correlation to several ROIs, without actually representing any real functional relatedness among the ROIs themselves.As a con-sequence, standard techniques like Independent Component Analysis [11,12] may fail in finding 'true' underlying components associated with the functional structure.For this reason, interest has recently shifted towards the analysis of functional networks [13], which project the strongest dependencies between pairs of ROIs onto links in a network.
In functional networks, there are currently two main methodologies how to evaluate anti-correlation, i.e. negative dependencies, in a system.The first and most commonly used approach is the introduction of a threshold procedure [see Fig 1], where weak correlations are neglected depending on some threshold criteria and only the strongest correlations remain [14].The main known limitation of this approach is the unavoidable information loss induced by the choice of an arbitrary threshold, which also makes any output of the analysis thresholddependent [15].When evaluating negative dependencies, the procedure disregards (and erases) any presence of anti-correlation in the system.The second and more recent approach is to consider the negative values as important information of functional dependency, and to incorporate it together with the positive relations in different ways [16].The methods investigated in this study proposed different "general utility functions", such as the socalled modularity, which is able to process both negative and positive values, and extract a functional community structure.However, as most current methods, they treat correlations as edges of a (signed [16]) network and use a null hypothesis based on random graphs with independent edges, thus violating some basic 'metric' properties of correlation matrices.In the next section, we will further discuss and compare this approach to the one presented here.
Here, we propose a novel approach that targets specifically the positive and negative interactions in a system by random matrix theory.The approach generalizes a recent community detection method for financial systems [17,18], which is limited purely to random dynamics, enabling applications to various non-stationary systems.First, the method uses a modified Wishart ensemble to correctly filter noise and common global trends (if present) from the data, the ensemble serves as a natural and more appropriate null model for correlation matrices.This will result in a new method for functional module detection that is threshold-free and does not require the arbitrary projection onto a functional network [see Fig 1].Next, in contrast to current approaches, our method is designed to yield an optimal communities structure, where positive values are clustered within the modules and negative values are expelled outside.We call this the "functional signature" of a system.This structure is composed out of functional modules that are internally (overall) positively correlated, and are externally (overall) anti-correlated.We show that such a functional representation can identify phase ordering of oscillating cell populations based on the strength and sign of their coupling interactions.We should stress in any stage of the process there are no presumptions on the output of the method, and the results are completely driven by the data itself.Our method is applicable to all time scales neuronal functions are recorded in, which could reach from milliseconds to years addressing questions from synchronous neuronal activity to seasonal variation in brain function.
We apply the method to the suprachiasmatic nucleus (SCN) a small region in the hypothalamus responsible for regulating circadian rhythms of physiology and behaviour in mammals.We chose the SCN because of the relatively small network (ca 20000 neurons) with a high degree of plasticity.Single SCN neurons are capable of generating circadian rhythms in, amongst others, gene expression and electrical activity.The phase differences between the cells can vary with changes in the environment, like different photoperiods or prolonged light-exposure, or with an attenuation of the degree of coupling between the neurons as seen in aging or disease.This makes the SCN an optimal study case, as a dynamic network of neurons with different internal oscillations, mechanistically coupled to E/I processes.
In the examples presented here we used neuronal data on gene expression of a clock gene period2 in the SCN.The data were sampled every hour for at least three days by means of a bioluminescence reporter PER2::LUC.We focus on detecting functional modules within the SCN, using samples taken from mice that were subjected to different photoperiods.Our data-driven and impartial method detects the hidden structure of two distinct clusters of functionally connected SCN neurons that have a strong resemblance to a knwon core/shell distinction [19] and that have never before been found without the use of prior knowledge.Interestingly, we are able detect the physiological differences present in different photoperiods in the functional signature of the different clusters.

A random matrix null model for correlation matrices of neural activity
Given an empirical correlation matrix constructed from multiple time series of neuronal activity, our method looks for functional modules upon removing the joint effects of noise in the data and of common temporal trends, as both features may obfuscate the empirical identification of possible underlying substructure.For this task the method first requires a null model that will serve as random benchmark, accurately identifying the non-random patterns in the correlation matrix.Here, we introduce an improved null model, based on random matrix theory, which takes into account the cell to cell variability, and do not require the unrealistic assumption that the time series are stationary.Therefore we can allow for any temporal modulation [see Fig S1], both in individual time series and in their resulting common trend.This is very important, given the strongly time-dependent nature of functional brain data in general, and of our timemodulated oscillating signals in particular.So, even if the calculation and interpretation of correlation matrices usually assumes stationarity, here we can statistically treat correlation matrices calculated from nonstationary data as well.
The first step is an exact calculation of the combined, undesired effects of noise and common trends on the density of eigenvalues ρ(λ) of a theoretical cross-correlation matrix.This step corresponds to the definition of a null model for a correlation matrix without modular patterns, but with a noise level calibrated to the observed one and with a global trend that exactly follows the one in the empirical time series.The output of this first step is illustrated in Fig 2A .The density of eigenvalues, which is calculated exactly in the null model [see SI], features one largest eigenvalue λ max due to the global trend, plus a "random bulk" extending between a minimum (λ − ) and a maximum (λ + ) eigenvalue.
The second step is a filtering of the original correlation matrix via the identification of the empirical eigenvalues that deviate, in a statistically significant manner, from the ones predicted by the module-free null model.In practice, this reduces to the selection of the empirical eigenvalues that are found in the range (λ + , λ max ).Looking again at Fig. 2, we indeed see the presence of deviating eigenvalues in the empirical spectrum.This step is completed by the selection of the eigencomponent of the correlation matrix associated with the deviating eigenvalues.The resulting, cleaned component of the original matrix contains statistically significant, noise-and trend-filtered information about the presence of functional modules.

Functional signature in neural systems
Once the original correlation matrix has been filtered by the null model, only the statistically significant dependencies remain in the matrix.At this point our aim is to identify functional modules which are positively correlated internally and negatively correlated externally, this can be transformed into an optimization problem.We employ community-detection techniques that take the filtered correlation matrix as input and return the optimized partition of the system into functional modules.The optimized partition will include the most positive dependencies (correlation) inside the clusters while expelling the negative dependencies (anti-correlation) outside of the clusters.We should stress that the emergent functional structure will be only observed when it is statistically significant.Moreover, the number of detected clusters is not defined a priori, and is determined purely by the encoded information in the data.Importantly, while standard community-detection methods are based on null models that are justified only for networks, but not for time series, our method builds on the appropriate In the bottom panels are the partitions detected, where each community is marked with a different colour.In the top panels are the corresponding resolved filtered correlation matrices displaying the resolved structure as a block matrix.
FIG. 3. A comparison between our method and the general modularity approach.In the top panels, a syntactic system with 300 signals randomly distributed over 3 main groups, and the corresponding correlation matrix.On the bottom are the two likelihood matrices (averaged over 1000 runs) indicating the probability of two units to belong to the same community.We can see that the general modularity method [16] is clustering all the signals into one community (bottom left), while our method can correctly detect the 3 artificial subgroups (bottom right).
null model described above and calculated exactly in the previous section.
Another critical distinction of this approach with re-spect to current methods is the ability to interpret the negative correlation under a correct null hypothesis.As discussed in the introduction the current methods either dismiss negative values or turn them into positive relations and evaluate them together with the positive links.This limitation results from the need to project the information from the correlation matrix to a functional network, which only allows for positive links.Here, since we introduce a correct null model for the time series data, and not for the projected network, we are able to cluster separately the negative and positive dependencies.
In fact, the layer of the negative dependencies 'helps' to identify the optimized partition by penalizing an existence of negative relation within a cluster.Notably, since we consider a correlation matrix as the complete "functional landscape" of a system, the method is able to use all the information and not only a small part of it (threshold procedure) or a distorted image of it (general modularity) [see Fig S1].
For a more detailed analysis, we compare our method to a weight-conserving characterization method for functional networks [16].This method is using a general modularity, i.e. a combination of positive modularity with negative modularity, in order to characterize functional structure.We generate a syntactic sample with 300 oscillating signals divided into 3 main groups, wherein each group the signals are randomly assigned different phases [ Fig 3].We can clearly see that due to the differences in phase between the groups, the relations between different groups become negative (anti-correlation).We process the correlation matrix with the method proposed in [16] and with our method, and the two methods yield significantly different results.While our method is able to cluster the 3 groups perfectly, the general modularity method cluster the whole system into one community.
FIG. 4. The community structure of the SCN as resolved by a standard threshold approach.On the left, we plot the community structure, resolved by the standard method, for different thresholds.In blue are the nodes that belong to the large cluster, while in gray are isolated nodes ('communities' that only contain one node).In the right panel, we plot the fraction of nodes in the largest connected component S in blue, and the fraction of communities detected M in red.
The reason is the identification of negative dependencies as positive ones, which isubsequently appears as a complete network with positive links.We should stress that this limitation is relevant to all the methods that are mentioned in [16], precisely because of the wrong use of the null model designed for networks and not for time series.

Uncovering the hidden functional signature of the SCN
The brain region we apply our method to is the suprachiasmatic nucleus (SCN), located in the hypothalamus in the brain, and recognized as the site of the central circadian clock in mammals.This clock is important for the regulation of our daily and seasonal rhythms.It has been shown that the neuronal network organization of the SCN changes in different photoperiods [20], however, the mechanisms behind these changes are still elusive.Furthermore, only a subset of neurons within the SCN network are directly responsive to light [21], which poses the question how encoding for seasonally changing day length is achieved in the SCN network.The SCN is a prototypical example of a brain structure for which resolving functional organization is challenging for the reasons outlined above: it consists of about 20000 neurons that are spatially close (total size of 1 mm 3 -so, structurally speaking, these neurons form a single densely connected cluster, whose only anatomical substructure is a left-right split into two lobes) while at the same time displaying a high variability in terms of the signals of the constituent neurons.
Currently, brain networks are most often derived from data acquisition techniques that do not have the possibility to perform recordings at the single cell level.Techniques such as fMRI, MEG or EEG use brain regions as nodes in the network and fiber bundles between these regions as edges.We investigate a brain network at the micro-scale where nodes are single cells and edges are functional connections between the cells.
We first perform a standard analysis based in the main-stream method [see Fig 1] for detecting communities via functional networks.This is a useful reference as a comparison with our own method.In Figure 4 we present the community structure, resolved by the standard method, for different thresholds.In blue are the nodes that belong to the large cluster, while in gray are isolated nodes (communities that only contain one node).In the right panel, we plot the fraction of nodes in the largest connected component S = LCC N in blue, and the fraction of communities detected M = Communities N in red.It is evident that applying different thresholds essentially detaches isolated nodes from the large cluster, and there is no optimal value for the threshold.Therefore, the standard method can only observe a "radial gradient" of connectivity, and there is no sense of multiple communities of neurons, which is one of the signatures of functional as opposed to structural connectivity.This poor performance of the method is a known limitation when applied to very dense networks.
Our method detects mostly two communities which coincide with the core and shell distinction within the SCN [19].The core of the SCN receives light input and adjusts quickly to changing light schemes, while the shell of the SCN lags behind [22].Mostly the core-shell distinction of the SCN is interpreted as a distinction between the ventrolateral and the dorsomedial part of the SCN, which is predominantly based on anatomical data [23].In this study the two clusters that were found were more dorsolaterally and ventromedially located, and while it is based on functional data this may differ from known anatomical distinctions.Furthermore, the SCN is much more heterogeneous when looked at cellular phenotype or gene expression [4,24].The anatomical loci do not necessarily delineate the phenotypical SCN regions very precisely, which implies that functionally, the core-shell distinction is less clearly defined and may differ from the described anatomical division (see also [19]).
Regional analysis of the SCN using functional time series has been performed by other groups.Evans and co-workers used a similar approach to identify single-celllike regions of interest, but did not use clustering algorithms and chose the regions by hand [25].Silver and co- workers also used regions of interest, called superpixels, but these were not necessarily identified as single-cells.Based on these superpixels they used threshold methods to identify regional differences in the SCN [26,27].Abel and co-workers applied a threshold method based on mutual information on single-cell-like regions of interest [28].These approaches encounter similar problems as described in this paper when using the threshold method: they only find one cluster (in the core, or ventral part) and many non-clustered cell-like ROIs (in the shell or dorsal part).Our results presented here are in line with the regional division of the SCN proposed in these studies, but we were able to identify both the core and shell clusters.
Furthermore, our approach is able to identify the two clusters in different experimental conditions, ranging from summer conditions (long days, short nights: LD 16h:8h) to winter conditions (short days, long nights: LD 8h:16h).On the contrary, Evans and co-workers identified changes occuring in the organization of the SCN, where the two regions similar to our clusters were found, only for very long day conditions (LD 20h:4h) [29].
As a next step we analyzed the values in the functional signatures and compared those between different photoperiods that the animals have been subjected to.With this step we reveal the dynamics within the population of neurons in the clusters and between the clusters.As the cluster-partition is based on the functional signature, we will now investigate the values within and between the clusters, exploring the inner and outer level of correlation.This extra information links physiological properties of the SCN to the functional signature found in the data.We measure the average residual correlation within each cluster detected by our method and we plot the community distribution of the measured values [Fig 6A ,B].We then identify the cumulative probabilty of the values in the clusters and we see that in short photoperiods the average values are much higher than in long photoperiods [Fig 6C].This means that the correlation within the clusters is significantly higher in short photoperiods than in long photoperiods.When we examine the values between the clusters, we see that the average value is lower in short versus long photoperiod, meaning that the clusters are less correlated in short phtoperiods [Fig 6D].These results connect directly to previous results in physiological properties as described in [5] and is supported in other papers [20,30].Thus, we show that the hidden functional representation reveals the phase ordering of oscillating cell populations caused by physiological properties of the SCN.

DISCUSSION
Our method reveals hidden functional dependencies that are obfuscated by the presence of a global mode in the neuronal gene expression, which imparts an overall positive correlation.This problem becomes particularly evident when searching for functional structure in neuronal systems where the global signal is very strong, making the identification of functional modules very challenging.Our method is able to deal with the effects of noise and common global trends in the original data in a robust manner.In fact, we have shown that the effects of noise and those of the global signal are coupled, as their signatures in the spectrum of the correlation matrix depend on each other.
We found a distinctive left-right functional symmetry with core-shell features in the SCN.This structure reveals non-contiguous regions that display strongly synchronized activity, despite being at a relatively large distance from each other, similar to [2].Remarkably, here we detect this functional symmetry on a micro-scale level where nodes are single cells.In this respect, it is important to notice that when the traditional method applied to the SCN we only identify a radial gradient of functional connectivity and no modules with long-range leftright symmetry [28].By contrast, our method finds that the left and right shell regions of the SCN, despite being spatially disconnected into two non-contiguous regions, are functionally joined into a single module.These symmetrical structures in the SCN raise important questions with respect to the mechanism in the system, and can possibly be explored in the future.
The ability to exploit all the information from the correlation matrix, i.e. both the negative and the positive dependencies (correlation and anti-correlation), in order to detect the functional modules is very powerful.The strength of our method is to detect communal phase differences in neuronal networks by analysing time series data without using any presumptions or threshold definitions.Phase differences and phase adjustments in neuronal networks are an key feature for physiological function and can be used to define the functional state of a network in health and disease.Our method allows the identification of synchronized clusters of cells.Synchronization within a neuronal network was suggested to play a major role in the occurrence of epilepsy [31,32], Parkinsons disease [33,34] and schizophrenia [35,36].It is noteworthy that the clusters determined with our methods are not influenced by the functional change in E-I balance occurring in different photoperiods.This is advantegous since our analysis will also detect functional clusters within neuronal networks with altered E/I balance often found in neurological disease (e.g.epilepsy, RETT, FragileX, autism) and in the aging brain.
The results presented here show that our method offers great potential for detecting hidden functional synchronization and desynchronization in brain networks and are not limited to gene expression rhythms.Time series from other modalities, such as electrical action potential recordings, EEG recordings and fMRI recordings can also be interpreted through this new method.As such, the method may offer diagnostic or pre-diagnostic applications in medical health care.

FIG. 2 .
FIG. 2. (A)Empirical eigenvalue density versus calculated eigenvalue density for the two random models.(B) The community structure of the SCN as resolved by our method.On the left, is the community structure detected by random model without filtering the global mode (Random).On the right, is the community structure detected by random model once the global mode is filtered (Random + Global).In the bottom panels are the partitions detected, where each community is marked with a different colour.In the top panels are the corresponding resolved filtered correlation matrices displaying the resolved structure as a block matrix.

FIG. 5 .
FIG. 5. (A) The bioluminescence image of one SCN sample.(B) The plotted average partition over all the samples.(C) The plotted average signal of the whole system (in black) versus the mean signals of the two detected communities (in red and blue) for one SCN sample.(D) The plotted average residual signals of the two communities of one SCN sample, once the global signal is subtracted.

FIG. 6 .
FIG. 6. (A) The resolved functional signature and modules structure of a long photoperiod (LP, L16D8) sample.(B) The resolved functional signature and modules structure of a short photoperiod (SP, L8D16).(C) Cluster analysis: plotting the cumulative distribution of the dependencies within the two detected clusters, comparing the two different photoperiods.The upper graph shows cluster 1 and the lower graph cluster 2. C represents the measured averaged correlation of a cluster, and δ ≡ N − N + is defined as the contrast ratio of a cluster, measuring the ratio of negative dependencies versus positive dependencies.(D) Inter-cluster analysis: plotting the cumulative distribution of the external dependencies between the two detected clusters,comparing the two different photoperiods.C represents the measured averaged anti-correlation between clusters, and δ ≡ N +N − is defined as the contrast ratio of a cluster, measuring the ratio of positive dependencies versus negative dependencies.