Identifying properties of pattern completion neurons in a computational model of the visual cortex

Neural ensembles are found throughout the brain and are believed to underlie diverse cognitive functions including memory and perception. Methods to activate ensembles precisely, reliably, and quickly are needed to further study the ensembles’ role in cognitive processes. Previous work has found that ensembles in layer 2/3 of the visual cortex (V1) exhibited pattern completion properties: ensembles containing tens of neurons were activated by stimulation of just two neurons. However, methods that identify pattern completion neurons are underdeveloped. In this study, we optimized the selection of pattern completion neurons in simulated ensembles. We developed a computational model that replicated the connectivity patterns and electrophysiological properties of layer 2/3 of mouse V1. We identified ensembles of excitatory model neurons using K-means clustering. We then stimulated pairs of neurons in identified ensembles while tracking the activity of the entire ensemble. Our analysis of ensemble activity quantified a neuron pair’s power to activate an ensemble using a novel metric called pattern completion capability (PCC) based on the mean pre-stimulation voltage across the ensemble. We found that PCC was directly correlated with multiple graph theory parameters, such as degree and closeness centrality. To improve selection of pattern completion neurons in vivo, we computed a novel latency metric that was correlated with PCC and could potentially be estimated from modern physiological recordings. Lastly, we found that stimulation of five neurons could reliably activate ensembles. These findings can help researchers identify pattern completion neurons to stimulate in vivo during behavioral studies to control ensemble activation.


Introduction
Neural ensembles are groups of neurons that are consistently coactive both spontaneously and in response to stimuli [1][2][3][4][5]. They are found in diverse regions of the brain including the motor cortex, sensory cortices, and the hippocampus [6][7][8][9][10]. Neuroscientists have long believed that ensembles play a functional role in cognition, and recent experimental work has found ensembles that underlie associative memories, motor behaviors, and visual perception [6][7][8].
Recording and manipulating ensemble activity can help elucidate ensemble formation, activation, and function. One form of ensemble activation under study is pattern-completion, where ensembles composed of tens of neurons can be activated by the stimulation of just a few neurons [8,[11][12][13]. Understanding the function of pattern completion neurons can help elucidate how diverse stimuli are encoded in the brain and how information is efficiently transferred across brain regions [4, 14,15]. For example, it is possible that the brain transmits information to different brain regions by stimulating ensembles in these regions via synaptic connections with pattern completion neurons. Additionally, pattern completion in the visual cortex may explain how organisms can recognize partially occluded objects [16,17].
Methodical perturbation of ensembles is critical to establishing a causal link between ensemble activity and behavior. Previous work used pattern completion properties to demonstrate the role of ensembles in visual perception: activation of ensembles via pattern completion neurons was sufficient to drive learned behavior in the absence of visual stimuli [8]. However, ensemble activation using the selected pattern completion neurons had a low success rate. To efficiently activate ensembles, neuroscientists need a way to optimize selection of pattern completion neurons. Proper design of perturbation patterns based on ensemble recordings could facilitate a more comprehensive understanding of ensemble function.
Recent advances in microscopy and protein engineering have partially developed the relationship between ensemble activity and behavior [18][19][20][21][22][23][24][25]. Advances in microelectrode arrays and fluorescence imaging have enabled neuroscientists to measure ensemble activity within and between cortical layers [10,26,27]. This work has shown that different ensembles are spatially overlapped and sequentially activated [8,[28][29][30][31]. Given these properties, optogenetics has become valuable for ensemble manipulation due to its ability to selectively perturb individual neurons with temporal specificity [23,24,32,33]. Previous work has combined optogenetics and fluorescence imaging to study the spontaneous and evoked activation of ensembles [2,10]. This all-optical approach can facilitate the study of pattern completion and ensemble function in vivo.
Efficient selection of pattern completion neurons is important for analyzing ensemble dynamics and understanding the functional role of pattern completion in cognition. Previous work using conditional random fields to identify pattern completion neurons in layer 2/3 of mouse primary visual cortex (V1) resulted in an ensemble recall rate of 5% (i.e. 5% of trials successfully activated the ensemble) [8]. Despite some progress, there remains a limited understanding of the network properties that facilitate pattern completion.
Graph theory may be a useful tool to identify influential nodes in complex networks that drive pattern completion activity [34][35][36][37][38][39][40]. Graph theory applications in neuroscience have been effective, mainly in the context of understanding information transfer between brain regions or to understand network restructuring in disease. Examples of graph theory in these whole-brain cases have used either diffusion tensor imaging to extract structural connectivity or fMRI signals to estimate functional connectivity [41][42][43][44][45]. However, there is limited application of graph theory to analyze neural microcircuits, specifically cortical ensembles. Studying these microcircuits in vivo requires recording ensemble dynamics with high spatial, temporal, and genetic specificity. Neural relationships, such as correlations between spikes, help estimate network connectivity. Estimates of connectivity are limited by the slow temporal kinetics of calcium indicators, or the low spatial resolution and lack of genetic specificity of microelectrode arrays [46][47][48][49][50]. In comparison, a computational approach could analyze subthreshold dynamics and network connectivity while maintaining high spatial and temporal resolution.
In this study, we developed a computational model that simulated layer 2/3 of mouse V1. Computational models of layer 2/3 of V1 have been a useful tool to study circuit motifs that facilitate different activity patterns involved in visual processing [51][52][53]. Computational models have also been employed to investigate network responses to stimulation [54,55]. Recent work used a rate-based model of layer 2/3 of mouse V1 to analyze the network-wide effects of single neuron perturbation [55]. Their findings helped elucidate the circuit mechanism underlying feature suppression in excitatory cortical neurons. We built off these studies to develop a realistic spiking model of layer 2/3 of mouse V1 to observe and manipulate ensemble dynamics. Our model consisted of excitatory pyramidal neurons and three types of inhibitory neurons (Parvalbumin (PV), Somatostatin (SOM), and Vasoactive intestinal peptide (VIP) expressing neurons). Each neuron class in our model replicated connectivity patterns and functional properties found in slice and in vivo. We used K-means clustering to identify ensembles. We simulated optogenetic perturbation of neuron pairs within an ensemble to identify and characterize efficient pattern completion neurons over hundreds of trials. We then applied graph theory to predict efficient pattern completion neurons. Finally, we developed a novel metric to identify pattern completion neurons that could be experimentally acquired with modern recording techniques.

Our model of V1 layer 2/3 replicated structural and physiological properties found in mice
We attempted to accurately model the connectivity patterns and physiological properties found in layer 2/3 of the mouse visual cortex (Methods). First, the network mimicked the composition and connectivity pattern of the mouse cortex (Figs 1A and S1) [51][52][53]56,57]. The network comprised 4000 leaky integrate-and-fire neurons representing four different cell types. Excitatory pyramidal neurons made up 80% of the network, while inhibitory (PV, SOM, and VIP) neurons made up the remaining 20% of cells. Excitatory neurons made connections with other excitatory neurons and all inhibitory neuron subtypes, PV neurons connected to excitatory neurons and had recurrent connections with other PV neurons, and SOM neurons connected to all other neuron subtypes except themselves [51,53,56]. Neurons were assigned a preferred orientation (PO) between 0 and 180 degrees, which influenced connectivity between between select neuron subtypes. Our model (filled bars) closely match the PSP distribution found by Cossell et al. [57] (left) and Znamenskiy et al. [60] (middle/right). (D) The average spontaneous firing rates in our model over 30 seconds resemble those found in vivo in anesthetized mice in Mesik et al. [63], Neill and Stryker [64], and Chen et al. [62]. Data points and error bars from in vivo data are the mean ± s.e.m. found in those studies. (E) Average resting membrane potential for each neuron subtype in our model over 30 seconds. Our results closely match those found in vivo in L2/3 of the barrel cortex in mice in Neske et al. [65] and Avermann et al. [66]. Data points and error bars from in vivo data are the mean ± s.e.m. found in those studies.  [51,55]. Excitatory neurons with similar POs were more likely to be connected to each other than to neurons with different POs (Fig 1B) [57][58][59]. The distribution of synaptic strengths between excitatory and PV cells was similar to distributions found in slice ( Fig 1C) [57,60]. Strengths between all other cell types followed lognormal distributions with parameters guided by previously published studies in slice (S1B Fig) [56,61]. Second, the spiking statistics and baseline activity also matched experimentally obtained metrics. The spontaneous firing rate for each neuron subtype in the network resembled the corresponding firing rate found in anesthetized mice ( Fig 1D) [62][63][64]. The baseline resting membrane potential for each neuron type closely resembled values found in vivo; SOM neurons had a higher baseline resting potential than the other neuron types (Fig 1E) [65,66]. The baseline membrane potential fluctuations were also in line with those found in whole-cell patch clamp measurements in vivo (S2A Fig) [67,68].

Our model of V1 layer 2/3 recapitulated known responses to stimulation
The network's response to simulated optogenetic stimulation reproduced features of the response from in vivo mouse studies. We applied stimulation at 30 Hz for 250 ms duration (Methods). The stimulated neuron repeatedly fired during the stimulation period (Figs 2A and S2B). Single neuron stimulation produced a net inhibitory effect on the network (Fig 2B and  2C) [69]. The effect of stimulation was additive: increases in the number of stimulated excitatory neurons increased the inhibitory effect on the network ( Fig 2B). Specifically, the average firing rate of excitatory neurons decreased linearly as the number of stimulated neurons increased (slope = −0.007 spikes/s/stimulation, p = 5.2 × 10 −35 , F-test, n = 20 trials for each stimulation number). The diverging properties and functions of neuron subtypes in the network matched experimental observations. SOM neurons in layer 2/3 responded to stimulation of a single excitatory neuron in vivo whereas PV neurons monitored population activity [70].  [55,69]. This feature suppression was observed experimentally through specific excitatory-inhibitory connectivity; PV neurons most strongly innervated excitatory neurons that strongly activated them and shared their PO [55,60,71].

K-means clustering identified ensembles of densely connected neurons
We used machine learning to identify clusters in the network that shared properties with neural ensembles found in vivo. Given the pattern completion properties of in vivo ensembles as well as their regular spontaneous activation, we reasoned that ensembles consisted of excitatory neurons with strong recurrent connections [2,8,12,[72][73][74]. We constructed such ensembles by removing inhibitory neurons and weak connections from the network and then using K-means clustering to group densely connected excitatory neurons (Methods; Figs 3A, 3B and S3A). All but one of the fifty clusters showed orientation selectivity, and we did not classify the non-orientation selective cluster as an ensemble (S3B Fig). The resulting clusters were qualitatively densely interconnected ( Fig 3C). Neurons within a cluster had a higher probability of being bidirectionally connected than randomly chosen neurons in the network (Fig 3D, p = 5.5 × 10 −5 , one-tailed Wilcoxon rank-sum test, n = 10 sets of 100 neuron pairs). To verify that the clusters also described the functional, non-structural properties of neural ensembles, we tested whether the neurons within an ensemble had more correlated activity and more similar POs than would be expected by chance. Voltage traces from neurons within an ensemble had higher correlation values than the correlation values between randomly selected neurons ( Fig  3E, p = 9.1× 10 −5 , one-tailed Wilcoxon rank sum test, n = 10 sets of 10 neurons). The POs of neurons within an ensemble were also more similar than POs of randomly selected neurons ( Fig 3F, p = 9.1× 10 −5 , one-tailed Wilcoxon rank sum test, n = 10 sets of 10 neurons), which is consistent with groups of these neurons responding to a "Go" stimulus of a certain orientation [8]. These findings were consistent across a range of median ensemble sizes (S4 Fig).  We developed a metric to evaluate and compare the ensemble recall performance of different pattern completion neurons. We chose an ensemble at random to "train" by increasing the weights between ensemble neurons, simulating increased excitability found in ensembles (Methods; S5 and S6 Figs) [75]. We then randomly selected 30 different pairs of ensemble neurons to stimulate, matching experimental optogenetic perturbations [8]. We defined an ensemble as "activated" if 75% of ensemble neurons fired within 10 ms of stimulation. The average ensemble recall rate was just under 5%, which was consistent with in vivo experiments ( Fig 4A) [8]. The average voltage of the ensemble just before stimulation was variable, and the probability of recalling an ensemble increased as the pre-stimulation voltage approached the −55 mV action potential threshold ( Fig 4B). Some neuron pairs could successfully recall an ensemble at lower voltages, whereas other pairs only activated the ensemble when the ensemble was close to threshold. To account for variability in pre-stimulation voltage, we defined a metric that was independent of the distribution of pre-stimulation voltages for any given neuron pair. We termed this metric as the Pattern Completion Capability (PCC) and defined it as the pre-stimulation voltage needed for a neuron pair to achieve a 5% ensemble recall rate (Methods; S5C Fig). Neuron pairs with higher overall ensemble recall rates were able to achieve a 5% activation rate at pre-stimulation voltages further from threshold. PCC was highly negatively correlated with overall ensemble recall rate (Pearson r = −0.96) (Fig 4C).

Graph theoretic connectivity parameters of neurons predicted neuron PCC
We were able to predict a neuron pair's PCC from graph theory parameters (Methods). We used LASSO regression to account for multicollinearity that would arise between these parameters ( Fig 4D). We used best subset selection to choose four graph theory parameters to use as predictors in our LASSO model: degree, closeness centrality, intersection, and union (Methods; Figs 4E, top and S7). Each of these measures was linearly related to PCC with varying strengths (Fig 4E, bottom). Degree and closeness centrality had the strongest relationship with PCC, followed by union and then intersection. Higher values for all parameters were associated with PCCs further from threshold, indicating higher efficiency at activating the ensemble. These parameters were also correlated with each other to varying degrees ( Fig 5A). LASSO regression accurately predicted PCC from the four network parameters (Pearson r = 0.92) (Fig 5B and S1 Table). We tested the importance of each parameter with respect to the model's prediction accuracy by computing each parameter's algorithm reliance value (Methods; Fig 5C). Model accuracy was most reliant on degree: removing degree from training resulted in a 1.7-fold increase in error. Additionally, removal of any of the variables from the algorithm increased the amount of error in predicted PCC, as indicated by algorithm reliance values greater than 1.
To test for reproducibility, we chose another ensemble at random, stimulated 25 random neuron pairs from that ensemble, and calculated the same network parameters for those pairs (S8 Fig). Trends in multicollinearity were similar between both ensembles ( Fig 5A). LASSO densely connected neurons from the similarity matrix. The number of clusters was tuned such that the median size of each cluster (ensemble) was about 40 neurons (dashed lines). (C) Three examples of ensembles found in the network. We returned directionality (arrows) to the network after the ensembles were found. (D) Neurons within each ensemble had a higher probability of being bidirectionally connected than randomly selected neurons in the network. n = 10 sets of 100 neuron pairs. (E) Neurons within each ensemble were more correlated than randomly selected neurons in the network. n = 10 sets of 10 neurons. (F) Neurons within each ensemble had more similar POs than randomly selected neurons in the network. n = 10 sets of 10 neurons. *** indicates p < 0.001.
https://doi.org/10.1371/journal.pcbi.1011167.g003 Neuron pairs with a higher overall ensemble recall rate were more likely to activate ensembles even at lower pre-stimulation voltages. (C) The pre-stimulation voltage needed for stimulation of a neuron pair to activate an ensemble 5% of the time was significantly and negatively correlated with the overall ensemble recall rate (p = 5.5 × 10 −16 , F-test, n = 29 neuron pairs). regression accurately predicted PCC in this ensemble as well (Pearson r = 0.92) (Fig 5B). Model accuracy was most dependent on degree, albeit to a lesser extent (algorithm reliance = 1.3) indicating that the other graph theory parameters were able to compensate for the loss of degree information.

A novel latency metric helped identify pattern completion neurons
We next identified an alternative approach to identifying pattern completion neurons that accounted for the limitations of in vivo recordings. Although optical and protein engineering advancements have improved the temporal resolution and field of view of imaging setups, it remains difficult to reconstruct network connectivity using in vivo measurements. Calcium indicators do not measure sub-threshold voltages, which limits their ability to estimate EPSP and IPSP values. Voltage indicators typically have small responses and likewise cannot accurately quantify the postsynaptic strengths [76]. These challenges could hinder the assessment of the weights and directionality of connections in vivo.
However, the improved kinetics of modern calcium and voltage indicators can obtain useful information about ensemble dynamics. Our method of estimating PCC in vivo can take advantage of the sequential activation pattern of ensembles; neurons activate in stages rather than simultaneously (Fig 6A) [28][29][30][77][78][79][80]. This pattern of activation was not uniform over our trials. We hypothesized that, on average, neurons that fired earlier in the sequence would have better pattern completion abilities than neurons that fired later. We labeled this metric "latency," and found that it was highly correlated with PCC in both ensembles (Pearson r = 0.70 and 0.85) (Methods; Fig 6B).
The variable rise kinetics of popular calcium indicators can hinder estimation of spike timing [48]. We next evaluated how variability of in vivo optical imaging conditions, specifically those of modern GCaMP variants, would affect the measurement of latency (Methods) [19,81]. The rise time variability of GCaMP8f is less than that of GCaMP7f ( Fig 6C). Variability of these sensors impaired measurement of latency as shown by a weakened correlation between PCC and latency in both ensembles (Fig 6D; p-values in S2 Table, one-tailed Wilcoxon ranksum test, n = 50 sets of 100 ensemble activation events).
We next examined the number of ensemble recall events required to predict PCC using different calcium indicators. For both ensembles, GCaMP8f was able to achieve a Pearson r between latency and PCC greater than 0.5 in under 50 events (Fig 6E). However, GCaMP7f did not reach a Pearson r of 0.5 in under 150 events in ensemble 1 and needed 65 events in ensemble 2. Taken together, these results suggest that GCaMP8f could be used in vivo to identify pattern completion neurons from a reasonable number of ensemble activation events.

Latency from spontaneous ensembles identified pattern completion neurons
Next, we tested our latency metric on spontaneous ensemble recall events. Ensembles in vivo have regular spontaneous activity, and we hypothesized that neurons that fired earlier in spontaneous ensemble events would be better pattern completion neurons [4]. We selected a new Neurons with higher overall ensemble recall rates could achieve a 5% recall rate at voltages further from threshold. (D) Graph theory parameters helped predict the pattern completion capability of each neuron pair through LASSO regression. (E) Top: Schematics representing the four network parameters calculated for each neuron pair. Bottom: The relationship between each network parameter and the pattern completion capability of each neuron pair. All variables were significantly and negatively correlated with PCC, with degree and closeness centrality having the strongest correlation (from left to right p = 4.6 × 10 −9 , 1.9 × 10 −8 , 3.3 × 10 −3 , 1.3 × 10 −5 , F-test, n = 29 neuron pairs).
https://doi.org/10.1371/journal.pcbi.1011167.g004 ensemble (n = 36 neurons) and observed spontaneous activity for 250 seconds (Fig 7A). We detected 41 spontaneous ensemble recall events during this period. We calculated latency as the relative time between each neuron spike and the median time for each ensemble event  Fig 7B). We averaged the latency for each neuron over the 41 events (Fig 7C). We identified 6 neurons with early average latency and 5 neurons with late average latency ( Fig  7D). We randomly defined 10 pairs of early neurons and 10 pairs of late neurons drawn from our defined neurons without replacement, stimulated each pair 400 times, and calculated the recall rate and PCC for each pair. Neurons with earlier latency during spontaneous ensemble events were better pattern completion neurons than neurons with higher latency. Neuron pairs that, on average, fired early in spontaneous recall events had significantly higher recall rates during paired stimulation (Fig 7E; p = 1.4 × 10 −4 , one-tailed Wilcoxon rank sum test, n = 10 pairs of neurons). Early neurons also had significantly lower PCC than later neurons ( Fig 7F; p = 8.9 × 10 −5 , one-tailed Wilcoxon rank sum test, n = 10 pairs of neurons). Finally, we found that simulated action potential latency was correlated with calcium latency when simulated using experimental GCaMP rise kinetics (Fig 7G).

Stimulation of five neurons could reliably activate ensembles
Finally, we evaluated the number of stimulated neurons needed to reliably activate ensembles. Stimulation of two pattern completion neurons could achieve a median recall rate of about 15% (Fig 7E). Using the same ensemble as in Fig 7, we identified the seven neurons with the earliest latency and randomly sampled 10 sets of 2 to 5 of these neurons without replacement. We also randomly sampled sets of neurons with varying latency from the entire ensemble. We stimulated each of these neuron sets 200 times at both 20 Hz and 30 Hz (Fig 8A). Randomly sampled neurons consistently had a lower recall rate than early neurons. We found that 20 Hz stimulation of either four early neurons or five random neurons could reliably activate ensembles (recall rate >75%). 30 Hz stimulation did not allow enough time between trials for the ensemble to recover from the previous recall event (S9 Fig). Importantly, PCC calculated from the same neuron pairs stimulated at both 20 Hz and 30 Hz were consistent, indicating that PCC was a robust measure of a neuron pair's performance (Fig 8B; Pearson r = 0.88, n = 10 pairs of early neurons and 10 random pairs of neurons). We then analyzed the response of the rest of the network when the ensemble was stimulated at 20 Hz for 400 ms. As the recall rate increased, the number of active excitatory neurons increased compared to pre-stimulation ( Fig 8C). The neurons that were active during this period were more likely to have similar POs to ensemble neurons (Fig 8D). The probability of these neurons firing during the stimulation period increased as the recall rate/number of stimulated neurons increased. These results were consistent with those found in vivo [10]. To balance this increased excitatory activity, each group of inhibitory neurons increased their firing rate during the stimulation period compared to baseline (Fig 8E). This increase was directly correlated to recall rate for each inhibitory neuron group (n = 80 neuron groups, sizes 2 to 5).

Discussion
In this work, we developed a spiking model of layer 2/3 of mouse V1 to evaluate properties of pattern completion neurons in cortical ensembles. Our model replicated known connectivity patterns and functional properties found in slice and in vivo. Our model consisted of 80%  https://doi.org/10.1371/journal.pcbi.1011167.g008 excitatory neurons, and the remaining 20% of neurons was split into three interneuron classes: PV, SOM, and VIP neurons. Synaptic weights between these cell types were log-normally distributed and these distributions aligned with postsynaptic potentials found in slice. Connections were governed by PO similarity, which inherently created ensembles. Additionally, fundamental physiological properties of each neuron class, such as firing rate and resting membrane potential, replicated values found in vivo. The functional properties of specific neuron classes in our model were also concordant with in vivo findings. Excitatory neurons in our model exhibited feature suppression; SOM neurons monitored single neuron activity while PV neurons monitored population activity. Finally, ensembles in our model exhibited characteristics found in vivo: ensembles displayed pattern completion properties and activated sequentially. We then developed methods to optimize selection of pattern completion neurons. We defined a novel metric, PCC, that described the pre-stimulation voltage needed for pattern completion neurons to have a 5% success rate. Efficient pattern completion neurons were able to activate ensembles at lower pre-stimulation voltages. We found that LASSO regression could accurately predict PCC using graph theory parameters. Of these parameters, degree had the strongest correlation with PCC. We next defined a novel metric, latency, that could be used in future calcium imaging studies to identify pattern completion neurons in vivo. Latency could be used to predict PCC in both evoked and spontaneous ensemble events. Finally, we found that stimulation of five ensemble neurons could reliably activate ensembles.
Our computational approach allowed us to identify and characterize network properties that optimize ensemble recall. Previous work has used graph theory to identify pattern completion neurons in vivo [8,82]. These studies used conditional random fields (CRFs) to identify neurons that predicted features of visual stimuli associated with their ensemble. When tested with optogenetic stimulation in vivo, these pattern completion neurons had a 5% ensemble recall rate. Pattern completion neurons in our model also had an average recall rate of 5%. Our computational approach improved our understanding of network connectivity and enabled simulation of subthreshold dynamics. We found that ensemble neurons had highly correlated membrane potentials, and ensemble recall probability was dependent on the average membrane potential of the ensemble just before stimulation. We then applied graph theory and machine learning to identify network properties that optimized ensemble recall rate.
The relationship between network architecture and driver nodes has varied across studies in multiple disciplines. Multiple methods have been proposed to identify influential nodes in theoretical complex networks [40,[83][84][85][86]. Network-based methods have also been used to identify influential nodes in a variety of real networks including metabolic, cell-signaling, and social networks [87][88][89][90]. Theoretical methods have generally found that centrality measures such as betweenness centrality are better predictors of influence than degree. However, the properties of optimal control nodes varied with network type; driver nodes in theoretical networks typically had low degree while driver nodes in biological networks had higher degree values [88,[91][92][93][94] Neuroscience networks likely also have unique properties that affect node influence.
Our analysis of network theory focused on a regime of network theory underexplored by previous studies. Specifically, neural networks are usually directed, spiking networks with both excitatory and inhibitory connections. These networks also have complex subthreshold dynamics that vary over time. Our work expanded on previous computational and systems neuroscience studies to apply graph theory to cortical ensembles. The growing field of network neuroscience aims to identify influential nodes in structural and functional brain networks using network theory and control theory [37,39,43,45,95,96]. However, nodes in these networks usually refer to brain regions rather than individual neurons [35,41,44,[97][98][99]. Our work focused on identifying pattern completion neurons in cortical microcircuits. Moreover, our work focused on finding an optimal pair of influential nodes, rather than a single influential node. Nevertheless, our finding that the degree of these neuron pairs strongly predicted influence on the rest of the network matched similar observations in other networks [93].
Future work that follows from our simulated results could validate and apply our findings in vivo. Researchers can estimate graph theory parameters by reconstructing network connectivity using electrical or optical modalities. Network connectivity has been estimated by calculating the cross-correlation, transfer entropy, or Pearson r between electrical or fluorescence traces from individual neurons [46,[100][101][102][103][104][105][106][107]. Multiple recording modalities can obtain traces from ensemble neurons in vivo, each of which has distinct advantages and disadvantages. First, microelectrode arrays can be used to study cortical dynamics with high temporal precision [50,[108][109][110][111]. However, these electrodes record extracellular field potentials from multiple surrounding neurons and therefore lack the spatial and genetic specificity offered by geneticallyencoded fluorescent indicators. Genetically-encoded calcium indicators provide high spatial resolution and can be genetically targeted to study specific cell types. However, the temporal dynamics of these sensors are slower and more variable than electrode measurements [19,47,48,112,113]. This slow temporal resolution may limit functional connectivity estimation to undirected networks. However, the amplitude and temporal kinetics of these indicators have iteratively improved over the past decade; the signal-to-noise and temporal resolution of GCaMP8f are 5 times better than those of GCaMP6f [19,113]. Recent advances in protein engineering have designed genetically-encoded voltage indicators that can record membrane potentials with genetic specificity and high spatial and temporal resolution [76,[114][115][116][117]. However, capturing millisecond temporal dynamics requires an imaging frame rate of at least 500 Hz. Additionally, the signal-to-noise ratio of these indicators is an order of magnitude worse than that of calcium indicators. For these reasons, voltage imaging is not currently feasible for neural population studies in most lab settings. The further development of calcium indicators as well as the emergence of voltage imaging will likely improve the estimation of network connectivity. Improved temporal precision of calcium indicators will help estimate direction of connectivity, while analysis of subthreshold dynamics via voltage imaging will improve estimation of connection strength [76,113,[115][116][117][118]. Nevertheless, our investigation of the network parameters that predict pattern completion ability could potentially inform macro-scale studies, where network connectivity between brain regions is achievable. Such information could be used to study conditions such as seizures, which have been shown to exhibit stereotypical sequential activations of different brain regions, similar to ensemble activation events [119][120][121]. However, for studies at the micro-scale in vivo, we recommend researchers use our latency metric or stimulate at least five ensemble neurons.
To account for limitations in network reconstruction, we defined a novel latency metric that predicted pattern completion neurons in vivo that was independent of network parameters. Our latency metric required repeated activation of ensembles, which can be achieved using holographic optogenetics or the repeated presentation of sensory stimuli [4,8,10,18]. We found that a robust relationship between latency and PCC was achieved in less than 100 trials using GCaMP8f. This aligned with the number of trials used in a single session to train mice on a visual discrimination task [122,123]. The entire process of identifying an ensemble, activating it tens to hundreds of times, and then calculating latency, can be accomplished in a few hours. Pattern completion experiments can then be performed directly after or on subsequent days, as imprinted and visually-evoked ensembles are stable across days [1,2]. However, given the temporal precision needed to accurately measure latency, experimentalists using this technique should use imaging setups that can achieve fast (>200 Hz) frame rates [124][125][126].
A computational model can interrogate network response to repeated activations of ensembles in a fraction of the time of experimental protocols but has inherent limitations. It is possible that pattern completion properties of ensembles appear only after training by repeated co-activation of a subset of ensemble neurons [10,11,127,128]. Recent work suggested an "Iceberg Model" of ensemble activation where neuronal excitability increases during the training of ensembles [75]. In support of this theory, connections between ensemble neurons were found to have a lower action potential threshold in slice and these neurons also exhibited increased membrane resistance after training. To simulate increased neural excitability, we increased the weights of connections between ensemble neurons during ensemble stimulation trials. However, future computational models may consider incorporating alternative mechanisms to simulate increases in neural excitability. While our synaptic weights were static, plasticity mechanisms to change synaptic weights during training can be added to study neural ensemble formation and activation [129,130]. Additionally, researchers may manually alter the spike threshold and lower membrane resistance in ensemble neurons to replicate recent findings in slice [75].
In this study, we used K-means clustering to identify potential ensembles that we could then "train" by increasing the weights of connections between these neurons. A limitation of this method is that K-means clustering cannot group a neuron into more than one ensemble, while previous work has established that individual neurons can participate in multiple ensembles [4]. Another limitation of our method is that we trained a single ensemble at a time when many ensembles exist simultaneously. Finally, K-means clustering in our model relied on knowledge of network connectivity, which is difficult to obtain from imaging experiments. Thus, we suggest experimentalists identify ensembles by observing spontaneous or visually evoked activity in vivo.
Overall, our work can help researchers efficiently identify pattern completion neurons to methodically manipulate ensembles with precise temporal resolution. These experiments are critical to elucidating the functional roles that ensembles and pattern completion play in perception, memory, and behavior.

Network connectivity
Our model consisted of 4000 neurons. We divided these neurons into four groups: excitatory neurons (3200 neurons), PV neurons (330), SOM neurons (330), and VIP neurons (140). We assigned each neuron a preferred orientation between 0 and 180 degrees, which determined the probability of connection between some cell types and the cell-to-cell connection strength (S1A Fig). The probability of connection between excitatory neurons was high if the neuron's POs was parallel; neurons with perpendicular POs were least likely to be connected. The synaptic connection weight of a projection from neuron i to neuron j, W ij , was also proportional to PO similarity: the change in conductance that neuron i evoked in neuron j was greater between neurons with more parallel POs. W values between cell types followed a log-normal distribution. The mean and deviation of these connections was guided by their respective experimental values found in slice from previous studies (S1 Fig) [56,57,60].

Neuron model
We used a leaky integrate-and-fire model to simulate voltage dynamics. We modeled synapses with a conductance in the post-synaptic neuron that changed with activity in a pre-synaptic neuron [130]. The sub-threshold membrane voltage of neuron i, V i , obeyed the equation: The membrane capacitance was C m = 200 pF and the leak conductance was g L = 10 nS. Reversal potentials for excitatory and inhibitory channels were V E = 0 mV and V I = −80 mV, respectively. We tuned the leak reversal potential, V L , for each cell type to match the simulated resting membrane voltage to the experimental values; this resulted in V L = −50 mV, −55 mV, −65 mV, and −65 mV for EXC, PV, SOM, and VIP neurons, respectively. Following an action potential, neurons experienced a refractory period of 2 ms.
The background current was B = 100 pA for excitatory neurons and 0 pA for inhibitory neurons. We simulated membrane voltage fluctuations with a noise amplitude of 5 mV obtained from in vivo patch clamp recordings [67,68].
We simulated optogenetic stimulation with a 250 ms train of 0.6 ms width pulses at 30 Hz. These pulses activated the optogenetic conductance g opsin = 150 nS. The reversal potential for the simulated light-gated channel was V opsin = 0 mV, based on the reversal potential for Channelrhodopsin 2 [131]. The rhodopsin was inactive (g opsin = 0 nS) during periods without stimulation.
An action potential event in an excitatory neuron, i, increased the excitatory conductance of postsynaptic neuron j, g j,E , by the synaptic connectivity strength, W ij . Postsynaptic increases in conductance were delayed 1.5 ms from the presynaptic spike to account for conduction time and synaptic delay [132]. An action potential in an inhibitory neuron increased inhibitory conductance of postsynaptic neuron j, g j,I , by W ij . Postsynaptic increases in inhibitory conductance occurred between 4 and 6 ms after the presynaptic spike [133]. Spontaneous firing was simulated with Poisson input that increased g i,E by 30 nS at different rates for each cell type (EXC = 0.08 Hz, PV = 1.6 Hz, SOM = 0.05 Hz, VIP = 1.3 Hz). Otherwise, these parameters followed dynamics governed by the equations: The integration time step for our simulations was 0.1 ms. Simulations were programmed in Python using Brian2.0.

Identifying ensembles
We first removed the weakest 80% of EXC-EXC connections from the network (EPSP < 0.9 mV). We then removed directionality from the network such that any neurons with either unidirectional or bidirectional synapses were considered connected. We created an adjacency matrix, A, where each cell A ij represented the number of neurons that neuron i and neuron j were both connected to. We performed K-means clustering on A and tuned the number of clusters k until the median cluster/ensemble size was 40 neurons (S3A Fig). This neuron number was based on previous imaging experiments and assumed that those experiments underestimated ensemble size due to limitations in field of view and viral expression [1,10]. The median ensemble size did not affect correlation between ensemble neurons or deviation in PO for ensemble neurons (S4A and S4B Fig). Ensemble size had only a moderate effect on the probability of bidirectional connection (S4C Fig). However, in all tested ensemble sizes, probability of bidirectional connection was greater between cluster neurons than between random neurons. In this study, we focused on orientation-selective ensembles; we randomly selected ensembles from the result of K-means clustering that had a PO standard deviation less than 0.25 radians (S3B Fig). Ensemble 1 in this paper had 37 neurons and ensembles 2 and 3 had 36 neurons. Directionality was added back to the network for further analyses.

Stimulating pairs of neurons to observe ensemble response
We simultaneously stimulated pairs of neurons randomly chosen from an ensemble (30 pairs for ensemble 1 and 25 pairs for ensemble 2). Our simulations were eight 0.6 ms pulses delivered at 30 Hz. For subsequent 20 Hz stimulation, we delivered eight 0.6 ms pulses at 20 Hz. Each pulse successfully activated the ensemble if at least 75% of the neurons in the ensemble fired within 10 ms of the start of the pulse.
We altered the weights of connections between ensemble neurons to achieve a 5% ensemble activation rate by titrating the fold-increase in connection weight (Discussion, S5B Fig). We then altered the background current to B = 75 pA for excitatory neurons to maintain the baseline resting membrane potential and spontaneous firing rates found in vivo (S6 Fig).

Pattern completion capability
We define pattern completion capability (PCC) as the mean membrane voltage of an ensemble of neurons in the immediate 10 ms before stimulation that corresponded with a 5% ensemble recall rate. We calculated this value by first calculating the pre-stimulation voltage of the ensemble, V e , as the average of the membrane voltage of the ensemble in the 10 ms leading up to stimulation for each trial. We then binned trials by their pre-stimulation voltage in a sliding window with a width of 5 mV and a step size of 1.6 mV. Because there were few trials at high mean ensemble voltages (S5A Fig), we excluded trials with mean voltage above −68 mV from analysis. We fit the recall rate to a logistic form: where parameters a and b were fitting parameters found with nonlinear least squares optimization.
We achieved a strong fit for 29 of the 30 neuron pairs in ensemble 1 and all of the neuron pairs in ensemble 2 (R 2 = 0.99 ± 0.12 for ensemble 1 and R 2 = 0.98 ± 0.03 for ensemble 2) (S5C and S5D Fig). The pair in ensemble 1 with a weak R 2 value was excluded from further analyses. We estimated PCC from each neuron pair's fitted equation by setting recall rate equal to 5% and solving for V e .

Graph theory parameters
Graph theory parameters were calculated from the network that removed weak connections (EPSP < 0.9 mV). We also removed all inhibitory neurons and all excitatory neurons outside of the ensemble from the network as well. Parameters were calculated as follows for a pair of neurons A and B: • Degree: Weighted sum of all outgoing connections from Neuron A to other nodes in the ensemble plus the weighted sum of all outgoing connections from Neuron B to other nodes in the ensemble.
• Closeness Centrality: Inverse sum of the distance from Neuron A to all other neurons in the ensemble plus the inverse sum of the distance from Neuron B to all other neurons in the ensemble. Distance was quantified using the synaptic weight.
• Intersection: Number of neurons postsynaptic to both Neuron A and Neuron B. To select these parameters, we performed best subset selection on seven common network parameters and found that the model with these four parameters maximized accuracy and parsimony (S7 Fig). Parameters were calculated using MATLAB's centrality function.

LASSO regression
We used LASSO regression to predict PCC from graph theory parameters. LASSO regression uses L 1 regularization to minimize the model coefficients,b, while minimizing mean squared error across the ensemble neurons: In this model, x i , represented the 4 × 1 vector of graph theory parameters for the i th neuron pair. PCC i represented the PCC value for the i th neuron pair. Thus, x T ib represented the predicted PCC value. We used leave-one-out cross validation to tune the penalty term, λ, for each ensemble.

Algorithm reliance
We calculated algorithm reliance for each variable by removing that variable from training, training a new LASSO model, and comparing the performance of the new model to the original model with the following ratio:

Latency
We calculated the per-trial latency for each neuron by finding the time from the beginning of optogenetic stimulation to the first spike of each neuron. Spikes that occurred more than 10 ms after stimulation were excluded. Neurons that were targeted with optogenetic stimulation were excluded from analysis of that trial. The latency for each neuron was the mean per-trial latency. The latency for neuron pairs was the sum of latency measure for each stimulated neuron. To calculate latency for spontaneous ensemble activation events, we first found the median spike time for each event. We then calculated latency as the spike time of each neuron relative to the median by subtracting the median time from the spike time of each neuron.

GCaMP Analysis
GCaMP7f and GCaMP8f rise time kinetics were taken from previous research that performed simultaneous calcium imaging and cell attached electrophysiology of neurons in mouse visual cortex [113]. These data were collected using two-photon microscopy at 112 Hz to image neurons in layer 2/3 of the visual cortex in mice. Rise kinetics were sampled from the measured single action potential rise times.
We incorporated rise time variability into our latency measurement by adding a rise time value randomly sampled from the rise time distributions found in vivo for each sensor in each per-trial latency measurement. The correlation between neurons was consistent across ensemble size. Over all ensemble sizes, the within-cluster correlation was higher than the correlation between random neurons. The shaded region represents mean ± standard deviation (n = 500 neuron pairs drawn from ensembles clustered using each median cluster size). (B) The deviation in PO between neurons was consistent across ensemble size. Over all ensemble sizes, the within-cluster PO deviation was lower than the deviation between random neurons. The shaded region represents mean ± standard deviation (n = 500 neuron pairs drawn from ensembles clustered using each median cluster size). (C) The probability of a bidirectional connection decreased slightly as the median ensemble size increased. Over all ensemble sizes, the probability between neurons within clusters was greater than the probability between random neurons. The shaded region represents mean ± standard deviation (n = 500 neuron pairs drawn from ensembles clustered using each median cluster size).  [65] and Avermann et al. [66]. Results from in vivo data are the mean ± s.e.m. found in those studies. (B) Average spontaneous firing rates in our model over 25 seconds resemble those found in vivo in anesthetized mice in Mesik et al. [63], Neill and Stryker [64], and Chen et al. [62]. Results from in vivo data are the mean ± s.e.m. found in those studies. Each pair was stimulated 800 times. We calculated the ensemble recall rate (the fraction of trials where greater than 75% of the ensemble was activated) for each of the 25 neuron pairs. Neuron pairs with higher ensemble recall rates were considered better pattern completion neurons. (B) Probability of ensemble activation increased with the average membrane voltage of ensemble neurons before stimulation of the neuron pair. (C) The pre-stimulation voltage needed for stimulation of a neuron pair to activate an ensemble 5% of the time was significantly and negatively correlated with the overall ensemble recall rate (p = 1.2 × 10 −13 , F-test, n = 25 neuron pairs). Neurons with higher overall ensemble recall rates could achieve a 5% recall rate at voltages farther from threshold. (D) The correlation between each network parameter and the pattern completion capability of each neuron pair. All variables were significantly and negatively correlated with PCC, with degree and closeness centrality having the strongest correlation (from left to right: p = 1.8 × 10 −9 , p = 1.2 × 10 −8 , 2.2 × 10 −4 , 6.6 × 10 −7 , F-test, n = 25 neuron pairs). (TIF) S9 Fig. 30 Hz stimulation does not permit recovery from previous ensemble recall. (A) We categorized the 8 stimulations for each stimulation period as either odd or even stimulations. Odd stimulations included the first stimulation of each period and had a higher recall rate compared to even stimulations, as shown in (B). (B) For both 20 Hz and 30 Hz stimulation, odd stimulation events had a higher recall rate than even events. However, the discrepancy between odd and even recall rate was greater at 30 Hz than at 20 Hz. Error bars represent standard error (n = 10 neuron pairs and 100 stimulations for each condition). (C) The average membrane potential of all 36 ensemble neurons during and after a successful ensemble recall event. The blue vertical line indicates the start time of a stimulation that resulted in at least 75% of ensemble neurons firing. In line with inhibitory activity shown in (E), the average ensemble membrane potential decreased abruptly at 10 ms after stimulation. This decrease was followed by a gradual rise in membrane potential back to baseline. However, at 30 Hz, the next stimulation event occurred at 33 ms, so the average ensemble voltage following a successful recall event did not have time to return to baseline. (n = 36 ensemble neurons during 371 recall events at 20 Hz and 132 events at 30 Hz). (D) In agreement with the average membrane potentials shown in (C), histograms of ensemble neuron pre-stimulation voltages at 20 Hz and 30 Hz show that 30 Hz even stimulation events had lower pre-stimulation voltages. This agrees with this group having the lower recall rate. Unlike at 30 Hz, even and odd stimulations at 20 Hz had similar pre-stimulation voltages, and the average was greater than 30 Hz even stimulations. Most pre-stimulation voltages were below the resting membrane potential for excitatory neurons, likely due to delayed inhibitory activity following ensemble events, as shown in (E) (n = 10 neuron pairs and 100 stimulations for each condition). (E) Left: We observed stereotypical delayed inhibitory activity following stimulation events. The average firing rate of all inhibitory neuron types increased after stimulation. The arrow denotes the inhibitory event that is shown to the right. (n = 10 stimulated neuron pairs over 5 stimulation periods). Right: The average firing rate of inhibitory neurons peaked at 10 ms following stimulation. This activity likely contributed to the low recall rate of even events at 30 Hz. (TIF) S1