Introduction

Transitive inference (TI) refers to judgments that if A > B and B > C, then one can infer that A > C by the transitive property of ordinal rank1. Rank efficiently encodes list order, as it requires only one value for each item (which scales linearly with list length), rather than encoding a separate relationship for each combination of items (which scales exponentially). A wide variety of species can learn ordered lists2,3. Such hierarchical knowledge is considered important in a variety of natural contexts4,5,6, and does not appear to depend on purely associative mechanisms7,8,9.

Several lines of evidence suggest that abstract cognitive mechanisms give rise to TI abilities10,11, distinguishing them from more concrete forms of learning such as motor sequences. The strongest of these is the presence of a symbolic distance effect (SDE12). Symbolic distance refers to the difference in the ranks of the two items (e.g. adjacent pairs have SD = 1). Decision accuracy varies directly with the symbolic distance that separates each pair13,14. Recent evidence suggests that purely associative mechanisms and reward signals cannot account for these effects7,8,9,15,16. Abstract representations of ordinal rank may be linked to those of physical space17,18, combining both into a “mental workspace.” Such a workspace may be used across modalities, such that spatial and non-spatial information are integrated into a common representation19,20,21. Such a workspace would also be a computationally efficient solution to inferring the implied ordering of stimuli, especially when temporal or spatial cues provide no information about that ordering.

Parietal cortex has long been associated with spatial relationships, and parietal lobe damage in humans gives rise to a well-known spatial hemineglect syndrome. In parietal stroke cases (e.g. Gerstmann Syndrome), multimodality spatial and number-related deficits can co-occur22. Parietal dysfunction involving the intraparietal sulcus (IPS) is associated with irregularities in spatial and numerical representation23,24. Posterior parietal cortex (PPC) has been shown to integrate spatial, temporal, and reward information25. There is also evidence suggesting that the same areas of parietal lobe are responsible for both spatial representation and serial learning26,27,28, including TI29,30.

PPC is also implicated in various number representation and manipulation tasks31,32,33,34,35. Electrophysiological studies suggest that PPC encodes numerosity25,36,37,38, although39 found that LIP inactivation leaves numerosity judgments intact. Lateral intraparietal (LIP) neurons exhibit features of a magnitude accumulator40,41,42 and are implicated in spatial cognition, including spatial attention43, judging relative spatial position44, and probabilistic reasoning45. The concept of a “mental workspace” here refers to an internal representation where behaviorally-relevant spatial, temporal, numerical, sequence, and reward signals are mapped within a common reference frame. Area LIP is a candidate neural substrate for such a workspace46.

Electrophysiology in nonhuman primates offers an opportunity to study how the implied ordering of a set of stimuli is learned without confounds of semantic encoding and language use. Because macaques can learn 7-item lists within a few hundred trials, their learning can be examined with concurrent single-unit electrophysiology. The present study is the first to examine the electrophysiological basis of ordinal position and distance effects during the training of tasks in which the stimulus order is entirely implicit. Evidence that the firing rates of LIP neurons correlate with ordinal position of or symbolic distance between stimuli, would provide correlational evidence for the encoding of a mental number line in LIP. Such results would support future attempts to validate a workspace for ordinal reasoning of this kind, using causal manipulations such as chemical inactivation.

Results

Two subjects (H and L) completed 141 behavioral sessions (NHP H: 74, NHP L: 67). The average number of trials per session was 711 (H) and 890 (L). In each session the subject learned a novel TI list; one session = one TI list. A total of 117 neurons were recorded (H: 64, L: 53) during these sessions. Some neurons were recorded with more than one TI list, and some sessions contained data from more than one neuron, separated using the spike sorting methods described below. The result was a total of 142 recordings (H: 62, L: 80), where each recording contains data from one neuron with a novel 7-item list.

Behavioral performance

A mental representation of an ordered list could be implemented as a direct representation of ordinal position, or by related variables such as symbolic distance and joint rank. Performance accuracy and response time were analyzed to determine the influence on behavior of these variables, as well as variables related to reinforcement. A total of 141 sessions were analyzed (H: 74, L: 67). Each session started with a novel set of pictures that were arranged into a list by assigning a rank order to each picture (Methods, Fig. 7). The ranking was initially unknown to the subject. Figure 1 illustrates the development of both the probability of choosing a given stimulus and the probability of a correct response over the first 500 choice trials. The influence of item rank on choice probability emerged over the first 300 trials (Fig. 1A). Learning curves (Fig. 1B) showed a monotonic increase in response accuracy (Fig. 1B black line is performance averaged over all distances) from a baseline of 0.5 (chance) up to an average around 0.75.

Figure 1
figure 1

Behavioral Performance. Response accuracy (proportion correct) was averaged over sessions, but separated by trial number, symbolic distance, and joint rank. Data were smoothed using a 25 trial moving window. (A) Probability of choosing each item during the first 500 choice trials. (B) Mean accuracy during the first 500 trials (black line), as well as accuracy sorted by symbolic distance (colored lines). (C) Estimated accuracy for each pair at trial 250 sorted by joint rank and symbolic distance. Error bars are standard deviation across sessions. (D) Estimated accuracy at trial 500.

Symbolic distance had a strong influence on performance accuracy (Fig. 1B,C), emerging with a time course similar to that of overall performance. Accuracy was lowest for adjacent pairs, somewhat higher for distance 2 pairs, and so on (Fig. 1C). Across sessions, the correlation (Pearson’s r) of performance (session average) with symbolic distance was significant for both subjects (H: r = 0.62, p < 0.0001, N = 400; L: r = 0.61, p < 0.0001, N = 402) with moderate effect sizes (H: r2 = 0.38; L: r2 = 0.37). Qualitatively, performance tended to be better for pairs that included the first and last list items, a “terminal item effect” (Fig. 1C,D47).

Reaction times were remarkably short and consistent across symbolic distances. Even though subjects were allowed up to 1.5 sec after stimulus onset to respond by making a saccadic eye movement to the target or distractor, choice reaction times (RT, aka saccade latency) were 232 +/− 31 msec (mean+/− s.d.) and remained in a narrow range throughout each session. Although choice reaction time varied inversely with symbolic distance (ANOVA p < 0.0001, N = 87,296), the range was only 7 msec between the longest and shortest mean RTs (distance = 1, mean RT = 233 msec; distance = 6, mean RT = 226 msec).

Overall, this behavior indicates that subjects rapidly acquired a representation of the list order, consistently expressing a preference for items of lower rank after only 250 to 300 trials of experience with an otherwise never-before-seen set of stimuli.

Electrophysiology

We recorded from every neuron that we were able to isolate. We categorized neurons using a hierarchical clustering algorithm based on similarity of firing rate dynamics (Methods, Fig. 8). Firing rates were first averaged across trials using KD estimation48, to obtain an overall temporal response profile for each neuron. These averages were then z-standardized, and sorted into clusters based on the similarity of their temporal profiles. Categories were identified by building a dendrogram of correlations using agglomerative hierarchical cluster analysis49, with r = 0.7 as the cutoff for cluster membership. This resulted in 6 clusters, depicted in Fig. 8C. All classes of neuron were included in subsequent analyses.

Cluster V, comprising visual neurons, contained the most recordings. An example recording session for one visual neuron is shown in Fig. 2. The cell had a transient, short latency (50 msec) response to the appearance of a visual stimulus in its receptive field (Fig. 2A). The neuronal response started much earlier than the behavioral response latency (150 msec for single targets, 233 msec for choice targets). Spatial selectivity (Fig. 2B) was assessed on single target trials, showing a 2:1 difference between responses to stimuli inside (Fig. 2B blue) vs. outside (Fig. 2B magenta) the receptive field. On choice trials, selectivity was assessed by sorting according to saccade direction (Fig. 2B, black, grey). This cell did not differentiate between saccades toward or away from the RF, and can thus be said to have visual spatial selectivity, but not movement-related selectivity. In Fig. 2C, firing rate has been converted to z-score and plotted as a function of joint rank (the sum of the ranks of the two stimuli presented in each trial) for trials with two stimuli (choice trials). Responses on choice trials are also plotted as a function of symbolic distance (difference in rank of the two stimuli, Fig. 2D). Figure 2E plots response vs. ordinal position (rank) for the stimulus inside the receptive field. During single target presentations (Fig. 2E, black), variations in the response to different images can only be due to the visual properties of the stimuli, as the NHP had yet to learn the rank of each stimulus. A z-score of zero implies that the response was equal to the average response across all conditions. This cell shows effects that were characteristic of the population: strong spatial selectivity for single targets, response averaging for two stimuli, weak selectivity for ordinal position, and modulation by symbolic distance and joint rank. These features of the data are elaborated below.

Figure 2
figure 2

Example of a recording session for one cell. (A) Spike raster aligned on stimulus onset (vertical green line). Each blue dot (for single target trials) or black dot (for choice trials) represents an action potential. Each row is one trial. Red dots are saccade latencies. (B) Smoothed average firing rate vs. time. Blue line is for a single target in the receptive field, purple for a single target outside the receptive field. Black/gray corresponds to choice trials (two targets) where the saccade was toward (black) or away (gray) from the receptive field. (C) z-scored firing rate vs. joint rank for all choice (2-target) trials. (D) z-scored firing rate vs. symbolic distance for all choice trials. (E) z-scored firing rate vs. ordinal position.

Encoding of joint rank and symbolic distance

For each recording, z-scores of the average firing rate in the interval starting 40 msec after stimulus onset and ending with saccade onset were sorted by joint rank. Example recordings with choice trials sorted by joint rank are shown in Fig. 3A,B. Figure 3C shows the firing for each recording, sorted by the joint rank that was associated with the strongest response. Figure 3D shows the decoding of joint rank by an optimal linear estimator (OLE50) using the neuronal ensemble activity. The decoding is more accurate for late trials due to the fact that learning-related changes in selectivity have stabilized. The OLE was run 1,000 times and the variability of estimates was obtained with a bootstrap procedure. The same analyses were done for symbolic distance (Fig. 3E–H). Roughly half the recordings had the greatest activity for one of the two largest symbolic distances (Fig. 3G). Across the population most neurons tended to have higher firing for larger symbolic distance, whereas the tuning for joint rank was more distributed. As was the case for the behavioral results (Fig. 1C,D), the neural correlates of joint rank and symbolic distance emerged during the first 250 learning trials and were stable for the duration of the session.

Figure 3
figure 3

Joint rank and symbolic distance coding. (A,B) Example post-stimulus time histograms for two recordings with activity during choice trials sorted by joint rank. (C) Recordings (n = 142) sorted by preferred joint rank based on z-scored firing rate. (D) Estimated joint rank using an optimal linear estimator. Dotted lines are 95% confidence intervals for all trials based on 10,000 bootstrap iterations. (E,F) Example post-stimulus time histograms for two recordings sorted by symbolic distance (different recording sessions than A,B) (G) Recordings sorted by preferred symbolic distance. (H) OLE estimated symbolic distance with bootstrap estimates of 95% confidence interval.

The effects of joint rank and symbolic distance were further quantified by computing the variance accounted for (VAC) by each variable. For each recording, firing rate was smoothed by applying Gaussian Process Regression (GPR) using a model with trials, time, joint rank (JR), and symbolic distance (SD). Because GPR is flexible and non-linear, a measure of the VAC provided a way of assessing the degree of tuning a cell displayed as a function of that variable, without specifying a particular form that tuning had to take. The estimation was done within 4 epochs: baseline (250 ms before stimulus onset), visual (between 5 ms to 150 ms after visual presentation), presaccadic (100 msec before saccade initiation) and postsaccadic (500 ms after saccade execution) period. Each period was divided into intervals of 10 msec to create spike counts over time as a prior to the GPR. The posterior estimate of firing rate (mean and standard deviation) was a continuous function of time, trial, JR and SD. The estimated firing rate and uncertainty were then averaged within the previously specified time epochs. To estimate the amount of variance associated with JR or SD across the session for visual, presaccadic or postsaccadic epochs, we subtracted the baseline, and then evaluated the variance accounted for (VAC) using partial eta-squared (\({\eta }_{p}^{2}\)), with F-values and degrees of freedom from the Welch test for populations with unequal variance. Each \({\eta }_{p}^{2}\) was sorted by cell class (see Fig. 8C) and in order of increasing VAC magnitude (Fig. 4A–F). The density of each \({\eta }_{p}^{2}\) was estimated by bootstrap. VAC values for JR were generally larger than for SD. VAC was similar across clusters, suggesting that JR and SD are represented across all cell classes.

Figure 4
figure 4

Variance accounted for by joint rank and symbolic distance, sorted by neuron class and by trial epoch. Box-and-whisker diagrams depict mean VAC for each cluster, with boxes corresponding to 80% credible intervals and whiskers corresponding to 95% credible intervals. Clusters are defined in Methods and Fig. 8C.

Joint rank and symbolic distance are logically related (Fig. 7B), but are numerically uncorrelated: larger joint rank pairs have neither larger nor smaller symbolic distances on average. Empirically, these two metrics had no significant numerical correlation across recording sessions (mean Pearson’s r = 0.0005, range: −0.019 to 0.013, no sessions with p < 0.05). Therefore, it is possible that individual neurons could be modulated by either or both variables. In Fig. 5, the VAC for JR is plotted against SD on a cell-by-cell basis for three trial epochs. Both variables are more strongly represented around the time of the choice saccade than around the time of stimulus onset, as suggested by Fig. 3A,E. The variance accounted for by both factors tends be correlated across the population of recordings (Fig. 5). To address the possibility that VAC for JR might be driven by neurons selective for the most extreme JR (JR = 3 and 13, n = 40 neurons), we recomputed the VAC correlation (JR vs SD) excluding those JRs. We still found that JR accounted for more variance than SD (stimulus onset: 60 neurons with JR > SD, 42 with SD > JR; pre-saccade: 70 neurons with JR > SD, 32 with SD > JR; post-saccade: 79 neurons with JR > SD, 23 with SD > JR). VAC for SD and JR remained highly correlated (Spearman’s r = 0.82, p < 0.0001; 0.82, p < 0.0001; and 0.86, p < 0.0001 for stimulus onset, pre-saccade and post-saccade, respectively.) Hence, neurons representing JR and SD are not distinctly separate subpopulations. However, the majority of neurons have greater VAC for JR than for SD across all trial epochs.

Figure 5
figure 5

Variance accounted for (VAC) by Joint rank vs. symbolic distance. N = 142 recordings. (A) VAC in firing rate activity aligned to stimulus onset, rs is Spearman’s rank correlation, p is the probability associated with rs. Filled symbols have JR > SD, open symbols have SD > JR. Inset histogram shows distribution of SD-JR. N is total number of recordings in each group. (B) VAC in activity prior to saccade onset. Same format at (A). (C) VAC in activity prior to saccade onset. Same format as (A).

Encoding of stimulus identity or rank

Sereno and Maunsell (1998) reported visual feature selectivity in monkey posterior parietal cortex using stimuli that controlled for luminance cues51. In the current study, the stimuli were not controlled for luminance, chromaticity, contrast or spatial frequency content. They were simply random colorful images with content that was easily recognized and discriminated. Nevertheless, it is possible to ask if neuronal responses varied reliably among the different stimuli. The fact that the stimuli varied along multiple visual dimensions increases the expectation that such neural response variability should be found. For this analysis, N = 142 sessions were used. Spikes were counted within a time window 40–150 msec after stimulus onset on each trial, converted to spikes per second, and then z-scored.

Trials in which only a single stimulus appeared (i.e. the first 5 blocks of each session) were sorted according to the identity of that stimulus. Figure 6A shows the values of \(z(FR)\) for the best and worst stimuli for each recording. Some degree of apparent stimulus preference can arise by chance, given finite sampling of variable neural responses. To control for this, the apparent best and worst stimuli were computed with the stimulus identities randomly shuffled among trials. There was no significant difference between the original and unshuffled responses (paired t-test p = 0.34 for shuffled vs. non-shuffled best-worst response). Response differences across stimuli thus appear to be accounted for by random variability, and there is no evidence that neuronal responses reliably encoded stimulus identity during the initial phase of the task when the stimuli were presented individually.

Figure 6
figure 6

Encoding of stimulus identity. n = 142 recordings with 7-items lists. (A) Trials with a single stimulus in RF. Mean z-scored firing rate for best vs. worst stimulus during visual epoch using original stimulus labels (blue) and shuffled labels (red). (B) z-scored firing rate during visual epoch of first 250 choice trials with best vs. worst stimulus inside RF. Blue dots are responses with original labels, red dots are responses with shuffled labels. (C) Same as (B), but for choice trials after trial 250.

After the first 5 blocks of each recording session, images were presented in pairs, one image inside of and the other opposite to the receptive field. As in Fig. 1, TI learning was most rapid during the first 250 trials of paired presentations. During these trials, there was a single best stimulus for some neurons that evoked a stronger response compared to the “best” stimulus when stimulus identity was shuffled across trials (Fig. 6B, paired t-test p < 0.001, N = 142). This effect emerged during learning trials (i.e., the first 250 paired presentations, Fig. 6B), but was not present during later trials, when performance reached a plateau (Fig. 6C; paired t-test p = 0.98; N = 142). Hence, during TI learning, the visual responses of some neurons transiently but reliably encoded stimulus rank.

Effects of performance

Reward-related modulation of neural activity in macaque parietal cortex was described by52. It has also been reported that neuronal activity in parietal cortex is correlated with performance accuracy53. Given current evidence for the influence of serial order and related quantities on both behavior and parietal neuronal activity, it is important to ask whether current findings could be related to reward associations. In the TI task, all correctly completed trials were rewarded, thus reward probability was equal to percent correct. To test for effects of performance or reward, we compared the difference in activity (spike counts) between correct and incorrect trials for each recording. Three trial epochs were examined: background (200 msec prior to stimulus onset), visual (40–150 msec after stimulus onset) and presaccadic (150 msec prior to onset of choice saccade). Each of 142 recordings was subjected to a Wilcoxon test with a criterion of alpha = 0.05. Effect sizes were measured using η2. Few recordings showed significant effects of performance accuracy in any trial epoch and the effect sizes were tiny (background epoch: N = 11 (8%) significant recordings, mean η2 = 0.003; visual N = 13 (9%), η2 = 0.003; presaccadic N = 25 (18%), η2 = 0.04), suggesting negligible association between neural activity and performance outcome or reward probability in the current data set.

These group differences were supported by Kruskal-Wallis tests run for each recording and explanatory variable (target location, saccade direction, outcome) with alpha = 0.05. Approximately 43% of cells (61/142) had a significant effect of target location, with an average effect size (generalized η2) of 0.34. Additionally, 35% (54/142) differed significantly as a function of saccade direction (avg. effect size = 0.08), and 20% (29/142) differed significantly as a function of outcome (avg. effect size = 0.02).

Effects of prior reward history

It is possible that subjects used a model-free reinforcement learning (RL) strategy to perform the task. A hallmark of model-free RL is that choices that are rewarded are more likely to be repeated than those that are not. We investigated this by first looking at the pattern of spatial choices; if the subject was rewarded for choosing the stimulus in a particular location, would they be more or less likely to choose the same location on the next trial regardless of whether it was the correct response? There was evidence in favor of a spatial win-stay, lose-shift bias; rewarded responses were repeated 56% of the time, significantly different from the expectation of 50% (t-test p < 0.0001, N = 141, Cohen’s d = 0.68). Unrewarded trials were followed by responses to the same location only 48% of the time (t-test p < 0.002, N = 141, Cohen’s d = 0.27). This result generally agrees with a previous study of reward-based decision-making54, which found that monkeys were more likely to change their response after receiving a relatively small reward. Reaction time (choice saccade latency) did not vary depending on the outcome of the previous trial (mean RT difference, after previous reward - after no previous reward, = −0.9 ms, SD = 6.3 ms, paired t-test p = 0.1, N = 141).

The behavioral result that rewarded spatial choices were likely to be repeated raised the possibility that reward history might affect neuronal responses. In a task with varying reward magnitudes, Kubanek and Snyder (2017) found that LIP neurons fired more vigorously following trials that ended with relatively small rewards. In the current data (142 recording sessions), there was little evidence that reward outcome affected neural activity on the next trial. For choice saccades toward the receptive field, firing rate during the period between stimulus onset and choice saccade was not significantly elevated on the trial following a rewarded trial compared to an unrewarded trial (reward - no reward, mean firing rate difference = 0.07 sp/sec, p = 0.75, N = 142). For saccades away from the receptive field, firing rate was non-significantly reduced (reward - no reward, mean firing rate difference = −0.19 sp/sec, p = 0.39, N = 142). These effects were negligible and suggest that, in the current data, reward history was not a significant modulator of neural activity.

It should be noted that a spatial choice bias is deleterious to overall performance, as the target location was randomized and provided no information about the correct response. The presence of a spatial bias does not rule out other choice biases that might facilitate learning of serial order. This possibility was tested by asking whether reward history biased the choice probability for each of the 7 pictorial stimuli conditioned on whether or not the previous choice of that stimulus had been rewarded. This analysis was performed for non-terminal list items (B thru F). Some conditions were excluded if the choice probability could not be calculated (i.e. the denominator was 0, which happened in 11 of 661 cases). The average probability of choosing a particular stimulus following a rewarded choice of that stimulus was 0.51 (SD = 0.17), compared to 0.51 (SD = 0.18) following unrewarded choices. The difference was not significant (t-test paired by session and item, p = 0.99, N = 650). Similarly, the average latency of choice saccades varied by less than 1 ms between previously rewarded and unrewarded conditions (paired t-test p = 0.34, N = 661). The results were the same whether all choice trials in each session were included, or only the first 250 trials when learning of serial order was most rapid. These results provide no support for the idea that subjects used a model-free reinforcement learning strategy to learn or perform the task. If anything, rewarded choices were slightly less likely to be repeated than unrewarded choices.

Discussion

The transitive inference (TI) paradigm tests subjects’ ability to learn the implied serial order of a set of pictorial stimuli without any explicit spatial or temporal cues. This is accomplished by presenting pairs of images drawn from a rank-ordered list of images and rewarding subjects for choosing the item that has the lower rank. Much work in humans and other animals has supported the idea that TI learning relies on a mental workspace (reviewed by2), but few studies have examined its physiological underpinnings in brain regions implicated in spatial cognition (e.g.55.). Here, we evaluated whether neural evidence was consistent with the mental workspace hypothesis by recording neuronal activity in posterior parietal cortex while monkeys learned novel TI lists consisting of sets of 7 images that were assigned ranks learned by trial-and-error.

Each recording session started with a novel set of images. Subjects initially responded randomly, but were able to learn each list within about 250 trials. Prior research argues against the idea that monkeys memorize the response to each pair56 or rely solely on the experienced reward value of each stimulus9. Rather, they appear to acquire abstract knowledge of the list order, and of the rule that dictates choosing the lower ranked item in each pair. In the current study, behavior was more consistent with representation and rule-based learning than with model-free reinforcement learning. Neuronal activity in parietal cortex displayed strong visual and spatial selectivity, but also showed significant influences of cognitive variables related to list order such as item rank, symbolic distance, and joint rank. These results support the idea that parietal cortex is involved in representing abstract serial order.

Symbolic distance and joint rank are defined jointly by the two stimuli presented on choice trials. Symbolic distance (SD) is the difference of the ranks of the two stimuli and joint rank (JR) is the sum of the two ranks. Although JR and SD are not numerically correlated, they are not fully independent (Fig. 7B), as knowledge of one constrains the other. There was evidence that both variables are represented in parietal cortex, with moderate effect sizes as measured by variance accounted for (effect size approximately 0.1–0.2). These effects are much larger than the effects of trial outcome (effect size at most 0.04). Modulation by symbolic distance and joint rank was seen in all trial epochs and therefore does not appear to be tightly coupled to decision making, but may be more closely related to the learning process that takes place over the course of many trials. Our evidence suggests JR and SD make independent contributions to the neural coding of inferred relations. However, it remains to be determined whether these contributions might be explained by a single, more complex, nonlinear factor.

The strength of the observed joint rank effects was not expected at the outset of the study, because joint rank is much less predictive of response accuracy than symbolic distance. The stimulus pairs AG, BF, and CE all have the same joint rank of 8, for example, but differ considerably in their levels of response accuracy (Fig. 1). Despite not providing an obvious benefit to the downstream behavior, however, it may be that joint rank is encoded as part of a wider scheme for interpreting scenes that contain multiple stimuli. Just as symbolic distance and joint rank can be calculated arithmetically from the ranks of the original stimuli (\(SD=B-A\), \(JR=B+A\)), the ranks of the stimuli can be decomposed from an encoding of symbolic distance and joint rank (\(B=\frac{(JR+SD)}{2}\), \(A=\frac{(JR-SD)}{2}\)). Cells that encode these additive and subtractive quantities from across the visual field could be processed downstream to represent individual stimulus identities, and parietal cortex may not be the locus of that decomposition.

Joint rank may also be related to an approximate number system in monkeys57,58. Neurons related to the representation of visual quantity have been described in the parietal cortex of macaques25,31,35,41. This representation may support simple arithmetic operations that are more fully developed in humans59,60,61.

We tested the contribution of model-free learning mechanisms by performing a sequential trial analysis to determine if received rewards biased subjects’ behavior on subsequent trials. Reward history was found to introduce a spatial bias, but did not bias responses in favor of choosing a previously rewarded list item. There was no evidence of neuronal responses being modulated reward history, as reported by others54. Variations in neuronal response were only weakly correlated with trial outcome53. These results are in accord with a sizable literature showing that experienced reward value does not account for transitive inference or serial learning in multiple primate species7,9,13.

There was no evidence that item identity based on shape, color, or other visual features was encoded when stimuli were presented individually during the initial trials of each session (prior to TI learning), in contrast with previous work showing shape selectivity in LIP51. This was despite substantial item-wise variation in responses, possibly driven by low-level visual properties of the stimuli that were not systematically controlled. However, this variation did not survive a permutation test, which is a stringent control for reliable stimulus encoding.

During the first 250 trials of TI learning, neural activity did reliably distinguish between the best and worst list items. Because the neurons showed no inherent visual selectivity prior to learning, it is likely that this signal was related to the learned list positions. The effect of list position disappeared after the initial learning, as accuracy reached a plateau. Thus, there appears to be a transient representation of list position in parietal cortex during learning.

The representation of variables such as symbolic distance and joint rank that arise from the comparison of spatially remote stimuli is contrary to conventional thinking about visual or visual-movement neurons with spatially localized receptive or movement fields. However, such responses are not unprecedented in the dorsal visual pathway of macaques62 found effects of remote stimuli in visual area MT when monkeys were required to compare motion stimuli in the contra- and ipsilateral visual fields. Moreover, models of spatial remapping around eye movements suggest that individual LIP neurons receive information from across the entire visual field63.

Overall, our results support an emerging view of parietal cortex in which: (1) Spatial and non-spatial information is represented by the same population of neurons. (2) Non-spatial information can encode abstract quantities related to implied serial order (such as relationships between ranks), which obeys the rule of transitivity. Finally, (3) information is integrated across visual space to compute cognitive variables based on the comparison of spatially remote stimuli. All of these properties add to growing evidence that parietal cortex likely plays a substantial role in the construction of a mental workspace for both spatial and abstract cognitive tasks.

Methods

Subjects

Subjects were two male rhesus macaques (H and L), 13–14 years old and weighing 9–10 kg at time of experiment. The research was conducted in accordance with U.S. Department of Health and Human Services (National Institutes of Health) guidelines for the care and use of laboratory animals, and was approved by the Institutional Animal Care and Use Committee at Columbia University and the New York State Psychiatric Institute. Monkeys were prepared for experiments by surgical implantation of a post for head restraint and a recording chamber to give access to cortex. All surgery was performed under general anesthesia (isoflurane 1–3%) and aseptic conditions. Subjects had prior experience performing transitive inference tasks64.

Visual stimuli & eye movements

The subjects were seated in an upright primate chair during the experiment and responded to visual stimuli by making saccadic eye movements. Stimuli were generated and controlled by a CRS VSG2/3 F video frame buffer. Stimuli were displayed on a CRT (subject H) or LCD (subject L) monitor with a resolution of 1280 × 1024 pixels at 60 Hz. Viewing distance was 60 cm.

Visual stimuli consisted of 140-by-130 pixel color photographs (7° by 8° visual angle) and small squares that served as fixation and eye movement targets. To control the retinal position of visual stimuli, monkeys were required to maintain fixation while the stimuli were presented. The fixation point was a red square (0.5° visual angle).

Eye position was recorded using a monocular scleral search coil (CNC Engineering, Seattle WA) and digitized at a rate of 1 kHz65. Subjects made choices by making eye movements from the central fixation point to target stimuli. Eye velocity was computed offline by convolving eye position with a digital filter. The filter was constructed by taking the first derivative of a temporal Gaussian, G(t), such that \(\frac{dG}{dt}=-k\cdot t\cdot exp\left(\frac{{t}^{2}}{{\tau }^{2}}\right)\), where \(\tau \) = 8 msec and \(k\) is a constant that sets the filter gain to 1.0. This filter does not introduce a time shift between the position input and velocity output, but adds temporal uncertainty to the velocity estimates. Horizontal and vertical eye velocities (\(h\text{'}(t)\) and \(v\text{'}(t)\), respectively) were combined to estimate radial eye speed \(r\text{'}(t)\) using the formula \(r\text{'}(t)=\sqrt{h\text{'}{(t)}^{2}+v\text{'}{(t)}^{2}}\).

Experimental design

Subjects were trained to perform three visual-oculomotor tasks: delayed visually guided saccades, delayed memory guided saccades, and transitive inference.

Delayed visual and memory guided saccades

The purpose of this task was to estimate the spatial selectivity (receptive and movement field) of each neuron. Each trial started with a small white spot (0.5° white square) presented in the center of the video display. Once the subject fixated on this spot, a peripheral cue appeared (1.0° white square). The subject was required to hold fixation for 0.75 to 1.25 seconds. Then the central fixation target was turned off, instructing the subject to make a saccade to the peripheral cue. The eccentricity and azimuth of the peripheral cue varied from trial to trial. Targets were presented at 8, 16, or 24 locations. This allowed the experimenter to determine the spatial locations to which each neuron responded most strongly. The stimulus presented at the preferred location is hereafter referred to as falling within the cell’s receptive field. The other stimulus was presented diametrically opposite relative to the center of the screen, and is referred to as falling outside the receptive field. The delay between the peripheral cue onset and the saccade allowed experimenters to distinguish between visual and movement-related neuronal activity. The memory guided saccade task was identical to the visually guided saccade task, except that the peripheral cue was presented only briefly (0.5 sec) and then turned off. During the cue presentation and the subsequent memory delay interval (0.75 to 1.25 sec), the subject continued to fixate on the center fixation target. Then the central fixation target was turned off, instructing the subject to make a saccade to the remembered location of the peripheral cue. After the saccade, the peripheral cue was turned back on to provide feedback about saccade accuracy.

Transitive inference

Prior to the beginning of each recording session, a set of 7 photographs never before seen by the subjects was selected from a database of over 2500 images. A different set of images was used for each session. These stimuli were randomly assigned a unique rank, indicated by the letters A-G, to create an ordered list (Fig. 7A). No explicit information about stimulus rank was presented to the subjects. That is, the letter assigned to each stimulus was never displayed, nor was there any information about serial order in the spatial or temporal presentation of the stimuli.

Figure 7
figure 7

Transitive inference task procedure. (A) Example of a list of 7 random images assigned ranks A thru G. (B) All 21 image pairs for a 7-item list, and their respective joint ranks and symbolic distances. (C) Trial structure and timing.

For 7-item lists, there are 21 possible stimulus pairs. In addition to being identified in terms of which stimuli they included, each pair could be described in terms of two metrics that encode information about both stimuli. The “symbolic distance” was calculated by taking the difference between the stimulus ranks66. For example, since the pair BC consists of items that are adjacent in the list (i.e. ranks 2 and 3, respectively), their symbolic distance is 1, while the pair AG (ranks 1 and 7) has a symbolic distance of 6. The “joint rank” is the sum of the ranks of the two stimuli presented on each trial, and is orthogonal to symbolic distance67. The pair BC would have a joint rank of 5 (2 + 3), while AG would have a joint rank of 8 (1 + 7). A diagram mapping the symbolic distance and joint rank for all pairs is presented in Fig. 7B.

On each trial, the subject first fixated a small target in the center of the screen (Fig. 7C). Following a random delay (0.4 sec to 1.2 sec, positively skewed with a mean of 0.5 sec), the fixation point disappeared and two stimuli appeared equidistant from the center of the screen in opposite directions. To receive a reward, subjects had to select the stimulus with the lower rank by making a saccadic eye movement to the selected stimulus within 1.5 sec of stimulus onset, and fixating for a random interval (0.4 to 0.6 sec, uniformly distributed). When this interval elapsed, auditory feedback was delivered indicating that the response was either correct (high tone, 880 Hz) and would be rewarded, or was incorrect (low tone, 440 Hz). In order to receive the reward on correct trials, subjects had to maintain fixation for another interval (0.35 to 0.65 sec, uniformly distributed), after which the screen went dark and fluid rewards (juice or water drops) were delivered.

The first phase of each session presented single stimuli from the list. A “block” of responses during this phase consisted of presentations of each of the stimuli, both in and out of the cell’s receptive field. Thus, for a 7-item list, each block was 14 trials long, randomly ordered. The first phase lasted 5 blocks (70 trials total.) A saccade to these stimuli always yielded a reward. These presentations were used to determine whether the cells demonstrated selectivity toward any of the individual stimuli prior to training.

For the remainder of the session, all pairs of stimuli were presented. With positional counterbalancing, this resulted in blocks that were 42 trials in length. In order to receive a reward, subjects had to choose the stimulus with the lower rank. Subjects were presented with all pairs of stimuli, one randomized block at a time, for up to 20 blocks (i.e. either 400 or 840 trials in total).

If good isolation from the recorded cell was maintained past the end of the all-pairs phase, and the subject remained motivated to work, a new list of photographs was drawn from the database, and the task was repeated (with another 70 trials of single stimuli and up to 20 blocks of pairs). During a single recording session, subjects were capable of learning 1 to 3 lists in this fashion.

Neuronal recording

Subjects had plastic recording chambers implanted over the right hemisphere at stereotaxic coordinates (subject H: 5 mm anterior to the interaural plane, 16 mm medial to the sagittal midline; subject L: 6 mm posterior, 14 mm medial). Single-cell activity was recorded with glass-coated tungsten electrodes (Alpha Omega) with impedances between 0.5 and 2 MΩ measured at a frequency of 1 kHz. Raw signals were amplified, digitized, and high-pass filtered. We used FHC APM and Alpha Omega SnR recording systems. Action potential waveforms were identified by a time–amplitude window discriminator (FHC preamplifier) or threshold crossings (Alpha Omega). Action potentials were converted into digital pulses that were sampled by the computer with 0.02 ms temporal resolution. Waveforms were stored for offline spike sorting using WaveClus68. However, offline spike sorting was necessary only in exceptional cases because recording was mostly restricted to cells that could be clearly isolated online.

Recording sites were anatomically reconstructed using combination of recording grid coordinates, microdrive depth readings, and stereotaxic position of the chamber, and were projected onto the structural MRI using custom MATLAB scripts. Figure 8 plots a reconstruction of recording sites on a template atlas of the anatomical macaque rhesus brain for area classification69.

Figure 8
figure 8

Reconstruction of recording sites and classification of neurons. (A,B) Anatomical reconstructions of recording sites (n = 91) were based on sterotaxic MR images, chamber implantation and angles, and matched to the rhesus macaque anatomical templates69. The templates were adapted to overlap over the MRI images plus the recording sites, which ranged from 6 mm posterior to 6 mm anterior to the interaural plane. Blue regions are area LIP, green is VIP, and gray is area 7a. (C) Six families of response types were found based on hierarchical clustering of mean firing rate vs. time, across all conditions. The neurons were optimally ordered using correlation similarity.

Statistical analysis

Analysis of behavioral and neuronal data consisted mainly of the following techniques: bootstrapped permutation testing (BPT), kernel density estimation (KDE) and Gaussian process regression (GPR). BPT was used to determine whether stimulus labels were informative with respect to overall neuronal activity. KDE was used for session-level averages of specific epochs within each trial, in which fairly large numbers of observations could be mustered to obtain an optimized nonparametric estimate. GPR was used to make inferences about processes that unfolded trial by trial, in order to interpolate across gaps.

Each pair of stimuli was classified according to two metrics: symbolic distance (a measure of relative position, calculated by taking the difference between the stimulus ranks) and joint rank (a measure of absolute position, calculated by taking the sum of stimulus ranks). Because these metrics are uncorrelated, they form a two-dimensional continuum across which performance can be analyzed67.

Z-score of neural activity

To construct a common metric across recording sessions with different activity levels, the firing rate on each trial (\(F{R}_{trial}\)) was converted to a z-score using the mean of the firing rate (\({\mu }_{FR}\)) and standard deviation (\({\sigma }_{FR}\)) over the entire session:

$$z(FR)=\frac{F{R}_{trial}-{\mu }_{FR}}{{\sigma }_{FR}}$$

Bootstrap testing

Bootstrapping was used to estimate distributions of sample means. N samples were randomly drawn with replacement from the original set of N observations. Summary statistics of the resampled observations were calculated. This procedure was repeated typically 10,000 times to generate the bootstrap distribution.

Random permutation (“shuffle”) testing

When examining the relationship between neural activity and explanatory variables such as ordinal position, it is useful to ask whether the observed relationship could arise by chance given the observed neuronal activity statistics. This was done by randomly permuting (“shuffling”) the explanatory variable labels among the trials. Each label occurred as many times in the shuffled data as in the original data. This procedure estimates the likelihood that apparent “tuning” for a particular explanatory variable could arise by chance. Random permutation testing could be combined with bootstrapping to obtain a distribution of apparent tuning functions.

Kernel density estimation

Eye movement characteristics and average session rates of neural activity both used KD estimates. For eye movements (position and velocity), we used a Gaussian kernel with Silverman’s optimal bandwidth70. For spike trains, we used the method described by48.

Gaussian process regression

To evaluate how estimated performance changed as a function of learning, we used Gaussian process regression71. GPR is a highly flexible non-linear estimation technique that is well-suited to time series analysis. Although not widely used in neurophysiology, GPR is a well-established procedure72 that has seen extensive application in other domains. It has been called a categorical analysis with an infinite number of categories, arrayed along a continuum73. One continuum of interest is time: Response accuracy for each pair changes over time. Orthogonal to time are the continua of symbolic distance and joint rank, which are also expected to influence response accuracy. GPR is performed by estimating the extent to which every observation covaries with every other, given some prior metric for comparing the distance between observations along the continua of interest. Each observation influences the estimate for other ‘nearby’ observations (e.g. that occur at similar times or have similar symbolic distance) more than observations that are distant74. Although such an analysis is not possible using classical methods (because of irreducible uncertainty about how distance and similarity interact to give rise to the data), Bayesian methods make GPR feasible by imposing a strong prior belief that pairs at similar times and with similar symbolic distance and joint rank should display similar patterns of neural activity75. GPR depends on relatively few assumptions, instead allowing the data to determine the form taken by the time series estimate. The chief constraint is that Gaussian processes are presumed to be smooth (i.e. differentiable without discontinuity). Beyond this constraint, one can imagine the model estimate as the posterior distribution of the relative density of all possible smooth curves, conditioned on the data and the informative prior. Although a full Gaussian process model can be computationally prohibitive to fit (requiring runtime O(n3) for n observations), we will take advantage of the “expectation propagation” approximation76 implemented in the GPstuff toolbox77 to accelerate estimation.