Edinburgh Explorer Serial cognition and personality in macaques

We examined the associations between serial cognition and personality in rhesus macaques (Macaca mulatta). Nine macaques were tested on a simultaneous chaining task to assess their cognitive abilities. They were also rated for personality traits and scored according to a previously extracted six component structure derived from free-ranging rhesus macaques. Friendliness and Openness were positively associated with good performance on three measures of accuracy on the serial learning task: Progress, Error, and Rewarded (i.e., correctly completed) Trials. Faster Reaction Times were associated with lower Friendliness and higher Confidence, as well as higher Openness when only correct responses were analyzed. We also used regularized exploratory factor analysis to extract two, three, four, five, and six factor structures, and found consistent associations between accuracy and single factors within each of these structures. Prior results on intelligence in other nonhuman primate species have focused on basic intelligence tests; this study demonstrates that more complex, abstract cognitive tasks can be used to assess intelligence and personality in nonhuman primates.

Serial intelligence is a broadly applied, flexible ability: despite their differences in difficulty, the simultaneous chaining (SimChain) and transitive inference (TI) paradigms share a common mental representation (Jensen, Altschul, Danly, & Terrace, 2013). Transitive reasoning is in turn linked to symbolic manipulation (D'Amato & Colombo, 1990), social dominance and navigation in primate hierarchies (Paxton et al., 2010), and language (Jensen et al., 2013). These links make SimChain a strong candidate for testing general cognitive ability in animals.
While the evolution of serial cognition is well documented (McGonigle & Chalmers, 2006), why individual personalities have been selected for remains an open question (Bouchard & Loehlin, 2001). Moreover, the evolutionary genetics underlying individual differences in intelligence and personality need not be very similar. If the contributions of gene and environment differ between personality and intelligence (Penke, Denissen, & Miller, 2007), then how should we expect animals' personalities to vary with cognitive abilities? In nine rhesus macaques, we collected cognitive and personality data, and in a series of exploratory analyses we examined connections between personality and serial cognition, with the expectation that Openness, and possibly other macaque personality dimensions, would be associated with performance on the SimChain task.

Subjects
Nine male captive-born rhesus macaques, aged 12 to 16 years, and housed at the New York State Psychiatric Institute, performed a SimChain task and were evaluated for personality. The colony was maintained in accordance with guidelines issued by the National Institutes of Health and the Institutional Animal Care and Use Committees at the New York State Psychiatric Institute and Columbia University. Macaques were individually housed in adjoining cages at the time of the study, but had been pair housed previously. Macaques were given water ad libitum, and fed commercial primate biscuits and varied fresh fruits and vegetables daily, in addition to any pellets they received as rewards in experimental tasks.

Apparatus
The apparatus was identical to that used in prior studies (Jensen et al., 2013). Testing took place in chambers housed in sound-attenuated booths. Chambers were equipped with speakers and a pellet dispenser (Med Associates; pellets by BioServ, 190 mg). A computer with a touch-sensitive monitor presented stimuli and detected responses.

Procedure
The SimChain paradigm presents an ordered list as a simultaneously displayed set of images on a touchscreen monitor. A trial is completed by selecting each stimulus in the correct order (see Figure 1; Terrace, 1993). In this experiment, subjects had to learn a novel four-item list composed of arbitrary color images, each day. Subjects were given 40 trials to learn each list, which could only be accomplished through trial and error. On successful trials, subjects were rewarded with a banana pellet. On unsuccessful trials they received a 4 s timeout. We gathered 20 days of data, that is, 20 sessions of 40 trials each.
Because subjects had been extensively trained on SimChain tasks, no task learning effects were expected to confound results. Subjects could be expected to display their asymptotic level of performance.

Personality Ratings
Subjects were independently rated by 10 animal care volunteers using the Hominoid Personality Questionnaire (Weiss et al., 2009). The questionnaire consisted of 54 adjectives followed by 1 to 3 sentences defining adjectives in terms of everyday nonhuman primate behaviors. Items were rated on a 7-point scale. Raters were familiar with subjects prior to evaluating them, but unaware of the details of individual subjects' performance. Raters had between 6 months and 3 years of experience with the animals; each rater typically spent several hours, one day a week, looking after the animals within the colony setting.
Figure 1. The Simultaneous Chaining paradigm. The task was to touch the items in the prescribed order, regardless of their positions on the screen. An example of a 4-item list is shown in two different, random arrangements, as might appear during any trial in a session. The top row shows the arrangement of ordered pictures, and the bottom row indicates the correct path of selection.

Interrater Reliability
Interrater reliabilities of personality items were calculated from all animals and all raters using the intraclass correlations ICC(3, 1) and ICC(3, k) (Shrout & Fleiss, 1979). The items 'cautious,' 'defiant,' 'independent,' and 'stingy/greedy' had ICCs less than zero and were removed from further analysis. The items 'autistic' and 'unperceptive' were omitted because both were removed from an earlier study for being unreliable and thus not included in the definitions of the components (Weiss et al., 2011). The remaining items' ICCs ranged from 0.009 to 0.290 for ICC(3, 1), and 0.079 to 0.801 for ICC(3, k).
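The two reliability coefficients can be illustrated with a short sketch. It assumes the standard two-way formulas of Shrout and Fleiss (1979), in which BMS is the between-targets mean square and EMS is the residual mean square; the function name `icc3` and the small rating table are illustrative only and are not the ratings collected in this study.

```python
def icc3(ratings):
    """Return (ICC(3,1), ICC(3,k)) for a targets-by-raters table of scores."""
    n = len(ratings)        # number of targets (e.g., animals)
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one rating per cell).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_targets - ss_raters

    bms = ss_targets / (n - 1)              # between-targets mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    icc_single = (bms - ems) / (bms + (k - 1) * ems)   # ICC(3, 1)
    icc_average = (bms - ems) / bms                    # ICC(3, k)
    return icc_single, icc_average


# Illustrative data only: 6 targets rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc3(example)
```

Under these formulas an item's ICC falls below zero whenever its residual mean square exceeds its between-targets mean square, which is the situation described for the four removed items.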

Personality and Performance
Average questionnaire ratings were used to compute domain scores from the unit-weighted matrix based on previously derived component loadings (Weiss et al., 2011; Table 1). Performance was measured using three measures of trial-by-trial accuracy. Rewarded trials reflect the binary successes and failures across each subject's trial-by-trial performances: to be rewarded, a subject must complete a full SimChain trial without error. Progress quantifies how far into the list the subject made it on any given trial, before either making an error or completing the trial. Error is defined as the amount of deviation, from the next correct response, in a subject's terminal choice. Error can be either positive or negative: if the subject makes a forward error, jumping ahead in the chain, the Error is positive. If the subject makes a backward error, the Error is negative. If the subject presses each item in the correct order and completes a trial successfully, the Error is 0. Error and Progress for each of the nine monkeys are shown in Figures 2 and 3, respectively.
Reaction time (RT) is the natural logarithm of the interval between the onset of the visual stimuli and the first response. SimChain completion utilizes a series of planned responses (Scarf, Danly, Morgan, Colombo, & Terrace, 2011), but apart from the pause before the initial response, where in the chain planning pauses occur depends on the individual animal. We analyzed the first response RT for only correct responses, as well as the RT for all first responses, to search for speed-accuracy trade-offs (Prinzmetal, McCool, & Park, 2005).
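The three accuracy measures can be made concrete with a minimal scoring sketch. It rests on assumptions that the text does not spell out: list items are coded by their ordinal position (0 for the first item), a trial ends at the first incorrect touch or on completion, and Progress counts the correct touches made. The function name `score_trial` is hypothetical.

```python
def score_trial(responses, list_length=4):
    """Score one SimChain trial.

    responses: ordinal positions of the touched items, in the order touched
               (0 = first list item). Scoring stops at the first error.
    Returns a dict with Rewarded, Progress, and Error for the trial.
    """
    progress = 0
    for choice in responses:
        expected = progress  # the next correct item in the list
        if choice != expected:
            # Positive Error = jumped ahead; negative Error = went backwards.
            return {"rewarded": False, "progress": progress,
                    "error": choice - expected}
        progress += 1
        if progress == list_length:
            return {"rewarded": True, "progress": progress, "error": 0}
    raise ValueError("trial ended before an error or completion")


forward = score_trial([0, 1, 3])      # skipped the third item
backward = score_trial([0, 1, 0])     # returned to the first item
complete = score_trial([0, 1, 2, 3])  # correct trial
```

Here `forward` has an Error of +1 with a Progress of 2, `backward` has an Error of -2 with the same Progress, and `complete` is the only rewarded trial.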
Correlations between personality and performance (averaged across trials and sessions) are shown in Table 1. Friendliness was significantly positively correlated with Rewarded trials and Progress, and negatively correlated with Error. Openness was significantly correlated with Progress and Error, in the same directions as Friendliness. No significant correlations were found between personality domains and either RT measure.
Note. Correlations of |r| > 0.66 are significant at the α = 0.05 level. Anx = Anxiety, Act = Activity, Frd = Friendliness, Dom = Dominance, Opn = Openness, Con = Confidence, Rwd = Rewarded trials, Err = Error, Prg = Progress, RT = all reaction times, RT1 = reaction times on trials which were correctly completed.
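The significance threshold for the correlations can be recovered from the sample size. The following is a worked sketch that assumes a two-tailed t test of a Pearson correlation with n = 9 subjects (7 degrees of freedom) and a tabled critical value of about 2.365:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
\quad\Longrightarrow\quad
r_{\mathrm{crit}} = \frac{t_{\mathrm{crit}}}{\sqrt{t_{\mathrm{crit}}^{2} + n - 2}}

n = 9,\quad t_{0.975,\,7} \approx 2.365
\quad\Longrightarrow\quad
r_{\mathrm{crit}} \approx \frac{2.365}{\sqrt{2.365^{2} + 7}} \approx 0.666
```

With only nine animals, a correlation must therefore account for roughly 44% of the variance before it reaches significance.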

Regression Analyses
Simple correlations between averages fail to capture the nuance in individuals' performance. For example, both Error (Figure 2) and Progress (Figure 3) demonstrate learning curves and asymptotic plateaus in performance, which differ between animals. To explore personality's relationship with performance in more detail, we modeled each performance metric including personality predictors based on the strength of associations seen in the correlation matrix.

Error
If one wishes to model Error with linear regression, the Error data must first be transformed, because they are non-linear (Figure 2). This poses a challenge because Error can be both positive and negative, so a log transformation is not appropriate. Fortunately, the Yeo-Johnson transformation, which was designed for and tested on cases such as ours, handles negative values (Yeo & Johnson, 2000). We constructed a series of linear mixed models, using a forward selection approach, starting with a null model which included a trial number variable and intercept. Results of our model selection are shown in Table 2.
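A minimal sketch of the Yeo-Johnson transformation shows how it extends a power transformation to negative values. The piecewise definition follows Yeo and Johnson (2000); the value of the power parameter `lam` is a free choice here, since the value used in our analysis is not reported in this section.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transformation of a single value y."""
    if y >= 0:
        if abs(lam) < 1e-12:                 # limiting case lam = 0
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:               # limiting case lam = 2
        return -math.log1p(-y)
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# Error takes integer values from -3 to +3 on a four-item list.
transformed = [yeo_johnson(e, 0.5) for e in range(-3, 4)]
```

The transformation leaves zero at zero, preserves the sign and ordering of the values, and reduces to the identity when `lam` equals 1.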
Log-likelihood indicated that model 7 was the best fit to the data, while the small-sample corrected Akaike Information Criterion (AICc) indicated that model 6 was the best fit. The Bayesian Information Criterion (BIC) indicated that the null model was the best fit, a result that recurred across all of our model comparisons. BIC penalizes additional parameters more heavily than AICc does; its appeal is that, as the sample size increases, the probability that BIC selects the correct model approaches 1 (Vrieze, 2012). For smaller sample sizes, BIC necessarily performs less well on average, so while we continued to calculate it for all models for diagnostic purposes, we did not factor it into our selection procedures.
The details of models 6 and 7 are shown in Table 3. Both models consistently show that higher Openness was significantly associated with a smaller starting error, itself an indicator of better performance. The interaction between Friendliness and Trial was also significant in both models, similarly suggesting that Friendliness was associated with smaller error as sessions progressed. Outside of the interaction, Friendliness was not a significant predictor, though it did appear to marginally improve the fit of the model. The effect size of the Openness coefficient was also larger than either Friendliness coefficient.
Note. Bolding indicates the best model, according to the procedure. df = degrees of freedom, AICc = Akaike Information Criterion corrected for small samples, BIC = Bayesian Information Criterion, LogLik = Log-likelihood, ΔLogLik = difference in log-likelihood between current model and last best fitting model, RM = the reference model for the ΔLogLik comparison.
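The disagreement between the criteria can be illustrated with the textbook definitions of AICc and BIC. In the sketch below the log-likelihoods and parameter counts are hypothetical, and the number of observations (9 subjects × 20 sessions × 40 trials = 7200) is an assumption about what counts as the sample size; for mixed models that choice is itself debatable.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return (AICc, BIC) from a log-likelihood, parameter count, and n."""
    aic = 2 * n_params - 2 * log_lik
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = n_params * math.log(n_obs) - 2 * log_lik
    return aicc, bic


# Hypothetical models: adding one predictor raises the log-likelihood by 2.
n_obs = 9 * 20 * 40
aicc_small, bic_small = information_criteria(-100.0, 3, n_obs)
aicc_large, bic_large = information_criteria(-98.0, 4, n_obs)
```

In this example the larger model has the lower AICc but the higher BIC, because each extra parameter costs about 2 units under AICc and ln(7200) ≈ 8.9 units under BIC.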

Progress
Progress displays a curve similar to that of Error (cf. Figures 2 and 3), but unlike Error, it does not take negative values. However, Progress on the SimChain task can be modeled using Thurstone's learning curve (Jensen et al., 2013), so rather than linearize the Progress data, we modeled Progress with a nonlinear logistic regression. A simple logistic growth curve has three parameters, and is defined as

$$f(x) = \frac{L}{1 + e^{-k(x - x_0)}}$$

where L is the maximum value or asymptote of the curve, k is the steepness of the curve, and x_0 is the x-value of the midpoint of the curve, also known as a scaling parameter.
We used a forward selection approach to model building, jointly inputting personality dimensions as predictors of two logistic parameters: the asymptote (L) and the steepness (k) (Table 4). The model including Friendliness alone was the best fit, but only in the most marginal sense, as the AICc and log-likelihood values were extremely close to those generated by model 2, wherein Openness was the lone personality predictor. Fit became considerably worse when both Friendliness and Openness were included, but we still wished to examine if and how their contribution to the model might change in each other's presence.
All three non-null models are described in Table 5. Friendliness was positively and significantly associated with the asymptotic level of performance; Openness was negatively and significantly associated with the steepness coefficient. Due to software limitations, steepness needed to be modeled as 1 / k; thus, higher Openness was associated with a steeper, and faster, rate of learning.
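The sign of the Openness effect is easier to read with the two parameterizations side by side. The sketch below assumes the standard three-parameter logistic curve with asymptote L, steepness k, and midpoint x0; the function names and parameter values are hypothetical and are not the fitted estimates from Table 5.

```python
import math


def logistic_growth(x, L, k, x0):
    """Three-parameter logistic curve with steepness k."""
    return L / (1.0 + math.exp(-k * (x - x0)))


def logistic_growth_inverse(x, L, inv_k, x0):
    """The same curve, parameterized by inv_k = 1 / k."""
    return L / (1.0 + math.exp(-(x - x0) / inv_k))


def slope_at_midpoint(L, inv_k):
    """Slope of the curve at x = x0, which equals L * k / 4."""
    return L / (4.0 * inv_k)


# Hypothetical values: asymptote of 4 items, midpoint at trial 10.
same_curve = (logistic_growth(12, 4.0, 0.5, 10),
              logistic_growth_inverse(12, 4.0, 2.0, 10))
steeper = slope_at_midpoint(4.0, 1.0)
shallower = slope_at_midpoint(4.0, 2.0)
```

Because the slope at the midpoint is L / (4 × (1 / k)), a predictor that lowers 1 / k raises the slope, which is why a negative coefficient on the modeled parameter corresponds to faster learning.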

Rewarded Trials
Monkeys were reinforced with food only after correctly completing a full SimChain. To model personality's impact on this binary variable, we fitted a generalized linear mixed model with a binomial distribution and a logit link function. Model building was again carried out with a forward selection procedure, and because of the simplicity in adding individual predictors, we chose to input a broader choice of personality predictors (Table 6).
Models 5 and 7 appeared to be the best fit, according to AICc and log-likelihood, respectively. Comparing those two models (Table 7) revealed that when only Friendliness and Openness were included, both were positively associated with subjects' rate of reward. However, when all personality predictors were included, only Confidence showed a significant (and positive) relationship with rate of reward.
Note. See Table 2 for explanation of abbreviations.
Note. Bolding indicates the best model, according to the procedure. Con = Confidence, Act = Activity, Dom = Dominance, Anx = Anxiety. See Table 2 for explanation of all other abbreviations.
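How a logit link converts predictors into a probability of reward can be sketched briefly. Everything specific below is an assumption made for illustration: the predictors included, the subject-level random intercept, and all coefficient values are hypothetical rather than taken from Tables 6 or 7.

```python
import math


def inv_logit(eta):
    """Map a linear predictor onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))


def reward_probability(trial, friendliness, subject_intercept,
                       b0=-1.0, b_trial=0.05, b_frd=0.5):
    """Probability that a trial is rewarded under a logit-link model."""
    eta = b0 + subject_intercept + b_trial * trial + b_frd * friendliness
    return inv_logit(eta)


# Two hypothetical monkeys differing only in standardized Friendliness.
p_low = reward_probability(trial=10, friendliness=-1.0, subject_intercept=0.0)
p_high = reward_probability(trial=10, friendliness=1.0, subject_intercept=0.0)
```

A positive coefficient raises the log-odds of reward linearly, so its effect on the probability itself is largest near 0.5 and shrinks toward the extremes.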

Reaction Time
We analyzed RT data with a series of linear mixed models. In light of the previous result and the generally weak correlations between personality and RT, we used a backward selection procedure, removing the lowest scored predictor from the previous model, for all models built on RT data. We first examined the fit of models predicting RT for all first responses (Table 8).
The log-likelihood indicated that model 1, featuring all personality predictors, was the best fit, but AICc suggested that removing Activity added a small improvement in fit. Comparing the two models' predictors directly (Table 9) yielded consistent results. In model 2, removing Activity drastically increased the χ² scores of all predictors, but the two predictors which were significant in model 1, Confidence and Friendliness, were stronger than all other personality predictors in model 2. Confidence demonstrated a negative relationship, such that more confident monkeys tended to have lower, i.e., faster, reaction times. Friendliness had an opposite, positive relationship with reaction time; friendlier monkeys were slower to respond.
We also analyzed the correct first responses separately, because these two RT measures may tie into different processes (Prinzmetal et al., 2005). The models' log-likelihoods again suggested that model 1, containing all predictors, was the best fit (Table 10). On the other hand, model 3, containing Friendliness, Openness, Confidence, and Dominance, was suggested to be the best fit by AICc. We directly compared these two models and the intermediate model (Table 11).
All three models indicated that Friendliness, Openness, and Confidence were significantly associated with RT on correct first responses. As in our models of all first responses, Friendliness was positively associated with RT, and Confidence negatively associated. Openness demonstrated a negative relationship with RT.
Note. Bolding indicates the best model, according to the procedure. Con = Confidence, Act = Activity, Dom = Dominance, Anx = Anxiety. See Table 2 for explanation of all other abbreviations.

Sensitivity Analysis
To determine if our findings were unique to a six component structure, we extracted our own structures. Because we had only 9 subjects, four methods commonly used to choose how many factors to extract did not yield consistent results. Ruscio and Roche's comparison data, Horn's parallel analysis, Velicer's MAP criterion, and the acceleration factor, as well as two prior studies (Capitanio, 1999; Weiss et al., 2011), suggested anywhere from two to six factors. Since the interpretation of any single factor structure extracted from these data would be dubious, we used regularized exploratory factor analysis (Jung & Lee, 2011), a procedure developed for small samples, to separately extract 2, 3, 4, 5, and 6 factor structures. Salient loadings were defined as those with an absolute value ≥ 0.6, to minimize cross-loadings. Unit-weighted, varimax rotated matrices were compiled from the salient loadings for each solution. As in prior studies (e.g., Weiss et al., 2011), when more than one factor was salient for an item, the weight was assigned to the factor with the higher loading.
Within every solution, one factor correlated with subjects' averages of our accuracy measures. Which adjectives loaded onto these factors is shown in Table 12. The adjectives 'innovative' and 'inventive', which were each correlated with the averages of our performance measures (|r|s > 0.84, ps < 0.05, after Holm-Bonferroni correction), were salient for all structures. 'Intelligent,' the third adjective to pass Holm-Bonferroni correction, was weighted on only three correlated factors. 'Curious' and 'decisive,' two adjectives correlated with Openness and Friendliness, pre-correction, were salient on three domains, as were 'individualistic,' 'independent,' and 'quitting,' items that were not part of Openness or Friendliness.
Across structures, performance metrics were compared to 20 factors. After Holm-Bonferroni correction, we found that correlations between the accuracy measures and the sixth factor of the six factor structure remained significant. Correlations also maintained significance with the second factor of the two factor structure. Significant correlations were not supported for Rewarded trials, Progress, or Error. These factors were composed largely of the same adjectives (Table 12), some of those explicitly noted in the preceding paragraph. Adjectives like 'innovative,' 'inventive,' 'intelligent,' and 'curious' represent behaviors associated with openness and intellect. 'Conventional' (negatively loaded), 'individualistic,' 'independent,' and 'decisive' emphasize assertiveness and individuality, describing monkeys that were extraordinary and whose personalities stood out to our raters. Altogether, the traits associated with serial cognitive performance appear to indicate that higher scoring monkeys were more sociable, exploratory, extraordinary, and open.
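The rule for turning a loading matrix into unit weights can be sketched as follows. The loadings and ratings are invented for illustration, the function names are hypothetical, and scoring a domain as a signed sum of mean item ratings is an assumption about how the unit-weighted scores were formed.

```python
def unit_weights(loadings, threshold=0.6):
    """Assign each item a weight of +1 or -1 on at most one factor.

    loadings: dict mapping item name -> list of loadings, one per factor.
    An item is weighted on the factor with its largest absolute loading,
    provided that loading is salient (absolute value >= threshold).
    """
    weights = {}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        w = [0] * len(row)
        if abs(row[best]) >= threshold:
            w[best] = 1 if row[best] > 0 else -1
        weights[item] = w
    return weights


def domain_scores(mean_ratings, weights):
    """Signed sum of mean item ratings for each factor."""
    n_factors = len(next(iter(weights.values())))
    scores = [0.0] * n_factors
    for item, w in weights.items():
        for j, wj in enumerate(w):
            scores[j] += wj * mean_ratings[item]
    return scores


# Invented two-factor example.
loadings = {
    "innovative": [0.10, 0.80],
    "conventional": [0.20, -0.70],
    "fearful": [0.65, 0.62],   # salient on both; goes to the higher loading
    "lazy": [0.30, 0.40],      # not salient on either factor
}
weights = unit_weights(loadings)
scores = domain_scores(
    {"innovative": 5, "conventional": 2, "fearful": 3, "lazy": 4}, weights)
```

Items with no salient loading receive a weight of zero everywhere and therefore drop out of every domain score.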
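The Holm-Bonferroni step-down procedure used for these comparisons can be sketched in a few lines. The p-values below are invented, and treating the family as exactly 20 tests is an assumption made for the example.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is tested at alpha / m, the next at
        # alpha / (m - 1), and so on; stop at the first failure.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


small_family = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
twenty_tests = holm_bonferroni([0.002] + [0.5] * 19)
```

With 20 comparisons the smallest p-value must fall at or below 0.05 / 20 = 0.0025 to survive, which illustrates how demanding the correction is for a sample of nine animals.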

Note. Two-, three-, four-, five-, and six-factor models were extracted via Jung and Lee's (2011) regularized exploratory factor analysis. One factor was significantly correlated with all accuracy measures, and the salient loadings for each such factor are shown. Plus signs indicate positive loadings; minus signs indicate negative loadings. Bold adjectives loaded on Openness in the six-component model and italic adjectives loaded on Friendliness. The correlated domain of the four factor structure assumed the opposite sign from the other factors, but is consistent with the other loadings, and has been inverted in this table.

Discussion
Rhesus macaques' personalities covary with SimChain task performance: across different measures, Friendliness and Openness were related to performance. These associations extended beyond a priori assumptions about personality structure. Distinct adjectives clustered around factors which consistently correlated with accuracy.
Openness and Friendliness drive distinct aspects of SimChain performance. Friendliness was consistently related to performance over time: the magnitude of asymptotic performance under the Progress metric, and the linear slope of the transformed Error variable, approaching zero (Figure 2). Openness was related to the rate of learning: the steepness of the Progress curve, and the starting point of the Error curve.
The Error models are not clearly interpretable because we needed to model a transformed Error variable in order to cope with Error's inherent non-linearity. Nevertheless, the contributions of Friendliness and Openness are also distinct in these models. The distinction between the effects of different personality dimensions is lost in our models of Rewarded trials and RTs, and considering that the averages of all accuracy measures are very highly correlated, it may be that a single latent variable drives the relationships between performance and both Openness and Friendliness. This is consistent with the observation that the g factor predicts performance across diverse mental tasks, while being consistently related to personality (Ackerman & Heggestad, 1997).
Confidence, while not strongly correlated with any performance measure (rs = 0.08 to 0.38), repeatedly appeared as a significant predictor, particularly in models of RT. Researchers of general intelligence recognize that external variables, such as speed-accuracy trade-off strategies and assessment anxiety, can affect assessment (Chamorro-Premuzic & Furnham, 2014). Confidence appears to be one such variable: it is more closely associated with RTs than with accuracy, which suggests that it may play a role similar to that of Extraversion and Neuroticism in humans, which are associated with speed-accuracy trade-offs and test-taking anxiety, respectively. This is consistent with the fact that Confidence captures situational and social fear (Weiss et al., 2011).
Our results compare favorably to those of Morton et al. (2013), who found correlations between Openness and both task participation and response error in capuchin monkeys. Similarly, chimpanzee participation and performance (Herrelko et al., 2012; Hopper et al., 2014) have been tied to the Openness dimension of that species. However, Morton et al. warn against over-extensive comparisons between studies, as neither personality dimensions nor cognitive tasks tend to be directly analogous to one another. Even if personality dimensions have been assigned the same descriptive names post-hoc, they will never represent quite the same capacities. Similarly, while all cognitive tasks will tap into general and more specific domains of intelligence, for researchers to understand the psychological differences underlying individual and species level differences in performance, task implementation must be as consistent as possible.
While animal studies have only begun to explore the associations between personality and cognitive abilities, the literature on humans is more developed, and ought to be used as one reference point for the formulation of hypotheses and interpretation of results. Openness in humans is modestly to moderately correlated with g (Ackerman & Heggestad, 1997), particularly with typical intellectual engagement and crystallized intelligence. Macaque Friendliness does not have a clear analog among the Big Five; it is mostly constituted by adjectives associated with the human domains of Extraversion and Agreeableness, and perhaps crucially, the item 'intelligent', which positively loads on human Openness (DeYoung, 2014). Monkeys scoring high on Friendliness have been described as "sociable and cooperative" (Weiss et al., 2011, p. 77), and it is likely the cooperative aspect of the domain that makes friendly monkeys strong performers.
In humans, RT has been repeatedly correlated with g (Jensen, 2006). The fact that correct RTs are predicted by Openness and Friendliness is consistent with a general factor among this species. However, the association between Friendliness and RT is positive (i.e., Friendlier monkeys are slower to respond), in contrast to Openness, which has a negative relationship with RT. Friendliness and Openness mirror each other in predicting accuracy. This divergence is curious, but consistent with the hypothesis that RT and accuracy require different mechanisms (Prinzmetal et al., 2005), and suggests that the mechanism underlying the association between RT and g ought to be studied further. RT measures within the human species have proven to be robust, and this study suggests that RT differences could be useful among other primates, but only as a within-species measure. Washburn and Rumbaugh (1997) previously discussed the comparative flaws in using RT; to grasp the magnitude and significance of cognitive differences between species, researchers must take care when choosing their measures.
Cognitive and neurological evidence indicates that RT and accuracy rely on different architectures (Landau, Esterman, Robertson, Bentin, & Prinzmetal, 2007). What evidence we found reinforces this theory; our results imply that RT can be predicted by personality domains that are not related to accuracy. Our findings strengthen the need for comprehensive, unified testing of primate intelligence, particularly in the context of personality, and we reiterate Morton et al.'s (2013) call for caution when studying animal cognition and personality with small samples.
The Primate Cognition Test Battery (Herrmann et al., 2010) is perhaps the best known collection of cognitive tests for primates, but its assessment of physical and spatial cognition is limited to basic, concrete tests; it contains no test of symbolic reasoning, of which SimChain is but one. The SimChain paradigm has been used in several species (Terrace, 1993; Wagner, Hopper, & Ross, 2015), with immature and adult individuals (Inoue & Matsuzawa, 2009); the task is repeatable and informative. Distinct cognitive tasks are likely to tap into general or domain specific intelligences to varying degrees, and since it is not known how many factors are best for modeling macaque intelligence, it remains an open question which domains SimChain performance draws on. However, even in models of intelligence with more than a general factor, there tends to be significant overlap between specific domains and g (Danner, Hagemann, Schankin, Hager, & Funke, 2011). While SimChain is likely representative of g, the task is at the very least a strong indicator of symbolic reasoning. Additionally, our monkeys had achieved mastery with the SimChain task when tested for this study, so task learning effects would not affect results (Vonk & Povinelli, 2011); this is beneficial since it removes a confound, but it would also be interesting to investigate associations between personality and task acquisition.
More research is needed to determine how tests of serial cognition relate to other tasks, like numerical addition or object transposition (Herrmann et al., 2010). Once relationships between tasks are established, tests of more advanced cognitive faculties could be incorporated into batteries that assess comparable abilities in primates and adult humans. Regardless of whether general intelligence correlates with one or more primate personality dimensions, individual tests (representative of physical, social, or other cognitive proficiencies) might be tied to different personality dimensions, as is suggested in the human literature (Austin et al., 1997; Chamorro-Premuzic & Furnham, 2014). Additionally, factor models of primate intelligence have been investigated (Herrmann et al., 2010; Hopkins et al., 2014), and the results have been favorable.
Complex cognitive tasks, like Raven's Progressive Matrices, are extensively used in human intelligence testing because of their strong associations with general intelligence and specific abilities (Austin et al., 1997). Raven's Matrices is also a difficult task, which is a major reason why it is an effective test (Raven, 2000). Our study demonstrates that nonhuman primates are capable of completing complex cognitive tasks that have meaningful associations with personality and intelligence, and other, difficult tasks need not be ruled out as being too challenging for primates.
Our study is not without limitations. SimChain tests serial cognition, and consequently only assesses a portion of a monkey's cognitive repertoire. For instance, while SimChain allows us to capture characteristics about accuracy, it is not as well suited for studying RTs: we could only model the latency between stimulus onset and the first response. Our sample of monkeys also contained only males, and while a representative sample ought to of course include females, evidence from multiple tasks showed no sex differences in any performance metrics among a group of six male and seven female long-tailed macaques (Schmitt et al., 2012). However, Hopper et al. (2014) found differing contributions from personality to male and female chimpanzees' problem solving success, so we ought not to rule out the possibility that performance in female macaques may have a different relationship with personality.
A comprehensive study using large samples would be the best way to tackle task consistency, sex differences, and other sources of variability. Different primate species, all of whom have been rigorously trained and tested in a diverse range of cognitive tasks, ought to be rated for personality, which would allow us to address questions concerning the evolution of general and specific types of intelligence, and the common origins of intelligence and personality. Even a broad study such as this would likely suffer from a drawback that our work suffers from as well: these results rely on captive animals, and captive animals may not be representative of the wider population.
Nevertheless, captive animals are useful models. Rhesus macaques are the gold standard for primate research in neuroscience, genetics, and medicine, and our results have implications for these fields. Subjective well-being and personality are heritable and phenotypically and genetically correlated in nonhuman primates (e.g., Adams, King, & Weiss, 2012). Moreover, Friendliness, which is correlated with subjective well-being in macaques (Weiss et al., 2011), is associated with serial intelligence. Subsequent research is needed to determine if the six macaque domains and subjective well-being are heritable, but in humans and chimpanzees, both well-being and personality are heritable, and genetically correlated (Weiss, Bates, & Luciano, 2008; Weiss, King, & Enns, 2002); intelligence too is heritable in both ape species (Davies et al., 2011; Hopkins et al., 2014). The existing monkey literature supports the heritability of personality (Brent et al., 2014; Williamson et al., 2003), though as of yet, no substantive evidence supports the heritability of subjective well-being and intelligence in rhesus macaques. More research needs to investigate these questions, for if individual psychological differences are heritable in macaques, artificial breeding and the research coming out of macaque colonies might be improved by selecting for friendly, intelligent, and mentally healthy phenotypes.
Intelligence and personality are the two pillars of differential psychology. Intelligence has for some time been a major subject of study for evolutionary biologists, and personality has recently gained traction among behavioral ecologists and comparative psychologists (Griffin et al., 2015; Weiss & Altschul, in press). Deeper investigations into primate cognition and personality will enrich both comparative and differential psychology.