Majestic tigers: personality structure in the great Amur cat

We explore individual differences in tiger personality. We first asked whether there is evidence of personality dimensions (analogous to the Big Five in human personality research) in the Amur tiger. We then asked whether any discoverable personality dimensions are associated with measured outcomes, including group status, health and mating frequency. Our first sample comprises 152 tigers living in the world's largest semi-wild tiger sanctuary in North Eastern China. Our second sample of 96 tigers also lives in a sanctuary. Having two samples allowed us to assess the replicability of the personality dimensions or factors found in our first sample. We found that two factors (explaining 21% and 17% of the variance among items), which we call, for descriptive ease, Majesty and Steadiness, provide the best fit to the data. Tigers that score higher on Majesty are healthier, eat more live prey, have higher group status (among other tigers, as assessed by human raters) and mate more often. We provide some ethological context to put flesh on the quantitative bones of our findings concerning these magnificent and charismatic animals.

The lexical approach to probing personality
Human personality research arises from a lexical viewpoint [1]. The premise of this approach is that because people think and talk about themselves and others, human personality should be captured in our language by adjectives that describe us (such as kind, avaricious, mendacious or reliable). The next step is to test whether data reduction techniques can reveal an underlying (not directly observable) pattern or structure to these words when applied to individuals. In human personality research, a leading model supports the existence of five factors (openness to experience, conscientiousness, extraversion, agreeableness and stability), which capture many of the differences among people [21]. The work on tigers reported here follows the same general strategy (discussed in [22], see ch. 2).
We asked the following questions: (i) is there a discernible structure of personality in Amur tigers? (ii) are any personality dimensions in tigers linked to other outcomes, such as physical characteristics or social status? (iii) is there evidence of sex differences in tigers? To explore these questions, we assessed data from two tiger populations.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 220957

Methods

Sample One
All the animals in this report are semi-captive. Animals in Sample One live in the largest protected reserve in China, the Harbin Siberian Tiger Park, in North Eastern China, which covers 1.44 km². The sample comprises 152 tigers (85 males and 67 females, age range 1-16 years). In the park, tigers are fed on open ground. All tigers had lived in the park for more than six months.

Sample Two
Ninety-six tigers (52 males, 44 females, age range 2-18 years) live in Hengdaohezi Siberian Tiger Park, Hailin city, Heilongjiang province, 272 km from Harbin. The densely forested park borders a mountain and covers 0.14 km². In this park, tigers are separated by age-group and fed on open ground. Each tiger had lived in the park for at least six months.

Measures
A tiger personality questionnaire containing a list of 70 words (items) considered suitable to describe tiger personality was given to each rater. Each word was defined in writing. Please see the electronic supplementary material for a step-by-step description of the scale development. For the second sample, three words were dropped (because of low eigenvalues), so the questionnaire administered to Sample Two contained 67 words.

Raters
The raters of the first, larger, sample of 152 tigers comprised 26 people who were either feeders or veterinarians. Each rater had worked with the tigers for at least six months and could identify every individual tiger. The raters were instructed in the use of a seven-point Likert scale, the meanings and nuance of any adjectives were clarified, and behaviours associated with various adjectives were discussed between raters and researchers so that the raters shared a consensus view of what they were being asked to do. The rating questionnaires were completed over a two-week period. The same process was followed for Sample Two.
The 27 raters of the second, smaller, sample of 96 tigers were also feeders or veterinarians, with at least one year of experience working at the park. All raters could identify each tiger and were familiar with the tigers' physical characteristics and behaviours. The rating questionnaires were completed over a one-week period.

Study design
Within each sample, all raters were invited to rate all tigers. In fact, 475 questionnaires were returned from the larger sample, and 340 questionnaires were returned from the second smaller sample, providing a high item-to-rater ratio. For Sample One, there were between 1 and 12 raters per tiger, with an average of 3.1 raters per tiger. For Sample Two, the number of raters ranged from 1 to 8 raters per tiger, with an average of 3.6 raters per tiger.

Analytical strategy

Rater effects
We first quantified the rater effects, in each sample, using the intraclass correlation coefficient (ICC) to assess the extent to which different people assessing the same tiger agreed in their evaluations. For the ICC, we used a one-way random effects model estimating the reliability for the average of k raters, using ICC from the psych package in R 4.1.2 [23,24].
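As an illustration of the quantity described above, the following is a minimal Python sketch of the one-way random effects ICC for the average of k raters, often written ICC(1,k). It is not the authors' code (they used ICC from the psych package in R). The function name `icc1k`, the toy ratings, and the simplifying assumption that every tiger has the same number of ratings are all hypothetical.

```python
from statistics import mean

def icc1k(ratings):
    """ICC(1,k) from a one-way ANOVA on a balanced table.

    `ratings` has one row per tiger and one column per rating;
    equal row lengths are an assumption of this sketch.
    """
    n = len(ratings)       # number of tigers
    k = len(ratings[0])    # ratings per tiger
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-tiger mean square
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-tiger mean square (rater disagreement plus noise)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, row_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / msb

# Made-up seven-point ratings of one item for four tigers
toy = [
    [6, 7, 6],
    [2, 3, 2],
    [4, 4, 5],
    [7, 6, 7],
]
print(round(icc1k(toy), 3))
```

When raters disagree more within a tiger than tigers differ from one another, the within-tiger mean square exceeds the between-tiger mean square and the coefficient becomes negative, which is how a negative ICC can arise for an individual item.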
Next, using lm from the stats package in R 4.1.2, we ran regressions on dummy variables created for each rater in order to estimate the variance explained by all raters and to obtain residuals without rater effects. We created a set of item responses with and without rater effects for each dataset (Samples One and Two) for follow-up analyses.
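The regression step can be sketched in a few lines of Python. With an intercept and one dummy variable per rater, the least-squares fitted value for each response is simply that rater's mean score, so the residual is the score minus the rater mean. This is a hypothetical illustration for a single item, not the authors' R code; the function name and the toy data are invented.

```python
from collections import defaultdict

def remove_rater_effects(scores, raters):
    """Return (residuals, r_squared) for one item regressed on rater dummies."""
    by_rater = defaultdict(list)
    for s, r in zip(scores, raters):
        by_rater[r].append(s)
    rater_mean = {r: sum(v) / len(v) for r, v in by_rater.items()}
    grand = sum(scores) / len(scores)
    # OLS with rater dummies predicts each score by its rater's mean
    residuals = [s - rater_mean[r] for s, r in zip(scores, raters)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((s - grand) ** 2 for s in scores)
    # Share of variance explained by raters
    return residuals, 1 - ss_res / ss_tot

# Made-up scores from a lenient rater "A" and a strict rater "B"
scores = [5, 6, 7, 2, 3, 4]
raters = ["A", "A", "A", "B", "B", "B"]
residuals, r_squared = remove_rater_effects(scores, raters)
print(residuals, round(r_squared, 3))
```

In this toy example most of the variance is attributable to the raters; the residuals keep only the differences each rater drew between tigers.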

Principal components analysis
Using princomp from the stats package in R 4.1.2, we conducted a principal components analysis (PCA) on Sample One to get an indication of the magnitude of any dimensions of variation that emerged from the 70 items.

Exploratory factor analysis
Guided by the number of potential underlying factors suggested by the PCA and previous work on other cats [25], we conducted an exploratory factor analysis (EFA) on Sample One using factanal from the stats package in R 4.1.2. This approach uses maximum likelihood (ML), which assumes that the observed variables come from a mixture of several Gaussian distributions, that is, the latent variables each come from a unique Gaussian distribution that has some noise. As in most animal behaviour research, we did not assess fit indices because they require large sample sizes and are highly sensitive to normality assumptions [26,27]. Instead, we looked for the most parsimonious solution while relying on the interpretability of the factors and their meaningfulness. We performed two EFAs: one on the dataset with rater effects and one on the dataset without rater effects.

Factor analysis with Procrustes Rotation
Following the exploratory factor analyses, we conducted a factor analysis (FA) with Procrustes Rotation on both samples, guided by its utility as a confirmatory tool [28], using factanal from the stats package in R 4.1.2. We then tested for congruence between the two samples as our measure of repeatability, both with rater effects left in and with rater effects regressed out, in R 4.1.2 as described in [28].

Associations between personality factors and some evolutionarily relevant outcomes
Taking the best fitting model forward from the factor analyses with Procrustes Rotation, we explored correlations among the factors and measured outcomes: age, length, weight, health, preying on live animals, food intake, mating frequency, breeding, rank among other tigers as assessed by human observers, and whether a tiger was raised by his or her mother (rather than human-fed). We explored these correlations separately for each sample. Correlations were computed using rcorr from the Hmisc package in R 4.1.2.
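For readers unfamiliar with the statistic, here is a minimal Python sketch of a Pearson correlation and the t statistic commonly used to test it. It is not the authors' code (they used rcorr from the Hmisc package in R), and the function names and toy numbers are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Made-up factor scores and outcome values for four tigers
factor_scores = [1, 2, 3, 4]
outcome = [1, 3, 2, 4]
r = pearson_r(factor_scores, outcome)
print(round(r, 2), round(r_to_t(r, len(outcome)), 3))
```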

Sex differences in evolutionarily relevant outcomes
Lastly, we examined sex differences in the factors and each of the above-measured outcomes, separately within each sample of tigers, using t.test from the stats package in R 4.1.2.
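The comparison can be illustrated with a short Python sketch of Welch's unequal-variance t-test, which is the default behaviour of t.test in R. Whether the authors kept that default is an assumption here, and the function name and toy values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up outcome values for two small groups
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 2))
```

The two-tailed p-value would then come from the t distribution with `dof` degrees of freedom; it is omitted here to keep the sketch within the standard library.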

Rater effects
We computed the ICCs in a one-way random effects model, with raters as random effects. This model is suited to datasets in which each subject is rated by a different set of raters, as was the case here; it estimates the reliability of the average score of all raters for a single item [23]. The 70 items had an average ICC of 0.83 (ranging from 0.49 to 0.95) for Sample One, and an average ICC of 0.87 (ranging from −0.16 to 0.96) for Sample Two. These values indicate good reliability of the average scores of raters [23], but there was variation in the ICCs between items (figure 1), which is why we created two sets of items for each sample: one taking the average item score between raters for each tiger, and one where we regressed out the rater effects before taking the average item score between raters for each tiger. We regressed out the rater effects using dummy variables for the raters; the distributions of the explained variance (R²) are shown in figure 1, with an average R² of 0.18 for Sample One and 0.10 for Sample Two.

Principal components analysis
The scree plot of the PCA conducted on Sample One indicated the existence of two major factors explaining greater than 15% of the variance each, a third major factor explaining greater than 5%, and two additional smaller factors explaining approximately 4% each. These results were similar whether raters were left in or regressed out (figure 2).

Exploratory factor analysis
Based on the PCA results, we ran exploratory factor analyses on Sample One with two-, three- and five-factor models. The most parsimonious options contained two or three factors, as the number of items with factor loadings greater than 0.4 dropped substantially for the fourth and fifth factors (table 1). As in most animal behaviour research, fit indices were not considered because they require large sample sizes and are highly sensitive to normality assumptions [1,26,27].
The three-factor solution resulted in factors explaining approximately 20%, approximately 15% and approximately 10% of the variation (electronic supplementary material, figure S1). Only the first two factors showed much overlap of items with factor loadings greater than 0.4 between the EFA including rater effects and the EFA with rater effects regressed out (32 and 11 items overlap, respectively, with factor loadings correlating 0.98 and 0.99; electronic supplementary material, figure S2). The factors of the two-factor model explained approximately 20% and approximately 18% of the variation (electronic supplementary material, figure S3). Both factors showed a large overlap of items with factor loadings greater than 0.4 between the EFA including rater effects and the EFA with rater effects regressed out (31 and 23 items overlap, respectively, with factor loadings correlating 0.98 and 0.99; electronic supplementary material, figure S4). These results suggest that removing the rater effects matters only for the third factor, but not for the first two major factors.
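The item-overlap comparison described above can be sketched in Python as follows. The loadings are invented for illustration (only the adjectives are taken from the questionnaire), the function name is hypothetical, and treating "greater than 0.4" as a threshold on the absolute loading is an assumption.

```python
def salient_items(loadings, threshold=0.4):
    """Items whose absolute loading on a factor exceeds the threshold."""
    return {item for item, value in loadings.items() if abs(value) > threshold}

# Hypothetical loadings on one factor from two analyses of the same sample
with_rater_effects = {"dignified": 0.71, "imposing": 0.65,
                      "ambitious": 0.38, "neurotic": -0.52}
rater_effects_removed = {"dignified": 0.69, "imposing": 0.61,
                         "ambitious": 0.44, "neurotic": -0.35}

overlap = (salient_items(with_rater_effects)
           & salient_items(rater_effects_removed))
print(sorted(overlap))
```

Items that cross the threshold in only one of the two analyses are excluded from the overlap count, so a factor that is sensitive to rater effects shows few shared items.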

Factor analysis with Procrustes Rotation
To evaluate the replicability of the two- and three-factor structures, we conducted an EFA in the replication sample, using targeted, i.e. Procrustes, rotations to assess the factor similarity with the first sample. When exploratory factor analyses in independent datasets show similar factor structures, it is considered strong evidence of replicability. This approach has been shown to produce more reliable results than confirmatory factor analyses (CFAs) in evaluating the replicability of factor structures in human personality data [28]. The quantitative index used to evaluate factor similarity is the congruence coefficient used in [28], where a value higher than 0.9 is considered a matching factor, and higher than 0.8 considered a fair similarity [28,29]. We included items with a factor loading of greater than 0.4 in the initial EFA. The new EFAs were conducted in Sample One and Sample Two with Varimax rotations. These replication analyses were conducted with rater effects left in and with rater effects regressed out. For the three-factor model, the average factor congruence was generally higher with rater effects left in (per factor congruence: Factor 1: 0.89, Factor 2: 0.76, Factor 3: 0.64) than with rater effects regressed out (per factor congruence: Factor 1: 0.80; Factor 2: 0.67; Factor 3: 0.67), although both overall congruence coefficients were lower than 0.8 (overall congruences with rater effects left in and with rater effects regressed out were 0.78 and 0.71, respectively; figure 3 and electronic supplementary material, figure S5). The two-factor structure showed higher congruence coefficients than the three-factor structure, and again, the items with the rater effects left in (Factor 1: 0.93, Factor 2: 0.67) showed a higher overall congruence than items with rater effects regressed out (Factor 1: 0.67, Factor 2: 0.87).
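As a minimal illustration of the index, the Python sketch below computes a congruence coefficient between two loading vectors and applies the 0.9 and 0.8 thresholds quoted above. It assumes the coefficient in [28] is Tucker's congruence coefficient, the usual choice for this purpose; the function names and the loadings are hypothetical, and the Procrustes rotation itself is not shown.

```python
import math

def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

def interpret(phi):
    """Label a coefficient using the thresholds quoted in the text."""
    if phi > 0.9:
        return "matching factor"
    if phi > 0.8:
        return "fair similarity"
    return "below 0.8"

# Hypothetical loadings of the same four items in each sample
sample_one = [0.70, 0.60, 0.50, -0.40]
sample_two = [0.65, 0.55, 0.30, -0.50]
phi = congruence(sample_one, sample_two)
print(round(phi, 3), interpret(phi))
```

Unlike a Pearson correlation, the coefficient does not centre the loadings, so it is sensitive to their overall sign and size pattern as well as to their rank order.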
The best results overall were obtained for the two-factor structure with rater effects left in, with an overall congruence of 0.81, with the first factor showing the highest congruence of all (0.93; figures 3 and 4 and electronic supplementary material, figure S6). As expected, the correlation between these two factors is small: 0.17 (p = 0.03) in the first dataset and 0.16 (p = 0.12) in the second dataset. These are the two factors that will be included in all subsequent analyses.
Since a factor number is an impoverished interpretive label, we refer to Factor 1 as Majesty and Factor 2 as Steadiness, guided by the adjectives comprising the items in the two factors (provided in electronic supplementary material). We draw attention to the richness of the Chinese language in personality assessment (data collection and ratings were conducted by Chinese native speakers). Table 2 below shows correlations between the two factors, Majesty and Steadiness, in each sample. Tigers scoring higher in Majesty were healthier, preyed more on live animals, ate more, mated more often, bred more often, and were regarded by their human raters as having higher group status among tigers. Since some of the tigers had been fed by humans as nurselings, we could determine that being mother-reared was associated (not necessarily causally) with being more Majestic. There was broad comparability across the two samples. Only one sex difference showed in these correlational data. In the second and smaller dataset, the link between Majesty and being reared by his or her mother was stronger in males (p = 0.002, two-tailed). This could be sampling variance since it was not found in the first and larger sample.

Table 1. The number of items with factor loadings stronger than 0.4 for the two-, three- and five-factor models for Sample One, including rater effects (left) and with rater effects regressed out (right).

Table 2. Pearson correlations between factors and baseline characteristics (partial R = corrected for sex and age).

Sex differences observed among the Amur tigers
As expected, males were larger and heavier than females in both samples (electronic supplementary material, figure S7). Males preyed on live animals and ate more in the larger sample but not in the second sample. Males were more Majestic (p = 0.01, two-tailed) in the larger sample but not in the second sample.

Discussion
In this study of Amur tiger personality, we found evidence of two factors which together explain 38% of the variance in the questionnaire scores. These factors were largely replicable across two independent samples. Why concern ourselves with probing tiger personality structure? In other animals individual differences in personality or behaviour have been associated with health [2] and breeding status [3]. As well as the intrinsic pleasure of learning more about the world we live in, research on animal personality can augment our capacity to manage and conserve wildlife more effectively.
We face obstacles in linking this work to similar studies since this is the first psychometric study on personality in wild-living Amur tigers we know of. An analysis of domestic cat personality recommends using the human five-factor model labels for dimensions (to avoid each researcher inventing their own labels, which is a barrier to cross-study comparison) [30], yet we found the items and factor names for human personality a poor match to tiger personality. An interesting cross-site study of zoo-living Amur tigers found three personality factors, which the researchers called 'anxious, quiet and sociable', but the sample size of 19 Amur tigers is small [31]. Our inventory of items was purpose-built for tigers in the large samples we were able to obtain. In naming our factors Majesty and Steadiness, we chose labels that are semantically coherent, even in translation. 'Majesty' includes 'dignified, imposing, farsighted' and 'ambitious', to pick a few of the word items. 'Steadiness' includes 'sincere, methodical, tolerant' and, our favourite, 'frank'. Both factors comprise items that, according to human values, are broadly positive; the undesirable trait items, such as stupid, dependent, aggressive and neurotic, load negatively on these two factors.
Majesty was linked with better health and higher status in both datasets. What in humans are desirable personality traits may co-occur with valuable outcomes in tigers. We wish to avoid going beyond our data in interpreting these results; it is possible that natural and sexual selection have forged a tendency for 'good things to go together' in tiger personality, but that is simply a suggestion. We can only assess tiger status from a step removed: our measure is the human raters' assessment of the tigers' ranking. But this measure was highly convergent among raters who had a great deal of observation time as well as expertise in tiger behaviour. Status is an evolutionarily significant trait among tigers. Male tiger fitness is enhanced by the acquisition and management of large territories, by limiting other males' mating access to females within the territory, and by remaining healthy. 'Status' likely comprises a constellation of traits on which individuals assess one another. Status is relevant to tiger decision-making about when and whether to fight, or with whom to mate, choices that are highly consequential.
The Steadiness factor, which has items that could be interpreted as relating to a neuroticism-like dimension, had a lower congruence (0.67). One possible limitation that may have contributed to the lower congruence coefficient is sample characteristics. The replication sample may have had different characteristics that could have affected the factor structure. Only in the second sample, for example, were the tigers separated by age-group and housed on a smaller territory. Another limitation could be measurement error due to, for example, raters who differed in how long they had known the tigers (the raters in Sample One had worked with the tigers for at least six months, those in Sample Two for at least a year).
The only sex difference that manifested in both samples was the propensity for males to be heavier. We consider the other outcomes associated with being a male tiger (eating more, eating more live prey and being higher in the factor we call Majesty) only as possible true sex effects which may instead arise from sampling variance since they did not emerge clearly in both datasets.
We suspect that if one could study a large sample of tigers in the wild, the results might be slightly different. It is unlikely that wild animals would express exactly the same behavioural repertoire as either a managed wild population or a zoo population of the same species. We make no prediction about the size or direction of any such differences. Given the challenge in observing even a single wild tiger, our protected sanctuary-living tigers are the only currently practical possibility for collecting a psychometric dataset. The factors Majesty and Steadiness are linked with biological outcomes; this increases the likelihood that they are meaningful.
We hope this report will stimulate further work on Amur tiger personality. Managing land resources among competing species (in this case mostly humans and tigers) is a complex multivariate problem. This work shows that, like us, tigers are individuals, and that their temperaments are associated with ecologically relevant outcomes. There is much yet to learn about their individual and species-typical capacities. Let us hope enough tigers remain for future scholars to study, since we are short on immortal hands and eyes to frame replacements should we lose this species that burns so brightly in our imaginations and in life.