Introduction

Metacognition is the process of monitoring one's own cognitive and memory processes and using metaknowledge to control information processing and behavior (Koriat, 2007). When humans engage in metacognition, it is a highly "private" activity that we can express verbally. Smith et al. (1995) were the first to examine whether nonhuman animals, which cannot give such verbal reports, also possess metacognitive abilities. In a psychophysical experiment, these authors trained a bottlenose dolphin in an auditory (pitch) discrimination task of varying difficulty and introduced an "escape" response. When the dolphin made the escape response, the task was canceled, and the reward was given after a delay. They found that the dolphin increasingly produced escape responses around the frequency at which pitch discrimination became difficult. Based on these results, the authors argued that this escape response reflected the dolphin's uncertainty about its own ability to succeed at the task's difficult frequencies. They also conducted a random-dot-density discrimination task with rhesus macaques and introduced a choice option that allowed escape responses (Smith et al., 1997). As with the dolphin, escape responses by the monkeys increased as discrimination became more difficult. Following these pioneering studies, diverse studies of animal metacognition have been conducted over the past two decades (for reviews, see Crystal, 2019; Fujita, 2010; Hampton, 2009; Kornell, 2014; Smith, 2009; Tomasello, 2022).

Hampton (2009) argued that four criteria must be met to empirically demonstrate metacognition. First, primary and objectively observable behavior that can be scored for accuracy and efficiency, such as the dolphin's discrimination task, must be specified. Second, there must be variation in the performances of the primary behavior. Third, observable secondary behavior (e.g., the escape response) behind the primary behavior that can be used to monitor and adjust cognition must be identified. And finally, the correlation between the primary and secondary behaviors must be explicitly assessed. A typical example of a secondary behavior is "uncertainty" monitoring. This behavior expresses how confident or uncertain one is about what one will do, is doing, or has done. Depending on the task, an uncertainty response may cancel the current task or seek more information to solve the task (Call & Carpenter, 2001; Hampton, 2001; Shields et al., 1997; Smith et al., 1995). In other words, secondary behaviors have a beneficial aspect to task execution (Smith, 2009).

Studies in nonhuman animals have demonstrated the presence of metacognition by showing that the occurrences of these secondary behaviors are correlated with primary behaviors (correct response rate or response time). Based on Hampton's (2009) strict criteria, one can say that animals have metacognition only if these correlations cannot be explained by other factors. One of these factors is environmental-cue association. If a task difficulty corresponds to a particular set of stimuli, participants may use this environmental cue to express their secondary behaviors. For example, the relative quantity judgment task has difficult conditions containing more items (e.g., 40 vs. 38) and easy conditions containing fewer items (e.g., 4 vs. 2). If escape responses are more frequent in a difficult condition, participants may be making these responses simply based on the number of items shown on the screen rather than the difficulty of the task. The factor of environmental-cue association in this case could be controlled by giving them an easy task with a larger number of items (e.g., 40 vs. 20).

Another challenging factor for animal metacognition is behavioral-cue association. Some properties of one's primary behavior can be cues to switch to secondary behaviors. For example, difficult tasks require more time for perceptual processing and decision-making, resulting in longer response times (Tomonaga, 2001). It is possible that the length of response time itself may serve as a cue to control secondary behavior independently of the task's difficulty. This effect could be examined by experimentally manipulating the response time, independent of task difficulty (but see Palser et al., 2018). Alternatively, we could introduce a "prospective" metacognitive task in which secondary behaviors are required before primary ones (e.g., metamemory task; Fujita, 2009; Hampton, 2001).

A third factor is response competition. In many metacognitive tasks for animals, the primary behavior competes with the secondary behavior because the task's options to produce primary and secondary behaviors are presented simultaneously (e.g., Smith et al., 1995). If the task for the primary behavior is difficult, the response time may increase, and consequently, the chance that the secondary behavior will compete with the primary behavior may increase. To avoid such response competition, it may be useful to use prospective or retrospective metacognitive tasks in which the secondary behavior is performed before or after the primary behavior (Hampton, 2009).

In typical metacognitive tasks for animals, secondary behaviors, such as escape responses, are often newly "trained" within the same experimental context as the primary behaviors. Trained secondary behaviors can increase task complexity and leave open the possibility for an "associationist account" of secondary behaviors (Fujita, 2010; Smith, 2009). In humans, on the other hand, "spontaneous" behaviors that imply metacognition are often emitted in various problem-solving contexts (Heyes et al., 2020; Patel et al., 2012). For example, when watching a quiz show on TV, we often see contestants press the button hard on questions they are confident about, but tilt their heads or press the button gently on difficult ones. It was also reported that in a question-and-answer context, listeners inferred the "feeling of knowing" of speakers based on changes in prosody, fillers, and pauses in their responses (Brennan & Williams, 1995). In addition, young children showed various forms of verbal and nonverbal monitoring and control behaviors in problem-solving situations (Bryce & Whitebread, 2012). Monitoring spontaneous behaviors reduces the chance that secondary behaviors could be explained by associative factors and, therefore, may be useful for studying nonhuman animal metacognition.

In the present study, we focused on spontaneous (i.e., not explicitly trained) metacognitive behaviors shown by chimpanzees. That is, rather than shaping novel secondary behaviors in the laboratory, we explored whether any spontaneous behaviors of chimpanzees reflect some metacognitive components. There have been several studies reporting correlations between spontaneous behavior and metacognition in nonhuman animals, but they may be subject to associative explanations. For example, Call and Carpenter (2001) found that great apes spontaneously showed information-seeking behavior by looking into an opaque tube if they had not seen food being placed in one of two tubes during a task in which they had to choose food hidden within a tube. Such information-seeking behavior has been reported not only in great apes but also in captive and semi-wild rhesus macaques and dogs (Belger & Bräuer, 2018; Hampton & Hampstead, 2006; Hampton et al., 2004; Rosati & Santos, 2016; Royka et al., 2020; but see Paukner et al., 2006; Taylor et al., 2020). However, in these studies, distinct environmental cues (the experimenter hiding the bait in view) could explain the results (cf. Call, 2010). Similarly, Tomonaga and Matsuzawa (2002) found some spontaneous behavior related to confidence in a chimpanzee. Matsuzawa (1985) and Murofushi (1997) had reported that when a chimpanzee chose an Arabic numeral corresponding to the number of dots presented on the screen of a monitor, the response time increased monotonically for numbers greater than four. Tomonaga and Matsuzawa (2002) measured spontaneous look-back responses at a stimulus screen during a counting task and found that as the number of dots increased (and the task became harder), the frequency of the look-back response increased, which led to an increase in response times. This response could be considered information-seeking. However, since the number of items linearly corresponded with task difficulty, it might be difficult to reject an environmental-cue association account.

Beran et al. (2015) observed the chimpanzees’ behavior during a computer-controlled memory task in which auditory feedback for a correct choice response was delayed. Food rewards for correct responses were presented at a different site than the place for the task, and their delivery was also delayed. If the chimpanzees did not move to the food-dispenser site before the food was delivered, the reward could not be obtained. In such contexts, the chimpanzees were more likely to move to the dispenser on correct trials than on incorrect trials, and these movements occurred before any external feedback on the outcome of the response. Beran et al. concluded that the chimpanzees moved or did not move on the basis of their confidence in their own responses.

Allritz et al. (2021) gave chimpanzees a computer-controlled transitive-inference task and observed “wavering” during the task (see also Tolman, 1927). Wavering is a back-and-forth movement of the finger between stimuli, a behavior characteristically associated with experiences of difficulty in humans. Although the chimpanzees performed very accurately in all conditions, they exhibited wavering more often on more “difficult” trials. This wavering may reflect chimpanzees' uncertainty about the task, similar to the look-back behavior reported by Tomonaga and Matsuzawa (2002).

Hampton and Hampstead (2006) also examined whether frustrative behaviors predict task performance in a rhesus macaque during a matching-to-sample task. The monkey tended to tap the touchscreen harder during error responses but touched the screen more gently during correct choice responses. These differences occurred before the external feedback was presented, suggesting that the monkey knew whether or not he remembered the correct responses.

It has been known that chimpanzees in our laboratory show an affective response to feedback (such as buzzers) for errors in cognitive experiments (Itakura, 1993; Yamanashi & Matsuzawa, 2010). In addition, we found that if their choice response was correct, two chimpanzees often looked up at the food dispenser that delivered the reward at the same time as a chime sounded for choice responses (Fig. 1). On the other hand, when their response was incorrect, the chimpanzees rarely looked back at the dispenser after feedback (buzzer sounds). These behaviors were not intentionally trained by the experimenters but were implicitly acquired as a result of many years of experimental experience. We hypothesized that this look-back behavior might be a spontaneous secondary behavior, reflecting their confidence or uncertainty in performing our task.

Fig. 1

Examples of look-back behaviors by the chimpanzee Ai. Red arrows show the location of the food dispenser

The major problem in examining this hypothesis is that the look-back behavior was always preceded by a chime (and the sound of the food dispenser working) that signaled a correct response. In addition, food rewards from the dispenser fell on a tray shortly after the chime, so these were available immediately after the look-back behavior. Thus, we cannot rule out the possibility that this behavior emerged as a kind of "discriminated operant" or "superstitious behavior" (Skinner, 1948) controlled under the three-term contingency of reinforcement. To resolve this problem, we introduced a 1-s delay between the choice response and the buzzer/chime in the cognitive task and observed the chimpanzees' behaviors during this delay period (see Fig. 2A; cf. Beran et al., 2015). If the look-back behavior reflected the confidence of the choice response, then it should occur more frequently on correct trials than on error trials despite the feedback delay. On the other hand, if the look-back behavior was an acquired operant with the chime as the discriminative stimulus, then either it should not occur during the delay period or it should occur equally and randomly (at an "operant level") for both correct and error trials.

Fig. 2

(A) Typical flow of the trial. (B) Three types of task-difficulty conditions

Methods

Participants

Two chimpanzees (Pan troglodytes) participated in this study: Ai (female), 36 years old at the beginning of the present experiment, Great Ape Information Network (GAIN, https://shigen.nig.ac.jp/gain/) ID#0434, and Pal (female), 14 years old (GAIN ID#0611). They had experienced various computer-controlled perceptual and cognitive experiments, including visual search tasks (Matsuzawa et al., 2006; Tomonaga, 2001, 2010; Tomonaga et al., 2003). As mentioned in the Introduction, they exhibited look-back behaviors toward the food dispenser during these experiments. They lived in a social group of 14 individuals in an indoor and environmentally enriched outdoor compound (770 m²) at the Primate Research Institute, Kyoto University (Matsuzawa, 2006). Food and water were not withheld in this study.

Ethics statements

Care and use of chimpanzees adhered to the third edition of the Guide for the Care and Use of Laboratory Primates of the institute. Experimental designs of the present study with chimpanzees were approved by the Animal Welfare and Animal Care Committee of the institute (2011-078, 2012-041, 2013-028, 2014-031). All procedures also adhered to the Guideline for Animal Experimentation of the Japanese Society of Animal Psychology, Guideline for the Care and Experimental Use of Captive Primates of the Primate Society of Japan, Code of Ethics and Conduct of the Japanese Psychological Association, and the Japanese Act on Welfare and Management of Animals.

Apparatus and experimental setting

The present study was conducted in a laboratory booth (1.8 × 2.15 × 1.75 m) adjacent to the chimpanzee facility (see Fig. 1). Each chimpanzee came to the booth through the overhead pathway connecting the facility and the booth. Two sets of 17-in. LCD monitors (I-O Data LCD-AD172F2-T, 1,280 × 1,024 pixels, pixel size: 0.264 mm × 0.264 mm) with touch panels were installed on the booth wall. The viewing distance was approximately 40 cm. Food dispensers (Biomedica BUF-310) were placed outside the booth, a little higher than the chimpanzees' eye level, and supplied food rewards (small pieces of apple). The chimpanzees could see the dispensers operating through a transparent window of the booth (see Fig. 1). All equipment and experimental events were controlled by a computer.

Procedure

Visual-search task

In the present study, the chimpanzees received a visual search task as a baseline task. This task was the same as used in Tomonaga and Kawakami (2022), in which chimpanzees searched for a face-like object or a fruit among distractors. Before the onset of the present study, the chimpanzees had been working on this task for over a year.

Figure 2A shows the flow of the trials (see also Online Supplementary Material (OSM) Videos 1 and 2). After a 2-s intertrial interval, a trial began with the presentation of a warning signal (blue square, 90 × 90 pixels) at a random position along the bottom of the monitor on a white background, accompanied by a sound. When the chimpanzee touched it, a search display was presented. The search display consisted of one target and five distractors. These stimuli appeared randomly at six possible positions. When the chimpanzee touched one of the stimuli, all stimuli on the screen disappeared. We prepared two feedback conditions by manipulating the delay between the response and the feedback. In the no-delay condition, a target response immediately sounded a chime, and a food reward was delivered, while only a buzzer sounded when a distractor was chosen. In the delay condition, all stimuli immediately disappeared after a response, but feedback (chime and food, or buzzer) was presented 1 s later (Fig. 2A). The auditory feedback was highly directional because it was presented through a loudspeaker built into the monitor. Therefore, even though the two sets of apparatus were operated simultaneously (see OSM videos), the auditory feedback to each chimpanzee's own responses was easily identifiable. The delay period was set to 1 s because look-back behaviors had occurred within 1 s of the presentation of the chime or buzzer in our previous experiments. We used a correction method for incorrect trials: if the chimpanzee touched a distractor, only the target stimulus was presented on the next trial (correction trial). Correction trials were not used for data analyses. We also prepared two task-difficulty conditions (Fig. 2B). The first was the "easy" task, in which the distractors were uniform, and the second was the "difficult" task, in which the distractors differed from each other. In previous studies, chimpanzees showed a clear difference in performance between these two task conditions (Tomonaga & Imura, 2015; Tomonaga & Kawakami, 2022; Wilson & Tomonaga, 2022). These two types of trials appeared alternately during a 48-trial session. The two feedback conditions were randomized across sessions, and each chimpanzee was given 12 sessions for each condition. Generally, one session was conducted per day for a maximum of 5 days per week.
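
The feedback schedule described above can be summarized in a short sketch. This is only an illustration of the two feedback conditions and the correction method as stated in the text, not the software that ran the experiment; the event labels and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

FEEDBACK_DELAY_S = 1.0  # delay-condition value stated in the text


@dataclass(frozen=True)
class Event:
    time_s: float  # seconds after the choice response
    name: str      # hypothetical event label


def feedback_events(correct: bool, delayed: bool) -> List[Event]:
    """Return the events that follow a touch on a search stimulus."""
    onset = FEEDBACK_DELAY_S if delayed else 0.0
    # The search display clears immediately in both feedback conditions.
    events = [Event(0.0, "stimuli_off")]
    if correct:
        events.append(Event(onset, "chime"))
        events.append(Event(onset, "food_reward"))
    else:
        events.append(Event(onset, "buzzer"))
    return events


def next_trial_is_correction(correct: bool) -> bool:
    """Correction method: an error is followed by a target-only trial."""
    return not correct
```

In the delay condition, nothing in this schedule distinguishes correct from incorrect trials during the first second after the response, which is the interval the behavioral coding relies on.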

Look-back behaviors

Behavior during the visual search task was recorded by a video camera and later coded. Note that the first 12 trials of the second session of the delay condition for Ai were not video recorded due to an operational error.

In particular, we focused on look-back behaviors towards the dispenser after choice responses. The look-back behaviors were classified into three types, shown in Fig. 1. The first was Look-Up: after the choice response, the chimpanzee not only turned her face, body, or both toward the side where the food dispenser was located, but also clearly turned her head upward to look at the dispenser. The second was Look to the Side (Side): after the choice response, the chimpanzee turned her face, body, or both horizontally to the side where the food dispenser was located but did not look up at the dispenser located slightly diagonally above. We also categorized no look-back behavior as the third type, No-Look: after the choice response, the chimpanzee's head did not turn away from the monitor, and the body hardly moved from the sitting position.

In both feedback conditions, we coded the behavior for the 1 s immediately after the choice response. Therefore, under the no-delay condition, the behavior immediately after the feedback was coded, and under the delay condition, the behavior immediately after the choice response (until the feedback) was coded. YKu was the primary coder, and YKa and HT were the second coders for checking reliability. The agreement rate was 92.3%, and the κ coefficient was 0.80. Based on this substantial reliability, we then analyzed the data coded by the primary coder.
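
For readers unfamiliar with the two reliability measures, the following sketch shows how an agreement rate and Cohen's κ are computed from two coders' labels. The labels below are invented for illustration, and the sketch assumes a single pair of coders; the text does not state how the codes of the two second coders were combined.

```python
from collections import Counter


def agreement_and_kappa(coder_a, coder_b):
    """Return (agreement rate, Cohen's kappa) for two equal-length label lists."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Observed agreement: proportion of trials given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical codes for eight trials (not data from the study).
primary = ["Look-Up", "Look-Up", "Side", "No-Look",
           "No-Look", "Look-Up", "Side", "No-Look"]
second = ["Look-Up", "Look-Up", "Side", "No-Look",
          "Side", "Look-Up", "Side", "No-Look"]
rate, kappa = agreement_and_kappa(primary, second)
```

Because κ discounts the agreement expected by chance, it is lower than the raw agreement rate whenever the coders' label distributions overlap, as in the reported values (92.3% vs. 0.80).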

Data analysis

In this study, only two chimpanzees participated, so we performed statistical analyses separately for each individual. We analyzed the accuracy, response times, and frequency of look-back behaviors using generalized linear mixed models (GLMMs) with Session as a random effect (random intercept models). Parameter estimates for each model were evaluated based on t statistics or Wald's statistics, also using Holm's correction method for multiple comparisons. These analyses were performed using lmer and glmer functions in lmerTest package (Kuznetsova et al., 2017) and mblogit function in mclogit package (Elff, 2022) in R version 4.2.0 (R Core Team, 2022).
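
As a reminder of what Holm's correction does to a family of p values, here is a minimal sketch of the step-down adjustment. It is written in Python purely for illustration; the analyses themselves were run in R, and the text does not state which function performed the correction. The input values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p values in the original order."""
    m = len(p_values)
    # Visit the p values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p values from three comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

The smallest p value receives the full Bonferroni multiplier, and each later one a smaller multiplier, so the procedure is never less powerful than a plain Bonferroni correction.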

Results

A summary of the overall results is presented in Table S1 (OSM). Raw data are also available in the OSM.

Accuracy

The graph at the top of Fig. 3 shows the percentage of correct trials for each condition. Accuracy did not differ between the feedback conditions; large differences were found only between the distractor conditions. Our logistic regression for accuracy (with fixed effects for Feedback, Task Difficulty, and their interaction) revealed that both chimpanzees showed significant differences in accuracy only between the distractor conditions (Ai: β = -2.358, p < 0.001 (Holm’s correction); Pal: β = -2.286, p < 0.001; see Table S2 (OSM)).

Fig. 3

The results of overall performances during the delay condition. The results of the no-delay condition are shown in Table S1 in the Online Supplementary Material. Upper panel: percentage of correct trials as a function of Distractor type (i.e., task difficulty); lower panel: mean response times in milliseconds as a function of Trial type. Error bars show the standard errors across sessions

Response times

Results for response times are shown in the lower part of Fig. 3 for trial-type (combined task difficulty and correct/error) and feedback conditions. Table S3 (OSM) shows the results of the GLMMs with a lognormal distribution, including these conditions and their interactions as fixed effects. In these analyses, we included Difficult/Error trials but excluded Easy/Error trials because the chimpanzees made very few errors during the easy task (upper part of Fig. 3; Table S1 (OSM)). No differences in response times were found between the feedback conditions for either chimpanzee. Ai showed significant differences in response times between correct and error trials in both feedback conditions (0.483 > βs > 0.204, ps < 0.001). In contrast, Pal did not show a significant difference between correct and error trials in the delay condition.

Look-back behaviors

Figure 4 shows the relative frequency of each look-back behavior for each chimpanzee in the delay condition. As in the response-time analysis, the data for Easy/Error trials were excluded from the analysis. The upper panels of Fig. 4 show the results of the no-delay condition. The two chimpanzees showed very few Look-Up responses in Difficult/Error trials in comparison with the other trial types (Ai: 2.0%, Pal: 17.1%; Table S1 (OSM)). We did not perform further statistical analyses for the no-delay condition. The graph shows that both chimpanzees exhibited a different pattern in the frequency of look-back behaviors in Difficult/Error trials in the delay condition from those in the other trial-type conditions. For both chimpanzees, Look-Up decreased, and No-Look increased compared to the other trial types. In addition, Ai showed more Side responses. We conducted multinomial logistic regression analyses on the results of the Delay conditions, with the Trial type as a fixed effect and Session as a random effect (Table S4 (OSM)). Both chimpanzees showed significantly different ratios of Look-Up versus No-Look between Difficult/Error and the other correct trial types (Ai, βs > 1.476, ps < 0.01; Pal, βs > 1.171, ps < 0.001). Ai also showed a significant difference in Look-Up versus Side between Difficult/Error and the other correct trial types (βs > 1.384, ps < 0.01).

Fig. 4

Relative frequency of each look-back behavior for each trial type

Next, we analyzed the accuracy when each look-back behavior was observed. Figure 5 shows the percentage of correct trials for each look-back behavior that occurred in the difficult trials under the delay condition. It is clear from these graphs that when the chimpanzees looked up at the food dispenser, the accuracy was higher than when they showed the other behaviors. Logistic regression analyses with Look-back behavior as a fixed effect and Session as a random effect confirmed this visual impression. Table S5 (OSM) shows the results of the GLMMs. For Ai, the accuracy when Look-Up was observed was significantly higher than for Side or No-Look (βs > 1.338, ps < 0.01). For Pal, when she showed Look-Up, the accuracy was significantly higher than for No-Look (β = 1.140, p < 0.001).
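
Because these β values come from logistic models, they are differences in log-odds, and exponentiating them gives odds ratios. The sketch below illustrates the conversion under two assumptions that are not confirmed by the text: that each coefficient contrasts Look-Up with the other behavior, and that a session-level random intercept is held fixed. The baseline accuracy of 0.60 is a hypothetical value, not a result from the study.

```python
import math


def odds_ratio(beta: float) -> float:
    """Convert a logistic-regression coefficient (log-odds) to an odds ratio."""
    return math.exp(beta)


def implied_accuracy(baseline_accuracy: float, beta: float) -> float:
    """Accuracy implied by shifting a baseline accuracy by beta on the log-odds scale."""
    baseline_odds = baseline_accuracy / (1.0 - baseline_accuracy)
    shifted_odds = baseline_odds * math.exp(beta)
    return shifted_odds / (1.0 + shifted_odds)


# Pal's reported coefficient for Look-Up versus No-Look.
pal_or = odds_ratio(1.140)  # roughly 3.1

# Hypothetical baseline: 60% correct for the comparison behavior.
pal_acc = implied_accuracy(0.60, 1.140)
```

On this reading, the odds of a correct choice were roughly three times higher on trials where Pal looked up than on trials where she did not look back.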

Fig. 5

Percentage of correct trials for difficult trials as a function of look-back behaviors in the delay condition

We also analyzed the response time when each look-back behavior was observed during the difficult trials. The results are shown in Fig. 6. Pal did not show any differences in response times between look-back behaviors, while Ai showed longer response times when she showed No-Look than when she showed Look-Up. Table S6 (OSM) shows the results of GLMMs with a lognormal distribution for these results, with Look-back behaviors, Correct/Error, and their interactions as fixed effects. For Pal, no differences were found between conditions. On the other hand, for Ai, there was a significant difference between Look-Up and No-Look in Difficult/Error trials (β = 0.533, p < 0.001). However, for both of these behaviors, response times were significantly longer for error trials than for correct trials (Look-Up, β = 0.182, p < 0.01; No-Look, β = 0.511, p < 0.001).

Fig. 6

Mean response times for difficult trials as a function of look-back behaviors in the delay condition

Discussion

We found that the two chimpanzees often looked up at the food dispenser immediately after their correct choices but not after incorrect choices during computer-controlled cognitive tasks (Fig. 1). In the present study, we examined whether such look-back behaviors were controlled by external feedback, such as chime and buzzer, or spontaneously emitted by them based on their judgments of confidence or uncertainty.

Both chimpanzees exhibited more Look-Up and less No-Look when the choice responses were correct, rather than incorrect, even during the delay period before feedback (Fig. 4 and Table S1 (OSM)). There were no differences in ratios of look-back behaviors after correct choices, irrespective of task difficulty. These results suggest that the consequences of their choices (i.e., correct or incorrect), rather than task difficulty (defined here by the uniformity of distractors, which corresponds to Hampton's environmental cues), are the critical factor for look-back behaviors. The fact that these look-back behaviors have been observed not only in the present study but also in other computer-controlled cognitive experiments conducted in our laboratory in the same manner (e.g., TheFriendsAndAi, 2012) also supports this conclusion. Therefore, these look-back behaviors seem to reflect our chimpanzees’ confidence in their task choices.

On the other hand, the chimpanzees' own behavior, such as the length of response times, might provide cues for look-back behaviors. The results shown in Fig. 6 indicate that, at least for Pal, response times did not vary among the types of look-back behaviors. However, Ai showed longer choice response times before No-Look responses than before Look-Up responses. In other words, we cannot fully rule out the possibility that the length of response times served as a discriminative cue for some look-back behaviors for Ai: she might have changed her look-back behavior based only on temporal discriminations instead of confidence or uncertainty. Yet, Ai also looked to the side (Side) after incorrect choice responses. In contrast with No-Look, there were no differences in response times between Look-Up and Side. Furthermore, she showed better accuracy when producing Look-Up behaviors than Side (see Fig. 5). These results suggest that, for Ai, No-Look and Side behaviors might reflect different degrees of confidence (cf. Brennan & Williams, 1995).

In the future, to rule out the influence of the length of response times on chimpanzees' look-back behaviors, we should examine their ability to discriminate the durations of their own behaviors (cf. Kelleher, 1957; Roberts, 1981). Additionally, we cannot rule out the possibility that the length of response times and other aspects of their own behaviors may influence metacognition. Koriat (2007) pointed out the bidirectional interaction between metacognitive monitoring and control. Not only does metacognitive control occur as a result of metacognitive monitoring, but the feeling of knowing, one way to monitor metacognition, may change as a result of control (Koriat et al., 2006). Furthermore, in humans, behavioral priming has been shown to cause higher confidence for incorrect judgments after participants were made to move faster than the natural rate of movement (Palser et al., 2018). Although the reproducibility of behavioral priming is often questioned (Lakens, 2014; Open Science Collaboration, 2015), examining metacognition within the framework of embodied cognition may shed new light on comparative-cognitive studies.

The look-back behaviors observed in the present study did not change the accuracy of the primary behavior. In other words, these secondary behaviors are not actually beneficial (Smith, 2009). Our results suggest that look-back behaviors may reflect metacognitive monitoring (Koriat, 2007) but may not have any metacognitive control properties. Most experimentally formed procedural metacognitive behaviors (Proust, 2019) in comparative studies affect the accuracy or reinforcement rate of the primary behavior. Even spontaneous behaviors reported in previous comparative studies could be considered beneficial, such as the information-seeking (peeking into tubes) reported by Call and Carpenter (2001). In the study by Beran et al. (2015), in which a delay period was introduced between the choice response and feedback as in our experiment, the reward became unavailable after a certain time, so the metacognitive behavior observed in their experiments is also beneficial to the primary behavior. Wavering reported by Allritz et al. (2021) could also be considered a type of information-seeking behavior similar to the look-back responses reported by Tomonaga and Matsuzawa (2002; cf. Bethell-Fox et al., 1984), since further confirmation of the stimulus would be possible during wavering. Metacognitive monitoring behavior that does not affect primary behaviors has been reported in humans (Paulus et al., 2013), but to our knowledge, the present study is the first report of such behaviors in nonhuman animals.

When faced with a problem-solving task, chimpanzees (as well as other nonhuman animals) spontaneously exhibit a variety of behaviors in addition to the responses directly related to the task. Among these secondary behaviors, some must surely reflect their metacognitive states (e.g., pupil dilation, Paulus et al., 2013, Lempert et al., 2015; eye-blinks, Declerck et al., 2006; self-scratching, Yamanashi & Matsuzawa, 2010; eye movements, Bethell-Fox et al., 1984, Roderer & Roebers, 2014; and head tilting, Dutemple et al., 2022). The investigation and examination of such spontaneous metacognitive behaviors will provide new directions for comparative-cognitive studies on metacognition (Nakao & Goto, 2015).