A novel test of planning ability: Great apes can plan step-by-step but not in advance of action

The ability to identify an appropriate sequence of actions or to consider alternative possible action sequences might be particularly useful during problem solving in the physical domain. We developed a new 'paddle-box' task to test the ability of different ape species to plan an appropriate sequence of physical actions (rotating paddles) to retrieve a reward from a goal location. The task had an adjustable difficulty level and was not dependent on species-specific behaviours (e.g. complex tool use). We investigated the planning abilities of captive orangutans (Pongo pygmaeus) and bonobos (Pan paniscus) using the paddle-box. In experiment 1, subjects had to rotate one or two paddles before rotating the paddle with the reward on. Subjects of both species performed poorly, though orangutans rotated more non-food paddles, which may be related to their greater exploratory tendencies and bolder temperament compared with bonobos. In experiment 2 subjects could always rotate the paddle with the reward on first and still succeed, and most subjects of both species performed appropriate sequences of up to three paddle rotations to retrieve the reward. Poor performance in experiment 1 may have been related to subjects' difficulty in inhibiting the prepotent response to act on the reward immediately.


Introduction
Planning as an everyday concept has many connotations, and several terms are used more or less interchangeably to describe a myriad of behaviours that do not seem to have much in common (Parrila et al., 1996). At one end of the spectrum, planning can consist of anticipating the consequences of motor actions, for example grasping an object in an appropriate orientation (end-state comfort effect; Rosenbaum et al., 1990). This has been demonstrated to develop early in humans (by 19 months of age; McCarty et al., 1999) and also to have emerged early in primate phylogeny, being present in several lemur species (Chapman et al., 2010). At the other end of the spectrum lies episodic future thinking: the ability to mentally project oneself into an imagined future scenario (Suddendorf and Corballis, 1997). There is continuing debate regarding which, if any, nonhuman species possess this latter capacity, with some researchers presenting experimental evidence for animals imagining and planning for future events (Mulcahy and Call, 2006; Osvath, 2009; Osvath and Karvonen, 2012; Osvath and Osvath, 2008; Raby et al., 2007) and others arguing that foresight is a uniquely human ability (Suddendorf and Corballis, 2007; Suddendorf et al., 2009). Clearly, these two examples of planning, and the multitude of intermediate cases, must pose very different cognitive demands and vary in terms of their information processing requirements (Tecwyn et al., 2012).
Bearing this in mind, it is important to specify the type of planning that is of interest here, which is the type of planning that may be involved in problem solving that is oriented towards current needs. This can be defined as the ability to identify an appropriate sequence of actions or consider alternative courses of action prior to execution (see Tecwyn et al., 2012 for further discussion). Behaviours exhibited by wild great apes that may involve this type of planning include the use of 'tool-sets' for extractive foraging of honey by chimpanzees (Pan troglodytes: Brewer and McGrew, 1990); 'engineering' of alliances with the most profitable partners by bonobos (Pan paniscus: Aureli et al., 2008; Hohmann and Fruth, 2002); hierarchical processing of plant material by gorillas (Gorilla beringei beringei: Byrne et al., 2001) and gap-crossing in the compliant forest canopy by orangutans (Pongo pygmaeus: Chevalier-Skolnikoff et al., 1982).
How might planning for current needs be investigated experimentally? Several papers have advocated developing experimental methodologies and paradigms that consider different species' predispositions to allow testing of multiple species (Santos et al., 2006; Amici et al., 2010; MacLean et al., 2012), as at present systematic interspecific comparisons are still rare (Schmitt et al., 2012). This is important in order to avoid the presentation of tasks in an 'unfair' manner, hence biasing for or against the abilities of certain species (Roth and Dicke, 2005). To date, studies investigating planning for current needs in nonhuman species have mostly fallen into one of two categories: those involving the use of tools, and those involving computerised interfaces (but see e.g. Dunbar et al., 2005; Kuczaj et al., 2009; Miyata et al., 2011 for interesting alternative approaches).
Tool-use studies of planning, usually focused on sequential tool-use, or metatool use (e.g. Bird and Emery, 2009; Hihara, 2003; Martin-Ordas et al., 2012; Mulcahy et al., 2005; Taylor et al., 2007, 2010; Wimpenny et al., 2009) have yielded many interesting insights. However, they may not be ideal when attempting to develop a comparative planning paradigm, for at least two reasons. First, they bias against non-tool-using species, as the behaviours involved in solving the task may not form part of their natural repertoire, and may require fairly precise manipulatory abilities (e.g. sufficient motor control to hold a stick and insert it into a narrow tube). Second, there is evidence to suggest that removing tool-use from physical cognition problems can reduce cognitive load and improve performance (Seed et al., 2009). Therefore, if it is planning rather than tool-use that is the focus of study, it seems prudent to avoid the requirement for tool-use.
Studies involving computerised environments have also been used to investigate planning ability. These require subjects to use either a touch-screen or joystick, for example to navigate through a two-dimensional maze (e.g. Fragaszy et al., 2003, 2009; Miyata and Fujita, 2008; Pan et al., 2011) or to recall a sequence of numbers (Beran et al., 2004; Biro and Matsuzawa, 1999). Such techniques certainly have experimental advantages, such as precise timing of stimulus presentation and automatic recording of behavioural responses. However, they are expensive and time-consuming to implement, with subjects requiring extensive training to use the experimental apparatus prior to the start of testing. Furthermore, the physical and temporal distance between stimulus, response and reward, and the need for refined motor abilities can be problematic, particularly for younger individuals (Mandell and Sackett, 2008).
A further problem with these and other cognitive tasks such as the trap-tube paradigm (Visalberghi and Limongelli, 1994) is that initial errors made by the subject are often correctable. In trap-tube tasks for example, the reward can initially be moved in one direction, but the direction could be switched before the reward falls in a trap. Although error correction strategies can be enlightening (e.g. DeLoache et al., 1985), having the option of correcting an error may reduce the motivation of subjects to make the correct choice in the first place, or to plan for the correct solution (Tecwyn et al., 2012).
As well as considering the practical and paradigmatic issues raised above, it has been suggested recently by MacLean et al. (2012) that it would be fruitful for researchers to design tasks with an adjustable level of difficulty, in order to avoid the masking of meaningful variation due to floor or ceiling effects. In the case of planning during problem solving, it would be useful to have a task that could distinguish between, for example, the ability to make selections between alternatives (proto-deliberative; Sloman, 2010) and the ability to explore branching futures (fully deliberative; Sloman, 2010), which differ in terms of their computational burden.
The aims of this paper were two-fold. First, we aimed to design a new paradigm appropriate for comparative testing of planning ability in primate species (including humans) that:
• Did not involve complex tool-use
• Did not depend on species-specific behaviours/competences
• Had an adjustable level of difficulty
• Did not have a performance outcome that was dependent on a binary choice, in order to reduce the possibility of the task being solved by chance
• Was not correctable, to encourage subjects to choose correctly initially
• Could be configured in a trial-unique manner, so the task had to be considered anew for each trial.
Second, we aimed to use the new paradigm to investigate whether captive bonobos and orangutans (Pongo pygmaeus) are able to plan an appropriate sequence of actions (a) in advance (experiment 1); or (b) sequentially (experiment 2), in order to retrieve a food reward from a goal location. These species are of particular interest in the investigation of planning abilities from a comparative perspective because they represent our closest and most distant great ape relatives, respectively, and therefore potentially allow inferences regarding the evolution of planning ability to be drawn (Mulcahy and Call, 2006). If the ability to plan was present in the great ape last common ancestor, then we might expect both bonobos and orangutans to exhibit planning behaviour. If it evolved more recently in an African ape ancestor, then we might expect only bonobos to perform well in our planning task. If on the other hand orangutans outperform bonobos, this may suggest that orangutans have refined their adaptations (both anatomical and cognitive) for arboreal living, beyond those that were present in the great ape common ancestor. As the only great ape species to remain in the terminal branch niche (Grand, 1972) and therefore still face the locomotor demands as posited by Povinelli and Cant (1995), it seems feasible that orangutans have continued to face strong selection pressure for the ability to mentally 'try out' different possible courses of action, and may therefore potentially possess particularly refined planning skills.

Subjects and housing
Four bonobos housed at Twycross Zoo, UK and eight orangutans housed at Apenheul Primate Park and Ouwehands Dierenpark Rhenen in the Netherlands, participated in this study.
Not all subjects participated in all of the experiments, and in some experiments the number of trials completed varied between subjects. This was to comply with zoo-specific regulations relating to research. Details of which individuals participated in which experiments are given in Table 1 as well as the separate methods sections for each experiment below. The number of trials completed by different individuals is specified in the relevant sections. Bonobos at Twycross and orangutans at Ouwehands were naive with respect to cognitive testing, whereas orangutans at Apenheul had previously been exposed to a trap-tube type task reported in Tecwyn et al. (2012). The bonobos at Twycross Zoo were housed as two separate subgroups in one indoor building (124 m²) and shared an outdoor enclosure (588 m²), which the two subgroups had access to at different times during the day. They were fed a range of fruits and vegetables twice daily, and received additional feeds of egg, bread or cheese once or twice per week. Of the subjects that participated in this study, Keke, Banya and Kichele were in one subgroup and Cheka was in the other subgroup. The orangutans at Apenheul Primate Park were housed in four interconnected indoor enclosures (total 232 m²) and had access to eight outdoor islands (total 1000 m²). The orangutans at Ouwehands Dierenpark were housed in three interconnected indoor enclosures (total 370 m²) and had access to an outdoor enclosure (348 m²). They also had access to an outdoor system of ropes connected to wooden poles at a height of approximately 10 m, which extended out of the enclosure. Orangutans in both facilities were fed a range of fruits and vegetables two to three times per day, as well as ape biscuits/pellets. They received additional feeds of egg or bread two or three times per week to supplement their diet. Both orangutan groups were given access to different parts of their enclosure through opening and closing sliding doors.
The apes at all three institutions were managed with an attempt to simulate fission-fusion societies, so composition of the groups in the different sub-enclosures changed on a regular basis. Enclosures at all zoos were equipped with climbing elements including tree trunks, fibreglass poles, ropes, netting, shelves, platforms and enrichment materials. The study complied with the British, European and World Associations of Zoos and Aquariums (BIAZA, EAZA and WAZA) ethical guidelines and was approved by the ethical committee of the University of Birmingham as well as the management committee of each of the participating zoos.

Test apparatus and general experimental design
The paddle-box apparatus was attached to the outside of the enclosures and consisted of an opaque Perspex box (60 cm × 60 cm × 6 cm) containing eight rotatable paddles (14.5 cm × 3.5 cm × 1.7 cm; 1-8 in Fig. 1a) on three levels (i-iii in Fig. 1a). There were four possible goal locations (each measuring 11 cm × 4.5 cm × 4.5 cm; A-D in Fig. 1a) at the base of the apparatus that could all either be open or blocked.
The paddle-box was designed to be mechanically accessible to any animal capable of operating the simple paddle mechanism, making it ideal for comparative testing of a number of species, including non-tool-users. The paddles were rotated by subjects using wooden handles (7 cm × 2.5 cm × 1.7 cm) that extended out of the front of the box and were oriented parallel to the paddles inside the box (see Fig. 1b). The handles could be operated in a number of ways; for example by pushing down from above or up from underneath at either end of a handle, or by using a twisting action. They were designed to be large enough so they did not require fine motor control and thus reduce the chance of subjects accidentally turning them the wrong way. Once a paddle was rotated, directional choices were not easily correctable because the reward rolled quickly off the paddle. The experimenter (E.C.T.) could quickly and safely configure the paddle-box between trials by rotating paddles using long rods that extended out of the back of the box. Each paddle could be set up in one of three orientations (flat; diagonal left; diagonal right, see Fig. 1b for examples of these orientations). Paddles were held in position by weak magnets (Fig. 1c) so that they were easily rotatable by the subjects, but a moving reward did not displace them from their orientation.

General procedure
The experimental procedure varied between institutions in order to comply with the different zoos' regulations. All subjects were tested in off-show rooms (10-22 m²) where they were held regularly for feeding and during cleaning of the main enclosures. The bonobos at Twycross were not isolated for testing (in compliance with the institution's ethical guidelines), and consequently session length and the number of trials completed varied between individuals. Usually, however, a single bonobo monopolised the apparatus during testing (though the individual varied between testing sessions), and minimal competition for the apparatus was observed. Orangutans were tested in isolation apart from Sandy, who was accompanied by two dependent juveniles. Subjects were not food deprived before the trials, water was available ad libitum and they could choose to stop participating at any time. The food reward in each trial was a small piece of fruit (orange, apple, pear) or bread and subjects remained motivated to obtain the rewards throughout the study. Due to constraints imposed by the testing area dimensions, the paddle-box was presented to orangutans at ground-level, whereas for bonobos it was attached to the enclosure at a height of approximately 1.5 m (from the base of the paddle-box to the ground).

Familiarisation phase
There was a minimal familiarisation phase to confirm the ability of subjects to retrieve a reward from an open goal location. Each subject was presented with the apparatus with goals B and C open and A and D blocked, and the reward starting on paddle 7 (see Fig. 1a). Subjects could retrieve the reward by rotating paddle 7 in either direction and extracting the reward from one of the open goals. Once subjects had succeeded in retrieving the reward five times, they were able to progress to the testing phase. None of the subjects experienced the reward becoming trapped during familiarisation. All subjects except for one achieved the familiarisation criterion within a few minutes of first encountering the apparatus. One orangutan (Jewel) did not rotate the paddle with the reward on within a few minutes, so the experimenter demonstrated the rotation action to her. She subsequently succeeded in reaching the criterion for progressing to the testing phase.

Testing phase
Two experiments were carried out, each described in greater detail below. In both experiments, the reward could start on any paddle excluding paddles 1 and 3 (see Fig. 1a). The reason for this was that if paddles 1 or 3 were rotated towards the outer edge of the paddle-box, the reward could simply drop down to the bottom of the apparatus, missing out the paddles on the middle level. The minimum number of steps required to retrieve the reward in any given trial ranged from one to three and was pseudorandomised within each block, with the constraint that no more than two trials with the same number of minimum moves occurred consecutively. The paddle that the reward started on (the start paddle) and the level on which it was located (i-iii in Fig. 1a) were also pseudorandomised such that they were not the same in more than two consecutive trials. In all trials only one goal location was open and the other three were blocked. The open goal was white and visually distinct from the blocked goals that were black (see Fig. 1b). If the reward was successfully navigated to the open goal location it could be retrieved by the test subject from the front of the apparatus. If the reward became trapped at one of the blocked goal locations it could not be accessed by the subject and was removed from the back of the apparatus by the experimenter. For some trials it was possible to retrieve the reward in the minimum number of steps in multiple ways (a maximum of three), that is, there was more than one viable route from the start paddle to the open goal location. Impossible configurations, in which the reward could not be moved from the start paddle to the goal via any sequence of paddle rotations (e.g. reward starting on paddle 4 and open goal in location D, see Fig. 1a) were never presented.
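The pseudorandomisation constraint described above can be illustrated with a minimal sketch. It assumes a block of 12 trials with equal numbers of 1-, 2- and 3-step trials (the proportions are an assumption, not stated in the text) and uses simple rejection sampling; the function names `make_block` and `longest_run` are hypothetical and this is not the procedure the authors report using.

```python
import random


def longest_run(seq):
    """Length of the longest run of identical consecutive values in seq."""
    best = run = 0
    prev = object()  # sentinel that never equals a trial value
    for value in seq:
        run = run + 1 if value == prev else 1
        best = max(best, run)
        prev = value
    return best


def make_block(n_trials=12, step_counts=(1, 2, 3), max_run=2, seed=None):
    """Return the minimum-step count for each trial in one block.

    Assumption: each step count occurs equally often within a block.
    Orders are reshuffled until no step count occurs more than
    `max_run` times consecutively.
    """
    rng = random.Random(seed)
    pool = list(step_counts) * (n_trials // len(step_counts))
    while True:
        rng.shuffle(pool)
        if longest_run(pool) <= max_run:
            return list(pool)


if __name__ == "__main__":
    print(make_block(seed=1))
```

The same check could be applied to the start paddle and start level, which the text says were constrained in the same way.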

Data scoring and analysis
All trials were videotaped. For each trial, whether the reward was retrieved from the open goal location (correct) or became trapped (incorrect) was scored. In some trials, for example if a paddle was rotated very rapidly, the reward did not follow the path of the pre-positioned non-food paddles and ended up in an unexpected goal location, i.e. subjects were not rewarded when they should have been, or vice versa. If the reward ended up in a blocked goal location when the paddles were configured so that it should have ended up in the open goal it was scored as an 'unexpected trapping'. Conversely, if it ended up in the open goal location in this way it was scored as an 'unexpected retrieval'. In cases where a reward was 'unexpectedly trapped' despite the subject performing a valid sequence of paddle rotations, this was scored as correct. If a reward was 'unexpectedly retrieved' in this manner it was scored as incorrect. Information regarding each individual paddle rotation was also recorded. Specifically:
• paddle identity (1-8 in Fig. 1a)
• whether it was the start paddle (paddle on which the reward started, e.g. paddle 4 in Fig. 1b) or a non-food paddle (all other paddles)
• direction of rotation:
  • left or right
  • towards or away from open goal location (this information was not recorded for trials in which the start paddle was located directly above the open goal location, as was the case for start paddle 4 and goal B, and start paddle 5 and goal C, see Fig. 1a).
• Non-food paddles that were rotated were further classified according to:
  • whether they were relevant (rotation enabled the reward to be retrieved, e.g. paddle 7 in Fig. 1b) or irrelevant (did not need to be rotated for the reward to be retrieved).
  • the level on which they were located, relative to the level of the start paddle (same level; above; below).
  • timing of rotation (pre-reward insertion; whilst the reward was on the start paddle; after the reward had become trapped).
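The scoring rule for unexpected outcomes can be summarised in a short sketch. The function name `score_trial` and its two boolean inputs are hypothetical; the logic simply restates the rule above, under which a trial is scored on the validity of the rotation sequence rather than on where the reward physically ended up.

```python
def score_trial(valid_sequence: bool, reward_in_open_goal: bool) -> dict:
    """Score one trial following the rule described in the text.

    valid_sequence: the subject's paddle rotations should have led the
        reward to the open goal given the configuration.
    reward_in_open_goal: where the reward actually ended up.
    """
    if valid_sequence and not reward_in_open_goal:
        anomaly = "unexpected trapping"
    elif not valid_sequence and reward_in_open_goal:
        anomaly = "unexpected retrieval"
    else:
        anomaly = None
    # Correctness follows the rotation sequence, not the physical outcome.
    return {"correct": valid_sequence, "anomaly": anomaly}


if __name__ == "__main__":
    print(score_trial(valid_sequence=True, reward_in_open_goal=False))
```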
A second observer (J.C.) independently scored 20% of the trials. Inter-observer reliability was calculated using Cohen's kappa (k), and was excellent for all of the variables scored (experiment 1: k ranged from 0.90 (direction of start paddle rotation relative to goal location) to 0.98 (reward retrieval); experiment 2: k = 0.89 for reward retrieval and 0.98 for direction of start paddle rotation (left or right)). Data were analysed using PASW Statistics 18 (IBM SPSS Inc. 2009) and R 2.11.1 (LME4 package, R Development Core Team 2010).
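For readers unfamiliar with the statistic, Cohen's kappa compares the observed agreement between two observers with the agreement expected by chance from each observer's category frequencies. The sketch below is a plain-Python illustration with made-up labels; the function name `cohens_kappa` is hypothetical and this is not the software routine used in the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of trials")
    n = len(rater_a)
    # Proportion of trials on which the two observers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from the marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both observers used one identical category throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Illustrative codes only, not data from the study.
    first = ["correct", "correct", "incorrect", "incorrect"]
    second = ["correct", "correct", "incorrect", "correct"]
    print(cohens_kappa(first, second))
```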

Experiment 1: advance planning task
This task was presented first as it was considered to be the most difficult in terms of planning demands. Presenting an easier task first could potentially train subjects to succeed in a more difficult task, which we wanted to avoid.

Methods
All four bonobos and five orangutans (Amos, Jewel, Sandy, Anak and Radja) participated in experiment 1. Subjects were presented with up to 12 blocks of 12 trials. The total number of trials completed by each subject depended on two factors: a subject's availability for testing, and their performance in the first eight blocks. If a subject successfully solved any 2- or 3-step trials then they were presented with up to four additional blocks. This was because subjects that succeeded in this initial testing period through planning their actions might have been expected to show repeated success with additional testing, and we wanted to maximise the chance for subjects to display the ability to succeed at the task, should it exist. Table 2 gives details of the number of trials completed by each subject.

Paddle-box configurations
Within a block, each trial was a unique configuration of the paddle-box apparatus requiring a minimum of one, two or three paddle rotations to retrieve the reward. In 1-step trials the reward could start on any of the three levels (i-iii in Fig. 1a); in 2-step trials the reward either started on the middle or top level; and in 3-step trials the reward could only start on the top level. In 2- and 3-step trials subjects had to pre-position one or two non-food paddles before rotating the start paddle (see Fig. 2a for an example of a 2-step trial). Pseudorandomisation occurred as described in the General Procedure section. The open goal location was fixed within a block but changed between blocks. In each trial, only the start paddle was positioned in the flat orientation. All of the other

Table 2. Results of experiment 1 (advance planning) and experiment 2 (sequential planning). Number of trials correct and the number completed, and first trial performance for each trial-type (1-step, 2-step, 3-step). C = correct first trial; I = incorrect first trial; (-) did not participate.

Results
The number of trials completed ranged from 43 to 120 (Table 2). In 8.0% of all trials the reward was unexpectedly retrieved and 2.0% of trials resulted in an unexpected trapping. Most unexpected retrievals occurred when the start paddle was located directly above a goal location and the subject rotated it very rapidly, causing the reward to fall between the two paddles beneath and into the open goal location, rather than sliding down either one of them and becoming trapped. The number of 1-, 2- and 3-step trials in which the reward was correctly retrieved by each subject is shown in Table 2, together with the total number of trials completed by each subject and their first trial performance for each trial-type.
In the majority of trials (84.9% for orangutans and 98.3% for bonobos) only the start paddle was rotated. Based on a subject only rotating the start paddle in a trial, the probability of success in a 1-step trial was 0.5, because one of the two possible directions in which the paddle could be rotated resulted in the reward ending up in the open goal, whereas the other direction led to it becoming trapped. Only Amos (orangutan) performed significantly better than expected by chance (based on a 0.5 probability of success) across the 1-step trials he completed (Table 2; binomial test: 24/38 trials correct, P = 0.03). Even within the subset of 1-step trials in which the reward started on the bottom level, again only Amos' performance was above chance-level (binomial test: 16/20 trials correct, P = 0.01). Most of the subjects did not solve any of the 2- or 3-step trials in which one or two non-food paddles had to be pre-positioned before rotating the start paddle (Table 2). Three orangutans (Amos, Sandy and Radja) did retrieve the reward in some 2-step trials (see Table 2) and they did this by pre-positioning relevant non-food paddles in advance of rotating the start paddle (see supplementary Video 1 for an example).
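A binomial test against a chance level of 0.5 can be sketched as follows. This is an illustration only: it assumes an exact two-sided test that sums the probabilities of all outcomes no more likely than the one observed, and the text does not state which variant the statistical software applied. The function name `binomial_test_two_sided` is hypothetical.

```python
from math import comb


def binomial_test_two_sided(successes, trials, p=0.5):
    """Exact two-sided binomial P value.

    Sums the probability of every outcome that is no more likely than the
    observed number of successes under success probability p.
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
           for k in range(trials + 1)]
    # Small relative tolerance so outcomes tied with the observed one count.
    threshold = pmf[successes] * (1 + 1e-7)
    return min(1.0, sum(prob for prob in pmf if prob <= threshold))


if __name__ == "__main__":
    # 16 of 20 bottom-level 1-step trials correct, chance level 0.5.
    print(round(binomial_test_two_sided(16, 20), 4))
```

Under these assumptions, 16 correct out of 20 gives a P value of roughly 0.012, in line with the reported P = 0.01 for the bottom-level 1-step trials.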

Start paddle rotations
Two orangutans and one bonobo exhibited a significant tendency to rotate the start paddle to the right (binomial test: Anak: 82/110, P < 0.001; Jewel: 50/80, P = 0.034; Cheka: 62/91, P = 0.001) and one bonobo tended to rotate the start paddle to the left (binomial test: Kichele: 66/96, P < 0.001). The remaining subjects did not exhibit a directional preference. Fig. 3 shows that four out of five orangutans but no bonobos rotated the start paddle towards the open goal location significantly more often than expected by chance.
In some trials the start paddle was located directly above the open goal location; hence it could not be turned towards or away from the goal. Within this subset of trials, each of the four orangutans that preferentially rotated the start paddle towards the open goal location in the above analysis (Fig. 3) rotated the start paddle in a random direction (binomial test: P > 0.05 for all).

Fig. 3. Percentage of start paddles rotated towards (as opposed to away from) the goal location by each subject in experiment 1. Numbers at the base of bars indicate the total number of trials that each subject participated in that were included in this analysis. (*) indicates P < 0.05 and (***) indicates P < 0.001 in a binomial test.

Non-food paddle rotations
All of the orangutan subjects and two out of four bonobos rotated at least one non-food paddle during the experiment. The total frequency of non-food paddle rotations for all trials ranged from zero (Cheka and Keke) to 43 (Anak). Fig. 4 shows that orangutans rotated more non-food paddles (both relevant and irrelevant) than bonobos. Orangutans did not, however, rotate significantly more relevant than irrelevant non-food paddles (Fig. 4; Mann-Whitney U test: N1 = 48, N2 = 47, P = 1.000).
For subjects that rotated relevant non-food paddles, the first trial in which this occurred ranged from trial 1 (Anak) to trial 87 (Kichele; see numbers above bars in Fig. 4). Of the six subjects that rotated both relevant and irrelevant non-food paddles, four rotated a relevant paddle in an earlier trial than they rotated an irrelevant paddle (Fig. 4).
All subjects that rotated non-food paddles rotated more that were located below the starting level of the reward as opposed to on the same level or above. Overall, 75.3% of all non-food paddles rotated were below the level of the start paddle.

Discussion
Subjects generally failed at this task, even in 1-step trials (Table 2). In 93.1% of trials only the start paddle was rotated, so subjects rarely pre-positioned any non-food paddles, which was necessary for success in the 2- and 3-step trials. Three orangutans succeeded in some 2-step trials by pre-positioning relevant non-food paddles (Table 2). Although this may give an impression of an 'understanding' of the task in these particular trials, overall there was no significant difference between the number of relevant and irrelevant paddles rotated (Fig. 4), suggesting that subjects may simply have been rotating paddles at random. The position of the non-food paddles they did rotate (most frequently on levels below the start paddle) may however indicate that subjects were aware that paddles higher up in the apparatus were less likely to influence the path of the reward because the reward only ever moved down towards the bottom of the paddle-box. Bonobos very rarely rotated any non-food paddles (Fig. 4). However, the observed difference in propensity to rotate non-food paddles may reflect a difference in the two species' exploratory tendencies and temperament (Herrmann et al., 2011), or variation in testing conditions, rather than any difference in cognitive ability.
Although subjects generally only rotated the start paddle, four out of five orangutans (but no bonobos) did preferentially rotate the start paddle towards the open goal location (Fig. 3). Furthermore, in trials where the start paddle was directly above the goal, these same subjects turned the paddle in a random direction. While turning the start paddle towards the open goal did not enable subjects to succeed in the task, it suggests that they may at least have encoded information about the relevance of the open goal for retrieving the reward, and turned the start paddle so that the reward moved towards it. Subjects that did not preferentially rotate the start paddle towards the open goal may not have encoded the relevance of the open goal location, despite the fact that it was visually and haptically distinct from the blocked goal locations (see Fig. 1b). It is also possible that these subjects may have exhibited this behaviour, had they been given a small amount of pre-training so that they learned about how the reward moved depending on which way the start paddle was rotated. However, as there was no evidence for improvement in performance across sessions this is perhaps unlikely.
The failure in 2- and 3-step trials of subjects that apparently encoded the relevance of the goal location could either have stemmed from a lack of understanding of how non-food paddles affected the path of the reward, or their inability to inhibit the prepotent response to rotate the paddle with the food on (i.e. the start paddle).
Reaching directly for a desirable object is known to be a prepotent response, the prevention of which requires the ability to reject some alternative (inappropriate) actions and favour others (Diamond, 1990). The salience of the food reward on the start paddle may have meant that subjects were unable to divert their attention to other relevant aspects of the apparatus (i.e. the positions of the non-food paddles) (Vlamings et al., 2010). Food salience is known to affect the performance of several primate species in reversed contingency tasks, where subjects are presented with a choice between a small and a large quantity of food, but they receive the opposite of what they select (Boysen and Berntson, 1995). In the delay of gratification test on the other hand, apes have accumulated food items for several minutes before taking the rewards (e.g. Beran, 2002).
It is unclear what caused the subjects that seemingly encoded the relevance of the goal location to fail at this task. Possibilities included: (1) an inability to plan an appropriate sequence of actions, (2) an inhibitory control problem, and (3) a lack of understanding of how diagonally positioned non-food paddles influence the path of the reward. In the second experiment we eliminated the two latter possibilities to determine whether this improved subjects' ability to plan in the task.

Methods
Three bonobos (Cheka, Keke and Kichele) and seven orangutans (Jingga, Yuno, Amos, Jewel, Tjintah, Sandy and Anak) participated in experiment 2. Jingga, Yuno and Tjintah had not participated in experiment 1 and so had no previous experience with the apparatus apart from the familiarisation phase. All seven orangutan subjects were presented with four blocks of 12 trials (one block with the open goal in each of the four possible locations); the number of trials completed by the bonobos varied between subjects (Cheka: 55, Keke: 60, Kichele: 37).

Paddle-box configurations
In this experiment, all of the paddles were set up in a flat orientation at the start of each trial. The number of steps required to solve each trial was dictated by the level on which the reward started. As in experiment 1, all trials could be solved in one, two or three steps. The key difference here was that all trials could be solved by rotating the start paddle first, and then by rotating paddles on which the food was subsequently located, so subjects never had to pre-position non-food paddles. An example of how to retrieve the reward in a 2-step sequential trial is shown in Fig. 2b. In this trial there was only one correct route from the start paddle to the open goal location. However, in several of the 2- and 3-step trials the reward could be retrieved by taking a number of different routes. As in experiment 1, the start paddle and number of steps required to retrieve the reward (i.e. the start level) were pseudorandomised within each block. The open goal location was fixed within a block but changed between blocks.

Results
Overall performance ranged from 54.1% (Kichele) to 97.9% (Yuno) of trials correct (see Table 2 for performance in different trial-types and supplementary Video 2 for an example of a successful 3-step trial). However, because the probability of success varied between different trial-types, it was not possible to conclude whether or not individual subjects' overall performances were better than expected by chance. Therefore, five different trial-types were identified, the probability of success for each was calculated, and each subject's performance within each trial-type was assessed.

Performance in different trial-types
In all 1-step trials there was a 50% chance of success, based on the start paddle being rotated immediately (as was the case in experiment 1). The 2-step trials could be classified as those for which there was only one solution (i.e. only one possible route from start paddle to goal, as in Fig. 2b), and those for which there were two solutions (two viable routes from start paddle to goal). Similarly, 3-step trials could be split into those with only one solution, and those with three viable solutions.
It was possible to calculate the probability of retrieving the reward by chance in each of these 2- and 3-step trial-types based on the premise that subjects always rotated the paddle on which the food was located at any given point in a random direction. For example, to solve the 2-step trial in Fig. 2b (where there is only one valid route to the goal) the subject had to rotate the start paddle to the left (step 1 in Fig. 2b), then rotate the bottom centre paddle to the left (step 2 in Fig. 2b). The probability of this sequence occurring was 0.5 × 0.5 = 0.25. Therefore, for this trial-type there was a 25% chance of the reward being retrieved by chance. Having calculated probabilities of success for the different trial-types (see supplementary Fig. S.1 and the accompanying material for additional details), it was possible to examine subjects' individual performances using binomial tests, the results of which are shown in Fig. 5 (see supplementary Table S.1 for individual binomial test results).
Binomial tests were not used to assess performance in the 2 steps, 2 solutions trials (Fig. 5b), because the maximum number of trials of this type completed by a subject was four.
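The chance levels and binomial tests described above can be illustrated with a minimal Python sketch. It rests on two assumptions that are not stated in the main text: that each viable route corresponds to exactly one left/right sequence of rotations (which reproduces the chance levels reported for Fig. 5), and that the binomial tests were one-sided (probability of at least the observed number of successes). The function names and the example counts are hypothetical and are not data from the study.

```python
from itertools import product
from math import comb


def chance_of_success(n_steps, n_winning_sequences):
    """Chance of retrieval if every rotation direction is a fair coin flip.

    Assumption: each viable route maps to exactly one left/right sequence
    of length n_steps, so chance = winning sequences / all sequences.
    """
    all_sequences = list(product("LR", repeat=n_steps))  # 2 ** n_steps sequences
    return n_winning_sequences / len(all_sequences)


def binomial_upper_tail(successes, trials, p_chance):
    """Exact one-sided binomial P(X >= successes) under the chance level."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Chance level for each of the five trial-types described in the text.
CHANCE = {
    "1 step": chance_of_success(1, 1),
    "2 steps, 2 solutions": chance_of_success(2, 2),
    "3 steps, 3 solutions": chance_of_success(3, 3),
    "2 steps, 1 solution": chance_of_success(2, 1),
    "3 steps, 1 solution": chance_of_success(3, 1),
}

if __name__ == "__main__":
    for trial_type, p in CHANCE.items():
        print(f"{trial_type}: chance = {p:.3f}")
    # Hypothetical counts for illustration only (not taken from the study).
    p_value = binomial_upper_tail(9, 12, CHANCE["2 steps, 1 solution"])
    print(f"Hypothetical 9/12 correct at 25% chance: P = {p_value:.5f}")
```

Enumerating the rotation sequences explicitly, rather than writing 0.5 to the power of the number of steps, keeps the multiple-solution trial-types and the single-solution trial-types under the same calculation.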
One bonobo (Kichele) and one orangutan (Jingga) did not perform better than expected by chance in any of the trial-types (Fig. 5). Three orangutans (Yuno, Amos and Anak) and two bonobos (Cheka and Keke) performed better than expected by chance in all of the trial-types for which binomial tests were run (Fig. 5). The remaining three orangutans all performed above chance-level in all but one trial-type: Tjintah did not reach criterion in the 2 steps, 1 solution trial-type (Fig. 5d) and Jewel and Sandy failed in the 3 steps, 3 solutions trial-type (Fig. 5c).
Jingga, who was unsuccessful across all trial-types, was the only subject to improve his performance across testing blocks. In block 1 he retrieved the reward in 42% of trials, compared with a 75% success rate in his last block (Friedman test: χ²(1) = 4.0, P = 0.046). None of the subjects exhibited significant directional preferences when rotating the start paddle (binomial tests: P > 0.05 for all). This includes the three subjects that did exhibit directional preferences in experiment 1.
Two out of three orangutans (Jingga and Yuno) subsequently succeeded in an additional version of this task, in which the goal location was switched between trials within each block, as opposed to only between blocks (supplementary Fig. S.2). Unfortunately it was not possible to test any additional subjects in this version of the task, due to safety concerns associated with the proximity between the experimenter and subject that was necessary to switch the goal location between trials.

Factors associated with success
To explore the factors related to success in experiment 2, we fitted a generalised linear mixed model (GLMM) with binomial error distribution, using correct or incorrect sequence of paddle rotations as a binary response. We began by entering all probable explanatory terms and possible two-way interactions between them. The start-level of the reward, the location of the open goal, species and sex were included as fixed factors, as well as start-level × goal location as an interaction term. Subject was included as a random factor on the intercept, and trial number as a random effect on the slope (Crawley, 2007). Terms were sequentially dropped from the model until the minimal model contained only terms whose elimination would significantly reduce the explanatory power of the model (Thornton and Samson, 2012).
The full model (AIC = 491.3) showed that the start-level of the reward influenced the likelihood of subjects performing a correct sequence of paddle rotations (and hence retrieving the reward). Dropping the interaction term (start-level × goal location) significantly reduced the explanatory power of the model (likelihood ratio test comparing the two models: χ²(6) = 21.86, P = 0.0013) so this term was retained. Neither sex nor species significantly affected success, so these terms were dropped from the model. Trial number explained little variance in the model, indicating that the subjects did not improve over the course of the experiment (Crawley, 2007).
Fig. 5 (caption fragment). Dashed lines indicate the percent chance of retrieving the reward if the start paddle and subsequent paddles on which the food was located were rotated in a random direction: (a) 50%; (b) 50%; (c) 37.5%; (d) 25%; (e) 12.5%. (*) Indicates P < 0.05 in a binomial test. Binomial tests were not run for (b) because the maximum number of trials completed by a subject was 4, but the graph is shown for completeness.
The minimal model (AIC = 488.7) did not significantly differ from the full model in terms of explanatory power (χ²(4) = 5.38, P = 0.25). Post-hoc Tukey tests were used to investigate pairwise comparisons between the different start-levels. There was a significant difference in performance when the reward started on level 1 compared with level 3 (Z = 3.217, P = 0.004), but no difference between levels 1 and 2 or 2 and 3.
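As a rough check on the model comparisons reported above, the P values of the likelihood ratio tests can be recovered from the chi-square statistics and their degrees of freedom. The sketch below is not the authors' analysis code; it is a minimal standard-library illustration, and the helper name chi2_sf is hypothetical. It uses closed-form expressions for the chi-square upper tail that hold only for 1 or an even number of degrees of freedom, which covers the three statistics reported in this section.

```python
from math import erfc, exp, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution.

    Closed forms are used, so only df = 1 or an even df is supported here.
    """
    if df == 1:
        # For df = 1, X is the square of a standard normal variable.
        return erfc(sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        # For df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return exp(-half) * total
    raise ValueError("this sketch supports only df = 1 or even df")


if __name__ == "__main__":
    # Statistics and degrees of freedom as reported in the text.
    print(f"Friedman test, chi2(1) = 4.0:         P = {chi2_sf(4.0, 1):.3f}")
    print(f"Dropping interaction, chi2(6) = 21.86: P = {chi2_sf(21.86, 6):.4f}")
    print(f"Minimal vs full, chi2(4) = 5.38:       P = {chi2_sf(5.38, 4):.2f}")
```

Each result agrees with the reported P value to the precision given in the text.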

Discussion
Most subjects performed well in this task. Only one orangutan and one bonobo failed to perform better than expected by chance across any trial-type (Fig. 5), but the orangutan (Jingga) did improve significantly across testing blocks. These results suggest that in this experiment, the majority of subjects encoded the relevance of the open goal location and were able to plan an appropriate sequence of paddle rotations to retrieve the reward, or learned to do so during the experiment. It is particularly noteworthy that Yuno and Tjintah were successful given that they did not participate in experiment 1, implying that previous experience with the apparatus was not required for success in this task.
The GLMM results and Fig. 5 show that when the reward started on the top level subjects were significantly less likely to retrieve the reward compared with when it started on the bottom level. This is what would be expected if the number of steps that must be considered increases cognitive demand, as was found to be the case in sequential tool-use experiments with New Caledonian crows (Corvus moneduloides: Wimpenny et al., 2009) and great apes (Martin-Ordas et al., 2012). Interestingly, more subjects performed better than expected by chance in the 3 steps, 1 solution trials (12.5% chance of success) than in the 3 steps, 3 solutions trials (37.5% chance of success). In the 3 steps, 1 solution trials, the goal had to be located prior to rotating the start paddle, because if the direction of this first rotation was incorrect then the reward was subsequently impossible to retrieve. However, because the goal was always in position A or D (see Fig. 1a) for these trials, subjects could succeed by using the rule 'rotate paddle with food on towards the open goal'. Also, in the 3 steps, 3 solutions trials the paddles had to be rotated in different directions, whereas in the 3 steps, 1 solution trials every paddle had to be rotated in the same direction, which may have been less challenging from a motor control perspective.

General discussion
Using a new paradigm (the paddle-box) we were able to manipulate the demands involved in a physical planning task in which subjects had to retrieve a food reward from an open goal location. By designing an apparatus that is simple to operate, does not require complex tool-use and has an adjustable level of difficulty, we feel that we have gone some way to developing a test of planning ability appropriate for a range of species.

What evidence for planning?
Overall, subjects failed in experiment 1 but succeeded in experiment 2, though there was substantial inter-individual variation in performance in both experiments (see Table 2 and Fig. 5).
Although both experiments in this study required subjects to select between multiple possible sequences of actions, experiment 1 posed more complex information processing demands than experiment 2. As well as needing to encode information regarding how the diagonally positioned non-food paddles would influence the path between the reward's starting position and the goal, in 2- and 3-step trials subjects had to inhibit the prepotent response to turn the start paddle with the reward on immediately. In experiment 2, on the other hand, trials could be solved by always turning the paddle with the reward on first, because all of the paddles were in a flat orientation. This permitted the task to be solved in a more step-by-step manner, because the position of the reward relative to the goal location could be reassessed at each level.
The fact that subjects of both species could retrieve the reward when they were able to plan in a step-by-step manner (experiment 2) suggests that they did encode relevant task features such as the relevance of the open goal. The success of two orangutans (Yuno and Tjintah) in experiment 2 without having participated in experiment 1 also demonstrated that prior experience with the apparatus was not a prerequisite for success in this task; rather, experiment 2 was (as predicted) an easier task. Successful performance of most subjects in experiment 2 is in keeping with the 'one-element planning' demonstrated by chimpanzees during 2D maze navigation, where subjects made decisions at each choice point on the basis of one property (e.g. Euclidean direction to the goal; Fragaszy et al., 2003). However, in the 2- and 3-step trials that only had one possible solution in experiment 2 of our study, the initial paddle rotation had to be in the correct direction, otherwise the reward would have ended up in a location from which its retrieval was impossible. Therefore, in these trials, subjects had to plan their first move based on where the goal was located. Furthermore, in trials in experiment 2 where there were multiple possible correct sequences of action, orangutans solved them in a flexible manner, utilising different routes from a given start paddle to a goal, sometimes turning the start paddle away from the Euclidean direction to the goal (Tecwyn, personal observation). This suggests that they did not simply rely on a procedural rule based on turning paddles towards the goal. Orangutans and bonobos have previously exhibited planning skills in captive experiments. Martin-Ordas et al. (2012) recently demonstrated that all four species of great ape are able to use up to five tools in sequence to retrieve a reward. Both species have also been found to be capable of saving tools for future use (Mulcahy and Call, 2006).
The results of experiment 2 in this study provide evidence of the ability of captive orangutans and bonobos to plan an appropriate sequence of actions outside of a complex tool-using context.

Interspecific differences in paddle-box performance?
Unfortunately it was not possible to draw direct comparisons between the performances of the two species due to unavoidable methodological differences, particularly those concerning whether the subjects were tested individually or in a group. Generally speaking, individuals that are able to concentrate and are not distracted will perform better in cognitive tasks (Herrmann and Call, 2012), and attention is known to be important in planning tasks (Parrila et al., 1996). While orangutans were tested individually (apart from those with dependent infants or juveniles), bonobos were tested in their social groups. This may have disrupted their attention, and prevented them from perceiving and encoding relevant task features. Conspecifics could have attempted to steal the rewards, which may have introduced a competitive element and encouraged impulsive behaviour, depending on which other individuals were present (Stevens and Stephens, 2002). There was also the potential for subjects in the same subgroup (Keke, Banya and Kichele) to learn to solve the tasks through observation, but we found no evidence for this.
However, some differences between orangutans and bonobos were apparent in experiment 1, which when taken together with the findings of other experimental work warrant further investigation. Although neither species succeeded in experiment 1, four orangutans but no bonobos preferentially rotated the start paddle towards the open goal location (Fig. 3). It is possible that individuals that preferentially rotated the start paddle towards the goal were able to inhibit rotating the start paddle until they had attended to the goal location. There is some evidence to suggest that orangutans outperform other great ape species in other physical problem-solving tasks requiring inhibitory control (Albiach-Serrano et al., 2012; Vlamings et al., 2010), whereas other studies have reported an absence of interspecific differences (Vlamings et al., 2006; Uher and Call, 2008).
Inhibition of inappropriate actions may be important for efficient locomotion through the forest canopy (an idea that is touched upon by Vlamings et al., 2010). A large-bodied ape moving through the discontinuous, compliant forest canopy is faced with a vast amount of information to process, and must make correct decisions regarding which supports to use and which to avoid, as a wrong choice could result in a fall, causing serious injury, or even death (Thorpe et al., 2009). In this situation, the ability to attend to what lies ahead and mentally 'try out' different actions prior to choosing which route to take would be highly beneficial (Povinelli and Cant, 1995; Barth et al., 2004). Others have related apparent differences in inhibitory control skills in primates to differences in their social systems. Specifically, good inhibitory skills have been linked to species with high levels of fission-fusion dynamics (Amici et al., 2008), because of the need to assess a situation before acting, and respond in a way that is appropriate based on the current composition of the party. While both orangutans and bonobos are considered to experience high levels of fission-fusion dynamics (Amici et al., 2008), orangutans have a more extended, less cohesive social system. This means that intraspecific competition for food, which may promote impulsive food-grabbing behaviour, is relatively reduced in orangutans (Shumaker et al., 2001).
Three orangutans but no bonobos solved some of the 2-step trials in experiment 1 by pre-positioning relevant non-food paddles (Table 2). Orangutans rotated more non-food paddles than bonobos overall, but they were not necessarily relevant (Fig. 4). It is possible that this finding may be related to species differences in exploratory behaviour and temperament (bonobos have been shown to be shier of novel things than orangutans; Herrmann et al., 2011), rather than a difference in cognitive ability.

Why did apes fail in the advance planning task?
Negative results in tests of cognitive ability are notoriously difficult to interpret, because there could be several different causes of failure (Seed et al., 2012). Although experiment 2 removed inhibitory demands, it also eliminated the need to encode how diagonally positioned non-food paddles influenced the path of the reward, so it is difficult to determine the relative contributions of these factors to failure in experiment 1. One way to try and illuminate causes of failure in tasks designed to investigate a particular cognitive ability is to minimise peripheral demands that are simultaneously taxed during testing (Seed et al., 2012). For example, in the case of inhibitory control, it has been demonstrated that replacing food with tokens in the reversed contingency task enables subjects to inhibit the strong behavioural predisposition to select the larger quantity (Boysen and Berntson, 1995;Boysen et al., 1996;Kralik et al., 2002;Albiach-Serrano et al., 2007;Addessi and Rossi, 2011).
Another way of potentially reducing the inhibitory demands of the task presented in experiment 1 would be to enforce a delay between subjects seeing the paddle-box with the reward present and allowing them to respond. Children are known to be more likely to avoid making an inappropriate prepotent response when a delay as short as two seconds is enforced by the experimenter in several different tests of inhibitory control, and it has been proposed that this is because the delay permits time for passive fading of the prepotent response, rather than allowing time for active computation (Simpson et al., 2012). It would be interesting to see if young children, whose inhibitory control skills are known to show marked improvement between the ages of 3 and 5 years (Carlson and Moses, 2001), also struggled with the advance planning task before this age, and whether taking measures to reduce inhibitory demands (e.g. by replacing rewards with tokens or enforcing a delay) might improve their performance.