Similarity and structured representation in human and nonhuman apes



Introduction
Across the animal kingdom exists the capacity to extend familiar behaviours to novel but similar situations and objects. This makes similarity a fundamental concept within models of human and nonhuman cognition (Pearce, 1994; Rescorla & Wagner, 1972). Indeed, it has been shown to influence learning (Ross, 1984), memory (Simons et al., 2005), generalization (Osherson, Smith, Wilkie, López, & Shafir, 1990), categorization (Nosofsky, 1984), and even social behaviour (White, 2008). Similarity, however, is fundamentally in the eye of the perceiver (Goldstone, 1994b; Hahn & Chater, 1997; Medin, Goldstone, & Gentner, 1993). It is not a property of physical objects themselves, but rather a property of how an animal represents those objects. Specifically, similarity is a function of those aspects of an object that are encoded, and of the importance assigned to them. Theories of similarity, therefore, have close connections with theories of representation: how real-world objects are internally represented affects perceived similarity, and perceived similarity, in turn, provides insight into mental representation (Edelman, 1998; Hahn, 2014).
A major debate within human psychology concerns the role of relations in similarity: a table, for example, is not just a collection of features (tabletop, legs, colour, etc.) but these features arranged in a particular way. Human visual representation, therefore, is argued to involve so-called 'structured representations', that is, representations that involve both features and the relations between them (Biederman, 1987; Hafri & Firestone, 2021; Hahn, Chater, & Richardson, 2003; Markman & Gentner, 1993). Indeed, it has been shown that when two objects share a perceptual feature (e.g., 'red'), it contributes more to human similarity judgements when it appears in corresponding positions of a relational structure, referred to as a match-in-place (Goldstone, 1994a). For instance, if presented with an image of two people wearing coloured hats and shirts, the pair will be perceived as more similar if both hats are red and both shirts blue than if the hat and shirt colours are swapped for one person in the pair, despite the overall feature set remaining unchanged.
Attempts to formally measure similarity, both within cognitive psychology and machine learning, have thus started to move toward ways of calculating similarity over structured representations (Hahn et al., 2003; Markman & Gentner, 1993). However, many of the most popular models, particularly in the context of animal learning (Pearce, 1994; Rescorla & Wagner, 1972), still treat stimulus representation as a matter of decomposing stimuli into individual, task-relevant features, whether in a feature vector (Tversky, 1977) or a spatial representation that represents items as points in a multi-dimensional space (Shepard, 1957).
As we will outline below, the only way relational structure is encoded in these models is by treating relations themselves as 'features', in particular, as 'conjunctive features' (i.e., 'red' and 'hat' become 'red+hat').
While such 'featural' representations appear too simplistic for humans, at least in some contexts (Goldstone & Medin, 1994; Hahn et al., 2003; Markman & Gentner, 1993), the widespread use of featural coding schemes in the animal learning literature would seem to reflect an implicit (or even explicit) assumption that these may be adequate for some, or even all, nonhuman species. This assumption might have been fuelled further by the finding that nonhuman species have difficulty recognising purely relational similarities (Blough, 2001; George, Ward-Robinson, & Pearce, 2001; Haun & Call, 2009). The fact that nonhuman primates do not seem to be able to deal with such purely relational similarities (e.g., "equal/unequal", or "sameness") is of course distinct from the question of whether relational information impacts similarity judgements more generally. Needless to say, if nonhuman primates do not include relational information in their object representations, then such information cannot impact their similarity judgements. The fact that they might, however, as has been argued in some studies (Hopkins & Washburn, 2002; Huber & Lenz, 1993; Kirkpatrick-Steger, Wasserman, & Biederman, 1998; Zentall, Wasserman, & Urcuioli, 2014), does not prejudge if and how this relational information impacts perceived similarity. Just showing sensitivity to relational structure does not tell us much (if anything) about the similarity gradients that stem from matches or mismatches. To understand this, structural models of similarity from the human literature must be tested systematically on animal behaviour. Given the close connection between similarity and theories of representation outlined above, such an examination should elucidate not only similarity, in particular the limitations of feature-based models of similarity, but may also provide valuable insight into the mental object representations of nonhuman species.

The limits of featural coding
To address this question, we examined the nature of perceptual similarity judgements in three great ape species: humans, chimpanzees, and gorillas. Our main question was whether perceived similarity in chimpanzees and gorillas is, like that of humans, sensitive to 'structure'. To this end, we devised a simple stimulus set that allowed us to adjudicate between featural and structural models of similarity. Before outlining these stimuli and the model predictions in detail, we will first describe some of the fundamental issues facing featural models of similarity. This is best done with reference to a particular set of items, so we will use example items from our actual stimulus set. Each item in our stimulus set involves a pair of geometric figures, such as those shown in Fig. 1. Broadly, featural models of similarity treat each psychologically relevant aspect of an object as a single component, and the entire object is represented by all relevant components, whether these are represented as a feature set, a feature vector, or a point in a multi-dimensional feature space (Shepard, 1980; Tversky, 1977). To illustrate: the left stimulus in Fig. 1 (item 1) might be represented by the feature set {square, circle, white, black}.
However, it is not just simple attributes such as colour or shape that are potentially relevant to similarity but also the relations between these attributes: a particular colour and shape are bound together in the same component object (e.g., the square), and that object is arranged in a particular way relative to the second object (the circle). Featural representations capture such relational information only by turning the relation itself into a 'feature'. This type of conjunctive coding introduces a feature 'black+circle' to capture the fact that it is the circle, not the square, that is black, thus expanding the feature set to {square, circle, white, black, white+square, black+circle}.
Such an approach runs into trouble because it leads to a proliferation of 'features': all the basic elements and all their possible combinations must be retained to avoid the hyper-specificity that combinations would otherwise bring. To illustrate, if one were to consider only 'white+square+left' as a single compound feature, then item 1 and item 2 no longer share any features and would thus appear maximally dissimilar. In other words, one needs to retain both the component features ('white') and the conjunctions ('white+square') to account for commonalities that appear across different positions and/or objects. This requirement not only leads to a combinatorial explosion (the number of k-wise conjunctions is given by the binomial coefficient 'N choose k'; pairwise conjunctions alone add a further 45 features for N = 10 individual features, and a further 105 at N = 15), but all of these features potentially influence the similarity comparison, and counting them in assessing 'commonality' can itself lead to counter-intuitive distortions. For example, such proliferation makes it the case that items 1 and 2 have considerably fewer features in common than items 1 and 3.
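To make the scale of this proliferation concrete, here is a minimal sketch (ours, not from the paper) that reproduces the pair counts just mentioned:

```python
from math import comb

# Number of pairwise conjunctive features ('N choose 2') that a conjunctive
# coding scheme must add on top of the N basic features it already retains.
for n in (10, 15):
    print(f"N = {n}: {comb(n, 2)} extra pairwise conjunctions")
# N = 10: 45 extra pairwise conjunctions
# N = 15: 105 extra pairwise conjunctions
```

Retaining triples, quadruples, and so on (as full conjunctive coding would require) only compounds the growth.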
Representation schemes that allow one to represent relations are known as structured representations (Gentner, 1983; for more recent literature, see e.g., Doumas, Hummel, & Sandhofer, 2008; Doumas & Martin, 2018; Poldrack, 2020; Shepherd, 2018). The most popular examples of such schemes are graph structures or multi-place predicates as found in first-order logic, for example, a 2-place predicate such as TO-THE-LEFT-OF(x,y). Turning relations into features means that a binary relation such as TO-THE-LEFT-OF(x,y) effectively becomes a feature such as TO-THE-LEFT-OF-X(y). The crucial difference between these two schemes is that only the former, relational, representation separates out the relation and both its arguments in such a way that they can be accessed simultaneously and thus independently factored into the similarity comparison. TO-THE-LEFT-OF(x,y) might, for example, provide a relational match to TO-THE-LEFT-OF(q,r), thus providing a purely relational commonality across multiple object pairs such as those in Fig. 1. For the feature-based version of the same state of affairs, one is left simply with two distinct properties, TO-THE-LEFT-OF-X() and TO-THE-LEFT-OF-Y(). The same is ultimately true of the kinds of conjunctive coding schemes that are popular in associative and connectionist models (Blumberg & Sokoloff, 2001; Dickinson, 2012; Gluck & Bower, 1988).
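The difference between the two schemes can be made concrete with a toy sketch; the tuple encoding and feature names below are our own illustrative choices, not notation from the paper:

```python
# Structured scheme: the relation symbol and both arguments stay separately
# accessible, so a purely relational commonality can be detected.
relational_a = ("LEFT_OF", "square", "circle")   # TO-THE-LEFT-OF(x, y)
relational_b = ("LEFT_OF", "triangle", "star")   # TO-THE-LEFT-OF(q, r)

def share_relation(p, q):
    # Compare the relation independently of its arguments.
    return p[0] == q[0]

print(share_relation(relational_a, relational_b))  # True

# Featural (conjunctive) version: the arguments are folded into the feature
# name itself, leaving two distinct one-place 'features' with nothing shared.
featural_a = {"LEFT_OF_square(circle)"}
featural_b = {"LEFT_OF_triangle(star)"}
print(featural_a & featural_b)  # set() - no common feature
```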
It is for this reason that many theories concerned with the representation of real-world objects or events, whether these are faces, scenes, sentences, or extended narratives, assume that these cannot be represented on purely featural schemes (Biederman, 1987; Hahn et al., 2003; Markman & Gentner, 1993). Instead, they seem to require structured representations: complex representations of objects, their parts and properties, and, crucially, the interrelationships between them, that cannot be boiled down to either lists of features or points in space.
That said, it is extremely difficult empirically to distinguish between representation schemes and, with them, different approaches to measuring similarity. As the examples in Fig. 1 illustrate, similarity depends on representation. If only basic, elemental features (e.g., 'square', 'circle', etc.) are considered, but no conjunctions, for example, then items 2 and 3 are equally similar to item 1. Without independent specification of the representation of a pair of items, any degree of similarity between them can likely be generated simply by 'tweaking' the representation. However, our understanding of human cognition (let alone nonhuman cognition) is simply not advanced enough to provide a sufficiently detailed, and independent, specification of those representations.
Without independent constraint, however, establishing even very general contrasts, such as evidence simply for structure over purely featural representations, becomes incredibly difficult, because representational flexibility allows different types of account to mimic each other's predictions. Even a classic finding indicating the importance of structure in similarity judgements, such as the larger effect on similarity of features that appear within corresponding positions of a relational structure (i.e., matches-in-place), is subject to this. The example of the hats and shirts from above can be captured readily through the assumption of conjunctive features ('red+hat', 'blue+shirt') in addition to the basic features 'red', 'blue', 'shirt', and 'hat'. Matches-in-place simply give rise both to a match in terms of elements and to a match in terms of conjunctive features. Without simultaneously providing evidence against conjunctive coding as a sufficient, alternative explanation (which those studies do not provide), demonstrating the effect of matches-in-place may provide only rather weak evidence for structural representations.
Though difficult, we seek to show that such evidence against conjunctive coding is possible. The strategy for doing so is already hinted at in Fig. 1. Specifically, the contrast between featural and structural representation can be pursued successfully over a suitable set of items. While 'featural' reconstructions may seem locally plausible when considering just one or two comparisons, they can be shown to be globally implausible over an entire set of items. The crucial 'trick' here is to take whatever conjunctive features 'do the work' in one comparison and then identify another contrast that can be added to the set of similarity comparisons under consideration where those conjunctive features lead to difficulties, generating implausible predictions of similarity. The stimulus set presented in this paper was designed to do just that.

Devising a stimulus set for contrasting featural and structural models
Our stimulus materials are pairs of simple shapes and are based on a domain that has been successfully applied across species to study similarity (Hodgetts & Hahn, 2012; Hodgetts, Hahn, & Chater, 2009; Larkey & Markman, 2005), feature binding (Cheries, Newman, Santos, & Scholl, 2006), and analogical reasoning (Fagot & Thompson, 2011; Vonk, 2003). We constructed from this domain a set of items that would distinguish structural accounts from a variety of possible feature models. These stimuli are shown in Table 1. As can be seen, each stimulus comprises two coloured geometric figures which are always compared to the same reference (or target) stimulus. The fact that these items are a composite of two shapes makes it possible to readily manipulate featural and relational attributes of the stimulus. Our set involves seven such comparisons in total (labelled A to G), which vary systematically along two dimensions: shape and colour (Task 1) and shape and inner line orientation (Task 2; see Methods and Fig. 2A for more detail). For clarity, we will refer only to the Task 1 items below. Table 1 summarises the model predictions for each comparison. We next outline these models.
The first model considers only basic feature matches (see FEAT, Table 1). Feature matches are counted on each stimulus dimension separately (e.g., colour and shape) and can be matched multiple times (e.g., the feature 'blue' in the target item forms a match with both blue features in comparison E). Critically, this basic feature model cannot distinguish between comparisons that share the same features but in different spatial arrangements, that is, comparisons A to C. Given that these three comparisons appear perceptually distinct, a purely featural representation of this kind seems insufficient. At the very least, colour and shape need to be bound together into coherent objects, reflecting the fact that it is the circle that is red, not the square.
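As an illustration of why such a model is insufficient, the following sketch counts basic feature matches for a Task 1-style target and two rearrangements of the same features; the encoding and counting rules are our own reconstruction (the paper's exact derivations are in its Supplementary Methods):

```python
from collections import Counter

def feat_similarity(x, y):
    """Basic feature model (FEAT): count matches on each stimulus dimension
    separately, allowing a feature to match multiple times.
    x and y are tuples of (shape, colour) component objects."""
    matches = 0
    for dim in (0, 1):  # dimension 0 = shape, dimension 1 = colour
        a = Counter(obj[dim] for obj in x)
        b = Counter(obj[dim] for obj in y)
        matches += sum(a[f] * b[f] for f in a)  # every token matches every token
    return matches

target = (("square", "blue"), ("circle", "red"))
swap   = (("circle", "red"), ("square", "blue"))   # whole objects swapped
cross  = (("square", "red"), ("circle", "blue"))   # colours swapped across objects

# FEAT sees the same feature multiset in all three, so the comparisons tie:
print(feat_similarity(target, target),
      feat_similarity(target, swap),
      feat_similarity(target, cross))  # 4 4 4
```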
The ability to also code conjunctive features, such as 'blue+square', is a strategy designed to address this issue (see e.g., Wagner & Brandon, 2000, and references therein). Like the FEAT model described above, this conjunctive feature model (C-FEAT) can form multiple matches, such that basic features and conjunctions in one stimulus can match with multiple features/conjunctions in the other.

Table 1
Predicted similarities for our stimulus comparisons across a range of featural and structural models. The stimulus comparisons A-G are listed in the first column and are further described in Fig. 2. The remaining columns refer to the different models tested. The first four models listed are 'featural' models: a basic feature model (FEAT), a feature model that codes feature conjunctions (C-FEAT), a conjunctive model that also codes relative spatial location (CS-FEAT), and a feature model that matches features only if they appear in corresponding spatial positions (S-FEAT). Additional information about how these model predictions were derived can be found in the Supplementary Methods. Comparisons marked with an asterisk are discussed in the main text.

While such an approach is sufficient for distinguishing comparisons where features have swapped across objects (i.e., comparison C vs. comparison B in Table 1), such models cannot distinguish between identical items (comparison A) and a 'swap' (comparison B) unless they also code for the relative spatial location of features. Clearly the relative spatial location of features is relevant, at least for humans (Hodgetts et al., 2009). To capture this, we can specify a new conjunctive spatial feature model (CS-FEAT), in which relative spatial features are also added, both as single features and as components of more extended conjunctive features (e.g., 'left', 'blue+left', and 'blue+square+left' get added to our feature set or vector; see e.g., George et al., 2001; George & Pearce, 2003). This new model (CS-FEAT) now distinguishes comparisons A and B (i.e., identity and swap) at the expense of over-matching 'down the road' (see Table 1). Namely, the predicted similarities of comparisons D and E are now greater than that of B. Limiting feature matches to certain spatial locations (i.e., 'blue' matches 'blue' if and only if blue is in a corresponding spatial position) via a spatial-feature model (S-FEAT) seems a plausible way to address this, but renders comparisons B, F, and even G equally dissimilar! These stimulus materials thus draw out the fundamental limitation of features in capturing relational information. Specifically, tweaking the feature set to include conjunctive and/or relational features may seem locally plausible when considering just one or two comparisons, but can be shown to be globally implausible over an entire set of items.
The final two columns in Table 1 are structural models of similarity that have been applied extensively in past research (Hodgetts et al., 2009; Larkey & Markman, 2005). The first of these models, MIP (for 'matches-in-place'), draws upon existing models of structural alignment, which have been applied to capture human similarity judgements across a range of contexts, including perceptual similarity (Goldstone, 1994a; Larkey & Markman, 2005), as well as metaphor and analogy (Gentner & Markman, 1997). Critically, such models assume more structured, hierarchical representations, whereby local properties or attributes (e.g., 'red') form parts of whole objects (the hat), which in turn play a specific role within the broader relational structure. The alignment process itself (see e.g., Falkenhainer, Forbus, & Gentner, 1989; Gentner, 1983; Gentner & Markman, 1997) seeks to form matches that are structurally consistent across two representations, which requires that: a) an element in one representation must match with at most one element in the other representation (one-to-one mapping); and b) wherever relations are placed in correspondence, their respective arguments are also placed in correspondence (parallel connectivity) (Falkenhainer et al., 1989).
Underpinning these structural alignment models, as demonstrated in the 'hat' and 'shirt' example above, is the classic distinction between matches for elements that have been placed in correspondence (matches-in-place, or MIPs) and matches for elements that do not correspond (matches-out-of-place, or MOPs). Given the established impact of MIPs on similarity ratings in humans (Markman & Gentner, 1996), including within the stimulus domain used here (Hodgetts et al., 2009), our alignment model was based on the number of MIPs. Consistent with the one-to-one mapping constraint, a feature match on our MIP model only "counts" if the objects themselves have been placed in correspondence. For comparison C, for instance, the lower-level features 'blue' and 'square' (which make up the left-hand object in the target stimulus) map on to two separate objects in the right stimulus. Given the one-to-one mapping constraint, such many-to-one mappings, where a single feature/object in one stimulus is matched with two or more features/objects in a second stimulus, are not permitted under this MIP model. As a result, only a single match is counted: e.g., the blue square is matched with the blue circle on the basis of colour, and likewise the red circle is matched with only one object in the right stimulus, the 'red square', based on the shared property 'red'. This adherence to one-to-one mappings at the level of objects allows the MIP model to distinguish comparisons B and C, and also does not lead to the profound proliferation of features for other comparisons in the set.

Fig. 2.
(Partial caption.) Alternative combinations of these features make up the comparison stimuli, as outlined in the Methods. (B) Trial schematic for the nonhuman ape training session is shown on the left. This phase established their preference for the target items shown in panel A. Subjects received a food reward (grape) for selecting the target item over the everyday object stimulus. Training terminated when subjects reached the required criterion (80% correct). In the main experiment, which was near-identical for human and nonhuman participants (see Methods), baseline trials (target item vs. everyday object) were intermixed with 'test trials'. For test trials, target items were paired with one of seven stimuli from the set (which vary in their similarity to the target). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
The transformational model of similarity, or 'Representational Distortion' (RD) (Hahn et al., 2003), proposes that perceived similarity emerges from the complexity, or 'effort', of transforming the mental representation of one object or event into another. Measures of transformational complexity, and thus (dis)similarity, may range from continuous spatial transformations (e.g., translation, rotation, etc.), as seen in models of visual object recognition (Graf, 2006; Hahn, Close, & Graf, 2009; Lawson, 1999; Lawson & Jolicoeur, 2003), to sets of simple operations (insert, delete, swap, etc.), which can then be combined into longer codes to capture more complex transformational relationships (Hahn et al., 2003; Hahn & Bailey, 2005; Hodgetts et al., 2009). Conceptually, a single transformation may act upon individual features (or indeed continuous feature dimensions), or upon whole ensembles of features, or may manipulate the interrelationship between features or objects (i.e., structure; Hahn et al., 2003). In past empirical work, it has been shown that transformational model predictions can accurately capture human similarity ratings (Hahn et al., 2003; Hodgetts et al., 2009), speeded same-different judgements (Hodgetts & Hahn, 2012), and even analogical reasoning (Leech, Mareschal, & Cooper, 2007). Critically, it has also been shown that transformations provide superior fits of human similarity data when compared to basic feature models (Hahn et al., 2003; Toussaint, Matthews, Campbell, & Brown, 2012) and models of structural alignment (Hodgetts et al., 2009; see also Larkey & Markman, 2005).
The RD predictions in this study (Table 1) are derived from a simple coding scheme used previously (Hodgetts et al., 2009; Hodgetts & Hahn, 2012), which has been shown to accurately capture human perceived similarity within this stimulus domain. This coding scheme specifies three simple operations (create, apply, and swap), which can be combined to characterise the transformational relationships between the items in our stimulus set (Table 1). Transformational 'complexity' is then operationalised in this model as the number of such operations required by the shortest conversion of one object's representation into that of another. This model, by assuming swap-like operations, can distinguish between identity (comparison A) and spatial changes that act upon the same set of features (i.e., comparisons B and C; Table 1). A detailed specification of how the predictions in Table 1 are derived, not just for RD but for all of the models, can be found in the Supplementary Methods.
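To convey the flavour of such a shortest-path measure, here is a toy reconstruction (ours; the paper's operation set and costs are specified in its Supplementary Methods) that finds the fewest operations needed to turn one two-object representation into another:

```python
from collections import deque

# Stimuli in the paper's letter notation: each object is a (shape, colour)
# pair. The operation set below (whole-object swap, within-dimension swap,
# and overwriting a single feature value) is our illustrative stand-in for
# the paper's create/apply/swap scheme.
VALUES = "ABC"

def neighbours(state):
    (s1, c1), (s2, c2) = state
    yield ((s2, c2), (s1, c1))        # swap the two whole objects
    yield ((s2, c1), (s1, c2))        # swap shapes across objects
    yield ((s1, c2), (s2, c1))        # swap colours across objects
    for v in VALUES:                  # overwrite one feature value
        yield ((v, c1), (s2, c2))
        yield ((s1, v), (s2, c2))
        yield ((s1, c1), (v, c2))
        yield ((s1, c1), (s2, v))

def rd_distance(start, goal):
    """Breadth-first search for the fewest operations from start to goal."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, d = frontier.popleft()
        if state == goal:
            return d
        for nxt in neighbours(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))

target = (("A", "A"), ("B", "B"))                     # AB/AB
print(rd_distance(target, target))                    # identity: 0
print(rd_distance(target, (("B", "B"), ("A", "A"))))  # object swap: 1
print(rd_distance(target, (("A", "A"), ("A", "A"))))  # AA/AA: 2
```

Under this toy scheme, identity requires no operations, a whole-object swap requires one, and homogenising both dimensions requires two; the paper's actual predicted values are those given in Table 1.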
As noted in Hodgetts et al. (2009), structural alignment models and RD are not necessarily in conflict, and in many cases the preferred alignment between two object representations will be that which affords the simplest transformation between those representations (Graf, 2006). Likewise, transformational and featural, or indeed spatial, models (e.g., Shepard, 1957) are not necessarily in conflict in the sense that the former can be seen as generalizations of at least some featural or spatial models: generalizations that allow a broader range of 'transformations' including, crucially, ones that are sensitive to structure in ways that featural or spatial models are not.
For the purposes of the present investigation, what matters is that both structural alignment models and RD have been used successfully to provide experimental evidence for the importance of structure in human similarity judgements (e.g., Hahn et al., 2003; Hodgetts et al., 2009; Markman & Gentner, 1996; Toussaint et al., 2012). In other words, our interest is not in which of these models might be 'best'. Rather, we will use these models as a collective set of tools for probing the role of structure in perceived similarity for nonhuman primates, and thus the role of structure in nonhuman object representation. In short, this paper seeks to probe whether there is evidence for representational schemes that go beyond mere features, while remaining agnostic as to the specific ways in which such structural information might be encoded in the brain (for a selection of accounts, see e.g., Falkenhainer et al., 1989; Goldstone, 1994a; Taylor & Hummel, 2009).

The current investigation
As highlighted in the previous section, one of the key challenges when contrasting featural and structural models of similarity is that in many contexts it is possible for featural models to 'mimic' the predictions of structural models, particularly by turning structural information (e.g., information about bound objects and spatial position) into ad hoc features through the use of increasingly complex conjunctions. This has implications not only for evaluating different models of human similarity, but also, via the intimate connection between similarity and mental representation, for understanding the nature and complexity of the underlying object representations themselves. The way to avoid this mimicry is to have a carefully designed set of stimuli that allows us to demonstrate that particular ad hoc features, which may be effective at the level of individual comparisons, lead to counter-intuitive distortions across the whole set of comparisons. In this study, we have designed such a stimulus domain, which allows us to disentangle featural and structural models of similarity. By addressing the ad hoc mimicry of feature-based coding, we can provide much stronger evidence for structure sensitivity in both human and nonhuman species, while drawing out general dichotomies between featural and structural models of similarity and their implications for understanding human and nonhuman cognition.
To allow direct comparisons between ape species, each species group (human, chimpanzee, and gorilla) completed the same basic tasks, in which subjects had to press a specific target stimulus from two possible items on each trial (Fig. 2). The difficulty of doing this (as indicated by higher error rates in nonhuman apes and slower response times in humans) was assumed to be related to higher similarity between the target and the seven test items. We constructed two versions of the stimuli (labelled Task 1 and Task 2), each with the same underlying logical structure, but replacing the surface features of shape and colour with shape and the orientation of an inner line (see Fig. 2A). Colour, in particular, may be a core property underlying visual object discrimination and individuation in both human and nonhuman primate species (Gershkoff-Stowe & Smith, 2004; Mendes, Rakoczy, & Call, 2011). Thus, we sought to compare similarities for coloured objects (Task 1) with achromatic, single-colour stimuli that manipulated only shape-related information (outer shape/inner line). Our main question, however, was whether featural models, which seem to make counter-intuitive predictions from the perspective of human observers, do, in fact, capture similarity in nonhuman hominids.

Participants
Five chimpanzees (Pan troglodytes; 1 male, 4 females) and three gorillas (Gorilla gorilla; 3 females) took part in Task 1 (8 apes in total). One chimpanzee (Trudi) did not complete Task 2 as she did not reach criterion (80% correct) during the learning phase, resulting in N = 4 chimpanzees in Task 2 (7 apes in total). All subjects were housed at the Wolfgang Köhler Primate Research Center at Zoo Leipzig (Germany). They lived in social groups with conspecifics and had access to indoor and outdoor areas designed to be appropriate for their species. All subjects had touchscreen experience. Apes were tested individually within a familiar, indoor room, with the exception of one gorilla (Viringika). In Viringika's case, her young daughter accompanied her in the testing area; no obvious disruption to Viringika's performance was noted from this arrangement. Ten human volunteers (3 males, 7 females; mean age = 31 years; SD = 6.93) were tested at Cardiff University, School of Psychology. This was undertaken with the understanding and written consent of each participant. All participants had normal or corrected-to-normal vision. Informed consent was obtained after the nature and possible consequences of the study were explained to participants, in accordance with the local research ethics committee at Cardiff University. Further, the animals' care was in accordance with institutional guidelines at the Max Planck Institute for Evolutionary Anthropology and Zoo Leipzig.

Stimulus design
In each task, stimuli consisted of seven test stimuli, corresponding to comparisons A-G (Table 1 & Fig. 2), and seven photographs of everyday objects. Each test comparison involved two pairs of geometric shapes. Each pair was 128 × 128 pixels, and individual shapes were separated by a horizontal distance of two pixels, an inter-stimulus distance that has been shown to facilitate relational processing in nonhuman primates (Fagot & Parron, 2010). The everyday object photographs were scaled to approximately the same size as the test stimuli. All stimuli were presented on a 350 × 350-pixel touch-sensitive grey square. The test stimuli were defined on two feature dimensions for each task (Task 1: Dimension 1 = 'shape', Dimension 2 = 'colour'; Task 2: Dimension 1 = 'shape', Dimension 2 = 'inner line orientation'). The value of a given stimulus on each dimension can be represented abstractly using letters, where each unique letter refers to a unique feature (Task 1 shape: A = square, B = circle, C = triangle; Task 1 colour: A = blue, B = red, C = green; Task 2 shape: A = oval, B = diamond, C = rectangle; Task 2 inner line: A = vertical, B = horizontal, C = diagonal). The arrangement of features for each comparison followed the same logical structure on both tasks. The target stimulus in each task can be denoted AB/AB, that is, 'square to the left of circle' and 'blue to the left of red'. There were seven possible test trials corresponding to the comparisons in Fig. 2A: target stimulus versus stimulus AB/AB ('identity' or same trials); target stimulus versus stimulus BA/BA; target stimulus versus stimulus BA/AB; target stimulus versus stimulus AB/AA; target stimulus versus stimulus AA/AA; target stimulus versus stimulus CA/BA; target stimulus versus stimulus CC/CC (see Fig. 2A).
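For reference, the seven comparisons can be restated compactly in this letter notation; mapping the list order above onto the labels A-G is our reading of the set as described here and in Table 1:

```python
# Letter notation: first string = shapes, second string = colours (Task 1),
# read left object then right object.
TARGET = "AB/AB"  # square left of circle; blue left of red
COMPARISONS = {
    "A": "AB/AB",  # identity
    "B": "BA/BA",  # both whole objects swapped in place
    "C": "BA/AB",  # features swapped across objects
    "D": "AB/AA",
    "E": "AA/AA",
    "F": "CA/BA",
    "G": "CC/CC",
}
```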

Apparatus
The nonhuman tasks were presented on a 15-in. LCD monitor, mounted in a custom-built metal holder. The monitor was situated behind a 15-in. infrared touchscreen frame. The touchscreen frame replaced one of the standard safety panels that separated the nonhuman apes from the experimenter, becoming the de facto safety panel. Situated in front of the touchscreen frame was a thin Plexiglas panel with five circular finger holes cut into it. These were located in the centre and the top-left, top-right, bottom-left, and bottom-right corners of the panel. As well as forming an additional safety panel, the Plexiglas panel enabled the apes to touch the touchscreen frame safely. Located on the floor on either side of the touchscreen-metal frame, and facing toward the subjects, were two small speakers used to present audio feedback. A PC, outputting a display resolution of 1280 × 1024 pixels, was connected to the LCD monitor and touchscreen frame, and E-Prime 2.0 (Psychology Software Tools, Inc., Sharpsburg, PA) was used to run the experiment. Food rewards were delivered by hand through a plastic tube located next to the touchscreen-metal frame whenever a correct response was made. The setup was identical for the chimpanzees and the gorillas. The human version of both tasks was presented on a 13-in. laptop outputting a display resolution of 1280 × 1024 pixels, and responses were made using a mouse.

Nonhuman tasks

Task 1 - training phase
When an ape entered the testing area, they received two grapes and the testing session was started. Each trial began with a white fixation point (150 × 150 pixels) presented in the centre of a grey background (Fig. 2B). We defined a 350 × 350-pixel touch area around the fixation point; a touch within this area initiated a 150 ms inter-stimulus interval (ISI), followed by a training trial. On each training trial, the target stimulus and one of seven everyday object stimuli were presented concurrently (Fig. 2). Selection of the everyday objects on each trial was determined randomly without replacement. On each trial, the positioning of the target stimulus and picture was selected randomly from four pre-defined possibilities: target stimulus (top-left) versus everyday object (top-right), and vice versa; target stimulus (bottom-left) versus everyday object (bottom-right), and vice versa. A touch within the 350 × 350-pixel area of the target stimulus resulted in the immediate termination of the trial, presentation of a positive sound, and delivery of a food reward. A touch within the 350 × 350-pixel area of the real-world object stimulus resulted in the immediate termination of the trial, presentation of a negative sound, no food reward, and a 3000 ms delay screen as 'punishment'. Following this, the same trial was repeated (i.e., a 'correction' trial) until the subject selected the rewarded target stimulus. Following a 1000 ms grey-screen inter-trial interval (ITI), a new trial began with presentation of the white fixation point. Any touch made outside the pre-defined stimulus regions had no consequence. The number of trials within a session varied depending on the number of correction trials required. The minimum number of trials per training session (i.e., assuming perfect performance with no correction trials) was 70. Training proceeded until the subject had achieved a response accuracy of at least 80% correct over three consecutive task runs, at which point they were transferred to the test phase.
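The correction-trial contingency is perhaps easiest to see as a loop. The sketch below is a toy simulation of the logic just described, not the actual E-Prime 2.0 implementation; the simulated 'agent' and its accuracy parameter are invented for illustration:

```python
import random

def simulate_training_session(p_correct=0.9, n_trials=70, seed=1):
    """Toy simulation of the Task 1 training contingencies. Errors trigger a
    correction loop in which the same trial repeats until the target is
    chosen; p_correct is a made-up per-choice accuracy."""
    rng = random.Random(seed)
    presented = 0
    for _ in range(n_trials):
        while True:                      # correction loop for this trial
            presented += 1
            if rng.random() < p_correct:
                break                    # positive sound + grape, 1000 ms ITI
            # negative sound, no reward, 3000 ms 'punishment', trial repeats
    return presented

print(simulate_training_session())  # 70 if error-free; more with corrections
```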

Task 1 - test phase
As in the training phase, the nonhuman subjects received two grapes on entering the testing area, and then the touchscreen session was started. The first 21 trials of a session were identical to those of the training phase (i.e., the target stimulus paired with a real-world object), though no correction trials were presented (henceforth, 'baseline' trials). If a chimpanzee or gorilla chose an everyday object picture over the target stimulus, this resulted in the presentation of a 'negative' sound and no food reward. Following a 3000 ms delay screen ITI ('punishment'), a new trial began with presentation of the white fixation point. After completing these initial 21 trials, subjects could be presented with two trial types: 1) target stimulus versus everyday picture (i.e., baseline trials); and 2) target stimulus versus test items A to G (henceforth, test trials). Selection of the test stimuli was determined randomly without replacement. On each trial, the positioning of the target and test stimulus was randomly selected from the four pre-defined possibilities specified previously. In contrast to baseline trials, touching either the rewarded or unrewarded stimulus on test trials resulted in no sound being played and no food reward being given. Any touch made outside stimulus touch areas had no consequence. Following a randomly determined grey-screen ITI of between 1000 ms and 3000 ms, a new trial began with presentation of the white fixation point. One block consisted of 14 baseline trials and seven test trials (presented randomly), and each run of the task consisted of five blocks. Eight runs of the task were completed in total during the test phase.

Task 2 - training phase
The same procedure as in Task 1 was used, with the following exceptions: 1) a trial began with a black fixation point (150 × 150 pixels); 2) the background colour of the screen was always white; and 3) the screen positioning of the target stimulus and real-world object was pseudorandom, with the constraint that the target stimulus could not be presented on the left or right of the real-world object more than two times in a row. While there was no evidence of the nonhuman apes developing a side bias in Task 1, this control was added to reduce the possibility of a side bias developing in Task 2.

Task 2 - test phase
The same procedure as in Task 1 was used, with the exceptions noted above and one further exception: a maximum of four (unrewarded) test trials could be presented in a row. This was implemented because it was theoretically possible for up to 14 unrewarded test trials to appear consecutively in Task 1 (though this never occurred). Based on this change to the experimental design, one block in Task 2 consisted of seven test trials (as in Task 1) and between nine and 12 non-test trials (i.e., target shape versus everyday object picture). In each session, six blocks were completed. Overall, each session consisted of 126 trials (as in Task 1): 84 baseline trials (including the initial 21 trials) and 42 test trials.
Two chimpanzees (Alex and Jahaga) experienced a drop in performance on the baseline trials during the test phase. When this occurred, a correction stage was introduced whereby an incorrect response resulted in the presentation of a negative sound, no food reward, and, following a 3000 ms white-screen 'punishment', a repeat of the same trial. As in the training phase, such correction continued until the subject chose the target item. Both chimpanzees undertook two sessions of this modified procedure as soon as the issue arose, which quickly restored good performance on baseline trials.

Human tasks
Within each testing session, human participants completed both Task 1 and Task 2, the order of which was counterbalanced between participants. Unlike the nonhuman ape version of the task, there were no training phases; each task began with 7 baseline trials (versus the 21 in the nonhuman version) and comprised a single run (versus the 8 in the nonhuman version). Within a single run, the task structure and parameters were identical to the nonhuman ape version.

Structure matters for human and nonhuman species
We first examined the relationship between perceived similarity in each ape species and the predictions of our featural and structural similarity models. Assuming a power law relationship between similarity and response time (Cohen & Nosofsky, 2000; Hodgetts & Hahn, 2012), the best-fitting similarity model for human participants is a structural model, RD, across both tasks; this model captures 98% of the variance for Task 1 and 88% of the variance in Task 2 (Table 2). For the nonhuman apes, the best-fitting model is again RD for Task 1, capturing 58% of the variance, but for Task 2 it is outperformed by the CS-FEAT model, which captures 98% of the variance (the fits for each tested model can be found in Table 2).
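For concreteness, one simple way to obtain such power-law fits is linear regression in log-log space; the sketch below uses invented numbers and is not the paper's fitting pipeline:

```python
import numpy as np

def power_law_r2(model_similarity, response_time):
    """R^2 of a power-law fit rt = a * similarity**b, obtained by
    linear regression in log-log space."""
    x = np.log(np.asarray(model_similarity, dtype=float))
    y = np.log(np.asarray(response_time, dtype=float))
    b, log_a = np.polyfit(x, y, 1)           # slope b, intercept log(a)
    residuals = y - (b * x + log_a)
    return 1.0 - residuals.var() / y.var()

# Hypothetical model predictions and mean RTs for seven comparisons
# (higher predicted similarity to the target -> slower rejection):
print(power_law_r2([7, 5, 4, 3, 2, 2, 1],
                   [650, 610, 590, 560, 545, 540, 520]))
```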
This suggests that chimpanzees and gorillas processed the Task 2 materials, with their more artificial, shape-related dimensions (outer shape/inner line orientation), in a fundamentally different way to the more ecologically familiar shape-colour combination found in Task 1. In other words, there is a task difference on the basis of materials for chimpanzees and gorillas. While their performance is aligned with that of humans on one task, showing evidence for structure, their performance looks "featural" on Task 2. In the remainder, we provide analyses that follow up on this point, while further examining the putative role of structure in the perceived similarity of nonhuman apes.

Task differences emerge in nonhuman apes but not humans
The difference between these tasks, as well as the across-species structure sensitivity in Task 1, is confirmed by a global analysis that combines all the goodness-of-fit values (R-squared) for the 'featural' models listed in Table 1 into a single feature measure, and likewise all 'structural' models and their goodness of fit into a structure measure. To derive a simple metric of 'structurality' from this model space, we calculated a difference score between the structure and feature measures for each individual subject. A positive score on this metric indicates that the pattern of perceived similarity across comparisons, at an individual level, is better captured by structural models of similarity.
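A sketch of the score for a single subject (the R² values below are invented, and aggregating by the mean is our assumption about how the fits are 'combined'):

```python
from statistics import mean

# Structurality = combined structural fit minus combined featural fit.
featural_r2   = {"FEAT": 0.41, "C-FEAT": 0.48, "CS-FEAT": 0.55, "S-FEAT": 0.37}
structural_r2 = {"MIP": 0.71, "RD": 0.78}

structurality = mean(structural_r2.values()) - mean(featural_r2.values())
print(structurality > 0)  # True -> better captured by structural models
```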
Consistent with previous studies of both similarity and analogy (Hahn et al., 2003; Hodgetts et al., 2009; Markman & Gentner, 1993), all human participants (10/10) were more structural (Fig. 3A). Strikingly, the same pattern was also observed for the nonhuman ape species, with all individual subjects scoring positively on the structurality measure and exhibiting scores that fall in the centre of the human structurality distribution in Task 1 (Fig. 3A). This shows, therefore, that the structural models accounted for more of the variance in individual-level similarity data than the featural models across all human and nonhuman apes.
When this metric was evaluated for Task 2, in which human and nonhuman apes were only weakly correlated in terms of general patterns of similarity (Fig. 3C; Supplementary Results), the apes appeared more featural overall (Fig. 3A). The humans, however, were numerically even more structural in Task 2 (Task 1 mean = 0.26; Task 2 mean = 0.29). These observations were confirmed statistically: the structurality difference between human and nonhuman ape species was strongly dependent on task stimuli (F(1, 15) = 14.99, p = 0.002, ηp² = 0.5, BF10 = 2793.9; Fig. 3A).

Table 2
Goodness of fit (R²) for each similarity model tested. The fits for the default models (i.e., the predictions shown in Table 1) are shown in the upper table. The lower table displays the model fits when the relative weighting of each stimulus dimension (e.g., colour and shape) is allowed to vary parametrically (see Section 3.4 for further detail). The fits of the Contrast Model are also reported (labelled CMOD-SM; see Section 3.5 and Supplementary Methods). Both linear and power fits are shown; models are fitted to response time data for the human participants and to accuracy for the nonhuman ape (NHA) species. Prior work has shown that similarity-response time relationships are readily captured by a power law (Cohen & Nosofsky, 2000; Hodgetts & Hahn, 2012), and so power fits are used when assessing model fits of human data.

Follow-up Welch t-tests revealed no significant difference in the structurality measure in Task 1 (p = 0.45; BF10 = 0.55) but a significant difference in Task 2 (p < 0.001; BF10 = 594.6). Furthermore, while the human structurality scores did not differ between tasks (p = 0.62; BF10 = 0.35), the nonhuman ape group was significantly more 'featural' in Task 2 (p = 0.005; BF10 = 10.82). The same interaction emerges when considering the single best-fitting similarity model for each subject (shown in Table S1). Here, the best-fitting model is structural for the majority of the human participants in both Task 1 (8/10) and Task 2 (10/10), and for all chimpanzees and gorillas in Task 1 (8/8). The featural models, however, provide the best fit for 5/7 nonhuman apes in Task 2 (Table S1).
As the structurality measure includes a larger pool of featural models, it may underestimate the role of features by also considering models that provide a poor fit of the data in a given subject. To address this possibility, we recalculated the structurality measure in each subject (across species) by subtracting the fit of the single best-fitting feature model from that of the best-fitting structural model (see Fig. 3B). The same interaction was observed (F(1, 15) = 18.17, p = 0.001, ηp² = 0.46, BF10 = 1821.27; Fig. 3B), such that no significant difference was seen between groups in Task 1 (p = 0.65; BF10 = 0.47), but a difference was found in Task 2 (p < 0.001; BF10 = 2129.69).

Apes are sensitive to spatial location
The same picture also emerges when we consider individual item comparisons. As noted in the Introduction, the key challenge for featural models is distinguishing between identity and swap, that is, stimulus comparisons A and B (Table 1), a distinction that can readily be captured by structural models. In a featural model, however, this can only be achieved by adding some sensitivity to spatial location, such as the relative position of objects within the stimulus (see Section 1.2). But this then invariably creates problems with comparisons E, F, and G in Table 1, where the specificity introduced by spatial features (to distinguish identity and swap) distorts the similarity across the set as a whole. Notably, all the species groups can distinguish between identity and swap (i.e., comparisons A and B; Fig. S3). The nonhuman apes, for instance, are exactly at chance for comparison A but are significantly more accurate at identifying the target item for the 'swap' comparison in both Task 1 (p = 0.002; BF10 = 29.6) and Task 2 (p = 0.007; BF10 = 9.3). Incidentally, though it has been claimed that gorillas are less sensitive to structure than chimpanzees (Haun & Call, 2009), on our items all three gorillas were able to distinguish identity and swap above chance and indeed fall within the centre of the chimpanzee distribution on both tasks (Fig. 4).
For humans, the pair of stimuli in comparison E is highly dissimilar in both tasks (Fig. S3), but the feature models with spatial sensitivity (see 'S-FEAT' and 'CS-FEAT' in Table 1) fail to capture this, rendering the items in E more similar than those in B (i.e., the swap). This limitation of the feature models, however, is relevant to performance in the nonhuman group: while the stimuli in comparison E are perceived to be dissimilar in Task 1, 3/7 nonhuman apes found the items in E to be as similar as those in the identity comparison A! This indicates difficulty in accurately encoding relational information for stimuli based purely on line information, giving rise to the interactions observed in Fig. 3. Note, however, that these subjects' behavioural responses to E do not reflect a simple 'blindness' to one of the two feature dimensions in Task 2 (i.e., shape or line orientation), as we detail next.

Species-specific differences in dimensional salience
The relevance of specific feature dimensions to perceived similarity can be assessed by deriving model predictions for each dimension separately and then giving them different weights in determining similarity (100% colour, 90% colour/10% shape, and so on; see Hodgetts et al., 2009). This allows us to examine the performance of models for dimension 1 only, dimension 2 only, and all relative weightings in between (see Table 2 and Fig. S1A). Exploration of the best-fitting dimension weights for the well-performing models suggests that humans are sensitive to both colour and shape in Task 1, whereas chimpanzees and gorillas are, if anything, slightly more sensitive to colour, an observation that resonates with previous findings (Mendes et al., 2011). Importantly, the same analyses for Task 2 suggest that, in contrast to human subjects, the nonhuman apes struggled to incorporate the orientation of the inner line for the Task 2 stimuli (Fig. S1A). This finding is also confirmed by a model-independent analysis that examines patterns of equivalence between comparisons that would be obtained if participants were blind to one or both of the dimensions (see Supplementary Results and Fig. S1B).
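One simple way to implement such a weighting sweep is a linear blend of the single-dimension predictions; linearity and all numbers below are our illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def blend_dimensions(pred_dim1, pred_dim2, w):
    """Weighted combination of single-dimension model predictions, with
    weight w on dimension 1 (e.g., w = 1.0 -> colour only in Task 1)."""
    return w * np.asarray(pred_dim1, float) + (1 - w) * np.asarray(pred_dim2, float)

# Sweep relative weights in 10% steps (100% colour, 90/10, ..., 100% shape),
# here with made-up per-dimension predictions for three comparisons; in
# practice one would keep the weight giving the best fit to the data.
for w in np.linspace(1.0, 0.0, 11):
    print(round(w, 1), blend_dimensions([3, 2, 1], [1, 2, 3], w))
```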
Note, however, that a participant's focus on a single dimension could still provide evidence for structure sensitivity. For instance, human experimental studies that applied similar stimuli to evaluate a range of structural models considered variation in only a single dimension (Hodgetts et al., 2009). Likewise, in the present materials, the qualitative difference with respect to comparison E across Task 1 and Task 2 for nonhuman apes cannot be explained purely by selective dimension blindness, as comparison E is distinct from A regardless of whether a participant is paying attention to both dimensions or a single one (Fig. S2). Finally, it is worth noting that even in Task 2 both dimensions (inner line/shape) mattered to chimpanzees and gorillas, as seen from the fact that there is anecdotal evidence (in terms of Bayes Factor) for a difference between the judged similarity of comparisons A and D, which are identical with respect to the more salient outer shape dimension (t = 2.12, p = 0.08; BF10 = 1.4); this suggests some partial sensitivity to the inner line.

Common and distinctive features
Finally, it is worth commenting on the performance of Tversky's Contrast Model (Tversky, 1977). While virtually unknown in the animal literature, it is the most well-known featural model in human similarity research. The crucial feature of the Contrast Model is that similarity is a function of both the common and the distinctive features of each object in a comparison. Unlike the featural models outlined in Table 1, therefore, it does not just look for commonalities but also factors in those features or attributes that distinguish the objects (see Supplementary Methods for a full description of the model, and Tables 2 and S2 for overall and individual-level fits, respectively). Given the nature of our stimuli, the difference between counting just matches and counting both matches and mismatches is more apparent than real: the key value of Tversky's Contrast Model is realised in situations where object descriptions differ in complexity, such that the size (and weight) of the sets of distinctive features of each object vary. This, for example, is what allows the model to capture asymmetries in the context of directional similarity judgements. But these conditions are not met in our stimulus materials: every stimulus item consists of exactly two components, each of which has a colour and a shape. The number of matches is thus drawn from the same wider set of possibilities in each case, so that enumerating the matching features effectively identifies the complement set of non-matching features as well. As a result, we would not necessarily expect this model to perform vastly better than other featural models for our stimulus domain, even if distinctive features were relevant to all our participants. Indeed, while the Contrast Model is the best-performing featural model for the humans on both Task 1 and Task 2, it is not the best featural model for nonhuman subjects (see CMOD-SM in Table 2). In fact, no single featural model performs best across Task 1 and Task 2 for the nonhuman apes, a finding that obtains both overall and at the individual level (see Tables S1-S2). Similar to the individual-level fits reported in Section 3.2, the best-fitting model is structural for the majority of human participants in Task 1 (7/10) and Task 2 (10/10) when the Contrast Model is also considered (Table S2). For nonhuman apes, the Contrast Model does not provide the best fit for any single subject in Task 1 or Task 2 (Table S2). Finally, the key interaction between group and task in Fig. 3 is retained when the Contrast Model is included in the structurality scores (F(1, 15) = 12.8, p = 0.003, ηp² = 0.46; BF10 = 1164.935).
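For reference, the standard form of the Contrast Model (Tversky, 1977), where A and B are the two objects' feature sets, f is a measure of feature salience, and θ, α, and β are weights (the specific CMOD-SM variant used here is described in the Supplementary Methods):

```latex
S(a, b) = \theta\, f(A \cap B) - \alpha\, f(A - B) - \beta\, f(B - A)
```

Common features (A ∩ B) increase similarity, while the distinctive features of each object (A − B and B − A) decrease it; asymmetries arise when θ, α, and β are unequal and the two distinctive sets differ in size or weight.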

Discussion
Contemporary models of human similarity place critical emphasis on the role of 'structure' in the representation of real-world objects and events (Gentner & Markman, 1997; Hahn et al., 2003), meaning that both individual elements and, critically, how they are interrelated can influence perceived similarity. Despite the relevance of structural information to human perceptual similarity, object recognition, analogical reasoning, and so on, dominant models of animal learning and cognition (e.g., Pearce, 1994; Rescorla & Wagner, 1972) still assume that nonhuman similarity and generalization can be readily captured by simple, feature-based representation schemes. Challenging this view, we have systematically demonstrated in this study that structural models of similarity can, as in adult humans, precisely capture patterns of similarity judgements in nonhuman species, specifically great apes (chimpanzees and gorillas). By applying a novel measure of 'structurality', derived by contrasting a set of featural models with a set of established structural models of similarity, we found that all nonhuman apes demonstrated sensitivity to structure when discriminating basic object stimuli (coloured shapes). This provides strong evidence that nonhuman ape species utilise complex object representations that go beyond basic feature sets.
Importantly, the claim that similarity comparisons in nonhuman apes seem to involve structured representations is distinct from the question of whether they can or cannot recognise purely relational similarities. Prior work has shown that many nonhuman species have difficulty recognising similarities that are based entirely on abstract relational properties, such as identifying that two squares and two circles share the relational property of "sameness" (Blough, 2001; George et al., 2001; Haun & Call, 2009; see also Penn, Holyoak, & Povinelli, 2008; Tomasello & Call, 2007). Obviously, if relational information is not even encoded, purely relational matches are impossible; but where relational information is encoded, it still may not support arbitrary comparisons. Purely relational properties such as "sameness" are highly abstract, higher-level aspects of comparisons; that these may be challenging is entirely in keeping with our own results indicating that building structured representations out of stimuli composed of arbitrary feature combinations is already difficult for nonhuman primates.
While all nonhuman apes demonstrated a striking sensitivity to structure for coloured shapes, they were notably more featural in Task 2, which involved line information only (inner and outer contours). Additional model-based and model-free analyses suggested that nonhuman apes had particular difficulty integrating the outer shape with the inner line orientation. This interesting result aligns with prior work that highlights a central role for colour (versus shape) information in nonhuman primate object individuation and discrimination (Mendes et al., 2011; Santos, Sulkowski, Spaepen, & Hauser, 2002). While past cross-species work has found some congruence with human results with respect to the feature binding of colour and shape (e.g., Buračas & Albright, 1999; Cook, 1992; note that this does not mean nonhuman species can necessarily use these object representations in the same way that humans can, see Smith, Minda, & Washburn, 2004), research with other types of 'features' (such as the distinction between local and global features) has suggested the possibility of significant cross-species differences (e.g., Fagot & Tomonaga, 1999; Hopkins & Washburn, 2002) in how attributes of compound stimuli are perceived. In the context of our central question about the representation of relational structure, the contrast between performance in Task 1 and Task 2 highlights how the functional relevance of different forms of lower-level stimulus information may impact the expression of higher-order cognitive processes. For example, while colour was highly relevant in this visual discrimination paradigm, it may be that shape information would be weighted more heavily in contexts where it is ecologically/functionally relevant, such as when selecting stimuli for action, as seen in tool selection (Santos, Miller, & Hauser, 2003). Importantly, while the nonhuman apes failed to show structural sensitivity to the line-only stimuli of Task 2, they did show evidence of attending to both 'dimensions' (outline shape/inner line) in their discrimination. This suggests that the difference between the two tasks may reflect a performance difficulty rather than an in-principle limitation. Specifically, the highly salient, accessible, and familiar dimension of colour may be easier to process, leaving greater resources for the integration of the second dimension (even human participants were somewhat slower in Task 2, hinting at a visual processing advantage for the coloured objects in Task 1). In other words, the task differences for the nonhuman apes may reflect the kinds of performance differences seen widely for humans in cognitive and developmental psychology across familiar and less familiar materials. It thus seems possible that greater sensitivity to structure in Task 2 would emerge after more extensive training with the basic discrimination. This dissociation also suggests that requiring the binding mechanism to be wholly independent of the to-be-bound elements (as assumed to be definitional of structured representation in Doumas & Martin, 2018, p. 169) may be too strong a constraint on the meaning of structure in nonhuman species. Indeed, our results might be taken to suggest that it would be empirically and conceptually fruitful not to treat the issue of structure in nonhuman primates as all or none (Doumas & Martin, 2018).
That said, the results from Task 1 do mean that models that do not include relational information are too restrictive for nonhuman apes, at least some of the time. Featural or vector-based models of stimulus representation, and as a consequence similarity, continue to dominate cognitive psychology (Ashby, 1992; Galesic, Walkyria Goode, Wallsten, & Norman, 2018; Hout, Goldinger, & Brady, 2014), animal learning (Hall & Rodríguez, 2017; Holmes, Chan, & Westbrook, 2019; Luzardo, Alonso, & Mondragón, 2017; Pearce, 2008; Rasmussen, Zucca, Johansson, Jirenhed, & Hesslow, 2015), cognitive neuroscience (King, Groen, Steel, Kravitz, & Baker, 2019; Mur et al., 2013; Theves, Fernandez, & Doeller, 2019) and machine learning (Hamel, 2009). Though arguments about the empirical adequacy of such representation schemes have been repeatedly made, they continue to be popular, arguably, because it is empirically difficult to provide compelling evidence for the role of structure in determining visual similarity. In part, this stems from the difficulty of excluding 'mimicry' of seeming sensitivity to structure by featural models. It stems also, however, from the idea that structural similarities may be restricted in their role to 'higher-level' contexts, such as those involving some form of analogical reasoning (Hahn, 2014). This has made it seem plausible that structure does not matter to visual similarity in most, or all, nonhuman species. The present results refute this. Relational information has often figured as a fault line between researchers who view nonhuman species as capable of high-level abilities, such as causal reasoning (Beckers, Miller, De Houwer, & Urushihara, 2006; Blaisdell, Sawa, Leising, & Waldmann, 2006; Call, 2004), theory of mind (Tomasello, Call, & Hare, 2003), imitation (Tomasello, Carpenter, Call, Behne, & Moll, 2005) or mental time travel (Clayton & Dickinson, 2009; Martin-Ordas, Berntsen, & Call, 2013), on the one hand, and those who view seeming demonstrations of such abilities as the product of simpler associative processes (Dwyer, Starns, & Honey, 2009; Heyes, 1998, 2001; Penn & Povinelli, 2007), on the other. In this context, the present results emphasise the need to distinguish carefully between relations embedded within object and/or event representations and true higher-order relational reasoning (Penn et al., 2008). Indeed, these results underscore the need to develop more realistic models of the former, which include relational structure, even in species that show no evidence of the latter.
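To make concrete what such featural or vector-based schemes commit to, consider Tversky's (1977) contrast model, in which similarity is a weighted function of common and distinctive features. The sketch below (Python; the weights, feature names, and helper function are our own illustrative assumptions, not the paper's models) shows why a basic featural code cannot, by itself, penalise a colour swap: the swapped item has exactly the same feature set as the original.

```python
# A minimal sketch of a purely featural similarity measure: Tversky's
# (1977) contrast model. The weights and feature sets are illustrative
# assumptions, not values or code from the present study.

def tversky_contrast(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """sim(a, b) = theta*|A & B| - alpha*|A - B| - beta*|B - A|.
    Only which features are shared matters, not how they are bound."""
    a, b = set(a), set(b)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)

# A colour-swapped item (cf. items 1 and 2 in Fig. 1) has exactly the
# same basic feature set as the original, so a featural model cannot
# distinguish the swap from an identical copy:
item1 = {"white", "square", "black", "circle"}
item2 = {"black", "square", "white", "circle"}  # colours swapped across parts
print(tversky_contrast(item1, item1))  # 4.0
print(tversky_contrast(item1, item2))  # 4.0, an identical score
```

It is precisely this insensitivity that conjunctive features (e.g., 'white+square') are introduced to patch, and that patching is what creates the mimicry problem discussed above.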
On a methodological level, the present paper shows that effective tests for the role of structure in perceptual similarity can be derived by thwarting attempts at (plausible) mimicry through systematic expansion of the set of comparisons. Given such a test, the limitations of featural models for human perceptions of similarity emerge very clearly, and they emerge not only for chimpanzees but also for the more distantly related gorillas, at least for those types of stimuli that combine familiar object dimensions.

Fig. 1. A simple set of three stimuli (items 1, 2 and 3) for comparison. Below item 1 is a featural description in terms of the basic features (s = 'square', c = 'circle', w = 'white', and b = 'black'), the feature conjunctions (w+s = 'white+square', b+c = 'black+circle', etc.), and the features/conjunctions including relative spatial location (w+l = 'white+left', w+s+l = 'white+square+left', etc.). To the right are the features that items 2 and 3 share with item 1. Note also that the comparison 1∩3 contains two matches in terms of colour: the colour match between the two squares, and the colour match between the square in item 1 and the circle in item 3. This latter match feels like it should 'count for less', in line with prior work (Gentner & Markman, 1997), but on a conjunctive scheme it is dealt with by the conjunctions 'colour+shape' and 'colour+shape+relative location'. Even excluding the multiple match, however, item 2 remains less similar to item 1 than item 3 is.
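To make the three coding schemes in the caption concrete, here is a minimal sketch (Python) that enumerates them for one possible reading of the figure; the exact part colours, shapes, and locations of items 1 to 3 are assumptions reconstructed from the caption, not taken from the figure itself.

```python
# Sketch of the featural coding schemes in Fig. 1. Item configurations
# are assumed (reconstructed from the caption) and may not match the
# actual figure.

def basic_features(item):
    """Basic features: each part's colour and shape, ignoring arrangement."""
    return {f for colour, shape, _loc in item for f in (colour, shape)}

def conjunctive_features(item):
    """Feature conjunctions: colour bound to shape (e.g., 'white+square')."""
    return {f"{colour}+{shape}" for colour, shape, _loc in item}

def spatial_conjunctions(item):
    """Conjunctions including relative spatial location (e.g., 'white+square+left')."""
    return {f for colour, shape, loc in item
            for f in (f"{colour}+{loc}", f"{shape}+{loc}", f"{colour}+{shape}+{loc}")}

# Assumed configurations: (colour, shape, relative location) per part.
item1 = [("white", "square", "left"), ("black", "circle", "right")]
item2 = [("black", "square", "left"), ("white", "circle", "right")]  # colour swap
item3 = [("white", "square", "left"), ("white", "circle", "right")]  # two colour matches

for name, scheme in [("basic", basic_features),
                     ("conjunctive", conjunctive_features),
                     ("spatial", spatial_conjunctions)]:
    overlap_12 = len(scheme(item1) & scheme(item2))
    overlap_13 = len(scheme(item1) & scheme(item3))
    print(f"{name:11s}  shared(1,2) = {overlap_12}   shared(1,3) = {overlap_13}")
```

Note how the relative ordering of the two comparisons flips between the basic scheme and the conjunctive schemes; divergences of this kind are what allow a systematically expanded comparison set to tease the models apart.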

Fig. 2. (A) The stimulus comparisons used in Task 1 (left) and Task 2 (right). Each task has a target item comprising features A and B on dimension 1 (Task 1 = colour [blue/red]; Task 2 = line orientation [vertical/horizontal]) and A and B on dimension 2 (Task 1 = shape [square/circle]; Task 2 = shape [oval/diamond]). Alternative combinations of these features make up the comparison stimuli, as outlined in the Methods. (B) The trial schematic for the nonhuman ape training session is shown on the left. This phase established their preference for the target items shown in panel A. Subjects received a food reward (a grape) for selecting the target item over the everyday object stimulus. Training terminated when subjects reached the required criterion (80% correct). In the main experiment, which was near-identical for human and nonhuman participants (see Methods), baseline trials (target item vs. everyday object) were intermixed with 'test trials'. On test trials, target items were paired with one of seven stimuli from the set (which vary in their similarity to the target). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
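As a rough illustration of the combinatorial space from which such comparison stimuli can be drawn, the sketch below (Python) enumerates the two-part items that can be built by recombining the Task 1 features; the actual seven comparisons (A to G) are those specified in the Methods, and the target binding shown here is an assumption.

```python
# Illustrative enumeration of candidate comparison stimuli for Task 1.
# The target binding and the selection of seven comparisons (A to G)
# are defined in the Methods; this sketch only shows the space of
# recombinations.
from itertools import product

colours = ["blue", "red"]      # dimension 1 features (A and B)
shapes = ["square", "circle"]  # dimension 2 features (A and B)

parts = list(product(colours, shapes))    # 4 possible coloured shapes
stimuli = list(product(parts, repeat=2))  # 16 candidate two-part items

target = (("blue", "square"), ("red", "circle"))  # assumed target binding
assert target in stimuli
print(f"{len(stimuli)} candidate recombinations of the target's features")
```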

Fig. 3. Sensitivity to structure in both human and nonhuman ape species (labelled 'NHA' in the figure). (A) 'Structurality' scores were calculated by fitting each model to the similarity data of each individual subject (response time (RT) for humans and accuracy (acc.) for nonhumans) and subtracting the mean fit (R-squared) of the featural models from the mean fit of the structural models (see main text). Mean structurality scores are shown for both humans (red bars) and nonhuman apes (cyan bars) for both Task 1 (left) and Task 2 (right). A positive score indicates that an individual's data are better fit by structural (MIP, RD) than by featural models (FEAT, C-FEAT, CS-FEAT, S-FEAT). Individual data points are shown for each subject in Task 1 (human: n = 10; nonhuman: n = 8) and Task 2 (human: n = 10; nonhuman: n = 7). (C) Correlations between human and nonhuman ape similarity judgements for Task 1 and Task 2. From left to right, the graphs depict the correlations between humans and all nonhuman apes (left), humans and chimpanzees (left-middle), humans and gorillas (right-middle), and chimpanzees and gorillas (right). Each scatter plot contains seven data points, each reflecting mean RT (humans) and accuracy (nonhumans) for each stimulus comparison (A to G). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
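The structurality computation described for panel A can be sketched as follows (Python); the data and model predictions below are random placeholders, and the fitting step is reduced to a simple R-squared for illustration, which may differ in detail from the paper's exact procedure.

```python
# Sketch of the 'structurality' score in Fig. 3A: mean fit (R-squared)
# of the structural models minus mean fit of the featural models.
# All data and model predictions below are random placeholders.
import numpy as np

def r_squared(pred, obs):
    """Fit of one model's predictions to one subject's data."""
    return np.corrcoef(pred, obs)[0, 1] ** 2

def structurality(obs, structural_preds, featural_preds):
    """Positive values mean the subject's data are better fit by the
    structural models (e.g., MIP, RD) than by the featural models
    (e.g., FEAT, C-FEAT, CS-FEAT, S-FEAT)."""
    s = np.mean([r_squared(p, obs) for p in structural_preds])
    f = np.mean([r_squared(p, obs) for p in featural_preds])
    return s - f

rng = np.random.default_rng(0)
obs = rng.random(7)                             # one subject, comparisons A to G
structural = [rng.random(7) for _ in range(2)]  # placeholder structural predictions
featural = [rng.random(7) for _ in range(4)]    # placeholder featural predictions
print(structurality(obs, structural, featural))
```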

Fig. 4. Gorillas and chimpanzees are similarly sensitive to structure. The accuracy of each nonhuman subject on the 'swap' comparison B is shown for (A) Task 1 and (B) Task 2. The gorillas fall within the chimpanzee distribution on both tasks. The density distribution and 95% confidence interval for the chimpanzees are shown in blue. The individual data points for the chimpanzees and gorillas are depicted by blue and orange markers, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
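The check depicted in Fig. 4 can be sketched as follows (Python); the accuracies are placeholders, and the use of a bootstrap of the group mean is an assumption here, as the figure's interval could be constructed in other ways.

```python
# Sketch of the Fig. 4 comparison: do the gorillas' accuracies on the
# 'swap' comparison fall within the chimpanzees' 95% interval?
# All accuracies below are placeholders, not the study's data.
import numpy as np

rng = np.random.default_rng(1)
chimp_acc = rng.uniform(0.4, 0.9, size=6)  # placeholder chimpanzee accuracies
gorilla_acc = [0.55, 0.70]                 # placeholder gorilla accuracies

# Percentile bootstrap of the chimpanzee mean (one of several ways the
# interval in the figure might be constructed).
boot = rng.choice(chimp_acc, size=(10_000, chimp_acc.size)).mean(axis=1)
low, high = np.percentile(boot, [2.5, 97.5])
for g in gorilla_acc:
    print(f"gorilla accuracy {g:.2f} in [{low:.2f}, {high:.2f}]? {low <= g <= high}")
```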