Hierarchical conceptual spaces for concept combination

We introduce a hierarchical framework for conjunctive concept combination based on conceptual spaces and random set theory. The model has the flexibility to account for composition of concepts at various levels of complexity. We show that the conjunctive model includes linear combination as a special case, and that the more general model can account for non-compositional behaviours such as overextension, non-commutativity, preservation of necessity and impossibility of attributes and, to some extent, attribute loss or emergence. We investigate two further aspects of human concept use, the conjunction fallacy and the 'guppy effect'. © 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Humans undoubtedly have the ability to form new concepts by combining existing ones. The development of effective representational models of this phenomenon could potentially shed light on human cognition. Human-like reasoning has been argued to be important to artificial intelligence for its flexibility and robustness [6,29,44]. Further, a good representation of human concept use will aid us in considering problems of categorization and typicality, as argued by Freund [18]. Applications of AI that must interact with humans via natural language arguably need to be able to understand, and to form for themselves, novel combinations of concepts. Examples of theories proposed to account for such concept combination include prototype theory together with fuzzy set theory [51], conceptual spaces [19], and quantum probability [3,9] approaches. Well-known counterexamples have been identified which suggest that fuzzy sets may not provide an appropriate formalisation in this context [25,27,40]. It is argued in [25] that the failure of fuzzy set theory to adequately model human concept combination results from its failure to consider the intension of concepts, i.e., the attributes that the concept possesses. In contrast, the conceptual spaces and the quantum approaches take intension into account, either by considering concepts as being composed of a combination of properties, which are themselves embedded in a space of quality dimensions, or by incorporating context into the model. Our proposed approach utilises a random set interpretation of membership so as to quantify an agent's subjective uncertainty about the extent of application of a concept. We refer to this uncertainty as semantic uncertainty [33] in order to emphasise that it concerns the definition of concepts and categories. Lawry and Tang [33] combine random set theory with conceptual spaces [19] and prototype theory [43] to give a formalisation of concepts as based on a prototype and an uncertain distance threshold, located in a conceptual space. We use this account of concepts to provide a framework for conjunctive concept combination which captures the effects seen in [25], including non-compositional behaviours such as overextension, non-commutativity, preservation of necessity and impossibility of attributes and, to some extent, attribute loss or emergence.
An outline of the paper is as follows. Section 2 overviews a range of theoretical approaches to concept combination from the literature, and summarises the results from experimental studies that we aim to model. Section 3 describes a random set and prototype theory representational model for concepts within a conceptual space. This model provides the theoretical underpinning for our work. Section 4 introduces a framework for concept combination based on a hierarchy of conceptual spaces, and in which compound concepts are defined within Boolean spaces. We prove a number of results showing the properties of this framework and compare this approach to others in the literature. Section 5 provides a discussion of our results and indicates possible future directions.

Background
In this section, we describe a number of approaches to concept combination that have been proposed. We consider general set-theoretic approaches, supervaluation theory, prototype theory, fuzzy set theory, conceptual spaces theory, approaches from computational linguistics, and quantum cognition approaches. We further describe some results from experimental studies with which we compare the theory we develop.

Set-theoretic approaches
Montague semantics [39] takes a model-theoretic approach to concepts and sentences. Concepts are defined using notions from set theory, and natural language expressions are modelled as functions or relations on these sets. This gives a description of how the semantics of a language interacts with the syntax, so that the meaning of a compound expression may be systematically derived from its parts. However, as discussed in [27,28], this is inadequate for modelling some types of adjectives. In [39], an adjective is viewed as a function from properties to properties. This allows sentences such as 'every small elephant is small' not to be branded as logically true, which is what we require. This enables various types of adjective to be modelled. Intersective adjectives are those where the application of the adjective may simply be viewed as an intersection of sets (such as 'red car'). Adjectives that are not intersective may be subsective, where the adjective-noun combination is a subset of the noun, or non-subsective, for example privative adjectives like 'fake' or 'former'. However, the theory of adjectives as functions on properties is inadequate, in particular because it does not account for comparatives, i.e. the ability to say that x is A-er than y. To account for this, Kamp introduces a theory of vague models, which are viewed as a nested sequence of partial models. In a partial model, a predicate assigns the value 1 to those objects which fall under the predicate, 0 to those that do not, and no value to those for which the predicate is indeterminate. These partial models may be completed in various ways, and the degree of truth of a sentence is related to the probability of a particular set of completions of a partial model of the sentence, conditioned on all sets of completions of the model. This set of completed models forms the basis for Kamp's supervaluation, in which a sentence has truth value 1 if it is true in all completions of the model, 0 if it is false in all completions of the model, and indeterminate if it is true in some and false in others.
Kamp's approach is similar to Fine's [17], which considers the questions of the correct logic for vagueness and the correct truth conditions for a vague language. Fine calls the possibility that logical relations hold between indefinite sentences penumbral connection, and truths that arise from such a connection penumbral truths, and argues that no natural truth-value approach respects such truths. He argues that differences in truth-value within penumbral truths concerning two predicates are essentially a difference in the way that these predicates can be made more precise. He describes a theory of super-truth, in which a sentence is true iff it is true in all admissible and complete specifications of the sentence.
Both these approaches use the idea that there are in fact precise ways of describing a concept, and that the truth value of a sentence using a vague concept is dependent on the different possible ways of making the sentence more precise. In what follows, we do not consider truth values of sentences but rather the typicality of an item to a concept. However, consideration of logics using the fuzzy sets we develop would be an interesting line of future work.
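The common scheme behind Kamp's and Fine's proposals can be sketched in a few lines of code. The sketch below is illustrative only: the completions (crisp cutoffs for 'tall') and the heights are invented values, not drawn from either theory.

```python
def supervaluation(sentence, completions):
    """Supervaluationist truth value: 1 if the sentence holds in every
    completion (precisification), 0 if it holds in none, and
    'indeterminate' if it holds in some completions but not others."""
    values = [sentence(c) for c in completions]
    if all(values):
        return 1
    if not any(values):
        return 0
    return "indeterminate"

# Toy example: 'tall' made precise by choosing a crisp height cutoff.
# Each completion is a cutoff in cm; these values are illustrative.
completions = [175, 180, 185]

# 'x is tall' evaluated under a given completion.
is_tall = lambda cutoff: 190 >= cutoff      # clear case: height 190 cm
borderline = lambda cutoff: 182 >= cutoff   # borderline case: 182 cm

print(supervaluation(is_tall, completions))     # 1: super-true
print(supervaluation(borderline, completions))  # 'indeterminate'
```

The borderline individual is tall under some admissible cutoffs and not under others, so the sentence receives no classical truth value, mirroring Fine's super-truth.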
Interestingly, [1] argue that adjective-noun combinations can be represented purely as set intersection between the adjective and the head noun. This is achieved by the use of typed sets, which are sets in which members are assigned types. So the adjective 'clever' is represented in the following way: Clever = {j : human, f : pet, f : policedog} : clever, where the interpretation of j is 'John', and the interpretation of f is 'Fido'. [1] argue that by using this type of representation the problems of privative adjectives can be circumvented. An example is as follows. From the two sentences 'Maria is a former teacher' and 'Maria is a programmer', we do not wish to infer 'Maria is a former programmer'. The typed set representation is as follows: Human = {m : human, ...} : human, Teacher = {m : teacher, ...} : teacher, Former = {m : teacher, ...} : former, Programmer = {m : programmer, ...} : programmer. Then, we can infer that m ∈ Former ∩ Teacher, but not that m ∈ Former ∩ Programmer. This is further extended to describe differences in scope when applying multiple adjectives. The approach described is interesting, and could presumably be extended to include some sort of typicality measure.
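The blocking of the unwanted inference can be sketched by encoding typed members as (individual, type) pairs and using ordinary set intersection; the encoding below is our own minimal rendering of the idea in [1], not their formalism.

```python
# Typed-set sketch: each member is an (individual, type) pair, so
# intersection only succeeds when both the individual AND the type match.
Teacher = {("m", "teacher")}            # Maria as a teacher
Former = {("m", "teacher")}             # 'former' applies to Maria-as-teacher
Programmer = {("m", "programmer")}      # Maria as a programmer

# m is a former teacher: ("m", "teacher") lies in Former ∩ Teacher.
print(Former & Teacher)      # {('m', 'teacher')}

# But m is NOT a former programmer: the typed intersection is empty,
# since 'former' holds of Maria only under the type 'teacher'.
print(Former & Programmer)   # set()
```

Plain untyped intersection over individuals alone would wrongly license 'Maria is a former programmer'; the type component is what blocks it.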

Fuzzy set theory and prototype theory
Prototype theory views concepts as being defined in terms of prototypes, rather than by a set of necessary and sufficient conditions. Elements from an underlying metric space then have graded membership in a concept depending on their similarity to a set of prototypical cases. There is some evidence that humans use natural categories in this way; see for example experiments reported in [43]. Fuzzy set theory [49] was proposed as a calculus for combining and modifying concepts with graded membership, and these ideas were then extended [51] to linguistic variables, these being variables taking words as values, rather than numbers. For example, 'height' can be viewed as a linguistic variable taking values 'short', 'tall', 'very tall', etc. The variable relates to an underlying universe of discourse Ω, which for the concept 'tall' could be R+.
Then each value L of the variable is associated with a fuzzy subset of Ω, and a function μ L : Ω → [0, 1] associates with each x ∈ Ω the value of its membership in L. Prototype theory gives a semantics for fuzzy set theory through the notion of similarity to a prototype, as described in [15]. In this context, concepts are represented by fuzzy sets and membership of an element in a concept is quantified by its degree of similarity to the prototype. Another possible semantic basis for fuzzy sets is random set theory (see [15] for an exposition). Here, the fuzziness of a set is a result of uncertainty about an underlying crisp set, i.e. semantic uncertainty. Fuzzy set theory seemed initially to be a natural formalisation of prototype theory, since it admits graded membership of concepts. However, work in this area has shown that it is inadequate as a model for human concept combination. A fuzzy set L is defined over a universe Ω via a membership function μ L : Ω → [0, 1]. Elements x ∈ Ω that are very good examples of the concept L have membership close to 1, whereas elements x that are bad examples of the concept have membership close to 0.
The conjunction of two fuzzy sets is defined purely extensionally, for example μ L1∧L2 (x) = min(μ L1 (x), μ L2 (x)), where min(a, b) indicates the minimum of the two values a and b. Then, overextension of conjunctions of concepts cannot be explained using standard conjunction operators within fuzzy set theory [25,27,40]. Two key examples of this are the conjunction fallacy [47] and the 'guppy effect' [40]. The conjunction fallacy is that humans often judge more specific conditions as more probable than more general conditions. For example, one might judge a bicycle that has been painted with polka dots to be more typical of the combined concept 'polka dot bicycle' than of the concept 'bicycle'. We discuss this further in section 4.4. The 'guppy effect' is introduced in [40], in which Osherson and Smith point out that a guppy, or goldfish, lacks many of the attributes of either a prototypical pet or a prototypical fish, whilst nonetheless being a prototypical example of a pet fish. These difficulties may be partly due to the failure of the fuzzy approach to account for the intension of concepts in the form of the attributes that the concept possesses. In contrast, conceptual space and quantum models are able to represent the intension of concepts, since in each case a concept is viewed as being embedded in a multidimensional space, whose dimensions are, in some sense, the required attributes.
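The structural reason that min-conjunction cannot produce overextension is immediate from the definition, as the following sketch shows. The membership values for guppy are hypothetical, chosen only to illustrate the inequality.

```python
def fuzzy_and(mu_a, mu_b):
    """Standard fuzzy conjunction: the minimum of the two memberships."""
    return min(mu_a, mu_b)

# Hypothetical membership values (not from the cited studies):
# a guppy's membership in Pet and in Fish.
mu_pet, mu_fish = 0.5, 0.6

mu_pet_fish = fuzzy_and(mu_pet, mu_fish)
print(mu_pet_fish)  # 0.5

# Since min(a, b) <= a and min(a, b) <= b, the conjunction can never
# exceed either constituent. Overextension, where humans rate 'pet fish'
# membership ABOVE 'pet' or 'fish', is therefore impossible under min.
assert mu_pet_fish <= mu_pet and mu_pet_fish <= mu_fish
```

The same bound holds for any t-norm conjunction, since t-norms are dominated by min; this is why the counterexamples in [25,27,40] cannot be repaired by simply swapping the operator.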
There is an important distinction between being a typical example of a concept and being a member of a concept. It is entirely possible for something to be a member of a concept, but not typical of it. For example, a duck-billed platypus is a mammal, but it is not a typical mammal. This is explained in [45] by saying that concepts have defining and characteristic features, and that these determine concept membership. In [24,26] Hampton argues against this hypothesis. He argues that membership in a conjunction of concepts may be determined by placing a threshold on a judgement of similarity of an item to a composite prototype, so that the two notions of typicality and membership may be attributed to one common cause, and furthermore, that judgements of typicality are correlated with probability of categorization. We subscribe to Hampton's view, as will be seen, allowing for the different weighting of particular attributes, which can thereby contribute to the typicality of an item to a concept. We do not discuss the similarity threshold at which a judgement of membership in a concept should be made. However, we often use the notion of membership in a fuzzy set as a proxy for typicality, and in particular we use the terminology 'membership function' and 'membership value'. These should be seen as akin to typicality ratings in all that follows.

Conceptual spaces
Conceptual spaces are proposed in [19] as a framework for representing information at the conceptual level. Gärdenfors contrasts his theory with both a symbolic, logical approach to concepts and an associationist approach in which concepts are represented as associations between different kinds of basic information elements. Rather, conceptual spaces are geometrical structures based on quality dimensions such as weight, height, hue, brightness, etc. It is assumed that conceptual spaces are metric spaces with an associated distance measure. This might be Euclidean distance, or any other appropriate metric. The distance measure can be used to formulate a measure of similarity, as needed for prototype theory, according to which similar objects are close together in the conceptual space and very different objects are far apart.
To develop the conceptual space framework, Gärdenfors also introduces the notion of integral and separable dimensions. Dimensions are integral if assignment of a value in one dimension implies assignment of a value in another, such as depth and breadth. Conversely, separable dimensions are those where there is no such implication, such as height and sweetness. A domain is then defined as a set of quality dimensions that are separable from all other dimensions, and a conceptual space is defined as a collection of one or more domains. Gärdenfors goes on to define a property as a convex region of a domain in a conceptual space. Finally, a concept is defined as a set of such regions that are related via a set of salience weights. This casting of (at least) properties as convex regions of a domain sits very well with prototype theory, as indeed Gärdenfors points out. If properties are convex regions of a space, then we can say that an object is more or less central to that region. Because the region is convex, its centroid will lie within the region, and this centroid can be seen as the prototype of the property.
There are a few approaches to defining concept composition based on conceptual spaces. Firstly, Gärdenfors proposed that when combining a pair of concepts as he defines them, properties in one concept are replaced by properties from the other, depending on the salience, or weighting, of each concept and each property. He goes on to introduce the notion of a contrast class, which has the effect that a particular property is restricted to a certain area. For example, when talking about red wine, the concept 'red', determined by the contrast class 'wine', is a subset of the standard concept red. In order to model this, Gärdenfors maps the whole of the colour domain onto the subset of colours that can apply to wine. The formal rule for concept combination is then that the combination CD of two concepts C and D is determined by letting the regions for the domains of C, confined to the contrast class defined by D, replace the values of the corresponding regions for D. So in the example of 'red wine', the space of colours has been restricted by the noun 'wine' to a subset of the full colour space with the same geometry. The colour of the wine is then taken to be the colour that is occupied by 'red' within the restricted colour space.
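The contrast-class mapping can be sketched as a rescaling of a normalised property dimension into the subregion occupied by the class. The linear rescaling and all numerical values below are our own assumptions for illustration; Gärdenfors only requires that the restricted space keep the same geometry.

```python
def restrict(value, lo, hi):
    """Map a normalised property value in [0, 1] into the subregion
    [lo, hi] of the dimension occupied by the contrast class, e.g.
    'red' judged only among the colours that wine can take.
    Linear rescaling is an assumption made for this sketch."""
    return lo + value * (hi - lo)

# 'red' as a normalised point on one dimension of the colour space.
red = 0.9

# Suppose (purely for illustration) the colours attainable by wine
# occupy the subinterval [0.4, 0.8] of that dimension.
red_wine_colour = restrict(red, 0.4, 0.8)
print(round(red_wine_colour, 2))  # 0.76
```

The point labelled 'red' in the full space is thus re-interpreted as a different, darker point once the contrast class 'wine' has shrunk the dimension, which is exactly the behaviour described for 'red wine' above.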
Consistent with this high-level description of concept combination, we now describe two more formal approaches based on conceptual spaces. Adams and Raubal [2] give a fairly straightforward formalisation of Gärdenfors's account, within which a conceptual space consists of a 6-tuple of domains, concepts, instances, contrast classes, contexts, and a similarity sensitivity parameter. Each domain is a set of quality dimensions. A concept is defined as a pair consisting of a set of convex regions of domains together with a prototypical instance P, and a property as a concept that includes only one domain region. A contrast class is defined as a region of a unit hypercube corresponding to a domain. Although it is not entirely clear why a unit hypercube is used rather than the domain itself, this is presumably a way of normalising the dimensions of the domain before the contrast is applied. A context is defined as a finite set of salience weights. In [2], Adams and Raubal go on to define three types of concept combination: property-concept, concept-concept and contrast-class-concept combination, and give algorithms for the implementation of each. However, they do not attempt to account for the fuzziness of natural concepts, or give any account of non-compositional features.
Another approach that gives a formal definition of conceptual spaces is described by Rickard et al. in [42]. This views a concept as a function from pairs of properties into the unit interval. These properties are defined as fuzzy sets in a domain Dom i . A concept C is therefore a set of correlations between pairs of properties (a, b), where a, b belong to a set of properties. Each pair of properties (a, b) has a value C ab in a concept, which gives the strength of the correlation of a and b in the concept. For example, consider the concept Banana. The property yellow and the property sweet are highly correlated, and the property green and the property bitter are highly correlated. A context is defined as a set of properties, and the similarity between two concepts C 1 and C 2 as the mutual subsethood of C 1 and C 2 relative to that context. The mutual subsethood functions as a way of determining the overlap of two concepts. It is defined by S(C 1 , C 2 ) = Σ (a,b) min(C 1 ab , C 2 ab ) / Σ (a,b) max(C 1 ab , C 2 ab ), where C ab is the value of the correlation between properties a and b in concept C and the sum ranges over the pairs of properties determined by the context.
For example, suppose our context is the set {red, round} and our objects are C 1 = apple, C 2 = cricket ball. The strengths of the correlations are given in Tables 1a and 1b. The role of the context is to determine which properties are relevant in calculating the similarity. The membership of an observation in a concept is defined as the similarity of the observation to a given concept, and the label which has maximum membership for the given observation is then assigned. Dynamics on the space are also introduced, which allow properties to be prioritised for attention. Composition of concepts is carried out by taking the union of the property sets, so that the resulting combined concept has properties belonging to both constituent concepts. Whilst Rickard et al. do model fuzziness, they do not attempt to account for non-compositional features of human concept use.
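A computation in this style can be sketched as follows. The correlation values are hypothetical (the paper's Tables 1a and 1b are not reproduced here), and the ratio-of-min-to-max form is the standard Kosko-style mutual subsethood, used as a stand-in for the exact formula in [42].

```python
def mutual_subsethood(c1, c2, context):
    """Fuzzy mutual subsethood restricted to a context: the ratio of
    summed minima to summed maxima of correlation values over ordered
    property pairs drawn from the context. A Kosko-style sketch; the
    precise definition in [42] may differ."""
    pairs = [(a, b) for a in context for b in context if a != b]
    num = sum(min(c1.get(p, 0.0), c2.get(p, 0.0)) for p in pairs)
    den = sum(max(c1.get(p, 0.0), c2.get(p, 0.0)) for p in pairs)
    return num / den if den else 0.0

# Hypothetical correlation strengths: concept -> {(prop_a, prop_b): C_ab}
apple = {("red", "round"): 0.7, ("round", "red"): 0.7}
cricket_ball = {("red", "round"): 0.9, ("round", "red"): 0.9}

sim = mutual_subsethood(apple, cricket_ball, {"red", "round"})
print(round(sim, 3))  # 0.778
```

Only the property pairs licensed by the context contribute, which is how the context screens out irrelevant attributes when comparing the two concepts.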

Computational linguistics and vector space models
Within the field of computational linguistics, vector-based models of word meaning have proved very fruitful. The meaning of a particular word is represented as a vector, where the basis of the vector space might be a chosen set of words (usually the most common, excluding a list of stop-words such as 'a', 'the', 'and', and so on), or some other carefully chosen dimensions, and the entries in the vector are word co-occurrence statistics, or a relation between the words and the documents they occur in [10,31,37]. A comprehensive paper by Mitchell and Lapata [38] gives a comparison of various techniques for adjective-noun composition. Related approaches are given in [7,11], where adjectives are viewed as matrices and nouns as vectors. Whilst these approaches have considerable merit, the underpinning space cannot be viewed as a conceptual space that describes features of the concept. The relationship of individual dimensions to the vectors need not be one of attributes; it could be one of instances, parthood, or any other incidental relationship. The development of a suitable conceptual space for these models would be an interesting line of future research.
Another approach within the computational linguistics framework is the development of a family of 'microtheories' of word meanings. [41] develop a microtheory of adjectives whereby the analysis given in [27] is extended to examine what a number of linguists determine to be the taxonomy of adjectives. Words are represented as having syntactic and semantic types. The syntactic type describes how the word can be combined with others. The semantic type describes the semantic effect of making such a combination. So, for example, the adjective 'big' can be applied in the combinations Adj-Noun or Noun-Copula-Adj, and has the semantic effect that it can be applied to physical objects and limits the normalised value of the size property to greater than 0.75. Adjectives are divided into scalar (based on properties), denominal (based on objects), and deverbal (based on processes). The distinctions between these types lie in their semantics, namely the different ways in which they combine with nouns to form a composite. In the work we present here we focus on intersective adjectives since, as pointed out in [27], even these require work to clarify how typicality functions in a composite concept; we therefore do not examine the differences between semantic types. Explaining these differences will be an interesting area for further work, however.

Quantum probability models
The quantum probability model introduced by Aerts [3] sees a concept as a quantum entity within a vector space, the dimensions of which are the contexts of the concept. When no context is present for the concept, the concept is in its ground state. The application of a context then changes this concept into the concept under that context. Typicality of an item to a concept changes with context. As such, problems such as the 'guppy effect' are accounted for by noting that the typicality of a guppy to the concept pet in its ground state differs from the typicality of a guppy to the concept pet in the context 'the pet is a fish'. Aerts et al. [5] give a description of how the effects of contextuality, interference, entanglement and emergence may be seen in human concept use. Contextuality may be seen in the way that the typicality of an element to a concept changes with the context given. The phenomenon of interference in quantum vector spaces allows over- and under-extension to be modelled when combining concepts. Briefly, membership in a concept is modelled by the projection of the concept onto the subspace representing the item. When evaluating the membership of an item to the composite concept 'A and B', it may be the case that an interference term needs to be introduced. This interference term accounts for over- or under-extension. This idea is explained in detail in [4], in which data from [23,24] is modelled. The phenomenon of entanglement is found to be present in data concerning the applicability of combinations of concept pairs. The concept of emergence is explained as the idea that a totally new concept has been introduced by forming a conjunction of concepts. To account for this, Aerts et al. propose the use of Fock space. In Fock space, an entity may be in a superposition of states. In the case of concept combination, one of these states is the completely new concept, and another is the concept as a combination of two concepts. Using these notions, the quantum probability model develops ways of modelling which account for both the fuzziness of human concept use and the effects of non-compositionality. Within this paper, we aim to show that our approach can account for these aspects of human concept use within a simpler and more intuitive framework.

Experimental studies
Hampton [22] reports results from two experiments. The aim of the first was to generate a list of attributes for each of six pairs of concepts and their conjunctions. An example is the pair of concepts Sports and Games. A list of attributes was collected for each of these concepts, and for the conjunctions 'Games which are Sports' and 'Sports which are Games'. This was repeated for each of the six pairs of concepts. Based on these lists of attributes, the second experiment asked participants how useful each attribute was in defining the concept, as measured on the scale shown in Table 2 (the numerical value was imposed later rather than given by participants).
Averaging across subjects then gives the mean importance rating for each attribute in each concept. Some attributes have similar importance within pairs of concepts, and some differ. For example, the attribute 'Is used by people' has a mean rating of 3.00 both for Machines and for Vehicles. However, 'Replaces people' has a mean rating of 2.00 for Machines, and −1.00 for Vehicles. The challenge then is to predict the importance of an attribute for a combined concept such as 'Machines which are also Vehicles' from the attribute weightings of the constituent concepts. Hampton reports that using multiple regression to obtain weight coefficients for a weighted sum provides the best predictor of attribute weightings in the combined concepts, but that non-compositionality is also observed. For example, some attributes with low importance in the constituent concepts may have a high importance in the combined concept. This is termed 'attribute emergence': the attribute 'Lives in a cage' has low importance for 'Pet' and for 'Bird', but high importance for 'Pet which is also a Bird'. A similar way in which non-compositionality manifests itself is in the preservation of necessary or impossible attributes.
When an attribute is seen as necessary (or impossible) for one of the constituent concepts, that importance rating is carried over into the attributes for the combined concept. Therefore, there is no functional relationship between the importance of an attribute in the constituent concepts and its importance in the combined concept; rather, this depends on the particular concepts involved. Hampton finds that conjunction is not commutative, in that the qualifying noun, i.e. the second noun in the conjunction, is given more weight. Lastly, dominance effects are also seen, in that concepts which bring more attributes to the conjunction tend to dominate.
Hampton therefore reports the following six main results:
• The attribute set for a combined concept is the union of the attribute sets of the constituent concepts
• The importance of attributes in the combined concept is usually a weighted sum of the importance of the attributes in each individual concept
• Necessity and impossibility of attributes are preserved
• Attributes with low importance in the constituent concepts may emerge with high importance in the combined concept
• Conjunction is not commutative: the qualifying noun is given more weight
• Concepts which bring more attributes to the conjunction tend to dominate
We will argue that our proposed model of concepts and concept combination can also account for these phenomena. Furthermore, our framework is a natural extension of the conceptual spaces model in which the importance of certain dimensions is related to their necessity as defined by possibility theory [16].
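Hampton's weighted-sum finding, together with the preservation of necessity and impossibility, can be sketched as a simple predictor. The rating-scale endpoints and the weights below are assumptions for illustration, not values from [22].

```python
# Assumed endpoints of the importance rating scale (illustrative; the
# actual scale is the one shown in Hampton's Table 2).
NECESSARY, IMPOSSIBLE = 4.0, -4.0

def combined_importance(i1, i2, w1=0.5, w2=0.5):
    """Predict the importance of an attribute in the combined concept:
    a weighted sum of its importance in the constituents, except that
    necessary or impossible ratings are carried over unchanged."""
    for i in (i1, i2):
        if i in (NECESSARY, IMPOSSIBLE):
            return i   # preservation of necessity / impossibility
    return w1 * i1 + w2 * i2

# 'Is used by people': similar importance in both constituents.
print(combined_importance(3.0, 3.0))   # 3.0
# An attribute necessary for one constituent dominates the prediction.
print(combined_importance(4.0, 0.5))   # 4.0
```

Unequal weights w1 and w2 can encode the non-commutativity finding, giving the qualifying concept's rating the larger coefficient; the override for necessary and impossible attributes is what makes the predictor non-functional in the constituent importances alone.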

Formal model of concepts
In this section we outline our conceptual-spaces-based model of concepts, which forms the theoretical underpinning of our work. This model, first outlined by Lawry and Tang in [33], combines a prototype theory approach with random sets, capturing both typicality and semantic uncertainty. We will go on to build on this model of concepts to form a framework for concept combination.

A prototype and random set model of concepts
In this framework, agents use a set of labels L = {L 1 , L 2 , ..., L n } to describe an underlying conceptual space Ω which has a distance metric d(x, y) between points. If one of x or y is a set then we take the distance to be the minimum distance to any point in the set. For example, suppose Y is a set; then d(x, Y) = min{d(x, y) : y ∈ Y}. Each label L i is associated firstly with a set of prototype values P i ⊆ Ω, and secondly with a threshold ε i , about which the agents are uncertain. The thresholds ε i are drawn from probability distributions δ i . Labels L i are associated with neighbourhoods N i = {x ∈ Ω : d(x, P i ) ≤ ε i }. The neighbourhood can be seen as the extension of the concept L i . The intuition here is that ε i captures the idea of being sufficiently close to the prototypes P i . In other words, x ∈ Ω is sufficiently close to P i to be appropriately labelled as L i providing that d(x, P i ) ≤ ε i . This is illustrated in Fig. 1.
Given an element x ∈ Ω, we can ask how appropriate a given label is to describe it. This is quantified by a membership function, denoted μ L i (x), corresponding to the probability that the distance from x to P i , the prototype of L i , is less than the threshold ε i : μ L i (x) = P(d(x, P i ) ≤ ε i ). We also use notation as defined in [32].
Each label L i is entirely defined by its prototype P i , the distance metric d(x, y) in the space, and the distribution δ i of the threshold ε i . Given a particular conceptual space Ω, we can therefore use the notation L i = ⟨P i , d, δ i ⟩. The idea of a membership function presented here may be compared with the similarity relation that Gärdenfors uses. A similarity relation between points in a conceptual space may be defined as a decreasing function of distance in the space. Gärdenfors gives the example that the similarity s(x, y) between two points x and y in the conceptual space is an exponentially decaying function of the distance d(x, y) between the two points, i.e. s(x, y) = exp(−c d(x, y)). In terms of the prototype-threshold approach outlined above, the membership of an element x in a concept L may then be defined as the similarity to the prototype P of L, where ε ∼ Exp(c). More generally, s(x, y) = f(d(x, y)) for some decreasing function f.
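The exponential case can be checked directly: for ε ∼ Exp(c), the probability that ε exceeds a distance t is exp(−c t), so the random-set membership coincides with Gärdenfors's exponentially decaying similarity. The one-dimensional space and parameter values below are illustrative.

```python
import math

def membership(x, prototype, dist, c=1.0):
    """Membership of x in a concept with prototype P under the
    random-set model: the probability that the threshold eps exceeds
    d(x, P). With eps ~ Exp(c), P(eps >= t) = exp(-c * t), so this
    equals the exponentially decaying similarity to the prototype."""
    return math.exp(-c * dist(x, prototype))

# One-dimensional toy conceptual space with Euclidean distance.
d = lambda x, y: abs(x - y)

print(membership(0.0, 0.0, d))             # 1.0 at the prototype
print(round(membership(2.0, 0.0, d), 4))   # exp(-2) = 0.1353
```

Membership is 1 exactly at the prototype and decays monotonically with distance, as required of a similarity relation; other threshold distributions δ would give other decreasing profiles f(d(x, y)).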
This approach is, however, in contrast to Gärdenfors's original approach, which is to view the space as partitioned by a Voronoi tessellation. If this latter approach is taken, each individual point in the conceptual space is allocated to exactly one label. With a prototype-threshold approach, it is easy to accommodate the idea of an object being accurately described by more than one concept, or conversely, of some points within the space not being assigned to any concept. This difference is illustrated in Figs. 2 and 3.
The Voronoi diagram approach to describing concepts can be extended to include graded boundaries, and such an approach is developed in [13,14]. We argue that a drawback of this type of representation is that every single point has been categorised. In contrast, in the label semantics approach there can be points which have not been categorised. This is desirable: imagine the first Western scientists to encounter a duck-billed platypus. It is not clear how to categorise this animal, and it could be modelled as being in a region of space that has not yet been assigned to a category. Another advantage of the neighbourhood model is that concepts can overlap, which allows us specifically to refer to borderline regions. However, a benefit of the Voronoi tessellation approach to concept representation is that the membership of a point in a concept depends not only on the prototype of that concept but also on the proximity of other prototypes. To integrate this aspect into the label semantics approach to concepts would be an interesting area for future research.
Fig. 3: A conceptual space divided into concepts according to a prototype-threshold approach. Some points in the space correspond to more than one concept, and some correspond to none.

Background
As described in section 2, Hampton [22] gives a series of results on human understanding of conjunctive concepts, such as 'sports that are games'. It had already been shown [40,46] that standard fuzzy set-theoretic conjunctions and disjunctions do not adequately model human understanding of composite concepts. Hampton's work elicits data that could form the basis of a model of conjunction that more accurately reflects how humans understand conjunctive concepts.

A new approach to concept composition
An initial approach to modelling Hampton's data within the conceptual spaces framework would be to view individual attributes, for example 'Talks', 'Has fur', 'Has claws', as each forming a dimension of the conceptual space. However, these attribute dimensions are very different from the usual conceptual space dimensions in two ways. Firstly, they are mostly binary, unlike dimensions such as 'height', 'depth' or 'breadth'. Secondly, they are very complex in comparison to the types of dimensions proposed by Gärdenfors. For instance, having feathers seems to be a multidimensional concept in itself.
This motivates a new hierarchical formulation of conceptual spaces in which we model attributes as labels, each taken from an individual domain. In Gärdenfors' terminology, each label would be a property from an integral domain. So an attribute like 'rounded' is seen as a label based in a space such as R^3, or 'red' as based in the CIELab colour space. From this perspective, each of the attribute labels can form a binary dimension, and these are then combined to form the space {0, 1}^n, where n is the number of attributes. Within this binary space, we take the value 1 on a particular dimension to mean that an object has that particular property. Fig. 4 gives a schematic representation of this model, within which we treat the combination space {0, 1}^n itself as a conceptual space with an associated metric. This enables us to apply the neighbourhood-based prototype model of concepts outlined in section 3 to form compound concepts made up of many properties. The motivation for treating this combination space itself as a conceptual space is that if we view each label as a property in an integral domain, then this is precisely a formalisation of the conceptual spaces that Gärdenfors proposes. Gärdenfors suggests both a weighted sum of properties and a weighted Euclidean distance metric in the property space. The formalism we propose corresponds to the weighted sum of properties, but generalises it. If we were to use a cube [0, 1]^n ⊂ R^n, then we might be able to recover the weighted Euclidean distance as a special case. This is an area for further work, however.
In the sequel we formalise this idea and prove a number of key results concerning conjunctive concepts defined in this way. We show that if the threshold of the compound concept in the binary combination space is uniformly distributed, then the membership function for the compound concept is a weighted sum of the membership functions of the individual labels. This result nicely parallels Zadeh's operation of convex combination, and Gärdenfors' proposal that concepts should be seen as sets of properties related by salience weights. Lastly, we show that under certain conditions, the importance of an attribute in a conjunction of two compound concepts can be calculated as the weighted sum of the importances of the individual attributes, directly mirroring Hampton's results.
A conjunctive label is defined in a binary space as follows. Consider a set of distinct integral domains Ω_1, ..., Ω_n, such as the CIELab colour space, size, and taste. We select a label from each domain for combination. So an apple might be described as red and sweet and medium sized. This gives us a set LA = {L_1, ..., L_n} where L_i ⊆ Ω_i for i = 1, ..., n.
Each label L_i is defined by the triple <P_i, d_i, δ_i>, as described in section 3, where the prototype P_i ⊆ Ω_i, d_i is the distance metric in Ω_i, the threshold ε_i is a random variable into R^+ and δ_i is a probability density on ε_i. We can then define a Boolean variable X_i into {0, 1} with reference to a point Y_i ∈ Ω_i for i = 1, ..., n as follows: X_i = 1 if d_i(Y_i, P_i) ≤ ε_i, and X_i = 0 otherwise. Here X_i = 1 means that the object being described has the property L_i, i.e. Y_i lies in the neighbourhood of L_i, as described in section 3. Also, P(X_i = 1) = μ_{L_i}(Y_i). A vector Y ∈ Ω_1 × ... × Ω_n thus generates a Boolean vector X into {0, 1}^n, in which the probability distribution for X_i is determined by δ_i. Now consider a conjunctive concept as being defined by a conjunction of labels or their negations, covering all labels in LA. It is therefore of the form α = ±L_1 ∧ ... ∧ ±L_n, where +L_i = L_i and −L_i = ¬L_i. Expressions of this type are referred to as atoms.
Each atom then naturally defines a point x_α in {0, 1}^n, with (x_α)_i = 1 if α contains L_i and (x_α)_i = 0 if α contains ¬L_i. We think of the space {0, 1}^n as the binary conjunction space, and of x_α as the prototype of the conjunctive concept α.
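The mapping from an atom to its prototype point can be illustrated with a small sketch; the sign-vector encoding of an atom (+1 for L_i, −1 for ¬L_i) is our own illustrative convention:

```python
def atom_prototype(signs):
    """Map an atom +/-L_1 /\ ... /\ +/-L_n (signs[i] = +1 for L_i,
    -1 for its negation) to its prototype point x_alpha in {0, 1}^n."""
    return tuple(1 if s > 0 else 0 for s in signs)

# e.g. the atom L1 /\ ~L2 /\ L3
print(atom_prototype([+1, -1, +1]))  # -> (1, 0, 1)
```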
We also allow some deviation from the prototype by taking into account the different levels of importance of each label L_i. The differing importance of the labels is characterised by a weight vector λ which weights each dimension in the binary space. These ideas are illustrated in Fig. 5.
We can now consider membership in the conjunctive concept within the binary space. A conjunctive concept is defined by the triple α = <x_α, d, δ>, where α is an atom of LA, d is a distance metric on {0, 1}^n, ε is a random variable into R^+ and δ is a probability distribution on ε. We say that an element X ∈ {0, 1}^n can be appropriately described by the concept α iff d(x_α, X) ≤ ε, with the membership function defined by μ_α(X) = δ({ε : d(x_α, X) ≤ ε}). We can then relate membership in Ω_1 × Ω_2 × ... × Ω_n to membership in the binary space {0, 1}^n as follows.
Fig. 6. Prototype for α = L_1 ∧ L_2 and weighted dimensions in a two-dimensional binary space, together with the threshold ε for α. The point (0, 1), indicated by an open circle, can be considered to be an instance of the concept for which x_α is the prototype.
We define a binary random variable Z into {0, 1} such that Z = 1 iff d(x_α, X) ≤ ε, and Z = 0 otherwise. Now by total probability we have that P(Z = 1 | Y) = Σ_X P(Z = 1 | X) P(X | Y). We may assume that Z and Y are conditionally independent given X, since Z is defined purely in terms of X.
Letting μ_α(Y) denote P(Z = 1 | Y) and assuming independence of the dimensions i = 1, ..., n, we then have that μ_α(Y) = Σ_{X ∈ {0,1}^n} μ_α(X) ∏_{i=1}^n P(X_i | Y_i). More generally, we can define a compound concept with prototypical case θ = ⋀_{i∈I} ±L_i, where I ⊆ {1, ..., n}, as a triple θ = <P, d, δ>, where P = {x ∈ {0, 1}^n : x_i = 1 if θ contains +L_i and x_i = 0 if θ contains −L_i, for all i ∈ I}. P is therefore a set of points which all have the same values on the dimensions specified by the index set I, and which cover all remaining possibilities across the dimensions not in I. This implies that where I = {1, ..., n}, P is a singleton.
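The total probability computation can be sketched as follows (an illustrative implementation, not the authors' code). The crisp membership used in the usage example is a deliberately simple choice showing that an exact-match threshold reduces the compound membership to a product of label memberships:

```python
from itertools import product

def compound_membership(mu_labels, mu_binary):
    """mu_theta(Y) = sum over X in {0,1}^n of mu_theta(X) * prod_i P(X_i | Y_i),
    with P(X_i = 1 | Y_i) = mu_{L_i}(Y_i) and the dimensions independent."""
    total = 0.0
    for x in product((0, 1), repeat=len(mu_labels)):
        p = 1.0
        for xi, mi in zip(x, mu_labels):
            p *= mi if xi == 1 else 1.0 - mi
        total += mu_binary(x) * p
    return total

# Crisp illustration: the concept holds only exactly at the prototype (1, 1),
# so the compound membership reduces to the product of the label memberships.
crisp = lambda x: 1.0 if x == (1, 1) else 0.0
print(compound_membership([0.8, 0.5], crisp))  # -> 0.4
```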
In this case we have that an element X is appropriately described by θ iff d(P, X) ≤ ε, where d(P, X) = min_{x∈P} d(x, X), with membership function μ_θ(X) = δ({ε : d(P, X) ≤ ε}). We now define a distance metric in the binary space {0, 1}^n based on Hamming distance and a weight vector.

Definition 1 (One dimensional Hamming distance). For x, y ∈ {0, 1}^n, the one-dimensional Hamming distance on dimension i is H_i(x, y) = |x_i − y_i|. The weighted Hamming distance for a weight vector λ = (λ_1, ..., λ_n) is then H_λ(x, y) = Σ_{i=1}^n λ_i H_i(x, y).

The effect of this distance metric on membership in the binary space is illustrated in Fig. 6. Suppose that the concept 'bird' is characterised, for illustrative purposes, by two properties: L_1 = 'flies' and L_2 = 'has feathers'. The property L_1 may be relaxed, since there are birds which do not fly. So animals which have feathers but do not fly are still considered birds, though not typical birds. We characterise this using the weights in the binary space: in this case, the weight λ_1 on the first dimension will be smaller than λ_2. The effect of this is to create elliptical neighbourhoods in the space.
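As an illustrative sketch, the weighted Hamming distance H_λ(x, y) = Σ_i λ_i |x_i − y_i| can be implemented directly; the 'bird' weights below are arbitrary choices reflecting that 'flies' is less important than 'has feathers':

```python
def hamming_weighted(x, y, lam):
    """Weighted Hamming distance H_lam(x, y) = sum_i lam[i] * |x_i - y_i|
    on the binary combination space {0, 1}^n."""
    return sum(l * abs(a - b) for l, a, b in zip(lam, x, y))

# 'bird' example: lambda_1 ('flies') < lambda_2 ('has feathers')
lam = (0.3, 0.7)
print(hamming_weighted((1, 1), (0, 1), lam))  # -> 0.3, feathered non-flier
print(hamming_weighted((1, 1), (1, 0), lam))  # -> 0.7, featherless flier
```

A feathered non-flier is thus closer to the 'bird' prototype (1, 1) than a featherless flier, producing the elliptical neighbourhoods described above.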
We now outline a correspondence between our idea of a compound concept as a conjunction of attribute labels or their negations, and Hampton's account of concepts as combinations of attributes with individual weights. Firstly, as stated above, we model individual attributes as labels from conceptual spaces such as the colour space or the taste space. We have a vector of weights, λ, attached to the binary space, which loosely corresponds to Hampton's attribute weights. However, the weights in Hampton's account range from 4, being necessary, to −2, being impossible. In contrast, our weight vector λ is always positive, and the idea of an attribute L_i being atypical or impossible is captured by the notion that the conjunction ⋀_{i=1}^n ±L_i includes ¬L_i. The extent of the atypicality of the attribute L_i is then given by the weight λ_i of the corresponding dimension.

Properties of the hierarchical model
We now give a series of results concerning the formulation and properties of compound concepts. We firstly give an example of how two properties may be combined. In section 4.3.1 we show that as a special case, the membership function of a compound concept reduces to a weighted sum of the membership functions of the individual constituent concepts, thereby giving a mathematical grounding to the ideas proposed in [19] of seeing a concept as a weighted combination of properties, or in [50] of forming complex concepts via a mechanism of convex combination. Lastly, section 4.3.2 shows how the conjunction of two compound concepts can again be modelled as a weighted sum. This result models Hampton's results [22], outlined in section 4.1.

Results for compound concepts using Hamming distance
The example above shows that the membership functions generated within this framework can be very flexible. However, we show that by restricting the type of membership function used in the binary combination space, we can derive an expression for the membership function μ_θ(Y) of the compound concept θ = ⋀_{i=1}^k ±L_i, k ≤ n, as a weighted sum of the membership functions μ_{±L_i}(Y_i) for the individual domains. This grounds proposals in [19,30,50] that complex concepts can be built up as sums of weighted properties.
From this we have that μ_θ(Y) = Σ_{i=1}^n λ_i μ_{±L_i}(Y_i). Theorem 4 grounds the idea that properties can be combined via a set of weights to form a concept, as proposed by Gärdenfors in [19], or via the operation of convex combination proposed by Zadeh in [50]. The model sits particularly well with Gärdenfors's proposal since it uses a binary conceptual space as the mechanism for combination. Furthermore, the fact that we require specific conditions on the distribution of the threshold of the concept in the binary space is an advantage, since relaxing these conditions allows us to explain some of the characteristics of concept combination seen in psychological experiments, such as overextension or non-commutativity. We will discuss this further in section 4.3.3.
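The weighted-sum special case can be checked numerically. This sketch (the memberships and weights are arbitrary illustrative values) uses the fact that, with a threshold uniformly distributed on (0, 1) and weights summing to 1, membership in the binary space is 1 − H_λ(x_α, X):

```python
from itertools import product

def mu_binary_uniform(x, proto, lam):
    # Uniform threshold on (0, 1) with weights summing to 1:
    # mu(x) = P(H_lam(proto, x) <= eps) = 1 - H_lam(proto, x)
    return 1.0 - sum(l * abs(a - b) for l, a, b in zip(lam, proto, x))

def compound_membership(mu_labels, proto, lam):
    """Total-probability membership of Y in theta, via the binary space."""
    total = 0.0
    for x in product((0, 1), repeat=len(mu_labels)):
        p = 1.0
        for xi, mi in zip(x, mu_labels):
            p *= mi if xi == 1 else 1.0 - mi
        total += mu_binary_uniform(x, proto, lam) * p
    return total

mu = [0.9, 0.4, 0.7]     # memberships mu_{L_i}(Y_i) in the individual domains
lam = (0.5, 0.3, 0.2)    # salience weights, summing to 1
proto = (1, 1, 1)        # prototype of theta = L1 /\ L2 /\ L3
lhs = compound_membership(mu, proto, lam)
rhs = sum(l * m for l, m in zip(lam, mu))   # the weighted sum of Theorem 4
print(round(lhs, 10), round(rhs, 10))       # -> 0.71 0.71
```

The two computations coincide, as the weighted-sum result predicts.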
A key aspect of concepts in Gärdenfors's conceptual spaces is that they should be convex. In a space R^n with the Euclidean distance metric, convexity of a set S is defined by the property that ∀x, y ∈ S, every point on the line segment connecting x and y is also in S. A detailed discussion is given in [19], citing experimental evidence that concepts as used by humans tend to be convex, and that the use of convex concepts requires less cognitive load. We give a definition of convexity for the binary combination spaces {0, 1}^n.

Definition 5 (Betweenness). ∀x, y, z ∈ {0, 1}^n with distance metric H_λ, z is between x and y, written B(x, y, z), iff H_λ(x, y) = H_λ(x, z) + H_λ(z, y).

Definition 6 (Convexity).
A set S ⊆ {0, 1}^n is convex if ∀x, y ∈ S, every point z lying between x and y also belongs to S, i.e. {z : B(x, y, z)} ⊆ S.
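Definitions 5 and 6 can be checked by brute force over {0, 1}^n. In this sketch (the weights are arbitrary), the sub-cube prototype set of a concept leaving one dimension unspecified comes out convex, while a pair of opposite corners does not, since every point lies between two points that differ on all dimensions:

```python
from itertools import product

def H(x, y, lam):
    return sum(l * abs(a - b) for l, a, b in zip(lam, x, y))

def between(x, y, z, lam):
    """Definition 5: z is between x and y iff H(x, y) = H(x, z) + H(z, y)."""
    return abs(H(x, y, lam) - (H(x, z, lam) + H(z, y, lam))) < 1e-12

def is_convex(S, lam, n):
    """Definition 6: S is convex iff every point between two members is a member."""
    return all(z in S
               for x in S for y in S
               for z in product((0, 1), repeat=n)
               if between(x, y, z, lam))

lam = (0.4, 0.3, 0.3)
# Prototype set of theta = L1 /\ L2 (dimension 3 unspecified) is a sub-cube:
P = {(1, 1, 0), (1, 1, 1)}
print(is_convex(P, lam, 3))                       # -> True
print(is_convex({(1, 1, 1), (0, 0, 0)}, lam, 3))  # -> False
```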
We can now generalise Theorem 4 to the case where θ = ⋀_{i=1}^k ±L_i, k ≤ n. In this case, the prototype P does not specify values on all dimensions i = 1, ..., n. We first introduce some notation allowing us to talk about the set of dimensions of a prototype that remain invariant. We argue that if P does not specify all dimensions of {0, 1}^n, then the weight vector λ must be such that only those dimensions contributing to the concept are weighted. For example, suppose n = 3 and P = {(1, 1, 1), (1, 1, 0)}. Then λ = (0.4, 0.3, 0), say. We go on to prove that a similar result to that shown in Theorem 4 holds for θ.
We now introduce some notation to enable us to talk about the dimensions of a set of points S ⊆ {0, 1}^n that take a fixed value across the set.
These results show that a compound concept can be built up out of the weighted sum of individual concepts, provided that certain key conditions hold. We now go on to look at the behaviour of the conjunction of two such concepts.

Conjunctions of compound concepts
Up to now we have discussed how properties from integral domains may be combined to form concepts. The results in [22] concern how the weighting of these properties changes under the conjunction of two such concepts. We extend our framework to take into account the conjunction of two compound concepts. To do this, we introduce a second level binary space, illustrated in Fig. 8, and combine the two concepts within the second level space using the same approach as in section 4.2, i.e. as if they are themselves properties.
Let θ = <P_1, d_1, δ_1> and ϕ = <P_2, d_2, δ_2> be two compound concepts, each consisting of a conjunction of attribute labels from individual conceptual spaces Ω_i. As in section 4.2 we define two binary variables Z_1 and Z_2, where Z_1 = 1 iff X can be appropriately described by θ, and Z_2 = 1 iff X can be appropriately described by ϕ. We now define the conjunction of the compound concepts as the triple θ ∧ ϕ = <(1, 1), d, δ>, where d is a distance metric on {0, 1}^2. In this case we can define a binary random variable C such that C = 1 iff d((1, 1), Z) ≤ ε, where Z = (Z_1, Z_2). We also define the membership function for θ ∧ ϕ by μ_{θ∧ϕ}(Z) = P(C = 1 | Z). Now by applying the theorem of total probability, and then applying it a second time, we can express μ_{θ∧ϕ}(Y) in terms of the constituent memberships μ_θ(Y) and μ_ϕ(Y). We can now look at the behaviour of the weights in a conjunction of compound concepts.
Proof. Suppose w.l.o.g. by Theorem 9 that i ∉ ... So under certain conditions, the attribute weights in the conjunctive concept are a weighted sum of the attribute weights of the constituent concepts, which models one of Hampton's key findings as stated in section 4.1. Further, since P_{θ∧ϕ} = P_θ ∩ P_ϕ, we have E(P_{θ∧ϕ}) = E(P_θ) ∪ E(P_ϕ), i.e. the attribute set for the conjunctive concept is the union of the attribute sets of the constituent concepts, modelling another aspect of Hampton's results.
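The two applications of total probability in the second level space can be sketched as follows (illustrative values; the uniform-threshold membership used here is one particular choice of δ, under which the conjunction again reduces to a weighted sum of the constituent memberships):

```python
from itertools import product

def conj_membership(mu_theta, mu_phi, mu_second):
    """Membership in theta /\ phi: the constituent memberships give the
    distribution of Z = (Z_1, Z_2), which feeds the second-level space."""
    total = 0.0
    for z in product((0, 1), repeat=2):
        p = (mu_theta if z[0] else 1 - mu_theta) * (mu_phi if z[1] else 1 - mu_phi)
        total += mu_second(z) * p
    return total

# Uniform threshold on (0, 1) with weights (0.5, 0.5) in the second-level space:
mu_second = lambda z: 1.0 - (0.5 * abs(1 - z[0]) + 0.5 * abs(1 - z[1]))
print(round(conj_membership(0.8, 0.6, mu_second), 10))  # -> 0.7, the weighted sum
```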

Example 12 (Property-concept combination). Suppose a property and a concept are combined in the space {0, 1}^2, with prototype (1, 1), weight vector λ = (0.5, 0.5) and boundary distribution δ = U(0, 1).

Two objections might be made to this approach. Firstly, the combination could be of the form 'apple green', where the resulting concept should be a colour rather than an apple. The combination mechanism given in the example would return a green apple, rather than a colour. Secondly, as described here, this approach does not take into account cases like 'red wine', where the meaning of 'red' has been changed by the concept it is attached to. To answer the first objection, note that the type of concept resulting from a combination like 'apple green' or 'green apple' is indicated by the part of speech that each word belongs to. In English this is indicated by word order, so we know that in the first case the resulting concept should be a colour, and in the second case a fruit. In the combination of properties and concepts, the resulting concept will be the original concept modified in the specific domain to which the property applies. So, in the case of 'green apple', the resulting concept is an apple in which the colour domain has been modified. In the case of 'apple green', we would choose just those domains from 'apple' that are relevant to the domains of 'green', i.e. colour, and form the combination 'apple green' using the approach outlined in Example 12. This approach is similar to that outlined by Gärdenfors [19].
In the second objection, the property 'red' in the combination 'red wine' refers to a particular set of shades which are not at all prototypical of 'red'. Gärdenfors [19] has again addressed this, using the idea of contrast classes, which map the whole domain onto a subset of the domain determined by that class. So in the case of wine, there is a range of distinctive shades of wine. When the whole colour domain is mapped down to this range, the places that the 'red' and the 'white' labels inhabited in the original labelling should now map onto the 'red' and the 'white' areas in the range of wine colours. Our approach is slightly different. As explained in the first objection, the concept 'wine red' can be obtained by forming a combination of 'wine coloured' and 'red'. The 'red' in 'red wine' is then understood to be an instance of 'wine red', rather than everyday 'red'. Although this argument might appear to be circular, we argue that it is not, since the first time someone encounters the concept 'red wine', a mistake could be made about what colour the drink would be. Only after learning that the label 'red' in 'red wine' refers to the darkest colour of wine can they use it properly. This can be seen as an instance where the meaning of the term 'red', when applied to wine, is determined by convention rather than by a systematic combination.
Furthermore, although when introducing this framework we have made a distinction between properties and concepts, this distinction is not really important in actually carrying out a combination. Increasingly complex concepts can be created and combined with other complex concepts, or alternatively with simple properties utilising a single domain. The novelty of this approach is that the combination mechanism is itself characterised by a conceptual space. As a special case our framework entails that concepts may be characterised as weighted sums of properties, a characterisation of concepts proposed in [19,51]. Hampton shows that a majority of his data may be explained by a simple multilinear regression, which can be modelled as in the example above. However, he also notes other non-compositional behaviours, which are key aspects of how humans use concepts. We show how our framework can model some of these non-compositional behaviours in the next section.

Non-compositional behaviours
In addition to the general rule that the importance of attributes in the conjunction is the weighted sum of the importance of attributes in the constituent concepts, Hampton identifies four additional behaviours: necessity and impossibility are preserved; attribute loss or emergence is observed; conjunction is not commutative; and dominance effects are observed. This section will discuss the capability of our model to capture these behaviours.
We consider firstly the ideas of necessity and impossibility. Hampton finds that necessity of dimensions is preserved, so that if an attribute is deemed necessary in a constituent concept, it is also deemed necessary in the conjunction. As outlined in section 4.2, necessity and impossibility are essentially the highest and lowest weights that can be assigned to an attribute in Hampton's experiments. Recall that we view the impossibility of an attribute L_i as equivalent to the necessity of ¬L_i, and we measure the necessity of an attribute or its negation using the notion of necessity from possibility theory, as outlined in Definition 13.
We also introduce an alternative definition of the importance of an attribute, consistent with our random set based conceptual models. Rather than simply taking the weight λ to be the importance of the attribute, we use the idea of necessity from possibility theory [16]. Within possibility theory, the possibility π(s) of a state of affairs s indicates to what extent that state of affairs is possible; π(s) is a measure on the interval [0, 1]. The possibility of a set of states A is then defined in [16] to be Π(A) = sup_{s∈A} π(s). The necessity of an event A is then N(A) = 1 − Π(A^c), i.e. 1 minus the possibility that A does not occur.
Within our model, a state s is a particular point X in the binary combination space {0, 1}^n, and π(X) := μ_θ(X). To compute the necessity of a dimension i to a concept θ = <P, H_λ, δ>, where P ⊆ {0, 1}^n, we consider the necessity of the set S_i = {X : X_i = p_i}, i.e. the set of all points that have the same value as the prototype on dimension i.

N(S_i) = δ({ε : N^ε_θ ⊆ S_i}),

where N^ε_θ is the neighbourhood of θ as defined in section 3. It may be the case that P contains both points with value 0 on dimension i and points with value 1 on dimension i. In this case we say that S_i = ∅, since S_i must satisfy the condition in Definition 13.

Definition 13 (Necessity of a dimension). Given θ = <P, H_λ, δ>, the necessity of a dimension i to θ is defined as the necessity of the set S_i = {X : X_i = p_i}, where p_i is the value of the prototype on the ith dimension of {0, 1}^n, by N(S_i) = δ({ε : N^ε_θ ⊆ S_i}).

If P contains both vectors with value 0 on dimension i and vectors with value 1 on dimension i, then N(S_i) = 0. We have introduced the concept of the necessity of a dimension in order to account for some of the non-compositional aspects of conjunctive combinations of concepts. To examine the necessity of a dimension in a conjunction of two compound concepts, we relate the distribution of the threshold ε in the higher level binary space to the neighbourhood in the first level binary space.
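A brute-force sketch of the necessity of a dimension for a singleton prototype (the weights and the uniform threshold distribution are arbitrary choices): the neighbourhood N^ε_θ remains inside S_i exactly while ε is below the distance to the nearest point disagreeing with the prototype on dimension i, so with ε ∼ U(0, 1) the necessity of dimension i recovers the weight λ_i:

```python
from itertools import product

def necessity(i, proto, lam, eps_cdf, n):
    """N(S_i) = delta({eps : N^eps_theta subset of S_i}): the neighbourhood
    stays inside S_i = {X : X_i = p_i} while eps is below the distance to
    the nearest point whose ith coordinate differs from the prototype."""
    H = lambda a, b: sum(l * abs(u - v) for l, u, v in zip(lam, a, b))
    outside = [x for x in product((0, 1), repeat=n) if x[i] != proto[i]]
    crit = min(H(proto, x) for x in outside)  # first eps at which N^eps leaves S_i
    return eps_cdf(crit)                      # P(eps < crit)

lam = (0.5, 0.3, 0.2)
proto = (1, 1, 1)
uniform01 = lambda t: max(0.0, min(1.0, t))   # CDF of eps ~ U(0, 1)
vals = [necessity(i, proto, lam, uniform01, 3) for i in range(3)]
print(vals)  # -> [0.5, 0.3, 0.2]: here necessity equals the weight
```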
We begin by defining the neighbourhood of a conjunction of two concepts. Suppose that two concepts θ and ϕ are combined in the second level space {0, 1}^2, with weight vector w = (w_1, w_2), where w_1 is associated with θ and w_2 with ϕ, and suppose w_2 ≤ w_1. Now, if the threshold ε in the second level space is less than w_2, then an element X of the first level binary space must belong to the neighbourhoods of both concepts θ and ϕ, i.e. X ∈ N^{ε_θ}_θ ∩ N^{ε_ϕ}_ϕ. When w_2 ≤ ε < w_1, the points within ε in the second level space are {(1, 1), (1, 0)}, so X must belong to N^{ε_θ}_θ, but need not belong to N^{ε_ϕ}_ϕ. When w_1 ≤ ε ≤ w_T, where w_T = w_1 + w_2, X may belong to either neighbourhood, i.e. X ∈ N^{ε_θ}_θ ∪ N^{ε_ϕ}_ϕ. This is summarised in the following definition.

Definition 15. For a conjunction θ ∧ ϕ = <{(1, 1)}, H_w, δ>, where w = (w_1, w_2) and w_2 ≤ w_1, the neighbourhood R of θ ∧ ϕ is: R = N^{ε_θ}_θ ∩ N^{ε_ϕ}_ϕ if ε < w_2; R = N^{ε_θ}_θ if w_2 ≤ ε < w_1; and R = N^{ε_θ}_θ ∪ N^{ε_ϕ}_ϕ if w_1 ≤ ε ≤ w_T.

This allows us to relate the necessity of a dimension to the concept θ ∧ ϕ to the necessity of the dimension to the constituent concepts θ and ϕ, since N_{θ∧ϕ}(S_i) = δ_{θ∧ϕ}({(ε_θ, ε_ϕ) : R ⊆ S_i}).
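The three threshold cases for the neighbourhood of a conjunction can be transcribed directly (a sketch; the example neighbourhoods A and B are arbitrary sets of first-level points, and w_2 ≤ w_1 is assumed as above):

```python
def conj_neighbourhood(N_theta, N_phi, eps, w1, w2):
    """Neighbourhood R of theta /\ phi (assumes w2 <= w1);
    neighbourhoods are represented as frozensets of first-level points."""
    if eps < w2:
        return N_theta & N_phi   # must satisfy both concepts
    elif eps < w1:
        return N_theta           # only (1, 1) and (1, 0) are within eps
    else:                        # w1 <= eps <= w1 + w2
        return N_theta | N_phi   # either neighbourhood suffices

A = frozenset({(1, 1), (1, 0)})
B = frozenset({(1, 1), (0, 1)})
print(sorted(conj_neighbourhood(A, B, 0.1, 0.6, 0.4)))  # -> [(1, 1)]
print(sorted(conj_neighbourhood(A, B, 0.5, 0.6, 0.4)))  # -> [(1, 0), (1, 1)]
```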
Theorem 16. For a conjunction of compound concepts θ ∧ ϕ = <{(1, 1)}, H_w, δ>, where θ and ϕ are defined as for Theorem 10, ... This allows us to choose a boundary distribution δ which gives the property that high necessity is carried through into the conjunction. This is illustrated in the following example.
Example 18. Suppose that N_θ(S_i) = 0.9, N_ϕ(S_i) = 0.6, w_1 = 0.2, w_2 = 0.8. The weighted sum of the necessities of attribute i is then 0.2 × 0.9 + 0.8 × 0.6 = 0.66. Now, if δ = Uniform(0, w_T), then N_{θ∧ϕ}(S_i) is equal to this weighted sum, as shown in Corollary 17 and as reported by Hampton. However, if ε is distributed over a narrower range than the whole binary space, for example ε ∼ Uniform(0, 0.5), then N_{θ∧ϕ}(S_i) takes a value closer to 0.9, even though less weight is given to that part of the conjunction. The necessity of this attribute has therefore been, if not entirely preserved, at least emphasised in this model.
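The arithmetic of Example 18's uniform case can be checked with a one-line sketch (assuming, as the example states, that Corollary 17 gives the weighted sum when δ = Uniform(0, w_T)):

```python
def necessity_weighted_sum(n_theta, n_phi, w1, w2):
    """With a uniform threshold over the whole second-level space, the
    necessity in the conjunction is the weighted sum of the constituents."""
    wT = w1 + w2
    return (w1 / wT) * n_theta + (w2 / wT) * n_phi

print(round(necessity_weighted_sum(0.9, 0.6, 0.2, 0.8), 10))  # -> 0.66
```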
The fourth aspect Hampton notes is that attribute loss or emergence is observed. When the distribution of ε is defined over a narrower range than [0, w_T], the importance of attributes in the combined concept is higher than the importance of attributes in either constituent concept, as seen in Fig. 10. If the distribution of ε is over a wider range than the whole space, the importance can be lower. However, this does not account for how both phenomena can be observed together. In the example discussed and in Fig. 10, ε has a uniform distribution. Further work is needed to investigate the behaviour of N_{θ∧ϕ}(S_i) under other distributions.
A further aspect uncovered in Hampton's results is that in general, the qualifying noun, i.e. the second concept in the conjunction, is given more weight than the first. Within our model, we can easily take account of this by setting the weights in the binary combination space appropriately.
Lastly, Hampton finds that concepts with more attributes tend to have higher weightings. This can also be accounted for by weighting concepts θ relative to the cardinality of E(P_θ).
Fig. 10. (a) N_{θ∧ϕ}(S_i) as a function of N_θ(S_i) and N_ϕ(S_i) when w_1 = w_2 = 0.5 and ε ∼ U(0, 0.5). (b) N_{θ∧ϕ}(S_i) as a function of N_θ(S_i) and N_ϕ(S_i) when w_1 = w_2 = 0.5 and ε ∼ U(0, 0.75). In (b), high necessity is preserved to a certain extent, but not to the same degree as in (a).

Other aspects of human concept use
We give here some examples of how other cases of non-compositionality can be accounted for within our framework. A key example is the conjunction fallacy, in which the probability of an entity belonging to a conjunction of concepts is judged greater than the probability of that entity belonging to just one of the concepts. The example often cited is that of Linda the feminist bank teller, introduced by Tversky and Kahneman [47]. Linda is characterised as follows: Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
When asked to rank the probability of the statements 1) 'Linda is a bank teller' and 2) 'Linda is a bank teller and is active in the feminist movement', the majority of people rank 2) as more probable than 1), in violation of classical probability. This fallacy has been shown to dissolve when posed in frequentist terms [12]. The frequentist framing is as follows: rather than being asked to rank the probability of Linda being a bank teller, participants are asked to consider 100 people who can be described as above, and to decide how many of that number are bank tellers, and then how many of the 100 are feminist bank tellers. When described in this way, participants no longer give contradictory answers. However, the fallacy remains when probability judgements are elicited. Within our framework we can consider membership in a concept as the probability that a person would assert that concept to describe an object. As we see in the example below, our approach to membership in a conjunctive concept does not entail that membership in the conjunction is always less than or equal to membership in each of the constituent concepts.
Example 19. Suppose T = 'bank teller' is defined as a conjunction of L_1 = 'good with numbers' and L_2 = 'medium intelligence', with P_T = (1, 1), λ_T = (0.5, 0.5), ε_T ∼ U(0, 1), and F = 'feminist' is defined as a conjunction of L_3 = 'outspoken' and L_4 = 'concerned with issues of discrimination and social justice', with P_F = (1, 1), λ_F = (0.5, 0.5), ε_F ∼ U(0, 1). Suppose we combine T and F in the space {0, 1}^2 with prototype P = (1, 1), weight vector λ = (0.5, 0.5) and threshold ε. We argue here that rather than committing a fallacy, the participants are generating a new concept 'feminist bank teller' with a new prototype. So here the additional characteristics associated with being a feminist increase the membership of Linda in the conjunctive concept.
... such as a Beta(2, 1) distribution. Lastly, an entity for which μ_{L_i}(Y_i) = 1 ∀i will have a higher membership in 'pet fish' than the goldfish, since the prototype for a pet fish which we have specified here is something that lives in the house and is furry and lives in water and is scaly. The issue here is one of interaction between dimensions, such as 'furry' and 'scaly'. We have not touched upon this type of interaction between dimensions in any examples thus far, and this is an area for further research.

Discussion and future directions
We have proposed here a formalism extending Gärdenfors' conceptual spaces theory, so as to incorporate the vagueness of natural language using a random set based prototype model. The framework we propose gives a mechanism for forming concepts as a combination of properties from integral domains. The innovation we have introduced here is that these properties are mapped into a binary combination space which is itself treated as a conceptual space. We have shown that the idea of concepts as weighted sums of properties, proposed by Gärdenfors and Zadeh, arises naturally as a special case of our framework, namely when the threshold of the concept in the binary combination space is uniformly distributed across the whole space (Theorem 4). We also characterise the combination weights as the necessity of a property to the concept, using the technical definition of necessity from possibility theory.
The framework we propose is hierarchical. Therefore, combinations may consist of multiple properties, as described above, or, as we describe in section 4.3.2, of two or more concepts which are themselves defined as combinations of properties. Again, under certain specific conditions, we can recover the results reported in [22] that the importance of properties in a conjunction of concepts is a weighted sum of the importance of the properties to the individual concepts. We also show how our framework can be applied to property-concept combination. In fact, the distinction between properties and concepts is somewhat artificial, and the conjunctive combination of any sort of concept can be performed within this framework.
A key element of human concept use, however, is that there are various instances where concepts cannot be adequately characterised as a simple weighted sum of properties. We can account for this by using our more general model, in which the threshold in the binary combination space does not have to be uniformly distributed. A key result is that necessity and impossibility are carried through from constituent concepts into the combined concept. We have shown in Example 18 how our characterisation of the importance of an attribute in terms of the concept of necessity from possibility theory allows us to take account of this phenomenon. We have further shown how attribute loss and emergence, characterised as the diminished or increased importance of an attribute to a combined concept, may occur, as illustrated in Fig. 10.
We have also explained how non-commutativity and dominance effects may be modelled, by setting the weights used in the binary combination space. Hampton finds that in general, the second noun in the conjunction is given more weight than the first. This aspect could simply be built into the weighting. Dominance effects are seen when one concept has more features than the other; again, weightings could take this into account. We have given examples to show how two of the key problems in this area may be accounted for: the conjunction fallacy and the guppy effect, illustrated in Examples 19 and 20 respectively.
We therefore argue that our framework is better able to account for key aspects of human use of concepts than standard conceptual spaces approaches [2,6], which do not attempt to account for these non-compositional features. In contrast, the quantum approach has successfully shown how to account for this type of problem. We argue, however, that our approach is more conceptually straightforward than the quantum approach, the formulation of which requires high dimensional Hilbert spaces and the mathematics of quantum mechanics.
At present, our model does not distinguish between membership in and typicality to a concept. There is certainly a difference between these two notions, and an effective model of concepts should elucidate how they can be unified, or separately accounted for. Hampton [24] argues that membership and typicality may be subsumed within one approach, on the basis that membership may be decided by applying a suitable threshold to typicality. This type of approach could easily be incorporated into our model. We might ask where the threshold should be placed; however, we could simply place the threshold at 0.5 everywhere and then tune the weights of the model, rather than having separate thresholds for each concept.
The approach we have proposed here is particularly well-suited to concepts that can be described via a collection of attributes. However, some types of concepts are less well-suited to this type of description. For example, some concepts may be defined crisply in terms of necessary and sufficient conditions, such as the concept of even numbers. Another example, given by Barsalou [8], is that of goal-derived categories. These are defined by the extent to which they allow a goal to be reached. So the category 'things to eat on a diet' has as an ideal 'foods with zero calories'. Such ideals tend not to be central tendencies of a concept, although they could still be seen as prototypical in some way. Another example given is that of 'ad-hoc categories', in particular 'things to sell at a garage sale'. Barsalou argues that these cannot really be defined by a specific set of attributes.
With regard to goal-derived categories, we argue that to a certain extent we can view the ideals as attributes to be achieved, and hence they can still be used as dimensions in a conceptual space. Ideals may not be central tendencies; however, our model does not require this, since typicality to a concept is defined via a probability distribution based on distance to a prototype. Simply setting the probability to 0 on one side of the prototype allows for non-central ideals.
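To illustrate (a hypothetical sketch, not the paper's own parameterisation): a one-sided typicality function for the goal-derived category 'things to eat on a diet', whose ideal 'zero calories' sits at one end of the dimension. The linear decay and the scale of 100 calories are arbitrary choices:

```python
def one_sided_typicality(x, prototype=0.0, scale=100.0):
    """Typicality for a non-central ideal: probability mass is set to
    zero on one side of the prototype, so the ideal (here, 0 calories)
    need not be a central tendency."""
    if x < prototype:
        return 0.0  # no mass on this side of the prototype
    return max(0.0, 1.0 - (x - prototype) / scale)  # linear decay above it

print(one_sided_typicality(0.0))    # 1.0 at the ideal itself
print(one_sided_typicality(50.0))   # 0.5
print(one_sided_typicality(200.0))  # 0.0 beyond the threshold
```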
The example of ad-hoc categories, however, does suggest further work. Categories defined via attributes may be viewed as conjunctive in some sense: we use conjunctions of attributes. Ad-hoc categories, in contrast, are defined disjunctively: this old bicycle or these tins of paint, and so on. A movement from disjunctively to conjunctively defined categories might be viewed as a type of learning: a child might initially see the concept 'cat' as defined by 'Tibbles or Patch', but later extract the attributes that allow generalisation. This would be an interesting area for further work. In particular, when using a semantic vector space approach, for example in statistical analysis of text corpora, the vectors created mix instances of the concept, which would enter a disjunctive definition, with attributes, which would enter a conjunctive definition. An analysis of when a disjunctive or a conjunctive definition is more appropriate, and of the interaction between the two in, for example, concept learning, would be of interest.
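The contrast can be sketched as follows (our illustration, with toy instances and a toy similarity measure): a disjunctive, ad-hoc category is satisfied by resemblance to any listed instance, while a conjunctive category is satisfied by weighted possession of all its attributes:

```python
# Toy sketch of disjunctive vs conjunctive category definitions.
def disjunctive_membership(x, instances, similarity):
    # Ad-hoc style: membership is resemblance to the closest listed instance.
    return max(similarity(x, inst) for inst in instances)

def conjunctive_membership(attr_scores, weights):
    # Attribute style: weighted degree to which all attributes are satisfied.
    return sum(w * s for w, s in zip(weights, attr_scores))

# Illustrative one-dimensional similarity.
sim = lambda x, y: max(0.0, 1.0 - abs(x - y))
print(disjunctive_membership(0.9, [0.0, 1.0], sim))  # 0.9: close to instance 1.0
```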
A further area for investigation is the set of questions raised in [1,27] concerning privative adjectives, i.e., those where applying the adjective entails that the adjective-noun combination is no longer an instance of the noun; examples are 'artificial', 'former' and 'fake'. The combination of the conjunctive and the hedged approaches given in [36] might be of use here in developing these more complex 'type 2' hedges, as Lakoff calls them [30]. An analysis of the effect of such a hedge might say, for example, that 'fake' increases the weight of those attributes that are visually important and decreases the weight of the other most important attributes. An alternative approach would be to tag 'definitional' attributes that the hedge 'fake' can then negate. More generally, the semantic differences between the types of adjective discussed in [41] should be considered in further work. Adjectives that modify one property of a concept are well described by our model. More complicated adjectives such as 'abusive' may be harder to describe: part of the work must be done by giving such an adjective an adequate representation in a conceptual space, after which applying the adjective may consist of adding domains to its noun or modifying existing ones.
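One way such a reweighting could be sketched (a speculative illustration only; the attribute names, the 'visual' tagging and the boost factor are our assumptions, not part of the model):

```python
def apply_fake(weights, is_visual, boost=2.0):
    """Speculative sketch of the hedge 'fake': multiply the weights of
    visually important attributes, divide the rest, then renormalise."""
    raw = {a: w * boost if is_visual[a] else w / boost
           for a, w in weights.items()}
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}

# Illustrative attributes for 'gun': its look is visual, firing is not.
w = apply_fake({'gun-shaped': 0.5, 'fires bullets': 0.5},
               {'gun-shaped': True, 'fires bullets': False})
print(w)  # 'gun-shaped' now carries 0.8 of the weight, 'fires bullets' 0.2
```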
Finally, we discuss the potential for application of this model. As presented here, in its full generality the model has a large number of free parameters, and a number of choices must be made before it can be applied. For example, the type of threshold distribution is likely to depend strongly on the type of concept or property being described. Further, the type of normalisation applied to the dimensions must be carefully chosen or learned from data. We give here a number of possible applications of the model and describe how the parameters can be limited in each case.
Firstly, the model can be used as a model of concept combination in examining psychological data. In [34] the model is applied to a range of data from [24,23,22]. An example of the type of data is as follows: participants were asked to rate the membership of instances such as 'Penguin', 'Dog' and 'Cockatoo' in concepts and their combinations; in this example, the concepts would be 'Birds', 'Pets', 'Birds which are Pets', and 'Pets which are Birds'.
Hampton's original analysis finds that mean typicality ratings can be systematically predicted using multilinear regression. We apply the same analysis to the mean membership ratings of the items. The model we use maps each of the constituent concepts into a binary combination space {0, 1}^2. Each dimension of this space is weighted, with the weights summing to 1, giving the weight vector (λ, 1 − λ). The threshold ε in the binary space is distributed uniformly, ε ∼ U(0, b).
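On our reading of this setup, the membership of an item in the combined concept is the probability that the weighted Hamming distance of its constituent membership values from the conjunction's prototype (1, 1) falls within the uniformly distributed threshold. A sketch (the membership values μ1, μ2 and parameter values are illustrative inputs):

```python
def conjunction_membership(mu1, mu2, lam, b):
    """Membership in the combination, on our reading of the binary
    combination-space model: weighted Hamming distance of (mu1, mu2)
    from the prototype (1, 1), with threshold eps ~ U(0, b)."""
    d = lam * (1 - mu1) + (1 - lam) * (1 - mu2)  # weighted Hamming distance
    return max(0.0, min(1.0, 1.0 - d / b))       # P(eps >= d)

# Illustrative values: an item with memberships 0.8 and 1.0 in the
# constituents gets combined membership above 0.8 (overextension).
print(conjunction_membership(0.8, 1.0, 0.9, 1.0))  # ~0.82
```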
Another way to reduce the number of parameters is to make the simplifying assumption that the threshold in the higher-level space is distributed according to the assumptions of Theorem 4, i.e., that the combination of concepts is simply a weighted sum of membership values. This assumption was used in investigating the adoption of conjunctive concepts in a multi-agent simulation of language evolution [34,35]. In this application, agents are equipped with a range of basic concepts which they use to communicate about points in a conceptual space. Via iterated dialogues, agents converge on a shared set of dimension weights characterising the space, and the weights to which they converge are determined by the distribution of objects within the space.
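In this special case the combination reduces to a weighted sum. A sketch (our illustration, with weights assumed to sum to 1):

```python
def weighted_sum_membership(mus, weights):
    # Theorem 4 special case: combined membership is the weighted sum
    # of constituent membership values (weights assumed to sum to 1).
    return sum(w * m for w, m in zip(weights, mus))

def threshold_membership(mus, weights):
    # Equivalent threshold form: 1 - d, where d is the weighted Hamming
    # distance to the prototype (1, ..., 1) and the threshold is
    # uniform on (0, sum(weights)) = (0, 1).
    d = sum(w * (1 - m) for w, m in zip(weights, mus))
    return 1.0 - d

mus, w = [0.7, 0.4], [0.6, 0.4]
print(weighted_sum_membership(mus, w))  # ~0.58
print(threshold_membership(mus, w))     # same value
```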
Applications of the theory might also be possible in online classification tasks, for example in classifying films. In an analogue of the 'pet fish' phenomenon, a film might have the typical characteristics neither of a horror film nor of a comedy, yet be prototypical of a 'comedy horror'. In such cases, a weighted sum formulation on its own cannot account for the phenomenon; indeed, the fact that comedies and horrors share few characteristics might be taken as diagnostic of the presence of non-compositional effects. Further, we have not yet developed a theory of how to deal with contradictory attributes. These will be key to how non-compositional effects arise, and a simple weighted sum combination does not explain them satisfactorily.
Within the framework we have discussed conjunction and disjunction of nouns and adjectives. There are, of course, many other operators and word types that should be captured in a full account of concepts. Gärdenfors and Warglien have started to develop conceptual spaces for verbs [21,48], and Gärdenfors has begun the development of a semantics for conceptual spaces [20]. Other approaches to developing a full semantics use the notion of a semantic vector space based on text corpora [11,38]. Our model could be extended to include other word types and composition types. Within our framework we have explicitly avoided the problem of conflicting prototypes; however, this is an area which must be developed, not least because conflicting attributes allow interesting phenomena to emerge. This goes hand in hand with another aspect that we have not discussed: the need for some sort of inference system.
A key element of our framework is the weights given to the dimensions of the combination space. Results in [35] and ongoing work show that, within a multi-agent model of language users, these weights can be related to the distribution of elements in the conceptual space, explaining why and how different dimensions should have different weights.
Further research will investigate how attributes may affect one another. For instance, in the pet fish example, the attributes 'furry' and 'scaly' might interact, since they are to some extent incompatible. We have not yet examined such interactions; indeed, we currently require that the constituent concepts not have contradictory prototypes. Developing the framework further for these cases is likely to allow even more effective modelling of non-compositional effects.

Fig. 1. Prototype-threshold representation of a concept L_i. The conceptual space has dimensions x_1 and x_2. The concept has prototype P_i and threshold ε_i; the uncertainty about the threshold is represented by the dotted line. Element a lies within the threshold, so we can say that a has the property L_i; element b lies outside it, so b does not. The neighbourhood N^{ε_i}_{L_i} = {x : d(x, P_i) ≤ ε_i} corresponds to the set of elements possessing L_i.

Fig. 2. Conceptual space divided into concepts according to a Voronoi tessellation around prototypes. Each part of the space corresponds to exactly one concept.

Fig. 3. Conceptual space divided into concepts according to a prototype-threshold approach. Some points in the space correspond to more than one concept, and some correspond to none.

Fig. 4. Schematic of a hierarchical conceptual spaces model for combining concepts.

Theorem 4. Let α = ∑_{i=1}^n ±L_i and λ_T = ∑_{i=1}^n λ_i, and let δ be the uniform distribution on the interval (0, λ_T). If d is the weighted Hamming distance H_λ, then the membership value of the combination α is the weighted sum of the constituent membership values.

Fig. 8. Schematic illustration of the second-level binary space.

Table 2. Table of ratings.