1 Introduction

In this paper, I critically assess Mohan Matthen’s account of conscious perceptual visual content. I am interested in a certain duality that characterises Matthen’s position, namely between descriptive and referential elements of conscious experience. Descriptive content is our awareness of sensory features belonging to objects located at a position in the visual field. Matthen conceives of this in terms of an image. The referential element is a demonstrative form of content, by which we pick out those objects as particulars and assert their physical presence. Together, these descriptive and referential elements make up the ‘assembled message’ that visual states present to the perceiver in perceptual experience (Matthen, 2005, p. 305).

Matthen’s bipartite view of content invokes further contrasts, between content and attitude, between conceptual content and an ‘inarticulate’ feeling of presence (2005, p. 304), and between the merely imagistic and particular, demonstrative experience. In what follows, I problematise this tendency towards duality, arguing that Matthen leaves himself unable to explain how perceptual experience purports to present the world as it is. I have two strategies, focusing on each of the two elements of perceptual experience. I begin with the referential element. I identify several roles assigned to it and argue that on the two most promising readings of Matthen’s position, there is good reason to doubt that the demonstrative element can play the role of assertion operator. I then turn to the descriptive element, Matthen’s descriptive content (2005, 2010) or image content (2014b). How should we think of this sensory element of experience? I provide evidence to show that Matthen is wedded to a mental entity view of descriptive content, on which the subject is aware of a reified image that in turn represents a visual scene. This mental entity proves a defective ingredient from which to construct an experience that presents the actual world. There are problems integrating the depicted viewpoint with the subject’s actual viewpoint, and problems maintaining a coherent notion of assertion. I argue that the project of assembling perceptual experience from descriptive and referential elements is unworkable.

Matthen is not the only author to introduce a demonstrative form of content to explain the particularity of perceptual experience. In recent literature, phenomenal particularity has often been conceived as an explanandum for adherents or opponents of relationalist views of veridical perceptual experience.Footnote 1 Against the relationalist conception, Montague for instance argues that there is a ‘bare demonstrative thought-form’ (2016, p. 136) which is responsible for the fact that perceptual experience is object-positing.Footnote 2 Montague takes this object-positing to explain the phenomenology of particularity without implicating ‘actual objects’ as constituents of the experience (2016, p. 139). By contrast, both French and Gomes (2019) and Beck (2019) appeal to a notion of perceptual experience (and so phenomenology) that implicates actual objects themselves. For French and Gomes, this is the naïve realist ‘phenomenal nature’ of perceptual experienceFootnote 3 (2019, p. 48), while for Beck it is the relationalist, ‘broad conception’ of phenomenology (2019, pp. 8–9). In Sect. 3, I argue that while Matthen certainly considers descriptive content to be non-relational, both a relational reading and an intentional reading of his referential content are outwardly plausible but ultimately untenable. However, what sets Matthen’s proposal apart from others in the phenomenal particularity debate is that he explicitly claims that the demonstrative content responsible for phenomenal particularity is also responsible for assertion. No other parties to the debate draw a connection between the demonstrative and the assertoric, and so they are untouched by the criticism I offer here. It is Matthen’s account of assertion, not particularity, that is the central interest of this paper.

The remainder of this paper has three sections. In the first (Sect. 2), I present Matthen’s view of visual perceptual content. In the second (Sect. 3), I critically examine two roles that Matthen assigns to the referential element of perceptual content. I argue that there is nothing about its first role of achieving demonstrative reference that enables it to play its second role as an assertion operator. In the third (Sect. 4), I trace Matthen’s assigning the referential element the role of assertion operator to his endorsement of the Shared Content Principle: the claim that descriptive content can be identical across perceptual experience, episodic memory and visual imagination. I show that Matthen’s notion of descriptive (or image) content should be understood as a mental entity, before raising two problems for employing this account in the perceptual case. Finally, I turn to a recent paper of Matthen’s that returns to the contrast between perceptual experience and visual imagination (2021). I argue that Matthen’s appeal to ‘the activity of looking’ (2021, p. 3272) represents a break with the view that the descriptive and referential exhaust perceptual content and phenomenology. However, this move is unsuccessful, as any aspect of perceptual experience can be represented in visual imagination.

2 Matthen’s account of perceptual content

Matthen proposes that the content of perceptual experience is propositional. But whereas sentences express propositions linguistically, perceptual experiences express propositions imagistically. We tend to think of propositions as abstract compositions of objects, properties, relations and quantifiers. Visual experience expresses these propositions ‘in its own expressive medium, i.e. in visual qualia, not in English’ (2010, p. 108). Matthen is quite specific about the kinds of proposition that sensory experience can and cannot express. He holds that the visual field is organised around objects—usually material objectsFootnote 4—which possess visual features and are positioned at locations relative to one another and a viewpoint. Matthen (2014b, p. 273) formalises this in terms of:

(predicative) feature-placing structures of the form <sortal S, feature F, locations L>

Each of these propositional structures expresses a ‘situation-type’, for example <House, yellow, to the left>. The content of a visual image could be expressed by a (perhaps indeterminately large) set of these predicative structures. There are limits to the kinds of propositions image content can express. ‘In particular it cannot express negation, disjunction or quantification. There is no image, for instance, of all tigers being striped’ (272). Likewise, images cannot express ‘absolute location and time’ such as here and now (265). Matthen stresses that image content ‘places objects and their features in a unified spatiotemporal matrix’ (273). Every location in the visual field is spatially and temporally connected to every other, so that an image can express location and time relative to the objects it represents. An image cannot represent an event as happening now, but it can represent two events as happening simultaneously, for instance. Note, however, that Matthen is not denying that perceptual experience can represent a scene as happening here or now, only that this is provided by its referential, not imagistic, content. While it is in his (2014b) chapter that Matthen most explicitly adopts the terminology of ‘image content’, this notion had been developed in his (2005) and (2010) under the moniker ‘descriptive content’. In what follows, I use the term descriptive content and take it to be identical with the visual image content of (2014b).

The (2005) and (2010) texts tie descriptive content to the functioning of the descriptive vision system—a grouping of neural data pathways that each run through the so-called ventral stream in the brain. Appealing to the Two Systems research programme,Footnote 5 Matthen proposes that the imagistic characterisations produced by this system are at least potentially conscious, are storable and recallable in memory, and can influence future behaviour and reasoning (2010, p. 119).

Crucially, Matthen also appeals to a second vision system, this one involving the dorsal stream—motion-guiding vision. Motion-guiding vision ‘does not provide us with the kind of conscious visual datum that we get from descriptive vision’ (2005, p. 300). That is to say, objects are not imagistically represented in terms of visual properties. Nonetheless, it does provide information about those objects that guides action. In particular, motion-guiding vision is harnessed in coming into contact with and physically manipulating material objects. It is through motion-guiding vision that we take in the details of a pencil’s orientation so that our grip is aligned when we reach to pick it up. It gears us into our environment: we are ‘perceptually coupled’ to the particular objects we are in the presence of (316). Here, ‘the link between motion-guiding vision and bodily motion is direct; it is not routed through consciousness’ (297). It takes place independently of the feature-predication that characterises descriptive vision.

Now, I don’t mean the argument that follows to turn on the empirical question of whether the division into descriptive and motion-guiding vision is the best way to understand how vision functions. But some awareness of the empirical distinction between descriptive and motion-guiding vision helps to frame the distinctions that Matthen draws within conscious experience. For although motion-guiding vision does not produce a sensory state, Matthen proposes that it does make a contribution to perceptual experience. When we are geared into environmental objects through motion-guiding vision, sensory consciousness is supplemented by a feeling of presence. This is ‘a felt spatial connection to seen objects’ (Matthen, 2010, p. 115) which is characteristic of visual perceptual experience.Footnote 6 The feeling is intentional in that it attaches to particular objects in a visual scene. At the same time, it is:

inarticulate: that is, it supplements, but does not provide, awareness of the objects’ features. Nevertheless, it makes a difference to the quality of one’s visual awareness of an object. (2005, p. 301)

Part of Matthen’s motivation for positing the feeling of presence is phenomenological:

When I look down at my hands right now, it looks as if they are working on a black computer keyboard. There is something about my visual state that makes it seem as if the keyboard is really there, and that it is really black. (2010, p. 107)

This something is not itself imagistic. I could just as well imagine or remember the black keyboard, conjuring a perfect image of my hands over the keys, and still the keyboard would not seem to be actually there. Rather, Matthen claims, it is motion-guiding vision’s picking out this object for bodily interaction that contributes a feeling that it is present to us. ‘The availability of agent-centred spatial relations creates a feeling of presence in real-life seeing’ (2005, p. 315). I think it is fair to surmise that the feeling is not genuinely affective in the manner of feelings of anger or tiredness. Nonetheless it has an experiential dimension, ‘[making] a difference to the quality of one’s visual awareness of an object’ (301). Matthen classes the feeling of presence as a cognitive feeling.

A cognitive feeling C is a subpersonally generated, phenomenologically accessible feature of a mental state S that imparts to S semantic or practical import different from that of another state S’, though S and S’ have the same [descriptive] content. The difference of import between S and S’ is accounted for by construing the feeling as a propositional operator (or neustic as R. M. Hare called it in 1952) that operates on the content of S. (2010, p. 114)

This leads us to Matthen’s key claim, that the feeling of presence is an ‘assertion operator’ (2005, p. 306) or ‘actuality committing’ (2010, p. 114). Recall that perceptual experience is not neutral with regard to the veridicality of its content. Rather it asserts that its propositions reflect how things are. Matthen argues that when the feeling of presence accompanies a visual scene it plays this role, asserting that objects in the perceiver’s actual environment are as visual experience presents them. He compares visual perceptual experience with other visual states that do not bear the feeling of presence, namely visual imagination and pictorial seeing. Just as imagining or looking at a picture of a black keyboard will not make the keyboard seem real, neither need visual imagination or pictorial vision assert that the visual scene they represent reflects reality. (Of course, imagination and pictures can reflect reality, but this depends on context. It is not in their intrinsic nature to do so, Matthen claims.) Matthen notes that ‘this feeling can be mistaken’ (2005, p. 315). In looking at a hologram (or presumably a hallucinated object), a feeling of presence asserts a solid object when no such object is in fact there. It is in light of the feeling of presence that both holograms and genuinely solid objects seem to be physically there.

How does this assertion work? Matthen holds that the feeling of presence asserts because it is constituted by a non-descriptive, referential element. This is a second, demonstrative form of content: ‘a pure demonstrative, a cognitive relationship between perceiver and object, devoid of all descriptive content’ (2005, p. 319). To illustrate the force of this demonstrative element, Matthen imagines two perceivers looking at distinct but qualitatively near-identical blue spheres:

You are sitting in a darkened room looking at an illuminated blue sphere S1; I am sitting in a darkened room thousands of miles away looking at an exactly similar sphere S2. Though we have qualitatively similar experiences, your experience is about S1, while mine is about S2. (2010, p. 122)

The difference between the two experiences is not reducible to a difference in descriptive content. The content of both perceptual experiences might be expressed as <sphere, blue, upper left>, yet Matthen is right to try to preserve an important distinction. For it is S1 and not S2 that I am looking at, and this fact of numerical identity transcends any qualitative likenesses. If I were to reach out my hand, it would be this sphere that I approached. Matthen contends that this ability to refer to an object as a particular is not reducible to sensory awareness. Rather, it constitutes an additional form of conscious content. It is an inarticulate form of reference achieved by the functioning of motion-guiding vision. And the phenomenology of this kind of reference simply is the feeling of presence. It is because the feeling of presence provides a direct perceptual link between perceiver and object that it is able to assert that things are as descriptive content presents them as being.

In addition to this demonstrative awareness of particularity (that), Matthen holds that perceptual experience includes awareness of two other indices that cannot be conveyed through descriptive content: absolute location (here) and time (now). While perceptual experiences always assert a scene as happening here and now, ‘they do not do so by means of their image content’ (2014b, p. 281).

In sum, Matthen holds that ‘our visual states present us with an assembled message, a message that has a descriptive element as well as a referential one’ (2005, p. 305). We experience objects, their visual properties and their location relative to the visual scene as descriptive content. But our experience also refers to objects as particulars and asserts their presence in the here and now due to the contribution of the referential element. Whether, on this descriptive-referential framework, these two elements can work together in the way Matthen envisages is the subject of the remainder of the paper.

3 The referential element: Difficulties in accommodating its two roles

In this section I problematise Matthen’s account of the functioning of the referential element. I ask whether the feeling of presence is able to play both of the roles that Matthen requires of it, namely as accomplishing demonstrative reference and as an assertion operator. I argue that on the two outwardly plausible interpretations of Matthen’s view, the referential element is ill-equipped to assert that the descriptive element reflects how things are. Moreover, I argue that by Matthen’s account, the perceiver is in a less privileged epistemic situation than we normally take visual perceptual experience to place us in.

It is worth reminding ourselves of the different roles assigned to the referential element of perceptual experience; what Matthen terms the feeling of presence. First, the feeling of presence constitutes ‘an indexical or demonstrative form of conscious content’ (2010, p. 107). It provides inarticulate (i.e. non-descriptive) reference to particular objects, and in so doing contributes to the total content of the perceptual experience. Second, the feeling of presence functions as an assertion operator. When ‘attached to a visual scene, the feeling of presence asserts it, so to speak—it makes one feel that the scene being described is present’ (2005, p. 305). Here Matthen draws a direct parallel with propositional attitudes. Just as belief (but not hope) commits to the truth of a proposition, so Matthen contends that the feeling of presence commits a perceptual experience to the truth of an (imagistically expressed) proposition,Footnote 7 and by extension to the veridicality of the experience. We can construe the feeling as a ‘propositional operator… that operates on the [descriptive] content’ of a visual experience (2010, p. 114). The question I wish to pursue is how the feeling of presence is able to play both of these roles. What is it about the demonstrative element that allows it to assert the content of a visual scene?

Now, one might object that there need not be something about the first role that allows the feeling of presence to play the second. Take the roles of lecturer and parent. I can perform both of these roles, but it would be misplaced to ask what it is about the role of lecturer that allows me to play the role of parent. The roles are not connected. They are just two things that I can do. But there is an important disanalogy between these roles and those that Matthen envisages for his feeling of presence. For while my ability to perform both roles is grounded in the fact that I am a concretely existing human being (with some minimal social competencies), the feeling of presence has no such concrete existence or prior characteristics. It is an abstract entity conceived wholly in terms of the roles it is postulated as playing. At the level of explanation at which it comes into view, there is no more to it than the roles it plays and the phenomenology that is taken to follow from them. So, if it were not through playing one role that it plays the other, one might wonder what supports our conceiving of the feeling of presence as a single entity at all. As it happens, Matthen does seem to build into his account that it is through playing the demonstrative role that the feeling of presence asserts. The question is how this is supposed to work.

Clearly, Matthen thinks that the functioning of motion-guiding vision lies at the heart of this explanation. But even if we accept Matthen’s causal account of the operations of the vision system,Footnote 8 there is a certain ambiguity to his view of how those operations influence content. To bring out this ambiguity, consider the following passages from (2005) and (2010) respectively:

There is something in perceptual experience that relates the perceiver to a particular object. (2005, p. 316, my emphasis)

The Feeling of Presence arises, then, out of an informationally rich demonstrative relationship between visual experience and visual object (2010, p. 123, my emphasis)

What can we say about this relation between experience and object? One reading of Matthen is that the relation is existence-implicating: it cannot hold without the existence of its relata. I term this the relational reading of demonstrative reference. There is considerable textual support for this interpretation. For instance, Matthen often describes the motion-guided perceiver as ‘perceptually coupled’Footnote 9 to an object (2005, p. 319), suggesting that the relationship between them is more concrete, more object-dependent, than merely descriptive vision. In the same vein, Matthen often frames this relationship as a connection (a ‘sensorimotor connection’ (2005, p. 319); ‘action-enabling connection’ (2010, p. 107); ‘direct connection’ (2005, p. 305)). Moreover, in his account of how demonstratives function, Matthen holds that ‘the relationship R is sensitive to the location of the object demonstrated’ (2010, p. 122). Presumably the absence of such an object would then preclude such a relation. Finally, while Matthen does acknowledge that motion-guiding vision can misrepresent, his go-to example is not hallucination but holograms. This is significant because, as he writes, ‘holographic images can engage motion-guiding vision’ (2005, p. 315). So when a hologram is misrepresented as a genuine material object, there is still a visible object there for motion-guiding vision to inform about. By contrast, nowhere does Matthen consider the possibility of motion-guided awareness of an object that does not exist, presumably because he thinks of this causal awareness, and therefore the type of reference it facilitates, as existence-implicating.

At first glance, an explanation of how the feeling of presence is able to play the role of assertion operator seems within reach for the relational reading. The referential element of content amounts to an existence-implicating demonstrative relation with the perceptual object. For this kind of relationship to hold, the perceptual object has to exist. Therefore, when the referential and descriptive elements are combined in perceptual experience, simply by referring the feeling of presence asserts the existence of the object that descriptive content represents.

However, this explanation of assertion on the relational reading faces problems. One issue is that it struggles to handle the kind of perceptual error in which there is no object present: cases where there is nothing for motion-guiding vision to couple with. Take a hallucinatory perceptual experience of a red ball. At least some kinds of hallucination assert in much the same manner that veridical experience does.Footnote 10 But the relational reading is at a loss to explain how this could be so. For without the existence of an object, the perceiver is unable to enter the kind of demonstrative relation by which their experience could assert. So the explanation of assertiveness that Matthen (on this reading) offers cannot extend to cases of hallucination.

The relational reading also faces an issue with the scope of the assertion. Standing in an existence-implicating demonstrative relation guarantees that a particular object exists. The feeling of presence is thus well equipped to assert the existence of the object descriptively represented. However, there is still a question of how the referential element is able to assert that the object has the visual properties that descriptive content assigns to it. Why should an inarticulate demonstrative relation to a particular put a state into a position to assert that its colour, texture, glossiness, are as descriptive content classifies them? Indeed, even if colour and texture are conveyed by the motion-guiding vision system, Matthen is explicit that they are conveyed unconsciously and do not form part of the referential content of experience. It seems that on a relational reading, there is still a problem with how the referential element is able to assert descriptive content.

Let us turn to the alternative reading of Matthen’s view of demonstrative reference, one I term the intentional reading. On this view, the referential element is not relational. It does not depend on the existence of the perceptual object and, as such, it is more closely aligned with descriptive content. Just as we can descriptively represent a red ball when no such ball is present, so it is possible to demonstratively refer to a ball, picking it out as a particular, when there is no ball there. Certain forms of hallucinatory experience again offer the relevant example. Now if the ‘informationally rich demonstrative relationship between visual experience and visual object’ is intentional but not existence-implicating (2010, p. 123), why should bearing such a relationship lead an experience to assert descriptive content? We have seen that this form of reference picks out a particular, and moreover how this picking out of a particular is served by the motion-guiding vision system gearing the perceiver into physical interaction with an object. But why, having picked out an object as a particular, should the experience assert that it has certain sensory properties? Singular reference does not entail assertion. Think back to the other propositional attitudes Matthen considers. My hope ‘that she will call’ picks out a particular (my sister) but does not assert that she will do anything. Nothing about demonstrative reference as such enables it to play this assertive role. Matthen contends that it does, and indeed his account requires that it does, as he is committed to the view that descriptive content cannot itself assert the actuality of what it represents. But an argument showing why in visual perceptual experience assertion should follow from reference to a particular is absent from Matthen’s writings.

I want to add a final epistemological point about assertion that I think raises further problems for Matthen’s view. Assertions aim at how things are. Different propositional attitudes assert in different ways. We tend to think of belief’s assertiveness as based on rational deliberation. I believe that my sister will call because believing so is in keeping with the commitments of the rest of my belief system (i.e., my belief that she promised she would call; my belief that she reliably keeps her promises etc.). By contrast, perceptual experience’s assertiveness seems based on the implicit idea that perceptual experience simply reveals how things are. If this is how things are, assertion is the only appropriate attitude a state can adopt towards it. Now we ordinarily take perceptual experience to offer us a certain epistemological privilege over and above the privilege of getting things right when perceptual experience is veridical. This is a privilege that distinguishes it from other ‘assertive states’, such as belief. In the words of Bill Brewer, as well as revealing how things are, perceptual experience also reveals how the perceiver ‘is right about the way things are in the world around her’ (1996, p. 259). For unlike belief, perceptual experience does not convey how things are in abstraction. Rather, from the viewer’s perspective, the whole spatial configuration of the scene, and so the kind of spatial relation that the perceiver seems to stand in to the perceived object, is on display. It is apparent to us how and why we are right in our assertion (if we are right): because from here, the world is set out like this.Footnote 11 The epistemic privilege I am describing involves what Johannes Roessler terms an implicit ‘causal understanding’ (1999, p. 52): the red ball, its location relative to your own, the fact that your line of sight is unimpeded and that your visual faculty is in working order; this is all part of what it takes to find oneself in a position where things appear like this. There is a kind of internalist idea of assertion at play here, in that the very same experience that purports to put you in touch with the object puts you in a position to see how you are right in your assertion. But by Matthen’s account, this epistemological privilege is undermined. Even when all goes well and we are veridically representing the red ball, our assertion does not follow from our sensory experience. The assertion is a consequence of a different, unconscious process (motion-guiding vision), and there is nothing about experience, nothing accessible to the subject, that explains why this experience should be asserted. Rather, the feeling of presence is a brute assertion operator. The pertinent information about why we are right in our assertion is not itself available to conscious experience. In short, on Matthen’s view the perceiver occupies a less privileged epistemological position than one would tend to ascribe to them. For although she may well be right in her assertion, she is not in a position to see how she is right. There are doubtless some who are dismissive of the epistemological significance of seeing how you are right, but for those who think it crucial to what sets perceptual experience apart from other committing-states, this is another reason to be sceptical of Matthen’s division of descriptive and referential content.

In this section I have shown that Matthen’s division of content into descriptive and referential elements faces difficulties. The major issue is that he envisages two distinct roles for the referential element. The feeling of presence provides demonstrative reference and asserts descriptive content, but on both the relational and intentional readings of Matthen’s view there is little reason to think that the element that plays the first role is equipped to play the second. Moreover, I raised an issue about the kind of access Matthen’s account affords a perceiver to their reasons for assertion, arguing that the perceiver is in a less privileged epistemological position than many take perceptual experience to bestow. In short, the descriptive and referential elements won’t play together as Matthen would like. But why does Matthen need to appeal to the referential element to function as assertion operator? In Sect. 4, I trace Matthen’s motivation to a certain intuition about descriptive content, namely that it can be identical across perceptual experience, episodic memory and visual imagination.

4 The descriptive element: A mental image

If the referential element cannot assert, there are two options available. We can attempt to amend Matthen’s account by locating assertion somewhere else in a framework that still contains descriptive and referential elements, or we can reject the conception of perceptual experience as an assembled message. In what follows, I argue we should take the latter approach. For once we clarify the sense in which descriptive content is imagistic, we will have reason to question whether combining the descriptive and referential could ever provide the kind of engagement with the world that we take to be distinctive of perceptual experience. Indeed, the difficulties outlined above of accommodating an assertion-operator in referential content can be viewed as a symptom of a more fundamental instability in the very idea of an assembled message.

I begin the final section by adopting a more diagnostic approach. I am interested in what leads Matthen to assign to a purely referential element of content the role of assertion operator, a role I have argued it is not equipped to play. I trace Matthen’s motivation to an aspect of his view of descriptive content, namely that descriptive content can be constant across different kinds of visual experience. Matthen formulates this as the Shared Content Principle (2014b, p. 280). With this principle on the table, I argue that descriptive (or image) content can only be understood as a kind of mental entity, a reified image. But this in turn leads to severe difficulties, both in integrating the descriptive point of view with the subject’s actual point of view, and in maintaining a coherent account of what it means to assert an image. Finally, I turn to a recent paper of Matthen’s in which he identifies a role for ‘the activity of looking’ in distinguishing perceptual experience from visual imagination (2021, p. 3272). I argue that this represents a break with the view that the descriptive and referential exhaust perceptual content and phenomenology. However, Matthen’s account is unsuccessful as any aspect of perceptual experience (including the activity of looking) can be represented in visual imagination.

We can perceive, imagine or recall ‘a red ball at the end of a garden’.Footnote 12 For Matthen, the descriptive content of any one of these acts could constitute the descriptive content of any other. He formulates this insight as the Shared Content Principle.Footnote 13

Shared Content Principle: Any perceived feature-placing structure could figure in the image content [i.e., descriptive content] of perception or recollection or imagination. (2014b, p. 280)

With this explicit, we can begin to see why Matthen is impelled to conceive of the feeling of presence as an assertion operator. Matthen draws from the Shared Content Principle that ‘if two sensory acts are directed to the same sensory image, but differ with respect to some commitment, p, then p cannot be part of the image content of these sensory acts’ (2014b, p. 282). So, if both perceptual experience and sensory imagination can represent a red ball at the end of a garden, but only perceptual experience is committed to the actuality of the scene, the assertion cannot be a part of the descriptive content of perceptual experience. Assertion must take place ‘outside the image’Footnote 14 (283). As Matthen is committed to another non-descriptive form of content, the feeling of presence, we can gather why Matthen would be moved to assign it the role of assertion operator. For indeed it is only as a propositional operator working on the proposition-expressing descriptive content that something non-descriptive can influence a representation, given Matthen’s descriptive-referential framework for thinking about sensory representation.

Matthen is certainly right that the same scene could in principle be presented in perceptual experience, imagination and memory. There would then be important representational similarities between these kinds of state. The same ball, the same colour, the same position in the garden. The representational similarity might also fix a phenomenological similarity between the different kinds of mental state. The Shared Content Principle explains these similarities by maintaining that these kinds of state can bear identical content.Footnote 15

Matthen’s adherence to the Shared Content Principle affords us a better grasp of how he conceives of descriptive content. For it is a neutral element, an image, which is equally applicable to three very different kinds of experience. With this commitment clarified, we can ask: in what sense is descriptive content imagistic on Matthen’s view? Put another way, what kind of a representation are the images that make up descriptive content? Answering this question will bring into focus what is shared between perceptual experience, visual imagination and episodic memory. It will also bring out a tension that I contend is at the core of Matthen’s picture, between the way in which we engage with images and the kind of vantage on reality that we ordinarily take perceptual experience to provide.

Descriptive (or image) content is a three-dimensional array of sensible features predicated of objects that are arranged in a spatial matrix relative to one another and a point of view. It is also a neutral element shared across the three kinds of experience under discussion. A subject takes a particular kind of attitude towards this content, where this attitude is then definitive of the kind of experience the subject is having:

Subject + episodically remembers (attitude) + sensory image (content) (Matthen, 2010, p. 110)

Matthen also urges that we ‘[take] image content seriously’ (2014b, p. 267). I take this to underlie a claim that representational content is not simply to be individuated in terms of veridicality conditions; rather, the image and its distinctive format are crucial to specifying what an experience represents. Indeed, given his commitment to the Shared Content Principle, Matthen cannot appeal to veridicality conditions alone to individuate descriptive content. This is because content individuated in terms of veridicality conditions could not be shared across perceptual experience and episodic memory.Footnote 16 What it takes for a perceptual experience of a red ball to my left to be veridical is that there is presently a red ball to my actual left, but what it takes for an episodic memory of a red ball to my left to be veridical is rather that at some time in the past a red ball was to the left of my viewpoint then. Given this divergence in veridicality conditions and his commitment to the Shared Content Principle, Matthen is quite right to appeal to a neutral element, an image, to individuate descriptive content.Footnote 17

I propose that the only interpretation of Matthen that answers the question of what kind of representations images are, that accommodates all of the foregoing criteria, and that grants image content a role in specifying veridicality conditions, is that Matthen subscribes to a mental entity view of images. In particular, a view on which the descriptive content shared between perceptual experience, episodic memory and visual imagination is a kind of mental picture. The subject is aware of this mental entity, a reified image, which in turn represents a visual scene. Different attitudes to this mental picture facilitate the three different kinds of visual experience under discussion.Footnote 18 Before providing textual support, I note that this is a natural interpretation that accommodates the predicative feature-placing account of descriptive content and shows how the Shared Content Principle could work. The mental image is exactly the kind of neutral element that can individuate content and still be shared across the three kinds of experience. The proposal also makes sense of Matthen’s image talk—for what are images if not pictorial representations? And it also distinguishes Matthen’s view from a more standard representationalism that doesn’t take image content seriously—a representationalism on which veridicality conditions alone individuate representational content. For this standard view, on which we are not aware of an image but merely represent a scene imagistically, doesn’t identify a special role for the image in setting veridicality conditions. As I will demonstrate, the image as depiction introduces certain spatial properties (like position relative to a depicted point of view) that play a significant role in establishing veridicality conditions. And as I set out above, this standard view is inconsistent with Matthen’s adherence to the Shared Content Principle.

But the best textual support for a mental entity view of descriptive content is that Matthen explicitly likens descriptive vision to pictorial-seeing. He writes that ‘pictures evoke in us an awareness of the same descriptive visual features of objects as real-life scenes’ (2005, p. 313). For pictures, like image content, present objects bearing sensory features relative to a viewpoint. Indeed, Matthen’s interest in pictures is exactly that ‘seeing in pictures… forces motion-guiding and descriptive vision apart’ (2005, p. 306). When looking at a picture, we are unable to pick out particulars or locate them in egocentric space. Rather depicted objects are arrayed relative to one another, in an internally coherent spatial matrix, just as image content is. Matthen goes on to claim that ‘what holds of pictures holds equally of visually imagined scenes’ (313), and so by the Shared Content Principle, all descriptive content. The only difference is that pictorial seeing also involves an awareness of the surface of the picture (2010, p. 116).

Now if I am right that we ought to think of descriptive content (image content) as a kind of mental picture, this leads to problems in asserting the image in the perceptual case. Recall that for Matthen, in descriptive content objects and properties are arrayed relative to a point of view. Whose point of view is this? It is certainly not the subject’s actual point of view. After all, when I episodically remember the red ball that yesterday I glimpsed to the left of my visual field, this is not to say that I recollect the red ball as to my actual left (the left of my present body). Rather, if the image is a mental depiction, it follows that the point of view is a depicted point of view. Consider a non-mental image, Camille Pissarro’s The Boulevard Montmartre at Night. The painting depicts a nocturnal street scene. There is a line of electric streetlamps running down the middle of the boulevard, and to the left in the foreground is a tall dark tree. Further to the left, crowds of people pass the lighted shops. These objects are positioned relative to one another, but we can also ask: To whose left do they appear? As I look at the painting, the dark tree need not be to my actual left. Indeed, as I pace around its room at the National Gallery, while the position of the painting itself relative to my movements changes (now to my left, now to my right), the position of the tree depicted does not. For the point of view relative to which it is to the left is itself a depicted point of view. This effect is not achieved by explicitly painting an observer into the scene. Instead, the position of the depicted objects and the angle at which they are depicted imply a depicted viewpoint, a perspective internal to the scene and to which they appear. In The Boulevard Montmartre at Night this viewpoint is elevated above street level and is left of the centreline of the road.

The same must hold if we move from a painted image to a mental entity image. The viewpoint is itself depicted. It is part of what the image represents. In the case of visual imagination, this is unproblematic for Matthen’s purposes, though there are independent reasons to be suspicious of the proposal that imagination has an internal point of view (cf. Williams, 1973, pp. 35–37). But in perceptual experience, Matthen writes that ‘[t]he “here and now” of such an experience depends on the point of view being that of the perceiver’ (2014b, p. 272). Matthen needs the point of view on the perceived scene to match up with the subject’s own point of view. What it takes for a perceptual experience of a tree to the left to be veridical is that there is such a tree to one’s actual left. But if the former point of view is depicted, this is just not possible. For you cannot occupy a depicted point of view. It will always belong to a different spatial plane from that of the subject observing the picture. At best, one might be tricked into experiencing oneself as occupying a viewpoint which one is merely representing. Perhaps this is how certain large-scale trompe-l’œil pictures operate (see Matthen’s discussion of Andreas Gursky’s Schiphol photograph (2005, p. 316)). But a view that claimed that we are systematically fooled into experiencing a mentally depicted viewpoint as our own would be a very strange one, with grave consequences for the prospect of perceptual knowledge.

Setting the analogy with pictorial images to the side for a moment, let me motivate the central claim of the previous paragraph, that you cannot occupy a (mentally) depicted point of view, in a way that makes explicit why an adherent of the Shared Content Principle must accept it. To occupy a point of view is to take a position on where the origin of that point of view is: it is here, in the present, where I am, such that objects to the left of that point of view are to the left of my body. With this stated, it is clear that in episodic memory we do not occupy a point of view on the remembered scene (depicted or otherwise). It follows that representing and occupying a point of view can come apart. Now an adherent of the Shared Content Principle who accepts the mental entity image view requires that the point of view depicted be neutral as to where the origin of that point of view is. Otherwise, it will not serve as shared content for perceptual experience, episodic memory, and visual imagination. But if the point of view depicted in perceptual experience is neutral, it cannot be occupied, for to occupy a point of view is to take a position on the spatial origin of that point of view. Descriptive content is designed to achieve neutrality. All of this is not to say that you cannot describe an occupied point of view in a neutral way. You could do so by abstracting away from certain spatial and temporal features of that viewpoint. But that merely descriptive specification of a point of view (Matthen’s shared descriptive content) cannot be occupied.

Does it help that Matthen conceives of image content as ‘three-dimensional arrays of located sense features’ (2014b, p. 265, my emphasis)? This depends on what it means to claim that an image is three-dimensional. If Matthen means merely that the image depicts three-dimensional bodies arrayed in three-dimensional space, this is consistent with a depicted point of view. The Boulevard Montmartre at Night or any photograph or film of three-dimensional objects will provide an external image analogue. There is an implicit viewpoint on a depicted three-dimensional scene, which is itself internal to the depiction. But if, rather, Matthen’s three-dimensional array is more like a sculpture, or hologram, or better still a model scene, whereby the image itself is three-dimensional and the subject can move around it freely, then it follows that the viewpoint would not be depicted. For as you move around a model village, the model windmill is not to a depicted left but your actual left. The viewpoint is your own. But there are sizeable problems with this interpretation of three-dimensional content. First, the interpretation fails to respect the explicit analogy Matthen draws between descriptive seeing and pictorial seeing. Worse, it furnishes descriptive content with exactly the kind of agent-centred grasp of egocentric spatial properties that Matthen reserves for referential content, collapsing the very descriptive-referential duality. For this reason, we should think of the three-dimensionality as imaged (depicted) rather than a property of the image (of the depiction), so that integrating a depicted viewpoint with the subject’s actual viewpoint remains a major problem for Matthen’s account.

With this mental entity view of descriptive content in place, the conclusion of Sect. 2, that the descriptive-referential framework fails to secure assertion, should come as no surprise. For we are now able to ask who or what is supposed to be asserting this mental image. Recall that Matthen draws a parallel between the way that belief and perceptual experience assert. In both cases, a proposition (expressed in sentences or images) forms the content, while the attitude (of believing or perceiving) operates as an assertion operator on that content. But in the case of belief, it is the subject that endorses the proposition believed. In believing, I commit to the truth of a proposition. The same is true of pictorial-seeing, at least when we take pictures to reflect reality. For although there is a (strained) sense in which pictures (and sentences) themselves assert (e.g. The Boulevard Montmartre at Night asserts that the boulevard is wider than the pavement), this is not the kind of assertion that Matthen requires, for it would be equally true that a visually imagined Boulevard Montmartre would assert in this way. Rather, Matthen requires the sense of assertion by which the perceiving subject can assert of a picture, ‘this is how it is’ (or was), as we might imagine Pissarro exclaiming as he put the finishing touches to his painting from his studio.Footnote 19 But here the analogy with perceptual assertion fails. For it is well known that the subject can fail to endorse a perceptual experience, even while the perceptual experience itself remains (in Matthen’s idiom) assertive. Someone knowingly under an illusion will refrain from asserting that their perceptual experience is veridical, while the perceptual experience loses none of its assertoric force. It cannot be the subject asserting in that case, regardless of whether they are relying on information from descriptive or motion-guiding vision.
But if the subject is not adopting an assertoric attitude to descriptive content, and the image cannot assert itself, what is doing the asserting? I take it that the only answer available to Matthen is the sensory faculty itself. This then is a ‘testimony of the senses’ view (Austin, 1962, p. 11), on which the sensory system, here the motion-guiding vision system, adopts an attitude to descriptive content, quite literally telling us that it reflects how things are. But here, finally, we must question the coherence of this proposal. For it requires the sensory system to become a kind of homunculus, playing the role of thinking subject, adopting an attitude towards a mental image. This strikes me as deeply implausible. To assert is to state, or otherwise behave so as to emphasise, a position that you hold (or are pretending to hold). But the motion-guiding vision system does not hold a position. It has no view, no perspective. It is just a causal mechanism that is sensitive to how things are in the external environment. Moreover, this account severs the subject from their sensory system. We usually think of the senses as facilitating experience or action for the subject, but here an action (asserting) is ascribed to the sensory system without ascription to the subject as well.

To summarise the foregoing discussion: once it is established that descriptive content is a kind of mental image, two problems arise for Matthen’s descriptive-referential account of perceptual experience. First, descriptive content is arrayed relative to a point of view internal to the depicted scene. In perceptual experience, Matthen must elide this point of view with that of the subject, but short of endorsing an error theory of spatial properties, this cannot be done. Second, if image content is a kind of picture, it is unclear who asserts it. It cannot be the subject, nor can it be the image itself. It seems Matthen must defend a view on which the sensory system asserts that descriptive content reflects how things are, and I have given reason to question the coherence of such a proposal. Both issues concern how the descriptive image comes to reflect reality, that is, how the three ingredients of Matthen’s account of perceptual experience (the subject, descriptive content and referential content) play together. In combination with the arguments of the previous section, I take these problems to provide reason to reject the view that perceptual experience comprises an assembled message.

At the beginning of the section, I wrote that there are two paths available to someone who is swayed by my arguments in Sect. 2. Either one can attempt to amend Matthen’s framework, assigning the role of assertion operator to something other than the referential element, or one can reject Matthen’s framework completely. In the foregoing discussion, I have provided some reasons to take the latter path. But I propose that in a recent paper (Matthen, 2021), Matthen should be read as moving in the former direction. Matthen’s discussion doesn’t concern assertion directly, but rather the ‘special introspectable character of vision (SICV)’ (2021, p. 3271). Again, Matthen addresses what distinguishes the phenomenology of vision (i.e., visual perceptual experience) from visual imagination.

Visually imagine an expansive field of green grass. This imaginative experience is phenomenologically very different from actually seeing green grass. But the difference cannot be captured by the look of visual properties; you visually imagine the grass by mentally recreating the same visual look as the grass that you see. (2021, p. 3272. Matthen’s emphasis)

On the descriptive-referential framework, if the special introspectable character of vision is not descriptive, it must be referential, and vice versa. There is no other position for it. But in this discussion, Matthen writes that ‘even if SICV is partially characterized by visual content (however that is understood) there must be some residue that distinguishes seeing from other visual experiences’ (2021, p. 3272. Matthen’s emphasis). The implicit claim that visual content won’t distinguish seeing from other visual experiences suggests that Matthen still subscribes to the Shared Content Principle. However, his talk of a ‘residue’ is harder to place. One option is to file it under ‘referential content’, but we have seen that this strategy won’t succeed when it comes to assertion (and the fact that perceptual experience seems to present the actual world is presumably part of its special introspectable character). As we will see, Matthen’s description of the residue also suggests it is not referential content. Another option is to read it as a third element, neither descriptive nor referential, and so a step away from the dual view Matthen has previously defended.

What is this residue? Matthen claims that the ‘residue lies in the activity of looking at a thing’ (2021, p. 3272. Matthen’s emphasis). He elucidates this idea by appealing to two further activities, ‘Using the eyes’ and ‘Using the body’. Of ‘Using the eyes’ he writes:

The impressions produced by looking seem to result from using my eyes. They go away when I close my eyes; they change when I squint or squeeze my eyes, or place a coloured filter in front of them. By contrast, my eyes are self-evidently not at work in my visual imaginings. Seeing, in short, is a state that is produced by looking; seeming to see is a state that is produced by seeming to look. (2021, p. 3273).

Two questions should concern us here. Does Matthen provide a genuine distinction between perceptual experience and visual imagination? And if so, does this activity or its residue have a place in the descriptive-referential framework, or indicate a break with it? Now I think Matthen is too quick in claiming that the eyes are ‘self-evidently not at work in my visual imaginings’. Of course, the eyes play a causal role in perception that they don’t in imagination, but Matthen is concerned with how the eyes feature in experience, that is, phenomenologically, and they do so in myriad ways. They feature as points of view on the scene; they feature as part of the phenomenology of exercising perceptual agency, as when we rotate our eyeballs to change the direction of our gaze; and they feature as part of our naïve self-understanding of how we are able to see, which is arguably implicit in the phenomenology of seeing.Footnote 20 It is not clear that these (or any other) phenomenal manifestations of the eyes are excluded from visual imagination. Certainly, we are not obliged to imagine all or any of these manifestations of the eye in a visualization. But the fact that we can is sufficient to dismiss the idea that using the eyes is distinctive of seeing. For surely, whatever the phenomenology (or residue) of using the eyes that Matthen has in mind, this could just as well be imagined. There is a simple principle at work here.

The Simple Principle: any phenomenal aspect of perceptual experience can be represented in visual imagination.

Note that adhering to this principle is perfectly consistent with rejecting Matthen’s Shared Content Principle, for it may well be that the way in which we are aware of our eyes in perceptual experience is significantly different from the way we represent them in imagination. For example, the former might involve having a proprioceptive experience while the latter is merely a case of representing such an experience. There need be no common element between perceptual experience and visual imagination for a subject to be able to imagine anything they can perceptually experience. In short, the residue of the activity of using the eyes doesn’t constitute the special introspectable character of perceptual experience.

Let’s turn to Matthen’s second elaboration of the activity of looking, ‘Using the body’:

When the observer moves, there are characteristic systematic changes in experience that correlate with changing distance, direction of illumination, and within-object coordination. These correlations can be used to gain information that is absent in a static image. The way an observer moves to gain this kind of information is part of the perceptual strategy that she employs. She turns her head and moves her eyes; she manipulates the object and changes the lighting, e.g., by drawing the curtains and letting the sunlight in. Such changes of viewing circumstance are voluntary and free; we gain information from how the visual experience changes in response to these changes of perspective we cannot gain this information from a static, monocular point of view (Matthen, 2021, p. 3272)

While Matthen’s discussion is brief, it contains the kernel of an account of perceptual assertion. One can envisage a picture on which these kinds of systematic correlations between movement of the body and the perceived world play a role in that experience seeming to present mind-independent reality.Footnote 21 For perhaps the characteristic changes in appearance as lighting conditions alter or we manipulate objects buttress a sense that these appearances are beyond our conscious control and so mind-independent. Admittedly, this is to read more into Matthen’s paragraph than it strictly contains, but the point I develop in the coming paragraph will hold even if we reject a reading on which the phenomenology that Matthen is pointing to is assertoric.

Do these ‘characteristic systematic changes in experience’ belong to descriptive content, referential content or something else? They cannot belong to referential content, for they are not inarticulate contributions to perceptual experience. This is rather a matter of the way a visual scene looks, and the consistency with which it looks that way across changing conditions. But nor can these correlations be descriptive, for then they would fail to provide the contrast with visual imagination that Matthen is seeking. I propose then that Matthen is breaking with his commitment to the descriptive and referential exhausting perceptual experience. There is something else here: a role for a perceptually embedded reflection on the correlation between bodily agency and perceptual appearance that is not reducible to an image. Matthen is moving away from the descriptive-referential framework, though only tacitly and without indicating what is to replace it.

Final question: is the proposal successful? Has Matthen identified a phenomenological feature that distinguishes perceptual experience from visual imagination? Here we are obliged to return to the Simple Principle I introduced above: any phenomenal aspect of perceptual experience can be represented in visual imagination. For can’t I visually imagine a red ball undergoing characteristic changes in appearance as my imagined point of view changes, or the lighting conditions change? Visualizing doesn’t necessitate imagining these changes, but I certainly can, and this is sufficient to reject Matthen’s proposal that the experience (or residue) of using one’s body distinguishes perceptual experience from visual imagination. There may be scope to develop Matthen’s proposal further, perhaps by examining how the constraints on perceptual agency in influencing appearance contrast with the very different constraints on imaginative agency. But it should be clear that such a discussion would take us far beyond the descriptive-referential framework which I set out to assess. This only serves to reaffirm my central point, that the descriptive-referential framework is not equipped to explain perceptual experience’s purporting to present the world.

5 Conclusion

It is tempting to take talk of perceptual assertion metaphorically, as an imprecise gesture at the fact that perceptual experience purports to represent the world as it is. But it is a feature of Matthen’s descriptive-referential framework that he takes perceptual assertion literally. Assertion offers an explanation of purport. This paper has examined whether he is entitled to take assertion literally; whether the descriptive and referential elements can come together to sustain perceptual assertion of sensory content. I have found that they cannot. First, the referential element cannot play the role of assertion operator on either a relational or intentional reading. I also found that on Matthen’s view the perceiver occupies a less privileged epistemological position than we might like to ascribe to her. Second, I have provided reason to think that descriptive (or image) content is a mental entity on Matthen’s view. This, in combination with Matthen’s contentious Shared Content Principle, gives rise to serious difficulties for the proposal that this content is asserted in perceptual experience. For the depicted point of view cannot be elided with the subject’s actual point of view. Moreover, the notion of assertion risks incoherence once it is established that neither subject nor image but the sensory system itself must do the asserting. With these severe difficulties for the descriptive-referential framework laid out, I asked whether, in a recent paper, Matthen moves away from that framework. I suggested that he does so partly, for while the descriptive and referential elements are still present in his account, there is a reading of Matthen’s paper on which they no longer exhaust content. But if the activity of looking is a third element, it is unclear how to position it in Matthen’s broader view. In any case, this move fails to distinguish perceptual experience from visual imagination as Matthen intends.
I conclude that on balance, we should reject the descriptive-referential framework and its appeal to assertion. We should consider perceptual assertion at best metaphorical and renounce the idea of an assembled message.