Within the field of analytic metaphysics, visual objects have been understood as certain relational combinations of features and locations. For example, in Goodman’s theory (Goodman 1977: 204), the simplest perceptual objects are constituted by three features: a location, a color, and a time. More recently, Austen Clark has proposed that basic visual entities are composed of features “proto-predicated” over locations (Clark 2004).

However, such a ‘bundle’ view is contested within contemporary cognitive psychology and philosophy of perception. It is claimed that the visual system is equipped with mechanisms, such as “visual indices” or “object files”, whose purpose is to “pick out” objects in the environment without representing their features (Kahneman et al. 1992; Pylyshyn 2007: 38–40, 52). If such perceptual devices serve any representational role at all, they only represent objects as numerically the same or different. In a review of empirical results concerning vision, Leslie et al. (1998: 17) explicitly claim that:

The classical idea of object representations as bundles of sensations, perceptual features, or properties of any kind, might be fundamentally mistaken. Instead, the heart of any object representation might be inherently abstract, a kind of mental pointing at a ‘this’ or at a ‘that’.

Within the philosophical tradition, different views on the nature of objects are strongly connected with different accounts of identity criteria. In particular, bundle theories of objects are often perceived as committed to a version of the identity of indiscernibles principle, according to which having the same features is a sufficient condition for identity. In contrast, theories according to which objects cannot be reduced to bundles of features often include statements to the effect that structures of objects contain special individuating elements, called “haecceitas” (Park 1990) or “thisness” (Adams 1979), which are not identical to any feature or location. Arguments in favor of thisness are usually connected with possible-world scenarios (Adams 1979) and it is claimed that in the case of actual objects the identity of indiscernibles principle can be accepted as the principle of individuation (Casullo 1984). In this context, it would certainly be interesting if we were unable to formulate rules governing visual representations of objects’ identity without the notion of ‘thisness’.

In this paper, I evaluate whether postulating thisness is needed to formulate such rules. I show that the phenomenon of asymmetry of errors, connected with Multiple Object Tracking experiments (Pylyshyn 2004), can serve as the basis of a novel argument in favor of referring to thisness in describing the way in which diachronic identity is visually represented. The paper starts by explicating the notion of ‘thisness’ and subsequently discusses the relevant philosophical and psychological arguments.

1 What is Thisness?

Philosophers have offered various accounts of thisness, characterizing it as an individualizing feature, an individual essence, or an unqualitative subject. In this paper, I use a more general account that abstracts from the above variants, focusing on the individualizing aspect of thisness. Technically speaking, by thisness I understand a characteristic of an object that is a pure and unavoidable individuator.

1.1 Individuators

Characteristics belonging to a category E are individuators of objects belonging to a category O if and only if two conditions are met. First, there is an identity criterion for O-objects that is, at least partially, expressed in terms of the identity of E-characteristics. By identity criterion I understand a specification of necessary and sufficient identity conditions for certain objects. The identity criterion for O-objects is expressed in terms of the identity of E-characteristics if and only if one of the conditions specified by the identity criterion for O-objects consists in having the same E-characteristics. To indicate that an identity criterion is expressed in terms of the identity of some E-characteristics, we can say, for short, that an identity criterion is ‘constituted’ by the identity of E-characteristics. In such cases, the general form of an identity criterion is as follows, where \( E_{=}(x, y) \) means ‘x has the same E-characteristics as y’ and \( C_1, \dots, C_n \) designate other identity conditions:

  (1) \( \forall_{x, y \in O}\ \left( x = y \leftrightarrow E_{=}(x, y) \wedge C_1 \wedge \dots \wedge C_n \right) \)

Second, the identity criterion constituted by the identity of E-characteristics should not be circular and should not involve a regress. An identity criterion is circular if and only if one of the identity conditions it specifies, for some objects x and y, is itself the identity of x and y. Similarly, an identity criterion is regressive if and only if one of the identity conditions it specifies, for some objects x and y, is the identity of some other objects belonging to the same category as x and y. Regress arises because the identity of x and y depends on the identity of some other objects, whose identity depends in turn on that of further objects, and so on ad infinitum. An identity criterion that is neither circular nor regressive can be called ‘valid’. The above remarks can be summarized as follows:

  (2) Characteristics belonging to a category E are individuators of objects belonging to a category O iff there is a valid identity criterion for O-objects constituted by the identity of E-characteristics.

  (3) An identity criterion is valid iff it is neither circular nor regressive.

Treating thisness as an individuator captures the fact that thisness, in virtually all its variations, is highly relevant for the identity of objects.

1.2 Unavoidability

Characteristics from a category E are unavoidable individuators if and only if they are individuators as specified in (2), and there is no other valid identity criterion of O-objects that is not constituted by the identity of E-characteristics. In other words, an individuator’s being unavoidable means that any identity criteria that are not constituted by its identity are not valid:

  (4) Characteristics belonging to a category E are unavoidable individuators of objects belonging to a category O iff (I) they are individuators as specified in (2) and (II) there is no valid identity criterion for O-objects that is not constituted by the identity of E-characteristics.

The notion of unavoidability is introduced to accommodate the fact that thisness is a ‘nonstandard’ characteristic, which should be postulated only if another identity criterion cannot be formulated.

1.3 Pureness

The notion of pureness is less technical than the concepts presented above. Thisness does not characterize an object in any way apart from determining that this object stands in certain identity relations. Other characteristics, even if they satisfy conditions for being unavoidable individuators, also characterize an object as having certain qualities other than being identical with something. For example, features characterize objects as being a certain way (like “being red”), and locations determine their spatial position. Consequently, if a characteristic is a usual, qualitative feature, or a location, or can be identified with some combination of the two, then it is not thisness. The pureness of thisness is one of the main reasons why postulating it is so controversial. It is introduced solely to accommodate phenomena connected with individuation.

1.4 Thisness

The above considerations allow us to state that:

  (5) Characteristics belonging to a category E are thisnesses of objects belonging to a category O iff they are pure and unavoidable individuators of O-objects.
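Definitions (2)–(5) can be compressed into a single schema. The following is only a sketch: the predicate names (Crit, Valid, Const, Circ, Regr, Pure, Ind, Unav, This) and the variable K ranging over identity criteria are shorthand introduced here for illustration and are not part of the original formulation.

\[
\begin{aligned}
\mathrm{Valid}(K) &\leftrightarrow \neg \mathrm{Circ}(K) \wedge \neg \mathrm{Regr}(K) \\
\mathrm{Ind}(E, O) &\leftrightarrow \exists K\, [\mathrm{Crit}(K, O) \wedge \mathrm{Valid}(K) \wedge \mathrm{Const}(K, E)] \\
\mathrm{Unav}(E, O) &\leftrightarrow \mathrm{Ind}(E, O) \wedge \neg \exists K\, [\mathrm{Crit}(K, O) \wedge \mathrm{Valid}(K) \wedge \neg \mathrm{Const}(K, E)] \\
\mathrm{This}(E, O) &\leftrightarrow \mathrm{Pure}(E) \wedge \mathrm{Unav}(E, O)
\end{aligned}
\]

Here Crit(K, O) reads ‘K is an identity criterion for O-objects’ and Const(K, E) reads ‘K is constituted by the identity of E-characteristics’.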

By proposing (5), the main intuitions connected with thisness are taken into account. Thisness is strongly connected with criteria of identity (because it is an individuator), is different from all ‘standard’, qualitative characteristics (because of pureness), and should be accepted only if there is no other identity criterion available (due to unavoidability). In addition, by referring to the concepts of ‘individuator’ and ‘unavoidability’, a distinction between ‘weak’ and ‘strong’ thisness can be drawn (Adams 1979; Denkel 2000). A weak thisness is a pure and unavoidable individuator whose identity only partially constitutes the identity criterion for some objects, as presented in (1). A strong thisness, by contrast, would be one whose identity constitutes the criterion on its own, with no further conditions needed.

Relying on the general notion of ‘thisness’, we may ask whether the reference to thisness is needed to formulate rules describing how vision represents the diachronic identity of objects. This question can be answered by formulating an identity criterion for visual objects and investigating whether such a criterion contains a reference to thisness.

2 Identity Criterion and Visual Objects

According to a standard story given in the philosophy of perception (e.g., Schellenberg 2011), visual states are representations that present a fragment of the world as being a certain way, which may match the actual state of affairs. In the majority of visual representations, what is presented are certain objects with various characteristics. Some of these characteristics describe the diachronic identity of presented objects, for instance, that an object is the same as one perceived earlier.

On the above view, statements concerning visual objects are understood as statements regarding what is presented by vision—including the statements concerning objects’ diachronic identity. For instance, the statement that a visual object A existing at T1 is identical (diachronically) to a visual object B existing at T2 means that vision presents the object A at T1, the object B at T2, and the identity relation between them.

Within this framework, an identity condition for visual objects is a general statement describing a connection between visual presentations of identity and visual presentations of other characteristics. For example, it may be the case that if a visual object x is identical to a visual object y, then x has the same location as y. This means that if the visual system presents the identity of an object x with an object y, then it presents object x as having the same location as object y. In other words, there is no visual representation that presents identical objects in different locations.

Based on the above remarks, we may ask whether, in order to characterize the diachronic identity criterion for visual objects, a reference to thisness is required. However, one may wonder whether it is possible to use the notion of ‘thisness’ in considerations about perception. First, ‘thisness’ is an ontological concept that has been used in discussions regarding the nature of objective reality, but here it is applied to issues connected not with reality per se but with the way in which the visual system presents it. However, this application is justified because apart from the ontology of objective reality there is also an ontology of what is visually presented, or, in other words, an ontology of visual content. The content of visual states is not chaotic but is organized according to certain rules. What is more, there are rules that are satisfied by the content of every visual state and so determine the class of possible visual contents (for human vision). Ontological notions are well-suited for analyzing such general rules. For example, one may claim that the visual system always presents colors as being localized in some place but not necessarily in any particular place. This can be expressed in ontological terms as saying that there is a general, but not specific, existential dependence between colors and places. What is more, philosophers have already used ontological concepts in investigations regarding perceptual content. For instance, it is claimed that to resolve the so-called ‘many-properties problem’, an ontological distinction between two categories of elements of content, subjects and properties, has to be made (Clark 2004). Here I apply an analogous strategy of ontological analysis to problems regarding diachronic identity.

A more specific worry is that while ontological concepts can, in general, be applied to considerations regarding perceptual content, this is not the case with ‘thisness’, since its features prevent it from being detected by vision. However, it is not true that every element of content has to correspond to an external entity causally interacting with the visual system. Some elements may be presented as an effect of the internal operations of perceptual mechanisms. For example, in virtue of causal interactions vision may present that there are several objects close to each other. Nevertheless, these causal interactions do not determine that these objects have to be presented as a unified group, and to explain perceptual grouping we do not have to assume that there is an additional entity, a group, that also causally influences vision. It can be coherently claimed that a presentation of a group is a construct of perceptual mechanisms.

What is more, there is a theoretical reason against excluding the notion of ‘thisness’. It seems that human vision can present diachronic identity between objects (Scholl 2007: 556–559). Given this, we may try to formulate rules describing the way in which the visual system presents identity. Such rules will take the form of identity conditions for visual objects. If the identity criterion for visual objects can be formulated in terms of relations between features and locations, then there is no need to introduce thisness. However, another result is also possible. We may discover that this identity criterion cannot be formulated by referring to features, locations, or the relations between them. In such a case, we are forced to accept that there is an additional characteristic that plays an individuating role, which satisfies the definition of thisness.

3 Vision and Diachronic Identity

It is commonly accepted that human vision is able to present visual objects persisting through a variety of changes, including those concerning shape, color, and location (Pylyshyn 2007; Scholl 2007). Because of this, the diachronic identity criterion cannot be constituted merely by the identity of features and locations, since objects can be the same despite such changes.

The above point is demonstrated by a phi phenomenon example proposed by Matthen (2004: 503–504). When two alternately appearing and disappearing light-spots, positioned close to one another, are visible, one perceives them either as a single object moving between two places or as two spatially separate blinking objects. The difference depends on the temporal gap between the appearances of the spots. In the case of a short gap, one object is perceived, and when the gap is longer, two objects appear to be present. Both these cases can be described as involving two subsequent visual states. In the first, some features are related to location L1, and in the second, the same features are related to L2. Because of this similarity, formulating an identity criterion for visual objects solely in terms of the identity of features and locations does not allow us to explain what differentiates the above cases. In both, the distribution of features and their relation to locations is the same, but the number of presented objects is different.

Such a result may suggest that there is something that constitutes the identity criterion for visual objects that cannot be described in terms of the identity or similarity of features and locations. However, there is an approach that allows us to formulate diachronic identity conditions that accommodate phi phenomenon cases without postulating thisness. According to a well-established view in psychology, spatiotemporal continuity is crucial for presenting visual identity (Scholl 2007: 566–569). Probably the most important research paradigm used to investigate the persistence of visual objects is Multiple Object Tracking (MOT), in which participants see a set of qualitatively identical objects. These objects move in a random fashion and the task of the participant is to track some of them (‘targets’) while the other objects are distractors. Usually people are able to successfully track targets despite the presence of distractors if the number of targets is not greater than four (Pylyshyn 2007: 35).

The logic that connects MOT-type experiments with visual objects’ identity is as follows. The ability to perform MOT-type tasks suggests that the human visual system has mechanisms that allow it to re-identify several objects. What is more, if a certain change breaks the identity of an object, then the number of re-identification errors should be greater. In such a case, the visual system no longer treats the object after the change as being the same as the object before the change. Because of this, the new object is not considered as one of the targets and becomes a part of the set of distractors. If it is discovered that by introducing a certain change the error rate becomes higher, then a good explanation of this phenomenon would be that visual objects do not persist through such a change.

Typical results of MOT-type experiments suggest that changes in features like color or shape do not make tracking harder (Pylyshyn 2007: 36). On the other hand, disturbances in spatiotemporal continuity render it difficult to successfully re-identify objects. The relation of spatiotemporal continuity is understood here in a simple way, as occurring between objects that belong to subsequent visual states and are presented in nearby locations. The presentation of objects’ identity breaks down if, for example, they disappear and appear in a distant location, or when they lose spatial cohesion and move as a set of small fragments (Keane and Pylyshyn 2006; Scholl 2007: 568). All these examples involve some violation of spatiotemporal continuity. The only exceptions are situations in which an object is gradually occluded by an obstacle and after a short while emerges from behind the occluder (Scholl 2007: 567–568).

Based on such data the following criterion may be proposed:

  (6) A visual object x is identical (diachronically) to a visual object y iff x is spatiotemporally continuous with y.

In addition, such a criterion should be supplemented by rules that govern cases of occlusion.

The identity criterion based on spatiotemporal continuity can easily solve the problem posed by the phi phenomenon example. The difference between presenting one object and presenting two objects can be plausibly explained by stating that when the temporal gap is bigger, then the temporal continuity between objects is not presented, and so the identity conditions are not fulfilled.
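How criterion (6) separates the two phi phenomenon cases can be sketched in a few lines of Python. This is a toy model only: the `Presentation` class, the function names, and both numerical thresholds are assumptions made for illustration, since the text gives no values for what counts as a ‘nearby’ location or a ‘subsequent’ state.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Presentation:
    """A visual object as presented in one visual state (hypothetical model)."""
    x: float
    y: float
    t: float  # time of the visual state, in milliseconds


# Hypothetical thresholds; the text specifies no numerical values.
MAX_DISTANCE = 5.0   # what counts as a "nearby" location
MAX_GAP_MS = 100.0   # what counts as a "subsequent" visual state


def continuous(a: Presentation, b: Presentation) -> bool:
    """Criterion (6): b follows a closely enough in both time and space."""
    gap = b.t - a.t
    distance = hypot(b.x - a.x, b.y - a.y)
    return 0 < gap <= MAX_GAP_MS and distance <= MAX_DISTANCE


def presented_object_count(first: Presentation, second: Presentation) -> int:
    """One persisting object if continuity holds, otherwise two objects."""
    return 1 if continuous(first, second) else 2


spot_at_l1 = Presentation(x=0.0, y=0.0, t=0.0)
short_gap = Presentation(x=3.0, y=0.0, t=50.0)   # same place L2, short gap
long_gap = Presentation(x=3.0, y=0.0, t=400.0)   # same place L2, long gap

print(presented_object_count(spot_at_l1, short_gap))  # 1
print(presented_object_count(spot_at_l1, long_gap))   # 2
```

The two cases share features and locations and differ only in the temporal gap, which is exactly the parameter the continuity criterion is sensitive to.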

The above considerations suggest that the identity criterion based on spatiotemporal continuity can provide a way of avoiding postulating thisness in the context of diachronic identity. However, in subsequent sections of this paper, I will argue that this is in fact not the case, and that the need for thisness arises again due to the observed asymmetry of errors in Multiple Object Tracking experiments.

4 Asymmetry of Errors

While empirical results suggest that spatiotemporal continuity is a necessary diachronic identity condition of visual objects, it is less clear whether it is also a sufficient condition. In the philosophical literature, the sufficiency of continuity for identity is tested by constructing splitting-like scenarios (see Parfit 1971).

Such cases present an object A existing at T1 and two objects B and C existing at a subsequent moment T2. Object A stands in some kind of continuity relation to both B and C. This allows us to ask about the pattern of identity relations that occur here. Three resolutions are possible:

  (I) Object A does not stand in an identity relation to either object B or C.

In this case, the occurrence of a continuity relation is not by itself a sufficient condition for identity. However, it is easy to create an identity criterion that relies on spatiotemporal continuity, namely by implementing some additional rules to accommodate special splitting-like situations that break the sameness of objects.

  (II) Object A is identical only to object B or only to object C.

If this is correct, then the occurrence of a continuity relation is not a sufficient condition for identity. In addition, in contrast to the previous solution, there is no easy way to fix this problem. An additional condition has to be found to explain why only one object, B or C, is identical with A.

  (III) Object A is identical to both B and C.

In this case, the occurrence of a continuity relation constitutes a sufficient identity criterion. However, due to the transitivity and symmetry of identity, it entails that B and C are identical as well.
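The step from (III) to the identity of B and C can be spelled out as a short derivation that uses only the symmetry and transitivity of identity:

\[
\begin{aligned}
& A = B, \quad A = C && \text{(solution (III))} \\
& B = A && \text{(symmetry, from } A = B\text{)} \\
& B = C && \text{(transitivity, from } B = A \text{ and } A = C\text{)}
\end{aligned}
\]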

Relying on the above framework, we can ask about the relation between spatiotemporal continuity and identity in the visual context. I believe that the phenomenon of asymmetry of errors observed in MOT-type experiments (Pylyshyn 2004) suggests that answer (II) is right. Because of this, the proper identity criterion cannot be formulated by relying solely on the notion of “spatiotemporal continuity”.

What exactly does it mean to say that an asymmetry of errors has been observed? Two main types of mistake are possible in MOT: a target may be confused with a distractor, or a target may be mistaken for another target, and it has been observed that target/target mistakes are more frequent than target/distractor ones. What is more, the occurrence of target/target errors is closely connected with situations in which two targets move close to one another (Pylyshyn 2004: 819–820). Because of this, we may speak about an asymmetry in frequency between target/target and target/distractor errors.

A situation in which two targets are in proximity in two subsequent visual states constitutes a double variant of a splitting-like scenario. In the visual state V1, there are two objects: A and B. Similarly, in the subsequent visual state V2, there are also two objects: C and D. What is more, both A and B are spatiotemporally continuous with both C and D. This is because they are presented as existing at subsequent moments and positioned in nearby locations.

As we saw above, three outcomes are possible in such splitting-like cases. First (solution (I)), sameness can be broken, so there is no identity relation between objects in V1 and objects in V2. In the case of MOT-type experiments, this can mean two things: (Ia) the objects C and D are no longer treated as targets, or (Ib) the objects C and D are new targets, different from A and B. However, both interpretations are unlikely given the empirical results concerning the asymmetry of errors.

If objects C and D in V2 are no longer targets (interpretation (Ia)), then they could be easily confused with other distractors, because in MOT-type tasks distractors are not re-identified. As a consequence, after every target/target mistake, target/distractor mistakes are likely to follow. However, the asymmetry of target/target and target/distractor errors shows that this is not the case. Confusing one target with another is much more likely than confusing a target with a distractor, so that there occur many target/target errors after which the confused targets are still successfully discerned from distractors.

To evaluate interpretation (Ib), one has to look more closely into the experimental design of studies revealing the asymmetry of errors. In such MOT-type experiments, each target is assigned a number, which is visible at the beginning of a trial, but disappears when objects start to move. The task of participants, at the end of a trial, is not only to point out targets, but also to associate the correct number with each of them (Pylyshyn 2004: 806). Such tasks test the presented identity patterns between targets, because targets presented as identical are also presented as having the same number. As empirical results show (Pylyshyn 2004: 815–819), target/target errors in such MOT-type tasks do not occur at random. More specifically, if a target is given a certain wrong number, this is usually because it moved close to a target that was given that specific number at the beginning of the trial. Because of this, not all types of target-number misattributions are possible. As an example, let’s consider four targets with associated numbers: X-1, Y-2, Z-3, and V-4. It may be the case that in a trial exactly two splitting-like situations occur: between X and Y, and between Z and V. After the trial only two types of errors are then possible: (I) X can be given number 2 and Y number 1, and (II) Z can be given number 4 and V number 3. In other words, in a splitting-like situation targets can ‘swap’ their numbers.

However, it is unlikely that interpretation (Ib) would guarantee the same pattern of errors. If (Ib) were applied to our example, after two splitting-like situations all objects would be new objects without associated numbers, since they are different from all objects to which numbers were attributed. In such a case, every kind of wrong number-target attribution would be similarly probable as resulting from guessing. However, this is not what has been revealed by MOT-type studies. One can still maintain that after a splitting-like situation new targets maintain numbers in a random fashion, but this interpretation seems less probable as it denies the close link between identity and identifying numbers in Pylyshyn’s experiment, where targets presented as being the same are also recognized as having the same number, while distinct targets are associated with different numbers. On the contrary, the proponent of the above interpretation has to assume that new targets, while distinct from all of the previous ones, still possess the same numbers.
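The contrast between the swap pattern and guessing can be made concrete by enumerating the number assignments each hypothesis allows in the four-target example. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each splitting-like situation either swaps the two numbers involved or leaves them in place.

```python
from itertools import permutations, product

targets = ["X", "Y", "Z", "V"]
initial_labels = {"X": 1, "Y": 2, "Z": 3, "V": 4}
encounters = [("X", "Y"), ("Z", "V")]  # the two splitting-like situations


def swap_outcomes(labels, encounters):
    """Assignments reachable if each encounter may or may not swap numbers."""
    outcomes = set()
    for choices in product([False, True], repeat=len(encounters)):
        current = dict(labels)
        for (a, b), swapped in zip(encounters, choices):
            if swapped:
                current[a], current[b] = current[b], current[a]
        outcomes.add(tuple(current[t] for t in targets))
    return outcomes


def guessing_outcomes(labels):
    """Assignments possible if numbers are reattached purely by guessing."""
    return set(permutations(labels.values()))


swaps = swap_outcomes(initial_labels, encounters)
guesses = guessing_outcomes(initial_labels)

# Each tuple lists the numbers given to X, Y, Z, V in that order.
print(sorted(swaps))
print(len(swaps), len(guesses))  # 4 24
```

Under the swap hypothesis only four assignments are reachable (the correct one, the two single swaps, and both swaps together), whereas guessing allows all 24 permutations, including ones such as X receiving number 3 that the reported error pattern does not favor.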

According to solution (III), both A and B are identical to both C and D. Unfortunately, such a situation is unlikely in the context of vision. Due to the transitivity of identity, objects C and D would also be identical. However, the visual system never presents a single object as being at some non-zero distance from itself, and spatial separation is a strong cue for the presence of two objects (Palmer and Rock 1994). Alternatively, it may be argued that in V2 only one object is presented, composed of two separate parts C and D. The visual system is able to present groups constituted by moving, spatially disjoint elements, but to do so these elements must move in the same direction at the same velocity. This is not the case in MOT-type tasks, where objects move randomly.

Within the last available solution (II), object A is identical with one of the objects in V2, and object B is identical with the second. This proposition is consistent with the asymmetry of errors. Each object in V1 is, quite randomly, due to lack of spatial or featural cues, identified with one and only one object in V2, which leads to frequent target/target errors. However, both objects in V2 are targets, because they are presented as identical with earlier targets, so they are not easily confused with distractors at subsequent moments.
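The difference between solutions (III) and (II) in the double splitting-like scenario can be illustrated by computing which objects end up in the same identity class. This is a minimal sketch under the assumption that identity is the symmetric and transitive closure of the stipulated pairs; the helper `identity_classes` is hypothetical and uses a simple union-find.

```python
from itertools import permutations

earlier = ["A", "B"]  # objects in visual state V1
later = ["C", "D"]    # objects in visual state V2

# In a close encounter every earlier object is continuous with every later one.
continuity = {(e, l) for e in earlier for l in later}


def identity_classes(pairs, objects):
    """Group objects under the symmetric, transitive closure of the pairs."""
    parent = {o: o for o in objects}

    def find(o):
        while parent[o] != o:
            o = parent[o]
        return o

    for a, b in pairs:
        parent[find(a)] = find(b)

    classes = {}
    for o in objects:
        classes.setdefault(find(o), set()).add(o)
    return sorted(sorted(c) for c in classes.values())


# Solution (III): continuity is sufficient, so all four objects collapse
# into a single identity class.
print(identity_classes(continuity, earlier + later))

# Solution (II): each earlier object is identical with exactly one later
# object, which leaves two admissible one-to-one assignments.
one_to_one = [list(zip(earlier, p)) for p in permutations(later)]
for assignment in one_to_one:
    print(assignment, identity_classes(assignment, earlier + later))
```

Under (III) the spatially separate objects C and D fall into one class, which is the result the text rejects. Under (II) continuity leaves both assignments open, so something other than continuity must select one of them.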

The above results entail that spatiotemporal continuity is not a sufficient diachronic identity condition for visual objects. What is more, they show that splitting-like situations do not break identity, but in such cases each earlier object is presented as identical with exactly one of the later objects. Because of this, a general identity criterion for visual objects cannot be created by referring solely to the presence of spatiotemporal continuity.

5 Alternative Interpretations

The above argumentation relies on the assumption that in MOT experiments there is an asymmetry of errors. One may ask whether there are alternative interpretations that explain results of Pylyshyn’s experiment without postulating asymmetry. It should be noted that I do not rely on possibly controversial details of Pylyshyn’s ‘visual indices’ theory. For instance, I do not make any assumptions concerning the parallel or serial character of tracking. Nevertheless, there are two alternative interpretations that are highly relevant and should be addressed to defend the claim that continuity is insufficient for identity.

First, it can be proposed that the visual system does not present the diachronic identity of each target at all. According to this interpretation, proposed by Scholl (2009), what is presented during MOT are two sets: a set of distractors and a set of targets. Vision keeps members of these sets separate but does not engage in making differentiations between objects within the same set. Because of this, when two targets meet each other and produce a splitting-like situation, they are both recognized as members of the target set because each of them is continuous with a target presented at a previous moment. However, there is no effort to establish which of the later objects is the same as which of the earlier ones and this results in frequent misidentification of targets.

If the above interpretation is correct, then, as Scholl himself admits (Scholl 2009: 58), participants’ responses have to result from guessing, since they do not have any information about the identity relations between targets. However, this implication is inconsistent with Pylyshyn’s results. As part of his experiment, trajectories of targets were recorded, which allowed him to establish that the target/target error happened when targets moved close to each other and consisted in the swapping of target identities (Pylyshyn 2004: 815–819). This non-random pattern constitutes an argument against Scholl’s interpretation. Scholl claims, relying on introspection (Scholl 2009: 58–59), that difficulties in re-identifying targets do not seem to depend on proximity. Further studies would be needed to resolve this controversy but currently there does not seem to be enough evidence to refute the claim that sameness relations between targets are presented during MOT.

According to the second alternative interpretation, the difficulty in recalling targets’ labels does not allow us to infer much about patterns of identity, since these two pieces of information may be processed independently. This claim can be interpreted in several ways. First, it may be the case that tracking objects makes it too demanding to simultaneously remember labels. However, Pylyshyn has refuted this hypothesis by testing participants with a dual task (Pylyshyn 2004: 810–811). Second, it may be that participants cannot remember labels when they are attributed to tracked objects, probably because then a common resource has to be used both for tracking and maintaining labels. Nevertheless, this is inconsistent with the above-mentioned non-random pattern of target/target errors (Pylyshyn 2004: 815–819). Labels are not forgotten and then attributed at random but are swapped when targets are in proximity.

Finally, one can claim that during splitting-like situations only labels, but not actual target identities, are likely to be swapped. One hypothesis, mentioned by Pylyshyn himself (Pylyshyn 2004: 821), is that low-level tracking mechanisms may be encapsulated and do not provide any input for higher-level systems related to maintaining and recognizing labels, and so swapping identities and swapping labels may be independent. However, the dominant psychological view on visual tracking is different: it is postulated that visual tracking involves attention (e.g., Alvarez and Franconeri 2007; Oksama and Hyönä 2004; Scholl 2009). Attentional processes are not likely to be encapsulated, quite the opposite—one of their main functions is to integrate the information processed by lower- and higher-level systems. What is more, the hypothesis that labels are swapped independently from identities seems ad hoc until an alternative explanation is proposed that would show what causes labels to swap if not a mistake in representing identity. On the other hand, the interpretation favored by Pylyshyn provides a clear answer: attributing labels is closely connected to presenting identity, and so labels swap as a result of identity swapping.

What is more, the result of Pylyshyn’s (2004) experiment is not the only MOT-related evidence for the insufficiency of continuity for diachronic sameness. In research by Drew et al. (2013), electrophysiological measurements of Contralateral Delay Activity (CDA) were conducted using the standard MOT paradigm (without labels). As was established in earlier investigations (Drew and Vogel 2008), CDA is positively correlated with the number of tracked targets. In one experiment, the authors manipulated the number of distractors, which led to a higher number of errors resulting from more frequent close encounters between targets and distractors (Drew et al. 2013: 215–216). As stated earlier, such encounters constitute a double splitting-like situation which can be resolved in three ways: (I) later objects are not the same as earlier objects; (II) later objects are uniquely identified with earlier objects; and (III) both later objects are identical to the same earlier object. If splitting-like scenarios are resolved according to the second option, then continuity is not sufficient for identity.

The above three outcomes differ in what they imply about the changes in CDA during MOT. According to (I), after a splitting-like situation CDA should drop because no later object is identified with the earlier ones and so the overall number of targets is lower. In variant (II), CDA should remain constant, as exactly one of the later objects is identified with the earlier targets and so the number of targets does not change. If (III) is the case, then CDA should rise because after splitting-like situations both later objects are identified with the earlier target but not with the earlier distractor, because during MOT the diachronic identity of distractors is not presented. Drew et al. (2013: 216–217) discovered that with an increase in the number of distractors the number of target/distractor errors grew but the CDA remained constant. As in the case of Pylyshyn’s (2004) experiment, this suggests that splitting-like situations are resolved in accordance with (II) and thus that continuity is not sufficient for identity.
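
As a rough worked illustration, suppose that n targets are tracked and that a single target/distractor encounter occurs. The symbols are introduced here only for the sake of the example and are not taken from Drew et al. (2013): \( {N}_{\mathrm{I}} \), \( {N}_{\mathrm{II}} \) and \( {N}_{\mathrm{III}} \) stand for the number of objects presented as targets after the encounter under options (I), (II) and (III) respectively. The three options then predict:

    \( {N}_{\mathrm{I}}= n-1,\qquad {N}_{\mathrm{II}}= n,\qquad {N}_{\mathrm{III}}= n+1 \)

Given that CDA is positively correlated with the number of tracked targets, and assuming that this correlation holds for the set sizes in question, only the second value is compatible with an unchanged CDA.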

In addition, the results of Drew et al.’s (2013) study provide an argument against Scholl’s (2009) interpretation of MOT. According to Scholl, an object is assigned to the set of targets if it is continuous with an earlier object belonging to the set of targets. This is not without consequences for splitting-like situations involving a target and a distractor. In this case, both later objects are continuous with a target presented at the previous moment and so we may expect cases in which both these objects are included within the set of targets, which should result in an increased CDA. However, the described study does not confirm this hypothesis.

6 Temporal Features

If the diachronic identity criterion of visual objects cannot be built relying solely on spatiotemporal continuity, then it can be asked whether there is any viable alternative that does not involve thisness. An initially plausible idea is to refer to the identity of temporal features, like ‘being F at time Tx’.

Investigations concerning vision suggest that the perceptual system has the capacity to preserve information about features of previously perceived objects (Kahneman et al. 1992); so it seems that there is at least a limited ability to present temporal features. Because of this, we may postulate a set of predicates \( \left\{{F}_1^1,\dots, {F}_n^k\right\} \), where \( {F}_j^i \) means having the temporal feature ‘is j at i-moment’—for example ‘is red at T1’. Using this set we may propose that a visual object (VO) x is diachronically identical to a visual object y iff x has the same temporal features as y:

  1. (7)

    \( {\forall}_{x, y\in VO}\; x= y\leftrightarrow \left(\left({F}_1^1(x)\leftrightarrow {F}_1^1(y)\right)\wedge \cdots \wedge \left({F}_n^k(x)\leftrightarrow {F}_n^k(y)\right)\right) \)

Unfortunately, the identity criterion for visual objects that relies on temporal features is not valid. This is because in the visual context, temporal features seem to be second-order features involving identity. By a second-order feature I mean a feature that is possessed by an object x only if certain other features are possessed by x. In the case of vision, shape is a second-order feature because objects are presented as having a certain shape only if they are presented as possessing an appropriate arrangement of edges (Hummel 2013). In contrast, differences in luminance seem to be first-order features: when an appropriate causal interaction occurs involving low-level, specialized detectors, a difference in luminance is presented without the need to present any other features (Palmer 1999: 179–180).

According to a mainstream view in cognitive psychology (see Treisman 1999), the visual system starts by presenting different types of features separately, and only then are the pieces of information combined, which allows complex features to be presented. From this perspective, temporal features, like “being red at T1”, should not be treated as first-order ones because presenting them requires combining information from at least two mechanisms, one responsible for the perception of time and one for the perception of visual qualities like hue.

What is more, it is unlikely that temporal features are first-order features because in order to attribute a feature corresponding to a predicate ‘\( {F}_j^i \)’ (is j at i-moment) to a currently presented object x, the visual system needs to perform a complex operation. First, it has to present that at the i-moment there is an object having the j-feature. If the i-moment is not the same as the current moment, then this requires not only the activation of feature-detectors by a current stimulus but also storing and retrieving information from visual memory. Second, it has to be established that the object x, possibly presented as existing at a different moment than the i-moment, is the same as the object that has the j-feature at the i-moment. Because of that, presenting a temporal feature requires presenting a diachronic identity.

This whole operation, for a given feature corresponding to a predicate ‘\( {F}_j^i \)’, can be described formally as a necessary and sufficient condition for having this feature:

  1. (8)

    \( {\forall}_{x\in VO}{F}_j^i( x)\leftrightarrow {\exists}_{y\in VO}\left( E\left( y,{T}_i\right)\wedge \left( E\left( y,{T}_i\right)\to {F}_j(y)\right)\wedge x= y\right) \)

The first condition, \( E\left( y,{T}_i\right) \) or ‘y exists at \( {T}_i \)’, where \( {T}_i \) is the proper name of the i-moment, guarantees that there is a visual object at the i-moment. The second condition, \( E\left( y,{T}_i\right)\to {F}_j(y) \), states that this object has the j-feature at the i-moment, but also allows that it is constituted by different features at other moments. The last condition, \( x= y \), establishes identity between visual object x and a visual object that has the j-feature at the i-moment.

Such an analysis of temporal features shows that they do not constitute a proper basis for formulating an identity criterion. According to (7), the diachronic identity of objects is equivalent to having the same temporal features. However, due to (8), having a temporal feature already presupposes being diachronically identical to some object. Because of this, the identity criterion (7) involves a regress. If having the same temporal features is the diachronic identity criterion for visual objects, but having temporal features already involves being diachronically identical to some visual objects, then a part of the diachronic identity criterion for any visual objects is their diachronic identity with some visual objects.
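
The dependence can be made explicit by substituting (8) into a single conjunct of (7). The bound variables z and w are chosen here only for illustration. The conjunct \( {F}_j^i(x)\leftrightarrow {F}_j^i(y) \) then becomes:

    \( {\exists}_{z\in VO}\left( E\left( z,{T}_i\right)\wedge \left( E\left( z,{T}_i\right)\to {F}_j(z)\right)\wedge x= z\right)\leftrightarrow {\exists}_{w\in VO}\left( E\left( w,{T}_i\right)\wedge \left( E\left( w,{T}_i\right)\to {F}_j(w)\right)\wedge y= w\right) \)

The identity relation that (7) was meant to characterize thus reappears, in the form of \( x= z \) and \( y= w \), on the defining side of the biconditional.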

Given this, the identity criterion (7) is not valid and cannot provide a solution to the problem of visual objects’ diachronic identity.

7 Diachronic Identity and Thisness

How can a proper diachronic identity criterion of visual objects be formulated if the occurrence of spatiotemporal continuity is not sufficient for identity, and if temporal features do not provide a valid criterion? The identity of ordinary features, such as shape or color, is not relevant in the diachronic context. Some studies (Zhou et al. 2010) suggest that there exist topological features whose change breaks the identity of visual objects. However, objects in the described versions of MOT do not undergo such changes. In addition, if several objects are simultaneously tracked, the visual system has only limited ability to predict future positions relying on the actual location and parameters of movement, so additional features connected with such extrapolations cannot be used (Iordanescu et al. 2009; Keane and Pylyshyn 2006).

Given the above negative results we may ask how we can formulate a criterion of diachronic identity that accommodates splitting-like situations. In a splitting-like situation it is presented that an object A exists at moment T1 and that the same object A also exists at a subsequent moment T2. At T2 an object B is also presented, which is different from A. Of course, objects are not literally presented as having verbal labels. Here, letters simply symbolize that objects are presented as numerically the same or different (probably in virtue of applying a visual index or creating an object-file). What is more, both objects at T2 are spatiotemporally continuous with the object A at T1. In such a situation, being spatiotemporally continuous is a necessary but not sufficient identity condition of objects. There seems to be no other way to supplement this identity condition than by referring simply to the individualizing characteristics, expressed by phrases like “being the object φ” (e.g., A or B), which characterize objects as being the same or different. In particular, objects in the above case are diachronically identical in virtue of being continuous and having the same individualizing characteristic expressed by “being the object A”. Relying on these observations, a general criterion has the following form:

  1. (9)

    A visual object x is identical (diachronically) to a visual object y iff x is spatiotemporally continuous with y and x has the same individualizing characteristic as y.
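
Criterion (9) can be sketched in the style of (7) and (8). The symbols are assumptions made only for this illustration and may differ from the notation used for continuity elsewhere in the paper: \( C\left( x, y\right) \) stands for ‘x is spatiotemporally continuous with y’, and \( {I}_{\varphi}(x) \) for ‘x has the individualizing characteristic of being the object φ’, with φ ranging over a set \( \Phi \) of such characteristics:

    \( {\forall}_{x, y\in VO}\; x= y\leftrightarrow \left( C\left( x, y\right)\wedge {\forall}_{\varphi \in \Phi}\left({I}_{\varphi}(x)\leftrightarrow {I}_{\varphi}(y)\right)\right) \)

In the splitting-like situation described above, both objects at T2 satisfy C with respect to the object A at T1, but only one of them satisfies \( {I}_A \), so on this reading exactly one later object is identified with A.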

The individualizing characteristics expressed by “being the object φ” are individuators because their identity constitutes the identity criterion for visual objects. They are unavoidable because there is no other valid identity criterion that can replace (9). In particular, such characteristics would not be unavoidable if spatiotemporal continuity were both necessary and sufficient for identity (see (6)). Finally, they are pure individuators because they characterize objects merely as being numerically the same or different. In conclusion, thisness is needed to formulate a diachronic identity criterion for visual objects. However, it is a weak kind of thisness, because the identity criterion is also constituted by the occurrence of spatiotemporal continuity (see section 1).

Nevertheless, we may ask what exactly the visual system presents when it presents thisness. A simple answer is that the visual system presents a characteristic in virtue of which objects are diachronically identical and, when formulating rules governing visual presentations of sameness, we cannot identify this characteristic with any combination of usual visual features or relations. However, some further questions are likely to follow.

First, it seems that splitting-like situations are rare and in ecologically relevant cases identity conditions can be formulated without referring to thisness. Because of this we might ask about the importance of a result showing that thisness is needed in the context of some laboratory cases. I believe that this result has both philosophical and psychological significance. Philosophically, it is interesting to formulate general rules governing the perceptual presentations of identity, and the obtained result suggests that it cannot be done without referring to thisness. From the psychological perspective, it is interesting to observe that vision is able to present identity in a way that is not fully determined by similarity or continuity between objects.

Second, one might ask whether there is a phenomenal character related to presenting thisness. In psychological presentations, e.g. involving the “tunnel effect”, participants report ‘feeling’ that objects are the same despite occlusion and changes of features. This may suggest that there is a phenomenal character related solely to presenting identity, but further investigation is needed to establish whether or not such a feeling is connected with some other factors, like spatio-temporal relations.

Third, one may wonder whether representations involving thisness can ever be accurate. Trivially, they cannot be fully accurate if the metaphysics of our world simply does not involve thisness. However, resolving this issue goes beyond the scope of this paper. Alternatively, one may think that they cannot be accurate because vision has no ability to detect thisness. Nevertheless, even in this case they can be accurate to some degree. Let’s consider a ‘veridical hallucination’ in which I hallucinate a red square and by accident there is an actual red square in front of me. Such a representation may be treated as inaccurate because the visual system does not present that there is a red square by detecting the physical square. On the other hand, it is accurate, because the content matches the environment. This level of accuracy is certainly accessible for thisness-involving representations.

Finally, one may ask whether the assignment of thisness in ambiguous scenarios is performed at random. Of course it is not determined by other presented elements, because in that case thisness could be reduced to these elements. However, the resolution of splitting-like scenarios may be determined at a different level, for example by a pattern of neural activations.

8 Conclusions

I have argued that the phenomenon of asymmetry of errors in MOT-type tasks suggests that the only way to formulate a proper identity criterion for visual objects is to postulate characteristics satisfying the definition of thisness. This rather surprising result shows that there is room for thisness in the context of human vision.