Elsevier

Acta Psychologica

Volume 178, July 2017, Pages 87-99

Looking at a contrast object before speaking boosts referential informativeness, but is not essential

https://doi.org/10.1016/j.actpsy.2017.06.001

Highlights

  • Fixating a contrast object immediately before speaking boosts informativeness.

  • However, direct fixations are not always required for full informativeness.

  • Overall, target objects attract more fixations than contrast objects.

  • Informative expressions are produced later than underinformative ones.

  • Results support a goal-based linking hypothesis between eye movements and language production.

Abstract

Variation in referential form has traditionally been accounted for by theoretical frameworks focusing on linguistic and discourse features. Despite the explosion of interest in eye tracking methods in psycholinguistics, the role of visual scanning behaviour in informative reference production is yet to be comprehensively investigated. Here we examine the relationship between speakers' fixations to relevant referents and the form of the referring expressions they produce. Overall, speakers were fully informative across simple and (to a lesser extent) more complex displays, providing appropriately modified referring expressions to enable their addressee to locate the target object. Analysis of contrast fixations revealed that looking at a contrast object boosts but is not essential for full informativeness. Contrast fixations which take place immediately before speaking provide the greatest boost. Informative referring expressions were also associated with later speech onsets than underinformative ones. Based on the finding that fixations during speech planning facilitate but do not fully predict informative referring, direct visual scanning is ruled out as a prerequisite for informativeness. Instead, pragmatic expectations of informativeness may play a more important role. Results are consistent with a goal-based link between eye movements and language processing, here applied for the first time to production processes.

Introduction

A large body of research on reference has documented context-dependent variation in speakers' referring expressions, ranging from minimal choices such as null and pronominal forms up to more explicit modified noun phrases (for a review see Davies and Arnold, 2018). For example, when introducing a referent into the discourse, a speaker is likely to say ‘an apple’, whereas if the apple had recently been mentioned, a pronoun would be more likely.

Variation in referential choice has traditionally been accounted for by theoretical frameworks focusing on how referential expressions are constrained by the linguistic discourse context (Ariel, 1990, Ariel, 2001, Chafe, 1976, Chafe, 1994, Gundel et al., 1993, Gordon and Hendrick, 1998, Grosz et al., 1995), which affects the referents' information status (see Arnold, Kaiser, Kahn, & Kim, 2013, for a review). The current paper focuses on modified noun phrases, specifically on speakers' choice of whether or not to include a prenominal adjective. The inclusion of this type of modifier has been found to vary across studies, mediated by factors such as display density (Arnold and Griffin, 2007, Koolen et al., 2015); the number and type of attributes held by the target referent (Koolen et al., 2011, Mangold and Pobel, 1988, Tarenskeen et al., 2015, van Gompel et al., 2014); discourse goals (Arts et al., 2011a, Maes et al., 2004); the use of common ground information (Brennan and Clark, 1996, Nadig and Sedivy, 2002); and individual differences (Davies and Katsos, 2010, Hendriks, 2016). Here we focus on speakers' visual scanning behaviour as a predictor of informativeness.

In the paradigm used in this paper, inclusion or exclusion of a modifier in a referring expression affects that expression's informativeness. Referential informativeness depends on the relationship between the context (including visual and discourse features) and the referring expression. Thus, we define informativeness as a property of expressions within their contexts, such that more informative expressions are those that match a smaller set of candidate referents. In line with previous work in this area (Engelhardt et al., 2006, Davies and Katsos, 2010, i.a.), we adopt a three-way taxonomy in which referring expressions can be optimally informative (e.g. ‘the small apple’ to refer to one of a pair of apples contrasting in size), underinformative (‘the apple’ in the same context), or overinformative (‘the small green apple’ when there is only one apple). The current study focuses on speakers' visual interrogation of a scene before producing informative vs. underinformative expressions.

From a processing perspective, the production of an expression to refer to an object in a referential communication paradigm is a multi-stage process requiring a variety of skills. Firstly, the speaker must attend to the target referent to perceive its characteristics, and visually scan the surrounding scene to check for same-category competitors. This allows the speaker to identify such competitors and consider which distinguishing features should be encoded into the utterance in order to avoid producing an ambiguous referring expression (cf. Grice's (1975/1989) maxim of Quantity). The speaker must also assess the accessibility of the target referent for themselves and for their addressee: is it focused or shared in the current linguistic or extralinguistic context? The cooperative speaker should then integrate this information into a coherent and fully informative referring expression. Classic theories of accessibility and reference production have accounted for the latter stages of this process (see Arnold, 2008, for a review), and while these have been extensively empirically investigated (Keysar et al., 2000, Brown-Schmidt et al., 2008, Hanna et al., 2003, Heller et al., 2008, i.a.), the earlier, speaker-oriented stages in reference production have received less attention (though see Bock et al., 2004, Bock et al., 2003, Griffin and Bock, 2000, Kuchinsky et al., 2011, for discussions of early processes of speech production in general). This paper aims to redress the balance by focusing on the processes at work during speech planning in referential communication.

The existing literature on variation in referential informativeness has focused on both bottom-up influences, such as features of the referent, and top-down factors, such as the use of common ground. However, it has not yet comprehensively addressed the question of how visual scanning behaviour might affect referential choice (as called for by Deutsch & Pechmann, 1982: 177, and documented as part of a wider study by Brown-Schmidt & Tanenhaus, 2006). Intuitively, if speakers do not complete a full scan of the visual environment, they may not realise that there are objects co-present that belong to the same category as the target and must be distinguished from it. Thus, they risk being underinformative. Pechmann (1989: 98) suggested that incomplete visual scanning might be a reason for failures in informativeness, when ‘[…] the speaker initially pays attention to the target object without seriously considering the context’. He explained that, due to the incremental nature of speech processing, such behaviour may lead to overspecification if speakers start articulating their utterance before they have scanned the whole display and deduced the distinctive feature(s) of the target referent. Such behaviour could render the early part of the referring expression noncontrastive, e.g. ‘the white circle’, where there is only one circle in the display. Although Pechmann's work focused on rates of over- rather than under-specification, the same process could also plausibly result in ambiguous referring expressions, where a speaker mentions features of the target referent which fail to distinguish it from its competitors, e.g. ‘the white circle’ in a context containing a large and a small white circle.

Further (though indirect) evidence for a close relationship between visual scanning and infelicitous informativeness comes from studies finding higher rates of overinformativeness when there are more competitors to scan for discriminating features (Koolen et al., 2015, Mangold and Pobel, 1988). Now that eye tracking technology is widely available, it is possible to directly examine the relationship between visual scene interrogation and the production of referential attributes. We investigate this link by examining eye movements to target and contrast objects before the articulation of fully informative and underinformative referring expressions.

Previous research on pragmatic informativeness has concentrated on comprehension when investigating the interaction of reference and eye movements. Classic work using the visual world paradigm has shown that referential context is pivotal in the interpretation of temporary referential ambiguities (Chambers et al., 2004, Sedivy, 2003, Sedivy et al., 1999, Tanenhaus et al., 1995, Trueswell et al., 1999). This focus on reference comprehension is influenced by the wider psycholinguistic tradition of measuring eye movements in language comprehension (see Huettig, Rommers, & Meyer, 2011, and Altmann, 2011 for reviews). Although there has been comparatively little research into eye movements during language production, studies published throughout the 1990s and early 2000s furthered our understanding of the relationship of eye movements to speech planning and articulation, e.g. the time-locking of eye movements and speech (Griffin & Bock, 2000) and the influence of word frequency and visual clarity on pre-articulatory viewing times (Meyer, Sleiderink, & Levelt, 1998; see Meyer, 2004, and Griffin, 2004 for reviews). This research provides important foundations for the current study, i.e. that fixations to objects typically precede reference to them, and more broadly, that eye movements convey information about speech planning processes that precede the onset of an utterance as well as about those which occur during articulation. The current study extends existing work by using eye tracking to study the production of pragmatic informativeness in tightly controlled referential forms.

Our study extends three recent papers on eye movements and informativeness by analysing fixations according to the informativeness of referring expressions. Firstly, Rabagliati and Robertson (2016, exp. 1 and 1a) monitored speakers' fixations to target and contrast objects in scenes containing lexical ambiguities (e.g. a baseball bat and an animal bat) and non-linguistic ambiguities (e.g. a red car and a yellow car), finding that adult speakers proactively monitored for non-linguistic ambiguity before articulating their referring expressions, as well as in a post-naming monitoring phase. Secondly, Vanlangendonck, Willems, Menenti, and Hagoort (2016) monitored speakers' fixations to target and contrast objects in common vs. privileged ground in order to adjudicate between competing accounts of common ground use. Speakers were found to initially fixate the target object, then consider other objects in the array. Ultimately, the number of fixations to the contrast object during the analysed temporal region (i.e. from the highlighting of the target object to utterance onset) was low; however, since the authors did not analyse fixation patterns during the preview region, it is unclear whether speakers were relying on previous fixations while planning their utterance (as acknowledged by Vanlangendonck et al., 2016: 749). We extend this work by presenting exactly this analysis. Thirdly, Brown-Schmidt and Tanenhaus (2006) explored the relationship between the timing of a first fixation to a contrast object and the form of subsequent referring expressions in a referential communication game. The production of a fully informative referring expression was more likely if the speaker had fixated a contrast referent. However, in their exp. 1, which used relatively simple displays comprising geometric shapes and simple images, 68% of utterances were informative even without a contrast fixation. In their exp. 2, which used more naturalistic scenes and additional referents within each display, 19% of utterances were informative without a contrast fixation. Brown-Schmidt and Tanenhaus's (2006) findings thus provide evidence against a mechanistic account of reference in which contrast objects must be checked and assessed, either before or after the onset of the utterance (Griffin and Bock, 2000, Meyer et al., 1998). Instead, it seems that speakers can indeed be fully informative without fixating contrast objects. We extend this work by analysing relative fixations to contrast and target objects, and by manipulating set size within a single experiment.

These production findings from Rabagliati and Robertson (2016), Vanlangendonck et al. (2016), and Brown-Schmidt and Tanenhaus (2006) all accord with a goal-based linking hypothesis that describes the relationship between eye movements and language processing in language comprehension (Salverda, Brown, & Tanenhaus, 2011). On this account, different types of representations are involved in mapping speech to a scene, depending on the viewer's current task. Eye movements are assumed to reflect task-specific visual processes (as well as general-purpose ones), in which locations most relevant to the task at hand are more likely to be fixated. Evidence comes from Altmann and Kamide's (1999) demonstration of anticipatory effects in incremental language processing. Although this classic study has been cited extensively, few have commented on its clear effects of experimental task, with earlier and more looks to the target during the verb region in a sentence verification task than in a more passive look-and-listen task. Further support for the goal-based view comes from Brown-Schmidt and Tanenhaus (2008), who recorded fewer looks to task-irrelevant objects than to relevant ones in a referential communication game, even when the former matched the referring expression heard by the addressee. That is, on hearing ‘Put the green block above the red block’, participants were more likely to look at a red block with an empty space above it (rendering it compatible with the task) than at a red block without a vacant space above. Both blocks were linguistic matches for the referring expression, but looking behaviour was clearly mediated by the extralinguistic referential context. The current study aims to test one of the predictions of the goal-based account, i.e. that the referent that is most relevant for the task at hand will be the one that receives the most visual attention. Notably, it does so using a language production paradigm in an interactive setting.

We measured speakers' referential informativeness and their accompanying eye movements as they completed an interactive referential task. Participants saw arrays of four or eight objects, containing a singleton target object (e.g. a ball) or a target object accompanied by a contrast mate (e.g. a large and a small ball). They then told their addressee (who could see the same array without the target highlighted) to click on the target (see Figs. 1 and 2 for example displays).

Despite intuitions that looking at an object is a prerequisite for referring to it informatively, there is some evidence to suggest that even target objects do not necessarily require a direct fixation in order to be referred to correctly (Dobel et al., 2007, Griffin, 2004: 231). This evidence has yet to be reliably extended to contrast objects. Contrast objects are less salient than the target object in referential communication tasks, but are still highly relevant for the goal of felicitous referring. The goal-based linking hypothesis predicts that the experimental task will mediate fixation patterns, i.e. speakers will be attracted to the most relevant referent for the task at hand. To test this hypothesis, we measure relative attention on contrast and target. Further, previous research suggests that objects can be processed extrafoveally and/or in parallel when a referring expression is very easy to generate (e.g. a pronoun) or when an object is highly recognisable (Meyer et al., 1998, Morgan and Meyer, 2005), but it is not yet clear whether this holds for discourse contexts where speakers can only be fully informative if they use information from a contrast object. Regarding the relationship between scanning behaviour and speech onset time, and assuming a serial view of speech planning (Levelt, 1989), more comprehensive pre-utterance visual scanning should require more time to complete before the onset of an utterance (Brown-Schmidt & Konopka, 2011). Thus we ask three main research questions:

  1. How informative are speakers when referring to objects in simple and more complex visual scenes?

  2. Do fixations to contrast objects prior to utterance onset predict informativeness?

  3. Do underinformative utterances have shorter speech onset latencies than informative ones?

We hypothesise that: (1) speakers will be highly informative in this simple referential task, producing underinformative referring expressions rarely, especially in simple displays; (2) increased looks to the contrast object will result in informative referring expressions and decreased looks will result in underinformative ones; (3) underinformative utterances will have shorter onset latencies than informative ones. Based on high rates of between-speaker variability in informativeness found by Davies and Katsos (2010), we will also conduct an exploratory analysis of the role of individual differences in referring behaviour and hypothesise that there will be a distinctive linguistic-cognitive profile for underinformative vs. informative speakers.
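Hypothesis (2) can be read descriptively as a comparison of two conditional proportions: the rate of informative utterances on trials where the contrast object was fixated before speech onset versus trials where it was not. The sketch below computes only that tabulation; it is not the inferential analysis reported in the paper, the trial keys `contrast_fixated` and `informative` are hypothetical, and the toy trials are invented for illustration rather than drawn from the study.

```python
from collections import defaultdict


def informativeness_by_fixation(trials):
    """Proportion of informative utterances, split by whether the contrast
    object was fixated before speech onset.

    Each trial is a dict with hypothetical keys 'contrast_fixated' (bool)
    and 'informative' (bool)."""
    counts = defaultdict(lambda: [0, 0])  # fixated? -> [n informative, n trials]
    for trial in trials:
        cell = counts[trial["contrast_fixated"]]
        cell[0] += int(trial["informative"])
        cell[1] += 1
    return {fixated: n_inf / n for fixated, (n_inf, n) in counts.items()}


# Invented toy trials for illustration only; these are not the study's data.
toy_trials = [
    {"contrast_fixated": True, "informative": True},
    {"contrast_fixated": True, "informative": True},
    {"contrast_fixated": True, "informative": False},
    {"contrast_fixated": False, "informative": True},
    {"contrast_fixated": False, "informative": False},
]
print(informativeness_by_fixation(toy_trials))
```

On this reading, a fixation that boosts informativeness shows up as a higher proportion for fixated trials, while a fixation that is not essential shows up as a proportion above zero for non-fixated trials.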


Design

The experiment used a 2 × 2 (contrast × display complexity) within-subjects design. Contrast was present or absent (two referents vs. one referent from the same noun category). Display complexity was 4- or 8-objects. Thus, for investigating the form of referring expressions in participants' production data (Section 3.1), contrast and display complexity entered the analysis as independent variables. The dependent variable was utterance type (i.e. informativeness): underinformative, optimally
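The within-subjects structure of the 2 × 2 design can be sketched as a list of condition cells that every participant contributes trials to. This is a minimal illustration under assumptions: the number of trials per cell, the `trial_list` helper, and the seeded shuffle standing in for trial-order randomisation are hypothetical and not taken from the paper's method.

```python
import random
from itertools import product

CONTRAST = ("present", "absent")  # two vs. one referent from the target's noun category
DISPLAY_SIZE = (4, 8)             # display complexity: objects per array


def design_cells():
    """The four cells of the 2 x 2 (contrast x display complexity) design."""
    return [{"contrast": c, "display_size": n} for c, n in product(CONTRAST, DISPLAY_SIZE)]


def trial_list(participant_id, trials_per_cell, seed=0):
    """Within-subjects: each participant receives trials from every cell.
    trials_per_cell and the shuffling are illustrative assumptions."""
    trials = [
        dict(cell, participant=participant_id, item=i)
        for cell in design_cells()
        for i in range(trials_per_cell)
    ]
    random.Random(seed).shuffle(trials)  # hypothetical randomised presentation order
    return trials


print(design_cells())
print(len(trial_list("p01", trials_per_cell=3)))  # 12
```

Because every participant sees all four cells, contrast and display complexity can enter the analysis as within-subjects predictors of utterance type.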

Referential communication task: Production data

All except one of the modified referring expressions took a prenominal adjective of the form ‘click on the [adj](er) [noun]’. The only postnominally modified token was in reference to one of the filler items: ‘click on the water bottle that's open’. This preference for prenominal modification is in line with Brown-Schmidt and Tanenhaus's (2006) findings for references to simple shapes.

In an analysis of all production data (contrast and no contrast conditions; 4- and 8-object displays, see Table

Discussion

This study examined speakers' eye movements alongside the informativeness of their referring expressions in order to explore associations between the two types of behaviour. Speakers produced fully informative referring expressions in the majority of their utterances. The analyses of fixation patterns revealed that speakers were more likely to be informative if they had fixated the contrast object during multiple temporal regions (Fig. 5) and for longer (Fig. 6) before starting to speak.

Acknowledgements

We are grateful to Chris Norton for preparing the eye movement data, and to Phil Vanden and Bissera Ivanova for help with experimental programming and data collection. Thanks also to Gerry Altmann and Pirita Pyykkönen-Klauck for guidance in the early stages of the project, and to two anonymous reviewers for comments on earlier drafts. This work was funded by a British Academy Quantitative Skills grant (SQ120012) awarded to the first author.

References (72)

  • P.C. Gordon et al. (1998). The representation and processing of coreference in discourse. Cognitive Science.
  • J.E. Hanna et al. (2003). The effects of common ground and perspective on domains of referential interpretation. Journal of Memory and Language.
  • D. Heller et al. (2014). Would a blue kite by any other name be just as blue? Effects of descriptive choices on subsequent referential behavior. Journal of Memory and Language.
  • D. Heller et al. (2008). The role of perspective in identifying domains of reference. Cognition.
  • F. Huettig et al. (2011). Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica.
  • R. Koolen et al. (2011). Factors causing overspecification in definite descriptions. Journal of Pragmatics.
  • A.S. Meyer et al. (1998). Viewing and naming objects: Eye movements during noun phrase production. Cognition.
  • I. Noveck et al. (2008). Experimental pragmatics: A Gricean turn in the study of language. Trends in Cognitive Sciences.
  • A.P. Salverda et al. (2011). A goal-based perspective on eye movements in visual world studies. Acta Psychologica.
  • J.C. Sedivy et al. (1999). Achieving incremental semantic interpretation through contextual representation. Cognition.
  • J.C. Trueswell et al. (1999). The kindergarten-path effect: Studying online sentence processing in young children. Cognition.
  • G. Altmann. Language mediated eye movements.
  • M. Ariel (1990). Accessing noun phrase antecedents.
  • M. Ariel. Accessibility theory: An overview.
  • J.E. Arnold (2008). Reference production: Production-internal and addressee-oriented processes. Language & Cognitive Processes.
  • J.E. Arnold et al. (2013). Information structure: Linguistic, cognitive, and processing approaches. WIREs Cognitive Science.
  • A. Arts (2004). Overspecification in instructive texts.
  • A. Arts et al. (2011). Overspecification in written instruction. Linguistics.
  • Audacity Team. Audacity(R): Free audio editor and recorder [computer program]. Version 2.0.6, retrieved November 12th 2014.
  • D. Bates et al. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software.
  • J.K. Bock et al. Putting first things first.
  • J.K. Bock et al. (2003). Minding the clock. Journal of Memory and Language.
  • S.E. Brennan et al. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition.
  • R. Brownell (2000). Expressive one-word picture vocabulary test.
  • S. Brown-Schmidt et al. (2011). Experimental approaches to referential domains and the on-line processing of referring expressions in unscripted conversation. Information.
  • S. Brown-Schmidt et al. (2008). Real-time investigation of referential domains in unscripted conversation: A targeted language game approach. Cognitive Science.