Elsevier

Cognitive Psychology

Volume 53, Issue 1, August 2006, Pages 59-96

Effects of auditory pattern structure on anticipatory and reactive attending

https://doi.org/10.1016/j.cogpsych.2006.01.003

Abstract

In three experiments, participants listened for a target’s pitch change within recurrent nine-tone patterns having largely isochronous rhythms. Patterns differed in pitch structure of initial (context) and final (target distance) pattern segments. Also varied were: probe timing (Experiments 2 and 3) and instructions about probe timing (Experiments 2 and 3). In all experiments, identification of a recurrent target was poorer in patterns with wider context pitch intervals (in semitones) than in others. Effects of probe timing also occurred, with better performance for temporally expected than unexpected probes. However, when listeners were explicitly told to focus upon a target’s pitch and not its timing (Experiment 3), they performed selectively better in patterns with smaller target/probe pitch distances, especially for rhythmically expected probes. Five theoretical approaches to the respective roles of pitch and/or time structure were assessed. Although no single approach accounted for all results, a modification of one theory (a Pitch/Time Entrainment model) provided a reasonable description of findings.

Introduction

In listening to music and related sound sequences people must attend on a moment-to-moment basis to identify aspects of the unfolding pattern. The present research considers how such attending activity is guided by auditory pattern structure. Of special interest is the role of a pattern’s pitch and time structure in attentional monitoring.

In three experiments, we use a sequence monitoring task to assess selective attending to the pitch of a tone within a repeating pattern. Listeners identify the pitch of a recurrent target tone embedded within novel sequences, distinguished by different pitch/time properties. Originally, selective listening studies engaged constructs such as information channels to explain dichotic listening performance (Cherry, 1957, Moray, 1969a, Moray et al., 1976), but recently other constructs, such as attentional focus (e.g., in frequency), have motivated research on selective listening. Thus, focusing attention to a certain tone frequency (or intensity) is more efficient for expected than for unexpected items where an expectancy is commonly established by relatively high prior probabilities or by an immediately preceding priming cue (Greenberg and Larkin, 1968, Mondor and Bregman, 1994, Moray et al., 1976, Scharf, 1998, Scharf et al., 1987, Ward and Mori, 1996, Wearden and Bray, 2001).

In the present research, we are concerned with attention and expectancies in the context of sequence monitoring. In these situations, the manipulation of expectancies via probabilities or single cues becomes challenging; a more fruitful approach derives from the long tradition of research on serial pattern structure which seeks determinants of expectancies in relationships, including pitch and time relations, among sequential serial elements (Garner and Gottwald, 1968, Hoffman and Koch, 1998, Jones, 1974, Jones, 1981, Restle, 1970, Restle and Brown, 1970). More recently, using visual events, it has been proposed that serial structure induces implicit learning and entails the integration of sequence time relationships with serially unfolding spatial relationships (Koch and Hoffman, 2000, Restle, 1972, Shin and Ivry, 2000). To this end, serial reaction time (SRT) tasks, which involve self-paced serialized motor responses (Nissen & Bullemer, 1987), have often been employed to assess the acquisition of serial relations over trials. The present research is also concerned with the role of sequence timing, but in auditory patterns that unfold in pitch space. Moreover, because we are interested in the way attending is paced by pattern structure, we eschew self-paced SRT procedures that involve correlated motor responses. Our goal is to assess the degree to which pattern structure itself, including time structure, paces attending to a succession of pattern elements (e.g., tones). Indeed, it is possible that effectively timed attending, based on event structure, paves the way for emergence of coordinated motor responses.

Recent findings in sequence monitoring tasks implicate a role for event time structure (rate, rhythm) in pacing attending and generating temporal expectancies (Jones et al., 2002, Klein and Jones, 1996, Large and Jones, 1999). With isochronous sequences, listeners were better at judging a rhythmically expected than a rhythmically unexpected ending tone, producing a symmetrical accuracy profile as a function of end tone timing, as shown in Fig. 1. This inverted U profile has been termed an expectancy profile (Barnes & Jones, 2000). A major goal of the present research is to identify various determinants of expectancy profiles in light of current theories about the impact of rhythm and pitch structure on expectancies and attention. In particular, we consider whether or not pitch identifications in a sequence monitoring task yield expectancy profiles that differ as a function of a pattern’s pitch and/or time structure (Experiments 2 and 3). Moreover, we also ask if such profiles are susceptible to modulation by voluntary factors instilled by instructions (Experiment 3).

Expectancy profiles, first observed by Large and Jones (1999), have been interpreted as support for the entrainment of attending. Entrainment is a ubiquitous biological activity responsible for a variety of synchronous behaviors in organisms that connect them to various environmental regularities (light/dark cycles, meals, tidal rhythms, and so forth) (Winfree, 2000). With respect to attending, one hypothesis is that attending is associated with an internal neural activity with oscillatory tendencies and a corresponding potential for entrainment. At time scales of speech and music, such an activity reflects a primitive attunement process that may support implicit sequence learning in that it specifies how listeners tacitly come to an internal synchrony with aspects of an external sequence, i.e., the driving rhythm (Large & Jones, 1999). However, in the strict entrainment view of attending, the construct of a driving sequence rhythm is narrowly interpreted: it comprises a series of stimulus time intervals, each marked by a change in intensity conveyed by a tone’s onset. Corresponding support for the development of temporal expectancies derives from tasks involving judgments of time intervals using monotone auditory patterns (McAuley & Jones, 2003).

In the present research, we compare a strict view of a driving rhythm (given by Large & Jones, 1999) with a broader view that assumes that an auditory driving rhythm is described in terms of changes in both intensity and tone frequency. To develop this, we begin with the concept of a temporal attentional focus as a concentration of attentional energy (pulse) in time that is carried by an entraining oscillator (Large & Jones, 1999). Attentional energy is not a mentalistic construct; rather, it refers to a potentially measurable temporal concentration of neural activity associated with an internal oscillator (Large and Jones, 1999, Snyder and Large, 2005). We distinguish our description of a focus in time from attentional foci along space-like dimensions that have been proposed by others for visual space (Jonides and Yantis, 1988, Yantis and Jonides, 1984) or in pitch space (i.e., along the acoustic frequency continuum, e.g., Scharf, 1998). Large and Jones described this focus as an expected region in time (cf. Wright & Fitzgerald, 2004). Following entrainment theory, the location of a temporal focus depends on the rate and coherence of a driving rhythm. This is because listeners come to synchronize their attentional focus with the temporal onsets of successive tones, as illustrated in Fig. 2. The width, as well as location, of a recurrent attending pulse, i.e., the focus, changes as a function of rhythmic coherence. Thus, in monitoring rhythmically irregular sequences a listener’s attentional focus is proposed to widen (dotted line in Fig. 2, inset). As shown, a wider attentional focus corresponds to a flatter expectancy profile; in turn, this leads to predictions of lower accuracy in judging temporally expected tones within the focus. By contrast, in sequences with rhythmically regular timing, a listener’s temporal focus narrows, leading to more precise attentional targeting in time and greater accuracy with rhythmically expected tones (solid line in Fig. 2, inset). Because entrainment involves inherently oscillatory attending activities, when it is efficient, it naturally incorporates an internal anticipatory process, meaning that the relative timing of an organism is in synchrony with a stimulus time pattern. In other words, a temporal expectancy reflects a timed, anticipatory, extrapolation of an attentional focus that is driven by event rhythm.

Efficient entrainment in rhythmically coherent patterns promotes precisely timed expectancies. Moreover, when a coherent rhythm contains a single violation, attentional synchrony is restored (within limits) via adaptive responding following this perturbation (Jones, 2004, Large and Jones, 1999). Intuitively, an ill-timed tone elicits in a listener a sort of “double-take” that reflects an underlying corrective reaction of the entraining oscillation. Temporally unexpected elements automatically result in time shifts of an internal oscillation that have been formalized as phase corrections [Large and Jones, 1999; see (Eq. 3)]. Because a phase correction rests on a reactive shift of the attentional focus in time, it has been described as reactive attending (Jones, 2004). As with the underlying synchrony of anticipatory attending, the underlying process of phase correction is not mentalistic; rather, it is also subject to the physical/biological constraints of entrainment. However, we distinguish reactive attending from anticipatory attending because anticipatory attending is significantly influenced by the global time structure of an event whereas reactive attending is a response to a local, often temporally deviant, aspect of pattern structure.
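The idea that a single timing violation is followed by a gradual return to synchrony can be illustrated with a minimal Python sketch. It uses a simple linear correction as a stand-in and is not the phase-correction equation of Large and Jones (1999); the function name, the 500 ms period, the 100 ms delay, and the coupling value of 0.5 are all hypothetical choices made for illustration.

```python
def track_relative_phase(onsets_ms, period_ms, coupling=0.5):
    """Return the relative phase of each onset (0 = exactly on time).

    Illustrative linear correction only (an assumption, not the published
    model): after each tone, the next expected time is shifted by a
    fraction `coupling` of the timing error just observed.
    """
    expected = onsets_ms[0]  # the oscillator starts aligned with the first tone
    phases = []
    for onset in onsets_ms:
        error_ms = onset - expected            # positive = tone arrived late
        phases.append(error_ms / period_ms)    # error as a fraction of the period
        expected += period_ms + coupling * error_ms  # reactive shift of the focus
    return phases


# Hypothetical isochronous sequence in which the fourth tone is 100 ms late.
onsets = [0, 500, 1000, 1600, 2000, 2500]
phases = track_relative_phase(onsets, period_ms=500)
```

In this sketch the late tone produces a large phase error, and the errors for the following on-schedule tones shrink step by step, which corresponds to the restoration of synchrony described in the text.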

The terms anticipatory and reactive are simply descriptive labels for attending processes that precede and follow in time a given to-be-attended, but potentially deviant, element within an unfolding serial context.1 Although the distinction is evocative of that between voluntary and involuntary attending processes in visual attention, strict parallels here are risky (Jones, 2001). Converging support for this dichotomy in audition comes from ERP findings using sound sequences. Averaged ERPs to deviant items show, e.g., enhanced negativity following that item, whereas anticipatory ERP activity has been shown to precede expected items (Besson and Faita, 1995, Besson and Macar, 1987, Goschke, 1998, Snyder and Large, 2004, Snyder and Large, 2005). Fig. 2 (inset) further illustrates this distinction. Anticipatory attending is shown to realize a temporal expectancy, set in motion by the prevailing rhythm, that is directed toward a future (expected) point in time. By contrast, reactive attending involves a re-orientation toward an unexpectedly timed item that follows from a violation of this expectancy. In short, reactive attending is contingent upon temporal expectancies, i.e., upon anticipatory attending, because it is sparked by an expectancy violation.2

Theoretically, expectancy profiles are predicted in tasks that involve both anticipatory attending (to expected points in time) and reactive attending (to unexpected points in time). A sequence element that occurs at an expected time occurs within an anticipated focus of attending and this ensures relatively good performance. However, if an element occurs outside this temporal focus region (i.e., is very early or very late), then reactive attending is restricted and poorer performance occurs. That is, reactive attending is limited by the temporal width of an attentional focus. This has implications for predicted expectancy profiles. Sequences that are predicted to promote a narrow focus should yield a sharp expectancy profile, characterized by good performance with on-time tones and relatively poor performance with ill-timed ones. Theoretically, a sharp expectancy profile follows from a narrow temporal focus and reflects: (a) efficient anticipatory attending (to rhythmically expected elements) and (b) limited reactive attending (to rhythmically unexpected elements). Conversely, sequences that promote a wide attentional focus should yield a flat expectancy profile due to relatively poor performance with on-time tones along with moderately better performance with ill-timed ones. Theoretically, a wide, diffuse, focus in time results in relatively flat expectancy profiles, reflecting: (a) less efficient anticipatory attending and (b) greater access to certain ill-timed tones.
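The relation between focus width and profile shape can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that a fixed amount of attentional energy is spread over time as a unit-area Gaussian; this is not the pulse function of the Large and Jones (1999) model, and the widths (40 and 90 ms) and probe offsets (±100 ms) are hypothetical values.

```python
import math


def pulse_density(offset_ms, width_ms):
    """Attentional energy at a temporal offset from the expected time.

    Assumption for illustration: a unit-area Gaussian, so a narrower
    focus is taller at its center and lower in its tails.
    """
    norm = width_ms * math.sqrt(2 * math.pi)
    return math.exp(-0.5 * (offset_ms / width_ms) ** 2) / norm


def expectancy_profile(width_ms, offsets_ms=(-100, 0, 100)):
    """Map each probe offset (early, on-time, late) to attentional energy."""
    return {offset: pulse_density(offset, width_ms) for offset in offsets_ms}


narrow = expectancy_profile(width_ms=40)  # hypothetical narrow focus
wide = expectancy_profile(width_ms=90)    # hypothetical wide focus
```

Under these assumptions the narrow focus yields more energy than the wide focus at the expected time and less at the early and late offsets, which reproduces the qualitative contrast between sharp and flat expectancy profiles.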

The Large and Jones (1999) model raises two issues that are addressed in this research. The first issue concerns the interpretation of a driving rhythm. Attentional synchrony was originally described as depending strictly on the global coherence of a driving rhythm as this is conveyed by a series of time intervals and intensity markers. Determinants of temporal expectancies are rate and rhythm which in turn affect the width of an attentional focus, hence the shape of an expectancy profile. This emphasis upon rate (tempo) and rhythm to the exclusion of pitch structure is interesting because expectancy profiles have recently been found in certain pitch judgment tasks as well as time judgment ones. Jones et al. (2002) found that when random pitch sequences were presented in an isochronous rhythm, listeners were better in identifying the pitch of rhythmically on-time probes than in identifying ill-timed ones. This is interesting, because the strict entrainment view of Large and Jones (1999) describes temporal, not pitch, expectancies; that is, neither anticipatory nor reactive attending is responsive to pitch relationships within a driving rhythm. It is possible that attending itself is strictly temporal, hence attending at the “right time” provides general performance benefits. Nevertheless, the Large and Jones model does not address a related possibility, namely that the interpretation of a driving rhythm is too narrow to explain performance in tasks where pitch structure and pitch judgments are involved. The present research addresses this issue by manipulating properties of the driving rhythm. Using a pitch judgment task, we vary not only the timing but also the pitch structure of sequences that listeners must monitor. According to the Large and Jones model only timing variations should matter.

The second issue raised by an entrainment account concerns automaticity of pattern-driven attending. If attending is automatically guided by pattern time structure, as implied by Large and Jones (1999), then instructions should have little influence on sequence monitoring. Reactive attending, in particular, has been considered an automatic (stimulus-driven) response to unexpectedly timed tones. Along with manipulations of pitch structure we manipulate instructions about timing to test these assumptions.

In sum, a strict entrainment model offers two null predictions that provide a common theoretical backdrop for the three experiments we report. These involve the respective roles of pitch structure and instructions in sequence monitoring. To preview, we juxtapose these null hypotheses with alternative proposals provided by several contrasting views of expectancy and/or attention.

To motivate a forthcoming discussion of alternative theories, we first outline important aspects of our patterns that relate to different theoretical predictions about pattern timing and pitch structure.

With regard to timing, all patterns in this research are based on a coherent, i.e., isochronous, rhythmic frame. They comprise nine tones with the eighth tone designated the target tone, as shown in Fig. 3A. The time intervals are marked by uniform intensity changes associated with onsets of tones (that vary in frequency); these time intervals are identical with one exception: one tone is sometimes temporally displaced in order to assess effects of an expectancy violation. Each nine-tone pattern cycles three times. If rhythmic isochrony is violated, the time change occurs in the third cycle with the target tone (now termed probe) arriving unexpectedly in time (i.e., early or late). In the first experiment the rhythm is isochronous in all cycles, whereas in later experiments the relative timing of the probe tone varies, arriving (equally often) early, on-time, or late. Unlike earlier research, in these experiments the probe is the penultimate tone, not the final sequence tone.
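The timing scheme just described can be illustrated with a minimal Python sketch. The inter-onset interval (500 ms), the probe displacement (75 ms), and the assumption that the three cycles follow one another without an added pause are hypothetical; none of these values is taken from the experiments.

```python
def onset_times(ioi_ms=500, n_tones=9, n_cycles=3, target_position=8,
                probe_shift_ms=0):
    """Return tone onset times (ms) for a recurrent isochronous pattern.

    Only the target in the final cycle (the probe) may be displaced:
    a negative shift makes it early, a positive shift makes it late.
    Default timing values are hypothetical, chosen for illustration.
    """
    onsets = []
    for cycle in range(n_cycles):
        for position in range(1, n_tones + 1):
            onset = (cycle * n_tones + position - 1) * ioi_ms
            is_probe = cycle == n_cycles - 1 and position == target_position
            if is_probe:
                onset += probe_shift_ms
            onsets.append(onset)
    return onsets


on_time = onset_times()
early = onset_times(probe_shift_ms=-75)
late = onset_times(probe_shift_ms=75)
```

In each list the probe is the 26th onset (the eighth tone of the third cycle), and it is the only onset that differs across the three timing conditions.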

The probe timing variable is critical to testing certain hypotheses about sequence monitoring and expectancy profiles. An observed expectancy profile depends on both the contextual timing within a sequence and on the temporal violations of this driving rhythm. Thus, if a probe occurs early or late, then attending should be ‘mistimed,’ leading to an expectancy profile of the same form reported in related paradigms (i.e., where timing of an ending tone varied, cf. Fig. 1). A strict version of the entrainment model (i.e., Large & Jones, 1999) implies that people “use” the intensity/time pattern of a rhythmic sequence to pace attending; thus, they should perform best when a probe tone is rhythmically expected and worst when it is not, regardless of sequence pitch structure.

We assess the null prediction of Large and Jones by varying pitch interval widths in these sequences. One aim is to examine the respective effects of global and local pitch structure on expectancy profiles. Examples of the pitch manipulations appear in Fig. 3. In practice, global and local pitch structure have often been confounded (e.g., sequences contain either all small pitch intervals or all large ones); here we orthogonally cross global with local pitch structure. Our manipulations focus upon pitch interval size (as opposed to tonal/harmonic properties); constituent tones come from the chromatic musical scale. The nine-tone patterns comprised three cells with each cell involving three tones (as in the exemplar pattern of Fig. 3A). Pitch relationships in the first two cells constitute global pitch structure; these form a serial context variable. In the third cell, pitch intervals convey local pitch structure because they surround the target pitch; these form a target distance variable. In all experiments, we orthogonally manipulate the size of pitch intervals comprising context (Narrow, Wide) and target distance (Small, Large) cells, as shown in the four panels of Fig. 3B. This resulted in four different pitch structure, i.e., context/target distance, conditions: Narrow/Small, Narrow/Large, Wide/Small, and Wide/Large. These pitch structure variables distinguish pattern types in all experiments.
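The crossing of context with target distance, and its combination with probe timing in the later experiments, can be enumerated with a minimal Python sketch. The function and variable names are hypothetical; the factor levels are those named in the text.

```python
from itertools import product

CONTEXTS = ("Narrow", "Wide")            # global pitch structure (cells 1 and 2)
TARGET_DISTANCES = ("Small", "Large")    # local pitch structure (cell 3)
PROBE_TIMINGS = ("early", "on-time", "late")


def pitch_conditions():
    """Return the four context/target distance conditions."""
    return [f"{context}/{distance}"
            for context, distance in product(CONTEXTS, TARGET_DISTANCES)]


def design_cells(vary_timing=True):
    """Cross the pitch conditions with probe timing.

    With vary_timing=False the probe is always on time, as in the
    first experiment.
    """
    timings = PROBE_TIMINGS if vary_timing else ("on-time",)
    return [(condition, timing)
            for condition in pitch_conditions()
            for timing in timings]


conditions = pitch_conditions()
cells_with_timing = design_cells()
cells_isochronous = design_cells(vary_timing=False)
```

Under these assumptions the design has four pitch structure conditions when the rhythm is fully isochronous and twelve cells when probe timing is also varied.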

Our main experimental design requires selective attending to certain tones within patterns from these four pitch conditions. In each pattern, listeners must track the pitch of a recurring target to determine how this pitch changes when it becomes the probe in the third cycle. In some experiments (Experiment 1), the local time of the probe does not vary whereas in others it does (Experiments 2 and 3). Together, these pitch and time manipulations permit tests of the Large and Jones (1999) entrainment hypothesis. Specifically, we ask: “what aspects of pattern structure determine whether or not a target/probe tone falls outside a listener’s temporal focus of attention?” According to a strict version of entrainment, these aspects relate only to a pattern’s intensity/time changes. Therefore, in all four pitch pattern conditions performance should be poorer for ill-timed than for on-time probes and identical expectancy profiles should appear in all four pitch structure conditions (in Experiments 2 and 3). Alternatively, other theories incorporate a role for pattern pitch structure. We consider these next.

We contrast four alternative approaches with the strict entrainment view of Large and Jones (1999). Unlike the Large and Jones theory, all of these theories postulate a role for pitch expectancies in the present task. However, depending on the theory, pitch expectancies reflect an orientation to the pitch dimension or they emerge from attunement to pitch relationships within a pattern. These approaches also differ in respective emphasis on time and rhythm. They fall into two categories, namely theories that feature explicit roles for both pitch and time structure (Pitch/Time Entrainment, Pitch Space accounts) and those which primarily emphasize the role of pitch structure on pitch expectancies (Implication–Realization, Scene Analysis theories). The former offer predictions about manipulations of both pitch structure and probe timing, whereas the latter offer predictions mainly about the pitch structure variables.

The two models in this category share a common concern with an expectancy-generated attentional focus that involves pitch. However, their interpretation of an attentional focus differs.

A Pitch/Time Entrainment view shares with the Large and Jones model an emphasis upon pattern rhythm as a determinant of entrainment. But it broadens the definition of an effective driving rhythm by including frequency change as well as changes in intensity and/or time. In this view, a pitch/time pattern context induces a moment-to-moment attending trajectory that is extrapolated from pattern structure, as a dynamic expectancy in both pitch space and time. Such trajectories function as joint expectancies about ‘where’ in pitch space and ‘when’ in time future tones may occur. Thus, expectancies necessarily reflect an attunement to pattern structure, but they are dynamic extensions of this structure as well. Presumably, future tones that occur within an anticipated pitch–time focus (neighborhood) will be identified more efficiently than those far from this pitch–time focus (Jones, 1990).

Relevant research suggests that both global and local pitch relationships contribute to rhythmic coherence and to establishing pitch–time expectancies. Listeners were best in identifying an unexpected tone’s pitch when it was relatively close in pitch space to its expected location; related constraints were associated with time manipulations (Boltz, 1989, Dowling et al., 1987, Jones et al., 1981). This research invites enlarging the construct of a driving rhythm to include pitch structure. Accordingly, the Pitch/Time Entrainment theory interprets an attentional focus as a region in pitch space and time. This theory leads to two hypotheses that concern, respectively, anticipatory and reactive attending, but which differ from those of Large and Jones (1999).

The first hypothesis assumes that anticipatory attending is influenced by a pattern’s invariant relationships, including its initial serial structure. Narrow pitch context intervals are assumed to elicit anticipatory attending with a narrow attentional focus in time, whereas patterns with Wide pitch context intervals will elicit less precise anticipations, due to a wider and more diffuse attentional focus in pitch–time. In other words, pitch structure is assumed to affect the width of an attentional focus, including its temporal width. This leads to the prediction that Narrow context conditions will promote more efficient anticipatory attending (i.e., better performance) to on-time target/probes than Wide context conditions.

The second hypothesis of the Pitch/Time Entrainment theory addresses reactive attending to ill-timed probes. It assumes that local pitch structure, associated with target distance, will contribute to anticipatory attending in that it also affects focus width. That is, this hypothesis is consistent with the first hypothesis in that both hypotheses assume that small pitch intervals invite narrow attentional foci in time. Therefore, with regard to local pitch structure, a Small target distance is more likely to instill a narrow temporal focus than is a Large target distance. In turn, this places constraints on reactive attending; it implies that with Large target distances attentional energy for on-time probes will be dissipated in favor of remote ill-timed probes that are more likely to fall within a diffuse attentional focus (relative to Small target distances). This is because the attentional focus (in pitch–time) should widen with large pitch intervals that surround a target/probe tone. By contrast, with Small target distances, a narrow attentional focus will restrict reactive attending to early and late probes. Thus, this second hypothesis predicts that performance with ill-timed probes should be poorer with Small target distances than with Large ones.

In sum, a Pitch/Time Entrainment view assumes that both anticipatory and reactive attending are influenced by pitch structure with the result that distinctly different expectancy profiles should emerge in different pitch structure conditions. These predictions are summarized in Fig. 4 (solid lines). Sharpest profiles should emerge with monitoring of patterns that instill the narrowest attentional focus, namely those in the Narrow/Small condition, whereas the flattest profiles (due to wide foci) should occur with patterns in the Wide/Large condition.

A Pitch-Space model offers a different view of attending. It draws on attentional research in vision to provide an interpretation of an attentional focus with a fixed width that is centered at a specified location on the pitch dimension (i.e., in pitch space, but not in time). The rationale is that pitch is a task-relevant dimension whereas time is not (Egeth and Yantis, 1997, Yantis and Egeth, 1999). In this account, an expectancy is a top-down voluntary search process, conferred by instructions and task demands, which orients attending to features along the pitch dimension (versus the time dimension). Attending is not oriented by unfolding serial pitch relationships within a sequence but by instructions and task demands. Consequently, the Pitch Space model does not incorporate a role for global pitch structure and offers no related predictions about anticipatory attending.

Nevertheless, the Pitch Space model leads to clear predictions about local pitch and time structure. If a listener orients to pitch as a dimension, then a listener’s attentional focus will be centered on pitch features of successive tones; consequently, tones proximal in pitch space are likely to be included within a common attentional focus (Mondor & Bregman, 1994). Thus, performance with all probes, even ill-timed ones, should benefit from greater local pitch proximity (Small Target Distance). Moreover, following visual attention interpretations, variations in probe timing should result in stimulus-driven attention shifts. Due to their bottom-up salience, early and late probes may be attention-getting much as abrupt onsets are in visual attention. An ill-timed probe can summon attention, particularly when it is also close in pitch space to the preceding tone. In visual attention, identification of a target is facilitated when it occurs with an abrupt onset, due to attentional capture, namely a stimulus-driven shift of a viewer’s focus of attention to the target’s spatial location (e.g., Jonides & Yantis, 1988). In auditory arrays, a Pitch Space account predicts that a parallel facilitation will accompany a shift of attention in pitch space to an early or late probe tone with pitch-proximal tones. Thus, a Pitch Space view predicts expectancy profiles in various pitch conditions that differ as a function of target distance and probe timing, as shown in Fig. 4 (dashed lines). Although there is some evidence that performance may be facilitated by the occurrence of certain structurally unexpected targets, this comes from research using a different task involving fast sequences in which probe timing was not varied (Mondor & Terrio, 1998). Thus, specific predictions involving pitch proximity and timing variations within slow sequences have not been previously evaluated.3 These predictions of the Pitch Space model differ in critical ways from those of the Pitch/Time Entrainment model. In particular, in this view local pitch proximity should not only result in relatively poor performance for on-time probes, but it should yield better (not worse) performance with ill-timed probes in patterns with Small target distances than with ill-timed probes in patterns with Large target distances.

Two important differences that distinguish the Pitch Space from the Pitch/Time Entrainment theory are shown in Fig. 4. First, with respect to global pitch structure, a Pitch/Time Entrainment theory predicts a benefit from Narrow contexts with on-time probes, due to more efficient anticipatory attending, whereas a Pitch Space model predicts no such benefit. Second, with respect to local pitch structure, the two theories predict contrasting interactions of Target Distance with Probe Timing. This is most striking in sequences with Small Target Distance. In particular, in the Narrow/Small condition, the Pitch/Time Entrainment view (solid lines) predicts a very sharp (inverted U) expectancy profile that contrasts dramatically with the Pitch Space model (dashed lines), which predicts a shallow U profile due to attentional capture by early or late probes.

Finally, predictions about both global and local pitch structure can also be contrasted with null predictions from the Large and Jones (1999) model. This strict entrainment theory predicts identical, sharp expectancy profiles for all four pitch structure conditions.

The two remaining theories feature a major role for Gestalt rules. However, they differ in their approach to pitch expectancies.

The Implication–Realization (I-R) theory of Narmour proposes that pitch (melodic) expectancies depend upon innate (automatic) Gestalt principles applied to pitch intervals (Narmour, 1990, Narmour, 1991). This theory identifies pitch structure as a source of “bottom-up,” melodic expectancies based on Gestalt rules (Narmour, 1991, Narmour, 1992, Schellenberg, 1996, Schellenberg et al., 2002). Narmour’s original theory also provides guidelines for top-down (learned) as well as bottom-up (innate) expectancies to accommodate a range of sequential effects. Simpler versions of the I-R approach, offered by Schellenberg, engage fewer Gestalt rules, but all I-R approaches feature pitch proximity importantly (Schellenberg, 1996, Schellenberg et al., 2002). Smaller pitch intervals are proposed to imply a strong local expectancy that the immediately following pitch interval (the realized interval) should also be small.

In practice, assessments of I-R theories have addressed local effects of pitch structure in that these often entail pitch manipulations of a final sequence tone. Thus, a melody may end with an implicative (open) and realized (closed) pair of pitch intervals. Here, I-R models predict that listeners rely upon innate Gestalt laws to generate pitch expectancies about the final tone, as these are reflected by high goodness ratings. Typically, regression analyses confirm that local pitch proximity is a strong determinant of these local expectancies (Cuddy and Lunney, 1995, Schellenberg, 1996, Schmuckler, 1989, Thompson et al., 1997, Unyk and Carlsen, 1987).

In sum, in various I-R models pitch expectancies are largely determined by Gestalt principles, especially the proximity principle. In the present research, we adapt the I-R rationale to address selective listening. We assume that in monitoring chromatic pitch sequences, expectancies based on local and/or global pitch proximity will lead to better overall performance with expected than with unexpected pitches. These models provide clear predictions about local proximity effects (e.g., in Narrow/Small, Wide/Small conditions); generally, I-R theories imply better overall performance for patterns with small rather than large pitch intervals. Finally, however, this approach provides less specificity about the role of temporal expectancies on rhythmically expected versus unexpected probe tones in pitch identifications.

Scene Analysis (Bregman, 1990) shares with I-R theory a featured role for Gestalt principles, especially pitch proximity. In this two-stage model, Gestalt principles represent “hard-wired” relationships functional mainly in an initial perceptual stage that addresses perception of fast auditory sequences; here pitch proximity is an important determinant of automatic groupings among tones with neighboring frequencies. In a second, schema-driven stage, pitch expectancies operate that are not necessarily based on Gestalt principles. Unlike Narmour, Bregman defines a pitch expectancy in terms of selective attending that is determined by a learned, domain-specific schema. These schemas effortfully guide attending to selected tones within slow sequences (as in the present research). In this respect, if pitch-proximal tones automatically form a group (pre-attentively) in the first stage, then this grouping can interfere with selective attending to a single (within group) tone in the second, schema-driven stage (Bregman and Rudnicky, 1975, Mondor and Terrio, 1998). Scene Analysis therefore predicts that selective attending is best in pitch patterns where the target tone “stands out” from an established perceptual group. In the present design, this is most likely with patterns of the Narrow/Large condition. As with I-R theories, Scene Analysis does not offer specific predictions about temporal expectancies or the impact of pitch structure on rhythmically generated expectancy profiles.

Four major theoretical approaches address the role of pitch structure in determining pitch expectancies during selective listening. Although they differ in assumptions and predictions, they share the conclusion that pitch distance, often cast as pitch proximity (local and/or global), plays an important role in the monitoring of relatively slow auditory patterns. A fifth approach to sequence monitoring, the Large and Jones (1999) model, concentrates on temporal, but not pitch, expectancies. Because it provides a spare description of pattern structure, it offers an omnibus null hypothesis regarding the respective roles of global and local pitch structure in a pitch identification task.

Our theoretical overview invites predictions about the role of pattern structure on selective attending. Our plan is to initially compare some of these predictions in Experiment 1, where all five theories provide predictions. This is a baseline experiment in which none of the sequences contain rhythmic irregularities. Thus, in Experiment 1, hypotheses from all five theories concern the effects of pitch proximity in global (pattern context) and local (target distance) pitch structure on pitch identifications of probe tones that always occurred as on-time probes. In Experiments 2 and 3, where we vary probe timing, we further evaluate the pitch structure hypotheses assessed in Experiment 1 for the three models that specifically address probe timing, i.e., Large and Jones (1999), Pitch/Time Entrainment, and Pitch Space approaches. Experiment 3 differs from Experiment 2 with respect to a manipulation of instructions about probe timing. Whereas in Experiment 2 listeners were not told about probe timing variations (implicit instructions), in Experiment 3 they were informed of this variable (explicit instructions). Experiment 3 continues to focus on the same three theories, but it assesses the Pitch Space and entrainment views about the role of instructions versus pattern structure on attending. This experiment considers whether instructions about probe timing can over-ride influences of pattern structure on performance.

Section snippets

Experiment 1: A baseline experiment with no rhythmic violations

Experiment 1 examines the role of local versus global pitch structure in monitoring sequences where all probes occur at temporally expected times. An omnibus null hypothesis predicts effects of neither pitch variable, whereas various alternative hypotheses emphasize different roles for global and local pitch proximity in selective attending to pitch.

If global pitch structure guides selective attending then its effects may be evident in performance with probes that occur later in a sequence (cf.

Experiment 2: Rhythmic violations and implicit instructions

Experiment 2 builds on Experiment 1 to examine effects of pitch structure on expectancy profiles. We use the same task, stimuli and instructions as those used in Experiment 1. Although we make no explicit mention of probe timing in the instructions, the single difference between the two experiments involved variation in probe times such that in Experiment 2 the probe tones do not always occur at the rhythmically expected time. Instead, equally often a probe occurs on-time, early, or late. Our

Experiment 3: Rhythmic violations and explicit instructions

The findings of Experiment 2 are more in line with entrainment explanations (Large & Jones, 1999; Pitch/Time Entrainment) than with a Pitch Space account. However, it can be argued that instructions for the task used in Experiment 2 did not fulfill an important assumption of the Pitch Space theory regarding prioritizing relevant versus irrelevant task dimensions for top-down control of attending. A more appropriate test calls for manipulations designed to directly test voluntary versus

General discussion

In attending to auditory sequences listeners are strongly influenced by early portions of a sequence, namely by the initial pattern of pitch and time intervals. In addition, they generally perform best in identifying the recurrent pitch of rhythmically expected tones that occur within simple pitch patterns, namely in sequences that contain mainly small pitch intervals. This is true in all three experiments, regardless of instructions. In other pitch patterns, a similar performance profile

References (57)

  • A.S. Bregman et al. Auditory segregation: Stream or streams? Journal of Experimental Psychology: Human Perception and Performance (1975)
  • E.C. Cherry. On human communication: A review, a survey, and a criticism (1957)
  • L.L. Cuddy et al. Expectancies generated by melodic intervals: Perceptual judgments of melodic continuity. Perception & Psychophysics (1995)
  • W.J. Dowling et al. Aiming attention in pitch and time in the perception of interleaved melodies. Perception & Psychophysics (1987)
  • H. Egeth et al. Visual attention. Annual Review of Psychology (1997)
  • W. Garner et al. The perception and learning of temporal patterns. Quarterly Journal of Experimental Psychology (1968)
  • T. Goschke. Implicit learning of perceptual and motor sequences
  • G.Z. Greenberg et al. Frequency-response characteristic of auditory observers detecting signal of a single frequency in noise: The probe-signal method. Journal of the Acoustical Society of America (1968)
  • J. Hoffman et al. Implicit learning of loosely defined structures
  • H.M. Johnston et al. Higher order pattern structure influences auditory representational momentum. Journal of Experimental Psychology: Human Perception and Performance (2006)
  • M.R. Jones. Cognitive representations of serial patterns
  • M.R. Jones. A tutorial on some issues and methods in serial pattern research. Perception & Psychophysics (1981)
  • M.R. Jones. Learning and the development of expectancies: An interactionist approach. Psychomusicology (1990)
  • M.R. Jones. Temporal expectancies, capture and timing in auditory sequences
  • M.R. Jones. Attention and timing
  • M.R. Jones et al. Evidence for rhythmic attention. Journal of Experimental Psychology: Human Perception and Performance (1981)
  • M.R. Jones et al. Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science (2002)
  • J. Jonides et al. Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics (1988)

    This research was sponsored, in part, by a National Science Foundation Grant, BCS-9809446, awarded to the second author. The authors are indebted to Michelle Huffman and Julia Wood for assistance with data collection as well as to Robert Ellis, Sarah Judkins, Iring Koch, and Noah MacKenzie for helpful comments on earlier drafts.
