Perceived empty duration between sounds of different lengths: Possible relation with repetition and rhythmic grouping

Kuroda, Tsuyoshi; Tomimatsu, Erika; Grondin, Simon; Miyazaki, Makoto

doi:10.3758/s13414-016-1172-x

Published: 05 July 2016

Volume 78, pages 2678–2689 (2016)

Attention, Perception, & Psychophysics

Tsuyoshi Kuroda¹,
Erika Tomimatsu²,
Simon Grondin³ &
Makoto Miyazaki¹

Abstract

We investigated how perceived duration of empty time intervals would be modulated by the length of sounds marking those intervals. Three sounds were successively presented in Experiment 1. Each sound was short (S) or long (L), and the temporal position of the middle sound’s onset was varied. The lengthening of each sound resulted in delayed perception of the onset; thus, the middle sound’s onset had to be presented earlier in the SLS than in the LSL sequence so that participants perceived the three sounds as presented at equal interonset intervals. In Experiment 2, a short sound and a long sound were alternated repeatedly, and the relative duration of the SL interval to the LS interval was varied. This repeated sequence was perceived as consisting of equal interonset intervals when the onsets of all sounds were aligned at physically equal intervals. If the same onset delay as in the preceding experiment had occurred, participants should have perceived equality between the interonset intervals in the repeated sequence when the SL interval was physically shortened relative to the LS interval. The effects of sound length seemed to be canceled out when the presentation of intervals was repeated. Finally, the perceived duration of the interonset intervals in the repeated sequence was not influenced by whether the participant’s native language was French or Japanese, or by how the repeated sequence was perceptually segmented into rhythmic groups.

There is an increasing body of literature in the field of psychophysics regarding how the duration processing of empty intervals marked by successive signals is influenced by the physical structure of those signals, such as their sensory modality and size (e.g., Grondin, 2010; Ono & Kitazawa, 2009; Xuan, Zhang, & Chen, 2007). Some of these studies focus on the length of the sounds marking empty intervals, examining one of the following issues: (1) how the listener’s ability to detect a change in the duration of empty intervals, as expressed by the just-noticeable difference or the slope of psychometric functions, is modulated by the lengthening of each marker (Grondin, Roussel, Gamache, Roy, & Ouellet, 2005; Grose, Hall, & Buss, 2001; Kuroda, Hasuo, & Grondin, 2013; Penner, 1976; Rammsayer & Leutner, 1996), and (2) how the perceived duration of empty intervals, as expressed by a point of subjective equality, is modulated by the lengthening of each marker (Hasuo, Nakajima, Osawa, & Fujishima, 2012; Woodrow, 1928). The former issue concerns the sensitivity (or variability) of the timekeeping process, and the latter its accuracy. The present study focused on the accuracy issue; in other words, we investigated the effects of sound length on the perceived duration of empty intervals.

Woodrow (1928) and Hasuo et al. (2012) examined how the perceived duration of single empty intervals, marked by two successive sounds, is modulated by the lengthening of each sound. Woodrow fixed the interstimulus (offset to onset) interval between two sounds at 500 ms and found that this interval was perceived as longer when either of the two sounds was made longer. Hasuo et al. fixed the interonset (onset to onset) interval between two sounds at 120, 240, or 360 ms, and found that lengthening the second marker resulted in an overestimation of this interval. In brief, single empty intervals are perceived as longer when they are marked by longer sounds. This finding was also obtained by Grondin, Ivry, Franz, Perreault, and Metthé (1996), who used intermodal intervals (250, 500, and 750 ms).

Hasuo, Nakajima, and Hirose (2011) asked listeners to compare two neighboring intervals marked by three successive sounds’ onsets (in their first experiment). This procedure may be regarded as a type of the AXB paradigm, as used in the studies relevant to the kappa effect (Alards-Tomalin, Leboe-McGowan, & Mondor, 2013; Henry & McAuley, 2009; ten Hoopen, Miyauchi, & Nakajima, 2008). Hasuo et al. (2011) indicated that an interval was perceived as longer when its terminating marker was lengthened. For example, when the middle sound was made longer, this sound’s onset had to be presented earlier so that the first interval (between the initial and middle markers) and the second interval (between the middle and last markers) were perceived as equal. Similar results were reported by Kuroda, Hasuo, Labonté, Laflamme, and Grondin (2014) using two neighboring inter- and intramodal intervals.

However, no studies have yet investigated the effects of marker length on perceived duration when the number of presented intervals is further increased (i.e., is more than two). Let us suppose that a short sound (S) and a long sound (L) are alternated repeatedly (as SLSLSLSL… or LSLSLS…). It may be difficult to predict the perceived duration of empty intervals in this repeated sequence from the findings in the previously mentioned literature, because this perceived duration may be influenced not only by the length of the two consecutive sounds but also by another factor: rhythmic grouping. Bolton (1894) conducted a phenomenological experiment in which a sound was repeated at a constant interval. Listeners spontaneously perceived the sequence as segmented into rhythmic groups consisting of several sounds (subjective rhythmization). Moreover, some of the listeners reported that they perceived a longer interval between the rhythmic groups. In other words, between-groups intervals were perceived as longer than within-groups intervals. Similar effects of rhythmic grouping on perceived duration were demonstrated in a recent psychophysical experiment (Geiser & Gabrieli, 2013), but only for nonmusician listeners.

There is a widely accepted principle that, when successive sounds are of different lengths and are segmented into rhythmic groups, the resulting groups are likely to have the longer sounds at their end (Bolton, 1894). According to this principle, the sequence described above (SLSLSL…) would be segmented into groups consisting of a short sound followed by a long sound ([SL][SL][SL]…; SL grouping), instead of groups consisting of a long sound followed by a short sound (S][LS][LS][L…; LS grouping). If between-groups intervals are perceived as longer, the interval between the long and short sounds (LS interval) should be perceived as longer than the interval between the short and long sounds (SL interval). Thus, the perceived duration of empty intervals in the repeated sequence may be attributed to the rhythmic grouping that listeners perceive according to the relative length of the alternated sounds.

Additionally, previous studies have indicated that a change in the duration of between-groups intervals is more difficult to detect than a change in the duration of within-groups intervals (Fitzgibbons, Pollatsek, & Thomas, 1974; Geiser & Gabrieli, 2013; Thorpe & Trehub, 1989; Trainor & Adams, 2000). Some kinds of perceptual grouping could thus influence the ability to detect a change in the duration of empty intervals (sensitivity) as well as the perceived duration of empty intervals (accuracy). As noted previously, this study focused on the accuracy issue and examined whether the effects of rhythmic grouping could interact with those of sound length on perceived duration.

There are two experiments in our study. Stimulus sequences consist of three successive sounds in Experiment 1, where we examine the effects of marker length on the perceived duration of the two neighboring intervals, as in Hasuo et al. (2011). There is no reason to assume that only three sounds are perceived as two beats leading to SL or LS grouping. However, either SL or LS grouping should occur when a short sound and a long sound are alternated repeatedly in Experiment 2. We estimate the temporal conditions enabling participants to perceive equality between the interonset intervals in this repeated sequence. Moreover, an additional session is conducted in Experiment 2 to establish whether individuals perceive SL or LS grouping for the repeated sequence.

Two previous studies measured both the individual tendency to perceive some types of perceptual grouping and the ability to detect a change in the duration of temporal gaps, but failed to find a systematic relation between the results in these tasks (Kuroda et al., 2013; Neff, Jesteadt, & Brown, 1982). Neff et al. (1982) used a sequence in which a low-frequency sound and a high-frequency sound were alternated repeatedly, and examined whether auditory stream segregation according to frequency proximity would modulate the ability to detect a change in the duration of temporal gaps between the two consecutive sounds of different frequencies. In addition to conducting a discrimination experiment, they conducted a stream-segregation experiment, measuring whether participants perceived the stimulus sequence as integrated into one stream or segregated into two streams. The results indicated no signs of a relation between sequential grouping and gap discrimination. It therefore seemed necessary to establish the individual tendency of rhythmic (SL or LS) grouping in our study, which aimed to examine the relation between perceived duration and rhythmic grouping. Indeed, there is a technical difference between our study and the previous studies that tested the rhythmic grouping of repeated sequences (Hay & Diehl, 2007; Iversen, Patel, & Ohgushi, 2008; Kuroda et al., 2013). When the length of markers was manipulated, physical interstimulus intervals were fixed in the previous studies, but physical interonset intervals are fixed in this study. In other words, interonset intervals were physically varied when the length of markers was manipulated in the previous studies, whereas these intervals are manipulated independently of the length of markers in this study. We thus examine whether the same rhythmic grouping as in the previous studies would occur in Experiment 2. Consequently, Experiment 2 consists of two sessions, one for measuring perceived duration and the other for measuring rhythmic grouping.

A previous study by Kuroda et al. (2013) partly addressed an issue similar to the one at the heart of our study, examining the effects of rhythmic (SL or LS) grouping on the listener’s ability to detect a change in the duration of temporal gaps in the repeated sequence. However, their discrimination experiment was designed to measure only how well listeners could detect a change in the gap duration, not how long the gaps were perceived to be.^{Footnote 1} In other words, that previous study tested temporal sensitivity but not perceived duration. Consequently, our study is the first to test the perceived duration of empty intervals in relation to the individual’s tendency to perceive rhythmic grouping, measuring both performances in a single experiment (Experiment 2). Furthermore, when the presentation of gaps was not repeated in Kuroda et al. (2013), participants compared two gaps, each of which was marked by two sounds (i.e., Sound 1–gap–Sound 2 … Sound 3–gap–Sound 4). However, in our Experiment 1, participants compare two neighboring intervals that are marked by three sounds (Sound 1–gap–Sound 2–gap–Sound 3). This structure may more often be found in daily situations where one listens to music and speech.

In summary, the purpose of this study was to investigate how the perceived duration of interonset intervals would be modulated by the length of sounds marking those intervals. The effects of marker length on perceived duration were also examined in terms of the relation with the repetition of intervals and with the listener’s tendency to perceive rhythmic (SL or LS) grouping.

Experiment 1

Three sounds were successively presented in Experiment 1. The temporal position of the middle sound’s onset, as well as the length of each sound, was manipulated. Participants judged whether the middle sound’s onset appeared too early (close to the initial sound’s onset) or too late (close to the last sound’s onset) to perceive the three sounds as presented at equal interonset intervals. They could also respond that the middle sound’s onset appeared exactly halfway between the initial and last ones. From the psychometric function of the “too early” and that of the “too late” probability, we estimated a point of subjective equality (PSE), expressing the position of the middle sound’s onset that made participants perceive the three sounds as presented at equal interonset intervals.

Method

Participants

Sixteen participants (12 females), self-reporting having normal hearing, were recruited. They were students and employees at Laval University, aged 19 to 37 years. They consented to their participation by signing a form approved by the ethics review board of this institution. One more participant was recruited, but the data from this participant were not kept for the analysis.^{Footnote 2}

Stimuli

The stimuli were digital signals sampled at 44100 Hz and quantized to 16 bits. Each trial consisted of three sounds that were successively presented through headphones (Sennheiser HD 477) at about 66 dBA. These sounds were square-like waves generated by mixing the fundamental of 500 Hz and the first three odd sinusoidal components. Each component’s amplitude decreased in inverse proportion to its harmonic number; for example, the third harmonic (1500 Hz) had one third of the fundamental’s amplitude.

Each of the three sounds was either 150 ms (short, S) or 262.5 ms (long, L), and the initial and last sounds were always of the same length, resulting in four length conditions: SSS, SLS, LSL, and LLL (see Fig. 1). SLS and LSL could be regarded as nonrepeated versions of the sequences in which a short sound and a long sound were alternated repeatedly in Experiment 2. The parameters of 150 ms and 262.5 ms were also used in Iversen et al. (2008) and Kuroda et al. (2013). Kuroda et al. reported that listeners’ ability to detect a change in the duration of single gaps was severely impaired when the first marker was lengthened from 150 to 262.5 ms. Rammsayer and Leutner (1996) also indicated that the ability to detect a change in the duration of gaps is shifted when sound length exceeds a critical point of 200 ms. Thus, the lengthening of the marker from 150 to 262.5 ms might also influence perceived duration in this study. To avoid spectral splatter, amplitude rose and decayed over 20 ms at the beginning and end of each sound with raised-cosine ramps, the ramps being included in the sound length.
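The marker sounds described above could be synthesized roughly as follows. This is a minimal sketch in plain Python, not the authors' stimulus code. It assumes that "the fundamental and the first three odd sinusoidal components" means harmonics 1, 3, 5, and 7 (the text could also be read as harmonics 1, 3, and 5), and the function name `make_marker`, the sine phase, and the peak normalization are illustrative choices.

```python
import math

SAMPLE_RATE = 44100  # Hz, as stated in the text


def make_marker(duration_ms, f0=500.0, harmonics=(1, 3, 5, 7), ramp_ms=20.0):
    """Return float samples in [-1, 1] for one square-like marker sound.

    Assumption: harmonics (1, 3, 5, 7); each has amplitude 1/n.
    The raised-cosine ramps are included in the total duration.
    """
    n_samples = round(SAMPLE_RATE * duration_ms / 1000.0)
    n_ramp = round(SAMPLE_RATE * ramp_ms / 1000.0)
    # Upper bound on the summed amplitude, used only to keep samples in range.
    peak = sum(1.0 / h for h in harmonics)
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE
        wave = sum(math.sin(2.0 * math.pi * f0 * h * t) / h for h in harmonics)
        if i < n_ramp:
            # Rising half of a raised cosine.
            env = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        elif i >= n_samples - n_ramp:
            # Falling half, mirrored around the end of the sound.
            env = 0.5 * (1.0 - math.cos(math.pi * (n_samples - 1 - i) / n_ramp))
        else:
            env = 1.0
        samples.append(env * wave / peak)
    return samples


short_sound = make_marker(150.0)   # S marker
long_sound = make_marker(262.5)    # L marker
```

Scaling to 16-bit integers and calibrating the output level (about 66 dBA in the study) would be separate steps that depend on the playback hardware.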

The interonset interval between the initial and the last sounds was fixed at 800 ms, whereas the position of the middle sound’s onset was varied; its onset was placed 0, 40, 80, or 120 ms before or after the physical bisection point (400 ms), resulting in seven temporal positions (-120, -80, -40, ±0, +40, +80, +120 ms). Empty intervals around 400 ms were also used in Geiser and Gabrieli (2013).
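To make the timing concrete, the sketch below computes the onsets, the two interonset intervals, and the silent gaps (offset to onset) for one trial. The function name `trial_timing` and the returned dictionary layout are hypothetical; the durations and positions are those given in the text.

```python
SHORT_MS, LONG_MS = 150.0, 262.5
TOTAL_IOI_MS = 800.0  # initial onset to last onset
OFFSETS_MS = (-120, -80, -40, 0, 40, 80, 120)  # middle onset re: bisection point


def trial_timing(pattern, offset_ms):
    """Timing of a three-sound trial such as 'SLS' (hypothetical helper)."""
    lengths = [SHORT_MS if c == "S" else LONG_MS for c in pattern]
    onsets = [0.0, TOTAL_IOI_MS / 2 + offset_ms, TOTAL_IOI_MS]
    # Onset-to-onset intervals: the quantities participants judged.
    iois = [onsets[1] - onsets[0], onsets[2] - onsets[1]]
    # Offset-to-onset silences, which depend on marker length.
    gaps = [onsets[k + 1] - (onsets[k] + lengths[k]) for k in range(2)]
    return {"onsets": onsets, "iois": iois, "gaps": gaps}


timing = trial_timing("LSL", -120)
# iois: [280.0, 520.0]; gaps: [17.5, 370.0]
```

Even at the most extreme position, a long first marker leaves a silent gap of 17.5 ms, so the sounds never overlap within the tested range.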

Procedure

Participants were instructed to judge whether three sounds were presented at equal interonset intervals with three alternatives. They responded “exactly halfway” when the middle sound’s onset was located exactly halfway between the initial and last onsets or when the three sounds’ onsets appeared at exactly equal intervals. They responded “too early” when the middle sound’s onset appeared too early to perceive the three sounds’ onsets as presented at equal intervals or when the middle sound’s onset was closer to the initial one. They responded “too late” when the middle sound’s onset appeared too late to perceive the three sounds’ onsets as presented at equal intervals or when the middle sound’s onset was closer to the last one.

Participants listened to the stimulus pattern by clicking on the “play” pane. A 2-s silent interval began after clicking, and then the pattern was presented. Participants could listen to the pattern only once in each trial, but when listening was disturbed for some specific reason (e.g., yawning or coughing), they were allowed to listen again by clicking on the “replay” pane.

Four experimental conditions (SSS, SLS, LSL, and LLL) were presented in separate sessions. The order of these sessions was counterbalanced. Each session was divided into three blocks. In each block, the seven patterns for the seven middle positions were presented 10 times each in a random order, resulting in 70 trials. Each block was preceded by two practice trials where randomly selected stimuli were presented. Each session took about 30 minutes.
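The block structure can be illustrated with a short sketch. The function names and the use of a seeded generator are assumptions made for the example; practice trials and the counterbalancing of sessions are left out.

```python
import random

POSITIONS_MS = (-120, -80, -40, 0, 40, 80, 120)


def make_block(rng, reps=10):
    """One block: each of the seven middle positions 10 times, shuffled."""
    trials = [pos for pos in POSITIONS_MS for _ in range(reps)]
    rng.shuffle(trials)
    return trials


def make_session(seed=None, n_blocks=3):
    """One session (one length condition): three blocks of 70 trials."""
    rng = random.Random(seed)
    return [make_block(rng) for _ in range(n_blocks)]


session = make_session(seed=1)
```

Each position is therefore judged 30 times per condition, which is the number of responses behind each probability in the data analysis.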

Data analysis

The probability of responding “too early” and of responding “too late” was calculated against the middle sound’s position for each experimental condition for each individual. Each probability was based on 30 responses (= 3 blocks × 10 trials). The point of subjective equality (PSE) was defined as the midpoint of the uncertainty interval (Guilford, 1954; see also Kuroda & Hasuo, 2014), which was estimated from the psychometric function of the “too early” probability and that of the “too late” probability. Technically, the cumulative normal distribution was fit to the “too late” function and to the reversed “too early” function (1 minus the probability) with a nonlinear least-squares method (Levenberg-Marquardt algorithm). The lower limit of the uncertainty interval was the x-axis value at which the curve fit to the reversed “too early” function crossed .50 probability, whereas the upper limit was the x-axis value at which the curve fit to the “too late” function crossed .50 probability. The PSE was given by dividing the sum of the lower and the upper limits by 2.
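The estimation procedure above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' analysis code: the Levenberg-Marquardt fit is replaced here by a simple grid-refinement least-squares search, and the function names, search ranges, and synthetic response probabilities are all hypothetical.

```python
import math


def norm_cdf(x, mu, sd):
    """Cumulative normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def fit_cumulative_normal(xs, ps, n_rounds=8, steps=20):
    """Least-squares (mu, sd) by grid refinement (stand-in for LM)."""
    mu_lo, mu_hi = float(min(xs)), float(max(xs))
    sd_lo, sd_hi = 1.0, float(max(xs) - min(xs))
    best_mu, best_sd = mu_lo, sd_lo
    for _ in range(n_rounds):
        mu_step = (mu_hi - mu_lo) / steps
        sd_step = (sd_hi - sd_lo) / steps
        best_err = float("inf")
        for i in range(steps + 1):
            for j in range(steps + 1):
                mu = mu_lo + i * mu_step
                sd = sd_lo + j * sd_step
                err = sum((norm_cdf(x, mu, sd) - p) ** 2 for x, p in zip(xs, ps))
                if err < best_err:
                    best_err, best_mu, best_sd = err, mu, sd
        # Shrink the search box to two grid steps around the best point.
        mu_lo, mu_hi = best_mu - 2 * mu_step, best_mu + 2 * mu_step
        sd_lo, sd_hi = max(0.1, best_sd - 2 * sd_step), best_sd + 2 * sd_step
    return best_mu, best_sd


def estimate_pse(positions, p_early, p_late):
    """Midpoint of the uncertainty interval from the two fitted curves."""
    # The fitted mean is where each cumulative normal crosses .50.
    lower, _ = fit_cumulative_normal(positions, [1.0 - p for p in p_early])
    upper, _ = fit_cumulative_normal(positions, p_late)
    return (lower + upper) / 2.0, lower, upper


# Synthetic example data (not from the study): limits at -30 and +20 ms.
positions = [-120, -80, -40, 0, 40, 80, 120]
p_early = [1.0 - norm_cdf(x, -30.0, 25.0) for x in positions]
p_late = [norm_cdf(x, 20.0, 25.0) for x in positions]
pse, lower, upper = estimate_pse(positions, p_early, p_late)
```

With real response proportions, a dedicated optimizer would be preferable; the grid search is used here only to keep the example free of external dependencies.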

Results

The mean probability of responding “too early” and of responding “too late” as a function of the middle sound’s position is shown in Fig. 2. The goodness of fit of the cumulative normal distribution to each function was generally high: For “too late,” the R² value was above .90 in 63 of 64 cases (4 patterns × 16 participants) and was .43 in one case.^{Footnote 3} For “too early,” the R² value was above .90 in 61 cases and was between .50 and .90 in three cases.

The mean PSE for each experimental condition is shown in Fig. 3a. The 95% confidence intervals (CIs) are also shown in this figure and indicate whether the mean PSE significantly differed from zero. The mean was significantly lower than zero in SSS, t(15) = 2.348, p = .033, d = .587, in SLS, t(15) = 2.929, p = .010, d = .732, and in LLL, t(15) = 3.395, p = .004, d = .849, but not in LSL, t(15) = 1.491, p = .157, d = .373. A one-way repeated-measures analysis of variance (ANOVA) showed a significant effect for the four experimental conditions, F(3, 45) = 5.447, p = .003, η_p² = .266. Multiple comparisons based on the Tukey HSD method showed that the mean was significantly lower in SLS than in LSL (p = .001, d = .650).
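As a quick arithmetic check, the effect sizes reported for the one-sample tests are consistent with the common relation d = |t| / √n for n = 16 participants. That the authors computed d this way is an inference from the numbers, not something stated in the text; the variable names below are illustrative.

```python
import math

N_PARTICIPANTS = 16

# (|t|, reported d) for the one-sample tests against zero, from the text.
reported = {
    "SSS": (2.348, 0.587),
    "SLS": (2.929, 0.732),
    "LLL": (3.395, 0.849),
    "LSL": (1.491, 0.373),
}

# Assumed relation for a one-sample test: d = |t| / sqrt(n).
recomputed = {
    cond: round(t / math.sqrt(N_PARTICIPANTS), 3)
    for cond, (t, _) in reported.items()
}
```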

An additional analysis was conducted to predict from the current results what duration conditions would enable listeners to perceive equality between interonset intervals when a short sound and a long sound were alternated repeatedly in Experiment 2. Because this repeated pattern includes both the SLS and LSL structures, as presented in Fig. 4, a possible indicator for predicting the results of Experiment 2 would be the simple average of the results of SLS and of LSL in this experiment. However, the order of the SL interval and LS interval for SLS was opposite to that for LSL. In other words, the PSE for SLS represents the SL interval minus 400 ms (the bisection point) when the SL and LS intervals were perceived as equivalent, whereas the PSE for LSL represents the LS interval minus 400 ms when the SL and LS intervals were perceived as equivalent. To make it possible to average the results of SLS and LSL, we recalculated the PSE on the scale of the relative duration of the SL to the LS interval for each sequence (see Fig. 3b, SLS and LSL). This variable is called PSE_SL-LS. For example, a PSE of 40 ms for LSL (= LS interval of 440 ms – bisection point of 400 ms) corresponds to a PSE_SL-LS of -80 ms (= SL interval of 360 ms – LS interval of 440 ms). The results of SLS and LSL were then averaged for each individual. The 95% CIs in Fig. 3b indicate that the mean result for the average of SLS and LSL, that is, (SLS + LSL)/2 in the figure, was significantly lower than zero, t(15) = 2.602, p = .020, d = .650. Therefore, it would be reasonable to posit that participants would perceive equality between the interonset intervals in the repeated pattern when the SL interval was physically shortened relative to the LS interval, rather than when these intervals were exactly equivalent.
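The rescaling from PSE to PSE_SL-LS amounts to the following conversion. The function name is hypothetical; the 400-ms bisection point and the worked example (40 ms for LSL giving -80 ms) come from the text.

```python
BISECTION_MS = 400.0


def pse_to_sl_minus_ls(pse_ms, pattern):
    """Convert a PSE (middle-onset shift) to the SL minus LS interval."""
    first = BISECTION_MS + pse_ms   # interval from initial to middle onset
    second = BISECTION_MS - pse_ms  # interval from middle to last onset
    if pattern == "SLS":
        sl, ls = first, second      # S->L comes first
    elif pattern == "LSL":
        ls, sl = first, second      # L->S comes first
    else:
        raise ValueError("pattern must be 'SLS' or 'LSL'")
    return sl - ls


example = pse_to_sl_minus_ls(40.0, "LSL")  # -80.0, matching the text
```

Equivalently, PSE_SL-LS is twice the PSE for SLS and minus twice the PSE for LSL, which is why the two conditions can be averaged only after this conversion.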

Discussion

The mean PSE (not PSE_SL-LS) was about -5 ms when three sounds were of equal length, that is, for the SSS and LLL sequences, indicating that the three sounds were perceived as presented at equal interonset intervals when the middle sound’s onset was located 5 ms before the physical bisection point. This could be regarded as a type of time order error (Eisler, Eisler, & Hellström, 2008). Because participants tended to perceive the first interval (between the initial and the middle onsets) as longer than the second interval (between the middle and the last onsets), the first interval had to be physically shortened relative to the second interval so that participants perceived these intervals as equal.

However, such bias (PSE significantly lower than zero) disappeared in the LSL sequence, and, moreover, this sequence led to significantly higher PSE than the SLS sequence. This SLS versus LSL difference was consistent with that reported by Hasuo et al. (2011) and could be attributed to delayed perception of the onset with the lengthening of the sound (p center shift; see Gordon, 1987; Morton, Marcus, & Frankish, 1976). Because the onset was perceived later for the long sound than for the short sound, the middle sound had to be presented physically earlier for SLS than for LSL so that participants perceived the three sounds as presented at equal interonset intervals.

Experiment 2

The lengthening of each sound resulted in delayed perception of the onset in Experiment 1. We examined whether the same effects of marker length would occur when a short sound and a long sound were alternated repeatedly (SLSLSL…or LSLSLS…) in Experiment 2. If the onset is perceived later for the long sound than for the short sound, participants should perceive the equality between the interonset intervals in this repeated sequence when the SL interval is physically shortened relative to the LS interval. However, as mentioned earlier, the perceived duration of the interonset intervals in the repeated sequence might be influenced by whether participants would perceive SL or LS grouping for the sequence. In other words, between-groups intervals might be perceived as longer than within-groups intervals. If individuals perceive SL grouping for the current sequence, the LS interval should be perceived as longer than the SL interval; then, participants should perceive the equality between the interonset intervals in the repeated sequence when the LS interval is physically shortened relative to the SL interval.

It might also be possible that the effects of rhythmic grouping would be working together with the delayed perception of the onset for the long sound relative to the short sound. If SL grouping occurs, for example, the effects of rhythmic grouping on perceived duration (resulting in the SL < LS interval in perception) may be canceled out by the onset delay for the long sound relative to the short sound (the SL > LS interval in perception). Then, participants might perceive the equality between the interonset intervals when the SL and LS intervals were physically equivalent.

The experiment consisted of two sessions, one for discrimination and one for grouping. In the discrimination session, we estimated the relative duration of the SL to the LS interval that made participants perceive the equality between the interonset intervals in the repeated sequence. In the grouping session, we estimated whether participants perceived the repeated sequence as segmented into SL or LS groups. The correlation between the results in these sessions was examined.

Japanese-speaking participants, as well as French-speaking participants, were recruited in this experiment. Japanese-speaking participants might have a different tendency to perceive rhythmic groupings than French-speaking participants. Iversen et al. (2008) presented stimulus sequences in which a short sound and a long sound were alternated repeatedly to American-English-speaking and Japanese-speaking participants. Most of the English-speaking participants perceived SL grouping, whereas about half the Japanese-speaking participants perceived LS grouping. Similar sequences were used by Kuroda et al. (2013) with French-speaking participants, who tended to perceive SL grouping. Hay and Diehl (2007) also indicated no difference between English-speaking and French-speaking participants. In brief, French-speaking participants tended to perceive SL grouping, while Japanese-speaking participants were almost equally split between SL and LS grouping. Although it remains debatable which factor causes the difference in the tendency of rhythmic grouping (Hay & Diehl, 2007; Iversen et al., 2008; Patel, 2008), the perceived duration of interonset intervals in the current sequence might also differ between French-speaking and Japanese-speaking participants if it is influenced by the tendency to perceive rhythmic grouping.