Temporal variability is ubiquitous across the behavioral and cognitive sciences. However, measures of temporal variability tend to focus on particular timescales in data, rather than relating variations across timescales. If they do relate timescales, such as measures of long-range correlations, the methods tend to require very long time series (e.g., more than 1,000 points). In this paper, we introduce a new analysis – the Multiscale Coefficient of Variation (MSCV) – to estimate temporal variability across multiple timescales even for short time series.

In Gaussian statistics, the variance, or its square root, the standard deviation, is the standard measure of variability. Other types of variability include local variability, global variability, and serial correlations (Torre & Balasubramaniam, 2011). Local variability is the difference between adjacent values in a time series (Low, Grabe, & Nolan, 2000; Madison et al., 2009; Torre & Balasubramaniam, 2011). Global variability is the dispersion of a probability distribution, typically quantified by the coefficient of variation (σ/μ). Serial correlation reflects how the values in a time series are related as a function of their distance from each other in time, and in particular whether nearby values tend to be more similar (persistent, positively correlated) or dissimilar (anti-persistent, negatively correlated) than chance (Bassingthwaighte, Liebovitch, & West, 1994; Slifkin & Newell, 1998). Local variability, global variability, and serial correlations are known to be non-independent of one another in certain conditions (Gilden, 2001; Marmelat, Torre, & Delignières, 2012; Torre, Balasubramaniam, & Delignières, 2010).

Serial correlations can be found in most natural time series. For example, most biological and behavioral systems exhibit long-range correlations (Goldberger et al., 2002; Hausdorff, Peng, Ladin, Wei, & Goldberger, 1995; Ramos-Fernández et al., 2004; Sims et al., 2008; West, 2006). In cognitive science, long-range correlations are found in memory processes (Maylor et al., 2001; Rhodes & Turvey, 2007), texts (Altmann, Cristadoro, & Esposti, 2012), and many other types of cognitive phenomena (see Kello et al., 2010, for review).

Long-range correlations can be expressed as scaling laws, i.e., nonlinear functions whereby one variable is related to another raised to a power, f(x) ~ x^α. The exponent, α, can be determined by plotting the variables on logged axes and estimating the slope (α) using a regression line. For temporal-based power laws, the variable of interest, f(T), is often some type of variability estimate (e.g., root mean squared error, coefficient of variation, etc.) measured as a function of timescale T. The accuracy with which scaling laws can be estimated from data depends on the length (Delignières et al., 2006) and sample rate (Wijnants, Cox, Hasselman, Bosman, & Van Orden, 2013) of the measurement series. Delignières et al. found increased biases and variability of spectral exponent estimation for time series shorter than 1,024 data points. For some types of time series, 256 data points were acceptable. However, time series shorter than 1,024 points are typically considered too short for reliable parameter estimation. This restriction is problematic for many behavioral experiments and other measurement conditions in which it is prohibitively difficult to collect more than a few dozen repeated measurements.
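To make the log-log regression concrete, the following minimal sketch estimates α as the least-squares slope of log f(T) against log T. The function name `loglog_slope` and the synthetic power-law values are illustrative assumptions, not part of any analysis reported here.

```python
import math

def loglog_slope(x, y):
    """Least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Hypothetical data: an exact power law f(T) = 3 * T^0.7
timescales = [2, 4, 8, 16, 32]
variability = [3.0 * t ** 0.7 for t in timescales]
alpha = loglog_slope(timescales, variability)  # recovers the exponent 0.7
```

With empirical data the points scatter around the line, and the reliability of the fitted slope depends on how many timescales the series can support, which is the limitation described above.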

The goal of the current paper is to introduce MSCV analysis as a way to measure patterns of variability across multiple timescales for time series far shorter than 1,024 data points. The problem we are working to solve is the estimation of patterns of variability across multiple timescales for extremely short time series. The goal is not to estimate scaling laws from data, but rather, to estimate how variability changes across a restricted range of timescales. In the following section, we provide an introduction and description of the MSCV analysis. Then we present a simulation study testing the new analysis using time series generated by ARFIMA models that span white noise, short-term and long-term correlations. In the simulation study, we systematically varied the length of the time series to investigate the sensitivity of the MSCV analysis to signal type and time series length. We will then apply the analysis to short time series of speech phrases and musical themes to show and compare their multiscale structures.

Multiscale coefficient of variation (MSCV)

MSCV analysis was developed to measure the degree to which the coefficient of variation of measured events spans multiple temporal scales. For a time series of event durations (e.g., reaction times, utterance durations, movement distances), the MSCV measures the distance between local coefficient of variation estimates within particular time windows and the overall coefficient of variation across all time samples. It should be reiterated that MSCV values cannot be used to estimate scaling laws. The MSCV analysis simply measures the patterns of variability across multiple timescales. Also, the coefficient of variation is only meaningful for ratio-scale data, so users should be aware of the scale of measurement of the data to which they apply the analysis.

The sizes of time windows T can be set by hand or, similar to scaling law analyses, varied as powers of 2 between a minimum of 2 and a maximum of L/2 − 1, where L is the number of measurements in the time series. The time series is tiled with non-overlapping windows of size T, and the coefficient of variation is computed within each window. For window size T, coefficients of variation across windows are averaged,

$$ MSCV(T)=\frac{\sigma (T)}{\mu (T)}. $$
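The windowing procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, the sample standard deviation is assumed within each window, and any incomplete trailing window is assumed to be dropped, since the text does not specify these details.

```python
import statistics

def window_sizes(n):
    """Powers of 2 from 2 up to at most n/2 - 1."""
    sizes, t = [], 2
    while t <= n / 2 - 1:
        sizes.append(t)
        t *= 2
    return sizes

def cv(values):
    """Coefficient of variation (assumes ratio-scale data with nonzero mean)."""
    return statistics.stdev(values) / statistics.mean(values)

def mscv_profile(series, sizes=None):
    """Mean within-window CV for each window size T."""
    if sizes is None:
        sizes = window_sizes(len(series))
    profile = {}
    for t in sizes:
        # Tile the series with non-overlapping windows of size t,
        # dropping an incomplete trailing window (an assumption).
        windows = [series[i:i + t] for i in range(0, len(series) - t + 1, t)]
        profile[t] = statistics.mean([cv(w) for w in windows])
    return profile

# Hypothetical duration series of 12 events
durations = [800, 820, 790, 805, 760, 840, 810, 795, 780, 830, 815, 785]
profile = mscv_profile(durations)  # window sizes 2 and 4 for a 12-point series
```

For a 12-point series the rule L/2 − 1 = 5 admits only window sizes 2 and 4, which illustrates how few timescales are available for very short series.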

The MSCV function can be plotted with window sizes T on the x-axis and corresponding MSCV values on the y-axis. The MSCV function can be quantified using a number of measures, such as the range, sum, and normalized sum of MSCV values, and the slope of the function in logarithmic coordinates (see Fig. 1).

Fig. 1 Schematic depiction of MSCV analysis. (Top panel) MSCV profile of example time series. (Bottom panel) Basic graphical description of MSCV analysis. For each bin size, coefficient of variation is computed across a sliding, nonoverlapping window and averaged. For each bin size, average coefficient of variation is computed (see bottom-right). MSCVrange = .77, MSCVsum = 6.10, MSCVnorm = .76, MSCVslope = .61

Computing the range and sum of MSCV values is straightforward, but the normalized MSCV, MSCVnorm, requires some explanation. To compute MSCVnorm, the MSCV values are summed over window sizes, divided by the global coefficient of variation (CV) for the entire time series, and normalized by the number of window sizes, N_T,

$$ MSCV_{norm}=\frac{\frac{{\displaystyle \sum_{T}MSCV(T)}}{CV}}{N_T}, $$

where the sum runs over all window sizes T.

The MSCVnorm value is not bounded by a specific range of values, but it typically ranges between 0 and 1. By normalizing the MSCV by the global coefficient of variation, MSCVnorm estimates how far the variability within bins falls below the global coefficient of variation. Time series with random structure will have coefficients of variation at most window sizes that approximate the global coefficient of variation and will therefore have an MSCVnorm estimate of approximately 1.0. Time series that have more multiscale structure (variability spanning multiple window sizes) will have an MSCVnorm estimate less than one.
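As a small worked example of the normalization, the sketch below divides each MSCV value by the global coefficient of variation and averages over the number of window sizes. The function name `mscv_norm` and the numerical values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mscv_norm(mscv_values, global_cv):
    """Sum of MSCV(T) divided by the global CV, then by the number of window sizes."""
    n_t = len(mscv_values)
    return sum(mscv_values) / global_cv / n_t

# Hypothetical MSCV values for three window sizes and a global CV of .06
example = mscv_norm([0.04, 0.05, 0.06], 0.06)  # (0.15 / 0.06) / 3
```

Here the smallest windows show less variability than the series as a whole, so the estimate falls below 1.0; if every window size matched the global coefficient of variation, the estimate would equal 1.0.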

The MSCV analysis was recently applied to an investigation of how music affects postural sway (Ross, Warlaumont, Abney, Rigoli, & Balasubramaniam, 2016). The radial sway of center of pressure measurements of postural sway and musical durations (intervals of onset and offset of sound; Coath et al., 2007, 2009) were subjected to the MSCV analysis. Ross et al. were interested in the multiscale properties of postural sway and musical durations for the purposes of assessing how the multiscale structure of postural sway couples to the multiscale structure of musical durations. Ross et al. observed that MSCVnorm estimates of radial sway and musical durations were more similar for nonmusicians than for musicians, suggesting that nonmusicians couple to the multiscale structure of music more so than musicians. Additional results suggested that the multiscale coupling occurred more for musical durations corresponding to low groove music (Janata et al., 2012).

ARFIMA simulations

The ARFIMA (Auto-Regressive, Fractionally Integrated, Moving Average) modeling method was used to simulate time series with various degrees of short-range and long-range serial correlations. ARFIMA models are extensions of the classical ARMA (Auto-Regressive, Moving Average) models. ARMA models (p,q) include two components: a pth-order AR process and a qth-order MA process. ARFIMA models (p,d,q) add a dth-order fractional differencing (FI) process (Granger & Joyeux, 1980). We are using ARFIMA models to test the MSCV analysis because ARFIMA modeling has been previously used to estimate and identify long-range dependence and fractal exponents in cognitive and behavioral phenomena (Torre, Delignières, & Lemoine, 2007; Torre, Varlet, & Marmelat, 2013; Wagenmakers, Farrell, & Ratcliff, 2004).

We created three types of time series of durations that are known to vary in statistical structure: persistent long-range correlations (LRC), positive short-range correlations (SRC), and random white noise (WN). For each condition, a pool of 50 series (length = 2,048) was generated using the ARFIMA modeling method (using the fracdiff package in R). All conditions had a mean of 800 and a coefficient of variation of ~6 %. The auto-regressive (AR) parameters for the LRC, SRC, and WN conditions were 0, 0.6, and 0, respectively. The fractional integration (FI) parameters for the LRC, SRC, and WN conditions were 0.45, 0, and 0, respectively. The moving average (MA) parameter was set to 0 for all conditions. These ARFIMA parameters generated three conditions that varied in memory decay as quantified by the autocorrelation function: LRC series exhibited power-law decay over lags suggestive of long-term statistical memory, SRC series exhibited an exponential decay over lags suggestive of short-term statistical memory, and the WN series exhibited no positive or negative autocorrelations across lags. For each time series, we estimated the MSCVnorm for the entire time series (length 2,048) and for random samples of lengths 200, 100, 50, 25, and 10. See Table 1 and Fig. 2 for results. Although the MSCV analysis can compute various estimates from the MSCV profile, our aim is to quantify properties of multiscale structure using a single-valued estimate. Therefore, in this simulation study, we chose to only use the MSCVnorm estimate.
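For readers who want to experiment with comparable signals, the sketch below generates the three conditions in plain Python. It is not the fracdiff procedure used for the simulations: it assumes a truncated moving-average expansion of the fractional integration filter, an AR(1) recursion, a burn-in period, and a linear rescaling to a mean of 800 and a coefficient of variation of 6 %, all of which are illustrative choices. The series length of 256 is used only to keep the example fast.

```python
import random
import statistics

def fi_weights(d, n):
    """First n moving-average weights of the fractional integration filter (1 - B)^(-d)."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 + d) / k)
    return w

def simulate_arfima(n, ar=0.0, d=0.0, mean=800.0, cv=0.06, burn=200, rng=None):
    """Approximate ARFIMA(1, d, 0) series rescaled to a target mean and CV."""
    rng = rng or random.Random()
    total = n + burn
    eps = [rng.gauss(0.0, 1.0) for _ in range(total)]
    w = fi_weights(d, total)
    # Fractionally integrated noise: u_t = sum_k w_k * eps_(t-k), truncated at t
    u = [sum(w[k] * eps[t - k] for k in range(t + 1)) for t in range(total)]
    x, prev = [], 0.0
    for u_t in u:
        prev = ar * prev + u_t          # AR(1) recursion
        x.append(prev)
    x = x[burn:]                         # discard burn-in (an assumption)
    m, s = statistics.mean(x), statistics.stdev(x)
    return [mean + (v - m) / s * (mean * cv) for v in x]

rng = random.Random(42)
conditions = {
    "LRC": dict(ar=0.0, d=0.45),
    "SRC": dict(ar=0.6, d=0.0),
    "WN": dict(ar=0.0, d=0.0),
}
series = {name: simulate_arfima(256, rng=rng, **p) for name, p in conditions.items()}
```

With d = 0 the filter weights reduce to a single leading 1, so the WN condition is plain Gaussian noise and the SRC condition is an ordinary AR(1) process.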

Table 1 Results from the ANOVAs and planned comparison for MSCVnorm and Hurst-AL estimates across time series lengths

Fig. 2 Results for Hurst-AL and MSCVnorm estimates as a function of signal type and time series length. Error bars represent standard error of the means

To test the performance of the MSCVnorm estimate against a common multiscale analysis, we also estimated the Hurst exponent using the Anis-Lloyd/Peters corrected rescaled range analysis (Hurst-AL) (see Weron, 2002) for the simulated time series. The rescaled range analysis was first introduced by Mandelbrot and Wallis (1969) and extends Hurst’s (1951) calculation of a self-similarity parameter, H. The R/S analysis consists of estimating the range (R) and standard deviation (S) of a subset of a time series. For example, a subset of a time series with a minimum value of 3 and maximum value of 9 will have a range of 6. If the standard deviation of the subset is S = 2, then the rescaled range for this particular subset is R/S = 3. If we increase the number (n) of observations in the subset, the slope (H) of the linear relationship between the R/S estimate and n in logarithmic coordinates will approximate H = .5 for an uncorrelated random process (e.g., white noise) and will be greater than H = .5 for persistent fractional Brownian motion. The Hurst-AL was used because it was found to improve estimation performance for short time series. To our knowledge, no researcher has used the R/S-AL analysis on extremely short time series, e.g., n = 10.
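The rescaled range of a subset can be sketched as follows. Note that this illustration uses the classical formulation, in which R is the range of the cumulative deviations from the subset mean rather than the range of the raw values used in the simplified example above, and it omits the Anis-Lloyd/Peters correction. The function names and the example values are hypothetical.

```python
import statistics

def rescaled_range(subset):
    """Classical R/S statistic for one subset of a time series."""
    mean = statistics.mean(subset)
    cumulative, running = [], 0.0
    for value in subset:
        running += value - mean            # cumulative deviation from the subset mean
        cumulative.append(running)
    r = max(cumulative) - min(cumulative)  # range of the cumulative deviations
    s = statistics.pstdev(subset)          # standard deviation of the subset
    return r / s

def mean_rs(series, n):
    """Average R/S over non-overlapping subsets of length n."""
    subsets = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
    return statistics.mean([rescaled_range(s) for s in subsets])

# Hypothetical 8-point series evaluated at two subset lengths
example = [3, 9, 5, 7, 4, 8, 6, 5]
rs_by_n = {n: mean_rs(example, n) for n in (4, 8)}
```

The Hurst exponent is then estimated as the slope of log(R/S) against log(n) across subset lengths, so a series must be long enough to supply several distinct values of n.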

The results suggest that the MSCVnorm estimates are sensitive to signal type. The first observation from the MSCVnorm estimates is that WN signals yield MSCVnorm values approximating 1.0. The second observation is that the MSCVnorm decreases from 1.0 as a function of increased multiscale structure, from WN to SRC to LRC. Hereafter, we use the term multiscale structure to refer to the observation that variation can be different, or heterogeneous, across timescales. Decreased or low multiscale structure means that variation is similar, or homogeneous, across timescales. Considering the known statistical dependencies of the three signal types generated from the ARFIMA models, these observations provide two intuitions about the MSCVnorm measure. The two intuitions depend on whether the user is interpreting the MSCVnorm measure as an absolute or relative measure.

If considering the MSCVnorm as an absolute measure, the lower and upper bounds [0.0,1.0] suggest that increasing MSCVnorm estimates approaching 1.0 correspond to signals with more homogeneity of variation across bins. Conversely, estimates decreasing from 1.0 suggest more heterogeneity of variation across bins and therefore, a signal that is more multiscale.

If considering the MSCVnorm as a relative measure, the directionality of the MSCVnorm estimates between two or more experimental conditions or partitions becomes informative. For example, if a user observed that MSCVnorm estimates for Condition A were lower relative to MSCVnorm estimates for Condition B, the user could interpret the signals from Condition A as having more heterogeneity across bins, and therefore as being more multiscale, relative to the signals in Condition B.

Another important observation is that the MSCVnorm estimates were sensitive to signal type for all time series lengths. However, the MSCVnorm estimates failed to discriminate between LRC and SRC signals for the n = 25 simulations. At n = 10, the LRC and SRC estimates switch order, but both remain distinguishable from the WN estimates. These results suggest that the MSCV analysis is sensitive to different types of time series of extremely short lengths. For extremely short time series, the MSCV analysis can discriminate between time series exhibiting white noise (close to randomness) and time series with specific temporal correlations.

The Hurst-AL measure performed similarly to the MSCV analysis at longer time series (n = 2048, n = 1024, n = 200). However, at n = 100, SRC and LRC time series are statistically indistinguishable, and at n = 50, the SRC and LRC estimates flip. At n = 10, the Hurst-AL fails to discriminate between the LRC and WN time series estimates.

Overall, both analyses perform equally well for longer time series. For shorter time series, both analyses also display flipped estimates, around n = 100 (Hurst-AL) and n = 50 (MSCVnorm). For extremely short time series, the MSCV analysis, despite a flipping of the SRC and LRC estimates, is able to discriminate between estimates from SRC and LRC time series and estimates from WN time series. The Hurst-AL estimates at n = 10 showed that WN and LRC estimates were indistinguishable. Considering the results from this simulation study, we recommend that users employ either analysis for sufficiently long time series. However, if users wish to estimate properties of multiscale variability for extremely short time series, we recommend the MSCV analysis.

Overall, the results from the ARFIMA simulations suggest that the MSCVnorm provides an intuitive estimate of the multiscale properties of a signal. In the next section, we report an application of the MSCV analysis. We chose our corpora due to the extreme length limitations of the duration series. As previously discussed, a main feature of the MSCV analysis is that it can assess the multiscale structure of extremely short time/event series. Hurst-AL estimates provide information about how the normalized range of values scales across multiple time scales. MSCVnorm estimates provide information about how the coefficient of variation at specific time scales relates to the global coefficient of variation. We have demonstrated that for extremely short time series, assessing the coefficient of variation at various time scales, normalized by the global coefficient of variation, is more sensitive than assessing the rescaled range of values across multiple time scales.

An empirical comparison of multiscale structure in language and music

We now provide an application of the MSCV to a novel comparison of the relationship between speech and music. The study of the relationship between speech and music is generally influenced by a common intuition that both are universal among human cultures (see Patel, 2010). Studying the commonalities between speech and music has led to rich empirical research programs. Here we focus on the potential common patterns of multiscale structure across speech and music.

For the study of speech and music, one focus has been on prosodic properties like melody and rhythm (Hannon, 2009; Huron & Ollen, 2003; Jusczyk & Krumhansl, 1993; Lerdahl & Jackendoff, 1983; London & Jones, 2011; Patel & Daniele, 2003; Patel, Iversen, & Rosenberg, 2006; Ramus, Nespor, & Mehler, 1999). This work was influenced by a hypothesized typology of an isochronous rhythmic organization: stress-timed and syllable-timed languages (Abercrombie, 1967; Pike, 1945). Stress-timed languages were purported to have equal intervals between stresses, and syllable-timed languages were purported to have equal intervals between syllable onsets. Although empirical research does not support this “isochrony” hypothesis, researchers have started focusing on durational patterns of vocalic and intervocalic intervals.

To measure durational variability, researchers have utilized the normalized pairwise variability index (nPVI), which provides a “local” measure of the variability of durational patterns:

$$ \mathrm{nPVI}=\frac{100}{m-1}\times {\displaystyle \sum_{k=1}^{m-1}\left|\frac{d_k-{d}_{k+1}}{\frac{d_k+{d}_{k+1}}{2}}\right|}, $$

where m is the number of intervals in a time series and d_k is the duration of the kth interval in the time series. The nPVI is a dimensionless quantity that provides a measure of variability of durational differences for pairs of intervals (i.e., bin size of 2) relative to the average duration of the pair.
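A direct translation of the nPVI into code is given below as a minimal sketch, with each successive difference normalized by the mean duration of the pair as described in the text. The function name `npvi` and the example durations are illustrative.

```python
def npvi(durations):
    """Normalized pairwise variability index of a series of durations."""
    m = len(durations)
    total = 0.0
    for d_k, d_next in zip(durations, durations[1:]):
        pair_mean = (d_k + d_next) / 2
        total += abs((d_k - d_next) / pair_mean)  # contrast relative to the pair mean
    return 100 * total / (m - 1)

# Hypothetical note durations in beats
isochronous = npvi([1.0, 1.0, 1.0, 1.0])   # no contrast between neighbours
alternating = npvi([1.0, 3.0, 1.0])        # every pair differs by its own mean
```

An isochronous series yields an nPVI of 0, and the index grows as successive durations contrast more strongly, independent of the overall tempo because each pair is normalized by its own mean.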

Grabe and Low (2002) observed that nPVI measurements of vocalic intervals were greater in stress-timed languages such as British English than in syllable-timed languages such as French. This finding is consistent with earlier work (see Nespor, 1990) suggesting that stress-timed languages exhibit more vowel reduction than syllable-timed languages. Ramus et al. (1999) observed more variability in consonantal durations for stress-timed languages and proposed that stress-timed languages have more complex syllable structure relative to syllable-timed languages.

In the vein of musical composition, Patel and Daniele (2003) observed that rhythmic patterns in French and British English musical themes resembled the rhythmic patterns of the composers’ native languages (French or English). Using the nPVI to measure local contrast variability, Patel and Daniele found that note durations in themes by British English composers had greater variability relative to note durations in themes by French composers, which corresponds to what was observed in linguistic nPVI values of speech (Ramus, 2002). Patel and Daniele’s results point to a potential common property between speech and music: prosodic patterns via rhythmic durations.

Another potential commonality is that both speech and music are organized across various levels of hierarchical order (Lerdahl & Jackendoff, 1983). In music, meter is the expected pattern of durations, usually denoted by a time signature. Meter is a recurring pattern of durations and displays structure (metric structure) across levels of variation (London, 2000). Using Patel and Daniele’s corpus of musical themes, London and Jones (2011) found differences across levels of rhythmic and metrical structure. Recent work in the study of conversational speech has shown that clustering of speech onsets is organized across time scales purported to align with levels of linguistic representation (Abney, Paxton, Dale, & Kello, 2014; Abney, Kello, & Warlaumont, 2015; see also Luque, Luque, & Lacasa, 2015). In an extension of Patel and colleagues (Patel & Daniele, 2003; Patel, Iversen, & Rosenberg, 2006), we test whether similarities between the multiscale variability of speech and music can be observed across languages that vary on the stress-timed versus syllable-timed spectrum.

In line with work suggesting that stress-timed languages exhibit more diverse and complex syllable structure (Nespor, 1990; Ramus et al., 1999), we predict that music composed and language produced by native speakers of a stress-timed language (e.g., English) will display more multiscale structure relative to native speakers of syllable-timed languages (e.g., French). To test this prediction, we constructed musical and speech corpora and submitted the musical and speech durations to the MSCV analysis to estimate MSCVnorm values. We expect to observe lower MSCVnorm for music and speech produced by native speakers of a stress-timed language, which would indicate more multiscale structure. Ramus et al. (1999) observed more consonantal variability for stress-timed languages. Therefore, we predict that, controlling for local contrast variability (nPVI), MSCVnorm estimates will be lower for stress-timed languages relative to syllable-timed ones.

Musical corpus

Our source of musical material was a subset of the corpus used in Patel and Daniele (2003). Patel and Daniele focused on collecting musical themes written by native-speaking British English and native-speaking French composers who were born in the 1800s and died in the 1900s. The chosen musical themes consisted of at least 12 notes (e.g., eighth, quarter, etc.) with no internal pauses or rests (cf. Patel & Daniele, 2003). Therefore, for each musical theme, we had a time series of note durations. To control for metrical type, in the current analyses we only included musical themes in duple time. Themes with duple time have a binary meter where the meter divides beats into two subdivisions, e.g., 2/2, 2/4, 6/8. We also excluded musical themes with isochronous durational patterns. A total of 59 English musical themes and 79 French musical themes were included in the current study (see Table 2). For our corpus, the mean number of durations per theme was 20 and the minimum was 12.

Table 2 Composers examined in this study

To investigate differences in durational variability across musical themes, we estimated CV, nPVI, and MSCVnorm for each musical theme. Because we are interested in how the estimates of the MSCV explain variance above and beyond local measures of durational contrast (e.g., nPVI), we also include analyses where nPVI was residualized out of the MSCVnorm variable.
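Residualizing one variable out of another can be illustrated with a simple linear regression, as in the sketch below. The exact model used for the residual analyses is not specified in the text, so ordinary least squares with a single predictor is an assumption, and the function name and data values are hypothetical.

```python
import statistics

def residualize(y, x):
    """Residuals of y after removing its least-squares linear dependence on x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical per-theme estimates
npvi_values = [40.0, 45.0, 50.0, 55.0, 60.0]
mscv_norm_values = [0.95, 0.90, 0.93, 0.85, 0.88]
residual_mscv_norm = residualize(mscv_norm_values, npvi_values)
```

The residuals have a mean of zero and are uncorrelated with nPVI by construction, so any group difference that remains in them cannot be attributed to a linear effect of local durational contrast.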

The nPVI measures the average degree of durational contrast (or variability) between two successive durations in a time series of discretized events. nPVI can be considered a measure of local variability. The nPVI is a single-valued estimate that is computed by (1) taking the absolute difference between two successive interval durations, (2) normalizing by the mean duration of the pair, (3) averaging across all successive pairs, and (4) multiplying by 100. Higher nPVI estimates are interpreted as reflecting larger durational contrasts relative to lower nPVI estimates. The nPVI has been used in studies of speech and music rhythm (Grabe & Low, 2002; Low, Grabe, & Nolan, 2000; Patel & Daniele, 2003; Ramus, 2002; Ross, Warlaumont, Abney, Rigoli, & Balasubramaniam, 2016).

English music and French music did not differ in estimates of CV, β = −.05, t(136) = −.30, p = .77, or nPVI, β = −.21, t(136) = −1.20, p = .23. However, English music did have lower values of MSCVnorm relative to French music, β = .61, t(136) = 3.29, p = .002. It is important to note that our nPVI results slightly diverge from Patel and Daniele (2003): although we found English music to have higher nPVI estimates relative to French music, this difference was not statistically reliable. One possible explanation for this difference is that we only included musical themes with duple meter, reducing the size of the corpus by almost 25 %.

To assess if MSCVnorm captured variance not explained by local variability, we residualized out nPVI from MSCVnorm. After controlling for nPVI, the original pattern of results held, suggesting that English music had lower MSCVnorm estimates relative to French music, β = .59, t(136) = 3.56, p < .001 (see Fig. 3).

Fig. 3 Results of the residual analyses for the musical theme durations (left) and language durations (right). Error bars represent standard error of the means

We also applied the Hurst-AL analysis to the musical corpus. The Hurst-AL analysis yielded estimates for less than 5 % of the musical corpus. Given the low percentage of Hurst-AL estimates, we did not proceed to test for differences across the musical corpus. Inspecting the event series in the musical corpus that did and did not yield Hurst-AL estimates provided more insight into the differences between the Hurst-AL analysis and the MSCV analysis. The Hurst-AL analysis could not converge on event series with multiple consecutive identical event durations (e.g., Bax, b508: .5, .5, .5, .5, 1.0, 1.5, .5, .5, .5, .5, 1.0, 1.0…). Because the Hurst-AL estimate relies on a rescaling of ranges for particular window sizes, the range will be 0 for small windows that contain only identical durations. The MSCV analysis relies on the coefficient of variation, not a metric of range, and is therefore more flexible for a diverse array of event series types.
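The difficulty with repeated durations can be seen directly in the quoted excerpt. The sketch below, which uses only the twelve durations printed above and a hypothetical helper `window_stats`, computes the range, standard deviation, and coefficient of variation in non-overlapping windows of size 2. It follows the simplified description of R as the range of raw values and is not the full Hurst-AL procedure.

```python
import statistics

# The twelve durations quoted from the Bax theme (the remainder is elided in the text)
excerpt = [0.5, 0.5, 0.5, 0.5, 1.0, 1.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]

def window_stats(series, size):
    """Per-window (range, CV, R/S); R/S is None when the window is constant."""
    rows = []
    for i in range(0, len(series) - size + 1, size):
        w = series[i:i + size]
        value_range = max(w) - min(w)
        sd = statistics.pstdev(w)
        cv = sd / statistics.mean(w)                 # defined for any nonzero mean
        rs = value_range / sd if sd > 0 else None    # 0/0 for constant windows
        rows.append((value_range, cv, rs))
    return rows

rows = window_stats(excerpt, 2)
undefined = sum(1 for _, _, rs in rows if rs is None)  # windows where R/S is 0/0
```

In this excerpt, five of the six windows of size 2 contain identical durations, so their rescaled range is undefined, whereas the coefficient of variation is simply 0 and can still be averaged across windows.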

A lower MSCVnorm estimate suggests that rhythmic durations span more bins of the MSCV profile, which is suggestive of an event series that is more multiscale. Our results suggested that, even when controlling for local variability (nPVI), English music has stronger multiscale properties relative to French music. In other words, there appears to be more heterogeneity of variance across timescales for English music relative to French music.

Linguistic corpus

The main hypothesis is that music and spoken language have similar multiscale structure as a function of the composer’s native language and the native language of speakers. We can also test whether or not specific units of language – such as vowel durations and consonant durations – display different multiscale structure. Our source of linguistic material was a subset of the BonnTempo Corpus (BTC 1.0; Dellwo et al., 2004). The BTC was originally constructed for the study of rhythmic variability of read speech across languages representing “stress-timed” (e.g., English and German) and “syllable-timed” (e.g., French and Italian) rhythmic classes. The text is a passage from a novel “Selbs Betrug” by Bernhard Schlink.

In the BonnTempo Corpus, speakers were instructed to first read the passage at their “normal reading” rate. After the first reading, speakers were instructed to read the passage again at different speech tempi. We only included read speech from native English and French speakers reading the passage at a “normal reading” rate. The BonnTempo Corpus consists of Praat™ textgrid files (Nijmegen, The Netherlands) with human-coded labeling of syllables, consonantal intervals, and vowel intervals. We created custom Praat™ scripts to extract consonantal and vowel intervals from the textgrid files. Our linguistic corpus consisted of 49 read phrases in English and 42 read phrases in French. For our corpus, the mean number of durations per phrase was 27, with a minimum of 13. For each read speech phrase, we created event series of consonantal durations and vowel durations, akin to the rhythmic durations in the musical themes. To investigate differences in durational variability across read speech, we estimated CV, nPVI, and MSCVnorm for each phrase and duration type. Similar to the analysis of musical themes, we also included an analysis where nPVI was residualized out of the MSCVnorm variable.

CV estimates were higher for English speakers (M = .51, SE = .01) relative to French speakers (M = .44, SE = .02), β = −.72, t(178) = −3.53, p < .001. CV estimates did not vary across consonantal durations (M = .46, SE = .01) and vowel durations (M = .50, SE = .02), β = .02, t(178) = .14, p = .86. We observed a Language × Duration Type interaction, β = .59, t(178) = 2.06, p = .04, suggesting that CV estimates for French consonant durations (M = .41, SE = .02) were lower than English consonant durations (M = .51, SE = .02), t = −3.53, p = .003, but estimates for French (M = .49, SE = .02) and English vowel durations (M = .51, SE = .02) were not reliably different, t = −.63, p = .92.

nPVI estimates were higher for English speakers (M = 57.06, SE = 1.51), relative to French speakers (M = 50.82, SE = 1.45), β = −.73, t(178) = −3.61, p < .001. nPVI estimates were higher for consonant durations (M = 55.44, SE = 1.54) relative to vowel durations (M = 52.91, SE = 1.50), β = −.45, t(178) = −2.32, p = .02. We observed a Language × Duration Type interaction, β = .29, t(178) = 2.11, p = .03, suggesting that nPVI estimates for French consonant durations (M = 49.70, SE = 2.10) were lower than English consonant durations (M = 60.37, SE = 1.99), t = −3.61, p = .002, but estimates for French (M = 51.93, SE = 2.02) and English vowel durations (M = 53.76, SE = 2.18) were not reliably different, t = −.62, p = .92.

MSCVnorm estimates for English speakers (M = .92, SE = .01) and French speakers (M = .93, SE = .01) were not reliably different, β = .31, t(178) = 1.60, p = .10. MSCVnorm estimates were higher for consonant durations (M = .97, SE = .01) relative to vowel durations (M = .88, SE = .01), β = −.65, t(178) = −3.54, p < .001. We did not observe a Language × Duration Type interaction, β = −.40, t(178) = −1.47, p = .14.

Finally, to control for local variability estimated by the nPVI, we residualized out the variance explained by the nPVI estimates and constructed a new model for the MSCVnorm estimates. Residual MSCVnorm estimates for English speakers (M = −.009, SE = .01) were reliably lower relative to French speakers (M = .01, SE = .02), β = .43, t(178) = 2.26, p = .02. Residual MSCVnorm estimates for vowel durations (M = −.04, SE = .01) were reliably lower relative to consonant durations (M = .04, SE = .01), β = −.59, t(178) = −3.19, p = .002. We observed a marginal Language × Duration Type interaction, β = .29, t(178) = −.50, p = .06. However, planned comparisons suggested that French consonant durations (M = .07, SE = .01) were not reliably different than English consonant durations (M = .02, SE = .01), t = 1.61, p = .38, nor were estimates for French (M = −.05, SE = .02) different from English vowel durations (M = −.04, SE = .01), t = −.47, p = .96 (see Fig. 3).

We also applied the Hurst-AL analysis to the language corpus. The Hurst-AL analysis provided estimates for 97.8 % (n = 178) of the language corpus, and therefore, we tested for differences across language and duration type. Hurst-AL estimates for English speakers (M = .45, SE = .07) and French speakers (M = .44, SE = .08) were not reliably different, β = −.12, t(174) = −.59, p = .55. Hurst-AL estimates were higher for vowel durations (M = .46, SE = .07) relative to consonant durations (M = .43, SE = .08), β = .44, t(174) = 2.16, p = .03. This result corroborates the results from the MSCVnorm estimates suggesting that vowel durations have more multiscale structure relative to consonant durations. We did not observe a Language × Duration Type interaction, β = −.19, t(174) = −.64, p = .52. Residual Hurst-AL estimates (controlling for nPVI) did not differ across language, duration type, or the Language × Duration Type interaction, all ps > .10.

Interim discussion of the application results

The results from the residual MSCVnorm estimates for musical themes suggest that the composer’s native language has an influence on the multiscale structure of his or her work. As in past studies (Patel & Daniele, 2003; see also London & Jones, 2011), we applied a quantitative measure of a proposed property of speech and music, multiscale variability, to the music of composers from stress-timed (British English) and syllable-timed (French) languages. We found that, controlling for local variability (nPVI values), English classical music had more multiscale variability, as suggested by observing lower MSCVnorm estimates, relative to French classical music. We limited our corpus of musical themes to themes in duple meter. In a re-analysis of Patel and Daniele (2003), London and Jones (2011) investigated two levels of linguistic structure and found that only themes in duple time showed the differing patterns of local variability across British English and French themes.

We observed that English-read speech had more multiscale variability, as suggested by observing lower MSCVnorm estimates, relative to French-read speech. Importantly, this observation only occurred after residualizing out variance explained by nPVI estimates. We also observed that vowel durations had more multiscale variability relative to consonant durations. Finally, we observed a marginal interaction suggesting that, for consonantal durations, English-read speech has more multiscale variability relative to French-read speech. However, subsequent analyses suggested that this was only a nominal difference. Nevertheless, we can speculate that these results relate to the idea that stress-timed languages have more complex syllables (Dauer, 1983). Ramus et al. (1999) observed that consonantal durations in stress-timed languages have more variability relative to syllable-timed languages. Again, however, these interpretations are speculative considering the lack of a reliable effect in subsequent statistical tests.

Across the results of speech and music, one observation is that the patterns of the MSCVnorm estimates were similar across the language of the composer (musical corpus) and speaker (language corpus). This observation suggests that, at least for English and French, stress-timed language structure exhibits more multiscale variability relative to syllable-timed language structure. If cultural differences do in fact influence the composition of music, perhaps this pattern suggests that the complexity of syllable structure influences the degree to which a musical theme is composed. This conjecture could be informed by future work with larger and more diverse speech and music corpora.

Discussion and conclusion

Methods for estimating patterns of variation across scales of measurement typically require long time series. In this paper, we introduced a new analysis that allows researchers to estimate patterns of variability across temporal scales using time series of limited length.

In the simulation study, we observed that the MSCV analysis was sensitive to different types of time series that varied depending on the temporal structure generated from ARFIMA models. From the MSCV profile, the user can choose from a variety of estimates that, in various ways, quantify the pattern of variability across temporal scales. We observed that the MSCVnorm estimate generally ranges from 0.0 to 1.0. In the simulation study, ARFIMA models generating white noise time series produced MSCVnorm estimates around 1.0. ARFIMA models generating long-range and short-range correlations produced MSCVnorm estimates less than 1.0. Notably, long-range correlations are known to display multiscale structure across temporal scales and were observed to have the lowest MSCVnorm estimates. As previously noted, the MSCV analysis is not meant to assess the fractality of a time or event series. Researchers interested in assessing whether or not a time series is fractal are encouraged to use previously existing methods (Eke, Hermán, Kocsis, & Kozak, 2002; Goldberger et al., 2002; Hausdorff, Peng, Ladin, Wei, & Goldberger, 1995; Holden, 2005).
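To make the contrast between white noise and long-range correlated series concrete, the following sketch generates an ARFIMA(0, d, 0) series from a truncated moving-average representation. It is an illustration under stated assumptions rather than the simulation code used in the study: the truncation length, the value d = 0.4 in the usage note, and all function names are hypothetical, and the short-range (autoregressive) case is omitted.

```python
import random


def fractional_ma_weights(d, n_weights):
    """Moving-average weights of (1 - B)^(-d).

    psi_0 = 1 and psi_k = psi_{k-1} * (k - 1 + d) / k.
    """
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 + d) / k)
    return weights


def simulate_arfima_0d0(length, d, n_weights=200, seed=None):
    """Truncated-MA approximation of ARFIMA(0, d, 0) with Gaussian noise.

    ASSUMPTION: truncating at n_weights terms is adequate for illustration;
    d = 0 reduces to white noise.
    """
    rng = random.Random(seed)
    weights = fractional_ma_weights(d, n_weights)
    noise = [rng.gauss(0.0, 1.0) for _ in range(length + n_weights - 1)]
    series = []
    for t in range(length):
        current = t + n_weights - 1  # index of the newest innovation
        series.append(sum(w * noise[current - k] for k, w in enumerate(weights)))
    return series


def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den
```

For example, `simulate_arfima_0d0(1000, 0.0, seed=1)` yields uncorrelated noise, whereas `simulate_arfima_0d0(1000, 0.4, seed=1)` yields a persistent series whose lag-1 autocorrelation is clearly positive.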

In the simulation study, we also compared the MSCV analysis with a common multiscale analysis, the rescaled range analysis (Hurst-AL). We found that the MSCV and rescaled range analyses performed equally well for longer time series. However, the MSCV analysis outperformed the rescaled range analysis for extremely short time series and for event series with diverse properties, e.g., consecutive identical durations in the musical corpus.

In the application study, we applied the MSCV analysis to a comparison between speech and music. Previous research had shown that the rhythmic properties of music and speech, as quantified by the nPVI, vary as a function of whether the composer’s native language was stress-timed (e.g., English) or syllable-timed (e.g., French; Patel & Daniele, 2003). In our application study, we investigated whether multiscale properties of speech and music, as quantified by the MSCVnorm estimate, also differed as a function of stress- and syllable-timed languages. We observed that MSCVnorm estimates for note durations of music and read speech differed across English and French. Specifically, English music and speech had lower MSCVnorm estimates relative to French music and speech. These results suggest that stress-timed languages have stronger multiscale properties relative to syllable-timed languages. Conversely, these results suggest that the variability of syllable-timed languages is more homogeneous across temporal scales. The application study provided a good example of how the MSCV analysis can differentiate between short time series.

In both the simulation study and the application study, we used the default binning parameters for each time series, [2,(L/2)-1]. However, the MATLAB scripts can be adjusted to define any range of bins as long as the minimum bin is a whole number greater than 1. The MSCV analysis can be applied to a wide range of datasets with duration- or interval-level data points. Coefficient of variation is a dimensionless number because it is independent of the unit of measurement specific to a dataset. The analysis has already been applied to measurements of postural sway (Ross et al., 2016), musical durations estimated from an auditory saliency model (Ross et al., 2016; see also Coath et al., 2007, 2009), and durations from musical themes and spoken language (current study).
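The default bin range and the unit independence of the coefficient of variation can be illustrated with a short Python sketch. This is not the MATLAB implementation referred to above. The aggregation rule in `binned_cv_profile` (the CV of the sums of adjacent, non-overlapping bins), the use of the population standard deviation, the flooring of L/2 for odd lengths, and all function names are assumptions made for illustration; the definition given in the Method section takes precedence.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sigma / mu); unit-free.

    ASSUMPTION: population standard deviation.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.pstdev(values) / mean


def default_bin_sizes(length):
    """Whole-number bin sizes from 2 to (L/2)-1, the default range in the text.

    ASSUMPTION: L/2 is floored when L is odd.
    """
    return list(range(2, length // 2))


def binned_cv_profile(series, bin_sizes=None):
    """CV across bins for each bin size (hypothetical aggregation rule)."""
    if bin_sizes is None:
        bin_sizes = default_bin_sizes(len(series))
    profile = {}
    for size in bin_sizes:
        if not isinstance(size, int) or size < 2:
            raise ValueError("bin sizes must be whole numbers greater than 1")
        # Sum adjacent, non-overlapping bins; drop an incomplete trailing bin.
        sums = [sum(series[i:i + size])
                for i in range(0, len(series) - size + 1, size)]
        profile[size] = coefficient_of_variation(sums)
    return profile
```

Because the CV is a ratio of two quantities in the same unit, expressing a series of durations in seconds or in milliseconds gives the same value from `coefficient_of_variation`, and therefore the same profile.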

The results from the simulation and application studies suggest that the MSCV analysis can discriminate between time series that vary in multiscale structure. Importantly, the results from the simulation study suggest that even short time series (e.g., lengths of 50 or 25 data points) can vary in multiscale structure and can be differentiated using the MSCV analysis. It should be noted that for extremely short time series (e.g., 25 data points), the MSCV analysis failed to discriminate between time series with specific types of temporal correlation, e.g., LRC vs. SRC. Nevertheless, even for extremely short time series, the MSCV analysis, and specifically the MSCVnorm estimate, was sensitive to whether a time series had heterogeneous structure (e.g., LRC and SRC) or homogeneous structure (e.g., white noise) across timescales. Future research should apply the MSCV analysis to a wide corpus of short and long sequences of behavioral data such as speech, human motor performance, and reaction times, and to continuous measurements of neural data such as spike trains and time-varying EEG signals.