Can squirrel monkeys learn an ABnA grammar? A re-evaluation of Ravignani et al. (2013)

Ravignani et al. (2013) habituated squirrel monkeys to sound sequences conforming to an ABnA grammar (n = 1, 2, 3), then tested their reactions to novel grammatical and non-grammatical sequences. Although they conclude that the monkeys “consistently recognized and generalized the sequence ABnA,” I argue that this conclusion is not robust. The statistical significance of the results depends on specific data analysis choices, namely dichotomization of the response variable and omission of specific data points. Additionally, there is little evidence of generalization to novel patterns (n = 4, 5), which is needed to conclude that the monkeys recognized the ABnA grammar beyond the habituation patterns. Lastly, many test sequences were perceptually similar to habituation sequences, raising the possibility that the monkeys generalized based on perceptual similarity rather than on grammaticality.


INTRODUCTION
Within a wider program of research aimed at charting the evolution of linguistic abilities, Ravignani et al. (2013) tested whether squirrel monkeys (Saimiri sciureus) can detect violations of an AB n A grammar. The grammar was instantiated using sequences of low and high pitch sounds. Grammatical sequences were composed of a variable number of sounds of similar pitch (the B's), sandwiched between two sounds similar in pitch to each other (the A's), but very different from the B's (Table 1). The study follows the general design of habituation/dishabituation studies (also known as familiarization/discrimination), used with both human infants (Eimas et al., 1971) and non-human animals (Cheney & Seyfarth, 1988; Fischer, 2006). The authors first habituated the monkeys to sequences with structure ABA, ABBA, and ABBBA, by playing these sequences for a total of 360 times over two days. I indicate these sequences as AB 1,2,3 A sequences.

Table 1 Characteristics of sounds used to assemble stimulus sequences. "Interval" refers to the difference between adjacent sounds. "JND" refers to the just-noticeable difference in the considered frequency range, i.e., the difference that is detected with 50% probability. JND data from Wienicke, Häusler & Jürgens (2001).

Following habituation, the authors conducted two tests to ascertain whether the animals would:
1. Show habituation to other grammatical sequences.
2. Show lack of habituation to non-grammatical sequences, such as BA or ABB.
Such a pattern of behavior, if found, would indicate that the monkeys could generalize the AB n A structure heard during habituation to novel sequences. In Test 1, the animals listened to grammatical and non-grammatical sequences composed of the same sounds heard during habituation. In Test 2, the role of high and low pitch tones was reversed. That is, whereas A and B had signified, respectively, low and high pitch sounds prior to Test 2, the opposite was true in Test 2. According to Ravignani et al. (2013), both Test 1 and 2 indicated that squirrel monkeys can detect the presence or absence of AB n A structure in novel sequences, including longer sequences in Test 1 (n = 4,5) and novel pitch patterns in Test 2. Here, I show that this claim rests on several details of the authors' analysis, which are necessary to obtain conventionally "significant" results (p < 0.05) in both Test 1 and Test 2. Namely, significance is typically attained only when using a dichotomized response measure and specific criteria of data selection, such as excluding responses to AB and BA sequences in Test 2. Moreover, in Test 1, there was no direct demonstration of generalization to longer sequences (n = 4,5). Lastly, generalization may have been based, at least partly, on perceptual similarity rather than grammaticality, as some test sequences were perceptually very similar to habituation sequences.
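The AB n A structure described above can be stated compactly as a pattern over A/B labels. The following Python sketch is only an illustration and is not part of the original analysis, which was carried out in R; the function names are hypothetical, and sequences are assumed to be written as strings of the letters A and B.

```python
import re

# One A, then one or more B's, then one A (AB^nA with n >= 1).
ABNA_PATTERN = re.compile(r"AB+A")


def is_grammatical(sequence: str) -> bool:
    """Return True if the whole label string has AB^nA structure."""
    return ABNA_PATTERN.fullmatch(sequence) is not None


def is_novel_length(sequence: str, habituated_n=(1, 2, 3)) -> bool:
    """Return True for grammatical sequences whose number of B's
    was not heard during habituation (n = 1, 2, 3 in the study)."""
    return is_grammatical(sequence) and sequence.count("B") not in habituated_n


if __name__ == "__main__":
    for seq in ["ABA", "ABBBBA", "BA", "ABB"]:
        print(seq, is_grammatical(seq), is_novel_length(seq))
```

Under this labeling, BA and ABB are rejected because the pattern must match the entire string, while ABBBBA is grammatical and of novel length.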

MATERIALS AND METHODS
I acquired the data posted alongside the original article (Ravignani et al., 2013) and recast them as a table with the format displayed in Table 2. All analyses were performed with R, version 3.3.3 (R Core Team, 2017). The R code and the reformatted data are available as Supplemental Information 1.

Replication of results in Ravignani et al. (2013)
To ascertain that data acquisition did not introduce errors, this section reproduces the main analyses in Ravignani et al. (2013). It also serves to summarize the original results, in order to understand how they are affected by the factors discussed later. Responding to grammatical and non-grammatical sequences in Tests 1 and 2 is displayed in Table 3. Results for Test 1 match the height of the bars in Fig. 2 of Ravignani et al. (2013). Results for non-grammatical sequences in Test 2, however, are lower than in the original figure. The reason is that the original analysis excluded responses to sequences AB and BA (see 'Omission of data from analysis' for further analysis of this point). Excluding these sequences, I obtain Table 4, whose content matches the height of the bars for Test 2 in the original Fig. 2. I also reproduced the original statistical results. The authors performed two paired t-tests, comparing for each subject the proportion of grammatical and non-grammatical sequences to which at least one response was recorded. For these tests, my analysis yields Table 5, rows 1 and 9, which match the results by Ravignani et al. (2013), except that these authors report t = 4.64 rather than t = 4.63 for Test 2.
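For readers unfamiliar with the paired t-test used here, the statistic can be computed from per-subject differences as in the minimal Python sketch below. This is not the original R code, and the proportions are invented for illustration only; they are not the monkeys' data. The p-value (which requires the t distribution) is deliberately left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return the paired t statistic and its degrees of freedom.

    x and y hold one value per subject, in the same subject order.
    Raises ZeroDivisionError if all differences are identical.
    """
    if len(x) != len(y):
        raise ValueError("samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical per-subject proportions of sequences with >= 1 response.
non_grammatical = [0.50, 0.75, 0.25, 0.50]
grammatical = [0.25, 0.25, 0.25, 0.00]

t, df = paired_t(non_grammatical, grammatical)
print(f"t({df}) = {t:.2f}")
```

Because the test operates on within-subject differences, its outcome depends directly on how each subject's proportion is computed, including which sequences enter the numerator and denominator.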
Lastly, Ravignani et al. (2013) conducted an ANOVA of the whole data set, with proportion of trials with a response as dependent variable and grammaticality and test as independent variables. The results showed a significant effect of grammaticality, which is reproduced in Table 6. Responses to sequences AB and BA from both Test 1 and Test 2 were excluded from this analysis.

Dichotomization of the dependent variable
Ravignani et al. (2013) counted the number of times monkeys turned their head toward a speaker, within 7 s from stimulus onset. Before performing the analyses reproduced above, they dichotomized this measure so that 0 head turns was "no response" and ≥1 head turns was "response." Dichotomization may have advantages:
1. It may make analysis and reporting simpler.
2. It may make observations more robust to noise. An orientation response, for example, may be interrupted by an extraneous cause (e.g., something catching the animal's attention) and then resumed. This would count as two separate responses without dichotomization, but as one response with dichotomization.
Dichotomization, however, also has disadvantages:
1. It discards information in the data, which may cause false positives and false negatives (MacCallum et al., 2002; DeCoster, Iselin & Gallucci, 2009). For example, it may be argued that an animal that orients toward a stimulus multiple times shows more surprise than an animal that orients only once, but this information is lost with dichotomization. Similarly, an animal that resumes an interrupted orientation response (see advantage 2 above) may be considered to show more interest in the stimulus than an animal that does not.
2. It may make observations more vulnerable to noise. For example, an animal may orient once toward a stimulus by chance, but orienting twice or more by chance is unlikely. If chance responses are sufficiently frequent, dichotomization may inflate the number of responses.
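The information loss listed among the disadvantages can be seen in a toy example. The Python sketch below uses invented head-turn counts (not data from the study) for two hypothetical animals that respond on the same trials but with different intensity.

```python
from statistics import mean


def dichotomize(counts):
    """Map head-turn counts to 0 (no response) or 1 (at least one response)."""
    return [1 if c >= 1 else 0 for c in counts]


# Hypothetical head-turn counts over four trials.
strong_interest = [3, 2, 0, 3]  # orients several times per trial
weak_interest = [1, 1, 0, 1]    # orients once per trial

# Difference in mean response with the raw counts.
raw_gap = mean(strong_interest) - mean(weak_interest)

# Difference in mean response after dichotomization.
dichotomized_gap = mean(dichotomize(strong_interest)) - mean(dichotomize(weak_interest))

print(raw_gap, dichotomized_gap)
```

With raw counts the two animals differ; after dichotomization they are indistinguishable. Whether the discarded difference reflects cognition or noise is exactly the open question.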
Statisticians generally advise against dichotomization (MacCallum et al., 2002; DeCoster, Iselin & Gallucci, 2009), but whether it is beneficial, neutral, or detrimental has not been studied in the case of habituation/dishabituation experiments. Here, it suffices to point out that results may differ depending on whether dichotomization is applied or not, in which case it may not be possible to reach firm conclusions. Ravignani et al. (2013) observed multiple responses in 24% of test trials. Figure 1 shows the full distribution of responses in Test 1 and Test 2. Repeating the analyses in Ravignani et al. (2013) without dichotomization yields no significant difference between grammatical and non-grammatical sequences (Table 5, rows 2 and 10). Results for the ANOVA of the whole data set also become non-significant when responses are not dichotomized (compare Tables 6 and 7).

Omission of data from analysis
Ravignani et al. (2013) excluded from the analysis of Test 2 results the responses elicited by two stimulus sequences, AB and BA. All analyses reported so far have honored this choice, but there are reasons to revisit it. The rationale for excluding responses to AB and BA was that this pair of sequences, being symmetrical with respect to the exchange of low and high pitch tones introduced in Test 2, had also been presented in Test 1. Thus, the authors argued, the monkeys could have habituated to these sequences during Test 1, confounding the outcome of Test 2. The data, however, do not show any habituation: the monkeys responded equally to AB and BA in Test 1 and Test 2 (an average of 1 head turn per presentation). This is expected from the typical finding that habituation proceeds over many trials (Bouton, 2016), while AB and BA were presented only once to each animal in Test 1.
Without excluding AB and BA, even the original data analysis yields non-significant results in Test 2, as pointed out by Ravignani et al. (2013).
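How an exclusion rule feeds into the per-subject proportions can be sketched as follows. This Python fragment is an illustration only: the trial records, the tuple layout, and the function name are hypothetical and do not reproduce the format of Table 2 or the original R code.

```python
from collections import defaultdict


def response_proportions(trials, exclude=frozenset()):
    """Proportion of trials with >= 1 head turn, per (subject, grammatical).

    Each trial is a tuple (subject, sequence, grammatical, head_turns).
    Trials whose sequence label is in `exclude` are dropped first.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for subject, sequence, grammatical, head_turns in trials:
        if sequence in exclude:
            continue
        key = (subject, grammatical)
        totals[key] += 1
        hits[key] += 1 if head_turns >= 1 else 0
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical trials for one subject (invented values).
trials = [
    ("s1", "BA", False, 0),
    ("s1", "AB", False, 0),
    ("s1", "ABB", False, 2),
    ("s1", "ABBA", True, 0),
]

with_all = response_proportions(trials)
without_ab_ba = response_proportions(trials, exclude={"AB", "BA"})
print(with_all[("s1", False)], without_ab_ba[("s1", False)])
```

In this invented case, dropping AB and BA raises the non-grammatical response proportion from one third to one, which shows why the choice of exclusion rule can decide whether a comparison reaches conventional significance.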

Generalization to AB 4,5 A sequences
Test 1 presented both sequences with the familiar AB 1,2,3 A structure (heard during habituation) and longer sequences with the novel structure AB 4,5 A. These have different roles in assessing generalization: AB 1,2,3 A sequences probe generalization with respect to sound identity, while AB 4,5 A sequences probe generalization with respect to sequence length (Ravignani et al., 2013). The latter is theoretically important to claim full mastery of the AB n A grammar (Fitch & Friederici, 2012), although it is not always tested in practice (Rey, Perruchet & Fagot, 2012; Perruchet & Rey, 2015). Ravignani et al. (2013) did include longer sequences in their tests, but they analyzed responses to these sequences together with responses to AB 1,2,3 A sequences, and thus did not directly demonstrate generalization to novel sequence length. Unfortunately, small sample size makes it hard to assess generalization to AB 1,2,3 A and AB 4,5 A sequences separately. For completeness, Table 5 reports the results of t-tests after selectively excluding from analysis either AB 1,2,3 A sequences (rows 3 and 4) or AB 4,5 A sequences (rows 5 and 6), with or without dichotomization. As with the other tests reported above, the conventional significance of these tests changes with dichotomization, yielding no firm conclusion regarding generalization to AB 1,2,3 A and AB 4,5 A sequences separately.
Test 2 suffers less from this confound because the pitch pattern of test sequences was different from that of habituation sequences. Thus, even AB 1,2,3 A sequences can be considered structurally novel. As detailed in section 'Omission of data from analysis', however, Test 2 also provides little evidence of generalization.

Contribution of perceptual similarity
Attempts to reveal generalization based on abstract patterns must control for animals' tendency to generalize based on perceptual similarity. For example, it would not be surprising to find similar responses to two ABA sequences composed of almost identical A and B sounds, because animals typically respond in the same way to stimuli that are perceptually very close to each other (Mackintosh, 1974; Ghirlanda & Enquist, 2003). Generalization based on perceptual closeness operates across many animal taxa, including mammals, birds, fish, and insects, and does not require grammatical abilities (Enquist & Ghirlanda, 2005). Ravignani et al. (2013) addressed this concern by assembling stimulus sequences out of two sets of 44 sounds each (one for the A's, one for the B's), and by selecting the A and B sounds randomly when playing each sequence. In this way, the chance of a test sequence being identical to a habituation sequence was small. Test sequences, however, may still have been perceptually close to habituation sequences. As summarized in Table 1, the 44 A sounds spanned a range of about six JNDs, and could be as little as 0.15 JNDs apart (JND: just noticeable difference, a difference that is detected with 50% probability). B sounds were about one JND apart. Data from stimulus generalization studies show that generalization can be substantial over several JNDs (Ghirlanda & Enquist, 2003). During habituation, the monkeys heard 720 A sounds and 1680 B sounds, meaning that each A and B was likely heard in all sequence positions within the habituation set of AB 1,2,3 A sequences. These experiences may have been sufficient to induce generalization based on perceptual similarity.
Ravignani et al. (2013) concluded that "Squirrel monkeys consistently recognized and generalized the sequence AB n A" and that they "are sensitive to abstract dependencies of different lengths and can generalize to new lengths and auditory parameters of the stimuli" ("dependency" here indicates that grammatical sequences were bound to have identical first and last elements). I have argued that these conclusions should be tempered because of the following circumstances:
1. The conventional significance of test results changes depending on whether the response variable is dichotomized, and we do not know whether dichotomizing is appropriate ('Dichotomization of the dependent variable').
2. Including stimuli AB and BA in the analysis of Test 2 renders results non-significant (with or without dichotomization). The rationale for not including these sequences is weak ('Omission of data from analysis').
3. The tests provided no evidence of generalization to sequences longer than the habituation sequences, which is necessary to claim recognition of the AB n A grammar ('Generalization to AB 4,5 A sequences').
4. Generalization based on perceptual similarity, rather than on pattern, may have contributed to performance in Test 1 ('Contribution of perceptual similarity').
I have reached these conclusions following the same data analysis strategy as Ravignani et al. (2013), based on the t-tests detailed in 'Replication of results in Ravignani et al. (2013)' and presented in Table 5. An alternative strategy, based on logistic regression for dichotomized responses and on Poisson regression for non-dichotomized responses, leads to the same results (not reported).
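The stimulus figures cited in 'Contribution of perceptual similarity' (44 sounds per set, an A range of about six JNDs, 720 A presentations during habituation) can be turned into rough back-of-the-envelope quantities. The Python sketch below rests on assumptions that are not stated in the source: sounds evenly spaced within the range, every sound in a sequence drawn independently and uniformly, and the 360 habituation sequences split equally among the three lengths (120 each).

```python
N_SOUNDS = 44            # sounds per set (from the text)
A_RANGE_JND = 6.0        # "about six JNDs" (from the text)
A_PRESENTATIONS = 720    # A sounds heard during habituation (from the text)
HABITUATION_PER_LENGTH = 120  # ASSUMPTION: 360 sequences split equally

# ASSUMPTION: evenly spaced sounds, so 43 intervals cover the range.
mean_spacing_jnd = A_RANGE_JND / (N_SOUNDS - 1)

# ASSUMPTION: uniform random selection of A sounds.
mean_exposures_per_a = A_PRESENTATIONS / N_SOUNDS


def p_identical(n_b, n_habituation=HABITUATION_PER_LENGTH):
    """Chance that a test sequence AB^nA is identical, sound for sound,
    to at least one habituation sequence of the same length.

    ASSUMPTION: all n_b + 2 sounds are drawn independently and uniformly.
    """
    p_single = (1 / N_SOUNDS) ** (n_b + 2)
    return 1 - (1 - p_single) ** n_habituation


print(f"mean A spacing: {mean_spacing_jnd:.2f} JND")
print(f"mean exposures per A sound: {mean_exposures_per_a:.1f}")
print(f"P(identical ABA): {p_identical(1):.4f}")
```

Under these assumptions exact repetition is indeed rare, but each A sound is heard many times and neighbouring A sounds lie well within one JND of each other, which is the sense in which test sequences can be novel yet perceptually close.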

DISCUSSION
Points 1 and 2 above are largely statistical, and indicate that the results are not very robust (by the definition of robust as "insensitive to changes in details"). To resolve these points, the best strategy would be to collect more data (although animals may eventually habituate to all test sequences), and to understand whether multiple responses are informative about the underlying cognition, in order to decide whether dichotomization is appropriate. It may also be helpful to administer more habituation trials, to increase the potential difference between novel and familiar sequences.
Points 3 and 4 are methodological, and signal the need for a more interpretable testing strategy. For example, habituation and test sequences could be designed to decrease perceptual similarity. The number of stimulus presentations could be increased to ensure sufficiently many responses to test generalization to AB 4,5 A sequences separately from generalization to AB 1,2,3 A sequences. It may also be informative to test non-grammatical sequences similar to AB 4,5 A sequences, such as B 4,5 A and AB 4,5.