Case/agreement matching: Evidence for a cognitive bias

In an artificial language experiment, participants were taught one of two artificial languages consisting of English content words and novel morphological marking. The first language had matching (ergative) alignment in both case and agreement, as attested in natural languages such as Basque, Belhare and Tsez. The other combined accusative case alignment with ergative agreement alignment, a combination which is apparently unattested amongst natural languages. There was no significant difference between the languages in the proportion of participants who showed awareness of the agreement pattern, nor in the ability of aware participants to recall case markers and inflections during training or to select the correct verb inflection in the generation post-test. However, amongst participants who remained unaware of the agreement pattern, there was a significant difference in recall of verb inflections and case markers during the exposure phase: recall was more accurate in the (attested) language with matching case and agreement alignment than in the (unattested) language in which case and agreement alignment were unmatched. We take this as evidence of a cognitive bias against the unattested non-matching alignment, reflected in implicit learning.


Introduction
Languages are free to index syntactic relations via either head or dependent marking (Nichols 1986). We refer to head marking as (verbal) agreement and to dependent marking as (nominal) case, without making any theoretical commitments as to the status of these phenomena. Interestingly, in languages with ergative case alignment, either matching ergative agreement or non-matching accusative agreement alignment is possible; where case alignment is accusative, however, agreement must be matching and ergative agreement alignment is apparently banned (a gap which has long been noted in both the typological and generative literature; see Anderson 1977; Moravcsik 1978; Corbett 2006; Woolford 2006; Bobaljik 2008). In this paper, we use an artificial language experiment to test the relative learnability of the (rare but attested) matching ergative-ergative alignment vs. the (apparently unattested) non-matching combination of accusative case and ergative agreement alignment. Our results show that amongst participants who remained unaware of the agreement pattern, recall of the attested and unattested patterns differed significantly in the training phase of our experiment. This suggests a cognitive bias against the unattested alignment.
Our article is structured as follows. In section 2 we provide a brief overview of attested variation in the domain of case and agreement, including the apparent gap in attested alignment combinations. In section 3, we provide an even briefer introduction to previous studies using artificial languages to test the status of such typological gaps. Section 4 provides the rationale and methodology for the present study, which tests implicit learning of (rare but attested) ergative case-ergative agreement vs. (apparently unattested) accusative case-ergative agreement patterns. Section 5 presents the results of the study and section 6 discusses their potential significance. Finally, in section 7, we conclude and make suggestions regarding how to further probe the status of mismatched case-agreement alignment.

Sheehan, Michelle, et al. 2018. Case/agreement matching: Evidence for a cognitive bias. Glossa: a journal of general linguistics 3(1): 92. 1-23, DOI: https://doi.org/10.5334/gjgl.413

Case and agreement alignment: Typological patterns
Languages differ in how they encode grammatical functions morphologically. Consider the patterns observed in accusative languages such as Japanese, Swahili and Spanish, in which transitive and intransitive subjects pattern alike, morphologically speaking. Japanese employs only dependent marking (case marking). Swahili lacks case but requires subjects, and in some contexts objects, to be head-marked (as prefixes) on the verb. 1 Finally, Spanish employs double marking in this domain, at least with animate specific arguments: these are introduced by the differential object marker (DOM) a (which might be considered a form of accusative case) when they function as objects but not when they are subjects, and the verb also inflects to agree with transitive and intransitive subjects: 2,3

1) Japanese
a. Makiko-ga Yoko-o mita.
   Makiko-nom Yoko-acc see
   'Makiko saw Yoko.'
b. Makiko-ga kita.
   Makiko-nom came
   'Makiko came.'

3) Spanish
a. Juan salud-ó a María.
   J greet-3sg.pst dom M
   'Juan greeted Maria.'
b. Juan lleg-ó.
   J arrive-3sg.pst
   'Juan arrived.'

All three languages can be said to instantiate the same basic alignment despite their notable differences: in each, transitive and intransitive subjects pattern alike in terms of head or dependent marking, and objects pattern differently.
A different alignment pattern is attested in ergative systems, in which the transitive object patterns with the intransitive subject (see Dixon 1994 for an overview). As with accusative systems, there are ergative systems that display only dependent marking (Dyirbal), those that display only head marking (Q'anjob'al) and those that employ both (Basque):

6) Basque; Isolate; Spain/France
a. Berri-ek (ni) haserretu n-au-te.
   news-det.pl.erg 1sg.abs anger 1sg.abs-have-3pl.erg
   'The news angered me.'
b. Joan n-a-iz.
   go 1sg.abs-pres-be
   'I have gone.'

Regardless of whether agreement follows an ergative or accusative alignment, another parameter of variation determines how many arguments the verb can or must agree with (Moravcsik 1974; …). In many accusative and ergative languages the verb agrees with only one argument: the (unmarked) nominative or absolutive:
In other languages, the verb agrees with two or more arguments, according to a case hierarchy (Bobaljik 2008). In Archi, for example, the verb indexes not only the absolutive argument but also the ergative (transitive subject) (Corbett 2006: 57, citing Kibrik 2003: 562-3). In Basque, the finite verb indexes the absolutive, ergative and dative (Hualde, Oyharçabal & Ortiz de Urbina 2003). 4 In accusative systems, the second most likely argument to be indexed on the verb after the nominative subject is the accusative object, followed by the dative (Moravcsik 1974). In all such languages we can describe the case and agreement systems as matching, as they both follow either accusative or ergative alignment, regardless of the number of arguments that actually get indexed. We will describe systems such as Basque, Tsez and Belhare as ERG-ERG, to indicate ergative case and ergative agreement alignment (again irrespective of how many arguments are indexed on the verb).
In a small number of unrelated languages we see a mismatch between alignment in case and agreement: whereas the case system is ergative, verbal agreement follows an accusative alignment. We will call these systems ERG-ACC, again regardless of the number of arguments that are indexed, as they show ergative case and accusative agreement alignment. Consider the following examples by way of illustration. In Nepali (Indo-European; Nepal), the verb indexes only one argument, the subject, regardless of whether that subject is ergative or nominative. This makes Nepali different from closely related Hindi, in which the verb only ever agrees with the nominative/absolutive argument. Note that Nepali, like many languages, has ergative case only in perfective contexts (9b).

10) Nias; Austronesian; Barrier Islands (Brown 2001: 346, 499; Brown 2005)

Note that the realis verb agrees with the transitive subjects in (10a-b) but not with the intransitive subjects in (10c-d). Verbal agreement therefore follows an ergative alignment. This can thus be considered a matched ergative alignment, as the intransitive subject patterns with transitive objects in not triggering agreement and in undergoing initial consonant mutation. It is important to note, however, that this differs from the more common ergative agreement pattern, whereby the verb agrees with S/O and not A. This may be due to the morphological realisation of case in this language: whereas ergative case has no overt realisation, absolutive case (on intransitive subjects and transitive objects) is realised via initial consonant mutation. This "marked absolutive" pattern is extremely rare cross-linguistically. The agreement pattern in realis contexts is thus rare in one respect but typical in another, in that it tracks the morphologically unmarked case (see Bobaljik 2008 for a theory of agreement based on this idea). For our purposes, what is interesting is that in irrealis contexts there is a case/agreement mismatch, as the verb agrees with both transitive and intransitive subjects. This results in an ERG-ACC pattern which essentially replicates the Nepali pattern described above, but restricts it to irrealis contexts:
Other languages which are reported to share this mismatched ERG-ACC alignment include Warlpiri (Pama-Nyungan; Australia; Legate 2002) and Walmatjari (Pama-Nyungan; Australia; Hudson 1978). Although these mismatches are all subtly different, owing to independent properties of the languages involved, they nonetheless represent robust instances of case/agreement alignment mismatch in (sometimes) unrelated languages. Crucially, the reverse mismatch is apparently not attested: as has often been noted in the literature, there seem to be no languages which display accusative case marking and ergative agreement (see Anderson 1977; Moravcsik 1978; Corbett 2006; Woolford 2006; Bobaljik 2008). This apparent gap raises an interesting question: is the ACC-ERG alignment impossible, or simply unattested? In other words, is there some reason why these particular grammatical options are never combined, or is it merely an accident, due to the relative infrequency of case/agreement mismatches? The data suggest that the gap is a necessary one, because the mismatched ERG-ACC pattern is actually not that infrequent in natural languages, as shown by Table 1, and may in fact be more frequent than matched ERG-ERG systems (at least in the samples of languages which are available).
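The four logical combinations discussed above can be made explicit with a small classifier. This is only an illustrative sketch: it assumes that a marking system (case or agreement) can be summarised by the form it assigns to the intransitive subject (S), the transitive subject (A) and the transitive object (O), and the attestation labels merely restate the typological claims made in this section. The names `Alignment`, `alignment`, `combination` and `STATUS` are hypothetical.

```python
from enum import Enum


class Alignment(Enum):
    ACC = "ACC"  # S patterns with A; O is treated differently
    ERG = "ERG"  # S patterns with O; A is treated differently


def alignment(s, a, o):
    """Classify one marking system (case or agreement) from its S, A and O forms."""
    if s == a and a != o:
        return Alignment.ACC
    if s == o and o != a:
        return Alignment.ERG
    # Tripartite (all distinct) or neutral (all alike) systems are outside this sketch.
    raise ValueError("neither accusative nor ergative alignment")


# Attestation status of each case/agreement combination, restating the text.
STATUS = {
    (Alignment.ACC, Alignment.ACC): "matching, attested",
    (Alignment.ERG, Alignment.ERG): "matching, attested but rare",
    (Alignment.ERG, Alignment.ACC): "mismatched, attested",
    (Alignment.ACC, Alignment.ERG): "mismatched, apparently unattested",
}


def combination(case_forms, agreement_forms):
    """Return a CASE-AGREEMENT label and its status; each argument is an (S, A, O) triple."""
    case = alignment(*case_forms)
    agreement = alignment(*agreement_forms)
    return f"{case.value}-{agreement.value}", STATUS[(case, agreement)]


# Basque (6): A bears -ek while S and O are unmarked; S and O agree via the
# absolutive series, A via the ergative series.
print(combination(("", "-ek", ""), ("abs", "erg", "abs")))
# ('ERG-ERG', 'matching, attested but rare')
```

A Nepali-type system, with ergative case on A but agreement with A and S only, comes out as `combination(("", "erg", ""), ("agr", "agr", "none"))`, i.e. ERG-ACC.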
Putting to one side tripartite and active case systems, and counting marked-nominative languages as accusative case languages, the combined sample from Comrie (2013) and Siewierska (2013) contains 139 languages. In 42 of these, pronominal case and agreement alignment fall together, and only 7 display a mismatch, all of them with ERG-ACC alignment of the kind discussed here. Surprisingly, though, there are only two languages in this sample with matching ERG-ERG alignment, out of a total of 23 languages with some kind of ergative alignment. This makes them only 8% of the "ERG languages" and 1% of the whole sample, as opposed to ACC-ACC languages, which make up 29% of the whole sample and 38% of the "ACC languages". By comparison, ERG-ACC languages represent 5% of the whole sample and 22% of the ERG languages.
Obviously we do not know that this sample is representative of the world's languages, and the numbers it contains are fairly small, but it does seem to support Corbett's (2006: 57) independent claim that "canonical" matching ergative case and agreement is "not particularly frequent". We are therefore left with two very infrequent alignment types in this domain, ERG-ERG and ACC-ERG, with the potentially crucial difference that ERG-ERG is attested but rare, whereas ACC-ERG is apparently unattested. 6 Given the small numbers involved, however, and the inherent sampling problems associated with typological research of this kind (see Dryer 1989; 1992), we need a new method to address whether ACC-ERG is really impossible or dispreferred, or simply unattested due to infrequency. Artificial language learning provides a potential means of distinguishing between these possibilities.

6 Patel (2006) notes that Kutchi (Indo-Aryan) is a potential counterexample (but see Coon to appear: footnote 3).

Artificial language learning: Previous research
Whatever the theoretical account of the aforementioned typological gap, it is possible that the gap will be evident even at the earliest stages of language acquisition, and hence may be detectable in an artificial language learning experiment, even with adult participants. Culbertson et al.'s (2012) research indicates that this is indeed possible, albeit for a preference for harmony in the domain of word order. Culbertson et al. examined the positioning of adjective and numeral modifiers relative to the noun. Across the world's languages there is a preference for harmony: either the adjective and the numeral both occur before the noun (27% of languages) or both occur after it (52% of languages). The disharmonic combination of numeral-noun and noun-adjective is relatively rare (17% of languages), and the other disharmonic combination, noun-numeral and adjective-noun, is extremely rare (4% of languages) (data from Dryer 2008a; b, as reported in Culbertson et al. 2012). Culbertson et al. (2012) created small artificial languages consisting of nonsense words for nouns, colour adjectives and numerals. Participants heard objects described using combinations of a noun and either a numeral or an adjective, but never both. In each language there was a majority word order (e.g. 70% numeral-noun and noun-adjective) and a minority order (in this case 30% adjective-noun and noun-numeral). After exposure to the language, participants performed a production task in which they had to describe objects, again using phrases containing a noun and either an adjective or a numeral, but never both.
At issue was whether participants' tendency to reproduce the majority word orders in the input (and hence to be likely to transmit them) would reflect the frequency of those word orders in the world's languages. This was indeed the case: participants showed a stronger tendency to reproduce harmonic word orders than disharmonic ones, even though they had the same frequency in the input. Whilst there was a particular dispreference for reproducing the virtually unattested adjective-noun and noun-numeral combination, what is most important here is that there was an overall preference for harmony in this novel language. Subsequently, using the same paradigm, Culbertson & Newport (2015) showed the same bias towards harmonic word orders in child participants (22 female, mean age = 6;11, range = 6;0-7;11), though on this occasion with less of a specific bias against the barely attested disharmonic pattern.
Hence, previous research has demonstrated a bias towards word order harmony in adults' and children's learning of artificial languages. Here we ask whether a similar bias can be experimentally demonstrated in relation to the matching of case and agreement. If it can be shown experimentally that there is an acquisition bias against ACC-ERG as compared with ERG-ERG then this can be taken as evidence that the gap which we observe is not merely accidental. Future work can then consider the learnability of the unattested ACC-ERG alignment against the attested ERG-ACC alignment to ascertain whether ACC-ERG is biased against only by virtue of being mismatched or by additional factors. The present study therefore provides an important vindication of the methodology and an important first step in our understanding of the status of the much discussed ACC-ERG gap.

Rationale of the present study
As outlined above, the present study aims to make a direct comparison of the learnability of two types of language: one in which there is ergativity in both the case and verbal agreement systems, and one in which the case system follows the accusative pattern whilst the verbal agreement system follows the ergative pattern. We shall refer to these as the "ERG-ERG" and "ACC-ERG" languages respectively, as discussed above.
In common with the experimental studies described above (Culbertson et al. 2012; Culbertson & Newport 2015) and Culbertson & Adger (2014), we tackled the learnability issue by examining the very initial stages of learning after relatively little exposure. It is important to remember that the notion of relative learnability that we are interested in relates specifically to what we will refer to as implicit learning: learning that takes place in the absence of instruction, and without forming and testing conscious hypotheses. Whether one takes a generative (Yang 2002; 2016), emergentist (Ellis 1998) or statistical (Romberg & Saffran 2010) view of the learning process, the assumption is that the relevant empirical phenomena come from situations where the language was "acquired" rather than "learned" (Krashen 1981), that is, "picked up" in a natural way rather than learnt through instruction or conscious problem solving. Learnability predictions relate primarily to implicit learning in this sense, and not to people's ability to figure out linguistic patterns through conscious problem solving.
But how do we know whether we are tapping into implicit as opposed to explicit learning processes? The assumption underlying implicit learning research is that if participants are unaware of what they have learned, they are unlikely to have been aware of the process by which it was acquired. For example, in artificial grammar learning experiments participants' grammaticality judgements can be above chance even for those test items where they claimed to have produced their response by guessing (Dienes & Scott 2005), suggesting that they have some veridical knowledge of the grammar but are not aware of it. If they are not aware of the knowledge, then it seems unlikely that it was acquired through conscious learning processes.
Turning to the present experiment, we shall first describe the languages and tasks, and then discuss which aspects of the results are likely to reflect implicit learning.
The experiment adopted the semi-artificial language learning paradigm first introduced by Williams & Kuribara (2008) to examine the acquisition of Japanese scrambling, and subsequently used by Rebuschat & Williams (2011) to examine German word order regularities (see also Grey et al. 2014). In this technique, the "language" consists of elements of an unknown syntactic system combined with native language lexis. The present paradigm differs from artificial language studies in which participants have to learn an entirely invented language, involving new lexis as well as grammatical rules (e.g. DeKeyser 1994; Friederici et al. 2002). Since the experiment targets grammatical rules rather than lexis, it is unnecessary to burden participants with learning new lexis. If newly learned words are simply linked to their translation equivalents anyway (Kroll & Stewart 1994), and if L2 words inherit their grammatical properties from the L1 where possible (Salamoura & Williams 2007; …), then there is little difference between using native language and novel forms as carriers for novel morphemes. An additional consideration is that, since learning the present rules depends upon the grammatical notions of transitivity and number, these are more likely to be computed when the cognitive system is not overly preoccupied with lexical processing.
The schema for representative sentences is shown in Table 2. Novel grammatical morphemes were introduced: ku- and pa- were case markers, ne- a locative marker, and -o and -i verbal agreement markers. The only thing that the participants were told was that -o indicated singular and -i plural. Instruction in this aspect of the grammar was necessary so that the participants could correctly identify the target of verb agreement. All of the sentences were contrived to be readily interpretable on the basis of the content words alone, and each sentence contained two nouns, one singular and one plural, so that the target of verb agreement was unambiguous. The first four examples of each language in Table 2 are transitive sentences. Note that the transitive sentences are identical in the two languages. In "ku-banker pa-accounts activated-i", ku- is associated with the agent and pa- with the theme, as can be inferred from the meaning of the sentence. The verb ends in -i, indicating that it is plural, and so it must be agreeing with the object. The word order in this example is SOV, but half the time it was OSV, as in "pa-seeds ku-peasant scattered-i". Hence the verb agreed with the first and the second noun equally often. It was also singular or plural equally often.
The second four sentences in each column of Table 2 are intransitive sentences. Here ne- is used as a locative marker in both languages to mark a non-argumental adjunct. In both languages the verb agrees with the subject, with the crucial difference that the subjects are absolutive (i.e. marked like transitive objects) in the ERG-ERG language and nominative (i.e. marked like transitive subjects) in the ACC-ERG language. Since in both languages the verb agrees with the object in transitive sentences and with the subject in intransitives, but never with the transitive subject, both display an ergative verbal agreement pattern. The only difference between the two languages therefore concerns the case marking of intransitive subjects. In the ERG-ERG language, we have "pa-girls ne-playground laughed-i". Here the case marker that is used with the object in transitive sentences, pa-, appears with the subject of intransitives; hence case marking follows an ergative pattern in this language. In contrast, in the ACC-ERG language we have "ku-girls ne-playground laughed-i". Here the case marker that is used with the subject in transitive sentences, ku-, also appears with the subject of intransitives, so case marking follows the accusative pattern in this language.
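The sentence schema described in the two preceding paragraphs can be summarised as a small generator. This is a sketch reconstructed from the examples in the text, not the procedure used to build the actual materials; in particular, it assumes that intransitive sentences also alternate between subject-first and locative-first order, and the names `Noun`, `transitive`, `intransitive`, `CASE` and `S_CASE` are hypothetical.

```python
from dataclasses import dataclass

# Prefixes shared by both languages (hypothetical table names).
CASE = {"A": "ku-", "O": "pa-", "LOC": "ne-"}
# The only difference between the languages: the case prefix on intransitive subjects (S).
S_CASE = {"ERG-ERG": "pa-", "ACC-ERG": "ku-"}


@dataclass
class Noun:
    word: str     # English noun, already in its singular or plural form
    plural: bool


def agreement(controller: Noun) -> str:
    """Verb suffix: -o for a singular controller, -i for a plural one."""
    return "-i" if controller.plural else "-o"


def transitive(subj: Noun, obj: Noun, verb: str, subject_first: bool = True) -> str:
    """Same in both languages: ku- on A, pa- on O, and the verb agrees with O."""
    agent = CASE["A"] + subj.word
    theme = CASE["O"] + obj.word
    nouns = [agent, theme] if subject_first else [theme, agent]  # SOV or OSV
    return " ".join(nouns + [verb + agreement(obj)])


def intransitive(language: str, subj: Noun, loc: Noun, verb: str,
                 subject_first: bool = True) -> str:
    """The verb agrees with S in both languages; only S's case prefix differs."""
    subject = S_CASE[language] + subj.word
    locative = CASE["LOC"] + loc.word
    nouns = [subject, locative] if subject_first else [locative, subject]
    return " ".join(nouns + [verb + agreement(subj)])


print(transitive(Noun("banker", False), Noun("accounts", True), "activated"))
# ku-banker pa-accounts activated-i
print(intransitive("ACC-ERG", Noun("girls", True), Noun("playground", False), "laughed"))
# ku-girls ne-playground laughed-i
```

Keeping the language-specific part in a single lookup table (`S_CASE`) mirrors the point made above: the two languages differ only in how intransitive subjects are case-marked.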
A second version of each language was also created and used (in a separate iteration of the experiment), in which the case marker pa- was simply removed, in line with the fact, mentioned above, that nominative/absolutive case overwhelmingly tends not to be morphologically realised. In the "no-pa" version of the ERG-ERG language, the verb consistently agreed with the noun lacking case marking. Given the tendency, also discussed above, for the verb to agree first with nominals lacking overt case morphology, we included these variants of the two languages to control for the possibility that agreement with morphologically unmarked nominals is crucial to the acquisition process. In fact, it was not, as we shall see below.
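The logic of the no-pa control can be checked mechanically. In this sketch (function and table names are hypothetical), deleting pa- leaves the agreement controller caseless in every ERG-ERG sentence, whereas in the ACC-ERG language intransitive subjects keep ku-, so there the verb does not always agree with an unmarked noun.

```python
def strip_pa(sentence: str) -> str:
    """Derive the no-pa variant of a sentence by deleting every pa- prefix."""
    return " ".join(w[len("pa-"):] if w.startswith("pa-") else w
                    for w in sentence.split())


# Case prefix borne by the agreement controller in the full ("with pa") languages:
# the verb agrees with O in transitives and with S in intransitives.
CONTROLLER_PREFIX = {
    ("ERG-ERG", "transitive"): "pa-",
    ("ERG-ERG", "intransitive"): "pa-",
    ("ACC-ERG", "transitive"): "pa-",
    ("ACC-ERG", "intransitive"): "ku-",
}


def controller_unmarked_in_no_pa(language: str) -> bool:
    """True if, once pa- is removed, the verb always agrees with a caseless noun."""
    return all(CONTROLLER_PREFIX[(language, clause)] == "pa-"
               for clause in ("transitive", "intransitive"))


print(strip_pa("pa-girls ne-playground laughed-i"))   # girls ne-playground laughed-i
print(controller_unmarked_in_no_pa("ERG-ERG"), controller_unmarked_in_no_pa("ACC-ERG"))
# True False
```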
Comparing the left and right columns of Table 2, it can be seen that the verb inflections are identically distributed in the two languages, always agreeing with the object noun in transitive sentences and the subject in intransitives. Since all of the participants were native speakers of English with no knowledge of any language with ergativity, this verb agreement pattern was equally alien to all of them. With regard to the case markers, to the extent that English displays accusative alignment in the case marking of pronouns, any influence from English should favour the ACC-ERG rather than the ERG-ERG language. 7 However, our prediction was that there would be a bias in favour of ERG-ERG and against ACC-ERG, given that the latter pattern is unattested in the languages of the world. Note that when comparing the two languages we are looking for differences in the learnability of the same forms according to the way that they pattern with other, seemingly unrelated, parts of the language. There were four groups of participants, formed by crossing language (ERG-ERG or ACC-ERG) and presence/absence of pa-. During the exposure phase, participants performed a short-term memory task on the sentences. The dependent variables were recall accuracy for the case markers and for the verb inflection. Recall accuracy would be predicted to improve over the course of the experiment as participants became used to the overall task. The question is whether verb inflection and case marker recall accuracy would improve more rapidly in ERG-ERG than in ACC-ERG. If so, this would suggest that the alignment between case and agreement in the ERG-ERG language made the grammatical morphemes easier to maintain in memory, at least in the short term. Note that such an effect could occur, at least in principle, without the participants' conscious awareness of the underlying rules. We assume that changes in short-term memory performance can reflect implicit learning of underlying structural regularities and as such constitute an "indirect" measure of learning (for examples of the same logic see Reber 1967; Karpicke & Pisoni 2004; Conway et al. 2007). An advantage in verb inflection and case marker recall is therefore predicted for the ERG-ERG language even amongst participants who are afterwards unable to report the relevant rules since, as stated above, the learnability predictions relate primarily to implicit learning.
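The design and the "improves more rapidly" prediction can be made concrete with a deliberately simple descriptive summary. This sketch assumes that recall is coded per trial as 1 (correct) or 0 (error) and summarises improvement as an ordinary least-squares slope over trial blocks; it is not necessarily the statistical analysis used in the study, and the names `GROUPS`, `accuracy_by_block` and `learning_slope` are hypothetical.

```python
from itertools import product

# The four between-participant groups: language crossed with presence/absence of pa-.
GROUPS = list(product(("ERG-ERG", "ACC-ERG"), ("with pa", "no-pa")))


def accuracy_by_block(correct: list[int], block_size: int) -> list[float]:
    """Proportion of correct recall responses (1 = correct, 0 = error) per block of trials."""
    blocks = [correct[i:i + block_size] for i in range(0, len(correct), block_size)]
    return [sum(block) / len(block) for block in blocks]


def learning_slope(accuracies: list[float]) -> float:
    """Ordinary least-squares slope of block accuracy against block index 0, 1, 2, ..."""
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two blocks to estimate a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(accuracies) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(accuracies))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Toy input only, to show the mechanics:
print(accuracy_by_block([0, 1, 1, 1], 2))   # [0.5, 1.0]
print(learning_slope([0.5, 1.0]))           # 0.5
```

Under the prediction stated above, such slopes would be expected to be steeper for unaware participants in the ERG-ERG groups than in the ACC-ERG groups.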
Whilst participants might be predicted to be more likely to become aware of the rules in the ERG-ERG language, for aware participants there would not necessarily be a difference in recall accuracy during the training phase because conscious knowledge, once it is attained, would facilitate recall equally in both languages.
The exposure phase was followed by a post-test phase involving variants of the exposure phase sentences. The participants were provided with a sentence meaning, e.g. "The bankers activated the account", which was identical to an exposure phase sentence except that the plurality of the nouns, and hence of the verb, was switched. They were presented with each word in sequence, e.g. -bankers, -account, activated-, and for each word they had to select the correct case marker or inflection. Variants of the exposure sentences were used because the test was intended to engage generation ability in the context of a task that could be presented to the participants as a long-term recall exercise. Unlike short-term recall, long-term recall of these materials would be so difficult that it is likely to encourage participants to use conscious hypotheses about the grammar to aid case marker and verb inflection selection.
In implicit learning research, generation tasks are regarded as "direct" tests of memory that tap primarily into conscious, explicit knowledge. As such, performance on a generation task can dissociate from performance on an indirect measure of implicit knowledge (such as, in the present case, short-term memory), with participants showing evidence of learning on the indirect, implicit measure but not on the direct, explicit measure (Keane et al. 1995). This could be because of interference from erroneous conscious hypotheses formed during the generation task, or because the difference in task formats prevents the expression of the implicit knowledge formed during the exposure phase.
Whatever the reason, such a dissociation between performance on indirect and direct measures is typically regarded as evidence for the existence of implicit knowledge (but see Shanks, Wilkinson & Cannon 2003, for an alternative view).

Participants
A total of 88 native speakers of English participated, none of whom had any knowledge of languages featuring ergativity. They were students in either the Faculty of English or the Faculty of Modern and Medieval Languages (including the Department of Theoretical and Applied Linguistics) at the University of Cambridge, U.K. They were assigned to one of the four groups formed by crossing language (ERG-ERG or ACC-ERG) and presence/absence of pa-.

Materials
In each language there were four types of transitive sentence and four types of intransitive sentence, consisting of the two word order permutations, each with singular and with plural agreement on the verb (see Table 2). For the ERG-ERG exposure phase materials we created five unique sentences of each transitive type and six of each intransitive type, giving a total of 44 sentences (4 × 5 + 4 × 6). There were slightly more intransitive sentences because these are the ones that distinguish between the languages. All but one of the plural nouns ended in -s (the exception was children). As mentioned above, for all of the sentences an unambiguous meaning could be constructed from the content words alone (based on the meaning of the two noun phrases). The materials for the exposure phase are listed in Appendix A. For the generation post-test there were two items of each type, giving a total of 16 test items. Each was based on one of the exposure phase items, but with the plurality of the nouns, and hence of the verb, switched. For example, the exposure phase item "pa-engines ku-mechanic repaired-i" became "pa-engine ku-mechanics repaired-o". The same materials were used for the ACC-ERG language, except that the case markers in the intransitives were altered accordingly. For the "no-pa" version of each language, "pa-" was simply removed. The materials for the generation post-test are listed in Appendix B.
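The item counts and the plurality switch used to derive post-test items can be sketched as follows. This is a hypothetical reconstruction, not the authors' script: it relies on the text's statement that plurals were formed with -s apart from children, and additionally assumes that no singular noun in the materials ends in -s. The names `toggle_noun` and `switch_number` are illustrative.

```python
# Exposure-phase counts as described: 4 transitive and 4 intransitive sentence types.
TRANSITIVE_TYPES, INTRANSITIVE_TYPES = 4, 4
EXPOSURE_ITEMS = TRANSITIVE_TYPES * 5 + INTRANSITIVE_TYPES * 6   # 20 + 24 = 44
POST_TEST_ITEMS = (TRANSITIVE_TYPES + INTRANSITIVE_TYPES) * 2    # 16


def toggle_noun(noun: str) -> str:
    """Switch singular/plural; regular -s plurals apart from child/children."""
    if noun == "children":
        return "child"
    if noun == "child":
        return "children"
    # Assumption: no singular noun in the materials ends in -s.
    return noun[:-1] if noun.endswith("s") else noun + "s"


def switch_number(sentence: str) -> str:
    """Build a post-test variant: flip the number of both nouns and hence of the verb."""
    out = []
    for token in sentence.split():
        if token.endswith("-o") or token.endswith("-i"):    # the inflected verb
            out.append(token[:-1] + ("i" if token.endswith("-o") else "o"))
        else:                                               # a noun, with or without a prefix
            prefix, hyphen, noun = token.rpartition("-")
            out.append(prefix + hyphen + toggle_noun(noun))
    return " ".join(out)


print(EXPOSURE_ITEMS, POST_TEST_ITEMS)                      # 44 16
print(switch_number("pa-engines ku-mechanic repaired-i"))  # pa-engine ku-mechanics repaired-o
```

Because every item has exactly one singular and one plural noun, flipping both nouns always flips the number of the agreement controller, so the verb suffix can simply be inverted.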

Procedure
Participant questionnaire. Before the experiment the participants filled in a consent form and a questionnaire in which they specified their field of study, the second languages they spoke, and their level in each (beginner, intermediate, upper intermediate, advanced).
Exposure phase. The experiment was run using Superlab© software. On each trial of the exposure phase a single sentence was presented. On the assumption that all learning, even implicit learning, is critically dependent on attention (see Williams 2013 for a review), participants had to make decisions on the aspects of the sentence that were critical for learning the underlying rule system: namely, whether the nouns in the sentence were singular or plural, whether the verb was marked for singular or plural, and which case markers appeared with the nouns. The participants were first told that "In this experiment you will see sentences that follow the grammar of a foreign language, call it Language X. To make it easier, English words will be used throughout, but they will have the grammatical markers and word order of Language X." They then saw an example sentence (ku-mouse pa-cheese eat-o) one word at a time. They were then told "All verbs in Language X end in either -o to mark singular, or -i to mark plural, e.g. kick-o has the singular marker and kick-i has the plural marker" (note that they were not told which noun the verb agrees with).
The procedure will be exemplified with the sentence "ku-banker pa-accounts activated-i". All stimuli were centred on the screen except where indicated, and nouns and verbs were presented as here, with hyphens between stems and case markers/inflections. Each decision stimulus disappeared only when the participant made the correct response, so feedback was provided. The sequence of events on each trial was as follows: (i) fixation cross until the participant initiated the trial by pressing the space bar; (ii) English translation of the sentence (The banker activated the accounts) until the participant pressed space to continue; (iii) first noun (ku-banker), for which the participant indicated by key press whether the noun was singular (m) or plural (z); (iv) second noun (pa-accounts), for which the participant indicated singular (m) or plural (z); (v) verb (activated-i) for 150 milliseconds; (vi) verb number decision cue consisting of "sing" and "pl" arranged laterally on the screen (decision made by pressing the corresponding m and z keys). Following this, the participant had to recall the case markers and verb inflection, the sequence of events being: (vii) the first noun was presented without its case marker (-banker) with a case marker decision cue below it (for the "with pa" languages this was "ku ne pa", and for the no-pa languages "ku ne -"), and the participant responded by pressing the corresponding z, space or m key; (viii) likewise for the second noun (-accounts) with a case marker decision cue; (ix) verb (activated-) with a verb number decision cue below it (always "-o -i"), to which the participant responded by pressing the corresponding m or z key; (x) correct sentence (ku-banker pa-accounts activated-i) for one second.
Note that the verb was initially presented for only 150 milliseconds. Since no stimulus immediately followed at the same location, it was clearly visible. The purpose was to force high levels of attention on the inflection. Also, the following verb number decision cue was randomly varied such that either singular was indicated using the "z" key and plural with the "m" key, or vice versa. This was to avoid repeating keystroke patterns across sentence types (e.g. if plural were always the z key, when the subject was in first position the second noun and the verb would always require the same response). It also forced deeper processing of whether the inflection meant singular or plural.
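The randomised response mapping described above can be sketched as follows. This is a minimal illustration, not the Superlab configuration actually used: the function names are hypothetical, and the assumption that each trial independently places "sing" on the z or m key with equal probability is ours.

```python
import random


def verb_decision_mapping(rng: random.Random) -> dict[str, str]:
    """Randomly assign the "sing" and "pl" labels to the z and m keys.

    Hypothetical sketch: each trial gets an independent 50/50 assignment,
    so the same key does not always mean plural.
    """
    labels = ["sing", "pl"]
    rng.shuffle(labels)
    # z is the left-hand key and m the right-hand key on a QWERTY keyboard.
    return {"z": labels[0], "m": labels[1]}


def verb_number(verb: str) -> str:
    """Read number off a hyphenated verb: -o is singular, -i is plural."""
    suffix = verb.rsplit("-", 1)[1]
    return {"o": "sing", "i": "pl"}[suffix]


def response_is_correct(key: str, mapping: dict[str, str], verb: str) -> bool:
    """True if the pressed key carries the label matching the verb's inflection."""
    return mapping[key] == verb_number(verb)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is reproducible
    for verb in ["activated-i", "kick-o", "eat-o"]:
        mapping = verb_decision_mapping(rng)
        correct_key = next(k for k, v in mapping.items() if v == verb_number(verb))
        print(verb, mapping, "correct key:", correct_key)
```

Because the correct key changes unpredictably from trial to trial, a participant cannot answer by repeating the previous keystroke and has to interpret the inflection each time.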
Given the complexity of the task, the different components were introduced step-wise during an initial practice phase. Participants first practised making the verb number decision on two sentences, one transitive (again with only singular nouns) and one intransitive. The correct choice was indicated and explained on the decision screen. They were then required to make the additional noun number decision on two more practice sentences. One of these was transitive with a singular and a plural noun (hence forcing verb agreement with the object), and the other was intransitive; again, the decision screens for the first of these sentences included an explanation. Finally, morpheme recall was added for a repeat of the preceding intransitive sentence and one additional intransitive sentence. The experimenter was on hand to provide additional explanation where necessary. Note that the composition of the practice items was such that verb-object agreement was forced for only one of the 5 unique practice items.
Generation post-test. The sequence of events on each trial, exemplified with the sentence "ku-vet pa-dogs cured-i", was as follows: (i) fixation cross until the participant initiated the trial by pressing the space bar, (ii) English translation of the sentence (The vet cured the dogs), (iii) first noun (-vet) with case marker decision cue ("ku ne pa"); the participant responded by pressing the corresponding z, space, or m key, (iv) second noun (-dogs) with case marker decision cue, (v) verb (cured-) with verb inflection decision cue (always "o i"); the participant responded by pressing the corresponding z or m key. Unlike in the training phase, there was no feedback: each response advanced to the next stimulus regardless of accuracy.
Post-experiment questionnaire. The experimenter asked the participants the following questions in order to ascertain their level of awareness of the relevant rules: (i) Do you have any ideas about the grammatical rules of Language X? (ii) Specifically, what rules govern whether the verb ends in -o or -i (singular, plural)? (iii) What rules govern whether a noun takes ku-, pa-, or ne-? (iv) At what point in the experiment did you become aware of these rules?

Results
For the purposes of the experiment it was critical that the participants understood that -o indicated singular and -i indicated plural, information that was provided in the instructions. During the exposure phase task, when they were presented with a verb, e.g. walked-o, they had to immediately indicate whether the inflection indicated singular or plural by pressing one of two keys. We decided to exclude participants who were less than 80% correct on this verb number decision. This resulted in the loss of 8 participants from the ERG-ERG group and 9 from the ACC-ERG group. 8
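The exclusion criterion above amounts to a filter on each participant's verb number decision accuracy during exposure. A minimal sketch follows; the Participant record and the sample data are hypothetical. Participants at exactly 80% are retained, since only those below 80% were excluded.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str
    language: str               # "ERG-ERG" or "ACC-ERG"
    verb_decisions: list[bool]  # one entry per exposure trial: True = correct


def verb_decision_accuracy(p: Participant) -> float:
    return sum(p.verb_decisions) / len(p.verb_decisions)


def apply_exclusion(participants: list[Participant], threshold: float = 0.80):
    """Keep participants at or above the threshold; exclude those below it."""
    kept = [p for p in participants if verb_decision_accuracy(p) >= threshold]
    excluded = [p for p in participants if verb_decision_accuracy(p) < threshold]
    return kept, excluded


if __name__ == "__main__":
    # Hypothetical participants with 10 trials each, for illustration only.
    sample = [
        Participant("p1", "ERG-ERG", [True] * 9 + [False]),      # 90%: kept
        Participant("p2", "ACC-ERG", [True] * 8 + [False] * 2),  # 80%: kept
        Participant("p3", "ACC-ERG", [True] * 7 + [False] * 3),  # 70%: excluded
    ]
    kept, excluded = apply_exclusion(sample)
    print("excluded per language:", Counter(p.language for p in excluded))
```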

Verbal report
With regard to verb inflection, in order to be classed as "aware" the participant had to report that the verb agreed with the object in transitives but with the subject in intransitives (even if these terms were not used). For case marking in ERG-ERG, they had to realise that the agent/patient case marker changed according to the type of sentence (it was not enough to say that ku- was agent and pa- patient, or that ku- was always agent in the no-pa case). In ACC-ERG they had to realise that ku- was an agent marker and pa- a patient marker (except in the no-pa case, where they had to report that ku- was an agent marker and either that the patient had no marker, or that ne- indicated a location-like role, or both). Overall, 32% of the participants were able to report the verb agreement pattern, and 18% were able to report the case marker pattern. There were only 3 instances where the case marker but not the verb agreement pattern was reported (all in ACC-ERG). The overall classification of "aware" versus "unaware" participants was therefore based on awareness of the verb agreement pattern. The numbers of aware and unaware participants in each language and variant are shown in Table 3. In ACC-ERG 31% of the participants were aware of the verb agreement pattern, compared with 33% in ERG-ERG. There was no significant difference between these proportions (Z test for the difference between two proportions, Z = 0.05).
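The Z test reported above compares two independent proportions. A minimal pooled-variance version is sketched below. The text does not say whether a pooled or unpooled standard error was used, so pooling is our assumption, and the counts in the example are hypothetical placeholders rather than the Table 3 figures.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled z statistic for H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical counts: aware participants / total in each language group.
    z = two_proportion_z(x1=11, n1=35, x2=12, n2=36)
    print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.2f}")
```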

Language background measures
Foreign language knowledge was quantified by scoring each language according to self-rated proficiency: beginner = 0, intermediate = 1, upper intermediate = 2, advanced = 3. The sum of these scores was then taken as a measure of foreign language knowledge. Field of study was quantified in terms of assumed linguistic sophistication: English literature = 1, modern languages = 2, linguistics = 3. There was a moderate correlation between field of study and foreign language score, r(69) = 0.264, p = 0.026. Participants in the two language groups were well matched in terms of field of study. Although the participants in ERG-ERG appeared to have slightly better foreign language knowledge, the difference between groups was not significant, t(69) = 1.321, p = 0.191.

Footnote 8. Verb decision accuracy might itself be a reflection of learning. In the unaware group there was a main effect of language on verb decision error rates, F(1, 44) = 6.10, p = 0.017, η² = 0.122, the error rate being higher in ACC-ERG (0.06, SE = 0.01) than in ERG-ERG (0.03, SE = 0.01). However, this difference was already present in Block 1 (0.11, SE = 0.03, and 0.07, SE = 0.02, for ACC-ERG and ERG-ERG respectively), although there it was not significant, F(1, 44) = 1.17, p = 0.285, η² = 0.026.

Generation post-test

Table 4 shows the mean accuracy of noun case marker (including the locative marker ne-) and verb inflection generation for the aware and unaware participants in each language, for transitive and intransitive sentences, collapsed over the pa- and no-pa versions. An analysis of variance was performed with morpheme type (case marker versus verb inflection) and transitivity as within-subjects factors and awareness, language, and the presence of pa- as between-subjects factors. 9 Eta squared (η²) was calculated as a measure of effect size. Differences from chance were evaluated using single-sample t-tests, with a chance level of 0.333 for noun case (there being 3 options) and 0.5 for verb inflection.
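The comparison against chance can be sketched as a one-sample t-test on per-participant accuracy, using the chance levels given above (one in three for case markers, one in two for verb inflections). The helper below only computes t and its degrees of freedom with the standard library; obtaining a p-value needs a t-distribution CDF (for example via scipy.stats.ttest_1samp, which is not used here). The scores in the example are hypothetical.

```python
import math
from statistics import mean, stdev

# Chance levels for the forced-choice generation decisions.
CHANCE = {"case_marker": 1 / 3, "verb_inflection": 1 / 2}


def one_sample_t(scores: list[float], mu: float) -> tuple[float, int]:
    """Return (t, df) testing whether mean(scores) differs from mu."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    se = stdev(scores) / math.sqrt(n)  # sample SD uses the n - 1 denominator
    return (mean(scores) - mu) / se, n - 1


if __name__ == "__main__":
    # Hypothetical per-participant accuracies on transitive verb inflections.
    transitive_verb_acc = [0.30, 0.45, 0.25, 0.40, 0.35, 0.50]
    t, df = one_sample_t(transitive_verb_acc, CHANCE["verb_inflection"])
    print(f"t({df}) = {t:.2f}")  # negative t: mean accuracy below chance
```

A significantly negative t on transitive verb inflections is the pattern described for the unaware participants: systematic agreement with the wrong noun, not random guessing.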
There was a main effect of awareness, F(1, 57) = 20.70, p < 0.001, η² = 0.27. The aware participants were on average 70% correct in their case marker and verb inflection choices, whereas the unaware participants were 52% correct. There was no significant main effect of language, F(1, 57) = 0.15, η² = 0.003, accuracy being 60% in ERG-ERG and 62% in ACC-ERG. There were no significant interactions involving language. However, there was a significant interaction between awareness, morpheme type, and transitivity, F(1, 57) = 7.97, p < 0.01, η² = 0.12. This was because, for unaware participants, verb inflection generation was much less accurate for transitives than for intransitives, whereas this tendency was less marked for aware participants; transitivity had no effect on case marker generation in either group. In fact, for unaware participants verb inflection accuracy was significantly below chance for transitives, reflecting a tendency to make the verb agree with the subject and hence an insensitivity to the ergative verb agreement pattern. For aware participants, accuracy was numerically above chance, but the difference only approached significance. Hence even aware participants were not able to reliably make the verb agree with the object in transitive sentences. 10

Footnote 9. Five participants were excluded from this analysis because they performed an initial version of the generation post-test in which only verb inflections in transitive sentences were tested. All of these participants received the ERG-ERG language with pa-. Three of them were classed as aware, and two as unaware.

We examined whether there was any correlation between number of foreign languages known, field of study, and overall test accuracy. Collapsing over languages, there was no correlation between total test score and number of foreign languages known, either for aware participants, r(21) = 0.27, p = 0.23, or for unaware participants, r(46) = 0.17, p = 0.23.
Nor were there any correlations with field of study, either for aware participants, r(21) = 0.16, p = 0.46, or for unaware participants, r(46) = 0.10, p = 0.53.
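The language background scoring and the correlational checks above can be sketched with the standard library. The scoring tables follow the text; the participant records are hypothetical, and the sketch uses the summed proficiency score as the foreign language measure. Correlations are reported as r(df) with df = n − 2.

```python
import math

# Scoring schemes described in the text.
PROFICIENCY = {"beginner": 0, "intermediate": 1, "upper intermediate": 2, "advanced": 3}
FIELD_OF_STUDY = {"English literature": 1, "modern languages": 2, "linguistics": 3}


def foreign_language_score(self_ratings: dict[str, str]) -> int:
    """Sum proficiency scores over every foreign language a participant lists."""
    return sum(PROFICIENCY[level] for level in self_ratings.values())


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples of size >= 3")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical participants: (self-ratings, field of study, total test score).
    participants = [
        ({"French": "advanced", "German": "beginner"}, "modern languages", 0.62),
        ({"Spanish": "intermediate"}, "English literature", 0.55),
        ({"Japanese": "upper intermediate", "French": "intermediate"}, "linguistics", 0.70),
        ({}, "English literature", 0.48),
    ]
    lang = [foreign_language_score(r) for r, _, _ in participants]
    field = [FIELD_OF_STUDY[f] for _, f, _ in participants]
    score = [s for _, _, s in participants]
    print("foreign language scores:", lang)
    print(f"r(lang, score) = {pearson_r(lang, score):.2f}, df = {len(score) - 2}")
    print(f"r(field, score) = {pearson_r(field, score):.2f}")
```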

Recall during the exposure phase
In order to measure changes in recall over the course of the exposure phase, the trials were divided into 4 equal blocks of 11 trials each. 11 The data were analysed separately for aware and unaware participants, because the generation task results revealed that reported awareness of the verb agreement pattern was associated with greater overall generation task accuracy, as well as with a significant difference in verb inflection accuracy. These results suggest that the two groups really did have different conscious knowledge of the system. Also, the post-experiment questionnaire revealed that the majority of aware participants reported becoming aware during the exposure phase. Of the 20 (out of 23) who reported relevant information, 19 said that they became aware in the exposure phase, with 5 saying that they became aware early in the exposure phase, 10 halfway through, and one late on. Only one person said they became aware during the generation task. Hence for nearly all aware participants, regardless of which language they had received, it seems likely that their recall was guided by conscious knowledge of the verb agreement pattern, reducing any difference in recall accuracy between the languages. Figure 1 shows the overall proportion of combined case marker (including ne-) and verb inflection recall errors by block, separately for aware and unaware participants. Indeed, for the aware participants error rates were similar for both languages. But a different pattern is evident for the unaware participants: whilst recall errors in ERG-ERG and ACC-ERG were at approximately the same level in Block 1, they decreased more rapidly and consistently in ERG-ERG over the remaining blocks.

Table note. Significance of difference from chance indicated as follows: * p < 0.05, ** p < 0.01, *** p < 0.001.

Because of the low error rates, the data were arcsine transformed prior to analysis. Separate ANOVAs were conducted on the data from the aware and unaware groups, with morpheme type, transitivity, and block as within-subjects factors and language and presence of pa- as between-subjects factors. When Mauchly's test showed that sphericity was violated, the degrees of freedom were adjusted using the Greenhouse-Geisser method.
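The preprocessing described above can be sketched as follows: split each participant's 44 exposure trials into four consecutive blocks of 11, compute the error proportion per block, and arcsine-transform it. The text does not say which arcsine variant was used; the common arcsin(√p) form is assumed here. The last helper shows the Greenhouse-Geisser correction, which multiplies both degrees of freedom by the estimate ε. For the language × block interaction in the unaware group (4 blocks, so 3 effect df and 3 × 44 = 132 error df), the reported F(1.96, 86.13) corresponds to ε of roughly 0.65. Estimating ε itself from the data is not shown, and all function names are illustrative.

```python
import math


def split_into_blocks(trials: list, n_blocks: int = 4) -> list[list]:
    """Split an ordered trial list into n_blocks equal, consecutive blocks."""
    size, remainder = divmod(len(trials), n_blocks)
    if remainder:
        raise ValueError("trial count is not divisible by the number of blocks")
    return [trials[i * size:(i + 1) * size] for i in range(n_blocks)]


def error_rate(block: list[bool]) -> float:
    """Proportion of recall errors in a block (True marks an error)."""
    return sum(block) / len(block)


def arcsine_transform(p: float) -> float:
    """Arcsine square-root transform, which stretches the scale near 0 and 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion")
    return math.asin(math.sqrt(p))


def greenhouse_geisser_df(epsilon: float, df_effect: float,
                          df_error: float) -> tuple[float, float]:
    """Scale both degrees of freedom by the Greenhouse-Geisser epsilon."""
    return epsilon * df_effect, epsilon * df_error


if __name__ == "__main__":
    # Hypothetical error sequence for one participant: 44 trials, True = recall error.
    errors = [i % 7 == 0 for i in range(44)]
    blocks = split_into_blocks(errors)
    print([len(b) for b in blocks])
    print([round(arcsine_transform(error_rate(b)), 3) for b in blocks])
    print(greenhouse_geisser_df(0.6525, 3, 132))
```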
For the unaware participants there was a significant main effect of language, the overall error rate being higher in ACC-ERG than in ERG-ERG, 0.17 (SE = 0.02) and 0.11 (SE = 0.02) respectively, F(1, 44) = 5.51, p = 0.023, η² = 0.11. The interaction between language and block was not significant, F(1.96, 86.13) = 1.85, p = 0.140, η² = 0.040. However, tests of within-subjects contrasts showed a significant quadratic trend in the language × block interaction, F(1, 44) = 6.86, p = 0.012, η² = 0.135, reflecting the fact that there was no difference between the languages in Block 1 but relatively large differences in the three following blocks. An analysis of the data from Blocks 2 to 4 showed that the main effect of language was significant, F(1, 44) = 9.03, p = 0.004, η² = 0.170, whereas the effect of language in Block 1 was not, F(1, 44) = 0.13, p = 0.72, η² = 0.003. 12 The overall effect of language (over all blocks) remained significant when foreign language knowledge was entered as a covariate, F(1, 43) = 4.18, p = 0.047, η² = 0.089, and when field of study was entered as a covariate, F(1, 43) = 5.174, p = 0.028, η² = 0.107. The analysis of the unaware participants' data revealed a number of other main effects and interactions, but none of them involved language. Recall error rates were higher for case markers than for verb inflections, 0.16 and 0.12 respectively, F(1, 44) = 7.72, p = 0.008, η² = 0.149. There were also more recall errors in transitive than in intransitive sentences, 0.16 and 0.12 respectively, F(1, 44) = 6.85, p = 0.012, η² = 0.13.

Footnote 12. The analysis of verb inflection recall errors alone showed a similar pattern to that in the overall error analysis. Over all four blocks there was a main effect of language (p = 0.05), no significant difference between the languages in Block 1 (p = 0.659), but a significant effect of language over Blocks 2 to 4 (p = 0.015).
The interaction between morpheme type and transitivity approached significance, F(1, 44) = 3.59, p = 0.065, η² = 0.075. Verb inflection recall showed a larger difference between transitive and intransitive sentences (error rates of 0.155 versus 0.088 respectively) than case marker recall (0.168 versus 0.150). This mirrors the difficulty, also evident in the unaware participants' generation data, of making the verb agree with the object in transitive sentences.
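The quadratic trend reported for the language × block interaction can be illustrated with orthogonal polynomial contrast weights for four equally spaced blocks. Each participant's four block error rates are reduced to a single contrast score, and the interaction contrast asks whether the mean score differs between the languages. The weights are the standard ones for four levels; the block profiles below are hypothetical, chosen only to mimic the shape described in the text.

```python
# Orthogonal polynomial weights for four equally spaced levels.
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)


def contrast_score(block_means: list[float], weights: tuple[int, ...]) -> float:
    """Weighted sum of one participant's (or one group's) block means."""
    if len(block_means) != len(weights):
        raise ValueError("need one mean per block")
    return sum(w * m for w, m in zip(weights, block_means))


if __name__ == "__main__":
    # Hypothetical group-mean error profiles over Blocks 1-4.
    erg_erg = [0.20, 0.11, 0.08, 0.06]
    acc_erg = [0.20, 0.17, 0.15, 0.14]
    for name, w in [("linear", LINEAR), ("quadratic", QUADRATIC)]:
        diff = contrast_score(acc_erg, w) - contrast_score(erg_erg, w)
        print(f"{name} contrast, ACC-ERG minus ERG-ERG: {diff:+.2f}")
```

A difference profile of the form (0, d, d, d), meaning no gap in Block 1 and a constant gap afterwards, gives a quadratic contrast of 1·0 − d − d + d = −d (and a linear contrast of 3d). A gap that opens only after the first block therefore loads on the quadratic component.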
In summary, for the unaware participants the short-term recall results differ markedly from the generation post-test results with regard to the effect of language. In generation there were no significant differences between the languages. But in short-term recall, accuracy on both case markers and verb inflections was significantly better in ERG-ERG than ACC-ERG from the second block onwards.
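The text does not say which variant of eta squared is reported, but the values are consistent with partial eta squared recovered from each F ratio and its degrees of freedom, η²p = (F × df_effect) / (F × df_effect + df_error). For example, F(1, 44) = 5.51 gives 5.51 / 49.51 ≈ 0.11, matching the reported main effect of language on recall errors. The sketch below only checks that arithmetic against values from the Results. It is not a statement of how the original analysis software computed the effect sizes.

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


if __name__ == "__main__":
    # (label, F, df_effect, df_error, reported eta squared) taken from the Results.
    reported = [
        ("awareness, generation", 20.70, 1, 57, 0.27),
        ("language, unaware recall", 5.51, 1, 44, 0.11),
        ("language x block (GG-corrected)", 1.85, 1.96, 86.13, 0.040),
        ("language, Blocks 2-4", 9.03, 1, 44, 0.170),
    ]
    for label, f, d1, d2, eta in reported:
        print(f"{label}: computed {partial_eta_squared(f, d1, d2):.3f}, reported {eta}")
```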

Discussion
There were no significant differences between the languages in terms of the proportion of participants that showed awareness of the agreement pattern, nor in the ability of either aware or unaware participants to select the correct case markers and verb inflections in the generation post-test. By these measures, then, whether case and agreement alignment matched did not influence learnability. However, effects were apparent in the ability of unaware participants to correctly recall the verb inflections and case markers during the training task: error rates were higher in the ACC-ERG than in the ERG-ERG language. This is somewhat counterintuitive, because the case marking pattern in ACC-ERG follows the English (accusative) pattern, which might be expected to make this language easier in this respect. But we assume that it is the inconsistency in the way that the case marking pattern relates to the verb agreement pattern which slightly disturbs recall performance during training. That is, whereas in ERG-ERG the verb always agrees with the pa-/null-marked noun, in ACC-ERG it agrees with the pa-/null-marked noun in transitive sentences and the ku-marked noun in intransitives. Therefore in ACC-ERG the case marker is an inconsistent cue to the verb agreement target, and we assume that it is this inconsistency within the system that disturbs recall. Of course, this consistency or inconsistency is simply a reflection of the matching or non-matching of case marking and verb agreement. Note that this effect was observed with both the pa- and no-pa versions of the languages, and so indicates a bias against a mismatch between case and agreement rather than a preference to agree with a non-case-marked nominal (see Bobaljik 2008).
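The consistency argument can be made concrete by encoding each language's case marking and verb agreement as lookup tables, then asking which marker the agreement controller carries in each clause type. The encoding is our own simplification: roles are reduced to A (transitive agent), P (transitive patient), and S (intransitive subject), the locative ne- is ignored, and the no-pa variant is modelled by replacing pa- with an empty (null) marker.

```python
# Case marking of each role in the two artificial languages (simplified).
CASE = {
    "ERG-ERG": {"A": "ku", "P": "pa", "S": "pa"},  # ergative case: S patterns with P
    "ACC-ERG": {"A": "ku", "P": "pa", "S": "ku"},  # accusative case: S patterns with A
}
# Ergative agreement in both languages: P in transitives, S in intransitives.
AGREEMENT_CONTROLLER = {"transitive": "P", "intransitive": "S"}


def controller_markers(language: str, pa_marker: str = "pa") -> set[str]:
    """Markers carried by the verb's agreement controller across clause types.

    pa_marker="" models the no-pa variant, where pa- is replaced by null marking.
    """
    markers = set()
    for role in AGREEMENT_CONTROLLER.values():
        marker = CASE[language][role]
        markers.add(pa_marker if marker == "pa" else marker)
    return markers


def is_consistent_cue(language: str, pa_marker: str = "pa") -> bool:
    """True if one case marker always identifies the agreement controller."""
    return len(controller_markers(language, pa_marker)) == 1


if __name__ == "__main__":
    for lang in CASE:
        for variant, pa in [("with pa", "pa"), ("no-pa", "")]:
            print(lang, variant, controller_markers(lang, pa), is_consistent_cue(lang, pa))
```

In both variants the controller marker is constant in ERG-ERG but alternates between pa-/null and ku- in ACC-ERG. This is the inconsistency that, on the account above, disturbs recall.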
Note that to say that superior recall in the ERG-ERG language was due to the pa-/null case marker consistently marking the verb agreement target is not to reduce the effect to simple associative or statistical learning. There was no pattern in the relationship between case markers and actual verb inflections at the level of form. This is obviously the case in the no-pa version, where there was no overt case marker, but it was also true of the pa- version, since the pa-marked noun could be either singular or plural and so did not predict -o or -i. Rather, the difference lies in the consistency with which pa-/null marking identifies the target of verbal agreement. Hence the effect must be occurring at a deeper level than simple associative learning of relationships between surface forms, in contrast with the effects obtained in most implicit (e.g., Reber 1967) and statistical (Romberg & Saffran 2010) learning research.
Given that the unaware participants' recall during training was sensitive to the consistency of the relationship between verb agreement and case marking, it may seem strange that there was no sign of this effect in the generation post-test. However, dissociations between "indirect" (in this case short-term memory) and "direct" (in this case generation) tasks have long been taken as evidence for the distinction between implicit and explicit memory (e.g., Keane et al. 1995; but see Kinder & Shanks 2003, for an alternative view). Indirect tests tap into knowledge in a way that corresponds to the encoding operations during initial learning, whereas direct tests require additional, and conscious, operations. Here, short-term recall during training involves accessing a recently formed memory trace of the entire sentence, in which both case markers and the verb inflection are present. Although the short-term memory traces were generally accurate (as indicated by the low recall error rates), the mismatch between case and agreement in ACC-ERG could have reduced the stability of the encoding of the sentences, making them slightly harder to assimilate and their recall slightly more prone to error. In other words, the matching bias influences encoding of the whole sentence in memory. In contrast, in the generation task participants must, for the first time, intentionally produce case markers and verb inflections without support from short-term memory of the entire sentence. This requires them to apply novel, conscious processing strategies to select first the case markers and then the inflection in sequence. In the case of the unaware participants, it would not be surprising if the weak bias towards matching that is evident in training were drowned out by the noise created by these conscious processes, not least because case marker generation was highly error-prone, disrupting any relationship with the verb inflection at the end of the sentence.
How do the present results relate to the actual process of language transmission, as reflected by typological facts? One concern might be that no differences between the languages were evident in the generation task, calling into question how the bias we observe could ever translate into the reported typological gap. However, generation task performance should not be equated with the naturalistic process of using implicit knowledge in production. The unaware participants were essentially being forced to produce before they were ready, that is, before they had actually acquired the relevant grammar. We assume that in naturalistic acquisition far more exposure would be required for implicit knowledge to filter through to production. But to connect to the typological predictions we do have to assume that, in principle, the biases detected here in the memory task would filter through to the ease with which the language can eventually be produced; that is, "acquired" in the normal sense. Essentially, biases in implicit learning are expected to shape language acquisition and hence language change, leading to the relevant typological gap. The aware participants, whilst more accurate in the generation task, did not show any difference between the two languages. However, their knowledge was likely to have been acquired through explicit learning involving conscious hypothesis formation and testing. As such it does not reflect the kind of natural, implicit acquisition process that is relevant to testing relative learnability and typological predictions, and essentially it tells us nothing about naturalistic language acquisition.

Conclusion
From these results we can tentatively conclude that there is a cognitive bias of some kind against the (unattested) ACC-ERG combination. Given the relative rarity of ERG case/agreement compared with ACC case/agreement, it might be that a cognitive preference for matching case and agreement alignment is sufficient to explain this gap. Note, however, that in typological terms the matching ERG-ERG alignment is also rare. In future studies, then, both (rare but attested) ERG-ERG and (unattested) ACC-ERG should be compared with (attested and more frequent) ERG-ACC. If what is at stake is simply the rarity of ERG alignment plus a preference for matching, then we should observe the same bias against (attested) ERG-ACC in implicit learning when compared with ERG-ERG. This would indicate a case where relative frequency does not correlate directly with a learning bias, but is nonetheless explained by the rarity of ERG with respect to ACC (however that is explained). If, however, there is a more specific ban on the unattested ACC-ERG combination, as has often been proposed in generative approaches, then we would expect this to be evident when the two non-matching combinations are compared: if ACC-ERG is biased against when compared with ERG-ACC, then there is more at stake than a simple preference for matching. We take up these challenges in ongoing work.