“All mimsy were the borogoves” – a discriminative learning model of morphological knowledge in pseudo-word inflection

ABSTRACT Grammatical knowledge has often been investigated in wug tests, in which participants inflect pseudo-words. It was argued that in inflecting these pseudo-words, speakers apply their knowledge of word formation. However, it remains unclear what exactly this knowledge is and how it is learned. According to one theory, the knowledge is best characterised as abstractions that specify how units are combined. Another theory maintains that it is best characterised by memory-based analogy. In both cases the knowledge is learned by association based on positive evidence alone. In this paper, we model the classification of pseudo-words to Maltese plurals using a shallow neural network trained with an error-driven learning algorithm. We demonstrate that the classifications mirror those of Maltese speakers in a wug test. Our results indicate that speakers rely on gradient knowledge of a relation between the phonetics of whole words and plural classes, which is learned in an error-driven way.


Introduction
As the first part of our title (the end of Lewis Carroll's "Jabberwocky") demonstrates, native speakers are able to identify the type of words these nonces represent and inflect the pseudo-words according to the morphological rules in their native language. Wug tests are used to exploit this ability. On the basis of the data gathered in these tests, researchers inspect morphologically complex (pseudo)words produced by speakers and draw, on the basis of their characteristics, conclusions about the nature of their morphological knowledge (Berko, 1958). This raises the question of what exactly it is that speakers know about morphology and how this knowledge is learned. This knowledge, and how it is used in wug tests, is investigated in the present study. To this end, we modelled the plural classes in the wug experiment on Maltese performed by Nieder, van de Vijver, et al. (2021a). We did so in a simple two-layer, fully connected neural network that was trained to predict plural classes by generalising relations between word forms and plural classes through error-driven learning. We tested the performance of the network by inspecting the predicted probability of plural classes in the network and how it mirrored the probability distribution of plural classes in Nieder, van de Vijver, et al. (2021a)'s participants and in a corpus of Maltese. How well the network mirrored the participants' and the corpus's probability of plural classes was tested by correlating the statistical distributions from different sources.
To anticipate our results, we find medium-sized effect correlations of probability distribution among different groups of participants and the network. We also find high correlations between the participants' plural classes and the network's performance and between the plural classes in the corpus and the network's performance. Given that the model used in the present study has been shown to adequately reflect learning and the representation of knowledge in real speakers, we conclude that the knowledge of native speakers is best characterised as consisting of generalisations that are based on whole words, and that are learned in an error-driven fashion.

Theoretical perspectives on the wug test
Ever since Berko concluded, based on her famous wug test, that English children know the morphology of their native language because they were able to inflect novel words grammatically during a production experiment, the question has been raised of what exactly it is that speakers know about morphology. Much work has been devoted to answering this question. In the following subsections, we discuss the answers that have been presented from two perspectives: abstraction-based approaches in Section 1.1.1 and approaches based on analogy in Section 1.1.2.

Abstraction-based approaches
In the generative conceptualisation of language, knowledge of language is represented in the form of abstract statements that map a form stored in memory onto its surface form. These statements are either formalised as rules (Albright & Hayes, 2003; Chomsky, 1957, 1965; Chomsky & Halle, 1968; Pinker, 1989), or as constraints (Prince & Smolensky, 2004). Rules are explicit statements that describe a context and a change of sounds applied when the context is met. An important question is how such rules can be learned. Albright and Hayes (2003) propose that they are induced by incrementally comparing word forms. In the case of learning past tense forms, the comparison is between stems and past tense forms, for example tɑk "talk" and tɑkt "talked". This pair is construed as a word-specific rule as in (1). The left part of the rule is called the change and the right part of the rule is called the context. This rule is to be read as follows: an empty category (∅) is changed to [t] in the context of the word [tɑk] if it is indexed with the morphological feature [+past].
Going through further pairs in the lexicon, the pair wɑk "walk" and wɑkt "walked" is found. This, again, is used to set up a word-specific rule (2).
These rules can be made more general by keeping the similar parts and trying to describe the differing parts as a natural class or a variable. The more general rule that results from this process is given in (3). It has replaced the initial segments of the verbs with their natural class [-syllabic].
Rules are then given a score by counting how often they apply and dividing this count by the number of words in a corpus in which the context of the rule is found. This proportion is then adjusted to ensure that rules which contain contexts that occur less often are given less weight than rules that apply in contexts that occur more often. For example, if speakers who have induced such rules encounter a pseudo-word, e.g. mɑk, and are asked to provide its past tense, they would use the general rule to create mɑkt. Albright & Hayes used this method to model the knowledge of English native speakers of the past tense of English verbs. They used the English past tense to argue that the rules that the model induces are indeed used by native speakers of English when they are asked to provide the past tense for pseudo verbs. To this end, they provided the model with stems and past tense forms, which allowed it to induce rules. The model's rules were then used to provide the past tense for pseudo verbs. The inflected pseudo-words were finally used as items in two experiments. In the first, the participants were asked to provide past tense forms for pseudo stems predicted by their model. In the second experiment participants were asked to rate the well-formedness of the past tense forms produced by their model. The correlation between the rating of regular verbs (the ones created from rules that add a regular allomorph) and the scores of the model was r = 0.714, and the correlation between the rating of irregular verbs and the scores of the model was r = 0.480. This shows that the predictions of the model correlate with the behaviour of native speakers, and, therefore, the model was interpreted to be a good model of the knowledge of native speakers about their past tense system. Similar results have been reported for other languages, e.g. Italian (Albright, 2002b), Lakhota (Albright, 2002a) and Spanish (Albright et al., 2001).
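The scoring step described above can be illustrated with a minimal Python sketch. It is not the Minimal Generalization Learner itself: the rule format (`SuffixRule`), the function name `raw_reliability` and the toy lexicon are assumptions made for illustration, and the frequency-based adjustment of the proportion is deliberately left out. Only the pairs tɑk/tɑkt and wɑk/wɑkt and the pseudo-word mɑk come from the text; the other items are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    """Hypothetical rule format: add `change` after stems ending in `context`."""
    change: str
    context: str

    def in_scope(self, stem: str) -> bool:
        # The rule's context is met when the stem ends in the context string.
        return stem.endswith(self.context)

    def apply(self, stem: str) -> str:
        return stem + self.change


def raw_reliability(rule, lexicon):
    """Hits divided by scope: how often the rule yields the attested form
    among the stems that meet its context (no adjustment applied)."""
    scope = [(stem, past) for stem, past in lexicon if rule.in_scope(stem)]
    if not scope:
        return 0.0
    hits = sum(1 for stem, past in scope if rule.apply(stem) == past)
    return hits / len(scope)


# Toy lexicon: the first two pairs come from the text, the rest are invented.
toy_lexicon = [
    ("tɑk", "tɑkt"),
    ("wɑk", "wɑkt"),
    ("zɑk", "zʊk"),   # invented exception inside the rule's scope
    ("pli", "plid"),  # invented item outside the rule's scope
]

general_rule = SuffixRule(change="t", context="ɑk")
score = raw_reliability(general_rule, toy_lexicon)
print(round(score, 3))            # 2 hits out of 3 stems in scope
print(general_rule.apply("mɑk"))  # the pseudo-word from the text
```

In this toy setting the general rule applies correctly to two of the three stems in its scope, and it produces mɑkt for the pseudo-word mɑk, as in the example above.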
Yet, despite its success, Albright & Hayes's Minimal Generalization Learner cannot serve as a general model of knowledge of morphology. The reason is that it is built upon rules that affect segments in their immediate contexts. However, there are a number of languages in which grammatical functions are expressed by prosodic changes rather than by contextual segmental changes. This is the case in the Semitic language Maltese which will be the target of the present study. In Maltese, prosodic changes such as in the noun plural inflection [fardal] "apron" [fra:dal] "aprons" (Nieder, van de Vijver, et al., 2021a) cannot be captured by linear rules.
We are by no means the first to observe this: The realisation that some processes cannot be captured by linear rules led to the development of autosegmental phonology (Goldsmith, 1979; Leben, 1973; Woo, 1969). It is difficult to adapt the Minimal Generalization Learner to incorporate the ideas from autosegmental phonology. The rules would have to be able to refer to syllable positions, and that is something rules in this form simply cannot achieve. As a remedy, the rules could make use of CV-templates, that is, sequences of consonants and vowels that words are made up of (Marantz, 1982). However, as McCarthy and Prince (1990) argue, such templates are arbitrary and need to be defined in terms of prosodic units, such as morae, syllables or feet.
In the examples from Maltese given above (sg. [fardal] pl. [fra:dal]), the first syllable of the singular far needs to be mapped onto the first syllable of the plural fra:. For this system to be general, in the sense that it can accommodate all language types, it is not clear to what extent fra: would be a basic prosodic unit, since it has a complex onset and a long vowel, and accordingly is considered to be a marked syllable.
If we try to define the preferred plural in terms of the constraint interaction of Maltese, we also run into problems. Following Optimality Theory (Prince & Smolensky, 2004), it is impossible to pin down the complex syllable fra: as optimal, because it has a complex onset and there must also be a candidate that has a simple onset. The candidate with the complex onset violates the constraint that requires segments that are contiguous in the input to be contiguous in the output, and this constraint is not violated by the candidate with the simple onset. It is therefore not clear by what general mechanism rules or constraints that map a singular onto a plural in Maltese could be induced.
Another class of rule-based models is formed by speech production models that conceive of speech production as a modular process. During this process, semantic concepts select an appropriate lemma (e.g. "walk" for a movement behaviour), which in turn activates syntactic and morphological properties of the lemma, and these, in turn, activate its phonological and phonetic properties. The Utterance Generator (Fromkin, 1971, 1988) and the Theory of Lexical Access by Levelt and colleagues (Indefrey & Levelt, 2000; Levelt et al., 1999; Roelofs & Ferreira, 2019) belong in this class. An overview of such models of speech production can be found in Tucker and Tomaschek (to appear).
Of interest for the interpretation of wug tests are, within the framework of speech production models, the processes at the morphemic stage. At this stage, the lemma selects the appropriate morphemes which encode the intended meaning, e.g. the stem for "walk" and s for "third person singular present tense". Once concatenated, they are passed to phonological encoding resulting in the phonological form, e.g. wɑks. Phonological sequences like this one activate syllable representations which drive articulation.
This perspective is implemented in computational models of speech production, such as Weaver++ (Roelofs, 1997) or the Spreading-Activation Model by Dell (1986). Within this perspective, when speakers have to inflect a new word, as is the case in a pseudoword test, it is these rule-based processes that come into play. The system selects the appropriate morphemes, concatenates them and, as a result, speakers pronounce the plural "wugs" for the noun "wug" or the past tense "wugged" for the verb "to wug". However, similarly to the rules of the previously described Minimal Generalization Learner, it is unlikely that the embedded linear rules are able to handle the morphologically non-linear patterns of Maltese.

Analogy-based approaches
In another group of models, morphological processes are represented as analogies. In analogical processes, generalisations are formed on the basis of similarities among the stored forms (Albright & Hayes, 2003, p. 4) and this knowledge is applied when a (new) form needs to be inflected. Different kinds of mechanisms have been proposed to obtain similarity for the analogy-based process.
One way to assess similarity among words is to use kNN (k-nearest neighbours) algorithms. This is, for example, the case in the Tilburg Memory-Based Learner (TiMBL; Daelemans et al., 2001), which is a computationally implemented memory-based analogical model proposed by Daelemans and Van den Bosch (2005). Similarity among forms is obtained by comparing for each position in the words how similar the sounds in that position are. Similarity can be computed in a strict way, in which a p resembles a p and nothing else, or in a looser way based on phonetic features in which a p resembles a p, but also a b and an f (Daelemans & Van den Bosch, 2005). Nieder, Tomaschek, et al. (2021) applied TiMBL to computationally model the Maltese plural formation. The model was trained on a data set of Maltese broken and sound plurals and was given the task of predicting which class out of 8 plural classes is appropriate for a given singular. Under 10-fold cross-validation TiMBL's best accuracy was 97%. The model proposed by Skousen (1989) works in a similar way, but has a different computational implementation than TiMBL. Despite the success of a memory-based model such as TiMBL in classifying Maltese nouns, as reported in Nieder, Tomaschek, et al. (2021), there are two issues with memory-based analogical models and how these capture the knowledge of the Maltese noun plural system. The first issue is that the similarity of words is assessed by comparing words in an edge-aligned fashion. The researcher decides whether the model aligns all words from the left or the right side of the word.
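The following Python sketch illustrates strict, position-wise similarity in a k-nearest-neighbour classifier and why the choice of alignment edge matters. It is a simplified illustration under our own assumptions, not TiMBL's implementation or its similarity metric: the function names, the class labels "A" and "B" and all stored forms are invented and are not Maltese data.

```python
from collections import Counter


def aligned(word, width, side):
    """Pad a word to a fixed width so that positions line up at one edge."""
    pad = "_" * (width - len(word))
    return word + pad if side == "left" else pad + word


def overlap(a, b, side="left"):
    """Strict positional similarity: count positions holding the same symbol."""
    width = max(len(a), len(b))
    pairs = zip(aligned(a, width, side), aligned(b, width, side))
    return sum(x == y for x, y in pairs)


def knn_class(query, memory, k=3, side="left"):
    """Return the majority class among the k stored forms most similar to query."""
    ranked = sorted(memory, key=lambda item: overlap(query, item[0], side),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented forms with invented class labels, for illustration only.
toy_memory = [
    ("tamo", "A"), ("tami", "A"), ("tor", "A"),
    ("umal", "B"), ("dmal", "B"), ("ikal", "B"),
]

left_choice = knn_class("tamal", toy_memory, k=3, side="left")
right_choice = knn_class("tamal", toy_memory, k=3, side="right")
print(left_choice, right_choice)
```

With these toy items, the same query receives class "A" under left alignment and class "B" under right alignment, which shows how much the outcome can depend on a decision that is made by the researcher.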
Yet it is doubtful that this is a cognitively valid approach, since speakers have nobody to tell them where the comparison between words has to start, or where it has to finish. Unfortunately, phonological similarity among words without pre-specifying where to start the comparison is currently beyond the reach of memory-based analogical models.
Another issue is that this kind of analogical model captures similarity by finding the largest groups of forms that are most phonologically similar to a form for which a generalisation needs to be made (Daelemans & Van den Bosch, 2005; Keuleers et al., 2007). While mathematically very powerful, learning only on the basis of similarity disregards that human learning is best modelled as an error-driven, discriminative process in which the goal is to minimise prediction errors (Ramscar et al., 2010, 2013a). This goal is accomplished by taking into account occurrences and non-occurrences between events, as well as the established learning history. As Ramscar and Yarlett (2007) and Ramscar et al. (2010, 2013b) have demonstrated, this process can be modelled by using an error-driven learning rule such as the one proposed by Rescorla and Wagner (1972).
To give a short example of how discrimination and error-driven learning work, consider the German verb forms schreibe "(I) write, am writing" and schreiben "(we, they) write, are writing". For the sake of simplicity, we will use syllables as cognitively plausible units in this example, indicated by a period. Upon hearing the syllables in schrei.be the learner increases the weights on the association between the syllables of the verb and its meaning, "(I) write, am writing". Upon hearing schrei.ben, the weights of schrei and ben to its meaning "(we, they) write, are writing" are increased, and the weight of schrei to the meaning of "(I) write, am writing" is decreased. This is because on the basis of previous experience, the syllable schrei predicts "(I) write, am writing", but this prediction is wrong for the verb form schrei.ben (Ramscar et al., 2010; Rescorla, 1988). Due to this process, and because syllable structure and morphological structure are not isomorphic, morphology can emerge from discriminative learning, even though there is no morphological structure specified in the input.
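The schreibe/schreiben example can be traced in a minimal Python sketch of an error-driven update in the spirit of Rescorla and Wagner (1972). Several simplifications are assumptions of this sketch: a single learning rate stands in for the separate salience and learning-rate parameters, the maximum associative strength is set to 1, and the outcome labels `I_write` and `they_write` are hypothetical shorthands for the two meanings.

```python
from collections import defaultdict


def rw_update(weights, cues, present_outcomes, all_outcomes, rate=0.1, lam=1.0):
    """One error-driven learning step: every present cue is adjusted by the
    prediction error for every outcome, present or absent."""
    for outcome in all_outcomes:
        predicted = sum(weights[(cue, outcome)] for cue in cues)
        target = lam if outcome in present_outcomes else 0.0
        delta = rate * (target - predicted)
        for cue in cues:
            weights[(cue, outcome)] += delta


weights = defaultdict(float)
outcomes = ["I_write", "they_write"]  # hypothetical labels for the two meanings

# The learner first hears schrei.be several times.
for _ in range(5):
    rw_update(weights, ["schrei", "be"], {"I_write"}, outcomes)
before = weights[("schrei", "I_write")]

# Then the learner hears schrei.ben once.
rw_update(weights, ["schrei", "ben"], {"they_write"}, outcomes)
after = weights[("schrei", "I_write")]

print(before > after)                       # schrei loses support for "I write"
print(weights[("schrei", "they_write")])    # and gains support for "they write"
```

After the schrei.ben event, the weight from schrei to the first meaning has dropped, because schrei wrongly predicted it, while the weights from schrei and ben to the second meaning have grown.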
These studies applied the error-driven learning rule to predict human behaviour. Rumelhart and McClelland (1986), by contrast, trained a simple recurrent neural network using the error-driven learning rule to predict the formation of English past tense verbs on the basis of present tense verbs. During learning, analogies are formed through the creation of associations between the phonetic similarity of word forms, on the one hand, and their grammatical class, on the other hand. After training, the model is asked to produce a past tense form of present tense words it has not seen yet. It does so by assessing the phonetic similarity of the new word to words it has been trained with, and producing past tense forms that closely resemble forms in phonetically similar word forms from the training.
In the present study, we will follow Rumelhart and McClelland (1986)'s example and employ an error-driven learning algorithm to form analogies. The algorithm is embedded in the Naive Discriminative Learner (NDL; Arppe et al., 2018), which implements a simple two-layer input-output neural network.
In addition to using TiMBL, Nieder, Tomaschek, et al. (2021) have also demonstrated that NDL provides a useful tool in predicting Maltese plural classes. They report high mean accuracies for models that include information about plural word forms for their plural class predictions (88.7% and 80.7% respectively).
In the present study, we used two types of model structures to predict plural type for wug words. Given that NDL presents a simple two-layer network without any hidden layers, in the first approach we tested how this simple network performs. However, from a cognitive perspective, it is possible that cognitive processes during a wug test do not directly generate the inflected pseudo-word. Instead, they may actually take a detour: first, the (orthographic/acoustic) cues of the pseudo-word activate a plural form of a real word. By analogy to the real plural word, the pseudo-word is then inflected. We tested this hypothesis in a setup with two networks.
By modelling the results of a wug test for a language with an intricate semi-productive plural system, we bring the error-driven learning algorithm to bear on an even wider range of phenomena. This brings us to the aims and scope of the present study.

The present study
In the present study, we used NDL to model the results by Nieder, van de Vijver, et al. (2021a), who studied how native speakers of Maltese pluralise words and pseudowords in a wug test. By comparing the model's predictions with the behaviour of human participants, we address how the morphology of a language is represented by its speakers and how this knowledge is applied when inflecting novel words. In doing so, we move beyond a computational approach centred on classification accuracies such as presented in Nieder, Tomaschek, et al. (2021), and focus on the similarities of plural class predictions between NDL and human speakers. The result of our modelling approach is a new perspective on morphological knowledge for languages with linear and prosodic (non-linear) morphological processes, not in the form of rules, but as a result of analogy acquired through discriminative learning.
In the next section, we will describe the phonological and morphological characteristics of Maltese, before focussing on Nieder, van de Vijver, et al.'s (2021a) wug experiment. After describing the details of our modelling approach, we present our modelling results. We conclude the paper with a discussion in which we relate the present findings to theories of morphological knowledge.

Maltese
Having discussed the theoretical background of the present study, we next present the material and experimental background for the current computational study.

Broken and sound plurals
Maltese is a non-concatenative Semitic language that developed from a Maghrebi Arabic variety and is highly influenced by several concatenative languages (e.g. English, Sicilian, Italian) due to the colonial history of the Maltese islands. As a consequence, the Maltese noun plural system shows an opposition of concatenative sound plurals, e.g. annimal-annimali "animals", vs. non-concatenative broken plurals, e.g. qattus-qtates "cats".
To express a sound plural, Maltese native speakers have to select one of 12 different suffixes (Nieder, van de Vijver, et al., 2021a). In the case of broken plurals the choice is based on 11 different broken plural patterns (Nieder, van de Vijver, et al., 2021a; Schembri, 2012). Table 1 taken from Nieder, Tomaschek, et al. (2021) displays all Maltese sound plural suffixes and broken plural patterns. Although, as Nieder, van de Vijver, et al. (2021a) have reported in their production study, there is no default pattern that is used more frequently than is warranted based on the frequency of the pattern in the lexicon, some sound and broken plurals showed a limited productivity in the experiment similar to their limited productivity in the lexicon (see corpus and experiment columns in Table 1). Moreover, citing Camilleri (2013), Nieder, van de Vijver, et al. (2021b) report that recent loan words are integrated into the lexicon using sound plural suffixes only, such as kejk-kejkijiet/kejkijiet "cakes". As a result, the Maltese plural system should be characterised as semi-productive. Nevertheless, the variety clearly raises the question about the grammatical knowledge that emerges from this kind of system and thus, what kind of information native speakers use to inflect novel word forms.

Production study on Maltese plurals
The current study intends to predict the results of the wug experiment on Maltese performed by Nieder, van de Vijver, et al. (2021a) in which they tested the mapping of Maltese singulars onto plurals. Concretely, they investigated to what degree the phonological form of the (pseudo) singular determined the plural class of the plural form (broken/sound).
Eighty adult native speakers of Maltese were asked to pluralise frequent and infrequent existing Maltese singulars as well as phonotactically legal pseudo singulars. Pseudo-words were constructed by changing the consonants, the vowels, or both consonants and vowels of existing words. This procedure maintained the phonotactics of Maltese word forms while creating non-existing singulars that show different degrees of similarity to real Maltese words.
Based on a wug test setup, Nieder, van de Vijver, et al. (2021a) presented the experimental items with pictures of existing things (existing words) and fantasy animals (pseudo-words). The primary research question was to investigate if Maltese native speakers make use of a morphological default rule as a productive pluralisation strategy as proposed by Marcus et al. (1995). In this case, some sound plural suffixes or broken plural patterns were expected to be used more frequently than others regardless of their frequency in the lexicon. This hypothesis directly reflected the assumption of a dual-mechanism approach of morphological processing (Marcus et al., 1995; Martinet, 1965; Pinker & Prince, 1988, 1991). If Maltese speakers instead based their pluralisation strategy on the similarity of the pseudo-words with existing words in the lexicon, Nieder, van de Vijver, et al. (2021a) took this as evidence for an analogical mechanism of morphological processing (Albright & Hayes, 2003; Daelemans & Van den Bosch, 2005; Skousen, 1992). Nieder, van de Vijver, et al. (2021a) found a positive correlation between the proportions of sound suffixes and broken plural patterns in a Maltese corpus and the proportions of suffixes and plural patterns used for the pluralisation of pseudo-words. Moreover, wrong plural answers for existing nouns mainly occurred in the case of infrequent broken plural nouns. This indicates that Maltese native speakers were more certain about sound plurals than about broken plurals.
As for the pseudo-words, the Maltese participants demonstrated a great amount of variation. Individual items were pluralised with both sound and broken patterns. This indicates that speakers of Maltese often accept several plural forms for one singular, even if they would only use one of these forms themselves. Overall, these results showed that speakers did not make use of a default rule. Rather, they based their plural choice on analogy to similar forms and on the frequency of these forms in their lexicon (Nieder, van de Vijver, et al., 2021a).

Methods
The following sections are dedicated to the computational approach presented in this study. We will first focus on the composition of the data sets used for training and testing the network before reporting details about the model design. Finally, we will present the results of our modelling efforts.

Data sets
In this study, we model the results of the production experiment by Nieder, van de Vijver, et al. (2021a) described in Section 2.2. To be able to do so, two different data sets were used.
The first data set is a collection of 3174 singular-plural pairs (after the removal of unclear data). This data set was originally compiled by Nieder, van de Vijver, et al. (2021a) from the MLRS Korpus Malti 2.0 and 3.0 and a broken plural list collected by Schembri (2012). The compiled list of all nouns from the corpus was then matched with data from the online dictionary Ġabra (Camilleri, 2013) by using the free corpus tool Coquery (Kunter, 2017). It was already used by Nieder, Tomaschek, et al. (2021) for their modelling study and slightly reduced for this study due to unclear data. The data set contains a quasi-phonetic transcription of the singular and plural forms obtained by replacing each phone with exactly one letter or symbol (see appendix). As can be seen in the corpus column of Table 1, sound plurals represent the majority of plural forms in this data set that served as training data for our NDL models.
The second data set was taken from Nieder, van de Vijver, et al. (2021a) and contained 8972 observations from their production study. We added a quasi-phonetic transcription of the singular forms in the same way as was done for the training data set. This data set was used as a test data set for the NDL model.
Table 1. Maltese broken and sound plurals, words taken from Nieder, Tomaschek, et al. (2021).
Note: The proportions in the corpus column are based on the training data set that was used for this study. The experiment column is based on the proportion of the patterns used in the wug experiment by Nieder, van de Vijver, et al. (2021a).
A comparison of the corpus and the experiment column of Table 1 shows that the proportions of pluralisation types in the experiment are very similar to the proportions in the Maltese lexicon, with the sound plural suffixes -i, -ijiet and -iet being used in most cases. As Nieder, van de Vijver, et al. (2021a) have established, their participants used the most frequent plural classes in the Maltese lexicon to pluralise words they have never heard before.
Having discussed our material, in the next section we describe our modelling approach and the design of the models that we used to investigate to what degree NDL is capable of predicting the results of a wug test.

Modelling approach and model design
Modelling experiments in NDL are typically performed in a two-step process. In the first step, a network is trained to associate input to output; in the second step, the test environment, the trained network is used to predict the output on the basis of newly presented input.
In our case, in the first step, we trained the NDL network to learn to associate provided word form cues with plural class outcomes. Training was done with the Danks equilibrium equation (Danks, 2003) that, in contrast to the error-driven learning equations by Rescorla and Wagner (Rescorla & Wagner, 1972), allows fast computation of connection weights between cues and outcomes. For a detailed discussion of both equations, see Ramscar et al. (2010) and Baayen et al. (2011). The training takes into account the co-occurrence and the non-occurrence of cues and outcomes, as a result of which cues are submitted to cue-competition. This means that during training, the network learns which cues are predictive and informative about an outcome, and which cues are not.
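For orientation, the two formulations mentioned above are commonly written as follows. The notation is an assumption of this sketch and follows standard presentations rather than any equation given in this paper: V_i is the weight from cue C_i to an outcome O, α_i and β_1, β_2 are learning-rate parameters, and λ is the maximum associative strength.

```latex
% Rescorla-Wagner: incremental update of the weight from cue C_i to outcome O
\[
\Delta V_i^{t} =
\begin{cases}
0 & \text{if } C_i \text{ is absent},\\[2pt]
\alpha_i \beta_1 \Bigl(\lambda - \sum_{j:\,C_j \text{ present}} V_j^{t}\Bigr)
  & \text{if } C_i \text{ and } O \text{ are present},\\[2pt]
\alpha_i \beta_2 \Bigl(0 - \sum_{j:\,C_j \text{ present}} V_j^{t}\Bigr)
  & \text{if } C_i \text{ is present and } O \text{ is absent}.
\end{cases}
\]

% Danks equilibrium: the weights at which the expected update is zero
\[
\Pr(O \mid C_i) \;-\; \sum_{j} \Pr(C_j \mid C_i)\, V_j \;=\; 0
\qquad \text{for each cue } C_i .
\]
```

The equilibrium condition is a system of linear equations in the weights, which is why it can be solved in a single step instead of iterating over individual learning events.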
Previous studies investigated the nature of the mechanism that is responsible for the assessment of similarities. However, as far as we know, none have assessed the nature of the input to the mechanism. This is why we investigated what types of cues the networks need to be provided with to obtain the best classification of plural classes. Given that NDL uses n-grams, we tested to what extent the prediction of plural classes for pseudo-words varied depending on whether word form cues were based only on singular word forms, on plural word forms, or on both singular and plural word forms. A total of 3190 singular forms and 3190 plural forms were used. When cues were based on singulars and plurals, they were used as independent entries to maintain strong cue competition. Since Nieder, Tomaschek, et al. (2021) report better classification results for Maltese plural classes when word forms were transformed into diphones as cues in their NDL models as compared to triphones as cues, we used diphones in all our models. Training of the model results in a network that represents the degree of connection strength between diphone cues and outcomes. In the second step, we tested what kind of plural classes the trained network predicted when presented with the diphone cues of the pseudo-words used in Nieder, van de Vijver, et al.'s (2021a) wug test. This is accomplished by extracting the connection weights between each pseudo-word's diphone cues and all plural class outcomes. By summing the weights between each cue set and each outcome, we obtain the pseudo-word's activation. This measure represents the amount of support from a cue set to each outcome. The outcome with the highest activation, i.e. the plural class with the strongest support, is regarded as the winner of the classification process. Figure 1 illustrates these two steps for our modelling experiments.
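The second step, from diphone cues to a winning plural class, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the NDL package's own interface: the function names are ours, the weights are invented numbers rather than trained values, and the two outcome labels are a coarse stand-in for the finer-grained plural classes of the study. The word kelb is used only to show the cue format.

```python
def diphones(word):
    """Split a word into overlapping two-symbol cues, with # as word boundary."""
    padded = f"#{word}#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def activations(cues, weights, outcomes):
    """Sum the cue-to-outcome weights of a cue set for every outcome."""
    return {o: sum(weights.get((c, o), 0.0) for c in cues) for o in outcomes}


def classify(word, weights, outcomes):
    """Return the outcome with the highest activation and all activations."""
    support = activations(diphones(word), weights, outcomes)
    return max(support, key=support.get), support


# Invented weights for illustration; a trained network would supply these.
toy_weights = {
    ("#k", "broken"): 0.2, ("lb", "broken"): 0.3, ("b#", "broken"): 0.1,
    ("ke", "sound"): 0.25, ("el", "sound"): 0.1,
}
toy_outcomes = ["broken", "sound"]

cues = diphones("kelb")
winner, support = classify("kelb", toy_weights, toy_outcomes)
print(cues)
print(winner, support)
```

Cues without a stored weight for an outcome contribute nothing to its activation, so a pseudo-word is classified entirely on the basis of the diphones it shares with the training data.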
Recall that it is also possible that speakers do not generate the inflected pseudo-word directly. Instead, the cues of the pseudo-word may activate a plural form of a real word, on whose basis the pseudo-word is then inflected.
We tested this hypothetical process in a double network design, in which we trained two networks (network A and network B) in a first step and then made predictions about the possible plural class of the pseudo-word in the second step. As illustrated in Figure 2 (top left), network A was trained to discriminate real Maltese plural words on the basis of diphone cues of real Maltese words. As in the preceding network design, we used either singular nouns, plural nouns, or singular and plural nouns as a basis for diphone cues. Since in its current form NDL is not capable of generating any words on its own, we jumped directly to the plural class classification of the plural word by training network B to discriminate plural classes on the basis of real Maltese plural words (Figure 2, top right). To obtain predictions about the plural class for each pseudo-word (see Figure 2, bottom), we first used the diphone cues of the pseudo-word to obtain a prediction about a real plural word from network A. The diphone cues of the predicted plural word were in turn used to obtain predictions about the plural class from network B. Nieder, Tomaschek, et al. (2021) tested what type of cues are needed for the prediction of the plural class: those of the singular form, those of the plural form or those of the singular and plural together. They found that using only singulars as cues yielded worse prediction accuracies than using those of the plurals or singulars and plurals jointly. In the present study, we follow their example and also test what types of cues most reliably predict plural classes of wug words.
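The chaining of network A and network B at prediction time can be sketched as follows. The sketch rests on assumptions that are not part of the study: both weight tables contain invented numbers, the pseudo-word telb is made up for illustration, the function names are ours, and the two class labels are a coarse stand-in for the actual plural classes. The plural forms klieb and annimali are the ones mentioned in the text.

```python
def diphones(word):
    """Split a word into overlapping two-symbol cues, with # as word boundary."""
    padded = f"#{word}#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def best_outcome(cues, weights, outcomes):
    """Return the outcome with the highest summed weight for a cue set."""
    support = {o: sum(weights.get((c, o), 0.0) for c in cues) for o in outcomes}
    return max(support, key=support.get)


def double_network_class(pseudo_word, weights_a, plural_words,
                         weights_b, plural_classes):
    """Network A selects a real plural word; network B classifies that word."""
    plural_word = best_outcome(diphones(pseudo_word), weights_a, plural_words)
    plural_class = best_outcome(diphones(plural_word), weights_b, plural_classes)
    return plural_word, plural_class


# Invented weights: network A maps word cues onto real plural words.
toy_weights_a = {
    ("el", "klieb"): 0.2, ("lb", "klieb"): 0.4, ("#t", "annimali"): 0.1,
}
# Invented weights: network B maps plural-word cues onto plural classes.
toy_weights_b = {
    ("li", "broken"): 0.3, ("ie", "broken"): 0.3, ("b#", "sound"): 0.1,
}

activated_plural, predicted_class = double_network_class(
    "telb", toy_weights_a, ["klieb", "annimali"],
    toy_weights_b, ["broken", "sound"],
)
print(activated_plural, predicted_class)
```

The design choice worth noting is that network B never sees the pseudo-word itself: its input consists solely of the cues of the real plural word that network A selected.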
Combining the two network set-ups with the possible cue sources gives us the following tested set-ups:
(1) Simple network: singular cues → plural class
(2) Simple network: plural cues → plural class
(3) Simple network: singular and plural cues → plural class
(4) Double network: singular cues → plural forms → plural class
(5) Double network: plural cues → plural forms → plural class
(6) Double network: singular and plural cues → plural forms → plural class
Taking into account the different word forms used to create diphone cues (singulars, plurals, singulars and plurals) and the two network designs, we ran six NDL models. Table 2 displays the cue-to-outcome structures for all six NDL models for the noun kelb "dog" as an example. The Maltese word form kelb has a Semitic origin and shows the broken plural form klieb with its CV-pattern CCVVC. The diphone cues coded as quasiphones are given in the fourth column of the table. Following the NDL notation, the hash mark # marks the beginning and end of a word form. The rightmost column displays the outcomes of the different models, whereas for the singular-plural models the outcome is based on a combination of the singular and plural cues (illustrated by combining both with curly brackets).
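For readers who prefer code to tables, the six set-ups can be written out as learning events for kelb. The sketch below is only an approximation of Table 2: it uses orthographic diphones instead of quasiphones, treats singular and plural forms as independent entries, and uses the CV-pattern CCVVC as a stand-in label for the plural class. The function and argument names are hypothetical.

```python
def diphones(word):
    """Split a word into diphone cues; # marks the word boundaries."""
    padded = f"#{word}#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def learning_events(singular, plural, plural_class, cue_source, design):
    """Build (cues, outcome) events per network for one noun."""
    forms = {"singular": [singular], "plural": [plural], "both": [singular, plural]}[cue_source]
    if design == "simple":
        # Cues of the chosen word forms predict the plural class directly.
        return {"simple": [(diphones(form), plural_class) for form in forms]}
    if design == "double":
        return {
            # Network A: cues of the chosen word forms predict the real plural word.
            "A": [(diphones(form), plural) for form in forms],
            # Network B: cues of the real plural word predict the plural class.
            "B": [(diphones(plural), plural_class)],
        }
    raise ValueError(f"unknown design: {design}")


# All six set-ups for kelb, with CCVVC as a stand-in class label (assumption).
set_ups = {
    (design, source): learning_events("kelb", "klieb", "CCVVC", source, design)
    for design in ("simple", "double")
    for source in ("singular", "plural", "both")
}
```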
A note on the cue structure in our computational experiments: It is commonly assumed that morphological knowledge is knowledge of form-to-form mappings, with only a very rudimentary role for other linguistic modules, such as semantics (Albright & Hayes, 2003; Booij, 2010; Kiparsky, 1982; Stump, 2001). We follow this approach in the present paper. Obviously this is a very strong simplification of the speech production process. Speakers use language to make their intentions clear, i.e. by discriminating semantic contrasts, not to show off what versatile form-to-form mappers they are, which is exactly what they are being evaluated on in wug tests. Nevertheless, we had two reasons not to include sophisticated semantics in our modelling efforts. First, in order to be able to compare rule-based and analogy-based approaches to the characterisation of morphological knowledge, in which semantics also leads an atrophied life, it is necessary to ignore semantics for the time being. Second, it is far from trivial to obtain the meaning of pseudo-words, as has recently been demonstrated by Chuang, Vollmer, et al. (2020). Since Nieder, van de Vijver, et al.'s (2021a) experiment consisted of relatively few pseudo-words (in comparison to the size of a corpus), we refrained from trying to include semantics in our current modelling study.
All material and models presented here can be found in our Supplementary Material at https://osf.io/5es2q/.

Comparing model predictions with wug test results
The aim of the present study was to investigate the knowledge Maltese native speakers demonstrated in a wug test on the basis of a discriminative learning approach. To do so, we test to what degree the use of plural classes of inflected pseudo-words in the wug test can be predicted by NDL. A direct comparison between the NDL predictions and the participants' inflections is unfortunately not possible for two reasons. First, NDL in its original form is a classifier that cannot generate new forms on its own. Second, Nieder, van de Vijver, et al.'s (2021a) participants strongly vary in how exactly they inflect a pseudo-word. Concretely, one and the same pseudo-word may be inflected in different ways by different speakers (Heitmeier & Frank, 2021; Nieder, van de Vijver, et al., 2021a; van de Vijver & Baer-Henney, 2014). This suggests that, in contrast to the predictions of an abstraction approach, there is no truly "correct" answer to how a pseudo-word has to be inflected. Since NDL will always make the same prediction for each pseudo-word, this means that NDL's classification accuracy of plural classes for individual words will strongly vary between speakers. For example, the same item could obtain a -i suffix from one speaker and a -ijiet suffix from another one. One example given by Nieder, van de Vijver, et al. (2021a) is the pseudo-word follu with its possible plurals folol, folli or follijiet provided by different participants.
Instead of inspecting classification accuracies, which are typically considered an important criterion for a model (Albright & Hayes, 2003), we opted for an approach in which NDL can be regarded as a predictive mechanism for an entire community (Ernestus & Baayen, 2003; Hayes & Londe, 2006).
From this perspective, what counts is not the predicted plural class for individual words, but how variable the choice of plural inflection was across participants for individual words, and to what degree the different networks were able to predict this distribution. A second interesting question is then to what degree speakers and NDL agree in their plural classes and the proportions of predicted plural classes in the entire data set of pseudo-words, which is a direct reflection of how speakers and NDL handle the variation found in the Maltese plural system. Clearly, based on what Nieder, van de Vijver, et al. (2021a) report, the proportions of plural classes will reflect that of the training corpus. However, by means of our network designs and different cue-to-outcome structures we changed the structure of inputs to the model and how the input is learned (see Section 3.2). Accordingly, we want to inspect which method best approximates the plural class probability for individual words and the proportions of the different plural classes. This is accomplished by first running several correlations in which we compare the probabilities of plural classes for individual word forms between different groups of participants and the different network set-ups. In a second step, we correlate the proportions of plural classes obtained by NDL and the proportions present in the corpus data as well as in Nieder, van de Vijver, et al.'s (2021a) wug experiment (see also Linke & Ramscar, 2020, for proportions-based approaches to investigating variation in languages). The cues are given in the quasi-phonetic transcription. The weights of these cues to the outcomes that were established in this way were used to assess the outcomes (plural classes) for the experimental stimuli.

Plural class probability across speakers
Before we present how the model performs in comparison to the corpus and the speakers, the question arises to what degree speakers agree among themselves in their use of plural classes in both the real words and the pseudo-words.
To answer this question, we calculated Spearman's rank correlations between the proportions of plural classes for all pairwise speaker combinations. Figure 3 illustrates the resulting distributions of correlations.
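The pairwise comparison can be sketched as follows. The code is a self-contained illustration, not the analysis script from our Supplementary Material: Spearman's rank correlation is computed as a Pearson correlation on average ranks, and the three speakers and their class proportions are invented toy values.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if one of the sequences is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise_speaker_correlations(proportions):
    """Correlate the plural class proportions of every pair of speakers."""
    classes = sorted({c for by_class in proportions.values() for c in by_class})
    result = {}
    for a, b in combinations(sorted(proportions), 2):
        vector_a = [proportions[a].get(c, 0.0) for c in classes]
        vector_b = [proportions[b].get(c, 0.0) for c in classes]
        result[(a, b)] = spearman(vector_a, vector_b)
    return result


# Invented toy proportions for three hypothetical speakers.
toy = {
    "s1": {"-i": 0.5, "-ijiet": 0.3, "broken": 0.2},
    "s2": {"-i": 0.6, "-ijiet": 0.1, "broken": 0.3},
    "s3": {"-i": 0.2, "-ijiet": 0.3, "broken": 0.5},
}
correlations = pairwise_speaker_correlations(toy)
```

The distribution of the values in the resulting dictionary is what a density plot of speaker agreement summarises.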
The left density plot illustrates that speakers agreed relatively consistently on the inflection of real words, i.e. words that they know and have potentially used before. This is indicated by correlations that range mostly between r = 0.75 and r = 1, with four peaks at around 0.81, 0.88, 0.95 and 0.99. These peaks correspond to the structure of the data: Nieder, van de Vijver, et al. (2021a) report that they included frequent sound and broken plurals and infrequent sound and broken plurals in their experiment, leaving us with four different conditions.
The centre density plot illustrates the proportions of correlations between speakers for pseudo-words. We observe that the distribution of correlations ranges between r = 0.2 and r = 1, with most of the density mass located between r = 0.5 and r = 1 and a peak around 0.7. This indicates that while there was agreement about the plural classes between some participants, there were also pairs of participants who performed systematically differently in the wug test.
The right plot illustrates the correlation proportions when pseudo-words and real words were tested together, mirroring the effects of the two previous analyses.
In conclusion, we find that speakers agree about the usage of plural classes for real words. They also tend to agree for pseudo-words, but show more variation for these.

Plural class probability for individual words
In the first test, we inspected how variable the choice of inflection was across participants for each individual word, and to what degree the network was able to predict this variability. To do so, we ran one hundred correlations in which we compared the probabilities of plural classes for individual word forms from 10% of the participants to the probability of plural classes of the remaining 90% of the participants. In these one hundred correlations, the correlation results between participants ranged between r = 0.87 and r = 0.92, as illustrated in Figure 4.
These results indicate that the distribution of plural classes for individual words is very similar across participants.
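The resampling procedure can be sketched as below. This is an illustration under several assumptions: responses are stored as (participant, word, plural class) triples, the per-word class probabilities of each group are flattened into one vector, and a Pearson correlation is used because the type of correlation is not specified above. The ten participants, two words and two class labels are toy data.

```python
import random
from collections import Counter, defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def class_probabilities(responses, group, words, classes):
    """Per-word plural class probabilities within one group, as a flat vector."""
    counts = defaultdict(Counter)
    for participant, word, plural_class in responses:
        if participant in group:
            counts[word][plural_class] += 1
    vector = []
    for word in words:
        total = sum(counts[word].values())
        vector.extend(counts[word][c] / total if total else 0.0 for c in classes)
    return vector


def resampled_agreement(responses, runs=100, share=0.1, seed=1):
    """Correlate a random 10% of participants with the remaining 90%, repeatedly."""
    rng = random.Random(seed)
    participants = sorted({p for p, _, _ in responses})
    words = sorted({w for _, w, _ in responses})
    classes = sorted({c for _, _, c in responses})
    k = max(1, round(share * len(participants)))
    results = []
    for _ in range(runs):
        small = set(rng.sample(participants, k))
        large = set(participants) - small
        results.append(pearson(
            class_probabilities(responses, small, words, classes),
            class_probabilities(responses, large, words, classes),
        ))
    return results


# Toy responses (assumption): ten hypothetical participants, two words, two classes.
toy_responses = [(f"p{i}", "w1", "-i") for i in range(10)]
toy_responses += [(f"p{i}", "w2", "-i" if i == 0 else "broken") for i in range(10)]
agreement = resampled_agreement(toy_responses)
```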
In each of the one hundred correlations, we also inspected how these probability distributions of plural classes for individual words were predicted by the different networks described in Table 2. Probabilities of plural classes for individual words predicted by the network were calculated with Luce's (1959) choice rule: activations of individual plural classes were divided by the sum of the rectified activations for all plural classes. These predicted plural class probabilities were correlated with the plural class probability for each word obtained from the 90% of the participants. Figure 5 displays the correlation results for the simple model set-up. The leftmost plot displays the correlation for the model using singulars as cues, the middle plot indicates the results for the model using plurals as cues and the rightmost plot displays the results for the model using both singulars and plurals as cues. As can be seen in the figure, the highest correlation results were obtained for the model using singulars as cues (r = 0.66), followed by the model using singulars and plurals as cues (r = 0.61). Figure 6 displays the correlation results for the dual model set-up. Again, the leftmost plot shows the results for the model using singulars as cues, the middle plot displays the results for plurals and the rightmost plot for singulars and plurals as cues. In this set-up, the highest correlation results were obtained for a model using singulars as cues (r = 0.53). Although the correlation results of the dual model predictions with the participants were overall lower than for the simple models, we did not find the same significant drop in the correlation results for plurals as cues. We hypothesise that this is due to the detour of predicting actual plural words before the plural class prediction for pseudo-words.
While the detour via existing plurals resulted in more stability for training that involved plural cues, for training with singular cues only this resulted in more uncertainty.
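The conversion of activations into probabilities with Luce's choice rule can be written out as a short function. The sketch makes two assumptions that go beyond the description above: negative activations are set to zero in the numerator as well as in the denominator, and all probabilities are set to zero when no plural class has positive activation. The activation values in the example are invented.

```python
def luce_choice(activations):
    """Turn plural class activations into probabilities with Luce's choice rule."""
    # Rectify: negative activations count as zero support (assumption: numerator too).
    rectified = {outcome: max(value, 0.0) for outcome, value in activations.items()}
    total = sum(rectified.values())
    if total == 0:
        # Assumption: without any positive support, every class gets probability 0.
        return {outcome: 0.0 for outcome in activations}
    return {outcome: value / total for outcome, value in rectified.items()}


# Invented activations for one hypothetical pseudo-word.
probabilities = luce_choice({"-i": 0.6, "-ijiet": 0.2, "broken": -0.1})
```

With these toy values, the class with activation 0.6 receives three quarters of the probability mass, and the class with negative activation receives none.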

Plural class probability across all words
In Section 4.2 we discussed the predictions of NDL for individual words across participants, but this does not shed light on the enormous variation among plural classes in Maltese (see Table 1). This can be accomplished when we investigate NDL's predictions across all words, which (of course) contain all plural classes available in our data. To achieve this, we assessed to what degree the NDL network is able to predict plural classes across all words, and how these predictions mirror those of the participants in the wug test. To do so, we correlated the proportions of plural classes across words obtained in the experiment with those obtained from our NDL models. Moreover, we compared both to the probability distribution of plural classes in the Maltese corpus. The dotplots in Figure 7 illustrate the log-transformed proportions (x-axis) of plural classes (y-axis) in the corpus (blue circles), in the participant responses in the wug test (pink triangles) and in the predictions obtained from NDL (grey squares). The left column illustrates the simple network design, the right column the double network design (the proportions for the corpus and the experiment are the same in both columns). The dotplots illustrate that overall there seems to be a strong correlation between the corpus, the experiment and the plural classes predicted by the two network designs. To evaluate this finding in more detail, we ran Spearman's rank correlation tests to assess the relationship between the proportions in the corpus, the proportions in the experimental results and the proportions in the modelling approaches.
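The proportions that enter these comparisons can be derived from the winning plural class of each word, as in the following sketch. It assumes a natural logarithm for the log-transformation, which is not specified above, and the list of predicted classes is invented toy data.

```python
from collections import Counter
from math import log


def log_proportions(predicted_classes):
    """Log-transformed proportion of each plural class among the predictions."""
    counts = Counter(predicted_classes)
    total = sum(counts.values())
    # Classes that never occur are absent, since log(0) is undefined.
    return {plural_class: log(n / total) for plural_class, n in counts.items()}


# Invented winners for four hypothetical pseudo-words.
toy_log_proportions = log_proportions(["-i", "-i", "broken", "-a"])
```

The same function can be applied to corpus counts and to participant responses, so that all three sources are expressed on one scale before they are correlated.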
Unsurprisingly, we found a significantly positive correlation between the participants' results and the proportions of the plural classes in the corpus (r = .86, p < 0.01). Table 3 illustrates the correlations between the corpus, the results in the wug experiment and the proportions of predicted plural classes by NDL, depending on the cue structure provided to the network. Significant results are highlighted in bold font (for the sake of clarity, we repeat the correlations here in the text again). Overall, we observe similar ρ-values for all tested models (except for the simple model using plurals only as cues). Since all models use the same discriminative mechanism, the similar successful performance could be seen as an indicator for error-driven learning as the cognitive mechanism representing the morphological knowledge that participants show in pseudo-word inflection. The difference in performance for the cue structures and model set-ups then indicates what kind of information about word forms is actually needed to inflect novel singulars in an informed way.
We first turn our attention to the findings for the simple network structure (top row of Table 3). When using singulars as cues, we found a high positive correlation between the modelling results and the corpus data (r = .82, p < 0.01). This indicates that the model is capable of reflecting the variation in the Maltese plural classes as attested in the corpus data. The high correlation between the participants' results and the NDL data (r = .73, p < 0.01) illustrates that the simple model with singular cues is capable of predicting relatively well the variation across the plural classes in the lexicon of participants, as reflected in a wug test.
When plurals were used as cues, we did not find any significant correlations between predicted proportions of plural classes and those in the corpus as well as in the experiment (r = .35, p = 0.16/r = .23, p = 0.38). The low, non-significant correlations indicate that plural forms are not sufficiently informative to accurately predict plural classes, as a consequence of the variation among the plural classes. As displayed in Figure 7 (mid left), we find that NDL overgeneralises the sound plural suffix -a and undergeneralises the two most frequent suffixes -i and -ijiet.
When singulars and plurals are used as cues in the simple network design, we find that the correlations in both cases, the corpus and the experiment, are higher than when using singulars only (r = .86, p < 0.01/r = .76, p < 0.01). This may come as a surprise, as the use of plural nouns as cues has not yielded significant results. However, recall that NDL takes into account cue competition. This means that when plurals are used with singulars, together they very likely increase cue competition, making cues more informative about each plural class. As a result, higher correlations are observed.
Next, we turn our attention to the correlations between the corpus and the experiment, and the proportions of plural classes predicted by the double network design. Here we find that using singulars as cues yields the highest correlations (r = .90, p < 0.01/r = .77, p < 0.01) among the different cue structures provided to the double network design. Using plurals as cues also yields significant results. Nevertheless, plural cues result in the lowest correlations (r = .85, p < 0.01/r = .65, p < 0.05), with singulars and plurals as cues yielding medium-sized effects (r = .87, p < 0.01/r = .71, p < 0.01).

Summary
As the complexities of Maltese nouns put them beyond the purview of abstraction-based approaches, and analogy-based approaches by means of memory-based learning make unwarranted assumptions about the assessment of similarity and learning, we decided to make demands on the Discriminative Lexicon by using it to model the results of a Maltese wug test.
Our aim was to shed light on the morphological knowledge speakers use to inflect pseudo-words in wug tests. To this end, we used NDL (Arppe et al., 2018; Baayen et al., 2011), a two-layer network rooted in Discriminative Learning (Ramscar et al., 2010), to model the proportions of plural classes of inflected pseudo-words in Nieder, van de Vijver, et al.'s (2021a) wug test with 80 adult native speakers of Maltese.
We ran NDL models with different cue-to-outcome structures in which the plural class was predicted either on the basis of a singular form, a plural form or both. Moreover, we tested two network designs to make predictions for the wug test. In the first, cues of real word forms predicted plural classes directly; in the second, called "double network", cues of real word forms first predicted real plural words, whose cues in turn predicted the plural classes (see Figures 1 and 2 again for a detailed description of the designs).
Once we had obtained the plural class predictions for each pseudo-word, we first investigated the variability among the participants' responses and the NDL predictions. To this end, we correlated the probability distribution predicted by NDL with the probability distribution in participants. For the simple model set-up, we found the highest correlations for models using singulars, or singulars and plurals, as cues, and an extremely low correlation for the model using plurals as cues.
For the dual-model set-up, the highest correlation was obtained for a model using singulars as cues. Overall, the correlations between participants and NDL models ranged between r = 0.05 and r = 0.65 and can thus be described as yielding medium-sized effects in the best case.
A possible explanation for this low to medium-sized effect lies in the nature of using a computational model like NDL for modelling data. The NDL model predicts plurals for pseudo-words just like the participants in the experiment did. However, when correlating the probability distributions, we are comparing 80 human participants with one single artificial participant (= NDL), resulting in lower correlations due to the variation in the participants' responses that was reported in Figure 3. Nevertheless, the medium-sized correlations indicate that NDL can capture some of this variability and produce human-like predictions.
Another reason for the lower correlation results is the nature of the cues available to speakers vs. to NDL. While the NDL model is based on a form-to-form mapping only, the participants' responses might be influenced by other cues such as meaning. During the wug experiment, Nieder, van de Vijver, et al. (2021a) presented the participants with pictures of existing things and fantasy animals. It is very likely that the presented fantasy animals carried semantic content such as colours, animacy or resemblance to existing animals. This semantic content may provide specific semantic cues for certain plural classes. Since meaning is not available to NDL, the model is not able to fully capture and mimic the participants' plural choices.
In short, we have shown that NDL is able to predict the behaviour of participants at the level of the individual word.
We furthermore looked at the correlation between the predictions of NDL across all plural classes and the variation in the answers of the participants across plural classes. This was done in order to better understand the effect of variation in the lexicon on the behaviour of participants in a wug test.
As illustrated in Table 3, for the simple network design, we find that the highest correlations between plural class proportions were obtained when both singulars and plurals were used as cues in the training. The lowest, and thus non-significant, correlations were obtained when the model was provided with plural cues only, similarly to the results of the correlation of the probability distribution among participants and the NDL predictions. The question thus arises why this is the case.
One reason why using plural cues yielded no significant correlations may be found in similarities between the word forms of real plurals and pseudo singulars in Nieder, van de Vijver, et al.'s (2021a) study. In the training phase, NDL only had access to plural cues and learned to relate them to plural class outcomes. Some of the words with an offset -a are broken plurals that simply end in the vowel. However, many words actually belong to a sound plural class in which the plurals end in -a. Since -a is the Maltese feminine marker for nouns, Nieder, van de Vijver, et al. (2021a) constructed their pseudo-words such that many of the pseudo singulars end in -a (Borg & Azzopardi-Alexander, 1997). When faced with the pseudo-words, but trained on real plurals, this learned relation most likely forced NDL to predict the sound plural -a for a large number of pseudo singulars that end in -a. NDL overgeneralised the sound plural suffix -a and undergeneralised the sound plural suffix -i (see Figure 7, bottom left) in the pseudo singulars used in Nieder, van de Vijver, et al.'s (2021a) study. As a result, we observe a low correlation.
Turning our attention to the double network design, we find that using singular cues yielded the highest correlation while adding plural word forms to the training reduced the correlation. In addition, the proportions of plural classes obtained with the double network had a higher correlation with the proportions of plural classes in both the corpus and the experiment. These findings have implications for our understanding of wug tests that we will discuss below.

Implications
Our results have implications for theories of morphology and the mental lexicon. Since production of novel inflected forms can be modelled without recourse to morphemes, an implication for morphological theory is that morphology arises from the mapping of form onto function (as in our model) or meaning (as in Baayen et al., 2019). Recourse to abstract categories, such as morphemes, is unnecessary (see also Ambridge, 2020). This finding is in turn in line with other work on Maltese nouns (Nieder, Tomaschek, et al., 2021; Nieder, van de Vijver, et al., 2021a, 2021b), and is explained by the theory of Word and Paradigm (Blevins, 2016).
Our current approach, then, is in agreement with much recent work arguing that morphemes, or other abstract categories, may be helpful in typological or diachronic analyses of languages, but are not part of the implicit knowledge that native speakers have of their language (Ambridge, 2020). The rules that manipulate prosodic structure as morphemes and which have been proposed to account for Maltese broken plurals (Schembri, 2012) succinctly describe groups of broken plurals. But our modelling shows that it is possible to classify Maltese plurals into groups without rules, prosodic structure, or morphemes. As it is not clear how the rules would apply to novel words, our modelling is a better fit to the content of the mental lexicon of Maltese native speakers compared to a rule-based or abstraction-based model.
Our model is part of the theory of the Discriminative Lexicon, in which the associations between word forms and their grammatical functions are learned in an error-driven way. The modelling in the Discriminative Lexicon theory is based on error-driven learning, which is well-supported cognitively for language learning and processing (Nixon, 2020; Nixon & Tomaschek, 2021; Olejarczuk et al., 2018; Ramscar et al., 2013a; Ramscar & Yarlett, 2007) and offers an alternative to analogical modelling. This is necessary, amongst other reasons, because in analogical modelling it is unclear how similarity is learned. In these analogical models, the start of the similarity comparison is pre-specified by the researcher, but it is unclear how a human learner can decide what is similar to what, why and where to start (Gahl & Strand, 2016).
Our results raise the question: Can we, on the basis of the present models and results, infer any kind of implications for cognitive processes that may take place during pseudo-word inflection? While we successfully modelled the participants' behaviour in a wug test, the modelling designs we applied are a rather abstract and simplified representation of the cognitive process happening in native speakers of Maltese (or in a broad sense of any language). In the simple network design, we mapped phonological word forms onto plural classes that are the result of grammatical analyses and of which speakers of Maltese are most likely unaware (or if they are, they are the result of long school education). This problem applies more so to the double network design in which the classification took a detour over a real plural word. Speakers do not explicitly and actively select a plural word on the basis of the pseudo-word and assign a plural class to it like our NDL models did. Instead, this might be more of an implicit analogical comparison.
Another perspective missing in the present approach is the role of semantics that might have influenced the results of the correlations presented in Figures 5 and 6 of this study. Semantic information is a prerequisite for inflectional classes in many languages (Haspelmath, 2013) and recent modelling studies demonstrated that it improves prediction accuracies when taken into account also for languages such as German or English (e.g. Baayen et al., 2019). Ramscar (2019) even argues that semantics, even in the form of inflectional functions which establish semantic relations in a given context, are the cues which discriminate word forms. This perspective has recently been shown to result in better predictions of fine phonetic detail than when functions are the outcome of the discriminative process (2022). By contrast, the present classification was based purely on the phonological properties of the words and ignored potential interference from sentential context. However, each of these arguments could be used against any kind of computational modelling approach that was applied in the last decade to investigate cognitive processes and their effects on perception and production. Like all computational models, the present approach was a simplification of speech production, which in reality is a highly complex neural process that results from interactions of different layers of abstraction (Hickok, 2014). Accepting this reduction of complexity as a valid step, we regard our approach to be successful in demonstrating that inflecting pseudo-words does not necessarily need to be based on rules. Instead, our results indicate that inflection of pseudo-words in wug tests, and therefore very likely the inflection of new words encountered in everyday language, is based on analogical processes.
Speakers are capable of applying these analogical processes on the basis of their (learned) knowledge of the relations between the phonological properties of inflected words and the grammatical functions they want to express (such as plural). When presented with pseudo-words, together with the "plural" cue that is implicitly present in the experiment, the words' phonological cues start a discriminative, predictive process. In this process, either the abstraction in terms of an inflectional process (represented by our plural class) is started directly or, as indicated by the double network design, the cues of the pseudo-word first activate real inflected words that serve as a template for the inflection of the pseudo-word. At no point is there a rule. Instead, inflection of pseudo-words is based on cue-to-outcome structures that are learned. The question thus arises what the responsible mechanism is that allows speakers to learn the association between phonological form and inflectional type. From our theoretical perspective, accepting the above-mentioned simplifications of the cognitive process as necessary for our modelling purposes, the knowledge that speakers use to inflect pseudo-words emerges through discriminative learning.
In conclusion, the responses in wug tests do not need to be explained by assuming the application of morphological and phonological rules to morphemes. Instead, speakers learn how sounds discriminate among meanings and functions of words, and apply this knowledge to pseudo-words.

Data availability statement
The data that support the findings of this study are openly available at https://osf.io/5es2q/.

Disclosure statement
No potential conflict of interest was reported by the author(s).