Skilled readers’ sensitivity to meaningful regularities in English writing

Substantial research has been undertaken to understand the relationship between spelling and sound, but we know little about the relationship between spelling and meaning in alphabetic writing systems. We present a computational analysis of English writing in which we develop new constructs to describe this relationship. Diagnosticity captures the amount of meaningful information in a given spelling, whereas specificity estimates the degree of dispersion of this meaning across different spellings for a particular sound sequence. Using these two constructs, we demonstrate that particular suffix spellings tend to be reserved for particular meaningful functions. We then show across three paradigms (nonword classification, spelling, and eye tracking during sentence reading) that this form of regularity between spelling and meaning influences the behaviour of skilled readers, and that the degree of this behavioural sensitivity mirrors the strength of spelling-to-meaning regularities in the writing system. We close by arguing that English spelling may have become fractionated such that the high degree of spelling-sound inconsistency maximises the transmission of meaningful information.


Introduction
One, two! One, two! And through and through
The vorpal blade went snicker-snack!
Lewis Carroll, Jabberwocky, 1871/1983
Recent research has uncovered substantial patterns of non-arbitrariness in language (see Dingemanse, Blasi, Lupyan, Christiansen, & Monaghan, 2015, for review). Corpus studies reveal subtle phonological differences between words that mean different things: for example, words referring to objects are composed of different sounds and are stressed in different positions than words referring to actions (Cassidy & Kelly, 1991; Monaghan, Shillcock, Christiansen, & Kirby, 2014). These phonological cues play an important role in language acquisition (Fitneva, Christiansen, & Monaghan, 2009) and aid syntactic processing in adulthood (Farmer, Christiansen, & Monaghan, 2006; Monaghan, Christiansen, Farmer, & Fitneva, 2012). These findings go against the traditional perspective that the links between phonological forms and concepts are arbitrary (De Saussure, 1983).
In this article, we investigate the nature of statistical regularities between the written forms of English words and their meanings, focusing on derivational suffixation. These units capture the commonality between known words that are related in meaning (e.g. as in 'actress', 'waitress') and provide a basis for language productivity, allowing us to label new concepts and phenomena in the ever-changing world (e.g. 'Trumpists' and 'Brexiteers'; see Algeo, 1991). We present a computational analysis of English writing in which we develop new constructs to describe the relationship between spelling and meaning. This analysis demonstrates that particular suffix spellings tend to be reserved for particular meaningful functions. We then show across three paradigms that skilled readers are highly sensitive to this form of regularity, and that this sensitivity mirrors the strength of spelling-to-meaning regularities in the writing system. We close by arguing that English spelling may have become fractionated in such a way that the high degree of spelling-sound inconsistency maximises the transmission of meaningful information in the service of skilled reading.
on the first syllable while bisyllabic verbs have stress on the second syllable. In the last decade, about sixteen phonological cues to lexical category in English have been proposed in the literature (for a review, see Monaghan, Chater, & Christiansen, 2005). For example, Onnis and Christiansen (2008) conducted corpus analyses in several languages showing that word-final phonemes could discriminate lexical categories almost as well as suffixes.
Language speakers extract these probabilistic cues from their environment, and exploit them in language learning and language processing. Cassidy and Kelly (1991, 2001) and Fitneva et al. (2009) found that children learn object and action referents better if there is a correspondence between the sounds of the nouns and verbs and their respective lexical categories. Further, using a self-paced reading methodology, Farmer et al. (2006) found that when sentence context generated an expectation for a noun, phonologically-typical nouns (e.g. 'marble', whose phonological features cluster with those of other nouns) were read aloud faster than phonologically-atypical nouns (e.g. 'insect', whose phonological features cluster with those of verbs). Information about lexical category is critical for comprehending language (e.g. 'we saw her duck' is interpreted differently depending on whether 'duck' acts as a noun or a verb), and for producing syntactically valid utterances. It is also well known that the ability to pick up and exploit cues to lexical category is very important in language acquisition. Children quickly construct hypotheses about the lexical category of new words to be able to use these new words productively in sentences (Cassidy & Kelly, 1991, 2001; Houston-Price, Plunkett, & Harris, 2005; Markman & Wachtel, 1988).
If writing were a direct transliteration of spoken language, then these regularities between phonological form and meaning would also be present in written language. However, in English (as in many other alphabetic languages), there is not a straightforward relationship between the sounds and spellings of words. Because statistical regularities between printed words and their meanings may impact the acquisition of literacy and the skilled processing of printed words (e.g. Frost, 2012), it is important to consider the nature of these regularities themselves. Our work explores the possibility that adults extract probabilistic morpho-syntactic cues from written language and exploit them in reading and writing. We focus on morphology because it provides the site of the most substantive regularities between English printed words and their meanings (Plaut & Gonnerman, 2000; Rastle, Davis, Marslen-Wilson, & Tyler, 2000).

Morphological regularities and skilled reading
Four decades of research have demonstrated that skilled readers register and exploit morphological regularities in printed words. Taft and Forster's (1975) seminal research demonstrated that nonwords with an apparent morphological structure (such as 'dejuvenate', which consists of an existing prefix DE- and stem¹ JUVEN, as in 'juvenile', 'rejuvenate') take longer to reject in a lexical decision task compared to nonwords without such structure (e.g. 'depertoire', whose "stem" does not exist). This effect is thought to arise because morphologically-structured nonwords like 'dejuvenate' contact stored representations of morphemes (in this case, of JUVEN), making it difficult to decide that the stimulus is not an existing word. This finding is interesting because the stem JUVEN does not occur on its own, and it is unlikely that readers are ever explicitly taught that JUVEN has something to do with 'youth'. Rather, this knowledge likely reflects an accumulation of experience with spoken and written language, which ultimately becomes represented in the reading system.
Since the publication of Taft and Forster (1975), a wealth of converging data from a variety of experimental paradigms has supported the conclusion that the analysis of morphological information is a crucial part of skilled reading. One of the most powerful pieces of evidence for this comes from morphological priming experiments in which primes are masked and presented so briefly that they are not available for conscious report. These experiments have revealed that derivational primes (e.g. 'dealer') facilitate recognition of stand-alone stems (e.g. DEAL; Forster, Davis, Schoknecht, & Carter, 1987; Rastle et al., 2000). It is noteworthy that this priming effect cannot be ascribed to simple orthographic overlap between prime and target (e.g. brothel-BROTH; -EL is not an existing morpheme and yields no priming; Rastle, Davis, & New, 2004), or to the simple summation of orthographic and semantic overlap (e.g. sneeze-SNORE yields no priming; Rastle et al., 2000). Results from studies that use the stem frequency paradigm nicely complement this line of work. It has been shown repeatedly that words with frequent stems are processed more quickly than words with infrequent stems, even when the frequency of the whole word is controlled. For example, Niswander, Pollatsek, and Rayner (2000) manipulated stem frequency while keeping whole word frequency constant, and found early effects of stem frequency on readers' eye-movement behaviour: first fixation durations were shorter on words with frequent stems such as 'instalment' compared to those with infrequent stems such as 'deferment' (see also Andrews, 1986; Juhasz, Starr, Inhoff, & Placke, 2003; Pollatsek & Hyönä, 2005). Both types of results suggest that morphologically-structured words are represented in terms of their constituents; they are represented in a componential manner (Plaut & Gonnerman, 2000). These data again indicate that the accumulation of experience with printed words impacts on the nature of stored representations in the reading system.
In contrast to statistical regularities between print and sound, there has been remarkably little work seeking to specify how morphemic regularities between print and meaning are acquired. Recent cross-sectional work investigating groups of children between 7 and 9, 12-13, and 16-17 years old has revealed ongoing accumulation of morphological knowledge throughout this whole period of reading acquisition (Dawson, Rastle, & Ricketts, 2017), although less is known about the mechanisms that underpin the acquisition of this knowledge. Tamminen, Davis, and Rastle (2015) sought to determine whether this information could be discovered implicitly through exposure to morphologically structured stimuli. They presented adults with novel vocabularies comprising an underlying morphological structure (e.g. 'sleepnule', 'buildnule', 'teachnule'). Critically, the definitions of these items related back to the stems (in this case, SLEEP, BUILD, and TEACH) in such a way that participants could discover an underlying function of the novel affix (e.g. that it refers to an agent). Following training and a period of memory consolidation, participants were probed for their knowledge about the novel affixes, and invited to generalize this knowledge to untrained words in speeded reading situations. Results showed that participants had extracted the underlying morphological regularities and encoded them in long-term memory in such a way that they could be used to facilitate rapid lexical processing (see also Mirković & Gaskell, 2016; Mirković, Forrest, & Gaskell, 2011).

Morphological regularities in English writing
This brief review demonstrates that readers learn to appreciate underlying morphological regularities through their experience with words, and that they use this knowledge in the service of rapid skilled reading. However, much less is known about the nature of this knowledge. When readers are accumulating morphological knowledge, what precisely is it that they are accumulating? The answer to this question requires a deep understanding of how morphological relationships impact on the mapping between print and meaning. Previous observations have highlighted the fact that morphology brings a degree of regularity to the otherwise-arbitrary mapping between print and meaning (e.g. Plaut & Gonnerman, 2000). While words that look similar generally are not similar in meaning (e.g. 'cat', 'cut', 'can'), stems occur in words with similar meanings (e.g. 'darkness', 'darkly'), and affixes alter the meanings of words in highly-predictable ways (e.g. 'cleaner', 'teacher', 'banker'; Rastle et al., 2000). However, what type of information do affixes possess that leads them to have this impact on the writing system?
Morphological regularities exist in both spoken and written language. However, Berg and Aronoff (2017) have recently provided quantitative evidence that written forms of four English suffixes (-OUS, -AL, -Y, -IC) carry unique information about lexical category that is not available in their phonological forms. Consider the word-final phonological string /-əs/ (Table 1). This string occurs in adjectives and nonadjectives with an approximately equal probability ('marvellous', 'cactus'; compare the rows of the table). However, when we look at the spellings (compare the table columns), these probabilities change greatly. Although in general, there are various ways to spell the /-əs/ sound sequence in English (e.g. 'bonus', 'atlas', 'nervous'), adjectives are virtually always spelled -OUS (see Table 1), and other spellings must be used for words of other lexical categories. The suffix spelling -OUS (but not its phonological counterpart /-əs/) therefore appears to be reserved for adjectives.
A similar dependency characterizes English inflectional suffixes. For example, it is well known that the spelling -ED in morphologically-complex words is strongly indicative of past tense (e.g. Carney, 1994). This suffix can be pronounced /-ɛd/. However, what has only recently come to light is that this word-final sound sequence cannot be spelled as -ED when it occurs in morphologically-simple words that could be mistaken for morphologically complex ones. In such words, a different spelling is typically used (e.g. 'instead'; Berg, Buchmann, Dybiec, & Fuhrhop, 2014). In fact, there are only a handful of exceptions to this dependency (e.g. seemingly complex but genuinely monomorphemic words ending in /-ɛd/ and spelled with -ED, 'moped', 'naked', 'wicked'; Berg et al., 2014). Thus, the spelling -ED is used almost exclusively to indicate the past. One consequence of this dependency is that the relationship between form and meaning appears to be stronger in written language than in spoken language. In the -ED /-ɛd/ example, for instance, /-ɛd/ does not carry the same amount of information about the morphological status of a word as -ED does. This observation goes against a commonly held belief that written language is somewhat impoverished compared to spoken language, in that, for instance, speech is characterized by intonation and stress, whereas this information is lost in writing (e.g. Seidenberg, 2017).
These observations suggest that the relationship between morphemes and meanings may be especially salient in writing. One could speculate that English spelling has evolved in such a way as to maximise the transmission of meaningful information through suffixation. Of the four derivational suffixes studied by Berg and Aronoff (2017), only -OUS was found to be an unambiguous marker of its lexical category, but other suffixes also marked meaning in some way. For example, the spelling -AL communicates adjective or noun status (i.e. not a verb), and adjectives ending in the sound sequence /-əl/ are most likely spelled -AL. Might these observations help us to understand why Lewis Carroll chose the spelling 'vorpal' as opposed to 'vorple' to describe the young hero's blade in Jabberwocky? Further, is it possible that readers are sensitive to the meaningful information that these different spellings convey, and use this information to assist their reading and spelling behaviour?
In this paper, we present a computational analysis that characterizes the relationship between spelling and lexical category across all derivational suffixes in English. This analysis allows us to determine whether this dependency is a general property of the English writing system or an idiosyncrasy limited to a few suffixes. In light of the findings of Berg and Aronoff (2017), we hypothesized that these dependencies are not all-or-nothing, but are graded in nature. We then investigate the possibility that readers capture and exploit probabilistic information about lexical category contained in suffixes when they read and spell (see also Kemp, Nilsson, & Arciuli, 2009).

Computational analysis
We propose that the relationship between spelling and lexical category can be described using two concepts: diagnosticity and specificity. Diagnosticity of a spelling describes how well that spelling predicts lexical category. For example, Berg and Aronoff (2017) observed that the spelling -OUS is strongly associated with adjective status. The specificity of a spelling indicates the extent to which that spelling is the preferred means of designating a particular lexical category for a given sound sequence, relative to other spellings. For example, Berg and Aronoff (2017) observed that given the sound sequence /-əs/, adjectives are virtually always spelled -OUS.
If we apply these new constructs to the linguistic analysis of Berg and Aronoff (2017), we see that the written suffix -OUS is both highly diagnostic for adjectives (its written form is never found in nouns or verbs; see Table 1) and highly specific for adjectives (if an /-əs/-word is not an adjective, a different spelling should be used, e.g. 'bonus'; see Table 1). In this case, the written suffix -OUS appears to be reserved for adjectives. Diagnosticity on its own is reminiscent of what is referred to as "cue validity" (conditional probability of a category given a cue; Beach, 1964). Conceptually, diagnosticity and specificity are distinct theoretical constructs, since one describes the mapping from spelling to meaning, while the other characterises the mapping from meaning to spelling. But in practice, are both of them necessary? It is a valid theoretical possibility that diagnosticity and specificity always co-vary, thus dichotomising the English derivational space into two categories (diagnostic and specific vs not diagnostic and not specific). Here we hypothesise that the bidirectional relationship between the form and the meaning cannot be captured using a unitary concept (e.g. cue validity), and needs to be expressed through the combination of diagnosticity and specificity (probability of a cue given a category).
Berg and Aronoff (2017) investigated only four suffixes. The aim of our computational analysis is to determine the strength of the relationship between English suffixes and lexical categories in general, and to operationalise this relationship in terms of diagnosticity and specificity. There are multiple potential ways to measure diagnosticity and specificity. One could, for example, use a naïve discriminative learning (NDL) approach, similar to that used by Baayen et al. (2011). These authors used the Rescorla-Wagner rule to estimate how predictive low-level orthographic cues (unigrams and bigrams) were of several meaning outcomes in Serbian and in English. These meaning outcomes included case marking (e.g. "nominative", "genitive") and number in Serbian, and localist representations of words (e.g. 'house') and affixes (e.g. -NESS) in English. Though Baayen et al. (2011) did not focus on the relationship between orthographic cues and grammatical categories, it may be possible to extend their approach to cover this problem. In our paper, we chose an alternative approach that relies on explicit morpheme representations. We measured the amount of meaningful (i.e. category) information in these morphemes and determined if our estimates explain human behaviour. Whether or not estimates obtained through a different approach, such as NDL modelling, will have a better fit to behavioural data is an open empirical question.

Table 1
The dependency between the spelling of /-əs/ and lexical category (adapted from Berg & Aronoff, 2017).

Method
We extracted all single-word entries from the CELEX database (Baayen, Piepenbrock, & van Rijn, 1993), excluding any entries with spaces or hyphens. Separate entries were created for words that could express different lexical categories (e.g. 'play' as a noun or as a verb). Using this list of words as our database, we extracted all spellings annotated as derivational suffixes in CELEX that occur in at least three different words. This process revealed 154 written suffixes in total.
The computation of diagnosticity and specificity required us to extract all words containing the 154 suffixes identified. However, in addition to extracting genuinely suffixed words, we also needed to capture words in which the suffix spellings do not function as genuine morphemes (e.g. the -ER in 'corner'). We included such pseudoaffixed words because there is evidence that people identify morphemes within these (Rastle et al., 2004). In order to identify these words, we extracted all words containing the spelling of each suffix, as long as the remaining (non-suffix) portion comprised a legitimate stem with at least one phonological vowel (including stems with alternations, i.e. orthographically conditioned variations, such as COMPLI-/COMPLY-). This process meant that words like 'vanish' were extracted for the -ISH suffix but that words like 'dish', 'accomplish', and 'catfish' were not, because these do not contain a legitimate stem.³ The number of words extracted for each suffix is presented in Appendix A. Our analyses consider three major lexical categories: nouns, adjectives, and verbs. Only the most frequent pronunciation was considered for each suffix.
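The extraction criterion described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `has_suffix_with_stem`, the toy stem set, and the use of vowel letters as a stand-in for "at least one phonological vowel" are all hypothetical, and stem alternations (e.g. COMPLI-/COMPLY-) are not handled.

```python
# Hypothetical sketch of the extraction filter; the stem list and the
# orthographic vowel test are illustrative assumptions, not the CELEX-based
# procedure itself.
VOWEL_LETTERS = set("aeiouy")


def has_suffix_with_stem(word, suffix, stems):
    """Return True if `word` ends in `suffix` and what remains is a known stem."""
    word, suffix = word.lower(), suffix.lower()
    if not word.endswith(suffix) or len(word) <= len(suffix):
        return False
    remainder = word[: -len(suffix)]
    # Require a vowel letter in the remainder (stand-in for a phonological vowel).
    if not any(ch in VOWEL_LETTERS for ch in remainder):
        return False
    return remainder in stems


# Toy stem inventory, invented for this example.
toy_stems = {"van", "corn", "dark", "teach"}
candidates = ["vanish", "dish", "accomplish", "catfish"]
extracted = [w for w in candidates if has_suffix_with_stem(w, "ish", toy_stems)]
```

With this toy inventory, only 'vanish' passes the filter for -ISH, and a pseudoaffixed word such as 'corner' passes for -ER because CORN is a listed stem.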
We define diagnosticity as the strength with which a suffix spelling predicts lexical category. For a particular suffix, type diagnosticity is calculated by dividing the number of words ending in this suffix and falling into this category by the total number of words that contain the suffix. Diagnosticity is thus expressed as a ratio ranging from 0 to 1. For example, -NIK is not diagnostic for adjectives (diagnosticity is 0), whereas -TUDE unambiguously signals a noun (diagnosticity is 1). A suffix like -IAN is present in nouns ('magician'), as well as in other categories ('Amazonian'), and so is not particularly diagnostic for nouns (diagnosticity of 0.54). A second measure of diagnosticity, based on word tokens, was also calculated. This measure was obtained by dividing the summed frequency of words ending in the suffix and falling into a particular category by the summed frequency of all words that contain the suffix.
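The two diagnosticity ratios defined above can be illustrated with a minimal sketch. The `Entry` record, the function name, and the word frequencies are invented for the example; they are not values from CELEX, and the toy result therefore differs from the reported 0.54 for -IAN.

```python
from collections import namedtuple

# One record per (word, lexical category) pair; frequencies are invented.
Entry = namedtuple("Entry", ["word", "category", "frequency"])


def diagnosticity(entries, category, use_tokens=False):
    """Share of suffix-bearing entries (types), or of their summed
    frequency (tokens), that falls into `category`."""
    weight = (lambda e: e.frequency) if use_tokens else (lambda e: 1)
    total = sum(weight(e) for e in entries)
    if total == 0:
        raise ValueError("no entries (or zero total frequency) for this suffix")
    in_category = sum(weight(e) for e in entries if e.category == category)
    return in_category / total


# Toy entries for words containing the spelling -IAN.
ian_words = [
    Entry("magician", "noun", 30),
    Entry("musician", "noun", 50),
    Entry("amazonian", "adjective", 5),
    Entry("reptilian", "adjective", 15),
]
type_diag = diagnosticity(ian_words, "noun")                    # 2 of 4 types
token_diag = diagnosticity(ian_words, "noun", use_tokens=True)  # 80 of 100 tokens
```

The same function covers both measures because the token version differs only in weighting each entry by its frequency rather than counting it once.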
We define specificity as the extent to which a suffix spelling is preferred for denoting a particular lexical category, amongst all possible spellings of a sound sequence. In order to calculate specificity, we identified the most frequent pronunciation of each suffix, and then extracted all words from our single-word database containing that word-final sound sequence. Note that CELEX lists surface phonological forms (not phonetic/allophonic forms). We stayed as close to the CELEX forms as we could to avoid theoretical controversies. For example, the sound sequence /-ləs/⁴ associated with suffix -LESS also occurs in words with other word-final spellings (e.g. 'necklace', 'malice'). For words of a given lexical category that end in a given sound (e.g. all /-ləs/-adjectives), we identified what proportion of these contain the suffix spelling (e.g. -LESS). Like diagnosticity, specificity is expressed as a ratio ranging from 0 to 1. Spellings with a specificity of 0 for a given category never go with this category (e.g. -IOUS never occurs in /-əs/-nouns; cf. 'bonus'), whereas the value of 1 indicates that the spelling has full specificity (e.g. when a noun ends in /-eɪtə/, it must be spelled with -ATOR). Medium values of specificity indicate that there are multiple competing spellings for a given category and sound, as with -IER and -EER that can both be used in /-ɪə/-nouns (cf. 'cashier', 'engineer'; specificity of 0.49 and 0.46, respectively). A second measure of specificity, based on word tokens, was obtained by dividing the summed frequency of words ending in the phonological and orthographic sequence and falling into a particular lexical category by the summed frequency of all words that contain the spelling and fall into the category.
Four examples of suffixes that differ on diagnosticity and specificity are given in Fig. 1. In this figure, we can see that -ICAL in the upper-left panel is reserved for communicating adjective status. If a word contains the -ICAL spelling, it is almost certainly an adjective (i.e. highly diagnostic); and if the sound sequence /-ɪkəl/ is an adjective, then it must be spelled -ICAL (i.e. highly specific). However, it is clear from these four examples that these properties can vary orthogonally in English suffixes.
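Type specificity, as defined above, can be sketched in the same spirit. The function name and the toy word list are assumptions made for illustration; the list simply treats all five words as ending in /-ləs/, following the examples in the text, and only the type-based measure is shown.

```python
def type_specificity(sound_words, category, spelling):
    """Among words of `category` that end in a given sound sequence,
    the proportion whose written form ends in `spelling`."""
    in_category = [word for word, cat in sound_words if cat == category]
    if not in_category:
        raise ValueError("no words of this category end in the sound sequence")
    matching = [word for word in in_category if word.endswith(spelling)]
    return len(matching) / len(in_category)


# Toy (word, category) pairs, all assumed to end in the sound sequence /-ləs/.
las_words = [
    ("careless", "adjective"),
    ("endless", "adjective"),
    ("hopeless", "adjective"),
    ("necklace", "noun"),
    ("malice", "noun"),
]
less_for_adjectives = type_specificity(las_words, "adjective", "less")
less_for_nouns = type_specificity(las_words, "noun", "less")
ice_for_nouns = type_specificity(las_words, "noun", "ice")
```

In this toy set -LESS is fully specific for adjectives and never used for nouns, while the noun spellings compete with one another, which is the pattern that medium specificity values describe.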

Results
Fig. 2 plots diagnosticity and specificity values for English derivational suffixes. The majority of suffixes are either diagnostic or specific or both. Mean type diagnosticity for suffixes is 0.78 (the corresponding token value is 0.77), mean specificity is 0.82 (token is 0.83). The correlation between the type variables is 0.53, p < 0.0001; that between the token variables is 0.47, p < 0.0001.⁵ This indicates that the two constructs are related although they capture distinct properties of suffixes. The full range of diagnosticity and specificity values is contained in Appendix A.
It is immediately apparent that a substantial number of suffixes have diagnosticity and specificity values of 1. This means that these suffix spellings are always associated with a particular lexical category (diagnosticity) and that these suffix spellings are the only means of communicating a particular lexical category for a given word-final sound sequence (specificity). Is it possible that this very strong relationship between spelling and lexical category is due to the presence of suffixes in our sample that occur only rarely? In order to determine this, we excluded all suffixes that occur in fewer than 20 different words in the CELEX database (Baayen et al., 1993). For the remaining 95 suffixes, the mean diagnosticity and specificity values were virtually unchanged (mean type and token diagnosticity values were 0.76; mean type specificity was 0.83; mean token specificity was 0.82).

Conclusion
We have characterised spellings of all English suffixes in terms of diagnosticity (the extent to which suffix spellings predict lexical category) and specificity (the extent to which suffix spellings are preferred for denoting a particular lexical category for a given sound sequence). Our analyses demonstrate that the majority of written English derivational suffixes are diagnostic and specific for some lexical category. That is, suffix spellings signal the lexical category of their carrier word, and on the other hand, are preferred spellings for denoting this lexical category. This combination of measures is important because it gives rise to the intriguing hypothesis that the proliferation of spellings for particular sound sequences in English has allowed certain spellings to become reserved for the communication of meaningful information.
Our analysis extends and deepens the work of Berg and Aronoff (2017) regarding the properties of four English suffixes. We have demonstrated that our concepts diagnosticity and specificity can be used to describe the bidirectional dependency between suffix spelling and lexical category, and we have shown that this dependency appears to be ubiquitous in English derivational suffixes. In what follows, we investigate to what extent skilled readers extract these regularities from their experience with the writing system, and use this knowledge when they read and spell. In our paper, we focus solely on what are usually referred to as "feedforward" effects. That is, we hypothesise that diagnosticity, a measure of spelling-to-meaning consistency, would impact on tasks that involve the mapping of print to meaning (i.e. semantic categorisation, reading), while specificity, a measure of meaning-to-spelling consistency, would impact on spelling behaviour. That said, perception and production are intertwined processes, and one might expect "feedback" effects of diagnosticity and specificity as well (e.g. diagnosticity influencing the production of spelling; see Ziegler, Petrova, & Ferrand, 2008; Kessler, Treiman & Mullennix, 2008, for similar findings in reading aloud). This is an issue that needs to be investigated in future studies using appropriate experimental designs (e.g. a two-by-two factorial manipulation).

Experiments
Three psychological experiments tested skilled readers' knowledge and exploitation of the relationship between suffix spelling and lexical category. The three experiments comprised a lexical category judgement task (Experiment 1), a spelling task (Experiment 2), and a sentence reading task (Experiment 3). We further tested whether gradations in diagnosticity and specificity across different suffixes were mirrored in participants' behaviour. We predicted that in tasks requiring access to grammatical information from printed spellings (lexical category judgement and sentence reading), spellings with higher diagnosticity should yield stronger behavioural effects compared to spellings with weaker diagnosticity. Similarly, we predicted that in tasks requiring production of spellings, stronger behavioural effects would be observed for spellings with higher specificity for predicted categories. Data for all experiments can be found in the OSF storage for this project (https://osf.io/hac5j/).

Experiment 1
Experiment 1 tested skilled readers' explicit knowledge of dependencies between spelling and lexical category. Using our linguistic analysis, we selected suffixes that are diagnostic for adjectives or nouns, and created nonwords ending in these suffixes. We then asked adult skilled readers to make decisions as to whether these nonwords appear more like adjectives or nouns. If people are sensitive to spellings' diagnostic properties, then their responses should be predictable from nonword spelling.

Method

Participants.
Participants were 46 undergraduate students at Royal Holloway, University of London. They were native English speakers with no history of reading, spelling, or learning difficulties. This experiment was conducted at the end of a longer, unrelated testing session. Participants were given course credit or paid at the rate of £10/h.

Stimuli.
Twenty suffixes that were diagnostic for the noun or adjective category were selected (see Table 2). These suffixes all had a high type diagnosticity, i.e. greater than or equal to 0.50, and occurred in at least 20 different words.
Eighty non-existing stems with a CVC structure were constructed using the Wuggy pseudoword generator (Keuleers & Brysbaert, 2010; http://crr.ugent.be/programs-data/wuggy). Each stem was used only once. Nonwords were orthographically legal and pronounceable, and we did not include any pseudohomophones (e.g. 'sabal' may be pronounced 'sable'). Further, none of the nonwords were close neighbours of existing words; on average, they had 0.41 orthographic neighbours (with a median of 0 and a range of 0 to 4 orthographic neighbours; Coltheart, Davelaar, Jonasson, & Besner, 1977). See Appendix B for the complete list of nonword stimuli. Nonmorphological endings were not included in this experiment, because its goal was not to adjudicate between types of cues used in lexical category judgement (e.g. orthographic vs morphological).
To minimise potential priming effects, stimuli were divided into two experimental lists with 40 items per list. Participants saw only one list; each suffix occurred twice. Four practice trials were constructed using suffixes that were not used in the main experiment (-IE, -AL, -ULAR, -LING).
Procedure.
Participants were seated in a quiet room and read the following instructions: "You will be asked to decide if a letter string looks like a noun or an adjective. NOUN is a person, animal, place, thing, or idea: for example, AUNT, CAT, FOREST, CUP, LOVE. ADJECTIVE is an attribute of a noun: for example, SWEET, RED, SIMPLE". Following practice trials, nonwords were presented one by one in the centre of the screen in Courier New 12-point font, and participants indicated their adjective/noun decisions via a button press. Instructions did not emphasise speed, although participants needed to make their decisions within a two-second timeout period. Experimental items were scrambled and presented in a different random order for each participant. The experiment lasted less than five minutes. The DMDX software (Forster & Forster, 2003) was used for stimulus presentation and data recording.

Results
We used the lme4 package in R for generalised mixed modelling analysis (Bates, Maechler, Bolker, & Walker, 2014) on 1625 data points (11.7% of data points were missing due to timeouts or accidental button presses that did not correspond to one of the two designated response buttons). The analysis considered the impact of suffix condition (adjective or noun suffix) on the binomial response (adjective or noun response). We included two random intercepts, one for subjects and one for suffixes. This analysis was expressed in the model glmer(response ∼ condition + (1|subject) + (1|suffix), data, family = "binomial"). Results showed a main effect of suffix condition such that adjective-diagnostic suffixes elicited more "adjective" responses.
Further analyses were conducted to investigate whether participants' knowledge of the suffix spellings' diagnosticity mirrors statistics of the writing system. That is, we asked whether participants were better at classifying nonwords that contain strongly diagnostic suffixes than those with weaker diagnosticity. In order to investigate this statistically, we multiplied our diagnosticity measure by −1 for noun-diagnostic suffixes and substituted this measure in our model, in place of the condition variable. This analysis revealed a main effect of diagnosticity on response type (B = 0.87, z = 4.13, p < 0.0001; see Footnote 6), such that more diagnostic suffixes were more likely to be ascribed to the category that they denote. We performed this analysis separately for adjective- and noun-diagnostic suffixes to make sure that this effect was not due to the sign-transformation of the diagnosticity variable, and observed the effect of diagnosticity on both adjective-specific suffixes (B = 4.56, z = 2.88, p < 0.01, i.e. the highest absolute values for diagnosticity yield more adjective responses) and noun-diagnostic suffixes (B = −3.50, z = −2.05, p < 0.05, i.e. the highest absolute values for diagnosticity yield more noun responses; see Fig. 4). The results based on token measures were similar (main effect of diagnosticity, and an effect of diagnosticity on noun-diagnostic suffixes, see Footnote 6), but adjective-specific suffixes did not show an effect of diagnosticity.
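The sign transformation described above can be illustrated with a minimal sketch. It assumes that each suffix carries a non-negative diagnosticity score and a label for the category it cues; the function name and the example values are hypothetical and are not taken from our stimulus set.

```python
def signed_diagnosticity(diagnosticity, cued_category):
    """Return diagnosticity with a sign that encodes the cued category.

    Adjective-diagnostic suffixes keep a positive value and noun-diagnostic
    suffixes are multiplied by -1, so that one continuous predictor can
    replace the two-level condition variable.
    """
    if diagnosticity < 0:
        raise ValueError("diagnosticity is expected to be non-negative")
    if cued_category == "adjective":
        return diagnosticity
    if cued_category == "noun":
        return -diagnosticity
    raise ValueError(f"unexpected category: {cued_category!r}")


# Hypothetical (suffix label, diagnosticity, cued category) triples.
items = [("SUFFIX_A", 0.9, "adjective"), ("SUFFIX_B", 0.6, "noun")]
predictor = {name: signed_diagnosticity(d, cat) for name, d, cat in items}
```

On this coding, a positive slope for the predictor means that stronger adjective cues attract more "adjective" responses and stronger noun cues attract fewer, which is why the analysis was also run separately within each category.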

Discussion
Results suggest that skilled readers extract information about lexical category from written language. When nonwords include suffixes that are diagnostic for adjectives, participants classify these as adjectives. Conversely, when nonwords include suffixes that are diagnostic for nouns, participants classify these as nouns. The strength of these effects mirrors the diagnosticity strength of English derivational suffixes. Token-based analyses showed a weaker effect of diagnosticity. We return to this result in the General Discussion.

Experiment 2
Experiment 2 was a spelling experiment designed to test adults' knowledge of the extent to which particular spellings are used to denote lexical category for a given sound sequence. We created nonwords with endings that could be spelled in more than one way. We presented these nonwords auditorily in meaningful context conditions that biased participants to interpret them as nouns, adjectives, or verbs, and recorded participants' spellings of the nonwords. If participants have stored knowledge of the specificity of spellings for lexical category, then context should influence the spellings that they produce, such that these are congruent with the predicted category.

Stimuli.
Eleven suffix spellings that had a high specificity for noun, adjective, or verb lexical category were selected. These were the target spellings. Critically, the most frequent pronunciation of these suffixes could be spelled in more than one way (see Table 3).
Sixty-six non-existing stems were selected from the same pool as in Experiment 1. Stems were joined with suffixes at random; each suffix was joined with six different stems. None of the nonwords could be pronounced identically or similarly to existing words. The list of nonword transcriptions is available in Appendix C.
The nonwords were pronounced by an adult female native speaker of British English and recorded in stereo at a sampling rate of 22,050 Hz. Items were presented to her out of context in the IPA transcription format (see Appendix C), i.e. one transcription was used for one phonological sequence, regardless of the context the nonword would be assigned to. Recordings were subjected to the noise reduction procedure in Cool Edit Pro (2002, version 2.0, Syntrillium Software Corporation, Phoenix, AZ).
Sixty-six sentences were designed following one of three possible context templates (noun, adjective, or verb template; see Table 4 for an example of each template). Nonwords in the noun context template occupied the syntactic position of a direct/indirect object following an adjective, and so could only be interpreted as nouns. Nonwords in adjective contexts appeared after the verbs 'prove', 'stay', 'grow', 'smell', 'appear', 'taste', 'sound', 'seem', 'look', 'turn', 'become', and 'remain', which maximised the probability that they would be perceived as adjectives. Nonwords in verb contexts followed an adverb. For each suffix's target spelling, two context types were selected: one that was congruent with the category that this spelling denoted and one that was incongruent with it. Naturally, the pairing of congruent and incongruent context types was different across suffixes (see Appendix C for the assignment of phonological nonwords to congruent/incongruent contexts). Note that no matching across conditions was required, because the same phonological ending was used in two different context conditions. Thus, any effect could not be attributed to the properties of a phonological sequence or its possible spellings. For instance, had it been the spelling frequency that influenced participants' performance, we would observe no difference between the context conditions at all (i.e. the most frequent spelling would be used regardless of the predicted lexical category).
Each participant heard six different nonwords containing each suffix: three times in a context that was congruent with the target spelling of the suffix, and three times in a context that was incongruent with it. For instance, three different /-əs/-nonwords were embedded in sentences where they functioned as adjectives (congruent context where the target spelling -OUS is expected), and three different /-əs/-nonwords were embedded in sentences where they functioned as nouns (any spelling other than -OUS is expected). See Appendix C for the complete list of stimulus materials.
Each subject saw 66 sentences. Two additional practice trials were constructed using nonwords and sentences that were not used in the main experiment.

Procedure.
The experiment was programmed using E-Prime 2.0 (Psychology Software Tools, Pittsburgh, PA, USA). Trials began with a 1000-ms blank screen. Participants then saw a printed sentence frame with a gap denoting placement of the target nonword. This sentence frame stayed on screen for 4 s, after which a nonword was played to participants through headphones. After the nonword finished playing, participants were immediately prompted to type their spelling. Time for responding was unrestricted, and participants were allowed to make corrections to their responses prior to moving to the next trial. Sentences were presented in a different random order for each participant. The duration of the experiment was under 10 min.

Results
Suffix spellings were manually extracted from whole nonword spellings (1914 analysable data points). The dependent variable was binary and coded whether the response was the target spelling for this suffix or not (see Table 3 for the list of target spellings associated with spoken endings). The analysis considered the impact of context condition (congruent or incongruent context) on the binomial response (target spelling or other spelling). The analyses were otherwise identical to those in Experiment 1. As predicted, results showed that the …
Further analyses were conducted to investigate whether participants' knowledge of suffix specificity for lexical category mirrors statistics of the writing system. In other words, is the effect of condition (congruent vs incongruent) larger for suffixes whose spelling is highly specific for lexical category than for those with a weaker specificity? In order to investigate this hypothesis, we added the continuous specificity metric for the target spelling, as well as its interaction with the condition variable, to the linear model. Note that the values of specificity for the same spelling are different for different context types (e.g. -AN is specific for adjectives and therefore congruent with adjectives, its specificity value for adjectives is 0.76; but -AN cannot be used in verbs, its specificity for verbs is 0). As predicted, the interaction between specificity and context was significant, such that the target spelling was used increasingly more often in the congruent condition as its specificity increased (B = 1.9, z = 2.8, p < 0.01; see Footnote 7); there was no effect of specificity in the incongruent condition. These results are visualised in Fig. 6.

Fig. 4. Explaining variability across suffixes in Experiment 1. The probability of responding "adjective" to a nonword is high if this nonword's suffix is strongly diagnostic for the "adjective" lexical category, and it is low if this nonword's spelling is strongly diagnostic for a noun.

Table 4
Examples of sentence contexts used in Experiments 2 and 3. The list of IPA transcriptions for the recordings that were played to participants in Experiment 2 can be found in Appendix C. In Experiment 3, nonword spellings were presented in place of gaps; see Appendix D.

Context type    Example
Noun            The candidate showed incredible < ________ > in a difficult situation.
Adjective       The unpleasant officer remained < ________ > throughout his entire service.
Verb            The worker was asked to carefully < ________ > the product for sale.

Discussion
These results indicate that skilled readers possess knowledge that the different possible spellings for a sound sequence convey different information. When presented with a spoken word in a sentence context, they produced spellings that are appropriate for the predicted lexical category. Further, the strength of this effect mirrored the degree of specificity of suffix spellings for lexical category in the writing system. The higher the specificity of spellings, the more frequently they were produced by our participants in the congruent context condition. This held true for the congruent condition because the target spellings were strongly linked with their respective lexical categories. When there was no such link, as was the case for the incongruent condition, the proportion of target spellings was stable and was not influenced by target spellings' specificity.

Experiment 3
Experiments 1 and 2 assessed skilled readers' knowledge of the relationship between suffix spelling and lexical category using paradigms that required an explicit response. Experiment 3 investigates this knowledge in a more natural sentence reading task using eye-tracking methodology. We designed nonwords similar to those in previous experiments, and placed these in sentence contexts that were congruent or incongruent with the nonword suffix spellings. We anticipated that nonwords placed in incongruent contexts would yield processing difficulties, and that these would be reflected in eye-movement behaviour. We did not have any concrete predictions as to which eye-movement measures would be affected by the diagnosticity manipulation, so we chose to monitor a variety of typically studied measures that characterise skips, fixations, reading time, and regressions (see Rayner, Warren, Juhasz, & Liversedge, 2004; Warren, White, & Reichle, 2009). Some of these measures are interpreted as tapping into early processing (skipping rate, first fixation duration), while others index later processing stages involving contextual integration (regressions, go-past times; Rayner & Liversedge, 2011).
3.3.1. Method
3.3.1.1. Participants. Forty-seven undergraduate students at Royal Holloway, University of London were recruited following the same procedure as in Experiments 1 and 2.

Stimuli.
Eighty experimental nonwords with suffix spellings diagnostic for adjectives (10 suffixes) or nouns (10 suffixes) were selected from Experiment 1. Forty filler nonwords whose suffix spellings were diagnostic for verbs were added so that sentence contexts would be more varied. These filler nonwords were not analysed, because the majority of verb suffixes do not discriminate lexical category (i.e. type diagnosticity is less than 0.5). The ten verb suffixes that we used as fillers were -ATE, -ADE, -EER, -LE, -YSE, -IDE, -OUR, -ICE, -URE, -ART. Orthographic neighbourhood size for experimental nonwords ranged between 0 and 4 (mean was 0.45, median was 0).
One hundred and twenty sentences were designed following the noun, adjective, and verb sentence templates used in Experiment 2. Some of these sentences were identical to sentences in that experiment. The assignment of nonwords to congruent/incongruent context types can be found in Appendix D. Sentences had an average of 10.5 words each (ranging between 8 and 15 words). Every sentence had at least four words before the target and at least three words after it. Sentence parts preceding targets were matched across conditions on the number of letters (congruent: 30.48 letters on average, incongruent: 31.70 letters, p > 0.3). Note that the number of words pre-target was slightly different across conditions (congruent: 4.75, incongruent: 5.23, t(64) = 2.07, p < 0.05). Post-target contexts were matched on the number of letters (25.02 vs 26.33, respectively, p > 0.05) and the number of words (4.38 vs 4.56, respectively, p > 0.05). Pre-target words were always at least five characters long (with no difference across conditions: 7.33 vs 7.15, p > 0.6), to reduce the likelihood of these words being skipped.
Each of the 30 suffixes appeared with four different stems, twice in a congruent sentence context, and twice in an incongruent one. We used two experimental lists to counterbalance the assignment of nonwords to the congruent/incongruent conditions across participants (List 1 is presented in Appendix D). This counterbalancing procedure was implemented to ensure that any effects of the congruency variable were not the result of potential item-level differences across nonwords. Six practice sentences were designed following the same constraints as in the main experiment.
3.3.1.3. Procedure. Participants were seated at a viewing distance of 70 cm. This position was maintained with a table-mounted head and chin rest. The eye-tracking task was conducted on an LED monitor (dimensions 1920 × 1080, running at a 100-Hz refresh rate). The eye-tracker was an EyeLink 1000 (SR Research Ltd., Ottawa, Ontario, Canada) recording eye position (right eye) every 1 ms during sentence reading. A 9-point calibration and validation protocol was performed at the beginning of the session. A drift-correction point was displayed at the beginning of every trial, positioned at the location of the sentence onset. An additional check on recording accuracy was performed by displaying a gaze-contingent square of 2.5 characters' width (0.8°) in this same location, following completion of the drift correction. Participants were required to fixate this square for at least 40 ms to trigger the onset of the sentence. If a participant failed to do this, the trial was ended and discarded from further analysis, and the calibration procedure was repeated before the next trial was presented.

All sentences were displayed in black Courier font (size 14pt, horizontal character width 0.3°) on a light grey background, using SR Research ExperimentBuilder software (SR Research, Ottawa, Ontario, Canada). The experiment lasted 30-40 min. Participants could pause the experiment at any point.
Each participant read six practice sentences at the start of the experiment, and then proceeded immediately onto the 120 experimental sentences. The order of presentation of trials was randomised for each participant. Participants were warned that each sentence would contain a nonword and were asked to read for comprehension. Single YES/NO comprehension questions were presented immediately after the relevant sentence on 50% of trials to ensure engagement with the task. One-third of these questions referred to the pre-target context (e.g. 'Is the sentence about a young man?'), one-third referred to the post-target context ('Was the scholar studying something old?'), and one-third tested the retention of the nonwords ('Was the solution cugity?'). When participants finished reading a sentence, they pressed a key to move on to the next trial.
Fig. 6. The effect of specificity on the probability of the target spelling in Experiment 2. More target spellings were observed in congruent contexts for suffixes that are highly specific for their lexical categories than for those that are less specific for their lexical categories. When the context was incongruent, there was no effect of suffix specificity on the probability of the target spelling.

Results
Eye-tracking data were processed using the SR Research DataViewer software (SR Research, Ottawa, Ontario, Canada). Fixation data were cleaned as follows. Short fixations of less than 80 ms that fell within 0.5° of another fixation were allocated to this longer fixation. Similarly, fixations of less than 40 ms were grouped with larger fixations within a 1.25° area. Fixations falling outside of interest areas that were drawn around each word in the sentence (with vertical limits of 5° above and below each word) were discarded, as were any remaining fixations shorter than 80 ms or longer than 1500 ms. All trials were also visually inspected in order to identify those where participants had not read the sentences in full, or where any considerable deviation in the eye-tracking trace was evident during the trial. As a result of these checks, 4% of trials were discarded. Average accuracy on comprehension trials was 86% (ranging from 66% to 98%).
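The merging and exclusion thresholds above can be made concrete with a minimal sketch. This is not the DataViewer implementation: it assumes that fixations are listed in temporal order, that distance is measured along the horizontal axis only, and that a short fixation is merged into the nearer of its two temporally adjacent fixations. The class and function names are hypothetical, and the interest-area check is omitted.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x_deg: float        # horizontal position in degrees of visual angle
    duration_ms: float  # fixation duration in milliseconds


def merge_short(fixations, max_duration_ms, max_distance_deg):
    """Merge fixations shorter than max_duration_ms into an adjacent,
    longer fixation lying within max_distance_deg (nearest one wins)."""
    result = [Fixation(f.x_deg, f.duration_ms) for f in fixations]  # work on copies
    i = 0
    while i < len(result):
        fix = result[i]
        if fix.duration_ms < max_duration_ms:
            candidates = [
                j for j in (i - 1, i + 1)
                if 0 <= j < len(result)
                and result[j].duration_ms > fix.duration_ms
                and abs(result[j].x_deg - fix.x_deg) <= max_distance_deg
            ]
            if candidates:
                target = min(candidates, key=lambda j: abs(result[j].x_deg - fix.x_deg))
                result[target].duration_ms += fix.duration_ms
                del result[i]
                continue  # re-examine whatever now sits at index i
        i += 1
    return result


def clean_fixations(fixations):
    """Apply both merging passes, then drop fixations outside 80-1500 ms."""
    merged = merge_short(fixations, max_duration_ms=80, max_distance_deg=0.5)
    merged = merge_short(merged, max_duration_ms=40, max_distance_deg=1.25)
    return [f for f in merged if 80 <= f.duration_ms <= 1500]


# Hypothetical trial: positions in degrees, durations in ms.
raw = [Fixation(1.0, 200), Fixation(1.3, 60), Fixation(5.0, 30),
       Fixation(5.9, 250), Fixation(9.0, 1600), Fixation(12.0, 70)]
cleaned = clean_fixations(raw)
```

In this example the 60-ms fixation is absorbed in the first pass and the 30-ms fixation in the second, while the 1600-ms and 70-ms fixations are discarded by the final duration filter.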
The region of interest for analysis was the nonword. We considered word-skipping, first fixation durations, first run dwell time (the sum of all fixation durations during first pass), go-past times, regressions to the target from later parts of the sentence, and regressions from the target to earlier regions of the sentence. The meaning of each measure is explained below. We implemented linear mixed models or generalised linear models as appropriate (the latter type of models were used where the dependent variable was binary, i.e. in the case of skips and regressions). All models included trial order as a main effect, and random intercepts for subjects and for suffixes. However, in the interest of space we report only those effects pertaining to our main variable of interest, i.e. congruency. The full report on all variables can be found in the OSF storage for this project.
The effects of congruency on all dependent variables studied are summarised in Table 5.
Word-skipping. The analysis considered the binary variable of whether the target nonword was skipped (i.e. no fixation occurred in first-pass reading). Overall, 9% of target nonwords were skipped (each participant made 7 target skips on average, ranging from 0 to 25 target skips; note that there were 80 potentially analysable trials per participant). There was no effect of congruency on skipping behaviour (B = −0.11, z = −0.92, p > 0.1).
First fixation duration. This analysis considered the impact of congruency on the duration of the first fixation event on the target (in ms). There was no effect of congruency on first fixation durations (B = 4.89, t = 1.07, p > 0.1).
First run dwell time. This analysis considered the impact of congruency on the sum of durations of all fixations on the target during the first pass (in ms). There was no effect of congruency on the first run dwell time (B = 6.73, t = 0.92, p > 0.1).
Go-past time. This analysis considered the impact of congruency on the summed fixation duration (in ms) from when the target was first fixated until the eyes moved to the right, including any fixations made following regressions to earlier words in the sentence. There was an effect of congruency on go-past time (B = 41.22, t = 3.84, p < 0.0001) such that go-past times were longer for incongruent trials than for congruent ones.
Regressions-in. This analysis considered the impact of congruency on whether the target received at least one regression from later parts of the sentence. Thirty-four percent of all trials available for analysis contained such regressions. There was no effect of congruency on this measure (B = −0.12, z = 0.14, p > 0.1).
Regressions-out. This analysis considered the impact of congruency on whether regression(s) were made from the target to previous parts of the sentence prior to leaving in a forward direction. Fifteen percent of all trials available for analysis contained such regressions. The effect of congruency on regressions-out was significant in the expected direction, i.e. incongruent targets were more likely to elicit regressions to earlier regions of the sentence (B = 0.30, z = 3.12, p < 0.001).
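To make the definitions of these measures concrete, the following is a minimal sketch of how they could be derived from a single trial. It assumes that each fixation has already been assigned to a word index (0 for the first word) and that fixations are in temporal order; the function name and the example sequence are hypothetical, and the code is not the DataViewer report logic.

```python
def target_measures(fixations, target):
    """Compute first-pass measures for one trial.

    fixations: list of (word_index, duration_ms) tuples in temporal order.
    target:    word index of the nonword.
    """
    # The target is skipped if a later word is fixated before the target is.
    first = None
    for i, (word, _) in enumerate(fixations):
        if word > target:
            break
        if word == target:
            first = i
            break

    # Regression-in: a fixation on the target arriving from a later word.
    regression_in = any(
        prev_word > target and word == target
        for (prev_word, _), (word, _) in zip(fixations, fixations[1:])
    )

    if first is None:
        return {"skipped": True, "first_fixation": None, "first_run_dwell": None,
                "go_past": None, "regression_out": None,
                "regression_in": regression_in}

    # First run dwell time: consecutive fixations on the target from first entry.
    dwell = 0
    i = first
    while i < len(fixations) and fixations[i][0] == target:
        dwell += fixations[i][1]
        i += 1

    # Go-past time: everything from first entry until the eyes pass the target,
    # including fixations on earlier words reached by regressions.
    go_past = 0
    regression_out = False
    for word, duration in fixations[first:]:
        if word > target:
            break
        go_past += duration
        if word < target:
            regression_out = True

    return {"skipped": False, "first_fixation": fixations[first][1],
            "first_run_dwell": dwell, "go_past": go_past,
            "regression_out": regression_out, "regression_in": regression_in}


# Hypothetical trial in which word 2 is the target nonword.
trial = [(0, 200), (1, 180), (2, 220), (2, 150), (1, 190), (2, 210), (3, 230), (2, 160)]
measures = target_measures(trial, target=2)
```

For this sequence the first fixation duration is 220 ms, the first run dwell time is 370 ms, and the go-past time is 770 ms, because the regression to word 1 and the return to the target are both counted before the eyes move on to word 3.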
In sum, we found significant effects of congruency on go-past times and regressions made from the target nonword back to the preceding part of the sentence. These two eye-movement measures in particular are associated with disambiguation processes resulting from difficulty in integrating an encountered word into the sentence context. We did not find any significant effect of congruency on any of the other measures reported, all of which are thought to reflect initial lexical processing (e.g. Rayner & Liversedge, 2011). These results therefore indicate that the category information becomes important following initial lexical processing, when the reader has enough contextual information to understand the sentence and detect any incongruencies. To check if these effects could be explained by a difference in the number of words pre-target (more words pre-target in incongruent contexts than in congruent ones), we added the interaction of this number with the congruency variable to the corresponding mixed models. The addition of extra terms did not impact on the fit of the models (go-past times: χ²(2) = 2.87, p = 0.24; regressions-out: χ²(2) = 0.23, p = 0.89), and so we conclude that the observed congruency effects cannot be explained by the difference in the number of words pre-target across the congruency conditions.
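The p-values for these model comparisons can be checked by hand under the assumption that each statistic is a likelihood-ratio chi-square on 2 degrees of freedom. In that special case the upper-tail probability has the closed form exp(−x/2). The function name below is hypothetical.

```python
import math


def chi_square_p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    With 2 degrees of freedom the chi-square distribution is an exponential
    distribution with mean 2, so the survival function is exp(-x / 2).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-statistic / 2.0)


print(round(chi_square_p_value_df2(2.87), 2))  # go-past times -> 0.24
print(round(chi_square_p_value_df2(0.23), 2))  # regressions-out -> 0.89
```

Both values agree with the reported p-values, which is consistent with reading the comparison as a test on two added terms.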

Table 5
Results of Experiment 3. Means and standard deviations (SD) for eye-movement measures and critical values from their corresponding linear mixed effects models (B, z/t values, significance levels).

Further analyses investigated whether the size of the congruency effects on the go-past and regressions-out measures was influenced by the degree to which suffix spellings were diagnostic of lexical category. In order to assess this, we tested whether the addition of the interaction between diagnosticity and congruency would improve the fit of the models in these cases. The fit of the go-past model improved significantly when the interaction and the main effect of diagnosticity were added: χ²(2) = 8.58, p < 0.05 (see Footnote 8). The interaction between diagnosticity and condition was significant (B = 28.24, t = 2.60, p < 0.01), reflecting a larger effect of diagnosticity in the incongruent condition than in the congruent condition. Note, however, that the diagnosticity effect was not statistically significant when the data were analysed separately within conditions (congruent condition: B = 8.30, t = 0.63, p = 0.53; incongruent condition: B = 36.35, t = 1.58, p = 0.13). These results are visualised in Fig. 7.
Similarly, the fit of the regressions-out model improved significantly when the interaction between diagnosticity and condition was added: χ²(2) = 14.37, p < 0.001 (see Footnote 9). The original model produced a convergence warning when the variables were added. One way to assess if this warning is a false positive is to refit this model using a different optimisation procedure, such as "bobyqa" (Powell, 2009). We did this, and observed that the fits of these two models and the significance levels for the fixed effects were identical, indicating that the initial warning was indeed a false positive (Bates, Maechler, Bolker, & Walker, 2015). We report the results from the refitted model below.
The effects of diagnosticity on regressions-out were as follows. First, there was a main effect of diagnosticity: the probability of regressing to previous regions of the sentence decreased as diagnosticity increased in the congruent condition (B = −0.15, z = −2.31, p < 0.05), and increased as diagnosticity increased in the incongruent condition (marginal significance, B = 0.21, z = 1.76, p = 0.08). Second, there was an interaction between diagnosticity and congruency such that the probability of regression became higher as diagnosticity increased in the incongruent condition (B = 0.29, z = 3.01, p < 0.001), whereas it became lower as diagnosticity increased in the congruent condition. These results are visualised in Fig. 8.

Discussion
Experiment 3 demonstrated that adults are slower to progress past a target nonword whose spelling is incongruent with the predicted lexical category. This is likely due to the fact that such situations cause participants to regress to earlier regions of the sentence. Both of these effects suggest that participants experience difficulties in integrating such nonwords into incongruent sentence contexts (Rayner et al., 2004; Warren et al., 2009). Further, the strength of the relationship between suffix spelling and lexical category modulates this integration difficulty. Reading is facilitated when suffix spellings that are strongly diagnostic of a particular lexical category are used in contexts that are congruent with that category. Conversely, reading is more effortful when such spellings are used in contexts that are incongruent with that category.

General discussion
English has an alphabetic writing system in which letters map relatively consistently to sounds. However, it is well known that additional regularities between spelling and meaning arise as a result of morphological relationships between words. Stems occur repeatedly in words of similar meanings (e.g. 'builder', 'building', 'rebuild') and affixes alter the meanings of words in predictable ways (e.g. 'teacher', 'builder', 'banker'; Rastle et al., 2000). A wide variety of research suggests that skilled readers acquire knowledge of these regularities, and use this knowledge in the recognition of words (e.g. Rastle & Davis, 2008; Taft & Forster, 1975).
However, linguistic analyses over the past 50 years provide hints that the impact of these regularities in the writing system may not have been fully recognized by experimental psychologists studying visual word processing (e.g. Carney, 1994; Venezky, 1972). Indeed, it has long been known that morphological regularities sometimes take priority over spelling-sound regularities. For example, stems are preserved across derivations (e.g. 'magic', 'magical', 'magician'), even though this preservation can create spellings that do not provide a good reflection of their spoken forms (e.g. Treiman & Bourassa, 2000). Similarly, the spellings of meaningful affixes can be preserved in cases where the realisation of the spoken forms differs (as in the case of the past tense spelling -ED; Carney, 1994). Recently, Berg and Aronoff (2017; also Berg et al., 2014) have gone further than this, and suggested that some suffix spellings may actually be reserved for communicating particular grammatical functions. For example, the spelling -ED is strongly indicative of the past, and is very rarely used to spell ostensibly complex words comprising a single morpheme.

Fig. 7. Diagnosticity had a significant impact on the effect of congruency on go-past times in Experiment 3. The right panel corresponds to the incongruent condition: participants took longer to progress past the nonword if its spelling strongly conflicted with the sentence context. This effect was smaller in the congruent condition (left panel).

Fig. 8. Diagnosticity modulates the effect of congruency on regressions-out (regressions from the target nonword to previous regions of the sentence) in Experiment 3. The probability of regressing back to the beginning of the sentence decreases for suffixes that are highly diagnostic for lexical category cued by the sentence. Conversely, this probability increases when the suffix spelling that is specific for one category comes into conflict with a different syntactic role that is ascribed to it by the sentence context.

Footnote 8. The token diagnosticity measure did not explain additional variance in the data (p > 0.3).
Footnote 9. Analogous results were observed when the token measure of diagnosticity was used: χ²(2) = 6.942, p < 0.01. This effect was in the same direction as in the type analysis, but weaker. Note that the effects were not significant when the conditions were analysed separately (congruent: B = −0.09, z = −1.25, p = 0.21; incongruent: B = 0.14, z = 1.19, p = 0.24).
The purpose of our computational analysis was to quantify these relationships between spelling and lexical category, and to determine whether these regularities describe English derivational suffixes in general. We introduced two new concepts: diagnosticity and specificity. Diagnosticity refers to the extent to which a particular spelling predicts lexical category, while specificity refers to the extent to which a particular spelling is the preferred means of denoting a particular lexical category for a given sound sequence. Though diagnosticity refers to a property of English suffixes that linguists have long been familiar with (e.g. Carney, 1994), specificity refers to a property that has been virtually unstudied (but see Berg & Aronoff, 2017). Further, these regularities have never been quantified on a corpus scale. Our analyses demonstrate definitively that the 154 English suffixes studied are characterised by a high degree of diagnosticity and specificity, indicating that these forms of regularity are ubiquitous in the present-day English derivational system. These data confirm the notion that spellings of derivational English suffixes tend to be reserved for particular meaningful functions.
The acquisition of skilled reading involves the accumulation of years of experience with the writing system. In light of the large proportion of the English lexicon characterised by derivational suffixation, we hypothesized that these regularities might be encoded in the long-term knowledge of skilled readers. Results of our investigations that tested this hypothesis were unambiguous. Across three experiments using different paradigms, we found that these regularities impacted on the reading, spelling, and decision-making behaviour of skilled readers. These results suggest that skilled readers learn and subsequently exploit morphological cues to lexical category even without instruction to do so. Participants showed clear sensitivity to the relationships between suffix spellings and lexical category information. Moreover, the strength of these behavioural effects mirrored the degree of diagnosticity and specificity of the suffixes used. These results suggest that skilled readers' long-term knowledge represents the statistical structure of the writing system. It is unlikely that our participants were ever taught these relationships explicitly; more likely is that this knowledge was acquired through implicit statistical learning processes (see also St. Clair, Monaghan & Christiansen, 2010, for evidence on statistical learning of lexical categories).
We have characterised the English writing system in a manner that departs from the typical focus on spelling-sound relations (e.g. Ziegler, Stone, & Jacobs, 1997). English is typically characterised as a deep orthography as a result of its high degree of inconsistency across this mapping (Katz & Frost, 1992), making it difficult to learn to read (Seymour, Aro, & Erskine, 2003). However, the results of our analyses suggest that the proliferation of spellings for particular sound sequences in English may actually be functional. This possibility is intriguing for two reasons. First, it suggests that irregularity in the English spelling-sound mapping may permit regularity in the spelling-meaning mapping. This fractionation could not occur in a spelling-sound transparent writing system. Second, it suggests that the written signal conveys more disambiguating information than the spoken signal in these cases. These hypotheses are supported by diachronic data on the evolution of suffix spellings from Old English to modern days provided by Berg and Aronoff (2017). For instance, before the 16th Century adjectives were commonly spelled with word-final -OUSE, -US, or -OWS (e.g. 'glorius'), but these spellings were displaced by -OUS over time. Berg and Aronoff's analyses indicate that spellings have evolved in such a way as to express meaningful information more strongly. It is our view that diagnosticity and specificity are characteristic not of the English writing system exclusively but rather reflect general cognitive principles. Thus, these principles likely manifest, in one way or another, in other orthographies. Further research is needed, however, to test this conjecture. We chose English as a starting point for this work, because its spelling evolved under minimal environmental constraints (unlike French and Italian whose development was regulated by language academies, see Berg & Aronoff, 2017).
In order to quantify the extent to which English suffix spellings convey more information about lexical category than English sound sequences, we analysed each suffix in terms of the entropy of its spelling and phonological realisation. Entropy (H) is an information-theoretic measure of uncertainty (Shannon, 1948), which has previously been used to quantify the relationship between spelling and sound (e.g. Mousikou, Sadat, Lucas, & Rastle, 2017). We can calculate the entropy of each suffix spelling or spoken realisation using the formula $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the proportion of words with that ending that belong to lexical category $i$. An H value of 0 denotes that the ending occurs in words of only one category (i.e. the category can be predicted with absolute certainty), whereas high H values indicate that the ending occurs in words of more than one category relatively often (i.e. the category can be predicted with less precision).
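The entropy calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the reported analyses; the function name `category_entropy` and the example counts are assumptions introduced here.

```python
import math

def category_entropy(counts):
    """Shannon entropy (in bits) of the lexical-category distribution for one ending.

    counts: mapping from category label to the number of words with this ending.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one word")
    h = 0.0
    for n in counts.values():
        if n > 0:  # a category with no words contributes nothing to H
            p = n / total
            h -= p * math.log2(p)
    return h

# Hypothetical counts, for illustration only (not taken from the corpus).
only_adjectives = {"adjective": 120}
evenly_split = {"adjective": 50, "noun": 50}

h_certain = category_entropy(only_adjectives)
h_split = category_entropy(evenly_split)
```

Under these assumed counts, an ending found only in adjectives yields H = 0, whereas an ending split evenly between two categories yields H = 1 bit.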
Using this approach, we calculated $H_{\text{spelling}}$ to provide an estimate of the prediction precision from the suffix spelling, and we calculated $H_{\text{sound}}$ to provide an estimate of the prediction precision from the most frequent pronunciation of the suffix. We then examined the slope of the regression equation lm(H_spelling ~ H_phonology, weights = suffix frequency). There are three possible outcomes from this analysis. A slope of 1 indicates that suffix pronunciations and their spellings provide the same degree of information about lexical category; a slope greater than 1 indicates that phonology is more predictive of category compared to spelling; and a slope less than 1 would support our conjecture that spelling is more predictive of category compared to phonology. We found that the regression line has a slope of 0.9 (SE = 0.04, t = 25.32, p < 0.0001), and that this slope is significantly different from 1 (F(1) = 7.44, p < 0.01). That is, the spellings of derivational suffixes generally yield better prediction of lexical category (mean H is 0.61) than do their spoken realisations (mean H is 0.63).
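To make the slope comparison concrete, the following sketch computes the point estimate of a frequency-weighted least-squares line in plain Python. It is not the authors' R analysis: the function name `weighted_slope` and all data values are hypothetical, and the sketch does not reproduce the standard error or the test against a slope of 1.

```python
def weighted_slope(h_sound, h_spelling, weights):
    """Slope and intercept of a weighted least-squares line h_spelling ~ h_sound."""
    if not (len(h_sound) == len(h_spelling) == len(weights)):
        raise ValueError("all inputs must have the same length")
    w_total = sum(weights)
    # Weighted means of predictor and outcome.
    x_bar = sum(w * x for w, x in zip(weights, h_sound)) / w_total
    y_bar = sum(w * y for w, y in zip(weights, h_spelling)) / w_total
    # Weighted cross-product and weighted sum of squares of the predictor.
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, h_sound, h_spelling))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, h_sound))
    if sxx == 0:
        raise ValueError("h_sound has no variance")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical entropies and type frequencies for four suffixes (illustration only).
h_sound = [0.0, 0.5, 1.0, 1.5]
h_spelling = [0.9 * x for x in h_sound]  # spelling entropy grows more slowly
suffix_frequency = [200, 50, 120, 10]

slope, intercept = weighted_slope(h_sound, h_spelling, suffix_frequency)
```

With these invented values the slope is below 1, which is the pattern the text interprets as spelling being more predictive of category than phonology.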

Why are some suffixes more diagnostic than others?
We ran preliminary analyses to understand why some morphemes are diagnostic, while others are not. For this analysis, we removed seven suffixes that are indicative of lexical categories that were not considered in this paper (e.g. -WISE, -TEEN). Next, we tested whether diagnosticity is a property of more frequent suffixes by correlating type frequency and type diagnosticity. Surprisingly, the correlation was negative (−0.24, p < 0.01), but this was entirely due to suffixes with a diagnosticity of 1. There are 40 such suffixes, and they can be highly frequent (e.g. -ISM), as well as very infrequent (e.g. -NIK, -DROME). In fact, all extremely infrequent suffixes are diagnostic; that is, they appear in a very small number of words of one lexical category, and do not tolerate any exceptions. This might mean that infrequent suffixes are not retained in written language unless they are diagnostic. Once we remove these 40 suffixes, a positive correlation is observed (0.23, p < 0.05), meaning that highly diagnostic suffixes tend to be those that occur in many words. There are exceptions to this (frequent but ambiguous suffixes like -AL, -Y). In line with our behavioural data, we find no relationship between token frequency and diagnosticity.
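The sign reversal described above, in which a cluster of rare suffixes at ceiling masks a positive trend among the rest, can be illustrated with a toy example. The sketch below assumes a Pearson correlation (the text does not state which coefficient was used), and every data value is invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (type frequency, diagnosticity) pairs, not corpus values.
# The last three mimic rare suffixes with a diagnosticity of 1.
suffixes = [(10, 0.5), (20, 0.6), (30, 0.7), (40, 0.8),
            (1, 1.0), (2, 1.0), (3, 1.0)]

r_all = pearson([f for f, _ in suffixes], [d for _, d in suffixes])

# Drop the suffixes at ceiling and recompute.
below_ceiling = [(f, d) for f, d in suffixes if d < 1.0]
r_below = pearson([f for f, _ in below_ceiling], [d for _, d in below_ceiling])
```

In this toy set the correlation is negative over all seven items and positive once the items at ceiling are removed, mirroring the qualitative pattern reported in the text.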
A similar dependency is characteristic of suffix specificity. As with the diagnosticity analyses, we excluded 20 suffixes that are associated with categories that we did not consider in this paper (e.g. -ALLY, -WARDS). We also removed all suffixes with a specificity of 1. The analysis of the remaining 36 suffixes revealed a positive relationship between frequency and specificity (correlation 0.53, p < 0.001). That is, suffixes with a higher frequency tended to be more specific (e.g. -Y, -ER), and suffixes with a lower frequency tended to be less specific (e.g. -ENE, -ISE).
A noteworthy aspect of the diagnosticity and specificity data is that token-based measures provided a poorer fit to our behavioural data compared to the type-based measures, in all experiments. Further, token frequency had a weaker correlation with diagnosticity and specificity in the corpus data compared to type frequency, as we have noted above. This finding is in line with studies showing facilitatory effects of contextual diversity (the number of contexts in which a unit or word appears) on the recognition of printed words (Adelman, Brown, & Quesada, 2006), and specifically, in the context of morphology, the effect of morphological family size on online processing (how many different words embed a morpheme; Amenta, Marelli, & Crepaldi, 2015; De Jong, Schreuder, & Baayen, 2000; Ford, Davis, & Marslen-Wilson, 2010) and learning (Tamminen et al., 2015).

Are the effects morphological or orthographic in nature?
Might the relationship that we have observed between the suffix form and its meaning be part of a more general characterisation of English orthography? Several previous studies focussed on orthographic (nonmorphological) cues to lexical categories. For example, Kemp et al. (2009) found that people use orthography to make judgements about lexical categories of novel words and to convey lexical category information (e.g. -IFF is a frequent noun ending, while -ERGE is a frequent verb ending). The authors used 10 noun-related and 10 verb-related nonmorphological endings, and found that these influence people's behaviour in sentence construction, sentence judgement, and pseudoword judgement tasks. Arciuli and Cupples (2006) also studied orthographic correlates of noun and verb categories. They reported a sentence construction experiment where participants created sentences with given nonwords ('fantern', 'setect', 'feduct'), thus assigning them a lexical category.
Our view is that there are fundamental differences between low-level orthographic and morphological cues, in terms of the degree of precision with which the latter are encoded in written language, and also in terms of their distributional properties. To illustrate some distributional differences, suffixes are a closed class of units, whereas the number of orthographic units is larger and depends on the definition that one adopts. For example, do orthographic units include written word bodies (e.g. -EAL; Arciuli & Cupples, 2006), or one-letter units (e.g. Onnis & Christiansen, 2008; see discussion in Ktori, Mousikou, & Rastle, 2018)? Fig. 9 plots the distributions of suffixes and orthographic endings (all one-, two-, three-, four-, and five-letter endings that are not part of existing suffixes). Not only are there fewer suffixes than other endings, but the former are also higher in diagnosticity and frequency (most diagnostic suffixes are highly frequent, whereas other highly diagnostic endings are rare). The majority of non-suffix endings are indicative of the noun category (88%), with the remaining 11% pointing to verbs. Only two endings (-UNG and -LETE) are strongly associated with the adjective category. On the other hand, suffixes permit discrimination between nouns (65%) and adjectives (30%). Finally, suffixes' combinatorial nature permits a substantial degree of productivity that is absent for non-morphemic sequences.
In summary, we have provided a new way to think about English writing, in which one consequence of the apparent disorder of spelling-sound information is order in the spelling-meaning mapping. We have shown that literate adults are sensitive to these regularities between spelling and meaning, and that their long-term knowledge mirrors the strength of these regularities in the writing system. These conclusions suggest that reading the poem Jabberwocky provides information about meaning that is not present when listening to the poem. Lewis Carroll's choice of the spelling 'vorpal' to describe the blade (as opposed to 'vorple') was unlikely to be random, even if he didn't know that to be the case. Further, our research suggests that readers take advantage of the relationship between spelling -AL and adjective status to assist their interpretation of this part of the poem. One interesting pathway for future research will be to chart how these regularities emerged through spelling changes in the development of English writing.

Fig. 3. Results of Experiment 1. The left panel plots the likelihood of the "adjective" response separately for adjective- and noun-diagnostic suffixes. The right panel plots the likelihood of the "adjective" response as a function of suffix.
target spelling was less frequent in the incongruent context condition (B = −0.60, z = −5.79, p < 0.0001) than in the congruent context condition. Fig. 5 visualises these data.

Fig. 5. Results of Experiment 2. The left panel plots the likelihood of the target spelling separately for congruent and incongruent sentence contexts. The right-hand panel plots the percentage of target spellings as a function of suffix and context.

Table 2
Characteristics of suffixes used in Experiments 1 and 3.

Table 3
Suffixes used in Experiment 2.