What is orthographic depth?

In the study of reading, it is important to establish to what extent findings from reading in one language can be generalized to another, and what particular experimental results are specific to the particular orthography used in the studies (Frost, 2012; Share, 2008). In recent decades, cross-linguistic research in reading has focussed particularly on the concept called orthographic depth as a source of cross-linguistic orthographic differences in reading behavior. Broadly speaking, orthographic depth refers to the reliability of print-to-speech correspondences. English is considered to be a deep orthography, as there are often different pronunciations for the same spelling patterns (e.g., “tough” – “though” – “through” – “bough” – “cough” – “thorough” – “hiccough”; Ziegler, Stone, & Jacobs, 1997). Hence, it has often been contrasted with “shallow” orthographies with more reliable correspondences, such as Serbo-Croatian (Frost, Katz, & Bentin, 1987; Turvey, Feldman, & Lukatela, 1984), German (Frith, Wimmer, & Landerl, 1998; Landerl, Wimmer, & Frith, 1997; Wimmer & Goswami, 1994; Ziegler, Perry, Jacobs, & Braun, 2001), and many others (see Katz & Frost, 1992; Ziegler & Goswami, 2005 for reviews).

Orthographic depth is relevant for a broad range of issues, including reading development, developmental and acquired reading disorders, and theoretical accounts of reading. All aspects of reading are intrinsically linked to the characteristics of the orthography, therefore establishing what orthographic characteristics affect reading processes, and the cognitive mechanisms via which this occurs, is important for practical and theoretical reasons. For example, research on reading acquisition has consistently shown that achieving reading accuracy is a slower process for children learning to read in deep compared to shallow orthographies (e.g., Frith et al., 1998; Landerl, 2000; Seymour, Aro, & Erskine, 2003; Wimmer & Goswami, 1994). To account for these findings, theories of reading acquisition often consider the role of orthographic depth, and the challenges that it poses for young readers (Goswami, 1999; Liberman, Liberman, Mattingly, & Shankweiler, 1980; Ziegler & Goswami, 2005). The very mechanisms that underlie reading acquisition might be important to different degrees depending on orthographic depth: numerous studies have shown differences between orthographies in the strength of various predictors of reading ability. Specifically, phonological awareness appears to be a stronger predictor of reading ability for deep orthographies, as it is needed to make sense of the complicated print-to-speech conversion system. Conversely, there is some evidence that rapid automatised naming is a stronger predictor of reading abilities in shallow orthographies, as it is important for developing fluency, an aspect with which poor readers of shallow languages tend to struggle (e.g., Caravolas et al., 2012; Moll et al., 2014; Vaessen et al., 2010; Ziegler et al., 2010).

Furthermore, behavioral studies suggest that the symptoms associated with developmental dyslexia differ as a function of orthographic depth (Landerl et al., 1997; Landerl et al., 2013; Wimmer, 1996; but see Ziegler, Perry, Ma-Wyatt, Ladner, & Schulte-Körne, 2003). The phenotype of dyslexia has been shown to depend on the depth of the orthography: in deep orthographies, dyslexia is characterized by inaccurate reading, while in shallow orthographies high accuracy can be achieved, but a slowness in reading persists (Wimmer, 1993). Although work in English has established the presence of various subtypes of dyslexia (Castles & Coltheart, 1993), it has been questioned whether these can be applied to more shallow languages, where the hurdles associated with developing sound reading skills are different (Bergmann & Wimmer, 2008; Wimmer, Mayringer, & Landerl, 2000). These behavioral findings on dyslexia and orthographic depth are supplemented by neuroimaging data, which have shown cross-linguistic differences in the brain activation patterns during reading in dyslexic compared to control readers (for a recent review, see Richlan, 2014).

In addition, the concept of orthographic depth touches on issues that are central to debates in the reading literature in general, such as the extent to which reading processes are universal or language-specific (Dehaene, 2009; Frost, 2012; Share, 2008). Previous research suggests that the cognitive processes underlying skilled reading are dependent on orthographic depth (Frost, 1994; Frost et al., 1987; Schmalz et al., 2014; Ziegler et al., 2001). Determining whether any aspects of the reading process are universal, and which aspects depend on the characteristics of the orthography, has been recently argued to be an essential and inevitable step in creating models of reading (Frost, 2012). More specifically, and relating directly to the concept of orthographic depth, the majority of reading research is based on English. As has been argued elsewhere, this poses a threat to the generalizability of this research, especially since English is considered to be an outlier on the orthographic depth scale compared to other orthographies (Share, 2008). Although orthographic depth is not the only source of variability across orthographies, it has probably received the most attention in the past decades. Therefore, understanding what it is and how it affects reading processes is of theoretical importance.

It is clear that orthographic depth is an important concept, and understanding how it relates to reading is pivotal, as it is a strong source of linguistic variability between alphabetic orthographies. Here, we argue that it is currently unclear what precise mechanisms drive these cross-orthographic differences, both on a linguistic and behavioral level. We propose that a more precise definition of orthographic depth is needed for future research. In particular, answering the question, “what is orthographic depth,” involves determining, on a linguistic level, what different aspects underlie this concept, and how these can be quantified. Once a clear definition of orthographic depth is formulated, current theories and models of reading can be used to make specific predictions about how each aspect of orthographic depth might affect skilled reading and reading acquisition. In the current paper, we discuss the concept of orthographic depth in three sections. First, we provide an overview of the previous theoretical work on this concept (Definitions to date). Then, we propose quantification methods of the cross-linguistic variability which can be linked to theoretically important concepts (Quantifications of orthographic depth). Finally, we outline testable predictions that can be drawn from our proposed framework (Predictions of the new orthographic depth framework for theories of reading).

Definitions to date

Existing definitions of orthographic depth

As orthographic depth has been explored for decades, a number of definitions have been proposed. Originally, the concept was formulated in terms of a compromise between morphological and phonological transparency (Chomsky & Halle, 1968). In orthographies such as English or Dutch, such compromises are necessary, because the languages are morphologically deep, in that the same morphemes can have different pronunciations in different contexts. Therefore, the orthography needs to convey either the morphology or the phonology of the word: it cannot convey both. For example, in English, the words “heal” and “health” have the same spelling pattern because they are semantically related, even though they have different pronunciations. Thus, English often sacrifices phonological transparency for morphological transparency. In Dutch, conversely, the words “lezen” (to read) and “[ik] lees” ([I] read) have different spellings, despite being forms of the same verb. This is because the “z” in “lezen” is pronounced as /z/, whereas consonants in the final position of Dutch words are devoiced; therefore, the pronunciation of the final phoneme of “lees” is /s/, which is represented by the grapheme “s”. Here, the Dutch orthography sacrifices morphological for phonological transparency (Landerl & Reitsma, 2005).

Originally, the term “depth” referred to two dimensions, relating either to morphological or to phonological transparency. In the context of the reading literature, the concept of phonological transparency has received the most attention (Feldman & Turvey, 1983; Frost, 1994; Frost et al., 1987). Katz and Frost (1992), in a review of the Orthographic Depth Hypothesis (ODH), provide an overview of the origins of the term, and its relationship to both morphological and phonological transparency. Their predictions about how depth would affect reading processes, however, focused exclusively on the relationship between orthography and phonology, as we will discuss in detail in a later section.

The relationship between orthography and phonology is considered to vary as a continuum (e.g., Frost et al., 1987; Goswami, Gombert, & de Barrera, 1998; Seymour et al., 2003; Sprenger-Charolles, Siegel, Jiménez, & Ziegler, 2011). This implies that a given orthography can be classified along a single scale. However, this is only possible if this concept has an explicit and agreed-on definition, which would allow for the development of a linguistic quantification scheme. Arguably, this is currently lacking in the available literature to date.

There is agreement that orthographic depth refers to the reliability of the print-to-speech correspondences, but what exactly differs across orthographies and how this should be quantified is less clear. Katz and Frost (1992) list three different aspects of letter-sound correspondences that could help to flesh out this definition of orthographic depth: “Because shallow orthographies have relatively simple, consistent, and complete connections between letter and phoneme, it is easier for readers to recover more of a printed word’s phonology prelexically by assembling it from letter-phoneme correspondences.” (pp. 71-72). Similarly, in a more recent paper, Richlan (2014) concurs by describing orthographic depth as “the complexity, consistency, or transparency of grapheme-phoneme correspondences in written alphabetic language” (p. 1). What is now needed are studies concerning how these different concepts work, whether they can be distinguished from each other, and how each might be quantified.

We argue that a more specific definition is needed to create an explicit theoretical framework that accounts for the way in which orthographic depth influences reading. In order to conduct meaningful behavioral cross-linguistic studies, the degree of orthographic depth of the orthographies which are being studied needs to be defined a priori, preferably using an objective linguistic quantification method. This is particularly important because orthographies differ from each other in many aspects apart from orthographic depth, such as syllabic complexity, morphological complexity, orthographic density, or the proportion of mono- versus polysyllabic words in the language. Unless the concept of orthographic depth is formally defined, it is easy to fall into circular reasoning, where any behavioral differences across orthographies are attributed to orthographic depth post-hoc, even when there is a possibility that they are caused by other, uncontrolled language-level differences.

Devising a meaningful quantification method poses further challenges, because the quantification scheme needs to retain a link to theoretically and practically meaningful constructs; if it does not, it becomes unclear what the quantification method is actually measuring. Therefore, we need to first understand what constructs underlie orthographic depth and whether these are theoretically important. For a particular linguistic construct, there also needs to be enough variability across orthographies to make across-language studies meaningful. Then, on a behavioral level, we need to be able to show a measurable effect that is directly associated with the construct under study. In the following section, we provide an overview of how previous theoretical work on orthographic depth has used this concept.

Orthographic depth in theories and models of reading

Two theories of reading across languages that are primarily concerned with orthographic depth are the Orthographic Depth Hypothesis (ODH; Katz & Frost, 1992) and the Psycholinguistic Grain Size Theory (PGST; Ziegler & Goswami, 2005). Both postulate how orthographic depth would affect reading processes, the ODH with a focus on skilled reading, and the PGST with a focus on reading acquisition, and both provide some definition of what is meant by orthographic depth. As mentioned in the previous section, Katz and Frost (1992) distinguish between three concepts underlying orthographic depth: in a deep language, the print-to-speech correspondences are complex, inconsistent, and incomplete. It is unclear, however, precisely how these three aspects relate to each other, and whether each of them influences reading in different ways. Katz and Frost (1992) say the following about the specific mechanism that affects reading processes:

We would like to make two points, each independent of the other. The first states that, because shallow orthographies are optimised for assembling phonology from a word’s component letters, phonology is more easily available to the reader prelexically than is the case for a deep orthography. The second states that the easier it is to obtain prelexical phonology, the more likely it will be used for both pronunciation and lexical access. Both statements together suggest that the use of assembled phonology should be more prevalent when reading a shallow than when reading a deep orthography (p. 71, Katz & Frost, 1992).

It is easy to understand this quote in terms of the complexity of print-to-speech correspondences. Behavioral evidence has shown that the presence of complex multiletter rules (e.g., “th”, “sh”) slows down reading aloud latencies of words and non-words (Rastle & Coltheart, 1998; Rey, Jacobs, Schmidt-Weigand, & Ziegler, 1998). According to Katz and Frost, this is what gives more time for the lexical route to access the lexical information, before the sublexical computation of the pronunciation is complete.

This brings us to the question of how this mechanism would function in a case where the sublexical information is either inconsistent or incomplete. In English, an example of an inconsistent sublexical unit is the letter string “ough,” which can be pronounced in six different ways for monosyllabic words alone. For an inconsistent word, sublexical information is not sufficient to determine the pronunciation – instead, the orthographic lexicon must be consulted in order to determine how to pronounce a word containing the inconsistent correspondence (e.g., “though” and “through,” which contain nearly identical sublexical information, but have different body pronunciations).

The third concept introduced by Katz and Frost (1992) is incompleteness of the sublexical correspondences. In English, examples of words with incomplete sublexical information are heterophonic homographs. A heterophonic homograph, such as the word “wind”, has two different pronunciations, each of which is linked to a different meaning. The sublexical information is incomplete, as sentence context is needed to activate both the correct phonology of the word, and the correct semantic representation. In an orthography such as Hebrew, this presents a routine computational problem: here, vowels are mostly not represented in written texts. Many words have identical consonant constellations, and as a result vowel information is needed to tell them apart: for example, the consonant string “DVR” can be pronounced, among other alternatives, as “davar,” meaning “thing,” or “dever,” meaning “pestilence” (Frost & Bentin, 1992). Here, not only the sublexical procedure but also an orthographic-lexical procedure is insufficient to retrieve a single pronunciation. Instead, lexical-semantic information needs to be consulted.

Complexity may lead to a quantitative change in the reading processes by slowing down the sublexical route relative to the lexical route. In contrast, inconsistency and incompleteness may force reliance on lexical strategies, as access to a single phonemic representation cannot occur without lexical information. Thus, the distinction drawn by the ODH between different aspects underlying orthographic depth requires further consideration and empirical work. In particular, it is of interest − and so far, to our knowledge, unexplored − whether these three components would affect reading processes in different ways. This would question the utility of the concept of orthographic depth as a unified construct, and instead support the view that it consists of different sub-components.

The PGST emphasizes the role of complex correspondences in driving cross-linguistic differences in the difficulty of acquiring a given orthography. According to the PGST, children learning to read in a deep orthography attempt to minimize the unreliability of their sublexical correspondences by relying on larger units because these tend to be more predictive of a word’s correct pronunciation (at least in English; Peereman & Content, 1998; Treiman, Mullennix, Bijeljac-Babic, & Richmond-Welty, 1995). As a result, children learning to read in a deep orthography need to learn a greater number of correspondences: children in a hypothetical perfectly shallow orthography can simply learn the letters and their corresponding sounds, and decode all words with perfect accuracy using only those small units. According to this view, the necessity to learn many print-to-speech correspondences in deep orthographies slows down the process of reading acquisition, leading to the well-established behavioral pattern where children learning to read in a deep orthography lag behind children learning to read in a shallow orthography based on word and non-word reading tasks (Frith et al., 1998; Landerl, 2000; Seymour et al., 2003; Wimmer & Goswami, 1994).

According to the PGST, it then seems that orthographic depth can be described as the existence of complex correspondence rules that are needed in order to decode new words in a given orthography. However, we argue that this is not the whole picture: as Katz and Frost (1992) point out, other properties of print-to-speech correspondences associated with orthographic depth relate to their inconsistency and incompleteness. In the following section, we focus on understanding how inconsistency can be defined, as different theories of skilled reading have different conceptualisations of what it is and how it affects reading.

Dissociating inconsistency and complexity

Generally, consistency relates to the presence of more than one pronunciation for a given letter string. It can be defined either on the level of a grapheme (e.g., “ea” is an inconsistent grapheme, because it can be pronounced as in “bread” or “leak”), or of a body (e.g., “-eak” is an inconsistent body, because it can be pronounced as in “break” or “leak”). The consistency terminology is generally associated with connectionist models (Harm & Seidenberg, 1999; Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989); dual-route models (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001) are more concerned with the concept of regularity, which is defined as compliance to a set of predetermined grapheme-phoneme correspondence rules.
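To make the two grain sizes concrete, the following minimal sketch (our own toy illustration, not part of any published quantification scheme; the words and rough IPA values are chosen only for the example) counts how many pronunciations a grapheme or a body receives in a small word sample.

```python
from collections import defaultdict

# (word, grapheme, grapheme pronunciation, body, body pronunciation) — rough IPA
words = [
    ("bread", "ea", "ɛ",  "-ead", "ɛd"),
    ("bead",  "ea", "iː", "-ead", "iːd"),
    ("leak",  "ea", "iː", "-eak", "iːk"),
    ("break", "ea", "eɪ", "-eak", "eɪk"),
    ("bean",  "ea", "iː", "-ean", "iːn"),
]

unit_pronunciations = defaultdict(set)
for _, grapheme, g_pron, body, b_pron in words:
    unit_pronunciations[grapheme].add(g_pron)
    unit_pronunciations[body].add(b_pron)

# A unit is consistent if it receives exactly one pronunciation in the sample.
for unit, prons in unit_pronunciations.items():
    status = "consistent" if len(prons) == 1 else "inconsistent"
    print(f"{unit}: {sorted(prons)} -> {status}")
```

In this toy sample, “ea”, “-ead”, and “-eak” all come out as inconsistent, while “-ean” is (trivially) consistent; the same counting logic applies whether the unit of analysis is a grapheme or a body.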

As we explain below, this distinction is important because the two classes of models make different assumptions about how speech is computed from print. Broadly speaking, both classes of models agree on two points: (1) That there is a non-lexical procedure which uses knowledge of print-to-speech regularities to assemble a word’s pronunciation. This procedure is particularly important for reading non-words (in an experimental setting) and unfamiliar words (in a real-life setting). And (2) that there are some words for which lexical knowledge needs to be recruited, because the sublexical routine will not provide a correct pronunciation (in the dual-route framework, this includes all words with irregular print-to-speech correspondences, whereas for connectionist models, it is limited to low-frequency words with exceptional correspondences, e.g., “meringue”, “colonel”). In order to adopt a theoretically neutral framework, we use the term unpredictability to refer to the degree to which this non-lexical reading route, essential for reading non-words aloud, fails to translate the words of the orthography correctly from print to phonology.

The dissociation between complexity and unpredictability on a linguistic level is not straightforward in English. This can be illustrated with the minimal word pair “gist” and “gift.” Arguably, the pronunciation of the word “gist” is transparent, because it can be determined using the context-sensitive correspondence that a “g” followed by an “i” is pronounced as /dʒ/. Alternatively, the pronunciation of the word “gift” could also be argued to be transparent, if we instead apply the simpler rule that the letter “g” is pronounced as /g/. Therefore, the pronunciation of the word “gist” can be resolved by the use of a complex (context-sensitive) correspondence, but in the English orthography it is also, to some degree, unpredictable whether this complex rule will apply or not.

The “gift” – “gist” example shows that in English, the complexity and unpredictability of sublexical correspondences are related and confounded, and indeed it is difficult to dissociate the two. This is not always the case in other orthographies, however. Both Italian and French contain the “g[i]” context-sensitive correspondence. In both these orthographies there is no unpredictability regarding this rule, as it always applies, meaning there are no words with the pattern “gi” where the “g” would be pronounced as /g/. As we will show later, this is important: an orthography that contains many complex rules which are entirely predictable is different from an orthography that contains many complex rules but also a great deal of unpredictability. We propose that complexity and unpredictability are two related but linguistically and theoretically dissociable concepts. Thus, we argue that orthographic depth, in the context of European orthographies, is a conglomerate of two separate concepts, namely the complexity of sublexical correspondences and the unpredictability of words’ pronunciations given these correspondences.

Defining print-to-speech correspondences

As discussed in the previous section, all computational models of reading include some kind of mechanism that uses knowledge of the statistical regularities between print and speech in a given orthography to assemble a word’s pronunciation. The implementation thereof, however, varies considerably, and is a source of debate between computational modellers (Coltheart, Curtis, Atkins, & Haller, 1993; Coltheart et al., 2001; Perry, Ziegler, & Zorzi, 2007; Perry, Ziegler, & Zorzi, 2010; Plaut et al., 1996; Seidenberg & McClelland, 1989). The Dual Route Cascaded (DRC) model of reading (Coltheart et al., 2001) contains sublexical rules, each of which maps a grapheme onto the phoneme that corresponds to it most frequently. The rules are position-specific: each rule is either valid for all positions (“t” → /t/, at least for monosyllabic words), or for the beginning, middle, or end positions (e.g., “y” → /j/ at the beginning of a word, and “y” → /ai/ at the end of a word). All sublexical correspondences are grapheme-phoneme correspondence (GPC) rules, meaning that they describe the pronunciation of a single phoneme. GPC rules can also be context-sensitive: for example, “g” is pronounced as /dʒ/ when followed by an “i” (“g[i]” → /dʒ/). Even those rules are GPCs, because they relate to a single phoneme (in this case, /dʒ/).
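To make these rule types concrete, the following is a minimal Python sketch of our own (a simplification, not the implemented DRC): each rule maps a grapheme onto a single phoneme and may be restricted to a position or to a following context letter, and assembly proceeds from left to right, preferring longer graphemes and context-sensitive rules where they apply. The rule set, the toy words, and the IPA outputs are illustrative only.

```python
# A toy illustration of position-specific and context-sensitive GPC rules.
# (grapheme, phoneme, position, context): position is "any", "initial", or "final";
# context, if given, is the letter that must follow the grapheme.
RULES = [
    ("th", "θ",  "any",     None),
    ("g",  "dʒ", "any",     "i"),   # context-sensitive rule: "g[i]" → /dʒ/
    ("g",  "g",  "any",     None),
    ("y",  "j",  "initial", None),
    ("y",  "aɪ", "final",   None),
    ("i",  "ɪ",  "any",     None),
    ("f",  "f",  "any",     None),
    ("n",  "n",  "any",     None),
    ("s",  "s",  "any",     None),
    ("t",  "t",  "any",     None),
]

def assemble(word):
    """Translate a letter string using the first applicable rule at each position.
    Longer graphemes are tried first; the stable sort keeps the context-sensitive
    "g" rule ahead of the simple "g" rule."""
    phonemes, i = [], 0
    while i < len(word):
        for grapheme, phoneme, position, context in sorted(RULES, key=lambda r: -len(r[0])):
            if not word.startswith(grapheme, i):
                continue
            if position == "initial" and i != 0:
                continue
            if position == "final" and i + len(grapheme) != len(word):
                continue
            if context is not None and not word.startswith(context, i + len(grapheme)):
                continue
            phonemes.append(phoneme)
            i += len(grapheme)
            break
        else:
            i += 1  # no rule applies; skip the letter
    return "".join(phonemes)

print(assemble("thin"))  # θɪn — multi-letter rule "th"
print(assemble("gist"))  # dʒɪst — the context-sensitive rule applies
print(assemble("gift"))  # dʒɪft — the same rule applies and yields a wrong output:
                         # under this toy rule set, "gift" is unpredictable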

In contrast to the DRC and its GPC rules, triangle models develop sensitivity to units that are larger than graphemes, thereby also showing sensitivity to marker effects that are associated with “large” units, such as bodies (Harm & Seidenberg, 1999; Plaut et al., 1996; Seidenberg & McClelland, 1989). Connectionist Dual Process (CDP+)-type models (Perry et al., 2007; Perry, Ziegler, & Zorzi, 2010) represent a compromise between the DRC and triangle models: although the sublexical route is based on graphemes, it also develops sensitivity to the letters surrounding a given correspondence due to learning in a two-layer associative network.

The reliance on letter clusters that are larger than graphemes in the triangle models blurs our proposed distinction between complexity and unpredictability: Given sufficient training, the orthography-to-phonology mapping process will establish orthography-phonology connections between different types of units (e.g., Plaut et al., 1996). This will make even a word like “ache” (cf. “cache”) predictable (via the whole-word correspondence that “ache” maps onto /æɪk/, and “cache” onto /kæʃ/). In practice, the amount of training which the triangle models undergo is not sufficient to establish whole-word representations of low-frequency words, such that the orthography-to-phonology pathway will struggle with low-frequency irregular words. Due to the shared-labour nature of the two routes, semantic processing will be activated during any reading task, but in the case of low-frequency irregular words the semantic route is required for a correct output.

In summary, the DRC model has a sublexical route which operates on a set of pre-determined print-to-speech correspondences. As long as a word complies with the print-to-speech correspondences, the sublexical route will be able to provide a correct output. If the word is irregular (i.e., it does not comply with the rules), lexical knowledge needs to be consulted. The triangle models, conversely, develop reliance on units that are larger than graphemes, meaning that even high-frequency irregular words can be read aloud by the orthography-to-phonology route. While a DRC-like model could be modified to contain larger-than-grapheme units, these would remain completely independent of any lexical process (e.g., see footnote 1 of Coltheart et al., 1993). This distinction between larger sublexical units and whole-word orthography-phonology correspondences is not explicit in the triangle framework. However, there is a qualitative difference between sublexical clusters and whole-word units, namely that whole-word units map onto semantic information while sublexical units, by definition, do not. Therefore, we propose that within any framework, resolving print-to-speech ambiguities by relying on larger-than-grapheme sublexical (i.e., complex) units, and resolving ambiguities by relying on whole-word units (for unpredictable words) are qualitatively different strategies, in that the latter inevitably involves excitatory activation of lexical-semantic processes.

For the sake of simplicity, we provide definitions of complexity and unpredictability which are in line with the DRC terminology. As explained above, this does not mean that our framework is incompatible with other models of reading, as all models can accommodate both the presence of complex correspondences and words with unpredictable pronunciations. We refer to a correspondence between orthography and phonology as complex if either the orthographic element involved consists of more than one letter (e.g., “th” → /θ/), or if the correspondence is context-sensitive (“g[i]” → /dʒ/), or if both are true (“ch[r]” → /k/).

Unpredictable words can be defined as words where the sublexical route provides an incorrect pronunciation. In the DRC framework, such words are termed irregular words (e.g., Andrews, 1982; Content, 1991; Rastle & Coltheart, 1999; Schmalz, Marinus, & Castles, 2013). Within the connectionist models the concept of consistency is stressed. Although consistency differs from regularity, in the context of the current review it reflects the predictability of a word: a consistent word is defined as one where “its pronunciation agrees with those of similarly spelt words” (Plaut et al., 1996, p. 59). This reflects the mechanisms by which the pronunciation of a word is assembled in a connectionist model: as the sublexical route operates based on statistical regularities which are derived from the print-to-speech correspondences of real words, unpredictable words in this framework are those which have a different pronunciation to similarly spelled words.

In summary, as a working definition, we refer to complex correspondences as those that are multi-letter (“th” → /θ/) and/or context-sensitive (“g[i]” → /dʒ/), and to unpredictable words as irregular words given the set of GPCs that are implemented in the DRC. Given that the definitions are arguably biased towards the DRC framework, we seek convergence with alternative approaches for all our findings in the following section.
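As a concrete illustration of this working definition, the sketch below (our own toy example, not part of the DRC) assembles pronunciations from a handful of single-letter GPCs and flags a word as irregular whenever the assembled pronunciation differs from the lexical one; the GPC set, words, and transcriptions are purely illustrative.

```python
# Toy illustration: a word counts as unpredictable (irregular) if the pronunciation
# assembled from a fixed GPC set differs from its lexical pronunciation.
GPCS = {"p": "p", "i": "ɪ", "n": "n", "t": "t", "m": "m"}

# word -> lexical pronunciation (rough, illustrative IPA)
LEXICON = {"pin": "pɪn", "tin": "tɪn", "mint": "mɪnt", "pint": "paɪnt"}

def assemble_simple(word):
    """Apply the single-letter GPCs from left to right."""
    return "".join(GPCS.get(letter, "?") for letter in word)

for word, lexical in LEXICON.items():
    assembled = assemble_simple(word)
    label = "regular" if assembled == lexical else "irregular"
    print(f"{word}: assembled /{assembled}/, lexical /{lexical}/ -> {label}")
# "pint" comes out as irregular: the GPCs yield /pɪnt/, but the word is pronounced /paɪnt/.
```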

Quantifications of orthographic depth

Existing measures of orthographic depth, and their relation to complexity and unpredictability

Having distinguished between complexity and unpredictability, we can attempt to devise a quantification method for each of these on a linguistic level. If these represent two separate concepts underlying orthographic depth, the first step is to demonstrate that they vary independently across orthographies. This will firstly show whether there is enough independent variation of the two concepts to warrant practically meaningful investigation on a behavioral level, and secondly will provide insights as to how orthographic depth may be quantified. Although large-scale linguistic corpus analyses are outside the scope of the current review, we provide some suggestions that can be expanded on by future work. We discuss and expand on previous quantification methods, and consider their advantages and disadvantages. In terms of demonstrating that there is a dissociation between complexity and unpredictability, we refer to a computational-model-driven approach (Ziegler, Perry, & Coltheart, 2000), and a linguistic-corpus-analysis approach (van den Bosch, Content, Daelemans, & de Gelder, 1994). We also discuss two commonly taken approaches to determine the relative depth of a given orthography, namely subjective consensus among experts (e.g., Frost et al., 1987; Seymour et al., 2003), and the onset entropy measure (Borgwaldt, Hellwig, & de Groot, 2004, 2005).

Given our DRC-based definitions of complexity and unpredictability, it is intuitive to start by using the existing versions of the DRC across orthographies to illustrate cross-linguistic differences given the GPC rules. Specifically, we can simply take the numbers and proportions of complex rules, and the proportion of irregular words in the DRCs of the orthographies in which it has been implemented. The number of complex rules (those which are multi-letter and/or context-sensitive) is a measure of complexity as per our working definition.

This approach of comparing the number and types of GPC rules across orthographies, and the degree to which they are sufficient to read aloud words in a given orthography, has also been taken by Ziegler et al. (2000) when they implemented the DRC in German. They found that both the number of rules (and especially the number of complex rules) and the percentage of irregular words were higher in English than in German. This is in line with the general consensus that German is a shallow orthography, and English is deep. The DRC has also been implemented in French (Ziegler, Perry, & Coltheart, 2003), Dutch, and Italian (C. Mulatti, personal communication, 25 May 2014), which allows us to list the numbers, proportions, and types of rules in these five DRCs. The results of this analysis are presented in Table 1.
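The two summary measures we take from each DRC implementation can be computed in a few lines; the sketch below is our own illustration with placeholder inputs rather than the actual DRC rule sets or lexicons, and the assemble() argument stands in for whatever rule-application procedure is used.

```python
# Placeholder illustration of the two measures read off each DRC implementation:
# the proportion of complex rules and the percentage of irregular words.

def proportion_complex(rules):
    """rules: list of dicts with a 'grapheme' key and an optional 'context' key."""
    complex_rules = [r for r in rules if len(r["grapheme"]) > 1 or r.get("context")]
    return len(complex_rules) / len(rules)

def percent_irregular(lexicon, assemble):
    """lexicon: dict of word -> lexical pronunciation; assemble: a rule-based reader."""
    wrong = sum(assemble(word) != pron for word, pron in lexicon.items())
    return 100 * wrong / len(lexicon)

# Toy example: three rules, two of them complex.
toy_rules = [
    {"grapheme": "t"},
    {"grapheme": "sch"},                 # multi-letter -> complex
    {"grapheme": "g", "context": "i"},   # context-sensitive -> complex
]
print(proportion_complex(toy_rules))  # 0.666...
```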

Table 1 Measures of complexity and unpredictability for Dutch, English, French, German, and Italian based on the DRCs, and from van den Bosch et al. (1994) for comparison

Table 1 shows, as expected, that English is a “deep” orthography, in that it has many rules, and a particularly high percentage of irregular words, while Dutch and German are “shallow”, in that they have few rules and a small proportion of irregular words. Interestingly, the DRC approach places the French orthography at one end of the continuum for the number/percentage of complex rules (complexity), according to which French appears to be even more complex than the English orthography, and at the other end of the continuum for the percentage of irregular words (unpredictability), where French appears to be even more predictable than German and Dutch. This shows that the distinction between the two concepts is meaningful, as they are not perfectly correlated between orthographies. The Italian DRC shows an even smaller number of rules, and a larger proportion of single-letter rules, consistent with the notion that it is an extremely shallow orthography (e.g., Tabossi & Laghi, 1992).

Although the DRC approach offers insights into the relative positions of the five orthographies on the two continua, there are three reasons why this approach is limited. Firstly, current versions of the DRC are based on monosyllabic words only. In some languages, the proportion of monosyllabic words is relatively high; in others, polysyllabic words form the majority of all words. This is problematic for across-language comparisons. Furthermore, even in languages where monosyllabic words are frequent, structural properties vary between monosyllabic and polysyllabic words. Therefore, monosyllabic words are not a perfectly representative sample of any orthography (for a review, see Protopapas & Vlahou, 2009). Although the DRC approach may still be useful to determine the relative position of each orthography in terms of orthographic complexity and unpredictability, it would be valuable to replicate these findings with an approach which is not limited to monosyllabic words.

Secondly, the cross-linguistic versions of the DRC were implemented independently of each other, without the aim of comparing them directly to each other. For example, the number of words in the DRC’s lexicons varies extensively, with 4583 words for Dutch, 8027 words for English, 2245 words for French, and 1448 words for German. The varying number of words in the DRCs may also reflect the relative percentage of monosyllabic words in each language.

Thirdly, it is not established that the GPC rules that are implemented in the DRC have full psychological reality. Indeed, there is evidence that other sublexical units are used during reading in English (Glushko, 1979; Treiman, Kessler, & Bick, 2003), German (Perry, Ziegler, Braun, & Zorzi, 2010; Schmalz et al., 2014) and French (Perry, Ziegler, & Zorzi, 2014). Although this does not mean that the DRCs cannot be used as a tool to capture linguistic variability in the complexity and predictability of print-to-speech correspondences by using GPCs and irregular words as a proxy, it is, again, desirable to find converging evidence from a different approach.

Such converging evidence can be found from a computational study of a linguistic corpus of English, Dutch, and French (van den Bosch et al., 1994). The corpora used in this study included polysyllabic words as well as monosyllabic words. In addition, this paper predates the DRCs, and was not conducted within the framework of any particular theory or model. The approach of this paper was data-driven, and the authors made no a priori predictions about the results.

Van den Bosch et al. (1994) conclude that orthographic depth can be dissociated into two separate measures: the difficulty of parsing letter strings into graphemes on the one hand, and the degree of redundancy in the print-to-speech correspondences on the other hand. The former, which we equate with our concept of complexity, was measured by applying a computationally obtained parsing mechanism to a set of test words. In an orthography with simple correspondences, parsing is easier, because in many cases parsing a word into letters would enable the correct mapping of graphemes to phonemes: an English word with simple correspondences, like “cat”, would be parsed into “c”, “a” and “t,” which can be mapped correctly onto the phonemes of /k/, /æ/, and /t/; a word with complex correspondences, such as “chair” would instead need to be parsed as “ch” and “air,” because the constituent letters (“c”, “h”, etc.) do not map onto the correct phonemes. Since for each of the three orthographies the same amount of training was used, differences in parsing accuracy of untrained test words reflect the difficulty of parsing. Parsing accuracy, overall, was low, indicating that all three orthographies are characterised by high complexity. French showed the lowest level of accuracy, while Dutch and English were at approximately the same level (see Table 1).
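The parsing idea can be illustrated with a much simpler, hand-written stand-in for the trained parser of van den Bosch et al. (1994): a greedy longest-match segmentation over a small, hypothetical grapheme inventory. The inventory and examples below are our own and only illustrate the principle; the published measure was derived from a data-driven parser.

```python
# Toy longest-match grapheme parser over a hypothetical grapheme inventory.
GRAPHEMES = {"ch", "air", "c", "h", "a", "i", "r", "t"}

def parse(word, max_len=3):
    """Greedily split a letter string into the longest graphemes available."""
    units, i = [], 0
    while i < len(word):
        for size in range(max_len, 0, -1):
            candidate = word[i:i + size]
            if candidate in GRAPHEMES:
                units.append(candidate)
                i += size
                break
        else:
            units.append(word[i])  # unknown letter: fall back to a single character
            i += 1
    return units

print(parse("cat"))    # ['c', 'a', 't'] — single letters suffice
print(parse("chair"))  # ['ch', 'air'] — multi-letter units are required
```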

For quantifying the degree of redundancy, van den Bosch et al. (1994) report the generalization performance, or the number of test words pronounced correctly by a computationally obtained set of print-to-speech correspondences in the three orthographies. In order to obtain these correspondences, they first derived all possible print-to-speech correspondences of all sizes (ranging from single letters to whole words). Then they compressed the set of correspondences for each orthography to reduce the redundancy among these rules (e.g., knowing the correspondences “a” → /æ/ and “t” → /t/, as well as “-at” → /æt/, is redundant; knowing the correspondences “a” → /æ/, “l” → /l/, and “m” → /m/, as well as “-alm” → /ɐːm/ is not).

The results showed that both Dutch and French outperformed English, meaning that there are many English words that do not comply with these rules. The generalization measure is reflective of unpredictability: given the set of correspondences that were defined through the compression algorithm, a large number of words in the English orthography were still unpredictable. The predictability, according to this measure, was higher in Dutch and French than in English. A summary of both variables is presented in Table 1.

To our knowledge, the quantification scheme of van den Bosch et al. (1994) has not been used to study behavioral differences in the effects of orthographic depth, nor has it been applied to other orthographies. This is an important direction for future research. For our current purposes, it is particularly interesting that the two concepts which van den Bosch et al. (1994) suggest as underlying orthographic depth based on their linguistic-computational analysis are consistent with the results of the DRC, and our distinction between complexity and unpredictability. It is worth noting that the results of the DRC approach and the analysis of van den Bosch et al. (1994) converge despite the DRCs’ limitation to monosyllabic words only. This suggests that for the orthographies studied, general findings about complexity and unpredictability show broadly the same patterns for monosyllabic words compared to a wider sample of words.

The case of French is particularly interesting: Both the DRC approach and the analysis of van den Bosch et al. (1994) classified French as a relatively complex orthography (many complex GPC rules, low generalization performance), even compared to English. Conversely, both approaches classified French as a predictable orthography, even compared to Dutch and German. In previous work on orthographic depth, French has often been described as an intermediate orthography (Goswami et al., 1998; Paulesu et al., 2001; Seymour et al., 2003; Sprenger-Charolles et al., 2011). The French orthography, therefore, shows the importance of distinguishing between the two concepts, as a failure to do so provides a different picture.

This intuitive classification of French as an orthography of intermediate depth has been supported by some of the previous quantification schemes, which did not make the distinction between complexity and unpredictability. For example, Seymour et al. (2003) classified 13 European orthographies based on their degree of depth. They consulted researchers in participating countries and ranked the orthographies in terms of their depth based on a more intuitive approach. This landed French in an “intermediate” position. It seems, therefore, that this intuitive approach “averages out” potentially theoretically relevant distinctions between separate concepts underlying orthographic depth.

A more objective approach, which has been taken up by cross-linguistic researchers, is the onset entropy measure (Borgwaldt et al., 2004, 2005). This quantification scheme reflects the number of different ways in which the initial letter of a word, on average, can be pronounced in a given orthography. Initial letters which consistently map onto the same phoneme involve no ambiguity, so they are assigned a value of 0. The greater the number of possible pronunciations of the letter, the higher the entropy value. Borgwaldt et al. (2005) calculated the entropy values for initial letters across orthographies. The average onset entropy for each orthography was then considered to reflect its relative degree of orthographic depth.
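In the spirit of this measure, the sketch below computes the Shannon entropy over the initial phonemes associated with each word-initial letter in a toy corpus and then averages across letters; the toy data are invented, and the published measure may differ in detail (e.g., in how letters are weighted).

```python
import math
from collections import Counter

# Toy corpus: word-initial letter -> one initial phoneme per word.
toy_onsets = {
    "c": ["k", "k", "s", "k"],   # e.g., cat, cot, city, cup
    "b": ["b", "b", "b"],        # fully consistent onset: entropy 0
}

def entropy(phonemes):
    """Shannon entropy (in bits) of the phoneme distribution for one initial letter."""
    counts = Counter(phonemes)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

for letter, phonemes in toy_onsets.items():
    print(letter, round(entropy(phonemes), 3))   # c: 0.811, b: 0.0

average_onset_entropy = sum(entropy(p) for p in toy_onsets.values()) / len(toy_onsets)
print("average onset entropy:", round(average_onset_entropy, 3))  # 0.406
```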

This measure has intuitive appeal, and has been used in large-scale behavioral studies of cross-linguistic differences (Landerl et al., 2013; Moll et al., 2014; Vaessen et al., 2010; Ziegler et al., 2010). One of its advantages is the focus on the first letter only. Firstly, this eliminates the bias towards monosyllabic words that is present in the DRC and some other approaches. Secondly, it also increases the comparability across orthographies, because words in all orthographies have initial letters (Borgwaldt et al., 2005; Ziegler et al., 2010). Still, neglecting the remaining information in a word creates other problems. In English, for example, it is often the vowel pronunciation that is unpredictable, and vowels occur more frequently in the middle of a word (Treiman et al., 1995). In French, print-to-speech irregularities occur mostly in the final consonants, which are often silent (Lete, Peereman, & Fayol, 2008; Perry et al., 2014; Ziegler, Jacobs, & Stone, 1996).

We provide two examples that show that although the onset entropy measure is a useful first step in quantifying orthographic depth, it confounds orthographic complexity with unpredictability, meaning that it does not provide the whole picture. According to the onset entropy measure, French (with a value of 0.46) is about half-way between English (0.83) and “shallow” orthographies such as Finnish (0.0) and Hungarian (0.17) (Ziegler et al., 2010). It therefore seems that, like the subjective rankings described above, this approach to quantifying orthographic depth averages out two different sources that underlie this construct.

Another example is the German orthography. Table 1 shows that according to the DRC measure, German has a high degree of predictability: although some context-sensitive rules are required, these allow the sublexical route to read aloud 90 % of all monosyllabic words correctly. This contrasts with the results from the onset entropy measure, which classifies it as relatively deep: the onset entropy value for German is higher (reflecting a higher degree of depth) than that of Dutch, Hungarian, Italian, and even Portuguese, and only slightly lower than French (Borgwaldt et al., 2005). This goes against both the results of the DRC, and the intuitive notion that German is close to the shallow end of the orthographic depth continuum (Frith et al., 1998; Goswami, Ziegler, Dalton, & Schneider, 2003; Landerl, 2000; Landerl et al., 1997; Seymour et al., 2003; Wimmer & Goswami, 1994; Wimmer et al., 2000; Ziegler et al., 2001; Ziegler, Perry, Ma-Wyatt, et al., 2003).

Similar to the example of French, this counter-intuitive finding can be explained by the distinction between complexity and unpredictability: the relative complexity of the German orthographic system inflates the onset entropy value, despite German’s relatively high degree of predictability. For example, German words starting with the letter “s” can have the first phoneme /z/, /ʃ/, or /s/. The pronunciation is, however, predictable: in the onset position, when “s” is followed by a vowel, it is pronounced as /z/; when it is followed by “p” or “t”, or is part of the grapheme “sch”, it is pronounced as /ʃ/; and in all other cases it is pronounced as /s/. The two examples of French and German show that onset entropy has no way of distinguishing between correspondence complexity and unpredictability, and instead “averages out” the two dimensions, making French and German appear to be “intermediate” orthographies despite their relatively high predictability.
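The point that this mapping is complex but deterministic can be made explicit in a few lines of code; the sketch below simply encodes the rules as described in the preceding paragraph (real German spelling has further nuances, and the example words are our own).

```python
# Toy encoding of the word-initial "s" rules described above: context-sensitive,
# yet fully deterministic (predictable).
VOWELS = set("aeiouäöü")

def onset_s_phoneme(word):
    """Return the phoneme of a word-initial 's' according to the rules above."""
    assert word.startswith("s")
    if word.startswith("sch") or word[1:2] in {"p", "t"}:
        return "ʃ"
    if word[1:2] in VOWELS:
        return "z"
    return "s"

for w in ["sonne", "spiel", "stein", "schule", "skala"]:
    print(w, onset_s_phoneme(w))
# sonne /z/, spiel /ʃ/, stein /ʃ/, schule /ʃ/, skala /s/
```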

In summary, we have described two separate approaches that both suggest that orthographic depth is not a single concept, but can be dissociated into complexity and unpredictability of the print-to-speech correspondences. One was introduced two decades ago (van den Bosch et al., 1994), but to our knowledge it has not been extended to other orthographies or formed the basis of behavioral research. For the purpose of the current paper, the study is valuable because the data-driven computational-linguistic study by van den Bosch et al. (1994) led them to the same conclusions as our theory-based DRC approach. This strengthens the position that on a linguistic level, orthographic depth can be dissociated into two separate constructs.

Limitations and open questions for further research

Our definition of orthographic depth was conceptualized with the aim of being specific, as this is essential for an objective classification measure and for precise predictions about behavior, based on theories and models of reading. The specificity of our definitions comes with the trade-off that they do not capture all sources of cross-linguistic variability. For example, unpredictability, when defined at the level of the print-to-speech correspondences, ignores two further sources of unpredictability that exist in alphabetic orthographies, namely incompleteness and irregularities associated with lexical stress assignment.

As discussed earlier, a previous definition of orthographic depth also included the concept of incompleteness (Katz & Frost, 1992). Incomplete sublexical information, within our framework, makes the pronunciation of a word unpredictable for the sublexical route, but also requires the use of contextual semantic information to access a single phonological and semantic entry. Given the need to rely on semantic context to resolve this type of unpredictability, it is possible that the incompleteness of the sublexical correspondences presents a qualitatively different problem compared to complexity and inconsistency. If this is the case, placing an orthography which is characterised by incomplete correspondences, such as Hebrew, on the same continuum as the European orthographies, might not be particularly meaningful.

Another source of unpredictability that varies across orthographies, but is not captured by any of the previous quantification schemes, is lexical stress assignment. Some orthographies, such as French, have entirely predictable stress assignment, but others, such as English (Rastle & Coltheart, 2000; Seva, Monaghan, & Arciuli, 2009), Greek (Protopapas, Gerakaki, & Alexandri, 2006), Russian (Jouravlev & Lupker, 2014, 2015), and Italian (Burani & Arduino, 2004; Colombo, 1992), have some ambiguity when it comes to determining the position of the stressed syllable, and lexical-semantic knowledge needs to be recruited to resolve these conflicts. In English, for example, the word “entrance” has a different meaning depending on whether the first or the second syllable is stressed. It is still, to some extent, unclear via what mechanisms stress irregularity affects reading (Sulpizio, Arduino, Paizi, & Burani, 2013; Sulpizio, Burani & Colombo, 2015), and how it relates to GPC irregularity. This leaves open questions for future research: for example, to what extent stress assignment can be predicted, how this differs across orthographies, and what cognitive mechanisms are used to resolve ambiguities underlying stress assignment.

Defining language-level differences across orthographies becomes even more complicated when we consider non-alphabetic orthographies: both the languages and the orthographic systems of Chinese or Japanese, for example, are so different to the alphabetic orthographies that we consider here, that classifying and comparing them along the same continuum is not possible. In addition to differences in the nature of the process by which speech is computed from print, non-alphabetic orthographies may differ in terms of the visual complexity, morphological principles, and even definitions of word boundaries (Chang, Maries, & Perfetti, 2014; Cui et al., 2012; Huang & Hanley, 1995; McBride-Chang et al., 2012).

Therefore, we believe that the most valuable studies in terms of getting to the bottom of orthographic depth would involve the following: (1) Cross-linguistic comparisons, in which two orthographies are compared that are similar on as many aspects as possible, but differ on the particular issue of interest. Given the difficulty in doing this, a single comparison involving only two orthographies should not be taken at face value, and needs to be replicated with other orthographies. (2) Within-language studies, which can be conducted to isolate the particular aspect of orthographic depth that is proposed to drive cross-linguistic differences. For example, Frost (1994) compared marker effects of lexical-semantic processing in pointed Hebrew, where the sublexical information is complete, to unpointed Hebrew, where the sublexical information for the same words is incomplete. In line with the Orthographic Depth Hypothesis, the unpointed script showed stronger lexical-semantic marker effects than pointed Hebrew, in a design that controlled for any cross-linguistic differences that may exist in an across-language design.

Predictions of the new orthographic depth framework for theories of reading

Some key studies within the new framework

Previous research on orthographic depth has been conducted without bearing in mind the distinction between complexity and unpredictability. Therefore, this work is often subject to more than one interpretation, depending on whether behavioral differences are proposed to arise as a function of complexity, or as a function of unpredictability. We review two previous key studies on orthographic depth and illustrate how different conclusions may be drawn depending on how orthographic depth is defined.

A key finding supporting the Orthographic Depth Hypothesis (ODH) comes from a study of the frequency and lexicality effects, and a semantic priming manipulation (Frost, Katz, & Bentin, 1987). The orthographies explored in this study were Hebrew (deep), English (medium-deep), and Serbo-Croatian (shallow). Indeed, there was an increase in the size of the lexical-semantic effects associated with increasing orthographic depth, suggesting stronger involvement of the lexical-semantic route.

Upon closer inspection, it is not clear whether this can be attributed to orthographic complexity or unpredictability. The Hebrew orthography is characterized by incompleteness, therefore these results show that incompleteness of print-to-speech correspondences leads to increased reliance on the lexical-semantic route, but remain silent about complexity and unpredictability. English and Serbo-Croatian differ from each other on both complexity and unpredictability, therefore the difference between these two orthographies can be attributed to either. If unpredictability drives the increased involvement of the lexical-semantic route, this means that lexical knowledge is recruited because sublexical information is not sufficient to assemble a fully-specified phonological representation. If complexity drives these behavioral differences, this means that the presence of complex information slows down the processing of the sublexical route, which allows for more involvement of the lexical route. Thus, although the outcomes of the two scenarios are identical, the mechanisms that lead to this end state are different. As a result, we do not know whether indeed complexity has the effect of increased reliance on lexical-semantic processing, or whether this is specific to incompleteness and unpredictability. In future research, this question could be addressed by comparing complex but predictable orthographies, such as French, to simple and predictable orthographies, such as German or Dutch. If increased lexical-semantic processing is associated with unpredictability, but not complexity, we would expect to find similar lexical-semantic marker effects when we hold predictability constant.

Key evidence for the Psycholinguistic Grain Size Theory (Ziegler & Goswami, 2005) stems from a study comparing the size of the length and body-N effects in English and German (Ziegler et al., 2001). The length effect was stronger in German compared to English, and the body-N effect was weaker in German compared to English, suggesting differences in the nature of the sublexical processing underlying reading in the two orthographies. German and English differ from each other on both complexity and unpredictability, so it is unclear which aspects of these writing systems drive the behavioral differences. An increased body-N effect in English compared to German may reflect a difference in the nature of the sublexical processing (as suggested by Ziegler et al., 2001). If the dominant functional sublexical units of English are bodies, this would mean that the sublexical units are more complex. According to this interpretation, the results of the body-N effect reflect a difference in the complexity of sublexical correspondences. An alternative explanation is that the unpredictability of English encourages a qualitatively different reading strategy compared to German, namely an increased reliance on lexical analogy. In this case, a German reader might tend towards reading words and non-words via the sublexical correspondences, whereas an English reader relies to a greater extent on similar lexical entries. Thus, English readers might show a stronger body-N effect compared to German readers, because they are facilitated by the presence of orthographically similar words. Again, studies with orthographies which are matched on complexity but differ in terms of predictability (e.g., English vs. French) or vice versa (e.g., German vs. French) might be used by future research to dissociate between effects that are associated with each of these two constructs.

In summary, the next step for future research will be to conduct behavioral studies to establish the extent to which the two dimensions affect reading processes. This opens up a plethora of new research questions about the mechanisms via which the variables underlying orthographic depth independently affect reading, or learning to read. We can show that the distinction between complexity and unpredictability of sublexical correspondences is theoretically meaningful if, based on existing models of reading, there are different predictions about how the two constructs affect cognitive processing. To this end, we provide an overview of specific predictions within existing models of reading about how both complexity and unpredictability, as defined above, might affect both skilled reading processes and reading acquisition.

Predictions for complexity and unpredictability in adults

In skilled adult readers, the ODH proposes that complexity slows down the sublexical assembly process, which gives more time for the lexical route to access the relevant word information. Based on computational models of reading, we would indeed expect that the complexity of print-to-speech correspondences should affect the speed of sublexical assembly. Simulations with the DRC, as well as behavioral data, have shown that non-words which contain multi-letter GPCs (“boace”) are processed more slowly than non-words of equal length, but containing only simple correspondences (Marinus & de Jong, 2011; Rastle & Coltheart, 1998; Rey et al., 1998; Rey, Ziegler, & Jacobs, 2000). This is postulated to occur because the sublexical route in the DRC operates in a serial fashion. When reading an item containing a multi-letter rule, it activates the first letter of the digraph and its equivalent pronunciation. This pronunciation needs to be inhibited once the second letter starts being processed, because the two letters are then parsed into a two-letter grapheme which has a different pronunciation.

To our knowledge, it has not yet been explicitly shown that the slowing-down of sublexical assembly leads to an increased reliance on the lexical procedure, as stated by Katz and Frost (1992), but this question can readily be addressed by future empirical research. One prediction that follows is that lexical and semantic marker effects, such as frequency or imageability effects, should be stronger for words containing complex correspondences than for words containing only simple correspondences.

The concept of unpredictability, and its effect on single-word reading, has already been addressed in detail by computational modellers in the form of the debate between regularity and consistency. As explained in an earlier section, this debate reflects the different mechanisms used by the sublexical routes of computational models to derive the pronunciation of a letter string. Importantly, most studies have shown an inhibitory effect of unpredictability, whether it was operationalised as regularity or as consistency (Andrews, 1982; Hino & Lupker, 2000; Jared, 1997, 2002; Jared, McRae, & Seidenberg, 1990; Metsala, Stanovich, & Brown, 1998; Parkin, McMullen, & Graystone, 1986; Rastle & Coltheart, 1999; Waters & Seidenberg, 1985; Waters, Seidenberg, & Bruck, 1984).
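As an illustration of how unpredictability can be operationalised as consistency, the sketch below computes the proportion of "friends" (words sharing both the orthographic body and its pronunciation) among all words sharing the body. The mini-lexicon and its pronunciations are invented for the example and are not drawn from any of the studies cited above.

```python
# One common operationalisation of body-rime consistency: the proportion of
# "friends" (same body, same rime pronunciation) among all words sharing the
# body. The mini-lexicon is illustrative only.

def body_consistency(target_body, target_rime, lexicon):
    """lexicon: list of (orthographic body, rime pronunciation) pairs."""
    sharing = [rime for body, rime in lexicon if body == target_body]
    if not sharing:
        return None
    friends = sum(1 for rime in sharing if rime == target_rime)
    return friends / len(sharing)

mini_lexicon = [
    ("int", "ɪnt"),   # e.g., mint
    ("int", "ɪnt"),   # e.g., hint
    ("int", "ɪnt"),   # e.g., lint
    ("int", "aɪnt"),  # e.g., pint
]
print(body_consistency("int", "ɪnt", mini_lexicon))   # 0.75: largely consistent
print(body_consistency("int", "aɪnt", mini_lexicon))  # 0.25: "pint" is an enemy of its neighbours
```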

Open questions remain about the relationship between consistency and completeness. While inconsistency can be resolved by relying on a non-semantic lexical route, access to the phonological information of incomplete words relies on semantics. Thus, it is of interest whether semantic effects are stronger for words with incomplete than with inconsistent correspondences. This has theoretical implications both for models of reading and for defining orthographic depth: a difference between triangle models and dual-route models (both the DRC and CDP+-type models) is that the latter contain a purely orthographic lexical route that bypasses the activation of semantics. In triangle models, such a route is non-existent: all lexical activation passes through semantics. Therefore, dual-route models may predict a difference in the strength of semantic marker effects between inconsistent and incomplete words. Conversely, triangle models require semantics for both types of items, and predict equally strong semantic effects for inconsistent and incomplete words.

In this section, we have listed several testable predictions, based on models of skilled reading, which can be explored by future research using either within- or across-language designs. This will contribute to the understanding of the precise cognitive mechanisms which drive the cross-orthographic differences that have been previously attributed to the broad concept of orthographic depth.

Theories of reading acquisition and orthographic depth

There are fewer specified models of reading acquisition than models of skilled adult reading. Computational models could be particularly useful for exploring the effect of orthographic depth on reading acquisition and for deriving specific predictions. Connectionist-type models, for example, use a learning algorithm that extracts the regularities in the correspondences between print and speech; thus, they face a problem similar to that of a child learning to read (Hutzler, Ziegler, Perry, Wimmer, & Zorzi, 2004). If, for the sake of simplicity, we focus purely on the acquisition of sublexical skills, we can make clear predictions about complexity and unpredictability. According to the Psycholinguistic Grain Size Theory (PGST), complex sublexical correspondences should be more difficult for children learning to read to acquire (Ziegler & Goswami, 2005). This means that becoming proficient at sublexical decoding should take longer in an orthography with complex correspondences than in one with simple correspondences.

In terms of learning the sublexical correspondences, we can also make clear predictions about unpredictability that should be testable both with a connectionist-type model and with children learning to read. If we were to pick two orthographies that are comparable in complexity but differ in predictability (e.g., French and English), we would expect learning the correspondences to take the same amount of time. Once the correspondences have been learnt, however, we would expect accuracy in applying them to new words to be higher in the more predictable orthography.
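A deliberately simplified learner can illustrate this prediction. The sketch below simply tallies grapheme-phoneme co-occurrences (a stand-in for, not an implementation of, a connectionist learning algorithm) and shows that, given the same amount of exposure, reading accuracy reaches ceiling for a fully predictable toy orthography but not for an otherwise comparable unpredictable one. Both toy orthographies are invented for illustration.

```python
# Toy illustration of the prediction above: a learner that tallies
# grapheme-phoneme co-occurrences reaches ceiling accuracy on a fully
# predictable toy orthography, but not on an unpredictable one with the
# same amount of exposure. Both "orthographies" are invented.

import random
from collections import Counter, defaultdict

def train(pairs, n_exposures=1000, seed=0):
    """Tally how often each grapheme co-occurs with each phoneme."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for _ in range(n_exposures):
        grapheme, phoneme = rng.choice(pairs)
        counts[grapheme][phoneme] += 1
    return counts

def read_accuracy(counts, test_pairs):
    """Proportion of items whose most frequent mapping is the correct one."""
    correct = sum(counts[g].most_common(1)[0][0] == p for g, p in test_pairs)
    return correct / len(test_pairs)

predictable   = [("a", "1"), ("b", "2"), ("c", "3")]
unpredictable = [("a", "1"), ("a", "4"), ("b", "2"), ("c", "3")]  # "a" is ambiguous

print(read_accuracy(train(predictable),   predictable))    # 1.0
print(read_accuracy(train(unpredictable), unpredictable))  # < 1.0: one reading of "a" always fails
```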

Both behavioral and computational data from English and German provide some support for this set of predictions. For example, behavioral data have shown that non-word reading accuracy is higher for German than for English children (Frith et al., 1998; Landerl, 2000; Ziegler, Perry, Ma-Wyatt, et al., 2003). This holds true even when a lenient marking criterion is used for English, whereby any plausible pronunciation of the non-word is scored as correct. Similar data have also been obtained from comparisons of English with other shallow orthographies, most notably from a large-scale study which included children from 13 European countries (Seymour et al., 2003). A computational study compared the performance of a sublexical learning algorithm in German and English (Hutzler et al., 2004). In these simulations, the model’s non-word reading accuracy for German exceeded that for English, even after a large number of training cycles, when the models had reached a plateau (see Footnote 6).

A limitation of both the computational and the behavioral comparisons is that the comparison orthographies differ from English in terms of both complexity and unpredictability. Therefore, although the existing data are suggestive, we cannot unequivocally attribute the differences in non-word reading accuracy to unpredictability.

It is also important to bear in mind that the sub-skills underlying reading do not develop in isolation. In particular, there is a bidirectional relationship between the acquisition of the lexical and the sublexical route (Share, 1995; Ziegler, Perry, & Zorzi, 2014): lexical entries are predominantly established by a self-teaching mechanism which uses knowledge of the sublexical correspondences to decode unfamiliar words, but lexical entries are also used to refine the knowledge of sublexical correspondences. Due to this bidirectional relationship, we expect that high complexity will delay not only the acquisition of sublexical skills, but also the build-up of orthographic entries. As a result, complexity should lead to a quantitative difference: for example, French children should lag behind German children, but eventually reach the same level of decoding accuracy for both words and non-words.

In the case of unpredictability, there could be some qualitative differences in the mechanisms that are used for self-teaching, as lexical involvement is necessary to resolve ambiguous pronunciations. Recent within-language studies of orthographic learning provide some support for this notion, and in particular for the use of semantics (Taylor, Plunkett, & Nation, 2011; Wang, Castles, & Nickels, 2012). In an orthographic learning study, participants are asked to learn new words, which can be assigned either a predictable or an unpredictable pronunciation. Both studies found that when the pronunciation was unpredictable, semantic context facilitated learning; this was not the case for predictable words, where phonological decoding appeared sufficient for orthographic learning. These findings raise questions about cross-linguistic differences in learning to read as a function of the unpredictability of sublexical correspondences. It is possible that children learning to read in a relatively unpredictable orthography routinely rely to a greater extent on contextual cues than children learning to read in a relatively predictable orthography. This would result in a qualitative shift, across orthographies differing in predictability, in the types of cognitive strategies that are used to establish orthographic representations (see Footnote 7).

As our framework proposes that orthographic depth does not represent a single continuum, it contrasts with the view that the ease of learning to read depends on the relative position of one’s orthography on such a continuum (Seymour et al., 2003). Our suggestion is consistent with previous studies which have found a non-linear relationship between reading achievement and orthographic depth: specifically, once English is removed, the correlation between reading outcomes and orthographic depth disappears (Aro & Wimmer, 2003; Whetton & Twist, 2003). According to our framework, these results indicate that reading acquisition in English is impeded by both its high degree of complexity and its high degree of unpredictability. Cross-linguistic comparisons of French and English consistently show that French children fare better than English children on word and non-word reading tasks (e.g., Goswami et al., 1998; Seymour et al., 2003), which suggests that complexity alone is not sufficient to account for the well-established lag of English-speaking children. Future research is needed to establish whether English children learning to read are predominantly hampered by the unpredictability of their orthography, or whether complexity and unpredictability interact to create a particularly high hurdle for children learning to read in English.

In summary, given the theories of reading acquisition, we can assume that the two concepts of complexity and unpredictability should affect cognitive processes during learning to read in different ways. Looking purely at the development of sublexical decoding skills, we expect that complexity would slow down the speed of reading acquisition, whereas unpredictability would reduce decoding accuracy, even after all the correspondences have been learned. Furthermore, if children are routinely faced with unpredictable words, it is possible that they need to develop compensatory strategies to achieve high reading accuracy and comprehension.

Conclusions

Behavioral studies of orthographic depth have been conducted for decades, and have shown that it affects the cognitive processes underlying skilled adult reading (Frost et al., 1987; Schmalz et al., 2014; Ziegler et al., 2001), the rate of reading acquisition (Frith et al., 1998; Landerl, 2000; Seymour et al., 2003; Wimmer & Goswami, 1994), the prevalence and symptoms of developmental dyslexia (Landerl et al., 1997; Paulesu et al., 2001; Wimmer, 1993, 1996), brain activation (Paulesu et al., 2000; Richlan, 2014), and the strength of cognitive predictors of reading ability (Caravolas et al., 2012; Caravolas, Lervag, Defior, Malkova, & Hulme, 2013; Landerl et al., 2013; Moll et al., 2014; Vaessen et al., 2010; Ziegler et al., 2010). Clearly, orthographic depth is an important and relevant factor, for both practical and theoretical reasons. Based on the existing evidence, we can be confident in concluding that orthographic depth affects reading, but in order to learn more about why and how this happens, a more precise definition of orthographic depth is required. Such a definition is needed to (1) devise a theoretically meaningful method for quantifying the linguistic characteristics of orthographies, and (2) use this quantification method in future cross-linguistic research to isolate the specific cognitive mechanisms that are affected by each linguistic construct.
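To give a sense of what such a quantification method might look like, the sketch below computes, from a toy list of grapheme-phoneme correspondences, complexity as the proportion of multi-letter graphemes and unpredictability as the mean entropy of each grapheme’s phoneme distribution. This is one possible operationalisation offered purely as an illustrative assumption, not an established metric from the literature.

```python
# Illustrative quantification of the two constructs from a toy segmented
# corpus of (grapheme, phoneme) correspondences: complexity as the proportion
# of multi-letter graphemes, unpredictability as the mean entropy (in bits)
# of each grapheme's phoneme distribution. A sketch, not an established metric.

import math
from collections import Counter, defaultdict

def complexity(correspondences):
    """Proportion of correspondences whose grapheme spans more than one letter."""
    return sum(len(g) > 1 for g, _ in correspondences) / len(correspondences)

def unpredictability(correspondences):
    """Mean entropy of the phoneme distribution per grapheme."""
    dists = defaultdict(Counter)
    for g, p in correspondences:
        dists[g][p] += 1
    entropies = []
    for counts in dists.values():
        total = sum(counts.values())
        entropies.append(-sum((c / total) * math.log2(c / total) for c in counts.values()))
    return sum(entropies) / len(entropies)

toy_corpus = [("sh", "ʃ"), ("o", "ɒ"), ("o", "əʊ"), ("p", "p"), ("ea", "iː"), ("ea", "ɛ")]
print(complexity(toy_corpus), unpredictability(toy_corpus))  # 0.5 0.5
```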

We propose that orthographic depth is a conglomerate of two separate concepts, namely the degree of complexity and unpredictability of print-to-speech correspondences in a given orthography. We have shown that, on a linguistic level, the two concepts can be dissociated. Furthermore, given the currently available models and theories of reading, we also expect that each of the two concepts would influence skilled reading and reading acquisition in different ways. Thus, we argue that there are many unanswered questions in the area of cross-linguistic research relating to orthographic depth. These can be pursued more effectively in the context of a systematic framework for orthographic depth.